We propose a collaborative project between an industrial partner (Hybrigenics) and four academic laboratories to design a unique tool, a focused chemical library, that will accelerate the identification of bioactive molecules targeting protein-protein interfaces. This focused library of approximately 10,000 compounds will be dedicated to the inhibition of protein-protein interactions (PPIs).
Once validated, this original chemical library will be made available to academic laboratories, especially the French network of academic screening platforms ‘GDR3056 ChemBioScreen’, and to biotech and pharmaceutical companies through an MTA written by the CNRS valorization department after agreement with the other valorization departments (INSERM Transfert, SATT, etc.).
Some of the concepts and methods driving the selection of PPI modulators will also be made available to the scientific community. This structuring effort will reduce the cost of identifying new chemical entities in this very challenging field. Chemical probes developed as a direct consequence of this proposal will provide innovative tools to study biological processes involving protein-protein interactions. Some of them are expected to be patented and transferred to industry.
Additionally, the present project draws on several breakthroughs that are highly likely to lead to high-impact products and discoveries, such as:
1/ A unique, diverse chemical library dedicated to PPIs that will be made available to the entire scientific community
2/ Publications in peer-reviewed journals related to this work
3/ New hits identified from the dozen HTS campaigns that will be developed to evaluate the library, some being published as molecular probes to study biological processes while others will enter drug discovery programs
4/ Return on investment for the company partner ‘Hybrigenics’, which will provide services with this library.
Task 1: Preparation of the ‘PPI-oriented’ Fr-PPI-Chem library.
Subtask 1.1 SDF Preparation (Partners 1 & 3)
Protocols for SDF preparation, standardization and curation were applied to the Ambinter and MolPort compound collections (Partner 1) and to the ZINC database (Partner 3).
Subtask 1.2a Updates of the PPI Filters (Partner 1)
A set of 85 non-redundant orthosteric PPI inhibitors from 2P2IDB and 734 approved drugs from DrugBank was collected as positive and negative training datasets, respectively. Compounds were standardized and models were built using different types of 2D molecular descriptors (MOE 2D, Dragon, ISIDA) and different machine learning methods (SVM, RF). The best models were selected based on ROC AUC in 5-fold cross-validation. An external set of 2,032 active and 135,315 inactive compounds from the PubChem database was then used to validate the models. The best models show high enrichment factors both in 5-fold cross-validation and in external validation. Y-scrambling was also performed to confirm that the models were not the result of chance correlation.
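The two evaluation measures named above, ROC AUC and enrichment factor, can be sketched in plain Python as follows. This is a minimal illustration and not the consortium's code: the function names and the default selection fraction are assumptions made for the example, and labels are taken to be 1 for actives and 0 for inactives.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random active outscores a random inactive.

    Ties between an active and an inactive count for one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate in the top-scoring fraction divided by the hit rate overall."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty set containing at least one active")
    # Size of the selected subset (at least one compound).
    n_top = max(1, int(round(n_total * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:n_top])
    return (top_actives / n_top) / (n_actives / n_total)
```

An enrichment factor of 1 means the model selects actives no better than random picking; values well above 1 indicate that actives are concentrated at the top of the ranked list.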
Subtask 1.2b Updates of the PPI Filters (Partner 3)
A dataset of 3,033 iPPI compounds (1,756 from iPPI-DB and 1,277 from TIMBAL) was first prepared. Similarly, we built a reference dataset of 82,382 non-iPPI compounds. We then calculated 2,990 molecular descriptors with different programs (MOE, Dragon, RDKit) on these two datasets and removed invariant or correlated descriptors. Using the 167 remaining descriptors, four classification methods (J48, RF, JRip and SVM) were applied to predict the class of each molecule: iPPI or non-iPPI. The parameters of each method were optimized by grid search using 5-fold cross-validation on the training set, and models were selected on the best mean F1 score. Model performance was evaluated by predicting a test set (30% of the training data). We also performed response permutation testing (Y-scrambling).
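As an illustration of the descriptor-reduction step, the sketch below drops invariant descriptors and then removes any descriptor that is strongly correlated with one already kept. The correlation cutoff (0.9), the greedy keep-first order and all names are assumptions; the text does not state which threshold or procedure was actually used.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def prune_descriptors(table, max_abs_corr=0.9):
    """Return the descriptor names kept after pruning.

    table maps a descriptor name to its list of values, one per molecule.
    max_abs_corr is a hypothetical cutoff chosen for this example.
    """
    kept = []
    for name, values in table.items():
        if len(set(values)) <= 1:
            continue  # invariant descriptor: carries no information
        if any(abs(pearson(values, table[k])) > max_abs_corr for k in kept):
            continue  # redundant with a descriptor that was already kept
        kept.append(name)
    return kept
```

With this greedy scheme the result depends on the order of the descriptors, so in practice one would order them by some preference (for example interpretability) before pruning.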
Subtask 1.2a Updates of the PPI Filters (Partner 1)
These models were used to filter chemical vendor databases (Ambinter and MolPort) in order to find putative iPPI-like orthosteric inhibitors.
From more than 12 million compounds, we selected 74,376 for the next steps.
Subtask 1.2b Updates of the PPI Filters (Partner 3)
The best model for each of the four methods was used to select, from the purchasable subset of the ZINC database, a combined list of 143,967 putative iPPI compounds.
Subtask 1.3 Updates of the in silico ADMET Toolbox (Partner 3)
The selections from Partners 1 and 3 were combined; molecules were then standardized and duplicates removed, leading to a collection of 103,656 molecules.
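The merge-and-deduplicate step can be sketched as follows, assuming each standardized molecule carries a unique structure key such as an InChIKey or a canonical SMILES string. That key, the function name and the record layout are assumptions for the example, not details given in this summary.

```python
def merge_selections(*selections):
    """Merge several selections and keep one record per structure key.

    Each selection is an iterable of (structure_key, record) pairs, where
    structure_key identifies the standardized molecule. When the same key
    appears more than once, the first record seen is kept.
    """
    merged = {}
    for selection in selections:
        for key, record in selection:
            merged.setdefault(key, record)
    return merged
```

Deduplication is only meaningful after standardization, since the same molecule drawn as different salts, tautomers or charge states would otherwise receive different keys.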
The FAF-Drugs3 software was used to flag PAINS and other undesirable compounds. The software was specifically optimized for this project.
After several consortium meetings, a consensus was reached based on the partners' experience and an analysis of more than 100 publications. Compounds were retained if they had fewer than 6 aromatic rings, fused aromatic rings ≤ 3, rotatable bonds ≤ 20, heteroatoms ≤ 12, logP (XLogP3) and logD between -7 and 8, molecular weight > 300 g/mol, fewer than 5 halogens (Br, I, Cl) and, for compounds with linear alkyl chains, fewer than 5 consecutive CH2 groups. Compounds containing at least one of the 137 substructures associated with toxicity, or at least one of the 22 PAINS scaffolds (Filter-A), were discarded. This step resulted in the selection of 78,243 compounds.
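The consensus criteria can be expressed as a single predicate over precomputed descriptors, as in the sketch below. This is an illustration only. The class and field names are hypothetical, and the descriptor values and the toxicity and PAINS flags are assumed to come from an external tool. The limits on fused aromatic rings, rotatable bonds and heteroatoms are treated as upper bounds, which is an assumption about how the criteria are meant to be read, and the logP/logD range is treated as inclusive.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties of one compound (hypothetical field names)."""
    aromatic_rings: int
    fused_aromatic_rings: int
    rotatable_bonds: int
    heteroatoms: int
    logp: float            # XLogP3
    logd: float
    mol_weight: float      # g/mol
    halogens: int          # Br, I, Cl
    max_ch2_run: int       # longest run of consecutive CH2 in a linear chain
    has_toxicophore: bool  # matches one of the toxicity substructures
    has_pains: bool        # matches one of the PAINS scaffolds


def passes_filters(d):
    """Return True if the compound satisfies every consensus criterion."""
    return (
        d.aromatic_rings < 6
        and d.fused_aromatic_rings <= 3   # assumed upper bound
        and d.rotatable_bonds <= 20       # assumed upper bound
        and d.heteroatoms <= 12           # assumed upper bound
        and -7 <= d.logp <= 8
        and -7 <= d.logd <= 8
        and d.mol_weight > 300
        and d.halogens < 5
        and d.max_ch2_run < 5
        and not d.has_toxicophore
        and not d.has_pains
    )


example = Descriptors(3, 2, 6, 7, 3.2, 2.8, 420.5, 1, 2, False, False)
too_light = replace(example, mol_weight=250.0)  # fails the weight criterion
```

Keeping the rules in one function makes it easy to audit the thresholds against the agreed list and to count how many compounds each criterion removes.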
Subtask 1.4 Chemical space analysis (Partner 4)
A final step was applied to reduce the number of compounds and to generate a diversity-oriented chemical library. The 78,243 compounds were clustered using FCFP6 fingerprints from Pipeline Pilot, with a maximum Tanimoto-based dissimilarity of less than 0.3 within each cluster. Clusters containing fewer than 3 compounds were discarded, given the limited value of such compounds for SAR analysis during the optimization phase.
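The clustering idea can be sketched with a simple leader algorithm on fingerprints represented as sets of feature identifiers. This is a stand-in chosen for illustration, not the Pipeline Pilot procedure, whose exact algorithm is not described here. In this sketch the 0.3 limit bounds the distance between each member and its cluster leader, not between every pair of members, and all names are assumptions.

```python
def tanimoto_distance(a, b):
    """1 minus the Tanimoto similarity of two fingerprints given as sets."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def leader_cluster(fingerprints, max_distance=0.3):
    """Assign each compound to the first leader closer than max_distance.

    fingerprints maps a compound id to a set of feature identifiers.
    A compound that is close to no existing leader starts a new cluster.
    """
    clusters = []  # list of (leader_fingerprint, member_ids)
    for compound_id, fp in fingerprints.items():
        for leader_fp, members in clusters:
            if tanimoto_distance(fp, leader_fp) < max_distance:
                members.append(compound_id)
                break
        else:
            clusters.append((fp, [compound_id]))
    return [members for _, members in clusters]


def drop_small_clusters(clusters, min_size=3):
    """Discard clusters with fewer than min_size compounds."""
    return [members for members in clusters if len(members) >= min_size]
```

Leader clustering is order-dependent and runs in time proportional to the number of compounds times the number of clusters, which keeps the example short; production tools typically use more robust schemes.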
The final selection of approximately 10-12,000 compounds should be ready in the next few days.
Quotations will be requested from several providers, and compounds will be purchased in the coming weeks, as initially scheduled.
This ANR program has made it possible to rationalize and organize the French cheminformatics laboratories identified as pioneers in the field of protein-protein interaction research. Thanks to this reorganization, we have pooled our know-how and will soon offer the scientific community a unique tool and product of matter: ‘Fr-PPI-Chem’, a chemical database oriented towards the inhibition of protein-protein interactions. This tool will be invaluable to scientists looking for new chemical entities targeting these challenging interfaces.
Moreover, this ANR program and the chemical database, which will be distributed on demand, were presented at the CNRS thematic school ‘Ecole Thematique de Criblage’. The program was well received by the community, which is waiting for this important tool to become available.
The product of our ANR program, a chemical library publicly available to the scientific community, is now awaited by scientists, at least in Europe, working in the field of protein-protein interaction inhibition. These scientists need specific calls to support the next step: hit identification on their original targets.
Our question to the ANR: “Would it be possible to organize a dedicated call, as a direct application of this product, to finance high-throughput screening in the laboratories and to accelerate hit identification and hit-to-lead optimization of iPPIs in this challenging field?”
We are ready to help the ANR organize such a dedicated call if necessary.
Protein-protein interactions (PPIs) play a major role in most biological processes and are involved in various cellular disorders that lead to numerous diseases. As a consequence, PPIs have emerged as a new class of very promising therapeutic targets. They are nonetheless considered challenging, mainly because their interfaces, unlike enzyme active sites for example, did not evolve to bind small molecules. Characterizing small molecules able to modulate PPIs requires the development of specific tools to improve drug discovery campaigns and reduce their cost. One of the most popular approaches to identifying PPI inhibitors, especially when no structural data are available for the target, is high-throughput screening (HTS). This strategy relies on screening a medium or large collection of compounds. In addition to the structural complexity of the interfaces of transiently formed complexes, a major hurdle in the field is the nature of the compounds used to search for PPI modulators. The poor hit rates in drug discovery campaigns are often due to screening libraries that are not adapted to the chemical space of PPIs, too small for such a diverse class of targets, or both. There is therefore a major need to develop innovative chemical libraries dedicated to PPIs and to share them with the scientific community. Data on both PPI targets and their small-molecule inhibitors have led to the development of dedicated structural databases and of tools that characterize the profiles of PPI disruptors through detailed analysis of their physicochemical properties. Members of this consortium have used these specific properties to develop dedicated algorithms and to select ‘PPI-like’ modulators within large collections of compounds.
In this proposal, we propose to build a PPI-focused chemical library, compatible with drug discovery constraints, that will be made public and available to the scientific community as a powerful tool for discovering new biological probes and early hits for the development of potential therapeutic drugs. We will first optimize our current algorithms and tools by taking into account recent results from the screening of our PPI-focused chemical library of 1,664 compounds, which was used as a proof of concept, as well as current data from our respective PPI databases. Important issues such as compound solubility, physicochemical properties, interference with bioassays, drug-likeness and pharmacokinetic properties will also be considered. This work will result in optimized and standardized protocols that will be applied to various vendors' compound collections to build a diverse chemical library of approximately 10,000 compounds that will be purchased, plated and stored. Compounds in the focused library will be evaluated for their cytotoxicity on three cell lines. The focused library will then be evaluated against 10 structurally diverse PPI targets using cell-based assays or in vitro assays such as Homogeneous Time Resolved Fluorescence (HTRF) and fluorescence polarization. Primary hits will be validated using dose-response bioassays, and IC50 values will be measured for the best compounds.
This study will first result in improved strategies to select PPI-like modulators from large collections of compounds. A validated PPI-focused chemical library of approximately 10,000 small molecules will be made available through an MTA to academic laboratories, screening platforms, and biotech and pharmaceutical companies. Several chemical probes will be developed during the validation phase of this proposal on 10 biologically important targets. The wide distribution of the library will also ensure that other chemical probes are developed as a direct consequence of this proposal, providing innovative tools to study biological processes involving protein-protein interactions.
Monsieur Xavier MORELLI (Centre National de la Recherche Scientifique délégation Provence et Corse _ [Centre de Recherche en Cancérologie de Marseille])
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Hybrigenics services SAS
CEA/DSV/iRTSV/BGE Laboratoire de Biologie à Grande Echelle- Plateforme de Criblage pour des Molécules Bioactives
INSERM_[MTi] Molécules Thérapeutiques in silico
AFMB Laboratoire Architecture et Fonction des Macromolécules Biologiques
ICOA INSTITUT DE CHIMIE ORGANIQUE ET ANALYTIQUE
CNRS DR12 _ [CRCM] Centre National de la Recherche Scientifique délégation Provence et Corse _ [Centre de Recherche en Cancérologie de Marseille]
ANR funding: 524,598 euros
Beginning and duration of the scientific project: September 2015 - 36 Months