1.
Front Chem ; 10: 1090643, 2022.
Article in English | MEDLINE | ID: mdl-36700083

ABSTRACT

Protein-protein interactions (PPIs) are recognized as important targets in drug discovery. The characteristics of molecules that inhibit PPIs differ from those of small-molecule compounds. We developed a novel chemical library database system (DLiP) to design PPI inhibitors. A total of 32,647 PPI-related compounds are registered in the DLiP. It contains 15,214 newly synthesized compounds, with molecular weight ranging from 450 to 650, and 17,433 active and inactive compounds registered by extracting and integrating known compound data related to 105 PPI targets from public databases and published literature. Our analysis revealed that the compounds in this database contain unique chemical structures and have physicochemical properties suitable for binding to the protein-protein interface. In addition, advanced functions have been integrated with the web interface, which allows users to search for potential PPI inhibitor compounds based on types of protein-protein interfaces, filter results by drug-likeness indicators important for PPI targeting such as rule-of-4, and display known active and inactive compounds for each PPI target. The DLiP aids the search for new candidate molecules for PPI drug discovery and is available online (https://skb-insilico.com/dlip).
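
The rule-of-4 filter mentioned above can be sketched as a simple threshold check. This is a minimal illustration, assuming the commonly quoted thresholds (molecular weight > 400, logP > 4, more than 4 rings, more than 4 hydrogen-bond acceptors); the abstract does not state the exact criteria DLiP applies, and the `Descriptors` class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float      # molecular weight in Da
    log_p: float           # calculated logP
    ring_count: int        # number of rings
    hbond_acceptors: int   # number of hydrogen-bond acceptors


def passes_rule_of_four(d: Descriptors) -> bool:
    """Return True if all four descriptors exceed the assumed thresholds.

    Thresholds are an assumption taken from the commonly quoted form of
    the rule, not from the DLiP abstract itself.
    """
    return (
        d.mol_weight > 400
        and d.log_p > 4
        and d.ring_count > 4
        and d.hbond_acceptors > 4
    )
```

A compound with descriptors such as `Descriptors(520.0, 4.6, 5, 6)` would pass this check, whereas a typical fragment-sized molecule would not.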

2.
Sci Rep ; 5: 17209, 2015 Nov 26.
Article in English | MEDLINE | ID: mdl-26607293

ABSTRACT

A search of a broader range of chemical space is important for drug discovery. Different methods of computer-aided drug discovery (CADD) are known to propose compounds in different chemical spaces as hit molecules for the same target protein. This study aimed at using multiple CADD methods through open innovation to achieve a level of hit molecule diversity that is not achievable with any single method. We held a compound proposal contest, in which multiple research groups participated and predicted inhibitors of tyrosine-protein kinase Yes. This showed whether collective knowledge based on individual approaches helped to obtain hit compounds from a broad range of chemical space and whether the contest-based approach was effective.


Subject(s)
Drug Evaluation, Preclinical , Protein Kinase Inhibitors/analysis , Protein Kinase Inhibitors/pharmacology , Proto-Oncogene Proteins c-yes/antagonists & inhibitors , Humans , Principal Component Analysis , Proto-Oncogene Proteins c-yes/chemistry , Reproducibility of Results , src-Family Kinases/metabolism
3.
J Chem Inf Model ; 54(10): 2751-63, 2014 Oct 27.
Article in English | MEDLINE | ID: mdl-25220713

ABSTRACT

The concept of ligand efficiency (LE) indices is widely accepted throughout the drug design community and is frequently used in a retrospective manner in the process of drug development. For example, LE indices are used to investigate LE optimization processes of already-approved drugs and to re-evaluate hit compounds obtained from structure-based virtual screening methods and/or high-throughput experimental assays. However, LE indices could also be applied in a prospective manner to explore drug candidates. Here, we describe the construction of machine learning-based regression models in which LE indices are adopted as an end point and show that LE-based regression models can outperform regression models based on pIC50 values. In addition to pIC50 values traditionally used in machine learning studies based on chemogenomics data, three representative LE indices (ligand lipophilicity efficiency (LLE), binding efficiency index (BEI), and surface efficiency index (SEI)) were adopted, then used to create four types of training data. We constructed regression models by applying a support vector regression (SVR) method to the training data. In cross-validation tests of the SVR models, the LE-based SVR models showed higher correlations between the observed and predicted values than the pIC50-based models. Application tests on new data showed that, generally, the predictive performance of SVR models follows the order SEI > BEI > LLE > pIC50. Close examination of the distributions of the activity values (pIC50, LLE, BEI, and SEI) in the training and validation data implied that the performance order of the SVR models may be ascribed to the much higher diversity of the LE-based training and validation data. In the application tests, the LE-based SVR models can offer better predictive performance for compound-protein pairs with a wider range of ligand potencies than the pIC50-based models. This finding strongly suggests that LE-based SVR models are better than pIC50-based models at predicting bioactivities of compounds that could exhibit a much higher (or lower) potency.
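
For orientation, the three LE indices named above can be computed from pIC50 and simple molecular descriptors. The sketch below assumes the widely used definitions (LLE = pIC50 minus logP, BEI = pIC50 per kDa of molecular weight, SEI = pIC50 per 100 Å² of polar surface area); the abstract does not spell out the formulas, so these definitions and the function names are assumptions.

```python
def lle(pic50: float, log_p: float) -> float:
    """Ligand lipophilicity efficiency: potency corrected for lipophilicity."""
    return pic50 - log_p


def bei(pic50: float, mol_weight_da: float) -> float:
    """Binding efficiency index: pIC50 per kDa of molecular weight."""
    return pic50 / (mol_weight_da / 1000.0)


def sei(pic50: float, polar_surface_area_a2: float) -> float:
    """Surface efficiency index: pIC50 per 100 square angstroms of PSA."""
    return pic50 / (polar_surface_area_a2 / 100.0)
```

Under these assumed definitions, two compounds with the same pIC50 but different sizes receive different BEI values, which is one way such indices can spread the activity values over a wider range than pIC50 alone.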


Subject(s)
Drugs, Investigational/chemistry , Ion Channels/chemistry , Protein Kinases/chemistry , Receptors, Opioid/chemistry , Small Molecule Libraries/chemistry , Support Vector Machine , Binding Sites , Databases, Chemical , High-Throughput Screening Assays , Humans , Hydrophobic and Hydrophilic Interactions , Inhibitory Concentration 50 , Ion Channels/agonists , Ion Channels/antagonists & inhibitors , Ligands , Logistic Models , Molecular Conformation , Predictive Value of Tests , Protein Binding , Receptors, Opioid/agonists , Research Design , Structure-Activity Relationship , User-Computer Interface
4.
J Chem Inf Model ; 53(10): 2525-37, 2013 Oct 28.
Article in English | MEDLINE | ID: mdl-24020509

ABSTRACT

Machine learning methods based on ligand-protein interaction data in bioactivity databases are one of the current strategies for efficiently finding novel lead compounds as the first step in the drug discovery process. Although previous machine learning studies have succeeded in predicting novel ligand-protein interactions with high performance, all of the previous studies to date have been heavily dependent on the simple use of raw bioactivity data of ligand potencies measured by IC50, EC50, K(i), and K(d) deposited in databases. ChEMBL provides us with a unique opportunity to investigate whether a machine-learning-based classifier created by reflecting ligand efficiency other than the IC50, EC50, K(i), and K(d) values can also offer high predictive performance. Here we report that classifiers created from training data based on ligand efficiency show higher performance than those from data based on IC50 or K(i) values. Utilizing GPCRSARfari and KinaseSARfari databases in ChEMBL, we created IC50- or K(i)-based training data and binding efficiency index (BEI)-based training data, then constructed classifiers using support vector machines (SVMs). The SVM classifiers from the BEI-based training data showed slightly higher area under curve (AUC), accuracy, sensitivity, and specificity in the cross-validation tests. Application of the classifiers to the validation data demonstrated that the AUCs and specificities of the BEI-based classifiers dramatically increased in comparison with the IC50- or K(i)-based classifiers. The improvement of the predictive power by the BEI-based classifiers can be attributed to (i) the more separated distributions of positives and negatives, (ii) the higher diversity of negatives in the BEI-based training data in a feature space of SVMs, and (iii) a more balanced number of positives and negatives in the BEI-based training data. These results strongly suggest that training data based on ligand efficiency as well as data based on classical IC50, EC50, K(d), and K(i) values are important when creating a classifier using a machine learning approach based on bioactivity data.


Subject(s)
Artificial Intelligence , Protein Kinases/chemistry , Receptors, G-Protein-Coupled/chemistry , Small Molecule Libraries/chemistry , Support Vector Machine , Area Under Curve , Data Mining , Databases, Chemical , Databases, Pharmaceutical , Drug Discovery , Humans , Inhibitory Concentration 50 , Ligands , Principal Component Analysis , Receptors, G-Protein-Coupled/agonists , Receptors, G-Protein-Coupled/antagonists & inhibitors , Sensitivity and Specificity
5.
Database (Oxford) ; 2012: bas034, 2012.
Article in English | MEDLINE | ID: mdl-23060433

ABSTRACT

Druggable Protein-protein Interaction Assessment System (Dr. PIAS) is a database of druggable protein-protein interactions (PPIs) predicted by our support vector machine (SVM)-based method. Since the first publication of this database, Dr. PIAS has been updated to version 2.0. PPI data have been increased considerably, from 71,500 to 83,324 entries. As new positive instances in our method, 4 PPIs and 10 tertiary structures have been added. This addition increases the prediction accuracy of our SVM classifier in comparison with the previous classifier, even though the number of added PPIs and structures is small. We have introduced the novel concept of 'similar positives' of druggable PPIs, which will help researchers discover small compounds that can inhibit predicted druggable PPIs. Dr. PIAS will aid the effective search for druggable PPIs from a mine of interactome data being rapidly accumulated. Dr. PIAS 2.0 is available at http://www.drpias.net.


Subject(s)
Algorithms , Computational Biology/methods , Databases, Protein , Drug Discovery/methods , Molecular Targeted Therapy/methods , Protein Binding/drug effects , Protein Interaction Mapping , Support Vector Machine
6.
Sci Rep ; 2: 323, 2012.
Article in English | MEDLINE | ID: mdl-22435086

ABSTRACT

Molecular docking is the most commonly used technique in the modern drug discovery process, in which computational approaches involving docking algorithms are used to dock small molecules into macromolecular target structures. In recent years, several evaluation studies have been reported by independent scientists comparing the performance of docking programs by using the default 'black box' protocols supplied by the software companies. Such studies have to be considered carefully, as docking programs can be tweaked towards optimum performance by selecting the parameters suitable for the target of interest. In this study we address the problem of selecting an appropriate docking and scoring function combination (88 docking algorithm-scoring functions) for substrate specificity predictions for feruloyl esterases, an industrially relevant enzyme family. We also propose the 'Key Interaction Score System' (KISS), a more biochemically meaningful measure for evaluation of docking programs based on pose prediction accuracy.

7.
BMC Bioinformatics ; 12: 50, 2011 Feb 09.
Article in English | MEDLINE | ID: mdl-21303559

ABSTRACT

BACKGROUND: The amount of data on protein-protein interactions (PPIs) available in public databases and in the literature has rapidly expanded in recent years. PPI data can provide useful information for researchers in pharmacology and medicine as well as those in interactome studies. There is an urgent need for a novel methodology or software allowing the efficient utilization of PPI data in pharmacology and medicine. RESULTS: To address this need, we have developed the 'Druggable Protein-protein Interaction Assessment System' (Dr. PIAS). Dr. PIAS has a meta-database that stores various types of information (tertiary structures, drugs/chemicals, and biological functions associated with PPIs) retrieved from public sources. By integrating this information, Dr. PIAS assesses whether a PPI is druggable as a target for small chemical ligands by using a supervised machine-learning method, support vector machine (SVM). Dr. PIAS holds not only known druggable PPIs but also all PPIs of human, mouse, rat, and human immunodeficiency virus (HIV) proteins identified to date. CONCLUSIONS: The design concept of Dr. PIAS is distinct from other published PPI databases in that it focuses on selecting the PPIs most likely to make good drug targets, rather than merely collecting PPI data.


Subject(s)
Databases, Protein , Protein Interaction Mapping/methods , Proteins/chemistry , Software , Animals , Human Immunodeficiency Virus Proteins/chemistry , Humans , Mice , Rats , Technology, Pharmaceutical/methods
8.
BMC Bioinformatics ; 10: 263, 2009 Aug 25.
Article in English | MEDLINE | ID: mdl-19703312

ABSTRACT

BACKGROUND: Protein-protein interactions (PPIs) are challenging but attractive targets of small molecule drugs for therapeutic interventions in human diseases. In this era of rapid accumulation of PPI data, there is a great need for a methodology that can efficiently select drug target PPIs by holistically assessing the druggability of PPIs. To address this need, we propose here a novel approach based on a supervised machine-learning method, support vector machine (SVM). RESULTS: To assess the druggability of the PPIs, 69 attributes were selected to cover a wide range of structural, drug and chemical, and functional information on the PPIs. These attributes were used as feature vectors in the SVM-based method. Thirty PPIs known to be druggable were carefully selected from previous studies; these were used as positive instances. Our approach was applied to 1,295 human PPIs with tertiary structures of their protein complexes already solved. The best SVM model constructed discriminated the already-known target PPIs from others with an accuracy of 81% (sensitivity, 82%; specificity, 79%) in cross-validation. Among the attributes, the two with the greatest discriminative power in the best SVM model were the number of interacting proteins and the number of pathways. CONCLUSION: Using the model, we predicted several promising candidates for druggable PPIs, such as SMAD4/SKI. As more PPI data are accumulated in the near future, our method will have increased ability to accelerate the discovery of druggable PPIs.
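
The accuracy, sensitivity, and specificity figures reported above follow the standard confusion-matrix definitions. The sketch below shows how the three quantities relate; the function name is hypothetical and the counts used in any call are illustrative, not the study's data.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute accuracy, sensitivity, and specificity from confusion counts.

    tp/fn: positives predicted correctly / missed
    tn/fp: negatives predicted correctly / wrongly flagged as positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative instance")
    return {
        "sensitivity": tp / (tp + fn),               # true-positive rate
        "specificity": tn / (tn + fp),               # true-negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # overall fraction correct
    }
```

Because accuracy pools positives and negatives, it sits between sensitivity and specificity and is pulled toward whichever class is larger, which is why all three values are usually reported together.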


Subject(s)
Artificial Intelligence , Computational Biology/methods , Pharmaceutical Preparations/chemistry , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Binding Sites , Databases, Protein , Pharmaceutical Preparations/metabolism
10.
BMC Pharmacol ; 7: 10, 2007 Aug 20.
Article in English | MEDLINE | ID: mdl-17705877

ABSTRACT

BACKGROUND: Protein-protein interactions (PPIs) are challenging but attractive targets for small chemical drugs. Whole sets of PPIs, called the 'interactome', have emerged for several organisms, including human, based on the recent development of high-throughput screening (HTS) technologies. Individual PPIs have been targeted by small drug-like chemicals (SDCs); however, interactome data have not been fully utilized for exploring drug targets due to the lack of a comprehensive methodology for utilizing these data. Here we propose an integrative in silico approach for discovering candidates for drug-targetable PPIs in interactome data. RESULTS: Our novel in silico screening system comprises three independent assessment procedures: i) detection of protein domains responsible for PPIs, ii) finding SDC-binding pockets on protein surfaces, and iii) evaluating similarities in the assignment of Gene Ontology (GO) terms between specific partner proteins. We discovered six candidates for drug-targetable PPIs by applying our in silico approach to original human PPI data composed of 770 binary interactions produced by our HTS yeast two-hybrid (HTS-Y2H) assays. Among them, we further examined two candidates, RXRA/NRIP1 and CDK2/CDKN1A, with respect to their biological roles, the PPI network around each candidate, and the tertiary structures of the interacting domains. CONCLUSION: An integrative in silico approach for discovering candidates for drug-targetable PPIs was applied to original human PPI data. The system excludes false positive interactions and selects reliable PPIs as drug targets. Its effectiveness was demonstrated by the discovery of the six promising candidate target PPIs. Inhibition or stabilization of the two interactions may have potential therapeutic effects against human diseases.


Subject(s)
Drug Delivery Systems/methods , Pharmaceutical Preparations/metabolism , Protein Interaction Mapping/methods , Drug Evaluation, Preclinical/methods , Humans , Pharmaceutical Preparations/chemistry , Protein Binding/physiology , Protein Structure, Secondary/physiology , Technology, Pharmaceutical/methods
11.
Bioinformatics ; 20(16): 2853-6, 2004 Nov 01.
Article in English | MEDLINE | ID: mdl-15130931

ABSTRACT

UNLABELLED: We constructed a website for inferring a network by applying the graphical Gaussian model to a large amount of data, including redundant information. The tools available on the website are based on a system named ASIAN (Automatic System for Inferring A Network), in combination with the two methods from our previous papers, which were designed to analyze gene expression profiles on a genomic scale. One of the remarkable features of the website is its ability to infer a network together with hierarchical clustering and the subsequent estimation of cluster boundaries. AVAILABILITY: http://eureka.ims.u-tokyo.ac.jp/asian
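
In a graphical Gaussian model, two variables are left unconnected when their partial correlation, given the remaining variables, is zero. The sketch below illustrates that idea for the smallest case of three variables using the standard first-order partial correlation formula. It is not the ASIAN procedure, which the abstract describes only at a high level, and the function name is hypothetical.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Partial correlation of x and y after removing the effect of z.

    Inputs are ordinary pairwise correlation coefficients. A result near
    zero suggests that the x-y association is explained by z, so no direct
    x-y edge would be drawn in the inferred network.
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom
```

For more than three variables, the same quantities are usually obtained from the inverse of the correlation matrix rather than from this pairwise formula.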


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation/physiology , Internet , Models, Biological , Proteome/metabolism , Signal Transduction/physiology , User-Computer Interface , Computer Simulation , Models, Statistical
12.
Genome Inform ; 15(1): 170-9, 2004.
Article in English | MEDLINE | ID: mdl-15712120

ABSTRACT

Various types of periodic patterns in nucleotide sequences are known to be very abundant in genomic DNA sequences and to play important biological roles in processes such as gene expression, genome structural stabilization, and recombination. We present a new method, named "STEPSTONE", to find a specific periodic pattern of repeat sequence, the inter-spread repeat, in which tandem repeats of conserved and non-conserved regions appear periodically. In our method, the periods of short repeat sequences found in a target sequence are first stored as hash data and are then selected by applying an auto-correlation test from time series analysis. Among the statistically selected sequences, the inter-spread repeats are obtained by standard alignment procedures in two steps. To test the performance of our method, we examined the inter-spread repeats in Mycobacterium tuberculosis and Zamia paucijuga genomic sequences. As a result, our method exactly detected the repeats in the two sequences, showing that it is useful for systematically identifying inter-spread repeats in DNA sequences.
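
The first step described above, storing the periods of short repeated sequences in a hash, can be illustrated as follows. This is a simplified sketch and not the STEPSTONE implementation: the word length `k`, the function name, and the use of distances between consecutive occurrences as "periods" are all assumptions, and the auto-correlation test and alignment steps are not shown.

```python
from collections import Counter, defaultdict


def kmer_periods(seq: str, k: int = 3) -> Counter:
    """Count distances between consecutive occurrences of each k-mer.

    Positions of every k-mer are collected in a hash (dict); the gap
    between neighbouring occurrences of the same k-mer is treated as a
    candidate period, and the candidate periods are tallied.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    periods = Counter()
    for hits in positions.values():
        for previous, current in zip(hits, hits[1:]):
            periods[current - previous] += 1
    return periods
```

In a sequence where a conserved word recurs at a fixed spacing with variable bases in between, that spacing dominates the tally, which gives the later statistical selection step a short list of candidate periods to examine.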


Subject(s)
DNA/genetics , Genome , Repetitive Sequences, Nucleic Acid , Algorithms , Base Sequence , DNA/chemistry , Genome, Plant , Models, Genetic , Pattern Recognition, Automated , Zamiaceae/genetics
13.
Genome Inform ; 15(1): 229-38, 2004.
Article in English | MEDLINE | ID: mdl-15712125

ABSTRACT

Three possible causes of the large genome size of the cyanobacterium Anabaena sp. PCC7120 are investigated: 1) sequential tandem duplications of gene segments, genes, or genomic segments; 2) horizontal gene transfers from other organisms; and 3) whole-genome duplication. We evaluated the frequency distribution of angles between paralog locations for possibility 1); the fraction of genes deviating in GC content, GC skew, AT skew, and codon adaptation index for possibility 2); and the gene-configuration comparison of paralogs for possibility 3). As a result, possibility 3), whole-genome duplication, was more plausible as a molecular cause of the large genome size in Anabaena sp. PCC7120 than the other causes. In addition, whole-genome duplication was supported by an analysis of the distribution pattern of protein genes with respect to functional categories.
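
The composition measures used for the horizontal-transfer test (GC content, GC skew, AT skew) have standard definitions, sketched below. The function name is hypothetical, and the abstract does not state how deviation from the genome-wide values was judged, so only the per-sequence statistics are shown.

```python
def composition_stats(seq: str) -> dict:
    """Return GC content, GC skew, and AT skew for a DNA sequence.

    GC content = (G + C) / (A + C + G + T)
    GC skew    = (G - C) / (G + C)
    AT skew    = (A - T) / (A + T)
    Characters other than A, C, G, T (for example N) are ignored.
    """
    s = seq.upper()
    a, c, g, t = (s.count(base) for base in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {
        "gc_content": (g + c) / total,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
    }
```

Genes whose values differ markedly from those typical of the host genome are the kind of outliers that a horizontal-transfer analysis would flag for closer inspection.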


Subject(s)
Anabaena/genetics , Genome, Bacterial , Models, Genetic , Anabaena/metabolism , Base Composition , Base Pairing , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Gene Duplication , Gene Transfer Techniques
14.
J Theor Biol ; 222(4): 447-60, 2003 Jun 21.
Article in English | MEDLINE | ID: mdl-12781743

ABSTRACT

The ribosomal RNAs (rRNAs) of animal mitochondria, especially those of arthropod mitochondria, have a higher content of G:U and U:G base pairs in their stem regions than the nuclear rRNAs. Thus, the theoretical formulation of base pair changes is extended to incorporate the faster base pair changes A:U<-->G:U<-->G:C and U:A<-->U:G<-->C:G into the previous formulation of the slower base pair changes between A:U, G:C, C:G and U:A. The relative base pair change probability containing the faster and slower base pair changes is theoretically derived to estimate the divergence time of rRNAs under the influence of selection for these base pairs. Using the cartilaginous fish-teleost fish divergence and the crustacean-insect divergence as calibration points, the present method successfully predicts the divergence times of the main branches of animals: Deuterostomia and Protostomia diverged 9.2 × 10^8 years ago, the divergence of Echinodermata, Hemichordata and Cephalochordata subsequently occurred during the period from 8 × 10^8 to 6 × 10^8 years ago, while Arthropoda, Annelida and Mollusca diverged almost concomitantly about 7 × 10^8 years ago. The divergence of Platyhelminthes and Cnidaria is dated back to 1.2 × 10^9 years ago. This result is consistent with the fossil records in the Stirling Range Formation of southwestern Australia, the Ediacara and Avalon faunas and the Cambrian Burgess Shale. Thus, the present method may be useful for estimating the divergence times of animals ranging from 10^8 to 10^9 years ago, resolving the difficult problems, e.g. deviation from rate constancy and large sampling variances, in the usual methods of treating apparent change rates between individual bases and/or base pairs.


Subject(s)
Models, Genetic , Phylogeny , RNA, Ribosomal/genetics , RNA/genetics , Animals , Base Composition , Evolution, Molecular , RNA, Mitochondrial , Selection, Genetic
15.
J Mol Evol ; 55(5): 584-94, 2002 Nov.
Article in English | MEDLINE | ID: mdl-12399932

ABSTRACT

The base-pair changes in the stem regions of ribosomal RNAs provide a useful measure for resolving the phylogeny of organisms. In the present study, how the biased base-pair content influences the estimation of evolutionary distances is theoretically investigated. By regarding the biased base-pair content as a result of the difference in selective strength between A:U and G:C base pairs, the evolutionary distance empirically obtained by enumerating base-pair changes is theoretically expressed in terms of selective strength, base-pair change rate, and divergence time. Its application to nuclear-coded large subunit ribosomal RNAs (LSU rRNAs) reveals the following. LSU rRNAs from most organisms have moderate base-pair contents, and the empirical evolutionary distances obtained by the comparison of these LSU rRNAs are approximately proportional to their divergence times. In the comparison of these moderate LSU rRNAs with the GC-rich LSU rRNAs such as those from Mycoplasma, Crenarchaeota, and Giardia, however, the empirically calculated distances are considerably smaller than the true evolutionary distances, while the comparison with AU-rich LSU rRNAs from Microsporidia overestimates their distances. With this result in mind, the relative base-pair change probabilities among three kingdoms are carefully estimated from the statistical distribution of base-pair change ratios enumerated for LSU rRNAs showing almost the same base-pair contents, leading to the result that prokaryotes and eukaryotes diverged first and that archaebacteria and eubacteria diverged on the prokaryote lineage slightly later, by about 0.3 billion years.


Subject(s)
Evolution, Molecular , RNA, Ribosomal/chemistry , RNA, Ribosomal/genetics , Animals , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Base Composition , Eukaryotic Cells , Models, Genetic , Phylogeny