Results 1 - 14 of 14
1.
Bioinformatics ; 38(21): 4962-4965, 2022 10 31.
Article in English | MEDLINE | ID: mdl-36124958

ABSTRACT

SUMMARY: HMI-PRED 2.0 is a publicly available web service for predicting host-microbe protein-protein interactions by interface mimicry; it is intended to be usable without extensive computational experience. A microbial protein structure is screened against a database covering the entire available structural space of complexes of known human proteins. AVAILABILITY AND IMPLEMENTATION: HMI-PRED 2.0 provides user-friendly graphical interfaces for predicting, visualizing and analyzing host-microbe interactions. HMI-PRED 2.0 is available at https://hmipred.org/.

Subjects
Proteins; Software; Humans; Proteins/chemistry; User-Computer Interface
2.
BMC Bioinformatics ; 23(Suppl 3): 158, 2022 May 02.
Article in English | MEDLINE | ID: mdl-35501680

ABSTRACT

BACKGROUND: Drug discovery is time-consuming and costly. Machine learning, especially deep learning, shows great potential in quantitative structure-activity relationship (QSAR) modeling to accelerate the drug discovery process and reduce its cost. A major challenge in developing robust and generalizable deep learning models for QSAR is the lack of large amounts of data with high-quality, balanced labels. To address this challenge, we developed a self-training method, Partially LAbeled Noisy Student (PLANS), and a novel self-supervised graph embedding, Graph-Isomorphism-Network Fingerprint (GINFP), which uses unlabeled data to learn chemical compound representations that carry substructure information. The representations can be used to predict chemical properties such as binding affinity and toxicity. PLANS-GINFP allows us to exploit millions of unlabeled chemical compounds, as well as labeled and partially labeled pharmacological data, to improve the generalizability of neural network models. RESULTS: We evaluated the performance of PLANS-GINFP for predicting Cytochrome P450 (CYP450) binding activity in a CYP450 dataset and chemical toxicity in the Tox21 dataset. Extensive benchmark studies demonstrated that PLANS-GINFP could improve performance by a large margin in both cases. Both PLANS-based self-training and GINFP-based self-supervised learning contribute to the performance improvement. CONCLUSION: To better exploit chemical structures as input for machine learning algorithms, we proposed a self-supervised graph neural network-based embedding method that can encode substructure information. Furthermore, we developed a model-agnostic self-training method, PLANS, that can be applied to any deep learning architecture to improve prediction accuracy. PLANS provides a way to better utilize partially labeled and unlabeled data. Comprehensive benchmark studies demonstrated the potential of both methods for predicting drug metabolism and toxicity profiles from sparse, noisy, and imbalanced data. PLANS-GINFP could serve as a general solution for improving predictive QSAR modeling.

Subjects
Algorithms; Neural Networks, Computer; Humans; Machine Learning; Quantitative Structure-Activity Relationship; Students
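
The self-training idea named in the abstract above can be illustrated with a minimal sketch. This is plain pseudo-labeling with a toy one-dimensional nearest-centroid classifier, not the PLANS method itself: there is no noise injection, no neural network, and no handling of partial labels. The function names, the confidence formula, and the `threshold` parameter are all assumptions made for illustration.

```python
def fit_centroids(xs, ys):
    """Toy 'model': the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return (label, confidence) from the two nearest centroids (hypothetical confidence score)."""
    ranked = sorted((abs(x - c), y) for y, c in model.items())
    (d_best, label), (d_next, _) = ranked[0], ranked[1]
    total = d_best + d_next
    return label, (d_next / total if total > 0 else 0.5)


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.75, rounds=3):
    """Teacher labels the pool; confident pseudo-labels join the training set; refit; repeat."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    model = fit_centroids(xs, ys)
    for _ in range(rounds):
        remaining = []
        for x in pool:
            label, confidence = predict(model, x)
            if confidence >= threshold:
                xs.append(x)
                ys.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # nothing new was pseudo-labeled, so stop early
        pool = remaining
        model = fit_centroids(xs, ys)  # the "student" is refit on labeled + pseudo-labeled data
    return model, pool


model, still_unlabeled = self_train([0.0, 10.0], [0, 1], [1.0, 2.0, 8.0, 9.0, 5.2])
```

In this toy run the ambiguous point near the class boundary stays unlabeled, which is the behavior a confidence threshold is meant to produce.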
3.
Bioinformatics ; 36(9): 2787-2795, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32003771

ABSTRACT

MOTIVATION: The LINCS L1000 dataset contains a large body of cellular expression data induced by large sets of perturbagens. Although it provides an invaluable resource for drug discovery and for understanding disease mechanisms, the existing peak deconvolution algorithms cannot recover accurate gene expression levels in many cases, which introduces severe noise into the dataset and limits its applications in biomedical studies. RESULTS: Here, we present a novel Bayesian-based peak deconvolution algorithm that gives unbiased likelihood estimates for peak locations and characterizes the peaks with probability-based z-scores. Based on this algorithm, we build a pipeline that processes raw data from the L1000 assay into signatures that represent the features of perturbagens. The performance of the proposed pipeline is evaluated using the similarity between the signatures of bio-replicates and of drugs with shared targets, and the results show that signatures derived from our pipeline give a substantially more reliable and informative representation of perturbagens than existing methods. Thus, the new pipeline may significantly boost the performance of L1000 data in downstream applications such as drug repurposing, disease modeling and gene function prediction. AVAILABILITY AND IMPLEMENTATION: The code and the precomputed data for LINCS L1000 Phase II (GSE 70138) are available at https://github.com/njpipeorgan/L1000-bayesian. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects
Algorithms; Drug Discovery; Bayes Theorem; Drug Repositioning
4.
J Chem Inf Model ; 61(4): 1570-1582, 2021 04 26.
Article in English | MEDLINE | ID: mdl-33757283

ABSTRACT

Small molecules play a critical role in modulating biological systems. Knowledge of chemical-protein interactions helps address fundamental and practical questions in biology and medicine. However, with the rapid emergence of newly sequenced genes, the endogenous or surrogate ligands of a vast number of proteins remain unknown. Homology modeling and machine learning are two major methods for assigning new ligands to a protein, but they mostly fail when sequence homology between an unannotated protein and those with known functions or structures is low. In this study, we develop a new deep learning framework to predict chemical binding to evolutionarily divergent unannotated proteins whose ligands cannot be reliably predicted by existing methods. By incorporating evolutionary information into self-supervised learning on unlabeled protein sequences, we develop a novel method for protein sequence representation, distilled sequence alignment embedding (DISAE). DISAE can utilize all protein sequences and their multiple sequence alignments (MSAs) to capture functional relationships between proteins without knowledge of their structure and function. Following DISAE pretraining, we devise a module-based fine-tuning strategy for the supervised learning of chemical-protein interactions. In the benchmark studies, DISAE significantly improves the generalizability of machine learning models and outperforms the state-of-the-art methods by a large margin. Comprehensive ablation studies suggest that the use of MSA, sequence distillation, and triplet pretraining contributes critically to the success of DISAE. The interpretability analysis of DISAE suggests that it learns biologically meaningful information. We further use DISAE to assign ligands to human orphan G-protein coupled receptors (GPCRs) and to cluster the human GPCRome by integrating their phylogenetic and ligand relationships. The promising results of DISAE open an avenue for exploring the chemical landscape of entire sequenced genomes.

Subjects
Computational Biology; Machine Learning; Amino Acid Sequence; Humans; Ligands; Phylogeny; Sequence Alignment
5.
PLoS Comput Biol ; 15(6): e1006619, 2019 06.
Article in English | MEDLINE | ID: mdl-31206508

ABSTRACT

Many complex diseases such as cancer are associated with multiple pathological manifestations. Moreover, the therapeutics used to treat them often cause serious side effects. Thus, there is a need to develop multi-indication therapeutics that can simultaneously target multiple clinical indications of interest and mitigate the side effects. However, the conventional one-drug-one-gene drug discovery paradigm and the emerging polypharmacology approach rarely tackle the challenge of multi-indication drug design. For the first time, we propose a one-drug-multi-target-multi-indication strategy. We develop a novel structural systems pharmacology platform, 3D-REMAP, that uses ligand binding site comparison and protein-ligand docking to augment sparse chemical genomics data for a machine learning model of genome-scale chemical-protein interaction prediction. Experimentally validated predictions systematically show that 3D-REMAP outperforms state-of-the-art ligand-based, receptor-based, and machine learning methods used alone. As a proof of concept, we use drug repurposing enabled by 3D-REMAP to design a dual-indication anti-cancer therapy. The repurposed drug can show anti-cancer activity against cancers that lack effective treatment and also reduce the risk of heart failure that is associated with all types of existing anti-cancer therapies. We predict that levosimendan, a PDE inhibitor for heart failure, inhibits serine/threonine-protein kinase RIOK1 and other kinases. Subsequent experiments and systems biology analyses confirm this prediction and suggest that levosimendan is active against multiple cancers, notably lymphoma, through the direct inhibition of RIOK1 and the RNA processing pathway. We further develop machine learning models to predict the responses of cancer cell lines and of patients to levosimendan. Our findings suggest that levosimendan can be a promising novel lead compound for the development of safe, effective, and precision multi-indication anti-cancer therapy. This study demonstrates the potential of structural systems pharmacology in designing polypharmacology for precision medicine. It may facilitate transforming the conventional one-drug-one-gene-one-disease drug discovery process and the single-indication polypharmacology approach into a new one-drug-multi-target-multi-indication paradigm for complex diseases.

Subjects
Antineoplastic Agents/pharmacology; Drug Discovery/methods; Pharmacogenetics/methods; Phosphodiesterase Inhibitors; Precision Medicine/methods; Computational Biology; Humans
6.
BMC Bioinformatics ; 20(Suppl 24): 674, 2019 Dec 20.
Article in English | MEDLINE | ID: mdl-31861982

ABSTRACT

BACKGROUND: Computational prediction of the phenotypic response to chemical perturbation of a biological system plays an important role in drug discovery and many other applications. Chemical fingerprints are widely used features for building machine learning models. However, fingerprints derived from chemical structures ignore the biological context and thus suffer from several problems, such as activity cliffs and the curse of dimensionality. Fundamentally, the chemical modulation of biological activities is a multi-scale process. It is the genome-wide chemical-target interactions that modulate chemical phenotypic responses. Thus, the genome-scale chemical-target interaction profile will correlate more directly with in vitro and in vivo activities than the chemical structure does. Nevertheless, the scope for direct application of the chemical-target interaction profile is limited by the severe incompleteness, bias, and noise of bioassay data. RESULTS: To address these problems, we developed a novel chemical representation method: Latent Target Interaction Profile (LTIP). LTIP embeds chemicals into a low-dimensional continuous latent space that represents genome-scale chemical-target interactions. LTIP can then be used as a feature to build machine learning models. Using the drug sensitivity of cancer cell lines as a benchmark, we have shown that LTIP robustly outperforms chemical fingerprints regardless of the machine learning algorithm. Moreover, LTIP is complementary to chemical fingerprints: it can be combined with other fingerprints to further improve the performance of bioactivity prediction. CONCLUSIONS: Our results demonstrate the potential of LTIP in particular, and of multi-scale modeling in general, for predictive modeling of the chemical modulation of biological activities.

Subjects
Drug Discovery; Algorithms; Machine Learning
7.
PLoS Comput Biol ; 12(10): e1005135, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27716836

ABSTRACT

Target-based screening is one of the major approaches in drug discovery. Besides the intended target, unexpected drug off-target interactions often occur, and many of them have not been recognized and characterized. Off-target interactions can be responsible for either therapeutic or side effects. Thus, identifying the genome-wide off-targets of lead compounds or existing drugs will be critical for designing effective and safe drugs and will provide new opportunities for drug repurposing. Although many computational methods have been developed to predict drug-target interactions, they are either less accurate than the one proposed here or too computationally intensive, which limits their capability for large-scale off-target identification. In addition, the performance of most machine learning-based algorithms has mainly been evaluated on predicting off-target interactions within the same gene family for hundreds of chemicals. It is not clear how these algorithms perform at detecting off-targets across gene families on a proteome scale. Here, we present a fast and accurate off-target prediction method, REMAP, which is based on a dual regularized one-class collaborative filtering algorithm, to explore continuous chemical space, protein space, and their interactome on a large scale. When tested on a reliable, extensive, cross-gene family benchmark, REMAP outperforms the state-of-the-art methods. Furthermore, REMAP is highly scalable: it can screen a dataset of 200 thousand chemicals against 20 thousand proteins within 2 hours. Using the reconstructed genome-wide target profile as the fingerprint of a chemical compound, we predicted that seven FDA-approved drugs can be repurposed as novel anti-cancer therapies. The anti-cancer activity of six of them is supported by experimental evidence. Thus, REMAP is a valuable addition to the existing in silico toolbox for drug target identification, drug repurposing, phenotypic screening, and side effect prediction. The software and benchmark are available at https://github.com/hansaimlim/REMAP.

Subjects
Antineoplastic Agents/chemistry; Drug Evaluation, Preclinical/methods; Drug Repositioning/methods; High-Throughput Screening Assays/methods; Protein Interaction Mapping/methods; Proteins/chemistry; Molecular Targeted Therapy/methods; Protein Binding
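
The abstract above describes one-class collaborative filtering, where only positive chemical-protein interactions are observed and missing entries are treated as weak negatives. A minimal sketch of that general idea follows, assuming a weighted low-rank factorization fitted by gradient descent. It is not the REMAP implementation: the dual regularization from chemical and protein similarity is omitted, and the function name, weights, and learning rate are illustrative assumptions.

```python
import random


def weighted_mf(R, rank=2, w_unobserved=0.1, lam=0.01, lr=0.05, steps=500, seed=0):
    """Factor a binary interaction matrix R (rows: chemicals, columns: proteins).

    Observed interactions get weight 1; unobserved pairs are treated as zeros
    with the small weight w_unobserved, the usual one-class assumption.
    """
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(m)]

    def score(i, j):
        return sum(U[i][k] * V[j][k] for k in range(rank))

    def weight(i, j):
        return 1.0 if R[i][j] else w_unobserved

    def loss():
        fit = sum(weight(i, j) * (R[i][j] - score(i, j)) ** 2
                  for i in range(n) for j in range(m))
        reg = lam * sum(x * x for row in U + V for x in row)
        return fit + reg

    history = [loss()]
    for _ in range(steps):
        # Gradients of the weighted squared error plus L2 penalty.
        grad_U = [[2 * lam * x for x in row] for row in U]
        grad_V = [[2 * lam * x for x in row] for row in V]
        for i in range(n):
            for j in range(m):
                err = weight(i, j) * (R[i][j] - score(i, j))
                for k in range(rank):
                    grad_U[i][k] -= 2 * err * V[j][k]
                    grad_V[j][k] -= 2 * err * U[i][k]
        for i in range(n):
            for k in range(rank):
                U[i][k] -= lr * grad_U[i][k]
        for j in range(m):
            for k in range(rank):
                V[j][k] -= lr * grad_V[j][k]
        history.append(loss())

    scores = [[score(i, j) for j in range(m)] for i in range(n)]
    return scores, history


# Toy data: two groups of chemicals and proteins; pair (1, 1) is left unobserved.
R = [[1, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
scores, history = weighted_mf(R)
```

Because the unobserved pair (1, 1) shares its row and column with known interactions, the low-rank fit should score it above an unrelated pair such as (1, 2), which is how such a model proposes candidate off-targets.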
8.
Curr Opin Struct Biol ; 73: 102328, 2022 04.
Article in English | MEDLINE | ID: mdl-35152186

ABSTRACT

Host-microbiome interactions play significant roles in human health and disease. Artificial intelligence approaches have been developed to better understand and predict the molecular interplay between the host and its microbiome. Here, we review recent advancements in computational methods to predict microbial effects on human cells with a special focus on protein-protein interactions. We categorize recent methods from traditional ones to more recent deep learning methods, followed by several challenges and potential solutions in structure-based approaches. This review serves as a brief guide to the current status and future directions in the field.


Subjects
Artificial Intelligence; Microbiota; Humans
9.
Article in English | MEDLINE | ID: mdl-31995498

ABSTRACT

Identifying target genes of transcription factors (TFs) is crucial for understanding transcriptional regulation. However, our understanding of genome-wide TF targeting profiles is limited by the cost of large-scale experiments and the intrinsic complexity of gene regulation. Thus, computational methods are useful for predicting unobserved TF-gene associations. Here, we develop a new Weighted Imputed Neighborhood-regularized Tri-Factorization one-class collaborative filtering algorithm, WINTF. It predicts unobserved target genes for TFs using known but noisy, incomplete, and biased TF-gene associations and protein-protein interaction networks. Our benchmark study shows that, for TF-gene association prediction, WINTF significantly outperforms its counterpart matrix factorization-based algorithms and tri-factorization methods that do not include weighting, imputation, and neighborhood regularization. When evaluated on independent datasets, accuracy is 37.8 percent on the top 495 predicted associations, an enrichment factor of 4.19 compared with random guessing. Furthermore, many predicted novel associations are supported by literature evidence. Although we only use canonical TF-gene interaction data, WINTF can be applied directly to tissue-specific data when available. Thus, WINTF provides a potentially useful framework for integrating multiple omics data to further improve TF-gene prediction and for applications to other sparse and noisy biological data. The benchmark dataset and source code are freely available at https://github.com/XieResearchGroup/WINTF.

Subjects
Algorithms; Computational Biology/methods; Transcription Factors; Animals; Gene Expression Regulation/genetics; Humans; Mice; Transcription Factors/classification; Transcription Factors/genetics; Transcription Factors/metabolism; Transcriptome/genetics
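
The figures reported in the abstract above (37.8 percent accuracy on the top 495 predictions, enrichment factor 4.19) can be unpacked with a short calculation. The sketch assumes the common definition of the enrichment factor as the hit rate among top-ranked predictions divided by the hit rate expected from random guessing; the abstract does not state its definition, so the derived baseline is an inference, not a reported value.

```python
def implied_baseline(precision, enrichment_factor):
    """Random-guess hit rate implied by precision / enrichment (assumed definition)."""
    return precision / enrichment_factor


precision = 0.378   # reported accuracy on the top-ranked predictions
top_k = 495         # number of top-ranked predictions evaluated
enrichment = 4.19   # reported enrichment over random guessing

baseline = implied_baseline(precision, enrichment)
confirmed_hits = precision * top_k   # predictions confirmed by the independent data
chance_hits = baseline * top_k       # confirmations expected from random guessing

print(f"implied random hit rate: {baseline:.4f}")
print(f"confirmed among top {top_k}: {confirmed_hits:.1f}")
print(f"expected by chance: {chance_hits:.1f}")
```

Under that assumption, roughly 187 of the 495 top predictions were confirmed, against about 45 expected by chance.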
10.
Methods Mol Biol ; 1939: 199-214, 2019.
Article in English | MEDLINE | ID: mdl-30848463

ABSTRACT

Systems pharmacology aims to understand drug actions at multiple scales, from the atomic details of drug-target interactions to the emergent properties of biological networks, and to rationally design drugs that target an interacting network instead of a single gene. Multifaceted data-driven studies, including machine learning-based predictions, play a key role in systems pharmacology. In such work, the integration of multiple omics data is the key initial step, followed by optimization and prediction. Here, we describe the overall procedures for drug-target association prediction using REMAP, a large-scale off-target prediction tool. The method introduced here can be applied to other relation inference problems in systems pharmacology.

Subjects
Drug Discovery/methods; Software; Systems Biology/methods; Databases, Factual; Drug Repositioning/methods; Humans; Machine Learning
11.
ACM BCB ; 2018: 1-10, 2018.
Article in English | MEDLINE | ID: mdl-31061989

ABSTRACT

Identifying the target genes of transcription factors (TFs) is key to understanding transcriptional regulation. However, our understanding of genome-wide TF targeting profiles is limited by the cost of large-scale experiments and intrinsic complexity. Thus, computational prediction methods are useful for predicting the unobserved associations. Here, we developed a new one-class collaborative filtering algorithm, tREMAP, that is based on regularized, weighted nonnegative matrix tri-factorization. The algorithm predicts unobserved target genes for TFs using known gene-TF associations and a protein-protein interaction network. Our benchmark study shows that tREMAP significantly outperforms its counterpart REMAP, a bi-factorization-based algorithm, for transcription factor target gene prediction in all four performance metrics: AUC, MAP, MPR, and HLU. When evaluated on independent data sets, the prediction accuracy is 37.8% on the top 495 predicted associations, an enrichment factor of 4.19 compared with random guessing. Furthermore, many of the novel associations predicted by tREMAP are supported by evidence from the literature. Although we only use canonical TF-target gene interaction data in this study, tREMAP can be applied directly to tissue-specific data sets. tREMAP provides a framework for integrating multiple omics data to further improve TF target gene prediction. Thus, tREMAP is a potentially useful tool for studying gene regulatory networks. The benchmark data set and the source code of tREMAP are freely available at https://github.com/hansaimlim/REMAP/tree/master/TriFacREMAP.

12.
AMIA Jt Summits Transl Sci Proc ; 2017: 132-141, 2018.
Article in English | MEDLINE | ID: mdl-29888057

ABSTRACT

Side effects are the second leading cause of drug attrition and the fourth leading cause of death in the US. Thus, accurately predicting side effects and understanding their mechanisms of action will significantly impact drug discovery and clinical practice. Here, we show that REMAP, a neighborhood-regularized weighted and imputed one-class collaborative filtering algorithm, is effective in predicting drug-side effect associations from a drug-side effect association network and significantly outperforms the state-of-the-art multi-target learning algorithm for predicting rare side effects. We also apply FASCINATE, an extension of REMAP for multi-layered networks, to infer associations among side effects and drug targets from drug-target-side effect networks. Then, using random permutation analysis and gene overrepresentation tests, we infer statistically significant side effect-pathway associations. The predicted drug-side effect associations and side effect-causing pathways are consistent with clinical evidence. We expect more novel drug-side effect associations and side effect-causing pathways to be identified when REMAP and FASCINATE are applied to large-scale chemical-gene-side effect networks.
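
The abstract above mentions random permutation analysis as a way to judge statistical significance. A minimal, generic sketch of a one-sided permutation test on a difference of means is given below. The test statistic, the group data, and the function name are assumptions for illustration; the abstract does not specify which statistic the authors permuted.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """One-sided permutation p-value for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any real association between value and group
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical scores: genes in a candidate pathway versus background genes.
p_separated = permutation_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
p_identical = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
```

Clearly separated groups yield a small p-value, while identical groups do not, which is the contrast a permutation analysis relies on.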

13.
IEEE/ACM Trans Comput Biol Bioinform ; 15(6): 1960-1967, 2018.
Article in English | MEDLINE | ID: mdl-29993812

ABSTRACT

Existing drug discovery processes follow a reductionist model of "one-drug-one-gene-one-disease," which is inadequate for tackling complex diseases that involve multiple malfunctioning genes. The availability of big omics data offers opportunities to transform the drug discovery process into a new paradigm of systems pharmacology that focuses on designing drugs to target molecular interaction networks instead of a single gene. Here, we develop a reliable multi-rank, multi-layered recommender system, ANTENNA, to mine large-scale chemical genomics and disease association data for the prediction of novel drug-gene-disease associations. ANTENNA integrates a novel tri-factorization-based, dual-regularized, weighted and imputed One Class Collaborative Filtering (OCCF) algorithm, tREMAP, with a statistical framework that is based on Random Walk with Restart and assesses the reliability of specific predictions. In the benchmark, tREMAP clearly outperforms the single-rank OCCF. We apply ANTENNA to a real-world problem: repurposing old drugs for new clinical indications that lack effective treatments. We discover that the FDA-approved drug diazoxide can inhibit multiple kinase genes responsible for many diseases, including cancer, and can kill triple negative breast cancer (TNBC) cells efficiently [Formula: see text]. TNBC is a deadly disease without effective targeted therapies. Our finding demonstrates the power of big data analytics in drug discovery and in developing a targeted therapy for TNBC.

Subjects
Antineoplastic Agents/pharmacology; Computational Biology/methods; Data Mining/methods; Diazoxide/pharmacology; Drug Repositioning/methods; Machine Learning; Algorithms; Cell Line, Tumor; Cell Survival/drug effects; Humans; Reproducibility of Results; Software
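
Random Walk with Restart, named in the abstract above, can be sketched in a few lines. The version below is the textbook iteration p ← (1 − c)·W·p + c·e on a column-normalized adjacency matrix; how ANTENNA builds its network or turns the resulting scores into a reliability estimate is not described in the abstract, so the toy graph, the restart probability, and the function name are assumptions.

```python
def random_walk_with_restart(adjacency, seed_node, restart=0.3, iterations=200):
    """Steady-state visiting probabilities of a walker that restarts at seed_node."""
    n = len(adjacency)
    # Column-normalize so each column is a probability distribution over next nodes.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
          for j in range(n)] for i in range(n)]
    e = [1.0 if i == seed_node else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(iterations):
        p = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * e[i]
             for i in range(n)]
    return p


# Toy path graph 0 - 1 - 2 - 3, with the walk seeded at node 0.
path_graph = [[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]]
proximity = random_walk_with_restart(path_graph, seed_node=0)
```

On this path graph the scores fall off with distance from the seed, so they can be read as a network proximity measure relative to the seed node.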
14.
Sci Rep ; 6: 38860, 2016 12 13.
Article in English | MEDLINE | ID: mdl-27958331

ABSTRACT

The conventional one-drug-one-gene approach has had limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs that perturb disease-causing networks instead of designing selective ligands that target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method on a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that it outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design.

Subjects
Drug Evaluation, Preclinical/methods; Polypharmacology; Algorithms; Genome; Reproducibility of Results