Results 1 - 16 of 16
1.
Proc Natl Acad Sci U S A ; 118(6)2021 02 09.
Article in English | MEDLINE | ID: mdl-33526657

ABSTRACT

RNA polymerase II (Pol II) generally pauses at certain positions along gene bodies, thereby interrupting the transcription elongation process, which is often coupled with various important biological functions, such as precursor mRNA splicing and gene expression regulation. Characterizing the transcriptional elongation dynamics can thus help us understand many essential biological processes in eukaryotic cells. However, experimentally measuring Pol II elongation rates is generally time and resource consuming. We developed PEPMAN (polymerase II elongation pausing modeling through attention-based deep neural network), a deep learning-based model that accurately predicts Pol II pausing sites based on the native elongating transcript sequencing (NET-seq) data. Through fully taking advantage of the attention mechanism, PEPMAN is able to decipher important sequence features underlying Pol II pausing. More importantly, we demonstrated that the analyses of the PEPMAN-predicted results around various types of alternative splicing sites can provide useful clues into understanding the cotranscriptional splicing events. In addition, associating the PEPMAN prediction results with different epigenetic features can help reveal important factors related to the transcription elongation process. All these results demonstrated that PEPMAN can provide a useful and effective tool for modeling transcription elongation and understanding the related biological factors from available high-throughput sequencing data.


Subjects
Human Genome, Machine Learning, Biological Models, Genetic Transcription Elongation, Base Sequence, Binding Sites, DNA Methylation/genetics, Genetic Epigenesis, HEK293 Cells, HeLa Cells, Histones/metabolism, Humans, Nucleotide Motifs/genetics, Post-Translational Protein Processing, RNA Polymerase II/metabolism, RNA Splice Sites/genetics, RNA Splicing/genetics
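
The abstract above credits the attention mechanism with exposing which sequence positions matter for a prediction. As a minimal illustration of that general idea (not PEPMAN's architecture, which the abstract does not describe), the sketch below turns per-position scores into softmax weights and uses them to pool per-position feature vectors. The function names `softmax` and `attention_pool` and all numbers are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(features, scores):
    """Weighted average of per-position feature vectors.

    features: one vector per sequence position (all the same length)
    scores:   one raw attention score per position (assumed given here;
              a real model would learn them)
    """
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)]
    return pooled, weights


# Hypothetical example: three positions, two features each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.1, 2.0, 0.3]
pooled, weights = attention_pool(features, scores)
```

Inspecting `weights` is what makes this kind of layer interpretable: the position with the largest weight contributes most to the pooled representation.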
2.
Bioinformatics ; 36(9): 2872-2880, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31950974

ABSTRACT

MOTIVATION: Quantitative structure-activity relationship (QSAR) and drug-target interaction (DTI) prediction are both commonly used in drug discovery. Collaboration among pharmaceutical institutions can lead to better performance in both QSAR and DTI prediction. However, drug-related data privacy and intellectual property issues have become a noticeable hindrance to inter-institutional collaboration in drug discovery. RESULTS: We have developed two novel algorithms under secure multiparty computation (MPC), QSARMPC and DTIMPC, which enable pharmaceutical institutions to achieve high-quality collaboration to advance drug discovery without divulging private drug-related information. QSARMPC, a neural network model under MPC, displays good scalability and performance and is feasible for privacy-preserving collaboration on large-scale QSAR prediction. DTIMPC integrates drug-related heterogeneous network data and accurately predicts novel DTIs, while keeping the drug information confidential. Under several experimental settings that reflect real drug discovery scenarios, we have demonstrated that DTIMPC achieves a significant performance improvement over the baseline methods, generates novel DTI predictions with supporting evidence from the literature and scales to handle growing DTI data. All these results indicate that QSARMPC and DTIMPC can provide practically useful tools for advancing privacy-preserving drug discovery. AVAILABILITY AND IMPLEMENTATION: The source code of QSARMPC and DTIMPC is available on GitHub: https://github.com/rongma6/QSARMPC_DTIMPC.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Drug Discovery, Privacy, Algorithms, Drug Development
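
Secure multiparty computation, as mentioned in the abstract above, lets parties compute on joint data without revealing their inputs. One standard building block is additive secret sharing, sketched below under stated assumptions: the modulus `PRIME`, the function names, and the values are illustrative and are not taken from QSARMPC or DTIMPC.

```python
import secrets

PRIME = 2**61 - 1  # public modulus; an illustrative choice


def share(value, n_parties):
    """Split an integer into n additive shares modulo PRIME.

    Any n-1 shares are uniformly random and reveal nothing about value.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no communication needed."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


# Hypothetical example: two private values held jointly by three parties.
a = share(42, 3)
b = share(100, 3)
total = reconstruct(add_shared(a, b))  # the sum of the two secrets
```

Addition comes for free in this scheme; multiplication of shared values needs extra protocol steps that this sketch does not cover.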
3.
Bioinformatics ; 35(1): 104-111, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30561548

ABSTRACT

Motivation: Accurately predicting drug-target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks. Results: Inspired by recent advances in information passing and aggregation techniques that generalize convolutional neural networks to mine large-scale graph data and greatly improve the performance of many network-related prediction tasks, we develop a new nonlinear end-to-end learning model, called NeoDTI, that integrates diverse information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. The substantial improvement in prediction performance over other state-of-the-art DTI prediction methods, as well as several novel predicted DTIs with supporting evidence from previous studies, demonstrates the superior predictive power of NeoDTI. In addition, NeoDTI is robust against a wide range of choices of hyperparameters and is ready to integrate more drug and target related information (e.g. compound-protein binding affinity data). All these results suggest that NeoDTI can offer a powerful and robust tool for drug development and drug repositioning. Availability and implementation: The source code and data used in NeoDTI are available at: https://github.com/FangpingWan/NeoDTI. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computer Simulation, Drug Development/methods, Software, Drug Discovery, Drug Repositioning, Protein Binding
4.
Bioinformatics ; 35(23): 4946-4954, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31120490

ABSTRACT

MOTIVATION: Prediction of peptide binding to the major histocompatibility complex (MHC) plays a vital role in the development of therapeutic vaccines for the treatment of cancer. Algorithms with improved correlations between predicted and actual binding affinities are needed to increase precision and reduce the number of false positive predictions. RESULTS: We present ACME (Attention-based Convolutional neural networks for MHC Epitope binding prediction), a new pan-specific algorithm to accurately predict the binding affinities between peptides and MHC class I molecules, even for those new alleles that are not seen in the training data. Extensive tests have demonstrated that ACME can significantly outperform other state-of-the-art prediction methods with an increase of the Pearson correlation coefficient between predicted and measured binding affinities by up to 23 percentage points. In addition, its ability to identify strong-binding peptides has been experimentally validated. Moreover, by integrating the convolutional neural network with attention mechanism, ACME is able to extract interpretable patterns that can provide useful and detailed insights into the binding preferences between peptides and their MHC partners. All these results have demonstrated that ACME can provide a powerful and practically useful tool for the studies of peptide-MHC class I interactions. AVAILABILITY AND IMPLEMENTATION: ACME is available as an open source software at https://github.com/HYsxe/ACME. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computer Neural Networks, Algorithms, Attention, Binding Sites, Computational Biology, Histocompatibility Antigens Class I, Peptides, Protein Binding
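
The abstract above reports performance as the Pearson correlation coefficient between predicted and measured binding affinities. For readers unfamiliar with the metric, the sketch below computes it from its definition; the function name `pearson` and the affinity values are hypothetical and are not data from the ACME study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured and predicted affinities (arbitrary units).
measured = [0.10, 0.35, 0.50, 0.80, 0.95]
predicted = [0.15, 0.30, 0.55, 0.70, 0.90]
r = pearson(measured, predicted)
```

A difference stated in percentage points is an absolute difference between two such coefficients expressed on a 0 to 100 scale, not a relative change.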
5.
Article in English | MEDLINE | ID: mdl-38913513

ABSTRACT

The recent boom in single-cell sequencing technologies provides valuable insights into the transcriptomes of individual cells. Through single-cell data analyses, a number of biological discoveries, such as novel cell types, developmental cell lineage trajectories, and gene regulatory networks, have been uncovered. However, the massive and increasingly accumulated single-cell datasets have also posed a serious computational and analytical challenge for researchers. To address this issue, one typically applies dimensionality reduction approaches to reduce the large-scale datasets. However, these approaches are generally computationally infeasible for tall matrices. In addition, downstream data analysis tasks such as clustering still have high time complexity even on the dimension-reduced datasets. We present single-cell Coreset (scCoreset), a data summarization framework that extracts a small weighted subset of cells from a huge sparse single-cell RNA-seq dataset to facilitate downstream data analysis tasks. Single-cell data analyses run on the extracted subset yield similar results to those derived from the original uncompressed data. Tests on various single-cell datasets show that scCoreset outperforms the existing data summarization approaches for common downstream tasks such as visualization and clustering. We believe that scCoreset can serve as a useful plug-in tool to improve the efficiency of current single-cell RNA-seq data analyses.
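
To make the idea of a "small weighted subset" concrete, the sketch below shows one generic importance-sampling construction: cells far from the data mean are sampled with higher probability, and each sampled cell gets weight 1/(m·p) so that weighted sums over the subset are unbiased estimates of sums over the full data. This is an assumption-laden illustration of coreset-style sampling in general, not the scCoreset algorithm; the function name `weighted_subset` and the toy data are hypothetical.

```python
import random


def weighted_subset(points, m, seed=0):
    """Sample m points with importance weights.

    Sampling probability mixes a uniform term with a term proportional to
    the squared distance from the data mean. Returns (index, weight) pairs.
    """
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    sq_dist = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dist)
    if total == 0:
        probs = [1.0 / n] * n  # all points identical: fall back to uniform
    else:
        probs = [0.5 / n + 0.5 * d / total for d in sq_dist]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    indices = rng.choices(range(n), weights=probs, k=m)
    return [(i, 1.0 / (m * probs[i])) for i in indices]


# Hypothetical toy "expression matrix": four cells, two genes.
cells = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
subset = weighted_subset(cells, m=3)
```

Downstream steps such as clustering would then run on the sampled cells while taking the weights into account.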

6.
Nat Biomed Eng ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862735

ABSTRACT

Molecular de-extinction aims at resurrecting molecules to solve antibiotic resistance and other present-day biological and biomedical problems. Here we show that deep learning can be used to mine the proteomes of all available extinct organisms for the discovery of antibiotic peptides. We trained ensembles of deep-learning models consisting of a peptide-sequence encoder coupled with neural networks for the prediction of antimicrobial activity and used them to mine 10,311,899 peptides. The models predicted 37,176 sequences with broad-spectrum antimicrobial activity, 11,035 of which were not found in extant organisms. We synthesized 69 peptides and experimentally confirmed their activity against bacterial pathogens. Most peptides killed bacteria by depolarizing their cytoplasmic membrane, contrary to known antimicrobial peptides, which tend to target the outer membrane. Notably, lead compounds (including mammuthusin-2 from the woolly mammoth, elephasin-2 from the straight-tusked elephant, hydrodamin-1 from the ancient sea cow, mylodonin-2 from the giant sloth and megalocerin-1 from the extinct giant elk) showed anti-infective activity in mice with skin abscess or thigh infections. Molecular de-extinction aided by deep learning may accelerate the discovery of therapeutic molecules.

7.
Expert Opin Drug Discov ; 18(11): 1245-1257, 2023.
Article in English | MEDLINE | ID: mdl-37794737

ABSTRACT

INTRODUCTION: As machine learning (ML) and artificial intelligence (AI) expand to many segments of our society, they are increasingly being used for drug discovery. Recent deep learning models offer an efficient way to explore high-dimensional data and design compounds with desired properties, including those with antibacterial activity. AREAS COVERED: This review covers key frameworks in antibiotic discovery, highlighting physicochemical features and addressing dataset limitations. The deep learning approaches described here include discriminative models such as convolutional neural networks, recurrent neural networks, graph neural networks, and generative models like neural language models, variational autoencoders, generative adversarial networks, normalizing flow, and diffusion models. As the integration of these approaches in drug discovery continues to evolve, this review aims to provide insights into promising prospects and challenges that lie ahead in harnessing such technologies for the development of antibiotics. EXPERT OPINION: Accurate antimicrobial prediction using deep learning faces challenges such as imbalanced data, limited datasets, experimental validation, target strains, and structure. The integration of deep generative models with bioinformatics, molecular dynamics, and data augmentation holds the potential to overcome these challenges, enhance model performance, and ultimately accelerate antimicrobial discovery.


Subjects
Artificial Intelligence, Deep Learning, Humans, Anti-Bacterial Agents/pharmacology, Computer Neural Networks, Machine Learning
8.
Digit Discov ; 1(3): 195-208, 2022 Jun 13.
Article in English | MEDLINE | ID: mdl-35769205

ABSTRACT

Computers can already be programmed for superhuman pattern recognition of images and text. For machines to discover novel molecules, they must first be trained to sort through the many characteristics of molecules and determine which properties should be retained, suppressed, or enhanced to optimize functions of interest. Machines need to be able to understand, read, write, and eventually create new molecules. Today, this creative process relies on deep generative models, which have gained popularity since powerful deep neural networks were introduced to generative model frameworks. In recent years, they have demonstrated excellent ability to model complex distributions of real-world data (e.g., images, audio, text, molecules, and biological sequences). Deep generative models can generate data beyond those provided in training samples, thus yielding an efficient and rapid tool for exploring the massive search space of high-dimensional data such as DNA/protein sequences and facilitating the design of biomolecules with desired functions. Here, we review the emerging field of deep generative models applied to peptide science. In particular, we discuss several popular deep generative model frameworks as well as their applications to generate peptides with various kinds of properties (e.g., antimicrobial, anticancer, cell penetration, etc.). We conclude our review with a discussion of current limitations and future perspectives in this emerging field.

9.
iScience ; 25(10): 105231, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36274947

ABSTRACT

Deeply understanding the properties (e.g., chemical or biological characteristics) of small molecules plays an essential role in drug development. A large number of molecular property datasets have been rapidly accumulated in recent years. However, most of these datasets contain only a limited amount of data, which hinders deep learning methods from making accurate predictions of the corresponding molecular properties. In this work, we propose a transfer learning strategy to alleviate such a data scarcity problem by exploiting the similarity between molecular property prediction tasks. We introduce an effective and interpretable computational framework, named MoTSE (Molecular Tasks Similarity Estimator), to provide an accurate estimation of task similarity. Comprehensive tests demonstrated that the task similarity derived from MoTSE can serve as useful guidance to improve the prediction performance of transfer learning on molecular properties. We also showed that MoTSE can capture the intrinsic relationships between molecular properties and provide meaningful interpretability for the derived similarity.

10.
Cell Rep Med ; 3(1): 100492, 2022 01 18.
Article in English | MEDLINE | ID: mdl-35106508

ABSTRACT

The Columbia Cancer Target Discovery and Development (CTD2) Center is developing PANACEA, a resource comprising dose-responses and RNA sequencing (RNA-seq) profiles of 25 cell lines perturbed with ∼400 clinical oncology drugs, to study a tumor-specific drug mechanism of action. Here, this resource serves as the basis for a DREAM Challenge assessing the accuracy and sensitivity of computational algorithms for de novo drug polypharmacology predictions. Dose-response and perturbational profiles for 32 kinase inhibitors are provided to 21 teams who are blind to the identity of the compounds. The teams are asked to predict high-affinity binding targets of each compound among ∼1,300 targets cataloged in DrugBank. The best performing methods leverage gene expression profile similarity analysis as well as deep-learning methodologies trained on individual datasets. This study lays the foundation for future integrative analyses of pharmacogenomic data, reconciliation of polypharmacology effects in different tumor contexts, and insights into network-based assessments of drug mechanisms of action.


Subjects
Neoplasms/drug therapy, Polypharmacology, Algorithms, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans, Computer Neural Networks, Protein Kinases/metabolism, Messenger RNA/genetics, Messenger RNA/metabolism, Genetic Transcription
11.
Nat Commun ; 12(1): 5465, 2021 09 15.
Article in English | MEDLINE | ID: mdl-34526500

ABSTRACT

Peptide-protein interactions are involved in various fundamental cellular functions and their identification is crucial for designing efficacious peptide therapeutics. Recently, a number of computational methods have been developed to predict peptide-protein interactions. However, most of the existing prediction approaches heavily depend on high-resolution structure data. Here, we present a deep learning framework for multi-level peptide-protein interaction prediction, called CAMP, including binary peptide-protein interaction prediction and corresponding peptide binding residue identification. Comprehensive evaluation demonstrated that CAMP can successfully capture the binary interactions between peptides and proteins and identify the binding residues along the peptides involved in the interactions. In addition, CAMP outperformed other state-of-the-art methods on binary peptide-protein interaction prediction. CAMP can serve as a useful tool in peptide-protein interaction prediction and identification of important binding residues in the peptides, which can thus facilitate the peptide drug discovery process.


Subjects
Algorithms, Computational Biology/methods, Deep Learning, Peptides/metabolism, Proteins/metabolism, Binding Sites, Molecular Models, Peptides/chemistry, Protein Binding, Protein Domains, Proteins/chemistry, Reproducibility of Results
12.
Nat Commun ; 12(1): 3307, 2021 06 03.
Article in English | MEDLINE | ID: mdl-34083538

ABSTRACT

Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families tested on unpublished bioactivity data. We find the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and their ensemble leads to a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms together with the bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking prediction algorithms and for extending the druggable kinome.


Subjects
Protein Kinase Inhibitors/pharmacology, Protein Kinases/metabolism, Algorithms, Benchmarking, Crowdsourcing, Pharmaceutical Databases, Deep Learning, Drug Discovery, Preclinical Drug Evaluation, Humans, Kinetics, Machine Learning, Biological Models, Chemical Models, Protein Kinase Inhibitors/chemistry, Protein Kinase Inhibitors/pharmacokinetics, Protein Kinases/chemistry, Proteomics, Regression Analysis
13.
Signal Transduct Target Ther ; 6(1): 165, 2021 04 24.
Article in English | MEDLINE | ID: mdl-33895786

ABSTRACT

The global spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has created an urgent need for effective therapeutics to treat coronavirus disease 2019 (COVID-19). In this study, we developed an integrative drug repositioning framework, which takes full advantage of machine learning and statistical analysis approaches to systematically integrate and mine large-scale knowledge graph, literature and transcriptome data to discover potential drug candidates against SARS-CoV-2. Our in silico screening followed by wet-lab validation indicated that a poly-ADP-ribose polymerase 1 (PARP1) inhibitor, CVL218, currently in a Phase I clinical trial, may be repurposed to treat COVID-19. Our in vitro assays revealed that CVL218 can exhibit effective inhibitory activity against SARS-CoV-2 replication without obvious cytopathic effect. In addition, we showed that CVL218 can interact with the nucleocapsid (N) protein of SARS-CoV-2 and is able to suppress the LPS-induced production of several inflammatory cytokines that are highly relevant to the prevention of immunopathology induced by SARS-CoV-2 infection.


Subjects
Antiviral Agents/therapeutic use, COVID-19 Drug Treatment, COVID-19/metabolism, Computer Simulation, Drug Repositioning, Biological Models, SARS-CoV-2/metabolism, Humans
14.
Front Pharmacol ; 11: 112, 2020.
Article in English | MEDLINE | ID: mdl-32184722

ABSTRACT

Synthetic lethality (SL), an important type of genetic interaction, can provide useful insight into the target identification process for the development of anticancer therapeutics. Although several well-established SL gene pairs have been verified to be conserved in humans, most SL interactions remain cell-line specific. Here, we demonstrated that the cell-line-specific gene expression profiles derived from the shRNA perturbation experiments performed in the LINCS L1000 project can provide useful features for predicting SL interactions in humans. In this paper, we developed a semi-supervised neural network-based method called EXP2SL to accurately identify SL interactions from the L1000 gene expression profiles. Through a systematic evaluation on the SL datasets of three different cell lines, we demonstrated that our model achieved better performance than the baseline methods and verified the effectiveness of using the L1000 gene expression features and the semi-supervised training technique in SL prediction.

15.
Genomics Proteomics Bioinformatics ; 17(5): 478-495, 2019 10.
Article in English | MEDLINE | ID: mdl-32035227

ABSTRACT

Accurate identification of compound-protein interactions (CPIs) in silico may deepen our understanding of the underlying mechanisms of drug action and thus remarkably facilitate drug discovery and development. Conventional similarity- or docking-based computational methods for predicting CPIs rarely exploit latent features from currently available large-scale unlabeled compound and protein data and often limit their usage to relatively small-scale datasets. In the present study, we propose DeepCPI, a novel general and scalable computational framework that combines effective feature embedding (a technique of representation learning) with powerful deep learning methods to accurately predict CPIs at a large scale. DeepCPI automatically learns the implicit yet expressive low-dimensional features of compounds and proteins from a massive amount of unlabeled data. Evaluations of the measured CPIs in large-scale databases, such as ChEMBL and BindingDB, as well as of the known drug-target interactions from DrugBank, demonstrated the superior predictive performance of DeepCPI. Furthermore, several interactions among small-molecule compounds and three G protein-coupled receptor targets (glucagon-like peptide-1 receptor, glucagon receptor, and vasoactive intestinal peptide receptor) predicted using DeepCPI were experimentally validated. The present study suggests that DeepCPI is a useful and powerful tool for drug discovery and repositioning. The source code of DeepCPI can be downloaded from https://github.com/FangpingWan/DeepCPI.


Subjects
Deep Learning, User-Computer Interface, Area Under Curve, Chemical Databases, Pharmaceutical Preparations/chemistry, Pharmaceutical Preparations/metabolism, Proteins/chemistry, Proteins/metabolism, ROC Curve