Results 1 - 20 of 52
1.
Pharmacol Res ; 173: 105752, 2021 11.
Article in English | MEDLINE | ID: mdl-34481072

ABSTRACT

Traditional Chinese medicine (TCM) formulas have been widely used in clinical practice for thousands of years. With the development of artificial intelligence, deep learning models may help doctors prescribe reasonable formulas. However, current studies of formula recommendation focus only on observable clinical symptoms and lack molecular information. Here, inspired by the theory of TCM network pharmacology, we propose an intelligent formula recommendation system based on deep learning (FordNet) that fuses phenotypic and molecular information. We collected more than 20,000 electronic health records documenting the clinical experience of TCM Master Li Jiren from 2013 to March 2020. In the FordNet system, features of the diagnosis description are extracted by a convolutional neural network, and features of the TCM formula are extracted by network embedding, which fuses in the molecular information. A hierarchical sampling strategy for data augmentation is designed to learn effectively from the training samples. Based on the expanded samples, a deep neural network-based quantitative optimization model is developed for TCM formula recommendation. FordNet performs significantly better than baseline methods (top-10 hit ratio improved by 46.9% compared with the best baseline, a random forest method). Moreover, the molecular information helps FordNet improve the hit ratio by 17.3% compared with the model using only macro information. Clinical evaluation shows that FordNet can learn the effective experience of the TCM Master well and obtain excellent recommendation results. Our study, for the first time, proposes an intelligent recommendation system for TCM formulas that integrates phenotypic and molecular information, which has the potential to improve clinical diagnosis and treatment and to promote the shift of the TCM research pattern from "experience based, macro" to "data based, macro-micro combined", as well as the development of TCM network pharmacology.
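
The abstract above reports a top-10 hit ratio but does not define it. The sketch below shows one common definition (the fraction of prescribed items that appear among the top-k recommended items); it is an assumption and may differ from the metric used for FordNet. The function name `hit_ratio_at_k` and the example herb labels are hypothetical.

```python
def hit_ratio_at_k(ranked_items, relevant_items, k=10):
    """Fraction of relevant items found among the top-k ranked items.

    Assumed definition for illustration; the abstract does not specify
    the exact formula behind its reported hit ratio.
    """
    if not relevant_items:
        return 0.0
    top_k = set(ranked_items[:k])
    hits = sum(1 for item in relevant_items if item in top_k)
    return hits / len(relevant_items)


# Hypothetical example: a model ranks four herbs; three were actually prescribed.
ranked = ["herb_a", "herb_b", "herb_c", "herb_d"]
prescribed = {"herb_b", "herb_d", "herb_z"}
score = hit_ratio_at_k(ranked, prescribed, k=3)  # only herb_b is in the top 3
```

Averaging this score over all test records gives a single figure that can be compared across models.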

2.
Nat Commun ; 12(1): 5465, 2021 09 15.
Article in English | MEDLINE | ID: mdl-34526500

ABSTRACT

Peptide-protein interactions are involved in various fundamental cellular functions and their identification is crucial for designing efficacious peptide therapeutics. Recently, a number of computational methods have been developed to predict peptide-protein interactions. However, most of the existing prediction approaches heavily depend on high-resolution structure data. Here, we present a deep learning framework for multi-level peptide-protein interaction prediction, called CAMP, including binary peptide-protein interaction prediction and corresponding peptide binding residue identification. Comprehensive evaluation demonstrated that CAMP can successfully capture the binary interactions between peptides and proteins and identify the binding residues along the peptides involved in the interactions. In addition, CAMP outperformed other state-of-the-art methods on binary peptide-protein interaction prediction. CAMP can serve as a useful tool in peptide-protein interaction prediction and identification of important binding residues in the peptides, which can thus facilitate the peptide drug discovery process.


Subjects
Algorithms; Computational Biology/methods; Deep Learning; Peptides/metabolism; Proteins/metabolism; Binding Sites; Models, Molecular; Peptides/chemistry; Protein Binding; Protein Domains; Proteins/chemistry; Reproducibility of Results
3.
Bioinformatics ; 37(Suppl_1): i254-i261, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252932

ABSTRACT

MOTIVATION: The prediction of binding between peptides and major histocompatibility complex (MHC) molecules plays an important role in neoantigen identification. Although a large number of computational methods have been developed to address this problem, they produce high false-positive rates in practical applications, since in most cases a single-residue mutation may substantially alter the binding affinity of a peptide to MHC, a change that conventional deep learning methods cannot identify. RESULTS: We developed a differential boundary tree-based model, named DBTpred, to address this problem. We demonstrated that DBTpred can accurately predict MHC class I binding affinity in comparison with state-of-the-art deep learning methods. We also presented a parallel training algorithm to accelerate the training and inference process, which enables DBTpred to be applied to large datasets. By investigating the statistical properties of differential boundary trees and the prediction paths to test samples, we revealed that DBTpred can provide an intuitive interpretation and possible hints for detecting important residue mutations that can substantially influence binding affinity. AVAILABILITY AND IMPLEMENTATION: The DBTpred package is implemented in Python and freely available at: https://github.com/fpy94/DBT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Histocompatibility Antigens Class I; Peptides; Algorithms; Histocompatibility Antigens Class I/genetics; Histocompatibility Antigens Class I/metabolism; Humans; Major Histocompatibility Complex; Peptides/metabolism; Protein Binding
4.
Nat Commun ; 12(1): 3307, 2021 06 03.
Article in English | MEDLINE | ID: mdl-34083538

ABSTRACT

Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families tested on unpublished bioactivity data. We find the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and their ensemble leads to a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms together with the bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking prediction algorithms and for extending the druggable kinome.


Subjects
Protein Kinase Inhibitors/pharmacology; Protein Kinases/metabolism; Algorithms; Benchmarking; Crowdsourcing; Databases, Pharmaceutical; Deep Learning; Drug Discovery; Drug Evaluation, Preclinical; Humans; Kinetics; Machine Learning; Models, Biological; Models, Chemical; Protein Kinase Inhibitors/chemistry; Protein Kinase Inhibitors/pharmacokinetics; Protein Kinases/chemistry; Proteomics; Regression Analysis
5.
Signal Transduct Target Ther ; 6(1): 165, 2021 04 24.
Article in English | MEDLINE | ID: mdl-33895786

ABSTRACT

The global spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has created an urgent need to find effective therapeutics for the treatment of coronavirus disease 2019 (COVID-19). In this study, we developed an integrative drug repositioning framework, which takes full advantage of machine learning and statistical analysis approaches to systematically integrate and mine large-scale knowledge graph, literature and transcriptome data to discover potential drug candidates against SARS-CoV-2. Our in silico screening followed by wet-lab validation indicated that a poly-ADP-ribose polymerase 1 (PARP1) inhibitor, CVL218, currently in a Phase I clinical trial, may be repurposed to treat COVID-19. Our in vitro assays revealed that CVL218 can exhibit effective inhibitory activity against SARS-CoV-2 replication without obvious cytopathic effect. In addition, we showed that CVL218 can interact with the nucleocapsid (N) protein of SARS-CoV-2 and is able to suppress the LPS-induced production of several inflammatory cytokines that are highly relevant to the prevention of immunopathology induced by SARS-CoV-2 infection.


Subjects
Antiviral Agents/therapeutic use; COVID-19/drug therapy; COVID-19/metabolism; Computer Simulation; Drug Repositioning; Models, Biological; SARS-CoV-2/metabolism; Humans
6.
PLoS Comput Biol ; 17(3): e1008842, 2021 03.
Article in English | MEDLINE | ID: mdl-33770074

ABSTRACT

Translation elongation is regulated by a series of complicated mechanisms in both prokaryotes and eukaryotes. Although recent advances in ribosome profiling techniques have made it possible to capture genome-wide ribosome footprints along transcripts at codon resolution, the regulatory codes of elongation dynamics are still not fully understood. Most existing computational approaches for modeling translation elongation from ribosome profiling data focus mainly on local contextual patterns, while ignoring the continuity of the elongation process and the relations between ribosome densities of remote codons. To the best of our knowledge, modeling the translation elongation process at the level of the full-length coding sequence (CDS) has not been studied. In this paper, we developed a deep learning-based approach with a multi-input and multi-output framework, named RiboMIMO, for modeling the ribosome density distributions of full-length mRNA CDS regions. By considering the underlying correlations in translation efficiency among neighboring and remote codons and extracting hidden features from the input full-length coding sequence, RiboMIMO can greatly outperform the state-of-the-art baseline approaches and accurately predict the ribosome density distributions along whole mRNA CDS regions. In addition, RiboMIMO explores the contributions of individual input codons to the predictions of output ribosome densities, which can help reveal important biological factors influencing the translation elongation process. The analyses, based on our interpretable metric named the codon impact score, not only identified several patterns consistent with the previously published literature, but also revealed for the first time (to the best of our knowledge) that codons located at a long distance from the ribosomal A site may also be associated with the translation elongation rate. This finding of a long-range impact on translation elongation velocity may shed new light on the regulatory mechanisms of protein synthesis. Overall, these results indicate that RiboMIMO can provide a useful tool for studying the regulation of translation elongation across the full-length CDS.


Subjects
Computational Biology/methods; Deep Learning; Models, Genetic; Peptide Chain Elongation, Translational/genetics; Ribosomes; Codon/genetics; Codon/metabolism; Escherichia coli/genetics; RNA, Messenger/chemistry; RNA, Messenger/genetics; RNA, Messenger/metabolism; Ribosomes/genetics; Ribosomes/metabolism; Saccharomyces cerevisiae/genetics
8.
Nucleic Acids Res ; 49(7): 3719-3734, 2021 04 19.
Article in English | MEDLINE | ID: mdl-33744973

ABSTRACT

N6-methyladenosine (m6A) is the most pervasive modification in eukaryotic mRNAs. Numerous biological processes are regulated by this critical post-transcriptional mark, such as gene expression, RNA stability, RNA structure and translation. Recently, various experimental techniques and computational methods have been developed to characterize the transcriptome-wide landscapes of m6A modification for understanding its underlying mechanisms and functions in mRNA regulation. However, the experimental techniques are generally costly and time-consuming, while the existing computational models are usually designed only for m6A site prediction in a single species and have significant limitations in accuracy, interpretability and generalizability. Here, we propose a highly interpretable computational framework, called MASS, based on a multi-task curriculum learning strategy to capture m6A features across multiple species simultaneously. Extensive computational experiments demonstrate the superior performance of MASS when compared to the state-of-the-art prediction methods. Furthermore, the contextual sequence features of m6A captured by MASS can be explained by the known critical binding motifs of the related RNA-binding proteins, which also helps elucidate the similarities and differences among m6A features across species. In addition, based on the predicted m6A profiles, we further delineate the relationships between m6A and various properties of gene regulation, including gene expression, RNA stability, translation, RNA structure and histone modification. In summary, MASS may serve as a useful tool for characterizing m6A modification and studying its regulatory code. The source code of MASS can be downloaded from https://github.com/mlcb-thu/MASS.


Subjects
Adenosine/analogs & derivatives; Machine Learning; RNA/chemistry; Adenosine/chemistry; Animals; Databases, Genetic; Datasets as Topic; Gene Expression Regulation; Humans; RNA-Binding Proteins; Sequence Analysis, RNA; Software; Transcriptome
9.
Proc Natl Acad Sci U S A ; 118(6)2021 02 09.
Article in English | MEDLINE | ID: mdl-33526657

ABSTRACT

RNA polymerase II (Pol II) generally pauses at certain positions along gene bodies, thereby interrupting the transcription elongation process, which is often coupled with various important biological functions, such as precursor mRNA splicing and gene expression regulation. Characterizing the transcriptional elongation dynamics can thus help us understand many essential biological processes in eukaryotic cells. However, experimentally measuring Pol II elongation rates is generally time and resource consuming. We developed PEPMAN (polymerase II elongation pausing modeling through attention-based deep neural network), a deep learning-based model that accurately predicts Pol II pausing sites based on the native elongating transcript sequencing (NET-seq) data. Through fully taking advantage of the attention mechanism, PEPMAN is able to decipher important sequence features underlying Pol II pausing. More importantly, we demonstrated that the analyses of the PEPMAN-predicted results around various types of alternative splicing sites can provide useful clues into understanding the cotranscriptional splicing events. In addition, associating the PEPMAN prediction results with different epigenetic features can help reveal important factors related to the transcription elongation process. All these results demonstrated that PEPMAN can provide a useful and effective tool for modeling transcription elongation and understanding the related biological factors from available high-throughput sequencing data.


Subjects
Genome, Human; Machine Learning; Models, Biological; Transcription Elongation, Genetic; Base Sequence; Binding Sites; DNA Methylation/genetics; Epigenesis, Genetic; HEK293 Cells; HeLa Cells; Histones/metabolism; Humans; Nucleotide Motifs/genetics; Protein Processing, Post-Translational; RNA Polymerase II/metabolism; RNA Splice Sites/genetics; RNA Splicing/genetics
10.
Article in English | MEDLINE | ID: mdl-33631424

ABSTRACT

Pseudouridine (Ψ) is the most prevalent post-transcriptional RNA modification and is widespread in small cellular RNAs and mRNAs. However, the functions, mechanisms and precise distribution of Ψs (especially in mRNAs) remain largely unclear. The landscape of Ψs across the transcriptome has not yet been fully delineated. Here, we present a highly effective model based on a convolutional neural network (CNN), called PULSE, to analyze large-scale profiling data of Ψ sites and characterize the contextual sequence features of pseudouridylation. PULSE, consisting of two alternately stacked convolution and pooling layers followed by a fully connected neural network, can automatically learn the hidden patterns of pseudouridylation from the local sequence information. Extensive validation tests demonstrated that PULSE can outperform other state-of-the-art prediction methods and achieve high prediction accuracy, thus enabling us to further characterize the transcriptome-wide landscape of Ψ sites. We further showed that the prediction results derived from PULSE can provide novel insights into the functional roles of pseudouridylation, such as the regulation of RNA secondary structure, codon usage, translation, and RNA stability, and the connection to single nucleotide variants.

11.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33479731

ABSTRACT

Translation elongation is a crucial phase during protein biosynthesis. In this study, we develop a novel deep reinforcement learning-based framework, named Riboexp, to model the determinants of the uneven distribution of ribosomes on mRNA transcripts during translation elongation. In particular, our model employs a policy network to perform context-dependent feature selection in the setting of ribosome density prediction. Our extensive tests demonstrated that Riboexp can significantly outperform the state-of-the-art methods in predicting ribosome density by up to 5.9% in terms of per-gene Pearson correlation coefficient on the datasets from three species. In addition, Riboexp can indicate more informative sequence features for the prediction task than other commonly used attribution methods in deep learning. In-depth analyses also revealed meaningful biological insights generated by the Riboexp framework. Moreover, the application of Riboexp in codon optimization resulted in an increase in protein production of around 31% over the previous state-of-the-art method that models ribosome density. These results have established Riboexp as a powerful and useful computational tool in the study of translation dynamics and protein synthesis. Availability: The data and code of this study are available on GitHub: https://github.com/Liuxg16/Riboexp. Contact: zengjy321@tsinghua.edu.cn; songsen@tsinghua.edu.cn.


Subjects
Codon/metabolism; Computational Biology; Models, Biological; Protein Biosynthesis; Ribosomes/metabolism
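
The Riboexp abstract above evaluates predictions with a per-gene Pearson correlation coefficient. As a minimal sketch of what such a metric might look like, the code below computes Pearson's r between observed and predicted per-codon densities for each gene and then averages over genes. The averaging step, the function names, and the dictionary layout are assumptions for illustration and are not taken from the Riboexp code.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def per_gene_pearson(observed, predicted):
    """Mean of per-gene Pearson r (assumed aggregation).

    observed, predicted: dicts mapping gene id -> list of per-codon densities.
    Genes with an undefined correlation are skipped.
    """
    scores = [pearson(observed[g], predicted[g]) for g in observed]
    scores = [s for s in scores if not math.isnan(s)]
    if not scores:
        return float("nan")
    return sum(scores) / len(scores)


# Hypothetical toy data: one perfectly correlated gene, one anti-correlated gene.
observed = {"gene1": [1.0, 2.0, 3.0], "gene2": [3.0, 2.0, 1.0]}
predicted = {"gene1": [2.0, 4.0, 6.0], "gene2": [1.0, 2.0, 3.0]}
mean_r = per_gene_pearson(observed, predicted)
```

Computing the correlation within each gene, rather than over all codons pooled together, keeps highly expressed genes from dominating the score.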
12.
Ann Transl Med ; 8(17): 1061, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33145280

ABSTRACT

Background: The prognosis for patients with hepatocellular carcinoma (HCC) after liver resection ranges widely and is unsatisfactory. This study aimed to develop two novel nomograms that combined tumor characteristics and inflammation-related indexes to predict overall survival (OS) and recurrence-free survival (RFS). Methods: In total, 3,071 patients who underwent radical resection were recruited. Independent risk factors were identified by Cox regression analysis and used to construct prognostic nomograms. The C-index, time-dependent area under the receiver operating characteristic curve (time-dependent AUC), decision curve analysis (DCA), and calibration curves were used to assess the performance of the nomograms. Results: Multivariate analysis revealed that alpha-fetoprotein (AFP), resection margin, neutrophil times γ-glutamyl transpeptidase-to-lymphocyte ratio (NrLR), platelet-to-lymphocyte ratio (PLR), γ-glutamyl transpeptidase-to-platelet ratio (GPR), tumor size, tumor number, microvascular invasion, and Edmondson-Steiner grade were the independent risk factors associated with OS. The independent risk factors associated with RFS were hepatitis, AFP, albumin-bilirubin (ALBI), NrLR, PLR, PNI, GPR, tumor size, tumor number, microvascular invasion, and Edmondson-Steiner grade. The C-indexes of the nomograms in the training and validation cohorts were 0.71 [95% confidence interval (CI): 0.70-0.73] and 0.71 (95% CI: 0.69-0.74) for OS, and 0.71 (95% CI: 0.70-0.73) and 0.74 (95% CI: 0.72-0.76) for RFS, respectively. The C-index, time-dependent AUC, and DCA of the nomograms showed significantly better predictive performance than those of commonly used staging systems. The models could stratify patients into three different risk groups. The web-based tools are convenient for clinical practice. Conclusions: Two novel nomograms that integrated inflammation-related indexes and accessible clinical parameters were developed to predict OS and RFS in HCC patients who underwent radical resection. Such models will help guide postoperative individualized follow-up and adjuvant therapy.
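
The nomograms above are assessed with the C-index. As an illustration only, the sketch below implements a basic form of Harrell's concordance index for right-censored survival data: among comparable patient pairs, it counts how often the patient with the higher predicted risk experienced the event earlier. The abstract does not say which C-index variant or software was used, so the function name, the tie handling, and the toy data are assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Basic Harrell's C-index (illustrative sketch).

    times:       observed follow-up times
    events:      1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: predicted risk; higher means earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort of three patients; the last one is censored.
times = [5, 10, 15]
events = [1, 1, 0]
c_perfect = concordance_index(times, events, [0.9, 0.5, 0.1])
c_mixed = concordance_index(times, events, [0.5, 0.9, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.71 to 0.74 should be read.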

13.
Front Pharmacol ; 11: 112, 2020.
Article in English | MEDLINE | ID: mdl-32184722

ABSTRACT

Synthetic lethality (SL), an important type of genetic interaction, can provide useful insight into the target identification process for the development of anticancer therapeutics. Although several well-established SL gene pairs have been verified to be conserved in humans, most SL interactions remain cell-line specific. Here, we demonstrated that the cell-line-specific gene expression profiles derived from the shRNA perturbation experiments performed in the LINCS L1000 project can provide useful features for predicting SL interactions in humans. In this paper, we developed a semi-supervised neural network-based method called EXP2SL to accurately identify SL interactions from the L1000 gene expression profiles. Through a systematic evaluation on the SL datasets of three different cell lines, we demonstrated that our model achieved better performance than the baseline methods and verified the effectiveness of using the L1000 gene expression features and the semi-supervised training technique in SL prediction.

14.
Bioinformatics ; 36(9): 2872-2880, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31950974

ABSTRACT

MOTIVATION: Quantitative structure-activity relationship (QSAR) and drug-target interaction (DTI) prediction are both commonly used in drug discovery. Collaboration among pharmaceutical institutions can lead to better performance in both QSAR and DTI prediction. However, drug-related data privacy and intellectual property issues have become a noticeable hindrance to inter-institutional collaboration in drug discovery. RESULTS: We have developed two novel algorithms under secure multiparty computation (MPC), QSARMPC and DTIMPC, which enable pharmaceutical institutions to achieve high-quality collaboration to advance drug discovery without divulging private drug-related information. QSARMPC, a neural network model under MPC, displays good scalability and performance and is feasible for privacy-preserving collaboration on large-scale QSAR prediction. DTIMPC integrates drug-related heterogeneous network data and accurately predicts novel DTIs, while keeping the drug information confidential. Under several experimental settings that reflect real drug discovery scenarios, we have demonstrated that DTIMPC achieves a significant performance improvement over the baseline methods, generates novel DTI predictions with supporting evidence from the literature and shows feasible scalability to handle growing DTI data. All these results indicate that QSARMPC and DTIMPC can provide practically useful tools for advancing privacy-preserving drug discovery. AVAILABILITY AND IMPLEMENTATION: The source codes of QSARMPC and DTIMPC are available on GitHub: https://github.com/rongma6/QSARMPC_DTIMPC.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Drug Discovery; Privacy; Algorithms; Drug Development
15.
BMC Bioinformatics ; 20(Suppl 24): 678, 2019 Dec 20.
Article in English | MEDLINE | ID: mdl-31861979

ABSTRACT

BACKGROUND: Ribosome profiling provides insight into the process of translation. A basic step in profile construction at the transcript level is to map Ribo-seq data to transcripts, and then assign a huge number of multiple-mapped reads to similar isoforms. Existing methods either discard the multiple-mapped reads, allocate them randomly, or assign them proportionally according to transcript abundance estimated from RNA-seq data. RESULTS: Here we present DeepShape, an RNA-seq-free computational method to estimate the ribosome abundance of isoforms and simultaneously compute their ribosome profiles using a deep learning model. Our simulation results demonstrate that DeepShape can provide more accurate estimates of both ribosome abundance and profiles when compared to state-of-the-art methods. We applied DeepShape to a set of Ribo-seq data from PC3 human prostate cancer cells with and without PP242 treatment. In the four cell invasion/metastasis genes that are translationally regulated by PP242 treatment, different isoforms show very different characteristics of translational efficiency and regulation patterns. Transcript-level ribosome distributions were analyzed with the "Codon Residence Index (CRI)" proposed in this study to investigate the relative speed at which a ribosome moves on a codon compared to its synonymous codons. We observe consistent CRI patterns in PC3 cells. We found that the translation of several codons could be regulated by PP242 treatment. CONCLUSION: In summary, we demonstrate that DeepShape can serve as a powerful tool for Ribo-seq data analysis.


Subjects
Ribosomes/metabolism; Sequence Analysis, RNA/methods; Cell Line, Tumor; Codon/genetics; Codon/metabolism; Humans; Protein Isoforms/genetics; Software
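
The DeepShape abstract above mentions that some existing methods assign multiple-mapped reads to isoforms in proportion to transcript abundance estimated from RNA-seq. The sketch below illustrates that baseline idea only; it is not DeepShape itself, and the function name, the fallback for zero abundance, and the isoform labels are hypothetical.

```python
def assign_multimapped_reads(read_count, abundances):
    """Split reads that map to several isoforms in proportion to abundance.

    read_count: number of reads that map equally well to all listed isoforms
    abundances: dict mapping isoform id -> estimated abundance (e.g. from RNA-seq)
    Returns a dict mapping isoform id -> (possibly fractional) assigned reads.
    """
    total = sum(abundances.values())
    if total == 0:
        # Assumed fallback: with no abundance signal, split reads evenly.
        share = read_count / len(abundances)
        return {isoform: share for isoform in abundances}
    return {
        isoform: read_count * abundance / total
        for isoform, abundance in abundances.items()
    }


# Hypothetical example: 100 ambiguous reads, isoform_1 three times as abundant.
assigned = assign_multimapped_reads(100, {"isoform_1": 3.0, "isoform_2": 1.0})
```

This kind of allocation depends on a matched RNA-seq experiment, which is the dependency the abstract says DeepShape avoids.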
16.
Bioinformatics ; 35(14): i284-i294, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510699

ABSTRACT

MOTIVATION: Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly desirable. However, the existing approaches to predicting isoform functions are far from satisfactory for at least two reasons: (i) unlike gene-level annotations, isoform-level functional annotations are scarce; and (ii) the information about isoform functions is concealed in various types of data, including isoform sequences, co-expression relationships among isoforms, etc. RESULTS: In this study, we present a novel approach, DIFFUSE (Deep learning-based prediction of IsoForm FUnctions from Sequences and Expression), to predict isoform functions. To integrate various types of data, our approach adopts a hybrid framework by first using a deep neural network (DNN) to predict the functions of isoforms from their genomic sequences and then refining the prediction using a conditional random field (CRF) based on co-expression relationships. To overcome the lack of isoform-level ground truth labels, we further propose an iterative semi-supervised learning algorithm to train both the DNN and CRF together. Our extensive computational experiments demonstrate that DIFFUSE could effectively predict the functions of isoforms and genes. It achieves an average area under the receiver operating characteristic curve of 0.840 and an area under the precision-recall curve of 0.581 over 4184 GO functional categories, which are significantly higher than those of the state-of-the-art methods. We further validate the prediction results by analyzing the correlation between functional similarity, sequence similarity, expression similarity and structural similarity, as well as the consistency between the predicted functions and some well-studied functional features of isoform sequences. AVAILABILITY AND IMPLEMENTATION: https://github.com/haochenucr/DIFFUSE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning; Neural Networks, Computer; Algorithms; Alternative Splicing
17.
Nat Commun ; 10(1): 2049, 2019 05 03.
Article in English | MEDLINE | ID: mdl-31053705

ABSTRACT

The new advances in various experimental techniques that provide complementary information about the spatial conformations of chromosomes have inspired researchers to develop computational methods to fully exploit the merits of individual data sources and combine them to improve the modeling of chromosome structure. Here we propose GEM-FISH, a method for reconstructing the 3D models of chromosomes through systematically integrating both Hi-C and FISH data with the prior biophysical knowledge of a polymer model. Comprehensive tests on a set of chromosomes, for which both Hi-C and FISH data are available, demonstrate that GEM-FISH can outperform previous chromosome structure modeling methods and accurately capture the higher order spatial features of chromosome conformations. Moreover, our reconstructed 3D models of chromosomes revealed interesting patterns of spatial distributions of super-enhancers which can provide useful insights into understanding the functional roles of these super-enhancers in gene regulation.


Subjects
Chromosomes/chemistry; Imaging, Three-Dimensional/methods; Models, Molecular; Nucleic Acid Conformation; Cell Line; Chromatin/chemistry; Chromatin/genetics; Chromosomes/genetics; Computer Simulation; Datasets as Topic; Enhancer Elements, Genetic/genetics; Genome, Human/genetics; Humans; In Situ Hybridization, Fluorescence/methods
18.
Bioinformatics ; 35(23): 4946-4954, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31120490

ABSTRACT

MOTIVATION: Prediction of peptide binding to the major histocompatibility complex (MHC) plays a vital role in the development of therapeutic vaccines for the treatment of cancer. Algorithms with improved correlations between predicted and actual binding affinities are needed to increase precision and reduce the number of false positive predictions. RESULTS: We present ACME (Attention-based Convolutional neural networks for MHC Epitope binding prediction), a new pan-specific algorithm to accurately predict the binding affinities between peptides and MHC class I molecules, even for those new alleles that are not seen in the training data. Extensive tests have demonstrated that ACME can significantly outperform other state-of-the-art prediction methods with an increase of the Pearson correlation coefficient between predicted and measured binding affinities by up to 23 percentage points. In addition, its ability to identify strong-binding peptides has been experimentally validated. Moreover, by integrating the convolutional neural network with attention mechanism, ACME is able to extract interpretable patterns that can provide useful and detailed insights into the binding preferences between peptides and their MHC partners. All these results have demonstrated that ACME can provide a powerful and practically useful tool for the studies of peptide-MHC class I interactions. AVAILABILITY AND IMPLEMENTATION: ACME is available as an open source software at https://github.com/HYsxe/ACME. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neural Networks, Computer; Algorithms; Attention; Binding Sites; Computational Biology; Histocompatibility Antigens Class I; Peptides; Protein Binding
19.
Genomics Proteomics Bioinformatics ; 17(5): 478-495, 2019 10.
Article in English | MEDLINE | ID: mdl-32035227

ABSTRACT

Accurate identification of compound-protein interactions (CPIs) in silico may deepen our understanding of the underlying mechanisms of drug action and thus remarkably facilitate drug discovery and development. Conventional similarity- or docking-based computational methods for predicting CPIs rarely exploit latent features from currently available large-scale unlabeled compound and protein data and often limit their usage to relatively small-scale datasets. In the present study, we propose DeepCPI, a novel general and scalable computational framework that combines effective feature embedding (a technique of representation learning) with powerful deep learning methods to accurately predict CPIs at a large scale. DeepCPI automatically learns the implicit yet expressive low-dimensional features of compounds and proteins from a massive amount of unlabeled data. Evaluations of the measured CPIs in large-scale databases, such as ChEMBL and BindingDB, as well as of the known drug-target interactions from DrugBank, demonstrated the superior predictive performance of DeepCPI. Furthermore, several interactions among small-molecule compounds and three G protein-coupled receptor targets (glucagon-like peptide-1 receptor, glucagon receptor, and vasoactive intestinal peptide receptor) predicted using DeepCPI were experimentally validated. The present study suggests that DeepCPI is a useful and powerful tool for drug discovery and repositioning. The source code of DeepCPI can be downloaded from https://github.com/FangpingWan/DeepCPI.


Subjects
Deep Learning; User-Computer Interface; Area Under Curve; Databases, Chemical; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/metabolism; Proteins/chemistry; Proteins/metabolism; ROC Curve
20.
Bioinformatics ; 35(1): 104-111, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30561548

ABSTRACT

Motivation: Accurately predicting drug-target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks. Results: Inspired by recent advances in information passing and aggregation techniques that generalize convolutional neural networks to mine large-scale graph data and greatly improve the performance of many network-related prediction tasks, we develop a new nonlinear end-to-end learning model, called NeoDTI, that integrates diverse information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. The substantial improvement in prediction performance over other state-of-the-art DTI prediction methods, as well as several novel predicted DTIs with supporting evidence from previous studies, has demonstrated the superior predictive power of NeoDTI. In addition, NeoDTI is robust against a wide range of hyperparameter choices and is ready to integrate more drug- and target-related information (e.g. compound-protein binding affinity data). All these results suggest that NeoDTI can offer a powerful and robust tool for drug development and drug repositioning. Availability and implementation: The source code and data used in NeoDTI are available at: https://github.com/FangpingWan/NeoDTI. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computer Simulation; Drug Development/methods; Software; Drug Discovery; Drug Repositioning; Protein Binding