Results 1 - 20 of 91
1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36627113

ABSTRACT

Protein-ligand binding affinity prediction is an important task in structural bioinformatics for drug discovery and design. Although various scoring functions (SFs) have been proposed, it remains challenging to accurately evaluate the binding affinity of a protein-ligand complex with the known bound structure because of the potential preference of scoring system. In recent years, deep learning (DL) techniques have been applied to SFs without sophisticated feature engineering. Nevertheless, existing methods cannot model the differential contribution of atoms in various regions of proteins, and the relationship between atom properties and intermolecular distance is also not fully explored. We propose a novel empirical graph neural network for accurate protein-ligand binding affinity prediction (EGNA). Graphs of protein, ligand and their interactions are constructed based on different regions of each bound complex. Proteins and ligands are effectively represented by graph convolutional layers, enabling the EGNA to capture interaction patterns precisely by simulating empirical SFs. The contributions of different factors on binding affinity can thus be transparently investigated. EGNA is compared with the state-of-the-art machine learning-based SFs on two widely used benchmark data sets. The results demonstrate the superiority of EGNA and its good generalization capability.


Subject(s)
Neural Networks, Computer , Proteins , Ligands , Proteins/chemistry , Protein Binding , Algorithms
2.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38483285

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction refers to the prediction of whether a given drug molecule will bind to a specific target and thus exert a targeted therapeutic effect. Although intelligent computational approaches for drug target prediction have received much attention and made many advances, DTI prediction is still a challenging task that requires further research. The main challenges are as follows: (i) most graph neural network-based methods only consider the information of the first-order neighboring nodes (drug and target) in the graph, without learning deeper and richer structural features from the higher-order neighboring nodes; (ii) existing methods do not consider both the sequence and structural features of drugs and targets; each method works independently and cannot combine the advantages of sequence and structural features to improve the interactive learning effect. RESULTS: To address the above challenges, a Multi-view Integrated learning Network that integrates Deep learning and Graph Learning (MINDG) is proposed in this study, which consists of the following parts: (i) a mixed deep network is used to extract sequence features of drugs and targets, (ii) a higher-order graph attention convolutional network is proposed to better extract and capture structural features, and (iii) a multi-view adaptive integrated decision module is used to improve and complement the initial prediction results of the above two networks to enhance the prediction performance. We evaluate MINDG on two datasets and show that it improves DTI prediction performance compared to state-of-the-art baselines. AVAILABILITY AND IMPLEMENTATION: https://github.com/jnuaipr/MINDG.


Subject(s)
Algorithms , Neural Networks, Computer
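
The MINDG abstract above contrasts first-order neighbors with higher-order neighbors in a drug-target graph. The difference can be illustrated with a plain breadth-first search. This is a minimal sketch, not the MINDG implementation: the toy graph and the function name `k_hop_neighbors` are hypothetical and chosen only for illustration.

```python
from collections import deque


def k_hop_neighbors(adj, start, k):
    """Return {node: hop_distance} for every node within k hops of start.

    adj is an adjacency dict {node: [neighbors]}; start itself is excluded.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            # Do not expand beyond the requested order.
            continue
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    del dist[start]
    return dist


# Hypothetical toy drug-target graph (names are illustrative only).
adj = {
    "drug_A": ["target_1"],
    "target_1": ["drug_A", "drug_B"],
    "drug_B": ["target_1", "target_2"],
    "target_2": ["drug_B"],
}

first_order = k_hop_neighbors(adj, "drug_A", 1)
third_order = k_hop_neighbors(adj, "drug_A", 3)
```

With `k = 1` only `target_1` is visible from `drug_A`; raising `k` exposes `drug_B` and `target_2`, which is the kind of extra structural context a higher-order model can draw on.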
3.
Methods ; 222: 28-40, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38159688

ABSTRACT

Due to the abnormal secretion of adreno-cortico-tropic-hormone (ACTH) by tumors, Cushing's disease leads to hypercortisonemia, a precursor to a series of metabolic disorders and serious complications. Cushing's disease has a high recurrence rate, a short recurrence time and an unknown cause of recurrence after surgical resection. Qualitative or quantitative automatic image analysis of histology images can potentially provide insights into Cushing's disease, but to the best of our knowledge no software is available yet. In this study, we propose a quantitative image analysis-based pipeline CRCS, which aims to explore the relationship between the expression level of ACTH in normal cell tissues adjacent to tumor cells and the postoperative prognosis of patients. CRCS mainly consists of image-level clustering, cluster-level multi-modal image registration, patch-level image classification and pixel-level image segmentation on the whole slide imaging (WSI). On both image registration and classification tasks, our method CRCS achieves state-of-the-art performance compared to recently published methods on our collected benchmark dataset. In addition, CRCS achieves an accuracy of 0.83 for postoperative prognosis of 12 cases. CRCS demonstrates great potential for instrumenting automatic diagnosis and treatment for Cushing's disease.


Subject(s)
Pituitary ACTH Hypersecretion , Humans , Pituitary ACTH Hypersecretion/diagnostic imaging , Prognosis , Adrenocorticotropic Hormone
4.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34486019

ABSTRACT

Long noncoding RNAs (lncRNAs) play important roles in various biological regulatory processes, and are closely related to the occurrence and development of diseases. Identifying lncRNA-disease associations is valuable for revealing the molecular mechanism of diseases and exploring treatment strategies. Thus, it is necessary to computationally predict lncRNA-disease associations as a complementary method for biological experiments. In this study, we proposed a novel prediction method GCRFLDA based on the graph convolutional matrix completion. GCRFLDA first constructed a graph using the available lncRNA-disease association information. Then, it constructed an encoder consisting of conditional random field and attention mechanism to learn efficient embeddings of nodes, and a decoder layer to score lncRNA-disease associations. In GCRFLDA, the Gaussian interaction profile kernels similarity and cosine similarity were fused as side information of lncRNA and disease nodes. Experimental results on four benchmark datasets show that GCRFLDA is superior to other existing methods. Moreover, we conducted case studies on four diseases and observed that 70 of 80 predicted associated lncRNAs were confirmed by the literature.


Subject(s)
RNA, Long Noncoding , Algorithms , Computational Biology/methods , RNA, Long Noncoding/genetics , Research Design
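
The GCRFLDA abstract above mentions fusing Gaussian interaction profile (GIP) kernel similarity with cosine similarity as side information. The sketch below shows both measures on a toy association matrix. It uses the commonly used GIP form exp(-gamma * ||p_i - p_j||^2), with gamma normalised by the mean squared profile norm. The abstract does not state how the two similarities are fused, so the plain average used here is an assumption, and all names and data are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two interaction profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def gip_similarity(profiles, i, j, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows i and j.

    The bandwidth is gamma_prime divided by the mean squared row norm.
    """
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / len(profiles)
    if mean_sq_norm == 0:
        return 1.0  # every profile is all-zero, hence identical
    gamma = gamma_prime / mean_sq_norm
    sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
    return math.exp(-gamma * sq_dist)


# Hypothetical association matrix: rows are lncRNAs, columns are diseases.
assoc = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]

gip_01 = gip_similarity(assoc, 0, 1)
gip_02 = gip_similarity(assoc, 0, 2)
cos_02 = cosine_similarity(assoc[0], assoc[2])

# Assumed fusion rule (plain average); the abstract does not specify one.
fused_02 = 0.5 * (gip_02 + cos_02)
```

Rows 0 and 1 share the same profile, so their GIP similarity is 1, while rows 0 and 2 have no disease in common, so their cosine similarity is 0 and only the GIP term contributes to the fused score.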
5.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34571539

ABSTRACT

Circular RNAs (circRNAs) generally bind to RNA-binding proteins (RBPs) to play an important role in the regulation of autoimmune diseases. Thus, it is crucial to study the binding sites of RBPs on circRNAs. Although many methods, including traditional machine learning and deep learning, have been developed to predict the interactions between RNAs and RBPs, most of them are focused on linear RNAs. At present, few studies have been done on the binding relationships between circRNAs and RBPs. Thus, in-depth research is urgently needed. In the existing circRNA-RBP binding site prediction methods, circRNA sequences are the main research subjects, but the relevant characteristics of circRNAs have not been fully exploited, such as the structure and composition information of circRNA sequences. Some methods have extracted different views to construct recognition models, but how to efficiently use the multi-view data to construct recognition models is still not well studied. Considering the above problems, this paper proposes a multi-view classification method called DMSK based on multi-view deep learning, subspace learning and multi-view classifier for the identification of circRNA-RBP interaction sites. In the DMSK method, first, we converted circRNA sequences into pseudo-amino acid sequences and pseudo-dipeptide components for extracting high-dimensional sequence features and component features of circRNAs, respectively. Then, the structure prediction method RNAfold was used to predict the secondary structure of the RNA sequences, and the sequence embedding model was used to extract the context-dependent features. Next, we fed the above four views' raw features to a hybrid network, which is composed of a convolutional neural network and a long short-term memory network, to obtain the deep features of circRNAs. Furthermore, we used view-weighted generalized canonical correlation analysis to extract four views' common features by subspace learning. Finally, the learned subspace common features and multi-view deep features were fed to train the downstream multi-view TSK fuzzy system to construct a fuzzy rule and fuzzy inference-based multi-view classifier. The trained classifier was used to predict the specific positions of the RBP binding sites on the circRNAs. The experiments show that the prediction performance of the proposed method DMSK has been improved compared with the existing methods. The code and dataset of this study are available at https://github.com/Rebecca3150/DMSK.


Subject(s)
Deep Learning , RNA, Circular , Binding Sites , Carrier Proteins/metabolism , Computational Biology/methods , Humans
6.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35907779

ABSTRACT

Circular RNA (circRNA) is closely involved in physiological and pathological processes of many diseases. Discovering the associations between circRNAs and diseases is of great significance. Due to the high cost of verifying circRNA-disease associations by wet-lab experiments, computational approaches for predicting the associations have become a promising research direction. In this paper, we propose a method, MDGF-MCEC, based on multi-view dual attention graph convolution network (GCN) with cooperative ensemble learning to predict circRNA-disease associations. First, MDGF-MCEC constructs two disease relation graphs and two circRNA relation graphs based on different similarities. Then, the relation graphs are fed into a multi-view GCN for representation learning. In order to learn highly discriminative features, a dual-attention mechanism is introduced to adjust the contribution weights, at both channel level and spatial level, of different features. Based on the learned embedding features of diseases and circRNAs, nine different feature combinations between diseases and circRNAs are treated as new multi-view data. Finally, we construct a multi-view cooperative ensemble classifier to predict the associations between circRNAs and diseases. Experiments conducted on the CircR2Disease database demonstrate that the proposed MDGF-MCEC model achieves a high area under curve of 0.9744 and outperforms the state-of-the-art methods. Promising results are also obtained from experiments on the circ2Disease and circRNADisease databases. Furthermore, the predicted associated circRNAs for hepatocellular carcinoma and gastric cancer are supported by the literature. The code and dataset of this study are available at https://github.com/ABard0/MDGF-MCEC.


Subject(s)
RNA, Circular , Stomach Neoplasms , Humans , Intercellular Signaling Peptides and Proteins , Machine Learning , Stomach Neoplasms/genetics
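
The MDGF-MCEC abstract above reports performance as an area under the curve. As a reminder of what that number measures, the sketch below computes the ROC AUC as the fraction of (positive, negative) pairs that the scores rank correctly, counting ties as half. The labels and scores are made-up toy values, not data from the paper, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy predictions for six candidate circRNA-disease pairs (illustrative only).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
```

Here eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. The quadratic loop is fine for a toy example; a rank-based formula is the usual choice for large benchmarks.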
7.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36961341

ABSTRACT

MOTIVATION: Generating molecules of high quality and drug-likeness in the vast chemical space is a big challenge in drug discovery. Most existing molecule generative methods focus on diversity and novelty of molecules, but ignore the drug potentials of the generated molecules during the generation process. RESULTS: In this study, we present a novel de novo multiobjective quality assessment-based drug design approach (QADD), which integrates an iterative refinement framework with a novel graph-based molecular quality assessment model on drug potentials. QADD designs a multiobjective deep reinforcement learning pipeline to generate molecules with multiple desired properties iteratively, where a graph neural network-based model for accurate molecular quality assessment on drug potentials is introduced to guide molecule generation. Experimental results show that QADD can jointly optimize multiple molecular properties with a promising performance and the quality assessment module is capable of guiding the generated molecules with high drug potentials. Furthermore, applying QADD to generate novel molecules binding to a biological target protein DRD2 also demonstrates the algorithm's efficacy. AVAILABILITY AND IMPLEMENTATION: QADD is freely available online for academic use at https://github.com/yifang000/QADD or http://www.csbio.sjtu.edu.cn/bioinf/QADD.


Subject(s)
Neural Networks, Computer , Proteins , Models, Molecular , Drug Design
8.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37561093

ABSTRACT

MOTIVATION: CircRNAs play a critical regulatory role in physiological processes, and the abnormal expression of circRNAs can mediate the processes of diseases. Therefore, exploring circRNA-disease associations is gradually becoming an important area of research. Due to the high cost of validating circRNA-disease associations using traditional wet-lab experiments, novel computational methods based on machine learning are gaining more and more attention in this field. However, current computational methods suffer from insufficient consideration of latent features in circRNA-disease interactions. RESULTS: In this study, a multilayer attention neural graph-based collaborative filtering (MLNGCF) is proposed. MLNGCF first enhances multiple biological information with autoencoder as the initial features of circRNAs and diseases. Then, by constructing a central network of different diseases and circRNAs, a multilayer cooperative attention-based message propagation is performed on the central network to obtain the high-order features of circRNAs and diseases. A neural network-based collaborative filtering is constructed to predict the unknown circRNA-disease associations and update the model parameters. Experiments on the benchmark datasets demonstrate that MLNGCF outperforms state-of-the-art methods, and the prediction results are supported by the literature in the case studies. AVAILABILITY AND IMPLEMENTATION: The source codes and benchmark datasets of MLNGCF are available at https://github.com/ABard0/MLNGCF.


Subject(s)
Neural Networks, Computer , RNA, Circular , Machine Learning , Software , Computational Biology/methods
9.
Zhejiang Da Xue Xue Bao Yi Xue Ban ; 53(2): 184-193, 2024 Apr 25.
Article in English, Chinese | MEDLINE | ID: mdl-38562030

ABSTRACT

OBJECTIVES: To investigate the role of m.4435A>G and YARS2 c.572G>T (p.G191V) mutations in the development of essential hypertension. METHODS: A hypertensive patient with m.4435A>G and YARS2 p.G191V mutations was identified from previously collected mitochondrial genome and exon sequencing data. Clinical data were collected, and a molecular genetic study was conducted in the proband and his family members. Peripheral venous blood was collected, and immortalized lymphocyte lines constructed. The mitochondrial transfer RNA (tRNA), mitochondrial protein, adenosine triphosphate (ATP), mitochondrial membrane potential (MMP), and reactive oxygen species (ROS) in the constructed lymphocyte cell lines were measured. RESULTS: Mitochondrial genome sequencing showed that all maternal members carried a highly conserved m.4435A>G mutation. The m.4435A>G mutation might affect the secondary structure and folding free energy of mitochondrial tRNA and change its stability, which may influence the anticodon ring structure. Compared with the control group, the cell lines carrying m.4435A>G and YARS2 p.G191V mutations had decreased mitochondrial tRNA homeostasis, mitochondrial protein expression, ATP production and MMP levels, as well as increased ROS levels (all P<0.05). CONCLUSIONS: The YARS2 p.G191V mutation aggravates the changes in mitochondrial translation and mitochondrial function caused by m.4435A>G through affecting the steady-state level of mitochondrial tRNA and further leads to cell dysfunction, indicating that YARS2 p.G191V and m.4435A>G mutations have a synergistic effect in this family and jointly participate in the occurrence and development of essential hypertension.


Subject(s)
Essential Hypertension , Mutation , RNA, Transfer, Met , Tyrosine-tRNA Ligase , Female , Humans , Male , Essential Hypertension/genetics , Genome, Mitochondrial , Membrane Potential, Mitochondrial/genetics , Mitochondria/genetics , Reactive Oxygen Species/metabolism , RNA, Transfer/genetics , RNA, Transfer, Met/genetics , Tyrosine-tRNA Ligase/genetics
10.
BMC Bioinformatics ; 24(1): 430, 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-37957563

ABSTRACT

BACKGROUND: Antibody-mediated immune responses play a crucial role in the immune defense of the human body. The evolution of bioengineering has driven the progress of antibody-derived drugs, showing promising efficacy in cancer and autoimmune disease therapy. A critical step of this development process is obtaining the affinity between antibodies and their binding antigens. RESULTS: In this study, we introduce a novel sequence-based antigen-antibody affinity prediction method, named DG-Affinity. DG-Affinity uses deep neural networks to efficiently and accurately predict the affinity between antibodies and antigens from sequences, without the need for structural information. The sequences of both the antigen and the antibody are first transformed into embedding vectors by two pre-trained language models, then these embeddings are concatenated and fed into a ConvNeXt framework with a regression task. The results demonstrate the superiority of DG-Affinity over the existing structure-based prediction methods and the sequence-based tools, achieving a Pearson's correlation of over 0.65 on an independent test dataset. CONCLUSIONS: Compared to the baseline methods, DG-Affinity achieves the best performance and can advance the development of antibody design. It is freely available as an easy-to-use web server at https://www.digitalgeneai.tech/solution/affinity.


Subject(s)
Antibodies , Neural Networks, Computer , Humans , Antibody Affinity
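
The DG-Affinity abstract above evaluates a regression model with Pearson's correlation between predicted and measured affinities. The sketch below spells out that metric in plain Python. The input values are toy numbers chosen to give a perfect positive and a perfect negative correlation; they are not affinities from the paper, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy "measured" and "predicted" affinities (illustrative values only).
measured = [1.0, 2.0, 3.0, 4.0]
r_up = pearson_r(measured, [2.0, 4.0, 6.0, 8.0])
r_down = pearson_r(measured, [8.0, 6.0, 4.0, 2.0])
```

Because the coefficient is invariant to linear rescaling, a model can score well on it even when its predictions are offset from the true values, which is why it is often reported alongside an error measure.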
11.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34297803

ABSTRACT

Circular RNAs (circRNAs) interact with RNA-binding proteins (RBPs) to play crucial roles in gene regulation and disease development. Computational approaches have attracted much attention to quickly predict highly potential RBP binding sites on circRNAs using the sequence or structure statistical binding knowledge. Deep learning is one of the popular learning models in this area but usually requires a lot of labeled training data. It would perform unsatisfactorily for the less characterized RBPs with a limited number of known target circRNAs. How to improve the prediction performance for such small-size labeled characterized RBPs is a challenging task for deep learning-based models. In this study, we propose an RBP-specific method iDeepC for predicting RBP binding sites on circRNAs from sequences. It adopts a Siamese neural network consisting of a lightweight attention module and a metric module. We have found that the Siamese neural network effectively enhances the network capability of capturing mutual information between circRNAs with pairwise metric learning. To further deal with the small-sample size problem, we have performed the pretraining using available labeled data from other RBPs and also demonstrate the efficacy of this transfer-learning pipeline. We comprehensively evaluated iDeepC on the benchmark datasets of RBP-binding circRNAs, and the results suggest that iDeepC achieves promising results on the poorly characterized RBPs. The source code is available at https://github.com/hehew321/iDeepC.


Subject(s)
RNA, Circular/metabolism , RNA-Binding Proteins/metabolism , Binding Sites , Computational Biology/methods , Neural Networks, Computer
12.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32808039

ABSTRACT

RNA-binding protein (RBP) is a class of proteins that bind to and accompany RNAs in regulating biological processes. An RBP may have multiple target RNAs, and its aberrant expression can cause multiple diseases. Methods have been designed to predict whether a specific RBP can bind to an RNA and the position of the binding site using binary classification model. However, most of the existing methods do not take into account the binding similarity and correlation between different RBPs. While methods employing multiple labels and Long Short Term Memory Network (LSTM) are proposed to consider binding similarity between different RBPs, the accuracy remains low due to insufficient feature learning and multi-label learning on RNA sequences. In response to this challenge, the concept of RNA-RBP Binding Network (RRBN) is proposed in this paper to provide theoretical support for multi-label learning to identify RBPs that can bind to RNAs. It is experimentally shown that the RRBN information can significantly improve the prediction of unknown RNA-RBP interactions. To further improve the prediction accuracy, we present the novel computational method iDeepMV which integrates multi-view deep learning technology under the multi-label learning framework. iDeepMV first extracts data from the views of amino acid sequence and dipeptide component based on the RNA sequences as the original view. Deep neural network models are then designed for the respective views to perform deep feature learning. The extracted deep features are fed into multi-label classifiers which are trained with the RNA-RBP interaction information for the three views. Finally, a voting mechanism is designed to make comprehensive decision on the results of the multi-label classifiers. Our experimental results show that the prediction performance of iDeepMV, which combines multi-view deep feature learning models with RNA-RBP interaction information, is significantly better than that of the state-of-the-art methods. 
iDeepMV is freely available at http://www.csbio.sjtu.edu.cn/bioinf/iDeepMV for academic use. The code is freely available at http://github.com/uchihayht/iDeepMV.


Subject(s)
Machine Learning , RNA-Binding Proteins/metabolism , Computational Biology/methods , Neural Networks, Computer
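
The iDeepMV abstract above describes combining the outputs of per-view multi-label classifiers through a voting mechanism. The abstract does not give the exact voting rule, so the sketch below assumes a simple per-label majority vote across three views. The function name `majority_vote` and the binary predictions are hypothetical.

```python
def majority_vote(view_predictions):
    """Fuse binary multi-label predictions from several views.

    view_predictions is a list of equal-length 0/1 lists, one per view.
    A label is switched on when more than half of the views predict it.
    """
    n_views = len(view_predictions)
    n_labels = len(view_predictions[0])
    fused = []
    for j in range(n_labels):
        votes = sum(pred[j] for pred in view_predictions)
        fused.append(1 if 2 * votes > n_views else 0)
    return fused


# Hypothetical predictions for four RBP labels from three views of one RNA.
views = [
    [1, 0, 1, 0],  # view 1
    [1, 1, 0, 0],  # view 2
    [0, 0, 1, 0],  # view 3
]
fused = majority_vote(views)
```

Labels 0 and 2 each receive two of three votes and are kept, while label 1 receives only one vote and is dropped. A weighted vote would be a natural variant if some views were known to be more reliable.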
13.
PLoS Comput Biol ; 18(3): e1009986, 2022 03.
Article in English | MEDLINE | ID: mdl-35324898

ABSTRACT

Protein structure alignment algorithms are often time-consuming, resulting in challenges for large-scale protein structure similarity-based retrieval. There is an urgent need for more efficient structure comparison approaches as the number of protein structures increases rapidly. In this paper, we propose an effective graph-based protein structure representation learning method, GraSR, for fast and accurate structure comparison. In GraSR, a graph is constructed based on the intra-residue distance derived from the tertiary structure. Then, deep graph neural networks (GNNs) with a short-cut connection learn graph representations of the tertiary structures under a contrastive learning framework. To further improve GraSR, a novel dynamic training data partition strategy and length-scaling cosine distance are introduced. We objectively evaluate our method GraSR on SCOPe v2.07 and a newly released independent test set from the PDB database with a designed comprehensive performance metric. Compared with other state-of-the-art methods, GraSR achieves about 7%-10% improvement on two benchmark datasets. GraSR is also much faster than alignment-based methods. We dig into the model and observe that the superiority of GraSR is mainly brought by the learned discriminative residue-level and global descriptors. The web-server and source code of GraSR are freely available at www.csbio.sjtu.edu.cn/bioinf/GraSR/ for academic use.


Subject(s)
Neural Networks, Computer , Proteins , Algorithms , Learning , Software
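
The GraSR abstract above builds a graph from distances between residues in the tertiary structure. One common way to do this, shown below, is to connect two residues when their representative atoms lie within a distance cutoff. This is only a sketch of the general idea: the cutoff value, the coordinates and the function name `contact_graph` are assumptions, and the actual GraSR graph construction may differ.

```python
import math


def contact_graph(coords, cutoff):
    """Return undirected edges (i, j), i < j, for residues within cutoff.

    coords holds one 3D point per residue, for example a C-alpha position.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


# Hypothetical coordinates in angstroms for four residues.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (3.8, 3.8, 0.0),
]
edges = contact_graph(coords, cutoff=6.0)  # cutoff chosen for illustration
```

Residues 0 and 2 are 7.6 angstroms apart and stay unconnected, while every other pair falls under the 6.0 cutoff. The resulting edge list is the kind of input a graph neural network layer would consume.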
14.
Nucleic Acids Res ; 49(9): e51, 2021 05 21.
Article in English | MEDLINE | ID: mdl-33577689

ABSTRACT

Knowledge of the interactions between proteins and nucleic acids is the basis of understanding various biological activities and designing new drugs. How to accurately identify the nucleic-acid-binding residues remains a challenging task. In this paper, we propose an accurate predictor, GraphBind, for identifying nucleic-acid-binding residues on proteins based on an end-to-end graph neural network. Considering that binding sites often behave in highly conservative patterns on local tertiary structures, we first construct graphs based on the structural contexts of target residues and their spatial neighborhood. Then, hierarchical graph neural networks (HGNNs) are used to embed the latent local patterns of structural and bio-physicochemical characteristics for binding residue recognition. We comprehensively evaluate GraphBind on DNA/RNA benchmark datasets. The results demonstrate the superior performance of GraphBind over state-of-the-art methods. Moreover, GraphBind is extended to other ligand-binding residue prediction to verify its generalization capability. The web server of GraphBind is freely available at http://www.csbio.sjtu.edu.cn/bioinf/GraphBind/.


Subject(s)
DNA-Binding Proteins/chemistry , Neural Networks, Computer , RNA-Binding Proteins/chemistry , Binding Sites , DNA/chemistry , Protein Binding , Protein Conformation , RNA/chemistry , Software
15.
Bioinformatics ; 37(16): 2308-2316, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33630066

ABSTRACT

MOTIVATION: Long non-coding RNAs (lncRNAs) are generally expressed in a tissue-specific way, and subcellular localizations of lncRNAs depend on the tissues or cell lines in which they are expressed. Previous computational methods for predicting subcellular localizations of lncRNAs do not take this characteristic into account; they train a unified machine learning model for pooled lncRNAs from all available cell lines. It is of importance to develop a cell-line-specific computational method to predict lncRNA locations in different cell lines. RESULTS: In this study, we present an updated cell-line-specific predictor lncLocator 2.0, which trains an end-to-end deep model per cell line, for predicting lncRNA subcellular localization from sequences. We first construct benchmark datasets of lncRNA subcellular localizations for 15 cell lines. Then we learn word embeddings using natural language models, and these learned embeddings are fed into convolutional neural network, long short-term memory and multilayer perceptron to classify subcellular localizations. lncLocator 2.0 achieves varying effectiveness for different cell lines and demonstrates the necessity of training cell-line-specific models. Furthermore, we adopt Integrated Gradients to explain the proposed model in lncLocator 2.0, and find some potential patterns that determine the subcellular localizations of lncRNAs, suggesting that the subcellular localization of lncRNAs is linked to some specific nucleotides. AVAILABILITY AND IMPLEMENTATION: The lncLocator 2.0 is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator2 and the source code can be found at https://github.com/Yang-J-LIN/lncLocator2.

16.
Bioinformatics ; 36(21): 5159-5168, 2021 01 29.
Article in English | MEDLINE | ID: mdl-32692832

ABSTRACT

MOTIVATION: Genetically engineering food crops involves introducing proteins from other species into crop plant species or modifying already existing proteins with gene editing techniques. In addition, newly synthesized proteins can be used as therapeutic protein drugs against diseases. For both research and safety regulation purposes, being able to assess the potential toxicity of newly introduced/synthesized proteins is of high importance. RESULTS: In this study, we present ToxDL, a deep learning-based approach for in silico prediction of protein toxicity from sequence alone. ToxDL consists of (i) a module encompassing a convolutional neural network that has been designed to handle variable-length input sequences, (ii) a domain2vec module for generating protein domain embeddings and (iii) an output module that classifies proteins as toxic or non-toxic, using the outputs of the two aforementioned modules. Independent test results obtained for animal proteins and cross-species transferability results obtained for bacteria proteins indicate that ToxDL outperforms traditional homology-based approaches and state-of-the-art machine-learning techniques. Furthermore, through visualizations based on saliency maps, we are able to verify that the proposed network learns known toxic motifs. Moreover, the saliency maps allow for directed in silico modification of a sequence, thus making it possible to alter its predicted protein toxicity. AVAILABILITY AND IMPLEMENTATION: ToxDL is freely available at http://www.csbio.sjtu.edu.cn/bioinf/ToxDL/. The source code can be found at https://github.com/xypan1232/ToxDL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning , Machine Learning , Neural Networks, Computer , Proteins/genetics , Software
17.
RNA ; 25(12): 1604-1615, 2019 12.
Article in English | MEDLINE | ID: mdl-31537716

ABSTRACT

Circular RNAs (circRNAs), with their crucial roles in gene regulation and disease development, have become rising stars in the RNA world. To understand the regulatory function of circRNAs, many studies focus on the interactions between circRNAs and RNA-binding proteins (RBPs). Recently, the abundant CLIP-seq experimental data has enabled the large-scale identification and analysis of circRNA-RBP interactions, whereas, as far as we know, no computational tool based on machine learning has been proposed yet. We develop CRIP (CircRNAs Interact with Proteins) for the prediction of RBP-binding sites on circRNAs using RNA sequences alone. CRIP consists of a stacked codon-based encoding scheme and a hybrid deep learning architecture, in which a convolutional neural network (CNN) learns high-level abstract features and a recurrent neural network (RNN) learns long dependency in the sequences. We construct 37 data sets including sequence fragments of binding sites on circRNAs, and each set corresponds to an RBP. The experimental results show that the new encoding scheme is superior to the existing feature representation methods for RNA sequences, and the hybrid network outperforms conventional classifiers by a large margin, where both the CNN and RNN components contribute to the performance improvement.


Subject(s)
Binding Sites/physiology , Protein Binding/genetics , RNA, Circular/genetics , RNA, Circular/metabolism , RNA-Binding Proteins/metabolism , Codon/genetics , Computational Biology/methods , Databases, Genetic , Gene Expression Regulation/genetics , Humans , Machine Learning , Neural Networks, Computer , Sequence Analysis, RNA
18.
Bioinformatics ; 36(10): 3018-3027, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32091580

ABSTRACT

MOTIVATION: Knowledge of protein-ligand binding residues is important for understanding the functions of proteins and their interaction mechanisms. From experimentally solved protein structures, how to accurately identify its potential binding sites of a specific ligand on the protein is still a challenging problem. Compared with structure-alignment-based methods, machine learning algorithms provide an alternative flexible solution which is less dependent on annotated homogeneous protein structures. Several factors are important for an efficient protein-ligand prediction model, e.g. discriminative feature representation and effective learning architecture to deal with both the large-scale and severely imbalanced data. RESULTS: In this study, we propose a novel deep-learning-based method called DELIA for protein-ligand binding residue prediction. In DELIA, a hybrid deep neural network is designed to integrate 1D sequence-based features with 2D structure-based amino acid distance matrices. To overcome the problem of severe data imbalance between the binding and nonbinding residues, strategies of oversampling in mini-batch, random undersampling and stacking ensemble are designed to enhance the model. Experimental results on five benchmark datasets demonstrate the effectiveness of proposed DELIA pipeline. AVAILABILITY AND IMPLEMENTATION: The web server of DELIA is available at www.csbio.sjtu.edu.cn/bioinf/delia/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning , Proteins , Algorithms , Binding Sites , Computational Biology , Ligands , Protein Binding , Proteins/metabolism
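
Two of the class-imbalance strategies named in the DELIA abstract above, random undersampling and oversampling in mini-batch, can be illustrated with a minimal sketch. This is not the DELIA implementation: the function names, the `neg_per_pos` ratio, the half-positive batch composition, and the 0/1 label encoding are all assumptions made for illustration.

```python
import random


def random_undersample(labels, neg_per_pos=1.0, seed=0):
    """Return shuffled indices of all positives plus a random subset of negatives.

    Hypothetical helper: `neg_per_pos` (negatives kept per positive) is an
    assumed parameter, not a value taken from the abstract.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(len(pos) * neg_per_pos)))
    kept = pos + rng.sample(neg, n_keep)  # negatives drawn without replacement
    rng.shuffle(kept)
    return kept


def oversampled_minibatches(labels, batch_size, n_batches, seed=0):
    """Yield index batches whose first half (before shuffling) is positives.

    Positives and negatives are both drawn with replacement, so the rare
    binding class is repeated across batches. The 50/50 split is an assumption.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    n_pos = batch_size // 2
    for _ in range(n_batches):
        batch = [rng.choice(pos) for _ in range(n_pos)]
        batch += [rng.choice(neg) for _ in range(batch_size - n_pos)]
        rng.shuffle(batch)
        yield batch


# Toy data: 5 binding residues (1) among 95 nonbinding residues (0).
labels = [1] * 5 + [0] * 95
kept = random_undersample(labels, neg_per_pos=2.0)
batches = list(oversampled_minibatches(labels, batch_size=8, n_batches=3))
```

Undersampling shrinks the training set once, whereas per-batch oversampling keeps every negative available while ensuring each gradient step sees the minority class; the two can be combined, as the abstract suggests.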
19.
Genomics ; 112(6): 4945-4958, 2020 11.
Article in English | MEDLINE | ID: mdl-32919019

ABSTRACT

Coronary artery disease (CAD) is the most common cardiovascular disease. CAD research has greatly progressed during the past decade. mRNA profiling is a traditional and popular approach for investigating various diseases, including CAD. Compared with mRNA, lncRNA has better stability and thus may serve as a better disease indicator in blood. Investigating potential CAD-related lncRNAs and mRNAs will greatly contribute to the diagnosis and treatment of CAD. In this study, a computational analysis was conducted on patients with CAD by using a comprehensive transcription dataset with combined mRNA and lncRNA expression data. Several machine learning algorithms, including feature selection methods and classification algorithms, were applied to screen for the most CAD-related RNA molecules. Decision rules were also reported to provide a quantitative description of the effect of these RNA molecules on CAD progression. These new findings (CAD-related RNA molecules and rules) can help understand mRNA and lncRNA expression levels in CAD.


Subject(s)
Coronary Artery Disease/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/metabolism , Coronary Artery Disease/metabolism , Gene Expression Profiling , Humans , Machine Learning
20.
Genomics ; 112(3): 2524-2534, 2020 05.
Article in English | MEDLINE | ID: mdl-32045671

ABSTRACT

The development of embryonic cells involves several continuous stages, and some genes are related to embryogenesis. To date, few studies have systematically investigated changes in gene expression profiles during mammalian embryogenesis. In this study, a computational analysis using machine learning algorithms was performed on the gene expression profiles of mouse embryonic cells at seven stages. First, the profiles were analyzed through a Monte Carlo feature selection method to generate a feature list. Second, incremental feature selection was applied to the list by incorporating two classification algorithms: support vector machine (SVM) and repeated incremental pruning to produce error reduction (RIPPER). Through SVM, we extracted several latent gene biomarkers, indicating the stages of embryonic cells, and constructed an optimal SVM classifier that produced a nearly perfect classification of embryonic cells. Furthermore, some interesting rules were extracted by the RIPPER algorithm, suggesting different expression patterns for different stages.


Subject(s)
Embryo, Mammalian/metabolism , Embryonic Development/genetics , Machine Learning , Transcriptome , Animals , Gene Expression Profiling , Mice , Single-Cell Analysis , Support Vector Machine
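
The incremental feature selection step described in the abstract above can be sketched as follows: given a ranked feature list, evaluate a classifier on the top 1, top 2, ... features and keep the prefix that scores best. This is an illustrative sketch, not the authors' pipeline. The ranked list is assumed to be given (the Monte Carlo ranking is not reproduced here), and a leave-one-out nearest-centroid classifier stands in for the SVM so the example needs only the standard library; all function names are hypothetical.

```python
def nearest_centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`.

    Stand-in for the SVM in the abstract (an assumption for illustration).
    """
    correct = 0
    for i in range(len(X)):
        # Accumulate per-class sums from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            vec = [row[f] for f in features]
            if label not in sums:
                sums[label] = [0.0] * len(features)
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
        query = [X[i][f] for f in features]

        def sq_dist(label):
            # Squared Euclidean distance from the query to the class centroid.
            return sum((q - s / counts[label]) ** 2
                       for q, s in zip(query, sums[label]))

        predicted = min(sums, key=sq_dist)
        correct += predicted == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features,
                                  score=nearest_centroid_loo_accuracy):
    """Return the best-scoring prefix of `ranked_features` and its score.

    Ties keep the shorter prefix, because only strict improvements replace
    the current best.
    """
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        s = score(X, y, ranked_features[:k])
        if s > best_score:
            best_k, best_score = k, s
    return ranked_features[:best_k], best_score


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
     [1.0, 4.0], [1.1, 8.0], [0.9, 2.0]]
y = [0, 0, 0, 1, 1, 1]
selected, best = incremental_feature_selection(X, y, ranked_features=[0, 1])
```

Passing a different `score` callable (for example, cross-validated accuracy of an SVM from a machine learning library) would bring the sketch closer to the setup the abstract describes.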