Results 1 - 20 of 81
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38975895

ABSTRACT

Spatial transcriptomics provides valuable insights into gene expression within the native tissue context, effectively merging molecular data with spatial information to uncover intricate cellular relationships and tissue organizations. In this context, deciphering cellular spatial domains becomes essential for revealing complex cellular dynamics and tissue structures. However, current methods encounter challenges in seamlessly integrating gene expression data with spatial information, resulting in less informative representations of spots and suboptimal accuracy in spatial domain identification. We introduce stCluster, a novel method that integrates graph contrastive learning with multi-task learning to refine informative representations for spatial transcriptomic data, consequently improving spatial domain identification. stCluster first leverages graph contrastive learning technology to obtain discriminative representations capable of recognizing spatially coherent patterns. Through jointly optimizing multiple tasks, stCluster further fine-tunes the representations to be able to capture complex relationships between gene expression and spatial organization. Benchmarked against six state-of-the-art methods, the experimental results reveal its proficiency in accurately identifying complex spatial domains across various datasets and platforms, spanning tissue, organ, and embryo levels. Moreover, stCluster can effectively denoise the spatial gene expression patterns and enhance the spatial trajectory inference. The source code of stCluster is freely available at https://github.com/hannshu/stCluster.


Subjects
Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Computational Biology/methods , Algorithms , Humans , Animals , Software , Machine Learning
2.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38221904

ABSTRACT

Identifying the binding affinity between a drug and its target is essential in drug discovery and repurposing. Numerous computational approaches have been proposed for understanding these interactions. However, most existing methods use either the molecular structure information of drugs and targets or the interaction information of drug-target bipartite networks, but not both. They may therefore fail to combine molecule-scale and network-scale features to obtain high-quality representations. In this study, we propose CSCo-DTA, a novel cross-scale graph contrastive learning approach for drug-target binding affinity prediction. The proposed model combines features learned from the molecular scale and the network scale to capture information from both local and global perspectives. We conducted experiments on two benchmark datasets, and the proposed model outperformed existing state-of-the-art methods. The ablation experiment demonstrated the significance and efficacy of the multi-scale features and cross-scale contrastive learning modules in improving prediction performance. Moreover, we applied CSCo-DTA to predict novel potential targets for Erlotinib and validated the predicted targets with molecular docking analysis.


Subjects
Benchmarking , Learning , Molecular Docking Simulation , Drug Delivery Systems , Drug Discovery
3.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36847701

ABSTRACT

Emerging studies have shown that circular RNAs (circRNAs) are involved in a variety of biological processes and play a key role in disease diagnosis, treatment and inference. Although many methods, including traditional machine learning and deep learning, have been developed to predict associations between circRNAs and diseases, the biological function of circRNAs has not been fully exploited. Some methods have explored disease-related circRNAs from different views, but how to efficiently use multi-view data about circRNAs is still not well studied. Therefore, we propose a computational model to predict potential circRNA-disease associations based on collaborative learning with circRNA multi-view functional annotations. First, we extract circRNA multi-view functional annotations and build a circRNA association network for each view to enable effective network fusion. Then, a collaborative deep learning framework for multi-view information is designed to obtain circRNA multi-source information features, making full use of the internal relationships among circRNA multi-view information. We build a network consisting of circRNAs and diseases based on their functional similarity and extract the consistency description information of circRNAs and diseases. Last, we predict potential associations between circRNAs and diseases using a graph autoencoder. Our computational model performs better in predicting candidate disease-related circRNAs than existing ones. Furthermore, case studies on several common diseases, in which the method finds previously unknown circRNAs related to them, show its high practicability. The experiments show that CLCDA can efficiently predict disease-related circRNAs and is helpful for the diagnosis and treatment of human disease.


Subjects
Deep Learning , Interdisciplinary Placement , Humans , RNA, Circular/genetics , Machine Learning , Computational Biology/methods
4.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37903416

ABSTRACT

The emergence of single-cell RNA sequencing (scRNA-seq) technology has revolutionized the identification of cell types and the study of cellular states at a single-cell level. Despite its significant potential, scRNA-seq data analysis is plagued by the issue of missing values. Many existing imputation methods rely on simplistic data distribution assumptions while ignoring the intrinsic gene expression distribution specific to cells. This work presents a novel deep-learning model, named scMultiGAN, for scRNA-seq imputation, which utilizes multiple collaborative generative adversarial networks (GAN). Unlike traditional GAN-based imputation methods that generate missing values based on random noises, scMultiGAN employs a two-stage training process and utilizes multiple GANs to achieve cell-specific imputation. Experimental results show the efficacy of scMultiGAN in imputation accuracy, cell clustering, differential gene expression analysis and trajectory analysis, significantly outperforming existing state-of-the-art techniques. Additionally, scMultiGAN is scalable to large scRNA-seq datasets and consistently performs well across sequencing platforms. The scMultiGAN code is freely available at https://github.com/Galaxy8172/scMultiGAN.


Subjects
Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Cluster Analysis , Exome Sequencing , Data Analysis , Sequence Analysis, RNA , Gene Expression Profiling
5.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34727570

ABSTRACT

Brain disease gene identification is critical for revealing the biological mechanism and developing drugs for brain diseases. To enhance the identification of brain disease genes, similarity-based computational methods, especially network-based methods, have been adopted for narrowing down the searching space. However, these network-based methods only use molecular networks, ignoring brain connectome data, which have been widely used in many brain-related studies. In our study, we propose a novel framework, named brainMI, for integrating brain connectome data and molecular-based gene association networks to predict brain disease genes. For the consistent representation of molecular-based network data and brain connectome data, brainMI first constructs a novel gene network, called brain functional connectivity (BFC)-based gene network, based on resting-state functional magnetic resonance imaging data and brain region-specific gene expression data. Then, a multiple network integration method is proposed to learn low-dimensional features of genes by integrating the BFC-based gene network and existing protein-protein interaction networks. Finally, these features are utilized to predict brain disease genes based on a support vector machine-based model. We evaluate brainMI on four brain diseases, including Alzheimer's disease, Parkinson's disease, major depressive disorder and autism. brainMI achieves of 0.761, 0.729, 0.728 and 0.744 using the BFC-based gene network alone and enhances the molecular network-based performance by 6.3% on average. In addition, the results show that brainMI achieves higher performance in predicting brain disease genes compared to the existing three state-of-the-art methods.


Subjects
Alzheimer Disease , Connectome , Depressive Disorder, Major , Brain/diagnostic imaging , Connectome/methods , Humans , Magnetic Resonance Imaging/methods
6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34545927

ABSTRACT

Quantitative trait locus (QTL) analyses of multiomic molecular traits, such as gene transcription (eQTL), DNA methylation (mQTL) and histone modification (haQTL), have been widely used to infer the functional effects of genome variants. However, QTL discovery is largely restricted by limited study sample sizes, which demand a higher minor allele frequency threshold and thus leave many molecular trait-variant associations missing. This happens prominently in single-cell level molecular QTL studies because of sample availability and cost. A method to solve this problem is urgently needed in order to enhance the discoveries of current molecular QTL studies with small sample sizes. In this study, we present an efficient computational framework called xQTLImp to impute missing molecular QTL associations. In the local-region imputation, xQTLImp uses a multivariate Gaussian model to impute the missing associations by leveraging known association statistics of variants and the surrounding linkage disequilibrium (LD). In the genome-wide imputation, novel procedures are implemented to improve efficiency, including dynamically constructing a reused LD buffer, adopting multiple heuristic strategies and parallel computing. Experiments on various multiomic bulk and single-cell sequencing-based QTL datasets have demonstrated the high imputation accuracy and novel QTL discovery ability of xQTLImp. Finally, a C++ software package is freely available at https://github.com/stormlovetao/QTLIMP.


Subjects
Genome-Wide Association Study , Quantitative Trait Loci , Genome-Wide Association Study/methods , Genotype , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide , Sample Size
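
The multivariate Gaussian imputation step mentioned in the abstract can be illustrated with the conditional-mean formula commonly used for summary-statistic imputation. This is a sketch under assumptions, not necessarily the exact xQTLImp estimator: the symbols z_t (statistic of the untyped variant), z_o (vector of observed statistics) and Sigma (LD correlation matrix) are introduced here for illustration and do not appear in the abstract.

```latex
% Assumption: association z-scores in a local region are jointly Gaussian
% under the null, with covariance given by the LD correlation matrix.
\begin{pmatrix} z_t \\ z_o \end{pmatrix}
\sim \mathcal{N}\!\left( 0,\;
\begin{pmatrix} 1 & \Sigma_{to} \\ \Sigma_{ot} & \Sigma_{oo} \end{pmatrix} \right)

% Conditional mean and variance of the missing statistic given the observed ones:
\hat{z}_t = \Sigma_{to}\,\Sigma_{oo}^{-1}\,z_o ,
\qquad
\operatorname{Var}(z_t \mid z_o) = 1 - \Sigma_{to}\,\Sigma_{oo}^{-1}\,\Sigma_{ot}
```

Under this model, a variant in strong LD with well-measured neighbours has a small conditional variance, so its imputed statistic is more reliable.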
7.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36151714

ABSTRACT

The three-dimensional genome structure plays a key role in cellular function and gene regulation. Single-cell Hi-C (high-resolution chromosome conformation capture) technology can capture genome structure information at the cell level, which provides the opportunity to study how genome structure varies among different cell types. Recently, a few methods have been designed for single-cell Hi-C clustering. In this manuscript, we perform an in-depth benchmark study of available single-cell Hi-C data clustering methods and implement an evaluation system for multiple clustering frameworks based on both human and mouse datasets. We compare eight methods in terms of visualization and clustering performance. Performance is evaluated using four benchmark metrics: adjusted Rand index, normalized mutual information, homogeneity and Fowlkes-Mallows index. Furthermore, we also evaluate the eight methods on the task of separating cells at different stages of the cell cycle based on single-cell Hi-C data.


Subjects
Chromatin , Chromosomes , Humans , Mice , Animals , Cluster Analysis , Genome , Molecular Conformation
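
Two of the four benchmark metrics named in the abstract, the adjusted Rand index and the Fowlkes-Mallows index, can be computed directly from pair counts. The following is a minimal sketch using their standard textbook definitions. The function names are illustrative and are not taken from the benchmark study's code.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_true, labels_pred):
    """Count sample pairs placed together in both labelings, in the
    reference labeling, and in the predicted labeling."""
    contingency = Counter(zip(labels_true, labels_pred))
    same_both = sum(comb(c, 2) for c in contingency.values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    return same_both, same_true, same_pred, comb(len(labels_true), 2)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for the agreement expected by chance."""
    same_both, same_true, same_pred, total = pair_counts(labels_true, labels_pred)
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (same_both - expected) / (max_index - expected)


def fowlkes_mallows_index(labels_true, labels_pred):
    """Geometric mean of pairwise precision and pairwise recall."""
    same_both, same_true, same_pred, _ = pair_counts(labels_true, labels_pred)
    if same_true == 0 or same_pred == 0:
        return 0.0
    return same_both / sqrt(same_true * same_pred)
```

Both metrics ignore the actual label values, so a clustering that recovers the reference groups under different cluster IDs still scores 1.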
8.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36409016

ABSTRACT

MOTIVATION: A symptom-based automatic diagnostic system queries the patient's potential symptoms through continuous interaction with the patient and makes predictions about possible diseases. A few studies use reinforcement learning (RL) to learn the optimal policy from the joint action space of symptoms and diseases. However, existing RL (or non-RL) methods focus on disease diagnosis while ignoring the importance of symptom inquiry. Although these systems have achieved considerable diagnostic accuracy, they are still far below their performance upper bound because of the few turns of interaction with patients and insufficient symptom inquiry performance. To address this problem, we propose a new automatic diagnostic framework called DxFormer, which decouples symptom inquiry and disease diagnosis so that these two modules can be independently optimized. The transition from symptom inquiry to disease diagnosis is parametrically determined by the stopping criteria. In DxFormer, we treat each symptom as a token, and formalize symptom inquiry and disease diagnosis as a language generation model and a sequence classification model, respectively. We use the inverted version of Transformer, i.e. the decoder-encoder structure, to learn the representation of symptoms by jointly optimizing the reinforce reward and cross-entropy loss. RESULTS: We conduct experiments on three real-world medical dialogue datasets, and the experimental results verify the feasibility of increasing diagnostic accuracy by improving symptom recall. Our model overcomes the shortcomings of previous RL-based methods. By decoupling symptom inquiry from the process of diagnosis, DxFormer greatly improves symptom recall and achieves state-of-the-art diagnostic accuracy. AVAILABILITY AND IMPLEMENTATION: Both code and data are available at https://github.com/lemuria-wchen/DxFormer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Language , Humans , Entropy
9.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36579885

ABSTRACT

MOTIVATION: Drug-food interactions (DFIs) occur when some constituents of food affect the bioaccessibility or efficacy of a drug by taking part in its pharmacodynamic and/or pharmacokinetic processes. Many computational methods have achieved remarkable results in link prediction tasks between biological entities, which shows the potential of computational methods for discovering novel DFIs. However, few computational approaches address DFI identification, mainly because of the lack of DFI data. In addition, food is generally made up of a variety of chemical substances, and this complexity makes it difficult to generate accurate feature representations for food. Therefore, there is an urgent need to develop effective computational approaches for learning food feature representations and predicting DFIs. RESULTS: In this article, we first collect DFI data from DrugBank and PubMed to construct two datasets, named DrugBank-DFI and PubMed-DFI, respectively. Based on these two datasets, two DFI networks are constructed. Then, we propose a novel end-to-end graph embedding-based method named DFinder to identify DFIs. DFinder combines node attribute features and topological structure features to learn the representations of drugs and food constituents. In topology space, we adopt a simplified graph convolution network-based method to learn the topological structure features. In feature space, we use a deep neural network to extract attribute features from the original node attributes. The evaluation results indicate that DFinder performs better than other baseline methods. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/23AIBox/23AIBox-DFinder. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Food-Drug Interactions , Neural Networks, Computer , Software
10.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36539203

ABSTRACT

MOTIVATION: In recent years, interest has arisen in using machine learning to improve the efficiency of automatic medical consultation and enhance patient experience. In this article, we propose two frameworks to support automatic medical consultation, namely doctor-patient dialogue understanding and task-oriented interaction. We create a new large medical dialogue dataset with multi-level fine-grained annotations and establish five independent tasks, including named entity recognition, dialogue act classification, symptom label inference, medical report generation and diagnosis-oriented dialogue policy. RESULTS: We report a set of benchmark results for each task, which shows the usability of the dataset and sets a baseline for future studies. AVAILABILITY AND IMPLEMENTATION: Both code and data are available from https://github.com/lemuria-wchen/imcs21. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Benchmarking , Machine Learning , Humans , Referral and Consultation
11.
Brief Bioinform ; 22(2): 2141-2150, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32367110

ABSTRACT

Identification of new drug-target interactions (DTIs) is an important but time-consuming and costly step in drug discovery. In recent years, to mitigate these drawbacks, researchers have sought to identify DTIs using computational approaches. However, most existing methods construct drug networks and target networks separately, and then predict novel DTIs based on known associations between the drugs and targets without accounting for associations between drug-protein pairs (DPPs). To incorporate the associations between DPPs into DTI modeling, we built a DPP network based on multiple drugs and proteins, in which DPPs are the nodes and the associations between DPPs are the edges. We then propose a novel learning-based framework, 'graph convolutional network (GCN)-DTI', for DTI identification. The model first uses a graph convolutional network to learn the features of each DPP. Second, using the feature representation as input, it uses a deep neural network to predict the final label. The results of our analysis show that the proposed framework outperforms some state-of-the-art approaches by a large margin.


Subjects
Deep Learning , Drug Delivery Systems , Neural Networks, Computer , Algorithms , Humans
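
The idea of treating drug-protein pairs as graph nodes and passing their features through a graph convolution can be sketched in a few lines. This is an illustration under stated assumptions, not the GCN-DTI implementation. The edge rule (two pairs are linked when they share a drug or a protein) is a hypothetical choice, since the abstract does not define how DPP associations are built, and the layer is the widely used symmetric-normalized propagation rule.

```python
from math import sqrt


def build_dpp_edges(pairs):
    """Link two drug-protein pairs when they share a drug or a protein.
    Assumption: this rule is for illustration only."""
    edges = set()
    for i, (drug_i, prot_i) in enumerate(pairs):
        for j in range(i + 1, len(pairs)):
            drug_j, prot_j = pairs[j]
            if drug_i == drug_j or prot_i == prot_j:
                edges.add((i, j))
    return edges


def gcn_layer(n_nodes, edges, features, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    # Adjacency matrix with self-loops.
    adj = [[1.0 if i == j else 0.0 for j in range(n_nodes)] for i in range(n_nodes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / sqrt(deg[i] * deg[j]) for j in range(n_nodes)]
            for i in range(n_nodes)]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    aggregated = [[sum(norm[i][k] * features[k][f] for k in range(n_nodes))
                   for f in range(n_in)] for i in range(n_nodes)]
    return [[max(0.0, sum(row[f] * weight[f][h] for f in range(n_in)))
             for h in range(n_out)] for row in aggregated]
```

In a full pipeline, the output rows would be the learned DPP features that a downstream classifier consumes; here the weight matrix is simply supplied by the caller rather than trained.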
12.
Brief Bioinform ; 22(2): 2096-2105, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32249297

ABSTRACT

MOTIVATION: The emergence of abundant biological networks, which benefit from the development of advanced high-throughput techniques, contributes to describing and modeling complex internal interactions among biological entities such as genes and proteins. Multiple networks provide rich information for inferring the function of genes or proteins. To extract functional patterns of genes based on multiple heterogeneous networks, network embedding-based methods, aiming to capture non-linear and low-dimensional feature representation based on network biology, have recently achieved remarkable performance in gene function prediction. However, existing methods do not consider the shared information among different networks during the feature learning process. RESULTS: Taking the correlation among the networks into account, we design a novel semi-supervised autoencoder method to integrate multiple networks and generate a low-dimensional feature representation. Then we utilize a convolutional neural network based on the integrated feature embedding to annotate unlabeled gene functions. We test our method on both yeast and human datasets and compare with three state-of-the-art methods. The results demonstrate the superior performance of our method. We not only provide a comprehensive analysis of the performance of the newly proposed algorithm but also provide a tool for extracting features of genes based on multiple networks, which can be used in the downstream machine learning task. AVAILABILITY: DeepMNE-CNN is freely available at https://github.com/xuehansheng/DeepMNE-CNN. CONTACT: jiajiepeng@nwpu.edu.cn; shang@nwpu.edu.cn; jianye.hao@tju.edu.cn.


Subjects
Deep Learning , Neural Networks, Computer , Algorithms , Gene Regulatory Networks , Genes, Fungal , Humans , Molecular Sequence Annotation , Yeasts/genetics
13.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33517357

ABSTRACT

Accurately identifying potential drug-target interactions (DTIs) is a key step in drug discovery. Although many related experimental studies have been carried out to identify DTIs in the past few decades, biological experiment-based DTI identification is still time-consuming and expensive. Therefore, it is of great significance to develop effective computational methods for identifying DTIs. In this paper, we develop a novel end-to-end learning-based framework based on heterogeneous graph convolutional networks for DTI prediction, called end-to-end graph (EEG)-DTI. Given a heterogeneous network containing multiple types of biological entities (i.e. drug, protein, disease, side-effect), EEG-DTI learns the low-dimensional feature representation of drugs and targets using a graph convolutional network-based model and predicts DTIs based on the learned features. During the training process, EEG-DTI learns the feature representation of nodes in an end-to-end mode. The evaluation test shows that EEG-DTI performs better than existing state-of-the-art methods. The data and source code are available at: https://github.com/MedicineBiology-AI/EEG-DTI.


Subjects
Computer Simulation , Drug Development , Drug Discovery , Machine Learning , Pharmaceutical Preparations/chemistry , Software , Drug-Related Side Effects and Adverse Reactions , Humans , Proteins/chemistry , Proteins/metabolism
14.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33554247

ABSTRACT

Interactions between proteins and small molecule metabolites play vital roles in regulating protein functions and controlling various cellular processes. The activities of metabolic enzymes, transcription factors, transporters and membrane receptors can all be mediated through protein-metabolite interactions (PMIs). Compared with the rich knowledge of protein-protein interactions, little is known about PMIs. To the best of our knowledge, no existing database has been developed for collecting PMIs. The recent rapid development of large-scale mass spectrometry analysis of biomolecules has led to the discovery of large amounts of PMIs. Therefore, we developed the PMI-DB to provide a comprehensive and accurate resource of PMIs. A total of 49 785 entries were manually collected in the PMI-DB, corresponding to 23 small molecule metabolites, 9631 proteins and 4 species. Unlike other databases that only provide positive samples, the PMI-DB provides non-interaction between proteins and metabolites, which not only reduces the experimental cost for biological experimenters but also facilitates the construction of more accurate algorithms for researchers using machine learning. To show the convenience of the PMI-DB, we developed a deep learning-based method to predict PMIs in the PMI-DB and compared it with several methods. The experimental results show that the area under the curve and area under the precision-recall curve of our method are 0.88 and 0.95, respectively. Overall, the PMI-DB provides a user-friendly interface for browsing the biological functions of metabolites/proteins of interest, and experimental techniques for identifying PMIs in different species, which provides important support for furthering the understanding of cellular processes. The PMI-DB is freely accessible at http://easybioai.com/PMIDB.


Subjects
Deep Learning , Escherichia coli/metabolism , Metabolome , Protein Interaction Maps , Proteins/metabolism , Yeasts/metabolism , Animals , Chromatography, Liquid , Databases, Protein , Humans , Mass Spectrometry , Metabolomics , Mice , User-Computer Interface
15.
Bioinformatics ; 38(16): 3995-4001, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35775965

ABSTRACT

MOTIVATION: A disease diagnosis-oriented dialog system models the interactive consultation procedure as a Markov decision process, and reinforcement learning algorithms are used to solve the problem. Existing approaches usually employ a flat policy structure that treats all symptoms and diseases equally for action making. This strategy works well in simple scenarios where the action space is small; however, its efficiency is challenged in real environments. Inspired by the offline consultation process, we propose to integrate a hierarchical policy structure of two levels into the dialog system for policy learning. The high-level policy consists of a master model that is responsible for triggering a low-level model; the low-level policy consists of several symptom checkers and a disease classifier. The proposed policy structure is capable of dealing with diagnosis problems involving a large number of diseases and symptoms. RESULTS: Experimental results on three real-world datasets and a synthetic dataset demonstrate that our hierarchical framework achieves higher accuracy and symptom recall in disease diagnosis compared with existing systems. We construct a benchmark including datasets and implementations of existing algorithms to encourage follow-up research. AVAILABILITY AND IMPLEMENTATION: The code and data are available from https://github.com/FudanDISC/DISCOpen-MedBox-DialoDiagnosis. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Deep Learning , Markov Chains , Benchmarking
16.
Nucleic Acids Res ; 49(D1): D1413-D1419, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33010177

ABSTRACT

SC2disease (http://easybioai.com/sc2disease/) is a manually curated database that aims to provide a comprehensive and accurate resource of gene expression profiles in various cell types for different diseases. With the development of single-cell RNA sequencing (scRNA-seq) technologies, uncovering cellular heterogeneity of different tissues for different diseases has become feasible by profiling transcriptomes across cell types at the cellular level. In particular, comparing gene expression profiles between different cell types and identifying cell-type-specific genes in various diseases offers new possibilities to address biological and medical questions. However, systematic, hierarchical and vast databases of gene expression profiles in human diseases at the cellular level are lacking. Thus, we reviewed the literature prior to March 2020 for studies that used scRNA-seq to study diseases with human samples, and developed the SC2disease database to summarize all the data by disease, tissue and cell type. SC2disease documents 946 481 entries, corresponding to 341 cell types, 29 tissues and 25 diseases. Each entry in the SC2disease database contains comparisons of differentially expressed genes between different cell types, tissues and disease-related health status. Furthermore, we reanalyzed the gene expression matrices with a unified pipeline to improve the comparability between different studies. For each disease, we also compare cell-type-specific genes with the corresponding genes of lead single nucleotide polymorphisms (SNPs) identified in genome-wide association studies (GWAS) to implicate cell type specificity of the traits.


Subjects
Autism Spectrum Disorder/genetics , Autoimmune Diseases/genetics , Cardiovascular Diseases/genetics , Databases, Factual , Gastrointestinal Diseases/genetics , Neoplasms/genetics , Neurodegenerative Diseases/genetics , Virus Diseases/genetics , Algorithms , Autism Spectrum Disorder/metabolism , Autism Spectrum Disorder/pathology , Autoimmune Diseases/metabolism , Autoimmune Diseases/pathology , Cardiovascular Diseases/metabolism , Cardiovascular Diseases/pathology , Gastrointestinal Diseases/metabolism , Gastrointestinal Diseases/pathology , Gene Expression Profiling , Genetic Heterogeneity , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Internet , Neoplasms/metabolism , Neoplasms/pathology , Neurodegenerative Diseases/metabolism , Neurodegenerative Diseases/pathology , Organ Specificity , Polymorphism, Single Nucleotide , Single-Cell Analysis/methods , Software , Transcriptome , Virus Diseases/metabolism , Virus Diseases/pathology
17.
BMC Genomics ; 23(Suppl 1): 269, 2022 Apr 06.
Article in English | MEDLINE | ID: mdl-35387615

ABSTRACT

BACKGROUND: In biological systems, metabolomics can not only contribute to the discovery of metabolic signatures for disease diagnosis, but is also very helpful in illustrating the underlying molecular disease-causing mechanism. Therefore, identification of disease-related metabolites is of great significance for comprehensively understanding the pathogenesis of diseases and improving clinical medicine. RESULTS: In this paper, we propose a disease and literature driven metabolism prediction model (DLMPM) to identify the potential associations between metabolites and diseases based on a latent factor model. We build the disease glossary with disease terms from different databases and an association matrix based on the mapping between diseases and metabolites. The similarity of diseases and metabolites is used to complete the association matrix. Finally, we predict potential associations between metabolites and diseases based on the matrix decomposition method. In total, 1,406 direct associations between diseases and metabolites are found. There are 119,206 unknown associations between diseases and metabolites predicted with a coverage rate of 80.88%. Subsequently, we extract training sets and testing sets based on data increment from the database of disease-related metabolites and assess the performance of DLMPM on 19 diseases. As a result, DLMPM is shown to be successful in predicting potential metabolic signatures for human diseases, with an average AUC value of 82.33%. CONCLUSION: In this paper, a computational model is proposed for exploring metabolite-disease pairs, and it performs well in predicting potential disease-related metabolites, as shown through adequate validation. The results show that DLMPM performs better in prioritizing candidate disease-related metabolites than previous methods and would be helpful for researchers to reveal more information about human diseases.


Subjects
Metabolomics , Publications , Computational Biology/methods , Databases, Factual , Humans , Metabolomics/methods
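
A latent factor model of the kind the abstract refers to is usually fitted by regularized matrix factorization. The objective below is a generic sketch under assumptions, not the DLMPM formulation: the symbols a_ij (known disease-metabolite association), u_i and v_j (latent vectors), Omega (set of known entries) and lambda (regularization weight) are introduced here for illustration.

```latex
% Assumption: standard regularized matrix factorization over known entries.
\min_{U,V} \sum_{(i,j) \in \Omega} \left( a_{ij} - u_i^{\top} v_j \right)^2
  + \lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)

% Predicted score for an unobserved disease-metabolite pair (i, j):
\hat{a}_{ij} = u_i^{\top} v_j
```

Candidate metabolites for a disease would then be ranked by their predicted scores.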
18.
Plant Physiol ; 186(4): 1786-1799, 2021 08 03.
Article in English | MEDLINE | ID: mdl-34618108

ABSTRACT

The proper biogenesis, morphogenesis, and dynamics of subcellular organelles are essential to their metabolic functions. Conventional techniques for identifying, classifying, and quantifying abnormalities in organelle morphology are largely manual and time-consuming, and require specific expertise. Deep learning has the potential to revolutionize image-based screens by greatly improving their scope, speed, and efficiency. Here, we used transfer learning and a convolutional neural network (CNN) to analyze over 47,000 confocal microscopy images from Arabidopsis wild-type and mutant plants with abnormal division of one of three essential energy organelles: chloroplasts, mitochondria, or peroxisomes. We have built a deep-learning framework, DeepLearnMOR (Deep Learning of the Morphology of Organelles), which can rapidly classify image categories and identify abnormalities in organelle morphology with over 97% accuracy. Feature visualization analysis identified important features used by the CNN to predict morphological abnormalities, and visual clues helped to better understand the decision-making process, thereby validating the reliability and interpretability of the neural network. This framework establishes a foundation for future larger-scale research with broader scopes and greater data set diversity and heterogeneity.


Subjects
Computer-Aided Design , Deep Learning , Neural Networks, Computer , Plants/anatomy & histology , Fluorescence , Organelles , Plant Cells , Reproducibility of Results
19.
Methods ; 192: 77-84, 2021 08.
Article in English | MEDLINE | ID: mdl-32946974

ABSTRACT

Analyzing disease-disease relationships plays an important role in understanding disease mechanisms and finding alternative uses for a drug. A disease is usually the result of an abnormal state of multiple molecular processes. Since biological networks can model the interplay of multiple molecular processes, network-based methods have recently been proposed to uncover disease-disease relationships. Given a disease and a network, the disease can be represented as a subnetwork constructed from the disease genes involved in the given network, called the disease subnetwork. Because it is difficult to learn the feature representation of disease subnetworks, most existing methods are unsupervised and do not use labeled information. To fill this gap, we propose a novel method named SubNet2vec to learn the feature vectors of diseases from their corresponding subnetworks in the biological network. By utilizing the feature representation of the disease subnetwork, we can analyze disease-disease relationships in a supervised fashion. The evaluation results show that the proposed framework outperforms some state-of-the-art approaches by a large margin on disease-disease/disease-drug association prediction. The source code and data are available at https://github.com/MedicineBiology-AI/SubNet2vec.git.
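
The notion of a disease subnetwork used in this abstract can be made concrete with a small example. This is a minimal sketch, assuming an undirected network given as an edge list; the function name `disease_subnetwork` and the gene labels are hypothetical, and the embedding step that SubNet2vec applies to the subnetwork is not shown.

```python
def disease_subnetwork(edges, disease_genes):
    """Return the nodes and edges of the subnetwork induced by the disease genes.

    Only disease genes that actually appear in the network are kept, and an
    edge is kept only when both of its endpoints are disease genes.
    """
    genes = set(disease_genes)
    in_network = {node for edge in edges for node in edge}
    nodes = sorted(genes & in_network)
    sub_edges = [(u, v) for u, v in edges if u in genes and v in genes]
    return nodes, sub_edges


# Hypothetical toy network and disease gene set (placeholder gene labels).
EDGES = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G1", "G4")]
DISEASE_GENES = ["G1", "G2", "G4", "G9"]

nodes, sub_edges = disease_subnetwork(EDGES, DISEASE_GENES)
```

Here the placeholder gene G9 is dropped because it does not occur in the network, and the edges through G3 are dropped because G3 is not a disease gene.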


Subjects
Software , Pharmaceutical Preparations
20.
BMC Bioinformatics ; 22(Suppl 9): 281, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-34433409

ABSTRACT

BACKGROUND: It is important to understand the composition and proportions of cell types in intact tissues, as changes in certain cell types are the underlying cause of disease in humans. Although cell type compositions and ratios can be obtained by single-cell sequencing, single-cell sequencing is currently expensive and cannot be applied in clinical studies involving a large number of subjects. Therefore, it is useful to apply the bulk RNA-Seq dataset and the single-cell RNA dataset to deconvolute and obtain the cell type composition in the tissue. RESULTS: By analyzing the existing cell population prediction methods, we found that most of them need a cell-type-specific gene expression profile as the input signature matrix. However, in real applications, it is not always possible to find an available signature matrix. To solve this problem, we propose a novel method, named DCap, to predict cell abundance. DCap is a deconvolution method based on non-negative least squares. During the non-negative least squares calculation, DCap considers the weights resulting from the measurement noise of bulk RNA-seq and the calculation error of single-cell RNA-seq data, and performs a weighted iterative calculation based on least squares. By weighting the bulk tissue gene expression matrix and the single-cell gene expression matrix, DCap minimizes the measurement error of bulk RNA-Seq and also reduces errors resulting from differences in the number of expressed genes in the same type of cells in different samples. Evaluation tests show that DCap performs better in cell type abundance prediction than existing methods. CONCLUSION: DCap solves the deconvolution problem using weighted non-negative least squares to predict cell type abundance in tissues. DCap has better prediction results and does not require a signature matrix giving the cell-type-specific gene expression profile to be prepared in advance. By using DCap, we can better study the changes in cell proportion in diseased tissues and provide more information on the follow-up treatment of diseases.
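
Weighted non-negative least squares, the optimization problem named in this abstract, can be sketched in a few lines. This is an illustration under stated assumptions rather than DCap itself: it uses plain projected gradient descent as a stand-in solver, fixed weights supplied by the caller instead of DCap's iterative weighting scheme, and hypothetical names (`weighted_nnls`, `to_proportions`) and toy numbers throughout.

```python
def weighted_nnls(signature, bulk, weights, iters=500):
    """Minimize sum_g w_g * (bulk_g - signature_g . x)^2 subject to x >= 0.

    signature: genes x cell types expression matrix (list of rows)
    bulk:      bulk expression value per gene
    weights:   non-negative weight per gene (assumed given)
    """
    n_types = len(signature[0])
    # Safe step size from an upper bound on the gradient's Lipschitz constant.
    lipschitz = 2.0 * sum(w * sum(s * s for s in row)
                          for w, row in zip(weights, signature))
    step = 1.0 / lipschitz
    x = [0.0] * n_types
    for _ in range(iters):
        resid = [b - sum(s * xk for s, xk in zip(row, x))
                 for row, b in zip(signature, bulk)]
        grad = [-2.0 * sum(w * r * row[k]
                           for w, r, row in zip(weights, resid, signature))
                for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xk - step * g) for xk, g in zip(x, grad)]
    return x


def to_proportions(x):
    """Rescale non-negative abundances so that they sum to one."""
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


# Hypothetical toy data: 4 genes, 2 cell types, noise-free bulk mixture.
SIGNATURE = [[10.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 7.0]]
TRUE_PROPORTIONS = [0.3, 0.7]
BULK = [sum(s * p for s, p in zip(row, TRUE_PROPORTIONS)) for row in SIGNATURE]
WEIGHTS = [1.0, 0.5, 1.0, 0.5]

proportions = to_proportions(weighted_nnls(SIGNATURE, BULK, WEIGHTS))
```

With noise-free input the weights do not change the solution; they matter once genes differ in how reliably they are measured, which is the situation the abstract describes.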


Subjects
Gene Expression Profiling , RNA , Humans , RNA/genetics , RNA-Seq , Sequence Analysis, RNA , Exome Sequencing