Results 1 - 20 of 34
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38348746

ABSTRACT

The prediction of molecular interactions is vital for drug discovery. Existing methods often focus on individual prediction tasks and overlook the relationships between them. Additionally, certain tasks encounter limitations due to insufficient data availability, resulting in limited performance. To overcome these limitations, we propose KGE-UNIT, a unified framework that combines knowledge graph embedding (KGE) and multi-task learning to predict drug-target interactions (DTIs) and drug-drug interactions (DDIs) simultaneously and to enhance the performance of each task, even when data availability is limited. Via KGE, we extract heterogeneous features from the drug knowledge graph to enhance the structural features of drug and protein nodes, thereby improving the quality of features. Additionally, employing multi-task learning, we introduce an innovative predictor that comprises a task-aware convolutional neural network-based (CNN-based) encoder and a task-aware attention decoder, which can better fuse multimodal features, capture the contextual interactions of molecular tasks and enhance task awareness, leading to improved performance. Experiments on two imbalanced datasets for DTIs and DDIs demonstrate the superiority of KGE-UNIT, achieving high areas under the receiver operating characteristic curve (AUROCs) (0.942, 0.987) and areas under the precision-recall curve (AUPRs) (0.930, 0.980) for DTIs and high AUROCs (0.975, 0.989) and AUPRs (0.966, 0.988) for DDIs. Notably, on the LUO dataset, where the data were more limited, KGE-UNIT exhibited a more pronounced improvement, with increases of 4.32% in AUROC and 3.56% in AUPR for DTIs and 6.56% in AUROC and 8.17% in AUPR for DDIs. The scalability of KGE-UNIT is demonstrated through its extension to protein-protein interaction prediction; ablation studies and case studies further validate its effectiveness.


Subjects
Learning , Automated Pattern Recognition , Drug Discovery , Area Under Curve , Computer Neural Networks , Drug Interactions
2.
Nucleic Acids Res ; 52(D1): D562-D571, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953313

ABSTRACT

Single-cell proteomics enables the direct quantification of protein abundance at single-cell resolution, providing valuable insights into cellular phenotypes beyond what can be inferred from transcriptome analysis alone. However, the lack of large-scale integrated databases hinders researchers from accessing and exploring single-cell proteomics, impeding the advancement of this field. To fill this gap, we present a comprehensive database, namely the Single-cell Proteomic DataBase (SPDB, https://scproteomicsdb.com/), for general single-cell proteomic data, including antibody-based or mass spectrometry-based single-cell proteomics. Equipped with standardized data processing and a user-friendly web interface, SPDB provides unified data formats for convenient interaction with downstream analysis, and offers not only dataset-level but also protein-level data search and exploration capabilities. To enable detailed presentation of single-cell proteomic data, SPDB also provides a module for visualizing data from the perspectives of cell metadata or protein features. The current version of SPDB encompasses 133 antibody-based single-cell proteomic datasets involving more than 300 million cells and over 800 marker/surface proteins, and 10 mass spectrometry-based single-cell proteomic datasets involving more than 4000 cells and over 7000 proteins. Overall, SPDB is envisioned as a useful resource that will serve the wider research community by providing detailed insights into proteomics from the single-cell perspective.


Subjects
Proteins , Proteomics , Antibodies , Knowledge Bases , Mass Spectrometry , Humans , Animals , Single-Cell Analysis
3.
Am J Physiol Endocrinol Metab ; 326(3): E268-E276, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38197791

ABSTRACT

Glucagon-like peptide 1 (GLP-1) regulates food intake, insulin production, and metabolism. Our recent study demonstrated that pancreatic α-cells-secreted (intraislet) GLP-1 effectively promotes maternal insulin secretion and metabolic adaptation during pregnancy. However, the role of circulating GLP-1 in maternal energy metabolism remains largely unknown. Our study aims to investigate systemic GLP-1 response to pregnancy and its regulatory effect on fetal growth. Using C57BL/6 mice, we observed a gradual decline in maternal blood GLP-1 concentrations. Subsequent administration of the GLP-1 receptor agonist semaglutide (Sem) to dams in late pregnancy revealed a modest decrease in maternal food intake during initial treatment. At the same time, no significant alterations were observed in maternal body weight or fat mass. Notably, Sem-treated dams exhibited a significant decrease in fetal body weight, which persisted even following the restoration of maternal blood glucose levels. Despite no observable change in placental weight, a marked reduction in the placental labyrinth area from Sem-treated dams was evident. Our investigation further demonstrated a substantial decrease in the expression levels of various pivotal nutrient transporters within the placenta, including glucose transporter one and sodium-neutral amino acid transporter one, after Sem treatment. In addition, Sem injection led to a notable reduction in the capillary area, number, and surface densities within the labyrinth. These findings underscore the crucial role of modulating circulating GLP-1 levels in maternal adaptation, emphasizing the inhibitory effects of excessive GLP-1 receptor activation on both placental development and fetal growth. NEW & NOTEWORTHY: Our study reveals a progressive decline in maternal blood glucagon-like peptide 1 (GLP-1) concentration. GLP-1 receptor agonist injection in late pregnancy significantly reduced fetal body weight, even after restoration of maternal blood glucose concentration. GLP-1 receptor activation significantly reduced the placental labyrinth area, expression of some nutrient transporters, and capillary development. Our study indicates that reducing maternal blood GLP-1 levels is a physiological adaptation process that benefits placental development and fetal growth.


Subjects
Blood Glucose , Placenta , Animals , Female , Mice , Pregnancy , Blood Glucose/metabolism , Fetal Development , Fetal Weight , Glucagon-Like Peptide 1/metabolism , Glucagon-Like Peptide-1 Receptor/metabolism , Glucagon-Like Peptide-1 Receptor Agonists , Inbred C57BL Mice , Placenta/metabolism
4.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35443027

ABSTRACT

Predicting the binding of peptides to the major histocompatibility complex (MHC) plays a vital role in immunotherapy for cancer. The success of AlphaFold in applying natural language processing (NLP) algorithms to protein secondary structure prediction has inspired us to explore the possibility of NLP methods in predicting peptide-MHC class I binding. Based on this motivation, we propose MHCRoBERTa, a method built on the RoBERTa pre-training approach, for predicting the binding affinity between MHC class I molecules and peptides. Analysis of the results on a benchmark dataset demonstrates that MHCRoBERTa can outperform other state-of-the-art prediction methods, with an increase in the Spearman rank correlation coefficient (SRCC) value. Notably, our model gave a significant improvement on IC50 value. Our method achieved SRCC and AUC values of 0.785 and 0.817, respectively. Our SRCC value is 14.3% higher than that of NetMHCpan3.0 (the second highest SRCC value among pan-specific methods) and 3% higher than that of MHCflurry (the second highest SRCC value among all methods). The AUC value is also better than that of any other pan-specific method. Moreover, we visualize the multi-head self-attention for the token representation across the layers and heads. Through the analysis of the representation of each layer and head, we can show whether the model has learned the syntax and semantics necessary to perform the prediction task well. All these results demonstrate that our model can accurately predict the peptide-MHC class I binding affinity and that MHCRoBERTa is a powerful tool for screening potential neoantigens for cancer immunotherapy. MHCRoBERTa is available as open-source software on GitHub (https://github.com/FuxuWang/MHCRoBERTa).
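
The SRCC reported in this abstract compares the ordering of predicted and measured affinities rather than their raw values. The following is a minimal sketch of that metric in plain Python. The function names and the four example values are hypothetical and are not taken from MHCRoBERTa or its benchmark.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: measured IC50 values (nM) and hypothetical model scores.
measured_ic50 = [50.0, 500.0, 5000.0, 20000.0]
predicted_score = [0.9, 0.7, 0.4, 0.5]
rho = spearman(measured_ic50, predicted_score)
print(round(rho, 3))
```

Because a lower IC50 means stronger binding, a score that rises with binding strength yields a negative correlation against raw IC50 values in this toy example.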


Subjects
Histocompatibility Antigens Class I , Peptides , Algorithms , Amino Acid Sequence , Histocompatibility Antigens Class I/metabolism , Machine Learning , Peptides/metabolism , Protein Binding
5.
Brief Bioinform ; 22(2): 2141-2150, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32367110

ABSTRACT

Identification of new drug-target interactions (DTIs) is an important but time-consuming and costly step in drug discovery. In recent years, to mitigate these drawbacks, researchers have sought to identify DTIs using computational approaches. However, most existing methods construct drug networks and target networks separately, and then predict novel DTIs based on known associations between the drugs and targets without accounting for associations between drug-protein pairs (DPPs). To incorporate the associations between DPPs into DTI modeling, we built a DPP network based on multiple drugs and proteins in which DPPs are the nodes and the associations between DPPs are the edges of the network. We then propose a novel learning-based framework, 'graph convolutional network (GCN)-DTI', for DTI identification. The model first uses a graph convolutional network to learn the features for each DPP. Second, using the feature representation as an input, it uses a deep neural network to predict the final label. The results of our analysis show that the proposed framework outperforms some state-of-the-art approaches by a large margin.
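
The idea of treating each drug-protein pair as a node and applying graph convolution can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how DPP associations are defined, so the edge rule here (two pairs are linked when they share a drug or a protein) is hypothetical, and the propagation step uses the commonly cited symmetric normalization with self-loops and a ReLU. All names and numbers are invented.

```python
import math
from itertools import combinations


def build_dpp_graph(pairs):
    """Adjacency matrix with self-loops over drug-protein pairs.

    Assumed edge rule (not from the abstract): two pairs are connected
    when they share the same drug or the same protein.
    """
    n = len(pairs)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        if pairs[i][0] == pairs[j][0] or pairs[i][1] == pairs[j][1]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(D^-1/2 A D^-1/2 X W)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate neighbor features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][h] for f in range(n_in)))
             for h in range(n_out)] for i in range(n)]


# Hypothetical toy data: three drug-protein pairs with 2-d input features.
pairs = [("d1", "p1"), ("d1", "p2"), ("d2", "p3")]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(build_dpp_graph(pairs), features, identity)
print(hidden)
```

In this toy graph the first two pairs share drug d1, so their features are averaged, while the isolated third pair keeps its own features. The learned representation would then be passed to a downstream classifier, which the abstract describes as a deep neural network.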


Subjects
Deep Learning , Drug Delivery Systems , Computer Neural Networks , Algorithms , Humans
6.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33554247

ABSTRACT

Interactions between proteins and small molecule metabolites play vital roles in regulating protein functions and controlling various cellular processes. The activities of metabolic enzymes, transcription factors, transporters and membrane receptors can all be mediated through protein-metabolite interactions (PMIs). Compared with the rich knowledge of protein-protein interactions, little is known about PMIs. To the best of our knowledge, no existing database has been developed for collecting PMIs. The recent rapid development of large-scale mass spectrometry analysis of biomolecules has led to the discovery of large numbers of PMIs. Therefore, we developed the PMI-DB to provide a comprehensive and accurate resource of PMIs. A total of 49 785 entries were manually collected in the PMI-DB, corresponding to 23 small molecule metabolites, 9631 proteins and 4 species. Unlike other databases that only provide positive samples, the PMI-DB also provides non-interacting protein-metabolite pairs, which not only reduces the experimental cost for biological experimenters but also facilitates the construction of more accurate algorithms for researchers using machine learning. To show the convenience of the PMI-DB, we developed a deep learning-based method to predict PMIs in the PMI-DB and compared it with several methods. The experimental results show that the area under the curve and area under the precision-recall curve of our method are 0.88 and 0.95, respectively. Overall, the PMI-DB provides a user-friendly interface for browsing the biological functions of metabolites/proteins of interest, and experimental techniques for identifying PMIs in different species, which provides important support for furthering the understanding of cellular processes. The PMI-DB is freely accessible at http://easybioai.com/PMIDB.


Subjects
Deep Learning , Escherichia coli/metabolism , Metabolome , Protein Interaction Maps , Proteins/metabolism , Yeasts/metabolism , Animals , Liquid Chromatography , Protein Databases , Humans , Mass Spectrometry , Metabolomics , Mice , User-Computer Interface
7.
BMC Bioinformatics ; 23(Suppl 1): 89, 2022 Mar 07.
Article in English | MEDLINE | ID: mdl-35255810

ABSTRACT

BACKGROUND: Measuring similarity between complex diseases has significant implications for revealing the pathogenesis of diseases and for development in the domain of biomedicine. It is widely agreed that functional associations between disease-related genes and semantic associations can be applied to calculate disease similarity. Currently, more and more studies have demonstrated the profound involvement of non-coding RNA (ncRNA) in the regulation of genome organization and gene expression. Thus, taking ncRNA into account can be useful in measuring disease similarities. However, existing methods ignore the regulatory functions of ncRNA in biological processes. In this study, we propose a novel deep-learning method to deduce disease similarity. RESULTS: In this article, we propose a novel method, ImpAESim, a framework that integrates the embedding of multiple networks to learn compact feature representations and calculate disease similarity. We first use three different disease-related information networks to build a heterogeneous network. After a network diffusion process, random walk with restart (RWR), a compact feature learning model composed of a classic autoencoder (AE) and an improved AE model is used to extract constraints and low-dimensional feature representations. We finally obtain an accurate and low-dimensional feature representation of diseases, and then employ the cosine distance as the measure of disease similarity. CONCLUSION: ImpAESim focuses on extracting a low-dimensional vector representation of features based on ncRNA regulation and the gene-gene interaction network. Our method can significantly reduce the calculation bias resulting from the sparse disease associations that are derived from semantic associations.
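
The final step described above, comparing low-dimensional disease vectors with a cosine measure, can be illustrated with a short sketch. The embeddings below are invented three-dimensional vectors, not outputs of ImpAESim, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance, defined here as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Hypothetical low-dimensional disease embeddings.
embeddings = {
    "disease_a": [1.0, 0.0, 1.0],
    "disease_b": [1.0, 0.0, 1.0],
    "disease_c": [0.0, 1.0, 0.0],
}
sim_ab = cosine_similarity(embeddings["disease_a"], embeddings["disease_b"])
sim_ac = cosine_similarity(embeddings["disease_a"], embeddings["disease_c"])
print(sim_ab, sim_ac)
```

Identical vectors give a similarity close to 1 (distance close to 0), and orthogonal vectors give a similarity of 0.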


Subjects
Gene Regulatory Networks , Long Noncoding RNA , Long Noncoding RNA/genetics , Untranslated RNA/genetics
8.
BMC Bioinformatics ; 23(Suppl 1): 88, 2022 Mar 07.
Article in English | MEDLINE | ID: mdl-35255808

ABSTRACT

BACKGROUND: Drug-drug interactions (DDIs) are the reactions between drugs. They are classified into three types: synergistic, antagonistic and no reaction. The prediction of DDI-associated events, a rapidly developing technology, is receiving increasing attention and finding more applications in drug development and disease diagnosis. In this work, we study not only whether two drugs interact, but also the specific interaction types. We propose a learning-based method that uses convolutional neural networks to learn feature representations and predict DDIs. RESULTS: In this paper, we propose a novel algorithm using a CNN architecture, named CNN-DDI, to predict drug-drug interactions. First, we extract feature interactions from drug categories, targets, pathways and enzymes as feature vectors and employ the Jaccard similarity as the measure of drug similarity. Then, based on the representation of features, we build a new convolutional neural network as the DDI predictor. CONCLUSION: The experimental results indicate that drug categories are effective as a new feature type for the CNN-DDI method. Using multiple features is more informative and more effective than using a single feature. It can be concluded that CNN-DDI outperforms other existing algorithms on the task of predicting DDIs.
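
The Jaccard similarity mentioned above is the size of the intersection of two feature sets divided by the size of their union. A minimal sketch follows, assuming each drug is described by a set of feature identifiers such as targets. The drug names, target identifiers and helper names are hypothetical, and how CNN-DDI actually assembles its feature vectors is not specified in the abstract.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_row(drug, features):
    """Similarity of one drug to every drug, in sorted name order."""
    return [jaccard(features[drug], features[other]) for other in sorted(features)]


# Hypothetical target sets for three drugs.
drug_targets = {
    "drug_x": {"P1", "P2", "P3"},
    "drug_y": {"P2", "P3", "P4"},
    "drug_z": {"P9"},
}
row_x = similarity_row("drug_x", drug_targets)
print(row_x)
```

A row of this kind, computed per feature type, is one plausible way to turn set-valued drug annotations into a fixed-length numeric vector for a neural network.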


Subjects
Algorithms , Computer Neural Networks , Drug Development , Drug Interactions , Research Design
9.
BMC Genomics ; 23(Suppl 1): 269, 2022 Apr 06.
Article in English | MEDLINE | ID: mdl-35387615

ABSTRACT

BACKGROUND: In biological systems, metabolomics not only contributes to the discovery of metabolic signatures for disease diagnosis, but also helps to illustrate the underlying molecular mechanisms that cause disease. Therefore, identification of disease-related metabolites is of great significance for comprehensively understanding the pathogenesis of diseases and improving clinical medicine. RESULTS: In this paper, we propose a disease and literature driven metabolism prediction model (DLMPM) to identify the potential associations between metabolites and diseases based on a latent factor model. We build the disease glossary with disease terms from different databases and an association matrix based on the mapping between diseases and metabolites. The similarity of diseases and metabolites is used to complete the association matrix. Finally, we predict potential associations between metabolites and diseases based on the matrix decomposition method. In total, 1,406 direct associations between diseases and metabolites are found. There are 119,206 unknown associations between diseases and metabolites predicted with a coverage rate of 80.88%. Subsequently, we extract training sets and testing sets based on data increment from the database of disease-related metabolites and assess the performance of DLMPM on 19 diseases. As a result, DLMPM is proven to be successful in predicting potential metabolic signatures for human diseases with an average AUC value of 82.33%. CONCLUSION: In this paper, a computational model is proposed for exploring metabolite-disease pairs, and it performs well in predicting potential metabolites related to diseases, as shown through adequate validation. The results show that DLMPM performs better in prioritizing candidate disease-related metabolites than previous methods and would be helpful for researchers in revealing more information about human diseases.


Subjects
Metabolomics , Publications , Computational Biology/methods , Factual Databases , Humans , Metabolomics/methods
10.
Bioinformatics ; 37(20): 3642-3644, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33830205

ABSTRACT

SUMMARY: JavaScript-based Circos libraries have been widely implemented to generate interactive Circos plots in web applications. However, these libraries require either local installation, which requires the compilation of extra libraries, or extra data processing procedures to prepare input and configuration for each track of plot, which limits the utility and capability of integration with powerful R packages. In this report, we present interacCircos, an R package for creating interactive Circos plots through the integration of JavaScript-based libraries. interacCircos can simply and flexibly implement 14 track-plot functions and 7 auxiliary functions for presenting large-scale genomic data in interactive Circos plots. AVAILABILITY AND IMPLEMENTATION: InteracCircos and its manual are freely available at https://github.com/mrcuizhe/interacCircos under the GPL license. The online documentation is available at https://mrcuizhe.github.io/interacCircos_documentation/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
Bioinformatics ; 37(20): 3647-3649, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33963826

ABSTRACT

SUMMARY: Circular consensus sequencing reads are promising for the comprehensive detection of structural variants (SVs). However, alignment-based SV calling pipelines are computationally intensive due to the generation of complete read alignments and their post-processing. Herein, we propose a SKeleton-based analysis toolkit for Structural Variation detection (SKSV). Benchmarks on real and simulated datasets demonstrate that SKSV is an order of magnitude faster than state-of-the-art SV calling approaches; moreover, it achieves higher F1 scores for various types of SVs. AVAILABILITY AND IMPLEMENTATION: SKSV is available from https://github.com/ydLiu-HIT/SKSV. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
BMC Bioinformatics ; 21(Suppl 16): 559, 2020 Dec 16.
Article in English | MEDLINE | ID: mdl-33323099

ABSTRACT

BACKGROUND: Millions of people are suffering from cancers, but accurate early diagnosis and effective treatment remain difficult for all doctors. Common treatments for cancer include surgery, radiotherapy and chemotherapy. However, they are all very harmful to patients. Recently, anticancer peptides (ACPs) have been discovered as a potential way to treat cancer. Since ACPs are natural biologics, they are safer than other methods. However, finding ACPs experimentally is expensive, so we propose a new machine learning method to identify ACPs. RESULTS: First, we extracted the features of ACPs from two aspects: the sequence and the chemical characteristics of amino acids. For sequence, the average composition of the 20 amino acids was extracted. For chemical characteristics, we classified amino acids into six groups based on the patterns of hydrophobic and hydrophilic residues. Then, a deep belief network was used to encode the features of ACPs. Finally, we proposed Random Relevance Vector Machines to identify the true ACPs. We call this method 'DRACP' and tested its performance on two independent datasets. Its AUC and AUPR are higher than 0.9 on both datasets. CONCLUSION: We developed a novel method named 'DRACP' and compared it with some traditional methods. The cross-validation results showed its effectiveness in identifying ACPs.
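
The sequence feature described above can be read as the fraction of each of the 20 standard amino acids in a peptide. The sketch below follows that reading, which is an assumption about what the abstract means by "composition". The function name and the example peptide are hypothetical, and the six-group chemical encoding is not reproduced because the abstract does not list the groups.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def amino_acid_composition(peptide):
    """Return a 20-element list with the fraction of each standard residue.

    Characters outside the 20 standard one-letter codes are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for residue in peptide.upper():
        if residue in counts:
            counts[residue] += 1
            valid += 1
    if valid == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / valid for aa in AMINO_ACIDS]


# Hypothetical peptide used only to show the shape of the feature vector.
composition = amino_acid_composition("AAKK")
print(len(composition), composition[AMINO_ACIDS.index("A")])
```

Each peptide, whatever its length, maps to a vector of the same size, which is what allows a fixed-input model to consume it.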


Subjects
Antineoplastic Agents/therapeutic use , Computational Biology/methods , Peptides/therapeutic use , Humans , Machine Learning , Neoplasms/drug therapy , Peptides/chemistry , ROC Curve , Support Vector Machine
13.
Hum Genomics ; 13(Suppl 1): 49, 2019 10 22.
Article in English | MEDLINE | ID: mdl-31639043

ABSTRACT

BACKGROUND: In recent years, with the development of high-throughput genome sequencing technologies, a large amount of genome data has been generated, which has caused widespread concern about data storage and transmission costs. However, how to effectively compress genome sequence data remains an unsolved problem. RESULTS: In this paper, we propose a compression method using machine learning techniques (DeepDNA) for compressing human mitochondrial genome data. The experimental results show the effectiveness of our proposed method compared with other methods on human mitochondrial genome data. CONCLUSIONS: The compression method we propose can be classified as a non-reference-based method, but its compression performance is comparable to that of reference-based methods. Moreover, our method achieves good compression results not only on population genomes with large redundancy, but also on single genomes with small redundancy. The code of DeepDNA is available at https://github.com/rongjiewang/DeepDNA.


Subjects
Data Compression , Mitochondrial Genome , Machine Learning , Algorithms , Base Sequence , Genetic Databases , Humans , Genetic Models , Computer Neural Networks
14.
BMC Bioinformatics ; 20(Suppl 16): 582, 2019 Dec 02.
Article in English | MEDLINE | ID: mdl-31787106

ABSTRACT

BACKGROUND: Over the past decades, a large number of long non-coding RNAs (lncRNAs) have been identified. Growing evidence has indicated that the mutation and dysregulation of lncRNAs play a critical role in the development of many complex human diseases. Consequently, identifying potential disease-related lncRNAs is an effective means to improve the quality of disease diagnostics and treatment, which is the motivation of this work. Here, we propose a computational model (LncDisAP) for potential disease-related lncRNA identification based on multiple biological datasets. First, the associations between lncRNAs and different data sources are collected from different databases. With these data sources as dimensions, we calculate the functional associations between lncRNAs by the recommendation strategy of collaborative filtering. Subsequently, a disease-associated lncRNA functional network is built with functional similarities between lncRNAs as the weights. Ultimately, potential disease-related lncRNAs can be identified based on ranked scores derived by random walk with restart (RWR). Then, training sets and testing sets are extracted from two different versions of a disease-lncRNA dataset to assess the performance of LncDisAP on 54 diseases. RESULTS: A lncRNA functional network is built based on the proposed computational model, and it contains 66,060 associations among 364 lncRNAs associated with 182 diseases in total. We extract 218 known disease-lncRNA pairs associated with 54 diseases to assess the network. As a result, the average AUC (area under the receiver operating characteristic curve) of LncDisAP is 78.08%. CONCLUSION: In this article, a computational model integrating multiple lncRNA-related biological datasets is proposed for identifying potential disease-related lncRNAs. The result shows that LncDisAP is successful in predicting novel disease-related lncRNA signatures. In addition, with several common cancers taken as case studies, we found some unknown lncRNAs that could be associated with these diseases through our network. These results suggest that this method can be helpful in improving the quality of disease diagnostics and treatment.
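
The ranking step named in this abstract, random walk with restart, can be sketched on a small weighted network. The update rule below is the standard textbook form, p(t+1) = (1 - r) W p(t) + r p(0), with W column-normalized. The restart probability, the four-node network and its weights are hypothetical, and the abstract does not state which parameters LncDisAP uses.

```python
def column_normalize(weights):
    """Scale each column of a square matrix so that it sums to 1."""
    n = len(weights)
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    return [[weights[i][j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n)] for i in range(n)]


def random_walk_with_restart(weights, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence."""
    n = len(weights)
    w = column_normalize(weights)
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)  # the walker restarts uniformly over the seeds
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical similarity-weighted network: a chain 0 - 1 - 2 - 3.
network = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.2],
    [0.0, 0.0, 0.2, 0.0],
]
scores = random_walk_with_restart(network, seeds=[0])
ranking = sorted((i for i in range(4) if i != 0), key=lambda i: -scores[i])
print(ranking)
```

Nodes known to be associated with a disease act as seeds, and the remaining nodes are ranked by their steady-state scores. In this chain the scores fall with distance from the seed.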


Subjects
Algorithms , Computational Biology/methods , Computer Simulation , Genetic Databases , Disease/genetics , Gene Expression Regulation , Long Noncoding RNA/genetics , Area Under Curve , Gene Regulatory Networks , Humans , ROC Curve
15.
BMC Bioinformatics ; 20(Suppl 18): 570, 2019 Nov 25.
Article in English | MEDLINE | ID: mdl-31760934

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) imposes a heavy burden on society and every family. Therefore, diagnosing AD in advance and discovering new drug targets are crucial, and both could be achieved by identifying AD-related proteins. Because biological experiments are time-consuming and costly, researchers have turned to developing more advanced algorithms to identify AD-related proteins. RESULTS: First, we proposed the hypothesis that "similar diseases share similar related proteins". Therefore, five similarity calculation methods are introduced to find other diseases that are similar to AD. Then, these diseases' related proteins are obtained from public data sets. Finally, these proteins serve as features of each disease and are used to map their similarity to AD. We developed a novel method, 'LRRGD', which combines Logistic Regression (LR) and Gradient Descent (GD) and borrows the idea of Random Forest (RF). LR is introduced to regress features to similarities. Borrowing the idea of RF, hundreds of LR models are built by randomly selecting 40 features (proteins) each time. Here, GD is introduced to find the optimal result. To avoid getting trapped in a local optimum, a good initial value is selected using some known AD-related proteins. Finally, 376 proteins are found to be related to AD. CONCLUSION: Of the 376 proteins, 308 are novel. Three case studies demonstrate our method's effectiveness. These 308 proteins could give researchers a basis for biological experiments that help the treatment and diagnosis of AD.


Subjects
Alzheimer Disease/metabolism , Computational Biology/methods , Proteins/metabolism , Algorithms , Alzheimer Disease/diagnosis , Alzheimer Disease/genetics , Protein Databases , Humans , Proteins/genetics
16.
BMC Bioinformatics ; 20(Suppl 18): 574, 2019 Nov 25.
Article in English | MEDLINE | ID: mdl-31760947

ABSTRACT

BACKGROUND: As the terminal products of cellular regulatory processes, functionally related metabolites have a close relationship with complex diseases, and are often associated with the same or similar diseases. Therefore, the identification of disease-related metabolites plays a critical role in comprehensively understanding the pathogenesis of disease and in improving clinical medicine. Considering that a large number of metabolic markers of diseases remain to be explored, we propose a computational model to identify potential disease-related metabolites based on functional relationships and literature-derived scores between metabolites. First, after obtaining associations between metabolites and diseases from the Human Metabolome Database, we calculate the similarities of metabolites using a modified collaborative filtering recommendation strategy that exploits the similarities between diseases. Next, a disease-associated metabolite network (DMN) is built with similarities between metabolites as weights. To improve the ability to identify disease-related metabolites, we introduce text mining scores from an existing database of chemicals and proteins into the DMN and build a new disease-associated metabolite network (FLDMN) by fusing functional associations and literature scores. Finally, we apply random walk with restart (RWR) on this network to predict candidate metabolites related to diseases. RESULTS: We construct the disease-associated metabolite network and its improved network (FLDMN) with 245 diseases, 587 metabolites and 28,715 disease-metabolite associations. Subsequently, we extract training sets and testing sets from two different versions of the Human Metabolome Database and assess the performance of DMN and FLDMN on 19 diseases, respectively. As a result, the average AUC (area under the receiver operating characteristic curve) of DMN is 64.35%. As a further improved network, FLDMN is proven to be successful in predicting potential metabolic signatures for 19 diseases with an average AUC value of 76.03%. CONCLUSION: In this paper, a computational model is proposed for exploring metabolite-disease pairs, and it performs well in predicting potential metabolites related to diseases, as shown through adequate validation. This result suggests that integrating literature and functional associations can be an effective way to construct a disease-associated metabolite network for prioritizing candidate disease-related metabolites.


Subjects
Computational Biology/methods , Metabolome , Algorithms , Computer Simulation , Data Mining , Factual Databases , Humans , Publications/statistics & numerical data , ROC Curve
17.
BMC Bioinformatics ; 19(Suppl 5): 116, 2018 04 11.
Article in English | MEDLINE | ID: mdl-29671398

ABSTRACT

BACKGROUND: Metabolites disrupted by an abnormal state of the human body are regarded as effects of diseases. In comparison with causes of diseases such as genes, these markers are easier to capture for the prevention and diagnosis of metabolic diseases. Currently, a large number of metabolic markers of diseases remain to be explored, which motivated this work. METHODS: The existing metabolite-disease associations were extracted from the Human Metabolome Database (HMDB) using a text mining tool, NCBO Annotator, as prior knowledge. Next, we calculated the similarity of each pair of metabolites based on the similarity of their disease sets. Then, the similarities of all metabolite pairs were used to construct a weighted metabolite association network (WMAN). Subsequently, the network was used to predict novel metabolic markers of diseases using random walk. RESULTS: In total, 604 metabolites and 228 diseases were extracted from HMDB. From the 604 metabolites, 453 metabolites were selected to construct the WMAN, where each metabolite is a node and the similarity of two metabolites is the weight of the edge linking them. The performance of the network was validated using the leave-one-out method. As a result, a high area under the receiver operating characteristic curve (AUC) (0.7048) was achieved. Further case studies identifying novel metabolites of diabetes mellitus were validated against recent studies. CONCLUSION: In this paper, we presented a novel method for prioritizing metabolite-disease pairs. The superior performance validates its reliability for exploring novel metabolic markers of diseases.
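
The AUC used to validate this network equals the probability that a randomly chosen true association receives a higher score than a randomly chosen non-association, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The scores and labels are invented and only illustrate the calculation, not the leave-one-out protocol of the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0  # positive ranked above negative
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical prediction scores; label 1 marks a held-out known association.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
example_auc = auc(example_scores, example_labels)
print(example_auc)
```

Three of the four positive-negative pairs are ordered correctly here, so the value is 0.75. A value of 0.5 corresponds to random ranking.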


Subjects
Algorithms , Disease , Metabolome , Data Analysis , Databases, Factual , Humans , Probability , Reproducibility of Results
19.
Bioinformatics ; 31(14): 2262-8, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25788626

ABSTRACT

MOTIVATION: Families with inherited diseases are widely used in studies of Mendelian and complex diseases. Owing to advances in high-throughput sequencing technologies, family genome sequencing is becoming increasingly prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, because of the complex genetic relationships and high similarity among the genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize family genome variants and their functions together with pedigree information remains a critical challenge. RESULTS: We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization of family genomes. The FGB visualizes family genomes at both the individual level and the variant level by integrating genome data with pedigree information. Family genome analyses, including determination of the parental origin of variants, detection of de novo mutations, and identification of potential recombination events and identical-by-descent segments, can be performed flexibly. Diverse annotations of the family genome variants, such as dbSNP membership, linkage disequilibrium, genes, variant effects and potential phenotypes, are displayed as well. Moreover, the FGB can automatically search for de novo mutations and compound heterozygous variants in a selected individual, and guide investigators to high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. AVAILABILITY AND IMPLEMENTATION: The FGB is available at http://mlg.hit.edu.cn/FGB/.
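
Two of the trio analyses named above, de novo detection and parental origin, can be sketched at a single site as follows. This is a deliberately naive illustration and not the FGB's method: genotypes are assumed to be error-free pairs of alleles, and the function names are hypothetical.

```python
def is_de_novo_candidate(child, father, mother):
    """True if the child carries an allele seen in neither parent.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    """
    parental = set(father) | set(mother)
    return any(allele not in parental for allele in child)


def parental_origin(allele, father, mother):
    """Naive origin call for one child allele at one site."""
    in_father = allele in father
    in_mother = allele in mother
    if in_father and not in_mother:
        return "paternal"
    if in_mother and not in_father:
        return "maternal"
    if in_father and in_mother:
        return "ambiguous"          # both parents could have transmitted it
    return "de novo candidate"      # absent from both parents


# Hypothetical trio genotypes at three sites: (child, father, mother).
sites = {
    "site1": (("A", "G"), ("A", "A"), ("A", "A")),
    "site2": (("A", "G"), ("A", "G"), ("A", "A")),
    "site3": (("C", "T"), ("C", "T"), ("C", "T")),
}
for name, (child, father, mother) in sites.items():
    origins = [parental_origin(a, father, mother) for a in child]
    print(name, is_de_novo_candidate(child, father, mother), origins)
```

A practical caller would also need to account for genotype quality, read depth and sequencing error, which this sketch ignores.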


Subjects
Genome, Human , Pedigree , Software , Computer Graphics , Genetic Variation , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation
20.
Nucleic Acids Res ; 42(Web Server issue): W192-7, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24799434

ABSTRACT

Advances in high-throughput sequencing technologies have brought us into the era of individual genomes. Projects such as the 1000 Genomes Project have made individual genome sequencing increasingly popular. How to visualize, analyse and annotate individual genomes with knowledge bases to support genome studies and personalized healthcare remains a major challenge. The Personal Genome Browser (PGB) was developed to provide comprehensive functional annotation and visualization of individual genomes based on the genetic-molecular-phenotypic model. Investigators can easily view individual genetic variants, such as single nucleotide variants (SNVs), INDELs and structural variations (SVs), as well as the genomic features and phenotypes associated with them. The PGB highlights potential functional variants using its built-in method or SIFT/PolyPhen2 scores. Moreover, the functional risk of genes can be evaluated by scanning individual genetic variants across the whole genome, a chromosome or a cytoband, based on the functional implications of the variants. Investigators can then navigate to high-risk genes on the scanned genome. The PGB accepts Variant Call Format (VCF) and Genome Variation Format (GVF) files as input. The functional annotation of the input variants is visualized in real time using well-defined symbols and shapes. The PGB is available at http://www.pgbrowser.org/.
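
To make the three variant classes mentioned above concrete, the following minimal sketch sorts the alternate alleles of one VCF data line into SNV, INDEL or SV. It is an illustration under simple assumptions, not the PGB's parser: the function name is hypothetical, the line is assumed to be tab-separated with the standard leading columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), and an allele is treated as an SV when it is symbolic (such as `<DEL>`), uses breakend notation, or the INFO column carries an `SVTYPE=` key.

```python
def classify_vcf_line(line):
    """Classify each ALT allele of one VCF data line.

    Returns a list of (chrom, pos, ref, alt, kind) tuples.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt_field = fields[:5]
    info = fields[7] if len(fields) > 7 else ""
    results = []
    for alt in alt_field.split(","):
        if alt in (".", "*"):
            kind = "other"      # no alternate allele, or spanning deletion
        elif alt.startswith("<") or "[" in alt or "]" in alt or "SVTYPE=" in info:
            kind = "SV"         # symbolic allele, breakend, or SV annotation
        elif len(ref) == 1 and len(alt) == 1:
            kind = "SNV"
        elif len(ref) != len(alt):
            kind = "INDEL"
        else:
            kind = "other"      # e.g. multi-nucleotide substitution
        results.append((chrom, int(pos), ref, alt, kind))
    return results


# Hypothetical example records.
examples = [
    "1\t100\t.\tA\tG\t50\tPASS\t.",
    "1\t200\trs1\tAT\tA\t50\tPASS\t.",
    "2\t300\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=900",
]
for record in examples:
    print(classify_vcf_line(record))
```

Header lines beginning with `#` are not handled here and would need to be skipped before calling the function.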


Subjects
Genetic Variation , Genome, Human , Software , Computer Graphics , Genomics , Humans , Internet