Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 139
Filtrar
Mais filtros

Base de dados
Tipo de documento
Intervalo de ano de publicação
1.
Cell ; 167(5): 1369-1384.e19, 2016 11 17.
Artigo em Inglês | MEDLINE | ID: mdl-27863249

RESUMO

Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases.


Assuntos
Células Sanguíneas/citologia , Doença/genética , Regiões Promotoras Genéticas , Linhagem da Célula , Separação Celular , Cromatina , Elementos Facilitadores Genéticos , Epigenômica , Predisposição Genética para Doença , Estudo de Associação Genômica Ampla , Hematopoese , Humanos , Polimorfismo de Nucleotídeo Único , Locos de Características Quantitativas
2.
Am J Hum Genet ; 111(1): 70-81, 2024 Jan 04.
Artigo em Inglês | MEDLINE | ID: mdl-38091987

RESUMO

Protein-truncating variants (PTVs) near the 3' end of genes may escape nonsense-mediated decay (NMD). PTVs in the NMD-escape region (PTVescs) can cause Mendelian disease but are difficult to interpret given their varying impact on protein function. Previously, PTVesc burden was assessed in an epilepsy cohort, but no large-scale analysis has systematically evaluated these variants in rare disease. We performed a retrospective analysis of 29,031 neurodevelopmental disorder (NDD) parent-offspring trios referred for clinical exome sequencing to identify PTVesc de novo mutations (DNMs). We identified 1,376 PTVesc DNMs and 133 genes that were significantly enriched (binomial p < 0.001). The PTVesc-enriched genes included those with PTVescs previously described to cause dominant Mendelian disease (e.g., SEMA6B, PPM1D, and DAGLA). We annotated ClinVar variants for PTVescs and identified 948 genes with at least one high-confidence pathogenic variant. Twenty-two known Mendelian PTVesc-enriched genes had no prior evidence of PTVesc-associated disease. We found 22 additional PTVesc-enriched genes that are not well established to be associated with Mendelian disease, several of which showed phenotypic similarity between individuals harboring PTVesc variants in the same gene. Four individuals with PTVesc mutations in RAB1A had similar phenotypes including NDD and spasticity. PTVesc mutations in IRF2BP1 were found in two individuals who each had severe immunodeficiency manifesting in NDD. Three individuals with PTVesc mutations in LDB1 all had NDD and multiple congenital anomalies. Using a large-scale, systematic analysis of DNMs, we extend the mutation spectrum for known Mendelian disease-associated genes and identify potentially novel disease-associated genes.


Assuntos
Epilepsia , Transtornos do Neurodesenvolvimento , Humanos , Estudos Retrospectivos , Mutação/genética , Epilepsia/genética , Fenótipo , Transtornos do Neurodesenvolvimento/genética
3.
Am J Hum Genet ; 110(1): 92-104, 2023 01 05.
Artigo em Inglês | MEDLINE | ID: mdl-36563679

RESUMO

Variant interpretation remains a major challenge in medical genetics. We developed Meta-Domain HotSpot (MDHS) to identify mutational hotspots across homologous protein domains. We applied MDHS to a dataset of 45,221 de novo mutations (DNMs) from 31,058 individuals with neurodevelopmental disorders (NDDs) and identified three significantly enriched missense DNM hotspots in the ion transport protein domain family (PF00520). The 37 unique missense DNMs that drive enrichment affect 25 genes, 19 of which were previously associated with NDDs. 3D protein structure modeling supports the hypothesis of function-altering effects of these mutations. Hotspot genes have a unique expression pattern in tissue, and we used this pattern alongside in silico predictors and population constraint information to identify candidate NDD-associated genes. We also propose a lenient version of our method, which identifies 32 hotspot positions across 16 different protein domains. These positions are enriched for likely pathogenic variation in clinical databases and DNMs in other genetic disorders.


Assuntos
Transtornos do Neurodesenvolvimento , Humanos , Domínios Proteicos/genética , Mutação/genética , Transtornos do Neurodesenvolvimento/genética
4.
Brief Bioinform ; 25(3)2024 Mar 27.
Artigo em Inglês | MEDLINE | ID: mdl-38605639

RESUMO

The accurate identification of disease-associated genes is crucial for understanding the molecular mechanisms underlying various diseases. Most current methods focus on constructing biological networks and utilizing machine learning, particularly deep learning, to identify disease genes. However, these methods overlook complex relations among entities in biological knowledge graphs. Such information has been successfully applied in other areas of life science research, demonstrating their effectiveness. Knowledge graph embedding methods can learn the semantic information of different relations within the knowledge graphs. Nonetheless, the performance of existing representation learning techniques, when applied to domain-specific biological data, remains suboptimal. To solve these problems, we construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end knowledge graph completion framework for disease gene prediction using interactional tensor decomposition named KDGene. KDGene incorporates an interaction module that bridges entity and relation embeddings within tensor decomposition, aiming to improve the representation of semantically similar concepts in specific domains and enhance the ability to accurately predict disease genes. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms, whether existing disease gene prediction methods or knowledge graph embedding methods for general domains. Moreover, the comprehensive biological analysis of the predicted results further validates KDGene's capability to accurately identify new candidate genes. This work proposes a scalable knowledge graph completion framework to identify disease candidate genes, from which the results are promising to provide valuable references for further wet experiments. Data and source codes are available at https://github.com/2020MEAI/KDGene.


Assuntos
Disciplinas das Ciências Biológicas , Reconhecimento Automatizado de Padrão , Algoritmos , Aprendizado de Máquina , Semântica
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Artigo em Inglês | MEDLINE | ID: mdl-38340090

RESUMO

MOTIVATION: Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remains challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights on disease genes. RESULTS: We present an overview of the design choices and pitfalls that prove crucial in the application of network propagation methods to GWAS summary statistics. We highlight general trends from the literature, and present benchmark experiments to expand on these insights selecting as case study three diseases and five molecular networks. We verify that the use of gene-level scores based on GWAS P-values offers advantages over the selection of a set of 'seed' disease genes not weighted by the associated P-values if the GWAS summary statistics are of sufficient quality. Beyond that, the size and the density of the networks prove to be important factors for consideration. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.


Assuntos
Estudo de Associação Genômica Ampla , Polimorfismo de Nucleotídeo Único , Humanos , Estudo de Associação Genômica Ampla/métodos , Algoritmos , Redes Reguladoras de Genes , Predisposição Genética para Doença
6.
Brief Bioinform ; 25(5)2024 Jul 25.
Artigo em Inglês | MEDLINE | ID: mdl-39256197

RESUMO

Unraveling the intricate network of associations among microRNAs (miRNAs), genes, and diseases is pivotal for deciphering molecular mechanisms, refining disease diagnosis, and crafting targeted therapies. Computational strategies, leveraging link prediction within biological graphs, present a cost-efficient alternative to high-cost empirical assays. However, while plenty of methods excel at predicting specific associations, such as miRNA-disease associations (MDAs), miRNA-target interactions (MTIs), and disease-gene associations (DGAs), a holistic approach harnessing diverse data sources for multifaceted association prediction remains largely unexplored. The limited availability of high-quality data, as vitro experiments to comprehensively confirm associations are often expensive and time-consuming, results in a sparse and noisy heterogeneous graph, hindering an accurate prediction of these complex associations. To address this challenge, we propose a novel framework called Global-local aware Heterogeneous Graph Contrastive Learning (GlaHGCL). GlaHGCL combines global and local contrastive learning to improve node embeddings in the heterogeneous graph. In particular, global contrastive learning enhances the robustness of node embeddings against noise by aligning global representations of the original graph and its augmented counterpart. Local contrastive learning enforces representation consistency between functionally similar or connected nodes across diverse data sources, effectively leveraging data heterogeneity and mitigating the issue of data scarcity. The refined node representations are applied to downstream tasks, such as MDA, MTI, and DGA prediction. Experiments show GlaHGCL outperforming state-of-the-art methods, and case studies further demonstrate its ability to accurately uncover new associations among miRNAs, genes, and diseases. We have made the datasets and source code publicly available at https://github.com/Sue-syx/GlaHGCL.


Assuntos
Biologia Computacional , Redes Reguladoras de Genes , MicroRNAs , MicroRNAs/genética , Humanos , Biologia Computacional/métodos , Aprendizado de Máquina , Algoritmos , Predisposição Genética para Doença
7.
Brief Bioinform ; 24(3)2023 05 19.
Artigo em Inglês | MEDLINE | ID: mdl-36987781

RESUMO

Identifying disease-gene associations is a fundamental and critical biomedical task towards understanding molecular mechanisms, the diagnosis and treatment of diseases. It is time-consuming and expensive to experimentally verify causal links between diseases and genes. Recently, deep learning methods have achieved tremendous success in identifying candidate genes for genetic diseases. The gene prediction problem can be modeled as a link prediction problem based on the features of nodes and edges of the gene-disease graph. However, most existing researches either build homogeneous networks based on one single data source or heterogeneous networks based on multi-source data, and artificially define meta-paths, so as to learn the network representation of diseases and genes. The former cannot make use of abundant multi-source heterogeneous information, while the latter needs domain knowledge and experience when defining meta-paths, and the accuracy of the model largely depends on the definition of meta-paths. To address the aforementioned challenges above bottlenecks, we propose an end-to-end disease-gene association prediction model with parallel graph transformer network (DGP-PGTN), which deeply integrates the heterogeneous information of diseases, genes, ontologies and phenotypes. DGP-PGTN can automatically and comprehensively capture the multiple latent interactions between diseases and genes, discover the causal relationship between them and is fully interpretable at the same time. We conduct comprehensive experiments and show that DGP-PGTN outperforms the state-of-the-art methods significantly on the task of disease-gene association prediction. Furthermore, DGP-PGTN can automatically learn the implicit relationship between diseases and genes without manually defining meta paths.


Assuntos
Algoritmos , Redes Neurais de Computação , Fenótipo
8.
BMC Bioinformatics ; 25(1): 84, 2024 Feb 27.
Artigo em Inglês | MEDLINE | ID: mdl-38413851

RESUMO

BACKGROUND: Thousands of genes have been associated with different Mendelian conditions. One of the valuable sources to track these gene-disease associations (GDAs) is the Online Mendelian Inheritance in Man (OMIM) database. However, most of the information in OMIM is textual, and heterogeneous (e.g. summarized by different experts), which complicates automated reading and understanding of the data. Here, we used Natural Language Processing (NLP) to make a tool (Gene-Phenotype Association Discovery (GPAD)) that could syntactically process OMIM text and extract the data of interest. RESULTS: GPAD applies a series of language-based techniques to the text obtained from OMIM API to extract GDA discovery-related information. GPAD can inform when a particular gene was associated with a specific phenotype, as well as the type of validation-whether through model organisms or cohort-based patient-matching approaches-for such an association. GPAD extracted data was validated with published reports and was compared with large language model. Utilizing GPAD's extracted data, we analysed trends in GDA discoveries, noting a significant increase in their rate after the introduction of exome sequencing, rising from an average of about 150-250 discoveries each year. Contrary to hopes of resolving most GDAs for Mendelian disorders by now, our data indicate a substantial decline in discovery rates over the past five years (2017-2022). This decline appears to be linked to the increasing necessity for larger cohorts to substantiate GDAs. The rising use of zebrafish and Drosophila as model organisms in providing evidential support for GDAs is also observed. CONCLUSIONS: GPAD's real-time analyzing capacity offers an up-to-date view of GDA discovery and could help in planning and managing the research strategies. In future, this solution can be extended or modified to capture other information in OMIM and scientific literature.


Assuntos
Processamento de Linguagem Natural , Peixe-Zebra , Humanos , Animais , Fenótipo , Bases de Dados Genéticas , Previsões
9.
Brief Bioinform ; 23(6)2022 11 19.
Artigo em Inglês | MEDLINE | ID: mdl-36151740

RESUMO

Drug discovery and development is a complex and costly process. Machine learning approaches are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Of these, those that use Knowledge Graphs (KG) have promise in many tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritization. In a drug discovery KG, crucial elements including genes, diseases and drugs are represented as entities, while relationships between them indicate an interaction. However, to construct high-quality KGs, suitable data are required. In this review, we detail publicly available sources suitable for use in constructing drug discovery focused KGs. We aim to help guide machine learning and KG practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorized according to the primary type of information contained within and are considered based upon what information could be extracted to build a KG. We then present a comparative analysis of existing public drug discovery KGs and an evaluation of selected motivating case studies from the literature. Additionally, we raise numerous and unique challenges and issues associated with the domain and its datasets, while also highlighting key future research directions. We hope this review will motivate KGs use in solving key and emerging questions in the drug discovery domain.


Assuntos
Aprendizado de Máquina , Reconhecimento Automatizado de Padrão , Descoberta de Drogas , Conhecimento , Armazenamento e Recuperação da Informação
10.
Brief Bioinform ; 23(1)2022 01 17.
Artigo em Inglês | MEDLINE | ID: mdl-34953465

RESUMO

Alzheimer's disease (AD) has a strong genetic predisposition. However, its risk genes remain incompletely identified. We developed an Alzheimer's brain gene network-based approach to predict AD-associated genes by leveraging the functional pattern of known AD-associated genes. Our constructed network outperformed existing networks in predicting AD genes. We then systematically validated the predictions using independent genetic, transcriptomic, proteomic data, neuropathological and clinical data. First, top-ranked genes were enriched in AD-associated pathways. Second, using external gene expression data from the Mount Sinai Brain Bank study, we found that the top-ranked genes were significantly associated with neuropathological and clinical traits, including the Consortium to Establish a Registry for Alzheimer's Disease score, Braak stage score and clinical dementia rating. The analysis of Alzheimer's brain single-cell RNA-seq data revealed cell-type-specific association of predicted genes with early pathology of AD. Third, by interrogating proteomic data in the Religious Orders Study and Memory and Aging Project and Baltimore Longitudinal Study of Aging studies, we observed a significant association of protein expression level with cognitive function and AD clinical severity. The network, method and predictions could become a valuable resource to advance the identification of risk genes for AD.


Assuntos
Doença de Alzheimer/genética , Doença de Alzheimer/metabolismo , Encéfalo/metabolismo , Redes Reguladoras de Genes , Predisposição Genética para Doença , Envelhecimento/genética , Perfilação da Expressão Gênica , Humanos , Estudos Longitudinais , Memória , Proteômica , RNA-Seq , Transcriptoma
11.
Brief Bioinform ; 23(5)2022 09 20.
Artigo em Inglês | MEDLINE | ID: mdl-35880623

RESUMO

Adoption of recently developed methods from machine learning has given rise to creation of drug-discovery knowledge graphs (KGs) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, are promising as they provide a more intuitive representation and are suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly due to the underlying data sources that are integrated or due to modelling choices in the construction of the graph, one consequence of which is that certain entities can get topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, resulting in densely connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models as well as predictive tasks. Further, we present various graph perturbation experiments which yield more support to the observation that KGE models can be more influenced by the frequency of entities rather than any biological information encoded within the relations. Our results highlight the importance of data modelling choices, and emphasizes the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.


Assuntos
Aprendizado de Máquina , Reconhecimento Automatizado de Padrão , Conhecimento
12.
Brief Bioinform ; 23(3)2022 05 13.
Artigo em Inglês | MEDLINE | ID: mdl-35275996

RESUMO

MOTIVATION: Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, the mining and effective utilization of the module structure is still challenging in such issues as a disease gene prediction. RESULTS: We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to more effectively predict disease-related genes. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integration of gene rankings is designed in order to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. By a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-arts and its further performance improvement derived from the parameter estimation. CONCLUSIONS: The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of the multiscale module structure and its application in such issues as a disease-gene prediction.


Assuntos
Algoritmos , Biologia Computacional , Biologia Computacional/métodos , Modelos Estatísticos , Proteínas
13.
Brief Bioinform ; 23(1)2022 01 17.
Artigo em Inglês | MEDLINE | ID: mdl-34727570

RESUMO

Brain disease gene identification is critical for revealing the biological mechanism and developing drugs for brain diseases. To enhance the identification of brain disease genes, similarity-based computational methods, especially network-based methods, have been adopted for narrowing down the searching space. However, these network-based methods only use molecular networks, ignoring brain connectome data, which have been widely used in many brain-related studies. In our study, we propose a novel framework, named brainMI, for integrating brain connectome data and molecular-based gene association networks to predict brain disease genes. For the consistent representation of molecular-based network data and brain connectome data, brainMI first constructs a novel gene network, called brain functional connectivity (BFC)-based gene network, based on resting-state functional magnetic resonance imaging data and brain region-specific gene expression data. Then, a multiple network integration method is proposed to learn low-dimensional features of genes by integrating the BFC-based gene network and existing protein-protein interaction networks. Finally, these features are utilized to predict brain disease genes based on a support vector machine-based model. We evaluate brainMI on four brain diseases, including Alzheimer's disease, Parkinson's disease, major depressive disorder and autism. brainMI achieves of 0.761, 0.729, 0.728 and 0.744 using the BFC-based gene network alone and enhances the molecular network-based performance by 6.3% on average. In addition, the results show that brainMI achieves higher performance in predicting brain disease genes compared to the existing three state-of-the-art methods.


Assuntos
Doença de Alzheimer , Conectoma , Transtorno Depressivo Maior , Encéfalo/diagnóstico por imagem , Conectoma/métodos , Humanos , Imageamento por Ressonância Magnética/métodos
14.
Brief Bioinform ; 23(5)2022 09 20.
Artigo em Inglês | MEDLINE | ID: mdl-35551347

RESUMO

Understanding the biological functions of molecules in specific human tissues or cell types is crucial for gaining insights into human physiology and disease. To address this issue, it is essential to systematically uncover associations among multilevel elements consisting of disease phenotypes, tissues, cell types and molecules, which could pose a challenge because of their heterogeneity and incompleteness. To address this challenge, we describe a new methodological framework, called Graph Local InfoMax (GLIM), based on a human multilevel network (HMLN) that we established by introducing multiple tissues and cell types on top of molecular networks. GLIM can systematically mine the potential relationships between multilevel elements by embedding the features of the HMLN through contrastive learning. Our simulation results demonstrated that GLIM consistently outperforms other state-of-the-art algorithms in disease gene prediction. Moreover, GLIM was also successfully used to infer cell markers and rewire intercellular and molecular interactions in the context of specific tissues or diseases. As a typical case, the tissue-cell-molecule network underlying gastritis and gastric cancer was first uncovered by GLIM, providing systematic insights into the mechanism underlying the occurrence and development of gastric cancer. Overall, our constructed methodological framework has the potential to systematically uncover complex disease mechanisms and mine high-quality relationships among phenotypical, tissue, cellular and molecular elements.


Assuntos
Biologia Computacional , Neoplasias Gástricas , Algoritmos , Biologia Computacional/métodos , Simulação por Computador , Humanos
15.
Pediatr Int ; 66(1): e15760, 2024.
Artigo em Inglês | MEDLINE | ID: mdl-38641939

RESUMO

Diseases are caused by genetic and/or environmental factors. It is important to understand the pathomechanism of monogenic diseases that are caused only by genetic factors, especially prenatal- or childhood-onset diseases for pediatricians. Identifying "novel" disease genes and elucidating how genomic changes lead to human phenotypes would develop new therapeutic approaches for rare diseases for which no fundamental cure has yet been established. Genomic analysis has evolved along with the development of analytical techniques, from Sanger sequencing (first-generation sequencing) to techniques such as comparative genomic hybridization, massive parallel short-read sequencing (using a next-generation sequencer or second-generation sequencer) and long-read sequencing (using a next-next generation sequencer or third-generation sequencer). I have been researching human genetics using conventional and new technologies, together with my mentors and numerous collaborators, and have identified genes responsible for more than 60 diseases. Here, an overview of genomic analyses of monogenic diseases that aims to identify novel disease genes, and several examples using different approaches depending on the disease characteristics are presented.


Assuntos
Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Criança , Hibridização Genômica Comparativa , Fenótipo , Sequenciamento de Nucleotídeos em Larga Escala/métodos
16.
BMC Med ; 21(1): 267, 2023 07 24.
Artigo em Inglês | MEDLINE | ID: mdl-37488529

RESUMO

BACKGROUND: Comorbidities are expected to impact the pathophysiology of heart failure (HF) with preserved ejection fraction (HFpEF). However, comorbidity profiles are usually reduced to a few comorbid disorders. Systems medicine approaches can model phenome-wide comorbidity profiles to improve our understanding of HFpEF and infer associated genetic profiles. METHODS: We retrospectively explored 569 comorbidities in 29,047 HF patients, including 8062 HFpEF and 6585 HF with reduced ejection fraction (HFrEF) patients from a German university hospital. We assessed differences in comorbidity profiles between HF subtypes via multiple correspondence analysis. Then, we used machine learning classifiers to identify distinctive comorbidity profiles of HFpEF and HFrEF patients. Moreover, we built a comorbidity network (HFnet) to identify the main disease clusters that summarized the phenome-wide comorbidity. Lastly, we predicted novel gene candidates for HFpEF by linking the HFnet to a multilayer gene network, integrating multiple databases. To corroborate HFpEF candidate genes, we collected transcriptomic data in a murine HFpEF model. We compared predicted genes with the murine disease signature as well as with the literature. RESULTS: We found a high degree of variance between the comorbidity profiles of HFpEF and HFrEF, while each was more similar to HFmrEF. The comorbidities present in HFpEF patients were more diverse than those in HFrEF and included neoplastic, osteologic and rheumatoid disorders. Disease communities in the HFnet captured important comorbidity concepts of HF patients which could be assigned to HF subtypes, age groups, and sex. Based on the HFpEF comorbidity profile, we predicted and recovered gene candidates, including genes involved in fibrosis (COL3A1, LOX, SMAD9, PTHL), hypertrophy (GATA5, MYH7), oxidative stress (NOS1, GSST1, XDH), and endoplasmic reticulum stress (ATF6). Finally, predicted genes were significantly overrepresented in the murine transcriptomic disease signature providing additional plausibility for their relevance. CONCLUSIONS: We applied systems medicine concepts to analyze comorbidity profiles in a HF patient cohort. We were able to identify disease clusters that helped to characterize HF patients. We derived a distinct comorbidity profile for HFpEF, which was leveraged to suggest novel candidate genes via network propagation. The identification of distinctive comorbidity profiles and candidate genes from routine clinical data provides insights that may be leveraged to improve diagnosis and identify treatment targets for HFpEF patients.


Assuntos
Insuficiência Cardíaca , Medicina , Humanos , Animais , Camundongos , Estudos Retrospectivos , Volume Sistólico , Comorbidade
17.
Brief Bioinform ; 22(5)2021 09 02.
Artigo em Inglês | MEDLINE | ID: mdl-33866352

RESUMO

The prediction of genes related to diseases is important to the study of the diseases due to high cost and time consumption of biological experiments. Network propagation is a popular strategy for disease-gene prediction. However, existing methods focus on the stable solution of dynamics while ignoring the useful information hidden in the dynamical process, and it is still a challenge to make use of multiple types of physical/functional relationships between proteins/genes to effectively predict disease-related genes. Therefore, we proposed a framework of network impulsive dynamics on multiplex biological network (NIDM) to predict disease-related genes, along with four variants of NIDM models and four kinds of impulsive dynamical signatures (IDSs). NIDM is to identify disease-related genes by mining the dynamical responses of nodes to impulsive signals being exerted at specific nodes. By a series of experimental evaluations in various types of biological networks, we confirmed the advantage of multiplex network and the important roles of functional associations in disease-gene prediction, demonstrated superior performance of NIDM compared with four types of network-based algorithms and then gave the effective recommendations of NIDM models and IDS signatures. To facilitate the prioritization and analysis of (candidate) genes associated to specific diseases, we developed a user-friendly web server, which provides three kinds of filtering patterns for genes, network visualization, enrichment analysis and a wealth of external links (http://bioinformatics.csu.edu.cn/DGP/NID.jsp). NIDM is a protocol for disease-gene prediction integrating different types of biological networks, which may become a very useful computational tool for the study of disease-related genes.


Assuntos
Algoritmos , Biologia Computacional/métodos , Redes Reguladoras de Genes , Estudos de Associação Genética/métodos , Predisposição Genética para Doença/genética , Proteínas/genética , Humanos , Mapas de Interação de Proteínas/genética , Proteínas/metabolismo , Reprodutibilidade dos Testes
18.
Brief Bioinform ; 22(4)2021 07 20.
Artigo em Inglês | MEDLINE | ID: mdl-33276376

RESUMO

Disease-gene association through genome-wide association study (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms that correlate with specific diseases needs statistical analysis of associations. Considering the huge number of possible mutations, in addition to its high cost, another important drawback of GWAS analysis is the large number of false positives. Thus, researchers search for more evidence to cross-check their results through different sources. To provide the researchers with alternative and complementary low-cost disease-gene association evidence, computational approaches come into play. Since molecular networks are able to capture complex interplay among molecules in diseases, they become one of the most extensively used data for disease-gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction. We also conduct an empirical analysis on 14 state-of-the-art methods. To summarize, we first elucidate the task definition for disease gene prediction. Secondly, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features and graph representation learning methods. Thirdly, an empirical analysis is conducted to evaluate the performance of the selected methods across seven diseases. We also provide distinguishing findings about the discussed methods based on our empirical analysis. Finally, we highlight potential research directions for future studies on disease gene prediction.


Assuntos
Bases de Dados de Ácidos Nucleicos , Doença/genética , Aprendizado de Máquina , Polimorfismo de Nucleotídeo Único , Estudo de Associação Genômica Ampla , Humanos
19.
Brief Bioinform ; 22(6)2021 11 05.
Artigo em Inglês | MEDLINE | ID: mdl-34013329

RESUMO

The basis of several recent methods for drug repurposing is the key principle that an efficacious drug will reverse the disease molecular 'signature' with minimal side effects. This principle was defined and popularized by the influential 'connectivity map' study in 2006 regarding reversal relationships between disease- and drug-induced gene expression profiles, quantified by a disease-drug 'connectivity score.' Over the past 15 years, several studies have proposed variations in calculating connectivity scores toward improving accuracy and robustness in light of massive growth in reference drug profiles. However, these variations have been formulated inconsistently using various notations and terminologies even though they are based on a common set of conceptual and statistical ideas. Therefore, we present a systematic reconciliation of multiple disease-drug similarity metrics ($ES$, $css$, $Sum$, $Cosine$, $XSum$, $XCor$, $XSpe$, $XCos$, $EWCos$) and connectivity scores ($CS$, $RGES$, $NCS$, $WCS$, $Tau$, $CSS$, $EMUDRA$) by defining them using consistent notation and terminology. In addition to providing clarity and deeper insights, this coherent definition of connectivity scores and their relationships provides a unified scheme that newer methods can adopt, enabling the computational drug-development community to compare and investigate different approaches easily. To facilitate the continuous and transparent integration of newer methods, this article will be available as a live document (https://jravilab.github.io/connectivity_scores) coupled with a GitHub repository (https://github.com/jravilab/connectivity_scores) that any researcher can build on and push changes to.


Assuntos
Biologia Computacional/métodos , Descoberta de Drogas/métodos , Reposicionamento de Medicamentos/métodos , Perfilação da Expressão Gênica/métodos , Farmacogenética/métodos , Algoritmos , Biomarcadores , Regulação da Expressão Gênica/efeitos dos fármacos , Humanos , Transcriptoma
20.
Brief Bioinform ; 22(5)2021 09 02.
Artigo em Inglês | MEDLINE | ID: mdl-33634312

RESUMO

Network-based gene prioritization algorithms are designed to prioritize disease-associated genes based on known ones using biological networks of protein interactions, gene-disease associations (GDAs) and other relationships between biological entities. Various algorithms have been developed based on different mechanisms, but it is not obvious which algorithm is optimal for a specific disease. To address this issue, we benchmarked multiple algorithms for their application in cerebral small vessel disease (cSVD). We curated protein-gene interactions (PGIs) and GDAs from databases and assembled PGI networks and disease-gene heterogeneous networks. A screening of algorithms resulted in seven representative algorithms to be benchmarked. Performance of algorithms was assessed using both leave-one-out cross-validation (LOOCV) and external validation with MEGASTROKE genome-wide association study (GWAS). We found that random walk with restart on the heterogeneous network (RWRH) showed best LOOCV performance, with median LOOCV rediscovery rank of 185.5 (out of 19 463 genes). The GenePanda algorithm had most GWAS-confirmable genes in top 200 predictions, while RWRH had best ranks for small vessel stroke-associated genes confirmed in GWAS. In conclusion, RWRH has overall better performance for application in cSVD despite its susceptibility to bias caused by degree centrality. Choice of algorithms should be determined before applying to specific disease. Current pure network-based gene prioritization algorithms are unlikely to find novel disease-associated genes that are not associated with known ones. The tools for implementing and benchmarking algorithms have been made available and can be generalized for other diseases.


Assuntos
Benchmarking/métodos , Doenças de Pequenos Vasos Cerebrais/genética , Biologia Computacional/métodos , Redes Reguladoras de Genes , Mapas de Interação de Proteínas/genética , Algoritmos , Estudo de Associação Genômica Ampla , Humanos , Família Multigênica , Fenótipo , Fatores de Risco
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA