Results 1 - 20 of 43
1.
bioRxiv ; 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38464086

ABSTRACT

Elucidating gene regulatory networks (GRNs) is a major area of study within plant systems biology. Phenotypic traits are intricately linked to specific gene expression profiles. These expression patterns arise primarily from regulatory connections between sets of transcription factors (TFs) and their target genes. In this study, we integrated publicly available co-expression networks derived from more than 6,000 RNA-seq samples, 283 protein-DNA interaction assays, and 16 million SNPs used to identify expression quantitative trait loci (eQTL), to construct TF-target networks. In total, we analyzed ~4.6M interactions to generate four distinct types of TF-target networks: co-expression, protein-DNA interaction (PDI), trans-expression quantitative trait loci (trans-eQTL), and cis-eQTL combined with PDIs. To improve the functional annotation of TFs based on their target genes, we implemented three different strategies to integrate these four types of networks. We subsequently evaluated the effectiveness of our method using loss-of-function mutants and random networks. The multi-network integration allowed us to identify transcriptional regulators of hormone-, metabolic- and development-related processes. Finally, using the topological properties of the fully integrated network, we identified potentially functionally redundant TF paralogs. Our findings retrieved functions previously documented for numerous TFs and revealed novel functions that are crucial for informing the design of future experiments. The approach described here lays the foundation for the integration of multi-omic datasets in maize and other plant systems.

2.
PLoS Comput Biol ; 20(1): e1011773, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38198480

ABSTRACT

Network-based machine learning (ML) has the potential for predicting novel genes associated with nearly any health and disease context. However, this approach often uses network information from only the single species under consideration even though networks for most species are noisy and incomplete. While some recent methods have begun addressing this shortcoming by using networks from more than one species, they lack one or more key desirable properties: handling networks from more than two species simultaneously, incorporating many-to-many orthology information, or generating a network representation that is reusable across different types of and newly-defined prediction tasks. Here, we present GenePlexusZoo, a framework that casts molecular networks from multiple species into a single reusable feature space for network-based ML. We demonstrate that this multi-species network representation improves both gene classification within a single species and knowledge-transfer across species, even in cases where the inter-species correspondence is undetectable based on shared orthologous genes. Thus, GenePlexusZoo enables effectively leveraging the high evolutionary molecular, functional, and phenotypic conservation across species to discover novel genes associated with diverse biological contexts.


Subject(s)
Genomics , Machine Learning , Genomics/methods
3.
PLoS Biol ; 21(12): e3002397, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38051702

ABSTRACT

Since they emerged approximately 125 million years ago, flowering plants have evolved to dominate the terrestrial landscape and survive in the most inhospitable environments on earth. At their core, these adaptations have been shaped by changes in numerous, interconnected pathways and genes that collectively give rise to emergent biological phenomena. Linking gene expression to morphological outcomes remains a grand challenge in biology, and new approaches are needed to begin to address this gap. Here, we implemented topological data analysis (TDA) to summarize the high dimensionality and noisiness of gene expression data using lens functions that delineate plant tissue and stress responses. Using this framework, we created a topological representation of the shape of gene expression across plant evolution, development, and environment for the phylogenetically diverse flowering plants. The TDA-based Mapper graphs form a well-defined gradient of tissues from leaves to seeds, or from healthy to stressed samples, depending on the lens function. This suggests that there are distinct and conserved expression patterns across angiosperms that delineate different tissue types or responses to biotic and abiotic stresses. Genes that correlate with the tissue lens function are enriched in central processes such as photosynthetic, growth and development, housekeeping, or stress responses. Together, our results highlight the power of TDA for analyzing complex biological data and reveal a core expression backbone that defines plant form and function.
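The Mapper construction summarized above can be illustrated with a minimal pure-Python sketch. Assumptions, none taken from the paper: a one-dimensional lens, a cover of equal-width overlapping intervals, single-linkage clustering with a fixed distance threshold `eps`, and a made-up toy dataset whose lens is simply the first coordinate. Nodes are clusters found within each lens window, and two nodes are joined when they share samples, which is how a gradient (for example, from one tissue type to another) appears as a chain of nodes.

```python
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equal windows, each padded by overlap * width on both sides."""
    width = (hi - lo) / n_intervals
    pad = width * overlap
    return [(lo + i * width - pad, lo + (i + 1) * width + pad) for i in range(n_intervals)]


def single_linkage(points, ids, eps):
    """Cluster the given sample ids: samples chained by distances <= eps share a cluster."""
    clusters, unvisited = [], set(ids)
    while unvisited:
        stack = [unvisited.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, lens, n_intervals=3, overlap=0.25, eps=1.0):
    """Return (nodes, edges): one node per cluster per lens window; edges join nodes sharing samples."""
    nodes = []
    for a, b in cover_intervals(min(lens), max(lens), n_intervals, overlap):
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(single_linkage(points, members, eps))
    edges = {(i, j) for i, j in combinations(range(len(nodes)), 2) if nodes[i] & nodes[j]}
    return nodes, edges


# Hypothetical toy data: 7 samples x 2 genes; the made-up lens is the first coordinate.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (2.5, 0.0), (3.0, 0.0)]
lens = [p[0] for p in points]
nodes, edges = mapper(points, lens)
print([sorted(n) for n in nodes], sorted(edges))
# → [[0, 1, 2], [2, 3, 4], [4, 5, 6]] [(0, 1), (1, 2)]
```

On this toy data the graph is a three-node chain, a miniature version of the gradient the abstract describes; a real analysis would use high-dimensional expression profiles and a biologically defined lens.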


Subject(s)
Magnoliopsida , Magnoliopsida/genetics , Plants/genetics , Stress, Physiological/genetics , Plant Leaves/genetics , Gene Expression , Gene Expression Regulation, Plant/genetics
4.
Angew Chem Int Ed Engl ; 62(28): e202305982, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37178313

ABSTRACT

The role of β-CoOOH crystallographic orientations in catalytic activity for the oxygen evolution reaction (OER) remains elusive. We combine correlative electron backscatter diffraction/scanning electrochemical cell microscopy with X-ray photoelectron spectroscopy, transmission electron microscopy, and atom probe tomography to establish the structure-activity relationships of various faceted β-CoOOH formed on a Co microelectrode under OER conditions. We reveal that ≈6 nm β-CoOOH($01\bar{1}0$), grown on [$\bar{1}2\bar{1}0$]-oriented Co, exhibits higher OER activity than ≈3 nm β-CoOOH($10\bar{1}3$) or ≈6 nm β-CoOOH(0006) formed on [$02\bar{2}1$]- and [0001]-oriented Co, respectively. This arises from higher amounts of incorporated hydroxyl ions and more easily reducible Co$^{\mathrm{III}}$-O sites present in β-CoOOH($01\bar{1}0$) than those in the latter two oxyhydroxide facets. Our correlative multimodal approach shows great promise in linking local activity with atomic-scale details of structure, thickness and composition of active species, which opens opportunities to design pre-catalysts with preferred defects that promote the formation of the most active OER species.

5.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36721325

ABSTRACT

SUMMARY: PyGenePlexus is a Python package that enables a user to gain insight into any gene set of interest through a molecular interaction network informed supervised machine learning model. PyGenePlexus provides predictions of how associated every gene in the network is to the input gene set, offers interpretability by comparing the model trained on the input gene set to models trained on thousands of known gene sets, and returns the network connectivity of the top predicted genes. AVAILABILITY AND IMPLEMENTATION: https://pypi.org/project/geneplexus/ and https://github.com/krishnanlab/PyGenePlexus. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Software , Machine Learning , Supervised Machine Learning , Genetic Association Studies
6.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36688699

ABSTRACT

MOTIVATION: Accurately representing biological networks in a low-dimensional space, also known as network embedding, is a critical step in network-based machine learning and is carried out widely using node2vec, an unsupervised method based on biased random walks. However, while many networks, including functional gene interaction networks, are dense, weighted graphs, node2vec is fundamentally limited in its ability to use edge weights during the biased random walk generation process, thus under-using all the information in the network. RESULTS: Here, we present node2vec+, a natural extension of node2vec that accounts for edge weights when calculating walk biases and reduces to node2vec in the cases of unweighted graphs or unbiased walks. Using two synthetic datasets, we empirically show that node2vec+ is more robust to additive noise than node2vec in weighted graphs. Then, using genome-scale functional gene networks to solve a wide range of gene function and disease prediction tasks, we demonstrate the superior performance of node2vec+ over node2vec in the case of weighted graphs. Notably, due to the limited amount of training data in the gene classification tasks, graph neural networks such as GCN and GraphSAGE are outperformed by both node2vec and node2vec+. AVAILABILITY AND IMPLEMENTATION: The data and code are available on GitHub at https://github.com/krishnanlab/node2vecplus_benchmarks. All additional data underlying this article are available on Zenodo at https://doi.org/10.5281/zenodo.7007164. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
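For context, the sketch below shows how the original node2vec biases a second-order walk on a weighted graph. The edge weight scales each step, but the return/in-out factor (`p`, `q`) depends only on whether the candidate node is adjacent to the previous node, not on how strong that adjacency is. This is a minimal illustration of the limitation described above; node2vec+'s own weight-aware rule is not reproduced, and the toy graph is hypothetical.

```python
def node2vec_transition_probs(graph, prev, cur, p, q):
    """Original node2vec second-order transition probabilities for a walker that just moved prev -> cur.

    graph maps node -> {neighbor: edge_weight}; undirected edges are stored in both directions.
    """
    scores = {}
    for nxt, weight in graph[cur].items():
        if nxt == prev:
            alpha = 1.0 / p  # step back to the previous node
        elif nxt in graph[prev]:
            alpha = 1.0  # nxt is adjacent to prev, however weak that edge is
        else:
            alpha = 1.0 / q  # nxt moves the walk away from prev
        scores[nxt] = weight * alpha
    total = sum(scores.values())
    return {node: s / total for node, s in scores.items()}


# Hypothetical weighted graph: the A-C edge is nearly zero, yet C still counts as "close" to A.
graph = {
    "A": {"B": 1.0, "C": 0.01},
    "B": {"A": 1.0, "C": 1.0, "D": 1.0},
    "C": {"A": 0.01, "B": 1.0},
    "D": {"B": 1.0},
}
print(node2vec_transition_probs(graph, prev="A", cur="B", p=1.0, q=0.5))
# → {'A': 0.25, 'C': 0.25, 'D': 0.5}
```

Node C receives the same "in" factor as if the A-C edge were strong, which is the kind of under-use of edge weights that motivates a weight-aware bias.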


Subject(s)
Machine Learning , Neural Networks, Computer , Gene Regulatory Networks , Phenotype , Epistasis, Genetic
7.
Nat Commun ; 13(1): 6736, 2022 11 08.
Article in English | MEDLINE | ID: mdl-36347858

ABSTRACT

There are currently >1.3 million human -omics samples that are publicly available. This valuable resource remains acutely underused because discovering particular samples from this ever-growing data collection remains a significant challenge. The major impediment is that sample attributes are routinely described using varied terminologies written in unstructured natural language. We propose a natural-language-processing-based machine learning approach (NLP-ML) to infer tissue and cell-type annotations for genomics samples based only on their free-text metadata. NLP-ML works by creating numerical representations of sample descriptions and using these representations as features in a supervised learning classifier that predicts tissue/cell-type terms. Our approach significantly outperforms an advanced graph-based reasoning annotation method (MetaSRA) and a baseline exact string matching method (TAGGER). Model similarities between related tissues demonstrate that NLP-ML models capture biologically-meaningful signals in text. Additionally, these models correctly classify tissue-associated biological processes and diseases based on their text descriptions alone. NLP-ML models are nearly as accurate as models based on gene-expression profiles in predicting sample tissue annotations but have the distinct capability to classify samples irrespective of the genomics experiment type based on their text metadata. Python NLP-ML prediction code and trained tissue models are available at https://github.com/krishnanlab/txt2onto .


Subject(s)
Metadata , Natural Language Processing , Humans , Machine Learning , Genomics , Language
8.
Front Pharmacol ; 13: 995459, 2022.
Article in English | MEDLINE | ID: mdl-36313344

ABSTRACT

Complex diseases are associated with a wide range of cellular, physiological, and clinical phenotypes. To advance our understanding of disease mechanisms and our ability to treat these diseases, it is critical to delineate the molecular basis and therapeutic avenues of specific disease phenotypes, especially those that are associated with multiple diseases. Inflammatory processes constitute one such prominent phenotype, being involved in a wide range of health problems including ischemic heart disease, stroke, cancer, diabetes mellitus, chronic kidney disease, non-alcoholic fatty liver disease, and autoimmune and neurodegenerative conditions. While hundreds of genes might play a role in the etiology of each of these diseases, isolating the genes involved in the specific phenotype (e.g., inflammation "component") could help us understand the genes and pathways underlying this phenotype across diseases and predict potential drugs to target the phenotype. Here, we present a computational approach that integrates gene interaction networks, disease-/trait-gene associations, and drug-target information to accomplish this goal. We apply this approach to isolate gene signatures of complex diseases that correspond to chronic inflammation and use SAveRUNNER to prioritize drugs to reveal new therapeutic opportunities.

9.
Nucleic Acids Res ; 50(W1): W358-W366, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35580053

ABSTRACT

Biomedical researchers take advantage of high-throughput, high-coverage technologies to routinely generate sets of genes of interest across a wide range of biological conditions. Although these technologies have directly shed light on the molecular underpinnings of various biological processes and diseases, the list of genes from any individual experiment is often noisy and incomplete. Additionally, interpreting these lists of genes can be challenging in terms of how they are related to each other and to other genes in the genome. In this work, we present GenePlexus (https://www.geneplexus.net/), a web-server that allows a researcher to utilize a powerful, network-based machine learning method to gain insights into their gene set of interest and additional functionally similar genes. Once a user uploads their own set of human genes and chooses between a number of different human network representations, GenePlexus provides predictions of how associated every gene in the network is to the input set. The web-server also provides interpretability through network visualization and comparison to other machine learning models trained on thousands of known process/pathway and disease gene sets. GenePlexus is free and open to all users without the need for registration.


Subject(s)
Computers , Software , Humans , Genome , Machine Learning , Genetic Association Studies , Internet
10.
Genome Biol ; 23(1): 1, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34980209

ABSTRACT

BACKGROUND: Constructing gene coexpression networks is a powerful approach for analyzing high-throughput gene expression data towards module identification, gene function prediction, and disease-gene prioritization. While optimal workflows for constructing coexpression networks, including good choices for data pre-processing, normalization, and network transformation, have been developed for microarray-based expression data, such well-tested choices do not exist for RNA-seq data. Almost all studies that compare data processing and normalization methods for RNA-seq focus on the end goal of determining differential gene expression. RESULTS: Here, we present a comprehensive benchmarking and analysis of 36 different workflows, each with a unique set of normalization and network transformation methods, for constructing coexpression networks from RNA-seq datasets. We test these workflows on both large, homogeneous datasets and small, heterogeneous datasets from various labs. We analyze the workflows in terms of aggregate performance, individual method choices, and the impact of multiple dataset experimental factors. Our results demonstrate that between-sample normalization has the biggest impact, with counts adjusted by size factors producing networks that most accurately recapitulate known tissue-naive and tissue-aware gene functional relationships. CONCLUSIONS: Based on this work, we provide concrete recommendations on robust procedures for building an accurate coexpression network from an RNA-seq dataset. In addition, researchers can examine all the results in great detail at https://krishnanlab.github.io/RNAseq_coexpression to make appropriate choices for coexpression analysis based on the experimental factors of their RNA-seq dataset.
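As a rough illustration of one workflow in this family, the sketch below computes median-of-ratios size factors (a DESeq-style method, assumed here as the size-factor choice), normalizes counts, applies a log2(x + 1) transform, and builds a gene-by-gene Pearson coexpression matrix. The toy counts are made up, and none of the 36 benchmarked workflows is claimed to match this exactly.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (DESeq-style); counts[g][s] = reads for gene g in sample s."""
    n_samples = len(counts[0])
    # Only genes observed in every sample have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_network(counts):
    """Gene x gene Pearson matrix on log2(size-factor-normalized counts + 1)."""
    factors = size_factors(counts)
    logged = [[math.log2(c / f + 1) for c, f in zip(row, factors)] for row in counts]
    return [[pearson(gi, gj) for gj in logged] for gi in logged]


# Made-up counts: 3 genes x 4 samples; sample 1 is sample 0 sequenced at twice the depth.
counts = [
    [10, 20, 30, 5],
    [20, 40, 10, 15],
    [40, 80, 20, 60],
]
sf = size_factors(counts)
net = coexpression_network(counts)
print([round(f, 3) for f in sf])
```

Because sample 1 is an exact 2x copy of sample 0, its size factor comes out twice as large and the two columns coincide after normalization. Genes whose normalized expression is constant would make `pearson` divide by zero and would need to be filtered beforehand.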


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Gene Expression Profiling/methods , RNA-Seq , Sequence Analysis, RNA/methods , Exome Sequencing
11.
Genome Med ; 13(1): 163, 2021 10 18.
Article in English | MEDLINE | ID: mdl-34657631

ABSTRACT

BACKGROUND: Recent studies have suggested that individual variants do not sufficiently explain the variable expressivity of phenotypes observed in complex disorders. For example, the 16p12.1 deletion is associated with developmental delay and neuropsychiatric features in affected individuals, but is inherited in > 90% of cases from a mildly-affected parent. While children with the deletion are more likely to carry additional "second-hit" variants than their parents, the mechanisms for how these variants contribute to phenotypic variability are unknown. METHODS: We performed detailed clinical assessments, whole-genome sequencing, and RNA sequencing of lymphoblastoid cell lines for 32 individuals in five large families with multiple members carrying the 16p12.1 deletion. We identified contributions of the 16p12.1 deletion and "second-hit" variants towards a range of expression changes in deletion carriers and their family members, including differential expression, outlier expression, alternative splicing, allele-specific expression, and expression quantitative trait loci analyses. RESULTS: We found that the deletion dysregulates multiple autism and brain development genes such as FOXP1, ANK3, and MEF2. Carrier children also showed an average of 5323 gene expression changes compared with one or both parents, which matched with 33/39 observed developmental phenotypes. We identified significant enrichments for 13/25 classes of "second-hit" variants in genes with expression changes, where 4/25 variant classes were only enriched when inherited from the noncarrier parent, including loss-of-function SNVs and large duplications. In 11 instances, including for ZEB2 and SYNJ1, gene expression was synergistically altered by both the deletion and inherited "second-hits" in carrier children. Finally, brain-specific interaction network analysis showed strong connectivity between genes carrying "second-hits" and genes with transcriptome alterations in deletion carriers. 
CONCLUSIONS: Our results suggest a potential mechanism for how "second-hit" variants modulate expressivity of complex disorders such as the 16p12.1 deletion through transcriptomic perturbation of gene networks important for early development. Our work further shows that family-based assessments of transcriptome data are highly relevant towards understanding the genetic mechanisms associated with complex disorders.


Subject(s)
Biological Variation, Population , Chromosome Deletion , Gene Expression , Ankyrins/genetics , Autistic Disorder/genetics , Brain , Family , Forkhead Transcription Factors/genetics , Humans , Phenotype , Phosphoric Monoester Hydrolases/genetics , Repressor Proteins/genetics , Transcription Factors/genetics , Exome Sequencing , Whole Genome Sequencing , Zinc Finger E-box Binding Homeobox 2/genetics
12.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34013329

ABSTRACT

The basis of several recent methods for drug repurposing is the key principle that an efficacious drug will reverse the disease molecular 'signature' with minimal side effects. This principle was defined and popularized by the influential 'connectivity map' study in 2006 regarding reversal relationships between disease- and drug-induced gene expression profiles, quantified by a disease-drug 'connectivity score.' Over the past 15 years, several studies have proposed variations in calculating connectivity scores toward improving accuracy and robustness in light of massive growth in reference drug profiles. However, these variations have been formulated inconsistently using various notations and terminologies even though they are based on a common set of conceptual and statistical ideas. Therefore, we present a systematic reconciliation of multiple disease-drug similarity metrics ($ES$, $css$, $Sum$, $Cosine$, $XSum$, $XCor$, $XSpe$, $XCos$, $EWCos$) and connectivity scores ($CS$, $RGES$, $NCS$, $WCS$, $Tau$, $CSS$, $EMUDRA$) by defining them using consistent notation and terminology. In addition to providing clarity and deeper insights, this coherent definition of connectivity scores and their relationships provides a unified scheme that newer methods can adopt, enabling the computational drug-development community to compare and investigate different approaches easily. To facilitate the continuous and transparent integration of newer methods, this article will be available as a live document (https://jravilab.github.io/connectivity_scores) coupled with a GitHub repository (https://github.com/jravilab/connectivity_scores) that any researcher can build on and push changes to.
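Of the similarity metrics listed, cosine similarity is the simplest to state. A minimal sketch, assuming signatures are dictionaries of gene-level log fold changes and that the score is computed over shared genes only; the gene and drug names are hypothetical, and the unified notation and gene-selection rules of the article are not reproduced. A score near -1 indicates that the drug profile reverses the disease profile, so ranking drugs by ascending score lists reversal candidates first.

```python
import math


def cosine_connectivity(disease, drug):
    """Cosine similarity of two signatures (gene -> log fold change) over their shared genes.

    Negative values mean the drug profile tends to reverse the disease profile.
    """
    genes = sorted(set(disease) & set(drug))
    d = [disease[g] for g in genes]
    r = [drug[g] for g in genes]
    norm = math.sqrt(sum(v * v for v in d)) * math.sqrt(sum(v * v for v in r))
    return sum(a * b for a, b in zip(d, r)) / norm if norm else 0.0


# Hypothetical signatures; gene and drug names are made up.
disease = {"G1": 2.0, "G2": -1.0, "G3": 0.5}
drugs = {
    "drug_a": {"G1": -2.0, "G2": 1.0, "G3": -0.5},  # mirror image of the disease signature
    "drug_b": {"G1": 2.0, "G2": -1.0, "G4": 3.0},  # mimics the disease on the shared genes
}
ranked = sorted(drugs, key=lambda name: cosine_connectivity(disease, drugs[name]))
print(ranked)  # → ['drug_a', 'drug_b']
```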


Subject(s)
Computational Biology/methods , Drug Discovery/methods , Drug Repositioning/methods , Gene Expression Profiling/methods , Pharmacogenetics/methods , Algorithms , Biomarkers , Gene Expression Regulation/drug effects , Humans , Transcriptome
13.
BMJ Open ; 11(5): e044404, 2021 05 13.
Article in English | MEDLINE | ID: mdl-33986050

ABSTRACT

INTRODUCTION: Available evidence suggests that some racial/ethnic minority populations may be disproportionately burdened by dementia. Cohort studies are an important tool for defining and understanding the causes behind these racial and ethnic inequalities. However, ethnic minority populations may be more likely to be excluded from such research. Therefore, the aim of this study is to systematically investigate and quantify racial and ethnic minority representation in dementia risk factor research. METHODS AND ANALYSIS: The elements of this protocol have been designed in accordance with the relevant sections of the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols which are specifically applicable to scoping review protocols. We will include population-based cohort studies looking at risk factors for dementia incidence in our review and assess the representation of racial and ethnic minority populations in these studies. We will use multiple strategies to identify relevant studies, including a systematic search of the following electronic databases: MEDLINE (Ovid SP), Embase (Ovid SP) and Scopus. Two review authors will independently perform title and abstract screening, full-text screening and data extraction. Included cohort studies will be evaluated using a comprehensive framework to assess racial/ethnic minority representation. Logistic regression will also be performed to describe associations between cohort study characteristics and outcomes related to racial and ethnic minority representation. ETHICS AND DISSEMINATION: Formal ethical approval is not required to conduct this review as no primary data are to be collected. The final results of this scoping review will be disseminated through publication in peer-reviewed journals and conference presentations.


Subject(s)
Dementia , Minority Groups , Cohort Studies , Dementia/epidemiology , Ethnicity , Humans , Meta-Analysis as Topic , Research Design , Review Literature as Topic , Risk Factors , Systematic Reviews as Topic
14.
PLoS Genet ; 17(4): e1009112, 2021 04.
Article in English | MEDLINE | ID: mdl-33819264

ABSTRACT

We previously identified a deletion on chromosome 16p12.1 that is mostly inherited and associated with multiple neurodevelopmental outcomes, where severely affected probands carried an excess of rare pathogenic variants compared to mildly affected carrier parents. We hypothesized that the 16p12.1 deletion sensitizes the genome for disease, while "second-hits" in the genetic background modulate the phenotypic trajectory. To test this model, we examined how neurodevelopmental defects conferred by knockdown of individual 16p12.1 homologs are modulated by simultaneous knockdown of homologs of "second-hit" genes in Drosophila melanogaster and Xenopus laevis. We observed that knockdown of 16p12.1 homologs affects multiple phenotypic domains, leading to delayed developmental timing, seizure susceptibility, brain alterations, abnormal dendrite and axonal morphology, and cellular proliferation defects. Compared to genes within the 16p11.2 deletion, which has higher de novo occurrence, 16p12.1 homologs were less likely to interact with each other in Drosophila models or a human brain-specific interaction network, suggesting that interactions with "second-hit" genes may confer higher impact towards neurodevelopmental phenotypes. Assessment of 212 pairwise interactions in Drosophila between 16p12.1 homologs and 76 homologs of patient-specific "second-hit" genes (such as ARID1B and CACNA1A), genes within neurodevelopmental pathways (such as PTEN and UBE3A), and transcriptomic targets (such as DSCAM and TRRAP) identified genetic interactions in 63% of the tested pairs. In 11 out of 15 families, patient-specific "second-hits" enhanced or suppressed the phenotypic effects of one or many 16p12.1 homologs in 32/96 pairwise combinations tested. In fact, homologs of SETD5 synergistically interacted with homologs of MOSMO in both Drosophila and X. laevis, leading to modified cellular and brain phenotypes, as well as axon outgrowth defects that were not observed with knockdown of either individual homolog. Our results suggest that several 16p12.1 genes sensitize the genome towards neurodevelopmental defects, and complex interactions with "second-hit" genes determine the ultimate phenotypic manifestation.


Subject(s)
Brain/metabolism , Chromosome Deletion , Chromosomes, Human, Pair 16/genetics , Neurodevelopmental Disorders/genetics , Adaptor Proteins, Signal Transducing/genetics , Animals , Brain/pathology , Calcium Channels/genetics , Cell Adhesion Molecules/genetics , DNA-Binding Proteins/genetics , Disease Models, Animal , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Epistasis, Genetic/genetics , Gene Expression Regulation, Developmental , Humans , Methyltransferases/genetics , Neurodevelopmental Disorders/pathology , Nuclear Proteins/genetics , PTEN Phosphohydrolase/genetics , Transcription Factors/genetics , Ubiquitin-Protein Ligases/genetics , Xenopus Proteins/genetics , Xenopus laevis/genetics
15.
Bioinformatics ; 37(19): 3377-3379, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33760066

ABSTRACT

SUMMARY: Learning low-dimensional representations (embeddings) of nodes in large graphs is key to applying machine learning on massive biological networks. Node2vec is the most widely used method for node embedding. However, its original Python and C++ implementations scale poorly with network density, failing for dense biological networks with hundreds of millions of edges. We have developed PecanPy, a new Python implementation of node2vec that uses cache-optimized compact graph data structures and precomputing/parallelization to produce fast, high-quality node embeddings for biological networks of all sizes and densities. AVAILABILITY AND IMPLEMENTATION: PecanPy software is freely available at https://github.com/krishnanlab/PecanPy. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
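One common compact graph layout is compressed sparse row (CSR) storage, in which all neighbor ids and edge weights live in two flat arrays and each node's neighbors form a contiguous slice. The sketch below is a generic CSR illustration built on Python's standard `array` module; it is not PecanPy's code, and whether it matches PecanPy's internal layout is an assumption.

```python
from array import array


def to_csr(n_nodes, edges):
    """Pack an undirected weighted edge list [(u, v, w), ...] into CSR arrays (generic sketch)."""
    adjacency = [[] for _ in range(n_nodes)]
    for u, v, w in edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))
    indptr, indices, weights = array("l", [0]), array("l"), array("d")
    for nbrs in adjacency:
        for v, w in sorted(nbrs):  # sorted neighbor ids permit binary-search membership tests
            indices.append(v)
            weights.append(w)
        indptr.append(len(indices))  # node u's neighbors occupy indices[indptr[u]:indptr[u + 1]]
    return indptr, indices, weights


def neighbors(csr, u):
    """Neighbors of node u, read as one contiguous slice of the flat arrays."""
    indptr, indices, weights = csr
    start, end = indptr[u], indptr[u + 1]
    return list(zip(indices[start:end], weights[start:end]))


csr = to_csr(4, [(0, 1, 0.5), (0, 2, 1.0), (1, 2, 2.0)])
print(neighbors(csr, 2))  # → [(0, 1.0), (1, 2.0)]
print(neighbors(csr, 3))  # → []
```

Keeping each node's neighbors contiguous is what makes such a layout cache-friendly: a random-walk step reads one slice of a typed array instead of following per-node Python objects.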

16.
Nucleic Acids Res ; 48(21): e125, 2020 12 02.
Article in English | MEDLINE | ID: mdl-33074331

ABSTRACT

While there are >2 million publicly-available human microarray gene-expression profiles, these profiles were measured using a variety of platforms that each cover a pre-defined, limited set of genes. Therefore, key to reanalyzing and integrating this massive data collection are methods that can computationally reconstitute the complete transcriptome in partially-measured microarray samples by imputing the expression of unmeasured genes. Current state-of-the-art imputation methods are tailored to samples from a specific platform and rely on gene-gene relationships regardless of the biological context of the target sample. We show that sparse regression models that capture sample-sample relationships (termed SampleLASSO), built on-the-fly for each new target sample to be imputed, outperform models based on fixed gene relationships. Extensive evaluation involving three machine learning algorithms (LASSO, k-nearest-neighbors, and deep-neural-networks), two gene subsets (GPL96-570 and LINCS), and multiple imputation tasks (within and across microarray/RNA-seq datasets) establishes that SampleLASSO is the most accurate model. Additionally, we demonstrate the biological interpretability of this method by showing that, for imputing a target sample from a certain tissue, SampleLASSO automatically leverages training samples from the same tissue. Thus, SampleLASSO is a simple, yet powerful and flexible approach for harmonizing large-scale gene-expression data.
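The core SampleLASSO idea, regressing a target sample on training samples rather than on genes, can be sketched as follows. Assumptions not taken from the paper: a pure-Python coordinate-descent LASSO without intercept, a fixed penalty `lam`, no preprocessing, a hypothetical helper name `sample_lasso_impute`, and a made-up training matrix. The measured genes act as observations, the training samples act as features, and the fitted sample weights are then applied to the training values of the unmeasured genes.

```python
def soft_threshold(z, lam):
    """Proximal operator of the L1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate-descent LASSO without intercept:
    minimize (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 for an n x p matrix X."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # running residual y - X b
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes coefficient j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            delta = soft_threshold(rho, lam) / col_sq[j] - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] += delta
    return beta


def sample_lasso_impute(train, target, lam=0.001):
    """Hypothetical helper: train[g][s] is gene g in training sample s;
    target maps the measured gene indices of the new sample to their values."""
    measured = sorted(target)
    X = [train[g] for g in measured]  # observations = measured genes, features = training samples
    y = [target[g] for g in measured]
    beta = lasso_cd(X, y, lam)  # one weight per training sample, fit for this target only
    imputed = {
        g: sum(b * v for b, v in zip(beta, row))
        for g, row in enumerate(train)
        if g not in target
    }
    return imputed, beta


# Made-up data: 6 genes x 3 training samples; the target matches sample 0 on genes 0-3.
train = [
    [1.0, 5.0, 2.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 1.0],
    [4.0, 2.0, 3.0],
    [5.0, 3.0, 5.0],
    [6.0, 6.0, 2.0],
]
target = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
imputed, beta = sample_lasso_impute(train, target)
print(imputed, beta)
```

Because the toy target equals training sample 0 on its measured genes, the fit places nearly all weight on that sample and imputes genes 4 and 5 as roughly 5 and 6, a small-scale analogue of the same-tissue behavior described above.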


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation , Humans , Oligonucleotide Array Sequence Analysis , RNA-Seq
17.
Bioinformatics ; 36(11): 3457-3465, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32129827

ABSTRACT

BACKGROUND: Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. RESULTS: In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation's appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. AVAILABILITY AND IMPLEMENTATION: The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available. CONTACT: arjun@msu.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
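Supervised learning on a gene's full network connectivity amounts to using each gene's adjacency row as its feature vector. A minimal sketch, assuming L2-regularized logistic regression trained by batch gradient descent as the classifier and a hypothetical eight-gene network with two dense modules; the benchmark's actual models, networks, and evaluation schemes are not reproduced.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=500, l2=0.01):
    """L2-regularized logistic regression fit by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # gradient of the log-loss w.r.t. the logit
            for k, xk in enumerate(xi):
                grad_w[k] += err * xk
            grad_b += err
        w = [wk - lr * (gk / n + l2 * wk) for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical network: genes 0-3 and 4-7 form two dense modules joined by the edge 3-4.
n_genes = 8
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(4, 8) for j in range(i + 1, 8)] + [(3, 4)]
adjacency = [[0.0] * n_genes for _ in range(n_genes)]
for i, j in edges:
    adjacency[i][j] = adjacency[j][i] = 1.0

# Each gene's features are its full adjacency row; genes 0, 1 are positives and 5, 6 negatives.
train_ids, labels = [0, 1, 5, 6], [1, 1, 0, 0]
w, b = train_logreg([adjacency[g] for g in train_ids], labels)
scores = {g: predict(w, b, adjacency[g]) for g in (2, 7)}  # unlabeled genes
print(scores)
```

The unlabeled gene 2 shares neighbors with the positive genes and scores above 0.5, while gene 7 in the other module scores below it.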


Subject(s)
Computational Biology , Gene Regulatory Networks , Humans , Supervised Machine Learning
18.
Cell Syst ; 8(2): 152-162.e6, 2019 02 27.
Article in English | MEDLINE | ID: mdl-30685436

ABSTRACT

A key challenge for the diagnosis and treatment of complex human diseases is identifying their molecular basis. Here, we developed a unified computational framework, URSAHD (Unveiling RNA Sample Annotation for Human Diseases), that leverages machine learning and the hierarchy of anatomical relationships present among diseases to integrate thousands of clinical gene expression profiles and identify molecular characteristics specific to each of the hundreds of complex diseases. URSAHD can distinguish between closely related diseases more accurately than literature-validated genes or traditional differential-expression-based computational approaches and is applicable to any disease, including rare and understudied ones. We demonstrate the utility of URSAHD in classifying related nervous system cancers and experimentally verifying novel neuroblastoma-associated genes identified by URSAHD. We highlight the applications for potential targeted drug-repurposing and for quantitatively assessing the molecular response to clinical therapies. URSAHD is freely available for public use, including the use of underlying models, at ursahd.princeton.edu.


Subject(s)
Gene Expression Profiling/methods , Genomics/methods , Machine Learning/standards , Transcriptome/genetics , Humans
19.
Genet Med ; 21(4): 816-825, 2019 04.
Article in English | MEDLINE | ID: mdl-30190612

ABSTRACT

PURPOSE: To assess the contribution of rare variants in the genetic background toward variability of neurodevelopmental phenotypes in individuals with rare copy-number variants (CNVs) and gene-disruptive variants. METHODS: We analyzed quantitative clinical information, exome sequencing, and microarray data from 757 probands and 233 parents and siblings who carry disease-associated variants. RESULTS: The number of rare likely deleterious variants in functionally intolerant genes ("other hits") correlated with expression of neurodevelopmental phenotypes in probands with 16p12.1 deletion (n=23, p=0.004) and in autism probands carrying gene-disruptive variants (n=184, p=0.03) compared with their carrier family members. Probands with 16p12.1 deletion and a strong family history presented more severe clinical features (p=0.04) and higher burden of other hits compared with those with mild/no family history (p=0.001). The number of other hits also correlated with severity of cognitive impairment in probands carrying pathogenic CNVs (n=53) or de novo pathogenic variants in disease genes (n=290), and negatively correlated with head size among 80 probands with 16p11.2 deletion. These co-occurring hits involved known disease-associated genes such as SETD5, AUTS2, and NRXN1, and were enriched for cellular and developmental processes. CONCLUSION: Accurate genetic diagnosis of complex disorders will require complete evaluation of the genetic background even after a candidate disease-associated variant is identified.
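The "other hits" burden described above is a per-individual count of qualifying variants, which is then correlated with a phenotype measure. A minimal sketch under stated assumptions: the filter thresholds (allele frequency ≤ 0.001, CADD ≥ 25), the precomputed set of intolerant genes, the gene names and the toy severity scores are all hypothetical, and Spearman rank correlation is an illustrative choice because the abstract does not state the study's exact criteria or test.

```python
import math

def count_other_hits(variants, intolerant_genes, max_af=0.001, min_cadd=25.0):
    """Count rare, likely deleterious variants in intolerant genes (thresholds are assumptions)."""
    return sum(
        1 for v in variants
        if v["gene"] in intolerant_genes and v["af"] <= max_af and v["cadd"] >= min_cadd
    )

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical toy data: (variant list, severity score) per proband.
intolerant = {"GENE_A", "GENE_B", "GENE_C"}
probands = [
    ([{"gene": "GENE_A", "af": 0.0001, "cadd": 30.0}], 4.0),
    ([], 1.0),
    ([{"gene": "GENE_B", "af": 0.0002, "cadd": 27.0},
      {"gene": "GENE_C", "af": 0.0, "cadd": 33.0}], 6.5),
    ([{"gene": "GENE_C", "af": 0.02, "cadd": 35.0}], 2.0),  # too common: not counted
]
hits = [count_other_hits(vs, intolerant) for vs, _ in probands]
severity = [s for _, s in probands]
print(hits, round(spearman(hits, severity), 3))  # [1, 0, 2, 0] 0.949
```

A rank-based correlation is used here because hit counts are small integers with many ties and severity scales are ordinal at best.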


Subject(s)
Autistic Disorder/genetics , Cell Adhesion Molecules, Neuronal/genetics , Genetic Carrier Screening , Methyltransferases/genetics , Nerve Tissue Proteins/genetics , Proteins/genetics , Autistic Disorder/physiopathology , Calcium-Binding Proteins , Chromosomes, Human, Pair 16/genetics , Cognition/physiology , Cytoskeletal Proteins , DNA Copy Number Variations/genetics , Female , Gene Expression Regulation/genetics , Genetic Background , Humans , Male , Neural Cell Adhesion Molecules , Parents , Pedigree , Phenotype , Sequence Deletion/genetics , Siblings , Transcription Factors
20.
Nat Commun ; 9(1): 2548, 2018 06 29.
Article in English | MEDLINE | ID: mdl-29959322

ABSTRACT

As opposed to syndromic CNVs caused by single genes, extensive phenotypic heterogeneity in variably-expressive CNVs complicates disease gene discovery and functional evaluation. Here, we propose a complex interaction model for pathogenicity of the autism-associated 16p11.2 deletion, where CNV genes interact with each other in conserved pathways to modulate expression of the phenotype. Using multiple quantitative methods in Drosophila RNAi lines, we identify a range of neurodevelopmental phenotypes for knockdown of individual 16p11.2 homologs in different tissues. We test 565 pairwise knockdowns in the developing eye, and identify 24 interactions between pairs of 16p11.2 homologs and 46 interactions between 16p11.2 homologs and neurodevelopmental genes that suppress or enhance cell proliferation phenotypes compared to one-hit knockdowns. These interactions within cell proliferation pathways are also enriched in a human brain-specific network, providing translational relevance in humans. Our study indicates a role for pervasive genetic interactions within CNVs towards cellular and developmental phenotypes.
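The pairwise screen above labels each two-hit knockdown as enhancing or suppressing relative to the corresponding one-hit knockdown. A minimal sketch of that comparison under stated assumptions: each eye yields one phenotype score where higher means more severe, and the test is a two-sided Mann-Whitney U test (normal approximation, no tie correction) at α = 0.05. The abstract names neither the test nor the threshold, so both are illustrative, as are the toy scores.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test: returns (U for x, p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # erfc(|z|/sqrt(2)) = 2 * (1 - Phi(|z|))

def classify_interaction(one_hit, two_hit, alpha=0.05):
    """Compare two-hit scores with one-hit scores (assumes higher score = more severe)."""
    u, p = mann_whitney(two_hit, one_hit)
    if p >= alpha:
        return "no interaction"
    # U above its null mean n1*n2/2 means two-hit scores tend to exceed one-hit scores.
    return "enhancer" if u > len(two_hit) * len(one_hit) / 2 else "suppressor"

# Hypothetical per-eye scores for one gene pair.
one_hit = [float(s) for s in range(1, 11)]
two_hit = [float(s) for s in range(20, 30)]
print(classify_interaction(one_hit, two_hit))  # enhancer
```

A rank-based test avoids assuming that eye-phenotype scores are normally distributed; with many pairs tested, a multiple-testing correction would also be needed in practice.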


Subject(s)
Autistic Disorder/genetics , Base Sequence , Brain/metabolism , Drosophila melanogaster/genetics , Nerve Tissue Proteins/genetics , Sequence Deletion , Animals , Autistic Disorder/metabolism , Autistic Disorder/pathology , Brain/pathology , Cell Proliferation , Chromosomes, Human, Pair 16/chemistry , Chromosomes, Insect/chemistry , DNA Copy Number Variations , Disease Models, Animal , Drosophila Proteins/antagonists & inhibitors , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/growth & development , Drosophila melanogaster/metabolism , Female , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Humans , Male , Nerve Tissue Proteins/antagonists & inhibitors , Nerve Tissue Proteins/metabolism , Neurogenesis/genetics , Phenotype , Protein Interaction Mapping , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Sequence Homology, Nucleic Acid