1.
bioRxiv ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38948759

ABSTRACT

Computational methods in biology can infer large molecular interaction networks from multiple data sources and at different resolutions, creating unprecedented opportunities to explore the mechanisms driving complex biological phenomena. Networks can be built to represent distinct conditions and compared to uncover graph-level differences, such as when comparing patterns of gene-gene interactions that change between biological states. Given the importance of the graph comparison problem, there is a clear and growing need for robust and scalable methods that can identify meaningful differences. We introduce node2vec2rank (n2v2r), a method for graph differential analysis that ranks nodes according to the disparities of their representations in joint latent embedding spaces. Improving upon previous bag-of-features approaches, we take advantage of recent advances in machine learning and statistics to compare graphs in higher-order structures and in a data-driven manner. Formulated as a multi-layer spectral embedding algorithm, n2v2r is computationally efficient, incorporates stability as a key feature, and can provably identify the correct ranking of differences between graphs in an overall procedure that adheres to veridical data science principles. By better adapting to the data, node2vec2rank clearly outperformed the commonly used node degree in finding complex differences in simulated data. In the real-world applications of breast cancer subtype characterization, analysis of cell cycle in single-cell data, and searching for sex differences in lung adenocarcinoma, node2vec2rank found meaningful biological differences, enabling hypothesis generation for therapeutic candidates. Software and analysis pipelines implementing n2v2r and used for the analyses presented here are publicly available.
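
The idea of ranking nodes by the disparity of their representations in a joint embedding space can be illustrated with a minimal sketch. This is not the n2v2r software: it assumes a simple unfolded spectral embedding (the two adjacency matrices placed side by side, then a truncated SVD), and the function name `rank_nodes_by_embedding_shift` and the parameter `dim` are hypothetical.

```python
import numpy as np

def rank_nodes_by_embedding_shift(adj_a, adj_b, dim=2):
    """Rank nodes by how far their embeddings move between two graphs.

    Illustrative assumption: a joint (unfolded) spectral embedding,
    not the published n2v2r implementation.
    """
    n = adj_a.shape[0]
    # Unfold: place both adjacency matrices side by side (n x 2n).
    unfolded = np.hstack([adj_a, adj_b])
    # A truncated SVD gives one shared basis for both graphs.
    _, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    right = vt[:dim].T * np.sqrt(s[:dim])  # shape (2n, dim)
    emb_a, emb_b = right[:n], right[n:]
    # Disparity of a node = distance between its two embeddings.
    disparity = np.linalg.norm(emb_a - emb_b, axis=1)
    ranking = np.argsort(-disparity)  # most changed node first
    return ranking, disparity

def graph_from_edges(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an edge list."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj

# Toy example: node 0 moves from the community {0, 1, 2} to {0, 3, 4, 5}.
graph_a = graph_from_edges(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)])
graph_b = graph_from_edges(6, [(1, 2), (0, 3), (0, 4), (0, 5), (3, 4), (3, 5), (4, 5)])
ranking, disparity = rank_nodes_by_embedding_shift(graph_a, graph_b)
print(ranking[0])  # node 0 is ranked as the most changed node
```

Because both graphs share one basis, the per-node distances are directly comparable, which is what makes a ranking meaningful.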

2.
bioRxiv ; 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-39005266

ABSTRACT

Aging is the primary risk factor for many individual cancer types, including lung adenocarcinoma (LUAD). To understand how aging-related alterations in the regulation of key cellular processes might affect LUAD risk and survival outcomes, we built individual (person)-specific gene regulatory networks integrating gene expression, transcription factor protein-protein interaction, and sequence motif data, using PANDA/LIONESS algorithms, for both non-cancerous lung tissue samples from the Genotype Tissue Expression (GTEx) project and LUAD samples from The Cancer Genome Atlas (TCGA). In GTEx, we found that pathways involved in cell proliferation and immune response are increasingly targeted by regulatory transcription factors with age; these aging-associated alterations are accelerated by tobacco smoking and resemble oncogenic shifts in the regulatory landscape observed in LUAD, suggesting that dysregulation of aging pathways might be associated with an increased risk of LUAD. Comparing normal adjacent samples from individuals with LUAD with healthy lung tissue samples from those without LUAD, we found that aging-associated genes show greater aging-biased targeting patterns in younger individuals with LUAD compared to their healthy counterparts of similar age, a pattern suggestive of age acceleration. This implies that an accelerated aging process may be responsible for tumor incidence in younger individuals. Using the drug repurposing tool CLUEreg, we found small molecule drugs with potential geroprotective effects that may alter the accelerating aging profiles we found. We also observed that, in contrast to chronological age, a network-informed aging signature was associated with survival and response to chemotherapy in LUAD.
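
The abstract does not spell out how person-specific networks are obtained. As a rough sketch, the LIONESS approach is usually described as a linear interpolation between an aggregate network built from all samples and one built with a single sample left out. The code below assumes that formulation and swaps in Pearson correlation for PANDA as the aggregate network method, so it is an illustration only; `lioness_networks` and its `aggregate` parameter are hypothetical names.

```python
import numpy as np

def lioness_networks(expr, aggregate=np.corrcoef):
    """Estimate one network per sample by leave-one-out interpolation.

    expr: array of shape (genes, samples).
    aggregate: function mapping a (genes, samples) array to a
        (genes, genes) network. Pearson correlation is an assumption
        made here for brevity; the study above uses PANDA.
    Returns an array of shape (samples, genes, genes).
    """
    n_samples = expr.shape[1]
    net_all = aggregate(expr)
    nets = []
    for q in range(n_samples):
        # Aggregate network built without sample q.
        net_without_q = aggregate(np.delete(expr, q, axis=1))
        # Scale up the contribution that sample q made to the aggregate.
        nets.append(n_samples * (net_all - net_without_q) + net_without_q)
    return np.stack(nets)

rng = np.random.default_rng(0)
toy_expr = rng.normal(size=(5, 20))  # 5 genes, 20 samples (synthetic)
sample_nets = lioness_networks(toy_expr)
print(sample_nets.shape)  # (20, 5, 5)
```

When the aggregate is a plain average over samples, the interpolation recovers each sample's own contribution exactly; for nonlinear aggregates such as correlation or PANDA it is an approximation.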

3.
Genome Biol ; 25(1): 127, 2024 05 21.
Article in English | MEDLINE | ID: mdl-38773638

ABSTRACT

BACKGROUND: Gene regulatory network (GRN) models that are formulated as ordinary differential equations (ODEs) can accurately explain temporal gene expression patterns and promise to yield new insights into important cellular processes, disease progression, and intervention design. Learning such gene regulatory ODEs is challenging, since we want to predict the evolution of gene expression in a way that accurately encodes the underlying GRN governing the dynamics and the nonlinear functional relationships between genes. Most widely used ODE estimation methods either impose too many parametric restrictions or are not guided by meaningful biological insights, both of which impede scalability, explainability, or both. RESULTS: We developed PHOENIX, a modeling framework based on neural ordinary differential equations (NeuralODEs) and Hill-Langmuir kinetics, that overcomes limitations of other methods by flexibly incorporating prior domain knowledge and biological constraints to promote sparse, biologically interpretable representations of GRN ODEs. We tested the accuracy of PHOENIX in a series of in silico experiments, benchmarking it against several currently used tools. We demonstrated PHOENIX's flexibility by modeling regulation of oscillating expression profiles obtained from synchronized yeast cells. We also assessed the scalability of PHOENIX by modeling genome-scale GRNs for breast cancer samples ordered in pseudotime and for B cells treated with Rituximab. CONCLUSIONS: PHOENIX uses a combination of user-defined prior knowledge and functional forms from systems biology to encode biological "first principles" as soft constraints on the GRN, allowing us to predict subsequent gene expression patterns in a biologically explainable manner.
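
To make the role of Hill-Langmuir kinetics in a gene regulatory ODE concrete, here is a minimal sketch of a two-gene system integrated with forward Euler. It is not PHOENIX and contains no neural network; the wiring (gene A activates gene B, gene B represses gene A) and the parameters `k`, `n`, and `decay` are illustrative assumptions.

```python
import numpy as np

def hill(x, k=1.0, n=2.0):
    """Hill-Langmuir activation: fraction of maximal response at level x."""
    return x**n / (k**n + x**n)

def rhs(state, decay=1.0):
    """Right-hand side of a toy two-gene regulatory ODE (assumed wiring)."""
    a, b = state
    da = (1.0 - hill(b)) - decay * a  # B represses A
    db = hill(a) - decay * b          # A activates B
    return np.array([da, db])

def simulate(initial, dt=0.01, steps=2000):
    """Forward-Euler integration; returns an array of shape (steps + 1, 2)."""
    trajectory = [np.asarray(initial, dtype=float)]
    for _ in range(steps):
        trajectory.append(trajectory[-1] + dt * rhs(trajectory[-1]))
    return np.array(trajectory)

trajectory = simulate([0.1, 0.1])
print(trajectory[-1])  # expression levels at the end of the simulation
```

A learned model would replace the hand-written `rhs` with a parameterized function fitted to time-series expression data, while keeping saturating terms of this kind.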


Subjects
Gene Regulatory Networks , Humans , Neural Networks, Computer , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Models, Genetic
4.
bioRxiv ; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38464142

ABSTRACT

Single Nucleotide Polymorphisms (SNPs) associated with traits typically explain a small part of the trait genetic heritability, with the remainder thought to be distributed throughout the genome. Such SNPs are likely to alter expression levels of biologically relevant genes. Expression Quantitative Trait Locus (eQTL) network analysis has helped to functionally characterize such variants. We systematically analyze the distribution of SNP heritability for ten traits across 29 tissue-specific eQTL networks. We find that heritability is clustered in a small number of tissue-specific, functionally relevant SNP-gene modules and that the greatest heritability occurs in local "hubs" that are both the cornerstone of the network's modules and tissue-specific regulatory elements. The network structure could thus both amplify the genotype-phenotype connection and buffer the deleterious effect of the genetic variations on other traits. Together, these results define a conceptual framework for understanding complex trait architecture and identifying key mutations carrying most of the heritability.

5.
bioRxiv ; 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-36909563

ABSTRACT

Modeling dynamics of gene regulatory networks using ordinary differential equations (ODEs) allows a deeper understanding of disease progression and response to therapy, thus aiding in intervention optimization. Although there exist methods to infer regulatory ODEs, these are generally limited to small networks, rely on dimensional reduction, or impose non-biological parametric restrictions, all of which impede scalability and explainability. PHOENIX is a neural ODE framework incorporating prior domain knowledge as soft constraints to infer sparse, biologically interpretable dynamics. Extensive experiments on simulated and real data demonstrate PHOENIX's unique ability to learn key regulatory dynamics while scaling to the whole genome.

6.
bioRxiv ; 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-38014256

ABSTRACT

Gene regulatory networks (GRNs) are effective tools for inferring complex interactions between molecules that regulate biological processes and hence can provide insights into drivers of biological systems. Inferring co-expression networks is a critical element of GRN inference as the correlation between expression patterns may indicate that genes are coregulated by common factors. However, methods that estimate co-expression networks generally derive an aggregate network representing the mean regulatory properties of the population and so fail to fully capture population heterogeneity. To address these concerns, we introduce BONOBO (Bayesian Optimized Networks Obtained By assimilating Omics data), a scalable Bayesian model for deriving individual sample-specific co-expression networks by recognizing variations in molecular interactions across individuals. For every sample, BONOBO assumes a Gaussian distribution on the log-transformed centered gene expression and a conjugate prior distribution on the sample-specific co-expression matrix constructed from all other samples in the data. Combining the sample-specific gene expression with the prior distribution, BONOBO yields a closed-form solution for the posterior distribution of the sample-specific co-expression matrices, thus making the method extremely scalable. We demonstrate the utility of BONOBO in several contexts, including analyzing gene regulation in yeast transcription factor knockout studies, prognostic significance of miRNA-mRNA interaction in human breast cancer subtypes, and sex differences in gene regulation within human thyroid tissue. We find that BONOBO outperforms other sample-specific co-expression network inference methods and provides insight into individual differences in the drivers of biological processes.
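
The closed-form posterior mentioned above follows from standard Gaussian and inverse-Wishart conjugacy. The sketch below assumes a zero-mean Gaussian likelihood for the centered sample and an inverse-Wishart prior whose mean is the covariance of all other samples; it is not the BONOBO code, and the degrees-of-freedom parameter `nu` and its default are hypothetical choices (the published method may calibrate the prior differently).

```python
import numpy as np

def sample_specific_covariance(log_expr, sample, nu=None):
    """Posterior mean covariance for one sample under an inverse-Wishart prior.

    log_expr: array of shape (samples, genes), already log-transformed.
    sample: index of the sample of interest.
    nu: prior degrees of freedom (hypothetical; must exceed genes + 1).
    """
    _, p = log_expr.shape
    if nu is None:
        nu = 2 * p + 2  # arbitrary illustrative default
    if nu <= p + 1:
        raise ValueError("nu must be larger than genes + 1")
    centered = log_expr - log_expr.mean(axis=0)
    x = centered[sample]
    others = np.delete(centered, sample, axis=0)
    # Prior mean: covariance estimated from all other samples.
    prior_mean = np.cov(others, rowvar=False)
    # An inverse-Wishart IW(psi, nu) has mean psi / (nu - p - 1).
    psi = (nu - p - 1) * prior_mean
    # After one Gaussian observation the posterior is IW(psi + x x^T, nu + 1),
    # whose mean is (psi + x x^T) / (nu - p).
    return (psi + np.outer(x, x)) / (nu - p)

rng = np.random.default_rng(1)
toy_log_expr = rng.normal(size=(12, 4))  # 12 samples, 4 genes (synthetic)
posterior = sample_specific_covariance(toy_log_expr, sample=0)
print(posterior.shape)  # (4, 4)
```

The result is a weighted average of the prior covariance and the sample's own outer product, with weight 1 / (nu - p) on the sample, so no iterative fitting is needed.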

7.
bioRxiv ; 2023 Sep 24.
Article in English | MEDLINE | ID: mdl-37790409

ABSTRACT

Lung adenocarcinoma (LUAD) has been observed to have significant sex differences in incidence, prognosis, and response to therapy. However, the molecular mechanisms responsible for these disparities have not been investigated extensively. Sample-specific gene regulatory network methods were used to analyze RNA sequencing data from non-cancerous human lung samples from The Genotype Tissue Expression Project (GTEx) and lung adenocarcinoma primary tumor samples from The Cancer Genome Atlas (TCGA); results were validated on independent data. We observe that genes associated with key biological pathways including cell proliferation, immune response and drug metabolism are differentially regulated between males and females in both healthy lung tissue and tumor, and that these regulatory differences are further perturbed by tobacco smoking. We also uncovered significant sex bias in transcription factor targeting patterns of clinically actionable oncogenes and tumor suppressor genes, including AKT2 and KRAS. Using differentially regulated genes between healthy and tumor samples in conjunction with a drug repurposing tool, we identified several small-molecule drugs that might have sex-biased efficacy as cancer therapeutics and further validated this observation using an independent cell line database. These findings underscore the importance of including sex as a biological variable and considering gene regulatory processes in developing strategies for disease prevention and management.

8.
J Clin Oncol ; 41(26): 4192-4199, 2023 Sep 10.
Article in English | MEDLINE | ID: mdl-37672882

ABSTRACT

PURPOSE: To improve on current standards for breast cancer prognosis and prediction of chemotherapy benefit by developing a risk model that incorporates the gene expression-based "intrinsic" subtypes luminal A, luminal B, HER2-enriched, and basal-like. METHODS: A 50-gene subtype predictor was developed using microarray and quantitative reverse transcriptase polymerase chain reaction data from 189 prototype samples. Test sets from 761 patients (no systemic therapy) were evaluated for prognosis, and 133 patients were evaluated for prediction of pathologic complete response (pCR) to a taxane and anthracycline regimen. RESULTS: The intrinsic subtypes as discrete entities showed prognostic significance (P = 2.26E-12) and remained significant in multivariable analyses that incorporated standard parameters (estrogen receptor status, histologic grade, tumor size, and node status). A prognostic model for node-negative breast cancer was built using intrinsic subtype and clinical information. The C-index estimate for the combined model (subtype and tumor size) was a significant improvement on either the clinicopathologic model or subtype model alone. The intrinsic subtype model predicted neoadjuvant chemotherapy efficacy with a negative predictive value for pCR of 97%. CONCLUSION: Diagnosis by intrinsic subtype adds significant prognostic and predictive information to standard parameters for patients with breast cancer. The prognostic properties of the continuous risk score will be of value for the management of node-negative breast cancers. The subtypes and risk score can also be used to assess the likelihood of efficacy from neoadjuvant chemotherapy.

9.
Res Sq ; 2023 Mar 14.
Article in English | MEDLINE | ID: mdl-36993392

ABSTRACT

Models that are formulated as ordinary differential equations (ODEs) can accurately explain temporal gene expression patterns and promise to yield new insights into important cellular processes, disease progression, and intervention design. Learning such ODEs is challenging, since we want to predict the evolution of gene expression in a way that accurately encodes the causal gene-regulatory network (GRN) governing the dynamics and the nonlinear functional relationships between genes. Most widely used ODE estimation methods either impose too many parametric restrictions or are not guided by meaningful biological insights, both of which impede scalability and/or explainability. To overcome these limitations, we developed PHOENIX, a modeling framework based on neural ordinary differential equations (NeuralODEs) and Hill-Langmuir kinetics, that can flexibly incorporate prior domain knowledge and biological constraints to promote sparse, biologically interpretable representations of ODEs. We test the accuracy of PHOENIX in a series of in silico experiments, benchmarking it against several currently used tools for ODE estimation. We also demonstrate PHOENIX's flexibility by studying oscillating expression data from synchronized yeast cells and assess its scalability by modelling genome-scale breast cancer expression for samples ordered in pseudotime. Finally, we show how the combination of user-defined prior knowledge and functional forms from systems biology allows PHOENIX to encode key properties of the underlying GRN, and subsequently predict expression patterns in a biologically explainable way.

10.
Genome Biol ; 24(1): 45, 2023 03 09.
Article in English | MEDLINE | ID: mdl-36894939

ABSTRACT

Inference and analysis of gene regulatory networks (GRNs) require software that integrates multi-omic data from various sources. The Network Zoo (netZoo; netzoo.github.io) is a collection of open-source methods to infer GRNs, conduct differential network analyses, estimate community structure, and explore the transitions between biological states. The netZoo builds on our ongoing development of network methods, harmonizing the implementations in various computing languages and between methods to allow better integration of these tools into analytical pipelines. We demonstrate its utility using multi-omic data from the Cancer Cell Line Encyclopedia. We will continue to expand the netZoo to incorporate additional methods.


Subjects
Gene Regulatory Networks , Neoplasms , Humans , Algorithms , Software , Multiomics , Computational Biology/methods
11.
Nucleic Acids Res ; 51(3): e15, 2023 02 22.
Article in English | MEDLINE | ID: mdl-36533448

ABSTRACT

The increasing quantity of multi-omic data, such as methylomic and transcriptomic profiles collected on the same specimen or even on the same cell, provides a unique opportunity to explore the complex interactions that define cell phenotype and govern cellular responses to perturbations. We propose a network approach based on Gaussian Graphical Models (GGMs) that facilitates the joint analysis of paired omics data. This method, called DRAGON (Determining Regulatory Associations using Graphical models on multi-Omic Networks), calibrates its parameters to achieve an optimal trade-off between the network's complexity and estimation accuracy, while explicitly accounting for the characteristics of each of the assessed omics 'layers.' In simulation studies, we show that DRAGON adapts to edge density and feature size differences between omics layers, improving model inference and edge recovery compared to state-of-the-art methods. We further demonstrate in an analysis of joint transcriptome - methylome data from TCGA breast cancer specimens that DRAGON can identify key molecular mechanisms such as gene regulation via promoter methylation. In particular, we identify Transcription Factor AP-2 Beta (TFAP2B) as a potential multi-omic biomarker for basal-type breast cancer. DRAGON is available as open-source code in Python through the Network Zoo package (netZooPy v0.8; netzoo.github.io).
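
In a Gaussian Graphical Model, edges correspond to partial correlations, which can be read off the inverse covariance (precision) matrix. The sketch below shows that step with a single shrinkage parameter pulling the covariance toward its diagonal. It is a simplification, not DRAGON: the abstract describes calibrated, layer-aware parameters, whereas the fixed `shrinkage` value here is a hypothetical stand-in.

```python
import numpy as np

def partial_correlations(data, shrinkage=0.1):
    """Partial correlation matrix from a shrunken covariance estimate.

    data: array of shape (samples, features).
    shrinkage: weight in [0, 1) on the diagonal target (assumed fixed here).
    """
    cov = np.cov(data, rowvar=False)
    target = np.diag(np.diag(cov))
    shrunk = (1.0 - shrinkage) * cov + shrinkage * target
    precision = np.linalg.inv(shrunk)
    scale = np.sqrt(np.diag(precision))
    # Partial correlation: minus the scaled off-diagonal precision entries.
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Synthetic chain x -> y -> z: x and z are correlated only through y.
rng = np.random.default_rng(2)
x = rng.normal(size=5000)
y = x + rng.normal(size=5000)
z = y + rng.normal(size=5000)
pcor = partial_correlations(np.column_stack([x, y, z]))
print(np.round(pcor, 2))
```

In the chain example the x-z partial correlation is much weaker than the x-y and y-z ones, which is the property that lets a GGM separate direct from indirect associations.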


Subjects
Multiomics , Neoplasms , Humans , Software , Computer Simulation , Transcriptome , Neoplasms/genetics , Gene Regulatory Networks
12.
Cancer Causes Control ; 33(8): 1107-1120, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35759080

ABSTRACT

Cancer heterogeneities hold the key to a deeper understanding of cancer etiology and progression and the discovery of more precise cancer therapy. Modern pathological and molecular technologies offer a powerful set of tools to profile tumor heterogeneities at multiple levels in large patient populations, from DNA to RNA, protein and epigenetics, and from tumor tissues to tumor microenvironment and liquid biopsy. When coupled with well-validated epidemiologic methodology and well-characterized epidemiologic resources, the rich pathological and molecular tumor information provides new research opportunities at an unprecedented breadth and depth. This is the research space where Molecular Pathological Epidemiology (MPE) emerged over a decade ago and has been thriving since then. As a truly multidisciplinary field, MPE embraces collaborations from diverse fields including epidemiology, pathology, immunology, genetics, biostatistics, bioinformatics, and data science. Since first convened in 2013, the International MPE Meeting series has grown into a dynamic and dedicated platform for experts from these disciplines to communicate novel findings, discuss new research opportunities and challenges, build professional networks, and educate next-generation scientists. Herein, we share the proceedings of the Fifth International MPE meeting, held virtually on May 24 and 25, 2021. The meeting consisted of 21 presentations organized into three main themes: recent integrative MPE studies, novel cancer profiling technologies, and new statistical and data science approaches. Looking forward to the near future, the meeting attendees anticipated continuous expansion and fruition of MPE research on many research fronts, particularly immune-epidemiology, mutational signatures, liquid biopsy, and health disparities.


Subjects
Neoplasms , Pathology, Molecular , Humans , Mutation , Neoplasms/epidemiology , Neoplasms/genetics , Neoplasms/therapy , Pathology, Molecular/methods , Tumor Microenvironment
13.
Respir Res ; 23(1): 157, 2022 Jun 17.
Article in English | MEDLINE | ID: mdl-35715807

ABSTRACT

BACKGROUND: Interstitial lung abnormalities (ILA) are radiologic findings that may progress to idiopathic pulmonary fibrosis (IPF). Blood gene expression profiles can predict IPF mortality, but whether these same genes associate with ILA and ILA outcomes is unknown. This study evaluated if a previously described blood gene expression profile associated with IPF mortality is associated with ILA and all-cause mortality. METHODS: In COPDGene and ECLIPSE study participants with visual scoring of ILA and gene expression data, we evaluated the association of a previously described IPF mortality score with ILA and mortality. We also trained a new ILA score, derived using genes from the IPF score, in a subset of COPDGene. We tested the association with ILA and mortality on the remainder of COPDGene and ECLIPSE. RESULTS: In 1469 COPDGene (training n = 734; testing n = 735) and 571 ECLIPSE participants, the IPF score was not associated with ILA or mortality. However, an ILA score derived from IPF score genes was associated with ILA (meta-analysis of test datasets OR 1.4 [95% CI: 1.2-1.6]) and mortality (HR 1.25 [95% CI: 1.12-1.41]). Six of the 11 genes in the ILA score had discordant directions of effects compared to the IPF score. The ILA score partially mediated the effects of age on mortality (11.8% proportion mediated). CONCLUSIONS: An ILA gene expression score, derived from IPF mortality-associated genes, identified genes with concordant and discordant effects on IPF mortality and ILA. These results suggest shared and unique biologic processes among those with ILA, IPF, aging, and death.


Subjects
Idiopathic Pulmonary Fibrosis , Lung Diseases, Interstitial , Cohort Studies , Humans , Idiopathic Pulmonary Fibrosis/diagnosis , Idiopathic Pulmonary Fibrosis/genetics , Lung , Lung Diseases, Interstitial/diagnosis , Lung Diseases, Interstitial/genetics , Tomography, X-Ray Computed , Transcriptome/genetics
14.
Cell Rep Methods ; 2(5): 100218, 2022 05 23.
Article in English | MEDLINE | ID: mdl-35637906

ABSTRACT

Expression quantitative trait locus (eQTL) analysis associates SNPs with gene expression; these relationships can be represented as a bipartite network with association strength as "edge weights" between SNPs and genes. However, most eQTL networks use binary edge weights based on thresholded FDR estimates: definitions that influence reproducibility and downstream analyses. We constructed twenty-nine tissue-specific eQTL networks using GTEx data and evaluated a comprehensive set of network specifications based on false discovery rates, test statistics, and p values, focusing on the degree centrality, a metric of an SNP or gene node's potential network influence. We found that a thresholded Benjamini-Hochberg q value weighted by the Z-statistic balances metric reproducibility and computational efficiency. Our estimated gene degrees positively correlate with gene degrees in gene regulatory networks, demonstrating that these networks are complementary in understanding regulation. Gene degrees also correlate with genetic diversity, and heritability analyses show that highly connected nodes are enriched for tissue-relevant traits.
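
One way to read "a thresholded Benjamini-Hochberg q value weighted by the Z-statistic" is: keep a SNP-gene edge only if its q value passes a cutoff, and weight the kept edge by the magnitude of its Z-statistic. The sketch below implements that reading; the cutoff `q_cutoff`, the use of the absolute value, and the function names are assumptions rather than details taken from the paper.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg q values for a 1-D array of p values."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * p.size / np.arange(1, p.size + 1)
    # Enforce monotonicity, working down from the largest p value.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty_like(p)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

def weighted_degrees(z_scores, pvals, q_cutoff=0.05):
    """Weighted degree of every SNP and gene in a bipartite eQTL network.

    z_scores, pvals: arrays of shape (snps, genes).
    Returns (snp_degrees, gene_degrees).
    """
    q = benjamini_hochberg(pvals.ravel()).reshape(pvals.shape)
    # Edges failing the q-value threshold get weight zero (assumed rule).
    weights = np.where(q <= q_cutoff, np.abs(z_scores), 0.0)
    return weights.sum(axis=1), weights.sum(axis=0)

z = np.array([[3.0, 0.5], [-4.0, 0.2]])
p = np.array([[0.001, 0.6], [0.0001, 0.8]])
snp_degrees, gene_degrees = weighted_degrees(z, p)
print(snp_degrees, gene_degrees)  # [3. 4.] [7. 0.]
```

In the toy input only the two associations with the first gene pass the threshold, so that gene accumulates all of the weight.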


Subjects
Gene Regulatory Networks , Quantitative Trait Loci , Quantitative Trait Loci/genetics , Reproducibility of Results , Gene Regulatory Networks/genetics , Phenotype , Genomics
16.
Genome Res ; 32(3): 524-533, 2022 03.
Article in English | MEDLINE | ID: mdl-35193937

ABSTRACT

Understanding how each person's unique genotype influences their individual patterns of gene regulation has the potential to improve our understanding of human health and development, and to refine genotype-specific disease risk assessments and treatments. However, the effects of genetic variants are not typically considered when constructing gene regulatory networks, despite the fact that many disease-associated genetic variants are thought to have regulatory effects, including the disruption of transcription factor (TF) binding. We developed EGRET (Estimating the Genetic Regulatory Effect on TFs), which infers a genotype-specific gene regulatory network for each individual in a study population. EGRET begins by constructing a genotype-informed TF-gene prior network derived using TF motif predictions, expression quantitative trait locus (eQTL) data, individual genotypes, and the predicted effects of genetic variants on TF binding. It then uses a technique known as message passing to integrate this prior network with gene expression and TF protein-protein interaction data to produce a refined, genotype-specific regulatory network. We used EGRET to infer gene regulatory networks for two blood-derived cell lines and identified genotype-associated, cell line-specific regulatory differences that we subsequently validated using allele-specific expression, chromatin accessibility QTLs, and differential ChIP-seq TF binding. We also inferred EGRET networks for three cell types from each of 119 individuals and identified cell type-specific regulatory differences associated with diseases related to those cell types. EGRET is, to our knowledge, the first method that infers networks reflective of individual genetic variation in a way that provides insight into the genetic regulatory associations driving complex phenotypes.


Subjects
Gene Regulatory Networks , Transcription Factors , Chromatin , Chromatin Immunoprecipitation , Genotype , Humans , Transcription Factors/genetics , Transcription Factors/metabolism
17.
NAR Genom Bioinform ; 4(1): lqac002, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35156023

ABSTRACT

Gene regulatory network inference allows for the modeling of genome-scale regulatory processes that are altered during development, in disease, and in response to perturbations. Our group has developed a collection of tools to model various regulatory processes, including transcriptional (PANDA, SPIDER) and post-transcriptional (PUMA) gene regulation, as well as gene regulation in individual samples (LIONESS). These methods work by postulating a network structure and then optimizing that structure to be consistent with multiple lines of biological evidence through repeated operations on data matrices. Although our methods are widely used, the corresponding computational complexity, and the associated costs and run times, do limit some applications. To improve the cost/time performance of these algorithms, we developed gpuZoo, which implements GPU-accelerated calculations, dramatically improving the performance of these algorithms. The gpuZoo implementation in MATLAB and Python is up to 61 times faster and 28 times less expensive than multi-core CPU implementations of the same methods. gpuZoo is available in MATLAB through the netZooM package https://github.com/netZoo/netZooM and in Python through the netZooPy package https://github.com/netZoo/netZooPy.

18.
Nucleic Acids Res ; 50(D1): D610-D621, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34508353

ABSTRACT

Gene regulation plays a fundamental role in shaping tissue identity, function, and response to perturbation. Regulatory processes are controlled by complex networks of interacting elements, including transcription factors, miRNAs and their target genes. The structure of these networks helps to determine phenotypes and can ultimately influence the development of disease or response to therapy. We developed GRAND (https://grand.networkmedicine.org) as a database for computationally-inferred, context-specific gene regulatory network models that can be compared between biological states, or used to predict which drugs produce changes in regulatory network structure. The database includes 12 468 genome-scale networks covering 36 human tissues, 28 cancers, 1378 unperturbed cell lines, as well as 173 013 TF and gene targeting scores for 2858 small molecule-induced cell line perturbations paired with phenotypic information. GRAND allows the networks to be queried using phenotypic information and visualized using a variety of interactive tools. In addition, it includes a web application that matches disease states to potentially therapeutic small molecule drugs using regulatory network properties.


Subjects
Databases, Genetic , Databases, Pharmaceutical , Gene Regulatory Networks/genetics , Software , Gene Expression Regulation/genetics , Genome, Human/genetics , Humans , MicroRNAs/classification , MicroRNAs/genetics , Transcription Factors/classification , Transcription Factors/genetics
19.
NPJ Syst Biol Appl ; 7(1): 45, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34887443

ABSTRACT

The biological processes that drive cellular function can be represented by a complex network of interactions between regulators (transcription factors) and their targets (genes). A cell's epigenetic state plays an important role in mediating these interactions, primarily by influencing chromatin accessibility. However, how to effectively use epigenetic data when constructing a gene regulatory network remains an open question. Almost all existing network reconstruction approaches focus on estimating transcription factor to gene connections using transcriptomic data. In contrast, computational approaches for analyzing epigenetic data generally focus on improving transcription factor binding site predictions rather than deducing regulatory network relationships. We bridged this gap by developing SPIDER, a network reconstruction approach that incorporates epigenetic data into a message-passing framework to estimate gene regulatory networks. We validated SPIDER's predictions using ChIP-seq data from ENCODE and found that SPIDER networks are both highly accurate and include cell-line-specific regulatory interactions. Notably, SPIDER can recover ChIP-seq verified transcription factor binding events in the regulatory regions of genes that do not have a corresponding sequence motif. The networks estimated by SPIDER have the potential to identify novel hypotheses that will allow us to better characterize cell-type and phenotype specific regulatory mechanisms.


Subjects
Computational Biology , Gene Regulatory Networks , Chromatin Immunoprecipitation , Epigenesis, Genetic/genetics , Gene Regulatory Networks/genetics , Transcription Factors/genetics , Transcription Factors/metabolism
20.
Proc AAAI Conf Artif Intell ; 35(11): 10263-10272, 2021 Feb.
Article in English | MEDLINE | ID: mdl-34707916

ABSTRACT

Bipartite network inference is a ubiquitous problem across disciplines. One important example in the field of molecular biology is gene regulatory network inference. Gene regulatory networks are an instrumental tool aiding in the discovery of the molecular mechanisms driving diverse diseases, including cancer. However, only noisy observations of the projections of these regulatory networks are typically assayed. In an effort to better estimate regulatory networks from their noisy projections, we formulate a non-convex but analytically tractable optimization problem called OTTER. This problem can be interpreted as relaxed graph matching between the two projections of the bipartite network. OTTER's solutions can be derived explicitly and inspire a spectral algorithm, for which we provide network recovery guarantees. We also provide an alternative approach based on gradient descent that is more robust to noise compared to the spectral algorithm. Interestingly, this gradient descent approach resembles the message passing equations of an established gene regulatory network inference method, PANDA. Using three cancer-related data sets, we show that OTTER outperforms state-of-the-art inference methods in predicting transcription factor binding to gene regulatory regions. To encourage new graph matching applications to this problem, we have made all networks and validation data publicly available.
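
The gradient descent idea can be sketched on a simplified objective: find a bipartite matrix W (regulators by genes) whose two projections, W Wᵀ and Wᵀ W, match the two observed projection matrices. The loss below, an unweighted sum of squared Frobenius errors, is an assumption made for illustration and omits any weighting or regularization the OTTER formulation may include; `fit_bipartite`, `step`, and `n_iter` are hypothetical names.

```python
import numpy as np

def fit_bipartite(p_mat, c_mat, w_init, step=0.01, n_iter=1000):
    """Gradient descent on 0.25 * (||W W^T - P||^2 + ||W^T W - C||^2).

    p_mat: (regulators, regulators) symmetric projection.
    c_mat: (genes, genes) symmetric projection.
    w_init: (regulators, genes) starting point, e.g. a prior network.
    Returns the fitted W and the loss recorded before each update.
    """
    w = np.array(w_init, dtype=float)
    history = []
    for _ in range(n_iter):
        res_p = w @ w.T - p_mat   # mismatch on the regulator side
        res_c = w.T @ w - c_mat   # mismatch on the gene side
        history.append(0.25 * (np.sum(res_p**2) + np.sum(res_c**2)))
        # Gradient of the loss for symmetric P and C.
        w -= step * (res_p @ w + w @ res_c)
    return w, history

rng = np.random.default_rng(4)
w_true = rng.uniform(size=(3, 5))  # synthetic ground truth
start = w_true + 0.1 * rng.normal(size=w_true.shape)
w_fit, losses = fit_bipartite(w_true @ w_true.T, w_true.T @ w_true, start)
print(losses[0], losses[-1])
```

The starting point matters because the problem is non-convex; in a regulatory setting a prior network such as motif-based binding predictions would be a natural choice of `w_init`.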
