Results 1 - 20 of 30
1.
Science ; 384(6698): eadi5199, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781369

ABSTRACT

Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multiomics datasets into a resource comprising >2.8 million nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550,000 cell type-specific regulatory elements and >1.4 million single-cell expression quantitative trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.


Subject(s)
Brain , Gene Regulatory Networks , Mental Disorders , Single-Cell Analysis , Humans , Aging/genetics , Brain/metabolism , Cell Communication/genetics , Chromatin/metabolism , Chromatin/genetics , Genomics , Mental Disorders/genetics , Prefrontal Cortex/metabolism , Prefrontal Cortex/physiology , Quantitative Trait Loci
2.
bioRxiv ; 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38562822

ABSTRACT

Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet, little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multi-omics datasets into a resource comprising >2.8M nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550K cell-type-specific regulatory elements and >1.4M single-cell expression-quantitative-trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.

3.
PLoS Comput Biol ; 19(7): e1011222, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37410793

ABSTRACT

The COVID-19 pandemic caused by the SARS-CoV-2 virus has resulted in millions of deaths worldwide. The disease presents with various manifestations that can vary in severity and long-term outcomes. Previous efforts have contributed to the development of effective strategies for treatment and prevention by uncovering the mechanism of viral infection. We now know all the direct protein-protein interactions that occur during the lifecycle of SARS-CoV-2 infection, but it is critical to move beyond these known interactions to a comprehensive understanding of the "full interactome" of SARS-CoV-2 infection, which incorporates human microRNAs (miRNAs), additional human protein-coding genes, and exogenous microbes. Potentially, this will help in developing new drugs to treat COVID-19, differentiating the nuances of long COVID, and identifying histopathological signatures in SARS-CoV-2-infected organs. To construct the full interactome, we developed a statistical modeling approach called MLCrosstalk (multiple-layer crosstalk) based on latent Dirichlet allocation. MLCrosstalk integrates data from multiple sources, including microbes, human protein-coding genes, miRNAs, and human protein-protein interactions. It constructs "topics" that group SARS-CoV-2 with genes and microbes based on similar patterns of co-occurrence across patient samples. We use these topics to infer linkages between SARS-CoV-2 and protein-coding genes, miRNAs, and microbes. We then refine these initial linkages using network propagation to contextualize them within a larger framework of network and pathway structures. Using MLCrosstalk, we identified genes in the IL1-processing and VEGFA-VEGFR2 pathways that are linked to SARS-CoV-2. We also found that Rothia mucilaginosa and Prevotella melaninogenica are positively and negatively correlated with SARS-CoV-2 abundance, respectively, a finding corroborated by analysis of single-cell sequencing data.


Subject(s)
COVID-19 , MicroRNAs , Humans , SARS-CoV-2/genetics , Post-Acute COVID-19 Syndrome , Pandemics/prevention & control , MicroRNAs/genetics
4.
PLoS Comput Biol ; 17(8): e1009303, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34424894

ABSTRACT

The development of mobile-health technology has the potential to revolutionize personalized medicine. Biomedical sensors (e.g., wearables) can assist with determining treatment plans for individuals, provide quantitative information to healthcare providers, and give objective measurements of health, leading to the goal of precise phenotypic correlates for genotypes. Even though treatments and interventions are becoming more specific and datasets more abundant, measuring the causal impact of health interventions requires careful considerations of complex covariate structures, as well as knowledge of the temporal and spatial properties of the data. Thus, interpreting biomedical sensor data needs to make use of specialized statistical models. Here, we show how the Bayesian structural time series framework, widely used in economics, can be applied to these data. This framework corrects for covariates to provide accurate assessments of the significance of interventions. Furthermore, it allows for a time-dependent confidence interval of impact, which is useful for considering individualized assessments of intervention efficacy. We provide a customized biomedical adaptor tool, MhealthCI, around a specific implementation of the Bayesian structural time series framework that uniformly processes, prepares, and registers diverse biomedical data. We apply the software implementation of MhealthCI to a structured set of examples in biomedicine to showcase the ability of the framework to evaluate interventions with varying levels of data richness and covariate complexity and also compare the performance to other models. Specifically, we show how the framework is able to evaluate an exercise intervention's effect on stabilizing blood glucose in a diabetes dataset. We also provide a future-anticipating illustration from a behavioral dataset showcasing how the framework integrates complex spatial covariates. 
Overall, we show the robustness of the Bayesian structural time series framework when applied to biomedical sensor data, highlighting its increasing value for current and future datasets.


Subject(s)
Bayes Theorem , Models, Statistical , Biosensing Techniques , Datasets as Topic , Humans , Software
5.
Bioinformatics ; 37(18): 2998-3000, 2021 Sep 29.
Article in English | MEDLINE | ID: mdl-33792640

ABSTRACT

MOTIVATION: Traditionally, an individual can only query and retrieve information from a genome browser by using accessories such as a mouse and keyboard. However, technology has changed the way that people interact with their screens. We hypothesized that we could leverage technological advances to use voice recognition as an interactive input to query and visualize genomic information. RESULTS: We developed an Amazon Alexa skill called Gene Tracer that allows users to use their voice to find disease-associated gene information, deleterious mutations and gene networks, while simultaneously enjoying a genome browser-like visualization experience on their screen. As the voice can be well recognized and understood, Gene Tracer provides users with more flexibility to acquire knowledge and is broadly applicable to other scenarios. AVAILABILITY AND IMPLEMENTATION: Alexa skill store (https://www.amazon.com/LT-Gene-tracer/dp/B08HCL1V68/) and a demonstration video (https://youtu.be/XbDbx7JDKmI). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Software , Genome , Information Storage and Retrieval , Mutation
6.
BMC Bioinformatics ; 21(1): 457, 2020 Oct 15.
Article in English | MEDLINE | ID: mdl-33059594

ABSTRACT

BACKGROUND: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data. RESULTS: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression. CONCLUSION: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.


Subject(s)
Algorithms , Asthma/genetics , Gene Expression Profiling , Gene Expression Regulation , Sputum/metabolism , Area Under Curve , Asthma/pathology , Cluster Analysis , Humans , Molecular Sequence Annotation , ROC Curve , Severity of Illness Index , Support Vector Machine
7.
Bioinformatics ; 36(Suppl_1): i474-i481, 2020 Jul 01.
Article in English | MEDLINE | ID: mdl-32657410

ABSTRACT

MOTIVATION: Recently, many chromatin immunoprecipitation sequencing experiments have been carried out for a diverse group of transcription factors (TFs) in many different types of human cells. These experiments manifest large-scale and dynamic changes in regulatory network connectivity (i.e. network 'rewiring'), highlighting the different regulatory programs operating in disparate cellular states. However, due to the dense and noisy nature of current regulatory networks, directly comparing the gains and losses of targets of key TFs across cell states is often not informative. Thus, here, we seek an abstracted, low-dimensional representation to understand the main features of network change. RESULTS: We propose a method called TopicNet that applies latent Dirichlet allocation to extract functional topics for a collection of genes regulated by a given TF. We then define a rewiring score to quantify regulatory-network changes in terms of the topic changes for this TF. Using this framework, we can pinpoint particular TFs that change greatly in network connectivity between different cellular states (such as observed in oncogenesis). Also, incorporating gene expression data, we define a topic activity score that measures the degree to which a given topic is active in a particular cellular state. And we show how activity differences can indicate differential survival in various cancers. AVAILABILITY AND IMPLEMENTATION: The TopicNet framework and related analysis were implemented using R and all codes are available at https://github.com/gersteinlab/topicnet. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Regulatory Networks , Transcription Factors , Chromatin Immunoprecipitation Sequencing , Humans , Transcription Factors/genetics
8.
Nat Commun ; 11(1): 3696, 2020 Jul 29.
Article in English | MEDLINE | ID: mdl-32728046

ABSTRACT

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.


Subject(s)
Databases, Genetic , Genomics , Neoplasms/genetics , Cell Line, Tumor , Cell Transformation, Neoplastic/genetics , Gene Regulatory Networks , Humans , Mutation/genetics , Reproducibility of Results , Transcription Factors/metabolism
9.
Genome Biol ; 21(1): 151, 2020 Jul 30.
Article in English | MEDLINE | ID: mdl-32727537

ABSTRACT

RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation and disease. Their binding sites cover more of the genome than coding exons; nevertheless, most noncoding variant prioritization methods only focus on transcriptional regulation. Here, we integrate the portfolio of ENCODE-RBP experiments to develop RADAR, a variant-scoring framework. RADAR uses conservation, RNA structure, network centrality, and motifs to provide an overall impact score. Then, it further incorporates tissue-specific inputs to highlight disease-specific variants. Our results demonstrate RADAR can successfully pinpoint variants, both somatic and germline, associated with RBP-function dysregulation, which cannot be found by most current prioritization methods, for example, variants affecting splicing.


Subject(s)
Genomics/methods , RNA Processing, Post-Transcriptional/genetics , RNA-Binding Proteins/genetics , Software , Breast Neoplasms/genetics , Humans
10.
BMC Bioinformatics ; 21(1): 281, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32615918

ABSTRACT

BACKGROUND: During transcription, numerous transcription factors (TFs) bind to targets in a highly coordinated manner to control gene expression. Alterations in groups of TF-binding profiles (i.e. "co-binding changes") can affect the co-regulating associations between TFs (i.e. "rewiring the co-regulator network"). This, in turn, can potentially drive downstream expression changes, phenotypic variation, and even disease. However, quantification of co-regulatory network rewiring has not been comprehensively studied. RESULTS: To address this, we propose DiNeR, a computational method to directly construct a differential TF co-regulation network from paired disease-to-normal ChIP-seq data. Specifically, DiNeR uses a graphical model to capture the gained and lost edges in the co-regulation network. Then, it adopts a stability-based, sparsity-tuning criterion (sub-sampling the complete binding profiles to remove spurious edges) to report only significant co-regulation alterations. Finally, DiNeR highlights hubs in the resultant differential network as key TFs associated with disease. We assembled genome-wide binding profiles of 104 TFs in the K562 and GM12878 cell lines, which loosely model the transition between normal and cancerous states in chronic myeloid leukemia (CML). In total, we identified 351 significantly altered TF co-regulation pairs. In particular, we found that the co-binding of the tumor suppressor BRCA1 and RNA polymerase II, a well-known transcriptional pair in healthy cells, was disrupted in tumors. Thus, DiNeR successfully extracted hub regulators and discovered well-known risk genes. CONCLUSIONS: Our method DiNeR makes it possible to quantify changes in co-regulatory networks and identify alterations to TF co-binding patterns, highlighting key disease regulators.


Subject(s)
Gene Regulatory Networks , Models, Genetic , Software , Chromatin Immunoprecipitation , Gene Expression Regulation , Genome , Humans , K562 Cells , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics , Protein Binding , Transcription Factors/metabolism , Transcription, Genetic
11.
Genome Biol ; 21(1): 150, 2020 Jun 22.
Article in English | MEDLINE | ID: mdl-32571363

ABSTRACT

Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNA-seq) of sputum samples can be challenging to interpret due to the complex and heterogeneous mixtures of human cells and exogenous (microbial) material. In this study, we develop a pipeline that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. LDA-link, named for latent Dirichlet allocation (LDA), connects microbes to genes using reduced-dimensionality LDA topics. We validate our method with single-cell RNA-seq and microscopy and then apply it to the sputum of asthmatic patients to find known and novel relationships between microbes and genes.


Subject(s)
Asthma/microbiology , Computational Biology/methods , Microbiota , Sequence Analysis, RNA , Sputum/chemistry , Asthma/genetics , Case-Control Studies , Female , Humans , Male , Middle Aged , Sputum/cytology , Unsupervised Machine Learning
12.
Cell ; 180(5): 915-927.e16, 2020 Mar 05.
Article in English | MEDLINE | ID: mdl-32084333

ABSTRACT

The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that, in addition to the dichotomy of high- and low-impact variants, there is a third group of medium-impact putative passengers. Moreover, we found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations), and different signatures encode for mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (∼12% additive variance) for predicting cancerous phenotypes, beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.


Subject(s)
Genome, Human/genetics , Genomics/methods , Mutation/genetics , Neoplasms/genetics , DNA Mutational Analysis/methods , Disease Progression , Humans , Neoplasms/pathology , Whole Genome Sequencing
13.
PLoS Genet ; 15(8): e1007860, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31469829

ABSTRACT

There has been much effort to prioritize genomic variants with respect to their impact on "function". However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we coupled multiple genomic predictors to build GRAM, a GeneRAlized Model, to predict a well-defined experimental target: the expression-modulating effect of a non-coding variant on its associated gene, in a transferable, cell-specific manner. First, we performed feature engineering: using LASSO, a regularized linear model, we found transcription factor (TF) binding most predictive, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other variant-impact predictors, has almost no contribution. Moreover, TF binding inferred from in vitro SELEX is as effective as that from in vivo ChIP-Seq. Second, we implemented GRAM integrating only SELEX features and expression profiles; thus, the program combines a universal regulatory score with an easily obtainable modifier reflecting the particular cell type. We benchmarked GRAM on large-scale MPRA datasets, achieving AUROC scores of 0.72 in GM12878 and 0.66 in a multi-cell line dataset. We then evaluated the performance of GRAM on targeted regions using luciferase assays in the MCF7 and K562 cell lines. We noted that changing the insertion position of the construct relative to the reporter gene gave very different results, highlighting the importance of carefully defining the exact prediction target of the model. Finally, we illustrated the utility of GRAM in fine-mapping causal variants and developed a practical software pipeline to carry this out.
In particular, we demonstrated in specific examples how the pipeline could pinpoint variants that directly modulate gene expression within a larger linkage-disequilibrium block associated with a phenotype of interest (e.g., for an eQTL).
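
For readers unfamiliar with the AUROC metric reported in this abstract, the following is a minimal sketch of how such a score can be computed from binary labels and predicted scores, using the pairwise-ranking (Mann-Whitney) formulation. The function name `auroc` and the toy data are assumptions for illustration only; they are not taken from GRAM or its benchmark.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = variant modulates expression, 0 = it does not.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(auroc(toy_labels, toy_scores))  # 3 of 4 positive/negative pairs are ranked correctly: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported 0.72 and 0.66.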


Subject(s)
Gene Expression Regulation/genetics , Genetic Variation/genetics , Sequence Analysis, DNA/methods , Algorithms , Binding Sites , Computer Simulation , Forecasting/methods , Genomics/methods , Humans , Linkage Disequilibrium/genetics , Models, Genetic , Protein Binding/genetics , Quantitative Trait Loci/genetics , Software , Transcription Factors/genetics
14.
Structure ; 27(9): 1469-1481.e3, 2019 Sep 03.
Article in English | MEDLINE | ID: mdl-31279629

ABSTRACT

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ∼10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.


Subject(s)
Computational Biology/methods , Polymorphism, Single Nucleotide , Proteins/chemistry , Proteins/genetics , Databases, Protein , Drug Design , Humans , Ligands , Machine Learning , Models, Statistical , Molecular Docking Simulation , Protein Binding , Protein Conformation , Proteins/metabolism
15.
Science ; 362(6420), 2018 Dec 14.
Article in English | MEDLINE | ID: mdl-30545857

ABSTRACT

Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.


Subject(s)
Brain/metabolism , Gene Expression Regulation , Mental Disorders/genetics , Datasets as Topic , Deep Learning , Enhancer Elements, Genetic , Epigenesis, Genetic , Epigenomics , Gene Regulatory Networks , Genome-Wide Association Study , Humans , Quantitative Trait Loci , Single-Cell Analysis , Transcriptome
16.
PLoS Comput Biol ; 13(7): e1005647, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28742097

ABSTRACT

Genome-wide proximity ligation based assays such as Hi-C have revealed that eukaryotic genomes are organized into structural units called topologically associating domains (TADs). From a visual examination of the chromosomal contact map, however, it is clear that the organization of the domains is not simple or obvious. Instead, TADs exhibit various length scales and, in many cases, a nested arrangement. Here, by exploiting the resemblance between TADs in a chromosomal contact map and densely connected modules in a network, we formulate TAD identification as a network optimization problem and propose an algorithm, MrTADFinder, to identify TADs from intra-chromosomal contact maps. MrTADFinder is based on the network-science concept of modularity. A key component of it is deriving an appropriate background model for contacts in a random chain, by numerically solving a set of matrix equations. The background model preserves the observed coverage of each genomic bin as well as the distance dependence of the contact frequency for any pair of bins exhibited by the empirical map. Also, by introducing a tunable resolution parameter, MrTADFinder provides a self-consistent approach for identifying TADs at different length scales, hence the acronym "Mr" standing for Multiple Resolutions. We then apply MrTADFinder to various Hi-C datasets. The identified domain boundaries are marked by characteristic signatures in chromatin marks and transcription factors (TF) that are consistent with earlier work. Moreover, by calling TADs at different length scales, we observe that boundary signatures change with resolution, with different chromatin features having different characteristic length scales. Furthermore, we report an enrichment of HOT (high-occupancy target) regions near TAD boundaries and investigate the role of different TFs in determining boundaries at various resolutions. 
To further explore the interplay between TADs and epigenetic marks, as tumor mutational burden is known to be coupled to chromatin structure, we examine how somatic mutations are distributed across boundaries and find a clear stepwise pattern. Overall, MrTADFinder provides a novel computational framework to explore the multi-scale structures in Hi-C contact maps.
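
The idea of scoring a domain partition by modularity with a tunable resolution parameter can be sketched as follows. This is only an illustration under stated assumptions: the null model here is a simple degree-based expectation (k_i * k_j / 2m), swapped in for the background model described in the abstract, which preserves bin coverage and the distance dependence of contact frequency. The function name `modularity`, the `boundaries` convention, and the toy contact map are all hypothetical.

```python
def modularity(contacts, boundaries, gamma=1.0):
    """Modularity of a partition of genomic bins into contiguous domains.

    contacts   : symmetric square matrix (list of lists) of contact counts
    boundaries : bin indices at which a new domain starts (bin 0 is implicit)
    gamma      : resolution parameter; larger values favor smaller domains

    Assumption: the expected contact between bins i and j is
    degree[i] * degree[j] / total, a simplified stand-in for a
    distance-aware background model.
    """
    n = len(contacts)
    degree = [sum(row) for row in contacts]
    total = sum(degree)  # equals 2m for a symmetric matrix
    if total == 0:
        raise ValueError("contact matrix has no contacts")

    # Assign each bin a domain label from the boundary positions.
    starts = set(boundaries)
    labels, domain = [], 0
    for i in range(n):
        if i in starts and i > 0:
            domain += 1
        labels.append(domain)

    # Sum observed minus (scaled) expected contacts over same-domain pairs.
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                expected = degree[i] * degree[j] / total
                q += contacts[i][j] - gamma * expected
    return q / total


# Hypothetical 4-bin map with two cleanly separated 2-bin domains.
toy = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(modularity(toy, boundaries=[2]))  # two domains
print(modularity(toy, boundaries=[]))   # one domain spanning all bins
```

In this sketch, raising `gamma` penalizes large domains more heavily, which is the sense in which a resolution parameter lets one partition be preferred over another at different length scales.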


Subject(s)
Chromatin , Chromosomes , Computational Biology/methods , Models, Genetic , Algorithms , Cell Line , Cell Nucleus/chemistry , Cell Nucleus/genetics , Chromatin/chemistry , Chromatin/genetics , Chromatin/ultrastructure , Chromosomes/chemistry , Chromosomes/genetics , Chromosomes/ultrastructure , Genome/genetics , Genome/physiology , Humans , Protein Binding , Transcription Factors/metabolism
18.
Genome Biol ; 17: 53, 2016 Mar 23.
Article in English | MEDLINE | ID: mdl-27009100

ABSTRACT

As the cost of sequencing continues to decrease and the amount of sequence data generated grows, new paradigms for data storage and analysis are increasingly important. The relative scaling behavior of these evolving technologies will impact genomics research moving forward.


Subject(s)
Computational Biology/trends , High-Throughput Nucleotide Sequencing/economics , Algorithms , Biomedical Research , Computational Biology/methods , Genomics , High-Throughput Nucleotide Sequencing/methods , Humans , Information Storage and Retrieval
19.
Mol Cancer Res ; 14(4): 332-43, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26856934

ABSTRACT

Liposarcoma, the second most common form of sarcoma, has been categorized into four molecular subtypes that are associated with differential patient prognosis. However, the transcriptional regulatory programs associated with distinct histologic and molecular subtypes of liposarcoma have not been investigated. This study uses integrative analyses to systematically define the transcriptional regulatory programs associated with liposarcoma. Likewise, computational methods are used to identify regulatory programs associated with different liposarcoma subtypes, as well as programs that are predictive of prognosis. Further analysis of curated gene sets was used to identify prognostic gene signatures. The integration of data from a variety of sources, including gene expression profiles, transcription factor-binding data from ChIP-Seq experiments, curated gene sets, and clinical information of patients, indicated discrete regulatory programs (e.g., controlled by E2F1 and E2F4), with significantly different regulatory activity in one or multiple subtypes of liposarcoma with respect to normal adipose tissue. These programs were also shown to be prognostic: in liposarcoma patients, higher E2F4 or E2F1 activity was associated with unfavorable prognosis. A total of 259 gene sets were significantly associated with patient survival in liposarcoma, among which >50% are involved in cell cycle and proliferation. IMPLICATIONS: These integrative analyses provide a general framework that can be applied to investigate the mechanism and predict prognosis of different cancer types.


Subject(s)
Cell Cycle Checkpoints , Computational Biology/methods , E2F1 Transcription Factor/genetics , E2F4 Transcription Factor/genetics , Liposarcoma/pathology , Algorithms , Cell Line, Tumor , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Liposarcoma/genetics , Prognosis , Survival Analysis
20.
PLoS Comput Biol ; 11(5): e1004269, 2015 May.
Article in English | MEDLINE | ID: mdl-25996148

ABSTRACT

The regulatory architecture of breast cancer is extraordinarily complex and gene misregulation can occur at many levels, with transcriptional malfunction being a major cause. This dysfunctional process typically involves additional regulatory modulators including DNA methylation. Thus, transcription factor (TF) binding and DNA methylation are two components of a cancer regulatory interactome whose signals are presumed to be correlated. As proof of concept, we performed a systematic motif-based in silico analysis to infer all potential TFs that are involved in breast cancer prognosis through an association with DNA methylation changes. Using breast cancer DNA methylation and clinical data derived from The Cancer Genome Atlas (TCGA), we carried out a systematic inference of TFs whose misregulation underlies different clinical subtypes of breast cancer. Our analysis identified TFs known to be associated with clinical outcomes of p53 and ER (estrogen receptor) subtypes of breast cancer, while also predicting new TFs that may be involved. Furthermore, our results suggest that misregulation in breast cancer can be caused by the binding of alternative factors to the binding sites of TFs whose activity has been ablated. Overall, this study provides a comprehensive analysis that links DNA methylation to TF binding to patient prognosis.


Subject(s)
Breast Neoplasms/genetics , DNA Methylation , Gene Expression Regulation, Neoplastic , Amino Acid Motifs , Binding Sites , Breast Neoplasms/pathology , Cluster Analysis , CpG Islands , DNA, Neoplasm/metabolism , Female , Gene Expression Profiling , Humans , Prognosis , Receptors, Estrogen/genetics , Receptors, Estrogen/metabolism , Transcription Factors/metabolism , Treatment Outcome , Tumor Suppressor Protein p53/metabolism