Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 20 de 30
Filter
Add more filters










Publication year range
1.
Science ; 384(6698): eadi5199, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781369

ABSTRACT

Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multiomics datasets into a resource comprising >2.8 million nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550,000 cell type-specific regulatory elements and >1.4 million single-cell expression quantitative trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.


Subject(s)
Brain , Gene Regulatory Networks , Mental Disorders , Single-Cell Analysis , Humans , Aging/genetics , Brain/metabolism , Cell Communication/genetics , Chromatin/metabolism , Chromatin/genetics , Genomics , Mental Disorders/genetics , Prefrontal Cortex/metabolism , Prefrontal Cortex/physiology , Quantitative Trait Loci
2.
bioRxiv ; 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38562822

ABSTRACT

Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet, little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multi-omics datasets into a resource comprising >2.8M nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550K cell-type-specific regulatory elements and >1.4M single-cell expression-quantitative-trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.

3.
PLoS Comput Biol ; 19(7): e1011222, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37410793

ABSTRACT

The COVID-19 pandemic caused by the SARS-CoV-2 virus has resulted in millions of deaths worldwide. The disease presents with various manifestations that can vary in severity and long-term outcomes. Previous efforts have contributed to the development of effective strategies for treatment and prevention by uncovering the mechanism of viral infection. We now know all the direct protein-protein interactions that occur during the lifecycle of SARS-CoV-2 infection, but it is critical to move beyond these known interactions to a comprehensive understanding of the "full interactome" of SARS-CoV-2 infection, which incorporates human microRNAs (miRNAs), additional human protein-coding genes, and exogenous microbes. Potentially, this will help in developing new drugs to treat COVID-19, differentiating the nuances of long COVID, and identifying histopathological signatures in SARS-CoV-2-infected organs. To construct the full interactome, we developed a statistical modeling approach called MLCrosstalk (multiple-layer crosstalk) based on latent Dirichlet allocation. MLCrosstalk integrates data from multiple sources, including microbes, human protein-coding genes, miRNAs, and human protein-protein interactions. It constructs "topics" that group SARS-CoV-2 with genes and microbes based on similar patterns of co-occurrence across patient samples. We use these topics to infer linkages between SARS-CoV-2 and protein-coding genes, miRNAs, and microbes. We then refine these initial linkages using network propagation to contextualize them within a larger framework of network and pathway structures. Using MLCrosstalk, we identified genes in the IL1-processing and VEGFA-VEGFR2 pathways that are linked to SARS-CoV-2. We also found that Rothia mucilaginosa and Prevotella melaninogenica are positively and negatively correlated with SARS-CoV-2 abundance, a finding corroborated by analysis of single-cell sequencing data.


Subject(s)
COVID-19 , MicroRNAs , Humans , SARS-CoV-2/genetics , Post-Acute COVID-19 Syndrome , Pandemics/prevention & control , MicroRNAs/genetics
4.
PLoS Comput Biol ; 17(8): e1009303, 2021 08.
Article in English | MEDLINE | ID: mdl-34424894

ABSTRACT

The development of mobile-health technology has the potential to revolutionize personalized medicine. Biomedical sensors (e.g., wearables) can assist with determining treatment plans for individuals, provide quantitative information to healthcare providers, and give objective measurements of health, leading to the goal of precise phenotypic correlates for genotypes. Even though treatments and interventions are becoming more specific and datasets more abundant, measuring the causal impact of health interventions requires careful considerations of complex covariate structures, as well as knowledge of the temporal and spatial properties of the data. Thus, interpreting biomedical sensor data needs to make use of specialized statistical models. Here, we show how the Bayesian structural time series framework, widely used in economics, can be applied to these data. This framework corrects for covariates to provide accurate assessments of the significance of interventions. Furthermore, it allows for a time-dependent confidence interval of impact, which is useful for considering individualized assessments of intervention efficacy. We provide a customized biomedical adaptor tool, MhealthCI, around a specific implementation of the Bayesian structural time series framework that uniformly processes, prepares, and registers diverse biomedical data. We apply the software implementation of MhealthCI to a structured set of examples in biomedicine to showcase the ability of the framework to evaluate interventions with varying levels of data richness and covariate complexity and also compare the performance to other models. Specifically, we show how the framework is able to evaluate an exercise intervention's effect on stabilizing blood glucose in a diabetes dataset. We also provide a future-anticipating illustration from a behavioral dataset showcasing how the framework integrates complex spatial covariates. Overall, we show the robustness of the Bayesian structural time series framework when applied to biomedical sensor data, highlighting its increasing value for current and future datasets.


Subject(s)
Bayes Theorem , Models, Statistical , Biosensing Techniques , Datasets as Topic , Humans , Software
5.
Bioinformatics ; 37(18): 2998-3000, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33792640

ABSTRACT

MOTIVATION: Traditionally, an individual can only query and retrieve information from a genome browser by using accessories such as a mouse and keyboard. However, technology has changed the way that people interact with their screens. We hypothesized that we could leverage technological advances to use voice recognition as an interactive input to query and visualize genomic information. RESULTS: We developed an Amazon Alexa skill called Gene Tracer that allows users to use their voice to find disease-associated gene information, deleterious mutations and gene networks, while simultaneously enjoy a genome browser-like visualization experience on their screen. As the voice can be well recognized and understood, Gene Tracer provides users with more flexibility to acquire knowledge and is broadly applicable to other scenarios. AVAILABILITYAND IMPLEMENTATION: Alexa skill store (https://www.amazon.com/LT-Gene-tracer/dp/B08HCL1V68/) and a demonstration video (https://youtu.be/XbDbx7JDKmI). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Software , Genome , Information Storage and Retrieval , Mutation
6.
BMC Bioinformatics ; 21(1): 457, 2020 Oct 15.
Article in English | MEDLINE | ID: mdl-33059594

ABSTRACT

BACKGROUND: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data. RESULTS: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression. CONCLUSION: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.


Subject(s)
Algorithms , Asthma/genetics , Gene Expression Profiling , Gene Expression Regulation , Sputum/metabolism , Area Under Curve , Asthma/pathology , Cluster Analysis , Humans , Molecular Sequence Annotation , ROC Curve , Severity of Illness Index , Support Vector Machine
7.
Bioinformatics ; 36(Suppl_1): i474-i481, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657410

ABSTRACT

MOTIVATION: Recently, many chromatin immunoprecipitation sequencing experiments have been carried out for a diverse group of transcription factors (TFs) in many different types of human cells. These experiments manifest large-scale and dynamic changes in regulatory network connectivity (i.e. network 'rewiring'), highlighting the different regulatory programs operating in disparate cellular states. However, due to the dense and noisy nature of current regulatory networks, directly comparing the gains and losses of targets of key TFs across cell states is often not informative. Thus, here, we seek an abstracted, low-dimensional representation to understand the main features of network change. RESULTS: We propose a method called TopicNet that applies latent Dirichlet allocation to extract functional topics for a collection of genes regulated by a given TF. We then define a rewiring score to quantify regulatory-network changes in terms of the topic changes for this TF. Using this framework, we can pinpoint particular TFs that change greatly in network connectivity between different cellular states (such as observed in oncogenesis). Also, incorporating gene expression data, we define a topic activity score that measures the degree to which a given topic is active in a particular cellular state. And we show how activity differences can indicate differential survival in various cancers. AVAILABILITY AND IMPLEMENTATION: The TopicNet framework and related analysis were implemented using R and all codes are available at https://github.com/gersteinlab/topicnet. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Regulatory Networks , Transcription Factors , Chromatin Immunoprecipitation Sequencing , Humans , Transcription Factors/genetics
8.
Nat Commun ; 11(1): 3696, 2020 07 29.
Article in English | MEDLINE | ID: mdl-32728046

ABSTRACT

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.


Subject(s)
Databases, Genetic , Genomics , Neoplasms/genetics , Cell Line, Tumor , Cell Transformation, Neoplastic/genetics , Gene Regulatory Networks , Humans , Mutation/genetics , Reproducibility of Results , Transcription Factors/metabolism
9.
Genome Biol ; 21(1): 151, 2020 07 30.
Article in English | MEDLINE | ID: mdl-32727537

ABSTRACT

RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation and disease. Their binding sites cover more of the genome than coding exons; nevertheless, most noncoding variant prioritization methods only focus on transcriptional regulation. Here, we integrate the portfolio of ENCODE-RBP experiments to develop RADAR, a variant-scoring framework. RADAR uses conservation, RNA structure, network centrality, and motifs to provide an overall impact score. Then, it further incorporates tissue-specific inputs to highlight disease-specific variants. Our results demonstrate RADAR can successfully pinpoint variants, both somatic and germline, associated with RBP-function dysregulation, which cannot be found by most current prioritization methods, for example, variants affecting splicing.


Subject(s)
Genomics/methods , RNA Processing, Post-Transcriptional/genetics , RNA-Binding Proteins/genetics , Software , Breast Neoplasms/genetics , Humans
10.
BMC Bioinformatics ; 21(1): 281, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32615918

ABSTRACT

BACKGROUND: During transcription, numerous transcription factors (TFs) bind to targets in a highly coordinated manner to control the gene expression. Alterations in groups of TF-binding profiles (i.e. "co-binding changes") can affect the co-regulating associations between TFs (i.e. "rewiring the co-regulator network"). This, in turn, can potentially drive downstream expression changes, phenotypic variation, and even disease. However, quantification of co-regulatory network rewiring has not been comprehensively studied. RESULTS: To address this, we propose DiNeR, a computational method to directly construct a differential TF co-regulation network from paired disease-to-normal ChIP-seq data. Specifically, DiNeR uses a graphical model to capture the gained and lost edges in the co-regulation network. Then, it adopts a stability-based, sparsity-tuning criterion -- by sub-sampling the complete binding profiles to remove spurious edges -- to report only significant co-regulation alterations. Finally, DiNeR highlights hubs in the resultant differential network as key TFs associated with disease. We assembled genome-wide binding profiles of 104 TFs in the K562 and GM12878 cell lines, which loosely model the transition between normal and cancerous states in chronic myeloid leukemia (CML). In total, we identified 351 significantly altered TF co-regulation pairs. In particular, we found that the co-binding of the tumor suppressor BRCA1 and RNA polymerase II, a well-known transcriptional pair in healthy cells, was disrupted in tumors. Thus, DiNeR successfully extracted hub regulators and discovered well-known risk genes. CONCLUSIONS: Our method DiNeR makes it possible to quantify changes in co-regulatory networks and identify alterations to TF co-binding patterns, highlighting key disease regulators. Our method DiNeR makes it possible to quantify changes in co-regulatory networks and identify alterations to TF co-binding patterns, highlighting key disease regulators.


Subject(s)
Gene Regulatory Networks , Models, Genetic , Software , Chromatin Immunoprecipitation , Gene Expression Regulation , Genome , Humans , K562 Cells , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics , Protein Binding , Transcription Factors/metabolism , Transcription, Genetic
11.
Genome Biol ; 21(1): 150, 2020 06 22.
Article in English | MEDLINE | ID: mdl-32571363

ABSTRACT

Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNA-seq) of sputum samples can be challenging to interpret due to the complex and heterogeneous mixtures of human cells and exogenous (microbial) material. In this study, we develop a pipeline that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. LDA(Latent Dirichlet allocation)-link connects microbes to genes using reduced-dimensionality LDA topics. We validate our method with single-cell RNA-seq and microscopy and then apply it to the sputum of asthmatic patients to find known and novel relationships between microbes and genes.


Subject(s)
Asthma/microbiology , Computational Biology/methods , Microbiota , Sequence Analysis, RNA , Sputum/chemistry , Asthma/genetics , Case-Control Studies , Female , Humans , Male , Middle Aged , Sputum/cytology , Unsupervised Machine Learning
12.
Cell ; 180(5): 915-927.e16, 2020 03 05.
Article in English | MEDLINE | ID: mdl-32084333

ABSTRACT

The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that-in addition to the dichotomy of high- and low-impact variants-there is a third group of medium-impact putative passengers. Moreover, we also found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations), and different signatures encode for mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (∼12% additive variance) for predicting cancerous phenotypes, beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.


Subject(s)
Genome, Human/genetics , Genomics/methods , Mutation/genetics , Neoplasms/genetics , DNA Mutational Analysis/methods , Disease Progression , Humans , Neoplasms/pathology , Whole Genome Sequencing
13.
PLoS Genet ; 15(8): e1007860, 2019 08.
Article in English | MEDLINE | ID: mdl-31469829

ABSTRACT

There has been much effort to prioritize genomic variants with respect to their impact on "function". However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we coupled multiple genomic predictors to build GRAM, a GeneRAlized Model, to predict a well-defined experimental target: the expression-modulating effect of a non-coding variant on its associated gene, in a transferable, cell-specific manner. Firstly, we performed feature engineering: using LASSO, a regularized linear model, we found transcription factor (TF) binding most predictive, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other variant-impact predictors, has almost no contribution. Moreover, TF binding inferred from in vitro SELEX is as effective as that from in vivo ChIP-Seq. Second, we implemented GRAM integrating only SELEX features and expression profiles; thus, the program combines a universal regulatory score with an easily obtainable modifier reflecting the particular cell type. We benchmarked GRAM on large-scale MPRA datasets, achieving AUROC scores of 0.72 in GM12878 and 0.66 in a multi-cell line dataset. We then evaluated the performance of GRAM on targeted regions using luciferase assays in the MCF7 and K562 cell lines. We noted that changing the insertion position of the construct relative to the reporter gene gave very different results, highlighting the importance of carefully defining the exact prediction target of the model. Finally, we illustrated the utility of GRAM in fine-mapping causal variants and developed a practical software pipeline to carry this out. In particular, we demonstrated in specific examples how the pipeline could pinpoint variants that directly modulate gene expression within a larger linkage-disequilibrium block associated with a phenotype of interest (e.g., for an eQTL).


Subject(s)
Gene Expression Regulation/genetics , Genetic Variation/genetics , Sequence Analysis, DNA/methods , Algorithms , Binding Sites , Computer Simulation , Forecasting/methods , Genomics/methods , Humans , Linkage Disequilibrium/genetics , Models, Genetic , Protein Binding/genetics , Quantitative Trait Loci/genetics , Software , Transcription Factors/genetics
14.
Structure ; 27(9): 1469-1481.e3, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31279629

ABSTRACT

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ∼10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.


Subject(s)
Computational Biology/methods , Polymorphism, Single Nucleotide , Proteins/chemistry , Proteins/genetics , Databases, Protein , Drug Design , Humans , Ligands , Machine Learning , Models, Statistical , Molecular Docking Simulation , Protein Binding , Protein Conformation , Proteins/metabolism
15.
Science ; 362(6420)2018 12 14.
Article in English | MEDLINE | ID: mdl-30545857

ABSTRACT

Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.


Subject(s)
Brain/metabolism , Gene Expression Regulation , Mental Disorders/genetics , Datasets as Topic , Deep Learning , Enhancer Elements, Genetic , Epigenesis, Genetic , Epigenomics , Gene Regulatory Networks , Genome-Wide Association Study , Humans , Quantitative Trait Loci , Single-Cell Analysis , Transcriptome
16.
PLoS Comput Biol ; 13(7): e1005647, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28742097

ABSTRACT

Genome-wide proximity ligation based assays such as Hi-C have revealed that eukaryotic genomes are organized into structural units called topologically associating domains (TADs). From a visual examination of the chromosomal contact map, however, it is clear that the organization of the domains is not simple or obvious. Instead, TADs exhibit various length scales and, in many cases, a nested arrangement. Here, by exploiting the resemblance between TADs in a chromosomal contact map and densely connected modules in a network, we formulate TAD identification as a network optimization problem and propose an algorithm, MrTADFinder, to identify TADs from intra-chromosomal contact maps. MrTADFinder is based on the network-science concept of modularity. A key component of it is deriving an appropriate background model for contacts in a random chain, by numerically solving a set of matrix equations. The background model preserves the observed coverage of each genomic bin as well as the distance dependence of the contact frequency for any pair of bins exhibited by the empirical map. Also, by introducing a tunable resolution parameter, MrTADFinder provides a self-consistent approach for identifying TADs at different length scales, hence the acronym "Mr" standing for Multiple Resolutions. We then apply MrTADFinder to various Hi-C datasets. The identified domain boundaries are marked by characteristic signatures in chromatin marks and transcription factors (TF) that are consistent with earlier work. Moreover, by calling TADs at different length scales, we observe that boundary signatures change with resolution, with different chromatin features having different characteristic length scales. Furthermore, we report an enrichment of HOT (high-occupancy target) regions near TAD boundaries and investigate the role of different TFs in determining boundaries at various resolutions. To further explore the interplay between TADs and epigenetic marks, as tumor mutational burden is known to be coupled to chromatin structure, we examine how somatic mutations are distributed across boundaries and find a clear stepwise pattern. Overall, MrTADFinder provides a novel computational framework to explore the multi-scale structures in Hi-C contact maps.


Subject(s)
Chromatin , Chromosomes , Computational Biology/methods , Models, Genetic , Algorithms , Cell Line , Cell Nucleus/chemistry , Cell Nucleus/genetics , Chromatin/chemistry , Chromatin/genetics , Chromatin/ultrastructure , Chromosomes/chemistry , Chromosomes/genetics , Chromosomes/ultrastructure , Genome/genetics , Genome/physiology , Humans , Protein Binding , Transcription Factors/metabolism
18.
Genome Biol ; 17: 53, 2016 Mar 23.
Article in English | MEDLINE | ID: mdl-27009100

ABSTRACT

As the cost of sequencing continues to decrease and the amount of sequence data generated grows, new paradigms for data storage and analysis are increasingly important. The relative scaling behavior of these evolving technologies will impact genomics research moving forward.


Subject(s)
Computational Biology/trends , High-Throughput Nucleotide Sequencing/economics , Algorithms , Biomedical Research , Computational Biology/methods , Genomics , High-Throughput Nucleotide Sequencing/methods , Humans , Information Storage and Retrieval
19.
Mol Cancer Res ; 14(4): 332-43, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26856934

ABSTRACT

UNLABELLED: Liposarcoma is the second most common form of sarcoma, which has been categorized into four molecular subtypes, which are associated with differential prognosis of patients. However, the transcriptional regulatory programs associated with distinct histologic and molecular subtypes of liposarcoma have not been investigated. This study uses integrative analyses to systematically define the transcriptional regulatory programs associated with liposarcoma. Likewise, computational methods are used to identify regulatory programs associated with different liposarcoma subtypes, as well as programs that are predictive of prognosis. Further analysis of curated gene sets was used to identify prognostic gene signatures. The integration of data from a variety of sources, including gene expression profiles, transcription factor-binding data from ChIP-Seq experiments, curated gene sets, and clinical information of patients, indicated discrete regulatory programs (e.g., controlled by E2F1 and E2F4), with significantly different regulatory activity in one or multiple subtypes of liposarcoma with respect to normal adipose tissue. These programs were also shown to be prognostic, wherein liposarcoma patients with higher E2F4 or E2F1 activity associated with unfavorable prognosis. A total of 259 gene sets were significantly associated with patient survival in liposarcoma, among which > 50% are involved in cell cycle and proliferation. IMPLICATIONS: These integrative analyses provide a general framework that can be applied to investigate the mechanism and predict prognosis of different cancer types.


Subject(s)
Cell Cycle Checkpoints , Computational Biology/methods , E2F1 Transcription Factor/genetics , E2F4 Transcription Factor/genetics , Liposarcoma/pathology , Algorithms , Cell Line, Tumor , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Liposarcoma/genetics , Prognosis , Survival Analysis
20.
PLoS Comput Biol ; 11(5): e1004269, 2015 May.
Article in English | MEDLINE | ID: mdl-25996148

ABSTRACT

The regulatory architecture of breast cancer is extraordinarily complex and gene misregulation can occur at many levels, with transcriptional malfunction being a major cause. This dysfunctional process typically involves additional regulatory modulators including DNA methylation. Thus, the interplay between transcription factor (TF) binding and DNA methylation are two components of a cancer regulatory interactome presumed to display correlated signals. As proof of concept, we performed a systematic motif-based in silico analysis to infer all potential TFs that are involved in breast cancer prognosis through an association with DNA methylation changes. Using breast cancer DNA methylation and clinical data derived from The Cancer Genome Atlas (TCGA), we carried out a systematic inference of TFs whose misregulation underlie different clinical subtypes of breast cancer. Our analysis identified TFs known to be associated with clinical outcomes of p53 and ER (estrogen receptor) subtypes of breast cancer, while also predicting new TFs that may also be involved. Furthermore, our results suggest that misregulation in breast cancer can be caused by the binding of alternative factors to the binding sites of TFs whose activity has been ablated. Overall, this study provides a comprehensive analysis that links DNA methylation to TF binding to patient prognosis.


Subject(s)
Breast Neoplasms/genetics , DNA Methylation , Gene Expression Regulation, Neoplastic , Amino Acid Motifs , Binding Sites , Breast Neoplasms/pathology , Cluster Analysis , CpG Islands , DNA, Neoplasm/metabolism , Female , Gene Expression Profiling , Humans , Prognosis , Receptors, Estrogen/genetics , Receptors, Estrogen/metabolism , Transcription Factors/metabolism , Treatment Outcome , Tumor Suppressor Protein p53/metabolism
SELECTION OF CITATIONS
SEARCH DETAIL
...