1.
Sci Adv ; 9(40): eadg9959, 2023 10 06.
Article in English | MEDLINE | ID: mdl-37801507

ABSTRACT

Lentiviral vector (LV)-based gene therapy holds promise for a broad range of diseases. Analyzing more than 280,000 vector integration sites (VISs) in 273 samples from 10 patients with X-linked severe combined immunodeficiency (SCID-X1), we discovered shared LV integrome signatures in 9 of 10 patients in relation to the genomics, epigenomics, and 3D structure of the human genome. VISs were enriched in the nuclear subcompartment A1 and integrated into super-enhancers close to nuclear pore complexes. These signatures were validated in T cells transduced with an LV encoding a CD19-specific chimeric antigen receptor. Intriguingly, the one patient whose VISs deviated from the identified integrome signatures had a distinct clinical course. Comparison of LV and gamma retrovirus integromes regarding their 3D genome signatures identified differences that might explain the lower risk of insertional mutagenesis in LV-based gene therapy. Our findings suggest that LV integrome signatures, shaped by common features such as genome organization, may affect the efficacy of LV-based cellular therapies.


Subject(s)
Genetic Vectors , X-Linked Combined Immunodeficiency Diseases , Humans , Genetic Vectors/genetics , Genetic Therapy , Retroviridae/genetics , X-Linked Combined Immunodeficiency Diseases/genetics , X-Linked Combined Immunodeficiency Diseases/therapy , T-Lymphocytes
2.
Nat Immunol ; 24(10): 1735-1747, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37679549

ABSTRACT

Neurodegenerative diseases, including Alzheimer's disease (AD), are characterized by innate immune-mediated inflammation, but functional and mechanistic effects of the adaptive immune system remain unclear. Here we identify brain-resident CD8+ T cells that coexpress CXCR6 and PD-1 and are in proximity to plaque-associated microglia in human and mouse AD brains. We also establish that CD8+ T cells restrict AD pathologies, including β-amyloid deposition and cognitive decline. Ligand-receptor interaction analysis identifies CXCL16-CXCR6 intercellular communication between microglia and CD8+ T cells. Further, Cxcr6 deficiency impairs accumulation, tissue residency programming and clonal expansion of brain PD-1+CD8+ T cells. Ablation of Cxcr6 or CD8+ T cells ultimately increases proinflammatory cytokine production from microglia, with CXCR6 orchestrating brain CD8+ T cell-microglia colocalization. Collectively, our study reveals protective roles for brain CD8+ T cells and CXCR6 in mouse AD pathogenesis and highlights that microenvironment-specific, intercellular communication orchestrates tissue homeostasis and protection from neuroinflammation.

3.
Nat Commun ; 14(1): 2581, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37142594

ABSTRACT

Many signaling and other genes known as "hidden" drivers may not be genetically or epigenetically altered or differentially expressed at the mRNA or protein levels, but, rather, drive a phenotype such as tumorigenesis via post-translational modification or other mechanisms. However, conventional approaches based on genomics or differential expression are limited in exposing such hidden drivers. Here, we present a comprehensive algorithm and toolkit NetBID2 (data-driven network-based Bayesian inference of drivers, version 2), which reverse-engineers context-specific interactomes and integrates network activity inferred from large-scale multi-omics data, empowering the identification of hidden drivers that could not be detected by traditional analyses. NetBID2 has substantially re-engineered the previous prototype version by providing versatile data visualization and sophisticated statistical analyses, which strongly facilitate researchers for result interpretation through end-to-end multi-omics data analysis. We demonstrate the power of NetBID2 using three hidden driver examples. We deploy NetBID2 Viewer, Runner, and Cloud apps with 145 context-specific gene regulatory and signaling networks across normal tissues and paediatric and adult cancers to facilitate end-to-end analysis, real-time interactive visualization and cloud-based data sharing. NetBID2 is freely available at https://jyyulab.github.io/NetBID .


Subject(s)
Algorithms , Genomics , Humans , Bayes Theorem , Cell Transformation, Neoplastic/genetics , Research Design , Software
4.
bioRxiv ; 2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36747870

ABSTRACT

The sparse nature of single-cell omics data makes it challenging to dissect the wiring and rewiring of the transcriptional and signaling drivers that regulate cellular states. Many of the drivers, referred to as "hidden drivers", are difficult to identify via conventional expression analysis due to low expression and inconsistency between RNA and protein activity caused by post-translational and other modifications. To address this issue, we developed scMINER, a mutual information (MI)-based computational framework for unsupervised clustering analysis and cell-type specific inference of intracellular networks, hidden drivers and network rewiring from single-cell RNA-seq data. We designed scMINER to capture nonlinear cell-cell and gene-gene relationships and infer driver activities. Systematic benchmarking showed that scMINER outperforms popular single-cell clustering algorithms, especially in distinguishing similar cell types. With respect to network inference, scMINER does not rely on the binding motifs which are available for a limited set of transcription factors, therefore scMINER can provide quantitative activity assessment for more than 6,000 transcription and signaling drivers from a scRNA-seq experiment. As demonstrations, we used scMINER to expose hidden transcription and signaling drivers and dissect their regulon rewiring in immune cell heterogeneity, lineage differentiation, and tissue specification. Overall, activity-based scMINER is a widely applicable, highly accurate, reproducible and scalable method for inferring cellular transcriptional and signaling networks in each cell state from scRNA-seq data. The scMINER software is publicly accessible via: https://github.com/jyyulab/scMINER.
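The abstract's point that mutual information can capture nonlinear gene-gene relationships can be illustrated with a minimal sketch. This is not scMINER's estimator or API: the histogram (plug-in) estimate, the function name `mutual_information`, the bin count, and the simulated "driver" and "target" vectors are all assumptions made for illustration.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probability table
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nonzero = pxy > 0                      # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

# Simulated data (hypothetical): a target that depends on the driver
# through a symmetric, nonlinear relationship, plus an unrelated gene.
rng = np.random.default_rng(0)
driver = rng.normal(size=2000)
target_nonlinear = driver ** 2 + 0.1 * rng.normal(size=2000)
unrelated = rng.normal(size=2000)

mi_dependent = mutual_information(driver, target_nonlinear)
mi_independent = mutual_information(driver, unrelated)
pearson_dependent = float(np.corrcoef(driver, target_nonlinear)[0, 1])
```

In this toy setting the Pearson correlation between `driver` and `target_nonlinear` stays near zero while the mutual information estimate is clearly larger than for the unrelated pair, which is the kind of dependence a linear measure would miss.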

5.
Res Sq ; 2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36747874

ABSTRACT

The sparse nature of single-cell omics data makes it challenging to dissect the wiring and rewiring of the transcriptional and signaling drivers that regulate cellular states. Many of the drivers, referred to as "hidden drivers", are difficult to identify via conventional expression analysis due to low expression and inconsistency between RNA and protein activity caused by post-translational and other modifications. To address this issue, we developed scMINER, a mutual information (MI)-based computational framework for unsupervised clustering analysis and cell-type specific inference of intracellular networks, hidden drivers and network rewiring from single-cell RNA-seq data. We designed scMINER to capture nonlinear cell-cell and gene-gene relationships and infer driver activities. Systematic benchmarking showed that scMINER outperforms popular single-cell clustering algorithms, especially in distinguishing similar cell types. With respect to network inference, scMINER does not rely on the binding motifs which are available for a limited set of transcription factors, therefore scMINER can provide quantitative activity assessment for more than 6,000 transcription and signaling drivers from a scRNA-seq experiment. As demonstrations, we used scMINER to expose hidden transcription and signaling drivers and dissect their regulon rewiring in immune cell heterogeneity, lineage differentiation, and tissue specification. Overall, activity-based scMINER is a widely applicable, highly accurate, reproducible and scalable method for inferring cellular transcriptional and signaling networks in each cell state from scRNA-seq data. The scMINER software is publicly accessible via: https://github.com/jyyulab/scMINER.

6.
J Cell Sci ; 135(10), 2022 05 15.
Article in English | MEDLINE | ID: mdl-35502723

ABSTRACT

The mammary gland epithelial tree contains two distinct cell populations, luminal and basal. The investigation of how this heterogeneity is developed and how it influences tumorigenesis has been hampered by the need to perform studies on these populations using animal models. Comma-1D is an immortalized mouse mammary epithelial cell line that has unique morphogenetic properties. By performing single-cell RNA-seq studies, we found that Comma-1D cultures consist of two main populations with luminal and basal features, and a smaller population with mixed lineage and bipotent characteristics. We demonstrated that multiple transcription factors associated with the differentiation of the mammary epithelium in vivo also modulate this process in Comma-1D cultures. Additionally, we found that only cells with luminal features were able to acquire transformed characteristics after an oncogenic HER2 (also known as ERBB2) mutant was introduced in their genomes. Overall, our studies characterize, at a single-cell level, the heterogeneity of the Comma-1D cell line and illustrate how Comma-1D cells can be used as an experimental model to study both the differentiation and the transformation processes in vitro.


Subject(s)
Breast Neoplasms , Cell Line , Mammary Glands, Animal , Animals , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Epithelial Cells , Female , Mammary Glands, Animal/cytology , Mice , Single-Cell Analysis
7.
EMBO Rep ; 22(12): e53201, 2021 12 06.
Article in English | MEDLINE | ID: mdl-34633138

ABSTRACT

During the female lifetime, the expansion of the epithelium dictated by the ovarian cycles is supported by a transient increase in the mammary epithelial stem cell population (MaSCs). Notably, activation of Wnt/β-catenin signaling is an important trigger for MaSC expansion. Here, we report that the miR-424/503 cluster is a modulator of canonical Wnt signaling in the mammary epithelium. We show that mammary tumors of miR-424(322)/503-depleted mice exhibit activated Wnt/β-catenin signaling. Importantly, we show a strong association between miR-424/503 deletion and breast cancers with high levels of Wnt/β-catenin signaling. Moreover, miR-424/503 cluster is required for Wnt-mediated MaSC expansion induced by the ovarian cycles. Lastly, we show that miR-424/503 exerts its function by targeting two binding sites at the 3'UTR of the LRP6 co-receptor and reducing its expression. These results unveil an unknown link between the miR-424/503, regulation of Wnt signaling, MaSC fate, and tumorigenesis.


Subject(s)
Epithelium , Low Density Lipoprotein Receptor-Related Protein-6 , Mammary Glands, Animal/cytology , MicroRNAs , Wnt Signaling Pathway , Animals , Breast Neoplasms , Carcinogenesis , Cell Line, Tumor , Epithelial Cells/cytology , Epithelium/metabolism , Female , Low Density Lipoprotein Receptor-Related Protein-6/genetics , Low Density Lipoprotein Receptor-Related Protein-6/metabolism , Menstrual Cycle , Mice , MicroRNAs/genetics , Stem Cells/cytology
8.
Mol Cancer Ther ; 20(11): 2151-2165, 2021 11.
Article in English | MEDLINE | ID: mdl-34413129

ABSTRACT

Pediatric sarcomas represent a heterogeneous group of malignancies that exhibit variable response to DNA-damaging chemotherapy. Schlafen family member 11 protein (SLFN11) increases sensitivity to replicative stress and has been implicated as a potential biomarker to predict sensitivity to DNA-damaging agents (DDA). SLFN11 expression was quantified in 220 children with solid tumors using IHC. Sensitivity to the PARP inhibitor talazoparib (TAL) and the topoisomerase I inhibitor irinotecan (IRN) was assessed in sarcoma cell lines, including SLFN11 knock-out (KO) and overexpression models, and a patient-derived orthotopic xenograft model (PDOX). SLFN11 was expressed in 69% of pediatric sarcoma sampled, including 90% and 100% of Ewing sarcoma and desmoplastic small round-cell tumors, respectively, although the magnitude of expression varied widely. In sarcoma cell lines, protein expression strongly correlated with response to TAL and IRN, with SLFN11 KO resulting in significant loss of sensitivity in vitro and in vivo. Surprisingly, retrospective analysis of children with sarcoma found no association between SLFN11 levels and favorable outcome. Subsequently, high SLFN11 expression was confirmed in a PDOX model derived from a patient with recurrent Ewing sarcoma who failed to respond to treatment with TAL + IRN. Selective inhibition of BCL-xL increased sensitivity to TAL + IRN in SLFN11-positive resistant tumor cells. Although SLFN11 appears to drive sensitivity to replicative stress in pediatric sarcomas, its potential to act as a biomarker may be limited to certain tumor backgrounds or contexts. Impaired apoptotic response may be one mechanism of resistance to DDA-induced replicative stress.


Subject(s)
DNA Damage/genetics , Genomics/methods , Nuclear Proteins/metabolism , Sarcoma, Ewing/genetics , Adolescent , Adult , Animals , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Mice , Mice, Nude , Young Adult
9.
Cell Rep ; 35(4): 109049, 2021 04 27.
Article in English | MEDLINE | ID: mdl-33910004

ABSTRACT

Transforming growth factor β (TGF-β) family ligands are key regulators of dendritic cell (DC) differentiation and activation. Epidermal Langerhans cells (LCs) require TGF-β family signaling for their differentiation, and canonical TGF-β1 signaling secures a non-activated LC state. LCs reportedly control skin inflammation and are replenished from peripheral blood monocytes, which also give rise to pro-inflammatory monocyte-derived DCs (moDCs). By studying mechanisms in inflammation, we previously screened LCs versus moDCs for differentially expressed microRNAs (miRNAs). This revealed that miR-424/503 is the most strongly inversely regulated (moDCs > LCs). We here demonstrate that miR-424/503 is induced during moDC differentiation and promotes moDC differentiation in human and mouse. Inversely, forced repression of miR-424 during moDC differentiation facilitates TGF-β1-dependent LC differentiation. Mechanistically, miR-424/503 deficiency in monocyte/DC precursors leads to the induction of TGF-β1 response genes critical for LC differentiation. Therefore, the miR-424/503 gene cluster plays a decisive role in anti-inflammatory LC versus pro-inflammatory moDC differentiation from monocytes.


Subject(s)
Anti-Inflammatory Agents/therapeutic use , Langerhans Cells/immunology , MicroRNAs/metabolism , Multigene Family/genetics , Transforming Growth Factor beta/metabolism , Animals , Anti-Inflammatory Agents/pharmacology , Cell Differentiation , Humans , Mice , Signal Transduction
10.
Leukemia ; 35(4): 984-1000, 2021 04.
Article in English | MEDLINE | ID: mdl-32733009

ABSTRACT

T-cell acute lymphoblastic leukemia (T-ALL) is a highly malignant pediatric leukemia, where few therapeutic options are available for patients which relapse. We find that therapeutic targeting of GLI transcription factors by GANT-61 is particularly effective against NOTCH1 unmutated T-ALL cells. Investigation of the functional role of GLI1 disclosed that it contributes to T-ALL cell proliferation, survival, and dissemination through the modulation of AKT and CXCR4 signaling pathways. Decreased CXCR4 signaling following GLI1 inactivation was found to be prevalently due to post-transcriptional mechanisms including altered serine 339 CXCR4 phosphorylation and cortactin levels. We also identify a novel cross-talk between GLI transcription factors and FOXC1. Indeed, GLI factors can activate the expression of FOXC1 which is able to stabilize GLI1/2 protein levels through attenuation of their ubiquitination. Further, we find that prolonged GLI1 deficiency has a double-edged role in T-ALL progression favoring disease dissemination through the activation of a putative AKT/FOXC1/GLI2 axis. These findings have clinical significance as T-ALL patients with extensive central nervous system dissemination show low GLI1 transcript levels. Further, T-ALL patients having a GLI2-based Hedgehog activation signature are associated with poor survival. Together, these findings support a rationale for targeting the FOXC1/AKT axis to prevent GLI-dependent oncogenic Hedgehog signaling.


Subject(s)
Forkhead Transcription Factors/metabolism , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/metabolism , Signal Transduction , Zinc Finger Protein GLI1/metabolism , Animals , Apoptosis , Biopsy , Cell Cycle Checkpoints , Computational Biology/methods , Disease Models, Animal , Disease Progression , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Gene Silencing , Hedgehog Proteins/metabolism , Humans , Immunohistochemistry , Mice , Mutation , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/etiology , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/mortality , Prognosis , Protein Binding , Proto-Oncogene Proteins c-akt/metabolism , Receptor, Notch1/genetics , Receptor, Notch1/metabolism , Receptors, CXCR4/metabolism , Transcription Factors
11.
Nat Methods ; 17(8): 807-814, 2020 08.
Article in English | MEDLINE | ID: mdl-32737473

ABSTRACT

Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers and promoters.


Subject(s)
Epigenesis, Genetic/physiology , Pattern Recognition, Automated/methods , Animals , Cell Line , Drosophila , Histones/genetics , Histones/metabolism , Humans , Mice , Mice, Transgenic , Reproducibility of Results
12.
Nat Commun ; 11(1): 3696, 2020 07 29.
Article in English | MEDLINE | ID: mdl-32728046

ABSTRACT

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.


Subject(s)
Databases, Genetic , Genomics , Neoplasms/genetics , Cell Line, Tumor , Cell Transformation, Neoplastic/genetics , Gene Regulatory Networks , Humans , Mutation/genetics , Reproducibility of Results , Transcription Factors/metabolism
13.
iScience ; 23(8): 101354, 2020 Aug 21.
Article in English | MEDLINE | ID: mdl-32717640

ABSTRACT

The immune system is a complex biological network composed of hierarchically organized genes, proteins, and cellular components that combat external pathogens and monitor the onset of internal disease. To meet and ultimately defeat these challenges, the immune system orchestrates an exquisitely complex interplay of numerous cells, often with highly specialized functions, in a tissue-specific manner. One of the major methodologies of systems immunology is to measure quantitatively the components and interaction levels in the immunologic networks to construct a computational network and predict the response of the components to perturbations. The recent advances in high-throughput sequencing techniques have provided us with a powerful approach to dissecting the complexity of the immune system. Here we summarize the latest progress in integrating omics data and network approaches to construct networks and to infer the underlying signaling and transcriptional landscape, as well as cell-cell communication, in the immune system, with a focus on hematopoiesis, adaptive immunity, and tumor immunology. Understanding the network regulation of immune cells has provided new insights into immune homeostasis and disease, with important therapeutic implications for inflammation, cancer, and other immune-mediated disorders.

14.
BMC Bioinformatics ; 21(1): 222, 2020 May 29.
Article in English | MEDLINE | ID: mdl-32471347

ABSTRACT

BACKGROUND: Genome-wide ligation-based assays such as Hi-C provide us with an unprecedented opportunity to investigate the spatial organization of the genome. Results of a typical Hi-C experiment are often summarized in a chromosomal contact map, a matrix whose elements reflect the co-location frequencies of genomic loci. To elucidate the complex structural and functional interactions between those genomic loci, networks offer a natural and powerful framework. RESULTS: We propose a novel graph-theoretical framework, the Corrected Gene Proximity (CGP) map to study the effect of the 3D spatial organization of genes in transcriptional regulation. The starting point of the CGP map is a weighted network, the gene proximity map, whose weights are based on the contact frequencies between genes extracted from genome-wide Hi-C data. We derive a null model for the network based on the signal contributed by the 1D genomic distance and use it to "correct" the gene proximity for cell type 3D specific arrangements. The CGP map, therefore, provides a network framework for the 3D structure of the genome on a global scale. On human cell lines, we show that the CGP map can detect and quantify gene co-regulation and co-localization more effectively than the map obtained by raw contact frequencies. Analyzing the expression pattern of metabolic pathways of two hematopoietic cell lines, we find that the relative positioning of the genes, as captured and quantified by the CGP, is highly correlated with their expression change. We further show that the CGP map can be used to form an inter-chromosomal proximity map that allows large-scale abnormalities, such as chromosomal translocations, to be identified. CONCLUSIONS: The Corrected Gene Proximity map is a map of the 3D structure of the genome on a global scale. 
It allows the simultaneous analysis of intra- and inter- chromosomal interactions and of gene co-regulation and co-localization more effectively than the map obtained by raw contact frequencies, thus revealing hidden associations between global spatial positioning and gene expression. The flexible graph-based formalism of the CGP map can be easily generalized to study any existing Hi-C datasets.
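The idea of "correcting" proximity for the signal contributed by 1D genomic distance can be sketched as an observed-over-expected ratio. This is a simplified stand-in, not the CGP null model from the paper: it works on equally spaced loci rather than genes, takes the expected contact at each separation to be the mean of the corresponding matrix diagonal, and uses a hypothetical function name and toy matrix.

```python
import numpy as np

def distance_corrected(contacts):
    """Divide each contact by the mean contact at the same 1D separation."""
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    expected = np.zeros_like(contacts)
    for d in range(n):
        # Mean contact frequency among all locus pairs separated by d bins.
        mean_d = np.diagonal(contacts, offset=d).mean()
        idx = np.arange(n - d)
        expected[idx, idx + d] = mean_d
        expected[idx + d, idx] = mean_d
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(expected > 0, contacts / expected, 0.0)

# Toy map (hypothetical): contacts decay with separation, except for one
# pair of loci (1 and 4) that is closer in 3D than its 1D distance implies.
n = 6
sep = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
toy = 10.0 / (1.0 + sep)
toy[1, 4] += 5.0
toy[4, 1] += 5.0

corrected = distance_corrected(toy)
```

After correction, pairs that follow the background decay have values near 1 or below, while the (1, 4) pair stands out with a ratio above 1, even though its raw contact count is lower than that of adjacent loci.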


Subject(s)
Chromosomes, Human , Gene Expression Regulation , Genome, Human , Cell Line , Genomics/methods , Humans , Metabolic Networks and Pathways/genetics
15.
Cell Syst ; 10(3): 219-222, 2020 03 25.
Article in English | MEDLINE | ID: mdl-32213348

ABSTRACT

We compare the "patterns of mutation" in biological and technological networks. Negative selection at central nodes in biological networks has been widely reported; however, we show technological networks have an opposite trend. This suggests a potential contrast: biological evolution involves random tinkering, whereas man-made systems change according to rational planning.


Subject(s)
Technology/trends , Biological Evolution , Humans , Models, Biological , Protein Interaction Maps
16.
Genome Biol ; 20(1): 57, 2019 03 19.
Article in English | MEDLINE | ID: mdl-30890172

ABSTRACT

BACKGROUND: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study. RESULTS: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments. CONCLUSIONS: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.
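One of the established quality measures named in the abstract, the ratio of intra- to interchromosomal interactions, is simple enough to sketch directly. The input format (a list of chromosome-name pairs, one per read pair), the function name, and the toy data are assumptions for illustration; this is not code from the linked repository, and no quality threshold is implied.

```python
def intra_inter_ratio(pairs):
    """Ratio of intrachromosomal to interchromosomal read pairs."""
    intra = sum(1 for chrom_a, chrom_b in pairs if chrom_a == chrom_b)
    inter = len(pairs) - intra
    if inter == 0:
        raise ValueError("no interchromosomal pairs; ratio is undefined")
    return intra / inter

# Toy read pairs (hypothetical): three intrachromosomal, one interchromosomal.
toy_pairs = [
    ("chr1", "chr1"),
    ("chr1", "chr1"),
    ("chr2", "chr2"),
    ("chr1", "chr2"),
]
ratio = intra_inter_ratio(toy_pairs)
```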


Subject(s)
Genomics/standards , High-Throughput Nucleotide Sequencing/standards , Neoplasms/genetics , Quality Control , Software , Humans , Reproducibility of Results , Tumor Cells, Cultured
17.
Genetics ; 208(3): 937-949, 2018 03.
Article in English | MEDLINE | ID: mdl-29284660

ABSTRACT

To develop a catalog of regulatory sites in two major model organisms, Drosophila melanogaster and Caenorhabditis elegans, the modERN (model organism Encyclopedia of Regulatory Networks) consortium has systematically assayed the binding sites of transcription factors (TFs). Combined with data produced by our predecessor, modENCODE (Model Organism ENCyclopedia Of DNA Elements), we now have data for 262 TFs identifying 1.23 M sites in the fly genome and 217 TFs identifying 0.67 M sites in the worm genome. Because sites from different TFs are often overlapping and tightly clustered, they fall into 91,011 and 59,150 regions in the fly and worm, respectively, and these binding sites span as little as 8.7 and 5.8 Mb in the two organisms. Clusters with large numbers of sites (so-called high occupancy target, or HOT regions) predominantly associate with broadly expressed genes, whereas clusters containing sites from just a few factors are associated with genes expressed in tissue-specific patterns. All of the strains expressing GFP-tagged TFs are available at the stock centers, and the chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center and also through a simple interface (http://epic.gs.washington.edu/modERN/) that facilitates rapid accessibility of processed data sets. These data will facilitate a vast number of scientific inquiries into the function of individual TFs in key developmental, metabolic, and defense and homeostatic regulatory pathways, as well as provide a broader perspective on how individual TFs work together in local networks and globally across the life spans of these two key model organisms.


Subject(s)
Caenorhabditis elegans/genetics , Caenorhabditis elegans/metabolism , Databases, Genetic , Drosophila/genetics , Drosophila/metabolism , Genome-Wide Association Study , Transcription Factors/metabolism , Animals , Binding Sites , Chromatin Immunoprecipitation , Genome-Wide Association Study/methods , Models, Biological , Nucleotide Motifs , Protein Binding
18.
PLoS Comput Biol ; 13(7): e1005647, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28742097

ABSTRACT

Genome-wide proximity ligation based assays such as Hi-C have revealed that eukaryotic genomes are organized into structural units called topologically associating domains (TADs). From a visual examination of the chromosomal contact map, however, it is clear that the organization of the domains is not simple or obvious. Instead, TADs exhibit various length scales and, in many cases, a nested arrangement. Here, by exploiting the resemblance between TADs in a chromosomal contact map and densely connected modules in a network, we formulate TAD identification as a network optimization problem and propose an algorithm, MrTADFinder, to identify TADs from intra-chromosomal contact maps. MrTADFinder is based on the network-science concept of modularity. A key component of it is deriving an appropriate background model for contacts in a random chain, by numerically solving a set of matrix equations. The background model preserves the observed coverage of each genomic bin as well as the distance dependence of the contact frequency for any pair of bins exhibited by the empirical map. Also, by introducing a tunable resolution parameter, MrTADFinder provides a self-consistent approach for identifying TADs at different length scales, hence the acronym "Mr" standing for Multiple Resolutions. We then apply MrTADFinder to various Hi-C datasets. The identified domain boundaries are marked by characteristic signatures in chromatin marks and transcription factors (TF) that are consistent with earlier work. Moreover, by calling TADs at different length scales, we observe that boundary signatures change with resolution, with different chromatin features having different characteristic length scales. Furthermore, we report an enrichment of HOT (high-occupancy target) regions near TAD boundaries and investigate the role of different TFs in determining boundaries at various resolutions. 
To further explore the interplay between TADs and epigenetic marks, as tumor mutational burden is known to be coupled to chromatin structure, we examine how somatic mutations are distributed across boundaries and find a clear stepwise pattern. Overall, MrTADFinder provides a novel computational framework to explore the multi-scale structures in Hi-C contact maps.
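The network-science concept of modularity with a tunable resolution parameter can be sketched as follows. This is not MrTADFinder's implementation: the abstract describes a background model that preserves bin coverage and distance dependence, whereas this sketch substitutes the standard configuration-model expectation k_i k_j / 2m. The function name, the `gamma` parameter name, and the toy contact map are assumptions.

```python
import numpy as np

def modularity(weights, labels, gamma=1.0):
    """Modularity Q of a partition, with resolution parameter gamma.

    Uses the configuration-model null (degree product over total weight),
    a simplification relative to a distance-aware background model.
    """
    w = np.asarray(weights, dtype=float)
    labels = np.asarray(labels)
    degree = w.sum(axis=1)
    two_m = degree.sum()
    expected = np.outer(degree, degree) / two_m
    same_domain = labels[:, None] == labels[None, :]
    return float(((w - gamma * expected) * same_domain).sum() / two_m)

# Toy contact map (hypothetical): two domains of three bins each, with
# strong contacts inside a domain and weak contacts between domains.
toy = np.full((6, 6), 0.1)
toy[:3, :3] = 1.0
toy[3:, 3:] = 1.0
np.fill_diagonal(toy, 0.0)

q_two_domains = modularity(toy, [0, 0, 0, 1, 1, 1])
q_one_domain = modularity(toy, [0, 0, 0, 0, 0, 0])
```

Here the two-domain partition scores higher than a single domain. Raising `gamma` increases the penalty on expected contacts inside each domain, which is the general mechanism by which a resolution parameter favors smaller domains.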


Subject(s)
Chromatin , Chromosomes , Computational Biology/methods , Models, Genetic , Algorithms , Cell Line , Cell Nucleus/chemistry , Cell Nucleus/genetics , Chromatin/chemistry , Chromatin/genetics , Chromatin/ultrastructure , Chromosomes/chemistry , Chromosomes/genetics , Chromosomes/ultrastructure , Genome/genetics , Genome/physiology , Humans , Protein Binding , Transcription Factors/metabolism
19.
Bioinformatics ; 33(14): 2199-2201, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28369339

ABSTRACT

SUMMARY: Genome-wide proximity ligation based assays like Hi-C have opened a window to the 3D organization of the genome. In so doing, they present data structures that are different from conventional 1D signal tracks. To exploit the 2D nature of Hi-C contact maps, matrix techniques like spectral analysis are particularly useful. Here, we present HiC-spector, a collection of matrix-related functions for analyzing Hi-C contact maps. In particular, we introduce a novel reproducibility metric for quantifying the similarity between contact maps based on spectral decomposition. The metric successfully separates contact maps mapped from Hi-C data coming from biological replicates, pseudo-replicates and different cell types. AVAILABILITY AND IMPLEMENTATION: Source code in Julia and Python, and detailed documentation is available at https://github.com/gersteinlab/HiC-spector . CONTACT: koonkiu.yan@gmail.com or mark@gersteinlab.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
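A reproducibility score based on spectral decomposition can be sketched by comparing the leading eigenvectors of each map's normalized Laplacian. This is a simplified illustration under stated assumptions, not the HiC-spector code: the function names, the choice of `k`, the plain averaging of eigenvector distances, and the toy maps are all hypothetical, and every bin is assumed to have nonzero coverage.

```python
import numpy as np

def laplacian_eigvecs(contacts, k):
    """First k eigenvectors of the symmetric normalized Laplacian."""
    w = np.asarray(contacts, dtype=float)
    degree = w.sum(axis=1)                 # assumed strictly positive
    inv_sqrt = 1.0 / np.sqrt(degree)
    lap = np.eye(len(w)) - inv_sqrt[:, None] * w * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    return vecs[:, :k]

def spectral_distance(a, b, k=2):
    """Mean Euclidean distance between matching leading eigenvectors."""
    va, vb = laplacian_eigvecs(a, k), laplacian_eigvecs(b, k)
    total = 0.0
    for i in range(k):
        # Eigenvectors are defined only up to sign, so try both.
        total += min(np.linalg.norm(va[:, i] - vb[:, i]),
                     np.linalg.norm(va[:, i] + vb[:, i]))
    return total / k

def block_map(boundary, n=8):
    """Toy contact map with one domain boundary (hypothetical data)."""
    side = np.arange(n) < boundary
    return np.where(side[:, None] == side[None, :], 1.0, 0.1)

rng = np.random.default_rng(0)
noise = rng.uniform(0.0, 0.02, size=(8, 8))
noise = (noise + noise.T) / 2.0            # keep the map symmetric

map_a = block_map(4)
replicate = map_a + noise                  # same structure, small noise
other_cell_type = block_map(2)             # different domain structure

d_self = spectral_distance(map_a, map_a)
d_replicate = spectral_distance(map_a, replicate)
d_other = spectral_distance(map_a, other_cell_type)
```

In this toy example the noisy replicate stays much closer to the original map than the map with a shifted boundary does, which mirrors the separation between replicates and different cell types described in the abstract.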


Subject(s)
Chromosomes/chemistry , Genetic Techniques , Genome , Biotinylation , DNA/chemistry , Gene Library , Humans , Reproducibility of Results
20.
Cell Syst ; 2(3): 147-157, 2016 Mar 23.
Article in English | MEDLINE | ID: mdl-27047991

ABSTRACT

Biological systems are complex. In particular, the interactions between molecular components often form dense networks that, more often than not, are criticized for being inscrutable 'hairballs'. We argue that one way of untangling these hairballs is through cross-disciplinary network comparison-leveraging advances in other disciplines to obtain new biological insights. In some cases, such comparisons enable the direct transfer of mathematical formalism between disciplines, precisely describing the abstract associations between entities and allowing us to apply a variety of sophisticated formalisms to biology. In cases where the detailed structure of the network does not permit the transfer of complete formalisms between disciplines, comparison of mechanistic interactions in systems for which we have significant day-to-day experience can provide analogies for interpreting relatively more abstruse biological networks. Here, we illustrate how these comparisons benefit the field with a few specific examples related to network growth, organizational hierarchies, and the evolution of adaptive systems.
