ABSTRACT
Ectopic expression of OCT4, SOX2, KLF4 and MYC (OSKM) transforms differentiated cells into induced pluripotent stem cells. To refine our mechanistic understanding of reprogramming, especially during the early stages, we profiled chromatin accessibility and gene expression at single-cell resolution across a densely sampled time course of human fibroblast reprogramming. Using neural networks that map DNA sequence to ATAC-seq profiles at base-resolution, we annotated cell-state-specific predictive transcription factor (TF) motif syntax in regulatory elements, inferred affinity- and concentration-dependent dynamics of Tn5-bias corrected TF footprints, linked peaks to putative target genes, and elucidated rewiring of TF-to-gene cis-regulatory networks. Our models reveal that early in reprogramming, OSK, at supraphysiological concentrations, rapidly open transient regulatory elements by occupying non-canonical low-affinity binding sites. As OSK concentration falls, the accessibility of these transient elements decays as a function of motif affinity. We find that these OSK-dependent transient elements sequester the somatic TF AP-1. This redistribution is strongly associated with the silencing of fibroblast-specific genes within individual nuclei. Together, our integrated single-cell resource and models reveal insights into the cis-regulatory code of reprogramming at unprecedented resolution, connect TF stoichiometry and motif syntax to diversification of cell fate trajectories, and provide new perspectives on the dynamics and role of transient regulatory elements in somatic silencing.
ABSTRACT
The regenerative potential of brain stem cell niches deteriorates during aging. Yet the mechanisms underlying this decline are largely unknown. Here we characterize genome-wide chromatin accessibility of neurogenic niche cells in vivo during aging. Interestingly, chromatin accessibility at adhesion and migration genes decreases with age in quiescent neural stem cells (NSCs) but increases with age in activated (proliferative) NSCs. Quiescent and activated NSCs exhibit opposing adhesion behaviors during aging: quiescent NSCs become less adhesive, whereas activated NSCs become more adhesive. Old activated NSCs also show decreased migration in vitro and diminished mobilization out of the niche for neurogenesis in vivo. Using tension sensors, we find that aging increases force-producing adhesions in activated NSCs. Inhibiting the cytoskeletal-regulating kinase ROCK reduces these adhesions, restores migration in old activated NSCs in vitro, and boosts neurogenesis in vivo. These results have implications for restoring the migratory potential of NSCs and for improving neurogenesis in the aged brain.
Subjects
Chromatin, Neural Stem Cells, Chromatin/genetics, Neurogenesis/genetics, Brain
ABSTRACT
GENCODE produces high-quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Subjects
Computational Biology, Human Genome, Humans, Animals, Mice, Molecular Sequence Annotation, Computational Biology/methods, Human Genome/genetics, Transcriptome/genetics, Gene Expression Profiling, Genetic Databases
ABSTRACT
To define the multi-cellular epigenomic and transcriptional landscape of cardiac cellular development, we generated single-cell chromatin accessibility maps of human fetal heart tissues. We identified eight major differentiation trajectories involving primary cardiac cell types, each associated with dynamic transcription factor (TF) activity signatures. We contrasted regulatory landscapes of iPSC-derived cardiac cell types and their in vivo counterparts, which enabled optimization of in vitro differentiation of epicardial cells. Further, we interpreted sequence-based deep learning models of cell-type-resolved chromatin accessibility profiles to decipher underlying TF motif lexicons. De novo mutations predicted to affect chromatin accessibility in arterial endothelium were enriched in congenital heart disease (CHD) cases vs. controls. In vitro studies in iPSCs validated the functional impact of identified variation on the predicted developmental cell types. This work thus defines the cell-type-resolved cis-regulatory sequence determinants of heart development and identifies disruption of cell-type-specific regulatory elements in CHD.
Subjects
Chromatin, Congenital Heart Defects, Humans, Chromatin/genetics, Congenital Heart Defects/genetics, Heart, Mutation, Single-Cell Analysis
ABSTRACT
Genome-wide association studies (GWASs) of eye disorders have identified hundreds of genetic variants associated with ocular disease. However, the vast majority of these variants are noncoding, making it challenging to interpret their function. Here we present a joint single-cell atlas of gene expression and chromatin accessibility of the adult human retina with more than 50,000 cells, which we used to analyze single-nucleotide polymorphisms (SNPs) implicated by GWASs of age-related macular degeneration, glaucoma, diabetic retinopathy, myopia, and type 2 macular telangiectasia. We integrate this atlas with a HiChIP enhancer connectome, expression quantitative trait loci (eQTL) data, and base-resolution deep learning models to predict noncoding SNPs with causal roles in eye disease, assess SNP impact on transcription factor binding, and define their known and novel target genes. Our efforts nominate pathogenic SNP-target gene interactions for multiple vision disorders and provide a potentially powerful resource for interpreting noncoding variation in the eye.
ABSTRACT
MOTIVATION: In silico saturation mutagenesis (ISM) is a popular approach in computational genomics for calculating feature attributions on biological sequences that proceeds by systematically perturbing each position in a sequence and recording the difference in model output. However, this method can be slow because systematically perturbing each position requires performing a number of forward passes proportional to the length of the sequence being examined. RESULTS: In this work, we propose a modification of ISM that leverages the principles of compressed sensing to require only a constant number of forward passes, regardless of sequence length, when applied to models that contain operations with a limited receptive field, such as convolutions. Our method, named Yuzu, can reduce the time that ISM spends in convolution operations by several orders of magnitude and, consequently, Yuzu can speed up ISM on several commonly used architectures in genomics by over an order of magnitude. Notably, we found that Yuzu provides speedups that increase with the complexity of the convolution operation and the length of the sequence being analyzed, suggesting that Yuzu provides large benefits in realistic settings. AVAILABILITY AND IMPLEMENTATION: We have made this tool available at https://github.com/kundajelab/yuzu.
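Standard ISM, as described above, scores each position by substituting every alternative base and re-running the model, so the number of forward passes grows linearly with sequence length. The sketch below illustrates only that baseline cost, not Yuzu's compressed-sensing method; it assumes NumPy and a hypothetical stand-in model (`toy_model`: one convolution filter, ReLU, sum), and all function names are illustrative.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) one-hot matrix."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    x = np.zeros((len(seq), len(ALPHABET)))
    for pos, base in enumerate(seq):
        x[pos, index[base]] = 1.0
    return x


def toy_model(x: np.ndarray, filt: np.ndarray) -> float:
    """Hypothetical stand-in model: valid 1D convolution, ReLU, sum pooling."""
    width = filt.shape[0]
    scores = np.array([np.sum(x[i:i + width] * filt)
                       for i in range(x.shape[0] - width + 1)])
    return float(np.maximum(scores, 0.0).sum())


def naive_ism(x: np.ndarray, model) -> tuple[np.ndarray, int]:
    """Score every single-base substitution as model(mutant) - model(reference).

    Returns an (L, 4) matrix of output differences (zero at the reference
    base) and the number of forward passes used.
    """
    reference = model(x)
    passes = 1
    delta = np.zeros_like(x)
    for pos in range(x.shape[0]):
        for base in range(x.shape[1]):
            if x[pos, base] == 1.0:
                continue  # the reference base is not a mutation
            mutant = x.copy()
            mutant[pos] = 0.0
            mutant[pos, base] = 1.0
            delta[pos, base] = model(mutant) - reference
            passes += 1
    return delta, passes


rng = np.random.default_rng(0)
filt = rng.normal(size=(5, 4))
x = one_hot("ACGTACGTTGCA")
delta, passes = naive_ism(x, lambda s: toy_model(s, filt))
print(delta.shape, passes)  # (12, 4) 37: one reference pass + 3 per position
```

Doubling the sequence length doubles the number of mutant passes, which is the cost that motivates methods such as Yuzu.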
Subjects
Genomics, Mutagenesis, Genomics/methods
ABSTRACT
MOTIVATION: Deep-learning models, such as convolutional neural networks, are able to accurately map biological sequences to associated functional readouts and properties by learning predictive de novo representations. In silico saturation mutagenesis (ISM) is a popular feature attribution technique for inferring contributions of all characters in an input sequence to the model's predicted output. The main drawback of ISM is its runtime, as it involves multiple forward propagations of all possible mutations of each character in the input sequence through the trained model to predict the effects on the output. RESULTS: We present fastISM, an algorithm that speeds up ISM by over 10× for commonly used convolutional neural network architectures. fastISM is based on the observations that the majority of computation in ISM is spent in convolutional layers, and a single mutation only disrupts a limited region of intermediate layers, rendering most computation redundant. fastISM reduces the runtime gap between backpropagation-based feature attribution methods and ISM. On multi-output architectures it is far faster than backpropagation-based methods, making it feasible to run ISM on a large number of sequences. AVAILABILITY AND IMPLEMENTATION: An easy-to-use Keras/TensorFlow 2 implementation of fastISM is available at https://github.com/kundajelab/fastISM. fastISM can be installed using pip install fastism. A hands-on tutorial can be found at https://colab.research.google.com/github/kundajelab/fastISM/blob/master/notebooks/colab/DeepSEA.ipynb. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
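fastISM rests on the observation that a single substitution only changes a limited window of each intermediate layer. The sketch below makes that concrete by propagating the affected index range through a hypothetical layer stack (stride-1 "valid" convolutions with ReLU and non-overlapping max pooling, written in NumPy) and checking the prediction against full forward passes. It illustrates the observation only; it is not the fastISM implementation or its Keras API, and all function names are hypothetical.

```python
import numpy as np


def affected_interval(pos, length, layers):
    """Track which output indices a single-position change can reach.

    Assumes stride-1 'valid' convolutions and non-overlapping max pooling.
    Returns (lo, hi, layer_length) after each layer, with inclusive bounds.
    """
    lo, hi = pos, pos
    trace = []
    for kind, size in layers:
        if kind == "conv":
            length = length - size + 1
            lo, hi = max(0, lo - size + 1), min(hi, length - 1)
        elif kind == "pool":
            length = length // size
            lo, hi = lo // size, min(hi // size, length - 1)
        else:
            raise ValueError(f"unknown layer type: {kind}")
        trace.append((lo, hi, length))
    return trace


def conv_valid(x, w):
    """(L, C_in) x (k, C_in, C_out) -> (L - k + 1, C_out), followed by ReLU."""
    k = w.shape[0]
    out = np.stack([np.einsum("kc,kco->o", x[i:i + k], w)
                    for i in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0.0)


def max_pool(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    n = x.shape[0] // size
    return x[:n * size].reshape(n, size, x.shape[1]).max(axis=1)


def forward(x, layers, weights):
    """Run the hypothetical stack; each conv layer consumes the next weight."""
    weight_iter = iter(weights)
    for kind, size in layers:
        x = conv_valid(x, next(weight_iter)) if kind == "conv" else max_pool(x, size)
    return x


layers = [("conv", 5), ("pool", 2), ("conv", 3)]
rng = np.random.default_rng(1)
weights = [rng.normal(size=(5, 4, 8)), rng.normal(size=(3, 8, 8))]
x = np.eye(4)[rng.integers(0, 4, size=100)]  # random one-hot sequence, L = 100
mutant = x.copy()
mutant[50] = np.roll(x[50], 1)               # substitute a different base at position 50

ref_out = forward(x, layers, weights)
mut_out = forward(mutant, layers, weights)
changed = np.flatnonzero(np.any(ref_out != mut_out, axis=1))
print(affected_interval(50, 100, layers)[-1])  # (21, 25, 46)
print(changed)  # every index lies within 21..25
```

Only 5 of the 46 final-layer positions can differ from the reference pass, so recomputing just that window per mutation is where the redundant computation is saved.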
Subjects
Neural Networks (Computer), Mutagenesis, Mutation
ABSTRACT
BACKGROUND: Smooth muscle cells (SMCs) transition into a number of different phenotypes during atherosclerosis, including those that resemble fibroblasts and chondrocytes, and make up the majority of cells in the atherosclerotic plaque. To better understand the epigenetic and transcriptional mechanisms that mediate these cell state changes, and how they relate to risk for coronary artery disease (CAD), we have investigated the causality and function of transcription factors at genome-wide associated loci. METHODS: We used CRISPR-Cas9 genome and epigenome editing to identify the causal gene and cells for a complex CAD genome-wide association study signal at 2q22.3. Single-cell epigenetic and transcriptomic profiling in murine models and human coronary artery smooth muscle cells was used to understand the cellular and molecular mechanism by which this CAD risk gene exerts its function. RESULTS: CRISPR-Cas9 genome and epigenome editing showed that the complex CAD genetic signals within a genomic region at 2q22.3 lie within smooth muscle long-distance enhancers for ZEB2, a transcription factor extensively studied in the context of epithelial-mesenchymal transition in cancer development. Zeb2 regulates SMC phenotypic transition through chromatin remodeling that obviates accessibility and disrupts both Notch and transforming growth factor β signaling, thus altering the epigenetic trajectory of SMC transitions. SMC-specific loss of Zeb2 resulted in an inability of transitioning SMCs to turn off contractile programming and take on a fibroblast-like phenotype, but accelerated the formation of chondromyocytes, mirroring features of high-risk atherosclerotic plaques in human coronary arteries. CONCLUSIONS: These studies identify ZEB2 as a new CAD genome-wide association study gene that affects features of plaque vulnerability through direct effects on the epigenome, providing a new therapeutic approach to target vascular disease.
Subjects
Atherosclerosis/genetics, Genetic Epigenesis/genetics, Zinc Finger E-box Binding Homeobox 2/genetics, Animals, Atherosclerosis/pathology, Humans, Mice, Single-Cell Analysis
ABSTRACT
Somatic cell transcription factors are critical to maintaining cellular identity and constitute a barrier to human somatic cell reprogramming; yet a comprehensive understanding of the mechanism of action is lacking. To gain insight, we examined epigenome remodeling at the onset of human nuclear reprogramming by profiling human fibroblasts after fusion with murine embryonic stem cells (ESCs). By assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) and chromatin immunoprecipitation sequencing we identified enrichment for the activator protein 1 (AP-1) transcription factor c-Jun at regions of early transient accessibility at fibroblast-specific enhancers. Expression of a dominant negative AP-1 mutant (dnAP-1) reduced accessibility and expression of fibroblast genes, overcoming the barrier to reprogramming. Remarkably, efficient reprogramming of human fibroblasts to induced pluripotent stem cells was achieved by transduction with vectors expressing SOX2, KLF4, and inducible dnAP-1, demonstrating that dnAP-1 can substitute for exogenous human OCT4. Mechanistically, we show that the AP-1 component c-Jun has two unexpected temporally distinct functions in human reprogramming: 1) to potentiate fibroblast enhancer accessibility and fibroblast-specific gene expression, and 2) to bind to and repress OCT4 as a complex with MBD3. Our findings highlight AP-1 as a previously unrecognized potent dual gatekeeper of the somatic cell state.
Subjects
Cellular Reprogramming, Gene Expression Regulation, Mouse Embryonic Stem Cells/metabolism, Transcription Factor AP-1/metabolism, Animals, Cell Line, Humans, Kruppel-Like Factor 4, Mice, Transcription Factor AP-1/genetics
ABSTRACT
MOTIVATION: Genome-wide profiles of chromatin accessibility and gene expression in diverse cellular contexts are critical to decipher the dynamics of transcriptional regulation. Recently, convolutional neural networks have been used to learn predictive cis-regulatory DNA sequence models of context-specific chromatin accessibility landscapes. However, these context-specific regulatory sequence models cannot generalize predictions across cell types. RESULTS: We introduce multi-modal, residual neural network architectures that integrate cis-regulatory sequence and context-specific expression of trans-regulators to predict genome-wide chromatin accessibility profiles across cellular contexts. We show that the average accessibility of a genomic region across training contexts can be a surprisingly powerful predictor. We leverage this feature and employ novel strategies for training models to enhance genome-wide prediction of shared and context-specific chromatin accessible sites across cell types. We interpret the models to reveal insights into cis- and trans-regulation of chromatin dynamics across 123 diverse cellular contexts. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/kundajelab/ChromDragoNN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
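The abstract notes that a region's average accessibility across training contexts is a surprisingly strong predictor on its own. A minimal sketch of that context-agnostic baseline follows, assuming a regions-by-contexts accessibility matrix in NumPy; the toy matrix is made up for illustration, and the function names are hypothetical rather than part of ChromDragoNN.

```python
import numpy as np


def mean_accessibility_baseline(train: np.ndarray) -> np.ndarray:
    """Predict each region's accessibility in an unseen context as its mean
    over the training contexts.

    train: (n_regions, n_train_contexts) accessibility matrix.
    """
    return train.mean(axis=1)


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two 1D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


# Toy binary accessibility matrix: 4 regions x 3 training contexts (made-up values).
train = np.array([[1, 1, 0],
                  [0, 0, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=float)
held_out = np.array([1, 0, 1, 0], dtype=float)  # the same regions in a new context

baseline = mean_accessibility_baseline(train)
print(baseline)  # fraction of training contexts in which each region is open
print(pearson(baseline, held_out))
```

By construction this baseline returns the same prediction for every context, so on its own it cannot single out context-specific sites; that is the part a model conditioned on trans-regulator expression has to supply.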
Subjects
Chromatin, Genome, Base Sequence, Genomics, Neural Networks (Computer)
ABSTRACT
The relationship between noncoding DNA sequence and gene expression is not well understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ~500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and to fine-map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.