ABSTRACT
Transcriptional enhancers act as docking stations for combinations of transcription factors and thereby regulate the spatiotemporal activation of their target genes [1]. It has been a long-standing goal in the field to decode the regulatory logic of an enhancer and to understand in detail how spatiotemporal gene expression is encoded in an enhancer sequence. Here we show that deep learning models [2-6] can be used to efficiently design synthetic, cell-type-specific enhancers, starting from random sequences, and that this optimization process allows detailed tracing of enhancer features at single-nucleotide resolution. We evaluate the function of fully synthetic enhancers designed to specifically target Kenyon cells or glial cells in the fruit fly brain using transgenic animals. We further exploit enhancer design to create 'dual-code' enhancers that target two cell types, as well as fully functional minimal enhancers smaller than 50 base pairs. By examining the state space searches towards local optima, we characterize enhancer codes through the strength, combination and arrangement of transcription factor activator and transcription factor repressor motifs. Finally, we apply the same strategies to successfully design human enhancers, which adhere to enhancer rules similar to those of Drosophila enhancers. Enhancer design guided by deep learning leads to a better understanding of how enhancers work and shows that their code can be exploited to manipulate cell states.
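The design strategy above starts from a random sequence and repeatedly applies the mutations a model predicts to be most beneficial until a local optimum is reached. The following is a minimal sketch of such a greedy in silico evolution loop, not the authors' pipeline: the scoring function `toy_score` and its motif `TOY_MOTIF` are hypothetical stand-ins for a trained deep learning model's predicted activity in the target cell type, and the length, number of rounds and seed are arbitrary.

```python
import random

# Hypothetical target motif for the toy scorer (an assumption, not from the source).
TOY_MOTIF = "CAGCTG"


def toy_score(seq: str) -> float:
    """Toy stand-in for a trained sequence-to-activity model.

    Each motif-length window contributes (matching positions / motif length) ** 2,
    so windows that come closer to the motif are rewarded smoothly.
    """
    k = len(TOY_MOTIF)
    total = 0.0
    for i in range(len(seq) - k + 1):
        matches = sum(a == b for a, b in zip(seq[i:i + k], TOY_MOTIF))
        total += (matches / k) ** 2
    return total


def evolve(length=100, rounds=20, score=toy_score, seed=0):
    """Greedy in silico evolution starting from a random sequence.

    Each round scores every single-nucleotide substitution and keeps the best one.
    The loop stops early when no substitution improves the score (a local optimum).
    Returns the final sequence and the score after each accepted round.
    """
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(length))
    history = [score(seq)]
    for _ in range(rounds):
        best_seq, best_score = seq, history[-1]
        for i, current in enumerate(seq):
            for base in "ACGT":
                if base == current:
                    continue
                candidate = seq[:i] + base + seq[i + 1:]
                s = score(candidate)
                if s > best_score:
                    best_seq, best_score = candidate, s
        if best_seq == seq:  # no single mutation helps: local optimum
            break
        seq = best_seq
        history.append(best_score)
    return seq, history


designed, trajectory = evolve(length=60, rounds=10)
```

Because every accepted round changes exactly one nucleotide, logging the accepted mutations alongside `history` is one simple way to trace how individual features emerge along the optimization path.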
Subject(s)
Cells; Deep Learning; Drosophila melanogaster; Enhancer Elements, Genetic; Synthetic Biology; Animals; Humans; Animals, Genetically Modified/genetics; Enhancer Elements, Genetic/genetics; Gene Expression Regulation; Transcription Factors/metabolism; Cells/classification; Cells/metabolism; Neuroglia/metabolism; Brain/cytology; Drosophila melanogaster/cytology; Drosophila melanogaster/genetics; Repressor Proteins/metabolism
ABSTRACT
Genomic sequence variation within enhancers and promoters can have a significant impact on the cellular state and phenotype. However, sifting through the millions of candidate variants in a personal genome or a cancer genome to identify those that impact cis-regulatory function remains a major challenge. Interpretation of noncoding genome variation benefits from explainable artificial intelligence to predict and interpret the impact of a mutation on gene regulation. Here we generate phased whole genomes with matched chromatin accessibility, histone modifications, and gene expression for 10 melanoma cell lines. We find that training a specialized deep learning model, called DeepMEL2, on melanoma chromatin accessibility data can capture the various regulatory programs of the melanocytic and mesenchymal-like melanoma cell states. This model outperforms motif-based variant scoring, as well as more generic deep learning models. We detect hundreds to thousands of allele-specific chromatin accessibility variants (ASCAVs) in each melanoma genome, of which 15%-20% can be explained by gains or losses of transcription factor binding sites. A considerable fraction of ASCAVs are caused by changes in AP-1 binding, as confirmed by matched ChIP-seq data showing allele-specific binding of JUN and FOSL1. Finally, by augmenting the DeepMEL2 model with ChIP-seq data for GABPA, we identify the TERT promoter mutation, as well as additional ETS motif gains, with high confidence. In conclusion, we present a new integrative genomics approach and a deep learning model to identify and interpret functional enhancer mutations with allelic imbalance of chromatin accessibility and gene expression.
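Allele-specific chromatin accessibility at a heterozygous site is commonly detected by asking whether the reads carrying each allele deviate from the 50:50 ratio expected under balanced accessibility. The abstract does not state which statistical test was used to call ASCAVs, so the following is only a minimal sketch assuming an exact two-sided binomial test; the depth cutoff `min_depth` and significance level `alpha` are hypothetical values.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k out of n reads.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def is_allelic_imbalance(ref_reads: int, alt_reads: int,
                         alpha: float = 0.01, min_depth: int = 10) -> bool:
    """Flag a heterozygous site whose ref/alt read split departs from 50:50.

    alpha and min_depth are hypothetical thresholds chosen for illustration.
    """
    n = ref_reads + alt_reads
    if n < min_depth:  # too few reads to test reliably
        return False
    return binom_two_sided_p(alt_reads, n) < alpha
```

In practice such a test would also need multiple-testing correction across all heterozygous sites, and a sequence model such as DeepMEL2 would then be used to ask which imbalanced sites are explained by a predicted gain or loss of a binding site.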
Subject(s)
Chromatin; Deep Learning; Alleles; Artificial Intelligence; Chromatin/genetics; Promoter Regions, Genetic
ABSTRACT
Deciphering the genomic regulatory code of enhancers is a key challenge in biology because this code underlies cellular identity. A better understanding of how enhancers work will improve the interpretation of noncoding genome variation and empower the generation of cell type-specific drivers for gene therapy. Here, we explore the combination of deep learning and cross-species chromatin accessibility profiling to build explainable enhancer models. We apply this strategy to decipher the enhancer code in melanoma, a relevant case study owing to the presence of distinct melanoma cell states. We trained and validated a deep learning model, called DeepMEL, using chromatin accessibility data of 26 melanoma samples across six different species. We show the accuracy of DeepMEL predictions on the CAGI5 challenge, where it significantly outperforms existing models on the melanoma enhancer of IRF4. Next, we exploit DeepMEL to analyze enhancer architectures and identify accurate transcription factor binding sites for the core regulatory complexes in the two different melanoma states, with distinct roles for each transcription factor in terms of nucleosome displacement or enhancer activation. Finally, DeepMEL identifies orthologous enhancers across distantly related species, where sequence alignment fails, and the model highlights specific nucleotide substitutions that underlie enhancer turnover. DeepMEL is available from the Kipoi database and can be used to predict and optimize candidate enhancers and to prioritize enhancer mutations. In addition, our computational strategy can be applied to other cancer or normal cell types.
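Convolutional sequence models of this kind usually receive DNA as a one-hot matrix, often together with its reverse complement so that motifs on either strand can be detected. The exact input format of DeepMEL is not given in the abstract, so the following is a minimal sketch of that common preprocessing step under the assumption of an A, C, G, T column order, with `N` encoded as all zeros.

```python
ALPHABET = "ACGT"  # assumed column order of the one-hot matrix


def one_hot(seq: str) -> list[list[int]]:
    """Encode DNA as an L x 4 matrix; N or other symbols become all-zero rows."""
    return [[1 if base == b else 0 for b in ALPHABET] for base in seq.upper()]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N maps to N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


seq = "ACGTN"
forward = one_hot(seq)
reverse = one_hot(reverse_complement(seq))
```

With the ACGT column order, the reverse-complement encoding equals the forward matrix flipped along both axes (complementary bases sit in mirrored columns), which is why some architectures implement the second strand by flipping rather than re-encoding.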
Subject(s)
Computational Biology/methods; Melanoma/genetics; Zebrafish/genetics; Animals; Deep Learning; Dogs; Enhancer Elements, Genetic; Gene Expression Regulation, Neoplastic; Horses; Humans; Mice; Swine
ABSTRACT
Single-cell technologies allow measuring chromatin accessibility and gene expression in each cell, but jointly utilizing both layers to map bona fide gene regulatory networks and enhancers remains challenging. Here, we generate independent single-cell RNA-seq and single-cell ATAC-seq atlases of the Drosophila eye-antennal disc and spatially integrate the data into a virtual latent space that mimics the organization of the 2D tissue using ScoMAP (Single-Cell Omics Mapping into spatial Axes using Pseudotime ordering). To validate spatially predicted enhancers, we use a large collection of enhancer-reporter lines and identify ~85% of enhancers in which chromatin accessibility and enhancer activity are coupled. Next, we infer enhancer-to-gene relationships in the virtual space, finding that genes are mostly regulated by multiple, often redundant, enhancers. Exploiting cell type-specific enhancers, we deconvolute cell type-specific effects of bulk-derived chromatin accessibility QTLs. Finally, we discover that Prospero drives neuronal differentiation through the binding of a GGG motif. In summary, we provide a comprehensive spatial characterization of gene regulation in a 2D tissue.
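Once accessibility and expression have been mapped onto the same virtual cells, a candidate enhancer can be linked to a gene whose expression co-varies with the region's accessibility across that space. The abstract does not specify how ScoMAP scores these relationships, so the following is only a minimal sketch assuming a Pearson correlation across virtual-cell bins and a hypothetical cutoff `min_r`; region and gene names in the usage line are placeholders.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def link_enhancers(accessibility, expression, min_r=0.7):
    """Return (region, gene, r) triples with r >= min_r, strongest first.

    accessibility: {region: [accessibility per virtual-cell bin]}
    expression:    {gene: [expression per virtual-cell bin]}
    min_r is a hypothetical threshold.
    """
    links = []
    for region, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= min_r:
                links.append((region, gene, round(r, 3)))
    return sorted(links, key=lambda t: -t[2])


toy_links = link_enhancers({"enh1": [0, 1, 2, 3]}, {"geneA": [1, 2, 3, 4]})
```

A realistic version would restrict each region to genes within a genomic window and assess significance against a background, since correlation alone cannot separate redundant enhancers from co-regulated bystanders.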
Subject(s)
Chromatin/metabolism; Drosophila/genetics; Enhancer Elements, Genetic; Gene Expression Profiling/methods; Gene Expression Regulation/genetics; Single-Cell Analysis/methods; Animals; Animals, Genetically Modified; Arthropod Antennae/metabolism; Cell Differentiation/genetics; Chromatin/genetics; Chromatin Immunoprecipitation Sequencing; Databases, Genetic; Drosophila/metabolism; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Epigenomics; Eye/growth & development; Eye/metabolism; Gene Ontology; Gene Regulatory Networks; Genomics; Immunohistochemistry; Larva/genetics; Larva/growth & development; Larva/metabolism; Nerve Tissue Proteins/genetics; Nerve Tissue Proteins/metabolism; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; Photoreceptor Cells/metabolism; Promoter Regions, Genetic; Quantitative Trait Loci; Spatio-Temporal Analysis; Transcription Factors/genetics; Transcription Factors/metabolism; Transcriptome/genetics
ABSTRACT
In the mammalian liver, hepatocytes exhibit diverse metabolic and functional profiles based on their location within the liver lobule. However, it is unclear whether this spatial variation, called zonation, is governed by a well-defined gene regulatory code. Here, using a combination of single-cell multiomics, spatial omics, massively parallel reporter assays and deep learning, we mapped enhancer-gene regulatory networks across mouse liver cell types. We found that zonation affects gene expression and chromatin accessibility in hepatocytes, among other cell types. These states are driven by the repressors TCF7L1 and TBX3, alongside other core hepatocyte transcription factors, such as HNF4A, CEBPA, FOXA1 and ONECUT1. To examine the architecture of the enhancers driving these cell states, we trained a hierarchical deep learning model called DeepLiver. Our study provides a multimodal understanding of the regulatory code underlying hepatocyte identity and their zonation state that can be used to engineer enhancers with specific activity levels and zonation patterns.
Subject(s)
Deep Learning; Multiomics; Mice; Animals; Gene Regulatory Networks; Liver/metabolism; Hepatocytes; Mammals
ABSTRACT
Identifying cell type-specific enhancers in the brain is critical to building genetic tools for investigating the mammalian brain. Computational methods for functional enhancer prediction have been proposed and validated in the fruit fly, but not yet in the mammalian brain. We organized the 'Brain Initiative Cell Census Network (BICCN) Challenge: Predicting Functional Cell Type-Specific Enhancers from Cross-Species Multi-Omics' to assess machine learning and feature-based methods designed to nominate enhancer DNA sequences that target cell types in the mouse cortex. Methods were evaluated based on in vivo validation data from hundreds of cortical cell type-specific enhancers that were previously packaged into individual AAV vectors and retro-orbitally injected into mice. We find that open chromatin was a key predictor of functional enhancers, and that sequence models improved the prediction of non-functional enhancers, which can then be deprioritized rather than pursued for in vivo testing. Sequence models also identified cell type-specific transcription factor codes that can guide the design of in silico enhancers. This community challenge establishes a benchmark for enhancer prioritization algorithms and reveals computational approaches and molecular information that are crucial for the identification of functional enhancers for mammalian cortical cell types. The results of this challenge bring us closer to understanding the complex gene regulatory landscape of the mammalian brain and help us design more efficient genetic tools and potential gene therapies for human neurological diseases.
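A benchmark of this kind compares each method's ranked enhancer predictions against the in vivo validation outcomes. The abstract does not name the evaluation metric, so the following is a minimal sketch assuming average precision, a ranking metric well suited to prioritization tasks where positives are relatively rare; ties in score are broken by input order, which is a simplification.

```python
def average_precision(scores, labels):
    """Average precision of a ranking.

    scores: predicted enhancer scores (higher = more likely functional).
    labels: 1 if the enhancer was functional in vivo, else 0.
    AP is the mean of the precision values at each rank where a positive occurs.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_positives = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            precisions.append(true_positives / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])  # 5/6, about 0.833
```

A random ranking scores roughly the fraction of positives in the set, which gives a natural baseline when comparing methods on the same validation panel.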
ABSTRACT
Understanding how enhancers drive cell-type specificity, and identifying them efficiently, is essential for the development of innovative therapeutic strategies. In melanoma, the melanocytic (MEL) and mesenchymal-like (MES) states respond differently to therapy, making the identification of state-specific enhancers highly relevant. Using massively parallel reporter assays (MPRAs) in a panel of patient-derived melanoma lines (MM lines), we set out to identify and decipher melanoma enhancers, first focusing on regions with state-specific H3K27 acetylation close to differentially expressed genes. We then evaluated these regions in depth by testing the activity of overlapping ATAC-seq peaks and by fully tiling the acetylated regions with 190 bp sequences. Activity was observed in more than 60% of the selected regions, and we were able to precisely locate the active enhancers within ATAC-seq peaks. Comparison of sequence content with activity, using the deep learning model DeepMEL2, revealed that AP-1 alone is responsible for MES enhancer activity. In contrast, SOX10 and MITF both influence MEL enhancer function, with SOX10 being required to achieve high levels of activity. Overall, our MPRAs shed light on the relationship between long and short sequences in terms of their sequence content, enhancer activity, and specificity across melanoma cell states.
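Full tiling means covering each acetylated region with fixed-length 190 bp fragments so that activity can be localized within it. The abstract gives the tile length but not the spacing between tiles, so the following is a minimal sketch assuming a hypothetical step of 95 bp (50% overlap) and 0-based, half-open coordinates; the final tile is anchored at the region's right edge so no bases are left uncovered.

```python
def tile_region(start: int, end: int, tile_len: int = 190, step: int = 95):
    """Tile the half-open interval [start, end) with fixed-length tiles.

    step=95 (50% overlap) is a hypothetical choice, not taken from the source.
    A final tile ending exactly at `end` is added if the regular steps fall short.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if end - start < tile_len:
        raise ValueError("region is shorter than one tile")
    tiles = []
    pos = start
    while pos + tile_len <= end:
        tiles.append((pos, pos + tile_len))
        pos += step
    if tiles[-1][1] < end:
        tiles.append((end - tile_len, end))
    return tiles


tiles = tile_region(1000, 1570)  # five overlapping 190 bp tiles
```

Smaller steps give finer localization of active sub-regions at the cost of more constructs per region, which is the main trade-off when choosing the overlap for an MPRA library.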
Subject(s)
Enhancer Elements, Genetic; Melanoma/genetics; Microphthalmia-Associated Transcription Factor/genetics; SOXE Transcription Factors/genetics; Transcription Factor AP-1/genetics; Cell Line, Tumor; Humans; Melanoma/metabolism; Microphthalmia-Associated Transcription Factor/metabolism; SOXE Transcription Factors/metabolism; Transcription Factor AP-1/metabolism
ABSTRACT
One of the main challenges in cell therapy for muscle diseases is to efficiently target the muscle. To address this issue and achieve a better understanding of in vivo cell fate, we evaluated the relevance of a non-invasive cell tracking method in the Golden Retriever Muscular Dystrophy (GRMD) model, a well-recognised model of Duchenne Muscular Dystrophy (DMD). Mesoangioblasts were directly labelled with 111In-oxine and injected through one of the femoral arteries. The scintigraphy images obtained provided the first quantitative mapping of the immediate biodistribution of mesoangioblasts in a large animal model of DMD. The results revealed that cells were trapped by the first capillary filters: the injected limb and the lung. During the days following injection, radioactivity was redistributed to the liver. In vitro studies, performed with the same cells prepared for injection into the animal, revealed prominent cell death and 111In release. In vivo, cell death resulted in 111In release into the vasculature, which was taken up by the liver, resulting in a non-specific and non-cell-bound radioactive signal. Indirect labelling methods would be an attractive alternative for tracking cells in the mid and long term.
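Comparing 111In signals acquired on different days requires separating the physical decay of the isotope from biological redistribution such as the liver uptake described above. The abstract does not describe its correction procedure, so the following is a minimal sketch of standard exponential decay correction, assuming a physical half-life for indium-111 of about 67.3 h (roughly 2.8 days); the function names are hypothetical.

```python
from math import exp, log

IN111_HALF_LIFE_H = 67.3  # approximate physical half-life of indium-111, in hours


def decay_factor(hours: float, half_life_h: float = IN111_HALF_LIFE_H) -> float:
    """Fraction of the initial activity remaining after `hours`."""
    return exp(-log(2) * hours / half_life_h)


def decay_corrected(measured: float, hours: float,
                    half_life_h: float = IN111_HALF_LIFE_H) -> float:
    """Back-correct a measured activity (or count) to the time of injection."""
    return measured / decay_factor(hours, half_life_h)


remaining_after_3_days = decay_factor(72.0)
```

Decay correction only removes the physical component: it cannot distinguish 111In still bound to living cells from 111In released by dying cells, which is precisely the confound that motivates indirect labelling.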
Subject(s)
Cell Movement/physiology; Muscular Dystrophy, Animal/pathology; Muscular Dystrophy, Duchenne/pathology; Stem Cells/pathology; Animals; Cell Differentiation/physiology; Cell Tracking/methods; Disease Models, Animal; Dogs; Dystrophin/metabolism; Female; Male; Muscle, Skeletal/metabolism; Muscle, Skeletal/pathology; Muscular Dystrophy, Animal/metabolism; Muscular Dystrophy, Duchenne/metabolism; Radionuclide Imaging/methods; Stem Cells/metabolism; Tissue Distribution/physiology
ABSTRACT
Stem cell-based therapies are a promising approach for the treatment of degenerative muscular diseases; however, clinical trials have so far shown inconclusive and even disappointing results. Noninvasive cell monitoring by medical imaging could improve our understanding of cell survival and biodistribution after injection. In this study, we assessed the canine sodium iodide symporter (cNIS) reporter gene as an imaging tool to track transduced canine myoblasts by single-photon emission computed tomography (SPECT/CT) after intramuscular (IM) administration in dogs. cNIS-expressing cells retained their myogenic capacity and showed strong 99mTc-pertechnetate (99mTcO4−) uptake both in vitro and in vivo. cNIS expression allowed the cells to be visualized by SPECT/CT over time, at 4 h, 48 h, 7 days, and 30 days after IM injection; biopsies collected 30 days post administration showed myofiber membranes expressing cNIS. This study demonstrates that NIS can be used as a reporter to track cells in vivo in the skeletal muscle of large animals. Our results provide proof of concept for the benefits that NIS-based tracking may bring to the already challenging field of cell-based therapies for myopathies, and pave the way for a more efficient translation to the clinical setting based on more accurate preclinical results.
ABSTRACT
Melanoma cells can switch between a melanocytic and a mesenchymal-like state. Scattered evidence indicates that additional intermediate state(s) may exist. Here, to search for such states and decipher their underlying gene regulatory network (GRN), we studied 10 melanoma cultures using single-cell RNA sequencing (RNA-seq) as well as 26 additional cultures using bulk RNA-seq. Although each culture exhibited a unique transcriptome, we identified shared GRNs that underlie the extreme melanocytic and mesenchymal states and the intermediate state. This intermediate state is corroborated by a distinct chromatin landscape and is governed by the transcription factors SOX6, NFATC2, EGR3, ELF1 and ETV4. Single-cell migration assays confirmed the intermediate migratory phenotype of this state. Using time-series sampling of single cells after knockdown of SOX10, we unravelled the sequential and recurrent arrangement of GRNs during phenotype switching. Taken together, these analyses indicate that an intermediate state exists and is driven by a distinct and stable 'mixed' GRN rather than being a symbiotic heterogeneous mix of cells.
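Once state-specific gene sets have been inferred, individual cells can be assigned to the melanocytic, intermediate or mesenchymal-like state by comparing how strongly each set is expressed. The following is a minimal sketch of that assignment step only, not the study's GRN inference or regulon scoring: it uses the mean per-gene z-score of each set as a simplified activity score, and the marker sets and expression values in the usage example are hypothetical illustrations built from transcription factors named in these abstracts.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Z-score each gene across cells; constant genes get 0."""
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return z


def assign_states(expr, gene_sets):
    """Assign each cell to the gene set with the highest mean z-score.

    expr: {gene: [expression per cell]}; gene_sets: {state: [genes]}.
    Genes missing from expr are ignored; empty sets are skipped.
    """
    z = zscore_rows(expr)
    n_cells = len(next(iter(expr.values())))
    states = []
    for c in range(n_cells):
        scores = {}
        for state, genes in gene_sets.items():
            present = [g for g in genes if g in z]
            if present:
                scores[state] = mean(z[g][c] for g in present)
        states.append(max(scores, key=scores.get))
    return states


# Hypothetical toy data: three cells, illustrative marker sets.
toy_expr = {"SOX10": [5, 3, 0], "MITF": [4, 2, 0],
            "SOX6": [0, 4, 1], "NFATC2": [0, 5, 1], "JUN": [0, 1, 6]}
toy_sets = {"MEL": ["SOX10", "MITF"], "intermediate": ["SOX6", "NFATC2"],
            "MES": ["JUN"]}
print(assign_states(toy_expr, toy_sets))  # -> ['MEL', 'intermediate', 'MES']
```

Taking the maximum forces every cell into one state; a distinct intermediate GRN, as described in the abstract, would show up as cells whose intermediate score is high on its own rather than as cells with moderate MEL and MES scores at the same time.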