ABSTRACT
Most human protein-coding genes are regulated by multiple, distinct promoters, suggesting that the choice of promoter is as important as its level of transcriptional activity. However, while a global change in transcription is recognized as a defining feature of cancer, the contribution of alternative promoters remains largely unexplored. Here, we infer active promoters using RNA-seq data from 18,468 cancer and normal samples, demonstrating that alternative promoters are a major contributor to context-specific regulation of transcription. We find that promoters are deregulated across tissues, cancer types, and patients, affecting known cancer genes and novel candidates. For genes with independently regulated promoters, we demonstrate that promoter activity provides a more accurate predictor of patient survival than gene expression. Our study suggests that a dynamic landscape of active promoters shapes the cancer transcriptome, opening new diagnostic avenues and opportunities to further explore the interplay of regulatory mechanisms with transcriptional aberrations in cancer.
Subject(s)
Computational Biology/methods , Gene Expression Regulation, Neoplastic/genetics , Neoplasms/genetics , Promoter Regions, Genetic/genetics , Transcriptome/genetics , Databases, Genetic , Humans , RNA-Seq/methods
ABSTRACT
Dormant hematopoietic stem cells (dHSCs) are atop the hematopoietic hierarchy. The molecular identity of dHSCs and the mechanisms regulating their maintenance or exit from dormancy remain uncertain. Here, we use single-cell RNA sequencing (RNA-seq) analysis to show that the transition from dormancy toward cell-cycle entry is a continuous developmental path associated with upregulation of biosynthetic processes rather than a stepwise progression. In addition, low Myc levels and high expression of a retinoic acid program are characteristic for dHSCs. To follow the behavior of dHSCs in situ, a Gprc5c-controlled reporter mouse was established. Treatment with all-trans retinoic acid antagonizes stress-induced activation of dHSCs by restricting protein translation and levels of reactive oxygen species (ROS) and Myc. Mice maintained on a vitamin A-free diet lose HSCs and show a disrupted re-entry into dormancy after exposure to inflammatory stress stimuli. Our results highlight the impact of dietary vitamin A on the regulation of cell-cycle-mediated stem cell plasticity.
Subject(s)
Hematopoietic Stem Cells/cytology , Signal Transduction , Tretinoin/pharmacology , Vitamin A/administration & dosage , Animals , Biosynthetic Pathways , Cell Culture Techniques , Cell Cycle/drug effects , Cell Survival , Diet , Gene Expression Profiling , Hematopoietic Stem Cells/drug effects , Mice , Poly I-C/pharmacology , Reactive Oxygen Species/metabolism , Receptors, G-Protein-Coupled/metabolism , Single-Cell Analysis , Stress, Physiological , Vitamin A/pharmacology , Vitamins/administration & dosage , Vitamins/pharmacology
ABSTRACT
Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases.
Subject(s)
Blood Cells/cytology , Disease/genetics , Promoter Regions, Genetic , Cell Lineage , Cell Separation , Chromatin , Enhancer Elements, Genetic , Epigenomics , Genetic Predisposition to Disease , Genome-Wide Association Study , Hematopoiesis , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci
ABSTRACT
The relationship between the human placenta (the extraembryonic organ made by the fetus) and the decidua (the mucosal layer of the uterus) is essential to nurture and protect the fetus during pregnancy. Extravillous trophoblast cells (EVTs) derived from placental villi infiltrate the decidua, transforming the maternal arteries into high-conductance vessels [1]. Defects in trophoblast invasion and arterial transformation established during early pregnancy underlie common pregnancy disorders such as pre-eclampsia [2]. Here we have generated a spatially resolved multiomics single-cell atlas of the entire human maternal-fetal interface including the myometrium, which enables us to resolve the full trajectory of trophoblast differentiation. We have used this cellular map to infer the possible transcription factors mediating EVT invasion and show that they are preserved in in vitro models of EVT differentiation from primary trophoblast organoids [3,4] and trophoblast stem cells [5]. We define the transcriptomes of the final cell states of trophoblast invasion: placental bed giant cells (fused multinucleated EVTs) and endovascular EVTs (which form plugs inside the maternal arteries). We predict the cell-cell communication events contributing to trophoblast invasion and placental bed giant cell formation, and model the dual role of interstitial EVTs and endovascular EVTs in mediating arterial transformation during early pregnancy. Together, our data provide a comprehensive analysis of postimplantation trophoblast differentiation that can be used to inform the design of experimental models of the human placenta in early pregnancy.
Subject(s)
Multiomics , Pregnancy Trimester, First , Trophoblasts , Female , Humans , Pregnancy , Cell Movement , Placenta/blood supply , Placenta/cytology , Placenta/physiology , Pregnancy Trimester, First/physiology , Trophoblasts/cytology , Trophoblasts/metabolism , Trophoblasts/physiology , Decidua/blood supply , Decidua/cytology , Maternal-Fetal Relations/physiology , Single-Cell Analysis , Myometrium/cytology , Myometrium/physiology , Cell Differentiation , Organoids/cytology , Organoids/physiology , Stem Cells/cytology , Transcriptome , Transcription Factors/metabolism , Cell Communication
ABSTRACT
Spatially resolved omics technologies are transforming our understanding of biological tissues. However, the handling of uni- and multimodal spatial omics datasets remains a challenge owing to large data volumes, heterogeneity of data types and the lack of flexible, spatially aware data structures. Here we introduce SpatialData, a framework that establishes a unified and extensible multiplatform file-format, lazy representation of larger-than-memory data, transformations and alignment to common coordinate systems. SpatialData facilitates spatial annotations and cross-modal aggregation and analysis, the utility of which is illustrated in the context of multiple vignettes, including integrative analysis on a multimodal Xenium and Visium breast cancer study.
ABSTRACT
Studies with temporal or spatial resolution are crucial to understand the molecular dynamics and spatial dependencies underlying a biological process or system. With advances in high-throughput omic technologies, time- and space-resolved molecular measurements at scale are increasingly accessible, providing new opportunities to study the role of timing or structure in a wide range of biological questions. At the same time, analyses of the data being generated in the context of spatiotemporal studies entail new challenges that need to be considered, including the need to account for temporal and spatial dependencies and compare them across different scales, biological samples or conditions. In this Review, we provide an overview of common principles and challenges in the analysis of temporal and spatial omics data. We discuss statistical concepts to model temporal and spatial dependencies and highlight opportunities for adapting existing analysis methods to data with temporal and spatial dimensions.
ABSTRACT
Here we describe the LifeTime Initiative, which aims to track, understand and target human cells during the onset and progression of complex diseases, and to analyse their response to therapy at single-cell resolution. This mission will be implemented through the development, integration and application of single-cell multi-omics and imaging, artificial intelligence and patient-derived experimental disease models during the progression from health to disease. The analysis of large molecular and clinical datasets will identify molecular mechanisms, create predictive computational models of disease progression, and reveal new drug targets and therapies. The timely detection and interception of disease embedded in an ethical and patient-centred vision will be achieved through interactions across academia, hospitals, patient associations, health data management systems and industry. The application of this strategy to key medical challenges in cancer, neurological and neuropsychiatric disorders, and infectious, chronic inflammatory and cardiovascular diseases at the single-cell level will usher in cell-based interceptive medicine in Europe over the next decade.
Subject(s)
Cell- and Tissue-Based Therapy , Delivery of Health Care/methods , Delivery of Health Care/trends , Medicine/methods , Medicine/trends , Pathology , Single-Cell Analysis , Artificial Intelligence , Delivery of Health Care/ethics , Delivery of Health Care/standards , Early Diagnosis , Education, Medical , Europe , Female , Health , Humans , Legislation, Medical , Male , Medicine/standards
ABSTRACT
Transcript alterations often result from somatic changes in cancer genomes [1]. Various forms of RNA alterations have been described in cancer, including overexpression [2], altered splicing [3] and gene fusions [4]; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [5]. Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed 'bridged' fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.
Subject(s)
Gene Expression Regulation, Neoplastic , Neoplasms/genetics , RNA/genetics , DNA Copy Number Variations , DNA, Neoplasm , Genome, Human , Genomics , Humans , Transcriptome
ABSTRACT
Factor analysis is a widely used method for dimensionality reduction in genome biology, with applications from personalized health to single-cell biology. Existing factor analysis models assume independence of the observed samples, an assumption that fails in spatio-temporal profiling studies. Here we present MEFISTO, a flexible and versatile toolbox for modeling high-dimensional data when spatial or temporal dependencies between the samples are known. MEFISTO maintains the established benefits of factor analysis for multimodal data, but enables the performance of spatio-temporally informed dimensionality reduction, interpolation, and separation of smooth from non-smooth patterns of variation. Moreover, MEFISTO can integrate multiple related datasets by simultaneously identifying and aligning the underlying patterns of variation in a data-driven manner. To illustrate MEFISTO, we apply the model to different datasets with spatial or temporal resolution, including an evolutionary atlas of organ development, a longitudinal microbiome study, a single-cell multi-omics atlas of mouse gastrulation and spatially resolved transcriptomics.
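The interpolation of a smooth temporal factor mentioned above can be illustrated with a minimal sketch. It assumes a Gaussian process with a squared-exponential covariance over time, which is one common way to encode smoothness; the abstract does not specify the model details, so the kernel choice, the function names `sq_exp_kernel` and `gp_interpolate`, and all parameter values are illustrative assumptions and not the MEFISTO API.

```python
import numpy as np

def sq_exp_kernel(t1, t2, lengthscale=1.0):
    """Squared-exponential covariance between two sets of time points."""
    d = np.subtract.outer(np.asarray(t1, dtype=float), np.asarray(t2, dtype=float))
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_interpolate(t_obs, z_obs, t_new, lengthscale=1.0, noise=1e-6):
    """Posterior mean of a zero-mean Gaussian process at unobserved time points.

    t_obs: observed time points; z_obs: factor values at those time points;
    t_new: time points at which the factor should be interpolated.
    """
    K = sq_exp_kernel(t_obs, t_obs, lengthscale) + noise * np.eye(len(t_obs))
    K_star = sq_exp_kernel(t_new, t_obs, lengthscale)
    return K_star @ np.linalg.solve(K, np.asarray(z_obs, dtype=float))

# Hypothetical factor observed at four time points, interpolated in between.
t_obs = np.array([0.0, 1.0, 2.0, 3.0])
z_obs = np.sin(t_obs)
z_mid = gp_interpolate(t_obs, z_obs, np.array([0.5, 1.5, 2.5]))
```

A larger `lengthscale` yields smoother interpolated factors, whereas a very small one makes neighbouring time points nearly independent, which is the intuition behind separating smooth from non-smooth patterns of variation.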
Subject(s)
Computational Biology/methods , Databases, Factual , Gastrointestinal Microbiome/physiology , Gene Expression Regulation, Developmental , Software , Animals , Evolution, Molecular , Humans , Infant , Longitudinal Studies , Single-Cell Analysis , Spatio-Temporal Analysis
ABSTRACT
Intratumor heterogeneity as a clinical challenge becomes most evident after several treatment lines, when multidrug-resistant subclones accumulate. To address this challenge, the characterization of resistance mechanisms at the subclonal level is key to identify common vulnerabilities. In this study, we integrate whole-genome sequencing, single-cell (sc) transcriptomics (scRNA sequencing), and chromatin accessibility (scATAC sequencing) together with mitochondrial DNA mutations to define subclonal architecture and evolution for longitudinal samples from 15 patients with relapsed or refractory multiple myeloma. We assess transcriptomic and epigenomic changes to resolve the multifactorial nature of therapy resistance and relate it to the parallel occurrence of different mechanisms: (1) preexisting epigenetic profiles of subclones associated with survival advantages, (2) converging phenotypic adaptation of genetically distinct subclones, and (3) subclone-specific interactions of myeloma and bone marrow microenvironment cells. Our study showcases how an integrative multiomics analysis can be applied to track and characterize distinct multidrug-resistant subclones over time for the identification of molecular targets against them.
Subject(s)
Multiple Myeloma , Humans , Multiple Myeloma/drug therapy , Multiple Myeloma/genetics , Multiomics , Mutation , Transcriptome , Tumor Microenvironment/genetics
ABSTRACT
Despite the overwhelming evidence that multiple sclerosis is an autoimmune disease, relatively little is known about the precise nature of the immune dysregulation underlying the development of the disease. Reasoning that the CSF from patients might be enriched for cells relevant in pathogenesis, we have completed a high-resolution single-cell analysis of 96 732 CSF cells collected from 33 patients with multiple sclerosis (n = 48 675) and 48 patients with other neurological diseases (n = 48 057). Completing comprehensive cell type annotation, we identified a rare population of CD8+ T cells, characterized by the upregulation of inhibitory receptors, increased in patients with multiple sclerosis. Applying a Multi-Omics Factor Analysis to these single-cell data further revealed that activity in pathways responsible for controlling inflammatory and type 1 interferon responses is altered in multiple sclerosis in both T cells and myeloid cells. We also undertook a systematic search for expression quantitative trait loci in the CSF cells. Of particular interest were two expression quantitative trait loci in CD8+ T cells that were fine mapped to multiple sclerosis susceptibility variants in the viral control genes ZC3HAV1 (rs10271373) and IFITM2 (rs1059091). Further analysis suggests that these associations likely reflect genetic effects on RNA splicing and cell-type-specific gene expression, respectively. Collectively, our study suggests that alterations in viral control mechanisms might be important in the development of multiple sclerosis.
Subject(s)
Multiple Sclerosis , Humans , CD8-Positive T-Lymphocytes , Up-Regulation , Antiviral Agents , Cerebrospinal Fluid/metabolism , Membrane Proteins/genetics
ABSTRACT
Formation of the three primary germ layers during gastrulation is an essential step in the establishment of the vertebrate body plan and is associated with major transcriptional changes [1-5]. Global epigenetic reprogramming accompanies these changes [6-8], but the role of the epigenome in regulating early cell-fate choice remains unresolved, and the coordination between different molecular layers is unclear. Here we describe a single-cell multi-omics map of chromatin accessibility, DNA methylation and RNA expression during the onset of gastrulation in mouse embryos. The initial exit from pluripotency coincides with the establishment of a global repressive epigenetic landscape, followed by the emergence of lineage-specific epigenetic patterns during gastrulation. Notably, cells committed to mesoderm and endoderm undergo widespread coordinated epigenetic rearrangements at enhancer marks, driven by ten-eleven translocation (TET)-mediated demethylation and a concomitant increase of accessibility. By contrast, the methylation and accessibility landscape of ectodermal cells is already established in the early epiblast. Hence, regulatory elements associated with each germ layer are either epigenetically primed or remodelled before cell-fate decisions, providing the molecular framework for a hierarchical emergence of the primary germ layers.
Subject(s)
DNA Methylation , Epigenesis, Genetic , Gastrula/cytology , Gastrula/metabolism , Gastrulation/genetics , Gene Expression Regulation, Developmental , RNA/genetics , Single-Cell Analysis , Animals , Cell Differentiation/genetics , Cell Lineage/genetics , Chromatin/genetics , Chromatin/metabolism , Demethylation , Embryoid Bodies/cytology , Endoderm/cytology , Endoderm/embryology , Endoderm/metabolism , Enhancer Elements, Genetic/genetics , Epigenome/genetics , Erythropoiesis , Factor Analysis, Statistical , Gastrula/embryology , Gastrulation/physiology , Mesoderm/cytology , Mesoderm/embryology , Mesoderm/metabolism , Mice , Pluripotent Stem Cells/cytology , Pluripotent Stem Cells/metabolism , RNA/analysis , Time Factors , Zinc Fingers
ABSTRACT
MOTIVATION: Factor analysis is a widely used tool for unsupervised dimensionality reduction of high-throughput datasets in molecular biology, with recently proposed extensions designed specifically for spatial transcriptomics data. However, these methods expect (count) matrices as data input and are therefore not directly applicable to single molecule resolution data, which are in the form of coordinate lists annotated with genes and provide insight into subcellular spatial expression patterns. To address this, we here propose FISHFactor, a probabilistic factor model that combines the benefits of spatial, non-negative factor analysis with a Poisson point process likelihood to explicitly model and account for the nature of single molecule resolution data. In addition, FISHFactor shares information across a potentially large number of cells in a common weight matrix, allowing consistent interpretation of factors across cells and yielding improved latent variable estimates. RESULTS: We compare FISHFactor to existing methods that rely on aggregating information through spatial binning and cannot combine information from multiple cells and show that our method leads to more accurate results on simulated data. We show that our method is scalable and can be readily applied to large datasets. Finally, we demonstrate on a real dataset that FISHFactor is able to identify major subcellular expression patterns and spatial gene clusters in a data-driven manner. AVAILABILITY AND IMPLEMENTATION: The model implementation, data simulation and experiment scripts are available under https://www.github.com/bioFAM/FISHFactor.
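The Poisson point process likelihood mentioned above can be illustrated with a minimal sketch. It evaluates the standard log-likelihood of an inhomogeneous Poisson point process on the unit square, the sum of log-intensities at the observed molecule coordinates minus the integral of the intensity, with the intensity written as a non-negative weighted sum of spatial factors. The function names, the two toy factors, and the grid-based integral are illustrative assumptions; this is not the FISHFactor implementation, which additionally infers the factors and weights.

```python
import numpy as np

def poisson_point_process_loglik(points, intensity, grid_size=200):
    """Log-likelihood of 2D points on the unit square under an
    inhomogeneous Poisson point process with the given intensity function."""
    points = np.asarray(points, dtype=float)
    # Sum of log-intensities at the observed molecule coordinates.
    log_term = np.sum(np.log(intensity(points)))
    # Midpoint-rule approximation of the intensity integral over [0, 1]^2.
    ticks = (np.arange(grid_size) + 0.5) / grid_size
    gx, gy = np.meshgrid(ticks, ticks)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    integral = np.mean(intensity(grid))  # the domain has area 1
    return log_term - integral

def make_intensity(weights, factors):
    """Intensity as a non-negative weighted sum of spatial factor functions."""
    def intensity(xy):
        return sum(w * f(xy) for w, f in zip(weights, factors))
    return intensity

# Two hypothetical spatial factors on the unit square (assumptions for illustration).
factors = [
    lambda xy: np.ones(len(xy)),   # uniform background
    lambda xy: 2.0 * xy[:, 0],     # gradient along x, integrates to 1
]

# Hypothetical molecule coordinates for one gene in one cell.
molecules = np.array([[0.5, 0.5], [0.25, 0.75]])
loglik = poisson_point_process_loglik(molecules, make_intensity([10.0, 20.0], factors))
```

Because the likelihood is defined directly on coordinates, no spatial binning into a count matrix is needed, which is the property the abstract emphasizes for single-molecule-resolution data.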
Subject(s)
Software , Transcriptome , Gene Expression Profiling/methods , Computer Simulation , Models, Statistical
ABSTRACT
Bulk and single-cell DNA sequencing has enabled reconstructing clonal substructures of somatic tissues from frequency and cooccurrence patterns of somatic variants. However, approaches to characterize phenotypic variations between clones are not established. Here we present cardelino (https://github.com/single-cell-genetics/cardelino), a computational method for inferring the clonal tree configuration and the clone of origin of individual cells assayed using single-cell RNA-seq (scRNA-seq). Cardelino flexibly integrates information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. We apply cardelino to a published cancer dataset and to newly generated matched scRNA-seq and exome-seq data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a role for cell division genes in somatic evolution in healthy skin.
Subject(s)
Fibroblasts/metabolism , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Algorithms , Cell Cycle , Cell Proliferation , Humans , Melanoma , Mutation , Transcriptome
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) enables the characterization of cellular heterogeneity in human tissues. Recent technological advances have enabled the first population-scale scRNA-seq studies in hundreds of individuals, making it possible to assay genetic effects with single-cell resolution. However, existing strategies to analyze these data remain based on principles established for the genetic analysis of bulk RNA-seq. In particular, current methods depend on a priori definitions of discrete cell types, and hence cannot assess allelic effects across subtle cell types and cell states. To address this, we propose the Cell Regulatory Map (CellRegMap), a statistical framework to test for and quantify genetic effects on gene expression in individual cells. CellRegMap provides a principled approach to identify and characterize genotype-context interactions of known eQTL variants using scRNA-seq data. This model-based approach resolves allelic effects across cellular contexts of different granularity, including genetic effects specific to cell subtypes and continuous cell transitions. We validate CellRegMap using simulated data and apply it to previously identified eQTL from two recent studies of differentiating iPSCs, where we uncover hundreds of eQTL displaying heterogeneity of genetic effects across cellular contexts. Finally, we identify fine-grained genetic regulation in neuronal subtypes for eQTL that are colocalized with human disease variants.
Subject(s)
Gene Expression Regulation , Single-Cell Analysis , Gene Expression Profiling/methods , Humans , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
ABSTRACT
Embryonic development is driven by tightly regulated patterns of gene expression, despite extensive genetic variation among individuals. Studies of expression quantitative trait loci (eQTL) indicate that genetic variation frequently alters gene expression in cell-culture models and differentiated tissues. However, the extent and types of genetic variation impacting embryonic gene expression, and their interactions with developmental programs, remain largely unknown. Here we assessed the effect of genetic variation on transcriptional (expression levels) and post-transcriptional (3' RNA processing) regulation across multiple stages of metazoan development, using 80 inbred Drosophila wild isolates, identifying thousands of developmental-stage-specific and shared QTL. Given the small blocks of linkage disequilibrium in Drosophila, we obtain near base-pair resolution, resolving causal mutations in developmental enhancers, validated transcription-factor-binding sites and RNA motifs. This fine-grain mapping uncovered extensive allelic interactions within enhancers that have opposite effects, thereby buffering their impact on enhancer activity. QTL affecting 3' RNA processing identify new functional motifs leading to transcript isoform diversity and changes in the lengths of 3' untranslated regions. These results highlight how developmental stage influences the effects of genetic variation and uncover multiple mechanisms that regulate and buffer expression variation during embryogenesis.
Subject(s)
Drosophila melanogaster/embryology , Drosophila melanogaster/genetics , Embryonic Development/genetics , Gene Expression Regulation, Developmental , Genetic Variation , 3' Untranslated Regions/genetics , Alleles , Animals , Binding Sites , Enhancer Elements, Genetic , Linkage Disequilibrium , Mutation , Quantitative Trait Loci , RNA 3' End Processing , Transcription Factors/metabolism , Transcription, Genetic
ABSTRACT
This corrects the article DOI: 10.1038/nature22403.
ABSTRACT
Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. Finally, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.