ABSTRACT
We applied a combinatorial indexing assay, sci-ATAC-seq, to profile genome-wide chromatin accessibility in ~100,000 single cells from 13 adult mouse tissues. We identify 85 distinct patterns of chromatin accessibility, most of which can be assigned to cell types, and ~400,000 differentially accessible elements. We use these data to link regulatory elements to their target genes, to define the transcription factor grammar specifying each cell type, and to discover in vivo correlates of heterogeneity in accessibility within cell types. We develop a technique for mapping single-cell gene expression data to single-cell chromatin accessibility data, facilitating the comparison of atlases. By intersecting mouse chromatin accessibility with human genome-wide association summary statistics, we identify cell-type-specific enrichments of the heritability signal for hundreds of complex traits. These data define the in vivo landscape of the regulatory genome for common mammalian cell types at single-cell resolution.
Subject(s)
Chromatin/chemistry , Single-Cell Analysis/methods , Animals , Cluster Analysis , Epigenesis, Genetic , Epigenomics , Gene Expression Regulation , Genome, Human , Genome-Wide Association Study , Humans , Male , Mammals , Mice , Mice, Inbred C57BL , Transcription Factors
ABSTRACT
Mammalian organogenesis is a remarkable process. Within a short timeframe, the cells of the three germ layers transform into an embryo that includes most of the major internal and external organs. Here we investigate the transcriptional dynamics of mouse organogenesis at single-cell resolution. Using single-cell combinatorial indexing, we profiled the transcriptomes of around 2 million cells derived from 61 embryos staged between 9.5 and 13.5 days of gestation, in a single experiment. The resulting 'mouse organogenesis cell atlas' (MOCA) provides a global view of developmental processes during this critical window. We use Monocle 3 to identify hundreds of cell types and 56 trajectories, many of which are detected only because of the depth of cellular coverage, and collectively define thousands of corresponding marker genes. We explore the dynamics of gene expression within cell types and trajectories over time, including focused analyses of the apical ectodermal ridge, limb mesenchyme and skeletal muscle.
Subject(s)
Embryo, Mammalian/cytology , Embryo, Mammalian/embryology , Gene Expression Regulation, Developmental/genetics , Organogenesis/genetics , Single-Cell Analysis/methods , Transcriptome , Animals , Ectoderm/cytology , Ectoderm/embryology , Ectoderm/metabolism , Embryo, Mammalian/metabolism , Female , Genetic Markers , Male , Mesoderm/cytology , Mesoderm/embryology , Mesoderm/metabolism , Mice , Muscle Development/genetics , Muscle, Skeletal/cytology , Muscle, Skeletal/embryology , Muscle, Skeletal/metabolism , Organ Specificity/genetics , Sequence Analysis, RNA , Time Factors
ABSTRACT
Understanding how gene regulatory networks control the progressive restriction of cell fates is a long-standing challenge. Recent advances in measuring gene expression in single cells are providing new insights into lineage commitment. However, the regulatory events underlying these changes remain unclear. Here we investigate the dynamics of chromatin regulatory landscapes during embryogenesis at single-cell resolution. Using the single-cell combinatorial indexing assay for transposase-accessible chromatin with sequencing (sci-ATAC-seq), we profiled chromatin accessibility in over 20,000 single nuclei from fixed Drosophila melanogaster embryos spanning three landmark embryonic stages: 2-4 h after egg laying (predominantly stage 5 blastoderm nuclei), when each embryo comprises around 6,000 multipotent cells; 6-8 h after egg laying (predominantly stage 10-11), to capture a midpoint in embryonic development when major lineages in the mesoderm and ectoderm are specified; and 10-12 h after egg laying (predominantly stage 13), when each of the embryo's more than 20,000 cells is undergoing terminal differentiation. Our results show that there is spatial heterogeneity in the accessibility of the regulatory genome before gastrulation, a feature that aligns with future cell fate, and that nuclei can be temporally ordered along developmental trajectories. During mid-embryogenesis, tissue granularity emerges such that individual cell types can be inferred by their chromatin accessibility while maintaining a signature of their germ layer of origin. Analysis of the data reveals overlapping usage of regulatory elements between cells of the endoderm and non-myogenic mesoderm, suggesting a common developmental program that is reminiscent of the mesendoderm lineage in other species. We identify 30,075 distal regulatory elements that exhibit tissue-specific accessibility. We validated the germ-layer specificity of a subset of these predicted enhancers in transgenic embryos, achieving an accuracy of 90%. Overall, our results demonstrate the power of shotgun single-cell profiling of embryos to resolve dynamic changes in the chromatin landscape during development, and to uncover the cis-regulatory programs of metazoan germ layers and cell types.
Subject(s)
Drosophila melanogaster/cytology , Drosophila melanogaster/embryology , Embryonic Development/genetics , Gene Expression Regulation, Developmental , Single-Cell Analysis , Animals , Cell Differentiation/genetics , Cell Lineage/genetics , Chromatin/genetics , Chromatin/metabolism , Drosophila melanogaster/genetics , Endoderm/cytology , Endoderm/metabolism , Enhancer Elements, Genetic/genetics , Female , Gastrulation/genetics , Genome, Insect/genetics , Male , Mesoderm/cytology , Mesoderm/metabolism , Organ Specificity/genetics , Organisms, Genetically Modified/cytology , Organisms, Genetically Modified/genetics , Reproducibility of Results
ABSTRACT
Single-cell genome sequencing has proven valuable for the detection of somatic variation, particularly in the context of tumor evolution. Current technologies suffer from high library construction costs, which restrict the number of cells that can be assessed and thus impose limitations on the ability to measure heterogeneity within a tissue. Here, we present single-cell combinatorial indexed sequencing (SCI-seq) as a means of simultaneously generating thousands of low-pass single-cell libraries for detection of somatic copy-number variants. We constructed libraries for 16,698 single cells from a combination of cultured cell lines, primate frontal cortex tissue and two human adenocarcinomas, and obtained a detailed assessment of subclonal variation within a pancreatic tumor.
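The copy-number idea in this abstract (many sparse, low-pass reads per cell, counted in genomic bins) can be illustrated with a minimal Python sketch. It is not the SCI-seq analysis pipeline: the bin size, the assumed diploid baseline, and the function names `bin_counts` and `estimate_copy_number` are illustrative assumptions, and a real workflow would also correct for GC content and mappability and segment the profile before calling variants.

```python
from statistics import median


def bin_counts(read_positions, chrom_length, bin_size):
    """Count reads falling into fixed-width genomic bins on one chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in read_positions:
        if 0 <= pos < chrom_length:  # ignore reads outside the chromosome
            counts[pos // bin_size] += 1
    return counts


def estimate_copy_number(counts, ploidy=2):
    """Scale bin counts so the median bin equals the assumed baseline ploidy,
    then round each bin to the nearest integer copy state (no segmentation)."""
    med = median(counts)
    if med == 0:
        raise ValueError("median bin count is zero; use larger bins")
    return [round(ploidy * c / med) for c in counts]
```

For example, bin counts of [10, 10, 10, 20, 20, 5] have a median of 10, so under the diploid assumption they map to copy states [2, 2, 2, 4, 4, 1]. With only a few reads per bin per cell, bins must be large for this ratio to be stable, which is one reason low-pass single-cell data favor coarse copy-number calls.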
Subject(s)
Adenocarcinoma/genetics , Chromosome Mapping/methods , DNA Copy Number Variations/genetics , Frontal Lobe/cytology , High-Throughput Nucleotide Sequencing/methods , Pancreatic Neoplasms/genetics , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Animals , Cell Line, Tumor , Gene Library , Genome, Human/genetics , HeLa Cells , Humans , Macaca mulatta
ABSTRACT
We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries composed of 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to >1 megabase. These pools are "subhaploid," in that the lengths of the fragments contained in each pool sum to ~5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data are mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate "joins" are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof of concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
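The join-and-resolve step described above can be sketched in Python under simplifying assumptions: contigs (rather than contig ends) are the graph nodes, the link score is simply the number of shared indexed pools, and `min_shared` is a hypothetical cutoff. fragScaff itself applies its own statistics to candidate joins, so treat this only as an illustration of building a graph from pool co-occurrence and resolving it with a minimum spanning tree via Kruskal's algorithm.

```python
from itertools import combinations


def candidate_joins(pools_by_contig, min_shared):
    """Score every contig pair by the number of indexed pools they share."""
    joins = []
    for a, b in combinations(sorted(pools_by_contig), 2):
        shared = len(pools_by_contig[a] & pools_by_contig[b])
        if shared >= min_shared:
            # Negate the score so a *minimum* spanning tree keeps strong links.
            joins.append((-shared, a, b))
    return joins


def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm with a union-find structure over contig names."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    tree = []
    for weight, a, b in sorted(edges):  # lightest (most shared pools) first
        ra, rb = find(a), find(b)
        if ra != rb:  # skip edges that would close a cycle
            parent[ra] = rb
            tree.append((a, b, -weight))  # report the positive shared count
    return tree
```

With pool sets {"c1": {1, 2, 3, 4}, "c2": {2, 3, 4, 5}, "c3": {4, 5, 6}, "c4": {9}} and `min_shared=2`, the tree keeps c1-c2 (3 shared pools) and c2-c3 (2 shared pools), while c4 stays unlinked. Ordering and orienting the tree into linear scaffolds is not shown.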
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Transposases/metabolism , Animals , Computational Biology/methods , Gene Library , Genomics/methods , Humans , Mice , Software
ABSTRACT
High-throughput chemical screens typically use coarse assays such as cell survival, limiting what can be learned about mechanisms of action, off-target effects, and heterogeneous responses. Here, we introduce "sci-Plex," which uses "nuclear hashing" to quantify global transcriptional responses to thousands of independent perturbations at single-cell resolution. As a proof of concept, we applied sci-Plex to screen three cancer cell lines exposed to 188 compounds. In total, we profiled ~650,000 single-cell transcriptomes across ~5000 independent samples in one experiment. Our results reveal substantial intercellular heterogeneity in response to specific compounds, commonalities in response to families of compounds, and insight into differential properties within families. In particular, our results with histone deacetylase inhibitors support the view that chromatin acts as an important reservoir of acetate in cancer cells.
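With nuclear hashing, each nucleus carries reads from a sample-specific hash oligo, so every cell must be assigned back to the well (compound and dose) it came from. The Python sketch below uses a simple rule that is an assumption, not necessarily the sci-Plex procedure: call the most abundant hash if it has at least `min_umis` counts and exceeds the runner-up by `min_ratio`-fold; otherwise leave the cell unassigned as a likely doublet or background-dominated nucleus. Both thresholds and the hash labels are hypothetical.

```python
def assign_hash(hash_counts, min_umis=10, min_ratio=5.0):
    """Return the hash (sample) label for one cell, or None if ambiguous.

    hash_counts maps hypothetical hash labels to hash-oligo UMI counts.
    """
    ranked = sorted(hash_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < min_umis:
        return None  # too few hash molecules to make a call
    top_label, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    if second > 0 and top / second < min_ratio:
        return None  # two comparable hashes: likely doublet or high background
    return top_label
```

A cell with {"drugA_1uM": 40, "drugB_10uM": 3} is assigned to "drugA_1uM", whereas {"a": 30, "b": 20} is left unassigned. Discarding ambiguous cells trades some yield for cleaner per-sample transcriptome profiles.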
Subject(s)
High-Throughput Screening Assays , Neoplasms/metabolism , RNA-Seq/methods , Single-Cell Analysis/methods , Transcriptome/drug effects , A549 Cells , Acetates/metabolism , Cell Nucleus/drug effects , Cell Nucleus/metabolism , Chromatin/metabolism , Genomics , Histone Deacetylase Inhibitors/pharmacology , Humans , K562 Cells , MCF-7 Cells , Neoplasms/genetics
ABSTRACT
Although we can increasingly measure transcription, chromatin, methylation, and other aspects of molecular biology at single-cell resolution, most assays survey only one aspect of cellular biology. Here we describe sci-CAR, a combinatorial indexing-based coassay that jointly profiles chromatin accessibility and mRNA (CAR) in each of thousands of single cells. As a proof of concept, we apply sci-CAR to 4825 cells, including a time series of dexamethasone treatment, as well as to 11,296 cells from the adult mouse kidney. With the resulting data, we compare the pseudotemporal dynamics of chromatin accessibility and gene expression, reconstruct the chromatin accessibility profiles of cell types defined by RNA profiles, and link cis-regulatory sites to their target genes on the basis of the covariance of chromatin accessibility and transcription across large numbers of single cells.
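Linking a cis-regulatory site to a gene on the basis of the covariance of chromatin accessibility and transcription can be illustrated as a correlation across cells. The Python sketch below is a naive version under stated assumptions: `accessibility` and `expression` map hypothetical site and gene names to per-cell vectors in the same cell order, `candidate_pairs` would in practice be restricted to nearby site-gene pairs, and `min_r` is an arbitrary cutoff. Single-cell profiles are very sparse, so real analyses typically aggregate similar cells or use regression models rather than raw per-cell Pearson correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no covariance signal
    return cov / (sx * sy)


def link_sites_to_genes(accessibility, expression, candidate_pairs, min_r=0.3):
    """Keep (site, gene, r) for candidate pairs whose accessibility and
    expression co-vary positively across the same cells."""
    links = []
    for site, gene in candidate_pairs:
        r = pearson(accessibility[site], expression[gene])
        if r >= min_r:
            links.append((site, gene, r))
    return links
```

For a site accessible in cells 2, 3 and 5 and a gene expressed in exactly those cells, r is about 0.95 and the pair is kept; a site with the opposite pattern correlates negatively and is dropped.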
Subject(s)
Chromatin/metabolism , Gene Expression Profiling/methods , Gene Expression Regulation , Genomics/methods , Single-Cell Analysis/methods , A549 Cells , Animals , Dexamethasone/pharmacology , Gene Expression Regulation/drug effects , HEK293 Cells , Humans , Kidney/cytology , Kidney/drug effects , Mice , NIH 3T3 Cells , Regulatory Elements, Transcriptional/drug effects , Transcription, Genetic/drug effects
ABSTRACT
Most genomes to date have been sequenced without taking into account the diploid nature of the genome. However, the distribution of variants on each individual chromosome can (1) significantly impact gene regulation and protein function, (2) have important implications for analyses of population history and medical genetics, and (3) be of great value for accurate interpretation of medically relevant genetic variation. Here, we describe a comprehensive and detailed protocol for an ultrafast (<3 h library preparation), cost-effective, and scalable haplotyping method, named Contiguity Preserving Transposition sequencing or CPT-seq (Amini et al., Nat Genet 46(12):1343-1349, 2014). CPT-seq accurately phases >95% of the whole human genome in Mb-scale phasing blocks. Additionally, the same workflow can be used to aid de novo assembly (Adey et al., Genome Res 24(12):2041-2049, 2014), detect structural variants, and perform single-cell ATAC-seq analysis (Cusanovich et al., Science 348(6237):910-914, 2015).
Subject(s)
Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Single-Cell Analysis
ABSTRACT
Next-generation sequencing of the RNA content of single cells or single nuclei (sc/nRNA-seq) has become a powerful approach to understand the cellular complexity and diversity of multicellular organisms and environmental ecosystems. However, because the procedure begins with a relatively small amount of starting material, pushing the limits of the laboratory procedures required, careful sample quality control (QC) is essential to reduce the impact of technical noise and sample bias in downstream analysis applications. Here we present a preliminary framework for sample-level quality control that is based on the collection of a series of quantitative laboratory and data metrics that are used as features for the construction of QC classification models using random forest machine learning approaches. We have applied this initial framework to a dataset composed of 2272 single-nucleus RNA-seq results and determined that ~79% of samples were of high quality. Removal of the poor-quality samples from downstream analysis was found to improve the cell type clustering results. In addition, this approach identified quantitative features related to the proportion of unique or duplicate reads and the proportion of reads remaining after quality trimming as useful features for pass/fail classification. The construction and use of classification models for the identification of poor-quality samples provides an objective and scalable approach to sc/nRNA-seq quality control.
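The two features highlighted above (the proportion of unique or duplicate reads, and the proportion of reads remaining after quality trimming) are simple ratios of per-sample read counts. The Python sketch below shows one way to compute them as a fixed-order feature vector; the `SampleMetrics` fields and feature names are hypothetical, and the framework described above also uses laboratory metrics that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class SampleMetrics:
    """Hypothetical per-nucleus read counts gathered during processing."""
    total_reads: int    # reads sequenced for this nucleus
    unique_reads: int   # reads left after duplicate removal
    trimmed_reads: int  # reads left after adapter/quality trimming


# Fixed column order so every sample yields a comparable row.
FEATURE_ORDER = ("frac_unique", "frac_duplicate", "frac_after_trim")


def qc_features(m: SampleMetrics) -> dict:
    """Compute read-proportion QC features for one sample."""
    if m.total_reads <= 0:
        raise ValueError("total_reads must be positive")
    frac_unique = m.unique_reads / m.total_reads
    return {
        "frac_unique": frac_unique,
        "frac_duplicate": 1.0 - frac_unique,
        "frac_after_trim": m.trimmed_reads / m.total_reads,
    }


def feature_vector(m: SampleMetrics) -> list:
    """Return the features in FEATURE_ORDER, i.e. one row of a design matrix."""
    features = qc_features(m)
    return [features[name] for name in FEATURE_ORDER]
```

Rows built this way, paired with manually curated pass/fail labels, could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier` via `fit(X, y)`; the labels and any additional features are outside the scope of this sketch.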
Subject(s)
High-Throughput Nucleotide Sequencing/statistics & numerical data , Neocortex/cytology , Neocortex/metabolism , RNA, Nuclear/genetics , Sequence Analysis, RNA/statistics & numerical data , Autopsy , Bias , Cell Nucleus/genetics , Computational Biology , Databases, Nucleic Acid , Decision Trees , High-Throughput Nucleotide Sequencing/standards , Humans , Machine Learning , Quality Control , Sequence Analysis, RNA/standards , Single-Cell Analysis , Software
ABSTRACT
Haplotype-resolved genome sequencing promises to unlock a wealth of information in population and medical genetics. However, for the vast majority of genomes sequenced to date, haplotypes have not been determined because of cumbersome haplotyping workflows that require fractions of the genome to be sequenced in a large number of compartments. Here we demonstrate barcode partitioning of long DNA molecules in a single compartment using "on-bead" barcoded tagmentation. The key to the method, which we call "contiguity preserving transposition" sequencing on beads (CPTv2-seq), is transposon-mediated transfer of homogeneous populations of barcodes from beads to individual long DNA molecules that are fragmented at the same time (tagmentation). These are then processed into sequencing libraries wherein all sequencing reads originating from each long DNA molecule share a common barcode. Single-tube, bulk processing of long DNA molecules with ~150,000 different barcoded bead types provides a barcode-linked read structure that reveals long-range molecular contiguity. This technology provides a simple, rapid, plate-scalable and automatable route to accurate, haplotype-resolved sequencing, and phasing of structural variants of the genome.
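Because every read from one long DNA molecule shares a bead barcode, molecules can be approximated by grouping reads by barcode and chromosome and splitting wherever consecutive reads are far apart. This Python sketch assumes reads arrive as hypothetical `(barcode, chrom, position)` tuples and uses an arbitrary `max_gap` of 50 kb; since one barcode can tag several molecules, the gap split is what separates them. It illustrates the barcode-linked read structure, not the CPTv2-seq analysis software.

```python
from collections import defaultdict


def reconstruct_molecules(reads, max_gap=50_000):
    """Group (barcode, chrom, position) reads into inferred long molecules.

    Returns a list of (barcode, chrom, start, end, n_reads) tuples.
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        n = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Large gap: close the current molecule and start a new one.
                molecules.append((barcode, chrom, start, prev, n))
                start, n = pos, 0
            n += 1
            prev = pos
        molecules.append((barcode, chrom, start, prev, n))
    return molecules
```

Reads from barcode "AAA" at chr1:100, chr1:5000 and chr1:200000 yield two molecules (100-5000 with two reads, and a single read at 200000), because the 195 kb jump exceeds `max_gap`.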
Subject(s)
DNA Barcoding, Taxonomic/methods , Genome, Human/genetics , Genomics/methods , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , Humans
ABSTRACT
Technical advances have enabled the collection of genome and transcriptome data sets with single-cell resolution. However, single-cell characterization of the epigenome has remained challenging. Furthermore, because cells must be physically separated before biochemical processing, conventional single-cell preparatory methods scale linearly. We applied combinatorial cellular indexing to measure chromatin accessibility in thousands of single cells per assay, circumventing the need for compartmentalization of individual cells. We report chromatin accessibility profiles from more than 15,000 single cells and use these data to cluster cells on the basis of chromatin accessibility landscapes. We identify modules of coordinately regulated chromatin accessibility at the level of single cells both between and within cell types, with a scalable method that may accelerate progress toward a human cell atlas.
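A short calculation shows why combinatorial indexing avoids isolating each cell, and where its limit lies: two nuclei that end up in the same second-round well are indistinguishable only if they also received the same first-round index. Assuming first-round indexes are assigned uniformly at random, the expected fraction of nuclei in a well that share their index with at least one other nucleus in that well is 1 - (1 - 1/N)^(k - 1) for k nuclei and N first-round indexes. The Python sketch below encodes this simplified model; the function name is hypothetical and the model ignores sorting noise and unequal index usage.

```python
def expected_collision_fraction(nuclei_per_well: int, n_first_round: int) -> float:
    """Fraction of nuclei in one second-round (PCR) well expected to share
    their first-round index with at least one other nucleus in that well.

    Assumes each nucleus independently received one of n_first_round
    indexes with equal probability during the first (transposition) round.
    """
    if nuclei_per_well < 1 or n_first_round < 1:
        raise ValueError("counts must be positive")
    # Probability that none of the other k - 1 nuclei drew the same index.
    p_no_clash = (1 - 1 / n_first_round) ** (nuclei_per_well - 1)
    return 1 - p_no_clash
```

Loading more nuclei per well raises throughput but also the collision fraction, which is the trade-off this model makes explicit; adding more first-round indexes pushes the curve back down.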
Subject(s)
Chromatin/metabolism , Epigenesis, Genetic , Single-Cell Analysis/methods , HEK293 Cells , HL-60 Cells , Humans
ABSTRACT
Haplotype-resolved genome sequencing enables the accurate interpretation of medically relevant genetic variation, deep inferences regarding population history and non-invasive prediction of fetal genomes. We describe an approach for genome-wide haplotyping based on contiguity-preserving transposition (CPT-seq) and combinatorial indexing. Tn5 transposition is used to modify DNA with adaptor and index sequences while preserving contiguity. After DNA dilution and compartmentalization, the transposase is removed, resolving the DNA into individually indexed libraries. The libraries in each compartment, enriched for neighboring genomic elements, are further indexed via PCR. Combinatorial 96-plex indexing at both the transposition and PCR stage enables the construction of phased synthetic reads from each of the nearly 10,000 'virtual compartments'. We demonstrate the feasibility of this method by assembling >95% of the heterozygous variants in a human genome into long, accurate haplotype blocks (N50 = 1.4-2.3 Mb). The rapid, scalable and cost-effective workflow could enable haplotype resolution to become routine in human genome sequencing.
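The "nearly 10,000" virtual compartments follow from the two rounds of 96-plex indexing: 96 × 96 = 9,216 combinations. The Python sketch below maps a (transposition index, PCR index) pair to a single compartment ID; the zero-based numbering and the names are assumptions for illustration.

```python
N_TRANSPOSITION = 96  # first-round indexes added during Tn5 transposition
N_PCR = 96            # second-round indexes added during PCR

TOTAL_COMPARTMENTS = N_TRANSPOSITION * N_PCR  # 96 * 96 = 9,216


def virtual_compartment(transposition_index: int, pcr_index: int) -> int:
    """Combine two zero-based indexes into one virtual compartment ID."""
    if not (0 <= transposition_index < N_TRANSPOSITION and 0 <= pcr_index < N_PCR):
        raise ValueError("index out of range")
    return transposition_index * N_PCR + pcr_index
```

Reads sharing a compartment ID can then be treated as one pool enriched for neighboring genomic fragments, which is what makes assembling phased synthetic reads per compartment possible.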