ABSTRACT
Chronic stimulation can cause T cell dysfunction and limit the efficacy of cellular immunotherapies. Improved methods are required to compare large numbers of synthetic knockin (KI) sequences to reprogram cell functions. Here, we developed modular pooled KI screening (ModPoKI), an adaptable platform for modular construction of DNA KI libraries using barcoded multicistronic adaptors. We built two ModPoKI libraries of 100 transcription factors (TFs) and 129 natural and synthetic surface receptors (SRs). Over 30 ModPoKI screens across human TCR- and CAR-T cells in diverse conditions identified a transcription factor AP4 (TFAP4) construct that enhanced fitness of chronically stimulated CAR-T cells and anti-cancer function in vitro and in vivo. ModPoKI's modularity allowed us to generate an ∼10,000-member library of TF combinations. Non-viral KI of a combined BATF-TFAP4 polycistronic construct enhanced fitness. Overexpressed BATF and TFAP4 co-occupy and regulate key gene targets to reprogram T cell function. ModPoKI facilitates the discovery of complex gene constructs to program cellular functions.
Subject(s)
Cell- and Tissue-Based Therapy , Exercise , Humans , Gene Library , Immunotherapy , Research
ABSTRACT
Defining long-term protective immunity to SARS-CoV-2 is one of the most pressing questions of our time and will require a detailed understanding of potential ways this virus can evolve to escape immune protection. Immune protection will most likely be mediated by antibodies that bind to the viral entry protein, spike (S). Here, we used Phage-DMS, an approach that comprehensively interrogates the effect of all possible mutations on binding to a protein of interest, to define the profile of antibody escape to the SARS-CoV-2 S protein using coronavirus disease 2019 (COVID-19) convalescent plasma. Antibody binding was common in two regions, the fusion peptide and the linker region upstream of the heptad repeat region 2. However, escape mutations were variable within these immunodominant regions. There was also individual variation in less commonly targeted epitopes. This study provides a granular view of potential antibody escape pathways and suggests there will be individual variation in antibody-mediated virus evolution.
Subject(s)
Antibodies, Neutralizing/immunology , Antibodies, Viral/immunology , COVID-19/immunology , Epitopes/immunology , SARS-CoV-2/immunology , Spike Glycoprotein, Coronavirus/immunology , Algorithms , COVID-19/therapy , COVID-19/virology , Cell Line , Gene Library , Humans , Immunization, Passive , Mutation , Protein Domains , SARS-CoV-2/genetics , Software , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , COVID-19 Serotherapy
ABSTRACT
Despite remarkable clinical efficacy of immune checkpoint blockade (ICB) in cancer treatment, ICB benefits for triple-negative breast cancer (TNBC) remain limited. Through pooled in vivo CRISPR knockout (KO) screens in syngeneic TNBC mouse models, we found that deletion of the E3 ubiquitin ligase Cop1 in cancer cells decreases secretion of macrophage-associated chemokines, reduces tumor macrophage infiltration, enhances anti-tumor immunity, and strengthens ICB response. Transcriptomics, epigenomics, and proteomics analyses revealed that Cop1 functions through proteasomal degradation of the C/ebpδ protein. The Cop1 substrate Trib2 functions as a scaffold linking Cop1 and C/ebpδ, which leads to polyubiquitination of C/ebpδ. In addition, deletion of the E3 ubiquitin ligase Cop1 in cancer cells stabilizes C/ebpδ to suppress expression of macrophage chemoattractant genes. Our integrated approach implicates Cop1 as a target for improving cancer immunotherapy efficacy in TNBC by regulating chemokine secretion and macrophage infiltration in the tumor microenvironment.
Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Immunotherapy , Macrophages/enzymology , Neoplasms/immunology , Neoplasms/therapy , Nuclear Proteins/metabolism , Ubiquitin-Protein Ligases/metabolism , Animals , CCAAT-Enhancer-Binding Protein-delta/metabolism , CRISPR-Associated Protein 9/metabolism , Cell Line, Tumor , Chemokines/metabolism , Chemotaxis , Disease Models, Animal , Gene Library , Humans , Immune Evasion , Mice, Inbred BALB C , Mice, Inbred C57BL , Proteolysis , Substrate Specificity , Triple Negative Breast Neoplasms/immunology , Triple Negative Breast Neoplasms/therapy
ABSTRACT
Although base editors are widely used to install targeted point mutations, the factors that determine base editing outcomes are not well understood. We characterized sequence-activity relationships of 11 cytosine and adenine base editors (CBEs and ABEs) on 38,538 genomically integrated targets in mammalian cells and used the resulting outcomes to train BE-Hive, a machine learning model that accurately predicts base editing genotypic outcomes (R ≈ 0.9) and efficiency (R ≈ 0.7). We corrected 3,388 disease-associated SNVs with ≥90% precision, including 675 alleles with bystander nucleotides that BE-Hive correctly predicted would not be edited. We discovered determinants of previously unpredictable C-to-G or C-to-A editing and used these discoveries to correct coding sequences of 174 pathogenic transversion SNVs with ≥90% precision. Finally, we used insights from BE-Hive to engineer novel CBE variants that modulate editing outcomes. These discoveries illuminate base editing, enable editing at previously intractable targets, and provide new base editors with improved editing capabilities.
Subject(s)
Gene Editing/methods , Machine Learning , Animals , Gene Library , Humans , Mice , Mouse Embryonic Stem Cells/cytology , Mouse Embryonic Stem Cells/metabolism , Point Mutation , RNA, Guide, Kinetoplastida/metabolism
ABSTRACT
Drug resistance and relapse remain key challenges in pancreatic cancer. Here, we have used RNA sequencing (RNA-seq), chromatin immunoprecipitation (ChIP)-seq, and genome-wide CRISPR analysis to map the molecular dependencies of pancreatic cancer stem cells, highly therapy-resistant cells that preferentially drive tumorigenesis and progression. This integrated genomic approach revealed an unexpected utilization of immuno-regulatory signals by pancreatic cancer epithelial cells. In particular, the nuclear hormone receptor retinoic-acid-receptor-related orphan receptor gamma (RORγ), known to drive inflammation and T cell differentiation, was upregulated during pancreatic cancer progression, and its genetic or pharmacologic inhibition led to a striking defect in pancreatic cancer growth and a marked improvement in survival. Further, a large-scale retrospective analysis in patients revealed that RORγ expression may predict pancreatic cancer aggressiveness, as it positively correlated with advanced disease and metastasis. Collectively, these data identify an orthogonal co-option of immuno-regulatory signals by pancreatic cancer stem cells, suggesting that autoimmune drugs should be evaluated as novel treatment strategies for pancreatic cancer patients.
Subject(s)
Adenocarcinoma/pathology , Neoplastic Stem Cells/metabolism , Pancreatic Neoplasms/pathology , Adenocarcinoma/genetics , Adenocarcinoma/metabolism , Animals , Cell Adhesion Molecules/genetics , Cell Adhesion Molecules/metabolism , Cell Differentiation , Epigenesis, Genetic , Gene Library , Humans , Mice , Mice, Knockout , Mice, SCID , Neoplastic Stem Cells/cytology , Nuclear Receptor Subfamily 1, Group F, Member 3/antagonists & inhibitors , Nuclear Receptor Subfamily 1, Group F, Member 3/genetics , Nuclear Receptor Subfamily 1, Group F, Member 3/metabolism , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/metabolism , RNA Interference , RNA, Small Interfering/metabolism , Receptors, G-Protein-Coupled/antagonists & inhibitors , Receptors, G-Protein-Coupled/genetics , Receptors, G-Protein-Coupled/metabolism , Receptors, Interleukin-10/antagonists & inhibitors , Receptors, Interleukin-10/genetics , Receptors, Interleukin-10/metabolism , T-Lymphocytes/cytology , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Transcriptome , Tumor Cells, Cultured
ABSTRACT
The discovery of organic ligands that bind specifically to proteins is a central problem in chemistry, biology, and the biomedical sciences. The encoding of individual organic molecules with distinctive DNA tags, serving as amplifiable identification bar codes, allows the construction and screening of combinatorial libraries of unprecedented size, thus facilitating the discovery of ligands to many different protein targets. Fundamentally, one links the power of genetics with that of chemical synthesis. After the initial description of DNA-encoded chemical libraries in 1992, several experimental embodiments of the technology have been reduced to practice. This review provides a historical account of important milestones in the development of DNA-encoded chemical libraries, a survey of relevant ongoing research activities, and a glimpse into the future.
Subject(s)
Drug Discovery/methods , Gene Library , Small Molecule Libraries , Animals , Combinatorial Chemistry Techniques , Humans , Ligands , Peptide Library
ABSTRACT
RNAs fold into defined tertiary structures to function in critical biological processes. While quantitative models can predict RNA secondary structure stability, we are still unable to predict the thermodynamic stability of RNA tertiary structure. Here, we probe conformational preferences of diverse RNA two-way junctions to develop a predictive model for the formation of RNA tertiary structure. We quantitatively measured tertiary assembly energetics of >1,000 RNA junctions inserted in multiple structural scaffolds to generate a "thermodynamic fingerprint" for each junction. Thermodynamic fingerprints enabled comparison of junction conformational preferences, revealing principles for how sequence influences 3-dimensional conformations. Utilizing fingerprints of junctions with known crystal structures, we generated ensembles for related junctions that predicted their thermodynamic effects on assembly formation. This work reveals sequence-structure-energetic relationships in RNA, demonstrates the capacity for diverse compensation strategies within tertiary structures, and provides a path to quantitative modeling of RNA folding energetics based on "ensemble modularity."
Subject(s)
RNA/metabolism , Base Pair Mismatch , Gene Library , Nucleic Acid Conformation , Photobleaching , RNA/chemistry , RNA Folding , RNA Stability , Thermodynamics
ABSTRACT
Although animals have evolved multiple mechanisms to suppress transposons, "leaky" mobilizations that cause mutations and diseases still occur. This suggests that transposons employ specific tactics to accomplish robust propagation. By directly tracking mobilization, we show that, during a short and specific time window of oogenesis, retrotransposons achieve massive amplification via a cell-type-specific targeting strategy. Retrotransposons rarely mobilize in undifferentiated germline stem cells. However, as oogenesis proceeds, they utilize supporting nurse cells (which are highly polyploid and eventually undergo apoptosis) as factories to massively manufacture invading products. Moreover, retrotransposons rarely integrate into nurse cells themselves but, instead, via microtubule-mediated transport, they preferentially target the DNA of the interconnected oocytes. Blocking microtubule-dependent intercellular transport from nurse cells significantly alleviates damage to the oocyte genome. Our data reveal that parasitic genomic elements can efficiently hijack a host developmental process to propagate robustly, thereby driving evolutionary change and causing disease.
Subject(s)
Drosophila melanogaster/genetics , Long Interspersed Nucleotide Elements , Oogenesis , RNA, Small Interfering , Retroelements , Retroviridae/genetics , Animals , Drosophila Proteins , Female , Gene Library , Gene Silencing , Germ Cells , Green Fluorescent Proteins/metabolism , In Situ Hybridization, Fluorescence , Male , Oocytes/metabolism , Stem Cells/metabolism
ABSTRACT
Synthetic multicellular systems hold promise as models for understanding natural development of biofilms and higher organisms and as tools for engineering complex multi-component metabolic pathways and materials. However, such efforts require tools to adhere cells into defined morphologies and patterns, and these tools are currently lacking. Here, we report a 100% genetically encoded synthetic platform for modular cell-cell adhesion in Escherichia coli, which provides control over multicellular self-assembly. Adhesive selectivity is provided by a library of outer membrane-displayed nanobodies and antigens with orthogonal intra-library specificities, while affinity is controlled by intrinsic adhesin affinity, competitive inhibition, and inducible expression. We demonstrate the resulting capabilities for quantitative rational design of well-defined morphologies and patterns through homophilic and heterophilic interactions, lattice-like self-assembly, phase separation, differential adhesion, and sequential layering. Compatible with synthetic biology standards, this adhesion toolbox will enable construction of high-level multicellular designs and shed light on the evolutionary transition to multicellularity.
Subject(s)
Cell Adhesion/physiology , Metabolic Engineering/methods , Synthetic Biology/methods , Bacterial Physiological Phenomena , Biological Evolution , Cell Adhesion/genetics , Cell Differentiation/genetics , Cell Differentiation/physiology , Escherichia coli/genetics , Gene Library , Metabolic Networks and Pathways , Single-Domain Antibodies/genetics , Single-Domain Antibodies/immunology , Single-Domain Antibodies/physiology
ABSTRACT
Genomics has provided a detailed structural description of the cancer genome. Identifying oncogenic drivers that work primarily through dosage changes is a current challenge. Unrestrained proliferation is a critical hallmark of cancer. We constructed modular, barcoded libraries of human open reading frames (ORFs) and performed screens for proliferation regulators in multiple cell types. Approximately 10% of genes regulate proliferation, with most performing in an unexpectedly highly tissue-specific manner. Proliferation drivers in a given cell type showed specific enrichment in somatic copy number changes (SCNAs) from cognate tumors and helped predict aneuploidy patterns in those tumors, implying that tissue-type-specific genetic network architectures underlie SCNA and driver selection in different cancers. In vivo screening confirmed these results. We report a substantial contribution to the catalog of SCNA-associated cancer drivers, identifying 147 amplified and 107 deleted genes as potential drivers, and derive insights about the genetic network architecture of aneuploidy in tumors.
Subject(s)
Aneuploidy , Neoplasms/pathology , Animals , Cell Line, Tumor , Cell Proliferation , Chromosome Mapping , Chromosomes/genetics , E2F1 Transcription Factor/antagonists & inhibitors , E2F1 Transcription Factor/genetics , E2F1 Transcription Factor/metabolism , Female , Gene Library , Genomics , Humans , Keratins/metabolism , Mice , Mice, Inbred NOD , Mice, SCID , Oncogenes , Open Reading Frames/genetics , RNA Interference , RNA, Small Interfering/metabolism
ABSTRACT
Elucidation of the mutational landscape of human cancer has progressed rapidly and been accompanied by the development of therapeutics targeting mutant oncogenes. However, a comprehensive mapping of cancer dependencies has lagged behind and the discovery of therapeutic targets for counteracting tumor suppressor gene loss is needed. To identify vulnerabilities relevant to specific cancer subtypes, we conducted a large-scale RNAi screen in which viability effects of mRNA knockdown were assessed for 7,837 genes using an average of 20 shRNAs per gene in 398 cancer cell lines. We describe findings of this screen, outlining the classes of cancer dependency genes and their relationships to genetic, expression, and lineage features. In addition, we describe robust gene-interaction networks recapitulating both protein complexes and functional cooperation among complexes and pathways. This dataset along with a web portal is provided to the community to assist in the discovery and translation of new therapeutic approaches for cancer.
Subject(s)
Neoplasms/genetics , Neoplasms/pathology , RNA Interference , Cell Line, Tumor , Gene Library , Gene Regulatory Networks , Humans , Multiprotein Complexes/metabolism , Neoplasms/metabolism , Oncogenes , RNA, Small Interfering , Signal Transduction , Transcription Factors/metabolism
ABSTRACT
Genome-wide association studies (GWAS) have successfully identified thousands of associations between common genetic variants and human disease phenotypes, but the majority of these variants are non-coding, often requiring genetic fine-mapping, epigenomic profiling, and individual reporter assays to delineate potential causal variants. We employ a massively parallel reporter assay (MPRA) to simultaneously screen 2,756 variants in strong linkage disequilibrium with 75 sentinel variants associated with red blood cell traits. We show that this assay identifies elements with endogenous erythroid regulatory activity. Across 23 sentinel variants, we conservatively identified 32 MPRA functional variants (MFVs). We used targeted genome editing to demonstrate endogenous enhancer activity across 3 MFVs that predominantly affect the transcription of SMIM1, RBM38, and CD164. Functional follow-up of RBM38 delineates a key role for this gene in the alternative splicing program occurring during terminal erythropoiesis. Finally, we provide evidence for how common GWAS-nominated variants can disrupt cell-type-specific transcriptional regulatory pathways.
Subject(s)
Erythrocytes , Genetic Techniques , Genetic Variation , Alternative Splicing , Cell Line , Cell Lineage/genetics , Erythropoiesis/genetics , Gene Library , Genes, Reporter , Humans , Regulatory Sequences, Nucleic Acid , Transcription, Genetic
ABSTRACT
Essential gene functions underpin the core reactions required for cell viability, but their contributions and relationships are poorly studied in vivo. Using CRISPR interference, we created knockdowns of every essential gene in Bacillus subtilis and probed their phenotypes. Our high-confidence essential gene network, established using chemical genomics, showed extensive interconnections among distantly related processes and identified modes of action for uncharacterized antibiotics. Importantly, mild knockdown of essential gene functions significantly reduced stationary-phase survival without affecting maximal growth rate, suggesting that essential protein levels are set to maximize outgrowth from stationary phase. Finally, high-throughput microscopy indicated that cell morphology is relatively insensitive to mild knockdown but profoundly affected by depletion of gene function, revealing intimate connections between cell growth and shape. Our results provide a framework for systematic investigation of essential gene functions in vivo broadly applicable to diverse microorganisms and amenable to comparative analysis.
Subject(s)
Bacillus subtilis/genetics , Genes, Bacterial , Genes, Essential , CRISPR-Cas Systems , Gene Knockdown Techniques , Gene Library , Gene Regulatory Networks , Molecular Targeted Therapy
ABSTRACT
Although studies have identified hundreds of loci associated with human traits and diseases, pinpointing causal alleles remains difficult, particularly for non-coding variants. To address this challenge, we adapted the massively parallel reporter assay (MPRA) to identify variants that directly modulate gene expression. We applied it to 32,373 variants from 3,642 cis-expression quantitative trait loci and control regions. Detection by MPRA was strongly correlated with measures of regulatory function. We demonstrate MPRA's capabilities for pinpointing causal alleles, using it to identify 842 variants showing differential expression between alleles, including 53 well-annotated variants associated with diseases and traits. We investigated one in detail, a risk allele for ankylosing spondylitis, and provide direct evidence of a non-coding variant that alters expression of the prostaglandin EP4 receptor. These results create a resource of concrete leads and illustrate the promise of this approach for comprehensively interrogating how non-coding polymorphism shapes human biology.
Subject(s)
Gene Expression Regulation , Genes, Reporter , Genetic Diseases, Inborn/genetics , Genetic Techniques , Genetic Variation , Alleles , Gene Library , Hep G2 Cells , Humans , Quantitative Trait Loci , Sensitivity and Specificity , Spondylitis, Ankylosing/genetics
ABSTRACT
Data of gene expression levels across individuals, cell types, and disease states is expanding, yet our understanding of how expression levels impact phenotype is limited. Here, we present a massively parallel system for assaying the effect of gene expression levels on fitness in Saccharomyces cerevisiae by systematically altering the expression level of ∼100 genes at ∼100 distinct levels spanning a 500-fold range at high resolution. We show that the relationship between expression levels and growth is gene and environment specific and provides information on the function, stoichiometry, and interactions of genes. Wild-type expression levels in some conditions are not optimal for growth, and genes whose fitness is greatly affected by small changes in expression level tend to exhibit lower cell-to-cell variability in expression. Our study addresses a fundamental gap in understanding the functional significance of gene expression regulation and offers a framework for evaluating the phenotypic effects of expression variation.
Subject(s)
Gene Expression Regulation, Fungal , Gene-Environment Interaction , Genetic Fitness , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/genetics , DNA Barcoding, Taxonomic , Gene Library , Genes, Fungal , High-Throughput Nucleotide Sequencing
ABSTRACT
The ability to perturb genes in human cells is crucial for elucidating gene function and holds great potential for finding therapeutic targets for diseases such as cancer. To extend the catalog of human core and context-dependent fitness genes, we have developed a high-complexity second-generation genome-scale CRISPR-Cas9 gRNA library and applied it to fitness screens in five human cell lines. Using an improved Bayesian analytical approach, we consistently discover 5-fold more fitness genes than were previously observed. We present a list of 1,580 human core fitness genes and describe their general properties. Moreover, we demonstrate that context-dependent fitness genes accurately recapitulate pathway-specific genetic vulnerabilities induced by known oncogenes and reveal cell-type-specific dependencies for specific receptor tyrosine kinases, even in oncogenic KRAS backgrounds. Thus, rigorous identification of human cell line fitness genes using a high-complexity CRISPR-Cas9 library affords a high-resolution view of the genetic vulnerabilities of a cell.
Subject(s)
Genes, Essential , Bayes Theorem , CRISPR-Cas Systems , Cell Line, Tumor , Gene Knockout Techniques , Gene Library , Humans , Mutation
ABSTRACT
Bacteria have adapted to phage predation by evolving a vast assortment of defence systems [1]. Although anti-phage immunity genes can be identified using bioinformatic tools, the discovery of novel systems is restricted to the available prokaryotic sequence data [2]. Here, to overcome this limitation, we infected Escherichia coli carrying a soil metagenomic DNA library [3] with the lytic coliphage T4 to isolate clones carrying protective genes. Following this approach, we identified Brig1, a DNA glycosylase that excises α-glucosyl-hydroxymethylcytosine nucleobases from the bacteriophage T4 genome to generate abasic sites and inhibit viral replication. Brig1 homologues that provide immunity against T-even phages are present in multiple phage defence loci across distinct clades of bacteria. Our study highlights the benefits of screening unsequenced DNA and reveals prokaryotic DNA glycosylases as important players in the bacteria-phage arms race.
Subject(s)
Bacteria , Bacteriophage T4 , DNA Glycosylases , Bacteria/classification , Bacteria/enzymology , Bacteria/genetics , Bacteria/immunology , Bacteria/virology , Bacteriophage T4/growth & development , Bacteriophage T4/immunology , Bacteriophage T4/metabolism , DNA Glycosylases/genetics , DNA Glycosylases/metabolism , Escherichia coli/genetics , Escherichia coli/virology , Gene Library , Metagenomics/methods , Soil Microbiology , Virus Replication
ABSTRACT
Deconvolution of regulatory mechanisms that drive transcriptional programs in cancer cells is key to understanding tumor biology. Herein, we present matched transcriptome (scRNA-seq) and chromatin accessibility (scATAC-seq) profiles at single-cell resolution from human ovarian and endometrial tumors processed immediately following surgical resection. This dataset reveals the complex cellular heterogeneity of these tumors and enabled us to quantitatively link variation in chromatin accessibility to gene expression. We show that malignant cells acquire previously unannotated regulatory elements to drive hallmark cancer pathways. Moreover, malignant cells from within the same patients show substantial variation in chromatin accessibility linked to transcriptional output, highlighting the importance of intratumoral heterogeneity. Finally, we infer the malignant cell type-specific activity of transcription factors. By defining the regulatory logic of cancer cells, this work reveals an important reliance on oncogenic regulatory elements and highlights the ability of matched scRNA-seq/scATAC-seq to uncover clinically relevant mechanisms of tumorigenesis in gynecologic cancers.
Subject(s)
Ovarian Neoplasms/genetics , Ovarian Neoplasms/metabolism , RNA, Small Cytoplasmic/genetics , Aged , Carcinogenesis , Chromatin/metabolism , Enhancer Elements, Genetic , Epithelial-Mesenchymal Transition , Female , Gastrointestinal Stromal Tumors/genetics , Gene Library , Genetic Techniques , Genomics , Humans , Kaplan-Meier Estimate , Middle Aged , Oncogenes , Ovary/metabolism , Proteomics , RNA-Seq , Regulatory Elements, Transcriptional , Transcription Factors/metabolism , Transcriptome
ABSTRACT
Gene transcription occurs via a cycle of linked events, including initiation, promoter-proximal pausing, and elongation of RNA polymerase II (Pol II). A key question is how transcriptional enhancers influence these events to control gene expression. Here, we present an approach that evaluates the level and change in promoter-proximal transcription (initiation and pausing) in the context of differential gene expression, genome-wide. This combinatorial approach shows that in primary cells, control of gene expression during differentiation is achieved predominantly via changes in transcription initiation rather than via release of Pol II pausing. Using genetically engineered mouse models, deleted for functionally validated enhancers of the α- and β-globin loci, we confirm that these elements regulate Pol II recruitment and/or initiation to modulate gene expression. Together, our data show that gene expression during differentiation is regulated predominantly at the level of initiation and that enhancers are key effectors of this process.
Subject(s)
Enhancer Elements, Genetic , Promoter Regions, Genetic , RNA Polymerase II/genetics , Transcription Initiation, Genetic , alpha-Globins/genetics , beta-Globins/genetics , Animals , Cell Differentiation , Exons , Fetus , Gene Expression Regulation , Gene Library , HSP70 Heat-Shock Proteins/genetics , HSP70 Heat-Shock Proteins/metabolism , Humans , Introns , K562 Cells , Liver/cytology , Liver/metabolism , Mice , Mice, Knockout , Proto-Oncogene Proteins c-myc/genetics , Proto-Oncogene Proteins c-myc/metabolism , RNA Polymerase II/metabolism , Signal Transduction , alpha-Globins/deficiency , beta-Globins/deficiency
ABSTRACT
To address how genetic variation alters gene expression in complex cell mixtures, we developed direct nuclear tagmentation and RNA sequencing (DNTR-seq), which enables whole-genome and mRNA sequencing jointly in single cells. DNTR-seq readily identified minor subclones within leukemia patients. In a large-scale DNA damage screen, DNTR-seq was used to detect regions under purifying selection and identified genes where mRNA abundance was resistant to copy-number alteration, suggesting strong genetic compensation. mRNA sequencing (mRNA-seq) quality equals that of RNA-only methods, and the low positional bias of genomic libraries allowed detection of sub-megabase aberrations at ultra-low coverage. Each cell library is individually addressable and can be re-sequenced at increased depth, allowing multi-tiered study designs. Additionally, the direct tagmentation protocol enables coverage-independent estimation of ploidy, which can be used to identify cell singlets. Thus, DNTR-seq directly links each cell's state to its corresponding genome at scale, enabling routine analysis of heterogeneous tumors and other complex tissues.