Results 1 - 20 of 34
1.
Mol Cell ; 73(1): 183-194.e8, 2019 01 03.
Article in English | MEDLINE | ID: mdl-30503770

ABSTRACT

Mutations that lead to splicing defects can have severe consequences on gene function and cause disease. Here, we explore how human genetic variation affects exon recognition by developing a multiplexed functional assay of splicing using Sort-seq (MFASS). We assayed 27,733 variants in the Exome Aggregation Consortium (ExAC) within or adjacent to 2,198 human exons in the MFASS minigene reporter and found that 3.8% (1,050) of variants, most of which are extremely rare, led to large-effect splice-disrupting variants (SDVs). Importantly, we find that 83% of SDVs are located outside of canonical splice sites, are distributed evenly across distinct exonic and intronic regions, and are difficult to predict a priori. Our results indicate extant, rare genetic variants can have large functional effects on splicing at appreciable rates, even outside the context of disease, and MFASS enables their empirical assessment at scale.


Subject(s)
Exons , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Mutation , RNA Splicing , Sequence Analysis, DNA/methods , Cell Separation , Computational Biology , Flow Cytometry , HEK293 Cells , HeLa Cells , Hep G2 Cells , Humans , Introns , K562 Cells , Oligonucleotide Array Sequence Analysis , Reproducibility of Results
2.
Nucleic Acids Res ; 48(16): e95, 2020 09 18.
Article in English | MEDLINE | ID: mdl-32692349

ABSTRACT

Multiplexed assays allow functional testing of large synthetic libraries of genetic elements, but are limited by the designability, length, fidelity and scale of the input DNA. Here, we improve DropSynth, a low-cost, multiplexed method that builds gene libraries by compartmentalizing and assembling microarray-derived oligonucleotides in vortexed emulsions. By optimizing enzyme choice, adding enzymatic error correction and increasing scale, we show that DropSynth can build thousands of gene-length fragments at >20% fidelity.


Subject(s)
Gene Library , Genes, Synthetic , Nucleic Acid Amplification Techniques/methods , Oligonucleotides/genetics , Emulsions/chemistry , Escherichia coli/genetics
3.
Nat Methods ; 15(5): 323-329, 2018 05.
Article in English | MEDLINE | ID: mdl-30052624

ABSTRACT

Robust and predictably performing synthetic circuits rely on the use of well-characterized regulatory parts across different genetic backgrounds and environmental contexts. Here we report the large-scale metagenomic mining of thousands of natural 5' regulatory sequences from diverse bacteria, and their multiplexed gene expression characterization in industrially relevant microbes. We identified sequences with broad and host-specific expression properties that are robust in various growth conditions. We also observed substantial differences between species in terms of their capacity to utilize exogenous regulatory sequences. Finally, we demonstrate programmable species-selective gene expression that produces distinct and diverse output patterns in different microbes. Together, these findings provide a rich resource of characterized natural regulatory sequences and a framework that can be used to engineer synthetic gene circuits with unique and tunable cross-species functionality and properties, and also suggest the prospect of ultimately engineering complex behaviors at the community level.


Subject(s)
Gene Expression Regulation/physiology , Metagenomics/methods , Regulatory Elements, Transcriptional/physiology , Data Mining , Escherichia coli/genetics , Escherichia coli/metabolism , Genetic Engineering/methods , Metabolic Engineering , Metabolic Networks and Pathways , Species Specificity , Synthetic Biology/methods
4.
Clin Chem ; 68(1): 143-152, 2021 12 30.
Article in English | MEDLINE | ID: mdl-34286830

ABSTRACT

BACKGROUND: The urgent need for massively scaled clinical testing for SARS-CoV-2, along with global shortages of critical reagents and supplies, has necessitated development of streamlined laboratory testing protocols. Conventional nucleic acid testing for SARS-CoV-2 involves collection of a clinical specimen with a nasopharyngeal swab in transport medium, nucleic acid extraction, and quantitative reverse-transcription PCR (RT-qPCR). As testing has scaled across the world, the global supply chain has buckled, rendering testing reagents and materials scarce. To address shortages, we developed SwabExpress, an end-to-end protocol developed to employ mass produced anterior nares swabs and bypass the requirement for transport media and nucleic acid extraction. METHODS: We evaluated anterior nares swabs, transported dry and eluted in low-TE buffer as a direct-to-RT-qPCR alternative to extraction-dependent viral transport media. We validated our protocol of using heat treatment for viral inactivation and added a proteinase K digestion step to reduce amplification interference. We tested this protocol across archived and prospectively collected swab specimens to fine-tune test performance. RESULTS: After optimization, SwabExpress has a low limit of detection at 2-4 molecules/µL, 100% sensitivity, and 99.4% specificity when compared side by side with a traditional RT-qPCR protocol employing extraction. On real-world specimens, SwabExpress outperforms an automated extraction system while simultaneously reducing cost and hands-on time. CONCLUSION: SwabExpress is a simplified workflow that facilitates scaled testing for COVID-19 without sacrificing test performance. It may serve as a template for the simplification of PCR-based clinical laboratory tests, particularly in times of critical shortages during pandemics.


Subject(s)
COVID-19 Nucleic Acid Testing/methods , COVID-19 , COVID-19/diagnosis , Clinical Laboratory Techniques , Humans , RNA, Viral/isolation & purification , Real-Time Polymerase Chain Reaction , SARS-CoV-2/isolation & purification , Sensitivity and Specificity , Specimen Handling
5.
Biochemistry ; 58(11): 1539-1551, 2019 03 19.
Article in English | MEDLINE | ID: mdl-29388765

ABSTRACT

Promoters are the key drivers of gene expression and are largely responsible for the regulation of cellular responses to time and environment. In Escherichia coli, decades of studies have revealed most, if not all, of the sequence elements necessary to encode promoter function. Despite our knowledge of these motifs, it is still not possible to predict the strength and regulation of a promoter from primary sequence alone. Here we develop a novel multiplexed assay to study promoter function in E. coli by building a site-specific genomic recombination-mediated cassette exchange system that allows for the facile construction and testing of large libraries of genetic designs integrated into precise genomic locations. We build and test a library of 10898 σ70 promoter variants consisting of all combinations of a set of eight -35 elements, eight -10 elements, three UP elements, eight spacers, and eight backgrounds. We find that the -35 and -10 sequence elements can explain approximately 74% of the variance in promoter strength within our data set using a simple log-linear statistical model. Simple neural network models explain >95% of the variance in our data set by capturing nonlinear interactions with the spacer, background, and UP elements.


Subject(s)
Promoter Regions, Genetic/genetics , Promoter Regions, Genetic/physiology , Sigma Factor/genetics , Base Sequence/genetics , DNA-Directed RNA Polymerases/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial/genetics , Gene Library , Genomics/methods , Nerve Net/metabolism , Protein Binding/genetics , Sigma Factor/metabolism , Transcription, Genetic/genetics
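
The "simple log-linear statistical model" mentioned in the abstract of record 5 can be illustrated with a minimal sketch. It assumes a complete, balanced grid of -35 and -10 elements and models log(strength) as a baseline plus one additive term per element; the function names, effect sizes, and simulated data are hypothetical and are not taken from the paper, which may have fit its model differently.

```python
import math
import random


def make_grid(row_effects, col_effects, noise_sd, seed):
    """Simulate promoter strengths (hypothetical data, not from the paper).

    strength[i][j] = exp(row_effects[i] + col_effects[j] + noise)
    """
    rng = random.Random(seed)
    return [
        [math.exp(r + c + rng.gauss(0.0, noise_sd)) for c in col_effects]
        for r in row_effects
    ]


def fit_log_linear(strengths):
    """Fit log(strength) = mu + a[i] + b[j] on a complete, balanced grid.

    For a balanced grid, the least-squares additive fit reduces to
    marginal means of the log-transformed values.
    Returns (mu, a, b, r_squared).
    """
    logs = [[math.log(v) for v in row] for row in strengths]
    n_rows, n_cols = len(logs), len(logs[0])
    mu = sum(sum(row) for row in logs) / (n_rows * n_cols)
    a = [sum(row) / n_cols - mu for row in logs]
    b = [
        sum(logs[i][j] for i in range(n_rows)) / n_rows - mu
        for j in range(n_cols)
    ]
    ss_tot = sum((logs[i][j] - mu) ** 2
                 for i in range(n_rows) for j in range(n_cols))
    ss_res = sum((logs[i][j] - (mu + a[i] + b[j])) ** 2
                 for i in range(n_rows) for j in range(n_cols))
    return mu, a, b, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Eight hypothetical -35 effects and eight hypothetical -10 effects.
    minus35 = [0.3 * k for k in range(8)]
    minus10 = [0.2 * k for k in range(8)]
    grid = make_grid(minus35, minus10, noise_sd=0.5, seed=0)
    mu, a, b, r2 = fit_log_linear(grid)
    print(f"fraction of variance explained: {r2:.2f}")
```

The returned fraction of variance explained is the quantity the abstract reports as approximately 74% for the -35 and -10 elements; the remaining variance is what the abstract attributes to nonlinear interactions with the spacer, background, and UP elements.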
6.
Nat Methods ; 13(2): 177-83, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26689263

ABSTRACT

Genetic regulatory proteins inducible by small molecules are useful synthetic biology tools as sensors and switches. Bacterial allosteric transcription factors (aTFs) are a major class of regulatory proteins, but few aTFs have been redesigned to respond to new effectors beyond natural aTF-inducer pairs. Altering inducer specificity in these proteins is difficult because substitutions that affect inducer binding may also disrupt allostery. We engineered an aTF, the Escherichia coli lac repressor, LacI, to respond to one of four new inducer molecules: fucose, gentiobiose, lactitol and sucralose. Using computational protein design, single-residue saturation mutagenesis or random mutagenesis, along with multiplex assembly, we identified new variants comparable in specificity and induction to wild-type LacI with its inducer, isopropyl β-D-1-thiogalactopyranoside (IPTG). The ability to create designer aTFs will enable applications including dynamic control of cell metabolism, cell biology and synthetic gene circuits.


Subject(s)
Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Escherichia coli/metabolism , Gene Expression Regulation, Bacterial/genetics , Genetic Engineering , Lac Repressors/genetics , Lac Repressors/metabolism , Allosteric Regulation , DNA, Bacterial/genetics , Disaccharides , Escherichia coli/genetics , Fucose , Models, Molecular , Mutation , Protein Binding , Protein Conformation , Sucrose/analogs & derivatives , Sugar Alcohols
7.
Nucleic Acids Res ; 45(15): 9206-9217, 2017 Sep 06.
Article in English | MEDLINE | ID: mdl-28911123

ABSTRACT

Gene synthesis, the process of assembling gene-length fragments from shorter groups of oligonucleotides (oligos), is becoming an increasingly important tool in molecular and synthetic biology. The length, quality and cost of gene synthesis are limited by errors produced during oligo synthesis and subsequent assembly. Enzymatic error correction methods are cost-effective means to ameliorate errors in gene synthesis. Previous analyses of these methods relied on cloning and Sanger sequencing to evaluate their efficiencies, limiting quantitative assessment. Here, we develop a method to quantify errors in synthetic DNA by next-generation sequencing. We analyzed errors in model gene assemblies and systematically compared six different error correction enzymes across 11 conditions. We find that ErrASE and T7 Endonuclease I are the most effective at decreasing average error rates (up to 5.8-fold relative to the input), whereas MutS is the best for increasing the number of perfect assemblies (up to 25.2-fold). We are able to quantify differential specificities such as ErrASE preferentially corrects C/G transversions whereas T7 Endonuclease I preferentially corrects A/T transversions. More generally, this experimental and computational pipeline is a fast, scalable and extensible way to analyze errors in gene assemblies, to profile error correction methods, and to benchmark DNA synthesis methods.


Subject(s)
Chemistry Techniques, Synthetic/standards , DNA/chemical synthesis , Genes, Synthetic , High-Throughput Nucleotide Sequencing , Benchmarking , DNA/genetics , Deoxyribonuclease I/genetics , Deoxyribonuclease I/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , MutS DNA Mismatch-Binding Protein/genetics , MutS DNA Mismatch-Binding Protein/metabolism , Oligodeoxyribonucleotides/chemistry
8.
Nat Methods ; 11(5): 499-507, 2014 May.
Article in English | MEDLINE | ID: mdl-24781323

ABSTRACT

For over 60 years, the synthetic production of new DNA sequences has helped researchers understand and engineer biology. Here we summarize methods and caveats for the de novo synthesis of DNA, with particular emphasis on recent technologies that allow for large-scale and low-cost production. In addition, we discuss emerging applications enabled by large-scale de novo DNA constructs, as well as the challenges and opportunities that lie ahead.


Subject(s)
DNA/biosynthesis , Genetic Engineering/methods , Animals , Automation , Computational Biology/methods , DNA/genetics , DNA Barcoding, Taxonomic , High-Throughput Nucleotide Sequencing , Humans , Mice , Molecular Biology , Oligonucleotide Array Sequence Analysis , Oligonucleotides/genetics , Polymers/chemistry , Protein Engineering/methods , Sequence Analysis, DNA
9.
Proc Natl Acad Sci U S A ; 110(34): 14024-9, 2013 Aug 20.
Article in English | MEDLINE | ID: mdl-23924614

ABSTRACT

The inability to predict heterologous gene expression levels precisely hinders our ability to engineer biological systems. Using well-characterized regulatory elements offers a potential solution only if such elements behave predictably when combined. We synthesized 12,563 combinations of common promoters and ribosome binding sites and simultaneously measured DNA, RNA, and protein levels from the entire library. Using a simple model, we found that RNA and protein expression were within twofold of expected levels 80% and 64% of the time, respectively. The large dataset allowed quantitation of global effects, such as translation rate on mRNA stability and mRNA secondary structure on translation rate. However, the worst 5% of constructs deviated from prediction by 13-fold on average, which could hinder large-scale genetic engineering projects. The ease and scale of this approach indicate that rather than relying on prediction or standardization, we can screen synthetic libraries for desired behavior.


Subject(s)
Escherichia coli/metabolism , Gene Expression Regulation, Bacterial/genetics , Gene Library , Genetic Engineering/methods , Models, Genetic , RNA, Messenger/genetics , Systems Biology/methods , Cloning, Molecular , DNA Primers/genetics , Escherichia coli/genetics , Flow Cytometry , High-Throughput Nucleotide Sequencing , Promoter Regions, Genetic/genetics , Regulatory Elements, Transcriptional/genetics , Reverse Transcriptase Polymerase Chain Reaction , Ribosomes/genetics
10.
Nat Commun ; 15(1): 3335, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38637555

ABSTRACT

Understanding the function of rare non-coding variants represents a significant challenge. Using MapUTR, a screening method, we studied the function of rare 3' UTR variants affecting mRNA abundance post-transcriptionally. Among 17,301 rare gnomAD variants, an average of 24.5% were functional, with 70% in cancer-related genes, many in critical cancer pathways. This observation motivated an interrogation of 11,929 somatic mutations, uncovering 3928 (33%) functional mutations in 155 cancer driver genes. Functional MapUTR variants were enriched in microRNA- or protein-binding sites and may underlie outlier gene expression in tumors. Further, we introduce untranslated tumor mutational burden (uTMB), a metric reflecting the amount of somatic functional MapUTR variants of a tumor and show its potential in predicting patient survival. Through prime editing, we characterized three variants in cancer-relevant genes (MFN2, FOSL2, and IRAK1), demonstrating their cancer-driving potential. Our study elucidates the function of tens of thousands of non-coding variants, nominates non-coding cancer driver mutations, and demonstrates their potential contributions to cancer.


Subject(s)
Neoplasms , Oncogenes , Humans , 3' Untranslated Regions/genetics , RNA, Messenger/genetics , Mutation , Neoplasms/genetics
11.
Nat Commun ; 14(1): 4636, 2023 08 02.
Article in English | MEDLINE | ID: mdl-37532706

ABSTRACT

Protein-protein interactions (PPIs) are crucial for biological functions and have applications ranging from drug design to synthetic cell circuits. Coiled-coils have been used as a model to study the sequence determinants of specificity. However, building well-behaved sets of orthogonal pairs of coiled-coils remains challenging due to inaccurate predictions of orthogonality and difficulties in testing at scale. To address this, we develop the next-generation bacterial two-hybrid (NGB2H) method, which allows for the rapid exploration of interactions of programmed protein libraries in a quantitative and scalable way using next-generation sequencing readout. We design, build, and test large sets of orthogonal synthetic coiled-coils, assayed over 8,000 PPIs, and used the dataset to train a more accurate coiled-coil scoring algorithm (iCipa). After characterizing nearly 18,000 new PPIs, we identify to the best of our knowledge the largest set of orthogonal coiled-coils to date, with fifteen on-target interactions. Our approach provides a powerful tool for the design of orthogonal PPIs.


Subject(s)
Algorithms , Proteins , Proteins/genetics , Proteins/metabolism
12.
bioRxiv ; 2023 May 11.
Article in English | MEDLINE | ID: mdl-37214829

ABSTRACT

Cellular transcription enables cells to adapt to various stimuli and maintain homeostasis. Transcription factors bind to transcription response elements (TREs) in gene promoters, initiating transcription. Synthetic promoters, derived from natural TREs, can be engineered to control exogenous gene expression using endogenous transcription machinery. This technology has found extensive use in biological research for applications including reporter gene assays, biomarker development, and programming synthetic circuits in living cells. However, a reliable and precise method for selecting minimally-sized synthetic promoters with desired background, amplitude, and stimulation response profiles has been elusive. In this study, we introduce a massively parallel reporter assay library containing 6184 synthetic promoters, each less than 250 bp in length. This comprehensive library allows for rapid identification of promoters with optimal transcriptional output parameters across multiple cell lines and stimuli. We showcase this library's utility to identify promoters activated in unique cell types, and in response to metabolites, mitogens, cellular toxins, and agonism of both aminergic and non-aminergic GPCRs. We further show these promoters can be used in luciferase reporter assays, eliciting 50-100 fold dynamic ranges in response to stimuli. Our platform is effective, easily implemented, and provides a solution for selecting short-length promoters with precise performance for a multitude of applications.

13.
Cell Genom ; 3(10): 100404, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37868037

ABSTRACT

Genome-wide association studies (GWASs) have successfully identified 145 genomic regions that contribute to schizophrenia risk, but linkage disequilibrium makes it challenging to discern causal variants. We performed a massively parallel reporter assay (MPRA) on 5,173 fine-mapped schizophrenia GWAS variants in primary human neural progenitors and identified 439 variants with allelic regulatory effects (MPRA-positive variants). Transcription factor binding had modest predictive power, while fine-map posterior probability, enhancer overlap, and evolutionary conservation failed to predict MPRA-positive variants. Furthermore, 64% of MPRA-positive variants did not exhibit an expression quantitative trait loci signature, suggesting that MPRA could identify yet unexplored variants with regulatory potential. To predict the combinatorial effect of MPRA-positive variants on gene regulation, we propose an accessibility-by-contact model that combines MPRA-measured allelic activity with neuronal chromatin architecture.

14.
Science ; 377(6608): eabi8654, 2022 08 19.
Article in English | MEDLINE | ID: mdl-35981026

ABSTRACT

Predicting the function of noncoding variation is a major challenge in modern genetics. In this study, we used massively parallel reporter assays to screen 5706 variants identified from genome-wide association studies for both Alzheimer's disease (AD) and progressive supranuclear palsy (PSP), identifying 320 functional regulatory variants (frVars) across 27 loci, including the complex 17q21.31 region. We identified and validated multiple risk loci using CRISPR interference or excision, including complement 4 (C4A) and APOC1 in AD and PLEKHM1 and KANSL1 in PSP. Functional variants disrupt transcription factor binding sites converging on enhancers with cell type-specific activity in PSP and AD, implicating a neuronal SP1-driven regulatory network in PSP pathogenesis. These analyses suggest that noncoding genetic risk is driven by common genetic variants through their aggregate activity on specific transcriptional programs.


Subject(s)
Alzheimer Disease , Chromosomes, Human, Pair 17 , Gene Regulatory Networks , Genetic Variation , Untranslated Regions , Alzheimer Disease/genetics , Chromosomes, Human, Pair 17/genetics , Genes, Reporter , Genetic Loci , Genome-Wide Association Study , Humans , Risk Factors , Supranuclear Palsy, Progressive/genetics , Untranslated Regions/genetics
15.
Nat Commun ; 12(1): 325, 2021 01 12.
Article in English | MEDLINE | ID: mdl-33436562

ABSTRACT

A crucial step towards engineering biological systems is the ability to precisely tune the genetic response to environmental stimuli. In the case of Escherichia coli inducible promoters, our incomplete understanding of the relationship between sequence composition and gene expression hinders our ability to predictably control transcriptional responses. Here, we profile the expression dynamics of 8269 rationally designed, IPTG-inducible promoters that collectively explore the individual and combinatorial effects of RNA polymerase and LacI repressor binding site strengths. We then fit a statistical mechanics model to measured expression that accurately models gene expression and reveals properties of theoretically optimal inducible promoters. Furthermore, we characterize three alternative promoter architectures and show that repositioning binding sites within promoters influences the types of combinatorial effects observed between promoter elements. In total, this approach enables us to deconstruct relationships between inducible promoter elements and discover practical insights for engineering inducible promoters with desirable characteristics.


Subject(s)
Isopropyl Thiogalactoside/pharmacology , Logic , Promoter Regions, Genetic , Binding Sites , Biophysical Phenomena , DNA-Directed RNA Polymerases/metabolism , Escherichia coli/drug effects , Escherichia coli/metabolism , Fluorescence , Genes, Reporter , Mutation/genetics , Operator Regions, Genetic/genetics , Protein Binding , Reproducibility of Results , Thermodynamics , Transcription Factors/metabolism
16.
bioRxiv ; 2021 Apr 29.
Article in English | MEDLINE | ID: mdl-32511368

ABSTRACT

BACKGROUND: The urgent need for massively scaled clinical testing for SARS-CoV-2, along with global shortages of critical reagents and supplies, has necessitated development of streamlined laboratory testing protocols. Conventional nucleic acid testing for SARS-CoV-2 involves collection of a clinical specimen with a nasopharyngeal swab in transport medium, nucleic acid extraction, and quantitative reverse transcription PCR (RT-qPCR) (1). As testing has scaled across the world, the global supply chain has buckled, rendering testing reagents and materials scarce (2). To address shortages, we developed SwabExpress, an end-to-end protocol developed to employ mass produced anterior nares swabs and bypass the requirement for transport media and nucleic acid extraction. METHODS: We evaluated anterior nares swabs, transported dry and eluted in low-TE buffer as a direct-to-RT-qPCR alternative to extraction-dependent viral transport media. We validated our protocol of using heat treatment for viral inactivation and added a proteinase K digestion step to reduce amplification interference. We tested this protocol across archived and prospectively collected swab specimens to fine-tune test performance. RESULTS: After optimization, SwabExpress has a low limit of detection at 2-4 molecules/µL, 100% sensitivity, and 99.4% specificity when compared side-by-side with a traditional RT-qPCR protocol employing extraction. On real-world specimens, SwabExpress outperforms an automated extraction system while simultaneously reducing cost and hands-on time. CONCLUSION: SwabExpress is a simplified workflow that facilitates scaled testing for COVID-19 without sacrificing test performance. It may serve as a template for the simplification of PCR-based clinical laboratory tests, particularly in times of critical shortages during pandemics.

17.
medRxiv ; 2021 Mar 09.
Article in English | MEDLINE | ID: mdl-32909008

ABSTRACT

The rapid spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is due to the high rates of transmission by individuals who are asymptomatic at the time of transmission (1,2). Frequent, widespread testing of the asymptomatic population for SARS-CoV-2 is essential to suppress viral transmission. Despite increases in testing capacity, multiple challenges remain in deploying traditional reverse transcription and quantitative PCR (RT-qPCR) tests at the scale required for population screening of asymptomatic individuals. We have developed SwabSeq, a high-throughput testing platform for SARS-CoV-2 that uses next-generation sequencing as a readout. SwabSeq employs sample-specific molecular barcodes to enable thousands of samples to be combined and simultaneously analyzed for the presence or absence of SARS-CoV-2 in a single run. Importantly, SwabSeq incorporates an in vitro RNA standard that mimics the viral amplicon, but can be distinguished by sequencing. This standard allows for end-point rather than quantitative PCR, improves quantitation, reduces requirements for automation and sample-to-sample normalization, enables purification-free detection, and gives better ability to call true negatives. After setting up SwabSeq in a high-complexity CLIA laboratory, we performed more than 80,000 tests for COVID-19 in less than two months, confirming in a real world setting that SwabSeq inexpensively delivers highly sensitive and specific results at scale, with a turn-around of less than 24 hours. Our clinical laboratory uses SwabSeq to test both nasal and saliva samples without RNA extraction, while maintaining analytical sensitivity comparable to or better than traditional RT-qPCR tests. Moving forward, SwabSeq can rapidly scale up testing to mitigate devastating spread of novel pathogens.

18.
Nat Biomed Eng ; 5(7): 657-665, 2021 07.
Article in English | MEDLINE | ID: mdl-34211145

ABSTRACT

Frequent and widespread testing of members of the population who are asymptomatic for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is essential for the mitigation of the transmission of the virus. Despite the recent increases in testing capacity, tests based on quantitative polymerase chain reaction (qPCR) assays cannot be easily deployed at the scale required for population-wide screening. Here, we show that next-generation sequencing of pooled samples tagged with sample-specific molecular barcodes enables the testing of thousands of nasal or saliva samples for SARS-CoV-2 RNA in a single run without the need for RNA extraction. The assay, which we named SwabSeq, incorporates a synthetic RNA standard that facilitates end-point quantification and the calling of true negatives, and that reduces the requirements for automation, purification and sample-to-sample normalization. We used SwabSeq to perform 80,000 tests, with an analytical sensitivity and specificity comparable to or better than traditional qPCR tests, in less than two months with turnaround times of less than 24 h. SwabSeq could be rapidly adapted for the detection of other pathogens.


Subject(s)
RNA, Viral/genetics , SARS-CoV-2/pathogenicity , Saliva/virology , High-Throughput Nucleotide Sequencing , Humans , SARS-CoV-2/genetics , Sensitivity and Specificity
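
The barcode-based readout described in the abstract of record 18 can be sketched roughly as follows. This is an illustration only, not the published SwabSeq pipeline: the read representation, the function name `call_samples`, and both thresholds are hypothetical. Each read is assumed to be already assigned to a sample barcode and classified as coming from either the viral amplicon or the synthetic RNA standard.

```python
from collections import Counter


def call_samples(reads, ratio_threshold=0.1, min_total_reads=10):
    """Call each barcoded sample from pooled sequencing reads.

    reads: iterable of (barcode, amplicon) pairs, where amplicon is
           "virus" or "spike" (the synthetic RNA standard).
    ratio_threshold and min_total_reads are hypothetical values
    chosen for illustration only.
    Returns a dict mapping barcode -> "positive", "negative", or "inconclusive".
    """
    counts = {}
    for barcode, amplicon in reads:
        counts.setdefault(barcode, Counter())[amplicon] += 1

    calls = {}
    for barcode, per_sample in counts.items():
        virus = per_sample["virus"]
        spike = per_sample["spike"]
        if virus + spike < min_total_reads:
            # Too few reads to trust either a positive or a negative call.
            calls[barcode] = "inconclusive"
        elif virus / (spike + 1) > ratio_threshold:
            # Viral reads are abundant relative to the spiked-in standard.
            calls[barcode] = "positive"
        else:
            # The standard amplified, so the absence of virus is informative.
            calls[barcode] = "negative"
    return calls


if __name__ == "__main__":
    pooled = (
        [("sample_A", "spike")] * 50 + [("sample_A", "virus")] * 40
        + [("sample_B", "spike")] * 60
        + [("sample_C", "spike")] * 3
    )
    print(call_samples(pooled))
```

The point of the sketch is the role of the standard: because every well should yield standard reads, a sample with ample standard reads and no viral reads can be called a true negative, while a sample with almost no reads at all is flagged rather than reported as negative.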
19.
Elife ; 9, 2020 11 12.
Article in English | MEDLINE | ID: mdl-33179598

ABSTRACT

Sequence variation in regulatory DNA alters gene expression and shapes genetically complex traits. However, the identification of individual, causal regulatory variants is challenging. Here, we used a massively parallel reporter assay to measure the cis-regulatory consequences of 5832 natural DNA variants in the promoters of 2503 genes in the yeast Saccharomyces cerevisiae. We identified 451 causal variants, which underlie genetic loci known to affect gene expression. Several promoters harbored multiple causal variants. In five promoters, pairs of variants showed non-additive, epistatic interactions. Causal variants were enriched at conserved nucleotides, tended to have low derived allele frequency, and were depleted from promoters of essential genes, which is consistent with the action of negative selection. Causal variants were also enriched for alterations in transcription factor binding sites. Models integrating these features provided modest, but statistically significant, ability to predict causal variants. This work revealed a complex molecular basis for cis-acting regulatory variation.


Subject(s)
Gene Expression Regulation, Fungal/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Crosses, Genetic , DNA Barcoding, Taxonomic , DNA, Fungal/genetics , Gene Library , Genetic Variation , Promoter Regions, Genetic/genetics , Quantitative Trait Loci , Saccharomyces cerevisiae Proteins/genetics
20.
Cell Syst ; 11(1): 75-85.e7, 2020 07 22.
Article in English | MEDLINE | ID: mdl-32603702

ABSTRACT

In eukaryotes, transcription factors (TFs) orchestrate gene expression by binding to TF-binding sites (TFBSs) and localizing transcriptional co-regulators and RNA polymerase II to cis-regulatory elements. However, we lack a basic understanding of the relationship between TFBS composition and their quantitative transcriptional responses. Here, we measured expression driven by 17,406 synthetic cis-regulatory elements with varied compositions of a model TFBS, the cAMP response element (CRE), using massively parallel reporter assays (MPRAs). We find that CRE number, affinity, and promoter proximity largely determine expression. In addition, we observe expression modulation based on the spacing between CREs and CRE distance to the promoter, where expression follows a helical periodicity. Finally, we compare library expression between an episomal MPRA and a genomically integrated MPRA, where a single cis-regulatory element is assayed per cell at a defined locus. These assays largely recapitulate each other, although weaker, non-canonical CREs exhibit greater activity in a genomic context.


Subject(s)
Adenosine Monophosphate/metabolism , Genomics/methods , Plasmids/metabolism , Response Elements/genetics , Humans