ABSTRACT
A central goal of genetics is to understand the links between genetic variation and disease. Intuitively, one might expect disease-causing variants to cluster into key pathways that drive disease etiology. But for complex traits, association signals tend to be spread across most of the genome, including near many genes without an obvious connection to disease. We propose that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis as an "omnigenic" model.
Subject(s)
Disease/genetics , Multifactorial Inheritance , Animals , Genetic Diseases, Inborn/genetics , Genome-Wide Association Study , Genomics , Humans , Polymorphism, Single Nucleotide
ABSTRACT
Saturation mutagenesis, coupled to an appropriate biological assay, represents a fundamental means of achieving a high-resolution understanding of regulatory and protein-coding nucleic acid sequences of interest. However, mutagenized sequences introduced in trans on episomes or via random or "safe-harbour" integration fail to capture the native context of the endogenous chromosomal locus. This shortcoming markedly limits the interpretability of the resulting measurements of mutational impact. Here, we couple CRISPR/Cas9 RNA-guided cleavage with multiplex homology-directed repair using a complex library of donor templates to demonstrate saturation editing of genomic regions. In exon 18 of BRCA1, we replace a six-base-pair (bp) genomic region with all possible hexamers, or the full exon with all possible single nucleotide variants (SNVs), and measure strong effects on transcript abundance attributable to nonsense-mediated decay and exonic splicing elements. We similarly perform saturation genome editing of a well-conserved coding region of an essential gene, DBR1, and measure relative effects on growth that correlate with functional impact. Measurement of the functional consequences of large numbers of mutations with saturation genome editing will potentially facilitate high-resolution functional dissection of both cis-regulatory elements and trans-acting factors, as well as the interpretation of variants of uncertain significance observed in clinical sequencing.
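To give a sense of the library sizes implied by "all possible hexamers" and "all possible single nucleotide variants", the following minimal sketch enumerates both for a toy sequence. The function names and the 12-bp example region are assumptions made for illustration; they are not taken from the study and the region is not the BRCA1 sequence.

```python
from itertools import product

BASES = "ACGT"

def all_kmers(k):
    """Return every DNA sequence of length k (4**k sequences)."""
    return ["".join(p) for p in product(BASES, repeat=k)]

def all_snvs(seq):
    """Return every single nucleotide variant of seq.

    Each entry is (position, reference base, alternate base, variant sequence).
    A sequence of length n has 3 * n SNVs.
    """
    variants = []
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                variants.append((i, ref, alt, seq[:i] + alt + seq[i + 1:]))
    return variants

hexamers = all_kmers(6)
toy_region = "ATGGCCTTAGGA"  # hypothetical 12-bp region, for illustration only
snvs = all_snvs(toy_region)
print(len(hexamers), len(snvs))
```

A six-base-pair window therefore requires 4^6 = 4,096 donor templates, while SNV saturation grows only linearly with the length of the edited region.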
Subject(s)
Genomics/methods , Molecular Sequence Annotation/methods , Mutagenesis/genetics , Recombinational DNA Repair/genetics , CRISPR-Associated Proteins/metabolism , CRISPR-Cas Systems/genetics , Cell Line , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Conserved Sequence/genetics , Exons/genetics , Genes, BRCA1 , Genes, Essential/genetics , Humans , Nonsense Mediated mRNA Decay , Open Reading Frames/genetics , Point Mutation/genetics , RNA Nucleotidyltransferases/genetics , RNA Splicing/genetics , Regulatory Sequences, Nucleic Acid/genetics , Templates, Genetic
ABSTRACT
The bacterial adaptive immune system CRISPR-Cas9 has been appropriated as a versatile tool for editing genomes, controlling gene expression, and visualizing genetic loci. To analyze Cas9's ability to bind DNA rapidly and specifically, we generated multiple libraries of potential binding partners for measuring the kinetics of nuclease-dead Cas9 (dCas9) interactions. Using a massively parallel method to quantify protein-DNA interactions on a high-throughput sequencing flow cell, we comprehensively assess the effects of combinatorial mismatches between guide RNA (gRNA) and target nucleotides, both in the seed and in more distal nucleotides, plus disruption of the protospacer adjacent motif (PAM). We report two consequences of PAM-distal mismatches: reversal of dCas9 binding at long time scales, and synergistic changes in association kinetics when other gRNA-target mismatches are present. Together, these observations support a model for Cas9 specificity wherein gRNA-DNA mismatches at PAM-distal bases modulate different biophysical parameters that determine association and dissociation rates. The methods we present decouple aspects of kinetic and thermodynamic properties of the Cas9-DNA interaction and broaden the toolkit for investigating off-target binding behavior.
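To make the distinction between association and dissociation rates concrete, here is a minimal sketch of a generic reversible two-state binding model (free versus bound target, with protein in excess). This is a textbook simplification and not the model fitted in the study; in particular it cannot reproduce the reversal of binding at long time scales described above. All rate constants and the concentration are hypothetical values.

```python
import math

def fraction_bound(t, k_on, k_off, conc):
    """Fraction of target bound at time t in a reversible two-state model.

    k_on  : association rate constant, 1 / (M * s)
    k_off : dissociation rate constant, 1 / s
    conc  : protein concentration in M, assumed constant (excess over target)
    """
    k_obs = k_on * conc + k_off      # observed relaxation rate
    f_eq = k_on * conc / k_obs       # equilibrium occupancy
    return f_eq * (1.0 - math.exp(-k_obs * t))

# Hypothetical parameters: same on-rate, different off-rates
k_on, conc = 1e6, 10e-9
matched = fraction_bound(600, k_on, k_off=1e-4, conc=conc)
mismatched = fraction_bound(600, k_on, k_off=1e-2, conc=conc)
print(f"matched: {matched:.3f}, mismatched: {mismatched:.3f}")
```

In this simple picture a change in the off-rate alters the equilibrium occupancy, whereas a change in the on-rate mainly alters how quickly that occupancy is approached, which is why time-resolved measurements can separate the two.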
Subject(s)
Bacterial Proteins/metabolism , DNA/metabolism , Endonucleases/metabolism , RNA, Guide, Kinetoplastida/metabolism , CRISPR-Associated Protein 9 , High-Throughput Screening Assays
ABSTRACT
Powerful new technologies for perturbing genetic elements have recently expanded the study of genetic interactions in model systems ranging from yeast to human cell lines. However, technical artifacts can confound signal across genetic screens and limit the immense potential of parallel screening approaches. To address this problem, we devised a novel PCA-based method for correcting genome-wide screening data, bolstering the sensitivity and specificity of detection for genetic interactions. Applying this strategy to a set of 436 whole genome CRISPR screens, we report more than 1.5 million pairs of correlated "co-functional" genes that provide finer-scale information about cell compartments, biological pathways, and protein complexes than traditional gene sets. Lastly, we employed a gene community detection approach to implicate core genes for cancer growth and compress signal from functionally related genes in the same community into a single score. This work establishes new algorithms for probing cancer cell networks and motivates the acquisition of further CRISPR screen data across diverse genotypes and cell types to further resolve complex cellular processes.
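The general idea of a PCA-based correction followed by a search for correlated gene pairs can be sketched as follows. The abstract does not state how components are chosen or removed, so this assumes the simplest variant: subtract the leading principal components from a genes x screens score matrix and then correlate the corrected gene profiles. The function name, matrix sizes, and simulated data are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def remove_top_components(scores, n_components):
    """Subtract the leading principal components from a genes x screens matrix.

    scores       : 2-D array, rows = genes, columns = screens
    n_components : number of leading components treated as technical artifact
    """
    centered = scores - scores.mean(axis=0, keepdims=True)  # center each screen
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    top = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    return centered - top

# Simulated data: random signal plus one strong rank-1 artifact (hypothetical)
rng = np.random.default_rng(0)
n_genes, n_screens = 200, 30
artifact = 5.0 * np.outer(rng.normal(size=n_genes), rng.normal(size=n_screens))
signal = rng.normal(size=(n_genes, n_screens))
corrected = remove_top_components(signal + artifact, n_components=1)

# Gene-gene correlation across screens, computed on the corrected profiles
gene_corr = np.corrcoef(corrected)
```

Gene pairs with high values in `gene_corr` would be candidate "co-functional" pairs in this toy setting; how many components to remove is a judgment call that trades artifact removal against loss of real biological signal.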
Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Gene Regulatory Networks/genetics , Genome, Human/genetics , Neoplasms/genetics , Algorithms , Epistasis, Genetic , Genomics/methods , Genotype , Humans , Neoplasms/pathology
ABSTRACT
Freeman-Sheldon syndrome, or distal arthrogryposis type 2A (DA2A), is an autosomal-dominant condition caused by mutations in MYH3 and characterized by multiple congenital contractures of the face and limbs and normal cognitive development. We identified a subset of five individuals who had been putatively diagnosed with "DA2A with severe neurological abnormalities" and for whom congenital contractures of the limbs and face, hypotonia, and global developmental delay had resulted in early death in three cases; this is a unique condition that we now refer to as CLIFAHDD syndrome. Exome sequencing identified missense mutations in the sodium leak channel, non-selective (NALCN) in four families affected by CLIFAHDD syndrome. We used molecular-inversion probes to screen for NALCN in a cohort of 202 distal arthrogryposis (DA)-affected individuals as well as concurrent exome sequencing of six other DA-affected individuals, thus revealing NALCN mutations in ten additional families with "atypical" forms of DA. All 14 mutations were missense variants predicted to alter amino acid residues in or near the S5 and S6 pore-forming segments of NALCN, highlighting the functional importance of these segments. In vitro functional studies demonstrated that NALCN alterations nearly abolished the expression of wild-type NALCN, suggesting that alterations that cause CLIFAHDD syndrome have a dominant-negative effect. In contrast, homozygosity for mutations in other regions of NALCN has been reported in three families affected by an autosomal-recessive condition characterized mainly by hypotonia and severe intellectual disability. Accordingly, mutations in NALCN can cause either a recessive or dominant condition characterized by varied though overlapping phenotypic features, perhaps based on the type of mutation and affected protein domain(s).
Subject(s)
Contracture/genetics , Extremities/physiopathology , Face/abnormalities , Muscle Hypotonia/genetics , Sodium Channels/genetics , Arthrogryposis/genetics , Craniofacial Dysostosis/genetics , Cytoskeletal Proteins/genetics , Cytoskeletal Proteins/metabolism , Exome , Female , Gene Frequency , High-Throughput Nucleotide Sequencing , Homozygote , Humans , Infant , Ion Channels , Male , Membrane Proteins , Mutation, Missense , Sodium Channels/metabolism
ABSTRACT
BACKGROUND: Cell-free DNA (cfDNA) diagnostics are emerging as a new paradigm of disease monitoring and therapy management. The clinical utility of these diagnostics is relatively limited by a low signal-to-noise ratio, such as with low allele frequency (AF) mutations in cancer. While enriching for rare alleles to increase their AF before sample analysis is one strategy that can greatly improve detection capability, current methods are limited in their generalizability, ease of use, and applicability to point mutations. METHODS: Leveraging the robust single-base-pair specificity and generalizability of the CRISPR associated protein 9 (Cas9) system, we developed a deactivated Cas9 (dCas9)-based method of minor-allele enrichment capable of efficient single-target and multiplexed enrichment. The dCas9 protein was complexed with single guide RNAs targeted to mutations of interest and incubated with cfDNA samples containing mutant strands at low abundance. Mutation-bound dCas9 complexes were isolated, dissociated, and the captured DNA purified for downstream use. RESULTS: Targeting the 3 most common epidermal growth factor receptor mutations (exon 19 deletion, T790M, L858R) found in non-small cell lung cancer (NSCLC), we achieved >20-fold increases in AF and detected mutations by use of qPCR at an AF of 0.1%. In a cohort of 18 NSCLC patient-derived cfDNA samples, our method enabled detection of 8 out of 13 mutations that were otherwise undetected by qPCR. CONCLUSIONS: The dCas9 method provides an important application of the CRISPR/Cas9 system outside the realm of genome editing and can provide a step forward for the detection capability of cfDNA diagnostics.
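The arithmetic behind minor-allele enrichment can be illustrated with a short calculation. This sketch assumes that a capture step recovers mutant and wild-type strands with different efficiencies; the copy numbers and recovery fractions below are hypothetical and are not measurements from the study.

```python
def allele_frequency(mutant, wildtype):
    """Allele frequency = mutant copies / total copies."""
    return mutant / (mutant + wildtype)

def enriched_af(mutant, wildtype, mutant_recovery, wildtype_recovery):
    """AF after a capture step that recovers each allele with a given efficiency."""
    return allele_frequency(mutant * mutant_recovery, wildtype * wildtype_recovery)

# Hypothetical input: 10 mutant copies among 10,000 total (AF = 0.1%)
before = allele_frequency(10, 9990)
# Hypothetical recoveries: 50% of mutant strands, 2% of wild-type strands
after = enriched_af(10, 9990, mutant_recovery=0.5, wildtype_recovery=0.02)
fold = after / before
print(f"AF before: {before:.3%}, after: {after:.3%}, fold change: {fold:.1f}x")
```

The fold change in AF depends on the ratio of the two recovery efficiencies rather than on either one alone, so even a modest mutant recovery can raise AF substantially if wild-type carry-over is low.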
Subject(s)
CRISPR-Associated Protein 9/genetics , Cell-Free Nucleic Acids/genetics , Gene Frequency , Humans , Limit of Detection , Point Mutation , Real-Time Polymerase Chain Reaction/methods , Sequence Deletion
ABSTRACT
Focal malformations of cortical development, including focal cortical dysplasia (FCD) and hemimegalencephaly (HME), are important causes of intractable childhood epilepsy. Using targeted and exome sequencing on DNA from resected brain samples and nonbrain samples from 53 patients with FCD or HME, we identified pathogenic germline and mosaic mutations in multiple PI3K/AKT pathway genes in 9 patients, and a likely pathogenic variant in 1 additional patient. Our data confirm the association of DEPDC5 with sporadic FCD but also implicate this gene for the first time in HME. Our findings suggest that modulation of the mammalian target of rapamycin pathway may hold promise for malformation-associated epilepsy.
Subject(s)
Hemimegalencephaly/genetics , Malformations of Cortical Development/genetics , Mutation/genetics , Repressor Proteins/genetics , Signal Transduction/genetics , TOR Serine-Threonine Kinases/genetics , Cohort Studies , GTPase-Activating Proteins , Hemimegalencephaly/diagnosis , Humans , Malformations of Cortical Development/diagnosis , Phosphatidylinositol 3-Kinases/genetics , Proto-Oncogene Proteins c-akt/genetics
ABSTRACT
Joubert syndrome (JS) is a recessive neurodevelopmental disorder characterized by a distinctive mid-hindbrain malformation. JS is part of a group of disorders called ciliopathies based on their overlapping phenotypes and common underlying pathophysiology linked to primary cilium dysfunction. Biallelic mutations in one of 28 genes, all encoding proteins localizing to the primary cilium or basal body, can cause JS. Despite this large number of genes, the genetic cause can currently be determined in about 62% of individuals with JS. To identify novel JS genes, we performed whole exome sequencing on 35 individuals with JS and found biallelic rare deleterious variants (RDVs) in KIAA0586, encoding a centrosomal protein required for ciliogenesis, in one individual. Targeted next-generation sequencing in a large JS cohort identified biallelic RDVs in eight additional families for an estimated prevalence of 2.5% (9/366 JS families). All affected individuals displayed JS phenotypes toward the mild end of the spectrum.
Subject(s)
Cell Cycle Proteins/genetics , Cerebellum/abnormalities , Mutation , Retina/abnormalities , Abnormalities, Multiple/diagnosis , Abnormalities, Multiple/genetics , Adolescent , Adult , Alternative Splicing , Brain/pathology , Child , Child, Preschool , DNA Mutational Analysis , Eye Abnormalities/diagnosis , Eye Abnormalities/genetics , Gene Order , Genetic Association Studies , Humans , Kidney Diseases, Cystic/diagnosis , Kidney Diseases, Cystic/genetics , Magnetic Resonance Imaging , Phenotype , Young Adult
ABSTRACT
Molecular inversion probes (MIPs) enable cost-effective multiplex targeted gene resequencing in large cohorts. However, the design of individual MIPs is a critical parameter governing the performance of this technology with respect to capture uniformity and specificity. MIPgen is a user-friendly package that simplifies the process of designing custom MIP assays to arbitrary targets. New logistic and SVM-derived models enable in silico predictions of assay success, and assay redesign exhibits improved coverage uniformity relative to previous methods, which in turn improves the utility of MIPs for cost-effective targeted sequencing for candidate gene validation and for diagnostic sequencing in a clinical setting. AVAILABILITY AND IMPLEMENTATION: MIPgen is implemented in C++. Source code and accompanying Python scripts are available at http://shendurelab.github.io/MIPGEN/.
Subject(s)
Algorithms , Computational Biology/methods , DNA Probes/genetics , Models, Statistical , Sequence Analysis/methods , Computer Simulation , Humans
ABSTRACT
The uncovering of protein-RNA interactions enables a deeper understanding of RNA processing. Recent multiplexed crosslinking and immunoprecipitation (CLIP) technologies such as antibody-barcoded eCLIP (ABC) dramatically increase the throughput of mapping RNA binding protein (RBP) binding sites. However, multiplex CLIP datasets are multivariate, and each RBP suffers non-uniform signal-to-noise ratio. To address this, we developed Mudskipper, a versatile computational suite comprising two components: a Dirichlet multinomial mixture model to account for the multivariate nature of ABC datasets and a softmasking approach that identifies and removes non-specific protein-RNA interactions in RBPs with low signal-to-noise ratio. Mudskipper demonstrates superior precision and recall over existing tools on multiplex datasets and supports analysis of repetitive elements and small non-coding RNAs. Our findings unravel splicing outcomes and variant-associated disruptions, enabling higher-throughput investigations into diseases and regulation mediated by RBPs.
Subject(s)
RNA-Binding Proteins , RNA-Binding Proteins/metabolism , RNA-Binding Proteins/genetics , Humans , Immunoprecipitation/methods , Binding Sites , Software , Computational Biology/methods , RNA/metabolism , RNA/genetics , Protein Binding
ABSTRACT
Here, we present a protocol for using Skipper, a pipeline designed to process crosslinking and immunoprecipitation (CLIP) data into annotated binding sites. We describe steps for partitioning annotated transcript regions and fitting data to a beta-binomial model to call windows of enriched binding. From raw CLIP data, we detail how users can map reproducible RNA-binding sites to call enriched windows and perform downstream analysis. This protocol supports optional customizations for different use cases. For complete details on the use and execution of this protocol, please refer to Boyle et al. [1].
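As a rough illustration of how a beta-binomial model can be used to call an enriched window, the sketch below computes an upper-tail probability for the number of immunoprecipitation (IP) reads in a window, given the total of IP and input reads. This is not Skipper's implementation: the function names, the shape parameters `a` and `b` (which would in practice be estimated from the data), and the read counts are all hypothetical.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(k, n, a, b):
    """P(X = k) for X ~ BetaBinomial(n, a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

def enrichment_pvalue(ip_reads, input_reads, a, b):
    """Probability of at least ip_reads IP reads out of ip_reads + input_reads
    total reads in a window, under a null beta-binomial with shapes a and b."""
    n = ip_reads + input_reads
    return sum(betabinom_pmf(k, n, a, b) for k in range(ip_reads, n + 1))

# Hypothetical window: 40 IP reads vs 10 input reads.
# Hypothetical null: IP fraction centred on 0.5 with overdispersion (a = b = 20).
p = enrichment_pvalue(40, 10, a=20.0, b=20.0)
print(f"upper-tail p-value: {p:.2e}")
```

Compared with a plain binomial test, the beta-binomial allows the expected IP fraction to vary from window to window, which makes the test more conservative when read counts are overdispersed.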
Subject(s)
Immunoprecipitation , Binding Sites , Immunoprecipitation/methods , Humans , Software , Cross-Linking Reagents/chemistry , RNA/metabolism , RNA/genetics
ABSTRACT
RNA-binding proteins (RBPs) modulate alternative splicing outcomes to determine isoform expression and cellular survival. To identify RBPs that directly drive alternative exon inclusion, we developed tethered function luciferase-based splicing reporters that provide rapid, scalable and robust readouts of exon inclusion changes and used these to evaluate 718 human RBPs. We performed enhanced cross-linking immunoprecipitation, RNA sequencing and affinity purification-mass spectrometry to investigate a subset of candidates with no prior association with splicing. Integrative analysis of these assays indicates surprising roles for TRNAU1AP, SCAF8 and RTCA in the modulation of hundreds of endogenous splicing events. We also leveraged our tethering assays and top candidates to identify potent and compact exon inclusion activation domains for splicing modulation applications. Using these identified domains, we engineered programmable fusion proteins that outperform current artificial splicing factors at manipulating inclusion of reporter and endogenous exons. This tethering approach characterizes the ability of RBPs to induce exon inclusion and yields new molecular parts for programmable splicing control.
Subject(s)
Alternative Splicing , Exons , RNA-Binding Proteins , Humans , Exons/genetics , RNA-Binding Proteins/metabolism , RNA-Binding Proteins/genetics , Alternative Splicing/genetics , HEK293 Cells
ABSTRACT
RNA binding proteins (RBPs) are key regulators of RNA processing and cellular function. Technologies to discover RNA targets of RBPs such as TRIBE (targets of RNA binding proteins identified by editing) and STAMP (surveying targets by APOBEC1 mediated profiling) utilize fusions of RNA base-editors (rBEs) to RBPs to circumvent the limitations of immunoprecipitation (CLIP)-based methods that require enzymatic digestion and large amounts of input material. To broaden the repertoire of rBEs suitable for editing-based RBP-RNA interaction studies, we have devised experimental and computational assays in a framework called PRINTER (protein-RNA interaction-based triaging of enzymes that edit RNA) to assess over thirty A-to-I and C-to-U rBEs, allowing us to identify rBEs that expand the characterization of binding patterns for both sequence-specific and broad-binding RBPs. We also propose specific rBEs suitable for dual-RBP applications. We show that the choice between single or multiple rBEs to fuse with a given RBP or pair of RBPs hinges on the editing biases of the rBEs and the binding preferences of the RBPs themselves. We believe our study streamlines and enhances the selection of rBEs for the next generation of RBP-RNA target discovery.
Subject(s)
RNA-Binding Proteins , RNA , RNA/metabolism , Binding Sites/genetics , RNA-Binding Proteins/metabolism , RNA Processing, Post-Transcriptional
ABSTRACT
Technology for crosslinking and immunoprecipitation (CLIP) followed by sequencing (CLIP-seq) has identified the transcriptomic targets of hundreds of RNA-binding proteins in cells. To increase the power of existing and future CLIP-seq datasets, we introduce Skipper, an end-to-end workflow that converts unprocessed reads into annotated binding sites using an improved statistical framework. Compared with existing methods, Skipper on average calls 210%-320% more transcriptomic binding sites and sometimes >1,000% more sites, providing deeper insight into post-transcriptional gene regulation. Skipper also calls binding to annotated repetitive elements and identifies bound elements for 99% of enhanced CLIP experiments. We perform nine translation factor enhanced CLIPs and apply Skipper to learn determinants of translation factor occupancy, including transcript region, sequence, and subcellular localization. Furthermore, we observe depletion of genetic variation in occupied sites and nominate transcripts subject to selective constraint because of translation factor occupancy. Skipper offers fast, easy, customizable, and state-of-the-art analysis of CLIP-seq data.
ABSTRACT
Disparities for women and minorities in science, technology, engineering, and math (STEM) careers have continued even amidst mounting evidence for the superior performance of diverse workforces. In response, we launched the Diversity and Science Lecture series, a cross-institutional platform where junior life scientists present their research and comment on diversity, equity, and inclusion in STEM. We characterize speaker representation from 79 profiles and investigate topic noteworthiness via quantitative content analysis of talk transcripts. Nearly every speaker discussed interpersonal support, and three-fifths of speakers commented on race or ethnicity. Other topics, such as sexual and gender minority identity, were less frequently addressed but highly salient to the speakers who mentioned them. We found that significantly co-occurring topics reflected not only conceptual similarity, such as terms for racial identities, but also intersectional significance, such as identifying as a Latina/Hispanic woman or Asian immigrant, and interactions between concerns and identities, including the heightened value of friendship to the LGBTQ community, which we reproduce using transcripts from an independent seminar series. Our approach to scholar profiles and talk transcripts serves as an example for transmuting hundreds of hours of scholarly discourse into rich datasets that can power computational audits of speaker diversity and illuminate speakers' personal and professional priorities.
Subject(s)
Diversity, Equity, Inclusion , Ethnicity , Female , Humans , Minority Groups , Technology
ABSTRACT
The RNA-guided nuclease Cas9 has unlocked powerful methods for perturbing both the genome through targeted DNA cleavage and the regulome through targeted DNA binding, but limited biochemical data have hampered efforts to quantitatively model sequence perturbation of target binding and cleavage across diverse guide sequences. We present scalable, sequencing-based platforms for high-throughput filter binding and cleavage and then perform 62,444 quantitative binding and cleavage assays on 35,047 on- and off-target DNA sequences across 90 Cas9 ribonucleoproteins (RNPs) loaded with distinct guide RNAs. We observe that binding and cleavage efficacy, as well as specificity, vary substantially across RNPs; canonically studied guides often have atypically high specificity; sequence context surrounding the target modulates Cas9 on-rate; and Cas9 RNPs may sequester targets in nonproductive states that contribute to "proofreading" capability. Lastly, we distill our findings into an interpretable biophysical model that predicts changes in binding and cleavage for diverse target sequence perturbations.
ABSTRACT
CRISPR-Cas9 screens are powerful tools for high-throughput interrogation of genome function, but can be confounded by nuclease-induced toxicity at both on- and off-target sites, likely due to DNA damage. Here, to test potential solutions to this issue, we design and analyse a CRISPR-Cas9 library with 10 variable-length guides per gene and thousands of negative controls targeting non-functional, non-genic regions (termed safe-targeting guides), in addition to non-targeting controls. We find this library has excellent performance in identifying genes affecting growth and sensitivity to the ricin toxin. The safe-targeting guides allow for proper control of toxicity from on-target DNA damage. Using this toxicity as a proxy to measure off-target cutting, we demonstrate with tens of thousands of guides both the nucleotide position-dependent sensitivity to single mismatches and the reduction of off-target cutting using truncated guides. Our results demonstrate a simple strategy for high-throughput evaluation of target specificity and nuclease toxicity in Cas9 screens.
Subject(s)
CRISPR-Cas Systems/genetics , Gene Targeting/methods , Genomic Library , High-Throughput Screening Assays/methods , RNA, Guide, Kinetoplastida/genetics , Cell Line , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , DNA Damage/genetics , Humans , Polysaccharides/biosynthesis , RNA Interference , Ricin/toxicity
ABSTRACT
The accumulation of amyloid beta (Aβ) peptide, an APP protein cleavage product, is central to the amyloid cascade hypothesis, a leading hypothesis for the etiology of Alzheimer's disease (AD). To identify additional AD risk genes, we performed targeted sequencing and a rare variant burden association study of nine candidate genes involved in amyloid metabolism in 1886 AD cases and 1700 controls. We identified a significant variant burden association for the gene encoding caspase-8, CASP8 (p = 8.6 × 10^-5). For two CASP8 variants, p.K148R and p.I298V, the association remained significant in a combined sample of 10,820 cases and 8,881 controls. For both variants we performed bioinformatics structural, expression and enzymatic activity studies and obtained evidence for loss-of-function effects. In addition to their role in amyloid processing, caspase-8 and its downstream effector caspase-3 are involved in synaptic plasticity, learning, memory and control of microglia pro-inflammatory activation and associated neurotoxicity, indicating additional mechanisms that might contribute to AD. As caspase inhibition has been proposed as a mechanism for AD treatment, our finding that AD-associated CASP8 variants reduce caspase function calls for caution and is an impetus for further studies on the role of caspases in AD and other neurodegenerative diseases.
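One simple form of rare variant burden analysis collapses all qualifying variants in a gene into a single carrier status per individual and compares carrier counts between cases and controls. The sketch below does this with a one-sided Fisher's exact test. The abstract does not say which burden test was used, so this is an assumed stand-in; the sample sizes come from the abstract, but the carrier counts are hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a = case carriers,    b = case non-carriers,
    c = control carriers, d = control non-carriers.
    Returns P(case carriers >= a) with all table margins held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1))
    return tail / total

n_cases, n_controls = 1886, 1700          # sample sizes from the abstract
case_carriers, control_carriers = 30, 8   # hypothetical carrier counts
p = fisher_exact_greater(case_carriers, n_cases - case_carriers,
                         control_carriers, n_controls - control_carriers)
print(f"one-sided burden p-value: {p:.2e}")
```

Collapsing variants in this way gains power when most rare variants in the gene act in the same direction, and loses power when risk and protective variants are mixed.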