ABSTRACT
Although base editors are widely used to install targeted point mutations, the factors that determine base editing outcomes are not well understood. We characterized sequence-activity relationships of 11 cytosine and adenine base editors (CBEs and ABEs) on 38,538 genomically integrated targets in mammalian cells and used the resulting outcomes to train BE-Hive, a machine learning model that accurately predicts base editing genotypic outcomes (R ≈ 0.9) and efficiency (R ≈ 0.7). We corrected 3,388 disease-associated SNVs with ≥90% precision, including 675 alleles with bystander nucleotides that BE-Hive correctly predicted would not be edited. We discovered determinants of previously unpredictable C-to-G or C-to-A editing and used these discoveries to correct coding sequences of 174 pathogenic transversion SNVs with ≥90% precision. Finally, we used insights from BE-Hive to engineer novel CBE variants that modulate editing outcomes. These discoveries illuminate base editing, enable editing at previously intractable targets, and provide new base editors with improved editing capabilities.
Subjects
Gene Editing/methods , Machine Learning , Animals , Gene Library , Humans , Mice , Mouse Embryonic Stem Cells/cytology , Mouse Embryonic Stem Cells/metabolism , Point Mutation , RNA, Guide, Kinetoplastida/metabolism
ABSTRACT
In this Article, a data processing error affected Fig. 3e and Extended Data Table 2; these errors have been corrected online.
ABSTRACT
Following Cas9 cleavage, DNA repair without a donor template is generally considered stochastic, heterogeneous and impractical beyond gene disruption. Here, we show that template-free Cas9 editing is predictable and capable of precise repair to a predicted genotype, enabling correction of disease-associated mutations in humans. We constructed a library of 2,000 Cas9 guide RNAs paired with DNA target sites and trained inDelphi, a machine learning model that predicts genotypes and frequencies of 1- to 60-base-pair deletions and 1-base-pair insertions with high accuracy (r = 0.87) in five human and mouse cell lines. inDelphi predicts that 5-11% of Cas9 guide RNAs targeting the human genome are 'precise-50', yielding a single genotype comprising greater than or equal to 50% of all major editing products. We experimentally confirmed precise-50 insertions and deletions in 195 human disease-relevant alleles, including correction in primary patient-derived fibroblasts of pathogenic alleles to wild-type genotype for Hermansky-Pudlak syndrome and Menkes disease. This study establishes an approach for precise, template-free genome editing.
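The "precise-50" criterion is a simple threshold on an outcome distribution: a gRNA qualifies when its most frequent genotype accounts for at least 50% of all major editing products. A minimal Python sketch, assuming predicted genotype frequencies are already available as a dictionary; the function name `is_precise_50` and the example distributions are hypothetical and are not inDelphi's API or data.

```python
from typing import Dict

def is_precise_50(genotype_freqs: Dict[str, float], threshold: float = 0.5) -> bool:
    """Return True if one genotype makes up >= threshold of all major editing products.

    genotype_freqs maps a repair genotype label to its (predicted or observed)
    frequency; values are renormalized, so they need not sum to 1.
    """
    total = sum(genotype_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    top = max(genotype_freqs.values())
    return top / total >= threshold

# Hypothetical outcome distributions for two gRNAs (illustrative only).
guide_a = {"1bp_ins_A": 0.62, "del_4bp": 0.20, "del_7bp": 0.18}
guide_b = {"1bp_ins_T": 0.30, "del_2bp": 0.25, "del_9bp": 0.25, "del_12bp": 0.20}

print(is_precise_50(guide_a))  # True: one genotype holds 62% of products
print(is_precise_50(guide_b))  # False: the top genotype holds only 30%
```

Renormalizing inside the function keeps the rule usable whether the input is a probability vector or raw read counts.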
Subjects
CRISPR-Cas Systems/genetics , Gene Editing/methods , Gene Editing/standards , Hermanski-Pudlak Syndrome/genetics , Machine Learning , Menkes Kinky Hair Syndrome/genetics , Templates, Genetic , Alleles , Base Sequence , CRISPR-Associated Protein 9/metabolism , DNA Repair/genetics , Fibroblasts/metabolism , Fibroblasts/pathology , HCT116 Cells , HEK293 Cells , Hermanski-Pudlak Syndrome/pathology , Humans , K562 Cells , Menkes Kinky Hair Syndrome/pathology , Reproducibility of Results , Substrate Specificity
ABSTRACT
Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods, as short read lengths can lose mutation linkages in haplotypes. Here we present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE) and phage-assisted non-continuous evolution (PANCE) of adenine base editors and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise, such as pooled Sanger sequencing data (~US$10 per timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency 'rising stars', well before they are identifiable from consensus mutations.
Subjects
Adenosine Deaminase/genetics , Escherichia coli Proteins/genetics , High-Throughput Nucleotide Sequencing , Genetic Variation/genetics , Machine Learning
ABSTRACT
Restoring gene function by the induced skipping of deleterious exons has been shown to be effective for treating genetic disorders. However, many of the clinically successful exon-skipping therapies are transient oligonucleotide-based treatments that require frequent dosing. CRISPR-Cas9-based genome editing that causes exon skipping is a promising therapeutic modality that may offer permanent alleviation of genetic disease. We show that machine learning can select Cas9 guide RNAs that disrupt splice acceptors and cause the skipping of targeted exons. We experimentally measured the exon skipping frequencies of a diverse genome-integrated library of 791 splice sequences targeted by 1,063 guide RNAs in mouse embryonic stem cells. We found that our method, SkipGuide, identifies effective guide RNAs with a precision of 0.68 at a predicted exon skipping frequency threshold of 50% and 0.93 at a threshold of 70%. We anticipate that SkipGuide will be useful for selecting guide RNA candidates for evaluation of CRISPR-Cas9-mediated exon skipping therapy.
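Precision at a prediction threshold is the fraction of selected guides (those predicted at or above the threshold) that turn out to be effective. The abstract does not spell out how a hit is scored, so this Python sketch assumes a selected guide counts as correct when its observed skipping frequency also reaches the threshold; the function name and the data are hypothetical and not SkipGuide's code or results.

```python
from typing import Sequence

def precision_at_threshold(predicted: Sequence[float], observed: Sequence[float],
                           threshold: float) -> float:
    """Among guides whose predicted exon skipping frequency is >= threshold,
    return the fraction whose observed frequency is also >= threshold
    (an assumed definition of a true positive)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    selected = [obs for pred, obs in zip(predicted, observed) if pred >= threshold]
    if not selected:
        return float("nan")  # precision is undefined when no guide is selected
    hits = sum(1 for obs in selected if obs >= threshold)
    return hits / len(selected)

# Hypothetical predicted vs. measured skipping frequencies for five guides.
predicted = [0.80, 0.55, 0.72, 0.30, 0.65]
observed = [0.85, 0.40, 0.75, 0.60, 0.52]

print(precision_at_threshold(predicted, observed, 0.5))  # 0.75 (3 of 4 selected)
print(precision_at_threshold(predicted, observed, 0.7))  # 1.0 (2 of 2 selected)
```

Raising the threshold selects fewer guides, which is why precision can climb (as with the 50% vs. 70% figures above) at the cost of recommending fewer candidates.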
Subjects
CRISPR-Cas Systems/genetics , Gene Editing/methods , Genetic Therapy/methods , Machine Learning , RNA, Guide, Kinetoplastida/genetics , Animals , Cells, Cultured , Embryonic Stem Cells , Exons , Gene Library , Humans , Mice
ABSTRACT
We introduce poly-adenine CRISPR gRNA-based single-cell RNA-sequencing (pAC-Seq), a method that enables the direct observation of guide RNAs (gRNAs) in scRNA-seq. We use pAC-Seq to assess the phenotypic consequences of CRISPR/Cas9-based alterations of gene cis-regulatory regions. We show that pAC-Seq is able to detect cis-regulatory-induced alteration of target gene expression even when biallelic loss of target gene expression occurs in only ~5% of cells. This low rate of biallelic loss significantly increases the number of cells required to detect the consequences of changes to the regulatory genome, but this can be ameliorated by transcript-targeted sequencing. Based on our experimental results, we model the power to detect regulatory-genome-induced transcriptomic effects as a function of the rate of mono- and biallelic loss, baseline gene expression, and the number of cells per target gRNA.
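Why a ~5% biallelic-loss rate inflates the required cell count can be seen with a deliberately simplified model: if each cell carrying a gRNA independently loses both alleles with probability p, the number of biallelic cells among n is Binomial(n, p). The Python sketch below (hypothetical helper names) finds the smallest n that yields at least k biallelic cells with a chosen probability. It ignores baseline expression and mono-allelic effects and is not the authors' power model.

```python
from math import comb

def prob_at_least_k(n_cells: int, p_biallelic: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n_cells, p_biallelic): the chance that at
    least k of the cells carrying a gRNA have lost both alleles."""
    below_k = sum(comb(n_cells, i) * p_biallelic**i * (1 - p_biallelic)**(n_cells - i)
                  for i in range(k))
    return 1.0 - below_k

def min_cells(p_biallelic: float, k: int, target: float = 0.8,
              n_max: int = 100_000) -> int:
    """Smallest number of cells per gRNA for which P(X >= k) >= target."""
    for n in range(k, n_max + 1):
        if prob_at_least_k(n, p_biallelic, k) >= target:
            return n
    raise ValueError("target not reached within n_max cells")

# Compare a low (~5%) biallelic-loss rate with a hypothetical 50% rate,
# asking for at least 20 biallelic cells with 80% probability.
for p in (0.05, 0.5):
    print(p, min_cells(p, k=20))
```

Because P(X >= k) grows monotonically with n, a linear scan from n = k is enough; the low-rate case needs roughly an order of magnitude more cells.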
Subjects
CRISPR-Cas Systems/genetics , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Transcriptome/genetics , Algorithms , Animals , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Computational Biology , Databases, Factual , Humans , Mice , RNA, Guide, Kinetoplastida/genetics
ABSTRACT
The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
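For readers less familiar with the de Bruijn graph approach, here is a minimal Python sketch of the classic exact-k-mer construction used for short, accurate reads: every k-mer adds an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. This is the textbook formulation, not the ABruijn algorithm, which generalizes the idea to long error-prone reads where exact k-mers are frequently corrupted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Build a de Bruijn multigraph: each k-mer in each read contributes an
    edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

# Two overlapping toy reads; the shared k-mer GTC yields a repeated edge.
g = de_bruijn_graph(["ACGTC", "GTCAG"], k=3)
print(g)
# {'AC': ['CG'], 'CG': ['GT'], 'GT': ['TC', 'TC'], 'TC': ['CA'], 'CA': ['AG']}
```

Walking the edges AC → CG → GT → TC → CA → AG spells ACGTCAG, the sequence both reads came from. A single sequencing error in a read would instead create spurious k-mers and branches, which is why exact-k-mer graphs suit accurate reads.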
Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Benchmarking , Escherichia coli/genetics , Genomics , Reproducibility of Results , Software , Xanthomonas/genetics
ABSTRACT
Spinal muscular atrophy (SMA), the leading genetic cause of infant mortality, arises from survival motor neuron (SMN) protein insufficiency resulting from SMN1 loss. Approved therapies circumvent endogenous SMN regulation and require repeated dosing or may wane. We describe genome editing of SMN2, an insufficient copy of SMN1 harboring a C6>T mutation, to permanently restore SMN protein levels and rescue SMA phenotypes. We used nucleases or base editors to modify five SMN2 regulatory regions. Base editing converted SMN2 T6>C, restoring SMN protein levels to wild type. Adeno-associated virus serotype 9-mediated base editor delivery in Δ7SMA mice yielded 87% average T6>C conversion, improved motor function, and extended average life span, which was enhanced by one-time base editor and nusinersen coadministration (111 versus 17 days untreated). These findings demonstrate the potential of a one-time base editing treatment for SMA.
Subjects
Gene Editing , Muscular Atrophy, Spinal , Survival of Motor Neuron 1 Protein , Survival of Motor Neuron 2 Protein , Animals , Mice , Fibroblasts/metabolism , Motor Neurons/metabolism , Muscular Atrophy, Spinal/genetics , Muscular Atrophy, Spinal/therapy , Survival of Motor Neuron 1 Protein/genetics , Survival of Motor Neuron 2 Protein/genetics
ABSTRACT
In vitro selection queries large combinatorial libraries for sequence-defined polymers with target binding and reaction catalysis activity. While the total sequence space of these libraries can extend beyond 10²² sequences, practical considerations limit starting sequences to ≤~10¹⁵ distinct molecules. Selection-induced sequence convergence and limited sequencing depth further constrain experimentally observable sequence space. To address these limitations, we integrate experimental and machine learning approaches to explore regions of sequence space unrelated to experimentally derived variants. We perform in vitro selections to discover highly side-chain-functionalized nucleic acid polymers (HFNAPs) with potent affinities for a target small molecule (daunomycin KD = 5-65 nM). We then use the selection data to train a conditional variational autoencoder (CVAE) machine learning model to generate diverse and unique HFNAP sequences with high daunomycin affinities (KD = 9-26 nM), even though they are unrelated in sequence to experimental polymers. Coupling in vitro selection with a machine learning model thus enables direct generation of active variants, demonstrating a new approach to the discovery of functional biopolymers.
Subjects
Nucleic Acids , Biopolymers , Daunorubicin , Machine Learning , Polymers/chemistry
ABSTRACT
Prime editing enables search-and-replace genome editing but is limited by low editing efficiency. We present a high-throughput approach, the Peptide Self-Editing sequencing assay (PepSEq), to measure how fusion of 12,000 85-amino-acid peptides influences prime editing efficiency. We show that peptide fusion can enhance prime editing, prime-enhancing peptides combine productively, and a top dual peptide-prime editor increases prime editing significantly in multiple cell lines across dozens of target sites. Top prime-enhancing peptides function by increasing translation efficiency and serve as broadly useful tools to improve prime editing efficiency.
Subjects
CRISPR-Cas Systems , Gene Editing , Cell Line , Gene Fusion , Peptides/genetics
ABSTRACT
Mutational outcomes following CRISPR-Cas9-nuclease cutting in mammalian cells have recently been shown to be predictable and, in certain cases, skewed toward single genotypes. However, the ability to control these outcomes remains limited, especially for 1-bp insertions, a common and therapeutically relevant class of repair outcomes. Here, through a small molecule screen, we identify the ATM kinase inhibitor KU-60019 as a compound capable of reproducibly increasing the fraction of 1-bp insertions relative to other Cas9 repair outcomes. Small molecule or genetic ATM inhibition increases the 1-bp insertion outcome fraction across three human and mouse cell lines, two Cas9 species, and dozens of target sites, while concomitantly reducing the fraction of edited alleles. Notably, KU-60019 increases the relative frequency of 1-bp insertions to over 80% of edited alleles at several native human genomic loci and improves the efficiency of correction for pathogenic 1-bp deletion variants. The ability to increase 1-bp insertion frequency adds another dimension to precise template-free Cas9-nuclease genome editing.
Subjects
Ataxia Telangiectasia Mutated Proteins/antagonists & inhibitors , Ataxia Telangiectasia Mutated Proteins/metabolism , CRISPR-Cas Systems/drug effects , Morpholines/pharmacology , Mutagenesis, Insertional/drug effects , Protein Kinase Inhibitors/pharmacology , Thioxanthenes/pharmacology , Animals , Ataxia Telangiectasia Mutated Proteins/genetics , Cell Line , Gene Editing , Humans , Sequence Deletion/drug effects
ABSTRACT
Prime editing (PE) is a versatile genome editing technology, but design of the required guide RNAs is more complex than for standard CRISPR-based nucleases or base editors. Here we describe PrimeDesign, a user-friendly, end-to-end web application and command-line tool for the design of PE experiments. PrimeDesign can be used for single and combination editing applications, as well as genome-wide and saturation mutagenesis screens. Using PrimeDesign, we construct PrimeVar, a comprehensive and searchable database that includes candidate prime editing guide RNA (pegRNA) and nicking sgRNA (ngRNA) combinations for installing or correcting >68,500 pathogenic human genetic variants from the ClinVar database. Finally, we use PrimeDesign to design pegRNAs/ngRNAs to install a variety of human pathogenic variants in human cells.
Subjects
CRISPR-Cas Systems , Gene Editing/methods , Genome, Human , RNA, Guide, Kinetoplastida/genetics , Base Pairing , Base Sequence , CRISPR-Associated Protein 9/genetics , CRISPR-Associated Protein 9/metabolism , Clustered Regularly Interspaced Short Palindromic Repeats , Databases, Genetic , Fabry Disease/genetics , Fabry Disease/metabolism , Fabry Disease/pathology , Green Fluorescent Proteins/genetics , Green Fluorescent Proteins/metabolism , HEK293 Cells , Hemophilia A/genetics , Hemophilia A/metabolism , Hemophilia A/pathology , Humans , Models, Biological , Muscular Dystrophy, Duchenne/genetics , Muscular Dystrophy, Duchenne/metabolism , Muscular Dystrophy, Duchenne/pathology , Mutation , Nucleic Acid Conformation , Plasmids/chemistry , Plasmids/metabolism , RNA, Guide, Kinetoplastida/metabolism , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism
ABSTRACT
Programmable C•G-to-G•C base editors (CGBEs) have broad scientific and therapeutic potential, but their editing outcomes have proved difficult to predict and their editing efficiency and product purity are often low. We describe a suite of engineered CGBEs paired with machine learning models to enable efficient, high-purity C•G-to-G•C base editing. We performed a CRISPR interference (CRISPRi) screen targeting DNA repair genes to identify factors that affect C•G-to-G•C editing outcomes and used these insights to develop CGBEs with diverse editing profiles. We characterized ten promising CGBEs on a library of 10,638 genomically integrated target sites in mammalian cells and trained machine learning models that accurately predict the purity and yield of editing outcomes (R = 0.90) using these data. These CGBEs enable correction to the wild-type coding sequence of 546 disease-related transversion single-nucleotide variants (SNVs) with >90% precision (mean 96%) and up to 70% efficiency (mean 14%). Computational prediction of optimal CGBE-single-guide RNA pairs enables high-purity transversion base editing at over fourfold more target sites than achieved using any single CGBE variant.
Subjects
Clustered Regularly Interspaced Short Palindromic Repeats , Gene Editing , Animals , CRISPR-Cas Systems/genetics , Machine Learning , Mammals/genetics , RNA, Guide, Kinetoplastida/genetics
ABSTRACT
Gene expression is controlled by the collective binding of transcription factors (TFs) to cis-regulatory regions. Deciphering gene-centered regulatory networks is vital to understanding and controlling gene misexpression in human disease; however, systematic approaches to uncovering regulatory networks have been lacking. Here we present high-throughput interrogation of gene-centered activation networks (HIGAN), a pipeline that employs a suite of multifaceted genomic approaches to connect upstream signaling inputs, trans-acting TFs, and cis-regulatory elements. We apply HIGAN to understand the aberrant activation of the cytidine deaminase APOBEC3B, an intrinsic source of cancer hypermutation. We reveal that nuclear factor κB (NF-κB) and AP-1 pathways are the most salient trans-acting inputs, with minor roles for other inflammatory pathways. We identify a cis-regulatory architecture dominated by a major intronic enhancer that requires coordinated NF-κB and AP-1 activity with secondary inputs from distal regulatory regions. Our data demonstrate how integration of cis and trans genomic screening platforms provides a paradigm for building gene-centered regulatory networks.
Subjects
Gene Expression/genetics , Gene Regulatory Networks/genetics , Oncogenes/immunology , Humans , Signal Transduction
ABSTRACT
The targeting scope of Streptococcus pyogenes Cas9 (SpCas9) and its engineered variants is largely restricted to protospacer-adjacent motif (PAM) sequences containing G bases. Here we report the evolution of three new SpCas9 variants that collectively recognize NRNH PAMs (where R is A or G and H is A, C or T) using phage-assisted non-continuous evolution, three new phage-assisted continuous evolution strategies for DNA binding and a secondary selection for DNA cleavage. The targeting capabilities of these evolved variants and SpCas9-NG were characterized in HEK293T cells using a library of 11,776 genomically integrated protospacer-sgRNA pairs containing all possible NNNN PAMs. The evolved variants mediated indel formation and base editing in human cells and enabled A•T-to-G•C base editing of a sickle cell anemia mutation using a previously inaccessible CACC PAM. These new evolved SpCas9 variants, together with previously reported variants, in principle enable targeting of most NR PAM sequences and substantially reduce the fraction of genomic sites that are inaccessible by Cas9-based methods.
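The NRNH notation uses IUPAC ambiguity codes (N = any base, R = A or G, H = A, C or T). As a sketch, the Python below checks whether a 4-nt PAM falls in this class; it only encodes the notation defined above and says nothing about how efficiently any particular evolved variant cleaves a given PAM. The helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes needed for the PAM notation in the abstract.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine: A or G
    "H": "[ACT]",   # not G: A, C or T
    "N": "[ACGT]",  # any base
}

def pam_regex(pattern: str) -> re.Pattern:
    """Compile an IUPAC PAM pattern such as 'NRNH' into a regex."""
    return re.compile("".join(IUPAC[c] for c in pattern.upper()))

def matches_pam(pam: str, pattern: str = "NRNH") -> bool:
    """True if the whole PAM string matches the IUPAC pattern."""
    return pam_regex(pattern).fullmatch(pam.upper()) is not None

print(matches_pam("CACC"))  # True: the sickle cell target PAM is NRNH
print(matches_pam("AGAG"))  # False: last base is G, which H excludes
print(matches_pam("ACAA"))  # False: second base is C, which R excludes
```

Using `fullmatch` rather than `search` ensures the candidate is exactly four bases long and matches position by position, rather than containing a match somewhere inside a longer string.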