Results 1 - 20 of 39
1.
Cell ; 173(3): 665-676.e14, 2018 04 19.
Article in English | MEDLINE | ID: mdl-29551272

ABSTRACT

Class 2 CRISPR-Cas systems endow microbes with diverse mechanisms for adaptive immunity. Here, we analyzed prokaryotic genome and metagenome sequences to identify an uncharacterized family of RNA-guided, RNA-targeting CRISPR systems that we classify as type VI-D. Biochemical characterization and protein engineering of seven distinct orthologs generated a ribonuclease effector derived from Ruminococcus flavefaciens XPD3002 (CasRx) with robust activity in human cells. CasRx-mediated knockdown exhibits high efficiency and specificity relative to RNA interference across diverse endogenous transcripts. As one of the most compact single-effector Cas enzymes, CasRx can also be flexibly packaged into adeno-associated virus. We target virally encoded, catalytically inactive CasRx to cis elements of pre-mRNA to manipulate alternative splicing, alleviating dysregulated tau isoform ratios in a neuronal model of frontotemporal dementia. Our results present CasRx as a programmable RNA-binding module for efficient targeting of cellular RNA, enabling a general platform for transcriptome engineering and future therapeutic development.


Subject(s)
CRISPR-Cas Systems , Computational Biology/methods , Genetic Engineering/methods , Protein Engineering/methods , RNA/analysis , Alternative Splicing , Animals , Bacterial Proteins/metabolism , Cell Differentiation , Escherichia coli/metabolism , Gene Expression Profiling , HEK293 Cells , Humans , Induced Pluripotent Stem Cells/cytology , Lentivirus/genetics , Mice , RNA Interference , RNA, Guide, Kinetoplastida/genetics , Ruminococcus , Sequence Analysis, RNA , Transcriptome
2.
Cell ; 175(1): 212-223.e17, 2018 09 20.
Article in English | MEDLINE | ID: mdl-30241607

ABSTRACT

CRISPR-Cas endonucleases directed against foreign nucleic acids mediate prokaryotic adaptive immunity and have been tailored for broad genetic engineering applications. Type VI-D CRISPR systems contain the smallest known family of single effector Cas enzymes, and their signature Cas13d ribonuclease employs guide RNAs to cleave matching target RNAs. To understand the molecular basis for Cas13d function and explain its compact molecular architecture, we resolved cryoelectron microscopy structures of Cas13d-guide RNA binary complex and Cas13d-guide-target RNA ternary complex to 3.4 and 3.3 Å resolution, respectively. Furthermore, a 6.5 Å reconstruction of apo Cas13d combined with hydrogen-deuterium exchange revealed conformational dynamics that have implications for RNA scanning. These structures, together with biochemical and cellular characterization, provide insights into its RNA-guided, RNA-targeting mechanism and delineate a blueprint for the rational design of improved transcriptome engineering technologies.


Subject(s)
CRISPR-Cas Systems/genetics , RNA, Guide, Kinetoplastida/physiology , Ribonucleases/physiology , CRISPR-Cas Systems/physiology , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Cryoelectron Microscopy/methods , Endonucleases/metabolism , HEK293 Cells , Humans , Molecular Conformation , RNA/genetics , RNA, Guide, Kinetoplastida/genetics , RNA, Guide, Kinetoplastida/ultrastructure , Ribonucleases/metabolism , Ribonucleases/ultrastructure
3.
Cell ; 164(5): 950-61, 2016 Feb 25.
Article in English | MEDLINE | ID: mdl-26875867

ABSTRACT

The RNA-guided endonuclease Cas9 cleaves double-stranded DNA targets complementary to the guide RNA and has been applied to programmable genome editing. Cas9-mediated cleavage requires a protospacer adjacent motif (PAM) juxtaposed with the DNA target sequence, thus constricting the range of targetable sites. Here, we report the 1.7 Å resolution crystal structures of Cas9 from Francisella novicida (FnCas9), one of the largest Cas9 orthologs, in complex with a guide RNA and its PAM-containing DNA targets. A structural comparison of FnCas9 with other Cas9 orthologs revealed striking conserved and divergent features among distantly related CRISPR-Cas9 systems. We found that FnCas9 recognizes the 5'-NGG-3' PAM, and used the structural information to create a variant that can recognize the more relaxed 5'-YG-3' PAM. Furthermore, we demonstrated that the FnCas9-ribonucleoprotein complex can be microinjected into mouse zygotes to edit endogenous sites with the 5'-YG-3' PAM, thus expanding the target space of the CRISPR-Cas9 toolbox.


Subject(s)
Bacterial Proteins/chemistry , CRISPR-Cas Systems , Endonucleases/chemistry , Francisella/enzymology , Genetic Engineering/methods , Animals , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Blastocyst/metabolism , CRISPR-Associated Protein 9 , Crystallography, X-Ray , Embryo, Mammalian/metabolism , Endonucleases/genetics , Endonucleases/metabolism , Mice , Microinjections/methods , Models, Molecular , RNA, Guide, Kinetoplastida/genetics
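
The PAM constraint described in this abstract can be illustrated with a small sketch. It scans one strand of a DNA string for protospacers followed directly by a PAM and compares the 5'-NGG-3' motif with the more relaxed 5'-YG-3' motif. The function name `find_targets`, the 20-nt protospacer length, the 3' placement of the PAM, and the toy sequence are assumptions made for illustration only; they are not taken from the paper.

```python
import re

# Subset of IUPAC nucleotide codes used in PAM motifs
# (N = any base, Y = C or T, R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "Y": "[CT]", "R": "[AG]"}


def find_targets(seq, pam, protospacer_len=20):
    """Return (start, protospacer, pam_match) for each site on the given
    strand where a protospacer is followed immediately by the PAM.

    Hypothetical helper: only the supplied strand is scanned, and the
    PAM is assumed to sit directly 3' of the protospacer.
    """
    seq = seq.upper()
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_re}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy sequence (illustrative only): 20 nt followed by "CGA".
toy = "A" * 20 + "CGA"
ngg_sites = find_targets(toy, "NGG")  # no NGG follows the protospacer
yg_sites = find_targets(toy, "YG")    # "CG" satisfies the YG motif
```

In this toy case the relaxed motif admits a site that the NGG motif rejects, which is the sense in which a shorter, more degenerate PAM widens the set of targetable positions.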
4.
Nature ; 630(8018): 984-993, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38926615

ABSTRACT

Genomic rearrangements, encompassing mutational changes in the genome such as insertions, deletions or inversions, are essential for genetic diversity. These rearrangements are typically orchestrated by enzymes that are involved in fundamental DNA repair processes, such as homologous recombination, or in the transposition of foreign genetic material by viruses and mobile genetic elements [1,2]. Here we report that IS110 insertion sequences, a family of minimal and autonomous mobile genetic elements, express a structured non-coding RNA that binds specifically to their encoded recombinase. This bridge RNA contains two internal loops encoding nucleotide stretches that base-pair with the target DNA and the donor DNA, which is the IS110 element itself. We demonstrate that the target-binding and donor-binding loops can be independently reprogrammed to direct sequence-specific recombination between two DNA molecules. This modularity enables the insertion of DNA into genomic target sites, as well as programmable DNA excision and inversion. The IS110 bridge recombination system expands the diversity of nucleic-acid-guided systems beyond CRISPR and RNA interference, offering a unified mechanism for the three fundamental DNA rearrangements (insertion, excision and inversion) that are required for genome design.


Subject(s)
DNA , RNA, Untranslated , Recombination, Genetic , Base Pairing , Base Sequence , DNA/genetics , DNA/metabolism , DNA Transposable Elements/genetics , Mutagenesis, Insertional/genetics , Recombinases/metabolism , Recombinases/genetics , Recombination, Genetic/genetics , RNA, Untranslated/genetics , RNA, Untranslated/metabolism
5.
Nature ; 630(8018): 994-1002, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38926616

ABSTRACT

Insertion sequence (IS) elements are the simplest autonomous transposable elements found in prokaryotic genomes [1]. We recently discovered that IS110 family elements encode a recombinase and a non-coding bridge RNA (bRNA) that confers modular specificity for target DNA and donor DNA through two programmable loops [2]. Here we report the cryo-electron microscopy structures of the IS110 recombinase in complex with its bRNA, target DNA and donor DNA in three different stages of the recombination reaction cycle. The IS110 synaptic complex comprises two recombinase dimers, one of which houses the target-binding loop of the bRNA and binds to target DNA, whereas the other coordinates the bRNA donor-binding loop and donor DNA. We uncovered the formation of a composite RuvC-Tnp active site that spans the two dimers, positioning the catalytic serine residues adjacent to the recombination sites in both target and donor DNA. A comparison of the three structures revealed that (1) the top strands of target and donor DNA are cleaved at the composite active sites to form covalent 5'-phosphoserine intermediates, (2) the cleaved DNA strands are exchanged and religated to create a Holliday junction intermediate, and (3) this intermediate is subsequently resolved by cleavage of the bottom strands. Overall, this study reveals the mechanism by which a bispecific RNA confers target and donor DNA specificity to IS110 recombinases for programmable DNA recombination.


Subject(s)
DNA , RNA, Untranslated , Recombination, Genetic , Catalytic Domain , Cryoelectron Microscopy , DNA/chemistry , DNA/metabolism , DNA/ultrastructure , DNA Transposable Elements/genetics , Models, Molecular , Nucleic Acid Conformation , Protein Multimerization , Recombinases/chemistry , Recombinases/genetics , Recombinases/metabolism , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , RNA, Untranslated/ultrastructure , Substrate Specificity
6.
Cell ; 157(6): 1262-1278, 2014 Jun 05.
Article in English | MEDLINE | ID: mdl-24906146

ABSTRACT

Recent advances in genome engineering technologies based on the CRISPR-associated RNA-guided endonuclease Cas9 are enabling the systematic interrogation of mammalian genome function. Analogous to the search function in modern word processors, Cas9 can be guided to specific locations within complex genomes by a short RNA search string. Using this system, DNA sequences within the endogenous genome and their functional outputs are now easily edited or modulated in virtually any organism of choice. Cas9-mediated genetic perturbation is simple and scalable, empowering researchers to elucidate the functional organization of the genome at the systems level and establish causal linkages between genetic variations and biological phenotypes. In this Review, we describe the development and applications of Cas9 for a variety of research or translational applications while highlighting challenges as well as future directions. Derived from a remarkable microbial defense system, Cas9 is driving innovative applications from basic biology to biotechnology and medicine.


Subject(s)
Bacteria/genetics , CRISPR-Cas Systems , Gene Targeting , Genetic Engineering , Animals , Bacteria/classification , Bacteria/immunology , Bacteria/virology , Eukaryotic Cells/metabolism , Genome , Humans , Streptococcus pyogenes/enzymology , Streptococcus pyogenes/genetics
7.
Cell ; 156(5): 935-49, 2014 Feb 27.
Article in English | MEDLINE | ID: mdl-24529477

ABSTRACT

The CRISPR-associated endonuclease Cas9 can be targeted to specific genomic loci by single guide RNAs (sgRNAs). Here, we report the crystal structure of Streptococcus pyogenes Cas9 in complex with sgRNA and its target DNA at 2.5 Å resolution. The structure revealed a bilobed architecture composed of target recognition and nuclease lobes, accommodating the sgRNA:DNA heteroduplex in a positively charged groove at their interface. Whereas the recognition lobe is essential for binding sgRNA and DNA, the nuclease lobe contains the HNH and RuvC nuclease domains, which are properly positioned for cleavage of the complementary and noncomplementary strands of the target DNA, respectively. The nuclease lobe also contains a carboxyl-terminal domain responsible for the interaction with the protospacer adjacent motif (PAM). This high-resolution structure and accompanying functional analyses have revealed the molecular mechanism of RNA-guided DNA targeting by Cas9, thus paving the way for the rational design of new, versatile genome-editing technologies.


Subject(s)
CRISPR-Associated Proteins/chemistry , Crystallography, X-Ray , Endonucleases/chemistry , RNA, Bacterial/chemistry , Streptococcus pyogenes/chemistry , Amino Acid Sequence , Bacteria/enzymology , CRISPR-Associated Proteins/metabolism , DNA, Bacterial/chemistry , DNA, Bacterial/metabolism , Endonucleases/metabolism , Models, Molecular , Molecular Sequence Data , Protein Structure, Tertiary , RNA, Bacterial/metabolism , Sequence Alignment , Streptococcus pyogenes/enzymology , Streptococcus pyogenes/metabolism , RNA, Small Untranslated
8.
Cell ; 154(6): 1380-9, 2013 Sep 12.
Article in English | MEDLINE | ID: mdl-23992846

ABSTRACT

Targeted genome editing technologies have enabled a broad range of research and medical applications. The Cas9 nuclease from the microbial CRISPR-Cas system is targeted to specific genomic loci by a 20 nt guide sequence, which can tolerate certain mismatches to the DNA target and thereby promote undesired off-target mutagenesis. Here, we describe an approach that combines a Cas9 nickase mutant with paired guide RNAs to introduce targeted double-strand breaks. Because individual nicks in the genome are repaired with high fidelity, simultaneous nicking via appropriately offset guide RNAs is required for double-stranded breaks and extends the number of specifically recognized bases for target cleavage. We demonstrate that paired nicking can reduce off-target activity by 50- to 1,500-fold in cell lines and facilitate gene knockout in mouse zygotes without sacrificing on-target cleavage efficiency. This versatile strategy enables a wide variety of genome editing applications that require high specificity.


Subject(s)
DNA Breaks, Double-Stranded , Gene Targeting/methods , Genome , Animals , Base Sequence , Mice , Molecular Sequence Data , Streptococcus pyogenes/enzymology , Streptococcus pyogenes/genetics , Zygote/metabolism , RNA, Small Untranslated
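
The reasoning in this abstract (a single guide tolerates some mismatches, whereas a double-strand break from paired nicking needs two nearby hits) can be sketched as a toy calculation. Everything named here is hypothetical: `mismatches`, `candidate_sites`, `paired_sites`, the mismatch threshold, and the offset window are illustrative assumptions, and strand orientation and PAM requirements are deliberately ignored.

```python
def mismatches(guide, site):
    """Count positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def candidate_sites(guide, genome, max_mismatches=3):
    """Return (position, mismatch_count) for every window of the genome
    string that differs from the guide at no more than max_mismatches
    positions. The threshold is an assumed value, not one from the paper."""
    n = len(guide)
    hits = []
    for i in range(len(genome) - n + 1):
        d = mismatches(guide, genome[i:i + n])
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


def paired_sites(hits_a, hits_b, max_offset=50):
    """Return (pos_a, pos_b) pairs where both guides have a candidate site
    within max_offset bases of each other. Under the paired-nicking idea,
    only such co-located pairs could yield a double-strand break."""
    return [(a, b) for a, _ in hits_a for b, _ in hits_b
            if abs(a - b) <= max_offset]


# Toy example with a 4-nt "guide" purely for readability.
toy_hits = candidate_sites("ACGT", "ACGTACGA", max_mismatches=1)
```

Because an unintended break now requires two independent near-matches to fall close together, the set returned by `paired_sites` is typically much smaller than either guide's individual candidate list, which is the intuition behind the specificity gain.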
9.
PLoS Biol ; 21(6): e3002097, 2023 06.
Article in English | MEDLINE | ID: mdl-37310920

ABSTRACT

Identifying host genes essential for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has the potential to reveal novel drug targets and further our understanding of Coronavirus Disease 2019 (COVID-19). We previously performed a genome-wide CRISPR/Cas9 screen to identify proviral host factors for highly pathogenic human coronaviruses. Few host factors were required by diverse coronaviruses across multiple cell types, but DYRK1A was one such exception. Although its role in coronavirus infection was previously undescribed, DYRK1A encodes Dual Specificity Tyrosine Phosphorylation Regulated Kinase 1A and is known to regulate cell proliferation and neuronal development. Here, we demonstrate that DYRK1A regulates ACE2 and DPP4 transcription independent of its catalytic kinase function to support SARS-CoV, SARS-CoV-2, and Middle East Respiratory Syndrome Coronavirus (MERS-CoV) entry. We show that DYRK1A promotes DNA accessibility at the ACE2 promoter and a putative distal enhancer, facilitating transcription and gene expression. Finally, we validate that the proviral activity of DYRK1A is conserved across species using cells of nonhuman primate and human origin. In summary, we report that DYRK1A is a novel regulator of ACE2 and DPP4 expression that may dictate susceptibility to multiple highly pathogenic human coronaviruses.


Subject(s)
COVID-19 , Virus Internalization , Animals , Humans , Angiotensin-Converting Enzyme 2 , COVID-19/genetics , COVID-19/metabolism , Dipeptidyl Peptidase 4 , Middle East Respiratory Syndrome Coronavirus/genetics , SARS-CoV-2/genetics , Severe acute respiratory syndrome-related coronavirus/genetics , Dyrk Kinases
10.
PLoS Pathog ; 19(7): e1011351, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37410700

ABSTRACT

Identification of host determinants of coronavirus infection informs mechanisms of pathogenesis and may provide novel therapeutic targets. Here, we demonstrate that the histone demethylase KDM6A promotes infection of diverse coronaviruses, including SARS-CoV, SARS-CoV-2, MERS-CoV and mouse hepatitis virus (MHV) in a demethylase activity-independent manner. Mechanistic studies reveal that KDM6A promotes viral entry by regulating expression of multiple coronavirus receptors, including ACE2, DPP4 and Ceacam1. Importantly, the TPR domain of KDM6A is required for recruitment of the histone methyltransferase KMT2D and histone acetyltransferase p300. Together this KDM6A-KMT2D-p300 complex localizes to the proximal and distal enhancers of ACE2 and regulates receptor expression. Notably, small molecule inhibition of p300 catalytic activity abrogates ACE2 and DPP4 expression and confers resistance to all major SARS-CoV-2 variants and MERS-CoV in primary human airway and intestinal epithelial cells. These data highlight the role for KDM6A-KMT2D-p300 complex activities in conferring susceptibility to diverse coronaviruses and reveal a potential pan-coronavirus therapeutic target to combat current and emerging coronaviruses. One Sentence Summary: The KDM6A/KMT2D/EP300 axis promotes expression of multiple viral receptors and represents a potential drug target for diverse coronaviruses.


Subject(s)
COVID-19 , Middle East Respiratory Syndrome Coronavirus , Animals , Humans , Mice , Angiotensin-Converting Enzyme 2/metabolism , Dipeptidyl Peptidase 4/metabolism , Histone Demethylases/metabolism , Middle East Respiratory Syndrome Coronavirus/metabolism , Receptors, Virus/genetics , Receptors, Virus/metabolism , SARS-CoV-2/metabolism
11.
Mol Cell ; 63(3): 355-70, 2016 08 04.
Article in English | MEDLINE | ID: mdl-27494557

ABSTRACT

Advances in the development of delivery, repair, and specificity strategies for the CRISPR-Cas9 genome engineering toolbox are helping researchers understand gene function with unprecedented precision and sensitivity. CRISPR-Cas9 also holds enormous therapeutic potential for the treatment of genetic disorders by directly correcting disease-causing mutations. Although the Cas9 protein has been shown to bind and cleave DNA at off-target sites, the field of Cas9 specificity is rapidly progressing, with marked improvements in guide RNA selection, protein and guide engineering, novel enzymes, and off-target detection methods. We review important challenges and breakthroughs in the field as a comprehensive practical guide to interested users of genome editing technologies, highlighting key tools and strategies for optimizing specificity. The genome editing community should now strive to standardize such methods for measuring and reporting off-target activity, while keeping in mind that the goal for specificity should be continued improvement and vigilance.


Subject(s)
CRISPR-Associated Proteins/metabolism , CRISPR-Cas Systems , DNA/metabolism , Endonucleases/metabolism , Gene Editing/methods , Gene Targeting/methods , Genomics/methods , Animals , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , CRISPR-Associated Proteins/genetics , Computational Biology , DNA/genetics , Endonucleases/genetics , Humans , Kinetics , Mutation , Protein Engineering , RNA, Guide, Kinetoplastida/genetics , RNA, Guide, Kinetoplastida/metabolism , Substrate Specificity
12.
Nat Chem Biol ; 17(9): 982-988, 2021 09.
Article in English | MEDLINE | ID: mdl-34354262

ABSTRACT

Direct, amplification-free detection of RNA has the potential to transform molecular diagnostics by enabling simple on-site analysis of human or environmental samples. CRISPR-Cas nucleases offer programmable RNA-guided RNA recognition that triggers cleavage and release of a fluorescent reporter molecule, but long reaction times hamper their detection sensitivity and speed. Here, we show that unrelated CRISPR nucleases can be deployed in tandem to provide both direct RNA sensing and rapid signal generation, thus enabling robust detection of ~30 molecules per µl of RNA in 20 min. Combining RNA-guided Cas13 and Csm6 with a chemically stabilized activator creates a one-step assay that can detect severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA extracted from respiratory swab samples with quantitative reverse transcriptase PCR (qRT-PCR)-derived cycle threshold (Ct) values up to 33, using a compact detector. This Fast Integrated Nuclease Detection In Tandem (FIND-IT) approach enables sensitive, direct RNA detection in a format that is amenable to point-of-care infection diagnosis as well as to a wide range of other diagnostic or research applications.


Subject(s)
COVID-19/genetics , CRISPR-Cas Systems/genetics , RNA, Viral/genetics , SARS-CoV-2/genetics , Humans , Reverse Transcriptase Polymerase Chain Reaction
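
For a sense of scale, the reported sensitivity of ~30 molecules per µl can be converted into a molar concentration. The helper below is a hypothetical illustration (the name `copies_per_ul_to_molar` is not from the paper); it relies only on the Avogadro constant and the fact that one litre contains 10^6 µl.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def copies_per_ul_to_molar(copies_per_ul):
    """Convert a concentration in molecules per microlitre to mol/L."""
    copies_per_litre = copies_per_ul * 1e6  # 1 L = 1e6 microlitres
    return copies_per_litre / AVOGADRO


# ~30 molecules per microlitre, the figure quoted in the abstract.
limit_molar = copies_per_ul_to_molar(30)
limit_attomolar = limit_molar * 1e18  # 1 aM = 1e-18 mol/L
```

With these inputs the result is on the order of 50 aM, so the quoted detection limit sits in the tens-of-attomolar range.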
13.
Nature ; 517(7536): 583-8, 2015 Jan 29.
Article in English | MEDLINE | ID: mdl-25494202

ABSTRACT

Systematic interrogation of gene function requires the ability to perturb gene expression in a robust and generalizable manner. Here we describe structure-guided engineering of a CRISPR-Cas9 complex to mediate efficient transcriptional activation at endogenous genomic loci. We used these engineered Cas9 activation complexes to investigate single-guide RNA (sgRNA) targeting rules for effective transcriptional activation, to demonstrate multiplexed activation of ten genes simultaneously, and to upregulate long intergenic non-coding RNA (lincRNA) transcripts. We also synthesized a library consisting of 70,290 guides targeting all human RefSeq coding isoforms to screen for genes that, upon activation, confer resistance to a BRAF inhibitor. The top hits included genes previously shown to be able to confer resistance, and novel candidates were validated using individual sgRNA and complementary DNA overexpression. A gene expression signature based on the top screening hits correlated with markers of BRAF inhibitor resistance in cell lines and patient-derived samples. These results collectively demonstrate the potential of Cas9-based activators as a powerful genetic perturbation technology.


Subject(s)
CRISPR-Cas Systems/genetics , Genetic Engineering/methods , Genome, Human/genetics , Melanoma/genetics , Transcriptional Activation/genetics , CRISPR-Associated Proteins/genetics , CRISPR-Associated Proteins/metabolism , Cell Line, Tumor , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , DNA, Complementary/biosynthesis , DNA, Complementary/genetics , Drug Resistance, Neoplasm/drug effects , Drug Resistance, Neoplasm/genetics , Gene Expression Regulation, Neoplastic/genetics , Gene Library , Genetic Loci/genetics , Genetic Testing , Humans , Indoles/pharmacology , Melanoma/drug therapy , Proto-Oncogene Proteins B-raf/antagonists & inhibitors , RNA, Untranslated/biosynthesis , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Reproducibility of Results , Sulfonamides/pharmacology , Up-Regulation/genetics
14.
Nature ; 500(7463): 472-476, 2013 Aug 22.
Article in English | MEDLINE | ID: mdl-23877069

ABSTRACT

The dynamic nature of gene expression enables cellular programming, homeostasis and environmental adaptation in living systems. Dissection of causal gene functions in cellular and organismal processes therefore necessitates approaches that enable spatially and temporally precise modulation of gene expression. Recently, a variety of microbial and plant-derived light-sensitive proteins have been engineered as optogenetic actuators, enabling high-precision spatiotemporal control of many cellular functions. However, versatile and robust technologies that enable optical modulation of transcription in the mammalian endogenous genome remain elusive. Here we describe the development of light-inducible transcriptional effectors (LITEs), an optogenetic two-hybrid system integrating the customizable TALE DNA-binding domain with the light-sensitive cryptochrome 2 protein and its interacting partner CIB1 from Arabidopsis thaliana. LITEs do not require additional exogenous chemical cofactors, are easily customized to target many endogenous genomic loci, and can be activated within minutes with reversibility. LITEs can be packaged into viral vectors and genetically targeted to probe specific cell populations. We have applied this system in primary mouse neurons, as well as in the brain of freely behaving mice in vivo to mediate reversible modulation of mammalian endogenous gene expression as well as targeted epigenetic chromatin modifications. The LITE system establishes a novel mode of optogenetic control of endogenous cellular processes and enables direct testing of the causal roles of genetic and epigenetic regulation in normal biological processes and disease states.


Subject(s)
Epigenesis, Genetic/genetics , Epigenesis, Genetic/radiation effects , Gene Expression Regulation/radiation effects , Light , Optogenetics/methods , Transcription, Genetic/radiation effects , Animals , Arabidopsis Proteins/metabolism , Basic Helix-Loop-Helix Transcription Factors/metabolism , Cells, Cultured , Chromatin/genetics , Chromatin/radiation effects , Cryptochromes/metabolism , Gene Expression Regulation/genetics , Genetic Vectors/genetics , Male , Mice , Mice, Inbred C57BL , Neurons/metabolism , Neurons/radiation effects , Time Factors , Transcription, Genetic/genetics , Two-Hybrid System Techniques , Wakefulness
16.
bioRxiv ; 2024 Jun 23.
Article in English | MEDLINE | ID: mdl-38948874

ABSTRACT

Gene therapies have the potential to treat disease by delivering therapeutic genetic cargo to disease-associated cells. One limitation to their widespread use is the lack of short regulatory sequences, or promoters, that differentially induce the expression of delivered genetic cargo in target cells, minimizing side effects in other cell types. Such cell-type-specific promoters are difficult to discover using existing methods, requiring either manual curation or access to large datasets of promoter-driven expression from both targeted and untargeted cells. Model-based optimization (MBO) has emerged as an effective method to design biological sequences in an automated manner, and has recently been used in promoter design methods. However, these methods have only been tested using large training datasets that are expensive to collect, and focus on designing promoters for markedly different cell types, overlooking the complexities associated with designing promoters for closely related cell types that share similar regulatory features. Therefore, we introduce a comprehensive framework for utilizing MBO to design promoters in a data-efficient manner, with an emphasis on discovering promoters for similar cell types. We use conservative objective models (COMs) for MBO and highlight practical considerations such as best practices for improving sequence diversity, getting estimates of model uncertainty, and choosing the optimal set of sequences for experimental validation. Using three relatively similar blood cancer cell lines (Jurkat, K562, and THP1), we show that our approach discovers many novel cell-type-specific promoters after experimentally validating the designed sequences. For K562 cells, in particular, we discover a promoter that has 75.85% higher cell-type-specificity than the best promoter from the initial dataset used to train our models.

17.
bioRxiv ; 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38328150

ABSTRACT

Genomic rearrangements, encompassing mutational changes in the genome such as insertions, deletions, or inversions, are essential for genetic diversity. These rearrangements are typically orchestrated by enzymes involved in fundamental DNA repair processes such as homologous recombination or in the transposition of foreign genetic material by viruses and mobile genetic elements (MGEs). We report that IS110 insertion sequences, a family of minimal and autonomous MGEs, express a structured non-coding RNA that binds specifically to their encoded recombinase. This bridge RNA contains two internal loops encoding nucleotide stretches that base-pair with the target DNA and donor DNA, which is the IS110 element itself. We demonstrate that the target-binding and donor-binding loops can be independently reprogrammed to direct sequence-specific recombination between two DNA molecules. This modularity enables DNA insertion into genomic target sites as well as programmable DNA excision and inversion. The IS110 bridge system expands the diversity of nucleic acid-guided systems beyond CRISPR and RNA interference, offering a unified mechanism for the three fundamental DNA rearrangements required for genome design.

18.
ArXiv ; 2024 Oct 14.
Article in English | MEDLINE | ID: mdl-39398201

ABSTRACT

The cell is arguably the most fundamental unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision of leveraging advances in AI to construct virtual cells, high-fidelity simulations of cells and cellular systems under different conditions that are directly learned from biological data across measurements and scales. We discuss desired capabilities of such AI Virtual Cells, including generating universal representations of biological entities across scales, and facilitating interpretable in silico experiments to predict and understand their behavior using Virtual Instruments. We further address the challenges, opportunities and requirements to realize this vision including data needs, evaluation strategies, and community standards and engagement to ensure biological accuracy and broad utility. We envision a future where AI Virtual Cells help identify new drug targets, predict cellular responses to perturbations, as well as scale hypothesis exploration. With open science collaborations across the biomedical ecosystem that includes academia, philanthropy, and the biopharma and AI industries, a comprehensive predictive understanding of cell mechanisms and interactions has come into reach.

19.
bioRxiv ; 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36909524

ABSTRACT

Advances in gene delivery technologies are enabling rapid progress in molecular medicine, but require precise expression of genetic cargo in desired cell types, which is predominantly achieved via a regulatory DNA sequence called a promoter; however, only a handful of cell type-specific promoters are known. Efficiently designing compact promoter sequences with a high density of regulatory information by leveraging machine learning models would therefore be broadly impactful for fundamental research and direct therapeutic applications. However, models of expression from such compact promoter sequences are lacking, despite the recent success of deep learning in modelling expression from endogenous regulatory sequences. Despite the lack of large datasets measuring promoter-driven expression in many cell types, data from a few well-studied cell types or from endogenous gene expression may provide relevant information for transfer learning, which has not yet been explored in this setting. Here, we evaluate a variety of pretraining tasks and transfer strategies for modelling cell type-specific expression from compact promoters and demonstrate the effectiveness of pretraining on existing promoter-driven expression datasets from other cell types. Our approach is broadly applicable for modelling promoter-driven expression in any data-limited cell type of interest, and will enable the use of model-based optimization techniques for promoter design for gene delivery applications. Our code and data are available at https://github.com/anikethjr/promoter_models.

20.
Cell Syst ; 14(12): 1087-1102.e13, 2023 12 20.
Article in English | MEDLINE | ID: mdl-38091991

ABSTRACT

Effective and precise mammalian transcriptome engineering technologies are needed to accelerate biological discovery and RNA therapeutics. Despite the promise of programmable CRISPR-Cas13 ribonucleases, their utility has been hampered by an incomplete understanding of guide RNA design rules and cellular toxicity resulting from off-target or collateral RNA cleavage. Here, we quantified the performance of over 127,000 RfxCas13d (CasRx) guide RNAs and systematically evaluated seven machine learning models to build a guide efficiency prediction algorithm orthogonally validated across multiple human cell types. Deep learning model interpretation revealed preferred sequence motifs and secondary features for highly efficient guides. We next identified and screened 46 novel Cas13d orthologs, finding that DjCas13d achieves low cellular toxicity and high specificity, even when targeting abundant transcripts in sensitive cell types, including stem cells and neurons. Our Cas13d guide efficiency model was successfully generalized to DjCas13d, illustrating the power of combining machine learning with ortholog discovery to advance RNA targeting in human cells.


Subject(s)
CRISPR-Cas Systems , Deep Learning , RNA , Humans , CRISPR-Cas Systems/genetics , RNA/genetics , RNA, Guide, CRISPR-Cas Systems , Transcriptome