1.
Nature ; 630(8018): 994-1002, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38926616

ABSTRACT

Insertion sequence (IS) elements are the simplest autonomous transposable elements found in prokaryotic genomes1. We recently discovered that IS110 family elements encode a recombinase and a non-coding bridge RNA (bRNA) that confers modular specificity for target DNA and donor DNA through two programmable loops2. Here we report the cryo-electron microscopy structures of the IS110 recombinase in complex with its bRNA, target DNA and donor DNA in three different stages of the recombination reaction cycle. The IS110 synaptic complex comprises two recombinase dimers, one of which houses the target-binding loop of the bRNA and binds to target DNA, whereas the other coordinates the bRNA donor-binding loop and donor DNA. We uncovered the formation of a composite RuvC-Tnp active site that spans the two dimers, positioning the catalytic serine residues adjacent to the recombination sites in both target and donor DNA. A comparison of the three structures revealed that (1) the top strands of target and donor DNA are cleaved at the composite active sites to form covalent 5'-phosphoserine intermediates, (2) the cleaved DNA strands are exchanged and religated to create a Holliday junction intermediate, and (3) this intermediate is subsequently resolved by cleavage of the bottom strands. Overall, this study reveals the mechanism by which a bispecific RNA confers target and donor DNA specificity to IS110 recombinases for programmable DNA recombination.


Subject(s)
DNA , RNA, Untranslated , Recombination, Genetic , Catalytic Domain , Cryoelectron Microscopy , DNA/chemistry , DNA/metabolism , DNA/ultrastructure , DNA Transposable Elements/genetics , Models, Molecular , Nucleic Acid Conformation , Protein Multimerization , Recombinases/chemistry , Recombinases/genetics , Recombinases/metabolism , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , RNA, Untranslated/ultrastructure , Substrate Specificity
2.
Nature ; 630(8018): 984-993, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38926615

ABSTRACT

Genomic rearrangements, encompassing mutational changes in the genome such as insertions, deletions or inversions, are essential for genetic diversity. These rearrangements are typically orchestrated by enzymes that are involved in fundamental DNA repair processes, such as homologous recombination, or in the transposition of foreign genetic material by viruses and mobile genetic elements1,2. Here we report that IS110 insertion sequences, a family of minimal and autonomous mobile genetic elements, express a structured non-coding RNA that binds specifically to their encoded recombinase. This bridge RNA contains two internal loops encoding nucleotide stretches that base-pair with the target DNA and the donor DNA, which is the IS110 element itself. We demonstrate that the target-binding and donor-binding loops can be independently reprogrammed to direct sequence-specific recombination between two DNA molecules. This modularity enables the insertion of DNA into genomic target sites, as well as programmable DNA excision and inversion. The IS110 bridge recombination system expands the diversity of nucleic-acid-guided systems beyond CRISPR and RNA interference, offering a unified mechanism for the three fundamental DNA rearrangements (insertion, excision and inversion) that are required for genome design.


Subject(s)
DNA , RNA, Untranslated , Recombination, Genetic , Base Pairing , Base Sequence , DNA/genetics , DNA/metabolism , DNA Transposable Elements/genetics , Mutagenesis, Insertional/genetics , Recombinases/metabolism , Recombinases/genetics , Recombination, Genetic/genetics , RNA, Untranslated/genetics , RNA, Untranslated/metabolism
3.
bioRxiv ; 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38328150

ABSTRACT

Genomic rearrangements, encompassing mutational changes in the genome such as insertions, deletions, or inversions, are essential for genetic diversity. These rearrangements are typically orchestrated by enzymes involved in fundamental DNA repair processes such as homologous recombination or in the transposition of foreign genetic material by viruses and mobile genetic elements (MGEs). We report that IS110 insertion sequences, a family of minimal and autonomous MGEs, express a structured non-coding RNA that binds specifically to their encoded recombinase. This bridge RNA contains two internal loops encoding nucleotide stretches that base-pair with the target DNA and donor DNA, which is the IS110 element itself. We demonstrate that the target-binding and donor-binding loops can be independently reprogrammed to direct sequence-specific recombination between two DNA molecules. This modularity enables DNA insertion into genomic target sites as well as programmable DNA excision and inversion. The IS110 bridge system expands the diversity of nucleic acid-guided systems beyond CRISPR and RNA interference, offering a unified mechanism for the three fundamental DNA rearrangements required for genome design.

4.
Cell Syst ; 14(12): 1087-1102.e13, 2023 12 20.
Article in English | MEDLINE | ID: mdl-38091991

ABSTRACT

Effective and precise mammalian transcriptome engineering technologies are needed to accelerate biological discovery and RNA therapeutics. Despite the promise of programmable CRISPR-Cas13 ribonucleases, their utility has been hampered by an incomplete understanding of guide RNA design rules and cellular toxicity resulting from off-target or collateral RNA cleavage. Here, we quantified the performance of over 127,000 RfxCas13d (CasRx) guide RNAs and systematically evaluated seven machine learning models to build a guide efficiency prediction algorithm orthogonally validated across multiple human cell types. Deep learning model interpretation revealed preferred sequence motifs and secondary features for highly efficient guides. We next identified and screened 46 novel Cas13d orthologs, finding that DjCas13d achieves low cellular toxicity and high specificity, even when targeting abundant transcripts in sensitive cell types, including stem cells and neurons. Our Cas13d guide efficiency model was successfully generalized to DjCas13d, illustrating the power of combining machine learning with ortholog discovery to advance RNA targeting in human cells.


Subject(s)
CRISPR-Cas Systems , Deep Learning , RNA , Humans , CRISPR-Cas Systems/genetics , RNA/genetics , RNA, Guide, CRISPR-Cas Systems , Transcriptome
5.
bioRxiv ; 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37986808

ABSTRACT

Mapping the functional human genome and impact of genetic variants is often limited to European-descendent population samples. To aid in overcoming this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts including those not represented in the hg38 reference genome. We used whole genomes from the 1000 Genomes Project and 164 Maasai individuals to identify 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs). We further profiled chromatin accessibility using ATAC-Seq in a subset of 100 representative individuals, to identify chromatin accessibility quantitative trait loci (caQTLs) and allele-specific chromatin accessibility, and provide predictions for the functional effect of 78.9 million variants on chromatin accessibility. Using this map of eQTLs and caQTLs we fine-mapped GWAS signals for a range of complex diseases. Combined, this work expands global functional genomic data to identify novel transcripts, functional elements and variants, understand population genetic history of molecular quantitative trait loci, and further resolve the genetic basis of multiple human traits and disease.

6.
Nat Biotechnol ; 41(4): 488-499, 2023 04.
Article in English | MEDLINE | ID: mdl-36217031

ABSTRACT

Large serine recombinases (LSRs) are DNA integrases that facilitate the site-specific integration of mobile genetic elements into bacterial genomes. Only a few LSRs, such as Bxb1 and PhiC31, have been characterized to date, with limited efficiency as tools for DNA integration in human cells. In this study, we developed a computational approach to identify thousands of LSRs and their DNA attachment sites, expanding known LSR diversity by >100-fold and enabling the prediction of their insertion site specificities. We tested their recombination activity in human cells, classifying them as landing pad, genome-targeting or multi-targeting LSRs. Overall, we achieved up to seven-fold higher recombination than Bxb1 and genome integration efficiencies of 40-75% with cargo sizes over 7 kb. We also demonstrate virus-free, direct integration of plasmid or amplicon libraries for improved functional genomics applications. This systematic discovery of recombinases directly from microbial sequencing data provides a resource of over 60 LSRs experimentally characterized in human cells for large-payload genome insertion without exposed DNA double-stranded breaks.


Subject(s)
Genetic Engineering , Integrases , Humans , Genome, Human , Transfection , Genomic Library
7.
Am J Hum Genet ; 109(6): 1055-1064, 2022 06 02.
Article in English | MEDLINE | ID: mdl-35588732

ABSTRACT

Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10^-14), 62.3% increase in risk for severe obesity (p = 1 × 10^-6), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10^-15). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.


Subject(s)
Multifactorial Inheritance , Obesity , Body Mass Index , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Obesity/genetics , Phenotype , Risk Factors
8.
Sci Data ; 9(1): 167, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35414062

ABSTRACT

London is one of the world's most important coastal cities and is located around the Thames Estuary, United Kingdom (UK). Quantifying changes in sea levels in the Thames Estuary over the 20th century and early part of the 21st century is vital to inform future management of flood risk in London. However, there are currently relatively few long, digital records of sea level available in the Thames. Here we present a new extensive sea level dataset that we have digitised from historical hand-written tabulated ledgers of high and low water, from the Port of London Authority (PLA). We captured 463 years of data, from across 15 tide gauge sites, for the period 1911 to 1995. When these historical datasets are combined with digital records available from the PLA since 1995, the sea level time-series span the 111-year period from 1911 to 2021. This new dataset will be of great importance for ongoing monitoring of mean sea-level rise, and changes in tidal range and extreme sea levels in the Thames Estuary.

9.
Genome Med ; 14(1): 31, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35292083

ABSTRACT

BACKGROUND: Identification of causal genes for polygenic human diseases has been extremely challenging, and our understanding of how physiological and pharmacological stimuli modulate genetic risk at disease-associated loci is limited. Specifically, insulin resistance (IR), a common feature of cardiometabolic disease, including type 2 diabetes, obesity, and dyslipidemia, lacks well-powered genome-wide association studies (GWAS), and therefore, few associated loci and causal genes have been identified. METHODS: Here, we perform and integrate linkage disequilibrium (LD)-adjusted colocalization analyses across nine cardiometabolic traits (fasting insulin, fasting glucose, insulin sensitivity, insulin sensitivity index, type 2 diabetes, triglycerides, high-density lipoprotein, body mass index, and waist-hip ratio) combined with expression and splicing quantitative trait loci (eQTLs and sQTLs) from five metabolically relevant human tissues (subcutaneous and visceral adipose, skeletal muscle, liver, and pancreas). To elucidate the upstream regulators and functional mechanisms for these genes, we integrate their transcriptional responses to 21 relevant physiological and pharmacological perturbations in human adipocytes, hepatocytes, and skeletal muscle cells and map their protein-protein interactions. RESULTS: We identify 470 colocalized loci and prioritize 207 loci with a single colocalized gene. Patterns of shared colocalizations across traits and tissues highlight different potential roles for colocalized genes in cardiometabolic disease and distinguish several genes involved in pancreatic β-cell function from others with a more direct role in skeletal muscle, liver, and adipose tissues. At the loci with a single colocalized gene, 42 of these genes were regulated by insulin and 35 by glucose in perturbation experiments, including 17 regulated by both. Other metabolic perturbations regulated the expression of 30 more genes not regulated by glucose or insulin, pointing to other potential upstream regulators of candidate causal genes. CONCLUSIONS: Our use of transcriptional responses under metabolic perturbations to contextualize genetic associations from our custom colocalization approach provides a list of likely causal genes and their upstream regulators in the context of IR-associated cardiometabolic risk.


Subject(s)
Cardiovascular Diseases , Diabetes Mellitus, Type 2 , Insulin Resistance , Cardiovascular Diseases/genetics , Diabetes Mellitus, Type 2/genetics , Genome-Wide Association Study , Humans , Insulin Resistance/genetics , Quantitative Trait Loci
10.
Am J Hum Genet ; 108(10): 1866-1879, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34582792

ABSTRACT

Complex traits and diseases can be influenced by both genetics and environment. However, given the large number of environmental stimuli and power challenges for gene-by-environment testing, it remains a critical challenge to identify and prioritize specific disease-relevant environmental exposures. We propose a framework for leveraging signals from transcriptional responses to environmental perturbations to identify disease-relevant perturbations that can modulate genetic risk for complex traits and inform the functions of genetic variants associated with complex traits. We perturbed human skeletal-muscle-, fat-, and liver-relevant cell lines with 21 perturbations affecting insulin resistance, glucose homeostasis, and metabolic regulation in humans and identified thousands of environmentally responsive genes. By combining these data with GWASs from 31 distinct polygenic traits, we show that the heritability of multiple traits is enriched in regions surrounding genes responsive to specific perturbations and, further, that environmentally responsive genes are enriched for associations with specific diseases and phenotypes from the GWAS Catalog. Overall, we demonstrate the advantages of large-scale characterization of transcriptional changes in diversely stimulated and pathologically relevant cells to identify disease-relevant perturbations.


Subject(s)
Gene-Environment Interaction , Genetic Predisposition to Disease , Genome-Wide Association Study , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Autoimmune Diseases/etiology , Autoimmune Diseases/pathology , Humans , Mental Disorders/etiology , Mental Disorders/pathology , Metabolic Diseases/etiology , Metabolic Diseases/pathology , Phenotype
11.
Cell Host Microbe ; 29(1): 121-131.e4, 2021 01 13.
Article in English | MEDLINE | ID: mdl-33290720

ABSTRACT

Small open reading frames (smORFs) and their encoded microproteins play central roles in microbes. However, there is a vast unexplored space of smORFs within human-associated microbes. A recent bioinformatic analysis used evolutionary conservation signals to enhance prediction of small protein families. To facilitate the annotation of specific smORFs, we introduce SmORFinder. This tool combines profile hidden Markov models of each smORF family and deep learning models that better generalize to smORF families not seen in the training set, resulting in predictions enriched for Ribo-seq translation signals. Feature importance analysis reveals that the deep learning models learn to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codon synonyms found in the codon table. A core-genome analysis of 26 bacterial species identifies several core smORFs of unknown function. We pre-compute smORF annotations for thousands of RefSeq isolate genomes and Human Microbiome Project metagenomes and provide these data through a public web portal.


Subject(s)
Bacteria/genetics , Genome, Bacterial , Molecular Sequence Annotation , Open Reading Frames , Bacterial Proteins/genetics , Computational Biology , Deep Learning , Humans , Markov Chains , Microbiota , Models, Theoretical
13.
Cell ; 181(5): 1112-1130.e16, 2020 05 28.
Article in English | MEDLINE | ID: mdl-32470399

ABSTRACT

Acute physical activity leads to several changes in metabolic, cardiovascular, and immune pathways. Although studies have examined selected changes in these pathways, the system-wide molecular response to an acute bout of exercise has not been fully characterized. We performed longitudinal multi-omic profiling of plasma and peripheral blood mononuclear cells including metabolome, lipidome, immunome, proteome, and transcriptome from 36 well-characterized volunteers, before and after a controlled bout of symptom-limited exercise. Time-series analysis revealed thousands of molecular changes and an orchestrated choreography of biological processes involving energy metabolism, oxidative stress, inflammation, tissue repair, and growth factor response, as well as regulatory pathways. Most of these processes were dampened and some were reversed in insulin-resistant participants. Finally, we discovered biological pathways involved in cardiopulmonary exercise response and developed prediction models revealing potential resting blood-based biomarkers of peak oxygen consumption.


Subject(s)
Energy Metabolism/physiology , Exercise/physiology , Aged , Biomarkers/metabolism , Female , Humans , Insulin/metabolism , Insulin Resistance , Leukocytes, Mononuclear/metabolism , Longitudinal Studies , Male , Metabolome , Middle Aged , Oxygen/metabolism , Oxygen Consumption , Proteome , Transcriptome
14.
Cell Host Microbe ; 27(1): 140-153.e9, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31862382

ABSTRACT

Mobile genetic elements (MGEs) contribute to bacterial adaptation and evolution; however, high-throughput, unbiased MGE detection remains challenging. We describe MGEfinder, a bioinformatic toolbox that identifies integrative MGEs and their insertion sites by using short-read sequencing data. MGEfinder identifies the genomic site of each MGE insertion and infers the identity of the inserted sequence. We apply MGEfinder to 12,374 sequenced isolates of 9 prevalent bacterial pathogens, including Mycobacterium tuberculosis, Staphylococcus aureus, and Escherichia coli, and identify thousands of MGEs, including candidate insertion sequences, conjugative transposons, and prophage elements. The MGE repertoire and insertion rates vary across species, and integration sites often cluster near genes related to antibiotic resistance, virulence, and pathogenicity. MGE insertions likely contribute to antibiotic resistance in laboratory experiments and clinical isolates. Additionally, we identified thousands of mobility genes, a subset of which have unknown function, opening avenues for exploration. Future application of MGEfinder to commensal bacteria will further illuminate bacterial adaptation and evolution.


Subject(s)
Bacteria/genetics , Computational Biology/methods , DNA Transposable Elements/genetics , Adaptation, Biological/genetics , Drug Resistance, Microbial/genetics , Prophages/genetics , Prophages/isolation & purification , Virulence/genetics
15.
Genome Biol ; 20(1): 230, 2019 11 04.
Article in English | MEDLINE | ID: mdl-31684996

ABSTRACT

BACKGROUND: Molecular and cellular changes are intrinsic to aging and age-related diseases. Prior cross-sectional studies have investigated the combined effects of age and genetics on gene expression and alternative splicing; however, there has been no long-term, longitudinal characterization of these molecular changes, especially in older age. RESULTS: We perform RNA sequencing in whole blood from the same individuals at ages 70 and 80 to quantify how gene expression, alternative splicing, and their genetic regulation are altered during this 10-year period of advanced aging at a population and individual level. We observe that individuals are more similar to their own expression profiles later in life than profiles of other individuals their own age. We identify 1291 and 294 genes differentially expressed and alternatively spliced with age, as well as 529 genes with outlying individual trajectories. Further, we observe a strong correlation of genetic effects on expression and splicing between the two ages, with a small subset of tested genes showing a reduction in genetic associations with expression and splicing in older age. CONCLUSIONS: These findings demonstrate that, although the transcriptome and its genetic regulation are mostly stable late in life, a small subset of genes is dynamic and is characterized by a reduction in genetic regulation, most likely due to increasing environmental variance with age.


Subject(s)
Aging/genetics , Alternative Splicing , Gene Expression Regulation , Aged , Aged, 80 and over , Aging/metabolism , Female , Humans , Male
16.
Nat Microbiol ; 4(6): 912-913, 2019 06.
Article in English | MEDLINE | ID: mdl-31118502
17.
New Phytol ; 215(3): 1264-1273, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28618009

ABSTRACT

Herbivory-induced defenses are specific and activated in plants when elicitors, frequently found in the herbivores' oral secretions, are introduced into wounds during attack. While complex signaling cascades are known to be involved, it remains largely unclear how natural selection has shaped the evolution of these induced defenses. We analyzed herbivory-induced transcriptomic responses in wild tobacco, Nicotiana attenuata, using a phylotranscriptomic approach that measures the origin and sequence divergence of herbivory-induced genes. Highly conserved and evolutionarily ancient genes of primary metabolism were activated at intermediate time points (2-6 h) after elicitation, while less constrained and young genes associated with defense signaling and biosynthesis of specialized metabolites were activated at early (before 2 h) and late (after 6 h) stages of the induced response, respectively - a pattern resembling the evolutionary hourglass pattern observed during embryogenesis in animals and the developmental process in plants and fungi. The hourglass patterns found in herbivory-induced defense responses and developmental process are both likely to be a result of signaling modularization and differential evolutionary constraints on the modules involved in the signaling cascade.


Subject(s)
Evolution, Molecular , Herbivory/genetics , Nicotiana/genetics , Transcriptome/genetics , Gene Expression Regulation, Plant , Genes, Plant , Signal Transduction , Nicotiana/immunology
18.
BMC Genet ; 16 Suppl 2: S3, 2015.
Article in English | MEDLINE | ID: mdl-25953496

ABSTRACT

BACKGROUND: The S31N amantadine-resistance mutation in the influenza A M2 sequence currently occurs more frequently in nature than the S31 wild type. Overcoming the resistance of the S31N mutation is the primary focus of M2 researchers who aim to develop novel antiviral therapies. Recent studies have noted a possible rise in frequency of the V27A/S31N double amantadine-resistance mutation in recent years. The purpose of this study is to investigate this recent rise in frequency of the double mutation and any possible bias of the other mutations toward co-occurrence with S31N or S31 strains. RESULTS: The primary dataset used for this study comprised 24,152 influenza A M2 channel sequences, which were downloaded from UniProt. There is an increased frequency for the S31N/V27A dual AR mutation in recent years, especially in swine. A test for difference in two proportions indicates that the V27A mutation is co-occurring with S31N more often than expected (p-value < 0.001) when considering individual amino acid frequencies. At the same time, the different propensities for the V27A as compared to the V27T dual mutant may reflect differences in viral fitness or protein energetics, and this information could be exploited to focus drug development so as to reduce further drug insensitivity. CONCLUSIONS: The development of the S31N/V27A variant in Midwestern US swine may be a harbinger of novel human strain development. V27A/S31N is a possible path forward for the evolution of M2 which may convey a new level of drug resistance and should receive attention in drug design.


Subject(s)
Amantadine/pharmacology , Drug Resistance, Viral , Influenza A virus/drug effects , Influenza A virus/genetics , Point Mutation , Viral Matrix Proteins/genetics , Animals , Influenza A virus/classification , Midwestern United States , Orthomyxoviridae Infections/veterinary , Orthomyxoviridae Infections/virology , Swine , Swine Diseases/virology