ABSTRACT
Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), which directly analyzes raw sequencing data, using a statistical test to detect a signature of regulation: sample-specific sequence variation. SPLASH detects many types of variation and can be efficiently run at scale. We show that SPLASH identifies complex mutation patterns in SARS-CoV-2, discovers regulated RNA isoforms at the single-cell level, detects the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a unifying approach to genomic analysis that enables expansive discovery without metadata or references.
Subjects
Algorithms, Genomics, Genome, RNA Sequence Analysis, Humans, HLA Antigens/genetics, Single-Cell Analysis
ABSTRACT
Coronavirus genomes sequester their start codons within stem-loop 5 (SL5), a structured, 5' genomic RNA element. In most alpha- and betacoronaviruses, the secondary structure of SL5 is predicted to contain a four-way junction of helical stems, some of which are capped with UUYYGU hexaloops. Here, using cryogenic electron microscopy (cryo-EM) and computational modeling with biochemically determined secondary structures, we present three-dimensional structures of SL5 from six coronaviruses. The SL5 domain of betacoronavirus severe-acute-respiratory-syndrome-related coronavirus 2 (SARS-CoV-2), resolved at 4.7 Å resolution, exhibits a T-shaped structure, with its UUYYGU hexaloops at opposing ends of a coaxial stack, the T's "arms." Further analysis of SL5 domains from SARS-CoV-1 and MERS (7.1 and 6.4 to 6.9 Å resolution, respectively) indicates that the junction geometry and inter-hexaloop distances are conserved features across these human-infecting betacoronaviruses. The MERS SL5 domain displays an additional tertiary interaction, which is also observed in the non-human-infecting betacoronavirus BtCoV-HKU5 (5.9 to 8.0 Å resolution). SL5s from human-infecting alphacoronaviruses, HCoV-229E and HCoV-NL63 (6.5 and 8.4 to 9.0 Å resolution, respectively), exhibit the same coaxial stacks, including the UUYYGU-capped arms, but with a phylogenetically distinct crossing angle, an X-shape. As such, all SL5 domains studied herein fold into stable tertiary structures with cross-genus similarities and notable differences, with implications for potential protein-binding modes and therapeutic targets.
Subjects
Alphacoronavirus, COVID-19, Human Coronavirus 229E, Humans, SARS-CoV-2/genetics, RNA
ABSTRACT
The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible because of the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA-only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their x-ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include enhancing the accuracy of local interaction predictions and giving greater consideration to experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA-protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.
Subjects
Computational Biology, Proteins, Protein Conformation, Proteins/chemistry, Molecular Models, Computational Biology/methods, X-Ray Diffraction
ABSTRACT
The discovery and design of biologically important RNA molecules is outpacing three-dimensional structural characterization. Here, we demonstrate that cryo-electron microscopy can routinely resolve maps of RNA-only systems and that these maps enable subnanometer-resolution coordinate estimation when complemented with multidimensional chemical mapping and Rosetta DRRAFTER computational modeling. This hybrid 'Ribosolve' pipeline detects and falsifies homologies and conformational rearrangements in 11 previously unknown 119- to 338-nucleotide protein-free RNA structures: full-length Tetrahymena ribozyme, hc16 ligase with and without substrate, full-length Vibrio cholerae and Fusobacterium nucleatum glycine riboswitch aptamers with and without glycine, Mycobacterium SAM-IV riboswitch with and without S-adenosylmethionine, and the computer-designed ATP-TTR-3 aptamer with and without AMP. Simulation benchmarks, blind challenges, compensatory mutagenesis, cross-RNA homologies and internal controls demonstrate that Ribosolve can accurately resolve the global architectures of RNA molecules but does not resolve atomic details. These tests offer guidelines for making inferences in future RNA structural studies with similarly accelerated throughput.
Subjects
Cryoelectron Microscopy/methods, RNA/chemistry, Computer Simulation, Molecular Models, Nucleic Acid Conformation, Catalytic RNA/chemistry, Riboswitch
ABSTRACT
Metagenomic sequencing is a swift and powerful tool to ascertain the presence of an organism of interest in a sample. However, sequencing coverage of the organism of interest can be insufficient due to an inundation of reads from irrelevant organisms in the sample. Here, we report a nuclease-based approach to rapidly enrich for DNA from certain organisms, including enterobacteria, based on their differential endogenous modification patterns. We exploit the ability of taxon-specific methylated motifs to resist the action of cognate methylation-sensitive restriction endonucleases that thereby digest unwanted, unmethylated DNA. Subsequently, we use a distributive exonuclease or electrophoretic separation to deplete or exclude the digested fragments, thus enriching for undigested DNA from the organism of interest. As a proof of concept, we apply this method to enrich for the enterobacteria Escherichia coli and Salmonella enterica by 11- to 142-fold from mock metagenomic samples and validate this approach as a versatile means to enrich for genomes of interest in metagenomic samples.
IMPORTANCE: Pathogens that contaminate the food supply or spread through other means can cause outbreaks that bring devastating repercussions to the health of a populace. Investigations to trace the source of these outbreaks are initiated rapidly but can be drawn out due to the labored methods of pathogen isolation. Metagenomic sequencing can alleviate this hurdle but is often insufficiently sensitive. The approach and implementations detailed here provide a rapid means to enrich for many pathogens involved in foodborne outbreaks, thereby improving the utility of metagenomic sequencing as a tool in outbreak investigations. Additionally, this approach provides a means to broadly enrich for otherwise minute levels of modified DNA, which may escape unnoticed in metagenomic samples.
Subjects
DNA Restriction Enzymes, Bacterial DNA, Escherichia coli, Metagenomics, Salmonella enterica, DNA, Escherichia coli/genetics, Escherichia coli/isolation & purification, High-Throughput Nucleotide Sequencing, Metagenome, Metagenomics/methods, Salmonella enterica/genetics, Salmonella enterica/isolation & purification, Bacterial DNA/genetics
ABSTRACT
The rapid spread of COVID-19 is motivating development of antivirals targeting conserved SARS-CoV-2 molecular machinery. The SARS-CoV-2 genome includes conserved RNA elements that offer potential small-molecule drug targets, but most of their 3D structures have not been experimentally characterized. Here, we provide a compilation of chemical mapping data from our and other labs, secondary structure models, and 3D model ensembles based on Rosetta's FARFAR2 algorithm for SARS-CoV-2 RNA regions including the individual stems SL1-8 in the extended 5' UTR; the reverse complement of the 5' UTR SL1-4; the frameshift stimulating element (FSE); and the extended pseudoknot, hypervariable region, and s2m of the 3' UTR. For eleven of these elements (the stems in SL1-8, reverse complement of SL1-4, FSE, s2m and 3' UTR pseudoknot), modeling convergence supports the accuracy of predicted low energy states; subsequent cryo-EM characterization of the FSE confirms modeling accuracy. To aid efforts to discover small molecule RNA binders guided by computational models, we provide a second set of similarly prepared models for RNA riboswitches that bind small molecules. Both datasets ('FARFAR2-SARS-CoV-2', at https://github.com/DasLab/FARFAR2-SARS-CoV-2; and 'FARFAR2-Apo-Riboswitch', at https://github.com/DasLab/FARFAR2-Apo-Riboswitch) include up to 400 models for each RNA element, which may facilitate drug discovery approaches targeting dynamic ensembles of RNA molecules.
Subjects
Consensus, Molecular Models, Nucleic Acid Conformation, Viral RNA/chemistry, SARS-CoV-2/genetics, 3' Untranslated Regions/genetics, 5' Untranslated Regions/genetics, Algorithms, Nucleotide Aptamers/genetics, Base Sequence, Binding Sites, Cryoelectron Microscopy, Datasets as Topic, Preclinical Drug Evaluation/methods, Ribosomal Frameshifting/genetics, Viral Genome/genetics, RNA Stability, Viral RNA/genetics, Reproducibility of Results, Riboswitch/genetics, Small Molecule Libraries/chemistry
ABSTRACT
As the COVID-19 outbreak spreads, there is a growing need for a compilation of conserved RNA genome regions in the SARS-CoV-2 virus along with their structural propensities to guide development of antivirals and diagnostics. Here we present a first look at RNA sequence conservation and structural propensities in the SARS-CoV-2 genome. Using sequence alignments spanning a range of betacoronaviruses, we rank genomic regions by RNA sequence conservation, identifying 79 regions of length at least 15 nt as exactly conserved over SARS-related complete genome sequences available near the beginning of the COVID-19 outbreak. We then confirm the conservation of the majority of these genome regions across 739 SARS-CoV-2 sequences subsequently reported from the COVID-19 outbreak, and we present a curated list of 30 "SARS-related-conserved" regions. We find that known RNA structured elements curated as Rfam families and in prior literature are enriched in these conserved genome regions, and we predict additional conserved, stable secondary structures across the viral genome. We provide 106 "SARS-CoV-2-conserved-structured" regions as potential targets for antivirals that bind to structured RNA. We further provide detailed secondary structure models for the extended 5' UTR, frameshifting stimulation element, and 3' UTR. Lastly, we predict regions of the SARS-CoV-2 viral genome that have low propensity for RNA secondary structure and are conserved within SARS-CoV-2 strains. These 59 "SARS-CoV-2-conserved-unstructured" genomic regions may be most easily accessible by hybridization in primer-based diagnostic strategies.
Subjects
Betacoronavirus/genetics, Viral RNA/chemistry, Viral RNA/genetics, Base Sequence, Betacoronavirus/classification, Molecular Evolution, Viral Genome, Nucleic Acid Conformation, SARS-CoV-2, Sequence Alignment, Thermodynamics
ABSTRACT
In addition to encoding the tertiary fold and stability, the primary sequence of a protein encodes the folding trajectory and kinetic barriers that determine the speed of folding. How these kinetic barriers are encoded is not well understood. Here, we use evolutionary sequence variation in the α-lytic protease (αLP) protein family to probe the relationship between sequence and energy landscape. αLP has an unusual energy landscape: the native state of αLP is not the most thermodynamically favored conformation and, instead, remains folded due to a large kinetic barrier preventing unfolding. To fold, αLP utilizes an N-terminal pro region similar in size to the protease itself that functions as a folding catalyst. Once folded, the pro region is removed, and the native state does not unfold on a biologically relevant time scale. Without the pro region, αLP folds on the order of millennia. A phylogenetic search uncovers αLP homologs with a wide range of pro region sizes, including some with no pro region at all. In the resulting phylogenetic tree, these homologs cluster by pro region size. By studying homologs naturally lacking a pro region, we demonstrate they can be thermodynamically stable, fold much faster than αLP, yet retain the same fold as αLP. Key amino acids thought to contribute to αLP's extreme kinetic stability are lost in these homologs, supporting their role in kinetic stability. This study highlights how the entire energy landscape plays an important role in determining the evolutionary pressures on the protein sequence.
Subjects
Bacterial Proteins/chemistry, Molecular Evolution, Molecular Models, Phylogeny, Protein Folding, Serine Endopeptidases/chemistry, Bacterial Proteins/genetics, Enzyme Stability, Kinetics, Serine Endopeptidases/genetics
ABSTRACT
Here, we describe the "Obelisks," a previously unrecognised class of viroid-like elements that we first identified in human gut metatranscriptomic data. "Obelisks" share several properties: (i) apparently circular RNA ~1 kb genome assemblies, (ii) predicted rod-like secondary structures encompassing the entire genome, and (iii) open reading frames coding for a novel protein superfamily, which we call the "Oblins". We find that Obelisks form their own distinct phylogenetic group with no detectable sequence or structural similarity to known biological agents. Further, Obelisks are prevalent in tested human microbiome metatranscriptomes with representatives detected in ~7% of analysed stool metatranscriptomes (29/440) and in ~50% of analysed oral metatranscriptomes (17/32). Obelisk compositions appear to differ between the anatomic sites and are capable of persisting in individuals, with continued presence over >300 days observed in one case. Large scale searches identified 29,959 Obelisks (clustered at 90% nucleotide identity), with examples from all seven continents and in diverse ecological niches. From this search, a subset of Obelisks is found to code for Obelisk-specific variants of the hammerhead type-III self-cleaving ribozyme. Lastly, we identified one case of a bacterial species (Streptococcus sanguinis) in which a subset of defined laboratory strains harboured a specific Obelisk RNA population. As such, Obelisks comprise a class of diverse RNAs that have colonised, and gone unnoticed in, human and global microbiomes.
ABSTRACT
Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a new unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), an approach that directly analyzes raw sequencing data to detect a signature of regulation: sample-specific sequence variation. The approach, which includes a new statistical test, is computationally efficient and can be run at scale. SPLASH unifies detection of myriad forms of sequence variation. We demonstrate that SPLASH identifies complex mutation patterns in SARS-CoV-2 strains, discovers regulated RNA isoforms at the single cell level, documents the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a new unifying approach to genomic analysis that enables an expansive scope of discovery without metadata or references.
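As an illustration of what "sample-specific sequence variation" can mean in practice, the following is a minimal sketch, not the SPLASH implementation. It assumes a framing the abstract does not spell out: each fixed k-mer (called an "anchor" here) is paired with the k-mer that follows it (a "target"), and target counts are tabulated per sample. A plain Pearson chi-squared statistic is swapped in for the new statistical test mentioned above, whose form is not given in this abstract. The function names, `k`, and `gap` are hypothetical.

```python
from collections import Counter, defaultdict


def anchor_target_counts(reads_by_sample, k=4, gap=0):
    """Tabulate, for each anchor k-mer, how often each downstream target
    k-mer appears in each sample. Works directly on raw reads, no reference.

    Returns {anchor: {target: Counter({sample: count})}}.
    """
    tables = defaultdict(lambda: defaultdict(Counter))
    for sample, reads in reads_by_sample.items():
        for read in reads:
            # Slide over every position where anchor + gap + target fits.
            for i in range(len(read) - 2 * k - gap + 1):
                anchor = read[i:i + k]
                target = read[i + k + gap:i + 2 * k + gap]
                tables[anchor][target][sample] += 1
    return tables


def chi_squared(table, samples):
    """Pearson chi-squared statistic for one anchor's target-by-sample table.

    A large value suggests the target distribution depends on the sample.
    (Stand-in statistic; an assumption, not the test used by SPLASH.)
    """
    row_totals = {t: sum(c.values()) for t, c in table.items()}
    col_totals = {s: sum(c[s] for c in table.values()) for s in samples}
    n = sum(row_totals.values())
    stat = 0.0
    for target, counts in table.items():
        for sample in samples:
            expected = row_totals[target] * col_totals[sample] / n
            if expected > 0:
                stat += (counts[sample] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Toy data: the same anchor is followed by a different target in each sample.
    reads = {"s1": ["AAAACCCC"] * 5, "s2": ["AAAAGGGG"] * 5}
    tables = anchor_target_counts(reads, k=4)
    print(chi_squared(tables["AAAA"], ["s1", "s2"]))
```

In this toy input the anchor `AAAA` is followed by a different target in each sample, so the statistic is large; if both samples carried identical reads it would be zero. A real analysis would also need a significance threshold and multiple-testing correction, which this sketch omits.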
ABSTRACT
The authors have withdrawn this manuscript due to a duplicate posting of manuscript number BIORXIV/2022/497555. Therefore, the authors do not wish this work to be cited as reference for the project. If you have any questions, please contact the corresponding author. The correct preprint can be found at doi: https://doi.org/10.1101/2022.06.24.497555.
ABSTRACT
Coronavirus genomes sequester their start codons within stem-loop 5 (SL5), a structured, 5' genomic RNA element. In most alpha- and betacoronaviruses, the secondary structure of SL5 is predicted to contain a four-way junction of helical stems, some of which are capped with UUYYGU hexaloops. Here, using cryogenic electron microscopy (cryo-EM) and computational modeling with biochemically determined secondary structures, we present three-dimensional structures of SL5 from six coronaviruses. The SL5 domain of betacoronavirus SARS-CoV-2, resolved at 4.7 Å resolution, exhibits a T-shaped structure, with its UUYYGU hexaloops at opposing ends of a coaxial stack, the T's "arms." Further analysis of SL5 domains from SARS-CoV-1 and MERS (7.1 and 6.4-6.9 Å resolution, respectively) indicates that the junction geometry and inter-hexaloop distances are conserved features across the studied human-infecting betacoronaviruses. The MERS SL5 domain displays an additional tertiary interaction, which is also observed in the non-human-infecting betacoronavirus BtCoV-HKU5 (5.9-8.0 Å resolution). SL5s from human-infecting alphacoronaviruses, HCoV-229E and HCoV-NL63 (6.5 and 8.4-9.0 Å resolution, respectively), exhibit the same coaxial stacks, including the UUYYGU-capped arms, but with a phylogenetically distinct crossing angle, an X-shape. As such, all SL5 domains studied herein fold into stable tertiary structures with cross-genus similarities, with implications for potential protein-binding modes and therapeutic targets.
ABSTRACT
Earth's life may have originated as self-replicating RNA, and it has been argued that RNA viruses and viroid-like elements are remnants of such a pre-cellular RNA world. RNA viruses are defined by linear RNA genomes encoding an RNA-dependent RNA polymerase (RdRp), whereas viroid-like elements consist of small, single-stranded, circular RNA genomes that, in some cases, encode paired self-cleaving ribozymes. Here we show that the number of candidate viroid-like elements occurring in geographically and ecologically diverse niches is much higher than previously thought. We report that, amongst these circular genomes, fungal ambiviruses are viroid-like elements that undergo rolling circle replication and encode their own viral RdRp. Thus, ambiviruses are distinct infectious RNAs showing hybrid features of viroid-like RNAs and viruses. We also detected similar circular RNAs, containing active ribozymes and encoding RdRps, related to mitochondrial-like fungal viruses, highlighting fungi as an evolutionary hub for RNA viruses and viroid-like elements. Our findings point to a deep co-evolutionary history between RNA viruses and subviral elements and offer new perspectives on the origin and evolution of primordial infectious agents and RNA life.
Subjects
RNA Viruses, Catalytic RNA, Viroids, Viroids/genetics, Catalytic RNA/genetics, Viral RNA/genetics, Virus Replication/genetics, RNA/genetics, RNA Viruses/genetics, RNA-Dependent RNA Polymerase/genetics, Fungi/genetics
ABSTRACT
Drug discovery campaigns against COVID-19 are beginning to target the SARS-CoV-2 RNA genome. The highly conserved frameshift stimulation element (FSE), required for balanced expression of viral proteins, is a particularly attractive SARS-CoV-2 RNA target. Here we present a 6.9 Å resolution cryo-EM structure of the FSE (88 nucleotides, ~28 kDa), validated through an RNA nanostructure tagging method. The tertiary structure presents a topologically complex fold in which the 5' end is threaded through a ring formed inside a three-stem pseudoknot. Guided by this structure, we develop antisense oligonucleotides that impair FSE function in frameshifting assays and knock down SARS-CoV-2 virus replication in A549-ACE2 cells at 100 nM concentration.
Subjects
COVID-19/prevention & control, Cryoelectron Microscopy/methods, Frameshift Mutation/genetics, Antisense Oligonucleotides/genetics, Viral RNA/genetics, Response Elements/genetics, SARS-CoV-2/genetics, A549 Cells, Animals, Base Sequence, COVID-19/virology, Tumor Cell Line, Chlorocebus aethiops, Viral Genome/genetics, Humans, Molecular Models, Nucleic Acid Conformation, Antisense Oligonucleotides/pharmacology, Viral RNA/chemistry, Viral RNA/ultrastructure, SARS-CoV-2/physiology, SARS-CoV-2/ultrastructure, Vero Cells, Virus Replication/drug effects, Virus Replication/genetics
ABSTRACT
As the COVID-19 outbreak spreads, there is a growing need for a compilation of conserved RNA genome regions in the SARS-CoV-2 virus along with their structural propensities to guide development of antivirals and diagnostics. Using sequence alignments spanning a range of betacoronaviruses, we rank genomic regions by RNA sequence conservation, identifying 79 regions of length at least 15 nucleotides as exactly conserved over SARS-related complete genome sequences available near the beginning of the COVID-19 outbreak. We then confirm the conservation of the majority of these genome regions across 739 SARS-CoV-2 sequences reported to date from the current COVID-19 outbreak, and we present a curated list of 30 'SARS-related-conserved' regions. We find that known RNA structured elements curated as Rfam families and in prior literature are enriched in these conserved genome regions, and we predict additional conserved, stable secondary structures across the viral genome. We provide 106 'SARS-CoV-2-conserved-structured' regions as potential targets for antivirals that bind to structured RNA. We further provide detailed secondary structure models for the 5' UTR, frame-shifting element, and 3' UTR. Last, we predict regions of the SARS-CoV-2 viral genome that have low propensity for RNA secondary structure and are conserved within SARS-CoV-2 strains. These 59 'SARS-CoV-2-conserved-unstructured' genomic regions may be most easily targeted in primer-based diagnostic and oligonucleotide-based therapeutic strategies.
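The criterion described above (regions of at least 15 nucleotides that are exactly conserved across aligned genomes) can be sketched in a few lines. This is only an illustrative reading of that criterion, not the authors' pipeline: it assumes the sequences are already aligned to equal length, are uppercase, and use `-` for gaps, and the function name `conserved_regions` is hypothetical.

```python
from typing import List, Tuple


def conserved_regions(aligned: List[str], min_len: int = 15) -> List[Tuple[int, int]]:
    """Return half-open (start, end) alignment-column ranges of length
    >= min_len in which every sequence carries the same nucleotide.

    Columns containing a gap ('-') or ambiguous base ('N') in the first
    sequence are treated as not conserved (an assumption of this sketch).
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must be aligned to the same length")

    regions = []
    run_start = None
    # Iterate one column past the end so a run touching the last column is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and aligned[0][col] not in "-N"
            and all(seq[col] == aligned[0][col] for seq in aligned)
        )
        if conserved:
            if run_start is None:
                run_start = col
        else:
            if run_start is not None and col - run_start >= min_len:
                regions.append((run_start, col))
            run_start = None
    return regions


if __name__ == "__main__":
    toy_alignment = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAC"]
    # A short min_len is used only because the toy alignment is tiny.
    print(conserved_regions(toy_alignment, min_len=4))
```

Coordinates returned here are alignment columns; mapping them back to positions in a particular genome would require accounting for gaps in that genome's aligned row.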
ABSTRACT
Drug discovery campaigns against Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) are beginning to target the viral RNA genome [1, 2]. The frameshift stimulation element (FSE) of the SARS-CoV-2 genome is required for balanced expression of essential viral proteins and is highly conserved, making it a potential candidate for antiviral targeting by small molecules and oligonucleotides [3-6]. To aid global efforts focusing on SARS-CoV-2 frameshifting, we report exploratory results from frameshifting and cellular replication experiments with locked nucleic acid (LNA) antisense oligonucleotides (ASOs), which support the FSE as a therapeutic target but highlight difficulties in achieving strong inactivation. To understand current limitations, we applied cryogenic electron microscopy (cryo-EM) and the Ribosolve [7] pipeline to determine a three-dimensional structure of the SARS-CoV-2 FSE, validated through an RNA nanostructure tagging method. This is the smallest macromolecule (88 nt; 28 kDa) resolved by single-particle cryo-EM at subnanometer resolution to date. The tertiary structure model, defined to an estimated accuracy of 5.9 Å, presents a topologically complex fold in which the 5' end threads through a ring formed inside a three-stem pseudoknot. Our results suggest an updated model for SARS-CoV-2 frameshifting as well as binding sites that may be targeted by next generation ASOs and small molecules.
ABSTRACT
Vesicles formed from single-chain amphiphiles (SCAs) such as fatty acids probably played an important role in the origin of life. A major criticism of the hypothesis that life arose in an early ocean hydrothermal environment is that hot temperatures, large pH gradients, high salinity and abundant divalent cations should preclude vesicle formation. However, these arguments are based on model vesicles using 1-3 SCAs, even though Fischer-Tropsch-type synthesis under hydrothermal conditions produces a wide array of fatty acids and 1-alkanols, including abundant C10-C15 compounds. Here, we show that mixtures of these C10-C15 SCAs form vesicles in aqueous solutions between pH ~6.5 and >12 at modern seawater concentrations of NaCl, Mg2+ and Ca2+. Adding C10 isoprenoids improves vesicle stability even further. Vesicles form most readily at temperatures of ~70 °C and require salinity and strongly alkaline conditions to self-assemble. Thus, alkaline hydrothermal conditions not only permit protocell formation at the origin of life but actively favour it.