ABSTRACT
Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), which directly analyzes raw sequencing data, using a statistical test to detect a signature of regulation: sample-specific sequence variation. SPLASH detects many types of variation and can be efficiently run at scale. We show that SPLASH identifies complex mutation patterns in SARS-CoV-2, discovers regulated RNA isoforms at the single-cell level, detects the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a unifying approach to genomic analysis that enables expansive discovery without metadata or references.
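To make "sample-specific sequence variation" concrete, here is a minimal Python sketch under stated assumptions. Reads are tallied by the fixed-length sequence (the "target") that immediately follows a chosen k-mer (the "anchor") in each sample, and the resulting target-by-sample count table is scored with Pearson's chi-squared homogeneity statistic. The input format, the names `count_targets` and `chi_squared`, and the anchor/target framing are hypothetical. The chi-squared statistic is a standard stand-in and is not the statistical test introduced by SPLASH.

```python
from collections import Counter, defaultdict


def count_targets(samples, anchor, target_len):
    """Tally, per sample, the target_len-mer that follows each occurrence of `anchor`.

    samples maps a sample name to a list of read strings (hypothetical input format).
    Returns a dict: target sequence -> Counter of counts keyed by sample name.
    """
    table = defaultdict(Counter)
    for name, reads in samples.items():
        for read in reads:
            start = read.find(anchor)
            while start != -1:
                t0 = start + len(anchor)
                target = read[t0:t0 + target_len]
                if len(target) == target_len:  # skip anchors too close to the read end
                    table[target][name] += 1
                start = read.find(anchor, start + 1)  # allow overlapping anchors
    return table


def chi_squared(table, sample_names):
    """Pearson chi-squared statistic for homogeneity of target usage across samples."""
    targets = list(table)
    row = {t: sum(table[t][s] for s in sample_names) for t in targets}
    col = {s: sum(table[t][s] for t in targets) for s in sample_names}
    total = sum(row.values())
    if total == 0:
        return 0.0  # no anchor occurrences: nothing to test
    stat = 0.0
    for t in targets:
        for s in sample_names:
            expected = row[t] * col[s] / total
            if expected > 0:
                stat += (table[t][s] - expected) ** 2 / expected
    return stat
```

For example, if every read in sample A is `ACGTAAA` and every read in sample B is `ACGTCCC`, the anchor `ACGT` is followed by a completely sample-specific target, and the statistic is large; if both samples contain the two reads equally, it is 0. Under the usual large-count approximation, the statistic could be compared against a chi-squared distribution with (targets - 1) × (samples - 1) degrees of freedom.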
Subjects
Algorithms , Genomics , Genome , Sequence Analysis, RNA , Humans , HLA Antigens/genetics , Single-Cell Analysis
ABSTRACT
The authors have withdrawn this manuscript due to a duplicate posting of manuscript number BIORXIV/2022/497555. Therefore, the authors do not wish this work to be cited as a reference for the project. If you have any questions, please contact the corresponding author. The correct preprint can be found at doi: https://doi.org/10.1101/2022.06.24.497555.
ABSTRACT
Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a new unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), an approach that directly analyzes raw sequencing data to detect a signature of regulation: sample-specific sequence variation. The approach, which includes a new statistical test, is computationally efficient and can be run at scale. SPLASH unifies detection of myriad forms of sequence variation. We demonstrate that SPLASH identifies complex mutation patterns in SARS-CoV-2 strains, discovers regulated RNA isoforms at the single-cell level, documents the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a new unifying approach to genomic analysis that enables an expansive scope of discovery without metadata or references.
ABSTRACT
RNA processing, including splicing and alternative polyadenylation, is crucial to gene function and regulation, but methods to detect RNA processing from single-cell RNA sequencing data are limited by reliance on pre-existing annotations, peak-calling heuristics, and collapsing measurements by cell type. We introduce ReadZS, an annotation-free statistical approach to identify regulated RNA processing in single cells. ReadZS discovers cell type-specific RNA processing in human lung and conserved, developmentally regulated RNA processing in mammalian spermatogenesis, including global 3' UTR shortening in human spermatogenesis. ReadZS also discovers global 3' UTR lengthening in Arabidopsis development, highlighting the usefulness of this method in under-annotated transcriptomes.
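One way to picture an annotation-free, per-cell statistic is a z-score that asks whether a cell's reads in a genomic window sit upstream or downstream of the pooled read distribution for that window. The sketch below assumes that framing. The function `per_cell_z` and its standardization (the cell's mean read position against the pooled mean, scaled by the pooled population standard deviation divided by the square root of the cell's read count) are illustrative choices, not ReadZS's published formula.

```python
import math
from statistics import fmean, pstdev


def per_cell_z(positions_by_cell):
    """Illustrative per-cell z-scores for one genomic window.

    positions_by_cell maps a cell ID to the genomic positions of that cell's reads.
    Negative scores mean the cell's reads sit upstream of the pooled mean;
    positive scores mean downstream.
    """
    pooled = [p for ps in positions_by_cell.values() for p in ps]
    if not pooled:
        return {cell: 0.0 for cell in positions_by_cell}
    mu = fmean(pooled)
    sigma = pstdev(pooled)
    scores = {}
    for cell, ps in positions_by_cell.items():
        if not ps or sigma == 0:
            scores[cell] = 0.0  # no information: empty cell or no spread in the window
            continue
        scores[cell] = (fmean(ps) - mu) / (sigma / math.sqrt(len(ps)))
    return scores
```

Comparing such scores between cell types (for example, their medians) is one way to flag windows whose read positions shift with cell identity, as 3' UTR shortening or lengthening would, without consulting a transcript annotation.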
Subjects
Polyadenylation , Transcriptome , Animals , Humans , 3' Untranslated Regions , RNA-Seq , Sequence Analysis, RNA/methods , Mammals/genetics
ABSTRACT
Determining structures of protein complexes is crucial for understanding cellular functions. Here, we describe an integrative structure determination approach that relies on in vivo measurements of genetic interactions. We construct phenotypic profiles for point mutations crossed against gene deletions or exposed to environmental perturbations, followed by converting similarities between two profiles into an upper bound on the distance between the mutated residues. We determine the structure of the yeast histone H3-H4 complex based on ~500,000 genetic interactions of 350 mutants. We then apply the method to subunits Rpb1-Rpb2 of yeast RNA polymerase II and subunits RpoB-RpoC of bacterial RNA polymerase. The accuracy is comparable to that based on chemical cross-links; using restraints from both genetic interactions and cross-links further improves model accuracy and precision. The approach provides an efficient means to augment integrative structure determination with in vivo observations.
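The conversion from profile similarity to a distance restraint can be sketched as follows. This assumes Pearson correlation as the similarity measure and a simple linear map from similarity to an upper bound in ångströms. The names `profile_similarity` and `distance_upper_bound`, and the `d_min`/`d_max` defaults, are hypothetical placeholders, not the calibration used in the study.

```python
import math


def profile_similarity(x, y):
    """Pearson correlation between two equal-length phenotypic profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a profile with zero variance has no defined correlation")
    return cov / math.sqrt(vx * vy)


def distance_upper_bound(similarity, d_min=5.0, d_max=30.0):
    """HYPOTHETICAL linear map from similarity to an upper bound (in ångströms).

    Similarity 1 gives the tightest bound d_min; similarity <= 0 gives the loosest bound d_max.
    """
    s = max(0.0, min(1.0, similarity))  # clamp to [0, 1]
    return d_max - s * (d_max - d_min)
```

Pairs of mutants whose profiles correlate strongly receive tight bounds on the distance between the mutated residues. Bounds like these could then enter an integrative modeling scoring function alongside other restraints, such as those from chemical cross-links.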
Subjects
Multiprotein Complexes/chemistry , Multiprotein Complexes/genetics , Protein Interaction Maps/genetics , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/metabolism , Histones/chemistry , Histones/genetics , Mutation , Protein Conformation , Protein Interaction Mapping , Saccharomyces cerevisiae/genetics
ABSTRACT
Many microbial functions happen within communities of interacting species. Explaining how species with disparate growth rates can coexist is important for applications such as manipulating host-associated microbiota or engineering industrial communities. Here, we ask how microbes interacting through their chemical environment can achieve coexistence in a continuous growth setup (similar to an industrial bioreactor or gut microbiota) where external resources are being supplied. We formulate and experimentally constrain a model in which mediators of interactions (e.g. metabolites or waste products) are explicitly incorporated. Our model highlights facilitation and self-restraint as interactions that contribute to coexistence, consistent with our intuition. When interactions are strong, we observe that coexistence is determined primarily by the topology of facilitation and inhibition influences rather than by their strengths. Importantly, we show that consumption or degradation of chemical mediators moderates interaction strengths and promotes coexistence. Our results offer insights into how to build or restructure microbial communities of interest.
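A minimal worked illustration of how removing a mediator can weaken an interaction, assuming one mediator at concentration $C$, produced at a constant per-capita rate $p$ by a producer population $N_P$ held fixed, diluted at the continuous-culture dilution rate $D$, and removed by first-order consumption or degradation at rate $\delta$ (these symbols and functional forms are illustrative assumptions, not the authors' model):

$$\frac{dC}{dt} = p\,N_P - (D + \delta)\,C \quad\Longrightarrow\quad C^{*} = \frac{p\,N_P}{D + \delta}.$$

If the mediator shifts a recipient's per-capita growth rate linearly, $r_R = r_{R,0} + \alpha\,C$, then at the mediator steady state

$$r_R = r_{R,0} + \frac{\alpha\,p}{D + \delta}\,N_P.$$

The effective interaction coefficient $\alpha p/(D + \delta)$ shrinks as $\delta$ increases, so faster consumption or degradation of the mediator moderates the producer's influence on the recipient, whether that influence is facilitative ($\alpha > 0$) or inhibitory ($\alpha < 0$).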