1.
Cell ; 186(25): 5440-5456.e26, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38065078

ABSTRACT

Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), which directly analyzes raw sequencing data, using a statistical test to detect a signature of regulation: sample-specific sequence variation. SPLASH detects many types of variation and can be efficiently run at scale. We show that SPLASH identifies complex mutation patterns in SARS-CoV-2, discovers regulated RNA isoforms at the single-cell level, detects the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a unifying approach to genomic analysis that enables expansive discovery without metadata or references.


Subject(s)
Algorithms; Genomics; Genome; Sequence Analysis, RNA; Humans; HLA Antigens/genetics; Single-Cell Analysis
2.
Proc Natl Acad Sci U S A ; 121(15): e2304671121, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38564640

ABSTRACT

Contingency tables, data represented as counts matrices, are ubiquitous across quantitative research and data-science applications. Existing statistical tests are insufficient, however, as none are simultaneously computationally efficient and statistically valid for a finite number of observations. In this work, motivated by a recent application in reference-free genomic inference [K. Chaung et al., Cell 186, 5440-5456 (2023)], we develop Optimized Adaptive Statistic for Inferring Structure (OASIS), a family of statistical tests for contingency tables. OASIS constructs a test statistic which is linear in the normalized data matrix, providing closed-form P-value bounds through classical concentration inequalities. In the process, OASIS provides a decomposition of the table, lending interpretability to its rejection of the null. We derive the asymptotic distribution of the OASIS test statistic, showing that these finite-sample bounds correctly characterize the test statistic's P-value up to a variance term. Experiments on genomic sequencing data highlight the power and interpretability of OASIS. Using OASIS, we develop a method that can detect SARS-CoV-2 and Mycobacterium tuberculosis strains de novo, which existing approaches cannot achieve. We demonstrate in simulations that OASIS is robust to overdispersion, a common feature in genomic data like single-cell RNA sequencing, where under accepted noise models OASIS provides good control of the false discovery rate, while Pearson's chi-squared test consistently rejects the null. Additionally, we show in simulations that OASIS is more powerful than Pearson's chi-squared test in certain regimes, including for some important two-group alternatives, which we corroborate with approximate power calculations.


Subject(s)
Genome; Genomics; Chromosome Mapping
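
The abstract for this record describes a test statistic that is linear in the normalized count matrix, with a closed-form P-value bound obtained from a concentration inequality. A minimal Python sketch of that general idea follows. The function name `linear_table_test`, the row weights `f` (one value in [0, 1] per row), the column weights `c`, and the Hoeffding-style bound are all assumptions made for illustration; the exact normalization and constants used by OASIS should be taken from the paper itself, not from this sketch.

```python
import math

def linear_table_test(table, f, c):
    """Illustrative linear statistic on an I x J table of counts.

    table: list of I rows, each a list of J non-negative counts
           (every column is assumed to have a positive total).
    f:     I row weights, each in [0, 1] (hypothetical input).
    c:     J column weights (hypothetical input).
    Returns (statistic, upper bound on the two-sided P-value).
    """
    n_rows, n_cols = len(table), len(table[0])
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(col_totals)

    # Mean of f over all observations, pooled across columns.
    pooled = sum(f[i] * sum(table[i]) for i in range(n_rows)) / total

    # Sum over columns of c_j * sqrt(n_j) * (column mean of f - pooled mean).
    stat = 0.0
    for j in range(n_cols):
        col_mean = sum(f[i] * table[i][j] for i in range(n_rows)) / col_totals[j]
        stat += c[j] * math.sqrt(col_totals[j]) * (col_mean - pooled)

    # Under the null (rows drawn i.i.d. within every column), stat is a
    # weighted sum of independent values in [0, 1]; the squared weights add
    # up to `denom`, so Hoeffding's inequality gives the bound below.
    inner = sum(c[j] * math.sqrt(col_totals[j]) for j in range(n_cols))
    denom = sum(x * x for x in c) - inner * inner / total
    if denom <= 0:
        return stat, 1.0
    return stat, min(1.0, 2.0 * math.exp(-2.0 * stat * stat / denom))
```

With `f = [1.0, 0.0]` and `c = [1.0, -1.0]`, a table whose two columns have identical row proportions gives a statistic of zero and a bound of 1, while a table whose columns use disjoint rows gives a very small bound. Because the bound holds for any fixed choice of `f` and `c`, it is valid for finite samples, which is the property the abstract emphasizes.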
3.
Nat Methods ; 20(8): 1159-1169, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37443337

ABSTRACT

The detection of circular RNA molecules (circRNAs) is typically based on short-read RNA sequencing data processed using computational tools. Numerous such tools have been developed, but a systematic comparison with orthogonal validation is missing. Here, we set up a circRNA detection tool benchmarking study, in which 16 tools detected more than 315,000 unique circRNAs in three deeply sequenced human cell types. Next, 1,516 predicted circRNAs were validated using three orthogonal methods. Generally, tool-specific precision is high and similar (median of 98.8%, 96.3% and 95.5% for qPCR, RNase R and amplicon sequencing, respectively) whereas the sensitivity and number of predicted circRNAs (ranging from 1,372 to 58,032) are the most significant differentiators. Of note, precision values are lower when evaluating low-abundance circRNAs. We also show that the tools can be used complementarily to increase detection sensitivity. Finally, we offer recommendations for future circRNA detection and validation.


Subject(s)
Benchmarking; RNA, Circular; Humans; RNA, Circular/genetics; RNA/genetics; RNA/metabolism; Sequence Analysis, RNA/methods
4.
Nat Methods ; 19(3): 307-310, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35241832

ABSTRACT

Detecting single-cell-regulated splicing from droplet-based technologies is challenging. Here, we introduce the splicing Z score (SpliZ), an annotation-free statistical method to detect regulated splicing in single-cell RNA sequencing. We applied the SpliZ to human lung cells, discovering hundreds of genes with cell-type-specific splicing patterns including ones with potential implications for basic and translational biology.


Subject(s)
Alternative Splicing; RNA Splicing; Humans
5.
Nucleic Acids Res ; 50(21): 12400-12424, 2022 Nov 28.
Article in English | MEDLINE | ID: mdl-35947650

ABSTRACT

Trimethylguanosine synthase 1 (TGS1) is a highly conserved enzyme that converts the 5'-monomethylguanosine cap of small nuclear RNAs (snRNAs) to a trimethylguanosine cap. Here, we show that loss of TGS1 in Caenorhabditis elegans, Drosophila melanogaster and Danio rerio results in neurological phenotypes similar to those caused by survival motor neuron (SMN) deficiency. Importantly, expression of human TGS1 ameliorates the SMN-dependent neurological phenotypes in both flies and worms, revealing that TGS1 can partly counteract the effects of SMN deficiency. TGS1 loss in HeLa cells leads to the accumulation of immature U2 and U4atac snRNAs with long 3' tails that are often uridylated. snRNAs with defective 3' terminations also accumulate in Drosophila Tgs1 mutants. Consistent with defective snRNA maturation, TGS1 and SMN mutant cells also exhibit partially overlapping transcriptome alterations that include aberrantly spliced and readthrough transcripts. Together, these results identify a neuroprotective function for TGS1 and reinforce the view that defective snRNA maturation affects neuronal viability and function.


Subject(s)
Methyltransferases; Motor Neurons; RNA, Small Nuclear; Animals; Humans; Caenorhabditis elegans/genetics; Caenorhabditis elegans/metabolism; Drosophila/genetics; Drosophila melanogaster/genetics; Drosophila melanogaster/metabolism; HeLa Cells; Motor Neurons/metabolism; Motor Neurons/pathology; Phenotype; RNA, Small Nuclear/metabolism; Methyltransferases/metabolism
6.
Nat Rev Genet ; 17(11): 679-692, 2016 Oct 14.
Article in English | MEDLINE | ID: mdl-27739534

ABSTRACT

The pervasive expression of circular RNAs (circRNAs) is a recently discovered feature of gene expression in highly diverged eukaryotes. Numerous algorithms that are used to detect genome-wide circRNA expression from RNA sequencing (RNA-seq) data have been developed in the past few years, but there is little overlap in their predictions and no clear gold-standard method to assess the accuracy of these algorithms. We review sources of experimental and bioinformatic biases that complicate the accurate discovery of circRNAs and discuss statistical approaches to address these biases. We conclude with a discussion of the current experimental progress on the topic.


Subject(s)
Computational Biology/methods; Molecular Sequence Annotation/statistics & numerical data; RNA/metabolism; Sequence Analysis, RNA/methods; Databases, Nucleic Acid; Humans; Molecular Sequence Annotation/methods; RNA/chemistry; RNA, Circular; Software
7.
Proc Natl Acad Sci U S A ; 116(31): 15524-15533, 2019 Jul 30.
Article in English | MEDLINE | ID: mdl-31308241

ABSTRACT

The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce Data-Enriched Efficient PrEcise STatistical fusion detection (DEEPEST), an algorithm that uses statistical modeling to minimize false-positives while increasing the sensitivity of fusion detection. In 9,946 tumor RNA-sequencing datasets from The Cancer Genome Atlas (TCGA) across 33 tumor types, DEEPEST identifies 31,007 fusions, 30% more than identified by other methods, while calling 10-fold fewer false-positive fusions in nontransformed human tissues. We leverage the increased precision of DEEPEST to discover fundamental cancer biology. Namely, 888 candidate oncogenes are identified based on overrepresentation in DEEPEST calls, and 1,078 previously unreported fusions involving long intergenic noncoding RNAs, demonstrating a previously unappreciated prevalence and potential for function. DEEPEST also reveals a high enrichment for fusions involving oncogenes in cancers, including ovarian cancer, which has had minimal treatment advances in recent decades, finding that more than 50% of tumors harbor gene fusions predicted to be oncogenic. Specific protein domains are enriched in DEEPEST calls, indicating a global selection for fusion functionality: kinase domains are nearly 2-fold more enriched in DEEPEST calls than expected by chance, as are domains involved in (anaerobic) metabolism and DNA binding. The statistical algorithms, population-level analytic framework, and the biological conclusions of DEEPEST call for increased attention to gene fusions as drivers of cancer and for future research into using fusions for targeted therapy.


Subject(s)
Gene Fusion; Neoplasms/genetics; Oncogenes; RNA, Neoplasm/genetics; Statistics as Topic; Algorithms; Base Sequence; Databases, Genetic; Genomic Instability; Humans; Proteome/metabolism; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism
9.
Bioinformatics ; 35(8): 1263-1268, 2019 Apr 15.
Article in English | MEDLINE | ID: mdl-30192918

ABSTRACT

MOTIVATION: Identification of splice sites is critical to gene annotation and to determining which sequences control circRNA biogenesis. Full-length RNA transcripts could in principle complete annotations of introns and exons in genomes without external ontologies, i.e., ab initio. However, whether it is possible to reconstruct genomic positions where splicing occurs from full-length transcripts, even if sampled in the absence of noise, depends on the genome sequence composition. If it is not, there exist provable limits on the use of RNA-Seq to define splice locations (linear or circular) in the genome. RESULTS: We provide a formal definition of splice site ambiguity due to the genomic sequence by introducing the equivalent junction, which is the set of local genomic positions resulting in the same RNA sequence when joined through RNA splicing. We show that equivalent junctions are prevalent in diverse eukaryotic genomes and occur in 88.64% and 78.64% of annotated human splice sites in linear and circRNA junctions, respectively. The observed fractions of equivalent junctions and the frequency of many individual motifs are statistically significant when compared against the null distribution computed via simulation or in closed form. The frequency of equivalent junctions establishes a fundamental limit on the possibility of ab initio reconstruction of RNA transcripts without appealing to the ontology of "GT-AG" boundaries defining introns. Said differently, completely ab initio reconstruction is impossible at the vast majority of splice sites in annotated circRNAs and linear transcripts. AVAILABILITY AND IMPLEMENTATION: Two Python scripts generating an equivalent junction sequence per junction are available at: https://github.com/salzmanlab/Equivalent-Junctions. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome, Human; Alternative Splicing; Exons; Humans; Introns; RNA Splice Sites; RNA Splicing; RNA, Circular
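
The definition of an equivalent junction in this record (the set of local genomic positions that give the same RNA sequence when joined by splicing) can be illustrated with a short Python sketch. This is not the code from the linked repository. The function name `equivalent_junctions`, the coordinate convention (the upstream exon is `genome[:donor]`, the downstream exon is `genome[acceptor:]`), the `max_shift` window, and the toy sequence are all assumptions chosen for illustration, and only a linear (not circular) junction is handled.

```python
def equivalent_junctions(genome, donor, acceptor, max_shift=10):
    """List (donor, acceptor) pairs that produce the same spliced sequence.

    Hypothetical convention: the spliced product is
    genome[:donor] + genome[acceptor:], so the removed intron is
    genome[donor:acceptor]. Both boundaries are shifted together by the
    same offset, which keeps the intron length fixed.
    """
    reference = genome[:donor] + genome[acceptor:]
    matches = []
    for shift in range(-max_shift, max_shift + 1):
        d, a = donor + shift, acceptor + shift
        if d < 0 or a > len(genome):
            continue  # shifted boundary falls outside the sequence
        if genome[:d] + genome[a:] == reference:
            matches.append((d, a))
    return matches

# Toy example: exon "ACTC", intron "GTTTTCAG", exon "GCAT".
toy_genome = "ACTC" + "GTTTTCAG" + "GCAT"
print(equivalent_junctions(toy_genome, 4, 12))
```

For the toy sequence this prints `[(4, 12), (5, 13)]`: removing `GTTTTCAG` or removing `TTTTCAGG` yields the same transcript `ACTCGCAT`, because the first base of the intron equals the first base of the downstream exon. Only the first choice has GT-AG boundaries, which is why the transcript alone cannot locate the splice site without appealing to that rule.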
10.
PLoS Comput Biol ; 15(12): e1007537, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31830035

ABSTRACT

Next-generation sequencing is a cutting-edge technology, but quantifying a dynamic range of abundances for different RNA or DNA species requires increasing sampling depth to levels that can be prohibitively expensive due to physical limits on the molecular throughput of sequencers. To overcome this problem, we introduce a new general sampling theory which uses biophysical principles to functionally encode the abundance of a species before sampling, SeQUential depletIon and enriCHment (SQUICH). In theory and simulation, SQUICH enables sampling at a logarithmic rate to achieve the same precision as attained with conventional sequencing. A simple proof-of-principle experimental implementation of SQUICH in a controlled complex system of ~262,000 oligonucleotides already reduces sequencing depth by a factor of 10. SQUICH lays the groundwork for a general solution to a fundamental problem in molecular sampling and enables a new generation of efficient, precise molecular measurement at logarithmic or better sampling depth.


Subject(s)
High-Throughput Nucleotide Sequencing/methods; Base Sequence; Computational Biology; Computer Simulation; DNA/genetics; High-Throughput Nucleotide Sequencing/statistics & numerical data; Proof of Concept Study; RNA/genetics; Sampling Studies; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; Sequence Analysis, RNA/methods; Sequence Analysis, RNA/statistics & numerical data; Species Specificity
11.
PLoS Genet ; 13(12): e1007114, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29236709

ABSTRACT

ciRS-7 is an intensely studied, highly expressed and conserved circRNA. Essentially nothing is known about its biogenesis, including the location of its promoter. A prevailing assumption has been that ciRS-7 is an exceptional circRNA because it is transcribed from a locus lacking any mature linear RNA transcripts of the same sense. To study the biogenesis of ciRS-7, we developed an algorithm to define its promoter and predicted that the human ciRS-7 promoter coincides with that of the long non-coding RNA, LINC00632. We validated this prediction using multiple orthogonal experimental assays. We also used computational approaches and experimental validation to establish that ciRS-7 exonic sequence is embedded in linear transcripts that are flanked by cryptic exons in both human and mouse. Together, this experimental and computational evidence generates a new model for regulation of this locus: (a) ciRS-7 is like other circRNAs, as it is spliced into linear transcripts; (b) expression of ciRS-7 is primarily determined by the chromatin state of LINC00632 promoters; (c) transcription and splicing factors sufficient for ciRS-7 biogenesis are expressed in cells that lack detectable ciRS-7 expression. These findings have significant implications for the study of the regulation and function of ciRS-7, and the analytic framework we developed to jointly analyze RNA-seq and ChIP-seq data reveals the potential for genome-wide discovery of important biological regulation missed in current reference annotations.


Subject(s)
RNA/biosynthesis; RNA/genetics; Algorithms; Alternative Splicing; Animals; Brain Chemistry; Exons; Female; HEK293 Cells; Humans; Mice; Pregnancy; RNA Splicing; RNA, Circular; RNA, Long Noncoding/genetics; Sequence Analysis, RNA/methods
12.
Trends Genet ; 32(5): 309-316, 2016 May.
Article in English | MEDLINE | ID: mdl-27050930

ABSTRACT

In 2012, a new feature of eukaryotic gene expression emerged: ubiquitous expression of circular RNA (circRNA) from genes traditionally thought to express messenger or linear noncoding (nc)RNA only. CircRNAs are covalently closed, circular RNA molecules that typically comprise exonic sequences and are spliced at canonical splice sites. This feature of gene expression was first recognized in humans and mice, but it quickly emerged that it was common across essentially all eukaryotes studied by molecular biologists. CircRNA abundance, and even which alternatively spliced circRNA isoforms are expressed, varies by cell type and can exceed the abundance of the traditional linear mRNA or ncRNA transcript. CircRNAs are enriched in the brain and increase in abundance during fetal development. Together, these features raise fundamental questions regarding the regulation of circRNA in cis and in trans, and its function.


Subject(s)
RNA Splicing/genetics; RNA/genetics; Gene Expression Regulation; Humans; RNA/biosynthesis; RNA, Circular
13.
Development ; 143(11): 1838-47, 2016 Jun 01.
Article in English | MEDLINE | ID: mdl-27246710

ABSTRACT

Just a few years ago, it had been assumed that the dominant RNA isoforms produced from eukaryotic genes were variants of messenger RNA, functioning as intermediates in gene expression. In early 2012, however, a surprising discovery was made: circular RNA (circRNA) was shown to be a transcriptional product in thousands of human and mouse genes and in hundreds of cases constituted the dominant RNA isoform. Subsequent studies revealed that the expression of circRNAs is developmentally regulated, tissue and cell-type specific, and shared across the eukaryotic tree of life. These features suggest important functions for these molecules. Here, we describe major advances in the field of circRNA biology, focusing on the regulation of and functional roles played by these molecules.


Subject(s)
Gene Expression Regulation; RNA/genetics; Animals; Computational Biology; Humans; Models, Genetic; RNA/metabolism; RNA Splicing/genetics; RNA, Circular
14.
Nucleic Acids Res ; 45(13): e126, 2017 Jul 27.
Article in English | MEDLINE | ID: mdl-28541529

ABSTRACT

Gene fusions are known to play critical roles in tumor pathogenesis. Yet, sensitive and specific algorithms to detect gene fusions in cancer do not currently exist. In this paper, we present a new statistical algorithm, MACHETE (Mismatched Alignment CHimEra Tracking Engine), which achieves highly sensitive and specific detection of gene fusions from RNA-Seq data, including the highest Positive Predictive Value (PPV) compared to the current state-of-the-art, as assessed in simulated data. We show that the best performing published algorithms either find large numbers of fusions in negative control data or suffer from low sensitivity detecting known driving fusions in gold standard settings, such as EWSR1-FLI1. As proof of principle that MACHETE discovers novel gene fusions with high accuracy in vivo, we mined public data to discover and subsequently PCR validate novel gene fusions missed by other algorithms in the ovarian cancer cell line OVCAR3. These results highlight the gains in accuracy achieved by introducing statistical models into fusion detection, and pave the way for unbiased discovery of potentially driving and druggable gene fusions in primary tumors.


Subject(s)
Algorithms; Gene Fusion; Biomarkers, Tumor/genetics; Cell Line, Tumor; Computer Simulation; Databases, Nucleic Acid; Female; Fusion Proteins, bcr-abl/genetics; Genes, abl; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics; Oncogene Fusion; Oncogene Proteins, Fusion/genetics; Ovarian Neoplasms/genetics; Sequence Alignment; Sequence Analysis, RNA
15.
PLoS Genet ; 9(9): e1003777, 2013.
Article in English | MEDLINE | ID: mdl-24039610

ABSTRACT

Thousands of loci in the human and mouse genomes give rise to circular RNA transcripts; at many of these loci, the predominant RNA isoform is a circle. Using an improved computational approach for circular RNA identification, we found widespread circular RNA expression in Drosophila melanogaster and estimate that in humans, circular RNA may account for 1% as many molecules as poly(A) RNA. Analysis of data from the ENCODE consortium revealed that the repertoire of genes expressing circular RNA, the ratio of circular to linear transcripts for each gene, and even the pattern of splice isoforms of circular RNAs from each gene were cell-type specific. These results suggest that biogenesis of circular RNA is an integral, conserved, and regulated feature of the gene expression program.


Subject(s)
Gene Expression Regulation; RNA, Messenger/genetics; RNA/genetics; Transcription, Genetic; Animals; Cell Lineage; Drosophila melanogaster/genetics; Exons/genetics; Humans; Mice; Poly A/genetics; Protein Isoforms/genetics; RNA/biosynthesis; RNA Splicing/genetics; RNA, Circular; Sequence Analysis, RNA
16.
Genome Res ; 21(1): 126-36, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21149389

ABSTRACT

Viruses may play an important role in the evolution of human microbial communities. Clustered regularly interspaced short palindromic repeats (CRISPRs) provide bacteria and archaea with adaptive immunity to previously encountered viruses. Little is known about CRISPR composition in members of human microbial communities, the relative rate of CRISPR locus change, or how CRISPR loci differ between the microbiota of different individuals. We collected saliva from four periodontally healthy human subjects over an 11- to 17-mo time period and analyzed CRISPR sequences with corresponding streptococcal repeats in order to improve our understanding of the predominant features of oral streptococcal adaptive immune repertoires. We analyzed a total of 6859 CRISPR bearing reads and 427,917 bacterial 16S rRNA gene sequences. We found a core (ranging from 7% to 22%) of shared CRISPR spacers that remained stable over time within each subject, but nearly a third of CRISPR spacers varied between time points. We document high spacer diversity within each subject, suggesting constant addition of new CRISPR spacers. No greater than 2% of CRISPR spacers were shared between subjects, suggesting that each individual was exposed to different virus populations. We detect changes in CRISPR spacer sequence diversity over time that may be attributable to locus diversification or to changes in streptococcal population structure, yet the composition of the populations within subjects remained relatively stable. The individual-specific and traceable character of CRISPR spacer complements could potentially open the way for expansion of the domain of personalized medicine to the oral microbiome, where lineages may be tracked as a function of health and other factors.


Subject(s)
Genetic Variation; Inverted Repeat Sequences/genetics; Saliva/microbiology; Streptococcus/classification; Streptococcus/genetics; DNA, Bacterial/genetics; DNA, Intergenic/genetics; Ecosystem; Humans; Phylogeny; RNA, Ribosomal, 16S/genetics; Sequence Analysis, DNA; Streptococcus/isolation & purification
17.
BMC Microbiol ; 14: 146, 2014 Jun 06.
Article in English | MEDLINE | ID: mdl-24903519

ABSTRACT

BACKGROUND: Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) are utilized by bacteria to resist encounters with their viruses. Human body surfaces have numerous bacteria that harbor CRISPRs, and their content can provide clues as to the types and features of viruses they may have encountered. RESULTS: We investigated the conservation of CRISPR content from streptococci on skin and saliva of human subjects over 8 weeks to determine whether similarities existed in the CRISPR spacer profiles and whether CRISPR spacers were a stable component of each biogeographic site. Most of the CRISPR sequences identified were unique, but a small proportion of spacers from the skin and saliva of each subject matched spacers derived from previously sequenced loci of S. thermophilus and other streptococci. There were significant proportions of CRISPR spacers conserved over the entire 8-week study period for all subjects, and salivary CRISPR spacers sampled in the mornings showed significantly higher levels of conservation than any other time of day. We also found substantial similarities in the spacer repertoires of the skin and saliva of each subject. Many skin-derived spacers matched salivary viruses, supporting the idea that bacteria of the skin may encounter viruses with similar sequences to those found in the mouth. Despite the similarities between skin and salivary spacer repertoires, the variation present was distinct based on each subject and body site. CONCLUSIONS: The conservation of CRISPR spacers in the saliva and the skin of human subjects over the time period studied suggests a relative conservation of the bacteria harboring them.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats; Conserved Sequence; Saliva/microbiology; Skin/microbiology; Streptococcus/classification; Streptococcus/genetics; Carrier State/microbiology; Humans; Streptococcal Infections/microbiology; Streptococcus/isolation & purification
18.
PLoS Biol ; 9(9): e1001156, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21949640

ABSTRACT

Every year, ovarian cancer kills approximately 14,000 women in the United States and more than 140,000 women worldwide. Most of these deaths are caused by tumors of the serous histological type, which is rarely diagnosed before it has disseminated. By deep paired-end sequencing of mRNA from serous ovarian cancers, followed by deep sequencing of the corresponding genomic region, we identified a recurrent fusion transcript. The fusion transcript joins the 5' exons of ESRRA, encoding a ligand-independent member of the nuclear-hormone receptor superfamily, to the 3' exons of C11orf20, a conserved but uncharacterized gene located immediately upstream of ESRRA in the reference genome. To estimate the prevalence of the fusion, we tested 67 cases of serous ovarian cancer by RT-PCR and sequencing and confirmed its presence in 10 of these. Targeted resequencing of the corresponding genomic region from two fusion-positive tumor samples identified a nearly clonal chromosomal rearrangement positioning ESRRA upstream of C11orf20 in one tumor, and evidence of local copy number variation in the ESRRA locus in the second tumor. We hypothesize that the recurrent novel fusion transcript may play a role in pathogenesis of a substantial fraction of serous ovarian cancers and could provide a molecular marker for detection of the cancer. Gene fusions involving adjacent or nearby genes can readily escape detection but may play important roles in the development and progression of cancer.


Subject(s)
Biomarkers, Tumor/genetics; Chromosomes, Human, Pair 11/genetics; Cystadenocarcinoma, Serous/genetics; Neoplasms, Glandular and Epithelial/genetics; Oncogene Proteins, Fusion/genetics; Ovarian Neoplasms/genetics; Receptors, Estrogen/genetics; Alternative Splicing; Amino Acid Sequence; Canada; Carcinoma, Ovarian Epithelial; Case-Control Studies; Chromosome Aberrations; Chromosomes, Human, Pair 11/chemistry; Cystadenocarcinoma, Serous/epidemiology; Cystadenocarcinoma, Serous/pathology; DNA Copy Number Variations; Exons; Female; Humans; Molecular Sequence Data; Neoplasm Staging; Neoplasms, Glandular and Epithelial/epidemiology; Neoplasms, Glandular and Epithelial/pathology; Ovarian Neoplasms/epidemiology; Ovarian Neoplasms/pathology; Prevalence; RNA, Messenger; Sequence Analysis, DNA; Sequence Analysis, RNA; United States; ERRalpha Estrogen-Related Receptor
19.
bioRxiv ; 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-36993432

ABSTRACT

SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific methods. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach. SPLASH2 enables rapid analysis of massive datasets from a wide range of sequencing technologies and biological contexts, delivering unparalleled scale and speed. The SPLASH2 algorithm unveils new biology (without tuning) in single-cell RNA-sequencing data from human muscle cells, as well as bulk RNA-seq from the entire Cancer Cell Line Encyclopedia (CCLE), including substantial unannotated alternative splicing in cancer transcriptomes. The same untuned SPLASH2 algorithm recovers the BCR-ABL gene fusion, and detects circRNA sensitively and specifically, underscoring SPLASH2's unmatched precision and scalability across diverse RNA-seq detection tasks.
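
This record describes reference-free analysis built on k-mer counting across samples. The Python sketch below shows one way raw reads can be turned into per-sample count tables without any alignment. The pairing of each k-mer (called an anchor here) with the k-mer that follows it (called a target), the function name `anchor_target_counts`, and the parameters `k` and `gap` are illustrative assumptions; the abstract does not specify them, and this is not the SPLASH2 implementation.

```python
from collections import Counter, defaultdict

def anchor_target_counts(samples, k=3, gap=0):
    """Count, per sample, which k-mer follows each anchor k-mer.

    samples: dict mapping a sample name to a list of read strings.
    Returns a nested mapping: anchor -> sample name -> Counter of targets.
    For one anchor, the targets (rows) by samples (columns) form a
    contingency table that a statistical test could then examine.
    """
    tables = defaultdict(lambda: defaultdict(Counter))
    for name, reads in samples.items():
        for read in reads:
            # Slide along the read while anchor, gap and target all fit.
            for i in range(len(read) - 2 * k - gap + 1):
                anchor = read[i:i + k]
                target = read[i + k + gap:i + 2 * k + gap]
                tables[anchor][name][target] += 1
    return tables

# Two toy samples that share an anchor but differ in what follows it.
toy = {"s1": ["ACGTTT"], "s2": ["ACGAAA"]}
counts = anchor_target_counts(toy, k=3)
print(dict(counts["ACG"]["s1"]), dict(counts["ACG"]["s2"]))
```

In the toy input both samples contain the anchor `ACG`, but it is followed by `TTT` in one sample and `AAA` in the other. That is the kind of sample-specific sequence variation the abstract refers to, found here by counting alone.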

20.
bioRxiv ; 2024 May 25.
Article in English | MEDLINE | ID: mdl-38826472

ABSTRACT

Most plant genomes and their regulation remain unknown. We used SPLASH, a new, reference-genome-free sequence variation detection algorithm, to analyze transcriptional and post-transcriptional regulation from RNA-seq data. We discovered differential homolog expression during maize pollen development, and imbibition-dependent cryptic splicing in Arabidopsis seeds. SPLASH enables discovery of novel regulatory mechanisms, including differential regulation of genes from hybrid parental haplotypes, without the use of alignment to a reference genome.
