Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

Contrasting and combining transcriptome complexity captured by short and long RNA sequencing reads.

Han, Seong Woo; Jewell, San; Thomas-Tikhonenko, Andrei; Barash, Yoseph.

Genome Res ; 2024 Oct 16.

Article in English | MEDLINE | ID: mdl-39322279

ABSTRACT

Mapping transcriptomic variations using either short- or long-read RNA sequencing is a staple of genomic research. Long reads are able to capture entire isoforms and overcome repetitive regions, whereas short reads still provide improved coverage and error rates. Yet, open questions remain, such as how to quantitatively compare the technologies, can we combine them, and what is the benefit of such a combined view? We tackle these questions by first creating a pipeline to assess matched long- and short-read data using a variety of transcriptome statistics. We find that across data sets, algorithms, and technologies, matched short-read data detects ~30% more splice junctions, such that ~10%-30% of the splice junctions included at ≥20% by short reads are missed by long reads. In contrast, long reads detect many more intron-retention events and can detect full isoforms, pointing to the benefit of combining the technologies. We introduce MAJIQ-L, an extension of the MAJIQ software, to enable a unified view of transcriptome variations from both technologies and demonstrate its benefits. Our software can be used to assess any future long-read technology or algorithm and can be combined with short-read data for improved transcriptome analysis.

2.

MAJIQlopedia: an encyclopedia of RNA splicing variations in human tissues and cancer.

Quesnel-Vallières, Mathieu; Jewell, San; Lynch, Kristen W; Thomas-Tikhonenko, Andrei; Barash, Yoseph.

Nucleic Acids Res ; 52(D1): D213-D221, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37953365

ABSTRACT

Quantification of RNA splicing variations based on RNA-Sequencing can reveal tissue- and disease-specific splicing patterns. To study such splicing variations, we introduce MAJIQlopedia, an encyclopedia of splicing variations that encompasses 86 human tissues and 41 cancer datasets. MAJIQlopedia reports annotated and unannotated splicing events for a total of 486 175 alternative splice junctions in normal tissues and 338 317 alternative splice junctions in cancer. This database, available at https://majiq.biociphers.org/majiqlopedia/, includes a user-friendly interface that provides graphical representations of junction usage quantification for each junction across all tissue or cancer types. To demonstrate case usage of MAJIQlopedia, we review splicing variations in genes WT1, MAPT and BIN1, which all have known tissue or cancer-specific splicing variations. We also use MAJIQlopedia to highlight novel splicing variations in FDX1 and MEGF9 in normal tissues, and we uncover a novel exon inclusion event in RPS6KA6 that only occurs in two cancer types. Users can download the database, request the addition of data to the webtool, or install a MAJIQlopedia server to integrate proprietary data. MAJIQlopedia can serve as a reference database for researchers seeking to understand what splicing variations exist in genes of interest, and those looking to understand tissue- or cancer-specific splice isoform usage.

Subjects

Alternative Splicing, Neoplasms, RNA Splicing, Humans, Alternative Splicing/genetics, Neoplasms/genetics, Protein Isoforms/genetics, RNA Splice Sites, RNA Splicing/genetics, RNA Sequence Analysis

3.

A Deep Dive into Statistical Modeling of RNA Splicing QTLs Reveals New Variants that Explain Neurodegenerative Disease.

Wang, David; Gazzara, Matthew R; Jewell, San; Wales-McGrath, Benjamin; Brown, Christopher D; Choi, Peter S; Barash, Yoseph.

bioRxiv ; 2024 Sep 03.

Article in English | MEDLINE | ID: mdl-39282456

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of putative disease causing variants with unknown regulatory effects. Efforts to connect these variants with splicing quantitative trait loci (sQTLs) have provided functional insights, yet sQTLs reported by existing methods cannot explain many GWAS signals. We show current sQTL modeling approaches can be improved by considering alternative splicing representation, model calibration, and covariate integration. We then introduce MAJIQTL, a new pipeline for sQTL discovery. MAJIQTL includes two new statistical methods: a weighted multiple testing approach for sGene discovery and a model for sQTL effect size inference to improve variant prioritization. By applying MAJIQTL to GTEx, we find significantly more sGenes harboring sQTLs with functional significance. Notably, our analysis implicates the novel variant rs582283 in Alzheimer's disease. Using antisense oligonucleotides, we validate this variant's effect by blocking the implicated YBX3 binding site, leading to exon skipping in the gene MS4A3.

4.

Machine learning-optimized targeted detection of alternative splicing.

Yang, Kevin; Islas, Nathaniel; Jewell, San; Jha, Anupama; Radens, Caleb M; Pleiss, Jeffrey A; Lynch, Kristen W; Barash, Yoseph; Choi, Peter S.

bioRxiv ; 2024 Sep 24.

Article in English | MEDLINE | ID: mdl-39386495

ABSTRACT

RNA-sequencing (RNA-seq) is widely adopted for transcriptome analysis but has inherent biases which hinder the comprehensive detection and quantification of alternative splicing. To address this, we present an efficient targeted RNA-seq method that greatly enriches for splicing-informative junction-spanning reads. Local Splicing Variation sequencing (LSV-seq) utilizes multiplexed reverse transcription from highly scalable pools of primers anchored near splicing events of interest. Primers are designed using Optimal Prime, a novel machine learning algorithm trained on the performance of thousands of primer sequences. In experimental benchmarks, LSV-seq achieves high on-target capture rates and concordance with RNA-seq, while requiring significantly lower sequencing depth. Leveraging deep learning splicing code predictions, we used LSV-seq to target events with low coverage in GTEx RNA-seq data and newly discover hundreds of tissue-specific splicing events. Our results demonstrate the ability of LSV-seq to quantify splicing of events of interest at high-throughput and with exceptional sensitivity.

5.

Contrasting and Combining Transcriptome Complexity Captured by Short and Long RNA Sequencing Reads.

Han, Seong Woo; Jewell, San; Thomas-Tikhonenko, Andrei; Barash, Yoseph.

bioRxiv ; 2023 Nov 21.

Article in English | MEDLINE | ID: mdl-38045232

ABSTRACT

Mapping transcriptomic variations using either short- or long-read RNA sequencing is a staple of genomic research. Long reads are able to capture entire isoforms and overcome repetitive regions, while short reads still provide improved coverage and error rates. Yet how to quantitatively compare the technologies, whether we can combine them, and what the benefit of such a combined view may be remain open questions. We tackle these questions by first creating a pipeline to assess matched long- and short-read data using a variety of transcriptome statistics. We find that across datasets, algorithms and technologies, matched short-read data detects roughly 50% more splice junctions, with 10-30% of the splice junctions included at 20% or more missed by long reads. In contrast, long reads detect many more intron retention events, pointing to the benefit of combining the technologies. We introduce MAJIQ-L, an extension of the MAJIQ software, to enable a unified view of transcriptome variations from both technologies and demonstrate its benefits. Our software can be used to assess any future long-read technology or algorithm, and to combine it with short-read data for improved transcriptome analysis.

6.

A Bayesian model for unsupervised detection of RNA splicing based subtypes in cancers.

Wang, David; Quesnel-Vallieres, Mathieu; Jewell, San; Elzubeir, Moein; Lynch, Kristen; Thomas-Tikhonenko, Andrei; Barash, Yoseph.

Nat Commun ; 14(1): 63, 2023 Jan 04.

Article in English | MEDLINE | ID: mdl-36599821

ABSTRACT

Identification of cancer sub-types is a pivotal step for developing personalized treatment. Specifically, sub-typing based on changes in RNA splicing has been motivated by several recent studies. We thus develop CHESSBOARD, an unsupervised algorithm tailored for RNA splicing data that captures "tiles" in the data, defined by a subset of unique splicing changes in a subset of patients. CHESSBOARD allows for a flexible number of tiles, accounts for uncertainty of splicing quantification, and is able to model missing values as additional signals. We first apply CHESSBOARD to synthetic data to assess its domain specific modeling advantages, followed by analysis of several leukemia datasets. We show detected tiles are reproducible in independent studies, investigate their possible regulatory drivers and probe their relation to known AML mutations. Finally, we demonstrate the potential clinical utility of CHESSBOARD by supplementing mutation based diagnostic assays with discovered splicing profiles to improve drug response correlation.

Subjects

Neoplasms, RNA Splicing, Humans, Bayes Theorem, RNA Splicing/genetics, Neoplasms/diagnosis, Neoplasms/genetics, RNA Splicing Factors/genetics, Mutation, Alternative Splicing/genetics

7.

RNA splicing analysis using heterogeneous and large RNA-seq datasets.

Vaquero-Garcia, Jorge; Aicher, Joseph K; Jewell, San; Gazzara, Matthew R; Radens, Caleb M; Jha, Anupama; Norton, Scott S; Lahens, Nicholas F; Grant, Gregory R; Barash, Yoseph.

Nat Commun ; 14(1): 1230, 2023 Mar 03.

Article in English | MEDLINE | ID: mdl-36869033

ABSTRACT

The ubiquity of RNA-seq has led to many methods that use RNA-seq data to analyze variations in RNA splicing. However, available methods are not well suited for handling heterogeneous and large datasets. Such datasets scale to thousands of samples across dozens of experimental conditions, exhibit increased variability compared to biological replicates, and involve thousands of unannotated splice variants resulting in increased transcriptome complexity. We describe here a suite of algorithms and tools implemented in the MAJIQ v2 package to address challenges in detection, quantification, and visualization of splicing variations from such datasets. Using both large-scale synthetic data and GTEx v8 as benchmark datasets, we assess the advantages of MAJIQ v2 compared to existing methods. We then apply the MAJIQ v2 package to analyze differential splicing across 2,335 samples from 13 brain subregions, demonstrating its ability to offer insights into brain subregion-specific splicing regulation.

Subjects

Algorithms, RNA Splicing, RNA-Seq, Benchmarking, Brain
