Forseti: a mechanistic and predictive model of the splicing status of scRNA-seq reads.

He, Dongze; Gao, Yuan; Chan, Spencer Skylar; Quintana-Parrilla, Natalia; Patro, Rob

Affiliation

He D; Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, United States.
Gao Y; Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, United States.
Chan SS; Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, United States.
Quintana-Parrilla N; Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, United States.
Patro R; Department of Computer Science, University of Maryland, College Park, MD 20742, United States.

Bioinformatics; 40(Suppl 1): i297-i306, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38940130

ABSTRACT

MOTIVATION:

Short-read single-cell RNA-sequencing (scRNA-seq) has been used to study cellular heterogeneity, cellular fate, and transcriptional dynamics. Modeling splicing dynamics in scRNA-seq data is challenging. Even the seemingly straightforward task of determining the splicing status of the molecules from which sequenced fragments are drawn is inherently difficult. This difficulty arises, in part, from limited read length and positional biases, which substantially reduce the specificity of the sequenced fragments. As a result, the splicing status of many scRNA-seq reads is ambiguous for lack of definitive evidence. Methods are therefore needed that can recover the splicing status of ambiguous reads, which in turn can lead to greater accuracy and confidence in downstream analyses.

RESULTS:

We develop Forseti, a predictive model that probabilistically assigns a splicing status to scRNA-seq reads. Our model has two key components. First, we train a binding affinity model that assigns a probability that a given transcriptomic site is used in fragment generation. Second, we fit a robust fragment length distribution model that generalizes well across datasets derived from different species and tissue types. Forseti combines these two trained models to predict the splicing status of each read's molecule of origin. It does so by scoring putative fragments that associate each alignment of a sequenced read with proximate potential priming sites. Using both simulated and experimental data, we show that our model can precisely predict the splicing status of many reads and identify the true gene of origin of multi-gene mapped reads.

AVAILABILITY AND IMPLEMENTATION:

Forseti and the code used to produce the results are available at https://github.com/COMBINE-lab/forseti under a BSD 3-clause license.
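The following Python sketch illustrates the scoring idea described in the Results. It is not Forseti's implementation, and everything in it is an assumption made for illustration. The `PrimingSite` type and the functions `fragment_length_pdf`, `score_alignment`, and `prob_spliced` are hypothetical. Binding probabilities are taken as given rather than produced by a trained affinity model. A Gaussian with made-up parameters stands in for the fitted fragment length distribution.

The sketch also assumes that all coordinates lie on the transcribed strand and that a fragment spans from the read's alignment start to a downstream priming site. Each alignment is scored by summing, over nearby sites, the product of the site's priming probability and the density of the implied fragment length. The spliced and unspliced scores are then normalized into a probability.

```python
import math
from dataclasses import dataclass


@dataclass
class PrimingSite:
    """Hypothetical putative priming site on the reference a read aligned to."""
    position: int        # 0-based coordinate on the transcribed strand (assumed)
    binding_prob: float  # stand-in for a binding-affinity model's output, in [0, 1]


def fragment_length_pdf(length: int, mean: float = 300.0, sd: float = 80.0) -> float:
    """Stand-in fragment length density: a Gaussian with illustrative parameters.

    Non-positive lengths get zero density, since such a fragment cannot exist.
    """
    if length < 1:
        return 0.0
    z = (length - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def score_alignment(read_start: int, sites: list[PrimingSite], max_dist: int = 1000) -> float:
    """Sum the scores of putative fragments linking one read alignment to
    downstream priming sites within max_dist bases.

    Each fragment contributes P(site is used) * density(fragment length).
    """
    score = 0.0
    for site in sites:
        length = site.position - read_start
        if 0 < length <= max_dist:
            score += site.binding_prob * fragment_length_pdf(length)
    return score


def prob_spliced(spliced_score: float, unspliced_score: float,
                 prior_spliced: float = 0.5) -> float:
    """Normalize the spliced and unspliced alignment scores into a probability
    that the read's molecule of origin was spliced."""
    num = prior_spliced * spliced_score
    den = num + (1.0 - prior_spliced) * unspliced_score
    if den == 0.0:
        return prior_spliced  # no supporting fragment either way: keep the prior
    return num / den


if __name__ == "__main__":
    # Spliced alignment: 300 bases upstream of a high-affinity site (e.g. a poly(A) tail).
    spliced = score_alignment(100, [PrimingSite(400, 0.9)])
    # Unspliced alignment: one weak nearby site and one site beyond max_dist.
    unspliced = score_alignment(5100, [PrimingSite(5150, 0.2), PrimingSite(6500, 0.8)])
    print(f"P(spliced) = {prob_spliced(spliced, unspliced):.3f}")
```

In this toy example, the spliced alignment implies a fragment length near the assumed mode and is supported by a high-affinity site, so the resulting probability is close to 1. A sum over sites is used rather than a maximum so that several plausible priming sites can each add support to an alignment.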

Subject(s)

RNA Splicing; Single-Cell Analysis/methods; Sequence Analysis, RNA/methods; Humans; Software; RNA-Seq/methods; Algorithms; Single-Cell Gene Expression Analysis

Database: MEDLINE | Main subject: RNA Splicing | Limits: Humans | Language: English | Journal: Bioinformatics | Journal subject: Medical Informatics | Year: 2024 | Type: Article | Affiliation country: United States