Context-aware transcript quantification from long-read RNA-seq data with Bambu.

Chen, Ying; Sim, Andre; Wan, Yuk Kei; Yeo, Keith; Lee, Joseph Jing Xian; Ling, Min Hao; Love, Michael I; Göke, Jonathan

Affiliation

Chen Y; Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.
Sim A; Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.
Wan YK; Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.
Yeo K; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore.
Lee JJX; Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.
Ling MH; Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.
Love MI; Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA.
Göke J; Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.

Nat Methods; 20(8): 1187-1195, 2023 Aug.

Article in En | MEDLINE | ID: mdl-37308696

ABSTRACT

Most approaches to transcript quantification rely on fixed reference annotations; however, the transcriptome is dynamic and, depending on the context, such static annotations contain inactive isoforms for some genes, whereas they are incomplete for others. Here we present Bambu, a method that performs machine-learning-based transcript discovery to enable quantification specific to the context of interest using long-read RNA-sequencing. To identify novel transcripts, Bambu estimates the novel discovery rate, which replaces arbitrary per-sample thresholds with a single, interpretable, precision-calibrated parameter. Bambu retains the full-length and unique read counts, enabling accurate quantification in the presence of inactive isoforms. Compared to existing methods for transcript discovery, Bambu achieves greater precision without sacrificing sensitivity. We show that context-aware annotations improve quantification for both novel and known transcripts. We apply Bambu to quantify isoforms from repetitive HERVH-LTR7 retrotransposons in human embryonic stem cells, demonstrating its capacity for context-specific transcript expression analysis.

Subject(s)

Gene Expression Profiling; Transcriptome; Humans; RNA-Seq; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Protein Isoforms/genetics

Full text: 1 | Database: MEDLINE | Main subject: Gene Expression Profiling / Transcriptome | Limits: Humans | Language: En | Journal: Nat Methods | Journal subject: Laboratory Techniques and Procedures | Year: 2023 | Type: Article
