Systematic evaluation of spliced alignment programs for RNA-seq data.

Engström, Pär G; Steijger, Tamara; Sipos, Botond; Grant, Gregory R; Kahles, André; Rätsch, Gunnar; Goldman, Nick; Hubbard, Tim J; Harrow, Jennifer; Guigó, Roderic; Bertone, Paul.

Nat Methods; 10(12): 1185-91, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24185836

ABSTRACT

High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.

Subject(s)

RNA Splicing; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Animals; Chromosome Mapping/methods; Computational Biology/methods; Exons; False Positive Reactions; High-Throughput Nucleotide Sequencing/methods; Humans; K562 Cells; Mice; RNA, Messenger/metabolism; Reproducibility of Results; Software

Assessment of transcript reconstruction methods for RNA-seq.

Steijger, Tamara; Abril, Josep F; Engström, Pär G; Kokocinski, Felix; Hubbard, Tim J; Guigó, Roderic; Harrow, Jennifer; Bertone, Paul.

Nat Methods; 10(12): 1177-84, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24185837

ABSTRACT

We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.

Subject(s)

Computational Biology/methods; RNA Splicing; Sequence Analysis, RNA/methods; Algorithms; Animals; Caenorhabditis elegans; Drosophila melanogaster; Exons; Gene Expression Profiling; Genome; Humans; Introns; RNA Splice Sites; RNA, Messenger/metabolism; Software
