CoCo: RNA-seq read assignment correction for nested genes and multimapped reads.
Bioinformatics
; 35(23): 5039-5047, 2019 12 01.
Article
em En
| MEDLINE
| ID: mdl-31141144
MOTIVATION: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage. RESULTS: Here we present count corrector (CoCo), a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bedgraph comparisons. AVAILABILITY AND IMPLEMENTATION: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/coco. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Assunto principal:
RNA-Seq
Idioma:
En
Revista:
Bioinformatics
Assunto da revista:
INFORMATICA MEDICA
Ano de publicação:
2019
Tipo de documento:
Article
País de afiliação:
Canadá