Search | BVS IEC

EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J.

BMC Bioinformatics; 16: 278, 2015 Sep 03.

Article in English | MEDLINE | ID: mdl-26335049

ABSTRACT

BACKGROUND: RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. RESULTS: We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. CONCLUSIONS: EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
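The grouping and estimation steps described in the abstract can be illustrated with a small sketch. This is not EMSAR's implementation: it assumes a simplified model in which each segment (the set of transcripts a read maps to) has a known length and a read count that is Poisson-distributed with rate equal to the segment length times the summed abundance of its transcripts, and it finds the maximum likelihood estimate with a plain EM iteration. The function names `group_reads` and `poisson_mle` and the toy inputs are hypothetical.

```python
from collections import Counter


def group_reads(read_hits):
    """Count reads per segment, keyed by the set of transcripts each read maps to."""
    return Counter(frozenset(hits) for hits in read_hits if hits)


def poisson_mle(seg_counts, seg_lengths, n_iter=500):
    """EM for the assumed model: count_S ~ Poisson(length_S * sum of theta_t for t in S).

    seg_counts  : {frozenset of transcript ids: observed read count}
    seg_lengths : {frozenset of transcript ids: segment length (> 0)}
    Every segment in seg_counts must also appear in seg_lengths.
    """
    transcripts = sorted({t for seg in seg_lengths for t in seg})
    # Total length of all segments that contain each transcript.
    exposure = {
        t: sum(length for seg, length in seg_lengths.items() if t in seg)
        for t in transcripts
    }
    theta = {t: 1.0 for t in transcripts}
    for _ in range(n_iter):
        expected = dict.fromkeys(transcripts, 0.0)
        # E-step: split each segment's reads among its transcripts in proportion to theta.
        for seg, n in seg_counts.items():
            total = sum(theta[t] for t in seg)
            if total == 0:
                continue
            for t in seg:
                expected[t] += n * theta[t] / total
        # M-step: reads attributed to a transcript divided by its total segment length.
        theta = {t: expected[t] / exposure[t] for t in transcripts}
    return theta


# Toy input: two transcripts "A" and "B" that share one segment.
read_hits = [["A"]] * 200 + [["A", "B"]] * 600 + [["B"]] * 50
seg_lengths = {
    frozenset({"A"}): 100,
    frozenset({"A", "B"}): 200,
    frozenset({"B"}): 50,
}
seg_counts = group_reads(read_hits)
theta = poisson_mle(seg_counts, seg_lengths)
print({t: round(v, 3) for t, v in theta.items()})
```

In this toy input the counts were chosen to match abundances of 2 reads per position for A and 1 for B, so the estimate should approach those values. Reads shared between A and B still contribute to the estimate, which mirrors the abstract's point that multi-mapped reads are used rather than discarded.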

Subjects

Gene Expression Profiling/methods; Genome/genetics; Protein Isoforms/genetics; RNA/genetics; Sequence Analysis, RNA/methods; Base Sequence; Transcriptome

Accurate quantification of transcriptome from RNA-Seq data by effective length normalization.

Lee, Soohyun; Seo, Chae Hwa; Lim, Byungho; Yang, Jin Ok; Oh, Jeongsu; Kim, Minjin; Lee, Sooncheol; Lee, Byungwook; Kang, Changwon; Lee, Sanghyuk.

Nucleic Acids Res; 39(2): e9, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21059678

ABSTRACT

We propose a novel, efficient and intuitive approach for estimating mRNA abundances from whole transcriptome shotgun sequencing (RNA-Seq) data. Our method, NEUMA (Normalization by Expected Uniquely Mappable Area), is based on effective length normalization using uniquely mappable areas of gene and mRNA isoform models. Using a known transcriptome sequence model such as RefSeq, NEUMA pre-computes the numbers of all possible gene-wise and isoform-wise informative reads: the former are sequences that map to all mRNA isoforms of a single gene and to no other gene, and the latter are sequences that map uniquely to a single mRNA isoform. The results are used to estimate the effective length of genes and transcripts, taking experimental distributions of fragment size into consideration. Quantitative RT-PCR on 27 randomly selected genes in two human cell lines and computer simulation experiments demonstrated superior accuracy of NEUMA over other recently developed methods. NEUMA covers a large proportion of genes and mRNA isoforms and offers a measure of consistency ('consistency coefficient') for each gene between an independently measured gene-wise level and the sum of the isoform levels. NEUMA is applicable to both paired-end and single-end RNA-Seq data. We propose that NEUMA could serve as a standard method for quantifying gene transcript levels from RNA-Seq data.
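The idea of an effective length based on uniquely mappable positions can be sketched as follows. This is an illustrative simplification, not NEUMA's code: it covers only the isoform-wise case, treats a read as an exact k-mer, and ignores the fragment-size distribution, mismatches, and paired-end reads that the abstract mentions. The function names, the toy sequences, and the plain count-per-effective-length normalization are assumptions made for the example.

```python
from collections import defaultdict


def unique_effective_lengths(transcripts, k):
    """Count, per transcript, the read start positions whose k-mer occurs in no other transcript."""
    owners = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)
    eff = dict.fromkeys(transcripts, 0)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            if len(owners[seq[i:i + k]]) == 1:
                eff[name] += 1
    return eff


def normalize(unique_counts, eff_len):
    """Divide uniquely mapped read counts by effective length; skip transcripts with none."""
    return {
        t: unique_counts.get(t, 0) / length
        for t, length in eff_len.items()
        if length > 0
    }


# Toy isoforms that share their first eight bases and differ in the last four.
transcripts = {"iso1": "ACGTACGGTTCA", "iso2": "ACGTACGGAAGT"}
eff = unique_effective_lengths(transcripts, k=4)
levels = normalize({"iso1": 8, "iso2": 2}, eff)
print(eff, levels)
```

Each toy isoform has nine possible 4-mers, but only the four that overlap the differing tail are informative, so both effective lengths are 4 rather than 9. Transcripts with no uniquely mappable position are omitted from the normalized output instead of being assigned a value.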

Subjects

Algorithms; Gene Expression Profiling/methods; RNA, Messenger/analysis; Sequence Analysis, RNA; Cell Line; Computer Simulation; Gene Expression Profiling/standards; Humans; Polymerase Chain Reaction; Protein Isoforms/genetics; RNA, Messenger/chemistry; Reproducibility of Results
