Modelling RNA-Seq data with a zero-inflated mixture Poisson linear model.

Liu, Siyun; Jiang, Yuan; Yu, Tao

Affiliation

Liu S; Department of Statistics and Applied Probability, National University of Singapore, Singapore.
Jiang Y; Department of Statistics, Oregon State University, Corvallis, Oregon.
Yu T; Department of Statistics and Applied Probability, National University of Singapore, Singapore.

Genet Epidemiol; 43(7): 786-799, 2019 Oct.

Article in En | MEDLINE | ID: mdl-31328831

ABSTRACT

RNA sequencing (RNA-Seq) has been frequently used in genomic studies and has generated a vast amount of data. The RNA-Seq data are composed of two parts: (a) a sequence of nucleotides of the genome; and (b) a corresponding sequence of counts, giving the number of short reads whose mapped positions start at each position of the genome. One common feature of these count data is that they are typically nonuniform; recent studies have revealed that the nonuniformity is partly due to a systematic bias resulting from the sequencing preference. Existing works in the literature model the nonuniformity with a single-component Poisson linear model that incorporates the effects of the sequencing preference. However, we observe consistently that the short reads mapped to a gene may have a mixture structure and can be zero-inflated. A single-component model may not suffice to model the complexity of such data. In this paper, we propose a zero-inflated mixture Poisson linear model for the RNA-Seq count data and derive a fast expectation-maximisation-based algorithm for estimating the unknown parameters. Numerical studies are conducted to illustrate the effectiveness of our method.
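
To make the idea of a zero-inflated mixture fitted by expectation-maximisation concrete, the following is a minimal sketch and not the authors' algorithm. It assumes an intercept-only simplification, P(Y = y) = pi0 * 1[y = 0] + sum_k pi_k * Poisson(y | lam_k), with no covariates for the sequencing preference; the abstract does not give the exact model form. The names `poisson_logpmf`, `fit_zi_mixture_poisson` and `init_rates` are hypothetical.

```python
import math
from collections import Counter


def poisson_logpmf(y, lam):
    """log P(Y = y) for Y ~ Poisson(lam), with lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def fit_zi_mixture_poisson(counts, init_rates, n_iter=300):
    """EM for P(Y=y) = pi0 * 1[y == 0] + sum_k pi_k * Poisson(y | lam_k).

    Assumption: intercept-only rates (no covariates), unlike the paper's
    Poisson linear model.
    counts     : iterable of non-negative integers
    init_rates : distinct positive starting rates, one per Poisson component
    Returns (pi0, weights, rates).
    """
    hist = Counter(counts)          # EM only needs the histogram of counts
    n = sum(hist.values())
    rates = list(init_rates)
    K = len(rates)
    pi0 = 0.5 * hist.get(0, 0) / n  # start: half of the zeros are structural
    weights = [(1.0 - pi0) / K] * K

    for _ in range(n_iter):
        # E-step: expected memberships, accumulated over distinct count values
        s0 = 0.0                    # expected number of structural zeros
        s_w = [0.0] * K             # expected size of each Poisson component
        s_y = [0.0] * K             # expected sum of counts in each component
        for y, freq in hist.items():
            dens = [w * math.exp(poisson_logpmf(y, lam))
                    for w, lam in zip(weights, rates)]
            zero_mass = pi0 if y == 0 else 0.0
            total = zero_mass + sum(dens)
            if total <= 0.0:        # density underflowed for this value; skip it
                continue
            s0 += freq * zero_mass / total
            for k in range(K):
                r = freq * dens[k] / total
                s_w[k] += r
                s_y[k] += r * y
        # M-step: closed-form updates of the weights and rates
        pi0 = s0 / n
        weights = [s / n for s in s_w]
        rates = [max(sy / sw, 1e-8) if sw > 0 else lam
                 for sy, sw, lam in zip(s_y, s_w, rates)]
    return pi0, weights, rates
```

A call such as `fit_zi_mixture_poisson(read_counts, init_rates=[1.0, 10.0])` would return the estimated zero-inflation weight, the component weights and the component rates for a hypothetical list `read_counts`. Extending the sketch to a Poisson linear model would replace the closed-form rate update with a weighted Poisson regression in the M-step.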

Subjects

Models, Genetic; Sequence Analysis, RNA; Algorithms; Computer Simulation; Gene Expression Regulation; Humans; Linear Models; Poisson Distribution

Keywords

Bayesian information criterion; RNA-Seq; mixture Poisson linear model; nonuniformity; overdispersion; zero-inflated count data

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Sequence Analysis, RNA / Models, Genetic | Study type: Prognostic_studies | Limit: Humans | Language: En | Journal: Genet Epidemiol | Journal subject: EPIDEMIOLOGY / MEDICAL GENETICS | Year of publication: 2019 | Document type: Article | Country of affiliation: Singapore
