A model selection criterion for model-based clustering of annotated gene expression data.

Gallopin, Mélina; Celeux, Gilles; Jaffrézic, Florence; Rau, Andrea

Stat Appl Genet Mol Biol; 14(5): 413-28, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26461845

ABSTRACT

In co-expression analyses of gene expression data, it is often of interest to interpret clusters of co-expressed genes with respect to a set of external information, such as a potentially incomplete list of functional properties for which a subset of genes may be annotated. Based on the framework of finite mixture models, we propose a model selection criterion that takes into account such external gene annotations, providing an efficient tool for selecting a relevant number of clusters and clustering model. This criterion, called the integrated completed annotated likelihood (ICAL), is defined by adding an entropy term to a penalized likelihood to measure the concordance between a clustering partition and the external annotation information. The ICAL leads to the choice of a model that is more easily interpretable with respect to the known functional gene annotations. We illustrate the interest of this model selection criterion in conjunction with Gaussian mixture models on simulated gene expression data and on real RNA-seq data.
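
As a rough illustration of the idea described above, the following Python sketch combines a BIC-type penalized log-likelihood, an entropy term on the posterior cluster membership probabilities, and a term measuring how well a hard partition agrees with binary gene annotations. It is not the paper's exact ICAL formula: the function names, the Bernoulli form of the annotation term, and the use of NaN to mark unannotated genes are all assumptions made for illustration.

```python
import numpy as np


def entropy_term(tau, eps=1e-12):
    """Sum over genes and clusters of tau * log(tau); always <= 0.

    tau is an (n_genes, n_clusters) array of posterior membership
    probabilities. Values near 0 mean a well-separated partition.
    """
    tau = np.clip(tau, eps, 1.0)
    return float(np.sum(tau * np.log(tau)))


def annotation_term(labels, annot, n_clusters, eps=1e-12):
    """Bernoulli log-likelihood of annotations given hard cluster labels.

    annot is (n_genes, n_annotations) with entries 0/1; np.nan marks a
    gene with no annotation information, which is skipped (assumption).
    The result is <= 0 and is close to 0 when every cluster is
    homogeneous with respect to every annotation.
    """
    total = 0.0
    for k in range(n_clusters):
        block = annot[labels == k]
        if block.shape[0] == 0:
            continue
        n_obs = np.sum(~np.isnan(block), axis=0)   # annotated genes per term
        n_pos = np.nansum(block, axis=0)           # genes carrying the term
        keep = n_obs > 0
        p = np.clip(n_pos[keep] / n_obs[keep], eps, 1 - eps)
        total += np.sum(
            n_pos[keep] * np.log(p)
            + (n_obs[keep] - n_pos[keep]) * np.log(1 - p)
        )
    return float(total)


def annotated_criterion(loglik, n_params, tau, annot):
    """Illustrative criterion (larger is better); hypothetical helper.

    BIC-type penalized log-likelihood + entropy term + annotation term.
    """
    n, n_clusters = tau.shape
    bic = loglik - 0.5 * n_params * np.log(n)
    labels = np.argmax(tau, axis=1)  # MAP partition
    return bic + entropy_term(tau) + annotation_term(labels, annot, n_clusters)
```

In use, one would fit a mixture model for each candidate number of clusters, evaluate `annotated_criterion` on each fit, and keep the model with the largest value. With equal likelihoods, a partition whose clusters line up with the annotations scores higher than one that mixes annotated and unannotated genes within clusters.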

Subjects

Molecular Sequence Annotation; Algorithms; Cluster Analysis; Data Interpretation, Statistical; Gene Expression; Gene Expression Profiling; Models, Genetic; Sequence Analysis, RNA

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Molecular Sequence Annotation | Type of study: Prognostic_studies | Language: En | Publication year: 2015 | Document type: Article
