Inference on differences between classes using cluster-specific contrasts of mixed effects.

Ng, Shu Kay; McLachlan, Geoffrey J; Wang, Kui; Nagymanyoki, Zoltan; Liu, Shubai; Ng, Shu-Wing

Affiliation

Ng SK; School of Medicine, Griffith Health Institute, Griffith University, Meadowbrook, QLD 4131, Australia. Email: s.ng@griffith.edu.au
McLachlan GJ; Department of Mathematics, University of Queensland, Brisbane, QLD 4072, Australia.
Wang K; Department of Mathematics, University of Queensland, Brisbane, QLD 4072, Australia.
Nagymanyoki Z; Laboratory of Gynecologic Oncology, Department of Obstetrics, Gynecology and Reproductive Biology, Brigham and Women's Hospital, Boston, MA 02115, USA.
Liu S; Laboratory of Gynecologic Oncology, Department of Obstetrics, Gynecology and Reproductive Biology, Brigham and Women's Hospital, Boston, MA 02115, USA.
Ng SW; Laboratory of Gynecologic Oncology, Department of Obstetrics, Gynecology and Reproductive Biology, Brigham and Women's Hospital, Boston, MA 02115, USA.

Biostatistics; 16(1): 98-112, 2015 Jan.

Article in English | MEDLINE | ID: mdl-24963011

ABSTRACT

The detection of differentially expressed (DE) genes, that is, genes whose expression levels vary between two or more classes representing different experimental conditions (say, diseases), is one of the most commonly studied problems in bioinformatics. For example, identifying DE genes between distinct disease phenotypes is an important first step in understanding a disease and developing drugs to treat it. We present a novel approach to detecting DE genes, based on a test statistic formed as a weighted (normalized) cluster-specific contrast in the mixed effects of the mixture model that is first used to cluster the gene profiles into a manageable number of clusters. The key factor in forming the test statistic is the use of gene-specific mixed effects in the cluster-specific contrast. As a result, the (soft) assignment of a given gene to a cluster is not crucial: in addition to class differences between the (estimated) fixed-effects terms for a cluster, gene-specific class differences also enter the cluster-specific contributions to the final form of the test statistic. The proposed test statistic can be used where the primary aim is to rank the genes in order of evidence against the null hypothesis of no DE. We also show how a P-value can be calculated for each gene for use in multiple hypothesis testing where the intent is to control the false discovery rate (FDR) at some desired level. Using publicly available and simulated datasets, we show that the proposed contrast-based approach outperforms other methods commonly used for detecting DE genes, both in a ranking context, with a lower proportion of false discoveries, and in a multiple hypothesis testing context, with higher power for a specified level of the FDR.
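
To illustrate the multiple-testing step mentioned above, the following is a minimal sketch of how per-gene P-values might be thresholded to control the FDR at a chosen level. The abstract does not state which FDR procedure the authors use; the Benjamini-Hochberg step-up rule shown here is an assumption, and the function name `benjamini_hochberg` and the parameter `alpha` are hypothetical names introduced only for this example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag genes as DE using the Benjamini-Hochberg step-up rule.

    Assumption: this is a generic FDR procedure, not necessarily the one
    used in the paper. Returns a list of booleans aligned with p_values,
    True where the null hypothesis of no DE is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Gene indices ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Reject the k_max hypotheses with the smallest P-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical P-values for ten genes (illustrative numbers only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, alpha=0.05)
```

Because the rule is step-up, a gene can be flagged even if its own P-value exceeds its rank threshold, provided a gene with a larger P-value meets its own threshold.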

Subjects

Cluster Analysis; Data Interpretation, Statistical; Gene Expression Profiling/statistics & numerical data; Gene Expression/genetics; Models, Genetic; Breast Neoplasms/genetics; Female; Humans

Keywords

Contrast; Differential expression; Mixture model; Random effects modeling

Full text: 1 | Database: MEDLINE | Main subject: Cluster Analysis / Gene Expression / Data Interpretation, Statistical / Gene Expression Profiling / Models, Genetic | Type of study: Prognostic_studies | Limits: Female / Humans | Language: En | Journal: Biostatistics | Year of publication: 2015 | Document type: Article | Country of affiliation: Australia
