Finite mixtures of matrix variate Poisson-log normal distributions for three-way count data.

Silva, Anjali; Qin, Xiaoke; Rothstein, Steven J; McNicholas, Paul D; Subedi, Sanjeena

Affiliation

Silva A; Department of Mathematics and Statistics, University of Guelph, Guelph, ON N1G 2W1, Canada.
Qin X; Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.
Rothstein SJ; Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.
McNicholas PD; Department of Mathematics and Statistics, McMaster University, Hamilton, ON L8S 4L8, Canada.
Subedi S; School of Mathematics and Statistics, Carleton University, Ottawa, ON K1S 5B6, Canada.

Bioinformatics; 39(5), 2023 May 4.

Article in English | MEDLINE | ID: mdl-37018147

ABSTRACT

MOTIVATION:

Three-way data structures, characterized by three entities (the units, the variables, and the occasions), are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix variate distributions offer a natural way to model three-way data, and mixtures of matrix variate distributions can be used to cluster three-way data. Clustering of gene expression data is carried out as a means of discovering gene co-expression networks.
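To make the three-way structure concrete, the following is a minimal sketch (not the mixMVPLN API) that simulates such data: each of n genes contributes one r × p count matrix (r occasions by p conditions). It assumes the usual Poisson-log normal hierarchy, in which a latent matrix θ follows a matrix normal distribution with mean M_g, row covariance Φ_g (r × r) and column covariance Ω_g (p × p), so that vec(θ) has covariance Ω_g ⊗ Φ_g, and the counts are conditionally independent Poisson with rates exp(θ). Library-size normalization offsets are omitted for simplicity, and all function names and parameter values are hypothetical.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a small symmetric positive-definite matrix)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def matmul(x, y):
    """Plain nested-list matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_matrix_normal(M, Phi, Omega, rng):
    """theta = M + A Z B^T with Phi = A A^T (r x r), Omega = B B^T (p x p), Z iid N(0, 1),
    so vec(theta) ~ N(vec(M), Omega kron Phi) with column-stacking vec."""
    A, B = cholesky(Phi), cholesky(Omega)
    r, p = len(M), len(M[0])
    Z = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(r)]
    E = matmul(matmul(A, Z), transpose(B))
    return [[M[j][k] + E[j][k] for k in range(p)] for j in range(r)]


def sample_mixture_mvpln(n, pis, means, Phis, Omegas, seed=0):
    """Hypothetical simulator: for each of n genes, draw a component g ~ pis, a latent
    r x p matrix theta ~ MN(M_g, Phi_g, Omega_g), then Y_jk ~ Poisson(exp(theta_jk))."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n):
        g = rng.choices(range(len(pis)), weights=pis)[0]
        theta = sample_matrix_normal(means[g], Phis[g], Omegas[g], rng)
        data.append([[sample_poisson(math.exp(t), rng) for t in row] for row in theta])
        labels.append(g)
    return data, labels


# Example: 100 genes, r = 2 occasions, p = 3 conditions, two clusters.
r, p = 2, 3
means = [[[1.0] * p for _ in range(r)], [[3.0] * p for _ in range(r)]]
Phi = [[1.0, 0.3], [0.3, 1.0]]
Omega = [[0.5, 0.1, 0.0], [0.1, 0.5, 0.1], [0.0, 0.1, 0.5]]
Y, z = sample_mixture_mvpln(100, [0.4, 0.6], means, [Phi, Phi], [Omega, Omega], seed=1)
```

Clustering reverses this process: given only the count matrices Y, a fitted mixture infers the component memberships z together with M_g, Φ_g and Ω_g.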

RESULTS:

In this work, a mixture of matrix variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. By considering the matrix variate structure, full information on the conditions and occasions of the RNA sequencing dataset is considered simultaneously, and the number of covariance parameters to be estimated is reduced. We propose three different frameworks for parameter estimation: a Markov chain Monte Carlo-based approach, a variational Gaussian approximation-based approach, and a hybrid approach. Various information criteria are used for model selection. The models are applied to both real and simulated data, and we demonstrate that the proposed approaches can recover the underlying cluster structure in both cases. In simulation studies where the true model parameters are known, our proposed approach shows good parameter recovery.

AVAILABILITY AND IMPLEMENTATION:

The R package for this work is available on GitHub at https://github.com/anjalisilva/mixMVPLN and is released under the open-source MIT license.
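The reduction in covariance parameters and the use of information criteria for model selection can be illustrated with simple arithmetic. The sketch below compares an unrestricted rp × rp covariance for vec(θ) with the Kronecker-structured form Ω ⊗ Φ, and computes AIC and BIC from a (possibly approximate) maximized log-likelihood. The one-parameter scale adjustment for identifiability, the free-parameter count for a G-component mixture, and the choice of AIC and BIC specifically are assumptions for illustration; the paper's exact counts and criteria may differ, and the function names are hypothetical.

```python
import math


def n_cov_params_unstructured(r, p):
    """Free parameters in an unrestricted rp x rp covariance matrix of vec(theta)."""
    d = r * p
    return d * (d + 1) // 2


def n_cov_params_kronecker(r, p):
    """Free parameters in Omega kron Phi (Phi: r x r, Omega: p x p).
    Assumption: subtract one, since (c * Omega) kron (Phi / c) is the same matrix for any c > 0."""
    return r * (r + 1) // 2 + p * (p + 1) // 2 - 1


def n_free_params(G, r, p):
    """Hypothetical count for a G-component mixture: G - 1 mixing proportions,
    G * r * p mean entries, and G Kronecker-structured covariance matrices."""
    return (G - 1) + G * r * p + G * n_cov_params_kronecker(r, p)


def aic(loglik, k):
    """Akaike information criterion, smaller is better: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion, smaller is better: -2 log L + k log n (n = number of genes)."""
    return -2.0 * loglik + k * math.log(n)
```

For r = 3 occasions and p = 4 conditions, the unrestricted covariance has 78 free parameters per component, while the Kronecker form has 15. Under the smaller-is-better convention used here, the candidate model (for example, a given number of components G) with the lowest criterion value would be selected.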

Subjects

Transcriptome; Normal Distribution; Computer Simulation; Statistical Distributions; Sequence Analysis, RNA

Database: MEDLINE | Main subject: Transcriptome | Language: English | Year of publication: 2023 | Document type: Article | Country of affiliation: Canada