INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis.
Zhao, Kai; Huang, Sen; Lin, Cuichan; Sham, Pak Chung; So, Hon-Cheong; Lin, Zhixiang.
Affiliation
  • Zhao K; Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
  • Huang S; Department of System Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
  • Lin C; Department of Psychiatry, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
  • Sham PC; Department of Psychiatry, University of Hong Kong, Pokfulam, Hong Kong, China.
  • So HC; Centre for Genomic Sciences, University of Hong Kong, Pokfulam, Hong Kong, China.
  • Lin Z; State Key Laboratory for Cognitive and Brain Sciences, University of Hong Kong, Pokfulam, Hong Kong, China.
PLoS Genet ; 20(3): e1011189, 2024 Mar.
Article in En | MEDLINE | ID: mdl-38484017
ABSTRACT
RNA sequencing (RNA-Seq) is widely used to capture transcriptome dynamics across tissues, biological entities, and conditions. Currently, few if any methods can handle multiple biological variables (e.g., tissues/phenotypes) and their interactions simultaneously while also achieving dimension reduction (DR). We propose INSIDER, a general and flexible statistical framework based on matrix factorization, which is freely available at https://github.com/kai0511/insider. INSIDER decomposes variation from different biological variables and their interactions into a shared low-rank latent space. In particular, it introduces the elastic net penalty to induce sparsity while accounting for the grouping effects of genes. It can achieve DR of high-dimensional data (≥ 3 dimensions), whereas conventional methods (e.g., PCA/NMF) generally handle only 2D data (e.g., sample × expression). In addition, it enables computing 'adjusted' expression profiles for specific biological variables while controlling for variation from other variables. INSIDER is computationally efficient and accommodates missing data. In simulations, INSIDER performed similarly to or better than a closely related competing method, SDA, and it can handle complex missing-data patterns in RNA-Seq data. Moreover, unlike SDA, it can be used when the data cannot be structured into a tensor. Lastly, we demonstrate its usefulness through real data analysis, including clustering donors for disease subtyping, revealing the neurodevelopmental trajectory using the BrainSpan data, and uncovering biological processes that contribute to variables of interest (e.g., disease status and tissue) and their interactions.
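
The general idea in the abstract (covariate-specific latent vectors that share one low-rank space, a sparse gene-loading matrix, and tolerance of missing entries) can be illustrated with a small NumPy sketch. This is not the authors' implementation, which is available at the repository above. The additive model, the proximal-gradient optimizer, and every name here (`fit_sketch`, `reconstruct`, `alpha`, `l1_ratio`) are assumptions made only for illustration.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (the L1 part of an elastic net)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def reconstruct(factors, covariates, V):
    """Predicted expression: sum the latent vectors of each covariate, then map to genes."""
    Z = sum(F[c] for F, c in zip(factors, covariates))  # (n_samples, rank)
    return Z @ V                                        # (n_samples, n_genes)


def fit_sketch(Y, covariates, rank=2, alpha=0.05, l1_ratio=0.5,
               lr=0.02, n_iter=3000, seed=0):
    """Hypothetical sketch: Y[i] ~ (sum of covariate embeddings for sample i) @ V.

    Y          : (n_samples, n_genes) array; NaN marks missing entries.
    covariates : list of integer label arrays, one per biological variable
                 (an interaction can be passed as its own label array).
    An elastic-net penalty (assumed weighting alpha, l1_ratio) is applied to V.
    """
    rng = np.random.default_rng(seed)
    n, g = Y.shape
    mask = ~np.isnan(Y)            # observed entries only
    Y0 = np.nan_to_num(Y)          # placeholder zeros where data are missing
    factors = [0.1 * rng.standard_normal((int(c.max()) + 1, rank))
               for c in covariates]
    V = 0.1 * rng.standard_normal((rank, g))

    for _ in range(n_iter):
        Z = sum(F[c] for F, c in zip(factors, covariates))
        R = mask * (Z @ V - Y0)    # residuals, zeroed at missing entries
        grad_Z = R @ V.T / n
        # Smooth part of the objective: squared error plus the ridge (L2) term.
        grad_V = Z.T @ R / n + alpha * (1.0 - l1_ratio) * V
        for F, c in zip(factors, covariates):
            grad_F = np.zeros_like(F)
            np.add.at(grad_F, c, grad_Z)   # sum gradients of samples sharing a label
            F -= lr * grad_F
        # Gradient step followed by the L1 proximal step on the gene loadings.
        V = soft_threshold(V - lr * grad_V, lr * alpha * l1_ratio)
    return factors, V
```

In this sketch, each row of a returned factor matrix is a low-dimensional representation of one level of a biological variable (for example, one donor or one tissue), and `V` holds the gene loadings shared by all variables. Under the same assumptions, an 'adjusted' profile for one variable would correspond to multiplying only that variable's factor matrix by `V`.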
Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Transcriptome Language: En Journal: PLoS Genet Journal subject: GENETICS Year of publication: 2024 Document type: Article Country of affiliation: China

...