Propensity score matching enables batch-effect-corrected imputation in single-cell RNA-seq analysis.
Xu, Xinyi; Yu, Xiaokang; Hu, Gang; Wang, Kui; Zhang, Jingxiao; Li, Xiangjie.
Affiliation
  • Xu X; School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, 100081, China.
  • Yu X; Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, 100872, China.
  • Hu G; School of Statistics and Data Science, Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin, Nankai University, Tianjin 300071, China.
  • Wang K; School of Statistics and Data Science, Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin, Nankai University, Tianjin 300071, China.
  • Zhang J; Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, 100872, China.
  • Li X; School of Statistics and Data Science, Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin, Nankai University, Tianjin 300071, China.
Brief Bioinform; 23(4), 2022 Jul 18.
Article in English | MEDLINE | ID: mdl-35821114
Advances in single-cell RNA sequencing (scRNA-seq) technologies have enabled high-throughput biological discovery at single-cell resolution. However, large scRNA-seq datasets often suffer from substantial technical noise, including batch effects and dropouts, and dropout is often shown to be batch-dependent. Most existing methods address only one of these problems, and we show that popular methods fail to balance batch effect correction against dropout imputation. Here, inspired by the idea of causal inference, we propose a novel propensity score matching method for scRNA-seq data (scPSM) that borrows information from similar cells in the deeply sequenced batch and takes their weighted average, thereby simultaneously removing the batch effect, imputing dropouts and denoising the data in the entire gene expression space. The proposed method is tested on two simulated datasets and a variety of real scRNA-seq datasets, and the results show that scPSM outperforms other state-of-the-art methods. First, scPSM improves clustering accuracy and mixes cells of the same type, suggesting that it preserves cell type separation while correcting for batch. Second, using the scPSM-integrated data as input to differential expression analysis yields results free of batch effects and dropouts. Third, scPSM not only achieves ideal denoising but also preserves real biological structure for downstream gene-based analyses. Finally, scPSM is robust to hyperparameter choices and to small datasets with few cells but a very large number of genes. Comprehensive evaluations demonstrate that scPSM jointly provides desirable batch effect correction, imputation and denoising for recovering biologically meaningful expression in scRNA-seq data.
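
The core idea in the abstract (match each cell to similar cells in the deeply sequenced batch, then take a weighted average) can be illustrated with a generic propensity score matching sketch. This is not the authors' scPSM implementation: the function names `fit_propensity` and `psm_impute`, the logistic-regression propensity model, the `k` nearest matches, and the Gaussian kernel with `bandwidth` are all assumptions chosen for illustration, and the record above does not specify any of them.

```python
import numpy as np


def fit_propensity(features, is_deep, n_iter=500, lr=0.1):
    """Estimate P(cell is in the deep batch | features) with a plain
    logistic regression fitted by gradient descent (illustrative choice)."""
    x = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    x = np.hstack([x, np.ones((x.shape[0], 1))])  # intercept column
    w = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-x @ w))
        w -= lr * x.T @ (p - is_deep) / x.shape[0]
    return 1.0 / (1.0 + np.exp(-x @ w))


def psm_impute(expr, is_deep, k=5, bandwidth=0.05):
    """Replace each shallow-batch cell (row) by a weighted average of its
    k nearest deep-batch cells in propensity-score space.
    Hypothetical helper; deep-batch cells are returned unchanged."""
    is_deep = np.asarray(is_deep, dtype=bool)
    score = fit_propensity(expr, is_deep.astype(float))
    deep_idx = np.flatnonzero(is_deep)
    out = expr.astype(float).copy()
    for i in np.flatnonzero(~is_deep):
        dist = np.abs(score[deep_idx] - score[i])
        nearest = np.argsort(dist)[:k]
        # Gaussian kernel: closer propensity scores get larger weights.
        weights = np.exp(-(dist[nearest] / bandwidth) ** 2)
        weights /= weights.sum()
        out[i] = weights @ expr[deep_idx[nearest]]
    return out
```

Here `expr` is a cells-by-genes matrix and `is_deep` flags the cells from the deeply sequenced batch. Because every shallow-batch cell becomes a convex combination of deep-batch cells, zeros caused by dropout are filled in and the output lies in the expression range of the reference batch. A practical pipeline would likely compute the scores on a reduced representation rather than on raw counts.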

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Gene Expression Profiling / Single-Cell Analysis Language: En Journal: Brief Bioinform Journal subject: BIOLOGY / MEDICAL INFORMATICS Publication year: 2022 Document type: Article
