Characterizing the impacts of dataset imbalance on single-cell data integration.
Maan, Hassaan; Zhang, Lin; Yu, Chengxin; Geuenich, Michael J; Campbell, Kieran R; Wang, Bo.
Affiliation
  • Maan H; Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada. hassaan.maan@mail.utoronto.ca.
  • Zhang L; Vector Institute, Toronto, Ontario, Canada. hassaan.maan@mail.utoronto.ca.
  • Yu C; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. hassaan.maan@mail.utoronto.ca.
  • Geuenich MJ; Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada.
  • Campbell KR; Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada.
  • Wang B; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.
Nat Biotechnol; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38429430
ABSTRACT
Computational methods for integrating single-cell transcriptomic data from multiple samples and conditions do not generally account for imbalances in the cell types measured in different datasets. In this study, we examined how differences in the cell types present, the number of cells per cell type and the cell type proportions across samples affect downstream analyses after integration. The Iniquitate pipeline assesses the robustness of integration results after perturbing the degree of imbalance between datasets. Benchmarking of five state-of-the-art single-cell RNA sequencing integration techniques in 2,600 integration experiments indicates that sample imbalance has substantial impacts on downstream analyses and the biological interpretation of integration results. Imbalance perturbation led to statistically significant variation in unsupervised clustering, cell type classification, differential expression and marker gene annotation, query-to-reference mapping and trajectory inference. We quantified the impacts of imbalance through two newly introduced properties: aggregate cell type support and minimum cell type center distance. To better characterize and mitigate impacts of imbalance, we introduce balanced clustering metrics and imbalanced integration guidelines for integration method users.
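
As a rough illustration of what perturbing the degree of imbalance between datasets can look like, the sketch below downsamples or removes one cell type in one batch before integration. It is not the Iniquitate implementation: the function name `perturb_batch`, its parameters, and the toy labels are all hypothetical, and the abstract does not specify how the pipeline performs its perturbations.

```python
import numpy as np

def perturb_batch(cell_types, batch_ids, target_batch, target_type,
                  mode="downsample", keep_fraction=0.1, seed=0):
    """Return a boolean keep-mask after perturbing one cell type in one batch.

    Hypothetical helper for illustration only (not part of Iniquitate).
    mode="downsample" keeps `keep_fraction` of the targeted cells;
    mode="ablate" removes all of them.
    """
    cell_types = np.asarray(cell_types)
    batch_ids = np.asarray(batch_ids)
    rng = np.random.default_rng(seed)

    keep = np.ones(cell_types.shape[0], dtype=bool)
    # Indices of cells of the targeted type within the targeted batch.
    idx = np.flatnonzero((batch_ids == target_batch) & (cell_types == target_type))

    if mode == "ablate":
        keep[idx] = False
    elif mode == "downsample":
        n_keep = int(round(keep_fraction * idx.size))
        # Randomly choose which targeted cells to drop, without replacement.
        drop = rng.choice(idx, size=idx.size - n_keep, replace=False)
        keep[drop] = False
    else:
        raise ValueError("mode must be 'downsample' or 'ablate'")
    return keep

# Toy labels (made up): two batches, two cell types.
cell_types = np.array(["T"] * 100 + ["B"] * 50 + ["T"] * 80 + ["B"] * 70)
batch_ids = np.array(["b1"] * 150 + ["b2"] * 150)

down_mask = perturb_batch(cell_types, batch_ids, "b1", "T", mode="downsample")
ablate_mask = perturb_batch(cell_types, batch_ids, "b1", "T", mode="ablate")
```

The resulting mask would be applied to the expression matrix before running an integration method, and downstream results (for example clustering) compared against the unperturbed run.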

Full text: 1 Collections: 01-international Database: MEDLINE Language: En Publication year: 2024 Document type: Article
