Signal recovery in single cell batch integration.
Zhang, Zhaojun; Mathew, Divij; Lim, Tristan; Mason, Kaishu; Martinez, Clara Morral; Huang, Sijia; Wherry, E John; Susztak, Katalin; Minn, Andy J; Ma, Zongming; Zhang, Nancy R.
Affiliation
  • Zhang Z; Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, PA, United States.
  • Mathew D; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, United States.
  • Lim T; Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, PA, United States.
  • Mason K; Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, PA, United States.
  • Martinez CM; Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, PA, United States.
  • Huang S; Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, PA, United States.
  • Wherry EJ; Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, PA, United States.
  • Susztak K; Mark Foundation Center for Immunotherapy, Immune Signaling, and Radiation, Perelman School of Medicine, University of Pennsylvania, PA, United States.
  • Minn AJ; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, PA, United States.
  • Ma Z; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, United States.
  • Zhang NR; Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, PA, United States.
bioRxiv; 2023 Sep 23.
Article in En | MEDLINE | ID: mdl-37215021
ABSTRACT
Data integration to align cells across batches has become a cornerstone of single cell data analysis, critically affecting downstream results. Yet, how much biological signal is erased during integration? Currently, there are no guidelines for when the biological differences between samples are separable from batch effects, and thus, data integration usually involves a lot of guesswork: cells across batches should be aligned to be "appropriately" mixed, while preserving "main cell type clusters". We show evidence that current paradigms for single cell data integration are unnecessarily aggressive, removing biologically meaningful variation. To remedy this, we present a novel statistical model and computationally scalable algorithm, CellANOVA, to recover biological signal that is lost during single cell data integration. CellANOVA utilizes a "pool-of-controls" design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest. When applied with existing integration methods, CellANOVA allows the recovery of subtle biological signals and corrects, to a large extent, the data distortion introduced by integration. Further, CellANOVA explicitly estimates cell- and gene-specific batch effect terms, which can be used to identify the cell types and pathways exhibiting the largest batch variations, providing clarity as to which biological signals can be recovered. These concepts are illustrated on studies of diverse designs, where the biological signals recovered by CellANOVA are validated by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nuclei data integration, where the recovered biological signals are replicated in an independent study.

Full text: 1 Database: MEDLINE Language: En Publication year: 2023 Document type: Article
