CHOIR improves significance-based detection of cell types and states from single-cell data.
bioRxiv; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38328105
ABSTRACT
Clustering is a critical step in the analysis of single-cell data, as it enables the discovery and characterization of putative cell types and states. However, most popular clustering tools do not subject clustering results to statistical inference testing, leading to risks of overclustering or underclustering data and often resulting in ineffective identification of cell types with widely differing prevalence. To address these challenges, we present CHOIR (clustering hierarchy optimization by iterative random forests), which applies a framework of random forest classifiers and permutation tests across a hierarchical clustering tree to statistically determine which clusters represent distinct populations. We demonstrate the enhanced performance of CHOIR through extensive benchmarking against 14 existing clustering methods across 100 simulated and 4 real single-cell RNA-seq, ATAC-seq, spatial transcriptomic, and multi-omic datasets. CHOIR can be applied to any single-cell data type and provides a flexible, scalable, and robust solution to the important challenge of identifying biologically relevant cell groupings within heterogeneous single-cell data.
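The core idea in the abstract, using a classifier plus a permutation test to decide whether two candidate clusters are statistically distinct, can be illustrated with a minimal sketch. This is not CHOIR's implementation or API. To stay dependency-free, it swaps the random forest for a simple nearest-centroid classifier, and the function names (`holdout_accuracy`, `permutation_p_value`), the 50/50 stratified split, and the 0.05 threshold are all assumptions made for illustration. It also assumes each cluster contains at least two cells.

```python
import random


def centroid(rows):
    """Mean feature vector of a list of cells."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def holdout_accuracy(cells, labels, rng):
    """Stratified 50/50 split: fit nearest-centroid on one half, score the other.

    Nearest-centroid is a stand-in here for the random forest classifier
    named in the abstract (assumption, chosen to avoid dependencies).
    """
    classes = sorted(set(labels))
    train, test = [], []
    for lab in classes:
        idx = [i for i, l in enumerate(labels) if l == lab]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += idx[:half]
        test += idx[half:]
    cents = {
        lab: centroid([cells[i] for i in train if labels[i] == lab])
        for lab in classes
    }
    hits = 0
    for i in test:
        pred = min(cents, key=lambda lab: sq_dist(cells[i], cents[lab]))
        hits += int(pred == labels[i])
    return hits / len(test)


def permutation_p_value(cells, labels, n_perm=199, seed=0):
    """Share of label shufflings whose accuracy matches or beats the real labels."""
    rng = random.Random(seed)
    observed = holdout_accuracy(cells, labels, rng)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between cells and labels
        if holdout_accuracy(cells, shuffled, rng) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


# Toy data: two well-separated groups of 20 "cells" with 2 features each.
rng = random.Random(1)
group_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
group_b = [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(20)]
p = permutation_p_value(group_a + group_b, ["a"] * 20 + ["b"] * 20)
print("keep clusters separate" if p < 0.05 else "merge clusters", p)
```

In a hierarchical setting, a test of this kind would be repeated for pairs of sibling clusters along the clustering tree, merging pairs that fail to reach significance. How CHOIR traverses the tree and corrects for multiple comparisons is not described in the abstract and is not modeled here.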
Full text: 1
Collection: 01-international
Database: MEDLINE
Study type: Diagnostic_studies / Risk_factors_studies
Language: En
Journal: BioRxiv
Publication year: 2024
Document type: Article
Affiliation country: United States