The impacts of active and self-supervised learning on efficient annotation of single-cell expression data.
Geuenich, Michael J; Gong, Dae-Won; Campbell, Kieran R.
Affiliations
  • Geuenich MJ; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada. mgeuenich@lunenfeld.ca.
  • Gong DW; Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada. mgeuenich@lunenfeld.ca.
  • Campbell KR; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada.
Nat Commun; 15(1): 1014, 2024 Feb 03.
Article in English | MEDLINE | ID: mdl-38307875
ABSTRACT
A crucial step in the analysis of single-cell data is annotating cells to cell types and states. While a myriad of approaches have been proposed, manually labeling cells to create training datasets remains tedious and time-consuming. In machine learning, active and self-supervised learning methods have been proposed to improve classifier performance while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we comprehensively benchmark active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity between cell types. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data (including a marker-aware version) that performs competitively with existing approaches. In addition, we demonstrate that prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings as a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work is available at https://github.com/camlab-bioml/leader.
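The abstract names active learning without spelling out its mechanics. As a generic illustration only (not the authors' adaptive reweighting heuristic and not the API of the leader package), the following minimal sketch shows pool-based active learning with entropy-based uncertainty sampling, one standard active learning strategy: at each round the classifier is refit on the cells labeled so far, and the unlabeled cells it is least certain about are sent for annotation. It assumes a toy nearest-centroid classifier standing in for a real cell type annotation algorithm, dense per-cell expression vectors, and a hypothetical `oracle` list that simulates an expert annotator; all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def centroids(X: Matrix, labels: Dict[int, str]) -> Dict[str, List[float]]:
    """Mean expression profile of each cell type among the annotated cells."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for i, cell_type in labels.items():
        if cell_type not in sums:
            sums[cell_type] = [0.0] * len(X[i])
            counts[cell_type] = 0
        sums[cell_type] = [s + x for s, x in zip(sums[cell_type], X[i])]
        counts[cell_type] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def predict_proba(x: Sequence[float], cents: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical stand-in classifier: softmax over negative squared distances to centroids."""
    scores = {t: -sum((a - b) ** 2 for a, b in zip(x, c)) for t, c in cents.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    weights = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy in nats; higher means the classifier is less certain."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def select_batch(X: Matrix, labels: Dict[int, str], batch_size: int) -> List[int]:
    """Uncertainty sampling: return the unlabeled cells with the highest predictive entropy."""
    cents = centroids(X, labels)
    unlabeled = [i for i in range(len(X)) if i not in labels]
    ranked = sorted(unlabeled, key=lambda i: entropy(predict_proba(X[i], cents)), reverse=True)
    return ranked[:batch_size]


def active_learning(X: Matrix, oracle: Sequence[str], seed: Sequence[int],
                    rounds: int, batch_size: int) -> Dict[int, str]:
    """Pool-based loop; `oracle` is a hypothetical stand-in for an expert annotator."""
    labels = {i: oracle[i] for i in seed}
    for _ in range(rounds):
        batch = select_batch(X, labels, batch_size)
        if not batch:  # every cell is already labeled
            break
        for i in batch:
            labels[i] = oracle[i]  # simulated manual annotation of the queried cell
    return labels
```

On a toy dataset with two well-separated groups plus two cells between them, the loop queries the in-between cells first, because cells near a centroid are already predicted with near-zero entropy. A real benchmark would swap in an actual annotation algorithm and a labeling budget; random sampling of cells is the usual baseline to compare against.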
Subjects

Database: MEDLINE | Main subject: Algorithms / Machine Learning | Language: English | Journal: Nat Commun | Journal subject: Biology / Science | Publication year: 2024 | Document type: Article | Country of affiliation: Canada