How does the structure of data impact cell-cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data.
Watson, Ebony Rose; Mora, Ariane; Taherian Fard, Atefeh; Mar, Jessica Cara.
  • Watson ER; Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia.
  • Mora A; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia.
  • Taherian Fard A; Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia.
  • Mar JC; Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia.
Brief Bioinform; 23(6), 2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36151725
Accurately identifying cell populations is paramount to the quality of downstream analyses and the overall interpretation of single-cell RNA-seq (scRNA-seq) datasets, but it remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify a highest-performing metric. However, the 'best-performing' metric varies between studies, and performance differs significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies have not factored these structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by selecting a proximity metric and neighbourhood size appropriate to the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several less commonly applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical importance of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data.
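To make the link between data structure, proximity metric and neighbourhood size more concrete, here is a minimal sketch. It is not the authors' framework and does not reproduce their 17 metrics or their datasets; the toy count matrix `TOY_CELLS`, the helper names (`sparsity`, `knn`, `pearson_distance`) and the choice of three metrics are hypothetical, chosen only for illustration. It computes one structural property (sparsity, the fraction of zero entries) and then builds k-nearest-neighbour lists under Euclidean, Manhattan and correlation-based distance. In the unnormalised toy data, cells of the same expression "shape" differ tenfold in depth, so the magnitude-sensitive metrics pair cells by depth while the correlation distance pairs them by profile.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def sparsity(matrix: List[List[float]]) -> float:
    """Fraction of zero entries in a cells x genes expression matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for v in row if v == 0)
    return zeros / total


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a: Vector, b: Vector) -> float:
    """1 - Pearson correlation: insensitive to scaling/shifting a cell's profile."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if den == 0:
        return 1.0  # assumption: treat a constant profile as uncorrelated
    return 1.0 - num / den


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "pearson": pearson_distance,
}


def knn(matrix: List[List[float]], metric: Callable[[Vector, Vector], float], k: int) -> List[List[int]]:
    """For each cell, return indices of its k nearest other cells (ties broken by index)."""
    neighbours = []
    for i, cell in enumerate(matrix):
        dists = [(metric(cell, other), j) for j, other in enumerate(matrix) if j != i]
        dists.sort()
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


# Hypothetical toy data (4 cells x 4 genes, unnormalised counts):
# cells 0 and 1 share one profile shape, cells 2 and 3 share another;
# cells 1 and 3 are sequenced 10x deeper than cells 0 and 2.
TOY_CELLS = [
    [2, 0, 1, 0],
    [20, 0, 10, 0],
    [0, 2, 0, 1],
    [0, 20, 0, 10],
]

if __name__ == "__main__":
    print("sparsity:", sparsity(TOY_CELLS))  # 8 zeros out of 16 entries -> 0.5
    for name, metric in METRICS.items():
        print(name, knn(TOY_CELLS, metric, k=1))
```

With `k=1`, Euclidean and Manhattan distance both return `[[2], [0], [0], [2]]`, grouping the two shallow cells together, whereas the correlation distance returns `[[1], [0], [3], [2]]`, recovering the two profile groups. In practice, normalisation and dimensionality reduction (which the abstract also highlights) would change this picture, and the neighbourhood size `k` is a separate parameter worth varying.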

Full text: 1 | Database: MEDLINE | Main subject: Benchmarking / Single-Cell Analysis | Study type: Guideline / Prognostic_studies | Language: English | Year: 2022 | Document type: Article