Supervised application of internal validation measures to benchmark dimensionality reduction methods in scRNA-seq data.
Koch, Forrest C; Sutton, Gavin J; Voineagu, Irina; Vafaee, Fatemeh.
Affiliation
  • Koch FC; School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia.
  • Sutton GJ; School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia.
  • Voineagu I; School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia.
  • Vafaee F; UNSW Data Science Hub, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia.
Brief Bioinform; 22(6), 2021-11-05.
Article in English | MEDLINE | ID: mdl-34374742
ABSTRACT
A typical single-cell RNA sequencing (scRNA-seq) experiment will measure on the order of 20 000 transcripts and thousands, if not millions, of cells. The high dimensionality of such data presents serious complications for traditional data analysis methods and, as such, methods to reduce dimensionality play an integral role in many analysis pipelines. However, few studies have benchmarked the performance of these methods on scRNA-seq data, with existing comparisons assessing performance via downstream analysis accuracy measures, which may confound the interpretation of their results. Here, we present the most comprehensive benchmark of dimensionality reduction methods in scRNA-seq data to date, utilizing over 300 000 compute hours to assess the performance of over 25 000 low-dimension embeddings across 33 dimensionality reduction methods and 55 scRNA-seq datasets. We employ a simple, yet novel, approach, which does not rely on the results of downstream analyses. Internal validation measures (IVMs), traditionally used as an unsupervised method to assess clustering performance, are repurposed to measure how well-formed biological clusters are after dimensionality reduction. Performance was further evaluated over nearly 200 000 000 iterations of DBSCAN, a density-based clustering algorithm, showing that hyperparameter optimization using IVMs as the objective function leads to near-optimal clustering. Methods were also assessed on the extent to which they preserve the global structure of the data, and on their computational memory and time requirements across a large range of sample sizes. Our comprehensive benchmarking analysis provides a valuable resource for researchers and aims to guide best practice for dimensionality reduction in scRNA-seq analyses, and we highlight Latent Dirichlet Allocation and Potential of Heat-diffusion for Affinity-based Transition Embedding as high-performing algorithms.
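The core idea of the abstract, scoring an embedding with an internal validation measure computed on known biological labels rather than on predicted clusters, can be illustrated with a minimal sketch. It uses the mean silhouette width as one example of an IVM; the abstract does not say which IVMs the authors used. The function name `mean_silhouette`, the cell-type labels, and the two toy embeddings are hypothetical and invented for illustration only.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette width of `points` grouped by the given `labels`.

    In the supervised setting, `labels` are known cell-type annotations,
    not the output of a clustering algorithm.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two groups")

    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton groups contribute 0 by convention
        # a: mean distance to the other members of the same group
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other group
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical annotations for six cells (assumption, not data from the paper).
labels = ["T cell"] * 3 + ["B cell"] * 3

# Toy 2-D embeddings of the same six cells from two imaginary methods.
embedding_a = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # types separated
embedding_b = [(0, 0), (10, 10), (1, 0), (0, 1), (10, 11), (11, 10)]  # types intermixed

score_a = mean_silhouette(embedding_a, labels)
score_b = mean_silhouette(embedding_b, labels)
print(f"embedding A: {score_a:.3f}")
print(f"embedding B: {score_b:.3f}")
```

A higher score indicates that cells sharing an annotation sit closer to each other than to cells of other types in the embedding, so in this toy case embedding A would be ranked above embedding B without running any downstream clustering.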
Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Sequence Analysis, RNA / Benchmarking / RNA, Small Cytoplasmic | Study type: Guideline | Limit: Humans | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year: 2021 | Document type: Article | Affiliation country: Australia