Analyzing Large-Scale Single-Cell RNA-Seq Data Using Coreset.
IEEE/ACM Trans Comput Biol Bioinform; PP, 2024 Jun 24.
Article | En | MEDLINE | ID: mdl-38913513
ABSTRACT
The recent boom in single-cell sequencing technologies provides valuable insights into the transcriptomes of individual cells. Through single-cell data analyses, a number of biological discoveries, such as novel cell types, developmental cell lineage trajectories, and gene regulatory networks, have been uncovered. However, the massive and rapidly accumulating single-cell datasets also pose a serious computational and analytical challenge for researchers. To address this issue, one typically applies dimensionality reduction approaches to shrink the large-scale datasets. However, these approaches are generally computationally infeasible for tall matrices. In addition, downstream data analysis tasks such as clustering still have high time complexity even on the dimension-reduced datasets. We present single-cell Coreset (scCoreset), a data summarization framework that extracts a small weighted subset of cells from a huge sparse single-cell RNA-seq dataset to facilitate downstream data analysis tasks. Single-cell data analyses run on the extracted subset yield results similar to those derived from the original uncompressed data. Tests on various single-cell datasets show that scCoreset outperforms existing data summarization approaches for common downstream tasks such as visualization and clustering. We believe that scCoreset can serve as a useful plug-in tool to improve the efficiency of current single-cell RNA-seq data analyses.
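To make the idea of a "small weighted subset of cells" concrete, the following is a minimal sketch of a generic importance-sampling coreset. It is an assumption for illustration, not the scCoreset algorithm, whose details the abstract does not give. The function name `lightweight_coreset`, the sampling rule (half uniform, half proportional to squared distance from the mean profile), and the toy Poisson count matrix are all hypothetical choices.

```python
import numpy as np

def lightweight_coreset(X, m, seed=0):
    """Sample m rows (cells) of X with importance weights.

    Illustrative only: a generic importance-sampling scheme,
    NOT the method described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Squared distance of every cell to the mean expression profile.
    # (Dense arithmetic; a sparse matrix would need a different computation.)
    sq_dist = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    total = sq_dist.sum()
    if total > 0:
        # Mix uniform sampling with distance-proportional sampling.
        q = 0.5 / n + 0.5 * sq_dist / total
    else:
        # All cells identical: fall back to uniform sampling.
        q = np.full(n, 1.0 / n)
    idx = rng.choice(n, size=m, replace=True, p=q)
    # Inverse-probability weights, so weighted sums over the subset
    # are unbiased estimates of sums over all cells.
    weights = 1.0 / (m * q[idx])
    return idx, weights

# Hypothetical toy data: 1000 cells x 50 genes of Poisson counts.
toy_rng = np.random.default_rng(1)
X = toy_rng.poisson(1.0, size=(1000, 50)).astype(float)
idx, w = lightweight_coreset(X, m=100)
```

A downstream step that accepts per-sample weights (for example, a weighted k-means) could then be run on `X[idx]` with weights `w` instead of on all cells. Whether the results match those on the full data depends on the sampling scheme and on the subset size `m`.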
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Language: En
Journal: ACM Trans Comput Biol Bioinform
Journal subject: BIOLOGY / MEDICAL INFORMATICS
Year: 2024
Document type: Article