mbkmeans: Fast clustering for single cell data using mini-batch k-means.
Hicks, Stephanie C; Liu, Ruoxi; Ni, Yuwei; Purdom, Elizabeth; Risso, Davide.
Affiliation
  • Hicks SC; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.
  • Liu R; Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, USA.
  • Ni Y; Department of Healthcare Policy and Research, Weill Cornell Medical College, New York, New York, USA.
  • Purdom E; Department of Statistics, University of California, Berkeley, Berkeley, California, USA.
  • Risso D; Department of Statistical Sciences, University of Padova, Padova, Italy.
PLoS Comput Biol; 17(1): e1008625, 2021 01.
Article in En | MEDLINE | ID: mdl-33497379
ABSTRACT
Single-cell RNA-Sequencing (scRNA-seq) is the most widely used high-throughput technology to measure genome-wide gene expression at the single-cell level. One of the most common analyses of scRNA-seq data detects distinct subpopulations of cells through the use of unsupervised clustering algorithms. However, recent advances in scRNA-seq technologies result in current datasets ranging from thousands to millions of cells. Popular clustering algorithms, such as k-means, typically require the data to be loaded entirely into memory and therefore can be slow or impossible to run with large datasets. To address this problem, we developed the mbkmeans R/Bioconductor package, an open-source implementation of the mini-batch k-means algorithm. Our package allows for on-disk data representations, such as the common HDF5 file format widely used for single-cell data, that do not require all the data to be loaded into memory at one time. We demonstrate the performance of the mbkmeans package using large datasets, including one with 1.3 million cells. We also highlight and compare the computing performance of mbkmeans against the standard implementation of k-means and other popular single-cell clustering methods. Our software package is available in Bioconductor at https://bioconductor.org/packages/mbkmeans.
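
To illustrate the general idea behind mini-batch k-means mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the mbkmeans implementation and does not use its API. The function names, the `read_batch` callback (a hypothetical stand-in for reading a few rows from an on-disk matrix such as an HDF5 file), and all default parameter values are assumptions made for illustration only. Each iteration reads only a small random batch, so the full dataset never has to sit in memory at once.

```python
import random


def nearest(centers, x):
    """Return the index of the center closest to point x (squared Euclidean)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((c - xi) ** 2 for c, xi in zip(centers[j], x)),
    )


def mini_batch_kmeans(read_batch, n_points, k, batch_size=32, n_iter=200, seed=0):
    """Sketch of mini-batch k-means.

    read_batch(indices) is a hypothetical callback that returns the rows at
    the given indices; only one batch is held in memory at a time.
    """
    rng = random.Random(seed)
    batch_size = min(batch_size, n_points)

    # Initialise the centers from k randomly chosen points.
    centers = [list(row) for row in read_batch(rng.sample(range(n_points), k))]
    counts = [0] * k  # number of points each center has absorbed so far

    for _ in range(n_iter):
        batch = read_batch(rng.sample(range(n_points), batch_size))

        # Assign every point in the batch using the centers as they stand
        # at the start of the iteration.
        labels = [nearest(centers, x) for x in batch]

        # Move each center toward its assigned points with a per-center
        # learning rate of 1 / count, i.e. a running mean.
        for x, j in zip(batch, labels):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]

    return centers
```

With an in-memory list `data`, the callback could be as simple as `lambda idx: [data[i] for i in idx]`; with an on-disk file, it would instead read the requested rows from disk. Because each center is updated as a running mean of the points assigned to it, the cost per iteration depends on the batch size rather than on the total number of cells.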
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Software / Cluster Analysis / Sequence Analysis, RNA / Single-Cell Analysis Limits: Animals Language: En Journal: PLoS Comput Biol Journal subject: BIOLOGY / MEDICAL INFORMATICS Year of publication: 2021 Document type: Article Country of affiliation: United States
