Semisoft clustering of single-cell data.
Zhu, Lingxue; Lei, Jing; Klei, Lambertus; Devlin, Bernie; Roeder, Kathryn.
  • Zhu L; Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213.
  • Lei J; Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213.
  • Klei L; Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213.
  • Devlin B; Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213.
  • Roeder K; Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213; roeder@andrew.cmu.edu.
Proc Natl Acad Sci U S A; 116(2): 466-471, 2019-01-08.
Article in English | MEDLINE | ID: mdl-30587579
ABSTRACT
Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semisoft clustering that can classify both pure and intermediate cell types from data on gene expression from individual cells. Called semisoft clustering with pure cells (SOUP), this algorithm reveals the clustering structure for both pure cells and transitional cells with soft memberships. SOUP involves a two-step process: identify the set of pure cells, and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure in the expression similarity matrix. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. By modeling cells as a continuous mixture of K discrete types, we obtain more parsimonious results than those obtained with standard clustering algorithms. Moreover, using soft membership estimates of cell type cluster centers leads to better estimates of developmental trajectories. The strong performance of SOUP is documented via simulation studies, which show its robustness to violations of modeling assumptions. The advantages of SOUP are illustrated by analyses of two independent datasets of gene expression from a large number of cells from fetal brain.
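
The second step described in the abstract, computing a membership matrix once pure cells are known, can be illustrated with a minimal sketch. This is not the SOUP implementation: it assumes the pure cells are already given (rather than found from the block structure of the similarity matrix), estimates each cluster center as the mean of its pure cells, and substitutes ordinary least squares followed by clipping and row normalization for the membership estimate. The names `estimate_memberships`, `pure_cells`, and the simulated data are hypothetical.

```python
import numpy as np

def estimate_memberships(X, pure_cells):
    """Illustrative soft-membership estimate (not the SOUP algorithm itself).

    X          : (n_cells, n_genes) expression matrix.
    pure_cells : list of K index arrays, the assumed-known pure cells per type.
    Returns (theta, centers): theta is (n_cells, K) with rows summing to 1.
    """
    # Cluster center of each type = mean expression of its pure cells.
    centers = np.stack([X[idx].mean(axis=0) for idx in pure_cells])  # (K, n_genes)

    # Solve X ~ theta @ centers for theta by least squares (a simplification).
    theta_t, *_ = np.linalg.lstsq(centers.T, X.T, rcond=None)        # (K, n_cells)

    # Project onto valid memberships: non-negative rows that sum to one.
    theta = np.clip(theta_t.T, 0.0, None)
    row_sums = np.maximum(theta.sum(axis=1, keepdims=True), 1e-12)
    return theta / row_sums, centers

# Simulated, noise-free example: K = 3 types, 20 genes, 6 pure + 4 mixed cells.
rng = np.random.default_rng(0)
K, n_genes = 3, 20
centers_true = rng.normal(size=(K, n_genes))
theta_true = np.vstack([
    np.repeat(np.eye(K), 2, axis=0),        # two pure cells per type
    rng.dirichlet(np.ones(K), size=4),      # four transitional cells
])
X = theta_true @ centers_true               # each cell is a mixture of the K centers

pure_cells = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
theta_hat, centers_hat = estimate_memberships(X, pure_cells)
print(np.round(theta_hat, 3))
```

In this noise-free setting the estimated memberships match the simulated ones; with real, noisy expression data the quality of the estimate would depend on how well the pure cells are identified.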

Full text: 1 | Database: MEDLINE | Main subject: Algorithms / Electronic Data Processing / Cell Differentiation / Cell Proliferation / Models, Biological | Limits: Animals / Humans | Language: En | Year: 2019 | Document type: Article
