ABSTRACT
Inferring cellular trajectories using a variety of omic data is a critical task in single-cell data science. However, accurate prediction of cell fates, and thereby biologically meaningful discovery, is challenged by the sheer size of single-cell data, the diversity of omic data types, and the complexity of their topologies. We present VIA, a scalable trajectory inference algorithm that overcomes these limitations by using lazy-teleporting random walks to accurately reconstruct complex cellular trajectories beyond tree-like pathways (e.g., cyclic or disconnected structures). We show that VIA robustly and efficiently unravels the fine-grained sub-trajectories in a 1.3-million-cell transcriptomic mouse atlas without losing the global connectivity at such a high cell count. We further apply VIA to discover elusive lineages and less populous cell fates missed by other methods across a variety of data types, including single-cell proteomic, epigenomic, and multi-omic datasets, as well as a new in-house single-cell morphological dataset.
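To make the idea of a lazy-teleporting random walk concrete, the following is a minimal sketch of one common textbook formulation, not necessarily the exact formulation used in VIA. At each step the walker stays put with probability `laziness`, jumps to a uniformly random node with probability `teleport`, and otherwise follows an edge in proportion to its weight. The function names, the parameter names, and the toy graph are assumptions made for illustration only.

```python
import numpy as np


def lazy_teleporting_transition(adjacency, laziness=0.5, teleport=0.01):
    """Build a row-stochastic transition matrix for a lazy-teleporting walk.

    Assumed formulation (illustrative, not taken from the abstract):
        T = (1 - teleport) * (laziness * I + (1 - laziness) * P) + teleport / n
    where P is the ordinary degree-normalised random-walk matrix.
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    degree = A.sum(axis=1)
    if np.any(degree == 0):
        raise ValueError("every node needs at least one edge")
    P = A / degree[:, None]                      # plain random walk
    lazy = laziness * np.eye(n) + (1.0 - laziness) * P
    uniform = np.full((n, n), 1.0 / n)           # teleport target distribution
    return (1.0 - teleport) * lazy + teleport * uniform


def stationary_distribution(T, n_iter=10_000, tol=1e-12):
    """Approximate the stationary distribution by power iteration."""
    pi = np.full(T.shape[0], 1.0 / T.shape[0])
    for _ in range(n_iter):
        nxt = pi @ T
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi


# Toy graph with two disconnected pieces: a path 0-1-2 and an edge 3-4.
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
T = lazy_teleporting_transition(adjacency, laziness=0.5, teleport=0.05)
pi = stationary_distribution(T)
print(pi)
```

In this sketch the teleportation term makes every entry of `T` positive, so the walk can move between the two disconnected pieces and has a single stationary distribution. This is one intuition for why such walks suit disconnected topologies.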
Subject(s)
Algorithms , Genomics , Single-Cell Analysis , Animals , Cell Cycle , Cell Differentiation , Cell Line, Tumor , Cell Shape , Hematopoiesis , Humans , Islets of Langerhans/cytology , LIM-Homeodomain Proteins/metabolism , Mesoderm/cytology , Mice , Mouse Embryonic Stem Cells/cytology , Organogenesis , Transcription Factors/metabolism
ABSTRACT
The association of the intrinsic optical and biophysical properties of cells with homeostasis and pathogenesis has long been acknowledged. Defining these label-free cellular features obviates the need for costly and time-consuming labeling protocols that perturb the living cells. However, wide-ranging applicability of such label-free cell-based assays requires sufficient throughput, statistical power and sensitivity that are unattainable with current technologies. To close this gap, we present a large-scale, integrative imaging flow cytometry platform and strategy that allows hierarchical analysis of intrinsic morphological descriptors of single-cell optical and mass density within a population of millions of cells. The optofluidic cytometry system also enables the synchronous single-cell acquisition of and correlation with fluorescently labeled biochemical markers. Combined with deep neural networks and transfer learning, this massive single-cell profiling strategy demonstrates the label-free power to delineate the biophysical signatures of cancer subtypes, to detect rare populations of cells in heterogeneous samples (10⁻⁵), and to assess the efficacy of targeted therapeutics. This technique could spearhead the development of optofluidic imaging cell-based assays that stratify the underlying physiological and pathological processes based on the information-rich biophysical cellular phenotypes.
Subject(s)
Deep Learning , Biophysics , Flow Cytometry , Image Cytometry , Phenotype
ABSTRACT
MOTIVATION: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods do not scale adequately to large datasets and therefore cannot uncover the complex cellular heterogeneity. RESULTS: We introduce a highly scalable graph-based clustering algorithm, PARC (Phenotyping by Accelerated Refined Community-partitioning), for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without subsampling of cells, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering algorithm. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/ShobiStassen/PARC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
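As a rough illustration of what graph-based clustering means, the sketch below builds a k-nearest-neighbour graph and labels its connected components. This is a deliberately crude stand-in: it is not PARC's algorithm and does not use the PARC package. Connected-component labelling replaces community partitioning, and the brute-force neighbour search would not scale to millions of cells. All function names and the toy data are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Return the indices of the k nearest neighbours of each row (brute force)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq_dist, np.inf)            # a point is not its own neighbour
    return np.argsort(sq_dist, axis=1)[:, :k]


def cluster_by_mutual_knn(X, k=5):
    """Label connected components of the mutual kNN graph.

    An edge i-j is kept only if i is among j's k nearest neighbours and
    vice versa. Components are found with a small union-find structure.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    neighbour_sets = [set(row.tolist()) for row in knn_indices(X, k)]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in neighbour_sets[i]:
            if i in neighbour_sets[j]:           # keep mutual edges only
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels


# Toy data: two well-separated groups of evenly spaced points on a line.
X = np.concatenate([np.arange(10.0), np.arange(100.0, 110.0)]).reshape(-1, 1)
labels = cluster_by_mutual_knn(X, k=2)
print(labels)
```

On this toy input the two groups of points end up in separate components. Real single-cell data would need an approximate neighbour search and a proper community-detection step.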