Search | BVS Regional Portal

SWIFT-scalable clustering for automated identification of rare cell populations in large, high-dimensional flow cytometry datasets, part 1: algorithm design.

Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav.

Cytometry A ; 85(5): 408-21, 2014 May.

Article in English | MEDLINE | ID: mdl-24677621

ABSTRACT

We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems.

Subject(s)

Algorithms, Cluster Analysis, Flow Cytometry/methods, Cell Lineage, Computational Biology, Models, Theoretical
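The abstract says SWIFT uses iterative weighted sampling to keep computation modest while still resolving populations of a few hundred cells among tens of millions, but it does not give the weighting rule. As a hypothetical illustration only (not SWIFT's published procedure), one way to keep a rare population visible in a subsample is to cap how many cells are drawn from each provisional cluster and give every kept cell an inverse-sampling weight, so weighted totals still match the full dataset. The names `weighted_subsample` and `per_cluster` are assumptions introduced for this sketch.

```python
import random
from collections import defaultdict


def weighted_subsample(labels, per_cluster, seed=0):
    """Hypothetical helper: keep at most `per_cluster` cells per provisional cluster.

    Returns (indices, weights). Each kept cell carries weight
    cluster_size / n_kept, so summing the weights within a cluster
    recovers that cluster's size in the full dataset
    (inverse-probability weighting).
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for i, lab in enumerate(labels):
        members[lab].append(i)

    indices, weights = [], []
    for lab in sorted(members):
        idx = members[lab]
        # Small (rare) clusters are kept whole; large ones are subsampled.
        kept = idx if len(idx) <= per_cluster else rng.sample(idx, per_cluster)
        w = len(idx) / len(kept)
        indices.extend(kept)
        weights.extend([w] * len(kept))
    return indices, weights


if __name__ == "__main__":
    labels = ["bulk"] * 100_000 + ["rare"] * 200
    idx, w = weighted_subsample(labels, per_cluster=1000)
    print(len(idx), sum(w))
```

With 100,000 "bulk" cells and 200 "rare" cells, the subsample holds 1,200 cells, all 200 rare cells survive, and the weights still sum to the original 100,200 cells, so a weighted model fit on the subsample is not biased toward the rare cluster.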

SWIFT-scalable clustering for automated identification of rare cell populations in large, high-dimensional flow cytometry datasets, part 2: biological evaluation.

Mosmann, Tim R; Naim, Iftekhar; Rebhahn, Jonathan; Datta, Suprakash; Cavenaugh, James S; Weaver, Jason M; Sharma, Gaurav.

Cytometry A ; 85(5): 422-33, 2014 May.

Article in English | MEDLINE | ID: mdl-24532172

ABSTRACT

A multistage clustering and data processing method, SWIFT (detailed in a companion manuscript), has been developed to detect rare subpopulations in large, high-dimensional flow cytometry datasets. An iterative sampling procedure initially fits the data to multidimensional Gaussian distributions, then splitting and merging stages use a criterion of unimodality to optimize the detection of rare subpopulations, to converge on a consistent cluster number, and to describe non-Gaussian distributions. Probabilistic assignment of cells to clusters, visualization, and manipulation of clusters by their cluster medians, facilitate application of expert knowledge using standard flow cytometry programs. The dual problems of rigorously comparing similar complex samples, and enumerating absent or very rare cell subpopulations in negative controls, were solved by assigning cells in multiple samples to a cluster template derived from a single or combined sample. Comparison of antigen-stimulated and control human peripheral blood cell samples demonstrated that SWIFT could identify biologically significant subpopulations, such as rare cytokine-producing influenza-specific T cells. A sensitivity of better than one part per million was attained in very large samples. Results were highly consistent on biological replicates, yet the analysis was sensitive enough to show that multiple samples from the same subject were more similar than samples from different subjects. A companion manuscript (Part 1) details the algorithmic development of SWIFT.

Subject(s)

Algorithms, Blood Cells/cytology, Cluster Analysis, Flow Cytometry/methods, Antigens/blood, Antigens/immunology, Blood Cells/immunology, Cell Lineage, Computational Biology, Humans, Normal Distribution, T-Lymphocytes/cytology, T-Lymphocytes/immunology
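The Part 2 abstract describes comparing samples by probabilistically assigning cells from several samples to one fixed cluster template of multidimensional Gaussians. A minimal sketch of that assignment step follows, assuming a template of Gaussian components with diagonal covariances (a simplification; the abstract does not state SWIFT's covariance structure). The template format and the names `responsibilities` and `cluster_counts` are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    # Log density of a Gaussian with diagonal covariance given by `var`.
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, template):
    """Posterior probability of each template component for one cell `x`.

    `template` is a hypothetical list of (weight, mean, var) tuples,
    e.g. fitted once on a pooled sample and then held fixed.
    """
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in template]
    top = max(logs)
    # Subtract the maximum before exponentiating (log-sum-exp) for stability.
    exps = [math.exp(lv - top) for lv in logs]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_counts(cells, template):
    # Expected (soft) number of cells in each template cluster for one sample.
    counts = [0.0] * len(template)
    for x in cells:
        for k, r in enumerate(responsibilities(x, template)):
            counts[k] += r
    return counts


if __name__ == "__main__":
    template = [(0.99, [0.0, 0.0], [1.0, 1.0]), (0.01, [6.0, 6.0], [0.5, 0.5])]
    control = [[0.1, -0.2], [0.3, 0.4]]
    stimulated = [[0.1, -0.2], [6.1, 5.9]]
    print(cluster_counts(control, template), cluster_counts(stimulated, template))
```

Because every sample is scored against the same template, the soft count for a rare cluster can be compared directly between a stimulated sample and its negative control, which is the comparison the abstract describes.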

Distinguishing endogenous retroviral LTRs from SINE elements using features extracted from evolved side effect machines.

Ashlock, Wendy; Datta, Suprakash.

IEEE/ACM Trans Comput Biol Bioinform ; 9(6): 1676-89, 2012.

Article in English | MEDLINE | ID: mdl-22908128

ABSTRACT

Side effect machines produce features for classifiers that distinguish different types of DNA sequences. They have the, as yet unexploited, potential to give insight into biological features of the sequences. We introduce several innovations to the production and use of side effect machine sequence features. We compare the results of using consensus sequences and genomic sequences for training classifiers and find that more accurate results can be obtained using genomic sequences. Surprisingly, we were even able to build a classifier that distinguished consensus sequences from genomic sequences with high accuracy, suggesting that consensus sequences are not always representative of their genomic counterparts. We apply our techniques to the problem of distinguishing two types of transposable elements, solo LTRs and SINEs. Identifying these sequences is important because they affect gene expression, genome structure, and genetic diversity, and they serve as genetic markers. They are of similar length, neither codes for protein, and both have many nearly identical copies throughout the genome. Being able to efficiently and automatically distinguish them will aid efforts to improve annotations of genomes. Our approach reveals structural characteristics of the sequences of potential interest to biologists.

Subject(s)

Artificial Intelligence, Computational Biology/methods, Retroviridae/genetics, Short Interspersed Nucleotide Elements, Terminal Repeat Sequences, Algorithms, Cluster Analysis, DNA Transposable Elements/genetics, Humans
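A side effect machine, as the term is generally used, is a finite-state machine driven by the bases of a DNA sequence; its "side effect" is a count of how often each state is visited, and the normalized counts become the classifier's features. The sketch below shows only that feature-extraction step with a small hand-written machine. In the paper the machines are evolved rather than designed by hand, and the evolutionary search and the classifier are not shown here. The function name `sem_features`, the transition-table format, and the choice to skip non-ACGT symbols are assumptions for illustration.

```python
from typing import Dict, List

ALPHABET = "ACGT"


def sem_features(seq: str, transitions: List[Dict[str, int]], start: int = 0) -> List[float]:
    """Run a side effect machine over `seq` and return normalized state-visit counts.

    `transitions[s][base]` gives the next state when `base` is read in state `s`.
    """
    counts = [0] * len(transitions)
    state = start
    for ch in seq.upper():
        if ch not in ALPHABET:
            continue  # Assumption: ambiguous symbols such as N are skipped.
        state = transitions[state][ch]
        counts[state] += 1  # The "side effect": count each state visit.
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    # Toy 2-state machine: state 1 means "last base was G or C", state 0 "A or T",
    # so the two features are simply the AT and GC fractions of the sequence.
    toy = [{"A": 0, "T": 0, "C": 1, "G": 1} for _ in range(2)]
    print(sem_features("GGCA", toy))
```

This toy machine only recovers base composition; larger machines whose states depend on longer histories can encode order-dependent patterns, which is what makes evolved machines useful for separating sequence classes such as solo LTRs and SINEs.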
