gEM/GANN: A multivariate computational strategy for auto-characterizing relationships between cellular and clinical phenotypes and predicting disease progression time using high-dimensional flow cytometry data.

Tong, Dong Ling; Ball, Graham R; Pockley, A Graham

Affiliation

Tong DL; The John van Geest Cancer Research Centre, Nottingham Trent University, Nottingham, NG11 8NS, United Kingdom.
Ball GR; The John van Geest Cancer Research Centre, Nottingham Trent University, Nottingham, NG11 8NS, United Kingdom.
Pockley AG; The John van Geest Cancer Research Centre, Nottingham Trent University, Nottingham, NG11 8NS, United Kingdom.

Cytometry A; 87(7): 616-23, 2015 Jul.

Article in English | MEDLINE | ID: mdl-25572884

ABSTRACT

The dramatic increase in the complexity of flow cytometric datasets requires new computational approaches that can maximize the amount of information derived and overcome the limitations of traditional gating strategies. Herein, we present a multivariate computational analysis, using unsupervised and supervised learning techniques, of the flow cytometry datasets from HIV-infected individuals that were provided as part of the FlowCAP-IV Challenge. Out of 383 samples (stimulated and unstimulated), 191 samples were used as a training set (34 individuals whose disease did not progress, and 157 individuals whose disease did progress). Using the results from the training set, the participants in the Challenge were then asked to predict the condition and progression time of the remaining individuals (45 "nonprogressors" and 147 "progressors"). To achieve this, we first scaled down data resolution and then excluded doublet cells from the analysis using Expectation Maximization approaches. We then standardized all samples into histograms and used a Genetic Algorithm-Neural Network to extract feature sets from the datasets, the reliability of which was examined using WEKA-implemented classifiers. The selected feature set resulted in a high sensitivity and specificity for the discrimination of progressors and nonprogressors in the training set (average True Positive Rate = 1.00 and average False Positive Rate = 0.033). The capacity of the feature set to predict real-time survival time was better when using data from the "unstimulated" training set (r = 0.825). The P-values and 95% confidence interval log-rank ratios between actual and predicted survival time in the test set were 0.682 and 0.9542 ± 0.24 for the unstimulated dataset, and 0.4451 and 0.9173 ± 0.23 for the stimulated dataset. Our analytic strategy has demonstrated a promising capacity to extract useful information from complex flow cytometry datasets, despite a significant imbalance and variation between the training and test sets.
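
Two of the steps named in the abstract, standardizing samples into histograms and scoring a classifier by its true and false positive rates, can be illustrated with a minimal Python sketch. The function names (`to_histogram`, `tpr_fpr`), the bin range, the bin count and the toy labels are assumptions made for illustration only; this is not the authors' implementation and does not reproduce the reported figures.

```python
from typing import List, Sequence, Tuple


def to_histogram(values: Sequence[float], lo: float, hi: float,
                 n_bins: int) -> List[float]:
    """Convert a variable-length list of per-cell measurements into a
    fixed-length vector of bin fractions, so that samples with different
    cell counts become directly comparable feature vectors."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        # Clamp out-of-range events into the first or last bin.
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins


def tpr_fpr(actual: Sequence[str], predicted: Sequence[str],
            positive: str = "progressor") -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for one class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Toy data, not taken from the FlowCAP-IV datasets.
    print(to_histogram([0.1, 0.2, 0.6, 0.9], lo=0.0, hi=1.0, n_bins=2))
    actual = ["progressor", "progressor", "nonprogressor", "nonprogressor"]
    predicted = ["progressor", "nonprogressor", "progressor", "nonprogressor"]
    print(tpr_fpr(actual, predicted))
```

Expressing each sample as bin fractions rather than raw counts keeps the feature vector independent of how many cells were acquired. In a heavily imbalanced setting such as the one described here, reporting both rates is more informative than overall accuracy alone.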

Subject(s)

Computational Biology/methods; Disease Progression; Electronic Data Processing/methods; Flow Cytometry/methods; HIV Infections/diagnosis; Algorithms; Cluster Analysis; Humans; Multivariate Analysis; Prognosis

Keywords

Key terms: FlowCAP; cluster analysis; expectation maximization; feature identification; genetic algorithm-neural network; imbalance; multidimensional; survival time

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Electronic Data Processing / HIV Infections / Disease Progression / Computational Biology / Flow Cytometry | Type of study: Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Journal: Cytometry A | Year: 2015 | Document type: Article | Country of affiliation: United Kingdom
