1.
BMC Bioinformatics ; 12: 483, 2011 Dec 19.
Article in English | MEDLINE | ID: mdl-22182303

ABSTRACT

BACKGROUND: Multimodal data, especially imaging and non-imaging data, are routinely acquired in the context of disease diagnostics; however, computational challenges have limited the ability to quantitatively integrate imaging and non-imaging data channels with different dimensionalities and scales. To the best of our knowledge, relatively few attempts have been made to quantitatively fuse such data to construct classifiers, and none have attempted to quantitatively combine histology (imaging) and proteomic (non-imaging) measurements for making diagnostic and prognostic predictions. The objective of this work is to create a common subspace, called a metaspace, that simultaneously accommodates both the imaging and non-imaging data (and hence data corresponding to different scales and dimensionalities). This metaspace can be used to build a meta-classifier that produces better classification results than a classifier based on a single modality alone. Canonical Correlation Analysis (CCA) and Regularized CCA (RCCA) are statistical techniques that extract correlations between two modes of data to construct a homogeneous, uniform representation of heterogeneous data channels. In this paper, we present a novel modification to CCA and RCCA, Supervised Regularized Canonical Correlation Analysis (SRCCA), that (1) enables the quantitative integration of data from multiple modalities using a feature selection scheme, (2) is regularized, and (3) is computationally cheap. We apply this SRCCA framework to the fusion of proteomic and histologic image signatures for identifying prostate cancer patients at risk of biochemical recurrence within 5 years of radical prostatectomy.
RESULTS: A cohort of 19 grade- and stage-matched prostate cancer patients, all of whom had undergone radical prostatectomy, was considered in this study; 10 had biochemical recurrence within 5 years of surgery and 9 did not. The aim was to construct a lower-dimensional fused metaspace comprising both the histological and proteomic measurements obtained from the site of the dominant nodule on the surgical specimen. In conjunction with SRCCA, a random forest classifier was able to identify prostate cancer patients who developed biochemical recurrence within 5 years with a maximum classification accuracy of 93%.
CONCLUSIONS: Classifier performance in the SRCCA space was statistically significantly higher than in the fused data representations obtained not only from CCA and RCCA, but also from two other statistical techniques, Principal Component Analysis and Partial Least Squares Regression. These results suggest that SRCCA is a computationally efficient and highly accurate scheme for representing multimodal (histologic and proteomic) data in a metaspace, and that it could be used to construct fused biomarkers for predicting disease recurrence and prognosis.
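
As a rough illustration of the metaspace idea, the sketch below implements plain regularized CCA with NumPy and concatenates the two projected views into one fused representation. It is not the SRCCA algorithm: the abstract does not specify the supervised feature selection step, so that step is omitted. The function names (`regularized_cca`, `embed`), the ridge parameters `reg_x` and `reg_y`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def regularized_cca(X, Y, reg_x=0.1, reg_y=0.1, n_components=2):
    """Plain regularized CCA (illustrative; NOT the paper's SRCCA).

    X: (n_samples, p) array, Y: (n_samples, q) array.
    Returns projection weights Wx (p, k), Wy (q, k) and the k largest
    regularized canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Ridge-regularized covariances stay invertible even when p, q > n.
    Cxx = Xc.T @ Xc / (n - 1) + reg_x * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg_y * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with Cholesky factors, then take an SVD.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy)        # Lx^{-1} Cxy
    M = np.linalg.solve(Ly, M.T).T      # Lx^{-1} Cxy Ly^{-T}
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return Wx, Wy, s[:n_components]


def embed(X, Y, Wx, Wy):
    """Project both centered views and concatenate them (one possible fusion)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return np.hstack([Xc @ Wx, Yc @ Wy])


if __name__ == "__main__":
    # Synthetic stand-in for 19 patients with two modalities (made-up sizes).
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(19, 2))
    X = latent @ rng.normal(size=(2, 30)) + 0.5 * rng.normal(size=(19, 30))
    Y = latent @ rng.normal(size=(2, 50)) + 0.5 * rng.normal(size=(19, 50))
    Wx, Wy, corrs = regularized_cca(X, Y)
    Z = embed(X, Y, Wx, Wy)
    print("fused representation shape:", Z.shape)
    print("regularized canonical correlations:", corrs)
```

The fused matrix `Z` could then be passed to any classifier, such as the random forest mentioned in the abstract. With only 19 samples, the regularization strength and the number of components would need to be chosen inside a cross-validation loop to avoid optimistic accuracy estimates.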


Subjects
Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/pathology , Proteomics , Aged , Cohort Studies , Diagnostic Imaging , Humans , Male , Middle Aged , Multivariate Analysis , Principal Component Analysis , Prognosis , Prostate-Specific Antigen , Prostatectomy , Prostatic Neoplasms/genetics , Prostatic Neoplasms/surgery , Recurrence
2.
Article in English | MEDLINE | ID: mdl-22254468

ABSTRACT

In this work, we analyze and evaluate different strategies for comparing Feature Selection (FS) schemes on High Dimensional (HD) biomedical datasets (e.g. gene and protein expression studies) with a small sample size (SSS). Additionally, we define a new performance measure, Robustness, specifically for comparing the ability of an FS scheme to be invariant to changes in its training data. While classifier accuracy has been the de facto measure for evaluating FS schemes, on account of the curse of dimensionality it might not always be the appropriate measure for HD/SSS datasets. SSS gives the dataset a higher probability of containing data that are not representative of the true distribution of the whole population. However, an ideal FS scheme must be robust enough to produce the same results each time there are changes to the training data. In this study, we employed the robustness performance measure in conjunction with classifier accuracy (measured via the K-Nearest Neighbor and Random Forest classifiers) to quantitatively compare five different FS schemes (T-test, F-test, Kolmogorov-Smirnov Test, Wilks' Lambda Test and Wilcoxon Rank Sum Test) on 5 HD/SSS gene and protein expression datasets corresponding to ovarian cancer, lung cancer, bone lesions, celiac disease, and coronary heart disease. Of the five FS schemes compared, the Wilcoxon Rank Sum Test was found to outperform the other FS schemes in terms of classification accuracy and robustness. Our results suggest that both classifier accuracy and robustness should be considered when deciding on the appropriate FS scheme for HD/SSS datasets.
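
The abstract does not give the formula behind its Robustness measure, so the following is only one plausible way to quantify the same idea: rank features by a standardized Wilcoxon rank-sum statistic on repeated stratified subsamples, keep the top `k` each time, and report the mean pairwise Jaccard overlap of the selected sets. The function names, the subsampling fraction, and the Jaccard-based score are assumptions for this sketch, not the authors' definition. Binary labels (0/1) with both classes present are assumed.

```python
import random
from itertools import combinations


def rank_sum_score(values, labels):
    """Absolute standardized Wilcoxon rank-sum statistic for one feature.

    Ties receive average ranks; the variance is not tie-corrected
    (a simplification for this sketch).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    n1 = sum(1 for lab in labels if lab == 1)
    n0 = len(labels) - n1
    w = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    mean = n1 * (n1 + n0 + 1) / 2
    sd = (n1 * n0 * (n1 + n0 + 1) / 12) ** 0.5
    return abs(w - mean) / sd


def select_top_k(data, labels, k):
    """Return the indices of the k highest-scoring features as a set."""
    n_features = len(data[0])
    scores = [rank_sum_score([row[f] for row in data], labels)
              for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: -scores[f])
    return set(ranked[:k])


def robustness(data, labels, k, n_rounds=20, frac=0.8, seed=0):
    """Mean pairwise Jaccard overlap of top-k sets over stratified subsamples."""
    rng = random.Random(seed)
    by_class = {c: [i for i, lab in enumerate(labels) if lab == c]
                for c in (0, 1)}
    selections = []
    for _ in range(n_rounds):
        subset = []
        for idx in by_class.values():
            m = min(len(idx), max(1, round(frac * len(idx))))
            subset += rng.sample(idx, m)
        sub_data = [data[i] for i in subset]
        sub_labels = [labels[i] for i in subset]
        selections.append(select_top_k(sub_data, sub_labels, k))
    pairs = list(combinations(selections, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```

A score near 1 means the scheme picks almost the same features regardless of which samples are held out; a score near 0 means the selection is driven by the particular training subset. Reporting this value next to classifier accuracy mirrors the comparison strategy the abstract recommends.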


Subjects
Algorithms , Data Mining/methods , Factual Databases , Gene Expression Profiling/methods , Neoplasm Proteins/metabolism , Neoplasms/metabolism , Automated Pattern Recognition/methods , Signal Transduction , Animals , Humans
3.
Article in English | MEDLINE | ID: mdl-22255811

ABSTRACT

Multimodal data, especially imaging and non-imaging data, are routinely acquired in the context of disease diagnostics; however, computational challenges have limited the ability to quantitatively integrate imaging and non-imaging data channels with different dimensionalities for making diagnostic and prognostic predictions. The objective of this work is to create a common subspace, called a metaspace, that simultaneously accommodates both the imaging and non-imaging data. This metaspace can be used to build a meta-classifier that produces better classification results than a classifier based on a single modality alone. In this paper, we present a novel Supervised Regularized Canonical Correlation Analysis (SRCCA) algorithm that (1) enables the quantitative integration of data from multiple modalities using a feature selection scheme, (2) is regularized, and (3) is computationally cheap. We apply this SRCCA framework to the fusion of proteomic and histologic image signatures for identifying prostate cancer patients at risk of biochemical recurrence following radical prostatectomy. For a cohort of 19 prostate cancer patients, SRCCA yielded a lower-dimensional fused metaspace comprising both the histological and proteomic attributes. In conjunction with SRCCA, a random forest classifier was able to identify patients at risk of biochemical failure with a maximum accuracy of 93%. Classifier performance in the SRCCA space was statistically significantly higher than in the fused data representations obtained with either Canonical Correlation Analysis (CCA) or Regularized CCA.
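
For orientation, the textbook objective that CCA and Regularized CCA optimize can be written as follows. The notation is an assumption for this note rather than the paper's own: $C_{xx}$ and $C_{yy}$ are the within-modality covariance matrices, $C_{xy}$ is the cross-covariance, and $\lambda_x, \lambda_y \ge 0$ are ridge parameters. The supervised step that distinguishes SRCCA is not described in the abstract and is not shown.

```latex
\rho \;=\; \max_{w_x,\, w_y}\;
\frac{w_x^{\top} C_{xy}\, w_y}
     {\sqrt{\, w_x^{\top}\,(C_{xx} + \lambda_x I)\, w_x \;\cdot\; w_y^{\top}\,(C_{yy} + \lambda_y I)\, w_y \,}}
```

Setting $\lambda_x = \lambda_y = 0$ recovers ordinary CCA. Positive values keep the regularized covariance matrices invertible when the number of features exceeds the number of samples, which is the case for a 19-patient cohort with high-dimensional histologic and proteomic features.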


Subjects
Computational Biology/methods , Histological Techniques/methods , Prostatic Neoplasms/metabolism , Proteomics/methods , Algorithms , Biochemistry/methods , Tumor Biomarkers , Diagnostic Imaging/methods , Humans , Male , Statistical Models , Normal Distribution , Prostatic Neoplasms/surgery , Recurrence , Reproducibility of Results , Software , Treatment Outcome