k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction.

Parry, R M; Jones, W; Stokes, T H; Phan, J H; Moffitt, R A; Fang, H; Shi, L; Oberthuer, A; Fischer, M; Tong, W; Wang, M D

Affiliation

Parry RM; Biomedical Engineering Department, Georgia Institute of Technology and Emory University, Atlanta, GA, USA.

Pharmacogenomics J; 10(4): 292-309, 2010 Aug.

Article in English | MEDLINE | ID: mdl-20676068

ABSTRACT

In the clinical application of genomic data analysis and modeling, a number of factors contribute to the performance of disease classification and clinical outcome prediction. This study focuses on the k-nearest neighbor (KNN) modeling strategy and its clinical use. Although KNN is simple and clinically appealing, large performance variations were found among experienced data analysis teams in the MicroArray Quality Control Phase II (MAQC-II) project. For clinical end points and controls from breast cancer, neuroblastoma and multiple myeloma, we systematically generated 463,320 KNN models by varying feature ranking method, number of features, distance metric, number of neighbors, vote weighting and decision threshold. We identified factors that contribute to the MAQC-II project performance variation, and validated a KNN data analysis protocol using a newly generated clinical data set with 478 neuroblastoma patients. We interpreted the biological and practical significance of the derived KNN models, and compared their performance with existing clinical factors.
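The six factors varied in the study (feature ranking method, number of features, distance metric, number of neighbors, vote weighting and decision threshold) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the mean-difference ranking, the two distance functions, the option values in the grid, the toy data, and all function names. The sketch holds the ranking method fixed and enumerates the other five factors.

```python
import math
from itertools import product


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_features(X, y):
    """Rank feature indices by absolute difference of class means (illustrative stand-in)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = [
        abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
        for j in range(len(X[0]))
    ]
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def knn_score(train_X, train_y, query, k, distance, weighted):
    """Share of the vote held by class 1 among the k nearest training samples."""
    nearest = sorted(
        ((distance(row, query), label) for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    # Inverse-distance weights when weighted, otherwise one vote per neighbor.
    weights = [1.0 / (d + 1e-9) if weighted else 1.0 for d, _ in nearest]
    positive = sum(w for w, (_, label) in zip(weights, nearest) if label == 1)
    return positive / sum(weights)


def knn_predict(X, y, query, n_features, distance, k, weighted, threshold):
    """Select top-ranked features, score the query, and apply the decision threshold."""
    keep = rank_features(X, y)[:n_features]
    project = lambda row: [row[j] for j in keep]
    score = knn_score([project(r) for r in X], y, project(query), k, distance, weighted)
    return 1 if score >= threshold else 0


# Hypothetical toy data: 4 samples, 3 features, binary end point.
X = [[0.1, 5.0, 1.0], [0.2, 4.0, 1.1], [0.9, 5.5, 0.9], [1.0, 4.5, 1.0]]
y = [0, 0, 1, 1]
query = [0.85, 4.8, 1.0]

# One model per combination of (n_features, distance, k, weighted, threshold).
grid = list(product([1, 2], [euclidean, manhattan], [1, 3], [False, True], [0.5, 0.7]))
predictions = [knn_predict(X, y, query, *config) for config in grid]
```

Even this small grid yields 2 × 2 × 2 × 2 × 2 = 32 distinct models, and raising the decision threshold alone can flip a prediction for the same neighbors, which hints at how such choices could produce variation between analysis teams.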

Subjects

Models, Statistical; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Animals; Biomarkers, Tumor; Brain Neoplasms/drug therapy; Brain Neoplasms/genetics; Breast Neoplasms/drug therapy; Breast Neoplasms/genetics; Disease-Free Survival; Endpoint Determination/statistics & numerical data; Female; Humans; Logistic Models; Multiple Myeloma/drug therapy; Multiple Myeloma/genetics; Neoplasm Staging; Neuroblastoma/drug therapy; Neuroblastoma/genetics; Predictive Value of Tests; Quality Control; Risk Assessment; Treatment Outcome

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Models, Statistical / Oligonucleotide Array Sequence Analysis | Study type: Etiology_studies / Guideline / Prognostic_studies / Risk_factors_studies | Limits: Animals / Female / Humans | Language: En | Journal: Pharmacogenomics J | Journal subject: MOLECULAR BIOLOGY / PHARMACOLOGY | Publication year: 2010 | Document type: Article | Affiliation country: United States
