Improving cluster-based missing value estimation of DNA microarray data.

Brás, Lígia P; Menezes, José C

Affiliation

Brás LP; Centre for Chemical & Biological Engineering, Department of Chemical and Biological Engineering, IST, Technical University of Lisbon, Av. Rovisco Pais, P-1049-001 Lisbon, Portugal.

Biomol Eng ; 24(2): 273-82, 2007 Jun.

Article in English | MEDLINE | ID: mdl-17493870

ABSTRACT

We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing value (MV) estimation in microarray data, based on the reuse of estimated data. The method is called iterative KNN imputation (IKNNimpute) because the estimation is performed iteratively, using the most recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and was compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM, by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments, and in data sets comprising both time series and non-time series data. This is because the information of the genes having MVs is used more efficiently and the iterative procedure allows the MV estimates to be refined. More importantly, IKNNimpute has a smaller detrimental effect on the detection of differentially expressed genes.
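
The iterative idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the initial fill, the distance measure, the weighting scheme, the stopping rule, or the NRMSE normalization, so the choices below (row-mean initialization, root-mean-square Euclidean distance over the observed columns of the target gene, inverse-distance weights, a fixed number of iterations, and normalization by the standard deviation of the true values) are assumptions made for illustration only. The function names `iterative_knn_impute` and `nrmse` and the parameters `k` and `n_iter` are hypothetical.

```python
import numpy as np


def nrmse(true_vals, est_vals):
    """RMSE divided by the standard deviation of the true values.

    The normalization by the standard deviation is an assumption; the
    abstract does not state which normalization was used.
    """
    true_vals = np.asarray(true_vals, dtype=float)
    est_vals = np.asarray(est_vals, dtype=float)
    rmse = np.sqrt(np.mean((est_vals - true_vals) ** 2))
    return rmse / np.std(true_vals)


def iterative_knn_impute(X, k=10, n_iter=5):
    """Illustrative iterative weighted KNN imputation (genes in rows).

    Missing entries are NaN. Every row is assumed to have at least one
    observed value. All design choices here are assumptions, not the
    published algorithm.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial complete matrix: replace each missing entry by its row mean.
    row_means = np.nanmean(X, axis=1)
    filled = np.where(missing, row_means[:, None], X)

    rows_with_mv = np.where(missing.any(axis=1))[0]
    for _ in range(n_iter):
        updated = filled.copy()
        for i in rows_with_mv:
            obs = ~missing[i]  # columns actually measured for gene i
            # Distances use the current estimates of the other genes,
            # so genes that themselves had MVs can serve as neighbours.
            diff = filled[:, obs] - filled[i, obs]
            dist = np.sqrt(np.mean(diff ** 2, axis=1))
            dist[i] = np.inf  # a gene is never its own neighbour
            neighbours = np.argsort(dist)[:k]
            weights = 1.0 / (dist[neighbours] + 1e-12)
            neighbour_vals = filled[neighbours][:, missing[i]]
            updated[i, missing[i]] = (weights @ neighbour_vals) / weights.sum()
        filled = updated  # the refined estimates feed the next iteration
    return filled
```

With `n_iter=1` the sketch reduces to a single weighted KNN pass on a row-mean-filled matrix; larger values reuse the previous estimates when selecting neighbours and computing the weighted averages, which is the refinement step the abstract attributes to IKNNimpute.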

Subjects

Algorithms; Artifacts; Cluster Analysis; Data Interpretation, Statistical; Models, Genetic; Oligonucleotide Array Sequence Analysis/methods; Computer Simulation; Gene Expression Profiling; Models, Statistical

Database: MEDLINE | Main subject: Algorithms / Cluster Analysis / Data Interpretation, Statistical / Artifacts / Oligonucleotide Array Sequence Analysis / Models, Genetic | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Year of publication: 2007 | Document type: Article