ABSTRACT
OBJECTIVES: Regardless of the proportion of missing values, complete-case analysis is the most frequently applied approach, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data when some, many, or all item scores are missing in a multi-item instrument. STUDY DESIGN AND SETTING: Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques for handling missing data were applied to determine the best technique for each scenario. Fitted regression coefficients were compared using bias and coverage as performance measures. RESULTS: Mean imputation caused biased estimates in every missing data scenario in which data were missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score. CONCLUSION: We recommend applying MI to the item scores to obtain the most accurate regression model estimates. Moreover, we advise against using any form of mean imputation to handle missing data.
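The simulation design described above can be illustrated with a small sketch. It is not the authors' code: the number of items, the data-generating model, the MCAR mechanism, the 30% missingness rate, and all function names are assumptions chosen for illustration. It contrasts only complete-case analysis with item-level mean imputation and does not implement MI.

```python
import random
from statistics import mean

def simulate(n, n_items, beta, seed):
    """Hypothetical data model: items share a latent trait; y depends on the total score."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        trait = rng.gauss(0, 1)
        items = [trait + rng.gauss(0, 1) for _ in range(n_items)]
        y = beta * sum(items) + rng.gauss(0, 1)
        rows.append((items, y))
    return rows

def make_mcar(rows, p_missing, seed):
    """Set each item score to None with probability p_missing (MCAR, an assumption)."""
    rng = random.Random(seed)
    return [([None if rng.random() < p_missing else v for v in items], y)
            for items, y in rows]

def complete_cases(rows):
    """Keep only subjects with every item observed."""
    return [(items, y) for items, y in rows if None not in items]

def impute_item_means(rows):
    """Replace each missing item score by the mean of that item's observed scores."""
    n_items = len(rows[0][0])
    col_means = [mean(items[j] for items, _ in rows if items[j] is not None)
                 for j in range(n_items)]
    return [([col_means[j] if v is None else v for j, v in enumerate(items)], y)
            for items, y in rows]

def slope(rows):
    """Least-squares slope of y regressed on the total (summed) item score."""
    xs = [sum(items) for items, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

BETA = 0.5  # true coefficient in the assumed data model
full = simulate(n=2000, n_items=5, beta=BETA, seed=1)
observed = make_mcar(full, p_missing=0.3, seed=2)

print("true beta:          ", BETA)
print("full data slope:    ", round(slope(full), 3))
print("complete-case slope:", round(slope(complete_cases(observed)), 3))
print("mean-imputed slope: ", round(slope(impute_item_means(observed)), 3))
```

Repeating this over many seeds and averaging the difference between each estimate and `BETA` would give a bias estimate per method. Coverage would additionally require a confidence interval for each fitted slope.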
Subjects
Data Interpretation, Statistical; Bias; Computer Simulation; Humans; Linear Models; Research Design
ABSTRACT
Double (bi-allelic) mutations in the gene encoding the CCAAT/enhancer-binding protein-alpha (CEBPA) transcription factor have a favorable prognostic impact in acute myeloid leukemia (AML). Double mutations in CEBPA can be detected using various techniques, but the gene is notoriously difficult to sequence because of its high GC content. Here we developed a two-step gene expression classifier for accurate and standardized detection of CEBPA double mutations. The key feature of the two-step classifier is that its first step explicitly removes cases with low CEBPA expression, thereby excluding CEBPA hypermethylated cases, which have gene expression profiles similar to those of CEBPA double mutants and would otherwise result in false-positive predictions. In the second step, a 55-gene signature is used to identify the true CEBPA double-mutation cases. This two-step classifier was tested on a cohort of 505 unselected AML cases, including 26 CEBPA double mutants, 12 CEBPA single mutants, and seven CEBPA promoter hypermethylated cases, on which its performance was estimated by a double-loop cross-validation protocol. The two-step classifier achieved a sensitivity of 96.2% (95% confidence interval [CI] 81.1 to 99.3) and a specificity of 100.0% (95% CI 99.2 to 100.0). There were no false-positive detections. This two-step CEBPA double-mutation classifier has been incorporated into a microarray platform that can simultaneously detect other relevant molecular biomarkers, which allows for a standardized, comprehensive diagnostic assay. In conclusion, gene expression profiling provides a reliable method for CEBPA double-mutation detection in patients with AML for clinical use.
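As a worked check of the reported confidence intervals, the sketch below computes Wilson score intervals for a binomial proportion. The abstract does not name the interval method or give raw counts, so two things are assumptions here: the use of the Wilson method, and the counts of 25 of 26 double mutants detected (from the 96.2% sensitivity) and 479 of 479 remaining cases correctly called negative (505 minus 26, with no false positives).

```python
from math import sqrt

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z = 1.959964 for 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Assumed counts, inferred from the percentages in the abstract (not stated there).
sens_lo, sens_hi = wilson_interval(25, 26)    # sensitivity: 25 of 26 double mutants
spec_lo, spec_hi = wilson_interval(479, 479)  # specificity: 479 of 479 other cases

print(f"sensitivity 95% CI: {sens_lo * 100:.1f} to {sens_hi * 100:.1f}")
print(f"specificity 95% CI: {spec_lo * 100:.1f} to {spec_hi * 100:.1f}")
```

Under these assumed counts, the computed intervals come out at about 81.1 to 99.3 and 99.2 to 100.0, which agrees with the reported values. That agreement is consistent with a Wilson interval but does not confirm that it was the method used.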