Assessing and improving the stability of chemometric models in small sample size situations.
Anal Bioanal Chem; 390(5): 1261-71, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18228011
ABSTRACT
Small sample sizes are very common in multivariate analysis. Sample sizes of 10-100 statistically independent objects (e.g., rejects from processes or loading dock analysis, or patients with a rare disease), each with hundreds of data points, lead to unstable models with poor predictive quality. Model stability is assessed by comparing models built on slightly varying training data; iterated k-fold cross-validation is used for this purpose. Aggregating these models stabilizes them, and the quality of the aggregated model can be assessed without calculating further models. The validation and aggregation methods investigated in this study apply to regression as well as to classification. These techniques are useful for analyzing data with large numbers of variates, e.g., spectral data such as FT-IR, Raman, UV/VIS, fluorescence, AAS, and MS. In this study, FT-IR images of tumor tissue were used. Some tissue types occur frequently, while others are very rare. The tissue types were classified using linear discriminant analysis (LDA). The initial models were severely unstable; aggregation stabilized the predictions and increased the hit rate from 67% to 82%.
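The stability check and the aggregation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: LDA is replaced by a nearest-class-mean classifier (LDA with an identity covariance matrix and equal priors) to keep the sketch dependency-free; "stability" is taken here, as an assumption, to be the average fraction of cross-validation iterations in which a sample receives its most frequent predicted label; aggregation is a plain majority vote. All function names are hypothetical. Because every sample's votes come from models that never saw it during training, the aggregated hit rate is computed from the existing out-of-fold predictions, with no further models trained.

```python
import random
from collections import Counter


def nearest_mean_fit(X, y):
    """Hypothetical stand-in for LDA: store the mean vector of each class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def nearest_mean_predict(means, x):
    """Assign x to the class whose mean is closest (squared Euclidean distance)."""
    return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))


def iterated_kfold(X, y, k=5, n_iter=20, seed=0):
    """Iterated k-fold CV: preds[i] holds one out-of-fold prediction per iteration."""
    rng = random.Random(seed)
    n = len(X)
    preds = [[] for _ in range(n)]
    for _ in range(n_iter):
        idx = list(range(n))
        rng.shuffle(idx)  # a new random fold split in every iteration
        folds = [idx[f::k] for f in range(k)]
        for test in folds:
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            model = nearest_mean_fit([X[i] for i in train], [y[i] for i in train])
            for i in test:
                preds[i].append(nearest_mean_predict(model, X[i]))
    return preds


def summarize(preds, y):
    """Return (mean single-model hit rate, aggregated hit rate, stability)."""
    n_iter = len(preds[0])
    single = sum(p == t for ps, t in zip(preds, y) for p in ps) / (len(y) * n_iter)
    # Majority vote over iterations: reuses existing predictions, no new models.
    votes = [Counter(ps).most_common(1)[0][0] for ps in preds]
    aggregated = sum(v == t for v, t in zip(votes, y)) / len(y)
    # Assumed stability measure: share of iterations agreeing with the majority label.
    stability = sum(Counter(ps).most_common(1)[0][1] / len(ps) for ps in preds) / len(y)
    return single, aggregated, stability


if __name__ == "__main__":
    # Hypothetical synthetic data: few samples, many variates, weak class separation.
    rng = random.Random(1)
    n, p = 30, 50
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.3 * yi, 1.0) for _ in range(p)] for yi in y]
    single, aggregated, stability = summarize(iterated_kfold(X, y), y)
    print(f"single-model hit rate {single:.2f}, aggregated {aggregated:.2f}, "
          f"stability {stability:.2f}")
```

A stability value well below 1 signals that the individual models disagree on many samples, which is the situation in which the abstract reports that aggregation helps. With real spectra one would substitute an actual (regularized) LDA for the nearest-mean stand-in.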
Collections:
01-internacional
Database:
MEDLINE
Main subject:
Biological Models
Study type:
Prognostic_studies
Language:
En
Publication year:
2008
Document type:
Article