Assessing and improving the stability of chemometric models in small sample size situations.

Beleites, Claudia; Salzer, Reiner

Affiliation

Beleites C; Institute for Analytical Chemistry, Dresden University of Technology, Bergstrasse 66, 01062, Dresden, Germany. Claudia.Beleites@chemie.tu-dresden.de

Anal Bioanal Chem; 390(5): 1261-71, 2008 Mar.

Article in English | MEDLINE | ID: mdl-18228011

ABSTRACT

Small sample sizes are very common in multivariate analysis. Sample sizes of 10-100 statistically independent objects (rejects from processes or loading dock analysis, or patients with a rare disease), each with hundreds of data points, cause unstable models with poor predictive quality. Model stability is assessed by comparing models that were built using slightly varying training data; iterated k-fold cross-validation is used for this purpose. Aggregation stabilizes models, and the quality of the aggregated model can be assessed without calculating further models. The validation and aggregation methods investigated in this study apply to regression as well as to classification. These techniques are useful for analyzing data with large numbers of variates, e.g., spectral data such as FT-IR, Raman, UV/VIS, fluorescence, AAS, and MS. FT-IR images of tumor tissue were used in this study. Some tissue types occur frequently, while others are very rare. The tissue types are classified using linear discriminant analysis (LDA). The initial models were severely unstable; aggregation stabilized the predictions, and the hit rate increased from 67% to 82%.
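The abstract describes the general scheme but not its implementation. The following is a minimal sketch under stated assumptions, not the authors' code. Each of several iterations reshuffles the samples and runs k-fold cross-validation, so every sample receives one out-of-fold prediction per iteration. Stability is read off as the share of iterations that agree with a sample's majority prediction. Aggregation is a majority vote over those predictions; because each vote comes from a model that never saw the sample, the aggregated hit rate is obtained without fitting any further models. The names `iterated_kfold` and `summarize` are hypothetical, and the nearest-centroid classifier is a simplified stand-in for LDA that ignores the pooled covariance.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT full LDA): store the mean vector of each class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def nearest_centroid_predict(model, x):
    # Assign x to the class whose centroid is closest (squared Euclidean distance).
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def iterated_kfold(X, y, k=5, iterations=10, seed=0):
    """Return preds[i]: one out-of-fold prediction for sample i per iteration."""
    rng = random.Random(seed)
    n = len(X)
    preds = [[] for _ in range(n)]
    for _ in range(iterations):
        idx = list(range(n))
        rng.shuffle(idx)                      # new random fold split each iteration
        folds = [idx[f::k] for f in range(k)]  # k disjoint folds covering all samples
        for test in folds:
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
            for i in test:
                preds[i].append(nearest_centroid_predict(model, X[i]))
    return preds


def summarize(preds, y):
    """Single-model hit rate, aggregated (majority-vote) hit rate, per-sample stability."""
    iterations = len(preds[0])
    # Mean hit rate of the individual surrogate models across all iterations.
    single = sum(
        p[t] == yi for p, yi in zip(preds, y) for t in range(iterations)
    ) / (len(y) * iterations)
    votes = [Counter(p).most_common(1)[0] for p in preds]
    # Stability: share of iterations agreeing with each sample's majority prediction.
    stability = [count / iterations for _, count in votes]
    # Aggregated model: the majority vote over the out-of-fold predictions.
    aggregated = sum(label == yi for (label, _), yi in zip(votes, y)) / len(y)
    return single, aggregated, stability
```

Values of `stability` well below 1 flag samples whose predictions change with small changes in the training data, which is the instability the abstract reports for the initial models. With real spectra, the stand-in classifier would be replaced by an LDA implementation.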

Subjects

Models, Biological; Multivariate Analysis; Neoplasms/chemistry; Neoplasms/classification; Probability; Reproducibility of Results; Sample Size

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Models, Biological | Study type: Prognostic_studies | Language: English | Year of publication: 2008 | Document type: Article
