Estimating the number of components and detecting outliers using Angle Distribution of Loading Subspaces (ADLS) in PCA analysis.
Liu, Y J; Tran, T; Postma, G; Buydens, L M C; Jansen, J.
Affiliation
  • Liu YJ; Radboud University, Institute for Molecules and Materials (IMM), Analytical Chemistry, Heyendaalseweg 135, 6525 AJ, Nijmegen, The Netherlands; State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, 410082 Changsha, China. Electronic address: chemometrics@science.ru.nl.
  • Tran T; Radboud University, Institute for Molecules and Materials (IMM), Analytical Chemistry, Heyendaalseweg 135, 6525 AJ, Nijmegen, The Netherlands; Center for Mathematical Sciences, Merck, Sharp, & Dohme, Oss, The Netherlands.
  • Postma G; Radboud University, Institute for Molecules and Materials (IMM), Analytical Chemistry, Heyendaalseweg 135, 6525 AJ, Nijmegen, The Netherlands.
  • Buydens LMC; Radboud University, Institute for Molecules and Materials (IMM), Analytical Chemistry, Heyendaalseweg 135, 6525 AJ, Nijmegen, The Netherlands.
  • Jansen J; Radboud University, Institute for Molecules and Materials (IMM), Analytical Chemistry, Heyendaalseweg 135, 6525 AJ, Nijmegen, The Netherlands.
Anal Chim Acta ; 1020: 17-29, 2018 Aug 22.
Article in English | MEDLINE | ID: mdl-29655425
ABSTRACT
Principal Component Analysis (PCA) is widely used in analytical chemistry to reduce the dimensionality of a multivariate data set to a few Principal Components (PCs) that summarize the predominant patterns in the data. An accurate estimate of the number of PCs is indispensable for meaningful interpretation and for extracting useful information. We show how existing estimates of the number of PCs may fall short for datasets with considerable coherence, noise, or outliers. We present how the Angle Distribution of the Loading Subspaces (ADLS) can be used to estimate the number of PCs based on the variability of the loading subspace across bootstrap resamples. Based on comprehensive comparisons with other well-known methods applied to simulated datasets, we show that ADLS (1) may quantify the stability of PCA models with several numbers of PCs simultaneously; (2) estimates the appropriate number of PCs better than the cross-validation and scree plot methods, specifically for coherent data; and (3) facilitates integrated outlier detection, which we introduce in this manuscript. In addition, we demonstrate how the analysis of different types of real-life spectroscopic datasets may benefit from these advantages of ADLS.
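The abstract describes the idea only in outline, so the following is a minimal sketch of the general technique rather than the authors' algorithm. It assumes that the variability of the loading subspace can be summarized by the largest principal angle between the k-component loading subspace of the full data and that of each bootstrap resample. The function names (`loading_subspace`, `largest_principal_angle`, `bootstrap_angles`), the choice of reference subspace, and the synthetic rank-2 data are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def loading_subspace(X, k):
    """Orthonormal basis (variables x k) spanning the first k PCA loadings."""
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T

def largest_principal_angle(A, B):
    """Largest principal angle, in degrees, between the column spaces of A and B.

    For orthonormal A and B, the singular values of A.T @ B are the cosines
    of the principal angles, so the smallest one gives the largest angle.
    """
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(np.degrees(np.arccos(np.clip(s.min(), -1.0, 1.0))))

def bootstrap_angles(X, k, n_boot=200, seed=0):
    """Angles between the full-data subspace and each bootstrap subspace.

    Assumption: the full-data subspace is used as the reference; other
    choices (e.g. comparing pairs of resamples) are equally plausible.
    """
    rng = np.random.default_rng(seed)
    ref = loading_subspace(X, k)
    n = X.shape[0]
    angles = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)         # resample rows with replacement
        angles[b] = largest_principal_angle(ref, loading_subspace(X[idx], k))
    return angles

# Hypothetical data: two true components plus small isotropic noise.
rng = np.random.default_rng(1)
scores = rng.normal(size=(100, 2)) * [10.0, 5.0]
loadings = np.linalg.qr(rng.normal(size=(20, 2)))[0].T
X = scores @ loadings + 0.1 * rng.normal(size=(100, 20))

for k in range(1, 5):
    angles = bootstrap_angles(X, k)
    print(f"k={k}: median largest angle = {np.median(angles):.1f} degrees")
```

Under these assumptions, a subspace that stays nearly fixed across resamples (small angles) suggests that k components capture stable structure, while a jump to large angles when one more component is added suggests that the extra component mostly fits noise. How the angle distribution is turned into a decision rule, and how it is used for outlier detection, is not specified in the abstract and is left out here.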
Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Anal Chim Acta Year: 2018 Document type: Article