
Optimizing the soft independent modeling of class analogy (SIMCA) using statistical prediction regions.

Avohou, T Hermane; Sacré, Pierre-Yves; Hamla, Sabrina; Lebrun, Pierre; Hubert, Philippe; Ziemons, Éric.

Anal Chim Acta ; 1229: 340339, 2022 Oct 09.

Article En | MEDLINE | ID: mdl-36156218

The ultimate goal of a one-class classifier such as the "rigorous" soft independent modeling of class analogy (SIMCA) is to predict, with a given confidence probability, the conformity of future objects with a given reference class. However, the SIMCA model as currently implemented often suffers from an undercoverage problem: its observed sensitivity often falls far below the desired theoretical confidence probability, which undermines its intended use as a predictive tool. To overcome this issue, the strategy most often reported in the literature is to increase the nominal confidence probability until the desired sensitivity is obtained in cross-validation. This article proposes a strategy based on statistical prediction intervals as an alternative that properly addresses the undercoverage issue. The strategy uses the concept of predictive distributions sensu stricto to construct statistical prediction regions for the metrics. First, a procedure based on goodness-of-fit criteria selects the best-fitting family of probability models for each metric, or a monotonic transformation of it, among several plausible candidate families of right-skewed probability distributions for positive random variables, including the gamma and lognormal families. Second, assuming the best-fitting distribution, a generalized linear model is fitted to the data of each metric using the Bayesian method. This method makes it convenient to estimate the uncertainties about the parameters of the selected distribution. Propagating these uncertainties to the best-fitting probability model of the metric yields its so-called posterior predictive distribution, which is then used to set its critical limit. Overall, the evaluation of the proposed approach on a variety of real datasets shows that it yields unbiased and more accurate sensitivities than existing methods that are not based on predictive densities. It can even yield better specificities than the strategy that attempts to improve the sensitivities of existing methods by "optimizing" the type 1 error, especially with small sample sizes.
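
The idea of propagating parameter uncertainty into a posterior predictive distribution, and reading the critical limit from it, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a lognormal model for a single metric, a standard noninformative prior on the mean and variance of the log-metric, and plain Monte Carlo simulation in place of a Bayesian generalized linear model. The function names and the training distances are hypothetical.

```python
import math
import random
import statistics


def predictive_limit(metric_values, confidence=0.95, draws=20000, seed=0):
    """Upper critical limit from a simulated posterior predictive distribution.

    Assumes log(metric) is normal with a noninformative prior on its
    mean and variance (an illustrative simplification).
    """
    rng = random.Random(seed)
    logs = [math.log(v) for v in metric_values]
    n = len(logs)
    mean = statistics.fmean(logs)
    var = statistics.variance(logs)  # sample variance, n - 1 denominator

    simulated = []
    for _ in range(draws):
        # Variance draw: scaled inverse chi-square with n - 1 degrees of freedom
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2 = (n - 1) * var / chi2
        # Mean draw, conditional on the sampled variance
        mu = rng.gauss(mean, math.sqrt(sigma2 / n))
        # Future log-metric, conditional on the sampled parameters
        simulated.append(rng.gauss(mu, math.sqrt(sigma2)))

    simulated.sort()
    index = min(draws - 1, math.ceil(confidence * draws) - 1)
    return math.exp(simulated[index])


def plug_in_limit(metric_values, confidence=0.95):
    """Limit that treats the estimated parameters as if they were exact."""
    logs = [math.log(v) for v in metric_values]
    z = statistics.NormalDist().inv_cdf(confidence)
    return math.exp(statistics.fmean(logs) + z * statistics.stdev(logs))


# Hypothetical training distances for ten conforming objects
distances = [0.8, 1.1, 0.6, 1.9, 1.3, 0.9, 2.4, 1.0, 0.7, 1.5]
limit_plug = plug_in_limit(distances)
limit_pred = predictive_limit(distances)
print(f"plug-in limit:              {limit_plug:.3f}")
print(f"posterior predictive limit: {limit_pred:.3f}")
```

With only ten training objects, the predictive limit comes out wider than the plug-in limit because it also accounts for the uncertainty in the estimated parameters. Under these assumptions, that widening is what counters undercoverage at small sample sizes.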

Interpretable One-Class Classification of Raman Spectra Using Prediction Bands Estimated by Wavelet Regression.

Avohou, T Hermane; Sacré, Pierre-Yves; Hubert, Philippe; Ziemons, Eric.

Anal Chem ; 94(10): 4183-4191, 2022 Mar 15.

Article En | MEDLINE | ID: mdl-35244387

Previously, we introduced a novel one-class classification (OCC) concept for spectra. It uses a prediction band in the wavelength space as the acceptance space for genuine spectra of the target chemical. As a decision rule, test spectra falling substantially outside this band are rejected as noncomplying with the target, and their deviations are documented in the wavelength space. This band-based OCC concept was applied to smooth signals such as near-infrared (NIR) spectra. A regression model based on a smoothed principal component (PC) representation of the training spectra was used to predict unseen trajectories of future spectra. The boundaries of the most central predicted trajectories were chosen as critical trajectories. We now propose a methodology to construct a similar band-based one-class classifier for Raman spectra, which are sharper and noisier than NIR spectra. The spectra are transformed by a composition of wavelet and principal component (wPC) expansions, rather than by the PC expansion alone used in the previous methodology for NIR spectra. Wavelets can capture sharp features of Raman signals and provide a framework to denoise them efficiently. A multinormal prediction model is then used to derive predictions of future wPC scores of unseen spectra. These predicted wPC scores are then backtransformed to obtain predictions of future trajectories of unseen spectra in the wavelength space, whose most central region defines the acceptance band or space. This band-based one-class classifier successfully classified the first derivatives of real pharmaceutical Raman spectra. Because it documents deviations from the critical trajectories in the wavelength space, it is also more interpretable.
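
The claim that wavelets capture sharp features while supporting denoising can be illustrated with a minimal sketch. The abstract does not state the wavelet family or the thresholding rule, so the Haar wavelet, the universal threshold, hard thresholding and the synthetic two-peak signal below are all assumptions made for illustration. The principal component and prediction-band steps are not covered.

```python
import math
import random
import statistics


def haar_forward(signal):
    """Full Haar decomposition; the signal length must be a power of two."""
    approx = list(signal)
    details = []  # finest level first
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Rebuild the signal from the coarsest approximation and all details."""
    current = list(approx)
    for level in reversed(details):
        rebuilt = []
        for s, d in zip(current, level):
            rebuilt.append((s + d) / math.sqrt(2))
            rebuilt.append((s - d) / math.sqrt(2))
        current = rebuilt
    return current


def denoise(signal):
    """Zero the detail coefficients that do not exceed the universal threshold."""
    approx, details = haar_forward(signal)
    # Noise scale from the median absolute finest-level detail coefficient
    sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(len(signal)))
    kept = [[d if abs(d) > threshold else 0.0 for d in level] for level in details]
    return haar_inverse(approx, kept)


def sse(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Synthetic signal with two narrow peaks plus Gaussian noise (hypothetical data)
rng = random.Random(1)
n = 512
clean = [
    math.exp(-0.5 * ((i - 150) / 3.0) ** 2)
    + 0.6 * math.exp(-0.5 * ((i - 340) / 5.0) ** 2)
    for i in range(n)
]
noisy = [c + rng.gauss(0.0, 0.08) for c in clean]
denoised = denoise(noisy)

print(f"squared error before denoising: {sse(noisy, clean):.3f}")
print(f"squared error after denoising:  {sse(denoised, clean):.3f}")
```

The large coefficients that encode the peaks survive the threshold, while most pure-noise coefficients are set to zero. A smoother wavelet family would likely represent real Raman bands better than the blocky Haar basis used here.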

Spectrum Analysis, Raman; Spectrum Analysis, Raman/methods