PROSAC as a selection tool for SO-PLS regression: A strategy for multi-block data fusion.
Diaz-Olivares, Jose A; Bendoula, Ryad; Saeys, Wouter; Ryckewaert, Maxime; Adriaens, Ines; Fu, Xinyue; Pastell, Matti; Roger, Jean-Michel; Aernouts, Ben.
Affiliation
  • Diaz-Olivares JA; KU Leuven, Department of Biosystems, Division of Animal and Human Health Engineering, Campus Geel, Kleinhoefstraat 4, 2440, Geel, Belgium. Electronic address: jose.diaz@kuleuven.be.
  • Bendoula R; ITAP, Univ. Montpellier, INRAE, Institute Agro, Montpellier, France.
  • Saeys W; KU Leuven, Department of Biosystems, MeBioS unit, Kasteelpark Arenberg 30, 3001, Leuven, Belgium.
  • Ryckewaert M; Inria, Univ. Montpellier, CNRS, LIRMM, Montpellier, France.
  • Adriaens I; KU Leuven, Department of Biosystems, Division of Animal and Human Health Engineering, Campus Geel, Kleinhoefstraat 4, 2440, Geel, Belgium; Department of Data Analysis and Mathematical Modelling, Division BioVism, Campus Coupure, Coupure Links 653, 9000, Ghent, Belgium.
  • Fu X; KU Leuven, Department of Biosystems, Division of Animal and Human Health Engineering, Campus Geel, Kleinhoefstraat 4, 2440, Geel, Belgium.
  • Pastell M; Production Systems, Natural Resources Institute Finland (Luke), Latokartanonkaari 9, 00790, Helsinki, Finland.
  • Roger JM; ITAP, Univ. Montpellier, INRAE, Institute Agro, Montpellier, France; ChemHouse Research Group, Montpellier, France.
  • Aernouts B; KU Leuven, Department of Biosystems, Division of Animal and Human Health Engineering, Campus Geel, Kleinhoefstraat 4, 2440, Geel, Belgium. Electronic address: ben.aernouts@kuleuven.be.
Anal Chim Acta ; 1319: 342965, 2024 Aug 29.
Article in En | MEDLINE | ID: mdl-39122277
ABSTRACT

BACKGROUND:

Spectral data from multiple sources can be integrated into multi-block fusion chemometric models, such as sequentially orthogonalized partial least squares (SO-PLS), to improve the prediction of sample quality features. Pre-processing techniques are often applied to mitigate extraneous variability that is unrelated to the response variables. However, selecting suitable pre-processing methods and identifying informative data blocks becomes increasingly complex and time-consuming as the number of blocks grows. The problem addressed in this work is the efficient pre-processing, selection, and ordering of data blocks for targeted applications in SO-PLS.
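For readers unfamiliar with SO-PLS, the following is a minimal two-block sketch of the general idea: fit PLS on the first block, orthogonalize the second block with respect to the first block's scores, then fit PLS on the orthogonalized block against the remaining residual. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`pls1_scores`, `so_pls_fit`), the single-response NIPALS-style PLS1, the fixed component counts, and the synthetic data are all hypothetical, and only the calibration fit is shown (no prediction on new samples).

```python
import numpy as np

def pls1_scores(X, y, n_comp):
    """Return mutually orthogonal PLS1 scores (n x n_comp) for centered X and y."""
    X = X.copy()
    scores = []
    for _ in range(n_comp):
        w = X.T @ y
        w = w / np.linalg.norm(w)          # weight vector for this component
        t = X @ w                          # score vector
        p = X.T @ t / (t @ t)              # loading vector
        X = X - np.outer(t, p)             # deflate X so later scores are orthogonal
        scores.append(t)
    return np.column_stack(scores)

def so_pls_fit(X1, X2, y, a1, a2):
    """Two-block SO-PLS calibration fit (hypothetical helper, training data only)."""
    X1c, X2c = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    y_mean = y.mean()
    yc = y - y_mean
    # Step 1: PLS on the first block.
    T1 = pls1_scores(X1c, yc, a1)
    q1 = np.linalg.lstsq(T1, yc, rcond=None)[0]
    fit1 = T1 @ q1
    # Step 2: remove from block 2 everything already spanned by T1.
    X2o = X2c - T1 @ np.linalg.lstsq(T1, X2c, rcond=None)[0]
    # Step 3: PLS on the orthogonalized block against the y-residual.
    T2 = pls1_scores(X2o, yc - fit1, a2)
    q2 = np.linalg.lstsq(T2, yc - fit1, rcond=None)[0]
    fit12 = fit1 + T2 @ q2
    return T1, T2, fit1 + y_mean, fit12 + y_mean

# Synthetic data (assumption): y depends on one variable from each block.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(50, 6))
X2 = rng.normal(size=(50, 6))
y = X1[:, 0] + 0.5 * X2[:, 1] + 0.01 * rng.normal(size=50)

T1, T2, fit1, fit12 = so_pls_fit(X1, X2, y, a1=2, a2=2)
rmse1 = np.sqrt(np.mean((y - fit1) ** 2))    # calibration error, block 1 only
rmse12 = np.sqrt(np.mean((y - fit12) ** 2))  # calibration error, both blocks
```

Because the second block is orthogonalized before it is modeled, its scores carry only information not already captured by the first block, which is why the order of the blocks matters in SO-PLS.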

RESULTS:

We introduce the PROSAC-SO-PLS methodology, which employs pre-processing ensembles with response-oriented sequential alternation calibration (PROSAC). This approach identifies the best pre-processed data blocks and their sequential order for specific SO-PLS applications. The method uses a stepwise forward selection strategy, facilitated by the rapid Gram-Schmidt process, to prioritize the blocks that yield the lowest prediction residuals. To validate the approach, we present results on three empirical near-infrared (NIR) datasets. Comparative analyses were performed against partial least squares (PLS) regressions on single-block pre-processed datasets and against a methodology relying solely on PROSAC. The PROSAC-SO-PLS approach consistently outperformed these methods, yielding significantly lower prediction errors: the root-mean-squared error of prediction (RMSEP) was reduced by 5 to 25 % for seven of the eight response variables analyzed.
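The stepwise forward selection described above can be illustrated with a simplified sketch. This is not the published PROSAC-SO-PLS algorithm: it assumes a single response, extracts one PLS1-style score per step, ranks candidate blocks by the calibration residual (rather than a validated prediction error), and uses Gram-Schmidt deflation so that later steps only see information orthogonal to what was already selected. The function name `forward_block_selection`, the block names, and the synthetic data are hypothetical.

```python
import numpy as np

def forward_block_selection(blocks, y, n_steps, tol=1e-12):
    """Greedy forward selection of blocks (hypothetical, simplified sketch).

    blocks: dict mapping block name -> (n_samples x n_vars) array,
            e.g. the same spectra under different pre-processings.
    Returns the ordered list of chosen block names and the final y-residual.
    """
    work = {name: X - X.mean(axis=0) for name, X in blocks.items()}
    y_res = y - y.mean()
    order = []
    for _ in range(n_steps):
        best = None
        for name, X in work.items():
            w = X.T @ y_res
            norm = np.linalg.norm(w)
            if norm < tol:                  # block has nothing left to offer
                continue
            t = X @ (w / norm)              # candidate score from this block
            resid = y_res - t * (t @ y_res) / (t @ t)
            sse = resid @ resid
            if best is None or sse < best[0]:
                best = (sse, name, t, resid)
        if best is None:
            break
        _, name, t, y_res = best
        order.append(name)
        # Gram-Schmidt step: remove the selected score from every block.
        work = {n: X - np.outer(t, t @ X) / (t @ t) for n, X in work.items()}
    return order, y_res

# Synthetic data (assumption): only one block is related to the response.
rng = np.random.default_rng(1)
n = 40
noise_block = rng.normal(size=(n, 5))
informative_block = rng.normal(size=(n, 5))
y = (2.0 * informative_block[:, 0] - informative_block[:, 2]
     + 0.05 * rng.normal(size=n))

order, y_res = forward_block_selection(
    {"block_a_noise": noise_block, "block_b_informative": informative_block},
    y, n_steps=3)
sse_start = np.sum((y - y.mean()) ** 2)
sse_end = y_res @ y_res
```

In this sketch a block may be chosen at more than one step, since after deflation it can still hold information orthogonal to the scores already selected.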

SIGNIFICANCE:

The PROSAC-SO-PLS methodology offers a versatile and efficient technique for ensemble pre-processing in NIR data modeling. It enables the use of SO-PLS while minimizing concerns about pre-processing sequence or block order, and it effectively manages a large number of data blocks. This innovation significantly streamlines the data pre-processing and model-building processes, enhancing the accuracy and efficiency of chemometric models.

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Anal Chim Acta Year: 2024 Document type: Article
