Opportunities offered by latent-based multiblock strategies to integrate biomarkers of chemical exposure and biomarkers of effect in environmental health studies.
Chemosphere; 361: 142465, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38810805
ABSTRACT
Modern environmental epidemiology benefits from a new generation of technologies that enable comprehensive profiling of biomarkers, including environmental chemical exposure and omic datasets. The integration and analysis of large and structured datasets to identify functional associations is constrained by computational challenges that cannot be overcome using conventional regression methods. Some extensions of Partial Least Squares (PLS) regression have been developed to efficiently integrate multiple datasets, including Multiblock PLS (MB-PLS) and Sequential and Orthogonalized PLS; however, these approaches remain seldom applied in environmental epidemiology. To address that research gap, this study aimed to assess and compare the applicability of PLS-based multiblock models in an observational case study, where biomarkers of exposure to environmental chemicals and endogenous biomarkers of effect were simultaneously integrated to highlight biological links related to a health outcome. The methods were compared with and without sparsity, coupled with two metrics to support variable selection: Variable Importance in Projection (VIP) and Selectivity Ratio (SR). The framework was applied to a case-study dataset mimicking the structure of 36 environmental exposure biomarkers (E-block), 61 inflammation biomarkers (M-block), and their relationships with the gestational age at delivery of 161 mother-infant pairs. The results showed an overall consistency in the selected variables across models, although some specific selection patterns were identified. The block-scaled concatenation-based approaches (e.g. MB-PLS) tended to select more variables from the E-block, but were unable to identify certain variables in the M-block. The SR criterion selected more variables than the VIP criterion, with lower predictive performance. The multiblock models coupled with VIP appeared to be the methods of choice for identifying relevant variables with similar statistical performance. Overall, the use of multiblock PLS-based methods appears to be a good strategy to efficiently support the variable selection process in modern environmental epidemiology.
Full text: 1
Database: MEDLINE
Main subject: Biomarkers / Environmental Exposure
Language: En
Publication year: 2024
Document type: Article
Country of affiliation: France