ABSTRACT
Introduction: This investigation applies machine learning (ML) techniques to examine the complex relationship between heavy metal exposure and two forms of arthritis: osteoarthritis (OA) and rheumatoid arthritis (RA). Using a comprehensive dataset from the National Health and Nutrition Examination Survey (NHANES) spanning 2003 to 2020, the study aims to clarify the roles specific heavy metals play in the incidence and differentiation of OA and RA. Methods: Our analytical framework integrates demographic, laboratory, and questionnaire data through a phased ML strategy that encompasses a range of methods, including LASSO regression and SHapley Additive exPlanations (SHAP). Thirteen distinct ML models were applied across seven methodologies to improve both the predictability and the interpretability of clinical outcomes. Each phase of model development was designed to progressively refine the algorithm's performance. Results: The results reveal significant associations between certain heavy metals and an increased risk of arthritis. The phased ML approach enabled precise identification of key predictors and their contributions to disease outcomes. Discussion: These findings offer new insights into potential pathways for the early detection, prevention, and management of arthritis associated with environmental exposures. By improving the interpretability of ML models, this research gives clinicians and researchers a useful tool for understanding the environmental determinants of arthritis.
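LASSO regression is named as one step of the phased strategy described above. The sketch below illustrates the general mechanism by which an L1 penalty drives the coefficients of uninformative predictors to exactly zero, leaving a reduced set of candidate predictors for later phases. It is a minimal, pure-Python cyclic coordinate-descent LASSO that assumes centred, standardised features and a centred outcome (so no intercept is fitted). Because arthritis status is a categorical label, a penalised logistic variant would be more typical in practice; the squared-error form is used here only because it is the simplest to show. The function names `soft_threshold` and `lasso_coordinate_descent` and the toy data are hypothetical and do not reproduce the authors' pipeline.

```python
from typing import List


def soft_threshold(value: float, lam: float) -> float:
    """Soft-thresholding operator S(value, lam) used in the LASSO update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(
    X: List[List[float]],
    y: List[float],
    lam: float,
    n_iter: int = 200,
) -> List[float]:
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumption: columns of X are centred/standardised and y is centred,
    so no intercept term is estimated.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X b, and b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy example (hypothetical data): only the first feature drives y.
X_demo = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y_demo = [2.0, -2.0, 2.0, -2.0]
beta = lasso_coordinate_descent(X_demo, y_demo, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

In this toy case the two features are orthogonal, so a penalty of 0.1 shrinks the first coefficient from 2 to 1.9 and sets the second to zero; only the first feature is "selected". Larger penalties remove more predictors, which is why the penalty is usually tuned by cross-validation.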
ABSTRACT
Meteorological factors, which are periodic and regular over the long run, have a non-negligible impact on human health. Accurate health risk prediction based on meteorological factors is essential for the optimal allocation of resources in healthcare units. However, because the original hospital admission series is non-stationary and non-linear, traditional methods predict it less robustly. This study investigates hospital admission prediction models that combine time series pre-processing algorithms with a deep learning approach based on meteorological factors. Using electronic medical record data from Panyu Central Hospital and meteorological data for Panyu district from 2003 to 2019, we identified 46,089 eligible patients with lower respiratory tract infections (LRTIs) and four meteorological factors, which were used to build and evaluate the prediction models. A novel hybrid model, the Cascade GAM-CEEMDAN-LSTM Model (CGCLM), was established by combining a generalized additive model (GAM), complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), and long short-term memory (LSTM) networks to predict daily admissions of patients with LRTIs. The experimental results show that the multistep CGCLM method proposed in this paper outperforms a single LSTM model in predicting health risk time series across different time window sizes. Our results also indicate that CGCLM achieves its best prediction performance when the time window is set to 61 days (RMSE = 1.12, MAE = 0.87, R² = 0.93). Adequately extracting the exposure-response relationships between meteorological factors and diseases, and handling sequence pre-processing appropriately, both play an important role in time series prediction. This hybrid climate-based model for predicting LRTIs can also be extended to time series prediction of other epidemic diseases.
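The abstract compares input window sizes and reports RMSE, MAE and R². The sketch below shows, in generic form, how a daily admission series can be cut into fixed-length input windows paired with a forecast target, and how the three metrics are computed from predictions. It assumes a one-dimensional daily series and the standard metric definitions. The helper names `make_windows`, `rmse`, `mae` and `r2` are hypothetical, and the GAM, CEEMDAN and LSTM stages of CGCLM are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Split a daily series into (input window, target) pairs.

    Each input holds `window` consecutive days; the target is the value
    `horizon` days after the last day of that window.
    """
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(list(series[start:start + window]))
        y.append(series[start + window + horizon - 1])
    return X, y


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined for a constant target series")
    return 1.0 - ss_res / ss_tot


series = [float(v) for v in range(10)]  # toy daily counts (hypothetical)
X, y = make_windows(series, window=3)
print(len(X), X[0], y[0])  # 7 [0.0, 1.0, 2.0] 3.0
```

With `window=61`, each sample would contain the 61 preceding days, matching the best-performing setting reported above. Setting `horizon` above 1 gives a target further ahead, which is one common way to frame multistep-ahead prediction; the abstract does not specify which multistep scheme CGCLM uses.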