Results 1 - 20 of 450
1.
Environ Int ; 191: 108993, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39278045

ABSTRACT

Changes in energy and environmental policies, along with changes in the energy markets of New York State over the past two decades, have spurred interest in evaluating their impacts on emissions from various energy generation sectors. This study focused on quantifying these effects on VOC (volatile organic compound) emissions and their subsequent impacts on air quality within the New York City (NYC) metropolitan area. NYC is an EPA nonattainment region for ozone (O3) and is likely a VOC-limited region. NYC has a complex coastal topography and meteorology, with low-level jets and sea/bay/land breeze circulation associated with heat waves, leading to summertime O3 exceedances and formation of secondary organic aerosol (SOA). To date, no comprehensive source apportionment studies have been done to understand the contributions of local and long-range sources of VOCs in this area. This study applied an improved Positive Matrix Factorization (PMF) methodology designed to incorporate atmospheric dispersion and photochemical reaction losses of VOCs to provide improved apportionment results. Hourly measurements of VOCs were obtained from a Photochemical Assessment Monitoring Station located at an urban site in the Bronx from 2000 to 2021. The study further explores the role of VOC sources in O3 and SOA formation and leverages advanced machine learning tools, the XGBoost and SHAP algorithms, to identify synergistic interactions between sources and to provide VOC source impacts on ambient O3 concentrations. Isoprene demonstrated a substantial influence in the source contribution of the biogenic factor, emphasizing its role in O3 formation. Notable contributions from anthropogenic emissions, such as fuel evaporation and various industrial processes, along with significant traffic-related sources that likely contribute to SOA formation, underscore the combined impact of natural and human-made sources on O3 pollution.
Findings of this study can assist regulatory agencies in developing appropriate policy and management initiatives to control O3 pollution in NYC.


Subjects
Air Pollutants, Air Pollution, Environmental Monitoring, Ozone, Volatile Organic Compounds, Ozone/analysis, New York City, Air Pollutants/analysis, Volatile Organic Compounds/analysis, Air Pollution/statistics & numerical data, Pentanes/analysis, Butadienes/analysis, Hemiterpenes/analysis
2.
Adv Sci (Weinh) ; : e2407235, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39316380

ABSTRACT

Accurately predicting the power conversion efficiency (PCE) in dye-sensitized solar cells (DSSCs) represents a crucial challenge, one that is pivotal for the high throughput rational design and screening of promising dye sensitizers. This study presents precise, predictive, and interpretable machine learning (ML) models specifically designed for Zn-porphyrin-sensitized solar cells. The model leverages theoretically computable, effective, and reusable molecular descriptors (MDs) to address this challenge. The models achieve excellent performance on a "blind test" of 17 newly designed cells, with a mean absolute error (MAE) of 1.02%. Notably, 10 dyes are predicted within a 1% error margin. These results validate the ML models and their importance in exploring uncharted chemical spaces of Zn-porphyrins. SHAP analysis identifies crucial MDs that align well with experimental observations, providing valuable chemical guidelines for the rational design of dyes in DSSCs. These predictive ML models enable efficient in silico screening, significantly reducing analysis time for photovoltaic cells. Promising Zn-porphyrin-based dyes with exceptional PCE are identified, facilitating high-throughput virtual screening. The prediction tool is publicly accessible at https://ai-meta.chem.ncu.edu.tw/dsc-meta.
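The two figures quoted in this abstract, a mean absolute error and a count of dyes predicted within a 1% margin, can be illustrated with a minimal sketch. The function names and the PCE values below are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def count_within_margin(y_true, y_pred, margin):
    """Number of predictions whose absolute error is at most `margin`."""
    return sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= margin)


# Hypothetical measured vs. predicted PCE values (percentage points).
measured = [6.0, 7.5, 9.0, 10.5]
predicted = [6.5, 7.0, 11.0, 10.5]

mae = mean_absolute_error(measured, predicted)              # 0.75
hits = count_within_margin(measured, predicted, margin=1.0)  # 3
```

Here the MAE is in the same units as the PCE itself, which is presumably how the reported 1.02% should be read.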

3.
Accid Anal Prev ; 208: 107778, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39288451

ABSTRACT

To effectively capture and explain complex, nonlinear relationships within bicycle crash frequency data and account for unobserved heterogeneity simultaneously, this study proposes a new hybrid framework that combines the Random Forest-based SHapley Additive exPlanations (RF-SHAP) method with a random parameter negative binomial regression model (RPNB). First, four machine learning algorithms, namely random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), and Extreme Gradient Boosting (XGBoost), were compared for variable importance calculation. The RF algorithm, demonstrating the best performance, was selected and integrated into an interpretable machine learning-based method (i.e., RF-SHAP) to provide an interpretable measure of each variable's impact, which is critical for understanding the model's prediction results. Finally, the RF-SHAP method was combined with the RPNB model to explore individual-specific variations that influence crash frequency predictions. Using 288 traffic analysis zones (TAZs) in Greater London and various regional risk factors for bicycle crash frequency, the proposed framework was validated. The results indicate that the proposed framework demonstrates improved prediction accuracy and better factor interpretation in analyzing bicycle crash frequency. The model exhibits consistent Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values, indicating its reliable explanatory power. Furthermore, there is a significant improvement in the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). This suggests that the proposed model effectively combines the explanatory power of statistical models with the forecasting power of data-driven models. The interpretability of SHAP values, coupled with the causal insights from RPNB, provides policymakers with actionable information to develop targeted interventions.
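As background to the SHAP-based measure of each variable's impact, the following sketch computes exact Shapley values for a toy two-feature model by averaging marginal contributions over all feature orderings. It is a brute-force illustration only; the toy model, the zero baseline, and all names are assumptions, and practical SHAP implementations for tree ensembles use much faster algorithms than this enumeration.

```python
from itertools import permutations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    phi = [0.0] * n_features
    for order in permutations(range(n_features)):
        present = set()
        prev = value_fn(frozenset(present))
        for i in order:
            present.add(i)
            cur = value_fn(frozenset(present))
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    n_orders = factorial(n_features)
    return [p / n_orders for p in phi]


# Toy model with an interaction term (hypothetical, not from the paper).
def model(v):
    return 2 * v[0] + 3 * v[1] + v[0] * v[1]


x = [1.0, 2.0]         # instance to explain
baseline = [0.0, 0.0]  # features outside the coalition take baseline values


def value_fn(coalition):
    v = [x[i] if i in coalition else baseline[i] for i in range(len(x))]
    return model(v)


phi = shapley_values(value_fn, 2)  # [3.0, 7.0]
```

The attributions sum to `model(x) - model(baseline)` (here 10), which is the additivity property that makes per-variable contributions comparable.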

4.
Diagnostics (Basel) ; 14(17), 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39272651

ABSTRACT

Objective: The objective of the study was to establish an AI-driven decision support system by identifying the most important features in the severity of disease for Intensive Care Unit (ICU) with Mechanical Ventilation (MV) requirement, ICU, and InterMediate Care Unit (IMCU) admission for hospitalized patients with COVID-19 in South Florida. The features implicated in the risk factors identified by the model interpretability can be used to forecast treatment plans faster before critical conditions exacerbate. Methods: We analyzed eHR data from 5371 patients diagnosed with COVID-19 from South Florida Memorial Healthcare Systems admitted between March 2020 and January 2021 to predict the need for ICU with MV, ICU, and IMCU admission. A Random Forest classifier was trained on patients' data augmented by SMOTE, collected at hospital admission. We then compared the importance of features utilizing different model interpretability analyses, such as SHAP, MDI, and Permutation Importance. Results: The models for ICU with MV, ICU, and IMCU admission identified the following factors overlapping as the most important predictors among the three outcomes: age, race, sex, BMI, diarrhea, diabetes, hypertension, early stages of kidney disease, and pneumonia. It was observed that individuals over 65 years ('older adults'), males, current smokers, and BMI classified as 'overweight' and 'obese' were at greater risk of severity of illness. The severity was intensified by the co-occurrence of two interacting features (e.g., diarrhea and diabetes). Conclusions: The top features identified by the models' interpretability were from the 'sociodemographic characteristics', 'pre-hospital comorbidities', and 'medications' categories. However, 'pre-hospital comorbidities' played a vital role in different critical conditions. 
In addition to individual feature importance, the feature interactions also provide crucial information for predicting the most likely outcome of patients' conditions when urgent treatment plans are needed during the surge of patients during the pandemic.
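The SMOTE augmentation mentioned in the Methods creates synthetic minority-class records by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows that core step only; the function name, the default `k`, and the points are hypothetical, and the study's actual SMOTE settings are not given in the abstract.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and a near neighbour."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # Rank the other minority points by squared Euclidean distance to `base`.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation fraction in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


# Hypothetical minority-class feature vectors (e.g. two scaled clinical features).
minority_points = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [8.0, 8.0]]
synthetic = smote_sample(minority_points, k=2, rng=random.Random(0))
```

Because each synthetic point lies on a segment between two real minority points, oversampling of this kind is normally applied to the training split only.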

5.
Quant Imaging Med Surg ; 14(9): 6311-6324, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39281129

ABSTRACT

Background: Follicular thyroid carcinoma (FTC) and follicular thyroid adenoma (FTA) present diagnostic challenges due to overlapping clinical and ultrasound features. Improving the diagnosis of FTC can enhance patient prognosis and effectiveness in clinical management. This study seeks to develop a predictive model for FTC based on ultrasound features using machine learning (ML) algorithms and assess its diagnostic effectiveness. Methods: Patients diagnosed with FTA or FTC based on surgical pathology between January 2009 and February 2023 at Zhejiang Provincial Cancer Hospital and Zhejiang Provincial People's Hospital were retrospectively included. A total of 562 patients from Zhejiang Provincial Cancer Hospital comprised the training set, and 218 patients from Zhejiang Provincial People's Hospital constituted the validation set. Subsequently, clinical parameters and ultrasound characteristics of the patients were collected. The diagnostic parameters were analyzed using the least absolute shrinkage and selection operator and multivariate logistic regression screening methods. Next, a comparative analysis was performed using seven ML models. The area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, specificity, positive predicted value (PPV), negative predicted value (NPV), precision, recall, and comprehensive evaluation index (F-score) were calculated to compare the diagnostic efficacy among the seven models and determine the optimal model. Further, the optimal model was validated, and the SHapley Additive ExPlanations (SHAP) approach was applied to explain the significance of the model variables. Finally, an individualized risk assessment was conducted. Results: Age, echogenicity, thyroglobulin antibody (TGAb), echotexture, composition, triiodothyronine (T3), thyroglobulin (TG), margin, thyroid-stimulating hormone (TSH), calcification, and halo thickness >2 mm were influential factors for diagnosing FTC. 
The XGBoost model was identified as the optimal model after a comprehensive evaluation. The AUC of this model in the validation set was 0.969 [95% confidence interval (CI), 0.946-0.992], while its precision, sensitivity, specificity, and accuracy were 0.791, 0.930, 0.913 and 0.917, respectively. Conclusions: An XGBoost model based on ultrasound features was constructed and interpreted using the SHAP method, providing evidence for the diagnosis of FTC and guidance for the personalized treatment of patients.
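The evaluation metrics listed in the Methods all derive from the four cells of a binary confusion matrix. The sketch below computes them from hypothetical counts (not the study's data); note that precision is the same quantity as PPV and recall is the same quantity as sensitivity.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic metrics; assumes every denominator is non-zero."""
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # also called precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f_score": f_score,
    }


# Hypothetical confusion-matrix counts for a 200-patient validation set.
metrics = diagnostic_metrics(tp=40, fp=10, tn=140, fn=10)
```

With these counts, sensitivity and PPV are both 0.8, specificity is about 0.933, and accuracy is 0.9.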

6.
Ophthalmic Epidemiol ; : 1-8, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39288325

ABSTRACT

PURPOSE: To evaluate co-morbid sociomedical conditions affecting corneal donor endothelial cell density and transplant suitability. METHOD(S): Corneal donor transplant information was collected from the CorneaGen eye bank between June 1, 2012 and June 30, 2016. A natural language processing algorithm was applied to generate co-morbid sociomedical conditions for each donor. Variables of importance were identified using four machine learning models (random forest, Glmnet, Earth, nnet), for the outcomes of transplant suitability and endothelial cell density. SHAP (SHapley Additive exPlanations) values were generated, with beeswarm and box plots to visualize the contribution of each feature to the models. RESULTS: With a total of 23,522 unique donors, natural language processing generated 30,573 indices, which were reduced to 41 most common co-morbid sociomedical conditions. For transplant suitability, hypertension ranked the top overall variable of importance in two models. Hypertension, chronic obstructive pulmonary disease, history of smoking, and alcohol use appeared consistently in the top variables of importance. By SHAP feature importance, hypertension (0.042), alcohol use (0.017), ventilation of donor (0.011), and history of smoking (0.010) contributed the most to the transplant suitability model. For endothelial cell density, hypertension was the sociomedical condition of highest importance in three models. SHAP scores were highest among the sociomedical conditions of hypertension (0.037), alcohol use (0.013), myocardial infarction (0.012), and history of smoking (0.011). CONCLUSION: In a large cohort of corneal donor eyes, hypertension was identified as the most common contributor to machine learning models examining sociomedical conditions for corneal donor transplant suitability and endothelial cell density.
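The abstract does not state how the per-feature "SHAP feature importance" scores were aggregated. A common convention, assumed here, is the mean absolute SHAP value of each feature across donors. The sketch below applies that convention to hypothetical per-donor SHAP values; the function name and numbers are illustrative.

```python
def mean_abs_importance(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value (one row of values per sample)."""
    n_samples = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    scores = {name: totals[j] / n_samples for j, name in enumerate(feature_names)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for four donors and two conditions.
names = ["hypertension", "alcohol_use"]
rows = [[0.5, -0.25], [-0.5, 0.25], [0.5, 0.0], [-0.5, 0.0]]
ranking = mean_abs_importance(rows, names)
```

Taking absolute values before averaging matters: the signed values for the first feature above cancel to zero, yet it is the more influential of the two.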

7.
Sci Rep ; 14(1): 21667, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39289475

ABSTRACT

In Virtual Reality (VR), a higher level of presence positively influences the experience and engagement of a user. Several parameters are responsible for generating different levels of presence in VR, including, but not limited to, graphical fidelity, multi-sensory stimuli, and embodiment. However, standard methods of measuring presence, including self-reported questionnaires, are biased. This research focuses on developing a robust model, via machine learning, to detect different levels of presence in VR using multimodal neurological and physiological signals, including electroencephalography and electrodermal activity. An experiment was undertaken whereby participants (N = 22) were each exposed to three different levels of presence (high, medium, and low) in a random order in VR. Four parameters within each level, including graphics fidelity, audio cues, latency, and embodiment with haptic feedback, were systematically manipulated to differentiate the levels. A number of multi-class classifiers were evaluated within a three-class classification problem, using a One-vs-Rest approach, including Support Vector Machine, k-Nearest Neighbour, Extra Gradient Boosting, Random Forest, Logistic Regression, and Multiple Layer Perceptron. Results demonstrated that the Multiple Layer Perceptron model obtained the highest macro average accuracy of 93 ± 0.03%. Post hoc analysis revealed that relative band power, which is expressed as the ratio of power in a specific frequency band to the total baseline power, in both the frontal and parietal regions, including the beta over theta and alpha ratio, and differential entropy were most significant in detecting different levels of presence.
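The relative band power defined above is a simple ratio, sketched below with hypothetical power values. The second function reads "beta over theta and alpha" as beta / (theta + alpha); that reading is an assumption, since the abstract does not spell out the formula.

```python
def relative_band_power(band_power, total_baseline_power):
    """Power in one frequency band divided by the total baseline power."""
    if total_baseline_power <= 0:
        raise ValueError("total baseline power must be positive")
    return band_power / total_baseline_power


def beta_over_theta_alpha(beta, theta, alpha):
    """Assumed form of the 'beta over theta and alpha' ratio."""
    return beta / (theta + alpha)


# Hypothetical band powers (arbitrary units) for one EEG channel.
theta, alpha, beta = 4.0, 6.0, 5.0
total_baseline = 20.0

rel_beta = relative_band_power(beta, total_baseline)  # 0.25
ratio = beta_over_theta_alpha(beta, theta, alpha)     # 0.5
```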


Subjects
Electroencephalography, Machine Learning, Virtual Reality, Humans, Male, Female, Electroencephalography/methods, Adult, Young Adult, Psychophysiology/methods, Galvanic Skin Response/physiology
8.
Heliyon ; 10(17): e37179, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39296250

ABSTRACT

Background: Ischemic stroke is a common and serious disease with economic and healthcare burdens. Predicting the unfavorable discharge outcome of patients is essential for formulating appropriate treatment strategies and providing personalized care. Therefore, this study aims to establish and validate a prediction model based on machine learning methods to accurately predict the discharge outcome of ischemic stroke patients, providing valuable information for clinical decision making. Methods: The derivation data consisted of 964 patients from Guangdong Provincial People's Hospital and was used for training and internal validation. A favourable discharge outcome was defined as a National Institutes of Health Stroke Scale score of ≤1 or a decrease of ≥8 points compared to the admission score. A predictive model was created based on 88 medical characteristics gathered during the patient's initial admission, using nine machine learning algorithms. The models' predictive performance was compared using various evaluation metrics. The final model's feature importance was ranked and explained using the Shapley additive explanation method. Findings: The random forest model demonstrated the greatest discriminative ability among the nine machine learning models. We created an interpretable random forest model by ranking and reducing the features based on their importance, which included eight features. In internal validation, the final model accurately predicted the discharge outcomes of ischemic stroke with an AUC value of 0.903 and has been translated into a convenient tool to facilitate its utility in clinical settings. Conclusions: Our explainable ML model was not only successfully developed to accurately predict discharge outcomes in patients with ischemic stroke but also mitigated the concern of the "black-box" issue with an indirect interpretation of the ML technique.
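The outcome label defined in the Methods can be written as a one-line rule. The sketch below encodes it directly; the function and argument names are hypothetical.

```python
def is_favourable_discharge(admission_nihss, discharge_nihss):
    """Favourable outcome: discharge NIHSS <= 1, or a drop of >= 8 points from admission."""
    return discharge_nihss <= 1 or (admission_nihss - discharge_nihss) >= 8


# Illustrative cases (hypothetical patients).
mild_recovered = is_favourable_discharge(5, 1)    # True: discharge score <= 1
large_drop = is_favourable_discharge(15, 7)       # True: decrease of 8 points
partial_drop = is_favourable_discharge(10, 4)     # False: decrease of only 6 points
```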

9.
Cancer Manag Res ; 16: 1253-1265, 2024.
Article in English | MEDLINE | ID: mdl-39297055

ABSTRACT

Purpose: To construct a free and accurate breast cancer mortality prediction tool by incorporating lifestyle factors, aiming to assist healthcare professionals in making informed decisions. Patients and Methods: In this retrospective study, we utilized a ten-year follow-up dataset of female breast cancer patients from a major Chinese hospital and included 1,390 female breast cancer patients with a 7% (96) mortality rate. We employed six machine learning algorithms (ridge regression, k-nearest neighbors, neural network, random forest, support vector machine, and extreme gradient boosting) to construct a mortality prediction model for breast cancer. Results: This model incorporated significant lifestyle factors, such as postsurgery sexual activity, use of totally implantable venous access ports, and prosthetic breast wear, which were identified as independent protective factors. Meanwhile, ten-fold cross-validation demonstrated the superiority of the random forest model (average AUC = 0.918; 1-year AUC = 0.914, 2-year AUC = 0.867, 3-year AUC = 0.883). External validation further supported the model's robustness (average AUC = 0.782; 1-year AUC = 0.809, 2-year AUC = 0.785, 3-year AUC = 0.893). Additionally, a free and user-friendly web tool was developed using the Shiny framework to facilitate easy access to the model. Conclusion: Our breast cancer mortality prediction model is free and accurate, providing healthcare professionals with valuable information to support their clinical decisions and potentially promoting healthier lifestyles for breast cancer patients.
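Ten-fold cross-validation, as used in the Results, partitions the patients into ten disjoint folds and uses each fold once as the held-out set. The following minimal sketch shows the index bookkeeping for 1,390 patients; it omits shuffling and stratification, which the study may well have used given the 7% event rate, so it should be read as an illustration rather than the authors' procedure.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 1,390 patients split into 10 folds of 139 patients each.
splits = list(k_fold_indices(1390, k=10))
fold_sizes = [len(test) for _, test in splits]
```

The reported average AUC would then be the mean of the ten AUC values, one per held-out fold.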

10.
Eur J Surg Oncol ; 50(12): 108703, 2024 Sep 21.
Article in English | MEDLINE | ID: mdl-39326305

ABSTRACT

BACKGROUND: Unplanned reoperation (URO) after surgery adversely affects the quality of life and prognosis of patients undergoing anterior resection for rectal cancer. This study aims to meet the urgent need for reliable predictive tools by developing an optimized machine learning model to estimate the risk of URO following anterior resection in rectal cancer patients. METHODS: This retrospective study collected multidimensional data from patients who underwent anterior resection for rectal cancer at Tongji Hospital of Huazhong University of Science and Technology from January 2012 to December 2022. Feature selection was conducted using both least absolute shrinkage and selection operator (LASSO) regression and the Boruta algorithm. Multiple machine learning models were developed, with parameter optimization via grid search and cross-validation. Performance metrics included accuracy, specificity, sensitivity, and area under curve (AUC). The optimal model was interpreted using SHapley Additive exPlanations (SHAP), and an online platform was created for real-time risk prediction. RESULTS: A total of 2384 patients who underwent anterior resection for rectal cancer were included in this study. Following rigorous selection, 14 variables were identified for constructing the machine learning model. The optimized model demonstrated high predictive accuracy, with the random forest (RF) model achieving the best overall performance. The model achieved an AUC of 0.889 and an accuracy of 0.842 on the test dataset. SHAP analysis revealed that the tumor location, previous abdominal surgery, and operative time were the most significant factors influencing the risk of URO. CONCLUSION: This study developed an optimized machine learning-based online predictive system to assess the risk of URO after anterior resection in rectal cancer patients. 
Accessible at https://yangsu2023.shinyapps.io/UROrisk/, this system improves prediction accuracy and offers real-time risk assessment, providing a valuable tool that may support clinical decision-making and potentially improve the prognosis of rectal cancer patients.
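The AUC reported for the model equals the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison on hypothetical labels and scores; it is quadratic in the sample size and meant only to make the definition concrete.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = unplanned reoperation) and predicted risks.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
auc = roc_auc(labels, scores)  # 7.5 correct pairs out of 9
```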

11.
Methods ; 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39326482

ABSTRACT

In recent years, multi-omics clustering has become a powerful tool in cancer research, offering a comprehensive perspective on the diverse molecular characteristics inherent to various cancer subtypes. However, most existing multi-omics clustering methods directly integrate heterogeneous features from different omics, which may struggle to deal with the noise or redundancy of multi-omics data and lead to poor clustering results. Therefore, we propose a novel multi-omics clustering method to extract interpretable and discriminative features from various omics before data integration. The clinical information is used to supervise the process of feature extraction based on SHAP (SHapley Additive exPlanation) values. Singular value decomposition (SVD) is then applied to integrate the extracted features of different omics by constructing a latent subspace. Finally, we utilize shared nearest neighbor-based spectral clustering on the latent representation to obtain the clustering result. The proposed method is evaluated on several cancer datasets across three levels of omics, in comparison to several state-of-the-art multi-omics clustering methods. The comparison results demonstrate the superior performance of the proposed method in multi-omics data analysis for cancer subtyping. Additionally, experiments reveal the efficacy of utilizing clinical information based on SHAP values for feature extraction, enhancing the performance of clustering analyses. Moreover, enrichment analysis of the identified gene signatures in different subtypes is also performed to further demonstrate the effectiveness of the proposed method. Availability: The proposed method is freely accessible at https://github.com/Tianyi-Shi-Tsukuba/Multi-omics-clustering-based-on-SHAP. Data will be made available on request.

12.
Biomimetics (Basel) ; 9(9), 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39329567

ABSTRACT

The performance of ultra-high-performance concrete (UHPC) allows for the design and creation of thinner elements with superior overall durability. The compressive strength of UHPC is a value that can be reached after a certain period of time through a series of tests and cures. However, this value can be estimated by machine-learning methods. In this study, a multilayer perceptron (MLP) and a Stacking Regressor, an ensemble machine-learning model, are used to predict the compressive strength of high-performance concrete. Then, the ML models' performance is explained with a feature importance analysis and Shapley additive explanations (SHAP), and the developed models are interpreted. The effect of using different random splits for the training and test sets has been investigated. It was observed that the stacking regressor, which combined the outputs of Extreme Gradient Boosting (XGBoost), Category Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), and Extra Trees regressors using random forest as the final estimator, performed significantly better than the MLP regressor. The stacking regressor predicted the compressive strength with an average R2 score of 0.971 on the test set, whereas the average R2 score of the MLP model was 0.909. The results of the SHAP analysis showed that the age of concrete and the amounts of silica fume, fiber, superplasticizer, cement, aggregate, and water have the greatest impact on the model predictions.
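The R2 score used to compare the two regressors is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it on hypothetical compressive strengths in MPa (not data from the study).

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. predicted compressive strengths (MPa).
measured_strength = [100.0, 120.0, 140.0, 160.0]
predicted_strength = [110.0, 120.0, 130.0, 160.0]
r2 = r2_score(measured_strength, predicted_strength)  # 1 - 200/2000 = 0.9
```

A score of 0.971 therefore means that the stacking regressor's squared prediction error is about 3% of the variance of the measured strengths on the test set.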

13.
Heliyon ; 10(16): e35871, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39220969

ABSTRACT

Slope instability can cause catastrophic consequences, so slope stability analysis has been a key topic in the field of geotechnical engineering. Traditional analysis methods have shortcomings such as high operational difficulty and long run times; for this reason, many researchers have carried out slope stability analysis based on AI. However, current studies have only judged the importance of each factor and have not specifically quantified the correlation between the factors and slope stability. To address this, this paper carried out a sensitivity analysis based on XGBoost and SHAP. The sensitivity analysis results of SHAP were also validated using GeoStudio software. The selected influence factors included slope height (H), slope angle (β), unit weight (γ), cohesion (c), angle of internal friction (φ), and pore water pressure coefficient (r_u). The results showed that c and γ were the most and least important influential parameters, respectively. GeoStudio simulation results showed a negative correlation between γ, β, H, r_u and slope stability, and a positive correlation between c, φ and slope stability. However, for real data, SHAP misjudged the correlation between γ and slope stability, because current AI lacks common-sense knowledge, which leaves SHAP unable to effectively explain the real mechanism of slope instability. This paper overcame this challenge with an a priori data-driven approach. The method provided a more reliable and accurate interpretation of the results than real samples, especially with limited or low-quality data. In addition, the results of this method showed that the critical values of c, φ, β, H, and r_u in slope destabilization are 18 kPa, 28°, 32°, 30 m, and 0.28, respectively. These results were closer to the GeoStudio simulations than those from real samples.

14.
Environ Monit Assess ; 196(10): 876, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39222181

ABSTRACT

Mine water surge is one of the main safety risks in coal mines. This research offers a novel mine water source identification model (BO-CatBoost) to help avoid and control sudden mine water catastrophes by properly identifying the sources of mine water. First, the classification model is trained and built using the Categorical Boosting (CatBoost) algorithm. The Gaussian process Bayesian optimization (BO) algorithm is used to optimize parameters, and the optimal parameter combination is integrated into the CatBoost algorithm to build the BO-CatBoost mine water source identification model, which further improves the accuracy of mine water source identification. The model was also applied to the Pingdingshan mine to verify its practicality. Then, 29 groups of unknown water sources in Pingdingshan were selected as validation samples for the model and compared with the conventional CatBoost, Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost) models. The comparison results demonstrate that the accuracy of the LightGBM, XGBoost, CatBoost, and BO-CatBoost models reaches 69%, 79.3%, 79.3%, and 100%, respectively, and the RMSE is 0.947, 0.643, 0.719, and 0.0, respectively. The comprehensive analysis shows that, when it comes to mine water source detection, the BO-CatBoost model performs noticeably better than the other models in terms of discriminative accuracy and generalization capacity. Lastly, the multi-output prediction and decision-making process of the BO-CatBoost water source identification model is visualized by the interpretability analysis performed with the SHAP approach. The research demonstrates that the BO-CatBoost model can more precisely and impartially identify mine water sources, offering fresh concepts for mine water source detection.
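With only 29 validation samples, each accuracy figure corresponds to a whole number of correct classifications. The reported values are consistent with 20, 23, 23, and 29 correct samples; these counts are inferred from the percentages and are not stated in the abstract. A short arithmetic check:

```python
N_VALIDATION = 29  # validation samples reported in the abstract

# Inferred (not reported) numbers of correctly identified samples per model.
inferred_correct = {"LightGBM": 20, "XGBoost": 23, "CatBoost": 23, "BO-CatBoost": 29}

accuracy_percent = {
    model: round(100 * correct / N_VALIDATION, 1)
    for model, correct in inferred_correct.items()
}
# {'LightGBM': 69.0, 'XGBoost': 79.3, 'CatBoost': 79.3, 'BO-CatBoost': 100.0}
```

One misclassified sample shifts accuracy by about 3.4 percentage points at this sample size, which is worth keeping in mind when comparing the models.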


Subjects
Bayes Theorem, Coal Mining, Environmental Monitoring, Environmental Monitoring/methods, Algorithms, Mining, Water Supply, Theoretical Models
15.
Cancer Sci ; 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39223585

ABSTRACT

This study utilized data from 140,294 prostate cancer cases from the Surveillance, Epidemiology, and End Results (SEER) database. Here, 10 different machine learning algorithms were applied to predict treatment options for patients with prostate cancer, differentiating between surgical and non-surgical treatments. The performances of the algorithms were measured using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. The Shapley Additive Explanations (SHAP) method was employed to investigate the key factors influencing the prediction process. Survival analysis methods were used to compare the survival rates of different treatment options. The CatBoost model yielded the best results (AUC = 0.939, sensitivity = 0.877, accuracy = 0.877). SHAP interpreters revealed that the T stage, cancer stage, age, cores positive percentage, prostate-specific antigen, and Gleason score were the most critical factors in predicting treatment options. The study found that surgery significantly improved survival rates, with patients undergoing surgery experiencing a 20.36% increase in 10-year survival rates compared with those receiving non-surgical treatments. Among surgical options, radical prostatectomy had the highest 10-year survival rate at 89.2%. This study successfully developed a predictive model to guide treatment decisions for prostate cancer. Moreover, the model enhanced the transparency of the decision-making process, providing clinicians with a reference for formulating personalized treatment plans.

16.
J Food Sci ; 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39218808

ABSTRACT

Brown rice over-milling causes high economic and nutrient loss. Detecting and predicting the rice degree of milling (DOM) remains a challenge for moderate processing. In this study, a self-established grain image acquisition platform was built. Degree of bran layer remaining (DOR) datasets were established through image capture and processing (extraction of grain color, texture, and shape features). The mapping relationship between DOR and DOM was analyzed in depth. Typical machine learning and deep learning models for predicting rice grain DOR were established. The results indicate that an optimized CatBoost model can be established with cross-validation and grid search, with the best accuracy improving from 84.28% to 91.24% and with a precision of 91.31%, recall of 90.89%, and F1-score of 91.07%. Shapley additive explanations analysis indicates that color, texture, and shape features all affect CatBoost prediction accuracy, with feature importance ranked color > texture > shape. The YCbCr-Cb_ske and GLCM-Contrast features make the most significant contributions to rice milling quality prediction. The feature importance provides theoretical and practical guidance for grain DOM prediction models. PRACTICAL APPLICATION: Rice milling degree prediction and detection are valuable for the rice milling process in practical application. In this paper, image processing and machine learning methods provide an automated, nondestructive, and cost-effective way to predict the quality of rice. The study may serve as a valuable reference for improving rice milling methods, retaining rice nutrition, and reducing broken rice yield.

17.
JMIR Public Health Surveill ; 10: e48705, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39264706

ABSTRACT

BACKGROUND: Understanding the factors contributing to mental well-being in youth is a public health priority. Self-reported enthusiasm for the future may be a useful indicator of well-being and has been shown to forecast social and educational success. Typically, cross-domain measures of ecological and health-related factors with relevance to public policy and programming are analyzed either in isolation or in targeted models assessing bivariate interactions. Here, we capitalize on a large provincial data set and machine learning to identify the sociodemographic, experiential, behavioral, and other health-related factors most strongly associated with levels of subjective enthusiasm for the future in a large sample of elementary and secondary school students. OBJECTIVE: The aim of this study was to identify the sociodemographic, experiential, behavioral, and other health-related factors associated with enthusiasm for the future in elementary and secondary school students using machine learning. METHODS: We analyzed data from 13,661 participants in the 2019 Ontario Student Drug Use and Health Survey (OSDUHS) (grades 7-12) with complete data for our primary outcome: self-reported levels of enthusiasm for the future. We used 50 variables as model predictors, including demographics, perception of school experience (i.e., school connectedness and academic performance), physical activity and quantity of sleep, substance use, and physical and mental health indicators. Models were built using a nonlinear decision tree-based machine learning algorithm called extreme gradient boosting to classify students as indicating either high or low levels of enthusiasm. Shapley additive explanations (SHAP) values were used to interpret the generated models, providing a ranking of feature importance and revealing any nonlinear or interactive effects of the input variables. 
RESULTS: The top 3 contributors to higher self-rated enthusiasm for the future were higher self-rated physical health (SHAP value=0.62), feeling that one is able to discuss problems or feelings with their parents (SHAP value=0.49), and school belonging (SHAP value=0.32). Additionally, subjective social status at school was a top feature and showed nonlinear effects, with benefits to predicted enthusiasm present in the mid-to-high range of values. CONCLUSIONS: Using machine learning, we identified key factors related to self-reported enthusiasm for the future in a large sample of young students: perceived physical health, subjective school social status and connectedness, and quality of relationship with parents. A focus on perceptions of physical health and school connectedness should be considered central to improving the well-being of youth at the population level.
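To make the idea of SHAP values as additive feature contributions more concrete, here is a minimal sketch that computes exact Shapley values by enumerating feature subsets. It is not the procedure used in the study, which applied SHAP to an extreme gradient boosting classifier; the toy model, the single all-zero baseline, and the names `shapley_values` and `toy_value` are assumptions chosen so that the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float.

    Cost grows exponentially with n_features, so this is only practical
    for very small toy problems.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [p for p in range(n_features) if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_value(present):
    """Hypothetical model f(x) = 2*x0 + x1 + x0*x2 evaluated at x = (1, 1, 1).

    Features not in `present` are replaced by a baseline value of 0.
    """
    x = [1.0 if k in present else 0.0 for k in range(3)]
    return 2 * x[0] + x[1] + x[0] * x[2]


phi = shapley_values(toy_value, 3)
print(phi)  # contributions of x0, x1, x2

# The contributions add up to f(x) minus f(baseline).
print(sum(phi), toy_value(frozenset({0, 1, 2})) - toy_value(frozenset()))
```

In the toy model the interaction term `x0*x2` is split evenly between the two features involved, which is one way interactive effects surface in per-feature attributions.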


Subjects
Machine Learning , Students , Humans , Adolescent , Male , Cross-Sectional Studies , Female , Students/psychology , Students/statistics & numerical data , Child , Ontario , Schools , Self Report
18.
J Inflamm Res ; 17: 5901-5913, 2024.
Article in English | MEDLINE | ID: mdl-39247840

ABSTRACT

Background: Machine learning (ML) is increasingly used in medical predictive modeling, but no studies have applied ML to predict prognosis in Guillain-Barré syndrome (GBS). Materials and Methods: The medical records of 223 patients with GBS were analyzed to construct predictive models of patient prognosis. Least Absolute Shrinkage and Selection Operator (LASSO) was used to filter the variables. Decision Trees (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), k-nearest Neighbour (KNN), Naive Bayes (NB), Neural Network (NN), Light Gradient Boosting Machine (LGBM) and Logistic Regression (LR) were used to construct predictive models. Clinical data from 55 GBS patients were used to validate the model. SHapley additive explanation (SHAP) analysis was used to explain the model. Single sample gene set enrichment analysis (ssGSEA) was used for immune cell infiltration analysis. Results: The AUCs (areas under the curve) of the 8 ML algorithms, DT, RF, XGBoost, KNN, NB, NN, LGBM and LR, were 0.75, 0.896, 0.874, 0.666, 0.742, 0.765, 0.869 and 0.744, respectively. The accuracy of XGBoost (0.852) was the highest, followed by LGBM (0.803) and RF (0.758), with F1 index of 0.832, 0.794, and 0.667, respectively. The results of the validation set data analysis showed AUCs of 0.839, 0.919, and 0.733 for RF, XGBoost, and LGBM, respectively. SHAP analysis showed that the SHAP values of blood neutrophil/lymphocyte ratio (NLR), age, mechanical ventilation, hyporeflexia and abnormal glossopharyngeal vagus nerve were 0.821, 0.645, 0.517, 0.401 and 0.109, respectively. Conclusion: The combination of NLR, age, mechanical ventilation, hyporeflexia and abnormal glossopharyngeal vagus nerve has good predictive value for short-term prognosis in patients with GBS.
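The models above are compared mainly by AUC. As a reminder of what that number represents, the following sketch computes AUC directly from its probabilistic definition: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the example labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons.

    labels: iterable of 0/1 outcomes (1 = positive class).
    scores: predicted scores or probabilities, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and model scores for four patients.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; rank-based formulas give the same value more efficiently on larger data.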

19.
Sci Total Environ ; 951: 175802, 2024 Nov 15.
Article in English | MEDLINE | ID: mdl-39197776

ABSTRACT

Soil salinization and heavy metal pollution in the Yellow River Delta region have elicited increasing concern. Therefore, revealing the underlying mechanism of the impact of soil salinity on potentially toxic elements (PTEs) is crucial for environmental protection and the rational utilization of resources in this area. In this study, we employed CatBoost-SHAP and multiscale geographically weighted regression (MGWR) models to comprehensively investigate the spatial effects of soil electrical conductivity (EC1:5) on PTEs. Additionally, we employed a space-for-time substitution strategy to investigate how increasing soil salinity, represented by EC1:5, K+, Na+, Ca2+, and Mg2+, affects the bioavailability of PTEs over time. The primary findings are as follows: (1) for most PTEs, the influence of soil EC1:5 on the bioavailable forms of these elements surpassed its impact on their total concentrations. (2) The results of the MGWR model indicated that exchangeable Ca (aCa) in the soils of the eastern coastal areas markedly increased the bioavailable Cd (aCd), bioavailable Cu (aCu), and bioavailable Zn (aZn). (3) When the soil EC1:5 ranged between 2 and 6 dS/m, exchangeable Na (aNa) primarily competed for the adsorption sites of bioavailable Pb (aPb). However, as the soil EC1:5 increased to 6-10 dS/m, exchangeable Mg (aMg) and aCa became the primary competing ions, with aMg playing a more significant role than aCa. These findings provide valuable theoretical insights and practical guidance for saline-alkali soil improvement and PTE pollution control in the Yellow River Delta region, thereby providing a foundation for sustainable environmental management and resource utilization.
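The spatial analysis above rests on geographically weighted regression, in which a separate regression is fitted at each location using distance-based weights. The sketch below shows that core idea for a single predictor and a single Gaussian bandwidth. It is a deliberate simplification: MGWR lets each covariate have its own bandwidth, which is not implemented here. The function name `gwr_local_fits`, the coordinates, and the data values are all hypothetical.

```python
from math import exp


def gwr_local_fits(coords, x, y, bandwidth):
    """Fit y = a + b*x at every site by distance-weighted least squares.

    coords: list of (u, v) site coordinates.
    x, y: predictor and response values observed at those sites.
    bandwidth: Gaussian kernel bandwidth, in the same units as coords.
    Returns a list of (intercept, slope) pairs, one per site.
    """
    fits = []
    for u0, v0 in coords:
        # Gaussian kernel: nearby sites get weights near 1, distant sites near 0.
        w = [exp(-0.5 * ((u - u0) ** 2 + (v - v0) ** 2) / bandwidth ** 2)
             for u, v in coords]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        if sxx == 0:
            raise ValueError("predictor has no weighted variance near this site")
        slope = sxy / sxx
        fits.append((my - slope * mx, slope))
    return fits


# Hypothetical data: the slope is 1 at the western sites and 3 at the eastern ones.
coords = [(0, 0), (0, 1), (1, 0), (100, 0), (100, 1), (101, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]
for site, (a, b) in zip(coords, gwr_local_fits(coords, x, y, bandwidth=2.0)):
    print(site, round(a, 3), round(b, 3))
```

A global regression on these data would report a single compromise slope, whereas the local fits recover the two regimes. That contrast is the kind of spatially varying effect a finding such as the east-coast influence of exchangeable Ca depends on.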

20.
Sci Total Environ ; 951: 175484, 2024 Nov 15.
Article in English | MEDLINE | ID: mdl-39142415

ABSTRACT

The Jinsha River Basin (JRB) contributes a significant amount of sediment to the Yangtze River; however, an imbalance exists between runoff and sediment. The underlying mechanisms and primary factors driving this imbalance remain unclear. In this study, the Shapley Additive Explanation (SHAP) and Geographical Detector Model (GDM) were employed to quantify the importance of the driving factors for water yield (WYLD) and sediment yield (SYLD) using the Soil and Water Assessment Tool (SWAT) model in the JRB. The results indicated that the SWAT model performed well in simulating runoff and sediment, with R2 > 0.61 and NSE > 0.5. Based on the simulated data, SYLD exhibited strong spatiotemporal linkages with WYLD. Temporally, both sediment and runoff showed decreasing trends, with the sediment decrease being more pronounced. Spatially, WYLD and SYLD displayed similar distribution patterns, with low values in the southwest and high values in the northeast. By quantifying the driving factors, we found that climatic factors, including precipitation and potential evapotranspiration, were the main influencing factors for WYLD and SYLD across the entire region, though their contributions to the two variables differed. For WYLD, climatic factors accounted for 70 % of the total influencing factors, whereas their contribution to SYLD was 50 %. Furthermore, soil type and land-use type played significant roles in the SYLD, with importance values of 16 % and 12 %, respectively. Under the influence of surface conditions, the proportion of SYLD in the JRB to the total SYLD in the Yangtze River Basin was greater than that of WYLD. The findings of this study provide scientific evidence and technical support for local environmental impact assessments and the formulation of soil and water conservation plans.
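The SWAT performance figures above are stated in terms of R2 and the Nash-Sutcliffe efficiency (NSE). The sketch below shows how these two scores are commonly computed from paired observed and simulated series, taking R2 as the squared Pearson correlation, which is an assumption since the abstract does not define it. The function names and the sample values are hypothetical and are not data from the study.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1 is a perfect fit; 0 means the model is no better than the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - residual / spread


def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(observed)
    mo = sum(observed) / n
    ms = sum(simulated) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_s = sum((s - ms) ** 2 for s in simulated)
    if var_o == 0 or var_s == 0:
        raise ValueError("both series need non-zero variance")
    return cov ** 2 / (var_o * var_s)


# Hypothetical monthly runoff: the simulation tracks the pattern but is biased high.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
print(r_squared(obs, sim), nash_sutcliffe(obs, sim))
```

The example illustrates why the two scores are usually reported together: a constant bias leaves the correlation-based R2 untouched while lowering NSE, so a model must match both pattern and magnitude to score well on each.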
