Results 1 - 20 of 100
1.
Sci Total Environ ; 948: 174462, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38992374

ABSTRACT

This comprehensive study reveals the vast global potential of microalgae as a sustainable bioenergy source. It focuses on the use of marginal lands and applies advanced machine learning techniques to predict biomass productivity. The study identifies approximately 7.37 million square kilometers of marginal land suitable for microalgae cultivation, particularly in equatorial and low-latitude regions, which shows the extensive potential of these underutilized areas for microalgae bioenergy development. Using marginal land avoids competition with food production and conserves freshwater supplies. Machine learning algorithms were trained on datasets from global microalgae cultivation experiments spanning 1994 to 2017, and the study integrates key environmental variables to produce a detailed projection of potential yields across a variety of landscapes. The analysis further delineates the bioenergy and carbon sequestration potential of two effective cultivation methods, photobioreactors (PBRs) and open ponds. PBRs show the highest productivity, with a global average daily biomass productivity of 142.81 mg L⁻¹ d⁻¹, followed by open ponds at 122.57 mg L⁻¹ d⁻¹. Projections based on optimal PBR conditions suggest an annual yield of 99.54 gigatons of microalgae biomass. This yield could be converted into 64.70 gigatons of biodiesel, equivalent to 58.68 gigatons of conventional diesel, while sequestering 182.16 gigatons of CO₂. That is approximately 4.5 times the global CO₂ emissions projected for 2023. Australia leads in potential microalgae biomass production, with a projected annual output of 16.19 gigatons. It is followed by Kazakhstan, Sudan, Brazil, the United States, and China, which reflects the diverse global potential for microalgae bioenergy across varying ecological and geographical landscapes. 
Through this rigorous investigation, the study emphasizes the strategic importance of microalgae cultivation in achieving sustainable energy solutions and mitigating climate change, while also acknowledging the scalability challenges and the necessity for significant economic and energy investments.
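The abstract does not state the conversion factors behind its headline totals, but they can be back-calculated from the reported figures. The sketch below does only that arithmetic. The dictionary keys and the function name `implied_ratios` are our own, and the resulting ratios describe these reported numbers, not the authors' actual conversion model.

```python
# Annual totals reported in the abstract, in gigatons (Gt).
REPORTED = {
    "biomass": 99.54,
    "biodiesel": 64.70,
    "diesel_equivalent": 58.68,
    "co2_sequestered": 182.16,
}


def implied_ratios(totals):
    """Return the conversion factors implied by the reported totals."""
    return {
        # Gt biodiesel obtained per Gt of biomass
        "biodiesel_per_biomass": totals["biodiesel"] / totals["biomass"],
        # Gt conventional diesel that one Gt of biodiesel is taken to replace
        "diesel_per_biodiesel": totals["diesel_equivalent"] / totals["biodiesel"],
        # Gt CO2 fixed per Gt of biomass
        "co2_per_biomass": totals["co2_sequestered"] / totals["biomass"],
        # 2023 global emissions implied by the "about 4.5 times" comparison
        "implied_2023_emissions": totals["co2_sequestered"] / 4.5,
    }


if __name__ == "__main__":
    for name, value in implied_ratios(REPORTED).items():
        print(f"{name}: {value:.2f}")
```

Run as written, the script prints about 0.65 Gt of biodiesel and 1.83 Gt of CO₂ per Gt of biomass, and 0.91 Gt of diesel-equivalent per Gt of biodiesel. It also shows an implied 2023 emissions baseline of about 40.5 Gt CO₂. The 1.83 figure is close to the stoichiometric estimate often cited for microalgal biomass. That suggests, but does not confirm, that a fixed factor of this kind was used.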

2.
Ying Yong Sheng Tai Xue Bao ; 35(5): 1321-1330, 2024 May.
Article in Chinese | MEDLINE | ID: mdl-38886431

ABSTRACT

Rapid acquisition of soil moisture content (SMC) and soil organic matter (SOM) data is crucial for improving and using saline-alkali farmland soil. We took field measurements of hyperspectral reflectance and soil properties of farmland soil in the Hetao Plain. The original spectral reflectance (Ref) was transformed with the standard normal variate (SNV), and the competitive adaptive reweighted sampling (CARS) algorithm was then used to screen sensitive bands. Three input-variable strategies were modeled: Ⅰ, Ref; Ⅱ, Ref-SNV; and Ⅲ, Ref-SNV plus soil covariates (SC) and a digital elevation model (DEM). We constructed SMC and SOM estimation models based on random forest (RF) and the light gradient boosting machine (LightGBM), and then verified and compared their accuracy. After CARS screening, the sensitive bands for SMC and SOM were reduced to less than 3.3% of all bands. This effectively optimized band selection and reduced redundant spectral information. The RF model estimated SMC and SOM more accurately than the LightGBM model, and input strategy Ⅲ outperformed Ⅱ and Ⅰ. Introducing auxiliary variables effectively improved the models' estimation ability. For validation of the SMC model based on strategy Ⅲ-RF, the coefficient of determination (Rp²), root mean square error (RMSE), and relative analysis error (RPD) were 0.63, 3.16, and 2.01, respectively. For the SOM model based on strategy Ⅲ-RF, they were 0.93, 1.15, and 3.52, respectively. The strategy Ⅲ-RF model is therefore an effective method for estimating SMC and SOM. Our results provide a new method for rapidly estimating soil moisture and organic matter content in saline-alkali farmland.
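Two of the steps named above are simple enough to sketch: the SNV transform applied before CARS band screening, and the validation statistics used to compare models. The snippet below is a minimal NumPy version under one stated assumption. It takes RPD to be the standard deviation of the reference values divided by the RMSE, its usual definition in soil spectroscopy, because the abstract does not give the formula. CARS, RF, and LightGBM themselves are not reproduced.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) independently."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def validation_metrics(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for a validation set.

    Assumes RPD = sample standard deviation of the reference values / RMSE.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    rpd = float(np.std(y_true, ddof=1)) / rmse
    return r2, rmse, rpd


if __name__ == "__main__":
    # Two toy reflectance spectra (rows) over four bands.
    reflectance = [[0.21, 0.25, 0.30, 0.28], [0.40, 0.44, 0.52, 0.49]]
    print(snv(reflectance).round(3))
    print(validation_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

SNV works row by row, so each spectrum is normalized using only its own mean and spread. This removes scatter-related offsets that differ between samples.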


Subject(s)
Algorithms , Organic Chemicals , Soil , Water , Soil/chemistry , Organic Chemicals/analysis , Water/analysis , Crops, Agricultural/growth & development , Crops, Agricultural/chemistry , Alkalies/analysis , Alkalies/chemistry , China , Ecosystem
3.
Food Chem ; 456: 140062, 2024 Oct 30.
Article in English | MEDLINE | ID: mdl-38876073

ABSTRACT

Differences in moisture and protein content impact both nutritional value and processing efficiency of corn kernels. Near-infrared (NIR) spectroscopy can be used to estimate kernel composition, but models trained on a few environments may underestimate error rates and bias. We assembled corn samples from diverse international environments and used NIR with chemometrics and partial least squares regression (PLSR) to determine moisture and protein. The potential of five feature selection methods to improve prediction accuracy was assessed by extracting sensitive wavelengths. Gradient boosting machines (GBMs), particularly CatBoost and LightGBM, were found to effectively select crucial wavelengths for moisture (1409, 1900, 1908, 1932, 1953, 2174 nm) and protein (887, 1212, 1705, 1891, 2097, 2456 nm). SHAP plots highlighted significant wavelength contributions to model prediction. These results illustrate GBMs' effectiveness in feature engineering for agricultural and food sector applications, including developing multi-country global calibration models for moisture and protein in corn kernels.
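Once a gradient boosting model has been fitted to full NIR spectra, selecting crucial wavelengths amounts to ranking the model's per-feature importances. The top few columns are then kept for the downstream PLSR. A dependency-free sketch follows. The importance values in the usage example are hypothetical placeholders; in practice they might come from the `feature_importances_` attribute of a LightGBM scikit-learn-style estimator. The PLSR step is not shown.

```python
def top_k_wavelengths(wavelengths, importances, k):
    """Return the k wavelengths with the largest importances, sorted by wavelength."""
    if len(wavelengths) != len(importances):
        raise ValueError("wavelengths and importances must have the same length")
    ranked = sorted(zip(importances, wavelengths), reverse=True)
    return sorted(w for _, w in ranked[:k])


def subset_spectra(spectra, wavelengths, selected):
    """Keep only the columns of each spectrum that correspond to the selected wavelengths."""
    column = {w: i for i, w in enumerate(wavelengths)}
    cols = [column[w] for w in selected]
    return [[row[c] for c in cols] for row in spectra]


if __name__ == "__main__":
    # Hypothetical importances for five bands (nm); real values would come from a fitted GBM.
    wavelengths = [1400, 1409, 1900, 1908, 2000]
    importances = [0.01, 0.30, 0.25, 0.20, 0.05]
    chosen = top_k_wavelengths(wavelengths, importances, k=3)
    print(chosen)  # [1409, 1900, 1908]
    spectra = [[0.51, 0.62, 0.70, 0.69, 0.55]]
    print(subset_spectra(spectra, wavelengths, chosen))  # [[0.62, 0.7, 0.69]]
```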


Subject(s)
Plant Proteins , Spectroscopy, Near-Infrared , Water , Zea mays , Zea mays/chemistry , Spectroscopy, Near-Infrared/methods , Plant Proteins/analysis , Plant Proteins/chemistry , Least-Squares Analysis , Water/chemistry , Water/analysis , Seeds/chemistry
4.
Am J Transl Res ; 16(5): 1740-1748, 2024.
Article in English | MEDLINE | ID: mdl-38883341

ABSTRACT

OBJECTIVE: To identify factors influencing recurrence after percutaneous transhepatic choledochoscopic lithotripsy (PTCSL) and to develop a predictive model. METHODS: We retrospectively analyzed clinical data from 354 patients with intrahepatic and extrahepatic bile duct stones treated with PTCSL at Qinzhou First People's Hospital between February 2018 and January 2020. Patients were followed for three years and categorized into non-recurrence and recurrence groups based on postoperative outcome. Univariate analysis identified possible predictors of stone recurrence. The data were split into a training set (70%) and a test set (30%), and a gradient boosting machine (GBM) model was built. Its predictive performance was assessed using the receiver operating characteristic (ROC) curve and calibration curve, and compared with that of a logistic regression model. RESULTS: Six factors were identified as significant predictors of recurrence: age, diabetes, total bilirubin, biliary stricture, number of stones, and stone diameter. The GBM model, developed based on these factors, showed high predictive accuracy. The area under the ROC curve (AUC) was 0.763 (95% CI: 0.695-0.830) for the training set and 0.709 (95% CI: 0.596-0.822) for the test set. Optimal cutoff values were 0.286 and 0.264, with sensitivities of 62.30% and 66.70% and specificities of 77.20% and 68.50%, respectively. Calibration curves indicated good agreement between predicted probabilities and observed recurrence rates in both sets. DeLong's test revealed no significant differences in predictive performance between the GBM and logistic regression models (training set: D = 0.003, P = 0.997 > 0.05; test set: D = 0.075, P = 0.940 > 0.05). CONCLUSION: Biliary stricture, stone diameter, diabetes, stone number, age, and total bilirubin significantly influence stone recurrence after PTCSL. The GBM model based on these factors demonstrates robust accuracy and discrimination. Both the GBM and logistic regression models effectively predicted stone recurrence after PTCSL.
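The abstract does not say how the optimal cutoffs were chosen. A common choice is the threshold that maximizes Youden's J (sensitivity + specificity - 1). The sketch below assumes that criterion. It computes the AUC through its rank interpretation: the probability that a randomly chosen recurrence case scores higher than a non-recurrence case. The scores in the usage lines are toy values, and DeLong's test is not implemented.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative), ties counting 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(labels, scores):
    """Return (cutoff, sensitivity, specificity) maximising sensitivity + specificity - 1.

    A case is called positive when its score is >= cutoff; the first best cutoff wins ties.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (t, sens, spec)
    return best


if __name__ == "__main__":
    labels = [0, 0, 1, 1]            # 1 = recurrence (toy data)
    scores = [0.1, 0.4, 0.35, 0.8]   # predicted recurrence probabilities (toy data)
    print(roc_auc(labels, scores))        # 0.75
    print(youden_cutoff(labels, scores))  # (0.35, 1.0, 0.5)
```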

5.
Phys Med Biol ; 69(11)2024 May 30.
Article in English | MEDLINE | ID: mdl-38749471

ABSTRACT

Accurate diagnosis and treatment assessment of liver fibrosis face significant challenges, including inherent limitations of current techniques such as sampling errors and inter-observer variability. To address this, our study introduces a novel machine learning (ML) framework that integrates the light gradient boosting machine (LightGBM) with multivariate imputation by chained equations (MICE) to enhance liver status assessment using biomechanical markers. The framework builds on our previously established multiscale mechanical characteristics of fibrotic and treated livers. It applies Gaussian Bayesian optimization at the post-imputation stage, which significantly improves classification performance. Our findings indicate a marked increase in the precision of liver fibrosis diagnosis and provide a novel, quantitative approach for assessing fibrosis treatment. This combination of multiscale biomechanical markers with advanced ML algorithms represents a transformative step in liver disease diagnostics and treatment evaluation, with potential implications for other areas of medical diagnostics.
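The imputation half of the framework can be illustrated with a deterministic chained-equations loop. The loop starts from column means, then repeatedly regresses each incomplete column on all the others and overwrites its missing entries with the predictions. This is a simplified single-imputation sketch in NumPy with linear models, which is an assumption on our part. Full MICE draws several stochastic imputations, and the authors' LightGBM classifier and Bayesian optimization step are not shown.

```python
import numpy as np


def chained_equation_impute(X, n_iter=10):
    """Deterministic chained-equations imputation with linear regressions.

    Missing entries are marked with NaN. Each column containing gaps is regressed on
    all other columns (plus an intercept) using its observed rows, and its gaps are
    replaced with the fitted values; the sweep is repeated n_iter times.
    """
    X = np.array(X, dtype=float)  # copy so the caller's data is not modified
    missing = np.isnan(X)
    # Initial guess: fill every gap with its column mean.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            gaps = missing[:, j]
            if not gaps.any():
                continue
            design = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(design[~gaps], X[~gaps, j], rcond=None)
            X[gaps, j] = design[gaps] @ coef
    return X


if __name__ == "__main__":
    nan = float("nan")
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, nan]]
    print(chained_equation_impute(data))  # last entry is imputed as approximately 8
```

scikit-learn's `IterativeImputer` implements a fuller version of this loop. It is still experimental and must be enabled via `sklearn.experimental.enable_iterative_imputer`.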


Subject(s)
Liver Cirrhosis , Machine Learning , Biomechanical Phenomena , Humans , Mechanical Phenomena , Bayes Theorem , Animals , Biomarkers/metabolism
6.
Sci Rep ; 14(1): 12539, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38822049

ABSTRACT

Mine water inrush is a serious threat to safe mine production, and identifying the source of inrush water quickly is essential for preventing and controlling water hazards. In this study, the aqueous chemical components Na⁺ + K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, and HCO₃⁻ of different aquifers in the Pingdingshan coalfield were selected as the characteristic values. Surface water, Quaternary pore water, Carboniferous limestone karst water, Permian sandstone water, and Cambrian limestone karst water were used as the labels. An intelligent water source discrimination model is proposed that combines data mining, classification models, and reinforcement learning. Because outliers in the samples may interfere with the model's recognition ability, the data distribution range was analyzed using box plots, and 20 groups of abnormal samples were excluded. The processed water chemistry data were divided into 80% learning samples and 20% test samples, and the learning samples were used to train a light gradient boosting machine (LightGBM). The tree-structured Parzen estimator (TPE) found optimal values for the main LightGBM parameters in a very short time. Substituting these hyperparameters back into the model improved its accuracy by 13.9%, demonstrating the effectiveness of the TPE algorithm. To further validate the model, TPE-LightGBM was compared with a Random Search-Multilayer Perceptron (RS-MLP) and a Genetic Algorithm-Support Vector Machine (GA-SVM). The accuracies of TPE-LightGBM, RS-MLP, and GA-SVM were 0.931, 0.759, and 0.724, respectively. Their generalization errors (RMSE) were 0.415, 1.05, and 1.313, respectively. The results show that TPE-LightGBM performs better in water source identification and is more resistant to overfitting. Comparing the information gain of each variable showed that Ca²⁺ contributes the most, so changes in Ca²⁺ concentration warrant close attention. With its high accuracy and generalization ability, TPE-LightGBM has good prospects for identifying the sources of sudden water inrush.
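Screening samples with box plots usually means discarding values outside the whiskers. The sketch below assumes the conventional Tukey fences, which sit 1.5 × IQR beyond the quartiles. It drops a sample if any of its ions falls outside the fences for that ion. The abstract does not state the exact rule, and the ion keys and concentrations in the example are hypothetical.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Lower and upper box-plot whisker limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_boxplot_outliers(samples, ions, k=1.5):
    """Keep only samples whose every listed ion lies inside that ion's Tukey fences."""
    fences = {ion: tukey_fences([s[ion] for s in samples], k) for ion in ions}
    return [
        s for s in samples
        if all(lo <= s[ion] <= hi for ion, (lo, hi) in fences.items())
    ]


if __name__ == "__main__":
    # Hypothetical concentrations (mg/L); the last Ca2+ value is an obvious outlier.
    ca = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    samples = [{"Ca2+": c, "Mg2+": 10.0} for c in ca]
    kept = drop_boxplot_outliers(samples, ["Ca2+", "Mg2+"])
    print(len(samples), "->", len(kept))  # 9 -> 8
```

The fences are computed per ion across all samples, so one extreme value in any single component is enough to exclude that water sample.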

7.
Mol Divers ; 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38554168

ABSTRACT

Cancer is the second leading cause of death globally, so developing effective anticancer treatments is crucial. Anticancer peptides (ACPs) have shown promising therapeutic potential in cancer treatment compared with traditional methods. However, identifying ACPs experimentally is often time-consuming and expensive. To overcome this issue, we used a machine learning approach, for the first time, to develop an anticancer prediction model for small molecules. Anticancer small molecules (ACSMs) are compounds developed to target and inhibit cancer cells. In this study, we used 10,000 compounds to develop machine learning models with five algorithms: Random Forest (RF), Light gradient boosting machine (LightGBM), K-nearest neighbors (KNN), Decision tree (DT), and Extreme Gradient Boosting (XGB). The models were evaluated on the test set, and the top three (RF, LightGBM, and XGB) were identified. To further validate their predictive performance, we performed external validation on 10 FDA-approved anticancer compounds/drugs. The LightGBM model correctly predicted 9 of them as active, whereas RF and XGB predicted only 8 and 7, respectively. Compared with RF and XGB, the LightGBM model showed robust predictive capability, achieving a superior accuracy of 79% with an AUC of 0.88. These findings are promising for predicting anticancer small molecules and highlight the role of machine learning in advancing cancer treatment research.
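The external set consists only of FDA-approved drugs, that is, known actives. The reported 9, 8, and 7 out of 10 are therefore hit rates (sensitivities). They show how often each model recognizes an active compound but say nothing about false positives. The arithmetic is trivial; the function name below is ours.

```python
def external_hit_rate(n_predicted_active, n_known_active):
    """Share of known active compounds that a model labels as active (a sensitivity)."""
    if n_known_active <= 0:
        raise ValueError("n_known_active must be positive")
    return n_predicted_active / n_known_active


# Counts reported in the abstract for the 10 FDA-approved anticancer drugs.
REPORTED_HITS = {"LightGBM": 9, "RF": 8, "XGB": 7}

if __name__ == "__main__":
    for model, hits in REPORTED_HITS.items():
        print(f"{model}: {external_hit_rate(hits, 10):.0%}")  # 90%, 80%, 70%
```

To assess specificity on external data, the validation set would also need known inactive compounds.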

8.
JMIR Form Res ; 8: e47803, 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38466973

ABSTRACT

BACKGROUND: Atrial fibrillation (AF) is a hazardous cardiac arrhythmia that significantly elevates the risk of stroke and heart failure. Despite its severity, its diagnosis largely relies on the proficiency of health care professionals. Real-time identification of paroxysmal AF is currently hindered by a lack of automated techniques. A highly effective machine learning algorithm designed specifically for AF detection could therefore offer substantial clinical benefits. We hypothesized that, given the intricate and distinctive patterns in electrocardiogram (ECG) recordings of AF, machine learning algorithms could identify and extract features of AF with a high degree of accuracy. OBJECTIVE: This study aims to develop a clinically valuable machine learning algorithm that accurately detects AF and to compare the AF-detection performance of different leads. METHODS: We used 12-lead ECG recordings sourced from the 2020 PhysioNet Challenge data sets. The Welch method was used to extract power spectral features of the 12-lead ECGs within a frequency range of 0.083 to 24.92 Hz. Various machine learning techniques were then evaluated and optimized to classify sinus rhythm (SR) and AF based on these power spectral features. We also compared the effects of different frequency subbands and different lead selections on machine learning performance. RESULTS: The light gradient boosting machine (LightGBM) was the most effective in classifying AF and SR, achieving an average F1-score of 0.988 across all ECG leads. Among the frequency subbands, the 0.083 to 4.92 Hz range yielded the highest F1-score of 0.985. In interlead comparisons, aVR had the highest performance (F1=0.993), with minimal differences between leads. 
CONCLUSIONS: This study used machine learning methodologies, particularly the LightGBM model, to differentiate SR and AF based on power spectral features derived from 12-lead ECGs. The average F1-score of 0.988 and the minimal interlead variation underscore the potential of machine learning algorithms to support real-time AF detection. This advancement could improve patient care in intensive care units and facilitate remote monitoring through wearable devices, ultimately enhancing clinical outcomes.
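The abstract does not give the Welch parameters. The stated band limits happen to equal 1/12 Hz and 299/12 Hz, which would correspond to 12-second segments, but that is our inference. The NumPy sketch below makes that assumption and also assumes Hann windows with 50% overlap. It turns one lead into a vector of PSD values in the 0.083 to 24.92 Hz band, which a classifier such as LightGBM would then consume (not shown). The sampling rate and the sinusoid standing in for an ECG lead are hypothetical. `scipy.signal.welch` is the usual tested alternative; this hand-rolled version approximately mirrors its default density scaling.

```python
import numpy as np


def welch_psd(x, fs, nperseg, overlap=0.5):
    """One-sided Welch PSD: average of Hann-windowed periodograms of overlapping segments."""
    x = np.asarray(x, dtype=float)
    if len(x) < nperseg:
        raise ValueError("signal is shorter than one segment")
    step = int(nperseg * (1 - overlap))
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)  # density scaling, units of signal^2 / Hz
    starts = range(0, len(x) - nperseg + 1, step)
    psd = np.zeros(nperseg // 2 + 1)
    for start in starts:
        seg = x[start:start + nperseg]
        seg = seg - seg.mean()  # constant detrend of each segment
        psd += np.abs(np.fft.rfft(seg * window)) ** 2 / scale
    psd /= len(starts)
    # Double every bin except DC (and Nyquist, when nperseg is even) to fold in negative frequencies.
    if nperseg % 2 == 0:
        psd[1:-1] *= 2
    else:
        psd[1:] *= 2
    return np.fft.rfftfreq(nperseg, d=1.0 / fs), psd


def band_features(x, fs, nperseg, f_lo=0.083, f_hi=24.92):
    """PSD values between f_lo and f_hi (Hz), used as one lead's feature vector."""
    freqs, psd = welch_psd(x, fs, nperseg)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[keep], psd[keep]


if __name__ == "__main__":
    fs = 500  # Hz; hypothetical sampling rate
    t = np.arange(60 * fs) / fs  # 60 s of signal
    x = np.sin(2 * np.pi * 5.0 * t)  # toy 5 Hz sinusoid standing in for one ECG lead
    freqs, feats = band_features(x, fs, nperseg=12 * fs)  # 12 s segments -> 1/12 Hz bins
    print(len(feats), freqs[np.argmax(feats)])  # 299 bins; peak at 5.0 Hz
```

With 12-second segments, the band contains exactly 299 bins per lead, so 12 leads would give 3588 features.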

9.
Heliyon ; 10(4): e25406, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38370176

ABSTRACT

Objective: This study aims to develop an artificial intelligence (AI) predictive model that estimates the ICU length of stay (LOS) of patients with Congenital Heart Defects (CHD) after surgery, to improve care planning and resource management. Design: We analyze clinical data from 2240 CHD surgery patients to create and validate the predictive model. Twenty AI models are developed and evaluated for accuracy and reliability. Setting: The study is conducted in the Cardiovascular Surgery Department of a Brazilian hospital, focusing on transplants and cardiopulmonary surgeries. Participants: Retrospective analysis is conducted on data from 2240 consecutive CHD patients undergoing surgery. Interventions: Ninety-three pre- and intraoperative variables are used as ICU LOS predictors. Measurements and main results: Both regression and classification approaches were used to estimate ICU LOS. The Light Gradient Boosting Machine regression model achieved a Mean Squared Error (MSE) of 15.4, 11.8, and 15.2 days for training, testing, and unseen data, respectively. Key predictors included "Mechanical Ventilation Duration", "Weight on Surgery Date", and "Vasoactive-Inotropic Score". The classification model, CatBoost Classifier, attained an accuracy of 0.6917 and an AUC of 0.8559 with similar key predictors. Conclusions: Patients with longer ventilation times, higher vasoactive-inotropic scores, longer anoxia and cardiopulmonary bypass times, and lower weight, height, BMI, age, hematocrit, and presurgical oxygen saturation have longer ICU stays, in line with existing literature.
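The abstract reports the MSE in days, but an MSE is expressed in squared units of the target. Suppose the three values are genuine MSEs of a LOS measured in days; the abstract does not confirm this, and they could instead be RMSEs. In that case the typical error on the original scale is their square root, roughly 3.4 to 3.9 days.

```python
import math

# MSE values reported for the LightGBM regression model.
REPORTED_MSE = {"training": 15.4, "testing": 11.8, "unseen": 15.2}

# If these are MSEs of LOS measured in days, their unit is days squared;
# the square root (RMSE) brings the error back to days.
rmse_days = {split: math.sqrt(mse) for split, mse in REPORTED_MSE.items()}

if __name__ == "__main__":
    for split, rmse in rmse_days.items():
        print(f"{split}: RMSE of about {rmse:.1f} days")
```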

10.
BMC Med Inform Decis Mak ; 24(1): 48, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38350899

ABSTRACT

BACKGROUND: Secondary immunodeficiency can arise from various clinical conditions, including HIV infection, chronic diseases, malignancy, and long-term use of immunosuppressants, making affected patients susceptible to all types of pathogenic infections. Apart from HIV infection, the likely pathogen profiles in secondary immunodeficiency of other aetiologies are largely unknown. METHODS: Medical records of patients with secondary immunodeficiency of various aetiologies were collected from the First Affiliated Hospital of Nanchang University, China. Based on these records, machine learning models were developed to predict the infectious pathogens likely to affect patients with secondary immunodeficiency caused by disease conditions other than HIV infection. RESULTS: Several metrics were used to evaluate the models' performance. All metrics consistently showed that the Gradient Boosting Machine performed best, with the highest accuracy (91.01%), exceeding the other models by 13.48%, 7.14%, and 4.49%, respectively. CONCLUSIONS: The models developed in our study enable prediction of the infectious pathogens likely to affect patients with secondary immunodeficiency of aetiologies other than HIV infection. This will help clinicians make timely decisions on antibiotic use before microbial culture results are available.


Subject(s)
HIV Infections , Humans , HIV Infections/complications , Benchmarking , China , Hospitals , Machine Learning
11.
Micromachines (Basel) ; 15(2)2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38398939

ABSTRACT

Detecting inclusions in materials at small scales is highly important for ensuring the quality, structural integrity, and performance efficiency of microelectromechanical machines and products. Ultrasound waves are commonly used as a non-destructive method to find inclusions or structural flaws in a material. Mathematical continuum models can enable ultrasound techniques to provide quantitative information about the changes in mechanical properties caused by inclusions. In this paper, a nonlocal size-dependent poroelasticity model integrated with machine learning is developed to describe the mechanical behaviour of spherical inclusions under uniform radial compression. The scale effects on fluid pressure and radial displacement are captured using Eringen's theory of nonlocality. The law of conservation of mass is applied to both the solid matrix and the fluid content of the poroelastic material to derive the storage equation. The governing differential equations are derived by decoupling the equilibrium equation and the effective stress-strain relations in the spherical coordinate system. An accurate numerical solution is obtained using the Galerkin discretisation technique and a precise integration method. A Dormand-Prince solution is also developed for comparison. A light gradient boosting machine (LightGBM) model, used in conjunction with the nonlocal model, extracts the pattern of changes in the mechanical response of the poroelastic inclusion. Its hyperparameters are optimised by grid-search cross-validation. Considering nonlocal effects and applying machine learning improve the model's estimation power, facilitating the detection of ultrasmall inclusions within a poroelastic medium at micro/nanoscales.
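The grid-search cross-validation step can be sketched without any machine learning library. LightGBM and the poroelastic model are outside the scope of a short example. The sketch therefore swaps in a toy one-parameter model, ridge regression through the origin, with a single hyperparameter `alpha`. Both are illustrative stand-ins, not the paper's setup. With a real LightGBM model, scikit-learn's `GridSearchCV` wrapped around an `LGBMRegressor` plays the same role.

```python
from itertools import product


def kfold_indices(n, k):
    """Split range(n) into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_ridge_slope(x, y, alpha):
    """Toy model: ridge regression through the origin, w = sum(x*y) / (sum(x^2) + alpha)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)


def cv_mse(x, y, k, **params):
    """Mean squared error of the toy model under k-fold cross-validation."""
    squared_errors = []
    for fold in kfold_indices(len(x), k):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        w = fit_ridge_slope([x[i] for i in train], [y[i] for i in train], **params)
        squared_errors += [(y[i] - w * x[i]) ** 2 for i in fold]
    return sum(squared_errors) / len(squared_errors)


def grid_search_cv(x, y, param_grid, k=5):
    """Try every combination in param_grid; return (best_params, best_cv_mse)."""
    names = list(param_grid)
    best = None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_mse(x, y, k, **params)
        if best is None or score < best[1]:
            best = (params, score)
    return best


if __name__ == "__main__":
    x = [float(i) for i in range(1, 11)]
    y = [2.0 * v for v in x]  # noise-free toy data, so no shrinkage is best
    print(grid_search_cv(x, y, {"alpha": [0.0, 1.0, 10.0]}))  # ({'alpha': 0.0}, 0.0)
```

Each candidate is scored only on held-out folds. The selected hyperparameters therefore reflect out-of-sample error rather than fit to the training data.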

12.
Technol Cancer Res Treat ; 23: 15330338231219352, 2024.
Article in English | MEDLINE | ID: mdl-38233736

ABSTRACT

Background: Although gastric adenocarcinoma (GA)-related ocular metastasis (OM) is rare, its occurrence indicates more severe disease. We aimed to use machine learning (ML) to analyze the risk factors for GA-related OM and predict its risk. Methods: This is a retrospective cohort study. The clinical data of 3532 GA patients were collected and randomly split into training and validation sets in a 7:3 ratio. Patients with and without OM formed the OM and non-OM (NOM) groups, respectively. Univariate and multivariate logistic regression analyses and least absolute shrinkage and selection operator (LASSO) regression were performed. We integrated the variables identified through feature importance ranking and further refined the selection using forward sequential feature selection based on the random forest (RF) algorithm before incorporating them into the ML model. We applied six ML algorithms to construct the predictive model, and the area under the receiver operating characteristic (ROC) curve was used to assess its predictive ability. We also built a web-based risk calculator from the best-performing model. SHapley Additive exPlanations (SHAP) were used to identify risk factors and to interpret the black-box model. All patient details were de-identified. Results: The ML model, consisting of 13 variables, achieved its best predictive performance with the gradient boosting machine (GBM), with an area under the curve (AUC) of 0.997 in the test set. Using SHAP, we identified crucial factors for OM in GA patients, including LDL, CA724, CEA, AFP, CA125, Hb, CA153, and Ca²⁺. We further validated the model's reliability by analyzing two patient cases and developed an online prediction calculator based on the GBM model. Conclusion: We used ML to establish a risk prediction model for GA-related OM and showed that GBM performed best among the six ML models. The model may identify patients with GA-related OM and so enable early and timely treatment.
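SHAP values are Shapley values of a model's prediction. Libraries such as TreeSHAP compute them efficiently for tree ensembles like GBM. The brute-force sketch below shows the underlying definition by enumerating every feature subset. It assumes that "absent" features are replaced by baseline values, which is one of several conventions. The toy model is hypothetical, and the approach is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values for one instance, replacing absent features by baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


if __name__ == "__main__":
    # Hypothetical model with two additive effects and one interaction.
    def toy_model(z):
        return 2 * z[0] + 3 * z[1] + z[0] * z[2]

    print(exact_shapley(toy_model, x=[1, 1, 1], baseline=[0, 0, 0]))  # ~[2.5, 3.0, 0.5]
```

The values add up to the difference between the prediction and the baseline prediction (6 here). The interaction term is split equally between the two features involved.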


Subject(s)
Adenocarcinoma , Eye Neoplasms , Stomach Neoplasms , Humans , Reproducibility of Results , Retrospective Studies , Algorithms , Machine Learning
13.
J Contam Hydrol ; 261: 104300, 2024 02.
Article in English | MEDLINE | ID: mdl-38242063

ABSTRACT

Long-term agricultural activities have affected the sustainable development of groundwater in the Northern Anhui Plain, East China. It is therefore important to identify areas at high risk of groundwater pollution in the Northern Anhui Plain to ensure effective protection of regional water resources. In this study, 60 groundwater samples were collected from the shallow aquifer of the plain and analyzed for nitrate (NO₃⁻) concentrations. In addition, 10 environmental and geological factors in the study area were selected as input layers: elevation, distance to rivers, slope angle, slope aspect, land cover type, topographic wetness index (TWI), geomorphology, lithology, soil type, and precipitation. The light gradient boosting machine (LightGBM) and random forest (RF) algorithms, combined with a geographic information system (GIS), were used to generate groundwater pollution occurrence probability maps. Descriptive statistics showed that NO₃⁻ concentrations in the shallow groundwater ranged from 4.3 to 73.6 mg/L. Most sampling wells exhibited NO₃⁻ concentrations above the threshold of 18.3 mg/L. Both the LightGBM and RF predictions indicated a high groundwater NO₃⁻ pollution risk in the southern part of the plain. LightGBM had better predictive performance than RF, with a higher Kappa value of 0.84. The frequency ratio method revealed that precipitation contributed 38.14% of the groundwater NO₃⁻ pollution risk in the study area. It was followed by elevation, slope angle, TWI, land cover type, and slope aspect, with contributions of 21.4%, 13.02%, 8.37%, 7.44%, and 6.51%, respectively. In future work, additional wells should be sampled and further anthropogenic factors considered. This would provide decision makers with more effective strategies for preventing groundwater nitrate pollution.
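The Kappa value used to compare LightGBM and RF measures agreement between predicted and observed classes beyond what chance alone would produce. The abstract does not spell out the labelling. One plausible scheme is to class wells as polluted or not by whether NO₃⁻ exceeds the 18.3 mg/L threshold, but that is our assumption. The sketch below computes Cohen's kappa from two label lists; the labels in the usage lines are toy values.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what the marginal class frequencies predict."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy labels: 1 = polluted, 0 = not polluted (50 hypothetical wells).
    observed_labels = [0] * 25 + [1] * 25
    predicted_labels = [0] * 20 + [1] * 5 + [0] * 10 + [1] * 15
    print(round(cohens_kappa(observed_labels, predicted_labels), 3))  # 0.4
```

In this toy case, 70% raw agreement becomes a kappa of only 0.4, because the class frequencies alone would already produce 50% agreement.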


Subject(s)
Groundwater , Water Pollutants, Chemical , Nitrates/analysis , Geographic Information Systems , Environmental Monitoring/methods , Water Pollutants, Chemical/analysis , China , Risk Assessment , Machine Learning
14.
Graefes Arch Clin Exp Ophthalmol ; 262(1): 203-210, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37773288

ABSTRACT

PURPOSE: To develop a machine learning model to evaluate the activity stage of extraocular muscles in thyroid-associated ophthalmopathy (TAO). METHODS: This study retrospectively analysed data from patients with TAO who underwent contrast-enhanced magnetic resonance imaging (MRI) from 2015 to 2022. Three independent machine learning models, namely extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and deep neural networks (DNNs), were constructed using common clinical features. The performance of these models was compared using evaluation metrics such as the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score. The importance of features was explained using Shapley additive explanations (SHAP). RESULTS: A total of 2561 eyes of 1479 TAO patients were included in this study. The original dataset was randomly divided into a training set (80%, n = 2048) and a test set (20%, n = 513). In the performance evaluation on the test set, the LightGBM model had the best diagnostic performance (AUC 0.9260). According to the SHAP results, features such as conjunctival congestion, swollen caruncles, oedema of the upper eyelid, course of TAO, and intraocular pressure had the most significant impact on the LightGBM model. CONCLUSION: This study used contrast-enhanced MRI as an objective evaluation criterion and constructed a LightGBM model based on readily accessible clinical data. The model had good classification performance, making it a promising artificial intelligence (AI)-assisted tool to help community hospitals evaluate the inflammatory activity of extraocular muscles in TAO patients in a timely manner.
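Apart from AUC, which needs continuous scores, the metrics listed above are all derived from the confusion matrix of hard predictions. A minimal dependency-free sketch follows. It treats "active" as the positive class, which is an assumption. The labels in the example are toy values. `sklearn.metrics` provides equivalent, better-tested functions such as `precision_recall_fscore_support`.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier's hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels: 1 = active extraocular-muscle inflammation, 0 = inactive.
    print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```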


Subject(s)
Graves Ophthalmopathy , Humans , Graves Ophthalmopathy/diagnosis , Oculomotor Muscles , Artificial Intelligence , Retrospective Studies , Neural Networks, Computer , Eyelids
15.
Micromachines (Basel) ; 14(11)2023 Oct 31.
Article in English | MEDLINE | ID: mdl-38004904

ABSTRACT

Establishing an effective recycling mechanism for containers is of great importance for environmental protection, so technical approaches applied throughout the recycling process have become popular research topics. Among them, classification is considered a key step, but in practice it is mostly done manually. Because of human subjectivity, the classification accuracy often varies significantly. To overcome this shortcoming, this paper proposes an identification method based on a Recursive Feature Elimination-Light Gradient Boosting Machine (RFE-LightGBM) algorithm using an electronic nose. First, odor features were extracted, and feature datasets were constructed from the electronic nose's responses to the detected gases. Next, principal component analysis (PCA) and the RFE-LightGBM algorithm were each applied to reduce the dimensionality of the feature datasets, and the differences between the two methods were analyzed. Finally, the classification accuracies on the three datasets (the original feature dataset, the PCA dimensionality reduction dataset, and the RFE-LightGBM dimensionality reduction dataset) were compared. The RFE-LightGBM algorithm achieved the highest classification accuracy for recyclable containers, 95%. The original feature dataset reached 88.38% and the PCA dimensionality reduction dataset 92.02%.
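Recursive feature elimination repeatedly ranks the remaining features, drops the weakest one, and re-ranks the survivors until the desired number is left. In the paper the ranking comes from LightGBM, retrained at each round. The dependency-free sketch below substitutes a hypothetical stand-in, the absolute Pearson correlation with the target, so it runs without any library. Because this stand-in does not depend on which other features remain, re-ranking changes nothing here; with a real multivariate model it does. The feature values are toy data. scikit-learn's `sklearn.feature_selection.RFE` wraps any estimator that exposes `feature_importances_` or `coef_`.

```python
def abs_correlation(xs, ys):
    """Absolute Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return abs(cov) / (vx * vy) ** 0.5


def correlation_importance(X, y, features):
    """Hypothetical stand-in for a fitted model's importances: |Pearson r| with the target."""
    return {f: abs_correlation([row[f] for row in X], y) for f in features}


def recursive_feature_elimination(X, y, n_keep, importance_fn=correlation_importance):
    """Drop the least important feature, re-rank the survivors, repeat until n_keep remain."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        scores = importance_fn(X, y, features)
        weakest = min(features, key=lambda f: scores[f])
        features.remove(weakest)
    return features


if __name__ == "__main__":
    # Toy e-nose features (columns) for five samples; y is a numeric class code.
    X = [[1, 2, 5], [2, 1, 3], [3, 2, 4], [4, 1, 2], [5, 2, 1]]
    y = [1, 2, 3, 4, 5]
    print(recursive_feature_elimination(X, y, n_keep=2))  # [0, 2]
```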

16.
BMC Med Inform Decis Mak ; 23(1): 230, 2023 10 19.
Article in English | MEDLINE | ID: mdl-37858225

ABSTRACT

BACKGROUND: Obstructive sleep apnea (OSA) is a globally prevalent disease with a complex diagnostic procedure. Severe OSA is associated with multi-system dysfunction. We aimed to develop an interpretable machine learning (ML) model for predicting the risk of severe OSA and analyzing its risk factors based on clinical characteristics and questionnaires. METHODS: This was a retrospective study of 1656 subjects who presented for and underwent polysomnography (PSG) between 2018 and 2021. A total of 23 variables were included, and after univariate analysis, 15 variables were selected for further preprocessing. Six types of classification models were used to evaluate the ability to predict severe OSA: logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), bootstrap aggregating (Bagging), and multilayer perceptron (MLP). For all models, the area under the receiver operating characteristic curve (AUC) was calculated as the performance metric. We also drew SHapley Additive exPlanations (SHAP) plots to interpret the predictions and analyze the relative importance of risk factors. An online calculator was developed to estimate the risk of severe OSA in individuals. RESULTS: Among the enrolled subjects, 61.47% (1018/1656) were diagnosed with severe OSA. Multivariate LR analysis showed that 10 of 23 variables were independent risk factors for severe OSA. The GBM model showed the best performance (AUC = 0.857, accuracy = 0.766, sensitivity = 0.798, specificity = 0.734). An online calculator was developed to estimate the risk of severe OSA based on the GBM model. Finally, the SHAP plot revealed waist circumference, neck circumference, the Epworth Sleepiness Scale, age, and the Berlin questionnaire as the top five critical variables contributing to the diagnosis of severe OSA. Additionally, two typical cases were analyzed to interpret the contribution of each variable to the prediction for a single patient. CONCLUSIONS: We established six risk prediction models for severe OSA using ML algorithms. Among them, the GBM model performed best. The model facilitates individualized assessment and further clinical strategies for patients with suspected severe OSA. This will help identify patients with severe OSA as early as possible and ensure their timely treatment. TRIAL REGISTRATION: Retrospectively registered.


Subject(s)
Sleep Apnea, Obstructive , Humans , Adult , Retrospective Studies , Sleep Apnea, Obstructive/diagnosis , Sleep Apnea, Obstructive/epidemiology , ROC Curve , Risk Factors , Machine Learning
17.
Polymers (Basel) ; 15(20)2023 Oct 19.
Article in English | MEDLINE | ID: mdl-37896391

ABSTRACT

The quality control of thermally modified wood and the identification of heat treatment intensity using nondestructive testing methods are critical tasks. This study used near-infrared (NIR) spectroscopy and machine learning modeling to classify thermally modified wood. NIR spectra were collected from the surfaces of untreated and thermally treated (at 170 °C, 212 °C, and 230 °C) western hemlock samples. An explainable machine learning approach was applied using a TreeNet gradient boosting machine. No dimensionality reduction was performed, so that the feature ranking obtained from the model could be interpreted directly and provide insight into the critical wavelengths contributing to the performance of the classification models. NIR spectra in the ranges of 1100-2500 nm, 1400-2500 nm, and 1700-2500 nm were fed into the TreeNet model, yielding classification accuracies (test data) of 94.35%, 89.29%, and 84.52%, respectively. Feature ranking analysis revealed that, for the 1100-2500 nm range, changes in wood color caused the greatest variation in NIR reflectance among treatments; as a result, TreeNet assigned higher importance to the associated features. Limiting the wavelength range increased the significance of features related to water or wood chemistry; however, these predictive models were not as accurate as the one that benefited from the effect of wood color change on the NIR spectra. The developed framework could be applied in other settings in which NIR spectra are used for wood characterization and quality control, providing improved insight into the selection of NIR wavelengths when developing a machine learning model.

18.
Ren Fail ; 45(2): 2251597, 2023.
Article in English | MEDLINE | ID: mdl-37724550

ABSTRACT

BACKGROUND: Established prognostic models of idiopathic membranous nephropathy (IMN) were limited to traditional modeling methods and did not comprehensively consider clinical and pathological patient data. Machine learning (ML) was applied to data from the electronic medical record (EMR) system to construct a risk prediction model for the prognosis of IMN. METHODS: Data were obtained from 418 patients with IMN diagnosed by renal biopsy at the Fifth Clinical Medical College of Shanxi Medical University. Fifty-nine medical features of the patients were obtained from the EMR, and prediction models were established using five ML algorithms. The area under the curve (AUC), recall, accuracy, and F1 score were used to evaluate and compare the performance of the models. Shapley additive explanations (SHAP) were used to explain the results of the best-performing model. RESULTS: One hundred and seventeen patients (28.0%) with IMN experienced adverse events: 28 had composite outcomes (ESRD or doubling of serum creatinine (SCr)), and 89 relapsed. The light gradient boosting machine (LightGBM) model had the best performance, with the highest AUC (0.892 ± 0.052, 95% CI 0.840-0.945), accuracy (0.909 ± 0.016), recall (0.741 ± 0.092), precision (0.906 ± 0.027), and F1 (0.905 ± 0.020). Recursive feature elimination with random forest and SHAP plots based on LightGBM showed that anti-phospholipase A2 receptor (anti-PLA2R), immunohistochemical immunoglobulin G4 (IHC IgG4), D-dimer (D-DIMER), triglyceride (TG), serum albumin (ALB), aspartate transaminase (AST), β2-microglobulin (BMG), SCr, and fasting plasma glucose (FPG) were important risk factors for the prognosis of IMN. An increased risk of adverse events in IMN patients was associated with high anti-PLA2R and low IHC IgG4. CONCLUSIONS: This study established a risk prediction model for the prognosis of IMN using ML based on clinical and pathological patient data. The LightGBM model may become a tool for the personalized management of IMN patients.


Subject(s)
Glomerulonephritis, Membranous , Humans , Prognosis , Glomerulonephritis, Membranous/diagnosis , Algorithms , Immunoglobulin G , Machine Learning
19.
Front Med (Lausanne) ; 10: 1228833, 2023.
Article in English | MEDLINE | ID: mdl-37671403

ABSTRACT

Background and objective: Accurate and fast diagnosis of rheumatic diseases affecting the hands is essential for further treatment decisions. Fluorescence optical imaging (FOI) visualizes inflammation-induced impairment of microcirculation as increased signal intensity, resulting in different image features. This analysis aimed to find specific image features in FOI that might be important for accurately diagnosing different rheumatic diseases. Patients and methods: FOI images of the hands of patients with different types of rheumatic diseases, such as rheumatoid arthritis (RA), osteoarthritis (OA), and connective tissue diseases (CTD), were read for 20 different image features in each of three phases of the contrast agent dynamics, yielding 60 different features per patient. The readings were analyzed for the mutual differential diagnosis of the three diseases (One-vs-One) and for each disease against all other data (One-vs-Rest). In the first step, statistical tools and machine-learning-based methods were applied to rank the features by importance, that is, to find the features that contribute most to model-based classification. In the second step, machine learning with a stepwise increasing number of features was applied, adding at each step the most important remaining feature, to extract a minimal subset that yields the highest diagnostic accuracy. Results: In total, n = 605 FOI examinations of both hands were analyzed (n = 235 with RA, n = 229 with OA, and n = 141 with CTD). All classification problems reached maximum accuracy with a reduced set of image features. For RA-vs.-OA, five features were needed for high accuracy; ten were needed for RA-vs.-CTD, sixteen for OA-vs.-CTD, five for RA-vs.-Rest, eleven for OA-vs.-Rest, and fifteen for CTD-vs.-Rest. For all problems, the final importance ranking of the features with respect to the contrast agent dynamics was determined. Conclusions: These investigations substantially reduced the set of FOI features relevant to the differential diagnosis of the selected rheumatic diseases, providing helpful information for the physician.
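The second step described above, adding features one at a time in order of importance and keeping the subset with the highest accuracy, can be sketched as below. This is a minimal sketch that assumes the importance ranking from the first step is fixed and that a caller-supplied `evaluate` function returns an accuracy (for example, cross-validated) for a given feature subset; the function name, feature names and accuracy values are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def best_prefix(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Add features one at a time in ranked order, score each growing subset
    with `evaluate`, and return the smallest subset with the highest score."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        # Strict '>' keeps the smaller subset when two subsets score the same.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy scorer standing in for classifier accuracy (hypothetical values).
toy_accuracy = {1: 0.70, 2: 0.78, 3: 0.83, 4: 0.83, 5: 0.81}
features = ["f1", "f2", "f3", "f4", "f5"]
subset, acc = best_prefix(features, lambda s: toy_accuracy[len(s)])
print(subset, acc)  # ['f1', 'f2', 'f3'] 0.83
```

Because accuracy can plateau or drop as weaker features are added, the best subset is often much shorter than the full ranking, which matches the abstract's finding that a reduced feature set gave maximum accuracy.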

20.
Diabetes Res Clin Pract ; 204: 110917, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37748711

ABSTRACT

AIM: To explore the factors influencing Type 2 diabetes mellitus (T2DM) in the rural population of Henan Province and to evaluate the ability of non-invasive factors to predict T2DM. METHODS: A total of 30,020 participants from the Henan Rural Cohort Study in China were included in this study. The dataset was randomly divided into a training set and a testing set with a 50:50 split for validation purposes. We used logistic regression analysis to investigate the association between 56 factors and T2DM in the training set (false discovery rate < 5 %), and significant factors were further validated in the testing set (P < 0.05). A Gradient Boosting Machine (GBM) model was used to determine the ability of the non-invasive variables to classify T2DM individuals accurately and to rank these variables by importance. RESULTS: The overall population prevalence of T2DM was 9.10 %. After adjusting for age, sex, educational level, marital status, and body mass index (BMI), we identified 13 non-invasive variables and 6 blood biochemical indexes associated with T2DM in both the training and testing datasets. The top three factors according to the GBM importance ranking were pulse pressure (PP), urine glucose (UGLU), and waist-to-hip ratio (WHR). The GBM model achieved an area under the receiver operating characteristic curve (AUC) of 0.837 with non-invasive variables and 0.847 with the full model. CONCLUSIONS: Our findings demonstrate that non-invasive variables that can be easily measured and quickly obtained may be used to predict T2DM risk in rural populations in Henan Province.
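The abstract describes a random 50:50 split and screening of 56 factors at a false discovery rate below 5%, but does not name the multiple-testing procedure. A minimal sketch assuming the Benjamini-Hochberg step-up procedure, a common choice for FDR control, together with a seeded random split; the function names, seed and p-values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_half(ids: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign participants to equal-sized training and testing halves."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return True for each hypothesis rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Step-up: reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


train_ids, test_ids = split_half(range(30_020), seed=2023)
print(len(train_ids), len(test_ids))  # 15010 15010

# Hypothetical p-values for five candidate factors in the training set.
print(benjamini_hochberg([0.001, 0.012, 0.02, 0.035, 0.3]))
# [True, True, True, True, False]
```

In this design, factors passing the FDR screen in the training half would then be re-tested in the held-out half at P < 0.05, so a spurious association has to appear in two independent samples.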


Subject(s)
Diabetes Mellitus, Type 2 , Humans , Diabetes Mellitus, Type 2/diagnosis , Diabetes Mellitus, Type 2/epidemiology , Cohort Studies , Risk Factors , Rural Population , Body Mass Index , Waist Circumference , China/epidemiology