ABSTRACT
Near-infrared (NIR) spectroscopy has been widely used to predict multiple constituents of corn in agriculture. However, directly extracting constituent information from NIR spectra is challenging because the absorption bands are broad, overlapping, and non-specific. To address these problems and extract implicit features from raw NIR spectra to improve the performance of quantitative models, this paper proposes a one-dimensional shallow convolutional neural network (CNN) model based on an eXtreme Gradient Boosting (XGBoost) feature extraction method. The leaf node feature information in XGBoost was encoded and reconstructed to obtain the implicit features of the raw NIR spectral data. A two-parametric Swish (TSwish or TS) activation function was proposed to improve the performance of the CNN, and the elastic net (EN) was applied to avoid overfitting of the CNN model. The performance of the developed XGBoost-CNN-TS-EN model was evaluated using two public NIR spectroscopy datasets of corn and soil. The determination coefficients (R2) obtained on the test set for moisture, oil, protein, and starch of corn were 0.993, 0.991, 0.998, and 0.992, respectively, and that for soil organic matter was 0.992. The XGBoost-CNN-TS-EN model exhibits superior stability, good prediction accuracy, and generalization ability, demonstrating its great potential for quantitative analysis of multiple constituents in spectroscopic applications.
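The abstract does not give the functional form of TSwish or the exact elastic net formulation, so the following is only a minimal sketch under stated assumptions: TSwish is taken to be `alpha * x * sigmoid(beta * x)` (a hypothetical two-parameter generalization of Swish), and the elastic net term uses the common mix of L1 and squared L2 penalties. The names `tswish`, `elastic_net_penalty`, `alpha`, `beta`, `lam`, and `l1_ratio` are illustrative and do not come from the paper.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tswish(x: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """ASSUMED two-parameter Swish: alpha * x * sigmoid(beta * x).

    With alpha = beta = 1 this reduces to the standard Swish/SiLU.
    The paper's actual definition may differ.
    """
    return alpha * x * _sigmoid(beta * x)


def elastic_net_penalty(weights, lam: float = 0.01, l1_ratio: float = 0.5) -> float:
    """Common elastic net penalty: lam * (r * |w|_1 + (1 - r) / 2 * |w|_2^2)."""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return lam * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)
```

In a training loop, a penalty of this kind would typically be added to the regression loss so that large or numerous weights are discouraged.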
Subject(s)
Computer Neural Networks , Near-Infrared Spectroscopy , Zea mays , Zea mays/chemistry , Near-Infrared Spectroscopy/methods , Starch/chemistry , Plant Proteins/chemistry
ABSTRACT
BACKGROUND: Several studies have shown a potential relationship between triglyceride-glucose index (TGI) and asthma. However, limited research has been conducted on the relationship between TGI and fractional exhaled nitric oxide (FeNO). METHODS: A total of 1,910 asthmatic individuals from the National Health and Nutrition Examination Survey (NHANES) database were included in this study. Linear regression analyses were used to investigate the relationship between TGI and FeNO in patients with asthma. Subsequently, a trend test was applied to verify whether there was a linear relationship between TGI and FeNO. Finally, a subgroup analysis was performed to confirm the relationship across different subgroup populations. RESULTS: Multivariable linear regression analyses showed that TGI was linearly related to FeNO in the asthmatic population. The trend test additionally validated the positive linear relationship between TGI and FeNO. The XGBoost results revealed the five most influential factors for FeNO, ranked by importance: eosinophil (EOS), body mass index (BMI), poverty-to-income ratio (PIR), TGI, and white blood cell count (WBC). CONCLUSIONS: This investigation revealed a positive linear relationship between TGI and FeNO in patients with asthma. This finding suggests a potential relationship between TGI and airway inflammation in patients with asthma, thereby facilitating the prompt identification of irregularities and providing a basis for clinical decision making. This study provides a novel perspective on asthma management.
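The abstract does not state how TGI was computed. A minimal sketch, assuming the commonly used definition ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2), is shown below; the function name `tyg_index` and its parameter names are illustrative.

```python
import math


def tyg_index(triglycerides_mg_dl: float, glucose_mg_dl: float) -> float:
    """Triglyceride-glucose index under the ASSUMED common definition:
    ln(fasting triglycerides [mg/dL] * fasting glucose [mg/dL] / 2).
    """
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("both measurements must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)


# Example: triglycerides of 150 mg/dL and glucose of 100 mg/dL
example = tyg_index(150.0, 100.0)  # ln(7500), roughly 8.92
```

Because the index is a logarithm of a product, a doubling of either input raises it by the same fixed amount (ln 2), which is one reason it is convenient as a regression covariate.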
Subject(s)
Asthma , Nitric Oxide , Nutrition Surveys , Triglycerides , Humans , Asthma/metabolism , Asthma/diagnosis , Triglycerides/metabolism , Triglycerides/blood , Female , Male , Adult , Middle Aged , Nitric Oxide/metabolism , Nitric Oxide/analysis , Body Mass Index , Blood Glucose/metabolism , Leukocyte Count , Linear Models , Eosinophils/metabolism , Fractional Exhaled Nitric Oxide Testing , Exhalation , Breath Tests
ABSTRACT
Background: As the Internet becomes an increasingly vital source of medical information, the quality and reliability of brain tumor-related short videos on platforms such as TikTok and Bilibili have not been adequately evaluated. Therefore, this study aims to assess these aspects and explore the factors influencing the dissemination of such videos. Methods: A cross-sectional analysis was conducted on the top 100 brain tumor-related short videos from TikTok and Bilibili. The videos were evaluated using the Global Quality Score and the DISCERN reliability instrument. An eXtreme Gradient Boosting algorithm was utilized to predict dissemination outcomes. The videos were also categorized by content type and uploader. Results: TikTok videos scored relatively higher on both the Global Quality Score (median 2, interquartile range [2, 3] on TikTok vs. median 2, interquartile range [1, 2] on Bilibili, p = 1.51E-04) and the DISCERN reliability instrument (median 15, interquartile range [13, 18.25] on TikTok vs. 13.5, interquartile range [11, 16] on Bilibili, p = 1.66E-04). Subgroup analysis revealed that videos uploaded by professional individuals and institutions had higher quality and reliability compared to those uploaded by non-professional entities. Videos focusing on disease knowledge exhibited the highest quality and reliability compared to other content types. The number of followers emerged as the most important variable in our dissemination prediction model. Conclusion: The overall quality and reliability of brain tumor-related short videos on TikTok and Bilibili were unsatisfactory and did not significantly influence video dissemination. Future research should expand the scope to better understand the factors driving the dissemination of medical-themed videos.
ABSTRACT
With directional drilling activity increasing in the oil and gas industry and the digital revolution reaching every part of the industry, there is a growing need to optimize all planning and operational drilling activities. One important step in planning a directional well is to select a directional tool that can deliver the well in a cost-effective manner. Rotary steerable systems (RSS) and positive displacement mud motors (PDM) are two widely used tools, each with distinct advantages: RSS excels in hole cleaning, sticking avoidance, and hole quality in general, while PDM offers versatility and lower operating costs. This paper presents a series of machine learning (ML) models to automate the selection of the optimal directional tool based on offset well data. By processing lithology, directional, drilling performance, tripping, and casing running data, the model predicts section time and cost for upcoming wells. Historical data from offset wells were split into training and testing sets, and different ML algorithms were tested to choose the most accurate one. The XGBoost algorithm provided the most accurate predictions during testing, outperforming the other algorithms. A notable strength of the model is that it accounts for variations in formation thickness and drilling environment and adjusts tool recommendations accordingly. Results show that no universal rule favors either RSS or PDM; rather, tool selection is highly dependent on well-specific factors. This data-driven approach reduces human bias, enhances decision-making, and could significantly lower field development costs, particularly in aggressive drilling campaigns.
ABSTRACT
BACKGROUND: Bioequivalence risk assessment as an extension of quality risk management lacks examples of quantitative approaches to risk assessment at an early stage of generic drug development. The aim of our study was to develop a model-based approach for bioequivalence risk assessment that uses pharmacokinetic and physicochemical characteristics of drugs as predictors and would standardize the first step of risk assessment. METHODS: The Sandoz in-house bioequivalence database of 128 bioequivalence studies with poorly soluble drugs (23.5% non-bioequivalent) was used to train and validate the model. Four different modeling approaches, random forest, XGBoost, logistic regression and naïve Bayes, were compared. RESULTS: Among the best performing machine learning models, random forest was selected and optimized for the number of features, resulting in an accuracy of 84% on the test data set. The most important features for prediction were those related to solubility (dose number, acid dissociation constant), absorption and elimination rate, effective permeability, variability of pharmacokinetic endpoints, and absolute bioavailability. All features had a conceivable influence on the model predictions. CONCLUSION: The model was used to develop a bioequivalence risk assessment approach to categorize drugs in early development into high, medium or low risk classes.
ABSTRACT
Extensive studies support using steel tubes to enhance the structural integrity of rubber aggregate concrete (RBAC), namely RBAC-filled steel tubes (RCFST). However, current design codes for assessing the axial compressive behaviour of circular stub RCFST (CS-RCFST) columns are limited. Furthermore, there is a scarcity of studies focused on ensuring the structural safety of these columns. Based on an extensive experimental database comprising 145 columns, this study explores machine learning (ML) capabilities for predicting the axial strength of CS-RCFST columns, using six typical machine-learning models, i.e., symbolic regression (SR), XGBoost, CatBoost, random forest, LightGBM, and Gaussian process regression models. The hyperparameter tuning of the introduced ML models is performed using the Bayesian Optimization technique. The comparison results show that the CatBoost model is the most reliable and accurate ML model (R2 = 0.999 and 0.993 for the training and testing sets, respectively). In addition, a simple and practical design expression for CS-RCFST columns has been developed with acceptable accuracy based on the SR model (an average test-to-prediction ratio of 0.99 and CoV of 0.132). Meanwhile, the axial strength predicted by ML models was compared with two prominent practice codes (i.e., AISC360 and EC4). The comparison results indicated that the ML models could introduce a highly reliable and accurate approach over current design standards for strength prediction. Furthermore, a reliability analysis is conducted on two different ML models to evaluate the reliability of utilising ML models in practical design applications. This assessment involves identifying the statistical properties associated with the compressive strength of RBAC, as well as introducing the required resistance design factors aligned with the target reliability recommended by code standards.
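The "average test-to-prediction ratio" and "CoV" quoted for the design expression can be illustrated with a short sketch. It assumes CoV means the standard deviation of the ratios divided by their mean and uses the sample standard deviation; the abstract does not say which estimator was used, and the function name `ratio_summary` is illustrative.

```python
from statistics import mean, stdev


def ratio_summary(tested, predicted):
    """Return (mean, CoV) of the test-to-prediction strength ratios.

    CoV is computed here as sample standard deviation / mean, which is
    an assumption about the convention used.
    """
    if len(tested) != len(predicted) or len(tested) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    ratios = [t / p for t, p in zip(tested, predicted)]
    avg = mean(ratios)
    return avg, stdev(ratios) / avg
```

A mean ratio close to 1 indicates little systematic bias, while a small CoV indicates little scatter around that mean.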
ABSTRACT
Background: Percutaneous coronary intervention (PCI) is one of the most important diagnostic and therapeutic techniques in cardiology. At present, the traditional prediction models for postoperative events after PCI are ineffective, but machine learning has great potential in identification and prediction of risk. Machine learning can reduce overfitting through regularization techniques, cross-validation and ensemble learning, making the model more accurate in predicting large amounts of complex unknown data. This study sought to identify the risk of hemorrhea and major adverse cardiovascular events (MACEs) in patients after PCI through machine learning. Methods: The entire study population consisted of 7,931 individual patients who underwent PCI at Jiangsu Provincial Hospital and The Affiliated Wuxi Second People's Hospital from January 2007 to January 2022. The risk of postoperative hemorrhea and MACE (including cardiac death and in-stent restenosis) was predicted by 53 clinical features after admission. The population was assigned to the training set and the validation set in a specific ratio by simple randomization. Different machine learning algorithms, including eXtreme Gradient Boosting (XGBoost), random forest (RF), and deep learning neural network (DNN), were trained to build prediction models. A 5-fold cross-validation was applied to correct errors. Several evaluation indexes, including the area under the receiver operating characteristic (ROC) curve (AUC), accuracy (Acc), sensitivity (Sens), specificity (Spec), and net reclassification improvement (NRI), were used to compare the predictive performance. To improve the interpretability of the model and identify risk factors individually, SHapley Additive exPlanation (SHAP) was introduced. Results: In this study, 306 patients (3.9%) experienced hemorrhea, 107 patients (1.3%) experienced cardiac death, and 218 patients (2.7%) developed in-stent restenosis. 
In the training set and validation set, except for previous PCI and statins, there were no significant differences. XGBoost was observed to be the best predictor of every event, namely hemorrhea [AUC: 0.921, 95% confidence interval (CI): 0.864-0.978, Acc: 0.845, Sens: 0.851, Spec: 0.837 and NRI: 0.140], cardiac death (AUC: 0.939, 95% CI: 0.903-0.975, Acc: 0.914, Sens: 0.950, Spec: 0.800 and NRI: 0.148), and in-stent restenosis (AUC: 0.915; 95% CI: 0.863-0.967, Acc: 0.834, Sens: 0.778, Spec: 0.902 and NRI: 0.077). SHAP showed that the number of stents had the greatest influence on hemorrhea, while age and drug-coated balloon were the main factors in cardiogenic death and stent restenosis (all P<0.05). Conclusions: The XGBoost model (machine learning) performed better than the traditional logistic regression model in identifying hemorrhea and MACE after PCI. Machine learning models can be used as a tool for risk prediction. The machine learning model described in this study can personalize the prediction of hemorrhea and MACE after PCI for specific patients, helping clinicians adjust intervenable features.
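The accuracy, sensitivity, and specificity values reported above are all derived from a binary confusion matrix. A minimal sketch of those definitions follows; the function name `binary_metrics` and the example counts are illustrative and are not taken from the study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn count patients who had the event (e.g. in-stent restenosis);
    tn/fp count patients who did not.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Illustrative counts only
print(binary_metrics(tp=40, fp=20, tn=80, fn=10))
```

With rare events, accuracy is dominated by specificity, which is why sensitivity and specificity are reported alongside it.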
ABSTRACT
Background: High-voltage workers often experience fatigue due to the physically demanding nature of climbing in dynamic and complex environments, which negatively impacts their motor and mental abilities. Effective monitoring is necessary to ensure safety. Methods: This study proposed an experimental method to quantify fatigue in climbing operations. We collected subjective fatigue (using the RPE scale) and objective fatigue data, including systolic blood pressure (SBP), diastolic blood pressure (DBP), blood oxygen saturation (SpO2), vital capacity (VC), grip strength (GS), response time (RT), critical fusion frequency (CFF), and heart rate (HR) from 33 high-voltage workers before and after climbing tasks. The XGBoost algorithm was applied to establish a fatigue identification model. Results: The analysis showed that the physiological indicators SpO2, VC, GS, RT, and CFF can effectively evaluate fatigue in climbing operations. The XGBoost fatigue identification model, based on subjective fatigue and the five physiological indicators, achieved an average accuracy of 89.75%. Conclusion: This study provides a basis for personalized management of fatigue in climbing operations, enabling timely detection of workers' fatigue states and implementation of corresponding measures to minimize the likelihood of accidents.
Subject(s)
Algorithms , Fatigue , Heart Rate , Humans , Male , Adult , Heart Rate/physiology , Hand Strength/physiology , Blood Pressure/physiology , Female , Young Adult , Reaction Time/physiology
ABSTRACT
Gallbladder cancer (GBC) is a malignancy with a bleak prognosis, and radical surgery remains the primary treatment option. However, the high postoperative recurrence rate and the lack of individualized risk assessment tools limit the effectiveness of current treatment strategies. This study aims to identify risk factors affecting the short-term disease-free survival (DFS) of GBC patients using machine learning methods and to build a prediction model. A retrospective analysis was conducted on the clinical data from 328 GBC patients treated at the First Affiliated Hospital of Huzhou University from 2008 to 2021. Patients were randomly divided into a training set (n=230) and a validation set (n=98). Clinical data, laboratory indexes, and follow-up data were collected. Univariate Cox regression analysis identified age, tumor T-staging, lymph node metastasis, differentiation degree, and CA199 level as prognostic factors affecting DFS (all P<0.05). A prediction model constructed using the LASSO regression achieved AUCs of 0.827 and 0.801 for predicting 1-year and 3-year DFS, respectively. Notably, the XGBoost regression model showed higher prediction accuracy with AUCs of 0.922 and 0.947, respectively. The Delong test confirmed that the XGBoost model had significantly higher AUC values compared to the LASSO model (all P<0.001). In the validation set, the XGBoost model demonstrated AUCs of 0.764 and 0.761 for predicting 1-year and 3-year DFS, respectively. Overall, the XGBoost regression model demonstrates high accuracy and clinical value in predicting short-term DFS in GBC patients after radical surgery, offering a valuable tool for personalized treatment.
ABSTRACT
Early diagnosis of cervicitis is important. Previous studies have found that neutrophil extracellular traps (NETs) play pro-inflammatory and anti-inflammatory roles in many diseases, suggesting that they may be involved in inflammation of the uterine cervix and that NETs-related genes may serve as biomarkers of cervicitis. However, which NETs-related genes are associated with cervicitis remains to be determined. Transcriptome analysis was performed on samples of exfoliated cervical cells from 15 patients with cervicitis and 15 patients without cervicitis as the control group. First, differentially expressed genes (DEGs) were intersected with neutrophil extracellular trap-related genes (NETRGs) to obtain candidate genes, followed by functional enrichment analysis. We obtained hub genes through two machine learning algorithms. We then performed Artificial Neural Network (ANN) and nomogram construction, confusion matrix and receiver operating characteristic (ROC) evaluation, gene set enrichment analysis (GSEA), and immune cell infiltration analysis. Moreover, we constructed a ceRNA network, an mRNA-transcription factor (TF) network, and a hub gene-drug network. We obtained 19 intersecting genes by intersecting 1398 DEGs and 136 NETRGs. Five hub genes were obtained through the two machine learning algorithms, namely PKM, ATG7, CTSG, RIPK3, and ENO1. Confusion matrix and ROC curve evaluation showed that the ANN model had high accuracy and stability. A nomogram containing the 5 hub genes was established to assess the disease rate in patients. The correlation analysis revealed that the expression of ATG7 was synergistic with RIPK3. The GSEA showed that most of the hub genes were related to ECM receptor interactions. The ceRNA network was predicted to contain 2 hub genes, 3 targeted miRNAs, and 27 targeted lncRNAs, and 5 mRNAs were predicted to be regulated by 28 TFs. In addition, 36 small molecule drugs that target hub genes may improve the treatment of cervicitis.
In this study, five hub genes (PKM, ATG7, CTSG, RIPK3, ENO1) provided new directions for the diagnosis and treatment of patients with cervicitis.
ABSTRACT
Accurate and robust positioning has become increasingly essential for emerging applications and services. While GPS (global positioning system) is widely used in outdoor environments, indoor positioning remains a challenging task. This paper presents a novel architecture for indoor positioning, leveraging machine learning techniques and a divide-and-conquer strategy to achieve low error estimates. The proposed method achieves an MAE (mean absolute error) of approximately 1 m for latitude and longitude. Our approach provides a precise and practical solution for indoor positioning. Additionally, some insights on the best machine learning techniques for these tasks are offered.
ABSTRACT
Background: Breast cancer is a common and complex disease, with various clinical features affecting prognosis. Accurate prediction of prognosis is essential for guiding personalized treatment strategies. This study aimed to develop machine learning models for predicting prognosis in breast cancer patients using retrospective data. Methods: A total of 6,477 patients from Affiliated Sir Run Run Shaw Hospital were included, and their electronic medical records (EMRs) were thoroughly examined to identify 15 clinical features significantly associated with breast cancer survival. We employed eight different machine learning algorithms, including Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to develop and evaluate the predictive performance of the models. In addition, to investigate how sensitive model performance is to the training/testing set ratio, we examined five ratios: 50:50, 60:40, 70:30, 80:20, and 90:10. Results: Among these models, XGBoost demonstrated the highest performance, with a receiver operating characteristic (ROC) area under the curve (AUC) of 0.813, accuracy of 0.739, sensitivity of 0.815, and specificity of 0.735. Further statistical analysis identified several significant predictors of prognosis, including age, tumor size, lymph node status, and hormone receptor status. The XGBoost model was found to exhibit superior predictive power compared to established prognostic models such as the Nottingham Prognostic Index (NPI) and Predict Breast. Based on the successful performance of the XGBoost model, we developed a prognosis prediction tool specifically designed for breast cancer, providing valuable insights to clinicians and aiding them in making informed treatment decisions tailored to individual patients.
Conclusions: Our study highlights the potential of machine learning models in accurately predicting prognosis for breast cancer patients, ultimately facilitating personalized treatment strategies. Further research and validation are warranted to fully integrate these models into clinical practice.
ABSTRACT
Background: The early prediction of cerebral edema changes in patients with spontaneous intracerebral hemorrhage (SICH) may facilitate earlier interventions and result in improved outcomes. This study aimed to develop and validate machine learning models to predict cerebral edema changes within 72 h, using readily available clinical parameters, and to identify relevant influencing factors. Methods: An observational study was conducted between April 2021 and October 2023 at the Quzhou Affiliated Hospital of Wenzhou Medical University. After preprocessing the data, the study population was randomly divided into training and internal validation cohorts in a 7:3 ratio (training: N = 150; validation: N = 65). The most relevant variables were selected using Support Vector Machine Recursive Feature Elimination (SVM-RFE) and Least Absolute Shrinkage and Selection Operator (LASSO) algorithms. The predictive performance of random forest (RF), GDBT, linear regression (LR), and XGBoost models was evaluated using the area under the receiver operating characteristic curve (AUROC), precision-recall curve (AUPRC), accuracy, F1-score, precision, recall, sensitivity, and specificity. Feature importance was calculated, and the SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) methods were employed to explain the top-performing model. Results: A total of 84 (39.1%) patients developed cerebral edema changes. In the validation cohort, GDBT outperformed LR and RF, achieving an AUC of 0.654 (95% CI: 0.611-0.699) compared to LR of 0.578 (95% CI, 0.535-0.623, DeLong: p = 0.197) and RF of 0.624 (95% CI, 0.588-0.687, DeLong: p = 0.236). XGBoost also demonstrated similar performance with an AUC of 0.660 (95% CI, 0.611-0.711, DeLong: p = 0.963). However, in the training set, GDBT still outperformed XGBoost, with an AUC of 0.603 ± 0.100 compared to XGBoost of 0.575 ± 0.096. 
SHAP analysis revealed that serum sodium, HDL, subarachnoid hemorrhage volume, sex, and left basal ganglia hemorrhage volume were the top five most important features for predicting cerebral edema changes in the GDBT model. Conclusion: The GDBT model demonstrated the best performance in predicting 72-h changes in cerebral edema. It has the potential to assist clinicians in identifying high-risk patients and guiding clinical decision-making.
ABSTRACT
BACKGROUND: Although transcatheter aortic valve replacement has emerged as an alternative to surgical aortic valve replacement, it requires extensive healthcare resources, and optimal length of hospital stay has become increasingly important. This study was conducted to assess the potential of novel machine learning models (artificial neural network and eXtreme Gradient Boost) in predicting optimal hospital discharge following transcatheter aortic valve replacement. AIM: To determine whether artificial neural network and eXtreme Gradient Boost models can be used to accurately predict optimal discharge following transcatheter aortic valve replacement. METHODS: Data were collected from the 2016-2018 National Inpatient Sample database using International Classification of Diseases, Tenth Revision codes. Patients were divided into two cohorts based on length of hospital stay: optimal discharge (length of hospital stay 0-3 days) and late discharge (length of hospital stay 4-9 days). χ2 and t tests were performed to compare patient characteristics between the optimal discharge and late discharge cohorts. Logistic regression, artificial neural network and eXtreme Gradient Boost models were used to predict optimal discharge. Model performance was determined using area under the curve and F1 score. An area under the curve ≥ 0.80 and an F1 score ≥ 0.70 were considered to indicate strong predictive accuracy. RESULTS: A total of 25,874 patients who underwent transcatheter aortic valve replacement were analysed. Predictability of optimal discharge was similar amongst the models (area under the curve 0.80 in all models). In all models, patient disposition and elective procedure were the most important predictive factors. Coagulation disorder was the strongest co-morbidity predictor of whether a patient had an optimal discharge.
CONCLUSIONS: Artificial neural network and eXtreme Gradient Boost models had satisfactory performances, demonstrating similar accuracy to binary logistic regression in predicting optimal discharge following transcatheter aortic valve replacement. Further validation and refinement of these models may lead to broader clinical adoption.
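The acceptance rule stated in the methods (area under the curve of at least 0.80 and F1 score of at least 0.70) can be sketched as follows. The F1 formula is the standard one; the function names `f1_score` and `is_strong_predictor` are illustrative, and the counts in the tests are not from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2 * tp / denominator


def is_strong_predictor(auc: float, f1: float,
                        auc_min: float = 0.80, f1_min: float = 0.70) -> bool:
    """Apply the thresholds quoted in the abstract to a model's AUC and F1."""
    return auc >= auc_min and f1 >= f1_min
```

Requiring both thresholds guards against a model that ranks patients well (high AUC) but performs poorly at its chosen decision cut-off (low F1).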
ABSTRACT
The present study was designed to test the potential utility of regional cerebral oxygen saturation (rcSO2) in detecting term infants with brain injury. The study also examined whether quantitative rcSO2 features are associated with grade of hypoxic ischaemic encephalopathy (HIE). We analysed 58 term infants with HIE (>36 weeks of gestational age) enrolled in a prospective observational study. All newborn infants had a period of continuous rcSO2 monitoring and magnetic resonance imaging (MRI) assessment during the first week of life. rcSO2 signals were pre-processed and quantitative features were extracted. Machine-learning and deep-learning models were developed to detect adverse outcome (brain injury on MRI or death in the first week) using the leave-one-out cross-validation approach and to assess the association between rcSO2 and HIE grade (modified Sarnat at 1 h). The machine-learning model (rcSO2 excluding prolonged relative desaturations) significantly detected infant MRI outcome or death in the first week of life [area under the curve (AUC) = 0.73, confidence interval (CI) = 0.59-0.86, Matthews correlation coefficient = 0.35]. In agreement, deep learning models detected adverse outcome with an AUC = 0.64, CI = 0.50-0.79. We also report a significant association between rcSO2 features and HIE grade using a machine learning approach (AUC = 0.81, CI = 0.73-0.90). We conclude that automated analysis of rcSO2 using machine learning methods in term infants with HIE was able to determine, with modest accuracy, infants with adverse outcome. De novo approaches to signal analysis of NIRS hold promise to aid clinical decision making in the future. KEY POINTS: Hypoxic-induced neonatal brain injury contributes to both short- and long-term functional deficits. Non-invasive continuous monitoring of brain oxygenation using near-infrared spectroscopy offers a potential new insight into the development of serious injury.
In this study, characteristics of the NIRS signal were summarised using either predefined features or data-driven feature extraction, both of which were combined with a machine learning approach to predict short-term brain injury. Using data from a cohort of term infants with hypoxic ischaemic encephalopathy, the present study illustrates that automated analysis of regional cerebral oxygen saturation (rcSO2), using either machine learning or deep learning methods, was able to determine infants with adverse outcome.
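The Matthews correlation coefficient reported alongside the AUC can be computed from confusion-matrix counts with its standard formula. The sketch below is illustrative; the function name `matthews_corrcoef_counts` and the convention of returning 0 for a zero denominator are choices made here, not details from the study.

```python
import math


def matthews_corrcoef_counts(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike accuracy, this coefficient uses all four cells of the confusion matrix, so it stays informative when outcome classes are imbalanced, as is common in small clinical cohorts.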
ABSTRACT
The drilling rate index (DRI) of rocks is important for optimizing drilling operations, as it informs the choice of appropriate methods and equipment, ultimately improving the efficiency of rock excavation projects. This study presents a hybrid machine learning approach to predict the DRI of rocks accurately. By integrating grey wolf optimization with support vector machine (GWO-SVM), random forest (GWO-RF), and extreme gradient boosting (GWO-XGBoost) models, the aim was to enhance predictive accuracy. Among these, the GWO-XGBoost model exhibited superior predictive performance, achieving a coefficient of determination (R²) of 0.999, mean absolute error (MAE) of 0.00043, root mean square error (RMSE) of 1.98017, and severity index (SI) of 0.0350 during training. Testing results confirmed its accuracy with R² of 0.999, MAE of 0.00038, RMSE of 1.80790, and SI of 0.0312. Furthermore, the GWO-XGBoost model outperformed the other models in terms of precision, recall, F1-score, and multi-class confusion matrix results for each DRI class. The GWO-RF model also demonstrated high accuracy, ranking second, while the GWO-SVM model showed comparatively lower performance. This research aims to advance rock excavation practices by providing a highly accurate and reliable tool for DRI prediction. The results highlight the significant potential of the GWO-XGBoost model in improving DRI predictions, offering valuable insights and practical applications in the field.
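For reference, the regression metrics named above (R², MAE, RMSE) follow their standard definitions, sketched here in plain Python. The function name `regression_metrics` is illustrative, and the severity index is omitted because the abstract does not define it.

```python
import math


def regression_metrics(y_true, y_pred) -> dict:
    """Standard R^2, MAE, and RMSE for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }
```

On any given dataset RMSE is never smaller than MAE, and the gap between them grows when a few predictions have large errors.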
ABSTRACT
OBJECTIVE: The objective was to use nine machine learning (ML) methods to predict the prognosis of patients with antibody-positive autoimmune encephalitis (AE). METHODS: Encephalitis data from the Global Burden of Disease (GBD) study were analyzed to reflect the disease burden of encephalitis. This study included 187 patients with AE, with 121 patients as the training set and 67 patients as the validation set. Decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), k-nearest neighbor (KNN), support vector machine (SVM), naive Bayes (NB), neural network (NN), light gradient boosting machine (LGBM), and logistic regression (LR) were the ML methods used to construct predictive models. The constructed models were validated for discrimination, calibration, and clinical applicability using validation set data. Shapley additive explanation (SHAP) analysis was used to explain the model. RESULTS: Worldwide encephalitis deaths, incidence, and prevalence increased every year from 2010 to 2021. The training set included 121 patients with AE. Univariate analysis and LASSO screening identified six variables. Among the models constructed with the 9 ML methods, RF had the highest accuracy (0.860), followed by XGBoost (0.826), with F1 scores of 0.844 and 0.807, respectively. Validation set data showed good discrimination, calibration, and clinical applicability of the model. The SHAP values of infection, CSF monocyte percentage, and prealbumin were 0.906, 0.790, and 0.644, respectively. LIMITATIONS: Because AE is a rare disease, the sample size of this study is relatively small. CONCLUSION: The models constructed using RF and XGBoost show good performance, discrimination, calibration, clinical applicability, and interpretability.
RESUMEN
In recent years, with the application of Internet of Things (IoT) and cloud technology in smart industrialization, the Industrial Internet of Things (IIoT) has become an emerging hot topic. The growing volume of data and number of devices in the IIoT poses significant security challenges, making anomaly detection particularly important. Existing methods for anomaly detection in the IIoT often fall short when dealing with data imbalance, and the huge amount of IIoT data makes feature selection challenging and computationally intensive. In this paper, we propose an optimal deep learning model for anomaly detection in the IIoT. First, eXtreme Gradient Boosting (XGBoost) is used for feature selection under different importance thresholds: features with importance above the given threshold are retained, while those below are ignored. Different thresholds yield different numbers of features. This approach not only secures effective features but also reduces the feature dimensionality, thereby decreasing the consumption of computational resources. Second, an optimized loss function is designed to study its impact on model performance in terms of handling imbalanced data, highly similar categories, and model training. We select the optimal threshold and loss function, which are part of our optimal model, by comparing metrics such as accuracy, precision, recall, False Alarm Rate (FAR), Area Under the Receiver Operating Characteristic Curve (AUC-ROC), and Area Under the Precision-Recall Curve (AUC-PR) values. Finally, combining the optimal threshold and loss function, we propose a model named MIX_LSTM for anomaly detection in the IIoT. Experiments are conducted using the UNSW-NB15 and NSL-KDD datasets. The proposed MIX_LSTM model achieves 0.084 FAR, 0.984 AUC-ROC, and 0.988 AUC-PR in the binary anomaly detection experiment on the UNSW-NB15 dataset. On the NSL-KDD dataset, it achieves 0.028 FAR, 0.967 AUC-ROC, and 0.962 AUC-PR. On these evaluation metrics, the model performs well in detecting anomalous attacks in the IIoT compared with traditional deep learning models, machine learning models, and existing techniques.
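The threshold-based feature selection described in this abstract can be sketched as follows. This is an illustrative assumption rather than the authors' code: the feature names and importance scores are made up, and the helper names `select_features` and `sweep_thresholds` are hypothetical. In practice the scores would come from a trained XGBoost model.

```python
def select_features(importances, threshold):
    """Keep the names of features whose importance is above `threshold`.

    `importances` maps feature name -> importance score (hypothetical input).
    """
    return [name for name, score in importances.items() if score > threshold]


def sweep_thresholds(importances, thresholds):
    """Map each candidate threshold to the feature subset it retains."""
    return {t: select_features(importances, t) for t in thresholds}


# Made-up importance scores for five placeholder features.
example_importances = {
    "f0": 0.40,
    "f1": 0.25,
    "f2": 0.20,
    "f3": 0.10,
    "f4": 0.05,
}

# Higher thresholds retain fewer features, shrinking the input dimensionality.
example_subsets = sweep_thresholds(example_importances, [0.04, 0.15, 0.30])
```

Each retained subset would then be used to train and score a candidate model, and the threshold giving the best trade-off across the reported metrics would be kept.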
RESUMEN
Background: Discharge date prediction plays a crucial role in healthcare management, enabling efficient resource allocation and patient care planning. Accurate estimation of the discharge date can optimize hospital operations and facilitate better patient outcomes. Materials and methods: In this study, we employed a systematic approach to develop a discharge date prediction model. We collaborated closely with clinical experts to identify relevant data elements that contribute to the prediction accuracy. Feature engineering was used to extract predictive features from both structured and unstructured data sources. XGBoost, a gradient-boosted decision tree algorithm, was employed for the prediction task. Furthermore, the developed model was integrated into a widely used Electronic Medical Record (EMR) system, ensuring practical usability. Results: The model achieved a performance surpassing baseline estimates by up to 35.68% in the F1-score. Post-deployment, the model demonstrated operational value by aligning with MS GMLOS and contributing to an 18.96% reduction in excess hospital days. Conclusions: Our findings highlight the effectiveness and potential value of the developed discharge date prediction model in clinical practice. By improving the accuracy of discharge date estimations, the model has the potential to enhance healthcare resource management and patient care planning. Future research should evaluate the model's long-term applicability across diverse scenarios and comprehensively analyze its influence on patient outcomes.
RESUMEN
The identification of relevant biomarkers from high-dimensional cancer data remains a significant challenge due to the complexity and heterogeneity inherent in various cancer types. Conventional feature selection methods often struggle to effectively navigate the vast solution space while maintaining high predictive accuracy. In response to these challenges, we introduce a novel feature selection approach that integrates Random Drift Optimization (RDO) with XGBoost, specifically designed to enhance the performance of cancer classification tasks. Our proposed framework not only improves classification accuracy but also offers valuable insights into the underlying biological mechanisms driving cancer progression. Through comprehensive experiments conducted on real-world cancer datasets, including Central Nervous System (CNS), Leukemia, Breast, and Ovarian cancers, we demonstrate the efficacy of our method in identifying a smaller subset of unique and relevant genes. This selection results in significantly improved classification efficiency and accuracy. When compared with popular classifiers such as Support Vector Machine, K-Nearest Neighbor, and Naive Bayes, our approach consistently outperforms these models in terms of both accuracy and F-measure metrics. For instance, our framework achieved an accuracy of 97.24% in the CNS dataset, 99.14% in Leukemia, 95.21% in Ovarian, and 87.62% in Breast cancer, showcasing its robustness and effectiveness across different types of cancer data. These results underline the potential of our RDO-XGBoost framework as a promising solution for feature selection in cancer data analysis, offering enhanced predictive performance and valuable biological insights.