Results 1 - 20 of 134
1.
Front Oncol ; 14: 1409273, 2024.
Article in English | MEDLINE | ID: mdl-38947897

ABSTRACT

Objective: This study aims to develop an artificial intelligence model that uses clinical blood markers, ultrasound data, and breast biopsy pathological information to predict distant metastasis in breast cancer patients. Methods: Data from two medical centers were used. Clinical blood markers, ultrasound data, and breast biopsy pathological information were extracted and selected separately. Feature dimensionality was reduced using Spearman correlation and LASSO regression. Predictive models were constructed with the logistic regression (LR) and LightGBM machine learning algorithms and validated on internal and external validation sets. Feature correlation analysis was conducted for both models. Results: The LR model achieved AUC values of 0.892, 0.816, and 0.817 in the training, internal validation, and external validation cohorts, respectively. The LightGBM model achieved AUC values of 0.971, 0.861, and 0.890 in the same cohorts. Clinical decision curve analysis showed a greater net benefit for the LightGBM model than for the LR model in predicting distant metastasis in breast cancer. Key features included creatine kinase isoenzyme (CK-MB) and alpha-hydroxybutyrate dehydrogenase. Conclusion: This study developed an artificial intelligence model using clinical blood markers, ultrasound data, and pathological information to identify distant metastasis in breast cancer patients. The LightGBM model demonstrated superior predictive accuracy and clinical applicability, suggesting it is a promising tool for the early diagnosis of distant metastasis in breast cancer.
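The abstract names Spearman correlation as a feature-reduction step before LASSO but does not state the filtering rule. The Python sketch below assumes a common rule: compute pairwise Spearman's rho and, whenever two features have |rho| above a threshold (0.9 here, our assumption), keep only the one seen first. The feature names are hypothetical, and every column is assumed to be non-constant (a constant column would make the correlation undefined).

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_redundant(features, threshold=0.9):
    """Keep features in order, skipping any with |rho| > threshold to an already kept one."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical blood-marker columns (one value per patient).
features = {
    "marker_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "marker_b": [2.0, 4.0, 6.0, 8.0, 11.0],  # monotone in marker_a, so rho = 1
    "marker_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_redundant(features))  # ['marker_a', 'marker_c']
```

LASSO would then be fit on the surviving columns; that step is not shown here.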

2.
Foods ; 13(13)2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38998534

ABSTRACT

To improve the accuracy of identifying fresh meat varieties with laser-induced breakdown spectroscopy (LIBS), we combined the LightGBM model with the Optuna algorithm. Fresh meat slices (pork, beef, and chicken) were flattened with glass slides, and spectral data of the plasma from the tissue surfaces were collected using LIBS, for a total of 900 spectra. We first built LightGBM and SVM (support vector machine) models on the collected spectra. We then applied information gain and peak extraction algorithms to select the features for each model. Optuna was used to optimize the hyperparameters of the LightGBM model, while 10-fold cross-validation was used to determine the optimal parameters for the SVM. The LightGBM model achieved accuracy, macro-F1, and Cohen's kappa values of 0.9370, 0.9364, and 0.9244, respectively, higher than the SVM model's values of 0.8888, 0.8881, and 0.8666. This study provides a novel method for the rapid classification of fresh meat varieties using LIBS.
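Macro-F1 and Cohen's kappa, two of the metrics reported above, can be computed from true and predicted labels as in this minimal sketch. Macro-F1 averages the per-class F1 scores with equal weight; kappa compares the observed agreement p_o with the agreement p_e expected by chance as (p_o - p_e) / (1 - p_e). The example labels are invented for illustration and are not the study's data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """(p_o - p_e) / (1 - p_e); undefined when chance agreement p_e equals 1."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)


y_true = ["pork", "pork", "beef", "beef", "chicken", "chicken"]
y_pred = ["pork", "beef", "beef", "beef", "chicken", "pork"]
print(round(macro_f1(y_true, y_pred), 4))     # 0.6556
print(round(cohen_kappa(y_true, y_pred), 4))  # 0.5
```

Here accuracy is 4/6, but kappa is only 0.5 because one third of the agreement would be expected by chance given the label frequencies.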

3.
Molecules ; 29(13)2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38998926

ABSTRACT

As an important photovoltaic material, organic-inorganic hybrid perovskites have attracted much attention in the field of solar cells, but their instability is one of the main challenges limiting their commercial application, and finding stable perovskites among thousands of candidate materials remains difficult. In this work, the energy above the convex hull (Ehull) of organic-inorganic hybrid perovskites was predicted with four machine learning algorithms, namely random forest regression (RFR), support vector machine regression (SVR), XGBoost regression, and LightGBM regression, to study their thermodynamic phase stability. The results show that the LightGBM algorithm has a low prediction error and effectively captures the key features related to thermodynamic phase stability. The Shapley Additive Explanations (SHAP) method was then used to analyze the LightGBM predictions. The third ionization energy of the B-site element is the most critical feature for thermodynamic phase stability, and the second is the electron affinity of the ions at the X site; both are significantly negatively correlated with the predicted Ehull values. These two features should therefore be prioritized when screening for highly stable organic-inorganic perovskites. The results clarify the correlation between the thermodynamic phase stability of organic-inorganic hybrid perovskites and these key features, and can assist in the rapid discovery of highly stable perovskite materials.

4.
Environ Monit Assess ; 196(8): 738, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39009752

ABSTRACT

Accurate retrieval of land surface temperature (LST) is crucial for understanding and mitigating urban heat islands and, ultimately, for addressing the broader challenge of global warming. This study emphasizes the importance of single-day satellite imagery for large-scale LST retrieval and explores the effect of spectral indices of surface parameters, using machine learning (ML) algorithms to improve accuracy. The research proposes capturing satellite data on a single day to reduce uncertainty in LST estimates. A case study over Chandigarh city using Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Random Forest (RF) shows that RF performs best for LST estimation in both summer and winter. All ML models achieved an R² above 0.8, with RF slightly higher in both summer (0.93) and winter (0.85). Building on these findings, the study extends its focus to Ranchi, where RF again captured LST variations with high accuracy. The research helps bridge existing gaps in large-scale LST estimation methodologies and offers insights for its diverse applications in understanding Earth's dynamic systems.


Subjects
Environmental Monitoring, Machine Learning, Satellite Imagery, Seasons, Temperature, Environmental Monitoring/methods, Global Warming

5.
PeerJ Comput Sci ; 10: e2044, 2024.
Article in English | MEDLINE | ID: mdl-38855258

ABSTRACT

Patent lifespan is commonly used as a quantitative measure in patent assessment. Patent holders maintain exclusive rights by paying significant maintenance fees, which suggests a strong correlation between a patent's lifespan and its business potential or economic value. Accurately forecasting patent duration is therefore of great significance. This study introduces a method that combines LightGBM, a gradient boosting machine learning algorithm, with a customized loss function derived from Focal Loss to predict the probability that a patent remains valid until its maximum expiration date. Unlike previous studies that examined the various stages and phases of patents, this research assesses the commercial viability of individual patents through their lifespan. The evaluation uses a dataset of 200,000 patents. The experimental results show that combining Focal Loss with LightGBM significantly improves model performance. Incorporating Focal Loss makes LightGBM prioritize difficult instances during training, which improves its ability to distinguish between samples and its robustness on hard cases. As a result, the model predicts more accurately and generalizes better to new data.
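The abstract does not give the exact form of its customized loss, so the sketch below shows only the standard binary focal loss, -w (1 - p_t)^gamma log(p_t), on which such losses are typically based. A LightGBM custom objective supplies per-sample gradients and Hessians with respect to the raw score; here they are approximated by central finite differences, which is a convenience assumption rather than the authors' method. Parameter defaults (gamma = 2, no alpha weighting) are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def focal_loss(z, y, gamma=2.0, alpha=None):
    """Binary focal loss -w * (1 - p_t)**gamma * log(p_t) for raw score z, label y in {0, 1}.

    gamma = 0 and alpha = None recover ordinary log loss.
    """
    p = sigmoid(z)
    p_t = p if y == 1 else 1.0 - p
    w = 1.0 if alpha is None else (alpha if y == 1 else 1.0 - alpha)
    return -w * (1.0 - p_t) ** gamma * math.log(p_t)


def grad_hess(z, y, gamma=2.0, alpha=None, eps=1e-4):
    """Central-difference gradient and Hessian of the loss w.r.t. the raw score z."""
    def f(s):
        return focal_loss(s, y, gamma, alpha)

    grad = (f(z + eps) - f(z - eps)) / (2 * eps)
    hess = (f(z + eps) - 2 * f(z) + f(z - eps)) / eps ** 2
    return grad, hess


# A confidently correct prediction (z = 3, y = 1) is heavily down-weighted:
print(focal_loss(3.0, 1) / focal_loss(3.0, 1, gamma=0.0))  # (1 - p)**2, roughly 0.0022
```

The modulating factor (1 - p_t)^gamma is what shifts training effort toward hard examples: easy, well-classified samples contribute almost nothing to the loss.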

6.
Sci Rep ; 14(1): 13511, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38866817

ABSTRACT

The growing use of carbon dioxide (CO2) in environmental and energy applications, including carbon capture and storage (CCS) and several CO2-based enhanced oil recovery (EOR) techniques, highlights the importance of studying the phase equilibria of this gas with water. CO2 solubility in water is therefore an important thermodynamic property to predict accurately. This study developed two intelligent models, gradient boosting (GBoost) and light gradient boosting machine (LightGBM), to predict CO2 solubility in water with high accuracy. The GBoost model performed best, with a root mean square error (RMSE) of 0.137 mol/kg and a coefficient of determination (R²) of 0.9976. Trend analysis demonstrated that the developed models reproduce the physical trend of CO2 solubility in water across various pressure and temperature ranges. The leverage technique was also used to identify suspect data points and the applicability domain of the proposed models. Fewer than 5% of the data points were flagged as outliers, indicating a large applicability domain. These results show the potential of intelligent models for predicting the solubility of CO2 in pure water.
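The leverage technique mentioned above is usually presented as a Williams plot: leverage h_i (the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ of the inputs) against standardized residuals, with a warning leverage h* = 3(p + 1)/n for p inputs and n points. The sketch below uses the closed form of h_i for a single input with an intercept; a model with several inputs needs the general matrix form. Standardized residuals are taken here as simple z-scores of the residuals, an assumption, since some authors also divide by sqrt(1 - h_i). The data are invented.

```python
def leverages(x):
    """Hat-matrix diagonal for y = b0 + b1*x: h_i = 1/n + (x_i - mean)^2 / Sxx."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    return [1.0 / n + (v - mean) ** 2 / sxx for v in x]


def standardized_residuals(y_true, y_pred):
    """Residuals scaled to zero mean and unit sample standard deviation."""
    res = [t - p for t, p in zip(y_true, y_pred)]
    n = len(res)
    mean = sum(res) / n
    sd = (sum((r - mean) ** 2 for r in res) / (n - 1)) ** 0.5
    return [(r - mean) / sd for r in res]


def flag_suspect(x, y_true, y_pred, n_inputs=1, r_limit=3.0):
    """Indices beyond the warning leverage h* = 3(p + 1)/n or with |residual| > r_limit."""
    h = leverages(x)
    h_star = 3 * (n_inputs + 1) / len(x)
    r = standardized_residuals(y_true, y_pred)
    return [i for i in range(len(x)) if h[i] > h_star or abs(r[i]) > r_limit]


# Hypothetical single input; the last point lies far from the rest.
x = list(range(1, 20)) + [100]
y_true = [float(v) for v in x]
y_pred = [t - (0.1 if i % 2 == 0 else -0.1) for i, t in enumerate(y_true)]
print(flag_suspect(x, y_true, y_pred))  # [19]
```

Points with h_i > h* lie outside the region the model was trained on, so their predictions are extrapolations even when the residual is small.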

7.
Med Biol Eng Comput ; 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38874706

ABSTRACT

This work highlights the importance of accurate Parkinson's disease classification in medical diagnostics and introduces a novel framework for achieving it, focusing on improving disease identification accuracy with boosting methods. The main contribution is the use of a light gradient boosting machine (LGBM) with hyperparameter tuning through grid search optimization (GSO) on a Parkinson's disease dataset derived from speech recording signals. In addition, the Synthetic Minority Over-sampling Technique (SMOTE) was used as a pre-processing step to balance the dataset, improving the robustness and reliability of the analysis. The datasets include gender-specific and combined cases, with several distinct feature subsets: baseline, Mel-frequency cepstral coefficients (MFCC), time-frequency, wavelet transform (WT), vocal fold, and tunable-Q-factor wavelet transform (TQWT). Comparisons against state-of-the-art boosting methods such as AdaBoost and XGBoost show that the proposed approach performs best across diverse datasets and metrics. On the male cohort dataset, using all features with GSO-LGBM, the method achieves an accuracy of 0.98, precision of 1.00, sensitivity of 0.97, F1-score of 0.98, and specificity of 1.00. Compared with AdaBoost and XGBoost, the proposed LGBM framework improves classification accuracy by 5% on average across all feature subsets and datasets. These findings underscore the potential of the proposed methodology to improve disease identification and inform further advances in medical diagnostics.
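SMOTE, used above to balance the dataset, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal, brute-force sketch of that idea (squared Euclidean distance, k = 5 by default); it is not the implementation the authors used, and the two-feature data are hypothetical.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows, each on the segment between a random minority
    row and one of its k nearest minority neighbours (squared Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: dist2(base, row))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical two-feature voice measurements for the minority class.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in smote(minority, n_new=3, k=2):
    print(row)
```

Oversampling is typically applied to the training split only, so that synthetic points never leak into evaluation data; the abstract does not say how the authors handled this.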

8.
Front Artif Intell ; 7: 1401810, 2024.
Article in English | MEDLINE | ID: mdl-38887604

ABSTRACT

Introduction: Regulatory agencies generate a vast amount of textual data during the review process. For example, drug labeling is a valuable resource that regulatory agencies such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA) use to communicate drug safety and effectiveness information to healthcare professionals and patients. Drug labeling also serves as a resource for pharmacovigilance and drug safety research. Automated text classification would significantly improve the analysis of drug labeling documents and conserve reviewer resources. Methods: We used artificial intelligence to classify drug-induced liver injury (DILI)-related content in drug labeling documents based on FDA's DILIrank dataset. We employed text mining and XGBoost models and used the Preferred Terms of Medical queries for adverse event standards to eliminate common words and phrases while retaining standard medical terms in the FDA and EMA drug label datasets. We then constructed a document-term matrix using Term Frequency-Inverse Document Frequency (TF-IDF) weights for each included word, term, or token. Results: The automatic text classification model performed robustly in predicting DILI, achieving cross-validation AUC scores above 0.90 for drug labels from both FDA and EMA and for literature abstracts from the Critical Assessment of Massive Data Analysis (CAMDA). Discussion: The text mining and XGBoost functions demonstrated in this study can also be applied to other text processing and classification tasks.
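A TF-IDF document-term matrix like the one described above weights each term by how often it appears in a document (tf) and how rare it is across documents (idf). This sketch uses plain whitespace tokenization and the unsmoothed weighting tf = count / document length, idf = log(N / df); both are assumptions, since libraries and the study may use smoothed or normalized variants, and the medical-term filtering step is not reproduced. The documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, rows) with tf = count / doc length and idf = log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([
            counts[t] / len(toks) * math.log(n_docs / df[t]) if counts[t] else 0.0
            for t in vocab
        ])
    return vocab, rows


# Hypothetical label snippets, already filtered to medical terms.
docs = ["hepatotoxicity liver injury", "liver enzymes elevated", "rash pruritus"]
vocab, rows = tfidf_matrix(docs)
print(vocab)
for row in rows:
    print([round(w, 3) for w in row])
```

With this unsmoothed idf, a term that appears in every document gets weight zero, which is why "liver" (in two of three documents) scores lower than "hepatotoxicity" (in one).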

9.
Eur Heart J Digit Health ; 5(3): 270-277, 2024 May.
Article in English | MEDLINE | ID: mdl-38774371

ABSTRACT

Aims: Out-of-hospital cardiac arrest (OHCA) is a major health concern worldwide. Although one-third of all patients achieve a return of spontaneous circulation and may then undergo a difficult period in the intensive care unit, only 1 in 10 survive. This study aims to improve our previously developed machine learning model for early prognostication of survival in OHCA. Methods and results: We studied all cases registered in the Swedish Cardiopulmonary Resuscitation Registry between 2010 and 2020 (n = 55 615). We compared the predictive performance of extreme gradient boosting (XGB), light gradient boosting machine (LightGBM), logistic regression, CatBoost, random forest, and TabNet. For each framework, we developed models that optimized (i) a weighted F1 score that penalizes models yielding more false negatives and (ii) the area under the precision-recall curve (PR AUC). LightGBM assigned higher importance values to a larger set of variables, while XGB made predictions using fewer predictors. The areas under the receiver operating characteristic curve (AUC ROC) for LightGBM were 0.958 (optimized for weighted F1) and 0.961 (optimized for PR AUC), while for XGB they were 0.958 and 0.960, respectively. The calibration plots showed a slight underestimation of survival for LightGBM, in contrast to a mild overestimation for the XGB models. In the crucial range of 0-10% likelihood of survival, the XGB model optimized for PR AUC emerged as a clinically safe model. Conclusion: We improved our previous prediction model by creating a parsimonious model with an AUC ROC of 0.96, excellent calibration, and no apparent risk of underestimating survival in the critical probability range (0-10%). The model is available at www.gocares.se.

10.
Front Big Data ; 7: 1353469, 2024.
Article in English | MEDLINE | ID: mdl-38817683

ABSTRACT

Objective: To develop a robust machine learning prediction model for the automatic screening and diagnosis of obstructive sleep apnea (OSA) using five algorithms, namely Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Support Vector Machine (SVM), Light Gradient Boosting Machine (LightGBM), and Random Forest (RF), to support early clinical diagnosis and intervention. Methods: We retrospectively analyzed clinical data from 439 patients who underwent polysomnography at the Affiliated Hospital of Xuzhou Medical University between October 2019 and October 2022. Predictor variables included demographic information [age, sex, height, weight, body mass index (BMI)], medical history, and the Epworth Sleepiness Scale (ESS). Univariate analysis was used to identify variables with significant differences, and the dataset was then divided into training and validation sets at a 4:1 ratio. The training set was used to build models predicting OSA severity grade, and the validation set was used to assess model performance using the area under the curve (AUC). In a separate analysis, the normal population was treated as one group and patients with moderate-to-severe OSA as another. The same univariate analysis and 4:1 split were applied; the training set was used to build a prediction model for screening moderate-to-severe OSA, and the validation set was used to verify its performance. Results: In the four-group severity grading, the LightGBM model outperformed the others, with the top five features by importance being ESS total score, BMI, sex, hypertension, and gastroesophageal reflux (GERD), and with age, ESS total score, and BMI playing the most significant roles. In the binary model, RF performed best of the five models.
The top five features by importance in the best-performing RF model were ESS total score, BMI, GERD, age, and dry mouth, with ESS total score and BMI being particularly pivotal. Conclusion: Machine learning-based prediction models for OSA severity grading and screening can help identify patients with moderate-to-severe OSA early, reveal pertinent risk factors, and facilitate timely interventions against the pathological changes induced by OSA. ESS total score and BMI emerge as the most critical features for predicting OSA, emphasizing their significance in clinical assessment. The dataset will be made publicly available on the author's GitHub.
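The AUC used to assess the validation sets in the OSA study (and reported by several other studies in this list) has a simple probabilistic reading in the binary case: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pair counting; the labels and scores are hypothetical, and multiclass severity grading would additionally need one-vs-rest averaging, which is not shown.

```python
def roc_auc(labels, scores):
    """AUC = P(score of random positive > score of random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]  # hypothetical predicted probabilities
print(roc_auc(labels, scores))  # 0.75
```

Pair counting costs O(positives × negatives), which is fine for a validation set of around 90 patients; sorting-based rank formulas scale better for large cohorts.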

11.
Front Oncol ; 14: 1401496, 2024.
Article in English | MEDLINE | ID: mdl-38812780

ABSTRACT

Liver cancer is one of the most prevalent forms of cancer worldwide. A significant proportion of patients with hepatocellular carcinoma (HCC) are diagnosed at advanced stages, leading to unfavorable treatment outcomes. HCC generally develops in distinct stages, but the diagnostic and intervention markers for each stage remain unclear, so there is an urgent need for precise grading methods for HCC. Machine learning has emerged as an effective technique for precise tumor diagnosis. In this research, we used random forest and LightGBM machine learning algorithms, for the first time, to construct diagnostic models for HCC at various stages of progression. We categorized 118 samples from GSE114564 into three groups: normal liver, precancerous lesion (including chronic hepatitis, liver cirrhosis, and dysplastic nodule), and HCC (including early stage and advanced HCC). The LightGBM model performed outstandingly (accuracy = 0.96, precision = 0.96, recall = 0.96, F1-score = 0.95), and the random forest model also performed well (accuracy = 0.83, precision = 0.83, recall = 0.83, F1-score = 0.83). When HCC progression was divided into the six most refined stages (normal liver, chronic hepatitis, liver cirrhosis, dysplastic nodule, early stage HCC, and advanced HCC), the diagnostic models remained highly effective: the LightGBM model performed well (accuracy = 0.71, precision = 0.71, recall = 0.71, F1-score = 0.72) and again outperformed the random forest model. Overall, we constructed a diagnostic model for HCC progression and identified potential diagnostic characteristic genes for HCC progression.

12.
Sensors (Basel) ; 24(10)2024 May 20.
Article in English | MEDLINE | ID: mdl-38794105

ABSTRACT

Heavy metal pollution in farmland soil threatens soil environmental quality, and rapidly assessing the status of this pollution across a region is an important task. Hyperspectral remote sensing has been widely used to monitor soil heavy metal concentrations, and improving the accuracy and reliability of its estimation models is an active research topic. This study analyzed 440 soil samples from Sihe Town and the surrounding agricultural areas in Yushu City, Jilin Province. To account for differences between soil types, local regression models of heavy metal concentrations (As and Cu) were established based on projection pursuit (PP) and light gradient boosting machine (LightGBM) algorithms, and a spatial distribution map of soil heavy metals in the region was drawn from the estimates. Building local regression models that account for differences between soils improved estimation accuracy: the relative percent difference (RPD) of the As and Cu estimates in black soil increased the most, by 0.30 and 0.26, respectively. The regional spatial distribution map derived from local regression showed high spatial variability. The characteristic bands selected by the PP method accounted for 10-13% of all spectral bands, effectively reducing model complexity. Compared with traditional machine learning models, LightGBM estimated better, with the highest coefficients of determination (R²) on the different soil validation sets reaching 0.73 (As) and 0.75 (Cu).
By accounting for differences in soil type, the PP-LightGBM estimation model constructed in this study effectively improves the accuracy and reliability of hyperspectral estimation of soil heavy metal concentrations, and provides a reference for mapping large-scale spatial distributions of heavy metals from hyperspectral images and for understanding soil environmental quality.
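The accuracy measures cited in the soil study (R², and RPD) can be computed from observed and predicted concentrations as below. This sketch assumes RPD is the usual chemometric ratio of the sample standard deviation of the observed values to the RMSE; conventions differ (n versus n - 1, and which RMSE is used), so treat the exact definition as an assumption. The concentrations are invented.

```python
import math


def regression_metrics(observed, predicted):
    """RMSE, R^2 = 1 - SS_res/SS_tot, and RPD = SD(observed) / RMSE."""
    n = len(observed)
    mean = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / (n - 1))  # sample standard deviation
    return {"RMSE": rmse, "R2": 1 - ss_res / ss_tot, "RPD": sd / rmse}


# Hypothetical As concentrations (mg/kg) for five validation samples.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 3.0, 4.0, 4.5]
print(regression_metrics(observed, predicted))
```

Here R² is 0.95 and RPD is 5.0; a perfect prediction (RMSE = 0) would make RPD undefined, so callers should guard against that case.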

13.
Plant Methods ; 20(1): 77, 2024 May 26.
Article in English | MEDLINE | ID: mdl-38797847

ABSTRACT

BACKGROUND: Taraxacum kok-saghyz Rodin (TKS) is a highly promising source of natural rubber (NR) because of its wide range of suitable planting areas, strong adaptability, and suitability for mechanized planting and harvesting. However, current methods for detecting NR content are relatively cumbersome, necessitating a rapid detection model. This study used near-infrared spectroscopy to establish a rapid detection model for NR content in TKS root segment and powder samples. Near-infrared spectra were obtained from the K445 strain at different growth stages within a year and from 129 TKS samples hybridized with dandelion. The rubber content in the roots was measured using the alkaline boiling method. The Monte Carlo sampling (MCS) method was used to filter abnormal data from the root segment and powder samples, respectively. The SPXY algorithm was used to split the samples into training and validation sets at a 3:1 ratio. The original spectra were preprocessed using moving window smoothing (MWS), standard normal variate (SNV), multiplicative scatter correction (MSC), and first derivative (FD) algorithms. The competitive adaptive reweighted sampling (CARS) algorithm and the chemical characteristic bands of NR were used to screen the bands. Partial least squares (PLS), random forest (RF), light gradient boosting machine (LightGBM), and convolutional neural network (CNN) algorithms were used to build models with the optimal spectral preprocessing method for three band sets: the full band, the CARS-selected bands, and the chemical characteristic bands of NR. The model with the best predictive performance for the high rubber content interval (rubber content > 15%) was also identified. RESULT: The optimal rubber content prediction models for TKS root segments and powder samples were MWS-FD CARS-RF and MWS-FD chemical characteristic band RF, respectively.
Their respective R²P, RMSEP, and RPDP values were 0.951 and 0.979, 1.814 and 1.133, and 4.498 and 6.845. In the high rubber content range, the LightGBM-based model had the best prediction performance, with RMSEP values of 0.752 for root segments and 0.918 for powder samples. CONCLUSIONS: Dried TKS root powder samples are more appropriate than segmented samples for constructing a rubber content prediction model, and their predictive capability is superior. In the high rubber content range in particular, the LightGBM model has superior predictive performance, which could provide a theoretical basis for future rapid detection of rubber content in TKS.
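Three of the spectral preprocessing steps named above (moving window smoothing, standard normal variate, and first derivative) are simple per-spectrum transforms. The sketch below assumes that "MWS-FD" means smoothing followed by a first derivative, uses a shrinking window at the spectrum edges, and takes the derivative as a plain forward difference; many chemometrics tools use Savitzky-Golay derivatives instead. The absorbance values are hypothetical.

```python
def moving_window_smooth(spectrum, window=5):
    """Centred moving average; the window shrinks at the edges (our choice)."""
    half = window // 2
    out = []
    for i in range(len(spectrum)):
        segment = spectrum[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def snv(spectrum):
    """Standard normal variate: centre and scale each spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = (sum((v - mean) ** 2 for v in spectrum) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in spectrum]


def first_derivative(spectrum):
    """Forward difference between adjacent wavelengths (one value shorter)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


# Hypothetical absorbance values at consecutive wavelengths.
raw = [0.0, 0.0, 3.0, 0.0, 0.0]
smoothed = moving_window_smooth(raw, window=3)
print(smoothed)                    # [0.0, 1.0, 1.0, 1.0, 0.0]
print(first_derivative(smoothed))  # [1.0, 0.0, 0.0, -1.0]
```

SNV removes per-sample offset and scale differences (for example from scattering by particles of different size), while the derivative suppresses baseline shifts; smoothing first keeps the derivative from amplifying noise.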

14.
Phys Med Biol ; 69(11)2024 May 30.
Article in English | MEDLINE | ID: mdl-38749471

ABSTRACT

Accurate diagnosis and treatment assessment of liver fibrosis face significant challenges, including inherent limitations in current techniques like sampling errors and inter-observer variability. Addressing this, our study introduces a novel machine learning (ML) framework, which integrates light gradient boosting machine and multivariate imputation by chained equations to enhance liver status assessment using biomechanical markers. Building upon our previously established multiscale mechanical characteristics in fibrotic and treated livers, this framework employs Gaussian Bayesian optimization for post-imputation, significantly improving classification performance. Our findings indicate a marked increase in the precision of liver fibrosis diagnosis and provide a novel, quantitative approach for assessing fibrosis treatment. This innovative combination of multiscale biomechanical markers with advanced ML algorithms represents a transformative step in liver disease diagnostics and treatment evaluation, with potential implications for other areas in medical diagnostics.


Subjects
Liver Cirrhosis, Machine Learning, Biomechanical Phenomena, Humans, Mechanical Phenomena, Bayes Theorem, Animals, Biomarkers/metabolism

15.
BMC Public Health ; 24(1): 1413, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802838

ABSTRACT

OBJECTIVE: To explore the factors affecting delayed medical decision-making in older patients with acute ischemic stroke (AIS) using logistic regression analysis and the Light Gradient Boosting Machine (LightGBM) algorithm, and to compare the two predictive models. METHODS: A cross-sectional study was conducted among 309 patients aged ≥ 60 years with AIS. Demographic characteristics, stroke onset characteristics, prior stroke knowledge, health literacy, and social network were recorded. These data were entered separately into logistic regression analysis and the LightGBM algorithm to build models predicting delay in medical decision-making among older patients with AIS. Accuracy, recall, F1 score, AUC, and precision were compared between the two models. RESULTS: The medical decision-making delay rate in older patients with AIS was 74.76%. The factors affecting delay identified through logistic regression and LightGBM were stroke severity, stroke recognition, previous stroke knowledge, health literacy, and social network (common to both), mode of onset (logistic regression only), and reactions from others (LightGBM only). The LightGBM model performed better, achieving a higher AUC of 0.909. CONCLUSIONS: This study used the LightGBM algorithm to enable early identification of older patients with AIS who are at risk of delayed medical decision-making. The identified factors can provide critical insights for developing early prevention and intervention strategies that reduce such delays and promote patients' health. LightGBM was the better of the two models for predicting delay in medical decision-making among older patients with AIS.


Subjects
Algorithms, Clinical Decision-Making, Ischemic Stroke, Humans, Aged, Female, Male, Cross-Sectional Studies, Logistic Models, Ischemic Stroke/therapy, Middle Aged, Aged, 80 and over, Health Literacy/statistics & numerical data

16.
Environ Sci Technol ; 58(23): 10128-10139, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38743597

ABSTRACT

Pervaporation (PV) is an effective membrane separation process for organic dehydration, recovery, and upgrading. However, membrane materials must be improved beyond the current permeability-selectivity trade-off. In this research, we introduce machine learning (ML) models to identify high-potential polymers, greatly improving efficiency and reducing cost compared with the conventional trial-and-error approach. We used the largest PV data set to date, incorporating polymer fingerprints and features describing membrane structure, operating conditions, and solute properties. Dimensionality reduction, missing-data treatment, seed randomness control, and data leakage management were used to ensure model robustness. The optimized LightGBM models achieved RMSEs (on a logarithmic scale) of 0.447 for the separation factor and 0.360 for total flux. Screening approximately 1 million hypothetical polymers with the ML models identified polymers with a predicted permeation separation index > 30 and a synthetic accessibility score < 3.7 for acetic acid extraction. This study demonstrates the promise of ML for accelerating tailored membrane design.


Subjects
Machine Learning, Polymers, Polymers/chemistry, Membranes, Artificial, Permeability

17.
Sci Rep ; 14(1): 12539, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38822049

ABSTRACT

Mine water inrush is a serious threat to safe mine production, and quickly identifying the type of inrush water source is essential for preventing and controlling water damage. In this study, the aqueous chemical components Na⁺ + K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, and HCO₃⁻ of different aquifers in the Pingdingshan coalfield were selected as features, and surface water, Quaternary pore water, Carboniferous limestone karst water, Permian sandstone water, and Cambrian limestone karst water were used as labels. An intelligent water source discrimination model is proposed that combines data mining, classification models, and reinforcement learning. Because outliers in the samples may interfere with the model's recognition ability, the data distribution was analyzed using box plots, and 20 groups of abnormal samples were excluded. The processed water chemistry data were divided into 80% learning samples and 20% test samples, and the learning samples were used to train a light gradient boosting machine (LightGBM). The tree-structured Parzen estimator (TPE) found optimal values for the main LightGBM hyperparameters in a very short time. Using these hyperparameters improved model accuracy by 13.9%, demonstrating the effectiveness of the TPE algorithm. To further validate the model, TPE-LightGBM was compared with a Random Search-Multilayer Perceptron (RS-MLP) and a Genetic Algorithm-Support Vector Machine (GA-SVM). The accuracies of TPE-LightGBM, RS-MLP, and GA-SVM were 0.931, 0.759, and 0.724, and their generalization errors (RMSE) were 0.415, 1.05, and 1.313, respectively. These results show that TPE-LightGBM is better suited to water source identification and more resistant to overfitting.
A comparison of the information gain of each variable showed that Ca²⁺ contributes the most, so changes in Ca²⁺ concentration deserve particular attention. With its high accuracy and generalization ability, TPE-LightGBM shows good prospects for identifying the sources of sudden water inrush.
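The box-plot screening described above can be sketched with the conventional whisker rule: a value is an outlier if it lies outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. The abstract does not give the exact rule, so the 1.5 factor, the linear-interpolation quartiles (the same scheme as NumPy's default percentile), and dropping a sample when any single ion is out of range are all assumptions. Ion names are used as dictionary keys and the concentrations are hypothetical.

```python
def quartiles(values):
    """Q1 and Q3 by linear interpolation between sorted values."""
    s = sorted(values)

    def q(p):
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    return q(0.25), q(0.75)


def within_whiskers(values, k=1.5):
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whiskers."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return [q1 - k * iqr <= v <= q3 + k * iqr for v in values]


def drop_outlier_samples(samples, ions, k=1.5):
    """Keep only samples that lie within the whiskers for every ion."""
    masks = {ion: within_whiskers([s[ion] for s in samples], k) for ion in ions}
    return [s for i, s in enumerate(samples) if all(masks[ion][i] for ion in ions)]


# Hypothetical concentrations (mg/L); the last sample has an extreme Ca2+ value.
samples = [{"Ca2+": c, "Mg2+": 10.0} for c in [50.0, 52.0, 55.0, 53.0, 51.0, 400.0]]
kept = drop_outlier_samples(samples, ["Ca2+", "Mg2+"])
print(len(kept))  # 5
```

Computing the quartiles separately for each aquifer label, rather than over the pooled data, would avoid discarding genuine differences between water sources; the abstract does not say which was done.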

18.
Clin Transplant ; 38(4): e15316, 2024 04.
Article in English | MEDLINE | ID: mdl-38607291

ABSTRACT

BACKGROUND: The incidence of graft failure following liver transplantation (LTx) has remained consistent. Traditional LTx risk scores have limited accuracy, and although machine learning (ML) has shown promise in other transplant domains, its potential in this area remains uncertain. This study aims to determine the predictive limitations of ML in LTx by replicating methods used in previous heart transplant research. METHODS: This study used the UNOS STAR database, selecting 64,384 adult patients who underwent LTx between 2010 and 2020. Gradient boosting models (XGBoost and LightGBM) were used to predict 14-, 30-, and 90-day graft failure and were compared with a conventional logistic regression model. Models were evaluated using both shuffled and rolling cross-validation (CV), and performance was assessed by the AUC across validation iterations. RESULTS: For 14-day, 30-day, and 90-day graft survival, LightGBM consistently outperformed the other models, achieving the highest AUCs of 0.740, 0.722, and 0.700, respectively, under shuffled CV. However, under rolling CV, performance declined for every ML algorithm. The analysis identified factors influential for graft survival prediction across all models, including total bilirubin, medical condition, recipient age, and donor AST, among others. Several features, such as donor age and recipient history of diabetes, were important in two of the three models. CONCLUSIONS: LightGBM improves short-term graft survival prediction after LTx. However, because medical practices and selection criteria change over time, continuous model evaluation is essential. Future studies should focus on temporal variations and clinical implications and should ensure model transparency for broader medical utility.


Subjects
Liver Transplantation, Adult, Humans, Liver Transplantation/adverse effects, Research Design, Algorithms, Bilirubin, Machine Learning
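The contrast between the two validation schemes in the liver transplant study can be sketched as index generators in Python. Shuffled k-fold mixes patients from all years into every fold, whereas rolling CV only trains on earlier transplant years and tests on a later one, so it mimics deploying a model on future patients and can reveal the temporal drift the authors attribute to changing practices. The abstract does not describe the window, so this sketch assumes an expanding window by transplant year with a hypothetical minimum of three training years; the function names shuffled_kfold and rolling_by_year are illustrative only.

```python
import random


def shuffled_kfold(n, k=5, seed=0):
    """Shuffled k-fold CV: permute indices, so every fold mixes all years."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def rolling_by_year(years, min_train_years=3):
    """Rolling-origin CV with an expanding window.

    For each cutoff year after the first `min_train_years` distinct years,
    train on all earlier years and test on the cutoff year only.
    """
    # Assumption: expanding (not fixed-length) window; the abstract does not specify.
    distinct = sorted(set(years))
    for cutoff in distinct[min_train_years:]:
        train = [i for i, y in enumerate(years) if y < cutoff]
        test = [i for i, y in enumerate(years) if y == cutoff]
        yield train, test
```

In the rolling scheme no test patient ever precedes a training patient in time, which is the property that shuffled k-fold gives up.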
19.
Sci Rep ; 14(1): 7179, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38531936

ABSTRACT

To improve the accuracy of transformer fault diagnosis and counter the low identification accuracy caused by imbalanced samples and insufficient model training, this paper proposes a transformer fault diagnosis method based on SMOTE and NGO-GBDT. First, the Synthetic Minority Over-sampling Technique (SMOTE) was used to expand the minority-class samples. Second, the non-coding ratio method was used to construct multi-dimensional feature parameters, and a Light Gradient Boosting Machine (LightGBM) feature optimization strategy was introduced to select the optimal feature subset. Finally, the Northern Goshawk Optimization (NGO) algorithm was used to optimize the parameters of a Gradient Boosting Decision Tree (GBDT), which then performed the transformer fault diagnosis. The results show that the proposed method reduces misjudgment of minority-class samples. Compared with other ensemble models, it achieves high fault identification accuracy, a low misjudgment rate, and stable performance.
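The SMOTE step can be sketched in pure Python. This follows the classic SMOTE recipe (interpolating between a minority sample and one of its k nearest minority neighbours); the abstract does not report k, the oversampling ratio, or the feature space, so k = 5 and the function name smote are assumptions, and the NGO-GBDT stage is not shown. In practice a library implementation such as the SMOTE class in imbalanced-learn would normally be used instead.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    For each synthetic point: pick a minority sample x, pick one of its k
    nearest minority neighbours z, and return x + gap * (z - x), gap ~ U(0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest (Euclidean) minority neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        z = minority[rng.choice(neighbours[i])]
        gap = rng.random()
        # The new point lies on the segment between x and its chosen neighbour.
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, z)))
    return synthetic
```

Because every synthetic point lies between two real minority samples, SMOTE densifies the minority region rather than duplicating points, which is why it tends to reduce misjudgment of minority fault types compared with plain replication.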

20.
Front Genet ; 15: 1356205, 2024.
Article in English | MEDLINE | ID: mdl-38495672

ABSTRACT

Introduction: Long non-coding RNAs (lncRNAs) have been used clinically as potential prognostic biomarkers for various types of cancer. Identifying associations between lncRNAs and diseases helps reveal potential biomarkers and design efficient therapeutic options. Wet-lab experiments for identifying these associations are costly and laborious. Methods: We developed LDA-SABC, a novel boosting-based framework for lncRNA-disease association (LDA) prediction. LDA-SABC extracts LDA features based on singular value decomposition (SVD) and classifies lncRNA-disease pairs (LDPs) by incorporating LightGBM and AdaBoost into a convolutional neural network. Results: LDA-SABC was evaluated under five-fold cross-validation (CV) on lncRNAs, diseases, and LDPs. It clearly outperformed four classical LDA inference methods (SDLDA, LDNFSGB, LDASR, and IPCAF) in precision, recall, accuracy, F1 score, AUC, and AUPR. Given its accurate LDA prediction, we used LDA-SABC to find potential lncRNA biomarkers for lung cancer. The results suggested that 7SK and HULC may be associated with non-small-cell lung cancer (NSCLC) and lung adenocarcinoma (LUAD), respectively. Conclusion: We hope that the proposed LDA-SABC method can help improve LDA identification.
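One common way to turn an SVD into pair features is sketched below with NumPy. The abstract says only that LDA-SABC extracts features "based on SVD", so the specific scheme here (rank-k embeddings U_k·sqrt(Σ_k) for lncRNAs and V_k·sqrt(Σ_k) for diseases, concatenated per pair) is an assumption, as are the function names svd_embeddings and pair_feature; the LightGBM/AdaBoost/CNN classifier is not shown.

```python
import numpy as np


def svd_embeddings(assoc, k):
    """Rank-k SVD embeddings for the rows (lncRNAs) and columns (diseases).

    assoc: (n_lncRNA, n_disease) matrix of known associations (1 = associated).
    Returns lnc_emb = U_k * sqrt(S_k) and dis_emb = V_k * sqrt(S_k), so that
    lnc_emb @ dis_emb.T is the best rank-k approximation of assoc.
    """
    u, s, vt = np.linalg.svd(np.asarray(assoc, dtype=float), full_matrices=False)
    root = np.sqrt(s[:k])          # split the singular values evenly between sides
    lnc_emb = u[:, :k] * root      # broadcasting scales column c by root[c]
    dis_emb = vt[:k, :].T * root
    return lnc_emb, dis_emb


def pair_feature(lnc_emb, dis_emb, i, j):
    """Feature vector for lncRNA i paired with disease j (hypothetical scheme)."""
    return np.concatenate([lnc_emb[i], dis_emb[j]])
```

Because lnc_emb @ dis_emb.T reproduces the best rank-k approximation of the association matrix, each pair feature carries the low-rank structure of the known associations, which a downstream classifier can then exploit.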
