1 - 20 of 10,606
1.
Sci Rep ; 14(1): 12700, 2024 06 03.
Article En | MEDLINE | ID: mdl-38830957

Fungicide mixtures are an effective strategy for delaying the development of fungicide resistance. In this research, a fixed ratio ray design method was used to generate fifty binary mixtures of five fungicides with diverse modes of action. The interaction of these mixtures was then analyzed using CA and IA models. QSAR modeling was conducted to assess their fungicidal activity through multiple linear regression (MLR), support vector machine (SVM), and artificial neural network (ANN). Most mixtures exhibited additive interaction, with the CA model proving more accurate than the IA model in predicting fungicidal activity. The MLR model showed a good linear correlation between the theoretical descriptors selected by the genetic algorithm and fungicidal activity. However, both ML-based models demonstrated better predictive performance than the MLR model. The ANN model showed slightly better predictability than the SVM model, with R2 and R2cv at 0.91 and 0.81, respectively. For external validation, the R2test value was 0.845. In contrast, the SVM model had values of 0.91, 0.78, and 0.77 for the same metrics. In conclusion, the proposed ML-based model can be a valuable tool for developing potent fungicidal mixtures to delay the emergence of fungicide resistance.
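The abstract does not expand CA and IA. In mixture toxicology they conventionally stand for concentration addition and independent action. A minimal Python sketch of the two textbook formulas follows, under that assumption; the function names and the EC50 values are hypothetical illustration choices, not taken from the paper.

```python
from math import prod

def ca_effect_concentration(fractions, ec_values):
    """Concentration addition: EC_mix = 1 / sum(p_i / EC_i)."""
    return 1.0 / sum(p / ec for p, ec in zip(fractions, ec_values))

def ia_effect(single_effects):
    """Independent action: E_mix = 1 - prod(1 - E_i), effects as fractions in [0, 1]."""
    return 1.0 - prod(1.0 - e for e in single_effects)

# Hypothetical 1:1 binary mixture; the EC50 values are made-up numbers.
ec50_mix = ca_effect_concentration([0.5, 0.5], [2.0, 8.0])

# Hypothetical single-compound effects (50% inhibition each).
effect_mix = ia_effect([0.5, 0.5])
```

Comparing predictions such as these against the observed mixture response is the usual way to label an interaction as additive, synergistic, or antagonistic.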


Fungicides, Industrial , Machine Learning , Quantitative Structure-Activity Relationship , Fungicides, Industrial/pharmacology , Fungicides, Industrial/chemistry , Support Vector Machine , Neural Networks, Computer , Linear Models
2.
Clin Respir J ; 18(5): e13769, 2024 May.
Article En | MEDLINE | ID: mdl-38736274

BACKGROUND: Lung cancer is the leading cause of cancer-related death worldwide. This study aimed to establish novel multiclassification prediction models based on machine learning (ML) to predict the probability of malignancy in pulmonary nodules (PNs) and to compare them with three published models. METHODS: Nine hundred fourteen patients with PNs were collected from four medical institutions (A, B, C and D), and their data were organized into tables containing clinical features, radiologic features and laboratory test features. Patients were divided into benign lesion (BL), precursor lesion (PL) and malignant lesion (ML) groups according to pathological diagnosis. Approximately 80% of patients in A (total/male: 632/269, age: 57.73 ± 11.06) were randomly selected as a training set; the remaining 20% were used as an internal test set; and the patients in B (total/male: 94/53, age: 60.04 ± 11.22), C (total/male: 94/47, age: 59.30 ± 9.86) and D (total/male: 94/61, age: 62.0 ± 11.09) were used as an external validation set. Logistic regression (LR), decision tree (DT), random forest (RF) and support vector machine (SVM) were used to establish prediction models. Finally, the Mayo model, Peking University People's Hospital (PKUPH) model and Brock model were externally validated in our patients. RESULTS: The AUC values of the RF model for MLs, PLs and BLs were 0.80 (95% CI: 0.73-0.88), 0.90 (95% CI: 0.82-0.99) and 0.75 (95% CI: 0.67-0.88), respectively. The weighted average AUC value of the RF model for the external validation set was 0.71 (95% CI: 0.67-0.73), and its AUC values for MLs, PLs and BLs were 0.71 (95% CI: 0.68-0.79), 0.98 (95% CI: 0.88-1.07) and 0.68 (95% CI: 0.61-0.74), respectively. The AUC values of the Mayo model, PKUPH model and Brock model were 0.68 (95% CI: 0.62-0.74), 0.64 (95% CI: 0.58-0.70) and 0.57 (95% CI: 0.49-0.65), respectively. 
CONCLUSIONS: The RF model performed best, and its predictive performance was better than that of the three published models, which may provide a new noninvasive method for the risk assessment of PNs.
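Per-class AUC values in a three-class setting like the one above are commonly obtained one-vs-rest. The following Python sketch shows that calculation under the assumption that this is how the class-wise AUCs were derived (the abstract does not say); the labels and scores are made-up illustration data.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(class_scores, true_classes, target):
    """Treat `target` as the positive class and every other class as negative."""
    labels = [1 if c == target else 0 for c in true_classes]
    return auc(class_scores, labels)

# Made-up example: predicted probability of the malignant class (ML) per patient.
true_classes = ["ML", "ML", "PL", "BL", "BL"]
ml_scores = [0.9, 0.4, 0.5, 0.3, 0.2]
ml_auc = one_vs_rest_auc(ml_scores, true_classes, "ML")
```

A weighted average AUC would then weight each class-wise value by that class's share of the samples.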


Lung Neoplasms , Machine Learning , Multiple Pulmonary Nodules , Aged , Female , Humans , Male , Middle Aged , Decision Trees , Lung Neoplasms/pathology , Lung Neoplasms/diagnosis , Lung Neoplasms/diagnostic imaging , Multiple Pulmonary Nodules/diagnostic imaging , Multiple Pulmonary Nodules/pathology , Multiple Pulmonary Nodules/diagnosis , Predictive Value of Tests , Retrospective Studies , ROC Curve , Solitary Pulmonary Nodule/diagnostic imaging , Solitary Pulmonary Nodule/pathology , Solitary Pulmonary Nodule/diagnosis , Support Vector Machine , Tomography, X-Ray Computed/methods
3.
Aquat Toxicol ; 271: 106936, 2024 Jun.
Article En | MEDLINE | ID: mdl-38723470

In recent years, with the rapid development of society, organic compounds have been released into aquatic environments in various forms, posing a significant threat to the survival of aquatic organisms. The assessment of developmental toxicity is an important part of environmental safety risk systems, helping to identify the potential impacts of organic compounds on the embryonic development of aquatic organisms and enabling early detection and warning of potential ecological risks. Additionally, binary classification models cannot accurately classify organic compounds. Therefore, it is crucial to construct a multiclassification model for predicting the developmental toxicity of organic compounds. In this study, binary and multiclassification models were developed based on the ToxCast™ Phase I chemical library and literature data. The random forest, support vector machine, extreme gradient boosting, adaptive gradient boosting, and C5.0 decision tree algorithms, as well as 8 types of molecular fingerprint were used to establish a multiclassification base model for predicting developmental toxicity through 5-fold cross-validation and external validation. Ultimately, a multiclassification ensemble model was derived through a voting method. The performance of the binary ensemble model, as measured by the balanced accuracy, was 0.918, while that of the multiclassification model was 0.819. The developmental toxicity voting ensemble model (DT-VEM) achieved accuracies of 0.804, 0.834, and 0.855. Furthermore, by utilizing the XGBoost machine learning algorithm to construct separate models for molecular descriptors and substructure molecular fingerprints, we identified several substructures and physical properties related to developmental toxicity. Our research contributes to a more detailed classification of developmental toxicity, providing a new and valuable tool for predicting the developmental toxicity effects of unknown compounds. 
This supplement addresses the limitations of previous tools, as it offers an enhanced ability to predict potential developmental toxicity in novel compounds.
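The abstract mentions a voting ensemble scored by balanced accuracy but does not specify the voting rule. The Python sketch below assumes simple plurality (hard) voting over base-model labels and computes balanced accuracy as the mean per-class recall; all data are made up for illustration.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by plurality vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_model)]

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Made-up three-class toxicity labels and predictions from three base models.
y_true = [0, 0, 1, 1, 2, 2]
model_preds = [
    [0, 0, 1, 2, 2, 2],
    [0, 1, 1, 1, 2, 0],
    [0, 0, 1, 1, 1, 2],
]
ensemble = majority_vote(model_preds)
score = balanced_accuracy(y_true, ensemble)
```

In this toy case each base model makes one mistake, but the mistakes fall on different samples, so the vote recovers every label.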


Water Pollutants, Chemical , Zebrafish , Animals , Water Pollutants, Chemical/toxicity , Embryo, Nonmammalian/drug effects , Toxicity Tests , Embryonic Development/drug effects , Models, Biological , Algorithms , Support Vector Machine , Organic Chemicals/toxicity
4.
BMC Immunol ; 25(1): 26, 2024 May 03.
Article En | MEDLINE | ID: mdl-38702611

BACKGROUND: Early-onset schizophrenia (EOS) is a type of schizophrenia (SCZ) with an age of onset of < 18 years. An abnormal inflammatory immune system may be involved in the occurrence and development of SCZ. We aimed to identify the immune characteristic genes and cells involved in EOS and to further explore the pathogenesis of EOS from the perspective of immunology. METHODS: We obtained microarray data from whole-genome mRNA expression in peripheral blood mononuclear cells (PBMCs); 19 patients with EOS (age: 14.79 ± 1.90) and 18 healthy controls (HC) (age: 15.67 ± 2.40) were involved. We screened for differentially expressed genes (DEGs) using the Limma software package and modular genes using weighted gene co-expression network analysis (WGCNA). In addition, to identify immune characteristic genes and cells, we performed enrichment analysis, immune infiltration analysis, and receiver operating characteristic (ROC) curve analysis; we also used a random forest (RF), a support vector machine (SVM), and the LASSO-Cox algorithm. RESULTS: We selected the following immune characteristic genes: CCL8, PSMD1, AVPR1B and SEMG1. We employed an RF, an SVM, and the LASSO-Cox algorithm. We identified the following immune characteristic cells: activated mast cells, CD4+ memory resting T cells, resting mast cells, neutrophils and CD4+ memory activated T cells. In addition, the AUC values of the immune characteristic genes and cells were all > 0.7. CONCLUSION: Our results indicate that immune system function is altered in SCZ. In addition, CCL8, PSMD1, AVPR1B and SEMG1 may regulate peripheral immune cells in EOS. Further, immune characteristic genes and cells are expected to be diagnostic markers and therapeutic targets of SCZ.


Leukocytes, Mononuclear , Schizophrenia , Humans , Schizophrenia/immunology , Schizophrenia/genetics , Male , Female , Adolescent , Leukocytes, Mononuclear/immunology , Gene Expression Profiling , Age of Onset , Gene Regulatory Networks , Chemokine CCL8/genetics , Immune System , ROC Curve , Support Vector Machine
5.
BMC Med Imaging ; 24(1): 104, 2024 May 03.
Article En | MEDLINE | ID: mdl-38702613

BACKGROUND: The role of isocitrate dehydrogenase (IDH) mutation status for glioma stratification and prognosis is established. While structural magnetic resonance imaging (MRI) is a promising biomarker, it may not be sufficient for non-invasive characterisation of IDH mutation status. We investigated the diagnostic value of combined diffusion tensor imaging (DTI) and structural MRI enhanced by a deep radiomics approach based on convolutional neural networks (CNNs) and support vector machine (SVM), to determine the IDH mutation status in Central Nervous System World Health Organization (CNS WHO) grade 2-4 gliomas. METHODS: This retrospective study analyzed the DTI-derived fractional anisotropy (FA) and mean diffusivity (MD) images and structural images including fluid attenuated inversion recovery (FLAIR), non-enhanced T1-, and T2-weighted images of 206 treatment-naïve gliomas, including 146 IDH-mutant and 60 IDH-wildtype tumors. The lesions were manually segmented by experienced neuroradiologists and the masks were applied to the FA and MD maps. Deep radiomics features were extracted from each subject by applying a pre-trained CNN and statistical description. An SVM classifier was applied to predict IDH status using imaging features in combination with demographic data. RESULTS: We comparatively assessed the CNN-SVM classifier performance in predicting IDH mutation status using standalone and combined structural and DTI-based imaging features. Combined imaging features surpassed standalone modalities for the prediction of IDH mutation status [area under the curve (AUC) = 0.846; sensitivity = 0.925; and specificity = 0.567]. Importantly, optimal model performance was noted following the addition of demographic data (patients' age) to structural and DTI imaging features [AUC = 0.847; sensitivity = 0.911; and specificity = 0.617]. 
CONCLUSIONS: Imaging features derived from DTI-based FA and MD maps combined with structural MRI, have superior diagnostic value to that provided by standalone structural or DTI sequences. In combination with demographic information, this CNN-SVM model offers a further enhanced non-invasive prediction of IDH mutation status in gliomas.


Brain Neoplasms , Diffusion Tensor Imaging , Glioma , Isocitrate Dehydrogenase , Mutation , Humans , Isocitrate Dehydrogenase/genetics , Glioma/diagnostic imaging , Glioma/genetics , Glioma/pathology , Diffusion Tensor Imaging/methods , Retrospective Studies , Male , Female , Middle Aged , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/genetics , Adult , Aged , Neoplasm Grading , Support Vector Machine , Magnetic Resonance Imaging/methods , Neural Networks, Computer , Radiomics
6.
J Neuroeng Rehabil ; 21(1): 72, 2024 May 03.
Article En | MEDLINE | ID: mdl-38702705

BACKGROUND: Neurodegenerative diseases, such as Parkinson's disease (PD), necessitate frequent clinical visits and monitoring to identify changes in motor symptoms and provide appropriate care. By applying machine learning techniques to video data, automated video analysis has emerged as a promising approach to track and analyze motor symptoms, which could facilitate more timely intervention. However, existing solutions often rely on specialized equipment and recording procedures, which limits their usability in unstructured settings like the home. In this study, we developed a method to detect PD symptoms from unstructured videos of clinical assessments, without the need for specialized equipment or recording procedures. METHODS: Twenty-eight individuals with Parkinson's disease completed a video-recorded motor examination that included the finger-to-nose and hand pronation-supination tasks. Clinical staff provided ground truth scores for the level of Parkinsonian symptoms present. For each video, we used a pre-existing model called PIXIE to measure the location of several joints on the person's body and quantify how they were moving. Features derived from the joint angles and trajectories, designed to be robust to recording angle, were then used to train two types of machine-learning classifiers (random forests and support vector machines) to detect the presence of PD symptoms. RESULTS: The support vector machine trained on the finger-to-nose task had an F1 score of 0.93 while the random forest trained on the same task yielded an F1 score of 0.85. The support vector machine and random forest trained on the hand pronation-supination task had F1 scores of 0.20 and 0.33, respectively. CONCLUSION: These results demonstrate the feasibility of developing video analysis tools to track motor symptoms across variable perspectives. These tools do not work equally well for all tasks, however. 
This technology has the potential to overcome barriers to access for many individuals with degenerative neurological diseases like PD, providing them with a more convenient and timely method to monitor symptom progression, without requiring a structured video recording procedure. Ultimately, more frequent and objective home assessments of motor function could enable more precise telehealth optimization of interventions to improve clinical outcomes inside and outside of the clinic.
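The classifiers above are compared by F1 score, the harmonic mean of precision and recall. The Python sketch below shows that metric for reference; the labels are made up and do not come from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 1 = symptom present, 0 = symptom absent.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
f1 = f1_score(y_true, y_pred)
```

Because F1 ignores true negatives, a low value such as those reported for the pronation-supination task signals many missed or spurious detections of the positive class.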


Machine Learning , Parkinson Disease , Video Recording , Humans , Parkinson Disease/diagnosis , Parkinson Disease/physiopathology , Male , Female , Video Recording/methods , Middle Aged , Aged , Support Vector Machine
7.
PeerJ ; 12: e17352, 2024.
Article En | MEDLINE | ID: mdl-38784390

Background: The Yunnan section of the Nujiang River (YNR) Basin in the alpine-valley area is one of the most critical areas of debris flow in China. Methods: We analyzed the applicability of three machine learning algorithms for modeling debris flow susceptibility: Random Forest (RF), the linear kernel support vector machine (Linear SVM), and the radial basis function support vector machine (RBFSVM). We also compared 20 factors to determine the dominant controlling factors in debris flow occurrence in the region. Results: We found that (1) RF outperformed RBFSVM and Linear SVM in terms of accuracy; (2) topographic conditions were prerequisites, and geology, precipitation, vegetation, and anthropogenic influence were critical to forming debris flows, with the relative elevation difference being the most prominent evaluation factor of debris flow susceptibility; and (3) debris flow susceptibility (DFS) maps based on RF showed that zones with very high susceptibility were distributed along the mainstream of the Nujiang River. These findings provide methodological guidance and a reference for improving DFS assessment, and they enrich the content of DFS studies in alpine-valley areas.


Machine Learning , Rivers , China , Rivers/chemistry , Environmental Monitoring/methods , Support Vector Machine , Algorithms
8.
J Neurosci Methods ; 407: 110156, 2024 Jul.
Article En | MEDLINE | ID: mdl-38703796

BACKGROUND: Deep brain stimulation (DBS) entails the insertion of an electrode into the patient's brain, enabling subthalamic nucleus (STN) stimulation. Accurate delineation of STN borders is a critical but time-consuming task, traditionally reliant on the neurosurgeon's experience in deciphering the intricacies of microelectrode recording (MER). While clinical outcomes of MER have been satisfactory, they involve certain risks to patient safety. Recently, there has been a growing interest in exploring the potential of local field potentials (LFP) due to their correlation with the STN motor territory. METHOD: A novel STN detection system, integrating LFP and wavelet packet transform (WPT) with stacking ensemble learning, is developed. Initial steps involve the inclusion of soft thresholding to increase robustness to LFP variability. Subsequently, non-linear WPT features are extracted. Finally, a unique ensemble model, comprising a dual-layer structure, is developed for STN localization. We harnessed the capabilities of support vector machine, decision tree and k-nearest neighbor in conjunction with a long short-term memory (LSTM) network. The LSTM is pivotal for assigning adequate weights to every base model. RESULTS: The proposed model achieved an accuracy of 89.49% and an F1-score of 91.63%. COMPARISON WITH EXISTING METHODS: The ensemble model demonstrated superior performance when compared to standalone base models and existing meta techniques. CONCLUSION: This framework is envisioned to enhance the efficiency of DBS surgery and reduce the reliance on clinician experience for precise STN detection. It could serve as a valuable tool for refining the electrode trajectory, potentially replacing the current methodology based on MER.
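Soft thresholding, mentioned in the METHOD section, has a standard form in wavelet denoising: each value is shrunk toward zero by a threshold, and values inside the threshold band are set to zero. The Python sketch below shows that standard operator. Whether the authors use exactly this form, and at which stage, is an assumption; the input values are made up.

```python
def soft_threshold(signal, threshold):
    """Shrink each sample toward zero by `threshold`; samples within the band become 0."""
    out = []
    for x in signal:
        magnitude = abs(x) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if x > 0 else -magnitude)
    return out

# Made-up coefficients; small values are treated as noise and zeroed.
denoised = soft_threshold([3.0, -0.5, 1.0, -2.5], threshold=1.0)
```

Unlike hard thresholding, this operator is continuous in its input, which tends to reduce artifacts when the noise level varies between recordings.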


Deep Brain Stimulation , Subthalamic Nucleus , Wavelet Analysis , Subthalamic Nucleus/physiology , Humans , Deep Brain Stimulation/methods , Deep Brain Stimulation/instrumentation , Support Vector Machine , Machine Learning , Signal Processing, Computer-Assisted , Microelectrodes
9.
Comput Biol Med ; 176: 108432, 2024 Jun.
Article En | MEDLINE | ID: mdl-38744014

This paper presents a comprehensive exploration of machine learning algorithms (MLAs) and feature selection techniques for accurate heart disease prediction (HDP) in modern healthcare. By focusing on diverse datasets encompassing various challenges, the research sheds light on optimal strategies for early detection. MLAs such as Decision Trees (DT), Random Forests (RF), Support Vector Machines (SVM), Gaussian Naive Bayes (NB), and others were studied, with precision and recall metrics emphasized for robust predictions. Our study addresses challenges in real-world data through data cleaning and one-hot encoding, enhancing the integrity of our predictive models. Feature selection and extraction techniques, namely Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and univariate feature selection, play a crucial role in identifying relevant features and reducing data dimensionality. Our findings showcase the impact of these techniques on improving prediction accuracy. Optimized models for each dataset have been achieved through grid search hyperparameter tuning, with configurations meticulously outlined. Notably, a remarkable 99.12 % accuracy was achieved on the first Kaggle dataset, showcasing the potential for accurate HDP. Model robustness across diverse datasets was highlighted, with caution against overfitting. The study emphasizes the need for validation on unseen data and encourages ongoing research for generalizability. Serving as a practical guide, this research aids researchers and practitioners in HDP model development, influencing clinical decisions and healthcare resource allocation. By providing insights into effective algorithms and techniques, the paper contributes to reducing heart disease-related morbidity and mortality, supporting the healthcare community's ongoing efforts.


Heart Diseases , Machine Learning , Precision Medicine , Humans , Precision Medicine/methods , Algorithms , Support Vector Machine
10.
Spectrochim Acta A Mol Biomol Spectrosc ; 317: 124461, 2024 Sep 05.
Article En | MEDLINE | ID: mdl-38759393

Esophageal cancer is one of the leading causes of cancer-related deaths worldwide. The identification of residual tumor tissues in the surgical margin of esophageal cancer is essential for the treatment and prognosis of cancer patients. But the current diagnostic methods, either pathological frozen section or paraffin section examination, are laborious, time-consuming, and inconvenient. Raman spectroscopy is a label-free and non-invasive analytical technique that provides molecular information with high specificity. Here, we report the use of a portable Raman system and machine learning algorithms to achieve accurate diagnosis of esophageal tumor tissue in surgically resected specimens. We tested five machine learning-based classification methods, including k-Nearest Neighbors, Adaptive Boosting, Random Forest, Principal Component Analysis-Linear Discriminant Analysis, and Support Vector Machine (SVM). Among them, SVM shows the highest accuracy (88.61 %) in classifying the esophageal tumor and normal tissues. The portable Raman system demonstrates robust measurements with an acceptable focal plane shift of up to 3 mm, which enables large-area Raman mapping on resected tissues. Based on this, we finally achieve successful Raman visualization of tumor boundaries on surgical margin specimens, and the Raman measurement time is less than 5 min. This work provides a robust, convenient, accurate, and cost-effective tool for the diagnosis of esophageal cancer tumors, advancing toward Raman-based clinical intraoperative applications.


Esophageal Neoplasms , Machine Learning , Spectrum Analysis, Raman , Support Vector Machine , Spectrum Analysis, Raman/methods , Esophageal Neoplasms/diagnosis , Esophageal Neoplasms/pathology , Humans , Discriminant Analysis , Principal Component Analysis , Algorithms
11.
Comput Biol Med ; 176: 108621, 2024 Jun.
Article En | MEDLINE | ID: mdl-38763067

Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by cognitive decline, memory impairments, and behavioral changes. The presence of abnormal beta-amyloid plaques and tau protein tangles in the brain is known to be associated with AD. However, current limitations of imaging technology hinder the direct detection of these substances. Consequently, researchers are exploring alternative approaches, such as indirect assessments involving monitoring brain signals, cognitive decline levels, and blood biomarkers. Recent studies have highlighted the potential of integrating genetic information into these approaches to enhance early detection and diagnosis, offering a more comprehensive understanding of AD pathology beyond the constraints of existing imaging methods. Our study utilized electroencephalography (EEG) signals, genotypes, and polygenic risk scores (PRSs) as features for machine learning models. We compared the performance of gradient boosting (XGB), random forest (RF), and support vector machine (SVM) to determine the optimal model. Statistical analysis revealed significant correlations between EEG signals and clinical manifestations, demonstrating the ability to distinguish the complexity of AD from other diseases by using genetic information. By integrating EEG with genetic data in an SVM model, we achieved exceptional classification performance, with an accuracy of 0.920 and an area under the curve of 0.916. This study presents a novel approach of utilizing real-time EEG data and genetic background information for multimodal machine learning. The experimental results validate the effectiveness of this concept, providing deeper insights into the actual condition of patients with AD and overcoming the limitations associated with single-oriented data.


Alzheimer Disease , Electroencephalography , Alzheimer Disease/genetics , Alzheimer Disease/physiopathology , Humans , Electroencephalography/methods , Female , Male , Machine Learning , Support Vector Machine , Aged , Signal Processing, Computer-Assisted , Algorithms
12.
Front Biosci (Landmark Ed) ; 29(5): 197, 2024 May 21.
Article En | MEDLINE | ID: mdl-38812315

BACKGROUND: Ubiquitination is a crucial post-translational modification of proteins that regulates diverse cellular functions. Accurate identification of ubiquitination sites in proteins is vital for understanding fundamental biological mechanisms, such as cell cycle and DNA repair. Conventional experimental approaches are resource-intensive, whereas machine learning offers a cost-effective means of accurately identifying ubiquitination sites. The prediction of ubiquitination sites is species-specific, with many existing models being tailored for Arabidopsis thaliana (A. thaliana) and Homo sapiens (H. sapiens). However, these models have shortcomings in sequence window selection and feature extraction, leading to suboptimal performance. METHODS: This study initially employed the chi-square test to determine the optimal sequence window. Subsequently, a combination of six features was assessed: Binary Encoding (BE), Composition of K-Spaced Amino Acid Pair (CKSAAP), Enhanced Amino Acid Composition (EAAC), Position Weight Matrix (PWM), 531 Properties of Amino Acids (AA531), and Position-Specific Scoring Matrix (PSSM). Comparative evaluation involved three feature selection methods: Minimum Redundancy-Maximum Relevance (mRMR), Elastic net, and Null importances. Alongside these were four classifiers: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The Null importances combined with the RF model exhibited superior predictive performance, and was denoted as UbNiRF (A. thaliana: ArUbNiRF; H. sapiens: HoUbNiRF). RESULTS: A comprehensive assessment indicated that UbNiRF is superior to existing prediction tools across five performance metrics. It notably excelled in the Matthews Correlation Coefficient (MCC), with values of 0.827 for the A. thaliana dataset and 0.781 for the H. sapiens dataset. 
Feature analysis underscores the significance of integrating six features and demonstrates their critical role in enhancing model performance. CONCLUSIONS: UbNiRF is a valuable predictive tool for identifying ubiquitination sites in both A. thaliana and H. sapiens. Its robust performance and species-specific discovery capabilities make it extremely useful for elucidating biological processes and disease mechanisms associated with ubiquitination.
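The headline metric above is the Matthews Correlation Coefficient (MCC), which uses all four cells of the confusion matrix and therefore suits imbalanced site-prediction data. The Python sketch below gives the standard formula; the counts are made up and unrelated to the reported results.

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Made-up confusion-matrix counts for a binary site/non-site classifier.
value = mcc(tp=90, tn=85, fp=15, fn=10)
perfect = mcc(tp=5, tn=5, fp=0, fn=0)
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction.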


Arabidopsis , Ubiquitination , Arabidopsis/metabolism , Arabidopsis/genetics , Humans , Computational Biology/methods , Machine Learning , Arabidopsis Proteins/metabolism , Arabidopsis Proteins/genetics , Algorithms , Support Vector Machine , Random Forest
13.
Zhonghua Wei Zhong Bing Ji Jiu Yi Xue ; 36(4): 345-352, 2024 Apr.
Article Zh | MEDLINE | ID: mdl-38813626

OBJECTIVE: To construct and validate the best predictive model for 28-day death risk in patients with septic shock based on different supervised machine learning algorithms. METHODS: Patients with septic shock meeting the Sepsis-3 criteria were selected from the Medical Information Mart for Intensive Care-IV v2.0 (MIMIC-IV v2.0). Patients were randomly allocated, with 70% used as the training set and 30% as the validation set. Relevant predictive variables were extracted from three aspects: (1) demographic characteristics and basic vital signs; (2) serum indicators within 24 hours of intensive care unit (ICU) admission and complications possibly affecting these indicators; and (3) functional scoring and advanced life support. The predictive efficacy for 28-day death in patients with septic shock was compared among models constructed using five mainstream machine learning algorithms, including classification and regression tree (CART), random forest (RF), support vector machine (SVM), linear regression (LR), and super learner [SL; combined CART, RF and extreme gradient boosting (XGBoost)], and the best algorithm model was selected. The optimal predictive variables were determined by intersecting the results from LASSO regression, RF, and XGBoost algorithms, and a predictive model was constructed. The predictive efficacy of the model was validated using receiver operating characteristic (ROC) curves, the accuracy of the model was assessed using calibration curves, and the practicality of the model was verified through decision curve analysis (DCA). RESULTS: A total of 3 295 patients with septic shock were included, with 2 164 surviving and 1 131 dying within 28 days, resulting in a mortality of 34.32%. Of these, 2 307 were in the training set (with 792 deaths within 28 days, a mortality of 34.33%), and 988 in the validation set (with 339 deaths within 28 days, a mortality of 34.31%). 
Five machine learning models were established based on the training set data. After including variables from all three aspects, the area under the ROC curve (AUC) of the RF, SVM, and LR models for predicting 28-day death in septic shock patients in the validation set was 0.823 [95% confidence interval (95%CI) was 0.795-0.849], 0.823 (95%CI was 0.796-0.849), and 0.810 (95%CI was 0.782-0.838), respectively, which were higher than that of the CART model (AUC = 0.750, 95%CI was 0.717-0.782) and the SL model (AUC = 0.756, 95%CI was 0.724-0.789). Thus, these three models were determined to be the best algorithm models. After integrating variables from the three aspects, 16 optimal predictive variables were identified through intersection by LASSO regression, RF, and XGBoost algorithms, including the highest pH value, the highest albumin (Alb), the highest body temperature, the lowest lactic acid (Lac), the highest Lac, the highest serum creatinine (SCr), the highest Ca2+, the lowest hemoglobin (Hb), the lowest white blood cell count (WBC), age, simplified acute physiology score III (SAPS III), the highest WBC, acute physiology score III (APS III), the lowest Na+, body mass index (BMI), and the shortest activated partial thromboplastin time (APTT) within 24 hours of ICU admission. ROC curve analysis showed that the Logistic regression model constructed with the above 16 optimal predictive variables was the best predictive model, with an AUC of 0.806 (95%CI was 0.778-0.835) in the validation set. The calibration curve and DCA curve showed that this model had high accuracy and that its highest net benefit could reach 0.3, significantly outperforming traditional models based on a single functional score [APS III score, SAPS III score, and sequential organ failure assessment (SOFA) score], which had AUC (95%CI) values of 0.746 (0.715-0.778), 0.765 (0.734-0.796), and 0.625 (0.589-0.661), respectively. 
CONCLUSIONS: The Logistic regression model, constructed using 16 optimal predictive variables including pH value, Alb, body temperature, Lac, SCr, Ca2+, Hb, WBC, SAPS III score, APS III score, Na+, BMI, and APTT, is identified as the best predictive model for the 28-day death risk in patients with septic shock. Its performance is stable, with high discriminative ability and accuracy.
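Decision curve analysis (DCA) reports net benefit at a chosen risk threshold. The Python sketch below uses the standard definition, net benefit = TP/n - (FP/n) * pt/(1 - pt), where pt is the threshold probability; the outcomes and predicted risks are made up and are not the study's data.

```python
def net_benefit(y_true, risks, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risks) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

# Made-up outcomes (1 = died within 28 days) and predicted risks for ten patients.
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
risks = [0.9, 0.7, 0.6, 0.2, 0.8, 0.1, 0.3, 0.4, 0.35, 0.05]
nb = net_benefit(y_true, risks, threshold=0.5)
```

A decision curve plots this quantity over a range of thresholds and compares it with the "treat all" and "treat none" strategies.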


Algorithms , Shock, Septic , Supervised Machine Learning , Support Vector Machine , Humans , Shock, Septic/mortality , Shock, Septic/diagnosis , Female , Prognosis , Intensive Care Units , Male , Middle Aged , Machine Learning , Decision Trees
14.
Spectrochim Acta A Mol Biomol Spectrosc ; 316: 124351, 2024 Aug 05.
Article En | MEDLINE | ID: mdl-38692109

Epidermal growth factor receptor (EGFR) plays a pivotal role in the initiation and progression of gliomas. In particular, in glioblastoma, EGFR amplification emerges as a catalyst for invasion, proliferation, and resistance to radiotherapy and chemotherapy. Current approaches are not capable of providing rapid diagnostic results of molecular pathology. In this study, we propose, for the first time, a terahertz spectroscopic approach for predicting the EGFR amplification status of gliomas. A machine learning model was constructed using the terahertz response of the measured glioma tissues, including the absorption coefficient, refractive index, and dielectric loss tangent. The novelty of our model is the integration of three classical base classifiers, i.e., support vector machine, random forest, and extreme gradient boosting. Because the ensemble learning method combines the advantages of the various base classifiers, the model has greater generalization ability. The effectiveness of the proposed method was validated by applying an individual test set. The optimal performance of the integrated algorithm was verified with a maximum area under the curve (AUC) of 85.8 %. This signifies a significant stride toward more effective and rapid diagnostic tools for guiding postoperative therapy in gliomas.


ErbB Receptors , Glioma , Terahertz Spectroscopy , Humans , Glioma/genetics , Glioma/pathology , Glioma/diagnosis , ErbB Receptors/genetics , ErbB Receptors/metabolism , Terahertz Spectroscopy/methods , Machine Learning , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Gene Amplification , Algorithms , Support Vector Machine
15.
Sci Rep ; 14(1): 12043, 2024 05 27.
Article En | MEDLINE | ID: mdl-38802547

This study compared the diagnostic value of different enhancement phases for distinguishing low from high nuclear grade clear cell renal cell carcinoma (ccRCC) on enhanced computed tomography (CT) images by building machine learning classifiers. A total of 51 patients (Dataset1, including 41 low-grade and 10 high-grade) and 27 patients (independent Dataset2, including 16 low-grade and 11 high-grade) with pathologically proven ccRCC were enrolled in this retrospective study. Radiomic features were extracted from the corticomedullary phase (CMP), nephrographic phase (NP), and excretory phase (EP) CT images and selected using the recursive feature elimination cross-validation (RFECV) algorithm. Group differences in continuous variables were assessed using the t-test and the Mann-Whitney U test. Support vector machine (SVM), random forest (RF), XGBoost (XGB), VGG11, ResNet18, and GoogLeNet classifiers were established to distinguish low-grade from high-grade ccRCC. The classifiers based on NP CT images (Dataset1, RF: AUC = 0.82 ± 0.05, ResNet18: AUC = 0.81 ± 0.02; Dataset2, XGB: AUC = 0.95 ± 0.02, ResNet18: AUC = 0.87 ± 0.07) achieved the best performance and robustness in distinguishing low-grade from high-grade ccRCC, while the EP-based classifiers gave poorer results. The NP CT images therefore performed best in diagnosing low and high nuclear grade ccRCC. The firstorder_Kurtosis and firstorder_90Percentile features play a vital role in the classification task.


Carcinoma, Renal Cell , Kidney Neoplasms , Neoplasm Grading , Tomography, X-Ray Computed , Humans , Carcinoma, Renal Cell/diagnostic imaging , Carcinoma, Renal Cell/pathology , Carcinoma, Renal Cell/diagnosis , Tomography, X-Ray Computed/methods , Female , Male , Middle Aged , Kidney Neoplasms/diagnostic imaging , Kidney Neoplasms/pathology , Kidney Neoplasms/diagnosis , Kidney Neoplasms/classification , Aged , Retrospective Studies , Support Vector Machine , Adult , Machine Learning , Algorithms
16.
Lasers Med Sci ; 39(1): 123, 2024 May 04.
Article En | MEDLINE | ID: mdl-38703302

Interaction of polarized light with healthy and abnormal regions of tissue reveals structural information associated with its pathological condition. Even a slight variation in structural alignment can induce a change in polarization properties, which can play a crucial role in the early detection of abnormal tissue morphology. We propose a transmission-based Stokes-Mueller microscope for quantitative analysis of the microstructural properties of the tissue specimen. Stokes-Mueller polarization microscopy provides significant structural information about tissue through polarization parameters such as degree of polarization (DOP), degree of linear polarization (DOLP), degree of circular polarization (DOCP), and anisotropy (r), and through Mueller decomposition parameters such as diattenuation, retardance, and depolarization. Further, the output images were analysed effectively using machine learning (ML) as the image processing technique. The support vector machine image classification model achieved 95.78% validation accuracy and 94.81% testing accuracy on the polarization parameter dataset. The study's findings demonstrate the potential of Stokes-Mueller polarimetry in tissue characterization and diagnosis, providing a valuable tool for biomedical applications.
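
For readers unfamiliar with the polarization parameters named above, the following Python sketch computes DOP, DOLP, and DOCP from a single Stokes vector (S0, S1, S2, S3) using the standard textbook definitions. The abstract does not give the authors' formulas, so the function name `polarization_degrees` and the choice to report DOCP as an unsigned magnitude are assumptions, and the example Stokes vectors are made up.

```python
import math


def polarization_degrees(s0, s1, s2, s3):
    """Return (DOP, DOLP, DOCP) for one Stokes vector.

    Textbook definitions (assumed, not quoted from the paper):
      DOP  = sqrt(S1^2 + S2^2 + S3^2) / S0
      DOLP = sqrt(S1^2 + S2^2) / S0
      DOCP = |S3| / S0
    """
    if s0 <= 0:
        raise ValueError("S0 (total intensity) must be positive")
    dop = math.sqrt(s1 ** 2 + s2 ** 2 + s3 ** 2) / s0
    dolp = math.sqrt(s1 ** 2 + s2 ** 2) / s0
    docp = abs(s3) / s0
    return dop, dolp, docp


# Made-up pixel: partially linearly polarized light with no circular component.
partial_linear = polarization_degrees(1.0, 0.3, 0.4, 0.0)

# Made-up pixel: fully circularly polarized light.
full_circular = polarization_degrees(2.0, 0.0, 0.0, 2.0)
```

In an imaging setting these quantities would be evaluated per pixel to build the parameter maps that feed the classifier.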


Breast Neoplasms , Machine Learning , Microscopy, Polarization , Humans , Microscopy, Polarization/methods , Breast Neoplasms/pathology , Female , Support Vector Machine , Image Processing, Computer-Assisted/methods , Carcinoma, Ductal, Breast/pathology , Carcinoma, Ductal, Breast/classification , Carcinoma, Ductal, Breast/diagnostic imaging
17.
PLoS One ; 19(5): e0303287, 2024.
Article En | MEDLINE | ID: mdl-38739586

Globally, stroke is the third-leading cause of mortality and disability combined, and one of the costliest diseases in society. More accurate predictions of stroke outcomes can guide healthcare organizations in allocating appropriate resources to improve care and reduce both the economic and social burden of the disease. We aim to develop and evaluate the performance and explainability of three supervised machine learning models and the traditional multinomial logistic regression (mLR) in predicting functional dependence and death three months after stroke, using routinely-collected data. This prognostic study included adult patients, registered in the Swedish Stroke Registry (Riksstroke) from 2015 to 2020. Riksstroke contains information on stroke care and outcomes among patients treated in hospitals in Sweden. Prognostic factors (features) included demographic characteristics, pre-stroke functional status, cardiovascular risk factors, medications, acute care, stroke type, and severity. The outcome was measured using the modified Rankin Scale at three months after stroke (a scale of 0-2 indicates independent, 3-5 dependent, and 6 dead). Outcome prediction models included support vector machines, artificial neural networks (ANN), eXtreme Gradient Boosting (XGBoost), and mLR. The models were trained and evaluated on 75% and 25% of the dataset, respectively. Model predictions were explained using SHAP values. The study included 102,135 patients (85.8% ischemic stroke, 53.3% male, mean age 75.8 years, and median NIHSS of 3). All models demonstrated similar overall accuracy (69%-70%). The ANN and XGBoost models performed significantly better than the mLR in classifying dependence with F1-scores of 0.603 (95% CI; 0.594-0.611) and 0.577 (95% CI; 0.568-0.586), versus 0.544 (95% CI; 0.545-0.563) for the mLR model. The factors that contributed most to the predictions were expectedly similar in the models, based on clinical knowledge. 
Our ANN and XGBoost models showed a modest improvement in prediction performance and explainability compared to mLR using routinely-collected data. Their improved ability to predict functional dependence may be of particular importance for the planning and organization of acute stroke care and rehabilitation.
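
The outcome coding and the per-class F1-score used in this abstract can be illustrated with a short Python sketch. The mapping of modified Rankin Scale scores to the three outcome classes follows the abstract (0-2 independent, 3-5 dependent, 6 dead); the function names and the example patients and predictions are hypothetical.

```python
def mrs_to_outcome(score):
    """Map a modified Rankin Scale score (0-6) to the abstract's three classes."""
    if not isinstance(score, int) or not 0 <= score <= 6:
        raise ValueError("mRS score must be an integer from 0 to 6")
    if score <= 2:
        return "independent"
    if score <= 5:
        return "dependent"
    return "dead"


def f1_for_class(y_true, y_pred, label):
    """One-vs-rest F1-score for a single class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Made-up three-month mRS scores and made-up model predictions.
observed = [mrs_to_outcome(s) for s in [0, 3, 4, 6, 2, 5]]
predicted = ["independent", "dependent", "independent",
             "dead", "dependent", "dependent"]
f1_dependent = f1_for_class(observed, predicted, "dependent")
```

Reporting F1 per class, as the abstract does for dependence, shows differences between models that a similar overall accuracy can hide.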


Machine Learning , Stroke , Humans , Sweden/epidemiology , Male , Female , Stroke/physiopathology , Aged , Aged, 80 and over , Prognosis , Middle Aged , Registries , Support Vector Machine , Logistic Models , Neural Networks, Computer , Risk Factors
18.
PLoS One ; 19(5): e0302639, 2024.
Article En | MEDLINE | ID: mdl-38739639

Heart failure (HF) encompasses a diverse clinical spectrum, including instances of transient HF or HF with recovered ejection fraction, alongside persistent cases. This dynamic condition has a growing prevalence and entails substantial healthcare expenditures, which are expected to escalate in the future. Classifying HF patients into three groups based on their ejection fraction, reduced (HFrEF), mid-range (HFmEF), and preserved (HFpEF), is essential for diagnosis, risk assessment, treatment choice, and the ongoing monitoring of heart failure. Nevertheless, obtaining a definitive classification is challenging and relies on echocardiography. By contrast, an electrocardiogram (ECG) provides a straightforward, quick, continuous assessment of the patient's cardiac rhythm, serving as a cost-effective adjunct to echocardiography. In this research, we evaluate several machine learning (ML)-based classification models, namely K-nearest neighbors (KNN), neural networks (NN), support vector machines (SVM), and decision trees (TREE), to classify left ventricular ejection fraction (LVEF) for three categories of HF patients at hourly intervals, using 24-hour ECG recordings. Data from a heterogeneous group of 303 heart failure patients, encompassing the HFpEF, HFmEF, and HFrEF classes, were acquired from a multicenter dataset involving both American and Greek populations. Features extracted from ECG data were used to train the aforementioned ML classification models, with training performed in one-hour intervals. To optimize the classification of LVEF levels in coronary artery disease (CAD) patients, a nested cross-validation approach was employed for hyperparameter tuning. HF patients were best classified using the TREE and KNN models, with overall accuracies of 91.2% and 90.9% and average areas under the receiver operating characteristic curve (AUROC) of 0.98 and 0.99, respectively.
Furthermore, according to the experimental findings, the time periods of midnight-1 am, 8-9 am, and 10-11 pm were the ones that contributed to the highest classification accuracy. The results pave the way for creating an automated screening system tailored for patients with CAD, utilizing optimal measurement timings aligned with their circadian cycles.
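
The nested cross-validation mentioned in this abstract can be sketched in plain Python as follows. The inner loop selects a hyperparameter using only the outer training indices, and the outer loop scores that choice on data the inner loop never saw. The abstract gives no fold counts or hyperparameter grids, so `k_fold_indices`, `nested_cv`, `toy_score`, the fold counts, and the candidate values are all hypothetical; a real study would train and evaluate an actual classifier inside the scoring function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def nested_cv(n_samples, outer_k, inner_k, candidates, score_fn):
    """Return one (best_candidate, outer_score) pair per outer fold.

    score_fn(train_idx, test_idx, candidate) -> float, higher is better.
    """
    results = []
    for outer_train, outer_test in k_fold_indices(n_samples, outer_k):

        def inner_mean(candidate):
            # Tune on splits of the outer training set only.
            scores = []
            for in_train, in_val in k_fold_indices(len(outer_train), inner_k):
                train_idx = [outer_train[i] for i in in_train]
                val_idx = [outer_train[i] for i in in_val]
                scores.append(score_fn(train_idx, val_idx, candidate))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_mean)
        # Evaluate the selected candidate on the held-out outer fold.
        results.append((best, score_fn(outer_train, outer_test, best)))
    return results


def toy_score(train_idx, test_idx, candidate):
    """Hypothetical stand-in for 'fit on train, score on test'; peaks at 3."""
    return -abs(candidate - 3.0)


results = nested_cv(n_samples=12, outer_k=3, inner_k=2,
                    candidates=[1, 3, 5], score_fn=toy_score)
```

Keeping hyperparameter selection inside the outer training folds is what prevents the reported accuracy from being inflated by tuning on the test data.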


Electrocardiography , Heart Failure , Machine Learning , Stroke Volume , Ventricular Function, Left , Humans , Heart Failure/physiopathology , Heart Failure/diagnosis , Female , Male , Electrocardiography/methods , Aged , Ventricular Function, Left/physiology , Middle Aged , Circadian Rhythm/physiology , Support Vector Machine , Neural Networks, Computer
19.
J Forensic Odontostomatol ; 42(1): 22-29, 2024 Apr 30.
Article En | MEDLINE | ID: mdl-38742569

BACKGROUND: The use of volumetric segmentation for adult dental age estimation (DAE) from cone-beam computed tomography (CBCT) was further expanded using the current 5-Part Tooth Segmentation (SG) method. Additionally, supervised machine learning models, namely support vector regression (SVR) with linear and polynomial kernels and a regression tree, were tested and compared with the multiple linear regression model. MATERIAL AND METHODS: CBCT scans from 99 patients aged 20 to 59.99 years were collected. Eighty eligible teeth, including maxillary canines, lateral incisors, and central incisors, were used in this study. Enamel to dentine volume ratio, pulp to dentine volume ratio, lower tooth volume ratio, and sex were used as independent variables to predict chronological age. RESULTS: No multicollinearity was detected in the models. The best-performing model was obtained for the maxillary lateral incisor using SVR with a polynomial kernel ( = 0.73). The lowest error was also achieved for the maxillary lateral incisor, with 4.86 years of mean average error and 6.05 years of root mean squared error. However, the method demands a complex approach to segment the enamel volume in the crown section and a lengthier labour time of 45 minutes per tooth.
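
The two error measures reported in this abstract can be computed as in the Python sketch below, taking the first to be the usual mean absolute error (an assumption about the abstract's wording). The function names and the example ages are hypothetical and are not data from the study.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all cases."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up chronological ages and model estimates, in years.
chronological_age = [25.0, 40.0, 55.0]
estimated_age = [30.0, 38.0, 49.0]

mae_years = mean_absolute_error(chronological_age, estimated_age)
rmse_years = root_mean_squared_error(chronological_age, estimated_age)
```

Because squaring weights large misses more heavily, the root mean squared error is never smaller than the mean absolute error, which matches the ordering of the two figures reported above.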


Age Determination by Teeth , Cone-Beam Computed Tomography , Machine Learning , Humans , Adult , Age Determination by Teeth/methods , Male , Female , Young Adult , Middle Aged , Dental Enamel/diagnostic imaging , Dentin/diagnostic imaging , Linear Models , Dental Pulp/diagnostic imaging , Support Vector Machine
20.
Medicine (Baltimore) ; 103(20): e38001, 2024 May 17.
Article En | MEDLINE | ID: mdl-38758850

To identify disease signature genes associated with immune infiltration in nonalcoholic steatohepatitis (NASH), we downloaded 2 publicly available gene expression profiles, GSE164760 and GSE37031, from the Gene Expression Omnibus database. These profiles represent human NASH and control samples and were used to screen for differentially expressed genes (DEGs). Two machine learning methods, the Least Absolute Shrinkage and Selection Operator regression model and Support Vector Machine Recursive Feature Elimination, were used to identify candidate disease signature genes. The CIBERSORT deconvolution algorithm was employed to analyze the infiltration of 22 immune cell types in NASH. Additionally, we constructed a NASH cell model using HepG2 cells treated with oleic acid and free fatty acids. The construction of the cell model was verified using oil red O staining, and Western blotting was used to detect the protein expression of the disease signature genes in both control and model groups. A total of 262 DEGs were identified. These DEGs were primarily associated with metal ion transmembrane transporter activity, sodium ion transmembrane transporter protein activity, calcium ion, and neuroactive ligand-receptor interactions. FOS, IGFBP2, dual-specificity phosphatase 1 (DUSP1), and IKZF3 were identified as disease signature genes of NASH by applying the Least Absolute Shrinkage and Selection Operator and Support Vector Machine Recursive Feature Elimination algorithms to the DEGs. The receiver operating characteristic curves showed that FOS, IGFBP2, DUSP1, and IKZF3 had good diagnostic value (area under the receiver operating characteristic curve > 0.8). These findings were validated in the GSE89632 dataset and through cellular assays. Immunocyte infiltration analysis revealed that NASH was associated with CD8 T cells, CD4 T cells, follicular helper T cells, resting NK cells, eosinophils, regulatory T cells, and γδ T cells.
The FOS, IGFBP2, DUSP1, and IKZF3 genes were specifically associated with follicular helper T cells. Lipid droplet aggregation significantly increased in HepG2 cells treated with oleic acid and free fatty acids, indicating successful construction of the cell model. In this model, the expression of FOS, IGFBP2, and DUSP1 was significantly decreased, while that of IKZF3 was significantly elevated (P < .01, P < .001) compared with the control group. Therefore, FOS, IGFBP2, DUSP1, and IKZF3 can be considered as disease signature genes associated with immune infiltration in NASH.
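
The diagnostic value quoted for each gene is an area under the receiver operating characteristic curve. As a minimal sketch, the Python below computes that area through its rank interpretation: the fraction of (case, control) pairs in which the case has the higher marker value, with ties counted as one half. The function name `roc_auc`, the labels, and the expression values are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for disease samples, 0 for controls.
    scores: marker values where higher is taken to mean 'more likely disease'.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up marker values for three disease samples and three controls.
sample_labels = [1, 1, 1, 0, 0, 0]
marker_values = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(sample_labels, marker_values)
```

For a gene whose expression is lower in disease, the marker values would be negated (or the labels swapped) before applying the same calculation.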


Machine Learning , Non-alcoholic Fatty Liver Disease , Humans , Non-alcoholic Fatty Liver Disease/genetics , Non-alcoholic Fatty Liver Disease/immunology , Hep G2 Cells , Gene Expression Profiling/methods , Algorithms , Support Vector Machine , Transcriptome
...