1 - 20 of 10,609
1.
Sci Rep ; 14(1): 12700, 2024 06 03.
Article En | MEDLINE | ID: mdl-38830957

Fungicide mixtures are an effective strategy for delaying the development of fungicide resistance. In this research, a fixed-ratio ray design was used to generate fifty binary mixtures of five fungicides with diverse modes of action. The interactions in these mixtures were then analyzed using concentration addition (CA) and independent action (IA) models. QSAR modeling was conducted to assess their fungicidal activity through multiple linear regression (MLR), support vector machine (SVM), and artificial neural network (ANN) models. Most mixtures exhibited additive interactions, with the CA model proving more accurate than the IA model in predicting fungicidal activity. The MLR model showed a good linear correlation between fungicidal activity and theoretical descriptors selected by a genetic algorithm. However, both ML-based models demonstrated better predictive performance than the MLR model. The ANN model showed slightly better predictability than the SVM model, with R² and R²cv of 0.91 and 0.81, respectively, and an R²test of 0.845 in external validation. In contrast, the SVM model had values of 0.91, 0.78, and 0.77 for the same metrics. In conclusion, the proposed ML-based model can be a valuable tool for developing potent fungicidal mixtures that delay the emergence of fungicide resistance.


Fungicides, Industrial , Machine Learning , Quantitative Structure-Activity Relationship , Fungicides, Industrial/pharmacology , Fungicides, Industrial/chemistry , Support Vector Machine , Neural Networks, Computer , Linear Models
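The CA and IA reference models named in the fungicide-mixture abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: CA predicts the mixture EC50 from each component's EC50 and its fraction in the fixed-ratio mixture, and IA combines component effects as independent probabilities. The log-logistic (Hill) concentration-response curve used for IA is an assumption; the abstract does not state which curve was fitted.

```python
import math

def ca_ec50(fractions, ec50s):
    """Concentration addition (CA): predicted EC50 of a fixed-ratio mixture.
    fractions: proportion of each component in the mixture (must sum to 1)."""
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    return 1.0 / sum(p / ec for p, ec in zip(fractions, ec50s))

def hill(c, ec50, slope):
    """Assumed log-logistic (Hill) concentration-response curve, effect in [0, 1]."""
    if c <= 0:
        return 0.0
    return 1.0 / (1.0 + (ec50 / c) ** slope)

def ia_effect(total_conc, fractions, ec50s, slopes):
    """Independent action (IA): combined effect of components acting independently."""
    unaffected = 1.0
    for p, ec, h in zip(fractions, ec50s, slopes):
        # Each component is present at its fraction of the total mixture concentration.
        unaffected *= 1.0 - hill(p * total_conc, ec, h)
    return 1.0 - unaffected
```

For a 1:1 mixture of components with EC50 values of 2 and 8 (hypothetical units), `ca_ec50([0.5, 0.5], [2.0, 8.0])` gives 3.2. An observed mixture EC50 close to the CA prediction is what "additive interaction" means in this framework.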
2.
Brain Behav ; 14(6): e3550, 2024 Jun.
Article En | MEDLINE | ID: mdl-38841739

BACKGROUND: Cerebral specialization and interhemispheric cooperation are two vital features of the human brain. Their dysfunction may be associated with disease progression in patients with Alzheimer's disease (AD), which is characterized by progressive cognitive degeneration and asymmetric neuropathology. OBJECTIVE: This study aimed to examine and define two inherent properties of hemispheric function in patients with AD using resting-state functional magnetic resonance imaging (rs-fMRI). METHODS: Sixty-four clinically diagnosed AD patients and 52 age- and sex-matched cognitively normal subjects were recruited and underwent MRI and clinical evaluation. We calculated and compared brain specialization (autonomy index, AI) and interhemispheric cooperation (connectivity between functionally homotopic voxels, CFH). RESULTS: Compared with healthy controls, patients with AD exhibited enhanced AI in the left middle occipital gyrus. This increase in specialization can be attributed to reduced functional connectivity in contralateral regions, such as the right temporal lobe. The CFH of the bilateral precuneus and prefrontal areas was significantly decreased in AD patients compared with controls. Imaging-cognitive correlation analysis indicated that, in patients, the CFH of the right prefrontal cortex was marginally positively correlated with the Montreal Cognitive Assessment score and the Auditory Verbal Learning Test score. Moreover, using abnormal AI and CFH values as features, support vector machine-based classification achieved good accuracy, sensitivity, specificity, and area under the curve under leave-one-out cross-validation. CONCLUSION: This study suggests that individuals with AD have abnormal cerebral specialization and interhemispheric cooperation, providing new insights for further elucidation of the pathological mechanisms of AD.


Alzheimer Disease , Magnetic Resonance Imaging , Humans , Alzheimer Disease/physiopathology , Alzheimer Disease/diagnostic imaging , Female , Male , Aged , Magnetic Resonance Imaging/methods , Brain/physiopathology , Brain/diagnostic imaging , Middle Aged , Support Vector Machine , Aged, 80 and over
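The Alzheimer's abstract above evaluates its classifier by leave-one-out cross-validation (LOOCV). The sketch below shows only the LOOCV loop and the accuracy, sensitivity, and specificity bookkeeping. To stay dependency-free it plugs in a simple nearest-centroid classifier as a stand-in; this is explicitly NOT the SVM used in the study, and any other `fit`/`predict` pair can be swapped in.

```python
def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT an SVM): store the mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(model, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: dist2(model[label]))

def loocv(X, y, fit=nearest_centroid_fit, predict=nearest_centroid_predict, positive=1):
    """Leave-one-out CV: train on n-1 samples, test on the held-out one, repeat n times."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = predict(model, X[i])
        if y[i] == positive:
            tp += pred == positive
            fn += pred != positive
        else:
            tn += pred != positive
            fp += pred == positive
    n = len(X)
    return {"accuracy": (tp + tn) / n,
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan")}
```

Each sample is scored by a model that never saw it, which is why LOOCV is a common choice for cohorts of roughly this size (64 patients and 52 controls).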
3.
Head Neck ; 46(7): 1660-1670, 2024 Jul.
Article En | MEDLINE | ID: mdl-38695435

OBJECTIVE: This study aimed to explore the potential predictive value of oral microbial signatures for oral squamous cell carcinoma (OSCC) risk using machine learning algorithms. METHODS: Oral microbiome signatures were assessed in unstimulated saliva samples from 80 OSCC patients and 179 healthy individuals using 16S rRNA gene sequencing. Four different machine learning classifiers were used to develop prediction models. RESULTS: Compared with control participants, OSCC patients had a higher microbial dysbiosis index (MDI, p < 0.001). Among the four machine learning classifiers, random forest (RF) provided the best predictive performance, followed by support vector machines, artificial neural networks, and naive Bayes. After controlling for potential confounders using propensity score matching, the optimal RF model was further developed incorporating a minimal set of 20 bacterial genera, exhibiting better predictive performance than the MDI (AUC: 0.992 vs. 0.775, p < 0.001). CONCLUSIONS: The novel MDI and RF model developed in this study based on oral microbiome signatures may serve as noninvasive tools for predicting OSCC risk.


Carcinoma, Squamous Cell , Machine Learning , Microbiota , Mouth Neoplasms , Saliva , Humans , Mouth Neoplasms/microbiology , Male , Female , Middle Aged , Saliva/microbiology , Carcinoma, Squamous Cell/microbiology , Case-Control Studies , Aged , Algorithms , Predictive Value of Tests , Adult , Dysbiosis/microbiology , Mouth/microbiology , RNA, Ribosomal, 16S/genetics , Support Vector Machine
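The OSCC abstract above does not define its microbial dysbiosis index. A common formulation in the microbiome literature is the log ratio of the summed abundance of taxa enriched in cases to the summed abundance of taxa depleted in cases; the sketch below assumes that form, a base-10 logarithm, and a small pseudocount. The study's actual taxa lists and formula may differ, and the genus names in the usage example are hypothetical.

```python
import math

def dysbiosis_index(abundances, enriched, depleted, pseudocount=1e-6):
    """Assumed MDI form: log10(sum of case-enriched taxa / sum of case-depleted taxa).
    abundances: dict mapping genus name -> relative abundance for one saliva sample."""
    up = sum(abundances.get(g, 0.0) for g in enriched)
    down = sum(abundances.get(g, 0.0) for g in depleted)
    # The pseudocount avoids division by zero or log(0) when a group is absent.
    return math.log10((up + pseudocount) / (down + pseudocount))
```

For a sample `{"GenusA": 0.2, "GenusB": 0.2, "GenusC": 0.04}` with `GenusA` and `GenusB` treated as enriched and `GenusC` as depleted, the score is close to 1.0; higher values indicate a profile shifted toward the case-associated taxa.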
4.
Braz J Microbiol ; 55(2): 1219-1229, 2024 Jun.
Article En | MEDLINE | ID: mdl-38705959

Cyanobacteria have developed acclimation strategies to adapt to harsh environments, making them model organisms. Understanding the molecular mechanisms of tolerance to abiotic stresses can help elucidate how cells change their gene expression patterns in response to stress. Recent advances in sequencing techniques and bioinformatics analysis methods have led to the discovery of many genes involved in stress responses. Synechocystis sp. PCC 6803 is a suitable microorganism for studying transcriptome responses under environmental stress. Therefore, for the first time, we employed two effective feature selection techniques, namely support vector machine recursive feature elimination (SVM-RFE) and LASSO (Least Absolute Shrinkage and Selection Operator), to pinpoint the crucial genes responsive to environmental stresses in Synechocystis sp. PCC 6803. We applied these machine learning algorithms to analyze the transcriptomic data of Synechocystis sp. PCC 6803 under distinct conditions, encompassing light, salt, and iron stress. Seven candidate genes, namely sll1862, slr0650, sll0760, slr0091, ssl3044, slr1285, and slr1687, were selected by both the LASSO and SVM-RFE algorithms. RNA-seq analysis was performed to validate the efficiency of our feature selection approach in selecting the most important genes. The RNA-seq analysis revealed significantly high expression of five genes, namely sll1862, slr1687, ssl3044, slr1285, and slr0650, under the ion stress condition. Among these five genes, ssl3044 and slr0650 could be introduced as new potential candidate genes for further confirmatory genetic studies to determine their roles in the response to abiotic stresses.


Algorithms , Machine Learning , Stress, Physiological , Synechocystis , Synechocystis/genetics , Synechocystis/physiology , Stress, Physiological/genetics , Gene Expression Regulation, Bacterial , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Transcriptome , Computational Biology/methods , Support Vector Machine , Gene Expression Profiling , Light , Genes, Bacterial
5.
J Environ Manage ; 360: 121166, 2024 Jun.
Article En | MEDLINE | ID: mdl-38781876

Accurate identification of urban waterlogging areas and assessment of waterlogging susceptibility are crucial for preventing and controlling hazards. Data-driven models are used to forecast waterlogging areas by establishing intricate relationships between explanatory variables and waterlogging states. This approach addresses the limitations of mechanistic models, which are frequently complex and unable to incorporate socio-economic factors. Previous research predominantly employed single-type data-driven models to predict waterlogging locations and evaluate their effectiveness. Comprehensive performance comparisons and uncertainty analyses of different types of models are scarce, and interpretability analysis is lacking. The chosen study area was the central area of Beijing, which is prone to waterlogging. Given the high manpower, time, and economic costs associated with collecting waterlogging information, the waterlogging point distribution map released by the Beijing Water Affairs Bureau was used as labeled samples. Twelve factors affecting waterlogging susceptibility were chosen as explanatory variables to construct Random Forest (RF), Support Vector Machine with Radial Basis Function (SVM-RBF), Particle Swarm Optimization-Weakly Labeled Support Vector Machine (PSO-WELLSVM), and Maximum Entropy (MaxEnt) models. Using diverse single evaluation indicators (such as F-score, Kappa, and AUC) to assess model performance may yield conflicting results. The Distance between Indices of Simulation and Observation (DISO) was therefore chosen as a comprehensive measure of each model's performance in predicting waterlogging points. PSO-WELLSVM exhibited the best performance, with a test DISO of 0.63, outperforming MaxEnt (0.78), which excelled in identifying areas highly susceptible to waterlogging, including extremely high susceptibility zones. The SVM-RBF and RF models demonstrated suboptimal performance and exhibited overfitting. 
Examination of the waterlogging susceptibility distribution maps predicted by the four models revealed significant spatial differences due to variations in computational principles and input parameter complexity. Integrating the four waterlogging susceptibility assessment models (WSAMs) through logistic regression was shown to significantly decrease the uncertainty of single data-driven models and to identify the most flood-prone areas. To improve the interpretability of the data-driven models, a geographical detector was incorporated to quantify the explanatory power of the 12 variables and describe the waterlogging process. Building Density (BD) exhibits the highest explanatory power for waterlogging susceptibility (q = 0.202), followed by Distance to Road, Frequency of Heavy Rainstorms (FHR), DEM, etc. The interaction between BD and FHR produces a nonlinear increase in explanatory power for waterlogging susceptibility. The waterlogging susceptibility risk in the study area can therefore be attributed to interactions among multiple factors.


Models, Theoretical , Support Vector Machine , Beijing , Floods
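The q value reported in the waterlogging abstract above comes from the geographical detector's factor detector, q = 1 − Σ_h N_h σ_h² / (N σ²), the share of variance in the response explained by stratifying on one factor. A minimal pure-Python sketch follows; the discretization of continuous factors such as building density into strata is assumed to have been done beforehand.

```python
from collections import defaultdict

def geodetector_q(values, strata):
    """Factor detector q-statistic: share of variance in `values` explained by `strata`.
    values: response (e.g. susceptibility) per spatial unit; strata: class label per unit."""
    n = len(values)
    mean = sum(values) / n
    sst = sum((v - mean) ** 2 for v in values)  # equals N * total variance
    groups = defaultdict(list)
    for v, h in zip(values, strata):
        groups[h].append(v)
    ssw = 0.0  # sum over strata of N_h * within-stratum variance
    for vs in groups.values():
        m = sum(vs) / len(vs)
        ssw += sum((v - m) ** 2 for v in vs)
    return 1.0 - ssw / sst
```

The interaction detector reuses the same function on overlaid strata, e.g. `geodetector_q(values, list(zip(bd_class, fhr_class)))` with hypothetical class lists `bd_class` and `fhr_class`. If that q exceeds the sum of the two single-factor q values, the interaction is classed as a nonlinear enhancement, which is the pattern the abstract reports for BD and FHR.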
6.
J Environ Manage ; 360: 121225, 2024 Jun.
Article En | MEDLINE | ID: mdl-38796867

As the global demand for clean energy continues to grow, the sustainable development of clean energy projects has become an important topic of research. In order to optimize the performance and sustainability of clean energy projects, this work explores the environmental and economic benefits of the clean energy industry. Through the use of Support Vector Machine (SVM) multi-factor models and a bi-level multi-objective approach, this work conducts comprehensive assessment and optimization. With Wind Power Base A as a case study, the work describes the material consumption of wind turbines, transportation energy consumption and carbon dioxide (CO2) emissions, and infrastructure material consumption through descriptive statistics. Moreover, this work analyzes the characteristics of different wind turbine models in depth. On one hand, the SVM multi-factor model is used to predict and assess the profitability of Wind Power Base A. On the other hand, a bi-level multi-objective approach is applied to optimize the number of units, internal rate of return within the project, and annual average equivalent utilization hours of the Wind Power Base A. The research results indicate that in March, the WilderHill New Energy Global Innovation Index (NEX) was 0.91053, while the predicted value of the SVM multi-factor model was 0.98596. The predicted value is slightly higher than the actual value, demonstrating the model's good grasp of future returns. The cumulative rate of return of Wind Power Base A is 18.83%, with an annualized return of 9.47%, exceeding the market performance by 1.68%. 
Under the optimization of the bi-level multi-objective approach, the number of units at Wind Power Base A decreases from the original 7004 to 5860, with total purchase and transportation costs remaining basically unchanged. The internal rate of return of the project increases from 8% to 9.3%, and the annual equivalent utilization hours increase to 2044 h, comprehensively improving the investment return and utilization efficiency of the wind power base. Through optimization, significant improvements are achieved in terms of the number of units, internal rate of return within the project, and annual average equivalent utilization hours at Wind Power Base A. The number of units decreases to 5860, with total purchase and transportation costs remaining basically unchanged, the internal rate of return increases to 9.3%, and annual equivalent utilization hours increase to 2044 h. Energy consumption and CO2 emissions are significantly reduced, with energy consumption decreasing by 0.68 × 10⁹ kgce and CO2 emissions decreasing by 1.29 × 10⁹ kg. The optimization effects are mainly concentrated in the production and installation stages, with emission reductions achieved through the recycling and disposal of materials consumed in the early stages. In terms of investment benefits, environmental benefits are enhanced, with a 13.93% reduction in CO2 emissions. Moreover, there is improved energy efficiency, with the energy input-output ratio increasing from 7.73 to 9.31. This indicates that the Wind Power Base A project has significant environmental and energy efficiency advantages in the clean energy industry. This work innovatively provides a comprehensive assessment and optimization scheme for clean energy projects and predicts the profitability of Wind Power Base A using SVM multi-factor models. 
In addition, this work optimizes key parameters of the project using a bi-level multi-objective approach, thus comprehensively improving the investment return and utilization efficiency of the wind power base. This work provides innovative methods and strong data support for the development of the clean energy industry, which is of great significance for promoting sustainable development against the backdrop of green finance.


Support Vector Machine , Sustainable Development , Wind , Carbon Dioxide , Models, Theoretical , Conservation of Energy Resources/methods
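The wind-power abstract above reports the project's internal rate of return (IRR) rising from 8% to 9.3%. IRR is the discount rate at which the net present value (NPV) of the project's cash flows is zero. The sketch below finds it by bisection; the abstract gives no cash-flow series, so any inputs are hypothetical, and the bisection bracket assumes a conventional profile (an initial outlay followed by inflows, one sign change).

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the (usually negative) year-0 investment."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-10, max_iter=200):
    """Internal rate of return by bisection: the rate at which NPV equals zero.
    Assumes a conventional profile (one sign change), so NPV crosses zero once in [lo, hi]."""
    f_lo, f_hi = npv(lo, cash_flows), npv(hi, cash_flows)
    if f_lo * f_hi > 0:
        raise ValueError("IRR not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (hi - lo) / 2.0 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid  # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return (lo + hi) / 2.0
```

For example, `irr([-100, 110])` returns approximately 0.10: investing 100 and receiving 110 one year later is a 10% return. Bisection is slower than Newton's method but cannot diverge once the root is bracketed.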
7.
Neuroimage ; 293: 120625, 2024 Jun.
Article En | MEDLINE | ID: mdl-38704056

Principal component analysis (PCA) has been widely employed for dimensionality reduction prior to multivariate pattern classification (decoding) in EEG research. The goal of the present study was to evaluate the effect of PCA on decoding accuracy (using support vector machines) across a broad range of experimental paradigms. We evaluated several PCA variations, including group-based and subject-based component decomposition, with and without Varimax rotation. We also varied the number of PCs retained for the decoding analysis. We evaluated the resulting decoding accuracy for seven common event-related potential components (N170, mismatch negativity, N2pc, P3b, N400, lateralized readiness potential, and error-related negativity). We also examined more challenging decoding tasks, including decoding of face identity, facial expression, stimulus location, and stimulus orientation. The datasets also varied in the number and density of electrode sites. Our findings indicated that none of the PCA approaches consistently improved decoding performance relative to no PCA, and the application of PCA frequently reduced decoding performance. Researchers should therefore be cautious about using PCA prior to decoding EEG data from similar experimental paradigms, populations, and recording setups.


Electroencephalography , Principal Component Analysis , Support Vector Machine , Humans , Electroencephalography/methods , Female , Male , Adult , Young Adult , Evoked Potentials/physiology , Brain/physiology , Signal Processing, Computer-Assisted
8.
Geriatr Gerontol Int ; 24(6): 595-602, 2024 Jun.
Article En | MEDLINE | ID: mdl-38744528

AIM: As the elderly population grows, musculoskeletal disorders such as sarcopenia are becoming more common. Diagnostic techniques such as X-rays, computed tomography, and magnetic resonance imaging are used to predict and diagnose sarcopenia, and machine learning-based methods are increasingly being used. This study aimed to create a model that can predict sarcopenia in adults aged 60 years or older using physical characteristics and activity-related variables, without medical diagnostic equipment such as imaging devices. METHODS: A sarcopenia prediction model was constructed using public data obtained from the Korea National Health and Nutrition Examination Survey. Models were built using Logistic Regression, Support Vector Machine (SVM), XGBoost, LightGBM, RandomForest, and Multi-layer Perceptron Neural Network (MLP) algorithms, and feature importance was analyzed for all models except SVM and MLP. RESULTS: The sarcopenia prediction model built with the LightGBM algorithm achieved the highest test accuracy (0.848). In the LightGBM model, physical characteristic variables such as body mass index, weight, and waist circumference showed high importance, and activity-related variables also contributed to the model. CONCLUSIONS: The sarcopenia prediction model, which consisted only of physical characteristics and activity-related factors, showed excellent performance. This model has the potential to assist in the early detection of sarcopenia in the elderly, especially in communities with limited access to medical resources or facilities. Geriatr Gerontol Int 2024; 24: 595-602.


Machine Learning , Sarcopenia , Humans , Sarcopenia/diagnosis , Sarcopenia/epidemiology , Aged , Male , Female , Republic of Korea/epidemiology , Middle Aged , Aged, 80 and over , Nutrition Surveys , Support Vector Machine , Geriatric Assessment/methods , Logistic Models , Algorithms , Neural Networks, Computer , Body Mass Index
9.
PLoS One ; 19(5): e0304469, 2024.
Article En | MEDLINE | ID: mdl-38820430

In recent years, advances in hyperspectral remote sensing technology have greatly enhanced the detailed mapping of tree species. Nevertheless, understanding the significance of hyperspectral remote sensing data features for tree species recognition remains challenging. The Hybrid-CS method was proposed to address this challenge by combining the strengths of deep learning and traditional learning techniques. Initially, we extract comprehensive correlation structures and spectral features. Subsequently, a hybrid approach combining correlation-based feature selection with an optimized recursive feature elimination algorithm identifies the most valuable feature set. We use the Support Vector Machine algorithm to evaluate feature importance and perform classification. Through rigorous experimentation, we evaluate the robustness of hyperspectral image-derived features and compare our method with other state-of-the-art classification methods. The results demonstrate: (1) Superior classification accuracy compared to traditional machine learning methods (e.g., SVM, RF) and advanced deep learning approaches on the tree species dataset. (2) Enhanced classification accuracy achieved by incorporating SVM and CNN information, particularly with the integration of attention mechanisms into the network architecture. Additionally, the classification performance of a two-branch network surpasses that of a single-branch network. (3) Consistently high accuracy across different proportions of training samples, indicating the stability and robustness of the method. This study underscores the potential of hyperspectral images and our proposed methodology for achieving precise tree species classification, holding significant promise for applications in forest resource management and monitoring.


Neural Networks, Computer , Support Vector Machine , Trees , Trees/classification , Algorithms , Hyperspectral Imaging/methods , Deep Learning , Remote Sensing Technology/methods
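The tree-species abstract above uses correlation-based feature selection (CFS) as one half of its hybrid selector. The sketch below implements the classical CFS merit (Hall's formulation), which rewards features correlated with the class and penalizes features correlated with each other, plus a greedy forward search. Whether Hybrid-CS uses exactly this merit and search strategy is an assumption; the abstract does not say. Zero-variance features would cause a division by zero in `pearson` and should be removed first.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def cfs_merit(features, target, subset):
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|) for a subset of k features.
    features: dict name -> list of values; target: numeric class codes."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(features, target):
    """Greedy forward search: add the feature that most increases merit; stop when none does."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        # sorted() makes tie-breaking deterministic (max keeps the first maximum).
        cand = max(sorted(remaining), key=lambda f: cfs_merit(features, target, selected + [f]))
        score = cfs_merit(features, target, selected + [cand])
        if score <= best:
            break
        selected.append(cand)
        best = score
        remaining.remove(cand)
    return selected, best
```

A redundant copy of an already selected feature adds no merit, so the search stops rather than keeping both; in the hybrid scheme the surviving subset would then be refined by recursive feature elimination.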
10.
Food Chem ; 452: 139520, 2024 Sep 15.
Article En | MEDLINE | ID: mdl-38723573

The current study addresses the growing demand for sustainable plant-based cheese alternatives by employing molecular docking and deep learning algorithms to optimize protein-ligand interactions. Focusing on key proteins (zein, soy, and almond protein) along with tocopherol and retinol, the goal was to improve texture, nutritional value, and flavor characteristics via dynamic simulations. The findings demonstrated that the docking analysis showed high accuracy in predicting conformational changes. Flexible docking algorithms provided insights into dynamic interactions, while analysis of energetics revealed variations in binding strengths. Tocopherol exhibited a stronger affinity for zein (-5.8 kcal/mol) than retinol did (-4.1 kcal/mol). Molecular dynamics simulations offered comprehensive insights into stability and behavior over time. The integration of machine learning algorithms improved classification and prediction, achieving an accuracy of 71.59%. This study underscores the significance of molecular understanding in driving innovation in the plant-based cheese industry, facilitating the development of sustainable alternatives to traditional dairy products.


Cheese , Molecular Docking Simulation , Plant Proteins , Prunus dulcis , Tocopherols , Vitamin A , Zein , Plant Proteins/chemistry , Plant Proteins/metabolism , Cheese/analysis , Prunus dulcis/chemistry , Vitamin A/chemistry , Vitamin A/metabolism , Tocopherols/chemistry , Tocopherols/metabolism , Zein/chemistry , Zein/metabolism , Molecular Dynamics Simulation , Machine Learning , Glycine max/chemistry , Glycine max/metabolism , Support Vector Machine
11.
Spectrochim Acta A Mol Biomol Spectrosc ; 316: 124351, 2024 Aug 05.
Article En | MEDLINE | ID: mdl-38692109

Epidermal growth factor receptor (EGFR) plays a pivotal role in the initiation and progression of gliomas. In glioblastoma in particular, EGFR amplification acts as a catalyst for invasion, proliferation, and resistance to radiotherapy and chemotherapy. Current approaches cannot provide rapid molecular pathology results. In this study, we propose, for the first time, a terahertz spectroscopic approach for predicting the EGFR amplification status of gliomas. A machine learning model was constructed using the terahertz response of the measured glioma tissues, including the absorption coefficient, refractive index, and dielectric loss tangent. The novelty of our model is the integration of three classical base classifiers, i.e., support vector machine, random forest, and extreme gradient boosting. Because the ensemble combines the advantages of its base classifiers, the model has greater generalization ability. The effectiveness of the proposed method was validated on an independent test set. The optimal performance of the integrated algorithm was verified with a maximum area under the curve (AUC) of 85.8%. This represents a significant step toward more effective and rapid diagnostic tools for guiding postoperative therapy in gliomas.


ErbB Receptors , Glioma , Terahertz Spectroscopy , Humans , Glioma/genetics , Glioma/pathology , Glioma/diagnosis , ErbB Receptors/genetics , ErbB Receptors/metabolism , Terahertz Spectroscopy/methods , Machine Learning , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Gene Amplification , Algorithms , Support Vector Machine
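The glioma abstract above integrates SVM, random forest, and extreme gradient boosting but does not say how their outputs are combined. Soft voting (averaging each base classifier's predicted probability) is one common scheme and is sketched below, together with the rank-based AUC used to score it; stacking with a meta-learner would be an equally plausible reading of "integration". The probability lists are assumed to come from already-trained base models.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-sample positive-class probabilities from several classifiers.
    prob_lists: one list of probabilities per base classifier (e.g. SVM, RF, XGBoost)."""
    m = len(prob_lists)
    weights = weights or [1.0 / m] * m
    total = sum(weights)
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
            for i in range(len(prob_lists[0]))]

def auc(scores, labels):
    """ROC AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Averaging tends to cancel uncorrelated errors of the individual models, which is the usual argument for why such an ensemble generalizes better than any single base classifier.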
12.
Sensors (Basel) ; 24(10)2024 May 09.
Article En | MEDLINE | ID: mdl-38793849

The origin of agricultural products is crucial to their quality and safety. This study explored differences in the chemical composition and structure of rice from different origins using fluorescence detection technology. These differences are mainly affected by climate, environment, geology, and other factors. By identifying the characteristic fluorescence absorption peaks of the same rice seed variety from different origins and comparing them with known or standard samples, this study aims to authenticate rice, protect brands, and achieve traceability. The study selected as samples the same variety of rice seed planted in different regions of Jilin Province in the same year. Fluorescence spectroscopy was used to collect spectral data, which were preprocessed by normalization, smoothing, and wavelet transformation to remove noise, scattering, and burrs. The processed spectral data were used as input for a long short-term memory (LSTM) model. The study focused on the processing and analysis of rice spectra based on NZ-WT-processed data. To simplify the model, uninformative variable elimination (UVE) and the successive projections algorithm (SPA) were used to screen the best wavelengths. These wavelengths were used as input for a support vector machine (SVM) prediction model to achieve efficient and accurate predictions. Within the fluorescence spectral ranges of 475-525 nm and 665-690 nm, absorption peaks of nicotinamide adenine dinucleotide phosphate (NADPH), riboflavin (B2), starch, and protein were observed. The origin tracing prediction model established using SVM exhibited stable performance, with a classification accuracy of up to 99.5%. The experiment demonstrated that fluorescence spectroscopy has high discrimination accuracy in tracing the origin of rice, providing a new method for rapid identification of rice origin.


Algorithms , Oryza , Spectrometry, Fluorescence , Support Vector Machine , Oryza/chemistry , Oryza/classification , Spectrometry, Fluorescence/methods , Riboflavin/analysis , NADP/chemistry , NADP/analysis , NADP/metabolism , Starch/analysis , Starch/chemistry , Seeds/chemistry
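The successive projections algorithm (SPA) mentioned in the rice-origin abstract above selects wavelengths with minimal collinearity: starting from one wavelength, it repeatedly projects every unselected wavelength onto the subspace orthogonal to the last selected one and picks the wavelength with the largest remaining norm. A minimal sketch of that core loop follows; UVE is not sketched, and the full procedure usually repeats the loop for every starting wavelength and keeps the subset with the best validation error, which is omitted here.

```python
def spa(columns, start, n_select):
    """Successive projections algorithm: pick wavelengths with minimal collinearity.
    columns: one list of sample values per wavelength; start: index of the initial
    wavelength; returns selected wavelength indices in selection order."""
    work = [list(c) for c in columns]
    selected = [start]
    for _ in range(n_select - 1):
        xk = work[selected[-1]]
        xk_norm2 = sum(v * v for v in xk)  # zero only if xk is collinear with earlier picks
        best, best_norm = None, -1.0
        for j, xj in enumerate(work):
            if j in selected:
                continue
            # Project x_j onto the subspace orthogonal to the last selected column.
            coef = sum(a * b for a, b in zip(xj, xk)) / xk_norm2
            work[j] = [a - coef * b for a, b in zip(xj, xk)]
            norm = sum(v * v for v in work[j])
            if norm > best_norm:
                best, best_norm = j, norm
        selected.append(best)
    return selected
```

Because each projection removes what is already explained by earlier picks, a wavelength that nearly duplicates a selected one keeps almost no norm and is chosen last, which is how SPA shrinks the input to the SVM without discarding independent information.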
13.
Sensors (Basel) ; 24(10)2024 May 09.
Article En | MEDLINE | ID: mdl-38793872

This paper proposes a novel soft sensor modeling approach, MIC-TCA-INGO-LSSVM, to address the decline in performance of soft sensor models during the fermentation process of Pichia pastoris caused by changes in working conditions. First, the transfer component analysis (TCA) method is used to minimize differences in data distribution across working conditions. Next, a least squares support vector machine (LSSVM) model is constructed using the TCA-adapted dataset, and an improved northern goshawk optimization (INGO) algorithm is proposed to optimize the parameters of the LSSVM model. Finally, to further enhance the model's generalization ability and prediction accuracy by transferring knowledge from multiple source working conditions, a weighted sub-model ensemble scheme based on the maximum information coefficient (MIC) is proposed. The proposed soft sensor model is used to predict cell and product concentrations during the fermentation process of Pichia pastoris. Simulation results indicate that the RMSE of the INGO-LSSVM model in predicting cell and product concentrations is reduced by 47.3% and 42.1%, respectively, compared with the NGO-LSSVM model. Additionally, TCA significantly enhances the model's adaptability when working conditions change. Moreover, the soft sensor model based on TCA and the MIC-weighted ensemble achieves reductions of 41.6% and 31.3% in the RMSE for predicting cell and product concentrations, respectively, compared with the single-source transfer model TCA-INGO-LSSVM. These results demonstrate the high reliability and predictive performance of the proposed soft sensor method under varying working conditions.


Algorithms , Fermentation , Support Vector Machine , Least-Squares Analysis , Pichia/metabolism , Saccharomycetales
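The Pichia pastoris abstract above combines sub-models trained on different source working conditions using MIC-based weights and reports gains as percentage RMSE reductions. The sketch below assumes the simplest form of such a scheme: weights proportional to a precomputed MIC score per sub-model, normalized to sum to 1. Computing MIC itself and the exact weighting rule in the paper are outside this sketch.

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def weighted_ensemble(sub_predictions, scores):
    """Combine sub-model predictions with weights proportional to a relevance score.
    sub_predictions: one prediction list per source-condition sub-model;
    scores: one non-negative score per sub-model (e.g. an MIC value, assumed precomputed)."""
    total = sum(scores)
    weights = [s / total for s in scores]
    return [sum(w * preds[i] for w, preds in zip(weights, sub_predictions))
            for i in range(len(sub_predictions[0]))]

def rmse_reduction(baseline_rmse, new_rmse):
    """Relative RMSE reduction in percent, the form in which the abstract reports gains."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse
```

With two sub-models scored 1.0 and 3.0, the second receives a weight of 0.75, so source conditions judged more relevant to the target condition dominate the combined prediction.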
14.
Sensors (Basel) ; 24(10)2024 May 11.
Article En | MEDLINE | ID: mdl-38793908

Cervical auscultation is a simple, noninvasive method for diagnosing dysphagia, although its reliability largely depends on the subjectivity and experience of the evaluator. Recently developed methods for the automatic detection of swallowing sounds enable a rough automatic diagnosis of dysphagia, but a reliable detection method tailored to the peculiar feature patterns of swallowing sounds under actual clinical conditions has not been established. We investigated a novel approach for automatically detecting swallowing sounds in which basic statistics and dynamic features were extracted from acoustic features (Mel Frequency Cepstral Coefficients and Mel Frequency Magnitude Coefficients) and an ensemble learning model combining a Support Vector Machine and a Multi-Layer Perceptron was applied. Evaluation on a swallowing-sound database synchronized with videofluorographic swallowing studies of 74 advanced-age patients with dysphagia demonstrated outstanding performance, with an F1-micro average of approximately 0.92 and an accuracy of 95.20%. The method, proven effective on the current clinical recording database, suggests a significant advance in the objectivity of cervical auscultation. However, validating its efficacy on other databases is crucial for confirming its broad applicability and potential impact.


Auscultation , Databases, Factual , Deglutition Disorders , Deglutition , Humans , Deglutition/physiology , Deglutition Disorders/diagnosis , Deglutition Disorders/physiopathology , Auscultation/methods , Support Vector Machine , Male , Female , Aged , Machine Learning , Algorithms , Sound
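The "basic statistics and dynamic features" step can be illustrated as follows, assuming the MFCC (or MFMC) matrix has already been computed as a list of frames. This is a hypothetical sketch, not the authors' exact feature set: dynamic features are taken to be standard regression-based delta coefficients over ±`n` frames (with edge frames replicated), and the statistics are taken to be the per-coefficient mean and standard deviation. The names `delta` and `summary_features` are illustrative.

```python
import statistics
from typing import List

Frame = List[float]


def delta(frames: List[Frame], n: int = 2) -> List[Frame]:
    """Regression-based delta coefficients over +/- n frames.

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2),
    with the first and last frames replicated at the edges.
    """
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: List[Frame] = []
    for t in range(T):
        row = []
        for c in range(len(frames[0])):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = frames[min(t + k, T - 1)][c]
                prv = frames[max(t - k, 0)][c]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def summary_features(frames: List[Frame]) -> List[float]:
    """Mean and population std of every static and delta coefficient track."""
    feats: List[float] = []
    for block in (frames, delta(frames)):
        for c in range(len(block[0])):
            track = [frame[c] for frame in block]
            feats.extend([statistics.fmean(track), statistics.pstdev(track)])
    return feats
```

The resulting fixed-length vector (2 statistics × 2 blocks × number of coefficients) is the kind of input a Support Vector Machine or Multi-Layer Perceptron can consume regardless of how many frames a swallowing sound spans.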
15.
Sensors (Basel) ; 24(10)2024 May 13.
Article En | MEDLINE | ID: mdl-38793951

During robot-assisted rehabilitation, failure to recognize lower limb movements can significantly limit the development of exoskeleton robots, especially for individuals with knee pathology. A major challenge with surface electromyography (sEMG) signals generated by lower limb movements is inter-subject variability, for example in motion patterns and muscle structure. To address this, this paper proposes an sEMG-based lower limb motion recognition method using an improved support vector machine (SVM). First, non-negative matrix factorization (NMF) is used to analyze muscle synergy in multi-channel sEMG signals. Second, multiple nonlinear sEMG features are extracted that reflect the complexity of changes in muscle status during various lower limb movements. The Fisher discriminant function method is then used for feature selection and dimensionality reduction. Next, a hybrid genetic algorithm-particle swarm optimization (GA-PSO) method is used to determine the optimal SVM parameters. Finally, experiments were conducted with 11 healthy subjects and 11 subjects with knee pathology performing three different lower limb movements. The results demonstrate the effectiveness and feasibility of the proposed approach, with average accuracies of 96.03% in healthy subjects and 93.65% in subjects with knee pathology.


Algorithms , Electromyography , Lower Extremity , Movement , Support Vector Machine , Humans , Electromyography/methods , Lower Extremity/physiology , Male , Adult , Movement/physiology , Female , Signal Processing, Computer-Assisted , Young Adult , Muscle, Skeletal/physiology
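Muscle-synergy extraction with NMF factorizes a non-negative multi-channel sEMG matrix V (channels × samples, e.g. rectified and smoothed envelopes) into synergy weights W and activation coefficients H. The NumPy sketch below assumes the Lee-Seung multiplicative updates for the Frobenius-norm objective and the commonly used variance-accounted-for (VAF) criterion; the abstract does not specify the NMF variant, and the names `nmf` and `vaf` are hypothetical.

```python
import numpy as np


def nmf(V: np.ndarray, k: int, n_iter: int = 500, eps: float = 1e-9, seed: int = 0):
    """Factorize non-negative V (channels x samples) as W @ H with Lee-Seung updates.

    W (channels x k) holds synergy weights; H (k x samples) holds activations.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def vaf(V: np.ndarray, W: np.ndarray, H: np.ndarray) -> float:
    """Variance accounted for: 1 - ||V - WH||_F^2 / ||V||_F^2."""
    return float(1.0 - np.sum((V - W @ H) ** 2) / np.sum(V**2))
```

A common way to choose the number of synergies is to increase `k` until `vaf` exceeds a preset threshold; the threshold used in this paper is not stated in the abstract.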
16.
Sensors (Basel) ; 24(10)2024 May 16.
Article En | MEDLINE | ID: mdl-38794011

Livestock monitoring has traditionally been carried out through direct observation by experienced caretakers. By analyzing the animals' behavior, it is possible to predict, to some degree, events that require human action, such as calving. However, such continuous monitoring is often not feasible. In this work, we propose and develop intelligent algorithms that operate on data from low-cost sensors to determine the state of the animal in the terms used by caretakers (grazing, ruminating, walking, etc.), and we evaluate their accuracy. The best results were obtained using aggregations and averages of the time series with support vector classifiers and tree-based ensembles, reaching accuracies of 57% for the general behavior problem (4 classes) and 85% for the standing behavior problem (2 classes). This is a preliminary step toward event-specific predictions.


Algorithms , Machine Learning , Animals , Cattle , Behavior, Animal/physiology , Support Vector Machine , Humans , Monitoring, Physiologic/methods , Monitoring, Physiologic/instrumentation
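The "aggregations and averages of the time series" can be pictured as window-level summary features that are then fed to the classifiers. The sketch below is hypothetical: it assumes non-overlapping fixed-length windows over a single sensor channel (such as accelerometer magnitude) and discards a trailing partial window. The abstract does not give the actual window length or aggregates, and `window_features` is an illustrative name.

```python
import statistics
from typing import Dict, List, Sequence


def window_features(signal: Sequence[float], window: int) -> List[Dict[str, float]]:
    """Mean, std, min, and max over consecutive non-overlapping windows.

    A trailing partial window is discarded.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    feats = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = list(signal[start:start + window])
        feats.append({
            "mean": statistics.fmean(chunk),
            "std": statistics.pstdev(chunk),
            "min": min(chunk),
            "max": max(chunk),
        })
    return feats
```

Each dictionary becomes one labeled row (for example "grazing" or "ruminating") for a support vector classifier or a tree-based ensemble.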
17.
Front Biosci (Landmark Ed) ; 29(5): 197, 2024 May 21.
Article En | MEDLINE | ID: mdl-38812315

BACKGROUND: Ubiquitination is a crucial post-translational modification of proteins that regulates diverse cellular functions. Accurate identification of ubiquitination sites is vital for understanding fundamental biological mechanisms such as the cell cycle and DNA repair. Conventional experimental approaches are resource-intensive, whereas machine learning offers a cost-effective means of identifying ubiquitination sites accurately. Ubiquitination site prediction is species-specific, and many existing models are tailored to Arabidopsis thaliana (A. thaliana) or Homo sapiens (H. sapiens). However, these models have shortcomings in sequence window selection and feature extraction, which lead to suboptimal performance. METHODS: This study first used the chi-square test to determine the optimal sequence window. A combination of six features was then assessed: Binary Encoding (BE), Composition of K-Spaced Amino Acid Pairs (CKSAAP), Enhanced Amino Acid Composition (EAAC), Position Weight Matrix (PWM), 531 Properties of Amino Acids (AA531), and Position-Specific Scoring Matrix (PSSM). The comparative evaluation covered three feature selection methods, Minimum Redundancy-Maximum Relevance (mRMR), Elastic net, and Null importances, together with four classifiers: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Null importances combined with the RF classifier showed the best predictive performance, and this model was named UbNiRF (A. thaliana: ArUbNiRF; H. sapiens: HoUbNiRF). RESULTS: A comprehensive assessment showed that UbNiRF outperforms existing prediction tools across five performance metrics. It was particularly strong in the Matthews Correlation Coefficient (MCC), with values of 0.827 for the A. thaliana dataset and 0.781 for the H. sapiens dataset. Feature analysis underscores the importance of integrating the six features and demonstrates their critical role in improving model performance. CONCLUSIONS: UbNiRF is a valuable tool for predicting ubiquitination sites in both A. thaliana and H. sapiens. Its robust performance and species-specific predictive capability make it highly useful for elucidating the biological processes and disease mechanisms associated with ubiquitination.


Arabidopsis , Ubiquitination , Arabidopsis/metabolism , Arabidopsis/genetics , Humans , Computational Biology/methods , Machine Learning , Arabidopsis Proteins/metabolism , Arabidopsis Proteins/genetics , Algorithms , Support Vector Machine , Random Forest
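The MCC reported for UbNiRF is computed from the binary confusion matrix as MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal Python sketch follows; returning 0 when any marginal count is zero is a common convention assumed here, not something stated in the abstract, and `mcc` is an illustrative name.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal sum is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), and because it uses all four cells it is less flattering than accuracy on imbalanced site/non-site datasets.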
18.
Zhonghua Wei Zhong Bing Ji Jiu Yi Xue ; 36(4): 345-352, 2024 Apr.
Article Zh | MEDLINE | ID: mdl-38813626

OBJECTIVE: To construct and validate the best model for predicting 28-day death risk in patients with septic shock using different supervised machine learning algorithms. METHODS: Patients with septic shock meeting the Sepsis-3 criteria were selected from the Medical Information Mart for Intensive Care-IV v2.0 (MIMIC-IV v2.0) database. Patients were randomly allocated, with 70% assigned to the training set and 30% to the validation set. Candidate predictive variables were extracted from three domains: demographic characteristics and basic vital signs; serum indicators within 24 hours of intensive care unit (ICU) admission and complications that might affect those indicators; and functional scores and advanced life support. The efficacy for predicting 28-day death of models built with five mainstream machine learning algorithms, namely classification and regression tree (CART), random forest (RF), support vector machine (SVM), linear regression (LR), and super learner [SL; combining CART, RF, and extreme gradient boosting (XGBoost)], was compared, and the best algorithm was selected. The optimal predictive variables were determined by intersecting the results of LASSO regression, RF, and XGBoost, and a predictive model was constructed from them. The model's predictive efficacy was validated with receiver operating characteristic (ROC) curves, its accuracy was assessed with calibration curves, and its practical utility was verified with decision curve analysis (DCA). RESULTS: A total of 3,295 patients with septic shock were included; 2,164 survived and 1,131 died within 28 days, a mortality of 34.32%. Of these, 2,307 were in the training set (792 deaths within 28 days, mortality 34.33%) and 988 in the validation set (339 deaths within 28 days, mortality 34.31%). Five machine learning models were built on the training set. Using variables from all three domains, the areas under the ROC curve (AUC) of the RF, SVM, and LR models for predicting 28-day death in the validation set were 0.823 [95% confidence interval (95%CI) 0.795-0.849], 0.823 (95%CI 0.796-0.849), and 0.810 (95%CI 0.782-0.838), respectively, higher than those of the CART model (AUC = 0.750, 95%CI 0.717-0.782) and the SL model (AUC = 0.756, 95%CI 0.724-0.789). These three algorithms were therefore considered the best. After integrating variables from the three domains, 16 optimal predictive variables were identified by intersecting the LASSO regression, RF, and XGBoost results: the highest pH value, the highest albumin (Alb), the highest body temperature, the lowest lactic acid (Lac), the highest Lac, the highest serum creatinine (SCr), the highest Ca2+, the lowest hemoglobin (Hb), the lowest white blood cell count (WBC), age, simplified acute physiology score III (SAPS III), the highest WBC, acute physiology score III (APS III), the lowest Na+, body mass index (BMI), and the shortest activated partial thromboplastin time (APTT) within 24 hours of ICU admission. ROC curve analysis showed that the logistic regression model constructed with these 16 variables was the best predictive model, with an AUC of 0.806 (95%CI 0.778-0.835) in the validation set. The calibration and DCA curves showed that this model had high accuracy, with a maximum net benefit of up to 0.3, significantly outperforming traditional models based on a single functional score [APS III score, SAPS III score, and sequential organ failure assessment (SOFA) score], whose AUCs (95%CI) were 0.746 (0.715-0.778), 0.765 (0.734-0.796), and 0.625 (0.589-0.661), respectively.
CONCLUSIONS: The logistic regression model constructed with 16 optimal predictive variables, including pH value, Alb, body temperature, Lac, SCr, Ca2+, Hb, WBC, age, SAPS III score, APS III score, Na+, BMI, and APTT, was identified as the best model for predicting 28-day death risk in patients with septic shock. Its performance is stable, with high discriminative ability and accuracy.


Algorithms , Shock, Septic , Supervised Machine Learning , Support Vector Machine , Humans , Shock, Septic/mortality , Shock, Septic/diagnosis , Female , Prognosis , Intensive Care Units , Male , Middle Aged , Machine Learning , Decision Trees
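The mortality figures in the RESULTS can be checked directly from the reported counts. The snippet below recomputes them and confirms that the training and validation sets partition the cohort; the helper name `mortality_pct` is illustrative.

```python
def mortality_pct(deaths: int, total: int) -> float:
    """28-day mortality as a percentage, rounded to two decimals."""
    return round(100.0 * deaths / total, 2)


# Counts as reported in the abstract: (deaths within 28 days, patients).
cohorts = {
    "overall": (1131, 3295),
    "training": (792, 2307),
    "validation": (339, 988),
}

for name, (deaths, total) in cohorts.items():
    print(f"{name}: {mortality_pct(deaths, total)}%")

# The two sets partition the cohort, and survivors plus deaths give its size.
assert 792 + 339 == 1131 and 2307 + 988 == 3295
assert 2164 + 1131 == 3295
print(f"training share: {100.0 * 2307 / 3295:.1f}%")
```

Running it prints 34.32%, 34.33%, and 34.31%, matching the abstract, and a training share of 70.0%, consistent with the stated 70/30 split.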
19.
Sci Rep ; 14(1): 12043, 2024 05 27.
Article En | MEDLINE | ID: mdl-38802547

This study compared the diagnostic value of different enhancement phases of contrast-enhanced computed tomography (CT) for distinguishing low and high nuclear grade clear cell renal cell carcinoma (ccRCC) by building machine learning classifiers. A total of 51 patients (Dataset1: 41 low-grade and 10 high-grade) and 27 patients (independent Dataset2: 16 low-grade and 11 high-grade) with pathologically proven ccRCC were enrolled in this retrospective study. Radiomic features were extracted from corticomedullary phase (CMP), nephrographic phase (NP), and excretory phase (EP) CT images and selected with the recursive feature elimination cross-validation (RFECV) algorithm; group differences in continuous variables were assessed with the t-test and the Mann-Whitney U test. Support vector machine (SVM), random forest (RF), XGBoost (XGB), VGG11, ResNet18, and GoogLeNet classifiers were built to distinguish low-grade from high-grade ccRCC. Classifiers based on NP CT images (Dataset1, RF: AUC = 0.82 ± 0.05, ResNet18: AUC = 0.81 ± 0.02; Dataset2, XGB: AUC = 0.95 ± 0.02, ResNet18: AUC = 0.87 ± 0.07) achieved the best performance and robustness in distinguishing low-grade from high-grade ccRCC, whereas EP-based classifiers performed worse. NP images thus had the best performance in diagnosing low and high nuclear grade ccRCC, and the firstorder_Kurtosis and firstorder_90Percentile features played a key role in the classification task.


Carcinoma, Renal Cell , Kidney Neoplasms , Neoplasm Grading , Tomography, X-Ray Computed , Humans , Carcinoma, Renal Cell/diagnostic imaging , Carcinoma, Renal Cell/pathology , Carcinoma, Renal Cell/diagnosis , Tomography, X-Ray Computed/methods , Female , Male , Middle Aged , Kidney Neoplasms/diagnostic imaging , Kidney Neoplasms/pathology , Kidney Neoplasms/diagnosis , Kidney Neoplasms/classification , Aged , Retrospective Studies , Support Vector Machine , Adult , Machine Learning , Algorithms
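The AUC values used to compare these classifiers can be read as the probability that a randomly chosen high-grade case receives a higher score than a randomly chosen low-grade case, with ties counted as one half; this equals the Mann-Whitney U statistic divided by the product of the two group sizes. A minimal pure-Python sketch with a hypothetical function name follows; it is quadratic in the number of cases, which is unproblematic at this study's sample sizes.

```python
from typing import Sequence


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie) = U / (n_pos * n_neg)."""
    if not pos_scores or not neg_scores:
        raise ValueError("each class needs at least one score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, high-grade scores `[0.9, 0.3]` against low-grade scores `[0.5, 0.1]` win three of the four pairs, giving an AUC of 0.75.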
20.
BMC Med Imaging ; 24(1): 124, 2024 May 27.
Article En | MEDLINE | ID: mdl-38802736

BACKGROUND: Hypertensive heart disease (HHD) is highly prevalent, and there is currently no easy way to detect it early. This study explored the application of radiomics based on non-enhanced cardiac magnetic resonance (CMR) cine sequences to diagnosing HHD and detecting latent cardiac changes caused by hypertension. METHODS: A total of 132 patients who underwent CMR were divided into three groups: HHD (n = 42), hypertension with normal cardiac structure and function (HWN, n = 46), and normal control (NOR, n = 44). Myocardial regions in the end-diastolic (ED) and end-systolic (ES) phases of the CMR short-axis cine images were segmented as regions of interest (ROI). Three feature subsets (ED, ES, and ED combined with ES) were established after least absolute shrinkage and selection operator (LASSO) selection of radiomic features. Nine radiomic models (three feature subsets × three classifiers) were built using random forest (RF), support vector machine (SVM), and naive Bayes. Model performance was analyzed with receiver operating characteristic curves and metrics including accuracy, area under the curve (AUC), precision, recall, and specificity. RESULTS: The feature subsets included first-order, shape, and texture features. The SVM model using ED combined with ES features achieved the highest accuracy (0.833), with a macro-average AUC of 0.941. The AUCs for identifying HHD, HWN, and NOR were 0.967, 0.876, and 0.963, respectively; the precisions were 0.972, 0.740, and 0.826; the recalls were 0.833, 0.804, and 0.863; and the specificities were 0.989, 0.863, and 0.909. CONCLUSIONS: Radiomics based on non-enhanced CMR cine sequences can detect early cardiac changes due to hypertension and holds promise for screening for latent cardiac damage in early HHD.


Early Diagnosis , Hypertension , Magnetic Resonance Imaging, Cine , Humans , Female , Male , Magnetic Resonance Imaging, Cine/methods , Middle Aged , Hypertension/diagnostic imaging , Hypertension/complications , Support Vector Machine , Heart Diseases/diagnostic imaging , Aged , Adult , Bayes Theorem , ROC Curve , Image Interpretation, Computer-Assisted/methods , Radiomics
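For a three-class problem such as HHD/HWN/NOR, per-class precision, recall, and specificity are obtained one-vs-rest from the confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes; the function name is illustrative, and the example matrix in the usage note is hypothetical and does not reproduce the study's data.

```python
from typing import Dict, List


def one_vs_rest_metrics(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall, and specificity.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    out: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                            # rest of the true-class row
        fp = sum(cm[r][i] for r in range(len(cm))) - tp  # rest of the predicted column
        tn = total - tp - fn - fp
        out[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        }
    return out
```

For example, with `cm = [[8, 2, 0], [1, 7, 2], [0, 1, 9]]` and labels `["HHD", "HWN", "NOR"]`, the HHD class has recall 8/10 = 0.8 and specificity 19/20 = 0.95.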
...