1 - 20 of 10,553
1.
PLoS One ; 19(5): e0302595, 2024.
Article En | MEDLINE | ID: mdl-38718024

Diabetes Mellitus is one of the oldest diseases known to humankind, dating back to ancient Egypt. The disease is a chronic metabolic disorder that heavily burdens healthcare providers worldwide due to the steady yearly increase in patients. Worryingly, diabetes affects not only the aging population but also children. It is crucial to control this problem, as diabetes can lead to many health complications. Over time, humankind has begun integrating computer technology with the healthcare system. Artificial intelligence helps healthcare become more efficient in diagnosing diabetes patients, deliver better care, and be more patient-centric. Among the advanced data mining techniques in artificial intelligence, stacking is among the most prominent methods applied in the diabetes domain. Hence, this study investigates the potential of stacking ensembles. The aim of this study is to reduce the high complexity inherent in stacking, which contributes to longer training time, and to reduce the outliers in the diabetes data to improve classification performance. To address this concern, a novel machine learning method called the Stacking Recursive Feature Elimination-Isolation Forest was introduced for diabetes prediction. Stacking is applied with Recursive Feature Elimination to design an efficient model for diabetes diagnosis that uses fewer features. The method also incorporates Isolation Forest as an outlier removal method. The study uses accuracy, precision, recall, F1 measure, training time, and standard deviation metrics to assess classification performance. The proposed method achieved an accuracy of 79.077% for PIMA Indians Diabetes and 97.446% for the Diabetes Prediction dataset, outperforming many existing methods and demonstrating effectiveness in the diabetes domain.


Diabetes Mellitus , Machine Learning , Humans , Diabetes Mellitus/diagnosis , Algorithms , Data Mining/methods , Support Vector Machine , Male
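The abstract of entry 1 names three ingredients: Isolation Forest for outlier removal, Recursive Feature Elimination for feature reduction, and a stacking ensemble for classification. The following is a minimal sketch of how such a pipeline could be assembled with scikit-learn. It is not the authors' implementation: the synthetic data, the choice of base learners, the contamination rate, and the number of retained features are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def run_sketch(seed=0):
    # Synthetic stand-in for a tabular diabetes dataset (assumption, not real data).
    X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                               random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    # Step 1: drop training rows that Isolation Forest labels as outliers (-1).
    flags = IsolationForest(contamination=0.05, random_state=seed).fit_predict(X_train)
    X_train, y_train = X_train[flags == 1], y_train[flags == 1]

    # Step 2: RFE keeps 5 of 8 features; Step 3: a stacking ensemble classifies.
    stack = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
                    ("knn", KNeighborsClassifier())],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    model = make_pipeline(
        StandardScaler(),
        RFE(LogisticRegression(max_iter=1000), n_features_to_select=5),
        stack,
    )
    model.fit(X_train, y_train)

    n_kept = int(model.named_steps["rfe"].support_.sum())
    return n_kept, model.score(X_test, y_test)
```

Calling `run_sketch()` returns the number of features kept by RFE and the held-out accuracy on the synthetic data; the accuracy figures in the abstract come from the real datasets and cannot be reproduced with this sketch.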
2.
Front Public Health ; 12: 1347219, 2024.
Article En | MEDLINE | ID: mdl-38726233

Background: Osteoporosis is becoming more common worldwide, imposing a substantial burden on individuals and society. The onset of osteoporosis is subtle, early detection is challenging, and population-wide screening is infeasible. Thus, there is a need to develop a method to identify those at high risk for osteoporosis. Objective: This study aimed to develop a machine learning algorithm to effectively identify people with low bone density, using readily available demographic and blood biochemical data. Methods: Using NHANES 2017-2020 data, participants over 50 years old with complete femoral neck BMD data were selected. This cohort was randomly divided into training (70%) and test (30%) sets. Lasso regression selected variables for inclusion in six machine learning models built on the training data: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), naive Bayes (NB), artificial neural network (ANN) and random forest (RF). NHANES data from the 2013-2014 cycle was used as an external validation set input into the models to verify their generalizability. Model discrimination was assessed via AUC, accuracy, sensitivity, specificity, precision and F1 score. Calibration curves evaluated goodness-of-fit. Decision curves determined clinical utility. The SHAP framework analyzed variable importance. Results: A total of 3,545 participants were included in the internal validation set of this study, of whom 1,870 had normal bone density and 1,675 had low bone density. Lasso regression selected 19 variables. In the test set, AUC was 0.785 (LR), 0.780 (SVM), 0.775 (GBM), 0.729 (NB), 0.771 (ANN), and 0.768 (RF). The LR model had the best discrimination, a better calibration curve fit, and the best clinical net benefit on the decision curve, and it also showed good predictive power in the external validation dataset. The top variables in the LR model were: age, BMI, gender, creatine phosphokinase, total cholesterol and alkaline phosphatase.
Conclusion: The machine learning model demonstrated effective classification of low BMD using blood biomarkers. This could aid clinical decision making for osteoporosis prevention and management.


Bone Density , Machine Learning , Osteoporosis , Humans , Female , Middle Aged , Male , Osteoporosis/diagnosis , Aged , Algorithms , Nutrition Surveys , Logistic Models , Support Vector Machine
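Entry 2 compares models by AUC. As a reminder of what that number measures, here is a small pure-Python sketch that computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; rank-based implementations are preferable for large cohorts.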
3.
Clin Respir J ; 18(5): e13769, 2024 May.
Article En | MEDLINE | ID: mdl-38736274

BACKGROUND: Lung cancer is the leading cause of cancer-related death worldwide. This study aimed to establish novel multiclassification prediction models based on machine learning (ML) to predict the probability of malignancy in pulmonary nodules (PNs) and to compare with three published models. METHODS: Nine hundred fourteen patients with PNs were collected from four medical institutions (A, B, C and D), which were organized into tables containing clinical features, radiologic features and laboratory test features. Patients were divided into benign lesion (BL), precursor lesion (PL) and malignant lesion (ML) groups according to pathological diagnosis. Approximately 80% of patients in A (total/male: 632/269, age: 57.73 ± 11.06) were randomly selected as a training set; the remaining 20% were used as an internal test set; and the patients in B (total/male: 94/53, age: 60.04 ± 11.22), C (total/male: 94/47, age: 59.30 ± 9.86) and D (total/male: 94/61, age: 62.0 ± 11.09) were used as an external validation set. Logistic regression (LR), decision tree (DT), random forest (RF) and support vector machine (SVM) were used to establish prediction models. Finally, the Mayo model, Peking University People's Hospital (PKUPH) model and Brock model were externally validated in our patients. RESULTS: The AUC values of the RF model for MLs, PLs and BLs were 0.80 (95% CI: 0.73-0.88), 0.90 (95% CI: 0.82-0.99) and 0.75 (95% CI: 0.67-0.88), respectively. The weighted average AUC value of the RF model for the external validation set was 0.71 (95% CI: 0.67-0.73), and its AUC values for MLs, PLs and BLs were 0.71 (95% CI: 0.68-0.79), 0.98 (95% CI: 0.88-1.07) and 0.68 (95% CI: 0.61-0.74), respectively. The AUC values of the Mayo model, PKUPH model and Brock model were 0.68 (95% CI: 0.62-0.74), 0.64 (95% CI: 0.58-0.70) and 0.57 (95% CI: 0.49-0.65), respectively.
CONCLUSIONS: The RF model performed best, and its predictive performance was better than that of the three published models, which may provide a new noninvasive method for the risk assessment of PNs.


Lung Neoplasms , Machine Learning , Multiple Pulmonary Nodules , Aged , Female , Humans , Male , Middle Aged , Decision Trees , Lung Neoplasms/pathology , Lung Neoplasms/diagnosis , Lung Neoplasms/diagnostic imaging , Multiple Pulmonary Nodules/diagnostic imaging , Multiple Pulmonary Nodules/pathology , Multiple Pulmonary Nodules/diagnosis , Predictive Value of Tests , Retrospective Studies , ROC Curve , Solitary Pulmonary Nodule/diagnostic imaging , Solitary Pulmonary Nodule/pathology , Solitary Pulmonary Nodule/diagnosis , Support Vector Machine , Tomography, X-Ray Computed/methods
4.
PLoS One ; 19(5): e0303287, 2024.
Article En | MEDLINE | ID: mdl-38739586

Globally, stroke is the third-leading cause of mortality and disability combined, and one of the costliest diseases in society. More accurate predictions of stroke outcomes can guide healthcare organizations in allocating appropriate resources to improve care and reduce both the economic and social burden of the disease. We aim to develop and evaluate the performance and explainability of three supervised machine learning models and the traditional multinomial logistic regression (mLR) in predicting functional dependence and death three months after stroke, using routinely-collected data. This prognostic study included adult patients, registered in the Swedish Stroke Registry (Riksstroke) from 2015 to 2020. Riksstroke contains information on stroke care and outcomes among patients treated in hospitals in Sweden. Prognostic factors (features) included demographic characteristics, pre-stroke functional status, cardiovascular risk factors, medications, acute care, stroke type, and severity. The outcome was measured using the modified Rankin Scale at three months after stroke (a scale of 0-2 indicates independent, 3-5 dependent, and 6 dead). Outcome prediction models included support vector machines, artificial neural networks (ANN), eXtreme Gradient Boosting (XGBoost), and mLR. The models were trained and evaluated on 75% and 25% of the dataset, respectively. Model predictions were explained using SHAP values. The study included 102,135 patients (85.8% ischemic stroke, 53.3% male, mean age 75.8 years, and median NIHSS of 3). All models demonstrated similar overall accuracy (69%-70%). The ANN and XGBoost models performed significantly better than the mLR in classifying dependence with F1-scores of 0.603 (95% CI; 0.594-0.611) and 0.577 (95% CI; 0.568-0.586), versus 0.544 (95% CI; 0.545-0.563) for the mLR model. The factors that contributed most to the predictions were expectedly similar in the models, based on clinical knowledge. 
Our ANN and XGBoost models showed a modest improvement in prediction performance and explainability compared to mLR using routinely-collected data. Their improved ability to predict functional dependence may be of particular importance for the planning and organization of acute stroke care and rehabilitation.


Machine Learning , Stroke , Humans , Sweden/epidemiology , Male , Female , Stroke/physiopathology , Aged , Aged, 80 and over , Prognosis , Middle Aged , Registries , Support Vector Machine , Logistic Models , Neural Networks, Computer , Risk Factors
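Entry 4 defines its three-class outcome from the modified Rankin Scale: scores 0-2 indicate independence, 3-5 dependence, and 6 death. A minimal sketch of that mapping follows; the function name `mrs_outcome` and the label strings are illustrative choices, not identifiers from the study.

```python
def mrs_outcome(score: int) -> str:
    """Map a modified Rankin Scale score (0-6) to the three outcome classes."""
    if not 0 <= score <= 6:
        raise ValueError("modified Rankin Scale scores range from 0 to 6")
    if score <= 2:
        return "independent"
    if score <= 5:
        return "dependent"
    return "dead"


# Label a few hypothetical three-month follow-up scores.
labels = [mrs_outcome(s) for s in (0, 3, 6)]
```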
5.
PLoS One ; 19(5): e0302639, 2024.
Article En | MEDLINE | ID: mdl-38739639

Heart failure (HF) encompasses a diverse clinical spectrum, including instances of transient HF or HF with recovered ejection fraction, alongside persistent cases. This dynamic condition exhibits a growing prevalence and entails substantial healthcare expenditures, with anticipated escalation in the future. Classifying HF patients into three groups based on their ejection fraction, reduced (HFrEF), mid-range (HFmEF), and preserved (HFpEF), is essential for diagnosis, risk assessment, treatment choice, and the ongoing monitoring of heart failure. Nevertheless, obtaining a definitive prediction poses challenges, requiring the reliance on echocardiography. In contrast, an electrocardiogram (ECG) provides a straightforward, quick, continuous assessment of the patient's cardiac rhythm, serving as a cost-effective adjunct to echocardiography. In this research, we evaluate several machine learning (ML)-based classification models, such as K-nearest neighbors (KNN), neural networks (NN), support vector machines (SVM), and decision trees (TREE), to classify left ventricular ejection fraction (LVEF) for three categories of HF patients at hourly intervals, using 24-hour ECG recordings. Information from a heterogeneous group of 303 heart failure patients, encompassing HFpEF, HFmEF, or HFrEF classes, was acquired from a multicenter dataset involving both American and Greek populations. Features extracted from ECG data were employed to train the aforementioned ML classification models, with the training occurring in one-hour intervals. To optimize the classification of LVEF levels in coronary artery disease (CAD) patients, a nested cross-validation approach was employed for hyperparameter tuning. HF patients were best classified using TREE and KNN models, with an overall accuracy of 91.2% and 90.9%, and average area under the curve of the receiver operating characteristics (AUROC) of 0.98 and 0.99, respectively.
Furthermore, according to the experimental findings, the time periods of midnight-1 am, 8-9 am, and 10-11 pm were the ones that contributed to the highest classification accuracy. The results pave the way for creating an automated screening system tailored for patients with CAD, utilizing optimal measurement timings aligned with their circadian cycles.


Electrocardiography , Heart Failure , Machine Learning , Stroke Volume , Ventricular Function, Left , Humans , Heart Failure/physiopathology , Heart Failure/diagnosis , Female , Male , Electrocardiography/methods , Aged , Ventricular Function, Left/physiology , Middle Aged , Circadian Rhythm/physiology , Support Vector Machine , Neural Networks, Computer
6.
J Neural Eng ; 21(3)2024 May 16.
Article En | MEDLINE | ID: mdl-38718789

Objective. Attention deficit hyperactivity disorder (ADHD) is a prevalent neurodevelopmental disorder in children. While numerous intelligent methods are applied for its subjective diagnosis, they seldom consider the consistency problem of ADHD biomarkers. In practice, these data-driven approaches lead to varying learned features for ADHD classification across diverse ADHD datasets. This phenomenon significantly undermines the reliability of identified biomarkers and hampers the interpretability of these methods. Approach. In this study, we propose a cross-dataset feature selection (FS) module using a grouped SVM-based recursive feature elimination approach (G-SVM-RFE) to enhance biomarker consistency across multiple datasets. Additionally, we employ connectome gradient data for ADHD classification. In detail, we introduce the G-SVM-RFE method to effectively concentrate gradient components within a few brain regions, thereby increasing the likelihood of identifying these regions as ADHD biomarkers. The cross-dataset FS module is integrated into an existing binary hypothesis testing (BHT) framework. This module utilizes external datasets to identify global regions that yield stable biomarkers. Meanwhile, given a dataset awaiting classification as the local dataset, we learn its own specific regions to further improve accuracy on this dataset. Main results. By employing this module, our experiments achieve an average accuracy of 96.7% on diverse datasets. Importantly, the discriminative gradient components primarily originate from the global regions, providing evidence for the significance of these regions. We further identify regions with high appearance frequencies as biomarkers, where all the used global regions and one local region are recognized. Significance. These biomarkers align with existing research on impaired brain regions in children with ADHD.
Thus, our method demonstrates its validity by providing enhanced biological explanations derived from ADHD mechanisms.


Attention Deficit Disorder with Hyperactivity , Biomarkers , Support Vector Machine , Attention Deficit Disorder with Hyperactivity/diagnosis , Attention Deficit Disorder with Hyperactivity/classification , Humans , Biomarkers/analysis , Child , Male , Female , Connectome/methods , Brain/metabolism , Databases, Factual , Reproducibility of Results
7.
Comput Biol Med ; 175: 108447, 2024 Jun.
Article En | MEDLINE | ID: mdl-38691912

Deep vein thrombosis (DVT) represents a critical health concern due to its potential to lead to pulmonary embolism, a life-threatening complication. Early identification and prediction of DVT are crucial to prevent thromboembolic events and implement timely prophylactic measures in high-risk individuals. This study aims to examine the risk determinants associated with acute lower extremity DVT in hospitalized individuals. Additionally, it introduces an innovative approach by integrating Q-learning augmented colony predation search ant colony optimizer (QL-CPSACO) into the analysis. This algorithm, then combined with support vector machines (SVM), forms a bQL-CPSACO-SVM feature selection model dedicated to crafting a clinical risk prognostication model for DVT. The effectiveness of the proposed algorithm's optimization and the model's accuracy are assessed through experiments utilizing the CEC 2017 benchmark functions and predictive analyses on the DVT dataset. The experimental results reveal that the proposed model achieves an outstanding accuracy of 95.90% in predicting DVT. Key parameters such as D-dimer, normal plasma prothrombin time, prothrombin percentage activity, age, previously documented DVT, leukocyte count, and thrombocyte count demonstrate significant value in the prognostication of DVT. The proposed method provides a basis for risk assessment at the time of patient admission and offers substantial guidance to physicians in making therapeutic decisions.


Support Vector Machine , Venous Thrombosis , Humans , Female , Male , Algorithms , Middle Aged , Hospitalization , Aged , Risk Factors , Risk Assessment , Adult
8.
Spectrochim Acta A Mol Biomol Spectrosc ; 316: 124351, 2024 Aug 05.
Article En | MEDLINE | ID: mdl-38692109

Epidermal growth factor receptor (EGFR) plays a pivotal role in the initiation and progression of gliomas. In particular, in glioblastoma, EGFR amplification emerges as a catalyst for invasion, proliferation, and resistance to radiotherapy and chemotherapy. Current approaches are not capable of providing rapid diagnostic results of molecular pathology. In this study, we propose, for the first time, a terahertz spectroscopic approach for predicting the EGFR amplification status of gliomas. A machine learning model was constructed using the terahertz response of the measured glioma tissues, including the absorption coefficient, refractive index, and dielectric loss tangent. The novelty of our model is the integration of three classical base classifiers, i.e., support vector machine, random forest, and extreme gradient boosting. Because the ensemble learning method combines the advantages of various base classifiers, the model generalizes better. The effectiveness of the proposed method was validated by applying an individual test set. The optimal performance of the integrated algorithm was verified with an area under the curve (AUC) maximum of 85.8%. This signifies a significant stride toward more effective and rapid diagnostic tools for guiding postoperative therapy in gliomas.


ErbB Receptors , Glioma , Terahertz Spectroscopy , Humans , Glioma/genetics , Glioma/pathology , Glioma/diagnosis , ErbB Receptors/genetics , ErbB Receptors/metabolism , Terahertz Spectroscopy/methods , Machine Learning , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Gene Amplification , Algorithms , Support Vector Machine
9.
Environ Sci Technol ; 58(19): 8404-8416, 2024 May 14.
Article En | MEDLINE | ID: mdl-38698567

In densely populated urban areas, PM2.5 has a direct impact on the health and quality of residents' life. Thus, understanding the disparities of PM2.5 is crucial for ensuring urban sustainability and public health. Traditional prediction models often overlook the spillover effects within urban areas and the complexity of the data, leading to inaccurate spatial predictions of PM2.5. We propose Deep Support Vector Regression (DSVR) that models the urban areas as a graph, with grid center points as the nodes and the connections between grids as the edges. Nature and human activity features of each grid are initialized as the representation of each node. Based on the graph, DSVR uses random diffusion-based deep learning to quantify the spillover effects of PM2.5. It leverages random walk to uncover more extensive spillover relationships between nodes, thereby capturing both the local and nonlocal spillover effects of PM2.5. And then it engages in predictive learning using the feature vectors that encapsulate spillover effects, enhancing the understanding of PM2.5 disparities and connections across different regions. By applying our proposed model in the northern region of New York for predictive performance analysis, we found that DSVR consistently outperforms other models. During periods of PM2.5 surges, the R-square of DSVR reaches as high as 0.729, outperforming non-spillover models by 2.5 to 5.7 times and traditional spatial metric models by 2.2 to 4.6 times. Therefore, our proposed model holds significant importance for understanding disparities of PM2.5 air pollution in urban areas, taking the first steps toward a new method that considers both the spillover effects and nonlinear feature of data for prediction.


Air Pollution , Particulate Matter , Support Vector Machine , Humans , Air Pollutants/analysis , Cities , Environmental Monitoring
10.
PLoS One ; 19(5): e0301360, 2024.
Article En | MEDLINE | ID: mdl-38771772

Typical machine learning classification benchmark problems often ignore the full input data structures present in real-world classification problems. Here we aim to represent additional information as "hints" for classification. We show that under a specific realistic conditional independence assumption, the hint information can be included by late fusion. In two experiments involving image classification with hints taking the form of text metadata, we demonstrate the feasibility and performance of the fusion scheme. We fuse the output of pre-trained image classifiers with the output of pre-trained text models. We show that calibration of the pre-trained models is crucial for the performance of the fused model. We compare the performance of the fusion scheme with a mid-level fusion scheme based on support vector machines and find that these two methods tend to perform quite similarly, albeit the late fusion scheme has only negligible computational costs.


Support Vector Machine , Machine Learning , Algorithms , Image Processing, Computer-Assisted/methods , Humans
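Entry 10 states that, under a conditional independence assumption, hint information can be combined with an image classifier by late fusion. The abstract does not give the formula, so the sketch below uses the standard product rule that follows from assuming the image and the hint are independent given the class: p(y | image, hint) is proportional to p(y | image) * p(y | hint) / p(y). Whether this matches the paper's exact scheme is an assumption; the function name `late_fusion` and the toy probabilities are hypothetical.

```python
def late_fusion(p_image, p_text, prior):
    """Fuse two calibrated class posteriors under conditional independence.

    p_image[k] = p(class k | image), p_text[k] = p(class k | hint),
    prior[k]   = p(class k). Returns the normalized fused posterior.
    """
    if not (len(p_image) == len(p_text) == len(prior)):
        raise ValueError("all inputs must cover the same classes")
    # Product rule: divide by the prior so it is not counted twice.
    unnormalized = [pi * pt / pr for pi, pt, pr in zip(p_image, p_text, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two classes with a uniform prior: both models lean toward class 0,
# so the fused posterior is more confident than either input.
fused = late_fusion([0.7, 0.3], [0.8, 0.2], [0.5, 0.5])
```

The fusion itself is a handful of multiplications per class, which is consistent with the abstract's remark that late fusion has negligible computational cost. It also shows why calibration matters: miscalibrated inputs are multiplied together, so their errors compound.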
11.
Sci Rep ; 14(1): 10714, 2024 05 10.
Article En | MEDLINE | ID: mdl-38730250

A prompt diagnosis of breast cancer in its earliest phases is necessary for effective treatment. While Computer-Aided Diagnosis systems play a crucial role in automated mammography image processing, interpretation, grading, and early detection of breast cancer, existing approaches face limitations in achieving optimal accuracy. This study addresses these limitations by hybridizing the improved quantum-inspired binary Grey Wolf Optimizer with the Support Vector Machines Radial Basis Function Kernel. This hybrid approach aims to enhance the accuracy of breast cancer classification by determining the optimal Support Vector Machine parameters. The motivation for this hybridization lies in the need for improved classification performance compared to existing optimizers such as Particle Swarm Optimization and Genetic Algorithm. The efficacy of the proposed IQI-BGWO-SVM approach is evaluated on the MIAS dataset using several metrics, including accuracy, sensitivity, and specificity. Furthermore, the application of IQI-BGWO-SVM for feature selection is explored, and the results are compared. Experimental findings demonstrate that the suggested IQI-BGWO-SVM technique outperforms state-of-the-art classification methods on the MIAS dataset, with a resulting mean accuracy, sensitivity, and specificity of 99.25%, 98.96%, and 100%, respectively, using tenfold cross-validation.


Algorithms , Breast Neoplasms , Support Vector Machine , Humans , Breast Neoplasms/diagnosis , Female , Mammography/methods , Diagnosis, Computer-Assisted/methods
12.
J Neuroeng Rehabil ; 21(1): 69, 2024 May 09.
Article En | MEDLINE | ID: mdl-38725065

BACKGROUND: In the practical application of sarcopenia screening, there is a need for faster, time-saving, and community-friendly detection methods. The primary purpose of this study was to perform sarcopenia screening in community-dwelling older adults and investigate whether surface electromyogram (sEMG) from hand grip could potentially be used to detect sarcopenia using machine learning (ML) methods with reasonable features extracted from sEMG signals. The secondary aim was to provide the interpretability of the obtained ML models using a novel feature importance estimation method. METHODS: A total of 158 community-dwelling older residents (≥ 60 years old) were recruited. After screening through the diagnostic criteria of the Asian Working Group for Sarcopenia in 2019 (AWGS 2019) and data quality check, participants were assigned to the healthy group (n = 45) and the sarcopenic group (n = 48). sEMG signals from six forearm muscles were recorded during the hand grip task at 20% maximal voluntary contraction (MVC) and 50% MVC. After filtering recorded signals, nine representative features were extracted, including six time-domain features plus three time-frequency domain features. Then, a voting classifier ensembled by a support vector machine (SVM), a random forest (RF), and a gradient boosting machine (GBM) was implemented to classify healthy versus sarcopenic participants. Finally, the SHapley Additive exPlanations (SHAP) method was utilized to investigate feature importance during classification. RESULTS: Seven out of the nine features exhibited statistically significant differences between healthy and sarcopenic participants in both 20% and 50% MVC tests. Using these features, the voting classifier achieved 80% sensitivity and 73% accuracy through a five-fold cross-validation. Such performance was better than each of the SVM, RF, and GBM models alone. 
Lastly, SHAP results revealed that the wavelength (WL) and the kurtosis of continuous wavelet transform coefficients (CWT_kurtosis) had the highest feature impact scores. CONCLUSION: This study proposed a method for community-based sarcopenia screening using sEMG signals of forearm muscles. Using a voting classifier with nine representative features, the accuracy exceeds 70% and the sensitivity exceeds 75%, indicating moderate classification performance. Interpretable results obtained from the SHAP model suggest that motor unit (MU) activation mode may be a key factor affecting sarcopenia.


Electromyography , Hand Strength , Independent Living , Machine Learning , Sarcopenia , Humans , Sarcopenia/diagnosis , Sarcopenia/physiopathology , Electromyography/methods , Aged , Male , Female , Hand Strength/physiology , China , Middle Aged , Muscle, Skeletal/physiopathology , Support Vector Machine , Aged, 80 and over , East Asian People
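Entry 12 combines an SVM, a random forest, and a gradient boosting machine in a voting classifier. The abstract does not say whether the vote is hard (majority of predicted labels) or soft (averaged probabilities). The sketch below illustrates the hard-voting variant as an assumption, using hypothetical per-model predictions rather than trained models.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Hard voting: for each sample, return the label most models predicted."""
    if not model_predictions:
        raise ValueError("at least one model's predictions are required")
    voted = []
    # zip(*...) walks the samples, yielding one label per model.
    for labels in zip(*model_predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical predictions for four participants (1 = sarcopenic, 0 = healthy).
svm_pred = [1, 0, 1, 0]
rf_pred = [1, 1, 0, 0]
gbm_pred = [0, 1, 1, 0]
ensemble_pred = majority_vote(svm_pred, rf_pred, gbm_pred)
```

With three models and two classes a tie cannot occur. For other configurations a tie-breaking rule would need to be chosen explicitly.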
13.
J Neural Eng ; 21(3)2024 May 17.
Article En | MEDLINE | ID: mdl-38722315

Objective. Electroencephalography (EEG) has been widely used in motor imagery (MI) research by virtue of its high temporal resolution and low cost, but its low spatial resolution is still a major criticism. The EEG source localization (ESL) algorithm effectively improves the spatial resolution of the signal by inverting the scalp EEG to extrapolate the cortical source signal, thus enhancing the classification accuracy. Approach. To address the problem of poor spatial resolution of EEG signals, this paper proposed a sub-band source chaotic entropy feature extraction method based on sub-band ESL. Firstly, the preprocessed EEG signals were filtered into 8 sub-bands. Each sub-band signal was source localized respectively to reveal the activation patterns of specific frequency bands of the EEG signals and the activities of specific brain regions in the MI task. Then, approximate entropy, fuzzy entropy and permutation entropy were extracted from the source signal as features to quantify the complexity and randomness of the signal. Finally, the classification of different MI tasks was achieved using support vector machine. Main result. The proposed method was validated on two MI public datasets (brain-computer interface (BCI) competition III IVa, BCI competition IV 2a) and the results showed that the classification accuracies were higher than the existing methods. Significance. The spatial resolution of the signal was improved by sub-band EEG localization in the paper, which provided a new idea for EEG MI research.


Brain-Computer Interfaces , Electroencephalography , Entropy , Imagination , Electroencephalography/methods , Humans , Imagination/physiology , Nonlinear Dynamics , Algorithms , Support Vector Machine , Movement/physiology , Reproducibility of Results
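Entry 13 lists permutation entropy among the features extracted from the source signals. As an illustration of that one feature, here is a pure-Python sketch of the usual ordinal-pattern definition: slide a window over the signal, record the rank order of the samples in each window, and take the Shannon entropy of the pattern frequencies. The parameter defaults (`order=3`, `delay=1`) are illustrative assumptions; the abstract does not report the settings used.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Shannon entropy (bits) of ordinal patterns of length `order`."""
    n_windows = len(signal) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("signal too short for the given order and delay")
    patterns = Counter()
    for start in range(n_windows):
        window = [signal[start + j * delay] for j in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        patterns[pattern] += 1
    entropy = -sum((c / n_windows) * math.log2(c / n_windows)
                   for c in patterns.values())
    if normalize:
        # Divide by the maximum, log2(order!), to scale into [0, 1].
        entropy /= math.log2(math.factorial(order))
    return entropy


# A short illustrative series with three distinct ordinal patterns.
pe_bits = permutation_entropy([4, 7, 9, 10, 6, 11, 3], order=3, normalize=False)
```

A strictly increasing signal contains a single pattern and therefore has zero permutation entropy, while a signal in which all patterns are equally frequent reaches the normalized maximum of 1.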
14.
J Cell Mol Med ; 28(9): e18372, 2024 May.
Article En | MEDLINE | ID: mdl-38747737

Multicellular organisms have dense affinity with the coordination of cellular activities, which severely depend on communication across diverse cell types. Cell-cell communication (CCC) is often mediated via ligand-receptor interactions (LRIs). Existing CCC inference methods are limited to known LRIs. To address this problem, we developed a comprehensive CCC analysis tool SEnSCA by integrating single cell RNA sequencing and proteome data. SEnSCA mainly contains potential LRI acquisition and CCC strength evaluation. For acquiring potential LRIs, it first extracts LRI features and reduces the feature dimension, subsequently constructs negative LRI samples through K-means clustering, finally acquires potential LRIs based on Stacking ensemble comprising support vector machine, 1D-convolutional neural networks and multi-head attention mechanism. During CCC strength evaluation, SEnSCA conducts LRI filtering and then infers CCC by combining the three-point estimation approach and single cell RNA sequencing data. SEnSCA computed better precision, recall, accuracy, F1 score, AUC and AUPR under most of conditions when predicting possible LRIs. To better illustrate the inferred CCC network, SEnSCA provided three visualization options: heatmap, bubble diagram and network diagram. Its application on human melanoma tissue demonstrated its reliability in CCC detection. In summary, SEnSCA offers a useful CCC inference tool and is freely available at https://github.com/plhhnu/SEnSCA.


Cell Communication , Single-Cell Analysis , Humans , Ligands , Single-Cell Analysis/methods , Software , Computational Biology/methods , Algorithms , Support Vector Machine , Sequence Analysis, RNA/methods , Melanoma/metabolism , Melanoma/pathology , Melanoma/genetics , Proteome/metabolism , Neural Networks, Computer
15.
Sci Rep ; 14(1): 11022, 2024 05 14.
Article En | MEDLINE | ID: mdl-38745042

The (re)hemorrhage in patients with sporadic cerebral cavernous malformations (CCM) was the primary aim for CCM management. However, accurately identifying the potential (re)hemorrhage among sporadic CCM patients in advance remains a challenge. This study aims to develop machine learning models to detect potential (re)hemorrhage in sporadic CCM patients. This study was based on a dataset of 731 sporadic CCM patients in the open data platform Dryad. Sporadic CCM patients were followed up for 5 years from January 2003 to December 2018. Support vector machine (SVM), stacked generalization, and extreme gradient boosting (XGBoost) were used to construct models. The performance of models was evaluated by area under receiver operating characteristic curves (AUROC), area under the precision-recall curve (PR-AUC) and other metrics. A total of 517 patients with sporadic CCM were included (330 female [63.8%], mean [SD] age at diagnosis, 42.1 [15.5] years). 76 (re)hemorrhages (14.7%) occurred during follow-up. Among the 3 machine learning models, the XGBoost model yielded the highest mean (SD) AUROC (0.87 [0.06]) in cross-validation. The top 4 features of the XGBoost model were ranked with SHAP (SHapley Additive exPlanations). The All-Elements XGBoost model achieved an AUROC of 0.84 and a PR-AUC of 0.49 in the testing set, with a sensitivity of 0.86 and a specificity of 0.76. Importantly, the 4-Elements XGBoost model developed using the top 4 features achieved an AUROC of 0.83, a PR-AUC of 0.40, a sensitivity of 0.79, and a specificity of 0.72 in the testing set. Two machine learning-based models achieved accurate performance in identifying potential (re)hemorrhages within 5 years in sporadic CCM patients. These models may provide insights for clinical decision-making.


Hemangioma, Cavernous, Central Nervous System , Machine Learning , Humans , Female , Male , Hemangioma, Cavernous, Central Nervous System/diagnosis , Adult , Middle Aged , Support Vector Machine , ROC Curve , Cerebral Hemorrhage/diagnosis
16.
Sci Rep ; 14(1): 11164, 2024 05 15.
Article En | MEDLINE | ID: mdl-38750185

Electrophysiological studies have investigated predictive processing in music by examining event-related potentials (ERPs) elicited by the violation of musical expectations. While several studies have reported that the predictability of stimuli can modulate the amplitude of ERPs, it is unclear how specific the representation of the expected note is. The present study addressed this issue by recording omitted stimulus potentials (OSPs) to avoid contamination of top-down predictive processing by bottom-up sensory processing. Decoding of the omitted content was attempted using a support vector machine, a type of machine learning. ERP responses to the omission of four target notes (E, F, A, and C) at the same position in familiar and unfamiliar melodies were recorded from 25 participants. The results showed that the omission N1 was larger in the familiar melody condition than in the unfamiliar melody condition. The decoding accuracy of the four omitted notes was significantly higher in the familiar melody condition than in the unfamiliar melody condition. These results suggest that the OSPs contain discriminable predictive information and that the higher the predictability, the more specific the generated representation of the expected note.


Acoustic Stimulation , Electroencephalography , Music , Humans , Female , Male , Young Adult , Adult , Auditory Perception/physiology , Support Vector Machine , Evoked Potentials, Auditory/physiology , Evoked Potentials/physiology
17.
Neuroimage ; 293: 120625, 2024 Jun.
Article En | MEDLINE | ID: mdl-38704056

Principal component analysis (PCA) has been widely employed for dimensionality reduction prior to multivariate pattern classification (decoding) in EEG research. The goal of the present study was to evaluate the effect of PCA on decoding accuracy (using support vector machines) across a broad range of experimental paradigms. We evaluated several PCA variations, including group-based and subject-based component decomposition and the application of Varimax rotation or no rotation. We also varied the number of PCs retained for the decoding analysis. We evaluated the resulting decoding accuracy for seven common event-related potential components (N170, mismatch negativity, N2pc, P3b, N400, lateralized readiness potential, and error-related negativity). We also examined more challenging decoding tasks, including decoding of face identity, facial expression, stimulus location, and stimulus orientation. The datasets also varied in the number and density of electrode sites. Our findings indicated that none of the PCA approaches consistently improved decoding performance relative to no PCA, and the application of PCA frequently reduced decoding performance. Researchers should therefore be cautious about using PCA prior to decoding EEG data from similar experimental paradigms, populations, and recording setups.


Electroencephalography , Principal Component Analysis , Support Vector Machine , Humans , Electroencephalography/methods , Female , Male , Adult , Young Adult , Evoked Potentials/physiology , Brain/physiology , Signal Processing, Computer-Assisted
18.
Schizophr Res ; 267: 519-527, 2024 May.
Article En | MEDLINE | ID: mdl-38704344

BACKGROUND: Previous investigations have revealed substantial differences in neuroimaging characteristics between healthy controls (HCs) and individuals diagnosed with schizophrenia (SCZ). However, it is not entirely clear how brain activity is linked to symptoms in schizophrenia, and there is a need for reliable brain imaging markers for treatment prediction. METHODS: In this longitudinal study, we examined 56 individuals diagnosed with SCZ and 51 HCs. The SCZ patients underwent a three-month course of antipsychotic treatment. We employed resting-state functional magnetic resonance imaging (fMRI) along with fractional Amplitude of Low Frequency Fluctuations (fALFF) and support vector regression (SVR) methods for data acquisition and subsequent analysis. RESULTS: At baseline, we noted lower fALFF values in the right postcentral/precentral gyrus and left postcentral gyrus, coupled with higher fALFF values in the left hippocampus and right putamen, in SCZ patients compared to the HCs. However, in the brain regions with abnormal baseline fALFF values, SCZ patients who completed the follow-up showed no significant differences in fALFF values after 3 months of treatment compared to baseline. The fALFF values in the right postcentral/precentral gyrus and left postcentral gyrus were useful in predicting treatment effects. CONCLUSION: Our findings suggest that reduced fALFF values in the sensory-motor networks and increased fALFF values in the limbic system may constitute distinctive neurobiological features in SCZ patients. These findings may serve as potential neuroimaging markers for the prognosis of SCZ patients.


Antipsychotic Agents , Limbic System , Magnetic Resonance Imaging , Schizophrenia , Humans , Schizophrenia/physiopathology , Schizophrenia/diagnostic imaging , Schizophrenia/drug therapy , Male , Female , Adult , Antipsychotic Agents/pharmacology , Limbic System/diagnostic imaging , Limbic System/physiopathology , Longitudinal Studies , Young Adult , Treatment Outcome , Outcome Assessment, Health Care , Middle Aged , Support Vector Machine
19.
Aquat Toxicol ; 271: 106936, 2024 Jun.
Article En | MEDLINE | ID: mdl-38723470

In recent years, with the rapid development of society, organic compounds have been released into aquatic environments in various forms, posing a significant threat to the survival of aquatic organisms. The assessment of developmental toxicity is an important part of environmental safety risk systems, helping to identify the potential impacts of organic compounds on the embryonic development of aquatic organisms and enabling early detection and warning of potential ecological risks. However, binary classification models cannot accurately classify organic compounds. Therefore, it is crucial to construct a multiclassification model for predicting the developmental toxicity of organic compounds. In this study, binary and multiclassification models were developed based on the ToxCast™ Phase I chemical library and literature data. The random forest, support vector machine, extreme gradient boosting, adaptive gradient boosting, and C5.0 decision tree algorithms, as well as 8 types of molecular fingerprints, were used to establish multiclassification base models for predicting developmental toxicity through 5-fold cross-validation and external validation. Ultimately, a multiclassification ensemble model was derived through a voting method. The performance of the binary ensemble model, as measured by the balanced accuracy, was 0.918, while that of the multiclassification model was 0.819. The developmental toxicity voting ensemble model (DT-VEM) achieved accuracies of 0.804, 0.834, and 0.855. Furthermore, by utilizing the XGBoost machine learning algorithm to construct separate models for molecular descriptors and substructure molecular fingerprints, we identified several substructures and physical properties related to developmental toxicity. Our research contributes to a more detailed classification of developmental toxicity, providing a new and valuable tool for predicting the developmental toxicity effects of unknown compounds. This supplement addresses the limitations of previous tools, as it offers an enhanced ability to predict potential developmental toxicity in novel compounds.
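
The voting step and the balanced-accuracy metric mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the DT-VEM implementation: the abstract does not say whether hard or weighted voting was used, so plain majority (hard) voting is assumed, and the three model outputs and class labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by hard voting.

    Ties are resolved in favour of the label seen first among the votes,
    which is an arbitrary choice made for this sketch.
    """
    voted = []
    for sample_votes in zip(*predictions_per_model):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical three-class labels (e.g. 0 = low, 1 = medium, 2 = high toxicity).
y_true = [0, 0, 1, 1, 2, 2]
model_a = [0, 0, 1, 2, 2, 2]
model_b = [0, 1, 1, 1, 2, 0]
model_c = [0, 0, 0, 1, 2, 2]

ensemble = majority_vote([model_a, model_b, model_c])
ensemble_score = balanced_accuracy(y_true, ensemble)
single_score = balanced_accuracy(y_true, model_a)
```

In this toy case each base model makes a different mistake, so the vote corrects all of them. Real base models tend to make correlated errors, and the gain from voting is usually smaller.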


Water Pollutants, Chemical , Zebrafish , Animals , Water Pollutants, Chemical/toxicity , Embryo, Nonmammalian/drug effects , Toxicity Tests , Embryonic Development/drug effects , Models, Biological , Algorithms , Support Vector Machine , Organic Chemicals/toxicity
20.
Chemosphere ; 358: 142222, 2024 Jun.
Article En | MEDLINE | ID: mdl-38714249

In this study, neural networks and support vector regression (SVR) were employed to predict the degradation of three pharmaceutically active compounds (PhACs): ibuprofen (IBP), diclofenac (DCF), and caffeine (CAF) within a stirred reactor featuring a flotation cell with two non-concentric ultraviolet lamps. A total of 438 datapoints were collected from published works and split into 70% training and 30% test datasets, while cross-validation was used to assess the training reliability. The models incorporated 15 input variables concerning reaction kinetics, molecular properties, hydrodynamic information, presence of radiation, and catalytic properties. The SVR model performed poorly because the ε hyperparameter ignored large errors at low concentration levels. Meanwhile, the Artificial Neural Network (ANN) model was able to provide rough estimates of the expected degradation of the pollutants without requiring information regarding reaction rate constants. The multi-objective optimization analysis suggested a leading role for ozone kinetics in the rapid degradation of the contaminants, and most of the results required intensification with hydrogen peroxide and the Fenton process. Although both models were affected by accuracy limitations, this work provided a lightweight model to evaluate different Advanced Oxidation Processes (AOPs) from general information regarding the process operational conditions as well as known molecular and catalytic properties.
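
The remark about the ε hyperparameter can be made concrete with the standard ε-insensitive loss that SVR minimizes: residuals smaller than ε cost nothing. The sketch below is illustrative only. The ε value and the normalized concentrations are hypothetical, since the abstract reports neither.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Standard SVR loss: zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def relative_error(y_true, y_pred):
    """Absolute error as a fraction of the true value."""
    return abs(y_true - y_pred) / abs(y_true)


epsilon = 0.1  # hypothetical tube width, in the same units as the target

# Hypothetical normalized concentrations as (true, predicted) pairs.
high = (0.90, 0.85)  # small relative error at a high concentration
low = (0.04, 0.08)   # prediction is double the true value at a low concentration

loss_high = epsilon_insensitive_loss(*high, epsilon)
loss_low = epsilon_insensitive_loss(*low, epsilon)
rel_high = relative_error(*high)
rel_low = relative_error(*low)
```

Both residuals fall inside the tube and incur zero loss, although the low-concentration prediction is off by 100% in relative terms. A fixed ε therefore gives the optimizer no incentive to fit low concentrations well. Common remedies, named here only as options, are a smaller ε or a log-transformed target.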


Diclofenac , Hydrogen Peroxide , Ibuprofen , Machine Learning , Neural Networks, Computer , Diclofenac/chemistry , Hydrogen Peroxide/chemistry , Ibuprofen/chemistry , Kinetics , Water Pollutants, Chemical/chemistry , Water Pollutants, Chemical/analysis , Caffeine/chemistry , Oxidation-Reduction , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/analysis , Ozone/chemistry , Support Vector Machine , Cost-Benefit Analysis , Ultraviolet Rays , Catalysis , Photolysis
...