Results 1 - 20 of 538
1.
Hum Exp Toxicol ; 43: 9603271241276981, 2024.
Article in English | MEDLINE | ID: mdl-39226487

ABSTRACT

Currently, the incidence of diquat (DQ) poisoning is increasing, and quickly predicting the prognosis of poisoned patients is crucial for clinical treatment. In this study, a total of 84 DQ poisoning patients were included, with 38 surviving and 46 deceased. The plasma DQ concentrations of DQ-poisoned patients, determined by liquid chromatography-mass spectrometry (LC-MS), were collected and analyzed together with their complete blood count (CBC) indicators. Based on the DQ concentration and CBC datasets, random forest diagnostic and prognostic models were established. The results showed that the initial DQ plasma concentration was highly correlated with patient prognosis. There was data redundancy in the CBC dataset, and continuous measurement of CBC tests could improve the model's predictive accuracy. After feature selection, the predictive accuracy of the CBC dataset significantly increased to 0.81 ± 0.17, with the most important features being white blood cells and neutrophils. The constructed CBC random forest prediction model achieved a high predictive accuracy of 0.95 ± 0.06 when diagnosing DQ poisoning. In conclusion, both DQ concentration and the CBC dataset can be used to predict the prognosis of DQ treatment. In the absence of DQ concentration, the random forest model using CBC data can effectively diagnose DQ poisoning and predict patient prognosis.


Subject(s)
Algorithms , Diquat , Humans , Diquat/blood , Diquat/poisoning , Female , Male , Prognosis , Adult , Blood Cell Count , Middle Aged , Herbicides/poisoning , Herbicides/blood , Young Adult , Adolescent , Random Forest
2.
Bioinformatics ; 40(Suppl 2): ii198-ii207, 2024 09 01.
Article in English | MEDLINE | ID: mdl-39230698

ABSTRACT

MOTIVATION: In the realm of precision medicine, effective patient stratification and disease subtyping demand innovative methodologies tailored for multi-omics data. Clustering techniques applied to multi-omics data have become instrumental in identifying distinct subgroups of patients, enabling a finer-grained understanding of disease variability. Meanwhile, clinical datasets are often small and must be aggregated from multiple hospitals. Online data sharing, however, is seen as a significant challenge due to privacy concerns, potentially impeding big data's role in medical advancements using machine learning. This work establishes a powerful framework for advancing precision medicine through unsupervised random forest-based clustering in combination with federated computing. RESULTS: We introduce a novel multi-omics clustering approach utilizing unsupervised random forests. The unsupervised nature of the random forest enables the determination of cluster-specific feature importance, unraveling key molecular contributors to distinct patient groups. Our methodology is designed for federated execution, a crucial aspect in the medical domain where privacy concerns are paramount. We have validated our approach on machine learning benchmark datasets as well as on cancer data from The Cancer Genome Atlas. Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability. Experiments indicate that local clustering performance can be improved through federated computing. AVAILABILITY AND IMPLEMENTATION: The proposed methods are available as an R-package (https://github.com/pievos101/uRF).


Subject(s)
Precision Medicine , Humans , Cluster Analysis , Precision Medicine/methods , Unsupervised Machine Learning , Machine Learning , Neoplasms , Privacy , Algorithms , Random Forest
3.
BMC Endocr Disord ; 24(1): 196, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39304867

ABSTRACT

OBJECTIVE: The primary objective of this study was to investigate the risk factors for diabetic peripheral neuropathy (DPN) and to establish an early diagnostic prediction model for its onset, based on clinical data and biochemical indices. METHODS: Retrospective data were collected from 1,446 diabetic patients at the First Affiliated Hospital of Anhui University of Chinese Medicine and were split into training and internal validation sets in a 7:3 ratio. Additionally, 360 diabetic patients from the Second Affiliated Hospital were used as an external validation cohort. Feature selection was conducted within the training set, where univariate logistic regression identified variables with a p-value < 0.05, followed by backward elimination to construct the logistic regression model. Concurrently, the random forest algorithm was applied to the training set to identify the top 10 most important features, with hyperparameter optimization performed via grid search combined with cross-validation. Model performance was evaluated using ROC curves, decision curve analysis, and calibration curves. Model fit was assessed using the Hosmer-Lemeshow test, followed by Brier Score evaluation for the random forest model. Ten-fold cross-validation was employed for further validation, and SHAP analysis was conducted to enhance model interpretability. RESULTS: A nomogram model was developed using logistic regression with key features: limb numbness, limb pain, diabetic retinopathy, diabetic kidney disease, urinary protein, diastolic blood pressure, white blood cell count, HbA1c, and high-density lipoprotein cholesterol. The model achieved AUCs of 0.91, 0.88, and 0.88 for the training, validation, and test sets, respectively, with a mean AUC of 0.902 across 10-fold cross-validation. Hosmer-Lemeshow test results showed p-values of 0.595, 0.418, and 0.126 for the training, validation, and test sets, respectively. 
The random forest model demonstrated AUCs of 0.95, 0.88, and 0.88 for the training, validation, and test sets, respectively, with a mean AUC of 0.886 across 10-fold cross-validation. The Brier score indicates a good calibration level, with values of 0.104, 0.143, and 0.142 for the training, validation, and test sets, respectively. CONCLUSION: The developed nomogram exhibits promise as an effective tool for the diagnosis of diabetic peripheral neuropathy in clinical settings.
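
The Brier score reported above is the mean squared difference between each predicted probability and the observed 0/1 outcome, so lower values indicate better calibrated predictions. The following is a minimal sketch of that calculation, not the authors' code; the function name `brier_score` and the four example patients are hypothetical and chosen only for illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical data (assumption, not from the study): 1 = DPN present, 0 = absent.
outcomes = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.6, 0.4]

score = brier_score(outcomes, probabilities)
print(f"Brier score: {score:.4f}")
```

A model that always predicts 0.5 scores 0.25 on any binary outcome, which gives a rough reference point when reading the reported values.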


Subject(s)
Diabetic Neuropathies , Humans , Diabetic Neuropathies/diagnosis , Diabetic Neuropathies/etiology , Male , Female , Middle Aged , Retrospective Studies , Logistic Models , Aged , Nomograms , Risk Factors , Early Diagnosis , Diabetes Mellitus, Type 2/complications , Prognosis , Adult , Algorithms , Random Forest
4.
Environ Geochem Health ; 46(10): 418, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39249634

ABSTRACT

Fluoride (F) is a trace element that is essential to the human body and occurs naturally in the environment. However, a deficiency or excess of F in the environment can potentially lead to human health issues. The pseudototal amount of F in soil often does not correlate directly with the F content in plants. Instead, the F content within plants tends to have a greater correlation with the bioavailable F in soils. In large-scale soil surveys, only the pseudototal elemental content of soils is typically measured, which may not be highly reliable for developing agricultural zoning plans. There are significant variations in the ability of different plants to accumulate F from soil. Additionally, due to variations in soil elemental absorption mechanisms among different plant species, when multiple crops are grown in an area, it is typically necessary to study the elemental absorption mechanisms of each crop. To address these issues, in this study, we examined the factors influencing F bioaccumulation coefficients in different crops based on 1:50,000 soil geochemical survey data. Using the random forest algorithm, four indicators (bioavailable P, bioavailable Zn, leachable Pb, and Sr) were selected from among 29 parameters to predict the F content within crops in place of bioavailable F in the soil. Compared with the multivariate linear regression (MLR) model, the random forest (RF) model provided more accurate and reliable predictions of the fluoride content in crops, with the RF model's prediction accuracy improving by approximately 95.23%. While the partial least squares regression (PLSR) model also offered improved accuracy over MLR, the RF model still outperformed PLSR in terms of prediction accuracy and robustness. The RF approach also maximized the utilization of existing geochemical survey data, enabling cross-species studies for the first time and avoiding redundant evaluations of different types of agricultural products in the same region.
In this investigation, we selected the Xining-Ledu region of Qinghai Province, China, as the study area and employed a random forest model to predict the crop F content in soils, providing a new methodological framework for crop production that effectively enhances agricultural quality and efficiency.


Subject(s)
Algorithms , Crops, Agricultural , Fluorides , Soil Pollutants , Crops, Agricultural/chemistry , Crops, Agricultural/metabolism , Fluorides/analysis , Soil Pollutants/analysis , Soil/chemistry , Environmental Monitoring/methods , Linear Models , Random Forest
5.
PLoS One ; 19(9): e0308408, 2024.
Article in English | MEDLINE | ID: mdl-39325753

ABSTRACT

Childhood and adolescent overweight and obesity are among the most serious public health challenges of the 21st century. A range of genetic, family, and environmental factors, and health behaviors are associated with childhood obesity. Developing models to predict childhood obesity requires careful examination of how these factors contribute to the emergence of childhood obesity. This paper employed Multiple Linear Regression (MLR), Random Forest (RF), Decision Tree (DT), and K-Nearest Neighbour (KNN) models to predict the age at the onset of childhood obesity in Saudi Arabia (S.A.) and to identify the significant factors associated with it. De-identified data from Arar and Riyadh regions of S.A. were used to develop the prediction models and to compare their performance using multiple prediction accuracy measures. The average age at the onset of obesity is 10.8 years with no significant difference between boys and girls. The most common age group for onset is 5-15 years. The RF model, with R² = 0.98, root mean square error = 0.44, and mean absolute error = 0.28, outperformed the other models, followed by MLR, DT, and KNN. The age at the onset of obesity was linked to several demographic, medical, and lifestyle factors including height and weight, parents' education level and income, consanguineous marriage, family history, autism, gestational age, nutrition in the first 6 months, birth weight, sleep hours, and lack of physical activities. The results can assist in reducing the childhood obesity epidemic in Saudi Arabia by identifying and managing high-risk individuals and providing better preventive care. Furthermore, the study findings can assist in predicting and preventing childhood obesity in other populations.
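
The three accuracy measures quoted for the regression models (R², root mean square error, mean absolute error) can be computed from observed and predicted values with their standard definitions. The sketch below is illustrative only; the function name `regression_metrics` and the onset ages are hypothetical and are not taken from the study data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical ages at onset in years (assumption, for illustration only).
observed = [8.0, 10.0, 12.0, 14.0]
predicted = [8.5, 9.5, 12.5, 13.5]

metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE are in the units of the outcome (here, years), whereas R² is unitless, which is why the abstract reports all three side by side.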


Subject(s)
Age of Onset , Decision Trees , Pediatric Obesity , Humans , Saudi Arabia/epidemiology , Pediatric Obesity/epidemiology , Child , Male , Female , Adolescent , Child, Preschool , Risk Factors , Random Forest
6.
PLoS One ; 19(9): e0310110, 2024.
Article in English | MEDLINE | ID: mdl-39240957

ABSTRACT

When conducting condition recognition research on AC contactor vibration signals through time-frequency analysis, the feature data exhibit a high degree of redundancy, which leads to repetitive information and hinders the accuracy of recognition. To address the redundancy issue in the features of AC contactor vibration signals, this study introduces a feature selection method based on Regularized Random Forest with Recursive Selection (RFRS). Initially, a test platform for AC contactor vibration signals was established, and time-frequency domain features of the AC contactor vibration signals were extracted. Subsequently, the traditional Random Forest (RF) was refined by optimizing its stopping criteria using the Recursive Feature Elimination approach and by incorporating a regularization coefficient during the splitting process to direct the split towards significant features. This modification not only enhances the Random Forest's capacity to leverage existing information but also introduces a bias, enabling it to favor important features. Finally, through case analysis, the proposed method effectively reduced the dimensionality of the feature set and achieved an average of 87.37% for Recall, 87.41% for F1-Score, 88.38% for Precision, and 85.74% for Accuracy. The overall performance of this method surpasses that of the three mainstream feature selection methods: Spearman's rank correlation coefficient method, the embedded method, and the filter method. This study thus provides a rather effective feature selection approach for the state recognition study of AC contactors.
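
Recall, precision, F1-score, and accuracy all derive from the same confusion-matrix counts. The sketch below shows the standard binary-class definitions; it is an illustration under assumptions, since the abstract does not state how the per-class values were averaged, and the function name `classification_metrics` and the counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts (assumption, not from the study).
metrics = classification_metrics(tp=6, fp=2, fn=4, tn=8)
print(metrics)
```

For a multi-state recognition task, these values would typically be computed per class and then averaged, which is one reason the four reported figures can differ from one another.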


Subject(s)
Vibration , Algorithms , Humans , Signal Processing, Computer-Assisted , Random Forest
7.
Sci Rep ; 14(1): 22673, 2024 09 30.
Article in English | MEDLINE | ID: mdl-39349769

ABSTRACT

The COVID-19 pandemic has underscored the critical need for precise diagnostic methods to distinguish between similar respiratory infections, such as COVID-19 and Mycoplasma pneumoniae (MP). Identifying key biomarkers and utilizing machine learning techniques, such as random forest analysis, can significantly improve diagnostic accuracy. We conducted a retrospective analysis of clinical and laboratory data from 214 patients with acute respiratory infections, collected between October 2022 and October 2023 at the Second Hospital of Nanping. The study population was categorized into three groups: COVID-19 positive (n = 52), MP positive (n = 140), and co-infected (n = 22). Key biomarkers, including C-reactive protein (CRP), procalcitonin (PCT), interleukin-6 (IL-6), and white blood cell (WBC) counts, were evaluated. Correlation analyses were conducted to assess relationships between biomarkers within each group. The random forest analysis was applied to evaluate the discriminative power of these biomarkers. The random forest model demonstrated high classification performance, with area under the ROC curve (AUC) scores of 0.86 (95% CI: 0.70-0.97) for COVID-19, 0.79 (95% CI: 0.64-0.92) for MP, 0.69 (95% CI: 0.50-0.87) for co-infections, and 0.90 (95% CI: 0.83-0.95) for the micro-average ROC. Additionally, the precision-recall curve for the random forest classifier showed a micro-average AUC of 0.80 (95% CI: 0.69-0.91). Confusion matrices highlighted the model's accuracy (0.77) and biomarker relationships. The SHAP feature importance analysis indicated that age (0.27), CRP (0.25), IL-6 (0.14), and PCT (0.14) were the most significant predictors. The integration of computational methods, particularly random forest analysis, in evaluating clinical and biomarker data presents a promising approach for enhancing diagnostic processes for infectious diseases.
Our findings support the use of specific biomarkers in differentiating between COVID-19 and MP, potentially leading to more targeted and effective diagnostic strategies. This study underscores the potential of machine learning techniques in improving disease classification in the era of precision medicine.


Subject(s)
Biomarkers , C-Reactive Protein , COVID-19 , Machine Learning , Pneumonia, Mycoplasma , Procalcitonin , Humans , COVID-19/diagnosis , COVID-19/blood , Pneumonia, Mycoplasma/diagnosis , Pneumonia, Mycoplasma/blood , Biomarkers/blood , Male , Female , Middle Aged , Retrospective Studies , Adult , Diagnosis, Differential , Procalcitonin/blood , C-Reactive Protein/analysis , C-Reactive Protein/metabolism , Aged , Interleukin-6/blood , SARS-CoV-2/isolation & purification , Coinfection/diagnosis , Coinfection/blood , Mycoplasma pneumoniae , Leukocyte Count , ROC Curve , Random Forest
8.
Poult Sci ; 103(11): 104201, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39197340

ABSTRACT

The differences in lipids in duck eggs between the 2 rearing systems during storage have not been fully studied. Herein, we propose untargeted lipidomics combined with a random forest (RF) algorithm to identify potential marker lipids based on ultra-performance liquid chromatography-mass spectrometry (UPLC-MS/MS). A total of 106 and 16 differential lipids (DLs) were screened in egg yolk and white, respectively. In yolk, metabolic pathway analysis of DLs revealed that glycerophospholipid metabolism and sphingolipid metabolism were the key metabolic pathways in the traditional free-range system (TFS) during storage, whereas glycosylphosphatidylinositol-anchored biosynthesis and glyceride metabolism were the key pathways in the floor-rearing system (FRS). In egg white, the key pathway in both systems is the biosynthesis of unsaturated fatty acids. Combined with the RF algorithm, 12 marker lipids were screened during storage. Therefore, this study elucidates the changes in lipids in duck eggs during storage in 2 rearing systems and provides new ideas for screening marker lipids during storage. This approach is highly important for evaluating the quality of egg and egg products and provides guidance for duck egg production.


Subject(s)
Ducks , Lipidomics , Machine Learning , Animals , Lipidomics/methods , Animal Husbandry/methods , Food Storage , Algorithms , Egg Yolk/chemistry , Tandem Mass Spectrometry/veterinary , Ovum/chemistry , Egg White/chemistry , Lipids/analysis , Lipids/chemistry , Random Forest
9.
Cells ; 13(16)2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39195201

ABSTRACT

Colorectal cancer (CRC) is a frequent, worldwide tumor described for its huge complexity, including inter-/intra-heterogeneity and tumor microenvironment (TME) variability. Intra-tumor heterogeneity and its connections with metabolic reprogramming and epithelial-mesenchymal transition (EMT) were investigated with explorative shotgun proteomics complemented by a Random Forest (RF) machine-learning approach. Deep and superficial tumor regions and distant-site non-tumor samples from the same patients (n = 16) were analyzed. Among the 2009 proteins analyzed, 91 proteins, including 23 novel potential CRC hallmarks, showed significant quantitative changes. In addition, a 98.4% accurate classification of the three analyzed tissues was obtained by RF using a set of 21 proteins. Subunit E1 of 2-oxoglutarate dehydrogenase (OGDH-E1) was the best classifying factor for the superficial tumor region, while sorting nexin-18 and coatomer-beta protein (beta-COP), implicated in protein trafficking, classified the deep region. Down- and up-regulations of metabolic checkpoints involved different proteins in superficial and deep tumors. Analogously to immune checkpoints affecting the TME, cytoskeleton and extracellular matrix (ECM) dynamics were crucial for EMT. Galectin-3, basigin, S100A9, and fibronectin involved in TME-CRC-ECM crosstalk were found to be differently variated in both tumor regions. Different metabolic strategies appeared to be adopted by the two CRC regions to uncouple the Krebs cycle and cytosolic glucose metabolism, promote lipogenesis, promote amino acid synthesis, down-regulate bioenergetics in mitochondria, and up-regulate oxidative stress. Finally, correlations with the Dukes stage and budding supported the finding of novel potential CRC hallmarks and therapeutic targets.


Subject(s)
Colorectal Neoplasms , Extracellular Matrix , Machine Learning , Proteomics , Tumor Microenvironment , Humans , Colorectal Neoplasms/metabolism , Colorectal Neoplasms/pathology , Colorectal Neoplasms/immunology , Proteomics/methods , Extracellular Matrix/metabolism , Epithelial-Mesenchymal Transition , Signal Transduction , Male , Female , Middle Aged , Aged , Random Forest
10.
Food Chem ; 461: 140838, 2024 Dec 15.
Article in English | MEDLINE | ID: mdl-39167944

ABSTRACT

Milk casein is regarded as a source of potential sleep-enhancing peptides. Although various casein hydrolysates have exhibited sleep-enhancing activity, the underlying reason remains unclear. This study first revealed the structural features of potential sleep-enhancing peptides from casein hydrolysates analyzed through peptidomics and multivariate analysis. Additionally, a random forest model and a potential Tyr-based peptide library were established, and those peptides were then quantified to facilitate rapid screening. Our findings indicated that YP-, YI/L-, and YQ-type peptides with 4-10 amino acids contributed more to the higher sleep-enhancing activity of casein hydrolysates, due to their crucial structural features and abundant numbers. Furthermore, three novel strong sleep-enhancing peptides, YQKFPQY, YPFPGPIPN, and YIPIQY, were screened, and their activities were validated in vivo. Molecular docking results elucidated the importance of the YP/I/L/Q- structure at the N-terminus of casein peptides in forming crucial hydrogen bond and π-alkyl interactions with His-102 and Asn-60, respectively, in the GABAA receptor for activation.


Subject(s)
Caseins , Peptides , Sleep , Caseins/chemistry , Animals , Peptides/chemistry , Molecular Docking Simulation , Mice , Male , Humans , Amino Acid Sequence , Random Forest
11.
Int J Med Inform ; 191: 105568, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39111243

ABSTRACT

PURPOSE: Parametric regression models have been the main statistical method for identifying average treatment effects. Causal machine learning models showed promising results in estimating heterogeneous treatment effects in causal inference. Here we aimed to compare the application of causal random forest (CRF) and linear regression modelling (LRM) to estimate the effects of organisational factors on ICU efficiency. METHODS: A retrospective analysis of 277,459 patients admitted to 128 Brazilian and Uruguayan ICUs over three years. ICU efficiency was assessed using the average standardised efficiency ratio (ASER), measured as the average of the standardised mortality ratio (SMR) and the standardised resource use (SRU) according to the SAPS-3 score. Using a causal inference framework, we estimated and compared the conditional average treatment effect (CATE) of seven common structural and organisational factors on ICU efficiency using LRM with interaction terms and CRF. RESULTS: The hospital mortality was 14%; median ICU and hospital lengths of stay were 2 and 7 days, respectively. Overall median SMR was 0.97 [IQR: 0.76, 1.21], median SRU was 1.06 [IQR: 0.79, 1.30] and median ASER was 0.99 [IQR: 0.82, 1.21]. Both CRF and LRM showed that the average number of nurses per ten beds was independently associated with ICU efficiency (CATE [95% CI]: -0.13 [-0.24, -0.01] and -0.09 [-0.17, -0.01], respectively). Finally, CRF identified some specific ICUs with a significant CATE in exposures that did not present a significant average effect. CONCLUSION: In general, both methods were comparable in identifying organisational factors significantly associated with CATE on ICU efficiency. CRF, however, identified specific ICUs with significant effects, even when the average effect was nonsignificant. This can assist healthcare managers in further in-depth evaluation of process interventions to improve ICU efficiency.
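
The abstract defines ASER as the average of SMR and SRU for each ICU. The sketch below applies that definition to three hypothetical ICUs (the function name `aser` and all values are assumptions for illustration) and shows why the median ASER across units need not equal the average of the median SMR and median SRU, consistent with the reported medians of 0.97, 1.06, and 0.99.

```python
from statistics import median


def aser(smr, sru):
    """Average standardised efficiency ratio: the mean of SMR and SRU for one ICU."""
    return (smr + sru) / 2


# Hypothetical per-ICU values (assumption, not study data).
smr_values = [0.7, 1.0, 1.3]
sru_values = [1.3, 0.9, 1.1]

aser_values = [aser(m, r) for m, r in zip(smr_values, sru_values)]
median_aser = median(aser_values)
mean_of_medians = (median(smr_values) + median(sru_values)) / 2

print("per-ICU ASER:", aser_values)
print("median ASER:", median_aser)
print("mean of median SMR and median SRU:", mean_of_medians)
```

Because the median is taken after averaging within each ICU, the two summary figures differ whenever the ICU with the median SMR is not also the one with the median SRU.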


Subject(s)
Hospital Mortality , Intensive Care Units , Humans , Intensive Care Units/organization & administration , Retrospective Studies , Linear Models , Female , Male , Brazil , Length of Stay/statistics & numerical data , Efficiency, Organizational , Middle Aged , Machine Learning , Uruguay , Aged , Adult , Random Forest
12.
Biomolecules ; 14(8)2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39199334

ABSTRACT

The interaction between microbes and drugs encompasses the sourcing of pharmaceutical compounds, microbial drug degradation, the development of drug resistance genes, and the impact of microbial communities on host drug metabolism and immune modulation. These interactions significantly impact drug efficacy and the evolution of drug resistance. In this study, we propose a novel predictive model, termed GCGACNN. We first collected microbe, disease, and drug association data from multiple databases and the relevant literature to construct three association matrices and generate similarity feature matrices using Gaussian similarity functions. These association and similarity feature matrices were then input into a multi-layer Graph Neural Network for feature extraction, followed by a two-dimensional Convolutional Neural Network for feature fusion, ultimately establishing an effective predictive framework. Experimental results demonstrate that GCGACNN outperforms existing methods in predictive performance.
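
The abstract says only that similarity feature matrices were generated with Gaussian similarity functions. One widely used form is the Gaussian interaction profile kernel, in which each entity is represented by its row of a binary association matrix and similarity decays with squared distance between rows. The sketch below assumes that form; the function name `gaussian_similarity`, the bandwidth normalisation, and the toy matrix are illustrative assumptions and may differ from what GCGACNN actually uses.

```python
import math


def gaussian_similarity(assoc, gamma_prime=1.0):
    """Pairwise Gaussian kernel similarity between the rows of an association matrix.

    Assumed form: sim(i, j) = exp(-gamma * ||row_i - row_j||^2), with
    gamma = gamma_prime / (mean squared norm of the rows).
    """
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy matrix (assumption): rows are microbes, columns are drugs, 1 = known association.
assoc = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]

sim = gaussian_similarity(assoc)
for row in sim:
    print([round(v, 4) for v in row])
```

Rows with identical association profiles receive similarity 1, and the resulting matrix is symmetric, which is what makes it usable as an input feature matrix for graph-based models.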


Subject(s)
Humans , Pharmaceutical Preparations/metabolism , Algorithms , Random Forest
13.
Front Public Health ; 12: 1382354, 2024.
Article in English | MEDLINE | ID: mdl-39086805

ABSTRACT

Background: Precise prediction of out-of-pocket (OOP) costs to improve health policy design is important for governments of countries with national health insurance. Controlling the medical expenses for hypertension, one of the leading causes of stroke and ischemic heart disease, is an important issue for the Japanese government. This study aims to explore the importance of OOP costs for outpatients with hypertension. Methods: To obtain a precise prediction of the highest quartile group of OOP costs of hypertensive outpatients, we used nationwide longitudinal data, and estimated a random forest (RF) model focusing on complications with other lifestyle-related diseases and the nonlinearities of the data. Results: The results of the RF models showed that the prediction accuracy of OOP costs for hypertensive patients without activities of daily living (ADL) difficulties was slightly better than that for all hypertensive patients who continued physician visits during the past two consecutive years. Important variables of the highest quartile of OOP costs were age, diabetes or lipidemia, lack of habitual exercise, and moderate or vigorous regular exercise. Conclusion: As preventing complications of diabetes or lipidemia is important for reducing OOP costs in outpatients with hypertension, regular exercise of moderate or vigorous intensity is recommended for hypertensive patients that do not have ADL difficulty. For hypertensive patients with ADL difficulty, habitual exercise is not recommended.


Subject(s)
Health Expenditures , Hypertension , Humans , Hypertension/economics , Female , Male , Middle Aged , Japan , Aged , Health Expenditures/statistics & numerical data , Activities of Daily Living , Longitudinal Studies , Adult , Random Forest
14.
Sensors (Basel) ; 24(15)2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39124000

ABSTRACT

Functional mobility tests, such as the L test of functional mobility, are recommended to provide clinicians with information regarding the mobility progress of lower-limb amputees. Smartphone inertial sensors have been used to perform subtask segmentation on functional mobility tests, providing further clinically useful measures such as fall risk. However, L test subtask segmentation rule-based algorithms developed for able-bodied individuals have not produced sufficiently acceptable results when tested with lower-limb amputee data. In this paper, a random forest machine learning model was trained to segment subtasks of the L test for application to lower-limb amputees. The model was trained with 105 trials completed by able-bodied participants and 25 trials completed by lower-limb amputee participants and tested using a leave-one-out method with lower-limb amputees. This algorithm successfully classified subtasks within a one-foot strike for most lower-limb amputee participants. The algorithm produced acceptable results to enhance clinician understanding of a person's mobility status (>85% accuracy, >75% sensitivity, >95% specificity).


Subject(s)
Amputees , Lower Extremity , Machine Learning , Adult , Female , Humans , Male , Middle Aged , Amputees/rehabilitation , Lower Extremity/surgery , Lower Extremity/physiopathology , Lower Extremity/physiology , Random Forest
15.
BMC Bioinformatics ; 25(1): 253, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39090608

ABSTRACT

BACKGROUND: Conditional logistic regression trees have been proposed as a flexible alternative to the standard method of conditional logistic regression for the analysis of matched case-control studies. While they avoid the strict assumption of linearity and automatically incorporate interactions, conditional logistic regression trees may suffer from a relatively high variability. Further machine learning methods for the analysis of matched case-control studies are missing because conventional machine learning methods cannot handle the matched structure of the data. RESULTS: A random forest method for the analysis of matched case-control studies based on conditional logistic regression trees is proposed, which overcomes the issue of high variability. It provides an accurate estimation of exposure effects while being more flexible in the functional form of covariate effects. The efficacy of the method is illustrated in a simulation study and within an application to real-world data from a matched case-control study on the effect of regular participation in cervical cancer screening on the development of cervical cancer. CONCLUSIONS: The proposed random forest method is a promising add-on to the toolbox for the analysis of matched case-control studies and addresses the need for machine-learning methods in this field. It provides a more flexible approach compared to the standard method of conditional logistic regression, but also compared to conditional logistic regression trees. It allows for non-linearity and the automatic inclusion of interaction effects and is suitable both for exploratory and explanatory analyses.


Subject(s)
Machine Learning , Random Forest , Female , Humans , Case-Control Studies , Logistic Models , Uterine Cervical Neoplasms
16.
ACS Sens ; 9(8): 4196-4206, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39096304

ABSTRACT

Reliable and real-time monitoring of seafood decay is attracting growing interest for food safety and human health, while it is still a great challenge to accurately identify the released triethylamine (TEA) from the complex volatilome. Herein, defect-engineered WO3-x architectures are presented to design advanced TEA sensors for seafood quality assessment. Benefiting from abundant oxygen vacancies, the obtained WO2.91 sensor exhibits remarkable TEA-sensing performance in terms of higher response (1.9 times), faster response time (2.1 times), lower detection limit (3.2 times), and higher TEA/NH3 selectivity (2.8 times) compared with the air-annealed WO2.96 sensor. Furthermore, the definite WO2.91 sensor demonstrates long-term stability and anti-interference in complex gases, enabling the accurate recognition of TEA during halibut decay (0-48 h). Coupled with the random forest algorithm with 70 estimators, the WO2.91 sensor enables accurate prediction of halibut storage with an accuracy of 95%. This work not only provides deep insights into improving gas-sensing performance by defect engineering but also offers a rational solution for reliably assessing seafood quality.


Sujet(s)
Algorithmes , Oxydes , Produits de la mer , Tungstène , Produits de la mer/analyse , Tungstène/composition chimique , Oxydes/composition chimique , Qualité alimentaire , Forêts aléatoires
17.
BMC Public Health ; 24(1): 2101, 2024 Aug 03.
Article de Anglais | MEDLINE | ID: mdl-39097727

RÉSUMÉ

With childhood hypertension emerging as a global public health concern, understanding its associated factors is crucial. This study investigated the prevalence and associated factors of hypertension among Chinese children. This cross-sectional investigation was conducted in Pinghu, Zhejiang province, involving 2,373 children aged 8-14 years from 12 schools. Anthropometric measurements were taken by trained staff. Blood pressure (BP) was measured on three separate occasions, with an interval of at least two weeks. Childhood hypertension was defined as systolic blood pressure (SBP) and/or diastolic blood pressure (DBP) ≥ age-, sex-, and height-specific 95th percentile, across all three visits. A self-administered questionnaire was used to collect demographic, socioeconomic, health behavioral, and parental information at the first visit of BP measurement. Random forest (RF) and a multivariable logistic regression model were used together to identify associated factors. Additionally, population attributable fractions (PAFs) were calculated. The prevalence of childhood hypertension was 5.0% (95% confidence interval [CI]: 4.1-5.9%). Children with body mass index (BMI) ≥ 85th percentile were grouped into abnormal weight, and those with waist circumference (WC) > 90th percentile were sorted into central obesity. Normal weight with central obesity (NWCO, adjusted odds ratio [aOR] = 5.04, 95% CI: 1.96-12.98), abnormal weight with no central obesity (AWNCO, aOR = 4.60, 95% CI: 2.57-8.21), and abnormal weight with central obesity (AWCO, aOR = 9.94, 95% CI: 6.06-16.32) were associated with an increased risk of childhood hypertension. Childhood hypertension was mostly attributable to AWCO (PAF: 0.64, 95% CI: 0.50-0.75), followed by AWNCO (PAF: 0.34, 95% CI: 0.19-0.51), and NWCO (PAF: 0.13, 95% CI: 0.03-0.30). Our results indicated that obesity phenotype is associated with childhood hypertension, and weight management could serve as a potential target for intervention.
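The reported prevalence interval can be roughly reproduced from the figures in the abstract. The sketch below assumes a simple Wald (normal-approximation) interval and uses the rounded prevalence of 5.0% with n = 2,373, because the abstract gives neither the number of cases nor the interval method the authors used; the function name `wald_ci` is illustrative only.

```python
import math


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: the abstract does not say which interval method was used.
    """
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se


# Rounded prevalence and sample size as given in the abstract.
lower, upper = wald_ci(0.050, 2373)
print(f"95% CI: {lower:.1%} to {upper:.1%}")
```

Under these assumptions the bounds round to the 4.1-5.9% given in the abstract, which suggests, but does not prove, that a comparable approximation was used.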


Sujet(s)
Hypertension artérielle , Humains , Études transversales , Mâle , Femelle , Hypertension artérielle/épidémiologie , Chine/épidémiologie , Enfant , Prévalence , Adolescent , Facteurs de risque , Modèles logistiques , Forêts aléatoires
18.
Parasit Vectors ; 17(1): 354, 2024 Aug 21.
Article de Anglais | MEDLINE | ID: mdl-39169433

RÉSUMÉ

BACKGROUND: Culicoides biting midges exhibit a global spatial distribution and are the main vectors of several viruses of veterinary importance, including bluetongue (BT) and African horse sickness (AHS). Many environmental and anthropological factors contribute to their ability to live in a variety of habitats, which have the potential to change over the years as the climate changes. Therefore, as new habitats emerge, the risk of new introductions of these diseases increases. The aim of this study was to model distributions for two primary vectors for BT and AHS (Culicoides imicola and Culicoides bolitinos) using random forest (RF) machine learning and explore the relative importance of environmental and anthropological factors in a region of South Africa with frequent AHS and BT outbreaks. METHODS: Culicoides capture data were collected between 1996 and 2022 across 171 different capture locations in the Western Cape. Predictor variables included climate-related variables (temperature, precipitation, humidity), environment-related variables (normalised difference vegetation index [NDVI], soil moisture) and farm-related variables (livestock densities). Random forest (RF) models were developed to explore the spatial distributions of C. imicola, C. bolitinos and a merged species map, where both competent vectors were combined. The maps were then compared to interpolation maps using the same capture data as well as historical locations of BT and AHS outbreaks. RESULTS: Overall, the RF models performed well with 75.02%, 61.6% and 74.01% variance explained for C. imicola, C. bolitinos and merged species models respectively. Cattle density was the most important predictor for C. imicola and water vapour pressure the most important for C. bolitinos. Compared to interpolation maps, the RF models had higher predictive power throughout most of the year when species were modelled individually; however, when merged, the interpolation maps performed better in all seasons except winter. Finally, midge densities did not show any conclusive correlation with BT or AHS outbreaks. CONCLUSION: This study yielded novel insight into the spatial abundance and drivers of abundance of competent vectors of BT and AHS. It also provided valuable data to inform mathematical models exploring disease outbreaks so that Culicoides-transmitted diseases in South Africa can be further analysed.
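As an illustration of how a "variance explained" figure and a predictor ranking can be obtained from a random forest regression, here is a minimal scikit-learn sketch. It is not the authors' pipeline: the abstract does not name the software used, the data below are synthetic, the predictor names are hypothetical stand-ins, and reading "variance explained" as the out-of-bag R² is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for 171 capture locations with three hypothetical predictors.
predictor_names = ["cattle_density", "water_vapour_pressure", "ndvi"]
n_sites = 171
X = rng.normal(size=(n_sites, len(predictor_names)))
# Synthetic abundance response: depends mainly on the first predictor, plus noise.
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=n_sites)

# oob_score=True scores each site using only trees that did not see it in training.
model = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
model.fit(X, y)

# For a regressor, oob_score_ is the out-of-bag R^2.
pct_var_explained = 100.0 * model.oob_score_

# Impurity-based importances, sorted from most to least important.
ranking = sorted(
    zip(predictor_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"variance explained (OOB): {pct_var_explained:.1f}%")
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

The out-of-bag estimate is convenient with only a few hundred sites because it needs no separate hold-out set, although it does not account for spatial autocorrelation between nearby capture locations.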


Sujet(s)
Peste équine , Fièvre catarrhale du mouton , Ceratopogonidae , Vecteurs insectes , Apprentissage machine , Animaux , Bovins , Peste équine/épidémiologie , Peste équine/transmission , Peste équine/virologie , Fièvre catarrhale du mouton/épidémiologie , Fièvre catarrhale du mouton/transmission , Fièvre catarrhale du mouton/virologie , Virus de la langue bleue , Ceratopogonidae/virologie , Climat , Épidémies de maladies , Écosystème , Equus caballus , Vecteurs insectes/virologie , Forêts aléatoires , République d'Afrique du Sud/épidémiologie , Ovis
19.
PLoS One ; 19(8): e0307853, 2024.
Article de Anglais | MEDLINE | ID: mdl-39173042

RÉSUMÉ

Precise prediction of soil salinity using visible and near-infrared (vis-NIR) spectroscopy is crucial for ensuring food security and effective environmental management. The objective is to use vis-NIR spectra alongside a multiple regression (MLR) model and a random forest (RF) modeling approach to accurately predict soil salinity across various land use types, such as farmlands, bare lands, and rangelands. To this end, we selected 150 sampling points representative of these diverse land uses. At each point, we collected soil samples to measure the soil salinity (ECe) and employed a portable spectrometer to capture the spectral reflectance across the full wavelength range of 400 to 2400 nm. The methodology involved using both individual spectral reflectance values and combinations of reflectance values from different wavelengths as input variables for developing the MLR and RF models. The results indicated that the RF model (RMSE = 4.85 dS m⁻¹, R² = 0.87, and RPD = 3.15), utilizing combined factors as input variables, outperformed others. Furthermore, our analysis across different land uses revealed that models incorporating combined input variables yielded significantly better results, particularly for farmlands and rangelands. This study underscores the potential of combining vis-NIR spectroscopy with advanced modeling techniques to enhance the accuracy of soil salinity predictions, thereby supporting more informed agricultural and environmental management decisions.
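The three reported metrics can be computed from paired observed and predicted values as sketched below. The definitions are the commonly used ones and are assumptions here, since the abstract does not define them: R² as the coefficient of determination, and RPD as the sample standard deviation of the observations divided by the RMSE. The salinity values are made up for illustration.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return (RMSE, R^2, RPD) for paired observed and predicted values."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    rmse = math.sqrt(ss_res / len(observed))
    mean_obs = statistics.fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    rpd = statistics.stdev(observed) / rmse          # ratio of performance to deviation
    return rmse, r2, rpd


# Made-up ECe values in dS/m, for illustration only.
observed = [2.0, 8.0, 15.0, 30.0, 45.0]
predicted = [4.0, 6.0, 18.0, 27.0, 47.0]
rmse, r2, rpd = regression_metrics(observed, predicted)
print(f"RMSE = {rmse:.2f} dS/m, R2 = {r2:.2f}, RPD = {rpd:.2f}")
```

RPD rescales the error by the spread of the observed data, so values can be compared across datasets with different salinity ranges; higher RPD means the model error is small relative to that spread.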


Sujet(s)
Salinité , Sol , Spectroscopie proche infrarouge , Sol/composition chimique , Spectroscopie proche infrarouge/méthodes , Analyse de régression , Agriculture/méthodes , Surveillance de l'environnement/méthodes , Analyse spectrale/méthodes , Forêts aléatoires
20.
Medicine (Baltimore) ; 103(34): e39260, 2024 Aug 23.
Article de Anglais | MEDLINE | ID: mdl-39183417

RÉSUMÉ

Postoperative pulmonary complications (PPCs) are a significant concern following lung resection because they prolong hospital stays and increase morbidity and mortality among patients. This study aims to develop and validate a risk prediction model for PPCs after lung resection using the random forest (RF) algorithm to enhance early detection and intervention. Data from 180 patients who underwent lung resections at the Third Affiliated Hospital of the Naval Medical University between September 2022 and February 2024 were retrospectively analyzed. The patients were randomly allocated into a training set and a test set in an 8:2 ratio. An RF model was constructed using Python, with feature importance ranked based on the mean Gini index. The predictive performance of the model was evaluated through analyses of the receiver operating characteristic curve, calibration curve, and decision curve. Among the 180 patients included, 47 (26.1%) developed PPCs. The top 5 predictive factors identified by the RF model were blood loss, maximal length of resection, number of lymph nodes removed, forced expiratory volume in the first second as a percentage of predicted value, and age. The receiver operating characteristic curve and calibration curve analyses demonstrated favorable discrimination and calibration capabilities of the model, while decision curve analysis indicated its clinical applicability. The RF algorithm is effective in predicting PPCs following lung resection and holds promise for clinical application.
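The workflow described (8:2 split, RF model in Python, Gini-based importance, ROC analysis) might look roughly like the following scikit-learn sketch. This is not the authors' code: the abstract does not name the library, the data are synthetic, the column names are hypothetical labels for the five reported predictors, and the stratified split and `n_estimators=200` are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic cohort matching the reported counts: 180 patients, 47 with PPCs.
n_patients, n_events = 180, 47
y = np.zeros(n_patients, dtype=int)
y[:n_events] = 1

# Hypothetical column names; the values are random, not clinical data.
feature_names = ["blood_loss", "resection_length", "lymph_nodes_removed",
                 "fev1_pct_predicted", "age"]
X = rng.normal(size=(n_patients, len(feature_names)))
X[:, 0] += 1.0 * y  # inject an artificial signal so the sketch has something to learn

# 8:2 split; stratifying on the outcome is an assumption, not stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

# Area under the ROC curve on the held-out test set.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# feature_importances_ is the mean decrease in Gini impurity, normalised to sum to 1.
importances = dict(zip(feature_names, clf.feature_importances_))

print(f"test AUC: {auc:.2f}")
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

With 180 patients the test set holds only 36 cases, so an AUC estimated this way carries wide uncertainty; repeated splits or cross-validation would give a steadier estimate.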


Sujet(s)
Algorithmes , Pneumonectomie , Complications postopératoires , Humains , Femelle , Mâle , Complications postopératoires/épidémiologie , Complications postopératoires/étiologie , Adulte d'âge moyen , Études rétrospectives , Pneumonectomie/effets indésirables , Sujet âgé , Appréciation des risques/méthodes , Courbe ROC , Facteurs de risque , Adulte , Maladies pulmonaires/étiologie , Maladies pulmonaires/épidémiologie , Forêts aléatoires