1.
J Appl Crystallogr ; 57(Pt 4): 975-985, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39108811

ABSTRACT

Predicting crystal symmetry simply from chemical composition has remained challenging. Several machine-learning approaches can be employed, but the predictive value of popular crystallographic databases is relatively modest due to the paucity of data and uneven distribution across the 230 space groups. In this work, virtually all crystallographic information available to science has been compiled and used to train and test multiple machine-learning models. Composition-driven random-forest classification relying on a large set of descriptors showed the best performance. The predictive models for crystal system, Bravais lattice, point group and space group of inorganic compounds are made publicly available as easy-to-use software downloadable from https://gitlab.com/vishsoft/cosy.

2.
Sensors (Basel) ; 24(15)2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39123855

ABSTRACT

The detection performance of radar is significantly impaired by active jamming and mutual interference from other radars. This paper proposes a radio signal modulation recognition method to accurately recognize these signals, which helps in the jamming cancellation decisions. Based on the ensemble learning stacking algorithm improved by meta-feature enhancement, the proposed method adopts random forests, K-nearest neighbors, and Gaussian naive Bayes as the base-learners, with logistic regression serving as the meta-learner. It takes the multi-domain features of signals as input, which include time-domain features including fuzzy entropy, slope entropy, and Hjorth parameters; frequency-domain features, including spectral entropy; and fractal-domain features, including fractal dimension. The simulation experiment, including seven common signal types of radar and active jamming, was performed for the effectiveness validation and performance evaluation. Results proved the proposed method's performance superiority to other classification methods, as well as its ability to meet the requirements of low signal-to-noise ratio and few-shot learning.
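
Of the time-domain features named in this abstract, the Hjorth parameters have a compact closed form. The following is a minimal sketch, not the authors' code: it assumes a uniformly sampled real-valued signal, estimates derivatives with first differences, and the function name `hjorth_parameters` is hypothetical.

```python
import math
from statistics import pvariance


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) for a sampled signal."""
    # First and second discrete derivatives (simple differences).
    d1 = [b - a for a, b in zip(signal, signal[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]

    var0, var1, var2 = pvariance(signal), pvariance(d1), pvariance(d2)
    if var0 == 0 or var1 == 0:
        raise ValueError("signal must vary for mobility and complexity to be defined")

    activity = var0                                  # signal power
    mobility = math.sqrt(var1 / var0)                # proxy for mean frequency
    complexity = math.sqrt(var2 / var1) / mobility   # deviation from a pure sine
    return activity, mobility, complexity


# Illustrative input only: a clean sinusoid with 50 samples per period.
tone = [math.sin(2 * math.pi * n / 50) for n in range(5000)]
activity, mobility, complexity = hjorth_parameters(tone)
print(activity, mobility, complexity)
```

For a pure sinusoid the complexity is close to 1; more irregular waveforms give larger values, which is what makes the parameter usable as a classification feature.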

3.
Am J Alzheimers Dis Other Demen ; 39: 15333175241275215, 2024.
Article in English | MEDLINE | ID: mdl-39133478

ABSTRACT

OBJECTIVE: To assess the role of machine learning (ML) in identifying critical factors of dementia and mild cognitive impairment. METHODS: A total of 371 elderly individuals were ultimately included in the ML analysis. Demographic information (including gender, age, parity, visual acuity, auditory function, mobility, and medication history) and 35 features from 10 assessment scales were used for modeling. Five machine learning classifiers were used for evaluation, employing a procedure involving feature extraction, selection, model training, and performance assessment to identify key indicative factors. RESULTS: The Random Forest model, after data preprocessing, Information Gain, and Meta-analysis, utilized three training features and four meta-features, achieving an area under the curve of 0.961 and an accuracy of 0.894, showcasing exceptional accuracy for the identification of dementia and mild cognitive impairment. CONCLUSIONS: ML serves as an identification tool for dementia and mild cognitive impairment. Using Information Gain and Meta-feature analysis, Clinical Dementia Rating (CDR) and Neuropsychiatric Inventory (NPI) scale information emerged as crucial for training the Random Forest model.


Subject(s)
Cognitive Dysfunction; Dementia; Machine Learning; Humans; Cognitive Dysfunction/diagnosis; Female; Aged; Male; Dementia/diagnosis; China; Aged, 80 and over; Neuropsychological Tests/standards; Neuropsychological Tests/statistics & numerical data; East Asian People

4.
Stat Med ; 43(20): 3921-3942, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-38951867

ABSTRACT

For survival analysis applications we propose a novel procedure for identifying subgroups with large treatment effects, with a focus on subgroups where treatment is potentially detrimental. The approach, termed forest search, is relatively simple and flexible. All possible subgroups are screened and selected based on hazard ratio thresholds indicative of harm, with assessment according to the standard Cox model. By reversing the role of treatment one can seek to identify substantial benefit. We apply a splitting consistency criterion to identify a subgroup considered "maximally consistent with harm." The type-1 error and power for subgroup identification can be quickly approximated by numerical integration. To aid inference we describe a bootstrap bias-corrected Cox model estimator with variance estimated by a jackknife approximation. We provide a detailed evaluation of operating characteristics in simulations and compare to virtual twins and generalized random forests, where we find the proposal to have favorable performance. In particular, in our simulation setting, we find the proposed approach favorably controls the type-1 error for falsely identifying heterogeneity, with higher power and classification accuracy for substantial heterogeneous effects. Two real data applications are provided for publicly available datasets from clinical trials in oncology and HIV.


Subject(s)
Computer Simulation; HIV Infections; Proportional Hazards Models; Humans; Survival Analysis

5.
Article in English | MEDLINE | ID: mdl-38948964

ABSTRACT

BACKGROUND: Identifying language disorders earlier can help children receive the support needed to improve developmental outcomes and quality of life. Despite the prevalence and impacts of persistent language disorder, there are surprisingly no robust predictor tools available. This makes it difficult for researchers to recruit young children into early intervention trials, which in turn impedes advances in providing effective early interventions to children who need it. AIMS: To validate externally a predictor set of six variables previously identified to be predictive of language at 11 years of age, using data from the Longitudinal Study of Australian Children (LSAC) birth cohort. Also, to examine whether additional LSAC variables arose as predictive of language outcome. METHODS & PROCEDURES: A total of 5107 children were recruited to LSAC with developmental measures collected from 0 to 3 years. At 11-12 years, children completed the Clinical Evaluation of Language Fundamentals, 4th Edition, Recalling Sentences subtest. We used SuperLearner to estimate the accuracy of six previously identified parent-reported variables from ages 2-3 years in predicting low language (sentence recall score ≥ 1.5 SD below the mean) at 11-12 years. Random forests were used to identify any additional variables predictive of language outcome. OUTCOMES & RESULTS: Complete data were available for 523 participants (52.20% girls), 27 (5.16%) of whom had a low language score. The six predictors yielded fair accuracy: 78% sensitivity (95% confidence interval (CI) = [58, 91]) and 71% specificity (95% CI = [67, 75]). These predictors relate to sentence complexity, vocabulary and behaviour. The random forests analysis identified similar predictors. CONCLUSIONS & IMPLICATIONS: We identified an ultra-short set of variables that predicts 11-12-year language outcome with 'fair' accuracy. 
In one of few replication studies of this scale in the field, these methods have now been conducted across two population-based cohorts, with consistent results. An imminent practical implication of these findings is using these predictors to aid recruitment into early language intervention studies. Future research can continue to refine the accuracy of early predictors to work towards earlier identification in a clinical context. WHAT THIS PAPER ADDS: What is already known on the subject There are no robust predictor sets of child language disorder despite its prevalence and far-reaching impacts. A previous study identified six variables collected at age 2-3 years that predicted 11-12-year language with 75% sensitivity and 81% specificity, which warranted replication in a separate cohort. What this study adds to the existing knowledge We used machine learning methods to identify a set of six questions asked at age 2-3 years with ≥ 71% sensitivity and specificity for predicting low language outcome at 11-12 years, now showing consistent results across two large-scale population-based cohort studies. What are the potential or clinical implications of this work? This predictor set is more accurate than existing feasible methods and can be translated into a low-resource and time-efficient recruitment tool for early language intervention studies, leading to improved clinical service provision for young children likely to have persisting language difficulties.

6.
J Frailty Aging ; 13(3): 248-253, 2024.
Article in English | MEDLINE | ID: mdl-39082769

ABSTRACT

BACKGROUND: Frailty is a geriatric syndrome characterized by increased individual vulnerability, with an increase in both dependence and mortality when exposed to external stressors. The use of Frailty Indices in routine clinical practice is limited by several factors, such as the cognitive status of the patient, times of consultation, or lack of prior information from the patient. OBJECTIVES: In this study, we propose the generation of an objective measure of frailty, based on the signal from hand grip strength (HGS). DESIGN AND MEASUREMENTS: This signal was recorded with a modified Deyard dynamometer and processed using machine learning strategies based on supervised learning methods to train classifiers. A database was generated from a cohort of 138 older adults in a transverse pilot study that combined classical geriatric questionnaires with physiological data. PARTICIPANTS: Participants were patients selected by geriatricians of medical services provided by collaborating entities. SETTING AND RESULTS: To process the generated information, 20 selected significant features of the HGS dataset were filtered, cleaned, and extracted. A technique combining the Synthetic Minority Oversampling Technique (SMOTE), which generates new samples from the smallest group, with ENN (a technique based on K-nearest neighbors), which removes noisy samples, provided the best results as a well-balanced distribution of data. CONCLUSION: A Random Forest Classifier was trained to predict the frailty label with 92.9% accuracy, achieving sensitivities higher than 90%.
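
The SMOTE-plus-ENN resampling described in this abstract can be illustrated with a small standard-library sketch. This is not the authors' pipeline: the function names, the toy 2-D points, and the choices `k=2` and `k=3` are all assumptions made for illustration. In practice a maintained implementation, such as the `SMOTEENN` class in the imbalanced-learn library, would be preferable.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating towards neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


def edited_nearest_neighbours(points, labels, k=3):
    """Keep only samples whose label matches the majority of their k nearest neighbours."""
    kept_points, kept_labels = [], []
    for i, (point, label) in enumerate(zip(points, labels)):
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(point, points[j]))[:k]
        votes = [labels[j] for j in nearest]
        # sorted() makes tie-breaking deterministic; an odd k avoids ties for two classes.
        majority = max(sorted(set(votes)), key=votes.count)
        if majority == label:
            kept_points.append(point)
            kept_labels.append(label)
    return kept_points, kept_labels


# Toy data (hypothetical): five majority points, three minority points.
majority_pts = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.2]]
minority_pts = [[5, 5], [6, 5], [5, 6]]
new_minority = smote(minority_pts, n_new=2, k=2)

points = majority_pts + minority_pts + new_minority
labels = [0] * 5 + [1] * 5
clean_points, clean_labels = edited_nearest_neighbours(points, labels, k=3)
print(len(clean_points), clean_labels)
```

Oversampling first and cleaning second is the usual order: SMOTE can place synthetic points inside the other class's region, and the ENN pass then removes samples that disagree with their neighbourhood.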


Subject(s)
Frailty; Geriatric Assessment; Hand Strength; Humans; Hand Strength/physiology; Aged; Female; Male; Frailty/diagnosis; Geriatric Assessment/methods; Aged, 80 and over; Pilot Projects; Frail Elderly; Machine Learning; Muscle Strength Dynamometer

7.
Article in English | MEDLINE | ID: mdl-38842593

ABSTRACT

PURPOSE: To investigate the xenobiotic profiles of patients with neovascular age-related macular degeneration (nAMD) undergoing anti-vascular endothelial growth factor (anti-VEGF) intravitreal therapy (IVT) to identify biomarkers indicative of clinical phenotypes through advanced AI methodologies. METHODS: In this cross-sectional observational study, we analyzed 156 peripheral blood xenobiotic features in a cohort of 46 nAMD patients stratified by choroidal neovascularization (CNV) control under anti-VEGF IVT. We employed Liquid Chromatography-Tandem Mass Spectrometry (LC-MS/MS) for measurement and leveraged an AI-driven iterative Random Forests (iRF) approach for robust pattern recognition and feature selection, aligning molecular profiles with clinical phenotypes. RESULTS: AI-augmented iRF models effectively refined the metabolite spectrum by discarding non-predictive elements. Perfluorooctanesulfonate (PFOS) and Ethyl β-glucopyranoside were identified as significant biomarkers through this process, associated with various clinically relevant phenotypes. Unlike single metabolite classes, drug metabolites were distinctly correlated with subretinal fluid presence. CONCLUSIONS: This study underscores the enhanced capability of AI, particularly iRF, in dissecting complex metabolomic data to elucidate the xenobiotic landscape of nAMD and environmental impact on the disease. The preliminary biomarkers discovered offer promising directions for personalized treatment strategies, although further validation in broader cohorts is essential for clinical application.

8.
Huan Jing Ke Xue ; 45(6): 3153-3164, 2024 Jun 08.
Article in Chinese | MEDLINE | ID: mdl-38897739

ABSTRACT

The accurate prediction of spatial variation trends in groundwater SO₄²⁻ is of great significance for improving groundwater quality and regional groundwater management. Multi-source spatio-temporal data, such as land cover data, soil parameter data, digital elevation data, and groundwater pH value, for the plain area of the Yarkant River Basin in 2011, 2014, 2017, and 2020 were used as characteristic variables to analyze their correlation with groundwater SO₄²⁻ concentration. To enhance the prediction accuracy, the Bayesian optimization algorithm (BOA) was used to optimize the random forest regression (RFR). Based on the BOA-RFR model, the importance of the characteristic variables was analyzed, the prediction accuracy of the model was evaluated, and the groundwater SO₄²⁻ prediction map was generated. The results showed that pH value, ground elevation (GE), and percentage of bare land (BAR) in the contribution area were important parameters influencing groundwater hydrochemical composition, which were significantly negatively correlated with groundwater SO₄²⁻ concentration, and the importance of impact factors for predicting groundwater SO₄²⁻ concentration exceeded 25%. The geostatistical interpolation method was used as an auxiliary tool for the predictive modeling of spatial distribution. After adding auxiliary samples, the R² of groundwater SO₄²⁻ concentration prediction of the BOA-RFR model was greater than 0.96, and the maximum values of RMSE and MAE were reduced by 4.7% and 23.8%, respectively, compared with the minimum values of the model with fewer samples. The SO₄²⁻ concentration prediction map showed that high-SO₄²⁻ groundwater was enriched in the northeast of the plain area of the Yarkant River Basin, an area that was expanding.

9.
Environ Sci Technol ; 58(25): 10920-10931, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38861590

ABSTRACT

Distinguishing the effects of different fine particulate matter components (PMCs) is crucial for mitigating their effects on human health. However, the sparse distribution of locations where PM is collected for component analysis makes it challenging to investigate the relevant health effects. This study aimed to investigate the agreement between data-fusion-enhanced exposure assessment and site monitoring data in estimating the effects of PMCs on gestational diabetes mellitus (GDM). We first improved the spatial resolution and accuracy of exposure assessment for five major PMCs (EC, OM, NO₃⁻, NH₄⁺, and SO₄²⁻) in the Pearl River Delta region by a data fusion model that combined inputs from multiple sources using a random forest model (10-fold cross-validation R²: 0.52 to 0.61; root mean square error: 0.55 to 2.26 µg/m³). Next, we compared the associations between exposures to PMCs during pregnancy and GDM in a hospital-based cohort of 1148 pregnant women in Heshan, China, using both site monitoring data and data-fusion model estimates. The comparative analysis showed that the data-fusion-based exposure generated stronger estimates of identifying statistical disparities. This study suggests that data-fusion-enhanced estimates can improve exposure assessment and potentially mitigate the misclassification of population exposure arising from the utilization of site monitoring data.


Subject(s)
Particulate Matter; Particulate Matter/analysis; Humans; China; Female; Rivers/chemistry; Pregnancy; Air Pollutants/analysis; Environmental Monitoring/methods; Epidemiologic Studies; Environmental Exposure; Diabetes, Gestational/epidemiology

10.
J Imaging ; 10(6)2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38921620

ABSTRACT

Accurate and comparable annual mapping is critical to understanding changing vegetation distribution and informing land use planning and management. A U-Net convolutional neural network (CNN) model was used to map natural vegetation and forest types based on annual Landsat geomedian reflectance composite images for a 500 km × 500 km study area in southeastern Australia. The CNN was developed using 2018 imagery. Label data were a ten-class natural vegetation and forest classification (i.e., Acacia, Callitris, Casuarina, Eucalyptus, Grassland, Mangrove, Melaleuca, Plantation, Rainforest and Non-Forest) derived by combining current best-available regional-scale maps of Australian forest types, natural vegetation and land use. The best CNN generated using six Landsat geomedian bands as input produced better results than a pixel-based random forest algorithm, with higher overall accuracy (OA) and weighted mean F1 score for all vegetation classes (93 vs. 87% in both cases) and a higher Kappa score (86 vs. 74%). The trained CNN was used to generate annual vegetation maps for 2000-2019 and evaluated for an independent test area of 100 km × 100 km using statistics describing accuracy regarding the label data and temporal stability. Seventy-six percent of pixels did not change over the 20 years (2000-2019), and year-on-year results were highly correlated (94-97% OA). The accuracy of the CNN model was further verified for the study area using 3456 independent vegetation survey plots where the species of interest had ≥ 50% crown cover. The CNN showed an 81% OA compared with the plot data. The model accuracy was also higher than the label data (76%), which suggests that imperfect training data may not be a major obstacle to CNN-based mapping. 
Applying the CNN to other regions would help to test the spatial transferability of these techniques and whether they can support the automated production of accurate and comparable annual maps of natural vegetation and forest types required for national reporting.
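
The overall accuracy (OA) and Kappa scores quoted in this abstract are both derived from a confusion matrix. The sketch below shows the standard definitions; the 2×2 matrix is made-up illustrative data, not taken from the study, and the function names are hypothetical.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohen_kappa(confusion):
    """Cohen's kappa: agreement beyond what the class totals would give by chance."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = overall_accuracy(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[r][c] for r in range(n)) for c in range(n)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


# Rows are reference classes, columns are predicted classes (illustrative counts).
example = [[40, 10],
           [5, 45]]
print(overall_accuracy(example), cohen_kappa(example))
```

Kappa is always lower than OA when chance agreement is non-zero, which is consistent with the abstract reporting Kappa values below the corresponding accuracies.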

11.
Heliyon ; 10(9): e30820, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38765117

ABSTRACT

In this study, we analysed ⁷Be weekly surface measurements from six Spanish laboratories from 2006 to 2021. The Kolmogorov-Zurbenko filter was applied to the six ⁷Be time series, and following an iterative process, the original data were divided into two fractions: one related to variations characterized by periods above 33 days (including, among others, the seasonal cycle) and the second noisier fraction related to mechanisms originating from variations with periods below 33 days. Both fractions were independent at the six locations. The second machine-based step using random forest models was applied with the aim of identifying the most influential inputs to the observed ⁷Be concentrations, and machine learning-inspired regression models were fitted. With respect to seasonal components, the results indicated that the memory of the system was the most influential input, as expected by the large fraction of variance explained by the seasonal cycle, followed by that of humidity and wind-related variables. For the fraction corresponding to periods below 33 d, precipitation-, humidity-, and radiation-related variables were the most influential. This methodology has made it possible to successfully describe the major mechanisms known to be involved in the generation of the surface ⁷Be concentrations observed in Spain.
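
A Kolmogorov-Zurbenko filter is an iterated centred moving average, and the two-fraction split described in this abstract follows from subtracting the smoothed series from the original. The sketch below is illustrative only: the window length and iteration count are left as parameters because the abstract does not state them, the edge handling (shrinking the window at the ends) is an assumption, and the function names are hypothetical.

```python
def moving_average(series, window):
    """Centred moving average; the window shrinks near the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def kz_filter(series, window, iterations):
    """Kolmogorov-Zurbenko filter: apply the moving average `iterations` times."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it can be centred")
    smoothed = list(series)
    for _ in range(iterations):
        smoothed = moving_average(smoothed, window)
    return smoothed


def split_components(series, window, iterations):
    """Return (long_period, short_period) fractions that sum back to the series."""
    long_period = kz_filter(series, window, iterations)
    short_period = [x - s for x, s in zip(series, long_period)]
    return long_period, short_period


# Illustrative weekly values (not measured data).
weekly = [3.1, 2.8, 3.5, 4.0, 4.4, 3.9, 3.2, 2.7, 3.0, 3.6]
slow, fast = split_components(weekly, window=3, iterations=2)
print(slow)
print(fast)
```

Increasing either the window or the number of iterations lowers the cut-off frequency, so the pair is normally chosen to match the period that should separate the two fractions.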

12.
Comput Struct Biotechnol J ; 24: 281-291, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38644928

ABSTRACT

All people have a fingerprint that is unique to them and persistent throughout life. Similarly, we propose that people have a gaitprint, a persistent walking pattern that contains unique information about an individual. To provide evidence of a unique gaitprint, we aimed to identify individuals based on basic spatiotemporal variables. 81 adults were recruited to walk overground on an indoor track at their own pace for four minutes wearing inertial measurement units. A total of 18 trials per participant were completed between two days, one week apart. Four methods of pattern analysis, a) Euclidean distance, b) cosine similarity, c) random forest, and d) support vector machine, were applied to our basic spatiotemporal variables such as step and stride lengths to accurately identify people. Our best accuracy (98.63%) was achieved by random forest, followed by support vector machine (98.40%), and the top 10 most similar trials from cosine similarity (98.40%). Our results clearly demonstrate a persistent walking pattern with sufficient information about the individual to make them identifiable, suggesting the existence of a gaitprint.
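
The two simplest methods in this abstract, Euclidean distance and cosine similarity, amount to matching a new trial against stored trials and returning the identity of the closest one. A minimal sketch follows; the feature vectors (three made-up spatiotemporal values per trial), the identifiers, and the function names are all hypothetical and do not come from the study.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, gallery, method="euclidean"):
    """Return the person id of the gallery trial most similar to the query trial."""
    if method == "euclidean":
        best = min(gallery, key=lambda item: euclidean_distance(query, item[1]))
    elif method == "cosine":
        best = max(gallery, key=lambda item: cosine_similarity(query, item[1]))
    else:
        raise ValueError(f"unknown method: {method}")
    return best[0]


# Each entry: (person id, [step length, stride length, step time]); values are invented.
gallery = [
    ("A", [0.70, 1.40, 1.05]),
    ("B", [0.62, 1.25, 1.10]),
    ("C", [0.80, 1.58, 0.98]),
]
new_trial = [0.71, 1.41, 1.04]
print(identify(new_trial, gallery, "euclidean"), identify(new_trial, gallery, "cosine"))
```

Cosine similarity ignores overall magnitude and compares only the proportions between features, so features on very different scales are usually standardised before either measure is applied.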

13.
Sci Rep ; 14(1): 8928, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38637673

ABSTRACT

Estimating penetration rates of Jumbo drills is crucial for optimizing underground mining drilling processes, aiming to reduce costs and time. This study investigates various regression and machine learning methods, including Multilayer Perceptron (MLP), Support Vector Regression (SVR), and Random Forests (RF), to predict the penetration rates (ROP) using multivariate inputs such as operation parameters and rock mass characteristics. The Rock Mass Drillability Index (RDi), incorporating both intact rock properties and structural parameters, was utilized to characterize the rock mass. The dataset was split into 80% for training and 20% for testing. Performance metrics including correlation coefficient (R2), variance accounted for (VAF), mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) were calculated for each method to evaluate the accuracy of the predictions. SVR exhibited the best prediction performance for ROP, achieving the highest R2, lowest RMSE, MAE, and MAPE, as well as the largest VAF values of 0.94, 0.15, 0.11, 4.84, and 94.13 during training, and 0.91, 0.19, 0.13, 6.02, and 91.11 during testing, respectively. With this high accuracy, we conclude that the proposed machine learning algorithms are valuable and efficient predictors for estimating jumbo drill penetration rates in underground mining operations.
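
The five performance metrics listed in this abstract can be computed directly from paired observed and predicted values. The sketch below uses common textbook definitions and makes two assumptions: R² is taken as the coefficient of determination (the abstract calls it a correlation coefficient, so the authors may have used a different formula), and VAF is computed as (1 − var(error) / var(observed)) × 100. The sample numbers are invented.

```python
import math
from statistics import pvariance


def regression_metrics(actual, predicted):
    """Return R2, VAF (%), MAE, MAPE (%), and RMSE for paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "VAF": (1 - pvariance(errors) / pvariance(actual)) * 100,
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined if any observed value is zero.
        "MAPE": sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100,
        "RMSE": math.sqrt(ss_res / n),
    }


observed = [2.0, 4.0, 6.0, 8.0]     # illustrative values only
estimated = [1.0, 5.0, 5.0, 9.0]
for name, value in regression_metrics(observed, estimated).items():
    print(f"{name}: {value:.4f}")
```

R² and VAF coincide (up to the factor of 100) when the prediction errors average to zero; they diverge when the model is biased, which is one reason to report both.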


Subject(s)
Machine Learning; Support Vector Machine; Neural Networks, Computer; Algorithms

14.
BMC Infect Dis ; 24(Suppl 2): 334, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38509486

ABSTRACT

BACKGROUND: Dengue fever is a well-studied vector-borne disease in tropical and subtropical areas of the world. Several methods for predicting the occurrence of dengue fever in Taiwan have been proposed. However, to the best of our knowledge, no study has investigated the relationship between air quality indices (AQIs) and dengue fever in Taiwan. RESULTS: This study aimed to develop a dengue fever prediction model in which meteorological factors, a vector index, and AQIs were incorporated into different machine learning algorithms. A total of 805 meteorological records from 2013 to 2015 were collected from government open-source data after preprocessing. In addition to well-known dengue-related factors, we investigated the effects of novel variables, including particulate matter with an aerodynamic diameter < 10 µm (PM10), PM2.5, and an ultraviolet index, for predicting dengue fever occurrence. The collected dataset was randomly divided into an 80% training set and a 20% test set. The experimental results showed that the random forests achieved an area under the receiver operating characteristic curve of 0.9547 for the test set, which was the best compared with the other machine learning algorithms. In addition, the temperature was the most important factor in our variable importance analysis, and it showed a positive effect on dengue fever at < 30 °C but had less of an effect at > 30 °C. The AQIs were not as important as temperature, but one was selected in the process of filtering the variables and showed a certain influence on the final results. CONCLUSIONS: Our study is the first to demonstrate that AQI negatively affects dengue fever occurrence in Taiwan. The proposed prediction model can be used as an early warning system for public health to prevent dengue fever outbreaks.


Subject(s)
Dengue; Random Forest; Humans; Dengue/epidemiology; Taiwan/epidemiology; Temperature; Disease Outbreaks

15.
Comput Biol Med ; 171: 108200, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38428099

ABSTRACT

BACKGROUND: The COVID-19 pandemic caused by SARS-CoV-2 has led to significant global morbidity and mortality, with potential neurological consequences, such as Parkinson's disease (PD). However, the underlying mechanisms remain elusive. METHODS: To address this critical question, we conducted an in-depth transcriptome analysis of dopaminergic (DA) neurons in both COVID-19 and PD patients. We identified common pathways and differentially expressed genes (DEGs), performed enrichment analysis, constructed protein‒protein interaction networks and gene regulatory networks, and employed machine learning methods to develop disease diagnosis and progression prediction models. To further substantiate our findings, we performed validation of hub genes using a single-cell sequencing dataset encompassing DA neurons from PD patients, as well as transcriptome sequencing of DA neurons from a mouse model of MPTP(1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine)-induced PD. Furthermore, a drug-protein interaction network was also created. RESULTS: We gained detailed insights into biological functions and signaling pathways, including ion transport and synaptic signaling pathways. CD38 was identified as a potential key biomarker. Disease diagnosis and progression prediction models were specifically tailored for PD. Molecular docking simulations and molecular dynamics simulations were employed to predict potential therapeutic drugs, revealing that genistein holds significant promise for exerting dual therapeutic effects on both PD and COVID-19. CONCLUSIONS: Our study provides innovative strategies for advancing PD-related research and treatment in the context of the ongoing COVID-19 pandemic by elucidating the common pathogenesis between COVID-19 and PD in DA neurons.


Subject(s)
COVID-19; Parkinson Disease; Animals; Mice; Humans; Parkinson Disease/genetics; Parkinson Disease/metabolism; 1-Methyl-4-phenyl-1,2,3,6-tetrahydropyridine/pharmacology; 1-Methyl-4-phenyl-1,2,3,6-tetrahydropyridine/therapeutic use; Molecular Docking Simulation; Pandemics; SARS-CoV-2; Disease Models, Animal

16.
Front Psychol ; 15: 1337512, 2024.
Article in English | MEDLINE | ID: mdl-38379618

ABSTRACT

Adolescence is a stage during which individuals develop social adaptability through meaningful interactions with others. During this period, students gradually expand their social networks outside the home, forming a sense of community. The aim of the current study was to explore the key predictors related to sense of community among Korean high school students and to develop supportive policies that enhance their sense of community. Accordingly, random forests and SHapley Additive exPlanations (SHAP) were applied to the 7th wave (11th graders) of the Korean Education Longitudinal Study 2013 data (n = 6,077). As a result, 6 predictors positively associated with sense of community were identified, including self-related variables, "multicultural acceptance," "behavioral regulation strategy," and "peer attachment," consistent with previous findings. Newly derived variables that predict sense of community include "positive recognition of volunteering," "creativity," "observance of rules" and "class attitude," which are also positively related to sense of community. The implications of these results and some suggestions for future research are also discussed.

17.
Poult Sci ; 103(4): 103504, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38335671

ABSTRACT

Understanding the factors of dead-on-arrival (DOA) incidents during pre-slaughter handling is crucial for informed decision-making, improving broiler welfare, and optimizing farm profitability. In this study, 3 different machine learning (ML) algorithms - least absolute shrinkage and selection operator (LASSO), classification tree (CT), and random forest (RF) - were used together with 4 sampling techniques to optimize imbalanced data. The dataset comes from 22,115 broiler truckloads from a large producer in Thailand (2021-2022) and includes 14 independent variables covering the rearing, catching, and transportation stages. The study focuses on DOA% in the range of 0.10 to 1.20%, with a threshold for high DOA% above 0.3%, and records DOA% per truckload during pre-slaughter ante-mortem inspection. With a high DOA rate of 25.2%, the imbalanced dataset prompts the implementation of 4 methods to tune the imbalance parameters: random over sampling (ROS), random under sampling (RUS), both sampling (BOTH), and synthetic sampling or random over sampling example (ROSE). The aim is to improve the performance of the prediction model in classifying and predicting high DOA%. The comparative analysis of the different error metrics shows that RF outperforms the other models in a balanced dataset. In particular, RUS shows a significant improvement in prediction performance across all models compared to the original unbalanced dataset. The identification of the 4 most important variables for predicting high DOA percentages - mortality and culling rate, rearing stocking density, season, and mean body weight - emphasizes their importance for broiler production. This study provides valuable insights into the prediction of DOA status using an ML approach and contributes to the development of more effective strategies to mitigate high DOA percentages in commercial broiler production.
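
Two of the four balancing techniques named in this abstract, random over sampling (ROS) and random under sampling (RUS), are simple enough to sketch with the standard library. The code below is an illustration under stated assumptions, not the study's implementation: function names and the toy labels are hypothetical, and the BOTH and ROSE variants are not covered.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(rows, labels):
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_under_sample(rows, labels, seed=0):
    """RUS: shrink every class to the size of the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        for row in rng.sample(members, target):  # sampling without replacement
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_over_sample(rows, labels, seed=0):
    """ROS: grow every class to the size of the largest class by duplication."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:  # originals are kept, duplicates are appended
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy example: 3 "high" truckloads versus 7 "normal" ones (invented labels).
rows = list(range(10))
labels = ["high"] * 3 + ["normal"] * 7
print(Counter(random_under_sample(rows, labels)[1]))
print(Counter(random_over_sample(rows, labels)[1]))
```

Resampling should be applied to the training split only; balancing the test split as well would make the reported error metrics unrepresentative of the real class mix.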


Subject(s)
Abattoirs; Chickens; Animals; Algorithms; Machine Learning; Anti-Bacterial Agents

18.
Eur J Pharm Sci ; 194: 106705, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38246432

ABSTRACT

Viscosity is a key characteristic of therapeutic antibodies for subcutaneous administration, which requires low-volume, high-concentration formulations. It would be highly beneficial to accurately predict the viscosity of newly developed therapeutic antibodies in the early stages of development. In this work, a ProtT5-XL-UniRef50 (ProtT5) and Random Forests (RF)-based prediction method was proposed for accurately predicting the viscosity of monoclonal antibodies, with only the corresponding sequences needed. Starting from the given heavy and light chain V-region sequences, corresponding features were first extracted from the ProtT5 pretrained model. Kernel principal component analysis (Kernel-PCA) was then used to reduce the extracted 2048-D (1024-D for each sequence) feature vector to a reasonable level for efficient training of the RF regressor. Then, the RF model was constructed on 40 commercially available therapeutic antibodies and tested with 3-fold cross-validation. Test results show that the model could reproduce the viscosity value at a high level (Pearson correlation coefficient (PCC) = 0.928). Performance in classifying high (>30 cP) and low (<30 cP) viscosity is even better: the accuracy (ACC) and the area under the precision-recall curve (AUC) of the classification model in validation tests are 0.975 and 1.000, respectively. Compared with 5 existing state-of-the-art viscosity prediction methods, the proposed method performs best, which should facilitate high-throughput viscosity screening of high-concentration antibody formulations.


Subject(s)
Antibodies, Monoclonal; Random Forest; Viscosity; Drug Compounding

20.
Sci Total Environ ; 912: 169209, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38092211

ABSTRACT

The partial pressure of ocean surface CO₂ (pCO₂) plays an important role in quantifying the carbon budget and assessing ocean acidification. Because the global ocean is so vast and complex, most current studies divide it into regions. In order to reveal the overall characteristics of the global ocean and avoid mutual influence between zones, a holistic research method was used to detect the correlation of twelve predictive factors, including chlorophyll concentration (Chlor_a), diffuse attenuation coefficient at 490 nm (Kd_490), density ocean mixed layer thickness (Mlotst), eastward velocity (East), northward velocity (North), salinity (Sal), temperature (Temp), dissolved iron (Fe), dissolved silicate (Si), nitrate (NO₃), potential of hydrogen (pH), and phosphate (PO₄), at the global ocean scale. Based on measured data from the Global Surface pCO₂ (LDEO) database, combined with National Aeronautics and Space Administration (NASA) Ocean Color satellite data and Copernicus Ocean reanalysis data, an improved optimized random forest (ORF) method is proposed for the overall reconstruction of global ocean surface pCO₂ and compared with various machine learning methods. The results indicate that the ORF method is the most accurate in overall modeling at the global ocean scale (mean absolute error of 6.27 µatm, root mean square error of 15.34 µatm, R² of 0.92). Based on independent observations from the LDEO dataset and time series observation stations, the ORF model was further validated, and the global ocean surface pCO₂ distribution map at 0.25° × 0.25° resolution for 2010 to 2019 was reconstructed, which is of significance for the global ocean carbon cycle and carbon assessment.
