ABSTRACT
Water pollution originating from land use and land cover (LULC) can disrupt river ecosystems, posing a threat to public health, safety, and socioeconomic sustainability. Although the interactions between terrestrial and aquatic systems have been investigated for decades, the scale at which land use practices, whether across the entire basin or in separate parts of it, significantly impact water quality still needs to be determined. In this research, we used multitemporal data (field measurements, Sentinel-2 images, and elevation data) to investigate how the LULC composition in the catchment area (CA) of each water pollution measurement station located along the river course of the Los Perros Basin affects water pollution indicators (WPIs). We examined whether the CAs form a sequential runoff aggregation system for certain pollutants from the highest to the lowest part of the basin. Our research applied statistical (correlation, time series analysis, and canonical correspondence analysis) and geo-visual analyses to identify relationships at the CA level between satellite-based LULC composition and WPI concentrations. We observed that WPIs such as nitrogen, phosphorus, coliforms, and water temperature form a sequential runoff aggregation system from the highest to the lowest part of the basin. We concluded that the observed decrease in natural cover and increase in built-up and agricultural cover in the upper CAs of the study basin during the study period (2016 to 2020) are related to elevated WPI values for suspended solids and coliforms, which exceeded the allowed limits in all CAs and on all measurement dates.
Subjects
Environmental Monitoring, Phosphorus, Rivers, Chemical Water Pollutants, Mexico, Rivers/chemistry, Chemical Water Pollutants/analysis, Phosphorus/analysis, Agriculture, Nitrogen/analysis, Water Pollution/statistics & numerical data
ABSTRACT
The use of prior knowledge in the machine learning framework has been considered a potential tool to handle the curse of dimensionality in genetic and genomic data. Although random forest (RF) represents a flexible non-parametric approach with several advantages, it can provide poor accuracy in high-dimensional settings, mainly in scenarios with small sample sizes. We propose a knowledge-slanted RF that integrates biological networks as prior knowledge into the model to improve its performance and explainability, exemplifying its use for selecting and identifying relevant genes. Knowledge-slanted RF is a combination of two stages. First, prior knowledge represented by graphs is translated by running a random walk with restart algorithm to determine the relevance of each gene based on its connections and location on a protein-protein interaction network. Then, each relevance is used to modify the probability of drawing a gene as a candidate split feature in the conventional RF. Experiments on simulated datasets with very small sample sizes (n ≤ 30), comparing knowledge-slanted RF against conventional RF and logistic lasso regression, suggest improved precision in outcome prediction compared to the other methods. The knowledge-slanted RF was completed with the introduction of a modified version of the Boruta feature selection algorithm. Finally, knowledge-slanted RF identified more biologically relevant genes, offering a higher level of explainability for users than conventional RF. These findings were corroborated in one real case to identify genes relevant to calcific aortic valve stenosis.
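The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, the restart probability of 0.3, the seed gene, and the function name random_walk_with_restart are all assumptions made for illustration. The first part computes a relevance score per gene with a random walk with restart; the second uses those scores as sampling probabilities when drawing candidate split features, in place of the uniform draw used by a conventional RF.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the vector stops changing."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # guard against isolated nodes
    W = adj / col_sums                 # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seeds] = 1.0 / len(seeds)       # restart distribution over seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical 5-gene interaction network (undirected edges).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Stage 1: relevance of each gene, with gene 0 as the (assumed) seed.
relevance = random_walk_with_restart(adj, seeds=[0])

# Stage 2: draw candidate split features with probability given by relevance.
rng = np.random.default_rng(0)
candidates = rng.choice(len(relevance), size=2, replace=False, p=relevance)
```

Genes close to the seed in the network receive higher relevance and are therefore offered to the trees as split candidates more often, which is the sense in which the forest is "slanted" by prior knowledge.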
ABSTRACT
Household water treatment (HWT) is recommended when safe drinking water is limited. To understand determinants of HWT adoption, we conducted a cross-sectional survey with 650 households across different regions in Haiti. Data were collected on 71 demographic and psychosocial factors and 2 outcomes (self-reported and confirmed HWT use). Data were transformed into 169 possible determinants of adoption across nine categories. We assessed determinants using logistic regression and, as machine learning methods are increasingly used, random forest analyses. Overall, 376 (58%) respondents self-reported treating or purchasing water, and 123 (19%) respondents had residual chlorine in stored household water. Both logistic regression and machine learning analyses had high accuracy (area under the receiver operating characteristic curve (AUC): 0.77-0.82), and the strongest determinants in models were in the demographics and socioeconomics, risk belief, and WASH practice categories. Determinants that can be influenced inform HWT promotion in Haiti. It is recommended to increase access to HWT products, provide cash and education on water treatment to emergency-impacted populations, and focus future surveys on known determinants of adoption. We found both regression and machine learning methods need informed, thoughtful, and trained analysts to ensure meaningful results and discuss the benefits/drawbacks of analysis methods herein.
Subjects
Family Characteristics, Machine Learning, Water Purification, Haiti, Water Purification/methods, Humans, Logistic Models, Cross-Sectional Studies, Drinking Water, Female, Male, Adult, Water Supply, Socioeconomic Factors
ABSTRACT
In designing and implementing initiatives to conserve biodiversity and ensure the flow of ecosystem services, it is crucial to understand the perspectives of communities living near protected areas. Improving conservation efforts may depend on analyzing socio-ecological factors and their impact on Local Ecological Knowledge (LEK) and perceptions of ecosystem services. Using participatory methodologies with 80 farmers from agrarian settlements adjacent to protected areas in the Cerrado biome, Brazil, we quantified LEK and assessed perceptions of ecosystem services using an adaptation of the Q-methodology. We collected data on thirteen socio-ecological variables, including age, gender, farm size, education, engagement with conservation initiatives, and interactions with protected areas and Legal Reserves. Using artificial intelligence in a Random Forest (RF) modelling approach, we identified the variables with the greatest influence on LEK and perceptions. Our findings demonstrate that engagement in nature conservation and restoration initiatives, along with the use of native areas (protected and managed areas), significantly influences LEK levels within the farmers' communities. Farmers with full participation, from conception to implementation and evaluation of the initiatives, had a significantly higher LEK level (28.5 ± 13.0) than farmers without participation in those initiatives (11.4 ± 5.9). Farmers who used the cerrado for leisure and education (28.2 ± 21.2) had significantly higher LEK levels than farmers who did not visit or use the cerrado areas (13.5 ± 8.9) and those using areas of native vegetation for cattle raising (12.8 ± 6.8). These results highlight that, in addition to farmers' participation in conservation and restoration initiatives, the sustainable use of natural areas is fundamental to strengthening their local knowledge of ecosystem functioning. Furthermore, we found that the type of agroecosystem present on farms strongly shapes farmers' perceptions of ecosystem services. Farmers perceive different ecosystem services depending on land use, indicating the need for tailored interventions for the planning and management of conservation areas. Farmers practicing soybean monoculture had significantly lower perception scores on ecosystem services (-5.1 ± 3.8) than the other four evaluated groups. Overall, the study highlights the critical role of incorporating local knowledge and perceptions in the design of effective management strategies to increase ecosystem services provision and biodiversity conservation in areas adjacent to protected areas.
Subjects
Biodiversity, Conservation of Natural Resources, Ecosystem, Brazil, Farmers/psychology, Humans, Knowledge, Ecology, Perception, Agriculture
ABSTRACT
PURPOSE: Parametric regression models have been the main statistical method for identifying average treatment effects. Causal machine learning models have shown promising results in estimating heterogeneous treatment effects in causal inference. Here we aimed to compare the application of causal random forest (CRF) and linear regression modelling (LRM) to estimate the effects of organisational factors on ICU efficiency. METHODS: A retrospective analysis of 277,459 patients admitted to 128 Brazilian and Uruguayan ICUs over three years. ICU efficiency was assessed using the average standardised efficiency ratio (ASER), measured as the average of the standardised mortality ratio (SMR) and the standardised resource use (SRU) according to the SAPS-3 score. Using a causal inference framework, we estimated and compared the conditional average treatment effect (CATE) of seven common structural and organisational factors on ICU efficiency using LRM with interaction terms and CRF. RESULTS: The hospital mortality was 14%; median ICU and hospital lengths of stay were 2 and 7 days, respectively. Overall median SMR was 0.97 [IQR: 0.76, 1.21], median SRU was 1.06 [IQR: 0.79, 1.30] and median ASER was 0.99 [IQR: 0.82, 1.21]. Both CRF and LRM showed that the average number of nurses per ten beds was independently associated with ICU efficiency (CATE [95% CI]: -0.13 [-0.24, -0.01] and -0.09 [-0.17, -0.01], respectively). Finally, CRF identified some specific ICUs with a significant CATE in exposures that did not present a significant average effect. CONCLUSION: In general, both methods were comparable in identifying organisational factors significantly associated with CATE on ICU efficiency. CRF, however, identified specific ICUs with significant effects, even when the average effect was nonsignificant. This can assist healthcare managers in further in-depth evaluation of process interventions to improve ICU efficiency.
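The efficiency measure defined in the METHODS can be written out as a short worked formula. The SMR is shown in its conventional observed-to-expected form; the numbers in the second line belong to a hypothetical ICU and are not taken from the study.

```latex
\[
\mathrm{ASER} = \frac{\mathrm{SMR} + \mathrm{SRU}}{2},
\qquad
\mathrm{SMR} = \frac{\text{observed deaths}}{\text{expected deaths}}
\]
% Hypothetical ICU with SMR = 0.90 and SRU = 1.10:
\[
\mathrm{ASER} = \frac{0.90 + 1.10}{2} = 1.00
\]
```

Because the ASER is computed per ICU and the median is then taken across ICUs, the reported median ASER (0.99) does not have to equal the average of the median SMR and the median SRU.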
Subjects
Hospital Mortality, Intensive Care Units, Humans, Intensive Care Units/organization & administration, Retrospective Studies, Linear Models, Female, Male, Brazil, Length of Stay/statistics & numerical data, Organizational Efficiency, Middle Aged, Machine Learning, Uruguay, Aged, Adult, Random Forest Algorithm
ABSTRACT
Species distribution modeling helps understand how environmental factors influence species distribution, creating profiles to predict presence in unexplored areas and assess ecological impacts. This study examined the habitat use and population ecology of the Chilean dolphin in Seno Skyring, Chilean Patagonia. We used three models, namely random forest (RF), generalized linear model (GLM), and artificial neural network (ANN), to predict dolphin distribution based on environmental and biotic data such as water temperature, salinity, and fish farm density. We found that the RF model was the most precise tool for predicting the habitat preferences of Chilean dolphins. The results indicate that these dolphins are primarily located within six kilometers of the coast, strongly correlating with areas featuring numerous fish farms, sheltered waters close to the shore with river inputs, and shallow productive zones. This suggests a potential association between dolphin presence and fish-farming activities. These findings can guide targeted conservation measures, such as regulating fish-farming practices and protecting vital coastal areas to improve the survival prospects of the Chilean dolphin. Given the extensive fish-farming industry in Chile, this research highlights the need for greater knowledge and comprehensive conservation efforts to ensure the species' long-term survival. By understanding and mitigating the impacts of fish farming and other human activities, we can better protect the habitat and well-being of Chilean dolphins.
ABSTRACT
This study aimed to determine the feasibility of applying machine-learning methods to assess the progression of chronic kidney disease (CKD) in patients with coronavirus disease (COVID-19) and acute kidney injury (AKI). The study was conducted on patients aged 18 years or older who were diagnosed with COVID-19 and AKI between April 2020 and March 2021, and admitted to a second-level hospital in Mérida, Yucatán, México. Of the admitted patients, 47.92% died and 52.06% were discharged. Among the discharged patients, 176 developed AKI during hospitalization, and 131 agreed to participate in the study. The results indicated that the area under the receiver operating characteristic curve (AUC-ROC) for the four models was 0.826 for the support vector machine (SVM), 0.828 for the random forest, 0.840 for the logistic regression, and 0.841 for the boosting model. Variable selection methods were used to enhance the performance of the classifier, with the SVM model demonstrating the best overall performance, achieving a classification rate of 99.8% ± 0.1 in the training set and 98.43% ± 1.79 in the validation set in AUC-ROC values. These findings have the potential to aid in the early detection and management of CKD, a complication of AKI resulting from COVID-19. Further research is required to confirm these results.
ABSTRACT
Introduction: Air quality is directly affected by pollutant emission from vehicles, especially in large cities and metropolitan areas or when there is no compliance check for vehicle emission standards. Particulate Matter (PM) is one of the pollutants emitted from fuel burning in internal combustion engines and remains suspended in the atmosphere, causing respiratory and cardiovascular health problems to the population. In this study, we analyzed the interaction between vehicular emissions, meteorological variables, and particulate matter concentrations in the lower atmosphere, presenting methods for predicting and forecasting PM2.5. Methods: Meteorological and vehicle flow data from the city of Curitiba, Brazil, and particulate matter concentration data from optical sensors installed in the city between 2020 and 2022 were organized in hourly and daily averages. Prediction and forecasting were based on two machine learning models: Random Forest (RF) and Long Short-Term Memory (LSTM) neural network. The baseline model for prediction was chosen as the Multiple Linear Regression (MLR) model, and for forecast, we used the naive estimation as baseline. Results: RF showed that on hourly and daily prediction scales, the planetary boundary layer height was the most important variable, followed by wind gust and wind velocity in hourly or daily cases, respectively. The highest PM prediction accuracy (99.37%) was found using the RF model on a daily scale. For forecasting, the highest accuracy was 99.71% using the LSTM model for 1-h forecast horizon with 5 h of previous data used as input variables. Discussion: The RF and LSTM models were able to improve prediction and forecasting compared with MLR and Naive, respectively. 
The LSTM was trained with data corresponding to the period of the COVID-19 pandemic (2020 and 2021) and was able to forecast the concentration of PM2.5 in 2022, a year in which the data show greater circulation of vehicles and higher peaks in the concentration of PM2.5. Our results can aid the physical understanding of the factors influencing the dispersion of pollutants from vehicle emissions in the lower atmosphere in urban environments. This study supports the formulation of new government policies to mitigate the impact of vehicle emissions in large cities.
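The forecasting setup described in this abstract (5 h of previous data as input, a 1-h horizon, and a naive baseline) can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the naive estimate is assumed here to be a persistence forecast (the last observed value is carried forward), the PM2.5 values are made up, and the helper names make_windows and naive_forecast are hypothetical. Any learned model, such as an LSTM, would be trained on the same (X, y) pairs and compared against the baseline error.

```python
import numpy as np

def make_windows(series, n_lags=5, horizon=1):
    """Build (input window, target) pairs from an hourly series."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])        # the n_lags hours before time t
        y.append(series[t + horizon - 1])     # value `horizon` steps ahead
    return np.array(X), np.array(y)

def naive_forecast(X):
    """Persistence baseline: predict the most recent observed value."""
    return X[:, -1]

# Made-up hourly PM2.5 concentrations (µg/m³), for illustration only.
pm25 = np.array([12.0, 14.0, 13.0, 15.0, 18.0, 17.0, 16.0, 20.0])

X, y = make_windows(pm25, n_lags=5, horizon=1)
y_naive = naive_forecast(X)
mae_naive = np.mean(np.abs(y - y_naive))   # error a learned model has to beat
```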
ABSTRACT
Interspecific interactions, including predator-prey, intraguild predation (IGP) and competition, may drive distribution and habitat use of predator communities. However, elucidating the relative importance of these interactions in shaping predator distributions is challenging, especially in marine communities comprising highly mobile species. We used individual-based models (IBMs) to predict the habitat distributions of apex predators, intraguild (IG) prey and prey. We then used passive acoustic telemetry to test these predictions in a subtropical marine predator community consisting of eight elasmobranch (i.e. shark and ray) species in Bimini, The Bahamas. IBMs predicted that prey and IG prey will preferentially select habitats based on safety over resources (food), with stronger selection for safe habitat by smaller prey. Elasmobranch space-use patterns matched these predictions. Species with predator-prey and asymmetrical IGP (between apex and small mesopredators) interactions showed the clearest spatial separation, followed by asymmetrical IGP among apex and large mesopredators. Competitors showed greater spatial overlap although with finer-scale differences in microhabitat use. Our study suggests space-use patterns in elasmobranchs are at least partially driven by interspecific interactions, with stronger spatial separation occurring where interactions include predator-prey relationships or IGP.
Subjects
Ecosystem, Food Chain, Predatory Behavior, Sharks, Animals, Sharks/physiology, Rajidae/physiology, Bahamas, Biological Models, Animal Distribution, Telemetry
ABSTRACT
This study proposes a multiclass model to classify the severity of knee osteoarthritis (KOA) using bioimpedance measurements. The experimental setup considered three types of measurements using eight electrodes: global impedance with adjacent pattern, global impedance with opposite pattern, and direct impedance measurement, which were taken using an electronic device proposed by the authors and based on the Analog Devices AD5933 impedance converter. The study comprised 37 participants, 25 with healthy knees and 13 with three different degrees of KOA. All participants performed 20 repetitions of each of the following five tasks: (i) sitting with the knee bent, (ii) sitting with the knee extended, (iii) sitting and performing successive extensions and flexions of the knee, (iv) standing, and (v) walking. Data from the 15 experimental setups (3 types of measurements × 5 exercises) were used to train a multiclass random forest. The training and validation cycle was repeated 100 times using random undersampling. In each of the 100 cycles, 80% of the data were used for training and the rest for testing. The results showed that the proposed approach achieved average sensitivities and specificities of 100% for the four KOA severity grades in the extension, cyclic, and gait tasks. This suggests that the proposed method can serve as a screening tool to determine which individuals should undergo x-rays or magnetic resonance imaging for further evaluation of KOA.
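The validation scheme described above (100 cycles of random undersampling with an 80/20 split) can be sketched as follows. The abstract does not say whether undersampling was applied before or after the split, so balancing first is an assumption, as are the class sizes and the helper names undersample_indices and split_80_20. The classifier itself is omitted; each (train, test) index pair would be passed to whatever multiclass model is being evaluated.

```python
import numpy as np

def undersample_indices(labels, rng):
    """Keep an equal number of randomly chosen samples from every class."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()                       # size of the smallest class
    keep = [rng.choice(np.flatnonzero(labels == c), size=n_min, replace=False)
            for c in classes]
    return np.concatenate(keep)

def split_80_20(indices, rng):
    """Shuffle the indices and split them 80% train / 20% test."""
    shuffled = rng.permutation(indices)
    n_train = int(round(0.8 * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

rng = np.random.default_rng(0)
# Hypothetical labels: one large healthy class (0) and three smaller KOA grades.
labels = np.array([0] * 50 + [1] * 10 + [2] * 10 + [3] * 10)

splits = []
for _ in range(100):                           # 100 training/validation cycles
    balanced = undersample_indices(labels, rng)
    train_idx, test_idx = split_80_20(balanced, rng)
    splits.append((train_idx, test_idx))
```

Drawing a fresh balanced subset in every cycle means the majority class is represented by different samples each time, so the averaged metrics do not hinge on a single arbitrary subset.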
Subjects
Electric Impedance, Machine Learning, Knee Osteoarthritis, Humans, Knee Osteoarthritis/physiopathology, Knee Osteoarthritis/diagnostic imaging, Female, Male, Middle Aged, Severity of Illness Index, Aged, Gait, Adult, Knee Joint/physiopathology, Knee Joint/diagnostic imaging, Sensitivity and Specificity, Walking, Reproducibility of Results
ABSTRACT
Introduction: Studies from different parts of the world have shown that some comorbidities are associated with fatal cases of COVID-19. However, the prevalence rates of comorbidities differ around the world; therefore, their contribution to COVID-19 mortality also differs. Socioeconomic factors may influence the prevalence of comorbidities; therefore, they may also influence COVID-19 mortality. Methods: This study conducted feature analysis using two supervised machine learning classification algorithms, Random Forest and XGBoost, to examine the comorbidities and level of economic inequalities associated with fatal cases of COVID-19 in Mexico. The dataset used was collected by the National Epidemiology Center from February 2020 to November 2022, and includes more than 20 million observations and 40 variables describing the characteristics of the individuals who underwent COVID-19 testing or treatment. In addition, socioeconomic inequalities were measured using the normalized marginalization index calculated by the National Population Council and the deprivation index calculated by NASA. Results: The analysis shows that diabetes and hypertension were the main comorbidities defining the mortality of COVID-19. Furthermore, socioeconomic inequalities were also important characteristics defining mortality. Similar features were found with Random Forest and XGBoost. Discussion: It is imperative to implement programs aimed at reducing inequalities as well as preventable comorbidities to make the population more resilient to future pandemics. The results apply to regions or countries with similar levels of inequality or comorbidity prevalence.
ABSTRACT
Driven by climate change, tropical cyclones (TCs) are predicted to change in intensity and frequency through time. Given these forecasted changes, developing an understanding of how TCs impact insular wildlife is of heightened importance. Previous work has shown that extreme weather events may shape species distributions more strongly than climatic averages; however, given the coarse spatial and temporal scales at which TC data are often reported, the influence of TCs on species distributions has yet to be explored. Using TC data from the National Hurricane Center, we developed spatially and temporally explicit species distribution models (SDMs) to examine the role of TCs in shaping present-day distributions of Puerto Rico's 10 Anolis lizard species. We created six predictor variables to represent the intensity and frequency of TCs. For each occurrence of a species, we calculated these variables for TCs that came within 500 km of the center of Puerto Rico and occurred within the 1-year window prior to when that occurrence was recorded. We also included predictor variables related to landcover, climate, topography, canopy cover and geology. We used random forests to assess model performance and variable importance in models with and without TC variables. We found that the inclusion of TC variables improved model performance for the majority of Puerto Rico's 10 anole species. The magnitude of the improvement varied by species, with generalist species that occur throughout the island experiencing the greatest improvements in model performance. Range-restricted species experienced small, almost negligible, improvements but also had more predictive models both with and without the inclusion of TC variables compared to generalist species. Our findings suggest that incorporating data on TCs into SDMs may be important for modeling insular species that are prone to experiencing these types of extreme weather events.
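The way occurrence-specific TC predictors are described above (storms passing within 500 km of the center of Puerto Rico during the year before each occurrence record) can be sketched in a few lines. This is only an illustration: the coordinates used for the island's center are approximate, the track points are invented, and the two summary variables returned here are stand-ins, since the abstract does not list the six predictors that were actually built.

```python
import math
from datetime import date, timedelta

PR_CENTER = (18.22, -66.59)   # approximate island center (assumption)
EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def tc_predictors(track_points, occurrence_date, radius_km=500.0, window_days=365):
    """Summarise storms near the island in the window before an occurrence."""
    start = occurrence_date - timedelta(days=window_days)
    nearby = [p for p in track_points
              if start <= p["date"] < occurrence_date
              and haversine_km(p["lat"], p["lon"], *PR_CENTER) <= radius_km]
    return {
        "n_storms": len({p["storm"] for p in nearby}),                   # frequency
        "max_wind_kt": max((p["wind_kt"] for p in nearby), default=0),   # intensity
    }

# Invented track points, for illustration only.
tracks = [
    {"storm": "A", "date": date(2017, 9, 20), "lat": 18.0, "lon": -66.0, "wind_kt": 130},
    {"storm": "B", "date": date(2017, 9, 6), "lat": 18.6, "lon": -64.5, "wind_kt": 100},
    {"storm": "C", "date": date(2017, 10, 1), "lat": 30.0, "lon": -80.0, "wind_kt": 90},   # too far
    {"storm": "D", "date": date(2015, 8, 1), "lat": 18.1, "lon": -66.5, "wind_kt": 60},    # too old
]
result = tc_predictors(tracks, occurrence_date=date(2018, 3, 1))
```

Computing the variables relative to each occurrence date, rather than once for the whole island, is what makes the resulting predictors temporally explicit.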
Subjects
Cyclonic Storms, Lizards, Animals, Climate Change, Puerto Rico, Wild Animals, Forecasting
ABSTRACT
Triatoma dimidiata is a vector of the hemoparasite Trypanosoma cruzi, the causal agent of Chagas disease. It establishes reproductive colonies in the peridomicile of the premises. The peridomicile comprises a random set of artificial and natural features that overlap and assemble a network of microenvironmentally suitable sites (patches) that interact with each other and favor the structure and proliferation of T. dimidiata colonies. The heterogeneity of patch characteristics hinders the understanding and identification of sites susceptible to colonization. In this study, a classification system based on a random forest algorithm was used to identify peridomiciles susceptible to colonization, in order to describe the spatial distribution of these sites and their relationship with the colonies of T. dimidiata in ten localities of Yucatan. Of 1,000 peridomiciles reviewed, the classification showed that 13.9% (139) of the patches were highly susceptible (HSP), and 86.1% (861) were less susceptible (LSP). All localities had at least one HSP. The occupancy by patch type showed that the percentage of total occupancy and of occupancy by colonies was higher in the HSP, while the occupancy by adult T. dimidiata without evidence of nymphs or exuviae (propagules) was higher in the LSP. A generalized additive model (GAM) revealed that the percentage of occupied patches increases as the abundance of individuals in the localities increases; however, the percentage of occupied patches in LSP is lower than that in HSP. Distance analyses revealed that colonies and propagules were located significantly closer (approximately 200 m) to a colony in an HSP than to any colony in an LSP. The distribution of T. dimidiata in the localities was defined by the distribution of patch type; as the occupancy in these patches increased, a network of peridomestic populations was configured, which may be promoted by a greater abundance of insects inside the localities. 
These results reveal that the spatial distribution of T. dimidiata individuals and colonies in the peridomicile at the locality scale corresponds to a metapopulation pattern within the localities through a system of patches mediated by distance and level of the vectors' occupancy.
Subjects
Chagas Disease, Triatoma, Trypanosoma cruzi, Humans, Animals, Triatoma/parasitology, Insect Vectors/parasitology, Nymph
ABSTRACT
Within the field of Humanities, there is a recognized need for educational innovation, as there are currently no reported tools available that enable individuals to interact with their environment to create an enhanced learning experience in the humanities (e.g., immersive spaces). This project proposes a solution to address this gap by integrating technology and promoting the development of teaching methodologies in the humanities, specifically by incorporating emotional monitoring during the learning of humanistic content inside an immersive space. To achieve this goal, a real-time EEG-based emotion recognition system was developed to interpret and classify specific emotions. These emotions align with the early proposal by Descartes (Passions), including admiration, love, hate, desire, joy, and sadness. This system aims to integrate emotional data into the Neurohumanities Lab interactive platform, creating a comprehensive and immersive learning environment. This work developed a real-time machine learning (ML) emotion recognition model that provided Valence, Arousal, and Dominance (VAD) estimations every 5 seconds. Using PCA, PSD, RF, and Extra-Trees, the best 8 channels and their respective best band powers were extracted; furthermore, multiple models were evaluated using shift-based data division and cross-validations. After assessing their performance, Extra-Trees achieved a general accuracy of 94%, higher than that reported in the literature (88% accuracy). The proposed model provided real-time predictions of VAD variables and was adapted to classify Descartes' six main passions. However, with the VAD values obtained, more than 15 emotions (those reported in the VAD emotion mapping) can be classified, extending the range of this application.
ABSTRACT
Transfer learning is a machine learning technique that works well with chemical endpoints, with several papers confirming its efficiency. Although effective, because the choice of source/assistant tasks is non-trivial, the application of this technique is severely limited by the domain knowledge of the modeller. Considering this limitation, we developed a purely data-driven approach for source task selection that removes the need for domain knowledge. To achieve this, we created a supervised learning setting in which transfer outcome (positive/negative) is the variable to be predicted, and a set of six transferability metrics, calculated based on information from target and source datasets, are the features for prediction. We used the ChEMBL database to generate 100,000 transfers using random pairing, and with these transfers, we trained and evaluated our transferability prediction model (TP-Model). Our TP-Model achieved a 135-fold increase in precision while achieving a sensitivity of 92%, demonstrating a clear superiority over random search. In addition, we observed that transfer learning could provide considerable performance increases when applicable, with an average Matthews Correlation Coefficient (MCC) increase of 0.19 when using a single source and an average MCC increase of 0.44 when using multiple sources.
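Since the gains above are reported as increases in the Matthews Correlation Coefficient, a short sketch of how that metric is computed from a binary confusion matrix may help. The confusion counts below are invented, and returning 0.0 when the denominator is zero is a common convention rather than something stated in the abstract.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0   # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Invented counts for a target task modelled without and with a source task.
mcc_without_transfer = mcc(tp=30, tn=40, fp=10, fn=20)
mcc_with_transfer = mcc(tp=40, tn=45, fp=5, fn=10)
mcc_gain = mcc_with_transfer - mcc_without_transfer   # > 0 means transfer helped
```

MCC ranges from -1 to 1 and stays informative on imbalanced data, which makes a difference in MCC a reasonable way to express whether a transfer was beneficial.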
Subjects
Machine Learning, Quantitative Structure-Activity Relationship, Factual Databases
ABSTRACT
The habanero pepper (Capsicum chinense) is a prominent spicy fruit integral to the historical, social, cultural, and economic fabric of the Yucatan peninsula in Mexico. This study leverages the power of 1H NMR spectroscopy coupled with machine learning algorithms to dissect the metabolomic profile of eleven C. chinense cultivars, including those grown by INIFAP (Habanero-Jaguar, Antillano-HRA 1-1, Antillano-HRA 7-1, Habanero-HAm-18A, Habanero-HC-23C, and Jolokia-NJolokia-22) and commercial hybrids (Habanero-Rey Votán, Habanero-Kabal, Balam, USAPR10117, and Rey Pakal). A total of fifty metabolites, encompassing sugars, amino acids, short-chain organic acids, and nucleosides, were identified from the 1H NMR spectra. The optimized machine learning model proficiently predicted the similarity percentage between the INIFAP-grown cultivars and commercial hybrids, thereby facilitating a comprehensive comparison. Biomarkers unique to each cultivar were delineated, revealing that the Habanero-Rey Votán cultivar is characterized by the highest concentration of sugars. In contrast, the Balam cultivar is rich in amino acids and short-chain organic acids, sharing a similar metabolomic profile with the Jolokia-NJolokia-22 cultivar. The findings of this study underscore the efficacy and reliability of NMR-based metabolomics as a robust tool for differentiating C. chinense cultivars based on their intricate chemical profiles. This approach not only contributes to the scientific understanding of the metabolomic diversity among habanero peppers but also holds potential implications for food science, agriculture, and the culinary arts.
Subjects
Capsicum, Capsicum/chemistry, Reproducibility of Results, Capsaicin, Magnetic Resonance Spectroscopy, Fruit/chemistry, Amino Acids/analysis, Sugars/analysis
ABSTRACT
This study aims to use advanced machine learning techniques supported by Principal Component Analysis (PCA) to estimate body weight (BW) in buffalos raised in southeastern Mexico and compare their performance. The first stage of the study consists of taking body measurements and determining the most informative variables using PCA, a dimension reduction method. This step reduces the size of the data and simplifies the structure of the model, providing a faster and more effective learning process. In the second stage, two separate prediction models were developed with the Gradient Boosting and Random Forest algorithms, using the principal components obtained from the data set reduced by PCA. The performances of both models were compared using the R², RMSE and MAE metrics, which showed that the Gradient Boosting model achieved better prediction performance, with a higher R² value and lower error rates than the Random Forest model. In conclusion, PCA-supported modeling applications can provide more reliable results, and the Gradient Boosting algorithm is superior to Random Forest in this context. The study demonstrates the potential use of machine learning approaches in estimating body weight in water buffalos, and will support sustainable animal husbandry by contributing to decision-making processes in the field of animal science.
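The three metrics used to compare the models can be computed directly from observed and predicted weights, as in the sketch below. The body weights are invented for illustration, and the helper name regression_metrics is hypothetical; the same function would be applied to the predictions of each model on the same test animals.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                   # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
    }

# Invented body weights in kg: observed values and one model's predictions.
observed = [400, 450, 500, 550]
predicted = [410, 440, 505, 545]
metrics = regression_metrics(observed, predicted)
```

A model is preferred when its R² is higher and its RMSE and MAE are lower; RMSE penalises large individual errors more heavily than MAE does.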
ABSTRACT
The world's urban population is growing rapidly, and threatening natural ecosystems, especially streams. Urbanization leads to stream alterations, increased peak flow frequencies, and reduced water quality due to pollutants, morphological changes, and biodiversity loss, known as the urban stream syndrome. However, a shift towards recognizing urban streams as valuable natural systems is occurring, emphasizing green infrastructure and nature-based solutions. This study in Uruguay examined water quality in various watersheds with different urbanization levels and socio-environmental characteristics along a precipitation gradient. Using Geographic Information Systems (GIS) and in situ data, we assessed physicochemical parameters, generated territorial variables, and identified key predictors of water quality. We found that urbanization, particularly urban areas, paved areas, and populations without sanitation, significantly influenced water quality parameters. These factors explained over 50% of the variation in water quality indicators. However, the relationship between urbanization and water quality was non-linear, with abrupt declines after specific urban intensity thresholds. Our results illustrate that ensuring sanitation networks and managing green areas effectively are essential for preserving urban stream water quality. This research underscores the importance of interdisciplinary teams and localized data for informed freshwater resource management.
Subjects
Rivers, Urbanization, Uruguay, Ecosystem, Sanitation, Water Quality, Environmental Monitoring
ABSTRACT
Because specialty coffees are lucrative, they are sometimes adulterated: low-cost materials are mixed in to increase the overall volume, resulting in illegal profit. A widely used and recommended approach to detect possible adulteration is the application of one-class classifiers (OCC), which only require information about the target class to build the models. This work therefore aimed to identify the adulteration of specialty coffees with low-quality coffee using multielement analysis by ICP-MS and to evaluate the performance of three one-class classifiers (dd-SIMCA, OCRF, and OCPLS). To this end, authentic specialty coffee samples were adulterated with low-quality coffee in proportions of 25 % to 75 % (w/w). Samples were subjected to acid decomposition for analysis by ICP-MS. The OCPLS method performed best at detecting adulteration of specialty coffees with low-quality coffee, showing higher specificity (SPE = 100 %) and reliability rate (RLR = 94.3 %).
Subjects
Coffee, Coffee/chemistry, Reproducibility of Results, Spectrum Analysis, Mass Spectrometry/methods
ABSTRACT
Gut microbiota has been implicated in various clinical conditions, yet the substantial heterogeneity in gut microbiota research results necessitates a more sophisticated approach than merely identifying statistically different microbial taxa between healthy and unhealthy individuals. Our study seeks to not only select microbial taxa but also explore their synergy with phenotypic host variables to develop novel predictive models for specific clinical conditions. DESIGN: We assessed 50 healthy and 152 unhealthy individuals for phenotypic variables (PV) and gut microbiota (GM) composition by 16S rRNA gene sequencing. The entire modeling process was conducted in the R environment using the Random Forest algorithm. Model performance was assessed through ROC curve construction. RESULTS: We evaluated 52 bacterial taxa and pre-selected PV (p < 0.05) for their contribution to the final models. Across all diseases, the models achieved their best performance when GM and PV data were integrated. Notably, the integrated predictive models demonstrated exceptional performance for rheumatoid arthritis (AUC = 88.03%), type 2 diabetes (AUC = 96.96%), systemic lupus erythematosus (AUC = 98.4%), and type 1 diabetes (AUC = 86.19%). CONCLUSION: Our findings underscore that the selection of bacterial taxa based solely on differences in relative abundance between groups is insufficient to serve as clinical markers. Machine learning techniques are essential for mitigating the considerable variability observed within gut microbiota. In our study, the use of microbial taxa alone exhibited limited predictive power for health outcomes, while the integration of phenotypic variables into predictive models substantially enhanced their predictive capabilities.
What is Already Known on this Subject? While the gut microbiota has been implicated as potential signatures or biomarkers for various clinical conditions, the establishment of causality in humans remains largely elusive. The role of the gut microbiota in maintaining the host organism's proper physiological function is well-established, yet data regarding the composition of the gut microbiota in disease states often suffer from poor reproducibility.
What Are the New Findings? Our study demonstrates that relying solely on differences in the relative abundance of bacterial taxa between groups falls short as a means of identifying clinical markers. We advocate the use of robust statistical tools, such as bootstrapping, to mitigate the substantial variability observed in gut microbiota studies, thereby enhancing the reproducibility of research findings. Our findings underscore the limited predictive power of microbial taxa in isolation for health outcomes. The integration of phenotypic variables into predictive models with gut microbiota significantly augments the ability to predict health outcomes.
How This Study Might Advance Research Despite the growing enthusiasm for using gut microbiota as biomarkers for various clinical conditions, the lack of standardization throughout the research process impedes progress in this field. Our study emphasizes the necessity of rigorously testing predictions of clinical conditions based on gut microbiota using bootstrapping techniques, promoting greater reproducibility in research findings.
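To make the idea of testing predictions with bootstrapping concrete, the sketch below computes an AUC and a percentile bootstrap interval around it. It is an illustration under stated assumptions, not the study's procedure: the study worked in R with Random Forest models, whereas this uses plain Python, and the function names `auc` and `bootstrap_auc_interval`, the percentile method, and the default of 1000 resamples are all hypothetical choices.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes (0 and 1) must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_interval(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    auc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_resamples:
        # Resample individuals with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if len(set(resampled_labels)) < 2:
            continue  # skip resamples that contain only one class
        estimates.append(auc(resampled_labels, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

A wide interval from such a procedure would signal that an apparently high AUC depends heavily on which individuals happened to be sampled, which is the kind of variability the authors argue should be checked before proposing a clinical marker.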