Results 1 - 20 of 194
1.
Am J Epidemiol ; 193(2): 389-403, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-37830395

ABSTRACT

Understanding characteristics of patients with propensity scores in the tails of the propensity score (PS) distribution has relevance for inverse-probability-of-treatment-weighted and PS-based estimation in observational studies. Here we outline a method for identifying variables most responsible for extreme propensity scores. The approach is illustrated in 3 scenarios: 1) a plasmode simulation of adult patients in the National Ambulatory Medical Care Survey (2011-2015) and 2) timing of dexamethasone initiation and 3) timing of remdesivir initiation in patients hospitalized for coronavirus disease 2019 from February 2020 through January 2021. PS models were fitted using relevant baseline covariates, and tails of the PS distribution were defined using asymmetric first and 99th percentiles. After fitting of the PS model in each original data set, values of each key covariate were permuted and model-agnostic variable importance measures were examined. Visualization and variable importance techniques were helpful in identifying variables most responsible for extreme propensity scores and may help identify individual characteristics that might make patients inappropriate for inclusion in a study (e.g., off-label use). Subsetting or restricting the study sample based on variables identified using this approach may help investigators avoid the need for trimming or overlap weights in studies.
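The permutation idea in this abstract can be illustrated with a minimal sketch. It assumes a logistic PS model with made-up coefficients (`coef`, `intercept`) and simulated covariates; the study itself fitted PS models to real data, and the helper names `percentile` and `tail_shift` are hypothetical. The sketch finds patients in the tails (at or below the 1st percentile, at or above the 99th), permutes one covariate, and reports how far the tail patients' scores move.

```python
import math
import random

def propensity(row, coef, intercept):
    """Logistic propensity score for one covariate row."""
    z = intercept + sum(c * x for c, x in zip(coef, row))
    return 1.0 / (1.0 + math.exp(-z))

def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]

def tail_shift(rows, coef, intercept, column, rng):
    """Mean absolute PS change among tail patients after permuting one column."""
    scores = [propensity(r, coef, intercept) for r in rows]
    lo, hi = percentile(scores, 1), percentile(scores, 99)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    shifts = []
    for r, s, v in zip(rows, scores, shuffled):
        if s <= lo or s >= hi:  # patient sits in a tail of the PS distribution
            permuted = list(r)
            permuted[column] = v
            shifts.append(abs(propensity(permuted, coef, intercept) - s))
    return sum(shifts) / len(shifts)

rng = random.Random(0)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
coef, intercept = [2.0, 0.0], -0.5  # hypothetical: only covariate 0 drives the PS
shift_0 = tail_shift(rows, coef, intercept, 0, rng)
shift_1 = tail_shift(rows, coef, intercept, 1, rng)
```

A covariate whose permutation pulls tail scores strongly toward the middle (here covariate 0) is a candidate driver of extreme propensity scores; a covariate the model ignores (covariate 1) produces no shift.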


Subjects
Propensity Score , Humans , Computer Simulation
2.
Biostatistics ; 24(4): 1085-1105, 2023 10 18.
Article in English | MEDLINE | ID: mdl-35861622

ABSTRACT

An endeavor central to precision medicine is predictive biomarker discovery; they define patient subpopulations which stand to benefit most, or least, from a given treatment. The identification of these biomarkers is often the byproduct of the related but fundamentally different task of treatment rule estimation. Using treatment rule estimation methods to identify predictive biomarkers in clinical trials where the number of covariates exceeds the number of participants often results in high false discovery rates. The higher than expected number of false positives translates to wasted resources when conducting follow-up experiments for drug target identification and diagnostic assay development. Patient outcomes are in turn negatively affected. We propose a variable importance parameter for directly assessing the importance of potentially predictive biomarkers and develop a flexible nonparametric inference procedure for this estimand. We prove that our estimator is double robust and asymptotically linear under loose conditions in the data-generating process, permitting valid inference about the importance metric. The statistical guarantees of the method are verified in a thorough simulation study representative of randomized control trials with moderate and high-dimensional covariate vectors. Our procedure is then used to discover predictive biomarkers from among the tumor gene expression data of metastatic renal cell carcinoma patients enrolled in recently completed clinical trials. We find that our approach more readily discerns predictive from nonpredictive biomarkers than procedures whose primary purpose is treatment rule estimation. An open-source software implementation of the methodology, the uniCATE R package, is briefly introduced.


Subjects
Biomedical Research , Renal Cell Carcinoma , Kidney Neoplasms , Humans , Renal Cell Carcinoma/diagnosis , Renal Cell Carcinoma/genetics , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Biomarkers , Computer Simulation
3.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36239380

ABSTRACT

A variable selection framework has been proposed for identifying plant pentatricopeptide repeat (PPR) proteins. It is an effective feature selection strategy that focuses on classification performance. Random forest has been used as the classifier, with certain variables automatically selected for discrimination between PPR functional and non-functional proteins. However, samples regarded as PPR functional proteins are misclassified at a high rate. In this paper, we improve the framework in order to achieve better classification results. Modifications are made to the framework to better identify PPR functional proteins. Instead of random forest, a hybrid ensemble classifier is built with its base classifiers derived from six different classification methods. In addition, an incremental strategy and a clustering by search in descending order are alternatively used for feature selection, which can effectively select the most representative variables for identification of PPR proteins. We also find that different base classifiers alternately play an important role in the ensemble classifier as the feature dimension increases. The experimental results demonstrate the effectiveness of our improvements.


Subjects
Algorithms , Plant Proteins , Plant Proteins/genetics , Cluster Analysis
4.
Int J Legal Med ; 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38622313

ABSTRACT

To date South African forensic anthropologists are only able to successfully apply a metric approach to estimate population affinity when constructing a biological profile from skeletal remains. While a non-metric, or macromorphoscopic approach exists, limited research has been conducted to explore its use in a South African population. This study aimed to explore 17 cranial macromorphoscopic traits to develop improved methodology for the estimation of population affinity among black, white and coloured South Africans and for the method to be compliant with standards of best practice. The trait frequency distributions revealed substantial group variation and overlap, and not a single trait can be considered characteristic of any one population group. Kruskal-Wallis and Dunn's tests demonstrated significant population differences for 13 of the 17 traits. Random forest modelling was used to develop classification models to assess the reliability and accuracy of the traits in identifying population affinity. Overall, the model including all traits obtained a classification accuracy of 79% when assessing population affinity, which is comparable to current craniometric methods. The variable importance indicates that all the traits contributed some information to the model, with the inferior nasal margin, nasal bone contour, and nasal aperture shape ranked the most useful for classification. Thus, this study validates the use of macromorphoscopic traits in a South African sample, and the population-specific data from this study can potentially be incorporated into forensic casework and skeletal analyses in South Africa to improve population affinity estimates.

5.
BMC Med Res Methodol ; 24(1): 34, 2024 Feb 10.
Article in English | MEDLINE | ID: mdl-38341532

ABSTRACT

BACKGROUND: Mendelian randomization is a popular method for causal inference with observational data that uses genetic variants as instrumental variables. Similarly to a randomized trial, a standard Mendelian randomization analysis estimates the population-averaged effect of an exposure on an outcome. Dividing the population into subgroups can reveal effect heterogeneity to inform who would most benefit from intervention on the exposure. However, as covariates are measured post-"randomization", naive stratification typically induces collider bias in stratum-specific estimates. METHOD: We extend a previously proposed stratification method (the "doubly-ranked method") to form strata based on a single covariate, and introduce a data-adaptive random forest method to calculate stratum-specific estimates that are robust to collider bias based on a high-dimensional covariate set. We also propose measures based on the Q statistic to assess heterogeneity between stratum-specific estimates (to understand whether estimates are more variable than expected due to chance alone) and variable importance (to identify the key drivers of effect heterogeneity). RESULT: We show that the effect of body mass index (BMI) on lung function is heterogeneous, depending most strongly on hip circumference and weight. While for most individuals, the predicted effect of increasing BMI on lung function is negative, it is positive for some individuals and strongly negative for others. CONCLUSION: Our data-adaptive approach allows for the exploration of effect heterogeneity in the relationship between an exposure and an outcome within a Mendelian randomization framework. This can yield valuable insights into disease aetiology and help identify specific groups of individuals who would derive the greatest benefit from targeted interventions on the exposure.
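As a rough illustration of heterogeneity testing between stratum-specific estimates, the sketch below computes the standard Cochran's Q statistic. This is an assumption about the general form only: the abstract says the proposed measures are "based on the Q statistic" without giving formulas, and the estimates and standard errors here are hypothetical numbers, not results from the paper.

```python
def cochran_q(estimates, std_errors):
    """Cochran's Q: inverse-variance-weighted squared deviations from the pooled estimate."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return q, pooled

# Two hypothetical strata with estimates 1.0 and 3.0 and unit standard errors.
q_stat, pooled = cochran_q([1.0, 3.0], [1.0, 1.0])

# Identical estimates give no evidence of heterogeneity.
q_null, _ = cochran_q([0.5, 0.5, 0.5], [0.1, 0.2, 0.3])
```

Under homogeneity, Q is compared with a chi-squared distribution on (number of strata minus 1) degrees of freedom; values well above that reference suggest the stratum-specific estimates vary more than chance alone would explain.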


Subjects
Genetic Variation , Mendelian Randomization Analysis , Humans , Mendelian Randomization Analysis/methods , Causality , Bias , Body Mass Index
6.
J Asthma ; 61(3): 203-211, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37725084

ABSTRACT

OBJECTIVE: Previous machine learning approaches fail to consider race and ethnicity and social determinants of health (SDOH) to predict childhood asthma exacerbations. A predictive model for asthma exacerbations in children is developed to explore the importance of race and ethnicity, rural-urban commuting area (RUCA) codes, the Child Opportunity Index (COI), and other ICD-10 SDOH in predicting asthma outcomes. METHODS: Insurance and coverage claims data from the Arkansas All-Payer Claims Database were used to capture risk factors. We identified a cohort of 22,631 children with asthma aged 5-18 years with 2 years of continuous Medicaid enrollment and at least one asthma diagnosis in 2018. The goal was to predict asthma-related hospitalizations and asthma-related emergency department (ED) visits in 2019. The analytic sample was 59% age 5-11 years, 39% White, 33% Black, and 6% Hispanic. Conditional random forest models were used to train the model. RESULTS: The model yielded an area under the curve (AUC) of 72%, sensitivity of 55% and specificity of 78% in the OOB samples and AUC of 73%, sensitivity of 58% and specificity of 77% in the training samples. Consistent with previous literature, asthma-related hospitalization or ED visits in the previous year (2018) were the two most important variables in predicting hospital or ED use in the following year (2019), followed by the total number of reliever and controller medications. CONCLUSIONS: Predictive models for asthma-related exacerbation achieved moderate accuracy, but race and ethnicity, ICD-10 SDOH, RUCA codes, and COI measures were not important in improving model accuracy.


Subjects
Asthma , United States/epidemiology , Child , Humans , Asthma/diagnosis , Asthma/epidemiology , Asthma/drug therapy , Risk Factors , Hospitalization , Arkansas , Hospitals , Hospital Emergency Service
7.
Environ Res ; 252(Pt 4): 119073, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38710428

ABSTRACT

Climate change, namely increased warming coupled with a rise in extreme events (e.g., droughts, storms, heatwaves), is negatively affecting forest ecosystems worldwide. In these ecosystems, growth dynamics and biomass accumulation are driven mainly by environmental constraints, inter-tree competition, and disturbance regimes. Usually, climate-growth relationships are assessed by linear correlation due to the simplicity and straightforwardness of modeling. However, applying this method may bias results, since the ecological and physiological responses of trees to environmental factors are non-linear, and usually bell-shaped. In the Eastern Carpathian, Norway spruce is at the southeasternmost edge of its natural occurrence; this region is thus potentially vulnerable to climate change. A non-linear assessment of climate-growth relationships using machine-learning techniques for Norway spruce in this area had not been conducted prior to this study. To address this knowledge gap, we analyzed a large tree-ring network from 158 stands, with over 3000 trees of varying age distributed along an elevational gradient. Our results showed that non-linearity in the growth-climate response of spruce was season-specific: temperatures from the previous autumn and current growing season, along with water availability during winter, induced a bell-shaped response. Moreover, we found that at low elevations, spruce growth was mainly limited by water availability in the growing season, while winter temperatures are likely to have had a slight influence along the entire elevational gradient. Furthermore, at elevations lower than 1400 m, spruce trees were also found to be sensitive to previous autumn water availability. Overall, our results shed new light on the response of Norway spruce to climate in the Carpathians, which may aid in management decisions.


Subjects
Altitude , Climate Change , Picea , Picea/growth & development , Nonlinear Dynamics , Seasons , Machine Learning , Temperature
8.
Sensors (Basel) ; 24(1)2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38203170

ABSTRACT

Detection of respiratory viruses is vitally important in coping with pandemics such as COVID-19. Conventional methods typically require laboratory-based, high-cost equipment. An emerging alternative is Near-Infrared (NIR) spectroscopy, especially in portable form, which has the benefits of low cost, portability, rapidity, ease of use, and mass deployability in both clinical and field settings. One obstacle to its effective application lies in its common limitations, which include relatively low specificity and general quality. Characteristically, the spectra curves of virus-present and virus-absent samples interweave. This motivates the use of machine learning methods to overcome the difficulty. A subsequent obstacle is that direct deployment of machine learning approaches leads to inadequate modelling accuracy. This paper presents a data-driven study on the detection of two common respiratory viruses, the respiratory syncytial virus (RSV) and the Sendai virus (SEV), using a portable NIR spectrometer supported by a machine learning solution. The solution is enhanced by a variable selection algorithm based on Variable Importance in Projection (VIP) scores and their quantile value, along with variable truncation processing, to overcome these obstacles to a certain extent. We conducted extensive experiments with the aid of the specifically developed variable selection algorithm, using a total of four datasets, and achieved the following classification accuracy: (1) 0.88, 0.94, and 0.93 for RSV, SEV, and RSV + SEV, respectively, averaged over multiple runs, for neural network modelling that took in turn three sessions of data for training and the remaining session as an 'unknown' dataset for testing; (2) an average accuracy of 0.94 (RSV), 0.97 (SEV), and 0.97 (RSV + SEV) for model validation and 0.90 (RSV), 0.93 (SEV), and 0.91 (RSV + SEV) for model testing, using two of the datasets for model training, one for model validation and the other for model testing. These results demonstrate the feasibility of using portable NIR spectroscopy coupled with machine learning to detect respiratory viruses with good accuracy, and the approach could be a viable solution for population screening.


Subjects
COVID-19 , Viruses , Humans , Algorithms , COVID-19/diagnosis , Coping Skills , Machine Learning
9.
Biom J ; 66(1): e2200178, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38072661

ABSTRACT

We recently developed a new method, the random-intercept accelerated failure time model with Bayesian additive regression trees (riAFT-BART), to draw causal inferences about population treatment effect on patient survival from clustered and censored survival data while accounting for the multilevel data structure. The practical utility of this method goes beyond the estimation of population average treatment effect. In this work, we show how riAFT-BART can be used to address two important statistical questions with clustered survival data: estimating treatment effect heterogeneity and variable selection. Leveraging likelihood-based machine learning, we describe a way to draw posterior samples of the individual survival treatment effect from riAFT-BART model runs, and use the drawn posterior samples to perform an exploratory treatment effect heterogeneity analysis to identify subpopulations who may experience treatment effects that differ from the population average effect. There is sparse literature on methods for variable selection among clustered and censored survival data, particularly ones using flexible modeling techniques. We propose a permutation-based approach using the predictor's variable inclusion proportion supplied by the riAFT-BART model for variable selection. To address the missing data issue frequently encountered in health databases, we propose a strategy to combine bootstrap imputation and riAFT-BART for variable selection among incomplete clustered survival data. We conduct an expansive simulation study to examine the practical operating characteristics of our proposed methods, and provide empirical evidence that our proposed methods perform better than several existing methods across a wide range of data scenarios. Finally, we demonstrate the methods via a case study of predictors for in-hospital mortality among severe COVID-19 patients and estimating the heterogeneous treatment effects of three COVID-specific medications. The methods developed in this work are readily available in the R package riAFTBART.


Subjects
Machine Learning , Treatment Effect Heterogeneity , Humans , Bayes Theorem , Likelihood Functions , Computer Simulation
10.
BMC Bioinformatics ; 24(1): 258, 2023 Jun 17.
Article in English | MEDLINE | ID: mdl-37330468

ABSTRACT

Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important to various fields including neuroscience, epidemiology and biomedicine. We propose a new method called Covariance Regression with Random Forests (CovRegRF) to estimate the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Random forest trees are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the performance of the proposed method and significance test through a simulation study which shows that the proposed method provides accurate covariance matrix estimates and that the Type-1 error is well controlled. An application of the proposed method to thyroid disease data is also presented. CovRegRF is implemented in a freely available R package on CRAN.


Subjects
Statistical Models , Random Forest , Child , Humans , Computer Simulation
11.
Biostatistics ; 23(1): 157-172, 2022 01 13.
Article in English | MEDLINE | ID: mdl-32424406

ABSTRACT

Many clinical trials have been conducted to compare right-censored survival outcomes between interventions. Such comparisons are typically made on the basis of the entire group receiving one intervention versus the others. In order to identify subgroups for which the preferential treatment may differ from the overall group, we propose the depth importance in precision medicine (DIPM) method for such data within the precision medicine framework. The approach first modifies the split criteria of the traditional classification tree to fit the precision medicine setting. Then, a random forest of trees is constructed at each node. The forest is used to calculate depth variable importance scores for each candidate split variable. The variable with the highest score is identified as the best variable to split the node. The importance score is a flexible and simply constructed measure that makes use of the observation that more important variables tend to be selected closer to the root nodes of trees. The DIPM method is primarily designed for the analysis of clinical data with two treatment groups. We also present the extension to the case of more than two treatment groups. We use simulation studies to demonstrate the accuracy of our method and provide the results of applications to two real-world data sets. In the case of one data set, the DIPM method outperforms an existing method, and a primary motivation of this article is the ability of the DIPM method to address the shortcomings of this existing method. Altogether, the DIPM method yields promising results that demonstrate its capacity to guide personalized treatment decisions in cases with right-censored survival outcomes.
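The observation that important variables tend to be selected closer to the root can be turned into a score in several ways. The sketch below is one plausible form, not the DIPM formula, which the abstract does not state: each split contributes its split statistic divided by 2 raised to its depth, summed over the forest. The flat `(variable, depth, statistic)` tree representation and all numbers are hypothetical.

```python
from collections import defaultdict

def depth_importance(forest):
    """Sum depth-discounted split statistics per variable across all trees."""
    scores = defaultdict(float)
    for tree in forest:
        for variable, depth, statistic in tree:
            # Splits at the root (depth 0) count fully; deeper splits are halved per level.
            scores[variable] += statistic / (2 ** depth)
    return dict(scores)

# Hypothetical forest: each tree is a list of (variable, depth, split statistic).
forest = [
    [("age", 0, 4.0), ("sex", 1, 4.0)],
    [("age", 1, 2.0)],
]
scores = depth_importance(forest)
best = max(scores, key=scores.get)  # candidate variable for splitting the node
```

With these inputs, "age" scores higher than "sex" even though both have a split with the same statistic, because the "age" split sits at the root.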


Subjects
Precision Medicine , Research Design , Computer Simulation , Humans , Precision Medicine/methods
12.
Stat Med ; 2023 Jan 05.
Article in English | MEDLINE | ID: mdl-36601725

ABSTRACT

The interpretability of machine learning models, even those with excellent prediction performance, remains a challenge in practical applications. This study investigates model interpretability and variable importance for well-performing supervised machine learning models. Building on the commonly accepted concept of the odds ratio (OR), we propose a novel and computationally efficient Variable Importance evaluation framework based on the Personalized Odds Ratio (VIPOR). It is a model-agnostic interpretation method that can be used to evaluate variable importance both locally and globally. Locally, the variable importance is quantified by the personalized odds ratio (POR), which can account for subject heterogeneity in machine learning. Globally, we utilize a hierarchical tree to group the predictors into five groups: completely positive, completely negative, positive dominated, negative dominated, and neutral groups. The relative importance of predictors within each group is ranked based on different statistics of PORs across subjects for different application purposes. For illustration, we apply the proposed VIPOR method to interpreting a multilayer perceptron (MLP) model, which aims to predict the mortality of subarachnoid hemorrhage (SAH) patients using real-world electronic health records (EHR) data. We compare the important variables derived from MLP with other machine learning models, including tree-based models and the L1-regularized logistic regression model. The most important variables are consistently identified by VIPOR across different prediction models. Comparisons with existing interpretation methods are also conducted and discussed based on publicly available data sets.

13.
J Surg Oncol ; 127(6): 966-974, 2023 May.
Article in English | MEDLINE | ID: mdl-36840925

ABSTRACT

BACKGROUND AND OBJECTIVES: The role of time to surgery (TTS) for long-term outcomes in colon cancer (CC) remains ill-defined. We sought to utilize artificial intelligence (AI) to characterize the drivers of TTS and its prognostic impact. METHODS: The National Cancer Database was utilized to identify patients diagnosed with non-metastatic CC between 2004 and 2018. AI models were employed to rank the importance of several sociodemographic, facility, and tumor characteristics in determining TTS and postoperative survival. RESULTS: Among 518 983 patients, 137 902 (26.6%) received intraoperative diagnosis of CC (TTS = 0), while 381 081 (73.4%) underwent elective surgery (TTS > 0) with median TTS of 19.0 days (interquartile range [IQR]: 7.0-33.0). An AI model identified tumor stage, receipt of adequate lymphadenectomy, histologic grade, lymphovascular invasion, and insurance status as the most important variables associated with TTS = 0. Conversely, the type and location of treating facility and receipt of adjuvant therapy were among the most important variables for TTS > 0. Notably, TTS was among the most important variables associated with survival, and TTS > 3 weeks was associated with an incremental increase in mortality risk. CONCLUSIONS: The identification of factors associated with TTS can help stratify patients most likely to suffer poor outcomes due to prolonged TTS, as well as guide quality improvement initiatives related to timely surgical care.


Subjects
Artificial Intelligence , Colonic Neoplasms , Surgical Oncology , Humans , Colonic Neoplasms/surgery , Prognosis , Time Factors
14.
BMC Med Res Methodol ; 23(1): 144, 2023 06 19.
Article in English | MEDLINE | ID: mdl-37337173

ABSTRACT

BACKGROUND: Machine learning tools such as random forests provide important opportunities for modeling large, complex modern data generated in medicine. Unfortunately, when it comes to understanding why machine learning models are predictive, applied research continues to rely on 'out of bag' (OOB) variable importance metrics (VIMPs) that are known to have considerable shortcomings within the statistics community. After explaining the limitations of OOB VIMPs - including bias towards correlated features and limited interpretability - we describe a modern approach called 'knockoff VIMPs' and explain its advantages. METHODS: We first evaluate current VIMP practices through an in-depth literature review of 50 recent random forest manuscripts. Next, we recommend organized and interpretable strategies for analysis with knockoff VIMPs, including computing them for groups of features and considering multiple model performance metrics. To demonstrate methods, we develop a random forest to predict 5-year incident stroke in the Sleep Heart Health Study and compare results based on OOB and knockoff VIMPs. RESULTS: Nearly all papers in the literature review contained substantial limitations in their use of VIMPs. In our demonstration, using OOB VIMPs for individual variables suggested two highly correlated lung function variables (forced expiratory volume, forced vital capacity) as the best predictors of incident stroke, followed by age and height. Using an organized analytic approach that considered knockoff VIMPs of both groups of features and individual features, the largest contributions to model sensitivity were medications (especially cardiovascular) and measured medical risk factors, while the largest contributions to model specificity were age, diastolic blood pressure, self-reported medical risk factors, polysomnography features, and pack-years of smoking. Thus, we reach very different conclusions about stroke risk factors using OOB VIMPs versus knockoff VIMPs. 
CONCLUSIONS: The near-ubiquitous reliance on OOB VIMPs may provide misleading results for researchers who use such methods to guide their research. Given the rapid pace of scientific inquiry using machine learning, it is essential to bring modern knockoff VIMPs that are interpretable and unbiased into widespread applied practice to steer researchers using random forest machine learning toward more meaningful results.


Subjects
Random Forest , Stroke , Humans , Benchmarking , Machine Learning , Stroke/diagnosis , Stroke/epidemiology , Sleep
15.
BMC Med Res Methodol ; 23(1): 209, 2023 09 19.
Article in English | MEDLINE | ID: mdl-37726680

ABSTRACT

Random Forests are a powerful and frequently applied Machine Learning tool. The permutation variable importance (VIMP) has been proposed to improve the explainability of such a pure prediction model. It describes the expected increase in prediction error after randomly permuting a variable and disturbing its association with the outcome. However, VIMPs measure only a variable's marginal influence, which can make their interpretation difficult or even misleading. In the present work we address the general need for improving the explainability of prediction models by exploring VIMPs in the presence of correlated variables. In particular, we propose to use a variable's residual information for investigating whether its permutation importance partially or totally originates from correlated predictors. Hypothesis tests are derived by a resampling algorithm that can further support results by providing test decisions and p-values. In simulation studies we show that the proposed test controls type I error rates. When applying the methods to a Random Forest analysis of post-transplant survival after kidney transplantation, the importance of kidney donor quality for predicting post-transplant survival is shown to be high. However, the transplant allocation policy introduces correlations with other well-known predictors, which raises the concern that the importance of kidney donor quality may simply originate from these predictors. By using the proposed method, this concern is addressed and it is demonstrated that kidney donor quality plays an important role in post-transplant survival, regardless of correlations with other predictors.
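The definition of permutation importance given in this abstract (the expected increase in prediction error after permuting a variable) can be sketched in a few lines. This is a generic illustration, not the residual-based test the paper proposes: the "fitted model" is a hypothetical stand-in function rather than a Random Forest, the data are simulated, and mean squared error is assumed as the error measure.

```python
import random

def mse(model, rows, targets):
    """Mean squared prediction error of a model over a data set."""
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)

def permutation_vimp(model, rows, targets, column, rng, repeats=20):
    """Average increase in MSE after randomly permuting one column."""
    baseline = mse(model, rows, targets)
    total = 0.0
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)  # breaks the column's association with the outcome
        permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
        total += mse(model, permuted, targets) - baseline
    return total / repeats

rng = random.Random(1)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
targets = [3.0 * r[0] for r in rows]

def model(row):
    """Hypothetical fitted model: uses column 0 and ignores column 1."""
    return 3.0 * row[0]

vimp_used = permutation_vimp(model, rows, targets, 0, rng)
vimp_ignored = permutation_vimp(model, rows, targets, 1, rng)
```

Because this score is marginal, a variable that is merely correlated with a true predictor can also receive a high value in a real Random Forest, which is the interpretive problem the abstract addresses.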


Subjects
Kidney Transplantation , Random Forest , Humans , Algorithms , Computer Simulation , Machine Learning
16.
Environ Sci Technol ; 57(37): 14024-14035, 2023 09 19.
Article in English | MEDLINE | ID: mdl-37669088

ABSTRACT

Decision makers in the Columbia River Basin (CRB) are currently challenged with identifying and characterizing the extent of per- and polyfluoroalkyl substances (PFAS) contamination and human exposure to PFAS. This work aims to develop and pilot a methodology to help decision makers target and prioritize sampling investigations and identify contaminated natural resources. Here we use random forest models to predict ∑PFAS in fish tissue; understanding PFAS levels in fish is particularly important in the CRB because fish can be a major component of the diets of tribal and Indigenous peoples. Geospatial data, including land cover and distances to known or potential PFAS sources and industries, were leveraged as predictors for modeling. Models were developed and evaluated for Washington state and Oregon using limited available empirical data. Mapped predictions show several areas where detectable concentrations of PFAS in fish tissue are predicted to occur but have not yet been confirmed by prior sampling. Variable importance is analyzed to identify potentially important sources of PFAS in fish in this region. The cost-effective methodologies demonstrated here can help address sparsity of existing PFAS occurrence data in environmental media in this and other regions while also giving insights into potentially important drivers and sources of PFAS in fish.


Subjects
Fluorocarbons , Random Forest , Animals , Humans , Rivers , Oregon , Fishes
17.
J Med Internet Res ; 25: e37540, 2023 05 08.
Artigo em Inglês | MEDLINE | ID: mdl-37155231

RESUMO

BACKGROUND: Norovirus is associated with approximately 18% of the global burden of gastroenteritis and affects all age groups. There is currently no licensed vaccine or available antiviral treatment. However, well-designed early warning systems and forecasting can guide nonpharmaceutical approaches to norovirus infection prevention and control. OBJECTIVE: This study evaluates the predictive power of existing syndromic surveillance data and emerging data sources, such as internet searches and Wikipedia page views, to predict norovirus activity across a range of age groups across England. METHODS: We used existing syndromic surveillance and emerging syndromic data to predict laboratory data indicating norovirus activity. Two methods were used to evaluate the predictive potential of syndromic variables. First, the Granger causality framework was used to assess whether individual variables precede changes in norovirus laboratory reports in a given region or age group. Then, we used random forest modeling to estimate the importance of each variable in the context of the others with two measures: (1) change in the mean square error and (2) node purity. Finally, these results were combined into a visualization indicating the most influential predictors for norovirus laboratory reports in a specific age group and region. RESULTS: Our results suggest that syndromic surveillance data include valuable predictors for norovirus laboratory reports in England. However, Wikipedia page views are less likely to provide prediction improvements on top of Google Trends and existing syndromic data. Predictors displayed varying relevance across age groups and regions. For example, the random forest modeling based on selected existing and emerging syndromic variables explained 60% of the variance in the ≥65 years age group and 42% in the East of England, but only 13% in the South West region. Emerging data sets highlighted relative search volumes, including "flu symptoms," "norovirus in pregnancy," and norovirus activity in specific years, such as "norovirus 2016." Symptoms of vomiting and gastroenteritis in multiple age groups were identified as important predictors within existing data sources. CONCLUSIONS: Existing and emerging data sources can help predict norovirus activity in England in some age groups and geographic regions, particularly predictors concerning vomiting, gastroenteritis, and norovirus in vulnerable populations, and historical terms such as stomach flu. However, syndromic predictors were less relevant in some age groups and regions, likely because of contrasting public health practices between regions and differing health information-seeking behavior between age groups. Additionally, predictors relevant to one norovirus season may not contribute to other seasons. Data biases, such as low spatial granularity in Google Trends and especially in Wikipedia data, also play a role in the results. Moreover, internet searches can provide insight into mental models, that is, an individual's conceptual understanding of norovirus infection and transmission, which could be used in public health communication strategies.
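
The change-in-mean-square-error importance named in METHODS is commonly obtained by permuting one predictor and measuring how much the prediction error grows. The following is a minimal sketch of that idea, not the study's code: the `predict` function is a hypothetical stand-in for a fitted random forest, and the toy data and all names are assumptions made for illustration.

```python
import random


def mse(y_true, y_pred):
    """Mean square error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_mse_increase(predict, rows, targets, column, seed=0):
    """Increase in MSE after shuffling the values of one predictor column."""
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted_rows = [r[:column] + [v] + r[column + 1:]
                     for r, v in zip(rows, shuffled)]
    permuted = mse(targets, [predict(r) for r in permuted_rows])
    return permuted - baseline


def predict(row):
    """Hypothetical fitted model: it only uses column 0."""
    return 2.0 * row[0]


# Hypothetical toy data: column 0 drives the target, column 1 is unrelated.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]

importance = [permutation_mse_increase(predict, rows, targets, c)
              for c in range(2)]
```

A predictor the model relies on yields a positive increase, while one the model ignores yields zero. Random forest implementations typically run this permutation on out-of-bag samples per tree, which the sketch leaves out.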


Subjects
Caliciviridae Infections, Gastroenteritis, Norovirus, Humans, Infodemiology, England/epidemiology, Gastroenteritis/epidemiology, Caliciviridae Infections/epidemiology
18.
Int J Biometeorol ; 67(3): 423-437, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36719482

ABSTRACT

Leptospirosis is a zoonosis that has been linked to hydrometeorological variability. Hydrometeorological averages and extremes have been used before as drivers in the statistical prediction of disease. However, their importance and predictive capacity remain poorly understood. In this study, a random forest classifier was used to analyze the relative importance of hydrometeorological indices in developing a leptospirosis model and to evaluate the performance of models based on the type of indices used, using case data from three districts in Kelantan, Malaysia, that experience annual monsoonal rainfall and flooding. First, hydrometeorological data including rainfall, streamflow, water level, relative humidity, and temperature were transformed into 164 weekly average and extreme indices in accordance with the Expert Team on Climate Change Detection and Indices (ETCCDI). Then, weekly case occurrences were classified into the binary classes "high" and "low" based on an average threshold. Seventeen models based on "average," "extreme," and "mixed" indices were trained by optimizing the feature subsets based on the model-computed mean decrease Gini (MDG) scores. Variable importance was assessed through cross-correlation analysis and the MDG score. The average and extreme models showed similar prediction accuracy ranges (61.5-76.1% and 72.3-77.0%), while the mixed models showed an improvement (71.7-82.6% prediction accuracy). An extreme model was the most sensitive, while an average model was the most specific. The time lag associated with the driving indices agreed with the seasonality of the monsoon. The rainfall variable (extreme) was the most important in classifying leptospirosis occurrence, while streamflow was the least important despite showing higher correlations with leptospirosis.
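
Two steps in this abstract lend themselves to a small illustration: labeling weeks as "high" or "low" against an average threshold, and the Gini impurity decrease that a random forest accumulates over its splits to produce the MDG score. The sketch below assumes that "high" means strictly above the mean weekly count (the abstract does not say how ties are handled) and uses made-up weekly values; it scores a single candidate split rather than a full forest.

```python
def gini(labels):
    """Gini impurity of a list of "high"/"low" labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_high = sum(1 for label in labels if label == "high") / n
    return 2.0 * p_high * (1.0 - p_high)


def label_weeks(case_counts):
    """Label each week relative to the mean count (assumed rule: > mean)."""
    threshold = sum(case_counts) / len(case_counts)
    return ["high" if c > threshold else "low" for c in case_counts]


def gini_decrease(feature, labels, cut):
    """Impurity decrease from splitting the samples at feature <= cut."""
    left = [lab for x, lab in zip(feature, labels) if x <= cut]
    right = [lab for x, lab in zip(feature, labels) if x > cut]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical weekly values for illustration, not data from the study.
cases = [0, 1, 0, 5, 6, 1, 7, 0]
rainfall = [10.0, 12.0, 8.0, 90.0, 110.0, 15.0, 95.0, 9.0]

labels = label_weeks(cases)
decrease = gini_decrease(rainfall, labels, cut=50.0)
```

In a forest, the decreases from every split that uses a given index are summed and averaged over trees, so an index that repeatedly produces clean splits ends up with a high MDG score.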


Subjects
Automobile Driving, Leptospirosis, Humans, Random Forest, Leptospirosis/epidemiology, Temperature, Seasons
19.
Molecules ; 28(19)2023 Sep 29.
Article in English | MEDLINE | ID: mdl-37836708

ABSTRACT

This study aimed to investigate the volatile components of Schisandra chinensis (Turcz.) Baill. (commonly known as northern Schisandra) of different colors and explore their similarities and differences, to identify the main flavor substances in the volatile components of the branch exudates, and to establish a fingerprint map of the volatile components of the dried fruits and branch exudates. To that end, we used GC-IMS technology to analyze the volatile components of the dried fruits and branch exudates of three different colors of northern Schisandra and established fingerprint spectra. A total of 60 different volatile chemical components were identified in the branch exudates and dried fruits. The components of germplasm resources with different fruit colors were significantly different. The ion mobility spectrum and OPLS-DA results showed that white and yellow fruits were more similar to each other than to red fruits. The levels of volatile components in dried fruits were significantly higher than those in branch exudates. After VIP (variable importance in projection) screening, 41 key volatile substances in dried fruits and 30 key volatile substances in branch exudates were obtained. After screening by odor activity value (OAV), 24 volatile components had an OAV greater than 1 in both dried fruits and branch exudates. The most important contributing volatile substance was 3-methyl-butanal, and in white fruit it was (E)-2-hexenal.
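
The OAV screening mentioned in this abstract rests on a simple ratio: a compound's concentration divided by its odor threshold, with values above 1 taken to indicate a contribution to aroma. The sketch below shows that arithmetic under stated assumptions: the compound names, concentrations, and thresholds are placeholders invented for illustration, not values from the study.

```python
def odor_activity_value(concentration, odor_threshold):
    """OAV = concentration / odor threshold (both in the same unit)."""
    if odor_threshold <= 0:
        raise ValueError("odor threshold must be positive")
    return concentration / odor_threshold


# Hypothetical (concentration, odor threshold) pairs in the same unit.
compounds = {
    "compound_a": (12.0, 0.5),
    "compound_b": (0.3, 1.5),
    "compound_c": (4.0, 4.0),
}

oav = {name: odor_activity_value(c, t) for name, (c, t) in compounds.items()}

# Keep only compounds whose OAV is strictly greater than 1.
key_odorants = sorted(name for name, value in oav.items() if value > 1.0)
```

A compound sitting exactly at its threshold (OAV of 1) is excluded here because the abstract speaks of values greater than 1.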


Subjects
Lignans, Schisandra, Schisandra/chemistry, Fruit/chemistry, Lignans/chemistry, Plant Extracts/chemistry
20.
Molecules ; 28(14)2023 Jul 11.
Article in English | MEDLINE | ID: mdl-37513200

ABSTRACT

Zangju (Citrus reticulata cv. Manau Gan) is the main citrus cultivar in Derong County, China, with unique aroma and flavour characteristics, but the use of Zangju peel (CRZP) is limited by a lack of research on it. In this study, electronic nose, headspace-gas chromatography-ion mobility spectrometry (HS-GC-IMS), and partial least squares-discriminant analysis (PLS-DA) methods were used to rapidly and comprehensively evaluate the volatile compounds of dried CRZP and to analyse the role of dynamic changes at different maturation stages. The results showed that seventy-eight volatile compounds, mainly aldehydes (25.27%) and monoterpenes (55.88%), were found in the samples at four maturity stages. The contents of alcohols and aldehydes that produce unripe fruit aromas are relatively high in the immature stage (October to November), while the contents of monoterpenoids, ketones and esters associated with ripe fruit aromas are relatively high in the full ripening stage (January to February). The PLS-DA model results showed that the samples collected at different maturity stages could be effectively discriminated. The VIP method identified 12 key volatile compounds that could be used as flavour markers for CRZP samples collected at different maturity stages. Specifically, the relative volatile organic compound (VOC) content of CRZP harvested in October is the highest. This study provides a basis for a comprehensive understanding of the flavour characteristics of CRZP in the ripening process, supports the application of CRZP as a byproduct in industrial production (food, cosmetics, flavour and fragrance), and offers a reference for similar research on other C. reticulata varieties.
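
The VIP (variable importance in projection) score used here with PLS-DA is usually computed from the PLS weight vectors and the response variance each component explains. The sketch below implements that standard formula under stated assumptions: the weights and explained sums of squares are invented inputs, and a real analysis would take them from a fitted PLS-DA model.

```python
import math


def vip_scores(weights, explained_ss):
    """VIP per variable.

    weights[a][j]  : PLS weight of variable j on component a
    explained_ss[a]: response sum of squares explained by component a
    """
    n_vars = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, explained_ss):
            norm_sq = sum(w * w for w in w_a)
            # Squared normalized weight, scaled by what the component explains.
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total_ss))
    return scores


# Hypothetical two-component model with three variables.
weights = [[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]
explained_ss = [3.0, 1.0]

vip = vip_scores(weights, explained_ss)
```

By construction the squared VIP scores average to 1 across variables, which is why a cutoff of 1 is a common convention for flagging key variables; the abstract does not state which cutoff the authors applied.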


Subjects
Citrus, Volatile Organic Compounds, Electronic Nose, Citrus/chemistry, Gas Chromatography-Mass Spectrometry/methods, Alcohols/analysis, Aldehydes/analysis, Flavoring Agents/analysis, Monoterpenes/analysis, Volatile Organic Compounds/analysis