1.
Stat Med ; 43(20): 3921-3942, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-38951867

ABSTRACT

For survival analysis applications we propose a novel procedure for identifying subgroups with large treatment effects, with focus on subgroups where treatment is potentially detrimental. The approach, termed forest search, is relatively simple and flexible. All possible subgroups are screened and selected based on hazard ratio thresholds indicative of harm, with assessment according to the standard Cox model. By reversing the role of treatment one can seek to identify substantial benefit. We apply a splitting consistency criterion to identify a subgroup considered "maximally consistent with harm." The type-1 error and power for subgroup identification can be quickly approximated by numerical integration. To aid inference we describe a bootstrap bias-corrected Cox model estimator with variance estimated by a jackknife approximation. We provide a detailed evaluation of operating characteristics in simulations and compare to virtual twins and generalized random forests where we find the proposal to have favorable performance. In particular, in our simulation setting, we find the proposed approach favorably controls the type-1 error for falsely identifying heterogeneity with higher power and classification accuracy for substantial heterogeneous effects. Two real data applications are provided for publicly available datasets from a clinical trial in oncology, and HIV.


Subjects
Computer Simulation; HIV Infections; Proportional Hazards Models; Humans; Survival Analysis
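The abstract above mentions a variance estimated by a jackknife approximation. The following is a minimal sketch of the generic leave-one-out jackknife variance for an arbitrary statistic. It is not the authors' bias-corrected Cox estimator; the function name `jackknife_variance` and the sample values are hypothetical.

```python
import statistics
from typing import Callable, Sequence


def jackknife_variance(data: Sequence[float],
                       statistic: Callable[[Sequence[float]], float]) -> float:
    """Leave-one-out jackknife estimate of the variance of `statistic`."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the statistic n times, each time leaving one observation out.
    leave_one_out = [statistic(list(data[:i]) + list(data[i + 1:]))
                     for i in range(n)]
    centre = statistics.mean(leave_one_out)
    # Jackknife variance: (n - 1) / n times the sum of squared deviations.
    return (n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out)


# Hypothetical illustrative sample (not data from the paper).
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]
var_of_mean = jackknife_variance(sample, statistics.mean)
```

For the sample mean, this estimate coincides with the usual s²/n, which makes it a convenient sanity check before applying the same routine to a more complex statistic.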
2.
Stat Appl Genet Mol Biol ; 22(1), 2023 Jan 01.
Article in English | MEDLINE | ID: mdl-37991399

ABSTRACT

The ongoing development of high-throughput technologies is allowing the simultaneous monitoring of the expression levels for hundreds or thousands of biological inputs with the proliferation of what has been coined as omic data sources. One relevant issue when analyzing such data sources is concerned with the detection of differential expression across two experimental conditions, clinical status or two classes of a biological outcome. While a great deal of univariate data analysis approaches have been developed to address the issue, strategies for assessing interaction patterns of differential expression are scarce in the literature and have been limited to ad hoc solutions. This paper contributes to the problem by exploiting the facilities of an ensemble learning algorithm like random forests to propose a measure that assesses the differential expression explained by the interaction of the omic variables so subtle biological patterns may be uncovered as a result. The out of bag error rate, which is an estimate of the predictive accuracy of a random forests classifier, is used as a by-product to propose a new measure that assesses interaction patterns of differential expression. Its performance is studied in synthetic scenarios and it is also applied to real studies on SARS-CoV-2 and colon cancer data where it uncovers associations that remain undetected by other methods. Our proposal is aimed at providing a novel approach that may help the experts in biomedical and life sciences to unravel insightful interaction patterns that may decipher the molecular mechanisms underlying biological and clinical outcomes.


Subjects
Algorithms; Colonic Neoplasms; Humans; Colonic Neoplasms/genetics; Machine Learning
3.
BMC Infect Dis ; 24(Suppl 2): 334, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38509486

ABSTRACT

BACKGROUND: Dengue fever is a well-studied vector-borne disease in tropical and subtropical areas of the world. Several methods for predicting the occurrence of dengue fever in Taiwan have been proposed. However, to the best of our knowledge, no study has investigated the relationship between air quality indices (AQIs) and dengue fever in Taiwan. RESULTS: This study aimed to develop a dengue fever prediction model in which meteorological factors, a vector index, and AQIs were incorporated into different machine learning algorithms. A total of 805 meteorological records from 2013 to 2015 were collected from government open-source data after preprocessing. In addition to well-known dengue-related factors, we investigated the effects of novel variables, including particulate matter with an aerodynamic diameter < 10 µm (PM10), PM2.5, and an ultraviolet index, for predicting dengue fever occurrence. The collected dataset was randomly divided into an 80% training set and a 20% test set. The experimental results showed that the random forests achieved an area under the receiver operating characteristic curve of 0.9547 for the test set, which was the best compared with the other machine learning algorithms. In addition, the temperature was the most important factor in our variable importance analysis, and it showed a positive effect on dengue fever at < 30 °C but had less of an effect at > 30 °C. The AQIs were not as important as temperature, but one was selected in the process of filtering the variables and showed a certain influence on the final results. CONCLUSIONS: Our study is the first to demonstrate that AQI negatively affects dengue fever occurrence in Taiwan. The proposed prediction model can be used as an early warning system for public health to prevent dengue fever outbreaks.


Subjects
Dengue; Random Forest; Humans; Dengue/epidemiology; Taiwan/epidemiology; Temperature; Disease Outbreaks
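The dengue abstract describes a random 80%/20% split of 805 records and reports a test-set area under the ROC curve. The sketch below shows one way such a split and a rank-based AUC could be computed in plain Python. The helper names, the seed, and the labels and scores are hypothetical; only the record count of 805 comes from the abstract.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(rows: Sequence, test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle rows reproducibly and split them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = dengue occurrence) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)

# Split the indices of 805 records into 80% training and 20% test.
train, test = train_test_split(range(805))
```

The pairwise definition is quadratic in the number of records, which is acceptable at this scale; a sort-based version would be preferable for large test sets.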
4.
Environ Sci Technol ; 58(25): 10920-10931, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38861590

ABSTRACT

Distinguishing the effects of different fine particulate matter components (PMCs) is crucial for mitigating their effects on human health. However, the sparse distribution of locations where PM is collected for component analysis makes it challenging to investigate the relevant health effects. This study aimed to investigate the agreement between data-fusion-enhanced exposure assessment and site monitoring data in estimating the effects of PMCs on gestational diabetes mellitus (GDM). We first improved the spatial resolution and accuracy of exposure assessment for five major PMCs (EC, OM, NO₃⁻, NH₄⁺, and SO₄²⁻) in the Pearl River Delta region by a data fusion model that combined inputs from multiple sources using a random forest model (10-fold cross-validation R²: 0.52 to 0.61; root mean square error: 0.55 to 2.26 µg/m³). Next, we compared the associations between exposures to PMCs during pregnancy and GDM in a hospital-based cohort of 1148 pregnant women in Heshan, China, using both site monitoring data and data-fusion model estimates. The comparative analysis showed that the data-fusion-based exposure generated stronger estimates of identifying statistical disparities. This study suggests that data-fusion-enhanced estimates can improve exposure assessment and potentially mitigate the misclassification of population exposure arising from the utilization of site monitoring data.


Subjects
Particulate Matter; Particulate Matter/analysis; Humans; China; Female; Rivers/chemistry; Pregnancy; Air Pollutants/analysis; Environmental Monitoring/methods; Epidemiologic Studies; Environmental Exposure; Diabetes, Gestational/epidemiology
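The exposure-model abstract reports 10-fold cross-validated R² and root mean square error. The sketch below illustrates how those two figures can be computed from out-of-fold predictions. A simple least-squares line stands in for the random forest so the example stays self-contained; all function names and the data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Assign indices 0..n-1 to k folds in round-robin order."""
    return [list(range(fold, n, k)) for fold in range(k)]


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x (stand-in for the forest)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def cross_validated_metrics(x: Sequence[float], y: Sequence[float],
                            k: int = 10) -> Tuple[float, float]:
    """Return (R^2, RMSE) computed on out-of-fold predictions."""
    preds = [0.0] * len(y)
    for fold in k_fold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = a + b * x[i]  # predict only for held-out points
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


# Hypothetical, exactly linear data, so held-out predictions are near perfect.
x = [float(i) for i in range(30)]
y = [2.0 + 0.5 * xi for xi in x]
r2, rmse = cross_validated_metrics(x, y)
```

Because every prediction is made for a point the model did not see during fitting, both metrics describe out-of-sample performance rather than goodness of fit on the training data.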
5.
Article in English | MEDLINE | ID: mdl-38842593

ABSTRACT

PURPOSE: To investigate the xenobiotic profiles of patients with neovascular age-related macular degeneration (nAMD) undergoing anti-vascular endothelial growth factor (anti-VEGF) intravitreal therapy (IVT) to identify biomarkers indicative of clinical phenotypes through advanced AI methodologies. METHODS: In this cross-sectional observational study, we analyzed 156 peripheral blood xenobiotic features in a cohort of 46 nAMD patients stratified by choroidal neovascularization (CNV) control under anti-VEGF IVT. We employed Liquid Chromatography-Tandem Mass Spectrometry (LC-MS/MS) for measurement and leveraged an AI-driven iterative Random Forests (iRF) approach for robust pattern recognition and feature selection, aligning molecular profiles with clinical phenotypes. RESULTS: AI-augmented iRF models effectively refined the metabolite spectrum by discarding non-predictive elements. Perfluorooctanesulfonate (PFOS) and Ethyl β-glucopyranoside were identified as significant biomarkers through this process, associated with various clinically relevant phenotypes. Unlike single metabolite classes, drug metabolites were distinctly correlated with subretinal fluid presence. CONCLUSIONS: This study underscores the enhanced capability of AI, particularly iRF, in dissecting complex metabolomic data to elucidate the xenobiotic landscape of nAMD and environmental impact on the disease. The preliminary biomarkers discovered offer promising directions for personalized treatment strategies, although further validation in broader cohorts is essential for clinical application.

6.
Article in English | MEDLINE | ID: mdl-38948964

ABSTRACT

BACKGROUND: Identifying language disorders earlier can help children receive the support needed to improve developmental outcomes and quality of life. Despite the prevalence and impacts of persistent language disorder, there are surprisingly no robust predictor tools available. This makes it difficult for researchers to recruit young children into early intervention trials, which in turn impedes advances in providing effective early interventions to children who need it. AIMS: To validate externally a predictor set of six variables previously identified to be predictive of language at 11 years of age, using data from the Longitudinal Study of Australian Children (LSAC) birth cohort. Also, to examine whether additional LSAC variables arose as predictive of language outcome. METHODS & PROCEDURES: A total of 5107 children were recruited to LSAC with developmental measures collected from 0 to 3 years. At 11-12 years, children completed the Clinical Evaluation of Language Fundamentals, 4th Edition, Recalling Sentences subtest. We used SuperLearner to estimate the accuracy of six previously identified parent-reported variables from ages 2-3 years in predicting low language (sentence recall score ≥ 1.5 SD below the mean) at 11-12 years. Random forests were used to identify any additional variables predictive of language outcome. OUTCOMES & RESULTS: Complete data were available for 523 participants (52.20% girls), 27 (5.16%) of whom had a low language score. The six predictors yielded fair accuracy: 78% sensitivity (95% confidence interval (CI) = [58, 91]) and 71% specificity (95% CI = [67, 75]). These predictors relate to sentence complexity, vocabulary and behaviour. The random forests analysis identified similar predictors. CONCLUSIONS & IMPLICATIONS: We identified an ultra-short set of variables that predicts 11-12-year language outcome with 'fair' accuracy. 
In one of few replication studies of this scale in the field, these methods have now been conducted across two population-based cohorts, with consistent results. An imminent practical implication of these findings is using these predictors to aid recruitment into early language intervention studies. Future research can continue to refine the accuracy of early predictors to work towards earlier identification in a clinical context. WHAT THIS PAPER ADDS: What is already known on the subject There are no robust predictor sets of child language disorder despite its prevalence and far-reaching impacts. A previous study identified six variables collected at age 2-3 years that predicted 11-12-year language with 75% sensitivity and 81% specificity, which warranted replication in a separate cohort. What this study adds to the existing knowledge We used machine learning methods to identify a set of six questions asked at age 2-3 years with ≥ 71% sensitivity and specificity for predicting low language outcome at 11-12 years, now showing consistent results across two large-scale population-based cohort studies. What are the potential or clinical implications of this work? This predictor set is more accurate than existing feasible methods and can be translated into a low-resource and time-efficient recruitment tool for early language intervention studies, leading to improved clinical service provision for young children likely to have persisting language difficulties.
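The language-outcome abstract reports sensitivity and specificity with 95% confidence intervals. As a rough sketch of how an interval for such a proportion can be obtained, the code below implements the Wilson score interval. The abstract does not say which interval method was used, so the result need not match the reported [58, 91]. The count of 21 true positives is an assumption inferred from 78% of 27, not a figure given in the abstract.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("invalid counts")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Assumed count: 21 of the 27 children with low language correctly flagged.
# Only the percentage (78%) is reported, so 21 is an inference.
sens_low, sens_high = wilson_interval(21, 27)
```

With only 27 positive cases the interval is wide, which is consistent with the broad sensitivity interval quoted in the abstract.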

7.
Sensors (Basel) ; 24(15), 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39123855

ABSTRACT

The detection performance of radar is significantly impaired by active jamming and mutual interference from other radars. This paper proposes a radio signal modulation recognition method to accurately recognize these signals, which helps in the jamming cancellation decisions. Based on the ensemble learning stacking algorithm improved by meta-feature enhancement, the proposed method adopts random forests, K-nearest neighbors, and Gaussian naive Bayes as the base-learners, with logistic regression serving as the meta-learner. It takes the multi-domain features of signals as input, which include time-domain features including fuzzy entropy, slope entropy, and Hjorth parameters; frequency-domain features, including spectral entropy; and fractal-domain features, including fractal dimension. The simulation experiment, including seven common signal types of radar and active jamming, was performed for the effectiveness validation and performance evaluation. Results proved the proposed method's performance superiority to other classification methods, as well as its ability to meet the requirements of low signal-to-noise ratio and few-shot learning.
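Among the time-domain features listed in the radar abstract are the Hjorth parameters. The sketch below computes the three standard Hjorth parameters (activity, mobility, complexity) using first differences as a discrete derivative. It illustrates the textbook definitions only, not the authors' feature pipeline; the function names and the test tone are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _variance(x: Sequence[float]) -> float:
    """Population variance."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def _diff(x: Sequence[float]) -> List[float]:
    """First difference, a discrete stand-in for the time derivative."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth_parameters(signal: Sequence[float]) -> Tuple[float, float, float]:
    """Return (activity, mobility, complexity) of a sampled signal."""
    d1 = _diff(signal)
    d2 = _diff(d1)
    activity = _variance(signal)                      # signal power
    mobility = math.sqrt(_variance(d1) / activity)    # mean-frequency proxy
    # Complexity is the mobility of the derivative divided by the mobility
    # of the signal itself.
    complexity = math.sqrt(_variance(d2) / _variance(d1)) / mobility
    return activity, mobility, complexity


# Hypothetical test signal: a 50 Hz sinusoid sampled at 1 kHz for one second.
fs, f0 = 1000, 50
tone = [math.sin(2 * math.pi * f0 * n / fs) for n in range(fs)]
activity, mobility, complexity = hjorth_parameters(tone)
```

For a pure sinusoid the complexity is close to 1, so larger values indicate a waveform that departs from a single tone, which is what makes the parameter useful as a modulation feature.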

8.
BMC Bioinformatics ; 24(1): 258, 2023 Jun 17.
Article in English | MEDLINE | ID: mdl-37330468

ABSTRACT

Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important to various fields including neuroscience, epidemiology and biomedicine. We propose a new method called Covariance Regression with Random Forests (CovRegRF) to estimate the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Random forest trees are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the performance of the proposed method and significance test through a simulation study which shows that the proposed method provides accurate covariance matrix estimates and that the Type-1 error is well controlled. An application of the proposed method to thyroid disease data is also presented. CovRegRF is implemented in a freely available R package on CRAN.


Subjects
Models, Statistical; Random Forest; Child; Humans; Computer Simulation
9.
Appl Environ Microbiol ; 89(2): e0116722, 2023 Feb 28.
Article in English | MEDLINE | ID: mdl-36651726

ABSTRACT

Contamination of food animal products by Escherichia coli is a leading cause of foodborne disease outbreaks, hospitalizations, and deaths in humans. Chicken is the most consumed meat both in the United States and across the globe according to the U.S. Department of Agriculture. Although E. coli is a ubiquitous commensal bacterium of the guts of humans and animals, its ability to acquire antimicrobial resistance (AMR) genes and virulence factors (VFs) can lead to the emergence of pathogenic strains that are resistant to critically important antibiotics. Thus, it is important to identify the genetic factors that contribute to the virulence and AMR of E. coli. In this study, we performed in-depth genomic evaluation of AMR genes and VFs of E. coli genomes available through the National Antimicrobial Resistance Monitoring System GenomeTrackr database. Our objective was to determine the genetic relatedness of chicken production isolates and human clinical isolates. To achieve this aim, we first developed a massively parallel analytical pipeline (Reads2Resistome) to accurately characterize the resistome of each E. coli genome, including the AMR genes and VFs harbored. We used random forests and hierarchical clustering to show that AMR genes and VFs are sufficient to classify isolates into different pathogenic phylogroups and host origin. We found that the presence of key type III secretion system and AMR genes differentiated human clinical isolates from chicken production isolates. These results further improve our understanding of the interconnected role AMR genes and VFs play in shaping the evolution of pathogenic E. coli strains. IMPORTANCE Pathogenic Escherichia coli causes disease in both humans and food-producing animals. E. coli pathogenesis is dependent on a repertoire of virulence factors and antimicrobial resistance genes. Food-borne outbreaks are highly associated with the consumption of undercooked and contaminated food products. 
This association highlights the need to understand the genetic factors that make E. coli virulent and pathogenic in humans and poultry. This research shows that E. coli isolates originating from human clinical settings and chicken production harbor different antimicrobial resistance genes and virulence factors that can be used to classify them into phylogroups and host origins. In addition, to aid in the repeatability and reproducibility of the results presented in this study, we have made a public repository of the Reads2Resistome pipeline and have provided the accession numbers associated with the E. coli genomes analyzed.


Subjects
Anti-Infective Agents; Escherichia coli Infections; Animals; Humans; Escherichia coli; Virulence Factors/genetics; Anti-Bacterial Agents/pharmacology; Chickens/microbiology; Reproducibility of Results; Drug Resistance, Bacterial/genetics; Escherichia coli Infections/veterinary; Escherichia coli Infections/microbiology
10.
Am J Obstet Gynecol ; 229(3): 327.e1-327.e16, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37315754

ABSTRACT

BACKGROUND: Previous predictive models using logistic regression for stillbirth do not leverage the advanced and nuanced techniques involved in sophisticated machine learning methods, such as modeling nonlinear relationships between outcomes. OBJECTIVE: This study aimed to create and refine machine learning models for predicting stillbirth using data available before viability (22-24 weeks) and throughout pregnancy, as well as demographic, medical, and prenatal visit data, including ultrasound and fetal genetics. STUDY DESIGN: This is a secondary analysis of the Stillbirth Collaborative Research Network, which included data from pregnancies resulting in stillborn and live-born infants delivered at 59 hospitals in 5 diverse regions across the United States from 2006 to 2009. The primary aim was the creation of a model for predicting stillbirth using data available before viability. Secondary aims included refining models with variables available throughout pregnancy and determining variable importance. RESULTS: Among 3000 live births and 982 stillbirths, 101 variables of interest were identified. Of the models incorporating data available before viability, the random forests model had 85.1% accuracy (area under the curve) and high sensitivity (88.6%), specificity (85.3%), positive predictive value (85.3%), and negative predictive value (84.8%). A random forests model using data collected throughout pregnancy resulted in accuracy of 85.0%; this model had 92.2% sensitivity, 77.9% specificity, 84.7% positive predictive value, and 88.3% negative predictive value. Important variables in the previability model included previous stillbirth, minority race, gestational age at the earliest prenatal visit and ultrasound, and second-trimester serum screening. 
CONCLUSION: Applying advanced machine learning techniques to a comprehensive database of stillbirths and live births with unique and clinically relevant variables resulted in an algorithm that could accurately identify 85% of pregnancies that would result in stillbirth, before they reached viability. Once validated in representative databases reflective of the US birthing population and then prospectively, these models may provide effective risk stratification and clinical decision-making support to better identify and monitor those at risk of stillbirth.


Subjects
Prenatal Care; Stillbirth; Pregnancy; Infant; Female; Humans; Stillbirth/epidemiology; Gestational Age; Pregnancy Trimester, Second; Machine Learning; Risk Factors
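The stillbirth abstract summarizes its models with sensitivity, specificity, positive predictive value, and negative predictive value. The sketch below shows how these quantities follow from the four cells of a confusion matrix. The function name and the counts are hypothetical and are not taken from the study.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary-classification summaries from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for illustration; they are not taken from the study.
metrics = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
```

Sensitivity and specificity are properties of the classifier, whereas the predictive values also depend on the proportion of positives in the evaluated sample, so they can change when a model is applied to a population with a different outcome rate.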
11.
Stat Med ; 42(10): 1542-1564, 2023 May 10.
Article in English | MEDLINE | ID: mdl-36815690

ABSTRACT

Linkage between drug claims data and clinical outcome allows a data-driven experimental approach to drug repurposing. We develop an estimation procedure based on generalized random forests for estimation of time-point specific average treatment effects in a time-to-event setting with competing risks. To handle right-censoring, we propose a two-step procedure for estimation, applying inverse probability weighting to construct time-point specific weighted outcomes as input for the generalized random forest. The generalized random forests adaptively handle covariate effects on the treatment assignment by applying a splitting rule that targets a causal parameter. Using simulated data we demonstrate that the method is effective for a causal search through a list of treatments to be ranked according to the magnitude of their effect on clinical outcome. We illustrate the method using the Danish national health registries where it is of interest to discover drugs with an unexpected protective effect against relapse of severe depression.


Subjects
Random Forest; Humans; Probability
12.
Environ Sci Technol ; 57(46): 18139-18150, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37595051

ABSTRACT

A growing body of literature suggests that developmental exposure to individual or mixtures of environmental chemicals (ECs) is associated with autism spectrum disorder (ASD). However, investigating the effect of interactions among these ECs can be challenging. We introduced a combination of the classical exposure-mixture Weighted Quantile Sum (WQS) regression and a machine-learning method termed Signed iterative Random Forest (SiRF) to discover synergistic interactions between ECs that are (1) associated with higher odds of ASD diagnosis, (2) mimic toxicological interactions, and (3) are present only in a subset of the sample whose chemical concentrations are higher than certain thresholds. In a case-control Childhood Autism Risks from Genetics and Environment (CHARGE) study, we evaluated multiordered synergistic interactions among 62 ECs measured in the urine samples of 479 children in association with increased odds for ASD diagnosis (yes vs no). WQS-SiRF identified two synergistic two-ordered interactions between (1) trace-element cadmium (Cd) and the organophosphate pesticide metabolite diethyl-phosphate (DEP); and (2) 2,4,6-trichlorophenol (TCP-246) and DEP. Both interactions were suggestively associated with increased odds of ASD diagnosis in the subset of children with urinary concentrations of Cd, DEP, and TCP-246 above the 75th percentile. This study demonstrates a novel method that combines the inferential power of WQS and the predictive accuracy of machine-learning algorithms to discover potentially biologically relevant chemical-chemical interactions associated with ASD.


Subjects
Autism Spectrum Disorder; Pesticides; Trace Elements; Child; Humans; Phenols; Cadmium
13.
J Appl Microbiol ; 134(11), 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37930836

ABSTRACT

BACKGROUND: Pseudomonas aeruginosa is a significant clinical pathogen that poses a substantial threat due to its extensive drug resistance. The rapid and precise identification of this resistance is crucial for effective clinical treatment. Although matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has been used for antibiotic susceptibility differentiation of some bacteria in recent years, the genetic diversity of P. aeruginosa complicates population analysis. Rapid identification of antimicrobial resistance (AMR) in P. aeruginosa based on a large amount of MALDI-TOF-MS data has not yet been reported. In this study, we employed publicly available datasets for P. aeruginosa, which contain data on bacterial resistance and MALDI-TOF-MS spectra. We introduced a deep neural network model, synergized with a strategic sampling approach (SMOTEENN) to construct a predictive framework for AMR of three widely used antibiotics. RESULTS: The framework achieved area under the curve values of 90%, 85%, and 77% for Tobramycin, Cefepime, and Meropenem, respectively, surpassing conventional classifiers. Notably, random forest algorithm was used to assess the significance of features and post-hoc analysis was conducted on the top 10 features using Cohen's d. This analysis revealed moderate effect sizes (d = 0.5-0.8) in Tobramycin and Cefepime models. Finally, putative AMR biomarkers were identified in this study. CONCLUSIONS: This work presented an AMR prediction tool specifically designed for P. aeruginosa, which offers a hopeful pathway for clinical decision-making.


Subjects
Pseudomonas aeruginosa; Tobramycin; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods; Pseudomonas aeruginosa/genetics; Cefepime/pharmacology; Time Factors; Tobramycin/pharmacology
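The P. aeruginosa abstract applies Cohen's d to the top-ranked features and reports moderate effect sizes (d = 0.5-0.8). The sketch below computes Cohen's d with the pooled sample standard deviation. The function name and the intensity values are hypothetical and serve only to illustrate the formula.

```python
import math
import statistics
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    return ((statistics.mean(group_a) - statistics.mean(group_b))
            / math.sqrt(pooled_var))


# Hypothetical peak intensities for resistant vs susceptible isolates.
resistant = [5.0, 6.0, 7.0, 8.0, 9.0]
susceptible = [4.0, 5.0, 6.0, 7.0, 8.0]
d = cohens_d(resistant, susceptible)
```

Here the group means differ by one unit and the pooled standard deviation is about 1.58, so d falls inside the 0.5-0.8 band that the abstract describes as moderate.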
14.
Phytopathology ; 113(8): 1483-1493, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36880796

ABSTRACT

Constructing models that accurately predict Fusarium head blight (FHB) epidemics and are also amenable to large-scale deployment is a challenging task. In the United States, the emphasis has been on simple logistic regression (LR) models, which are easy to implement but may suffer from lower accuracies when compared with more complicated, harder-to-deploy (over large geographies) model frameworks such as functional or boosted regressions. This article examined the plausibility of random forests (RFs) for the binary prediction of FHB epidemics as a possible mediation between model simplicity and complexity without sacrificing accuracy. A minimalist set of predictors was also desirable rather than having the RF model use all 90 candidate variables as predictors. The input predictor set was filtered with the aid of three RF variable selection algorithms (Boruta, varSelRF, and VSURF), using resampling techniques to quantify the variability and stability of selected variable sets. Post-selection filtering produced 58 competitive RF models with no more than 14 predictors each. One variable representing temperature stability in the 20 days before anthesis was the most frequently selected predictor. This was a departure from the prominence of relative humidity-based variables previously reported in LR models for FHB. The RF models had overall superior predictive performance over the LR models and may be suitable candidates for use by the Fusarium Head Blight Prediction Center.

15.
Proc Natl Acad Sci U S A ; 117(32): 19061-19071, 2020 Aug 11.
Article in English | MEDLINE | ID: mdl-32719123

ABSTRACT

Given the powerful implications of relationship quality for health and well-being, a central mission of relationship science is explaining why some romantic relationships thrive more than others. This large-scale project used machine learning (i.e., Random Forests) to 1) quantify the extent to which relationship quality is predictable and 2) identify which constructs reliably predict relationship quality. Across 43 dyadic longitudinal datasets from 29 laboratories, the top relationship-specific predictors of relationship quality were perceived-partner commitment, appreciation, sexual satisfaction, perceived-partner satisfaction, and conflict. The top individual-difference predictors were life satisfaction, negative affect, depression, attachment avoidance, and attachment anxiety. Overall, relationship-specific variables predicted up to 45% of variance at baseline, and up to 18% of variance at the end of each study. Individual differences also performed well (21% and 12%, respectively). Actor-reported variables (i.e., own relationship-specific and individual-difference variables) predicted two to four times more variance than partner-reported variables (i.e., the partner's ratings on those variables). Importantly, individual differences and partner reports had no predictive effects beyond actor-reported relationship-specific variables alone. These findings imply that the sum of all individual differences and partner experiences exert their influence on relationship quality via a person's own relationship-specific experiences, and effects due to moderation by individual differences and moderation by partner-reports may be quite small. Finally, relationship-quality change (i.e., increases or decreases in relationship quality over the course of a study) was largely unpredictable from any combination of self-report variables. This collective effort should guide future models of relationships.


Subjects
Interpersonal Relations; Machine Learning; Family Characteristics; Female; Humans; Longitudinal Studies; Male; Self Report
16.
BMC Med Inform Decis Mak ; 23(1): 110, 2023 Jun 16.
Article in English | MEDLINE | ID: mdl-37328784

ABSTRACT

OBJECTIVE: Precision medicine requires reliable identification of variation in patient-level outcomes with different available treatments, often termed treatment effect heterogeneity. We aimed to evaluate the comparative utility of individualized treatment selection strategies based on predicted individual-level treatment effects from a causal forest machine learning algorithm and a penalized regression model. METHODS: Cohort study characterizing individual-level glucose-lowering response (6 month reduction in HbA1c) in people with type 2 diabetes initiating SGLT2-inhibitor or DPP4-inhibitor therapy. Model development set comprised 1,428 participants in the CANTATA-D and CANTATA-D2 randomised clinical trials of SGLT2-inhibitors versus DPP4-inhibitors. For external validation, calibration of observed versus predicted differences in HbA1c in patient strata defined by size of predicted HbA1c benefit was evaluated in 18,741 patients in UK primary care (Clinical Practice Research Datalink). RESULTS: Heterogeneity in treatment effects was detected in clinical trial participants with both approaches (proportion predicted to have a benefit on SGLT2-inhibitor therapy over DPP4-inhibitor therapy: causal forest: 98.6%; penalized regression: 81.7%). In validation, calibration was good with penalized regression but sub-optimal with causal forest. A stratum with an HbA1c benefit > 10 mmol/mol with SGLT2-inhibitors (3.7% of patients, observed benefit 11.0 mmol/mol [95%CI 8.0-14.0]) was identified using penalized regression but not causal forest, and a much larger stratum with an HbA1c benefit of 5-10 mmol/mol with SGLT2-inhibitors was identified with penalized regression (regression: 20.9% of patients, observed benefit 7.8 mmol/mol [95%CI 6.7-8.9]; causal forest: 11.6%, observed benefit 8.7 mmol/mol [95%CI 7.4-10.1]). 
CONCLUSIONS: Consistent with recent results for outcome prediction with clinical data, when evaluating treatment effect heterogeneity researchers should not rely on causal forest or other similar machine learning algorithms alone, and must compare outputs with standard regression, which in this evaluation was superior.


Subjects
Diabetes Mellitus, Type 2; Dipeptidyl-Peptidase IV Inhibitors; Sodium-Glucose Transporter 2 Inhibitors; Humans; Diabetes Mellitus, Type 2/drug therapy; Glycated Hemoglobin; Cohort Studies; Precision Medicine; Dipeptidyl Peptidase 4/therapeutic use; Sodium-Glucose Transporter 2/therapeutic use; Hypoglycemic Agents/therapeutic use; Dipeptidyl-Peptidase IV Inhibitors/therapeutic use; Sodium-Glucose Transporter 2 Inhibitors/therapeutic use; Treatment Outcome
17.
J Res Adolesc ; 33(3): 870-889, 2023 09.
Article in English | MEDLINE | ID: mdl-36938634

ABSTRACT

As 20% of adolescents develop emotion regulation difficulties, it is important to identify early predictors of these difficulties. Using the machine learning algorithm SEM-forests, we ranked the importance of 87 candidate variables assessed at age 13 in predicting quadratic latent trajectory models of emotion regulation development from age 14 to 18. Participants were 497 Dutch families. Results indicated that the most important predictors were individual differences (e.g., in personality), aspects of relationship quality and conflict behaviors with parents and peers, and internalizing and externalizing problems. Relatively less important were demographics, bullying, delinquency, substance use, and specific parenting practices, although negative parenting practices ranked higher than positive ones. We discuss implications for theory and interventions, and present an open source risk assessment tool, ERRATA.


Subjects
Adolescent Behavior , Emotional Regulation , Humans , Adolescent , Parenting/psychology , Adolescent Development , Parents , Adolescent Behavior/psychology
18.
J Environ Manage ; 330: 117114, 2023 Mar 15.
Article in English | MEDLINE | ID: mdl-36586368

ABSTRACT

Forest carbon stocks and sinks (CSSs) have been widely estimated using climate classification tables and linear regression (LR) models with common independent variables (IVs) such as the average diameter at breast height (DBH) of stems and the root-to-shoot ratio. However, this approach is relatively ineffective when the explanatory power of the IVs is lower than that of unobservable variables. Various environmental and anthropogenic factors affect the target variables, making the correlations between them chaotic. Here, we designed a knife set (KS) approach combining LR models and the wandering through random forests (WTF) algorithm and applied it to the specific case of Phyllostachys edulis (Carrière) J. Houz. (P. edulis) forests, which have an irregular relationship between their belowground carbon (BGC) stocks and average DBH. We then validated the KS approach, run on a computing cluster, for estimating the aboveground carbon (AGC) and BGC stocks and the total net primary production (TNPP). The estimated CSSs were compared, via 10-fold cross-validation, against a benchmark based on the Tier 1 methodology of the Intergovernmental Panel on Climate Change (IPCC) Guidelines for National Greenhouse Gas Inventories, and the KS approach significantly increased the precision and accuracy of the estimates. Our approach provides general insights into accurately estimating forest CSSs from evidence-based field data, even if some target variables are divergent in specific forest types. We also suggest that current sophisticated models based on machine learning (ML) or deep learning algorithms may fail to predict the target variables of certain chaotic systems because the total explanatory power of the observable variables is less than that of the unobservable variables. Quantifying unobservable variables as observable variables is a linchpin for future work on chaotic system estimation.


Subjects
Carbon Sequestration , Carbon , Climate Change
19.
Environ Monit Assess ; 195(8): 923, 2023 Jul 06.
Article in English | MEDLINE | ID: mdl-37410180

ABSTRACT

Anthropogenic eutrophication is a global environmental problem threatening the ecological functions of many inland freshwaters and diminishing their abilities to meet their designated uses. Water authorities worldwide are being pressed to improve their abilities to monitor, predict, and manage the incidence of harmful algal blooms (HABs). While most water quality management decisions are still based on conventional monitoring programs that lack the needed spatio-temporal resolution for effective lake/reservoir management, recent advances in remote sensing are providing new opportunities towards better understanding water quality variability in these important freshwater systems. This study assessed the potential of using the Sentinel 2 Multispectral Instrument (MSI) to predict and assess the spatio-temporal variability in the water quality of the Qaraoun Reservoir, a poorly monitored Mediterranean hypereutrophic monomictic reservoir that is subject to extensive periods of HABs. The work first evaluated the ability to transfer and recalibrate previously developed reservoir-specific Landsat 7 and 8 water quality models when used with Sentinel 2 data. The results showed poor transferability between Landsat and Sentinel 2, with most models experiencing a significant drop in their predictive skill even after recalibration. Sentinel 2 models were then developed for the reservoir based on 153 water quality samples collected over 2 years. The models explored different functional forms, including multiple linear regressions (MLR), multivariate adaptive regression splines (MARS), random forests (RF), and support vector regressions (SVR). The results showed that the RF models outperformed their MLR, MARS, and SVR counterparts with regard to predicting chlorophyll-a, total suspended solids (TSS), Secchi disk depth (SDD), and phycocyanin. The coefficient of determination (R²) for the RF models ranged from 85% for TSS to 95% for SDD. Moreover, the study explored the potential of quantifying cyanotoxin concentrations indirectly from the Sentinel 2 MSI imagery by benefiting from the strong relationship between cyanotoxin levels and chlorophyll-a concentrations.
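For readers less familiar with the metric used to compare the models, the coefficient of determination can be computed as in the minimal sketch below. The function name is hypothetical, and this is the standard textbook definition, not necessarily the exact variant (for example, cross-validated or adjusted) reported in the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the observed mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

A value of 0.85, reported as 85%, means that the model's squared error is 15% of the squared error of a baseline that always predicts the observed mean.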


Subjects
Environmental Monitoring , Water Quality , Environmental Monitoring/methods , Chlorophyll/analysis , Chlorophyll A/analysis , Lakes , Eutrophication , Harmful Algal Bloom , Cyanobacteria Toxins
20.
Entropy (Basel) ; 25(7)2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37509958

ABSTRACT

Rampant terrorism poses a serious threat to the national security of many countries worldwide, particularly due to separatism and extreme nationalism. This paper focuses on the development and application of a temporal self-exciting point process model to the terror data of three countries: the US, Turkey, and the Philippines. To account for occurrences with the same time-stamp, this paper introduces the order mark and reward term in parameter selection. The reward term considers the triggering effect between events that share a time-stamp but differ in order. Additionally, this paper provides comparisons between the self-exciting models generated by day-based and month-based arrival times. Another highlight of this paper is the development of a model to predict the number of terror events using a combination of simulation and machine learning, specifically the random forest method, to achieve better predictions. This research offers an insightful approach to discovering terror event patterns and forecasting future occurrences of terror events, which may have practical application in national security strategies.
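As background on what "self-exciting" means here, the sketch below evaluates the conditional intensity of a temporal self-exciting (Hawkes-type) process. The exponential triggering kernel and the parameter names mu, alpha, and beta are assumptions chosen for illustration; the abstract does not state which kernel the paper uses, and the order mark and reward term for same-time-stamp events are not modelled.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity at time t, assuming an exponential kernel.

    mu:    baseline event rate
    alpha: jump in intensity contributed by each past event
    beta:  decay rate of that contribution
    """
    # Only events strictly before t contribute, so events that share a
    # time-stamp do not trigger one another in this simplified sketch.
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    return mu + excitation
```

Each past event raises the rate of future events by alpha, and that extra rate decays at rate beta; with no past events the intensity is simply the baseline mu.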
