ABSTRACT
OBJECTIVE: This study aims to predict the number of undiagnosed HIV cases at the ZIP Code level in Atlanta, Georgia, based on publicly available information. STUDY DESIGN: Statistical modeling. METHODS: We fitted a Bayesian hierarchical binomial model to county-level estimates from the passive surveillance system. The denominator was the true total number of HIV cases, assumed to arise from a Negative Binomial distribution. The trial probability, known as the ascertainment probability, depended on socio-economic determinants of HIV retained via feature-selection algorithms. Data were obtained from the CDC's Ending the HIV Epidemic report and the American Community Survey. The prediction model was assessed out of sample in Georgia counties. We combined socio-economic data with the posterior predictive distribution of the coefficients to predict the mean ascertainment probability and total HIV cases at the ZIP Code level. These estimates were spatially smoothed and aggregated to the county level for secondary validation. RESULTS: The county-level model showed good mixing properties and predictive accuracy. The mean ascertainment probability calibrated to the ZIP Code level varied from 78.4% (95% credible interval [CI]: 24.4%-99.3%) to 93.8% (95% CI: 80.6%-99.8%). Further, the predicted number of undiagnosed HIV cases ranged from 12 (95% CI: 6-19; ZIP Code 30322) to 1603 (95% CI: 1209-1968; ZIP Code 30318). CONCLUSIONS: Our findings provide a more complete picture of the relative burden of HIV across ZIP Codes. Local health departments can use this information to identify underserved areas and allocate resources accordingly. Furthermore, our methodological approach can complement the information obtained from passive surveillance, especially when more resource-intensive approaches are unavailable or unfeasible to employ.
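As a hedged illustration of the generative structure described above (not the authors' implementation), the following sketch simulates diagnosed counts from a Negative Binomial total and a covariate-driven ascertainment probability; the covariates, coefficients, and dispersion values are hypothetical stand-ins for the CDC/ACS features:

```python
import numpy as np

rng = np.random.default_rng(0)
n_areas, n_features = 25, 3  # hypothetical counties and covariates

# Standardized socio-economic covariates (stand-ins for ACS features)
X = rng.standard_normal((n_areas, n_features))
beta = np.array([0.4, -0.3, 0.2])  # hypothetical coefficients

# True total HIV cases per area: Negative Binomial with mean mu, dispersion k
mu, k = 300.0, 5.0
N_true = rng.negative_binomial(k, k / (k + mu), size=n_areas)

# Ascertainment probability depends on covariates through a logit link
p_ascertain = 1.0 / (1.0 + np.exp(-(1.5 + X @ beta)))

# Diagnosed cases reported by passive surveillance: Binomial(N_true, p)
diagnosed = rng.binomial(N_true, p_ascertain)
undiagnosed = N_true - diagnosed
print(undiagnosed[:5], p_ascertain[:5].round(3))
```

In the actual model, the coefficients carry priors and are inferred, with the posterior predictive distribution then calibrated to ZIP Code-level covariates.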
ABSTRACT
Introduction: Sepsis is a complex clinical syndrome characterized by a heterogeneous host immune response. Historically, static protein and transcriptomic metrics have been employed to describe the underlying biology. Here, we tested the hypothesis that ex vivo functional TNF expression, as well as an immunologic endotype based on both IFNγ and TNF expression, could be used to model clinical outcomes in sepsis patients. Methods: This prospective, observational study of patient samples collected from the SPIES consortium included patients at five health systems enrolled over 17 months: 46 healthy control patients, 68 critically ill non-septic (CINS) ICU patients, and 107 ICU patients with sepsis. Whole blood was collected on days 1, 4, and 7 of ICU admission. Outcomes included in-hospital and 180-day mortality and non-favorable discharge disposition, defined as discharge to a skilled nursing facility, long-term acute care facility, or hospice. Whole blood ELISpot assays were conducted to quantify TNF expression [stimulated by lipopolysaccharide (LPS)] and IFNγ expression (stimulated by anti-CD3/CD28 mAb), which were then used for assignment to one of four endotypes: 'immunocompetent', 'immunosuppressed', and two 'mixed' endotypes. Results: Whole blood TNF spot-forming units were significantly increased in septic and CINS patients on days 4 and 7 compared to healthy subjects. In contrast, TNF expression per cell on days 1, 4, and 7 was significantly lower in both septic and CINS patients compared to healthy subjects. Early increases in total TNF expression were associated with favorable discharge disposition and lower in-hospital mortality. 'Immunocompetent' endotype patients on day 1 had a higher proportion of favorable to non-favorable discharges compared to the 'immunosuppressed' endotype. Similarly, 'immunocompetent' endotype patients on day 4 had higher in-hospital survival compared to 'immunosuppressed' endotype patients. Finally, among septic patients, decreased total TNF and IFNγ expression was associated with 180-day mortality. Conclusions: Increased ex vivo whole blood TNF expression is associated with improved clinical outcomes. Further, the early 'immunocompetent' endotype is associated with favorable discharge and improved in-hospital and 180-day survival. The ability to functionally stratify septic patients based on blood cell function ex vivo may allow for identification of future immune-modulating therapies.
Subjects
Interferon-gamma, Sepsis, Tumor Necrosis Factor-alpha, Humans, Sepsis/immunology, Sepsis/mortality, Sepsis/blood, Male, Female, Middle Aged, Aged, Interferon-gamma/blood, Interferon-gamma/metabolism, Prospective Studies, Tumor Necrosis Factor-alpha/blood, Intensive Care Units, Adult, Hospital Mortality, Biomarkers/blood
ABSTRACT
PURPOSE: The application of 3D printing technology in drug delivery is often limited by the challenge of achieving precise control over drug release profiles. The goal of this study was to apply surface equations to construct 3D-printed tablet models, adjust the functional parameters to obtain multiple tablet models, and correlate the model parameters with in vitro drug release behavior. METHODS: This study reports the development of 3D-printed tablets whose surface geometries, controlled by mathematical functions, modulate drug release. Using fused deposition modeling (FDM) coupled with hot-melt extrusion (HME) technology, personalized drug delivery systems were produced from thermoplastic polymers. Different tablet shapes (T1-T5) were produced by varying the depth of the parabolic surface (b = 4, 2, 0, -2, -4 mm) to assess the impact of surface curvature on drug dissolution. RESULTS: The T5 formulation, with the greatest surface curvature, demonstrated the fastest drug release, achieving complete release within 4 h. In contrast, T1 and T2 tablets exhibited slower release over approximately 6 h. The correlation between surface area and drug release rate was confirmed, supporting the predictions of the Noyes-Whitney equation. Differential scanning calorimetry (DSC) and scanning electron microscopy (SEM) analyses verified the uniform dispersion of acetaminophen and the consistency of the internal structures, respectively. CONCLUSIONS: Precise control of tablet surface geometry effectively tailored drug release profiles, enhancing patient compliance and treatment efficacy. This approach offers significant advancements in personalized medicine by providing a highly reproducible and adaptable platform for optimizing drug delivery.
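For reference, a common statement of the Noyes-Whitney relation invoked above, in which the dissolution rate is proportional to the exposed surface area (symbols follow standard usage rather than the paper's own notation):

```latex
\frac{dm}{dt} = \frac{D A}{h}\,(C_s - C)
```

where m is the dissolved mass, D the diffusion coefficient, A the tablet surface area, h the diffusion layer thickness, C_s the saturation solubility, and C the bulk drug concentration. This makes explicit why the higher-curvature T5 geometry, with its larger area A, releases drug fastest.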
ABSTRACT
BACKGROUND: Random forests have become popular for clinical risk prediction modeling. In a case study on predicting ovarian malignancy, we observed training AUCs close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behavior of random forests for probability estimation by (1) visualizing the data space in three real-world case studies and (2) a simulation study. METHODS: For the case studies, multinomial risk estimates were visualized using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data-generating mechanisms (DGMs), varying the predictor distribution, the number of predictors, the correlation between predictors, the true AUC, and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 with binary outcomes were simulated, and random forest models were trained with minimum node size 2 or 20 using the ranger R package, resulting in 192 scenarios in total. Model performance was evaluated on large test datasets (N = 100,000). RESULTS: The visualizations suggested that the model learned "spikes of probability" around events in the training set. A cluster of events created a bigger peak or plateau (signal), while isolated events created local peaks (noise). In the simulation study, median training AUCs were between 0.97 and 1 unless there were 4 binary predictors or 16 binary predictors with a minimum node size of 20. The median discrimination loss, i.e., the difference between the median test AUC and the true AUC, was 0.025 (range 0.00 to 0.13). Median training AUCs had Spearman correlations of around 0.70 with discrimination loss. Median test AUCs were higher with higher events per variable, higher minimum node size, and binary predictors. Median training calibration slopes were always above 1 and were not correlated with median test slopes across scenarios (Spearman correlation −0.11). Median test slopes were higher with higher true AUC, higher minimum node size, and higher sample size. CONCLUSIONS: Random forests learn local probability peaks that often yield near-perfect training AUCs without strongly affecting AUCs on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.
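The qualitative effect described (near-perfect training AUC from deeply grown trees, with much smaller impact on test AUC) can be reproduced in a few lines; this sketch uses scikit-learn rather than ranger, with min_samples_leaf as the rough analog of minimum node size, and an illustrative data-generating mechanism rather than the paper's 48 DGMs:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def simulate(n, n_pred=8):
    # logistic DGM: 4 true predictors, 4 noise predictors
    X = rng.standard_normal((n, n_pred))
    logit = X[:, :4] @ np.array([0.8, 0.6, 0.4, 0.2])
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return X, y

X_train, y_train = simulate(200)
X_test, y_test = simulate(100_000)  # large test set, as in the study

for leaf in (1, 20):  # fully grown trees vs. larger terminal nodes
    rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=leaf,
                                random_state=0).fit(X_train, y_train)
    auc_train = roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1])
    auc_test = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
    print(f"min_samples_leaf={leaf}: train AUC={auc_train:.3f}, "
          f"test AUC={auc_test:.3f}")
```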
ABSTRACT
Rock excavation is essentially an unloading behavior, and rock mechanical properties under unloading differ significantly from those under loading conditions. To address current deficiencies in predicting the peak strength of rocks under unloading conditions, this study proposes a hybrid learning model for the intelligent prediction of rock unloading strength from simple parameters measured in rock unloading tests. The XGBoost technique was used to construct a base model, and the PSO-XGBoost hybrid model was developed by employing particle swarm optimization (PSO) to refine the XGBoost parameters for better prediction. To verify the validity and accuracy of the proposed hybrid model, 134 rock sample sets covering common rock types in rock excavation were collected from international and Chinese publications for modeling, and the rock unloading strength predictions were compared with those obtained by the Random Forest (RF) model, the Support Vector Machine (SVM) model, the standalone XGBoost model, and the Grid Search Method-based XGBoost (GS-XGBoost) model. Meanwhile, five statistical indicators, including the coefficient of determination (R2), mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), and root mean square error (RMSE), were calculated to check the acceptability of these models from a quantitative perspective. The comparison revealed that the proposed PSO-XGBoost hybrid model outperforms the others in predicting rock unloading strength. Finally, the importance of each input feature's effect on the generalization performance of the hybrid model was assessed. The insights garnered from this research offer a substantial reference for tunnel excavation design and other representative projects.
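A minimal sketch of the PSO-XGBoost idea, assuming synthetic data in place of the 134 collected rock sample sets; the swarm hyperparameters, search bounds, and the two tuned XGBoost parameters (learning rate and tree depth) are illustrative choices, not the paper's configuration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# Synthetic stand-in for unloading-test features and peak strength
X, y = make_regression(n_samples=134, n_features=6, noise=10.0, random_state=0)

def fitness(params):
    lr, depth = params
    model = XGBRegressor(n_estimators=200, learning_rate=float(lr),
                         max_depth=int(round(depth)), verbosity=0)
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    return rmse  # PSO minimizes cross-validated RMSE

rng = np.random.default_rng(42)
n_particles, n_iter = 10, 20
lo, hi = np.array([0.01, 2.0]), np.array([0.3, 10.0])
pos = rng.uniform(lo, hi, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([fitness(p) for p in pos])
    better = vals < pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    gbest = pbest[pbest_val.argmin()].copy()

print("best (learning_rate, max_depth):", gbest.round(3))
```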
ABSTRACT
Background: Diarrheal disease, characterized by high morbidity and mortality rates, continues to be a serious public health concern, especially in developing nations such as Ethiopia. The significant burden it imposes on these countries underscores the importance of identifying predictors of diarrhea. The use of machine learning techniques to identify significant predictors of diarrhea in children under the age of 5 in Ethiopia's Amhara Region is not well documented. Therefore, this study aimed to clarify these issues. Methods: This study's data were extracted from the Ethiopian Population and Health Survey. We applied machine learning classifier models, including random forests, logistic regression, K-nearest neighbors, decision trees, support vector machines, gradient boosting, and naive Bayes, to predict the determinants of diarrhea in children under the age of 5 in Ethiopia. Finally, SHapley Additive exPlanations (SHAP) value analysis was performed to explain the models' predictions. Result: Among the seven models used, the random forest algorithm showed the highest accuracy in predicting diarrheal disease, with an accuracy rate of 81.03% and an area under the curve of 86.50%. The following factors showed significant associations with diarrheal disease: families with the richest wealth status (log odds of -0.04), children without a history of acute respiratory infections (ARIs) (log odds of -0.08), mothers who did not have a job (log odds of -0.04), children aged between 23 and 36 months (log odds of -0.03), mothers with higher education (log odds of -0.03), urban dwellers (log odds of -0.01), families using electricity as cooking fuel (log odds of -0.12), children who did not show signs of wasting, and children who had not taken medications for intestinal parasites. Conclusion: We recommend implementing programs to reduce the incidence of diarrhea in children under the age of 5 in the Amhara region. These programs should focus on removing socioeconomic barriers that impede mothers' access to wealth, a favorable work environment, cooking fuel, education, and healthcare for their children.
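A hedged sketch of the modeling-plus-SHAP pattern described above, assuming the shap package is available and using synthetic stand-ins for the survey predictors (the actual survey features and preprocessing are not reproduced):

```python
import shap  # SHapley Additive exPlanations
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for survey predictors of childhood diarrhea
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, rf.predict(X_te)))
print("AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))

# SHAP values attribute each prediction to the individual predictors
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_te)
# shap.summary_plot(shap_values, X_te)  # global importance ranking
```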
Subjects
Diarrhea, Machine Learning, Socioeconomic Factors, Humans, Ethiopia/epidemiology, Diarrhea/epidemiology, Child, Preschool, Infant, Female, Male, Health Surveys, Logistic Models, Risk Factors, Infant, Newborn, Adult
ABSTRACT
Despite improvements in forensic DNA quantification methods that allow for the early detection of low-template/challenged DNA samples, complicating stochastic effects are not revealed until the final stage of the DNA analysis workflow. An assay providing genotyping information at the earlier quantification stage would allow examiners to make critical adjustments prior to STR amplification, allowing potentially exclusionary information to be reported immediately. Specifically, qPCR instruments often have dissociation curve and/or high-resolution melt curve (HRM) capabilities; this, coupled with statistical prediction analysis, could provide additional information regarding the STR genotypes present. Thus, this study aimed to evaluate Qiagen's principal component analysis (PCA)-based ScreenClust® HRM® software and a linear discriminant analysis (LDA)-based technique for their abilities to accurately predict genotypes and similar groups of genotypes from HRM data. Melt curves from single-source samples were generated from STR D5S818 and D18S51 amplicons using a Rotor-Gene® Q qPCR instrument and EvaGreen® intercalating dye. When used to predict D5S818 genotypes for unknown samples, LDA analysis outperformed the PCA-based method whether predictions were for individual genotypes (58.92% accuracy) or for geno-groups (81.00% accuracy). However, when a locus with increased heterogeneity was tested (D18S51), PCA-based prediction accuracy rates improved to rates similar to those obtained using LDA (45.10% and 63.46%, respectively). This study provides foundational data documenting the performance of prediction modeling for STR genotyping based on qPCR-HRM data. To expand the forensic applicability of this HRM assay, the method could be tested with a more commonly utilized qPCR platform.
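A schematic of the PCA-versus-LDA comparison, with synthetic melt-curve features standing in for real Rotor-Gene HRM data; LinearDiscriminantAnalysis plays the role of the LDA-based technique, and PCA followed by a simple classifier approximates the ScreenClust-style approach:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Rows are samples, columns are melt-curve features (e.g., fluorescence
# at successive temperatures); classes represent genotypes or geno-groups
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

lda = LinearDiscriminantAnalysis()
pca_clf = make_pipeline(PCA(n_components=5), LogisticRegression(max_iter=1000))

print("LDA accuracy: %.3f" % cross_val_score(lda, X, y, cv=5).mean())
print("PCA-based accuracy: %.3f" % cross_val_score(pca_clf, X, y, cv=5).mean())
```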
Subjects
DNA Fingerprinting, Genotype, Microsatellite Repeats, Principal Component Analysis, Real-Time Polymerase Chain Reaction, Humans, DNA Fingerprinting/methods, Discriminant Analysis, Real-Time Polymerase Chain Reaction/methods, Software
ABSTRACT
BACKGROUND: Identifying individuals with depressive symptomatology (DS) promptly and effectively is of paramount importance for providing timely treatment. Machine learning models have shown promise in this area; however, studies often fall short in demonstrating the practical benefits of using these models and fail to provide tangible real-world applications. OBJECTIVE: This study aims to establish a novel methodology for identifying individuals likely to exhibit DS, identify the most influential features in a more explainable way via probabilistic measures, and propose tools that can be used in real-world applications. METHODS: The study used 3 data sets: PROACTIVE, the Brazilian National Health Survey (Pesquisa Nacional de Saúde [PNS]) 2013, and PNS 2019, comprising sociodemographic and health-related features. A Bayesian network was used for feature selection. Selected features were then used to train machine learning models to predict DS, operationalized as a score of ≥10 on the 9-item Patient Health Questionnaire. The study also analyzed the impact of varying sensitivity rates on the reduction of screening interviews compared to a random approach. RESULTS: The methodology allows the users to make an informed trade-off among sensitivity, specificity, and a reduction in the number of interviews. At the thresholds of 0.444, 0.412, and 0.472, determined by maximizing the Youden index, the models achieved sensitivities of 0.717, 0.741, and 0.718, and specificities of 0.644, 0.737, and 0.766 for PROACTIVE, PNS 2013, and PNS 2019, respectively. The area under the receiver operating characteristic curve was 0.736, 0.801, and 0.809 for these 3 data sets, respectively. For the PROACTIVE data set, the most influential features identified were postural balance, shortness of breath, and how old people feel they are. In the PNS 2013 data set, the features were the ability to do usual activities, chest pain, sleep problems, and chronic back problems. The PNS 2019 data set shared 3 of the most influential features with the PNS 2013 data set. However, the difference was the replacement of chronic back problems with verbal abuse. It is important to note that the features contained in the PNS data sets differ from those found in the PROACTIVE data set. An empirical analysis demonstrated that using the proposed model led to a potential reduction in screening interviews of up to 52% while maintaining a sensitivity of 0.80. CONCLUSIONS: This study developed a novel methodology for identifying individuals with DS, demonstrating the utility of using Bayesian networks to identify the most significant features. Moreover, this approach has the potential to substantially reduce the number of screening interviews while maintaining high sensitivity, thereby facilitating improved early identification and intervention strategies for individuals experiencing DS.
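The Youden-index thresholding used above can be sketched directly from ROC coordinates; the risk scores below are toy values, not outputs of the study's models:

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_score):
    """Cutoff maximizing the Youden index J = sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    best = np.argmax(tpr - fpr)
    return thresholds[best], tpr[best], 1 - fpr[best]

# Toy example: y_true flags PHQ-9 >= 10, y_score is a model's predicted risk
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
y_score = np.clip(0.3 * y_true + 0.7 * rng.random(500), 0, 1)
thr, sens, spec = youden_threshold(y_true, y_score)
print(f"threshold={thr:.3f}, sensitivity={sens:.3f}, specificity={spec:.3f}")
```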
Subjects
Algorithms, Bayes Theorem, Depression, Humans, Depression/diagnosis, Adult, Female, Male, Brazil/epidemiology, Middle Aged, Machine Learning, Mass Screening/methods, Sensitivity and Specificity, Health Surveys
ABSTRACT
Although guidelines exist for identifying mixtures, these measures often occur at the end-point of analysis and are protracted. To facilitate early mixture detection, we integrated a high-resolution melt (HRM) mixture screening assay into the qPCR step of the forensic workflow, producing the integrated Quantifiler™ Trio-HRM assay. The assay, when coupled with a prediction tool, allowed for 75.0% accurate identification of the contributor status of a sample (single source vs. mixture). To elucidate the limitations of the developed qPCR-HRM assay, developmental validation studies were conducted assessing reproducibility and performance on samples with varying DNA ratios, numbers of contributors, and quality. From this work, it was determined that the integrated Quantifiler™ Trio-HRM assay is capable of accurately identifying mixtures with up to five contributors and mixtures at ratios up to 1:100. Further, the optimal performance concentration range was found to be between 0.025 and 0.5 ng/µL. With these results, evidentiary-like DNA samples were then analyzed, resulting in 100.0% of the mixture samples being accurately identified; furthermore, every sample predicted to be single source was in fact single source, giving confidence to any single-source calls. Overall, the integrated Quantifiler™ Trio-HRM assay exhibited an enhanced ability to discern mixture samples from single-source samples at the qPCR stage under commonly observed conditions, regardless of the contributor's sex.
Subjects
Forensic Genetics, Humans, Forensic Genetics/methods, Real-Time Polymerase Chain Reaction/methods, Real-Time Polymerase Chain Reaction/standards, DNA/genetics, DNA Fingerprinting/methods, Reproducibility of Results, Microsatellite Repeats/genetics
ABSTRACT
BACKGROUND: Automatic transdiagnostic risk calculators can improve the detection of individuals at risk of psychosis. However, they rely on assessment at a single point in time and can be refined with dynamic modeling techniques that account for changes in risk over time. METHODS: We included 158,139 patients (5007 events) who received a first index diagnosis of a nonorganic and nonpsychotic mental disorder within electronic health records from the South London and Maudsley National Health Service Foundation Trust between January 1, 2008, and October 8, 2021. A dynamic Cox landmark model was developed to estimate the 2-year risk of developing psychosis according to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement. The dynamic model included 24 predictors extracted at 9 landmark points (baseline, 0, 6, 12, 24, 30, 36, 42, and 48 months): 3 demographic, 1 clinical, and 20 natural language processing-based symptom and substance use predictors. Performance was compared with a static Cox regression model with all predictors assessed at baseline only, and indexed via discrimination (C-index), calibration (calibration plots), and potential clinical utility (decision curves) in internal-external validation. RESULTS: The dynamic model improved discrimination from baseline compared with the static model (dynamic: C-index = 0.90; static: C-index = 0.87) and at the final landmark point (dynamic: C-index = 0.79; static: C-index = 0.76). The dynamic model was also significantly better calibrated (calibration slope = 0.97-1.10) than the static model at later landmark points (≥24 months). Net benefit was higher for the dynamic than for the static model at later landmark points (≥24 months). CONCLUSIONS: These findings suggest that dynamic prediction models can improve the detection of individuals at risk for psychosis in secondary mental health care settings.
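A condensed sketch of the landmarking idea using the lifelines package on synthetic data; the study's 24 predictors, NLP-derived features, and predictor updating at each landmark are omitted, and all column names are hypothetical:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.normal(35, 10, n),
    "symptom_score": rng.random(n),  # stand-in for NLP-derived predictors
    "time": rng.exponential(60, n),  # months to psychosis or censoring
    "event": rng.integers(0, 2, n),
})

horizon = 24  # 2-year risk from each landmark
for landmark in (0, 6, 12):
    at_risk = df[df["time"] > landmark].copy()  # still event-free at landmark
    # administrative censoring at landmark + horizon
    at_risk["t"] = np.minimum(at_risk["time"] - landmark, horizon)
    at_risk["e"] = np.where(at_risk["time"] - landmark <= horizon,
                            at_risk["event"], 0)
    cph = CoxPHFitter().fit(at_risk[["age", "symptom_score", "t", "e"]],
                            duration_col="t", event_col="e")
    print(f"landmark {landmark} mo: C-index = {cph.concordance_index_:.3f}")
```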
Subjects
Natural Language Processing, Psychotic Disorders, Humans, Psychotic Disorders/diagnosis, Female, Male, Adult, Risk Assessment/methods, Young Adult, Cohort Studies, Secondary Care, Adolescent, Middle Aged, Proportional Hazards Models, Electronic Health Records, Prognosis
ABSTRACT
INTRODUCTION: Veterans Affairs Surgical Quality Improvement Program (VASQIP) benchmarking algorithms helped the Veterans Health Administration (VHA) reduce postoperative mortality. Despite calls to consider social risk factors, these algorithms do not adjust for social determinants of health (SDoH) or account for services fragmented between the VHA and the private sector. This investigation examines how the addition of SDoH changes model performance and quantifies associations between SDoH and 30-d postoperative mortality. METHODS: This cohort study used VASQIP data (2013-2019) on patients ≥65 y old with 2- to 30-d inpatient stays. VASQIP was linked to other VHA and Medicare/Medicaid data. Thirty-day postoperative mortality was examined using multivariable logistic regression models, adjusting first for clinical variables and then adding SDoH. RESULTS: In adjusted analyses of 93,644 inpatient cases (97.7% male, 79.7% non-Hispanic White), higher proportions of non-Veterans Affairs care (adjusted odds ratio [aOR] = 1.02, 95% CI = 1.01-1.04) and living in highly deprived areas (aOR = 1.15, 95% CI = 1.02-1.29) were associated with increased postoperative mortality. Black race (aOR = 0.77, CI = 0.68-0.88) and rurality (aOR = 0.87, CI = 0.79-0.96) were associated with lower postoperative mortality. Adding SDoH to models with only clinical variables did not improve discrimination (c = 0.836 versus c = 0.835). CONCLUSIONS: Postoperative mortality is worse among veterans receiving more health care outside the VHA and living in highly deprived neighborhoods. However, adjusting for SDoH is unlikely to improve existing mortality-benchmarking models. Efforts to reduce postoperative mortality could focus on alleviating care fragmentation and designing care pathways that consider area deprivation. The adjusted survival advantage for rural and Black veterans may be of interest to private sector hospitals as they attempt to alleviate enduring health-care disparities.
Assuntos
Social Determinants of Health, Veterans, Humans, Aged, Male, Female, United States/epidemiology, Aged, 80 and over, Veterans/statistics & numerical data, United States Department of Veterans Affairs/statistics & numerical data, United States Department of Veterans Affairs/organization & administration, Risk Factors, Quality Improvement, Postoperative Complications/mortality, Postoperative Complications/epidemiology
ABSTRACT
Preterm birth (PTB) is a leading cause of morbidity and mortality in children aged under 5 years globally, especially in low-resource settings. Accurately measuring the true burden of PTB remains a challenge in many low-income and middle-income countries due to the limited availability of accurate measures of gestational age (GA), with first-trimester ultrasound dating being the gold standard. Metabolomic biomarkers are a promising area of research that could provide tools both for the early identification of high-risk pregnancies and for the postnatal estimation of GA and preterm status of newborns.
Subjects
Biomarkers, Gestational Age, Metabolomics, Premature Birth, Humans, Premature Birth/metabolism, Biomarkers/metabolism, Female, Pregnancy, Infant, Newborn
ABSTRACT
Objective: The objective of this study was to investigate the risk factors associated with cesarean scar pregnancy (CSP) and to develop a model for predicting intraoperative bleeding risk. Methods: We retrospectively analyzed the clinical data of 208 patients with CSP who were admitted to the People's Hospital of Leshan between January 2018 and December 2022. Based on whether intraoperative bleeding was ≥200 mL, we categorized patients into two groups for comparative analysis: the excessive bleeding group (n = 27) and the control group (n = 181). After identifying the relevant factors, we constructed a prediction model and created a nomogram. Results: There were significant differences between the two groups in several parameters, including the time since menstrual cessation (P = 0.002), maximum diameter of the gestational sac (P < 0.001), thickness of the myometrium at the uterine scar (P = 0.001), pre-treatment blood HCG levels (P = 0.016), and the grade of blood flow signals (P < 0.001). We consolidated these data and constructed a clinical prediction model. The model exhibited favorable predictive efficacy, discriminative ability (C-index = 0.894, specificity = 0.834, sensitivity = 0.852), calibration precision (mean absolute error = 0.018), and clinical decision-making utility, indicating its effectiveness. Conclusion: The clinical prediction model for intraoperative hemorrhage risk developed in this study can assist in designing appropriate interventions and effectively improve patient prognosis.
ABSTRACT
Background: A high risk of intracranial hemorrhage (ICH) is a leading reason for withholding anticoagulation in patients with atrial fibrillation (AF). We aimed to develop a claims-based ICH risk prediction model in older adults with AF initiating oral anticoagulation (OAC). Methods: We used US Medicare claims data to identify new users of OAC aged ≥65 years with AF in 2010-2017. We used regularized Cox regression to select predictors of ICH. We compared our AF ICH risk score with the HAS-BLED bleeding risk and Homer fall risk scores by area under the receiver operating characteristic curve (AUC) and assessed net reclassification improvement (NRI) when predicting the 1-year risk of ICH. Results: Our study cohort comprised 840,020 patients (mean [SD] age 77.5 [7.4] years; 52.2% female), split geographically into training (3963 ICH events [0.6%] in 629,804 patients) and validation (1397 ICH events [0.7%] in 210,216 patients) sets. Our AF ICH risk score, including 50 predictors, had AUCs of 0.653 and 0.650 in the training and validation sets, respectively, superior to the HAS-BLED score (0.580 and 0.567; p<0.001) and the Homer score (0.624 and 0.623; p<0.001). In the validation set, our AF ICH risk score reclassified 57.8%, 42.5%, and 43.9% of low-, intermediate-, and high-risk patients, respectively, by the HAS-BLED score (NRI: 15.3%, p<0.001). Similarly, it reclassified 0.0%, 44.1%, and 19.4% of low-, intermediate-, and high-risk patients, respectively, by the Homer score (NRI: 21.9%, p<0.001). Conclusion: Our novel claims-based ICH risk prediction model outperformed the standard HAS-BLED score and can inform OAC prescribing decisions.
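For reference, a minimal function for the categorical net reclassification improvement reported above; the three-category risk strata in the toy example are hypothetical, not the study's cutoffs:

```python
import numpy as np

def categorical_nri(y, old_cat, new_cat):
    """NRI = P(up|event) - P(down|event) + P(down|non-event) - P(up|non-event).
    y: 1 = event (here, ICH within 1 year); categories are ordered integers."""
    y, old_cat, new_cat = map(np.asarray, (y, old_cat, new_cat))
    up, down = new_cat > old_cat, new_cat < old_cat
    ev, ne = y == 1, y == 0
    return (up[ev].mean() - down[ev].mean()
            + down[ne].mean() - up[ne].mean())

# Toy example with three strata (0 = low, 1 = intermediate, 2 = high)
y = np.array([1, 1, 0, 0, 0, 1, 0, 0])
old = np.array([0, 1, 2, 1, 0, 1, 2, 1])
new = np.array([1, 2, 1, 0, 0, 1, 1, 1])
print(f"NRI = {categorical_nri(y, old, new):.3f}")
```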
ABSTRACT
Water resources are constantly threatened by pollution from potentially toxic elements (PTEs). In efforts to monitor and mitigate PTE pollution in water resources, machine learning (ML) algorithms have been utilized to predict PTE concentrations. However, review studies have not paid attention to the suitability of the input variables utilized for PTE prediction. Therefore, the present review analyzed studies that employed three ML algorithms, MLP-NN (multilayer perceptron neural network), RBF-NN (radial basis function neural network), and ANFIS (adaptive neuro-fuzzy inference system), to predict PTEs in water. A total of 139 models were analyzed to ascertain the input variables utilized, the suitability of the input variables, the trends of the ML model applications, and the comparison of their performances. The present study identified seven groups of input variables commonly used to predict PTEs in water. Group 1 comprised physical parameters (P), chemical parameters (C), and metals (M); Group 2 contained only P and C; Group 3 only P and M; Group 4 only C and M; Group 5 only P; Group 6 only C; and Group 7 only M. Studies that employed the three algorithms showed that Groups 1, 2, 3, 5, and 7 are suitable input variables for forecasting PTEs in water. The parameters of Groups 4 and 6 also proved suitable for the MLP-NN algorithm, but their suitability for the RBF-NN and ANFIS algorithms could not be ascertained. The most commonly predicted PTEs using the MLP-NN algorithm were Fe, Zn, and As; for the RBF-NN algorithm, NO3, Zn, and Pb; and for ANFIS, NO3, Fe, and Mn. Based on correlation and determination coefficients (R, R2), the overall order of performance of the three ML algorithms was ANFIS > RBF-NN > MLP-NN, even though MLP-NN was the most commonly used algorithm.
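As a brief illustration of the MLP-NN use case discussed, the sketch below regresses a PTE concentration on a handful of synthetic water-quality inputs; the variable names and data are hypothetical stand-ins for real monitoring records:

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Hypothetical Group 1-style inputs: pH, EC (physical); NO3, SO4 (chemical);
# Mn (metal); the target is an Fe concentration
X = rng.random((n, 5))
fe = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.8 * X[:, 4] + rng.normal(0, 0.2, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, fe, random_state=0)
mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                                 random_state=0))
mlp.fit(X_tr, y_tr)
print("R2: %.3f" % r2_score(y_te, mlp.predict(X_te)))
```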
Subjects
Algorithms, Machine Learning, Neural Networks, Computer, Water Pollutants, Chemical, Water Resources, Water Pollutants, Chemical/analysis, Environmental Monitoring/methods, Fuzzy Logic
ABSTRACT
Precision psychiatry is an emerging field that aims to provide individualized approaches to mental health care. An important strategy to achieve this precision is to reduce uncertainty about prognosis and treatment response. Multivariate analysis and machine learning are used to create outcome prediction models based on clinical data such as demographics, symptom assessments, genetic information, and brain imaging. While much emphasis has been placed on technical innovation, the complex and varied nature of mental health presents significant challenges to the successful implementation of these models. From this perspective, I review ten challenges in the field of precision psychiatry, including the need for studies on real-world populations and realistic clinical outcome definitions, and consideration of treatment-related factors such as placebo effects and non-adherence to prescriptions. Fairness, prospective validation in comparison to current practice and implementation studies of prediction models are other key issues that are currently understudied. A shift is proposed from retrospective studies based on linear and static concepts of disease towards prospective research that considers the importance of contextual factors and the dynamic and complex nature of mental health.
Subjects
Mental Disorders, Precision Medicine, Psychiatry, Humans, Precision Medicine/methods, Psychiatry/methods, Mental Disorders/drug therapy, Machine Learning, Prognosis
ABSTRACT
INTRODUCTION: Identifying mild cognitive impairment (MCI) patients at risk for dementia could facilitate early interventions. Using electronic health records (EHRs), we developed a model to predict MCI to all-cause dementia (ACD) conversion at 5 years. METHODS: A Cox proportional hazards model was used to identify predictors of ACD conversion from EHR data in veterans with MCI. Model performance (area under the receiver operating characteristic curve [AUC] and Brier score) was evaluated on a held-out data subset. RESULTS: Of 59,782 MCI patients, 15,420 (25.8%) converted to ACD. The model had good discriminative performance (AUC 0.73 [95% confidence interval (CI) 0.72-0.74]) and calibration (Brier score 0.18 [95% CI 0.17-0.18]). Age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors, while body mass index, alcohol abuse, and sleep apnea were protective factors. DISCUSSION: The EHR-based prediction model had good performance in identifying 5-year MCI to ACD conversion and has potential to assist triaging of at-risk patients. Highlights: Of 59,782 veterans with mild cognitive impairment (MCI), 15,420 (25.8%) converted to all-cause dementia within 5 years. Electronic health record prediction models demonstrated good performance (area under the receiver operating characteristic curve 0.73; Brier 0.18). Age and vascular-related morbidities were predictors of dementia conversion. Synthetic data were comparable to real data in modeling MCI to dementia conversion. Key Points: An electronic health record-based model using demographic and comorbidity data had good performance in identifying veterans who convert from mild cognitive impairment (MCI) to all-cause dementia (ACD) within 5 years. Increased age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors for 5-year conversion from MCI to ACD. High body mass index, alcohol abuse, and sleep apnea were protective factors for 5-year conversion from MCI to ACD. Models using synthetic data (analogs of real patient data that retain the distribution, density, and covariance between variables of real patient data but are not attributable to any specific patient) performed just as well as models using real patient data. This could have significant implications in facilitating widely distributed computing of health-care data with minimized patient privacy concerns, which could accelerate scientific discoveries.
ABSTRACT
OBJECTIVES: To externally validate clinical prediction models that aim to predict progression to invasive ventilation or death on the ICU in patients admitted with confirmed COVID-19 pneumonitis. DESIGN: Single-center retrospective external validation study. DATA SOURCES: Routinely collected healthcare data in the ICU electronic patient record, and curated data recorded for each ICU admission for the purposes of the U.K. Intensive Care National Audit and Research Centre (ICNARC). SETTING: The ICU at Manchester Royal Infirmary, Manchester, United Kingdom. PATIENTS: Three hundred forty-nine patients admitted to the ICU with confirmed COVID-19 pneumonitis, older than 18 years, from March 1, 2020, to February 28, 2022. Three hundred two met the inclusion criteria for at least one model. Fifty-five of the 349 patients were admitted before the widespread adoption of dexamethasone for the treatment of severe COVID-19 (pre-dexamethasone patients). OUTCOMES: The ability of each model to be externally validated, discriminate, and calibrate. METHODS: Articles meeting the inclusion criteria were identified, and those that gave sufficient details on the predictors used and the methods to generate predictions were tested in our cohort of patients, which matched the original publications' inclusion/exclusion criteria and endpoints. RESULTS: Thirteen clinical prediction articles were identified. There was insufficient information available to validate the models in five of the articles; a further three contained predictors that were not routinely measured in our ICU cohort and were not validated; three had performance that was substantially lower than previously published (range C-statistic = 0.483-0.605 in pre-dexamethasone patients and C = 0.494-0.564 among all patients). One model retained its discriminative ability in our cohort compared with previously published results (C = 0.672 and 0.686), and one retained performance among pre-dexamethasone patients but was poor in all patients (C = 0.793 and 0.596). One model could be calibrated, but with poor performance. CONCLUSIONS: Our findings, albeit from a single center, suggest that the published performance of COVID-19 prediction models may not be replicated when translated to other institutions. In light of this, we would encourage bedside intensivists to reflect on the role of clinical prediction models in their own clinical decision-making.
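The two headline metrics used in external validation, the C-statistic and the calibration slope, can be computed for any model that outputs risk predictions; a minimal sketch, assuming predicted probabilities p_hat from the model under test and statsmodels for the recalibration fit:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def external_validation(y, p_hat):
    """C-statistic and calibration slope (coefficient on logit(p_hat))."""
    c_stat = roc_auc_score(y, p_hat)
    lp = np.log(p_hat / (1 - p_hat))  # linear predictor (log-odds scale)
    slope = sm.Logit(y, sm.add_constant(lp)).fit(disp=0).params[1]
    return c_stat, slope

# Toy example: weakly discriminating, miscalibrated predictions
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)
p_hat = np.clip(0.3 + 0.1 * y + rng.normal(0, 0.15, 300), 0.01, 0.99)
print("C = %.3f, calibration slope = %.3f" % external_validation(y, p_hat))
```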
ABSTRACT
BACKGROUND: SARS-CoV-2 vaccines are effective in reducing hospitalization, COVID-19 symptoms, and COVID-19 mortality for nursing home (NH) residents. We sought to compare the accuracy of various machine learning models, examine changes in model performance, and identify resident characteristics that have the strongest associations with 30-day COVID-19 mortality, before and after vaccine availability. METHODS: We conducted a population-based retrospective cohort study analyzing data from all NH facilities across Ontario, Canada. We included all residents diagnosed with SARS-CoV-2 and living in NHs between March 2020 and July 2021. We employed five machine learning algorithms to predict COVID-19 mortality: logistic regression, LASSO regression, classification and regression trees (CART), random forests, and gradient boosted trees. The discriminative performance of the models was evaluated using the area under the receiver operating characteristic curve (AUC) for each model using 10-fold cross-validation. Model calibration was determined through evaluation of calibration slopes. Variable importance was calculated by repeatedly and randomly permuting the values of each predictor in the dataset and re-evaluating the model's performance. RESULTS: A total of 14,977 NH residents and 20 resident characteristics were included in the model. The cross-validated AUCs were similar across algorithms and ranged from 0.64 to 0.67. Gradient boosted trees and logistic regression had an AUC of 0.67 both pre- and post-vaccine availability. CART had the lowest discriminative ability, with an AUC of 0.64 pre-vaccine availability and 0.65 post-vaccine availability. The most influential resident characteristics, irrespective of vaccine availability, included advanced age (≥75 years), health instability, functional and cognitive status, sex (male), and polypharmacy. CONCLUSIONS: The predictive accuracy and discrimination exhibited by all five examined machine learning algorithms were similar. Both logistic regression and gradient boosted trees exhibited comparable performance and displayed slight superiority over the other machine learning algorithms. We observed consistent model performance both before and after vaccine availability. The influence of resident characteristics on COVID-19 mortality remained consistent across time periods, suggesting that changes to pre-vaccination screening practices for high-risk individuals are effective in the post-vaccination era.
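The permutation-based variable importance described above is available directly in scikit-learn; this sketch uses synthetic stand-ins for the 20 resident characteristics:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 20 resident characteristics and 30-day mortality
X, y = make_classification(n_samples=3000, n_features=20, n_informative=6,
                           weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Repeatedly permute each predictor and measure the drop in AUC
result = permutation_importance(gbt, X_te, y_te, scoring="roc_auc",
                                n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("top features by permutation importance:", ranking[:5])
```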
Subjects
COVID-19, Aged, Humans, COVID-19/prevention & control, COVID-19 Vaccines, Nursing Homes, Ontario/epidemiology, Retrospective Studies, SARS-CoV-2, Male, Female
ABSTRACT
The use of clinical prediction models to produce individualized risk estimates can facilitate the implementation of precision psychiatry. As a source of data from large, clinically representative patient samples, electronic health records (EHRs) provide a platform to develop and validate clinical prediction models, as well as potentially implement them in routine clinical care. The current review describes promising use cases for the application of precision psychiatry to EHR data and considers their performance in terms of discrimination (ability to separate individuals with and without the outcome) and calibration (extent to which predicted risk estimates correspond to observed outcomes), as well as their potential clinical utility (weighing benefits and costs associated with the model compared to different approaches across different assumptions of the number needed to test). We review 4 externally validated clinical prediction models designed to predict psychosis onset, psychotic relapse, cardiometabolic morbidity, and suicide risk. We then discuss the prospects for clinically implementing these models and the potential added value of integrating data from evidence syntheses, standardized psychometric assessments, and biological data into EHRs. Clinical prediction models can utilize routinely collected EHR data in an innovative way, representing a unique opportunity to inform real-world clinical decision making. Combining data from other sources (e.g., meta-analyses) or enhancing EHR data with information from research studies (clinical and biomarker data) may enhance our abilities to improve the performance of clinical prediction models.