ABSTRACT
BACKGROUND: A prediction model can be a useful tool to quantify a patient's risk of developing dementia in the coming years and to guide risk-factor-targeted interventions. Numerous dementia prediction models have been developed, but few have been externally validated, likely limiting their clinical uptake. In our previous work, we had limited success in externally validating some of these existing models due to inadequate reporting. This motivated us to develop and externally validate novel models to predict dementia in the general population across a network of observational databases. We assessed regularization methods to obtain parsimonious models of lower complexity that are easier to implement. METHODS: Logistic regression models were developed across a network of five observational databases with electronic health records (EHRs) and claims data to predict 5-year dementia risk in persons aged 55-84 years. Two regularization methods, L1 and Broken Adaptive Ridge (BAR), as well as three candidate predictor sets were assessed to optimize prediction performance. The predictor sets included a baseline set using only age and sex, a full set including all available candidate predictors, and a phenotype set containing a limited number of clinically relevant predictors. RESULTS: BAR can be used for variable selection, outperforming L1 when a parsimonious model is desired. Adding candidate predictors for disease diagnosis and drug exposure generally improved the performance of baseline models using only age and sex. While a model trained on German EHR data saw an increase in AUROC from 0.74 to 0.83 with additional predictors, a model trained on US EHR data showed only minimal improvement, from 0.79 to 0.81 AUROC. Nevertheless, the latter model, developed using BAR regularization on the clinically relevant predictor set, was ultimately chosen as the best-performing model because it demonstrated more consistent external validation performance and improved calibration. CONCLUSIONS: We developed and externally validated patient-level models to predict dementia. Our results show that although dementia prediction is largely driven by age, adding predictors based on condition diagnoses and drug exposures further improves prediction performance. BAR regularization outperformed L1 regularization in yielding the most parsimonious yet still well-performing prediction model for dementia.
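As a rough illustration of the variable-selection behaviour discussed above, the sketch below fits an L1-penalized logistic regression to synthetic data and reports which candidate predictors survive the penalty. BAR is not available in scikit-learn, so only the L1 side of the comparison is shown; the data, predictor counts and penalty strength (C) are arbitrary assumptions, not the study's settings.

```python
# Illustrative sketch (not the authors' code): L1-penalized logistic regression
# for variable selection. All data and settings here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 5000, 50                                        # patients, candidate predictors
X = rng.binomial(1, 0.2, size=(n, p)).astype(float)    # binary covariates (e.g. diagnoses)
age = rng.uniform(55, 84, n)                           # age in years
X = np.column_stack([age, X])
logit = -6 + 0.08 * age + 0.7 * X[:, 1] + 0.5 * X[:, 2]  # only a few true signals
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# The L1 penalty drives most coefficients to exactly zero -> a parsimonious model
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05, max_iter=1000)
model.fit(X, y)

selected = np.flatnonzero(model.coef_[0])
print(f"{selected.size} of {X.shape[1]} predictors retained:", selected)
```

BAR would replace the L1 penalty with iteratively reweighted ridge penalties that approximate best-subset (L0) selection, typically zeroing out even more coefficients.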
Subject(s)
Databases, Factual; Dementia; Humans; Dementia/diagnosis; Dementia/epidemiology; Aged; Female; Male; Aged, 80 and over; Middle Aged; Electronic Health Records; Risk Assessment/methods; Risk Factors
ABSTRACT
PURPOSE: To develop prediction models for short-term mortality risk assessment following colorectal cancer surgery. METHODS: Data were harmonized from four Danish observational health databases into the Observational Medical Outcomes Partnership Common Data Model. With a data-driven approach using Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression on preoperative data, we developed 30-day, 90-day, and 1-year mortality prediction models. We assessed discriminative performance using the areas under the receiver operating characteristic and precision-recall curves, and calibration using calibration slope, intercept, and calibration-in-the-large. We additionally assessed model performance in subgroups of curative, palliative, elective, and emergency surgery. RESULTS: A total of 57,521 patients were included in the study population, 51.1% male and with a median age of 72 years. The models showed good discrimination, with an area under the receiver operating characteristic curve of 0.88, 0.878, and 0.861 for 30-day, 90-day, and 1-year mortality, respectively, and a calibration-in-the-large of 1.01, 0.99, and 0.99. The overall incidence of mortality was 4.48% for 30-day mortality, 6.64% for 90-day mortality, and 12.8% for 1-year mortality. Subgroup analysis showed no improvement in discrimination or calibration when separating the cohort into cohorts of elective surgery, emergency surgery, curative surgery, and palliative surgery. CONCLUSION: We were able to train prediction models for the risk of short-term mortality on a data set of four combined national health databases with good discrimination and calibration. We found that one cohort including all operated patients resulted in better-performing models than cohorts based on several subgroups.
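The sketch below illustrates, on synthetic predictions, the calibration measures named in the abstract. It uses common definitions (observed/expected ratio for calibration-in-the-large, logistic recalibration for the intercept and slope); the authors' exact implementation may differ, and `calibration_metrics` is a made-up helper name.

```python
# Illustrative sketch: calibration measures computed from predicted risks and
# observed binary outcomes. Synthetic data only.
import numpy as np
import statsmodels.api as sm

def calibration_metrics(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)
    lp = np.log(p / (1 - p))                     # linear predictor (log-odds)

    # Calibration-in-the-large as the observed / expected event ratio
    oe_ratio = y_true.mean() / p.mean()

    # Calibration slope: logistic regression of the outcome on the log-odds
    slope_fit = sm.Logit(y_true, sm.add_constant(lp)).fit(disp=False)
    slope = slope_fit.params[1]

    # Calibration intercept: offset model with the slope fixed at 1
    ones = np.ones((len(lp), 1))
    int_fit = sm.GLM(y_true, ones, family=sm.families.Binomial(), offset=lp).fit()
    intercept = int_fit.params[0]
    return oe_ratio, intercept, slope

rng = np.random.default_rng(1)
p_pred = rng.uniform(0.01, 0.4, 10_000)          # synthetic predicted 30-day risks
y_true = rng.binomial(1, p_pred)                 # outcomes drawn from those risks
print(calibration_metrics(y_true, p_pred))       # ~ (1.0, 0.0, 1.0) when well calibrated
```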
Subject(s)
Colorectal Neoplasms; Digestive System Surgical Procedures; Humans; Male; Aged; Female; Calibration; Databases, Factual; Elective Surgical Procedures; Colorectal Neoplasms/surgery
ABSTRACT
PURPOSE: Real-world data (RWD) offer a valuable resource for generating population-level disease epidemiology metrics. We aimed to develop a well-tested and user-friendly R package to compute incidence rates and prevalence in data mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). MATERIALS AND METHODS: We created IncidencePrevalence, an R package to support the analysis of population-level incidence rates and point- and period-prevalence in OMOP-formatted data. On top of unit testing, we assessed the face validity of the package. To do so, we calculated incidence rates of COVID-19 using RWD from Spain (SIDIAP) and the United Kingdom (CPRD Aurum), and replicated two previously published studies using data from the Netherlands (IPCI) and the United Kingdom (CPRD Gold). We compared the obtained results to those previously published, and measured execution times by running a benchmark analysis across databases. RESULTS: IncidencePrevalence achieved high agreement with previously published data in CPRD Gold and IPCI, and showed good performance across databases. For COVID-19, incidence calculated by the package was similar to public data after the first wave of the pandemic. CONCLUSION: For data mapped to the OMOP CDM, the IncidencePrevalence R package can support descriptive epidemiological research. It enables reliable estimation of incidence and prevalence from large real-world data sets. It represents a simple, but extendable, analytical framework to generate estimates in a reproducible and timely manner.
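IncidencePrevalence is an R package, so the snippet below does not reproduce its API; it only sketches the two estimands it computes, using a toy pandas cohort table whose columns (`days_at_risk`, `event`, `has_outcome_on_index_date`) are illustrative assumptions.

```python
# Minimal sketch of incidence rate (events per person-time) and point prevalence
# on a toy cohort table; NOT the IncidencePrevalence R API.
import pandas as pd

cohort = pd.DataFrame({
    "person_id":    [1, 2, 3, 4, 5],
    "days_at_risk": [365, 365, 180, 365, 90],    # observed person-time in the period
    "event":        [0, 1, 0, 0, 1],             # incident outcome during follow-up
    "has_outcome_on_index_date": [0, 1, 1, 0, 0],
})

person_years = cohort["days_at_risk"].sum() / 365.25
incidence_rate = cohort["event"].sum() / person_years           # events per person-year
point_prevalence = cohort["has_outcome_on_index_date"].mean()   # proportion with outcome on the date

print(f"incidence rate: {incidence_rate:.3f} per person-year")
print(f"point prevalence: {point_prevalence:.1%}")
```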
Subject(s)
COVID-19; Data Management; Humans; Incidence; Prevalence; Databases, Factual; COVID-19/epidemiology
ABSTRACT
BACKGROUND: Medication errors (MEs) are a major public health concern that can cause harm and financial burden within the healthcare system. Characterizing MEs is crucial to develop strategies to mitigate them in the future. OBJECTIVES: To characterize ME-associated reports and investigate signals of disproportionate reporting (SDRs) on MEs in the Food and Drug Administration's Adverse Event Reporting System (FAERS). METHODS: FAERS data from 2004 to 2020 were used. ME reports were identified with the narrow Standardised Medical Dictionary for Regulatory Activities® (MedDRA®) Query (SMQ) for MEs. Drug names were converted to the Anatomical Therapeutic Chemical (ATC) classification. SDRs were investigated using the reporting odds ratio (ROR). RESULTS: In total, 488 470 ME reports were identified, mostly (59%) submitted by consumers and mainly (55%) associated with females. Median age at the time of the ME was 57 years (interquartile range: 37-70 years). Approximately 1 out of 3 reports stated a serious health outcome. The most prevalent reported drug class was "antineoplastic and immunomodulating agents" (25%). The most common ME type was "incorrect dose administered" (9%). Of the 1659 SDRs obtained, adalimumab was the most common drug associated with MEs, with a ROR of 1.22 (95% confidence interval: 1.21-1.24). CONCLUSION: This study offers a first-of-its-kind characterization of MEs as reported to FAERS. Reported MEs are frequent and may be associated with serious health outcomes. These FAERS data provide insights for ME prevention and offer possibilities for additional in-depth analyses.
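For readers unfamiliar with disproportionality analysis, the sketch below shows how a reporting odds ratio and its 95% confidence interval are derived from a 2x2 table of report counts; the counts are invented for illustration and are not FAERS values.

```python
# Illustrative ROR calculation from a 2x2 disproportionality table (made-up counts).
import math

# reports:                with ME   without ME
a, b = 12_000, 300_000        # drug of interest
c, d = 476_470, 20_000_000    # all other drugs

ror = (a / b) / (c / d)
se_log_ror = math.sqrt(1/a + 1/b + 1/c + 1/d)        # SE of log(ROR)
lo = math.exp(math.log(ror) - 1.96 * se_log_ror)
hi = math.exp(math.log(ror) + 1.96 * se_log_ror)
print(f"ROR = {ror:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```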
Subject(s)
Adverse Drug Reaction Reporting Systems; Medication Errors; Female; United States; Humans; Adult; Middle Aged; Aged; Pharmaceutical Preparations; United States Food and Drug Administration; Medication Errors/prevention & control; Adalimumab; Pharmacovigilance
ABSTRACT
BACKGROUND: Deep learning models have had considerable success in various fields but have struggled on structured data. Here we apply four state-of-the-art supervised deep learning models that use the attention mechanism and compare them against logistic regression and XGBoost in terms of discrimination, calibration and clinical utility. METHODS: We develop the models using a general practitioner database. We implement a recurrent neural network, a transformer with and without reverse distillation, and a graph neural network. We measure discrimination using the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPRC). We assess smooth calibration using restricted cubic splines and clinical utility with decision curve analysis. RESULTS: Our results show that deep learning approaches can improve discrimination by up to 2.5 percentage points in AUC and 7.4 percentage points in AUPRC. However, on average the baselines are competitive. Most models are calibrated similarly to the baselines, except for the graph neural network. The transformer using reverse distillation shows the best clinical utility on two out of three prediction problems over most of the prediction thresholds. CONCLUSION: In this study, we evaluated various approaches to supervised learning using neural networks and attention. We performed a rigorous comparison, looking not only at discrimination but also at calibration and clinical utility. There is value in using deep learning models on electronic health record data, since they can improve discrimination and clinical utility while providing good calibration. However, good baseline methods are still competitive.
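Clinical utility is compared here with decision curve analysis; the sketch below shows the underlying net-benefit calculation on synthetic predictions. The `net_benefit` helper and the threshold grid are illustrative assumptions, not the study's code.

```python
# Sketch of the net-benefit curve used in decision curve analysis; synthetic data only.
import numpy as np

def net_benefit(y_true, p_pred, thresholds):
    n = len(y_true)
    out = []
    for t in thresholds:
        treat = p_pred >= t                        # patients flagged at this threshold
        tp = np.sum(treat & (y_true == 1))         # true positives
        fp = np.sum(treat & (y_true == 0))         # false positives
        out.append(tp / n - fp / n * t / (1 - t))  # standard net-benefit formula
    return np.array(out)

rng = np.random.default_rng(2)
p_pred = rng.uniform(0, 1, 5000)
y_true = rng.binomial(1, p_pred)                   # a perfectly calibrated toy model
thresholds = np.linspace(0.05, 0.5, 10)
print(np.round(net_benefit(y_true, p_pred, thresholds), 3))
```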
Subject(s)
Electronic Health Records; Neural Networks, Computer; Humans; Logistic Models; ROC Curve; Area Under Curve
ABSTRACT
BACKGROUND: Baseline outcome risk can be an important determinant of absolute treatment benefit and has been used in guidelines for "personalizing" medical decisions. We compared easily applicable risk-based methods for optimal prediction of individualized treatment effects. METHODS: We simulated RCT data using diverse assumptions for the average treatment effect, a baseline prognostic index of risk, the shape of its interaction with treatment (none, linear, quadratic or non-monotonic), and the magnitude of treatment-related harms (none or constant independent of the prognostic index). We predicted absolute benefit using: models with a constant relative treatment effect; stratification in quarters of the prognostic index; models including a linear interaction of treatment with the prognostic index; models including an interaction of treatment with a restricted cubic spline transformation of the prognostic index; an adaptive approach using Akaike's Information Criterion. We evaluated predictive performance using root mean squared error and measures of discrimination and calibration for benefit. RESULTS: The linear-interaction model displayed optimal or close-to-optimal performance across many simulation scenarios with moderate sample size (N = 4,250; ~ 785 events). The restricted cubic splines model was optimal for strong non-linear deviations from a constant treatment effect, particularly when sample size was larger (N = 17,000). The adaptive approach also required larger sample sizes. These findings were illustrated in the GUSTO-I trial. CONCLUSIONS: An interaction between baseline risk and treatment assignment should be considered to improve treatment effect predictions.
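A minimal sketch of two of the compared approaches, assuming a simulated trial with a known prognostic index: a constant-relative-effect model, a linear-interaction model, and an AIC-based choice between them, followed by predicted absolute benefit. Variable names and the data-generating model are invented for illustration, not taken from the study.

```python
# Sketch: constant-effect vs. linear-interaction logistic models on simulated RCT data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 4250
pindex = rng.normal(0, 1, n)                           # baseline prognostic index (log-odds scale)
treat = rng.binomial(1, 0.5, n)                        # 1:1 randomization
logit = -1.5 + pindex + treat * (-0.6 + 0.3 * pindex)  # treatment effect varies with baseline risk
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
df = pd.DataFrame({"y": y, "pindex": pindex, "treat": treat})

constant = smf.logit("y ~ pindex + treat", data=df).fit(disp=False)
interaction = smf.logit("y ~ pindex + treat + treat:pindex", data=df).fit(disp=False)

best = min([constant, interaction], key=lambda m: m.aic)   # adaptive (AIC-based) choice
print("chosen model AIC:", round(best.aic, 1))

# Predicted absolute benefit = risk under control minus risk under treatment
new = pd.DataFrame({"pindex": [-1.0, 0.0, 1.0]})
risk0 = best.predict(new.assign(treat=0))
risk1 = best.predict(new.assign(treat=1))
print("predicted absolute benefit:", np.round(risk0 - risk1, 3).tolist())
```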
Subject(s)
Randomized Controlled Trials as Topic; Humans; Prognosis; Computer Simulation; Sample Size
ABSTRACT
BACKGROUND: Many dementia prediction models have been developed, but only a few have been externally validated, which hinders clinical uptake and may pose a risk if models are nonetheless applied to actual patients. Externally validating an existing prediction model is a difficult task in which we mostly rely on the completeness of model reporting in a published article. In this study, we aim to externally validate existing dementia prediction models. To that end, we define model reporting criteria, review published studies, and externally validate three well-reported models using routinely collected health data from administrative claims and electronic health records. METHODS: We identified dementia prediction models that were developed between 2011 and 2020 and assessed whether they could be externally validated given a set of model criteria. In addition, we externally validated three of these models (Walters' Dementia Risk Score, Mehta's RxDx-Dementia Risk Index, and Nori's ADRD dementia prediction model) on a network of six observational health databases from the United States, United Kingdom, Germany and the Netherlands, including the original development databases of the models. RESULTS: We reviewed 59 dementia prediction models. All models reported the prediction method, development database, and target and outcome definitions. Less frequently reported were predictor definitions (52 models), including the time window in which a predictor is assessed (21 models), as well as predictor coefficients (20 models) and the time-at-risk (42 models). The validation of the model by Walters (development c-statistic: 0.84) showed moderate transportability (0.67-0.76 c-statistic). The Mehta model (development c-statistic: 0.81) transported well to some of the external databases (0.69-0.79 c-statistic). The Nori model (development AUROC: 0.69) transported well (0.62-0.68 AUROC) but performed modestly overall. Recalibration showed improvements for the Walters and Nori models, while recalibration could not be assessed for the Mehta model due to an unreported baseline hazard. CONCLUSION: We observed that reporting is mostly insufficient to fully externally validate published dementia prediction models, and therefore it is uncertain how well these models would work in other clinical settings. We emphasize the importance of following established guidelines for reporting clinical prediction models. We recommend that reporting be more explicit and written with external validation in mind if the model is meant to be applied in different settings.
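The recalibration referred to above can be illustrated as follows: given the published model's linear predictor evaluated in a new database, the intercept (and optionally the slope) is re-estimated so that predicted risks match the local outcome rate. The sketch uses synthetic data, not the Walters, Mehta or Nori coefficients.

```python
# Sketch of logistic recalibration of an existing model in a new database; synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
lp = rng.normal(-3.0, 1.0, 20_000)                  # original model's log-odds in the new data
y = rng.binomial(1, 1 / (1 + np.exp(-(lp + 0.8))))  # new setting: systematically higher risk

# Intercept-only update (keeps the original coefficients, fixes calibration-in-the-large)
ones = np.ones((len(lp), 1))
fit_int = sm.GLM(y, ones, family=sm.families.Binomial(), offset=lp).fit()
# Intercept + slope update (full logistic recalibration)
fit_slope = sm.Logit(y, sm.add_constant(lp)).fit(disp=False)

print("intercept update:", round(fit_int.params[0], 2))
print("intercept, slope:", np.round(fit_slope.params, 2))
```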
Subject(s)
Dementia; Humans; United Kingdom; Risk Factors; Dementia/diagnosis; Dementia/epidemiology; Netherlands/epidemiology; Germany; Prognosis
ABSTRACT
BACKGROUND: We investigated whether we could use influenza data to develop prediction models for COVID-19, to increase the speed at which prediction models can reliably be developed and validated early in a pandemic. We developed COVID-19 Estimated Risk (COVER) scores that quantify a patient's risk of hospital admission with pneumonia (COVER-H), hospitalization with pneumonia requiring intensive services or death (COVER-I), or fatality (COVER-F) in the 30 days following COVID-19 diagnosis, using historical data from patients with influenza or flu-like symptoms, and tested these scores in COVID-19 patients. METHODS: We analyzed a federated network of electronic medical records and administrative claims data from 14 data sources and 6 countries containing data collected on or before 4/27/2020. We used a 2-step process to develop 3 scores using historical data from patients with influenza or flu-like symptoms any time prior to 2020. The first step was to create a data-driven model using LASSO regularized logistic regression; the covariates of this model were then used to develop aggregate covariates for the second step, in which the COVER scores were developed using a smaller set of features. These 3 COVER scores were then externally validated on patients with 1) influenza or flu-like symptoms and 2) confirmed or suspected COVID-19 diagnosis across 5 databases from South Korea, Spain, and the United States. Outcomes included i) hospitalization with pneumonia, ii) hospitalization with pneumonia requiring intensive services or death, and iii) death in the 30 days after the index date. RESULTS: Overall, 44,507 COVID-19 patients were included for model validation. We identified 7 predictors (history of cancer, chronic obstructive pulmonary disease, diabetes, heart disease, hypertension, hyperlipidemia, kidney disease) which, combined with age and sex, discriminated which patients would experience any of our three outcomes. The models achieved good performance in the influenza and COVID-19 cohorts. For COVID-19, the AUC ranges were COVER-H: 0.69-0.81, COVER-I: 0.73-0.91, and COVER-F: 0.72-0.90. Calibration varied across the validations, with some of the COVID-19 validations being less well calibrated than the influenza validations. CONCLUSIONS: This research demonstrated the utility of using a proxy disease to develop a prediction model. The 3 COVER models, each with 9 predictors, that were developed using influenza data perform well in COVID-19 patients for predicting hospitalization, intensive services, and fatality. The scores showed good discriminatory performance which transferred well to the COVID-19 population. There was some miscalibration in the COVID-19 validations, which is potentially due to the difference in symptom severity between the two diseases. A possible solution for this is to recalibrate the models in each location before use.
Subject(s)
COVID-19; Influenza, Human; Pneumonia; COVID-19 Testing; Humans; Influenza, Human/epidemiology; SARS-CoV-2; United States
ABSTRACT
A digital twin (DT), originally defined as a virtual representation of a physical asset, system, or process, is a new concept in health care. A DT in health care is not a single technology but a domain-adapted multimodal modeling approach incorporating the acquisition, management, analysis, prediction, and interpretation of data, aiming to improve medical decision-making. However, there are many challenges and barriers that must be overcome before a DT can be used in health care. In this viewpoint paper, we build on the current literature, address these challenges, and describe a dynamic DT in health care for optimizing individual patient health care journeys, specifically for women at risk for cardiovascular complications in the preconception and pregnancy periods and across the life course. We describe how multiple domains can be brought together to develop this DT. With our cross-domain definition of the DT, we aim to define future goals, trade-offs, and methods that will guide the development of the dynamic DT and implementation strategies in health care.
Subject(s)
Life Change Events; Patient Care; Female; Humans; Pregnancy; Technology
ABSTRACT
BACKGROUND: Accurate prognostic models could aid medical decision making. Large observational databases often contain temporal medical data for large and diverse populations of patients, and it may be possible to learn prognostic models from these data. However, the performance of a prognostic model often undesirably worsens when it is transported to a different database (or into a clinical setting). In this study we investigate different ensemble approaches that combine prognostic models independently developed using different databases (a simple federated learning approach) to determine whether ensembles of models developed across databases can improve model transportability (i.e., perform better in new data than single-database models). METHODS: For a given prediction question we independently trained five single-database models, each using a different observational healthcare database. We then developed and investigated numerous ensemble models (fusion, stacking and mixture of experts) that combined the different database models. Performance of each model was investigated via discrimination and calibration using a leave-one-dataset-out technique, i.e., holding out one database for validation and using the remaining four datasets for model development. The internal validation performance of a model developed using the held-out database was calculated and presented as the 'internal benchmark' for comparison. RESULTS: In this study the fusion ensembles generally outperformed the single-database models when transported to a previously unseen database, and their performance was more consistent across unseen databases. Stacking ensembles performed poorly in terms of discrimination when the labels in the unseen database were limited. Calibration was consistently poor when both ensembles and single-database models were applied to previously unseen databases. CONCLUSION: A simple federated learning approach that implements ensemble techniques to combine models independently developed across different databases for the same prediction question may improve discriminative performance in new data (a new database or clinical setting) but will need to be recalibrated using the new data. This could help medical decision making by improving prognostic model performance.
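A minimal sketch of the fusion ensemble idea, assuming four synthetic "databases" with shifted covariate distributions and a fifth held out: each database-specific model is trained separately and the ensemble simply averages their predicted probabilities. All data and model settings are illustrative, not the study's configuration.

```python
# Sketch of a fusion ensemble across database-specific models; synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
p = 10
coef = rng.normal(0, 0.8, p)

def make_db(n, shift):
    X = rng.normal(shift, 1, size=(n, p))            # each database has a shifted covariate mix
    y = rng.binomial(1, 1 / (1 + np.exp(-(X @ coef - 1))))
    return X, y

databases = [make_db(3000, s) for s in (-0.3, -0.1, 0.0, 0.2)]  # four development databases
X_new, y_new = make_db(3000, 0.4)                               # the held-out fifth database

models = [LogisticRegression(max_iter=1000).fit(X, y) for X, y in databases]
single_aucs = [roc_auc_score(y_new, m.predict_proba(X_new)[:, 1]) for m in models]
fusion = np.mean([m.predict_proba(X_new)[:, 1] for m in models], axis=0)  # average probabilities

print("single-database AUCs:", np.round(single_aucs, 3))
print("fusion ensemble AUC: ", round(roc_auc_score(y_new, fusion), 3))
```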
Subject(s)
Delivery of Health Care; Calibration; Databases, Factual; Humans; Prognosis
ABSTRACT
PURPOSE: The purpose of this study was to develop and validate a prediction model for 90-day mortality following a total knee replacement (TKR). TKR is a safe and cost-effective surgical procedure for treating severe knee osteoarthritis (OA). Although complications following surgery are rare, prediction tools could help identify high-risk patients who could be targeted with preventative interventions. The aim was to develop and validate a simple model to help inform treatment choices. METHODS: A mortality prediction model for knee OA patients following TKR was developed and externally validated using a US claims database and a UK general practice database. The target population consisted of patients undergoing a primary TKR for knee OA, aged ≥ 40 years and registered for ≥ 1 year before surgery. LASSO logistic regression models were developed for post-operative (90-day) mortality. A second mortality model was developed with a reduced feature set to increase interpretability and usability. RESULTS: A total of 193,615 patients were included, with 40,950 in The Health Improvement Network (THIN) database and 152,665 in Optum. The full model predicting 90-day mortality yielded an AUROC of 0.78 when trained in Optum and 0.70 when externally validated on THIN. The 12-variable model achieved an internal AUROC of 0.77 and an external AUROC of 0.71 in THIN. CONCLUSIONS: We developed a simple prediction model based on sex, age, and 10 comorbidities that can identify patients at high risk of short-term mortality following TKR and that demonstrated good, robust performance. The 12-feature mortality model is easily implemented, and its performance suggests it could be used to inform evidence-based shared decision-making prior to surgery and to target prophylaxis for those at high risk. LEVEL OF EVIDENCE: III.
Subject(s)
Arthroplasty, Replacement, Knee; Osteoarthritis, Knee; Child; Databases, Factual; Humans
ABSTRACT
BACKGROUND: A detailed characterization of patients with COVID-19 living with obesity has not yet been undertaken. We aimed to describe and compare the demographics, medical conditions, and outcomes of COVID-19 patients living with obesity (PLWO) to those of patients living without obesity. METHODS: We conducted a cohort study based on outpatient/inpatient care and claims data from January to June 2020 from Spain, the UK, and the US. We used six databases standardized to the OMOP common data model. We defined two non-mutually exclusive cohorts of patients diagnosed and/or hospitalized with COVID-19; patients were followed from the index date to 30 days or death. We report the frequency of demographics, prior medical conditions, and 30-day outcomes (hospitalization, events, and death) by obesity status. RESULTS: We included 627 044 (Spain: 122 058, UK: 2336, and US: 502 650) diagnosed and 160 013 (Spain: 18 197, US: 141 816) hospitalized patients with COVID-19. The prevalence of obesity was higher among patients hospitalized (39.9%, 95%CI: 39.8-40.0) than among those diagnosed with COVID-19 (33.1%; 95%CI: 33.0-33.2). In both cohorts, PLWO were more often female. Hospitalized PLWO were younger than patients without obesity. Overall, COVID-19 PLWO were more likely to have prior medical conditions, to present with cardiovascular and respiratory events during hospitalization, or to require intensive services compared to COVID-19 patients without obesity. CONCLUSION: We show that PLWO differ from patients without obesity in a wide range of medical conditions and present with more severe forms of COVID-19, with higher hospitalization rates and intensive services requirements. These findings can help guide preventive strategies for COVID-19 infection and complications and generate hypotheses for causal inference studies.
Subject(s)
COVID-19/epidemiology; Obesity/epidemiology; Adolescent; Adult; Aged; COVID-19/mortality; Cohort Studies; Comorbidity; Female; Hospitalization; Humans; Male; Middle Aged; Prevalence; Risk Factors; Spain/epidemiology; United Kingdom/epidemiology; United States/epidemiology; Young Adult
ABSTRACT
OBJECTIVE: Patients with autoimmune diseases were advised to shield to avoid coronavirus disease 2019 (COVID-19), but information on their prognosis is lacking. We characterized 30-day outcomes and mortality after hospitalization with COVID-19 among patients with prevalent autoimmune diseases, and compared outcomes after hospital admissions among similar patients with seasonal influenza. METHODS: A multinational network cohort study was conducted using electronic health records data from Columbia University Irving Medical Center (USA), Optum (USA), the Department of Veterans Affairs (USA) and the Information System for Research in Primary Care-Hospitalization Linked Data (Spain), and claims data from IQVIA Open Claims (USA) and Health Insurance Review and Assessment (South Korea). All patients with prevalent autoimmune diseases, diagnosed and/or hospitalized between January and June 2020 with COVID-19, and similar patients hospitalized with influenza in 2017-18 were included. Outcomes were death and complications within 30 days of hospitalization. RESULTS: We studied 133 589 patients diagnosed and 48 418 hospitalized with COVID-19 with prevalent autoimmune diseases. Most patients were female, aged ≥50 years with previous comorbidities. The prevalence of hypertension (45.5-93.2%), chronic kidney disease (14.0-52.7%) and heart disease (29.0-83.8%) was higher in hospitalized vs diagnosed patients with COVID-19. Compared with 70 660 patients hospitalized with influenza, those admitted with COVID-19 had more respiratory complications, including pneumonia and acute respiratory distress syndrome, and higher 30-day mortality (2.2-4.3% vs 6.32-24.6%). CONCLUSION: Compared with influenza, COVID-19 is a more severe disease, leading to more complications and higher mortality.
Subject(s)
Autoimmune Diseases/mortality; Autoimmune Diseases/virology; COVID-19/mortality; Hospitalization/statistics & numerical data; Influenza, Human/mortality; Adult; Aged; Aged, 80 and over; COVID-19/immunology; Cohort Studies; Female; Humans; Influenza, Human/immunology; Male; Middle Aged; Prevalence; Prognosis; Republic of Korea/epidemiology; SARS-CoV-2; Spain/epidemiology; United States/epidemiology; Young Adult
ABSTRACT
OBJECTIVES: Concern has been raised in the rheumatology community regarding recent regulatory warnings that HCQ used in the coronavirus disease 2019 pandemic could cause acute psychiatric events. We aimed to study whether there is a risk of incident depression, suicidal ideation or psychosis associated with HCQ as used for RA. METHODS: We performed a new-user cohort study using claims and electronic medical records from 10 sources and 3 countries (Germany, UK and USA). RA patients ≥18 years of age initiating HCQ were compared with those initiating SSZ (active comparator) and followed up in the short term (30 days) and long term (on treatment). Study outcomes included depression, suicide/suicidal ideation and hospitalization for psychosis. Propensity score stratification and calibration using negative control outcomes were used to address confounding. Cox models were fitted to estimate database-specific calibrated hazard ratios (HRs), with estimates pooled where I² < 40%. RESULTS: A total of 918 144 and 290 383 users of HCQ and SSZ, respectively, were included. No consistent risk of psychiatric events was observed with short-term HCQ (compared with SSZ) use, with meta-analytic HRs of 0.96 (95% CI 0.79, 1.16) for depression, 0.94 (95% CI 0.49, 1.77) for suicide/suicidal ideation and 1.03 (95% CI 0.66, 1.60) for psychosis. No consistent long-term risk was seen, with meta-analytic HRs of 0.94 (95% CI 0.71, 1.26) for depression, 0.77 (95% CI 0.56, 1.07) for suicide/suicidal ideation and 0.99 (95% CI 0.72, 1.35) for psychosis. CONCLUSION: HCQ as used to treat RA does not appear to increase the risk of depression, suicide/suicidal ideation or psychosis compared with SSZ. No effects were seen in the short or long term. Use at a higher dose or for different indications needs further investigation. TRIAL REGISTRATION: Registered with EU PAS (reference no. EUPAS34497; http://www.encepp.eu/encepp/viewResource.htm?id=34498). The full study protocol and analysis source code can be found at https://github.com/ohdsi-studies/Covid19EstimationHydroxychloroquine2.
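The pooling rule mentioned above (database-specific estimates combined where I² < 40%) can be sketched as an inverse-variance meta-analysis on the log hazard ratio scale; the HRs and confidence intervals below are invented for illustration, not study results.

```python
# Sketch of inverse-variance pooling of log hazard ratios with an I^2 check; made-up numbers.
import numpy as np

hrs = np.array([0.91, 1.05, 0.88, 1.10])          # per-database calibrated HRs
ci_upper = np.array([1.20, 1.40, 1.15, 1.55])     # their 95% CI upper bounds

log_hr = np.log(hrs)
se = (np.log(ci_upper) - log_hr) / 1.96           # back out standard errors from the CIs
w = 1 / se**2                                     # inverse-variance weights

pooled = np.sum(w * log_hr) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
q = np.sum(w * (log_hr - pooled) ** 2)            # Cochran's Q
i2 = max(0.0, (q - (len(hrs) - 1)) / q) * 100     # I^2 heterogeneity statistic

if i2 < 40:                                       # pool only under low heterogeneity
    lo, hi = np.exp(pooled - 1.96 * pooled_se), np.exp(pooled + 1.96 * pooled_se)
    print(f"pooled HR {np.exp(pooled):.2f} (95% CI {lo:.2f}-{hi:.2f}), I2 = {i2:.0f}%")
else:
    print(f"I2 = {i2:.0f}%: estimates reported per database instead of pooled")
```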
Subject(s)
Antirheumatic Agents/adverse effects; COVID-19 Drug Treatment; Depression/chemically induced; Depression/epidemiology; Hydroxychloroquine/adverse effects; Psychoses, Substance-Induced/epidemiology; Psychoses, Substance-Induced/etiology; Suicidal Ideation; Suicide/statistics & numerical data; Adolescent; Adult; Aged; Antirheumatic Agents/therapeutic use; Arthritis, Rheumatoid/drug therapy; Cohort Studies; Female; Germany; Humans; Hydroxychloroquine/therapeutic use; Male; Middle Aged; Risk Assessment; United Kingdom; United States; Young Adult
ABSTRACT
Artificial intelligence (AI) has huge potential to improve the health and well-being of people, but its adoption in clinical practice is still limited. Lack of transparency is identified as one of the main barriers to implementation, as clinicians should be confident that the AI system can be trusted. Explainable AI has the potential to overcome this issue and can be a step towards trustworthy AI. In this paper we review the recent literature to provide guidance to researchers and practitioners on the design of explainable AI systems for the health-care domain and to contribute to the formalization of the field of explainable AI. We argue that the reason for demanding explainability determines what should be explained, as this in turn determines the relative importance of the properties of explainability (i.e. interpretability and fidelity). Based on this, we propose a framework to guide the choice between classes of explainable AI methods (explainable modelling versus post-hoc explanation; model-based, attribution-based, or example-based explanations; global and local explanations). Furthermore, we find that quantitative evaluation metrics, which are important for objective standardized evaluation, are still lacking for some properties (e.g. clarity) and types of explanations (e.g. example-based methods). We conclude that explainable modelling can contribute to trustworthy AI, but the benefits of explainability still need to be proven in practice, and complementary measures might be needed to create trustworthy AI in health care (e.g. reporting data quality, performing extensive (external) validation, and regulation).
Subject(s)
Artificial Intelligence; Delivery of Health Care; Humans
ABSTRACT
PURPOSE: Real-world studies describing the use of first-, second- and third-line therapies for the management and symptomatic treatment of dementia are lacking. This retrospective cohort study describes the first-, second- and third-line therapies used for the management and symptomatic treatment of dementia, and in particular Alzheimer's disease. METHODS: Medical records of patients with newly diagnosed dementia between 1997 and 2017 were collected using four databases from the UK, Denmark, Italy and the Netherlands. RESULTS: We identified 191,933 newly diagnosed dementia patients in the four databases between 1997 and 2017, of whom 39,836 (IPCI (NL): 3281, HSD (IT): 1601, AUH (DK): 4474, THIN (UK): 30,480) fulfilled the inclusion criteria; of these, 21,131 had received a specific diagnosis of Alzheimer's disease. The most common first-line therapies initiated within a year (± 365 days) of diagnosis were acetylcholinesterase inhibitors, namely rivastigmine in IPCI and donepezil in HSD and THIN, and the N-methyl-D-aspartate receptor blocker memantine in AUH. CONCLUSION: We provide a real-world insight into the heterogeneous management and treatment pathways of newly diagnosed dementia patients, and a subset of Alzheimer's disease patients, from across Europe.
Subject(s)
Alzheimer Disease; Electronic Health Records; Alzheimer Disease/diagnosis; Alzheimer Disease/drug therapy; Europe; Galantamine; Humans; Indans; Italy; Netherlands; Phenylcarbamates; Piperidines; Retrospective Studies
ABSTRACT
BACKGROUND: Researchers developing prediction models are faced with numerous design choices that may impact model performance. One key decision is how to include patients who are lost to follow-up. In this paper we perform a large-scale empirical evaluation investigating the impact of this decision. In addition, we aim to provide guidelines for how to deal with loss to follow-up. METHODS: We generate a partially synthetic dataset with complete follow-up and simulate loss to follow-up based either on random selection or on selection based on comorbidity. In addition to our synthetic data study, we investigate 21 real-world data prediction problems. We compare four simple strategies for developing models when using a cohort design that encounters loss to follow-up. Three strategies employ a binary classifier with data that: (1) include all patients (including those lost to follow-up), (2) exclude all patients lost to follow-up, or (3) exclude only patients lost to follow-up who do not have the outcome before being lost to follow-up. The fourth strategy uses a survival model with data that include all patients. We empirically evaluate discrimination and calibration performance. RESULTS: The partially synthetic data study results show that excluding patients who are lost to follow-up can introduce bias when loss to follow-up is common and does not occur at random. However, when loss to follow-up was completely at random, the choice of how to address it had negligible impact on model discrimination performance. Our empirical real-world data results showed that the four design choices investigated to deal with loss to follow-up resulted in comparable performance when the time-at-risk was 1 year, but demonstrated differential bias when we looked at a 3-year time-at-risk. Removing patients who are lost to follow-up before experiencing the outcome but keeping patients who are lost to follow-up after the outcome can bias a model and should be avoided. CONCLUSION: Based on this study we therefore recommend (1) developing models using data that include patients who are lost to follow-up and (2) evaluating the discrimination and calibration of models twice: on a test set including patients lost to follow-up and on a test set excluding them.
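The three binary-classifier strategies can be expressed as simple cohort filters, as in the sketch below on a toy table; the column names are illustrative and not a particular CDM schema. The fourth strategy would instead fit a survival model (e.g., Cox regression) on all patients, with censoring at loss to follow-up.

```python
# Sketch of the three binary-classifier design choices as pandas filters; toy data.
import pandas as pd

cohort = pd.DataFrame({
    "person_id":           [1, 2, 3, 4, 5, 6],
    "outcome":             [0, 1, 0, 1, 0, 0],   # outcome observed during time-at-risk
    "lost_to_followup":    [0, 0, 1, 1, 1, 0],   # left the database before end of time-at-risk
    "outcome_before_loss": [0, 0, 0, 1, 0, 0],
})

# (1) include everyone, labelling those lost to follow-up with what was observed
design1 = cohort
# (2) exclude all patients lost to follow-up
design2 = cohort[cohort["lost_to_followup"] == 0]
# (3) exclude only those lost to follow-up without the outcome before being lost
design3 = cohort[(cohort["lost_to_followup"] == 0) | (cohort["outcome_before_loss"] == 1)]

for name, d in [("all patients", design1), ("exclude LTFU", design2), ("exclude LTFU w/o outcome", design3)]:
    print(f"{name}: n = {len(d)}, outcome rate = {d['outcome'].mean():.2f}")
```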
Subject(s)
Lost to Follow-Up; Bias; Calibration; Cohort Studies; Humans; Prognosis
ABSTRACT
BACKGROUND: There are sparse real-world data on severe asthma exacerbations (SAE) in children. This multinational cohort study assessed the incidence of and risk factors for SAE and the incidence of asthma-related rehospitalization in children with asthma. METHODS: Asthma patients 5-17 years old with ≥1 year of follow-up were identified in six European electronic databases from the Netherlands, Italy, the UK, Denmark and Spain in 2008-2013. Asthma was defined as ≥1 asthma-specific disease code within 3 months of prescription/dispensing of asthma medication. Severe asthma was defined as high-dose inhaled corticosteroids plus a second controller. SAE was defined by use of systemic corticosteroids, an emergency department visit and/or hospitalization, all for reason of asthma. Risk factors for SAE were estimated by Poisson regression analyses. RESULTS: The cohort consisted of 212 060 paediatric asthma patients contributing 678 625 patient-years (PY). SAE rates ranged between 17 and 198/1000 PY and were higher in severe asthma, and highest in severe asthma patients with a history of exacerbations. Prior SAE (incidence rate ratio 3-45) and younger age increased the SAE risk in all countries, whereas obesity, atopy and GERD were risk factors in some but not all countries. Rehospitalization rates were up to 79% within 1 year. CONCLUSIONS: In a real-world setting, SAE rates were highest in children with severe asthma with a history of exacerbations. Many severe asthma patients were rehospitalized within 1 year. Asthma management focusing on prevention of SAE is important to reduce the burden of asthma.
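Risk factors were estimated with Poisson regression; the sketch below shows the usual formulation with log person-years as an offset so that exponentiated coefficients are incidence rate ratios. The covariates, effect sizes and data are synthetic assumptions, not study values.

```python
# Sketch of a Poisson rate model with a person-years offset; synthetic data only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 20_000
df = pd.DataFrame({
    "prior_sae": rng.binomial(1, 0.1, n),
    "age":       rng.integers(5, 18, n),
    "py":        rng.uniform(0.5, 3.0, n),        # person-years of follow-up
})
rate = 0.05 * np.exp(1.2 * df["prior_sae"] - 0.05 * (df["age"] - 5))
df["sae_count"] = rng.poisson(rate * df["py"])

fit = smf.poisson("sae_count ~ prior_sae + age", data=df,
                  offset=np.log(df["py"])).fit(disp=False)
print(np.round(np.exp(fit.params), 2))            # incidence rate ratios
```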
Subject(s)
Anti-Asthmatic Agents; Asthma; Adolescent; Adrenal Cortex Hormones/therapeutic use; Anti-Asthmatic Agents/therapeutic use; Asthma/drug therapy; Child; Child, Preschool; Cohort Studies; Disease Progression; Europe/epidemiology; Female; Humans; Incidence; Male; Risk Factors
ABSTRACT
BACKGROUND: Recent evidence suggests that there is often substantial variation in the benefits and harms across a trial population. We aimed to identify regression modeling approaches that assess heterogeneity of treatment effect within a randomized clinical trial. METHODS: We performed a literature review using a broad search strategy, complemented by suggestions of a technical expert panel. RESULTS: The approaches are classified into 3 categories: 1) Risk-based methods (11 papers) use only prognostic factors to define patient subgroups, relying on the mathematical dependency of the absolute risk difference on baseline risk; 2) Treatment effect modeling methods (9 papers) use both prognostic factors and treatment effect modifiers to explore characteristics that interact with the effects of therapy on a relative scale. These methods couple data-driven subgroup identification with approaches to prevent overfitting, such as penalization or use of separate data sets for subgroup identification and effect estimation. 3) Optimal treatment regime methods (12 papers) focus primarily on treatment effect modifiers to classify the trial population into those who benefit from treatment and those who do not. Finally, we also identified papers which describe model evaluation methods (4 papers). CONCLUSIONS: Three classes of approaches were identified to assess heterogeneity of treatment effect. Methodological research, including both simulations and empirical evaluations, is required to compare the available methods in different settings and to derive well-informed guidance for their application in RCT analysis.
Subject(s)
Research Design; Humans; Randomized Controlled Trials as Topic
ABSTRACT
BACKGROUND: To demonstrate how the Observational Health Data Sciences and Informatics (OHDSI) collaborative network and standardization can be utilized to scale up external validation of patient-level prediction models by enabling validation across a large number of heterogeneous observational healthcare datasets. METHODS: Five previously published prognostic models (ATRIA, CHADS2, CHADS2VASC, Q-Stroke and Framingham) that predict future risk of stroke in patients with atrial fibrillation were replicated using the OHDSI frameworks. A network study was run that enabled the five models to be externally validated across nine observational healthcare datasets spanning three countries and five independent sites. RESULTS: The five existing models were integrated into the OHDSI framework for patient-level prediction, and they obtained mean c-statistics ranging between 0.57 and 0.63 across the 6 databases with sufficient data to predict stroke within 1 year of initial atrial fibrillation diagnosis for females with atrial fibrillation. This was comparable with existing validation studies. The validation network study was run across nine datasets within 60 days once the models were replicated. An R package for the study was published at https://github.com/OHDSI/StudyProtocolSandbox/tree/master/ExistingStrokeRiskExternalValidation. CONCLUSION: This study demonstrates the ability to scale up external validation of patient-level prediction models using a collaboration of researchers and a data standardization that enables models to be readily shared across data sites. External validation is necessary to understand the transportability or reproducibility of a prediction model, but without collaborative approaches it can take three or more years for a model to be validated by one independent researcher. In this paper we show it is possible to both scale up and speed up external validation by showing how validation can be done across multiple databases in less than 2 months. We recommend that researchers developing new prediction models use the OHDSI network to externally validate their models.
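To give a sense of the kind of model being replicated, the sketch below implements the CHADS2 score as a simple additive function; the boolean patient fields are illustrative inputs rather than the OMOP concept-set logic used in the actual study package.

```python
# Sketch of the CHADS2 stroke risk score as a simple additive function.
def chads2(chf: bool, hypertension: bool, age: int,
           diabetes: bool, prior_stroke_or_tia: bool) -> int:
    """CHADS2: 1 point each for CHF, hypertension, age >= 75 and diabetes;
    2 points for prior stroke or TIA."""
    score = int(chf) + int(hypertension) + int(age >= 75) + int(diabetes)
    score += 2 * int(prior_stroke_or_tia)
    return score

print(chads2(chf=False, hypertension=True, age=78, diabetes=False,
             prior_stroke_or_tia=True))   # -> 4
```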