ABSTRACT
BACKGROUND: Electronic fetal monitoring is used in most US hospital births but has significant limitations in achieving its intended goal of preventing intrapartum hypoxic-ischemic injury. Novel deep learning techniques can improve complex data processing and pattern recognition in medicine. OBJECTIVE: This study aimed to apply deep learning approaches to develop and validate a model to predict fetal acidemia from electronic fetal monitoring data. STUDY DESIGN: The database was created using intrapartum electronic fetal monitoring data from 2006 to 2020 from a large, multisite academic health system. Data were divided into training and testing sets with equal distribution of acidemic cases. Several different deep learning architectures were explored. The primary outcome was umbilical artery acidemia, which was investigated at 4 clinically meaningful thresholds: 7.20, 7.15, 7.10, and 7.05, along with base excess. The receiver operating characteristic curves were generated with the area under the receiver operating characteristic assessed to determine the performance of the models. External validation was performed using a publicly available Czech database of electronic fetal monitoring data. RESULTS: A total of 124,777 electronic fetal monitoring files were available, of which 77,132 had <30% missingness in the last 60 minutes of the electronic fetal monitoring tracing. Of these, 21,041 were matched to a corresponding umbilical cord gas result, of which 10,182 were time-stamped within 30 minutes of the last electronic fetal monitoring reading and composed the final dataset. The prevalence rates of the outcomes in the data were 20.9% with a pH of <7.2, 9.1% with a pH of <7.15, 3.3% with a pH of <7.10, and 1.3% with a pH of <7.05. The best performing model achieved an area under the receiver operating characteristic of 0.85 at a pH threshold of <7.05. 
When predicting the joint outcome of both pH of <7.05 and base excess of less than -10 meq/L, an area under the receiver operating characteristic of 0.89 was achieved. When predicting both pH of <7.20 and base excess of less than -10 meq/L, an area under the receiver operating characteristic of 0.87 was achieved. At a pH of <7.15 and a positive predictive value of 30%, the model achieved a sensitivity of 90% and a specificity of 48%. CONCLUSION: The application of deep learning methods to intrapartum electronic fetal monitoring analysis achieves promising performance in predicting fetal acidemia. This technology could help improve the accuracy and consistency of electronic fetal monitoring interpretation.
ABSTRACT
BACKGROUND: The coronavirus disease 2019 (COVID-19) pandemic challenges hospital leaders to make time-sensitive, critical decisions about clinical operations and resource allocations. OBJECTIVE: To estimate the timing of surges in clinical demand and the best- and worst-case scenarios of local COVID-19-induced strain on hospital capacity, and thus inform clinical operations and staffing demands and identify when hospital capacity would be saturated. DESIGN: Monte Carlo simulation instantiation of a susceptible, infected, removed (SIR) model with a 1-day cycle. SETTING: 3 hospitals in an academic health system. PATIENTS: All people living in the greater Philadelphia region. MEASUREMENTS: The COVID-19 Hospital Impact Model (CHIME) (http://penn-chime.phl.io) SIR model was used to estimate the time from 23 March 2020 until hospital capacity would probably be exceeded, and the intensity of the surge, including for intensive care unit (ICU) beds and ventilators. RESULTS: Using patients with COVID-19 alone, CHIME estimated that it would be 31 to 53 days before demand exceeds existing hospital capacity. In best- and worst-case scenarios of surges in the number of patients with COVID-19, the needed total capacity for hospital beds would reach 3131 to 12 650 across the 3 hospitals, including 338 to 1608 ICU beds and 118 to 599 ventilators. LIMITATIONS: Model parameters were taken directly or derived from published data across heterogeneous populations and practice environments and from the health system's historical data. CHIME does not incorporate more transition states to model infection severity, social networks to model transmission dynamics, or geographic information to account for spatial patterns of human interaction. CONCLUSION: Publicly available and designed for hospital operations leaders, this modeling tool can inform preparations for capacity strain during the early days of a pandemic. 
PRIMARY FUNDING SOURCE: University of Pennsylvania Health System and the Palliative and Advanced Illness Research Center.
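The SIR structure with a 1-day cycle described above can be illustrated with a minimal deterministic sketch. This is not the CHIME implementation (which the abstract describes as a Monte Carlo instantiation); the function names, the transmission and removal rates, the hospitalized fraction, and the capacity figure are all hypothetical placeholders, and treating hospital demand as a fixed share of currently infected people is a simplifying assumption.

```python
def simulate_sir(population, infected0, beta, gamma, days):
    """Discrete-time SIR with a 1-day step; returns a list of (S, I, R) tuples."""
    s, i, r = float(population - infected0), float(infected0), 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population  # S -> I during this day
        new_removals = gamma * i                    # I -> R during this day
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history


def first_day_over_capacity(history, hospitalized_fraction, capacity):
    """First day on which the assumed hospitalized share of I exceeds capacity, else None."""
    for day, (_, infected, _) in enumerate(history):
        if infected * hospitalized_fraction > capacity:
            return day
    return None


# All values below are hypothetical placeholders, not CHIME's parameters.
history = simulate_sir(population=1_000_000, infected0=100, beta=0.30, gamma=0.10, days=200)
day = first_day_over_capacity(history, hospitalized_fraction=0.05, capacity=1000)
```

Running the sketch over a range of `beta` values would give a crude best-case and worst-case band, which is the role the Monte Carlo sampling plays in the study.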
Subjects
Betacoronavirus , Coronavirus Infections/therapy , Decision Making , Intensive Care Units/organization & administration , Organizational Models , Pandemics , Viral Pneumonia/therapy , COVID-19 , Coronavirus Infections/epidemiology , Humans , Viral Pneumonia/epidemiology , SARS-CoV-2 , United States/epidemiology
ABSTRACT
BACKGROUND: Automated texting platforms have emerged as a tool to facilitate communication between patients and health care providers with variable effects on achieving target blood pressure (BP). Understanding differences in the way patients interact with these communication platforms can inform their use and design for hypertension management. OBJECTIVE: Our primary aim was to explore the unique phenotypes of patient interactions with an automated text messaging platform for BP monitoring. Our secondary aim was to estimate associations between interaction phenotypes and BP control. METHODS: This study was a secondary analysis of data from a randomized controlled trial for adults with poorly controlled hypertension. A total of 201 patients with established primary care were assigned to the automated texting platform; messages exchanged throughout the 4-month program were analyzed. We used the k-means clustering algorithm to characterize two different interaction phenotypes: program conformity and engagement style. First, we identified unique clusters signifying differences in program conformity based on the frequency over time of error alerts, which were generated to patients when they deviated from the requested text message format (eg, ###/## for BP). Second, we explored overall engagement styles, defined by error alerts and responsiveness to text prompts, unprompted messages, and word count averages. Finally, we applied the chi-square test to identify associations between each interaction phenotype and achieving the target BP. RESULTS: We observed 3 categories of program conformity based on their frequency of error alerts: those who immediately and consistently submitted texts without system errors (perfect users, 51/201), those who did so after an initial learning period (adaptive users, 66/201), and those who consistently submitted messages generating errors to the platform (nonadaptive users, 38/201). 
Next, we observed 3 categories of engagement style: the enthusiast, who tended to submit unprompted messages with high word counts (17/155); the student, who inconsistently engaged (35/155); and the minimalist, who engaged only when prompted (103/155). Of all 6 phenotypes, we observed a statistically significant association between patients demonstrating the minimalist communication style (high adherence, few unprompted messages, limited information sharing) and achieving target BP (P<.001). CONCLUSIONS: We identified unique interaction phenotypes among patients engaging with an automated text message platform for remote BP monitoring. Only the minimalist communication style was associated with achieving target BP. Identifying and understanding interaction phenotypes may be useful for tailoring future automated texting interactions and designing future interventions to achieve better BP control.
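The clustering step can be sketched with a plain implementation of k-means (Lloyd's algorithm). The abstract does not give the feature encoding, the initialization, or the software used, so the monthly error-alert counts, the starting centroids, and the function name below are hypothetical and serve only to show how conformity patterns over time could separate into three groups.

```python
import math


def kmeans(points, centroids, iterations=100):
    """Lloyd's algorithm; `centroids` are the caller's initial guesses."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical error-alert counts per patient for months 1 to 4 of the program.
points = [
    (0, 0, 0, 0), (0, 1, 0, 0),  # almost no errors at any time
    (5, 2, 0, 0), (4, 1, 0, 0),  # errors early, none later
    (4, 5, 4, 5), (5, 4, 5, 4),  # errors throughout
]
labels, centroids = kmeans(points, centroids=[points[0], points[2], points[4]])
```

With these toy inputs the three clusters line up with the "perfect", "adaptive", and "nonadaptive" patterns named in the abstract; real data would need a principled choice of k and of initialization.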
Subjects
Blood Pressure/physiology , Hypertension/therapy , Physiologic Monitoring/methods , Text Messaging/standards , Adolescent , Adult , Aged , Female , Humans , Male , Middle Aged , Phenotype , Young Adult
ABSTRACT
BACKGROUND: For real-time monitoring of hospital patients, high-quality inference of patients' health status using all information available from clinical covariates and lab test results is essential to enable successful medical interventions and improve patient outcomes. Developing a computational framework that can learn from observational large-scale electronic health records (EHRs) and make accurate real-time predictions is a critical step. In this work, we develop and explore a Bayesian nonparametric model based on multi-output Gaussian process (GP) regression for hospital patient monitoring. METHODS: We propose MedGP, a statistical framework that incorporates 24 clinical covariates and supports a rich reference data set from which relationships between observed covariates may be inferred and exploited for high-quality inference of patient state over time. To do this, we develop a highly structured sparse GP kernel to enable tractable computation over tens of thousands of time points while estimating correlations among clinical covariates, patients, and periodicity in patient observations. MedGP has a number of benefits over current methods, including (i) not requiring an alignment of the time series data, (ii) quantifying confidence regions in the predictions, (iii) exploiting a vast and rich database of patients, and (iv) inferring interpretable relationships among clinical covariates. RESULTS: We evaluate and compare results from MedGP on the task of online prediction for three patient subgroups from two medical data sets across 8,043 patients. We find MedGP improves online prediction over baseline and state-of-the-art methods for nearly all covariates across different disease subgroups and hospitals. CONCLUSIONS: The MedGP framework is robust and efficient in estimating the temporal dependencies from sparse and irregularly sampled medical time series data for online prediction. The publicly available code is at https://github.com/bee-hive/MedGP .
Subjects
Algorithms , Statistical Models , Bayes Theorem , Normal Distribution
ABSTRACT
OBJECTIVE: To assess clinician perceptions of a machine learning-based early warning system to predict severe sepsis and septic shock (Early Warning System 2.0). DESIGN: Prospective observational study. SETTING: Tertiary teaching hospital in Philadelphia, PA. PATIENTS: Non-ICU admissions November-December 2016. INTERVENTIONS: During a 6-week study period conducted 5 months after Early Warning System 2.0 alert implementation, nurses and providers were surveyed twice about their perceptions of the alert's helpfulness and impact on care, first within 6 hours of the alert, and again 48 hours after the alert. MEASUREMENTS AND MAIN RESULTS: For the 362 alerts triggered, 180 nurses (50% response rate) and 107 providers (30% response rate) completed the first survey. Of these, 43 nurses (24% response rate) and 44 providers (41% response rate) completed the second survey. Few (24% nurses, 13% providers) identified new clinical findings after responding to the alert. Perceptions of the presence of sepsis at the time of alert were discrepant between nurses (13%) and providers (40%). The majority of clinicians reported no change in perception of the patient's risk for sepsis (55% nurses, 62% providers). A third of nurses (30%) but few providers (9%) reported the alert changed management. Almost half of nurses (42%) but less than a fifth of providers (16%) found the alert helpful at 6 hours. CONCLUSIONS: In general, clinical perceptions of Early Warning System 2.0 were poor. Nurses and providers differed in their perceptions of sepsis and alert benefits. These findings highlight the challenges of achieving acceptance of predictive and machine learning-based sepsis alerts.
Subjects
Algorithms , Attitude of Health Personnel , Clinical Decision Support Systems , Machine Learning , Sepsis/diagnosis , Septic Shock/diagnosis , Computer-Assisted Diagnosis , Electronic Health Records , Teaching Hospitals , Humans , Hospital Medical Staff , Hospital Nursing Staff , Nursing Practice Patterns/statistics & numerical data , Physician Practice Patterns/statistics & numerical data , Prospective Studies , Text Messaging
ABSTRACT
OBJECTIVES: Develop and implement a machine learning algorithm to predict severe sepsis and septic shock and evaluate the impact on clinical practice and patient outcomes. DESIGN: Retrospective cohort for algorithm derivation and validation, pre-post impact evaluation. SETTING: Tertiary teaching hospital system in Philadelphia, PA. PATIENTS: All non-ICU admissions; algorithm derivation July 2011 to June 2014 (n = 162,212); algorithm validation October to December 2015 (n = 10,448); silent versus alert comparison January 2016 to February 2017 (silent n = 22,280; alert n = 32,184). INTERVENTIONS: A random-forest classifier, derived and validated using electronic health record data, was deployed both silently and later with an alert to notify clinical teams of sepsis prediction. MEASUREMENT AND MAIN RESULT: Patients identified for training the algorithm were required to have International Classification of Diseases, 9th Edition codes for severe sepsis or septic shock and a positive blood culture during their hospital encounter with either a lactate greater than 2.2 mmol/L or a systolic blood pressure less than 90 mm Hg. The algorithm demonstrated a sensitivity of 26% and specificity of 98%, with a positive predictive value of 29% and positive likelihood ratio of 13. The alert resulted in a small statistically significant increase in lactate testing and IV fluid administration. There was no significant difference in mortality, discharge disposition, or transfer to ICU, although there was a reduction in time-to-ICU transfer. CONCLUSIONS: Our machine learning algorithm can predict, with low sensitivity but high specificity, the impending occurrence of severe sepsis and septic shock. Algorithm-generated predictive alerts modestly impacted clinical measures. Next steps include describing clinical perception of this tool and optimizing algorithm design and delivery.
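The reported operating characteristics are linked by standard identities, which the short sketch below works through. The positive likelihood ratio follows directly from the stated sensitivity and specificity. The implied prevalence is a back-calculation from rounded figures and is not reported in the abstract, so it should be read as approximate; the function names are illustrative.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def implied_prevalence(ppv, lr_positive):
    """Invert post-test odds = pre-test odds * LR+ to recover the pre-test probability."""
    post_odds = ppv / (1.0 - ppv)
    pre_odds = post_odds / lr_positive
    return pre_odds / (1.0 + pre_odds)


# Sensitivity 26%, specificity 98%, PPV 29% as stated in the abstract.
lr_pos = positive_likelihood_ratio(0.26, 0.98)
prevalence = implied_prevalence(0.29, lr_pos)
```

Here `lr_pos` evaluates to about 13, matching the reported value, and `prevalence` comes out at roughly 3%, which would be consistent with a rare outcome among non-ICU admissions.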
Subjects
Algorithms , Clinical Decision Support Systems , Computer-Assisted Diagnosis , Machine Learning , Sepsis/diagnosis , Septic Shock/diagnosis , Cohort Studies , Electronic Health Records , Teaching Hospitals , Humans , Retrospective Studies , Sensitivity and Specificity , Text Messaging
ABSTRACT
BACKGROUND: Development of electronic health record (EHR) prediction models to improve palliative care delivery is on the rise, yet the clinical impact of such models has not been evaluated. OBJECTIVE: To assess the clinical impact of triggering palliative care using an EHR prediction model. DESIGN: Pilot prospective before-after study on the general medical wards at an urban academic medical center. PARTICIPANTS: Adults with a predicted probability of 6-month mortality of ≥ 0.3. INTERVENTION: Triggered (with opt-out) palliative care consult on hospital day 2. MAIN MEASURES: Frequencies of consults, advance care planning (ACP) documentation, home palliative care and hospice referrals, code status changes, and pre-consult length of stay (LOS). KEY RESULTS: The control and intervention periods included 8 weeks each and 138 admissions and 134 admissions, respectively. Characteristics between the groups were similar, with a mean (standard deviation) risk of 6-month mortality of 0.5 (0.2). Seventy-seven (57%) triggered consults were accepted by the primary team and 8 consults were requested per usual care during the intervention period. Compared to historical controls, consultation increased by 74% (22 [16%] vs 85 [63%], P < .001), median (interquartile range) pre-consult LOS decreased by 1.4 days (2.6 [1.1, 6.2] vs 1.2 [0.8, 2.7], P = .02), ACP documentation increased by 38% (23 [17%] vs 37 [28%], P = .03), and home palliative care referrals increased by 61% (9 [7%] vs 23 [17%], P = .01). There were no differences between the control and intervention groups in hospice referrals (14 [10] vs 22 [16], P = .13), code status changes (42 [30] vs 39 [29]; P = .81), or consult requests for lower risk (< 0.3) patients (48/1004 [5] vs 33/798 [4]; P = .48). CONCLUSIONS: Targeting hospital-based palliative care using an EHR mortality prediction model is a clinically promising approach to improve the quality of care among seriously ill medical patients. 
More evidence is needed to determine the generalizability of this approach and its impact on patient- and caregiver-reported outcomes.
Subjects
Decision Support Techniques , Palliative Care/organization & administration , Patient Acceptance of Health Care/statistics & numerical data , Aged , Aged 80 and over , Electronic Health Records , Female , Hospitalization/statistics & numerical data , Humans , Male , Middle Aged , Non-Randomized Controlled Trials as Topic , Pilot Projects , Prospective Studies , Referral and Consultation/organization & administration , Referral and Consultation/statistics & numerical data
Subjects
Coronavirus Infections , Human Influenza , Pandemics , Viral Pneumonia , Betacoronavirus , COVID-19 , Hospitals , Humans , Human Influenza/epidemiology , SARS-CoV-2
ABSTRACT
OBJECTIVE: Evaluate predictive performance of an electronic health record (EHR)-based, inpatient 6-month mortality risk model developed to trigger palliative care consultation among patient groups stratified by age, race, ethnicity, insurance and socioeconomic status (SES), which may vary due to social forces (eg, racism) that shape health, healthcare and health data. DESIGN: Retrospective evaluation of prediction model. SETTING: Three urban hospitals within a single health system. PARTICIPANTS: All patients ≥18 years admitted between 1 January and 31 December 2017, excluding observation, obstetric, rehabilitation and hospice (n=58 464 encounters, 41 327 patients). MAIN OUTCOME MEASURES: General performance metrics (c-statistic, integrated calibration index (ICI), Brier Score) and additional measures relevant to health equity (accuracy, false positive rate (FPR), false negative rate (FNR)). RESULTS: For black versus non-Hispanic white patients, the model's accuracy was higher (0.051, 95% CI 0.044 to 0.059), FPR lower (-0.060, 95% CI -0.067 to -0.052) and FNR higher (0.049, 95% CI 0.023 to 0.078). A similar pattern was observed among patients who were Hispanic, younger, with Medicaid/missing insurance, or living in low SES zip codes. No consistent differences emerged in c-statistic, ICI or Brier Score. Younger age had the second-largest effect size in the mortality prediction model, and there were large standardised group differences in age (eg, 0.32 for non-Hispanic white versus black patients), suggesting age may contribute to systematic differences in the predicted probabilities between groups. CONCLUSIONS: An EHR-based mortality risk model was less likely to identify some marginalised patients as potentially benefiting from palliative care, with younger age pinpointed as a possible mechanism. 
Evaluating predictive performance is a critical preliminary step in addressing algorithmic inequities in healthcare, which must also include evaluating clinical impact, and governance and regulatory structures for oversight, monitoring and accountability.
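The equity-oriented measures named above (accuracy, FPR, FNR) can be computed per group from a confusion matrix and then differenced against a reference group. The sketch below assumes hypothetical counts chosen only to reproduce the direction of the reported pattern (higher accuracy, lower FPR, higher FNR); none of the numbers come from the study, and the confidence intervals reported in the abstract are not reproduced here.

```python
def rates(tp, fp, tn, fn):
    """Accuracy, false positive rate, and false negative rate from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "fpr": fp / (fp + tn),  # share of survivors flagged as high risk
        "fnr": fn / (fn + tp),  # share of decedents not flagged
    }


def group_differences(group, reference):
    """Metric-by-metric difference: group minus reference."""
    return {name: group[name] - reference[name] for name in group}


# Hypothetical confusion counts for a comparison group and a reference group.
group_a = rates(tp=30, fp=50, tn=900, fn=20)
reference = rates(tp=60, fp=120, tn=800, fn=20)
diff = group_differences(group_a, reference)
```

A higher FNR in one group means fewer of that group's patients who died within 6 months were flagged, which is the sense in which the model was "less likely to identify" them for palliative care.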
Subjects
Electronic Health Records , Palliative Care , Pregnancy , Female , United States , Humans , Retrospective Studies , Ethnicity , Referral and Consultation
ABSTRACT
Sudden changes in health care utilization during the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic may have impacted the performance of clinical predictive models that were trained prior to the pandemic. In this study, we evaluated the performance over time of a machine learning, electronic health record-based mortality prediction algorithm currently used in clinical practice to identify patients with cancer who may benefit from early advance care planning conversations. We show that during the pandemic period, algorithm identification of high-risk patients had a substantial and sustained decline. Decreases in laboratory utilization during the peak of the pandemic may have contributed to drift. Calibration and overall discrimination did not markedly decline during the pandemic. This argues for careful attention to the performance and retraining of predictive algorithms that use inputs from the pandemic period.
Subjects
COVID-19 , Neoplasms , Humans , Algorithms , Neoplasms/mortality , Pandemics , SARS-CoV-2 , Machine Learning
ABSTRACT
Importance: Serious illness conversations (SICs) between oncology clinicians and patients are associated with improved quality of life and may reduce aggressive end-of-life care. However, most patients with cancer die without a documented SIC. Objective: To test the impact of behavioral nudges to clinicians to prompt SICs on the SIC rate and end-of-life outcomes among patients at high risk of death within 180 days (high-risk patients) as identified by a machine learning algorithm. Design, Setting, and Participants: This prespecified 40-week analysis of a stepped-wedge randomized clinical trial conducted between June 17, 2019, and April 20, 2020 (including 16 weeks of intervention rollout and 24 weeks of follow-up), included 20 506 patients with cancer representing 41 021 encounters at 9 tertiary or community-based medical oncology clinics in a large academic health system. The current analyses were conducted from June 1, 2021, to May 31, 2022. Intervention: High-risk patients were identified using a validated electronic health record machine learning algorithm to predict 6-month mortality. The intervention consisted of (1) weekly emails to clinicians comparing their SIC rates for all patients against peers' rates, (2) weekly lists of high-risk patients, and (3) opt-out text messages to prompt SICs before encounters with high-risk patients. Main Outcomes and Measures: The primary outcome was SIC rates for all and high-risk patient encounters; secondary end-of-life outcomes among decedents included inpatient death, hospice enrollment and length of stay, and intensive care unit admission and systemic therapy close to death. Intention-to-treat analyses were adjusted for clinic and wedge fixed effects and clustered at the oncologist level.
Results: The study included 20 506 patients (mean [SD] age, 60.0 [14.0] years) and 41 021 patient encounters: 22 259 (54%) encounters with female patients, 28 907 (70.5%) with non-Hispanic White patients, and 5520 (13.5%) with high-risk patients; 1417 patients (6.9%) died by the end of follow-up. There were no meaningful differences in demographic characteristics in the control and intervention periods. Among high-risk patient encounters, the unadjusted SIC rates were 3.4% (59 of 1754 encounters) in the control period and 13.5% (510 of 3765 encounters) in the intervention period. In adjusted analyses, the intervention was associated with increased SICs for all patients (adjusted odds ratio, 2.09 [95% CI, 1.53-2.87]; P < .001) and decreased end-of-life systemic therapy (7.5% [72 of 957 patients] vs 10.4% [24 of 231 patients]; adjusted odds ratio, 0.25 [95% CI, 0.11-0.57]; P = .001) relative to controls, but there was no effect on hospice enrollment or length of stay, inpatient death, or end-of-life ICU use. Conclusions and Relevance: In this randomized clinical trial, a machine learning-based behavioral intervention and behavioral nudges to clinicians led to an increase in SICs and reduction in end-of-life systemic therapy but no changes in other end-of-life outcomes among outpatients with cancer. These results suggest that machine learning and behavioral nudges can lead to long-lasting improvements in cancer care delivery. Trial Registration: ClinicalTrials.gov Identifier: NCT03984773.
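The unadjusted SIC rates for high-risk encounters can be checked directly from the reported counts, and the same counts give a crude odds ratio. This sketch is plain arithmetic on the published numbers; the crude odds ratio is not reported in the abstract, ignores clustering by oncologist and the clinic and wedge fixed effects, and is not comparable to the adjusted odds ratio of 2.09, which refers to all patients. Function names are illustrative.

```python
def rate(events, total):
    """Simple proportion of encounters with the event."""
    return events / total


def odds_ratio(events_a, total_a, events_b, total_b):
    """Crude odds ratio of group A relative to group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


# Counts for high-risk encounters as reported in the abstract.
control_rate = rate(59, 1754)
intervention_rate = rate(510, 3765)
unadjusted_or = odds_ratio(510, 3765, 59, 1754)
```

The two rates round to 3.4% and 13.5%, matching the abstract, and the crude odds ratio is about 4.5.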
Subjects
Neoplasms , Quality of Life , Humans , Female , Middle Aged , Neoplasms/therapy , Communication , Machine Learning , Death
ABSTRACT
Both provider- and protocol-driven electrolyte replacement have been linked to the over-prescription of ubiquitous electrolytes. Here, we describe the development and retrospective validation of a data-driven clinical decision support tool that uses reinforcement learning (RL) algorithms to recommend patient-tailored electrolyte replacement policies for ICU patients. We used electronic health records (EHR) data that originated from two institutions (UPHS; MIMIC-IV). The tool uses a set of patient characteristics, such as their physiological and pharmacological state, a pre-defined set of possible repletion actions, and a set of clinical goals to present clinicians with a recommendation for the route and dose of an electrolyte. RL-driven electrolyte repletion substantially reduces the frequency of magnesium and potassium replacements (up to 60%), adjusts the timing of interventions in all three electrolytes considered (potassium, magnesium, and phosphate), and shifts them towards orally administered repletion over intravenous replacement. This shift in recommended treatment limits risk of the potentially harmful effects of over-repletion and implies monetary savings. Overall, the RL-driven electrolyte repletion recommendations reduce excess electrolyte replacements and improve the safety, precision, efficacy, and cost of each electrolyte repletion event, while showing robust performance across patient cohorts and hospital systems.
ABSTRACT
Research Objective: Health systems use clinical predictive algorithms to allocate resources to high-risk patients. Such algorithms are trained using historical data and are later implemented in clinical settings. During this implementation period, predictive algorithms are prone to performance changes ("drift") due to exogenous shocks in utilization or shifts in patient characteristics. Our objective was to examine the impact of sudden utilization shifts during the SARS-CoV-2 pandemic on the performance of an electronic health record (EHR)-based prognostic algorithm. Study Design: We studied changes in the performance of Conversation Connect, a validated machine learning algorithm that predicts 180-day mortality among outpatients with cancer receiving care at medical oncology practices within a large academic cancer center. Conversation Connect generates mortality risk predictions before each encounter using data from 159 EHR variables collected in the six months before the encounter. Since January 2019, Conversation Connect has been used as part of a behavioral intervention to prompt clinicians to consider early advance care planning conversations among patients with ≥10% mortality risk. First, we descriptively compared encounter-level characteristics in the following periods: January 2019-February 2020 ("pre-pandemic"), March-May 2020 ("early-pandemic"), and June-December 2020 ("later-pandemic"). Second, we quantified changes in high-risk patient encounters using interrupted time series analyses that controlled for pre-pandemic trends and demographic, clinical, and practice covariates. Our primary metric of performance drift was false negative rate (FNR). Third, we assessed contributors to performance drift by comparing distributions of key EHR inputs across periods and predicting later pandemic utilization using pre-pandemic inputs. Population Studied: 237,336 in-person and telemedicine medical oncology encounters. 
Principal Findings: Age, race, average patient encounters per month, insurance type, comorbidity counts, laboratory values, and overall mortality were similar among encounters in the pre-, early-, and later-pandemic periods. Relative to the pre-pandemic period, the later-pandemic period was characterized by a 6.5-percentage-point decrease (28.2% vs. 34.7%) in high-risk encounters (p<0.001). FNR increased from 41.0% (95% CI 38.0-44.1%) in the pre-pandemic period to 57.5% (95% CI 51.9-63.0%) in the later pandemic period. Compared to the pre-pandemic period, the early and later pandemic periods had higher proportions of telemedicine encounters (0.01% pre-pandemic vs. 20.0% early-pandemic vs. 26.4% later-pandemic) and encounters with no preceding laboratory draws (17.7% pre-pandemic vs. 19.8% early-pandemic vs. 24.1% later-pandemic). In the later pandemic period, observed laboratory utilization was lower than predicted (76.0% vs 81.2%, p<0.001). In the later-pandemic period, mean 180-day mortality risk scores were lower for telemedicine encounters vs. in-person encounters (10.3% vs 11.2%, p<0.001) and encounters with no vs. any preceding laboratory draws (1.5% vs. 14.0%, p<0.001). Conclusions: During the SARS-CoV-2 pandemic period, the performance of a machine learning prognostic algorithm used to prompt advance care planning declined substantially. Increases in telemedicine and declines in laboratory utilization contributed to lower performance. Implications for Policy or Practice: This is the first study to show algorithm performance drift due to SARS-CoV-2 pandemic-related shifts in telemedicine and laboratory utilization. These mechanisms of performance drift could apply to other EHR clinical predictive algorithms. Pandemic-related decreases in care utilization may negatively impact the performance of clinical predictive algorithms and warrant assessment and possible retraining of such algorithms.
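A basic drift monitor along the lines described above tracks, per calendar period, the share of encounters flagged as high risk and the false negative rate among patients who died. The sketch below uses the 10% risk threshold stated in the abstract; the encounter records, the period labels, and the function name are hypothetical, and the interrupted time series adjustment used in the study is not reproduced.

```python
from collections import defaultdict


def performance_by_period(encounters, threshold=0.10):
    """encounters: iterable of (period, risk_score, died_within_180_days)."""
    counts = defaultdict(lambda: {"n": 0, "high_risk": 0, "deaths": 0, "missed": 0})
    for period, risk, died in encounters:
        c = counts[period]
        flagged = risk >= threshold
        c["n"] += 1
        c["high_risk"] += flagged
        if died:
            c["deaths"] += 1
            c["missed"] += not flagged  # a death the algorithm did not flag
    return {
        period: {
            "high_risk_share": c["high_risk"] / c["n"],
            "fnr": c["missed"] / c["deaths"] if c["deaths"] else None,
        }
        for period, c in counts.items()
    }


# Hypothetical encounter records for two periods.
encounters = [
    ("pre", 0.25, True), ("pre", 0.12, True), ("pre", 0.04, True), ("pre", 0.02, False),
    ("later", 0.15, True), ("later", 0.06, True), ("later", 0.03, True), ("later", 0.01, False),
]
result = performance_by_period(encounters)
```

In the toy data the later period shows a lower high-risk share and a higher FNR, the same direction of change the study reports.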
ABSTRACT
PURPOSE: Machine learning (ML) algorithms that incorporate routinely collected patient-reported outcomes (PROs) alongside electronic health record (EHR) variables may improve prediction of short-term mortality and facilitate earlier supportive and palliative care for patients with cancer. METHODS: We trained and validated two-phase ML algorithms that incorporated standard PRO assessments alongside approximately 200 routinely collected EHR variables, among patients with medical oncology encounters at a tertiary academic oncology and a community oncology practice. RESULTS: Among 12,350 patients, 5,870 (47.5%) completed PRO assessments. Compared with EHR- and PRO-only algorithms, the EHR + PRO model improved predictive performance in both tertiary oncology (EHR + PRO v EHR v PRO: area under the curve [AUC] 0.86 [0.85-0.87] v 0.82 [0.81-0.83] v 0.74 [0.74-0.74]) and community oncology (area under the curve 0.89 [0.88-0.90] v 0.86 [0.85-0.88] v 0.77 [0.76-0.79]) practices. CONCLUSION: Routinely collected PROs contain added prognostic information not captured by an EHR-based ML mortality risk algorithm. Augmenting an EHR-based algorithm with PROs resulted in a more accurate and clinically relevant model, which can facilitate earlier and targeted supportive care for patients with cancer.
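The AUC comparison above rests on a simple definition: the probability that a randomly chosen patient who died receives a higher score than a randomly chosen survivor, with ties counted as half. The sketch below computes it by pairwise comparison. The labels and both score vectors are hypothetical and only illustrate how an augmented model could rank patients better than an EHR-only model.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = died) and risk scores from two models.
labels = [1, 1, 0, 0, 0]
ehr_scores = [0.60, 0.30, 0.50, 0.20, 0.10]
ehr_pro_scores = [0.70, 0.55, 0.50, 0.20, 0.10]
auc_ehr = auc(ehr_scores, labels)
auc_ehr_pro = auc(ehr_pro_scores, labels)
```

The pairwise form is quadratic in sample size, so a rank-based formula would be preferable for cohorts of the size reported here.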
Subjects
Electronic Health Records , Neoplasms , Humans , Patient Reported Outcome Measures , Palliative Care , Machine Learning , Neoplasms/diagnosis , Neoplasms/therapy
ABSTRACT
BACKGROUND: While health systems have implemented multifaceted interventions to improve physician and patient communication in serious illnesses such as cancer, clinicians vary in their response to these initiatives. In this secondary analysis of a randomized trial, we identified phenotypes of oncology clinicians based on practice pattern and demographic data, then evaluated associations between such phenotypes and response to a machine learning (ML)-based intervention to prompt earlier advance care planning (ACP) for patients with cancer. METHODS AND FINDINGS: Between June and November 2019, we conducted a pragmatic randomized controlled trial testing the impact of text message prompts to 78 oncology clinicians at 9 oncology practices to perform ACP conversations among patients with cancer at high risk of 180-day mortality, identified using a ML prognostic algorithm. All practices began in the pre-intervention group, which received weekly emails about ACP performance only; practices were sequentially randomized to receive the intervention at 4-week intervals in a stepped-wedge design. We used latent profile analysis (LPA) to identify oncologist phenotypes based on 11 baseline demographic and practice pattern variables identified using EHR and internal administrative sources. Difference-in-differences analyses assessed associations between oncologist phenotype and the outcome of change in ACP conversation rate, before and during the intervention period. Primary analyses were adjusted for patients' sex, age, race, insurance status, marital status, and Charlson comorbidity index. The sample consisted of 2695 patients with a mean age of 64.9 years, of whom 72% were White, 20% were Black, and 52% were male. 78 oncology clinicians (42 oncologists, 36 advanced practice providers) were included. 
Three oncologist phenotypes were identified: Class 1 (n = 9), composed primarily of high-volume generalist oncologists; Class 2 (n = 5), composed primarily of low-volume specialist oncologists; and Class 3 (n = 28), composed primarily of high-volume specialist oncologists. Compared with class 1 and class 3, class 2 had lower mean clinic days per week (1.6 vs 2.5 [class 3] vs 4.4 [class 1]), a higher percentage of new patients per week (35% vs 21% vs 18%), higher baseline ACP rates (3.9% vs 1.6% vs 0.8%), and lower baseline rates of chemotherapy within 14 days of death (1.4% vs 6.5% vs 7.1%). Overall, ACP rates were 3.6% in the pre-intervention wedges and 15.2% in intervention wedges (11.6 percentage-point difference). Compared with class 3, oncologists in class 1 (adjusted percentage-point difference-in-differences 3.6, 95% confidence interval [CI] 1.0 to 6.1, p = 0.006) and class 2 (adjusted percentage-point difference-in-differences 12.3, 95% CI 4.3 to 20.3, p = 0.003) had a greater response to the intervention. CONCLUSIONS: Patient volume and time availability may be associated with oncologists' response to interventions to increase ACP. Future interventions to prompt ACP should prioritize making time available for such conversations between oncologists and their patients.
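As a rough illustration of the difference-in-differences quantity reported above, the following Python sketch computes an unadjusted estimate from four rates. It is not the study's analysis, which adjusted for patient covariates; the function name `diff_in_diff` and the class-level rates in the example are hypothetical and chosen only to show the arithmetic.

```python
def diff_in_diff(pre_ref, post_ref, pre_cmp, post_cmp):
    """Unadjusted difference-in-differences, in percentage points.

    Each argument is a rate in percent. The result is the change in the
    comparison group minus the change in the reference group.
    """
    return (post_cmp - pre_cmp) - (post_ref - pre_ref)


# Overall change reported in the abstract: 3.6% before, 15.2% during.
overall_change = round(15.2 - 3.6, 1)

# Hypothetical rates (NOT from the study), used only to show the calculation:
# reference group goes 2% -> 10%, comparison group goes 4% -> 20%.
example = diff_in_diff(pre_ref=2.0, post_ref=10.0, pre_cmp=4.0, post_cmp=20.0)

print(overall_change)
print(example)
```

A positive value means the comparison group's rate rose more than the reference group's rate over the same period.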
Subjects
Advance Care Planning; Neoplasms; Oncologists; Female; Humans; Machine Learning; Male; Neoplasms/therapy; Phenotype
ABSTRACT
PURPOSE: Machine learning models developed from electronic health records data have been increasingly used to predict risk of mortality for general oncology patients. However, these models may have suboptimal performance because of patient heterogeneity. The objective of this work is to develop a new modeling approach to predicting short-term mortality that accounts for heterogeneity across multiple subgroups in the presence of a large number of electronic health record predictors. METHODS: We proposed a two-stage approach to addressing heterogeneity among oncology patients of different cancer types for predicting their risk of mortality. Structured data were extracted from the University of Pennsylvania Health System for 20,723 patients of 11 cancer types, of whom 1,340 (6.5%) were deceased. We first modeled the overall risk for all patients without differentiating cancer types, as is done in current practice. We then developed cancer type-specific models using the overall risk score as a predictor along with preselected type-specific predictors. The overall and type-specific models were compared with respect to discrimination using the area under the precision-recall curve (AUPRC) and calibration using the calibration slope. We also proposed metrics that characterize the degree of risk heterogeneity by comparing risk predictors in the overall and type-specific models. RESULTS: The two-stage modeling resulted in improved calibration and discrimination across all 11 cancer types. The improvement in AUPRC was significant for hematologic malignancies including leukemia, lymphoma, and myeloma. For instance, the AUPRC increased from 0.358 to 0.519 (∆ = 0.161; 95% CI, 0.102 to 0.224) and from 0.299 to 0.354 (∆ = 0.055; 95% CI, 0.009 to 0.107) for leukemia and lymphoma, respectively. For all 11 cancer types, the two-stage approach generated well-calibrated risks.
A high degree of heterogeneity between type-specific and overall risk predictors was observed for most cancer types. CONCLUSION: Our two-stage modeling approach, which accounts for cancer type-specific risk heterogeneity, has better calibration and discrimination than a model agnostic to cancer types.
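The two-stage structure described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the learner is a plain logistic regression fitted by gradient descent, the overall risk score is passed to stage 2 as a predicted probability, and all names (`fit_logistic`, `two_stage_risk`, `type_a`, `type_b`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = predict(w, row) - label
            grad[0] += err
            for j, value in enumerate(row):
                grad[j + 1] += err * value
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def predict(w, row):
    """Predicted probability for one feature row."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))


# Synthetic cohort (assumption): one shared predictor, one type-specific
# predictor that matters only for type_a, and a binary mortality label.
random.seed(0)
patients = []
for _ in range(200):
    ctype = random.choice(["type_a", "type_b"])
    shared, specific = random.gauss(0, 1), random.gauss(0, 1)
    effect = 1.5 if ctype == "type_a" else 0.0
    died = 1 if random.random() < sigmoid(-2.0 + shared + effect * specific) else 0
    patients.append((ctype, shared, specific, died))

# Stage 1: one overall model for all patients, ignoring cancer type.
overall_w = fit_logistic([[p[1]] for p in patients], [p[3] for p in patients])

# Stage 2: one model per cancer type, using the stage 1 score as a
# predictor alongside the type-specific predictor.
type_models = {}
for ctype in ("type_a", "type_b"):
    rows = [p for p in patients if p[0] == ctype]
    X2 = [[predict(overall_w, [p[1]]), p[2]] for p in rows]
    type_models[ctype] = fit_logistic(X2, [p[3] for p in rows])


def two_stage_risk(ctype, shared, specific):
    """Final risk: type-specific model applied to the overall score."""
    overall = predict(overall_w, [shared])
    return predict(type_models[ctype], [overall, specific])
```

Feeding the overall score into each type-specific model lets every subgroup reuse what was learned from the full cohort while fitting only a small number of additional coefficients.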
Subjects
Machine Learning; Neoplasms; Area Under Curve; Electronic Health Records; Humans; Neoplasms/diagnosis; Neoplasms/epidemiology; Risk Factors
ABSTRACT
OBJECTIVES: Palliative care has been demonstrated to have positive effects for patients, families, health care providers, and health systems. Early identification of patients who are likely to benefit from palliative care would increase opportunities to provide these services to those most in need. This study predicted all-cause mortality of patients as a surrogate for patients who could benefit from palliative care. STUDY DESIGN: Claims and electronic health record (EHR) data for 59,639 patients from a large integrated health care system were utilized. METHODS: A deep learning algorithm, a long short-term memory (LSTM) model, was compared with other machine learning models: deep neural networks, random forest, and logistic regression. We conducted prediction analyses using combined claims data and EHR data, only claims data, and only EHR data, respectively. In each case, the data were randomly split into training (80%), validation (10%), and testing (10%) data sets. The models with different hyperparameters were trained using the training data, and the model with the best performance on the validation data was selected as the final model. The testing data were used to provide an unbiased performance evaluation of the final model. RESULTS: In all modeling scenarios, LSTM models outperformed the other 3 models, and using combined claims and EHR data yielded the best performance. CONCLUSIONS: LSTM models can effectively predict mortality by using a combination of EHR data and administrative claims data. The model could serve as a clinical tool to aid clinicians in early identification of appropriate patients for palliative care consultations.
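The 80%/10%/10% split and validation-based model selection described in the methods can be sketched as follows. This is an illustrative Python outline only: the function names, the seed, the configuration labels, and the validation scores are hypothetical and are not taken from the study.

```python
import random


def split_80_10_10(records, seed=0):
    """Randomly split records into training, validation, and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def select_best(validation_scores):
    """Return the configuration with the highest validation score."""
    return max(validation_scores, key=validation_scores.get)


train, val, test = split_80_10_10(range(100))

# Hypothetical validation scores for two hyperparameter configurations.
best = select_best({"config_a": 0.70, "config_b": 0.75})
```

The testing set is left untouched until a single final configuration has been chosen on the validation set, which is what allows it to give an unbiased performance estimate.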
Subjects
Electronic Health Records; Palliative Care; Humans; Machine Learning; Neural Networks, Computer; Risk Assessment
ABSTRACT
OBJECTIVE: This study aimed to develop and validate a claims-based, machine learning algorithm to predict clinical outcomes across both medical and surgical patient populations. METHODS: This retrospective, observational cohort study used a random 5% sample of 770,777 fee-for-service Medicare beneficiaries with an inpatient hospitalization between 2009 and 2011. The machine learning algorithms tested included support vector machine, random forest, multilayer perceptron, extreme gradient boosted tree, and logistic regression. The extreme gradient boosted tree algorithm outperformed the alternatives and was the machine learning method used for the final risk model. The primary outcome was 30-day mortality. Secondary outcomes were rehospitalization and any of 23 adverse clinical events occurring within 30 days of the index admission date. RESULTS: The machine learning algorithm performance was evaluated by both the area under the receiver operating characteristic curve (AUROC) and the Brier Score. The risk model demonstrated high performance for prediction of 30-day mortality (AUROC = 0.88; Brier Score = 0.06) and 17 of the 23 adverse events (AUROC range: 0.80-0.86; Brier Score range: 0.01-0.05). The risk model demonstrated moderate performance for prediction of rehospitalization within 30 days (AUROC = 0.73; Brier Score = 0.07) and six of the 23 adverse events (AUROC range: 0.74-0.79; Brier Score range: 0.01-0.02). The machine learning risk model performed comparably on a second, independent validation dataset, confirming that the risk model was not overfit. CONCLUSIONS AND RELEVANCE: We have developed and validated a robust, claims-based, machine learning risk model that is applicable to both medical and surgical patient populations and demonstrates comparable predictive accuracy to existing risk models.
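For readers unfamiliar with the two metrics reported above, the following Python sketch shows how each is computed from true labels and predicted probabilities. The toy labels and probabilities are made up for illustration and have no connection to the study data; the function names are likewise hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive case is scored above a random
    negative case, with ties counted as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (not study data): two negatives, two positives.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]

print(auroc(labels, probs))        # 3 of 4 positive-negative pairs are ordered correctly
print(brier_score(labels, probs))
```

AUROC measures discrimination only, while the Brier Score also penalizes poorly calibrated probabilities, which is why the two are often reported together.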
Subjects
Machine Learning; Treatment Outcome; Area Under Curve; Databases, Factual; Hospitalization/statistics & numerical data; Humans; Logistic Models; Medicare; Models, Theoretical; Mortality; ROC Curve; Retrospective Studies; Risk Assessment; United States
ABSTRACT
BACKGROUND: Serious illness conversations (SICs) are an evidence-based approach to eliciting patients' values, goals, and care preferences that improve patient outcomes. However, most patients with cancer die without a documented SIC. Clinician-directed implementation strategies informed by behavioral economics ("nudges") that identify high-risk patients have shown promise in increasing SIC documentation among clinicians. It is unknown whether patient-directed nudges that normalize and prime patients towards SIC completion, either alone or in combination with clinician nudges that additionally compare performance relative to peers, may improve on this approach. Our objective is to test the effect of clinician- and patient-directed nudges as implementation strategies for increasing SIC completion among patients with cancer. METHODS: We will conduct a 2 × 2 factorial, cluster randomized pragmatic trial to test the effect of nudges to clinicians, patients, or both, compared to usual care, on SIC completion. Participants will include 166 medical and gynecologic oncology clinicians practicing at ten sites within a large academic health system and their approximately 5500 patients at high risk of predicted 6-month mortality based on a validated machine-learning prognostic algorithm. Data will be obtained via the electronic medical record, clinician survey, and semi-structured interviews with clinicians and patients. The primary outcome will be time to SIC documentation among high-risk patients. Secondary outcomes will include time to SIC documentation among all patients (assessing spillover effects), palliative care referral among high-risk patients, and aggressive end-of-life care utilization (composite of chemotherapy within 14 days before death, hospitalization within 30 days before death, or admission to hospice within 3 days before death) among high-risk decedents.
We will assess moderators of the effect of implementation strategies and conduct semi-structured interviews with a subset of clinicians and patients to assess contextual factors that shape the effectiveness of nudges with an eye towards health equity. DISCUSSION: This will be the first pragmatic trial to evaluate clinician- and patient-directed nudges to promote SIC completion for patients with cancer. We expect the study to yield insights into the effectiveness of clinician and patient nudges as implementation strategies to improve SIC rates, and to uncover multilevel contextual factors that drive response to these strategies. TRIAL REGISTRATION: ClinicalTrials.gov, NCT04867850. Registered on April 30, 2021. FUNDING: National Cancer Institute P50CA244690.
Subjects
Neoplasms; Terminal Care; Communication; Economics, Behavioral; Female; Humans; Neoplasms/therapy; Palliative Care
ABSTRACT
IMPORTANCE: Serious illness conversations (SICs) are structured conversations between clinicians and patients about prognosis, treatment goals, and end-of-life preferences. Interventions that increase the rate of SICs between oncology clinicians and patients may improve goal-concordant care and patient outcomes. OBJECTIVE: To determine the effect of a clinician-directed intervention integrating machine learning mortality predictions with behavioral nudges on motivating clinician-patient SICs. DESIGN, SETTING, AND PARTICIPANTS: This stepped-wedge cluster randomized clinical trial was conducted across 20 weeks (from June 17 to November 1, 2019) at 9 medical oncology clinics (8 subspecialty oncology and 1 general oncology clinics) within a large academic health system in Pennsylvania. Clinicians at the 2 smallest subspecialty clinics were grouped together, resulting in 8 clinic groups randomly assigned to the 4 intervention wedge periods. Included participants in the intention-to-treat analyses were 78 oncology clinicians who received SIC training and their patients (N = 14 607) who had an outpatient oncology encounter during the study period. INTERVENTIONS: (1) Weekly emails to oncology clinicians with SIC performance feedback and peer comparisons; (2) a list of up to 6 high-risk patients (≥10% predicted risk of 180-day mortality) scheduled for the next week, estimated using a validated machine learning algorithm; and (3) opt-out text message prompts to clinicians on the patient's appointment day to consider an SIC. Clinicians in the control group received usual care consisting of weekly emails with cumulative SIC performance. MAIN OUTCOMES AND MEASURES: Percentage of patient encounters with an SIC in the intervention group vs the usual care (control) group. RESULTS: The sample consisted of 78 clinicians and 14 607 patients. The mean (SD) age of patients was 61.9 (14.2) years, 53.7% were female, and 70.4% were White.
For all encounters, SICs were conducted among 1.3% in the control group and 4.6% in the intervention group, a significant difference (adjusted difference in percentage points, 3.3; 95% CI, 2.3-4.5; P < .001). Among 4124 high-risk patient encounters, SICs were conducted among 3.6% in the control group and 15.2% in the intervention group, a significant difference (adjusted difference in percentage points, 11.6; 95% CI, 8.2-12.5; P < .001). CONCLUSIONS AND RELEVANCE: In this stepped-wedge cluster randomized clinical trial, an intervention that delivered machine learning mortality predictions with behavioral nudges to oncology clinicians significantly increased the rate of SICs among all patients and among patients with high mortality risk who were targeted by the intervention. Behavioral nudges combined with machine learning mortality predictions can positively influence clinician behavior and may be applied more broadly to improve care near the end of life. TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT03984773.
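The second intervention component, a weekly list of up to 6 high-risk patients with a predicted 180-day mortality risk of at least 10%, can be illustrated with a short Python sketch. The abstract does not say how patients were chosen when more than 6 met the threshold, so ranking by highest predicted risk is an assumption here; the function name, patient identifiers, and risk values are hypothetical.

```python
def weekly_high_risk_list(scheduled, threshold=0.10, max_patients=6):
    """Return up to max_patients (patient_id, risk) pairs with risk at or
    above the threshold, highest risk first (ordering is an assumption)."""
    eligible = [(pid, risk) for pid, risk in scheduled if risk >= threshold]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return eligible[:max_patients]


# Hypothetical predicted risks for one clinician's patients next week.
scheduled = [
    ("p1", 0.05), ("p2", 0.12), ("p3", 0.30), ("p4", 0.09), ("p5", 0.10),
    ("p6", 0.22), ("p7", 0.45), ("p8", 0.11), ("p9", 0.18), ("p10", 0.15),
]

high_risk = weekly_high_risk_list(scheduled)
print([pid for pid, _ in high_risk])
```

Capping the list keeps the weekly prompt short for each clinician, at the cost of omitting some patients who also meet the risk threshold.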