Results 1 - 20 of 58
1.
Healthc (Amst) ; 12(2): 100738, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38531228

ABSTRACT

The COVID-19 pandemic generated tremendous interest in using real-world data (RWD). Many consortia across the public and private sectors formed in 2020 with the goal of rapidly producing high-quality evidence from RWD to guide medical decision-making, public health priorities, and more. Experiences were gathered from five large consortia on rapid multi-institutional evidence generation during the COVID-19 pandemic. Insights have been compiled across five dimensions: consortium composition, governance structure and alignment of priorities, data sharing, data analysis, and evidence dissemination. The purpose of this piece is to offer guidance on building large-scale multi-institutional RWD analysis pipelines for future public health issues. The composition of each consortium was largely influenced by existing collaborations. A central set of priorities for evidence generation guided each consortium; however, different approaches to governance emerged. Challenges surrounding limited access to clinical data due to various contributors were overcome in unique ways. While all consortia used different methods to construct and analyze patient cohorts, ranging from centralized to federated approaches, all proved effective for generating meaningful real-world evidence. Actionable recommendations for clinical practice and public health agencies were made by translating insights from consortium analyses. Each consortium was successful in rapidly answering questions about COVID-19 diagnosis and treatment despite all taking slightly different approaches to data sharing and analysis. RWD, leveraged in a manner that applies scientific rigor and transparency, can complement higher-level evidence and serve as an important adjunct to clinical trials to quickly guide policy and critical care, especially for a pandemic response.


Subject(s)
COVID-19 , COVID-19/epidemiology , Humans , Pandemics , Information Dissemination/methods , SARS-CoV-2
2.
medRxiv ; 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38370719

ABSTRACT

Background: Subject screening is a key aspect of all clinical trials; however, traditionally, it is a labor-intensive and error-prone task, demanding significant time and resources. With the advent of large language models (LLMs) and related technologies, a paradigm shift in natural language processing capabilities offers a promising avenue for increasing both quality and efficiency of screening efforts. This study aimed to test the ability of a Retrieval-Augmented Generation (RAG)-enabled Generative Pretrained Transformer Version 4 (GPT-4) system to accurately identify and report on inclusion and exclusion criteria for a clinical trial. Methods: The Co-Operative Program for Implementation of Optimal Therapy in Heart Failure (COPILOT-HF) trial aims to recruit patients with symptomatic heart failure. As part of the screening process, a list of potentially eligible patients is created through an electronic health record (EHR) query. Currently, structured data in the EHR can only be used to determine 5 out of 6 inclusion and 5 out of 17 exclusion criteria. Trained, but non-licensed, study staff complete manual chart review to determine patient eligibility and record their assessment of the inclusion and exclusion criteria. We obtained the structured assessments completed by the study staff and clinical notes for the past two years and developed a clinical note-based question-answering workflow, powered by a RAG architecture and GPT-4, that we named RECTIFIER (RAG-Enabled Clinical Trial Infrastructure for Inclusion Exclusion Review). We used notes from 100 patients as a development dataset, 282 patients as a validation dataset, and 1894 patients as a test set. An expert clinician completed a blinded review of patients' charts to answer the eligibility questions and determine the "gold standard" answers. We calculated the sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC) for each question and screening method.
We also performed bootstrapping to calculate the confidence intervals for each statistic. Results: Both RECTIFIER and study staff answers closely aligned with the expert clinician answers across criteria, with accuracy ranging between 97.9% and 100% (MCC 0.837 and 1) for RECTIFIER and 91.7% and 100% (MCC 0.644 and 1) for study staff. RECTIFIER performed better than study staff in determining the inclusion criterion of "symptomatic heart failure", with an accuracy of 97.9% vs 91.7% and an MCC of 0.924 vs 0.721, respectively. Overall, the sensitivity and specificity of determining eligibility were 92.3% (CI) and 93.9% (CI) for RECTIFIER and 90.1% (CI) and 83.6% (CI) for study staff, respectively. Conclusion: GPT-4 based solutions have the potential to improve efficiency and reduce costs in clinical trial screening. When incorporating new tools such as RECTIFIER, it is important to consider the potential hazards of automating the screening process and set up appropriate mitigation strategies such as final clinician review before patient engagement.
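
The statistics named in this abstract (sensitivity, specificity, accuracy, MCC, and bootstrapped confidence intervals) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the percentile bootstrap, the number of resamples, and the toy labels are all assumptions made for illustration.

```python
import math
import random

def screening_metrics(truth, pred):
    """Confusion-matrix statistics for binary labels (1 = criterion met)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    acc = (tp + tn) / len(truth)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc, "mcc": mcc}

def bootstrap_ci(truth, pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample patients with replacement."""
    rng = random.Random(seed)
    n = len(truth)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = screening_metrics([truth[i] for i in idx],
                                  [pred[i] for i in idx])[metric]
        if not math.isnan(value):  # skip resamples where the metric is undefined
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper

# Toy data (invented): expert "gold standard" answers vs. a screening method.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
screen = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = screening_metrics(gold, screen)   # sensitivity 0.75, accuracy 0.8
ci_low, ci_high = bootstrap_ci(gold, screen, "accuracy")
print(metrics, (ci_low, ci_high))
```

With only ten toy patients the interval is very wide; the point is the mechanics, not the numbers.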

3.
EClinicalMedicine ; 64: 102210, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37745021

ABSTRACT

Background: Characterizing Post-Acute Sequelae of COVID (SARS-CoV-2 Infection), or PASC, has been challenging due to the multitude of sub-phenotypes, temporal attributes, and definitions. Scalable characterization of PASC sub-phenotypes can enhance screening capacities, disease management, and treatment planning. Methods: We conducted a retrospective multi-centre observational cohort study, leveraging longitudinal electronic health record (EHR) data of 30,422 patients from three healthcare systems in the Consortium for the Clinical Characterization of COVID-19 by EHR (4CE). From the total cohort, we applied a deductive approach on 12,424 individuals with follow-up data and developed a distributed representation learning process for providing augmented definitions for PASC sub-phenotypes. Findings: Our framework characterized seven PASC sub-phenotypes. We estimated that on average 15.7% of the hospitalized COVID-19 patients were likely to suffer from at least one PASC symptom and almost 5.98%, on average, had multiple symptoms. Joint pain and dyspnea had the highest prevalence, with an average prevalence of 5.45% and 4.53%, respectively. Interpretation: We provided a scalable framework to every participating healthcare system for estimating PASC sub-phenotypes prevalence and temporal attributes, thus developing a unified model that characterizes augmented sub-phenotypes across the different systems. Funding: Authors are supported by National Institute of Allergy and Infectious Diseases, National Institute on Aging, National Center for Advancing Translational Sciences, National Medical Research Council, National Institute of Neurological Disorders and Stroke, European Union, and National Institutes of Health.

4.
PLOS Digit Health ; 2(7): e0000301, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37490472

ABSTRACT

Physical and psychological symptoms lasting months following an acute COVID-19 infection are now recognized as post-acute sequelae of COVID-19 (PASC). Accurate tools for identifying such patients could enhance screening capabilities for the recruitment for clinical trials, improve the reliability of disease estimates, and allow for more accurate downstream cohort analysis. In this retrospective cohort study, we analyzed the EHR of hospitalized COVID-19 patients across three healthcare systems to develop a pipeline for better identifying patients with persistent PASC symptoms (dyspnea, fatigue, or joint pain) after their SARS-CoV-2 infection. We implemented distributed representation learning powered by the Machine Learning for modeling Health Outcomes (MLHO) to identify novel EHR features that could suggest PASC symptoms outside of typical diagnosis codes. MLHO applies an entropy-based feature selection and boosting algorithms for representation mining. These improved definitions were then used for estimating PASC among hospitalized patients. 30,422 hospitalized patients were diagnosed with COVID-19 across three healthcare systems between March 13, 2020 and February 28, 2021. The mean age of the population was 62.3 years (SD, 21.0 years) and 15,124 (49.7%) were female. We implemented the distributed representation learning technique to augment PASC definitions. These definitions were found to have positive predictive values of 0.73, 0.74, and 0.91 for dyspnea, fatigue, and joint pain, respectively. We estimated that 25 percent (CI 95%: 6-48), 11 percent (CI 95%: 6-15), and 13 percent (CI 95%: 8-17) of hospitalized COVID-19 patients will have dyspnea, fatigue, and joint pain, respectively, 3 months or longer after a COVID-19 diagnosis. We present a validated framework for screening and identifying patients with PASC in the EHR and then use the tool to estimate its prevalence among hospitalized COVID-19 patients.

5.
JAMA Cardiol ; 8(1): 12-21, 2023 01 01.
Article in English | MEDLINE | ID: mdl-36350612

ABSTRACT

Importance: Blood pressure (BP) and cholesterol control remain challenging. Remote care can deliver more effective care outside of traditional clinician-patient settings but scaling and ensuring access to care among diverse populations remains elusive. Objective: To implement and evaluate a remote hypertension and cholesterol management program across a diverse health care network. Design, Setting, and Participants: Between January 2018 and July 2021, 20 454 patients in a large integrated health network were screened; 18 444 were approached, and 10 803 were enrolled in a comprehensive remote hypertension and cholesterol program (3658 patients with hypertension, 8103 patients with cholesterol, and 958 patients with both). A total of 1266 patients requested education only without medication titration. Enrolled patients received education, home BP device integration, and medication titration. Nonlicensed navigators and pharmacists, supported by cardiovascular clinicians, coordinated care using standardized algorithms, task management and automation software, and omnichannel communication. BP and laboratory test results were actively monitored. Main Outcomes and Measures: Changes in BP and low-density lipoprotein cholesterol (LDL-C). Results: The mean (SD) age among 10 803 patients was 65 (11.4) years; 6009 participants (56%) were female; 1321 (12%) identified as Black, 1190 (11%) as Hispanic, 7758 (72%) as White, and 1727 (16%) as another or multiple races (including American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islander, unknown, other, and declined to respond; consolidated owing to small numbers); and 142 (11%) reported a preferred language other than English. A total of 424 482 BP readings and 139 263 laboratory reports were collected. In the hypertension program, the mean (SD) office BP prior to enrollment was 150/83 (18/10) mm Hg, and the mean (SD) home BP was 145/83 (20/12) mm Hg. 
For those engaged in remote medication management, the mean (SD) clinic BP 6 and 12 months after enrollment decreased by 8.7/3.8 (21.4/12.4) and 9.7/5.2 (22.2/12.6) mm Hg, respectively. In the education-only cohort, BP changed by a mean (SD) -1.5/-0.7 (23.0/11.1) and by +0.2/-1.9 (30.3/11.2) mm Hg, respectively (P < .001 for between cohort difference). In the lipids program, patients in remote medication management experienced a reduction in LDL-C by a mean (SD) 35.4 (43.1) and 37.5 (43.9) mg/dL at 6 and 12 months, respectively, while the education-only cohort experienced a mean (SD) reduction in LDL-C of 9.3 (34.3) and 10.2 (35.5) mg/dL at 6 and 12 months, respectively (P < .001). Similar rates of enrollment and reductions in BP and lipids were observed across different racial, ethnic, and primary language groups. Conclusions and Relevance: The results of this study indicate that a standardized remote BP and cholesterol management program may help optimize guideline-directed therapy at scale, reduce cardiovascular risk, and minimize the need for in-person visits among diverse populations.


Subject(s)
Hypercholesterolemia , Hypertension , Humans , Female , Aged , Male , LDL Cholesterol/blood , Hypertension/drug therapy , Hypertension/epidemiology , Blood Pressure , Delivery of Health Care
6.
Bioinformatics ; 38(20): 4833-4836, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36053173

ABSTRACT

MOTIVATION: The i2b2 platform is used at major academic health institutions and research consortia for querying electronic health data. However, a major obstacle to wider utilization of the platform is the complexity of data loading, which entails a steep learning curve for the platform's complex data schemas. To address this problem, we have developed the i2b2-etl package, which simplifies the data loading process and will facilitate wider deployment and utilization of the platform. RESULTS: We have implemented i2b2-etl as a Python application that imports ontology and patient data using simplified input file schemas and provides inbuilt record number de-identification and data validation. We describe a real-world deployment of i2b2-etl for a population-management initiative at Mass General Brigham. AVAILABILITY AND IMPLEMENTATION: i2b2-etl is a free, open-source application implemented in Python available under the Mozilla 2 license. The application can be downloaded as compiled Docker images. A live demo is available at https://i2b2clinical.org/demo-i2b2etl/ (username: demo, password: Etl@2021). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Electronic Health Records , Information Storage and Retrieval , Biology , Factual Databases , Humans , Informatics
7.
Article in English | MEDLINE | ID: mdl-35874460

ABSTRACT

Analysis of health data typically requires a data analyst to develop queries in structured query language (SQL). Because the SQL queries are created manually, they are prone to errors. In addition, accurate implementation of the queries depends on effective communication with clinical experts, which makes the analysis even more error-prone. As a potential resolution, we explore an alternative approach in which the analysis is performed through a graphical interface that automatically generates the SQL queries. This approach allows clinical experts to directly perform complex queries on the data, despite their unfamiliarity with SQL syntax. The interface provides an intuitive understanding of the query logic, which makes the analysis transparent and comprehensible to the clinical study staff, thereby enhancing the transparency and validity of the analysis. This study demonstrates the feasibility of using a user-friendly interface that automatically generates SQL for analysis of health data. It outlines challenges that will be useful for designing user-friendly tools to improve transparency and reproducibility of data analysis.
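
The idea of generating SQL from choices made in an interface, rather than writing it by hand, can be sketched as follows. This is a hypothetical illustration, not the tool described in the abstract: the `patients` table, its columns, the whitelists, and `build_count_query` are all invented here, and an in-memory SQLite database stands in for a clinical data warehouse.

```python
import sqlite3

# Hypothetical whitelists: only these columns and operators may appear in SQL.
ALLOWED_COLUMNS = {"age", "sex", "diagnosis_code"}
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}

def build_count_query(criteria):
    """Turn (column, operator, value) filters, as a form might collect them,
    into a parameterized SQL string plus its list of parameters."""
    clauses, params = [], []
    for column, operator, value in criteria:
        if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported filter: {column} {operator}")
        clauses.append(f"{column} {operator} ?")  # value is bound, never inlined
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT COUNT(*) FROM patients WHERE {where}", params

# Toy in-memory table with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (age INTEGER, sex TEXT, diagnosis_code TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(72, "F", "I50"), (58, "M", "I50"), (81, "F", "I10"), (67, "F", "I50")],
)

sql, params = build_count_query([("age", ">=", 65), ("diagnosis_code", "=", "I50")])
count = conn.execute(sql, params).fetchone()[0]
print(sql)    # the generated query can be shown to the clinical user for review
print(count)
```

Restricting column names and operators to a whitelist and binding values as parameters keeps the generated SQL predictable, which is one way such a tool could support the transparency the abstract emphasizes.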

8.
J Am Heart Assoc ; 11(15): e026014, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35904194

ABSTRACT

Background: Models predicting atrial fibrillation (AF) risk, such as Cohorts for Heart and Aging Research in Genomic Epidemiology AF (CHARGE-AF), have not performed as well in electronic health records. Natural language processing (NLP) may improve models by using narrative electronic health record text. Methods and Results: From a primary care network, we included patients aged ≥65 years with visits between 2003 and 2013 in development (n=32 960) and internal validation cohorts (n=13 992). An external validation cohort from a separate network from 2015 to 2020 included 39 051 patients. Model features were defined using electronic health record codified data and narrative data with NLP. We developed 2 models to predict 5-year AF incidence using (1) codified+NLP data and (2) codified data only and evaluated model performance. The analysis included 2839 incident AF cases in the development cohort and 1057 and 2226 cases in internal and external validation cohorts, respectively. The C-statistic was greater (P<0.001) in codified+NLP model (0.744 [95% CI, 0.735-0.753]) compared with codified-only (0.730 [95% CI, 0.720-0.739]) in the development cohort. In internal validation, the C-statistic of codified+NLP was modestly higher (0.735 [95% CI, 0.720-0.749]) compared with codified-only (0.729 [95% CI, 0.715-0.744]; P=0.06) and CHARGE-AF (0.717 [95% CI, 0.703-0.731]; P=0.002). Codified+NLP and codified-only were well calibrated, whereas CHARGE-AF underestimated AF risk. In external validation, the C-statistic of codified+NLP (0.750 [95% CI, 0.740-0.760]) remained higher (P<0.001) than codified-only (0.738 [95% CI, 0.727-0.748]) and CHARGE-AF (0.735 [95% CI, 0.725-0.746]). Conclusions: Estimation of 5-year risk of AF can be modestly improved using NLP to incorporate narrative electronic health record data.


Subject(s)
Atrial Fibrillation , Natural Language Processing , Atrial Fibrillation/diagnosis , Atrial Fibrillation/epidemiology , Cohort Studies , Electronic Health Records , Humans , Incidence , Risk Assessment/methods
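
For readers unfamiliar with the C-statistic reported in the atrial fibrillation abstract above, the following Python sketch shows the binary-outcome version: the share of case/non-case pairs in which the case received the higher predicted risk. It is an illustration only. The abstract does not say whether the authors used this form or a time-to-event variant, and the risks and outcomes below are invented.

```python
def c_statistic(risks, outcomes):
    """Concordance for a binary outcome: fraction of (case, non-case) pairs
    where the case has the higher predicted risk; ties count as half.
    For binary outcomes this equals the area under the ROC curve."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Invented predicted 5-year risks and observed outcomes (1 = incident AF).
predicted = [0.9, 0.4, 0.35, 0.8, 0.1, 0.4]
observed = [1, 1, 0, 0, 0, 0]
print(c_statistic(predicted, observed))  # 6.5 concordant pairs out of 8 = 0.8125
```

The pairwise loop is quadratic in cohort size, so it suits small examples; for cohorts of tens of thousands a rank-based computation would be preferable.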
9.
BMJ Open ; 12(6): e057725, 2022 06 23.
Article in English | MEDLINE | ID: mdl-35738646

ABSTRACT

OBJECTIVE: To assess changes in international mortality rates and laboratory recovery rates during hospitalisation for patients hospitalised with SARS-CoV-2 between the first wave (1 March to 30 June 2020) and the second wave (1 July 2020 to 31 January 2021) of the COVID-19 pandemic. DESIGN, SETTING AND PARTICIPANTS: This is a retrospective cohort study of 83 178 hospitalised patients admitted between 7 days before or 14 days after PCR-confirmed SARS-CoV-2 infection within the Consortium for Clinical Characterization of COVID-19 by Electronic Health Record, an international multihealthcare system collaborative of 288 hospitals in the USA and Europe. The laboratory recovery rates and mortality rates over time were compared between the two waves of the pandemic. PRIMARY AND SECONDARY OUTCOME MEASURES: The primary outcome was all-cause mortality rate within 28 days after hospitalisation stratified by predicted low, medium and high mortality risk at baseline. The secondary outcome was the average rate of change in laboratory values during the first week of hospitalisation. RESULTS: Baseline Charlson Comorbidity Index and laboratory values at admission were not significantly different between the first and second waves. The improvement in laboratory values over time was faster in the second wave compared with the first. The average C reactive protein rate of change was -4.72 mg/dL vs -4.14 mg/dL per day (p=0.05). The mortality rates within each risk category significantly decreased over time, with the most substantial decrease in the high-risk group (42.3% in March-April 2020 vs 30.8% in November 2020 to January 2021, p<0.001) and a moderate decrease in the intermediate-risk group (21.5% in March-April 2020 vs 14.3% in November 2020 to January 2021, p<0.001). 
CONCLUSIONS: Admission profiles of patients hospitalised with SARS-CoV-2 infection did not differ greatly between the first and second waves of the pandemic, but there were notable differences in laboratory improvement rates during hospitalisation. Mortality risks among patients with similar risk profiles decreased over the course of the pandemic. The improvement in laboratory values and mortality risk was consistent across multiple countries.


Subject(s)
COVID-19 , Pandemics , Hospitalization , Humans , Retrospective Studies , SARS-CoV-2
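
The secondary outcome in the abstract above is an average rate of change in laboratory values during the first week of hospitalisation. One simple way to estimate such a rate, shown here only as a sketch, is a per-patient least-squares slope that is then averaged across patients. The abstract does not state the estimation method actually used, and the C-reactive protein values below are invented.

```python
def slope_per_day(days, values):
    """Ordinary least-squares slope of a lab value against hospital day."""
    n = len(days)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("measurements must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    return sxy / sxx

# Invented first-week CRP series (mg/dL) for two hypothetical patients.
patients = {
    "A": ([0, 1, 2, 3], [20.0, 15.0, 11.0, 6.0]),   # slope -4.6 per day
    "B": ([0, 2, 4], [12.0, 8.0, 4.0]),             # slope -2.0 per day
}
slopes = {pid: slope_per_day(d, v) for pid, (d, v) in patients.items()}
average_slope = sum(slopes.values()) / len(slopes)
print(slopes, average_slope)
```

A negative slope means the value is falling; comparing average slopes between admission periods mirrors the kind of contrast the abstract reports.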
10.
J Am Med Inform Assoc ; 29(8): 1334-1341, 2022 07 12.
Article in English | MEDLINE | ID: mdl-35511151

ABSTRACT

OBJECTIVE: The increasing translation of artificial intelligence (AI)/machine learning (ML) models into clinical practice brings an increased risk of direct harm from modeling bias; however, bias remains incompletely measured in many medical AI applications. This article aims to provide a framework for objective evaluation of medical AI from multiple aspects, focusing on binary classification models. MATERIALS AND METHODS: Using data from over 56 000 Mass General Brigham (MGB) patients with confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), we evaluate unrecognized bias in 4 AI models developed during the early months of the pandemic in Boston, Massachusetts that predict risks of hospital admission, ICU admission, mechanical ventilation, and death after a SARS-CoV-2 infection purely based on their pre-infection longitudinal medical records. Models were evaluated both retrospectively and prospectively using model-level metrics of discrimination, accuracy, and reliability, and a novel individual-level metric for error. RESULTS: We found inconsistent instances of model-level bias in the prediction models. From an individual-level aspect, however, we found nearly all models performing with slightly higher error rates for older patients. DISCUSSION: While a model can be biased against certain protected groups (ie, perform worse) in certain tasks, it can be at the same time biased towards another protected group (ie, perform better). As such, current bias evaluation studies may lack a full depiction of the variable effects of a model on its subpopulations. CONCLUSION: Only a holistic evaluation, a diligent search for unrecognized bias, can provide enough information for an unbiased judgment of AI bias that can invigorate follow-up investigations on identifying the underlying roots of bias and ultimately make a change.


Subject(s)
COVID-19 , Artificial Intelligence , Humans , Reproducibility of Results , Retrospective Studies , SARS-CoV-2
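
A first step in the kind of subgroup bias evaluation discussed in the abstract above is to compare a binary classifier's error rate across groups. The Python sketch below does only that. It is not the framework or the individual-level error metric the authors propose, and the age groups, labels, and predictions are invented.

```python
from collections import defaultdict

def error_rate_by_group(groups, truth, pred):
    """Misclassification rate of binary predictions within each subgroup."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, actual, predicted in zip(groups, truth, pred):
        totals[group] += 1
        if actual != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Invented example: age band, observed outcome, and model prediction.
age_band = ["<65", "<65", "<65", "<65", "65+", "65+", "65+", "65+"]
observed = [1, 0, 0, 1, 1, 0, 1, 0]
predicted = [1, 0, 0, 0, 0, 0, 1, 1]
rates = error_rate_by_group(age_band, observed, predicted)
print(rates)  # {'<65': 0.25, '65+': 0.5}
```

A gap between groups on one metric is a prompt for further investigation, not a verdict; as the abstract notes, a model can perform worse for one group on one task and better on another.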
11.
J Med Internet Res ; 24(5): e37931, 2022 05 18.
Article in English | MEDLINE | ID: mdl-35476727

ABSTRACT

BACKGROUND: Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. Electronic health record (EHR)-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. Although the need to improve classification of COVID-19 versus incidental SARS-CoV-2 is well understood, the magnitude of the problems has only been characterized in small, single-center studies. Furthermore, there have been no peer-reviewed studies evaluating methods for improving classification. OBJECTIVE: The aims of this study are to, first, quantify the frequency of incidental hospitalizations over the first 15 months of the pandemic in multiple hospital systems in the United States and, second, to apply electronic phenotyping techniques to automatically improve COVID-19 hospitalization classification. METHODS: From a retrospective EHR-based cohort in 4 US health care systems in Massachusetts, Pennsylvania, and Illinois, a random sample of 1123 SARS-CoV-2 PCR-positive patients hospitalized from March 2020 to August 2021 was manually chart-reviewed and classified as "admitted with COVID-19" (incidental) versus specifically admitted for COVID-19 ("for COVID-19"). EHR-based phenotyping was used to find feature sets to filter out incidental admissions. RESULTS: EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in an average of 26% of hospitalizations (although this varied widely over time, from 0% to 75%). The top site-specific feature sets had 79%-99% specificity with 62%-75% sensitivity, while the best-performing across-site feature sets had 71%-94% specificity with 69%-81% sensitivity. 
CONCLUSIONS: A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/diagnosis , COVID-19/epidemiology , Electronic Health Records , Hospitalization , Humans , Retrospective Studies
12.
medRxiv ; 2022 Feb 18.
Article in English | MEDLINE | ID: mdl-35350202

ABSTRACT

Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. EHR-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. From a retrospective EHR-based cohort in four US healthcare systems, a random sample of 1,123 SARS-CoV-2 PCR-positive patients hospitalized between 3/2020–8/2021 was manually chart-reviewed and classified as admitted-with-COVID-19 (incidental) vs. specifically admitted for COVID-19 (for-COVID-19). EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in 26%. The top site-specific feature sets had 79-99% specificity with 62-75% sensitivity, while the best performing across-site feature set had 71-94% specificity with 69-81% sensitivity. A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.

13.
Appl Clin Inform ; 12(5): 1041-1048, 2021 10.
Article in English | MEDLINE | ID: mdl-34758494

ABSTRACT

OBJECTIVES: Hypertension is a modifiable risk factor for numerous comorbidities and treating hypertension can greatly improve health outcomes. We sought to increase the efficiency of a virtual hypertension management program through workflow automation processes. METHODS: We developed a customer relationship management (CRM) solution at our institution for the purpose of improving processes and workflow for a virtual hypertension management program and describe here the development, implementation, and initial experience of this CRM system. RESULTS: Notable system features include task automation, patient data capture, multi-channel communication, integration with our electronic health record (EHR), and device integration (for blood pressure cuffs). In the five stages of our program (intake and eligibility screening, enrollment, device configuration/setup, medication titration, and maintenance), we describe some of the key process improvements and workflow automations that are enabled using our CRM platform, like automatic reminders to capture blood pressure data and present these data to our clinical team when ready for clinical decision making. We also describe key limitations of CRM, like balancing out-of-the-box functionality with development flexibility. Among our first group of referred patients, 76% (39/51) preferred email as their communication method, 26/51 (51%) were able to enroll electronically, and 63% of those enrolled (32/51) were able to transmit blood pressure data without phone support. CONCLUSION: A CRM platform could improve clinical processes through multiple pathways, including workflow automation, multi-channel communication, and device integration. Future work will examine the operational improvements of this health information technology solution as well as assess clinical outcomes.


Subject(s)
Hypertension , Medical Informatics , Automation , Electronic Health Records , Humans , Hypertension/drug therapy , Workflow
14.
NPJ Digit Med ; 4(1): 15, 2021 Feb 04.
Article in English | MEDLINE | ID: mdl-33542473

ABSTRACT

This study aims to predict death after COVID-19 using only the past medical information routinely collected in electronic health records (EHRs) and to understand the differences in risk factors across age groups. Combining computational methods and clinical expertise, we curated clusters that represent 46 clinical conditions as potential risk factors for death after a COVID-19 infection. We trained age-stratified generalized linear models (GLMs) with component-wise gradient boosting to predict the probability of death based on what we know from the patients before they contracted the virus. Despite only relying on previously documented demographics and comorbidities, our models demonstrated similar performance to other prognostic models that require an assortment of symptoms, laboratory values, and images at the time of diagnosis or during the course of the illness. In general, we found age as the most important predictor of mortality in COVID-19 patients. A history of pneumonia, which is rarely asked in typical epidemiology studies, was one of the most important risk factors for predicting COVID-19 mortality. A history of diabetes with complications and cancer (breast and prostate) were notable risk factors for patients between the ages of 45 and 65 years. In patients aged 65-85 years, diseases that affect the pulmonary system, including interstitial lung disease, chronic obstructive pulmonary disease, lung cancer, and a smoking history, were important for predicting mortality. The ability to compute precise individual-level risk scores exclusively based on the EHR is crucial for effectively allocating and distributing resources, such as prioritizing vaccination among the general population.

15.
J Med Internet Res ; 23(3): e22219, 2021 03 02.
Article in English | MEDLINE | ID: mdl-33600347

ABSTRACT

Coincident with the tsunami of COVID-19-related publications, there has been a surge of studies using real-world data, including those obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns regarding the soundness and quality of the studies and the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify strengths and flaws in EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack the framework to fully critically appraise these studies. In addition, conventional statistical analyses cannot overcome the need for an understanding of the opportunities and limitations of EHR-derived studies. We distill here from the broader informatics literature six key considerations that are crucial for appraising studies utilizing EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified, textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and the multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders as to the recommended best practices in reviewing manuscripts, grants, and other outputs from EHR-data derived studies, and thereby promote and foster rigor, quality, and reliability of this rapidly growing field.


Subject(s)
COVID-19/epidemiology , Data Collection/methods , Electronic Health Records , Data Collection/standards , Humans , Research Peer Review/standards , Publishing/standards , Reproducibility of Results , SARS-CoV-2/isolation & purification
16.
J Am Med Inform Assoc ; 28(7): 1411-1420, 2021 07 14.
Article in English | MEDLINE | ID: mdl-33566082

ABSTRACT

OBJECTIVE: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity. MATERIALS AND METHODS: Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site. RESULTS: The full 4CE severity phenotype had pooled sensitivity of 0.73 and specificity 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity had high variability, up to 0.65 across sites. At one pilot site, the expert-derived phenotype had mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with an area under the curve of 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared with chart review. DISCUSSION: We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions. CONCLUSIONS: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.


Subject(s)
COVID-19 , Electronic Health Records , Severity of Illness Index , COVID-19/classification , Hospitalization , Humans , Machine Learning , Prognosis , ROC Curve , Sensitivity and Specificity
17.
JAMA Cardiol ; 5(12): 1430-1434, 2020 12 01.
Article in English | MEDLINE | ID: mdl-32936209

ABSTRACT

Importance: Optimal treatment of heart failure with reduced ejection fraction (HFrEF) is scripted by treatment guidelines, but many eligible patients do not receive guideline-directed medical therapy (GDMT) in clinical practice. Objective: To determine whether a remote, algorithm-driven, navigator-administered medication optimization program could enhance implementation of GDMT in HFrEF. Design, Setting, and Participants: In this case-control study, a population-based sample of patients with HFrEF was offered participation in a quality improvement program directed at GDMT optimization. Treating clinicians in a tertiary academic medical center who were caring for patients with heart failure and an ejection fraction of 40% or less (identified through an electronic health record-based search) were approached for permission to adjust medical therapy according to a sequential titration algorithm modeled on the current American College of Cardiology/American Heart Association heart failure guidelines. Navigators contacted participants by telephone to direct medication adjustment and conduct longitudinal surveillance of laboratory tests, blood pressure, and symptoms under supervision of a pharmacist, nurse practitioner, and heart failure cardiologist. Patients and clinicians declining to participate served as a control group. Exposures: Navigator-led remote optimization of GDMT compared with usual care. Main Outcomes and Measures: Proportion of patients receiving GDMT in the intervention and control groups at 3 months. Results: Of 1028 eligible patients (mean [SD] values: age, 68 [14] years; ejection fraction, 32% [8%]; and systolic blood pressure, 122 [18] mm Hg; 305 women [30.0%]; 892 individuals [86.8%] in New York Heart Association class I and II), 197 (19.2%) participated in the medication optimization program, and 831 (80.8%) continued with usual care as directed by their treating clinicians (585 [56.9%] general cardiologists; 443 [43.1%] heart failure specialists). At 3 months, patients participating in the remote intervention experienced significant increases from baseline in use of renin-angiotensin system antagonists (138 [70.1%] to 170 [86.3%]; P < .001) and β-blockers (152 [77.2%] to 181 [91.9%]; P < .001) but not mineralocorticoid receptor antagonists (51 [25.9%] to 60 [30.5%]; P = .14). Doses for each category of GDMT also increased from baseline in the intervention group. Among the usual-care group, there were no changes from baseline in the proportion of patients receiving GDMT or the dose of GDMT in any category. Conclusions and Relevance: Remote titration of GDMT by navigators using encoded algorithms may represent an efficient, population-level strategy for rapidly closing the gap between guidelines and clinical practice in patients with HFrEF.


Subject(s)
Heart Failure/drug therapy , Aged , Aged, 80 and over , Algorithms , Case-Control Studies , Female , Heart Failure/physiopathology , Humans , Male , Middle Aged , Practice Guidelines as Topic , Stroke Volume , Telemedicine
18.
Patterns (N Y) ; 1(4): 100051, 2020 Jul 10.
Article in English | MEDLINE | ID: mdl-32835307

ABSTRACT

Electronic health records (EHRs) contain important temporal information about the progression of disease and treatment outcomes. This paper proposes a transitive sequencing approach for constructing temporal representations from EHR observations for downstream machine learning. Using clinical data from a cohort of patients with congestive heart failure, we mined temporal representations by transitive sequencing of EHR medication and diagnosis records for classification and prediction tasks. We compared the classification and prediction performances of the transitive sequential representations (bag-of-sequences approach) with the conventional approach of using aggregated vectors of EHR data (aggregated vector representation) across different classifiers. We found that the transitive sequential representations are better phenotype "differentiators" and predictors than the "atemporal" EHR records. Our results also demonstrated that data representations obtained from transitive sequencing of EHR observations can present novel insights about the progression of the disease that are difficult to discern when clinical data are treated independently of the patient's history.
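Editorial illustration: one plausible reading of "transitive sequencing" is that every ordered pair of distinct observations in a patient's record, where the first precedes the second, becomes a feature (so A before B before C also yields A before C). The sketch below is written under that assumption and is not the authors' implementation. The function names, the use of each code's first occurrence, and the example codes are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Observation = Tuple[int, str]  # (time index, code), e.g. (3, "rx:furosemide")


def transitive_sequences(observations: Iterable[Observation]) -> Set[Tuple[str, str]]:
    """Return all (earlier_code, later_code) pairs for one patient.

    Assumption: each code is placed at its first occurrence, and codes
    first seen at the same time are not paired with each other.
    """
    first_seen: Dict[str, int] = {}
    for time, code in sorted(observations):
        first_seen.setdefault(code, time)
    ordered = sorted(first_seen.items(), key=lambda item: item[1])
    return {
        (code_a, code_b)
        for (code_a, time_a), (code_b, time_b) in combinations(ordered, 2)
        if time_a < time_b
    }


def bag_of_sequences(patients: Dict[Hashable, List[Observation]]) -> Counter:
    """Count how many patients exhibit each ordered pair."""
    counts: Counter = Counter()
    for observations in patients.values():
        counts.update(transitive_sequences(observations))
    return counts


# Made-up record for a single patient, for illustration only.
example_record = [
    (1, "dx:hypertension"),
    (3, "rx:furosemide"),
    (4, "dx:hypertension"),
    (5, "dx:heart_failure"),
]
example_pairs = transitive_sequences(example_record)
```

By contrast, an aggregated vector representation would record only which codes appear (or how often), discarding the ordering that these pairs retain.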

19.
Biomed Res Int ; 2020: 2851713, 2020.
Article in English | MEDLINE | ID: mdl-32724799

ABSTRACT

Despite the widespread use of the "Informatics for Integrating Biology and the Bedside" (i2b2) platform, there are substantial challenges for loading electronic health records (EHR) into i2b2 and for querying i2b2. We have previously presented a simplified framework for semantic abstraction of EHR records into i2b2. Building on our previous work, we have created a proof-of-concept implementation of cloud services on an i2b2 data store for cohort identification. Specifically, we have implemented a graphical user interface (GUI) that declares the key components for data import, transformation, and query of EHR data. The GUI integrates with Azure cloud services to create data pipelines for importing EHR data into i2b2, creation of derived facts, and querying for generating Sankey-like flow diagrams that characterize the patient cohorts. We have evaluated the implementation using the real-world MIMIC-III dataset. We discuss the key features of this implementation and direction for future work, which will advance the efforts of the research community for patient cohort identification.


Subject(s)
Biomedical Research/methods , Informatics/methods , Information Storage and Retrieval/methods , Biology/methods , Cloud Computing , Cohort Studies , Electronic Health Records , Humans , Software
20.
Bioinformatics ; 36(10): 3200-3206, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32049335

ABSTRACT

MOTIVATION: Expert-labeled data are essential to train phenotyping algorithms for cohort identification. However, expert labeling is time- and labor-intensive, and the costs remain prohibitive for scaling phenotyping to wider use cases. RESULTS: We present an approach, referred to as polar labeling (PL), to create a silver standard for training machine learning (ML) models for disease classification. We test the hypothesis that ML models trained on the silver standard, created by applying PL to unlabeled patient records, are comparable in performance to ML models trained on the gold standard, created by clinical experts through manual review of patient records. We perform experimental validation using health records of 38 023 patients spanning six diseases. Our results demonstrate the superior performance of the proposed approach. AVAILABILITY AND IMPLEMENTATION: We provide a Python implementation of the algorithm and the Python code developed for this study on GitHub. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Machine Learning , Color , Humans
...