1.
Clin Neuropsychol ; : 1-12, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38997666

ABSTRACT

Objective: To (1) examine the distribution of Telephone Interview for Cognitive Status-modified (TICS-m) scores in oldest-old individuals (aged 85 and above) identified as cognitively healthy by a previously validated electronic health record (EHR)-based computable phenotype (CP), and (2) compare different cutoff scores for cognitive impairment in this population. Method: The CP identified 24,024 persons; of these, 470 were contacted, and 252 consented and completed the assessment. Associations of TICS-m score with age, sex, and education category (<10 years, 11-15 years, and >16 years) were examined. The number of participants classified as impaired was tallied at commonly used cutoff scores (27-31). Results: TICS-m scores ranged from 18 to 44, with a mean of 32.6 (SD = 4.7), in adults aged 85-99 years. A linear regression model including (range-restricted) age, education, and sex showed beta estimates comparable to previous findings. Across cutoff scores of 27 to 31, the proportions of participants meeting criteria for MCI and dementia were slightly lower than those reported in studies of younger older adults recruited by traditional methods. Conclusions: Using a validated computable phenotype to identify a normative cohort produced a TICS-m distribution consistent with prior findings from more effortful approaches to cohort identification and established expected TICS-m performance in the oldest-old population.
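Comparing cutoffs amounts to counting how many participants score at or below each threshold. A minimal sketch, assuming that a score equal to or below the cutoff is classified as impaired (conventions differ between studies) and using made-up scores rather than the study data:

```python
def impaired_counts(scores: list[int], cutoffs: range) -> dict[int, int]:
    """For each cutoff, count the scores at or below it (assumed 'impaired' convention)."""
    return {c: sum(1 for s in scores if s <= c) for c in cutoffs}

# Made-up TICS-m scores for illustration only; not the study data.
scores = [18, 26, 27, 29, 31, 33, 35, 38, 41, 44]
counts = impaired_counts(scores, range(27, 32))  # cutoffs 27..31 inclusive
for cutoff, n in counts.items():
    print(f"cutoff {cutoff}: {n}/{len(scores)} classified as impaired")
```

Reporting the count for every cutoff, rather than one, makes it easy to see how sensitive prevalence estimates are to the choice of threshold.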

2.
Alzheimers Dement (Amst) ; 16(3): e12613, 2024.
Article in English | MEDLINE | ID: mdl-38966622

ABSTRACT

INTRODUCTION: Alzheimer's disease (AD) is often misclassified in electronic health records (EHRs) when relying solely on diagnosis codes. This study aimed to develop a more accurate computable phenotype (CP) for identifying AD patients using structured and unstructured EHR data. METHODS: We used EHRs from the University of Florida Health (UFHealth) system and created rule-based CPs iteratively through manual chart reviews. The CPs were then validated using data from the University of Texas Health Science Center at Houston (UTHealth) and the University of Minnesota (UMN). RESULTS: Our best-performing CP was "patient has at least 2 AD diagnoses and AD-related keywords in AD encounters," with F1-scores of 0.817 at UFHealth, 0.961 at UTHealth, and 0.623 at UMN. DISCUSSION: We developed and validated rule-based CPs for AD identification with good performance, which will be crucial for studies that aim to use real-world data such as EHRs. Highlights: Developed a computable phenotype (CP) to identify Alzheimer's disease (AD) patients using EHR data. Utilized both structured and unstructured EHR data to enhance CP accuracy. Achieved F1-scores of 0.817 at UFHealth, 0.961 at UTHealth, and 0.623 at UMN. Validated the CP across different demographics, ensuring robustness and fairness.
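The best-performing rule can be written as a short predicate over a patient's encounters. The sketch below is hypothetical: it reads "at least 2 AD diagnoses" as at least two encounters carrying an ICD-10-CM G30 code, treats a keyword in the note of any of those encounters as sufficient, and uses an illustrative keyword list (`AD_KEYWORDS`); the study's actual code and keyword lists are not given in the abstract.

```python
from dataclasses import dataclass

AD_CODE_PREFIX = "G30"  # ICD-10-CM Alzheimer's disease category; the study's code list may differ
AD_KEYWORDS = ("alzheimer", "amyloid")  # hypothetical keyword list for illustration

@dataclass
class Encounter:
    dx_codes: list[str]   # diagnosis codes recorded at this encounter
    note_text: str = ""   # clinical note text for this encounter

def is_ad_encounter(enc: Encounter) -> bool:
    """Here an 'AD encounter' is any encounter carrying an AD diagnosis code."""
    return any(code.startswith(AD_CODE_PREFIX) for code in enc.dx_codes)

def meets_ad_cp(encounters: list[Encounter]) -> bool:
    """>= 2 AD-coded encounters AND an AD keyword in at least one of those encounters' notes."""
    ad_encounters = [e for e in encounters if is_ad_encounter(e)]
    has_keyword = any(
        kw in e.note_text.lower() for e in ad_encounters for kw in AD_KEYWORDS
    )
    return len(ad_encounters) >= 2 and has_keyword

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

patient = [
    Encounter(["G30.9"], "Follow-up for Alzheimer disease, MMSE stable."),
    Encounter(["G30.9", "I10"]),
    Encounter(["I10"]),
]
print(meets_ad_cp(patient))  # True
```

Keeping the structured condition (codes) and the unstructured condition (keywords) as separate pieces makes it straightforward to score each alone and in combination against chart review.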

3.
JMIR Public Health Surveill ; 10: e49811, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39008361

ABSTRACT

BACKGROUND: Adverse events associated with vaccination have been evaluated in epidemiological studies and have recently gained additional attention with the emergency use authorization of several COVID-19 vaccines. As part of its responsibility to conduct postmarket surveillance, the US Food and Drug Administration continues to monitor several adverse events of special interest (AESIs) to ensure vaccine safety, including for COVID-19 vaccines. OBJECTIVE: This study is part of the Biologics Effectiveness and Safety Initiative, which aims to improve the Food and Drug Administration's postmarket surveillance capabilities while minimizing public burden. This study aimed to enhance active surveillance efforts through rules-based computable phenotype algorithms to identify 5 AESIs being monitored by the Centers for Disease Control and Prevention for COVID-19 or other vaccines: anaphylaxis, Guillain-Barré syndrome, myocarditis/pericarditis, thrombosis with thrombocytopenia syndrome, and febrile seizure. This study examined whether these phenotypes have a sufficiently high positive predictive value (PPV) to ensure that the cases selected for surveillance are reasonably likely to be postbiologic adverse events. Knowing the PPV allows patient privacy and security concerns about sharing data on patients who did not have an adverse event to be properly accounted for when evaluating the cost-benefit aspect of our approach. METHODS: AESI phenotype algorithms were developed to run against electronic health record data at health provider organizations across the country by querying for standard, interoperable codes. The codes queried in the rules represent symptoms, diagnoses, or treatments of the AESI and were sourced from published case definitions and input from clinicians. To validate the performance of the algorithms, we applied them to electronic health record data from a US academic health system and provided a sample of cases for clinicians to evaluate. Performance was assessed using PPV.
RESULTS: With a PPV of 93.3%, our anaphylaxis algorithm performed best. The PPVs for our febrile seizure, myocarditis/pericarditis, thrombosis with thrombocytopenia syndrome, and Guillain-Barré syndrome algorithms were 89%, 83.5%, 70.2%, and 47.2%, respectively. CONCLUSIONS: Given our algorithm design and performance, our results support continued research into using interoperable algorithms for widespread postmarket AESI detection.
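Because each algorithm was judged by PPV on a clinician-reviewed sample, reporting the PPV with a confidence interval shows how much the estimate depends on sample size. A minimal sketch using the Wilson score interval (a choice made here for illustration; the abstract does not say which interval, if any, was used), with hypothetical counts:

```python
import math

def ppv_with_wilson_ci(confirmed: int, reviewed: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (PPV, lower, upper) using the Wilson score interval for a binomial proportion."""
    p = confirmed / reviewed                   # PPV = confirmed cases / flagged cases reviewed
    denom = 1 + z**2 / reviewed
    center = (p + z**2 / (2 * reviewed)) / denom
    half_width = z * math.sqrt(p * (1 - p) / reviewed + z**2 / (4 * reviewed**2)) / denom
    return p, center - half_width, center + half_width

# Hypothetical validation sample: 45 of 50 flagged cases confirmed on chart review.
ppv, lower, upper = ppv_with_wilson_ci(45, 50)
print(f"PPV {ppv:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The Wilson interval stays within [0, 1] even when the PPV is close to 100%, which a simple normal-approximation interval does not guarantee.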


Subject(s)
Algorithms , Phenotype , Humans , United States/epidemiology , Biological Products/adverse effects , United States Food and Drug Administration , Adverse Drug Reaction Reporting Systems/statistics & numerical data , Drug-Related Side Effects and Adverse Reactions/epidemiology , Product Surveillance, Postmarketing/methods , Product Surveillance, Postmarketing/statistics & numerical data , COVID-19/prevention & control , COVID-19/epidemiology
4.
Am J Med Genet A ; 194(4): e63495, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38066696

ABSTRACT

Turner syndrome (TS) is a genetic condition occurring in ~1 in 2000 females and characterized by the complete or partial absence of the second sex chromosome. TS research faces challenges similar to those of many other rare pediatric conditions, with homogeneous, single-center, underpowered studies. Secondary analyses of electronic health record (EHR) data have the potential to address these limitations; however, an algorithm to accurately identify TS cases in EHR data is needed. We developed a computable phenotype to identify patients with TS using PEDSnet, a pediatric research network. This computable phenotype was validated through chart review; true and false positives and negatives were used to assess accuracy at both primary and external validation sites. The optimal algorithm consisted of the following criteria: female sex, ≥1 outpatient encounter, and ≥3 encounters with a diagnosis code that maps to TS, yielding an average sensitivity of 0.97, specificity of 0.88, and C-statistic of 0.93 across all sites. Identification of any estradiol prescription achieved an average C-statistic of 0.91 across sites, and 0.80 for transdermal and oral formulations considered separately. PEDSnet and computable phenotyping are powerful tools for providing large, diverse samples with which to pragmatically study rare pediatric conditions such as TS.
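The optimal TS algorithm is a conjunction of three counts, and its accuracy metrics follow from a 2×2 table against chart review. A minimal sketch; the `Encounter` type, the `"F"` sex code, and the setting labels are hypothetical stand-ins, not PEDSnet field names:

```python
from dataclasses import dataclass

@dataclass
class Encounter:
    setting: str      # e.g. "outpatient" or "inpatient" (hypothetical labels)
    has_ts_dx: bool   # True if a diagnosis code mapping to TS was recorded

def meets_ts_phenotype(sex: str, encounters: list[Encounter]) -> bool:
    """Female sex, >= 1 outpatient encounter, and >= 3 encounters with a TS-mapped code."""
    n_outpatient = sum(e.setting == "outpatient" for e in encounters)
    n_ts_dx = sum(e.has_ts_dx for e in encounters)
    return sex == "F" and n_outpatient >= 1 and n_ts_dx >= 3

def sens_spec_c(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Sensitivity, specificity, and the C-statistic of a single yes/no rule."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # A binary classifier contributes one interior point to the ROC curve, so the
    # area under it (the C-statistic) equals the mean of sensitivity and specificity.
    return sensitivity, specificity, (sensitivity + specificity) / 2

visits = [Encounter("outpatient", True), Encounter("inpatient", True), Encounter("inpatient", True)]
print(meets_ts_phenotype("F", visits))  # True
```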


Subject(s)
Electronic Health Records , Turner Syndrome , Humans , Child , Female , Turner Syndrome/diagnosis , Turner Syndrome/genetics , Phenotype , Algorithms , Estradiol
5.
Neuro Oncol ; 26(6): 1163-1170, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38141226

ABSTRACT

BACKGROUND: Glioblastoma is the most common malignant brain tumor, so it is important to be able to identify patients with this diagnosis for population studies. However, this can be challenging because diagnostic codes are nonspecific. The aim of this study was to create a computable phenotype (CP) for glioblastoma multiforme (GBM) from structured and unstructured data to identify patients with this condition in a large electronic health record (EHR). METHODS: We used the University of Florida (UF) Health Integrated Data Repository, a centralized clinical data warehouse that stores clinical and research data from various sources within the UF Health system, including the EHR system. We performed multiple iterations of manual chart review to refine the GBM-relevant diagnosis codes, procedure codes, medication codes, and keywords. We then evaluated the performance of candidate CPs constructed from the relevant codes and keywords. RESULTS: We conducted six rounds of manual chart review to refine the CP elements. The final CP algorithm for identifying GBM patients was selected based on the best F1-score. Overall, the CP rule "if the patient had at least 1 relevant diagnosis code and at least 1 relevant keyword," which uses both structured and unstructured data, demonstrated the highest F1-score and was therefore selected as the best-performing CP rule. CONCLUSIONS: We developed and validated a CP algorithm for identifying patients with GBM using both structured and unstructured EHR data from a large tertiary care center. The final algorithm achieved an F1-score of 0.817, indicating high performance, which helps minimize possible biases from misclassification errors.
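Selecting the final rule "based on the best F1-score" can be sketched as scoring every candidate rule against the chart-review labels and keeping the maximum. The patient dictionaries, rule names, and labels below are hypothetical toy data, not the UF Health data:

```python
def f1_score(predicted: list[bool], actual: list[bool]) -> float:
    """F1 = harmonic mean of precision (PPV) and recall (sensitivity)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_rule(rules, patients, labels):
    """Score every candidate rule against chart-review labels; return the top rule and all scores."""
    scores = {
        name: f1_score([rule(p) for p in patients], labels)
        for name, rule in rules.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical per-patient counts of GBM-relevant diagnosis codes and note keywords.
patients = [
    {"n_dx": 2, "n_kw": 3},
    {"n_dx": 1, "n_kw": 1},
    {"n_dx": 1, "n_kw": 0},
    {"n_dx": 0, "n_kw": 1},
    {"n_dx": 0, "n_kw": 0},
    {"n_dx": 1, "n_kw": 2},
]
labels = [True, True, False, False, False, True]  # hypothetical chart-review labels

rules = {
    "dx>=1": lambda p: p["n_dx"] >= 1,
    "kw>=1": lambda p: p["n_kw"] >= 1,
    "dx>=1 and kw>=1": lambda p: p["n_dx"] >= 1 and p["n_kw"] >= 1,
}
best, scores = best_rule(rules, patients, labels)
print(best, scores)
```

In this toy data each single-source rule picks up one false positive, which the combined rule filters out; whether that pattern holds in real data is exactly what the chart-review rounds measure.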


Subject(s)
Brain Neoplasms , Electronic Health Records , Glioblastoma , Phenotype , Humans , Glioblastoma/pathology , Glioblastoma/diagnosis , Brain Neoplasms/pathology , Brain Neoplasms/diagnosis , Algorithms , Female
6.
medRxiv ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37790390

ABSTRACT

Background: A scalable approach for sharing and reusing human-readable and computer-executable phenotype definitions can facilitate the reuse of electronic health records for cohort identification and research studies. Description: We developed a tool called Sharephe for the Informatics for Integrating Biology and the Bedside (i2b2) platform. Sharephe consists of a plugin for i2b2 and a cloud-based, searchable repository of computable phenotypes; it can import phenotypes from and export them to the repository and can link to supporting metadata. Discussion: The i2b2 platform enables researchers to create, evaluate, and implement phenotypes without knowing complex query languages. In an initial evaluation, two sites on the Evolve to Next-Gen ACT (ENACT) network used Sharephe to successfully create, share, and reuse phenotypes. Conclusion: The combination of a cloud-based computable phenotype repository and an i2b2 plugin for accessing it enables investigators to store and retrieve phenotypes from anywhere at any time and to collaborate across sites in a research network.
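The core idea of a shareable phenotype, a definition that is both human-readable and machine-executable and survives export and import unchanged, can be illustrated with a JSON round trip. This is a hypothetical sketch: `PhenotypeDefinition` and its fields are invented for illustration and are not Sharephe's or i2b2's actual format.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class PhenotypeDefinition:
    name: str
    version: str
    description: str                               # human-readable definition
    query: dict                                    # machine-executable logic (hypothetical structure)
    metadata: dict = field(default_factory=dict)   # e.g. links to validation documents

def export_definition(defn: PhenotypeDefinition) -> str:
    """Serialize to JSON text suitable for uploading to a shared repository."""
    return json.dumps(asdict(defn), indent=2, sort_keys=True)

def import_definition(text: str) -> PhenotypeDefinition:
    """Rebuild a definition from JSON text downloaded from the repository."""
    return PhenotypeDefinition(**json.loads(text))

defn = PhenotypeDefinition(
    name="example-phenotype",
    version="1.0",
    description="At least 1 relevant diagnosis code and at least 1 relevant keyword.",
    query={"all_of": [{"dx_count_min": 1}, {"keyword_count_min": 1}]},
    metadata={"validation": "chart review (hypothetical link)"},
)
text = export_definition(defn)
print(import_definition(text) == defn)  # True
```

Versioning the definition and carrying its metadata alongside the query is what lets a second site judge whether a downloaded phenotype fits its own data.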

7.
JMIR Med Inform ; 11: e46267, 2023 08 22.
Article in English | MEDLINE | ID: mdl-37621195

ABSTRACT

Background: Throughout the COVID-19 pandemic, many hospitals conducted routine testing of hospitalized patients for SARS-CoV-2 infection upon admission. Some of these patients were admitted for reasons unrelated to COVID-19 and incidentally tested positive for the virus. Because COVID-19-related hospitalizations have become a critical public health indicator, it is important to distinguish patients who are hospitalized because of COVID-19 from those who are admitted for other indications. Objective: We compared the performance of different computable phenotype definitions for COVID-19 hospitalizations that use different types of data from electronic health records (EHRs), including structured EHR data elements, clinical notes, or a combination of both data types. Methods: We conducted a retrospective data analysis using clinician chart review-based validation at a large academic medical center. We reviewed and analyzed the charts of 586 hospitalized individuals who tested positive for SARS-CoV-2 in January 2022. We used LASSO (least absolute shrinkage and selection operator) regression and random forests to fit classification algorithms that incorporated structured EHR data elements, clinical notes, or a combination of structured data and clinical notes. We used natural language processing to incorporate data from clinical notes. The performance of each model was evaluated based on the area under the receiver operating characteristic curve (AUROC) and an associated decision rule based on sensitivity and positive predictive value. We also identified top words and clinical indicators of COVID-19-specific hospitalization and assessed the impact of different phenotyping strategies on estimated hospital outcome metrics. Results: Based on chart review, 38.2% (224/586) of patients were determined to have been hospitalized for reasons other than COVID-19, despite having tested positive for SARS-CoV-2.
A computable phenotype that used clinical notes had significantly better discrimination than one that used structured EHR data elements (AUROC: 0.894 vs 0.841; P<.001) and performed similarly to a model that combined clinical notes with structured data elements (AUROC: 0.894 vs 0.893; P=.91). Assessments of hospital outcome metrics significantly differed based on whether the population included all hospitalized patients who tested positive for SARS-CoV-2 or those who were determined to have been hospitalized due to COVID-19. Conclusions: These findings highlight the importance of cause-specific phenotyping for COVID-19 hospitalizations. More generally, this work demonstrates the utility of natural language processing approaches for deriving information related to patient hospitalizations in cases where there may be multiple conditions that could serve as the primary indication for hospitalization.
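AUROC has a direct interpretation: the probability that a randomly chosen true COVID-19 hospitalization receives a higher model score than a randomly chosen incidental one, with ties counted as one half. A minimal, dependency-free sketch of that pairwise (Mann-Whitney) computation on hypothetical model scores; it is not the study's code, and it is O(n²), which is fine for illustration but not for large cohorts:

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities of "hospitalized because of COVID-19".
scores = [0.9, 0.6, 0.6, 0.3]
labels = [1, 1, 0, 0]  # 1 = chart review confirmed COVID-19 as the reason for admission
print(auroc(scores, labels))
```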

8.
medRxiv ; 2023 Jul 23.
Article in English | MEDLINE | ID: mdl-37502850

ABSTRACT

Turner syndrome (TS) is a genetic condition occurring in ~1 in 2,000 females and characterized by the complete or partial absence of the second sex chromosome. TS research faces challenges similar to those of many other rare pediatric conditions, with homogeneous, single-center, underpowered studies. Secondary analyses of electronic health record (EHR) data have the potential to address these limitations; however, an algorithm to accurately identify TS cases in EHR data is needed. We developed a computable phenotype to identify patients with TS using PEDSnet, a pediatric research network. This computable phenotype was validated through chart review; true and false positives and negatives were used to assess accuracy at both primary and external validation sites. The optimal algorithm consisted of the following criteria: female sex, ≥1 outpatient encounter, and ≥3 encounters with a diagnosis code that maps to TS, yielding an average sensitivity of 0.97, specificity of 0.88, and C-statistic of 0.93 across all sites. Identification of any estradiol prescription achieved an average C-statistic of 0.91 across sites, and 0.80 for transdermal and oral formulations considered separately. PEDSnet and computable phenotyping are powerful tools for providing large, diverse samples with which to pragmatically study rare pediatric conditions such as TS.

9.
J Biomed Inform ; 140: 104335, 2023 04.
Article in English | MEDLINE | ID: mdl-36933631

ABSTRACT

Identifying patient cohorts that meet the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and produce high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1% (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4% (N = 77); however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of methods, traditional Machine Learning (ML) was dominant, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in real settings are essential opportunities for future work.
There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.


Subject(s)
Algorithms , Electronic Health Records , Machine Learning , Natural Language Processing , Phenotype
10.
J Thromb Haemost ; 21(3): 513-521, 2023 03.
Article in English | MEDLINE | ID: mdl-36696219

ABSTRACT

BACKGROUND: Clinically relevant bleeding risk in discharged medical patients is underestimated and leads to rehospitalization, morbidity, and mortality. Studies assessing this risk are lacking. OBJECTIVE: The aim of this study was to develop and validate a computable phenotype for clinically relevant bleeding using electronic health record (EHR) data and to quantify the relative and absolute risks of this bleeding after medical hospitalization. METHODS: We conducted an observational cohort study of people receiving their primary care at sites affiliated with an academic medical center in northwest Vermont, United States. We developed a computable phenotype using EHR data (diagnosis codes, procedure codes, laboratory, and transfusion data) and validated it by manual chart review. Cox proportional hazards models with hospitalization modeled as a time-varying covariate were used to estimate clinically relevant bleeding risk. RESULTS: The computable phenotype had a positive predictive value of 80% and a negative predictive value of 99%. The bleeding rate in individuals with no medical hospitalizations in the past 3 months was 2.9 per 1000 person-years versus 98.9 per 1000 person-years in those who were discharged in the past 3 months. This translates into a hazard ratio (95% CI) of clinically relevant bleeding of 22.9 (18.9, 27.7), 13.0 (10.0, 16.9), and 6.8 (4.7, 9.8) over the first, second, and third months after discharge, respectively. CONCLUSION: We developed and validated a computable phenotype for clinically relevant bleeding and determined its relative and absolute risk in the 3 months after medical hospitalization discharge. The high rates of bleeding observed underscore the clinical importance of capturing and further studying bleeding after medical discharge.
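Modeling hospitalization as a time-varying covariate starts with splitting each person's follow-up into intervals that share the same time-since-discharge category, which also yields the person-time behind rates per 1000 person-years. A minimal sketch of that bookkeeping, assuming integer day offsets and 30-day "months" (both simplifications chosen here); fitting the Cox model itself would require a survival-analysis library and is not shown, and the rates computed are crude, not hazard ratios:

```python
from bisect import bisect_right
from collections import defaultdict

MONTH = 30  # assumption: months approximated as 30-day blocks

def category(day: int, discharges: list[int]) -> str:
    """Time-since-discharge category on `day`; `discharges` must be sorted."""
    i = bisect_right(discharges, day)  # number of discharges on or before `day`
    if i == 0:
        return "none"
    since = day - discharges[i - 1]  # days since the most recent discharge
    return f"month {since // MONTH + 1}" if since < 3 * MONTH else "none"

def split_person_time(start: int, end: int, discharges: list[int]) -> list[tuple[int, int, str]]:
    """Split follow-up [start, end) into (start, stop, category) counting-process rows."""
    discharges = sorted(discharges)
    cuts = {start, end}
    for d in discharges:
        for k in range(4):  # the category can change at d, d+30, d+60, and d+90
            t = d + k * MONTH
            if start < t < end:
                cuts.add(t)
    points = sorted(cuts)
    return [(a, b, category(a, discharges)) for a, b in zip(points, points[1:])]

def rates_per_1000_py(episodes, event_days, discharges):
    """Crude event rate per 1000 person-years within each category."""
    discharges = sorted(discharges)
    person_days = defaultdict(int)
    events = defaultdict(int)
    for a, b, cat in episodes:
        person_days[cat] += b - a
    for e in event_days:
        events[category(e, discharges)] += 1
    return {cat: 1000 * events[cat] / (person_days[cat] / 365.25) for cat in person_days}

# Hypothetical person: followed on days 0-365, discharged on day 100, bleeds on day 110.
episodes = split_person_time(0, 365, [100])
rates = rates_per_1000_py(episodes, [110], [100])
print(episodes)
```

Each `(start, stop, category)` row is the long-format layout that survival software typically expects for a time-varying covariate, with the event indicator attached to the row in which the event falls.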


Subject(s)
Inpatients , Thrombosis , Humans , United States , Risk , Cohort Studies , Hemorrhage , Hospitalization
11.
Int Rev Psychiatry ; 34(3-4): 282-291, 2022.
Article in English | MEDLINE | ID: mdl-36151822

ABSTRACT

In several countries, data related to gender identity and sexual orientation are not routinely collected, except for specific health or administrative/social purposes. Implementing and ensuring equitable and inclusive socio-demographic data collection is of paramount importance, given that the LGBTI community bears a disproportionate burden of both communicable and non-communicable diseases. To the best of the authors' knowledge, no systematic review has addressed the methods that can be used to capture gender identity- and sexual orientation-related data in the healthcare sector. A systematic literature review was conducted to fill this knowledge gap. Twenty-three articles were retained and analysed: two focussed on self-reported data, two on structured/semi-structured data, seven on text-mining, natural language processing, and other emerging artificial intelligence-based techniques, two on challenges in capturing sexual and gender-diverse populations, eight on the willingness to disclose gender identity and sexual orientation, and, finally, two on integrating structured and unstructured data. Our systematic literature review found that, despite the importance of collecting gender identity- and sexual orientation-related data and its increasing acceptance by the LGBTI community, several issues remain to be addressed. Transgender, non-binary, and intersex individuals often remain invisible and marginalized. In recent decades, structured data have been increasingly adopted. However, exploiting unstructured data seems to outperform structured data in identifying LGBTI members, especially when structured and unstructured data are integrated. Self-declared/self-perceived/self-disclosed definitions, while respectful of one's self-perception, may not be completely aligned with sexual behaviours and activities.
Incorporating different levels of information (biological, socio-demographic, behavioural, and clinical) would help overcome this pitfall. A shift from a rigid/static nomenclature towards a more nuanced, dynamic, 'fuzzy' concept of a 'computable phenotype' has been proposed in the literature to capture the complexity of sexual identities and trajectories. On the other hand, excessive fragmentation has to be avoided, considering that: (i) a full list of options covering all gender identities and sexual orientations will never be available; (ii) these options should be easily understood by the general population; and (iii) these options should be consistent so that they can be compared across studies and surveys. Only in this way can data collection be clinically meaningful, that is, able to impact clinical outcomes at the individual and population levels and to promote further research in the field.


Subject(s)
Gender Identity , Health Care Sector , Artificial Intelligence , Data Collection , Female , Humans , Male , Sexual Behavior
12.
Epilepsia ; 63(11): 2981-2993, 2022 11.
Article in English | MEDLINE | ID: mdl-36106377

ABSTRACT

OBJECTIVE: More than one third of appropriately treated patients with epilepsy continue to have seizures despite two or more medication trials, meeting criteria for drug-resistant epilepsy (DRE). Accurate and reliable identification of patients with DRE in observational data would enable large-scale, real-world comparative effectiveness research and improve access to specialized epilepsy care. In the present study, we aimed to develop and compare the performance of computable phenotypes for DRE using the Observational Medical Outcomes Partnership (OMOP) Common Data Model. METHODS: We randomly sampled 600 patients from our academic medical center's electronic health record (EHR)-derived OMOP database meeting previously validated criteria for epilepsy (January 2015-August 2021). Two reviewers manually classified patients as having DRE, drug-responsive epilepsy, undefined drug responsiveness, or no epilepsy as of the last EHR encounter in the study period based on consensus definitions. Demographic characteristics and codes for diagnoses, antiseizure medications (ASMs), and procedures were tested for association with DRE. Algorithms combining permutations of these factors were applied to calculate sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for DRE. The F1 score was used to compare overall performance. RESULTS: Among 412 patients with source record-confirmed epilepsy, 62 (15.0%) had DRE, 163 (39.6%) had drug-responsive epilepsy, 124 (30.1%) had undefined drug responsiveness, and 63 (15.3%) had insufficient records. The best-performing phenotype for DRE in terms of the F1 score was the presence of ≥1 intractable epilepsy code and ≥2 unique non-gabapentinoid ASM exposures, each with a ≥90-day drug era (sensitivity = .661, specificity = .937, PPV = .594, NPV = .952, F1 score = .626). Several phenotypes achieved higher sensitivity at the expense of specificity, and vice versa.
SIGNIFICANCE: OMOP algorithms can identify DRE in EHR-derived data with varying tradeoffs between sensitivity and specificity. These computable phenotypes can be applied across the largest international network of standardized clinical databases for further validation, reproducible observational research, and improving access to appropriate care.
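The best-performing DRE phenotype can be sketched as a predicate over drug eras, and its reported F1 can be checked from the reported PPV and sensitivity. Here drug eras are simplified to `(ingredient, start, end)` tuples loosely modeled on OMOP's DRUG_ERA table; the tuple shape, and treating "≥90-day" as an end-minus-start difference of at least 90 days, are assumptions:

```python
from datetime import date

GABAPENTINOIDS = {"gabapentin", "pregabalin"}  # excluded from the ASM count

def meets_dre_phenotype(n_intractable_codes: int, drug_eras: list[tuple[str, date, date]]) -> bool:
    """>= 1 intractable-epilepsy code and >= 2 distinct non-gabapentinoid ASMs with a >= 90-day era."""
    qualifying = {
        ingredient.lower()
        for ingredient, era_start, era_end in drug_eras
        if ingredient.lower() not in GABAPENTINOIDS and (era_end - era_start).days >= 90
    }
    return n_intractable_codes >= 1 and len(qualifying) >= 2

def f1_from_ppv_sensitivity(ppv: float, sensitivity: float) -> float:
    """F1 is the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)

# The reported PPV (.594) and sensitivity (.661) reproduce the reported F1 of .626.
print(round(f1_from_ppv_sensitivity(0.594, 0.661), 3))  # 0.626
```

Using a set of ingredient names means two eras of the same drug count once, which matches the "unique" ASM requirement.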


Subject(s)
Drug Resistant Epilepsy , Epilepsy , Humans , Electronic Health Records , Drug Resistant Epilepsy/diagnosis , Drug Resistant Epilepsy/drug therapy , Databases, Factual , Data Collection , Algorithms , Epilepsy/diagnosis , Epilepsy/drug therapy
13.
Med Decis Making ; 42(7): 937-944, 2022 10.
Article in English | MEDLINE | ID: mdl-35658747

ABSTRACT

BACKGROUND: Analytic tools to study important clinical issues in complex, chronic diseases such as Crohn's disease (CD) include randomized trials, claims database studies, and small longitudinal epidemiologic cohorts. Using natural language processing (NLP), we sought to define computable phenotype health states for pediatric and adult CD and to develop patient-level longitudinal histories of health outcomes. METHODS: We defined 6 health states for CD by combining a subjective symptom-based assessment (symptomatic/asymptomatic) with an objective disease-state assessment (active/inactive/no testing). The gold standard for the 6 health states was derived through an iterative review process by our CD experts. We calculated transition probabilities and estimated the time to transitions between health states using nonparametric Kaplan-Meier estimation and a Markov model. Finally, we determined a standard utility measure from clinical patients assigned to different health states. RESULTS: The NLP computable phenotype health state model correctly ascertained objective test results and symptoms 96% and 85% of the time, respectively, based on a blinded chart evaluation. In our model, >25% of patients who began as asymptomatic/active transitioned to symptomatic/active over the following year. For both adult and pediatric CD, the utility assessments of the symptomatic/inactive health state closely resembled those of the symptomatic/active health state. CONCLUSIONS: Our methodology for computable phenotype health states demonstrates the application of real-world data to define the progression and optimal management of a chronic disease such as CD. The application of the model has the potential to lead to a better understanding of the true impact of a therapeutic intervention and can support long-term cost-effectiveness analyses of a new therapy.
HIGHLIGHTS: Using natural language processing, we defined the computable phenotype health state of Crohn's disease and developed patient-level longitudinal histories for health outcomes. Our methodology demonstrates the application of real-world data to define the progression of a chronic disease. The application of the model has the potential to provide a better understanding of the true impact of a new therapy.
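The 6 health states are the cross product of 2 symptom states and 3 disease-activity states, and a discrete-time Markov model needs a transition-probability matrix between them. A minimal sketch that estimates one-step transition probabilities by counting observed transitions and normalizing each row (the maximum-likelihood estimate for a first-order chain); the state sequences are hypothetical, and the study's Kaplan-Meier time-to-transition analysis is not reproduced:

```python
from collections import Counter
from itertools import product

# 2 symptom states x 3 disease-activity states = 6 health states
STATES = [f"{s}/{d}" for s, d in product(("symptomatic", "asymptomatic"), ("active", "inactive", "no testing"))]

def transition_matrix(sequences: list[list[str]]) -> dict[str, dict[str, float]]:
    """Row-normalized counts of observed state-to-state transitions."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):  # consecutive assessments of one patient
            counts[(a, b)] += 1
    matrix = {}
    for a in STATES:
        total = sum(counts[(a, b)] for b in STATES)
        # Rows for states never observed as a starting point are left as all zeros.
        matrix[a] = {b: (counts[(a, b)] / total if total else 0.0) for b in STATES}
    return matrix

# Hypothetical per-patient sequences of health states at successive assessments.
sequences = [
    ["asymptomatic/active", "symptomatic/active", "symptomatic/active"],
    ["asymptomatic/active", "asymptomatic/inactive"],
]
P = transition_matrix(sequences)
print(P["asymptomatic/active"]["symptomatic/active"])
```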


Subject(s)
Crohn Disease , Chronic Disease , Cost-Benefit Analysis , Crohn Disease/diagnosis , Crohn Disease/drug therapy , Humans , Phenotype
14.
Crit Care Explor ; 4(3): e0645, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35261979

ABSTRACT

Acute respiratory failure is a common reason for ICU admission and imposes significant strain on patients and the healthcare system. Noninvasive positive-pressure ventilation and high-flow nasal oxygen are increasingly used as alternatives to invasive mechanical ventilation to treat acute respiratory failure. As such, there is a need to accurately cohort patients using large, routinely collected clinical data to better understand utilization patterns and patient outcomes. The primary objective of this retrospective observational study was to externally validate our computable phenotyping algorithm for patients with acute respiratory failure requiring various sequences of respiratory support, using real-world data from a large healthcare delivery network. DESIGN: This is a cross-sectional observational study to validate our algorithm for phenotyping patients with acute respiratory failure by method of respiratory support. We randomly selected 5% of records from each phenotype (n = 4,319) for manual validation. We calculated the algorithm performance and generated summary statistics for each phenotype and for a priori defined clinical subgroups. SETTING: Data were extracted from a clinical data warehouse containing electronic health record data from 46 ICUs in the southwest United States. PATIENTS: All adult (≥ 18 yr) patient records requiring any type of oxygen therapy or mechanical ventilation between November 1, 2013, and September 30, 2020, were extracted for the study. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Micro- and macroaveraged multiclass specificities of the algorithm were 0.902 and 0.896, respectively. Sensitivity and specificity of individual phenotypes were greater than 0.90 for all phenotypes except for patients extubated from invasive to noninvasive ventilation. We successfully created clinical subgroups of common illnesses requiring ventilatory support and provided a high-level comparison of outcomes.
CONCLUSIONS: The electronic phenotyping algorithm is robust and provides a necessary tool for retrospective research for characterizing patients with acute respiratory failure across modalities of respiratory support.
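Micro- and macroaveraged specificity differ in where the averaging happens: macroaveraging takes the mean of per-class specificities, while microaveraging pools true negatives and false positives across classes before dividing. A minimal sketch from a multiclass confusion matrix; the 3-class matrix is hypothetical, not the study's data:

```python
def specificities(confusion: list[list[int]]) -> tuple[list[float], float, float]:
    """confusion[i][j] = number of records with true class i predicted as class j."""
    k = len(confusion)
    total = sum(map(sum, confusion))
    tn_list, fp_list = [], []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # class c predicted as something else
        fp = sum(confusion[r][c] for r in range(k)) - tp     # other classes predicted as c
        tn = total - tp - fn - fp                            # everything not involving class c
        tn_list.append(tn)
        fp_list.append(fp)
    per_class = [tn / (tn + fp) for tn, fp in zip(tn_list, fp_list)]
    micro = sum(tn_list) / (sum(tn_list) + sum(fp_list))    # pool counts, then divide
    macro = sum(per_class) / k                              # divide per class, then average
    return per_class, micro, macro

# Hypothetical 3-phenotype validation sample.
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 0, 20],
]
per_class, micro, macro = specificities(confusion)
print(per_class, micro, macro)
```

When one phenotype dominates the sample, microaveraging weights it more heavily, which is one reason the two summary values reported in the abstract need not agree.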

15.
J Am Heart Assoc ; 11(7): e023237, 2022 04 05.
Article in English | MEDLINE | ID: mdl-35348008

ABSTRACT

Background Electronic medical records are increasingly used to identify disease cohorts; however, computable phenotypes using electronic medical record data are often unable to distinguish between prevalent and incident cases. Methods and Results We identified all Olmsted County, Minnesota residents aged ≥18 years with a first-ever International Classification of Diseases, Ninth Revision (ICD-9) diagnostic code for atrial fibrillation or atrial flutter from 2000 to 2014 (N=6177), and a random sample with an International Classification of Diseases, Tenth Revision (ICD-10) code from 2016 to 2018 (N=200). Trained nurse abstractors reviewed all medical records to validate the events and ascertain the date of onset (incidence date). Various algorithms based on the number and types of codes (inpatient/outpatient), medications, and procedures were evaluated. Positive predictive value (PPV) and sensitivity of the algorithms were calculated. The lowest PPV was observed for 1 code (64.4%), and the highest PPV was observed for 2 codes (any type) >7 days apart but within 1 year (71.6%). Requiring either 1 inpatient code or 2 outpatient codes separated by >7 days but within 1 year had the best balance between PPV (69.9%) and sensitivity (95.5%). PPVs were slightly higher using ICD-10 codes. Requiring an anticoagulant or antiarrhythmic prescription or electrical cardioversion in addition to diagnostic code(s) modestly improved the PPVs at the expense of large reductions in sensitivity. Conclusions We developed simple, exportable, computable phenotypes for atrial fibrillation using structured electronic medical record data. However, use of diagnostic codes to identify incident atrial fibrillation is prone to some misclassification. Further study is warranted to determine whether more complex phenotypes, including unstructured data sources or machine learning techniques, may improve the accuracy of identifying incident atrial fibrillation.
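The best-balanced rule ("1 inpatient or 2 outpatient codes separated by >7 days but within 1 year") translates directly into a check over dated diagnosis codes. A minimal sketch, assuming "within 1 year" means at most 365 days apart and that each code comes with a setting label; both the tuple layout and the labels are hypothetical:

```python
from datetime import date

def meets_af_phenotype(codes: list[tuple[date, str]]) -> bool:
    """codes: (date, setting) pairs for AF/flutter diagnosis codes; setting is 'inpatient' or 'outpatient'."""
    if any(setting == "inpatient" for _, setting in codes):
        return True  # a single inpatient code suffices
    outpatient = sorted(d for d, setting in codes if setting == "outpatient")
    for i, first in enumerate(outpatient):
        for later in outpatient[i + 1:]:
            gap = (later - first).days
            if 7 < gap <= 365:  # >7 days apart but within 1 year (assumed as <= 365 days)
                return True
    return False

print(meets_af_phenotype([(date(2015, 1, 1), "outpatient"), (date(2015, 1, 31), "outpatient")]))  # True
```

The >7-day separation guards against counting repeated codes from a single workup as two independent observations.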


Subject(s)
Atrial Fibrillation , Electronic Health Records , Algorithms , Atrial Fibrillation/diagnosis , Atrial Fibrillation/epidemiology , Electric Countershock , Humans , International Classification of Diseases , Machine Learning , Medical Records
16.
Int J Cardiol Heart Vasc ; 39: 100974, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35242997

ABSTRACT

BACKGROUND: Existing data in electronic health records (EHRs) could be used more extensively to leverage real-world data for clinical studies, but only if standard, reliable processes are developed. Numerous computable phenotypes have been validated against manual chart review, and common data models (CDMs) exist to aid implementation of such phenotypes across platforms and sites. Our objective was to measure consistency between data previously collected manually for an implantable cardiac device registry and CDM-based phenotypes for heart failure (HF). METHODS: Patients enrolled in an implantable cardiac device registry at two hospitals from 2013 to 2018 contributed to this analysis, in which registry data were compared with PCORnet CDM-formatted EHR data. Seven different phenotype algorithms were used to search for the presence of HF, and the results were compared with the registry. Sensitivity, specificity, predictive values, and congruence were calculated for each phenotype. RESULTS: In the registry, 176 of 319 (55%) patients had a history of HF, whereas the phenotypes identified between 96 (30%) and 188 (59%). The least restrictive phenotypes (any diagnosis) had high sensitivity and specificity (90%/80%), but more restrictive phenotypes had higher specificity (e.g., code present in problem list, 94%). Differences were observed when time-based criteria were applied (e.g., days between visit diagnoses) and between participating hospitals. CONCLUSIONS: Consistency between manually collected registry data and CDM-based phenotypes for history of HF was high overall, but the choice of phenotype affected sensitivity and specificity, and results may differ depending on the medical condition of interest.
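Validating a phenotype against a reference standard, here the registry, comes down to a 2×2 table of agreement per patient. The sketch below shows how sensitivity, specificity, PPV, and NPV follow from that table. The function name and input layout are hypothetical, and the abstract's "congruence" measure is not reproduced because its exact definition is not given here.

```python
import math
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a cell total is empty.
    return numerator / denominator if denominator else math.nan


def validation_metrics(reference: Sequence[bool], phenotype: Sequence[bool]) -> Dict[str, float]:
    """Compare a phenotype's per-patient flags with a reference standard.

    `reference[i]` is True if the reference (e.g. a registry) records the
    condition for patient i; `phenotype[i]` is True if the algorithm flags it.
    """
    if len(reference) != len(phenotype):
        raise ValueError("reference and phenotype must cover the same patients")
    tp = sum(r and p for r, p in zip(reference, phenotype))
    fn = sum(r and not p for r, p in zip(reference, phenotype))
    fp = sum(p and not r for r, p in zip(reference, phenotype))
    tn = sum(not r and not p for r, p in zip(reference, phenotype))
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    registry = [True, True, True, False, False]
    any_diagnosis = [True, True, False, True, False]
    print(validation_metrics(registry, any_diagnosis))
```

Running each of several candidate phenotypes through the same function against one reference list makes the sensitivity/specificity trade-off between restrictive and permissive definitions directly comparable.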

17.
Ophthalmic Epidemiol ; 29(6): 640-648, 2022 12.
Article in English | MEDLINE | ID: mdl-34822319

ABSTRACT

The availability of electronic health record (EHR)-linked biobank data for research presents opportunities to better understand complex ocular diseases. Developing accurate computable phenotypes is difficult for ocular diseases whose gold-standard diagnosis includes imaging, because imaging data remain inaccessible in most biobank-linked EHRs. The objective of this study was to develop and validate a computable phenotype to identify primary open-angle glaucoma (POAG) by accessing the Department of Veterans Affairs (VA) Computerized Patient Record System (CPRS) and the Million Veteran Program (MVP) biobank. Using CPRS clinical ophthalmology data from VA Medical Center Eye Clinic (VAMCEC) patients, we developed and iteratively refined POAG case and control algorithms based on clinical, prescription, and structured diagnosis data (ICD-CM codes). Refinement was performed via detailed chart review, initially at a single VAMCEC (n = 200), and the algorithms were validated at two additional VAMCECs (n = 100 each). Positive and negative predictive values (PPV, NPV) were computed as the proportions of CPRS patients correctly classified by the algorithms as having or not having POAG, respectively, as validated by ophthalmologists and optometrists with access to gold-standard clinical diagnosis data. The final algorithms performed better than previously reported approaches in assuring the accuracy and reproducibility of POAG classification (PPV >83% and NPV >97%), with consistent performance in Black or African American and in White Veterans. When the algorithms were applied to the MVP to identify cases and controls, genetic analysis of a known POAG-associated locus further validated them. We conclude that ours is a viable approach to using combined EHR-genetic data to study patients with complex diseases that require imaging confirmation.


Subject(s)
Glaucoma, Open-Angle , Veterans , Humans , Glaucoma, Open-Angle/diagnosis , Glaucoma, Open-Angle/epidemiology , Reproducibility of Results , Algorithms , Electronic Health Records
18.
Gigascience ; 10(9)2021 09 11.
Article in English | MEDLINE | ID: mdl-34508578

ABSTRACT

BACKGROUND: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. METHODS: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. RESULTS: We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. CONCLUSIONS: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.


Subject(s)
Electronic Health Records , Humans , Phenotype , Reproducibility of Results
19.
Int J Med Inform ; 153: 104531, 2021 09.
Article in English | MEDLINE | ID: mdl-34332468

ABSTRACT

BACKGROUND: Replicating prediction models using electronic health records (EHR) is challenging because phenotypes for the study cohort, outcomes, and covariates must all be computed. However, some phenotypes may not be easily replicated across EHR data sources for a variety of reasons, such as the lack of gold-standard definitions and documentation variations across systems, which may lead to measurement error and potential bias. Methicillin-resistant Staphylococcus aureus (MRSA) infections are responsible for high mortality worldwide. With limited treatment options for the infection, the ability to predict MRSA outcomes is of interest. However, replicating these MRSA outcome prediction models using EHR data is problematic due to the lack of well-defined computable phenotypes for many of the predictors as well as for study inclusion and outcome criteria. OBJECTIVE: In this study, we aimed to evaluate a prediction model for 30-day mortality after diagnosis of MRSA bacteremia with reduced vancomycin susceptibility (MRSA-RVS), considering multiple computable phenotypes using EHR data. METHODS: We used EHR data from a large academic health center in the United States to replicate the original study conducted in Taiwan. We derived multiple computable phenotypes of the risk factors and predictors used in the original study, reported stratified descriptive statistics, and assessed the performance of the prediction model. RESULTS: In our replication study, it was possible to (re)compute most of the original variables. Nevertheless, the computable phenotypes for certain variables could only be approximated by proxy with structured EHR data items, especially composite clinical indices such as the Pitt bacteremia score. Even the computable phenotype for the outcome variable was subject to variation depending on the admission/discharge windows. The replicated prediction model exhibited only modest discriminatory ability.
CONCLUSION: Despite the rich information in EHR data, replication of prediction models involving complex predictors is still challenging, often due to the limited availability of validated computable phenotypes. On the other hand, it is often possible to derive proxy computable phenotypes that can be further validated and calibrated.


Subject(s)
Bacteremia , Methicillin-Resistant Staphylococcus aureus , Staphylococcal Infections , Anti-Bacterial Agents/therapeutic use , Bacteremia/drug therapy , Electronic Health Records , Humans , Phenotype , Staphylococcal Infections/drug therapy , United States
20.
J Child Neurol ; 36(11): 990-997, 2021 10.
Article in English | MEDLINE | ID: mdl-34315300

ABSTRACT

INTRODUCTION: Computable phenotypes allow identification of well-defined patient cohorts from electronic health record data. Little is known about the accuracy of diagnostic codes for important clinical concepts in pediatric epilepsy, such as (1) risk factors like neonatal hypoxic-ischemic encephalopathy, (2) clinical concepts like treatment resistance, and (3) syndromes like juvenile myoclonic epilepsy. We developed and evaluated the performance of computable phenotypes for these examples using electronic health record data at one center. METHODS: We identified gold-standard cohorts for neonatal hypoxic-ischemic encephalopathy, pediatric treatment-resistant epilepsy, and juvenile myoclonic epilepsy via existing registries and review of clinical notes. From the electronic health record, we extracted diagnostic and procedure codes for all children with a diagnosis of epilepsy and seizures. We used these codes to develop computable phenotypes and evaluated them by sensitivity, positive predictive value, and the F-measure. RESULTS: For neonatal hypoxic-ischemic encephalopathy, the best-performing computable phenotype (HIE ICD-9/10 and [brain magnetic resonance imaging (MRI) or electroencephalography (EEG) within 120 days of life] and absence of commonly miscoded conditions) had high sensitivity (95.7%, 95% confidence interval [CI] 85-99), positive predictive value (100%, 95% CI 95-100), and F-measure (0.98). For treatment-resistant epilepsy, the best-performing computable phenotype (3 or more antiseizure medicines in the last 2 years or treatment-resistant ICD-10) had a sensitivity of 86.9% (95% CI 79-93), positive predictive value of 69.6% (95% CI 60-79), and F-measure of 0.77. For juvenile myoclonic epilepsy, the best-performing computable phenotype (JME ICD-10) had poor sensitivity (52%, 95% CI 43-60) but high positive predictive value (90.4%, 95% CI 81-96); the F-measure was 0.66.
CONCLUSION: The variable accuracy of our computable phenotypes (hypoxic-ischemic encephalopathy high, treatment resistance medium, and juvenile myoclonic epilepsy low) demonstrates the heterogeneity of success using administrative data to identify cohorts important for pediatric epilepsy research.
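The F-measure reported above summarizes sensitivity and positive predictive value in one number. Assuming it is the balanced F1 score (the harmonic mean of the two), the reported values can be recomputed from the abstract's own figures; the sketch below does this and is illustrative rather than the authors' code.

```python
from typing import Dict, Tuple


def f_measure(sensitivity: float, ppv: float) -> float:
    """Balanced F-measure (F1): harmonic mean of sensitivity and PPV."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# (sensitivity, PPV) of each best-performing phenotype, taken from the abstract.
reported: Dict[str, Tuple[float, float]] = {
    "neonatal HIE": (0.957, 1.000),
    "treatment-resistant epilepsy": (0.869, 0.696),
    "juvenile myoclonic epilepsy": (0.520, 0.904),
}

if __name__ == "__main__":
    for name, (sens, ppv) in reported.items():
        print(f"{name}: F = {f_measure(sens, ppv):.2f}")
```

The recomputed values (0.98, 0.77, and 0.66) match the abstract. This also shows why the JME phenotype scores low despite a PPV above 90%: the harmonic mean is pulled toward the weaker of the two components.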


Subject(s)
Brain/diagnostic imaging , Electroencephalography/methods , Electronic Health Records/statistics & numerical data , Epilepsy/diagnosis , Magnetic Resonance Imaging/methods , Registries/statistics & numerical data , Cross-Sectional Studies , Female , Humans , Infant, Newborn , Male , Phenotype , Reproducibility of Results , Retrospective Studies , Sensitivity and Specificity