ABSTRACT
Acute Kidney Injury (AKI) is a common clinical syndrome characterized by a rapid loss of kidney excretory function, and it aggravates the clinical severity of other diseases in a large number of hospitalized patients. Accurate early prediction of AKI can enable timely interventions and treatment. However, AKI is highly heterogeneous, so identification of AKI sub-phenotypes can lead to an improved understanding of the disease pathophysiology and to the development of more targeted clinical interventions. This study used a memory network-based deep learning approach to discover AKI sub-phenotypes from structured and unstructured electronic health record (EHR) data collected before AKI diagnosis. We leveraged a real-world critical care EHR corpus of 37,486 ICU stays. Our approach identified three distinct sub-phenotypes. Sub-phenotype I (mean age 63.03±17.25 years) was characterized by mild loss of kidney excretory function (serum creatinine (SCr) 1.55±0.34 mg/dL, estimated glomerular filtration rate (eGFR) 107.65±54.98 mL/min/1.73 m²); these patients were more likely to develop stage I AKI. Sub-phenotype II (mean age 66.81±10.43 years) was characterized by severe loss of kidney excretory function (SCr 1.96±0.49 mg/dL, eGFR 82.19±55.92 mL/min/1.73 m²); these patients were more likely to develop stage III AKI. Sub-phenotype III (mean age 65.07±11.32 years) was characterized by moderate loss of kidney excretory function (SCr 1.69±0.32 mg/dL, eGFR 93.97±56.53 mL/min/1.73 m²) and was more likely to develop stage II AKI. Both SCr and eGFR differed significantly across the three sub-phenotypes under statistical testing with post hoc analysis, and the differences remained after adjusting for age.
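The abstract does not name the exact statistical procedure; the Python sketch below illustrates one plausible analysis under that caveat: a one-way ANOVA across the three sub-phenotypes, Tukey HSD post hoc comparisons, and an age-adjusted comparison via ANCOVA. The table layout, column names, and simulated values are hypothetical stand-ins for the per-stay features.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import f_oneway
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical per-stay table: cluster assignment plus pre-diagnosis values.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "subphenotype": np.repeat(["I", "II", "III"], 100),
    "scr": np.concatenate([rng.normal(1.55, 0.34, 100),
                           rng.normal(1.96, 0.49, 100),
                           rng.normal(1.69, 0.32, 100)]),
    "age": np.concatenate([rng.normal(63, 17, 100),
                           rng.normal(67, 10, 100),
                           rng.normal(65, 11, 100)]),
})

# Omnibus test: does mean SCr differ across the three sub-phenotypes?
groups = [g["scr"].values for _, g in df.groupby("subphenotype")]
print(f_oneway(*groups))

# Post hoc pairwise comparisons (Tukey HSD).
print(pairwise_tukeyhsd(df["scr"], df["subphenotype"]))

# Age-adjusted comparison: ANCOVA with age as a covariate.
model = ols("scr ~ C(subphenotype) + age", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))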
Subject(s)
Acute Kidney Injury, Electronic Health Records, Acute Kidney Injury/diagnosis, Aged, Creatinine, Glomerular Filtration Rate, Humans, Middle Aged, Phenotype
ABSTRACT
The Quality Data Model (QDM) is an information model developed by the National Quality Forum for representing electronic health record (EHR)-based electronic clinical quality measures (eCQMs). In conjunction with the HL7 Health Quality Measures Format (HQMF), QDM contains core elements that make it a promising model for representing EHR-driven phenotype algorithms for clinical research. However, the current QDM specification is available only as descriptive documents suitable for human readability and interpretation, not for machine consumption. The objective of the present study was to develop and evaluate a data element repository (DER) providing machine-readable QDM data element service APIs to support phenotype algorithm authoring and execution. We used the ISO/IEC 11179 metadata standard to capture the structure of each data element, and leveraged Semantic Web technologies to facilitate semantic representation of these metadata. We observed a number of underspecified areas in the QDM, including the lack of model constraints and pre-defined value sets. We propose harmonization with the models developed in HL7 Fast Healthcare Interoperability Resources (FHIR) and the Clinical Information Modeling Initiative (CIMI) to enhance the QDM specification and enable the extensibility and better coverage of the DER. We also compared the DER with the existing QDM implementation used within the Measure Authoring Tool (MAT) to demonstrate the scalability and extensibility of our DER-based approach.
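As an illustration of what a machine-readable data element service could expose, the following Python/rdflib sketch encodes one QDM data element with ISO/IEC 11179-style metadata and queries it with SPARQL. The namespaces, property names, and value set URI are hypothetical and are not taken from the actual DER.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# Hypothetical namespaces and identifiers; the real DER vocabulary is not
# described in the abstract.
QDM = Namespace("http://example.org/qdm#")
MDR = Namespace("http://example.org/iso11179#")

g = Graph()
g.bind("qdm", QDM)
g.bind("mdr", MDR)

# One QDM data element ("Diagnosis, Active: Heart Failure") captured with
# ISO/IEC 11179-style metadata: concept, value domain, and bound value set.
elem = QDM["DiagnosisActive_HeartFailure"]
g.add((elem, RDF.type, MDR.DataElement))
g.add((elem, RDFS.label, Literal("Diagnosis, Active: Heart Failure")))
g.add((elem, MDR.dataElementConcept, QDM["DiagnosisActive"]))
g.add((elem, MDR.valueDomain, QDM["SNOMEDCT_ValueDomain"]))
g.add((elem, MDR.valueSet, URIRef("http://example.org/valuesets/heart-failure")))

# A data element service API could answer SPARQL queries over this metadata.
q = """
SELECT ?label ?vs WHERE {
  ?e a mdr:DataElement ;
     rdfs:label ?label ;
     mdr:valueSet ?vs .
}
"""
for row in g.query(q, initNs={"mdr": MDR, "rdfs": RDFS}):
    print(row.label, row.vs)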
Subject(s)
Algorithms, Electronic Health Records, Phenotype, Biomedical Research, Databases, Factual, Humans, Semantics
ABSTRACT
Objective: To evaluate whether a machine learning approach can accurately predict antidepressant treatment outcome using electronic health records (EHRs) from patients with depression. Method: This study examined 808 patients with depression treated at a New York City-based outpatient mental health clinic between June 13, 2016 and June 22, 2020. Antidepressant treatment outcome was defined by the trend in depression symptom severity over time and categorized as either "Recovering" or "Worsening" (i.e., non-Recovering), measured by the slope of the individual-level Patient Health Questionnaire-9 (PHQ-9) score trajectory over the 6 months following treatment initiation. A patient was designated as "Recovering" if the slope was less than 0 and as "Worsening" if the slope was 0 or greater. Multiple machine learning (ML) models, including L2-regularized Logistic Regression, Naive Bayes, Random Forest, and Gradient Boosting Decision Tree (GBDT), were used to predict treatment outcome based on additional EHR data, including demographics and diagnoses. Shapley Additive Explanations were applied to identify the most important predictors. Results: The GBDT achieved the best performance in predicting "Recovering" (AUC: 0.7654 ± 0.0227; precision: 0.6002 ± 0.0215; recall: 0.5131 ± 0.0336). When patients with low baseline PHQ-9 scores (<10) were excluded, the model predicted "Recovering" with AUC 0.7254 ± 0.0218, precision 0.5392 ± 0.0437, and recall 0.4431 ± 0.0513. Prior diagnosis of anxiety, psychotherapy, recurrent depression, and baseline depression symptom severity were strong predictors. Conclusions: The results demonstrate the potential utility of using ML on longitudinal EHRs to predict antidepressant treatment outcome. Our predictive tool holds promise for accelerating personalized medical management of patients with psychiatric illnesses.
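A minimal sketch of the stated outcome-labeling rule (sign of the PHQ-9 slope) and a GBDT classifier is shown below; the table layout, feature matrix, and column names are hypothetical stand-ins, and the real study used richer EHR features plus SHAP for interpretation.

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate

# Hypothetical long-format PHQ-9 table: one row per questionnaire
# administered during the 6 months after treatment initiation.
phq = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 2],
    "days_since_start": [0, 60, 150, 0, 90, 170],
    "phq9": [18, 14, 9, 12, 15, 17],
})

def label_outcome(group):
    # Least-squares slope of the individual PHQ-9 trajectory;
    # slope < 0 -> "Recovering", otherwise "Worsening".
    slope = np.polyfit(group["days_since_start"], group["phq9"], 1)[0]
    return "Recovering" if slope < 0 else "Worsening"

labels = phq.groupby("patient_id").apply(label_outcome)
print(labels)

# EHR features (demographics, diagnoses) would be joined on patient_id;
# X and y below are random stand-ins for that feature matrix and label.
rng = np.random.default_rng(0)
X = rng.random((200, 10))
y = rng.integers(0, 2, 200)
gbdt = GradientBoostingClassifier()
scores = cross_validate(gbdt, X, y, cv=5, scoring=["roc_auc", "precision", "recall"])
print({k: v.mean() for k, v in scores.items() if k.startswith("test_")})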
ABSTRACT
OBJECTIVE: To identify depression subphenotypes from Electronic Health Records (EHRs) using machine learning methods, and analyze their characteristics with respect to patient demographics, comorbidities, and medications. MATERIALS AND METHODS: Using EHRs from the INSIGHT Clinical Research Network (CRN) database, multiple machine learning (ML) algorithms were applied to analyze 11 275 patients with depression to discern depression subphenotypes with distinct characteristics. RESULTS: Using these computational approaches, we derived three depression subphenotypes: Phenotype_A (n = 2791; 31.35%) included patients who were the oldest (mean (SD) age, 72.55 (14.93) years), had the most comorbidities, and took the most medications. The most common comorbidities in this cluster of patients were hyperlipidemia, hypertension, and diabetes. Phenotype_B (mean (SD) age, 68.44 (19.09) years) was the largest cluster (n = 4687; 52.65%), and included patients suffering from moderate loss of body function. Asthma, fibromyalgia, and Chronic Pain and Fatigue (CPF) were common comorbidities in this subphenotype. Phenotype_C (n = 1452; 16.31%) included patients who were younger (mean (SD) age, 63.47 (18.81) years), had the fewest comorbidities, and took fewer medications. Anxiety and tobacco use were common comorbidities in this subphenotype. CONCLUSION: Computationally deriving depression subtypes can provide meaningful insights and improve understanding of depression as a heterogeneous disorder. Further investigation is needed to assess the utility of these derived phenotypes to inform clinical trial design and interpretation in routine patient care.
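The abstract does not name the specific clustering algorithm; the sketch below shows one generic way such subphenotypes could be derived, using k-means on standardized comorbidity/medication indicators with silhouette-based selection of the number of clusters. All features and values are simulated stand-ins.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Simulated stand-in features: binary comorbidity/medication flags plus age.
rng = np.random.default_rng(0)
X = np.hstack([rng.integers(0, 2, size=(500, 20)),
               rng.normal(68, 15, size=(500, 1))])
X = StandardScaler().fit_transform(X)

# Pick the number of clusters by silhouette score, then inspect cluster sizes;
# each cluster would be profiled against demographics, comorbidities, and
# medications to characterize the subphenotypes.
best_k, best_score = 2, -1.0
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
print(best_k, np.bincount(labels))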
ABSTRACT
INTRODUCTION: We sought to assess longitudinal electronic health records (EHRs) using machine learning (ML) methods to computationally derive probable Alzheimer's Disease (AD) and related dementia subphenotypes. METHODS: A retrospective analysis of EHR data from a cohort of 7587 patients seen at a large, multi-specialty urban academic medical center in New York was conducted. Subphenotypes were derived using hierarchical clustering from 792 probable AD patients (cases) who had received at least one diagnosis of AD, using their clinical data. The other 6795 patients, labeled as controls, were matched on age and gender with the cases and randomly selected at a ratio of 9:1. Prediction models with multiple ML algorithms were trained on this cohort using 5-fold cross-validation. XGBoost was used to rank variable importance. RESULTS: Four subphenotypes were computationally derived. Subphenotype A (n = 273; 28.2%) had more patients with cardiovascular diseases; subphenotype B (n = 221; 27.9%) had more patients with mental health illnesses, such as depression and anxiety; patients in subphenotype C (n = 183; 23.1%) were overall older (mean (SD) age, 79.5 (5.4) years) and had the most comorbidities, including diabetes, cardiovascular diseases, and mental health disorders; and subphenotype D (n = 115; 14.5%) included patients who took anti-dementia drugs and had sensory problems, such as deafness and hearing impairment. The 0-year prediction model for AD risk achieved an area under the receiver operating characteristic curve (AUC) of 0.764 (SD: 0.02); the 6-month model, 0.751 (SD: 0.02); the 1-year model, 0.752 (SD: 0.02); the 2-year model, 0.749 (SD: 0.03); and the 3-year model, 0.735 (SD: 0.03). Based on variable importance, the top-ranked comorbidities included depression, stroke/transient ischemic attack, hypertension, anxiety, mobility impairments, and atrial fibrillation. The top-ranked medications included anti-dementia drugs, antipsychotics, antiepileptics, and antidepressants. CONCLUSIONS: Four subphenotypes were computationally derived that correlated with cardiovascular diseases and mental health illnesses. ML algorithms based on patient demographics, diagnosis, and treatment demonstrated promising results in predicting the risk of developing AD at different time points across an individual's lifespan.
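The following sketch illustrates the two computational steps described above, hierarchical clustering of the cases into four subphenotypes and XGBoost-based risk prediction with 5-fold cross-validation and variable-importance ranking, using simulated stand-in features rather than the actual EHR variables.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)

# Hierarchical clustering of the probable AD cases (simulated features).
X_cases = rng.normal(size=(792, 30))
Z = linkage(X_cases, method="ward")
subphenotypes = fcluster(Z, t=4, criterion="maxclust")
print(np.bincount(subphenotypes)[1:])  # sizes of the four derived clusters

# Case/control AD-risk prediction with 5-fold cross-validation; features
# would be demographics, diagnoses, and medications aggregated up to the
# chosen prediction time point (0 years, 6 months, 1, 2, or 3 years prior).
X = rng.normal(size=(7587, 30))
y = np.concatenate([np.ones(792), np.zeros(6795)]).astype(int)
clf = XGBClassifier(n_estimators=200, eval_metric="logloss")
print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())

# Rank variable importance from a model fitted on the full cohort.
clf.fit(X, y)
print(np.argsort(clf.feature_importances_)[::-1][:10])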
ABSTRACT
INTRODUCTION: Electronic health record (EHR)-driven phenotyping is a critical first step in generating biomedical knowledge from EHR data. Despite recent progress, current phenotyping approaches are manual, time-consuming, error-prone, and platform-specific. This results in duplication of effort and highly variable results across systems and institutions, and is not scalable or portable. In this work, we investigate how the nascent Clinical Quality Language (CQL) can address these issues and enable high-throughput, cross-platform phenotyping. METHODS: We selected a clinically validated heart failure (HF) phenotype definition and translated it into CQL, then developed a CQL execution engine to integrate with the Observational Health Data Sciences and Informatics (OHDSI) platform. We executed the phenotype definition at two large academic medical centers, Northwestern Medicine and Weill Cornell Medicine, and conducted results verification (n = 100) to determine precision and recall. We additionally executed the same phenotype definition against two different data platforms, OHDSI and Fast Healthcare Interoperability Resources (FHIR), using the same underlying dataset and compared the results. RESULTS: CQL is expressive enough to represent the HF phenotype definition, including Boolean and aggregate operators, and temporal relationships between data elements. The language design also enabled the implementation of a custom execution engine with relative ease, and results verification at both sites revealed that precision and recall were both 100%. Cross-platform execution resulted in identical patient cohorts generated by both data platforms. CONCLUSIONS: CQL supports the representation of arbitrarily complex phenotype definitions, and our execution engine implementation demonstrated cross-platform execution against two widely used clinical data platforms. The language thus has the potential to help address current limitations with portability in EHR-driven phenotyping and scale in learning health systems.
ABSTRACT
With the increased adoption of electronic health records, data collected for routine clinical care are used for health outcomes and population sciences research, including the identification of phenotypes. In recent years, research networks such as eMERGE, OHDSI, and PCORnet have been able to increase statistical power and population diversity by combining patient cohorts. These networks share phenotype algorithms that are executed at each participating site. Here we report experiences with phenotype algorithm portability across seven research networks and propose a generalizable framework for phenotype algorithm portability. Several strategies exist to increase the portability of phenotype algorithms and reduce the implementation effort needed by each site. These include using a common data model, standardized representation of the phenotype algorithm logic, and technical solutions to facilitate federated execution of queries. Portability is achieved through tradeoffs across three domains (Data, Authoring, and Implementation), and multiple approaches were observed for representing portable phenotype algorithms. Our proposed framework will help guide future research in operationalizing phenotype algorithm portability at scale.
Subject(s)
Algorithms, Electronic Health Records, Biomedical Research, Common Data Elements, Computer Communication Networks, Humans, Phenotype
ABSTRACT
Acute Kidney Injury (AKI) is the most common cause of organ dysfunction in critically ill adults, and prior studies have shown that AKI is associated with a significant increase in mortality risk. Early prediction of mortality risk for AKI patients can help clinical decision makers better understand a patient's condition in time and take appropriate actions. However, AKI is a heterogeneous disease with complex causes, which makes such predictions a challenging task. In this paper, we investigate machine learning models for predicting the mortality risk of AKI patients who are stratified according to their AKI stages. With this setup, we demonstrate that stratified mortality prediction for patients with AKI performs better than prediction on the mixed population.
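A minimal sketch of the stratified setup is given below: one model trained on the mixed AKI population versus separate models per AKI stage, each evaluated with cross-validated AUC. The features, labels, and choice of logistic regression are illustrative assumptions, not the models used in the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Simulated stand-in cohort: pre-window features, AKI stage (1-3), and an
# in-hospital mortality label.
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 25))
stage = rng.integers(1, 4, size=3000)
y = rng.integers(0, 2, size=3000)

# Pooled model over the mixed AKI population.
pooled = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring="roc_auc").mean()
print("mixed population AUC:", pooled)

# Stratified models: one per AKI stage.
for s in (1, 2, 3):
    idx = stage == s
    auc = cross_val_score(LogisticRegression(max_iter=1000), X[idx], y[idx],
                          cv=5, scoring="roc_auc").mean()
    print(f"stage {s} AUC:", auc)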
Subject(s)
Acute Kidney Injury, Critical Care, Critical Illness, Humans, Machine Learning
ABSTRACT
Acute Kidney Injury (AKI) in critical care is often a quickly evolving clinical event with high morbidity and mortality. Early prediction of AKI risk in the critical care setting can facilitate early interventions that are likely to provide benefit. Recently there has been some research on AKI prediction with patient Electronic Health Records (EHRs). A class imbalance problem is encountered in this prediction setting, where the number of AKI cases is usually much smaller than the number of controls. This study systematically investigates the impact of class imbalance on AKI prediction performance and examines several class-balancing strategies to address it, including traditional statistical approaches and two proposed methods (a case-control matching approach and an individualized prediction approach). Our results show that the proposed class-balancing strategies can effectively improve AKI prediction performance. Additionally, several important predictors of AKI (e.g., creatinine, chloride, and urine) can be identified based on the proposed methods.
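The sketch below illustrates two of the class-balancing ideas mentioned above on simulated data: cost-sensitive learning via class weights, and a simplified age/gender case-control matching step. It is not the paper's exact matching or individualized-prediction procedure.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated ICU cohort with far fewer AKI cases than controls.
df = pd.DataFrame({
    "age": rng.normal(65, 12, 5000),
    "gender": rng.integers(0, 2, 5000),
    "creatinine": rng.normal(1.1, 0.4, 5000),
    "aki": np.r_[np.ones(400), np.zeros(4600)].astype(int),
})

# Strategy 1: cost-sensitive learning via class weights on the full cohort.
X, y = df[["age", "gender", "creatinine"]], df["aki"]
LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# Strategy 2: simplified case-control matching -- for each case, keep the
# k controls closest in age within the same gender.
def match_controls(df, k=4):
    cases, controls = df[df.aki == 1], df[df.aki == 0]
    keep = []
    for _, case in cases.iterrows():
        pool = controls[controls.gender == case.gender]
        keep.extend(pool.assign(dist=(pool.age - case.age).abs())
                        .nsmallest(k, "dist").index)
    return pd.concat([cases, controls.loc[pd.Index(keep).unique()]])

matched = match_controls(df)
print(matched.aki.value_counts())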
ABSTRACT
While natural language processing (NLP) of unstructured clinical narratives holds potential for patient care and clinical research, portability of NLP approaches across multiple sites remains a major challenge. This study investigated the portability of an NLP system, developed initially at the Department of Veterans Affairs (VA), for extracting 27 key cardiac concepts from free-text or semi-structured echocardiogram reports at three academic medical centers: Weill Cornell Medicine, Mayo Clinic, and Northwestern Medicine. While the NLP system showed high precision and recall for four target concepts (aortic valve regurgitation, left atrium size at end systole, mitral valve regurgitation, tricuspid valve regurgitation) across all sites, we found moderate or poor results for the remaining concepts, and NLP system performance varied between individual sites.
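For illustration only, the snippet below shows a bare-bones, regex-based extraction of a single concept (mitral regurgitation severity) from echocardiogram text; the VA-developed NLP system evaluated in the study is far more sophisticated, and the pattern and sample report here are hypothetical stand-ins.

import re

# Single-concept, regex-based stand-in (mitral regurgitation severity).
SEVERITY = r"(trace|trivial|mild|moderate|moderately severe|severe)"
MR_PATTERN = re.compile(
    rf"{SEVERITY}\s+mitral\s+(?:valve\s+)?regurgitation", re.IGNORECASE)

report = "LV size is normal. There is mild mitral regurgitation."
match = MR_PATTERN.search(report)
print(match.group(1) if match else "not mentioned")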
Subject(s)
Echocardiography, Electronic Health Records, Health Information Interoperability, Heart Valve Diseases/diagnostic imaging, Natural Language Processing, Heart/anatomy & histology, Heart/diagnostic imaging, Heart Valve Diseases/physiopathology, Humans, Narration, Retrospective Studies
ABSTRACT
Electronic health record (EHR) algorithms for defining patient cohorts are commonly shared as free-text descriptions that require human intervention both to interpret and implement. We developed the Phenotype Execution and Modeling Architecture (PhEMA, http://projectphema.org) to author and execute standardized computable phenotype algorithms. With PhEMA, we converted an algorithm for benign prostatic hyperplasia, developed for the electronic Medical Records and Genomics network (eMERGE), into a standards-based computable format. Eight sites (7 within eMERGE) received the computable algorithm, and 6 successfully executed it against local data warehouses and/or i2b2 instances. Blinded random chart review of cases selected by the computable algorithm shows PPV ≥90%, and 3 out of 5 sites had >90% overlap of selected cases when comparing the computable algorithm to their original eMERGE implementation. This case study demonstrates potential use of PhEMA computable representations to automate phenotyping across different EHR systems, but also highlights some ongoing challenges.
Subject(s)
Algorithms, Electronic Health Records, Phenotype, Prostatic Hyperplasia/diagnosis, Data Warehousing, Databases, Factual, Genomics, Humans, Male, Organizational Case Studies, Prostatic Hyperplasia/genetics
ABSTRACT
A variety of data models have been developed to provide a standardized data interface that supports organizing clinical research data into a standard structure for building integrated data repositories. HL7 Fast Healthcare Interoperability Resources (FHIR) is emerging as a next-generation standards framework for facilitating health care and electronic health record-based data exchange. The objective of this study was to design and assess a consensus-based approach for harmonizing the OHDSI CDM with HL7 FHIR. We leveraged the FHIR W5 (Who, What, When, Where, and Why) Classification System to design the harmonization approaches and assessed their utility in achieving consensus among curators using a standard inter-rater agreement measure. Moderate agreement was achieved for the model-level harmonization (kappa = 0.50), whereas only fair agreement was achieved for the property-level harmonization (kappa = 0.21). FHIR W5 is a useful tool for designing harmonization approaches between data models and FHIR and for facilitating consensus.
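Inter-rater agreement of the curators' harmonization decisions can be quantified with Cohen's kappa, as in the toy Python example below; the curator labels are fabricated for illustration and do not reflect the study's actual annotations.

from sklearn.metrics import cohen_kappa_score

# Fabricated example: two curators assign FHIR W5 categories to ten
# OHDSI CDM properties; kappa quantifies their chance-corrected agreement.
curator_1 = ["who", "what", "when", "where", "why", "what", "who", "when", "what", "where"]
curator_2 = ["who", "what", "when", "what", "why", "what", "who", "where", "what", "where"]
print(cohen_kappa_score(curator_1, curator_2))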
Subject(s)
Consensus, Electronic Health Records, Humans
ABSTRACT
OBJECTIVE: To review and evaluate available software tools for electronic health record-driven phenotype authoring in order to identify gaps and needs for future development. MATERIALS AND METHODS: Candidate phenotype authoring tools were identified through (1) a literature search in four publication databases (PubMed, Embase, Web of Science, and Scopus) and (2) a web search. A collection of tools was compiled and reviewed after the searches. A survey was designed and distributed to the developers of the reviewed tools to discover their functionalities and features. RESULTS: Twenty-four different phenotype authoring tools were identified and reviewed. Developers of 16 of these identified tools completed the evaluation survey (67% response rate). The surveyed tools showed commonalities but also varied in their capabilities in algorithm representation, logic functions, data support and software extensibility, search functions, user interface, and data outputs. DISCUSSION: Positive trends identified in the evaluation included: algorithms can be represented in both computable and human-readable formats, and most tools offer a web interface for easy access. However, issues were also identified: many tools lacked advanced logic functions for authoring complex algorithms; the ability to construct queries that leverage unstructured data was not widely implemented; and many tools had limited support for plug-ins or external analytic software. CONCLUSIONS: Existing phenotype authoring tools could enable clinical researchers to work with electronic health record data more efficiently, but gaps still exist in the functionalities of such tools. The present work can serve as a reference point for the future development of similar tools.
Subject(s)
Algorithms, Biomedical Research, Electronic Health Records, Software, Humans, Translational Research, Biomedical
ABSTRACT
Increasing interest in and experience with electronic health record (EHR)-driven phenotyping has yielded multiple challenges that are at present only partially addressed. Many solutions require the adoption of a single software platform, often at the additional cost of mapping existing patient and phenotypic data to multiple representations. We propose a set of guiding design principles and a modular software architecture to bridge the gap toward standardized phenotype representation, dissemination, and execution. Ongoing development leveraging this proposed architecture has shown its ability to address existing limitations.
ABSTRACT
By nature, healthcare data is highly complex and voluminous. While it provides unprecedented opportunities to identify hidden and unknown relationships between patients and treatment outcomes, or between drugs and allergic reactions for given individuals, representing and querying large network datasets poses significant technical challenges. In this research, we study the use of Semantic Web and Linked Data technologies for identifying drug-drug interaction (DDI) information from publicly available resources and determining whether such interactions are observed in real patient data. Specifically, we apply Linked Data principles and technologies to represent patient data from electronic health records (EHRs) at Mayo Clinic as Resource Description Framework (RDF), and identify potential drug-drug interactions (PDDIs) for widely prescribed cardiovascular and gastroenterology drugs. Results from this proof-of-concept study demonstrate the potential of applying such a methodology to study patient health outcomes as well as to enable genome-guided drug therapies and treatment interventions.
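A small Python/rdflib sketch of the general idea, representing medication records as RDF triples and using SPARQL to find patients co-prescribed a known interacting pair, is shown below; the vocabulary, URIs, and drug pair are illustrative assumptions rather than the schema used for the Mayo Clinic data.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Hypothetical vocabulary; the RDF schema used for the Mayo Clinic EHR data
# is not detailed in the abstract.
EX = Namespace("http://example.org/ehr#")
g = Graph()
g.bind("ex", EX)

# Patient medication records as triples.
for pid, drug in [("p1", "warfarin"), ("p1", "aspirin"), ("p2", "omeprazole")]:
    rec = EX[f"{pid}_{drug}"]
    g.add((rec, RDF.type, EX.MedicationRecord))
    g.add((rec, EX.patient, EX[pid]))
    g.add((rec, EX.drug, Literal(drug)))

# One interacting pair drawn from a public DDI source (illustrative).
g.add((EX.ddi1, RDF.type, EX.DrugDrugInteraction))
g.add((EX.ddi1, EX.drugA, Literal("warfarin")))
g.add((EX.ddi1, EX.drugB, Literal("aspirin")))

# SPARQL: which patients were co-prescribed a known interacting pair?
q = """
SELECT DISTINCT ?patient ?da ?db WHERE {
  ?ddi a ex:DrugDrugInteraction ; ex:drugA ?da ; ex:drugB ?db .
  ?r1 a ex:MedicationRecord ; ex:patient ?patient ; ex:drug ?da .
  ?r2 a ex:MedicationRecord ; ex:patient ?patient ; ex:drug ?db .
}
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.patient, row.da, row.db)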
Subject(s)
Adverse Drug Reaction Reporting Systems, Data Mining/methods, Drug-Related Side Effects and Adverse Reactions/classification, Electronic Health Records, Internet, Medical Record Linkage/methods, Natural Language Processing, Artificial Intelligence, Drug Interactions, Semantics
ABSTRACT
Gene Wiki Plus (GeneWiki+, or GWP) and Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, making the data entry and publication process time-consuming and, to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing, and potentially discover new, genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report preliminary findings on the coverage, completeness, and validity of these associations. Our results highlight the benefits of Semantic Web querying technology for validating existing disease-gene associations and identifying novel associations, although further evaluation and analysis are required before such information can be applied and used effectively.
ABSTRACT
The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. One of the key requirements to perform GWAS is the identification of subject cohorts with accurate classification of disease phenotypes. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical data stored in electronic health records (EHRs) to accurately identify subjects with specific diseases for inclusion in cohort studies. In particular, we demonstrate the role of the Resource Description Framework (RDF) in representing EHR data and enabling federated querying and inferencing via standardized Web protocols to identify subjects with Diabetes Mellitus. Our study highlights the potential of using Web-scale data federation approaches to execute complex queries.
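A hedged sketch of federated querying is shown below using the SPARQLWrapper library and the SPARQL SERVICE keyword against placeholder endpoints; the actual endpoints, graph layout, and diagnosis vocabulary used in the study are not described in the abstract, so every URI and property here is an assumption.

from SPARQLWrapper import JSON, SPARQLWrapper

# Placeholder endpoints and vocabulary; the query only runs against real
# SPARQL endpoints that expose data in this (assumed) shape.
sparql = SPARQLWrapper("http://example.org/ehr/sparql")
sparql.setReturnFormat(JSON)

# Federated query: join local diagnosis triples against a remote terminology
# store (via SERVICE) to find subjects whose codes map to Diabetes Mellitus.
sparql.setQuery("""
PREFIX ex: <http://example.org/ehr#>
SELECT DISTINCT ?subject WHERE {
  ?dx ex:patient ?subject ;
      ex:diagnosisCode ?code .
  SERVICE <http://example.org/terminology/sparql> {
    ?code ex:mapsTo ex:DiabetesMellitus .
  }
}
""")
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["subject"]["value"])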
ABSTRACT
The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, GWAS have historically been limited by inadequate sample size due to the associated costs of genotyping and phenotyping study subjects. This has prompted several academic medical centers to form "biobanks" where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of the Resource Description Framework (RDF) in representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes in order to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.
Subject(s)
Diabetes Mellitus, Type 2/genetics, Electronic Health Records, Genome-Wide Association Study, Phenotype, Data Mining, Electronic Health Records/standards, Genetic Association Studies, Genetic Predisposition to Disease, Humans, Internet/standards, Semantics
ABSTRACT
BACKGROUND: The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form "biobanks" where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. RESULTS: In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries. CONCLUSIONS: This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and identify genotype-phenotype associations.