ABSTRACT
OBJECTIVE: Electronic Health Record (EHR) systems are digital platforms used in clinical practice to collect patients' clinical information related to their health status, and they represent a useful store of real-world data. EHRs have a potential role in research studies, in particular in platform trials. Platform trials are innovative trial designs that include multiple trial arms (conducted simultaneously and/or sequentially) on different treatments under a single master protocol. However, the use of EHRs in research comes with important challenges, such as incompleteness of records and the need to translate trial eligibility criteria into interoperable queries. In this paper, we aim to review and describe our proposed innovative methods for tackling some of the most important challenges identified. This work is part of the Innovative Medicines Initiative (IMI) EU Patient-cEntric clinicAl tRial pLatforms (EU-PEARL) project's work package 3 (WP3), whose objective is to deliver tools and guidance for EHR-based protocol feasibility assessment, clinical site selection, and patient pre-screening in platform trials, investing in building a data-driven clinical network framework that can execute these complex innovative designs, for which feasibility assessments are critically important. METHODS: ISO standards and relevant references informed a readiness survey, producing 354 criteria with corresponding questions, selected and harmonised through a 7-round scoring process (0-1) in stakeholder meetings, with 85% consensus as the acceptance threshold for a criterion/question. ATLAS cohort definition and Cohort Diagnostics were the main tools used to express the trial feasibility eligibility (inclusion/exclusion, I/E) criteria as executable interoperable queries.
RESULTS: The WP3/EU-PEARL group developed a readiness survey (eSurvey) of yes-or-no questions for the efficient selection of clinical sites with suitable EHRs, and set up interoperable proxy queries based on physician-defined trial criteria. Both actions facilitate the recruitment of trial participants and the alignment between study costs/timelines and data-driven recruitment potential. CONCLUSION: The eSurvey will help create an archive of clinical sites with mature EHR systems suitable to participate in clinical trials/platform trials, and the interoperable proxy queries of trial eligibility criteria will help identify the number of potential participants. Ultimately, these tools will contribute to the production of EHR-based protocol design.
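The acceptance rule described in METHODS can be illustrated with a minimal sketch. It assumes that the "(0-1)" scoring means each stakeholder gives a binary score and that consensus is the share of scores equal to 1; the abstract does not spell this out. The function names and the example questions are hypothetical and are not taken from the eSurvey.

```python
CONSENSUS_THRESHOLD = 0.85  # acceptance threshold stated in the abstract


def consensus(scores):
    """Share of stakeholders who scored the item 1 (assumes binary 0/1 scores)."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(s not in (0, 1) for s in scores):
        raise ValueError("scores must be 0 or 1")
    return sum(scores) / len(scores)


def accepted(scores, threshold=CONSENSUS_THRESHOLD):
    """True when the consensus reaches the acceptance threshold."""
    return consensus(scores) >= threshold


# Hypothetical votes from 20 stakeholders on two invented survey questions.
votes = {
    "Does the EHR store coded diagnoses?": [1] * 18 + [0] * 2,
    "Does the EHR keep an audit trail?": [1] * 16 + [0] * 4,
}
retained = [question for question, scores in votes.items() if accepted(scores)]
```

In this invented example only the first question is retained, because 18 of 20 positive scores exceeds the threshold while 16 of 20 does not.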
Subject(s)
Electronic Health Records , Physicians , Humans , Patient Selection , Records , Surveys and Questionnaires
ABSTRACT
Key Research Areas (KRAs) were identified to establish a semantic interoperability framework for intensive medicine data in Europe. These include assessing common data model value, ensuring smooth data interoperability, supporting data standardization for efficient dataset use, and defining anonymization requirements to balance data protection and innovation.
Subject(s)
Electronic Health Records , Europe , Humans , Health Information Interoperability , Critical Care , Computer Security , Semantics
ABSTRACT
OBJECTIVE: Health data standardized to a common data model (CDM) simplify and facilitate research. This study examines the factors that make standardizing observational health data to the Observational Medical Outcomes Partnership (OMOP) CDM successful. MATERIALS AND METHODS: Twenty-five data partners (DPs) from 11 countries received funding from the European Health Data Evidence Network (EHDEN) to standardize their data. Three surveys, DataQualityDashboard results, and statistics from the conversion process were analyzed qualitatively and quantitatively. Our measures of success were the total number of days to transform source data into the OMOP CDM and participation in network research. RESULTS: The health data converted to CDM represented more than 133 million patients. Surveys 1, 2, and 3 were completed by 100%, 88%, and 84% of DPs, respectively. The median duration of the 6 key extract, transform, and load (ETL) processes ranged from 4 to 115 days. Of the 25 DPs, 21 were considered applicable for analysis, of which 52% standardized their data on time and 48% participated in an international collaborative study. DISCUSSION: This study shows that the consistent workflow used by EHDEN is appropriate to support the successful standardization of observational data across Europe. Over the 25 successful transformations, we confirmed that getting the right people for the ETL is critical and that vocabulary mapping requires specific expertise and the support of tools. Additionally, we learned that teams that proactively prepared for data governance issues were able to avoid considerable delays, improving their ability to finish on time. CONCLUSION: This study provides guidance for future DPs to standardize to the OMOP CDM and participate in distributed networks. We demonstrate that the Observational Health Data Sciences and Informatics community must continue to evaluate and provide guidance and support for what ultimately develops the backbone of how community members generate evidence.
Subject(s)
Global Health , Medicine , Humans , Databases, Factual , Europe , Electronic Health Records
ABSTRACT
OBJECTIVE: The coronavirus disease 2019 (COVID-19) pandemic has demonstrated the value of real-world data for public health research. International federated analyses are crucial for informing policy makers. Common data models (CDMs) are critical for enabling these studies to be performed efficiently. Our objective was to convert the UK Biobank, a study of 500 000 participants with rich genetic and phenotypic data, to the Observational Medical Outcomes Partnership (OMOP) CDM. MATERIALS AND METHODS: We converted UK Biobank data to OMOP CDM v. 5.3. We transformed participant research data on diseases collected at recruitment and electronic health records (EHRs) from primary care, hospitalizations, cancer registrations, and mortality from providers in England, Scotland, and Wales. We performed syntactic and semantic validations and compared comorbidities and risk factors between source and transformed data. RESULTS: We identified 502 505 participants (3086 with COVID-19) and transformed 690 fields (1 373 239 555 rows) to the OMOP CDM using 8 different controlled clinical terminologies and bespoke mappings. Specifically, we transformed self-reported noncancer illnesses 946 053 (83.91% of all source entries), cancers 37 802 (70.81%), medications 1 218 935 (88.25%), and prescriptions 864 788 (86.96%). In EHR, we transformed 13 028 182 (99.95%) hospital diagnoses, 6 465 399 (89.2%) procedures, 337 896 333 primary care diagnoses (CTV3, SNOMED-CT), 139 966 587 (98.74%) prescriptions (dm+d) and 77 127 (99.95%) deaths (ICD-10). We observed good concordance across demographic, risk factor, and comorbidity factors between source and transformed data. DISCUSSION AND CONCLUSION: Our study demonstrated that the OMOP CDM can be successfully leveraged to harmonize complex large-scale biobanked studies combining rich multimodal phenotypic data. Our study uncovered several challenges when transforming data from questionnaires to the OMOP CDM which require further research. The transformed UK Biobank resource is a valuable tool that can enable federated research, like COVID-19 studies.
Subject(s)
Biological Specimen Banks , COVID-19 , Humans , Databases, Factual , Electronic Health Records , United Kingdom/epidemiology
ABSTRACT
OBJECTIVE: The aim of the study was to transform a resource of linked electronic health records (EHR) to the OMOP common data model (CDM) and evaluate the process in terms of syntactic and semantic consistency and quality when implementing disease and risk factor phenotyping algorithms. MATERIALS AND METHODS: Using heart failure (HF) as an exemplar, we represented three national EHR sources (Clinical Practice Research Datalink, Hospital Episode Statistics Admitted Patient Care, Office for National Statistics) in the OMOP CDM 5.2. We compared the original and CDM HF patient populations by calculating and presenting descriptive statistics of demographics, related comorbidities, and relevant clinical biomarkers. RESULTS: We identified a cohort of 502 536 patients with incident and prevalent HF and converted 1 099 195 384 rows of data from 216 581 914 encounters across three EHR sources to the OMOP CDM. The largest percentage (65%) of unmapped events was related to medication prescriptions in primary care. The average coverage of source vocabularies was >98%, with the exception of laboratory tests recorded in primary care. The raw and transformed data were similar in terms of demographics and comorbidities, with the largest difference observed being 3.78% in the prevalence of chronic obstructive pulmonary disease (COPD). CONCLUSION: Our study demonstrated that the OMOP CDM can successfully be applied to convert EHRs linked across multiple healthcare settings and to represent phenotyping algorithms spanning multiple sources. Similar to previous research, challenges in mapping primary care prescriptions and laboratory measurements persist and require further work. The use of OMOP CDM in national UK EHR is a valuable research tool that can enable large-scale reproducible observational research.
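The comparison of raw and transformed cohorts reported above can be sketched as follows. This is an illustration only: it assumes the reported difference is an absolute difference in percentage points, which the abstract does not state, and all prevalence values below are invented placeholders rather than figures from the study. The function names are hypothetical.

```python
def prevalence(n_with_condition, n_total):
    """Prevalence as a percentage of the cohort."""
    return 100.0 * n_with_condition / n_total


def largest_difference(source, transformed):
    """Return the condition with the largest absolute prevalence gap.

    Both arguments map condition name -> prevalence in percent; only
    conditions present in both cohorts are compared.
    """
    shared = sorted(source.keys() & transformed.keys())
    diffs = {name: abs(source[name] - transformed[name]) for name in shared}
    worst = max(diffs, key=diffs.get)
    return worst, diffs[worst]


# Invented placeholder prevalences (percent), not data from the paper.
source_cohort = {"COPD": 20.0, "diabetes": 30.0, "hypertension": 55.0}
cdm_cohort = {"COPD": 23.5, "diabetes": 30.4, "hypertension": 54.8}

condition, gap = largest_difference(source_cohort, cdm_cohort)
```

A check of this kind flags the comorbidity whose prevalence shifted most after conversion, which is where a mapping review would plausibly start.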
ABSTRACT
Prostate Cancer Diagnosis and Treatment Enhancement Through the Power of Big Data in Europe (PIONEER) is a European network of excellence for big data in prostate cancer, consisting of 32 private and public stakeholders from 9 countries across Europe. Launched by the Innovative Medicines Initiative 2 and part of the Big Data for Better Outcomes Programme (BD4BO), the overarching goal of PIONEER is to provide high-quality evidence on prostate cancer management by unlocking the potential of big data. The project has identified critical evidence gaps in prostate cancer care, via a detailed prioritization exercise including all key stakeholders. By standardizing and integrating existing high-quality and multidisciplinary data sources from patients with prostate cancer across different stages of the disease, the resulting big data will be assembled into a single innovative data platform for research. Based on a unique set of methodologies, PIONEER aims to advance the field of prostate cancer care with a particular focus on improving prostate-cancer-related outcomes, health system efficiency by streamlining patient management, and the quality of health and social care delivered to all men with prostate cancer and their families worldwide.
Subject(s)
Big Data , Biomedical Research , Prostatic Neoplasms , Humans , Male
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Signal transduction by prokaryotes almost exclusively relies on two-component systems for sensing and responding to (extracellular) signals. Here, we use stochastic models of two-component systems to better understand the impact of stochasticity on the fidelity and robustness of signal transmission, the outcome of autoregulatory gene expression and the influence of cell growth and division. We report that two-component systems are remarkably robust against copy number fluctuations of the signalling proteins they are composed of, which enhances signal transmission fidelity. Furthermore, we find that due to stochasticity these systems can get locked in an active state for extended time periods when (initially high) signal levels drop to zero. This behaviour can contribute to a bet-hedging adaptation strategy, aiding survival in fluctuating environments. Additionally, autoregulatory gene expression can cause two-component systems to become bistable at realistic parameter values. As a result, two sub-populations of cells (active and inactive) can co-exist, which contributes to fitness in unpredictable environments. Bistability proved robust with respect to cell growth and division, and is tunable by the growth rate. In conclusion, our results indicate how single cells can cope with the inevitable stochasticity occurring in the activity of their two-component systems. They are robust to disadvantageous fluctuations that scramble signal transduction and they exploit beneficial stochasticity that generates fitness-enhancing heterogeneity across an isogenic population of cells.
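To make the idea of a stochastic model of a two-component system concrete, the following is a minimal sketch using the Gillespie stochastic simulation algorithm. It is not the authors' model: it assumes a single response regulator pool of fixed size that is phosphorylated at a rate proportional to the signal and dephosphorylated at a constant per-molecule rate, and it leaves out the sensor kinase, autoregulation, and cell growth and division. All names and parameter values are hypothetical.

```python
import random


def gillespie_tcs(n_total, k_phos, k_dephos, signal, t_end, seed=0):
    """Simulate the phosphorylated regulator count with the Gillespie algorithm.

    Assumed reactions (illustrative only):
      R  -> Rp  with propensity k_phos * signal * (n_total - rp)
      Rp -> R   with propensity k_dephos * rp
    Returns the event times and the Rp copy number after each event.
    """
    rng = random.Random(seed)
    t, rp = 0.0, 0
    times, counts = [t], [rp]
    while True:
        a_phos = k_phos * signal * (n_total - rp)
        a_dephos = k_dephos * rp
        a_total = a_phos + a_dephos
        if a_total == 0.0:
            break  # no reaction can fire any more
        t_next = t + rng.expovariate(a_total)  # exponential waiting time
        if t_next > t_end:
            break
        t = t_next
        # Pick the reaction in proportion to its propensity.
        fire_phos = a_dephos == 0.0 or rng.random() * a_total < a_phos
        rp += 1 if fire_phos else -1
        times.append(t)
        counts.append(rp)
    return times, counts


# Hypothetical parameters chosen only to produce a short trajectory.
times, counts = gillespie_tcs(
    n_total=50, k_phos=1.0, k_dephos=1.0, signal=1.0, t_end=10.0
)
```

Repeating the run with different seeds gives an ensemble of trajectories whose spread reflects the copy number fluctuations discussed in the abstract, within the limits of this simplified model.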