ABSTRACT
OBJECTIVE: The coronavirus disease 2019 (COVID-19) pandemic has demonstrated the value of real-world data for public health research. International federated analyses are crucial for informing policy makers. Common data models (CDMs) are critical for enabling these studies to be performed efficiently. Our objective was to convert the UK Biobank, a study of 500,000 participants with rich genetic and phenotypic data, to the Observational Medical Outcomes Partnership (OMOP) CDM. MATERIALS AND METHODS: We converted UK Biobank data to OMOP CDM v. 5.3. We transformed participant research data on diseases collected at recruitment and electronic health records (EHRs) from primary care, hospitalizations, cancer registrations, and mortality from providers in England, Scotland, and Wales. We performed syntactic and semantic validations and compared comorbidities and risk factors between source and transformed data. RESULTS: We identified 502,505 participants (3086 with COVID-19) and transformed 690 fields (1,373,239,555 rows) to the OMOP CDM using 8 different controlled clinical terminologies and bespoke mappings. Specifically, we transformed self-reported noncancer illnesses 946,053 (83.91% of all source entries), cancers 37,802 (70.81%), medications 1,218,935 (88.25%), and prescriptions 864,788 (86.96%). In EHR, we transformed 13,028,182 (99.95%) hospital diagnoses, 6,465,399 (89.2%) procedures, 337,896,333 primary care diagnoses (CTV3, SNOMED-CT), 139,966,587 (98.74%) prescriptions (dm+d), and 77,127 (99.95%) deaths (ICD-10). We observed good concordance across demographic, risk factor, and comorbidity factors between source and transformed data. DISCUSSION AND CONCLUSION: Our study demonstrated that the OMOP CDM can be successfully leveraged to harmonize complex large-scale biobanked studies combining rich multimodal phenotypic data. Our study uncovered several challenges when transforming data from questionnaires to the OMOP CDM, which require further research.
The transformed UK Biobank resource is a valuable tool that can enable federated research, such as COVID-19 studies.
Subjects
Biological Specimen Banks , COVID-19 , Humans , Databases, Factual , Electronic Health Records , United Kingdom/epidemiology
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Prostate Cancer Diagnosis and Treatment Enhancement Through the Power of Big Data in Europe (PIONEER) is a European network of excellence for big data in prostate cancer, consisting of 32 private and public stakeholders from 9 countries across Europe. Launched by the Innovative Medicines Initiative 2 and part of the Big Data for Better Outcomes Programme (BD4BO), the overarching goal of PIONEER is to provide high-quality evidence on prostate cancer management by unlocking the potential of big data. The project has identified critical evidence gaps in prostate cancer care, via a detailed prioritization exercise including all key stakeholders. By standardizing and integrating existing high-quality and multidisciplinary data sources from patients with prostate cancer across different stages of the disease, the resulting big data will be assembled into a single innovative data platform for research. Based on a unique set of methodologies, PIONEER aims to advance the field of prostate cancer care with a particular focus on improving prostate-cancer-related outcomes, health system efficiency by streamlining patient management, and the quality of health and social care delivered to all men with prostate cancer and their families worldwide.