ABSTRACT
The Hamburg City Health Study (HCHS) is a large, prospective, long-term, population-based cohort study and a unique research platform and network for obtaining substantial knowledge about important risk and prognostic factors in major chronic diseases. A random sample of 45,000 participants between 45 and 74 years of age from the general population of Hamburg, Germany, is taking part in an extensive baseline assessment at one dedicated study center. Participants undergo 13 validated and 5 novel examinations primarily targeting major organ system function and structure, including extensive imaging examinations. The protocol includes validated self-reports via questionnaires regarding lifestyle and environmental conditions, dietary habits, physical condition and activity, sexual dysfunction, professional life, psychosocial context and burden, quality of life, digital media use, occupational, medical and family history, as well as healthcare utilization. The assessment is completed by genomic and proteomic characterization. Beyond the identification of classical risk factors for major chronic diseases and survivorship, the core intention is to gather valid prevalence and incidence estimates and to develop complex models predicting health outcomes based on a multitude of examination data, imaging, biomarker, psychosocial and behavioral assessments. Participants at risk for coronary artery disease, atrial fibrillation, heart failure, stroke and dementia are invited to an additional visit for an MRI examination of either the heart or the brain. Endpoint assessment of the overall sample will be completed through repeated follow-up examinations and surveys, as well as linked individual routine data from the participating health and pension insurers. The study targets the complex relationship between biological and psychosocial risk and resilience factors, chronic disease, healthcare use, survivorship and health, as well as favorable and unfavorable prognosis, within a unique, large-scale, long-term assessment of a representative European metropolitan population, with the perspective of further examinations after 6 years.
Subject(s)
Chronic Disease/epidemiology, Aged, Atrial Fibrillation, Cohort Studies, Coronary Artery Disease, Female, Germany/epidemiology, Heart Failure, Humans, Incidence, Life Style, Magnetic Resonance Imaging, Male, Mental Disorders, Middle Aged, Neoplasms, Oral Health, Population Surveillance, Prevalence, Prospective Studies, Proteomics, Quality of Life, Research Design, Risk Factors, Stroke, Surveys and Questionnaires
ABSTRACT
Searching for patient cohorts in electronic patient data often requires the definition of temporal constraints between the selection criteria. However, beyond a certain degree of temporal complexity, the non-graphical, form-based approaches implemented in current translational research platforms may be limited when modeling such constraints. In our opinion, there is a need for an easily accessible and implementable, fully graphical method for creating temporal queries. We aim to respond to this challenge with a new graphical notation. Based on Allen's time interval algebra, it allows for modeling temporal queries by arranging simple horizontal bars depicting symbolic time intervals. To make our approach applicable to complex temporal patterns, we apply two extensions: with duration intervals, we enable the inference about relative temporal distances between patient events, and with time interval modifiers, we support counting and excluding patient events, as well as constraining numeric values. We describe how to generate database queries from this notation. We provide a prototypical implementation, consisting of a temporal query modeling frontend and an experimental backend that connects to an i2b2 system. We evaluate our modeling approach on the MIMIC-III database to demonstrate that it can be used for modeling typical temporal phenotyping queries.
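To make the underlying algebra concrete, the following minimal Python sketch (not the published implementation; the `Interval` tuple and the diagnosis/medication example are purely illustrative) encodes four of Allen's thirteen interval relations as predicates over symbolic time intervals:

```python
# Illustrative sketch of Allen's interval relations as predicates over
# symbolic time intervals; names and example values are hypothetical.
from collections import namedtuple

Interval = namedtuple("Interval", ["start", "end"])  # assumes start < end

def before(a: Interval, b: Interval) -> bool:
    """Allen's 'before': a ends strictly before b starts."""
    return a.end < b.start

def meets(a: Interval, b: Interval) -> bool:
    """Allen's 'meets': a ends exactly when b starts."""
    return a.end == b.start

def overlaps(a: Interval, b: Interval) -> bool:
    """Allen's 'overlaps': a starts first and ends inside b."""
    return a.start < b.start < a.end < b.end

def during(a: Interval, b: Interval) -> bool:
    """Allen's 'during': a lies strictly inside b."""
    return b.start < a.start and a.end < b.end

# Example temporal constraint: a diagnosis must occur before a medication.
diagnosis = Interval(start=3, end=5)
medication = Interval(start=7, end=9)
assert before(diagnosis, medication)
```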
Subject(s)
Computer Graphics, Computer Simulation, Algorithms, Factual Databases, Humans, Information Storage and Retrieval, Time
ABSTRACT
COVID-19 has challenged healthcare systems worldwide. To quickly identify successful diagnostic and therapeutic approaches, large-scale data sharing is indispensable. Although clinical data are abundant within organizations, much of this data is available only in isolated silos and largely inaccessible to external researchers. To tackle this challenge, the university medicine network (comprising all 36 German university hospitals) was founded in April 2020 to coordinate COVID-19 action plans, diagnostic and therapeutic strategies, and collaborative research activities. Thirteen projects were initiated, among them the CODEX project, which aims at the development of a Germany-wide COVID-19 Data Exchange Platform and is presented in this publication. We illustrate the conceptual design, the stepwise development and deployment, first results, and the current status.
Subject(s)
COVID-19, Delivery of Health Care, Germany, University Hospitals, Humans, Information Dissemination
ABSTRACT
BACKGROUND: The harmonization and standardization of digital medical information for research purposes is a challenging and ongoing collaborative effort. Current research data repositories typically require extensive efforts to harmonize and transform the original clinical data. The Fast Healthcare Interoperability Resources (FHIR) format was designed primarily to represent clinical processes; it therefore closely resembles the clinical data model and is more widely available across modern electronic health records. However, no common standardized data format is directly suitable for statistical analyses, so data need to be preprocessed before statistical analysis. OBJECTIVE: This study aimed to elucidate how FHIR data can be queried directly with a preprocessing service and used for statistical analyses. METHODS: We propose that the binary JSON (JSONB) format of the open source PostgreSQL (PSQL) database is suitable not only for storing FHIR data but also for extending it with preprocessing and filtering services, which directly transform data stored in FHIR format into prepared data subsets for statistical analysis. We specified an interface for this preprocessor, implemented and deployed it at University Hospital Erlangen-Nürnberg, generated 3 sample data sets, and analyzed the available data. RESULTS: We imported real-world patient data from 2016 to 2018 into a standard PSQL database, generating a dataset of approximately 35.5 million FHIR resources, including "Patient," "Encounter," "Condition" (diagnoses specified using International Classification of Diseases codes), "Procedure," and "Observation" (laboratory test results). We then integrated the developed preprocessing service with the PSQL database and the locally installed web-based KETOS analysis platform. Advanced statistical analyses were feasible with the developed framework in 3 clinically relevant scenarios (data-driven establishment of hemoglobin reference intervals, assessment of anemia prevalence in patients with cancer, and investigation of the adverse effects of drugs). CONCLUSIONS: This study shows how the standard open source database PSQL can be used to store FHIR data and be integrated with a specifically developed preprocessing and analysis framework. This enables dataset generation based on advanced medical criteria and the integration of subsequent statistical analyses. The web-based preprocessing service can be deployed locally at the hospital level, protecting patients' privacy while integrating with existing open source data analysis tools currently being developed across Germany.
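As a rough illustration of the storage and filtering idea (the table layout, connection string, and query below are assumptions, not the schema deployed in Erlangen), a FHIR Observation can be stored as JSONB in PostgreSQL and filtered with JSON operators before analysis:

```python
# Hedged sketch: FHIR resources kept as PostgreSQL JSONB and filtered with
# JSON operators. Table, columns, and DSN are hypothetical examples.
import json
import psycopg2

conn = psycopg2.connect("dbname=fhir user=postgres")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS fhir_resources (
        id SERIAL PRIMARY KEY,
        resource JSONB NOT NULL
    )
""")

observation = {  # minimal FHIR Observation: a hemoglobin lab value
    "resourceType": "Observation",
    "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
    "valueQuantity": {"value": 13.9, "unit": "g/dL"},
}
cur.execute("INSERT INTO fhir_resources (resource) VALUES (%s)",
            [json.dumps(observation)])

# Preprocessing-style filter: all hemoglobin observations (LOINC 718-7),
# returning only the numeric values needed for statistical analysis.
cur.execute("""
    SELECT resource -> 'valueQuantity' ->> 'value'
    FROM fhir_resources
    WHERE resource ->> 'resourceType' = 'Observation'
      AND resource -> 'code' -> 'coding' @> '[{"code": "718-7"}]'::jsonb
""")
print(cur.fetchall())
conn.commit()
```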
ABSTRACT
The COVID-19 pandemic has strained health systems worldwide, disrupting routine hospital services for all non-COVID patients. In this retrospective study, we analyzed inpatient hospital admissions across 18 German university hospitals during the 2020 lockdown period compared to 2018. Patients admitted to hospital between January 1 and May 31, 2020, and during the corresponding periods in 2018 and 2019 were included in this study. Data derived from electronic health records were collected and analyzed using the data integration center infrastructure implemented in the university hospitals that are part of the four consortia funded by the German Medical Informatics Initiative. Admissions were grouped and counted by ICD-10 chapter and specific reasons for treatment at each site. Pooled aggregated data were centrally analyzed with descriptive statistics to compare absolute and relative differences between time periods of different years. The results illustrate how adaptations of care processes depended on the COVID-19 epidemiological situation and the criticality of the disease. Overall inpatient hospital admissions decreased by 35% in weeks 1 to 4 and by 30.3% in weeks 5 to 8 after the lockdown announcement compared to 2018. Even hospital admissions for critical care conditions such as malignant cancer treatments were reduced. We also noted a large reduction in emergency admissions such as myocardial infarction (38.7%), whereas the reduction in stroke admissions was smaller (19.6%). In contrast, we observed a considerable reduction in admissions for non-critical clinical situations, such as hysterectomies for benign tumors (78.8%) and hip replacements due to arthrosis (82.4%). In summary, our study shows that university hospital admission rates in Germany were substantially reduced following the national COVID-19 lockdown, including for critical care and emergency conditions in which deferral is expected to impair clinical outcomes. Future studies are needed to delineate how appropriate medical care for critically ill patients can be maintained during a pandemic.
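For illustration only, the descriptive comparison of pooled admission counts amounts to computing absolute and relative differences per ICD-10 chapter; the counts in this sketch are invented and do not reproduce the study's aggregated data:

```python
# Invented example counts, purely to show the arithmetic of the
# absolute/relative period comparison described above.
import pandas as pd

counts = pd.DataFrame({
    "icd10_chapter": ["II (Neoplasms)", "IX (Circulatory)", "XIII (Musculoskeletal)"],
    "admissions_2018": [10000, 12000, 8000],
    "admissions_2020": [8200, 8500, 3500],
})
counts["absolute_diff"] = counts["admissions_2020"] - counts["admissions_2018"]
counts["relative_diff_pct"] = 100 * counts["absolute_diff"] / counts["admissions_2018"]
print(counts)
```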
Subject(s)
COVID-19/epidemiology, Hospital Emergency Service/statistics & numerical data, Hospitalization/statistics & numerical data, University Hospitals/statistics & numerical data, Pandemics/statistics & numerical data, Patient Admission/statistics & numerical data, Quarantine/statistics & numerical data, Hospital Emergency Service/trends, Forecasting, Germany/epidemiology, Hospitalization/trends, University Hospitals/trends, Humans, Patient Admission/trends, Quarantine/trends, Retrospective Studies, SARS-CoV-2
ABSTRACT
BACKGROUND: To make patient care data more accessible for research, German university hospitals are joining forces in the course of the Medical Informatics Initiative. In a first step, the administrative data of university hospitals are made available for federated utilization. Project-specific de-identification of these data is necessary to satisfy privacy laws. OBJECTIVE: We aim to assess the population uniqueness of the data and, by generalizing the data, to reduce uniqueness and improve k-anonymity. METHODS: We analyze quasi-identifying attributes of the Erlangen University Hospital's billing data with regard to population uniqueness and re-identification risk. We count individuals per equivalence class (k) to measure uniqueness. RESULTS: Because diagnoses and procedures are particularly unique in combination with the sex and age of the patients, the data set does not satisfy k-anonymity for k > 1. We are able to reduce population uniqueness through generalization and suppression of unique domains. CONCLUSION: To achieve k-anonymity with k > 1 while maintaining sufficient utility of the data, further established de-identification strategies need to be applied.
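A minimal sketch of the uniqueness measurement described above (column names, sample records, and the age-banding generalization are hypothetical, not the billing data schema): records are grouped by quasi-identifiers and the equivalence-class size k is counted per record.

```python
# Hedged sketch of k-anonymity measurement via equivalence-class sizes.
import pandas as pd

df = pd.DataFrame({
    "sex": ["F", "F", "M", "M", "M"],
    "age": [34, 34, 71, 71, 72],
    "diagnosis": ["I10", "I10", "C18", "C18", "C18"],
})

quasi_identifiers = ["sex", "age", "diagnosis"]
df["k"] = df.groupby(quasi_identifiers)["sex"].transform("size")
print("unique records (k == 1):", (df["k"] == 1).sum())

# Generalization step: coarsen age to 5-year bands to grow the classes.
df["age_band"] = (df["age"] // 5) * 5
df["k_generalized"] = (
    df.groupby(["sex", "age_band", "diagnosis"])["sex"].transform("size")
)
print("smallest class after generalization:", df["k_generalized"].min())
```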
Subject(s)
Data Anonymization, University Hospitals, Medical Informatics, Fees and Charges, Humans, Maintenance, Privacy
ABSTRACT
PURPOSE: Text summarization of clinical trial descriptions has the potential to reduce the time required to familiarize oneself with the subject of a study by condensing long-form detailed descriptions into concise, meaning-preserving synopses. This work describes the process and quality of automatically generated summaries of clinical trial descriptions using extractive text summarization methods. METHODS: We generated a novel dataset from the detailed descriptions and brief summaries of trials registered on clinicaltrials.gov. We executed several text summarization algorithms on the detailed descriptions in this corpus and calculated the standard ROUGE metrics using the brief summaries included in the records as references. To investigate the correlation of these metrics with human judgments, four reviewers assessed the content completeness of the generated summaries and the helpfulness of both the generated and reference summaries via a Likert-scale questionnaire. RESULTS: The filtering stages of the dataset generation process reduce the 277,228 trials registered on clinicaltrials.gov to 101,016 records usable for the summarization task. On average, the summaries in this corpus are 25% of the length of the detailed descriptions. Of the evaluated text summarization methods, the TextRank algorithm exhibits the best overall performance, with a ROUGE-1 F1 score of 0.3531, a ROUGE-2 F1 score of 0.1723, and a ROUGE-L F1 score of 0.3003. These scores correlate with the assessments of helpfulness and content similarity by the human reviewers. Inter-rater agreement for helpfulness and content similarity was slight and fair, respectively (Fleiss' kappa of 0.12 and 0.22). CONCLUSIONS: Extractive summarization is a viable tool for generating meaning-preserving synopses of detailed clinical trial descriptions. Furthermore, the human evaluation showed that the ROUGE-L F1 score is useful for rating the general quality of generated summaries of clinical trial descriptions in an automated way.
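For orientation, a bare-bones extractive TextRank can be sketched in a few lines; this is a generic reimplementation of the idea, not the pipeline evaluated in the study, and the example sentences are invented:

```python
# Generic TextRank sketch: sentences are ranked by PageRank over a graph
# whose edge weights are normalized word overlaps between sentence pairs.
import itertools
import networkx as nx

def textrank_summary(sentences, n=2):
    tokenized = [set(s.lower().split()) for s in sentences]
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        overlap = len(tokenized[i] & tokenized[j])
        if overlap:
            # Normalize by total length so long sentences are not favored.
            weight = overlap / (len(tokenized[i]) + len(tokenized[j]))
            graph.add_edge(i, j, weight=weight)
    scores = nx.pagerank(graph, weight="weight")
    top = sorted(scores, key=scores.get, reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]  # preserve original order

description = [
    "This phase II trial studies a new drug combination.",
    "The drug combination targets tumor growth pathways.",
    "Participants receive the drug combination every two weeks.",
    "Quality of life is assessed with standard questionnaires.",
]
print(textrank_summary(description))
```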
Subject(s)
Clinical Trials as Topic, Algorithms, Natural Language Processing
ABSTRACT
INTRODUCTION: Data quality (DQ) is an important prerequisite for the secondary use of electronic health record (EHR) data in clinical research, particularly with regard to progressing towards a learning health system, one of the MIRACUM consortium's goals. Following the successful integration of the i2b2 research data repository in MIRACUM, we present a standardized and generic DQ framework. STATE OF THE ART: Established DQ evaluation methods do not cover all of MIRACUM's requirements. CONCEPT: A data quality analysis plan was developed to assess common data quality dimensions for demographic-, condition-, procedure- and department-related variables in MIRACUM's research data repository. IMPLEMENTATION: A data quality analysis (DQA) tool was developed using R scripts packaged in a Docker image with all the necessary dependencies and R libraries for easy distribution. It integrates with the i2b2 data repository at each MIRACUM site, executes an analysis of the data, and generates a DQ report. LESSONS LEARNED: Our DQA tool brings the analysis to the data and thus meets the MIRACUM data protection requirements. It evaluates established DQ dimensions of data repositories in a standardized and easily distributable way. This analysis allowed us to reveal and revise inconsistencies in earlier versions of the ETL jobs. The framework is portable, easy to deploy across different sites, and adaptable to other database schemas. CONCLUSION: The presented framework provides a first step towards a unified, standardized and harmonized EHR DQ assessment in MIRACUM. DQ issues can now be systematically identified by individual hospitals in order to subsequently implement site- or consortium-wide feedback loops to increase data quality.
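The published DQA tool is implemented in R; purely to illustrate the kind of checks such a framework computes, here is a small Python sketch of a completeness check and a plausibility check (column names, sample values, and rules are assumptions, not MIRACUM's analysis plan):

```python
# Illustrative DQ checks: completeness per variable and a simple
# plausibility rule on birth dates. All data here are invented.
import pandas as pd

patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "birth_date": ["1950-02-01", None, "1988-07-30", "2190-01-01"],
    "sex": ["F", "M", None, "M"],
})

# Completeness: share of non-missing values per variable.
completeness = patients.notna().mean()

# Plausibility: birth dates must not lie in the future.
birth_dates = pd.to_datetime(patients["birth_date"], errors="coerce")
implausible = (birth_dates > pd.Timestamp.today()).sum()

print("completeness per variable:\n", completeness)
print("implausible birth dates:", implausible)
```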
Subject(s)
Data Accuracy, Electronic Health Records, Factual Databases
ABSTRACT
BACKGROUND: High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks. OBJECTIVES: To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task. METHODS: Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher, capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, utilization of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application. RESULTS: The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48% of the 1,492 local biobank terms, and human experts were able to complete the remaining mappings. We used the expert-curated mappings to successfully process 147,608 data records from 3,415 patients. CONCLUSION: A generic harmonization approach was created and successfully used for cross-institutional data harmonization across 10 European biobanks. The software tools were made available as open source.
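To illustrate the matching idea (a schematic re-creation, not the published tool; the synonym table, threshold, and example terms are invented), a bag-of-words matcher can score candidate mappings by token overlap after synonym normalization:

```python
# Hedged sketch of a lexical bag-of-words matcher: local biobank terms are
# mapped to central terminology entries by Jaccard overlap of token bags,
# with synonyms normalized to a canonical form first.
def tokens(term, synonyms):
    bag = set()
    for word in term.lower().split():
        bag.add(synonyms.get(word, word))  # map synonyms to canonical form
    return bag

def best_match(local_term, central_terms, synonyms, threshold=0.5):
    local = tokens(local_term, synonyms)
    scored = []
    for candidate in central_terms:
        central = tokens(candidate, synonyms)
        # Jaccard similarity of the two bags of words
        score = len(local & central) / len(local | central)
        scored.append((score, candidate))
    score, candidate = max(scored)
    # Below the threshold, defer the mapping to a human expert.
    return candidate if score >= threshold else None

synonyms = {"carcinoma": "cancer", "colon": "colorectal"}
central = ["colorectal cancer diagnosis", "tumor stage"]
print(best_match("colon carcinoma diagnosis", central, synonyms))
```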