Results 1 - 20 of 40
1.
J Biomed Inform ; 147: 104505, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37774908

ABSTRACT

OBJECTIVE: Observational research in cancer poses great challenges regarding adequate data sharing and consolidation on a homogeneous semantic base. Common Data Models (CDMs) can help consolidate health data repositories from different institutions, minimizing loss of meaning by organizing data into a standard structure. This study aims to evaluate the performance of the Observational Medical Outcomes Partnership (OMOP) CDM, Informatics for Integrating Biology & the Bedside (i2b2) and the International Cancer Genome Consortium Accelerating Research in Genomic Oncology (ICGC ARGO) model for representing non-imaging data in a breast cancer use case of EuCanImage. METHODS: We used ontologies to represent metamodels of OMOP, i2b2, and ICGC ARGO and the variables used in a cancer use case of a European AI project. We selected four evaluation criteria for the CDMs, adapted from previous research: content coverage, simplicity, integration, and implementability. RESULTS: i2b2 and OMOP exhibited higher element completeness (100% each) than ICGC ARGO (58.1%), while all three achieved 100% domain completeness. ICGC ARGO normalizes only one of our variables with a standard terminology, while i2b2 and OMOP use standardized vocabularies for all of them. In terms of simplicity, ICGC ARGO and i2b2 proved simpler both in their ontological models (276 and 175 elements, respectively) and in their queries (7 and 20 lines of code, respectively), whereas OMOP required a much more complex ontological model (615 elements) and queries similar in length to those of i2b2 (20 lines). Regarding implementability, OMOP had the highest number of mentions in articles in PubMed (130) and Google Scholar (1,810), ICGC ARGO had the highest number of updates to the CDM since 2020 (4), and i2b2 is the model with the most tools specifically developed for it (26). CONCLUSION: ICGC ARGO proved rigid and very limited in the representation of oncologic concepts, while i2b2 and OMOP performed very well. i2b2's lack of a common dictionary hinders its scalability, requiring sites that will share data to explicitly define a conceptual framework, and suggesting that OMOP and its Oncology extension could be the more suitable choice. Future research employing these CDMs with actual datasets is needed.
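To make the query-complexity comparison concrete, here is a minimal, hypothetical sketch (not taken from the paper) of the kind of cohort query counted above: selecting breast-cancer patients from an OMOP CDM instance. Table and column names follow the OMOP CDM; the concept ID and the connection string are placeholders to be replaced locally.

```python
# Hedged sketch of an OMOP CDM cohort query; concept ID and DSN are assumptions.
import psycopg2

BREAST_CANCER_CONCEPT_ID = 4112853  # placeholder; look up the correct standard concept_id locally

SQL = """
SELECT DISTINCT p.person_id, p.year_of_birth, c.concept_name AS gender
FROM   person p
JOIN   condition_occurrence co ON co.person_id = p.person_id
JOIN   concept c               ON c.concept_id = p.gender_concept_id
WHERE  co.condition_concept_id IN (
           -- descendants of the breast-cancer standard concept
           SELECT descendant_concept_id
           FROM   concept_ancestor
           WHERE  ancestor_concept_id = %(concept_id)s
       );
"""

def fetch_cohort(dsn: str):
    """Run the cohort query against an OMOP CDM PostgreSQL instance."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(SQL, {"concept_id": BREAST_CANCER_CONCEPT_ID})
        return cur.fetchall()
```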


Subject(s)
Breast Neoplasms , Humans , Female , Electronic Health Records , Information Dissemination , Databases, Factual , Genomics
2.
J Am Med Inform Assoc ; 30(12): 1985-1994, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37632234

ABSTRACT

OBJECTIVE: Patients who receive most of their care within a single healthcare system (colloquially called a "loyalty cohort" since they typically return to the same providers) have mostly complete data within that organization's electronic health record (EHR). Loyalty cohorts have low data missingness, whereas high missingness can unintentionally bias research results. Using proxies of routine care and healthcare utilization metrics, we compute a per-patient score that identifies a loyalty cohort. MATERIALS AND METHODS: We implemented a computable program for the widely adopted i2b2 platform that identifies loyalty cohorts in EHRs based on a machine-learning model previously validated using linked claims data. We developed a novel validation approach that tests, using only EHR data, whether patients returned to the same healthcare system after the training period. We evaluated these tools at 3 institutions using data from 2017 to 2019. RESULTS: Loyalty cohort calculations to identify patients who returned during a 1-year follow-up yielded a mean area under the receiver operating characteristic curve of 0.77 using the original model and 0.80 after calibrating the model at individual sites. Factors such as multiple medications or visits contributed significantly at all sites. The contributions of screening tests (eg, colonoscopy) varied across sites, likely due to coding and population differences. DISCUSSION: This open-source implementation of a "loyalty score" algorithm had good predictive power. Enriching research cohorts with these low-missingness patients is a way to obtain the data completeness necessary for accurate causal analysis. CONCLUSION: i2b2 sites can use this approach to select cohorts with mostly complete EHR data.
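As an illustration of the general approach described (not the authors' validated model), the sketch below fits a logistic regression on synthetic utilization features and reports an AUC; the feature set and data are assumptions.

```python
# Hypothetical loyalty-score sketch on synthetic data; not the published model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
# Illustrative per-patient features in the spirit of the abstract:
# visit count, number of medications, any recent screening test (0/1).
X = np.column_stack([
    rng.poisson(3, n),        # visits in the training period
    rng.poisson(2, n),        # distinct medications
    rng.integers(0, 2, n),    # screening test recorded
])
# Synthetic label: did the patient return in the follow-up year?
logit = 0.4 * X[:, 0] + 0.3 * X[:, 1] + 0.8 * X[:, 2] - 2.0
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
score = model.predict_proba(X_te)[:, 1]        # per-patient "loyalty" score
print("AUC:", round(roc_auc_score(y_te, score), 3))
```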


Subject(s)
Algorithms , Electronic Health Records , Humans , Machine Learning , Delivery of Health Care , Electronics
3.
JAMIA Open ; 6(3): ooad068, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37583654

ABSTRACT

Objective: i2b2 offers the possibility to store biomedical data from different projects in subject-oriented data marts of the data warehouse, which potentially requires data replication between projects and data synchronization whenever data change. We present an approach that saves this effort and assess its query performance in a case study reflecting real-world scenarios. Material and Methods: For data segregation, we used PostgreSQL's row-level security (RLS) feature and the unit test framework pgTAP for validation and testing, together with the i2b2 application. No change to the i2b2 code was required. Instead, to support orchestration and deployment, we additionally implemented a command line interface (CLI). We evaluated performance using 3 different queries generated by i2b2, which we ran against an enlarged Harvard demo dataset. Results: We introduce the open-source Python CLI i2b2rls, which orchestrates and manages security roles to implement data marts so that they do not need to be replicated and synchronized as separate i2b2 projects. Our evaluation showed that our approach is on average 3.55 times (median 2.71 times) slower than classic i2b2 data marts, but offers more flexibility and an easier setup. Conclusion: The RLS-based approach is particularly useful in scenarios with many projects, where data are constantly updated, user and group requirements change frequently, or complex user authorization requirements have to be defined. The approach applies to both the i2b2 interface and direct database access.
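The sketch below illustrates the underlying PostgreSQL row-level-security mechanism on an i2b2-style fact table; it is not the i2b2rls tool itself, and the role name, project tag, and use of sourcesystem_cd are assumptions.

```python
# Hedged sketch of the RLS mechanism the approach builds on; names are hypothetical.
import psycopg2

RLS_SETUP = """
ALTER TABLE observation_fact ENABLE ROW LEVEL SECURITY;

-- Each project role only sees facts tagged with its own source system code.
CREATE POLICY project_a_policy ON observation_fact
    FOR SELECT
    TO project_a_role
    USING (sourcesystem_cd = 'PROJECT_A');
"""

def apply_rls(dsn: str) -> None:
    """Enable RLS and create one per-project read policy on the fact table."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(RLS_SETUP)
```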

4.
J Am Med Inform Assoc ; 29(11): 1870-1878, 2022 Oct 7.
Article in English | MEDLINE | ID: mdl-35932187

ABSTRACT

OBJECTIVE: This study aimed to: (1) extend the Informatics for Integrating Biology and the Bedside (i2b2) data and application models to include medical imaging appropriate use criteria, enabling i2b2 to serve as a platform to monitor the local impact of the Protecting Access to Medicare Act's (PAMA) imaging clinical decision support (CDS) requirements, and (2) validate the i2b2 extension using data from the Medicare Imaging Demonstration (MID) CDS implementation. MATERIALS AND METHODS: This study provided a reference implementation and assessed its validity and reliability using data from the MID, the federal government's predecessor to PAMA's imaging CDS program. The star schema was extended to describe the interactions of imaging ordering providers with the CDS. New ontologies were added to enable mapping medical imaging appropriateness data to the i2b2 schema. The z-ratio for testing the significance of the difference between 2 independent proportions was used. RESULTS: The reference implementation used 26 327 orders for imaging examinations, which were persisted to the modified i2b2 schema. As an illustration of the analytical capabilities of the Web Client, we report that 331/1192 or 28.1% of imaging orders were deemed appropriate by the CDS system at the end of the intervention period (September 2013), an increase from 162/1223 or 13.2% for the first month of the baseline period, December 2011 (P = .0212), consistent with previous studies. CONCLUSIONS: The i2b2 platform can be extended to monitor the local impact of PAMA's imaging-appropriateness CDS ordering requirements.
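A minimal, hypothetical sketch of the kind of ontology extension described: registering an imaging-appropriateness concept in an i2b2 metadata table. The column names follow the i2b2 metadata schema; the table name, concept path, and code are illustrative, and several additional metadata columns are omitted for brevity.

```python
# Hedged sketch: insert one imaging-appropriateness concept into an i2b2 ontology table.
import psycopg2

PATH = "\\ImagingCDS\\Appropriateness\\Appropriate\\"  # i2b2-style concept path (hypothetical)

SQL = """
INSERT INTO imaging_cds_ontology
    (c_hlevel, c_fullname, c_name, c_synonym_cd, c_visualattributes,
     c_basecode, c_facttablecolumn, c_tablename, c_columnname,
     c_columndatatype, c_operator, c_dimcode)
VALUES
    (%(hlevel)s, %(path)s, %(name)s, 'N', 'LA', %(code)s,
     'concept_cd', 'concept_dimension', 'concept_path', 'T', 'LIKE', %(path)s);
"""

def add_concept(dsn: str) -> None:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(SQL, {"hlevel": 2, "path": PATH,
                          "name": "Order deemed appropriate",
                          "code": "CDS:APPROPRIATE"})
```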


Subject(s)
Decision Support Systems, Clinical , Aged , Diagnostic Imaging , Humans , Medicare , Monitoring, Physiologic , Reproducibility of Results , United States
5.
Stud Health Technol Inform ; 294: 287-291, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612078

ABSTRACT

Reuse of Electronic Health Records (EHRs) for specific diseases such as COVID-19 requires data to be recorded and persisted according to international standards. Since the beginning of the COVID-19 pandemic, Hospital Universitario 12 de Octubre (H12O) evolved its EHRs: it identified, modeled and standardized the concepts related to this new disease in an agile, flexible and staged way. Thus, data from more than 200,000 COVID-19 cases were extracted, transformed, and loaded into an i2b2 repository. This effort allowed H12O to share data with worldwide networks such as the TriNetX platform and the 4CE Consortium.


Subject(s)
COVID-19 , COVID-19/epidemiology , Electronic Health Records , Humans , Pandemics
6.
Stud Health Technol Inform ; 294: 372-376, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612099

ABSTRACT

In a national effort aiming at cross-hospital data interoperability, the Swiss Personalized Health Network selected RDF as its preferred data and metadata representation format. Yet, most clinical research software solutions are not designed to interact with RDF databases. We present a modular Python toolkit that allows easy conversion from RDF graphs to i2b2 and is adaptable to other common data models (CDMs) with reasonable effort. The tool was designed with feedback from clinicians in both oncology and laboratory research.
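The following sketch conveys the general RDF-to-i2b2 idea rather than the actual toolkit: a SPARQL query over an rdflib graph whose bindings are reshaped into observation_fact-style rows. The namespace and predicate names are hypothetical placeholders.

```python
# Hedged sketch of RDF-to-i2b2 conversion; predicates and namespace are assumptions.
from rdflib import Graph

SPARQL = """
PREFIX ex: <https://example.org/sphn#>
SELECT ?patient ?code ?value ?date WHERE {
    ?obs ex:hasSubject ?patient ;
         ex:hasCode    ?code ;
         ex:hasValue   ?value ;
         ex:hasDate    ?date .
}
"""

def rdf_to_observation_facts(ttl_path: str):
    """Parse a Turtle file and return i2b2 observation_fact-style dictionaries."""
    g = Graph()
    g.parse(ttl_path, format="turtle")
    rows = []
    for patient, code, value, date in g.query(SPARQL):
        rows.append({
            "patient_num": str(patient),   # would be mapped to a real i2b2 patient_num
            "concept_cd": str(code),
            "nval_num": float(value),      # assumes numeric lab values
            "start_date": str(date),
        })
    return rows
```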


Subject(s)
Software , Databases, Factual
7.
Stud Health Technol Inform ; 289: 293-296, 2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35062150

ABSTRACT

Publicly shared repositories play an important role in advancing performance benchmarks for some of the most important tasks in natural language processing (NLP) and healthcare in general. This study reviews the most recent benchmarks based on the 2014 n2c2 de-identification dataset. Pre-processing challenges were uncovered, and attention was drawn to discrepancies in the reported number of Protected Health Information (PHI) entities across studies. Improved reporting is required for greater transparency and reproducibility.


Subject(s)
Benchmarking , Electronic Health Records , Natural Language Processing , Reproducibility of Results
8.
J Am Med Inform Assoc ; 29(4): 643-651, 2022 Mar 15.
Article in English | MEDLINE | ID: mdl-34849976

ABSTRACT

OBJECTIVE: Integrating and harmonizing disparate patient data sources into one consolidated data portal enables researchers to conduct analyses efficiently and effectively. MATERIALS AND METHODS: We describe an implementation of Informatics for Integrating Biology and the Bedside (i2b2) to create the Mass General Brigham (MGB) Biobank Portal data repository. The repository integrates data from primary and curated data sources and is updated weekly. The data are made readily available to investigators in a data portal where they can easily construct and export customized datasets for analysis. RESULTS: As of July 2021, there are 125 645 consented patients enrolled in the MGB Biobank. Of these, 88 527 (70.5%) have a biospecimen, 55 121 (43.9%) have completed the health information survey, 43 552 (34.7%) have genomic data, and 124 760 (99.3%) have EHR data. Twenty machine learning computed phenotypes are calculated on a weekly basis. There are currently 1220 active investigators who have run 58 793 patient queries and exported 10 257 analysis files. DISCUSSION: The Biobank Portal allows noninformatics researchers to assess study feasibility by querying across many data sources and then extract the data most useful to them for clinical studies. While institutions require substantial informatics resources to establish and maintain integrated data repositories, such repositories yield significant research value to a wide range of investigators. CONCLUSION: The Biobank Portal and other patient data portals that integrate complex and simple datasets enable diverse research use cases. The i2b2 tools used to implement these registries and make the data interoperable are open source and freely available.


Subject(s)
Biological Specimen Banks , Information Storage and Retrieval , Data Collection , Humans , Informatics
10.
Stud Health Technol Inform ; 287: 129-133, 2021 Nov 18.
Article in English | MEDLINE | ID: mdl-34795096

ABSTRACT

Reuse of EHRs requires that data extraction and transformation processes be based on homogeneous, formalized operations so that they are understandable, reproducible, and auditable. This work aims to define a common framework of data operations for obtaining EHR-derived datasets for secondary use. To this end, 21 operations were identified from different data-driven projects at a 1,300-bed tertiary hospital. The ISO 13606 standard was then used to formalize them. This work is a starting point for homogenizing ETL processes for the reuse of EHRs, applicable to any condition and organization. In future studies, the defined data operations will be implemented and validated in projects with different purposes.
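As a rough illustration of what formalized, auditable data operations could look like in code (the 21 operations themselves are not listed in the abstract), the sketch below chains named operations and records an audit log; the operation names are invented for the example.

```python
# Hypothetical sketch of composable, auditable EHR data operations.
from dataclasses import dataclass, field
from typing import Callable, Iterable

Record = dict

@dataclass
class DataOperation:
    name: str            # e.g. "filter", "map_terminology", "deduplicate"
    func: Callable[[Iterable[Record]], list]

@dataclass
class Pipeline:
    operations: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def run(self, records: Iterable[Record]) -> list:
        data = list(records)
        for op in self.operations:
            data = op.func(data)
            self.log.append((op.name, len(data)))   # audit trail of each step
        return data

# Usage: keep only confirmed diagnoses, then project the fields of interest.
pipeline = Pipeline([
    DataOperation("filter_confirmed", lambda rs: [r for r in rs if r.get("status") == "confirmed"]),
    DataOperation("select_fields", lambda rs: [{"id": r["id"], "code": r["code"]} for r in rs]),
])
dataset = pipeline.run([
    {"id": 1, "code": "J18.9", "status": "confirmed"},
    {"id": 2, "code": "Z03.8", "status": "suspected"},
])
print(dataset, pipeline.log)
```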


Subject(s)
Electronic Health Records
11.
Stud Health Technol Inform ; 281: 28-32, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042699

ABSTRACT

This work describes how EHRs were used to meet the needs of healthcare providers and researchers at a 1,300-bed tertiary hospital during the COVID-19 pandemic. For this purpose, essential clinical concepts were identified and standardized with LOINC and SNOMED CT. These concepts were then implemented in the EHR systems, and data tools based on them, such as clinical alerts, dynamic patient lists, and a clinical follow-up dashboard, were developed to support healthcare. In addition, these data were incorporated into standardized repositories and COVID-19 databases to improve clinical research on this new disease. In conclusion, standardized EHRs allowed the implementation of useful multi-purpose data resources in a major hospital in the course of the pandemic.


Subject(s)
COVID-19 , Pandemics , Electronic Health Records , Humans , SARS-CoV-2 , Tertiary Care Centers
12.
Stud Health Technol Inform ; 281: 462-466, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042786

ABSTRACT

Data-driven methods in biomedical research can help obtain new insights into the development, progression, and therapy of diseases. Clinical and translational data warehouses such as Informatics for Integrating Biology and the Bedside (i2b2) and tranSMART are important solutions for this. Of the well-known FAIR data principles, which address findability, accessibility, interoperability, and reusability, this paper focuses on findability. For this purpose, we describe a portal solution that acts as a catalogue for a wide range of data warehouse instances, featuring a central access point and links to training material, such as user manuals and video tutorials. Moreover, the portal provides developers with an overview of the status of multiple warehouses and a set of statistics about the data currently loaded. Due to its modular design and the use of modern web technologies, the portal is easy to extend and customize to reflect different corporate designs and institutional requirements.


Subject(s)
Biomedical Research , Data Warehousing , Informatics
13.
Stud Health Technol Inform ; 278: 251-259, 2021 May 24.
Article in English | MEDLINE | ID: mdl-34042902

ABSTRACT

In the era of translational research, data integration and clinical data warehouses are important enabling technologies for clinical researchers. The OMOP common data model is a widespread choice as a target for data integration in medical informatics. Its portability of queries and analyses across different institutions and datasets is also ideal from the viewpoint of the FAIR principles. Yet, the OMOP CDM lacks a simple and intuitive user interface that lets untrained users run simple queries for feasibility analysis. The aim of this study is to provide an algorithm that translates any given i2b2 query into an equivalent query that can be run on an OMOP CDM database. The provided algorithm is able to convert queries created in the i2b2 web client into SQL statements that can be executed programmatically on a standard OMOP CDM database.
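A minimal, hypothetical sketch of the translation idea: a simplified i2b2-style query (concepts ORed within a panel, panels ANDed) rendered as SQL against OMOP's condition_occurrence table. The real algorithm operates on the i2b2 web client's query definition; the concept IDs below are illustrative.

```python
# Hedged sketch of i2b2-panel-to-OMOP-SQL translation; not the paper's algorithm.
from typing import List

def panel_to_sql(concept_ids: List[int]) -> str:
    """One i2b2 panel (OR of concepts) becomes one OMOP sub-cohort query."""
    ids = ", ".join(str(c) for c in concept_ids)
    return (
        "SELECT DISTINCT person_id FROM condition_occurrence "
        f"WHERE condition_concept_id IN ({ids})"
    )

def i2b2_query_to_omop_sql(panels: List[List[int]]) -> str:
    """AND across panels is expressed as INTERSECT of per-panel cohorts."""
    return "\nINTERSECT\n".join(panel_to_sql(p) for p in panels)

# Example with illustrative concept IDs: (A OR B) AND C.
print(i2b2_query_to_omop_sql([[201826, 443238], [316866]]))
```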


Subject(s)
Data Warehousing , Electronic Health Records , Algorithms , Databases, Factual
14.
JMIR Med Inform ; 8(7): e15918, 2020 Jul 21.
Article in English | MEDLINE | ID: mdl-32706673

ABSTRACT

BACKGROUND: Modern data-driven medical research provides new insights into the development and course of diseases and enables novel methods of clinical decision support. Clinical and translational data warehouses, such as Informatics for Integrating Biology and the Bedside (i2b2) and tranSMART, are important infrastructure components that provide users with unified access to the large heterogeneous data sets needed to realize this and support use cases such as cohort selection, hypothesis generation, and ad hoc data analysis. OBJECTIVE: Often, different warehousing platforms are needed to support different use cases and different types of data. Moreover, to achieve an optimal data representation within the target systems, specific domain knowledge is needed when designing data-loading processes. Consequently, informaticians need to work closely with clinicians and researchers in short iterations. This is a challenging task as installing and maintaining warehousing platforms can be complex and time consuming. Furthermore, data loading typically requires significant effort in terms of data preprocessing, cleansing, and restructuring. The platform described in this study aims to address these challenges. METHODS: We formulated system requirements to achieve agility in terms of platform management and data loading. The derived system architecture includes a cloud infrastructure with unified management interfaces for multiple warehouse platforms and a data-loading pipeline with a declarative configuration paradigm and meta-loading approach. The latter compiles data and configuration files into forms required by existing loading tools, thereby automating a wide range of data restructuring and cleansing tasks. We demonstrated the fulfillment of the requirements and the originality of our approach by an experimental evaluation and a comparison with previous work. RESULTS: The platform supports both i2b2 and tranSMART with built-in security. Our experiments showed that the loading pipeline accepts input data that cannot be loaded with existing tools without preprocessing. Moreover, it lowered efforts significantly, reducing the size of configuration files required by factors of up to 22 for tranSMART and 1135 for i2b2. The time required to perform the compilation process was roughly equivalent to the time required for actual data loading. Comparison with other tools showed that our solution was the only tool fulfilling all requirements. CONCLUSIONS: Our platform significantly reduces the efforts required for managing clinical and translational warehouses and for loading data in various formats and structures, such as complex entity-attribute-value structures often found in laboratory data. Moreover, it facilitates the iterative refinement of data representations in the target platforms, as the required configuration files are very compact. The quantitative measurements presented are consistent with our experiences of significantly reduced efforts for building warehousing platforms in close cooperation with medical researchers. Both the cloud-based hosting infrastructure and the data-loading pipeline are available to the community as open source software with comprehensive documentation.
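To illustrate the meta-loading idea in miniature (field names and the target layout are assumptions, not the platform's actual formats), the sketch below compiles a compact declarative configuration and wide laboratory rows into the flat entity-attribute-value CSV a conventional loading tool might expect.

```python
# Hypothetical sketch of compiling a declarative config into loader-ready CSV.
import csv
import io

CONFIG = {
    "patient_id_column": "pid",
    "value_columns": ["creatinine", "hemoglobin"],   # wide lab columns to unpivot
    "date_column": "sample_date",
}

def compile_to_eav(rows, config) -> str:
    """Unpivot wide laboratory rows into entity-attribute-value records."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["patient_id", "attribute", "value", "date"])
    for row in rows:
        for col in config["value_columns"]:
            if row.get(col) not in (None, ""):
                writer.writerow([row[config["patient_id_column"]], col,
                                 row[col], row[config["date_column"]]])
    return out.getvalue()

print(compile_to_eav(
    [{"pid": "P1", "creatinine": "1.1", "hemoglobin": "13.5", "sample_date": "2020-01-02"}],
    CONFIG,
))
```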

15.
Stud Health Technol Inform ; 270: 78-82, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570350

ABSTRACT

The present work provides a real-world case of connecting a hospital, 12 de Octubre University Hospital in Spain, to the TriNetX research network, transforming a compilation of disparate sources into a single harmonized repository that is refreshed automatically every day. It describes the different integration phases: terminology core datasets, specialized sources, and finally automatic refresh. It also explains the work performed on semantic normalization of the clinical terminologies involved, as well as the research opportunities that the InSite platform services have enabled for the hospital.


Subject(s)
Semantics , Systematized Nomenclature of Medicine , Spain
16.
Stud Health Technol Inform ; 258: 16-20, 2019.
Article in English | MEDLINE | ID: mdl-30942705

ABSTRACT

Secondary use of electronic health records via data warehouses (DWs) has become an attractive approach to support clinical research. In order to increase the volume of underlying patient data, DWs at different institutions can be connected to research networks. Two obstacles to connecting a DW to such a network are the syntactical differences between the involved DW technologies and differences in the data models of the connected DWs. The current work presents an approach to tackle both problems by translating queries from the openEHR DW system into queries for the i2b2 DW system and vice versa. For the subset of queries expressible in the query languages of both systems, the presented approach is feasible.
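A minimal, hypothetical sketch of the shared-subset idea: one abstract criterion rendered both as an AQL-like openEHR query and as i2b2-style SQL. The archetype ID, concept code, and table names are placeholders, not the mapping used in the paper.

```python
# Hedged sketch of rendering one criterion for both systems; all identifiers are placeholders.
from dataclasses import dataclass

@dataclass
class Criterion:
    archetype_id: str     # openEHR side, e.g. an EVALUATION archetype
    concept_cd: str       # i2b2 side, e.g. an ICD-10 concept code

def to_aql(c: Criterion) -> str:
    return (
        "SELECT e/ehr_id/value FROM EHR e "
        f"CONTAINS COMPOSITION comp CONTAINS EVALUATION ev[{c.archetype_id}]"
    )

def to_i2b2_sql(c: Criterion) -> str:
    return (
        "SELECT DISTINCT patient_num FROM observation_fact "
        f"WHERE concept_cd = '{c.concept_cd}'"
    )

crit = Criterion("openEHR-EHR-EVALUATION.problem_diagnosis.v1", "ICD10:E11")
print(to_aql(crit))
print(to_i2b2_sql(crit))
```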


Subject(s)
Data Warehousing , Electronic Health Records , Humans , Information Storage and Retrieval
17.
Stud Health Technol Inform ; 258: 21-25, 2019.
Article in English | MEDLINE | ID: mdl-30942706

ABSTRACT

i2b2 and REDCap are two widely adopted solutions for, respectively, facilitating data reuse for research purposes and managing not-for-profit research studies. REDCap provides design specifications for building a web service that imports data from an external source through a procedure called Dynamic Data Pull (DDP). In this work we developed a web service that implements these specifications in order to import data from i2b2. Our approach has been tested with a real REDCap study.
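A minimal, hypothetical Flask sketch of a DDP-style endpoint returning source data for a subject; the request and response fields here are simplified assumptions, since the real DDP specification defines the exact payloads REDCap sends and expects.

```python
# Hedged sketch of a DDP-like data endpoint; payload fields are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for a lookup against the i2b2 star schema.
I2B2_DATA = {
    "MRN001": [{"field": "systolic_bp", "value": "128", "timestamp": "2018-06-01"}],
}

@app.post("/ddp/data")
def ddp_data():
    payload = request.get_json(force=True)
    subject_id = payload.get("id")                 # assumed identifier field
    requested = set(payload.get("fields", []))
    records = [r for r in I2B2_DATA.get(subject_id, [])
               if not requested or r["field"] in requested]
    return jsonify(records)

if __name__ == "__main__":
    app.run(port=5000)
```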


Subject(s)
Data Warehousing , Data Analysis
18.
EGEMS (Wash DC) ; 7(1): 4, 2019 Mar 25.
Article in English | MEDLINE | ID: mdl-30937326

ABSTRACT

The last twenty years of health care research have seen a steady stream of common health care data models implemented for multi-organization research. Each model offers a uniform interface to data from the diverse organizations that implement it, enabling the sharing of research tools and data. While the groups designing the models have had various needs and aims, and the data available have changed significantly in this time, there are nevertheless striking similarities between them. This paper traces the evolution of common data models, describing their similarities and points of departure. We believe the history of this work should be understood and preserved. The work has empowered collaborative research across competing organizations and brought together researchers from clinical practice, universities, and research institutes around the world. Understanding the ecosystem of data models designed for collaborative research allows readers to evaluate where we have been and where we are going as a field, and to assess the utility of different models for their own work.

19.
J Am Med Inform Assoc ; 26(4): 286-293, 2019 Apr 1.
Article in English | MEDLINE | ID: mdl-30715327

ABSTRACT

OBJECTIVE: Clinical research data warehouses are largely populated from information extracted from electronic health records (EHRs). While these data provide information about a patient's medications, laboratory results, diagnoses, and history, her social, economic, and environmental determinants of health are also major contributing factors in readmission, morbidity, and mortality and are often absent or unstructured in the EHR. Details about a patient's socioeconomic status may be found in the U.S. census. To facilitate researching the impacts of socioeconomic status on health outcomes, clinical and socioeconomic data must be linked in a repository in a fashion that supports seamless interrogation of these diverse data elements. This study demonstrates a method for linking clinical and location-based data and querying these data in a de-identified data warehouse using Informatics for Integrating Biology and the Bedside. MATERIALS AND METHODS: Patient data were extracted from the EHR at Nebraska Medicine. Socioeconomic variables originated from the 2011-2015 five-year block group estimates from the American Community Survey. Data querying was performed using Informatics for Integrating Biology and the Bedside. All location-based data were truncated to prevent identification of a location with a population <20 000 individuals. RESULTS: We successfully linked location-based and clinical data in a de-identified data warehouse and demonstrated its utility with a sample use case. DISCUSSION: With location-based data available for querying, research investigating the impact of socioeconomic context on health outcomes is possible. Efforts to improve geocoding can readily be incorporated into this model. CONCLUSION: This study demonstrates a means for incorporating and querying census data in a de-identified clinical data warehouse.
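A minimal, hypothetical sketch of the linkage idea: joining patient records to block-group ACS estimates and blanking out location-based values where the population falls below the 20 000 threshold mentioned above. Column names and FIPS codes are illustrative, and the study's approach truncates geographies rather than simply suppressing values as done here.

```python
# Hedged sketch of linking ACS block-group data to patients with small-population suppression.
import pandas as pd

MIN_POPULATION = 20_000   # suppression threshold mentioned in the abstract

patients = pd.DataFrame({
    "patient_num": [1, 2],
    "block_group": ["310550001001", "310550002002"],   # hypothetical FIPS codes
})
acs = pd.DataFrame({
    "block_group": ["310550001001", "310550002002"],
    "median_income": [54000.0, 61000.0],
    "population": [25_000, 4_000],
})

linked = patients.merge(acs, on="block_group", how="left")
# Blank out location-based values where the population is too small to share.
small = linked["population"] < MIN_POPULATION
linked.loc[small, "block_group"] = None
linked.loc[small, "median_income"] = float("nan")
print(linked)
```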


Subject(s)
Data Warehousing , Electronic Health Records , Geographic Mapping , Social Class , Social Determinants of Health , Adolescent , Adult , Aged , Aged, 80 and over , Censuses , Child , Child, Preschool , Data Anonymization , Emergency Service, Hospital/statistics & numerical data , Female , Geographic Information Systems , Humans , Infant , Infant, Newborn , Logistic Models , Male , Middle Aged , Nebraska , Socioeconomic Factors , United States , Young Adult
20.
Stud Health Technol Inform ; 255: 45-49, 2018.
Article in English | MEDLINE | ID: mdl-30306904

ABSTRACT

Standards-based data warehouses have been implemented in many hospitals and have enormous potential to improve performance measurement and health care quality. Accessing, organizing, and using these data to optimize clinical coding are evolving challenges for hospital systems. This paper describes the development of a coding data warehouse based on the Entity-Attribute-Value (EAV) model that we created by importing data from the clinical data warehouse (CDW) of a public hospital. In particular, it focuses on the design, implementation, and evaluation of the warehouse. Moreover, it defines the rules for converting a conceptual coding model into an EAV logical model and its implementation using Informatics for Integrating Biology and the Bedside (i2b2). We evaluated it using mono- and multi-criteria data queries and then calculated the precision of our model. The results show that the coding data warehouse provides, with good accuracy, associations of diagnostic codes and medical acts that are closer to the patient's clinical landscape. Doctors without knowledge of coding rules could use this information to optimize and improve diagnostic coding.
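A minimal sketch of the Entity-Attribute-Value pattern in the shape of i2b2's observation_fact table, with a multi-criteria lookup of the kind evaluated above; the concept codes and values are illustrative.

```python
# Hedged EAV sketch in the shape of an i2b2 fact table; codes are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE observation_fact (
        patient_num   INTEGER,   -- entity
        concept_cd    TEXT,      -- attribute (e.g. diagnosis or medical-act code)
        tval_char     TEXT,      -- value
        start_date    TEXT
    )
""")
conn.executemany(
    "INSERT INTO observation_fact VALUES (?, ?, ?, ?)",
    [
        (1, "ICD10:J18.9", "Pneumonia, unspecified organism", "2017-03-02"),
        (1, "ACT:CHEST_XRAY", "Chest radiography", "2017-03-02"),  # placeholder act code
    ],
)
# Multi-criteria lookup: patients with both the diagnosis and the act recorded.
rows = conn.execute("""
    SELECT patient_num FROM observation_fact WHERE concept_cd = 'ICD10:J18.9'
    INTERSECT
    SELECT patient_num FROM observation_fact WHERE concept_cd = 'ACT:CHEST_XRAY'
""").fetchall()
print(rows)
```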


Subject(s)
Clinical Coding , Data Warehousing , Information Storage and Retrieval , Humans , Models, Theoretical