ABSTRACT
Left ventricular ejection fraction (LVEF) is an important prognostic indicator of cardiovascular outcomes. It is used clinically to determine the indication for several therapeutic interventions. LVEF is most commonly derived from standardized echocardiographic views using in-line tools and some manual assessment by cardiologists. LVEF is typically documented in free-text reports, and variation in LVEF documentation poses a challenge for the extraction and utilization of LVEF in computer-based clinical workflows. To address this problem, we developed a computerized algorithm to extract LVEF from echocardiography reports in order to identify patients having heart failure with reduced ejection fraction (HFrEF) for therapeutic intervention at a large healthcare system. We processed echocardiogram reports for 57,158 patients with a coded diagnosis of heart failure who visited the healthcare system over a two-year period. Our algorithm identified a total of 3,910 patients with reduced ejection fraction. Of the 46,634 echocardiography reports processed, 97% included a mention of LVEF. Of these reports, 85% contained numerical ejection fraction values, 9% contained ranges, and the remaining 6% contained qualitative descriptions. Overall, 18% of extracted numerical LVEFs were ≤ 40%. Furthermore, manual validation on a sample of 339 reports yielded an accuracy of 1.0. Our study demonstrates that a regular expression-based approach can accurately extract LVEF from echocardiography reports and is useful for delineating heart failure patients with reduced ejection fraction.
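As a rough illustration of the regular expression-based approach, the sketch below extracts numerical, range, and qualitative LVEF mentions from free-text report snippets. The patterns, the range-midpoint convention, and the qualitative-to-numeric mapping are illustrative assumptions, not the rules used in the study.

```python
import re

# Three documentation styles: numerical ("LVEF 55%"), ranges ("EF 30-35%"),
# and qualitative descriptions ("severely reduced LV systolic function").
RANGE_RE = re.compile(
    r"(?:LVEF|ejection fraction|EF)\D{0,20}?(\d{1,2})\s*(?:-|to)\s*(\d{1,2})\s*%",
    re.IGNORECASE)
NUM_RE = re.compile(
    r"(?:LVEF|ejection fraction|EF)\D{0,20}?(\d{1,2})\s*%", re.IGNORECASE)
QUAL_RE = re.compile(
    r"(severely|moderately|mildly)\s+(?:reduced|depressed)\s+"
    r"(?:LV\s+)?(?:systolic\s+)?function", re.IGNORECASE)

def extract_lvef(report: str):
    """Return an LVEF estimate in percent, or None if no mention is found."""
    m = RANGE_RE.search(report)
    if m:  # for a range, take the midpoint
        return (int(m.group(1)) + int(m.group(2))) / 2
    m = NUM_RE.search(report)
    if m:
        return float(m.group(1))
    m = QUAL_RE.search(report)
    if m:  # map qualitative descriptions to rough numeric surrogates
        return {"severely": 25.0, "moderately": 35.0, "mildly": 45.0}[m.group(1).lower()]
    return None

def is_hfref(report: str, threshold: float = 40.0) -> bool:
    """Flag a report as reduced ejection fraction (LVEF <= threshold)."""
    lvef = extract_lvef(report)
    return lvef is not None and lvef <= threshold
```

In practice such patterns would be iteratively refined against a validation sample, as the abstract's manual review of 339 reports suggests.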
Subject(s)
Echocardiography , Heart Failure/physiopathology , Stroke Volume , Ventricular Function, Left , Algorithms , Humans , Prognosis
ABSTRACT
Historically, medical images collected in the course of clinical care have been difficult to access for secondary research studies. While there is tremendous potential value in the large volume of studies contained in clinical image archives, Picture Archiving and Communication Systems (PACS) are designed to optimize clinical operations and workflow. Search capabilities in PACS are basic, limiting their use for population studies, and duplication of archives for research is costly. To address this need, we augment the Informatics for Integrating Biology and the Bedside (i2b2) open source software, providing investigators with the tools necessary to query and integrate medical record and clinical research data. Over 100 healthcare institutions have installed this suite of software tools, which allows investigators to search medical record metadata, including images, for specific types of patients. In this report, we describe a new Medical Imaging Informatics Bench to Bedside (mi2b2) module ( www.mi2b2.org ), available now as an open source addition to the i2b2 software platform, that allows medical imaging examinations collected during routine clinical care to be made available to translational investigators directly from their institution's clinical PACS for research and educational use in compliance with the Health Insurance Portability and Accountability Act (HIPAA) Omnibus Rule. Access governance within the mi2b2 module is customizable per institution and PACS, minimizing impact on clinical systems. Currently in active use at our institutions, this new technology has already been used to facilitate access to thousands of clinical brain MRI studies representing specific patient phenotypes for use in research.
Subject(s)
Biomedical Research/organization & administration , Information Storage and Retrieval , Medical Records Systems, Computerized/organization & administration , Radiology Information Systems/organization & administration , Diagnostic Imaging/methods , Humans , Organizational Innovation , Quality Improvement , Systems Integration
ABSTRACT
The success of many population studies is determined by proper matching of cases to controls. Some of the confounding and bias that afflict electronic health record (EHR)-based observational studies may be reduced by creating effective methods for finding adequate controls. We implemented a method to match case and control populations that compensates for the sparse and unequal data collection practices common in EHR data. We did this by matching patients on healthcare utilization, after observing that more complete data are collected on high-utilization patients than on low-utilization patients. In our results, we show that many of the anomalous differences in population comparisons are mitigated by this matching method compared with traditional age- and gender-based matching. As an example, comparing the disease associations of ulcerative colitis and Crohn's disease shows differences that are not present when controls are chosen at random or even by a matched age/gender/race algorithm. In conclusion, the use of healthcare utilization-based matching algorithms to find adequate controls greatly enhances the accuracy of results in EHR studies. Full source code and documentation of the control matching methods are available at https://community.i2b2.org/wiki/display/conmat/.
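A minimal sketch of utilization-matched control selection, assuming utilization is summarized as a per-patient visit count and stratified into illustrative bins; the published method's actual binning and matching details may differ.

```python
import random
from collections import defaultdict

def match_controls(cases, pool, bins=(0, 5, 15, 50, float("inf")), k=1, seed=0):
    """Match each case to k controls drawn from the same healthcare-
    utilization stratum, without reuse of controls.

    `cases` and `pool` map patient_id -> visit count. The bin edges
    here are illustrative placeholders, not the study's actual strata.
    """
    rng = random.Random(seed)

    def stratum(n):
        # index of the first bin edge at or above this visit count
        return next(i for i, edge in enumerate(bins[1:]) if n <= edge)

    # index the control pool by utilization stratum
    by_stratum = defaultdict(list)
    for pid, visits in pool.items():
        by_stratum[stratum(visits)].append(pid)

    matched = {}
    for pid, visits in cases.items():
        candidates = by_stratum[stratum(visits)]
        rng.shuffle(candidates)
        matched[pid] = candidates[:k]
        # remove picked controls so they are not matched twice
        by_stratum[stratum(visits)] = candidates[k:]
    return matched
```

The same structure extends to matching on age/gender/race jointly with utilization by making the stratum key a tuple of those attributes.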
Subject(s)
Comorbidity , Electronic Health Records/classification , Inflammatory Bowel Diseases/epidemiology , Medical Informatics/methods , Algorithms , Case-Control Studies , Humans , Patient Acceptance of Health Care
ABSTRACT
Patients with chronic lymphocytic leukemia (CLL) and non-Hodgkin lymphoma (NHL) can develop hypogammaglobulinemia, a form of secondary immune deficiency (SID), from the disease and its treatments. Patients with hypogammaglobulinemia and recurrent infections may benefit from immunoglobulin replacement therapy (IgRT). This study evaluated patterns of immunoglobulin G (IgG) testing and the effectiveness of IgRT in real-world patients with CLL or NHL. A retrospective, longitudinal study was conducted among adult patients diagnosed with CLL or NHL. Clinical data from the Massachusetts General Brigham Research Patient Data Registry were used. IgG testing, infections, and antimicrobial use were compared before vs 3, 6, and 12 months after IgRT initiation. Generalized estimating equation logistic regression models were used to estimate odds ratios, 95% confidence intervals, and P values. The study population included 17 192 patients (CLL: n = 3960; median age, 68 years; NHL: n = 13 232; median age, 64 years). In the CLL and NHL cohorts, 67% and 51.2% had IgG testing, and 6.5% and 4.7% received IgRT, respectively. After IgRT initiation, the proportion of patients with hypogammaglobulinemia, the odds of infections or severe infections, and associated antimicrobial use decreased significantly. Increased frequency of IgG testing was associated with a significantly lower likelihood of severe infection. In conclusion, in real-world patients with CLL or NHL, IgRT was associated with significant reductions in hypogammaglobulinemia, infections, severe infections, and associated antimicrobial use. Optimizing IgG testing and IgRT is warranted for the comprehensive management of SID in patients with CLL or NHL.
Subject(s)
Immunoglobulin G , Leukemia, Lymphocytic, Chronic, B-Cell , Lymphoma, Non-Hodgkin , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/complications , Leukemia, Lymphocytic, Chronic, B-Cell/therapy , Aged , Middle Aged , Male , Female , Immunoglobulin G/blood , Lymphoma, Non-Hodgkin/therapy , Lymphoma, Non-Hodgkin/complications , Retrospective Studies , Infections/etiology , Agammaglobulinemia/complications , Agammaglobulinemia/therapy , Agammaglobulinemia/etiology , Treatment Outcome , Longitudinal Studies , Aged, 80 and over , Adult , Immunization, Passive/methods
ABSTRACT
OBJECTIVE: Integrating and harmonizing disparate patient data sources into one consolidated data portal enables researchers to conduct analyses efficiently and effectively. MATERIALS AND METHODS: We describe an implementation of Informatics for Integrating Biology and the Bedside (i2b2) to create the Mass General Brigham (MGB) Biobank Portal data repository. The repository integrates data from primary and curated data sources and is updated weekly. The data are made readily available to investigators in a data portal where they can easily construct and export customized datasets for analysis. RESULTS: As of July 2021, there are 125 645 consented patients enrolled in the MGB Biobank. 88 527 (70.5%) have a biospecimen, 55 121 (43.9%) have completed the health information survey, 43 552 (34.7%) have genomic data, and 124 760 (99.3%) have EHR data. Twenty machine-learning computed phenotypes are recalculated on a weekly basis. There are currently 1220 active investigators who have run 58 793 patient queries and exported 10 257 analysis files. DISCUSSION: The Biobank Portal allows noninformatics researchers to assess study feasibility by querying across many data sources and then extract the data that are most useful to them for clinical studies. While institutions require substantial informatics resources to establish and maintain integrated data repositories, such repositories yield significant research value to a wide range of investigators. CONCLUSION: The Biobank Portal and other patient data portals that integrate complex and simple datasets enable diverse research use cases. The i2b2 tools used to implement these registries and make the data interoperable are open source and freely available.
Subject(s)
Biological Specimen Banks , Information Storage and Retrieval , Data Collection , Humans , Informatics
ABSTRACT
BACKGROUND: The conventional approach for clinical studies is to identify a cohort of potentially eligible patients and then screen them for enrollment. In an effort to reduce the cost and manual effort involved in the screening process, several studies have leveraged electronic health records (EHR) to refine cohorts to better match the eligibility criteria, an approach referred to as phenotyping. We extend this approach to dynamically identify a cohort by repeating phenotyping in alternation with manual screening. METHODS: Our approach consists of multiple screening cycles. At the start of each cycle, the phenotyping algorithm is used to identify eligible patients from the EHR, creating an ordered list such that patients who are most likely eligible are listed first. This list is then manually screened, and the results are analyzed to improve the phenotyping for the next cycle. We describe the preliminary results and challenges in the implementation of this approach for an intervention study on heart failure. RESULTS: A total of 1,022 patients were screened, with 223 (23%) found eligible for enrollment into the intervention study. The iterative approach improved the phenotyping in each screening cycle. Without an iterative approach, the positive screening rate (PSR) was expected to dip below the 20% measured in the first cycle; instead, the cyclical approach increased the PSR to 23%. CONCLUSIONS: Our study demonstrates that dynamic phenotyping can facilitate recruitment for a prospective clinical study. Future directions include improved informatics infrastructure and governance policies to enable real-time updates to research repositories, tooling for EHR annotation, and methodologies to reduce human annotation.
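The alternating phenotype/screen cycle described above can be sketched as follows. Here `score` stands in for the phenotyping algorithm, refit each cycle on the screening labels accumulated so far, and `screen` stands in for manual eligibility review; both names and the control flow are hypothetical placeholders for the study's actual pipeline.

```python
def iterative_screening(patients, score, screen, cycle_size=100, cycles=3):
    """Alternate phenotype-based ranking with manual screening.

    `score(patient, labeled)` estimates eligibility using labels collected
    so far; `screen(patient)` is the expensive manual eligibility check.
    Returns the eligible patients and all screening labels.
    """
    labeled = {}              # patient -> True/False from manual screening
    remaining = list(patients)
    for _ in range(cycles):
        # rank unscreened patients, most-likely-eligible first
        remaining.sort(key=lambda p: score(p, labeled), reverse=True)
        batch, remaining = remaining[:cycle_size], remaining[cycle_size:]
        for p in batch:
            labeled[p] = screen(p)   # manual screening result
        if not remaining:
            break
    eligible = [p for p, ok in labeled.items() if ok]
    return eligible, labeled
```

The positive screening rate per cycle is simply the fraction of each batch that `screen` marks eligible, which is the quantity the abstract reports improving from 20% to 23%.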
ABSTRACT
The wide gap between a care provider's conceptualization of the electronic health record (EHR) and the structures used for EHR data storage and transmission presents a multitude of obstacles to the development of innovative health IT applications. While developers model the clinicians' EHR view at one end, they work with a different data view when constructing health IT applications. Although there has been considerable progress in bridging this gap through the evolution of developer-friendly standards and tools for terminology mapping and data warehousing, there is a need for a simplified framework to facilitate the development of interoperable applications. To this end, we propose a framework for creating a layer of semantic abstraction over the EHR and describe preliminary work on implementing this framework for the management of hyperlipidemia and hypertension. Our goal is to facilitate the rapid development and portability of health IT applications.
ABSTRACT
Objective: Healthcare organizations use the research data models supported by the projects and tools that interest them, which often means organizations must support the same data in multiple models. The healthcare research ecosystem would benefit if tools and projects could be adopted independently of the underlying data model. Here, we introduce the concept of a reusable application programming interface (API) for healthcare and show that the i2b2 API can be adapted to support diverse patient-centric data models. Materials and Methods: We develop a methodology for extending i2b2's pre-existing API to query additional data models, using i2b2's recent "multi-fact-table querying" feature. Our method involves developing data-model-specific i2b2 ontologies and mapping these to query non-standard table structures. Results: We implement this methodology to query the OMOP and PCORnet models, which we validate with the i2b2 query tool. We implement the entire PCORnet data model and a five-domain subset of the OMOP model. We also demonstrate that additional, ancillary data model columns can be modeled and queried as i2b2 "modifiers." Discussion: i2b2's REST API can be used to query multiple healthcare data models, enabling shared tooling to have a choice of backend data stores. This enables separation between data model and software tooling for some of the more popular open analytic data models in healthcare. Conclusion: This methodology immediately allows querying OMOP and PCORnet using the i2b2 API. It is released as an open-source set of Docker images, and also on the i2b2 community wiki.
Subject(s)
Big Data , Data Warehousing/methods , Electronic Health Records , Internet , Biomedical Research , Databases, Factual , Humans , Models, Theoretical , Software , Vocabulary, Controlled
ABSTRACT
We present here our preliminary work in using simple two-way categorical tests to discover associations between categorical items in a clinical data repository. Initial results using the chi-square test yielded diagnosis code associations that seemed plausible as well as several that did not. This may be due in part to the effect of sample size. Tests more resistant to the effects of sample size may yield a higher fraction of plausible diagnosis code associations.
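For a 2x2 co-occurrence table of two diagnosis codes, the chi-square statistic can be computed directly. The pure-Python sketch below (df = 1) also illustrates the sample-size effect noted above: the statistic scales linearly with n, so trivial associations become "significant" in large repositories, whereas an effect-size measure such as the phi coefficient does not change with n.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Chi-square test of independence for a 2x2 co-occurrence table:
    a = patients with both codes, b = first code only,
    c = second code only, d = neither.
    Returns (statistic, p_value, phi); for df = 1, p = erfc(sqrt(x/2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))   # upper-tail chi-square, 1 df
    phi = math.sqrt(stat / n)            # effect size, insensitive to n
    return stat, p, phi
```

Multiplying every cell by 10 multiplies the statistic (and shrinks the p-value) tenfold while leaving phi unchanged, which is one reason p-value-only screening surfaces implausible associations at repository scale.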