ABSTRACT
Laboratory data must be interoperable so that the results of a lab test can be accurately compared across healthcare organizations. To achieve this, terminologies such as LOINC (Logical Observation Identifiers Names and Codes) provide unique identification codes for laboratory tests. Once standardized, the numeric results of laboratory tests can be aggregated and represented in histograms. Because of the characteristics of Real World Data (RWD), outliers and abnormal values are common, but these cases should be treated as exceptions and excluded from subsequent analysis. The proposed work analyses two methods capable of automating the selection of histogram limits to sanitize the generated lab test result distributions, Tukey's box-plot method and a "Distance to Density" approach, within the TriNetX Real World Data Network. The limits generated from clinical RWD are generally wider for Tukey's method and narrower for the Distance to Density approach, and both depend strongly on the values chosen for the algorithms' parameters.
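For illustration only (not code from the cited study), Tukey's box-plot rule can be sketched as follows; the conventional multiplier k = 1.5 stands in for whatever parameter values the authors actually tuned.

```python
import numpy as np

def tukey_limits(values, k=1.5):
    """Histogram limits by Tukey's box-plot rule: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy distribution of lab results with two implausible outliers.
results = np.array([72, 85, 90, 95, 101, 110, 118, 130, 145, 400, 9999])
lower, upper = tukey_limits(results)
sanitized = results[(results >= lower) & (results <= upper)]
print(lower, upper, sanitized)
```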
Subjects
Laboratories, Logical Observation Identifiers Names and Codes
ABSTRACT
OBJECTIVE: The growing availability of electronic health records (EHR) data opens opportunities for integrative analysis of multi-institutional EHR to produce generalizable knowledge. A key barrier to such integrative analyses is the lack of semantic interoperability across institutions due to coding differences. We propose a Multiview Incomplete Knowledge Graph Integration (MIKGI) algorithm to integrate information from multiple sources with partially overlapping EHR concept codes to enable translations between healthcare systems. METHODS: The MIKGI algorithm combines knowledge graph information from (i) embeddings trained from the co-occurrence patterns of medical codes within each EHR system and (ii) semantic embeddings of the textual strings of all medical codes obtained from the Self-Aligning Pretrained BERT (SAPBERT) algorithm. Because of the heterogeneity of coding across healthcare systems, each EHR source provides only partial coverage of the available codes. MIKGI synthesizes the incomplete knowledge graphs derived from these multi-source embeddings by minimizing a spherical loss function that combines the pairwise directional similarities of embeddings computed from all available sources. MIKGI outputs harmonized semantic embedding vectors for all EHR codes, which improves the quality of the embeddings and enables direct assessment of both similarity and relatedness between any pair of codes from multiple healthcare systems. RESULTS: With EHR co-occurrence data from Veterans Affairs (VA) healthcare and Mass General Brigham (MGB), the MIKGI algorithm produced high-quality embeddings for a variety of downstream tasks, including detecting known similar or related entity pairs and mapping VA local codes to the relevant EHR codes used at MGB. Based on the cosine similarity of the MIKGI-trained embeddings, the AUC was 0.918 for detecting similar entity pairs and 0.809 for detecting related pairs. For cross-institutional medical code mapping, the top 1 and top 5 accuracies were 91.0% and 97.5% when mapping medication codes at VA to RxNorm medication codes at MGB, and 59.1% and 75.8% when mapping VA local laboratory codes to the LOINC hierarchy. When trained with 500 labels, the lab code mapping attained top 1 and top 5 accuracies of 77.7% and 87.9%. MIKGI also attained the best performance in selecting VA local lab codes for desired laboratory tests and COVID-19-related features for COVID EHR studies. Compared to existing methods, MIKGI attained the most robust performance, with accuracy the highest or near the highest across all tasks. CONCLUSIONS: The proposed MIKGI algorithm can effectively integrate incomplete summary data from biomedical text and EHR data to generate harmonized embeddings for EHR codes for knowledge graph modeling and cross-institutional translation of EHR codes.
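As a rough sketch of the code-mapping step only (not of MIKGI training), ranking candidate target codes by cosine similarity of harmonized embeddings might look like this; all embeddings and code identifiers below are invented placeholders.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_targets(source_vec, target_embs, top_k=5):
    """Rank target codes by cosine similarity to a source code's embedding vector."""
    scores = {code: cosine(source_vec, vec) for code, vec in target_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Placeholder harmonized embeddings: one VA local lab code, many candidate LOINC codes.
rng = np.random.default_rng(0)
source_vec = rng.normal(size=100)
target_embs = {f"LOINC:placeholder-{i}": rng.normal(size=100) for i in range(1000)}
print(rank_targets(source_vec, target_embs))
```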
Subjects
COVID-19, Electronic Health Records, Algorithms, Humans, Logical Observation Identifiers Names and Codes, Automated Pattern Recognition
ABSTRACT
Measurement concepts are essential to observational healthcare research; however, a lack of concept harmonization limits the quality of research that can be done on multisite research networks. We developed five methods that used a combination of automated, semi-automated, and manual approaches for generating measurement concept sets. We validated our concept sets by calculating their frequencies in cohorts from the Columbia University Irving Medical Center (CUIMC) database. For heart transplant patients, the preoperative frequencies of the basic metabolic panel concept sets, which we generated by a semi-automated approach, were greater than 99%. We also created concept sets for lumbar puncture and coagulation panels, by automated and manual methods, respectively.
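Purely to illustrate the frequency validation (assuming an OMOP-style measurement table with person_id and measurement_concept_id columns, which the abstract does not state), the calculation might be sketched like this; all concept IDs are placeholders.

```python
import pandas as pd

def concept_set_frequency(measurement: pd.DataFrame, cohort_ids: set, concept_set: set) -> float:
    """Fraction of cohort patients with at least one record from the concept set."""
    hits = measurement[
        measurement["person_id"].isin(cohort_ids)
        & measurement["measurement_concept_id"].isin(concept_set)
    ]["person_id"].nunique()
    return hits / len(cohort_ids)

# Toy data: two of three cohort patients have a record from the (placeholder) concept set.
measurement = pd.DataFrame({
    "person_id": [1, 1, 2, 3],
    "measurement_concept_id": [1001, 1002, 1001, 2001],  # placeholder concept IDs
})
print(concept_set_frequency(measurement, {1, 2, 3}, {1001, 1002}))  # 0.666...
```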
Subjects
Information Storage and Retrieval, Logical Observation Identifiers Names and Codes, Factual Databases, Humans, Systematized Nomenclature of Medicine
ABSTRACT
New use cases and the need for quality control and imaging data sharing in health studies require the capacity to align imaging data with reference terminologies. We are interested in mapping the local terminology used at our center to describe imaging procedures to reference terminologies for imaging procedures (RadLex Playbook and LOINC/RSNA Radiology Playbook). We performed a manual mapping of the 200 most frequent imaging report titles at our center (i.e., 73.2% of all imaging exams). The mapping method was based only on information explicitly stated in the titles. The results showed exact mappings for 57.5% and 68.8% of titles to the RadLex and LOINC/RSNA Radiology Playbooks, respectively. We identified the reasons for mapping failures and analyzed the issues encountered.
Subjects
Information Dissemination/methods, Logical Observation Identifiers Names and Codes, Radiology Information Systems/trends, Radiology, Radiography, Radiology/methods, Radiology/trends, Terminology as Topic
ABSTRACT
In 2018 the University Hospital of Giessen (UHG) moved its hospital information system from an in-house solution to commercial software. The introduction of MEONA and Synedra-AIM allowed for the successful migration of clinical documents. The large pool of structured clinical data has been addressed in a second step and is now consolidated in a HAPI-FHIR server and mapped to LOINC and SNOMED for semantic interoperability in multicenter research projects, especially the German Medical Informatics Initiative (MII) and the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium.
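For context, a structured lab value consolidated on a FHIR server is commonly carried as an Observation resource whose code holds the LOINC mapping. The sketch below is illustrative, not the UHG implementation; the server URL is a placeholder, and the LOINC code 718-7 (hemoglobin) is quoted from memory.

```python
import json
import urllib.request

# Minimal FHIR R4 Observation with a LOINC-coded laboratory result (illustrative values).
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{
        "system": "http://loinc.org",
        "code": "718-7",
        "display": "Hemoglobin [Mass/volume] in Blood",
    }]},
    "subject": {"reference": "Patient/example"},
    "valueQuantity": {
        "value": 13.2,
        "unit": "g/dL",
        "system": "http://unitsofmeasure.org",
        "code": "g/dL",
    },
}

# A HAPI-FHIR server accepts this via a standard FHIR REST create (placeholder base URL).
request = urllib.request.Request(
    "http://localhost:8080/fhir/Observation",
    data=json.dumps(observation).encode("utf-8"),
    headers={"Content-Type": "application/fhir+json"},
    method="POST",
)
# urllib.request.urlopen(request)  # uncomment against a running server
```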
Assuntos
Logical Observation Identifiers Names and Codes , Informática Médica , Hospitais Universitários , Humanos , Software , Systematized Nomenclature of MedicineRESUMO
OBJECTIVE: Assess the effectiveness of providing the Logical Observation Identifiers Names and Codes (LOINC®)-to-In Vitro Diagnostic (LIVD) coding specification, required by the United States Department of Health and Human Services for SARS-CoV-2 reporting, in medical center laboratories, and utilize the findings to inform future United States Food and Drug Administration policy on the use of real-world evidence in regulatory decisions. MATERIALS AND METHODS: We compared gaps and similarities between the LOINC® codes recommended by diagnostic test manufacturers and the LOINC® codes used in medical center laboratories for the same tests. RESULTS: Five medical centers and three test manufacturers extracted data from laboratory information systems (LIS) for prioritized tests of interest. The data submissions ranged from 74 to 532 LOINC® codes per site. The three test manufacturers submitted 15 LIVD catalogs representing 26 distinct devices, 6956 tests, and 686 LOINC® codes. We identified mismatches between how medical centers use LOINC® to encode laboratory tests and how test manufacturers encode the same laboratory tests. Of 331 tests available in the LIVD files, 136 (41%) were represented by a mismatched LOINC® code at the medical centers (chi-square 45.0, 4 df, P < .0001). DISCUSSION: The five medical centers and three test manufacturers vary in how they organize, categorize, and store LIS catalog information. This variation impacts data quality and interoperability. CONCLUSION: The results indicate that providing the LIVD mappings was not sufficient to support laboratory data interoperability. National implementation of LIVD and further efforts to promote laboratory interoperability will require a more comprehensive effort and continuing evaluation and quality control.
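The core of the mismatch analysis is a per-test comparison of the LOINC® code a laboratory actually used against the code recommended in the manufacturer's LIVD file. A toy version of that comparison (device names and most codes invented for illustration) is shown below.

```python
# Toy comparison of laboratory-used LOINC codes against LIVD recommendations.
# Keys are (device, test) pairs; identifiers are illustrative, not real LIVD content.
livd_recommended = {
    ("ExampleAnalyzer", "SARS-CoV-2 RNA"): "94500-6",
    ("ExampleAnalyzer", "Influenza A RNA"): "12345-6",
}
lab_used = {
    ("ExampleAnalyzer", "SARS-CoV-2 RNA"): "94500-6",   # match
    ("ExampleAnalyzer", "Influenza A RNA"): "99999-9",  # mismatch
}

mismatches = [
    (key, lab_used[key], livd_recommended[key])
    for key in lab_used
    if key in livd_recommended and lab_used[key] != livd_recommended[key]
]
print(f"{len(mismatches)} mismatched of {len(lab_used)} tests "
      f"({len(mismatches) / len(lab_used):.0%})")
```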
Subjects
COVID-19, Clinical Laboratory Information Systems, Humans, Laboratories, Logical Observation Identifiers Names and Codes, SARS-CoV-2, United States
ABSTRACT
BACKGROUND: Screening for eligible patients continues to pose a great challenge for many clinical trials. This has led to a rapidly growing interest in standardizing computable representations of eligibility criteria (EC) in order to develop tools that leverage data from electronic health record (EHR) systems. Although laboratory procedures (LP) represent a common entity of EC that is readily available and retrievable from EHR systems, there is a lack of interoperable data models for this entity of EC. A public, specialized data model that utilizes international, widely adopted terminology for LP, e.g. Logical Observation Identifiers Names and Codes (LOINC®), is much needed to support automated screening tools. OBJECTIVE: The aim of this study is to establish a core dataset of the LP most frequently requested to recruit patients for clinical trials, using LOINC terminology. Employing such a core dataset could enhance the interface between study feasibility platforms and EHR systems and significantly improve automatic patient recruitment. METHODS: We used a semi-automated approach to analyze 10,516 screening forms from the Medical Data Models (MDM) portal's data repository that are pre-annotated with the Unified Medical Language System (UMLS). An automated semantic analysis based on concept frequency was followed by an extensive manual expert review performed by physicians to analyze complex recruitment-relevant concepts not amenable to the automatic approach. RESULTS: Based on the analysis of 138,225 EC from 10,516 screening forms, 55 laboratory procedures represented 77.87% of all UMLS laboratory concept occurrences identified in the selected EC forms. We identified 26,413 unique UMLS concepts from 118 UMLS semantic types, covering the vast majority of Medical Subject Headings (MeSH) disease domains. CONCLUSIONS: Only a small set of common LP covers the majority of laboratory concepts in screening EC forms, which supports the feasibility of establishing a focused core dataset for LP. We present ELaPro, a novel, LOINC-mapped core dataset of the 55 LP most frequently requested in screening for clinical trials. ELaPro is available in multiple machine-readable data formats, such as CSV, ODM, and HL7 FHIR. The extensive manual curation of this large number of free-text EC, as well as the combination of UMLS and LOINC terminologies, distinguishes this specialized dataset from previous relevant datasets in the literature.
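The frequency-based part of the semi-automated approach can be pictured as counting laboratory concept occurrences across eligibility-criteria forms and keeping the smallest set of top concepts that reaches a coverage threshold. The sketch below is only an illustration of that idea; the concept identifiers are invented, and the 78% threshold merely echoes the reported coverage.

```python
from collections import Counter

def core_concepts(occurrences, coverage=0.78):
    """Smallest set of most frequent concepts whose occurrences reach the coverage threshold."""
    counts = Counter(occurrences)
    total = sum(counts.values())
    core, covered = [], 0
    for concept, n in counts.most_common():
        core.append(concept)
        covered += n
        if covered / total >= coverage:
            break
    return core

# Placeholder concept identifiers extracted from screening forms.
occurrences = ["C-lab-A"] * 40 + ["C-lab-B"] * 30 + ["C-lab-C"] * 20 + ["C-lab-D"] * 10
print(core_concepts(occurrences))  # ['C-lab-A', 'C-lab-B', 'C-lab-C']
```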
Subjects
Logical Observation Identifiers Names and Codes, Medical Subject Headings, Humans, Semantics
ABSTRACT
When Donald A.B. Lindberg M.D. became Director in 1984, the U.S. National Library of Medicine (NLM) was a leader in the development and use of information standards for published literature but had no involvement with standards for clinical data. When Dr. Lindberg retired in 2015, NLM was the Central Coordinating Body for Clinical Terminology Standards within the U.S. Department of Health and Human Services, a major funder of ongoing maintenance and free dissemination of clinical terminology standards required for use in U.S. electronic health records (EHRs), and the provider of many services and tools to support the use of terminology standards in health care, public health, and research. This chapter describes key factors in the transformation of NLM into a significant player in the establishment of U.S. terminology standards for electronic health records.
Subjects
Electronic Health Records, Health Information Exchange, National Library of Medicine (U.S.), Humans, Leadership, Logical Observation Identifiers Names and Codes, Public Health, RxNorm, United States
ABSTRACT
An OpenEHR template based on LOINC terms in the German language (LOINC-DE) has been created for structured clinical data capture. The resulting template includes all terms available in LOINC-DE, which can be selected from a drop-down menu during clinical data capture. The template can be used as an independent laboratory form, or it can be customized for local needs. This approach demonstrates the possibility of including terminologies in the EHR when capturing patient data.
Subjects
Language, Logical Observation Identifiers Names and Codes, Humans, Laboratories, Semantics
ABSTRACT
The objectives of this paper are to analyze the terminologies SNOMED CT and Logical Observation Identifiers Names and Codes (LOINC) and to provide a guideline for the translation of LOINC concepts to SNOMED CT. Verified research data sets were used for this study, so the experiment is replicable with other research data. Fifty LOINC concepts for frequently performed laboratory services were translated to SNOMED CT. Information would be lost with pre-coordinated mapping, but the compositional grammar of SNOMED CT allows individual concepts to be linked into post-coordinated expressions that include all the information embedded in LOINC concepts. All information can thus be transferred smoothly to SNOMED CT.
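To make the post-coordination idea concrete: SNOMED CT compositional grammar links a focus concept to attribute-value refinements, each written as id |term|, with refinements separated by commas. The expression below only shows that shape; every numeric identifier is a placeholder, not a real SCTID, and the attribute names do not come from the cited guideline.

```python
# Shape of a SNOMED CT post-coordinated expression that could carry the parts of a
# LOINC laboratory concept. All numeric identifiers are placeholders, NOT real SCTIDs.
postcoordinated = (
    "100000001 |Laboratory measurement (placeholder)| : "
    "100000002 |Component (placeholder)| = 100000003 |Glucose (placeholder)| , "
    "100000004 |Specimen (placeholder)| = 100000005 |Serum specimen (placeholder)| , "
    "100000006 |Scale (placeholder)| = 100000007 |Quantitative (placeholder)|"
)
print(postcoordinated)
```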
Subjects
Logical Observation Identifiers Names and Codes, Systematized Nomenclature of Medicine, Linguistics, Translations
ABSTRACT
Infectious diseases due to microbial resistance pose a worldwide threat that calls for data sharing and the rapid reuse of medical data from health care to research. The integration of pathogen-related data from different hospitals can yield intelligent infection control systems that detect potentially dangerous germs as early as possible. Within the Infection Control use case of the German HiGHmed project, eight university hospitals have agreed to share their data to enable analysis of various data sources. Data sharing among different hospitals requires interoperability standards that define the structure and the terminology of the information to be exchanged. This article presents the work performed at the University Hospital Charité and the Berlin Institute of Health towards a standard model for exchanging microbiology data. Fast Healthcare Interoperability Resources (FHIR) is a standard for fast information exchange that allows healthcare information to be modeled as information packets called resources, which can be customized into so-called profiles to match use-case-specific needs. We show how we created specific profiles for microbiology data. The model was implemented using FHIR for the structure definitions and the international standards SNOMED CT and LOINC for the terminology services.
Subjects
Logical Observation Identifiers Names and Codes, Systematized Nomenclature of Medicine, Academies and Institutes, Delivery of Health Care, Humans, Information Dissemination
ABSTRACT
Semantic interoperability is a major challenge in multi-center data sharing projects, a challenge that the German Medical Informatics Initiative is taking up. With respect to laboratory data, enriching site-specific tests and measurements with LOINC codes appears to be a crucial step in supporting cross-institutional research. However, this effort is very time-consuming, as it requires expert knowledge of local site specifics. To ease this process, we developed a generic, manual, collaborative terminology mapping tool, the MIRACUM Mapper. It allows the creation of arbitrary mapping workflows involving different user roles. A mapping workflow with two user roles has been implemented at University Hospital Erlangen to support local LOINC mapping. Additionally, the MIRACUM LabVisualizeR provides summary statistics and visualizations of analyte data. Together, the two tools form a toolbox that facilitates the collaborative creation of mappings and streamlines the review and validation process. Both are available under an open source license.
Assuntos
Logical Observation Identifiers Names and Codes , Informática Médica , Instalações de Saúde , Humanos , Disseminação de Informação , LaboratóriosRESUMO
The local laboratory with a local client base that never needs to exchange information with any outside entity is a dying breed. As marketing channels, animal movement, and reporting requirements become increasingly national and international, the need to communicate about laboratory tests and results grows. Local and proprietary names of laboratory tests often fail to communicate enough detail to distinguish between similar tests. To avoid a lengthy description of each test, laboratories need the ability to assign codes that, although not sufficiently user-friendly for day-to-day use, contain enough information to translate between laboratories and even languages. The Logical Observation Identifiers Names and Codes (LOINC) standard provides such a universal coding system. Each test (each atomic observation) is evaluated on six attributes that establish its uniqueness at the level of clinical or epidemiologic significance. The analyte detected, analyte property, specimen, and result scale combine with the method of analysis and the timing (for challenge and metabolic-type tests) to define a unique LOINC code. Equipping laboratory results with such universal identifiers creates a world of opportunity for cross-institutional data exchange, aggregation, and analysis, and presents possibilities for data mining and artificial intelligence on a national and international scale. A few challenges, relatively unique to regulatory veterinary test protocols, require special handling.
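The six attributes correspond to the parts of a LOINC fully specified name. As a small illustration (the glucose example parts are quoted from memory and should be verified against the LOINC database), a record of those axes might look like:

```python
from dataclasses import dataclass

@dataclass
class LoincAxes:
    """The six axes that establish the uniqueness of a LOINC term."""
    component: str  # analyte detected
    property: str   # kind of quantity measured
    time: str       # timing of the observation
    system: str     # specimen / system
    scale: str      # result scale
    method: str     # method of analysis (often blank)

# Illustrative example: serum/plasma glucose (verify parts against the official LOINC release).
glucose = LoincAxes(
    component="Glucose",
    property="MCnc",   # mass concentration
    time="Pt",         # point in time
    system="Ser/Plas",
    scale="Qn",        # quantitative
    method="",
)
print(glucose)
```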
Subjects
Animal Diseases/diagnosis, Clinical Laboratory Information Systems/statistics & numerical data, Laboratories/standards, Logical Observation Identifiers Names and Codes, Veterinary Medicine/standards, Animals, Artificial Intelligence, Data Mining
ABSTRACT
BACKGROUND: COVID-19 ranks as the single largest health incident worldwide in decades. In such a scenario, electronic health records (EHRs) should provide a timely response both to healthcare needs and to data uses that go beyond direct medical care, known as secondary uses, which include biomedical research. However, it is usual for each data analysis initiative to define its own information model in line with its requirements. These specifications share clinical concepts but differ in format and recording criteria, which creates data entry redundancy across multiple electronic data capture systems (EDCs) and a consequent investment of effort and time by the organization. OBJECTIVE: This study sought to design and implement a flexible methodology based on detailed clinical models (DCM) that would enable EHRs generated in a tertiary hospital to be effectively reused without loss of meaning and within a short time. MATERIAL AND METHODS: The proposed methodology comprises four stages: (1) specification of an initial set of relevant variables for COVID-19; (2) modeling and formalization of clinical concepts using the ISO 13606 standard and the SNOMED CT and LOINC terminologies; (3) definition of transformation rules to generate secondary use models from standardized EHRs, and their development in the R language; and (4) implementation and validation of the methodology through the generation of the International Severe Acute Respiratory and emerging Infection Consortium (ISARIC-WHO) COVID-19 case report form. This process was implemented in a 1300-bed tertiary hospital for a cohort of 4489 patients hospitalized from 25 February 2020 to 10 September 2020. RESULTS: An initial and expandable set of relevant concepts for COVID-19 was identified, modeled, and formalized using the ISO 13606 standard and the SNOMED CT and LOINC terminologies. Similarly, an algorithm was designed and implemented in R and then applied to process EHRs in accordance with the standardized concepts, transforming them into secondary use models. Lastly, these resources were applied to obtain a data extract conforming to the ISARIC-WHO COVID-19 case report form, without requiring manual data collection. The methodology yielded the observation domain of this model with coverage of over 85% of patients for the majority of concepts. CONCLUSION: This study has furnished a solution to the difficulty of rapidly and efficiently obtaining EHR-derived data for secondary use in COVID-19, capable of adapting to changes in data specifications and applicable to other organizations and other health conditions. The conclusion to be drawn from this initial validation is that this DCM-based methodology allows the effective reuse of EHRs generated in a tertiary hospital during the COVID-19 pandemic, with no additional effort or time for the organization and with a greater data scope than that yielded by a conventional manual data collection process in ad hoc EDCs.
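The study's transformation rules were implemented in R against ISO 13606-standardized extracts. Purely to illustrate the idea (in Python, with a LOINC code and cut-off chosen for the example rather than taken from the study), a rule deriving one case-report-form field from coded EHR entries might look like:

```python
from typing import Iterable, Optional, Tuple

# One coded entry from a standardized EHR extract: (terminology, code, numeric value).
Entry = Tuple[str, str, Optional[float]]

def derive_fever_ever(entries: Iterable[Entry], temp_code: str = "8310-5",
                      threshold: float = 38.0) -> str:
    """Derive a yes/no/unknown CRF field from coded body-temperature observations.

    The LOINC code 8310-5 (body temperature) and the 38.0 degC cut-off are
    illustrative choices for this sketch, not rules from the cited study.
    """
    temps = [value for (system, code, value) in entries
             if system == "LOINC" and code == temp_code and value is not None]
    if not temps:
        return "unknown"
    return "yes" if max(temps) >= threshold else "no"

entries = [("LOINC", "8310-5", 37.2), ("LOINC", "8310-5", 38.6)]
print(derive_fever_ever(entries))  # yes
```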
Subjects
COVID-19/pathology, Datasets as Topic, Electronic Health Records, Algorithms, COVID-19/epidemiology, COVID-19/virology, Cohort Studies, Humans, Logical Observation Identifiers Names and Codes, SARS-CoV-2/isolation & purification, Systematized Nomenclature of Medicine
ABSTRACT
In this study, we sought to determine the efficacy of using automated mapping methods to reduce the burden of manually mapping laboratory data to LOINC® in a nationwide, electronic health record-derived, oncology-specific dataset. We developed novel encoding methodologies to vectorize free-text lab data and evaluated logistic regression, random forest, and k-nearest-neighbor (kNN) machine learning classifiers. All machine learning models performed significantly better than deterministic baseline algorithms. The best classifiers were random forests, which predicted the correct LOINC code 94.5% of the time. Ensemble classifiers further increased accuracy, with the best ensemble classifier predicting the same code 80.5% of the time at an accuracy of 99%. We conclude that an automated laboratory mapping model can both reduce manual mapping time and increase the quality of mappings, suggesting that automated mapping is a viable tool in a real-world oncology dataset.
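A bare-bones version of this setup, with character n-gram TF-IDF vectorization standing in for the study's own encoding methodology and a random forest classifier, could be sketched with scikit-learn as follows; the training pairs are invented.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Invented (test name, LOINC code) training pairs standing in for the EHR-derived data.
names = ["HGB", "Hemoglobin whole blood", "WBC count", "White blood cells auto",
         "Serum glucose", "Glucose fasting plasma"]
codes = ["718-7", "718-7", "6690-2", "6690-2", "2345-7", "2345-7"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # tolerant of abbreviations
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(names, codes)
print(model.predict(["hemoglobin [blood]", "fasting glucose"]))
```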
Subjects
Logical Observation Identifiers Names and Codes, Machine Learning, Algorithms, Electronic Health Records, Humans, Laboratories
ABSTRACT
In this paper, we describe a strategy for developing a comprehensive representation of genetic analyses. The primary intention is to make genetic analysis results readily usable in clinical practice. The system is called the Personnel Genetic Card (PGC), and it is being developed in cooperation between CIIRC CTU in Prague and the Mediware company. Genetic information is increasingly part of medicine and quality-of-life services (e.g., nutritional consulting). It is therefore necessary to bind genetic information to the clinical phenotype, such as drug metabolism or intolerance to various substances. We propose a structured form of the record in which the LOINC® standard identifies genetic test parameters and several terminology databases represent specific genetic information (e.g., HGNC, NCBI RefSeq, NCBI dbSNP, HGVS). In addition, several knowledge databases (PharmGKB, SNPedia, ClinVar) collect interpretations of genetic analysis results. In the results of this paper, we describe our approach from both a structural and a process perspective. The structural perspective covers the representation of the analysis record and its binding to interpretations. The process perspective describes the roles and activities involved in using the PGC system.
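As one way to picture the structural perspective (this is not the actual PGC schema; every identifier and reference below is a placeholder), a record binding a LOINC-identified test parameter to variant nomenclature and interpretation sources might look like:

```python
from dataclasses import dataclass, field

@dataclass
class GeneticResultRecord:
    """One analysis parameter of a structured genetic record (placeholder schema)."""
    loinc_code: str           # identifies the genetic test parameter
    gene_symbol: str          # HGNC gene symbol
    reference_sequence: str   # NCBI RefSeq accession
    variant_hgvs: str         # variant expressed in HGVS nomenclature
    interpretations: list = field(default_factory=list)  # e.g. PharmGKB / ClinVar references

record = GeneticResultRecord(
    loinc_code="XXXXX-X",              # placeholder, not a real LOINC code
    gene_symbol="CYP2D6",
    reference_sequence="NM_000000.0",  # placeholder accession
    variant_hgvs="c.100C>T",           # placeholder HGVS expression
    interpretations=["PharmGKB:PA0000", "ClinVar:VCV000000"],
)
print(record)
```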
Subjects
Genetic Testing, Personally Identifiable Information, Genetic Databases, Logical Observation Identifiers Names and Codes, Phenotype
ABSTRACT
Large observational data networks that leverage routine clinical practice data in electronic health records (EHRs) are critical resources for research on coronavirus disease 2019 (COVID-19). Data normalization is a key challenge for the secondary use of EHRs for COVID-19 research across institutions. In this study, we addressed the challenge of automating the normalization of COVID-19 diagnostic tests, which are critical data elements, but for which controlled terminology terms were published after clinical implementation. We developed a simple but effective rule-based tool called COVID-19 TestNorm to automatically normalize local COVID-19 testing names to standard LOINC (Logical Observation Identifiers Names and Codes) codes. COVID-19 TestNorm was developed and evaluated using 568 test names collected from 8 healthcare systems. Our results show that it could achieve an accuracy of 97.4% on an independent test set. COVID-19 TestNorm is available as an open-source package for developers and as an online Web application for end users (https://clamp.uth.edu/covid/loinc.php). We believe that it will be a useful tool to support secondary use of EHRs for research on COVID-19.
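COVID-19 TestNorm itself is available at the URL above; the toy rules below only illustrate the general idea of keyword-driven normalization and do not reproduce its actual rule set (the target LOINC codes shown should be checked against the LOINC database).

```python
import re

# Toy keyword rules mapping local COVID-19 test names to candidate LOINC codes.
# Illustrative only; this is not the COVID-19 TestNorm rule set, and the codes
# should be verified against the official LOINC release.
RULES = [
    (re.compile(r"\b(pcr|naa|rna)\b"), "94500-6"),    # molecular SARS-CoV-2 test
    (re.compile(r"\bantigen\b|\bag\b"), "94558-4"),   # rapid antigen test
    (re.compile(r"\bigg\b"), "94563-4"),              # IgG antibody test
]

def normalize(local_name: str) -> str:
    text = local_name.lower()
    for pattern, loinc_code in RULES:
        if pattern.search(text):
            return loinc_code
    return "UNMAPPED"

print(normalize("SARS-CoV-2 (COVID-19) RNA, Qualitative PCR"))  # 94500-6
print(normalize("Rapid COVID Antigen Screen"))                   # 94558-4
```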