ABSTRACT
INTRODUCTION: Medical terminologies and code systems, which play a vital role in the health domain, are rarely static; they change as knowledge and terminology evolve. Changes include the addition, deletion and relabeling of terms and, if terms are organized hierarchically, changes to their position. Tracking these changes becomes important when multiple versions of the same terminology are used and interoperability is desired. METHOD: We propose a new method for automatic change tracking between terminology versions. It consists of a declarative import pipeline that translates source terminologies into a common data model, followed by semantic and lexical change detection algorithms. These produce an ontology-based representation of terminology changes that can be queried using semantic query languages. RESULTS: The method proves accurate in detecting additions, deletions, relocations and renamings of terms. In cases where inter-version term mapping information is provided by the publisher, we were able to greatly enhance the ability to differentiate between simple additions/deletions and refinements/consolidations of terms. CONCLUSION: The method proves effective for semi-automatic change handling where term refinements and consolidations are relevant, and for automatic change detection where additional mapping information is available.
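No code accompanies the abstract; as a rough, hypothetical sketch of the kind of change detection described (assuming each terminology version is modeled as a flat map from term identifier to label and parent), additions, deletions, renamings and relocations could be derived roughly as follows:

def detect_changes(old, new):
    """Compare two terminology versions, each a dict: term_id -> (label, parent_id)."""
    changes = []
    for tid in new.keys() - old.keys():
        changes.append(("added", tid))
    for tid in old.keys() - new.keys():
        changes.append(("deleted", tid))
    for tid in old.keys() & new.keys():
        old_label, old_parent = old[tid]
        new_label, new_parent = new[tid]
        if old_label != new_label:
            changes.append(("renamed", tid, old_label, new_label))
        if old_parent != new_parent:
            changes.append(("relocated", tid, old_parent, new_parent))
    return changes

v1 = {"T1": ("Diabetes", None), "T2": ("Diabetes type I", "T1")}
v2 = {"T1": ("Diabetes mellitus", None), "T2": ("Diabetes type I", "T1"),
      "T3": ("Diabetes type II", "T1")}
print(detect_changes(v1, v2))
# [('added', 'T3'), ('renamed', 'T1', 'Diabetes', 'Diabetes mellitus')]

The paper's actual method additionally distinguishes refinements and consolidations when publisher-supplied inter-version mappings are available, which a plain identifier diff like this cannot do.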
Subjects
Semantics, Controlled Vocabulary, Algorithms, Terminology as Topic, Natural Language Processing, Humans
ABSTRACT
The evolution of digital media has increased the number of crimes committed using digital equipment, leading the computer forensics area to evolve into digital forensics (DF). This area aims to analyze information through its main phases of identification, collection, organization, and presentation (reporting). As the area has evolved, many techniques have been developed, mainly focusing on the formalization of terminologies and concepts to provide a common understanding of the vocabulary. This has driven several initiatives, such as the definition of ontologies, which are a means of identifying the main concepts of a given area. The existing literature thus provides several ontologies developed to support the DF area. To identify and analyze the existing ontologies for DF, this paper presents a systematic literature review (SLR) of primary studies in the literature. The SLR resulted in the identification of ontology building methodologies, ontology types, feasibility points, evaluation/assessment methods, and the DF phases and subareas that ontologies have supported. These results are based on the analysis of 29 ontologies, which aided in answering six research questions. Another contribution of this paper is a set of recommendations on further ontology-based support for DF investigation, which can guide researchers and practitioners in covering existing research gaps.
Subjects
Forensic Sciences, Humans, Forensic Sciences/methods, Digital Technology, Terminology as Topic, Controlled Vocabulary
ABSTRACT
This article presents our experience in developing an ontological model that can be used to create clinical decision support systems (CDSS). We have used the largest international biomedical terminological metathesaurus, the Unified Medical Language System (UMLS), as the basis of our model. This metathesaurus has been adapted into Russian using an automated hybrid translation system with expert control. The resulting product was named the National Unified Terminological System (NUTS). We have added more than 33 million scientific and clinical relationships between NUTS terms, extracted from the texts of scientific articles and electronic health records. We have also computed weights for each relationship, standardized their values, and created a symptom checker for preliminary diagnostics based on them. We expect the NUTS to help solve the named entity recognition (NER) task and to increase term interoperability across different CDSS.
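The abstract gives no implementation details; a minimal, hypothetical sketch of how standardized relationship weights could drive a symptom checker for preliminary diagnostics (all term names and weights below are invented for illustration):

from collections import defaultdict

# (symptom term, diagnosis term) -> standardized relationship weight; invented values
weights = {
    ("polyuria", "diabetes mellitus"): 0.82,
    ("polydipsia", "diabetes mellitus"): 0.78,
    ("polyuria", "urinary tract infection"): 0.35,
}

def rank_diagnoses(observed_symptoms):
    """Score each diagnosis by summing the weights of observed symptoms linked to it."""
    scores = defaultdict(float)
    for (symptom, diagnosis), w in weights.items():
        if symptom in observed_symptoms:
            scores[diagnosis] += w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_diagnoses({"polyuria", "polydipsia"}))
# diabetes mellitus ranks first (summed weight ~1.6), urinary tract infection second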
Subjects
Electronic Health Records, Knowledge Bases, Unified Medical Language System, Clinical Decision Support Systems, Natural Language Processing, Humans, Russian Federation, Controlled Vocabulary
ABSTRACT
This article presents our experience in constructing the National Unified Terminological System (NUTS), which has an ontological structure based on the international Unified Medical Language System (UMLS). The UMLS has been adapted and enriched with formulations from national directories, with relationships extracted from the texts of scientific articles and electronic health records, and with weight coefficients.
Subjects
Electronic Health Records, Unified Medical Language System, Natural Language Processing, Terminology as Topic, Controlled Vocabulary
ABSTRACT
An administrable dose form is obtained from a pharmaceutical dose form after transformation (or no transformation). Building on the creation of a small ontology of 428 EDQM pharmaceutical dose forms to support alignment with other dose form ontologies (SNOMED-CT, RxNorm), the present study focuses on a simple ontology of 308 administrable dose forms, 27 Intended Sites and an intermediary level of 65 dose form groupers. The ontology was created after 432 pharmaceutical dose forms, 65 combined pharmaceutical dose forms and 73 combined terms were linked by EDQM to administrable dose forms during the UNICOM project. The article describes these resources, the resulting ontology, and the differences between its top-level concepts and those of the source. It also presents the protocol for a validation study through expert review, in preparation for use case studies.
Subjects
Systematized Nomenclature of Medicine, Humans, Pharmaceutical Preparations, Natural Language Processing, Controlled Vocabulary
ABSTRACT
One Digital Health (ODH) merges the Digital Health and One Health approaches to create a comprehensive framework for future health ecosystems. In this rapidly evolving field, a standardized vocabulary is not just a convenience, but a necessity to ensure efficient communication. This research proposes the development of a "One Digital Health-Unified Terminology" (ODH-UT) to facilitate communication among researchers and practitioners in Digital Health and One Health, addressing this crucial need.
Subjects
Terminology as Topic, Humans, Controlled Vocabulary, Digital Health
ABSTRACT
We are creating synergy among European Health Data Space projects (e.g., IDERHA, EUCAIM, ASCAPE, iHELP, Bigpicture, and the HealthData@EU pilot project) through the use of health standards, supported by the HSBOOSTER EU project, since these projects are involved in or using standards and/or designing health ontologies. We compare the health-standardized models/ontologies/terminologies used in those projects, such as HL7 FHIR, DICOM, OMOP, ISO TC 215 Health Informatics, and W3C DCAT.
Subjects
Neoplasms, Humans, Neoplasms/therapy, Electronic Health Records/standards, Europe, Controlled Vocabulary
ABSTRACT
Ontology is essential for achieving interoperability of health information and information technology applications in the biomedical fields and beyond. Traditionally, ontology construction is carried out manually by human domain experts (HDE). Here, we explore an active learning approach to automatically identify candidate terms from publications, with subsequent manual verification as part of the deep learning model's training process. We introduce the overall architecture of the active learning pipeline and present some preliminary results. This work is a critical complement to manual ontology building, especially during the long-term maintenance stage.
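As a sketch of a generic pool-based active learning loop with uncertainty sampling (the paper's actual deep learning model and term features are not described in the abstract, so a logistic regression stand-in and synthetic features are used here):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 20))          # stand-in feature vectors for candidate terms
y_pool = (X_pool[:, 0] > 0).astype(int)      # stand-in "is a valid ontology term" labels

# Seed set containing both classes; the rest of the pool is initially unlabeled.
labeled = list(np.where(y_pool == 0)[0][:5]) + list(np.where(y_pool == 1)[0][:5])
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

for _ in range(5):                           # five active learning rounds
    clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    probs = clf.predict_proba(X_pool[unlabeled])[:, 1]
    uncertainty = np.abs(probs - 0.5)        # closest to 0.5 = least certain
    query = [unlabeled[i] for i in np.argsort(uncertainty)[:20]]
    labeled += query                         # in practice, experts verify these candidates
    unlabeled = [i for i in unlabeled if i not in query]

print(f"labeled examples after 5 rounds: {len(labeled)}")  # 10 + 5 * 20 = 110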
Subjects
Biological Ontologies, Humans, Terminology as Topic, Problem-Based Learning, Supervised Machine Learning, Controlled Vocabulary
ABSTRACT
PURPOSE: Mapping clinical observations and medical test results to the standardized vocabulary LOINC is a prerequisite for exchanging clinical data between health information systems and ensuring efficient interoperability. METHODS: We present a comparison of three approaches for LOINC transcoding applied to French data collected in real-world settings, including a state-of-the-art language model approach and a classifier chains approach. RESULTS: Our study demonstrates that the classifier chains approach improves on the baselines and competes effectively with state-of-the-art language models. CONCLUSIONS: Our approach proves efficient and cost-effective despite reproducibility challenges, with potential for future optimizations and testing on additional datasets.
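The abstract does not detail the classifier chains setup; one plausible reading, sketched below with invented French test labels, invented axis labels, and TF-IDF character n-grams, is a chain in which each LOINC-axis classifier also receives the previous classifier's prediction as an extra feature:

from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

texts = ["glycémie à jeun", "hémoglobine glyquée",
         "créatinine sérique", "glycémie capillaire"]
# Invented labels for two LOINC axes (component, system); not real LOINC mappings.
components = ["glucose", "hba1c", "creatinine", "glucose"]
systems = ["serum", "blood", "serum", "blood"]

vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vec.fit_transform(texts)

# First link of the chain: predict the component from the test label text.
clf_component = LogisticRegression(max_iter=1000).fit(X, components)
pred_component = clf_component.predict(X).reshape(-1, 1)

# Second link: append the (one-hot) predicted component to the text features.
enc = OneHotEncoder(handle_unknown="ignore").fit(pred_component)
X_chained = hstack([X, enc.transform(pred_component)])
clf_system = LogisticRegression(max_iter=1000).fit(X_chained, systems)
print(clf_system.predict(X_chained))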
Subjects
Electronic Health Records, Humans, Natural Language Processing, France, Logical Observation Identifiers Names and Codes, Controlled Vocabulary
ABSTRACT
This paper presents an effort by the World Health Organization (WHO) to integrate the reference classifications of the Family of International Classifications (ICD, ICF, and ICHI) into a unified digital framework. The integration was accomplished via an expanded Content Model and a single Foundation that hosts all entities from these classifications, allowing the traditional use cases of the individual classifications to be retained while enhancing their combined use. The harmonized WHO-FIC Content Model and the unified Foundation have streamlined content management, enhanced the web-based tool functionalities, and provided opportunities for linkage with external terminologies and ontologies. This integration promises reduced maintenance cost, seamless joint application, and complete representation of health-related concepts, while enabling better interoperability with other informatics infrastructures.
Subjects
International Classification of Diseases, World Health Organization, Controlled Vocabulary, Humans, Terminology as Topic, International Classification of Functioning, Disability and Health
ABSTRACT
Named Entity Recognition (NER) models based on Transformers have gained prominence for their impressive performance across languages and domains. This work delves into the often-overlooked aspect of entity-level metrics and exposes significant discrepancies between token-level and entity-level evaluations. The study uses a corpus of synthetic French oncological reports annotated with entities representing oncological morphologies. Four French BERT-based models are fine-tuned for token classification, and their performance is rigorously assessed at both the token and entity level. In addition to fine-tuning, we evaluate ChatGPT's ability to perform NER through prompt engineering techniques. The findings reveal a notable disparity in model effectiveness when moving from token-level to entity-level metrics, highlighting the importance of comprehensive evaluation methodologies in NER tasks. Furthermore, in comparison to BERT, ChatGPT remains limited when it comes to detecting advanced entities in French.
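A toy example of why token-level and entity-level scores diverge: under strict entity-level matching, a single truncated span keeps token accuracy high while entity precision and recall drop to zero (the BIO tags below are illustrative, not from the paper's corpus):

def spans(tags):
    """Extract (start, end, type) entity spans from a BIO tag sequence."""
    out, start = [], None
    for i, tag in enumerate(tags + ["O"]):       # sentinel closes a trailing entity
        inside = tag.startswith("I-")
        if start is not None and not inside:      # current entity ends before position i
            out.append((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):                  # a new entity begins at position i
            start = i
    return out

gold = ["O", "B-MORPH", "I-MORPH", "I-MORPH", "O"]
pred = ["O", "B-MORPH", "I-MORPH", "O", "O"]      # predicted span truncated by one token

token_accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
gold_spans, pred_spans = set(spans(gold)), set(spans(pred))
tp = len(gold_spans & pred_spans)
precision = tp / len(pred_spans) if pred_spans else 0.0
recall = tp / len(gold_spans) if gold_spans else 0.0

print(f"token accuracy = {token_accuracy:.2f}")                       # 0.80
print(f"entity precision = {precision:.2f}, recall = {recall:.2f}")   # 0.00, 0.00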
Subjects
Natural Language Processing, France, Humans, Electronic Health Records, Language, Neoplasms, Controlled Vocabulary
ABSTRACT
The Korean National Institute of Health initiated data harmonization across cohorts with the aim of ensuring semantic interoperability of data and creating a common database of standardized data elements for future collaborative research. To this end, we reviewed the cohorts' code books and identified common data items and values that can be combined for data analysis. We then mapped the data items and values to standard health terminologies such as SNOMED CT. Preliminary results of this ongoing data harmonization work will be presented.
Subjects
Systematized Nomenclature of Medicine, Electronic Health Records, Humans, Semantics, Controlled Vocabulary, Terminology as Topic
ABSTRACT
Annotated language resources derived from clinical routine documentation form an intriguing asset for secondary use scenarios. In this investigation, we report on how such a resource can be leveraged to identify additional term candidates for a chosen set of ICD-10 codes. We conducted a log-likelihood analysis of the co-occurrence of approximately 1.9 million de-identified ICD-10 codes with the corresponding brief textual entries from German problem lists. This analysis identified potential candidates at a statistical significance level of p < 0.01, which were then used as seed terms to harvest additional candidates by interfacing with a large language model in a second step. The proposed approach identifies additional term candidates at suitable performance levels: hypernyms MAP@5 = 0.801, synonyms MAP@5 = 0.723 and hyponyms MAP@5 = 0.507. The re-use of existing annotated clinical datasets, in combination with large language models, presents an interesting strategy to bridge the lexical gap between standardized clinical terminologies and real-world jargon.
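For readers unfamiliar with the statistic, a small sketch of Dunning's log-likelihood ratio (G²) for a single ICD-10-code/term pair, computed from a 2x2 contingency table with invented counts (the paper's exact formulation may differ):

import math

def g2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 contingency table."""
    total = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    observed = [k11, k12, k21, k22]
    expected = [row1 * col1 / total, row1 * col2 / total,
                row2 * col1 / total, row2 * col2 / total]
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# k11: code and term co-occur; k12: code without term;
# k21: term without code;      k22: neither (all counts invented)
score = g2(k11=120, k12=880, k21=300, k22=98700)
print(f"G2 = {score:.1f}")  # significant at p < 0.01 if above the chi-square cutoff of about 6.63 (1 df)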
Subjects
International Classification of Diseases, Natural Language Processing, Controlled Vocabulary, Humans, Terminology as Topic, Electronic Health Records/classification, Germany
ABSTRACT
Ontologies and terminologies serve as the backbone of knowledge representation in biomedical domains, facilitating data integration, interoperability, and semantic understanding across diverse applications. However, the quality assurance and enrichment of these resources remain an ongoing challenge due to the dynamic nature of biomedical knowledge. In this editorial, we provide an introductory summary of the seven articles included in this special supplement issue on quality assurance and enrichment of biological and biomedical ontologies and terminologies. These articles span a spectrum of topics, such as the development of automated quality assessment frameworks for Resource Description Framework (RDF) resources, the identification of missing concepts in SNOMED CT through logical definitions, and the development of a COVID interface terminology to enable automatic annotation of COVID-19-related Electronic Health Records (EHRs). Collectively, these contributions underscore the ongoing efforts to improve the accuracy, consistency, and interoperability of biomedical ontologies and terminologies, thus advancing their pivotal role in healthcare and biomedical research.
Subjects
Biological Ontologies, Humans, COVID-19, Controlled Vocabulary, Electronic Health Records/standards
ABSTRACT
EHR interoperability is crucial for obtaining a range of benefits and can be achieved by using data standards, such as ontologies. The Portuguese Nursing Ontology (NursingOntos) is a reference model describing a set of nursing concepts and their relationships, used to represent nursing knowledge in the Electronic Health Record (EHR). The purpose of this work was to define a set of correspondences between NursingOntos concepts and concepts from other terminologies that have the same or a similar meaning. In this project, we are using the ISO/TR 12300:2016 standard on the principles of mapping between terminological systems. For the domain of "airway clearance", the Portuguese Nursing Ontology shows a good level of mapping to other terminologies. In conclusion, the Portuguese Nursing Ontology can be used in the EHR to support the global digitalization of health.
Subjects
Electronic Health Records, Standardized Nursing Terminology, Systematized Nomenclature of Medicine, Portugal, Nursing Records, Natural Language Processing, Controlled Vocabulary, Humans
ABSTRACT
This poster presentation describes the innovative integration of the Omaha System, a standardized terminology, into public health nurses' (PHNs) workflow and electronic records within a local health department's Childhood Lead Poisoning Prevention Program. The Omaha System facilitated the tracking of evidence-based interventions and client outcomes, showing a significant improvement in record completeness (from 33% pre-implementation to 84% post-implementation) and in client outcomes for Health care supervision, Growth and development, and Nutrition. Outcome data analysis revealed improvement across all post-implementation records from initial to interim assessments for Health care supervision (p < .001), Growth and development (p < .001), and Nutrition (p = .025). This achievement has given program leaders and employees the ability to clearly present their services and results to policymakers, facilitating better assessment of the program's effectiveness. The successful implementation illustrates the approach's potential applicability to other public health projects and areas.
Subjects
Electronic Health Records, Lead Poisoning, Lead Poisoning/prevention & control, Humans, Child, Preschool Child, Public Health Nursing, Controlled Vocabulary, Infant
ABSTRACT
This poster presents the use of Interpretive Description in ontology development. The methods selected attended to the need for quality and rigour.
Subjects
Biological Ontologies, Humans, Controlled Vocabulary
ABSTRACT
OBJECTIVE: We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). METHODS: We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names, half of each concept's synonyms, and the corresponding identifiers. Subsequently, we fine-tuned Llama 2 (Llama2-7B) on each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. RESULTS: When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 has only ~20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced into the phenotype terms, the accuracy of NAME and NAME+SYN dropped to 10.2% and 36.1%, respectively, but increased to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from the HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. CONCLUSION: Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for using LLMs to identify named medical entities in clinical narratives while normalizing them to standard concepts in a controlled vocabulary.
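The abstract does not publish the templates; a hypothetical sketch of template-based corpus generation pairing phenotype mentions with HPO identifiers (the templates, example concepts, and synonym-halving rule below are illustrative assumptions):

# Illustrative HPO-style concepts and templates, purely for demonstration.
hpo_concepts = {
    "HP:0001250": {"name": "Seizure", "synonyms": ["Epileptic seizure", "Fits"]},
    "HP:0001631": {"name": "Atrial septal defect", "synonyms": ["ASD"]},
}
templates = [
    "The patient presents with {term}.",
    "Clinical examination revealed {term}.",
]

def build_corpus(concepts, use_synonyms=False):
    """Generate (input sentence, HPO identifier) pairs for fine-tuning."""
    rows = []
    for hpo_id, concept in concepts.items():
        terms = [concept["name"]]
        if use_synonyms:                                   # roughly half of the synonyms
            terms += concept["synonyms"][: (len(concept["synonyms"]) + 1) // 2]
        for term in terms:
            for template in templates:
                rows.append({"input": template.format(term=term), "output": hpo_id})
    return rows

name_corpus = build_corpus(hpo_concepts)                         # analogous to the NAME setting
name_syn_corpus = build_corpus(hpo_concepts, use_synonyms=True)  # analogous to NAME+SYN
print(len(name_corpus), len(name_syn_corpus))                    # 4 8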
Subjects
Biological Ontologies, Natural Language Processing, Phenotype, Rare Diseases, Controlled Vocabulary, Humans
ABSTRACT
OBJECTIVES: This article aims to enhance the performance of large language models (LLMs) on the few-shot biomedical named entity recognition (NER) task by developing a simple and effective method, the Retrieving and Chain-of-Thought (RT) framework, and to evaluate the resulting improvement. MATERIALS AND METHODS: Given the remarkable advances of retrieval-based language models and Chain-of-Thought prompting across various natural language processing tasks, we propose an RT framework designed to combine both approaches. The RT framework comprises dedicated modules for information retrieval and Chain-of-Thought reasoning. In the retrieval module, RT selects pertinent examples from the demonstrations during instruction tuning for each input sentence. The Chain-of-Thought module then applies a systematic reasoning process to identify entities. We conducted a comprehensive comparative analysis of our RT framework against 16 other models on few-shot NER tasks using the BC5CDR and NCBI corpora. Additionally, we explored the impact of negative samples, output formats, and missing data on performance. RESULTS: Our proposed RT framework outperforms the other models on few-shot NER tasks, with micro-F1 scores of 93.50 and 91.76 on the BC5CDR and NCBI corpora, respectively. We found that using both positive and negative samples and Chain-of-Thought (rather than Tree-of-Thought) reasoning performed better. Additionally, using a partially annotated dataset had only a marginal effect on model performance. DISCUSSION: This is the first investigation to combine a retrieval-based LLM and a Chain-of-Thought methodology to enhance performance on biomedical few-shot NER. The retrieval-based component retrieves the most relevant examples for the input sentence, offering crucial knowledge for predicting the entities in that sentence. We also conducted a meticulous examination of our methodology, including an ablation study. CONCLUSION: The RT framework with an LLM demonstrates state-of-the-art performance on few-shot NER tasks.
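As an illustration only (the paper's retrieval model and prompt wording are not given in the abstract), a minimal sketch of the two RT stages using TF-IDF similarity for demonstration retrieval and a simple Chain-of-Thought style prompt:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Annotated demonstrations: (sentence, chain-of-thought rationale with entities).
demos = [
    ("Patients received cisplatin for lung cancer.",
     "Reasoning: 'cisplatin' is a chemical; 'lung cancer' is a disease.\n"
     "Entities: cisplatin [Chemical]; lung cancer [Disease]"),
    ("Aspirin-induced asthma was observed.",
     "Reasoning: 'Aspirin' is a chemical; 'asthma' is a disease.\n"
     "Entities: Aspirin [Chemical]; asthma [Disease]"),
]

def build_prompt(sentence, k=1):
    """Retrieve the k most similar demonstrations and assemble a CoT-style NER prompt."""
    vectorizer = TfidfVectorizer().fit([d[0] for d in demos] + [sentence])
    sims = cosine_similarity(vectorizer.transform([sentence]),
                             vectorizer.transform([d[0] for d in demos]))[0]
    top = sorted(range(len(demos)), key=lambda i: sims[i], reverse=True)[:k]
    examples = "\n\n".join(f"Sentence: {demos[i][0]}\n{demos[i][1]}" for i in top)
    return (f"{examples}\n\nSentence: {sentence}\n"
            "Reasoning: identify chemical and disease entities step by step.")

print(build_prompt("Carboplatin caused severe anemia in two patients."))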
Subjects
Natural Language Processing, Controlled Vocabulary, Information Storage and Retrieval/methods
ABSTRACT
Biomedical terminologies play a vital role in managing biomedical data. Missing IS-A relations in a biomedical terminology can be detrimental to its downstream uses. In this paper, we investigate an approach combining logical definitions and lexical features to discover missing IS-A relations in two biomedical terminologies: SNOMED CT and the National Cancer Institute (NCI) thesaurus. The method is applied to unrelated concept pairs within non-lattice subgraphs: graph fragments within a terminology that are likely to contain various inconsistencies. Our approach first checks whether the logical definition of one concept is more general than that of the other. It then checks whether the lexical features of the first concept are contained in those of the other. If both constraints are satisfied, we suggest a potentially missing IS-A relation between the two concepts. The method identified 982 potential missing IS-A relations for SNOMED CT and 100 for the NCI thesaurus. To assess the efficacy of our approach, domain experts evaluated a random sample of results from the "Clinical Findings" and "Procedure" subhierarchies of SNOMED CT and from the "Drug, Food, Chemical or Biomedical Material" subhierarchy of the NCI thesaurus. The evaluation revealed that 118 out of 150 suggestions are valid for SNOMED CT and 17 out of 20 for the NCI thesaurus.
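A simplified, hypothetical sketch of the two-step test described, approximating "more general logical definition" as attribute-set containment and lexical features as word sets (real SNOMED CT and NCIt description logic is considerably richer, so this is only illustrative):

def suggest_missing_isa(concept_a, concept_b):
    """Suggest 'concept_b IS-A concept_a' when both the logical and lexical tests pass."""
    # (1) A's logical definition is more general: its defining attributes are a subset of B's.
    logically_more_general = concept_a["attributes"] <= concept_b["attributes"]
    # (2) A's lexical features (here, its word set) are contained in B's.
    lexically_contained = concept_a["words"] <= concept_b["words"]
    return logically_more_general and lexically_contained

a = {"attributes": {("finding site", "lung")},
     "words": {"lung", "disorder"}}
b = {"attributes": {("finding site", "lung"), ("morphology", "inflammation")},
     "words": {"inflammatory", "lung", "disorder"}}

print(suggest_missing_isa(a, b))  # True: b is suggested as a candidate subtype of a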