ABSTRACT
SNOMED CT postcoordination is an underused mechanism that can help implement advanced systems for the automatic extraction and encoding of clinical information from text. It allows concepts that do not exist in SNOMED CT to be defined by their relationships with existing ones. Manually building postcoordinated expressions is difficult: it requires deep knowledge of the terminology and the support of specialized tools, which barely exist. To support the building of postcoordinated expressions, we have implemented KGE4SCT, a method that suggests the corresponding SNOMED CT postcoordinated expression for a given clinical term. We leverage the SNOMED CT ontology and its graph-like structure and use knowledge graph embeddings (KGEs). The objective of such embeddings is to represent knowledge graph components (e.g., entities and relations) in a vector space in a way that captures the structure of the graph. We then use vector similarity and analogies to obtain the postcoordinated expression of a given clinical term. For the Spanish SNOMED CT version, we obtained a semantic type accuracy of 98%, a relationship accuracy of 90%, and an analogy accuracy of 60%, with an overall postcoordination completeness of 52%. We also applied the method to the English SNOMED CT version and outperformed state-of-the-art methods both in corpus generation for language model training for this task (a 6% improvement in analogy accuracy) and in automatic postcoordination of SNOMED CT expressions (a 17% increase in partial conversion rate).
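The use of vector similarity and analogies over embeddings can be illustrated with a minimal sketch. It is not the KGE4SCT implementation: the concept names and 3-dimensional vectors are invented (a trained KGE model would supply learned, much higher-dimensional vectors), and the analogy a : b :: c : ? is resolved with the common offset heuristic, searching for the concept whose embedding is closest by cosine similarity to b - a + c.

```python
import math

# Hypothetical toy embeddings; a trained KGE model would provide these.
EMBEDDINGS = {
    "fracture": [1.0, 0.0, 0.0],
    "fracture_of_femur": [1.0, 1.0, 0.0],
    "burn": [0.0, 0.0, 1.0],
    "burn_of_femur": [0.0, 1.0, 1.0],
    "femur": [0.0, 1.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(vector, exclude=()):
    """Return the concept whose embedding is closest to `vector`."""
    candidates = (c for c in EMBEDDINGS if c not in exclude)
    return max(candidates, key=lambda c: cosine(vector, EMBEDDINGS[c]))

def analogy(a, b, c):
    """Solve a : b :: c : ? using the offset b - a + c."""
    target = [vb - va + vc for va, vb, vc in
              zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    return most_similar(target, exclude={a, b, c})

print(analogy("fracture", "fracture_of_femur", "burn"))  # prints burn_of_femur
```

The query terms are excluded from the candidates, as is usual in analogy evaluation, so the answer cannot trivially be one of the inputs.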
Subjects
Semantics , Systematized Nomenclature of Medicine , Pattern Recognition, Automated , Language , Natural Language Processing
ABSTRACT
Clinical models are artefacts that specify how information is structured in electronic health records (EHRs). However, the makeup of clinical models is not guided by any formal constraint beyond a semantically vague information model. We address this gap by advocating ontology design patterns as a mechanism that makes the semantics of clinical models explicit. This paper demonstrates how ontology design patterns can validate existing clinical models using SHACL. Taking the Clinical Information Modelling Initiative (CIMI) as a case study, we show how ontology patterns detect both modeling and terminology binding errors in CIMI models. SHACL, a W3C constraint language for the validation of RDF graphs, builds on the concept of a "shape", a description of data in terms of expected cardinalities, datatypes, and other restrictions. Unlike OWL, SHACL adopts the Closed World Assumption (CWA) and is therefore better suited to validating clinical models. We demonstrated the feasibility of the approach by manually describing the correspondences between six CIMI clinical models represented in RDF and two SHACL ontology design patterns. Using a Java-based SHACL implementation, we found at least eleven modeling and binding errors in these CIMI models. This demonstrates that ontology design patterns are useful not only as a modeling tool but also as a validation tool.
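The closed-world behaviour that makes SHACL suitable here can be illustrated without a SHACL engine. The sketch below is a simplified pure-Python stand-in, not SHACL syntax and not the Java-based implementation used in the study: a hypothetical "shape" lists, per property, a minimum and maximum cardinality and an expected datatype, and every violation for a node is reported. Under the CWA a missing required value is an error, whereas OWL's open-world reasoning would treat it as merely unknown. Node names, properties, and codes are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, object]

@dataclass
class PropertyConstraint:
    """Simplified analogue of a SHACL property shape (hypothetical)."""
    path: str
    min_count: int = 0
    max_count: Optional[int] = None
    datatype: Optional[type] = None

def validate(node: str, triples: List[Triple],
             shape: List[PropertyConstraint]) -> List[str]:
    """Check `node` against `shape`, treating absent triples as false (closed world)."""
    violations = []
    for c in shape:
        values = [o for s, p, o in triples if s == node and p == c.path]
        if len(values) < c.min_count:
            violations.append(f"{node}: {c.path} has {len(values)} value(s), min {c.min_count}")
        if c.max_count is not None and len(values) > c.max_count:
            violations.append(f"{node}: {c.path} has {len(values)} value(s), max {c.max_count}")
        if c.datatype is not None:
            violations += [f"{node}: {c.path} value {v!r} is not {c.datatype.__name__}"
                           for v in values if not isinstance(v, c.datatype)]
    return violations

# Hypothetical shape: an observation needs exactly one code and one numeric value.
SHAPE = [
    PropertyConstraint("hasCode", min_count=1, max_count=1, datatype=str),
    PropertyConstraint("hasValue", min_count=1, max_count=1, datatype=float),
]
TRIPLES = [
    ("obs1", "hasCode", "code-1"), ("obs1", "hasValue", 120.0),
    ("obs2", "hasCode", "code-1"), ("obs2", "hasValue", "high"), ("obs2", "hasValue", 130.0),
    ("obs3", "hasCode", "code-1"),  # no value: a violation under the CWA
]

for node in ("obs1", "obs2", "obs3"):
    print(node, validate(node, TRIPLES, SHAPE))
```

Here `obs2` fails twice (too many values, one of the wrong type) and `obs3` fails once because its value is absent.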
Subjects
Electronic Health Records , Models, Theoretical , Artifacts , Humans , Terminology as Topic
ABSTRACT
International interoperability of healthcare and research data requires a commitment to standards. To this end, SNOMED CT was evaluated for representing the questionnaire items of the European Registry of Stroke Care Quality using a complex annotation protocol. Agreement between validators and annotators was 72.4%. At least 64% of the information could be represented with SNOMED CT alone, including complex post-coordinations. An information model would be required for 9% of the information, and new content would have to be added to SNOMED CT for 14%. Next steps are the creation of an annotation guideline for questionnaires and a specific reference set, and the combination of both with an information model such as HL7 FHIR.
Subjects
Registries , Stroke , Systematized Nomenclature of Medicine , Humans , Surveys and Questionnaires , Europe , Electronic Health Records/standards
ABSTRACT
The translational research community in general, and the Clinical and Translational Science Awards (CTSA) community in particular, share the vision of repurposing electronic health records (EHRs) for research that will improve the quality of clinical practice. Many members of these communities are also aware that EHRs suffer from important limitations: their data become poorly structured, biased, and unusable outside their original context. This creates obstacles to continuity of care, utility, quality improvement, and translational research. Analogous limitations to sharing objective data in other areas of the natural sciences have been successfully overcome by developing and using common ontologies. This White Paper presents the authors' rationale for using ontologies with computable semantics to improve clinical data quality and EHR usability. It is addressed to researchers who have a stake in clinical and translational science and who advocate the use of information technology in medicine, but who are concerned about its current major shortfalls. The White Paper outlines pitfalls, opportunities, and solutions, and recommends increased investment in the research and development of ontologies with computable semantics for a new generation of EHRs.
ABSTRACT
Linking Electronic Healthcare Record (EHR) content to educational materials has been considered a key international recommendation to enable clinical engagement and promote patient safety. This would allow citizens to be directed to reliable information available on the web and guided properly. In this paper, we describe an approach in that direction, based on dual-model EHR standards and standardized educational content. The recommendation method will be based on the semantic coverage of the learning content repository for a particular archetype, which will be calculated by applying Semantic Web technologies such as ontologies and semantic annotations.
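The abstract does not define how semantic coverage is computed. One plausible reading, assumed here for illustration only, is the fraction of an archetype's annotated concepts that also appear among a learning content's semantic annotations; contents are then ranked by that fraction. The archetype, content names, and concept labels below are hypothetical.

```python
def coverage(archetype_concepts, content_concepts):
    """Fraction of the archetype's concepts annotated in a learning content (assumed metric)."""
    if not archetype_concepts:
        return 0.0
    return len(archetype_concepts & content_concepts) / len(archetype_concepts)

def recommend(archetype_concepts, repository, top_n=2):
    """Rank learning contents by coverage; ties are broken alphabetically."""
    scored = [(coverage(archetype_concepts, concepts), name)
              for name, concepts in repository.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]

# Hypothetical annotations of a blood pressure archetype and three contents.
ARCHETYPE = {"blood_pressure", "systolic", "diastolic", "hypertension"}
REPOSITORY = {
    "leaflet_hypertension": {"hypertension", "blood_pressure", "diet"},
    "video_measuring_bp": {"blood_pressure", "systolic", "diastolic"},
    "course_diabetes": {"diabetes", "insulin"},
}

print(recommend(ARCHETYPE, REPOSITORY))  # prints ['video_measuring_bp', 'leaflet_hypertension']
```

Contents with zero coverage are never recommended, even if fewer than `top_n` results remain.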
Subjects
Computer-Assisted Instruction/standards , Education, Medical/methods , Education, Medical/standards , Electronic Health Records , Health Records, Personal , Medical Informatics/standards , Medical Record Linkage/standards , Internet/standards , Semantics , Spain
ABSTRACT
Data integration is a growing need in medical informatics projects such as the EU Precise4Q project, in which multidisciplinary data that are semantically and syntactically heterogeneous and spread across several institutions need to be integrated. Moreover, data-sharing agreements often allow only virtual data integration, because data cannot leave the source repository. We propose a data harmonization infrastructure in which data are virtually integrated by sharing a semantically rich common data representation that allows homogeneous querying. This common data model integrates content from well-known biomedical ontologies such as SNOMED CT by means of the BTL2 upper-level ontology, and it is imported into a graph database. We successfully integrated three datasets and ran several test queries that demonstrated the feasibility of the approach.
Subjects
Biological Ontologies , Medical Informatics , Databases, Factual , Semantics , Systematized Nomenclature of Medicine
ABSTRACT
Communication between the health information systems of hospitals and primary care organizations is currently an important challenge for improving the quality of clinical practice and patient safety. However, clinical information is usually distributed among several independent systems that may be syntactically or semantically incompatible. This prevents healthcare professionals from accessing patients' clinical information in an understandable and normalized way. In this work, we address the semantic interoperability of two EHR standards: openEHR and ISO EN 13606. Both standards follow the dual-model approach, which distinguishes information from knowledge, the latter being represented through archetypes. The solution presented here transforms openEHR archetypes into ISO EN 13606 archetypes and vice versa by combining Semantic Web and Model-driven Engineering technologies. The resulting software implementation has been tested using publicly available collections of archetypes for both standards.
Subjects
Computer Communication Networks , Database Management Systems , Electronic Health Records , Information Storage and Retrieval , Models, Theoretical
ABSTRACT
In this paper, we present the ResearchEHR project. It focuses on the usability of Electronic Health Record (EHR) sources and EHR standards for building advanced clinical systems. The aim is to support healthcare professionals, institutions, and authorities by providing a set of generic methods and tools for the capture, standardization, integration, description, and dissemination of health-related information. ResearchEHR combines several tools to manage EHRs at two levels: an internal level, which deals with the normalization and semantic upgrading of existing EHRs using archetypes, and an external level, which uses Semantic Web technologies to specify clinical archetypes for advanced EHR architectures and systems.
Subjects
Biomedical Research/methods , Electronic Health Records/organization & administration , Medical Record Linkage/methods , Semantics , Biomedical Research/standards , Electronic Health Records/standards , Humans , Systems Integration
ABSTRACT
The lifelong clinical information of a person, held by electronic means, constitutes their Electronic Health Record (EHR). This information is usually distributed among several independent and heterogeneous systems that may be syntactically or semantically incompatible. Different standards currently exist for representing and exchanging EHR information among systems. In advanced EHR approaches, clinical information is represented by means of archetypes. Most of these approaches use the Archetype Definition Language (ADL) to specify archetypes. However, ADL has some drawbacks when it comes to performing semantic activities in Semantic Web environments. In this work, Semantic Web technologies are used to specify clinical archetypes for advanced EHR architectures. The advantages of using the Web Ontology Language (OWL) instead of ADL are described and discussed. Moreover, a solution combining Semantic Web and Model-driven Engineering technologies is proposed to transform ADL into OWL for the CEN EN13606 EHR architecture.
Subjects
Computational Biology/methods , Medical Informatics/methods , Medical Records Systems, Computerized , Database Management Systems , Humans , Programming Languages , Semantics , Systems Integration , Vocabulary, Controlled
ABSTRACT
Semantic interoperability of clinical standards is a major challenge in eHealth across Europe. It would allow healthcare professionals to manage the patient's complete electronic healthcare record regardless of which institution generated each clinical session. Clinical archetypes are fundamental to achieving semantic interoperability, but they are built for particular electronic healthcare record standards. Therefore, methods for transforming archetypes between standards are needed. In this work, we present a method for transforming archetypes between ISO 13606 and openEHR based on Model-Driven Engineering and Semantic Web technologies.
Subjects
Information Storage and Retrieval/standards , Medical Records Systems, Computerized/standards , Semantics , Medical Records Systems, Computerized/organization & administration , Terminology as Topic
ABSTRACT
Semantic standards and human language technologies are key enablers of semantic interoperability across heterogeneous document and data collections in clinical information systems. Data provenance is receiving increasing attention, and it is especially critical where clinical data are automatically extracted from original documents, e.g., by text mining. This paper demonstrates how the output of a commercial clinical text-mining tool can be harmonised with FHIR, the leading clinical information model standard. Character ranges that indicate the origin of an annotation, together with machine-generated confidence values, were identified as crucial elements of data provenance for enriching text-mining results. We specified and requested the necessary extensions to the FHIR standard and demonstrated how, as a result, important metadata describing the processes that generate FHIR instances from clinical narratives can be embedded.
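To make the two provenance elements concrete, the sketch below attaches a character range and a confidence value to a FHIR-like Condition built from a single text-mining hit. The extension URLs are hypothetical placeholders (the abstract does not name the extensions that were requested), and the result is a plain Python dictionary, not a validated FHIR instance. Only the general FHIR extension layout is assumed: each extension has a `url` and either nested extensions or a typed `value[x]`, here `valueInteger` and `valueDecimal`.

```python
import json

# Hypothetical extension URLs; not part of the published FHIR specification.
RANGE_URL = "http://example.org/fhir/StructureDefinition/text-range"
CONFIDENCE_URL = "http://example.org/fhir/StructureDefinition/confidence"

def annotate(text, mention, code, display, confidence):
    """Build a FHIR-like Condition recording where `mention` occurs in `text`."""
    start = text.find(mention)
    if start < 0:
        raise ValueError(f"{mention!r} not found in text")
    end = start + len(mention)  # exclusive end offset
    return {
        "resourceType": "Condition",
        "code": {"coding": [{"system": "http://snomed.info/sct",
                             "code": code, "display": display}]},
        "extension": [
            {"url": RANGE_URL, "extension": [
                {"url": "start", "valueInteger": start},
                {"url": "end", "valueInteger": end},
            ]},
            {"url": CONFIDENCE_URL, "valueDecimal": confidence},
        ],
    }

note = "Patient reports chest pain since yesterday."
condition = annotate(note, "chest pain", "29857009", "Chest pain", 0.93)
print(json.dumps(condition["extension"], indent=2))
```

With these offsets, `note[16:26]` recovers the exact source span, which is what allows a coded statement to be traced back to the narrative it came from.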
Subjects
Data Mining , Electronic Health Records , Delivery of Health Care , Humans , Metadata , Semantics
ABSTRACT
SNOMED CT provides about 300,000 codes with fine-grained concept definitions to support the interoperability of health data. Coding clinical texts with medical terminologies is not a trivial task and is prone to disagreement between coders. We conducted a qualitative analysis to identify sources of disagreement in an annotation experiment that used a subset of SNOMED CT with some restrictions. A corpus of 20 English clinical text fragments from diverse origins and languages was annotated independently by two medically trained annotators following a specific annotation guideline. Following this guideline, the annotators had to assign sets of SNOMED CT codes to noun phrases, together with concept and term coverage ratings. The annotations were then manually examined against a reference standard to determine sources of disagreement. Five categories of disagreement were identified. In our results, the most frequent cause of inter-annotator disagreement was related to human issues. In several cases, disagreements revealed gaps in the annotation guideline and insufficient training of the annotators. The remaining issues can be influenced by certain SNOMED CT features.
Subjects
Data Curation , Systematized Nomenclature of Medicine , Evaluation Studies as Topic , Guidelines as Topic , Humans
ABSTRACT
BioTop is a domain upper-level ontology for the life sciences, based on OWL DL, that was introduced ten years ago. This paper provides an update on the current state of this resource, with a special focus on BioTop's top level, BioTopLite, which currently contains 55 classes, 37 object properties, and 247 description logic axioms. A bridging file allows BioTopLite to be harmonised with the classes of Basic Formal Ontology (BFO2). The updated OWL resources are available at http://purl.org/biotop. They form the core of several upper-level ontological artefacts, including bridging ontologies to other upper-level resources.
Subjects
Biological Ontologies , Information Storage and Retrieval , Software , Databases, Factual , Humans
ABSTRACT
"A solid ontology-based analysis with a rigorous formal mapping for correctness" is one of the ten reasons why the HL7 standard Fast Healthcare Interoperability Resources (FHIR) is advertised as better than other standards for EHR interoperability. In this paper, we aim to contribute to this formal analysis by proposing an RDF representation of a subset of FHIR resources, based on a highly constrained top-level ontology and guided by a set of Content Ontology Design Patterns (Content ODPs) for representing clinical information. We exemplify this by reinterpreting FHIR medication resources. Although this is currently a manual task, we foresee a possible automatic translation using RDF shapes.
Subjects
Biological Ontologies , Electronic Health Records/standards , Information Storage and Retrieval , Prescription Drugs , Prescriptions , Semantics
ABSTRACT
Routine patient data in electronic patient records are only partly structured, and an even smaller segment is coded, mainly for administrative purposes. Large parts are only available as free text. Transforming this content into a structured and semantically explicit form is a prerequisite for querying and information extraction. The core of the system architecture presented in this paper is based on SAP HANA in-memory database technology using the SAP Connected Health platform for data integration as well as for clinical data warehousing. A natural language processing pipeline analyses unstructured content and maps it to a standardized vocabulary within a well-defined information model. The resulting semantically standardized patient profiles are used for a broad range of clinical and research application scenarios.
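The step in which the pipeline maps unstructured content to a standardized vocabulary can be illustrated with a deliberately naive dictionary lookup. This is an assumption for illustration only: the abstract does not describe the pipeline's internals, and production NLP pipelines add steps such as negation detection and disambiguation. The lexicon entries and concept identifiers below are hypothetical. The lookup prefers the longest matching phrase, so "type 2 diabetes" wins over "diabetes".

```python
import re

# Hypothetical lexicon mapping lower-cased surface forms to concept IDs.
LEXICON = {
    "type 2 diabetes": "C-DM2",
    "diabetes": "C-DM",
    "metformin": "C-MET",
    "hypertension": "C-HTN",
}
MAX_TERM_TOKENS = max(len(term.split()) for term in LEXICON)

def map_to_concepts(text):
    """Greedy longest-match lookup of lexicon terms in free text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    concepts, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase first, then shorter ones.
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                concepts.append(LEXICON[phrase])
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return concepts

print(map_to_concepts("Known type 2 diabetes, started on Metformin."))  # prints ['C-DM2', 'C-MET']
```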
Subjects
Electronic Health Records , Information Storage and Retrieval , Natural Language Processing , Humans , Semantics , Vocabulary, Controlled
ABSTRACT
SNOMED CT supports post-coordination, a technique for combining clinical concepts to ontologically define more complex concepts. This technique follows the validity restrictions defined in the SNOMED CT Concept Model. Pre-coordinated expressions are compositional expressions already in SNOMED CT, whereas post-coordinated expressions extend its content. In this project, we aim to evaluate the suitability of existing pre-coordinated expressions as patterns for composing typical clinical information, based on a defined list of sets of interrelated SNOMED CT concepts. The method achieves 9.3% precision and 95.9% recall. Further investigation is therefore needed to develop heuristics for selecting the most meaningful matched patterns in order to improve precision.
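The two reported figures follow the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), where a true positive is a matched pattern judged correct. The counts in the usage line are invented for illustration and are not the study's data; they only show how a high recall can coexist with a low precision when many candidate patterns are returned.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts; 0.0 when undefined."""
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall

# Hypothetical counts: 500 candidate patterns returned, 47 of the 50 correct ones found.
p, r = precision_recall(true_positives=47, false_positives=453, false_negatives=3)
print(f"precision={p:.1%} recall={r:.1%}")  # prints precision=9.4% recall=94.0%
```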
Subjects
Information Storage and Retrieval , Systematized Nomenclature of Medicine , Vocabulary, Controlled
ABSTRACT
The integration of heterogeneous ontologies is often hampered by different upper-level categories and relations. We report on an ongoing effort to align the clinical terminology/ontology SNOMED CT with the formal upper-level ontology BioTopLite. This alignment introduces several constraints at the OWL DL level. The mapping was done manually by analysing formal and textual definitions. Description logic classifiers interactively checked the mapping steps, using small modules to improve performance. We present an effective workflow that uses modules of several scales. However, only some of the classes and relations could easily be mapped. The implications for the future evolution of SNOMED CT are discussed. It seems generally feasible to use a highly constrained upper-level ontology as the upper level of future SNOMED CT versions, making them more interoperable with other biomedical ontologies.
Subjects
Biological Ontologies , Information Dissemination/methods , Systematized Nomenclature of Medicine , Biological Ontologies/organization & administration , Humans
ABSTRACT
OBJECTIVE: To improve the semantic interoperability of electronic health records (EHRs) by ontology-based mediation across syntactically heterogeneous representations of the same or similar clinical information. MATERIALS AND METHODS: Our approach is based on a semantic layer that consists of (1) a set of ontologies supported by (2) a set of semantic patterns. The first component of the semantic layer helps standardize the clinical information modeling task, and the second shields modelers from the complexity of ontology modeling. We applied this approach to heterogeneous representations of an excerpt of a heart failure summary. RESULTS: Using a finite set of top-level patterns to derive semantic patterns, we demonstrate that those patterns, or compositions thereof, can be used to represent information from clinical models. Homogeneous querying of the same or similar information, when represented according to heterogeneous clinical models, is feasible. DISCUSSION: Our approach focuses on the meaning embedded in EHRs, regardless of their structure. This complex task requires a clear ontological commitment (i.e., agreement to consistently use the shared vocabulary within some context), together with formalization rules. These requirements are supported by semantic patterns. Other potential uses of this approach, such as the validation of clinical models, require further investigation. CONCLUSION: We show how an ontology-based representation of a clinical summary, guided by semantic patterns, allows homogeneous querying of heterogeneous information structures. Whether there is a finite number of top-level patterns is an open question.
Subjects
Electronic Health Records , Heart Failure , Medical Record Linkage , Vocabulary, Controlled , Artificial Intelligence , Heart Failure/classification , Humans , Programming Languages , Semantics , Systematized Nomenclature of Medicine , Systems Integration , Terminology as Topic
ABSTRACT
The massive accumulation of biomedical knowledge is reflected in the growth of the literature database MEDLINE, with over 23 million bibliographic records. All records are manually indexed with MeSH descriptors, many of them refined by MeSH subheadings. We use subheading information to cluster types of MeSH descriptor co-occurrences in MEDLINE by processing co-occurrence information provided by the UMLS. The goal is to infer plausible predicates for each resulting cluster. In an initial experiment, this was done by grouping disease-pharmacologic substance co-occurrences into six clusters. A domain expert then manually assigned meaningful predicates to the clusters. The mean accuracy of the ten best generated biomedical facts for each cluster was 85%. This result supports the potential of MeSH subheadings for extracting plausible medical predications from MEDLINE.
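The abstract does not specify the clustering algorithm. As an assumed illustration, each disease-substance co-occurrence can be represented by the relative frequencies of the MeSH subheadings attached to it, and those vectors can be grouped with a small k-means. The subheading dimensions are real MeSH qualifiers, but the pairs and counts are invented; in the study, predicates were then assigned to the clusters manually by a domain expert.

```python
import math

# Hypothetical subheading counts for disease-substance co-occurrences, in the
# order [therapeutic use, drug therapy, adverse effects, chemically induced].
PAIRS = {
    ("Hypertension", "Lisinopril"): [40, 35, 2, 1],
    ("Asthma", "Salbutamol"): [30, 28, 1, 0],
    ("Hepatitis", "Paracetamol"): [1, 0, 25, 30],
    ("Myopathy", "Simvastatin"): [2, 1, 20, 22],
}

def normalize(counts):
    """Turn raw counts into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)

def kmeans(vectors, k, iterations=20):
    """k-means with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

pairs = list(PAIRS)
labels = kmeans([normalize(PAIRS[p]) for p in pairs], k=2)
for pair, label in zip(pairs, labels):
    print(label, pair)
```

With these toy counts, the two treatment-like pairs and the two adverse-effect-like pairs fall into separate clusters, to which an expert might assign predicates such as "treats" and "may cause".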
Subjects
Knowledge Bases , MEDLINE/statistics & numerical data , Medical Subject Headings , Natural Language Processing , Periodicals as Topic/statistics & numerical data , Cluster Analysis , Data Mining/methods , Machine Learning , Terminology as Topic
ABSTRACT
We propose a semantic-driven architecture to improve EHR semantic interoperability. This architecture comprises five layers: (i) structured heterogeneous data, as found in (un)standardised clinical information models; (ii) a semantic mapping layer, which consumes these data and links the data items to clinical ontologies via user-friendly content patterns; (iii) a semantic mediator, which translates these content patterns into ontology-based annotations; (iv) a virtual homogeneous data store populated by these annotations; and (v) an application layer served by that store.
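The data flow through the five layers can be sketched as a chain of functions. Everything here is hypothetical: two differently structured source records, a mapping from local field names to a simple "quality with value" content pattern, invented identifiers such as `ex:SystolicBP` and `ex:hasValue`, and an in-memory list of triples standing in for the virtual data store.

```python
# (i) Structured heterogeneous data: two hypothetical local models, same fact.
SOURCE_A = [{"patient": "p1", "sbp": 120}]
SOURCE_B = [{"pid": "p2", "systolic_bp_mmHg": 135}]

# (ii) Semantic mapping: link local items to a content pattern (hypothetical).
MAPPINGS = {
    "A": {"subject": "patient", "value": "sbp", "concept": "ex:SystolicBP"},
    "B": {"subject": "pid", "value": "systolic_bp_mmHg", "concept": "ex:SystolicBP"},
}

def map_record(source, record):
    """Fill a 'quality with value' content pattern from a local record."""
    m = MAPPINGS[source]
    return {"subject": record[m["subject"]], "concept": m["concept"],
            "value": record[m["value"]]}

# (iii) Semantic mediator: translate the pattern into ontology-based triples.
def mediate(pattern):
    quality = f"{pattern['subject']}/quality"  # one measurement per patient here
    return [(pattern["subject"], "ex:hasQuality", quality),
            (quality, "rdf:type", pattern["concept"]),
            (quality, "ex:hasValue", pattern["value"])]

# (iv) Virtual homogeneous data store populated from all sources.
STORE = [triple
         for source, records in (("A", SOURCE_A), ("B", SOURCE_B))
         for record in records
         for triple in mediate(map_record(source, record))]

# (v) Application layer: one query, independent of the source models.
def values_of(store, concept):
    individuals = {s for s, p, o in store if p == "rdf:type" and o == concept}
    return sorted(o for s, p, o in store if s in individuals and p == "ex:hasValue")

print(values_of(STORE, "ex:SystolicBP"))  # prints [120, 135]
```

The point of the design is that the application-layer query never mentions `sbp` or `systolic_bp_mmHg`; only the mapping layer knows the local structures.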