ABSTRACT
BACKGROUND: Many disease-causing genes have been identified through different methods, but the disease phenotypes of these genes still lack uniform biomedical named entity (bio-NE) annotations. Furthermore, semantic similarity comparison between two bio-NE annotations has become important for data integration and systems genetics analysis. RESULTS: The package pyMeSHSim recognizes bio-NEs by using MetaMap, which produces Unified Medical Language System (UMLS) concepts through natural language processing. To map the UMLS concepts to Medical Subject Headings (MeSH), pyMeSHSim embeds an in-house dataset containing the main headings (MHs), supplementary concept records (SCRs), and their relations in MeSH. Based on this dataset, pyMeSHSim implements four information content (IC)-based algorithms and one graph-based algorithm to measure the semantic similarity between two MeSH terms. To evaluate its performance, we used pyMeSHSim to parse OMIM and GWAS phenotypes. pyMeSHSim introduces SCRs and a curation strategy for non-MeSH-synonymous UMLS concepts, which improved its performance in recognizing OMIM phenotypes. In the curation of 461 GWAS phenotypes, pyMeSHSim showed recall > 0.94, precision > 0.56, and F1 > 0.70, outperforming the state-of-the-art tools DNorm and TaggerOne in recognizing MeSH terms from short biomedical phrases. The semantic similarity between the MeSH terms recognized by pyMeSHSim and those from previous manual work was calculated with both pyMeSHSim and another semantic analysis tool, meshes. The correlation between the semantic similarities computed by the two tools reached 0.89-0.99. CONCLUSIONS: The integrative MeSH tool pyMeSHSim, embedded with the MeSH MHs and SCRs, performs bio-NE recognition, normalization, and comparison for biomedical text mining.
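As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall. The sketch below is illustrative only and is not part of pyMeSHSim; the function name `f1_score` is an assumption, and the inputs are the lower bounds quoted in the abstract rather than the exact measured values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Lower bounds quoted in the abstract for the 461 GWAS phenotypes.
f1_at_bounds = f1_score(precision=0.56, recall=0.94)
print(round(f1_at_bounds, 3))  # prints 0.702
```

The value at the quoted bounds is consistent with the reported F1 > 0.70.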
Subject(s)
Medical Subject Headings , Semantics , Unified Medical Language System/standards , Humans
ABSTRACT
OBJECTIVE: Misspellings in clinical free text present challenges to natural language processing. To identify misspellings and their corrections, we developed a prototype spelling analysis method that combines Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype method to process two different corpora, surgical pathology reports and emergency department progress and visit notes, extracted from Veterans Health Administration resources. We evaluated performance by measuring positive predictive value and performing an error analysis of false positive output, using four classifications. We also performed an analysis of spelling errors in each corpus, using common error classifications. RESULTS: In this small-scale study utilizing a total of 76,786 clinical notes, the prototype method achieved positive predictive values of 0.9057 and 0.8979 for the surgical pathology reports and the emergency department progress and visit notes, respectively, in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar between the two corpora; however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results of this study suggest that this method could also perform adequately in identifying misspellings in other clinical document types.
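A minimal sketch of how an edit-distance constraint, a lexical resource, and corpus term frequencies could be combined to propose a correction. This is not the authors' method: the Word2Vec step is omitted, and the names `suggest_correction`, `lexicon`, `term_counts`, and the ranking rule (most frequent candidate within the distance limit) are assumptions made for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest_correction(token, lexicon, term_counts, max_distance=2):
    """Return a candidate correction for token, or None if none is proposed.

    Tokens already in the lexicon are treated as correctly spelled.
    Candidates are lexicon words within max_distance edits, ranked by
    corpus frequency, then by smaller distance, then alphabetically.
    """
    if token in lexicon:
        return None
    distances = {word: levenshtein(token, word) for word in lexicon}
    candidates = [w for w, d in distances.items() if d <= max_distance]
    if not candidates:
        return None
    return min(candidates, key=lambda w: (-term_counts[w], distances[w], w))


# Hypothetical toy data, not taken from the study corpora.
lexicon = {"carcinoma", "biopsy", "margin"}
term_counts = Counter({"carcinoma": 120, "biopsy": 300, "margin": 80})
print(suggest_correction("carcinma", lexicon, term_counts))  # prints carcinoma
```

Ranking by corpus frequency favors terms that are common in the target document type; a stricter variant could rank by distance first.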
Subject(s)
Dictionaries as Topic , Medical Informatics/methods , Natural Language Processing , Vocabulary, Controlled , Algorithms , Humans , Language , Medical Informatics/standards , Medical Informatics/statistics & numerical data , Medical Records Systems, Computerized/standards , Medical Records Systems, Computerized/statistics & numerical data , Pathology, Surgical/methods , Reproducibility of Results , Research Report/standards , Unified Medical Language System/standards , Unified Medical Language System/statistics & numerical data
ABSTRACT
BACKGROUND: Semantic interoperability of eHealth services within and across countries has been the main topic in several research projects. It is a key consideration for the European Commission to overcome the complexity of making different health information systems work together. This paper describes a study within the EU-funded project ASSESS CT, which focuses on assessing the potential of SNOMED CT as core reference terminology for semantic interoperability at European level. OBJECTIVE: This paper presents a quantitative analysis of the results obtained in ASSESS CT to determine the fitness of SNOMED CT for semantic interoperability. METHODS: The quantitative analysis consists of concept coverage, term coverage and inter-annotator agreement analysis of the annotation experiments related to six European languages (English, Swedish, French, Dutch, German and Finnish) and three scenarios: (i) ADOPT, where only SNOMED CT was used by the annotators; (ii) ALTERNATIVE, where a fixed set of terminologies from UMLS, excluding SNOMED CT, was used; and (iii) ABSTAIN, where any terminologies available in the current national infrastructure of the annotators' country were used. For each language and each scenario, we configured the different terminology settings of the annotation experiments. RESULTS: There was a positive correlation between the number of concepts in each terminology setting and their concept and term coverage values. Inter-annotator agreement is low, irrespective of the terminology setting. CONCLUSIONS: No significant differences were found between the analyses for the three scenarios, but availability of SNOMED CT for the assessed language is associated with increased concept coverage. Terminology setting size and concept and term coverage correlate positively up to a limit where more concepts do not significantly impact the coverage values. 
The results did not confirm the hypothesis of an inverse correlation between concept coverage and inter-annotator agreement (IAA) due to a smaller number of available choices. The overall low IAA results pose a challenge for interoperability and indicate the need for further research to assess whether consistent terminology implementation is possible across Europe, e.g., improving term coverage by adding localized versions of the selected terminologies, analysing causes of low inter-annotator agreement, and improving tooling and guidance for annotators. The much lower term coverage for the Swedish version of SNOMED CT compared to English, together with the similarly high concept coverage obtained with English and Swedish SNOMED CT, reflects its relevance as a hub to connect user interface terminologies and serve a variety of user needs.
Subject(s)
Medical Informatics/methods , Natural Language Processing , Semantics , Systematized Nomenclature of Medicine , Unified Medical Language System/standards , Europe , Humans
ABSTRACT
BACKGROUND: Medical coding is essential for standardized communication and integration of clinical data. The Unified Medical Language System by the National Library of Medicine is the largest clinical terminology system for medical coders and Natural Language Processing tools. However, the abundance of ambiguous codes leads to low rates of uniform coding among different coders. OBJECTIVE: The objective of our study was to measure uniform coding among different medical experts in terms of interrater reliability and analyze the effect on interrater reliability using an expert- and Web-based code suggestion system. METHODS: We conducted a quasi-experimental study in which 6 medical experts coded 602 medical items from structured quality assurance forms or free-text eligibility criteria of 20 different clinical trials. The medical item content was selected on the basis of mortality-leading diseases according to World Health Organization data. The intervention comprised using a semiautomatic code suggestion tool that is linked to a European information infrastructure providing a large medical text corpus of >300,000 medical form items with expert-assigned semantic codes. Krippendorff alpha (Kalpha) with bootstrap analysis was used for the interrater reliability analysis, and coding times were measured before and after the intervention. RESULTS: The intervention improved interrater reliability in structured quality assurance form items (from Kalpha=0.50, 95% CI 0.43-0.57 to Kalpha=0.62 95% CI 0.55-0.69) and free-text eligibility criteria (from Kalpha=0.19, 95% CI 0.14-0.24 to Kalpha=0.43, 95% CI 0.37-0.50) while preserving or slightly reducing the mean coding time per item for all 6 coders. 
Regardless of the intervention, precoordination and structured items were associated with significantly higher interrater reliability, and the proportion of items that were precoordinated significantly increased after the intervention (eligibility criteria: OR 4.92, 95% CI 2.78-8.72; quality assurance: OR 1.96, 95% CI 1.19-3.25). CONCLUSIONS: The Web-based code suggestion mechanism improved interrater reliability toward moderate or even substantial intercoder agreement. Precoordination and the use of structured versus free-text data elements are key drivers of higher interrater reliability.
Subject(s)
Clinical Coding/methods , Non-Randomized Controlled Trials as Topic/methods , Unified Medical Language System/standards , Humans , Internet , Natural Language Processing , Reproducibility of Results
ABSTRACT
BACKGROUND: Many health care systems now allow patients to access their electronic health record (EHR) notes online through patient portals. Medical jargon in EHR notes can confuse patients, which may interfere with potential benefits of patient access to EHR notes. OBJECTIVE: The aim of this study was to develop and evaluate the usability and content quality of NoteAid, a Web-based natural language processing system that links medical terms in EHR notes to lay definitions, that is, definitions easily understood by lay people. METHODS: NoteAid incorporates two core components: CoDeMed, a lexical resource of lay definitions for medical terms, and MedLink, a computational unit that links medical terms to lay definitions. To facilitate the effort of building CoDeMed, we developed innovative computational methods, including an adapted distant supervision algorithm that prioritizes medical terms important for EHR comprehension. Ten physician domain experts evaluated the user interface and content quality of NoteAid. The evaluation protocol included a cognitive walkthrough session and a postsession questionnaire. Physician feedback sessions were audio-recorded. We used standard content analysis methods to analyze qualitative data from these sessions. RESULTS: Physician feedback was mixed. Positive feedback on NoteAid included (1) ease of use, (2) good visual display, (3) satisfactory system speed, and (4) adequate lay definitions. Opportunities for improvement arising from evaluation sessions and feedback included (1) improving the display of definitions for partially matched terms, (2) including more medical terms in CoDeMed, (3) improving the handling of terms whose definitions vary depending on context, and (4) standardizing the scope of definitions for medicines. On the basis of these results, we have improved NoteAid's user interface and a number of definitions, and added 4502 more definitions to CoDeMed. 
CONCLUSIONS: Physician evaluation yielded useful feedback for content validation and refinement of this innovative tool, which has the potential to improve patient EHR comprehension and the experience of using patient portals. Future work will develop algorithms to handle ambiguous medical terms and will test and evaluate NoteAid with patients.
Subject(s)
Electronic Health Records/standards , PubMed/standards , Unified Medical Language System/standards , Humans , Natural Language Processing , Physicians
ABSTRACT
BACKGROUND: Radiology reporting is a clinically oriented form of documentation that reflects critical information for patients about their health care processes. Realizing its importance, many medical institutions have started providing radiology reports in patient portals. The gain, however, can be limited because of medical language barriers, which require a way for customizing these reports for patients. The open-access, collaborative consumer health vocabulary (CHV) is a terminology system created for such purposes and can be the basis of lexical simplification processes for clinical notes. OBJECTIVE: The aim of this study was to examine the comprehensibility and suitability of CHV in simplifying radiology reports for consumers. This was done by characterizing the content coverage and the lexical similarity between the terms in the reports and the CHV-preferred terms. METHODS: The overall procedure was divided into the following two main stages: (1) translation and (2) evaluation. The translation process involved using MetaMap to link terms in the reports to CHV concepts. This is followed by replacing the terms with CHV-preferred terms using the concept names and sources table (MRCONSO) in the Unified Medical Language System (UMLS) Metathesaurus. In the second stage, medical terms in the reports and general terms that are used to describe medical phenomena were selected and evaluated by comparing the words in the original reports with the translated ones. The evaluation includes measuring the content coverage, investigating lexical similarity, and finding trends in missing concepts. RESULTS: Of the 792 terms selected from the radiology reports, 695 of them could be mapped directly to CHV concepts, indicating a content coverage of 88.5%. A total of 51 of the concepts (53%, 51/97) that could not be mapped are names of human anatomical structures and regions, followed by 28 anatomical descriptions and pathological variations (29%, 28/97). 
In addition, 12 radiology techniques and projections represented 12% (12/97) of the unmapped concepts, whereas the remaining six concepts (6%, 6/97) were physiological descriptions. The rate of lexical similarity between the CHV-preferred terms and the terms in the radiology reports was approximately 72.6%. CONCLUSIONS: The CHV covered a high percentage of concepts found in the radiology reports, but unmapped concepts are associated with areas that are commonly found in radiology reporting. CHV terms also showed a high percentage of lexical similarity with terms in the reports, which contain a great deal of medical jargon. This suggests that many CHV terms might not be suitable for lay consumers who are not familiar with radiology-specific vocabulary. Therefore, further patient-centered content changes to the CHV are needed to increase its usefulness and facilitate its integration into consumer-oriented applications.
Subject(s)
Electronic Health Records/standards , Radiology/standards , Unified Medical Language System/standards , Humans
ABSTRACT
Eponyms have been in use for centuries. Their habitual use is one of the characteristics of the language of the medical sciences; it extends to all specialties and is part of their culture and of the history of medicine. This article addresses eponyms in several medical specialties, as well as the scientific debate for and against their use, considering that more than a few voices support their total eradication; this is still difficult to imagine, since eponyms are believed to contribute more than other linguistic resources could offer. The existence of Cuban eponyms, which have not been sufficiently studied, is acknowledged (AU).
Subject(s)
Humans , Eponyms , Medicine/trends , Review Literature as Topic , Unified Medical Language System/standards , Unified Medical Language System/trends , Terminology , History of Medicine , Medicine/methods , Medicine/standards
ABSTRACT
BACKGROUND: During clinical case diagnoses, especially in low-resourced areas, the use of vocabularies within the Unified Medical Language System (UMLS) can strengthen discussions between health professionals and, in certain cases, eliminate the need for them, enabling faster treatment. INTRODUCTION: This article presents the benefits of using UMLS as a collaborative discussion tool and verifies its impact. MATERIALS AND METHODS: The Sanar system was enhanced with UMLS, using text retrieval to extract relevant medical concepts from cases investigated by the user and to provide contextualized searches of related articles. An experiment was conducted, focused on team engagement and discussion of a Zika virus case using Sanar, both with and without UMLS contextualization. RESULTS: The use of the tool was measured, and it was determined that the discussion in the group with UMLS support was more complete, based on better information and the inclusion of more variables. The clinicians involved responded to a questionnaire evaluating the relevance of functions. DISCUSSION: The questionnaire showed that most of the group regarded UMLS as important in complex diagnostics; the use of knowledge extraction before discussion is relevant to align the knowledge of participants in cases with more variables, such as the Zika virus, and to minimize the need for interaction in widely discussed cases. CONCLUSIONS: Based on the results obtained with the questionnaire, the use of UMLS accelerates the diagnostic process that precedes interaction with other health professionals through clinical discussion tools. For future work, a mobile version will support offline navigation for locations with limited Internet access.
Subject(s)
Internet , Intersectoral Collaboration , Unified Medical Language System/standards , Vocabulary, Controlled , Zika Virus Infection/classification , Zika Virus Infection/diagnosis , Zika Virus/classification , Humans
ABSTRACT
Quality assurance (QA) is a key factor to evaluate success of organ transplantations. In Germany QA documentation is progressively developed and enforced by law. Our objective is to share QA models from Germany in a standardized format within a form repository for world-wide reuse and exchange. Original QA forms were converted into standardized study forms according to the Operational Data Model (ODM) and shared for open access in an international forms repository. Form elements were translated into English and semantically enriched with Concept Unique Identifiers from the Unified Medical Language System (UMLS) based on medical expert decision. All forms are available on the web as multilingual ODM documents. UMLS concept coverage analysis indicates 92% coverage with few but critically important definition gaps. New content and infrastructure for harmonized documentation forms is provided in the domain of organ transplantations enabling world-wide reuse and exchange.
Subject(s)
Forms and Records Control/standards , Multilingualism , Organ Transplantation/classification , Organ Transplantation/standards , Quality Assurance, Health Care/standards , Unified Medical Language System/standards , Germany , Internet , Natural Language Processing , Practice Guidelines as Topic
ABSTRACT
BACKGROUND: Automatic coding of medical terms is an important, but highly complicated and laborious task. OBJECTIVES: To compare and evaluate different strategies, a framework with a standardized web interface was created. Two UMLS mapping strategies are compared to demonstrate the interface. METHODS: The framework is a Java Spring application running on a Tomcat application server. It accepts different parameters and returns results in JSON format. To demonstrate the framework, a list of medical data items was mapped by two different methods: similarity search in a large table of terminology codes versus search in a manually curated repository. These mappings were reviewed by a specialist. RESULTS: The evaluation shows that the framework is flexible (due to standardized interfaces like HTTP and JSON), performant and reliable. Accuracy of automatically assigned codes is limited (up to 40%). CONCLUSION: Combining different semantic mappers into a standardized Web API is feasible. This framework can be easily enhanced due to its modular design.
Subject(s)
Electronic Health Records/standards , Natural Language Processing , Semantics , Software/standards , Terminology as Topic , Unified Medical Language System/standards , Germany , Internet/standards , Medical Record Linkage/standards , Pattern Recognition, Automated/standards
ABSTRACT
Smartphones are growing in number and mobile health applications (apps) are becoming a commonly used way of improving the quality of health and healthcare delivery. Health-related apps are mainly concentrated in the Medical and Health & Fitness categories of the Google and Apple app stores. However, these apps are not easily accessible to users. We decided to develop a system that facilitates access to these apps, to increase their visibility and usability. Various use cases for 567 health-related apps in French were identified and listed incrementally. UML modeling was then used to represent these use cases and their relationships with each other and with the potential users of these apps. Thirty-one different use cases were found and then regrouped into six major categories: consulting medical information references, communicating and/or sharing information, fulfilling a contextual need, educational tools, managing professional activities, and health-related management. A classification of this type would highlight the real purpose and functionalities of these apps and allow users to search for the right app rapidly and find it in an unambiguous context.
Subject(s)
Mobile Applications/classification , Models, Theoretical , Programming Languages , Software , Unified Medical Language System/standards
ABSTRACT
In recent years, Decision Support Systems (DSSs) have been developed and used to achieve "meaningful use". One approach to developing DSSs is to translate clinical guidelines into a computer-interpretable format. However, there is no specific guideline modeling approach for translating nursing guidelines into computer-interpretable guidelines. This results in limited use of DSSs in nursing. Unified Modeling Language (UML) is a software modeling language known to accurately represent the end-users' perspective, due to its expressive characteristics. Furthermore, standard terminology-enabled DSSs have been shown to integrate smoothly into existing health information systems. In order to facilitate the development of nursing DSSs, the UML was used to represent a guideline for medication management for older adults encoded with the International Classification for Nursing Practice (ICNP®). The UML was found to be a useful and sufficient tool to model a nursing guideline for a DSS.
Subject(s)
Decision Support Systems, Clinical , Nursing Care/standards , Practice Guidelines as Topic , Terminology as Topic , Unified Medical Language System/standards , Computer Simulation , Expert Systems , Humans , Software Design , United States
ABSTRACT
Biomedical ontologies play a vital role in healthcare information management, data integration, and decision support. Ontology quality assurance (OQA) is an indispensable part of the ontology engineering cycle. Most existing OQA methods are based on the knowledge provided within the targeted ontology. This paper proposes a novel cross-ontology analysis method, Cross-Ontology Hierarchical Relation Examination (COHeRE), to detect inconsistencies and possible errors in hierarchical relations across multiple ontologies. COHeRE leverages the Unified Medical Language System (UMLS) knowledge source and the MapReduce cloud computing technique for systematic, large-scale ontology quality assurance work. COHeRE consists of three main steps with the UMLS concepts and relations as the input. First, the relations claimed in source vocabularies are filtered and aggregated for each pair of concepts. Second, inconsistent relations are detected if a concept pair is related by different types of relations in different source vocabularies. Finally, the uncovered inconsistent relations are voted on according to their number of occurrences across different source vocabularies. The voting result, together with the inconsistent relations, serves as the output of COHeRE for possible ontological change. The highest votes provide an initial suggestion on how such inconsistencies might be fixed. In UMLS, 138,987 concept pairs were found to have inconsistent relationships across multiple source vocabularies. A total of 40 inconsistent concept pairs involving hierarchical relationships were randomly selected and manually reviewed by a human expert. Of the inconsistent relations involved in these concept pairs, 95.8% indeed exist in their source vocabularies rather than being introduced by mistake in the UMLS integration process. For 73.7% of the concept pairs with a suggested relationship, the human expert agreed with the suggestion. 
The effectiveness of COHeRE indicates that UMLS provides a promising environment to enhance qualities of biomedical ontologies by performing cross-ontology examination.
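The three steps described in this abstract (aggregate, detect, vote) can be sketched on a single machine as follows. This is an illustration under assumptions, not the COHeRE implementation: the MapReduce layer is omitted, the function name, the concept identifiers, the relation labels (`parent_of`, `sibling_of`), and the vocabulary names are hypothetical, and concept pairs are treated as ordered without normalising inverse relations.

```python
from collections import Counter, defaultdict


def find_inconsistent_relations(relations):
    """Return {concept_pair: [(relation_type, votes), ...]} for inconsistent pairs.

    relations is an iterable of
    (concept_a, concept_b, relation_type, source_vocabulary) tuples.
    """
    # Step 1: aggregate the relations claimed by each source per concept pair.
    claims_by_pair = defaultdict(set)
    for concept_a, concept_b, relation_type, source in relations:
        claims_by_pair[(concept_a, concept_b)].add((relation_type, source))

    inconsistent = {}
    for pair, claims in claims_by_pair.items():
        # Step 3 (vote): count how many source vocabularies assert each type.
        votes = Counter(relation_type for relation_type, _ in claims)
        # Step 2 (detect): a pair is inconsistent if sources disagree on the type.
        if len(votes) > 1:
            inconsistent[pair] = votes.most_common()
    return inconsistent


# Hypothetical toy input; identifiers and vocabulary names are made up.
toy_relations = [
    ("C0000001", "C0000002", "parent_of", "VOCAB_A"),
    ("C0000001", "C0000002", "parent_of", "VOCAB_B"),
    ("C0000001", "C0000002", "sibling_of", "VOCAB_C"),
    ("C0000003", "C0000004", "parent_of", "VOCAB_A"),
    ("C0000003", "C0000004", "parent_of", "VOCAB_B"),
]
print(find_inconsistent_relations(toy_relations))
```

In this toy input only the first pair is reported, with `parent_of` receiving the most votes and therefore serving as the initial suggestion.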
Subject(s)
Biological Ontologies , Cloud Computing/standards , Health Information Management/standards , Unified Medical Language System , Biological Ontologies/organization & administration , Health Information Management/organization & administration , Information Storage and Retrieval/standards , International Classification of Diseases/standards , Semantics , Systematized Nomenclature of Medicine , Unified Medical Language System/organization & administration , Unified Medical Language System/standards
ABSTRACT
To enable the efficient reuse of standards-based medical data, we propose to develop a higher-level information model that will complement the archetype model of ISO 13606. This model will make use of the relationships that are specified in UML to connect medical archetypes into a knowledge base within a repository. UML connectors were analyzed for their ability to be applied in the implementation of a higher-level model that will establish relationships between archetypes. An information model was developed using XML Schema notation. The model allows linking different archetypes of one repository into a knowledge base. At present it supports several relationships and will be extended in the future.
Subject(s)
Databases, Factual/standards , Electronic Health Records/standards , Knowledge Bases , Medical Record Linkage/standards , Natural Language Processing , Practice Guidelines as Topic , Unified Medical Language System/standards , Internationality , Semantics
ABSTRACT
BACKGROUND: Visualization of Concepts in Medicine (VCM) is a compositional iconic language that aims to ease information retrieval in Electronic Health Records (EHR), clinical guidelines or other medical documents. Using the VCM language in medical applications requires alignment with medical reference terminologies. Alignments from the Medical Subject Headings (MeSH) thesaurus and the International Classification of Diseases - tenth revision (ICD10) to VCM are presented here. The aim of this study was to evaluate alignment quality between VCM and other terminologies using different measures of inter-alignment agreement before integration in EHR. METHODS: For medical literature retrieval purposes and EHR browsing, the MeSH thesaurus and the ICD10, both organized hierarchically, were aligned to the VCM language. Some MeSH to VCM alignments were performed automatically, but others were performed manually and validated. ICD10 to VCM alignment was performed entirely manually. Inter-alignment agreement was assessed on ICD10 codes and MeSH descriptors sharing the same Concept Unique Identifiers in the Unified Medical Language System (UMLS). Three metrics were used to compare two VCM icons: binary comparison, crude Dice Similarity Coefficient (DSCcrude), and semantic Dice Similarity Coefficient (DSCsemantic), based on Lin similarity. An analysis of discrepancies was performed. RESULTS: MeSH to VCM alignment resulted in 10,783 relations, of which 1,830 were manually performed and 8,953 were automatically inherited. ICD10 to VCM alignment led to 19,852 relations. UMLS gathered 1,887 alignments between ICD10 and MeSH. Only 1,606 of them were used for this study. Inter-alignment agreement using only validated MeSH to VCM alignment was 74.2% [70.5-78.0]CI95%, DSCcrude was 0.93 [0.91-0.94]CI95%, and DSCsemantic was 0.96 [0.95-0.96]CI95%. Discrepancy analysis revealed that even if two thirds of errors came from the reviewers, UMLS was nevertheless responsible for one third. 
CONCLUSIONS: This study has shown strong overall inter-alignment agreement between MeSH to VCM and ICD10 to VCM manual alignments. VCM icons have now been integrated into a guideline search engine (http://www.cismef.org) and a health terminologies portal (http://www.hetop.eu).
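To make the difference between the binary comparison and the crude Dice Similarity Coefficient concrete, the sketch below treats each VCM icon as a set of component labels. That set representation and the component names are assumptions made for illustration; the abstract does not specify how icons are encoded, and the semantic variant based on Lin similarity is not reproduced here.

```python
def dice_coefficient(components_a: set, components_b: set) -> float:
    """Crude Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    if not components_a and not components_b:
        return 1.0  # two empty icons are treated as identical
    shared = len(components_a & components_b)
    return 2 * shared / (len(components_a) + len(components_b))


def binary_agreement(components_a: set, components_b: set) -> bool:
    """Binary comparison: icons agree only if every component matches."""
    return components_a == components_b


# Hypothetical icons obtained for the same UMLS concept through two alignments.
icon_via_mesh = {"current_disease", "heart", "infection"}
icon_via_icd10 = {"current_disease", "heart"}

print(binary_agreement(icon_via_mesh, icon_via_icd10))  # prints False
print(dice_coefficient(icon_via_mesh, icon_via_icd10))  # prints 0.8
```

A pair of icons that fails the binary comparison can still share most of its components, which is consistent with the reported binary agreement being lower than the Dice-based scores.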
Subject(s)
Information Storage and Retrieval/standards , Terminology as Topic , Vocabulary, Controlled , Electronic Health Records/standards , Humans , International Classification of Diseases/statistics & numerical data , Medical Subject Headings/statistics & numerical data , Unified Medical Language System/standards
ABSTRACT
Medical forms are very heterogeneous: on a European scale there are thousands of data items in several hundred different systems. To enable data exchange for clinical care and research purposes, there is a need to develop interoperable documentation systems with harmonized forms for data capture. A prerequisite in this harmonization process is the comparison of forms. So far, to our knowledge, an automated method for the comparison of medical forms is not available. A form contains a list of data items with corresponding medical concepts. An automatic comparison needs data types, item names and, especially, items with unique concept codes from medical terminologies. The scope of the proposed method is a comparison of these items by comparing their concept codes (coded in UMLS). Each data item is represented by item name, concept code and value domain. Two items are called identical if item name, concept code and value domain are the same. Two items are called matching if only concept code and value domain are the same. Two items are called similar if their concept codes are the same, but the value domains are different. Based on these definitions, an open-source implementation for automated comparison of medical forms in ODM format with UMLS-based semantic annotations was developed. It is available as package compareODM from http://cran.r-project.org. To evaluate this method, it was applied to a set of 7 real medical forms with 285 data items from a large public ODM repository with forms for different medical purposes (research, quality management, routine care). Comparison results were visualized with grid images and dendrograms. Automated comparison of semantically annotated medical forms is feasible. Dendrograms allow a view of clustered similar forms. The approach is scalable to a large set of real medical forms.
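The three definitions above (identical, matching, similar) translate directly into a small classification rule. The sketch below is an illustration in Python and does not reflect the API of the compareODM package; the `DataItem` class, the `compare_items` function, the "different" label for items without a shared concept code, and the example codes are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    name: str           # item name as shown on the form
    concept_code: str   # UMLS concept code annotated on the item
    value_domain: str   # data type or permitted values


def compare_items(first: DataItem, second: DataItem) -> str:
    """Classify a pair of items using the definitions from the abstract."""
    if first.concept_code != second.concept_code:
        return "different"   # label assumed here; not defined in the abstract
    if first.value_domain != second.value_domain:
        return "similar"     # same concept code, different value domain
    if first.name != second.name:
        return "matching"    # same concept code and value domain only
    return "identical"       # name, concept code and value domain all equal


# Hypothetical items; the concept codes are placeholders, not real annotations.
weight_kg = DataItem("Body weight", "C0000010", "float")
weight_de = DataItem("Gewicht", "C0000010", "float")
weight_txt = DataItem("Body weight", "C0000010", "string")

print(compare_items(weight_kg, weight_kg))   # prints identical
print(compare_items(weight_kg, weight_de))   # prints matching
print(compare_items(weight_kg, weight_txt))  # prints similar
```

Checking the concept code first keeps the rule independent of item names, which typically vary across languages and forms.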
Subject(s)
Clinical Coding/standards , Medical Records Systems, Computerized/standards , Medical Records/standards , Unified Medical Language System/standards , Humans , Medical Records Systems, Computerized/instrumentation , Terminology as Topic , Unified Medical Language System/instrumentation
ABSTRACT
OBJECTIVE: By 2015, SNOMED CT (SCT) will become the USA's standard for encoding diagnoses and problem lists in electronic health records (EHRs). To facilitate this effort, the National Library of Medicine has published the "SCT Clinical Observations Recording and Encoding" and the "Veterans Health Administration and Kaiser Permanente" problem lists (collectively, the "PL"). The PL is studied with regard to its readiness to support meaningful use of EHRs. In particular, we wish to determine whether inconsistencies appearing in SCT in general occur as frequently in the PL, and whether further quality-assurance (QA) efforts on the PL are required. METHODS AND MATERIALS: A study is conducted in which two random samples of SCT concepts are compared. The first consists of concepts strictly from the PL and the second contains general SCT concepts distributed proportionally to the PL's in terms of their hierarchies. Each sample is analyzed for its percentage of primitive concepts and for the frequency of modeling errors of various severity levels as quality measures. A simple structural indicator, namely the number of parents, is suggested to locate high-likelihood inconsistencies in hierarchical relationships. The effectiveness of this indicator is evaluated. RESULTS: PL concepts are found to be slightly better than other concepts in the respective SCT hierarchies with regard to two quality measures: the percentage of primitive concepts and the frequency of modeling errors. There were 58% primitive concepts in the PL sample versus 62% in the control sample. The structural indicator of number of parents is shown to be statistically significant in its ability to identify concepts having a higher likelihood of inconsistencies in their hierarchical relationships. The absolute number of errors in the group of concepts having 1-3 parents was shown to be significantly lower than that for concepts with 4-6 parents and those with 7 or more parents, based on Chi-squared analyses.
CONCLUSION: PL concepts suffer from the same issues as general SCT concepts, although to a slightly lesser extent, and do require further QA efforts to promote meaningful use of EHRs. To support such efforts, a structural indicator is shown to effectively ferret out potentially problematic concepts where those QA efforts should be focused.
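The structural indicator described above can be sketched as follows, assuming each audited concept is available as a pair (number of parents, error found). The function names and the input format are hypothetical; the sketch only bins concepts into the three parent groups named in the abstract and computes a Pearson chi-squared statistic, and it does not reproduce the study's data or its exact test.

```python
GROUPS = ("1-3", "4-6", "7+")


def parent_group(n_parents: int) -> str:
    """Map a parent count to one of the three groups used in the abstract."""
    if n_parents < 1:
        raise ValueError("expected a concept with at least one parent")
    if n_parents <= 3:
        return "1-3"
    if n_parents <= 6:
        return "4-6"
    return "7+"


def contingency_table(concepts):
    """Build rows [with error, without error], one row per parent group.

    `concepts` is an iterable of (n_parents, has_error) pairs.
    """
    counts = {group: [0, 0] for group in GROUPS}
    for n_parents, has_error in concepts:
        counts[parent_group(n_parents)][0 if has_error else 1] += 1
    return [counts[group] for group in GROUPS]


def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

The statistic would then be compared against the chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom to obtain a p-value, for instance with a statistics library.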
Subject(s)
Artificial Intelligence , Data Mining/methods , Electronic Health Records , Meaningful Use , Medical Records, Problem-Oriented , Quality Assurance, Health Care , Systematized Nomenclature of Medicine , Unified Medical Language System , Artificial Intelligence/standards , Data Mining/standards , Electronic Health Records/standards , Humans , Meaningful Use/standards , Medical Records, Problem-Oriented/standards , National Library of Medicine (U.S.) , Quality Assurance, Health Care/standards , Terminology as Topic , Unified Medical Language System/standards , United States
ABSTRACT
OBJECTIVE: To assess the quality of value sets in clinical quality measures, both individually and as a population of value sets. MATERIALS AND METHODS: The concepts from a given value set are expected to be rooted in one or a few ancestor concepts, and the value set is expected to contain all the descendants of its root concepts and only these descendants. (1) We assessed the completeness and correctness of individual value sets by comparison to the extension derived from their roots. (2) We assessed the non-redundancy of value sets for the entire population of value sets (within a given code system) using the Jaccard similarity measure. RESULTS: We demonstrated the utility of our approach on some cases of inconsistent value sets and produced a list of 58 potentially duplicate value sets from the current set of clinical quality measures for the 2014 Meaningful Use criteria. CONCLUSION: These metrics are easy to compute and provide compact indicators of the completeness, correctness, and non-redundancy of value sets.
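The two assessments above can be sketched in Python under some assumptions: the hierarchy is given as a mapping from a code to its direct children, the root concepts are supplied by the caller, and the extension includes the roots themselves. All function names are hypothetical and the code is an illustration of the idea, not the authors' implementation.

```python
def extension(roots, children):
    """Return the roots together with all of their descendants.

    `children` maps a code to an iterable of its direct child codes.
    """
    seen = set()
    stack = list(roots)
    while stack:
        code = stack.pop()
        if code not in seen:
            seen.add(code)
            stack.extend(children.get(code, ()))
    return seen


def assess(value_set, roots, children):
    """Compare a value set with the extension derived from its roots."""
    expected = extension(roots, children)
    actual = set(value_set)
    return {
        "missing": expected - actual,     # completeness problems
        "extraneous": actual - expected,  # correctness problems
    }


def jaccard(a, b):
    """Jaccard similarity of two value sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)
```

A pair of value sets from the same code system with a Jaccard similarity at or near 1.0 would be a candidate duplicate; any threshold used for flagging is a choice left to the user of such a sketch.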
Subject(s)
Quality Indicators, Health Care , Vocabulary, Controlled , Electronic Health Records , International Classification of Diseases , National Library of Medicine (U.S.) , Systematized Nomenclature of Medicine , Unified Medical Language System/standards , United States
ABSTRACT
Auditing healthcare terminologies for errors requires human experts. In this paper, we present a study of the performance of auditors looking for errors in the semantic type assignments of complex UMLS concepts. In this study, concepts are considered complex whenever they are assigned combinations of semantic types. Past research has shown that complex concepts have a higher likelihood of errors. The results of this study indicate that individual auditors are not reliable when auditing such concepts and their performance is low, according to various metrics. These results confirm the outcomes of an earlier pilot study. They imply that to achieve an acceptable level of reliability and performance, when auditing such concepts of the UMLS, several auditors need to be assigned the same task. A mechanism is then needed to combine the possibly differing opinions of the different auditors into a final determination. In the current study, in contrast to our previous work, we used a majority mechanism for this purpose. For a sample of 232 complex UMLS concepts, the majority opinion was found reliable and its performance for accuracy, recall, precision and the F-measure was found statistically significantly higher than the average performance of individual auditors.
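A majority mechanism of the kind described above, and the four performance measures, can be sketched as follows. The sketch assumes each auditor gives a yes/no judgement per concept (error or no error) and that a reference judgement is available for scoring; the function names are hypothetical, and treating a tie as "no error" is an assumption made here, not something stated in the abstract.

```python
def majority(votes):
    """Return True if more than half of the auditors flagged an error.

    A tie (possible with an even number of auditors) counts as no error.
    """
    return sum(1 for vote in votes if vote) * 2 > len(votes)


def metrics(predicted, reference):
    """Accuracy, precision, recall and F-measure against reference judgements."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

Applying `majority` to the votes for each concept yields one combined judgement per concept, which can then be scored with `metrics` in the same way as the judgements of an individual auditor.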
Subject(s)
Semantics , Unified Medical Language System/standards , Humans , Reproducibility of Results , Terminology as Topic
ABSTRACT
BACKGROUND: PubMed is the main access point to medical literature on the Internet. In order to enhance the performance of its information retrieval tools, primarily for non-indexed citations, the authors propose a method: expanding users' queries using Unified Medical Language System (UMLS) synonyms, i.e. all the terms gathered under one Concept Unique Identifier. METHODS: This method was evaluated using queries constructed to emphasize the differences between this new method and the current PubMed automatic term mapping. Four experts assessed citation relevance. RESULTS: Using UMLS, we were able to retrieve new citations in 45.5% of queries, which implies a small increase in recall. The new strategy led to a heterogeneous 23.7% mean increase in non-indexed citations retrieved. Of these, 82% had been published less than 4 months earlier. The overall mean precision was 48.4% but differed according to the evaluators, ranging from 36.7% to 88.1% (inter-rater agreement was poor: kappa = 0.34). CONCLUSIONS: This study highlights the need for specific search tools for each type of user and use case. The proposed strategy may be useful for retrieving recent scientific advances.
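The query expansion idea above can be sketched in Python. The sketch assumes the synonyms for each query term have already been looked up (in practice they would come from the UMLS terms sharing one Concept Unique Identifier); the `expand_query` function, the synonym table passed to it, and the choice to join terms with AND are hypothetical and are not taken from the study.

```python
def expand_query(terms, synonyms_by_term):
    """Expand each query term with its synonyms and build a boolean query.

    Synonyms of one term are joined with OR; different terms are joined
    with AND. Each variant is quoted so that it is searched as a phrase.
    """
    groups = []
    for term in terms:
        variants = [term, *synonyms_by_term.get(term, [])]
        # Drop case-insensitive duplicates while keeping the original order.
        seen = set()
        unique = []
        for variant in variants:
            key = variant.lower()
            if key not in seen:
                seen.add(key)
                unique.append(variant)
        joined = " OR ".join(f'"{variant}"' for variant in unique)
        groups.append(f"({joined})" if len(unique) > 1 else joined)
    return " AND ".join(groups)
```

For instance, with an illustrative synonym table that maps "heart attack" to "myocardial infarction", the query terms "heart attack" and "aspirin" would expand to `("heart attack" OR "myocardial infarction") AND "aspirin"`.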