Results 1 - 20 of 46
1.
BMC Med Inform Decis Mak ; 20(Suppl 10): 305, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33319709

ABSTRACT

BACKGROUND: Ontologies house various kinds of domain knowledge in formal structures, primarily in the form of concepts and the associative relationships between them. Ontologies have become integral components of many health information processing environments. Hence, quality assurance of the conceptual content of any ontology is critical. Relationships are foundational to the definition of concepts. Missing relationship errors (i.e., unintended omissions of important definitional relationships) can have a deleterious effect on the quality of an ontology. An abstraction network is a structure that overlays an ontology and provides an alternate, summarization view of its contents. One kind of abstraction network is called an area taxonomy, and a variation of it is called a subtaxonomy. A methodology based on these taxonomies for more readily finding missing relationship errors is explored. METHODS: The area taxonomy and the subtaxonomy are deployed to help reveal concepts that have a high likelihood of exhibiting missing relationship errors. A specific top-level grouping unit found within the area taxonomy and subtaxonomy, when deemed to be anomalous, is used as an indicator that missing relationship errors are likely to be found among certain concepts. Two hypotheses pertaining to the effectiveness of our Quality Assurance approach are studied. RESULTS: Our Quality Assurance methodology was applied to the Biological Process hierarchy of the National Cancer Institute thesaurus (NCIt) and SNOMED CT's Eye/vision finding subhierarchy within its Clinical finding hierarchy. Many missing relationship errors were discovered and confirmed in our analysis. For both test-bed hierarchies, our Quality Assurance methodology yielded a statistically significantly higher number of concepts with missing relationship errors in comparison to a control sample of concepts. Two hypotheses are confirmed by these findings. 
CONCLUSIONS: Quality assurance is a critical part of an ontology's lifecycle, and automated or semi-automated tools for supporting this process are invaluable. We introduced a Quality Assurance methodology targeted at missing relationship errors. Its successful application to the NCIt's Biological Process hierarchy and SNOMED CT's Eye/vision finding subhierarchy indicates that it can be a useful addition to the arsenal of tools available to ontology maintenance personnel.


Subject(s)
Systematized Nomenclature of Medicine , Vocabulary, Controlled , Electronic Data Processing , Humans , Probability
2.
J Biomed Inform ; 57: 278-87, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26260003

ABSTRACT

The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is an extensive reference terminology with an attendant amount of complexity. It has been updated continuously and revisions have been released semi-annually to meet users' needs and to reflect the results of quality assurance (QA) activities. Two measures based on structural features are proposed to track the effects of both natural terminology growth and QA activities based on aspects of the complexity of SNOMED CT. These two measures, called the structural density measure and accumulated structural measure, are derived based on two abstraction networks, the area taxonomy and the partial-area taxonomy. The measures derive from attribute relationship distributions and various concept groupings that are associated with the abstraction networks. They are used to track the trends in the complexity of structures as SNOMED CT changes over time. The measures were calculated for consecutive releases of five SNOMED CT hierarchies, including the Specimen hierarchy. The structural density measure shows that natural growth tends to move a hierarchy's structure toward a more complex state, whereas the accumulated structural measure shows that QA processes tend to move a hierarchy's structure toward a less complex state. It is also observed that both the structural density and accumulated structural measures are useful tools to track the evolution of an entire SNOMED CT hierarchy and reveal internal concept migration within it.


Subject(s)
Data Accuracy , Systematized Nomenclature of Medicine
3.
J Biomed Inform ; 45(1): 15-29, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21878396

ABSTRACT

An algorithmically-derived abstraction network, called the partial-area taxonomy, for a SNOMED hierarchy has led to the identification of concepts considered complex. The designation "complex" is arrived at automatically on the basis of structural analyses of overlap among the constituent concept groups of the partial-area taxonomy. Such complex concepts, called overlapping concepts, constitute a tangled portion of a hierarchy and can be obstacles to users trying to gain an understanding of the hierarchy's content. A new methodology for partitioning the entire collection of overlapping concepts into singly-rooted groups, that are more manageable to work with and comprehend, is presented. Different kinds of overlapping concepts with varying degrees of complexity are identified. This leads to an abstract model of the overlapping concepts called the disjoint partial-area taxonomy, which serves as a vehicle for enhanced, high-level display. The methodology is demonstrated with an application to SNOMED's Specimen hierarchy. Overall, the resulting disjoint partial-area taxonomy offers a refined view of the hierarchy's structural organization and conceptual content that can aid users, such as maintenance personnel, working with SNOMED. The utility of the disjoint partial-area taxonomy as the basis for a SNOMED auditing regimen is presented in a companion paper.


Subject(s)
Algorithms , Systematized Nomenclature of Medicine , Humans , Models, Theoretical , Pattern Recognition, Automated/methods , Terminology as Topic
4.
J Biomed Inform ; 45(1): 1-14, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21907827

ABSTRACT

Auditors of a large terminology, such as SNOMED CT, face a daunting challenge. To aid them in their efforts, it is essential to devise techniques that can automatically identify concepts warranting special attention. "Complex" concepts, which by their very nature are more difficult to model, fall neatly into this category. A special kind of grouping, called a partial-area, is utilized in the characterization of complex concepts. In particular, the complex concepts that are the focus of this work are those appearing in intersections of multiple partial-areas and are thus referred to as overlapping concepts. In a companion paper, an automatic methodology for identifying and partitioning the entire collection of overlapping concepts into disjoint, singly-rooted groups, that are more manageable to work with and comprehend, has been presented. The partitioning methodology formed the foundation for the development of an abstraction network for the overlapping concepts called a disjoint partial-area taxonomy. This new disjoint partial-area taxonomy offers a collection of semantically uniform partial-areas and is exploited herein as the basis for a novel auditing methodology. The review of the overlapping concepts is done in a top-down order within semantically uniform groups. These groups are themselves reviewed in a top-down order, which proceeds from the less complex to the more complex overlapping concepts. The results of applying the methodology to SNOMED's Specimen hierarchy are presented. Hypotheses regarding error ratios for overlapping concepts and between different kinds of overlapping concepts are formulated. Two phases of auditing the Specimen hierarchy for two releases of SNOMED are reported on. With the use of the double bootstrap and Fisher's exact test (two-tailed), the auditing of concepts and especially roots of overlapping partial-areas is shown to yield a statistically significant higher proportion of errors.
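
As a rough illustration of the two-tailed Fisher's exact test mentioned in the abstract above, the sketch below compares error counts in two groups of audited concepts. The function name and the 2x2 counts are hypothetical and are not taken from the paper; the double bootstrap is not shown.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups of concepts; columns are
    (erroneous, not erroneous). Returns the p-value.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x erroneous concepts in group 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 3 of 4 concepts erroneous in one group,
# 1 of 4 erroneous in a control group.
p_value = fisher_exact_two_tailed(3, 1, 1, 3)
print(f"p = {p_value:.4f}")
```

With real audit data, the first row would hold the counts for the group under study (for example, overlapping concepts) and the second row those for the comparison sample.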


Subject(s)
Systematized Nomenclature of Medicine , Models, Theoretical , Terminology as Topic
5.
J Am Med Inform Assoc ; 16(1): 116-31, 2009.
Article in English | MEDLINE | ID: mdl-18952946

ABSTRACT

OBJECTIVE: Chemical concepts assigned multiple "Chemical Viewed Structurally" semantic types (STs) in the Unified Medical Language System (UMLS) are subject to ambiguous interpretation. The multiple assignments may denote the fact that a specific represented chemical (combination) is a conjugate, derived via a chemical reaction of chemicals of the different types, or a complex, composed of a mixture of such chemicals. The previously introduced Refined Semantic Network (RSN) is modified to properly model these varied multi-typed chemical combinations. DESIGN: The RSN was previously introduced as an enhanced abstraction of the UMLS's concepts. It features new types, called intersection semantic types (ISTs), each of which explicitly captures a unique combination of ST assignments in one abstract unit. The ambiguous ISTs of different "Chemical Viewed Structurally" ISTs of the RSN are replaced with two varieties of new types, called conjugate types and complex types, which explicitly denote the nature of the chemical interactions. Additional semantic relationships help further refine that new portion of the RSN rooted at the ST "Chemical Viewed Structurally." MEASUREMENTS: The number of new conjugate and complex types and the amount of changes to the type assignment of chemical concepts are presented. RESULTS: The modified RSN, consisting of 35 types and featuring 22 new conjugate and complex types, is presented. A total of 800 (about 98%) chemical concepts representing multi-typed chemical combinations from "Chemical Viewed Structurally" STs are uniquely assigned one of the new types. An additional benefit is the identification of a number of illegal ISTs and ST assignment errors, some of which are direct violations of exclusion rules defined by the UMLS Semantic Network. CONCLUSION: The modified RSN provides an enhanced abstract view of the UMLS's chemical content. 
Its array of conjugate and complex types provides a more accurate model of the variety of combinations involving chemicals viewed structurally. This framework will help streamline the process of type assignments for such chemical concepts and improve user orientation to the richness of the chemical content of the UMLS.


Subject(s)
Organic Chemicals/classification , Unified Medical Language System , Amino Acids/chemistry , Amino Acids/classification , Decision Trees , Molecular Structure , Organic Chemicals/chemistry , Peptides/chemistry , Peptides/classification , Proteins/chemistry , Proteins/classification , Semantics , Steroids/chemistry , Steroids/classification
6.
J Biomed Inform ; 42(3): 468-89, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19475725

ABSTRACT

The UMLS's integration of more than 100 source vocabularies, not necessarily consistent with one another, causes some inconsistencies. The purpose of auditing the UMLS is to detect such inconsistencies and to suggest how to resolve them while observing the requirement of fully representing the content of each source in the UMLS. A software tool, called the Neighborhood Auditing Tool (NAT), that facilitates UMLS auditing is presented. The NAT supports "neighborhood-based" auditing, where, at any given time, an auditor concentrates on a single-focus concept and one of a variety of neighborhoods of its closely related concepts. Typical diagrammatic displays of concept networks have a number of shortcomings, so the NAT utilizes a hybrid diagram/text interface that features stylized neighborhood views which retain some of the best features of both the diagrammatic layouts and text windows while avoiding the shortcomings. The NAT allows an auditor to display knowledge from both the Metathesaurus (concept) level and the Semantic Network (semantic type) level. Various additional features of the NAT that support the auditing process are described. The usefulness of the NAT is demonstrated through a group of case studies. Its impact is tested with a study involving a select group of auditors.


Subject(s)
Management Audit , Unified Medical Language System , User-Computer Interface
7.
J Biomed Inform ; 42(1): 41-52, 2009 Feb.
Article in English | MEDLINE | ID: mdl-18619563

ABSTRACT

Each UMLS concept is assigned one or more of the semantic types (STs) from the Semantic Network. Due to the size and complexity of the UMLS, errors are unavoidable. We present two auditing methodologies for groups of semantically similar concepts. The straightforward procedure starts with the extent of an ST, which is the group of all concepts assigned this ST. We divide the extent into groups of concepts that have been assigned exactly the same set of STs. An algorithm finds subgroups of suspicious concepts. The human auditor is presented with these subgroups, which purportedly exhibit the same semantics, and thus she will notice different concepts with wrong or missing ST assignments. The dynamic procedure detects concepts which become suspicious in the course of the auditing process. Both procedures are applied to two semantic types. The results are compared with a comprehensive manual audit and show a very high error recall with a much higher precision.
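
The first step of the straightforward procedure described above (dividing the extent of an ST into groups of concepts with exactly the same set of STs) can be sketched as follows. The function name, concept identifiers, and ST labels are hypothetical placeholders. The paper's algorithm for finding suspicious subgroups is not specified in the abstract and is not reproduced here.

```python
from collections import defaultdict


def split_extent_by_st_set(assignments, st):
    """Split the extent of `st` into groups with identical ST sets.

    `assignments` maps each concept to the set of semantic types
    assigned to it. The extent of `st` is every concept whose set
    contains `st`.
    """
    groups = defaultdict(set)
    for concept, sts in assignments.items():
        if st in sts:
            # The frozenset of all assigned STs is the group key.
            groups[frozenset(sts)].add(concept)
    return dict(groups)


# Hypothetical toy data, not real UMLS content.
assignments = {
    "c1": {"ST_A"},
    "c2": {"ST_A"},
    "c3": {"ST_A", "ST_B"},
    "c4": {"ST_B"},
}

for st_set, concepts in split_extent_by_st_set(assignments, "ST_A").items():
    print(sorted(st_set), sorted(concepts))
```

Each resulting group holds concepts that purportedly share the same semantics, so an auditor can review one group at a time.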


Subject(s)
Semantics , Unified Medical Language System , Abstracting and Indexing , Algorithms , Animals , Terminology as Topic
8.
J Biomed Inform ; 41(6): 904-13, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18486558

ABSTRACT

Biomedical research has identified many human genes and various knowledge about them. The National Cancer Institute Thesaurus (NCIT) represents such knowledge as concepts and roles (relationships). Due to the rapid advances in this field, it is to be expected that the NCIT's Gene hierarchy will contain role errors. A comparative methodology to audit the Gene hierarchy with the use of the National Center for Biotechnology Information's (NCBI's) Entrez Gene database is presented. The two knowledge sources are accessed via a pair of Web crawlers to ensure up-to-date data. Our algorithms then compare the knowledge gathered from each, identify discrepancies that represent probable errors, and suggest corrective actions. The primary focus is on two kinds of gene-roles: (1) the chromosomal locations of genes, and (2) the biological processes in which genes play a role. Regarding chromosomal locations, the discrepancies revealed are striking and systematic, suggesting a structurally common origin. In regard to the biological processes, difficulties arise because genes frequently play roles in multiple processes, and processes may have many designations (such as synonymous terms). Our algorithms make use of the roles defined in the NCIT Biological Process hierarchy to uncover many probable gene-role errors in the NCIT. These results show that automated comparative auditing is a promising technique that can identify a large number of probable errors and corrections for them in a terminological genomic knowledge repository, thus facilitating its overall maintenance.


Subject(s)
Automation , Databases, Genetic , Genomics , Algorithms , Chromosome Mapping , National Institutes of Health (U.S.) , United States
9.
AMIA Annu Symp Proc ; 2018: 750-759, 2018.
Article in English | MEDLINE | ID: mdl-30815117

ABSTRACT

Many major medical ontologies go through a regular (bi-annual, monthly, etc.) release cycle. A new release will contain corrections to the previous release, as well as genuinely new concepts that are the result of either user requests or new developments in the domain. New concepts need to be placed at the correct place in the ontology hierarchy. Traditionally, this is done by an expert modeling a new concept and running a classifier algorithm. We propose an alternative approach that is based on providing only the name of a new concept and using a Convolutional Neural Network-based machine learning method. We first tested this approach within one version of SNOMED CT and achieved an average 88.5% precision and an F1 score of 0.793. In comparing the July 2017 release with the January 2018 release, limiting ourselves to predicting one out of two or more parents, our average F1 score was 0.701.


Subject(s)
Machine Learning , Neural Networks, Computer , Systematized Nomenclature of Medicine , Support Vector Machine
10.
J Biomed Inform ; 40(5): 561-81, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17276736

ABSTRACT

SNOMED is one of the leading health care terminologies being used worldwide. As such, quality assurance is an important part of its maintenance cycle. Methodologies for auditing SNOMED based on structural aspects of its organization are presented. In particular, automated techniques for partitioning SNOMED into smaller groups of concepts based primarily on relationship patterns are defined. Two abstraction networks, the area taxonomy and p-area taxonomy, are derived from the partitions. The high-level views afforded by these abstraction networks form the basis for systematic auditing. The networks tend to highlight errors that manifest themselves as irregularities at the abstract level. They also support group-based auditing, where sets of purportedly similar concepts are focused on for review. The auditing methodologies are demonstrated on one of SNOMED's top-level hierarchies. Errors discovered during the auditing process are reported.


Subject(s)
Artificial Intelligence , Systematized Nomenclature of Medicine , Quality Control , United States
11.
J Healthc Eng ; 2017: 3495723, 2017.
Article in English | MEDLINE | ID: mdl-29158885

ABSTRACT

Ontologies are important components of health information management systems. As such, the quality of their content is of paramount importance. It has been proven to be practical to develop quality assurance (QA) methodologies based on automated identification of sets of concepts expected to have higher likelihood of errors. Four kinds of such sets (called QA-sets) organized around the themes of complex and uncommonly modeled concepts are introduced. A survey of different methodologies based on these QA-sets and the results of applying them to various ontologies are presented. Overall, following these approaches leads to higher QA yields and better utilization of QA personnel. The formulation of additional QA-set methodologies will further enhance the suite of available ontology QA tools.


Subject(s)
Biological Ontologies , Classification , Quality Assurance, Health Care , Humans
12.
Stud Health Technol Inform ; 245: 978-982, 2017.
Article in English | MEDLINE | ID: mdl-29295246

ABSTRACT

Maintenance and use of a large ontology, consisting of thousands of knowledge assertions, are hampered by its scope and complexity. It is important to provide tools for summarization of ontology content in order to facilitate user "big picture" comprehension. We present a parameterized methodology for the semi-automatic summarization of major topics in an ontology, based on a compact summary of the ontology, called an "aggregate partial-area taxonomy", followed by manual enhancement. An experiment is presented to test the effectiveness of such summarization measured by coverage of a given list of major topics of the corresponding application domain. SNOMED CT's Specimen hierarchy is the test-bed. A domain-expert provided a list of topics that serves as a gold standard. The enhanced results show that the aggregate taxonomy covers most of the domain's main topics.


Subject(s)
Biological Ontologies , Systematized Nomenclature of Medicine , Automation , Humans , Knowledge Bases
13.
Methods Inf Med ; 56(3): 200-208, 2017 May 18.
Article in English | MEDLINE | ID: mdl-28244549

ABSTRACT

OBJECTIVES: Ontologies are knowledge structures that lend support to many health-information systems. A study is carried out to assess the quality of ontological concepts based on a measure of their complexity. The results show a relation between complexity of concepts and error rates of concepts. METHODS: A measure of lateral complexity defined as the number of exhibited role types is used to distinguish between more complex and simpler concepts. Using a framework called an area taxonomy, a kind of abstraction network that summarizes the structural organization of an ontology, concepts are divided into two groups along these lines. Various concepts from each group are then subjected to a two-phase QA analysis to uncover and verify errors and inconsistencies in their modeling. A hierarchy of the National Cancer Institute thesaurus (NCIt) is used as our test-bed. A hypothesis pertaining to the expected error rates of the complex and simple concepts is tested. RESULTS: Our study was done on the NCIt's Biological Process hierarchy. Various errors, including missing roles, incorrect role targets, and incorrectly assigned roles, were discovered and verified in the two phases of our QA analysis. The overall findings confirmed our hypothesis by showing a statistically significant difference between the amounts of errors exhibited by more laterally complex concepts vis-à-vis simpler concepts. CONCLUSIONS: QA is an essential part of any ontology's maintenance regimen. In this paper, we reported on the results of a QA study targeting two groups of ontology concepts distinguished by their level of complexity, defined in terms of the number of exhibited role types. The study was carried out on a major component of an important ontology, the NCIt. The findings suggest that more complex concepts tend to have a higher error rate than simpler concepts. These findings can be utilized to guide ongoing efforts in ontology QA.


Subject(s)
Biological Ontologies , Comprehension , Meaningful Use/standards , Models, Statistical , National Cancer Institute (U.S.)/standards , Neoplasms/classification , Computer Simulation , Humans , Natural Language Processing , Quality Assurance, Health Care/standards , United States , Vocabulary, Controlled
14.
Ann N Y Acad Sci ; 1387(1): 12-24, 2017 01.
Article in English | MEDLINE | ID: mdl-27750400

ABSTRACT

The purpose of the Big Data to Knowledge initiative is to develop methods for discovering new knowledge from large amounts of data. However, if the resulting knowledge is so large that it resists comprehension, referred to here as Big Knowledge (BK), how can it be used properly and creatively? We call this secondary challenge, Big Knowledge to Use. Without a high-level mental representation of the kinds of knowledge in a BK knowledgebase, effective or innovative use of the knowledge may be limited. We describe summarization and visualization techniques that capture the big picture of a BK knowledgebase, possibly created from Big Data. In this research, we distinguish between assertion BK and rule-based BK (rule BK) and demonstrate the usefulness of summarization and visualization techniques of assertion BK for clinical phenotyping. As an example, we illustrate how a summary of many intracranial bleeding concepts can improve phenotyping, compared to the traditional approach. We also demonstrate the usefulness of summarization and visualization techniques of rule BK for drug-drug interaction discovery.


Subject(s)
Computational Biology/methods , Drug Interactions , Image Interpretation, Computer-Assisted , Intracranial Hemorrhages/classification , Knowledge Bases , Models, Neurological , Translational Research, Biomedical/methods , Animals , Computational Biology/trends , Data Mining/methods , Data Mining/trends , Decision Making, Computer-Assisted , Humans , Image Processing, Computer-Assisted , Intracranial Hemorrhages/epidemiology , Intracranial Hemorrhages/etiology , Intracranial Hemorrhages/physiopathology , Pharmaceutical Preparations/classification , Systematized Nomenclature of Medicine , Terminology as Topic , Translational Research, Biomedical/trends
15.
J Am Med Inform Assoc ; 13(6): 676-90, 2006.
Article in English | MEDLINE | ID: mdl-16929044

ABSTRACT

OBJECTIVE: To develop and test an auditing methodology for detecting errors in medical terminologies satisfying systematic inheritance. This methodology is based on various abstraction taxonomies that provide high-level views of a terminology and highlight potentially erroneous concepts. DESIGN: Our auditing methodology is based on dividing concepts of a terminology into smaller, more manageable units. First, we divide the terminology's concepts into areas according to their relationships/roles. Then each multi-rooted area is further divided into partial-areas (p-areas) that are singly-rooted. Each p-area contains a set of structurally and semantically uniform concepts. Two kinds of abstraction networks, called the area taxonomy and p-area taxonomy, are derived. These taxonomies form the basis for the auditing approach. Taxonomies tend to highlight potentially erroneous concepts in areas and p-areas. Human reviewers can focus their auditing efforts on the limited number of problematic concepts following two hypotheses on the probable concentration of errors. RESULTS: A sample of the area taxonomy and p-area taxonomy for the Biological Process (BP) hierarchy of the National Cancer Institute Thesaurus (NCIT) was derived from the application of our methodology to its concepts. These views led to the detection of a number of different kinds of errors that are reported, and to confirmation of the hypotheses on error concentration in this hierarchy. CONCLUSION: Our auditing methodology based on area and p-area taxonomies is an efficient tool for detecting errors in terminologies satisfying systematic inheritance of roles, and thus facilitates their maintenance. This methodology concentrates a domain expert's manual review on portions of the concepts with a high likelihood of errors.
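
The two-step division described in the DESIGN section above (concepts into areas by their sets of roles, then each area into singly-rooted p-areas) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names and the toy hierarchy are hypothetical, and a concept's roles are assumed to be given as a plain set of role-type names.

```python
from collections import defaultdict


def partition_into_areas(roles_of):
    """Group concepts by the exact set of role types they exhibit."""
    areas = defaultdict(set)
    for concept, roles in roles_of.items():
        areas[frozenset(roles)].add(concept)
    return dict(areas)


def area_roots(area, parents_of):
    """Roots of an area: concepts with no parent inside the same area."""
    return {c for c in area if not (parents_of.get(c, set()) & area)}


def partial_areas(area, parents_of):
    """Map each root to the root plus its descendants within the area."""
    children_of = defaultdict(set)
    for concept in area:
        for parent in parents_of.get(concept, set()) & area:
            children_of[parent].add(concept)

    result = {}
    for root in area_roots(area, parents_of):
        seen, stack = {root}, [root]
        while stack:  # depth-first walk restricted to the area
            node = stack.pop()
            for child in children_of[node] - seen:
                seen.add(child)
                stack.append(child)
        result[root] = seen
    return result


# Hypothetical toy hierarchy: A is the top concept; D has two parents.
roles_of = {
    "A": set(),
    "B": {"r1"},
    "C": {"r1"},
    "D": {"r1"},
    "E": {"r1", "r2"},
}
parents_of = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}, "E": {"D"}}

areas = partition_into_areas(roles_of)
r1_area = areas[frozenset({"r1"})]
print(partial_areas(r1_area, parents_of))
```

In this toy data the area for role set {r1} has two roots, B and C, so it splits into two p-areas. Concept D belongs to both, which is the kind of overlap that the later disjoint partial-area taxonomy work in this listing addresses.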


Subject(s)
Vocabulary, Controlled , Biology/classification , National Institutes of Health (U.S.) , Quality Control , Terminology as Topic , Unified Medical Language System , United States
16.
J Bioinform Comput Biol ; 14(3): 1642001, 2016 06.
Article in English | MEDLINE | ID: mdl-27301779

ABSTRACT

The gene ontology (GO) is used extensively in the field of genomics. Like other large and complex ontologies, quality assurance (QA) efforts for GO's content can be laborious and time consuming. Abstraction networks (AbNs) are summarization networks that reveal and highlight high-level structural and hierarchical aggregation patterns in an ontology. They have been shown to successfully support QA work in the context of various ontologies. Two kinds of AbNs, called the area taxonomy and the partial-area taxonomy, are developed for GO hierarchies and derived specifically for the biological process (BP) hierarchy. Within this framework, several QA heuristics, based on the identification of groups of anomalous terms which exhibit certain taxonomy-defined characteristics, are introduced. Such groups are expected to have higher error rates when compared to other terms. Thus, by focusing QA efforts on anomalous terms one would expect to find relatively more erroneous content. By automatically identifying these potential problem areas within an ontology, time and effort will be saved during manual reviews of GO's content. BP is used as a testbed, with samples of three kinds of anomalous BP terms chosen for a taxonomy-based QA review. Additional heuristics for QA are demonstrated. From the results of this QA effort, it is observed that different kinds of inconsistencies in the modeling of GO can be exposed with the use of the proposed heuristics. For comparison, the results of QA work on a sample of terms chosen from GO's general population are presented.


Subject(s)
Computational Biology/methods , Gene Ontology , Quality Control
17.
J Am Med Inform Assoc ; 12(6): 657-66, 2005.
Article in English | MEDLINE | ID: mdl-16049233

ABSTRACT

OBJECTIVE: The Enriched Semantic Network (ESN) was introduced as an extension of the Unified Medical Language System (UMLS) Semantic Network (SN). Its multiple subsumption configuration and concomitant multiple inheritance make the ESN's relationship structures and semantic type assignments different from those of the SN. A technique for deriving the relationship structures of the ESN's semantic types and an automated technique for deriving the ESN's semantic type assignments from those of the SN are presented. DESIGN: The technique to derive the ESN's relationship structures finds all newly inherited relationships in the ESN. All such relationships are audited for semantic validity, and the blocking mechanism is used to block invalid relationships. The mapping technique to derive the ESN's semantic type assignments uses current SN semantic type assignments and preserves nonredundant categorizations, while preventing new redundant categorizations. RESULTS: Among the 426 newly inherited relationships, 326 are deemed valid. Seven blockings are applied to avoid inheritance of the 100 invalid relationships. Sixteen semantic types have different relationship structures in the ESN as compared to those in the SN. The mapping of semantic type assignments from the SN to the ESN avoids the generation of 26,950 redundant categorizations. The resulting ESN contains 138 semantic types, 149 IS-A links, 7,303 relationships, and 1,013,876 semantic type assignments. CONCLUSION: The ESN's multiple inheritance provides more complete relationship structures than in the SN. The ESN's semantic type assignments avoid the existing redundant categorizations appearing in the SN and prevent new ones that might arise due to multiple parents. Compared to the SN, the ESN provides a more accurate unifying semantic abstraction of the UMLS Metathesaurus.


Subject(s)
Unified Medical Language System , Semantics
18.
Artif Intell Med ; 34(3): 219-33, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15996860

ABSTRACT

OBJECTIVE: A metaschema is an abstraction network of the UMLS's semantic network (SN) obtained from a connected partition of its collection of semantic types. A lexical metaschema was previously derived based on a lexical partition which partitioned the SN into semantic-type groups using identical word-usage among the names of semantic types and the definitions of their respective children. In this paper, a statistical analysis methodology is presented to evaluate the lexical metaschema based on a study involving a group of established UMLS experts. METHODS: In the study, each expert was asked to identify subject areas of the SN based on his or her understanding of the various semantic types. For this purpose, the expert scans the SN hierarchy top-down, identifying semantic types, which are important and different enough from their parent semantic types, as roots of their groups. From the response of each expert, an "expert metaschema" is constructed. The different experts' metaschemas can vary widely. So, additional metaschemas are obtained from aggregations of the experts' responses. Of special interest is the consensus metaschema which represents an aggregation of a simple majority of the experts' responses. Statistical analysis comparing the lexical metaschema with the experts' metaschemas and the consensus metaschema is presented. RESULTS: The analysis results show that 17 out of the 21 meta-semantic types in the lexical metaschema also appear in the consensus metaschema (about 81%). There are 107 semantic types (about 79%) covered by identical meta-semantic types and refinements. The results show the high similarity between the two metaschemas. Furthermore, the statistical analysis shows that the lexical metaschema did not grossly underperform compared to the experts.
CONCLUSION: Our study shows that the lexical metaschema provides a good approximation for a partition of meaningful subject areas in the SN, when compared to the consensus metaschema capturing the aggregation of a simple majority of the human experts' opinions.
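The simple-majority aggregation mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: each expert's response is reduced to the set of semantic types chosen as group roots, "simple majority" is read as strictly more than half of the experts, and the function names and sample type names are hypothetical.

```python
from collections import Counter


def consensus_roots(expert_roots):
    """Return the roots chosen by a simple majority of experts.

    expert_roots: list of sets, one per expert, each holding the semantic
    types that expert marked as roots of groups.
    """
    counts = Counter(r for roots in expert_roots for r in set(roots))
    half = len(expert_roots) / 2
    return {r for r, c in counts.items() if c > half}


def shared_fraction(reference, other):
    """Fraction of items in `reference` that also appear in `other`."""
    return len(reference & other) / len(reference)


# Hypothetical responses from three experts.
experts = [
    {"Organism", "Event", "Substance"},
    {"Organism", "Event"},
    {"Organism", "Finding"},
]
print(consensus_roots(experts))

# The abstract's reported overlap: 17 of 21 meta-semantic types.
print(round(17 / 21 * 100))
```

With three experts, a root needs at least two votes, so only "Organism" and "Event" survive in the toy data; the last line reproduces the "about 81%" figure from the reported counts.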


Subject(s)
Subject Headings; Unified Medical Language System/organization & administration; Unified Medical Language System/standards; Animals; Databases, Factual/classification; Humans; Information Storage and Retrieval/methods; Models, Statistical; Organic Chemicals; Semantics; Terminology as Topic
19.
Artif Intell Med ; 33(1): 41-59, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15617981

ABSTRACT

OBJECTIVE: A metaschema is a high-level abstraction network of the UMLS's semantic network (SN) obtained from a partition of the SN's collection of semantic types. Every metaschema has nodes, called meta-semantic types, each of which denotes a group of semantic types constituting a subject area of the SN. A new kind of metaschema, called the lexical metaschema, is derived from a lexical partition of the SN. The lexical metaschema is compared to previously derived metaschemas, e.g., the cohesive metaschema. DESIGN: A new lexical partitioning methodology is presented based on identical word-usage among the names of semantic types and the definitions of their respective children. The lexical metaschema is derived from the application of the methodology. We compare the constituent meta-semantic types and their underlying semantic-type groups with the previously derived cohesive metaschema. A similar comparison of the lexical partition and a published partition of the SN is also carried out. RESULTS: The lexical partition of the SN has 21 semantic-type groups, each of which represents a subject area. The lexical metaschema thus has 21 meta-semantic types, 19 meta-child-of hierarchical relationships, and 86 meta-relationships. Our comparison shows that 15 out of the 21 meta-semantic types in the lexical metaschema also appear in the cohesive metaschema, and 80 semantic types are covered by identical meta-semantic types or refinements between the two metaschemas. The comparison between the lexical partition and the semantic partition shows that they have very low similarity. CONCLUSION: The algorithmically derived lexical metaschema serves as an abstraction of the SN and provides views representing different subject areas. It compares favorably with the cohesive metaschema derived via the SN's relationship configuration.


Subject(s)
Semantics; Unified Medical Language System; Abstracting and Indexing
20.
Artif Intell Med ; 64(1): 1-16, 2015 May.
Article in English | MEDLINE | ID: mdl-25890687

ABSTRACT

OBJECTIVE: Terminologies and terminological systems have assumed important roles in many medical information processing environments, giving rise to the "big knowledge" challenge when terminological content comprises tens of thousands to millions of concepts arranged in a tangled web of relationships. Use and maintenance of knowledge structures on that scale can be daunting. The notion of abstraction network is presented as a means of facilitating the usability, comprehensibility, visualization, and quality assurance of terminologies. METHODS AND MATERIALS: An abstraction network overlays a terminology's underlying network structure at a higher level of abstraction. In particular, it provides a more compact view of the terminology's content, avoiding the display of minutiae. General abstraction network characteristics are discussed. Moreover, the notion of meta-abstraction network, existing at an even higher level of abstraction than a typical abstraction network, is described for cases where even the abstraction network itself represents a case of "big knowledge." Various features in the design of abstraction networks are demonstrated in a methodological survey of some existing abstraction networks previously developed and deployed for a variety of terminologies. RESULTS: The applicability of the general abstraction-network framework is shown through use-cases of various terminologies, including the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), the Medical Entities Dictionary (MED), and the Unified Medical Language System (UMLS). Important characteristics of the surveyed abstraction networks are provided, e.g., the magnitude of the respective size reduction referred to as the abstraction ratio. Specific benefits of these alternative terminology-network views, particularly their use in terminology quality assurance, are discussed. Examples of meta-abstraction networks are presented. 
CONCLUSIONS: The "big knowledge" challenge constitutes the use and maintenance of terminological structures that comprise tens of thousands to millions of concepts and their attendant complexity. The notion of abstraction network has been introduced as a tool in helping to overcome this challenge, thus enhancing the usefulness of terminologies. Abstraction networks have been shown to be applicable to a variety of existing biomedical terminologies, and these alternative structural views hold promise for future expanded use with additional terminologies.


Subject(s)
Health Information Management/organization & administration; Medical Informatics/organization & administration; Neural Networks, Computer; Vocabulary, Controlled