1.
BMC Med Inform Decis Mak ; 23(Suppl 1): 40, 2023 02 24.
Article in English | MEDLINE | ID: mdl-36829139

ABSTRACT

BACKGROUND: Two years into the COVID-19 pandemic and with more than five million deaths worldwide, the healthcare establishment continues to struggle with every new wave of the pandemic resulting from a new coronavirus variant. Research has demonstrated that there are variations in the symptoms, and even in the order of symptom presentations, in COVID-19 patients infected by different SARS-CoV-2 variants (e.g., Alpha and Omicron). Textual data in the form of admission notes and physician notes in the Electronic Health Records (EHRs) is rich in information regarding the symptoms and their orders of presentation. Unstructured EHR data is often underutilized in research due to the lack of annotations that enable automatic extraction of useful information from the available extensive volumes of textual data. METHODS: We present the design of a COVID Interface Terminology (CIT), not just a generic COVID-19 terminology, but one serving a specific purpose of enabling automatic annotation of EHRs of COVID-19 patients. CIT was constructed by integrating existing COVID-related ontologies and mining additional fine granularity concepts from clinical notes. The iterative mining approach utilized the techniques of 'anchoring' and 'concatenation' to identify potential fine granularity concepts to be added to the CIT. We also tested the generalizability of our approach on a hold-out dataset and compared the annotation coverage to the coverage obtained for the dataset used to build the CIT. RESULTS: Our experiments demonstrate that this approach results in higher annotation coverage compared to existing ontologies such as SNOMED CT and Coronavirus Infectious Disease Ontology (CIDO). The final version of CIT achieved about 20% more coverage than SNOMED CT and 50% more coverage than CIDO. 
In the future, the concepts mined and added into CIT could be used as training data for machine learning models for mining even more concepts into CIT and further increasing the annotation coverage. CONCLUSION: In this paper, we demonstrated the construction of a COVID interface terminology that can be utilized for automatically annotating EHRs of COVID-19 patients. The techniques presented can identify frequently documented fine granularity concepts that are missing in other ontologies thereby increasing the annotation coverage.
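
As a rough illustration of what "annotation coverage" could mean in this setting, the sketch below computes the share of words in a note that fall inside a matched terminology term. The greedy longest-match strategy, the function name `annotation_coverage`, and the toy note and terms are all assumptions made for illustration; the abstract does not state how coverage was actually computed.

```python
def annotation_coverage(note, terms, max_len=5):
    """Fraction of words in `note` covered by greedy longest-match annotation.

    Hypothetical helper: `terms` is a list of terminology strings and
    `max_len` caps the phrase length (in words) that is tried.
    """
    words = note.lower().split()
    vocab = {tuple(t.lower().split()) for t in terms}
    covered = 0
    i = 0
    while i < len(words):
        matched = 0
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(max_len, len(words) - i), 0, -1):
            if tuple(words[i:i + n]) in vocab:
                matched = n
                break
        if matched:
            covered += matched
            i += matched
        else:
            i += 1
    return covered / len(words) if words else 0.0


# Toy example (invented text, not taken from any clinical note).
note = "patient reports dry cough and loss of taste"
base_terms = ["cough", "loss of taste"]
extended_terms = base_terms + ["dry cough"]  # one added fine-granularity concept

print(annotation_coverage(note, base_terms))
print(annotation_coverage(note, extended_terms))
```

Adding the finer-grained phrase "dry cough" raises the covered share of the toy note, which is the kind of effect the abstract attributes to mining additional concepts.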


Subjects
COVID-19 , Electronic Health Records , Humans , Pandemics , SARS-CoV-2
2.
Genet Med ; 25(4): 100012, 2023 04.
Article in English | MEDLINE | ID: mdl-36637017

ABSTRACT

PURPOSE: TTN truncating variants (TTNtvs) represent the largest known genetic cause of dilated cardiomyopathies (DCMs); however, their penetrance for DCM in general populations is low. More broadly, patients with cardiomyopathies (CMs) often exhibit other cardiac conditions, such as atrial fibrillation (Afib), which has also been linked to TTNtvs. This retrospective analysis aims to characterize the relationship between different cardiac conditions in those with TTNtvs and identify individuals with the highest risk of DCM. METHODS: In this work, we leverage longitudinal electronic health record and exome sequencing data from approximately 450,000 individuals in 2 health systems to statistically confirm and pinpoint the genetic footprint of TTNtv-related diagnoses aside from CM, such as Afib, and determine whether vetting additional significantly associated phenotypes better stratifies CM risk across those with TTNtvs. We focused on TTNtvs in exons with a percentage spliced in >90% (hiPSI TTNtvs), a representation of constitutive cardiac expression. RESULTS: When controlling for CM and Afib, other cardiac conditions retained only nominal association with TTNtvs. A sliding window analysis of TTNtvs across the locus confirms that the association is specific to hiPSI exons for both CM and Afib, with no meaningful associations in percent spliced in ≤90% exons (loPSI TTNtvs). The combination of hiPSI TTNtv status and early Afib diagnosis (before age 60) identified a subset of TTNtv individuals at high risk for CM. The prevalence of CM in this subset was 33%, a rate that was 3.5-fold higher than that in individuals with hiPSI TTNtvs (9% prevalence), 5-fold higher than that in individuals without TTNtvs with early Afib (6% prevalence), and 80-fold higher than that in the general population. 
CONCLUSION: Our retrospective analyses revealed that those with hiPSI TTNtvs and early Afib (∼1/2900) have a high prevalence of CM (33%), far exceeding that in other individuals with TTNtvs and in those without TTNtvs with an early Afib diagnosis. These results show that combining phenotypic information along with genomic population screening can identify patients at higher risk for progressing to symptomatic heart failure.


Subjects
Atrial Fibrillation , Cardiomyopathies , Dilated Cardiomyopathy , Heart Diseases , Humans , Atrial Fibrillation/epidemiology , Atrial Fibrillation/genetics , Retrospective Studies , Prevalence , Cardiomyopathies/epidemiology , Cardiomyopathies/genetics , Connectin/genetics , Connectin/metabolism , Dilated Cardiomyopathy/epidemiology , Dilated Cardiomyopathy/genetics
3.
Front Genet ; 13: 866169, 2022.
Article in English | MEDLINE | ID: mdl-35571025

ABSTRACT

The clinical value of population-based genetic screening projects depends on the actions taken on the findings. The Healthy Nevada Project (HNP) is an all-comer genetic screening and research project based in northern Nevada. HNP participants with CDC Tier 1 findings of hereditary breast and ovarian cancer syndrome (HBOC), Lynch syndrome (LS), or familial hypercholesterolemia (FH) are notified and provided with genetic counseling. However, the HNP subsequently takes a "hands-off" approach: it is the responsibility of notified participants to share their findings with their healthcare providers, and providers are expected to implement the recommended action plans. Thus, the HNP presents an opportunity to evaluate the efficiency of participant and provider responses to notification of important genetic findings, using electronic health records (EHRs) at Renown Health (a large regional hospital in northern Nevada). Out of 520 HNP participants with findings, we identified 250 participants who were notified of their findings and who had an EHR. 107 of these participants responded to a survey, with 76 (71%) indicating that they had shared their findings with their healthcare providers. However, a sufficiently specific genetic diagnosis appeared in the EHRs and problem lists of only 22 and 10%, respectively, of participants without prior knowledge. Furthermore, review of participant EHRs provided evidence of possible relevant changes in clinical care for only a handful of participants. Up to 19% of participants would have benefited from earlier screening due to prior presentation of their condition. These results suggest that continuous support for both participants and their providers is necessary to maximize the benefit of population-based genetic screening. We recommend that genetic screening projects require participants' consent to directly document their genetic findings in their EHRs. 
Additionally, we recommend that they provide healthcare providers with ongoing training regarding documentation of findings and with clinical decision support regarding subsequent care.

4.
J Expo Sci Environ Epidemiol ; 31(5): 797-803, 2021 09.
Article in English | MEDLINE | ID: mdl-34257389

ABSTRACT

BACKGROUND: Air pollution has been linked to increased susceptibility to SARS-CoV-2. Thus, it has been suggested that wildfire smoke events may exacerbate the COVID-19 pandemic. OBJECTIVES: Our goal was to examine whether wildfire smoke from the 2020 wildfires in the western United States was associated with an increased rate of SARS-CoV-2 infections in Reno, Nevada. METHODS: We conducted a time-series analysis using generalized additive models to examine the relationship between the SARS-CoV-2 test positivity rate at a large regional hospital in Reno and ambient PM2.5 from 15 May to 20 Oct 2020. RESULTS: We found that a 10 µg/m3 increase in the 7-day average PM2.5 concentration was associated with a 6.3% relative increase in the SARS-CoV-2 test positivity rate, with a 95% confidence interval (CI) of 2.5 to 10.3%. This corresponded to an estimated 17.7% (CI: 14.4-20.1%) increase in the number of cases during the time period most affected by wildfire smoke, from 16 Aug to 10 Oct. SIGNIFICANCE: Wildfire smoke may have greatly increased the number of COVID-19 cases in Reno. Thus, our results substantiate the role of air pollution in exacerbating the pandemic and can help guide the development of public preparedness policies in areas affected by wildfire smoke, as wildfires are likely to coincide with the COVID-19 pandemic in 2021.
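
To make the reported effect size easier to read, the sketch below shows how a coefficient on a log scale translates into a relative increase for a given change in PM2.5. It assumes, purely for illustration, a log-linear PM2.5 term; the abstract only states that generalized additive models were used, so the link function, the helper name `relative_increase`, and the back-calculated coefficient are assumptions rather than the authors' model.

```python
import math


def relative_increase(beta, delta):
    """Relative change in the outcome for a `delta` increase in exposure,
    under an assumed log-linear term with coefficient `beta` per unit."""
    return math.exp(beta * delta) - 1.0


# Back-calculate a per-unit coefficient from the reported 6.3% increase
# per 10 µg/m3 (assumption: the effect is multiplicative on this scale).
beta = math.log(1.063) / 10.0

pct_10 = 100 * relative_increase(beta, 10)
pct_20 = 100 * relative_increase(beta, 20)
print(round(pct_10, 1))
print(round(pct_20, 1))
```

Under this assumption the effect compounds rather than adds, so a 20 µg/m3 increase corresponds to slightly more than twice the 10 µg/m3 figure.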


Subjects
Air Pollutants , COVID-19 , Wildfires , Air Pollutants/adverse effects , Air Pollutants/analysis , Humans , Nevada , Pandemics , Particulate Matter/adverse effects , Particulate Matter/analysis , SARS-CoV-2 , Smoke/adverse effects , United States/epidemiology
5.
Cell Death Dis ; 12(4): 310, 2021 03 24.
Article in English | MEDLINE | ID: mdl-33762578

ABSTRACT

SARS-CoV-2 is responsible for the ongoing world-wide pandemic which has already taken more than two million lives. Effective treatments are urgently needed. The enzymatic activity of the HECT-E3 ligase family members has been implicated in the cell egression phase of deadly RNA viruses such as Ebola through direct interaction of its VP40 Protein. Here we report that HECT-E3 ligase family members such as NEDD4 and WWP1 interact with and ubiquitylate the SARS-CoV-2 Spike protein. Furthermore, we find that HECT family members are overexpressed in primary samples derived from COVID-19 infected patients and COVID-19 mouse models. Importantly, rare germline activating variants in the NEDD4 and WWP1 genes are associated with severe COVID-19 cases. Critically, I3C, a natural NEDD4 and WWP1 inhibitor from Brassicaceae, displays potent antiviral effects and inhibits viral egression. In conclusion, we identify the HECT family members of E3 ligases as likely novel biomarkers for COVID-19, as well as new potential targets of therapeutic strategy easily testable in clinical trials in view of the established well-tolerated nature of the Brassicaceae natural compounds.


Subjects
COVID-19 Drug Treatment , COVID-19/enzymology , Ubiquitin-Protein Ligases/antagonists & inhibitors , Ubiquitin-Protein Ligases/metabolism , Adult , Aged , Animals , Antiviral Agents/pharmacology , COVID-19/genetics , COVID-19/metabolism , Chlorocebus aethiops , Endosomal Sorting Complexes Required for Transport/metabolism , Female , Humans , Indoles/pharmacology , Male , Mice , Inbred BALB C Mice , Middle Aged , Nedd4 Ubiquitin Protein Ligases/genetics , Nedd4 Ubiquitin Protein Ligases/metabolism , SARS-CoV-2/isolation & purification , SARS-CoV-2/metabolism , Coronavirus Spike Glycoprotein/metabolism , Ubiquitin-Protein Ligases/genetics , Ubiquitination , Vero Cells
6.
Nat Metab ; 2(10): 1126-1134, 2020 10.
Article in English | MEDLINE | ID: mdl-33046911

ABSTRACT

Genome-wide association studies have identified 240 independent loci associated with type 2 diabetes (T2D) risk, but this knowledge has not advanced precision medicine. In contrast, the genetic diagnosis of monogenic forms of diabetes (including maturity-onset diabetes of the young (MODY)) are textbook cases of genomic medicine. Recent studies trying to bridge the gap between monogenic diabetes and T2D have been inconclusive. Here, we show a significant burden of pathogenic variants in genes linked with monogenic diabetes among people with common T2D, particularly in actionable MODY genes, thus implying that there should be a substantial change in care for carriers with T2D. We show that, among 74,629 individuals, this burden is probably driven by the pathogenic variants found in GCK, and to a lesser extent in HNF4A, KCNJ11, HNF1B and ABCC8. The carriers with T2D are leaner, which evidences a functional metabolic effect of these mutations. Pathogenic variants in actionable MODY genes are more frequent than was previously expected in common T2D. These results open avenues for future interventions assessing the clinical interest of these pathogenic mutations in precision medicine.


Subjects
Type 2 Diabetes Mellitus/genetics , Computational Biology , Female , Genetic Variation , Genome-Wide Association Study , Germinal Center Kinases/genetics , Heterozygote , Humans , Male , Middle Aged , Mutation
7.
Environ Health ; 19(1): 92, 2020 08 27.
Article in English | MEDLINE | ID: mdl-32854703

ABSTRACT

BACKGROUND: Health risks due to particulate matter (PM) from wildfires may differ from risk due to PM from other sources. In places frequently subjected to wildfire smoke, such as Reno, Nevada, it is critical to determine whether wildfire PM poses unique risks. Our goal was to quantify the difference in the association of adverse asthma events with PM on days when wildfire smoke was present versus days when wildfire smoke was not present. METHODS: We obtained counts of visits for asthma at emergency departments and urgent care centers from a large regional healthcare system in Reno for the years 2013-2018. We also obtained dates when wildfire smoke was present from the Washoe County Health District Air Quality Management Division. We then examined whether the presence of wildfire smoke modified the association of PM2.5, PM10-2.5, and PM10 with asthma visits using generalized additive models. We improved on previous studies by excluding wildfire-smoke days where the PM concentration exceeded the maximum PM concentration on other days, thus accounting for possible nonlinearity in the association between PM concentration and asthma visits. RESULTS: Air quality was affected by wildfire smoke on 188 days between 2013 and 2018. We found that the presence of wildfire smoke increased the association of a 5 µg/m3 increase in daily and three-day averages of PM2.5 with asthma visits by 6.1% (95% confidence interval (CI): 2.1-10.3%) and 6.8% (CI: 1.2-12.7%), respectively. Similarly, the presence of wildfire smoke increased the association of a 5 µg/m3 increase in daily and three-day averages of PM10 with asthma visits by 5.5% (CI: 2.5-8.6%) and 7.2% (CI: 2.6-12.0%), respectively. We did not observe any significant increases in association for PM10-2.5 or for seven-day averages of PM2.5 and PM10. 
CONCLUSIONS: Since we found significantly stronger associations of PM2.5 and PM10 with asthma visits when wildfire smoke was present, our results suggest that wildfire PM is more hazardous than non-wildfire PM for patients with asthma.


Subjects
Asthma/epidemiology , Hospital Emergency Service/statistics & numerical data , Environmental Exposure/adverse effects , Hospitalization/statistics & numerical data , Particulate Matter/adverse effects , Smoke/adverse effects , Wildfires , Asthma/chemically induced , Cities , Nevada/epidemiology , Particulate Matter/analysis
8.
Nat Commun ; 11(1): 542, 2020 Jan 28.
Article in English | MEDLINE | ID: mdl-31992710

ABSTRACT

Understanding the impact of rare variants is essential to understanding human health. We analyze rare (MAF < 0.1%) variants against 4264 phenotypes in 49,960 exome-sequenced individuals from the UK Biobank and 1934 phenotypes (1821 overlapping with UK Biobank) in 21,866 members of the Healthy Nevada Project (HNP) cohort who underwent Exome + sequencing at Helix. After using our rare-variant-tailored methodology to reduce test statistic inflation, we identify 64 statistically significant gene-based associations in our meta-analysis of the two cohorts and 37 for phenotypes available in only one cohort. Singletons make significant contributions to our results, and the vast majority of the associations could not have been identified with a genotyping chip. Our results are available for interactive browsing in a webapp (https://ukb.research.helix.com). This comprehensive analysis illustrates the biological value of large, deeply phenotyped cohorts of unselected populations coupled with NGS data.


Subjects
Exome/genetics , Genetic Variation , Human Genome , Genome-Wide Association Study , Phenotype , Adolescent , Adult , Aged , Aged 80 and over , Cohort Studies , Genetic Databases , Europe , Female , Population Genetics/statistics & numerical data , High-Throughput Nucleotide Sequencing , Humans , Male , Meta-Analysis as Topic , Middle Aged , Software , Exome Sequencing , Young Adult
9.
G3 (Bethesda) ; 10(2): 645-664, 2020 02 06.
Article in English | MEDLINE | ID: mdl-31888951

ABSTRACT

The aggregation of Electronic Health Records (EHR) and personalized genetics leads to powerful discoveries relevant to population health. Here we perform genome-wide association studies (GWAS) and accompanying phenome-wide association studies (PheWAS) to validate phenotype-genotype associations of BMI, and to a greater extent, severe Class 2 obesity, using comprehensive diagnostic and clinical data from the EHR database of our cohort. Three GWASs of 500,000 variants on the Illumina platform of 6,645 Healthy Nevada participants identified several published and novel variants that affect BMI and obesity. Each GWAS was followed with two independent PheWASs to examine associations between extensive phenotypes (incidence of diagnoses, condition, or disease), significant SNPs, BMI, and incidence of extreme obesity. The first GWAS examines associations with BMI in a cohort with no type 2 diabetics, focusing exclusively on BMI. The second GWAS examines associations with BMI in a cohort that includes type 2 diabetics. In the second GWAS, type 2 diabetes is a comorbidity, and thus becomes a covariate in the statistical model. The intersection of significant variants of these two studies is surprising. The third GWAS is a case vs. control study, with cases defined as extremely obese (Class 2 or 3 obesity), and controls defined as participants with BMI between 18.5 and 25. This last GWAS identifies strong associations with extreme obesity, including established variants in the FTO and NEGR1 genes, as well as loci not yet linked to obesity. The PheWASs validate published associations between BMI and extreme obesity and incidence of specific diagnoses and conditions, yet also highlight novel links. This study emphasizes the importance of our extensive longitudinal EHR database to validate known associations and identify putative novel links with BMI and obesity.
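
The case and control definitions of the third GWAS can be sketched as a simple labeling rule. The cutoff of BMI 35 for Class 2 obesity follows the standard WHO classification; the function name `gwas3_group`, the treatment of the boundary values (35 inclusive for cases, 25 exclusive for controls), and the handling of everyone else as excluded are assumptions for illustration, since the abstract does not spell them out.

```python
def gwas3_group(bmi):
    """Label a participant for a case vs. control obesity GWAS.

    Returns "case" for Class 2 or 3 obesity (BMI >= 35, standard WHO cutoff),
    "control" for BMI in [18.5, 25), and None for everyone else
    (assumed here to be left out of the comparison).
    """
    if bmi >= 35.0:
        return "case"
    if 18.5 <= bmi < 25.0:
        return "control"
    return None


# Toy BMI values (invented, not study data).
for bmi in (17.0, 22.4, 29.9, 36.2, 41.0):
    print(bmi, gwas3_group(bmi))
```

Leaving the intermediate BMI range out of both groups sharpens the contrast between cases and controls, at the cost of a smaller sample.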


Subjects
Body Mass Index , Genetic Predisposition to Disease , Genome-Wide Association Study , Obesity/etiology , Adult , Aged , Comorbidity , Genetic Databases , Electronic Health Records , Female , Genetic Association Studies/methods , Genotype , Humans , Male , Middle Aged , Nevada/epidemiology , Obesity/diagnosis , Obesity/epidemiology , Phenotype , Single Nucleotide Polymorphism
10.
PLoS One ; 14(6): e0218078, 2019.
Article in English | MEDLINE | ID: mdl-31194788

ABSTRACT

In this study, we perform a full genome-wide association study (GWAS) to identify single nucleotide polymorphisms (SNPs) that are statistically significantly associated with three red blood cell (RBC) components and follow it with two independent PheWASs to examine associations between phenotypic data (case-control status of diagnoses or disease), significant SNPs, and RBC component levels. We first identified associations between the three RBC components, mean platelet volume (MPV), mean corpuscular volume (MCV), and platelet counts (PC), and the genotypes of approximately 500,000 SNPs on the Illumina Infinium DNA Human OmniExpress-24 BeadChip using a single cohort of 4,673 Northern Nevadans. Twenty-one SNPs in five major genomic regions were found to be statistically significantly associated with MPV, two regions with MCV, and one region with PC, with p < 5×10⁻⁸. Twenty-nine SNPs and nine chromosomal regions were identified in 30 previous GWASs, with effect sizes of similar magnitude and direction as found in our cohort. The two strongest associations were SNP rs1354034 with MPV (p = 2.4×10⁻¹³) and rs855791 with MCV (p = 5.2×10⁻¹²). We then examined possible associations between these significant SNPs and incidence of 1,488 phenotype groups mapped from International Classification of Diseases versions 9 and 10 (ICD9 and ICD10) codes collected in the extensive electronic health record (EHR) database associated with Healthy Nevada Project consented participants. Further leveraging data collected in the EHR, we performed an additional PheWAS to identify associations between continuous red blood cell (RBC) component measures and incidence of specific diagnoses. The first PheWAS illuminated whether SNPs associated with RBC components in our cohort were linked with other hematologic phenotypic diagnoses or diagnoses of another nature. 
Although no SNPs from our GWAS were identified as strongly associated with other phenotypic components, a number of associations were identified with p-values ranging between 1×10⁻³ and 1×10⁻⁴ with traits such as respiratory failure, sleep disorders, hypoglycemia, hyperglyceridemia, GERD and IBS. The second PheWAS examined possible phenotypic predictors of abnormal RBC component measures: a number of hematologic phenotypes such as thrombocytopenia, anemias, hemoglobinopathies and pancytopenia were found to be strongly associated with RBC component measures; additional phenotypes such as (morbid) obesity, malaise and fatigue, alcoholism, and cirrhosis were also identified as possible predictors of RBC component measures.


Subjects
Erythrocytes/cytology , Genome-Wide Association Study , Phenotype , Adult , Chromosome Mapping , Cohort Studies , Female , Genotype , Humans , Male , Middle Aged , Nevada , Single Nucleotide Polymorphism
11.
J Biomed Inform ; 94: 103193, 2019 06.
Article in English | MEDLINE | ID: mdl-31048072

ABSTRACT

In previous research, we have studied concepts that occur in pairs of medical terminologies and are known to be identical, because they have the same ID number in the Unified Medical Language System (UMLS). We observed that such concepts rarely have exactly the same sets of children (i.e., subconcepts) in the two terminologies. The number of common children was found to vary widely. A special situation was identified where the children in one terminology relate to the common parent in a very different way than the children in the other terminology. For example, the children might subdivide a parent concept by anatomical location in one terminology and by disease kind in the other. We coined the term "alternative classification" (of the same parent concept) for such situations. In previous work, only human experts could recognize alternative classifications. In this paper, we present a mathematically expressed criterion for likely cases of alternative classifications. We compare the recommendations of this criterion, expressed by a mathematical quantity called "EFI" becoming zero, with the decisions of a human expert. It is found that the human expert agreed with the criterion in 72% of all cases, which is a substantial improvement over having no computable criterion at all. Besides alternative classifications, common parent concepts in a pair of terminologies might also indicate a possible import of a child concept missing in one terminology, different granularities, or errors in either one of the two terminologies. In this paper, we further investigate different kinds of alternative classifications.
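
The comparison of child sets under a shared parent can be sketched as a small set computation. This is not the "EFI" quantity, which the abstract names but does not define; the function `compare_children`, the Jaccard overlap, and the placeholder identifiers are assumptions chosen only to show how common and terminology-specific children might be separated before a human review.

```python
def compare_children(children_a, children_b):
    """Compare the child concept IDs of one shared parent in two terminologies.

    Hypothetical helper: inputs are iterables of concept identifiers that are
    comparable across terminologies (e.g. shared UMLS-style IDs).
    """
    a, b = set(children_a), set(children_b)
    common = a & b
    union = a | b
    return {
        "common": sorted(common),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Share of all children that both terminologies agree on.
        "jaccard": len(common) / len(union) if union else 1.0,
    }


# Placeholder IDs (invented for illustration, not real UMLS identifiers).
result = compare_children(
    ["X001", "X002", "X003"],   # children of the parent in terminology A
    ["X003", "X004"],           # children of the same parent in terminology B
)
print(result)
```

A low overlap on its own does not distinguish an alternative classification from a missing child, a granularity difference, or an error, which is why the abstract pairs its criterion with expert judgment.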


Subjects
Parent-Child Relations , Terminology as Topic , Adult , Child , Humans , Semantics , Unified Medical Language System
12.
J Biomed Inform ; 83: 135-149, 2018 07.
Article in English | MEDLINE | ID: mdl-29852316

ABSTRACT

In previous research, we have demonstrated for a number of ontologies that structurally complex concepts (for different definitions of "complex") in an ontology are more likely to exhibit errors than other concepts. Thus, such complex concepts often become fertile ground for quality assurance (QA) in ontologies. They should be audited first. One example of complex concepts is given by "overlapping concepts" (to be defined below.) Historically, a different auditing methodology had to be developed for every single ontology. For better scalability and efficiency, it is desirable to identify family-wide QA methodologies. Each such methodology would be applicable to a whole family of similar ontologies. In past research, we had divided the 685 ontologies of BioPortal into families of structurally similar ontologies. We showed for four ontologies of the same large family in BioPortal that "overlapping concepts" are indeed statistically significantly more likely to exhibit errors. In order to make an authoritative statement concerning the success of "overlapping concepts" as a methodology for a whole family of similar ontologies (or of large subhierarchies of ontologies), it is necessary to show that "overlapping concepts" have a higher likelihood of errors for six out of six ontologies of the family. In this paper, we are demonstrating for two more ontologies that "overlapping concepts" can successfully predict groups of concepts with a higher error rate than concepts from a control group. The fifth ontology is the Neoplasm subhierarchy of the National Cancer Institute thesaurus (NCIt). The sixth ontology is the Infectious Disease subhierarchy of SNOMED CT. We demonstrate quality assurance results for both of them. Furthermore, in this paper we observe two novel, important, and useful phenomena during quality assurance of "overlapping concepts." First, an erroneous "overlapping concept" can help with discovering other erroneous "non-overlapping concepts" in its vicinity. 
Secondly, correcting erroneous "overlapping concepts" may turn them into "non-overlapping concepts." We demonstrate that this may reduce the complexity of parts of the ontology, which in turn makes the ontology more comprehensible, simplifying maintenance and use of the ontology.


Subjects
Biological Ontologies , Electronic Data Processing/methods , National Cancer Institute (U.S.) , Systematized Nomenclature of Medicine , United States , Controlled Vocabulary
13.
Methods Inf Med ; 57(1): 43-53, 2018 02.
Article in English | MEDLINE | ID: mdl-29621830

ABSTRACT

BACKGROUND: The UMLS assigns semantic types to all its integrated concepts. The semantic types are widely used in various natural language processing tasks in the biomedical domain, such as named entity recognition, semantic disambiguation, and semantic annotation. Due to the size of the UMLS, erroneous semantic type assignments are hard to detect. It is imperative to devise automated techniques to identify errors and inconsistencies in semantic type assignments. OBJECTIVES: To design a methodology that performs programmatic checks to detect semantic type assignment errors for UMLS concepts with one or more SNOMED CT terms, and to evaluate concepts in a selected set of SNOMED CT hierarchies to verify our hypothesis that UMLS semantic type assignment errors may exist in concepts residing in semantically inconsistent groups. METHODS: Our methodology is a four-stage process: 1) partitioning concepts in a SNOMED CT hierarchy into semantically uniform groups based on their assigned semantic tags; 2) partitioning concepts in each group from 1) into disjoint sub-groups based on their semantic type assignments; 3) mapping all SNOMED CT semantic tags into one or more semantic types in the UMLS; 4) identifying semantically inconsistent groups that have inconsistent assignments between semantic tags and semantic types according to the mapping from 3) and providing concepts in such groups to the domain experts for review. RESULTS: We applied our method to the UMLS 2013AA release. Concepts in the semantically inconsistent groups of the PHYSICAL FORCE and RECORD ARTIFACT hierarchies have error rates of 33% and 62.5%, respectively, which are much higher than the error rates of 0.6% and 1% in the semantically consistent groups of the two hierarchies. CONCLUSION: Concepts in semantically inconsistent groups are more likely to contain semantic type assignment errors. 
Our methodology can make auditing more efficient by focusing auditing resources on concepts in semantically inconsistent groups.
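
The four stages described in METHODS can be sketched as a grouping and filtering step. The sketch below is a minimal reading of the abstract, not the authors' implementation: the function name `find_inconsistent_groups`, the rule that a group is inconsistent when any assigned type falls outside the mapped types for its tag, and the toy concepts and mapping are all assumptions made for illustration.

```python
from collections import defaultdict


def find_inconsistent_groups(concepts, tag_to_types):
    """Flag groups whose semantic types disagree with the tag-to-type mapping.

    concepts: iterable of (name, semantic_tag, set_of_semantic_types).
    tag_to_types: dict mapping a semantic tag to its allowed semantic types
                  (stage 3; assumed to be prepared by hand).
    """
    # Stages 1 and 2: group by semantic tag, then by semantic type assignment.
    groups = defaultdict(list)
    for name, tag, types in concepts:
        groups[(tag, frozenset(types))].append(name)

    # Stage 4: keep groups with at least one type outside the allowed set
    # (assumed definition of "semantically inconsistent").
    flagged = {}
    for (tag, types), names in groups.items():
        allowed = tag_to_types.get(tag, frozenset())
        if not types <= allowed:
            flagged[(tag, types)] = sorted(names)
    return flagged


# Toy data, invented for illustration; not the authors' mapping or concepts.
tag_to_types = {"physical force": frozenset({"Natural Phenomenon or Process"})}
concepts = [
    ("Gravity", "physical force", {"Natural Phenomenon or Process"}),
    ("Friction", "physical force", {"Natural Phenomenon or Process"}),
    ("Pressure reading", "physical force", {"Finding"}),
]

flagged = find_inconsistent_groups(concepts, tag_to_types)
for (tag, types), names in flagged.items():
    print(tag, sorted(types), names)
```

Only the flagged groups would be handed to domain experts, which is how the approach concentrates review effort on a small part of a hierarchy.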


Subjects
Semantics , Systematized Nomenclature of Medicine , Unified Medical Language System , Artifacts , Reproducibility of Results
14.
Artif Intell Med ; 79: 9-14, 2017 06.
Article in English | MEDLINE | ID: mdl-28532962

ABSTRACT

OBJECTIVE: To examine whether disjoint partial-area taxonomy, a semantically-based evaluation methodology that has been successfully tested in SNOMED CT, will perform with similar effectiveness on Uberon, an anatomical ontology that belongs to a structurally similar family of ontologies as SNOMED CT. METHOD: A disjoint partial-area taxonomy was generated for Uberon. One hundred randomly selected test concepts that overlap between partial-areas were matched to a same size control sample of non-overlapping concepts. The samples were blindly inspected for non-critical issues and presumptive errors first by a general domain expert whose results were then confirmed or rejected by a highly experienced anatomical ontology domain expert. Reported issues were subsequently reviewed by Uberon's curators. RESULTS: Overlapping concepts in Uberon's disjoint partial-area taxonomy exhibited a significantly higher rate of all issues. Clear-cut presumptive errors trended similarly but did not reach statistical significance. A sub-analysis of overlapping concepts with three or more relationship types indicated a much higher rate of issues. CONCLUSIONS: Overlapping concepts from Uberon's disjoint abstraction network are quite likely (up to 28.9%) to exhibit issues. The results suggest that the methodology can transfer well between same family ontologies. Although Uberon exhibited relatively few overlapping concepts, the methodology can be combined with other semantic indicators to expand the process to other concepts within the ontology that will generate high yields of discovered issues.


Subjects
Semantics , Systematized Nomenclature of Medicine , Biological Ontologies
15.
Article in English | MEDLINE | ID: mdl-29375930

ABSTRACT

The Unified Medical Language System (UMLS) is an important terminological system. By the policy of its curators, each concept of the UMLS should be assigned the most specific Semantic Types (STs) in the UMLS Semantic Network (SN). Hence, the Semantic Types of most UMLS concepts are assigned at or near the bottom (leaves) of the UMLS Semantic Network. While most ST assignments are correct, some errors do occur. Therefore, Quality Assurance efforts of UMLS curators for ST assignments should concentrate on automatically detected sets of UMLS concepts with higher error rates than random sets. In this paper, we investigate the assignments of top-level semantic types in the UMLS Semantic Network to concepts, identify potentially erroneous assignments, define four categories of errors, and thus help curators of the UMLS avoid these assignment errors. Human experts analyzed samples of concepts assigned 10 of the top-level semantic types and categorized the erroneous ST assignments into these four logical categories. For two thirds of the concepts assigned these 10 top-level semantic types, the assignment is erroneous. Our results demonstrate that reviewing top-level semantic type assignments to concepts provides an effective way to perform UMLS quality assurance, compared with reviewing a random selection of semantic type assignments.

16.
Stud Health Technol Inform ; 245: 978-982, 2017.
Article in English | MEDLINE | ID: mdl-29295246

ABSTRACT

Maintenance and use of a large ontology, consisting of thousands of knowledge assertions, are hampered by its scope and complexity. It is important to provide tools for summarizing ontology content in order to facilitate users' "big picture" comprehension. We present a parameterized methodology for the semi-automatic summarization of major topics in an ontology, based on a compact summary of the ontology, called an "aggregate partial-area taxonomy", followed by manual enhancement. An experiment is presented to test the effectiveness of such summarization, measured by coverage of a given list of major topics of the corresponding application domain. SNOMED CT's Specimen hierarchy is the test-bed. A domain expert provided a list of topics that serves as a gold standard. The enhanced results show that the aggregate taxonomy covers most of the domain's main topics.


Subjects
Biological Ontologies , Systematized Nomenclature of Medicine , Automation , Humans , Knowledge Bases
17.
Stud Health Technol Inform ; 245: 1330, 2017.
Article in English | MEDLINE | ID: mdl-29295411

ABSTRACT

In previous research we have shown that hierarchically complex overlapping concepts have a higher error rate than control concepts. In this poster we show an example from Neoplasm concepts of the NCI Thesaurus (NCIt) demonstrating that erroneous overlapping concepts, reflected in the partial-area units of a partial-area taxonomy, display visual complexity. Furthermore, correcting these erroneous concepts causes visual simplification.


Subjects
Neoplasms , Controlled Vocabulary , Humans , National Cancer Institute (U.S.) , United States
18.
J Biomed Inform ; 57: 278-87, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26260003

ABSTRACT

The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is an extensive reference terminology with an attendant amount of complexity. It has been updated continuously and revisions have been released semi-annually to meet users' needs and to reflect the results of quality assurance (QA) activities. Two measures based on structural features are proposed to track the effects of both natural terminology growth and QA activities based on aspects of the complexity of SNOMED CT. These two measures, called the structural density measure and accumulated structural measure, are derived based on two abstraction networks, the area taxonomy and the partial-area taxonomy. The measures derive from attribute relationship distributions and various concept groupings that are associated with the abstraction networks. They are used to track the trends in the complexity of structures as SNOMED CT changes over time. The measures were calculated for consecutive releases of five SNOMED CT hierarchies, including the Specimen hierarchy. The structural density measure shows that natural growth tends to move a hierarchy's structure toward a more complex state, whereas the accumulated structural measure shows that QA processes tend to move a hierarchy's structure toward a less complex state. It is also observed that both the structural density and accumulated structural measures are useful tools to track the evolution of an entire SNOMED CT hierarchy and reveal internal concept migration within it.


Subjects
Data Accuracy , Systematized Nomenclature of Medicine
19.
Article in English | MEDLINE | ID: mdl-25422719

ABSTRACT

BACKGROUND: The Refined Semantic Network (RSN) for the UMLS was previously introduced to complement the UMLS Semantic Network (SN). The RSN partitions the UMLS Metathesaurus (META) into disjoint groups of concepts. Each such group is semantically uniform. However, the RSN was initially an order of magnitude larger than the SN, which is undesirable since, to be useful, a semantic network should be compact. Most semantic types in the RSN represent combinations of semantic types in the UMLS SN. Such a "combination semantic type" is called an Intersection Semantic Type (IST). Many ISTs are assigned to very few concepts. Moreover, when reviewing those concepts, many semantic type assignment inconsistencies were found. After correcting those inconsistencies, many ISTs, among them some that contradicted UMLS rules, disappeared, which made the RSN smaller. OBJECTIVE: The authors performed a longitudinal study with the goal of reducing the RSN to a compact size. This goal was achieved by correcting inconsistencies and errors in the IST assignments in the UMLS, which additionally helped identify and correct ambiguities, inconsistencies, and errors in source terminologies widely used in the realm of public health. METHODS: In this paper, we discuss the process and steps employed in this longitudinal study and the intermediate results for different stages. The sculpting process includes removing redundant semantic type assignments, expanding semantic type assignments, and removing illegitimate ISTs by auditing ISTs of small extents. However, the emphasis of this paper is not on the auditing methodologies employed during the process, since they were introduced in earlier publications, but on the strategy of employing them in order to transform the RSN into a compact network. For this paper we also performed a comprehensive audit of 168 "small ISTs" in the 2013AA version of the UMLS to finalize the longitudinal study.
RESULTS: Over the years it was found that the editors of the UMLS introduced some new inconsistencies that resulted in the reintroduction of unwarranted ISTs that had already been eliminated as a result of their previous corrections. Because of that, the transformation of the RSN into a compact network covering all necessary categories for the UMLS was slowed down. The corrections suggested by an audit of the 2013AA version of the UMLS achieve a compact RSN of the same order of magnitude as the UMLS SN. The number of ISTs has been reduced to 336. We also demonstrate how auditing the semantic type assignments of UMLS concepts can expose other modeling errors in the UMLS source terminologies, e.g., SNOMED CT, LOINC, and RxNorm, that are important for health informatics. Such errors would otherwise stay hidden. CONCLUSIONS: It is hoped that the UMLS curators will implement all required corrections and use the RSN along with the SN when maintaining and extending the UMLS. When used correctly, the RSN will help prevent the accidental introduction of inconsistent semantic type assignments into the UMLS. Furthermore, in this way the RSN will help expose other hidden errors and inconsistencies in health informatics terminologies that are sources of the UMLS. Notably, the development of the RSN materializes the deeper, more refined Semantic Network for the UMLS that its designers envisioned originally but had not implemented.

20.
J Biomed Inform ; 47: 192-8, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24239752

ABSTRACT

OBJECTIVE: To quantify the presence of and evaluate an approach for detection of inconsistencies in the formal definitions of SNOMED CT (SCT) concepts utilizing a lexical method. MATERIAL AND METHOD: Utilizing SCT's Procedure hierarchy, we algorithmically formulated similarity sets: groups of concepts with similar lexical structure of their fully specified name. We formulated five random samples, each with 50 similarity sets: each of four samples was based on one parameter (number of parents, attributes, groups, or all of the former), and the fifth was a randomly selected control sample. All samples' sets were reviewed for types of formal definition inconsistencies: hierarchical, attribute assignment, attribute target values, groups, and definitional. RESULTS: For the Procedure hierarchy, 2111 similarity sets were formulated, covering 18.1% of eligible concepts. The evaluation revealed that 38% (Control) to 70% (Different relationships) of similarity sets within the samples exhibited significant inconsistencies. The rate of inconsistencies for the sample with different relationships was highly significant compared to Control, as were the numbers of attribute assignment and hierarchical inconsistencies within their respective samples. DISCUSSION AND CONCLUSION: While, at this time of the HITECH initiative, the formal definitions of SCT are only a minor consideration, in the grand scheme of sophisticated, meaningful use of captured clinical data, they are essential. However, a significant portion of the concepts in the most semantically complex hierarchy of SCT, the Procedure hierarchy, are modeled inconsistently in a manner that affects their computability. Lexical methods can efficiently identify such inconsistencies and possibly allow for their algorithmic resolution.
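
As an illustration of the lexical grouping step, the sketch below builds similarity sets from a list of names. The abstract says only that the sets group concepts "with similar lexical structure"; this sketch assumes one simple reading, namely that two names are similar when they have the same number of words and differ in exactly one word position. The function name and the sample procedure names are hypothetical, and the published algorithm may differ.

```python
from collections import defaultdict


def similarity_sets(names):
    """Group names of equal word count that differ in one word position.

    Each name is filed once per word position under a key made of the
    words before and after that position. Names sharing a key are
    identical everywhere except at that one position.
    """
    buckets = defaultdict(set)
    for name in names:
        words = name.lower().split()
        for i in range(len(words)):
            key = (tuple(words[:i]), tuple(words[i + 1:]))
            buckets[key].add(name)
    # Keep only groups with at least two members.
    return [sorted(group) for group in buckets.values() if len(group) > 1]


# Hypothetical procedure names, for illustration only.
names = [
    "Biopsy of left kidney",
    "Biopsy of right kidney",
    "Excision of left kidney",
    "Repair of heart valve",
]

for group in similarity_sets(names):
    print(group)
```

A name can belong to more than one set, since it may differ from one neighbor in its first word and from another in its third. Under this reading, the members of each set would then be compared on their parents, attributes, and relationship groups to look for modeling inconsistencies.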


Subjects
Algorithms , Semantics , Systematized Nomenclature of Medicine , Humans , Meaningful Use , Myocardial Infarction/therapy , Myocardial Ischemia/therapy , Quality Assurance, Health Care , United States