Results 1-19 of 19
1.
J Biomed Inform ; 58 Suppl: S164-S170, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26279500

ABSTRACT

In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds 108.9 billion dollars. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advances in text analytics have opened up new possibilities for using the rich information in electronic medical records (EMRs) to identify relevant risk factors. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system by leveraging existing tools and resources, both public and proprietary. Our system was a hybrid of several machine-learning and rule-based components. The system achieved an overall F1 score of 0.9185, with a recall of 0.9409 and a precision of 0.8972.
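A quick arithmetic check: the overall F1 score reported above should equal the harmonic mean of the reported precision and recall. The minimal Python sketch below assumes the standard balanced F1 computed from the overall figures. The abstract does not state how scores were averaged across risk-factor categories.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the abstract.
print(round(f1_score(0.8972, 0.9409), 4))  # 0.9185
```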


Subjects
Cardiovascular Diseases/epidemiology , Data Mining/methods , Diabetes Complications/epidemiology , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Aged , California/epidemiology , Cardiovascular Diseases/diagnosis , Cohort Studies , Comorbidity , Computer Security , Confidentiality , Diabetes Complications/diagnosis , Female , Humans , Incidence , Longitudinal Studies , Male , Middle Aged , Pattern Recognition, Automated/methods , Risk Assessment/methods , Vocabulary, Controlled
2.
BMC Med Inform Decis Mak ; 15 Suppl 1: S2, 2015.
Article in English | MEDLINE | ID: mdl-26045009

ABSTRACT

BACKGROUND: Parsing, which generates the syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain, including medicine. Although parsers developed for general English, such as the Stanford parser, have been applied to clinical text, there have been no formal evaluations or comparisons of their performance in the medical domain. METHODS: In this study, we investigated the performance of three state-of-the-art parsers (the Stanford parser, the Bikel parser, and the Charniak parser) on the following two datasets: (1) a Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank-based guideline; and (2) the MiPACQ Treebank, which was developed from pathology notes and clinical notes and contains 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers on the clinical Treebanks and evaluated their performance using 10-fold cross-validation. Finally, we re-trained the parsers on the clinical Treebanks combined with the Penn Treebank. RESULTS: The original parsers performed worse on clinical text (Bracketing F-measure in the range of 66.6%-70.3%) than on general English text. After retraining on the clinical Treebanks, all parsers improved. The Stanford parser performed best, reaching the highest Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus under 10-fold cross-validation. When the clinical Treebanks were combined with the Penn Treebank, the Charniak parser achieved the highest Bracketing F-measure on progress notes (73.53%), and the Stanford parser achieved the highest on the MiPACQ corpus (84.15%). CONCLUSIONS: Our study demonstrates that re-training on clinical Treebanks is critical for improving the performance of general English parsers on clinical text. It also suggests that combining clinical and open-domain corpora might achieve optimal performance for parsing clinical text.
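The Bracketing F-measure compares the constituents (labeled spans) of a predicted parse tree against those of the gold-standard tree. The Python sketch below is a simplified, PARSEVAL-style illustration. It assumes constituents are represented as (label, start, end) token spans. Standard evaluations are usually run with the EVALB tool, which applies further conventions (for example, for punctuation) that this sketch omits. The example sentence and brackets are hypothetical.

```python
from collections import Counter

# A constituent as (label, start, end) over token offsets, end exclusive.
Bracket = tuple[str, int, int]


def bracketing_prf(gold: list[Bracket], pred: list[Bracket]) -> tuple[float, float, float]:
    """Labeled bracketing precision, recall, and F-measure for one sentence."""
    # Multiset intersection: each gold bracket can be matched at most once.
    matched = sum((Counter(gold) & Counter(pred)).values())
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical parses of "patient denies chest pain" (tokens 0-3).
gold = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)]
pred = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 3)]
print(bracketing_prf(gold, pred))  # (0.75, 0.75, 0.75)
```

Over a whole test set, the matched, predicted, and gold bracket counts are normally summed across sentences before dividing, rather than averaging per-sentence scores.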


Subjects
Linguistics/methods , Medical Informatics/methods , Natural Language Processing , Humans
3.
J Rural Health ; 38(4): 908-915, 2022 09.
Article in English | MEDLINE | ID: mdl-35261092

ABSTRACT

PURPOSE: Rural populations are disproportionately affected by the COVID-19 pandemic. We characterized urban-rural disparities in patient portal messaging utilization for COVID-19 among patients who used the portal during the early stage of the pandemic in the Midwest. METHODS: We collected over 1 million portal messages generated by midwestern Mayo Clinic patients from February to August 2020. We analyzed patient-generated messages (PGMs) on COVID-19 by urban-rural locality and incorporated patients' sociodemographic factors into the analysis. FINDINGS: The urban-rural ratios of portal users, message senders, and COVID-19 message senders were 1.18, 1.31, and 1.79, respectively, indicating greater use among urban patients. The urban-rural ratio of PGMs on COVID-19 (1.69) was higher than that of general PGMs (1.43). The urban-rural ratios of messaging were 1.72-1.85 for COVID-19-related care and 1.43-1.66 for other COVID-19-related health care issues. Compared with urban patients, rural patients sent fewer messages about COVID-19 diagnosis and treatment but more messages about other COVID-19-related health care concerns (eg, isolation and anxiety). Among rural patients, frequent senders of COVID-19-related messages were aged 40 years or older, women, married, and White. CONCLUSIONS: In this Midwest health system, rural patients were less likely to use patient online services during a pandemic, and their reasons for using them differed from those of urban patients. The results suggest opportunities for increasing equity in rural patient engagement with patient portals for COVID-19, particularly among minority populations. Public health intervention strategies could target the reasons why rural patients might seek health care during a pandemic, such as social isolation and anxiety.


Subjects
COVID-19 , Adult , COVID-19/epidemiology , COVID-19 Testing , Female , Humans , Pandemics , Patient Participation , Rural Population
4.
J Biomed Inform ; 44(5): 805-14, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21549857

ABSTRACT

Biomedical natural language processing (BioNLP) is a useful technique that unlocks valuable information stored in textual data for practice and/or research. Syntactic parsing is a critical component of BioNLP applications that rely on correctly determining the sentence and phrase structure of free text. In addition to dealing with a vast number of domain-specific terms, a robust biomedical parser needs to model the semantic grammar to obtain viable syntactic structures. With either a rule-based or a corpus-based approach, the grammar engineering process requires substantial time and expert knowledge, and does not always yield a semantically transferable grammar. To reduce human effort and promote semantic transferability, we propose an automated method for deriving a probabilistic grammar from a training corpus. The corpus consists of concept strings and semantic classes from the Unified Medical Language System (UMLS), a comprehensive terminology resource widely used by the community. The grammar specifies only noun phrases, because most biomedical terminological concepts are nominal. Evaluated on manually parsed clinical notes, the derived grammar achieved a recall of 0.644, a precision of 0.737, and an average cross-bracketing of 0.61. This performance was better than that of a control grammar from which the semantic information had been removed. Error analysis revealed shortcomings that could be addressed to improve performance. The results indicate the feasibility of an approach that automatically incorporates terminology semantics into the building of an operational grammar. The current performance of this unsupervised solution cannot yet adequately replace manual engineering. However, we believe that once the performance issues are addressed, it could serve as an aid in a semi-supervised solution.
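The abstract reports an average cross-bracketing score without defining it. A common reading, taken from PARSEVAL, is the number of predicted constituents whose spans cross a gold constituent, averaged over sentences (lower is better). The minimal Python sketch below assumes that reading and uses hypothetical spans.

```python
Span = tuple[int, int]  # (start, end) token offsets, end exclusive


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap but neither contains the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def crossing_brackets(gold: list[Span], pred: list[Span]) -> int:
    """Count predicted spans that cross at least one gold span."""
    return sum(1 for p in pred if any(crosses(p, g) for g in gold))


def average_crossing(sentences: list[tuple[list[Span], list[Span]]]) -> float:
    """Mean number of crossing brackets per (gold, predicted) sentence pair."""
    return sum(crossing_brackets(g, p) for g, p in sentences) / len(sentences)


# Hypothetical noun phrases: four tokens with two competing internal groupings.
sentences = [
    ([(0, 4), (0, 3), (1, 3)], [(0, 4), (0, 2), (2, 4)]),  # 2 crossing spans
    ([(0, 2)], [(0, 2)]),                                   # exact match
]
print(average_crossing(sentences))  # 1.0
```

Adjacent spans such as (0, 2) and (2, 4) do not cross, and nested spans never cross, so only genuinely incompatible groupings are counted.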


Subjects
Semantics , Terminology as Topic , Information Storage and Retrieval , Unified Medical Language System , Vocabulary, Controlled
5.
J Am Med Inform Assoc ; 27(10): 1625-1638, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32766692

ABSTRACT

OBJECTIVE: The study sought to describe the literature on the development of methods for auditing the Unified Medical Language System (UMLS). It paid particular attention to identifying errors and inconsistencies in the attributes of concepts in the UMLS Metathesaurus. MATERIALS AND METHODS: We applied the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) approach, searching the MEDLINE database and Google Scholar for studies referencing the UMLS and any of several terms related to auditing, error detection, and quality assurance. Articles that met the inclusion criteria were qualitatively analyzed and summarized. RESULTS: Eighty-three studies were reviewed in detail. We first categorized techniques by the aspect they address: concepts, concept names, and synonymy (n = 37); semantic type assignments (n = 36); hierarchical relationships (n = 24); lateral relationships (n = 12); ontology enrichment (n = 8); and ontology alignment (n = 18). We also categorized the methods by their level of automation (ie, automated systematic, automated heuristic, or manual) and the type of knowledge used (ie, intrinsic or extrinsic knowledge). CONCLUSIONS: This study is a comprehensive review of the published methods for auditing the various conceptual aspects of the UMLS. Categorizing the auditing techniques by aspect will give UMLS curators and researchers comprehensive, easy access to this body of knowledge (eg, for auditing lateral relationships in the UMLS). We also reviewed ontology enrichment and alignment techniques because of their critical use of, and impact on, the UMLS.


Subjects
Quality Control , Unified Medical Language System , Computational Heuristics , Semantic Web
6.
Bioinformatics ; 24(17): 1971-3, 2008 Sep 01.
Article in English | MEDLINE | ID: mdl-18625612

ABSTRACT

UNLABELLED: Accurate semantic classification is valuable for text mining and knowledge-based tasks that perform inference based on semantic classes. To benefit applications using the semantic classification of the Unified Medical Language System (UMLS) concepts, we automatically reclassified the concepts based on their lexical and contextual features. The new classification is useful for auditing the original UMLS semantic classification and for building biomedical text mining applications. AVAILABILITY: http://www.dbmi.columbia.edu/~juf7002/reclassify_production


Subjects
Database Management Systems , Information Storage and Retrieval/methods , Natural Language Processing , Terminology as Topic , Unified Medical Language System , Databases, Factual , Semantics
7.
J Biomed Inform ; 42(3): 413-25, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19285571

ABSTRACT

Although controlled biomedical terminologies have been with us for centuries, it is only in the last couple of decades that close attention has been paid to their quality. This attention has led to the development of auditing methods that apply formal methods to assessing whether terminologies are complete and accurate. We performed an extensive literature review to identify published descriptions of these methods and created a framework for characterizing them. The framework considers manual, systematic, and heuristic methods that use knowledge (within or external to the terminology) to measure quality factors of different aspects of the terminology content (terms, semantic classification, and semantic relationships). The quality factors examined included concept orientation, consistency, non-redundancy, soundness, and comprehensive coverage. We reviewed 130 studies retrieved by a keyword search of publications in PubMed and present our assessment of how they fit into the framework. We also identify which terminologies have been audited with these methods and provide examples to illustrate each part of the framework.


Subjects
Management Audit/methods , Medical Informatics , Terminology as Topic
8.
Bioinformatics ; 23(8): 1015-22, 2007 Apr 15.
Article in English | MEDLINE | ID: mdl-17314123

ABSTRACT

MOTIVATION: The ambiguity of biomedical entities, particularly gene symbols, is a major challenge for text-mining systems in the biomedical domain. Existing knowledge sources, such as Entrez Gene and the MEDLINE database, contain information about the characteristics of a particular gene that could be used to disambiguate gene symbols. RESULTS: For each gene, we create a profile with different types of information automatically extracted from related MEDLINE abstracts and readily available annotated knowledge sources. We apply the gene profiles to the disambiguation task via an information retrieval method. This method ranks candidate gene profiles by their similarity scores to the context in which the ambiguous gene is mentioned. The gene profile with the highest similarity score is then chosen as the correct sense. We evaluated the method on three automatically generated test sets for mouse, fly, and yeast, respectively. The method's best precision was 93.9% for mouse, 77.8% for fly, and 89.5% for yeast. AVAILABILITY: The testing data sets and disambiguation programs are available at http://www.dbmi.columbia.edu/~hux7002/gsd2006
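The ranking step described above can be illustrated with a simple vector-space retrieval sketch in Python. The abstract does not specify the representation or the similarity measure, so this sketch assumes bag-of-words term counts and cosine similarity. The gene identifiers and profile texts are hypothetical placeholders; the real profiles combine several kinds of information from MEDLINE abstracts and annotated knowledge sources.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_senses(context: str, profiles: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate gene profiles by similarity to the mention's context."""
    ctx = bag_of_words(context)
    scored = [(gene, cosine(ctx, bag_of_words(text))) for gene, text in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical profiles for two genes sharing one ambiguous symbol.
profiles = {
    "GENE_A": "kinase phosphorylation cell cycle mitosis",
    "GENE_B": "potassium ion channel membrane transport",
}
context = "the kinase controls phosphorylation during the cell cycle"
best_gene, best_score = rank_senses(context, profiles)[0]
print(best_gene)  # GENE_A
```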


Subjects
Artificial Intelligence , Databases, Genetic , Genes/genetics , MEDLINE , Natural Language Processing , Periodicals as Topic , Terminology as Topic , Abstracting and Indexing/methods , Database Management Systems , Information Storage and Retrieval/methods , Vocabulary, Controlled
9.
BMC Bioinformatics ; 8: 264, 2007 Jul 24.
Article in English | MEDLINE | ID: mdl-17650333

ABSTRACT

BACKGROUND: Biomedical ontologies are critical for integrating data from diverse sources and for use by knowledge-based biomedical applications, especially natural language processing and associated mining and reasoning systems. The effectiveness of these systems depends heavily on the quality of the ontological terms and their classifications. To assist in developing and maintaining the ontologies objectively, we propose automatic approaches to classify and/or validate their semantic categories. In previous work, we developed an approach that uses contextual syntactic features obtained from a large domain corpus to reclassify and validate concepts of the Unified Medical Language System (UMLS), a comprehensive resource of biomedical terminology. In this paper, we introduce another classification approach based on the words of the concept strings and compare it with the contextual syntactic approach. RESULTS: The string-based approach achieved an error rate of 0.143, with a mean reciprocal rank of 0.907. The context-based and string-based approaches were found to be complementary, and the error rate was reduced further by applying a linear combination of the two classifiers. The advantage of combining the two approaches was most evident on test data with sufficient contextual features. On these data, the combination achieved the lowest error rate of 0.055 and a mean reciprocal rank of 0.969. CONCLUSION: The lexical features provide another semantic dimension, in addition to syntactic contextual features, that supports the classification of ontological concepts. The classification errors of each dimension can be further reduced through an appropriate combination of the complementary classifiers.
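The combination step and the two evaluation measures can be sketched in Python. The sketch assumes that each classifier outputs a score per semantic class on a comparable scale and that the two are combined as weight × A + (1 − weight) × B. The abstract does not give the weights or the score normalization, and the class labels and scores below are hypothetical. Error rate counts items whose correct class is not ranked first; mean reciprocal rank averages 1/rank of the correct class.

```python
Scores = dict[str, float]  # semantic class -> classifier score


def combine(a: Scores, b: Scores, weight: float) -> Scores:
    """Linear combination of two classifiers' per-class scores."""
    classes = a.keys() | b.keys()
    return {c: weight * a.get(c, 0.0) + (1 - weight) * b.get(c, 0.0) for c in classes}


def rank_of(gold: str, scores: Scores) -> int:
    """1-based rank of the gold class; ties broken alphabetically for determinism."""
    ranking = sorted(scores, key=lambda c: (-scores[c], c))
    return ranking.index(gold) + 1


def evaluate(examples: list[tuple[Scores, Scores, str]], weight: float) -> tuple[float, float]:
    """Return (error rate, mean reciprocal rank) for the combined classifier."""
    ranks = [rank_of(gold, combine(a, b, weight)) for a, b, gold in examples]
    error_rate = sum(r != 1 for r in ranks) / len(ranks)
    mrr = sum(1 / r for r in ranks) / len(ranks)
    return error_rate, mrr


# Hypothetical outputs of a string-based (A) and a context-based (B) classifier.
examples = [
    ({"DISO": 0.7, "CHEM": 0.2, "PROC": 0.1}, {"DISO": 0.4, "CHEM": 0.5, "PROC": 0.1}, "DISO"),
    ({"DISO": 0.5, "CHEM": 0.1, "PROC": 0.4}, {"DISO": 0.2, "CHEM": 0.1, "PROC": 0.7}, "PROC"),
]
print(evaluate(examples, 1.0))  # A alone: (0.5, 0.75)
print(evaluate(examples, 0.0))  # B alone: (0.5, 0.75)
print(evaluate(examples, 0.5))  # combined: (0.0, 1.0)
```

In this toy data, each classifier errs on a different item, so the combination corrects both errors. This mirrors the complementarity the abstract describes, but the numbers themselves are illustrative only.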


Subjects
Biomedical Research/classification , Medical Informatics/classification , Terminology as Topic , Biomedical Research/standards , Medical Informatics/standards , Semantics , Software/classification , Software/standards , Unified Medical Language System/classification , Unified Medical Language System/standards
10.
J Am Med Inform Assoc ; 14(4): 467-77, 2007.
Article in English | MEDLINE | ID: mdl-17460124

ABSTRACT

OBJECTIVE: To develop an automated, high-throughput, and reproducible method for reclassifying and validating ontological concepts for natural language processing applications. DESIGN: We developed a distributional similarity approach to classify Unified Medical Language System (UMLS) concepts. Classification models were built for seven broad biomedically relevant semantic classes created by grouping subsets of the UMLS semantic types. We used contextual features based on syntactic properties obtained from two different large corpora and used alpha-skew divergence as the similarity measure. MEASUREMENTS: The test sets were automatically generated from the changes made by the National Library of Medicine to the semantic classification of concepts between the UMLS 2005AA and 2006AA releases. Error rates were calculated and a misclassification analysis was performed. RESULTS: The estimated lowest error rates were 0.198 and 0.116 when the correct classification was required to be covered by our top prediction and top 2 predictions, respectively. CONCLUSION: The results demonstrated that the distributional similarity approach can recommend high-level semantic classifications suitable for use in natural language processing.
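Alpha-skew divergence (Lee's skew divergence) is commonly defined as s_α(q, r) = D(r ‖ α·q + (1 − α)·r), where D is the Kullback-Leibler divergence. Mixing a little of r into q keeps the divergence finite when q lacks some of r's features. The Python sketch below assumes that definition with α = 0.99. It compares a concept's feature distribution (r) against each class profile (q) and ranks classes by increasing divergence to compute the top-k error rate. The dependency-feature names, the class profiles, and the nearest-profile decision rule are hypothetical simplifications, not the study's exact procedure.

```python
import math

Dist = dict[str, float]  # feature -> probability


def skew_divergence(r: Dist, q: Dist, alpha: float = 0.99) -> float:
    """KL(r || alpha*q + (1 - alpha)*r); finite because the mixture contains r."""
    total = 0.0
    for feature, p_r in r.items():
        if p_r > 0:
            mixture = alpha * q.get(feature, 0.0) + (1 - alpha) * p_r
            total += p_r * math.log(p_r / mixture)
    return total


def rank_classes(concept: Dist, profiles: dict[str, Dist]) -> list[str]:
    """Classes ordered from most to least similar (smallest divergence first)."""
    return sorted(profiles, key=lambda c: (skew_divergence(concept, profiles[c]), c))


def top_k_error(test_set: list[tuple[Dist, str]], profiles: dict[str, Dist], k: int) -> float:
    """Fraction of concepts whose gold class is missing from the top-k predictions."""
    misses = sum(gold not in rank_classes(concept, profiles)[:k] for concept, gold in test_set)
    return misses / len(test_set)


# Hypothetical class profiles over syntactic-dependency features.
profiles = {
    "DISORDER": {"obj_of:treat": 0.5, "mod:severe": 0.4, "obj_of:perform": 0.1},
    "CHEMICAL": {"obj_of:administer": 0.6, "subj_of:bind": 0.3, "obj_of:treat": 0.1},
    "PROCEDURE": {"obj_of:perform": 0.7, "obj_of:schedule": 0.3},
}
test_set = [
    ({"obj_of:treat": 0.6, "mod:severe": 0.4}, "DISORDER"),
    ({"obj_of:perform": 0.5, "obj_of:treat": 0.5}, "PROCEDURE"),
]
print(top_k_error(test_set, profiles, k=1))  # 0.5
print(top_k_error(test_set, profiles, k=2))  # 0.0
```

The second test concept shows why the abstract reports both top-1 and top-2 error rates: its gold class ranks second, so it counts as an error at k = 1 but not at k = 2.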


Subjects
Classification/methods , Natural Language Processing , Unified Medical Language System , Semantics , Terminology as Topic
11.
Stud Health Technol Inform ; 129(Pt 1): 519-23, 2007.
Article in English | MEDLINE | ID: mdl-17911771

ABSTRACT

The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and is therefore valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects the performance of these systems. It is therefore desirable to automatically validate UMLS concepts or, when necessary, to semantically reclassify them. We applied a distributional similarity method based on syntactic dependencies and α-skew divergence to classify concepts in the T033 Finding class, in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created based on the majority annotation of three experts. The top prediction achieved a precision of 0.54 and a recall of 0.654. The top 2 predictions achieved a precision of 0.64 and a recall of 0.769. Error analysis revealed problems in the current method and provided insight into future improvements.


Subjects
Classification/methods , Natural Language Processing , Unified Medical Language System , Semantics
12.
AMIA Annu Symp Proc ; 2017: 689-695, 2017.
Article in English | MEDLINE | ID: mdl-29854134

ABSTRACT

Dietary supplements remain a relatively underexplored source for drug repurposing. A systematic approach to soliciting responses from a large consumer population is desirable to speed up innovation. We tested a workflow that mines unexpected benefits of dietary supplements from a massive number of consumer reviews. A (non-exhaustive) list of regular expressions was used to screen over 2 million reviews of health and personal care products. The matched reviews were manually analyzed, and one supplement-disease pair was linked to biological databases to enrich the hypothesized association. The regular expressions found 169 candidate reviews, of which 45.6% described unexpected benefits of certain dietary supplements. The manual analysis showed some of the supplement-disease associations to be novel or in agreement with evidence published later in the literature. The hypothesis enrichment identified meaningful function similarity between the supplement and the disease. The results demonstrated the value of the workflow in identifying candidates for supplement repurposing.
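A regular-expression screen like the one described above can be sketched in Python with the standard `re` module. The patterns below are hypothetical illustrations only; the abstract does not list the study's expressions. As in the study, matches are only candidates that still require manual review, because fewer than half of the matched reviews actually described an unexpected benefit.

```python
import re

# Hypothetical cue patterns for "unexpected benefit" language in reviews.
PATTERNS = [
    re.compile(
        r"\b(?:unexpected(?:ly)?|surprising(?:ly)?)\b.{0,80}?\b(?:help(?:s|ed)?|improv(?:es|ed)|cured)\b",
        re.IGNORECASE,
    ),
    re.compile(r"\b(?:side|added|bonus) benefit\b", re.IGNORECASE),
]


def screen(reviews: list[str]) -> list[str]:
    """Return the reviews that match at least one cue pattern."""
    return [text for text in reviews if any(p.search(text) for p in PATTERNS)]


reviews = [
    "Bought this for my joints, but surprisingly it also helped my migraines.",
    "Great taste and fast shipping.",
    "Added benefit: my skin cleared up after a month.",
]
for hit in screen(reviews):
    print(hit)
```

Keeping the patterns broad trades precision for recall, which suits a screening step whose output is reviewed by hand anyway.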


Subjects
Consumer Behavior , Crowdsourcing , Data Mining/methods , Dietary Supplements , Drug Repositioning , Commerce , Humans
13.
J Healthc Eng ; 2017: 3818302, 2017.
Article in English | MEDLINE | ID: mdl-29065591

ABSTRACT

The exposome is a critical dimension of the precision medicine paradigm. Effective representation of exposomics knowledge is instrumental to melding nongenetic factors into data analytics for clinical research. There is still limited work on (1) modeling exposome entities and relations with proper integration into mainstream ontologies and (2) systematically studying their presence in clinical contexts. Using selected ontological relations, we developed a template-driven approach to identifying exposome concepts from the Unified Medical Language System (UMLS). The derived concepts were evaluated in terms of literature coverage and their ability to assist in annotating clinical text. The generated semantic model represents rich domain knowledge about exposure events (454 pairs of relations between exposure and outcome). Additionally, a list of 5667 disorder concepts with microbial etiology was created for inferring pathogen exposures. The model consistently covered about 90% of the PubMed literature on exposure-induced iatrogenic diseases over 10 years (2001-2010). The model improved the efficiency of exposome annotation in clinical text by filtering out 78% of irrelevant machine annotations. Analysis of 50 annotated discharge summaries helped advance our understanding of exposome information in clinical text. This pilot study demonstrated the feasibility of semiautomatically developing a useful semantic resource for exposomics.


Subjects
Biological Ontologies , Environmental Exposure , Iatrogenic Disease/prevention & control , Semantics , Unified Medical Language System , Humans , Pilot Projects
14.
Biomed Inform Insights ; 8(Suppl 1): 1-11, 2016.
Article in English | MEDLINE | ID: mdl-27375358

ABSTRACT

In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis on the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights about the task characteristics and challenges for text analytics to guide future research.

15.
J Am Med Inform Assoc ; 20(6): 1168-77, 2013.
Article in English | MEDLINE | ID: mdl-23907286

ABSTRACT

OBJECTIVE: To develop, evaluate, and share (1) syntactic parsing guidelines for clinical text, with a new approach to handling ill-formed sentences, and (2) a clinical Treebank annotated according to the guidelines; and to document the process and findings for readers with similar interests. METHODS: Using random samples from a shared natural language processing challenge dataset, we developed a handbook of domain-customized syntactic parsing guidelines through iterative annotation and adjudication between two institutions. Special considerations for handling ill-formed sentences, which are common in clinical text, were incorporated into the guidelines. Intra- and inter-annotator agreement rates were used to evaluate consistency in following the guidelines. We report quantitative and qualitative properties of the annotated Treebank, as well as its use to retrain a statistical parser. RESULTS: A supplement to the Penn Treebank II guidelines was developed for annotating clinical sentences. After three iterations of annotation and adjudication on 450 sentences, the annotators reached an inter-annotator F-measure agreement rate of 0.930 (intra-annotator rate: 0.948) on a final independent set. A total of 1100 sentences from progress notes, exhibiting domain-specific linguistic features, were annotated. A statistical parser retrained with combined general English (mainly news text) annotations and our annotations achieved an accuracy of 0.811. This accuracy was higher than that of models trained on either general or clinical sentences alone. Both the guidelines and the syntactic annotations are available at https://sourceforge.net/projects/medicaltreebank. CONCLUSIONS: We developed guidelines for parsing clinical text and annotated a corpus accordingly. The high intra- and inter-annotator agreement rates showed decent consistency in following the guidelines. The corpus proved useful for retraining a statistical parser, which achieved moderate accuracy.


Subjects
Electronic Health Records , Guidelines as Topic , Linguistics , Natural Language Processing
16.
AMIA Annu Symp Proc ; 2011: 382-91, 2011.
Article in English | MEDLINE | ID: mdl-22195091

ABSTRACT

Part-of-speech (POS) tagging is a fundamental step required by various NLP systems. Training a POS tagger relies on annotations of sufficient quality. However, the annotation process is both knowledge-intensive and time-consuming in the clinical domain. A promising solution appears to be for institutions to share their annotation efforts, yet there is little research on the associated issues. We performed experiments to understand how POS tagging performance is affected by using a pre-trained tagger versus raw training data across different institutions. We manually annotated a set of clinical notes at Kaiser Permanente Southern California (KPSC) and a set from the University of Pittsburgh Medical Center (UPMC). We then trained and tested POS taggers in intra- and inter-institution settings. The cTAKES POS tagger was also included in the comparison to represent a tagger partially trained on notes from a third institution, Mayo Clinic at Rochester. Intra-institution 5-fold cross-validation estimated an accuracy of 0.953 on the KPSC notes and 0.945 on the UPMC notes. Trained purely on KPSC notes, the tagger achieved an accuracy of 0.897 on UPMC notes. Trained purely on UPMC notes, it achieved an accuracy of 0.904 on KPSC notes. The cTAKES tagger, pre-trained on Mayo Clinic notes, achieved an accuracy of 0.881 on KPSC notes and 0.883 on UPMC notes. After UPMC annotations were added to the KPSC training data, the average accuracy on the KPSC test notes increased to 0.965. After KPSC annotations were added to the UPMC training data, the average accuracy on the UPMC test notes increased to 0.953. The results indicated two things. First, the performance of pre-trained POS taggers dropped by about 5% when they were applied directly across institutions. Second, mixing in annotations from another institution that followed the same guideline increased tagging accuracy by about 1%. Our findings suggest that, for the POS tagging task, institutions can benefit more from sharing raw annotations than from sharing pre-trained models. We believe the study could also provide general insights into cross-institution data sharing for other types of NLP tasks.
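The summary figures at the end of the abstract can be recomputed from the reported accuracies. The Python sketch below assumes that "about 5%" and "about 1%" refer to absolute differences in accuracy (percentage points). The abstract does not say whether the differences are absolute or relative.

```python
# Accuracies reported in the abstract, keyed by the institution whose notes were tested.
intra = {"KPSC": 0.953, "UPMC": 0.945}  # 5-fold cross-validation within one institution
cross = {"KPSC": 0.904, "UPMC": 0.897}  # tagger trained only on the other institution's notes
mixed = {"KPSC": 0.965, "UPMC": 0.953}  # own training data plus the other institution's annotations


def point_change(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Change in accuracy, in percentage points, for each tested institution."""
    return {site: round((after[site] - before[site]) * 100, 1) for site in before}


print(point_change(intra, cross))  # {'KPSC': -4.9, 'UPMC': -4.8}
print(point_change(intra, mixed))  # {'KPSC': 1.2, 'UPMC': 0.8}
```

Both cross-institution drops are just under 5 points, and both gains from mixing annotations are roughly 1 point. This is consistent with the summary in the abstract.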


Subjects
Electronic Health Records , Linguistics , Medical Record Linkage/methods , Natural Language Processing , Medical Records Systems, Computerized
17.
AMIA Annu Symp Proc ; 2009: 183-7, 2009 Nov 14.
Article in English | MEDLINE | ID: mdl-20351846

ABSTRACT

Word sense disambiguation (WSD) determines the correct meaning of a word that has more than one meaning. It is a critical step in biomedical natural language processing, because information in text can be interpreted correctly only if the meanings of its component terms are first identified correctly. Quality evaluation sets are important to WSD because they can serve as representative samples for developing automatic programs and as referees for comparing different WSD programs. To help create quality test sets for WSD, we developed a MeSH-based automatic sense-tagging method that preferentially annotates terms that are topical to the text. Preliminary results were promising and revealed important issues to be addressed in biomedical WSD research. We also suggest that, with cross-validation by 2 or 3 annotators, the method should be able to efficiently generate quality WSD test sets. An online supplement is available at: http://www.dbmi.columbia.edu/~juf7002/AMIA09.


Subjects
Abstracting and Indexing , Medical Subject Headings , Natural Language Processing , Electronic Data Processing
18.
AMIA Annu Symp Proc ; : 177-81, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18998821

ABSTRACT

Accurate concept identification is crucial to biomedical natural language processing. However, ambiguity is common when mapping terms to biomedical concepts (one term can be mapped to several concepts). One approach to disambiguation that is cost-effective in terms of training is semantic classification of the ambiguous terms. This works provided that the semantic classes of the candidate concepts are available and all different. We propose such a semantic classification-based method to disambiguate ambiguous mappings with different semantic type(s); it can be used with any program that maps terms to UMLS concepts. Classifiers for the semantic types were built using abundant features extracted from a very large corpus with terms mapped to UMLS concepts. The method achieved a precision of 0.709, with unique advantages not achievable by other comparable methods. Our results also demonstrate a need to further investigate the complementary properties of different methods.
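The decision rule described above can be sketched in Python: among the candidate concepts, choose the one whose semantic type a classifier scores highest for the term's context. The concept identifiers, semantic types, and classifier scores below are hypothetical. The sketch assumes that some semantic-type classifier has already produced per-type scores for the context. It declines to decide when candidate types coincide, reflecting the precondition that the semantic classes must all be different.

```python
from typing import Optional


def disambiguate(candidates: dict[str, str], type_scores: dict[str, float]) -> Optional[str]:
    """Pick the candidate concept whose semantic type scores highest in context.

    candidates maps a concept identifier to its semantic type; type_scores holds
    a semantic-type classifier's scores for the ambiguous term's context.
    """
    types = list(candidates.values())
    if len(set(types)) != len(types):
        return None  # Types are not all different, so they cannot separate the candidates.
    return max(candidates, key=lambda concept: type_scores.get(candidates[concept], 0.0))


# Hypothetical mapping of the term "cold" in "presented with a cold and cough".
candidates = {"CONCEPT_1": "Disease or Syndrome", "CONCEPT_2": "Natural Phenomenon or Process"}
type_scores = {"Disease or Syndrome": 0.8, "Natural Phenomenon or Process": 0.2}
print(disambiguate(candidates, type_scores))  # CONCEPT_1
```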


Subjects
Algorithms , Artificial Intelligence , Natural Language Processing , Pattern Recognition, Automated/methods , Semantics , Vocabulary, Controlled
19.
AMIA Annu Symp Proc ; : 231-5, 2007 Oct 11.
Article in English | MEDLINE | ID: mdl-18693832

ABSTRACT

Semantic classification is important for biomedical terminologies and the many applications that depend on them. Previously we developed two classifiers for 8 broad clinically relevant classes to reclassify and validate UMLS concepts. We found them to be complementary, and then combined them using a manual approach. In this paper, we extended the classifiers by adding an "other" class to categorize concepts not belonging to any of the 8 classes. In addition, we focused on automating the method for combining the two classifiers by training a meta-classifier that performs dynamic combination to exploit the strength of each classifier. The automated method performed as well as manual combination, achieving classification accuracy of about 0.81.


Subjects
Unified Medical Language System/classification , Biomedical Research/classification , Medical Informatics/classification , Semantics , Terminology as Topic