2.
Article in English | MEDLINE | ID: mdl-32472120

ABSTRACT

Natural language processing (NLP) plays a vital role in modern medical informatics. It converts narrative text or unstructured data into knowledge by analyzing and extracting concepts. A comprehensive lexical system is the foundation for the success of NLP applications and an essential component at the beginning of the NLP pipeline. The SPECIALIST Lexicon and Lexical Tools, distributed by the National Library of Medicine as one of the Unified Medical Language System Knowledge Sources, provide an underlying resource for many NLP applications. This article reports recent developments in three key components of the Lexicon. The core NLP operation of Unified Medical Language System concept mapping is used to illustrate the importance of these developments. Our objective is to provide a generic, broad-coverage, and robust lexical system for NLP applications. A novel multiword approach and other planned developments are proposed.

3.
J Am Med Inform Assoc ; 26(3): 211-218, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30668712

ABSTRACT

Objective: Automated understanding of consumer health inquiries might be hindered by misspellings. To detect and correct various types of spelling errors in consumer health questions, we developed a distributable spell-checking tool, CSpell, that handles nonword errors, real-word errors, word boundary infractions, punctuation errors, and combinations of the above. Methods: We developed a novel approach of using dual embedding within Word2vec for context-dependent corrections. This technique was used in combination with dictionary-based corrections in a 2-stage ranking system. We also developed various splitters and handlers to correct word boundary infractions. All correction approaches are integrated to handle errors in consumer health questions. Results: Our approach achieves an F1 score of 80.93% and 69.17% for spelling error detection and correction, respectively. Discussion: The dual-embedding model shows a significant improvement (9.13%) in F1 score compared with the general practice of using cosine similarity with word vectors in Word2vec for context ranking. Our 2-stage ranking system shows a 4.94% improvement in F1 score compared with the best 1-stage ranking system. Conclusion: CSpell improves over the state of the art and provides near real-time automatic misspelling detection and correction in consumer health questions. The software and the CSpell test set are available at https://umlslex.nlm.nih.gov/cSpell.


Subject(s)
Algorithms; Consumer Health Information; Information Seeking Behavior; Language; Natural Language Processing; Consumer Health Informatics; Humans
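The 2-stage ranking idea in the abstract above can be illustrated with a minimal sketch. This is not CSpell's implementation: the toy vocabulary, the hand-made vectors in `TOY_VECTORS`, and the function names are all hypothetical, and the context stage uses plain cosine similarity rather than the dual-embedding model the authors describe. Stage 1 keeps the dictionary candidates closest to the misspelled token by edit distance; stage 2 picks the finalist that best matches the surrounding words.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cosine(u, v) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical toy dictionary with made-up 2-dimensional word vectors.
TOY_VECTORS = {
    "pain": [0.9, 0.1],
    "paint": [0.1, 0.9],
    "chest": [0.8, 0.2],
    "severe": [0.7, 0.3],
}


def correct(token: str, context: list, max_distance: int = 1) -> str:
    """Return the best correction for token, or token itself if none is found."""
    # Stage 1: dictionary candidates within max_distance; keep the closest ones.
    scored = [(edit_distance(token, w), w) for w in TOY_VECTORS]
    scored = [(d, w) for d, w in scored if d <= max_distance]
    if not scored:
        return token
    best = min(d for d, _ in scored)
    finalists = sorted(w for d, w in scored if d == best)

    # Stage 2: rank the finalists against the centroid of the context vectors.
    ctx = [TOY_VECTORS[w] for w in context if w in TOY_VECTORS]
    if not ctx:
        return finalists[0]
    centroid = [sum(col) / len(ctx) for col in zip(*ctx)]
    return max(finalists, key=lambda w: cosine(TOY_VECTORS[w], centroid))


print(correct("painn", ["severe", "chest"]))
```

Here "painn" is one edit away from both "pain" and "paint", so stage 1 cannot decide between them and the context words settle the choice.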
4.
Stud Health Technol Inform ; 245: 501-505, 2017.
Article in English | MEDLINE | ID: mdl-29295145

ABSTRACT

Concept mapping is important in natural language processing (NLP) for bioinformatics. The UMLS Metathesaurus provides a rich synonym thesaurus and is a popular resource for concept mapping. Query expansion using synonyms for subterm substitutions is an effective technique to increase recall for UMLS concept mapping. Synonyms used to substitute subterms are called element synonyms. The completeness and quality of both the element synonyms and the UMLS synonym thesaurus are key to success in such applications. The Lexical Systems Group (LSG) has developed a new system for element synonym acquisition, based on new, enhanced requirements and a design aimed at better performance. The results show: 1) a 36.71-fold growth of synonyms in the Lexicon (lexSynonym) in the 2017 release; 2) improved recall and F1 for concept mapping, with similar precision, when lexSynonym.2017 is used as the source of element synonyms, owing to its broader coverage and better quality.


Subject(s)
Natural Language Processing; Unified Medical Language System; Semantics; Vocabulary, Controlled
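Subterm substitution as described in the abstract above can be sketched in a few lines. The synonym table `ELEMENT_SYNONYMS` and the function `expand_query` are hypothetical illustrations, not the lexSynonym data or the LSG software, and the sketch handles only single-word subterms. Each word is replaced by itself or by one of its element synonyms, and every combination becomes a query variant that could then be looked up in a concept index.

```python
from itertools import product

# Hypothetical element synonyms; a real system would load these from a lexicon.
ELEMENT_SYNONYMS = {
    "kidney": ["renal"],
    "stone": ["calculus"],
    "cancer": ["carcinoma", "tumor"],
}


def expand_query(term: str) -> list:
    """Return the original term plus every variant produced by subterm substitution."""
    # For each word, the options are the word itself followed by its synonyms.
    options = [[w] + ELEMENT_SYNONYMS.get(w, []) for w in term.lower().split()]
    # The Cartesian product enumerates all combinations, original term first.
    return [" ".join(combo) for combo in product(*options)]


print(expand_query("kidney stone"))
# ['kidney stone', 'kidney calculus', 'renal stone', 'renal calculus']
```

The number of variants grows multiplicatively with the number of substitutable words, which is one reason the quality of the element synonyms matters: poor synonyms add many queries that lower precision.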
5.
AMIA Annu Symp Proc ; : 1030, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18998786

ABSTRACT

Journal Descriptor Indexing (JDI) is a vector-based text classification system developed at the National Library of Medicine (NLM), originally in Lisp and now as a Java tool. Because of this reimplementation, a testing suite was developed to verify the training set data and the results of the JDI tool. A methodology was developed and implemented to compare two sets of JD vectors, yielding a single index (from 0 to 1) that measures their similarity. This methodology is fast, effective, and accurate.


Subject(s)
Artificial Intelligence; Information Storage and Retrieval/methods; Natural Language Processing; Pattern Recognition, Automated/methods; Terminology as Topic; Vocabulary, Controlled; Algorithms; United States
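The abstract above does not say how the similarity index is computed. As one plausible sketch, assuming each set maps a key (for example a word) to a vector of non-negative JD scores, the mean cosine similarity over the keys shared by both sets yields a single value between 0 and 1. The names `cosine` and `set_similarity` and the sample data are hypothetical, and this is not claimed to be the metric the JDI testing suite uses.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def set_similarity(set_a: dict, set_b: dict) -> float:
    """Mean cosine similarity over keys present in both sets (0.0 if none are shared)."""
    shared = set_a.keys() & set_b.keys()
    if not shared:
        return 0.0
    return sum(cosine(set_a[k], set_b[k]) for k in shared) / len(shared)


# Hypothetical JD vectors produced by two implementations of the same tool.
old_impl = {"heart": [0.9, 0.1, 0.0], "gene": [0.0, 0.2, 0.8]}
new_impl = {"heart": [0.9, 0.1, 0.0], "gene": [0.0, 0.25, 0.75]}

print(round(set_similarity(old_impl, new_impl), 4))
```

With non-negative scores the cosine of every pair lies in [0, 1], so the mean does too; a value near 1 would suggest the two implementations produce nearly identical vectors.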
6.
AMIA Annu Symp Proc ; : 1031, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18998787

ABSTRACT

Unicode is an industry standard that allows computers to consistently represent and manipulate text expressed in most of the world's writing systems. It is widely used in multilingual natural language processing (NLP) projects. However, some NLP projects still handle only ASCII characters. This paper describes methods of using lexical tools to convert Unicode characters (UTF-8) to 7-bit ASCII characters.


Subject(s)
Artificial Intelligence; Information Storage and Retrieval/methods; Natural Language Processing; Pattern Recognition, Automated/methods; Terminology as Topic; Vocabulary, Controlled; Algorithms; United States
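A generic way to perform this kind of conversion, sketched here with Python's standard `unicodedata` module rather than the NLM lexical tools, is to try a small explicit mapping first, then compatibility decomposition (NFKD) with combining marks stripped, and finally a placeholder for anything that still has no ASCII form. The `FALLBACK` table and the `to_ascii` function are hypothetical; a production table would be far larger.

```python
import unicodedata

# Hypothetical fallback table for characters that NFKD does not decompose to ASCII.
FALLBACK = {
    "\u00df": "ss",  # sharp s
    "\u00e6": "ae",  # ae ligature
    "\u00f8": "o",   # o with stroke
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def to_ascii(text: str, unknown: str = "?") -> str:
    """Convert text to 7-bit ASCII, replacing unmappable characters with `unknown`."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)              # already ASCII
        elif ch in FALLBACK:
            out.append(FALLBACK[ch])    # explicit mapping
        elif unicodedata.combining(ch):
            continue                    # drop stray combining marks
        else:
            # Decompose (e.g. accented letter -> base letter + accent), drop the accents.
            decomposed = unicodedata.normalize("NFKD", ch)
            stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
            if stripped and all(ord(c) < 128 for c in stripped):
                out.append(stripped)
            else:
                out.append(unknown)     # no ASCII equivalent found
    return "".join(out)


print(to_ascii("na\u00efve caf\u00e9"))
```

Checking the explicit table before decomposition lets a project override the default behavior for individual characters, while the placeholder keeps the output length predictable for characters such as Greek letters that have no ASCII counterpart.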
7.
AMIA Annu Symp Proc ; : 960, 2006.
Article in English | MEDLINE | ID: mdl-17238579

ABSTRACT

A Journal Descriptor Indexing (JDI) tool has been developed at NLM that takes biomedical text as input and automatically categorizes it, returning a ranked list, with scores between 0 and 1, of either Journal Descriptors (JDs, corresponding to biomedical disciplines) or UMLS Semantic Types (STs). Possible applications include word sense disambiguation (WSD) and retrieval according to discipline. The Lexical Systems Group plans to distribute an open-source Java version of this tool.


Subject(s)
Abstracting and Indexing/methods; Natural Language Processing; Medical Subject Headings; Periodicals as Topic; Semantics; Unified Medical Language System