Results 1 - 10 of 10
1.
Heliyon ; 9(10): e20692, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37876457

ABSTRACT

Chinese medical named entity recognition (NER) is a fundamental task in Chinese medical natural language processing, aiming to recognize Chinese medical entities within unstructured medical texts. However, it poses significant challenges mainly due to the extensive usage of medical terms in Chinese medical texts. Although previous studies have made attempts to incorporate lexical or radical knowledge in order to improve the comprehension of medical texts, these studies either focus solely on one of these aspects or utilize a basic concatenation operation to combine these features, which fails to fully utilize the potential of lexical and radical knowledge. In this paper, we propose a novel Cascaded LAttice-and-Radical Transformer (CLART) network to exploit both lexical and radical information for Chinese medical NER. Specifically, given a sentence, a medical lexicon, and a radical dictionary, we first construct a flat lattice (i.e., character-word sequence) for the sentence and radical components of each Chinese character through word matching and radical parsing, respectively. We then employ a lattice Transformer module to capture the dense interactions between characters and matched words, facilitating the enhanced utilization of lexical knowledge. Subsequently, we design a radical Transformer module to model the dense interactions between the lattice and radical features, facilitating better fusion of the lexical and radical knowledge. Finally, we feed the updated lattice-and-radical-aware character representations into a Conditional Random Fields (CRF) decoder to obtain the predicted labels. Experimental results conducted on two publicly available Chinese medical NER datasets show the effectiveness of the proposed method.
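
The lattice-construction step described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `build_flat_lattice`, the `(token, head, tail)` span representation, the `max_word_len` limit, and the tiny lexicon are all assumptions made for illustration, and radical parsing is omitted.

```python
from typing import List, Set, Tuple

def build_flat_lattice(sentence: str, lexicon: Set[str],
                       max_word_len: int = 4) -> List[Tuple[str, int, int]]:
    """Return (token, head, tail) triples: every character first,
    then every lexicon word matched in the sentence."""
    # Each character is a span whose head and tail positions coincide.
    lattice = [(ch, i, i) for i, ch in enumerate(sentence)]
    # Append each multi-character substring that appears in the lexicon.
    for start in range(len(sentence)):
        last_end = min(start + max_word_len, len(sentence))
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                lattice.append((word, start, end - 1))
    return lattice

# Hypothetical medical lexicon and sentence ("the patient has a history of diabetes").
lexicon = {"患者", "糖尿病", "病史"}
lattice = build_flat_lattice("患者有糖尿病史", lexicon)
matched_words = lattice[7:]  # entries after the 7 single characters
```

Here the matched words overlap ("糖尿病" and "病史" share the character "病"), which is the kind of ambiguity a lattice keeps available to the model rather than resolving it up front.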

2.
AMIA Jt Summits Transl Sci Proc ; 2021: 315-324, 2021.
Article in English | MEDLINE | ID: mdl-34457146

ABSTRACT

Extracting clinical concepts and their relations from clinical narratives is one of the fundamental tasks in clinical natural language processing. Traditional solutions often split this task into two subtasks in a pipeline architecture that first recognizes the named entities and then classifies the relations between all possible entity pairs. The pipeline architecture, although widely used, has two limitations: (1) it suffers from error propagation from the recognition step to the classification step, and (2) it cannot exploit the interactions between the two steps. To address these limitations, we investigated a discrete joint model based on structured perceptron and beam search to jointly perform named entity recognition (NER) and relation classification (RC) from clinical notes.


Subject(s)
Natural Language Processing; Neural Networks, Computer; Humans; Narration; Research Design
3.
AMIA Jt Summits Transl Sci Proc ; 2020: 269-277, 2020.
Article in English | MEDLINE | ID: mdl-32477646

ABSTRACT

Developing high-performance entity normalization algorithms that can alleviate the term variation problem is of great interest to the biomedical community. Although deep learning-based methods have been successfully applied to biomedical entity normalization, they often depend on traditional context-independent word embeddings. Bidirectional Encoder Representations from Transformers (BERT), BERT for Biomedical Text Mining (BioBERT) and BERT for Clinical Text Mining (ClinicalBERT) were recently introduced to pre-train contextualized word representation models using bidirectional Transformers, advancing the state-of-the-art for many natural language processing tasks. In this study, we proposed an entity normalization architecture by fine-tuning the pre-trained BERT / BioBERT / ClinicalBERT models and conducted extensive experiments to evaluate the effectiveness of the pre-trained models for biomedical entity normalization using three different types of datasets. Our experimental results show that the best fine-tuned models consistently outperformed previous methods and advanced the state-of-the-art for biomedical entity normalization, with up to 1.17% increase in accuracy.

4.
J Biomed Inform ; 105: 103418, 2020 05.
Article in English | MEDLINE | ID: mdl-32298846

ABSTRACT

OBJECTIVE: This study aims to develop and evaluate effective methods that can normalize diagnosis and procedure terms written by physicians to standard concepts in the International Classification of Diseases (ICD) in Chinese, with the goal of facilitating automated medical coding in China. METHODS: We applied the entity-linking framework to normalize Chinese diagnosis and procedure terms, which consists of two steps: candidate concept generation and candidate concept ranking. For candidate concept generation, we implemented both the traditional BM25 algorithm and an extended version that integrates a synonym knowledgebase. For candidate concept ranking, we investigated a number of different algorithms: (1) the BM25 algorithm, (2) ranking support vector machines (RankSVM), (3) a previously reported Convolutional Neural Network (CNN) approach, (4) 11 deep ranking-based methods from the MatchZoo toolkit, and (5) a new BERT (Bidirectional Encoder Representations from Transformers) based ranking method. Using two manually annotated datasets (8,547 diagnoses and 8,282 procedures) collected from a Tier 3A hospital in China, we evaluated the above methods and reported their performance (i.e., accuracy) at different cutoffs. RESULTS: The coverage of candidate concept generation was greatly improved after integrating the synonym knowledgebase, reaching 97.9% for diagnoses and 93.4% for procedures. Overall, the new BERT-based ranking method achieved the best performance on both diagnosis and procedure normalization, with a best accuracy of 92.1% for diagnoses and 80.1% for procedures when the top-one concept and exact-match criteria were used. CONCLUSIONS: This study developed and compared diverse entity-linking methods to normalize clinical terms in Chinese, and our evaluation shows good performance on mapping disease terms to ICD codes, demonstrating the feasibility of automated encoding of clinical terms in Chinese.
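
The two-step setup above (BM25 candidate generation, then accuracy at a cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it uses the common Okapi BM25 formula with a non-negative IDF variant, whitespace tokenization (Chinese text would need character or word segmentation instead), default parameters `k1=1.5` and `b=0.75`, and hypothetical English concept names. The function names are made up for the example.

```python
import math
from collections import Counter
from typing import List

def bm25_rank(query: str, concepts: List[str],
              k1: float = 1.5, b: float = 0.75, top_k: int = 3) -> List[str]:
    """Rank concept names against a query term with Okapi BM25."""
    docs = [c.lower().split() for c in concepts]  # whitespace tokens (assumption)
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of concepts containing each token.
    doc_freq = Counter(tok for d in docs for tok in set(d))
    scored = []
    for name, doc in zip(concepts, docs):
        tf = Counter(doc)
        score = 0.0
        for tok in set(query.lower().split()):
            if tok not in tf:
                continue
            idf = math.log((n_docs - doc_freq[tok] + 0.5) / (doc_freq[tok] + 0.5) + 1.0)
            norm = tf[tok] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[tok] * (k1 + 1) / norm
        scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]

def accuracy_at_k(ranked_lists: List[List[str]], gold: List[str], k: int) -> float:
    """Fraction of terms whose gold concept appears among the top-k candidates."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)

# Hypothetical concept inventory and physician-written term.
concepts = ["type 2 diabetes mellitus", "essential hypertension",
            "acute myocardial infarction"]
candidates = bm25_rank("diabetes type 2", concepts)
```

In an entity-linking pipeline of this shape, a learned ranker would then reorder `candidates`, and `accuracy_at_k` would be reported for several values of `k`.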


Subject(s)
International Classification of Diseases; Neural Networks, Computer; China; Clinical Coding; Support Vector Machine
5.
J Am Med Inform Assoc ; 27(1): 13-21, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31135882

ABSTRACT

OBJECTIVE: This article presents our approaches to extraction of medications and associated adverse drug events (ADEs) from clinical documents, which is the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task. MATERIALS AND METHODS: The clinical corpus used in this study was from the MIMIC-III database and the organizers annotated 303 documents for training and 202 for testing. Our system consists of 2 components: a named entity recognition (NER) and a relation classification (RC) component. For each component, we implemented deep learning-based approaches (eg, BI-LSTM-CRF) and compared them with traditional machine learning approaches, namely, conditional random fields for NER and support vector machines for RC, respectively. In addition, we developed a deep learning-based joint model that recognizes ADEs and their relations to medications in 1 step using a sequence labeling approach. To further improve performance, we also investigated different ensemble approaches that combine outputs from multiple approaches. RESULTS: Our best-performing systems achieved F1 scores of 93.45% for NER, 96.30% for RC, and 89.05% for end-to-end evaluation, which ranked #2, #1, and #1 among all participants, respectively. Additional evaluations show that the deep learning-based approaches did outperform traditional machine learning algorithms in both NER and RC. The joint model that simultaneously recognizes ADEs and their relations to medications also achieved the best performance on RC, indicating its promise for relation extraction. CONCLUSION: In this study, we developed deep learning approaches for extracting medications and their attributes such as ADEs, and demonstrated their superior performance compared with traditional machine learning algorithms, indicating their usefulness for broader NER and RC tasks in the medical domain.


Subject(s)
Deep Learning; Drug-Related Side Effects and Adverse Reactions; Electronic Health Records; Information Storage and Retrieval/methods; Natural Language Processing; Algorithms; Humans; Machine Learning; Narration; Pharmaceutical Preparations
6.
J Am Med Inform Assoc ; 27(3): 457-470, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31794016

ABSTRACT

OBJECTIVE: This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning methods, scope, and context of current research. MATERIALS AND METHODS: We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers. RESULTS: DL in clinical NLP publications more than doubled each year, through 2018. Recurrent neural networks (60.8%) and word2vec embeddings (74.1%) were the most popular methods; the information extraction tasks of text classification, named entity recognition, and relation extraction were dominant (89.2%). However, there was a "long tail" of other methods and specific tasks. Most contributions were methodological variants or applications, but 20.8% were new methods of some kind. The earliest adopters were in the NLP community, but the medical informatics community was the most prolific. DISCUSSION: Our analysis shows growing acceptance of deep learning as a baseline for NLP research, and of DL-based NLP in the medical community. A number of common associations were substantiated (eg, the preference of recurrent neural networks for sequence-labeling named entity recognition), while others were surprisingly nuanced (eg, the scarcity of French language clinical NLP with deep learning). CONCLUSION: Deep learning has not yet fully penetrated clinical NLP and is growing rapidly. This review highlighted both the popular and unique trends in this active field.


Subject(s)
Deep Learning/trends; Natural Language Processing; Bibliometrics; Deep Learning/statistics & numerical data; Electronic Health Records; Humans
7.
J Am Med Inform Assoc ; 26(12): 1584-1591, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31550346

ABSTRACT

OBJECTIVE: Extracting clinical entities and their attributes is a fundamental task of natural language processing (NLP) in the medical domain. This task is typically treated as 2 sequential subtasks in a pipeline: clinical entity or attribute recognition followed by entity-attribute relation extraction. One problem of pipeline methods is that errors from entity recognition are unavoidably passed to relation extraction. We propose a novel joint deep learning method to recognize clinical entities or attributes and extract entity-attribute relations simultaneously. MATERIALS AND METHODS: The proposed method integrates 2 state-of-the-art methods for named entity recognition and relation extraction, namely bidirectional long short-term memory with conditional random field and bidirectional long short-term memory, into a unified framework. In this method, relation constraints between clinical entities and attributes and weights of the 2 subtasks are also considered simultaneously. We compare the method with other related methods (ie, pipeline methods and other joint deep learning methods) on an existing English corpus from SemEval-2015 and a newly developed Chinese corpus. RESULTS: Our proposed method achieves the best F1 of 74.46% on entity recognition and the best F1 of 50.21% on relation extraction on the English corpus, and 89.32% and 88.13% on the Chinese corpus, respectively, outperforming the other methods on both tasks. CONCLUSIONS: The joint deep learning-based method could improve both entity recognition and relation extraction from clinical text in both English and Chinese, indicating that the approach is promising.


Subject(s)
Data Mining/methods; Deep Learning; Natural Language Processing; Datasets as Topic; Electronic Health Records; Humans
8.
AMIA Jt Summits Transl Sci Proc ; 2019: 829-838, 2019.
Article in English | MEDLINE | ID: mdl-31259040

ABSTRACT

Developing high-throughput and high-performance phenotyping algorithms is critical to the secondary use of electronic health records for clinical research. Supervised machine learning-based methods have shown good performance, but often require large annotated datasets that are costly to build. Simulation studies have shown that active learning (AL) could reduce the number of annotated samples while improving the model performance when assuming that the time of labeling each sample is the same (i.e., cost-insensitive). In this study, we proposed a cost-sensitive AL (CostAL) algorithm for clinical phenotyping, using the identification of breast cancer patients as a use case. CostAL implements a linear regression model to estimate the actual time required for annotating each individual sample. We recruited two annotators to manually review medical records of 766 potential breast cancer patients and recorded the actual time of annotating each sample. We then compared CostAL, AL, and passive learning (PL, aka random sampling) using this annotated dataset and generated learning curves for each method. Our experimental results showed that CostAL achieved the highest area under the curve (AUC) score among the three algorithms (0.784, 0.8501, and 0.8673 for PL, AL, and CostAL, respectively, for user 1, and 0.8006, 0.8806, and 0.9006 for user 2). To achieve an accuracy of 0.94, AL and CostAL could save 36% and 60% of annotation time for user 1 and 53% and 70% of annotation time for user 2, compared with PL, indicating the value of cost-sensitive AL approaches.
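
The idea of weighing model uncertainty against estimated annotation time can be sketched as below. The abstract states only that a linear regression estimates per-sample annotation time; the single feature (record length), the uncertainty-per-second selection rule, and all names and numbers here are assumptions for illustration, not the published CostAL algorithm.

```python
from typing import Dict, List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def uncertainty(prob: float) -> float:
    """1.0 when the classifier is undecided (p = 0.5), 0.0 when it is certain."""
    return 1.0 - abs(2.0 * prob - 1.0)

def select_next(pool: List[Dict], slope: float, intercept: float) -> Dict:
    """Pick the sample with the most uncertainty per estimated second of labeling."""
    def utility(sample: Dict) -> float:
        cost = max(slope * sample["length"] + intercept, 1e-6)  # guard against <= 0
        return uncertainty(sample["prob"]) / cost
    return max(pool, key=utility)

# Hypothetical history: record length in words vs. seconds spent annotating.
slope, intercept = fit_line([100, 200, 300], [10, 20, 30])

# Hypothetical unlabeled pool with the current classifier's positive-class probability.
pool = [
    {"id": "a", "length": 1000, "prob": 0.50},  # most uncertain but slow to label
    {"id": "b", "length": 200, "prob": 0.60},   # fairly uncertain and quick
    {"id": "c", "length": 100, "prob": 0.95},   # quick but nearly certain
]
chosen = select_next(pool, slope, intercept)
```

A cost-insensitive learner would pick sample "a" on uncertainty alone; dividing by the estimated time moves the choice to "b".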

9.
AMIA Annu Symp Proc ; 2019: 1236-1245, 2019.
Article in English | MEDLINE | ID: mdl-32308921

ABSTRACT

Natural language processing (NLP) is useful for extracting information from clinical narratives, and both traditional machine learning methods and more-recent deep learning methods have been successful in various clinical NLP tasks. These methods often depend on traditional word embeddings that are outputs of language models (LMs). Recently, methods that are directly based on pre-trained language models themselves, followed by fine-tuning on the LMs (e.g. the Bidirectional Encoder Representations from Transformers (BERT)), have achieved state-of-the-art performance on many NLP tasks. Despite their success in the open domain and biomedical literature, these pre-trained LMs have not yet been applied to the clinical relation extraction (RE) task. In this study, we developed two different implementations of the BERT model for clinical RE tasks. Our results show that our tuned LMs outperformed previous state-of-the-art RE systems in two shared tasks, which demonstrates the potential of LM-based methods on the RE task.


Subject(s)
Information Storage and Retrieval/methods; Machine Learning; Natural Language Processing; Datasets as Topic; Humans; Narration; Semantics
10.
Stud Health Technol Inform ; 245: 126-130, 2017.
Article in English | MEDLINE | ID: mdl-29295066

ABSTRACT

Due to the differences in environments and cultures, consumers seeking cancer information in various regions of the world may have diverse needs. This study compares the cancer information needs for consumers in the US and China. Specifically, we first collected 1,000 cancer-related questions from Yahoo! Answers and Baidu Zhidao, respectively. Then, we developed a taxonomy of health information needs and manually classified the questions using the taxonomy. Finally, we analyzed the characteristics of information needs from consumers in both countries and summarized the differences between them. Our study demonstrated that although there are some common needs between consumers in the US and China, there are several significant differences between the two countries: the Chinese consumers are more likely to seek diagnosis and treatment online, while the US consumers prefer to seek common medical knowledge online.


Subject(s)
Information Seeking Behavior; Internet; Neoplasms; China; Health Services Needs and Demand; Humans; United States