Search | BVS IEC

Cross-Language Terminology Mapping Between ICD-10-CN and SNOMED-CT.

Hong, Na; Zhang, Yaoyun; Ren, Yuankai; Hou, Li; Wang, Changran; Li, Jing; Van Zandt, Mui; Liu, Lei.

Stud Health Technol Inform ; 290: 42-46, 2022 Jun 06.

Article in English | MEDLINE | ID: mdl-35672967

ABSTRACT

The objective of this study was to develop a hybrid method for mapping terms from the International Statistical Classification of Diseases, 10th revision, Chinese version (ICD-10-CN) to the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), and to perform an initial evaluation of the resulting mappings. The mapping methods included reusing existing mappings, term similarity modeling for automatic mapping, and manual review. We evaluated the results of automatic mapping and the coverage of the maps between the two terminologies. Experimental results demonstrated that fine-tuning the pre-trained biomedical language model PubMedBERT achieved the best performance, with a precision of 0.859, a recall of 0.773, and an F1 of 0.814. All (100%) of the 4-digit ICD-10-CN code terms were mapped to SNOMED-CT terms through existing code mappings. Around 42.41% of randomly selected 6-digit ICD-10-CN code terms had exact matches to corresponding SNOMED-CT terms, and we did not find appropriate SNOMED-CT terms for ICD grouping terms.
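The reported F1 is the harmonic mean of precision and recall. A minimal Python check, using only the precision and recall values quoted above, reproduces the reported score (the function name `f1_score` is just an illustrative choice):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the fine-tuned PubMedBERT mapper.
f1 = f1_score(0.859, 0.773)
print(round(f1, 3))  # → 0.814
```

Because F1 weights precision and recall equally, the gap between them (0.859 vs 0.773) pulls F1 slightly below their arithmetic mean (0.816).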

Subjects

International Classification of Diseases, Systematized Nomenclature of Medicine, Language

From Tokenization to Self-Supervision: Building a High-Performance Information Extraction System for Chemical Reactions in Patents.

Wang, Jingqi; Ren, Yuankai; Zhang, Zhi; Xu, Hua; Zhang, Yaoyun.

Front Res Metr Anal ; 6: 691105, 2021.

Article in English | MEDLINE | ID: mdl-35005421

ABSTRACT

Chemical reactions and experimental conditions are fundamental information for chemical research and pharmaceutical applications. However, the latest information on chemical reactions is usually embedded in the free text of patents. The rapid accumulation of chemical patents calls for automatic tools based on natural language processing (NLP) techniques for efficient and accurate information extraction. This work describes the participation of the Melax Tech team in the CLEF 2020 ChEMU task of chemical reaction extraction from patents. The task consisted of two subtasks: (1) named entity recognition to identify compounds and their different semantic roles in the chemical reaction and (2) event extraction to identify the event triggers of chemical reactions and their relations with the semantic roles recognized in subtask 1. To build a high-performance end-to-end system, multiple strategies tailored to chemical patents were applied and evaluated, ranging from optimized tokenization and self-supervised pre-training of patent language models to domain knowledge-based rules. Our hybrid approaches combining these strategies achieved state-of-the-art results in both subtasks, with top-ranked F1 scores of 0.957 for entity recognition and 0.9536 for event extraction, indicating that the proposed approaches are promising.
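Subtask 1 is a sequence-labeling problem, and a tagger's token-level output must be grouped into entity spans before it can be scored or passed to event extraction. The sketch below is a generic illustration, not the Melax Tech system: it assumes a BIO tagging scheme (the abstract does not say which scheme was used), and the tokens and the labels `STARTING_MATERIAL` and `SOLVENT` are hypothetical examples.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) spans.

    Assumes tags of the form "B-LABEL", "I-LABEL", or "O". An "I-" tag that
    does not continue a span with the same label starts a new span.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the open span (if any) and reset the buffer.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            spans.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return spans


# Hypothetical tagger output for one patent sentence.
tokens = ["4-bromoaniline", "was", "dissolved", "in", "ethyl", "acetate", "."]
tags = ["B-STARTING_MATERIAL", "O", "O", "O", "B-SOLVENT", "I-SOLVENT", "O"]
print(bio_to_spans(tokens, tags))
# → [('4-bromoaniline', 'STARTING_MATERIAL'), ('ethyl acetate', 'SOLVENT')]
```

Note that the example keeps "4-bromoaniline" as a single token; how chemical names are split at tokenization time directly determines which spans a tagger can express, which is one reason tokenization matters for this task.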

Extracting comprehensive clinical information for breast cancer using deep learning methods.

Zhang, Xiaohui; Zhang, Yaoyun; Zhang, Qin; Ren, Yuankai; Qiu, Tinglin; Ma, Jianhui; Sun, Qiang.

Int J Med Inform ; 132: 103985, 2019 12.

Article in English | MEDLINE | ID: mdl-31627032

ABSTRACT

OBJECTIVE: Breast cancer is the most common malignant tumor among women. The diagnosis and treatment information of breast cancer patients is abundant in multiple types of clinical fields, including clinicopathological data, genotype and phenotype information, treatment information, and prognosis information. However, current studies mainly focus on extracting information from one specific type of clinical field. This study defines a comprehensive information model to represent the whole-course clinical information of patients. Furthermore, deep learning approaches are used to extract concepts and their attributes from clinical breast cancer documents by fine-tuning pretrained Bidirectional Encoder Representations from Transformers (BERT) language models.

MATERIALS AND METHODS: The clinical corpus used in this study came from a Grade 3A cancer hospital in China and consisted of the encounter notes, operation records, pathology notes, radiology notes, progress notes, and discharge summaries of 100 breast cancer patients. Our system consists of two components: a named entity recognition (NER) component and a relation recognition component. For each component, we implemented deep learning-based approaches by fine-tuning BERT, which has outperformed other state-of-the-art methods on multiple natural language processing (NLP) tasks. A clinical language model was first pretrained using BERT on a large-scale unlabeled corpus of Chinese clinical text. For NER, the context embeddings pretrained using BERT were used as the input features of a Bi-LSTM-CRF (bidirectional long short-term memory with conditional random fields) model and were fine-tuned using the annotated breast cancer notes. Furthermore, we proposed an approach to fine-tune BERT for relation extraction, treating it as a classification problem in which the two entities mentioned in the input sentence were replaced with their semantic types.
RESULTS: Our best-performing system achieved F1 scores of 93.53% for NER and 96.73% for relation extraction. Additional evaluations showed that the deep learning-based approaches that fine-tuned BERT outperformed the traditional Bi-LSTM-CRF and CRF machine learning algorithms in NER, and the attention-based Bi-LSTM and SVM (support vector machine) algorithms in relation recognition.

CONCLUSION: In this study, we developed a deep learning approach that fine-tuned BERT to extract breast cancer concepts and their attributes. It outperformed traditional machine learning algorithms, supporting its use in broader NER and relation extraction tasks in the medical domain.
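The relation extraction step described above turns each candidate entity pair into a classification input by replacing the two mentions with their semantic types. A minimal sketch of that input transformation follows; the bracketed `[TYPE]` placeholder format, the semantic type names `TUMOR` and `SIZE`, the function name `mask_entity_pair`, and the English example sentence are all hypothetical (the study's corpus is Chinese), and the downstream BERT classifier is not shown.

```python
from typing import Tuple

# (start, end, semantic_type); character offsets, end exclusive.
Span = Tuple[int, int, str]


def mask_entity_pair(sentence: str, e1: Span, e2: Span) -> str:
    """Replace two entity mentions with bracketed semantic-type tokens.

    The "[TYPE]" placeholder format is an illustrative assumption, not
    necessarily the format used in the study.
    """
    first, second = sorted([e1, e2], key=lambda span: span[0])
    for start, end, _ in (first, second):
        if not 0 <= start < end <= len(sentence):
            raise ValueError(f"invalid span ({start}, {end})")
    if first[1] > second[0]:
        raise ValueError("entity spans must not overlap")
    # Substitute right to left so the earlier span's offsets stay valid.
    for start, end, sem_type in (second, first):
        sentence = sentence[:start] + f"[{sem_type}]" + sentence[end:]
    return sentence


# Hypothetical English example; the study's notes are in Chinese.
text = "The mass measured 2.5 cm in the left breast."
masked = mask_entity_pair(text, (4, 8, "TUMOR"), (18, 24, "SIZE"))
print(masked)  # → The [TUMOR] measured [SIZE] in the left breast.
# The masked sentence would then be passed to a fine-tuned BERT
# sequence classifier that predicts the relation label (not shown).
```

Masking the mentions with their types encourages the classifier to rely on the sentence context and the entity types rather than on memorized surface strings.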

Subjects

Algorithms, Breast Neoplasms/diagnosis, Breast Neoplasms/therapy, Deep Learning, Natural Language Processing, Breast Neoplasms/epidemiology, China/epidemiology, Female, Humans, Support Vector Machine
