Conditional random fields for clinical named entity recognition: A comparative study using Korean clinical texts.

Lee, Wangjin; Kim, Kyungmo; Lee, Eun Young; Choi, Jinwook

Affiliation

Lee W; Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea. Electronic address: jinsamdol@snu.ac.kr.
Kim K; Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea. Electronic address: medinfoman@snu.ac.kr.
Lee EY; Division of Rheumatology, Department of Internal Medicine, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea. Electronic address: elee@snu.ac.kr.
Choi J; Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea; Department of Biomedical Engineering, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea; Institute of Medica

Comput Biol Med; 101: 7-14, 2018-10-01.

Article in English | MEDLINE | ID: mdl-30086416

ABSTRACT

BACKGROUND:

This study applies clinical named entity recognition (NER) methods to the clinical texts of rheumatism patients in South Korea. Although health institutions worldwide are increasingly adopting electronic health record (EHR) systems, the technologies for handling and extracting information from the many unstructured texts in those systems are still at a developing stage. The aim of this study is to assess two conventional NER methods: dictionary-lookup-based string matching and conditional random fields (CRFs).

METHODS:

We selected discharge summaries for 200 rheumatic patients from the EHR system of the Seoul National University Hospital and attempted to identify heterogeneous semantic types present in the clinical notes of each patient's history.
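To make the two compared approaches concrete, the following is a minimal sketch, not the authors' implementation. It shows a greedy longest-match dictionary lookup that emits BIO labels, plus a simple per-token feature function of the kind a CRF tagger could be trained on. The dictionary entries, the semantic type names, the feature names, and the English tokens are all hypothetical and chosen only for illustration; the study itself worked with Korean clinical text.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: term (as a token tuple) -> semantic type.
# Neither the entries nor the type names come from the study.
DICTIONARY: Dict[Tuple[str, ...], str] = {
    ("rheumatoid", "arthritis"): "Diagnosis",
    ("methotrexate",): "Medication",
    ("knee", "pain"): "Symptom",
}
MAX_TERM_LEN = max(len(term) for term in DICTIONARY)


def dictionary_lookup(tokens: List[str]) -> List[str]:
    """Tag tokens with BIO labels using greedy longest-match lookup."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            sem_type = DICTIONARY.get(tuple(lowered[i:i + n]))
            if sem_type is not None:
                labels[i] = "B-" + sem_type
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + sem_type
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Simple per-token features (assumed, not the study's feature set)."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "suffix3": word[-3:],
        "is_digit": word.isdigit(),
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
```

The lookup can only label terms that appear in the dictionary verbatim, whereas a sequence model trained on features like these can in principle generalize to unseen surface forms from context.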

RESULTS:

CRFs outperform string matching in extracting most semantic types (median F1 = 0.761, minimum = 0.705, maximum = 0.906). String matching is found to be better suited for identifying hospital visit information. The performance of both methods is comparable for identifying medications. The 10-fold cross-validation shows that CRFs had median F1 = 0.811 (minimum = 0.752, maximum = 0.918), and exhibited good performance even when trained with simple features.
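For readers unfamiliar with the reported figures, the sketch below shows how an F1 score and a 10-fold split can be computed. It is an illustration under stated assumptions, not the study's evaluation code: the span representation, the exact-match criterion, and the unshuffled fold assignment are all assumptions, since the abstract does not specify them.

```python
from statistics import median
from typing import List, Set, Tuple

# Hypothetical entity representation: (start, end, semantic type).
Span = Tuple[int, int, str]


def f1_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1: harmonic mean of precision and recall."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_items) into k (train, test) index pairs, without shuffling."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def summarize(scores: List[float]) -> Tuple[float, float, float]:
    """Return (median, minimum, maximum) of a list of F1 scores."""
    return median(scores), min(scores), max(scores)
```

With 200 documents and k = 10, each fold in this sketch holds out 20 documents for testing and trains on the remaining 180.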

CONCLUSION:

CRFs are a good candidate for implementing clinical NER in Korean clinical narrative documents. Increasing the training data and incorporating sophisticated feature engineering might improve the accuracy of identifying health information, enabling automated patient history summarization in the future.

Subject(s)

Data Mining/methods; Medical Informatics; Natural Language Processing; Humans; Republic of Korea

Keywords

Clinical named entity recognition; Conditional random field; Discharge summary; Medical history; String matching

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Health context: 11_ODS3_cobertura_universal / 1_ASSA2030 / 2_ODS3 | Health problem: 11_delivery_arrangements / 11_multisectoral_coordination / 1_sistemas_informacao_saude / 2_cobertura_universal | Main subject: Medical Informatics / Natural Language Processing / Data Mining | Study type: Clinical_trials / Prognostic_studies | Limits: Humans | Country/Region as subject: Asia | Language: En | Journal: Comput Biol Med | Year: 2018 | Document type: Article