Conditional random fields for clinical named entity recognition: A comparative study using Korean clinical texts.
Comput Biol Med; 101: 7-14, 2018 Oct 01.
Article | En | MEDLINE | ID: mdl-30086416
ABSTRACT
BACKGROUND:
This study demonstrates clinical named entity recognition (NER) methods on the clinical texts of rheumatism patients in South Korea. Despite the recent increase in the adoption rate of the electronic health record (EHR) system in global health institutions, health information technologies for handling and acquisition of information from numerous unstructured texts in the EHR system are still in their developing stages. The aim of this study is to verify the conventional NER methods, namely dictionary-lookup-based string matching and conditional random fields (CRFs).
METHODS:
We selected discharge summaries for 200 rheumatic patients from the EHR system of the Seoul National University Hospital and attempted to identify heterogeneous semantic types present in the clinical notes of each patient's history.
RESULTS:
CRFs outperform string matching in extracting most semantic types (median F1 = 0.761, minimum = 0.705, maximum = 0.906). String matching is found to be better suited for identifying hospital visit information. The performance of both methods is comparable for identifying medications. The 10-fold cross-validation shows that CRFs had median F1 = 0.811 (minimum = 0.752, maximum = 0.918), and exhibited good performance even when trained with simple features.
CONCLUSION:
CRFs are a good candidate for implementing clinical NER in Korean clinical narrative documents. Increasing the training data and incorporating sophisticated feature engineering might improve the accuracy of identifying health information, enabling automated patient history summarization in the future.
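As a rough illustration of the dictionary-lookup baseline and the F1 summary statistics named in the abstract, the following is a minimal sketch only. The lexicon entries, semantic type labels, English tokens, and the function names `dictionary_tag`, `f1_score`, and `summarize` are all hypothetical; the study's actual dictionaries, tokenization of Korean text, and scoring procedure are not described in this record and may differ.

```python
from statistics import median

# Hypothetical lexicon mapping token sequences to semantic types.
# These entries are illustrative and are not taken from the study.
LEXICON = {
    ("rheumatoid", "arthritis"): "DIAGNOSIS",
    ("methotrexate",): "MEDICATION",
    ("joint", "pain"): "SYMPTOM",
}
MAX_LEN = max(len(key) for key in LEXICON)


def dictionary_tag(tokens):
    """Return (start, end, type) spans found by greedy longest-match lookup."""
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(lowered):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(lowered) - i), 0, -1):
            label = LEXICON.get(tuple(lowered[i:i + n]))
            if label is not None:
                spans.append((i, i + n, label))
                i += n
                break
        else:
            # No lexicon entry starts at this token; move on.
            i += 1
    return spans


def f1_score(gold, predicted):
    """Exact-match, span-level F1 between gold and predicted entity spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def summarize(scores):
    """Median, minimum, and maximum of per-type or per-fold F1 scores."""
    return median(scores), min(scores), max(scores)
```

A CRF tagger would replace `dictionary_tag` with a trained sequence model, while `f1_score` and `summarize` could be reused unchanged to compare the two approaches per semantic type or per cross-validation fold.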
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Health context: 11_ODS3_cobertura_universal / 1_ASSA2030 / 2_ODS3
Health problem: 11_delivery_arrangements / 11_multisectoral_coordination / 1_sistemas_informacao_saude / 2_cobertura_universal
Main subject: Medical Informatics / Natural Language Processing / Data Mining
Study type: Clinical_trials / Prognostic_studies
Limits: Humans
Country/Region as subject: Asia
Language: En
Journal: Comput Biol Med
Year: 2018
Document type: Article