A unified framework of medical information annotation and extraction for Chinese clinical text.
Artif Intell Med; 142: 102573, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37316096
Medical information extraction comprises a group of natural language processing (NLP) tasks that together convert clinical text into pre-defined structured formats. It is a critical step in exploiting electronic medical records (EMRs). Given recent advances in NLP, model implementation and performance no longer seem to be obstacles; the bottleneck instead lies in obtaining a high-quality annotated corpus and in the overall engineering workflow. This study presents an engineering framework consisting of three tasks: medical entity recognition, relation extraction, and attribute extraction. Within this framework, the whole workflow is demonstrated, from EMR data collection through model performance evaluation. Our annotation scheme is designed to be comprehensive and compatible across the multiple tasks. Drawing on EMRs from a general hospital in Ningbo, China, and manual annotation by experienced physicians, our corpus is large in scale and high in quality. Built upon this Chinese clinical corpus, the medical information extraction system shows performance that approaches human annotation. The annotation scheme, (a subset of) the annotated corpus, and the code are all publicly released to facilitate further research.
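To make the idea of a single annotation covering all three tasks concrete, the following is a minimal Python sketch of one possible record layout. It is not the authors' released scheme: the class names, label names ("Symptom", "Duration", "negation", "has_duration"), and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """A medical entity mention, located by character offsets in the text."""
    id: str
    label: str   # hypothetical label set, not taken from the paper
    start: int   # inclusive character offset
    end: int     # exclusive character offset


@dataclass
class Relation:
    """A typed, directed link between two entities (referenced by id)."""
    label: str
    head: str
    tail: str


@dataclass
class Attribute:
    """A property attached to a single entity (referenced by id)."""
    entity: str
    name: str
    value: str


@dataclass
class AnnotatedText:
    """One text with entity, relation, and attribute layers sharing entity ids."""
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)

    def span(self, entity_id: str) -> str:
        """Return the surface text of the entity with the given id."""
        entity = next(e for e in self.entities if e.id == entity_id)
        return self.text[entity.start:entity.end]

    def validate(self) -> None:
        """Raise ValueError if an offset is out of range or a reference dangles."""
        ids = {e.id for e in self.entities}
        for e in self.entities:
            if not 0 <= e.start < e.end <= len(self.text):
                raise ValueError(f"entity {e.id} has an invalid span")
        for r in self.relations:
            if r.head not in ids or r.tail not in ids:
                raise ValueError(f"relation {r.label} references an unknown entity")
        for a in self.attributes:
            if a.entity not in ids:
                raise ValueError(f"attribute {a.name} references an unknown entity")


# Invented example sentence: "Patient has no fever; cough for 3 days."
doc = AnnotatedText(
    text="患者无发热，咳嗽3天。",
    entities=[
        Entity("T1", "Symptom", 3, 5),    # 发热 (fever)
        Entity("T2", "Symptom", 6, 8),    # 咳嗽 (cough)
        Entity("T3", "Duration", 8, 10),  # 3天 (3 days)
    ],
    relations=[Relation("has_duration", head="T2", tail="T3")],
    attributes=[Attribute(entity="T1", name="negation", value="negated")],
)
doc.validate()
```

Because relations and attributes refer to entities only by id, the three annotation layers stay consistent with one another, which is one plausible reading of a scheme that is "compatible across the multiple tasks".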
Full text: 1
Database: MEDLINE
Main subject: Physicians / Electronic Health Records
Type of study: Guideline / Prognostic_studies
Limits: Humans
Country as subject: Asia
Language: En
Year: 2023
Document type: Article