Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine.
Zhang, Tingting; Wang, Yaqiang; Wang, Xiaofeng; Yang, Yafei; Ye, Ying.
Affiliation
  • Zhang T; Basic Medical School, Chengdu University of Traditional Chinese Medicine, No. 37, Shi Er Qiao Road, Chengdu, 610075, People's Republic of China.
  • Wang Y; College of Software Engineering, Chengdu University of Information Technology, No. 24, Xue Fu Road, Chengdu, 610225, People's Republic of China. yaqwang@cuit.edu.cn.
  • Wang X; College of Software Engineering, Chengdu University of Information Technology, No. 24, Xue Fu Road, Chengdu, 610225, People's Republic of China.
  • Yang Y; College of Software Engineering, Chengdu University of Information Technology, No. 24, Xue Fu Road, Chengdu, 610225, People's Republic of China.
  • Ye Y; Basic Medical School, Chengdu University of Traditional Chinese Medicine, No. 37, Shi Er Qiao Road, Chengdu, 610075, People's Republic of China. yeyingtcm@163.com.
BMC Med Inform Decis Mak ; 20(1): 64, 2020 04 06.
Article in English | MEDLINE | ID: mdl-32252745
ABSTRACT

BACKGROUND:

In this study, we focus on building a fine-grained entity annotation corpus, together with a corresponding annotation guideline, for traditional Chinese medicine (TCM) clinical records. Our aim is to provide a basis for future construction of fine-grained corpora from TCM clinical records.

METHODS:

We developed a four-step approach suited to constructing a corpus from TCM medical records. First, we determined the entity types included in this study through sample annotation. Second, we drafted a fine-grained annotation guideline by summarizing the characteristics of the dataset and referring to existing guidelines. Third, we iteratively updated the guideline until the inter-annotator agreement (IAA) exceeded a Cohen's kappa value of 0.9. Finally, comprehensive annotation was performed while keeping the IAA value above 0.9.
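
For reference, the standard definition of Cohen's kappa for two annotators is given below. The symbols $p_o$, $p_e$, $p_{1k}$ and $p_{2k}$ are notation introduced here for illustration; the abstract does not state the unit (token or entity) over which agreement was computed.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = \sum_{k} p_{1k}\, p_{2k}
```

Here $p_o$ is the observed proportion of items on which the two annotators assign the same label, and $p_{ik}$ is the proportion of items that annotator $i$ assigns to label $k$, so $p_e$ is the agreement expected by chance.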

RESULTS:

We annotated the 10,197 clinical records in five rounds. Four entity categories involving 13 entity types were employed. The final fine-grained annotated entity corpus consists of 1,104 entities and 67,799 tokens. The final IAA averaged 0.936 across three annotators, indicating that the fine-grained entity recognition corpus is of high quality.
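
An average IAA over three annotators can be sketched as the mean of Cohen's kappa over all annotator pairs. This is a minimal illustration under assumptions, not the authors' procedure: the abstract does not say how the average was formed or whether agreement was measured per token or per entity, and the function names, annotator names, and labels (`SYM`, `HERB`, `O`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's own label distribution.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(annotations):
    """Mean Cohen's kappa over all pairs of annotators (assumed averaging scheme)."""
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(annotations[x], annotations[y]) for x, y in pairs) / len(pairs)


# Hypothetical token-level labels from three annotators for the same eight tokens.
toy = {
    "annotator_1": ["SYM", "SYM", "O", "HERB", "O", "O", "HERB", "SYM"],
    "annotator_2": ["SYM", "SYM", "O", "HERB", "O", "SYM", "HERB", "SYM"],
    "annotator_3": ["SYM", "O", "O", "HERB", "O", "O", "HERB", "SYM"],
}
iaa = mean_pairwise_kappa(toy)
print(f"mean pairwise kappa: {iaa:.3f}; above 0.9 threshold: {iaa > 0.9}")
```

With a check like this, an annotation round would be accepted only when the mean stays above the 0.9 threshold reported in the abstract.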

CONCLUSIONS:

These results will provide a foundation for future research on corpus construction and named entity recognition tasks in the TCM clinical domain.

Full text: 1 Database: MEDLINE Traditional medicines: Medicinas_tradicionales_de_asia / Medicina_china Main subject: Traditional Chinese Medicine Type of study: Diagnostic_studies / Guideline Language: En Journal: BMC Med Inform Decis Mak Year of publication: 2020 Document type: Article