Phenonizer: A Fine-Grained Phenotypic Named Entity Recognizer for Chinese Clinical Texts.
Zou, Qunsheng; Yang, Kuo; Shu, Zixin; Chang, Kai; Zheng, Qiguang; Zheng, Yi; Lu, Kezhi; Xu, Ning; Tian, Haoyu; Li, Xiaomeng; Yang, Yuxia; Zhou, Yana; Yu, Haibin; Zhang, Xiaoping; Xia, Jianan; Zhu, Qiang; Poon, Josiah; Poon, Simon; Zhang, Runshun; Li, Xiaodong; Zhou, Xuezhong.
  • Zou Q; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Yang K; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Shu Z; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Chang K; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Zheng Q; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Zheng Y; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Lu K; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Xu N; The First Affiliated Hospital of Henan University of Chinese Medicine, Zhengzhou 45000, China.
  • Tian H; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Li X; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Yang Y; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Zhou Y; Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan 430061, China.
  • Yu H; The First Affiliated Hospital of Henan University of Chinese Medicine, Zhengzhou 45000, China.
  • Zhang X; Data Centre of Traditional Chinese Medicine, China Academy of Chinese Medical Science, Beijing 100700, China.
  • Xia J; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Zhu Q; Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
  • Poon J; School of Information Technologies, The University of Sydney, Sydney, Australia, Analytic and Clinical Cooperative Laboratory for Integrative Medicine, USYD & CUHK, Sydney 2006, Australia.
  • Poon S; School of Information Technologies, The University of Sydney, Sydney, Australia, Analytic and Clinical Cooperative Laboratory for Integrative Medicine, USYD & CUHK, Sydney 2006, Australia.
  • Zhang R; Guang'anmen Hospital, China Academy of Chinese Medical Science, Beijing 100053, China.
  • Li X; Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan 430061, China.
  • Zhou X; Institute of Liver Disease, Hubei Provincial Academy of Traditional Chinese Medicine, Wuhan 430061, China.
Biomed Res Int ; 2022: 3524090, 2022.
Article in English | MEDLINE | ID: mdl-35342762
ABSTRACT
Biomedical named entity recognition (BioNER) from clinical texts is a fundamental task for clinical data analysis, owing to the availability of large volumes of electronic medical record data, mostly in free-text format, in real-world clinical settings. Clinical text data contain important phenotypic medical entities (e.g., symptoms, diseases, and laboratory indexes), which can be used to profile the clinical characteristics of patients in specific disease conditions (e.g., Coronavirus Disease 2019 (COVID-19)). However, general BioNER approaches mostly rely on coarse-grained annotations of phenotypic entities in benchmark text datasets. Because clinical texts contain numerous negated expressions of phenotypic entities (e.g., "no fever," "no cough," and "no hypertension"), such coarse-grained annotation cannot supply the subsequent data analysis process with well-prepared structured clinical data. In this paper, we developed the Human-machine Cooperative Phenotypic Spectrum Annotation System (HCPSAS, http://www.tcmai.org/login) and constructed a fine-grained Chinese clinical corpus. We then proposed Phenonizer, a phenotypic named entity recognizer that uses BERT to capture character-level global contextual representations, extracts local contextual features with a bidirectional long short-term memory network, and finally obtains the optimal label sequence through a conditional random field. Results on the COVID-19 dataset show that Phenonizer, with an F1-score of 0.896, outperforms methods based on Word2Vec. A comparison of character embeddings trained on different data shows that embeddings trained on clinical corpora improve the F1-score by 0.0103. In addition, we evaluated Phenonizer on datasets of two granularities and showed that the fine-grained dataset raises the methods' F1-score slightly, by about 0.005. Furthermore, the fine-grained dataset enables the methods to distinguish between negated symptoms and present symptoms. Finally, we tested the generalization performance of Phenonizer, achieving a superior F1-score of 0.8389. In summary, together with the fine-grained annotated benchmark dataset, Phenonizer offers a feasible approach to extracting symptom information from Chinese clinical texts with acceptable performance.
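The final decoding step described in the abstract (choosing the best label sequence with a conditional random field over fine-grained labels) can be illustrated with a small sketch. This is not the authors' implementation. The tag names (`SYM`, `NEG_SYM`), the hand-written emission scores standing in for BERT and BiLSTM outputs, and the simplified transition scores (0 for a valid BIO transition, forbidden otherwise) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fine-grained tag set: present and negated symptoms get distinct labels.
TAGS = ["O", "B-SYM", "I-SYM", "B-NEG_SYM", "I-NEG_SYM"]
NEG_INF = float("-inf")


def allowed(prev: str, curr: str) -> bool:
    """BIO constraint: I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring valid tag sequence.

    Simplification (assumption): transitions score 0 if valid and are
    forbidden otherwise; a trained CRF would learn these scores.
    Tags missing from an emission dict score 0.
    """
    # An I- tag cannot open a sequence.
    score = {t: NEG_INF if t.startswith("I-") else emissions[0].get(t, 0.0)
             for t in TAGS}
    back: List[Dict[str, Optional[str]]] = []
    for em in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, Optional[str]] = {}
        for curr in TAGS:
            best_prev, best = None, NEG_INF
            for prev in TAGS:
                if allowed(prev, curr) and score[prev] > best:
                    best_prev, best = prev, score[prev]
            new_score[curr] = NEG_INF if best_prev is None else best + em.get(curr, 0.0)
            pointers[curr] = best_prev
        score = new_score
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    path = [max(TAGS, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def extract_entities(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged characters into (text, label) spans."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label = ""
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(("".join(current), label))
            current, label = [ch], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(ch)
        else:
            if current:
                entities.append(("".join(current), label))
            current, label = [], ""
    if current:
        entities.append(("".join(current), label))
    return entities


# "无发热，咳嗽" = "no fever, cough"; one emission dict per character.
chars = list("无发热，咳嗽")
emissions = [
    {"O": 2.0},
    {"B-NEG_SYM": 2.0, "B-SYM": 1.5},
    {"I-NEG_SYM": 1.0, "I-SYM": 1.2},  # locally prefers I-SYM, which would be inconsistent
    {"O": 2.0},
    {"B-SYM": 2.0},
    {"I-SYM": 2.0},
]
tags = viterbi(emissions)
entities = extract_entities(chars, tags)
print(tags)
print(entities)
```

In this toy input, picking the best tag per character independently would produce `B-NEG_SYM` followed by `I-SYM`, an inconsistent span. Decoding the sequence as a whole keeps "发热" (fever) labeled as negated and "咳嗽" (cough) as present, which is the distinction the fine-grained annotation is meant to preserve.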
Subject(s)

Full text: 1 Database: MEDLINE Main subject: COVID-19 Limit: Humans Country as subject: Asia Language: English Year: 2022 Document type: Article