Your browser doesn't support javascript.
loading
Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes.
Zhou, Huiwei; Ning, Shixian; Liu, Zhe; Lang, Chengkun; Liu, Zhuang; Lei, Bizun.
Afiliação
  • Zhou H; School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China. zhouhuiwei@dlut.edu.cn.
  • Ning S; School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China.
  • Liu Z; School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China.
  • Lang C; School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China.
  • Liu Z; School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China.
  • Lei B; School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China.
BMC Bioinformatics ; 21(1): 35, 2020 Jan 30.
Article em En | MEDLINE | ID: mdl-32000677
BACKGROUND: Automated biomedical named entity recognition and normalization serves as the basis for many downstream applications in information management. However, this task is challenging due to name variations and entity ambiguity. A biomedical entity may have multiple variants and a variant could denote several different entity identifiers. RESULTS: To remedy the above issues, we present a novel knowledge-enhanced system for protein/gene named entity recognition (PNER) and normalization (PNEN). On one hand, a large amount of entity name knowledge extracted from biomedical knowledge bases is used to recognize more entity variants. On the other hand, structural knowledge of entities is extracted and encoded as identifier (ID) embeddings, which are then used for better entity normalization. Moreover, deep contextualized word representations generated by pre-trained language models are also incorporated into our knowledge-enhanced system for modeling multi-sense information of entities. Experimental results on the BioCreative VI Bio-ID corpus show that our proposed knowledge-enhanced system achieves 0.871 F1-score for PNER and 0.445 F1-score for PNEN, respectively, leading to a new state-of-the-art performance. CONCLUSIONS: We propose a knowledge-enhanced system that combines both entity knowledge and deep contextualized word representations. Comparison results show that entity knowledge is beneficial to the PNER and PNEN task and can be well combined with contextualized information in our system for further improvement.
Assuntos
Palavras-chave

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Proteínas Tipo de estudo: Evaluation_studies / Prognostic_studies Limite: Animals / Humans Idioma: En Revista: BMC Bioinformatics Assunto da revista: INFORMATICA MEDICA Ano de publicação: 2020 Tipo de documento: Article País de afiliação: China País de publicação: Reino Unido

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Proteínas Tipo de estudo: Evaluation_studies / Prognostic_studies Limite: Animals / Humans Idioma: En Revista: BMC Bioinformatics Assunto da revista: INFORMATICA MEDICA Ano de publicação: 2020 Tipo de documento: Article País de afiliação: China País de publicação: Reino Unido