SPBERE: Boosting span-based pipeline biomedical entity and relation extraction via entity information.

Yang, Chenglin; Deng, Jiamei; Chen, Xianlai; An, Ying

Affiliations

Yang C; Big Data Institute, Central South University, Changsha, 410083, China; School of Life Sciences, Central South University, Changsha, 410083, China.
Deng J; Big Data Institute, Central South University, Changsha, 410083, China.
Chen X; Big Data Institute, Central South University, Changsha, 410083, China; Key Laboratory of Medical Information Research, Central South University, Changsha, 410083, China. Electronic address: chenxianlai@csu.edu.cn.
An Y; Big Data Institute, Central South University, Changsha, 410083, China. Electronic address: anying@csu.edu.cn.

J Biomed Inform; 145: 104456, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37482171

ABSTRACT

Triplet extraction is one of the fundamental tasks in biomedical text mining. Compared with traditional pipeline approaches, joint methods can alleviate the propagation of errors from entity recognition to relation classification. However, existing methods struggle to detect overlapping entities and overlapping relations, which are ubiquitous in biomedical texts. In this work, we propose a novel pipeline method for end-to-end biomedical triplet extraction. In particular, a span-based detection strategy detects overlapping triplets by enumerating possible candidate spans and entity pairs. The strategy is further used to capture different contextualized representations through an entity model and a relation model, respectively. Furthermore, to strengthen the interrelation between spans, entity information from the output of the entity model is used to construct the input for the relation model, without relying on any external knowledge. Our approach is evaluated on the drug-drug interaction (DDI) and chemical-protein interaction (CHEMPROT) datasets, where it improves the relation extraction F1-score by 3.5%-3.7% (absolute) over prior work. The experimental results highlight the importance of three components: detecting overlapping triplets with the span-based approach, obtaining different contextualized representations from different in-domain pre-trained language models, and fusing entity information early in the relation model.
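
The span-based pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `max_width` limit, the `<S:TYPE>`/`<O:TYPE>` marker format, and the hand-written stand-in for the entity model's predictions are all hypothetical, and the pre-trained language models are omitted entirely. The sketch only traces the data flow the abstract describes: enumerate candidate spans (so nested and overlapping entities remain candidates), keep the spans the entity model assigns a type to, form ordered entity pairs, and inject the predicted types into the relation model's input as marker tokens.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]          # (start, end) token indices, end inclusive
Entity = Tuple[int, int, str]   # (start, end, predicted entity type)


def enumerate_spans(n_tokens: int, max_width: int) -> List[Span]:
    """Every span of at most max_width tokens; nested and overlapping spans are all kept."""
    return [(i, j)
            for i in range(n_tokens)
            for j in range(i, min(i + max_width, n_tokens))]


def select_entities(span_labels: Dict[Span, Optional[str]]) -> List[Entity]:
    """Keep spans the (hypothetical) entity classifier typed; None means 'not an entity'."""
    return [(s, e, label) for (s, e), label in span_labels.items() if label is not None]


def candidate_pairs(entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """All ordered (subject, object) pairs of distinct predicted entities."""
    return list(permutations(entities, 2))


def mark_pair(tokens: List[str], subj: Entity, obj: Entity) -> List[str]:
    """Insert typed marker tokens around the subject and object spans.

    Opening markers go before a span's first token and closing markers after
    its last token, so overlapping spans still receive their own markers.
    """
    s_start, s_end, s_type = subj
    o_start, o_end, o_type = obj
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i == s_start:
            out.append(f"<S:{s_type}>")
        if i == o_start:
            out.append(f"<O:{o_type}>")
        out.append(tok)
        if i == o_end:
            out.append(f"</O:{o_type}>")
        if i == s_end:
            out.append(f"</S:{s_type}>")
    return out


if __name__ == "__main__":
    tokens = ["Aspirin", "increases", "warfarin", "levels"]
    spans = enumerate_spans(len(tokens), max_width=2)
    # Stand-in for entity-model output: every span defaults to None (non-entity).
    predicted: Dict[Span, Optional[str]] = {span: None for span in spans}
    predicted[(0, 0)] = "DRUG"
    predicted[(2, 2)] = "DRUG"
    entities = select_entities(predicted)
    # One marked input sequence per ordered entity pair, for the relation model.
    for subj, obj in candidate_pairs(entities):
        print(" ".join(mark_pair(tokens, subj, obj)))
```

Running the script prints one marked sequence per ordered entity pair, and a relation model would then classify each sequence. Text markers are one common way to realize the early fusion of entity information mentioned in the abstract; the paper's exact input format may differ.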

Subjects

Data Mining; Language; Data Mining/methods; Natural Language Processing; Proteins; Drug Interactions

Keywords

Biomedical triplet extraction; Entity information; Pipeline; Pre-trained language model; Span-based approach

Database: MEDLINE | Main subject: Data Mining / Language | Study type: Prognostic_studies | Language: English | Year of publication: 2023 | Document type: Article