BioBBC: a multi-feature model that enhances the detection of biomedical entities.
Sci Rep; 14(1): 7697, 2024-04-02.
Article in English | MEDLINE | ID: mdl-38565624
ABSTRACT
The rapid increase in biomedical publications necessitates efficient systems that automatically handle Biomedical Named Entity Recognition (BioNER) tasks in unstructured text. However, accurately detecting biomedical entities is challenging because of the complexity of their names and the frequent use of abbreviations. In this paper, we propose BioBBC, a deep learning (DL) model that uses multi-feature embeddings and is built on the BERT-BiLSTM-CRF architecture to address the BioNER task. BioBBC consists of three main layers: an embedding layer, a Bidirectional Long Short-Term Memory (BiLSTM) layer, and a Conditional Random Field (CRF) layer. BioBBC takes sentences from the biomedical domain as input and identifies the biomedical entities mentioned in the text. The embedding layer generates enriched contextual representation vectors of the input from four types of embeddings: part-of-speech (POS) tag embeddings, character-level embeddings, BERT embeddings, and data-specific embeddings. The BiLSTM layer produces additional syntactic and semantic feature representations. Finally, the CRF layer identifies the highest-scoring tag sequence for the input sentence. Our model is designed and optimized to detect different types of biomedical entities. In experiments on six benchmark BioNER datasets, our model outperformed state-of-the-art (SOTA) models with significant improvements.
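To make the final step concrete: a linear-chain CRF layer typically picks the best tag sequence with Viterbi decoding over per-token emission scores (here assumed to come from the BiLSTM) and tag-to-tag transition scores. The sketch below is a minimal, pure-Python illustration under those assumptions, not the authors' implementation; the function name `viterbi_decode`, the BIO tag set, and all score values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag index sequence and its total score.

    emissions[t][j]: score of tag j at token t (hypothetically from a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    start[j]: score of starting the sentence with tag j.
    """
    n_tags = len(start)
    # best[j]: best score of any partial path ending in tag j at the current token
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # pick the previous tag that maximises the score of arriving in tag j
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # trace back from the best final tag to recover the full path
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    tags = ["O", "B-Chemical", "I-Chemical"]
    forbid = -1e4  # large penalty: a sentence cannot start with I, and O cannot precede I
    # hypothetical emission scores for "tamoxifen citrate was given"
    emissions = [[0.1, 2.0, 1.5], [0.5, 0.3, 1.0], [2.0, 0.1, 0.2], [2.0, 0.1, 0.1]]
    transitions = [[0.0, 0.0, forbid], [0.0, -1.0, 1.0], [0.0, -1.0, 0.5]]
    start = [0.0, 0.0, forbid]
    path, score = viterbi_decode(emissions, transitions, start)
    print([tags[i] for i in path])  # → ['B-Chemical', 'I-Chemical', 'O', 'O']
```

The transition matrix is what lets a CRF enforce label consistency (for example, never emitting `I-Chemical` right after `O`), which a per-token softmax cannot do on its own.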
Collection:
01-internacional
Database:
MEDLINE
Main subject:
Semantics / Language
Language:
English
Journal:
Sci Rep
Year:
2024
Document type:
Article
Country of affiliation:
Saudi Arabia
Country of publication:
United Kingdom