BioBBC: a multi-feature model that enhances the detection of biomedical entities.

Alamro, Hind; Gojobori, Takashi; Essack, Magbubah; Gao, Xin

Affiliation

Alamro H; Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Gojobori T; Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Essack M; College of Computing, Umm Al-Qura University, Mecca, Saudi Arabia.
Gao X; Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Sci Rep; 14(1): 7697, 2024 04 02.

Article in English | MEDLINE | ID: mdl-38565624

ABSTRACT

The rapid increase in biomedical publications necessitates efficient systems that automatically handle Biomedical Named Entity Recognition (BioNER) tasks in unstructured text. However, accurately detecting biomedical entities is challenging because of the complexity of their names and the frequent use of abbreviations. In this paper, we propose BioBBC, a deep learning (DL) model that uses multi-feature embeddings and is built on the BERT-BiLSTM-CRF architecture to address the BioNER task. BioBBC consists of three main layers: an embedding layer, a Bidirectional Long Short-Term Memory (BiLSTM) layer, and a Conditional Random Fields (CRF) layer. BioBBC takes sentences from the biomedical domain as input and identifies the biomedical entities mentioned within the text. The embedding layer generates enriched contextual representation vectors of the input by learning the text through four types of embeddings: part-of-speech tag (POS tag) embedding, char-level embedding, BERT embedding, and data-specific embedding. The BiLSTM layer produces additional syntactic and semantic feature representations. Finally, the CRF layer identifies the best possible tag sequence for the input sentence. Our model is well-constructed and well-optimized for detecting different types of biomedical entities. In our experiments, the model outperformed state-of-the-art (SOTA) models with significant improvements on six benchmark BioNER datasets.
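As a rough illustration of the pipeline described in the abstract, the sketch below shows two of its steps in plain Python: joining four per-token feature vectors into one embedding, and choosing the best tag sequence with Viterbi decoding, the standard decoder for linear-chain CRFs. It is not the authors' implementation. The abstract does not say how the four embeddings are combined or how the CRF is decoded, so concatenation and Viterbi are assumptions, and the tag set, function names, and all scores are hypothetical. The BiLSTM that would produce the emission scores is omitted.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical BIO tag set; real BioNER datasets use entity-specific labels.
TAGS = ["O", "B-ENT", "I-ENT"]


def concat_features(pos_vec: Sequence[float], char_vec: Sequence[float],
                    bert_vec: Sequence[float], data_vec: Sequence[float]) -> List[float]:
    """Join four per-token feature vectors by concatenation (an assumption)."""
    return list(pos_vec) + list(char_vec) + list(bert_vec) + list(data_vec)


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain model.

    emissions[t][tag] is the per-token score of `tag` at position t;
    transitions[(prev, tag)] is the score of moving from `prev` to `tag`
    (missing pairs default to 0.0).
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions.get((p, tag), 0.0))
            scores[tag] = best[-1][prev] + transitions.get((prev, tag), 0.0) + emissions[t][tag]
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag to recover the path.
    path = [max(TAGS, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up scores for the tokens "the", "BRCA1", "gene".
emissions = [
    {"O": 2.0, "B-ENT": 0.1, "I-ENT": 0.5},
    {"O": 0.2, "B-ENT": 1.5, "I-ENT": 1.6},
    {"O": 0.9, "B-ENT": 0.1, "I-ENT": 1.0},
]
# Strongly penalize an entity continuation that directly follows "O".
transitions = {("O", "I-ENT"): -10.0}

print(viterbi(emissions, transitions))  # ['O', 'B-ENT', 'I-ENT']
```

With these scores, picking the best tag for each token independently would give O, I-ENT, I-ENT, which is not a valid BIO sequence. Scoring whole sequences with transition terms is what lets a CRF-style layer prefer O, B-ENT, I-ENT instead.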

Keywords

BiLSTM; BioBERT; Biomedical named entity recognition; Machine learning; NER; Natural language processing; PubMedBERT

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Semantics / Language Language: En Journal: Sci Rep Year: 2024 Document type: Article Affiliation country: Saudi Arabia Publication country: United Kingdom
