Contextual Word Embedding for Biomedical Knowledge Extraction: a Rapid Review and Case Study.
Vithanage, Dinithi; Yu, Ping; Wang, Lei; Deng, Chao.
Affiliation
  • Vithanage D; School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia.
  • Yu P; School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia.
  • Wang L; School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia.
  • Deng C; School of Medical, Indigenous and Health Sciences, University of Wollongong, Wollongong, NSW 2522, Australia.
J Healthc Inform Res; 8(1): 158-179, 2024 Mar.
Article in En | MEDLINE | ID: mdl-38273979
ABSTRACT
Recent advancements in natural language processing (NLP), particularly contextual word embedding models, have improved knowledge extraction from biomedical and healthcare texts. However, comprehensive research comparing these models remains limited. This study conducts a scoping review and compares the performance of the major contextual word embedding models for biomedical knowledge extraction. From 26 articles retrieved from Scopus, PubMed, PubMed Central, and Google Scholar between 2017 and 2021, 18 notable contextual word embedding models were identified. These include ELMo, BERT, BioBERT, BlueBERT, CancerBERT, DDS-BERT, RuBERT, LABSE, EhrBERT, MedBERT, Clinical BERT, Clinical BioBERT, Discharge Summary BERT, Discharge Summary BioBERT, GPT, GPT-2, GPT-3, and GPT2-Bio-Pt. A case study compared the performance of six representative models (ELMo, BERT, BioBERT, BlueBERT, Clinical BioBERT, and GPT-3) across text classification, named entity recognition, and question answering. The evaluation utilized datasets comprising biomedical text from tweets, NCBI, PubMed, and clinical notes sourced from two electronic health record datasets. Performance was measured with accuracy and F1 score. The results of this case study reveal that BioBERT performs best on biomedical text, while Clinical BioBERT excels on clinical notes. These findings offer crucial insights into word embedding models for researchers, practitioners, and stakeholders utilizing NLP in biomedical and clinical document analysis.
Supplementary Information: The online version contains supplementary material available at 10.1007/s41666-023-00157-y.
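To illustrate the kind of workflow the case study evaluates, the sketch below loads a publicly available BioBERT checkpoint with the Hugging Face Transformers library and scores one biomedical sentence with a binary classification head. This is a minimal illustration only: the model ID, label count, and example sentence are assumptions, not the authors' experimental setup, and in practice the classification head would first be fine-tuned on a labeled biomedical corpus before computing accuracy or F1.

# Illustrative sketch (not the authors' code): text classification with a
# pretrained BioBERT checkpoint via Hugging Face Transformers.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dmis-lab/biobert-base-cased-v1.1"  # public BioBERT checkpoint (assumed choice)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# num_labels=2 attaches a randomly initialized classification head; it must be
# fine-tuned on task data before its predictions are meaningful.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
model.eval()

text = "BRCA1 mutations are associated with increased breast cancer risk."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print(probs)  # class probabilities from the (here untrained) classifier head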
Keywords

Full text: 1 Collections: 01-international Database: MEDLINE Study type: Prognostic_studies / Systematic_reviews Language: En Publication year: 2024 Document type: Article
