BioBERT and Similar Approaches for Relation Extraction.
Bhasuran, Balu.
Affiliation
  • Bhasuran B; DRDO-BU Center for Life Sciences, Bharathiar University Campus, Coimbatore, Tamilnadu, India. balubhasuran08@gmail.com.
Methods Mol Biol ; 2496: 221-235, 2022.
Article in English | MEDLINE | ID: mdl-35713867
ABSTRACT
In biomedicine, facts about relations between entities (diseases, genes, drugs, etc.) are hidden in a large trove of 30 million scientific publications. Curated relation information has proven to play an important role in applications such as drug repurposing and precision medicine. Recently, advances in deep learning produced a transformer architecture named BERT (Bidirectional Encoder Representations from Transformers). This pretrained language model, trained on the BooksCorpus (800M words) and English Wikipedia (2,500M words), reported state-of-the-art results in various NLP (Natural Language Processing) tasks, including relation extraction. It is a widely accepted notion that, due to word distribution shift, general-domain models perform poorly on information extraction tasks in the biomedical domain. For this reason, the architecture was later adapted to the biomedical domain by training the language models on 28 million scientific articles from PubMed and PubMed Central. This chapter presents a protocol for relation extraction using BERT, discussing state-of-the-art BERT variants for the biomedical domain such as BioBERT. The protocol covers the general BERT architecture, pretraining and fine-tuning, leveraging biomedical information, and finally knowledge graph infusion into the BERT model layers.
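As a concrete illustration of the preprocessing step used in BioBERT-style relation extraction, candidate entities in a sentence are commonly anonymized with type markers (e.g. @GENE$, @DISEASE$) before the sentence is fed to the encoder for classification. The sketch below is a minimal, hedged example of that masking step; the function name, marker format, and example sentence are illustrative assumptions, not code from the chapter.

```python
# Minimal sketch of entity anonymization for BioBERT-style relation
# extraction: each candidate entity span is replaced with an @TYPE$
# marker so the classifier focuses on the relation, not the surface form.

def mask_entities(sentence, entities):
    """Replace each (start, end, type) character span with an @TYPE$ marker.

    entities: list of non-overlapping (start, end, type) spans,
    with end exclusive. Spans are processed left to right.
    """
    out = []
    cursor = 0
    for start, end, etype in sorted(entities):
        out.append(sentence[cursor:start])  # text before the entity
        out.append(f"@{etype}$")            # type marker replaces the mention
        cursor = end
    out.append(sentence[cursor:])           # trailing text
    return "".join(out)

# Hypothetical gene-disease relation candidate:
sentence = "BRCA1 mutations increase the risk of breast cancer."
entities = [(0, 5, "GENE"), (37, 50, "DISEASE")]
masked = mask_entities(sentence, entities)
# masked == "@GENE$ mutations increase the risk of @DISEASE$."
```

The masked sentence would then be tokenized and passed to a fine-tuned BioBERT sequence classifier, whose output label indicates whether (and which) relation holds between the two marked entities.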
Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Natural Language Processing / Information Storage and Retrieval Language: English Year of publication: 2022 Document type: Article