NEREL-BIO: a dataset of biomedical abstracts annotated with nested named entities.

Loukachevitch, Natalia; Manandhar, Suresh; Baral, Elina; Rozhkov, Igor; Braslavski, Pavel; Ivanov, Vladimir; Batura, Tatiana; Tutubalina, Elena

Affiliation

Loukachevitch N; Lomonosov Moscow State University, Moscow 19899, Russia.
Manandhar S; Madan Bhandari University of Science and Technology, Chitlang 44600, Nepal.
Baral E; Madan Bhandari University of Science and Technology, Chitlang 44600, Nepal.
Rozhkov I; Lomonosov Moscow State University, Moscow 19899, Russia.
Braslavski P; Ural Federal University, Yekaterinburg 620002, Russia.
Ivanov V; HSE University, Moscow 101000, Russia.
Batura T; Innopolis University, Innopolis 420500, Russia.
Tutubalina E; A.P. Ershov Institute of Informatics Systems, Novosibirsk 630090, Russia.

Bioinformatics; 39(4), 2023-04-03.

Article in English | MEDLINE | ID: mdl-37004189

ABSTRACT

MOTIVATION:

This article describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian and a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both the general and biomedical domains, making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries to connect to shorter entities nested within longer entities, making them harder to detect.

RESULTS:

NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. All English PubMed annotations have corresponding Russian counterparts. Thus, NEREL-BIO has the following specific features: it annotates nested named entities, and it can be used as a benchmark for cross-domain (NEREL → NEREL-BIO) and cross-language (English → Russian) transfer. We experiment with both transformer-based sequence models and machine reading comprehension models and report their results.

AVAILABILITY AND IMPLEMENTATION:

The dataset and annotation guidelines are freely available at https://github.com/nerel-ds/NEREL-BIO.
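As an illustration of the nested named entities mentioned in the abstract, the following minimal sketch represents entities as character spans and finds shorter spans contained in longer ones. The `Span` type, the `nested_pairs` helper, the example phrase, and the labels `DISEASE` and `ORGAN` are all assumptions made for illustration; they are not taken from the NEREL-BIO guidelines or data format.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical entity type, not from the NEREL-BIO guidelines


def nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner lies fully inside a longer outer span."""
    pairs = []
    for outer in spans:
        for inner in spans:
            if inner is outer:
                continue
            contains = outer.start <= inner.start and inner.end <= outer.end
            strictly_longer = (outer.end - outer.start) > (inner.end - inner.start)
            if contains and strictly_longer:
                pairs.append((outer, inner))
    return pairs


# Hypothetical example: an organ mention nested inside a disease mention.
text = "chronic kidney disease"
spans = [
    Span(0, 22, "DISEASE"),
    Span(8, 14, "ORGAN"),
]

for outer, inner in nested_pairs(spans):
    print(f"{text[inner.start:inner.end]!r} ({inner.label}) is nested in "
          f"{text[outer.start:outer.end]!r} ({outer.label})")
```

A flat sequence tagger assigns one label per token and so could keep only one of these two spans, which is one way to see why nested annotation calls for models that can output overlapping spans.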

Subjects

Natural Language Processing; Semantics; PubMed; Language

Full text: 1 | Database: MEDLINE | Main subject: Semantics / Natural Language Processing | Language: English | Publication year: 2023 | Document type: Article
