Search | BVS España Search Portal

TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus.

Álvarez-Mellado, Elena; Díez-Platas, María Luisa; Ruiz-Fabo, Pablo; Bermúdez, Helena; Ros, Salvador; González-Blanco, Elena.

Lang Resour Eval ; 55(2): 525-549, 2021.

Article in English | MEDLINE | ID: mdl-34776810

ABSTRACT

Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of text can provide us with valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents in mind (e.g., journalistic text), and general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when applying traditional NER categories to such a corpus, and we propose a novel humanist-friendly, TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.
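As a rough illustration of what TEI-style markup of a medieval person mention might look like, the sketch below builds a nested element in Python. The element names `persName`, `roleName`, `forename` and `placeName` are standard TEI elements, but the example name, the nesting, and the helper `annotate_person` are assumptions made for illustration; they are not taken from the scheme proposed in the paper.

```python
import xml.etree.ElementTree as ET


def annotate_person(role, forename, origin):
    """Build a TEI-style <persName> with a nested role and place of origin.

    Hypothetical helper: the nesting shown here is an assumption,
    not the annotation scheme described in the abstract.
    """
    pers = ET.Element("persName")

    role_el = ET.SubElement(pers, "roleName")
    role_el.text = role
    role_el.tail = " "  # text that follows the closing </roleName>

    name_el = ET.SubElement(pers, "forename")
    name_el.text = forename
    name_el.tail = " de "  # connective between the name and the origin

    place_el = ET.SubElement(pers, "placeName")
    place_el.text = origin
    return pers


# Invented example mention: "don Pedro de Toledo"
element = annotate_person("don", "Pedro", "Toledo")
xml_string = ET.tostring(element, encoding="unicode")
print(xml_string)
```

Nesting the role and the place of origin inside the person name keeps the whole mention as one entity while still exposing its parts, which is one way a scheme could handle the compound names typical of medieval documents.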

Medieval Spanish (12th-15th centuries) named entity recognition and attribute annotation system based on contextual information.

Díez Platas, Mª Luisa; Ros Muñoz, Salvador; González-Blanco, Elena; Ruiz Fabo, Pablo; Álvarez Mellado, Elena.

J Assoc Inf Sci Technol ; 72(2): 224-238, 2021 Feb.

Article in English | MEDLINE | ID: mdl-33665231

ABSTRACT

The recognition of named entities in Spanish medieval texts is highly complex and involves specific challenges: first, the complex morphosyntactic characteristics of proper-noun use in medieval texts; second, the lack of strict orthographic standards; and finally, diachronic and geographical variation in Spanish from the 12th to the 15th century. In this period, named entities usually appear as complex text structures. For example, it was common to add nicknames and information about a person's role in society and geographic origin. To tackle this complexity, a named-entity recognition and classification system has been implemented. The system uses contextual cues based on semantics to detect entities and assign a type. Given the occurrence of entities with attached attributes, entity contexts are also parsed to determine entity-type-specific dependencies for these attributes. Moreover, it uses a variant generator to handle the diachronic evolution of Spanish medieval terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its own lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87. Attribute annotation was evaluated for person and role name attributes, with an overall F1 of 0.75.
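The two ideas mentioned in this abstract, contextual cues and a variant generator, can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the cue list `CUES`, the spelling rules `VARIANT_RULES`, the functions `tag_entities` and `generate_variants`, and the sample sentence are all hypothetical and do not reproduce the system's actual rules or lexica.

```python
import re

# Hypothetical trigger words mapped to entity types (assumption, not the paper's lexicon).
CUES = {
    "don": "PERSON",
    "doña": "PERSON",
    "rey": "PERSON",
    "villa de": "PLACE",
}

# Hypothetical archaic-to-modern spelling rewrites (assumption, not the paper's rules).
VARIANT_RULES = [("ç", "z"), ("ff", "f")]


def tag_entities(text):
    """Label the token following each contextual cue with the cue's entity type."""
    results = []
    for cue, label in CUES.items():
        pattern = r"\b" + re.escape(cue) + r"\s+(\w+)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            results.append((match.group(1), label))
    return results


def generate_variants(word):
    """Return the word plus spellings obtained by applying each rewrite rule."""
    variants = {word}
    for old, new in VARIANT_RULES:
        variants |= {v.replace(old, new) for v in variants}
    return variants


sample = "don Gonçalo vezino de la villa de Madrit"  # invented sample sentence
entities = tag_entities(sample)
variants = generate_variants("gonçalo")
print(entities)
print(sorted(variants))
```

A real system would need far richer context than a single preceding cue, but the sketch shows why generated variants are useful: a gazetteer entry in one spelling can be matched against the several spellings found in texts without orthographic standards.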
