Is Boundary Annotation Necessary? Evaluating Boundary-Free Approaches to Improve Clinical Named Entity Annotation Efficiency: Case Study.
Herman Bernardim Andrade, Gabriel; Yada, Shuntaro; Aramaki, Eiji.
Affiliation
  • Herman Bernardim Andrade G; Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan.
  • Yada S; Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan.
  • Aramaki E; Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan.
JMIR Med Inform ; 12: e59680, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38954456
ABSTRACT

BACKGROUND:

Named entity recognition (NER) is a fundamental task in natural language processing. However, it is typically preceded by named entity annotation, which poses several challenges, especially in the clinical domain. For instance, determining entity boundaries is one of the most common sources of disagreement between annotators, owing to questions such as whether modifiers or peripheral words should be annotated. If unresolved, these disagreements can induce inconsistency in the produced corpora. On the other hand, strict guidelines or adjudication sessions can further prolong an already slow and convoluted process.

OBJECTIVE:

This study addresses these challenges by evaluating 2 novel annotation methodologies, lenient span and point annotation, both intended to mitigate the difficulty of precisely determining entity boundaries.

METHODS:

We evaluate their effects through an annotation case study on a Japanese medical case report data set. We compare annotation time, annotator agreement, and the quality of the produced labeling and assess the impact on the performance of an NER system trained on the annotated corpus.
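The abstract does not state how annotator agreement was computed, so the following is only an illustrative sketch of why relaxing boundaries can raise agreement. It assumes entities are character-offset spans with a label, treats "lenient" as any overlap between same-label spans, and treats a "point" as a single offset falling inside a span. The names `strict_match`, `lenient_match`, `point_match`, and `pairwise_f1` are hypothetical and are not taken from the study.

```python
from typing import Callable, List, Tuple

# Assumed representation: (start, end_exclusive, label) in character offsets.
Span = Tuple[int, int, str]


def strict_match(a: Span, b: Span) -> bool:
    """Boundary-strict: start, end, and label must all be identical."""
    return a == b


def lenient_match(a: Span, b: Span) -> bool:
    """Assumed lenient criterion: same label and overlapping offsets."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def point_match(point: Tuple[int, str], span: Span) -> bool:
    """Assumed point criterion: the marked offset lies inside a same-label span."""
    pos, label = point
    return label == span[2] and span[0] <= pos < span[1]


def pairwise_f1(
    ann_a: List[Span],
    ann_b: List[Span],
    match: Callable[[Span, Span], bool],
) -> float:
    """F1-style agreement between two annotators under a given match criterion."""
    if not ann_a or not ann_b:
        return 0.0
    # Fraction of A's spans that have a counterpart in B, and vice versa.
    precision = sum(any(match(a, b) for b in ann_b) for a in ann_a) / len(ann_a)
    recall = sum(any(match(a, b) for a in ann_a) for b in ann_b) / len(ann_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two annotators agree on the first entity but disagree on whether a
# modifier belongs to the second one (start offset 10 vs 8).
annotator_a = [(0, 5, "disease"), (10, 18, "drug")]
annotator_b = [(0, 5, "disease"), (8, 18, "drug")]

strict_score = pairwise_f1(annotator_a, annotator_b, strict_match)
lenient_score = pairwise_f1(annotator_a, annotator_b, lenient_match)
```

In this toy example the boundary disagreement counts as a miss under the strict criterion and as a hit under the overlap criterion, which mirrors the kind of disagreement the study targets. The actual metric and matching rules used by the authors may differ.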

RESULTS:

We saw significant improvements in the labeling process efficiency, with up to a 25% reduction in overall annotation time and even a 10% improvement in annotator agreement compared to the traditional boundary-strict approach. However, even the best-achieved NER model presented some drop in performance compared to the traditional annotation methodology.

CONCLUSIONS:

Our findings demonstrate a balance between annotation speed and model performance. Although disregarding boundary information affects model performance to some extent, this is counterbalanced by significant reductions in the annotator's workload and notable improvements in the speed of the annotation process. These benefits may prove valuable in various applications, offering an attractive compromise for developers and researchers.

Full text: 1 Databases: MEDLINE Language: English Journal: JMIR Med Inform Year: 2024 Document type: Article Country of affiliation: Japan
