ABSTRACT
BACKGROUND: Entity normalization is an important information extraction task that has recently gained attention, particularly in the clinical, biomedical, and life science domains. State-of-the-art methods perform rather well on several popular benchmark datasets. Yet, we argue that the task is far from resolved. RESULTS: We selected two gold standard corpora and two state-of-the-art methods to highlight some evaluation biases. We present non-exhaustive initial findings on evaluation problems in the entity normalization task. CONCLUSIONS: Our analysis suggests better evaluation practices to support methodological research in this field.
Subject(s)
Biological Science Disciplines, Information Storage and Retrieval, Research Design, Bias, Natural Language Processing
ABSTRACT
INTRODUCTION: Positron emission tomography (PET) amyloid imaging has become an important part of the diagnostic workup for patients with primary progressive aphasia (PPA) and uncertain underlying pathology. Here, we employ a semi-automated analysis of connected speech (CS) with a twofold objective. First, to determine whether quantitative CS features can help select PPA patients with a higher probability of a positive PET amyloid imaging result. Second, to examine the relevant group differences from a clinical perspective. METHODS: A total of 117 CS samples from a well-characterised cohort of PPA patients who underwent PET amyloid imaging were collected. Expert consensus established PET amyloid status for each patient, and 40% of the sample was amyloid positive. RESULTS: Leave-one-out cross-validation yielded 77% classification accuracy (sensitivity: 74%, specificity: 79%). DISCUSSION: Our results confirm the potential of CS analysis as a screening tool. Discriminant CS features from lexical, syntactic, pragmatic, and semantic domains are discussed.
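To make the reported metrics concrete, the following is a minimal sketch of how leave-one-out cross-validation produces accuracy, sensitivity, and specificity. The abstract does not state which classifier or features were used, so the nearest-centroid rule, the single numeric feature, and the function names here are illustrative assumptions, and the data are synthetic toy values, not study data.

```python
from statistics import mean


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest to x (one numeric feature).

    Assumption: a stand-in classifier, not the method used in the study.
    """
    centroids = {
        label: mean(v for v, y in zip(train_x, train_y) if y == label)
        for label in set(train_y)
    }
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def leave_one_out(xs, ys):
    """Hold out each sample once, train on the rest, and tally the outcomes.

    Labels: 1 = positive (e.g. amyloid positive), 0 = negative.
    """
    tp = tn = fp = fn = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = nearest_centroid_predict(train_x, train_y, xs[i])
        if ys[i] == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1
    return {
        "accuracy": (tp + tn) / len(xs),
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
    }


# Synthetic toy feature values and labels (hypothetical, for illustration only).
toy_x = [1.0, 1.1, 0.9, 0.0, 0.1, -0.1, 0.05]
toy_y = [1, 1, 1, 0, 0, 0, 1]
metrics = leave_one_out(toy_x, toy_y)
```

Because each sample is scored by a model that never saw it, the three figures estimate out-of-sample performance even when the cohort is small, which is the usual reason for choosing this scheme with around a hundred samples.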