Benchmarking table recognition performance on biomedical literature on neurological disorders.
Bioinformatics; 38(6): 1624-1630, 2022 03 04.
Article | En | MEDLINE | ID: mdl-34935870
MOTIVATION: Table recognition systems are widely used to extract and structure quantitative information from the vast number of documents that are increasingly available from open sources. While many systems already perform well on tables with a simple layout, tables in the biomedical domain are often much more complex. Benchmark and training data for such tables are, however, very limited.
RESULTS: To address this issue, we present a novel, highly curated benchmark dataset based on a hand-curated literature corpus on neurological disorders, which can be used to tune and evaluate table extraction applications for this challenging domain. We evaluate several state-of-the-art table extraction systems on our proposed benchmark and discuss challenges that emerged during benchmark creation, as well as factors that can affect the performance of recognition methods. For the evaluation procedure, we propose a new metric and several improvements that result in a better performance evaluation.
AVAILABILITY AND IMPLEMENTATION: The resulting benchmark dataset (https://zenodo.org/record/5549977) and the source code for our novel evaluation approach are openly accessible.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Benchmarking / Nervous System Diseases
Limits: Humans
Language: En
Journal: Bioinformatics
Journal subject: MEDICAL INFORMATICS
Year: 2022
Document type: Article
Country of affiliation: Germany
Country of publication: United Kingdom