Question-driven summarization of answers to consumer health questions.

Savery, Max; Abacha, Asma Ben; Gayen, Soumya; Demner-Fushman, Dina

Affiliation

Savery M; Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
Abacha AB; Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
Gayen S; Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
Demner-Fushman D; Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, USA. ddemner@mail.nih.gov.

Sci Data; 7(1): 322, 2020 Oct 02.

Article in English | MEDLINE | ID: mdl-33009402

ABSTRACT

Automatic summarization of natural language is a widely studied area in computer science, one that is broadly applicable to anyone who needs to understand large quantities of information. In the medical domain, automatic summarization has the potential to make health information more accessible to people without medical expertise. However, to evaluate the quality of summaries generated by summarization algorithms, researchers first require gold-standard, human-generated summaries. Unfortunately, there is no available data for the purpose of assessing summaries that help consumers of health information answer their questions. To address this issue, we present the MEDIQA-Answer Summarization dataset, the first dataset designed for question-driven, consumer-focused summarization. It contains 156 health questions asked by consumers, answers to these questions, and manually generated summaries of these answers. The dataset's unique structure allows it to be used for at least eight different types of summarization evaluations. We also benchmark the performance of baseline and state-of-the-art deep learning approaches on the dataset, demonstrating how it can be used to evaluate automatically generated summaries.

Subjects

Consumer Health Informatics; Information Storage and Retrieval; Natural Language Processing

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Natural Language Processing / Information Storage and Retrieval / Consumer Health Informatics | Language: En | Journal: Sci Data | Publication year: 2020 | Document type: Article | Country of affiliation: United States
