Evaluating progress in automatic chest X-ray radiology report generation.
Yu, Feiyang; Endo, Mark; Krishnan, Rayan; Pan, Ian; Tsai, Andy; Reis, Eduardo Pontes; Fonseca, Eduardo Kaiser Ururahy Nunes; Lee, Henrique Min Ho; Abad, Zahra Shakeri Hossein; Ng, Andrew Y; Langlotz, Curtis P; Venugopal, Vasantha Kumar; Rajpurkar, Pranav.
Affiliation
  • Yu F; Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
  • Endo M; Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
  • Krishnan R; Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
  • Pan I; Department of Radiology, Brigham and Women's Hospital, Boston, MA 02115, USA.
  • Tsai A; Department of Radiology, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA.
  • Reis EP; Cardiothoracic Radiology Group, Hospital Israelita Albert Einstein, São Paulo, São Paulo 05652, Brazil.
  • Fonseca EKUN; Cardiothoracic Radiology Group, Hospital Israelita Albert Einstein, São Paulo, São Paulo 05652, Brazil.
  • Lee HMH; Cardiothoracic Radiology Group, Hospital Israelita Albert Einstein, São Paulo, São Paulo 05652, Brazil.
  • Abad ZSH; Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada.
  • Ng AY; Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
  • Langlotz CP; AIMI Center, Stanford University, Stanford, CA 94304, USA.
  • Venugopal VK; CARPL.ai, New Delhi, Delhi 110016, India.
  • Rajpurkar P; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.
Patterns (N Y); 4(9): 100802, 2023 Sep 08.
Article in English | MEDLINE | ID: mdl-37720336
Artificial intelligence (AI) models for automatic generation of narrative radiology reports from images have the potential to enhance efficiency and reduce the workload of radiologists. However, evaluating the correctness of these reports requires metrics that can capture clinically pertinent differences. In this study, we investigate the alignment between automated metrics and radiologists' scoring of errors in report generation. We address the limitations of existing metrics by proposing new metrics, RadGraph F1 and RadCliQ, which demonstrate stronger correlation with radiologists' evaluations. In addition, we analyze the failure modes of the metrics to understand their limitations and provide guidance for metric selection and interpretation. This study establishes RadGraph F1 and RadCliQ as meaningful metrics for guiding future research in radiology report generation.

Full text: 1 Database: MEDLINE Study type: Guideline / Prognostic_studies Language: En Publication year: 2023 Document type: Article