Assessing ChatGPT's ability to emulate human reviewers in scientific research: A descriptive and qualitative approach.
Suleiman, Aiman; von Wedel, Dario; Munoz-Acuna, Ricardo; Redaelli, Simone; Santarisi, Abeer; Seibold, Eva-Lotte; Ratajczak, Nikolai; Kato, Shinichiro; Said, Nader; Sundar, Eswar; Goodspeed, Valerie; Schaefer, Maximilian S.
Affiliation
  • Suleiman A; Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Anesthesia, Critica
  • von Wedel D; Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Munoz-Acuna R; Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Redaelli S; Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Santarisi A; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Emergency Medicine, Disaster Medicine Fellowship, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Seibold EL; Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Ratajczak N; Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Kato S; Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Said N; Department of Industrial Engineering, Faculty of Engineering Technologies and Sciences, Higher Colleges of Technology, DWC, Dubai, United Arab Emirates.
  • Sundar E; Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Goodspeed V; Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Schaefer MS; Department of Anesthesia, Critical Care and Pain Medicine, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Center for Anesthesia Research Excellence (CARE), Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA; Klinik für Anästhesiologie, Unive
Comput Methods Programs Biomed; 254: 108313, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38954915
ABSTRACT

BACKGROUND:

ChatGPT is an AI platform whose relevance in the peer review of scientific articles is steadily growing. Nonetheless, it has sparked debate over its potential biases and inaccuracies. This study assesses ChatGPT's ability to qualitatively emulate human reviewers in scientific research.

METHODS:

We included the first submitted versions of the twenty most recent original research articles published by July 3, 2023, in a high-profile medical journal. Each article was evaluated by at least three human reviewers during the initial review stage. Three researchers with medical backgrounds and expertise in manuscript revision then independently and qualitatively assessed the agreement between the peer reviews generated by ChatGPT (GPT-4) and the comments provided by the human reviewers for these articles. The level of agreement was categorized as complete, partial, none, or contradictory.
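The record does not specify how the GPT-4 reviews were produced programmatically. A minimal sketch follows, assuming the OpenAI Python SDK; the model name, prompt wording, and `generate_review` helper are illustrative assumptions, not the study's actual procedure.

```python
# Hypothetical sketch: generating a GPT-4 peer review for a manuscript.
# Assumes the OpenAI Python SDK (v1); the study's real prompt is unknown.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_review(manuscript_text: str) -> str:
    """Ask GPT-4 for a structured peer review of a manuscript draft."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a peer reviewer for a medical journal."},
            {"role": "user",
             "content": ("Write a structured peer review (major and minor "
                         "comments) of the following manuscript:\n\n"
                         + manuscript_text)},
        ],
    )
    return response.choices[0].message.content
```

The returned free-text review would then be split into individual comments and matched against the human reviewers' comments by the three assessors.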

RESULTS:

A total of 720 human reviewer comments were assessed. Agreement among the three assessors was good (overall kappa > 0.6). ChatGPT's comments demonstrated complete agreement in quality and substance with 48 (6.7%) of the human reviewers' comments; partial agreement with 92 (12.8%), identifying issues that required further elaboration or recommending supplementary steps to address concerns; no agreement with 565 (78.5%); and contradiction with 15 (2.1%). ChatGPT's comments on methods had the lowest proportion of complete agreement (13 comments, 3.6%), while general comments on the manuscript had the highest (17 comments, 22.1%).
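The record reports only the summary statistic (overall kappa > 0.6). A sketch of how agreement among three raters over four categories can be quantified follows, assuming ratings coded 0-3 for complete, partial, none, and contradictory; the toy data and coding scheme are assumptions for illustration.

```python
# Sketch: inter-rater agreement (Fleiss' kappa) for three assessors
# rating each comment into one of four categories (coded 0-3).
# The toy ratings below are illustrative; the study assessed 720 comments.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = reviewer comments, columns = the three independent assessors
ratings = np.array([
    [2, 2, 2],  # all three assessors coded "none"
    [1, 1, 2],  # two coded "partial", one coded "none"
    [0, 0, 0],  # all three coded "complete"
    [3, 2, 3],  # two coded "contradictory", one coded "none"
])

# Convert raw ratings into a subjects-by-categories count table,
# then compute Fleiss' kappa across the three assessors.
table, _ = aggregate_raters(ratings, n_cat=4)
print(f"Fleiss' kappa: {fleiss_kappa(table, method='fleiss'):.2f}")
```

Values above 0.6 are conventionally read as substantial agreement, consistent with the threshold the authors report.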

CONCLUSION:

ChatGPT (GPT-4) has a limited ability to emulate human reviewers within the peer review process of scientific research.
Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Comput Methods Programs Biomed Journal subject: INFORMATICA MEDICA Year: 2024 Document type: Article