Clinical Research With Large Language Models Generated Writing-Clinical Research with AI-assisted Writing (CRAW) Study.
Huespe, Ivan A; Echeverri, Jorge; Khalid, Aisha; Carboni Bisso, Indalecio; Musso, Carlos G; Surani, Salim; Bansal, Vikas; Kashyap, Rahul.
Affiliations
  • Huespe IA; Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.
  • Echeverri J; Universidad de Buenos Aires, Buenos Aires, Argentina.
  • Khalid A; Universidad Javeriana, Bogotá, Colombia.
  • Carboni Bisso I; Harvard Medical School, Boston, MA.
  • Musso CG; Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.
  • Surani S; Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.
  • Bansal V; Facultad de Ciencias de la Salud, Universidad Simon Bolivar, Barranquilla, Colombia.
  • Kashyap R; Mayo Clinic, Rochester, MN.
Crit Care Explor; 5(10): e0975, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37795455
ABSTRACT
IMPORTANCE:

The scientific community debates Generative Pre-trained Transformer (GPT)-3.5's article quality, authorship merit, originality, and ethical use in scientific writing.

OBJECTIVES:

To assess GPT-3.5's ability to craft the background section of a critical care clinical research question, compared with that of medical researchers with H-indices of 22 and 13.

DESIGN:

Observational cross-sectional study.

SETTING:

Researchers from 20 countries across six continents evaluated the backgrounds.

PARTICIPANTS:

Researchers with a Scopus index greater than 1 were included.

MAIN OUTCOMES AND MEASURES:

In this study, we generated a background section of a critical care clinical research question on "acute kidney injury in sepsis" using three different methods: a researcher with an H-index greater than 20, a researcher with an H-index greater than 10, and GPT-3.5. The three background sections were presented in a blinded survey to researchers with H-indices ranging from 1 to 96. First, the researchers evaluated the main components of each background using a 5-point Likert scale. Second, they were asked to identify which backgrounds were written by humans alone and which with large language model-generated tools.

RESULTS:

A total of 80 researchers completed the survey. The median H-index was 3 (interquartile range, 1-7.25), and the largest share of researchers (36%) were from the Critical Care specialty. Compared with the researchers with H-indices of 22 and 13, GPT-3.5 was rated higher on the Likert scale for the main background components (median 4.5 vs. 3.82 vs. 3.6 vs. 4.5, respectively; p < 0.001). The sensitivity and specificity for distinguishing researcher writing from GPT-3.5 writing were poor: 22.4% and 57.6%, respectively.

CONCLUSIONS AND RELEVANCE:

GPT-3.5 could create background research content indistinguishable from the writing of a medical researcher, and it was rated higher than medical researchers with H-indices of 22 and 13 in writing the background section of a critical care clinical research question.
Full text: 1 | Database: MEDLINE | Language: English | Year of publication: 2023 | Document type: Article