Performance of Artificial Intelligence Content Detectors Using Human and Artificial Intelligence-Generated Scientific Writing.

Flitcroft, Madelyn A; Sheriff, Salma A; Wolfrath, Nathan; Maddula, Ragasnehith; McConnell, Laura; Xing, Yun; Haines, Krista L; Wong, Sandra L; Kothari, Anai N

Affiliation

Flitcroft MA; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
Sheriff SA; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
Wolfrath N; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
Maddula R; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
McConnell L; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
Xing Y; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
Haines KL; Department of Surgery, Division of Trauma, Critical Care, and Acute Care Surgery, Duke University, Durham, NC, USA.
Wong SL; Department of Surgery, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA.
Kothari AN; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA. akothari@mcw.edu.

Ann Surg Oncol; 31(10): 6387-6393, 2024 Oct.

Article in English | MEDLINE | ID: mdl-38909113

ABSTRACT

BACKGROUND:

Few studies have examined the performance of artificial intelligence (AI) content detection in scientific writing. This study evaluates the performance of publicly available AI content detectors when applied to both human-written and AI-generated scientific articles.

METHODS:

Articles published in Annals of Surgical Oncology (ASO) during 2022, together with AI-generated articles produced with OpenAI's ChatGPT, were analyzed by three AI content detectors to estimate the probability that their content was AI-generated. Both full manuscripts and their individual sections were evaluated. Group comparisons and trend analyses were conducted using ANOVA and linear regression. Classification performance was determined using the area under the curve (AUC).
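
The abstract does not state how the AUC was computed. As an illustration only, the following minimal sketch assumes the common rank (Mann-Whitney) formulation: the AUC for one detector equals the probability that a randomly chosen AI-generated article receives a higher score than a randomly chosen human-written one, with ties counted as one half. The function name `auc` and the scores below are hypothetical and are not data from the study.

```python
from itertools import product


def auc(ai_scores, human_scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Hypothetical helper: counts, over every (AI, human) pair, how often the
    AI-generated article scores higher; ties contribute 0.5.
    """
    if not ai_scores or not human_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for a, h in product(ai_scores, human_scores):
        if a > h:
            wins += 1.0
        elif a == h:
            wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


# Hypothetical detector probabilities (%), NOT results from the study.
ai_generated = [30.0, 60.0, 95.0]
human_written = [0.0, 10.0, 40.0]
print(round(auc(ai_generated, human_written), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

An AUC of 1.0 would mean the detector ranks every AI-generated article above every human-written one, while 0.5 corresponds to chance-level ranking.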

RESULTS:

A total of 449 original articles met the inclusion criteria and were evaluated to determine the likelihood that they had been generated by AI. Each detector also evaluated 47 AI-generated articles created from the titles of ASO articles. Human-written articles had an average probability of being AI-generated of 9.4%, with significant differences between the detectors. Only two (0.4%) human-written manuscripts were assigned a 0% probability of being AI-generated by all three detectors. Completely AI-generated articles received a higher average probability of being AI-generated (43.5%), ranging from 12.0 to 99.9%.

CONCLUSIONS:

This study demonstrates differences in the performance of various AI content detectors, which have the potential to label human-written articles as AI-generated. Any effort toward implementing AI detectors must include a strategy for continuous evaluation and validation as AI models and detectors rapidly evolve.

Subjects

Artificial Intelligence; Humans; Writing; Surgical Oncology

Keywords

AI detection; Artificial intelligence; ChatGPT; Generative AI

Collections: 01-internacional | Database: MEDLINE | Main subject: Artificial Intelligence | Limits: Humans | Language: English | Journal: Ann Surg Oncol | Journal subject: Neoplasms | Year of publication: 2024 | Document type: Article | Country of affiliation: United States | Country of publication: United States