Performance of Artificial Intelligence Content Detectors Using Human and Artificial Intelligence-Generated Scientific Writing.
Flitcroft, Madelyn A; Sheriff, Salma A; Wolfrath, Nathan; Maddula, Ragasnehith; McConnell, Laura; Xing, Yun; Haines, Krista L; Wong, Sandra L; Kothari, Anai N.
Affiliation
  • Flitcroft MA; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
  • Sheriff SA; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
  • Wolfrath N; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
  • Maddula R; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
  • McConnell L; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
  • Xing Y; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA.
  • Haines KL; Department of Surgery, Division of Trauma, Critical Care, and Acute Care Surgery, Duke University, Durham, NC, USA.
  • Wong SL; Department of Surgery, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA.
  • Kothari AN; Department of Surgery, Division of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI, USA. akothari@mcw.edu.
Ann Surg Oncol; 31(10): 6387-6393, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38909113
ABSTRACT

BACKGROUND:

Few studies have examined the performance of artificial intelligence (AI) content detection in scientific writing. This study evaluates the performance of publicly available AI content detectors when applied to both human-written and AI-generated scientific articles.

METHODS:

Articles published in Annals of Surgical Oncology (ASO) during the year 2022, as well as AI-generated articles using OpenAI's ChatGPT, were analyzed by three AI content detectors to assess the probability of AI-generated content. Full manuscripts and their individual sections were evaluated. Group comparisons and trend analyses were conducted by using ANOVA and linear regression. Classification performance was determined using area under the curve (AUC).
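
As an illustration of the classification metric named above, the following is a minimal sketch of how an AUC could be computed from detector scores. It is not the authors' analysis code: the function name `auc` and the example scores are hypothetical, invented only to show the calculation.

```python
def auc(human_scores, ai_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen AI-generated text
    receives a higher detector score than a randomly chosen
    human-written text; ties count as half.
    """
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


# Made-up "probability of AI-generated content" scores, NOT study data.
human = [0.02, 0.10, 0.35, 0.05]
ai = [0.12, 0.60, 0.99, 0.30]
print(auc(human, ai))
```

An AUC of 0.5 would mean the detector separates the two groups no better than chance, and 1.0 would mean every AI-generated text scores above every human-written one.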

RESULTS:

A total of 449 original articles met inclusion criteria and were evaluated to determine the likelihood of being generated by AI. Each detector also evaluated 47 AI-generated articles that were produced using titles from ASO articles. Human-written articles had an average probability of being AI-generated of 9.4%, with significant differences between the detectors. Only two (0.4%) human-written manuscripts were detected as having a 0% probability of being AI-generated by all three detectors. Completely AI-generated articles were evaluated to have a higher average probability of being AI-generated (43.5%), with a range from 12.0% to 99.9%.

CONCLUSIONS:

This study demonstrates differences in the performance of various AI content detectors with the potential to label human-written articles as AI-generated. Any effort toward implementing AI detectors must include a strategy for continuous evaluation and validation as AI models and detectors rapidly evolve.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Artificial Intelligence Limit: Humans Language: En Journal: Ann Surg Oncol Journal subject: NEOPLASMS Year: 2024 Document type: Article Affiliation country: United States
