Between human and AI: assessing the reliability of AI text detection tools.

Bellini, Valentina; Semeraro, Federico; Montomoli, Jonathan; Cascella, Marco; Bignami, Elena

Affiliations

Bellini V; Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Parma, Italy.
Semeraro F; Department of Anesthesia, Intensive Care and Prehospital Emergency, Maggiore Hospital Carlo Alberto Pizzardi, Bologna, Italy.
Montomoli J; Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Rimini, Italy.
Cascella M; Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana", University of Salerno, Baronissi, Italy.
Bignami E; Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Parma, Italy.

Curr Med Res Opin; 40(3): 353-358, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38265047

ABSTRACT

OBJECTIVE:

Large language models (LLMs) such as ChatGPT-4 have raised critical questions about whether their output can be distinguished from human-generated content. In this research, we evaluated the effectiveness of online detection tools in distinguishing text generated by ChatGPT-4 from human-written text.

METHODS:

Two texts produced by ChatGPT-4 using different prompts and one text written by a human author were analytically assessed with the following online detection tools: GPTZero, ZeroGPT, Writer ACD, and Originality.

RESULTS:

The findings revealed notable variance in the detection capabilities of the tools employed. GPTZero and ZeroGPT gave inconsistent assessments of whether the texts were AI-generated. Writer ACD predominantly classified the texts as human-written, whereas Originality consistently recognized the AI-generated content in both ChatGPT-4 samples. This highlights Originality's greater sensitivity to patterns characteristic of AI-generated text.

CONCLUSION:

The study demonstrates that, while automatic detection tools may discern texts generated by ChatGPT-4, their accuracy varies significantly. There is an urgent need for advanced detection tools to ensure the authenticity and integrity of content, especially in scientific and academic research. However, our findings also underscore the need for more refined detection methodologies that avoid misclassifying human-written content as AI-generated and vice versa.

Subjects

Artificial Intelligence; Writing; Humans

Keywords

Anesthesia; ChatGPT-4; artificial intelligence; intensive care; large language model

Database: MEDLINE | Main subject: Writing / Artificial Intelligence | Language: English | Publication year: 2024 | Document type: Article