Implementation and evaluation of an additional GPT-4-based reviewer in PRISMA-based medical systematic literature reviews.

Landschaft, Assaf; Antweiler, Dario; Mackay, Sina; Kugler, Sabine; Rüping, Stefan; Wrobel, Stefan; Höres, Timm; Allende-Cid, Hector

Affiliation

Landschaft A; Boston Children's Hospital, 300 Longwood Avenue, Boston, MA 02115, USA; Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme (IAIS), Sankt Augustin, Germany. Electronic address: assaf.landschaft@iais.fraunhofer.de.
Antweiler D; Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme (IAIS), Sankt Augustin, Germany.
Mackay S; Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme (IAIS), Sankt Augustin, Germany.
Kugler S; Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme (IAIS), Sankt Augustin, Germany.
Rüping S; Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme (IAIS), Sankt Augustin, Germany.
Wrobel S; Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme (IAIS), Sankt Augustin, Germany.
Höres T; Fraunhofer-Institut für Translationale Medizin und Pharmakologie (ITMP), Frankfurt am Main, Germany.
Allende-Cid H; Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme (IAIS), Sankt Augustin, Germany.

Int J Med Inform ; 189: 105531, 2024 Jun 26.

Article in En | MEDLINE | ID: mdl-38943806

ABSTRACT

BACKGROUND:

PRISMA-based literature reviews require multiple reviewers to meticulously scrutinize extensive textual data, which demands considerable human effort.

OBJECTIVE:

To evaluate the feasibility and reliability of using the GPT-4 API as a complementary reviewer in systematic literature reviews based on the PRISMA framework.

METHODOLOGY:

A systematic literature review on the role of natural language processing and large language models (LLMs) in automatic patient-trial matching was conducted using human reviewers and an AI-based reviewer (GPT-4 API). A retrieval-augmented generation (RAG) methodology with LangChain integration was used to process full-text articles. Agreement was evaluated between two human reviewers and the GPT-4 API for abstract screening, and between a single human reviewer and the GPT-4 API for full-text parameter extraction.
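The abstract does not describe how the RAG pipeline was configured. As a rough illustration of the retrieval step only, the following dependency-free sketch splits a full text into overlapping word windows, ranks them against an extraction question, and assembles a grounded prompt. It is not the authors' LangChain pipeline: the bag-of-words cosine similarity is a stand-in for an embedding model, and the function names (`chunk_text`, `retrieve`, `build_prompt`), chunk sizes, prompt wording and toy passage are all hypothetical. The actual call to the GPT-4 API is omitted.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 60, overlap: int = 15) -> list[str]:
    """Split text into overlapping windows of `chunk_size` words (hypothetical defaults)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def _bag_of_words(text: str) -> Counter:
    # Lower-cased token counts; a crude stand-in for a learned embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = _bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a grounded extraction prompt; sending it to GPT-4 is out of scope here."""
    joined = "\n---\n".join(context)
    return (
        "Answer using only the excerpts below. "
        "If the answer is not present, reply 'not reported'.\n\n"
        f"Excerpts:\n{joined}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Toy, made-up passage for illustration; not text from any reviewed article.
    article = (
        "Methods: eligibility criteria were parsed with a rule-based parser. "
        "Evaluation: precision and recall were reported on a held-out set of trials. "
        "Limitations: only English-language records were considered."
    )
    question = "How were precision and recall evaluated?"
    chunks = chunk_text(article, chunk_size=10, overlap=2)
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In a LangChain-based setup, chunking, embedding and vector search would typically be delegated to library components. Restricting the model to retrieved excerpts and asking for an explicit "not reported" answer is one common way to make extracted parameters checkable against the source text.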

RESULTS:

Agreement between GPT-4 and the human reviewers was almost perfect in abstract screening (Cohen's kappa > 0.9) and lower in full-text parameter extraction.
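Cohen's kappa corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of records on which two reviewers make the same decision and p_e is the agreement expected from each reviewer's marginal label frequencies. On the widely used Landis and Koch benchmarks, values from 0.81 to 1.00 are labeled "almost perfect". The minimal sketch below computes kappa for include/exclude screening decisions; the decision lists are made up for illustration and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical screening decisions for ten abstracts (not study data).
    human = ["include", "exclude", "exclude", "include", "exclude",
             "exclude", "include", "exclude", "exclude", "exclude"]
    gpt = ["include", "exclude", "exclude", "include", "exclude",
           "exclude", "exclude", "exclude", "exclude", "exclude"]
    print(round(cohens_kappa(human, gpt), 3))  # → 0.737
```

Here the reviewers agree on 9 of 10 abstracts (p_o = 0.90), but because both mostly exclude, chance agreement is already p_e = 0.62, so kappa is only about 0.74. This is why kappa, rather than raw percent agreement, is the usual yardstick for screening agreement. In practice, `sklearn.metrics.cohen_kappa_score` computes the same statistic.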

CONCLUSION:

As GPT-4 performed on a par with human reviewers in abstract screening, we conclude that GPT-4 has exciting potential to serve as a main screening tool for systematic literature reviews, replacing at least one of the human reviewers.

Key words

AI-based reviewer; GPT-4 API; PRISMA; Systematic literature review

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Language: En | Journal: Int J Med Inform | Journal subject: Medical Informatics | Year: 2024 | Document type: Article