Criteria2Query 3.0: Leveraging generative large language models for clinical trial eligibility query generation.
J Biomed Inform
; 154: 104649, 2024 Jun.
Article
En
| MEDLINE
| ID: mdl-38697494
ABSTRACT
OBJECTIVE:
Automated identification of eligible patients is a bottleneck of clinical research. We propose Criteria2Query (C2Q) 3.0, a system that leverages GPT-4 to semi-automatically transform clinical trial eligibility criteria text into executable clinical database queries.
MATERIALS AND METHODS:
C2Q 3.0 integrated three GPT-4 prompts for concept extraction, SQL query generation, and reasoning. Each prompt was designed and evaluated separately. The concept extraction prompt was benchmarked against manual annotations from 20 clinical trials by two evaluators, who later also measured SQL generation accuracy and identified errors in GPT-generated SQL queries from 5 clinical trials. The reasoning prompt was assessed by three evaluators on four metrics: readability, correctness, coherence, and usefulness, using corrected SQL queries and an open-ended feedback questionnaire.
RESULTS:
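The three-stage prompt design described above can be sketched as a simple chain, where each stage's output feeds the next. This is an illustrative sketch only: `call_gpt4` is a stub standing in for a real GPT-4 API call, the prompt wordings are hypothetical rather than the paper's actual prompts, and the OMOP CDM target schema is an assumption based on earlier Criteria2Query versions.

```python
# Hypothetical sketch of a three-prompt pipeline: concept extraction ->
# SQL generation -> reasoning/explanation. `call_gpt4` is a stub; a real
# system would query the GPT-4 API here.

def call_gpt4(prompt: str) -> str:
    """Stub standing in for an LLM call."""
    return f"[GPT-4 response to: {prompt[:40]}...]"

def extract_concepts(criteria_text: str) -> str:
    # Stage 1: pull out clinical concepts (conditions, drugs, measurements).
    prompt = ("Extract clinical concepts from these eligibility "
              f"criteria:\n{criteria_text}")
    return call_gpt4(prompt)

def generate_sql(concepts: str) -> str:
    # Stage 2: turn the concepts into an executable database query
    # (OMOP CDM assumed here as the target schema).
    prompt = ("Write an OMOP CDM SQL query selecting patients matching "
              f"these concepts:\n{concepts}")
    return call_gpt4(prompt)

def explain_query(sql: str) -> str:
    # Stage 3: generate a human-readable rationale for reviewer verification.
    prompt = f"Explain, step by step, what this SQL query does:\n{sql}"
    return call_gpt4(prompt)

criteria = "Adults aged 18-65 with type 2 diabetes and HbA1c > 7%"
concepts = extract_concepts(criteria)
sql = generate_sql(concepts)
explanation = explain_query(sql)
```

Keeping the stages as separate prompts mirrors the paper's design choice of evaluating each prompt independently, so errors can be attributed to a specific stage.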
Out of 518 concepts from 20 clinical trials, GPT-4 achieved an F1-score of 0.891 in concept extraction. For SQL generation, 29 errors spanning seven categories were detected, with logic errors being the most common (n = 10; 34.48%). Reasoning evaluations yielded high coherence (mean score 4.70) but relatively lower readability (mean 3.95); mean scores for correctness and usefulness were 3.97 and 4.37, respectively.
CONCLUSION:
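The F1-score used to benchmark concept extraction against manual annotations is the harmonic mean of precision and recall. A minimal sketch, using hypothetical true-positive/false-positive/false-negative counts chosen to be consistent with the reported F1 of 0.891 (the abstract reports only the final score, not these counts):

```python
# F1 = harmonic mean of precision and recall.
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)   # fraction of extracted concepts that are correct
    recall = tp / (tp + fn)      # fraction of annotated concepts that were found
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, not the paper's:
print(round(f1_score(tp=450, fp=50, fn=60), 3))  # → 0.891
```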
GPT-4 significantly improves the accuracy of extracting clinical trial eligibility criteria concepts in C2Q 3.0. Continued research is warranted to ensure the reliability of large language models.
Full text:
1
Database:
MEDLINE
Main subject:
Clinical Trials as Topic
Language:
En
Journal:
J Biomed Inform
Journal subject:
MEDICAL INFORMATICS
Year:
2024
Document type:
Article