Distilling large language models for matching patients to clinical trials.

Nievas, Mauro; Basu, Aditya; Wang, Yanshan; Singh, Hrituraj

Affiliation

Nievas M; Triomics Research, Triomics, Inc., San Francisco, CA 94105, United States.
Basu A; Triomics Research, Triomics, Inc., Bengaluru, Karnataka 560102, India.
Wang Y; Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States.
Singh H; Triomics Research, Triomics, Inc., San Francisco, CA 94105, United States.

J Am Med Inform Assoc ; 31(9): 1953-1963, 2024 Sep 01.

Article in English | MEDLINE | ID: mdl-38641416

ABSTRACT

OBJECTIVE:

The objective of this study is to systematically examine the efficacy of both proprietary (GPT-3.5, GPT-4) and open-source large language models (LLMs) (LLAMA 7B, 13B, 70B) in the context of matching patients to clinical trials in healthcare.

MATERIALS AND METHODS:

The study employs a multifaceted evaluation framework, incorporating extensive automated and human-centric assessments along with a detailed error analysis for each model, and assesses LLMs' capabilities in analyzing patient eligibility against clinical trials' inclusion and exclusion criteria. To improve the adaptability of open-source LLMs, a specialized synthetic dataset was created using GPT-4, facilitating effective fine-tuning under constrained data conditions.
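
As an illustration of what checking eligibility against inclusion and exclusion criteria can involve, the following is a minimal sketch, not the study's method. It assumes that a model labels each criterion separately and that those labels are then combined into a trial-level decision. The label set, the `Criterion` type, the `judge` callable, and the aggregation rule are all hypothetical and are not taken from the abstract. The lookup-based judge is a toy stand-in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical label set; the abstract does not state which labels the study uses.
MET, NOT_MET, UNKNOWN = "met", "not_met", "unknown"


@dataclass(frozen=True)
class Criterion:
    text: str
    kind: str  # "inclusion" or "exclusion"


def trial_level_decision(
    note: str,
    criteria: List[Criterion],
    judge: Callable[[str, Criterion], str],
) -> str:
    """Combine per-criterion labels into one decision (assumed rule)."""
    labels = [(c, judge(note, c)) for c in criteria]
    for criterion, label in labels:
        # A failed inclusion criterion rules the patient out.
        if criterion.kind == "inclusion" and label == NOT_MET:
            return "ineligible"
        # A satisfied exclusion criterion also rules the patient out.
        if criterion.kind == "exclusion" and label == MET:
            return "ineligible"
    # Nothing disqualifying was found, but some criteria could not be judged.
    if any(label == UNKNOWN for _, label in labels):
        return "undetermined"
    return "eligible"


def make_lookup_judge(answers: Dict[str, str]) -> Callable[[str, Criterion], str]:
    """Toy stand-in for an LLM: returns a preset label per criterion text."""
    def judge(note: str, criterion: Criterion) -> str:
        return answers.get(criterion.text, UNKNOWN)
    return judge


criteria = [
    Criterion("Age 18 or older", "inclusion"),
    Criterion("Prior chemotherapy", "exclusion"),
]
judge = make_lookup_judge({"Age 18 or older": MET, "Prior chemotherapy": NOT_MET})
print(trial_level_decision("example patient note", criteria, judge))
```

In practice the `judge` callable would wrap a prompt to a proprietary or fine-tuned open-source model. Keeping it as a parameter lets the same aggregation logic be reused across the models being compared.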

RESULTS:

The findings indicate that open-source LLMs, when fine-tuned on this limited and synthetic dataset, achieve performance parity with their proprietary counterparts, such as GPT-3.5.

DISCUSSION:

This study highlights the recent success of LLMs in the high-stakes domain of healthcare, specifically in patient-trial matching. The research demonstrates the potential of open-source models to match the performance of proprietary models when fine-tuned appropriately, addressing challenges like cost, privacy, and reproducibility concerns associated with closed-source proprietary LLMs.

CONCLUSION:

The study underscores the opportunity for open-source LLMs in patient-trial matching. To encourage further research and applications in this field, the annotated evaluation dataset and the fine-tuned LLM, Trial-LLAMA, are released for public use.

Subject(s)

Clinical Trials as Topic; Patient Selection; Humans; Programming Languages; Natural Language Processing

Keywords

GPT-3.5; GPT-4; LLAMA; clinical trial matching; distillation; large language models

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Clinical Trials as Topic / Patient Selection Limit: Humans Language: English Journal: J Am Med Inform Assoc Journal subject: MEDICAL INFORMATICS Year: 2024 Document type: Article Affiliation country: United States Country of publication: United Kingdom
