Faithful AI in Medicine: A Systematic Review with Large Language Models and Beyond.

Xie, Qianqian; Schenck, Edward J; Yang, He S; Chen, Yong; Peng, Yifan; Wang, Fei

Affiliation

Xie Q; Weill Cornell Medicine.
Schenck EJ; New York-Presbyterian Hospital, Weill Cornell Medical Center.
Yang HS; Weill Cornell Medical College.
Chen Y; University of Pennsylvania.
Peng Y; Weill Cornell Medicine.
Wang F; Weill Cornell Medicine.

Res Sq; 2023 Dec 04.

Article in En | MEDLINE | ID: mdl-38106170

ABSTRACT

Objective:

While artificial intelligence (AI), particularly large language models (LLMs), offers significant potential for medicine, it raises critical concerns due to the possibility of generating factually incorrect information, leading to potential long-term risks and ethical issues. This review aims to provide a comprehensive overview of the faithfulness problem in existing research on AI in healthcare and medicine, with a focus on the analysis of the causes of unfaithful results, evaluation metrics, and mitigation methods.

Materials and Methods:

Using PRISMA methodology, we sourced 5,061 records from five databases (PubMed, Scopus, IEEE Xplore, ACM Digital Library, Google Scholar) published between January 2018 and March 2023. We removed duplicates and screened records based on exclusion criteria.

Results:

With the 40 remaining articles, we conducted a systematic review of recent developments aimed at optimizing and evaluating factuality across a variety of generative medical AI approaches. These include knowledge-grounded LLMs, text-to-text generation, multimodality-to-text generation, and automatic medical fact-checking tasks.

Discussion:

Current research investigating the factuality problem in medical AI is in its early stages. There are significant challenges related to data resources, backbone models, mitigation methods, and evaluation metrics. Promising opportunities exist for novel faithful medical AI research involving the adaptation of LLMs and prompt engineering.

Conclusion:

This comprehensive review highlights the need for further research to address the issues of reliability and factuality in medical AI, serving as both a reference and inspiration for future research into the safe, ethical use of AI in medicine and healthcare.

Key words

evaluation metrics; factuality and reliability; generative medical AI; large language models

Full text: 1; Collection: 01-internacional; Database: MEDLINE; Type of study: Systematic_reviews; Language: En; Journal: Res Sq; Year: 2023; Document type: Article
