Utilizing ChatGPT as a scientific reasoning engine to differentiate conflicting evidence and summarize challenges in controversial clinical questions.
Xie, Shiyao; Zhao, Wenjing; Deng, Guanghui; He, Guohua; He, Na; Lu, Zhenhua; Hu, Weihua; Zhao, Mingming; Du, Jian.
Affiliation
  • Xie S; Institute of Medical Technology, Peking University Health Science Center, Beijing, 100191, China.
  • Zhao W; National Institute of Health Data Science, Peking University, Beijing, 100191, China.
  • Deng G; Institute of Medical Technology, Peking University Health Science Center, Beijing, 100191, China.
  • He G; National Institute of Health Data Science, Peking University, Beijing, 100191, China.
  • He N; School of Health Humanities, Peking University, Beijing, 100191, China.
  • Lu Z; Department of Pediatric Nephrology and Rheumatology, Sun Yat-sen University First Affiliated Hospital, Guangzhou, Guangdong, 510062, China.
  • Hu W; Department of Pharmacy, Peking University Third Hospital, Beijing, 100089, China.
  • Zhao M; Department of Gastrointestinal Cancer Translational Research Laboratory, Peking University Cancer Hospital, Beijing, 100143, China.
  • Du J; Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191, China.
J Am Med Inform Assoc ; 31(7): 1551-1560, 2024 Jun 20.
Article in En | MEDLINE | ID: mdl-38758667
ABSTRACT

OBJECTIVE:

Synthesizing and evaluating inconsistent medical evidence is essential in evidence-based medicine. This study aimed to employ ChatGPT as a sophisticated scientific reasoning engine to identify conflicting clinical evidence and summarize unresolved questions to inform further research.

MATERIALS AND METHODS:

We evaluated ChatGPT's effectiveness in identifying conflicting evidence and investigated its principles of logical reasoning. An automated framework was developed to generate a PubMed dataset focused on controversial clinical topics. ChatGPT analyzed this dataset to identify consensus and controversy and to formulate unresolved research questions. Expert evaluations were conducted 1) on the consensus and controversy for factual consistency, comprehensiveness, and potential harm, and 2) on the research questions for relevance, innovation, clarity, and specificity.

RESULTS:

The gpt-4-1106-preview model achieved a 90% recall rate in detecting inconsistent claim pairs within a ternary assertion setup. Notably, without explicit reasoning prompts, ChatGPT provided sound justifications for the assertions it made between claims and hypotheses, grounded in relevance, specificity, and certainty. ChatGPT's conclusions about consensus and controversies in the clinical literature were comprehensive and factually consistent. The research questions proposed by ChatGPT received high expert ratings.
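To make the recall figure above concrete, the following is a minimal sketch of how recall for the "contradict" class might be computed in a ternary (support / contradict / neutral) claim-pair setup. The function name, labels, and data are illustrative assumptions, not taken from the paper's evaluation code.

```python
# Hedged sketch: recall for contradiction detection in a ternary setup.
# All names and example labels are hypothetical illustrations.

def recall_for_label(gold, pred, target="contradict"):
    """Of all gold pairs labeled `target`, the fraction the model also labeled `target`."""
    hits = sum(1 for g, p in zip(gold, pred) if g == target and p == target)
    total = sum(1 for g in gold if g == target)
    return hits / total if total else 0.0

# Illustrative gold annotations and model predictions for five claim pairs.
gold = ["contradict", "support", "contradict", "neutral", "contradict"]
pred = ["contradict", "support", "contradict", "contradict", "neutral"]

print(recall_for_label(gold, pred))  # 2 of 3 gold contradictions recovered
```

A 90% recall, as reported, would mean 9 of every 10 genuinely inconsistent claim pairs were flagged; precision and the other two classes would need separate measurement.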

DISCUSSION:

Our experiment implies that, in evaluating the relationship between evidence and claims, ChatGPT considered detailed information beyond a straightforward assessment of sentiment orientation. This ability to process intricate information and reason scientifically about such relationships is noteworthy, particularly because the pattern emerged without explicit guidance or directives in the prompts, highlighting ChatGPT's inherent logical reasoning capabilities.

CONCLUSION:

This study demonstrated ChatGPT's capacity to evaluate and interpret scientific claims. Such proficiency can be generalized to broader clinical research literature. ChatGPT effectively aids in facilitating clinical studies by proposing unresolved challenges based on analysis of existing studies. However, caution is advised as ChatGPT's outputs are inferences drawn from the input literature and could be harmful to clinical practice.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Evidence-Based Medicine Limits: Humans Language: En Journal: J Am Med Inform Assoc Journal subject: Medical Informatics Year: 2024 Document type: Article Affiliation country: China Country of publication: United Kingdom