RESUMO
CONTEXT.: Artificial intelligence algorithms hold the potential to fundamentally change many aspects of society. Application of these tools, including the publicly available ChatGPT, has demonstrated impressive domain-specific knowledge in many areas, including medicine. OBJECTIVES.: To understand the level of pathology domain-specific knowledge for ChatGPT using different underlying large language models, GPT-3.5 and the updated GPT-4. DESIGN.: An international group of pathologists (n = 15) was recruited to generate pathology-specific questions at a similar level to those that could be seen on licensing (board) examinations. The questions (n = 15) were answered by GPT-3.5, GPT-4, and a staff pathologist who recently passed their Canadian pathology licensing exams. Participants were instructed to score answers on a 5-point scale and to predict which answer was written by ChatGPT. RESULTS.: GPT-3.5 performed at a similar level to the staff pathologist, while GPT-4 outperformed both. The overall score for both GPT-3.5 and GPT-4 was within the range of meeting expectations for a trainee writing licensing examinations. In all but one question, the reviewers were able to correctly identify the answers generated by GPT-3.5. CONCLUSIONS.: By demonstrating the ability of ChatGPT to answer pathology-specific questions at a level similar to (GPT-3.5) or exceeding (GPT-4) a trained pathologist, this study highlights the potential of large language models to be transformative in this space. In the future, more advanced iterations of these algorithms with increased domain-specific knowledge may have the potential to assist pathologists and enhance pathology resident training.