ABSTRACT
PROBLEM: Implementation of competency-based medical education has necessitated more frequent trainee assessments. The use of simulation as an assessment tool is limited by access to trained examiners, cost, and concerns about interrater reliability. An automated tool for pass/fail assessment of trainees in simulation could improve the accessibility and quality assurance of assessments. This study aimed to develop an automated assessment model using deep learning techniques to assess the performance of anesthesiology trainees in a simulated critical event.

APPROACH: The authors retrospectively analyzed anaphylaxis simulation videos to train and validate a deep learning model. Drawing on an anaphylactic shock simulation video database from an established simulation curriculum, they assembled a convenience sample of 52 usable videos. The core of the model, developed between July 2019 and July 2020, is a bidirectional transformer encoder.

OUTCOMES: The main outcomes were the F1 score, accuracy, recall, and precision of the automated model in classifying trainee performance in the simulation videos as pass or fail. Five models were developed and evaluated. The best-performing was model 1, with an accuracy of 71% and an F1 score of 0.68.

NEXT STEPS: The authors demonstrated the feasibility of developing, from a simulation database, a deep learning model for automated assessment of medical trainees in a simulated anaphylaxis scenario. The important next steps are to (1) integrate a larger simulation dataset to improve the model's accuracy; (2) assess the model's accuracy on alternative anaphylaxis simulations, additional medical disciplines, and alternative medical education evaluation modalities; and (3) gather feedback from education leadership and clinician educators on the perceived strengths and weaknesses of deep learning models for simulation assessment.
Overall, this novel approach to performance prediction has broad implications for medical education and assessment.
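For readers less familiar with the evaluation metrics reported in the abstract, the sketch below shows how accuracy, precision, recall, and F1 are computed for a binary pass/fail classifier from a confusion matrix. The counts used here are invented for illustration only and do not reflect the study's data or results.

```python
# Illustrative only: the confusion-matrix counts below are hypothetical,
# NOT taken from the study. "Pass" is treated as the positive class.

def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 for a binary classifier.

    tp/fp/fn/tn: true-positive, false-positive, false-negative,
    and true-negative counts from the confusion matrix.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # fraction of all videos scored correctly
    precision = tp / (tp + fp)            # of predicted "pass", fraction truly pass
    recall = tp / (tp + fn)               # of true "pass", fraction caught by model
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts for demonstration.
acc, prec, rec, f1 = binary_metrics(tp=8, fp=2, fn=4, tn=6)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F1={f1:.2f}")
```

Because pass/fail labels in simulation data are often imbalanced, F1 (which ignores true negatives) can diverge noticeably from accuracy, which is why the study reports both.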