ABSTRACT
INTRODUCTION: Competency-based frameworks are common in surgical training. However, the optimal use of standardized technical assessments is not well defined. We investigated the effect of rater training (RT) on the reliability and validity of four assessment tools.

MATERIALS AND METHODS: Forty-seven surgeons were randomized to RT (N = 24) and no-training (N = 23) groups. A task-specific checklist, a pass/fail rating, a visual analog scale, and the Objective Structured Assessment of Technical Skills (OSATS) global rating scale (GRS) were used to assess trainee knot-tying and suturing tasks. A delayed assessment was performed two weeks later. Internal consistency, intra- and inter-rater reliability, and construct validity were measured.

RESULTS: The GRS had superior reliability and validity compared to the other tools regardless of training. No significant differences between training groups were found. However, the RT group trended toward improved reliability for all tools at both assessments.

CONCLUSIONS: RT did not lead to significant improvements in skills assessments. Standardized assessments (OSATS GRS) are preferred due to their superior reliability and validity over other methods. Despite these findings, we believe more effective training methods or repeated sessions may be required for sustained and significant effects.