The test developer's dilemma: Evaluating the balance of feasibility and empiric performance of test development techniques for repeated written assessments.
Shappell, Eric; Wagner, Mary Jo; Bailitz, John; Mead, Therese; Ahn, James; Eyre, Andrew; Maldonado, Nicholas; Wallace, Bradley; Park, Yoon Soo.
Affiliation
  • Shappell E; Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
  • Wagner MJ; Central Michigan University College of Medicine, Mount Pleasant, MI, USA.
  • Bailitz J; Department of Emergency Medicine, Northwestern University, Evanston, IL, USA.
  • Mead T; Central Michigan University College of Medicine, Mount Pleasant, MI, USA.
  • Ahn J; Section of Emergency Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA.
  • Eyre A; Department of Emergency Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
  • Maldonado N; Department of Emergency Medicine, University of Florida College of Medicine, Gainesville, FL, USA.
  • Wallace B; Emory University School of Medicine, Atlanta, GA, USA.
  • Park YS; Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
Med Teach; 45(2): 187-192, 2023 Feb.
Article in En | MEDLINE | ID: mdl-36065641
ABSTRACT

PURPOSE:

Written assessments face challenges when administered repeatedly, including resource-intensive item development and the potential for performance to improve because learners recall specific items rather than understand the material. This study examines how well three item development techniques address these challenges.

METHODS:

Learners at five training programs completed two repeated 60-item assessments. Each item from the first test was randomized to one of three treatments for the second assessment: (1) verbatim repetition, (2) isomorphic changes, or (3) total revision. Primary outcomes were the stability of item psychometrics across test versions and evidence of item recall influencing performance. Recall was measured by the rate of items answered correctly on the first test and then incorrectly on the second (the correct-to-incorrect rate), a pattern that suggests the original correct answer was a guess.
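The two computational steps in the design, randomizing items to treatment arms and computing a per-arm correct-to-incorrect rate, can be sketched as follows. This is an illustrative sketch, not the authors' analysis code. The function names, the balanced round-robin assignment, and the denominator (all learner-item pairs in an arm, rather than only pairs answered correctly on the first test) are assumptions, since the abstract does not specify them.

```python
import random
from collections import defaultdict

# Hypothetical labels for the three treatments described in the abstract.
TREATMENTS = ("verbatim", "isomorphic", "total_revision")


def assign_treatments(item_ids, seed=0):
    """Randomly assign each first-test item to one of the three treatments.

    Assumption: a balanced design, produced by shuffling the items and then
    dealing them round-robin (60 items -> 20 per arm).
    """
    rng = random.Random(seed)
    ids = list(item_ids)
    rng.shuffle(ids)
    return {item: TREATMENTS[i % len(TREATMENTS)] for i, item in enumerate(ids)}


def correct_to_incorrect_rate(responses, assignment):
    """Compute the correct-to-incorrect rate for each treatment arm.

    responses: iterable of (learner, item, correct_test1, correct_test2).
    Assumed definition: the share of all learner-item pairs in an arm that
    were answered correctly on test 1 and incorrectly on test 2.
    """
    flips = defaultdict(int)   # correct on test 1, incorrect on test 2
    totals = defaultdict(int)  # all learner-item pairs in the arm
    for _learner, item, correct1, correct2 in responses:
        arm = assignment[item]
        totals[arm] += 1
        if correct1 and not correct2:
            flips[arm] += 1
    return {arm: flips[arm] / totals[arm] for arm in totals}
```

With toy, made-up responses (not study data), an arm where one of four learner-item pairs flips from correct to incorrect gets a rate of 0.25. A higher rate in an arm is read as less recall and more guessing, which is how the abstract interprets the Total Revision versus Verbatim comparison.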

RESULTS:

Forty-six learners completed both tests. Item psychometrics were comparable across test versions. Correct-to-incorrect rates differed significantly between groups (p = 0.01), with the highest guessing rate (lowest recall effect) in the Total Revision group (0.15) and the lowest guessing rate (highest recall effect) in the Verbatim group (0.05).

CONCLUSIONS:

Isomorphic and total revisions performed better than verbatim repetition at mitigating the effect of recall on repeated assessments. Given the high cost of total item revision, isomorphic items merit further exploration as an efficient and effective approach to repeated written assessments.

Database: MEDLINE | Main subject: Mental Recall / Research Design | Type of study: Clinical trials | Limits: Humans | Language: En | Journal: Med Teach | Year: 2023 | Type: Article | Affiliation country: United States