Measurement precision at the cut score in medical multiple choice exams: Theory matters.

Lahner, Felicitas-Maria; Schauber, Stefan; Lörwald, Andrea Carolin; Kropf, Roger; Guttormsen, Sissel; Fischer, Martin R; Huwendiek, Sören

Affiliation

Lahner FM; Institute for Medical Education, University of Bern, Bern, Switzerland. felicitas-maria.lahner@bfh.ch.
Schauber S; Department of Health Professions, University of Applied Sciences, Bern, Switzerland. felicitas-maria.lahner@bfh.ch.
Lörwald AC; Centre for Educational Measurement at the University of Oslo (CEMO) and Centre for Health Sciences Education, University of Oslo, Oslo, Norway.
Kropf R; Institute for Medical Education, University of Bern, Bern, Switzerland.
Guttormsen S; Faculty of Medicine, University of Zurich, Zurich, Switzerland.
Fischer MR; Institute for Medical Education, University of Bern, Bern, Switzerland.
Huwendiek S; Institute for Medical Education, University Hospital, LMU Munich, Munich, Germany.

Perspect Med Educ; 9(4): 220-228, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32468274

ABSTRACT

INTRODUCTION:

In high-stakes assessment, the measurement precision of pass-fail decisions is of great importance. A concept for analyzing the measurement precision at the cut score is conditional reliability, which describes measurement precision for every score achieved in an exam. We compared conditional reliabilities in Classical Test Theory (CTT) and Item Response Theory (IRT) with a special focus on the cut score and potential factors influencing conditional reliability at the cut score.
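
The abstract does not give formulas, so the following is only a sketch of one common way to formalize conditional reliability, assuming Lord's binomial error model for CTT and the Rasch model for IRT. Neither choice is confirmed by the abstract, and the symbols below ($\rho$, $\sigma^2_E$, $I(\theta)$, $b_i$) are introduced here for illustration. In both theories, conditional reliability at a score compares the error variance at that score with the variance of scores in the examinee group:

```latex
\rho(s) = 1 - \frac{\sigma^2_E(s)}{\sigma^2_S}

% CTT, Lord's binomial error model (raw score x on n items):
\sigma^2_E(x) = \frac{x\,(n - x)}{n - 1}, \qquad \sigma^2_S = \sigma^2_X

% IRT, Rasch model (ability \theta, item difficulties b_i):
\sigma^2_E(\theta) = \frac{1}{I(\theta)}, \qquad
I(\theta) = \sum_{i=1}^{n} P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr), \qquad
P_i(\theta) = \frac{1}{1 + e^{-(\theta - b_i)}}, \qquad \sigma^2_S = \sigma^2_\theta
```

Because $x(n - x)$ is largest at $x = n/2$ and zero at $x = 0$ and $x = n$, the CTT error variance under this model is largest for medium scores, whereas Rasch test information is largest near the bulk of the item difficulties.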

METHODS:

We analyzed 32 multiple-choice exams from three Swiss medical schools, comparing conditional reliability at the cut score in IRT and CTT. Additionally, we analyzed potential influencing factors such as the range of examinees' performance, year of study, and number of items using multiple regression.

RESULTS:

In CTT, conditional reliability was highest for very low and very high scores, whereas examinees with medium scores showed low conditional reliabilities. In IRT, the maximum conditional reliability was in the middle of the scale. Therefore, conditional reliability at the cut score was significantly higher in IRT than in CTT. Conditional reliability at the cut score was influenced by the range of examinees' performance and the number of items. This influence was more pronounced in CTT.
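
The opposite shapes reported above can be reproduced qualitatively with a short simulation. This is a minimal sketch, not the authors' analysis: it assumes Lord's binomial error model for CTT and the Rasch model for IRT, and the exam length (40 items), the observed-score variance (36, i.e. an SD of 6 points), the evenly spread item difficulties, and the function names `ctt_conditional_reliability` and `rasch_conditional_reliability` are all hypothetical.

```python
import math


def ctt_conditional_reliability(n_items, observed_var):
    """Conditional reliability per raw score, assuming Lord's binomial error model.

    Conditional error variance at raw score x is x * (n - x) / (n - 1);
    conditional reliability is 1 - error variance / observed-score variance.
    """
    rel = {}
    for x in range(n_items + 1):
        error_var = x * (n_items - x) / (n_items - 1)
        rel[x] = 1 - error_var / observed_var
    return rel


def rasch_conditional_reliability(thetas, difficulties, theta_var=1.0):
    """Conditional reliability per ability level, assuming the Rasch model.

    Test information I(theta) is the sum of p * (1 - p) over items, with
    p = 1 / (1 + exp(-(theta - b))). The squared standard error is 1 / I(theta),
    and conditional reliability is 1 - SE^2 / theta_var.
    """
    rel = {}
    for theta in thetas:
        info = 0.0
        for b in difficulties:
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            info += p * (1.0 - p)
        rel[theta] = 1 - (1.0 / info) / theta_var
    return rel


if __name__ == "__main__":
    n = 40  # hypothetical exam length
    # Hypothetical observed-score variance (SD of 6 raw-score points)
    ctt = ctt_conditional_reliability(n, observed_var=36.0)
    for x in (0, 10, 20, 30, 40):
        print(f"CTT raw score {x:2d}: {ctt[x]:.3f}")

    # Hypothetical item difficulties spread evenly from -2 to +2 logits
    bs = [-2 + 4 * i / (n - 1) for i in range(n)]
    irt = rasch_conditional_reliability([-3, -1.5, 0, 1.5, 3], bs)
    for t, r in irt.items():
        print(f"Rasch theta {t:+.1f}: {r:.3f}")
```

With these assumptions, the CTT values bottom out at the middle of the raw-score range and reach 1 at the extremes, while the Rasch values peak near the centre of the ability scale. Under the Rasch formulation, values can even become negative far from the item difficulties, where information falls below 1 / theta_var.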

DISCUSSION:

We found that, depending on the theory used, conditional reliability shows inverse distributions and leads to different conclusions regarding measurement precision at the cut score. As IRT seems to be more appropriate for criterion-oriented standard setting in the framework of competency-based medical education, our findings might have practical implications for the design and quality assurance of medical education assessments.

Subjects

Educational Measurement/standards; Test Taking Skills/standards; Clinical Competence/standards; Clinical Competence/statistics & numerical data; Educational Measurement/methods; Educational Measurement/statistics & numerical data; Humans; Logistic Models; Models, Educational; Psychometrics/instrumentation; Psychometrics/methods; Reproducibility of Results; Switzerland; Test Taking Skills/statistics & numerical data

Keywords

Conditional reliability; Measurement precision; Multiple choice exams; Reliability

Collections: 01-internacional | Database: MEDLINE | Main subject: Educational Measurement / Test Taking Skills | Study type: Prognostic_studies | Limit: Humans | Country/Region as subject: Europe | Language: English | Publication year: 2020 | Document type: Article