Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment.

Hope, David; Adamson, Karen; McManus, I C; Chis, Liliana; Elder, Andrew

Affiliation

Hope D; Centre for Medical Education, The Chancellor's Building, College of Medicine and Veterinary Medicine, The University of Edinburgh, 49 Little France Crescent, Edinburgh, Scotland, EH16 4SB, UK. david.hope@ed.ac.uk.
Adamson K; Medical Unit, St John's Hospital, Livingston, Scotland, EH54 6PP, UK.
McManus IC; Research Department of Medical Education, The Medical School, University College London, Gower Street, London, WC1E 6BT, UK.
Chis L; MRCPUK Central Office, 11 St Andrew's Place, Regent's Park, London, NW1 4LE, UK.
Elder A; MRCPUK Central Office, 11 St Andrew's Place, Regent's Park, London, NW1 4LE, UK.

BMC Med Educ ; 18(1): 64, 2018 Apr 03.

Article in English | MEDLINE | ID: mdl-29615016

ABSTRACT

BACKGROUND: Fairness is a critical component of defensible assessment. Candidates should perform according to ability without influence from background characteristics such as ethnicity or sex. However, performance differs by candidate background in many assessment environments. Many potential causes of such differences exist, and examinations must be routinely analysed to ensure they do not present inappropriate progression barriers for any candidate group. By analysing the individual questions of an examination through techniques such as Differential Item Functioning (DIF), we can test whether a subset of unfair questions explains group-level differences. Such items can then be revised or removed.

METHODS: We used DIF to investigate fairness for 13,694 candidates sitting a major international summative postgraduate examination in internal medicine. We compared (a) ethnically white UK graduates against ethnically non-white UK graduates and (b) male UK graduates against female UK graduates. DIF was used to test 2773 questions across 14 sittings.

RESULTS: Across 2773 questions, eight (0.29%) showed notable DIF after correcting for multiple comparisons: seven medium effects and one large effect. Blinded analysis of these questions by a panel of clinician assessors identified no plausible explanations for the differences. These questions were removed from the question bank and we present them here to share knowledge of questions with DIF. These questions did not significantly impact the overall performance of the cohort. Group-level differences in performance between the groups we studied in this examination cannot be explained by a subset of unfair questions.

CONCLUSIONS: DIF helps explore fairness in assessment at the question level. This is especially important in high-stakes assessment where a small number of unfair questions may adversely impact the passing rates of some groups. However, very few questions exhibited notable DIF, so differences in passing rates for the groups we studied cannot be explained by unfairness at the question level.
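The abstract does not say which DIF statistic or effect-size thresholds were used. As an illustration only, the sketch below shows one widely used approach, the Mantel-Haenszel procedure: candidates are matched on an ability proxy (here a hypothetical `stratum`, such as a total-score band), and the odds of answering one question correctly are compared between a reference and a focal group within each stratum. The function names, the input layout, and the magnitude-only cut-offs of 1.0 and 1.5 on the ETS delta scale are assumptions for this example, not the authors' method. Significance testing and the multiple-comparison correction mentioned in the abstract are deliberately left out.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses):
    """Estimate Mantel-Haenszel DIF for a single question.

    responses: iterable of (group, stratum, correct) tuples, where group is
    "ref" or "focal", stratum is a matching variable such as a total-score
    band, and correct is a bool. (Hypothetical layout for this sketch.)

    Returns (alpha, delta): the pooled odds ratio and its value on the
    ETS delta scale, or None if the ratio cannot be computed.
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in responses:
        index = (0 if correct else 1) + (0 if group == "ref" else 2)
        tables[stratum][index] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n

    if numerator == 0 or denominator == 0:
        return None  # odds ratio undefined for this data

    alpha = numerator / denominator
    # ETS delta scale; negative values mean the question favours the
    # reference group, positive values the focal group.
    delta = -2.35 * math.log(alpha)
    return alpha, delta


def classify_dif(delta):
    """Label effect size by magnitude only (simplified; no significance test)."""
    size = abs(delta)
    if size < 1.0:
        return "negligible"
    if size < 1.5:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Toy data for one stratum; not taken from the study.
    rows = (
        [("ref", 1, True)] * 30 + [("ref", 1, False)] * 10
        + [("focal", 1, True)] * 10 + [("focal", 1, False)] * 30
    )
    alpha, delta = mantel_haenszel_dif(rows)
    print(alpha, delta, classify_dif(delta))
```

In a screening run of the kind described, a routine like this would be applied to each question in turn, and the flagged questions would then go to human reviewers, as the panel of clinician assessors did here.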

Subjects

Educational Measurement/methods; Racism; Sexism; Academic Performance; Cohort Studies; Educational Measurement/standards; Ethnicity; Female; Humans; Internal Medicine/education; Male; United Kingdom; White People; Xenophobia

Keywords

Assessment; Bias; Fairness

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Educational Measurement / Racism / Sexism Type of study: Etiology_studies / Incidence_studies / Observational_studies / Prognostic_studies / Risk_factors_studies Limits: Female / Humans / Male Country/Region as subject: Europe Language: English Journal: BMC Med Educ Publication year: 2018 Document type: Article