Results 1 - 5 of 5
1.
BMC Med Educ; 18(1): 64, 2018 Apr 03.
Article in English | MEDLINE | ID: mdl-29615016

ABSTRACT

BACKGROUND: Fairness is a critical component of defensible assessment. Candidates should perform according to ability, without influence from background characteristics such as ethnicity or sex. However, performance differs by candidate background in many assessment settings. Many potential causes of such differences exist, and examinations must be routinely analysed to ensure they do not present inappropriate progression barriers to any candidate group. By analysing the individual questions of an examination with techniques such as Differential Item Functioning (DIF), we can test whether a subset of unfair questions explains group-level differences. Such items can then be revised or removed.

METHODS: We used DIF to investigate fairness for 13,694 candidates sitting a major international summative postgraduate examination in internal medicine. We compared (a) ethnically white UK graduates against ethnically non-white UK graduates and (b) male UK graduates against female UK graduates. DIF was used to test 2773 questions across 14 sittings.

RESULTS: Of the 2773 questions, eight (0.29%) showed notable DIF after correcting for multiple comparisons: seven medium effects and one large effect. Blinded analysis of these questions by a panel of clinician assessors identified no plausible explanations for the differences. The questions were removed from the question bank, and we present them here to share knowledge of questions exhibiting DIF. They did not significantly affect the overall performance of the cohort. Group-level differences in performance in this examination cannot, therefore, be explained by a subset of unfair questions.

CONCLUSIONS: DIF helps explore fairness in assessment at the question level. This is especially important in high-stakes assessment, where a small number of unfair questions may adversely affect the passing rates of some groups. However, very few questions exhibited notable DIF, so differences in passing rates for the groups we studied cannot be explained by unfairness at the question level.
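The abstract does not name the DIF statistic the authors used, but the Mantel-Haenszel procedure is a standard way to screen for two-group DIF of the kind described. Below is a minimal Python sketch under that assumption: candidates are matched on total score, each item's correct/incorrect counts are tabulated by group within each score band, and the pooled odds ratio is converted to the ETS delta scale, on which effects are conventionally classed as negligible, moderate or large. The data-frame columns and band count are illustrative, not the paper's pipeline.

```python
import numpy as np
import pandas as pd

def mantel_haenszel_dif(responses, item, group_col="group", focal="focal",
                        score_col="total_score", n_strata=5):
    """Pooled Mantel-Haenszel odds ratio for one item, focal vs. reference."""
    df = responses.copy()
    # Match candidates on overall ability by banding their total scores.
    df["stratum"] = pd.qcut(df[score_col], q=n_strata, duplicates="drop")

    num = den = 0.0
    for _, s in df.groupby("stratum", observed=True):
        ref = s[s[group_col] != focal]
        foc = s[s[group_col] == focal]
        a = (ref[item] == 1).sum()   # reference group, item correct
        b = (ref[item] == 0).sum()   # reference group, item incorrect
        c = (foc[item] == 1).sum()   # focal group, item correct
        d = (foc[item] == 0).sum()   # focal group, item incorrect
        n = len(s)
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den                 # common odds ratio across strata
    delta_mh = -2.35 * np.log(alpha_mh)  # ETS delta scale
    # ETS convention: |delta| < 1 negligible, 1-1.5 moderate, >= 1.5 large DIF.
    return alpha_mh, delta_mh
```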


Subject(s)
Educational Measurement/methods; Racism; Sexism; Academic Performance; Cohort Studies; Educational Measurement/standards; Ethnicity; Female; Humans; Internal Medicine/education; Male; United Kingdom; White People; Xenophobia
2.
BMC Med Educ; 14: 204, 2014 Sep 26.
Article in English | MEDLINE | ID: mdl-25257070

ABSTRACT

BACKGROUND: In 2008 and 2010, the MRCP(UK) examination changed the standard setting of its Part 1 and Part 2 examinations from a hybrid Angoff/Hofstee method to statistical equating using Item Response Theory (IRT), with UK graduates as the reference group. The present paper considers the implementation of the change, the question of whether the pass rate increased amongst non-UK candidates, any possible role of Differential Item Functioning (DIF), and changes in the examination's predictive validity after the change.

METHODS: Analysis of data from the MRCP(UK) Part 1 examination from 2003 to 2013 and the Part 2 examination from 2005 to 2013.

RESULTS: Inspection suggested that Part 1 pass rates were stable after the introduction of statistical equating, but showed greater annual variation, probably because stronger candidates were taking the examination earlier. Pass rates appeared to increase in non-UK graduates after equating was introduced, but this was not associated with any changes in DIF. Statistical modelling of the pass rates for non-UK graduates found that pass rates in both Part 1 and Part 2 were increasing year on year, with the changes probably beginning before the introduction of equating. The predictive validity of Part 1 for Part 2 was higher with statistical equating than with the previous hybrid Angoff/Hofstee method, confirming the utility of IRT-based statistical equating.

CONCLUSIONS: Statistical equating was successfully introduced into the MRCP(UK) Part 1 and Part 2 written examinations, resulting in higher predictive validity than the previous Angoff/Hofstee standard setting. Concerns about an artefactual increase in pass rates for non-UK candidates after equating were shown not to be well founded. Most likely the changes resulted from a genuine increase in candidate ability, albeit for reasons which remain unclear, coupled with a cognitive illusion giving the impression of a step change immediately after equating began. Statistical equating provides a robust standard-setting method with a better theoretical foundation than judgemental techniques such as Angoff; it is also more straightforward, requires far less examiner time, and produces a more valid result. The present study provides a detailed case study of introducing statistical equating and of the issues which may need to be considered with its introduction.
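For readers unfamiliar with IRT-based equating, the sketch below shows the core idea under the Rasch model, the simplest IRT model (the abstract does not state which IRT model MRCP(UK) used): anchor items shared between sittings place each new sitting's item difficulties on a common scale, and the pass mark is then the expected raw score of a borderline candidate with a fixed pass ability. All function names, variable names and values are hypothetical.

```python
import numpy as np

def equate_to_base_scale(b_new, anchor_new, anchor_base):
    # Under the Rasch model the two sittings' difficulty scales differ only
    # by a translation, so the mean difference over the shared anchor items
    # is enough to align them (the "mean-mean" method).
    shift = np.mean(anchor_base) - np.mean(anchor_new)
    return np.asarray(b_new) + shift

def raw_pass_mark(b_equated, theta_pass):
    # Test characteristic curve: the expected raw score of a borderline
    # candidate with ability theta_pass, summed over all items in the sitting.
    p_correct = 1.0 / (1.0 + np.exp(-(theta_pass - np.asarray(b_equated))))
    return p_correct.sum()

# Hypothetical usage: difficulties calibrated separately for each sitting.
b_new = [-1.2, -0.4, 0.1, 0.8, 1.5]                  # new sitting, own scale
anchor_new, anchor_base = [-0.4, 0.8], [-0.1, 1.1]   # shared anchor items
b_eq = equate_to_base_scale(b_new, anchor_new, anchor_base)
print(raw_pass_mark(b_eq, theta_pass=0.0))  # pass mark on the raw-score scale
```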


Subject(s)
Certification/methods; Educational Measurement/methods; Certification/standards; Certification/statistics & numerical data; Clinical Competence/standards; Data Interpretation, Statistical; Humans; Reproducibility of Results; United Kingdom
3.
BMC Med Educ; 10: 40, 2010 Jun 02.
Article in English | MEDLINE | ID: mdl-20525220

ABSTRACT

BACKGROUND: Cronbach's alpha is widely used as the preferred index of reliability for medical postgraduate examinations. A value of 0.8-0.9 is seen by providers and regulators alike as an adequate demonstration of acceptable reliability for any assessment. Among the other statistical parameters, the Standard Error of Measurement (SEM) is mainly seen as useful only in determining the accuracy of a pass mark. However, the alpha coefficient depends both on the SEM and on the ability range (standard deviation, SD) of the candidates taking an exam. This study investigated the extent to which the necessarily narrower ability range among candidates taking the second of the three-part MRCP(UK) diploma examinations biases the assessment of reliability and SEM.

METHODS: (a) The interrelationships of SD, SEM and reliability were investigated in a Monte Carlo simulation of 10,000 candidates taking a postgraduate examination. (b) Reliability and SEM were studied in the MRCP(UK) Part 1 and Part 2 Written Examinations from 2002 to 2008. (c) Reliability and SEM were studied in eight Specialty Certificate Examinations introduced in 2008-9.

RESULTS: The Monte Carlo simulation showed, as expected, that restricting the range of an assessment only to those who had already passed it dramatically reduced the reliability but did not affect the SEM of a simulated assessment. The analysis of the MRCP(UK) written examinations showed that the Part 2 examination had a lower reliability than the Part 1 examination but, despite that lower reliability, also had a smaller SEM (indicating a more accurate assessment). The Specialty Certificate Examinations had small numbers of candidates and, as a result, wide variability in their reliabilities, but their SEMs were comparable with MRCP(UK) Part 2.

CONCLUSIONS: An emphasis on assessing the quality of assessments primarily in terms of reliability alone can produce a paradoxical and distorted picture, particularly where a narrower range of candidate ability is an inevitable consequence of candidates being able to take a second-part examination only after passing the first part. Reliability is also problematic when the number of candidates in an examination is low and sampling error affects the range of candidate ability. The SEM is not subject to these problems; it is therefore a better measure of the quality of an assessment and is recommended for routine use.
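The key relationship behind these results is SEM = SD × √(1 − reliability): range restriction shrinks the SD and hence the reliability coefficient, even when measurement precision is unchanged. A minimal Python sketch of the Monte Carlo argument, with illustrative parameter values rather than those of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_sd, sem = 8.0, 3.0                      # illustrative score units

true = rng.normal(50, true_sd, n)            # candidates' true abilities
part2_a = true + rng.normal(0, sem, n)       # two parallel forms of the exam
part2_b = true + rng.normal(0, sem, n)
part1 = true + rng.normal(0, sem, n)         # earlier exam used for selection

def report(mask, label):
    # Parallel-forms reliability, then SEM recovered as SD * sqrt(1 - r).
    r = np.corrcoef(part2_a[mask], part2_b[mask])[0, 1]
    sd = part2_a[mask].std()
    print(f"{label}: reliability={r:.2f}  SD={sd:.2f}  "
          f"SEM={sd * np.sqrt(1 - r):.2f}")

report(np.ones(n, dtype=bool), "all candidates")
report(part1 > 55, "Part 1 passers only")    # restricted ability range
```

Running this shows reliability falling sharply for the restricted group while the recovered SEM stays close to the value of 3.0 that was simulated, which is the paper's central point.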


Subject(s)
Education, Medical, Graduate; Educational Measurement/standards; Educational Measurement/statistics & numerical data; Humans; Models, Statistical; Monte Carlo Method; Reproducibility of Results
4.
BMC Med; 6: 5, 2008 Feb 14.
Article in English | MEDLINE | ID: mdl-18275598

ABSTRACT

BACKGROUND: The UK General Medical Council has emphasized the lack of evidence on whether graduates from different UK medical schools perform differently in their clinical careers. Here we assess the performance of UK graduates who have taken MRCP(UK) Part 1 and Part 2, which are multiple-choice assessments, and PACES, an assessment of clinical examination and communication skills using real and simulated patients, and we explore the reasons for the differences between medical schools.

METHODS: We performed a retrospective analysis of the performance of 5827 doctors graduating from UK medical schools who took Part 1, Part 2 or PACES for the first time between 2003/2 and 2005/3, and of 22,453 candidates taking Part 1 from 1989/1 to 2005/3.

RESULTS: Graduates of UK medical schools performed differently in the MRCP(UK) examination between 2003/2 and 2005/3. Part 1 and Part 2 performance of Oxford, Cambridge and Newcastle-upon-Tyne graduates was significantly better than average, and the performance of Liverpool, Dundee, Belfast and Aberdeen graduates was significantly worse than average. In the PACES (clinical) examination, Oxford graduates performed significantly above average, and Dundee, Liverpool and London graduates significantly below average. About 60% of the between-school variance was explained by differences in pre-admission qualifications, although the remaining variance was still significant, with graduates from Leicester, Oxford, Birmingham, Newcastle-upon-Tyne and London over-performing at Part 1, and graduates from Southampton, Dundee, Aberdeen, Liverpool and Belfast under-performing relative to pre-admission qualifications. The ranking of schools at Part 1 in 2003/2 to 2005/3 correlated 0.723, 0.654, 0.618 and 0.493 with performance in 1999-2001, 1996-1998, 1993-1995 and 1989-1992, respectively.

CONCLUSIONS: Candidates from different UK medical schools perform differently in all three parts of the MRCP(UK) examination, with the ordering consistent across the parts of the exam and the differences in Part 1 performance consistent from 1989 to 2005. Although pre-admission qualifications explained some of the between-school variance, the remaining differences do not seem to result from career preference or other selection biases; they are presumed to result from unmeasured differences in ability at entry to medical school or from differences between medical schools in teaching focus, content and approach. Exploration of causal mechanisms would be enhanced by results from a national medical qualifying examination.
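As an illustration of the kind of cross-era stability check reported above, the sketch below correlates school-level mean Part 1 marks between two eras. The input file and column names are hypothetical, not the authors' dataset.

```python
import pandas as pd

# One row per first-attempt candidate, with hypothetical columns 'school',
# 'era' (e.g. '2003-2005' or '1989-1992') and 'part1_mark'.
results = pd.read_csv("mrcp_part1_first_attempts.csv")  # hypothetical file

# School-by-era table of mean first-attempt Part 1 marks.
school_means = (results
                .groupby(["school", "era"])["part1_mark"]
                .mean()
                .unstack("era"))

# Correlate school-level mean performance between two eras, mirroring the
# 0.49-0.72 cross-era correlations reported in the abstract.
print(school_means["2003-2005"].corr(school_means["1989-1992"]))
```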


Subject(s)
Education, Medical, Undergraduate/statistics & numerical data; Educational Measurement/statistics & numerical data; Students, Medical/statistics & numerical data; Female; Humans; Male; Multivariate Analysis; Regression Analysis; Sex Factors; Task Performance and Analysis; United Kingdom
5.
Clin Med (Lond); 14(5): 500-5, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25301910

ABSTRACT

The medical profession is global, and ambitious trainee physicians around the world are eager to attain internationally recognised postgraduate medical qualifications. The MRCP(UK) and specialty certificate examinations of the Federation of Royal Colleges of Physicians of the United Kingdom provide such qualifications, and between 2002 and 2013, the number of international candidates attempting these examinations grew substantially. Delivering these proven and reliable UK-based examinations in other countries has many local benefits: it enhances careers, strengthens medical training and improves standards of patient care. In collaboration with international colleagues, the Federation is committed to continued growth that extends these benefits to all physicians, wherever they work and live.


Subject(s)
Clinical Competence/standards; Education, Medical, Graduate; Educational Measurement; Education, Medical, Graduate/standards; Education, Medical, Graduate/statistics & numerical data; Educational Measurement/methods; Educational Measurement/statistics & numerical data; Humans; Internationality; United Kingdom