1.
Adv Health Sci Educ Theory Pract ; 28(5): 1441-1465, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37097483

ABSTRACT

Automatic Item Generation (AIG) refers to the process of using cognitive models to generate test items with computer modules. It is a new but rapidly evolving research area in which cognitive and psychometric theory are combined into a digital framework. However, the item quality, usability and validity of AIG relative to traditional item development methods have not been clearly established. This paper takes a top-down, strong-theory approach to evaluating AIG in medical education. Two studies were conducted. In Study I, participants with different levels of clinical knowledge and item-writing experience developed medical test items both manually and through AIG, and the two item types were compared in terms of quality and usability (efficiency and learnability). In Study II, automatically generated items were included in a summative exam in the content area of surgery, and a psychometric analysis based on Item Response Theory inspected the validity and quality of the AIG items. Items generated by AIG showed good quality and evidence of validity and were adequate for testing students' knowledge. The time spent developing the content for item generation (cognitive models) and the number of items generated did not vary with the participants' item-writing experience or clinical knowledge. AIG produces numerous high-quality items in a fast, economical and easy-to-learn process, even for item writers without experience or clinical training. Medical schools may benefit from a substantial improvement in the cost-efficiency of developing test items by using AIG. Item-writing flaws can be significantly reduced through the application of AIG's cognitive models, generating test items capable of accurately gauging students' knowledge.


Subject(s)
Undergraduate Medical Education , Medical Education , Humans , Educational Measurement/methods , Undergraduate Medical Education/methods , Psychometrics , Students
2.
Med Teach ; 41(5): 569-577, 2019 05.
Article in English | MEDLINE | ID: mdl-30299196

ABSTRACT

Despite the increased emphasis on workplace-based assessment in competency-based education models, there is still an important role for multiple choice questions (MCQs) in the assessment of health professionals. The challenge, however, is to ensure that MCQs are developed in a way that allows educators to derive meaningful information about examinees' abilities. As educators' needs for high-quality test items have evolved, so has our approach to developing MCQs. This evolution has been reflected in a number of ways, including the use of different stimulus formats, the creation of novel response formats, the development of new approaches to problem conceptualization, and the incorporation of technology. The purpose of this narrative review is to provide the reader with an overview of how our understanding of the use of MCQs in the assessment of health professionals has evolved to better measure clinical reasoning and to improve both efficiency and item quality.


Subject(s)
Undergraduate Medical Education , Educational Measurement/methods , Cognition , Competency-Based Education , Computer-Assisted Instruction/methods , Humans
3.
Teach Learn Med ; 29(1): 52-58, 2017.
Article in English | MEDLINE | ID: mdl-27603790

ABSTRACT

Construct: Valid score interpretation is important for constructs in performance assessments such as objective structured clinical examinations (OSCEs). An OSCE is a type of performance assessment in which a series of standardized patients interact with the student or candidate who is scored by either the standardized patient or a physician examiner. BACKGROUND: In high-stakes examinations, test security is an important issue. Students accessing unauthorized test materials can create an unfair advantage and lead to examination scores that do not reflect students' true ability level. The purpose of this study was to assess the impact of various simulated security breaches on OSCE scores. APPROACH: Seventy-six 3rd-year medical students participated in an 8-station OSCE and were randomized to either a control group or to 1 of 2 experimental conditions simulating test security breaches: station topic (i.e., providing a list of station topics prior to the examination) or egregious security breach (i.e., providing detailed content information prior to the examination). Overall total scores were compared for the 3 groups using both a one-way between-subjects analysis of variance and a repeated measure analysis of variance to compare the checklist, rating scales, and oral question subscores across the three conditions. RESULTS: Overall total scores were highest for the egregious security breach condition (81.8%), followed by the station topic condition (73.6%), and they were lowest for the control group (67.4%). This trend was also found with checklist subscores only (79.1%, 64.9%, and 60.3%, respectively for the security breach, station topic, and control conditions). Rating scale subscores were higher for both the station topic and egregious security breach conditions compared to the control group (82.6%, 83.1%, and 77.6%, respectively). Oral question subscores were significantly higher for the egregious security breach condition (88.8%) followed by the station topic condition (64.3%), and they were the lowest for the control group (48.6%). CONCLUSIONS: This simulation of different OSCE security breaches demonstrated that student performance is greatly advantaged by having prior access to test materials. This has important implications for medical educators as they develop policies and procedures regarding the safeguarding and reuse of test content.
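
The group comparison above rests on a one-way between-subjects ANOVA of overall total scores across the three conditions. A minimal sketch of that kind of analysis follows, using scipy.stats and invented score vectors; the group data are illustrative, not the study's.

```python
import numpy as np
from scipy import stats

# Hypothetical overall total scores (%) for the three randomized groups;
# the real study used 76 third-year students across 8 OSCE stations.
control = np.array([65.2, 70.1, 66.8, 68.4, 64.9, 69.3])
station_topic = np.array([72.5, 75.0, 71.9, 74.2, 73.8, 74.6])
egregious_breach = np.array([80.4, 83.1, 81.7, 82.2, 79.9, 83.5])

# One-way between-subjects ANOVA on overall total scores.
f_stat, p_value = stats.f_oneway(control, station_topic, egregious_breach)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Descriptive means mirror the reported ordering:
# egregious breach > station topic > control.
for name, grp in [("control", control),
                  ("station topic", station_topic),
                  ("egregious breach", egregious_breach)]:
    print(f"{name}: mean = {grp.mean():.1f}%")
```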


Subject(s)
Clinical Competence/standards , Deception , Educational Measurement , Female , Humans , Male , Medical Students
4.
Med Teach ; 38(8): 838-43, 2016 Aug.
Article in English | MEDLINE | ID: mdl-26998566

ABSTRACT

With the recent interest in competency-based education, educators are being challenged to develop more assessment opportunities. As such, there is increased demand for exam content development, which can be a very labor-intensive process. An innovative solution to this challenge has been the use of automatic item generation (AIG) to develop multiple-choice questions (MCQs). In AIG, computer technology is used to generate test items from cognitive models (i.e. representations of the knowledge and skills that are required to solve a problem). The main advantage yielded by AIG is the efficiency in generating items. Although the technology for AIG relies on a linear programming approach, the same principles can also be used to improve the traditional committee-based processes used in the development of MCQs. Using this approach, content experts deconstruct their clinical reasoning process to develop a cognitive model which, in turn, is used to create MCQs. This approach is appealing because it: (1) is efficient; (2) has been shown to produce items with psychometric properties comparable to those generated using a traditional approach; and (3) can be used to assess higher-order skills (i.e. application of knowledge). The purpose of this article is to provide a novel framework for the development of high-quality MCQs using cognitive models.
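
As a rough illustration of how a cognitive model can drive item generation, the sketch below fills a fixed MCQ template from a small table of clinical features and diagnoses. The template, findings, and options are invented for illustration and are not drawn from the article's framework.

```python
# Hypothetical cognitive model: each diagnosis is linked to the findings that
# point to it; distractors are drawn from the remaining diagnoses.
cognitive_model = {
    "iron deficiency anemia": {"age": "25-year-old", "finding": "microcytic anemia"},
    "vitamin B12 deficiency": {"age": "70-year-old", "finding": "macrocytic anemia"},
    "anemia of chronic disease": {"age": "55-year-old", "finding": "normocytic anemia"},
}

STEM = ("A {age} patient presents with fatigue. Laboratory testing shows "
        "{finding}. What is the most likely diagnosis?")

def generate_items(model):
    """Generate one MCQ per diagnosis by instantiating the stem template."""
    items = []
    for correct, slots in model.items():
        distractors = [dx for dx in model if dx != correct]
        items.append({
            "stem": STEM.format(**slots),
            "options": [correct] + distractors,
            "key": correct,
        })
    return items

for item in generate_items(cognitive_model):
    print(item["stem"])
    for letter, option in zip("ABC", item["options"]):
        print(f"  {letter}. {option}")
    print(f"  Key: {item['key']}\n")
```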


Subject(s)
Undergraduate Medical Education , Educational Measurement/methods , Educational Measurement/standards , Psychological Models , Competency-Based Education , Humans
5.
Teach Learn Med ; 28(2): 166-73, 2016.
Article in English | MEDLINE | ID: mdl-26849247

ABSTRACT

CONSTRUCT: Automatic item generation (AIG) is an alternative method for producing large numbers of test items that integrates cognitive modeling with computer technology to systematically generate multiple-choice questions (MCQs). The purpose of our study is to describe and validate a method of generating plausible but incorrect distractors. Initial applications of AIG demonstrated its effectiveness in producing test items. However, expert review of the initial items identified a key limitation: the generation of implausible incorrect options, or distractors, might limit the applicability of items in real testing situations. BACKGROUND: Medical educators require test items in large quantities to facilitate the continual assessment of student knowledge. Traditional item development processes are time-consuming and resource intensive. Previous studies have validated the quality of generated items through content expert review. However, no study has yet documented how generated items perform in a test administration, and none has validated AIG through student responses to generated test items. APPROACH: To validate our refined AIG method for generating plausible distractors, we collected psychometric evidence from a field test of the generated test items. A three-step process was used to generate test items in the area of jaundice. At least 455 Canadian and international medical graduates responded to each of the 13 generated items embedded in a high-stakes exam administration. Item difficulty, discrimination, and index of discrimination estimates were calculated for the correct option as well as for each distractor. RESULTS: Item analysis results for the correct options suggest that the generated items measured candidate performance across a range of ability levels while providing a consistent level of discrimination for each item. Results for the distractors reveal that the generated items differentiated the low-performing from the high-performing candidates. CONCLUSIONS: Previous research on AIG highlighted how this item development method can be used to produce high-quality stems and correct options for MCQ exams. The purpose of the current study was to describe, illustrate, and evaluate a method for modeling plausible but incorrect options. The evidence provided in this study demonstrates that AIG can produce psychometrically sound test items. More importantly, by adapting the distractors to match the unique features presented in the stem and correct option, the generation of MCQs using an automated procedure has the potential to produce plausible distractors and yield large numbers of high-quality items for medical education.
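
The distractor-level statistics named above (difficulty, discrimination, and an index of discrimination for the key and each distractor) can be approximated from a response-by-option matrix. A minimal sketch with simulated data follows; the option labels and response patterns are hypothetical, not the study's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: total test scores and the option (A-D) each of 455
# candidates selected on one generated item; "A" is the keyed answer.
n = 455
total_scores = rng.normal(70, 10, n)
chosen = rng.choice(list("ABCD"), size=n, p=[0.6, 0.2, 0.12, 0.08])

def option_statistics(option):
    """Proportion choosing the option, point-biserial with total score,
    and upper-lower index of discrimination (top vs. bottom 27%)."""
    picked = (chosen == option).astype(float)
    difficulty = picked.mean()
    point_biserial = np.corrcoef(picked, total_scores)[0, 1]
    cut = int(0.27 * n)
    order = np.argsort(total_scores)
    lower, upper = picked[order[:cut]], picked[order[-cut:]]
    index_of_discrimination = upper.mean() - lower.mean()
    return difficulty, point_biserial, index_of_discrimination

for opt in "ABCD":
    p, rpb, d = option_statistics(opt)
    label = "key" if opt == "A" else "distractor"
    print(f"Option {opt} ({label}): p = {p:.2f}, r_pb = {rpb:+.2f}, D = {d:+.2f}")
```

A functioning key should show positive discrimination; plausible distractors should attract lower-performing candidates and show negative values.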


Subject(s)
Computer-Assisted Instruction/methods , Undergraduate Medical Education/methods , Educational Measurement/methods , Quality Improvement , Automation , Humans , Jaundice/diagnosis , Jaundice/therapy , Educational Models , Psychometrics
6.
Adv Health Sci Educ Theory Pract ; 20(3): 581-94, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25164266

ABSTRACT

Examiner effects and content specificity are two well-known sources of construct-irrelevant variance that present great challenges in performance-based assessments. National medical organizations responsible for large-scale performance-based assessments face an additional challenge, as they must administer qualification examinations to physician candidates at several locations and institutions. This study explores the impact of site location as a source of score variation in a large-scale national assessment used to measure the readiness of internationally educated physician candidates for residency programs. Data from the Medical Council of Canada's National Assessment Collaboration were analyzed using Hierarchical Linear Modeling and Rasch analyses. Consistent with previous research, problematic variance due to examiner effects and content specificity was found. Additionally, site location was identified as a potential source of construct-irrelevant variance in examination scores.
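
One way to examine site location as a source of score variance, in the spirit of the hierarchical linear modeling described above, is a mixed-effects model with candidates nested in sites. The sketch below uses statsmodels with a simulated data frame; the variable names are assumptions, not those of the Medical Council of Canada data set.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulated assessment data: candidates nested within examination sites.
n_sites, n_per_site = 12, 40
site_effect = rng.normal(0, 3, n_sites)           # construct-irrelevant site variance
rows = []
for site in range(n_sites):
    ability = rng.normal(70, 8, n_per_site)
    score = ability + site_effect[site] + rng.normal(0, 5, n_per_site)
    rows += [{"site": f"site_{site}", "score": s} for s in score]
df = pd.DataFrame(rows)

# Random-intercept model: how much score variance is attributable to site?
model = smf.mixedlm("score ~ 1", data=df, groups=df["site"]).fit()
print(model.summary())

site_var = model.cov_re.iloc[0, 0]
resid_var = model.scale
print(f"Intraclass correlation (site) = {site_var / (site_var + resid_var):.3f}")
```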


Subject(s)
Bias , Clinical Competence , Educational Measurement/standards , Physicians , Clinical Competence/statistics & numerical data , Female , Humans , Male , Statistical Models
7.
Med Educ ; 48(9): 921-9, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25113118

ABSTRACT

CONTEXT: First-year residents begin clinical practice in settings in which attending staff and senior residents are available to supervise their work. There is an expectation that, while being supervised and as they become more experienced, residents will gradually take on more responsibilities and function independently. OBJECTIVES: This study was conducted to define 'entrustable professional activities' (EPAs) and determine the extent of agreement between the level of supervision expected by clinical supervisors (CSs) and the level of supervision reported by first-year residents. METHODS: Using a nominal group technique, subject matter experts (SMEs) from multiple specialties defined EPAs for incoming residents; these represented a set of activities to be performed independently by residents by the end of the first year of residency, regardless of specialty. We then surveyed CSs and first-year residents from one institution in order to compare the levels of supervision expected and received during the day and night for each EPA. RESULTS: The SMEs defined 10 EPAs (e.g. completing admission orders, obtaining informed consent) that were ratified by a national panel. A total of 113 CSs and 48 residents completed the survey. Clinical supervisors had the same expectations regardless of time of day. For three EPAs (managing i.v. fluids, obtaining informed consent, obtaining advanced directives) the level of supervision reported by first-year residents was lower than that expected by CSs (p < 0.001) regardless of time of day (i.e. day or night). For four more EPAs (initiating the management of a critically ill patient, handing over the care of a patient to colleagues, writing a discharge prescription, coordinating a patient discharge) differences applied only to night-time work (p ≤ 0.001). CONCLUSIONS: First-year residents reported performing EPAs with less supervision than expected by CSs, especially during the night. Using EPAs to guide the content of the undergraduate curriculum and during examinations could help better align CSs' and residents' expectations about early residency supervision.


Subject(s)
Clinical Competence/standards , Internship and Residency/standards , Attitude of Health Personnel , Medical Faculty , Humans , Male , Ontario , Professional Practice/standards
8.
Med Educ ; 48(10): 950-62, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25200016

ABSTRACT

CONTEXT: Constructed-response tasks, which range from short-answer tests to essay questions, are included in assessments of medical knowledge because they allow educators to measure students' ability to think, reason, solve complex problems, communicate and collaborate through their use of writing. However, constructed-response tasks are also costly to administer and challenging to score because they rely on human raters. One alternative to the manual scoring process is to integrate computer technology with writing assessment. The process of scoring written responses using computer programs is known as 'automated essay scoring' (AES). METHODS: An AES system uses a computer program that builds a scoring model by extracting linguistic features from a constructed-response prompt that has been pre-scored by human raters and then, using machine learning algorithms, maps the linguistic features to the human scores so that the computer can be used to classify (i.e. score or grade) the responses of a new group of students. The accuracy of the score classification can be evaluated using different measures of agreement. RESULTS: Automated essay scoring provides a method for scoring constructed-response tests that complements the current use of selected-response testing in medical education. The method can serve medical educators by providing the summative scores required for high-stakes testing. It can also serve medical students by providing them with detailed feedback as part of a formative assessment process. CONCLUSIONS: Automated essay scoring systems yield scores that consistently agree with those of human raters at a level as high, if not higher, as the level of agreement among human raters themselves. The system offers medical educators many benefits for scoring constructed-response tasks, such as improving the consistency of scoring, reducing the time required for scoring and reporting, minimising the costs of scoring, and providing students with immediate feedback on constructed-response tasks.
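
The AES workflow outlined above (extract linguistic features from pre-scored responses, learn a mapping to human scores, then classify new responses) can be prototyped with scikit-learn. The sketch below is a minimal, generic illustration, not the system described in the article; the example responses and scores are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.pipeline import make_pipeline

# Tiny invented training set: constructed responses with human-assigned scores.
train_texts = [
    "The patient should be started on anticoagulation after ruling out bleeding.",
    "Give medicine.",
    "Order a CT scan, assess airway and breathing, then reassess neurological status.",
    "Do tests.",
]
train_scores = [3, 1, 3, 1]   # human rater scores (e.g., 1 = poor, 3 = strong)

# Feature extraction (simple TF-IDF n-grams stand in here for richer
# linguistic features) followed by a supervised classifier.
aes_model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
aes_model.fit(train_texts, train_scores)

# Score new responses and check agreement with human raters.
new_texts = ["Start anticoagulation once bleeding is excluded.", "Treat patient."]
human_scores = [3, 1]
machine_scores = aes_model.predict(new_texts)
print("Machine scores:", list(machine_scores))
print("Quadratic weighted kappa:",
      cohen_kappa_score(human_scores, machine_scores, weights="quadratic"))
```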


Subject(s)
Computer-Assisted Instruction/trends , Medical Education/methods , Medical Education/trends , Educational Measurement/methods , Software , Clinical Competence , Humans , Writing
9.
Med Teach ; 36(7): 585-90, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24787530

ABSTRACT

BACKGROUND: Past research suggests that the use of externally applied scoring weights may not appreciably impact measurement qualities such as reliability or validity. Nonetheless, some credentialing boards and academic institutions apply differential scoring weights based on expert opinion about the relative importance of individual items or test components of Objective Structured Clinical Examinations (OSCEs). AIMS: To investigate the impact of simplified scoring models that make little to no use of differential weighting on the reliability of scores and decisions on a high-stakes OSCE required for medical licensure in Canada. METHOD: We applied four weighting models of varying complexity to data from three administrations of the OSCE. We compared score reliability, pass/fail rates, correlations between the scores, and classification decision accuracy and consistency across the models and administrations. RESULTS: Less complex weighting models yielded reliability and pass rates similar to those of the more complex weighting model. Minimal changes in candidates' pass/fail status were observed, and there were strong and statistically significant correlations between the scores for all scoring models and administrations. Classification decision accuracy and consistency were very high and similar across the four scoring models. CONCLUSIONS: Adopting a simplified weighting scheme for this OSCE did not diminish its measurement qualities. Instead of developing complex weighting schemes, experts' time and effort could be better spent on other critical test development and assembly tasks, with little to no compromise in the quality of scores and decisions on this high-stakes OSCE.
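
The core comparison above amounts to rescoring the same station-level data under several weighting schemes and checking how much the totals and pass/fail decisions change. A minimal sketch of that rescoring and agreement check follows, with simulated scores; the weights and cut score are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated station scores: 500 candidates x 10 OSCE stations (percentages).
scores = np.clip(rng.normal(70, 10, (500, 10)), 0, 100)

weighting_models = {
    "unit weights": np.ones(10),
    "expert weights": np.array([2, 1, 1, 3, 1, 2, 1, 1, 2, 1], dtype=float),
    "component weights": np.array([1.5, 1.5, 1.5, 1.5, 1.5, 0.5, 0.5, 0.5, 0.5, 0.5]),
}

cut_score = 65.0
totals, decisions = {}, {}
for name, w in weighting_models.items():
    total = scores @ (w / w.sum())        # weighted mean score per candidate
    totals[name] = total
    decisions[name] = total >= cut_score

# Correlations between scoring models and pass/fail agreement with unit weights.
base = "unit weights"
for name in weighting_models:
    r = np.corrcoef(totals[base], totals[name])[0, 1]
    agree = np.mean(decisions[base] == decisions[name])
    print(f"{name}: r with {base} = {r:.3f}, pass/fail agreement = {agree:.3f}")
```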


Subject(s)
Clinical Competence/standards , Educational Measurement/standards , Medical Licensure/standards , Canada , Checklist , Educational Measurement/methods , Educational Measurement/statistics & numerical data , Humans , Educational Models , Reproducibility of Results
10.
BMC Med Educ ; 14: 30, 2014 Feb 15.
Article in English | MEDLINE | ID: mdl-24528493

ABSTRACT

BACKGROUND: Tutorial-based assessment, commonly used in problem-based learning (PBL), is thought to provide information about students that differs from that gathered with traditional assessment strategies such as multiple-choice or short-answer questions. Although multiple observations within units of an undergraduate medical education curriculum foster more reliable scores, such an evaluation design is not always practically feasible. This study therefore investigated the overall reliability of a tutorial-based program of assessment, the Tutotest-Lite. METHODS: Scores from multiple units were used to profile clinical domains for the first two years of a system-based PBL curriculum. RESULTS: G-study analysis revealed an acceptable level of generalizability, with g-coefficients of 0.84 and 0.83 for Years 1 and 2, respectively. Interestingly, D-studies suggested that as few as five observations over one year would yield sufficiently reliable scores. CONCLUSIONS: Overall, the results from this study support the use of the Tutotest-Lite to judge clinical domains over different PBL units.
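
The G- and D-study logic above can be reproduced in outline for a simple persons-by-observations crossed design: estimate variance components from the ANOVA mean squares, then project the generalizability coefficient for different numbers of observations. The sketch below uses simulated ratings and assumes a one-facet design, which is a simplification of the actual Tutotest-Lite analysis.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated tutorial-based ratings: 60 students x 8 observations (occasions).
n_p, n_o = 60, 8
person = rng.normal(0, 1.0, (n_p, 1))          # true person variance
ratings = 5 + person + rng.normal(0, 0.8, (n_p, n_o))

# One-facet crossed G-study (p x o): variance components from mean squares.
grand = ratings.mean()
ms_p = n_o * np.sum((ratings.mean(axis=1) - grand) ** 2) / (n_p - 1)
ms_o = n_p * np.sum((ratings.mean(axis=0) - grand) ** 2) / (n_o - 1)
ss_total = np.sum((ratings - grand) ** 2)
ss_res = ss_total - (ms_p * (n_p - 1) + ms_o * (n_o - 1))
ms_res = ss_res / ((n_p - 1) * (n_o - 1))

var_res = ms_res
var_p = max((ms_p - ms_res) / n_o, 0.0)

def g_coefficient(n_obs):
    """D-study: relative G coefficient for a design with n_obs observations."""
    return var_p / (var_p + var_res / n_obs)

for n_obs in (3, 5, 8):
    print(f"n_obs = {n_obs}: G = {g_coefficient(n_obs):.2f}")
```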


Subject(s)
Educational Measurement/methods , Problem-Based Learning , Reproducibility of Results
11.
Med Educ ; 44(1): 109-17, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20078762

ABSTRACT

CONTEXT: A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. OBJECTIVES: The objective of this paper is to provide an overview of both CTT and IRT for the practitioner involved in the development and scoring of medical education assessments. METHODS: The tenets of CTT and IRT are initially described. Then, the main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. DISCUSSION: Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
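
To make the CTT/IRT contrast concrete, the sketch below computes a classical reliability estimate (Cronbach's alpha, equivalent to KR-20 for dichotomous items) and evaluates a two-parameter logistic item characteristic curve. The response matrix and item parameters are simulated for illustration; nothing here is drawn from the paper's data.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated dichotomous responses: 300 examinees x 40 items.
theta = rng.normal(0, 1, (300, 1))                 # latent proficiency
b = rng.normal(0, 1, 40)                           # item difficulty
a = rng.uniform(0.8, 2.0, 40)                      # item discrimination
p_correct = 1 / (1 + np.exp(-a * (theta - b)))     # 2-PL model
responses = (rng.uniform(size=p_correct.shape) < p_correct).astype(int)

# Classical test theory: Cronbach's alpha (KR-20 for 0/1 items).
k = responses.shape[1]
item_var = responses.var(axis=0, ddof=1).sum()
total_var = responses.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_var / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")

# Item response theory: probability of a correct response under the 2-PL
# model, P(theta) = 1 / (1 + exp(-a(theta - b))), for one illustrative item.
def icc(theta_value, a_i, b_i):
    return 1 / (1 + np.exp(-a_i * (theta_value - b_i)))

for t in (-2, 0, 2):
    print(f"P(correct | theta={t:+d}, a=1.2, b=0.0) = {icc(t, 1.2, 0.0):.2f}")
```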


Subject(s)
Medical Education/methods , Educational Measurement/methods , Educational Models , Computer-Assisted Instruction/methods , Humans , Statistical Models , Psychometrics
12.
Med Teach ; 32(6): 503-8, 2010.
Article in English | MEDLINE | ID: mdl-20515382

ABSTRACT

BACKGROUND: Though progress tests have been used for several decades in various medical education settings, few studies have offered analytic frameworks that practitioners could use to model growth of knowledge as a function of curricular and other variables of interest. AIM: To explore the use of one form of progress testing in clinical education by modeling growth of knowledge in various disciplines and by assessing the impact of recent training (core rotation order) on performance, using hierarchical linear modeling (HLM) and analysis of variance (ANOVA) frameworks. METHODS: This study included performances across four test administrations occurring between July 2006 and July 2007 for 130 students from a US medical school who graduated in 2008. Measures-nested-in-examinees HLM growth curve analyses were run to estimate growth of clinical science knowledge over time, and repeated measures ANOVAs were run to assess the effect of recent training on performance. RESULTS: Core rotation order was related to growth rates for total and pediatrics scores only. Additionally, scores were higher in a given discipline if training had occurred immediately prior to the test administration. CONCLUSIONS: This study provides a useful progress testing framework for assessing medical students' growth of knowledge across their clinical science education and the related impact of training.
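
The measures-nested-in-examinees growth model described above corresponds to a mixed model with random intercepts and slopes over test administrations. A minimal statsmodels sketch with simulated progress-test scores follows; the variable names and the rotation-order covariate are illustrative assumptions, not the study's variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)

# Simulated progress-test data: 130 students, 4 administrations each.
rows = []
for student in range(130):
    intercept = rng.normal(60, 6)
    slope = rng.normal(4, 1.5)                 # knowledge growth per administration
    rotation_order = int(rng.integers(0, 2))   # illustrative covariate
    for time in range(4):
        score = intercept + slope * time + 2.0 * rotation_order * time + rng.normal(0, 3)
        rows.append({"student": student, "time": time,
                     "rotation_order": rotation_order, "score": score})
df = pd.DataFrame(rows)

# Growth-curve HLM: random intercept and slope for each examinee,
# with rotation order moderating the growth rate.
model = smf.mixedlm("score ~ time * rotation_order", data=df,
                    groups=df["student"], re_formula="~time").fit()
print(model.summary())
```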


Subject(s)
Clinical Medicine/education , Educational Measurement/methods , Medical Schools , Clinical Clerkship , Pilot Projects , United States
13.
BMC Med ; 6: 5, 2008 Feb 14.
Article in English | MEDLINE | ID: mdl-18275598

ABSTRACT

BACKGROUND: The UK General Medical Council has emphasized the lack of evidence on whether graduates from different UK medical schools perform differently in their clinical careers. Here we assess the performance of UK graduates who have taken MRCP(UK) Part 1 and Part 2, which are multiple-choice assessments, and PACES, an assessment using real and simulated patients of clinical examination skills and communication skills, and we explore the reasons for the differences between medical schools. METHOD: We perform a retrospective analysis of the performance of 5827 doctors graduating in UK medical schools taking the Part 1, Part 2 or PACES for the first time between 2003/2 and 2005/3, and 22453 candidates taking Part 1 from 1989/1 to 2005/3. RESULTS: Graduates of UK medical schools performed differently in the MRCP(UK) examination between 2003/2 and 2005/3. Part 1 and 2 performance of Oxford, Cambridge and Newcastle-upon-Tyne graduates was significantly better than average, and the performance of Liverpool, Dundee, Belfast and Aberdeen graduates was significantly worse than average. In the PACES (clinical) examination, Oxford graduates performed significantly above average, and Dundee, Liverpool and London graduates significantly below average. About 60% of medical school variance was explained by differences in pre-admission qualifications, although the remaining variance was still significant, with graduates from Leicester, Oxford, Birmingham, Newcastle-upon-Tyne and London overperforming at Part 1, and graduates from Southampton, Dundee, Aberdeen, Liverpool and Belfast underperforming relative to pre-admission qualifications. The ranking of schools at Part 1 in 2003/2 to 2005/3 correlated 0.723, 0.654, 0.618 and 0.493 with performance in 1999-2001, 1996-1998, 1993-1995 and 1989-1992, respectively. CONCLUSION: Candidates from different UK medical schools perform differently in all three parts of the MRCP(UK) examination, with the ordering consistent across the parts of the exam and with the differences in Part 1 performance being consistent from 1989 to 2005. Although pre-admission qualifications explained some of the medical school variance, the remaining differences do not seem to result from career preference or other selection biases, and are presumed to result from unmeasured differences in ability at entry to the medical school or to differences between medical schools in teaching focus, content and approaches. Exploration of causal mechanisms would be enhanced by results from a national medical qualifying examination.
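
Two of the analyses summarized above, the share of between-school variance explained by pre-admission qualifications and the stability of school rankings across examination eras, can be sketched at the school level as follows. The data frame and column names are invented placeholders, not the MRCP(UK) data.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(6)

# Invented school-level summary data.
n_schools = 20
pre_admission = rng.normal(0, 1, n_schools)                  # mean entry qualifications
part1_mean = 0.8 * pre_admission + rng.normal(0, 0.5, n_schools)
df = pd.DataFrame({"pre_admission": pre_admission, "part1_mean": part1_mean})

# Proportion of between-school variance explained by pre-admission qualifications.
slope, intercept, r, p, se = stats.linregress(df["pre_admission"], df["part1_mean"])
print(f"Variance explained (R^2) = {r**2:.2f}")

# Stability of school rankings across two examination eras (Spearman rho).
era_recent = df["part1_mean"]
era_earlier = era_recent + rng.normal(0, 0.4, n_schools)     # noisier earlier estimates
rho, p_rho = stats.spearmanr(era_recent, era_earlier)
print(f"Rank correlation across eras = {rho:.2f} (p = {p_rho:.3f})")
```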


Subject(s)
Undergraduate Medical Education/statistics & numerical data , Educational Measurement/statistics & numerical data , Medical Students/statistics & numerical data , Female , Humans , Male , Multivariate Analysis , Regression Analysis , Sex Factors , Task Performance and Analysis , United Kingdom
14.
Acad Med ; 93(6): 829-832, 2018 06.
Article in English | MEDLINE | ID: mdl-29538109

ABSTRACT

There exists an assumption that improving medical education will improve patient care. While seemingly logical, this premise has rarely been investigated. In this Invited Commentary, the authors propose the use of big data to test this assumption. The authors present a few example research studies linking education and patient care outcomes and argue that using big data may more easily facilitate the process needed to investigate this assumption. The authors also propose that collaboration is needed to link educational and health care data. They then introduce a grassroots initiative, inclusive of universities in one Canadian province and national licensing organizations, that is working to collect, organize, link, and analyze big data to study the relationship between pedagogical approaches to medical training and patient care outcomes. While the authors acknowledge the possible challenges and issues associated with harnessing big data, they believe that the benefits outweigh them. There is a need for medical education research to go beyond the outcomes of training to study practice and clinical outcomes as well. Without a coordinated effort to harness big data, policy makers, regulators, medical educators, and researchers are left with sometimes costly guesses and assumptions about what works and what does not. As the social, time, and financial investments in medical education continue to increase, it is imperative to understand the relationship between education and health outcomes.


Subject(s)
Big Data , Medical Education/statistics & numerical data , Needs Assessment , Outcome Assessment (Health Care)/statistics & numerical data , Humans
15.
Acad Med ; 81(10 Suppl): S108-11, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17001118

ABSTRACT

BACKGROUND: This study models time to passing the United States Medical Licensing Examination (USMLE) for the computer-based testing (CBT) start-up cohort using the Cox proportional hazards model. METHOD: The number of days it took to pass Step 3 was treated as the dependent variable in the model. Covariates were: (1) gender; (2) native language (English or other); (3) medical school location (United States or other); and (4) citizenship (United States or other). RESULTS: Examinees were 0.59 times as likely to pass the USMLE if they were trained abroad. Additionally, examinees who reported English as their primary language and U.S. citizenship were more likely to ultimately pass the USMLE. Finally, although gender was also associated with passing the USMLE, its practical significance was very small. CONCLUSION: Cox regression provides a useful tool for modeling performance in a continuous-delivery testing model. Findings suggest that passing the USMLE sequence tends to be associated with native English-speaking US medical graduates (USMGs) who also hold U.S. citizenship.
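
The survival-style analysis above can be sketched with the lifelines package: days to passing Step 3 as the duration, a pass indicator as the event, and the four covariates as predictors. The data frame below is simulated and the column names are assumptions, not the study's variables.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)

# Simulated cohort: days until passing Step 3, with censoring for non-passers.
n = 1000
img = rng.integers(0, 2, n)                 # 1 = trained abroad (international graduate)
english = rng.integers(0, 2, n)             # 1 = native English speaker
us_citizen = rng.integers(0, 2, n)
gender = rng.integers(0, 2, n)
baseline_days = rng.exponential(365, n)
days_to_pass = baseline_days * np.exp(0.5 * img - 0.3 * english - 0.2 * us_citizen)
passed = (days_to_pass < 1500).astype(int)  # censor examinees who never pass in window
days_observed = np.minimum(days_to_pass, 1500)

df = pd.DataFrame({"days": days_observed, "passed": passed, "img": img,
                   "english": english, "us_citizen": us_citizen, "gender": gender})

# Cox proportional hazards model of time to passing.
cph = CoxPHFitter()
cph.fit(df, duration_col="days", event_col="passed")
cph.print_summary()          # hazard ratios below 1 indicate slower time to passing
```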


Subject(s)
Medical Licensure/statistics & numerical data , Proportional Hazards Models , Female , Humans , Language , Male , United States
16.
Acad Med ; 81(10 Suppl): S17-20, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17001127

ABSTRACT

BACKGROUND: The purpose of the present study was to assess the fit of three factor analytic (FA) models with a representative set of United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills (CS) cases and examinees based on substantive considerations. METHOD: Checklist, patient note, communication and interpersonal skills, as well as spoken English proficiency data were collected from 387 examinees on a set of four USMLE Step 2 CS cases. The fit of skills-based, case-based, and hybrid models was assessed. RESULTS: Findings show that a skills-based model best accounted for performance on the set of four CS cases. CONCLUSION: Results of this study provide evidence to support the structural aspect of validity. The proficiency set used by examinees when performing on the Step 2 CS cases is consistent with the scoring rubric employed and the blueprint used in form assembly. These findings will be discussed in light of past research in this area.


Subject(s)
Clinical Competence , Communication , Interpersonal Relations , Medical Licensure , Factor Analysis , Humans , United States
18.
Article in English | MEDLINE | ID: mdl-26883811

ABSTRACT

PURPOSE: The aim of this research was to compare different item response theory-based methods of calibrating the multiple-choice question (MCQ) and clinical decision-making (CDM) components of the Medical Council of Canada's Qualifying Examination Part I (MCCQEI). METHODS: Our data consisted of test results from 8,213 first-time applicants to the MCCQEI in the spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple-choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All three mixed-item-format calibrations (dichotomous MCQ responses and polytomous CDM case scores) were conducted using PARSCALE 4. RESULTS: The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499 or 0.02). In all three polytomous models, whether the MCQs were anchored or concurrently calibrated with the CDM cases, the results suggest very poor fit. All IRT ability estimates from the dichotomous calibration designs correlated very highly with each other. IRT-based pass/fail rates were extremely similar, not only across calibration designs and methods but also with the actual decisions reported to candidates. The largest difference in pass rates was 4.78%, which occurred between the mixed-format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibration (pass rate = 85.21%). CONCLUSION: Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided better fit to the item response matrix than the more complex, polytomous calibrations.
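
A dichotomous 2-PL calibration of the kind run in BILOG-MG can be illustrated, in a much-simplified form, by estimating one item's discrimination and difficulty by maximum likelihood with examinee abilities treated as known. The sketch below uses scipy.optimize and simulated responses; a production calibration would estimate abilities and item parameters jointly or marginally.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)

# Simulated data for one dichotomously scored item.
n = 8213
theta = rng.normal(0, 1, n)                         # abilities, treated as known here
true_a, true_b = 1.3, 0.4
p = 1 / (1 + np.exp(-true_a * (theta - true_b)))
responses = (rng.uniform(size=n) < p).astype(int)

def neg_log_likelihood(params):
    """Negative 2-PL log-likelihood for a single item given fixed abilities."""
    a, b = params
    prob = 1 / (1 + np.exp(-a * (theta - b)))
    prob = np.clip(prob, 1e-9, 1 - 1e-9)
    return -np.sum(responses * np.log(prob) + (1 - responses) * np.log(1 - prob))

result = minimize(neg_log_likelihood, x0=[1.0, 0.0], method="Nelder-Mead")
a_hat, b_hat = result.x
print(f"Estimated a = {a_hat:.2f} (true {true_a}), b = {b_hat:.2f} (true {true_b})")
```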


Subject(s)
Educational Measurement/standards , Medical Licensure/standards , Calibration , Canada , Choice Behavior , Humans , Theoretical Models
19.
Eval Health Prof ; 39(1): 100-13, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26377072

ABSTRACT

We present a framework for technology-enhanced scoring of bilingual clinical decision-making (CDM) questions using an open-source scoring technology, and we evaluate the strength of the proposed framework using operational data from the Medical Council of Canada Qualifying Examination. Candidates' responses to six write-in CDM questions were used to develop a three-stage automated scoring framework. In Stage 1, linguistic features were extracted from the CDM responses. In Stage 2, supervised machine learning techniques were employed to develop the scoring models. In Stage 3, responses to the six English and French CDM questions were scored using the scoring models from Stage 2. Of the 8,007 English and French CDM responses, 7,643 were accurately scored, with an agreement rate of 95.4% between human and computer scoring. This represents an improvement of 5.4% over the human inter-rater reliability. Our framework yielded scores similar to those of expert physician markers and could be used for clinical competency assessment.
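
The headline figure above, the rate of exact agreement between computer-assigned and human scores across the bilingual CDM responses, reduces to a simple comparison of two score vectors. A minimal sketch with invented scores follows; the benchmark inter-rater value is an assumption for illustration, not the study's reported figure.

```python
import numpy as np

rng = np.random.default_rng(9)

# Invented scores for 8,007 CDM responses: human ratings and machine ratings
# produced by a previously trained scoring model (see Stage 2 in the abstract).
n = 8007
human = rng.integers(0, 4, n)
machine = human.copy()
disagree = rng.choice(n, size=int(0.046 * n), replace=False)   # ~4.6% disagreements
machine[disagree] = (machine[disagree] + 1) % 4

exact_agreement = np.mean(human == machine)
print(f"Human-computer exact agreement = {exact_agreement:.1%}")

# Compare against a benchmark human inter-rater agreement rate.
human_inter_rater = 0.90          # illustrative benchmark only
print(f"Improvement over human raters = {exact_agreement - human_inter_rater:+.1%}")
```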


Subject(s)
Clinical Competence , Educational Measurement/methods , Educational Measurement/standards , Automated Data Processing/standards , Translating , Canada , Clinical Decision-Making , Humans , Medical Licensure , Reproducibility of Results