Pesquisa | BVS - MINISTÉRIO DA SAÚDE

1.

Comparison of Multiple-Choice Question Formats in a First Year Medical Physiology Course.

Wilson, L Britt; DiStefano, Christine; Wang, Huijuan; Blanck, Erika L.

J CME ; 13(1): 2390264, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39157702

RESUMO

The purpose of this study was to compare student performance and question discrimination of multiple-choice questions (MCQs) that followed a standard format (SF) versus those that do not follow a SF, termed here as non-standard format (NSF). Medical physiology exam results of approximately 500 first-year medical students collected over a five-year period (2020-2024) were used. Classical test theory item analysis indices, e.g. discrimination (D), point-biserial correlation (rpbis), distractor analysis for non-functional distractors (NFDs), and difficulty (p) were determined and compared across MCQ format types. The results presented here are the mean ± standard error of the mean (SEM). The analysis showed that D (0.278 ± 0.008 vs 0.228 ± 0.006) and rpbis (0.291 ± .006 vs 0.273 ± .006) were significantly higher for NSF questions compared to SF questions, indicating NSF questions provided more discriminatory power. In addition, the percentage of NFDs was lower for the NSF items compared to the SF ones (58.3 ± 0.019% vs 70.2 ± 0.015%). Also, the NSF questions proved to be more difficult relative to the SF questions (p = 0.741 ± 0.007 for NSF; p = 0.809 ± 0.006 for SF). Thus, the NSF questions discriminated better, had fewer NFDs, and were more difficult than SF questions. These data suggest that using the selected non-standard item writing questions can enhance the ability to discriminate higher performers from lower performers on MCQs as well as provide more rigour for exams.

2.

Evaluating the Quality of Examination Items From the Pathophysiology, Drug Action, and Therapeutics Course Series.

Shultz, Benjamin; Kopale, Mary Sullivan; Benken, Scott; Mucksavage, Jeffrey.

Am J Pharm Educ ; 88(8): 100757, 2024 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-38996841

RESUMO

OBJECTIVE: To determine the impact of item-writing flaws and cognitive level on student performance metrics in 1 course series across 2 semesters at a single institution. METHODS: Four investigators reviewed 928 multiple-choice items from an integrated therapeutics course series. Differences in performance metrics were examined between flawed and standard items, flawed stems and flawed answer choices, and cognitive levels. RESULTS: Reviewers found that 80% of the items were flawed, with the most common types being implausible distractors and unfocused stems. Flawed items were generally easier than standard ones, but the type of flaw significantly impacted the difficulty. Items with flawed stems had the same difficulty as standard items; however, those with flawed answer choices were significantly easier. Most items tested lower-level skills and have more flaws than higher-level items. There was no significant difference in difficulty between lower- and higher-level cognitive items, and higher-level items were more likely to have answer flaws than item flaws. CONCLUSION: Item-writing flaws differently impact student performance. Implausible distractors artificially lower the difficulty of questions, even those designed to assess higher-level skills. This effect contributes to a lack of significant difference in difficulty between higher- and lower-level items. Unfocused stems, on the other hand, likely increase confusion and hinder performance, regardless of the question's cognitive complexity.

Assuntos

Educação em Farmácia , Avaliação Educacional , Estudantes de Farmácia , Humanos , Avaliação Educacional/métodos , Avaliação Educacional/normas , Educação em Farmácia/métodos , Educação em Farmácia/normas , Currículo , Cognição

3.

Postexamination item analysis of undergraduate pediatric multiple-choice questions exam: implications for developing a validated question Bank.

Rashwan, Nagwan I; Aref, Soha R; Nayel, Omnia A; Rizk, Mennatallah H.

BMC Med Educ ; 24(1): 168, 2024 Feb 21.

Artigo em Inglês | MEDLINE | ID: mdl-38383427

RESUMO

INTRODUCTION: Item analysis (IA) is widely used to assess the quality of multiple-choice questions (MCQs). The objective of this study was to perform a comprehensive quantitative and qualitative item analysis of two types of MCQs: single best answer (SBA) and extended matching questions (EMQs) currently in use in the Final Pediatrics undergraduate exam. METHODOLOGY: A descriptive cross-sectional study was conducted. We analyzed 42 SBA and 4 EMQ administered to 247 fifth-year medical students. The exam was held at the Pediatrics Department, Qena Faculty of Medicine, Egypt, in the 2020-2021 academic year. Quantitative item analysis included item difficulty (P), discrimination (D), distractor efficiency (DE), and test reliability. Qualitative item analysis included evaluation of the levels of cognitive skills and conformity of test items with item writing guidelines. RESULTS: The mean score was 55.04 ± 9.8 out of 81. Approximately 76.2% of SBA items assessed low cognitive skills, and 75% of EMQ items assessed higher-order cognitive skills. The proportions of items with an acceptable range of difficulty (0.3-0.7) on the SBA and EMQ were 23.80 and 16.67%, respectively. The proportions of SBA and EMQ with acceptable ranges of discrimination (> 0.2) were 83.3 and 75%, respectively. The reliability coefficient (KR20) of the test was 0.84. CONCLUSION: Our study will help medical teachers identify the quality of SBA and EMQ, which should be included to develop a validated question bank, as well as questions that need revision and remediation for subsequent use.

Assuntos

Avaliação Educacional , Estudantes de Medicina , Humanos , Criança , Estudos Transversais , Reprodutibilidade dos Testes , Docentes

4.

Twelve tips for designing and implementing peer-led assessment writing schemes in health professions education.

Harris, Benjamin H L; Harris, Samuel R L; Walsh, Jason L; Pereira, Christopher; Black, Susannah M; Allott, Vincent E S; Handa, Ashok; Thampy, Harish.

Med Teach ; 46(8): 1027-1034, 2024 08.

Artigo em Inglês | MEDLINE | ID: mdl-38277134

RESUMO

Peer-led assessment (PLA) has gained increasing prominence within health professions education as an effective means of engaging learners in the process of assessment writing and practice. Involving students in various stages of the assessment lifecycle, including item writing, quality assurance, and feedback, not only facilitates the creation of high-quality item banks with minimal faculty input but also promotes the development of students' assessment literacy and fosters their growth as teachers. The advantages of involving students in the generation of assessments are evident from a pedagogical standpoint, benefiting both students and faculty. However, faculty members may face uncertainty when it comes to implementing such approaches effectively. To address this concern, this paper presents twelve tips that offer guidance on important considerations for the successful implementation of peer-led assessment schemes in the context of health professions education.

Assuntos

Avaliação Educacional , Ocupações em Saúde , Grupo Associado , Redação , Humanos , Avaliação Educacional/métodos , Ocupações em Saúde/educação

5.

Development of a shared item repository for progress testing in veterinary education.

Schaper, Elisabeth; van Haeften, Theo; Wandall, Jakob; Iivanainen, Antti; Penell, Johanna; Press, Charles McLean; Lekeux, Pierre; Holm, Peter.

Front Vet Sci ; 10: 1296514, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-38026654

RESUMO

Introduction: Progress testing in education is an assessment principle for the measurement of students' progress over time, e.g., from start to graduation. Progress testing offers valid longitudinal formative measurement of the growth in the cognitive skills of the individual students within the subjects of the test as well as a tool for educators to monitor potential educational gaps and mismatches within the curriculum in relation to the basic veterinary learning outcomes. Methods: Six veterinary educational establishments in Denmark, Finland, Germany (Hannover), the Netherlands, Norway, and Sweden established in cooperation with the European Association of Establishments for Veterinary Education (EAEVE) a common veterinary item repository that can be used for progress testing in European Veterinary Education Establishments (VEEs), linear as well as computer adaptive, covering the EAEVE veterinary subjects and theoretical "Day One Competencies." First, a blueprint was created, suitable item formats were identified, and a quality assurance process for reviewing and approving items was established. The items were trialed to create a database of validated and calibrated items, and the responses were subsequently psychometrically analyzed according to Modern Test Theory. Results: In total, 1,836 items were submitted of which 1,342 were approved by the reviewers for trial testing. 1,119 students from all study years and all partners VEEs participated in one or more of six item trials, and 1,948 responses were collected. Responses were analyzed using Rasch Modeling (analysis of item-fit, differential item function, item-response characteristics). A total of 821 calibrated items of various difficulty levels matching the veterinary students' abilities and covering the veterinary knowledge domains have been banked. Discussion: The item bank is now ready to be used for formative progress testing in European veterinary education. This paper presents and discusses possible pitfalls, problems, and solutions when establishing an international veterinary progress test.

6.

A suggestive approach for assessing item quality, usability and validity of Automatic Item Generation.

Falcão, Filipe; Pereira, Daniela Marques; Gonçalves, Nuno; De Champlain, Andre; Costa, Patrício; Pêgo, José Miguel.

Adv Health Sci Educ Theory Pract ; 28(5): 1441-1465, 2023 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-37097483

RESUMO

Automatic Item Generation (AIG) refers to the process of using cognitive models to generate test items using computer modules. It is a new but rapidly evolving research area where cognitive and psychometric theory are combined into digital framework. However, assessment of the item quality, usability and validity of AIG relative to traditional item development methods lacks clarification. This paper takes a top-down strong theory approach to evaluate AIG in medical education. Two studies were conducted: Study I-participants with different levels of clinical knowledge and item writing experience developed medical test items both manually and through AIG. Both item types were compared in terms of quality and usability (efficiency and learnability); Study II-Automatically generated items were included in a summative exam in the content area of surgery. A psychometric analysis based on Item Response Theory inspected the validity and quality of the AIG-items. Items generated by AIG presented quality, evidences of validity and were adequate for testing student's knowledge. The time spent developing the contents for item generation (cognitive models) and the number of items generated did not vary considering the participants' item writing experience or clinical knowledge. AIG produces numerous high-quality items in a fast, economical and easy to learn process, even for inexperienced and without clinical training item writers. Medical schools may benefit from a substantial improvement in cost-efficiency in developing test items by using AIG. Item writing flaws can be significantly reduced thanks to the application of AIG's models, thus generating test items capable of accurately gauging students' knowledge.

Assuntos

Educação de Graduação em Medicina , Educação Médica , Humanos , Avaliação Educacional/métodos , Educação de Graduação em Medicina/métodos , Psicometria , Estudantes

7.

Writing Multiple Choice Questions-Has the Student Become the Master?

Pham, Hannah; Court-Kowalski, Stefan; Chan, Hong; Devitt, Peter.

Teach Learn Med ; 35(3): 356-367, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-35491868

RESUMO

CONSTRUCT: We compared the quality of clinician-authored and student-authored multiple choice questions (MCQs) using a formative, mock examination of clinical knowledge for medical students. BACKGROUND: Multiple choice questions are a popular format used in medical programs of assessment. A challenge for educators is creating high-quality items efficiently. For expediency's sake, a standard practice is for faculties to repeat items in examinations from year to year. This study aims to compare the quality of student-authored with clinician-authored items as a potential source of new items to include in faculty item banks. APPROACH: We invited Year IV and V medical students at the University of Adelaide to participate in a mock examination. The participants first completed an online instructional module on strategies for answering and writing MCQs, then submitted one original MCQ each for potential inclusion in the mock examination. Two 180-item mock examinations, one for each year level, were constructed. Each consisted of 90 student-authored items and 90 clinician-authored items. Participants were blinded to the author of each item. Each item was analyzed for item difficulty and discrimination, number of item-writing flaws (IWFs) and non-functioning distractors (NFDs), and cognitive skill level (using a modified version of Bloom's taxonomy). FINDINGS: Eighty-nine and 91 students completed the Year IV and V examinations, respectively. Student-authored items, compared with clinician-authored items, tended to be written at both a lower cognitive skill and difficulty level. They contained a significantly higher rate of IWFs (2-3.5 times) and NFDs (1.18 times). However, they were equally or better discriminating items than clinician-authored items. CONCLUSIONS: Students can author MCQ items with comparable discrimination to clinician-authored items, despite being inferior in other parameters. Student-authored items may be considered a potential source of material for faculty item banks; however, several barriers exist to their use in a summative setting. The overall quality of items remains suboptimal, regardless of author. This highlights the need for ongoing faculty training in item writing.

Assuntos

Educação de Graduação em Medicina , Estudantes de Medicina , Humanos , Avaliação Educacional , Docentes , Redação

8.

Three Sources of Validation Evidence Needed to Evaluate the Quality of Generated Test Items for Medical Licensure.

Gierl, Mark; Swygert, Kimberly; Matovinovic, Donna; Kulesher, Allison; Lai, Hollis.

Teach Learn Med ; : 1-11, 2022 Sep 14.

Artigo em Inglês | MEDLINE | ID: mdl-36106359

RESUMO

Issue: Automatic item generation is a method for creating medical items using an automated, technological solution. Automatic item generation is a contemporary method that can scale the item development process for production of large numbers of new items, support building of multiple forms, and allow rapid responses to changing medical content guidelines and threats to test security. The purpose of this analysis is to describe three sources of validation evidence that are required when producing high-quality medical licensure test items to ensure evidence for valid test score inferences, using the automatic item generation methodology for test development. Evidence: Generated items are used to make inferences about examinees' medical knowledge, skills, and competencies. We present three sources of evidence required to evaluate the quality of the generated items that is necessary to ensure the generated items measure the intended knowledge, skills, and competencies. The sources of evidence we present here relate to the item definition, the item development process, and the item quality review. An item is defined as an explicit set of properties that include the parameters, constraints, and instructions used to elicit a response from the examinee. This definition allows for a critique of the input used for automatic item generation. The item development process is evaluated using a validation table, whose purpose is to support verification of the assumptions related to model specification made by the subject-matter expert. This table provides a succinct summary of the content and constraints that were used to create new items. The item quality review is used to evaluate the statistical quality of the generated items, which often focuses on the difficulty and the discrimination of the correct and incorrect options. Implications: Automatic item generation is an increasingly popular item development method. The generated items from this process must be bolstered by evidence to ensure the items measure the intended knowledge, skills, and competencies. The purpose of this analysis is to describe these sources of evidence that can be used to evaluate the quality of the generated items. The important role of medical expertise in the development and evaluation of the generated items is highlighted as a crucial requirement for producing validation evidence.

9.

Great Question! The Art and Science of Crafting High-Quality Multiple-Choice Questions.

Catanzano, Tara; Jordan, Sheryl G; Lewis, Petra J.

J Am Coll Radiol ; 19(6): 687-692, 2022 06.

Artigo em Inglês | MEDLINE | ID: mdl-35288095

RESUMO

Assessment of medical knowledge is essential to determine the progress of an adult learner. Well-crafted multiple-choice questions are one proven method of testing a learner's understanding of a specific topic. The authors provide readers with rules that must be followed to create high-quality multiple-choice questions. Common question writing mistakes are also addressed to assist readers in improving their item-writing skills.

Assuntos

Avaliação Educacional , Redação

10.

A Natural-Language-Processing-Based Procedure for Generating Distractors for Multiple-Choice Questions.

Baldwin, Peter; Mee, Janet; Yaneva, Victoria; Paniagua, Miguel; D'Angelo, Jean; Swygert, Kimberly; Clauser, Brian E.

Eval Health Prof ; 45(4): 327-340, 2022 12.

Artigo em Inglês | MEDLINE | ID: mdl-34753326

RESUMO

One of the most challenging aspects of writing multiple-choice test questions is identifying plausible incorrect response options-i.e., distractors. To help with this task, a procedure is introduced that can mine existing item banks for potential distractors by considering the similarities between a new item's stem and answer and the stems and response options for items in the bank. This approach uses natural language processing to measure similarity and requires a substantial pool of items for constructing the generating model. The procedure is demonstrated with data from the United States Medical Licensing Examination (USMLE®). For about half the items in the study, at least one of the top three system-produced candidates matched a human-produced distractor exactly; and for about one quarter of the items, two of the top three candidates matched human-produced distractors. A study was conducted in which a sample of system-produced candidates were shown to 10 experienced item writers. Overall, participants thought about 81% of the candidates were on topic and 56% would help human item writers with the task of writing distractors.

Assuntos

Avaliação Educacional , Processamento de Linguagem Natural , Humanos , Estados Unidos , Avaliação Educacional/métodos

11.

Motivations of assessment item writers in medical programs: a qualitative study.

Karthikeyan, Sowmiya; O'Connor, Elizabeth; Hu, Wendy.

BMC Med Educ ; 20(1): 334, 2020 Sep 29.

Artigo em Inglês | MEDLINE | ID: mdl-32993579

RESUMO

BACKGROUND: The challenge of generating sufficient quality items for medical student examinations is a common experience for medical program coordinators. Faculty development strategies are commonly used, but there is little research on the factors influencing medical educators to engage in item writing. To assist with designing evidence-based strategies to improve engagement, we conducted an interview study informed by self-determination theory (SDT) to understand educators' motivations to write items. METHODS: We conducted 11 semi-structured interviews with educators in an established medical program. Interviews were transcribed verbatim and underwent open coding and thematic analysis. RESULTS: Major themes included; responsibility for item writing and item writer motivations, barriers and enablers; perceptions of the level of content expertise required to write items; and differences in the writing process between clinicians and non-clinicians. CONCLUSIONS: Our findings suggest that flexible item writing training, strengthening of peer review processes and institutional improvements such as improved communication of expectations, allocation of time for item writing and pairing new writers with experienced writers for mentorship could enhance writer engagement.

Assuntos

Motivação , Estudantes de Medicina , Humanos , Mentores , Pesquisa Qualitativa , Redação

12.

Knowledge of dental faculty in gulf cooperation council states of multiple-choice questions' item writing flaws.

Kowash, Mawlood; Alhobeira, Hazza; Hussein, Iyad; Al Halabi, Manal; Khan, Saif.

Med Educ Online ; 25(1): 1812224, 2020 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-32835640

RESUMO

Multiple-Choice Questions provide an objective cost/time effective assessment. Deviation from appropriate question writing structural guidelines will most probably result in commonly ignored multiple-choice questions writing flaws, influencing the ability of the assessment to measure students' cognitive levels thereby seriously affecting students' academic performance outcome measures. To gauge the knowledge of multiple-choice question items writing flaws in dental faculty working at colleges in Gulf Cooperation Council (GCC) countries. A cross-sectional short online Survey MonkeyTM multiple-choice questions-based questionnaire was disseminated to dental faculty working in GCC countries during the academic year 2018/2019. The questionnaire included five test incorrect (flawed) multiple-choice questions and one correct control question. The participants were asked to identify flawed multiple-choice question items from the known 14 items writing flaws. Out of a total of 460 faculty, 216 respondents completed the questionnaires, 132 (61.1%) were from Saudi Arabia, while numbers of participants from United Arab Emirates, Kuwait and Oman were 59 (27.3), 14 (6.5%) and 11 (5.1%) respectively. Majority of participants were male (n = 141, 65.9%) compared to 73 females (34.1%). Eighty percent of the participants possessed more than five years of teaching experience. Assistant professors constituted the majority (43.3%) of the academic positions participating in this study. The overall fail rate ranged from 76.3% to 98.1% and almost 2/3rds of the participants were unable to identify one or more of the flawed item(s). No significant association was observed between the demographics (age, region, academic position and specialty) and knowledge except that of participant's gender (p < 0.009). GCC dental faculty demonstrated below average knowledge of multiple-choice question items writing flaws. Training and workshops are needed to ensure substantial exposure to proper multiple-choice question items construction standards.

Assuntos

Avaliação Educacional , Docentes de Odontologia , Estudos Transversais , Avaliação Educacional/normas , Feminino , Humanos , Kuweit , Masculino , Arábia Saudita , Estudantes , Inquéritos e Questionários , Emirados Árabes Unidos , Redação/normas

13.

Evaluating the quality of multiple-choice questions in a NAPLEX preparation book.

Danh, Tina; Desiderio, Tamara; Herrmann, Victoria; Lyons, Heather M; Patrick, Frankie; Wantuch, Gwendolyn A; Dell, Kamila A.

Curr Pharm Teach Learn ; 12(10): 1188-1193, 2020 10.

Artigo em Inglês | MEDLINE | ID: mdl-32739055

RESUMO

INTRODUCTION: There is a plethora of preparatory books and guides available to help study for the North American Pharmacist Licensure Examination (NAPLEX). However, the quality of questions included has not been scrutinized. Our objective was to evaluate the quality of multiple-choice questions (MCQs) construction in a commonly used NAPLEX preparatory book. METHODS: Five students and two faculty members reviewed MCQs from the RxPrep 2018 edition course book. Item structure and utilization of case-based questions were evaluated using best practices for item construction. Frequency of item writing flaws (IWF) and utilization of cases for case-based questions was identified. RESULTS: A total of 298 questions were reviewed. Twenty-seven (9.1%) questions met all best practices for item construction. Flawed questions contained an average of 2.53 IWF per MCQ. The most commonly identified best practice violations were answer choices containing differing length and verb tense (21%) and question stems containing too little or too much information necessary to eliminate distractors (16.6%). Of the case-based questions, the majority (61.9%) did not require utilization of the provided case. CONCLUSIONS: This pilot analysis identified that a majority of MCQs in one NAPLEX preparatory source contained IWF. These results align with previous evaluations of test-banks in published books outside of pharmacy. Further evaluation of other preparatory materials, to expand on the findings from this pilot analysis, are needed to evaluate the pervasiveness of IWF in preparatory materials and the effect of flawed questions on utility of study materials.

Assuntos

Avaliação Educacional , Farmacêuticos , Livros , Humanos , Estudantes , Redação

14.

Nonfunctional distractor analysis: An indicator for quality of Multiple choice questions.

Sajjad, Madiha; Iltaf, Samina; Khan, Rehan Ahmed.

Pak J Med Sci ; 36(5): 982-986, 2020.

Artigo em Inglês | MEDLINE | ID: mdl-32704275

RESUMO

OBJECTIVES: To analyze the low to medium distractor efficiency items in a multiple-choice question (MCQ) paper for item writing flaws. METHODS: This qualitative study was conducted at Islamic International Medical College Rawalpindi, in October 2019. Archived item- analysis report from a midyear medium stakes MCQ paper of 2nd year MBBS class, was analyzed to determine the non-functional distractors (NFDs) and distractor efficiency (DE) of items, in a total of 181 MCQs. DE was categorized as low (3-4 NFDs), medium (1-2 NFDs) and high (0 NFD). Subsequently, qualitative document analysis of the MCQ paper whose item analysis report was assessed was conducted to investigate the item flaws in the low to medium DE items. The flaws identified were coded and grouped as, within option flaws, alignment flaws between options and stem/ lead-in and other flaws. RESULTS: Distractor efficiency was high in 69 items (38%), moderate in 75 items (42%) and low in 37 items (20%). The item-writing flaws identified in low to moderate DE items within distractors included, non-homogenous length (1.8%), non-homogenous content (8%) and repeat in distractor (1.7%). Alignment flaws between distractors and stem/ lead-in identified were linguistic cues (10%), logic cues (12.5%) and irrelevant distractors (16%). Flaws unrelated to distractors were low cognitive level items (40%) and unnecessarily complicated stems (11.6%). CONCLUSIONS: Analyzing the low to medium DE items for item writing flaws, provides valuable information about item writing errors which negatively impact the distractor efficiency.

15.

Barriers and facilitators to writing quality items for medical school assessments - a scoping review.

Karthikeyan, Sowmiya; O'Connor, Elizabeth; Hu, Wendy.

BMC Med Educ ; 19(1): 123, 2019 May 02.

Artigo em Inglês | MEDLINE | ID: mdl-31046744

RESUMO

BACKGROUND: Producing a sufficient quantity of quality items for use in medical school examinations is a continuing challenge in medical education. We conducted this scoping review to identify barriers and facilitators to writing good quality items and note gaps in the literature that are yet to be addressed. METHODS: We conducted searches of three databases (ERIC, Medline and Scopus) as well as Google Scholar for empirical studies on the barriers and facilitators for writing good quality items for medical school examinations. RESULTS: The initial search yielded 1997 articles. After applying pre-determined criteria, 13 articles were selected for the scoping review. Included studies could be broadly categorised into studies that attempted to directly investigate the barriers and facilitators and studies that provided implicit evidence. Key findings were that faculty development and quality assurance were facilitators of good quality item writing while barriers at both an individual and institutional level include motivation, time constraints and scheduling. CONCLUSIONS: Although studies identified factors that may improve or negatively impact on the quality of items written by faculty and clinicians, there was limited research investigating the barriers and facilitators for individual item writers. Investigating these challenges could lead to more targeted and effective interventions to improve both the quality and quantity of assessment items.

Assuntos

Estágio Clínico/normas , Avaliação Educacional/normas , Faculdades de Medicina , Redação/normas , Currículo , Educação de Graduação em Medicina , Humanos , Avaliação de Programas e Projetos de Saúde

16.

The Effect of Option Homogeneity in Multiple-Choice Items.

Applegate, Gregory M; Sutherland, Karen A; Becker, Kirk A; Luo, Xiao.

Appl Psychol Meas ; 43(2): 113-124, 2019 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-30792559

RESUMO

Previous research has found that option homogeneity in multiple-choice items affects item difficulty when items with homogeneous options are compared to the same items with heterogeneous options. This study conducted an empirical test of the effect of option homogeneity in multiple-choice items on a professional licensure examination to determine the predictability and magnitude of the change. Similarity of options to the key was determined by using subject matter experts and a natural language processing algorithm. Contrary to current research, data analysis revealed no consistent effect on item difficulty, discrimination, fit to the measurement model, or response time associated with the absence or presence of option homogeneity. While the results are negative, they call into question established guidelines in item development. A hypothesis is proposed to explain why this effect is found in some studies but not others.

17.

The optimal number of options for multiple-choice questions on high-stakes tests: application of a revised index for detecting nonfunctional distractors.

Raymond, Mark R; Stevens, Craig; Bucak, S Deniz.

Adv Health Sci Educ Theory Pract ; 24(1): 141-150, 2019 03.

Artigo em Inglês | MEDLINE | ID: mdl-30362027

RESUMO

Research suggests that the three-option format is optimal for multiple choice questions (MCQs). This conclusion is supported by numerous studies showing that most distractors (i.e., incorrect answers) are selected by so few examinees that they are essentially nonfunctional. However, nearly all studies have defined a distractor as nonfunctional if it is selected by fewer than 5% of examinees. A limitation of this definition is that the proportion of examinees available to choose a distractor depends on overall item difficulty. This is especially problematic for mastery tests, which consist of items that most examinees are expected to answer correctly. Based on the traditional definition of nonfunctional, a five-option MCQ answered correctly by greater than 90% of examinees will be constrained to have only one functional distractor. The primary purpose of the present study was to evaluate an index of nonfunctional that is sensitive to item difficulty. A secondary purpose was to extend previous research by studying distractor functionality within the context of professionally-developed credentialing tests. Data were analyzed for 840 MCQs consisting of five options per item. Results based on the traditional definition of nonfunctional were consistent with previous research indicating that most MCQs had one or two functional distractors. In contrast, the newly proposed index indicated that nearly half (47.3%) of all items had three or four functional distractors. Implications for item and test development are discussed.

Assuntos

Educação Médica/métodos , Educação Médica/normas , Avaliação Educacional/métodos , Avaliação Educacional/normas , Comportamento de Escolha , Humanos , Modelos Estatísticos , Psicometria

18.

Evaluation of MCQs from MOOCs for common item writing flaws.

Costello, Eamon; Holland, Jane C; Kirwan, Colette.

BMC Res Notes ; 11(1): 849, 2018 Dec 03.

Artigo em Inglês | MEDLINE | ID: mdl-30509321

RESUMO

OBJECTIVE: There is a dearth of research into the quality of assessments based on Multiple Choice Question (MCQ) items in Massive Open Online Courses (MOOCs). This dataset was generated to determine whether MCQ item writing flaws existed in a selection of MOOC assessments, and to evaluate their prevalence if so. Hence, researchers reviewed MCQs from a sample of MOOCs, using an evaluation protocol derived from the medical health education literature, which has an extensive evidence-base with regard to writing quality MCQ items. DATA DESCRIPTION: This dataset was collated from MCQ items in 18 MOOCs in the areas of medical health education, life sciences and computer science. Two researchers critically reviewed 204 questions using an evidence-based evaluation protocol. In the data presented, 50% of the MCQs (112) have one or more item writing flaw, while 28% of MCQs (57) contain two or more flaws. Thus, a majority of the MCQs in the dataset violate item-writing guidelines, which mirrors findings of previous research that examined rates of flaws in MCQs in traditional formal educational contexts.

Assuntos

Currículo , Avaliação Educacional , Internet , Inquéritos e Questionários , Redação

19.

Choosing medical assessments: Does the multiple-choice question make the grade?

Pham, Hannah; Trigg, Monique; Wu, Shaopeng; O'Connell, Alice; Harry, Christopher; Barnard, John; Devitt, Peter.

Educ Health (Abingdon) ; 31(2): 65-71, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-30531047

RESUMO

Background: The multiple-choice question (MCQ) has been shown to measure the same constructs as the short-answer question (SAQ), yet the use of the latter persists. The study aims to evaluate whether assessment using the MCQ alone provides the same outcomes as testing with the SAQ. Methods: A prospective study design was used. A total of 276 medical students participated in a mock examination consisting of forty MCQs paired to forty SAQs, each pair matched in cognitive skill level and content. Each SAQ was marked by three independent markers. The impact of item-writing flaws (IWFs) on examination outcome was also evaluated. Results: The intraclass correlation coefficient (ICC) was 0.75 for the year IV examinations and 0.68 for the year V examinations. MCQs were more prone to IWFs than SAQs, but the effect when present in the latter was greater. Removal of questions containing IWFs from the year V SAQ allowed 39% of students who would otherwise have failed to pass. Discussion: The MCQ can test higher order skills as effectively as the SAQ and can be used as a single format in written assessment provided quality items testing higher order cognitive skills are used. IWFs can have a critical role in determining pass/fail results.

Assuntos

Desempenho Acadêmico , Comportamento de Escolha , Avaliação Educacional/métodos , Estudantes de Medicina , Educação de Graduação em Medicina , Feminino , Humanos , Masculino

20.

Comparing Item Performance on Three- Versus Four-Option Multiple Choice Questions in a Veterinary Toxicology Course.

Royal, Kenneth; Dorman, David.

Vet Sci ; 5(2)2018 Jun 09.

Artigo em Inglês | MEDLINE | ID: mdl-29890727

RESUMO

BACKGROUND: The number of answer options is an important element of multiple-choice questions (MCQs). Many MCQs contain four or more options despite the limited literature suggesting that there is little to no benefit beyond three options. The purpose of this study was to evaluate item performance on 3-option versus 4-option MCQs used in a core curriculum course in veterinary toxicology at a large veterinary medical school in the United States. METHODS: A quasi-experimental, crossover design was used in which students in each class were randomly assigned to take one of two versions (A or B) of two major exams. RESULTS: Both the 3-option and 4-option MCQs resulted in similar psychometric properties. CONCLUSION: The findings of our study support earlier research in other medical disciplines and settings that likewise concluded there was no significant change in the psychometric properties of three option MCQs when compared to the traditional MCQs with four or more options.

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA