Pesquisa | Portal Regional da BVS

1.

Utility of RAND/UCLA appropriateness method in validating multiple-choice questions on ECG.

Kaga, Tomohiro; Inaba, Shinji; Shikano, Yukari; Watanabe, Yasuyuki; Fujisawa, Tomoki; Akazawa, Yusuke; Ohshita, Muneaki; Kawakami, Hiroshi; Higashi, Haruhiko; Aono, Jun; Nagai, Takayuki; Islam, Mohammad Zahidul; Wannous, Muhammad; Sakata, Masatsugu; Yamamoto, Kazumichi; Furukawa, Toshi A; Yamaguchi, Osamu.

BMC Med Educ ; 24(1): 448, 2024 Apr 24.

Artigo em Inglês | MEDLINE | ID: mdl-38658906

RESUMO

OBJECTIVES: This study aimed to investigate the utility of the RAND/UCLA appropriateness method (RAM) in validating expert consensus-based multiple-choice questions (MCQs) on electrocardiogram (ECG). METHODS: According to the RAM user's manual, nine panelists comprising various experts who routinely handle ECGs were asked to reach a consensus in three phases: a preparatory phase (round 0), an online test phase (round 1), and a face-to-face expert panel meeting (round 2). In round 0, the objectives and future timeline of the study were elucidated to the nine expert panelists with a summary of relevant literature. In round 1, 100 ECG questions prepared by two skilled cardiologists were answered, and the success rate was calculated by dividing the number of correct answers by 9. Furthermore, the questions were stratified into "Appropriate," "Discussion," or "Inappropriate" according to the median score and interquartile range (IQR) of appropriateness rating by nine panelists. In round 2, the validity of the 100 ECG questions was discussed in an expert panel meeting according to the results of round 1 and finally reassessed as "Appropriate," "Candidate," "Revision," and "Defer." RESULTS: In round 1 results, the average success rate of the nine experts was 0.89. Using the median score and IQR, 54 questions were classified as " Discussion." In the expert panel meeting in round 2, 23% of the original 100 questions was ultimately deemed inappropriate, although they had been prepared by two skilled cardiologists. Most of the 46 questions categorized as "Appropriate" using the median score and IQR in round 1 were considered "Appropriate" even after round 2 (44/46, 95.7%). CONCLUSIONS: The use of the median score and IQR allowed for a more objective determination of question validity. The RAM may help select appropriate questions, contributing to the preparation of higher-quality tests.

Assuntos

Eletrocardiografia , Humanos , Consenso , Reprodutibilidade dos Testes , Competência Clínica/normas , Avaliação Educacional/métodos , Cardiologia/normas

2.

Large language models for generating medical examinations: systematic review.

Artsi, Yaara; Sorin, Vera; Konen, Eli; Glicksberg, Benjamin S; Nadkarni, Girish; Klang, Eyal.

BMC Med Educ ; 24(1): 354, 2024 Mar 29.

Artigo em Inglês | MEDLINE | ID: mdl-38553693

RESUMO

BACKGROUND: Writing multiple choice questions (MCQs) for the purpose of medical exams is challenging. It requires extensive medical knowledge, time and effort from medical educators. This systematic review focuses on the application of large language models (LLMs) in generating medical MCQs. METHODS: The authors searched for studies published up to November 2023. Search terms focused on LLMs generated MCQs for medical examinations. Non-English, out of year range and studies not focusing on AI generated multiple-choice questions were excluded. MEDLINE was used as a search database. Risk of bias was evaluated using a tailored QUADAS-2 tool. RESULTS: Overall, eight studies published between April 2023 and October 2023 were included. Six studies used Chat-GPT 3.5, while two employed GPT 4. Five studies showed that LLMs can produce competent questions valid for medical exams. Three studies used LLMs to write medical questions but did not evaluate the validity of the questions. One study conducted a comparative analysis of different models. One other study compared LLM-generated questions with those written by humans. All studies presented faulty questions that were deemed inappropriate for medical exams. Some questions required additional modifications in order to qualify. CONCLUSIONS: LLMs can be used to write MCQs for medical examinations. However, their limitations cannot be ignored. Further study in this field is essential and more conclusive evidence is needed. Until then, LLMs may serve as a supplementary tool for writing medical examinations. 2 studies were at high risk of bias. The study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Assuntos

Conhecimento , Idioma , Humanos , Bases de Dados Factuais , Redação

3.

Multiple choice as formative assessment in dental education.

Haugen, Håvard J; de Lange, Thomas.

Eur J Dent Educ ; 2024 Mar 08.

Artigo em Inglês | MEDLINE | ID: mdl-38456591

RESUMO

INTRODUCTION: The effectiveness of multiple-choice questions (MCQs) in dental education is pivotal to student performance and knowledge advancement. However, their optimal implementation requires exploration to enhance the benefits. MATERIALS AND METHODS: An educational tool incorporating MCQs was administered from the 5th to the 10th semester in a dental curriculum. The students filled out a questionnaire after the MCQ, which was linked to the learning management system. Four cohorts of four semesters generated 2300 data points analysed by Spearmen correlation and mixed model regression analysis. RESULTS: Demonstrated a significant correlation between early exam preparation and improved student performance. Independent study hours and lecture attendance emerged as significant predictors, accounting for approximately 10.27% of the variance in student performance on MCQs. While the number of MCQs taken showed an inverse relationship with study hours, the perceived clarity of these questions positively correlated with academic achievement. CONCLUSION: MCQs have proven effective in enhancing student learning and knowledge within the discipline. Our analysis underscores the important role of independent study and consistent lecture attendance in positively influencing MCQ scores. The study provides valuable insights into using MCQs as a practical tool for dental student learning. Moreover, the clarity of assessment tools, such as MCQs, remains pivotal in influencing student outcomes. This study underscores the multifaceted nature of learning experiences in dental education and the importance of bridging the gap between student expectations and actual performance.

4.

Case-based MCQ generator: A custom ChatGPT based on published prompts in the literature for automatic item generation.

Kiyak, Yavuz Selim; Kononowicz, Andrzej A.

Med Teach ; : 1-3, 2024 Feb 10.

Artigo em Inglês | MEDLINE | ID: mdl-38340312

RESUMO

WHAT IS THE EDUCATIONAL CHALLENGE?: A fundamental challenge in medical education is creating high-quality, clinically relevant multiple-choice questions (MCQs). ChatGPT-based automatic item generation (AIG) methods need well-designed prompts. However, the use of these prompts is hindered by the time-consuming process of copying and pasting, a lack of know-how among medical teachers, and the generalist nature of standard ChatGPT, which often lacks the medical context. WHAT ARE THE PROPOSED SOLUTIONS?: The Case-based MCQ Generator, a custom GPT, addresses these challenges. It has been trained by using GPT Builder, which is a platform designed by OpenAI for customizing ChatGPT to meet specific needs, in order to allow users to generate case-based MCQs. By using this free tool for those who have ChatGPT Plus subscription, health professions educators can easily select a prompt, input a learning objective or item-specific test point, and generate clinically relevant questions. WHAT ARE THE POTENTIAL BENEFITS TO A WIDER GLOBAL AUDIENCE?: It enhances the efficiency of MCQ generation and ensures the generation of contextually relevant questions, surpassing the capabilities of standard ChatGPT. It streamlines the MCQ creation process by integrating prompts published in medical education literature, eliminating the need for manual prompt input. WHAT ARE THE NEXT STEPS?: Future development aims at sustainability and addressing ethical and accessibility issues. It requires regular updates, integration of new prompts from emerging health professions education literature, and a supportive digital ecosystem around the tool. Accessibility, especially for educators in low-resource countries, is vital, demanding alternative access models to overcome financial barriers.

5.

ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam.

Kiyak, Yavuz Selim; Coskun, Özlem; Budakoglu, Isil Irem; Uluoglu, Canan.

Eur J Clin Pharmacol ; 80(5): 729-735, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38353690

RESUMO

PURPOSE: Artificial intelligence, specifically large language models such as ChatGPT, offers valuable potential benefits in question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions using ChatGPT in terms of item difficulty and discrimination levels. METHODS: This study involved 99 fourth-year medical students who participated in a rational pharmacotherapy clerkship carried out based-on the WHO 6-Step Model. In response to a prompt that we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. Following an expert panel, two of these multiple-choice questions were incorporated into a medical school exam without making any changes in the questions. Based on the administration of the test, we evaluated their psychometric properties, including item difficulty, item discrimination (point-biserial correlation), and functionality of the options. RESULTS: Both questions exhibited acceptable levels of point-biserial correlation, which is higher than the threshold of 0.30 (0.41 and 0.39). However, one question had three non-functional options (options chosen by fewer than 5% of the exam participants) while the other question had none. CONCLUSIONS: The findings showed that the questions can effectively differentiate between students who perform at high and low levels, which also point out the potential of ChatGPT as an artificial intelligence tool in test development. Future studies may use the prompt to generate items in order for enhancing the external validity of the results by gathering data from diverse institutions and settings.

Assuntos

Hipertensão , Estudantes de Medicina , Humanos , Inteligência Artificial , Faculdades de Medicina

6.

An assessment of preclinical removable prosthodontics based on multiple-choice questions: Stakeholders' perceptions.

Shammas, Mohammed; Nagda, Suhasini; Shah, Chinmay; Baxi, Gaurang; Gadde, Praveen; Sachdeva, Shabina; Gupta, Deeksha; Wali, Othman; Dhall, Rupinder Singh; Gajdhar, Shaiq.

J Dent Educ ; 88(5): 533-543, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38314889

RESUMO

PURPOSE: Item analysis of multiple-choice questions (MCQs) is an essential tool for identifying items that can be stored, revised, or discarded to build a quality MCQ bank. This study analyzed MCQs based on item analysis to develop a pool of valid and reliable items and investigate stakeholders' perceptions regarding MCQs in a written summative assessment (WSA) based on this item analysis. METHODS: In this descriptive study, 55 questions each from 2016 to 2019 of WSA in preclinical removable prosthodontics for fourth-year undergraduate dentistry students were analyzed for item analysis. Items were categorized according to their difficulty index (DIF I) and discrimination index (DI). Students (2021-2022) were assessed using this question bank. Students' perceptions of and feedback from faculty members concerning this assessment were collected using a questionnaire with a five-point Likert scale. RESULTS: Of 220 items when both indices (DIF I and DI) were combined, 144 (65.5%) were retained in the question bank, 66 (30%) required revision before incorporation into the question bank, and only 10 (4.5%) were discarded. The mean DIF I and DI values were 69% (standard deviation [Std.Dev] = 19) and 0.22 (Std.Dev = 0.16), respectively, for 220 MCQs. The mean scores from the questionnaire for students and feedback from faculty members ranged from 3.50 to 4.04 and from 4 to 5, respectively, indicating that stakeholders tended to agree and strongly agree, respectively, with the proposed statements. CONCLUSION: This study assisted the prosthodontics department in creating a set of prevalidated questions with known difficulty and discrimination capacity.

Assuntos

Educação em Odontologia , Avaliação Educacional , Prostodontia , Prostodontia/educação , Humanos , Educação em Odontologia/métodos , Avaliação Educacional/métodos , Estudantes de Odontologia/psicologia , Inquéritos e Questionários , Participação dos Interessados

7.

Ten tips for effective use and quality assurance of multiple-choice questions in knowledge-based assessments.

Ali, Kamran; Zahra, Daniel.

Eur J Dent Educ ; 28(2): 655-662, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38282273

RESUMO

Multiple-choice questions (MCQs) are the most popular type of items used in knowledge-based assessments in undergraduate and postgraduate healthcare education. MCQs allow assessment of candidates' knowledge on a broad range of knowledge-based learning outcomes in a single assessment. Single-best-answer (SBA) MCQs are the most versatile and commonly used format. Although writing MCQs may seem straight-forward, producing decent-quality MCQs is challenging and warrants a range of quality checks before an item is deemed suitable for inclusion in an assessment. Like all assessments, MCQ-based examinations must be aligned with the learning outcomes and learning opportunities provided to the students. This paper provides evidence-based guidance on the effective use of MCQs in student assessments, not only to make decisions regarding student progression but also to build an academic environment that promotes assessment as a driver for learning. Practical tips are provided to the readers to produce authentic MCQ items, along with appropriate pre- and post-assessment reviews, the use of standard setting and psychometric evaluation of assessments based on MCQs. Institutions need to develop an academic culture that fosters transparency, openness, equality and inclusivity. In line with contemporary educational principles, teamwork amongst teaching faculty, administrators and students is essential to establish effective learning and assessment practices.

Assuntos

Educação em Odontologia , Avaliação Educacional , Humanos , Estudantes , Aprendizagem , Redação

8.

Twelve tips for designing and implementing peer-led assessment writing schemes in health professions education.

Harris, Benjamin H L; Harris, Samuel R L; Walsh, Jason L; Pereira, Christopher; Black, Susannah M; Allott, Vincent E S; Handa, Ashok; Thampy, Harish.

Med Teach ; : 1-8, 2024 Jan 26.

Artigo em Inglês | MEDLINE | ID: mdl-38277134

RESUMO

Peer-led assessment (PLA) has gained increasing prominence within health professions education as an effective means of engaging learners in the process of assessment writing and practice. Involving students in various stages of the assessment lifecycle, including item writing, quality assurance, and feedback, not only facilitates the creation of high-quality item banks with minimal faculty input but also promotes the development of students' assessment literacy and fosters their growth as teachers. The advantages of involving students in the generation of assessments are evident from a pedagogical standpoint, benefiting both students and faculty. However, faculty members may face uncertainty when it comes to implementing such approaches effectively. To address this concern, this paper presents twelve tips that offer guidance on important considerations for the successful implementation of peer-led assessment schemes in the context of health professions education.

9.

Effect of changing multiple choice questions from "all of the above" to "select all that apply".

Dell, Kamila A; DeVries, Davina M.

Curr Pharm Teach Learn ; 16(3): 174-177, 2024 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-38218657

RESUMO

INTRODUCTION: The purpose of this study was to describe the effect of converting multiple choice questions (MCQs) that include an "all of the above" (AOTA) answer option to a "select all that apply" (SATA) question type on question performance. METHODS: A summative assessment at the end of the first professional pharmacy year was comprised of approximately 50 multiple choice questions covering material from all courses taught. Eight questions contained AOTA answer options and were converted to SATA items in the subsequent year by eliminating the AOTA option and including the words "select all that apply" in the stem. Majority of the other questions included on the exam remained the same between the two years. Item difficulty, item discrimination, point biserial, and distractor efficiency were used to compare the MCQs on exams in the two years. RESULTS: The AOTA questions were significantly easier and less discriminating than the SATA items. The performance of the remaining questions on the exam did not differ between the years. The distractor efficiency increased significantly when the questions were converted to SATA items. CONCLUSIONS: MCQs with AOTA answer options are discouraged due to poor item construction resulting in poor discrimination between high and low performing students. The AOTA questions are easily converted to the SATA format. The result of this conversion is a more difficult and more discriminating question with all answer options chosen, which prevents students from easily guessing the correct answer.

Assuntos

Avaliação Educacional , Estudantes de Medicina , Succinimidas , Sulfetos , Humanos , Avaliação Educacional/métodos

10.

The effect of the attitude towards risk/ambiguity on examination grades: cross-sectional study in a Portuguese medical school.

Leite-Mendes, Filipe; Delgado, Luis; Ferreira, Amelia; Severo, Milton.

Adv Health Sci Educ Theory Pract ; 2024 Jan 15.

Artigo em Inglês | MEDLINE | ID: mdl-38224412

RESUMO

Given the high prevalence of multiple-choice examinations with formula scoring in medical training, several studies have tried to identify other factors in addition to the degree of knowledge of students which influence their response patterns. This study aims to measure the effect of students' attitude towards risk and ambiguity on their number of correct, wrong, and blank answers. In October 2018, 233 3rd year medical students from the Faculty of Medicine of the University of Porto, in Porto, Portugal, completed a questionnaire which assessed the student's attitudes towards risk and ambiguity, and aversion to ambiguity in medicine. Simple and multiple regression models and the respective regression coefficients were used to measure the association between the students' attitudes, and their answers in two examinations that they had taken in June 2018. Having an intermediate level of ambiguity aversion in medicine (as opposed to a very high or low level) was associated with a significant increase in the number of correct answers and decrease in the number of blank answers in the first examination. In the second examination, high levels of ambiguity aversion in medicine were associated with a decrease in the number of wrong answers. Attitude towards risk, tolerance for ambiguity, and gender did not show significant association with the number of correct, wrong, and blank answers for either examination. Students' ambiguity aversion in medicine is correlated with their performance in multiple-choice examinations with negative marking. Therefore, it is suggested the planning and implementation of counselling sessions with medical students regarding the possible impact of ambiguity aversion on their performance in multiple-choice questions with negative marking.

11.

The Impact of Script Concordance Testing on Clinical Decision-Making in Paramedic Education.

Naylor, Katarzyna; Hislop, Jane; Torres, Kamil; Mani, Zakaria A; Goniewicz, Krzysztof.

Healthcare (Basel) ; 12(2)2024 Jan 22.

Artigo em Inglês | MEDLINE | ID: mdl-38275562

RESUMO

This study investigates the effectiveness of the Script Concordance Test (SCT) in enhancing clinical reasoning skills within paramedic education. Focusing on the Medical University of Lublin, we evaluated the SCT's application across two cohorts of paramedic students, aiming to understand its potential to improve decision-making skills in emergency scenarios. Our approach, informed by Van der Vleuten's assessment framework, revealed that while the SCT's correlation with traditional methods like multiple-choice questions (MCQs) was limited, its formative nature significantly contributed to improved performance in summative assessments. These findings suggest that the SCT can be an effective tool in paramedic training, particularly in strengthening cognitive abilities critical for emergency responses. The study underscores the importance of incorporating innovative assessment tools like SCTs in paramedic curricula, not only to enhance clinical reasoning but also to prepare students for effective emergency responses. Our research contributes to the ongoing efforts in refining paramedic education and highlights the need for versatile assessment strategies in preparing future healthcare professionals for diverse clinical challenges.

12.

Could ChatGPT-4 pass an anaesthesiology board examination? Follow-up assessment of a comprehensive set of board examination practice questions.

Shay, Denys; Kumar, Bhawesh; Redaelli, Simone; von Wedel, Dario; Liu, Manqing; Dershwitz, Mark; Schaefer, Maximilian S; Beam, Andrew.

Br J Anaesth ; 132(1): 172-174, 2024 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-37996275

Assuntos

Anestesiologia , Humanos , Seguimentos

13.

Embedding retrieval practice in undergraduate biochemistry teaching using PeerWise.

Higgins, Tanya; Dudley, Ed; Bodger, Owen; Newton, Phil; Francis, Nigel.

Biochem Mol Biol Educ ; 52(2): 156-164, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-37929789

RESUMO

Retrieval practice is an evidence-based approach to teaching; here, we evaluate the use of PeerWise for embedding retrieval practice into summative assessment. PeerWise allows anonymous authoring, sharing, answering, rating, and feedback on peer-authored multiple choice questions. PeerWise was embedded as a summative assessment in a large first-year introductory biochemistry module. Engagement with five aspects of the tool was evaluated against student performance in coursework, exam, and overall module outcome. Results indicated a weak-to-moderate positive but significant correlation between engagement with PeerWise and assessment performance. Student feedback showed PeerWise had a polarizing effect; the majority recognized the benefits as a learning and revision tool, but a minority strongly disliked it, complaining of a lack of academic moderation and irrelevant questions unrelated to the module. PeerWise can be considered a helpful learning tool for some students and a means of embedding retrieval practice into summative assessment.

Assuntos

Avaliação Educacional , Estudantes , Humanos , Avaliação Educacional/métodos , Aprendizagem , Bioquímica , Retroalimentação , Ensino

14.

Assessment of the capacity of ChatGPT as a self-learning tool in medical pharmacology: a study using MCQs.

Choi, Woong.

BMC Med Educ ; 23(1): 864, 2023 Nov 13.

Artigo em Inglês | MEDLINE | ID: mdl-37957666

RESUMO

BACKGROUND: ChatGPT is a large language model developed by OpenAI that exhibits a remarkable ability to simulate human speech. This investigation attempts to evaluate the potential of ChatGPT as a standalone self-learning tool, with specific attention on its efficacy in answering multiple-choice questions (MCQs) and providing credible rationale for its responses. METHODS: The study used 78 test items from the Korean Comprehensive Basic Medical Sciences Examination (K-CBMSE) for years 2019 to 2021. 78 test items translated from Korean to English with four lead-in prompts per item resulted in a total of 312 MCQs. The MCQs were submitted to ChatGPT and the responses were analyzed for correctness, consistency, and relevance. RESULTS: ChatGPT responded with an overall accuracy of 76.0%. Compared to its performance on recall and interpretation questions, the model performed poorly on problem-solving questions. ChatGPT offered correct rationales for 77.8% (182/234) of the responses, with errors primarily arising from faulty information and flawed reasoning. In terms of references, ChatGPT provided incorrect citations for 69.7% (191/274) of the responses. While the veracity of reference paragraphs could not be ascertained, 77.0% (47/61) were deemed pertinent and accurate with respect to the answer key. CONCLUSION: The current version of ChatGPT has limitations in accurately answering MCQs and generating correct and relevant rationales, particularly when it comes to referencing. To avoid possible threats such as spreading inaccuracies and decreasing critical thinking skills, ChatGPT should be used with supervision.

Assuntos

Inteligência Artificial , Aprendizagem , Farmacologia Clínica , Resolução de Problemas , Humanos , Idioma , Rememoração Mental , Farmacologia Clínica/educação

15.

Evaluating ChatGPT-3.5 and Claude-2 in Answering and Explaining Conceptual Medical Physiology Multiple-Choice Questions.

Agarwal, Mayank; Goswami, Ayan; Sharma, Priyanka.

Cureus ; 15(9): e46222, 2023 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-37908959

RESUMO

Background Generative artificial intelligence (AI) systems such as ChatGPT-3.5 and Claude-2 may assist in explaining complex medical science topics. A few studies have shown that AI can solve complicated physiology problems that require critical thinking and analysis. However, further studies are required to validate the effectiveness of AI in answering conceptual multiple-choice questions (MCQs) in human physiology. Objective This study aimed to evaluate and compare the proficiency of ChatGPT-3.5 and Claude-2 in answering and explaining a curated set of MCQs in medical physiology. Methods In this cross-sectional study, a set of 55 MCQs from 10 competencies of medical physiology was purposefully constructed that required comprehension, problem-solving, and analytical skills to solve them. The MCQs and a structured prompt for response generation were presented to ChatGPT-3.5 and Claude-2. The explanations provided by both AI systems were documented in an Excel spreadsheet. All three authors subjected these explanations to a rating process using a scale of 0 to 3. A rating of 0 was assigned to an incorrect, 1 to a partially correct, 2 to a correct explanation with some aspects missing, and 3 to a perfectly correct explanation. Both AI models were evaluated for their ability to choose the correct answer (option) and provide clear and comprehensive explanations of the MCQs. The Mann-Whitney U test was used to compare AI responses. The Fleiss multi-rater kappa (κ) was used to determine the score agreement among the three raters. The statistical significance level was decided at P ≤ 0.05. Results Claude-2 answered 40 MCQs correctly, which was significantly higher than the 26 correct responses from ChatGPT-3.5. The rating distribution for the explanations generated by Claude-2 was significantly higher than that of ChatGPT-3.5. The κ values were 0.804 and 0.818 for Claude-2 and ChatGPT-3.5, respectively. Conclusion In terms of answering and elucidating conceptual MCQs in medical physiology, Claude-2 surpassed ChatGPT-3.5. However, accessing Claude-2 from India requires the use of a virtual private network, which may raise security concerns.

16.

Advantages and pitfalls in utilizing artificial intelligence for crafting medical examinations: a medical education pilot study with GPT-4.

E, Klang; S, Portugez; R, Gross; R, Kassif Lerner; A, Brenner; M, Gilboa; T, Ortal; S, Ron; V, Robinzon; H, Meiri; G, Segal.

BMC Med Educ ; 23(1): 772, 2023 Oct 17.

Artigo em Inglês | MEDLINE | ID: mdl-37848913

RESUMO

BACKGROUND: The task of writing multiple choice question examinations for medical students is complex, timely and requires significant efforts from clinical staff and faculty. Applying artificial intelligence algorithms in this field of medical education may be advisable. METHODS: During March to April 2023, we utilized GPT-4, an OpenAI application, to write a 210 multi choice questions-MCQs examination based on an existing exam template and thoroughly investigated the output by specialist physicians who were blinded to the source of the questions. Algorithm mistakes and inaccuracies, as identified by specialists were classified as stemming from age, gender or geographical insensitivities. RESULTS: After inputting a detailed prompt, GPT-4 produced the test rapidly and effectively. Only 1 question (0.5%) was defined as false; 15% of questions necessitated revisions. Errors in the AI-generated questions included: the use of outdated or inaccurate terminology, age-sensitive inaccuracies, gender-sensitive inaccuracies, and geographically sensitive inaccuracies. Questions that were disqualified due to flawed methodology basis included elimination-based questions and questions that did not include elements of integrating knowledge with clinical reasoning. CONCLUSION: GPT-4 can be used as an adjunctive tool in creating multi-choice question medical examinations yet rigorous inspection by specialist physicians remains pivotal.

Assuntos

Educação Médica , Avaliação Educacional , Humanos , Avaliação Educacional/métodos , Projetos Piloto , Inteligência Artificial , Redação

17.

Using Automatic Item Generation to Create Multiple-Choice Questions for Pharmacy Assessment.

Leslie, Tara; Gierl, Mark J.

Am J Pharm Educ ; 87(10): 100081, 2023 10.

Artigo em Inglês | MEDLINE | ID: mdl-37852684

RESUMO

OBJECTIVE: Automatic item generation (AIG) is a new area of assessment research where a set of multiple-choice questions (MCQs) are created using models and computer technology. Although successfully demonstrated in medicine and dentistry, AIG has not been implemented in pharmacy. The objective was to implement AIG to create a set of MCQs appropriate for inclusion in a summative, high-stakes, pharmacy examination. METHODS: A 3-step process, well evidenced in AIG research, was employed to create the pharmacy MCQs. The first step was developing a cognitive model based on content within the examination blueprint. Second, an item model was developed based on the cognitive model. A process of systematic distractor generation was also incorporated to optimize distractor plausibility. Third, we used computer technology to assemble a set of test items based on the cognitive and item models. A sample of generated items was assessed for quality against Gierl and Lai's 8 guidelines of item quality. RESULTS: More than 15,000 MCQs were generated to measure knowledge and skill of patient assessment and treatment of nausea and/or vomiting within the scope of clinical pharmacy. A sample of generated items satisfies the requirements of content-related validity and quality after substantive review. CONCLUSION: This research demonstrates the AIG process is a viable strategy for creating a test item bank to provide MCQs appropriate for inclusion in a pharmacy licensing examination.

Assuntos

Educação de Graduação em Medicina , Educação em Farmácia , Farmácia , Humanos , Avaliação Educacional , Computadores

18.

Automated Item Generation: impact of item variants on performance and standard setting.

Westacott, R; Badger, K; Kluth, D; Gurnell, M; Reed, M W R; Sam, A H.

BMC Med Educ ; 23(1): 659, 2023 Sep 11.

Artigo em Inglês | MEDLINE | ID: mdl-37697275

RESUMO

BACKGROUND: Automated Item Generation (AIG) uses computer software to create multiple items from a single question model. There is currently a lack of data looking at whether item variants to a single question result in differences in student performance or human-derived standard setting. The purpose of this study was to use 50 Multiple Choice Questions (MCQs) as models to create four distinct tests which would be standard set and given to final year UK medical students, and then to compare the performance and standard setting data for each. METHODS: Pre-existing questions from the UK Medical Schools Council (MSC) Assessment Alliance item bank, created using traditional item writing techniques, were used to generate four 'isomorphic' 50-item MCQ tests using AIG software. Isomorphic questions use the same question template with minor alterations to test the same learning outcome. All UK medical schools were invited to deliver one of the four papers as an online formative assessment for their final year students. Each test was standard set using a modified Angoff method. Thematic analysis was conducted for item variants with high and low levels of variance in facility (for student performance) and average scores (for standard setting). RESULTS: Two thousand two hundred eighteen students from 12 UK medical schools participated, with each school using one of the four papers. The average facility of the four papers ranged from 0.55-0.61, and the cut score ranged from 0.58-0.61. Twenty item models had a facility difference > 0.15 and 10 item models had a difference in standard setting of > 0.1. Variation in parameters that could alter clinical reasoning strategies had the greatest impact on item facility. CONCLUSIONS: Item facility varied to a greater extent than the standard set. This difference may relate to variants causing greater disruption of clinical reasoning strategies in novice learners compared to experts, but is confounded by the possibility that the performance differences may be explained at school level and therefore warrants further study.

Assuntos

Raciocínio Clínico , Estudantes de Medicina , Humanos , Aprendizagem , Faculdades de Medicina , Software

19.

Exploring the Potential and Limitations of Chat Generative Pre-trained Transformer (ChatGPT) in Generating Board-Style Dermatology Questions: A Qualitative Analysis.

Ayub, Ibraheim; Hamann, Dathan; Hamann, Carsten R; Davis, Matthew J.

Cureus ; 15(8): e43717, 2023 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-37638266

RESUMO

This article investigates the limitations of Chat Generative Pre-trained Transformer (ChatGPT), a language model developed by OpenAI, as a study tool in dermatology. The study utilized ChatPDF, an application that integrates PDF files with ChatGPT, to generate American Board of Dermatology Applied Exam (ABD-AE)-style questions from continuing medical education articles from the Journal of the American Board of Dermatology. A qualitative analysis of the questions was conducted by two board-certified dermatologists, assessing accuracy, complexity, and clarity. Out of 40 questions generated, only 16 (40%) were deemed accurate and appropriate for ABD-AE study preparation. The remaining questions exhibited limitations, including low complexity, lack of clarity, and inaccuracies. The findings highlight the challenges faced by ChatGPT in understanding the domain-specific knowledge required in dermatology. Moreover, the model's inability to comprehend the context and generate high-quality distractor options, as well as the absence of image generation capabilities, further hinders its usefulness. The study emphasizes that while ChatGPT may aid in generating simple questions, it cannot replace the expertise of dermatologists and medical educators in developing high-quality, board-style questions that effectively evaluate candidates' knowledge and reasoning abilities.

20.

The pearls and pitfalls of setting high-quality multiple choice questions for clinical medicine.

Naidoo, Mergan.

S Afr Fam Pract (2004) ; 65(1): e1-e4, 2023 05 29.

Artigo em Inglês | MEDLINE | ID: mdl-37265132

RESUMO

Multiple choice question (MCQ) examinations have become extremely popular for testing applied knowledge in the basic and clinical sciences. When setting MCQ examinations, assessors need to understand the measures that improve validity and reliability so that the examination accurately reflects the candidate's ability. This continuing medical education unit will cover the essentials of blueprinting an exam, constructing high-quality MCQs and post hoc vetting of the exam. It is hoped that academics involved in assessments use the content provided to improve their skills in setting high-quality MCQs.

Assuntos

Medicina Clínica , Avaliação Educacional , Exame Físico , Reprodutibilidade dos Testes , Humanos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA