Pesquisa | Secretaria de Estado da Saúde

1.

ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam.

Kiyak, Yavuz Selim; Coskun, Özlem; Budakoglu, Isil Irem; Uluoglu, Canan.

Eur J Clin Pharmacol ; 80(5): 729-735, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38353690

RESUMO

PURPOSE: Artificial intelligence, specifically large language models such as ChatGPT, offers valuable potential benefits in question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions using ChatGPT in terms of item difficulty and discrimination levels. METHODS: This study involved 99 fourth-year medical students who participated in a rational pharmacotherapy clerkship carried out based-on the WHO 6-Step Model. In response to a prompt that we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. Following an expert panel, two of these multiple-choice questions were incorporated into a medical school exam without making any changes in the questions. Based on the administration of the test, we evaluated their psychometric properties, including item difficulty, item discrimination (point-biserial correlation), and functionality of the options. RESULTS: Both questions exhibited acceptable levels of point-biserial correlation, which is higher than the threshold of 0.30 (0.41 and 0.39). However, one question had three non-functional options (options chosen by fewer than 5% of the exam participants) while the other question had none. CONCLUSIONS: The findings showed that the questions can effectively differentiate between students who perform at high and low levels, which also point out the potential of ChatGPT as an artificial intelligence tool in test development. Future studies may use the prompt to generate items in order for enhancing the external validity of the results by gathering data from diverse institutions and settings.

Assuntos

Hipertensão , Estudantes de Medicina , Humanos , Inteligência Artificial , Faculdades de Medicina

2.

The effect of the attitude towards risk/ambiguity on examination grades: cross-sectional study in a Portuguese medical school.

Leite-Mendes, Filipe; Delgado, Luis; Ferreira, Amelia; Severo, Milton.

Adv Health Sci Educ Theory Pract ; 29(4): 1309-1321, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-38224412

RESUMO

Given the high prevalence of multiple-choice examinations with formula scoring in medical training, several studies have tried to identify other factors in addition to the degree of knowledge of students which influence their response patterns. This study aims to measure the effect of students' attitude towards risk and ambiguity on their number of correct, wrong, and blank answers. In October 2018, 233 3rd year medical students from the Faculty of Medicine of the University of Porto, in Porto, Portugal, completed a questionnaire which assessed the student's attitudes towards risk and ambiguity, and aversion to ambiguity in medicine. Simple and multiple regression models and the respective regression coefficients were used to measure the association between the students' attitudes, and their answers in two examinations that they had taken in June 2018. Having an intermediate level of ambiguity aversion in medicine (as opposed to a very high or low level) was associated with a significant increase in the number of correct answers and decrease in the number of blank answers in the first examination. In the second examination, high levels of ambiguity aversion in medicine were associated with a decrease in the number of wrong answers. Attitude towards risk, tolerance for ambiguity, and gender did not show significant association with the number of correct, wrong, and blank answers for either examination. Students' ambiguity aversion in medicine is correlated with their performance in multiple-choice examinations with negative marking. Therefore, it is suggested the planning and implementation of counselling sessions with medical students regarding the possible impact of ambiguity aversion on their performance in multiple-choice questions with negative marking.

Assuntos

Avaliação Educacional , Estudantes de Medicina , Humanos , Portugal , Estudos Transversais , Feminino , Masculino , Estudantes de Medicina/psicologia , Faculdades de Medicina , Educação de Graduação em Medicina , Adulto Jovem , Inquéritos e Questionários , Adulto , Risco , Atitude

3.

ChatGPT prompts for generating multiple-choice questions in medical education and evidence on their validity: a literature review.

Kiyak, Yavuz Selim; Emekli, Emre.

Postgrad Med J ; 2024 Jun 06.

Artigo em Inglês | MEDLINE | ID: mdl-38840505

RESUMO

ChatGPT's role in creating multiple-choice questions (MCQs) is growing but the validity of these artificial-intelligence-generated questions is unclear. This literature review was conducted to address the urgent need for understanding the application of ChatGPT in generating MCQs for medical education. Following the database search and screening of 1920 studies, we found 23 relevant studies. We extracted the prompts for MCQ generation and assessed the validity evidence of MCQs. The findings showed that prompts varied, including referencing specific exam styles and adopting specific personas, which align with recommended prompt engineering tactics. The validity evidence covered various domains, showing mixed accuracy rates, with some studies indicating comparable quality to human-written questions, and others highlighting differences in difficulty and discrimination levels, alongside a significant reduction in question creation time. Despite its efficiency, we highlight the necessity of careful review and suggest a need for further research to optimize the use of ChatGPT in question generation. Main messages Ensure high-quality outputs by utilizing well-designed prompts; medical educators should prioritize the use of detailed, clear ChatGPT prompts when generating MCQs. Avoid using ChatGPT-generated MCQs directly in examinations without thorough review to prevent inaccuracies and ensure relevance. Leverage ChatGPT's potential to streamline the test development process, enhancing efficiency without compromising quality.

4.

Case-based MCQ generator: A custom ChatGPT based on published prompts in the literature for automatic item generation.

Kiyak, Yavuz Selim; Kononowicz, Andrzej A.

Med Teach ; 46(8): 1018-1020, 2024 08.

Artigo em Inglês | MEDLINE | ID: mdl-38340312

RESUMO

WHAT IS THE EDUCATIONAL CHALLENGE?: A fundamental challenge in medical education is creating high-quality, clinically relevant multiple-choice questions (MCQs). ChatGPT-based automatic item generation (AIG) methods need well-designed prompts. However, the use of these prompts is hindered by the time-consuming process of copying and pasting, a lack of know-how among medical teachers, and the generalist nature of standard ChatGPT, which often lacks the medical context. WHAT ARE THE PROPOSED SOLUTIONS?: The Case-based MCQ Generator, a custom GPT, addresses these challenges. It has been trained by using GPT Builder, which is a platform designed by OpenAI for customizing ChatGPT to meet specific needs, in order to allow users to generate case-based MCQs. By using this free tool for those who have ChatGPT Plus subscription, health professions educators can easily select a prompt, input a learning objective or item-specific test point, and generate clinically relevant questions. WHAT ARE THE POTENTIAL BENEFITS TO A WIDER GLOBAL AUDIENCE?: It enhances the efficiency of MCQ generation and ensures the generation of contextually relevant questions, surpassing the capabilities of standard ChatGPT. It streamlines the MCQ creation process by integrating prompts published in medical education literature, eliminating the need for manual prompt input. WHAT ARE THE NEXT STEPS?: Future development aims at sustainability and addressing ethical and accessibility issues. It requires regular updates, integration of new prompts from emerging health professions education literature, and a supportive digital ecosystem around the tool. Accessibility, especially for educators in low-resource countries, is vital, demanding alternative access models to overcome financial barriers.

Assuntos

Educação Médica , Avaliação Educacional , Humanos , Educação Médica/métodos , Avaliação Educacional/métodos

5.

Twelve tips for designing and implementing peer-led assessment writing schemes in health professions education.

Harris, Benjamin H L; Harris, Samuel R L; Walsh, Jason L; Pereira, Christopher; Black, Susannah M; Allott, Vincent E S; Handa, Ashok; Thampy, Harish.

Med Teach ; 46(8): 1027-1034, 2024 08.

Artigo em Inglês | MEDLINE | ID: mdl-38277134

RESUMO

Peer-led assessment (PLA) has gained increasing prominence within health professions education as an effective means of engaging learners in the process of assessment writing and practice. Involving students in various stages of the assessment lifecycle, including item writing, quality assurance, and feedback, not only facilitates the creation of high-quality item banks with minimal faculty input but also promotes the development of students' assessment literacy and fosters their growth as teachers. The advantages of involving students in the generation of assessments are evident from a pedagogical standpoint, benefiting both students and faculty. However, faculty members may face uncertainty when it comes to implementing such approaches effectively. To address this concern, this paper presents twelve tips that offer guidance on important considerations for the successful implementation of peer-led assessment schemes in the context of health professions education.

Assuntos

Avaliação Educacional , Ocupações em Saúde , Grupo Associado , Redação , Humanos , Avaliação Educacional/métodos , Ocupações em Saúde/educação

6.

Utility of RAND/UCLA appropriateness method in validating multiple-choice questions on ECG.

Kaga, Tomohiro; Inaba, Shinji; Shikano, Yukari; Watanabe, Yasuyuki; Fujisawa, Tomoki; Akazawa, Yusuke; Ohshita, Muneaki; Kawakami, Hiroshi; Higashi, Haruhiko; Aono, Jun; Nagai, Takayuki; Islam, Mohammad Zahidul; Wannous, Muhammad; Sakata, Masatsugu; Yamamoto, Kazumichi; Furukawa, Toshi A; Yamaguchi, Osamu.

BMC Med Educ ; 24(1): 448, 2024 Apr 24.

Artigo em Inglês | MEDLINE | ID: mdl-38658906

RESUMO

OBJECTIVES: This study aimed to investigate the utility of the RAND/UCLA appropriateness method (RAM) in validating expert consensus-based multiple-choice questions (MCQs) on electrocardiogram (ECG). METHODS: According to the RAM user's manual, nine panelists comprising various experts who routinely handle ECGs were asked to reach a consensus in three phases: a preparatory phase (round 0), an online test phase (round 1), and a face-to-face expert panel meeting (round 2). In round 0, the objectives and future timeline of the study were elucidated to the nine expert panelists with a summary of relevant literature. In round 1, 100 ECG questions prepared by two skilled cardiologists were answered, and the success rate was calculated by dividing the number of correct answers by 9. Furthermore, the questions were stratified into "Appropriate," "Discussion," or "Inappropriate" according to the median score and interquartile range (IQR) of appropriateness rating by nine panelists. In round 2, the validity of the 100 ECG questions was discussed in an expert panel meeting according to the results of round 1 and finally reassessed as "Appropriate," "Candidate," "Revision," and "Defer." RESULTS: In round 1 results, the average success rate of the nine experts was 0.89. Using the median score and IQR, 54 questions were classified as " Discussion." In the expert panel meeting in round 2, 23% of the original 100 questions was ultimately deemed inappropriate, although they had been prepared by two skilled cardiologists. Most of the 46 questions categorized as "Appropriate" using the median score and IQR in round 1 were considered "Appropriate" even after round 2 (44/46, 95.7%). CONCLUSIONS: The use of the median score and IQR allowed for a more objective determination of question validity. The RAM may help select appropriate questions, contributing to the preparation of higher-quality tests.

Assuntos

Eletrocardiografia , Humanos , Consenso , Reprodutibilidade dos Testes , Competência Clínica/normas , Avaliação Educacional/métodos , Cardiologia/normas

7.

The impact of repeated item development training on the prediction of medical faculty members' item difficulty index.

Lee, Hye Yoon; Yune, So Jung; Lee, Sang Yeoup; Im, Sunju; Kam, Bee Sung.

BMC Med Educ ; 24(1): 599, 2024 May 30.

Artigo em Inglês | MEDLINE | ID: mdl-38816855

RESUMO

BACKGROUND: Item difficulty plays a crucial role in assessing students' understanding of the concept being tested. The difficulty of each item needs to be carefully adjusted to ensure the achievement of the evaluation's objectives. Therefore, this study aimed to investigate whether repeated item development training for medical school faculty improves the accuracy of predicting item difficulty in multiple-choice questions. METHODS: A faculty development program was implemented to enhance the prediction of each item's difficulty index, ensure the absence of item defects, and maintain the general principles of item development. The interrater reliability between the predicted, actual, and corrected item difficulty was assessed before and after the training, using either the kappa index or the correlation coefficient, depending on the characteristics of the data. A total of 62 faculty members participated in the training. Their predictions of item difficulty were compared with the analysis results of 260 items taken by 119 fourth-year medical students in 2016 and 316 items taken by 125 fourth-year medical students in 2018. RESULTS: Before the training, significant agreement between the predicted and actual item difficulty indices was observed for only one medical subject, Cardiology (K = 0.106, P = 0.021). However, after the training, significant agreement was noted for four subjects: Internal Medicine (K = 0.092, P = 0.015), Cardiology (K = 0.318, P = 0.021), Neurology (K = 0.400, P = 0.043), and Preventive Medicine (r = 0.577, P = 0.039). Furthermore, a significant agreement was observed between the predicted and actual difficulty indices across all subjects when analyzing the average difficulty of all items (r = 0.144, P = 0.043). Regarding the actual difficulty index by subject, neurology exceeded the desired difficulty range of 0.45-0.75 in 2016. By 2018, however, all subjects fell within this range. CONCLUSION: Repeated item development training, which includes predicting each item's difficulty index, can enhance faculty members' ability to predict and adjust item difficulty accurately. To ensure that the difficulty of the examination aligns with its intended purpose, item development training can be beneficial. Further studies on faculty development are necessary to explore these benefits more comprehensively.

Assuntos

Avaliação Educacional , Docentes de Medicina , Humanos , Avaliação Educacional/métodos , Reprodutibilidade dos Testes , Estudantes de Medicina , Educação de Graduação em Medicina , Masculino , Feminino

8.

Question banks: credit? Or debit? A qualitative exploration of their use among medical students.

Fisher, James; Leahy, Declan; Lim, Jun Jie; Astles, Emily; Salvatore, Jacobo; Thomson, Richard.

BMC Med Educ ; 24(1): 569, 2024 May 24.

Artigo em Inglês | MEDLINE | ID: mdl-38790034

RESUMO

BACKGROUND: Online question banks are the most widely used education resource amongst medical students. Despite this there is an absence of literature outlining how and why they are used by students. Drawing on Deci and Ryan's self-determination theory, our study aimed to explore why and how early-stage medical students use question banks in their learning and revision strategies. METHODS: The study was conducted at Newcastle University Medical School (United Kingdom and Malaysia). Purposive, convenience and snowball sampling of year two students were employed. Ten interviews were conducted. Thematic analysis was undertaken iteratively, enabling exploration of nascent themes. Data collection ceased when no new perspectives were identified. RESULTS: Students' motivation to use question banks was predominantly driven by extrinsic motivators, with high-stakes exams and fear of failure being central. Their convenience and perceived efficiency promoted autonomy and thus motivation. Rapid feedback cycles and design features consistent with gamification were deterrents to intrinsic motivation. Potentially detrimental patterns of question bank use were evident: cueing, avoidance and memorising. Scepticism regarding veracity of question bank content was absent. CONCLUSIONS: We call on educators to provide students with guidance about potential pitfalls associated with question banks and to reflect on potential inequity of access to these resources.

Assuntos

Motivação , Pesquisa Qualitativa , Estudantes de Medicina , Humanos , Estudantes de Medicina/psicologia , Malásia , Reino Unido , Avaliação Educacional , Feminino , Educação de Graduação em Medicina , Masculino , Internet

9.

Large language models for generating medical examinations: systematic review.

Artsi, Yaara; Sorin, Vera; Konen, Eli; Glicksberg, Benjamin S; Nadkarni, Girish; Klang, Eyal.

BMC Med Educ ; 24(1): 354, 2024 Mar 29.

Artigo em Inglês | MEDLINE | ID: mdl-38553693

RESUMO

BACKGROUND: Writing multiple choice questions (MCQs) for the purpose of medical exams is challenging. It requires extensive medical knowledge, time and effort from medical educators. This systematic review focuses on the application of large language models (LLMs) in generating medical MCQs. METHODS: The authors searched for studies published up to November 2023. Search terms focused on LLMs generated MCQs for medical examinations. Non-English, out of year range and studies not focusing on AI generated multiple-choice questions were excluded. MEDLINE was used as a search database. Risk of bias was evaluated using a tailored QUADAS-2 tool. RESULTS: Overall, eight studies published between April 2023 and October 2023 were included. Six studies used Chat-GPT 3.5, while two employed GPT 4. Five studies showed that LLMs can produce competent questions valid for medical exams. Three studies used LLMs to write medical questions but did not evaluate the validity of the questions. One study conducted a comparative analysis of different models. One other study compared LLM-generated questions with those written by humans. All studies presented faulty questions that were deemed inappropriate for medical exams. Some questions required additional modifications in order to qualify. CONCLUSIONS: LLMs can be used to write MCQs for medical examinations. However, their limitations cannot be ignored. Further study in this field is essential and more conclusive evidence is needed. Until then, LLMs may serve as a supplementary tool for writing medical examinations. 2 studies were at high risk of bias. The study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Assuntos

Avaliação Educacional , Humanos , Avaliação Educacional/métodos , Redação/normas , Idioma , Educação Médica

10.

Ten tips for effective use and quality assurance of multiple-choice questions in knowledge-based assessments.

Ali, Kamran; Zahra, Daniel.

Eur J Dent Educ ; 28(2): 655-662, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38282273

RESUMO

Multiple-choice questions (MCQs) are the most popular type of items used in knowledge-based assessments in undergraduate and postgraduate healthcare education. MCQs allow assessment of candidates' knowledge on a broad range of knowledge-based learning outcomes in a single assessment. Single-best-answer (SBA) MCQs are the most versatile and commonly used format. Although writing MCQs may seem straight-forward, producing decent-quality MCQs is challenging and warrants a range of quality checks before an item is deemed suitable for inclusion in an assessment. Like all assessments, MCQ-based examinations must be aligned with the learning outcomes and learning opportunities provided to the students. This paper provides evidence-based guidance on the effective use of MCQs in student assessments, not only to make decisions regarding student progression but also to build an academic environment that promotes assessment as a driver for learning. Practical tips are provided to the readers to produce authentic MCQ items, along with appropriate pre- and post-assessment reviews, the use of standard setting and psychometric evaluation of assessments based on MCQs. Institutions need to develop an academic culture that fosters transparency, openness, equality and inclusivity. In line with contemporary educational principles, teamwork amongst teaching faculty, administrators and students is essential to establish effective learning and assessment practices.

Assuntos

Educação em Odontologia , Avaliação Educacional , Humanos , Estudantes , Aprendizagem , Redação

11.

Multiple choice as formative assessment in dental education.

Haugen, Håvard J; de Lange, Thomas.

Eur J Dent Educ ; 28(3): 757-769, 2024 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-38456591

RESUMO

INTRODUCTION: The effectiveness of multiple-choice questions (MCQs) in dental education is pivotal to student performance and knowledge advancement. However, their optimal implementation requires exploration to enhance the benefits. MATERIALS AND METHODS: An educational tool incorporating MCQs was administered from the 5th to the 10th semester in a dental curriculum. The students filled out a questionnaire after the MCQ, which was linked to the learning management system. Four cohorts of four semesters generated 2300 data points analysed by Spearmen correlation and mixed model regression analysis. RESULTS: Demonstrated a significant correlation between early exam preparation and improved student performance. Independent study hours and lecture attendance emerged as significant predictors, accounting for approximately 10.27% of the variance in student performance on MCQs. While the number of MCQs taken showed an inverse relationship with study hours, the perceived clarity of these questions positively correlated with academic achievement. CONCLUSION: MCQs have proven effective in enhancing student learning and knowledge within the discipline. Our analysis underscores the important role of independent study and consistent lecture attendance in positively influencing MCQ scores. The study provides valuable insights into using MCQs as a practical tool for dental student learning. Moreover, the clarity of assessment tools, such as MCQs, remains pivotal in influencing student outcomes. This study underscores the multifaceted nature of learning experiences in dental education and the importance of bridging the gap between student expectations and actual performance.

Assuntos

Educação em Odontologia , Avaliação Educacional , Educação em Odontologia/métodos , Avaliação Educacional/métodos , Humanos , Inquéritos e Questionários , Currículo , Estudantes de Odontologia/psicologia , Feminino , Masculino

12.

Sustained clinical knowledge improvements from simulation experiences with Simulation via Instant Messaging-Birmingham Advance.

Zhou, Dengyi; Davitadze, Meri; Ooi, Emma; Ng, Cai Ying; Allison, Isabel; Thomas, Lucretia; Hanania, Thia; Blaggan, Parisha; Evans, Nia; Chen, Wentin; Melson, Eka; Boelaert, Kristien; Karavitaki, Niki; Kempegowda, Punith.

Postgrad Med J ; 99(1167): 25-31, 2023 Mar 22.

Artigo em Inglês | MEDLINE | ID: mdl-36947426

RESUMO

BACKGROUND: Simulation via Instant Messaging-Birmingham Advance (SIMBA) delivers simulation-based learning through WhatsApp and Zoom, helping to sustain continuing medical education (CME) for postgraduate healthcare professionals otherwise disrupted by the coronavirus (COVID-19) pandemic. This study aimed to assess whether SIMBA helped to improve clinical knowledge and if this improvement in knowledge was sustained over time. METHODS: Two SIMBA sessions-thyroid and pituitary-were conducted in July-August 2020. Each session included simulation of various real-life cases and interactive discussion. Participants' self-reported confidence, acceptance, and knowledge were measured using surveys and multiple-choice questions pre- and post-simulation and in a 6- to 12-week follow-up period. The evaluation surveys were designed using Moore's 7 Levels of CME Outcomes Framework. RESULTS: A total of 116 participants were included in the analysis. Significant improvement was observed in participants' self-reported confidence in approach to simulated cases (thyroid, n = 37, P < .0001; pituitary, n = 79, P < .0001). Significant improvement in clinical knowledge was observed following simulation (thyroid, n = 37, P < .0001; pituitary, n = 79, P < .0001). For both sessions, retention of confidence and knowledge was seen at 6-12 weeks' follow-up. CONCLUSIONS: SIMBA increased participants' clinical knowledge on simulated cases and this improvement was retained up to 6-12 weeks after the session. Further studies are required to explore long-term retention and whether it translates to improved real-world clinical practice.

Assuntos

COVID-19 , Humanos , COVID-19/epidemiologia , Pessoal de Saúde/educação , Educação Médica Continuada , Competência Clínica

13.

Identifying the response process validity of clinical vignette-type multiple choice questions: An eye-tracking study.

Specian Junior, Francisco Carlos; Santos, Thiago Martins; Sandars, John; Amaral, Eliana Martorano; Cecilio-Fernandes, Dario.

Med Teach ; 45(8): 845-851, 2023 08.

Artigo em Inglês | MEDLINE | ID: mdl-36840707

RESUMO

INTRODUCTION: Clinical vignette-type multiple choice questions (CV-MCQs) are widely used in assessment and identifying the response process validity (RPV) of questions with low and high integration of knowledge is essential. Answering CV-MCQs of different levels of knowledge application and integration can be understood from a cognitive workload perspective and this can be identified by using eye-tracking. The aim of the pilot study was to identify the cognitive workload and RPV of CV-MCQs of different levels of knowledge application and integration by the use eye-tracking. METHODS: Fourteen fourth-year medical students answered a test with 40 CV-MCQs, which were equally divided into low-level and high-level complexity (knowledge application and integration). Cognitive workload was measured using screen-based eye tracking, with the number of fixations and revisitations for each area of interest. RESULTS: We found a higher cognitive workload for high-level complexity (M = 121.74) compared with lower-level complexity questions (M = 51.94) and also for participants who answered questions incorrectly (M = 94.31) compared with correctly (M = 79.36). CONCLUSION: Eye-tracking has the potential to become a useful and practical approach for helping to identify the RPV of CV-MCQs. This approach can be used for improving the design and development of CV-MCQs, and to provide feedback to inform teaching and learning.[Box: see text].

Assuntos

Avaliação Educacional , Tecnologia de Rastreamento Ocular , Humanos , Projetos Piloto , Aprendizagem , Retroalimentação

14.

Assessment of the capacity of ChatGPT as a self-learning tool in medical pharmacology: a study using MCQs.

Choi, Woong.

BMC Med Educ ; 23(1): 864, 2023 Nov 13.

Artigo em Inglês | MEDLINE | ID: mdl-37957666

RESUMO

BACKGROUND: ChatGPT is a large language model developed by OpenAI that exhibits a remarkable ability to simulate human speech. This investigation attempts to evaluate the potential of ChatGPT as a standalone self-learning tool, with specific attention on its efficacy in answering multiple-choice questions (MCQs) and providing credible rationale for its responses. METHODS: The study used 78 test items from the Korean Comprehensive Basic Medical Sciences Examination (K-CBMSE) for years 2019 to 2021. 78 test items translated from Korean to English with four lead-in prompts per item resulted in a total of 312 MCQs. The MCQs were submitted to ChatGPT and the responses were analyzed for correctness, consistency, and relevance. RESULTS: ChatGPT responded with an overall accuracy of 76.0%. Compared to its performance on recall and interpretation questions, the model performed poorly on problem-solving questions. ChatGPT offered correct rationales for 77.8% (182/234) of the responses, with errors primarily arising from faulty information and flawed reasoning. In terms of references, ChatGPT provided incorrect citations for 69.7% (191/274) of the responses. While the veracity of reference paragraphs could not be ascertained, 77.0% (47/61) were deemed pertinent and accurate with respect to the answer key. CONCLUSION: The current version of ChatGPT has limitations in accurately answering MCQs and generating correct and relevant rationales, particularly when it comes to referencing. To avoid possible threats such as spreading inaccuracies and decreasing critical thinking skills, ChatGPT should be used with supervision.

Assuntos

Inteligência Artificial , Aprendizagem , Farmacologia Clínica , Resolução de Problemas , Humanos , Idioma , Rememoração Mental , Farmacologia Clínica/educação

15.

Automated Item Generation: impact of item variants on performance and standard setting.

Westacott, R; Badger, K; Kluth, D; Gurnell, M; Reed, M W R; Sam, A H.

BMC Med Educ ; 23(1): 659, 2023 Sep 11.

Artigo em Inglês | MEDLINE | ID: mdl-37697275

RESUMO

BACKGROUND: Automated Item Generation (AIG) uses computer software to create multiple items from a single question model. There is currently a lack of data looking at whether item variants to a single question result in differences in student performance or human-derived standard setting. The purpose of this study was to use 50 Multiple Choice Questions (MCQs) as models to create four distinct tests which would be standard set and given to final year UK medical students, and then to compare the performance and standard setting data for each. METHODS: Pre-existing questions from the UK Medical Schools Council (MSC) Assessment Alliance item bank, created using traditional item writing techniques, were used to generate four 'isomorphic' 50-item MCQ tests using AIG software. Isomorphic questions use the same question template with minor alterations to test the same learning outcome. All UK medical schools were invited to deliver one of the four papers as an online formative assessment for their final year students. Each test was standard set using a modified Angoff method. Thematic analysis was conducted for item variants with high and low levels of variance in facility (for student performance) and average scores (for standard setting). RESULTS: Two thousand two hundred eighteen students from 12 UK medical schools participated, with each school using one of the four papers. The average facility of the four papers ranged from 0.55-0.61, and the cut score ranged from 0.58-0.61. Twenty item models had a facility difference > 0.15 and 10 item models had a difference in standard setting of > 0.1. Variation in parameters that could alter clinical reasoning strategies had the greatest impact on item facility. CONCLUSIONS: Item facility varied to a greater extent than the standard set. This difference may relate to variants causing greater disruption of clinical reasoning strategies in novice learners compared to experts, but is confounded by the possibility that the performance differences may be explained at school level and therefore warrants further study.

Assuntos

Raciocínio Clínico , Estudantes de Medicina , Humanos , Aprendizagem , Faculdades de Medicina , Software

16.

Advantages and pitfalls in utilizing artificial intelligence for crafting medical examinations: a medical education pilot study with GPT-4.

E, Klang; S, Portugez; R, Gross; R, Kassif Lerner; A, Brenner; M, Gilboa; T, Ortal; S, Ron; V, Robinzon; H, Meiri; G, Segal.

BMC Med Educ ; 23(1): 772, 2023 Oct 17.

Artigo em Inglês | MEDLINE | ID: mdl-37848913

RESUMO

BACKGROUND: The task of writing multiple choice question examinations for medical students is complex, timely and requires significant efforts from clinical staff and faculty. Applying artificial intelligence algorithms in this field of medical education may be advisable. METHODS: During March to April 2023, we utilized GPT-4, an OpenAI application, to write a 210 multi choice questions-MCQs examination based on an existing exam template and thoroughly investigated the output by specialist physicians who were blinded to the source of the questions. Algorithm mistakes and inaccuracies, as identified by specialists were classified as stemming from age, gender or geographical insensitivities. RESULTS: After inputting a detailed prompt, GPT-4 produced the test rapidly and effectively. Only 1 question (0.5%) was defined as false; 15% of questions necessitated revisions. Errors in the AI-generated questions included: the use of outdated or inaccurate terminology, age-sensitive inaccuracies, gender-sensitive inaccuracies, and geographically sensitive inaccuracies. Questions that were disqualified due to flawed methodology basis included elimination-based questions and questions that did not include elements of integrating knowledge with clinical reasoning. CONCLUSION: GPT-4 can be used as an adjunctive tool in creating multi-choice question medical examinations yet rigorous inspection by specialist physicians remains pivotal.

Assuntos

Educação Médica , Avaliação Educacional , Humanos , Avaliação Educacional/métodos , Projetos Piloto , Inteligência Artificial , Redação

17.

Feasibility assurance: a review of automatic item generation in medical assessment.

Falcão, Filipe; Costa, Patrício; Pêgo, José M.

Adv Health Sci Educ Theory Pract ; 27(2): 405-425, 2022 05.

Artigo em Inglês | MEDLINE | ID: mdl-35230589

RESUMO

BACKGROUND: Current demand for multiple-choice questions (MCQs) in medical assessment is greater than the supply. Consequently, an urgency for new item development methods arises. Automatic Item Generation (AIG) promises to overcome this burden, generating calibrated items based on the work of computer algorithms. Despite the promising scenario, there is still no evidence to encourage a general application of AIG in medical assessment. It is therefore important to evaluate AIG regarding its feasibility, validity and item quality. OBJECTIVE: Provide a narrative review regarding the feasibility, validity and item quality of AIG in medical assessment. METHODS: Electronic databases were searched for peer-reviewed, English language articles published between 2000 and 2021 by means of the terms 'Automatic Item Generation', 'Automated Item Generation', 'AIG', 'medical assessment' and 'medical education'. Reviewers screened 119 records and 13 full texts were checked according to the inclusion criteria. A validity framework was implemented in the included studies to draw conclusions regarding the validity of AIG. RESULTS: A total of 10 articles were included in the review. Synthesized data suggests that AIG is a valid and feasible method capable of generating high-quality items. CONCLUSIONS: AIG can solve current problems related to item development. It reveals itself as an auspicious next-generation technique for the future of medical assessment, promising several quality items both quickly and economically.

Assuntos

Projetos de Pesquisa , Estudos de Viabilidade , Humanos

18.

Minimum accepted competency examination: test item analysis.

McCrossan, Paddy; Nicholson, Alf; McCallion, Naomi.

BMC Med Educ ; 22(1): 400, 2022 May 25.

Artigo em Inglês | MEDLINE | ID: mdl-35614439

RESUMO

BACKGROUND: To ascertain if undergraduate medical students attain adequate knowledge to practice in paediatrics, we designed the minimum accepted competency (MAC) examination. This was a set of MCQ's designed to test the most basic, 'must know' knowledge as determined by non-faculty paediatric clinicians. Only two-thirds of undergraduate students passed this exam, despite 96% of the same cohort passing their official university paediatric examination. We aim to describe the psychometric properties of the MAC examination to explore why there was a difference in student performance between these two assessments which should, in theory, be testing the same subject area. We will also investigate if the MAC examination is a potentially reliable method of assessing undergraduate knowledge. METHODS: The MAC examination was sat by three groups of undergraduate medical students and paediatric trainee doctors. Test item analysis was performed using facility index, discrimination index and Cronbach's alpha. RESULTS: Test item difficulty on the MAC between each group was positively correlated. Correlation of item difficulty with the standard set for each item showed a statistically significant positive relationship. However, for 10 of the items, the mean score achieved by the candidates did not even reach two standard deviations below the standard set by the faculty. Medical students outperformed the trainee doctors on three items. 18 of 30 items achieved a discrimination index > 0.2. Cronbach's alpha ranged from 0.22-0.59. CONCLUSION: Despite faculty correctly judging that this would be a difficult paper for the candidates, there were a significant number of items on which students performed particularly badly. It is possible that the clinical emphasis in these non-faculty derived questions was juxtaposed with the factual recall often required for university examinations. The MAC examination highlights the difference in the level of knowledge expected of a junior doctor starting work in paediatrics between faculty and non-faculty clinicians and can identify gaps between the current curriculum and the 'hidden curriculum' required for real world clinical practice. The faculty comprises physicians in employment by the University whose role it is to design the paediatric curriculum and deliver teaching to undergraduate students. Non-faculty clinicians are paediatric physicians who work soley as clinicians with no affiliation to an educational institution. The concept of a MAC examination to test basic medical knowledge is feasible and the study presented is an encouraging first step towards this method of assessment.

Assuntos

Educação de Graduação em Medicina , Estudantes de Medicina , Criança , Competência Clínica , Currículo , Avaliação Educacional/métodos , Docentes , Humanos , Psicometria

19.

What faculty write versus what students see? Perspectives on multiple-choice questions using Bloom's taxonomy.

Monrad, Seetha U; Bibler Zaidi, Nikki L; Grob, Karri L; Kurtz, Joshua B; Tai, Andrew W; Hortsch, Michael; Gruppen, Larry D; Santen, Sally A.

Med Teach ; 43(5): 575-582, 2021 05.

Artigo em Inglês | MEDLINE | ID: mdl-33590781

RESUMO

BACKGROUND: Using revised Bloom's taxonomy, some medical educators assume they can write multiple choice questions (MCQs) that specifically assess higher (analyze, apply) versus lower-order (recall) learning. The purpose of this study was to determine whether three key stakeholder groups (students, faculty, and education assessment experts) assign MCQs the same higher- or lower-order level. METHODS: In Phase 1, stakeholders' groups assigned 90 MCQs to Bloom's levels. In Phase 2, faculty wrote 25 MCQs specifically intended as higher- or lower-order. Then, 10 students assigned these questions to Bloom's levels. RESULTS: In Phase 1, there was low interrater reliability within the student group (Krippendorf's alpha = 0.37), the faculty group (alpha = 0.37), and among three groups (alpha = 0.34) when assigning questions as higher- or lower-order. The assessment team alone had high interrater reliability (alpha = 0.90). In Phase 2, 63% of students agreed with the faculty as to whether the MCQs were higher- or lower-order. There was low agreement between paired faculty and student ratings (Cohen's Kappa range .098-.448, mean .256). DISCUSSION: For many questions, faculty and students did not agree whether the questions were lower- or higher-order. While faculty may try to target specific levels of knowledge or clinical reasoning, students may approach the questions differently than intended.

Assuntos

Avaliação Educacional , Redação , Docentes , Humanos , Reprodutibilidade dos Testes , Estudantes

20.

Evaluating the effectiveness of 'MCQ development workshop using cognitive model framework: A pre-post study.

Ali, Rahila; Sultan, Amber Shamim; Zahid, Nida.

J Pak Med Assoc ; 71(1(A)): 119-121, 2021 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-33484534

RESUMO

A workshop on MCQ development using cognitive model framework was conducted for health educators from Aga Khan University (AKU) and other academic institutions. The aim was to develop the skill of preparing MCQs for assessing higher cognitive levels. A pre-post study was conducted, participant satisfaction was evaluated and pre-post test scores were used to assess learning capability of the workshop participants. Out of the 19 who attended the workshop, 16 participated in the pre- and post-tests and were included in the study through convenience sampling. The total duration of the study was six months. There was a significant difference in the overall pre-post test scores of the participants with a mean difference of -4.176 ± 4.83 (p-value < 0.05). A significant difference was observed in the mean pre-post test scores of junior faculty (-6.350± 4.5829; p-value = 0.02). The mean pre-test scores of junior faculty were significantly lower 4.950±2.83 as compared to the senior faculty 10.417±1.56 (p-value= 0.001). Active participation in faculty development workshops may lead to enhancing skills for preparing one-best MCQs based on international guidelines.

Assuntos

Cognição , Aprendizagem , Avaliação Educacional , Humanos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa