Search | Secretaria de Estado da Saúde

ChatGPT's quiz skills in different otolaryngology subspecialties: an analysis of 2576 single-choice and multiple-choice board certification preparation questions.

Hoch, Cosima C; Wollenberg, Barbara; Lüers, Jan-Christoffer; Knoedler, Samuel; Knoedler, Leonard; Frank, Konstantin; Cotofana, Sebastian; Alfertshofer, Michael.

Eur Arch Otorhinolaryngol ; 280(9): 4271-4278, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37285018

ABSTRACT

PURPOSE: With the increasing adoption of artificial intelligence (AI) in various domains, including healthcare, there is growing acceptance and interest in consulting AI models to provide medical information and advice. This study aimed to evaluate the accuracy of ChatGPT's responses to practice quiz questions designed for otolaryngology board certification and decipher potential performance disparities across different otolaryngology subspecialties. METHODS: A dataset covering 15 otolaryngology subspecialties was collected from an online learning platform funded by the German Society of Oto-Rhino-Laryngology, Head and Neck Surgery, designed for board certification examination preparation. These questions were entered into ChatGPT, with its responses being analyzed for accuracy and variance in performance. RESULTS: The dataset included 2576 questions (479 multiple-choice and 2097 single-choice), of which 57% (n = 1475) were answered correctly by ChatGPT. An in-depth analysis of question style revealed that single-choice questions were associated with a significantly higher rate (p < 0.001) of correct responses (n = 1313; 63%) compared to multiple-choice questions (n = 162; 34%). Stratified by question categories, ChatGPT yielded the highest rate of correct responses (n = 151; 72%) in the field of allergology, whereas 7 out of 10 questions (n = 65; 71%) on legal otolaryngology aspects were answered incorrectly. CONCLUSION: The study reveals ChatGPT's potential as a supplementary tool for otolaryngology board certification preparation. However, its propensity for errors in certain otolaryngology areas calls for further refinement. Future research should address these limitations to improve ChatGPT's educational use. An approach, with expert collaboration, is recommended for the reliable and accurate integration of such AI models.
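The proportions reported in the abstract above can be rechecked from the raw counts. The sketch below recomputes them and, as an illustration only, runs a two-sided two-proportion z-test; the abstract does not name the test the authors used, so the choice of test and the function names are assumptions.

```python
import math

# Counts taken from the abstract.
single_correct, single_total = 1313, 2097
multi_correct, multi_total = 162, 479


def rate(correct, total):
    """Share of correctly answered questions."""
    return correct / total


overall = rate(single_correct + multi_correct, single_total + multi_total)


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for two proportions (assumed test, not stated in the abstract)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(
    single_correct, single_total, multi_correct, multi_total
)

print(f"overall: {overall:.1%}")
print(f"single-choice: {rate(single_correct, single_total):.1%}")
print(f"multiple-choice: {rate(multi_correct, multi_total):.1%}")
print(f"z = {z:.2f}, p < 0.001: {p_value < 0.001}")
```

The recomputed shares round to the 57%, 63%, and 34% given in the abstract, and the difference between question styles remains significant at p < 0.001 under this assumed test.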

Subjects

Artificial Intelligence, Otolaryngology, Humans, Certification, Educational Status, Referral and Consultation

Use of Multiple-Choice Items in Summative Examinations: Questionnaire Survey Among German Undergraduate Dental Training Programs.

Rössler, Lena; Herrmann, Manfred; Wiegand, Annette; Kanzow, Philipp.

JMIR Med Educ ; 10: e58126, 2024 Jun 27.

Article in English | MEDLINE | ID: mdl-38952022

ABSTRACT

Background: Multiple-choice examinations are frequently used in German dental schools. However, details regarding the item types used and the scoring methods applied are lacking. Objective: This study aims to gain insight into the current use of multiple-choice items (ie, questions) in summative examinations in German undergraduate dental training programs. Methods: A paper-based 10-item questionnaire regarding the assessment methods used, multiple-choice item types, and applied scoring methods was designed. The pilot-tested questionnaire was mailed to the deans of studies and to the heads of the Department of Operative/Restorative Dentistry at all 30 dental schools in Germany in February 2023. Statistical analysis was performed using the Fisher exact test (significance level P<.05). Results: The response rate amounted to 90% (27/30 dental schools). All respondent dental schools used multiple-choice examinations for summative assessments. Examinations were delivered electronically by 70% (19/27) of the dental schools. Almost all dental schools used single-choice Type A items (24/27, 89%), which accounted for the largest number of items in approximately half of the dental schools (13/27, 48%). Further item types (eg, conventional multiple-select items, Multiple-True-False, and Pick-N) were used by fewer dental schools (≤67%, up to 18 out of 27 dental schools). For the multiple-select item types, the applied scoring methods varied considerably (ie, awarding [intermediate] partial credit and requirements for partial credit). Dental schools with the possibility of electronic examinations used multiple-select items slightly more often (14/19, 74% vs 4/8, 50%). However, this difference was not statistically significant (P=.38). Dental schools used items either individually or as key feature problems consisting of a clinical case scenario followed by a number of items focusing on critical treatment steps (15/27, 56%). Not a single school used alternative testing methods (eg, answer-until-correct). A formal item review process was established at about half of the dental schools (15/27, 56%). Conclusions: Summative assessment methods among German dental schools vary widely. In particular, a large variability regarding the use and scoring of multiple-select multiple-choice items was found.
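The comparison of multiple-select item use between schools with and without electronic examinations (14/19 vs 4/8) can be rechecked with a Fisher exact test. The sketch below is a minimal pure-Python version; the two-sided rule it uses (summing all tables that are no more probable than the observed one) is an assumption, since the abstract does not say which two-sided convention was applied, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Assumed convention: sum the probabilities of all tables with the same
    margins that are at most as probable as the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, row1)

    def prob(k):
        # Hypergeometric probability of k "successes" in the first row.
        return comb(col1, k) * comb(total - col1, row1 - k) / denom

    lo = max(0, row1 - (total - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Rows: electronic examinations possible (yes / no).
# Columns: multiple-select items used (yes / no).
p = fisher_exact_two_sided(14, 5, 4, 4)
print(f"two-sided P = {p:.4f}")
```

Under this convention the P value comes out near .37, in line with the non-significant result reported in the abstract (P=.38); the small difference may stem from rounding or a different two-sided rule.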

Subjects

Dental Education, Educational Measurement, Germany, Humans, Surveys and Questionnaires, Educational Measurement/methods, Dental Education/methods, Dental Schools

Scoring Single-Response Multiple-Choice Items: Scoping Review and Comparison of Different Scoring Methods.

Kanzow, Amelie Friederike; Schmidt, Dennis; Kanzow, Philipp.

JMIR Med Educ ; 9: e44084, 2023 May 19.

Article in English | MEDLINE | ID: mdl-37001510

ABSTRACT

BACKGROUND: Single-choice items (eg, best-answer items, alternate-choice items, single true-false items) are one type of multiple-choice item and have been used in examinations for over 100 years. At the end of every examination, the examinees' responses have to be analyzed and scored to derive information about examinees' true knowledge. OBJECTIVE: The aim of this paper is to compile scoring methods for individual single-choice items described in the literature. Furthermore, the metric "expected chance score" and the relation between examinees' true knowledge and expected scoring results (averaged percentage score) are analyzed. In addition, implications for potential pass marks to be used in examinations to test examinees for a predefined level of true knowledge are derived. METHODS: Scoring methods for individual single-choice items were extracted from various databases (ERIC, PsycInfo, Embase via Ovid, MEDLINE via PubMed) in September 2020. Eligible sources reported on scoring methods for individual single-choice items in written examinations including but not limited to medical education. Separately for items with n=2 answer options (eg, alternate-choice items, single true-false items) and best-answer items with n=5 answer options (eg, Type A items) and for each identified scoring method, the metric expected chance score and the expected scoring results as a function of examinees' true knowledge using fictitious examinations with 100 single-choice items were calculated. RESULTS: A total of 21 different scoring methods were identified from the 258 included sources, with varying consideration of correctly marked, omitted, and incorrectly marked items. Resulting credit varied between -3 and +1 credit points per item. For items with n=2 answer options, expected chance scores from random guessing ranged between -1 and +0.75 credit points. For items with n=5 answer options, expected chance scores ranged between -2.2 and +0.84 credit points. All scoring methods showed a linear relation between examinees' true knowledge and the expected scoring results. Depending on the scoring method used, examination results differed considerably: Expected scoring results from examinees with 50% true knowledge ranged between 0.0% (95% CI 0% to 0%) and 87.5% (95% CI 81.0% to 94.0%) for items with n=2 and between -60.0% (95% CI -60% to -60%) and 92.0% (95% CI 86.7% to 97.3%) for items with n=5. CONCLUSIONS: In examinations with single-choice items, the scoring result is not always equivalent to examinees' true knowledge. When interpreting examination scores and setting pass marks, the number of answer options per item must usually be taken into account in addition to the scoring method used.
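The linear relation between true knowledge and expected score can be illustrated with a simple guessing model. The sketch below assumes that an examinee answers a known share of items correctly and guesses uniformly at random on the rest, with no omissions; this model, the function name, and the two example scoring rules (number-right scoring and a correction-for-guessing rule) are illustrative assumptions and are not taken from the paper.

```python
def expected_item_score(knowledge, n_options, credit_correct=1.0, credit_incorrect=0.0):
    """Expected credit per item under an assumed knowledge-or-guess model.

    knowledge: share of items the examinee truly knows (0.0 to 1.0).
    n_options: number of answer options per item.
    credit_correct / credit_incorrect: credit points for a correct or
    incorrect mark (omissions are not modeled here).
    """
    # Expected credit of a uniformly random guess ("expected chance score").
    chance = (
        credit_correct / n_options
        + credit_incorrect * (n_options - 1) / n_options
    )
    # Known items earn full credit; the remainder is guessed.
    return knowledge * credit_correct + (1 - knowledge) * chance


for n in (2, 5):
    # Number-right scoring: 1 point if correct, 0 otherwise.
    number_right = expected_item_score(0.5, n)
    # Correction for guessing: penalty of 1/(n-1) per incorrect mark.
    corrected = expected_item_score(0.5, n, 1.0, -1.0 / (n - 1))
    print(f"n={n}: number-right {number_right:.2f}, corrected {corrected:.2f}")
```

In this model the expected score is linear in the knowledge share, and the same 50% knowledge level yields different expected scores depending on both the scoring rule and the number of answer options, which is the point the conclusions make about setting pass marks.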

Preference for Smaller Sooner Over Larger Later Rewards in ADHD: Contribution of Delay Duration and Paradigm Type.

Yu, Xue; Sonuga-Barke, Edmund; Liu, Xiangping.

J Atten Disord ; 22(10): 984-993, 2018 Aug.

Article in English | MEDLINE | ID: mdl-25672671

ABSTRACT

OBJECTIVE: Individuals with ADHD preferentially choose smaller sooner (SS) over larger later (LL) rewards, termed impulsive choice. This has been observed to different degrees on single-choice and more complex discounting tasks using various types of rewards and durations of delays. There has been no direct comparison of performance of children with ADHD using these two paradigms. METHOD: Two experimental paradigms, single-choice and temporal discounting, each including two delay conditions (13 and 25 s), were administered to 7- to 9-year-old children with ADHD (n = 17) and matched controls (n = 24). RESULTS: Individuals with ADHD chose more SS rewards than controls on both tasks, but in the long delay condition only. CONCLUSION: These findings demonstrate that delay durations rather than paradigm types determine laboratory-based measures of choice impulsivity in ADHD.

Subjects

Attention Deficit Disorder with Hyperactivity/psychology, Delay Discounting/physiology, Reward, Attention/physiology, Child, Female, Humans, Impulsive Behavior/physiology, Male, Neuropsychological Tests
