ABSTRACT
Human face recognition is highly accurate and exhibits a number of distinctive and well-documented behavioral "signatures" such as the use of a characteristic representational space, the disproportionate performance cost when stimuli are presented upside down, and the drop in accuracy for faces from races the participant is less familiar with. These and other phenomena have long been taken as evidence that face recognition is "special". But why does human face perception exhibit these properties in the first place? Here, we use deep convolutional neural networks (CNNs) to test the hypothesis that all of these signatures of human face perception result from optimization for the task of face recognition. Indeed, as predicted by this hypothesis, these phenomena are all found in CNNs trained on face recognition, but not in CNNs trained on object recognition, even when additionally trained to detect faces while matching the amount of face experience. To test whether these signatures are in principle specific to faces, we optimized a CNN on car discrimination and tested it on upright and inverted car images. As we found for face perception, the car-trained network showed a drop in performance for inverted vs. upright cars. Similarly, CNNs trained on inverted faces produced an inverted face inversion effect. These findings show that the behavioral signatures of human face perception reflect and are well explained as the result of optimization for the task of face recognition, and that the nature of the computations underlying this task may not be so special after all.
Subjects
Facial Recognition; Humans; Face; Visual Perception; Orientation, Spatial; Automobiles; Pattern Recognition, Visual
ABSTRACT
We demonstrate that a neural network pretrained on text and fine-tuned on code solves mathematics course problems, explains solutions, and generates questions at a human level. We automatically synthesize programs using few-shot learning and OpenAI's Codex transformer and execute them to solve course problems at 81% automatic accuracy. We curate a dataset of questions from Massachusetts Institute of Technology (MIT)'s largest mathematics courses (Single Variable and Multivariable Calculus, Differential Equations, Introduction to Probability and Statistics, Linear Algebra, and Mathematics for Computer Science) and Columbia University's Computational Linear Algebra. We solve questions from a MATH dataset (on Prealgebra, Algebra, Counting and Probability, Intermediate Algebra, Number Theory, and Precalculus), the latest benchmark of advanced mathematics problems designed to assess mathematical reasoning. We randomly sample questions and generate solutions with multiple modalities, including numbers, equations, and plots. The latest GPT-3 language model pretrained on text automatically solves only 18.8% of these university questions using zero-shot learning and 30.8% using few-shot learning and the most recent chain of thought prompting. In contrast, program synthesis with few-shot learning using Codex fine-tuned on code generates programs that automatically solve 81% of these questions. Our approach improves the previous state-of-the-art automatic solution accuracy on the benchmark topics from 8.8 to 81.1%. We perform a survey to evaluate the quality and difficulty of generated questions. This work automatically solves university-level mathematics course questions at a human level and explains and generates university-level mathematics course questions at scale, a milestone for higher education.
Subjects
Mathematics; Neural Networks, Computer; Problem Solving; Humans; Massachusetts; Universities
ABSTRACT
Epidemiologists are attempting to address research questions of increasing complexity by developing novel methods for combining information from diverse sources. Cole et al. (Am J Epidemiol. 2023;192(3):467-474) provide 2 examples of the process of combining information to draw inferences about a population proportion. In this commentary, we consider combining information to learn about a target population as an epidemiologic activity and distinguish it from more conventional meta-analyses. We examine possible rationales for combining information and discuss broad methodological considerations, with an emphasis on study design, assumptions, and sources of uncertainty.
Subjects
Epidemiologic Methods; Humans; Meta-Analysis as Topic; Epidemiologic Studies; Epidemiologic Research Design; Uncertainty
ABSTRACT
Lung cancer is the leading cause of cancer death in the US and globally. The mortality from lung cancer has been declining, due to a reduction in incidence and advances in treatment. Although recent success in developing targeted and immunotherapies for lung cancer has benefitted patients, it has also expanded the complexity of potential treatment options for health care providers. To aid in reducing such complexity, experts in oncology convened a conference (Bridging the Gaps in Lung Cancer) to identify current knowledge gaps and controversies in the diagnosis, treatment, and outcomes of various lung cancer scenarios, as described here. Such scenarios relate to biomarkers and testing in lung cancer, small cell lung cancer, EGFR mutations and targeted therapy in non-small cell lung cancer (NSCLC), early-stage NSCLC, KRAS/BRAF/MET and other genomic alterations in NSCLC, and immunotherapy in advanced NSCLC.
ABSTRACT
The unprecedented demand for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) testing led to challenges in prioritizing and processing specimens efficiently. We describe and evaluate a novel workflow using provider- and patient-facing ask at order entry (AOE) questions to generate distinctive icons on specimen labels for within-laboratory clinical decision support (CDS) for specimen triaging. A multidisciplinary committee established target turnaround times (TATs) for SARS-CoV-2 nucleic acid amplification tests (NAATs) based on common clinical scenarios. A set of AOE questions was used to collect relevant clinical information that prompted icon generation for triaging SARS-CoV-2 NAAT specimens. We assessed the collect-to-verify TATs among relevant clinical scenarios. Our study included a total of 1,385,813 SARS-CoV-2 NAATs conducted from March 2020 to June 2022. Most testing met the TAT targets established by institutional committees, but deviations from target TATs occurred during periods of high demand and supply shortages. Median TATs for emergency department (ED) and inpatient specimens and ambulatory pre-procedure populations were stable over the pandemic. However, healthcare worker and other ambulatory test TATs varied substantially, depending on testing volume and community transmission rates. Median TAT significantly differed throughout the pandemic for ED and inpatient clinical scenarios, and there were significant differences in TAT among label icon-signified ambulatory clinical scenarios. We describe a novel approach to CDS for triaging specimens within the laboratory. The use of CDS tools could help clinical laboratories prioritize and process specimens efficiently, especially during times of high demand. Further studies are needed to evaluate the impact of our CDS tool on overall laboratory efficiency and patient outcomes.
IMPORTANCE: We describe a novel approach to clinical decision support (CDS) for triaging specimens within the clinical laboratory for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) nucleic acid amplification tests (NAATs). The use of our CDS tool could help clinical laboratories prioritize and process specimens efficiently, especially during times of high demand. There were significant differences in the turnaround time for specimens differentiated by icons on specimen labels. Further studies are needed to evaluate the impact of our CDS tool on overall laboratory efficiency and patient outcomes.
Subjects
COVID-19; Decision Support Systems, Clinical; Laboratories, Hospital; Humans; SARS-CoV-2/genetics; COVID-19/diagnosis; Retrospective Studies; Workflow; Nucleic Acid Amplification Techniques
ABSTRACT
BACKGROUND: In an effort to improve migraine management around the world, the International Headache Society (IHS) has here developed a list of practical recommendations for the acute pharmacological treatment of migraine. The recommendations are categorized into optimal and essential, in order to provide treatment options for all possible settings, including those with limited access to migraine medications. METHODS: An IHS steering committee developed a list of clinical questions based on practical issues in the management of migraine. A selected group of international senior and junior headache experts developed the recommendations, following expert consensus and the review of available national and international headache guidelines and guidance documents. Following the initial search, a bibliography of twenty-one national and international guidelines was created and reviewed by the working group. RESULTS: A total of seventeen questions addressing different aspects of acute migraine treatment have been outlined. For each of them we provide an optimal recommendation, to be used whenever possible, and an essential recommendation to be used when the optimal level cannot be attained. CONCLUSION: Adoption of these international recommendations will improve the quality of acute migraine treatment around the world, even where pharmacological options remain limited.
Subjects
Migraine Disorders; Migraine Disorders/drug therapy; Humans; Analgesics/therapeutic use; Societies, Medical/standards
ABSTRACT
PURPOSE: Artificial intelligence, specifically large language models such as ChatGPT, offers valuable potential benefits in question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions using ChatGPT in terms of item difficulty and discrimination levels. METHODS: This study involved 99 fourth-year medical students who participated in a rational pharmacotherapy clerkship based on the WHO 6-Step Model. In response to a prompt that we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. Following an expert panel review, two of these multiple-choice questions were incorporated into a medical school exam without any changes. Based on the administration of the test, we evaluated their psychometric properties, including item difficulty, item discrimination (point-biserial correlation), and functionality of the options. RESULTS: Both questions exhibited acceptable point-biserial correlations (0.41 and 0.39), above the threshold of 0.30. However, one question had three non-functional options (options chosen by fewer than 5% of the exam participants) while the other question had none. CONCLUSIONS: The findings showed that the questions can effectively differentiate between students who perform at high and low levels, which also points to the potential of ChatGPT as an artificial intelligence tool in test development. Future studies may use the prompt to generate items and enhance the external validity of the results by gathering data from diverse institutions and settings.
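The psychometric quantities named in this abstract can be illustrated with a minimal sketch. It assumes the textbook definitions (item difficulty as the proportion correct, and an uncorrected point-biserial correlation against the total score); the abstract does not say which variant the authors computed, and all function names and data here are hypothetical.

```python
from math import sqrt


def item_difficulty(item_scores):
    """Proportion of examinees who answered the item correctly (0 to 1)."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and total test scores.

    Uses r = (M1 - M0) / s * sqrt(p * q), where s is the population SD of
    the total scores. This is an uncorrected item-total correlation (an
    assumption; the study may have used a corrected version).
    """
    n = len(item_scores)
    right = [t for i, t in zip(item_scores, total_scores) if i == 1]
    wrong = [t for i, t in zip(item_scores, total_scores) if i == 0]
    if not right or not wrong:
        raise ValueError("item needs both correct and incorrect responses")
    mean_all = sum(total_scores) / n
    sd_all = sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    if sd_all == 0:
        raise ValueError("total scores have zero variance")
    p = len(right) / n
    m1 = sum(right) / len(right)
    m0 = sum(wrong) / len(wrong)
    return (m1 - m0) / sd_all * sqrt(p * (1 - p))


def non_functional_options(responses, options, key, threshold=0.05):
    """Distractors chosen by fewer than `threshold` of examinees."""
    n = len(responses)
    return [o for o in options
            if o != key and responses.count(o) / n < threshold]
```

Under these definitions, an item with a point-biserial value above 0.30 would meet the threshold the abstract cites, and a distractor picked by fewer than 5% of examinees would be flagged as non-functional.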
Subjects
Hypertension; Students, Medical; Humans; Artificial Intelligence; Schools, Medical
ABSTRACT
BACKGROUND: Enhanced communication in end-of-life (EOL) care improves preparation and treatment decisions for patients with advanced cancer, affecting their quality of life at the end of life. Question prompt lists (QPLs) have been shown to enhance physician-patient communication in patients with cancer, but there is a lack of systematic review and meta-analysis for those with advanced cancer. OBJECTIVE: To review the effectiveness of QPL intervention on physician-patient communication and health outcomes during consultation in patients with advanced cancer. METHODS: The CINAHL, Embase, Scopus, and PsycINFO databases were searched using inclusion criteria for relevant articles up to August 2021. Pooled standardized mean differences (SMDs) and 95% confidence intervals (CIs) were calculated using random-effects models. We used the Cochrane risk-of-bias assessment tool and modified Jadad scale to assess the quality of the studies. RESULTS: Seven RCTs with 1059 participants were included, of which six studies were eligible for the meta-analysis. The pooled meta-analysis results indicated that QPL in patients with advanced cancer had a significant positive effect on the total number of questions asked (SMD, 0.73; 95% CI, 0.28 to 1.18; I² = 83%) and on the patients' expectations for the future (SMD, 0.67; 95% CI, 0.08 to 1.25; I² = 88%). There were no significant improvements in health-related outcomes such as end of life, anxiety, and quality of life. CONCLUSIONS: Using QPL in advanced cancer consultations increases the number of questions patients ask, which helps communication, but does not improve health-related indicators. Optimal results depend on full reading, but timing varies. Future research should examine the relationship between communication and health outcomes, including patient/physician behavior and social context.
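For readers unfamiliar with how a pooled SMD, its 95% CI, and I² are obtained, the following is a minimal sketch of random-effects pooling. It assumes the DerSimonian-Laird estimator, which is one common choice; the abstract only says "random-effects models", so the estimator, the function name, and any inputs are assumptions rather than the authors' procedure.

```python
from math import sqrt


def random_effects_pool(effects, variances):
    """Pool study effects (e.g., SMDs) with a DerSimonian-Laird model.

    effects   -- per-study effect sizes
    variances -- per-study sampling variances of those effects
    Returns the pooled effect, a 95% CI, tau^2, and I^2 (percent).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]            # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance tau^2
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

An I² in the 80% range, as reported here, means that under this model most of the observed variation between study estimates is attributed to heterogeneity rather than sampling error.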
Subjects
Communication; Neoplasms; Physician-Patient Relations; Quality of Life; Terminal Care; Humans; Neoplasms/psychology; Neoplasms/therapy; Terminal Care/methods; Terminal Care/psychology; Randomized Controlled Trials as Topic
ABSTRACT
Given the high prevalence of multiple-choice examinations with formula scoring in medical training, several studies have tried to identify factors other than students' degree of knowledge that influence their response patterns. This study aims to measure the effect of students' attitudes towards risk and ambiguity on their number of correct, wrong, and blank answers. In October 2018, 233 third-year medical students from the Faculty of Medicine of the University of Porto, in Porto, Portugal, completed a questionnaire which assessed their attitudes towards risk and ambiguity, and aversion to ambiguity in medicine. Simple and multiple regression models and the respective regression coefficients were used to measure the association between the students' attitudes and their answers in two examinations that they had taken in June 2018. Having an intermediate level of ambiguity aversion in medicine (as opposed to a very high or low level) was associated with a significant increase in the number of correct answers and a decrease in the number of blank answers in the first examination. In the second examination, high levels of ambiguity aversion in medicine were associated with a decrease in the number of wrong answers. Attitude towards risk, tolerance for ambiguity, and gender did not show a significant association with the number of correct, wrong, and blank answers for either examination. Students' ambiguity aversion in medicine is correlated with their performance in multiple-choice examinations with negative marking. Therefore, we suggest planning and implementing counselling sessions with medical students regarding the possible impact of ambiguity aversion on their performance in multiple-choice questions with negative marking.
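To make the link between formula scoring and risk attitudes concrete, here is a minimal sketch of the conventional correction-for-guessing rule (score = correct - wrong / (k - 1) for k answer options, blanks scoring zero). The abstract does not state the exact penalty used in these examinations, so the rule, the function names, and the numbers are assumptions for illustration.

```python
def formula_score(correct, wrong, n_options):
    """Classic formula score: each wrong answer costs 1/(k - 1) points.

    Blank answers contribute nothing, so they do not appear here.
    """
    return correct - wrong / (n_options - 1)


def expected_gain_from_guessing(n_options, eliminated=0):
    """Expected points from guessing among the options not yet ruled out.

    Assumes the correct answer is still among the remaining options and
    that the student picks uniformly at random from them.
    """
    remaining = n_options - eliminated
    p_correct = 1.0 / remaining
    penalty = 1.0 / (n_options - 1)
    return p_correct - (1.0 - p_correct) * penalty
```

Under this rule a blind guess has an expected value of zero, so whether a student answers or leaves the item blank depends on attitude towards risk and ambiguity rather than on expected score, which is the behavior the study examines.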
Subjects
Educational Measurement; Students, Medical; Humans; Portugal; Cross-Sectional Studies; Female; Male; Students, Medical/psychology; Schools, Medical; Education, Medical, Undergraduate; Young Adult; Surveys and Questionnaires; Adult; Risk; Attitude
ABSTRACT
Adult verbal input occurs frequently during parent-child interactions. However, few studies have considered how parent language varies across informal STEM (science, technology, engineering, and math) activities. In this study, we examined how open and closed parent questions (a) differed across three STEM activities and (b) related to math, science, and vocabulary knowledge in their preschool-aged children. A total of 173 parents and their preschool children (mean age = 4 years) from lower socioeconomic households were video-recorded participating in three STEM-related activities: (a) a pretend grocery store activity, (b) a bridge-building challenge, and (c) a book read about a science topic. Parent questions were categorized as open or closed according to the presence of key question terms. Results indicate that the three activities elicited different frequencies of parent open and closed questions, with the grocery store activity containing the most open and closed questions. Children's science knowledge was predicted by the frequency and proportion of parent open questions during the book read. These results enhance our understanding of the role of parent questions in young children's language environments in different informal learning contexts.
Subjects
Engineering; Learning; Mathematics; Parent-Child Relations; Science; Technology; Humans; Child, Preschool; Male; Female; Mathematics/education; Science/education; Engineering/education; Adult; Parents/psychology; Vocabulary
ABSTRACT
ChatGPT's role in creating multiple-choice questions (MCQs) is growing but the validity of these artificial-intelligence-generated questions is unclear. This literature review was conducted to address the urgent need for understanding the application of ChatGPT in generating MCQs for medical education. Following the database search and screening of 1920 studies, we found 23 relevant studies. We extracted the prompts for MCQ generation and assessed the validity evidence of MCQs. The findings showed that prompts varied, including referencing specific exam styles and adopting specific personas, which align with recommended prompt engineering tactics. The validity evidence covered various domains, showing mixed accuracy rates, with some studies indicating comparable quality to human-written questions, and others highlighting differences in difficulty and discrimination levels, alongside a significant reduction in question creation time. Despite its efficiency, we highlight the necessity of careful review and suggest a need for further research to optimize the use of ChatGPT in question generation. MAIN MESSAGES: (1) Ensure high-quality outputs by utilizing well-designed prompts; medical educators should prioritize the use of detailed, clear ChatGPT prompts when generating MCQs. (2) Avoid using ChatGPT-generated MCQs directly in examinations without thorough review to prevent inaccuracies and ensure relevance. (3) Leverage ChatGPT's potential to streamline the test development process, enhancing efficiency without compromising quality.
ABSTRACT
The human ability to produce and understand an indefinite number of sentences is driven by syntax, a cognitive system that can combine a finite number of primitive linguistic elements to build arbitrarily complex expressions. The expressive power of syntax comes in part from its ability to encode potentially unbounded dependencies over abstract structural configurations. How does such a system develop in human minds? We show that 18-mo-old infants are capable of representing abstract nonlocal dependencies, suggesting that a core property of syntax emerges early in development. Our test case is English wh-questions, in which a fronted wh-phrase can act as the argument of a verb at a distance (e.g., What did the chef burn?). Whereas prior work has focused on infants' interpretations of these questions, we introduce a test to probe their underlying syntactic representations, independent of meaning. We ask when infants know that an object wh-phrase and a local object of a verb cannot co-occur because they both express the same argument relation (e.g., *What did the chef burn the pizza). We find that 1) 18-mo-olds demonstrate awareness of this complementary distribution pattern and thus represent the nonlocal grammatical dependency between the wh-phrase and the verb, but 2) younger infants do not. These results suggest that the second year of life is a period of active syntactic development, during which the computational capacities for representing nonlocal syntactic dependencies become evident.
Subjects
Language Development; Speech/physiology; Cognition/physiology; Comprehension; Female; Humans; Infant; Male
ABSTRACT
BACKGROUND: The Bernese Periacetabular Osteotomy (PAO) has become a popular surgery for correcting developmental dysplasia of the hip, yet the most common concerns of the PAO population remain unclear. The aim of this study was to investigate Facebook, Instagram and Twitter to better understand the most common preoperative and postoperative questions asked by patients undergoing PAO. We hypothesized that most questions would be asked by patients in the preoperative timeframe and would concern education surrounding PAO surgery. METHODS: Facebook, Instagram and Twitter were queried consecutively from February 1, 2023 to November 23, 2011. Facebook was searched for the two most populated interest groups: "Periacetabular Osteotomy (PAO)" and "Periacetabular Osteotomy Australia". Instagram and Twitter were queried for the most popular hashtags: "#PAOwarrior", "#PAOsurgery", "#periacetabularosteotomy", "#periacetabularosteotomyrecovery", and "#paorecovery". Patient questions were categorized as preoperative or postoperative and then grouped into specific themes within each category. RESULTS: Two thousand five hundred and fifty-nine posts were collected, with 849 (33%) posts containing 966 questions. Of the 966 questions, 443 (45.9%) and 523 (54.1%) were preoperative and postoperative questions, respectively. The most common postoperative questions were related to complications (23%) and symptom management (21%). Other postoperative questions included recovery/rehabilitation (21%) and general postoperative questions (18%). The most common preoperative questions were related to PAO education (23%). Other preoperative question topics included rehabilitation (19%), hip dysplasia education (17%), and surgeon selection (12%). Most questions came from Facebook posts. Of 1,054 Facebook posts, 76% were either preoperative or postoperative questions and 87% were from the perspective of the patient.
CONCLUSION: The majority of patients in the PAO population sought advice on postoperative complications and symptom management. Some patients asked about education surrounding PAO surgery. Understanding the most common concerns and questions patients have can help providers educate patients and focus on more patient-relevant perioperative conversations.
Subjects
Hip Dislocation; Social Media; Humans; Acetabulum/surgery; Treatment Outcome; Retrospective Studies; Hip Dislocation/surgery; Osteotomy/adverse effects; Postoperative Complications/epidemiology; Postoperative Complications/etiology; Postoperative Complications/surgery; Hip Joint/surgery
ABSTRACT
BACKGROUND: Health information consumers increasingly rely on question-and-answer (Q&A) communities to address their health concerns. However, the quality of questions posted significantly impacts the likelihood and relevance of received answers. OBJECTIVE: This study aims to improve our understanding of the quality of health questions within web-based Q&A communities. METHODS: We develop a novel framework for defining and measuring question quality within web-based health communities, incorporating content- and language-based variables. This framework leverages k-means clustering and establishes automated metrics to assess overall question quality. To validate our framework, we analyze questions related to kidney disease from expert-curated and community-based Q&A platforms. Expert evaluations confirm the validity of our quality construct, while regression analysis helps identify key variables. RESULTS: High-quality questions were more likely to include demographic and medical information than lower-quality questions (P<.001). In contrast, asking questions at the various stages of disease development was less likely to reflect high-quality questions (P<.001). Low-quality questions were generally shorter with lengthier sentences than high-quality questions (P<.01). CONCLUSIONS: Our findings empower consumers to formulate more effective health information questions, ultimately leading to better engagement and more valuable insights within web-based Q&A communities. Furthermore, our findings provide valuable insights for platform developers and moderators seeking to enhance the quality of user interactions and foster a more trustworthy and informative environment for health information exchange.
Subjects
Consumer Health Information; Humans; Consumer Health Information/standards; Language; Internet; Surveys and Questionnaires/standards
ABSTRACT
The 2022 multi-country Monkeypox (Mpox) outbreak has raised new concerns for scientific research. However, unanswered questions about the disease remain. These unanswered questions concern different aspects, such as transmission, the affected community, clinical presentations, infection prevention and control, and treatment and vaccination. It is imperative to address these issues to stop the spread and transmission of disease. We documented unanswered questions about Mpox and offered suggestions that could help put health policy into practice. One of those questions is why gay, bisexual or other men who have sex with men (gbMSM) are the most affected community, underscoring the importance of prioritizing this community regarding treatment, vaccination and post-exposure prophylaxis. In addition, destigmatizing gbMSM and implementing community-based gbMSM consultation and action alongside ethical surveillance can facilitate other preventive measures such as ring vaccination to curb disease transmission and track vaccine efficacy. Relatedly, vaccine and drug side effects have raised questions about their use and highlighted the importance of health policy development regarding expanded access and off-label use, as well as the need for safe drug and vaccine development and manufacturing. The possibility of reverse zoonosis has also been raised, indicating the need to screen not only humans but also their related animals to understand the real magnitude of reverse zoonosis and its potential risks. Implementing infection prevention and control measures, including a One Health approach, to stop virus circulation at the human-animal interface is essential.
Subjects
Mpox; Sexual and Gender Minorities; Animals; Male; Humans; Homosexuality, Male; Health Policy; Zoonoses
ABSTRACT
PURPOSE: To examine and compare ChatGPT versus Google websites in answering common head and neck cancer questions. MATERIALS AND METHODS: Commonly asked questions about head and neck cancer were obtained and entered into both ChatGPT-4 and the Google search engine. For each question, the ChatGPT response and first website search result were compiled and examined. Content quality was assessed by independent reviewers using standardized grading criteria and the modified Ensuring Quality Information for Patients (EQIP) tool. Readability was determined using the Flesch reading ease scale. RESULTS: In total, 49 questions related to head and neck cancer were included. Google sources were on average significantly higher quality than ChatGPT responses (4.2 vs 3.6, p = 0.005). According to the EQIP tool, Google and ChatGPT had on average similar response rates per criterion (24.4 vs 20.5, p = 0.09) while Google had a significantly higher average score per question than ChatGPT (13.8 vs 11.7, p < 0.001). According to the Flesch reading ease scale, ChatGPT and Google sources were both considered similarly difficult to read (33.1 vs 37.0, p = 0.180) and at a college level (14.3 vs 14.2, p = 0.820). CONCLUSION: ChatGPT responses were as challenging to read as Google sources, but poorer quality due to decreased reliability and accuracy in answering questions. Though promising, ChatGPT in its current form should not be considered dependable. Google sources are a preferred resource for patient educational materials.
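As context for the readability scores above, the Flesch reading ease formula can be sketched as follows. The sketch takes word, sentence, and syllable counts as inputs because syllable counting is tool-dependent; the function names are hypothetical, and the interpretation bands follow the conventional Flesch table rather than anything stated in the abstract.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch reading ease: higher scores mean easier text.

    FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_band(score):
    """Map a score to the conventional Flesch difficulty bands."""
    if score >= 90:
        return "very easy"
    if score >= 80:
        return "easy"
    if score >= 70:
        return "fairly easy"
    if score >= 60:
        return "standard"
    if score >= 50:
        return "fairly difficult"
    if score >= 30:
        return "difficult (college level)"
    return "very difficult (college graduate level)"
```

On this scale the reported means of 33.1 and 37.0 both fall in the 30 to 50 band, which is consistent with the abstract's description of both sources as difficult to read.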
Subjects
Head and Neck Neoplasms; Humans; Reproducibility of Results; Head and Neck Neoplasms/therapy; Search Engine
ABSTRACT
BACKGROUND: Crafting quality assessment questions in medical education is a crucial yet time-consuming, expertise-driven undertaking that calls for innovative solutions. Large language models (LLMs), such as ChatGPT (Chat Generative Pre-Trained Transformer), present a promising yet underexplored avenue for such innovations. AIMS: This study explores the utility of ChatGPT to generate diverse, high-quality medical questions, focusing on multiple-choice questions (MCQs) as an illustrative example, to increase educators' productivity and enable self-directed learning for students. DESCRIPTION: Leveraging 12 strategies, we demonstrate how ChatGPT can be effectively used to generate assessment questions aligned with Bloom's taxonomy and core knowledge domains while promoting best practices in assessment design. CONCLUSION: Integrating LLM tools like ChatGPT into generating medical assessment questions like MCQs augments but does not replace human expertise. With continual instruction refinement, AI can produce high-standard questions. Yet, the onus of ensuring ultimate quality and accuracy remains with subject matter experts, affirming the irreplaceable value of human involvement in the artificial intelligence-driven education paradigm.
Subjects
Artificial Intelligence; Education, Medical; Educational Measurement; Humans; Education, Medical/methods; Educational Measurement/methods
ABSTRACT
Constructed-response questions (CRQs) require effective marking schemes to ensure that the intended learning objectives and/or professional competencies are appropriately addressed, and valid inferences regarding examinee competence are drawn from such assessments. While the educational literature on writing rubrics has proliferated in recent years, this is largely targeted at classroom use and formative purposes. There is comparatively little guidance on how to develop appropriate marking schemes for summative assessment contexts. The different purposes mean that different principles and practices apply to mark schemes for examinations. In this article, we draw on the educational literature as well as our own practical experience of working with medical and health professional educators on their questions and marking schemes to offer 12 key principles or tips for designing and implementing effective marking schemes.
Subjects
Educational Measurement; Educational Measurement/methods; Humans; Education, Medical/methods; Education, Medical/organization & administration; Clinical Competence
ABSTRACT
PURPOSE: The purpose of this study was to enrich understanding about the perceived benefits and drawbacks of constructed response short-answer questions (CR-SAQs) in preclerkship assessment using Norcini's criteria for good assessment as a framework. METHODS: This multi-institutional study surveyed students and faculty at three institutions. A survey using Likert scale and open-ended questions was developed to evaluate faculty and student perceptions of CR-SAQs using the criteria of good assessment to determine the benefits and drawbacks. Descriptive statistics and Chi-square analyses are presented, and open responses were analyzed using directed content analysis to describe benefits and drawbacks of CR-SAQs. RESULTS: A total of 260 students (19%) and 57 faculty (48%) completed the survey. Students and faculty report that the benefits of CR-SAQs are authenticity, deeper learning (educational effect), and receiving feedback (catalytic effect). Drawbacks included feasibility, construct validity, and scoring reproducibility. Students and faculty found CR-SAQs to be both acceptable (can show your reasoning, partial credit) and unacceptable (stressful, not USMLE format). CONCLUSIONS: CR-SAQs are a method of aligning innovative curricula with assessment and could enrich the assessment toolkit for medical educators.
Subjects
Education, Medical, Undergraduate; Students, Medical; Humans; Curriculum; Faculty; Learning; Reproducibility of Results
ABSTRACT
WHAT IS THE EDUCATIONAL CHALLENGE?: A fundamental challenge in medical education is creating high-quality, clinically relevant multiple-choice questions (MCQs). ChatGPT-based automatic item generation (AIG) methods need well-designed prompts. However, the use of these prompts is hindered by the time-consuming process of copying and pasting, a lack of know-how among medical teachers, and the generalist nature of standard ChatGPT, which often lacks the medical context. WHAT ARE THE PROPOSED SOLUTIONS?: The Case-based MCQ Generator, a custom GPT, addresses these challenges. It has been built with GPT Builder, a platform designed by OpenAI for customizing ChatGPT to meet specific needs, to allow users to generate case-based MCQs. With this tool, which is free for those who have a ChatGPT Plus subscription, health professions educators can easily select a prompt, input a learning objective or item-specific test point, and generate clinically relevant questions. WHAT ARE THE POTENTIAL BENEFITS TO A WIDER GLOBAL AUDIENCE?: It enhances the efficiency of MCQ generation and ensures the generation of contextually relevant questions, surpassing the capabilities of standard ChatGPT. It streamlines the MCQ creation process by integrating prompts published in medical education literature, eliminating the need for manual prompt input. WHAT ARE THE NEXT STEPS?: Future development aims at sustainability and addressing ethical and accessibility issues. It requires regular updates, integration of new prompts from emerging health professions education literature, and a supportive digital ecosystem around the tool. Accessibility, especially for educators in low-resource countries, is vital, demanding alternative access models to overcome financial barriers.