Results 1 - 20 of 3,376
1.
JMIR Med Educ ; 10: e52784, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39140269

ABSTRACT

Background: With the increasing application of large language models (LLMs) such as ChatGPT across industries, their potential in the medical domain, especially in standardized examinations, has become a focal point of research. Objective: The aim of this study was to assess the clinical performance of ChatGPT, focusing on its accuracy and reliability on the Chinese National Medical Licensing Examination (CNMLE). Methods: The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical subspecialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the model version (GPT-3.5 vs GPT-4.0), whether the prompt designated a system role tailored to the medical subspecialty, and repetition for coherence. The passing accuracy threshold was set at 60%. χ² tests and κ values were used to evaluate the model's accuracy and consistency. Results: GPT-4.0 achieved a passing accuracy of 72.7%, significantly higher than that of GPT-3.5 (54%; P<.001). The variability rate of repeated responses from GPT-4.0 was lower than that of GPT-3.5 (9% vs 19.5%; P<.001), and both models showed relatively good response coherence, with κ values of 0.778 and 0.610, respectively. System roles numerically increased accuracy for both GPT-4.0 (0.3%-3.7%) and GPT-3.5 (1.3%-4.5%) and reduced variability by 1.7% and 1.8%, respectively, although these differences were not significant (P>.05). In subgroup analysis, ChatGPT achieved comparable accuracy across question types (P>.05). GPT-4.0 surpassed the accuracy threshold in 14 of 15 subspecialties on the first response, while GPT-3.5 did so in 7 of 15. Conclusions: GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in key areas such as accuracy, consistency, and medical subspecialty coverage. Adding a system role enhanced the model's reliability and answer coherence, though not significantly. GPT-4.0 shows promising potential in medical education and clinical practice and merits further study.
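The two statistics this abstract leans on (a χ² test for the accuracy difference and Cohen's κ for answer consistency) are straightforward to reproduce. A minimal Python sketch with data simulated at the reported rates; the arrays are stand-ins, not the study's actual responses.

```python
# Sketch: compare model accuracies with a chi-square test and measure
# answer consistency across repeated runs with Cohen's kappa.
# Data are simulated at the reported rates, not the study's responses.
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
n_questions = 500
gpt4_correct = rng.random(n_questions) < 0.727   # simulated at 72.7% accuracy
gpt35_correct = rng.random(n_questions) < 0.54   # simulated at 54% accuracy

# 2x2 contingency table: model x (correct, incorrect)
table = [
    [gpt4_correct.sum(), (~gpt4_correct).sum()],
    [gpt35_correct.sum(), (~gpt35_correct).sum()],
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4g}")

# Consistency between two repeated runs of the same model
run1 = rng.integers(0, 4, n_questions)           # answer choice A-D per question
run2 = np.where(rng.random(n_questions) < 0.91, run1,
                rng.integers(0, 4, n_questions)) # ~9% of answers change
print(f"kappa={cohen_kappa_score(run1, run2):.3f}")
```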


Subject(s)
Educational Measurement, Licensure, Medical, Humans, China, Educational Measurement/methods, Educational Measurement/standards, Reproducibility of Results, Clinical Competence/standards
3.
Can Med Educ J ; 15(3): 100-103, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39114780

ABSTRACT

Validity as a social imperative foregrounds the social consequences of assessment and highlights the importance of building quality into assessment development and monitoring processes. Validity as a social imperative is informed by current assessment trends such as programmatic, longitudinal, and rater-based assessment, and is one of the conceptualizations of validity currently at play in the Health Professions Education (HPE) literature. This Black Ice is intended to help readers get a grip on how to embed principles of validity as a social imperative in the development and quality monitoring of an assessment. This piece draws on a program of work investigating validity as a social imperative, key HPE literature, and data generated through stakeholder interviews. We describe eight ways to implement validation practices that align with validity as a social imperative.




Subject(s)
Educational Measurement, Humans, Reproducibility of Results, Educational Measurement/methods, Educational Measurement/standards, Health Occupations/education
4.
GMS J Med Educ ; 41(3): Doc30, 2024.
Article in English | MEDLINE | ID: mdl-39131892

ABSTRACT

Objectives: Situational Judgement Tests (SJTs) are a cost-efficient method for the assessment of personal characteristics (e.g., empathy, professionalism, ethical thinking) in medical school admission. Recently, complex open-ended response format SJTs have become more feasible to conduct. However, research on their applicability to a German context is missing. This pilot study tests the acceptability, reliability, subgroup differences, and validity of an online SJT with an open-ended response format developed in Canada ("Casper"). Methods: German medical school applicants and students from Hamburg were invited to take Casper in 2020 and 2021. The test consisted of 12 video- and text-based scenarios, each followed by three open-ended questions. Participants subsequently evaluated their test experience in an online survey. Data on sociodemographic characteristics, other admission criteria (Abitur, TMS, HAM-Nat, HAM-SJT), and study success (OSCE) were available in a central research database (stav). Results: The full sample consisted of 582 participants. Test-takers' global perception of Casper was positive. Internal consistency was satisfactory in both years (α=0.73; 0.82), while interrater agreement was moderate (ICC(1,2)=0.54). Participants who were female (d=0.37) or did not have a migration background (d=0.40) received higher scores. Casper scores correlated with the HAM-SJT (r=.18) but not with performance on OSCE communication stations. The test was also related to Abitur grades (r=-.15), the TMS (r=.18), and HAM-Nat logical reasoning scores (r=.23). Conclusion: This study provides positive evidence for the acceptability, internal consistency, and convergent validity of Casper. The selection and training of raters, as well as the scenario content, require further observation and adjustment to the German context to improve interrater reliability and predictive validity.
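For readers unfamiliar with the two reliability indices reported (Cronbach's α and ICC(1,2), the one-way random, average-measures intraclass correlation), here is a minimal sketch computing both from scratch; the score matrices are simulated placeholders, not the Casper data.

```python
# Sketch: Cronbach's alpha for internal consistency and ICC(1,k)
# (one-way random, average of k raters) for interrater agreement.
# Data are simulated placeholders, not the study's scores.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_persons, n_items) matrix of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def icc_1k(ratings: np.ndarray) -> float:
    """ratings: (n_targets, k_raters); one-way random, average measures."""
    n, k = ratings.shape
    grand = ratings.mean()
    ms_between = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_within = ((ratings - ratings.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / ms_between

rng = np.random.default_rng(1)
true_ability = rng.normal(size=(200, 1))
scenario_scores = true_ability + rng.normal(scale=1.0, size=(200, 12))  # 12 scenarios
rater_scores = true_ability + rng.normal(scale=0.9, size=(200, 2))      # 2 raters
print(f"alpha={cronbach_alpha(scenario_scores):.2f}, ICC(1,2)={icc_1k(rater_scores):.2f}")
```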


Subject(s)
School Admission Criteria, Schools, Medical, Students, Medical, Humans, Germany, Female, Male, Pilot Projects, Reproducibility of Results, Students, Medical/psychology, Students, Medical/statistics & numerical data, Adult, Judgment, Educational Measurement/methods, Educational Measurement/standards, Surveys and Questionnaires, Young Adult, Empathy, Professionalism/standards
5.
Am J Pharm Educ ; 88(8): 100757, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38996841

ABSTRACT

OBJECTIVE: To determine the impact of item-writing flaws and cognitive level on student performance metrics in 1 course series across 2 semesters at a single institution. METHODS: Four investigators reviewed 928 multiple-choice items from an integrated therapeutics course series. Differences in performance metrics were examined between flawed and standard items, between flawed stems and flawed answer choices, and across cognitive levels. RESULTS: Reviewers found that 80% of the items were flawed, the most common flaw types being implausible distractors and unfocused stems. Flawed items were generally easier than standard ones, but the type of flaw significantly affected difficulty: items with flawed stems had the same difficulty as standard items, whereas those with flawed answer choices were significantly easier. Most items tested lower-level skills and had more flaws than higher-level items. There was no significant difference in difficulty between lower- and higher-level cognitive items, and higher-level items were more likely to have flawed answer choices than flawed stems. CONCLUSION: Item-writing flaws impact student performance differently depending on the type of flaw. Implausible distractors artificially lower the difficulty of questions, even those designed to assess higher-level skills; this effect contributes to the lack of a significant difference in difficulty between higher- and lower-level items. Unfocused stems, on the other hand, likely increase confusion and hinder performance, regardless of a question's cognitive complexity.
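A conventional way to surface flawed items like those described is classic item analysis: difficulty (proportion correct) and point-biserial discrimination against the rest of the test. A sketch assuming dichotomously scored responses; the data and flag thresholds are illustrative, not the study's.

```python
# Sketch: classic item-analysis metrics used to flag problematic MCQ items.
# Difficulty = proportion correct; discrimination = point-biserial r between
# item score and total-minus-item score. Very easy items with near-zero
# discrimination often signal implausible distractors. Data are simulated.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(2)
responses = (rng.random((300, 40)) < 0.7).astype(int)  # 300 students x 40 items

for item in range(responses.shape[1]):
    score = responses[:, item]
    rest = responses.sum(axis=1) - score                # total minus this item
    difficulty = score.mean()
    discrimination, _ = pointbiserialr(score, rest)
    if difficulty > 0.9 or discrimination < 0.2:
        print(f"item {item}: p={difficulty:.2f}, r_pb={discrimination:.2f} (review)")
```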


Subject(s)
Education, Pharmacy, Educational Measurement, Students, Pharmacy, Humans, Educational Measurement/methods, Educational Measurement/standards, Education, Pharmacy/methods, Education, Pharmacy/standards, Curriculum, Cognition
6.
Acta Psychol (Amst) ; 248: 104399, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38996670

ABSTRACT

The demand for industrial engineers (IEs) across the globe is increasing significantly, and professional certifications confer an advantage in the job market. The Philippine Institute of Industrial Engineers (PIIE) - Industrial Engineering Certification Board facilitates the conferment of the Certified Industrial Engineer (CIE) credential in the Philippines. The goal of this study was to determine the factors affecting the intention of IEs in the Philippines to take the CIE examination, using an integration of the Perceived Value Theory and the Extended Theory of Planned Behavior. The research data were collected through an online survey questionnaire distributed to 690 graduating students and graduates from private and public universities across the Philippines. Employing variance-based partial least squares structural equation modeling, the significant variables and factors were assessed holistically. Attitude, perceived behavioral control, subjective norms, and understanding of the CIE examination had significant positive effects on the intention to take the CIE examination and become a CIE. The perceived benefits and positive emotions brought by becoming a CIE significantly affected attitude and behavior. This study also confirmed that the higher the perceived return on investment, the greater IEs' intention to take the CIE examination. Moreover, demographic characteristics were found to correlate significantly with several variables. The findings and the integrated framework can be utilized in future studies on professional development, career pathing, lifelong learning, and related professional education.
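The study fits a variance-based PLS-SEM; as a rough illustration of the underlying idea (predicting intention from the Theory of Planned Behavior constructs), here is a deliberately simplified ordinary-least-squares stand-in. The column names, effect sizes, and data are assumptions, and OLS is named plainly as a substitute for PLS-SEM, not the study's method.

```python
# Sketch: a simplified stand-in for the study's PLS-SEM analysis, namely an
# OLS regression of intention on Theory of Planned Behavior constructs.
# Construct scores and coefficients are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 690
df = pd.DataFrame({
    "attitude": rng.normal(size=n),
    "subjective_norms": rng.normal(size=n),
    "perceived_control": rng.normal(size=n),
    "understanding": rng.normal(size=n),
})
df["intention"] = (0.4 * df["attitude"] + 0.2 * df["subjective_norms"]
                   + 0.3 * df["perceived_control"] + 0.15 * df["understanding"]
                   + rng.normal(scale=0.8, size=n))

X = sm.add_constant(df[["attitude", "subjective_norms",
                        "perceived_control", "understanding"]])
model = sm.OLS(df["intention"], X).fit()
print(model.summary().tables[1])  # coefficient table, analogous to path weights
```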


Subject(s)
Certification, Engineering, Humans, Philippines, Engineering/standards, Engineering/education, Male, Female, Adult, Certification/standards, Surveys and Questionnaires, Attitude, Young Adult, Intention, Educational Measurement/standards
7.
J Med Internet Res ; 26: e60807, 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39052324

ABSTRACT

BACKGROUND: Over the past 2 years, researchers have used various medical licensing examinations to test whether ChatGPT (OpenAI) possesses accurate medical knowledge. The performance of each version of ChatGPT on medical licensing examinations in multiple settings has shown remarkable differences. At this stage, a comprehensive understanding of the variability in ChatGPT's performance across different medical licensing examinations is still lacking. OBJECTIVE: In this study, we reviewed all studies on ChatGPT's performance in medical licensing examinations up to March 2024. This review aims to contribute to the evolving discourse on artificial intelligence (AI) in medical education by providing a comprehensive analysis of ChatGPT's performance in various settings. The insights gained from this systematic review will guide educators, policymakers, and technical experts to use AI in medical education effectively and judiciously. METHODS: We searched the literature published between January 1, 2022, and March 29, 2024, using query strings in Web of Science, PubMed, and Scopus. Two authors screened the literature according to the inclusion and exclusion criteria, extracted data, and independently assessed study quality using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool. We conducted both qualitative and quantitative analyses. RESULTS: A total of 45 studies on the performance of different versions of ChatGPT in medical licensing examinations were included. GPT-4 achieved an overall accuracy rate of 81% (95% CI 78-84; P<.01), significantly surpassing the 58% (95% CI 53-63; P<.01) accuracy rate of GPT-3.5. GPT-4 passed the medical examinations in 26 of 29 cases, outperforming the average scores of medical students in 13 of 17 cases. Translating the examination questions into English improved GPT-3.5's performance but did not affect GPT-4's. GPT-3.5 showed no difference in performance between examinations from English-speaking and non-English-speaking countries (P=.72), but GPT-4 performed significantly better on examinations from English-speaking countries (P=.02). Any type of prompt significantly improved the performance of GPT-3.5 (P=.03) and GPT-4 (P<.01). GPT-3.5 performed better on short-text questions than on long-text questions. Question difficulty affected the performance of both GPT-3.5 and GPT-4. On image-based multiple-choice questions (MCQs), ChatGPT's accuracy rate ranged from 13.1% to 100%. ChatGPT performed significantly worse on open-ended questions than on MCQs. CONCLUSIONS: GPT-4 demonstrates considerable potential for future use in medical education. However, due to its insufficient accuracy, inconsistent performance, and the challenges posed by differing medical policies and knowledge across countries, GPT-4 is not yet suitable for use in medical education. TRIAL REGISTRATION: PROSPERO CRD42024506687; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=506687.
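A pooled accuracy such as "81% (95% CI 78-84)" typically comes from an inverse-variance meta-analysis of proportions on the logit scale. A minimal sketch with a DerSimonian-Laird random-effects estimate; the per-study counts are invented for illustration, not extracted from the review.

```python
# Sketch: inverse-variance pooling of accuracy rates on the logit scale,
# with a DerSimonian-Laird between-study variance (random effects).
# Study counts are made up, not the review's data.
import numpy as np

# (n_correct, n_total) per study
studies = [(180, 220), (95, 120), (310, 400), (150, 170), (88, 110)]

logits, variances = [], []
for k, n in studies:
    p = k / n
    logits.append(np.log(p / (1 - p)))
    variances.append(1 / (n * p * (1 - p)))   # variance of the logit
logits, variances = np.array(logits), np.array(variances)

w = 1 / variances                              # fixed-effect weights
mu_fe = (w * logits).sum() / w.sum()
q = (w * (logits - mu_fe) ** 2).sum()          # Cochran's Q
tau2 = max(0.0, (q - (len(studies) - 1)) / (w.sum() - (w**2).sum() / w.sum()))

w_re = 1 / (variances + tau2)                  # random-effects weights
mu = (w_re * logits).sum() / w_re.sum()
se = np.sqrt(1 / w_re.sum())
lo, hi = mu - 1.96 * se, mu + 1.96 * se
inv = lambda x: 1 / (1 + np.exp(-x))           # back-transform to a proportion
print(f"pooled accuracy {inv(mu):.1%} (95% CI {inv(lo):.1%}-{inv(hi):.1%})")
```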


Subject(s)
Educational Measurement, Licensure, Medical, Humans, Licensure, Medical/standards, Licensure, Medical/statistics & numerical data, Educational Measurement/methods, Educational Measurement/standards, Educational Measurement/statistics & numerical data, Clinical Competence/statistics & numerical data, Clinical Competence/standards, Artificial Intelligence, Education, Medical/standards
9.
Article in English | MEDLINE | ID: mdl-38977033

ABSTRACT

PURPOSE: This study aimed to compare and evaluate the efficiency and accuracy of computerized adaptive testing (CAT) under 2 stopping rules (standard error of measurement [SEM]=0.3 and 0.25) using both real and simulated data from medical examinations in Korea. METHODS: This study employed post-hoc simulation and real data analysis to explore the optimal stopping rule for CAT in medical examinations. The real data were obtained from the responses of 3rd-year medical students during examinations in 2020 at Hallym University College of Medicine. Simulated data were generated in R using parameters estimated from a real item bank. Outcome variables included the number of examinees passing or failing under SEM values of 0.25 and 0.30, the number of items administered, and the correlation between ability estimates. The consistency of the real CAT results was evaluated by examining pass/fail agreement at a cut score of 0.0. The efficiency of all CAT designs was assessed by comparing the average number of items administered under both stopping rules. RESULTS: Both SEM 0.25 and SEM 0.30 provided a good balance between accuracy and efficiency in CAT. The real data showed minimal differences in pass/fail outcomes between the 2 SEM conditions, with a high correlation (r=0.99) between ability estimates. The simulation results confirmed these findings, indicating similar average item numbers between real and simulated data. CONCLUSION: The findings suggest that both SEM 0.25 and 0.30 are effective termination criteria in the context of the Rasch model, balancing accuracy and efficiency in CAT.
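The SEM stopping rule is simple to demonstrate: under the Rasch model, test information is Σp(1-p) and SEM = 1/√information, so the CAT loop administers the most informative remaining item until the SEM falls below the threshold. A sketch with a simulated bank and examinee, not the Hallym item bank.

```python
# Sketch: a Rasch-model CAT loop that stops once the standard error of the
# ability estimate falls below the threshold (0.30 or 0.25 in this study).
# Item bank and examinee are simulated, not the study's real bank.
import numpy as np

rng = np.random.default_rng(4)
bank = rng.uniform(-3, 3, 300)            # item difficulties
true_theta = 0.5

def prob(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def run_cat(sem_stop, max_items=100):
    theta, used, resp = 0.0, [], []
    while len(used) < max_items:
        # maximum-information item under Rasch = difficulty closest to theta
        free = np.setdiff1d(np.arange(len(bank)), used)
        item = free[np.argmin(np.abs(bank[free] - theta))]
        used.append(item)
        resp.append(rng.random() < prob(true_theta, bank[item]))
        # Newton-Raphson updates for the ML ability estimate (clipped for stability)
        for _ in range(25):
            p = prob(theta, bank[used])
            info = (p * (1 - p)).sum()
            theta = np.clip(theta + (np.array(resp) - p).sum() / info, -4, 4)
        sem = 1.0 / np.sqrt(info)         # SEM = 1/sqrt(test information)
        if sem <= sem_stop:
            break
    return theta, len(used), sem

for rule in (0.30, 0.25):
    theta_hat, n_items, sem = run_cat(rule)
    print(f"stop at SEM<={rule}: theta={theta_hat:+.2f}, items={n_items}, SEM={sem:.3f}")
```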


Subject(s)
Educational Measurement, Psychometrics, Students, Medical, Humans, Educational Measurement/methods, Educational Measurement/standards, Republic of Korea, Psychometrics/methods, Computer Simulation, Data Analysis, Education, Medical, Undergraduate/methods, Male, Female
10.
Nurse Educ Today ; 141: 106308, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39038430

ABSTRACT

BACKGROUND: Nursing clinical competence assessment has acquired special relevance at the undergraduate and postgraduate levels in recent years. In this context, the Objective Structured Clinical Assessment (OSCA) has emerged as a valid and feasible method of assessing nursing competence. The Satisfaction with Nursing Skill Examination: Objective Structured Clinical Assessment (SINE-OSCA) scale is a valid and reliable 10-item measure developed to evaluate nursing students' satisfaction with the OSCA in the Australian context. Given the importance that the OSCA has gained in Spain, it is necessary to validate this tool for use in one of the world's most widely spoken languages. OBJECTIVES: The purpose of the study was to modify and cross-culturally adapt the SINE-OSCA and to carry out a psychometric analysis of the new S-OSCA with Spanish nursing students. DESIGN: A multicenter questionnaire development and validation study was carried out in 2023 in four Spanish university nursing centers, in 3 phases: design, pilot implementation, and construct validation. PARTICIPANTS/SETTING: The total population of students from these centers amounted to 1350; the final sample consisted of 364 nursing students, selected by convenience sampling. METHODS: The translation and cultural adaptation of the SINE-OSCA to the Spanish population followed the guidelines proposed by Beaton et al. Content validity, internal consistency, and temporal reliability were evaluated. RESULTS: The S-OSCA shows values on the psychometric indicators (Aiken's V, Bland-Altman plots, and Lawshe's content validity index) that exceed the established cut-off values, even at the lower limit of the confidence intervals. The Spanish version has a Cronbach's alpha slightly higher than that reported for the original version (0.928, 95% CI 0.913-0.94). Regarding temporal reliability, the S-OSCA was completed by 40 nursing students at two time points 15 days apart; the intraclass correlation coefficient (ICC) obtained was 0.974 (95% CI 0.952-0.986). CONCLUSIONS: The S-OSCA instrument proves robust enough to guarantee the quality of its results up to 15 days post-OSCA.
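Among the indicators mentioned, the Bland-Altman analysis of the 15-day test-retest data is the least standard to compute; a minimal sketch follows, with simulated scores standing in for the 40 retested students.

```python
# Sketch: Bland-Altman bias and limits of agreement for test-retest data.
# Scores are simulated placeholders for the 40 retested students.
import numpy as np

rng = np.random.default_rng(5)
t1 = rng.normal(40, 6, 40)           # total score, first administration
t2 = t1 + rng.normal(0, 2, 40)       # second administration, 15 days later

diff = t2 - t1
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)        # half-width of the limits of agreement
print(f"bias={bias:.2f}, limits of agreement=[{bias - loa:.2f}, {bias + loa:.2f}]")

means = (t1 + t2) / 2                # x-axis of a Bland-Altman plot; diff is the y-axis
```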


Subject(s)
Clinical Competence, Educational Measurement, Psychometrics, Students, Nursing, Humans, Cross-Sectional Studies, Students, Nursing/psychology, Students, Nursing/statistics & numerical data, Surveys and Questionnaires, Psychometrics/instrumentation, Psychometrics/methods, Spain, Female, Male, Clinical Competence/standards, Clinical Competence/statistics & numerical data, Educational Measurement/methods, Educational Measurement/standards, Reproducibility of Results, Personal Satisfaction, Adult, Education, Nursing, Baccalaureate/standards, Education, Nursing, Baccalaureate/methods, Young Adult
11.
Nurs Educ Perspect ; 45(5): 313-315, 2024.
Article in English | MEDLINE | ID: mdl-39078667

ABSTRACT

ABSTRACT: The objective structured clinical examination (OSCE) is effective for the evaluation of clinical competence. Studies examining the use of OSCEs in undergraduate mental health nursing education in the United States are limited. A pilot study and a follow-up study were conducted to establish the reliability and validity of a mental health OSCE to evaluate the clinical competence of prelicensure nursing students. International Nursing Association for Clinical Simulation and Learning Standards of Best Practice were used to guide the design and implementation. Results from both studies provide evidence for the use of OSCE in undergraduate mental health nursing education.


Subject(s)
Clinical Competence, Education, Nursing, Baccalaureate, Educational Measurement, Psychiatric Nursing, Humans, Education, Nursing, Baccalaureate/standards, Educational Measurement/standards, Educational Measurement/methods, Clinical Competence/standards, Pilot Projects, Psychiatric Nursing/education, Psychiatric Nursing/standards, United States, Reproducibility of Results, Students, Nursing/psychology, Female, Adult, Male, Nursing Education Research
12.
J Nurses Prof Dev ; 40(4): 184-189, 2024.
Article in English | MEDLINE | ID: mdl-38949971

ABSTRACT

Assessment of initial nursing competency is essential to safe nursing practice, yet it often focuses on psychomotor skill acquisition. A multistate health system created a competency strategy based on a comprehensive conceptualization of competency using the American Nurses Association scope and standards of nursing practice. This approach allows for the broad application of a standard competency assessment tool across diverse nursing specialties and provides a framework for nursing professional development practitioners to implement in their organizations.


Subject(s)
Clinical Competence, Nurse's Role, Humans, Clinical Competence/standards, Staff Development/methods, United States, Educational Measurement/methods, Educational Measurement/standards
13.
Dyslexia ; 30(3): e1777, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38952195

ABSTRACT

This article aims to assist practitioners in understanding dyslexia and other reading difficulties and in assessing students' learning needs. We describe the essential components of language and literacy, universal screening, diagnostic assessments, curriculum-based measurement, and eligibility determination. We then introduce four diagnostic assessments as examples, including norm-referenced assessments (the Comprehensive Test of Phonological Processing, second edition, and the Woodcock-Johnson IV Tests of Achievement) and criterion-referenced assessments (the Gallistel-Ellis Test of Coding Skills and the Dynamic Indicators of Basic Early Literacy Skills). Finally, we use a made-up case as a concrete example to illustrate how multiple diagnostic assessments are recorded and how the results can be used to inform intervention and eligibility for special education services.
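To make "norm-referenced" concrete: such tests report standard scores on a fixed scale (commonly mean 100, SD 15), which map to percentile ranks through the normal CDF. A sketch; the example score is illustrative, not a diagnostic cut-off.

```python
# Sketch: converting a norm-referenced standard score (mean 100, SD 15,
# the scale used by tests such as the WJ IV) to a percentile rank.
# The example score is illustrative only.
from scipy.stats import norm

def percentile_rank(standard_score: float, mean: float = 100, sd: float = 15) -> float:
    return 100 * norm.cdf((standard_score - mean) / sd)

score = 78
print(f"standard score {score} -> percentile {percentile_rank(score):.1f}")
```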


Subject(s)
Dyslexia, Humans, Dyslexia/diagnosis, Child, Reading, Educational Measurement/standards, Language Tests/standards, Students, Literacy, Education, Special
14.
BMC Med Educ ; 24(1): 817, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39075511

ABSTRACT

CONTEXT: Objective Structured Clinical Examinations (OSCEs) are an increasingly popular evaluation modality for medical students. While the face-to-face interaction allows for more in-depth assessment, it may cause standardization problems. Methods to quantify, limit, or adjust for examiner effects are needed. METHODS: Data originated from 3 OSCEs undergone by 900-student classes of 5th- and 6th-year medical students at Université Paris Cité in the 2022-2023 academic year. Sessions had five stations each, and one of the three sessions was scored by consensus between two raters (rather than one). We report the OSCEs' longitudinal consistency for one of the classes and staff-related and student variability by session. We also propose a statistical method to adjust for inter-rater variability by deriving a random student effect that accounts for staff-related and station random effects. RESULTS: Across the four sessions, a total of 16,910 station scores were collected from 2615 student sessions, with two of the sessions undergone by the same students, and 36, 36, 35, and 20 distinct staff teams per station in each session. Scores showed staff-related heterogeneity (p<10⁻¹⁵), with staff-level standard errors approximately doubled compared to chance. With mixed models, staff-related heterogeneity explained respectively 11.4%, 11.6%, and 4.7% of station score variance (95% confidence intervals 9.5-13.8, 9.7-14.1, and 3.9-5.8, respectively) with 1, 1, and 2 raters, suggesting a moderating effect of consensus grading. Student random effects explained a small proportion of variance, respectively 8.8%, 11.3%, and 9.6% (8.0-9.7, 10.3-12.4, and 8.7-10.5), and this low amount of signal meant that student rankings were no more consistent over time with this metric than with average scores (p=0.45). CONCLUSION: Staff variability affects OSCE scores as much as student variability; the former can be reduced with dual assessment or adjusted for with mixed models. Both are small compared to unmeasured sources of variability, making them difficult to capture consistently.
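The paper's adjustment strategy, crossed student and staff random effects in a mixed model, can be sketched with statsmodels, which fits crossed effects by treating the whole dataset as a single group and declaring variance components. The simulated data below are assumptions, not the Université Paris Cité scores.

```python
# Sketch: decomposing OSCE score variance into crossed student and staff
# random effects, in the spirit of the paper's mixed-model adjustment.
# Data are simulated, not the study's scores.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_students, n_staff = 100, 20
student_eff = rng.normal(scale=0.8, size=n_students)
staff_eff = rng.normal(scale=0.9, size=n_staff)

rows = []
for s in range(n_students):
    for _ in range(5):                      # five stations per student
        j = rng.integers(n_staff)
        rows.append({"student": s, "staff": j,
                     "score": 12 + student_eff[s] + staff_eff[j]
                              + rng.normal(scale=2.0)})
df = pd.DataFrame(rows)

model = sm.MixedLM.from_formula(
    "score ~ 1", data=df,
    groups=np.ones(len(df)),                # single group -> crossed effects
    vc_formula={"student": "0 + C(student)", "staff": "0 + C(staff)"},
    re_formula="0",
)
result = model.fit()
print(result.summary())                     # variance components per source
```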


Subject(s)
Clinical Competence, Educational Measurement, Observer Variation, Students, Medical, Humans, Educational Measurement/methods, Educational Measurement/standards, Clinical Competence/standards, Education, Medical, Undergraduate/standards, Paris, Reproducibility of Results
17.
Nurse Educ Pract ; 78: 104021, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38917560

ABSTRACT

AIM: This paper reflects on the experience of one Scottish university in conducting a face-to-face Objective Structured Clinical Examination (OSCE) for large cohorts of student nurses. It outlines the challenges experienced and the learning gained. Borton's model of reflection frames this work owing to its simplicity, ease of application, and cyclical nature. BACKGROUND: The theoretical framework for the OSCE is critical thinking, enabling students to apply those skills authentically. OSCEs are designed to transfer classroom knowledge to clinical practice and offer an authentic work-based assessment. DESIGN: Validity and robustness are key considerations in any assessment, and in OSCEs the number of stations that students encounter is important and debated. We initially used a case-study-based OSCE approach with four stations and, following reflection, changed to one long station with four phases. RESULTS: In OSCE examinations, interrater reliability is a necessity, and students expect equity of approach. We identified that despite clear marking criteria, marks were polarised, with students achieving high or low marks and little middle ground. Review of examination papers highlighted that although students' overall performance was good, some had failed in at least one station, suggesting a four-station approach may skew results. On reflection, we hypothesised that a one-station, case-study-based, phased approach enabled the examiner to build a more holistic picture of student knowledge and skills. It also gave students the opportunity to develop a rapport with the examiner and standardised patient, putting them more at ease. We argue that this approach is holistic, authentic, and student centred. CONCLUSIONS: Our experience highlights that a single-station, four-phase OSCE is preferable, enabling students to integrate all aspects of the assessment and providing a holistic view of clinical skills and knowledge.
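One quick, assumption-laden way to screen for the polarisation described (high or low marks, little middle ground) is Sarle's bimodality coefficient, where values above roughly 0.555 (the uniform distribution's value) hint at bimodality. The marks below are simulated for illustration.

```python
# Sketch: Sarle's bimodality coefficient as a screen for polarised marks.
# BC = (skew^2 + 1) / (excess kurtosis + sample correction); values above
# ~0.555 suggest bimodality. Marks are simulated, not the cohort's.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(7)
marks = np.concatenate([rng.normal(35, 5, 120), rng.normal(75, 5, 110)])  # two clusters

n = len(marks)
g1 = skew(marks, bias=False)
g2 = kurtosis(marks, bias=False)    # excess kurtosis
bc = (g1**2 + 1) / (g2 + 3 * (n - 1)**2 / ((n - 2) * (n - 3)))
print(f"bimodality coefficient = {bc:.3f}")
```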


Subject(s)
Clinical Competence, Educational Measurement, Students, Nursing, Humans, Scotland, Educational Measurement/methods, Educational Measurement/standards, Students, Nursing/psychology, Clinical Competence/standards, Education, Nursing, Baccalaureate, Reproducibility of Results, Schools, Nursing, Thinking
20.
Med Educ Online ; 29(1): 2370617, 2024 Dec 31.
Article in English | MEDLINE | ID: mdl-38934534

ABSTRACT

While the objective structured clinical examination (OSCE) is a worldwide recognized and effective method for assessing the clinical skills of undergraduate medical students, the latest Ottawa conference on the assessment of competences raised vigorous debates regarding the future of OSCEs and innovations to them. This study aimed to provide a comprehensive view of global research activity on OSCEs over the past decades and to identify clues for improvement. We performed a bibliometric and scientometric analysis of OSCE papers published until March 2024, including a description of overall scientific productivity as well as an unsupervised analysis of the main topics and international scientific collaborations. A total of 3,224 items were identified in the Scopus database. There was a sudden spike in publications, especially related to virtual/remote OSCEs, from 2020 to 2024. We identified leading journals and countries in terms of numbers of publications and citations. A term co-occurrence network identified three main clusters corresponding to different research topics in OSCEs: two connected clusters on OSCE performance and reliability, and a third cluster on students' experience, mental health (anxiety), and perceptions, with few connections to the two other clusters. Finally, the United States, the United Kingdom, and Canada were identified as leading countries in terms of scientific publications and collaborations in an international scientific network involving other European countries (the Netherlands, Belgium, Italy) as well as Saudi Arabia and Australia; the analysis also revealed a lack of substantial collaboration with Asian countries. Several avenues for improving OSCE research were identified: i) developing remote OSCEs, with comparative studies between live and remote formats and international recommendations for sharing remote OSCEs between universities and countries; ii) fostering international collaborative studies with the support of key collaborating countries; and iii) investigating the relationships between student performance and anxiety.
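The term co-occurrence network behind the three clusters can be reproduced in a few lines: count keyword pairs per record, build a weighted graph, and run community detection. The records below are invented; a real analysis would parse the Scopus export.

```python
# Sketch: a keyword co-occurrence network with community detection, the
# kind of analysis that yields topic clusters. Records are made up.
from itertools import combinations
from collections import Counter
import networkx as nx

records = [
    ["OSCE", "reliability", "checklist"],
    ["OSCE", "performance", "reliability"],
    ["OSCE", "anxiety", "student perception"],
    ["anxiety", "mental health", "student perception"],
    ["OSCE", "performance", "standard setting"],
]

pairs = Counter()
for kws in records:
    pairs.update(combinations(sorted(set(kws)), 2))   # co-occurring pairs

g = nx.Graph()
for (a, b), w in pairs.items():
    g.add_edge(a, b, weight=w)

# Modularity-based communities correspond to the topic clusters
for i, club in enumerate(nx.community.greedy_modularity_communities(g, weight="weight")):
    print(f"cluster {i}: {sorted(club)}")
```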


Subject(s)
Bibliometrics, Clinical Competence, Education, Medical, Undergraduate, Educational Measurement, Humans, Educational Measurement/methods, Educational Measurement/standards, Education, Medical, Undergraduate/standards, Reproducibility of Results, Students, Medical/psychology, Students, Medical/statistics & numerical data, Biomedical Research/standards