Search | VHL Regional Portal

Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study.

Guo, Eddie; Gupta, Mehul; Deng, Jiawen; Park, Ye-Jean; Paget, Michael; Naugler, Christopher.

J Med Internet Res ; 26: e48996, 2024 Jan 12.

Article in English | MEDLINE | ID: mdl-38214966

ABSTRACT

BACKGROUND: The systematic review of clinical research papers is a labor-intensive and time-consuming process that often involves the screening of thousands of titles and abstracts. The accuracy and efficiency of this process are critical for the quality of the review and subsequent health care decisions. Traditional methods rely heavily on human reviewers, often requiring a significant investment of time and resources. OBJECTIVE: This study aims to assess the performance of the OpenAI generative pretrained transformer (GPT) and GPT-4 application programming interfaces (APIs) in accurately and efficiently identifying relevant titles and abstracts from real-world clinical review data sets and comparing their performance against ground truth labeling by 2 independent human reviewers. METHODS: We introduce a novel workflow using the ChatGPT and GPT-4 APIs for screening titles and abstracts in clinical reviews. A Python script was created to make calls to the API with the screening criteria in natural language and a corpus of title and abstract data sets filtered by a minimum of 2 human reviewers. We compared the performance of our model against human-reviewed papers across 6 review papers, screening over 24,000 titles and abstracts. RESULTS: Our results show an accuracy of 0.91, a macro F1-score of 0.60, a sensitivity of excluded papers of 0.91, and a sensitivity of included papers of 0.76. The interrater variability between 2 independent human screeners was κ=0.46, and the prevalence and bias-adjusted κ between our proposed methods and the consensus-based human decisions was κ=0.96. On a randomly selected subset of papers, the GPT models demonstrated the ability to provide reasoning for their decisions and corrected their initial decisions upon being asked to explain their reasoning for incorrect classifications. CONCLUSIONS: Large language models have the potential to streamline the clinical review process, save valuable time and effort for researchers, and contribute to the overall quality of clinical reviews. By prioritizing the workflow and acting as an aid rather than a replacement for researchers and reviewers, models such as GPT-4 can enhance efficiency and lead to more accurate and reliable conclusions in medical research.
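The agreement statistics named in the abstract above (accuracy, macro F1-score, per-class sensitivity, Cohen's κ, and prevalence and bias-adjusted κ) can be illustrated with a minimal sketch. It is not the authors' script: the function name `screening_metrics`, the "include"/"exclude" label encoding, and the toy data are assumptions made for illustration, and the binary PABAK formula 2·p_o − 1 is the standard textbook form, which may differ from how the study computed its value.

```python
from collections import Counter

LABELS = ("include", "exclude")  # hypothetical label encoding


def screening_metrics(human, model):
    """Compare model screening decisions with consensus human labels."""
    pairs = Counter(zip(human, model))  # (human label, model label) -> count
    n = len(human)

    # Observed agreement: fraction of papers where both decisions match.
    accuracy = sum(pairs[(c, c)] for c in LABELS) / n

    sensitivity, f1 = {}, {}
    for c in LABELS:
        tp = pairs[(c, c)]
        actual = sum(pairs[(c, m)] for m in LABELS)     # human said c
        predicted = sum(pairs[(h, c)] for h in LABELS)  # model said c
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        sensitivity[c] = recall
        denom = precision + recall
        f1[c] = 2 * precision * recall / denom if denom else 0.0

    # Macro F1: unweighted mean of the per-class F1 scores.
    macro_f1 = sum(f1.values()) / len(LABELS)

    # Cohen's kappa: agreement corrected for chance agreement p_e.
    p_e = sum(
        (sum(pairs[(c, m)] for m in LABELS) / n)
        * (sum(pairs[(h, c)] for h in LABELS) / n)
        for c in LABELS
    )
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 1.0

    # Prevalence and bias-adjusted kappa for two categories.
    pabak = 2 * accuracy - 1

    return {
        "accuracy": accuracy,
        "macro_f1": macro_f1,
        "sensitivity": sensitivity,
        "kappa": kappa,
        "pabak": pabak,
    }


# Toy data: 10 relevant and 90 irrelevant papers according to human consensus.
human = ["include"] * 10 + ["exclude"] * 90
model = ["include"] * 8 + ["exclude"] * 2 + ["exclude"] * 81 + ["include"] * 9
metrics = screening_metrics(human, model)
```

With heavily imbalanced classes, as in title and abstract screening where most papers are excluded, accuracy and PABAK can sit well above Cohen's κ and the macro F1-score, which is one reason a study of this kind reports several statistics side by side.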

Subject(s)

Artificial Intelligence , Biomedical Research , Systematic Reviews as Topic , Humans , Consensus , Data Analysis , Problem Solving , Natural Language Processing , Workflow

How much is enough? Proposing achievement thresholds for core EPAs of graduating medical students in Canada.

Harvey, Adrian; Paget, Michael; McLaughlin, Kevin; Busche, Kevin; Touchie, Claire; Naugler, Christopher; Desy, Janeve.

Med Teach ; 45(9): 1054-1060, 2023 09.

Article in English | MEDLINE | ID: mdl-37262177

ABSTRACT

PURPOSE: The transition towards Competency-Based Medical Education at the Cumming School of Medicine was accelerated by the reduced clinical time caused by the COVID-19 pandemic. The purpose of this study was to define a standard protocol for setting Entrustable Professional Activity (EPA) achievement thresholds and examine their feasibility within the clinical clerkship. METHODS: Achievement thresholds for each of the 12 AFMC EPAs for graduating Canadian medical students were set by using sequential rounds of revision by three consecutive groups of stakeholders and evaluation experts. Structured communication was guided by a modified Delphi technique. The feasibility/consequence models of these EPAs were then assessed by tracking their completion by the graduating class of 2021. RESULTS: The threshold-setting process resulted in set EPA achievement levels ranging from 1 to 8 across the 12 AFMC EPAs. Estimates were stable after the first round for 9 of 12 EPAs. 96.27% of EPAs were successfully completed by clerkship students despite the shortened clinical period. Feasibility was predicted by the slowing rate of EPA accumulation over time during the clerkship. CONCLUSION: The process described led to consensus on EPA achievement thresholds. Successful completion of the assigned thresholds was feasible within the shortened clerkship.

Subject(s)

COVID-19 , Internship and Residency , Students, Medical , Humans , Pandemics , Canada , Clinical Competence , COVID-19/epidemiology , Competency-Based Education/methods

COVID-19, curtailed clerkships, and competency: Making graduation decisions in the midst of a global pandemic.

Desy, Janeve; Harvey, Adrian; Busche, Kevin; Weeks, Sarah; Paget, Michael; Naugler, Christopher; Welikovitch, Lisa; McLaughlin, Kevin.

Can Med Educ J ; 11(6): e181-e187, 2020 Dec.

Article in English | MEDLINE | ID: mdl-33349776

The application of reward learning in the real world: Changes in the reward positivity amplitude reflect learning in a medical education context.

Williams, Chad C; Hecker, Kent G; Paget, Michael K; Coderre, Sylvain P; Burak, Kelly W; Wright, Bruce; Krigolson, Olave E.

Int J Psychophysiol ; 132(Pt B): 236-242, 2018 10.

Article in English | MEDLINE | ID: mdl-29111454

ABSTRACT

Evidence ranging from behavioural adaptations to neurocognitive theories has significantly advanced our understanding of feedback-based learning. For instance, over the past twenty years research using electroencephalography has demonstrated that the amplitude of a component of the human event-related brain potential - the reward positivity - appears to change with learning in a manner predicted by reinforcement learning theory (Holroyd and Coles, 2002; Sutton and Barto, 1998). However, while the reward positivity (also known as the feedback-related negativity) is well studied, whether the component reflects an underlying learning process or whether it is simply sensitive to feedback evaluation is still unclear. Here, we sought to provide support that the reward positivity is reflective of an underlying learning process and, further, we hoped to demonstrate this in a real-world medical education context. In the present study, students with no medical training viewed a series of patient cards that contained ten physiological readings relevant for diagnosing liver and biliary disease types, selected the most appropriate diagnostic classification, and received feedback as to whether their decisions were correct or incorrect. Our behavioural results revealed that our participants were able to learn to diagnose liver and biliary disease types. Importantly, we found that the amplitude of the reward positivity diminished in a concomitant manner with the aforementioned behavioural improvements. In sum, our data support theoretical predictions (e.g., Holroyd and Coles, 2002), suggest that the reward positivity is an index of a neural learning system, and further validate that this same system is involved in learning across a wide range of contexts.

Subject(s)

Cerebral Cortex/physiology , Education, Medical , Evoked Potentials/physiology , Feedback, Psychological/physiology , Learning/physiology , Reward , Thinking/physiology , Adult , Electroencephalography , Female , Humans , Male , Young Adult

The grades that clinical teachers give students modifies the grades they receive.

Paget, Michael; Brar, Gurbir; Veale, Pamela; Busche, Kevin; Coderre, Sylvain; Woloschuk, Wayne; McLaughlin, Kevin.

Adv Health Sci Educ Theory Pract ; 23(2): 241-247, 2018 May.

Article in English | MEDLINE | ID: mdl-28707179

ABSTRACT

Prior studies have shown a correlation between the grades students receive and how they rate their teacher in the classroom. In this study, the authors probe this association on clinical rotations and explore potential mechanisms. All In-Training Evaluation Reports (ITERs) for students on mandatory clerkship rotations from April 1, 2013 to January 31, 2015 were matched with the corresponding student's rating of their teacher (SRT). The date and time that ITERs and SRTs were submitted were used to divide SRTs into those submitted before versus after the corresponding ITER was submitted. Multilevel, mixed effects linear regression was used to examine the association between SRT, ITER rating, and whether the ITER was submitted before or after SRT. Of 2373 paired evaluations, 1098 (46.3%) SRTs were submitted before the teacher had submitted the ITER. There was a significant interaction between explanatory variables: when ITER ratings had not yet been submitted, the regression coefficient for this association was 0.25 (95% confidence interval [0.17, 0.33], p < 0.001), whereas the regression coefficient was significantly higher when ITER ratings were submitted prior to SRT (0.40 [0.31, 0.49], p < 0.001). Finding an association between SRT and ITER when students do not know their ITER ratings suggests that SRTs can capture attributes of effective teaching, but the effect modification when students have access to their ITER rating supports grade satisfaction bias. Further studies are needed to explain the mechanism of grade satisfaction and to identify other biases that may impact the validity of SRT.

Subject(s)

Clinical Clerkship/statistics & numerical data , Educational Measurement/statistics & numerical data , Faculty, Medical/statistics & numerical data , Students, Medical/statistics & numerical data , Canada , Humans , Personal Satisfaction

Cognitive load imposed by ultrasound-facilitated teaching does not adversely affect gross anatomy learning outcomes.

Jamniczky, Heather A; Cotton, Darrel; Paget, Michael; Ramji, Qahir; Lenz, Ryan; McLaughlin, Kevin; Coderre, Sylvain; Ma, Irene W Y.

Anat Sci Educ ; 10(2): 144-151, 2017 Mar.

Article in English | MEDLINE | ID: mdl-27533319

ABSTRACT

Ultrasonography is increasingly used in medical education, but its impact on learning outcomes is unclear. Adding ultrasound may facilitate learning, but may also potentially overwhelm novice learners. Based upon the framework of cognitive load theory, this study seeks to evaluate the relationship between cognitive load associated with using ultrasound and learning outcomes. The use of ultrasound was hypothesized to facilitate learning in anatomy for 161 novice first-year medical students. Using linear regression analyses, the relationship between reported cognitive load on using ultrasound and learning outcomes as measured by anatomy laboratory examination scores four weeks after ultrasound-guided anatomy training was evaluated in consenting students. Second anatomy examination scores of students who were taught anatomy with ultrasound were compared with historical controls (those not taught with ultrasound). Ultrasound's perceived utility for learning was measured on a five-point scale. Cognitive load on using ultrasound was measured on a nine-point scale. Primary outcome was the laboratory examination score (60 questions). Learners found ultrasound useful for learning. Weighted factor score on "image interpretation" was negatively, but insignificantly, associated with examination scores [F (1,135) = 0.28, beta = -0.22; P = 0.61]. Weighted factor score on "basic knobology" was positively and insignificantly associated with scores [F (1,138) = 0.27, beta = 0.42; P = 0.60]. Cohorts exposed to ultrasound had significantly higher scores than historical controls (82.4% ± SD 8.6% vs. 78.8% ± 8.5%, Cohen's d = 0.41, P < 0.001). Using ultrasound to teach anatomy does not negatively impact learning and may improve learning outcomes. © 2016 American Association of Anatomists.
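The effect size quoted in the abstract above can be approximated from the reported means and standard deviations. The sketch below is illustrative only: the helper `cohens_d` is hypothetical, and because the abstract does not give the two cohort sizes, the default branch assumes equal group sizes when pooling the standard deviations. Under that assumption the result is about 0.42, close to the reported 0.41; the small gap may reflect unequal cohort sizes or rounding in the published summary statistics.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b, n_a=None, n_b=None):
    """Standardized mean difference using a pooled standard deviation.

    If group sizes are not supplied, equal sizes are assumed (an assumption
    of this sketch, not something stated in the abstract).
    """
    if n_a is None or n_b is None:
        pooled_var = (sd_a ** 2 + sd_b ** 2) / 2
    else:
        pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Ultrasound cohorts vs. historical controls, values taken from the abstract.
d = cohens_d(82.4, 8.6, 78.8, 8.5)
print(round(d, 2))  # prints 0.42
```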

Subject(s)

Anatomy/education , Cognition , Education, Medical, Undergraduate/methods , Learning , Students, Medical/psychology , Teaching , Ultrasonography , Alberta , Comprehension , Computer Graphics , Computer-Assisted Instruction , Curriculum , Educational Measurement/methods , Educational Status , Humans , Linear Models , Principal Component Analysis , Schools, Medical , Surveys and Questionnaires , Workload

How do medical students form impressions of the effectiveness of classroom teachers?

Rannelli, Luke; Coderre, Sylvain; Paget, Michael; Woloschuk, Wayne; Wright, Bruce; McLaughlin, Kevin.

Med Educ ; 48(8): 831-7, 2014 Aug.

Article in English | MEDLINE | ID: mdl-25039739

ABSTRACT

CONTEXT: Teaching effectiveness ratings (TERs) are used to provide feedback to teachers on their performance and to guide decisions on academic promotion. However, exactly how raters make decisions on teaching effectiveness is unclear. OBJECTIVES: The objectives of this study were to identify variables that medical students appraise when rating the effectiveness of a classroom teacher, and to explore whether the relationships among these variables and TERs are modified by the physical attractiveness of the teacher. METHODS: We asked 48 Year 1 medical students to listen to 2-minute audio clips of 10 teachers and to describe their impressions of these teachers and rate their teaching effectiveness. During each clip, we displayed either an attractive or an unattractive photograph of an unrelated third party. We used qualitative analysis followed by factor analysis to identify the principal components of teaching effectiveness, and multiple linear regression to study the associations among these components, type of photograph displayed, and TER. RESULTS: We identified two principal components of teaching effectiveness: charisma and intellect. There was no association between rating of intellect and TER. Rating of charisma and the display of an attractive photograph were both positively associated with TER and a significant interaction between these two variables was apparent (p < 0.001). The regression coefficient for the association between charisma and TER was 0.26 (95% confidence interval [CI] 0.10-0.41) when an attractive picture was displayed and 0.83 (95% CI 0.66-1.00) when an unattractive picture was displayed (p < 0.001). CONCLUSIONS: When medical students rate classroom teachers, they consider the degree to which the teacher is charismatic, although the relationship between this attribute and TER appears to be modified by the perceived physical attractiveness of the teacher. Further studies are needed to identify other variables that may influence subjective ratings of teaching effectiveness and to evaluate alternative strategies for rating teaching effectiveness.

Subject(s)

Faculty, Medical , Perception , Students, Medical/psychology , Teaching/standards , Feedback , Humans , Qualitative Research

Rater variables associated with ITER ratings.

Paget, Michael; Wu, Caren; McIlwrick, Joann; Woloschuk, Wayne; Wright, Bruce; McLaughlin, Kevin.

Adv Health Sci Educ Theory Pract ; 18(4): 551-7, 2013 Oct.

Article in English | MEDLINE | ID: mdl-22777161

ABSTRACT

Advocates of holistic assessment consider the ITER a more authentic way to assess performance. But this assessment format is subjective and, therefore, susceptible to rater bias. Here our objective was to study the association between rater variables and ITER ratings. In this observational study our participants were clerks at the University of Calgary and preceptors who completed online ITERs between February 2008 and July 2009. Our outcome variable was global rating on the ITER (rated 1-5), and we used a generalized estimating equation model to identify variables associated with this rating. Students were rated "above expected level" or "outstanding" on 66.4% of 1050 online ITERs completed during the study period. Two rater variables attenuated ITER ratings: the log transformed time taken to complete the ITER [β = -0.06, 95% confidence interval (-0.10, -0.02), p = 0.002], and the number of ITERs that a preceptor completed over the time period of the study [β = -0.008 (-0.02, -0.001), p = 0.02]. In this study we found evidence of leniency bias that resulted in two thirds of students being rated above expected level of performance. This leniency bias appeared to be attenuated by delay in ITER completion, and was also blunted in preceptors who rated more students. As all biases threaten the internal validity of the assessment process, further research is needed to confirm these and other sources of rater bias in ITER ratings, and to explore ways of limiting their impact.

Subject(s)

Clinical Clerkship , Clinical Competence/standards , Alberta , Clinical Clerkship/organization & administration , Competency-Based Education , Cross-Sectional Studies , Education, Medical, Undergraduate , Educational Measurement/methods , Educational Measurement/standards , Humans , Reproducibility of Results , Students, Medical

The effect of simulator training on clinical skills acquisition, retention and transfer.

Fraser, Kristin; Peets, Adam; Walker, Ian; Tworek, Janet; Paget, Michael; Wright, Bruce; McLaughlin, Kevin.

Med Educ ; 43(8): 784-9, 2009 Aug.

Article in English | MEDLINE | ID: mdl-19659492

ABSTRACT

CONTEXT: Prior research has demonstrated that residents have poor clinical skills in cardiology and respirology. It is not clear how these skills can be improved because the number of patients with suitable clinical findings whose cooperation might help residents to better develop these clinical skills is limited. OBJECTIVES: Our objective was to evaluate the effect of training on a cardiorespiratory simulator (CRS) on skills acquisition, retention and transfer. METHODS: We randomly allocated 146 students to CRS training in either chest pain or dyspnoea and compared each student's performance on the clinical presentation in which he or she had received CRS training with performance on the control presentation. RESULTS: Immediately after training, students were more accurate in identifying abnormal clinical findings on the CRS (70.0% versus 52.2%; d = 7.6, P < 0.0001) and showed improved diagnostic performance (72.1% versus 55.6%; d = 4.3, P = 0.0007) on the training clinical presentation. At the end of the course they were still better at identifying abnormal findings (57.1% versus 51.7%; d = 2.5, P = 0.004) and diagnosing correctly (50.0% versus 38.1%; d = 3.0, P = 0.002) on problems included in the training clinical presentation. However, they showed no difference between training and control presentations in diagnostic performance when required to transfer their skills between problems (45.9% versus 43.8%; P = 0.5) or in performance on multiple-choice questions (64.1% versus 63.6%; P = 0.8). CONCLUSIONS: Students can acquire and retain clinical skills with CRS training, but demonstrate limited ability to transfer these to other problems. Further studies are needed to explore ways of improving learning and transfer with CRS training.

Subject(s)

Cardiology/education , Cardiovascular Diseases/diagnosis , Clinical Competence/standards , Education, Medical, Undergraduate/methods , Patient Simulation , Cardiovascular Physiological Phenomena , Computer Simulation , Curriculum , Educational Measurement/methods , Humans , Respiratory Physiological Phenomena , Statistics as Topic
