1.
Acad Med ; 2024 May 15.
Article in English | MEDLINE | ID: mdl-38753971

ABSTRACT

PROBLEM: Many non-workplace-based assessments do not provide good evidence of a learner's problem representation or ability to provide a rationale for a clinical decision they have made. Exceptions include assessment formats that require resource-intensive administration and scoring. This article reports on research efforts toward building a scalable non-workplace-based assessment format that was specifically developed to capture evidence of a learner's ability to provide a justification for a clinical decision that they had made. APPROACH: The authors developed a 2-step item format called SHARP (SHort Answer, Rationale Provision), referring to the 2 tasks that comprise the item. In collaboration with physician-educators, the authors integrated short-answer questions into a patient medical record-based item starting in October 2021 and arrived at an innovative item format in December 2021. In this format, a test-taker interprets patient medical record data to make a clinical decision, types in their response, and pinpoints medical record details that justify their answers. In January 2022, a total of 177 fourth-year medical students, representing 20 U.S. medical schools, completed 35 SHARP items in a proof-of-concept study. OUTCOMES: Primary outcomes were item timing, difficulty, reliability, and scoring ease. There was substantial variability in item difficulty, with the average item answered correctly by 44% of students (range, 4%-76%). The estimated reliability (Cronbach α) of the set of SHARP items was 0.76 (95% CI, 0.70-0.80). Item scoring is fully automated, minimizing resource requirements. NEXT STEPS: A larger study is planned to gather additional validity evidence about the item format. This study will allow comparisons between performance on SHARP items and other examinations, the examination of group differences in performance, and possible use cases for formative assessment purposes. Cognitive interviews are also planned to better understand the thought processes of medical students as they work through the SHARP items.
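
The psychometrics reported above (per-item percent correct and Cronbach's alpha) come straight from a persons-by-items score matrix. The Python sketch below shows the standard calculations on random placeholder data rather than the study's responses; the matrix dimensions simply mirror the 177 examinees and 35 SHARP items mentioned in the abstract.

    import numpy as np

    def item_difficulty(scores):
        """Proportion of examinees answering each item correctly (0/1 scores)."""
        return scores.mean(axis=0)

    def cronbach_alpha(scores):
        """Cronbach's alpha for a persons-by-items score matrix."""
        k = scores.shape[1]                          # number of items
        item_vars = scores.var(axis=0, ddof=1)       # per-item variances
        total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Placeholder data: 177 examinees x 35 dichotomously scored items.
    # Because these data are random, alpha will be near zero; real responses with
    # correlated items would yield a positive value such as the 0.76 reported above.
    rng = np.random.default_rng(0)
    scores = rng.integers(0, 2, size=(177, 35))

    print(np.round(item_difficulty(scores), 2))      # per-item proportion correct
    print(round(cronbach_alpha(scores), 2))          # internal-consistency estimate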

2.
Diagnosis (Berl) ; 10(1): 54-60, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36409593

ABSTRACT

In this op-ed, we discuss the advantages of leveraging natural language processing (NLP) in the assessment of clinical reasoning. Clinical reasoning is a complex competency that cannot be easily assessed using multiple-choice questions. Constructed-response assessments can more directly measure important aspects of a learner's clinical reasoning ability, but substantial resources are necessary for their use. We provide an overview of INCITE, the Intelligent Clinical Text Evaluator, a scalable NLP-based computer-assisted scoring system that was developed to measure clinical reasoning ability as assessed in the written documentation portion of the now-discontinued USMLE Step 2 Clinical Skills examination. We provide the rationale for building a computer-assisted scoring system that is aligned with the intended use of an assessment. We show how INCITE's NLP pipeline was designed with transparency and interpretability in mind, so that every score produced by the computer-assisted system could be traced back to the text segment it evaluated. We next suggest that, as a consequence of INCITE's transparency and interpretability features, the system may easily be repurposed for formative assessment of clinical reasoning. Finally, we provide the reader with the resources to consider in building their own NLP-based assessment tools.
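
The abstract does not show INCITE's actual pipeline, but the traceability idea (every score tied to the text segment that produced it) can be illustrated with a toy rule-based scorer. The rubric keys, regular expressions, and example note below are all invented for this sketch.

    import re

    # Invented rubric: each scoring key maps to patterns a patient note might contain.
    RUBRIC = {
        "photophobia": [r"photophobi", r"light sensitiv"],
        "neck_stiffness": [r"neck stiffness", r"nuchal rigidity"],
        "sudden_onset": [r"sudden onset", r"thunderclap"],
    }

    def score_note(note_text):
        """Return one score per rubric key, plus the text span that earned it."""
        results = {}
        for key, patterns in RUBRIC.items():
            for pattern in patterns:
                match = re.search(pattern, note_text, flags=re.IGNORECASE)
                if match:
                    results[key] = {"score": 1, "evidence": match.group(0)}
                    break
            else:
                results[key] = {"score": 0, "evidence": None}
        return results

    note = "Patient reports sudden onset headache with photophobia and neck stiffness."
    for key, result in score_note(note).items():
        print(key, result)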


Subject(s)
Clinical Competence; Natural Language Processing; Humans; Headache; Clinical Reasoning
3.
Teach Learn Med ; 35(1): 37-51, 2023.
Article in English | MEDLINE | ID: mdl-35068287

ABSTRACT

CONSTRUCT: The study gathers validity evidence for the use of the Oldenburg Burnout Inventory - Medical Student (OLBI-MS), a 16-item scale used to measure medical student burnout. The 16 items on the OLBI-MS are split to form two subscales, disengagement and exhaustion. BACKGROUND: Medical student burnout has been empirically linked to several detrimental professional and personal consequences. In recognition of the high prevalence of medical student burnout, one recommendation has been to regularly measure burnout using standardized measures that have strong validity evidence for their intended use. The OLBI-MS, a frequently used measure of medical student burnout, was adapted from the Oldenburg Burnout Inventory (OLBI). The OLBI has been studied in many occupational settings and found to have a two-factor solution in the majority of these populations, but there is limited validity evidence available that supports the use of the OLBI-MS subscales in a medical student population. APPROACH: Two years of Association of American Medical Colleges Year 2 Questionnaire data (n = 24,008) were used in the study for a series of exploratory and confirmatory factor analyses. The data from the first year (n = 11,586) were randomly split into exploratory and confirmatory samples, with the data from the second year (n = 12,422) used as a secondary confirmatory sample. Because the questionnaire is administered to medical students during their second year of undergraduate medical education, we consider this study to provide validity evidence specifically for the measure's use with that population. FINDINGS: The two-factor structure of the OLBI-MS was not empirically supported in the second-year medical student population. Several of the items had low inter-item correlations and/or moderate correlations with unexpected items. Three modified versions of the OLBI-MS were tested using subsets of the original items. Two of the modified versions were adequate statistical explanations of the relationships in the data. However, it is unclear whether these revised scales appropriately measure all aspects of the construct of burnout, and additional validity evidence is needed prior to their use. CONCLUSIONS: The use of the OLBI-MS is not recommended for measuring second-year medical student burnout. It is unclear if the OLBI-MS is appropriate for medical students at all, or if different measures are necessary at different stages in a medical student's professional development. Additional research is needed to either improve the OLBI-MS or use it as a foundation for a new measure. Supplemental data for this article is available online at www.tandfonline.com/htlm.
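
A minimal sketch of the split-sample factor-analytic workflow described above, assuming the item responses sit in a hypothetical CSV with one column per OLBI-MS item and using the third-party factor_analyzer package; this is an illustration, not the authors' analysis code.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from factor_analyzer import FactorAnalyzer  # third-party EFA package

    # Hypothetical file: one row per student, one column per OLBI-MS item.
    df = pd.read_csv("olbi_ms_year1.csv")

    # Randomly split the first-year sample into exploratory and confirmatory halves.
    explore, confirm = train_test_split(df, test_size=0.5, random_state=42)

    # Inter-item correlations flag items that relate weakly to their intended subscale.
    print(explore.corr().round(2))

    # Two-factor EFA (disengagement, exhaustion) with an oblique rotation.
    efa = FactorAnalyzer(n_factors=2, rotation="oblimin")
    efa.fit(explore)
    print(pd.DataFrame(efa.loadings_, index=explore.columns).round(2))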


Subject(s)
Burnout, Professional; Students, Medical; Humans; Psychometrics; Burnout, Psychological; Burnout, Professional/diagnosis; Surveys and Questionnaires
4.
Behav Res Methods ; 51(3): 1305-1320, 2019 06.
Article in English | MEDLINE | ID: mdl-29926441

ABSTRACT

An important consideration of any computer adaptive testing (CAT) program is the criterion used for ending item administration, known as the stopping rule, which ensures that all examinees are assessed to the same standard. Although various stopping rules exist, none of them have been compared under the generalized partial-credit model (Muraki, Applied Psychological Measurement, 16, 159-176, 1992). In this simulation study we compared the performance of three variable-length stopping rules (standard error [SE], minimum information [MI], and change in theta [CT]), both in isolation and in combination with requirements of minimum and maximum numbers of items, as well as a fixed-length stopping rule. Each stopping rule was examined under two termination criteria: a more lenient requirement (SE = 0.35, MI = 0.56, CT = 0.05) and a more stringent one (SE = 0.30, MI = 0.42, CT = 0.02). The simulation design also included content-balancing and exposure controls, aspects of CAT that have been excluded from previous research comparing variable-length stopping rules. The minimum-information stopping rule produced biased theta estimates and varied greatly in measurement quality across the theta distribution. The change-in-theta stopping rule performed well when paired with a lower criterion and a minimum test length. The standard error stopping rule consistently provided the best balance of measurement precision and operational efficiency, requiring the fewest administered items to obtain accurate and precise theta estimates, particularly when it was paired with a maximum-number-of-items stopping rule.
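
Each variable-length rule reduces to a per-item decision about whether to keep administering items. The Python sketch below shows that decision logic using the stringent cutoffs quoted above and invented minimum/maximum test lengths; it illustrates the rules rather than reproducing the study's simulation code.

    def should_stop(se, theta_change, items_administered,
                    min_items=10, max_items=60,
                    se_cutoff=0.30, ct_cutoff=0.02):
        """Combine SE and change-in-theta stopping rules with minimum- and
        maximum-length constraints (all values here are illustrative)."""
        if items_administered < min_items:
            return False              # never stop before the minimum test length
        if items_administered >= max_items:
            return True               # hard cap on test length
        if se <= se_cutoff:
            return True               # standard error (SE) rule
        if abs(theta_change) <= ct_cutoff:
            return True               # change-in-theta (CT) rule
        return False

    # Example: after 25 items the SE of the theta estimate has dropped to 0.28 -> stop.
    print(should_stop(se=0.28, theta_change=0.04, items_administered=25))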


Subject(s)
Computer Simulation; Computers; Research; Software
5.
CBE Life Sci Educ ; 17(2): ar20, 2018 06.
Article in English | MEDLINE | ID: mdl-29749845

ABSTRACT

Course-based undergraduate research experiences (CUREs) provide a promising avenue to attract a larger and more diverse group of students into research careers. CUREs are thought to be distinctive in offering students opportunities to make discoveries, collaborate, engage in iterative work, and develop a sense of ownership of their lab course work. Yet how these elements affect students' intentions to pursue research-related careers remains unexplored. To address this knowledge gap, we collected data on three design features thought to be distinctive of CUREs (discovery, iteration, and collaboration) and on students' levels of ownership and career intentions from ∼800 undergraduates who had completed CURE or inquiry courses, including courses from the Freshman Research Initiative (FRI), which has a demonstrated positive effect on student retention in college and in science, technology, engineering, and mathematics. We used structural equation modeling to test relationships among the design features, student ownership, and career intentions. We found that discovery, iteration, and collaboration had small but significant effects on students' intentions; these effects were fully mediated by student ownership. Students in FRI courses reported significantly higher levels of discovery, iteration, and ownership than students in other CUREs. FRI research courses alone had a significant effect on students' career intentions.
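
The mediation structure tested above (design features predicting ownership, ownership predicting career intentions) can be written as a short structural equation model. The sketch below assumes the third-party semopy package and hypothetical column names; it is not the authors' model specification.

    import pandas as pd
    import semopy  # third-party structural equation modeling package

    # Hypothetical file with columns: discovery, iteration, collaboration,
    # ownership, career_intent (one row per student).
    df = pd.read_csv("cure_survey.csv")

    # Ownership mediates the effect of the three CURE design features
    # on students' career intentions.
    model_desc = """
    ownership ~ discovery + iteration + collaboration
    career_intent ~ ownership
    """
    model = semopy.Model(model_desc)
    model.fit(df)
    print(model.inspect())  # path estimates and standard errors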


Subject(s)
Cooperative Behavior; Laboratories; Ownership; Research/education; Students; Curriculum; Female; Humans; Male
6.
CBE Life Sci Educ ; 16(3)2017.
Article in English | MEDLINE | ID: mdl-28798211

ABSTRACT

Undergraduate life science majors are reputed to have negative emotions toward mathematics, yet little empirical evidence supports this claim. We sought to compare the emotions of majors in the life sciences versus other natural sciences and math. We adapted the Attitudes toward the Subject of Chemistry Inventory to create an Attitudes toward the Subject of Mathematics Inventory (ASMI). We collected data from 359 science and math majors at two research universities and conducted a series of statistical tests indicating that four ASMI items comprised a reasonable measure of students' emotional satisfaction with math. We then compared life science and non-life science majors and found that major had a small to moderate relationship with students' responses. Gender also had a small relationship with students' responses, while students' race, ethnicity, and year in school had no observable relationship. Using latent profile analysis, we identified three groups: students who were emotionally satisfied with math, students who were emotionally dissatisfied with math, and students who were neutral. These results and the emotional satisfaction with math scale should be useful for identifying differences in other undergraduate populations, determining the malleability of undergraduates' emotional satisfaction with math, and testing the effects of interventions aimed at improving life science majors' attitudes toward math.
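
A three-profile latent profile analysis of the four satisfaction items can be approximated with a Gaussian mixture model. The sketch below uses scikit-learn with hypothetical item column names and a placeholder data file; it illustrates the kind of analysis described above rather than reproducing it.

    import pandas as pd
    from sklearn.mixture import GaussianMixture

    # Hypothetical columns holding the four emotional-satisfaction items.
    items = ["asmi_1", "asmi_2", "asmi_3", "asmi_4"]
    df = pd.read_csv("asmi_responses.csv")  # placeholder file name

    # A diagonal-covariance Gaussian mixture with three components is one common
    # way to approximate a latent profile analysis (satisfied, neutral, dissatisfied).
    lpa = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
    df["profile"] = lpa.fit_predict(df[items])

    print(df.groupby("profile")[items].mean().round(2))  # mean item response per profile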


Subject(s)
Attitude; Biology/education; Mathematics; Students; Emotions; Hate; Humans; Mathematics/education; Universities
7.
CBE Life Sci Educ ; 16(2)2017.
Article in English | MEDLINE | ID: mdl-28550078

ABSTRACT

Participating in undergraduate research with mentorship from faculty may be particularly important for ensuring the persistence of women and minority students in science. Yet many life science undergraduates at research universities are mentored by graduate or postdoctoral researchers (i.e., postgraduates). We surveyed a national sample of undergraduate life science researchers about the mentoring structure of their research experiences and the outcomes they realized from participating in research. We observed two common mentoring structures: an open triad with undergraduate-postgraduate and postgraduate-faculty ties but no undergraduate-faculty tie, and a closed triad with ties among all three members. We found that men and underrepresented minority (URM) students were significantly more likely to report a direct tie to their faculty mentors (closed triad) than women, white, and Asian students. We also determined that mentoring structure was associated with differences in student outcomes. Women's mentoring structures were associated with lower scientific identity, lower intentions to pursue a science, technology, engineering, and mathematics (STEM) PhD, and lower scholarly productivity. URM students' mentoring structures were associated with higher scientific identity, greater intentions to pursue a STEM PhD, and higher scholarly productivity. Asian students reported lower scientific identity and lower intentions to pursue a STEM PhD, which were unrelated to their mentoring structures.
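
The open- versus closed-triad distinction above is simply whether the undergraduate reports a direct tie to the faculty member in addition to the undergraduate-postgraduate and postgraduate-faculty ties. A small illustrative Python sketch with invented 0/1 tie indicators:

    import pandas as pd

    # Invented survey indicators of each mentoring tie (1 = tie reported).
    df = pd.DataFrame({
        "ug_postgrad_tie": [1, 1, 1],
        "postgrad_faculty_tie": [1, 1, 0],
        "ug_faculty_tie": [0, 1, 0],
    })

    def triad_structure(row):
        """Label the mentoring triad described by a student's reported ties."""
        if row["ug_postgrad_tie"] and row["postgrad_faculty_tie"]:
            return "closed triad" if row["ug_faculty_tie"] else "open triad"
        return "other"

    df["structure"] = df.apply(triad_structure, axis=1)
    print(df["structure"].value_counts())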


Subject(s)
Gender Identity; Mentoring; Mentors; Minority Groups/education; Research/education; Students/psychology; Female; Humans; Male; Universities
8.
J Appl Meas ; 18(1): 12-27, 2017.
Article in English | MEDLINE | ID: mdl-28453496

ABSTRACT

This study examined the performance of four methods of handling missing data for discrete response options on a questionnaire: (1) ignoring the missingness (using only the observed items to estimate trait levels); (2) nearest-neighbor hot deck imputation; (3) multiple hot deck imputation; and (4) semi-parametric multiple imputation. A simulation study examining three questionnaire lengths (41-, 20-, and 10-item) crossed with three levels of missingness (10, 25, and 40 percent) was conducted to see which methods best recovered trait estimates when data were missing completely at random and the polytomous items were scored with Andrich's (1978) rating scale model. The results showed that ignoring the missingness and semi-parametric imputation best recovered known trait levels across all conditions, with the semi-parametric technique providing the most precise trait estimates. This study demonstrates the power of specific objectivity in Rasch measurement, as ignoring the missingness leads to generally unbiased trait estimates.
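
As an illustration of one of the compared methods, the Python sketch below implements a bare-bones nearest-neighbor hot deck imputation on a small invented rating-scale matrix; it is not the study's simulation code, and it assumes the chosen donor has observed the item being filled.

    import numpy as np

    def hot_deck_impute(responses):
        """Fill each missing item with the value from the respondent whose
        observed items are closest (nearest-neighbor hot deck, illustrative only)."""
        imputed = responses.copy()
        for i, row in enumerate(responses):
            missing = np.isnan(row)
            if not missing.any():
                continue
            observed = ~missing
            # Squared distance to every respondent over the items this person answered.
            dist = np.nansum((responses[:, observed] - row[observed]) ** 2, axis=1)
            dist[i] = np.inf                      # a respondent cannot be their own donor
            donor = responses[np.argmin(dist)]
            imputed[i, missing] = donor[missing]
        return imputed

    # Invented 4-item rating-scale responses with one missing entry.
    data = np.array([[3, 4, np.nan, 2],
                     [3, 4, 5, 2],
                     [1, 2, 1, 1]], dtype=float)
    print(hot_deck_impute(data))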


Subject(s)
Algorithms; Data Interpretation, Statistical; Models, Statistical; Psychometrics; Sample Size; Surveys and Questionnaires