Results 1 - 20 of 66
1.
Adv Health Sci Educ Theory Pract ; 25(3): 629-639, 2020 Aug.
Article in English | MEDLINE | ID: mdl-31720878

ABSTRACT

As medical schools have changed their curricula to address foundational and clinical sciences in a more integrated fashion, teaching methods such as concept mapping have been incorporated in small group learning settings. Methods that can assess students' ability to apply such integrated knowledge are not as developed, however. The purpose of this project was to assess the validity of scores on a focused version of concept maps called mechanistic case diagrams (MCDs), which are hypothesized to enhance existing tools for assessing integrated knowledge that supports clinical reasoning. The data were from the medical school graduating class of 2018 (N = 136 students). In 2014-2015 we implemented a total of 16 case diagrams in case analysis groups within the Mechanisms of Health and Disease (MOHD) strand of the pre-clinical curriculum. These cases were based on topics being taught during the lectures and small group sessions for MOHD. We created an overall score across all 16 cases for each student. We then correlated these scores with performance in the preclinical curriculum [as assessed by overall performance in MOHD integrated foundational basic science courses and overall performance in the Clinical and Professional Skills (CAPS) courses], and standardized licensing exam scores [United States Medical Licensing Exam (USMLE)] Step 1 (following core clerkships) and Step 2 Clinical Knowledge (at the beginning of the fourth year of medical school). MCD scores correlated with students' overall basic science scores (r = .46, p = .0002) and their overall performance in Clinical and Professional Skills courses (r = .49, p < .0001). In addition, they correlated significantly with standardized exam measures, including USMLE Step 1 (r = .33, p ≤ .0001), and USMLE Step 2 CK (r = .39, p < .0001). These results provide preliminary validity evidence that MCDs may be useful in identifying students who have difficulty in integrating foundational and clinical sciences.


Subject(s)
Concept Formation , Curriculum , Internet , Science/education , Systems Integration , Clinical Competence , Diagnosis, Differential , Pilot Projects
2.
J Clin Anesth ; 54: 102-110, 2019 May.
Article in English | MEDLINE | ID: mdl-30415149

ABSTRACT

STUDY OBJECTIVE: The first aim of this study was to test whether a 7-item evaluation scale developed by our department's certified registered nurse anesthetists (CRNAs) was psychometrically reliable. The second aim was to test whether anesthesiologists' performance changed with their years of postgraduate experience. DESIGN, SETTING, MEASUREMENTS: Sixty-two University of Iowa CRNAs evaluated 81 anesthesiologists during one weekend. Anesthesiologists' scores were adjusted for CRNA rater leniency. Anesthesiologists' scores were tested for sensitivity to CRNA-anesthesiologist case-specific variables. Scores were also tested against anesthesiologists' years of postgraduate experience. The latter association was tested for sensitivity to case-specific variables, anesthesiologists' clinical supervision scores provided by residents, and anesthesiologist clinical assignment variables. MAIN RESULTS: The 7 items demonstrated a single-factor structure, allowing calculation of a mean score over the 7 items. Individual anesthesiologist scores were reliable when scores were provided by at least 10 different CRNAs. Anesthesiologists' scores (mean 3.34 [SD 0.41]) were not affected by the interval since the last CRNA-anesthesiologist interaction, the number of interactions, or case-specific variables. There was a negative association between leniency-adjusted anesthesiologist scores and years of anesthesiologist postgraduate practice (coefficient -0.20 per decade, t = -19.39, P < 0.0001). The association remained robust when accounting for case-specific variables, resident clinical supervision scores, and overall clinical assignment variables. CONCLUSIONS: Anesthesiologists' operating room performance can be evaluated reliably by non-physician anesthesia providers (CRNAs). The evaluation process can be done reliably and validly using an assessment scale consisting of only a few (<10) items and with evaluations by only a few individuals (≥10 CRNA raters). There is no indication that evaluations provided by CRNAs were significantly influenced by the interval between interaction and evaluation, the number of interactions, or other case-specific variables. From the CRNAs' perspective, on average, as anesthesiologists gain experience, their behavior in the operating room changes, providing CRNAs with less direct assistance in patient care.
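Editor's note: the abstract does not spell out how scores were "adjusted for CRNA rater leniency." The sketch below shows one common, simple approach (centering each rater's scores on the grand mean); the data frame, variable names, and fixed-effects centering are assumptions for illustration, not the authors' actual model.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per CRNA-anesthesiologist evaluation,
# where "score" is that CRNA's mean over the 7 scale items.
ratings = pd.DataFrame({
    "crna":             ["c1", "c1", "c2", "c2", "c3", "c3"],
    "anesthesiologist": ["a1", "a2", "a1", "a3", "a2", "a3"],
    "score":            [3.9, 3.4, 3.1, 2.8, 3.7, 3.2],
})

grand_mean = ratings["score"].mean()

# Leniency of each rater = how far that rater's average sits above the grand mean.
leniency = ratings.groupby("crna")["score"].transform("mean") - grand_mean

# Leniency-adjusted score: remove the rater's systematic offset.
ratings["adjusted"] = ratings["score"] - leniency

# One simple performance summary: each anesthesiologist's adjusted mean.
print(ratings.groupby("anesthesiologist")["adjusted"].mean().round(2))
```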


Subject(s)
Anesthesiologists/statistics & numerical data , Clinical Competence/statistics & numerical data , Employee Performance Appraisal/statistics & numerical data , Nurse Anesthetists/psychology , Physician-Nurse Relations , Anesthesiologists/psychology , Employee Performance Appraisal/methods , Humans , Operating Rooms , Psychometrics , Time Factors
3.
IISE Trans Healthc Syst Eng ; 8(2): 110-116, 2018.
Article in English | MEDLINE | ID: mdl-29963653

ABSTRACT

An unbiased, repeatable process for assessing operating room performance is an important step toward quantifying the relationship between surgical training and performance. Hip fracture surgeries offer a promising first target in orthopedic trauma because they are common and they offer quantitative performance metrics that can be assessed from video recordings and intraoperative fluoroscopic images. Hip fracture repair surgeries were recorded using a head-mounted point-of-view camera. Intraoperative fluoroscopic images were also saved. The following performance metrics were analyzed: duration of wire navigation, number of fluoroscopic images collected, degree of intervention by the surgeon's supervisor, and the tip-apex distance (TAD). Two orthopedic traumatologists graded surgical performance in each video independently using an Objective Structured Assessment of Technical Skill (OSATS). Wire navigation duration correlated with weeks into residency and prior cases logged. TAD correlated with cases logged. There was no significant correlation between the OSATS total score and experience metrics. Total OSATS score correlated with duration and number of fluoroscopic images. Our results indicate that two metrics of hip fracture wire navigation performance, duration and TAD, significantly differentiate surgical experience. The methods presented have the potential to provide truly objective assessment of resident technical performance in the OR.

4.
Acad Med ; 93(8): 1212-1217, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29697428

ABSTRACT

PURPOSE: Many factors influence the reliable assessment of medical students' competencies in the clerkships. The purpose of this study was to determine how many clerkship competency assessment scores were necessary to achieve an acceptable threshold of reliability. METHOD: Clerkship student assessment data were collected during the 2015-2016 academic year as part of the medical school assessment program at the University of Michigan Medical School. Faculty and residents assigned competency assessment scores for third-year core clerkship students. Generalizability (G) and decision (D) studies were conducted using balanced, stratified, and random samples to examine the extent to which overall assessment scores could reliably differentiate between students' competency levels both within and across clerkships. RESULTS: In the across-clerkship model, the residual error accounted for the largest proportion of variance (75%), whereas the variance attributed to the student and student-clerkship effects was much smaller (7% and 10.1%, respectively). D studies indicated that generalizability estimates for eight assessors within a clerkship varied across clerkships (G coefficient range = 0.000-0.795). Within clerkships, the number of assessors needed for optimal reliability varied from 4 to 17. CONCLUSIONS: Minimal reliability was found in competency assessment scores for half of the clerkships. The variability in reliability estimates across clerkships may be attributable to differences in scoring processes and assessor training. Other medical schools face similar variation in assessments of clerkship students; therefore, the authors hope this study will serve as a model for other institutions that wish to examine the reliability of their clerkship assessment scores.


Subject(s)
Clinical Clerkship/standards , Clinical Competence/standards , Educational Measurement/standards , Clinical Clerkship/statistics & numerical data , Clinical Competence/statistics & numerical data , Educational Measurement/methods , Educational Measurement/statistics & numerical data , Educational Status , Humans , Reproducibility of Results , Students, Medical/statistics & numerical data
5.
Acad Med ; 93(8): 1146-1149, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29465452

ABSTRACT

PROBLEM: As medical schools move from discipline-based courses to more integrated approaches, identifying assessment tools that parallel this change is an important goal. APPROACH: The authors describe the use of test item statistics to assess the reliability and validity of web-enabled mechanistic case diagrams (MCDs) as a potential tool to assess students' ability to integrate basic science and clinical information. Students review a narrative clinical case and construct an MCD using items provided by the case author. Students identify the relationships among underlying risk factors, etiology, pathogenesis and pathophysiology, and the patients' signs and symptoms. They receive one point for each correctly identified link. OUTCOMES: In 2014-2015 and 2015-2016, case diagrams were implemented in consecutive classes of 150 medical students. The alpha reliability coefficient for the overall score, constructed using each student's mean proportion correct across all cases, was 0.82. Discrimination indices for each of the case scores with the overall score ranged from 0.23 to 0.51. In a G study using those students with complete data (n = 251) on all 16 cases, 10% of the variance was true score variance, and systematic case variance was large. Using 16 cases generated a G coefficient (relative score reliability) equal to 0.72 and a Phi equal to 0.65. NEXT STEPS: The next phase of the project will involve deploying MCDs in higher-stakes settings to determine whether similar results can be achieved. Further analyses will determine whether these assessments correlate with other measures of higher-order thinking skills.
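Editor's note: the abstract reports the person-by-case generalizability results only in summary form. The sketch below shows how G (relative) and Phi (absolute) coefficients follow from variance components; the component values are assumptions chosen to be roughly consistent with the reported figures (about 10% person variance, large case variance, G ≈ 0.72 and Phi ≈ 0.65 with 16 cases) and are not the study's actual estimates.

```python
# Illustrative p x c (person-by-case) design, variance components assumed.
var_person = 0.10        # true score (person) variance
var_case = 0.24          # systematic case variance (large, as reported)
var_person_case = 0.62   # person-by-case interaction confounded with error

def g_and_phi(n_cases: int) -> tuple[float, float]:
    """Relative (G) and absolute (Phi) reliability of a mean over n_cases."""
    rel_error = var_person_case / n_cases
    abs_error = (var_case + var_person_case) / n_cases
    g = var_person / (var_person + rel_error)
    phi = var_person / (var_person + abs_error)
    return g, phi

g16, phi16 = g_and_phi(16)
print(f"16 cases: G = {g16:.2f}, Phi = {phi16:.2f}")  # ~0.72 and ~0.65
```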


Subject(s)
Educational Measurement/standards , Students, Medical/psychology , Thinking , Clinical Competence/standards , Educational Measurement/methods , Humans , Reproducibility of Results
6.
Fam Med ; 49(2): 97-105, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28218934

ABSTRACT

BACKGROUND AND OBJECTIVES: Many medical student-patient encounters occur in the outpatient setting. Conference room staffing (CRS) of student presentations has been the norm in the United States in recent decades. However, this method may not be well suited to outpatient precepting because it is inefficient and reduces valuable direct face time between physician and patient. Precepting in the Presence of the Patient (PIPP) has previously been found to be an effective educational model in the outpatient setting but has never been studied in family medicine clinics, with non-English speaking patients, or with patients from lower socioeconomic backgrounds with low literacy. METHODS: We used a randomized controlled trial of educational models comparing time spent using PIPP with CRS in two family medicine clinics. Patient, student, and physician satisfaction were measured using a 5-point Likert scale; total encounter time and time spent precepting were also recorded. RESULTS: PIPP was strongly preferred by attending physicians, while patients and students were equally satisfied with either precepting method. PIPP provides an additional 3 minutes of physician-patient face time (17.39 versus 14.08 minutes) in an encounter that is overall about 2 minutes shorter (17.39 versus 19.71 minutes). CONCLUSIONS: PIPP is an effective method for precepting medical students in family medicine clinics, even with non-English speaking patients and those with low literacy. Given the time constraints of family physicians, PIPP should be considered as a preferred, time-efficient method for training medical students that is well received by patients, students, and particularly physicians.


Subject(s)
Family Practice/education , Preceptorship/methods , Students, Medical/psychology , Adult , Ambulatory Care , Female , Humans , Male , Middle Aged , Patient Satisfaction , Physician-Patient Relations , Physicians, Family/psychology , Time Factors , United States
7.
J Eval Clin Pract ; 23(1): 44-48, 2017 Feb.
Article in English | MEDLINE | ID: mdl-26486941

ABSTRACT

RATIONALE: Decision-making performance assessments have proven problematic for assessing clinical reasoning. AIMS AND OBJECTIVES: A Bayesian approach to designing an advanced clinical reasoning assessment is well grounded in mathematical and cognitive theory and may offer significant psychometric advantages. Probabilistic logic plays an important role in medical problem solving, and performance on Bayesian-type tasks appears to be causally related to the ability to make sound clinical decisions. METHODS: A validity argument is used to guide the design of an assessment of medical reasoning using clinical probabilities. RESULTS/CONCLUSIONS: The practical advantage of a Bayesian approach to item design is that probability theory provides a rationally optimal method for managing uncertain information and supplies the criteria for objective correct-answer scoring. Potential item formats are discussed.
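Editor's note: the abstract argues that probability theory yields an objectively correct answer for items built on clinical probabilities. A minimal worked example of the kind of Bayesian updating such items could target is sketched below; the pre-test probability, sensitivity, and specificity are illustrative numbers, not values from the article.

```python
def post_test_probability(pretest: float, sensitivity: float, specificity: float,
                          positive_result: bool = True) -> float:
    """Bayes' theorem for a dichotomous diagnostic test, via likelihood ratios."""
    if positive_result:
        lr = sensitivity / (1.0 - specificity)        # positive likelihood ratio
    else:
        lr = (1.0 - sensitivity) / specificity        # negative likelihood ratio
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1.0 + posttest_odds)

# Illustrative only: 20% pre-test probability, a test with 90% sensitivity and
# 80% specificity, and a positive result.
print(round(post_test_probability(0.20, 0.90, 0.80), 2))  # ~0.53
```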


Subject(s)
Bayes Theorem , Clinical Competence/standards , Clinical Decision-Making/methods , Problem Solving , Humans , Logic , Psychometrics , Thinking , Uncertainty
8.
Acad Med ; 92(4): 550-555, 2017 Apr.
Article in English | MEDLINE | ID: mdl-27805951

ABSTRACT

PURPOSE: To develop and determine the reliability of a novel measurement instrument assessing the quality of residents' discharge summaries. METHOD: In 2014, the authors created a discharge summary evaluation instrument based on consensus recommendations from national regulatory bodies and input from primary care providers at their institution. After a brief pilot, they used the instrument to evaluate discharge summaries written by first-year internal medicine residents (n = 24) at a single U.S. teaching hospital during the 2013-2014 academic year. They conducted a generalizability study to determine the reliability of the instrument and a series of decision studies to determine the number of discharge summaries and raters needed to achieve a reliable evaluation score. RESULTS: The generalizability study demonstrated that 37% of the variance reflected residents' ability to generate an adequate discharge summary (true score variance). The decision studies estimated that the mean score from six discharge summary reviews completed by a unique rater for each review would yield a reliability coefficient of 0.75. Because of high interrater reliability, multiple raters per discharge summary would not significantly enhance the reliability of the mean rating. CONCLUSIONS: This evaluation instrument reliably measured residents' performance writing discharge summaries. A single rating of six discharge summaries can achieve a reliable mean evaluation score. Using this instrument is feasible even for programs with a limited number of inpatient encounters and a small pool of faculty preceptors.
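Editor's note: the decision-study logic behind "six reviews yield a reliability of 0.75" can be sketched with a single-facet approximation. The sketch below treats the reported 37% true score variance as the per-rating reliable proportion and everything else as error; the published 0.75 came from the full generalizability design, so this simplified projection only approximates it.

```python
# Approximate D-study projection from the reported proportions of variance.
var_true = 0.37              # resident (true score) variance for a single rating
var_error = 1.0 - var_true   # all other variance, treated as error in this sketch

def projected_reliability(n_summaries: int) -> float:
    """Reliability of a mean over n_summaries, one unique rater per summary."""
    return var_true / (var_true + var_error / n_summaries)

for n in (1, 3, 6, 9):
    print(n, round(projected_reliability(n), 2))
# With 6 summaries this simple projection gives ~0.78, close to the 0.75
# reported from the study's full generalizability design.
```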


Subject(s)
Clinical Competence , Internal Medicine/education , Internship and Residency , Patient Discharge Summaries/standards , Educational Measurement/methods , Hospitals, Teaching , Humans , Pilot Projects , Reproducibility of Results , Retrospective Studies , United States
9.
Iowa Orthop J ; 36: 1-6, 2016.
Article in English | MEDLINE | ID: mdl-27528827

ABSTRACT

BACKGROUND: Interpreting two-dimensional radiographs to ascertain the three-dimensional (3D) position and orientation of fracture planes and bone fragments is an important component of orthopedic diagnosis and clinical management. This skill, however, has not been thoroughly explored and measured. Our primary research questions are whether 3D radiographic image interpretation can be reliably assessed and whether this assessment varies by level of training. A test designed to measure this skill among orthopedic surgeons would provide a quantitative benchmark for skill assessment and training research. METHODS: Two tests consisting of a series of online exercises were developed to measure this skill. Each exercise displayed a pair of musculoskeletal radiographs. Participants selected one of three CT slices of the same or similar fracture patterns that best matched the radiographs. In experiment 1, 10 orthopedic residents and staff responded to nine questions. In experiment 2, 52 residents from both orthopedics and radiology responded to 12 questions. RESULTS: Experiment 1 yielded a Cronbach alpha of 0.47. Performance correlated with experience (r(8) = 0.87, p < 0.01), suggesting that the test could be both valid and reliable with a slight increase in test length. In experiment 2, after removing three non-discriminating items, the Cronbach alpha coefficient was 0.28, and performance correlated with experience (r(50) = 0.25, p < 0.10). CONCLUSIONS: Although evidence for reliability and validity was more compelling in the first experiment, the analyses suggest motivation and test duration are important determinants of test efficacy. The interpretation of radiographs to discern 3D information is a promising and relatively unexplored area for surgical skill education and assessment. The online test was useful and reliable. Further test development is likely to increase test effectiveness. CLINICAL RELEVANCE: Accurately interpreting radiographic images is an essential clinical skill. Quantitative, repeatable techniques to measure this skill can improve resident training and improve patient safety.


Subject(s)
Clinical Competence , Fractures, Bone/diagnostic imaging , Orthopedics/education , Tomography, X-Ray Computed , Educational Measurement , Humans , Reproducibility of Results
10.
J Surg Educ ; 73(5): 780-7, 2016.
Article in English | MEDLINE | ID: mdl-27184177

ABSTRACT

OBJECTIVE: There are no widely accepted, objective, and reliable tools for measuring surgical skill in the operating room (OR). Ubiquitous video and imaging technology provide opportunities to develop metrics that meet this need. Hip fracture surgery is a promising area in which to develop these measures because hip fractures are common, the surgery is used as a milestone for residents, and it demands technical skill. The study objective is to develop meaningful, objective measures of wire navigation performance in the OR. DESIGN: Resident surgeons wore a head-mounted video camera while performing surgical open reduction and internal fixation using a dynamic hip screw. Data collected from video included: duration of wire navigation, number of fluoroscopic images, and the degree of intervention by the surgeon's supervisor. To determine the reliability of these measurements, 4 independent raters performed them for 2 cases. Raters independently measured the tip-apex distance (TAD), which reflects the accuracy of the surgical placement of the wire, on all 7 cases. SETTING: University of Iowa Hospitals and Clinics in Iowa City, IA, a public tertiary academic center. PARTICIPANTS: In total, 7 surgeries were performed by 7 different orthopedic residents. All 10 raters were biomedical engineering graduate students. RESULTS: The standard deviations for anteroposterior, lateral, and combined TAD measurements of the 10 raters were 2.7, 1.9, and 3.7 mm, respectively, and interrater reliability produced a Cronbach α of 0.97. The interrater reliability analysis for all 9 video-based measures produced a Cronbach α of 0.99. CONCLUSIONS: Several video-based metrics were consistent across the 4 video reviewers and are likely to be useful for performance assessment. The TAD measurement was less reliable than previous reports have suggested, but remains a valuable metric of performance. Nonexperts can reliably measure these values and they offer an objective assessment of OR performance.
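Editor's note: the interrater reliability reported here (Cronbach α) can be computed from a simple cases-by-raters matrix. The sketch below treats each rater as an "item"; the TAD values in the matrix are made up for illustration and are not the study's data.

```python
import numpy as np

# Hypothetical measurements: rows = cases, columns = raters (TAD in mm).
tad = np.array([
    [18.0, 19.0, 17.5, 18.5],
    [24.0, 25.5, 23.0, 24.5],
    [15.0, 14.5, 16.0, 15.5],
    [30.0, 29.0, 31.0, 30.5],
    [21.0, 22.0, 20.5, 21.5],
    [12.0, 13.0, 11.5, 12.5],
    [27.0, 26.5, 28.0, 27.5],
])

def cronbach_alpha(data: np.ndarray) -> float:
    """Cronbach's alpha treating each column (rater) as an 'item'."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1).sum()
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars / total_var)

print(round(cronbach_alpha(tad), 3))  # near 1.0 when raters closely agree
```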


Subject(s)
Bone Wires , Clinical Competence , Fracture Fixation, Internal/methods , Hip Fractures/surgery , Operating Rooms , Orthopedic Procedures/education , Orthopedic Procedures/instrumentation , Aged, 80 and over , Education, Medical, Graduate , Female , Fluoroscopy , Humans , Internship and Residency , Iowa , Male , Reproducibility of Results , Treatment Outcome , Video Recording
11.
Teach Learn Med ; 28(3): 279-85, 2016.
Article in English | MEDLINE | ID: mdl-27092723

ABSTRACT

CONSTRUCT/BACKGROUND: Medical school grades are currently unstandardized, and their level of reliability is unknown. This means their usefulness for reporting on student achievement is also not well documented. This study investigates grade reliability within 1 medical school. APPROACH: Generalizability analyses are conducted on grades awarded. Grades from didactic and clerkship-based courses were treated as 2 levels of a fixed facet within a univariate mixed model. Grades from within the 2 levels (didactic and clerkship) were also entered in a multivariate generalizability study. RESULTS: Grades from didactic courses were shown to produce a highly reliable mean score (G = .79) when averaged over as few as 5 courses. Although the universe score correlation between didactic and clerkship courses was high (r = .80), the clerkship courses required almost twice as many grades to reach a comparable level of reliability. When grades were converted to a Pass/Fail metric, almost all information contained in the grades was lost. CONCLUSIONS: Although it has been suggested that the imprecision of medical school grades precludes their use as a reliable indicator of student achievement, these results suggest otherwise. While it is true that a Pass/Fail system of grading provides very little information about a student's level of performance, a multi-tiered grading system was shown to be a highly reliable indicator of student achievement within the medical school. Although grades awarded during the first 2 didactic years appear to be more reliable than clerkship grades, both yield useful information about student performance within the medical college.


Subject(s)
Education, Medical/standards , Educational Measurement/standards , Achievement , Humans , Iowa , Models, Statistical , Reproducibility of Results
12.
Med Educ Online ; 21: 29279, 2016.
Article in English | MEDLINE | ID: mdl-26925540

ABSTRACT

BACKGROUND: When ratings of student performance within the clerkship consist of a variable number of ratings per clinical teacher (rater), an important measurement question arises regarding how to combine such ratings to accurately summarize performance. As previous G studies have not estimated the independent influence of occasion and rater facets in observational ratings within the clinic, this study was designed to provide estimates of these two sources of error. METHOD: During 2 years of an emergency medicine clerkship at a large midwestern university, 592 students were evaluated an average of 15.9 times. Ratings were performed at the end of clinical shifts, and students often received multiple ratings from the same rater. A completely nested G study model (occasion: rater: person) was used to analyze sampled rating data. RESULTS: The variance component (VC) related to occasion was small relative to the VC associated with rater. The D study clearly demonstrates that having a preceptor rate a student on multiple occasions does not substantially enhance the reliability of a clerkship performance summary score. CONCLUSIONS: Although further research is needed, it is clear that case-specific factors do not explain the low correlation between ratings and that having one or two raters repeatedly rate a student on different occasions/cases is unlikely to yield a reliable mean score. This research suggests that it may be more efficient to have a preceptor rate a student just once. However, when multiple ratings from a single preceptor are available for a student, it is recommended that a mean of the preceptor's ratings be used to calculate the student's overall mean performance score.
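Editor's note: the completely nested (occasion:rater:person) D-study logic described here can be made concrete with a short sketch. The variance components below are assumptions, not the study's estimates; the key feature, consistent with the abstract, is that rater variance dominates occasion variance, so repeated ratings from the same rater add little.

```python
# Completely nested design: occasions nested in raters nested in persons.
var_person = 0.15              # true score (person) variance
var_rater_in_person = 0.55     # rater:person variance (assumed dominant)
var_occasion_in_rater = 0.30   # occasion:rater:person variance plus error

def mean_score_reliability(n_raters: int, n_occasions_per_rater: int) -> float:
    error = (var_rater_in_person / n_raters
             + var_occasion_in_rater / (n_raters * n_occasions_per_rater))
    return var_person / (var_person + error)

# Sixteen ratings from only two raters barely help...
print(round(mean_score_reliability(n_raters=2, n_occasions_per_rater=8), 2))
# ...compared with the same sixteen ratings spread across sixteen raters.
print(round(mean_score_reliability(n_raters=16, n_occasions_per_rater=1), 2))
```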


Subject(s)
Clinical Clerkship/standards , Educational Measurement/methods , Educational Measurement/standards , Clinical Competence , Emergency Medicine/education , Humans , Observer Variation , Reproducibility of Results
13.
Teach Learn Med ; 27(2): 197-200, 2015.
Article in English | MEDLINE | ID: mdl-25893942

ABSTRACT

ISSUE: The research published outside of medical education journals provides an important source of validity evidence for using cognitive ability testing in medical school admissions. EVIDENCE: The cumulative body of validity research, consisting of thousands of studies and scores of meta-analyses, has conclusively demonstrated that a strong positive relationship exists between job performance and general mental ability. IMPLICATIONS: Recommendations for reducing the emphasis on or eliminating the role of general mental ability in the selection process for medical schools are not based on a consideration of the wider research evidence. Admission interventions that substantially reduce the level of academic aptitude are also likely to result in reduced professional performance.


Subject(s)
College Admission Test , Predictive Value of Tests , School Admission Criteria , Schools, Medical , Clinical Competence , Forecasting , Humans , Learning , United States
14.
Med Educ Online ; 19: 24708, 2014.
Article in English | MEDLINE | ID: mdl-25059836

ABSTRACT

The goal of mechanistic case diagramming (MCD) is to provide students with a more in-depth understanding of cause-and-effect relationships and basic mechanistic pathways in medicine. This will enable them to better explain how observed clinical findings develop from preceding pathogenic and pathophysiological events. The pedagogic function of MCD lies in relating risk factors, disease entities and morphology, signs and symptoms, and test and procedure findings in a specific case scenario to etiologic, pathogenic, and pathophysiological sequences within a flow diagram. In this paper, we describe the addition of automation and predetermined lists to further develop the original concept of MCD as described by Engelberg in 1992 and Guerrero in 2001. We demonstrate that with these modifications, MCD is effective and efficient in small group case-based teaching for second-year medical students (ratings of ~3.4 on a 4.0 scale). There was also a significant correlation with other measures of competency, with a 'true' score correlation of 0.54. A traditional calculation of reliability showed promising results (α = 0.47) within a low stakes, ungraded environment. Further, we have demonstrated MCD's potential for use in independent learning and team-based learning (TBL). Future studies are needed to evaluate MCD's potential for use in medium stakes assessment or self-paced independent learning and assessment. MCD may be especially relevant for returning students to the application of basic medical science mechanisms during the clinical years.


Subject(s)
Disease/etiology , Education, Medical, Undergraduate/methods , Internet , Learning , Causality , Humans , Iowa , Risk Factors , User-Computer Interface
16.
Teach Learn Med ; 25 Suppl 1: S50-6, 2013.
Article in English | MEDLINE | ID: mdl-24246107

ABSTRACT

Over the last 25 years, a large body of research has investigated how best to select applicants to study medicine. Although these studies have inspired little actual change in admission practice, the implications of this research are substantial. Five areas of inquiry are discussed: (1) the interview and related techniques, (2) admission tests, (3) other measures of personal competencies, (4) the decision process, and (5) defining and measuring the criterion. In each of these areas we summarize consequential developments and discuss their implication for improving practice. (1) The traditional interview has been shown to lack both reliability and validity. Alternatives have been developed that display promising measurement characteristics. (2) Admission test scores have been shown to predict academic and clinical performance and are generally the most useful measures obtained about an applicant. (3) Due to the high-stakes nature of the admission decision, it is difficult to support a logical validity argument for the use of personality tests. Although standardized letters of recommendation appear to offer some promise, more research is needed. (4) The methods used to make the selection decision should be responsive to validity research on how best to utilize applicant information. (5) Few resources have been invested in obtaining valid criterion measures. Future research might profitably focus on composite scores as a method for generating a measure of a physician's career success. There are a number of social and organizational factors that resist evidence-based change. However, research over the last 25 years does present important findings that could be used to improve the admission process.


Subject(s)
Decision Making , School Admission Criteria , Schools, Medical/organization & administration , Students, Medical/psychology , Achievement , Aptitude , Education, Premedical , Educational Measurement , Humans , Interviews as Topic , Morals , Personality , Professional Competence , Psychological Tests , Resilience, Psychological , Social Behavior
17.
Med Educ ; 47(12): 1175-83, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24206151

ABSTRACT

CONTEXT: Recent reviews have claimed that the script concordance test (SCT) methodology generally produces reliable and valid assessments of clinical reasoning and that the SCT may soon be suitable for high-stakes testing. OBJECTIVES: This study is intended to describe three major threats to the validity of the SCT not yet considered in prior research and to illustrate the severity of these threats. METHODS: We conducted a review of SCT reports available through the Web of Science database. Additionally, we reanalysed scores from a previously published SCT administration to explore issues related to standard SCT scoring practice. RESULTS: Firstly, the predominant method for aggregate and partial credit scoring of SCTs introduces logical inconsistencies in the scoring key. Secondly, our literature review shows that SCT reliability studies have generally ignored inter-panel, inter-panellist and test-retest measurement error. Instead, studies have focused on observed levels of coefficient alpha, which is neither an informative index of internal structure nor a comprehensive index of reliability for SCT scores. As such, claims that SCT scores show acceptable reliability are premature. Finally, SCT criteria for item inclusion, in concert with a statistical artefact of the SCT format, cause anchors at the extremes of the scale to have less expected credit than anchors near or at the midpoint. Consequently, SCT scores are likely to reflect construct-irrelevant differences in examinees' response styles. This makes the test susceptible to bias against candidates who endorse extreme scale anchors more readily; it also makes two construct-irrelevant test taking strategies extremely effective. In our reanalysis, we found that examinees could drastically increase their scores by never endorsing extreme scale points. Furthermore, examinees who simply endorsed the scale midpoint for every item would still have outperformed most examinees who used the scale as it is intended. CONCLUSIONS: Given the severity of these threats, we conclude that aggregate scoring of SCTs cannot be recommended. Recommendations for revisions of SCT methodology are discussed.
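Editor's note: the scoring artifact described here (extreme anchors carrying less expected credit than the midpoint under aggregate, partial-credit scoring) can be illustrated with a small simulation. The scoring rule below, credit equal to the number of panelists endorsing an anchor divided by the modal count, is the standard aggregate SCT rule the paper critiques; the panel size, number of items, and response distribution are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
anchors = np.arange(-2, 3)          # 5-point SCT scale: -2 .. +2
n_items, panel_size = 200, 15

expected_credit = np.zeros(len(anchors))
midpoint_score = 0.0
for _ in range(n_items):
    # Assumed panel behaviour: responses cluster loosely around a per-item tendency.
    tendency = rng.choice(anchors)
    responses = np.clip(np.round(tendency + rng.normal(0, 1, panel_size)), -2, 2)
    counts = np.array([(responses == a).sum() for a in anchors])
    credit = counts / counts.max()  # aggregate partial-credit rule: mode gets 1
    expected_credit += credit
    midpoint_score += credit[2]     # examinee who always endorses the midpoint (0)

# The midpoint typically ends up with the highest average credit, and an
# always-midpoint strategy earns a respectable mean item score.
print("mean credit per anchor (-2..+2):", np.round(expected_credit / n_items, 2))
print("always-midpoint mean item score:", round(midpoint_score / n_items, 2))
```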


Subject(s)
Education, Medical, Graduate/methods , Educational Measurement/methods , Clinical Competence/standards , Decision Making , Humans , Reproducibility of Results
18.
Anesth Analg ; 116(6): 1342-51, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23558839

ABSTRACT

BACKGROUND: A study by de Oliveira Filho et al. reported a validated set of 9 questions by which Brazilian anesthesia residents assessed faculty supervision in the operating room. The aim of this study was to use this question set to determine whether faculty operating room supervision scores were associated with residents' year of clinical anesthesia training and/or number of specific resident-faculty interactions. We also characterized associations between faculty operating room supervision scores and resident assessments of: (1) faculty supervision in settings other than operating rooms, (2) faculty clinical ability (family choice), and (3) faculty teaching effectiveness. Finally, we characterized the psychometric properties of the de Oliveira Filho et al. question set in a United States anesthesia residency program. METHODS: All 39 residents in the Department of Anesthesia of the University of Iowa in their first (n = 14), second (n = 13), or third (n = 12) year of clinical anesthesia training evaluated the supervision provided by all anesthesia faculty who staffed in at least 1 of 3 clinical settings (operating room [n = 49], surgical intensive care unit [n = 10], pain clinic [n = 6]). For all resident-faculty pairs, departmental billing data were used to quantitate the number of resident-faculty interactions and the interval between the last interaction and the assessment. A generalizability study was performed to determine the minimum number of resident evaluations needed for high reliability and dependability. RESULTS: There were no significant associations between faculty mean operating room supervision scores and: (1) resident-faculty patient encounters (Kendall τb = 0.01; 95% confidence interval [CI], -0.02 to +0.04; P = 0.71), (2) resident-faculty days of interaction (τb = -0.01; 95% CI, -0.05 to +0.02; P = 0.46), and (3) days since last resident-faculty interaction (τb = 0.01; 95% CI, -0.02 to +0.05; P = 0.49). Supervision scores for the operating room and surgical intensive care unit were highly correlated (τb = 0.71; 95% CI, 0.63 to 0.78; P < 0.0001). Supervision scores for the operating room also were highly correlated with family choice scores (τb = 0.77; 95% CI, 0.70 to 0.84; P < 0.0001) and teaching scores (τb = 0.87; 95% CI, 0.82 to 0.92; P < 0.0001). High reliability and dependability (both G- and ϕ-coefficients > 0.80) occurred when individual faculty anesthesiologists received assessments from 15 or more different residents. CONCLUSION: Supervision scores provided by all residents can be given equal weight when calculating an individual faculty anesthesiologist's mean supervision score. Assessments of supervision, teaching, and quality of clinical care are highly correlated. When the de Oliveira Filho et al. question set is used in a United States anesthesia residency program, supervision scores are highly reliable and dependable when at least 15 residents assess each faculty anesthesiologist.


Subject(s)
Anesthesiology/education , Faculty, Medical , Internship and Residency , Operating Rooms/organization & administration , Psychometrics , Humans , Organization and Administration , Reproducibility of Results
19.
Med Educ Online ; 18: 1-9, 2013 Apr 10.
Article in English | MEDLINE | ID: mdl-23578659

ABSTRACT

BACKGROUND: The U.S. Supreme Court has recently heard another affirmative action case, and similar programs to promote equitable representation in higher education are being debated and enacted around the world. Understanding the empirical and quantitative research conducted over the last 50 years is important in designing effective and fair initiatives related to affirmative action in medical education. Unfortunately, the quantitative measurement research relevant to affirmative action is poorly documented in the scholarly journals that serve medical education. METHODS: This research organizes and documents the measurement literature relevant to enacting affirmative action within the medical school environment, and should be valuable for informing future actions. It provides summaries of those areas where the research evidence is strong and highlights areas where more research evidence is needed. To structure the presentation, 10 topic areas are identified in the form of research questions. RESULTS: Measurement evidence related to these questions is reviewed and summarized to provide evidence-based answers. CONCLUSIONS: These answers should provide a useful foundation for making important decisions regarding the use of racial diversity initiatives in medical education.


Subject(s)
College Admission Test , Education, Medical/organization & administration , Racial Groups , Education, Medical/legislation & jurisprudence , Education, Medical/standards , Humans , United States
20.
Teach Learn Med ; 25(1): 103-7, 2013.
Article in English | MEDLINE | ID: mdl-23330903

ABSTRACT

BACKGROUND: Admission decisions require that information about an applicant be combined using either holistic (human judges) or statistical (actuarial) methods. For optimizing a defined measurable outcome, there is a consistent body of research evidence demonstrating that statistical methods yield superior decisions compared to those generated by judges. It is possible, however, that the benefits of holistic decisions are reflected in unmeasured outcomes. If such benefits exist, they would necessarily appear as systematic variance in raters' scores that deviates from statistically based decisions. PURPOSE: To estimate this variance, we propose a design examining the interrater reliability of difference scores (i.e., the difference between observed committee rankings and rankings based on statistical approaches). METHODS: Example calculations and G study models are presented to demonstrate how rater agreement on difference scores can be analyzed under various circumstances. High interrater reliability of difference scores would support but not prove the assertion that the holistic process adds useful information beyond that achieved by much less costly statistical approaches. Conversely, if the interrater reliability of difference scores is near zero, this would clearly demonstrate that committee judgments add random error to the decision process. RESULTS: Evidence to conduct such studies already exists within most highly selective medical schools and graduate programs, and the proposed validity research could be conducted on existing data. CONCLUSIONS: Such research evidence is critical for establishing the validity of widely used holistic admission approaches.


Subject(s)
School Admission Criteria , Schools, Medical , Humans , Reproducibility of Results , United States