Results 1 - 17 of 17
1.
Med Teach; 40(11): 1143-1150, 2018 Nov.
Article in English | MEDLINE | ID: mdl-29688108

ABSTRACT

BACKGROUND: Increased recognition of the importance of competency-based education and assessment has created a need for practical and reliable methods to assess relevant skills in the workplace. METHODS: A novel milestone-based workplace assessment system was implemented in 15 pediatrics residency programs. The system provided (1) web-based multisource feedback (MSF) and structured clinical observation (SCO) instruments that could be completed on any computer or mobile device, and (2) monthly feedback reports that included competency-level scores and recommendations for improvement. RESULTS: For the final instruments, an average of 5.0 MSF and 3.7 SCO assessments were completed for each of 292 interns; each instrument took 4 to 8 minutes to complete, on average. Generalizability coefficients >0.80 were attainable with six MSF observations. Users indicated that the new system added value to their existing assessment programs; having to complete local assessments in addition to the new assessments was identified as a burden of the overall process. CONCLUSIONS: Outcomes, including high participation rates and reliability higher than has traditionally been reported for workplace-based assessment, provide evidence for the validity of the scores produced by this competency-based assessment system. The development of this assessment model is generalizable to other specialties.
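
The reliability claim above follows from the standard one-facet generalizability projection, in which the coefficient for the mean of n observations is the person variance divided by the person variance plus the residual variance over n. A minimal Python sketch, with hypothetical variance components chosen only to illustrate how roughly six observations can clear 0.80:

```python
# One-facet generalizability projection for the mean of n MSF observations.
# The variance components are hypothetical placeholders, not study values.

def g_coefficient(var_person: float, var_residual: float, n_obs: int) -> float:
    """E-rho^2 for a persons-x-observations design averaged over n_obs."""
    return var_person / (var_person + var_residual / n_obs)

var_person, var_residual = 0.30, 0.45  # placeholder intern and error variance

for n in range(1, 9):
    print(f"{n} observations: {g_coefficient(var_person, var_residual, n):.2f}")
# With these inputs, the coefficient crosses 0.80 at n = 6.
```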


Subject(s)
Competency-Based Education/standards; Educational Measurement/methods; Formative Feedback; Internship and Residency/organization & administration; Workplace/standards; Clinical Competence/standards; Clinical Decision-Making; Educational Measurement/standards; Humans; Internet; Internship and Residency/standards; Pediatrics/education; Reproducibility of Results
2.
Med Teach; 38(10): 995-1002, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27027428

ABSTRACT

BACKGROUND: The Pediatrics Milestones Assessment Pilot employed a new multisource feedback (MSF) instrument to assess nine Pediatrics Milestones among interns and subinterns in the inpatient context. OBJECTIVE: To report validity evidence for the MSF tool for informing milestone classification decisions. METHODS: MSF instruments were completed by different raters for each learner in each rotation. We present evidence for validity based on the unified validity framework. RESULTS: One hundred ninety-two interns and 41 subinterns at 18 Pediatrics residency programs received a total of 1,084 MSF forms from faculty (40%), senior residents (34%), nurses (22%), and other staff (4%). Variance in ratings was associated primarily with rater (32%) and learner (22%). The milestone factor structure fit the data better than simpler structures. In all domains except professionalism, ratings by nurses were significantly lower than those by faculty, and ratings by other staff were significantly higher. Ratings were higher when the rater had observed the learner for longer periods and held a positive global opinion of the learner. Ratings of interns and subinterns did not differ, except for ratings by senior residents. MSF-based scales correlated with summative milestone scores. CONCLUSION: Moderately reliable MSF ratings of interns and subinterns can be obtained in the inpatient context to inform some milestone assignments.
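
The variance decomposition reported here (rater 32%, learner 22%) comes from a crossed learner-by-rater design. Below is a sketch of one way to estimate such components on simulated data using statsmodels' variance-components interface; the study's actual software and data are not specified, and every number below is a placeholder:

```python
# Simulated learner-by-rater MSF ratings; the variance components are
# placeholders chosen to echo the reported split (~22% learner, ~32% rater).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_learners, n_raters = 60, 40
learner_eff = rng.normal(0, np.sqrt(0.22), n_learners)
rater_eff = rng.normal(0, np.sqrt(0.32), n_raters)

rows = []
for l in range(n_learners):
    for r in rng.choice(n_raters, size=5, replace=False):  # ~5 raters per learner
        rows.append({"learner": l, "rater": r,
                     "rating": 3.0 + learner_eff[l] + rater_eff[r]
                               + rng.normal(0, np.sqrt(0.46))})
df = pd.DataFrame(rows)

# A single dummy group makes learner and rater crossed random effects.
df["group"] = 1
model = smf.mixedlm("rating ~ 1", df, groups="group",
                    vc_formula={"learner": "0 + C(learner)",
                                "rater": "0 + C(rater)"})
print(model.fit().summary())  # learner, rater, and residual variance components
```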


Subject(s)
Clinical Competence/standards; Educational Measurement/standards; Formative Feedback; Internship and Residency; Pediatrics/standards; Competency-Based Education; Educational Measurement/methods; Factor Analysis; Faculty; Humans; Nurses; Pediatrics/education; Psychometrics; Societies, Medical
3.
Adv Health Sci Educ Theory Pract; 17(2): 165-81, 2012 May.
Article in English | MEDLINE | ID: mdl-20094911

ABSTRACT

During the last decade, interest in assessing professionalism in medical education has increased exponentially and has led to the development of many new assessment tools. Efforts to validate the scores produced by tools designed to assess professionalism have lagged well behind the development of these tools. This paper provides a structured framework for collecting evidence to support the validity of assessments of professionalism. The paper begins with a short history of the concept of validity in the context of psychological assessment. It then describes Michael Kane's approach to validity as a structured argument. The majority of the paper then focuses on how Kane's framework can be applied to assessments of professionalism. Examples are provided from the literature, and recommendations for future investigation are made in areas where the literature is deficient.


Subject(s)
Education, Medical/methods; Mental Disorders/diagnosis; Professional Competence; Professional Role; Psychological Tests; Reproducibility of Results; Humans
4.
Acad Med; 82(10 Suppl): S44-7, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17895689

ABSTRACT

BACKGROUND: The National Board of Medical Examiners is currently developing the Assessment of Professional Behaviors, a multisource feedback (MSF) tool intended for formative use with medical students and residents. This study investigated whether missing responses on this tool can be considered random; evidence that missing values are not random would suggest response bias, a significant threat to score validity. METHOD: Correlational analyses of pilot data (N = 2,149) investigated whether missing values were systematically related to global evaluations of observees. RESULTS: The percentage of missing items was correlated with global evaluations of observees; observers answered more items for preferred observees than for nonpreferred observees. CONCLUSIONS: Missing responses on this MSF tool appear to be nonrandom and are instead systematically related to global perceptions of observees. Further research is needed to determine whether modifications to the items, the instructions, or other components of the assessment process can reduce this effect.
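
The core analysis is a correlation between each observee's percentage of missing items and the global evaluation they received. A hypothetical reconstruction on simulated data; the column names, missingness mechanism, and effect sizes are all invented for illustration:

```python
# Simulated pilot data: 10 rating items plus a global evaluation per observee.
# The missingness mechanism is invented to show what a response bias looks like.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 200
items = pd.DataFrame(rng.integers(1, 6, size=(n, 10)).astype(float),
                     columns=[f"item_{i}" for i in range(10)])
global_eval = rng.integers(1, 6, size=n)  # 1 = nonpreferred ... 5 = preferred

# Lower-rated observees get more skipped items (the hypothesized bias).
p_skip = 0.30 - 0.05 * global_eval[:, None]
items = items.mask(rng.random(items.shape) < p_skip)

pct_missing = items.isna().mean(axis=1)
r = np.corrcoef(pct_missing, global_eval)[0, 1]
print(f"r(% missing, global evaluation) = {r:.2f}")  # negative under this mechanism
```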


Subject(s)
Behavior; Clinical Competence/standards; Data Collection/statistics & numerical data; Education, Medical; Program Evaluation/statistics & numerical data; Students, Medical; Surveys and Questionnaires; Humans; Observer Variation; Pilot Projects; Retrospective Studies
5.
Acad Med; 82(10 Suppl): S101-4, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17895671

ABSTRACT

BACKGROUND: Systematic trends in examinee performance across the testing day (sequence effects) could indicate that artifacts of the testing situation have an impact on scores. This research investigated the presence of sequence effects for United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills (CS) examination components. METHOD: Data from Step 2 CS examinees were analyzed using analysis of covariance and hierarchical linear modeling procedures. RESULTS: Sequence was significant for three of the components: communication and interpersonal skills, data gathering, and documentation. A significant gender-by-sequence interaction was found for two components. CONCLUSIONS: The presence of sequence effects suggests that scores on early cases are influenced by factors unrelated to the proficiencies of interest. More research is needed to fully understand these effects.
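
A hierarchical linear model of the kind described would regress encounter scores on case position with a random intercept per examinee, plus a gender-by-position interaction. A sketch on simulated data; the model specification, the 12-case day, and all effect sizes are assumptions, not the study's:

```python
# Simulated Step 2 CS-style encounter scores with a mild warm-up artifact.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for examinee in range(300):
    ability = rng.normal(0, 1)
    gender = rng.choice(["F", "M"])
    for position in range(1, 13):  # order of the case within the testing day
        rows.append({"examinee": examinee, "gender": gender, "position": position,
                     "score": 70 + 2 * ability + 0.3 * position + rng.normal(0, 5)})
scores = pd.DataFrame(rows)

# Random intercept per examinee; the position term captures a sequence effect,
# and the position:gender term tests the gender-by-sequence interaction.
fit = smf.mixedlm("score ~ position * gender", scores, groups="examinee").fit()
print(fit.summary())
```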


Subject(s)
Clinical Competence/standards; Educational Measurement/methods; Faculty, Medical; Licensure, Medical; Students, Medical; Communication; Female; Humans; Interpersonal Relations; Linear Models; Male; Sex Factors; United States
6.
Acad Med; 81(10 Suppl): S21-4, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17001128

ABSTRACT

BACKGROUND: This research examined relationships between and among scores from the United States Medical Licensing Examination (USMLE) Step 1, Step 2 Clinical Knowledge (CK), and subcomponents of the Step 2 Clinical Skills (CS) examination. METHOD: Correlations and failure rates were produced for first-time takers who tested during the first year of Step 2 CS Examination administration (June 2004 to July 2005). RESULTS: True-score correlations were high between patient note (PN) and data gathering (DG), moderate between communication and interpersonal skills and DG, and low between the remaining score pairs. There was little overlap between examinees failing Step 2 CK and the different components of Step 2 CS. CONCLUSION: Results suggest that combining DG and PN scores into a single composite score is reasonable and that relatively little redundancy exists between Step 2 CK and CS scores.
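
True-score (disattenuated) correlations of the kind reported here are obtained by dividing the observed correlation by the geometric mean of the two scores' reliabilities. A one-function sketch; the inputs are placeholders, not the study's values:

```python
# Classical disattenuation: observed r divided by the geometric mean of the
# two scores' reliabilities. All inputs below are hypothetical.

def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    return r_xy / (rel_x * rel_y) ** 0.5

# e.g. an observed PN/DG correlation of 0.55 with reliabilities 0.70 and 0.75
print(f"true-score r = {disattenuate(0.55, 0.70, 0.75):.2f}")  # ~0.76
```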


Subject(s)
Clinical Competence; Interpersonal Relations; Language; Licensure, Medical; Communication; Foreign Medical Graduates; Humans; United States
7.
Acad Med; 81(10 Suppl): S56-60, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17001137

ABSTRACT

BACKGROUND: Multivariate generalizability analysis was used to investigate the performance of a commonly used clinical evaluation tool. METHOD: Practicing physicians were trained to use the mini-Clinical Evaluation Exercise (mini-CEX) rating form to rate performances from the United States Medical Licensing Examination Step 2 Clinical Skills examination. RESULTS: Differences in rater stringency made the greatest contribution to measurement error; having more raters rate each examinee, even on fewer occasions, could enhance score stability. Substantial correlated error across the competencies suggests that decisions about one scale unduly influence those on others. CONCLUSIONS: Given the appearance of a halo effect across competencies, score interpretations that assume assessment of distinct dimensions of clinical performance should be made with caution. If the intention is to produce a single composite score by combining results across competencies, the presence of these effects may be less critical.


Subject(s)
Clinical Competence/standards; Educational Measurement/methods; Physical Examination/methods; Software; Analysis of Variance; Humans; Interviews as Topic
8.
Acad Med; 79(10 Suppl): S62-4, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15383392

ABSTRACT

PURPOSE: Operational USMLE™ computer-based case simulation results were examined to determine the extent to which rater reliability and regression model performance met expectations based on preoperational data. METHOD: Operational data came from Step 3 examinations given between 1999 and 2004. Reliability and multiple correlation coefficients were computed and plotted. RESULTS: Operational testing reliabilities increased over the four years but remained lower than the preoperational reliability. Multiple correlation coefficients were somewhat superior to those reported during the preoperational period, suggesting that the operational scoring algorithms have been relatively consistent. CONCLUSIONS: Changes in the rater population, changes in the rating task, and enhancements to the training procedures are several factors that could explain the identified differences between preoperational and operational results. The present findings have important implications for test development and test validity.


Subject(s)
Clinical Competence; Computer Simulation; Education, Medical; Educational Measurement/methods; Licensure, Medical; Algorithms; Educational Measurement/statistics & numerical data; Humans; Observer Variation; Regression Analysis; Reproducibility of Results
9.
Acad Med; 78(10 Suppl): S68-71, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14557100

ABSTRACT

PURPOSE: This work investigated the reliability of, and relationships between, individual case and composite scores on a standardized patient clinical skills examination. METHOD: Four hundred ninety-two fourth-year U.S. medical students received three scores [data gathering (DG), interpersonal skills (IPS), and written communication (WC)] for each of 10 standardized patient cases. mGENOVA software was used for all analyses. RESULTS: Estimated generalizability coefficients were 0.69, 0.80, and 0.70 for the DG, IPS, and WC scores, respectively. The universe-score correlation between DG and WC was high (0.83); those for DG/IPS and IPS/WC were weaker (0.51 and 0.37, respectively). Task difficulty appears to be modestly but positively related across the three scores. Correlations between the person-by-task effects for DG/IPS and DG/WC were positive yet modest. The estimated generalizability coefficient for a 10-case test using an equally weighted composite DG/WC score was 0.78. CONCLUSIONS: This work supports interpretation of the correlations between (1) the proficiencies measured by multiple scores and (2) the sources of error that affect those scores, as well as estimation of the reliability of composite scores. The results have important implications for test construction and test validity.
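
The composite figure is consistent with classical formulas: assuming uncorrelated errors, the observed DG/WC correlation implied by a universe-score correlation of 0.83 is about 0.83 × √(0.69 × 0.70) ≈ 0.58, and an equally weighted composite then has reliability in the neighborhood of the reported 0.78. A sketch of that calculation (the 0.58 is derived here, not taken from the paper, and the multivariate G-theory analysis actually used is more elaborate):

```python
# Reliability of an equally weighted composite, assuming uncorrelated errors
# (so observed covariances equal true-score covariances). Scores standardized.
import numpy as np

def composite_reliability(rel, cov, weights):
    rel, cov, w = map(np.asarray, (rel, cov, weights))
    true_cov = cov.copy()
    np.fill_diagonal(true_cov, rel * np.diag(cov))  # true variance = rel * obs
    return (w @ true_cov @ w) / (w @ cov @ w)

rel = [0.69, 0.70]                    # DG and WC reliabilities (from the abstract)
r_obs = 0.83 * np.sqrt(0.69 * 0.70)   # observed r implied by universe r = 0.83
cov = [[1.0, r_obs], [r_obs, 1.0]]
print(f"composite reliability = {composite_reliability(rel, cov, [0.5, 0.5]):.2f}")
```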


Subject(s)
Licensure, Medical/statistics & numerical data; Medical History Taking/statistics & numerical data; Physical Examination/statistics & numerical data; Physician-Patient Relations; Clinical Competence/statistics & numerical data; Humans; Multivariate Analysis; Students, Medical; United States
10.
Acad Med; 79(10 Suppl): S43-5, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15383386

ABSTRACT

PURPOSE: To assess the validity of the USMLE Step 2 Clinical Knowledge (CK) examination by addressing the degree to which experts view item content as clinically relevant and appropriate for Step 2 CK. METHOD: Twenty-seven experts were asked to complete three survey questions on the clinical relevance and appropriateness of 150 Step 2 CK multiple-choice questions. Percentages, reliability estimates, and correlation coefficients were calculated, and ordinary least squares regression was used. RESULTS: In 92% of expert judgments the item content was rated clinically relevant, in 90% it was rated appropriate for Step 2 CK, and in 85% it was rated as used in clinical practice. The regression indicated that more difficult items and more frequently used items are considered more appropriate for Step 2 CK. CONCLUSIONS: Results suggest that the majority of item content is clinically relevant and appropriate, providing validity support for Step 2 CK.


Subject(s)
Clinical Competence; Education, Medical; Educational Measurement/standards; Licensure, Medical; Clinical Competence/standards; Education, Medical/standards; Educational Measurement/methods; Expert Testimony; Female; Humans; Judgment; Male; Reproducibility of Results; United States
11.
Acad Med; 85(9): 1453-61, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20736673

ABSTRACT

PURPOSE: The mini-Clinical Evaluation Exercise (mCEX) is increasingly being used to assess the clinical skills of medical trainees. Existing mCEX research has typically focused on isolated aspects of the instrument's reliability and validity. A more thorough validity analysis is needed to inform use of the mCEX, particularly in light of increased interest in high-stakes applications of the methodology. METHOD: Kane's (2006) validity framework, in which a structured argument is developed to support the intended interpretation(s) of assessment results, was used to evaluate mCEX research published from 1995 to 2009. In this framework, evidence supporting the argument is divided into four components (scoring, generalization, extrapolation, and interpretation/decision), each of which relates to different features of the assessment or the resulting scores. The strengths and limitations of the reviewed research were identified in relation to these components, and the findings were synthesized to highlight the overall strengths and weaknesses of existing mCEX research. RESULTS: The scoring component yielded the most concerns relating to the validity of mCEX score interpretations. More research is needed to determine whether scoring-related issues, such as leniency error and high interitem correlations, limit the utility of the mCEX for providing feedback to trainees. Evidence within the generalization and extrapolation components generally supports the validity of mCEX score interpretations. CONCLUSIONS: Careful evaluation of the circumstances of mCEX assessment will help to improve the quality of the resulting information. Future research should address rater selection, training, and monitoring, which can affect rating accuracy.


Subject(s)
Clinical Competence; Education, Medical, Graduate/methods; Educational Measurement/methods; Internal Medicine/education; Internship and Residency; Medical History Taking/standards; Physical Examination/standards; Humans; Psychometrics; Reproducibility of Results
12.
Acad Med; 85(10 Suppl): S93-7, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20881714

ABSTRACT

PURPOSE: This research examined the credibility of the cut scores used to make pass/fail decisions on United States Medical Licensing Examination (USMLE) Step 1, Step 2 Clinical Knowledge, and Step 3. METHOD: Approximately 15,000 members of nine constituency groups were asked their opinions about (1) current initial and ultimate fail rates and (2) the highest acceptable, lowest acceptable, and optimal initial and ultimate fail rates. RESULTS: Initial fail rates were generally viewed as appropriate; more variability was associated with ultimate fail rates. Actual fail rates for each examination across recent years fell within the range that respondents considered acceptable. CONCLUSIONS: Results provide important evidence to support the appropriateness of the cut scores used to make classification decisions for USMLE examinations. This evidence is viewed as part of the overall validity argument for decisions based on USMLE scores.


Subject(s)
Clinical Medicine/education; Educational Measurement/statistics & numerical data; Licensure, Medical; Education, Medical, Undergraduate; Educational Status; Humans; Surveys and Questionnaires; United States
15.
Acad Med; 83(10 Suppl): S72-5, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18820506

ABSTRACT

BACKGROUND: Checklist items used to produce the data gathering score on the Step 2 CS examination are currently weighted using an algorithm based on expert judgment about the importance of each item. The present research was designed to compare this approach with alternative weighting strategies. METHOD: Scores from 21,140 examinees who took the United States Medical Licensing Examination Step 2 between May 2006 and February 2007 were subjected to five weighting models: (1) a regression weights model, (2) a factor loading weights model, (3) a standardized response model, (4) an equal weights model, and (5) the operational expert-judgment weights model. RESULTS: The choice of weighting procedure can have a significant impact on the reliability and validity of checklist scores. CONCLUSIONS: The results suggest that the current weighting procedure is useful and that the regression-based model holds promise for practical application: it produces scores that are more reliable than those produced by the current procedure and more strongly related to external criteria.
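
Three of the five weighting strategies are easy to illustrate side by side: equal weights, judged-importance weights, and regression weights fit against an external criterion. A sketch on simulated checklist data; every array below is hypothetical, not the operational data:

```python
# Equal, expert-judgment, and regression weighting of 0/1 checklist items
# against an external criterion. All data and weights are simulated.
import numpy as np

rng = np.random.default_rng(3)
n_examinees, n_items = 500, 12
items = rng.integers(0, 2, size=(n_examinees, n_items)).astype(float)
criterion = items @ rng.uniform(0.2, 1.0, n_items) + rng.normal(0, 1, n_examinees)

equal_w = np.full(n_items, 1.0 / n_items)
expert_w = rng.uniform(0.5, 2.0, n_items)  # stands in for judged importance
reg_w, *_ = np.linalg.lstsq(items, criterion, rcond=None)  # OLS fit to criterion

for name, w in [("equal", equal_w), ("expert", expert_w), ("regression", reg_w)]:
    r = np.corrcoef(items @ w, criterion)[0, 1]
    print(f"{name:>10} weights: r(score, criterion) = {r:.2f}")
```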


Subject(s)
Algorithms; Clinical Competence/statistics & numerical data; Licensure, Medical; Models, Statistical; Cohort Studies; Factor Analysis; Humans; Judgment; Psychometrics; Reproducibility of Results; Retrospective Studies; United States
16.
Acad Med; 83(10 Suppl): S41-4, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18820498

ABSTRACT

BACKGROUND: This research examined various sources of measurement error in the documentation score component of the United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills examination. METHOD: A generalizability theory framework was employed to examine the documentation ratings of 847 examinees who completed the USMLE Step 2 Clinical Skills examination during an eight-day period in 2006. Each patient note was scored by two different raters, allowing for a persons-crossed-with-raters-nested-in-cases design. RESULTS: The results suggest that inconsistent performance on the part of raters makes a substantially greater contribution to measurement error than case specificity does. Double scoring the notes significantly increases precision. CONCLUSIONS: The results provide guidance for improving operational scoring of the patient notes: double scoring may increase the precision of measurement as much as lengthening the test by more than 50%. The study also cautions researchers that, when examining sources of measurement error, inappropriate data-collection designs may result in inaccurate inferences.
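
The precision claim can be approximated with the Spearman-Brown projection, treating both double scoring and test lengthening as k-fold replications of the measurement; this is a simplification, since double scoring improves only the rater facet while added cases also reduce case-specific error. A sketch with a hypothetical single-rating reliability:

```python
# Spearman-Brown projection; rel_single is a hypothetical single-scored
# reliability, and k-fold replication is a simplification of the G-study.

def spearman_brown(rel: float, k: float) -> float:
    return k * rel / (1 + (k - 1) * rel)

rel_single = 0.55
print(f"double scored:   {spearman_brown(rel_single, 2):.2f}")    # two raters/note
print(f"test 50% longer: {spearman_brown(rel_single, 1.5):.2f}")  # more cases
```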


Subject(s)
Clinical Competence; Licensure, Medical; Cohort Studies; Communication; Generalization, Psychological; Humans; Observer Variation; Patient Simulation; Physical Examination; Physician-Patient Relations; Reproducibility of Results; Sensitivity and Specificity; United States
17.
Acad Med; 83(10 Suppl): S9-12, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18820511

ABSTRACT

BACKGROUND: This study investigated whether participants' subjective reports of how they assigned ratings on a multisource feedback instrument support interpreting the resulting scores as objective, accurate measures of professional behavior. METHOD: Twenty-six participants completed think-aloud interviews while rating students, residents, or faculty members they had worked with previously. Raters completed 15 behavioral items and one global item. RESULTS: Participants referred to generalized behaviors and global impressions six times as often as specific behaviors, rated observees in the absence of the information necessary to do so, relied on indirect evidence about performance, and varied in how they interpreted items. CONCLUSIONS: Behavioral change is difficult to address when it is unclear what behaviors raters considered when providing feedback. These findings highlight the importance of explicitly stating and empirically investigating the assumptions that underlie the use of an observational assessment tool.


Subject(s)
Internship and Residency; Interviews as Topic; Pediatrics/education; Professional Competence; Social Behavior; Feedback, Psychological; Humans; Knowledge of Results (Psychology); Observer Variation; Qualitative Research; Reproducibility of Results