ABSTRACT
Soft-pelleted, high-fat diets (HFD) are greasy and crumble easily, leading to food wastage and hair coat grease accumulation when mice are fed using commercially available feeders. The ideal HFD feeder design should reduce food wastage, facilitate mouse weight gain, and minimize variables, such as hair coat grease accumulation, that have the potential to alter scratching behaviors. Our study compared the feeding efficiency of 2 commercially available feeders (feeders A and E) with 4 novel feeder designs (feeders B, C, D, and F). The novel feeders had alterations in feeding aperture size, feeding surface area, feeder configuration, and level of food presentation. Male C57BL/6NCrl mice (n = 120; 4/cage) were randomly assigned to cages containing one of the 6 feeder types and were fed HFD for 12 wk. Feeders and cage bottoms were weighed before use and then weekly at the time of cage change. Mice were weighed before starting the HFD and then biweekly. Scratching behavior was video recorded at 0, 4, 8, and 12 wk. Hair coat grease accumulation was visually scored biweekly. Use of feeder A was associated with the highest feed cost due to HFD wastage ($36.98 ± 1.54/cage/wk). Mice fed using feeder A had the highest average weight gain (23.75 ± 0.8 g, P < 0.005). However, these mice also had significantly higher hair coat grease accumulation scores (P < 0.05) and significantly increased scratching frequency at 4 wk (P < 0.05) compared with mice fed using the other feeder types. The novel feeder designs dispensed 10 to 21 times less HFD than feeder A. Mice fed using the novel feeders also displayed improved welfare, as evidenced by low hair coat grease accumulation scores and no significant differences in scratching frequency compared with baseline behavior.
ABSTRACT
Introduction: The Achieve, Develop, Explore Programme for Trainees (ADEPT) Clinical Leadership Fellowship Programme was established in response to growing recommendations to underpin healthcare reconfiguration in Northern Ireland with a collective leadership strategy. The fellowship combines a leadership development programme with a project carried out within a host organisation. With the fellowship now in its sixth year, a need was identified to assess its impact on the fellows' leadership skills, career choices, achievements, and views on both the fellowship and how to develop future leaders. Methods: Demographic data for all ADEPT fellows were held centrally by the Northern Ireland Medical and Dental Training Agency (NIMDTA) and assessed anonymously. A mixed-methods questionnaire was created using Smart Survey. Likert-scale questions were designed to determine the extent to which participants believed ADEPT supported their development of strong and exemplary elements of the nine dimensions of the NHS Healthcare Leadership Model. The questionnaire was distributed electronically to all ADEPT alumni in November 2021 and remained open for 4 weeks. Results: There have been 46 ADEPT fellows to date (72% female; all fellows were white). ADEPT fellows were most commonly from Psychiatry (33%), Paediatrics (17%), and Obstetrics and Gynaecology (15%). There were 19 responses from the alumni cohort of 46 (41%). 75% of respondents reported that their project resulted in a publication, presentation, or award. Leadership skill development was strongest in "Evaluating Information" and "Engaging the Team", whereas skills in "Sharing the Vision" and "Developing Capability" saw less improvement. The majority felt that the fellowship had been useful in securing their position as a consultant or general practitioner, and 50% went on to pursue senior leadership positions. Conclusion: The ADEPT Clinical Leadership Fellowship delivers effective leadership training as measured by the nine dimensions of the NHS Healthcare Leadership Model. It provides value for host organisations through the projects undertaken and by developing doctors who are more likely to engage in future formal leadership roles. ADEPT alumni saw the value in their leadership experience and felt it should be embedded in standard postgraduate training schemes to reach a wider audience.
Subject(s)
Awards and Prizes, General Practitioners, Pregnancy, Humans, Child, Female, Male, Fellowships and Scholarships, Leadership, Northern Ireland
ABSTRACT
CASE PRESENTATION: A snowmobile racer fell from his sled and was run over by another, sustaining "shark-bite" lacerations to his hand and leg. He was evacuated to a trackside medical trailer, where the characteristic wounds were judged to require further exploration at a hospital. DISCUSSION: "Shark bite" is a colloquial term for lacerations sustained from the metal studs attached to a snowmobile's track. Shark-bite lacerations may be more prone to complications than other lacerations commonly sustained at motorsports events.
ABSTRACT
Computing confidence intervals around generalizability coefficients has long been a challenging task in generalizability theory. This is a serious practical problem because generalizability coefficients are often computed from designs in which some facets have small sample sizes, and researchers have little guidance regarding the trustworthiness of the coefficients. Because generalizability theory can be framed as a linear mixed-effects model (LMM), bootstrap and simulation techniques from the LMM paradigm can be used to construct the confidence intervals. The purpose of this research is to examine four proposed LMM-based methods for computing these confidence intervals and to determine their accuracy under six simulated conditions based on the type of test scores (normal, dichotomous, and polytomous data) and the measurement design (p × i × r and p × (i:r)). A bootstrap technique called "parametric methods with spherical random effects" consistently produced more accurate confidence intervals than the three other LMM-based methods. Furthermore, the selected technique was compared with a model-based approach to investigate performance at the level of the variance components in a second simulation study, in which the numbers of examinees, raters, and items were varied. We conclude with the recommendation that, when reporting generalizability coefficients, a confidence interval should accompany the point estimate.
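As context for the LMM framing, the sketch below shows the generic parametric-bootstrap recipe for a simple p × i design in R with lme4. It is a minimal illustration, not the study's "parametric methods with spherical random effects" variant, and the data frame `d` with columns `person`, `item`, and `score` is hypothetical.

```r
# Minimal sketch: parametric-bootstrap CI for a generalizability coefficient
# in a crossed p x i design. `d` is a hypothetical long-format data frame.
library(lme4)

fit <- lmer(score ~ 1 + (1 | person) + (1 | item), data = d)

# G coefficient for relative decisions: var(p) / (var(p) + var(pi,e)/n_i)
g_coef <- function(m, n_i) {
  vc <- as.data.frame(VarCorr(m))
  v  <- setNames(vc$vcov, vc$grp)
  unname(v["person"] / (v["person"] + v["Residual"] / n_i))
}

n_items <- length(unique(d$item))
boot <- bootMer(fit, function(m) g_coef(m, n_items), nsim = 1000)
quantile(boot$t, c(0.025, 0.975))  # percentile-bootstrap interval
```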
ABSTRACT
We have constructed bispecific immunoglobulin-like immunoadhesins that bind to both of the HIV envelope glycoproteins, gp120 and gp41. These immunoadhesins have the N-terminal domains of human CD4 engrafted onto the N-terminus of the heavy chain of human anti-gp41 mAb 7B2. Binding of these constructs to recombinant Env and their antiviral activities were compared with those of the parental mAbs and CD4, as well as with control mAbs. The CD4/7B2 constructs bind to both gp41 and gp140, as well as to native Env expressed on the surface of infected cells. These constructs deliver cytotoxic immunoconjugates to HIV-infected cells, though not as well as a mixture of 7B2 and sCD4, and they opsonize for antibody-mediated phagocytosis. Most surprising, given that 7B2 neutralizes weakly if at all, is that the chimeric CD4/7B2 immunoadhesins exhibit broad and potent neutralization of HIV, comparable to that of well-known neutralizing mAbs. These data add to the growing evidence that enhanced neutralizing activity can be obtained with bifunctional mAbs/immunoadhesins. The enhanced neutralization activity of the CD4/7B2 chimeras may result from cross-linking of the two Env subunits, with subsequent inhibition of the pre-fusion conformational events that are necessary for entry.
ABSTRACT
Multivariate generalizability theory (mG-theory) is an important framework in many behavioral and educational studies, as it describes useful psychometric properties of multidimensional assessments. Nevertheless, the use of mG-theory estimation has been limited by the lack of available software for carrying out the necessary calculations: users rely heavily on stand-alone programs such as mGENOVA and the BUGS/JAGS suite. Given the prevalence of R, this paper presents a solution using the glmmTMB package to accomplish the estimation task. Users adopting the proposed method may find it more convenient for conducting both applied investigations and simulation studies without needing to switch between different software programs.
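To make the idea concrete, here is a minimal sketch (not the paper's code) of univariate variance-component estimation for one subtest in a crossed p × i design with glmmTMB; the data frame `d` and its columns are hypothetical, and the multivariate extension described in the paper would add covariance structures across subtests.

```r
# Sketch: variance components and a G coefficient for one subtest,
# crossed p x i design. `d` is a hypothetical long-format data frame.
library(glmmTMB)

fit <- glmmTMB(score ~ 1 + (1 | person) + (1 | item), data = d)

vc <- VarCorr(fit)$cond            # variance components of the conditional model
sigma2_p <- as.numeric(vc$person)  # person (universe-score) variance
sigma2_i <- as.numeric(vc$item)    # item variance
sigma2_e <- sigma(fit)^2           # residual (pi,e) variance

n_i <- length(unique(d$item))
sigma2_p / (sigma2_p + sigma2_e / n_i)  # G coefficient, relative decisions
```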
Subject(s)
Software, Computer Simulation, Humans, Linear Models
ABSTRACT
Conventional methods for evaluating the utility of subscores rely on traditional indices of reliability and on correlations among subscores. One limitation of correlational methods is that they do not explicitly consider variation in subtest means. An exception is an index of score profile reliability designated as G, which quantifies the ratio of true score profile variance to observed score profile variance. G has been shown to be more sensitive than correlational methods to group differences in score profile utility. However, it is a group average, representing the expected value over a population of examinees. Just as score reliability varies across individuals and subgroups, one can expect that the reliability of score profiles will vary across examinees. This article proposes two conditional indices of score profile utility grounded in multivariate generalizability theory. The first is based on the ratio of observed profile variance to the profile variance that can be attributed to random error. The second quantifies the proportion of observed variability in a score profile that can be attributed to true score profile variance. The article describes the indices, illustrates their use with two empirical examples, and evaluates their properties with simulated data. The results suggest that the proposed estimators of profile error variance are consistent with the known error in simulated score profiles and that they provide information beyond that provided by traditional measures of subscore utility. The simulation study suggests that artificially large values of the indices could occur for about 5% to 8% of examinees. The article concludes by suggesting possible applications of the indices and discusses avenues for further research.
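The abstract does not give the estimators; under a plausible reading, with k subscores X_{p1}, …, X_{pk} for examinee p and an estimated conditional error variance for each, the two indices could take roughly the following form (a sketch of the idea, not the article's exact definitions):

```latex
% Hypothetical notation: \bar{X}_p is examinee p's mean subscore and
% \hat{\sigma}^2_{e}(pv) the estimated error variance of subscore v for p.
\[
  s^2_{\mathrm{obs}}(p) = \frac{1}{k-1}\sum_{v=1}^{k}\bigl(X_{pv}-\bar{X}_p\bigr)^2,
  \qquad
  s^2_{e}(p) = \frac{1}{k}\sum_{v=1}^{k}\hat{\sigma}^2_{e}(pv)
\]
\[
  \text{Index 1: } \frac{s^2_{\mathrm{obs}}(p)}{s^2_{e}(p)},
  \qquad
  \text{Index 2: } 1-\frac{s^2_{e}(p)}{s^2_{\mathrm{obs}}(p)}
\]
```

On this reading, Index 1 behaves like a signal-to-noise ratio for the profile's shape, and Index 2 parallels a per-examinee reliability coefficient, with values near 1 indicating that a profile's ups and downs are trustworthy.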
ABSTRACT
A test blueprint describes the key elements of a test, including the content to be covered, the amount of emphasis allocated to each content area, and other important features. This article offers practical guidelines for developing test blueprints. We first discuss the role of learning outcomes and behavioral objectives in test blueprinting, and then describe a four-stage process for creating test blueprints. The steps include identifying the major knowledge and skill domains (i.e. competencies); delineating the specific assessment objectives; determining the method of assessment to address those objectives; and establishing the amount of emphasis to allocate to each knowledge or skill domain. The article refers to and provides examples of numerous test blueprints for a wide variety of knowledge and skill domains. We conclude by discussing the role of test blueprinting in test score validation, and by summarizing some of the other ways that test blueprints support instruction and assessment.
Subject(s)
Checklist/methods, Educational Measurement/methods, Knowledge, Clinical Competence, Curriculum, Undergraduate Medical Education, Humans
ABSTRACT
Research suggests that the three-option format is optimal for multiple-choice questions (MCQs). This conclusion is supported by numerous studies showing that most distractors (i.e., incorrect answers) are selected by so few examinees that they are essentially nonfunctional. However, nearly all studies have defined a distractor as nonfunctional if it is selected by fewer than 5% of examinees. A limitation of this definition is that the proportion of examinees available to choose a distractor depends on overall item difficulty. This is especially problematic for mastery tests, which consist of items that most examinees are expected to answer correctly. Under the traditional definition of nonfunctional, a five-option MCQ answered correctly by more than 90% of examinees is constrained to have at most one functional distractor. The primary purpose of the present study was to evaluate an index of nonfunctionality that is sensitive to item difficulty. A secondary purpose was to extend previous research by studying distractor functionality within the context of professionally developed credentialing tests. Data were analyzed for 840 MCQs consisting of five options per item. Results based on the traditional definition were consistent with previous research indicating that most MCQs had one or two functional distractors. In contrast, the newly proposed index indicated that nearly half (47.3%) of all items had three or four functional distractors. Implications for item and test development are discussed.
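The new index is not specified in the abstract; one natural difficulty-sensitive variant, shown here purely for illustration, evaluates each distractor against the pool of examinees who answered incorrectly rather than against all examinees:

```latex
% Illustrative (hypothetical) conditional criterion: distractor d is
% functional when it attracts at least 5% of the examinees who missed
% the item, where p_item is the proportion answering correctly.
\[
  \frac{P(\text{choose } d)}{1 - p_{\mathrm{item}}} \;\ge\; 0.05
\]
```

For example, on an item with p_item = 0.92, a distractor chosen by 3% of all examinees fails the traditional 5% screen, yet it accounts for 0.03/0.08 = 37.5% of all incorrect responses and would clearly count as functional under the conditional criterion.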
Subject(s)
Medical Education/methods, Medical Education/standards, Educational Measurement/methods, Educational Measurement/standards, Choice Behavior, Humans, Statistical Models, Psychometrics
ABSTRACT
Conventional methods for evaluating the utility of subscores rely on reliability and correlation coefficients. However, correlations can overlook a notable source of variability: variation in subtest means/difficulties. Brennan introduced a reliability index for score profiles based on multivariate generalizability theory, designated as G, which is sensitive to variation in subtest difficulty. However, there has been little, if any, research evaluating the properties of this index. A series of simulation experiments, as well as analyses of real data, were conducted to investigate G under various conditions of subtest reliability, subtest correlations, and variability in subtest means. Three pilot studies evaluated G in the context of a single group of examinees. Results of the pilots indicated that G indices were typically low; across the 108 experimental conditions, G ranged from .23 to .86, with an overall mean of .63. The findings were consistent with previous research indicating that subscores often do not have interpretive value. Importantly, there were many conditions for which the correlation-based method known as proportional reduction in mean-square error (PRMSE; Haberman, 2006) indicated that subscores were worth reporting, but for which values of G fell into the .50s, .60s, and .70s. The main study investigated G within the context of score profiles for examinee subgroups. Not only were G indices again generally low, but G was also found to be sensitive to subgroup differences when PRMSE was not. Analyses of real data and the subsequent discussion address how G can supplement PRMSE in characterizing the quality of subscores.
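For readers unfamiliar with the PRMSE benchmark, Haberman's rule is commonly summarized as follows (a sketch; see Haberman, 2006, for the exact development): approximate an examinee's true subscore τ_s either from the observed subscore S or from the observed total score X, and report the subscore only when it is the better predictor.

```latex
\[
  \mathrm{PRMSE}_S = \rho^2(S,\tau_s), \qquad
  \mathrm{PRMSE}_X = \rho^2(X,\tau_s),
  \qquad \text{report } S \text{ iff } \mathrm{PRMSE}_S > \mathrm{PRMSE}_X
\]
```

Because both quantities are correlation-based, neither registers differences in subtest means, which is precisely the gap that G is designed to fill.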
ABSTRACT
PURPOSE: Adaptive learning requires frequent and valid assessments for learners to track progress against their goals. This study determined if multiple-choice questions (MCQs) "crowdsourced" from medical learners could meet the standards of many large-scale testing programs. METHODS: Users of a medical education app (Osmosis.org, Baltimore, MD) volunteered to submit case-based MCQs. Eleven volunteers were selected to submit MCQs targeted to second year medical students. Two hundred MCQs were subjected to duplicate review by a panel of internal medicine faculty who rated each item for relevance, content accuracy, and quality of response option explanations. A sample of 121 items was pretested on clinical subject exams completed by a national sample of U.S. medical students. RESULTS: Seventy-eight percent of the 200 MCQs met faculty reviewer standards based on relevance, accuracy, and quality of explanations. Of the 121 pretested MCQs, 50% met acceptable statistical criteria. The most common reasons for exclusion were that the item was too easy or had a low discrimination index. CONCLUSIONS: Crowdsourcing can efficiently yield high-quality assessment items that meet rigorous judgmental and statistical criteria. Similar models may be adopted by students and educators to augment item pools that support adaptive learning.
Subject(s)
Undergraduate Medical Education/methods, Educational Measurement/methods, Formative Feedback, Crowdsourcing, Educational Measurement/standards, Humans, Learning, Mobile Applications, Medical Students
ABSTRACT
PURPOSE: In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. METHOD: A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. RESULTS: Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. CONCLUSIONS: Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
Subject(s)
Cardiology/education, Medical Education/methods, Educational Measurement/methods, Heart Auscultation/methods, Simulation Training/statistics & numerical data, Adult, Clinical Competence, Female, Humans, Medical Licensure, Male, Multimedia, Reading, Simulation Training/methods, United States
ABSTRACT
One challenge when implementing case-based learning, and other approaches to contextualized learning, is determining which clinical problems to include. This article illustrates how health care utilization data, readily available from the National Center for Health Statistics (NCHS), can be incorporated into an educational needs assessment to identify medical problems physicians are likely to encounter in clinical practice. The NCHS survey data summarize patient demographics, diagnoses, and interventions for tens of thousands of patients seen in various settings, including emergency departments (EDs), clinics, and hospitals. Selected data from the National Hospital Ambulatory Medical Care Survey: Emergency Department illustrate how instructional materials can be derived from such public-use health care data. Using fever as the reason for visit to the ED, the patient management path is depicted in the form of a case drill-down by exploring the most common diagnoses, blood tests, diagnostic studies, procedures, and medications associated with fever. Although these types of data are quite useful, they should not serve as the sole basis for determining which instructional cases to include. Additional sources of information should be considered to ensure the inclusion of cases that represent infrequent but high-impact problems and those that illustrate fundamental principles that generalize to other cases.
Subject(s)
Factual Databases, Medical Education/methods, Health Care Surveys, Health Services/statistics & numerical data, Problem-Based Learning/methods, Ambulatory Care Facilities, Curriculum, Hospital Emergency Service, Hospitalization, Humans, National Center for Health Statistics, U.S., Needs Assessment, Patient Acceptance of Health Care, United States
ABSTRACT
This study evaluated the extent to which medical students with limited English-language experience are differentially impacted by the additional reading load of test items consisting of long clinical vignettes. Participants included 25,012 examinees who completed Step 2 of the U.S. Medical Licensing Examination®. Test items were categorized into five levels based on the number of words per item, and examinee scores at each level were evaluated as a function of English-language experience (English as a second language [ESL] status and scores on a test of English-speaking proficiency). The longest items were more difficult than the shortest items across all examinee groups, and examinees with more English-language experience scored higher than those with less experience across all five levels of word count. The effect of primary interest, the interaction of word count with English-language experience, was statistically significant, indicating that score declines for longer items were larger for examinees with less English-language experience; however, the magnitude of this interaction effect was barely detectable (η² = .0004, p < .001). Additional analyses supported the conclusion that the differential effect for examinees with less English-language experience was small but worthy of continued monitoring.
Asunto(s)
Evaluación Educacional/métodos , Evaluación Educacional/estadística & datos numéricos , Lenguaje , Estudiantes de Medicina/estadística & datos numéricos , Competencia Clínica , HumanosAsunto(s)
Toma de Decisiones , Planificación en Desastres/métodos , Liderazgo , Psicología Industrial/métodos , Estrés Psicológico/psicología , Animales , Animales de Laboratorio , Tormentas Ciclónicas , Desastres/prevención & control , Humanos , Ratones , Ciudad de Nueva York , Trabajo de Rescate/métodos , Trabajo de Rescate/normas , Medición de RiesgoRESUMEN
PURPOSE: Previous studies on standardized patient (SP) exams reported score gains both across attempts when examinees failed and retook the exam and over multiple SP encounters within a single exam session. The authors analyzed the within-session score gains of examinees who repeated the United States Medical Licensing Examination Step 2 Clinical Skills to answer two questions: How much do scores increase within a session? Can the pattern of increasing first-attempt scores account for across-session score gains? METHOD: Data included encounter-level scores for 2,165 U.S. and Canadian medical students and graduates who took Step 2 Clinical Skills twice between April 1, 2005 and December 31, 2010. The authors modeled examinees' score patterns using smoothing and regression techniques and applied statistical tests to determine whether the patterns were the same or different across attempts. In addition, they tested whether any across-session score gains could be explained by the first-attempt within-session score trajectory. RESULTS: For the first and second attempts, the authors attributed examinees' within-session score gains to a pattern of score increases over the first three to six SP encounters followed by a leveling off. Model predictions revealed that the authors could not attribute the across-session score gains to the first-attempt within-session score gains. CONCLUSIONS: The within-session score gains over the first three to six SP encounters of both attempts indicate that there is a temporary "warm-up" effect on performance that "resets" between attempts. Across-session gains are not due to this warm-up effect and likely reflect true improvement in performance.
Subject(s)
Educational Measurement/methods, Medical Licensure, Physical Examination/standards, Canada, Clinical Competence/standards, Clinical Competence/statistics & numerical data, Educational Measurement/standards, Educational Measurement/statistics & numerical data, Humans, Statistical Models, Regression Analysis, United States
ABSTRACT
Examinees who initially fail and later repeat an SP-based clinical skills exam typically exhibit large score gains on their second attempt, suggesting the possibility that examinees were not well measured on one of those attempts. This study evaluated score precision for examinees who repeated an SP-based clinical skills test administered as part of the US Medical Licensing Examination sequence. Generalizability theory was used as the basis for computing conditional standard errors of measurement (SEMs) for individual examinees. Conditional SEMs were computed for approximately 60,000 single-take examinees and 5,000 repeat examinees who completed the Step 2 Clinical Skills Examination® between 2007 and 2009. The study focused exclusively on ratings of communication and interpersonal skills. Conditional SEMs for single-take and repeat examinees were nearly indistinguishable across most of the score scale. US graduates and IMGs were measured with equal levels of precision at all score levels, as were examinees with differing levels of skill in speaking English. There was no evidence that examinees with the largest score changes were measured poorly on either their first or second attempt. The large score increases for repeat examinees on this SP-based exam probably cannot be attributed to unexpectedly large errors of measurement.
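For a flavor of the computation (the operational design is more complex than this), in a simple p × i design the conditional absolute-error SEM for examinee p reduces to the standard error of that examinee's own mean over the n_i encounters:

```latex
\[
  \hat{\sigma}(\Delta_p)
  = \sqrt{\frac{\sum_{i=1}^{n_i}\bigl(X_{pi}-\bar{X}_p\bigr)^{2}}{n_i\,(n_i-1)}}
\]
```

Here X_pi is the rating earned on encounter i and X̄_p the examinee's mean rating; examinees whose encounter-level ratings vary widely are measured less precisely, which is what allows precision to be compared at different points on the score scale.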
Subject(s)
Clinical Competence/standards, Educational Measurement/standards, Physical Examination, Communication, Humans, Licensure, Patient Simulation, Medical Students, United States
ABSTRACT
BACKGROUND: Studies completed over the past decade suggest the presence of a gap between what students learn during medical school and their clinical responsibilities as first-year residents. The purpose of this survey was to verify, on a large scale, the responsibilities of residents during their initial months of training. METHOD: Practice analysis surveys were mailed in September 2009 to 1,104 residency programs for distribution to an estimated 8,793 first-year residents. Surveys were returned by 3,003 residents from 672 programs; 2,523 surveys met inclusion criteria and were analyzed. RESULTS: New residents performed a wide range of activities, from routine but important communications (e.g., obtaining informed consent) to complex procedures (e.g., thoracentesis), often without the attending physician present or otherwise involved. CONCLUSIONS: Medical school curricula and the content of competence assessments prior to residency should consider more thorough coverage of the complex knowledge and skills required early in residency.
Subject(s)
Internship and Residency, Professional Practice, Communication, Data Collection, United States
ABSTRACT
PURPOSE: Prior studies report large score gains for examinees who fail and later repeat standardized patient (SP) assessments. Although research indicates that score gains on SP exams cannot be attributed to memorizing previous cases, no studies have investigated the empirical validity of scores for repeat examinees. This report compares single-take and repeat examinees in terms of both internal (construct) validity and external (criterion-related) validity. METHOD: Data consisted of test scores for examinees who took the United States Medical Licensing Examination Step 2 Clinical Skills (CS) exam between July 16, 2007, and September 12, 2009. The sample included 12,090 examinees who completed Step 2 CS on one occasion and another 4,030 examinees who completed the exam on two occasions. The internal measures included four separately scored performance domains of the Step 2 CS examination, whereas the external measures consisted of scores on three written assessments of medical knowledge (Step 1, Step 2 clinical knowledge, and Step 3). The authors subjected the four Step 2 CS domains to confirmatory factor analysis and evaluated correlations between Step 2 CS scores and the three written assessments for single-take and repeat examinees. RESULTS: The factor structure for repeat examinees on their first attempt was markedly different from the factor structure for single-take examinees, but it became more similar to that for single-take examinees by their second attempt. Scores on the second attempt correlated more highly with all three external measures. CONCLUSIONS: The findings support the validity of scores for repeat examinees on their second attempt.
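The abstract does not detail the fitted models; a minimal sketch of a grouped CFA of the kind described, using the R package lavaan with hypothetical domain and grouping names, might look like the following.

```r
# Sketch: one-factor CFA of the four Step 2 CS domain scores, estimated
# as a multi-group model so structure can be compared across groups
# (e.g., single-take vs. first-attempt repeat examinees).
# Column names dom1..dom4 and examinee_group are hypothetical.
library(lavaan)

model <- '
  cs =~ dom1 + dom2 + dom3 + dom4   # single clinical-skills factor
'

fit <- cfa(model, data = scores, group = "examinee_group")
summary(fit, fit.measures = TRUE, standardized = TRUE)
```

Comparing loadings and fit across groups is one way to operationalize the abstract's finding that the repeat examinees' first-attempt structure differed from that of single-take examinees.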
Subject(s)
Clinical Competence, Undergraduate Medical Education/standards, Educational Measurement/methods, Medical Licensure/standards, Physical Examination/standards, Female, Humans, Male, Patient Simulation, Reproducibility of Results, Retrospective Studies, United States
ABSTRACT
Years of research with high-stakes written tests indicates that although repeat examinees typically experience score gains between their first and subsequent attempts, their pass rates remain considerably lower than pass rates for first-time examinees. This outcome is consistent with expectations. Comparable studies of the performance of repeat examinees on oral examinations are lacking. The current research evaluated pass rates for more than 50,000 examinees on written and oral exams administered by six medical specialty boards over several recent years. Pass rates for first-time examinees were similar for written and oral exams, averaging about 84% across all boards. Pass rates for repeat examinees on written exams were, as expected, lower, ranging from 22% to 51%, with an average of 36%. However, pass rates for repeat examinees on oral exams were markedly higher than on written exams, ranging from 53% to 77%, with an average of 65%. Four explanations for the elevated repeat pass rates on oral exams are proposed: an increase in examinee proficiency, construct-irrelevant variance, measurement error (score unreliability), and memorization of test content. Simulated data are used to demonstrate that roughly one third of the score increase can be explained by measurement error alone. The authors suggest that a substantial portion of the score increase can also likely be attributed to construct-irrelevant variance. Results are discussed in terms of their implications for making pass-fail decisions when retesting is allowed. The article concludes by identifying areas for future research.
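A toy version of the measurement-error mechanism (parameters are illustrative, not the paper's): hold each examinee's proficiency fixed across attempts, add independent error to each attempt, and examine the retest pass rate of first-attempt failers. Regression to the mean alone produces a sizable repeat pass rate.

```r
# Toy simulation: repeat pass rate generated by measurement error alone.
# All values are illustrative assumptions, not the paper's parameters.
set.seed(1)
n      <- 100000
true   <- rnorm(n)                  # proficiency, unchanged on retest
rel    <- 0.80                      # assumed score reliability
err_sd <- sqrt((1 - rel) / rel)     # implied error SD on the true-score scale
cutoff <- qnorm(0.16, sd = sqrt(1 + err_sd^2))  # ~84% first-attempt pass rate

first  <- true + rnorm(n, sd = err_sd)
failed <- first < cutoff
second <- true[failed] + rnorm(sum(failed), sd = err_sd)

mean(second >= cutoff)  # well above zero despite no true improvement
```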