ABSTRACT
OBJECTIVES: This study aimed to (1) develop a new measure of adherence to exercise for musculoskeletal (MSK) pain (Adherence To Exercise for Musculoskeletal Pain Tool: ATEMPT) based on previously conceptualised domains of exercise adherence, and (2) report the content and structural validity, internal consistency, test-retest reliability, and measurement error of the ATEMPT outcome measure in patients managed with exercise for MSK pain. METHODS: The ATEMPT was created using statements describing adherence generated by patients, physiotherapists and researchers, with content validity established. Baseline and retest questionnaires were distributed to patients recommended exercise for MSK pain in 11 National Health Service physiotherapy clinics. Items demonstrating low response variation were removed, and the following measurement properties were assessed: structural validity, internal consistency, test-retest reliability and measurement error. RESULTS: Baseline and retest data were collected from 382 and 112 patients with MSK pain, respectively. Confirmatory factor analysis established that a single-factor solution was the best fit according to the Bayesian Information Criterion. The 6-item version of the measure (scored 6-30) demonstrated optimal internal consistency (Cronbach's alpha 0.86, 95% CI 0.83 to 0.88) with acceptable levels of test-retest reliability (intraclass correlation coefficient 0.84, 95% CI 0.78 to 0.88) and measurement error (smallest detectable change 3.77, 95% CI 3.27 to 4.42; standard error of measurement 2.67, 95% CI 2.31 to 3.16). CONCLUSION: The 6-item ATEMPT was developed from the six domains of exercise adherence. It has adequate content and structural validity, internal consistency, test-retest reliability and measurement error in patients with MSK pain, but should undergo additional testing to establish its construct validity and responsiveness.
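The reliability statistics reported above (Cronbach's alpha, the standard error of measurement and the smallest detectable change) are straightforward to compute. A minimal sketch, assuming a respondents-by-items score matrix; note that conventions for the SDC multiplier vary between studies, and the common individual-level form 1.96 × √2 × SEM is used here:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal consistency of a respondents-by-items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def sem_and_sdc(scale_sd: float, icc: float) -> tuple:
    """Standard error of measurement and smallest detectable change
    (individual level, 95% confidence) from test-retest statistics."""
    sem = scale_sd * np.sqrt(1 - icc)
    sdc = 1.96 * np.sqrt(2) * sem
    return sem, sdc
```

For example, `sem_and_sdc(2.0, 0.84)` gives an SEM of 0.8 scale points, and the SDC scales it up to the change a single patient must exceed before it is distinguishable from measurement noise.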
Subject(s)
Musculoskeletal Pain, Humans, Reproducibility of Results, Bayes Theorem, State Medicine, Psychometrics, Surveys and Questionnaires
ABSTRACT
INTRODUCTION: Ensuring equivalence in high-stakes performance exams is important for patient safety and candidate fairness. We compared inter-school examiner differences within a shared OSCE and the resulting impact on students' pass/fail categorisation. METHODS: The same six-station formative OSCE ran asynchronously in four medical schools, with two parallel circuits per school. We compared examiners' judgements using Video-based Examiner Score Comparison and Adjustment (VESCA): examiners scored station-specific comparator videos in addition to 'live' student performances, enabling (1) controlled score comparisons by (a) examiner cohort and (b) school, and (2) data linkage to adjust for the influence of examiner cohorts. We calculated the score impact and change in pass/fail categorisation by school. RESULTS: On controlled video-based comparisons, inter-school variation in examiners' scoring (16.3%) was nearly double within-school variation (8.8%). Students' scores received a median adjustment of 5.26% (IQR 2.87-7.17%). The impact of adjusting for examiner differences on students' pass/fail categorisation varied by school: adjustment reduced the failure rate from 39.13% to 8.70% in school 2 whilst increasing it from 0.00% to 21.74% in school 4. DISCUSSION: Whilst the formative context may partly account for these differences, the findings raise the question of whether examiners' judgements vary systematically between medical schools. This may benefit from systematic appraisal to safeguard equivalence. VESCA provided a viable method for such comparisons.
ABSTRACT
PURPOSE: Ensuring equivalence of examiners' judgements within distributed objective structured clinical exams (OSCEs) is key to both fairness and validity but is hampered by the lack of cross-over in the performances which different groups of examiners observe. This study develops a novel method called Video-based Examiner Score Comparison and Adjustment (VESCA) and uses it, for the first time, to compare examiners' scoring from different OSCE sites. MATERIALS/METHODS: Within a summative 16-station OSCE, volunteer students were videoed on each station and all examiners were invited to score station-specific comparator videos in addition to their usual student scoring. The linkage provided through the video scores enabled use of Many Facet Rasch Modelling (MFRM) to compare (1) examiner-cohort and (2) site effects on students' scores. RESULTS: Examiner cohorts varied by 6.9% in the overall score allocated to students of the same ability. Whilst only a tiny difference was apparent between sites, examiner-cohort variability was greater in one site than the other. Adjusting student scores produced a median change in rank position of 6 places (0.48 deciles); however, 26.9% of students changed their rank position by at least one decile. By contrast, only one student's pass/fail classification was altered by score adjustment. CONCLUSIONS: Whilst comparatively limited examiner participation rates may limit interpretation of score adjustment in this instance, this study demonstrates the feasibility of using VESCA for quality assurance purposes in large-scale distributed OSCEs.
Subject(s)
Educational Measurement, Medical Students, Humans, Educational Measurement/methods, Clinical Competence
ABSTRACT
INTRODUCTION: Differential rater function over time (DRIFT) and contrast effects (examiners' scores biased away from the standard of preceding performances) both challenge the fairness of scoring in objective structured clinical exams (OSCEs). This is important because, under some circumstances, these effects could alter whether some candidates pass or fail assessments. Benefitting from experimental control, this study investigated the causality, operation and interaction of both effects simultaneously for the first time in an OSCE setting. METHODS: We used secondary analysis of data from an OSCE in which examiners scored embedded videos of student performances interspersed between live students. Embedded video position varied between examiners (early vs. late) whilst the standard of preceding performances naturally varied (previous high or low). We examined linear relationships suggestive of DRIFT and contrast effects in all within-OSCE data before comparing the influence and interaction of 'early' versus 'late' and 'previous high' versus 'previous low' conditions on embedded video scores. RESULTS: Analyses of linear relationships did not support the presence of DRIFT or contrast effects. Embedded videos were scored higher early (19.9 [19.4-20.5]) than late (18.6 [18.1-19.1], p < 0.001), but scores did not differ between previous high and previous low conditions. The interaction term was non-significant. CONCLUSIONS: In this instance, the small DRIFT effect we observed on embedded videos can be causally attributed to examiner behaviour. Contrast effects appear less ubiquitous than some prior research suggests. Possible mediators of these findings include the OSCE context, the detail of task specification, examiners' cognitive load and the distribution of learners' ability.
As the operation of these effects appears to vary across contexts, further research is needed to determine the prevalence and mechanisms of contrast and DRIFT effects, so that assessments may be designed in ways that are likely to avoid their occurrence. Quality assurance should monitor for these contextually variable effects in order to ensure OSCE equivalence.
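The key comparison above, embedded-video scores in 'early' versus 'late' positions, is at heart a two-group mean difference. As an illustrative sketch only (the study's actual analysis is not specified beyond the reported significance tests), a distribution-free permutation test can be written in a few lines:

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means between
    groups a and b. Returns the permutation p-value."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel scores at random under the null
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            hits += 1
    return hits / n_perm
```

Here the null hypothesis is that position (early/late) is arbitrary labelling, so shuffling labels generates the reference distribution directly from the data.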
Subject(s)
Clinical Competence, Educational Measurement, Humans
ABSTRACT
BACKGROUND: Ensuring equivalence of examiners' judgements across different groups of examiners is a priority for large-scale performance assessments in clinical education, both to enhance fairness and to reassure the public. This study extends insight into an innovation called Video-based Examiner Score Comparison and Adjustment (VESCA), which uses video scoring to link otherwise unlinked groups of examiners. This linkage enables comparison of the influence of different examiner groups within a common frame of reference and provision of adjusted "fair" scores to students. Whilst this innovation promises substantial benefit to quality assurance of distributed Objective Structured Clinical Exams (OSCEs), questions remain about how the resulting score adjustments might be influenced by the specific parameters used to operationalise VESCA. Research questions: how similar are estimates of students' score adjustments when the model is run with either (1) fewer comparison videos per participating examiner, or (2) reduced numbers of participating examiners? METHODS: Using secondary analysis of recent research which used VESCA to compare scoring tendencies of different examiner groups, we made numerous copies of the original data and then selectively deleted video scores to reduce either (1) the number of linking videos per examiner (4 versus several permutations of 3, 2 or 1 videos) or (2) examiner participation rates (all participating examiners (76%) versus several permutations of 70%, 60% or 50% participation). After analysing all resulting datasets with Many Facet Rasch Modelling (MFRM), we calculated students' score adjustments for each dataset and compared these with the score adjustments in the original data using Spearman's correlations.
RESULTS: Students' score adjustments derived from 3 videos per examiner correlated highly with score adjustments derived from 4 linking videos (median Rho = 0.93, IQR 0.90-0.95, p < 0.001), with 2 linking videos (median Rho = 0.85, IQR 0.81-0.87, p < 0.001) and 1 linking video (median Rho = 0.52, IQR 0.46-0.64, p < 0.001) producing progressively smaller correlations. Score adjustments were similar for 76% examiner participation versus 70% (median Rho = 0.97, IQR 0.95-0.98, p < 0.001) and 60% (median Rho = 0.95, IQR 0.94-0.98, p < 0.001) participation, but were lower and more variable for 50% examiner participation (median Rho = 0.78, IQR 0.65-0.83, some correlations non-significant). CONCLUSIONS: Whilst VESCA showed some sensitivity to the examined parameters, modest reductions in examiner participation rates or video numbers produced highly similar results. Employing VESCA in distributed or national exams could enhance quality assurance and exam fairness.
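Spearman's Rho, used above to compare score adjustments between the full and thinned datasets, is simply a Pearson correlation computed on ranks. A self-contained sketch with a hand-rolled, tie-aware ranking (assuming only numpy is available):

```python
import numpy as np

def rankdata(x):
    """Tie-aware average ranks (1-based)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    sx = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average rank for tied block
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman's Rho: Pearson correlation of the rank-transformed data."""
    return float(np.corrcoef(rankdata(a), rankdata(b))[0, 1])
```

Because it works on ranks, Rho measures how well the thinned model preserves the ordering of adjustments rather than their exact magnitudes, which is the relevant question for fairness.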
Subject(s)
Educational Measurement, Medical Students, Clinical Competence, Humans, Judgment
ABSTRACT
BACKGROUND: The Musculoskeletal Health Questionnaire (MSK-HQ) has been developed to measure musculoskeletal health status across musculoskeletal conditions and settings. However, the MSK-HQ needs to be further evaluated across settings and different languages. OBJECTIVE: The objective of the study was to evaluate and compare measurement properties of the MSK-HQ across Danish (DK) and English (UK) cohorts of patients from primary care physiotherapy services with musculoskeletal pain. METHODS: The MSK-HQ was translated into Danish according to international guidelines. Measurement invariance was assessed by differential item functioning (DIF) analyses. Test-retest reliability, measurement error, responsiveness and minimal clinically important change (MCIC) were evaluated and compared between the DK (n = 153) and UK (n = 166) cohorts. RESULTS: The Danish version demonstrated acceptable face and construct validity. Of the 14 MSK-HQ items, three showed DIF for language (pain/stiffness at night, understanding condition and confidence in managing symptoms) and three showed DIF for pain location (walking, washing/dressing and physical activity levels). Intraclass correlation coefficients for test-retest reliability were 0.86 (95% CI 0.81 to 0.91) for the DK cohort and 0.77 (95% CI 0.49 to 0.90) for the UK cohort. Systematic measurement error was 1.6 and 3.9 points for the DK and UK cohorts respectively, with random measurement error of 8.6 and 9.9 points. Areas under receiver operating characteristic (ROC) curves of the change scores against patients' own judgement at 12 weeks exceeded 0.70 in both cohorts. Absolute and relative MCIC estimates were 8-10 points and 26% for the DK cohort and 6-8 points and 29% for the UK cohort. CONCLUSIONS: The measurement properties of the MSK-HQ were acceptable across countries, but the instrument seems more suited to group-level than individual-level evaluation. Researchers and clinicians should be aware that some discrepancy exists and should take the observed measurement error into account when evaluating change in scores over time.
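The MCIC estimation described above anchors change scores to patients' own judgement of improvement via ROC analysis. A minimal numpy sketch with a hypothetical data shape (one change score per patient plus a binary 'improved' anchor); the Youden index shown here is one common cut-point criterion, though the authors' exact criterion is not stated:

```python
import numpy as np

def roc_auc(change, improved):
    """AUC via the rank (Mann-Whitney) formulation: the probability an
    improved patient shows a larger change score than a non-improved one."""
    change = np.asarray(change, dtype=float)
    improved = np.asarray(improved, dtype=bool)
    pos, neg = change[improved], change[~improved]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()  # ties count half
    return wins / (len(pos) * len(neg))

def mcic_youden(change, improved):
    """Candidate MCIC: the cut-point maximising sensitivity + specificity - 1."""
    change = np.asarray(change, dtype=float)
    improved = np.asarray(improved, dtype=bool)
    best_cut, best_j = None, -np.inf
    for cut in np.unique(change):
        sens = (change[improved] >= cut).mean()
        spec = (change[~improved] < cut).mean()
        if sens + spec - 1 > best_j:
            best_cut, best_j = cut, sens + spec - 1
    return best_cut
```

An AUC above 0.70, as reported for both cohorts, indicates that the change score discriminates self-reported improvers from non-improvers better than chance.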
Subject(s)
Musculoskeletal Pain/psychology, Quality of Life, Adult, Cross-Cultural Comparison, Denmark, Female, Humans, Male, Middle Aged, Prospective Studies, Reproducibility of Results, Surveys and Questionnaires, Translations, United Kingdom
ABSTRACT
BACKGROUND: Although averaging across multiple examiners' judgements reduces unwanted overall score variability in objective structured clinical examinations (OSCEs), designs involving several parallel circuits of the OSCE require that different examiner cohorts collectively judge performances to the same standard in order to avoid bias. Prior research suggests the potential for important examiner-cohort effects in distributed or national examinations that could compromise fairness or patient safety, but despite their importance, these effects are rarely investigated because fully nested assessment designs make them very difficult to study. We describe initial use of a new method to measure and adjust for examiner-cohort effects on students' scores. METHODS: We developed Video-based Examiner Score Comparison and Adjustment (VESCA): volunteer students were filmed 'live' on 10 out of 12 OSCE stations. Following the examination, examiners additionally scored station-specific common-comparator videos, producing partial crossing between examiner cohorts. Many-facet Rasch modelling and linear mixed modelling were used to estimate and adjust for examiner-cohort effects on students' scores. RESULTS: After accounting for students' ability, examiner cohorts differed substantially in their stringency or leniency (maximal global score difference of 0.47 out of 7.0 [Cohen's d = 0.96]; maximal total percentage score difference of 5.7% [Cohen's d = 1.06] for the same student ability judged by different examiner cohorts). Corresponding adjustment of students' global and total percentage scores altered the theoretical classification of 6.0% of students for both measures (either pass to fail or fail to pass), whereas 8.6-9.5% of students' scores were altered by at least 0.5 standard deviations of student ability.
CONCLUSIONS: Despite typical reliability, the examiner cohort that students encountered had a potentially important influence on their score, emphasising the need for adequate sampling and examiner training. Development and validation of VESCA may offer a means to measure and adjust for potential systematic differences in scoring patterns that could exist between locations in distributed or national OSCE examinations, thereby ensuring equivalence and fairness.
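Cohen's d, used above to express the size of the examiner-cohort differences, is the mean difference divided by a pooled standard deviation. A dependency-free sketch:

```python
import math

def cohens_d(a, b):
    """Standardised mean difference between two groups, using the
    pooled sample standard deviation as the scaling unit."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled
```

Values near 1.0, as reported for the cohort effects above, are conventionally regarded as large effects, which is why a 5.7% score difference matters despite sounding small.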
Subject(s)
Clinical Competence/standards, Undergraduate Medical Education/standards, Educational Measurement/methods, Educational Measurement/standards, Observer Variation, Videotape Recording/methods, Undergraduate Medical Education/methods, Humans, Reproducibility of Results, Medical Students
ABSTRACT
Background: OSCE examiners' scores are variable and may discriminate domains of performance poorly. Examiners must hold their observations of OSCE performances in "episodic memory" until performances end. We investigated whether examiners vary in their recollection of performances, and whether this relates to their score variability or their ability to separate disparate performance domains. Methods: Secondary analysis was performed on data where examiners had (1) scored videos of OSCE performances showing disparate student ability in different domains and (2) performed a measure of recollection for an OSCE performance. We calculated measures of "overall-score variance" (the degree to which individual examiners' overall scores varied from the group mean) and "domain separation" (the degree to which examiners separated different performance domains), and related these variables to the measure of examiners' recollection. Results: Examiners varied considerably in their recollection accuracy (recognition beyond chance -5% to +75% for different examiners). Examiners' recollection accuracy was weakly inversely related to their overall score accuracy (R = -0.17, p < 0.001) and related to their ability to separate domains of performance (R = 0.25, p < 0.001). Conclusions: Examiners vary substantially in their memories of students' performances, which may offer a useful point of difference for studying the processing and integration phases of judgement. The findings could have implications for the utility of feedback.
Subject(s)
Educational Measurement/standards, Judgment, Mental Recall, Observer Variation, Clinical Competence, Female, Humans, Male, Regression Analysis, United Kingdom
ABSTRACT
BACKGROUND: The sample size required to power a study to a nominal level in a paired comparative diagnostic accuracy study, i.e. one in which the diagnostic accuracy of two testing procedures is compared relative to a gold standard, depends on the conditional dependence between the two tests: the lower the dependence, the greater the sample size required. A priori, we usually do not know the dependence between the two tests and thus cannot determine the exact sample size required. One option is to use the implied sample size for the maximal negative dependence, giving the largest possible sample size. However, this is potentially wasteful of resources and unnecessarily burdensome on study participants, as the study is likely to be overpowered. A more accurate estimate of the sample size can be determined at a planned interim analysis point where the sample size is re-estimated. METHODS: This paper discusses a sample size estimation and re-estimation method based on the maximum likelihood estimates, under an implied multinomial model, of the observed values of conditional dependence between the two tests and, if required, prevalence, at a planned interim. The method is illustrated by comparing the accuracy of two procedures for the detection of pancreatic cancer, one using the standard battery of tests and the other using the standard battery with the addition of a PET/CT scan, both relative to the gold standard of a cell biopsy. Simulation of the proposed method illustrates its robustness under various conditions. RESULTS: The results show that the type I error rate of the overall experiment is stable using our suggested method and that the type II error rate is close to or above nominal. Furthermore, the instances in which the type II error rate is above nominal are the situations where the lowest sample size is required, meaning a lower impact on the actual number of participants recruited.
CONCLUSION: We recommend multinomial model maximum likelihood estimation of the conditional dependence between paired diagnostic accuracy tests at an interim to reduce the number of participants required to power the study to at least the nominal level. TRIAL REGISTRATION: ISRCTN ISRCTN73852054 . Registered 9th of January 2015. Retrospectively registered.
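The dependence-to-sample-size relationship described above can be illustrated with Connor's (1987) approximation for paired designs: lower (more negative) conditional dependence between the two tests inflates the discordance probability and hence the required number of pairs. This is a sketch of that general relationship, not the authors' multinomial-MLE re-estimation procedure; the defaults assume a two-sided 5% alpha and 80% power:

```python
from math import ceil, sqrt

def paired_pairs_needed(psi: float, delta: float,
                        z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate number of pairs for a paired comparison of proportions
    (Connor, 1987). psi is the probability the two tests disagree on a
    subject (larger when conditional dependence is lower); delta is the
    difference in accuracies to be detected."""
    n = (z_alpha * sqrt(psi) + z_beta * sqrt(psi - delta ** 2)) ** 2 / delta ** 2
    return ceil(n)
```

Re-estimating psi at an interim, as the paper proposes, replaces the worst-case (maximal-discordance) value with an observed one, shrinking the sample size whenever the tests agree more often than the conservative assumption.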
Subject(s)
Pancreatic Neoplasms/diagnostic imaging, Pancreatic Neoplasms/surgery, Positron Emission Tomography Computed Tomography/methods, Sample Size, Adult, Algorithms, Computer Simulation, Female, Humans, Likelihood Functions, Male, Matched-Pair Analysis, Statistical Models, Reproducibility of Results, Retrospective Studies, Sensitivity and Specificity, Treatment Outcome
ABSTRACT
Millions of children under 5 years in low- and middle-income countries fail to attain their developmental potential, with accruing short- and long-term consequences. Low length/height-for-age (stunting) is known to be a key factor, but there are few data on how child characteristics are linked with developmental changes among children with stunting. We assessed the socioeconomic, household, anthropometric, and clinical predictors of change in early child development (ECD) among 1-5-year-old children with stunting. This was a prospective cohort study nested in a randomized trial testing the effects of lipid-based nutrient supplementation among children with stunting in Uganda. Development was assessed using the Malawi Development Assessment Tool (MDAT). Multiple linear regression analysis was used to assess predictors of change. We included 750 children with a mean ± SD age of 30.2 ± 11.7 months, 45% of whom were female. After 12 weeks, total MDAT z-score increased by 0.40 (95% CI: 0.32; 0.48). Moderate vs severe stunting, higher fat-free mass, a negative malaria test and no inflammation (serum α-1-acid glycoprotein <1 g/l) at baseline predicted a greater increase in ECD scores. Older age and fat mass gain predicted a lesser increase in ECD. Our findings reinforce the link between stunting and development, with more severely stunted children having a lesser increase in ECD scores over time. Younger age, freedom from malaria and inflammation, and higher fat-free mass at baseline, as well as less gain of fat mass during follow-up, predicted a greater increase in developmental scores in this study. Thus, supporting fat-free mass accretion, focusing on younger children, and preventing infection may improve development among children with stunting.
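The multiple linear regression behind these predictor analyses (change in MDAT z-score regressed on baseline covariates such as age, fat-free mass and malaria status) reduces to ordinary least squares. A minimal sketch of the machinery with hypothetical predictors; a real analysis would add standard errors, confidence intervals and diagnostics:

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended.
    X: (n_children, n_predictors); y: outcome (e.g. change in MDAT z-score).
    Returns [intercept, slope_1, ..., slope_p]."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef
```

Each slope is then read as the expected difference in developmental change per unit of that predictor, holding the others fixed.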
ABSTRACT
INTRODUCTION: Early childhood development forms the foundations for functioning later in life. Thus, accurate monitoring of developmental trajectories is critical. However, such monitoring often relies on time-intensive assessments which necessitate administration by skilled professionals. This difficulty is exacerbated in low-resource settings where such professionals are predominantly concentrated in urban and often private clinics, making them inaccessible to many. This geographic and economic inaccessibility contributes to a significant 'detection gap' where many children who might benefit from support remain undetected. The Scalable Transdiagnostic Early Assessment of Mental Health (STREAM) project aims to bridge this gap by developing an open-source, scalable, tablet-based platform administered by non-specialist workers to assess motor, social and cognitive developmental status. The goal is to deploy STREAM through public health initiatives, maximising opportunities for effective early interventions. METHODS AND ANALYSIS: The STREAM project will enrol and assess 4000 children aged 0-6 years from Malawi (n=2000) and India (n=2000). It integrates three established developmental assessment tools measuring motor, social and cognitive functioning using gamified tasks, observation checklists, parent-report and audio-video recordings. Domain scores for motor, social and cognitive functioning will be developed and assessed for their validity and reliability. These domain scores will then be used to construct age-adjusted developmental reference curves. ETHICS AND DISSEMINATION: Ethical approval has been obtained from local review boards at each site (India: Sangath Institutional Review Board; All India Institute of Medical Science (AIIMS) Ethics Committee; Indian Council of Medical Research-Health Ministry Screening Committee; Malawi: College of Medicine Research and Ethics Committee; Malawi Ministry of Health-Blantyre District Health Office). 
The study adheres to Good Clinical Practice standards and the ethical guidelines of the 6th revision (2008) of the Declaration of Helsinki. Findings from STREAM will be disseminated to participating families, healthcare professionals, policymakers, educators and researchers, at local, national and international levels through meetings, academic journals and conferences.
Subject(s)
Child Development, Mental Health, Humans, Preschool Child, Infant, Child, India, Malawi, Female, Newborn Infant, Male, Reproducibility of Results, Research Design
ABSTRACT
Detecting when others are looking at us is a crucial social skill. Accordingly, a range of gaze angles is perceived as self-directed; this is termed the "cone of direct gaze" (CoDG). Multiple cues, such as nose and head orientation, are integrated during gaze perception. Thus, occluding the lower portion of the face, such as with face masks during the COVID-19 pandemic, may influence how gaze is perceived. Individual differences in the prioritisation of eye-region and non-eye-region cues may modulate the influence of face masks on gaze perception. Autistic individuals, who may be more reliant on non-eye-region directional cues during gaze perception, might be differentially affected by face masks. In the present study, we compared the CoDG when viewing masked and unmasked faces (N = 157) and measured self-reported autistic traits. The CoDG was wider for masked compared to unmasked faces, suggesting that reduced reliability of lower face cues increases the range of gaze angles perceived as self-directed. Additionally, autistic traits positively predicted the magnitude of CoDG difference between masked and unmasked faces. This study provides crucial insights into the effect of face masks on gaze perception, and how they may affect autistic individuals to a greater extent.
Subject(s)
Autistic Disorder, COVID-19, Humans, Masks, Pandemics, Reproducibility of Results, Perception
ABSTRACT
OBJECTIVE: Caring for a child with cystic fibrosis (CF) is a rigorous daily commitment for caregivers, and treatment burden is a major concern. We aimed to develop and validate a short-form version of a 46-item tool assessing the Challenge of Living with Cystic Fibrosis (CLCF) for clinical or research use. DESIGN: A novel genetic algorithm, based on 'evolving' a subset of items against a pre-specified set of criteria, was applied to optimise the tool, using data from 135 families. MAIN OUTCOME MEASURES: Internal reliability and validity were assessed; the latter compared scores to validated tests of parental well-being, markers of treatment burden, and disease severity. RESULTS: The 15-item CLCF-SF demonstrated very good internal consistency (Cronbach's alpha 0.82, 95% CI 0.78-0.87). Scores for convergent validity correlated with the Beck Depression Inventory (Rho = 0.48), State Trait Anxiety Inventory (STAI-State, Rho = 0.41; STAI-Trait, Rho = 0.43), Cystic Fibrosis Questionnaire-Revised, lung function (Rho = -0.37), caregiver treatment management (r = 0.48) and child treatment management (r = 0.45), and discriminated between unwell and well children with CF (mean difference 5.5, 95% CI 2.5-8.5, p < 0.001) and between recent and no hospital admission (MD 3.6, 95% CI 0.25-6.95, p = 0.039). CONCLUSION: The CLCF-SF provides a robust 15-item tool for assessing the challenge of living with a child with CF.
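The genetic algorithm described above optimised item subsets against multiple pre-specified criteria. As a simplified, hypothetical sketch of the 'evolving a subset' idea, the fitness below is internal consistency (Cronbach's alpha) alone; the real CLCF-SF selection balanced several criteria:

```python
import random
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal consistency of a respondents-by-items score matrix."""
    k = items.shape[1]
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum() / total_var)

def evolve_short_form(data: np.ndarray, k: int, pop: int = 30,
                      gens: int = 40, seed: int = 1) -> list:
    """'Evolve' a k-item subset maximising Cronbach's alpha (toy fitness)."""
    rng = random.Random(seed)
    n_items = data.shape[1]
    fitness = lambda subset: cronbach_alpha(data[:, sorted(subset)])
    population = [frozenset(rng.sample(range(n_items), k)) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop // 2]           # selection: keep top half
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)          # crossover: mix two parents
            pool = list(a | b)
            child = set(rng.sample(pool, min(k, len(pool))))
            if rng.random() < 0.3:                   # mutation: drop one item
                child.discard(rng.choice(sorted(child)))
            while len(child) < k:                    # top up to exactly k items
                child.add(rng.randrange(n_items))
            children.append(frozenset(child))
        population = survivors + children
    return sorted(max(population, key=fitness))
```

Keeping the top half of each generation means the best subset found is never lost, so fitness is non-decreasing across generations.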
ABSTRACT
OBJECTIVE: Treatments for cystic fibrosis (CF) are complex, labour-intensive, and perceived as highly burdensome by caregivers of children with CF. An instrument assessing burden of care is needed. DESIGN: A stepwise, qualitative design was used to create the CLCF with caregiver focus groups, participant researchers, a multidisciplinary professional panel, and cognitive interviews. MAIN OUTCOME MEASURES: Preliminary psychometric analyses evaluated the reliability and convergent validity of the CLCF scores. Cronbach's alpha assessed internal consistency, and t-tests examined test-retest reliability. Correlations measured convergence between the Treatment Burden scale of the Cystic Fibrosis Questionnaire-Revised (CFQ-R) and the CLCF. Discriminant validity was assessed by comparing CLCF scores in one- vs two-parent families, across ages, and in children with vs without Pseudomonas aeruginosa (PA). RESULTS: Six Challenge subscales emerged from the qualitative data, and the professional panel constructed a scoresheet estimating the Time and Effort required for treatments. Internal consistency and test-retest reliability were adequate. Good convergence was found between the Total Challenge score and Treatment Burden on the CFQ-R (r = -0.49, p = 0.02, n = 31). A recent PA infection signalled higher Total Challenge for caregivers (F(23) = 11.72, p = 0.002). CONCLUSIONS: The CLCF, developed in partnership with parents/caregivers and CF professionals, is a timely, disease-specific burden measure for clinical research.
ABSTRACT
Stunting affects 22% of children globally, putting them at risk of adverse outcomes including delayed development. We investigated the effect of milk protein (MP) vs. soy and whey permeate (WP) vs. maltodextrin in large-quantity, lipid-based nutrient supplement (LNS), and of LNS itself vs. no supplementation, on child development and head circumference among stunted children aged 1-5 years. We conducted a randomized, double-blind, community-based 2 × 2 factorial trial in Uganda (ISRCTN1309319). We randomized 600 children to one of four LNS formulations (~535 kcal/d), with or without MP (n = 299 vs. n = 301) or WP (n = 301 vs. n = 299), for 12 weeks, and 150 children to no supplementation. Child development was assessed using the Malawi Development Assessment Tool. Data were analyzed using linear mixed-effects models. Children had a median [interquartile range] age of 30 [23; 41] months and a mean ± standard deviation height-for-age z-score of -3.02 ± 0.74. There were no interactions between MP and WP for any of the outcomes. There was no effect of either MP or WP on any developmental domain. Although LNS itself had no impact on development, it resulted in a 0.07 (95% CI: 0.004; 0.14) cm greater head circumference. Neither dairy in LNS, nor LNS itself, had an effect on development among already stunted children.
Subject(s)
Child Development, Whey, Humans, Child, Infant, Milk Proteins, Uganda, Micronutrients, Dietary Supplements, Growth Disorders/prevention & control, Nutrients, Whey Protein, Lipids
ABSTRACT
INTRODUCTION: With the ratification of the Sustainable Development Goals, there is an increased emphasis on early childhood development (ECD) and well-being. The WHO-led Global Scales for Early Development (GSED) project aims to provide population- and programmatic-level measures of ECD for children aged 0-3 years that are valid, reliable and have psychometrically stable performance across geographical, cultural and language contexts. This paper reports on the creation of two measures: (1) the GSED Short Form (GSED-SF), a caregiver-reported measure for population evaluation, self-administered with no training required, and (2) the GSED Long Form (GSED-LF), a directly administered/observed measure for programmatic evaluation, administered by a trained professional. METHODS: We selected the 807 psychometrically best-performing items, using a Rasch measurement model, from an ECD measurement databank comprising 66 075 children assessed on 2211 items from 18 ECD measures in 32 countries. For 766 of these items, in-depth subject matter expert judgements were gathered to inform final item selection. Specifically, we collected data on (1) conceptual matches between pairs of items originating from different measures, (2) the developmental domain(s) measured by each item and (3) perceptions of the feasibility of administering each item in diverse contexts. Prototypes were finalised through a combination of psychometric performance evaluation and expert consensus to optimally identify items. RESULTS: We created the GSED-SF (139 items) and GSED-LF (157 items) for tablet-based and paper-based assessments, with an optimal set of items that fit the Rasch model, met subject matter expert criteria, avoided conceptual overlap, covered multiple domains of child development and were feasible to implement across diverse settings.
CONCLUSIONS: State-of-the-art quantitative and qualitative procedures were used to select theoretically relevant and globally feasible items representing child development for children aged 0-3 years. The GSED-SF and GSED-LF will be piloted and validated in children across diverse cultural, demographic, social and language contexts for global use.
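The Rasch measurement model used for item selection above places children and items on a single logit scale: the probability of passing an item depends only on the difference between child ability and item difficulty. A minimal, illustrative sketch of the model's item response function:

```python
from math import exp

def rasch_p(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability that a child of ability theta
    passes an item of difficulty b, both on the same logit scale."""
    return 1.0 / (1.0 + exp(-(theta - b)))
```

When theta equals b the pass probability is exactly 0.5; greater ability (or an easier item) pushes it towards 1. This shared scale is what lets items drawn from 18 different measures be compared and pruned within one framework.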
Subject(s)
Big Data; Judgment; Humans; Child; Child, Preschool; Surveys and Questionnaires; Child Development; Psychometrics

ABSTRACT
INTRODUCTION: Children's early development is affected by caregiving experiences, with lifelong health and well-being implications. Governments and civil societies need population-based measures to monitor children's early development and ensure that children receive the care needed to thrive. To this end, the WHO developed the Global Scales for Early Development (GSED) to measure children's early development up to 3 years of age. The GSED includes three measures for population and programmatic level measurement: (1) short form (SF) (caregiver report), (2) long form (LF) (direct administration) and (3) psychosocial form (PF) (caregiver report). The primary aim of this protocol is to validate the GSED SF and LF. Secondary aims are to create preliminary reference scores for the GSED SF and LF, validate an adaptive testing algorithm and assess the feasibility and preliminary validity of the GSED PF. METHODS AND ANALYSIS: We will conduct the validation in seven countries (Bangladesh, Brazil, Côte d'Ivoire, Pakistan, The Netherlands, People's Republic of China, United Republic of Tanzania), varying in geography, language, culture and income through a 1-year prospective design, combining cross-sectional and longitudinal methods with 1248 children per site, stratified by age and sex. The GSED generates an innovative common metric (Developmental Score: D-score) using the Rasch model and a Development for Age Z-score (DAZ). We will evaluate six psychometric properties of the GSED SF and LF: concurrent validity, predictive validity at 6 months, convergent and discriminant validity, and test-retest and inter-rater reliability. We will evaluate measurement invariance by comparing differential item functioning and differential test functioning across sites. ETHICS AND DISSEMINATION: This study has received ethical approval from the WHO (protocol GSED validation 004583 20.04.2020) and approval in each site. 
Study results will be disseminated through webinars and publications from WHO, international organisations, academic journals and conference proceedings. REGISTRATION DETAILS: Open Science Framework https://osf.io/ on 19 November 2021 (DOI 10.17605/OSF.IO/KX5T7; identifier: osf-registrations-kx5t7-v1).
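The protocol's common metric, the D-score with its Development-for-Age Z-score (DAZ), can be sketched as a standardisation against age-conditional reference values. The reference points below are invented for illustration; the study derives real preliminary reference scores from its own data.

```python
import numpy as np

# Illustrative (invented) reference curves: D-score median and SD by age.
REF_AGE_MONTHS = np.array([0.0, 6.0, 12.0, 24.0, 36.0])
REF_MEDIAN = np.array([10.0, 30.0, 45.0, 60.0, 70.0])
REF_SD = np.array([3.0, 4.0, 5.0, 6.0, 6.5])

def daz(d_score, age_months):
    """Development-for-Age Z-score: the child's D-score standardised
    against the reference median and SD interpolated at their age."""
    median = np.interp(age_months, REF_AGE_MONTHS, REF_MEDIAN)
    sd = np.interp(age_months, REF_AGE_MONTHS, REF_SD)
    return (d_score - median) / sd
```

A DAZ of 0 places the child on the reference median for their age; the same raw D-score maps to very different DAZ values at different ages, which is what makes the metric comparable across the age range.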
Subject(s)
Caregivers; Language; Humans; Child; Child, Preschool; Reproducibility of Results; Cross-Sectional Studies; Surveys and Questionnaires; Psychometrics/methods

ABSTRACT
INTRODUCTION: Objective structured clinical exams (OSCEs) are a cornerstone of assessing the competence of trainee healthcare professionals, but have been criticised for (1) lacking authenticity, (2) variability in examiners' judgements, which can challenge assessment equivalence, and (3) limited diagnosticity of trainees' focal strengths and weaknesses. In response, this study aims to investigate whether (1) sharing integrated-task OSCE stations across institutions can increase perceived authenticity, while (2) enhancing assessment equivalence by enabling comparison of the standard of examiners' judgements between institutions using a novel methodology (video-based examiner score comparison and adjustment (VESCA)) and (3) exploring the potential to develop more diagnostic signals from data on students' performances. METHODS AND ANALYSIS: The study will use a complex intervention design, developing, implementing and sharing an integrated-task (research) OSCE across four UK medical schools. It will use VESCA to compare examiner scoring differences between groups of examiners and different sites, while studying how, why and for whom the shared OSCE and VESCA operate across the participating schools. Quantitative analysis will use Many Facet Rasch Modelling to compare the influence of different examiner groups and sites on students' scores, while the operation of the two interventions (shared integrated-task OSCEs; VESCA) will be studied through the theory-driven method of realist evaluation. Further exploratory analyses will examine diagnostic performance signals within the data. ETHICS AND DISSEMINATION: The study will be extra to usual course requirements and all participation will be voluntary. We will uphold the principles of informed consent, the right to withdraw, and confidentiality with pseudonymity and strict data security. The study has received ethical approval from Keele University Research Ethics Committee.
Findings will be published academically and will contribute to good practice guidance on (1) the use of VESCA and (2) the sharing and use of integrated-task OSCE stations.
Subject(s)
Education, Medical, Undergraduate; Students, Medical; Humans; Educational Measurement/methods; Education, Medical, Undergraduate/methods; Clinical Competence; Schools, Medical; Multicenter Studies as Topic

ABSTRACT
PURPOSE: Ensuring that examiners in different parallel circuits of objective structured clinical examinations (OSCEs) judge to the same standard is critical to the chain of validity. Recent work suggests that the examiner-cohort (i.e., the particular group of examiners) could significantly alter outcomes for some candidates. Despite this, examiner-cohort effects are rarely examined because fully nested data (i.e., no crossover between the students judged by different examiner groups) limit comparisons. In this study, the authors aim to replicate and further develop a novel method called Video-based Examiner Score Comparison and Adjustment (VESCA) so that it can be used to enhance quality assurance of distributed or national OSCEs. METHOD: In 2019, 6 volunteer students were filmed on 12 stations in a summative OSCE. In addition to examining live student performances, examiners from 8 separate examiner-cohorts scored the pool of video performances. Examiners scored the videos specific to their station. The video scores linked otherwise fully nested data, enabling comparisons by Many Facet Rasch Modeling. The authors compared and adjusted for examiner-cohort effects. They also compared examiners' scores when videos were embedded (interspersed between live students during the OSCE) or judged later via the Internet. RESULTS: Having accounted for differences in students' ability, different examiner-cohorts' scores for students of the same ability ranged from 18.57 out of 27 (68.8%) to 20.49 (75.9%), Cohen's d = 1.3. Score adjustment changed the pass/fail classification for up to 16% of students, depending on the modeled cut score. Internet and embedded video scoring showed no difference in mean scores or variability. Examiners' accuracy did not deteriorate over the 3-week Internet scoring period. CONCLUSIONS: Examiner-cohorts produced a replicable, significant influence on OSCE scores that was unaccounted for by typical assessment psychometrics.
VESCA offers a promising means to enhance validity and fairness in distributed OSCEs or national exams. Internet-based scoring may enhance VESCA's feasibility.
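The size of the cohort effect reported above can be reproduced from summary statistics. The pooled SD and sample sizes below are back-calculated and invented for illustration, and the additive correction is only a simplified analogue: the actual VESCA adjustment is performed through Many Facet Rasch Modeling, not a mean shift.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardised mean difference between two examiner-cohorts' scores,
    using the pooled standard deviation."""
    pooled = math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2)
                       / (n_a + n_b - 2))
    return (mean_b - mean_a) / pooled

def adjust(score, cohort_mean, grand_mean):
    """Simplified additive leniency correction: shift each score by the
    cohort's deviation from the grand mean (an illustrative analogue of
    the Rasch-based adjustment, not the VESCA procedure itself)."""
    return score - (cohort_mean - grand_mean)
```

With the reported cohort means of 18.57 and 20.49 out of 27, a pooled SD near 1.5 reproduces the Cohen's d of about 1.3, which conveys how large a difference the examiner-cohort alone made for equally able students.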
Subject(s)
Clinical Competence; Educational Measurement; Educational Measurement/methods; Humans; Physical Examination; Psychometrics

ABSTRACT
BACKGROUND: The early childhood years provide an important window of opportunity to build strong foundations for future development. One impediment to global progress is the lack of population-based measurement tools that provide reliable estimates of developmental status. We aimed to field test and validate a newly created tool for this purpose. METHODS: We assessed attainment of 121 Infant and Young Child Development (IYCD) items in 269 children aged 0-3 years from Pakistan, Malawi and Brazil, alongside socioeconomic status (SES), maternal education, Family Care Indicators (FCI) and anthropometry. Children born premature, malnourished or with neurodevelopmental problems were excluded. We assessed inter-rater and test-retest reliability as well as the understandability of items. Each item was analyzed using logistic regression with SES, anthropometry, gender and FCI as covariates. Consensus choice of the final items depended on developmental trajectory, age of attainment, invariance, reliability and acceptability across countries. RESULTS: The IYCD has 100 developmental items (40 gross/fine motor, 30 expressive/receptive language/cognitive, 20 socio-emotional and 10 behavior). Items were acceptable, performed well in cognitive testing, had good developmental trajectories and showed high reliability across countries. Development-for-Age Z (DAZ) scores showed very good known-groups validity. CONCLUSIONS: The IYCD is a simple-to-use caregiver-report tool enabling population-level assessment of child development for children aged 0-3 years; it performs well across three countries on three continents and provides reliable estimates of young children's developmental status.
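The per-item analysis described in the methods can be sketched as a logistic regression of item attainment on age and covariates. The Newton-Raphson fitter and the covariate coding below are illustrative assumptions, not the study's actual models; a steadily positive age coefficient is the kind of "developmental trajectory" evidence the item-selection criteria refer to.

```python
import numpy as np

def logistic_fit(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    X is an (n, p) design matrix whose first column is the intercept;
    y holds 0/1 item attainment."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)                       # IRLS weights
        hessian = X.T @ (X * w[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def attainment_model(age_months, ses, attained):
    """Fit P(attained) ~ intercept + age + SES for one item; further
    covariates (anthropometry, gender, FCI) would be added as columns."""
    X = np.column_stack([np.ones(len(age_months)), age_months, ses])
    return logistic_fit(X, attained)
```

Fitting this model per item, and comparing the age coefficient and covariate effects across countries, is one concrete way to operationalise the invariance and trajectory checks the abstract describes.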