ABSTRACT
The onset of the COVID-19 pandemic and the associated long-term shifts to virtual instruction in most US schools presented notable challenges for education researchers. Ongoing projects conducted in school settings experienced sudden losses of access to teacher and student participants, in many cases leading to severe interruptions to data collection efforts. Perhaps most notably, upon the return to in-person instruction in the 2021/22 academic year, most schools instituted strict policies limiting the number of non-school personnel who could enter school buildings, including researchers conducting in-person data collection. As such, many researchers had to find alternative means to gather data. In this paper, we offer a new protocol, created in response to these challenges, that allows for the secure and fully remote collection of video data in school settings. This protocol not only addressed the immediate needs of the focal study but also mitigates some of the most notable barriers to collecting classroom video data in the field of education research at large. We describe the initial development and application of this protocol in a local study of elementary teachers, as well as its scaling in a study of elementary teachers in multiple states. It is our hope that this protocol can expand education researchers', practitioners', and policymakers' access to classroom video data.
ABSTRACT
Mediation analyses provide a principal lens for probing the pathways through which a treatment acts upon an outcome because they can decompose treatments into their core components and test how those components function as a coordinated system or theory of action. Experimental evaluation of mediation effects in addition to total effects has become increasingly common, but the literature offers only limited guidance on how to plan mediation studies with multi-tiered hierarchical or clustered structures. In this study, we provide methods for computing the power to detect mediation effects in three-level cluster-randomized designs that examine individual- (level one), intermediate- (level two), or cluster-level (level three) mediators. We assess the methods using a simulation and provide examples of a three-level clinic-randomized study (individuals nested within therapists nested within clinics) probing an individual-, intermediate-, or cluster-level mediator using the R package PowerUpR and its Shiny application.
Subjects
Statistical Models, Computer Simulation, Statistical Data Interpretation, Humans, Randomized Controlled Trials as Topic, Sample Size
ABSTRACT
Objective: Analysis of the intermediate behaviors and mechanisms through which innovative therapies come to shape outcomes is a critical objective in many areas of psychotherapy research because it supports the iterative exploration, development and refinement of theories and therapies. Despite widespread interest in the intermediate behaviors and mechanisms that convey treatment effects, there is limited guidance on how to effectively and efficiently design studies to detect such mediated effects in the types of partially nested designs that commonly arise in psychotherapy research. In this study, we develop statistical power formulas to identify requisite sample sizes and guide the planning of studies probing mediation under two- and three-level partially nested designs. Method: We investigate multilevel mediation in partially nested structures and models for two- and three-level designs. Results: Well-powered studies probing mediation using partially nested designs will typically require moderate to large sample sizes or moderate to large effects. Discussion: We implement these formulas in the R package PowerUpR and a simple Shiny web application (https://poweruprshiny.shinyapps.io/PartiallyNestedMediationPower/) and demonstrate their use to plan studies using partially nested designs.
Subjects
Statistical Data Interpretation, Statistical Models, Negotiating, Psychotherapy, Humans, Sample Size, Treatment Outcome
ABSTRACT
Multilevel mediation analyses play an essential role in helping researchers develop, probe, and refine theories of action underlying interventions and document how interventions impact outcomes. However, little is known about how to plan studies with sufficient power to detect such multilevel mediation effects. In this study, we describe how to prospectively estimate power and identify sufficient sample sizes for experiments intended to detect multilevel mediation effects. We outline a simple approach to estimate the power to detect mediation effects with individual- or cluster-level mediators using summary statistics easily obtained from empirical literature and the anticipated magnitude of the mediation effect. We draw on a running example to illustrate several different types of mediation and provide an accessible introduction to the design of multilevel mediation studies. The power formulas are implemented in the R package PowerUpR and the PowerUp software (causalevaluation.org).
Subjects
Multilevel Analysis, Randomized Controlled Trials as Topic, Sample Size, Statistical Data Interpretation, Humans
ABSTRACT
A key consideration in planning studies of community-based HIV education programs is identifying a sample size large enough to ensure a reasonable probability of detecting program effects if they exist. Sufficient sample sizes for community- or group-based designs are proportional to the correlation or similarity of individuals within communities. As a result, efficient and effective design requires reasonable a priori estimates of the correlational structure among individuals within communities. In this study, we investigate the degree of correlation among individuals within communities and regions using samples of sixth-grade adolescents from 609 local area district communities and 122 regions in 15 sub-Saharan African nations. We develop nation-specific and international summaries of these correlations using variance partitioning coefficients from multilevel models and subsequently assess the extent to which different types of background variables delineate key sources of these correlations. The results suggest persistent differences among communities and regions and that the degree of correlation among individuals within communities varied considerably by nation. The findings underscore the importance of empirically derived values of design parameters that are anchored in evidence specific to the outcome, nation and context of the planned study.
Subjects
Community Health Services, HIV Infections/prevention & control, Health Education/methods, Sub-Saharan Africa, Child, Humans, Statistical Models
ABSTRACT
Mediation analyses have provided a critical platform to assess the validity of theories of action across a wide range of disciplines. Despite widespread interest and development in these analyses, literature guiding the design of mediation studies has been largely unavailable. Like studies focused on the detection of a total or main effect, an important design consideration is the statistical power to detect indirect effects if they exist. Understanding the sensitivity to detect indirect effects is exceptionally important because it directly influences the scale of data collection and ultimately governs the types of evidence group-randomized studies can bring to bear on theories of action. However, unlike studies concerned with the detection of total effects, the literature has not established power formulas for detecting multilevel indirect effects in group-randomized designs. In this study, we develop closed-form expressions to estimate the variance of and the power to detect indirect effects in group-randomized studies with a group-level mediator using two-level linear models (i.e., 2-2-1 mediation). The results suggest that when carefully planned, group-randomized designs may frequently be well positioned to detect mediation effects with typical sample sizes. The resulting power formulas are implemented in the R package PowerUpR and the PowerUp!-Mediator software (causalevaluation.org).
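The delta-method logic behind such power computations can be illustrated with a minimal sketch. The function below is a generic Sobel-type approximation, not the exact multilevel formulas implemented in PowerUpR; the path coefficients `a` and `b` and their standard errors are hypothetical inputs (in a 2-2-1 design, those standard errors would reflect the clustered variance structure).

```python
import math

def _phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sobel_power(a, b, se_a, se_b, z_crit=1.96):
    """Approximate two-sided power to detect the indirect effect a*b,
    using the first-order (Sobel) delta-method standard error."""
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = abs(a * b) / se_ab
    return _phi(z - z_crit) + _phi(-z - z_crit)

# Hypothetical inputs: paths a = 0.4 and b = 0.3, each with standard error 0.1
power = sobel_power(0.4, 0.3, 0.1, 0.1)  # roughly 0.67
```

Shrinking the standard errors (e.g., by adding clusters) raises the power, which is the basic trade-off the closed-form expressions make explicit at the design stage.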
Subjects
Statistical Models, Random Allocation, Computer Simulation, Statistical Data Interpretation, Monte Carlo Method, Sample Size, Software
ABSTRACT
PURPOSE: To examine the reliability and attributable facets of variance within an entrustment-derived workplace-based assessment system. METHOD: Faculty at the University of Cincinnati Medical Center internal medicine residency program (a 3-year program) assessed residents using discrete workplace-based skills called observable practice activities (OPAs) rated on an entrustment scale. Ratings from July 2012 to December 2016 were analyzed using applications of generalizability theory (G-theory) and decision study framework. Given the limitations of G-theory applications with entrustment ratings (the assumption that mean ratings are stable over time), a series of time-specific G-theory analyses and an overall longitudinal G-theory analysis were conducted to detail the reliability of ratings and sources of variance. RESULTS: During the study period, 166,686 OPA entrustment ratings were given by 395 faculty members to 253 different residents. Raters were the largest identified source of variance in both the time-specific and overall longitudinal G-theory analyses (37% and 23%, respectively). Residents were the second largest identified source of variation in the time-specific G-theory analyses (19%). Reliability was approximately 0.40 for a typical month of assessment (27 different OPAs, 2 raters, and 1-2 rotations) and 0.63 for the full sequence of ratings over 36 months. A decision study showed doubling the number of raters and assessments each month could improve the reliability over 36 months to 0.76. CONCLUSIONS: Ratings from the full 36 months of the examined program of assessment showed fair reliability. Increasing the number of raters and assessments per month could improve reliability, highlighting the need for multiple observations by multiple faculty raters.
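The decision-study projections described above follow the standard generalizability-theory move of dividing error variance components by the numbers of conditions averaged over. The sketch below uses a simplified person x rater x observation design and hypothetical variance components; it is not the exact model or the published estimates from the study.

```python
def d_study_reliability(var_p, var_pr, var_pe, n_r, n_o):
    """Projected relative generalizability (reliability) coefficient.
    var_p  - person (resident) variance
    var_pr - person-by-rater interaction variance
    var_pe - residual error variance
    n_r    - raters per month; n_o - observations per rater"""
    rel_error = var_pr / n_r + var_pe / (n_r * n_o)
    return var_p / (var_p + rel_error)

# Doubling raters and observations raises projected reliability,
# mirroring the decision study's qualitative finding (components are hypothetical)
base = d_study_reliability(0.2, 0.3, 0.5, n_r=2, n_o=14)
doubled = d_study_reliability(0.2, 0.3, 0.5, n_r=4, n_o=28)
```

Because rater-linked variance is divided by the number of raters, adding raters is the most direct lever when raters dominate the error, consistent with the study's conclusion.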
Subjects
Clinical Competence, Graduate Medical Education, Internal Medicine/education, Trust, Educational Measurement/methods, Humans, Reproducibility of Results
ABSTRACT
Structural equation modeling with full information maximum likelihood estimation is the predominant method for empirically assessing complex theories involving multiple latent variables in addiction research. Although full information estimators have many desirable properties, including consistency, a major limitation of structural equation models is that they often sustain significant bias when implemented in small to moderate sized studies (e.g., fewer than 100 or 200 cases). Recent literature has developed a limited information estimator designed to address this limitation, conceptually implemented through a bias-corrected factor score path analysis approach, that has been shown to produce unbiased and efficient estimates in small to moderate sample settings. Despite its theoretical and empirical merits, the literature has suggested that the method is underused for three primary reasons: the methods are unfamiliar to applied researchers; practical and accessible guidance and software for applied researchers are lacking; and comparisons against full information methods grounded in discipline-specific examples are scarce. In this study, I delineate this method through a step-by-step analysis of a sequential mediation case study involving internet addiction. I provide example R code using the lavaan package and data based on a hypothetical study of addiction. I examine the differences between the full and limited information estimators within the example data and subsequently probe the extent to which these differences are indicative of a consistent divergence between the estimators using a simulation study. The results suggest that the limited information estimator outperforms the conventional full information maximum likelihood estimator in small to moderate sample sizes in terms of bias, efficiency, and power.
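The study's own example code is in R (lavaan), but the mechanism the limited information approach targets, bias in regressions on error-laden factor scores, can be illustrated with a classical attenuation sketch in Python. Everything below (sample size, true slope, reliability value, and the simple divide-by-reliability correction) is a hypothetical simplification; the published bias-corrected factor score estimator is more general than this.

```python
import random

random.seed(1)
n, true_b, rel_x = 200, 0.5, 0.8  # hypothetical reliability of the factor-score proxy

# latent predictor and outcome
xi = [random.gauss(0, 1) for _ in range(n)]
y = [true_b * v + random.gauss(0, 0.5) for v in xi]

# observed factor-score proxy: true score plus measurement error sized
# so that the proxy's reliability is rel_x
err_sd = ((1 - rel_x) / rel_x) ** 0.5
x = [v + random.gauss(0, err_sd) for v in xi]

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

b_naive = ols_slope(x, y)      # attenuated toward zero by measurement error
b_corrected = b_naive / rel_x  # reliability-based correction
```

The naive factor-score slope is pulled toward zero by roughly the reliability factor; correcting for that factor is the intuition behind treating factor scores as error-laden proxies rather than as the latent variables themselves.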
Subjects
Addictive Behavior/epidemiology, Bias, Factor Analysis, Latent Class Analysis, Statistical Models, Sample Size, Statistical Data Interpretation, Empirical Research, Humans, Likelihood Functions, Software
ABSTRACT
OBJECTIVE: Over the past two decades, the lack of reliable empirical evidence concerning the effectiveness of educational interventions has motivated a new wave of research in education in sub-Saharan Africa (and across most of the world) that focuses on impact evaluation through rigorous research designs such as experiments. Often these experiments draw on the random assignment of entire clusters, such as schools, to accommodate the multilevel structure of schooling and the theory of action underlying many school-based interventions. Planning effective and efficient school-randomized studies, however, requires plausible values of the intraclass correlation coefficient (ICC) and the variance explained by covariates during the design stage. The purpose of this study was to improve the planning of two-level school-randomized studies in sub-Saharan Africa by providing empirical estimates of the ICC and the variance explained by covariates for education outcomes in 15 countries. METHOD: Our investigation drew on large-scale representative samples of sixth-grade students in 15 countries in sub-Saharan Africa and included over 60,000 students across 2,500 schools. We examined two core education outcomes: standardized achievement in reading and mathematics. We estimated a series of two-level hierarchical linear models with students nested within schools to inform the design of two-level school-randomized trials. RESULTS: The analyses suggested that outcomes were substantially clustered within schools but that the magnitude of the clustering varied considerably across countries. Similarly, the results indicated that covariance adjustment generally reduced clustering but that the prognostic value of such adjustment varied across countries.
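The variance partitioning behind such ICC estimates can be sketched with a simple one-way ANOVA estimator for a balanced design. The study itself used two-level hierarchical linear models (with and without covariates); the estimator and toy data below are only illustrative of the quantity being reported.

```python
def anova_icc(groups):
    """One-way ANOVA estimator of the intraclass correlation for a
    balanced two-level design; `groups` is a list of equally sized
    lists of student scores, one inner list per school."""
    k, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (k * n)
    # between-school and within-school mean squares
    msb = n * sum((sum(g) / n - grand) ** 2 for g in groups) / (k - 1)
    msw = sum((x - sum(g) / n) ** 2 for g in groups for x in g) / (k * (n - 1))
    sigma_b = max((msb - msw) / n, 0.0)  # between-school variance component
    return sigma_b / (sigma_b + msw)
```

An ICC near zero means schools are interchangeable and clusters contribute little design penalty; an ICC near one means nearly all variation lies between schools, which sharply inflates the number of schools a randomized trial needs.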
ABSTRACT
An important assumption underlying meaningful comparisons of scores in rater-mediated assessments is that measurement is commensurate across raters. When raters differentially apply the standards established by an instrument, scores from different raters are on fundamentally different scales and no longer preserve a common meaning and basis for comparison. In this study, we developed a method to accommodate measurement noninvariance across raters when measurements are cross-classified within two distinct hierarchical units. We conceptualized cross-classified graded response models with random item effects and used random discrimination and threshold effects to test, calibrate, and account for measurement noninvariance among raters. By leveraging empirical estimates of rater-specific deviations in the discrimination and threshold parameters, the proposed method allows us to identify noninvariant items and empirically estimate and directly adjust for this noninvariance within a cross-classified framework. Within the context of teaching evaluations, the results of a case study suggested substantial noninvariance across raters and that establishing an approximately invariant scale through random item effects improves model fit and predictive validity.
ABSTRACT
OBJECTIVES: Group-randomized designs are well suited for studies of professional development because they can accommodate programs that are delivered to intact groups (e.g., schools), the collaborative nature of professional development, and extant teacher/school assignments. Though group designs may be theoretically favorable, prior evidence has suggested that they may be challenging to conduct in professional development studies because well-powered designs will typically require large sample sizes or assume large effect sizes. Using teacher knowledge outcomes in mathematics, we investigated when, and the extent to which, covariance adjustment for a pretest, teacher certification, or demographic covariates can reduce the sample size necessary to achieve reasonable power. METHOD: Our analyses drew on multilevel models and outcomes in five different content areas for over 4,000 teachers and 2,000 schools. Using these estimates, we assessed the minimum detectable effect sizes for several school-randomized designs with and without covariance adjustment. RESULTS: The analyses suggested that teachers' knowledge is substantially clustered within schools in each of the five content areas and that covariance adjustment for a pretest or, to a lesser extent, teacher certification, has the potential to transform designs that would otherwise be unreasonably large for professional development studies into viable studies.
ABSTRACT
This study examined the practical problem of covariate selection in propensity scores (PSs) given a predetermined set of covariates. Because the bias reduction capacity of a confounding covariate is proportional to the concurrent relationships it has with the outcome and treatment, particular focus is placed on how we might approximate covariate-outcome relationships while retaining the PS as a design tool (i.e., without using the observed outcomes). To make this approach tractable, I examined the extent to which alternative measures of the outcome might inform covariate-outcome empirical relationships and corresponding covariate selection. Specifically, two such measures were examined: proximal pretreatment measures of the outcome and cross-validation. Further, because the implications of covariate choice reach beyond the properties of the treatment effect estimator, I reason that the primary objective of PS covariate selection is to effectively and efficiently reduce bias while forming a scientific basis for inference through, for example, covariate balance. By using outcome proxies or cross-validation, substantive knowledge is augmented with empirical evidence of covariates' bias reduction/amplification capacities to better inform covariate selection, improve estimation, and form an evidentiary basis for inference.
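The proxy-based idea can be sketched as a simple ranking rule: score each candidate covariate by the product of its associations with treatment and with a pretreatment proxy of the outcome, so the observed outcome is never consulted. The scoring rule and function below are an illustrative reading of the approach, not the paper's exact procedure.

```python
import math

def _corr(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def rank_confounders(covariates, treatment, proxy):
    """covariates: dict mapping name -> list of values; treatment and
    proxy are lists of the same length. Returns covariate names sorted
    by |corr(z, treatment)| * |corr(z, proxy)|, the approximate
    confounding signal, largest first. The outcome itself is unused,
    preserving the propensity score as a design tool."""
    score = {name: abs(_corr(z, treatment)) * abs(_corr(z, proxy))
             for name, z in covariates.items()}
    return sorted(score, key=score.get, reverse=True)
```

A covariate related to both treatment and the outcome proxy ranks high (a likely confounder worth including); one related to neither, or to treatment only, ranks low, flagging variables that add noise or risk bias amplification.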