ABSTRACT
Single case experimental designs are an important research design in behavioral and medical research. Although there are design standards prescribed by the What Works Clearinghouse for single case experimental designs, these standards do not include statistically derived power computations. Recently we derived the equations for computing power for (AB)k designs. However, these computations and the accompanying R code may not be accessible to the applied researchers who are most likely to want to compute power for their studies. We have therefore developed an (AB)k power calculator Shiny App (https://abkpowercalculator.shinyapps.io/ABkpowercalculator/) that researchers can use with no software training. The power computations assume that the researcher is interested in fitting multilevel models with autocorrelations or in conducting similar analyses. The purpose of this software contribution is to briefly explain how power is derived for balanced (AB)k designs and to elaborate on how to use the Shiny App. The app works well not only on computers but also on mobile phones, with no installation of R required. We believe this can be a valuable tool for practitioners and applied researchers who want to plan their single case studies with sufficient power to detect appropriate effect sizes.
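As a rough illustration of the kind of computation involved, here is a minimal sketch assuming independent phase means, a two-sided z-test, and a balanced design; it is not the app's actual algorithm, and the function name and parameterization below are our own assumptions.

import numpy as np
from scipy.stats import norm

def abk_power(delta, n_subjects, k, n_per_phase, phi, tau2=0.0,
              sigma2_w=1.0, alpha=0.05):
    """Approximate power for a balanced (AB)^k design.

    delta       : treatment effect (standardized if sigma2_w = 1)
    n_subjects  : number of cases
    k           : number of AB pairs (k = 2 gives ABAB)
    n_per_phase : time points per phase
    phi         : lag-1 autocorrelation of within-phase errors (AR(1))
    tau2        : between-subject variance of the treatment effect
    sigma2_w    : within-subject error variance
    """
    t = np.arange(1, n_per_phase)
    # Variance of the mean of n_per_phase AR(1) observations
    v_phase = (sigma2_w / n_per_phase) * (
        1 + 2 * np.sum((1 - t / n_per_phase) * phi ** t))
    # Per-subject B-minus-A contrast averages k phase pairs;
    # averaging over subjects adds the between-subject variance tau2
    var_delta = (tau2 + 2 * v_phase / k) / n_subjects
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(abs(delta) / np.sqrt(var_delta) - z_crit)

# Example: 3 subjects, ABAB (k = 2), 5 points per phase, phi = 0.2
print(round(abk_power(0.75, 3, 2, 5, 0.2, tau2=0.05), 3))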
Subject(s)
Mobile Applications , Research Design , Multilevel Analysis
ABSTRACT
Currently, the design standards for single-case experimental designs (SCEDs) are based on validity considerations as prescribed by the What Works Clearinghouse. However, there is also a need for design considerations, such as power, grounded in statistical analysis. We derive and compute power for (AB)k designs with multiple cases, which are common in SCEDs. Our computations show that effect size has the largest impact on power, followed by the number of subjects and then the number of phase reversals. An effect size of 0.75 or higher, at least one set of phase reversals (i.e., k > 1), and at least three subjects yielded high power. The latter two conditions agree with current standards that require either at least an ABAB design or a multiple baseline design with three subjects. An effect size of 0.75 or higher is not uncommon in SCEDs either. Autocorrelations, the number of time points per phase, and intraclass correlations had a smaller but non-negligible impact on power. In sum, the power analyses in the present study show that the conditions needed to meet power requirements are not unreasonable in SCEDs. The software code to compute power is available on GitHub for the use of the reader.
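To make this ordering concrete, the normal-approximation sketch coded above can be written out as follows (an approximation under simplifying assumptions, not the exact derivation in the paper):

1 - \beta \;\approx\; \Phi\!\left(\frac{\delta}{\sqrt{\mathrm{Var}(\hat{\delta})}} - z_{1-\alpha/2}\right),
\qquad
\mathrm{Var}(\hat{\delta}) = \frac{\tau^{2} + 2\,v(\varphi, n)/k}{N},
\qquad
v(\varphi, n) = \frac{\sigma^{2}}{n}\left[1 + 2\sum_{t=1}^{n-1}\Bigl(1 - \frac{t}{n}\Bigr)\varphi^{t}\right],

where \delta is the effect size, N the number of subjects, k the number of AB pairs, n the number of time points per phase, \varphi the autocorrelation, \tau^{2} the between-subject variance of the treatment effect, and \sigma^{2} the within-subject error variance. The effect size enters the power function directly, whereas N and k act only by shrinking the variance, which is consistent with the ordering of impacts reported above.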
Subject(s)
Research Design , Humans
ABSTRACT
Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question. The best reviews synthesize studies to draw broad theoretical conclusions about what a literature means, linking theory to evidence and evidence to theory. This guide describes how to plan, conduct, organize, and present a systematic review of quantitative (meta-analysis) or qualitative (narrative review, meta-synthesis) information. We outline core standards and principles and describe commonly encountered problems. Although this guide targets psychological scientists, its high level of abstraction makes it potentially relevant to any subject area or discipline. We argue that systematic reviews are a key methodology for clarifying whether and how research findings replicate and for explaining possible inconsistencies, and we call for researchers to conduct systematic reviews to help elucidate whether there is a replication crisis.
Subject(s)
Meta-Analysis as Topic , Systematic Reviews as Topic , Humans , Guidelines as Topic , Publication Bias , Review Literature as Topic
ABSTRACT
BACKGROUND: Concerns persist regarding the effect of current surgical resident duty-hour policies on patient outcomes, resident education, and resident well-being. METHODS: We conducted a national, cluster-randomized, pragmatic, noninferiority trial involving 117 general surgery residency programs in the United States (2014-2015 academic year). Programs were randomly assigned to current Accreditation Council for Graduate Medical Education (ACGME) duty-hour policies (standard-policy group) or more flexible policies that waived rules on maximum shift lengths and time off between shifts (flexible-policy group). Outcomes included the 30-day rate of postoperative death or serious complications (primary outcome), other postoperative complications, and resident perceptions and satisfaction regarding their well-being, education, and patient care. RESULTS: In an analysis of data from 138,691 patients, flexible, less-restrictive duty-hour policies were not associated with an increased rate of death or serious complications (9.1% in the flexible-policy group and 9.0% in the standard-policy group, P=0.92; unadjusted odds ratio for the flexible-policy group, 0.96; 92% confidence interval, 0.87 to 1.06; P=0.44; noninferiority criteria satisfied) or of any secondary postoperative outcomes studied. Among 4330 residents, those in programs assigned to flexible policies did not report significantly greater dissatisfaction with overall education quality (11.0% in the flexible-policy group and 10.7% in the standard-policy group, P=0.86) or well-being (14.9% and 12.0%, respectively; P=0.10). Residents under flexible policies were less likely than those under standard policies to perceive negative effects of duty-hour policies on multiple aspects of patient safety, continuity of care, professionalism, and resident education but were more likely to perceive negative effects on personal activities. There were no significant differences between study groups in resident-reported perception of the effect of fatigue on personal or patient safety. Residents in the flexible-policy group were less likely than those in the standard-policy group to report leaving during an operation (7.0% vs. 13.2%, P<0.001) or handing off active patient issues (32.0% vs. 46.3%, P<0.001). CONCLUSIONS: As compared with standard duty-hour policies, flexible, less-restrictive duty-hour policies for surgical residents were associated with noninferior patient outcomes and no significant difference in residents' satisfaction with overall well-being and education quality. (FIRST ClinicalTrials.gov number, NCT02050789.).
Subject(s)
General Surgery/education , Internship and Residency/organization & administration , Job Satisfaction , Postoperative Complications/epidemiology , Workload/standards , Accreditation , Continuity of Patient Care , Education, Medical, Graduate/standards , Fatigue , Hospital Administration , Humans , Patient Safety , Personnel Staffing and Scheduling , Postoperative Complications/mortality , Surgical Procedures, Operative/mortality , United States , Work Schedule Tolerance
ABSTRACT
Equation (26) is formatted incorrectly in the PDF version. It should appear as follows.
ABSTRACT
Some experimental designs involve clustering within only one treatment group. Such designs may involve group tutoring, therapy administered by multiple therapists, or interventions administered by clinics for the treatment group, whereas the control group receives no treatment. In such cases, the data analysis often proceeds as if there were no clustering within the treatment group. A consequence is that the actual significance level of the treatment effects is larger (i.e., actual p values are larger) than nominal. Additionally, when clustering in the treatment group is not taken into account, estimates of the effect sizes are inflated and their variances underestimated. These consequences of clustering can seriously compromise the interpretation of study results. This article shows how information on the intraclass correlation can be used to correct the biases in the effect sizes and their variances, and to adjust the significance test for the effects of clustering.
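To fix ideas, here is a minimal sketch of the design-effect logic for the two-group case with equal cluster sizes in the treatment arm. The article's estimators also correct the point estimate and treat small samples exactly, so the function, its large-sample variance approximation, and all names below are illustrative assumptions.

import math

def one_arm_cluster_adjust(d, n_t, n_c, m_clusters, icc):
    """Rough adjustment when only the treatment arm is clustered.

    d          : naive standardized mean difference
    n_t, n_c   : treatment and control sample sizes (individuals)
    m_clusters : number of clusters in the treatment arm
    icc        : intraclass correlation within treatment clusters
    """
    n_bar = n_t / m_clusters              # average cluster size
    deff = 1 + (n_bar - 1) * icc          # design effect for the treatment mean
    # Large-sample variance of d: naive vs. clustering-aware
    v_naive = 1 / n_t + 1 / n_c + d ** 2 / (2 * (n_t + n_c))
    v_adj = deff / n_t + 1 / n_c + d ** 2 / (2 * (n_t + n_c))
    # Deflation factor for a z/t statistic computed ignoring clustering
    stat_scale = math.sqrt((1 / n_t + 1 / n_c) / (deff / n_t + 1 / n_c))
    return v_naive, v_adj, stat_scale

# Example: d = 0.4, 60 treated in 6 clusters of 10, 60 controls, ICC = 0.10
print(one_arm_cluster_adjust(0.4, 60, 60, 6, 0.10))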
Subject(s)
Cluster Analysis , Data Interpretation, Statistical , Psychotherapy/statistics & numerical data , Treatment Outcome , Algorithms , Battered Women , Bias , Cognitive Behavioral Therapy , Female , Humans , Psychiatric Status Rating Scales , Research Design , Stress Disorders, Post-Traumatic/psychology , Stress Disorders, Post-Traumatic/therapy
ABSTRACT
We describe a standardised mean difference statistic (d) for single-case designs that is equivalent to the usual d in between-groups experiments. We show how it can be used to summarise treatment effects over cases within a study, to conduct power analyses when planning new studies and grant proposals, and to meta-analyse effects across studies of the same question. We discuss the limitations of this d-statistic and possible remedies for them. Even so, this d-statistic is better founded statistically than other effect size measures for single-case designs, and unlike many general linear model approaches, such as multilevel modelling or generalised additive models, it produces a standardised effect size that can be integrated over studies with different outcome measures. SPSS macros for both effect size computation and power analysis are available.
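A loose illustration of the central idea, standardizing by a denominator that includes between-case as well as within-case variation so the result lands on the scale of a between-groups d, is sketched below. It omits the small-sample and autocorrelation corrections of the published estimator, and every name in it is hypothetical.

import numpy as np

def crude_bc_smd(cases):
    """Very rough between-case standardized mean difference.

    cases : list of (A, B) pairs of 1-D numpy arrays
            (baseline and treatment phases for each case).
    """
    a_means = np.array([a.mean() for a, _ in cases])
    effects = np.array([b.mean() - a.mean() for a, b in cases])
    # Within-case variance, pooled over both phases after centering each phase
    within = np.mean([np.var(np.r_[a - a.mean(), b - b.mean()], ddof=2)
                      for a, b in cases])
    # Between-case variance, crudely estimated from baseline levels
    between = a_means.var(ddof=1)
    # Denominator mimics the total SD of a between-groups design
    return effects.mean() / np.sqrt(between + within)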
Subject(s)
Research Design/statistics & numerical data , Humans , Meta-Analysis as Topic
ABSTRACT
Conventional random-effects models in meta-analysis rely on large-sample approximations rather than exact small-sample results. Although random-effects methods produce efficient estimates, and confidence intervals for the summary effect attain correct coverage, when the number of studies is sufficiently large, we demonstrate that conventional methods yield confidence intervals that are not wide enough when the number of studies is small; the severity of the undercoverage depends on the configuration of sample sizes across studies, the degree of true heterogeneity, and the number of studies. We introduce two alternative variance estimators with better small-sample properties, investigate degrees-of-freedom adjustments for computing confidence intervals, and study their effectiveness via simulation.
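One widely used adjustment in this spirit pairs a weighted residual variance estimator with a t critical value on k - 1 degrees of freedom (the Hartung-Knapp approach). The sketch below contrasts it with the conventional Wald interval; it is not necessarily one of the specific estimators studied in the article.

import numpy as np
from scipy.stats import norm, t

def dl_tau2(y, v):
    """DerSimonian-Laird estimate of the between-study variance."""
    w = 1 / v
    mu = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu) ** 2)
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - (len(y) - 1)) / denom)

def re_meta_ci(y, v, alpha=0.05, hartung_knapp=True):
    """Random-effects summary effect with conventional or adjusted CI."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    k = len(y)
    w = 1 / (v + dl_tau2(y, v))
    mu = np.sum(w * y) / np.sum(w)
    if hartung_knapp:
        # Weighted residual variance with a t critical value on k - 1 df
        se = np.sqrt(np.sum(w * (y - mu) ** 2) / ((k - 1) * np.sum(w)))
        crit = t.ppf(1 - alpha / 2, df=k - 1)
    else:
        # Conventional Wald interval, typically too narrow for small k
        se = np.sqrt(1 / np.sum(w))
        crit = norm.ppf(1 - alpha / 2)
    return mu, mu - crit * se, mu + crit * se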
Subject(s)
Models, Statistical , Computer Simulation , Sample Size
ABSTRACT
Researchers in the single-case design tradition have debated the size and importance of the observed autocorrelations in those designs. All past estimates of the autocorrelation in that literature have taken the observed autocorrelation estimates as the data for the debate. However, estimates of the autocorrelation are subject to great sampling error when the design has a small number of time points, as is typical in single-case designs. Thus, a given observed autocorrelation may greatly over- or underestimate the corresponding population parameter. This article presents Bayesian estimates of the autocorrelation that greatly reduce the role of sampling error compared with past estimators. Simpler empirical Bayes estimates are presented first, to illustrate the fundamental notions of autocorrelation sampling error and shrinkage; fully Bayesian estimates follow, and the difference between the two is explained. Scripts to do the analyses are available as supplemental materials. The analyses are illustrated using two examples from the single-case design literature. Bayesian estimation warrants wider use, not only in debates about the size of autocorrelations, but also in statistical methods that require an independent estimate of the autocorrelation to analyze the data.
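The empirical Bayes logic can be sketched in a few lines: each observed autocorrelation is pulled toward a precision-weighted mean, with more shrinkage for shorter series. This toy version rests on the crude approximation Var(r) ≈ 1/n and a method-of-moments heterogeneity estimate, both of which the fully Bayesian analysis in the article replaces with posterior computation.

import numpy as np

def eb_shrink_autocorr(r, n):
    """Empirical Bayes shrinkage of lag-1 autocorrelation estimates.

    r : observed autocorrelations, one per series
    n : corresponding series lengths
    """
    r, n = np.asarray(r, float), np.asarray(n, float)
    v = 1 / n                              # rough sampling variance of each r
    w = 1 / v
    r_bar = np.sum(w * r) / np.sum(w)      # precision-weighted grand mean
    tau2 = max(0.0, np.var(r, ddof=1) - v.mean())  # moment estimate of true spread
    b = v / (v + tau2)                     # shrinkage weight toward r_bar
    return b * r_bar + (1 - b) * r

# Short series with extreme r values are shrunk hardest
print(eb_shrink_autocorr([0.6, -0.2, 0.1], [8, 10, 12]).round(2))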
Subject(s)
Bayes Theorem , Models, Statistical , Data Interpretation, Statistical , Humans , Regression Analysis , Research Design , Sample Size , Selection Bias
ABSTRACT
It is common practice in both randomized and quasi-experiments to adjust for baseline characteristics when estimating the average effect of an intervention. The inclusion of a pre-test, for example, can reduce both the standard error of this estimate and-in non-randomized designs-its bias. At the same time, it is also standard to report the effect of an intervention in standardized effect size units, thereby making it comparable to other interventions and studies. Curiously, the estimation of this effect size, including covariate adjustment, has received little attention. In this article, we provide a framework for defining effect sizes in designs with a pre-test (e.g., difference-in-differences and analysis of covariance) and propose estimators of those effect sizes. The estimators and approximations to their sampling distributions are evaluated using a simulation study and then demonstrated using an example from published data.
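A minimal sketch of one such estimator, the ANCOVA-adjusted mean difference standardized by the unadjusted pooled outcome SD so that it stays comparable to an ordinary d, is given below. The variance approximations and small-sample corrections developed in the article are omitted, and the names are illustrative.

import numpy as np
import statsmodels.api as sm

def ancova_d(y, treat, pretest):
    """Covariate-adjusted standardized mean difference (sketch).

    y       : outcome
    treat   : 0/1 treatment indicator
    pretest : baseline covariate
    """
    y, treat, pretest = map(np.asarray, (y, treat, pretest))
    X = sm.add_constant(np.column_stack([treat, pretest]))
    b_treat = sm.OLS(y, X).fit().params[1]     # adjusted mean difference
    n1, n0 = int((treat == 1).sum()), int((treat == 0).sum())
    # Pool the *unadjusted* outcome SD so the metric matches a simple d
    s_pooled = np.sqrt(((n1 - 1) * y[treat == 1].var(ddof=1) +
                        (n0 - 1) * y[treat == 0].var(ddof=1)) / (n1 + n0 - 2))
    return b_treat / s_pooled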
Subject(s)
Computer Simulation , Statistics as Topic , Research Design
ABSTRACT
Descriptive analyses of socially important or theoretically interesting phenomena and trends are a vital component of research in the behavioral, social, economic, and health sciences. Such analyses yield reliable results when using representative individual participant data (IPD) from studies with complex survey designs, including educational large-scale assessments (ELSAs) or social, health, and economic survey and panel studies. The meta-analytic integration of these results offers unique and novel research opportunities to provide strong empirical evidence of the consistency and generalizability of important phenomena and trends. Using ELSAs as an example, this tutorial offers methodological guidance on how to use the two-stage approach to IPD meta-analysis to account for the statistical challenges of complex survey designs (e.g., sampling weights, clustered and missing IPD), first, to conduct descriptive analyses (Stage 1), and second, to integrate results with three-level meta-analytic and meta-regression models to take into account dependencies among effect sizes (Stage 2). The two-stage approach is illustrated with IPD on reading achievement from the Programme for International Student Assessment (PISA). We demonstrate how to analyze and integrate standardized mean differences (e.g., gender differences), correlations (e.g., with students' socioeconomic status [SES]), and interactions between individual characteristics at the participant level (e.g., the interaction between gender and SES) across several PISA cycles. All the datafiles and R scripts we used are available online. Because complex social, health, or economic survey and panel studies share many methodological features with ELSAs, the guidance offered in this tutorial is also helpful for synthesizing research evidence from these studies.
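A bare-bones sketch of Stage 1 for a single dataset, a survey-weighted standardized mean difference, appears below; it deliberately ignores the clustering corrections, replicate weights, and plausible-value logic that an actual PISA analysis requires. Stage 2 would then pool one such estimate per cycle or study with a (three-level) meta-analytic model.

import numpy as np

def weighted_smd(y, g, w):
    """Stage 1 sketch: survey-weighted SMD for one dataset.

    y : outcomes, g : 0/1 group indicator, w : sampling weights
    """
    y, g, w = (np.asarray(x, float) for x in (y, g, w))
    wmean = lambda x, wx: np.sum(wx * x) / np.sum(wx)
    wvar = lambda x, wx: wmean((x - wmean(x, wx)) ** 2, wx)
    y1, w1, y0, w0 = y[g == 1], w[g == 1], y[g == 0], w[g == 0]
    s = np.sqrt((wvar(y1, w1) + wvar(y0, w0)) / 2)   # pooled weighted SD
    return (wmean(y1, w1) - wmean(y0, w0)) / s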
Subject(s)
Students , Humans , Surveys and Questionnaires
ABSTRACT
A great deal of educational and social data arises from cluster sampling designs in which the clusters are schools, classrooms, or communities. A mistake sometimes encountered in the analysis of such data is to ignore the effect of clustering and analyse the data as if they were based on a simple random sample. This typically overstates the precision of results and leads to too-liberal conclusions about the statistical significance of mean differences. This paper gives simple corrections to the test statistics that would be computed in an analysis of variance if clustering were (incorrectly) ignored. The corrections are multiplicative factors depending on the total sample size, the cluster size, and the intraclass correlation structure. For example, the corrected F statistic has Fisher's F distribution with reduced degrees of freedom. The corrected statistic reduces to the F statistic computed by ignoring clustering when the intraclass correlations are zero, reduces to the F statistic computed using cluster means when the intraclass correlations are unity, and is in between otherwise. A similar adjustment to the usual statistic for testing a linear contrast among group means is described.
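For the two-group special case, the correction can be written compactly: the naive t statistic is multiplied by a factor c and referred to a t distribution with adjusted degrees of freedom h. The sketch below follows a standard published form of this two-group correction; the article treats general ANOVA F tests and contrasts, so take this as an illustration only.

import numpy as np
from scipy.stats import t as t_dist

def cluster_corrected_t(t_naive, N, n, icc):
    """Correct a two-group t statistic computed ignoring clustering.

    t_naive : t statistic from the (incorrect) simple-random-sample analysis
    N       : total sample size, n : common cluster size
    icc     : intraclass correlation
    """
    num = (N - 2) - 2 * (n - 1) * icc
    c = np.sqrt(num / ((N - 2) * (1 + (n - 1) * icc)))    # multiplicative factor
    h = num ** 2 / ((N - 2) * (1 - icc) ** 2
                    + n * (N - 2 * n) * icc ** 2
                    + 2 * (N - 2 * n) * icc * (1 - icc))  # adjusted df
    t_adj = c * t_naive
    return t_adj, h, 2 * t_dist.sf(abs(t_adj), df=h)

# Sanity check: with icc = 0 the statistic and df are unchanged (df = N - 2)
print(cluster_corrected_t(2.5, 100, 10, 0.0))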
Subject(s)
Analysis of Variance , Cluster Analysis , Data Collection/statistics & numerical data , Humans , Models, Statistical , Psychology, Educational/statistics & numerical data , Psychology, Social/statistics & numerical data , Randomized Controlled Trials as Topic/statistics & numerical data , Sample Size , Sampling Studies , Statistics as Topic
ABSTRACT
The present longitudinal study examines the role of caregiver speech in language development, especially syntactic development, using 47 parent-child pairs from diverse SES backgrounds observed from 14 to 46 months. We assess the diversity (variety) of words and syntactic structures produced by caregivers and children. We use lagged correlations to examine language growth and its relation to caregiver speech. Results show substantial individual differences among children and indicate that the diversity of earlier caregiver speech significantly predicts corresponding diversity in later child speech. For vocabulary, earlier child speech also predicts later caregiver speech, suggesting mutual influence. However, for syntax, earlier child speech does not significantly predict later caregiver speech, suggesting a causal flow from caregiver to child. Finally, demographic factors, notably SES, are related to language growth, and their effects are at least partially mediated by differences in caregiver speech, showing the pervasive influence of caregiver speech on language growth.
Subject(s)
Caregivers , Language Development , Speech , Child, Preschool , Critical Period, Psychological , Female , Humans , Infant , Longitudinal Studies , Male , Vocabulary
ABSTRACT
In this study, we reanalyze recent empirical research on replication from a meta-analytic perspective. We argue that there are different ways to define "replication failure," and that analyses can focus on exploring variation among replication studies or assess whether their results contradict the findings of the original study. We apply this framework to a set of psychological findings that have been replicated and assess the sensitivity of these analyses. We find that tests for replication that involve only a single replication study are almost always severely underpowered. Among the 40 findings for which ensembles of multisite direct replications were conducted, we find that between 11 and 17 (28% to 43%) ensembles produced heterogeneous effects, depending on how replication is defined. This heterogeneity could not be completely explained by moderators documented by replication research programs. We also find that these ensembles were not always well-powered to detect potentially meaningful values of heterogeneity. Finally, we identify several discrepancies between the results of original studies and the distribution of effects found by multisite replications but note that these analyses also have low power. We conclude by arguing that efforts to assess replication would benefit from further methodological work on designing replication studies to ensure analyses are sufficiently sensitive.
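The workhorse for heterogeneity analyses of this kind is Cochran's Q; a minimal sketch of the test applied to one ensemble of replication estimates follows (the paper's analyses add power calculations and several definitions of replication on top of this).

import numpy as np
from scipy.stats import chi2

def q_test(y, v):
    """Cochran's Q test for heterogeneity among effect estimates.

    y : effect estimates from an ensemble of replications
    v : their sampling variances
    """
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1 / v
    mu = np.sum(w * y) / np.sum(w)          # fixed-effect pooled estimate
    q = np.sum(w * (y - mu) ** 2)
    return q, chi2.sf(q, df=len(y) - 1)     # p-value under homogeneity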
Subject(s)
Psychology/methods , Research Design , Humans , Meta-Analysis as Topic , Reproducibility of Results
ABSTRACT
Introduction: There is a great need for analytic techniques that allow for the synthesis of learning across seemingly idiosyncratic interventions. Objectives: The primary objective of this paper is to introduce taxonomic meta-analysis and explain how it differs from conventional meta-analysis. Results: Conventional meta-analysis has previously been used to examine the effectiveness of childhood obesity prevention interventions. However, such analyses tend to examine narrowly defined segments of obesity prevention initiatives and therefore do not allow the field to draw conclusions across settings, participants, or subjects. Compared with conventional meta-analysis, taxonomic meta-analysis widens the aperture of what can be examined, synthesizing evidence across interventions with diverse topics, goals, research designs, and settings. A component approach examines interventions at the level of their essential features or activities, identifying the concrete aspects of interventions that are used (intervention components), the characteristics of the intended populations (target population or intended recipient characteristics), and the facets of the environments in which they operate (contextual elements), as well as the relationship of these components to effect size. In addition, unlike conventional meta-analysis, taxonomic meta-analysis can include the results of natural experiments, policy initiatives, program implementation efforts, and highly controlled experiments, regardless of the design of the report being analyzed, as long as the intended outcome is the same. It also characterizes the domain of interventions that have been studied. Conclusion: Taxonomic meta-analysis can be a powerful tool for summarizing the evidence that exists and for generating hypotheses that are worthy of more rigorous testing.
Subject(s)
Pediatric Obesity , Child , Humans , Pediatric Obesity/epidemiology , Pediatric Obesity/prevention & control
ABSTRACT
Meta-analysis has been used to examine the effectiveness of childhood obesity prevention efforts, yet conventional meta-analytic methods restrict the kinds of studies included and either narrowly define mechanisms and agents of change or examine the effectiveness of whole interventions as opposed to the specific actions that comprise them. Taxonomic meta-analytic methods widen the aperture of what can be included in a meta-analytic data set, allowing for the inclusion of many types of interventions and study designs. The National Collaborative on Childhood Obesity Research Childhood Obesity Evidence Base (COEB) project focuses on interventions intended to prevent obesity in children 2-5 years old that include BMI as an outcome measure. The COEB created taxonomies, anchored in the Social Ecological Model, that catalog specific outcomes, intervention components, intended recipients, and contexts of policies, initiatives, and interventions conducted at the individual, interpersonal, organizational, community, and societal levels. The taxonomies were created by discovery from the literature itself, using grounded theory. This article describes the process used for a novel taxonomic meta-analysis of childhood obesity prevention studies published between 2010 and 2019. The method can be applied to other areas of research, including obesity prevention in additional populations.
Subject(s)
Pediatric Obesity , Child , Child, Preschool , Humans , Pediatric Obesity/epidemiology , Pediatric Obesity/prevention & control
ABSTRACT
Objective: To evaluate the efficacy of childhood obesity interventions and conduct a taxonomy of intervention components that are most effective in changing obesity-related health outcomes in children 2-5 years of age. Methods: Comprehensive searches located 51 studies from 18,335 unique records. Eligible studies (1) assessed children aged 2-5 living in the United States; (2) evaluated an intervention to improve weight status; (3) identified a same-aged comparison group; (4) measured BMI; and (5) were available between January 2005 and August 2019. Coders extracted study, sample, and intervention characteristics. Effect sizes (ESs) and 95% confidence intervals (CIs) were calculated using random-effects models. Meta-regression was used to determine which intervention components explain variability in ESs. Results: Included were 51 studies evaluating 58 interventions (N = 29,085; mean age = 4 years; 50% girls). Relative to controls, children receiving an intervention had a lower BMI at the end of the intervention (g = 0.10, 95% CI = 0.02-0.18; k = 55) and at the last follow-up (g = 0.17, 95% CI = 0.04-0.30; k = 14; range = 18-143 weeks). Three intervention components moderated efficacy: engaging caregivers in praise/encouragement for positive health-related behavior; educating caregivers about the importance of reducing screen time; and engaging pediatricians/health care providers. Conclusions: Early childhood obesity interventions are effective in reducing BMI in preschool children. Our findings suggest that facilitating caregiver education about the importance of screen time reduction may be an important strategy in reducing early childhood obesity.
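The moderator step can be sketched as a weighted least-squares meta-regression with fixed meta-analytic weights. The illustration below omits the between-study variance estimation and the handling of dependent effect sizes (multiple interventions per study) that the actual analysis would require.

import numpy as np

def meta_regression(y, v, X, tau2=0.0):
    """Fixed-weights meta-regression (sketch).

    y    : effect sizes (k,)
    v    : sampling variances (k,)
    X    : moderator matrix (k, p) including an intercept column
    tau2 : assumed between-study variance
    """
    y, v, X = np.asarray(y, float), np.asarray(v, float), np.asarray(X, float)
    W = np.diag(1 / (v + tau2))
    cov = np.linalg.inv(X.T @ W @ X)        # meta-analytic covariance of beta
    beta = cov @ X.T @ W @ y
    return beta, np.sqrt(np.diag(cov))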
Subject(s)
Pediatric Obesity , Caregivers , Child , Child, Preschool , Educational Status , Female , Health Behavior , Health Education , Humans , Male , Pediatric Obesity/epidemiology , Pediatric Obesity/prevention & control
ABSTRACT
Formal empirical assessments of replication have recently become more prominent in several areas of science, including psychology. These assessments have used different statistical approaches to determine if a finding has been replicated. The purpose of this article is to provide several alternative conceptual frameworks that lead to different statistical analyses to test hypotheses about replication. All of these analyses are based on statistical methods used in meta-analysis. The differences among the methods described involve whether the burden of proof is placed on replication or nonreplication, whether replication is exact or allows for a small amount of "negligible heterogeneity," and whether the studies observed are assumed to be fixed (constituting the entire body of relevant evidence) or are a sample from a universe of possibly relevant studies. The statistical power of each of these tests is computed and shown to be low in many cases, raising issues of the interpretability of tests for replication.
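For the simplest case, testing exact replication with Cochran's Q, the power computation can be sketched with the standard noncentral chi-square approximation (the article's exact computations and burden-of-proof variants differ in detail):

import numpy as np
from scipy.stats import chi2, ncx2

def q_test_power(v, tau2, alpha=0.05):
    """Approximate power of the Q test against heterogeneity tau2.

    v    : sampling variances of the k studies
    tau2 : true between-study variance under the alternative
    """
    w = 1 / np.asarray(v, float)
    k = len(w)
    # Approximate noncentrality implied by tau2
    ncp = tau2 * (np.sum(w) - np.sum(w ** 2) / np.sum(w))
    crit = chi2.ppf(1 - alpha, df=k - 1)
    return ncx2.sf(crit, df=k - 1, nc=ncp)

# Five studies with sampling variance 0.04 each: limited power against tau2 = 0.02
print(round(q_test_power([0.04] * 5, 0.02), 2))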
Subject(s)
Data Interpretation, Statistical , Meta-Analysis as Topic , Models, Statistical , Psychology/methods , Reproducibility of Results , Research Design , Humans
ABSTRACT
In this rejoinder, we discuss Mathur and VanderWeele's response to our article, "Statistical Analyses for Studying Replication: Meta-Analytic Perspectives," which appears in this current issue. We attempt to clarify a point of confusion regarding the inclusion of an original study in an analysis of replication, and the potential impact of publication bias. We then discuss the methods used by Mathur and VanderWeele to conduct an alternative analysis of the Gambler's Fallacy example from our article. We highlight some potential statistical and conceptual differences between their approach and the one we propose in our article.