ABSTRACT
Partialing is a statistical approach researchers use with the goal of removing extraneous variance from a variable before examining its association with other variables. Controlling for confounds through analysis of covariance or multiple regression analysis and residualizing variables for use in subsequent analyses are common approaches to partialing in clinical research. Despite its intuitive appeal, partialing is fraught with undesirable consequences when predictors are correlated. After describing effects of partialing on variables, we review analytic approaches commonly used in clinical research to make inferences about the nature and effects of partialed variables. We then use two simulations to show how partialing can distort variables and their relations with other variables. Having concluded that, with rare exception, partialing is ill-advised, we offer recommendations for reducing or eliminating problematic uses of partialing. We conclude that the best alternative to partialing is to define and measure constructs so that it is not needed.
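A minimal base-R sketch of the residualizing step and one of its consequences (simulated data and coefficients of our choosing, not taken from the article):

# Simulate two correlated predictors and an outcome influenced by both
set.seed(1)
n  <- 10000
x2 <- rnorm(n)
x1 <- 0.7 * x2 + sqrt(1 - 0.7^2) * rnorm(n)      # cor(x1, x2) is about .70
y  <- 0.5 * x1 + 0.3 * x2 + rnorm(n)

# "Partial" x2 out of x1 by residualizing
x1_partialed <- resid(lm(x1 ~ x2))

cor(x1, x1_partialed)               # well below 1: the partialed variable is a different variable
cor(y, x1); cor(y, x1_partialed)    # its relation with y also changes after partialing

The stronger the correlation between the predictors, the more the residualized variable departs from the variable it is meant to represent.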
ABSTRACT
There are several approaches to incorporating uncertainty in power analysis. We review these approaches and highlight the Bayesian-classical hybrid approach that has been implemented in the R package hybridpower. Calculating Bayesian-classical hybrid power circumvents the problem of local optimality, in which calculated power is valid if and only if the specified inputs are perfectly correct. hybridpower can compute classical and Bayesian-classical hybrid power for popular testing procedures including the t-test, correlation, simple linear regression, one-way ANOVA (with equal or unequal variances), and the sign test. Using several examples, we demonstrate features of hybridpower and illustrate how to elicit subjective priors, how to determine sample size from the Bayesian-classical approach, and how this approach is distinct from related methods. hybridpower can conduct power analysis under the classical approach and, more importantly, under the novel Bayesian-classical hybrid approach, which returns more realistic calculations by accounting for the local optimality that the classical approach ignores. For users unfamiliar with R, we provide a limited number of RShiny applications based on hybridpower to promote the accessibility of this novel approach to power analysis. We end with a discussion of future developments in hybridpower.
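hybridpower's own functions are not reproduced here; as a conceptual sketch only, the Bayesian-classical hybrid idea for a two-sample t-test can be expressed in base R with a prior on Cohen's d that we assume for illustration:

# Propagate prior uncertainty about the effect size into the power calculation
set.seed(1)
prior_d <- rnorm(5000, mean = 0.40, sd = 0.10)     # assumed prior for Cohen's d
pow <- sapply(pmax(abs(prior_d), 1e-6), function(d)
  power.t.test(n = 64, delta = d, sd = 1, sig.level = .05)$power)

mean(pow)                      # assurance: mean power over the prior
quantile(pow, c(.20, .80))     # spread of plausible power values at n = 64 per group

A single classical calculation at d = 0.40 returns one number; the hybrid calculation returns a distribution of power values reflecting uncertainty about d.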
Subjects
Research Design, Bayes Theorem, Sample Size, Linear Models, Uncertainty
ABSTRACT
Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
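The distinction carries over to models far simpler than IRT; a base-R sketch for a binomial proportion (our example, not from the article) shows how a profile-likelihood CI is obtained by inverting the likelihood-ratio test rather than relying on a standard error:

# Wald vs. profile-likelihood 95% CI for a binomial proportion
x <- 7; n <- 20                               # 7 successes in 20 trials
phat   <- x / n
loglik <- function(p) dbinom(x, n, p, log = TRUE)

wald <- phat + c(-1, 1) * qnorm(.975) * sqrt(phat * (1 - phat) / n)

# Profile-likelihood limits: where the log-likelihood drops by qchisq(.95, 1) / 2
cutoff <- loglik(phat) - qchisq(.95, df = 1) / 2
pl <- c(uniroot(function(p) loglik(p) - cutoff, c(1e-6, phat))$root,
        uniroot(function(p) loglik(p) - cutoff, c(phat, 1 - 1e-6))$root)

wald; pl                                      # the PL interval is asymmetric around phat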
Subjects
Confidence Intervals, Psychological Models, Statistical Data Interpretation, Likelihood Functions, Monte Carlo Method
ABSTRACT
When statistical models are employed to provide a parsimonious description of empirical relationships, the extent to which strong conclusions can be drawn rests on quantifying the uncertainty in parameter estimates. In multiple linear regression (MLR), regression weights carry two kinds of uncertainty represented by confidence sets (CSs) and exchangeable weights (EWs). Confidence sets quantify uncertainty in estimation whereas the set of EWs quantify uncertainty in the substantive interpretation of regression weights. As CSs and EWs share certain commonalities, we clarify the relationship between these two kinds of uncertainty about regression weights. We introduce a general framework describing how CSs and the set of EWs for regression weights are estimated from the likelihood-based and Wald-type approach, and establish the analytical relationship between CSs and sets of EWs. With empirical examples on posttraumatic growth of caregivers (Cadell et al., 2014; Schneider, Steele, Cadell & Hemsworth, 2011) and on graduate grade point average (Kuncel, Hezlett & Ones, 2001), we illustrate the usefulness of CSs and EWs for drawing strong scientific conclusions. We discuss the importance of considering both CSs and EWs as part of the scientific process, and provide an Online Appendix with R code for estimating Wald-type CSs and EWs for k regression weights.
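The exchangeable-weights idea can be previewed with a brute-force search (a sketch with simulated data; it does not implement the likelihood-based or Wald-type machinery described in the article):

# Alternative regression weights whose composite predicts nearly as well as OLS
set.seed(1)
n <- 500
X <- matrix(rnorm(n * 3), n, 3)
y <- drop(X %*% c(0.4, 0.3, 0.2)) + rnorm(n)

b_ols  <- coef(lm(y ~ X))[-1]                 # optimal (OLS) weights
r2     <- function(w) cor(y, drop(X %*% w))^2 # R-squared of a fixed composite
r2_opt <- r2(b_ols)

cand <- t(replicate(20000, b_ols + rnorm(3, sd = 0.3)))
keep <- cand[apply(cand, 1, r2) >= r2_opt - 0.01, , drop = FALSE]
apply(keep, 2, range)                         # weights giving up at most .01 in R-squared

The ranges show how different the weights can be while remaining practically indistinguishable in predictive terms, which is what makes substantive interpretation of individual weights uncertain.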
Subjects
Linear Models, Multivariate Analysis, Algorithms, Caregivers/psychology, Statistical Data Interpretation, Graduate Education, Educational Status, Humans, Likelihood Functions, Software, Psychological Stress, Uncertainty
ABSTRACT
PURPOSE: Knowing how to improve the dying experience for patients with end-stage cancer is essential for cancer professionals. However, there is little evidence on the relationship between clinically relevant factors and quality of death. Also, while hospice has been linked with improved outcomes, our understanding of factors that contribute to a "good death" when hospice is involved remains limited. This study (1) identified correlates of a good death and (2) provided evidence on the impact of hospice on quality of death. METHODS: Using data from a survey of US households affected by cancer (N = 930, response rate 51 %), we fit regression models with a subsample of 158 respondents who had experienced the death of a family member with cancer. Measures included quality of death (good/bad) and clinically relevant factors including: hospice involvement, symptoms during treatment, whether wishes were followed, provider knowledge/expertise, and compassion. RESULTS: Respondents were 60 % female, 89 % White, and averaged 57 years old. Decedents were most often a respondent's spouse (46 %). While 73 % of respondents reported a good death, Hispanic respondents were less likely to experience a good death (p = 0.007). Clinically relevant factors, including hospice, were associated with a good death (p < 0.05), with the exception of whether the physician said the cancer was curable/fatal. With adjustments, perceptions of provider knowledge/expertise were the only clinical factor that remained associated with a good death. CONCLUSIONS: Enhanced provider training/communication, referrals to hospice, and greater attention to symptom management may facilitate improved quality of dying. Additionally, the cultural relevance of the concept of a "good death" warrants further research.
Subjects
Death, Family, Hospice Care/standards, Neoplasms/psychology, Neoplasms/therapy, Adult, Aged, Aged 80 and over, Attitude to Death, Communication, Culture, Data Collection, Family/psychology, Female, Hospices/standards, Humans, Male, Middle Aged, Neoplasms/mortality, Palliative Care/standards, Professional-Family Relations
Subjects
Bayes Theorem, Statistical Data Interpretation, Software, Humans, Reaction Time, Stroop Test, Uncertainty
ABSTRACT
Observed scores (e.g., summed scores and estimated factor scores) are assumed to reflect underlying constructs and have many uses in psychological science. Constructs are often operationalized as latent variables (LVs), which are mathematically defined by their relations with manifest variables in an LV measurement model (e.g., common factor model). We examine the performance of several types of observed scores for the purposes of (a) estimating latent scores and classifying people and (b) recovering structural relations among LVs. To better reflect practice, our evaluation takes into account different sources of uncertainty (i.e., sampling error and model error). We review psychometric properties of observed scores based on the classical test theory applied to common factor models, report on a simulation study examining their performance, and provide two empirical examples to illustrate how different scores perform under different conditions of reliability, sample size, and model error. We conclude with general recommendations for using observed scores and discuss future research directions. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
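A small simulation in the spirit of the article (loadings, sample size, and scoring rules chosen by us) comparing how well summed scores and regression-method factor scores recover the latent scores under a one-factor model:

# One-factor model: 6 items with known loadings and unique variances
set.seed(1)
n      <- 1000
lambda <- c(.8, .7, .6, .5, .4, .3)
theta  <- 1 - lambda^2
eta    <- rnorm(n)                                         # latent scores
items  <- eta %o% lambda + sapply(theta, function(t) rnorm(n, sd = sqrt(t)))

sum_score <- rowSums(items)
Sigma     <- lambda %o% lambda + diag(theta)               # model-implied covariance
fscore    <- drop(items %*% solve(Sigma, lambda))          # regression factor scores

cor(eta, sum_score); cor(eta, fscore)   # both track eta; factor scores do slightly better here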
ABSTRACT
Reliability is an essential measure of how closely observed scores represent latent scores (reflecting constructs), assuming some latent variable measurement model. We present a general theoretical framework of reliability, placing emphasis on measuring the association between latent and observed scores. This framework was inspired by McDonald's (Psychometrika, 76, 511) regression framework, which highlighted the coefficient of determination as a measure of reliability. We extend McDonald's (Psychometrika, 76, 511) framework beyond coefficients of determination and introduce four desiderata for reliability measures (estimability, normalization, symmetry, and invariance). We also present theoretical examples to illustrate distinct measures of reliability and report on a numerical study that demonstrates the behaviour of different reliability measures. We conclude with a discussion on the use of reliability coefficients and outline future avenues of research.
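Under a congeneric one-factor model, the coefficient-of-determination view equates the reliability of a summed score with the squared correlation between latent and observed scores, which matches coefficient omega; a quick check by simulation (values are ours):

# Reliability of a summed score as a squared latent-observed correlation
set.seed(2)
n      <- 100000
lambda <- c(.7, .6, .5, .4)
theta  <- 1 - lambda^2
eta    <- rnorm(n)
items  <- eta %o% lambda + sapply(theta, function(t) rnorm(n, sd = sqrt(t)))
obs    <- rowSums(items)

cor(eta, obs)^2                                   # empirical squared correlation
sum(lambda)^2 / (sum(lambda)^2 + sum(theta))      # coefficient omega (same quantity)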
ABSTRACT
Wu and Browne (Psychometrika 80(3):571-600, 2015. https://doi.org/10.1007/s11336-015-9451-3; henceforth W&B) introduced the notion of adventitious error to explicitly take into account approximate goodness of fit of covariance structure models (CSMs). Adventitious error supposes that observed covariance matrices are not directly sampled from a theoretical population covariance matrix but from an operational population covariance matrix. This operational matrix is randomly distorted from the theoretical matrix due to differences in study implementations. W&B showed how adventitious error is linked to the root mean square error of approximation (RMSEA) and how the standard errors (SEs) of parameter estimates are augmented. Our contribution is to consider adventitious error as a general phenomenon and to illustrate its consequences. Using simulations, we illustrate that its impact on SEs can be generalized to pairwise relations between variables beyond the CSM context. Using derivations, we conjecture that heterogeneity of effect sizes across studies and overestimation of statistical power can both be interpreted as stemming from adventitious error. We also show that adventitious error, if it occurs, has an impact on the uncertainty of composite measurement outcomes such as factor scores and summed scores. The results of a simulation study show that the impact on measurement uncertainty is rather small, although larger for factor scores than for summed scores. Adventitious error is an assumption about the data generating mechanism; the notion offers a statistical framework for understanding a broad range of phenomena, including approximate fit, varying research findings, heterogeneity of effects, and overestimates of power.
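A simplified sketch of the mechanism for a single correlation (our own operationalization of adventitious error via a Wishart distortion; W&B's formal development is more general): each simulated study samples data from a randomly distorted covariance matrix, and the between-study variability of the sample correlation exceeds what ordinary sampling error alone would produce.

# Adventitious error inflates the variability of sample correlations
set.seed(1)
library(MASS)                              # for mvrnorm()
Sigma0 <- matrix(c(1, .3, .3, 1), 2)       # theoretical covariance matrix
N  <- 200                                  # per-study sample size
nu <- 100                                  # distortion df (smaller = more adventitious error)

one_study <- function(distort) {
  Sigma <- if (distort) rWishart(1, nu, Sigma0 / nu)[, , 1] else Sigma0
  cor(mvrnorm(N, mu = c(0, 0), Sigma = Sigma))[1, 2]
}

sd(replicate(2000, one_study(FALSE)))      # sampling error only
sd(replicate(2000, one_study(TRUE)))       # sampling error plus adventitious error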
Subjects
Statistical Models, Psychometrics, Humans, Psychometrics/methods, Computer Simulation, Statistical Data Interpretation
ABSTRACT
Statistical power is a topic of intense interest as part of proposed methodological reforms to improve the defensibility of psychological findings. Power has been used in disparate ways, some that follow and some that do not follow from the definitional features of statistical power. We introduce a taxonomy of three uses of power (comparing the performance of different procedures, designing or planning studies, and evaluating completed studies) in the context of new developments that consider uncertainty due to sampling variability. This review first describes fundamental concepts underlying power, new quantitative developments in power analysis, and the application of power analysis in designing studies. To facilitate the pedagogy of using power for design, we provide web applications to illustrate these concepts and examples of power analysis using newly developed methods. We also describe why using power to evaluate completed studies can be counterproductive. We conclude with a discussion of future directions in quantitative research on power analysis and provide recommendations for applying power in substantive research. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
ABSTRACT
Much empirical science involves evaluating alternative explanations for the obtained data. For example, given certain assumptions underlying a statistical test, a "significant" result generally refers to the implausibility of a null (zero) effect in the population producing the obtained study data. However, methodological work on various versions of p-hacking (i.e., using different analysis strategies until a "significant" result is produced) questions whether significant p-values might often reflect false findings. Indeed, initial simulations of single studies showed that the rate of "significant" but false findings might be much higher than the nominal .05 value when various analysis flexibilities are undertaken. In many settings, however, research articles report multiple studies using consistent methods across the studies, where those consistent methods would constrain the flexibilities used to produce high false-finding rates for simulations of single studies. Thus, we conducted simulations of study sets. These simulations show that consistent methods across studies (i.e., consistent in terms of which measures are analyzed, which conditions are included, and whether and how covariates are included) dramatically reduce the potential for flexible research practices (p-hacking) to produce consistent sets of significant results across studies. For p-hacking to produce even modest probabilities of a consistent set of studies would require (a) a large amount of selectivity in study reporting and (b) severe (and quite intentional) versions of p-hacking. With no more than modest selective reporting and with consistent methods across studies, p-hacking does not provide a plausible explanation for consistent empirical results across studies, especially as the size of the reported study set increases. In addition, the simulations show that p-hacking can produce high rates of false findings for single studies with very large samples. In contrast, a series of methodologically consistent studies (even with much smaller samples) is much less vulnerable to the forms of p-hacking examined in the simulations.
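A compact version of this kind of simulation (our simplification, with p-hacking limited to picking the best of several outcome measures under a true null) conveys the core result:

# p-hack a single study by testing m outcomes and reporting the smallest p-value
set.seed(1)
hacked_p <- function(n = 50, m = 5) {
  group <- rep(0:1, each = n)
  min(replicate(m, t.test(rnorm(2 * n) ~ group)$p.value))
}

p_single <- mean(replicate(5000, hacked_p() < .05))
p_single      # inflated false-positive rate for one hacked study (roughly .2 here)
p_single^3    # chance that three methodologically consistent studies all come out
              # "significant" under the null: far smaller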
Subjects
Computer Simulation, Humans, Statistical Data Interpretation, Research Design
ABSTRACT
The calculation of statistical power has been taken up as a simple yet informative tool to assist in designing an experiment, particularly in justifying sample size. A difficulty with using power for this purpose is that the classical power formula does not incorporate sources of uncertainty (e.g., sampling variability) that can impact the computed power value, leading to a false sense of precision and confidence in design choices. We use simulations to demonstrate the consequences of adding two common sources of uncertainty to the calculation of power. Sampling variability in the estimated effect size (Cohen's d) can introduce a large amount of uncertainty (e.g., sometimes producing rather flat distributions) in power and sample-size determination. The addition of random fluctuations in the population effect size can cause values of its estimates to take on a sign opposite the population value, making calculated power values meaningless. These results suggest that calculated power values or use of such values to justify sample size add little to planning a study. As a result, researchers should put little confidence in power-based choices when planning future studies. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
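The first consequence is easy to demonstrate (a base-R sketch with an effect size and sample sizes we chose): pilot studies give noisy Cohen's d estimates, and power values computed from them vary widely, sometimes from a sign-reversed estimate.

# Power computed from a noisy pilot estimate of Cohen's d
set.seed(1)
delta_true <- 0.3; n_pilot <- 30

pilot_d <- replicate(5000, {
  x <- rnorm(n_pilot, mean = delta_true); y <- rnorm(n_pilot)
  (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)        # pooled-SD Cohen's d
})

mean(pilot_d < 0)                          # share of pilots with a sign-reversed estimate
pow <- sapply(pmax(abs(pilot_d), 1e-6), function(d)
  power.t.test(n = 100, delta = d, sig.level = .05)$power)
quantile(pow, c(.10, .50, .90))            # "calculated power" is itself highly uncertain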
Subjects
Uncertainty, Humans, Sample Size
ABSTRACT
Currently, there is little guidance for navigating measurement challenges that threaten construct validity in replication research. To identify common challenges and ultimately strengthen replication research, we conducted a systematic review of the measures used in the 100 original and replication studies from the Reproducibility Project: Psychology (Open Science Collaboration, 2015). Results indicate that it was common for scales used in the original studies to have little or no validity evidence. Our systematic review demonstrates and corroborates evidence that issues of construct validity are sorely neglected in original and replicated research. We identify four measurement challenges replicators are likely to face: a lack of essential measurement information, a lack of validity evidence, measurement differences, and translation. Next, we offer solutions for addressing these challenges that will improve measurement practices in original and replication research. Finally, we close with a discussion of the need to develop measurement methodologies for the next generation of replication research. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Subjects
Reproducibility of Results
ABSTRACT
Traditionally, statistical power was viewed as relevant to research planning but not evaluation of completed research. However, following discussions of high false finding rates (FFRs) associated with low statistical power, the assumed level of statistical power has become a key criterion for research acceptability. Yet, the links between power and false findings are not as straightforward as described. Assumptions underlying FFR calculations do not reflect research realities in personality and social psychology. Even granting the assumptions, the FFR calculations identify important limitations to any general influences of statistical power. Limits for statistical power in inflating false findings can also be illustrated through the use of FFR calculations to (a) update beliefs about the null or alternative hypothesis and (b) assess the relative support for the null versus alternative hypothesis when evaluating a set of studies. Taken together, statistical power should be de-emphasized in comparison to current uses in research evaluation.
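The FFR calculations referred to follow a standard closed form; a short sketch (the inputs are illustrative):

# False finding rate: P(H0 true | significant result)
ffr <- function(power, alpha = .05, prior_h0 = .5) {
  alpha * prior_h0 / (alpha * prior_h0 + power * (1 - prior_h0))
}
ffr(power = .35)    # lower power
ffr(power = .80)    # higher power lowers the FFR, holding the prior and alpha fixed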
Subjects
Personality, Social Psychology, Humans
ABSTRACT
Power analysis serves as the gold standard for evaluating study feasibility and justifying sample size. However, mainstream power analysis is often oversimplified, poorly reflecting complex reality during data analysis. This article highlights the complexities inherent in power analysis, especially when uncertainties present in data analysis are realistically taken into account. We introduce a Bayesian-classical hybrid approach to power analysis, which formally incorporates three sources of uncertainty into power estimates: (a) epistemic uncertainty regarding the unknown values of the effect size of interest, (b) sampling variability, and (c) uncertainty due to model approximation (i.e., models fit data imperfectly; Box, 1979; MacCallum, 2003). To illustrate the nature of estimated power from the Bayesian-classical hybrid method, we juxtapose its power estimates with those obtained from traditional (i.e., classical or frequentist) and Bayesian approaches. We employ an example in lexical processing (e.g., Yap & Seow, 2014) to illustrate underlying concepts and provide accompanying R and Rcpp code for computing power via the Bayesian-classical hybrid method. In general, power estimates become more realistic and much more varied after uncertainties are incorporated into their computation. As such, sample sizes should be determined by assurance (i.e., the mean of the power distribution) and the extent of variability in power estimates (e.g., interval width between 20th and 80th percentiles of the power distribution). We discuss advantages and challenges of incorporating the three stated sources of uncertainty into power analysis and, more broadly, research design. Finally, we conclude with future research directions. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
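Sample-size determination by assurance can be sketched in base R (a generic illustration, not the article's Rcpp implementation; the prior on Cohen's d is assumed):

# Smallest per-group n whose assurance (mean power over the prior) reaches .80
set.seed(1)
prior_d <- rnorm(2000, mean = 0.35, sd = 0.10)            # assumed prior on Cohen's d

assurance <- function(n) {
  mean(sapply(pmax(abs(prior_d), 1e-6), function(d)
    power.t.test(n = n, delta = d, sig.level = .05)$power))
}

ns <- seq(50, 300, by = 10)
a  <- sapply(ns, assurance)
cbind(n = ns, assurance = round(a, 3))
ns[which(a >= .80)[1]]                                    # smallest n meeting the target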
Subjects
Bayes Theorem, Statistical Models, Psychology/methods, Uncertainty, Humans
ABSTRACT
B. L. Fredrickson's (1998, 2001) broaden-and-build theory of positive emotions asserts that people's daily experiences of positive emotions compound over time to build a variety of consequential personal resources. The authors tested this build hypothesis in a field experiment with working adults (n = 139), half of whom were randomly assigned to begin a practice of loving-kindness meditation. Results showed that this meditation practice produced increases over time in daily experiences of positive emotions, which, in turn, produced increases in a wide range of personal resources (e.g., increased mindfulness, purpose in life, and social support; decreased illness symptoms). In turn, these increments in personal resources predicted increased life satisfaction and reduced depressive symptoms. Discussion centers on how positive emotions are the mechanism of change for the type of mind-training practice studied here and how loving-kindness meditation is an intervention strategy that produces positive emotions in a way that outpaces the hedonic treadmill effect.
Subjects
Emotions, Empathy, Love, Meditation, Quality of Life/psychology, Psychological Adaptation, Adult, Awareness, Education, Female, Humans, Interpersonal Relations, Longitudinal Studies, Male, Middle Aged, Personal Satisfaction, Personality Inventory, Self Concept
ABSTRACT
Current concerns regarding the dependability of psychological findings call for methodological developments to provide additional evidence in support of scientific conclusions. This article highlights the value and importance of two distinct kinds of parameter uncertainty, which are quantified by confidence sets (CSs) and fungible parameter estimates (FPEs; Lee, MacCallum, & Browne, 2017); both provide essential information regarding the defensibility of scientific findings. Using the structural equation model, we introduce a general perturbation framework based on the likelihood function that unifies CSs and FPEs and sheds new light on the conceptual distinctions between them. A targeted illustration is then presented to demonstrate the factors which differentially influence CSs and FPEs, further highlighting their theoretical differences. With 3 empirical examples on initiating a conversation with a stranger (Bagozzi & Warshaw, 1988), posttraumatic growth of caregivers in the context of pediatric palliative care (Cadell et al., 2014), and the direct and indirect effects of spirituality on thriving among youth (Dowling, Gestsdottir, Anderson, von Eye, & Lerner, 2004), we illustrate how CSs and FPEs provide unique information which lead to better informed scientific conclusions. Finally, we discuss the importance of considering information afforded by CSs and FPEs in strengthening the basis of interpreting statistical results in substantive research, conclude with future research directions, and provide example OpenMx code for the computation of CSs and FPEs. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Subjects
Statistical Data Interpretation, Statistical Models, Psychology/methods, Uncertainty, Humans
ABSTRACT
Statistical practice in psychological science is undergoing reform which is reflected in part by strong recommendations for reporting and interpreting effect sizes and their confidence intervals. We present principles and recommendations for research reporting and emphasize the variety of ways effect sizes can be reported. Additionally, we emphasize interpreting and reporting unstandardized effect sizes because of common misconceptions regarding standardized effect sizes which we elucidate. Effect sizes should directly answer their motivating research questions, be comprehensible to the average reader, and be based on meaningful metrics of their constituent variables. We illustrate our recommendations with empirical examples involving a One-way ANOVA, a categorical variable analysis, an interaction effect in linear regression, and a simple mediation model, emphasizing the interpretation of effect sizes. (PsycINFO Database Record
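The core contrast between an unstandardized and a standardized effect size can be shown with a simple two-group comparison (a sketch with simulated scores on a hypothetical symptom scale; all values are ours):

# Unstandardized vs. standardized effect size for a two-group comparison
set.seed(1)
control   <- rnorm(80, mean = 24, sd = 6)     # scores on a hypothetical 0-40 symptom scale
treatment <- rnorm(80, mean = 21, sd = 6)

mean(treatment) - mean(control)               # unstandardized: difference in scale points
t.test(treatment, control)$conf.int           # CI for that difference, in the same metric
(mean(treatment) - mean(control)) /
  sqrt((var(treatment) + var(control)) / 2)   # standardized (Cohen's d), unit-free

The unstandardized difference keeps the scale's meaningful metric, which is the article's central recommendation.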