Search | VHL Search Portal

1.

From significance testing to estimation and Open Science: How esci can help.

Calin-Jageman, Robert; Cumming, Geoff.

Int J Psychol ; 2024 Apr 28.

Article in English | MEDLINE | ID: mdl-38679926

ABSTRACT

We argue that researchers should test less, estimate more, and adopt Open Science practices. We outline some of the flaws of null hypothesis significance testing and take three approaches to demonstrating the unreliability of the p value. We explain some advantages of estimation and meta-analysis ("the new statistics"), especially as contributions to Open Science practices, which aim to increase the openness, integrity, and replicability of research. We then describe esci (estimation statistics with confidence intervals): a set of online simulations and an R package for estimation that integrates into jamovi and JASP. This software provides (a) online activities to sharpen understanding of statistical concepts (e.g., "The Dance of the Means"); (b) effects sizes and confidence intervals for a range of study designs, largely by using techniques recently developed by Bonett; (c) publication-ready visualisations that make uncertainty salient; and (d) the option to conduct strong, fair hypothesis evaluation through specification of an interval null. Although developed specifically to support undergraduate learning through the 2nd edition of our textbook, esci should prove a valuable tool for graduate students and researchers interested in adopting the estimation approach. Further information is at ( https://thenewstatistics.com).

2.

The influence of journal submission guidelines on authors' reporting of statistics and use of open research practices: Five years later.

Giofrè, David; Boedker, Ingrid; Cumming, Geoff; Rivella, Carlotta; Tressoldi, Patrizio.

Behav Res Methods ; 55(7): 3845-3854, 2023 10.

Article in English | MEDLINE | ID: mdl-36253598

ABSTRACT

Changes in statistical practices and reporting have been documented by Giofrè et al. PLOS ONE 12(4), e0175583 (2017), who investigated ten statistical and open practices in two high-ranking journals (Psychological Science [PS] and Journal of Experimental Psychology-General [JEPG]): null hypothesis significance testing; confidence or credible intervals; meta-analysis of the results of multiple experiments; confidence interval interpretation; effect size interpretation; sample size determination; data exclusion; data availability; materials availability; and preregistered design and analysis plan. The investigation was based on an analysis of all papers published in these journals between 2013 and 2015. The aim of the present study was to follow up changes in both PS and JEPG in subsequent years, from 2016 to 2020, adding code availability as a further open practice. We found improvement in most practices, with some exceptions (i.e., confidence interval interpretation and meta-analysis). Despite these positive changes, our results indicate a need for further improvements in statistical practices and adoption of open practices.

Subject(s)

Psychology, Experimental , Humans , Research Design , Mental Processes , Sample Size

3.

The frequent insignificance of a "significant" p-value.

McGiffin, David C; Cumming, Geoff; Myles, Paul S.

J Card Surg ; 36(11): 4322-4331, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34477260

ABSTRACT

Null hypothesis significance testing (NHST) and p-values are widespread in the cardiac surgical literature but are frequently misunderstood and misused. The purpose of the review is to discuss major disadvantages of p-values and suggest alternatives. We describe diagnostic tests, the prosecutor's fallacy in the courtroom, and NHST, which involve inter-related conditional probabilities, to help clarify the meaning of p-values, and discuss the enormous sampling variability, or unreliability, of p-values. Finally, we use a cardiac surgical database and simulations to explore further issues involving p-values. In clinical studies, p-values provide a poor summary of the observed treatment effect, whereas the three-number summary provided by effect estimates and confidence intervals is more informative and minimizes over-interpretation of a "significant" result. p-values are an unreliable measure of the strength of evidence; if used at all they give only, at best, a very rough guide to decision making. Researchers should adopt Open Science practices to improve the trustworthiness of research and, where possible, use estimation (three-number summaries) or other better techniques.

Subject(s)

Research Design , Bayes Theorem , Humans , Probability

4.

A Tribute to the Mind, Methodology and Mentoring of Wayne Velicer.

Harlow, Lisa L; Aiken, Leona; Blankson, A Nayena; Boodoo, Gwyneth M; Brick, Leslie Ann D; Collins, Linda M; Cumming, Geoff; Fava, Joseph L; Goodwin, Matthew S; Hoeppner, Bettina B; MacKinnon, David P; Molenaar, Peter C M; Rodgers, Joseph Lee; Rossi, Joseph S; Scott, Allie; Steiger, James H; West, Stephen G.

Multivariate Behav Res ; 56(3): 377-389, 2021.

Article in English | MEDLINE | ID: mdl-32077317

ABSTRACT

Wayne Velicer is remembered for a mind where mathematical concepts and calculations intrigued him, behavioral science beckoned him, and people fascinated him. Born in Green Bay, Wisconsin on March 4, 1944, he was raised on a farm, although early influences extended far beyond that beginning. His Mathematics BS and Psychology minor at Wisconsin State University in Oshkosh, and his PhD in Quantitative Psychology from Purdue led him to a fruitful and far-reaching career. He was honored several times as a high-impact author, was a renowned scholar in quantitative and health psychology, and had more than 300 scholarly publications and 54,000+ citations of his work, advancing the arenas of quantitative methodology and behavioral health. In his methodological work, Velicer sought out ways to measure, synthesize, categorize, and assess people and constructs across behaviors and time, largely through principal components analysis, time series, and cluster analysis. Further, he and several colleagues developed a method called Testing Theory-based Quantitative Predictions, successfully applied to predicting outcomes and effect sizes in smoking cessation, diet behavior, and sun protection, with the potential for wider applications. With $60,000,000 in external funding, Velicer also helped engage a large cadre of students and other colleagues to study methodological models for a myriad of health behaviors in a widely applied Transtheoretical Model of Change. Unwittingly, he has engendered indelible memories and gratitude to all who crossed his path. Although Wayne Velicer left this world on October 15, 2017 after battling an aggressive cancer, he is still very present among us.

Subject(s)

Behavioral Medicine , Mentoring , Humans

5.

Replication: Do not trust your p-value, be it small or large.

Gandevia, Simon; Cumming, Geoff; Amrhein, Valentin; Butler, Annie.

J Physiol ; 599(11): 2989-2990, 2021 06.

Article in English | MEDLINE | ID: mdl-33963767

Subject(s)

Data Interpretation, Statistical

6.

The new statistics: why and how.

Cumming, Geoff.

Psychol Sci ; 25(1): 7-29, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24220629

ABSTRACT

We need to make substantial changes to how we conduct research. First, in response to heightened concern that our published research literature is incomplete and untrustworthy, we need new requirements to ensure research integrity. These include prespecification of studies whenever possible, avoidance of selection and other inappropriate data-analytic practices, complete reporting, and encouragement of replication. Second, in response to renewed recognition of the severe flaws of null-hypothesis significance testing (NHST), we need to shift from reliance on NHST to estimation and other preferred techniques. The new statistics refers to recommended practices, including estimation based on effect sizes, confidence intervals, and meta-analysis. The techniques are not new, but adopting them widely would be new for many researchers, as well as highly beneficial. This article explains why the new statistics are important and offers guidance for their use. It describes an eight-step new-statistics strategy for research with integrity, which starts with formulation of research questions in estimation terms, has no place for NHST, and is aimed at building a cumulative quantitative discipline.

Subject(s)

Biomedical Research/standards , Data Interpretation, Statistical , Psychology/standards , Statistics as Topic/standards , Humans

7.

The perception of positive and negative facial expressions in unilateral brain-damaged patients: A meta-analysis.

Abbott, Jacenta D; Cumming, Geoff; Fidler, Fiona; Lindell, Annukka K.

Laterality ; 18(4): 437-59, 2013.

Article in English | MEDLINE | ID: mdl-22849611

ABSTRACT

How the brain is lateralised for emotion processing remains a key question in contemporary neuropsychological research. The right hemisphere hypothesis asserts that the right hemisphere dominates emotion processing, whereas the valence hypothesis holds that positive emotion is processed in the left hemisphere and negative emotion is controlled by the right hemisphere. A meta-analysis was conducted to assess unilateral brain-damaged individuals' performance on tasks of facial emotion perception according to valence. A systematic search of the literature identified seven articles that met the conservative selection criteria and could be included in a meta-analysis. A total of 12 meta-analyses of facial expression perception were constructed assessing identification and labelling tasks according to valence and the side of brain damage. The results demonstrated that both left and right hemisphere damage leads to impairments in emotion perception (identification and labelling) irrespective of valence. Importantly, right hemisphere damage prompted more pronounced emotion perception impairment than left hemisphere damage, across valence, suggesting right hemisphere dominance for emotion perception. Furthermore, right hemisphere damage was associated with a larger tendency for impaired perception of negative than positive emotion across identification and labelling tasks. Overall the findings support Adolphs, Jansari, and Tranel (2001) model whereby the right hemisphere preferentially processes negative facial expressions and both hemispheres process positive facial expressions.

Subject(s)

Brain Damage, Chronic/physiopathology , Brain Damage, Chronic/psychology , Facial Expression , Functional Laterality , Social Perception , Emotions , Female , Humans , Male

8.

Cohen's d needs to be readily interpretable: comment on Shieh (2013).

Cumming, Geoff.

Behav Res Methods ; 45(4): 968-71, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24002988

ABSTRACT

Shieh (2013) discussed in detail Î´*, a proposed standardized effect size measure for the two-independent-groups design with heteroscedasticity. Shieh focused on inference-notably, the large challenge of calculating confidence intervals for Î´*. I contend, however, that the standardizer chosen for Î´*, meaning the units in which it is expressed, is appropriate for inference but causes Î´* to be inconsistent with conventional Cohen's d. In addition, Î´* depends on the relative sample sizes in the particular experiment and, thus, lacks the generality that is highly desirable if a standardized effect size is to be readily interpretable and also usable in meta-analysis. In the case of heteroscedasticity, I suggest that researchers should choose as standardizer for Cohen's Î´ the best available estimate of the SD of an appropriate population, usually the control population, in preference to Î´* as discussed by Shieh.

Subject(s)

Confidence Intervals , Meta-Analysis as Topic , Models, Statistical , Humans , Intelligence Tests/standards , Intelligence Tests/statistics & numerical data , Judgment , Manifest Anxiety Scale/standards , Manifest Anxiety Scale/statistics & numerical data , Research Design , Sample Size

9.

Error bars in experimental biology.

Cumming, Geoff; Fidler, Fiona; Vaux, David L.

J Cell Biol ; 177(1): 7-11, 2007 Apr 09.

Article in English | MEDLINE | ID: mdl-17420288

ABSTRACT

Error bars commonly appear in figures in publications, but experimental biologists are often unsure how they should be used and interpreted. In this article we illustrate some basic features of error bars and explain how they can help communicate data and assist correct interpretation. Error bars may show confidence intervals, standard errors, standard deviations, or other quantities. Different types of error bars give quite different information, and so figure legends must make clear what error bars represent. We suggest eight simple rules to assist with effective use and interpretation of error bars.

Subject(s)

Data Interpretation, Statistical , Biology/methods , Biology/statistics & numerical data , Computer Graphics , Confidence Intervals , Sample Size

10.

The diamond ratio: A visual indicator of the extent of heterogeneity in meta-analysis.

Cairns, Maxwell; Cumming, Geoff; Calin-Jageman, Robert; Prendergast, Luke A.

Br J Math Stat Psychol ; 75(2): 201-219, 2022 05.

Article in English | MEDLINE | ID: mdl-34730234

ABSTRACT

The result of a meta-analysis is conventionally pictured in the forest plot as a diamond, whose length is the 95% confidence interval (CI) for the summary measure of interest. The Diamond Ratio (DR) is the ratio of the length of the diamond given by a random effects meta-analysis to that given by a fixed effect meta-analysis. The DR is a simple visual indicator of the amount of change caused by moving from a fixed-effect to a random-effects meta-analysis. Increasing values of DR greater than 1.0 indicate increasing heterogeneity relative to the effect variances. We investigate the properties of the DR, and its relationship to four conventional but more complex measures of heterogeneity. We propose for the first time a CI on the DR, and show that it performs well in terms of coverage. We provide example code to calculate the DR and its CI, and to show these in a forest plot. We conclude that the DR is a useful indicator that can assist students and researchers to understand heterogeneity, and to appreciate its extent in particular cases.

Subject(s)

Meta-Analysis as Topic , Humans

11.

Reducing overconfidence in the interval judgments of experts.

Speirs-Bridge, Andrew; Fidler, Fiona; McBride, Marissa; Flander, Louisa; Cumming, Geoff; Burgman, Mark.

Risk Anal ; 30(3): 512-23, 2010 Mar.

Article in English | MEDLINE | ID: mdl-20030766

ABSTRACT

Elicitation of expert opinion is important for risk analysis when only limited data are available. Expert opinion is often elicited in the form of subjective confidence intervals; however, these are prone to substantial overconfidence. We investigated the influence of elicitation question format, in particular the number of steps in the elicitation procedure. In a 3-point elicitation procedure, an expert is asked for a lower limit, upper limit, and best guess, the two limits creating an interval of some assigned confidence level (e.g., 80%). In our 4-step interval elicitation procedure, experts were also asked for a realistic lower limit, upper limit, and best guess, but no confidence level was assigned; the fourth step was to rate their anticipated confidence in the interval produced. In our three studies, experts made interval predictions of rates of infectious diseases (Study 1, n = 21 and Study 2, n = 24: epidemiologists and public health experts), or marine invertebrate populations (Study 3, n = 34: ecologists and biologists). We combined the results from our studies using meta-analysis, which found average overconfidence of 11.9%, 95% CI [3.5, 20.3] (a hit rate of 68.1% for 80% intervals)-a substantial decrease in overconfidence compared with previous studies. Studies 2 and 3 suggest that the 4-step procedure is more likely to reduce overconfidence than the 3-point procedure (Cohen's d = 0.61, [0.04, 1.18]).

Subject(s)

Confidence Intervals , Judgment , Humans , Public Health , Risk Assessment

12.

Yes, but don't underestimate estimation: reply to Morey, Rouder, Verhagen, and Wagenmakers (2014).

Fidler, Fiona; Cumming, Geoff.

Psychol Sci ; 25(6): 1291-2, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24789841

Subject(s)

Biomedical Research/standards , Data Interpretation, Statistical , Psychology/standards , Statistics as Topic/standards , Humans

13.

Replicates and repeats--what is the difference and is it significant? A brief discussion of statistics and experimental design.

Vaux, David L; Fidler, Fiona; Cumming, Geoff.

EMBO Rep ; 13(4): 291-6, 2012 Apr 02.

Article in English | MEDLINE | ID: mdl-22421999

Subject(s)

Reproducibility of Results , Research Design , Statistics as Topic , Animals , Bone Marrow Cells/cytology , Data Interpretation, Statistical , Mice

14.

Putting research in context: understanding confidence intervals from one or more studies.

Finch, Sue; Cumming, Geoff.

J Pediatr Psychol ; 34(9): 903-16, 2009 Oct.

Article in English | MEDLINE | ID: mdl-19095715

ABSTRACT

OBJECTIVES: To support wider use and higher quality interpretation of confidence intervals (CIs) in psychology. METHODS: We discuss the meaning and interpretation of CIs in single studies, and illustrate the value of CIs when reviewing and integrating research findings across studies. We demonstrate how to find CIs from summary statistics and published data in some simple situations. RESULTS: We provide the ESCI graphical software, which runs under Microsoft Excel, to assist with calculating and plotting CIs. (www.latrobe.edu.au/psy/esci) CONCLUSIONS: The wider use of CIs in psychology should support quality research communication and integrated interpretation of findings in context.

Subject(s)

Confidence Intervals , Research Design/statistics & numerical data , Research/statistics & numerical data , Humans

15.

The New Statistics for Better Science: Ask How Much, How Uncertain, and What Else is Known.

Calin-Jageman, Robert J; Cumming, Geoff.

Am Stat ; 73(Suppl 1): 271-280, 2019.

Article in English | MEDLINE | ID: mdl-31762475

ABSTRACT

The "New Statistics" emphasizes effect sizes, confidence intervals, meta-analysis, and the use of Open Science practices. We present 3 specific ways in which a New Statistics approach can help improve scientific practice: by reducing over-confidence in small samples, by reducing confirmation bias, and by fostering more cautious judgments of consistency. We illustrate these points through consideration of the literature on oxytocin and human trust, a research area that typifies some of the endemic problems that arise with poor statistical practice.

16.

Estimation for Better Inference in Neuroscience.

Calin-Jageman, Robert J; Cumming, Geoff.

eNeuro ; 6(4)2019.

Article in English | MEDLINE | ID: mdl-31453316

ABSTRACT

The estimation approach to inference emphasizes reporting effect sizes with expressions of uncertainty (interval estimates). In this perspective we explain the estimation approach and describe how it can help nudge neuroscientists toward a more productive research cycle by fostering better planning, more thoughtful interpretation, and more balanced evaluation of evidence.

Subject(s)

Epilepsy , Pilocarpine , Animals , Cognition , Housing , Mice , Rats , Seizures

17.

The value of RCT evidence depends on the quality of statistical analysis.

Faulkner, Cathy; Fidler, Fiona; Cumming, Geoff.

Behav Res Ther ; 46(2): 270-81, 2008 Feb.

Article in English | MEDLINE | ID: mdl-18191102

ABSTRACT

The authors examined statistical practices in 193 randomized controlled trials (RCTs) of psychological therapies published in prominent psychology and psychiatry journals during 1999-2003. Statistical significance tests were used in 99% of RCTs, 84% discussed clinical significance, but only 46% considered-even minimally-statistical power, 31% interpreted effect size and only 2% interpreted confidence intervals. In a second study, 42 respondents to an email survey of the authors of RCTs analyzed in the first study indicated they consider it very important to know the magnitude and clinical importance of the effect, in addition to whether a treatment effect exists. The present authors conclude that published RCTs focus on statistical significance tests ("Is there an effect or difference?"), and neglect other important questions: "How large is the effect?" and "Is the effect clinically important?" They advocate improved statistical reporting of RCTs especially by reporting and interpreting clinical significance, effect sizes and confidence intervals.

Subject(s)

Data Interpretation, Statistical , Quality Control , Randomized Controlled Trials as Topic , Confidence Intervals , Humans , Psychometrics , Research Design , Treatment Outcome

18.

Theory Testing Using Quantitative Predictions of Effect Size.

Velicer, Wayne F; Cumming, Geoff; Fava, Joseph L; Rossi, Joseph S; Prochaska, James O; Johnson, Janet.

Appl Psychol ; 57(4): 589-608, 2008 Oct.

Article in English | MEDLINE | ID: mdl-22837590

ABSTRACT

Traditional Null Hypothesis Testing procedures are poorly adapted to theory testing. The methodology can mislead researchers in several ways, including: (a) a lack of power can result in an erroneous rejection of the theory; (b) the focus on directionality (ordinal tests) rather than more precise quantitative predictions limits the information gained; and (c) the misuse of probability values to indicate effect size. An alternative approach is proposed which involves employing the theory to generate explicit effect size predictions that are compared to the effect size estimates and related confidence intervals to test the theoretical predictions. This procedure is illustrated employing the Transtheoretical Model. Data from a sample (N = 3,967) of smokers from a large New England HMO system were used to test the model. There were a total of 15 predictions evaluated, each involving the relation between Stage of Change and one of the other 15 Transtheoretical Model variables. For each variable, omega-squared and the related confidence interval were calculated and compared to the predicted effect sizes. Eleven of the 15 predictions were confirmed, providing support for the theoretical model. Quantitative predictions represent a much more direct, informative, and strong test of a theory than the traditional test of significance.

19.

A Cross-Sectional Analysis of Students' Intuitions When Interpreting CIs.

Kalinowski, Pav; Lai, Jerry; Cumming, Geoff.

Front Psychol ; 9: 112, 2018.

Article in English | MEDLINE | ID: mdl-29527180

ABSTRACT

We explored how students interpret the relative likelihood of capturing a population parameter at various points of a CI in two studies. First, an online survey of 101 students found that students' beliefs about the probability curve within a CI take a variety of shapes, and that in fixed choice tasks, 39% CI [30, 48] of students' responses deviated from true distributions. For open ended tasks, this proportion rose to 85%, 95% CI [76, 90]. We interpret this as evidence that, for many students, intuitions about CIs distributions are ill-formed, and their responses are highly susceptible to question format. Many students also falsely believed that there is substantial change in likelihood at the upper and lower limits of the CI, resembling a cliff effect (Rosenthal and Gaito, 1963; Nelson et al., 1986). In a follow-up study, a subset of 24 post-graduate students participated in a 45-min semi-structured interview discussing the students' responses to the survey. Analysis of interview transcripts identified several competing intuitions about CIs, and several new CI misconceptions. During the interview, we also introduced an interactive teaching program displaying a cat's eye CI, that is, a CI that uses normal distributions to depict the correct likelihood distribution. Cat's eye CIs were designed to help students understand likelihood distributions and the relationship between interval length, C% level and sample size. Observed changes in students' intuitions following this teaching program suggest that a brief intervention using cat's eyes can reduce CI misconceptions and increase accurate CI intuitions.

20.

The influence of journal submission guidelines on authors' reporting of statistics and use of open research practices.

Giofrè, David; Cumming, Geoff; Fresc, Luca; Boedker, Ingrid; Tressoldi, Patrizio.

PLoS One ; 12(4): e0175583, 2017.

Article in English | MEDLINE | ID: mdl-28414751

ABSTRACT

From January 2014, Psychological Science introduced new submission guidelines that encouraged the use of effect sizes, estimation, and meta-analysis (the "new statistics"), required extra detail of methods, and offered badges for use of open science practices. We investigated the use of these practices in empirical articles published by Psychological Science and, for comparison, by the Journal of Experimental Psychology: General, during the period of January 2013 to December 2015. The use of null hypothesis significance testing (NHST) was extremely high at all times and in both journals. In Psychological Science, the use of confidence intervals increased markedly overall, from 28% of articles in 2013 to 70% in 2015, as did the availability of open data (3 to 39%) and open materials (7 to 31%). The other journal showed smaller or much smaller changes. Our findings suggest that journal-specific submission guidelines may encourage desirable changes in authors' practices.

Subject(s)

Publishing/standards , Research Design/statistics & numerical data , Humans , Meta-Analysis as Topic

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

Subject(s)

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL