1.
Behav Res Methods; 54(3): 1114-1130, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34471963

ABSTRACT

Hypothesis testing is a central statistical method in psychology and the cognitive sciences. The problems of null hypothesis significance testing (NHST) and p values have been debated widely, yet few attractive alternatives exist. This article introduces the fbst R package, which implements the Full Bayesian Significance Test (FBST) to test a sharp null hypothesis against its alternative via the e value. The statistical theory of the FBST was introduced more than two decades ago, and since then the FBST has been shown to be a Bayesian alternative to NHST and p values with highly appealing theoretical and practical properties. The algorithm provided in the fbst package is applicable to any Bayesian model as long as the posterior distribution can be obtained at least numerically. The core function of the package computes the Bayesian evidence against the null hypothesis, the e value. Additionally, p values based on asymptotic arguments can be computed, and rich visualizations for communicating and interpreting the results can be produced. Three examples of statistical procedures frequently used in the cognitive sciences demonstrate how to apply the FBST in practice using the fbst package. Based on the success of the FBST in statistical science, the fbst package should be of interest to a broad range of researchers and will hopefully encourage them to consider the FBST as a possible alternative when testing a sharp null hypothesis.
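
The e value can be approximated from any posterior sample: estimate the posterior density, evaluate it at the null point, and take the posterior mass of the "tangential set" where the density exceeds that value. A minimal Python sketch of this idea for a one-dimensional posterior and a point null theta0 (an illustration of the FBST logic, not the fbst package's actual API):

import numpy as np
from scipy.stats import gaussian_kde

def ev_against_null(posterior_sample, theta0):
    """Monte Carlo approximation of the FBST evidence against H0: theta = theta0.

    The tangential set T = {theta : p(theta|x) > p(theta0|x)}; the Bayesian
    evidence against H0 is its posterior mass, estimated here as the fraction
    of posterior draws whose estimated density exceeds the density at theta0.
    """
    kde = gaussian_kde(posterior_sample)          # smooth density estimate
    dens_at_null = kde(theta0)[0]                 # posterior density at theta0
    dens_at_draws = kde(posterior_sample)         # density at each draw
    return np.mean(dens_at_draws > dens_at_null)  # posterior mass of T

# Example: posterior of a mean centered near 0.3 versus H0: theta = 0
rng = np.random.default_rng(1)
draws = rng.normal(0.3, 0.15, size=5_000)
print(ev_against_null(draws, 0.0))  # ~0.95: substantial evidence against H0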


Subjects
Research Design, Bayes Theorem, Humans
2.
Behav Res Methods; 48(4): 1205-1226, 2016 Dec.
Article in English | MEDLINE | ID: mdl-26497820

ABSTRACT

This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 to 2013, using the new R package statcheck. statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles in this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant, which could indicate a systematic bias in favor of significant results. Possible remedies for the high prevalence of reporting inconsistencies include encouraging data sharing, having co-authors check results in a so-called "co-pilot model," and using statcheck to flag possible inconsistencies in one's own manuscript or during the review process.
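
The consistency check at the heart of statcheck can be illustrated in a few lines: recompute the p-value from the reported test statistic and degrees of freedom, then flag a mismatch. A minimal Python sketch for t-tests (statcheck itself is an R package; the tolerance and the gross-inconsistency rule below are simplified assumptions):

from scipy import stats

def check_t_result(t, df, reported_p, alpha=0.05, tol=0.0005):
    """Recompute a two-sided p-value for a t statistic and compare it
    to the reported one, in the spirit of statcheck."""
    computed_p = 2 * stats.t.sf(abs(t), df)
    inconsistent = abs(computed_p - reported_p) > tol
    # "Grossly inconsistent": the error flips the significance decision
    gross = inconsistent and ((computed_p <= alpha) != (reported_p <= alpha))
    return computed_p, inconsistent, gross

print(check_t_result(t=2.10, df=28, reported_p=0.03))
# -> (~0.045, True, False): off, but significance unchanged
print(check_t_result(t=1.70, df=28, reported_p=0.04))
# -> (~0.100, True, True): reported as significant, recomputes as not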


Assuntos
Pesquisa Comportamental/estatística & dados numéricos , Viés , Humanos , Prevalência
3.
Pers Soc Psychol Bull; 48(7): 1105-1117, 2022 Jul.
Article in English | MEDLINE | ID: mdl-34308722

ABSTRACT

Traditionally, statistical power was viewed as relevant to research planning but not to the evaluation of completed research. However, following discussions of the high false finding rates (FFRs) associated with low statistical power, the assumed level of statistical power has become a key criterion for research acceptability. Yet the links between power and false findings are not as straightforward as often described. The assumptions underlying FFR calculations do not reflect research realities in personality and social psychology. Even granting the assumptions, the FFR calculations identify important limits to any general influence of statistical power. The limited role of statistical power in inflating false findings can also be illustrated by using FFR calculations to (a) update beliefs about the null or alternative hypothesis and (b) assess the relative support for the null versus the alternative hypothesis when evaluating a set of studies. Taken together, these considerations suggest that statistical power should be given less emphasis in research evaluation than it currently receives.
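
The FFR calculations the article scrutinizes are a direct application of Bayes' rule to study outcomes: among significant results, the share that is false depends on the proportion of tested hypotheses that are truly null, the alpha level, and power. A minimal sketch of the standard calculation (variable names are mine):

def false_finding_rate(prior_null, alpha=0.05, power=0.80):
    """Expected share of significant results that are false positives.

    FFR = alpha * P(H0) / (alpha * P(H0) + power * P(H1))
    """
    false_pos = alpha * prior_null
    true_pos = power * (1 - prior_null)
    return false_pos / (false_pos + true_pos)

# With half of tested hypotheses truly null and 80% power, ~5.9% of
# significant findings are false; at 20% power the figure rises to 20%.
print(false_finding_rate(prior_null=0.5, power=0.80))  # 0.0588...
print(false_finding_rate(prior_null=0.5, power=0.20))  # 0.2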


Subjects
Personality, Social Psychology, Humans
4.
BMC Psychol; 10(1): 274, 2022 Nov 22.
Article in English | MEDLINE | ID: mdl-36419180

ABSTRACT

BACKGROUND: Mainstream psychology is experiencing a crisis of confidence. Many of the methodological solutions offered in response have focused largely on statistical alternatives to null hypothesis statistical testing, ignoring nonstatistical remedies that are readily available within psychology, namely small-N designs. In fact, many classic memory studies that have passed the test of replicability used them. That methodological legacy warranted a retrospective look at nonexperimental data to explore the generality of the reported effects. METHOD: Classroom demonstrations based on classic memory experiments on immediate memory span, chunking, and depth of processing were conducted over multiple semesters in introductory psychology courses with typical, mostly freshman students at a predominantly white private Catholic university in the US Midwest. RESULTS: Students tended to remember 7 ± 2 digits, remembered more digits of π when a meaningful story was attached to them, and remembered more words after elaborative rehearsal than after maintenance rehearsal. These results amount to replications, under uncontrolled classroom conditions, of classic experiments originally conducted largely outside null hypothesis statistical testing frameworks. CONCLUSIONS: In light of the ongoing replication crisis in psychology, the results are remarkable and noteworthy, validating these historically important psychological findings. They are a testament to the reliability of reproducible effects as the hallmark of empirical findings in science and suggest an alternative to commonly proffered solutions to the replication crisis.


Assuntos
Memória de Curto Prazo , Rememoração Mental , Humanos , Reprodutibilidade dos Testes , Estudos Retrospectivos , Projetos de Pesquisa
5.
J Exp Anal Behav; 115(1): 115-128, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33336404

ABSTRACT

Psychology is undergoing major methodological and cultural changes, with efforts to redefine how psychologists analyze and report their data. Davidson (2018) argued that psychology's methodological crises stem from mechanical objectivity: the adoption of an analytic tool as a source of dependable knowledge. This leads to institutionalization and, eventually, uncritical ritualistic use, as happened with null hypothesis statistical testing. Davidson invoked the mythological symbol of the Ouroboros to represent the endless churning of statistical fads. Sidman (1960), in his Tactics of Scientific Research, provided a shield against these problems through the premium he placed on the experience, expertise, judgment, and decision-making of the scientist, qualities that appear to be absent from psychology's ritualized processes.


Assuntos
Julgamento , Projetos de Pesquisa , Psicologia
6.
PeerJ; 9: e12453, 2021.
Article in English | MEDLINE | ID: mdl-34900418

ABSTRACT

BACKGROUND: Despite much discussion in the epidemiologic literature surrounding the use of null hypothesis significance testing (NHST) for inference, the reporting practices of veterinary researchers have not been examined. We conducted a survey of articles published in Preventive Veterinary Medicine, a leading veterinary epidemiology journal, aimed at (a) estimating the frequency of reporting of p values, confidence intervals and statistical significance between 1997 and 2017, (b) determining whether this varied by article section and (c) determining whether this varied over time. METHODS: We used systematic cluster sampling to select 985 original research articles from issues published in March, June, September and December of each year of the study period. Using the survey data analysis menu in Stata, we estimated overall and yearly proportions of article sections (abstracts, results-texts, results-tables and discussions) reporting p values, confidence intervals and statistical significance. Additionally, we estimated the proportion of p values less than 0.05 reported in each section, the proportion of article sections in which p values were reported as inequalities, and the proportion of article sections in which confidence intervals were interpreted as if they were significance tests. Finally, we used Generalised Estimating Equations to estimate prevalence odds ratios and 95% confidence intervals, comparing the occurrence of each of the above-mentioned reporting elements in one article section relative to another. RESULTS: Over the 20-year period, for every 100 published manuscripts, 31 abstracts (95% CI [28-35]), 65 results-texts (95% CI [61-68]), 23 sets of results-tables (95% CI [20-27]) and 59 discussion sections (95% CI [56-63]) reported statistical significance at least once. Only in the case of results-tables were the numbers reporting p values (48; 95% CI [44-51]) and confidence intervals (44; 95% CI [41-48]) higher than those reporting statistical significance. We also found that a substantial proportion of p values were reported as inequalities and that most were less than 0.05. The odds of a p value being less than 0.05 (OR = 4.5; 95% CI [2.3-9.0]) or being reported as an inequality (OR = 3.2; 95% CI [1.3-7.6]) were higher in abstracts than in results-texts. Additionally, when confidence intervals were interpreted, on most occasions they were used as surrogates for significance tests. No time trends in reporting were observed for any of the three reporting elements over the study period. CONCLUSIONS: Despite the availability of superior approaches to statistical inference and abundant criticism of NHST in the epidemiologic literature, NHST remains by far the most common means of inference in articles published in Preventive Veterinary Medicine, and this pattern did not change substantially between 1997 and 2017.
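
For readers unfamiliar with the modeling step, the comparison of reporting elements across article sections can be expressed as a logistic GEE with articles as clusters, whose exponentiated coefficients are prevalence odds ratios. A minimal sketch using Python's statsmodels rather than Stata, with hypothetical column names; the authors' exact specification may differ:

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per article section, with a binary
# indicator of whether that section reports statistical significance,
# clustered by article.
df = pd.read_csv("reporting_survey.csv")  # columns: article_id, section, reports_sig

# An exchangeable working correlation accounts for sections nested in articles.
model = smf.gee("reports_sig ~ C(section, Treatment('results_text'))",
                groups="article_id", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(np.exp(result.params))  # prevalence odds ratios vs. the results-text section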

7.
Perspect Psychol Sci; 15(4): 1054-1075, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32502366

ABSTRACT

Data analysis is a risky endeavor, particularly for those unaware of its dangers. According to some researchers, threats to "statistical conclusion validity" loom over all research subjected to the dark arts of statistical magic. Although traditional statistics classes may advise against certain practices (e.g., multiple comparisons, small sample sizes, violating normality), they may fail to cover others (e.g., outlier detection, violating linearity). More common, perhaps, is that researchers simply fail to remember them. In this article, rather than rehashing old warnings and diatribes against this practice or that, I instead advocate a general statistical-analysis strategy. This graphic-based, eight-step strategy promises to resolve the majority of statistical traps researchers may fall into, without requiring them to memorize long lists of problematic statistical practices. These steps assist in preventing both false positives and false negatives and yield critical insights about the data that would otherwise be missed. I conclude with an applied example showing how the eight steps reveal interesting insights that standard statistical practices would not detect.


Assuntos
Pesquisa Biomédica/normas , Análise de Dados , Interpretação Estatística de Dados , Psicologia/normas , Humanos
8.
Front Psychol; 10: 2767, 2019.
Article in English | MEDLINE | ID: mdl-31920819

ABSTRACT

Turmoil has engulfed psychological science, and the causes and consequences of the reproducibility crisis are in dispute. With the hope of addressing some of its aspects, Bayesian methods are gaining increasing attention in psychological science. Some of their advantages over the frequentist framework are the ability to describe parameters in probabilistic terms and to incorporate prior knowledge about them explicitly into the model. These issues are crucial to the current debate about statistical significance. Bayesian methods are not necessarily the only remedy against incorrect interpretations or wrong conclusions, but there is increasing agreement that they are one of the keys to avoiding such fallacies. Nevertheless, their flexible nature is both their power and their weakness, for there is no agreement about which indices of "significance" should be computed or reported. This lack of a consensual index or guideline, analogous to the frequentist p-value, further contributes to the opacity that many unfamiliar readers perceive in Bayesian statistics. Thus, this study describes and compares several Bayesian indices and provides intuitive visual representations of their behavior in relation to common sources of variance, such as sample size, magnitude of effects, and frequentist significance. The results contribute to an intuitive understanding of the values researchers report and allow us to draw sensible recommendations for describing Bayesian statistics, a critical step toward the standardization of scientific reporting.
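
Two of the indices compared can be computed directly from a posterior sample: the probability of direction (pd), the share of the posterior sharing the sign of its median, and the percentage of the posterior inside a region of practical equivalence (ROPE). A minimal Python sketch for a one-dimensional posterior sample, assuming a conventional ROPE of ±0.1 (such indices are implemented in R packages such as bayestestR):

import numpy as np

def p_direction(posterior):
    """Probability of direction: share of the posterior with the
    same sign as its median (ranges from 0.5 to 1)."""
    sign = np.sign(np.median(posterior))
    return np.mean(np.sign(posterior) == sign)

def rope_percentage(posterior, rope=(-0.1, 0.1)):
    """Share of the posterior falling inside the region of
    practical equivalence."""
    return np.mean((posterior >= rope[0]) & (posterior <= rope[1]))

rng = np.random.default_rng(0)
post = rng.normal(0.25, 0.10, size=10_000)  # posterior for some effect
print(p_direction(post))      # ~0.994: the effect is almost surely positive
print(rope_percentage(post))  # ~0.067: little mass is practically null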

9.
Perspect Behav Sci; 42(1): 109-132, 2019 Mar.
Article in English | MEDLINE | ID: mdl-31976424

ABSTRACT

Scientists abstract hypotheses from observations of the world, which they then put to the test of reliability. The best such test is to predict an effect before it occurs. If we can manipulate the independent variables (the efficient causes) that make it occur, then the ability to predict makes control possible. Such control helps to isolate the relevant variables. Control also refers to a comparison condition, conducted to see what would have happened had we not deployed the key ingredient of the hypothesis: scientific knowledge accrues only when we compare what happens in one condition against what happens in another. When the results of such comparisons are not definitive, metrics of the degree of efficacy of the manipulation are required. Many of those derive from statistical inference, and many poorly serve the accumulation of knowledge. Without the ability to replicate an effect, the utility of the principle used to predict or control it is dubious. Traditional models of statistical inference are weak guides to the replicability and utility of results. Several alternatives to null hypothesis testing are sketched: Bayesian, model comparison, and predictive inference (p_rep). Predictive inference shows, for example, that the failure to replicate most results in the Open Science Project was predictable. Replicability is but one aspect of scientific understanding: it establishes the reliability of our data and the predictive ability of our formal models. It is a necessary aspect of scientific progress, even if not by itself sufficient for understanding.
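
Killeen's p_rep, the predictive-inference statistic mentioned here, estimates the probability that an exact replication would find an effect of the same sign. A minimal sketch of the commonly cited normal-theory approximation, assuming equal-powered original and replication studies and no publication selection:

from scipy.stats import norm

def p_rep(p_two_tailed):
    """Killeen's (2005) probability of replicating an effect's sign,
    approximated from a two-tailed p-value under normal theory."""
    z_obs = norm.ppf(1 - p_two_tailed / 2)  # z-score of the observed effect
    # The replicated estimate differs from the original by noise with
    # twice the sampling variance, hence the 1/sqrt(2) shrinkage.
    return norm.cdf(z_obs / 2**0.5)

for p in (0.05, 0.01, 0.20):
    print(f"p = {p:.2f} -> p_rep = {p_rep(p):.3f}")
# p = .05 yields p_rep of about .917: even "significant" results are far
# from guaranteed to replicate in sign.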

10.
Temperature (Austin); 6(3): 181-210, 2019.
Article in English | MEDLINE | ID: mdl-31608303

ABSTRACT

The average environmental and occupational physiologist may find statistics difficult to interpret and use, since formal training in statistics is often limited. Unfortunately, poor statistical practices can generate erroneous or at least misleading results and distort the evidence in the scientific literature. These problems are exacerbated when statistics are used as a thoughtless ritual performed after the data are collected. The situation worsens when statistical results are treated as strict judgments about the data (i.e., significant versus non-significant) without a thought given to how they were calculated or what they mean in practice. We propose that researchers should consider statistics at every step of the research process, whether designing experiments, collecting data, analysing the data or disseminating the results. When statistics are considered an integral part of the research process, from start to finish, several problematic practices can be mitigated. Further, proper practices in disseminating the results of a study can greatly improve the quality of the literature. Within this review, we include a number of reminders and statistical questions researchers should answer throughout the scientific process. Rather than treating statistics as a strict rule-following procedure, we hope that readers will use this review to stimulate discussion around their current practices and attempt to improve them. The code to reproduce all analyses and figures within the manuscript can be found at https://doi.org/10.17605/OSF.IO/BQGDH.

11.
Neuroinformatics; 17(4): 515-545, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30649677

ABSTRACT

Here we address the current issues of inefficiency and over-penalization in the massively univariate approach followed by correction for multiple testing, and propose a more efficient model that pools and shares information among brain regions. Using Bayesian multilevel (BML) modeling, we control two types of error that are more relevant than the conventional false positive rate (FPR): incorrect sign (type S) and incorrect magnitude (type M). BML also aims at two goals: 1) improving modeling efficiency by adopting one integrative model, thereby dissolving the multiple testing issue, and 2) shifting the focus of conventional null hypothesis significance testing (NHST) from the FPR to quality control, by calibrating type S errors while maintaining a reasonable level of inference efficiency. The performance and validity of this approach are demonstrated through an application at the region-of-interest (ROI) level, with all regions on an equal footing: unlike current approaches under NHST, small regions are not disadvantaged simply because of their physical size. In addition, compared to the massively univariate approach, BML may simultaneously achieve increased spatial specificity and inference efficiency, and promote results reporting in totality and transparency. The benefits of BML are illustrated in performance and quality checking using an experimental dataset. The methodology also avoids the current practice of sharp and arbitrary thresholding in the p-value funnel to which the multidimensional data are reduced. The BML approach, with its auxiliary tools, is available as part of the AFNI suite for general use.
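
Type S and type M error rates can be estimated by simulation for any simple estimator: assume a true effect and a standard error, generate repeated estimates, and ask how often the significant ones have the wrong sign (type S) and by how much they overstate the truth (type M, the exaggeration ratio). A minimal sketch in the spirit of Gelman and Carlin's retrodesign, not the AFNI/BML implementation:

import numpy as np
from scipy.stats import norm

def retrodesign(true_effect, se, alpha=0.05, n_sims=1_000_000, seed=0):
    """Simulate type S and type M error rates for a normal estimator."""
    rng = np.random.default_rng(seed)
    est = rng.normal(true_effect, se, size=n_sims)   # repeated estimates
    z_crit = norm.ppf(1 - alpha / 2)                 # two-sided threshold
    sig = np.abs(est) > z_crit * se                  # significant replicates
    type_s = np.mean(np.sign(est[sig]) != np.sign(true_effect))
    type_m = np.mean(np.abs(est[sig])) / abs(true_effect)  # exaggeration ratio
    return type_s, type_m

# A small true effect measured noisily: significant estimates exaggerate
# the truth several-fold and occasionally carry the wrong sign.
print(retrodesign(true_effect=0.1, se=0.15))  # roughly (0.04, 3.6)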


Assuntos
Encéfalo/diagnóstico por imagem , Neuroimagem/métodos , Neuroimagem/estatística & dados numéricos , Teorema de Bayes , Humanos , Método de Monte Carlo
12.
J Neurosurg; 132(2): 662-670, 2019 Feb 8.
Article in English | MEDLINE | ID: mdl-30738384

ABSTRACT

OBJECTIVE: The objective of this study was to evaluate the trends in reporting of p values in the neurosurgical literature from 1990 through 2017. METHODS: All abstracts from the Journal of Neurology, Neurosurgery, and Psychiatry (JNNP), Journal of Neurosurgery (JNS) collection (including Journal of Neurosurgery: Spine and Journal of Neurosurgery: Pediatrics), Neurosurgery (NS), and Journal of Neurotrauma (JNT) available on PubMed from 1990 through 2017 were retrieved. Automated text mining was performed to extract p values from relevant abstracts. Extracted p values were analyzed for temporal trends and characteristics. RESULTS: The search yielded 47,889 relevant abstracts. A total of 34,324 p values were detected in 11,171 abstracts. Since 1990 there has been a steady, proportionate increase in the number of abstracts containing p values. There were average absolute year-on-year increases of 1.2% (95% CI 1.1%-1.3%; p < 0.001), 0.93% (95% CI 0.75%-1.1%; p < 0.001), 0.70% (95% CI 0.57%-0.83%; p < 0.001), and 0.35% (95% CI 0.095%-0.60%; p = 0.0091) of abstracts reporting p values in JNNP, JNS, NS, and JNT, respectively. There have also been average year-on-year increases of 0.045 (95% CI 0.031-0.059; p < 0.001), 0.052 (95% CI 0.037-0.066; p < 0.001), 0.042 (95% CI 0.030-0.054; p < 0.001), and 0.041 (95% CI 0.026-0.056; p < 0.001) p values reported per abstract for these respective journals. The distribution of p values showed a positive skew and strong clustering of values at rounded decimals (i.e., 0.01, 0.02, etc.). Between 83.2% and 89.8% of all reported p values were at or below the "significance" threshold of 0.05 (i.e., p ≤ 0.05). CONCLUSIONS: Trends in reporting of p values and the distribution of p values suggest publication bias remains in the neurosurgical literature.
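
The automated text mining described here reduces to pattern matching over abstract text. A minimal sketch of such an extraction step (the regular expression and normalization are illustrative assumptions, not the authors' exact pipeline):

import re

# Match "p = .03", "p<0.001", "P = 0.049", etc.
P_VALUE_RE = re.compile(r"\bp\s*([<>=≤])\s*(0?\.\d+)", re.IGNORECASE)

def extract_p_values(abstract: str):
    """Return (comparator, value) pairs for each p value in the text."""
    return [(m.group(1), float(m.group(2))) for m in P_VALUE_RE.finditer(abstract)]

text = ("Survival improved in the treatment arm (p < 0.001); "
        "complication rates did not differ (P = .21).")
print(extract_p_values(text))   # [('<', 0.001), ('=', 0.21)]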


Assuntos
Interpretação Estatística de Dados , Procedimentos Neurocirúrgicos/tendências , Publicações Periódicas como Assunto/tendências , Viés de Publicação/tendências , Humanos , Procedimentos Neurocirúrgicos/estatística & dados numéricos , Publicações Periódicas como Assunto/estatística & dados numéricos , Viés de Publicação/estatística & dados numéricos
13.
J Exp Psychopathol; 8(2): 140-157, 2017.
Article in English | MEDLINE | ID: mdl-28748068

ABSTRACT

The principal goals of experimental psychopathology (EPP) research are to offer insights into the pathogenic mechanisms of mental disorders and to provide a stable ground for the development of clinical interventions. The main message of the present article is that those goals are better served by the adoption of Bayesian statistics than by the continued use of null-hypothesis significance testing (NHST). In the first part of the article we list the main disadvantages of NHST and explain why those disadvantages limit the conclusions that can be drawn from EPP research. Next, we highlight the advantages of Bayesian statistics. To illustrate, we then pit NHST and Bayesian analysis against each other using an experimental data set from our lab. Finally, we discuss some challenges when adopting Bayesian statistics. We hope that the present article will encourage experimental psychopathologists to embrace Bayesian statistics, which could strengthen the conclusions drawn from EPP research.

14.
Behav Sci (Basel); 7(3), 2017 Aug 14.
Article in English | MEDLINE | ID: mdl-28805739

ABSTRACT

Four data sets from studies included in the Reproducibility Project were re-analyzed to demonstrate a number of flawed research practices (i.e., "bad habits") of modern psychology. Three of the four studies were successfully replicated, but re-analysis showed that in one study most of the participants responded in a manner inconsistent with the researchers' theoretical model. In the second study, the replicated effect was shown to be an experimental confound, and in the third study the replicated statistical effect was shown to be entirely trivial. The fourth study was an unsuccessful replication, yet re-analysis of the data showed that questioning the common assumptions of modern psychological measurement can lead to novel techniques of data analysis and potentially interesting findings missed by traditional methods of analysis. Considered together, these new analyses show that while it is true replication is a key feature of science, causal inference, modeling, and measurement are equally important and perhaps more fundamental to obtaining truly scientific knowledge of the natural world. It would therefore be prudent for psychologists to confront the limitations and flaws in their current analytical methods and research practices.

15.
PeerJ; 5: e3068, 2017.
Article in English | MEDLINE | ID: mdl-28265523

ABSTRACT

Head et al. (2015) provided a large collection of p-values that, from their perspective, indicates widespread statistical significance seeking (i.e., p-hacking). This paper inspects that result for robustness. Theoretically, the p-value distribution should be a smooth, decreasing function, but the distribution of reported p-values shows systematically more p-values at .01, .02, .03, .04, and .05 than p-values reported to three decimal places, due to an apparent tendency to round p-values to two decimal places. Head et al. (2015) correctly argue that an aggregate p-value distribution could show a bump below .05 when left-skew p-hacking occurs frequently. However, the elimination of p = .045 and p = .05, as done in the original paper, is debatable. Given that eliminating p = .045 resulted from the need for symmetric bins, and that systematically more p-values are reported to two decimal places than to three, I did not exclude p = .045 and p = .05. I applied Fisher's method to the interval .04 < p < .05 and reanalyzed the data by adjusting the bin selection to .03875 < p ≤ .04 versus .04875 < p ≤ .05. The reanalysis indicates that no evidence for left-skew p-hacking remains when the entire range .04 < p < .05 is examined or when the second decimal is inspected. Taking reporting tendencies into account when selecting the bins to compare is especially important because this dataset does not allow recalculation of the p-values. Moreover, inspecting bins that include two-decimal reported p-values potentially increases sensitivity if strategic rounding down of p-values is a widespread form of p-hacking. Given the far-reaching implications of supposedly widespread p-hacking throughout the sciences (Head et al., 2015), it is important that these findings be robust to data-analysis choices if the conclusion is to be considered unequivocal. Although no evidence of widespread left-skew p-hacking is found in this reanalysis, this does not mean that there is no p-hacking at all. These results nuance the conclusion of Head et al. (2015), indicating that their results are not robust and that the evidence for widespread left-skew p-hacking is ambiguous at best.
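
The disputed bump test reduces to comparing counts in two equally wide p-value bins: under a smooth, decreasing p-value distribution the higher bin should not be over-represented. A minimal sketch of such a caliper-style comparison, using the bin edges from the reanalysis (the input data are simulated for illustration):

import numpy as np
from scipy.stats import binomtest

def bump_test(p_values, low_bin=(0.03875, 0.04), high_bin=(0.04875, 0.05)):
    """Compare counts in two equally wide p-value bins; an over-represented
    high bin suggests a bump of p-values just below .05."""
    p = np.asarray(p_values)
    n_low = int(np.sum((p > low_bin[0]) & (p <= low_bin[1])))
    n_high = int(np.sum((p > high_bin[0]) & (p <= high_bin[1])))
    # One-sided binomial test: is the high bin larger than chance allows?
    result = binomtest(n_high, n_low + n_high, 0.5, alternative="greater")
    return n_low, n_high, result.pvalue

# Simulated reported p-values with rounding spikes at .04 and .05
rng = np.random.default_rng(3)
reported = np.concatenate([rng.uniform(0.001, 0.1, 5_000),
                           np.full(300, 0.04), np.full(320, 0.05)])
print(bump_test(reported))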

16.
Otolaryngol Head Neck Surg; 157(6): 915-918, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29192853

ABSTRACT

In biomedical research, it is imperative to differentiate chance variation from truth before we generalize what we see in a sample of subjects to the wider population. For decades, we have relied on null hypothesis significance testing, where we calculate P values for our data to decide whether to reject a null hypothesis. This methodology is subject to substantial misinterpretation and errant conclusions. Instead of working backward by calculating the probability of our data if the null hypothesis were true, Bayesian statistics allow us instead to work forward, calculating the probability of our hypothesis given the available data. This methodology gives us a mathematical means of incorporating our "prior probabilities" from previous study data (if any) to produce new "posterior probabilities." Bayesian statistics tell us how confidently we should believe what we believe. It is time to embrace and encourage their use in our otolaryngology research.
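
The forward calculation described here is Bayes' rule in odds form: posterior odds equal prior odds times the likelihood ratio (a Bayes factor) contributed by the new data. A minimal sketch with illustrative numbers:

def update_probability(prior_prob, bayes_factor):
    """Convert a prior probability to a posterior probability given a
    Bayes factor (likelihood ratio) for the hypothesis."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)

# A hypothesis judged 30% likely before the study; the new data favor it
# by a factor of 6: belief rises to ~72%.
print(update_probability(0.30, 6))   # 0.720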


Assuntos
Teorema de Bayes , Pesquisa Biomédica/estatística & dados numéricos , Otolaringologia/estatística & dados numéricos , Interpretação Estatística de Dados , Humanos , Probabilidade , Projetos de Pesquisa
17.
Front Psychol; 8: 908, 2017.
Article in English | MEDLINE | ID: mdl-28649206

ABSTRACT

Many statistical methods yield the probability of the observed data, or of data more extreme, under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics to communicate what they think the data say.

18.
PeerJ; 4: e1935, 2016.
Article in English | MEDLINE | ID: mdl-27077017

ABSTRACT

Previous studies provided mixed findings on peculiarities in p-value distributions in psychology. This paper examined 258,050 test results across 30,710 articles from eight high-impact journals to investigate the existence of a peculiar prevalence of p-values just below .05 (i.e., a bump) in the psychological literature, and a potential increase thereof over time. We indeed found evidence for a bump just below .05 in the distribution of exactly reported p-values in the journals Developmental Psychology, Journal of Applied Psychology, and Journal of Personality and Social Psychology, but the bump did not increase over the years and disappeared when recalculated p-values were used. We found clear and direct evidence for the QRP "incorrect rounding of p-value" (John, Loewenstein & Prelec, 2012) in all psychology journals. Finally, we also investigated monotonic excess of p-values, an effect of certain QRPs that has been neglected in previous research, and developed two measures to detect it by modeling the distributions of statistically significant p-values. Using simulations and applying the two measures to the retrieved test results, we argue that, although one of the measures suggests the use of QRPs in psychology, it is difficult to draw general conclusions concerning QRPs based on the modeling of p-value distributions.

19.
J Neurosci Methods; 270: 30-45, 2016 Sep 1.
Article in English | MEDLINE | ID: mdl-27317498

ABSTRACT

BACKGROUND: To statistically evaluate the performance of brain-computer interfaces (BCIs), researchers usually rely on null hypothesis significance testing (NHST), i.e. p-values. However, over-reliance on NHST is often identified as one of the causes of the recent reproducibility crisis in psychology and neuroscience. NEW METHOD: In this paper we propose Bayesian estimation as an alternative to NHST in the analysis of BCI performance data. For the three most common experimental designs in BCI research - which would usually be analyzed using a t-test, a linear regression, or an ANOVA - we develop hierarchical models and estimate their parameters using Bayesian inference. Furthermore, we show that the described models are special cases of the hierarchical generalized linear model (HGLM), which we propose as a general framework for the analysis of BCI performance. RESULTS: We demonstrate the effectiveness of the proposed models on three real datasets and show how the results obtained with Bayesian estimation can give a nuanced insight into BCI performance data. Additionally, we provide all the data and code necessary to reproduce the presented results. COMPARISON WITH EXISTING METHOD(S): Compared to NHST, Bayesian estimation with the HGLM allows more flexibility in the analysis of BCI performance data from nested experimental designs, and the obtained results have a more straightforward interpretation. CONCLUSIONS: Besides gains in flexibility and interpretability, a wider adoption of the Bayesian estimation approach in BCI studies could bring about greater transparency in data analysis, allow accumulation of knowledge across studies, and reduce questionable practices such as "p-hacking".
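
The hierarchical idea is to let each subject's accuracy be drawn from a group-level distribution rather than estimated independently, which partially pools noisy subjects toward the group mean. A minimal sketch of such a hierarchical binomial model in Python with PyMC, under hypothetical data; the authors' HGLM framework and their released code are more general:

import numpy as np
import pymc as pm

# Hypothetical BCI data: correct trials out of n_trials for each subject
correct = np.array([78, 85, 62, 91, 70])
n_trials = np.array([100, 100, 100, 100, 100])

with pm.Model() as hierarchical_bci:
    # Group-level mean and spread of subject accuracies (logit scale)
    mu = pm.Normal("mu", 0.0, 1.5)
    sigma = pm.HalfNormal("sigma", 1.0)
    # Per-subject accuracy, partially pooled toward the group mean
    theta_logit = pm.Normal("theta_logit", mu, sigma, shape=len(correct))
    theta = pm.Deterministic("theta", pm.math.sigmoid(theta_logit))
    pm.Binomial("obs", n=n_trials, p=theta, observed=correct)
    idata = pm.sample(2000, tune=1000, random_seed=1)

# Posterior means of the partially pooled per-subject accuracies
print(idata.posterior["theta"].mean(dim=("chain", "draw")).values)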


Assuntos
Interfaces Cérebro-Computador , Estudos de Avaliação como Assunto , Ritmo alfa , Teorema de Bayes , Interpretação Estatística de Dados , Eletroencefalografia/métodos , Potenciais Evocados Visuais , Humanos , Imaginação/fisiologia , Lignanas , Modelos Lineares , Conceitos Matemáticos , Atividade Motora/fisiologia , Córtex Motor/fisiologia , Música , Resolução de Problemas/fisiologia , Descanso , Percepção Visual/fisiologia
20.
Front Psychol; 7: 1444, 2016.
Article in English | MEDLINE | ID: mdl-27713723

ABSTRACT

There is increasing concern about the replicability of studies in psychology and cognitive neuroscience. Hidden data dredging (also called p-hacking) is a major contributor to this crisis because it substantially increases Type I error, resulting in a much larger proportion of false positive findings than the usually expected 5%. To build better intuition for avoiding, detecting, and criticizing some typical problems, here I systematically illustrate how strongly false positive findings can be boosted by data dredging techniques that are easy to implement and therefore, perhaps, frequent. I illustrate several forms of two special cases of data dredging. First, researchers may violate the data collection stopping rules of null hypothesis significance testing by repeatedly checking for statistical significance with various numbers of participants. Second, researchers may group participants post hoc along potential but unplanned independent grouping variables. The first approach 'hacks' the number of participants in studies; the second 'hacks' the number of variables in the analysis. I demonstrate the high rate of false positive findings generated by these techniques with data from true null distributions. I also illustrate that it is extremely easy to introduce strong bias into data by very mild selection and re-testing. Similar, usually undocumented data dredging steps can easily lead to 20-50% or more false positives.
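
The first technique, repeatedly checking significance as participants accrue, is easy to reproduce: draw data from a true null, test after every batch, and stop at the first significant result. A minimal sketch (batch size and maximum sample size are illustrative):

import numpy as np
from scipy import stats

def optional_stopping_fpr(n_sims=2_000, start_n=10, step=10, max_n=100, seed=0):
    """False positive rate of a one-sample t-test under a true null when
    significance is checked after every batch of new participants."""
    rng = np.random.default_rng(seed)
    false_positives = 0
    for _ in range(n_sims):
        data = rng.normal(0, 1, size=max_n)      # true null: mean is 0
        for n in range(start_n, max_n + 1, step):
            if stats.ttest_1samp(data[:n], 0).pvalue < 0.05:
                false_positives += 1             # stop at first "hit"
                break
    return false_positives / n_sims

# Peeking every 10 participants inflates the nominal 5% considerably
print(optional_stopping_fpr())   # roughly 0.15-0.20 rather than 0.05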
