Search | BVS Research Portal

1.

baymedr: an R package and web application for the calculation of Bayes factors for superiority, equivalence, and non-inferiority designs.

Linde, Maximilian; van Ravenzwaaij, Don.

BMC Med Res Methodol ; 23(1): 279, 2023 11 24.

Article in English | MEDLINE | ID: mdl-38001458

ABSTRACT

BACKGROUND: Clinical trials often seek to determine the superiority, equivalence, or non-inferiority of an experimental condition (e.g., a new drug) compared to a control condition (e.g., a placebo or an already existing drug). The use of frequentist statistical methods to analyze data for these types of designs is ubiquitous even though they have several limitations. Bayesian inference remedies many of these shortcomings and allows for intuitive interpretations, but Bayesian methods are currently difficult to implement for the applied researcher. RESULTS: We outline the frequentist conceptualization of superiority, equivalence, and non-inferiority designs and discuss its disadvantages. Subsequently, we explain how Bayes factors can be used to compare the relative plausibility of competing hypotheses. We present baymedr, an R package and web application that provides user-friendly tools for the computation of Bayes factors for superiority, equivalence, and non-inferiority designs. Instructions on how to use baymedr are provided, and an example illustrates how existing results can be reanalyzed with baymedr. CONCLUSIONS: Our baymedr R package and web application enable researchers to conduct Bayesian superiority, equivalence, and non-inferiority tests. baymedr is characterized by a user-friendly implementation, making it convenient for researchers who are not statistical experts. Using baymedr, it is possible to calculate Bayes factors from either raw data or summary statistics.

Subjects

Research Design, Humans, Bayes Theorem

2.

Comparing the evidential strength for psychotropic drugs: a Bayesian meta-analysis.

Pittelkow, Merle-Marie; de Vries, Ymkje Anna; Monden, Rei; Bastiaansen, Jojanneke A; van Ravenzwaaij, Don.

Psychol Med ; 51(16): 2752-2761, 2021 12.

Article in English | MEDLINE | ID: mdl-34620261

ABSTRACT

Approval and prescription of psychotropic drugs should be informed by the strength of evidence for efficacy. Using a Bayesian framework, we examined (1) whether psychotropic drugs are supported by substantial evidence (at the time of approval by the Food and Drug Administration), and (2) whether there are systematic differences across drug groups. Data from short-term, placebo-controlled phase II/III clinical trials for 15 antipsychotics, 16 antidepressants for depression, nine antidepressants for anxiety, and 20 drugs for attention deficit hyperactivity disorder (ADHD) were extracted from FDA reviews. Bayesian model-averaged meta-analysis was performed and strength of evidence was quantified (i.e., BF_BMA). Strength of evidence and trialling varied between drugs. Median evidential strength was extreme for ADHD medication (BF_BMA = 1820.4), moderate for antipsychotics (BF_BMA = 365.4), and considerably lower and more frequently classified as weak or moderate for antidepressants for depression (BF_BMA = 94.2) and anxiety (BF_BMA = 49.8). Varying median effect sizes (ES_schizophrenia = 0.45, ES_depression = 0.30, ES_anxiety = 0.37, ES_ADHD = 0.72), sample sizes (N_schizophrenia = 324, N_depression = 218, N_anxiety = 254, N_ADHD = 189.5), and numbers of trials (k_schizophrenia = 3, k_depression = 5.5, k_anxiety = 3, k_ADHD = 2) might account for differences. Although most drugs were supported by strong evidence at the time of approval, some only had moderate or ambiguous evidence. These results show the need for more systematic quantification and classification of statistical evidence for psychotropic drugs. Evidential strength should be communicated transparently and clearly towards clinical decision makers.

Subjects

Antipsychotic Agents, Attention Deficit Disorder with Hyperactivity, Humans, Antipsychotic Agents/therapeutic use, Bayes Theorem, Psychotropic Drugs/therapeutic use, Antidepressive Agents/therapeutic use, Attention Deficit Disorder with Hyperactivity/drug therapy

3.

Metastudies for robust tests of theory.

Baribault, Beth; Donkin, Chris; Little, Daniel R; Trueblood, Jennifer S; Oravecz, Zita; van Ravenzwaaij, Don; White, Corey N; De Boeck, Paul; Vandekerckhove, Joachim.

Proc Natl Acad Sci U S A ; 115(11): 2607-2612, 2018 03 13.

Article in English | MEDLINE | ID: mdl-29531092

ABSTRACT

We describe and demonstrate an empirical strategy useful for discovering and replicating empirical effects in psychological science. The method involves the design of a metastudy, in which many independent experimental variables, which may be moderators of an empirical effect, are indiscriminately randomized. Radical randomization yields rich datasets that can be used to test the robustness of an empirical claim to some of the vagaries and idiosyncrasies of experimental protocols and enhances the generalizability of these claims. The strategy is made feasible by advances in hierarchical Bayesian modeling that allow for the pooling of information across unlike experiments and designs, and is proposed here as a gold standard for replication research and exploratory research. The practical feasibility of the strategy is demonstrated with a replication of a study on subliminal priming.

Subjects

Biomedical Research/standards, Research Design/standards, Bayes Theorem, Statistical Data Interpretation, Humans, Random Allocation

4.

True and false positive rates for different criteria of evaluating statistical evidence from clinical trials.

van Ravenzwaaij, Don; Ioannidis, John P A.

BMC Med Res Methodol ; 19(1): 218, 2019 11 27.

Article in English | MEDLINE | ID: mdl-31775644

ABSTRACT

BACKGROUND: Until recently, a rule often used by the Food and Drug Administration for endorsing new medications was the existence of at least two statistically significant clinical trials favoring the new medication. This rule has consequences for the true positive rate (endorsement of an effective treatment) and the false positive rate (endorsement of an ineffective treatment). METHODS: In this paper, we compare true positive and false positive rates for different evaluation criteria through simulations that rely on (1) conventional p-values; (2) confidence intervals based on meta-analyses assuming fixed or random effects; and (3) Bayes factors. We varied threshold levels for statistical evidence, thresholds for what constitutes a clinically meaningful treatment effect, and the number of trials conducted. RESULTS: Our results show that Bayes factors, meta-analytic confidence intervals, and p-values often have similar performance. Bayes factors may perform better when the number of trials conducted is high and when trials have small sample sizes and clinically meaningful effects are not small, particularly in fields where the number of non-zero effects is relatively large. CONCLUSIONS: Thinking about realistic effect sizes in conjunction with desirable levels of statistical evidence, as well as quantifying statistical evidence with Bayes factors, may help improve decision-making in some circumstances.
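The consequences of the two-significant-trials rule can be made concrete with a binomial calculation. This is a minimal analytic sketch, not the simulation reported in the paper: it assumes independent trials, a two-sided alpha of 0.05 (so each trial of an ineffective drug comes out significant in the drug's favour with probability 0.025), and an assumed 80% power per trial for an effective drug. `prob_at_least` is a hypothetical helper.

```python
from math import comb


def prob_at_least(k_min: int, n_trials: int, p_sig: float) -> float:
    """Probability that at least k_min of n_trials independent trials are significant,
    when each trial is significant with probability p_sig (binomial upper tail)."""
    return sum(
        comb(n_trials, k) * p_sig**k * (1 - p_sig) ** (n_trials - k)
        for k in range(k_min, n_trials + 1)
    )


# Assumption: two-sided alpha = 0.05, so a trial of an ineffective drug is
# significant *in favour of* the drug with probability 0.025.
P_SIG_INEFFECTIVE = 0.025
# Assumption: each trial of an effective drug has 80% power.
P_SIG_EFFECTIVE = 0.80

for n in (2, 3, 5):
    fpr = prob_at_least(2, n, P_SIG_INEFFECTIVE)
    tpr = prob_at_least(2, n, P_SIG_EFFECTIVE)
    print(f"{n} trials: false-positive rate {fpr:.5f}, true-positive rate {tpr:.3f}")
```

With exactly two trials, these assumptions give a false-positive rate of 0.025² = 0.000625 and a true-positive rate of 0.8² = 0.64. Allowing more trials while still requiring only two significant ones raises both rates, which is one reason the number of trials conducted matters.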

Subjects

Bayes Theorem, Clinical Trials as Topic, Statistical Data Interpretation, Drug Approval, False Negative Reactions, False Positive Reactions, Humans, Predictive Value of Tests, Sample Size

5.

Bayes factors for superiority, non-inferiority, and equivalence designs.

van Ravenzwaaij, Don; Monden, Rei; Tendeiro, Jorge N; Ioannidis, John P A.

BMC Med Res Methodol ; 19(1): 71, 2019 03 29.

Article in English | MEDLINE | ID: mdl-30925900

ABSTRACT

BACKGROUND: In clinical trials, study designs may focus on assessing the superiority, equivalence, or non-inferiority of a new medicine or treatment as compared to a control. Typically, evidence in each of these paradigms is quantified with a variant of the null hypothesis significance test. A null hypothesis is assumed (for superiority, non-inferiority, and equivalence designs, respectively: a null effect; inferiority by a specific amount; and inferiority or superiority by a specific amount), after which p-values quantify the probability, under these null hypotheses, of obtaining data at least as extreme as those observed. Although ubiquitous in clinical testing, the null hypothesis significance test can make the statistical evidence difficult to interpret. METHODS: We advocate quantifying evidence instead by means of Bayes factors and highlight how these can be calculated for different types of research design. RESULTS: We illustrate Bayes factors in practice with reanalyses of data from existing published studies. CONCLUSIONS: Bayes factors for superiority, non-inferiority, and equivalence designs allow for explicit quantification of evidence in favor of the null hypothesis. They also allow for interim testing without the need to employ explicit corrections for multiple testing.
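The three designs can be handled with one family of calculations. The sketch below is not the t-test-based procedure of the paper: it assumes a normal likelihood for an observed effect estimate `d` with known standard error `se`, a normal N(0, tau²) prior on the true effect delta with an arbitrary prior scale `tau`, and a margin `m`; all function names and input values are hypothetical. Superiority pits a one-sided alternative (delta > 0) against the point null delta = 0, while non-inferiority (delta > -m) and equivalence (-m < delta < m) are treated as interval hypotheses compared with their complements, computed as posterior odds divided by prior odds.

```python
from math import inf
from statistics import NormalDist


def posterior(d: float, se: float, tau: float) -> NormalDist:
    """Posterior of the true effect delta given estimate d ~ N(delta, se^2)
    and a N(0, tau^2) prior (conjugate normal-normal update)."""
    var = 1 / (1 / se**2 + 1 / tau**2)
    return NormalDist(mu=var * d / se**2, sigma=var**0.5)


def superiority_bf(d: float, se: float, tau: float) -> float:
    """BF(+0): H+ (delta > 0, half-normal prior) versus the point null delta = 0."""
    # Two-sided BF10: marginal density of d under H1 divided by that under H0.
    bf10 = NormalDist(0, (se**2 + tau**2) ** 0.5).pdf(d) / NormalDist(0, se).pdf(d)
    # Truncating the prior to delta > 0 rescales BF10 by P(delta > 0 | d) / P(delta > 0).
    return bf10 * (1 - posterior(d, se, tau).cdf(0)) / 0.5


def interval_bf(d: float, se: float, tau: float, lower: float, upper: float) -> float:
    """BF for lower < delta < upper versus its complement: posterior odds / prior odds."""
    prior, post = NormalDist(0, tau), posterior(d, se, tau)
    prior_in = prior.cdf(upper) - prior.cdf(lower)
    post_in = post.cdf(upper) - post.cdf(lower)
    return (post_in / (1 - post_in)) / (prior_in / (1 - prior_in))


# Illustrative (made-up) inputs: effect estimate, its standard error, prior scale, margin.
d, se, tau, m = 0.15, 0.10, 0.5, 0.2
print("superiority     BF+0:", superiority_bf(d, se, tau))
print("non-inferiority BF:  ", interval_bf(d, se, tau, -m, inf))
print("equivalence     BF:  ", interval_bf(d, se, tau, -m, m))
```

Because the equivalence Bayes factor can exceed 1, this kind of calculation expresses evidence in favour of the (interval) null rather than only failing to reject it.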

Subjects

Algorithms, Bayes Theorem, Evidence-Based Medicine/statistics & numerical data, Health Care Outcome Assessment/statistics & numerical data, Research Design, Biometry/methods, Evidence-Based Medicine/methods, Humans, Health Care Outcome Assessment/methods, Therapeutic Equivalency

6.

Severity of illness and adaptive functioning predict quality of care of children among parents with psychosis: A confirmatory factor analysis.

Campbell, Linda E; Hanlon, Mary-Claire; Galletly, Cherrie A; Harvey, Carol; Stain, Helen; Cohen, Martin; van Ravenzwaaij, Don; Brown, Scott.

Aust N Z J Psychiatry ; 52(5): 435-445, 2018 05.

Article in English | MEDLINE | ID: mdl-29103308

ABSTRACT

OBJECTIVE: Parenthood is central to the personal and social identity of many people. For individuals with psychotic disorders, parenthood is often associated with formidable challenges. We aimed to identify predictors of adequate parenting among parents with psychotic disorders. METHODS: Data pertaining to 234 parents with psychotic disorders living with dependent children were extracted from a population-based prevalence study, the 2010 second Australian national survey of psychosis, and analysed using confirmatory factor analysis. Parenting outcome was defined as quality of care of children, based on participant report and interviewer enquiry/exploration, and included level of participation, interest and competence in childcare during the last 12 months. RESULTS: Five hypothesis-driven latent variables were constructed and labelled psychosocial support, illness severity, substance abuse/dependence, adaptive functioning and parenting role. Importantly, 75% of participants were not identified as having any dysfunction in the quality of care provided to their child(ren). Severity of illness and adaptive functioning were reliably associated with quality of childcare. Psychosocial support, substance abuse/dependence and parenting role had an indirect relationship to the outcome variable via their association with severity of illness and/or adaptive functioning. CONCLUSION: The majority of parents in the current sample provided adequate parenting. However, greater symptom severity and poorer adaptive functioning ultimately leave parents with significant difficulties and in need of assistance to manage their parenting obligations. As symptoms and functioning can change episodically for people with psychotic illness, targeted and flexible support that can deliver temporary assistance during times of need is necessary. This would maximise the quality of care provided to vulnerable children, with potential long-term benefits.

Subjects

Psychological Adaptation, Child Rearing, Child of Impaired Parents, Parenting, Parents, Psychotic Disorders, Severity of Illness Index, Adult, Australia, Child, Statistical Factor Analysis, Female, Health Surveys, Humans, Male, Middle Aged, Social Support, Young Adult

7.

Of matchers and maximizers: How competition shapes choice under risk and uncertainty.

Schulze, Christin; van Ravenzwaaij, Don; Newell, Ben R.

Cogn Psychol ; 78: 78-98, 2015 May.

Article in English | MEDLINE | ID: mdl-25868112

ABSTRACT

In a world of limited resources, scarcity and rivalry are central challenges for decision makers: animals foraging for food, corporations seeking maximal profits, and athletes training to win all strive against others competing for the same goals. In this article, we establish the role of competitive pressures for the facilitation of optimal decision making in simple sequential binary choice tasks. In two experiments, competition was introduced with a computerized opponent whose choice behavior reinforced one of two strategies: If the opponent probabilistically imitated participant choices, probability matching was optimal; if the opponent was indifferent, probability maximizing was optimal. We observed accurate asymptotic strategy use in both conditions irrespective of the provision of outcome probabilities, suggesting that participants were sensitive to the differences in opponent behavior. An analysis of reinforcement learning models established that computational conceptualizations of opponent behavior are critical to account for the observed divergence in strategy adoption. Our results provide a novel appraisal of probability matching and show how this individually 'irrational' choice phenomenon can be socially adaptive under competition.

Subjects

Choice Behavior, Competitive Behavior, Risk, Uncertainty, Adolescent, Decision Making, Female, Humans, Male, Probability, Psychological Reinforcement, Young Adult

8.

Is the unconscious, if it exists, a superior decision maker?

Huizenga, Hilde M; van Duijvenvoorde, Anna C K; van Ravenzwaaij, Don; Wetzels, Ruud; Jansen, Brenda R J.

Behav Brain Sci ; 37(1): 32-3, 2014 Feb.

Article in English | MEDLINE | ID: mdl-24461083

ABSTRACT

Newell & Shanks (N&S) show that there is no convincing evidence that processes assumed to be unconscious and superior are indeed unconscious. We take their argument one step further by showing that there is also no convincing evidence that these processes are superior. We review alternative paradigms that may provide more convincing tests of the superiority of (presumed) unconscious processes.

Subjects

Decision Making, Psychological Unconscious, Humans

9.

Comparing researchers' degree of dichotomous thinking using frequentist versus Bayesian null hypothesis testing.

Muradchanian, Jasmine; Hoekstra, Rink; Kiers, Henk; Fife, Dustin; van Ravenzwaaij, Don.

Sci Rep ; 14(1): 12120, 2024 May 27.

Article in English | MEDLINE | ID: mdl-38802451

ABSTRACT

A large share of the scientific literature in the social and behavioural sciences bases its conclusions on one or more hypothesis tests. As such, it is important to learn more about how researchers in the social and behavioural sciences interpret quantities that result from hypothesis tests, such as p-values and Bayes factors. In the present study, we explored the relationship between obtained statistical evidence and the degree of belief or confidence that there is a positive effect in the population of interest. In particular, we were interested in the existence of a so-called cliff effect: a qualitative drop in the degree of belief that there is a positive effect around certain threshold values of statistical evidence (e.g., at p = 0.05). We compared this relationship for p-values to the relationship for corresponding degrees of evidence quantified through Bayes factors, and we examined whether this relationship was affected by two different modes of presentation (in one mode the functional form of the relationship across values was implicit to the participant, whereas in the other mode it was explicit). We found evidence for a higher proportion of cliff effects in p-value conditions than in Bayes factor (BF) conditions (N = 139), but we did not get a clear indication of whether presentation mode had an effect on the proportion of cliff effects. PROTOCOL REGISTRATION: The stage 1 protocol for this Registered Report was accepted in principle on 2 June 2023. The protocol, as accepted by the journal, can be found at: https://doi.org/10.17605/OSF.IO/5CW6P .

10.

Probability matching in risky choice: the interplay of feedback and strategy availability.

Newell, Ben R; Koehler, Derek J; James, Greta; Rakow, Tim; van Ravenzwaaij, Don.

Mem Cognit ; 41(3): 329-38, 2013 Apr.

Article in English | MEDLINE | ID: mdl-23135749

ABSTRACT

Probability matching in sequential decision making is a striking violation of rational choice that has been observed in hundreds of experiments. Recent studies have demonstrated that matching persists even in described tasks in which all the information required for identifying a superior alternative strategy (maximizing) is present before the first choice is made. These studies have also indicated that maximizing increases when (1) the asymmetry in the availability of matching and maximizing strategies is reduced and (2) normatively irrelevant outcome feedback is provided. In the two experiments reported here, we examined the joint influences of these factors, revealing that strategy availability and outcome feedback operate on different time courses. Both behavioral and modeling results showed that while availability of the maximizing strategy increases the choice of maximizing early during the task, feedback appears to act more slowly to erode misconceptions about the task and to reinforce optimal responding. The results illuminate the interplay between "top-down" identification of choice strategies and "bottom-up" discovery of those strategies via feedback.
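Why matching counts as a violation of rational choice in a non-competitive task can be shown with a short expected-accuracy calculation. This sketch assumes independent outcomes in which the more frequent outcome occurs with probability 0.7 (an illustrative value, not taken from the experiments); `expected_accuracy` is a hypothetical helper.

```python
def expected_accuracy(p_common: float, p_choose_common: float) -> float:
    """Expected proportion of correct predictions when outcomes are independent,
    the common outcome occurs with probability p_common, and the agent predicts
    the common outcome with probability p_choose_common."""
    return p_choose_common * p_common + (1 - p_choose_common) * (1 - p_common)


p = 0.7  # assumed probability of the more frequent outcome
matching = expected_accuracy(p, p)      # predict "common" on 70% of trials
maximizing = expected_accuracy(p, 1.0)  # always predict "common"
print(f"matching: {matching:.2f}, maximizing: {maximizing:.2f}")
# prints "matching: 0.58, maximizing: 0.70"
```

Matching earns 0.7² + 0.3² = 0.58, whereas always choosing the more likely outcome earns 0.70, so maximizing is the superior strategy whenever outcomes do not depend on the agent's own choices.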

Subjects

Decision Making, Psychological Feedback, Probability, Problem Solving, Adult, Choice Behavior, Female, Humans, Male, Psychological Models, Random Allocation, Young Adult

11.

A quantum of truth? Querying the alternative benchmark for human cognition.

Newell, Ben R; van Ravenzwaaij, Don; Donkin, Chris.

Behav Brain Sci ; 36(3): 300-2, 2013 Jun.

Article in English | MEDLINE | ID: mdl-23673047

ABSTRACT

We focus on two issues: (1) an unusual, counterintuitive prediction that quantum probability (QP) theory appears to make regarding multiple sequential judgments, and (2) the extent to which QP is an appropriate and comprehensive benchmark for assessing judgment. These issues highlight how QP theory can fall prey to the same problems of arbitrariness that Pothos & Busemeyer (P&B) discuss as plaguing other models.

Subjects

Cognition, Psychological Models, Probability Theory, Quantum Theory, Humans

12.

The role of results in deciding to publish: A direct comparison across authors, reviewers, and editors based on an online survey.

Muradchanian, Jasmine; Hoekstra, Rink; Kiers, Henk; van Ravenzwaaij, Don.

PLoS One ; 18(10): e0292279, 2023.

Article in English | MEDLINE | ID: mdl-37788282

ABSTRACT

BACKGROUND: Publishing study results in scientific journals has been the standard way of disseminating science. However, whether results get published may depend on their statistical significance. As a consequence, the representation of scientific knowledge might be biased. This type of bias has been called publication bias. The main objective of the present study is to gain more insight into publication bias by examining it at the author, reviewer, and editor level. Additionally, we make a direct comparison between publication bias induced by authors, by reviewers, and by editors. We approached our participants by e-mail, asking them to fill out an online survey. RESULTS: Our findings suggest that statistically significant findings are more likely to be published than statistically non-significant findings, because (1) authors (n = 65) are more likely to write up and submit articles with significant results compared to articles with non-significant results (median effect size 1.10, BF10 = 1.09 × 10^7); (2) reviewers (n = 60) give more favourable reviews to articles with significant results compared to articles with non-significant results (median effect size 0.58, BF10 = 4.73 × 10^2); and (3) editors (n = 171) are more likely to accept for publication articles with significant results compared to articles with non-significant results (median effect size 0.94, BF10 = 7.63 × 10^7). Evidence on differences in the relative contributions to publication bias by authors, reviewers, and editors is ambiguous (editors vs reviewers: BF10 = 0.31, reviewers vs authors: BF10 = 3.11, and editors vs authors: BF10 = 0.42). DISCUSSION: One of the main limitations was that rather than investigating publication bias directly, we studied the potential for publication bias. Another limitation was the low response rate to the survey.

Subjects

Authorship, Writing, Humans, Publication Bias, Surveys and Questionnaires, Electronic Mail

13.

Decisions about equivalence: A comparison of TOST, HDI-ROPE, and the Bayes factor.

Linde, Maximilian; Tendeiro, Jorge N; Selker, Ravi; Wagenmakers, Eric-Jan; van Ravenzwaaij, Don.

Psychol Methods ; 28(3): 740-755, 2023 Jun.

Article in English | MEDLINE | ID: mdl-34735173

ABSTRACT

Some important research questions require the ability to find evidence for two conditions being practically equivalent. This is impossible to accomplish within the traditional frequentist null hypothesis significance testing framework; hence, other methodologies must be utilized. We explain and illustrate three approaches for finding evidence for equivalence: The frequentist two one-sided tests procedure, the Bayesian highest density interval region of practical equivalence procedure, and the Bayes factor interval null procedure. We compare the classification performances of these three approaches for various plausible scenarios. The results indicate that the Bayes factor interval null approach compares favorably to the other two approaches in terms of statistical power. Critically, compared with the Bayes factor interval null procedure, the two one-sided tests and the highest density interval region of practical equivalence procedures have limited discrimination capabilities when the sample size is relatively small: Specifically, in order to be practically useful, these two methods generally require over 250 cases within each condition when rather large equivalence margins of approximately .2 or .3 are used; for smaller equivalence margins even more cases are required. Because of these results, we recommend that researchers rely more on the Bayes factor interval null approach for quantifying evidence for equivalence, especially for studies that are constrained on sample size. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
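The logic of the frequentist two one-sided tests (TOST) procedure can be sketched in a few lines. As simplifying assumptions, this uses a normal (z) approximation rather than the t-tests normally used, and approximates the standard error of a standardized mean difference near zero as sqrt(2/n) for n cases per group; `tost_z` is a hypothetical helper.

```python
from statistics import NormalDist


def tost_z(diff: float, se: float, margin: float, alpha: float = 0.05) -> tuple[float, bool]:
    """Two one-sided z-tests of H0: |true difference| >= margin.

    Returns the TOST p-value (the larger of the two one-sided p-values) and
    whether equivalence is declared at level alpha."""
    std_normal = NormalDist()
    p_lower = 1 - std_normal.cdf((diff + margin) / se)  # H0: difference <= -margin
    p_upper = std_normal.cdf((diff - margin) / se)      # H0: difference >= +margin
    p = max(p_lower, p_upper)
    return p, p < alpha


# Observed standardized difference of exactly 0, equivalence margin 0.2,
# se approximated as sqrt(2 / n) for n cases per group (an assumption).
for n in (100, 250):
    p, equivalent = tost_z(0.0, (2 / n) ** 0.5, 0.2)
    print(f"n = {n} per group: TOST p = {p:.3f}, equivalence declared: {equivalent}")
```

Even with an observed difference of exactly zero and a margin of 0.2, this approximation declares equivalence at alpha = .05 with 250 cases per group but not with 100, which illustrates the sample-size demands described in the abstract.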

Subjects

Research Design, Humans, Bayes Theorem, Sample Size

14.

The process of replication target selection in psychology: what to consider?

Pittelkow, Merle-Marie; Field, Sarahanne M; Isager, Peder M; Van't Veer, Anna E; Anderson, Thomas; Cole, Scott N; Dominik, Tomás; Giner-Sorolla, Roger; Gok, Sebahat; Heyman, Tom; Jekel, Marc; Luke, Timothy J; Mitchell, David B; Peels, Rik; Pendrous, Rosina; Sarrazin, Samuel; Schauer, Jacob M; Specker, Eva; Tran, Ulrich S; Vranka, Marek A; Wicherts, Jelte M; Yoshimura, Naoto; Zwaan, Rolf A; van Ravenzwaaij, Don.

R Soc Open Sci ; 10(2): 210586, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36756069

ABSTRACT

Increased execution of replication studies contributes to the effort to restore the credibility of empirical research. However, a second generation of problems arises: the number of potential replication targets is seriously mismatched with the available resources. Given limited resources, replication target selection should be well justified, systematic and transparently communicated. At present, the discussion on what to consider when selecting a replication target is limited to theoretical discussion, self-reported justifications and a few formalized suggestions. In this Registered Report, we proposed a study involving the scientific community to create a list of considerations for consultation when selecting a replication target in psychology. We employed a modified Delphi approach. First, we constructed a preliminary list of considerations. Second, we surveyed psychologists who had previously selected a replication target with regard to their considerations. Third, we incorporated the results into the preliminary list of considerations and sent the updated list to a group of individuals knowledgeable about concerns regarding replication target selection. Over the course of several rounds, we established consensus regarding what to consider when selecting a replication target. The resulting checklist can be used for transparently communicating the rationale for selecting studies for replication.

15.

A review of applications of the Bayes factor in psychological research.

Heck, Daniel W; Boehm, Udo; Böing-Messing, Florian; Bürkner, Paul-Christian; Derks, Koen; Dienes, Zoltan; Fu, Qianrao; Gu, Xin; Karimova, Diana; Kiers, Henk A L; Klugkist, Irene; Kuiper, Rebecca M; Lee, Michael D; Leenders, Roger; Leplaa, Hidde J; Linde, Maximilian; Ly, Alexander; Meijerink-Bosman, Marlyne; Moerbeek, Mirjam; Mulder, Joris; Palfi, Bence; Schönbrodt, Felix D; Tendeiro, Jorge N; van den Bergh, Don; Van Lissa, Caspar J; van Ravenzwaaij, Don; Vanpaemel, Wolf; Wagenmakers, Eric-Jan; Williams, Donald R; Zondervan-Zwijnenburg, Mariëlle; Hoijtink, Herbert.

Psychol Methods ; 28(3): 558-579, 2023 Jun.

Article in English | MEDLINE | ID: mdl-35298215

ABSTRACT

The last 25 years have shown a steady increase in attention to the Bayes factor as a tool for hypothesis evaluation and model selection. The present review highlights the potential of the Bayes factor in psychological research. We discuss six types of applications: Bayesian evaluation of point null, interval, and informative hypotheses; Bayesian evidence synthesis; Bayesian variable selection and model averaging; and Bayesian evaluation of cognitive models. We elaborate on what each application entails, give illustrative examples, and provide an overview of key references and software with links to other applications. The article concludes with a discussion of the opportunities and pitfalls of Bayes factor applications and a sketch of corresponding future research lines. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

Subjects

Bayes Theorem, Behavioral Research, Psychology, Humans, Behavioral Research/methods, Psychology/methods, Software, Research Design

16.

Predicting reliability through structured expert elicitation with the repliCATS (Collaborative Assessments for Trustworthy Science) process.

Fraser, Hannah; Bush, Martin; Wintle, Bonnie C; Mody, Fallon; Smith, Eden T; Hanea, Anca M; Gould, Elliot; Hemming, Victoria; Hamilton, Daniel G; Rumpff, Libby; Wilkinson, David P; Pearson, Ross; Singleton Thorn, Felix; Ashton, Raquel; Willcox, Aaron; Gray, Charles T; Head, Andrew; Ross, Melissa; Groenewegen, Rebecca; Marcoci, Alexandru; Vercammen, Ans; Parker, Timothy H; Hoekstra, Rink; Nakagawa, Shinichi; Mandel, David R; van Ravenzwaaij, Don; McBride, Marissa; Sinnott, Richard O; Vesk, Peter; Burgman, Mark; Fidler, Fiona.

PLoS One ; 18(1): e0274429, 2023.

Article in English | MEDLINE | ID: mdl-36701303

ABSTRACT

As replications of individual studies are resource intensive, techniques for predicting replicability are required. We introduce the repliCATS (Collaborative Assessments for Trustworthy Science) process, a new method for eliciting expert predictions about the replicability of research. This process is a structured expert elicitation approach based on a modified Delphi technique applied to the evaluation of research claims in the social and behavioural sciences. The utility of processes to predict replicability lies in their capacity to test scientific claims without the costs of full replication. Experimental data support the validity of this process, with a validation study producing a classification accuracy of 84% and an Area Under the Curve of 0.94, meeting or exceeding the accuracy of other techniques used to predict replicability. The repliCATS process provides other benefits. It is highly scalable, able to be deployed both for rapid assessment of small numbers of claims and for assessment of high volumes of claims over an extended period through an online elicitation platform; it has been used to assess 3000 research claims over an 18-month period. It is available to be implemented in a range of ways, and we describe one such implementation. An important advantage of the repliCATS process is that it collects qualitative data that have the potential to provide insight into the limits of generalizability of scientific claims. The primary limitation of the repliCATS process is its reliance on human-derived predictions, with consequent costs in terms of participant fatigue, although careful design can minimise these costs. The repliCATS process has potential applications in alternative peer review and in the allocation of effort for replication studies.

Subjects

Behavioral Sciences, Data Accuracy, Humans, Reproducibility of Results, Costs and Cost Analysis, Peer Review

17.

Advantages masquerading as "issues" in Bayesian hypothesis testing: A commentary on Tendeiro and Kiers (2019).

van Ravenzwaaij, Don; Wagenmakers, Eric-Jan.

Psychol Methods ; 27(3): 451-465, 2022 Jun.

Article in English | MEDLINE | ID: mdl-34881956

ABSTRACT

Tendeiro and Kiers (2019) provide a detailed and scholarly critique of Null Hypothesis Bayesian Testing (NHBT) and its central component, the Bayes factor, which allows researchers to update knowledge and quantify statistical evidence. Tendeiro and Kiers conclude that NHBT constitutes an improvement over frequentist p-values, but primarily elaborate on a list of 11 "issues" of NHBT. We believe that several issues identified by Tendeiro and Kiers are of central importance for elucidating the complementary roles of hypothesis testing versus parameter estimation and for appreciating the virtue of statistical thinking over conducting statistical rituals. But although we agree with many of their thoughtful recommendations, we believe that Tendeiro and Kiers are overly pessimistic, and that several of their "issues" with NHBT may in fact be regarded as pronounced advantages. We illustrate our arguments with simple, concrete examples and end with a critical discussion of one of the recommendations by Tendeiro and Kiers, which is that "estimation of the full posterior distribution offers a more complete picture" than a Bayes factor hypothesis test. (PsycInfo Database Record (c) 2022 APA, all rights reserved).

Subjects

Knowledge, Research Design, Bayes Theorem, Humans

18.

Worked-out examples of the adequacy of Bayesian optional stopping.

Tendeiro, Jorge N; Kiers, Henk A L; van Ravenzwaaij, Don.

Psychon Bull Rev ; 29(1): 70-87, 2022 Feb.

Article in English | MEDLINE | ID: mdl-34254263

ABSTRACT

The practice of sequentially testing a null hypothesis as data are collected until the null hypothesis is rejected is known as optional stopping. It is well known that optional stopping is problematic in the context of p value-based null hypothesis significance testing: the false-positive rates quickly exceed the single test's significance level. However, the state of affairs under null hypothesis Bayesian testing, where p values are replaced by Bayes factors, has, perhaps surprisingly, been much less settled. Rouder (2014) used simulations to defend the use of optional stopping under null hypothesis Bayesian testing. The idea behind these simulations is closely related to the idea of sampling from prior predictive distributions. Deng et al. (2016) and Hendriksen et al. (2020) have provided mathematical evidence that the validity of optional stopping under null hypothesis Bayesian testing does hold under some conditions. These papers are, however, exceedingly technical for most researchers in the applied social sciences. In this paper, we provide some mathematical derivations concerning Rouder's approximate simulation results for the two Bayesian hypothesis tests that he considered. The key idea is to consider the probability distribution of the Bayes factor, which is regarded as a random variable across repeated sampling. This paper therefore offers an intuitive perspective on the literature, and we believe it is a valid contribution towards understanding the practice of optional stopping in the context of Bayesian hypothesis testing.
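The frequentist half of this argument is easy to check by simulation. The sketch below estimates how often optional stopping "finds" an effect when none exists, assuming normally distributed data with known sigma = 1, a two-sided z-test, and a look after every 10 observations up to 200; `optional_stopping_fpr` is a hypothetical helper, and the sketch does not reproduce Rouder's Bayes factor simulations.

```python
import random
from statistics import NormalDist


def optional_stopping_fpr(n_sims=2000, batch=10, max_n=200, alpha=0.05, seed=1):
    """Share of simulations (null hypothesis true) in which a two-sided z-test with
    known sigma = 1 reaches p < alpha at some look, testing after every `batch`
    observations until max_n observations have been collected."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        while n < max_n:
            for _ in range(batch):
                total += rng.gauss(0.0, 1.0)  # H0 is true: population mean is 0
            n += batch
            z = total / n**0.5  # z = mean / (sigma / sqrt(n)) = sum / sqrt(n)
            p = 2 * (1 - std_normal.cdf(abs(z)))
            if p < alpha:  # stop as soon as the result is "significant"
                rejections += 1
                break
    return rejections / n_sims


print(f"False-positive rate with optional stopping: {optional_stopping_fpr():.3f}")
```

Setting `batch` equal to `max_n` turns this into a single fixed-sample test, whose false-positive rate stays near alpha; with repeated looks the rate climbs well above it.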

Subjects

Research Design, Bayes Theorem, Computer Simulation, Humans, Probability
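The p-value problem noted in the abstract of entry 18 (false positives piling up when a test is repeated as data arrive) can be illustrated with a minimal simulation sketch. The settings are illustrative assumptions, not taken from the paper: data are drawn from a normal distribution with known unit variance, so the null hypothesis (mean = 0) is true; a two-sided z test is used; and a look is taken after every new observation from n = 10 to n = 100. The function names are hypothetical helpers written for this sketch.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_sims: int, n_min: int, n_max: int,
                        alpha: float = 0.05, seed: int = 1) -> float:
    """Share of null-true simulations in which H0 is rejected at some look.

    Assumption: data come from N(0, 1), so H0 (mean = 0) is true. A z test
    with known sigma = 1 is run after each new observation from n_min to
    n_max, stopping at the first p < alpha (optional stopping). Setting
    n_min == n_max gives a single fixed-n test for comparison.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)
            if n < n_min:
                continue  # no look before the minimum sample size
            z = total / math.sqrt(n)  # mean / (1 / sqrt(n)) == sum / sqrt(n)
            if two_sided_p(z) < alpha:
                rejections += 1
                break  # stop collecting data once "significant"
    return rejections / n_sims


fixed = false_positive_rate(2000, n_min=100, n_max=100)
sequential = false_positive_rate(2000, n_min=10, n_max=100)
print(f"single test at n = 100:       {fixed:.3f}")
print(f"test after every n in 10-100: {sequential:.3f}")
```

The single fixed-n test should reject close to 5% of the time, while the sequential procedure rejects noticeably more often; the exact figures depend on the seed and the look schedule. The sketch covers only the p-value case; it does not reproduce the Bayes factor analyses discussed in the paper.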

19.

When numbers fail: do researchers agree on operationalization of published research?

Haucke, Matthias; Hoekstra, Rink; van Ravenzwaaij, Don.

R Soc Open Sci ; 8(9): 191354, 2021 Sep.

Article in English | MEDLINE | ID: mdl-34527263

ABSTRACT

Current discussions on improving the reproducibility of science often revolve around statistical innovations. However, equally important for improving methodological rigour is a valid operationalization of phenomena. Operationalization is the process of translating theoretical constructs into measurable laboratory quantities. Thus, the validity of operationalization is central to the quality of empirical studies. But do differences in the validity of operationalization affect the way scientists evaluate scientific literature? To investigate this, we manipulated the strength of operationalization of three published studies and sent them to researchers via email. In the first task, researchers were presented with a summary of the Method and Results sections of one of the studies and were asked to identify the hypothesis that was investigated via a multiple-choice questionnaire. In the second task, researchers were asked to rate the perceived quality of the study. Our results show that (1) researchers are better at inferring the underlying research question from empirical results when the operationalization is more valid, but (2) differences in validity are only partly reflected in judgements of the study's quality. Taken together, these results partially corroborate the notion that researchers' evaluations of research results are not affected by operationalization validity.

20.

How best to quantify replication success? A simulation study on the comparison of replication success metrics.

Muradchanian, Jasmine; Hoekstra, Rink; Kiers, Henk; van Ravenzwaaij, Don.

R Soc Open Sci ; 8(5): 201697, 2021 May 19.

Article in English | MEDLINE | ID: mdl-34017596

ABSTRACT

To overcome the frequently debated crisis of confidence, replicating studies is becoming increasingly common. Multiple frequentist and Bayesian measures have been proposed to evaluate whether a replication is successful, but little is known about which method best captures replication success. This study is one of the first attempts to compare a number of quantitative measures of replication success with respect to their ability to draw the correct inference when the underlying truth is known, while taking publication bias into account. Our results show that Bayesian metrics seem to slightly outperform frequentist metrics across the board. Generally, meta-analytic approaches seem to slightly outperform metrics that evaluate single studies, except in the scenario of extreme publication bias, where this pattern reverses.
