Results 1 - 20 of 22
1.
Sci Rep ; 14(1): 12120, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802451

ABSTRACT

A large proportion of the scientific literature in the social and behavioural sciences bases its conclusions on one or more hypothesis tests. It is therefore important to learn more about how researchers in these fields interpret the quantities produced by hypothesis tests, such as p-values and Bayes factors. In the present study, we explored the relationship between the statistical evidence obtained and the degree of belief or confidence that there is a positive effect in the population of interest. In particular, we were interested in the existence of a so-called cliff effect: a qualitative drop in the degree of belief that there is a positive effect around certain threshold values of statistical evidence (e.g., at p = 0.05). We compared this relationship for p-values to the relationship for corresponding degrees of evidence quantified through Bayes factors, and we examined whether it was affected by two modes of presentation (in one mode the functional form of the relationship across values was implicit to the participant, whereas in the other mode it was explicit). We found evidence for a higher proportion of cliff effects in p-value conditions than in Bayes factor conditions (N = 139), but no clear indication of whether presentation mode affected the proportion of cliff effects. PROTOCOL REGISTRATION: The stage 1 protocol for this Registered Report was accepted in principle on 2 June 2023. The protocol, as accepted by the journal, can be found at https://doi.org/10.17605/OSF.IO/5CW6P .
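As a concrete illustration of the two evidence metrics this abstract compares, the sketch below simulates one hypothetical sample, computes a one-sample t-test p-value, and approximates the corresponding Bayes factor with the BIC method (Wagenmakers, 2007). The data and the BIC approximation are illustrative assumptions, not the default Bayes factor used in the study.

```python
# A minimal sketch, not the study's analysis: contrast the p-value of a
# one-sample t-test with a BIC-approximated Bayes factor for one simulated sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
x = rng.normal(loc=0.2, scale=1.0, size=50)  # hypothetical sample with a small positive effect

res = stats.ttest_1samp(x, popmean=0.0)
t, p, n = res.statistic, res.pvalue, len(x)

# BIC approximation for a one-sample t-test:
#   BF01 ~ sqrt(n) * (1 + t^2 / (n - 1)) ** (-n / 2),  BF10 = 1 / BF01
bf01 = np.sqrt(n) * (1 + t**2 / (n - 1)) ** (-n / 2)
bf10 = 1 / bf01

print(f"t = {t:.2f}, p = {p:.3f}, BF10 (BIC approximation) = {bf10:.2f}")
```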

2.
R Soc Open Sci ; 11(7): 240125, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39050728

ABSTRACT

Many-analysts studies explore how well an empirical claim withstands plausible alternative analyses of the same dataset by multiple, independent analysis teams. Conclusions from these studies typically rely on a single outcome metric (e.g. effect size) provided by each analysis team. Although informative about the range of plausible effects in a dataset, a single effect size from each team does not provide a complete, nuanced understanding of how analysis choices are related to the outcome. We used the Delphi consensus technique with input from 37 experts to develop an 18-item subjective evidence evaluation survey (SEES) to evaluate how each analysis team views the methodological appropriateness of the research design and the strength of evidence for the hypothesis. We illustrate the usefulness of the SEES in providing richer evidence assessment with pilot data from a previous many-analysts study.

3.
PLoS One ; 18(10): e0292279, 2023.
Article in English | MEDLINE | ID: mdl-37788282

ABSTRACT

BACKGROUND: Publishing study results in scientific journals has been the standard way of disseminating science. However, whether results get published may depend on their statistical significance, and the consequence is that the representation of scientific knowledge might be biased. This type of bias has been called publication bias. The main objective of the present study is to gain more insight into publication bias by examining it at the author, reviewer, and editor level. Additionally, we make a direct comparison between publication bias induced by authors, by reviewers, and by editors. We approached our participants by e-mail, asking them to fill out an online survey. RESULTS: Our findings suggest that statistically significant findings are more likely to be published than statistically non-significant findings, because (1) authors (n = 65) are more likely to write up and submit articles with significant results than articles with non-significant results (median effect size 1.10, BF10 = 1.09 × 10⁷); (2) reviewers (n = 60) give more favourable reviews to articles with significant results than to articles with non-significant results (median effect size 0.58, BF10 = 4.73 × 10²); and (3) editors (n = 171) are more likely to accept for publication articles with significant results than articles with non-significant results (median effect size 0.94, BF10 = 7.63 × 10⁷). Evidence on the relative contributions of authors, reviewers, and editors to publication bias is ambiguous (editors vs reviewers: BF10 = 0.31, reviewers vs authors: BF10 = 3.11, and editors vs authors: BF10 = 0.42). DISCUSSION: One of the main limitations was that rather than investigating publication bias directly, we studied the potential for publication bias. Another limitation was the low response rate to the survey.


Subjects
Authorship, Writing, Humans, Publication Bias, Surveys and Questionnaires, Electronic Mail
4.
R Soc Open Sci ; 10(6): 221553, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37293358

ABSTRACT

This paper explores judgements about the replicability of social and behavioural sciences research and what drives those judgements. Using a mixed-methods design, it draws on qualitative and quantitative data elicited from groups with a structured approach called the IDEA protocol ('investigate', 'discuss', 'estimate' and 'aggregate'). Five groups of five people with relevant domain expertise evaluated 25 research claims that had been subject to at least one replication study. Participants assessed the probability that each of the 25 research claims would replicate (i.e. that a replication study would find a statistically significant result in the same direction as the original study) and described the reasoning behind those judgements. We quantitatively analysed possible correlates of predictive accuracy, including self-rated expertise and updating of judgements after feedback and discussion. We qualitatively analysed the reasoning data to explore the cues, heuristics and patterns of reasoning used by participants. Participants achieved 84% classification accuracy in predicting replicability. Those who engaged in a greater breadth of reasoning provided more accurate replicability judgements. Some reasons were invoked more often by more accurate participants, such as 'effect size' and 'reputation' (e.g. of the field of research). There was also some evidence of a relationship between statistical literacy and accuracy.

5.
PLoS One ; 18(1): e0274429, 2023.
Article in English | MEDLINE | ID: mdl-36701303

ABSTRACT

As replications of individual studies are resource intensive, techniques for predicting replicability are required. We introduce the repliCATS (Collaborative Assessments for Trustworthy Science) process, a new method for eliciting expert predictions about the replicability of research: a structured expert elicitation approach based on a modified Delphi technique, applied to the evaluation of research claims in the social and behavioural sciences. The utility of such processes is their capacity to test scientific claims without the costs of full replication. Experimental data support the validity of the process: a validation study produced a classification accuracy of 84% and an area under the curve (AUC) of 0.94, meeting or exceeding the accuracy of other techniques used to predict replicability. The repliCATS process provides other benefits. It is highly scalable: it can be deployed both for rapid assessment of small numbers of claims and for assessment of high volumes of claims over an extended period through an online elicitation platform, and it has been used to assess 3000 research claims over an 18-month period. It can be implemented in a range of ways, and we describe one such implementation. An important advantage of the repliCATS process is that it collects qualitative data that can provide insight into the limits of generalizability of scientific claims. Its primary limitation is its reliance on human-derived predictions, with consequent costs in terms of participant fatigue, although careful design can minimise these costs. The repliCATS process has potential applications in alternative peer review and in the allocation of effort for replication studies.
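For readers unfamiliar with the two validation metrics quoted above, the sketch below scores a set of made-up elicited replication probabilities against made-up replication outcomes; the numbers are purely illustrative and are not the repliCATS data.

```python
# A minimal sketch with invented numbers: score elicited replication probabilities
# against observed replication outcomes using classification accuracy and AUC.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

probs = np.array([0.85, 0.30, 0.70, 0.15, 0.90, 0.40, 0.65, 0.20])  # elicited P(claim replicates)
replicated = np.array([1, 0, 1, 0, 1, 1, 0, 0])                     # observed replication outcomes

accuracy = accuracy_score(replicated, probs >= 0.5)  # classify at a 0.5 cut-off
auc = roc_auc_score(replicated, probs)               # threshold-free discrimination

print(f"accuracy = {accuracy:.2f}, AUC = {auc:.2f}")
```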


Subjects
Behavioral Sciences, Data Accuracy, Humans, Reproducibility of Results, Costs and Cost Analysis, Peer Review
6.
Nat Hum Behav ; 5(12): 1602-1607, 2021 12.
Article in English | MEDLINE | ID: mdl-34711978

ABSTRACT

The replication crisis in the social, behavioural and life sciences has spurred a reform movement aimed at increasing the credibility of scientific studies. Many of these credibility-enhancing reforms focus, appropriately, on specific research and publication practices. A less often mentioned aspect of credibility is the need for intellectual humility or being transparent about and owning the limitations of our work. Although intellectual humility is presented as a widely accepted scientific norm, we argue that current research practice does not incentivize intellectual humility. We provide a set of recommendations on how to increase intellectual humility in research articles and highlight the central role peer reviewers can play in incentivizing authors to foreground the flaws and uncertainty in their work, thus enabling full and transparent evaluation of the validity of research.


Subjects
Research, Science, Humans
7.
R Soc Open Sci ; 8(9): 191354, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34527263

ABSTRACT

Current discussions on improving the reproducibility of science often revolve around statistical innovations. Equally important for improving methodological rigour, however, is a valid operationalization of phenomena. Operationalization is the process of translating theoretical constructs into measurable laboratory quantities. The validity of operationalization is therefore central to the quality of empirical studies. But do differences in the validity of operationalization affect the way scientists evaluate scientific literature? To investigate this, we manipulated the strength of operationalization of three published studies and sent them to researchers via email. In the first task, researchers were presented with a summary of the Method and Results sections from one of the studies and were asked to guess, via a multiple-choice questionnaire, which hypothesis had been investigated. In a second task, researchers were asked to rate the perceived quality of the study. Our results show that (1) researchers are better at inferring the underlying research question from empirical results when the operationalization is more valid, but (2) differences in validity are reflected only to some extent in judgements of the study's quality. Taken together, these results provide partial corroboration for the notion that researchers' evaluations of research results are not affected by operationalization validity.

8.
R Soc Open Sci ; 8(5): 201697, 2021 May 19.
Article in English | MEDLINE | ID: mdl-34017596

ABSTRACT

In response to the frequently debated crisis of confidence, replication studies are becoming increasingly common. Multiple frequentist and Bayesian measures have been proposed to evaluate whether a replication is successful, but little is known about which method best captures replication success. This study is one of the first attempts to compare a number of quantitative measures of replication success with respect to their ability to draw the correct inference when the underlying truth is known, while taking publication bias into account. Our results show that Bayesian metrics seem to slightly outperform frequentist metrics across the board. Generally, meta-analytic approaches seem to slightly outperform metrics that evaluate single studies, except in the scenario of extreme publication bias, where this pattern reverses.
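As a concrete example of the meta-analytic style of metric alluded to here, the sketch below pools a hypothetical original effect size and a hypothetical replication effect size by inverse-variance weighting; the numbers and the fixed-effect model are illustrative assumptions, not the specific measures compared in the paper.

```python
# A minimal sketch: fixed-effect (inverse-variance) pooling of an original and
# a replication effect size. All values are hypothetical.
import numpy as np

effects = np.array([0.45, 0.12])  # original and replication effect sizes (hypothetical)
ses = np.array([0.15, 0.08])      # their standard errors (hypothetical)

weights = 1 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

print(f"pooled effect = {pooled:.2f} (SE = {pooled_se:.2f})")
```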

9.
Nat Hum Behav ; 5(11): 1473-1480, 2021 11.
Article in English | MEDLINE | ID: mdl-34764461

ABSTRACT

We argue that statistical practice in the social and behavioural sciences benefits from transparency, a fair acknowledgement of uncertainty and openness to alternative interpretations. Here, to promote such a practice, we recommend seven concrete statistical procedures: (1) visualizing data; (2) quantifying inferential uncertainty; (3) assessing data preprocessing choices; (4) reporting multiple models; (5) involving multiple analysts; (6) interpreting results modestly; and (7) sharing data and code. We discuss their benefits and limitations, and provide guidelines for adoption. Each of the seven procedures finds inspiration in Merton's ethos of science as reflected in the norms of communalism, universalism, disinterestedness and organized scepticism. We believe that these ethical considerations-as well as their statistical consequences-establish common ground among data analysts, despite continuing disagreements about the foundations of statistical inference.
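As a minimal sketch of recommendation (2), quantifying inferential uncertainty, the code below computes a percentile bootstrap confidence interval for the difference between two hypothetical group means; the data and group names are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch: percentile bootstrap CI for a mean difference between two
# hypothetical groups, as one way to quantify inferential uncertainty.
import numpy as np

rng = np.random.default_rng(7)
group_a = rng.normal(0.5, 1.0, size=40)
group_b = rng.normal(0.0, 1.0, size=40)

boot_diffs = np.array([
    rng.choice(group_a, size=group_a.size).mean()
    - rng.choice(group_b, size=group_b.size).mean()
    for _ in range(5000)
])
low, high = np.percentile(boot_diffs, [2.5, 97.5])  # percentile bootstrap interval

print(f"observed difference = {group_a.mean() - group_b.mean():.2f}, "
      f"95% bootstrap CI = [{low:.2f}, {high:.2f}]")
```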


Subjects
Statistics as Topic, Data Interpretation, Statistical, Humans, Information Dissemination, Models, Statistical, Research Design/standards, Statistics as Topic/methods, Statistics as Topic/standards, Uncertainty
10.
Elife ; 10, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34751133

ABSTRACT

Any large dataset can be analyzed in a number of ways, and it is possible that the use of different analysis strategies will lead to different results and conclusions. One way to assess whether the results obtained depend on the analysis strategy chosen is to employ multiple analysts and leave each of them free to follow their own approach. Here, we present consensus-based guidance for conducting and reporting such multi-analyst studies, and we discuss how broader adoption of the multi-analyst approach has the potential to strengthen the robustness of results and conclusions obtained from analyses of datasets in basic and applied research.


Subjects
Consensus, Data Analysis, Datasets as Topic, Research
11.
Elife ; 9, 2020 11 19.
Article in English | MEDLINE | ID: mdl-33211009

ABSTRACT

Peer review practices differ substantially between journals and disciplines. This study presents the results of a survey of 322 editors of journals in ecology, economics, medicine, physics and psychology. We found that 49% of the journals surveyed checked all manuscripts for plagiarism, that 61% allowed authors to recommend both for and against specific reviewers, and that less than 6% used a form of open peer review. Most journals did not have an official policy on altering reports from reviewers, but 91% of editors identified at least one situation in which it was appropriate for an editor to alter a report. Editors were also asked for their views on five issues related to publication ethics. A majority expressed support for co-reviewing, reviewers requesting access to data, reviewers recommending citations to their work, editors publishing in their own journals, and replication studies. Our results provide a window into what is largely an opaque aspect of the scientific process. We hope the findings will inform the debate about the role and transparency of peer review in scholarly publishing.


Subjects
Editorial Policies, Peer Review, Periodicals as Topic, Humans, Surveys and Questionnaires
12.
R Soc Open Sci ; 7(4): 181351, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32431853

ABSTRACT

The crisis of confidence has undermined the trust that researchers place in the findings of their peers. To increase trust in research, initiatives such as preregistration have been suggested, which aim to prevent various questionable research practices. As it stands, however, there is no empirical evidence that preregistration does increase perceptions of trust. The picture may also be complicated by a researcher's familiarity with the author of the study, regardless of the preregistration status of the research. This registered report presents an empirical assessment of the extent to which preregistration increases the trust of 209 active academics in the reported outcomes, and of how familiarity with another researcher influences that trust. Contrary to our expectations, we report ambiguous Bayes factors and conclude that we do not have strong evidence with which to answer our research questions. Our findings are presented along with evidence that our manipulations were ineffective for many participants, leading to the exclusion of 68% of complete datasets and, as a consequence, an underpowered design. We discuss other limitations and confounds that may explain why the findings of the study deviate from those of a previously conducted pilot study. We reflect on the benefits of using the registered report submission format in light of our results. The OSF page for this registered report and its pilot can be found at: http://dx.doi.org/10.17605/OSF.IO/B3K75.

13.
PLoS One ; 13(4): e0195474, 2018.
Article in English | MEDLINE | ID: mdl-29694370

ABSTRACT

Efficient medical progress requires that we know when a treatment effect is absent. We considered all 207 Original Articles published in the 2015 volume of the New England Journal of Medicine and found that 45 (21.7%) reported a null result for at least one of the primary outcome measures. Unfortunately, standard statistical analyses are unable to quantify the degree to which these null results actually support the null hypothesis. Such quantification is possible, however, by conducting a Bayesian hypothesis test. Here we reanalyzed a subset of 43 null results from 36 articles using a default Bayesian test for contingency tables. This Bayesian reanalysis revealed that, on average, the reported null results provided strong evidence for the absence of an effect. However, the degree of this evidence is variable and cannot be reliably predicted from the p-value. For null results, sample size is a better (albeit imperfect) predictor of the strength of evidence in favor of the null hypothesis. Together, our findings suggest that (a) the reported null results generally correspond to strong evidence in favor of the null hypothesis, and (b) a Bayesian hypothesis test can provide additional information to assist the interpretation of null results.
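The paper's reanalysis used a default (Gunel and Dickey) Bayes factor for contingency tables; as a rough stand-in, the sketch below contrasts the independence (null) and saturated models of a hypothetical 2×2 outcome table through a BIC approximation. Both the counts and the approximation are illustrative assumptions, not the authors' analysis.

```python
# A rough BIC-based stand-in, not the paper's default Bayes factor: evidence for
# independence (no treatment effect) in a hypothetical 2x2 contingency table.
import numpy as np

table = np.array([[30, 70],   # treatment arm: events, non-events (hypothetical counts)
                  [28, 72]])  # control arm:   events, non-events
N = table.sum()
row_p = table.sum(axis=1) / N
col_p = table.sum(axis=0) / N

# multinomial log-likelihoods of the observed cell counts under each model
ll_null = (table * np.log(np.outer(row_p, col_p))).sum()  # independence model, 2 free parameters
ll_sat = (table * np.log(table / N)).sum()                # saturated model, 3 free parameters

bic_null = -2 * ll_null + 2 * np.log(N)
bic_sat = -2 * ll_sat + 3 * np.log(N)
bf01 = np.exp((bic_sat - bic_null) / 2)  # > 1 favours the null (no effect)

print(f"BF01 (BIC approximation) = {bf01:.1f}")
```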


Subjects
Data Interpretation, Statistical, Treatment Failure, Bayes Theorem, Factor Analysis, Statistical, Humans, Periodicals as Topic
14.
Exp Psychol ; 65(3): 158-169, 2018 May.
Article in English | MEDLINE | ID: mdl-29905114

ABSTRACT

As a research field expands, scientists have to update their knowledge and integrate the outcomes of a sequence of studies. However, such integrative judgments are generally known to fall victim to a primacy bias where people anchor their judgments on the initial information. In this preregistered study we tested the hypothesis that people anchor on the outcome of a small initial study, reducing the impact of a larger subsequent study that contradicts the initial result. Contrary to our expectation, undergraduates and academics displayed a recency bias, anchoring their judgment on the research outcome presented last. This recency bias is due to the fact that unsuccessful replications decreased trust in an effect more than did unsuccessful initial experiments. We recommend the time-reversal heuristic to account for temporal order effects during integration of research results.


Subjects
Knowledge, Research Design/trends, Female, Humans, Male
15.
Int J Qual Stud Health Well-being ; 12(sup1): 1305590, 2017 06.
Article in English | MEDLINE | ID: mdl-28532325

ABSTRACT

Academic textbooks are essential assets for disseminating knowledge about ADHD to future healthcare professionals. This study examined whether they are balanced with regard to genetics. We selected and analyzed textbooks (N = 43) used in (pre-)master's programmes at 10 universities in the Netherlands. Because quantitative genetics, informed by behavioural data alone, yields a much higher effect size for the genetic involvement in ADHD, it is important that textbooks contrast these findings with the outcomes of molecular genetics. The latter studies use actual genetic data, and their low effect sizes expose the potential weaknesses of quantitative genetics, such as underestimating the involvement of the environment. Only a quarter of the books mention both effect sizes and contrast these findings, while another quarter does not discuss any effect size. Most importantly, however, roughly half of the books in our sample mention only the effect sizes from quantitative genetic studies without addressing the low explained variance of molecular genetic studies. This may confuse readers by suggesting that the weakly associated genes support the quite spectacular, but potentially flawed, estimates from twin, family and adoption studies, while they actually contradict them.


Subjects
Attention Deficit Disorder with Hyperactivity/genetics, Books, Education, Medical, Graduate/methods, Genetics/education, Humans, Netherlands
16.
Psychon Bull Rev ; 13(6): 1033-7, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17484431

ABSTRACT

Significance testing is widely used and often criticized. The Task Force on Statistical Inference of the American Psychological Association (TFSI, APA; Wilkinson & TFSI, 1999) addressed the use of significance testing and made recommendations that were incorporated in the fifth edition of the APA Publication Manual (APA, 2001). They emphasized the interpretation of significance testing and the importance of reporting confidence intervals and effect sizes. We examined whether 286 Psychonomic Bulletin & Review articles submitted before and after the publication of the TFSI recommendations by APA complied with these recommendations. Interpretation errors when using significance testing were still made frequently, and the new prescriptions were not yet followed on a large scale. Changing the practice of reporting statistics seems doomed to be a slow process.


Subjects
Reaction Time, Thinking, Time Perception, Attention, Humans, Judgment, Models, Statistical
17.
Psychon Bull Rev ; 23(1): 131-40, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26620955

ABSTRACT

Miller and Ulrich (2015) critique our claim of widespread misunderstanding of confidence intervals (CIs), which was based on a survey given to researchers and students (Hoekstra et al., Psychonomic Bulletin & Review, 21(5), 1157-1164, 2014). They suggest that survey respondents may have interpreted the statements in the survey that we deemed incorrect in an idiosyncratic, but correct, way, calling into question the conclusion that respondents could not properly interpret CIs. Their alternative interpretations, while correct, cannot be deemed acceptable renderings of the questions in the survey because of the well-known reference class problem. Moreover, there is no support in the data for their contention that participants had these alternative interpretations in mind. Finally, their alternative interpretations are merely trivial restatements of the definition of a confidence interval and have no implications for the location of a parameter.


Subjects
Confidence Intervals, Data Interpretation, Statistical, Humans
18.
Psychon Bull Rev ; 23(1): 103-23, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26450628

ABSTRACT

Interval estimates - estimates of parameters that include an allowance for sampling uncertainty - have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true parameter value in some known proportion of repeated samples, on average. The width of confidence intervals is thought to index the precision of an estimate; CIs are thought to be a guide to which parameter values are plausible or reasonable; and the confidence coefficient of the interval (e.g., 95%) is thought to index the plausibility that the true parameter is included in the interval. We show in a number of examples that CIs do not necessarily have any of these properties, and can lead to unjustified or arbitrary inferences. For this reason, we caution against relying upon confidence interval theory to justify interval estimates, and suggest that other theories of interval estimation should be used instead.
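As a quick illustration of the repeated-sampling property described above, the simulation below checks the empirical coverage of a standard 95% t-interval for a normal mean; it is a self-contained sketch with made-up parameters, not an example from the paper.

```python
# A minimal sketch: across repeated samples, a 95% t-interval for a normal mean
# covers the true value in roughly 95% of samples -- a property of the procedure,
# not a statement about any single computed interval.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, sd, n, reps = 10.0, 2.0, 30, 10_000

covered = 0
for _ in range(reps):
    x = rng.normal(true_mean, sd, size=n)
    half_width = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    covered += (x.mean() - half_width) <= true_mean <= (x.mean() + half_width)

print(f"empirical coverage = {covered / reps:.3f}")  # close to the nominal 0.95
```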


Subjects
Bayes Theorem, Confidence Intervals, Data Interpretation, Statistical, Humans
19.
R Soc Open Sci ; 3(1): 150547, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26909182

ABSTRACT

Openness is one of the central values of science. Open scientific practices such as sharing data, materials and analysis scripts alongside published articles have many benefits, including easier replication and extension studies, increased availability of data for theory-building and meta-analysis, and greater scope for review and collaboration even after a paper has been published. Although modern information technology makes sharing easier than ever before, uptake of open practices has been slow. We suggest this might be due in part to a social dilemma arising from misaligned incentives, and we propose a specific, concrete mechanism (reviewers withholding comprehensive review) to achieve the goal of creating the expectation of open practices as a matter of scientific principle.
