Results 1 - 20 of 22
1.
R Soc Open Sci ; 11(7): 240125, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39050728

ABSTRACT

Many-analysts studies explore how well an empirical claim withstands plausible alternative analyses of the same dataset by multiple, independent analysis teams. Conclusions from these studies typically rely on a single outcome metric (e.g. effect size) provided by each analysis team. Although informative about the range of plausible effects in a dataset, a single effect size from each team does not provide a complete, nuanced understanding of how analysis choices are related to the outcome. We used the Delphi consensus technique with input from 37 experts to develop an 18-item subjective evidence evaluation survey (SEES) to evaluate how each analysis team views the methodological appropriateness of the research design and the strength of evidence for the hypothesis. We illustrate the usefulness of the SEES in providing richer evidence assessment with pilot data from a previous many-analysts study.

2.
Sci Rep ; 14(1): 12120, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802451

ABSTRACT

A large proportion of the scientific literature in the social and behavioural sciences bases its conclusions on one or more hypothesis tests. It is therefore important to learn more about how researchers in these fields interpret the quantities produced by hypothesis tests, such as p-values and Bayes factors. In the present study, we explored the relationship between the statistical evidence obtained and the degree of belief or confidence that there is a positive effect in the population of interest. In particular, we were interested in the existence of a so-called cliff effect: a qualitative drop in the degree of belief that there is a positive effect around certain threshold values of statistical evidence (e.g., at p = 0.05). We compared this relationship for p-values with the relationship for corresponding degrees of evidence quantified through Bayes factors, and we examined whether the relationship was affected by two different modes of presentation (in one mode the functional form of the relationship across values was implicit to the participant, whereas in the other mode it was explicit). We found evidence for a higher proportion of cliff effects in p-value conditions than in Bayes factor conditions (N = 139), but we did not obtain a clear indication of whether presentation mode affected the proportion of cliff effects. PROTOCOL REGISTRATION: The stage 1 protocol for this Registered Report was accepted in principle on 2 June 2023. The protocol, as accepted by the journal, can be found at https://doi.org/10.17605/OSF.IO/5CW6P.
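
The contrast the study draws between p-values and Bayes factors can be made concrete with a small sketch (not part of the study): for one simulated one-sample dataset, compute the classical p-value and a rough Bayes factor via the BIC approximation. The simulated data and the BIC shortcut are illustrative assumptions, not the materials or metrics used in the Registered Report.

```python
# Illustration only: one simulated dataset, its t-test p-value, and a rough
# BF10 from the BIC approximation (Wagenmakers, 2007). Not the study's code.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.2, scale=1.0, size=50)     # hypothetical sample

t_stat, p_value = stats.ttest_1samp(x, popmean=0.0)
n = x.size

# BIC approximation for a one-sample t-test:
# BF10 ~= (1 + t^2 / (n - 1))^(n / 2) / sqrt(n)
bf10 = (1 + t_stat**2 / (n - 1)) ** (n / 2) / np.sqrt(n)

print(f"p = {p_value:.3f}, approximate BF10 = {bf10:.2f}")
```

The same data thus yield both metrics, which is what allows the study to compare belief curves across the two evidence scales.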

3.
PLoS One ; 18(10): e0292279, 2023.
Article in English | MEDLINE | ID: mdl-37788282

ABSTRACT

BACKGROUND: Publishing study results in scientific journals has been the standard way of disseminating science. However, getting results published may depend on their statistical significance, with the consequence that the representation of scientific knowledge might be biased. This type of bias has been called publication bias. The main objective of the present study is to gain more insight into publication bias by examining it at the author, reviewer, and editor level. Additionally, we make a direct comparison between publication bias induced by authors, by reviewers, and by editors. We approached our participants by e-mail, asking them to fill out an online survey. RESULTS: Our findings suggest that statistically significant findings have a higher likelihood of being published than statistically non-significant findings, because (1) authors (n = 65) are more likely to write up and submit articles with significant results than articles with non-significant results (median effect size 1.10, BF10 = 1.09 × 10^7); (2) reviewers (n = 60) give more favourable reviews to articles with significant results than to articles with non-significant results (median effect size 0.58, BF10 = 4.73 × 10^2); and (3) editors (n = 171) are more likely to accept for publication articles with significant results than articles with non-significant results (median effect size 0.94, BF10 = 7.63 × 10^7). Evidence on differences in the relative contributions to publication bias by authors, reviewers, and editors is ambiguous (editors vs reviewers: BF10 = 0.31, reviewers vs authors: BF10 = 3.11, and editors vs authors: BF10 = 0.42). DISCUSSION: One of the main limitations was that rather than investigating publication bias directly, we studied the potential for publication bias. Another limitation was the low response rate to the survey.


Subject(s)
Authorship, Writing, Humans, Publication Bias, Surveys and Questionnaires, Electronic Mail
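
The selection mechanism described in entry 3 can be mimicked in a few lines: simulate many small studies of a modest true effect, let significant results be submitted with higher probability, and compare the mean published effect with the truth. The study parameters and selection probabilities below are invented for illustration and are not taken from the survey.

```python
# Toy simulation of publication bias; all numbers are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_effect, n_per_group, n_studies = 0.2, 30, 5_000

published = []
for _ in range(n_studies):
    a = rng.normal(true_effect, 1.0, n_per_group)
    b = rng.normal(0.0, 1.0, n_per_group)
    t_stat, p_value = stats.ttest_ind(a, b)
    effect = a.mean() - b.mean()                   # population SD is 1, so this
                                                   # approximates Cohen's d
    submit_prob = 0.9 if p_value < 0.05 else 0.3   # assumed selection step
    if rng.random() < submit_prob:
        published.append(effect)

print(f"true effect = {true_effect}, "
      f"mean published effect = {np.mean(published):.2f}")
```
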
4.
R Soc Open Sci ; 10(6): 221553, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37293358

ABSTRACT

This paper explores judgements about the replicability of social and behavioural sciences research and what drives those judgements. Using a mixed methods approach, it draws on qualitative and quantitative data elicited from groups using a structured approach called the IDEA protocol ('investigate', 'discuss', 'estimate' and 'aggregate'). Five groups of five people with relevant domain expertise evaluated 25 research claims that were subject to at least one replication study. Participants assessed the probability that each of the 25 research claims would replicate (i.e. that a replication study would find a statistically significant result in the same direction as the original study) and described the reasoning behind those judgements. We quantitatively analysed possible correlates of predictive accuracy, including self-rated expertise and updating of judgements after feedback and discussion. We qualitatively analysed the reasoning data to explore the cues, heuristics and patterns of reasoning used by participants. Participants achieved 84% classification accuracy in predicting replicability. Those who engaged in a greater breadth of reasoning provided more accurate replicability judgements. Some reasons were more commonly invoked by more accurate participants, such as 'effect size' and 'reputation' (e.g. of the field of research). There was also some evidence of a relationship between statistical literacy and accuracy.

5.
PLoS One ; 18(1): e0274429, 2023.
Article in English | MEDLINE | ID: mdl-36701303

ABSTRACT

As replications of individual studies are resource intensive, techniques for predicting replicability are required. We introduce the repliCATS (Collaborative Assessments for Trustworthy Science) process, a new method for eliciting expert predictions about the replicability of research. The process is a structured expert elicitation approach based on a modified Delphi technique, applied to the evaluation of research claims in the social and behavioural sciences. The utility of such processes lies in their capacity to test scientific claims without the costs of full replication. Experimental data support the validity of the process: a validation study produced a classification accuracy of 84% and an area under the curve (AUC) of 0.94, meeting or exceeding the accuracy of other techniques used to predict replicability. The repliCATS process provides other benefits. It is highly scalable: through an online elicitation platform it can be deployed both for rapid assessment of small numbers of claims and for assessment of high volumes of claims over an extended period, and it has been used to assess 3,000 research claims over an 18-month period. It can be implemented in a range of ways, and we describe one such implementation. An important advantage of the repliCATS process is that it collects qualitative data that can provide insight into the limits of generalizability of scientific claims. Its primary limitation is its reliance on human-derived predictions, with consequent costs in terms of participant fatigue, although careful design can minimise these costs. The repliCATS process has potential applications in alternative peer review and in the allocation of effort for replication studies.


Subject(s)
Behavioral Sciences, Data Accuracy, Humans, Reproducibility of Results, Costs and Cost Analysis, Peer Review
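
For readers unfamiliar with the two validation metrics quoted in entry 5 (classification accuracy and area under the curve), the sketch below shows how they could be computed from elicited replication probabilities and known replication outcomes. The data, the 0.5 classification cut-off, and the use of scikit-learn are our assumptions; this is not the repliCATS analysis code.

```python
# Hypothetical scoring of elicited replication probabilities.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Aggregated group predictions: probability that each claim replicates
predicted_prob = np.array([0.85, 0.40, 0.72, 0.15, 0.90, 0.55, 0.30, 0.78])
# Observed outcomes: 1 = replicated, 0 = did not replicate
observed = np.array([1, 0, 1, 0, 1, 1, 0, 1])

auc = roc_auc_score(observed, predicted_prob)
accuracy = accuracy_score(observed, (predicted_prob >= 0.5).astype(int))

print(f"classification accuracy = {accuracy:.2f}, AUC = {auc:.2f}")
```
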
6.
Nat Hum Behav ; 5(11): 1473-1480, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34764461

ABSTRACT

We argue that statistical practice in the social and behavioural sciences benefits from transparency, a fair acknowledgement of uncertainty and openness to alternative interpretations. Here, to promote such a practice, we recommend seven concrete statistical procedures: (1) visualizing data; (2) quantifying inferential uncertainty; (3) assessing data preprocessing choices; (4) reporting multiple models; (5) involving multiple analysts; (6) interpreting results modestly; and (7) sharing data and code. We discuss their benefits and limitations, and provide guidelines for adoption. Each of the seven procedures finds inspiration in Merton's ethos of science as reflected in the norms of communalism, universalism, disinterestedness and organized scepticism. We believe that these ethical considerations-as well as their statistical consequences-establish common ground among data analysts, despite continuing disagreements about the foundations of statistical inference.


Subject(s)
Statistics as Topic, Statistical Data Interpretation, Humans, Information Dissemination, Statistical Models, Research Design/standards, Statistics as Topic/methods, Statistics as Topic/standards, Uncertainty
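
To make one of the seven procedures in entry 6 more tangible, here is a minimal sketch of procedure (2), quantifying inferential uncertainty, using a percentile bootstrap interval for a mean. The bootstrap is our choice of illustration; the paper recommends the general practice rather than any specific method, and the data are invented.

```python
# Minimal sketch: percentile bootstrap interval for a sample mean.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=40)    # hypothetical measurements

boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(10_000)
])
lower, upper = np.percentile(boot_means, [2.5, 97.5])

print(f"mean = {data.mean():.1f}, "
      f"95% bootstrap interval = [{lower:.1f}, {upper:.1f}]")
```
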
7.
Elife ; 10, 2021 Nov 09.
Article in English | MEDLINE | ID: mdl-34751133

ABSTRACT

Any large dataset can be analyzed in a number of ways, and it is possible that the use of different analysis strategies will lead to different results and conclusions. One way to assess whether the results obtained depend on the analysis strategy chosen is to employ multiple analysts and leave each of them free to follow their own approach. Here, we present consensus-based guidance for conducting and reporting such multi-analyst studies, and we discuss how broader adoption of the multi-analyst approach has the potential to strengthen the robustness of results and conclusions obtained from analyses of datasets in basic and applied research.


Subject(s)
Consensus, Data Analysis, Datasets as Topic, Research
8.
Nat Hum Behav ; 5(12): 1602-1607, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34711978

ABSTRACT

The replication crisis in the social, behavioural and life sciences has spurred a reform movement aimed at increasing the credibility of scientific studies. Many of these credibility-enhancing reforms focus, appropriately, on specific research and publication practices. A less often mentioned aspect of credibility is the need for intellectual humility, that is, being transparent about and owning the limitations of our work. Although intellectual humility is presented as a widely accepted scientific norm, we argue that current research practice does not incentivize it. We provide a set of recommendations on how to increase intellectual humility in research articles and highlight the central role peer reviewers can play in incentivizing authors to foreground the flaws and uncertainty in their work, thus enabling full and transparent evaluation of the validity of research.


Subject(s)
Research, Science, Humans
9.
R Soc Open Sci ; 8(9): 191354, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34527263

ABSTRACT

Current discussions on improving the reproducibility of science often revolve around statistical innovations. However, equally important for improving methodological rigour is a valid operationalization of phenomena. Operationalization is the process of translating theoretical constructs into measurable laboratory quantities. The validity of operationalization is therefore central to the quality of empirical studies. But do differences in the validity of operationalization affect the way scientists evaluate scientific literature? To investigate this, we manipulated the strength of operationalization of three published studies and sent them to researchers via email. In the first task, researchers were presented with a summary of the Method and Result sections from one of the studies and were asked to guess, via a multiple-choice questionnaire, which hypothesis had been investigated. In a second task, researchers were asked to rate the perceived quality of the study. Our results show that (1) researchers are better at inferring the underlying research question from empirical results if the operationalization is more valid, but (2) these differences in validity are only partly reflected in judgements of the study's quality. Taken together, these results partially corroborate the notion that researchers' evaluations of research results are not affected by operationalization validity.

10.
R Soc Open Sci ; 8(5): 201697, 2021 May 19.
Article in English | MEDLINE | ID: mdl-34017596

ABSTRACT

In response to the frequently debated crisis of confidence, replication studies are becoming increasingly common. Multiple frequentist and Bayesian measures have been proposed to evaluate whether a replication is successful, but little is known about which method best captures replication success. This study is one of the first attempts to compare a number of quantitative measures of replication success with respect to their ability to draw the correct inference when the underlying truth is known, while taking publication bias into account. Our results show that Bayesian metrics seem to slightly outperform frequentist metrics across the board. Generally, meta-analytic approaches seem to slightly outperform metrics that evaluate single studies, except in the scenario of extreme publication bias, where this pattern reverses.
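
Two generic examples of the kind of replication metric compared in such studies are sketched below: a frequentist "significant in the same direction" criterion and a fixed-effect meta-analytic pooled estimate. The effect sizes and standard errors are invented, and these two metrics stand in for, rather than reproduce, the measures evaluated in the paper.

```python
# Two generic replication metrics on invented summary statistics.
import numpy as np
from scipy import stats

# Hypothetical standardized effect estimates and standard errors
orig_effect, orig_se = 0.45, 0.15      # original study
rep_effect, rep_se = 0.20, 0.10        # replication study

# Metric 1: is the replication significant in the original (positive) direction?
z_rep = rep_effect / rep_se
p_rep = stats.norm.sf(z_rep)                       # one-sided p-value
success_same_direction = (p_rep < 0.05) and (rep_effect > 0)

# Metric 2: fixed-effect meta-analysis pooling both studies
weights = np.array([1 / orig_se**2, 1 / rep_se**2])   # inverse-variance weights
effects = np.array([orig_effect, rep_effect])
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

print(f"replication significant and same sign: {success_same_direction}")
print(f"pooled effect = {pooled:.2f} (SE = {pooled_se:.2f})")
```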

11.
Elife ; 9, 2020 Nov 19.
Article in English | MEDLINE | ID: mdl-33211009

ABSTRACT

Peer review practices differ substantially between journals and disciplines. This study presents the results of a survey of 322 editors of journals in ecology, economics, medicine, physics and psychology. We found that 49% of the journals surveyed checked all manuscripts for plagiarism, that 61% allowed authors to recommend both for and against specific reviewers, and that less than 6% used a form of open peer review. Most journals did not have an official policy on altering reports from reviewers, but 91% of editors identified at least one situation in which it was appropriate for an editor to alter a report. Editors were also asked for their views on five issues related to publication ethics. A majority expressed support for co-reviewing, reviewers requesting access to data, reviewers recommending citations to their work, editors publishing in their own journals, and replication studies. Our results provide a window into what is largely an opaque aspect of the scientific process. We hope the findings will inform the debate about the role and transparency of peer review in scholarly publishing.


Subject(s)
Editorial Policies, Peer Review, Periodicals as Topic, Humans, Surveys and Questionnaires
12.
R Soc Open Sci ; 7(4): 181351, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32431853

ABSTRACT

The crisis of confidence has undermined the trust that researchers place in the findings of their peers. In order to increase trust in research, initiatives such as preregistration have been suggested, which aim to prevent various questionable research practices. As it stands, however, no empirical evidence exists that preregistration does increase perceptions of trust. The picture may be complicated by a researcher's familiarity with the author of the study, regardless of the preregistration status of the research. This registered report presents an empirical assessment of the extent to which preregistration increases the trust of 209 active academics in the reported outcomes, and of how familiarity with another researcher influences that trust. Contrary to our expectations, we report ambiguous Bayes factors and conclude that we do not have strong evidence towards answering our research questions. Our findings are presented along with evidence that our manipulations were ineffective for many participants, leading to the exclusion of 68% of complete datasets and, as a consequence, an underpowered design. We discuss other limitations and confounds which may explain why the findings of the study deviate from a previously conducted pilot study. We reflect on the benefits of using the registered report submission format in light of our results. The OSF page for this registered report and its pilot can be found here: http://dx.doi.org/10.17605/OSF.IO/B3K75.

14.
Exp Psychol ; 65(3): 158-169, 2018 May.
Article in English | MEDLINE | ID: mdl-29905114

ABSTRACT

As a research field expands, scientists have to update their knowledge and integrate the outcomes of a sequence of studies. Such integrative judgments, however, are known to fall victim to a primacy bias, whereby people anchor their judgments on the initial information. In this preregistered study we tested the hypothesis that people anchor on the outcome of a small initial study, reducing the impact of a larger subsequent study that contradicts the initial result. Contrary to our expectation, undergraduates and academics displayed a recency bias, anchoring their judgment on the research outcome presented last. This recency bias arose because unsuccessful replications decreased trust in an effect more than unsuccessful initial experiments did. We recommend the time-reversal heuristic to account for temporal order effects when integrating research results.


Subject(s)
Knowledge, Research Design/trends, Female, Humans, Male
15.
PLoS One ; 13(4): e0195474, 2018.
Article in English | MEDLINE | ID: mdl-29694370

ABSTRACT

Efficient medical progress requires that we know when a treatment effect is absent. We considered all 207 Original Articles published in the 2015 volume of the New England Journal of Medicine and found that 45 (21.7%) reported a null result for at least one of the primary outcome measures. Unfortunately, standard statistical analyses are unable to quantify the degree to which these null results actually support the null hypothesis. Such quantification is possible, however, by conducting a Bayesian hypothesis test. Here we reanalyzed a subset of 43 null results from 36 articles using a default Bayesian test for contingency tables. This Bayesian reanalysis revealed that, on average, the reported null results provided strong evidence for the absence of an effect. However, the degree of this evidence is variable and cannot be reliably predicted from the p-value. For null results, sample size is a better (albeit imperfect) predictor of the strength of evidence in favor of the null hypothesis. Together, our findings suggest that (a) the reported null results generally correspond to strong evidence in favor of the null hypothesis; and (b) a Bayesian hypothesis test can provide additional information to assist the interpretation of null results.


Subject(s)
Statistical Data Interpretation, Treatment Failure, Bayes Theorem, Factor Analysis, Humans, Periodicals as Topic
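
The default Bayesian contingency-table test used in entry 15 (as implemented in, for example, the R BayesFactor package or JASP) has no simple SciPy equivalent. As a rough stand-in only, the sketch below approximates the evidence for the null in a hypothetical 2x2 table with a BIC-based Bayes factor built on the likelihood-ratio chi-square statistic; the table and the approximation are our assumptions, not the paper's reanalysis.

```python
# Crude proxy for a Bayesian 2x2 contingency-table test via a BIC-based
# Bayes factor; NOT the default Gunel-Dickey test used in the paper.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 70],     # hypothetical trial: events / no events
                  [28, 72]])    # in the treatment vs control arm

g_stat, p_value, dof, _ = chi2_contingency(
    table, correction=False, lambda_="log-likelihood"   # likelihood-ratio G
)
n = table.sum()

# BIC approximation: BF01 ~= exp((dof * ln(n) - G) / 2); values above 1
# favour the null hypothesis of no association.
bf01 = np.exp((dof * np.log(n) - g_stat) / 2)

print(f"G = {g_stat:.2f}, p = {p_value:.2f}, approximate BF01 = {bf01:.1f}")
```
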
16.
Int J Qual Stud Health Well-being ; 12(sup1): 1305590, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28532325

ABSTRACT

Academic textbooks are essential assets for disseminating knowledge about ADHD to future healthcare professionals. This study examined whether they are balanced with regard to genetics. We selected and analyzed textbooks (N = 43) used in (pre-)master's programmes at 10 universities in the Netherlands. Because quantitative genetic studies, which are informed by behavioural data alone, yield a much higher effect size for the genetic involvement in ADHD, it is important that textbooks contrast these findings with the outcomes of molecular genetic studies. The latter use real genetic data, and their low effect sizes expose the potential weaknesses of quantitative genetics, such as underestimating the involvement of the environment. Only a quarter of the books mention both effect sizes and contrast these findings, while another quarter does not discuss any effect size. Most importantly, however, roughly half of the books in our sample mention only the effect sizes from quantitative genetic studies, without addressing the low explained variance found in molecular genetic studies. This may confuse readers by suggesting that the weakly associated genes support the quite spectacular, but potentially flawed, estimates from twin, family and adoption studies, when they actually contradict them.


Subject(s)
Attention Deficit Disorder with Hyperactivity/genetics, Books, Graduate Medical Education/methods, Genetics/education, Humans, Netherlands
17.
R Soc Open Sci ; 3(1): 150547, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26909182

ABSTRACT

Openness is one of the central values of science. Open scientific practices such as sharing data, materials and analysis scripts alongside published articles have many benefits, including easier replication and extension studies, increased availability of data for theory-building and meta-analysis, and increased possibility of review and collaboration even after a paper has been published. Although modern information technology makes sharing easier than ever before, uptake of open practices has been slow. We suggest this might be partly due to a social dilemma arising from misaligned incentives, and propose a specific, concrete mechanism (reviewers withholding comprehensive review) to achieve the goal of creating the expectation of open practices as a matter of scientific principle.

18.
Psychon Bull Rev ; 23(1): 131-40, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26620955

ABSTRACT

Miller and Ulrich (2015) critique our claim, based on a survey given to researchers and students, that confidence intervals (CIs) are widely misunderstood (Hoekstra et al., Psychonomic Bulletin & Review, 21(5), 1157-1164, 2014). They suggest that survey respondents may have interpreted the statements in the survey that we deemed incorrect in an idiosyncratic but correct way, thus calling into question the conclusion that respondents could not properly interpret CIs. Their alternative interpretations, while correct, cannot be deemed acceptable renderings of the questions in the survey because of the well-known reference class problem. Moreover, there is no support in the data for their contention that participants may have had these alternative interpretations in mind. Finally, their alternative interpretations are merely trivial restatements of the definition of a confidence interval and have no implications for the location of a parameter.


Subject(s)
Confidence Intervals, Statistical Data Interpretation, Humans
19.
Psychon Bull Rev ; 23(1): 103-23, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26450628

ABSTRACT

Interval estimates, that is, estimates of parameters that include an allowance for sampling uncertainty, have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true parameter value in some known proportion of repeated samples, on average. The width of a confidence interval is thought to index the precision of an estimate; CIs are thought to be a guide to which parameter values are plausible or reasonable; and the confidence coefficient of the interval (e.g., 95%) is thought to index the plausibility that the true parameter is included in the interval. We show in a number of examples that CIs do not necessarily have any of these properties, and can lead to unjustified or arbitrary inferences. For this reason, we caution against relying on confidence interval theory to justify interval estimates, and suggest that other theories of interval estimation should be used instead.


Subject(s)
Bayes Theorem, Confidence Intervals, Statistical Data Interpretation, Humans
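
The long-run coverage property described in entry 19 (the interval contains the true value in a known proportion of repeated samples, on average) is easy to demonstrate by simulation, and the simulation also makes clear that the property says nothing about any single realized interval. The sketch below is purely illustrative and is not drawn from the paper.

```python
# Coverage of the standard 95% t-interval over repeated samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean, n, reps = 10.0, 25, 10_000

covered = 0
for _ in range(reps):
    sample = rng.normal(loc=true_mean, scale=3.0, size=n)
    half_width = stats.t.ppf(0.975, df=n - 1) * sample.std(ddof=1) / np.sqrt(n)
    covered += (sample.mean() - half_width
                <= true_mean
                <= sample.mean() + half_width)

print(f"empirical coverage = {covered / reps:.3f}")   # close to 0.95
```
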
20.
Psychon Bull Rev ; 21(5): 1157-64, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24420726

ABSTRACT

Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students, all in the field of psychology, were asked to assess the truth value of six statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three of them, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers' performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding p-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.


Subject(s)
Confidence Intervals, Statistical Data Interpretation, Behavioral Research, Educational Status, Humans, Probability, Psychology/methods, Students, Surveys and Questionnaires