Results 1 - 20 of 41

1.
Proc Natl Acad Sci U S A ; 121(32): e2403490121, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39078672

ABSTRACT

A typical empirical study involves choosing a sample, a research design, and an analysis path. Variation in such choices across studies leads to heterogeneity in results that introduces an additional layer of uncertainty, limiting the generalizability of published scientific findings. We provide a framework for studying heterogeneity in the social sciences and divide heterogeneity into population, design, and analytical heterogeneity. Our framework suggests that after accounting for heterogeneity, the probability that the tested hypothesis is true for the average population, design, and analysis path can be much lower than implied by the nominal error rates of statistically significant individual studies. We estimate each type of heterogeneity from 70 multilab replication studies, 11 prospective meta-analyses of studies employing different experimental designs, and 5 multianalyst studies. In our data, population heterogeneity tends to be relatively small, whereas design and analytical heterogeneity are large. Our results should, however, be interpreted cautiously due to the limited number of studies and the large uncertainty in the heterogeneity estimates. We discuss several ways to parse and account for heterogeneity in the context of different methodologies.
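
A minimal Monte Carlo sketch of the framework's central point, with every parameter value (prior, effect size, standard error, heterogeneity) assumed for illustration rather than taken from the paper's estimates: the larger the design/analytical heterogeneity tau, the lower the probability that a statistically significant individual study reflects a hypothesis that is true for the average design and analysis path.

```python
# Monte Carlo sketch: how between-design/analysis heterogeneity (tau)
# erodes the probability that a significant study reflects a hypothesis
# that is true on average. All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_sim = 1_000_000
prior = 0.10     # assumed prior probability that the average effect is real
mu_true = 0.30   # assumed average effect size when the hypothesis is true
se = 0.10        # assumed standard error of a single study's estimate

def p_true_given_significant(tau):
    """P(hypothesis true on average | one significant study) for heterogeneity tau."""
    is_true = rng.random(n_sim) < prior
    mu = np.where(is_true, mu_true, 0.0)           # average true effect
    theta = mu + rng.normal(0.0, tau, n_sim)       # effect for this design/analysis path
    d_obs = theta + rng.normal(0.0, se, n_sim)     # observed estimate
    significant = d_obs / se > 1.96                # significant in the hypothesized direction
    return is_true[significant].mean()

for tau in (0.0, 0.15, 0.30):  # none, moderate, large heterogeneity
    print(f"tau = {tau:.2f}: P(true | significant) ~ {p_true_given_significant(tau):.2f}")
```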

2.
Nature ; 582(7810): 84-88, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32483374

ABSTRACT

Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses [1]. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset [2-5]. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed.
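
As a toy illustration of the aggregation idea (the paper's consensus analysis is an image-based meta-analysis, which is more involved), a Stouffer-style combination of hypothetical per-team z-scores shows how noisy team-level evidence can still yield a clear combined signal:

```python
# Stouffer-style aggregation of per-team evidence: a toy stand-in for the
# paper's image-based meta-analysis. The z-scores are fabricated, and the
# method assumes independent teams, which teams analysing the same data
# violate; this only illustrates why aggregation can yield a consensus.
import numpy as np
from scipy.stats import norm

team_z = np.array([2.1, 0.4, 1.8, -0.3, 2.6, 1.1, 0.9, 1.5])  # hypothetical per-team z-scores

z_combined = team_z.sum() / np.sqrt(len(team_z))  # Stouffer's combined z
p_one_sided = norm.sf(z_combined)

print(f"combined z = {z_combined:.2f}, one-sided p = {p_one_sided:.4f}")
```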


Subjects
Data Analysis , Data Science/methods , Data Science/standards , Datasets as Topic , Functional Neuroimaging , Magnetic Resonance Imaging , Research Personnel/organization & administration , Brain/diagnostic imaging , Brain/physiology , Datasets as Topic/statistics & numerical data , Female , Humans , Logistic Models , Male , Meta-Analysis as Topic , Models, Neurological , Reproducibility of Results , Research Personnel/standards , Software
3.
Proc Natl Acad Sci U S A ; 120(23): e2215572120, 2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37252958

ABSTRACT

Does competition affect moral behavior? This fundamental question has been debated among leading scholars for centuries, and more recently, it has been tested in experimental studies yielding a body of rather inconclusive empirical evidence. A potential source of ambivalent empirical results on the same hypothesis is design heterogeneity-variation in true effect sizes across various reasonable experimental research protocols. To provide further evidence on whether competition affects moral behavior and to examine whether the generalizability of a single experimental study is jeopardized by design heterogeneity, we invited independent research teams to contribute experimental designs to a crowd-sourced project. In a large-scale online data collection, 18,123 experimental participants were randomly allocated to 45 randomly selected experimental designs out of 95 submitted designs. We find a small adverse effect of competition on moral behavior in a meta-analysis of the pooled data. The crowd-sourced design of our study allows for a clean identification and estimation of the variation in effect sizes above and beyond what could be expected due to sampling variance. We find substantial design heterogeneity-estimated to be about 1.6 times as large as the average standard error of effect size estimates of the 45 research designs-indicating that the informativeness and generalizability of results based on a single experimental design are limited. Drawing strong conclusions about the underlying hypotheses in the presence of substantive design heterogeneity requires moving toward much larger data collections on various experimental designs testing the same hypothesis.
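
The heterogeneity quantity described here, a between-design standard deviation tau compared against the average standard error, is what a standard random-effects calculation estimates. A minimal DerSimonian-Laird sketch with fabricated effect sizes (not the study's data):

```python
# DerSimonian-Laird estimate of the between-design standard deviation tau,
# the quantity the abstract compares to the average standard error.
# Effect estimates and SEs are fabricated for illustration.
import numpy as np

d = np.array([-0.10, 0.05, -0.25, 0.15, -0.30, -0.05, 0.20, -0.15])  # per-design effects
se = np.array([0.08, 0.07, 0.09, 0.08, 0.10, 0.07, 0.09, 0.08])      # per-design SEs

w = 1.0 / se**2                            # inverse-variance weights
d_bar = np.sum(w * d) / np.sum(w)          # pooled (fixed-effect) estimate
Q = np.sum(w * (d - d_bar) ** 2)           # Cochran's Q statistic
C = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - (len(d) - 1)) / C)    # DL moment estimator, truncated at 0
tau = np.sqrt(tau2)

print(f"tau = {tau:.3f}, mean SE = {se.mean():.3f}, ratio = {tau / se.mean():.2f}")
```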

4.
Proc Natl Acad Sci U S A ; 119(30): e2120377119, 2022 Jul 26.
Article in English | MEDLINE | ID: mdl-35858443

ABSTRACT

This initiative examined systematically the extent to which a large set of archival research findings generalizes across contexts. We repeated the key analyses for 29 original strategic management effects in the same context (direct reproduction) as well as in 52 novel time periods and geographies; 45% of the direct reproductions returned results matching the original reports, as did 55% of tests in different spans of years and 40% of tests in novel geographies. Some original findings were associated with multiple new tests. Reproducibility was the best predictor of generalizability: for the findings that proved directly reproducible, 84% emerged in other available time periods and 57% emerged in other geographies. Overall, only limited empirical evidence emerged for context sensitivity. In a forecasting survey, independent scientists were able to anticipate which effects would find support in tests in new samples.

6.
Annu Rev Psychol ; 73: 719-748, 2022 Jan 04.
Article in English | MEDLINE | ID: mdl-34665669

ABSTRACT

Replication-an important, uncommon, and misunderstood practice-is gaining appreciation in psychology. Achieving replicability is important for making research progress. If findings are not replicable, then prediction and theory development are stifled. If findings are replicable, then interrogation of their meaning and validity can advance knowledge. Assessing replicability can be productive for generating and testing hypotheses by actively confronting current understandings to identify weaknesses and spur innovation. For psychology, the 2010s might be characterized as a decade of active confrontation. Systematic and multi-site replication projects assessed current understandings and observed surprising failures to replicate many published findings. Replication efforts highlighted sociocultural challenges such as disincentives to conduct replications and a tendency to frame replication as a personal attack rather than a healthy scientific practice, and they raised awareness that replication contributes to self-correction. Nevertheless, innovation in doing and understanding replication and its cousins, reproducibility and robustness, has positioned psychology to improve research practices and accelerate progress.


Subjects
Research Design , Humans , Reproducibility of Results
7.
Proc Natl Acad Sci U S A ; 112(50): 15343-7, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26553988

ABSTRACT

Concerns about a lack of reproducibility of statistically significant results have recently been raised in many fields, and it has been argued that this lack comes at substantial economic costs. We here report the results from prediction markets set up to quantify the reproducibility of 44 studies published in prominent psychology journals and replicated in the Reproducibility Project: Psychology. The prediction markets predict the outcomes of the replications well and outperform a survey of market participants' individual forecasts. This shows that prediction markets are a promising tool for assessing the reproducibility of published scientific results. The prediction markets also allow us to estimate probabilities for the hypotheses being true at different testing stages, which provides valuable information regarding the temporal dynamics of scientific discovery. We find that the hypotheses being tested in psychology typically have low prior probabilities of being true (median, 9%) and that a "statistically significant" finding needs to be confirmed in a well-powered replication to have a high probability of being true. We argue that prediction markets could be used to obtain speedy information about reproducibility at low cost and could potentially even be used to determine which studies to replicate to optimally allocate limited resources into replications.
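
The abstract's arithmetic can be made explicit with the standard post-study probability formula, PPV = power × prior / (power × prior + alpha × (1 − prior)). The 9% prior is the abstract's median estimate; power of 0.8 and alpha of 0.05 are conventional assumptions, not the paper's values:

```python
# Post-study probability that a hypothesis is true after one significant
# result: PPV = power * prior / (power * prior + alpha * (1 - prior)).
# The 9% prior comes from the abstract; power = 0.8 and alpha = 0.05 are
# conventional assumptions.

def ppv(prior: float, power: float = 0.8, alpha: float = 0.05) -> float:
    """Positive predictive value of a significant test result."""
    return power * prior / (power * prior + alpha * (1 - prior))

p_after_original = ppv(0.09)
p_after_replication = ppv(p_after_original)  # updated value becomes the new prior

print(f"P(true) after the original significant study:         {p_after_original:.2f}")   # ~0.61
print(f"P(true) after a significant well-powered replication: {p_after_replication:.2f}")  # ~0.96
```

The jump from roughly 0.61 to roughly 0.96 is the quantitative sense in which a "statistically significant" finding needs a well-powered replication to have a high probability of being true.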


Subjects
Forecasting , Research , Science , Commerce , Probability , Reproducibility of Results , Surveys and Questionnaires
8.
Nature ; 452(7185): 348-51, 2008 Mar 20.
Article in English | MEDLINE | ID: mdl-18354481

ABSTRACT

A key aspect of human behaviour is cooperation. We tend to help others even if costs are involved. We are more likely to help when the costs are small and the benefits for the other person significant. Cooperation leads to a tension between what is best for the individual and what is best for the group. A group does better if everyone cooperates, but each individual is tempted to defect. Recently there has been much interest in exploring the effect of costly punishment on human cooperation. Costly punishment means paying a cost for another individual to incur a cost. It has been suggested that costly punishment promotes cooperation even in non-repeated games and without any possibility of reputation effects. But most of our interactions are repeated and reputation is always at stake. Thus, if costly punishment is important in promoting cooperation, it must do so in a repeated setting. We have performed experiments in which, in each round of a repeated game, people choose between cooperation, defection and costly punishment. In control experiments, people could only cooperate or defect. Here we show that the option of costly punishment increases the amount of cooperation but not the average payoff of the group. Furthermore, there is a strong negative correlation between total payoff and use of costly punishment. Those people who gain the highest total payoff tend not to use costly punishment: winners don't punish. This suggests that costly punishment behaviour is maladaptive in cooperation games and might have evolved for other reasons.


Subjects
Altruism , Cooperative Behavior , Game Theory , Punishment/psychology , Adult , Biological Evolution , Female , Humans , Male , Models, Psychological , Risk Assessment
9.
R Soc Open Sci ; 11(7): 240125, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39050728

ABSTRACT

Many-analysts studies explore how well an empirical claim withstands plausible alternative analyses of the same dataset by multiple, independent analysis teams. Conclusions from these studies typically rely on a single outcome metric (e.g. effect size) provided by each analysis team. Although informative about the range of plausible effects in a dataset, a single effect size from each team does not provide a complete, nuanced understanding of how analysis choices are related to the outcome. We used the Delphi consensus technique with input from 37 experts to develop an 18-item subjective evidence evaluation survey (SEES) to evaluate how each analysis team views the methodological appropriateness of the research design and the strength of evidence for the hypothesis. We illustrate the usefulness of the SEES in providing richer evidence assessment with pilot data from a previous many-analysts study.

10.
Lakartidningen ; 120, 2023 May 15.
Article in Swedish | MEDLINE | ID: mdl-37191395

ABSTRACT

Analysis of research data entails many choices. As a result, a space of different analytical strategies is open to researchers. Different justifiable analyses may not give similar results. The method of multiple analysts is a way to study the analytical flexibility and behaviour of researchers under naturalistic conditions, as part of the field known as metascience. Analytical flexibility and risks of bias can be counteracted by open data sharing, pre-registration of analysis plans, and registration of clinical trials in trial registers. These measures are particularly important for retrospective studies where analytical flexibility can be greatest, although pre-registration is less useful in this context. Synthetic datasets can be an alternative to pre-registration when used to decide what analyses should be conducted on real datasets by independent parties. All these strategies help build trustworthiness in scientific reports, and improve the reliability of research findings.


Subjects
Biomedical Research , Humans , Reproducibility of Results , Retrospective Studies
11.
Proc Natl Acad Sci U S A ; 106(15): 6187-91, 2009 Apr 14.
Article in English | MEDLINE | ID: mdl-19332775

ABSTRACT

People often favor members of their own group, while discriminating against members of other groups. Such in-group favoritism has been shown to play an important role in human cooperation. However, in the face of changing conflicts and shifting alliances, it is essential for group identities to be flexible. Using the dictator game from behavioral economics, we demonstrate the remodeling of group identities among supporters of Democratic presidential candidates Barack Obama and Hillary Clinton. After Clinton's concession in June 2008, Democrats were more generous toward supporters of their own preferred candidate than to supporters of the other Democratic candidate. The bias observed in June persisted into August, and disappeared only in early September after the Democratic National Convention. We also observe a strong gender effect, with bias both appearing and subsiding among men only. This experimental study illustrates a dynamic change in bias, tracking the realignment of real world conflict lines and public efforts to reconstitute group identity. The change in salient group identity we describe here likely contributed to the victory of Barack Obama in the 2008 presidential election.


Subjects
Federal Government , Group Processes , Politics , Prejudice , Social Behavior , Female , Humans , Male , Time Factors , United States
12.
Behav Brain Sci ; 35(1): 24, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22289313

ABSTRACT

Guala argues that there is a mismatch between most laboratory experiments on costly punishment and behavior in the field. In the lab, experimental designs typically suppress retaliation. The same is true for most theoretical models of the co-evolution of costly punishment and cooperation, which a priori exclude the possibility of defectors punishing cooperators.


Subjects
Cooperative Behavior , Game Theory , Models, Psychological , Punishment/psychology , Social Behavior , Humans
13.
Sci Rep ; 12(1): 7575, 2022 May 09.
Article in English | MEDLINE | ID: mdl-35534489

ABSTRACT

Scientists and policymakers seek to choose effective interventions that promote preventative health measures. We evaluated whether academics, behavioral science practitioners, and laypeople (N = 1034) were able to forecast the effectiveness of seven different messages compared to a baseline message for Republicans and Democrats separately. These messages were designed to nudge mask-wearing attitudes, intentions, and behaviors. When examining predictions across political parties, forecasters predicted larger effects than those observed for Democrats compared to Republicans and made more accurate predictions for Republicans compared to Democrats. These results are partly driven by a lack of nudge effects on Democrats, as reported in Gelfand et al. (J Exp Soc Psychol, 2021). Academics and practitioners made more accurate predictions than laypeople. Although forecasters' predictions were correlated with the observed effects of the nudge interventions, all groups overestimated the observed results. We discuss potential reasons why the forecasts did not perform better and how more accurate forecasts of behavioral intervention outcomes could help save resources and increase the efficacy of interventions.


Subjects
Attitude , Politics , Behavior Therapy
14.
R Soc Open Sci ; 9(9): 220440, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36177198

ABSTRACT

Many publications on COVID-19 were released on preprint servers such as medRxiv and bioRxiv. It is unknown how reliable these preprints are, and which ones will eventually be published in scientific journals. In this study, we use crowdsourced human forecasts to predict publication outcomes and future citation counts for a sample of 400 preprints with high Altmetric scores. Most of these preprints were published within 1 year of upload on a preprint server (70%), with a considerable fraction (45%) appearing in a high-impact journal with a journal impact factor of at least 10. On average, the preprints received 162 citations within the first year. We found that forecasters can predict whether preprints will be published after 1 year and whether the publishing journal has high impact. Forecasts are also informative with respect to Google Scholar citations within 1 year of upload on a preprint server. For both types of assessment, we found statistically significant positive correlations between forecasts and observed outcomes. While the forecasts can help to provide a preliminary assessment of preprints at a faster pace than traditional peer review, it remains to be investigated whether such an assessment is suited to identifying methodological problems in preprints.

15.
PLoS One ; 16(4): e0248780, 2021.
Article in English | MEDLINE | ID: mdl-33852589

ABSTRACT

The reproducibility of published research has become an important topic in science policy. A number of large-scale replication projects have been conducted to gauge the overall reproducibility in specific academic fields. Here, we present an analysis of data from four studies which sought to forecast the outcomes of replication projects in the social and behavioural sciences, using human experts who participated in prediction markets and answered surveys. Because the number of findings replicated and predicted in each individual study was small, pooling the data offers an opportunity to evaluate hypotheses regarding the performance of prediction markets and surveys at higher power. In total, peer beliefs were elicited for the replication outcomes of 103 published findings. We find that there is information within the scientific community about the replicability of scientific findings, and that both surveys and prediction markets can be used to elicit and aggregate this information. Our results show prediction markets can predict the outcomes of direct replications with 73% accuracy (n = 103). Both the prediction market prices and the average survey responses are correlated with outcomes (0.581 and 0.564, respectively; both p < .001). We also found a significant relationship between the p-values of the original findings and replication outcomes. The dataset is made available through the R package "pooledmaRket" and can be used to further study community beliefs about replication outcomes as elicited in the surveys and prediction markets.
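
A minimal sketch of how final market prices are typically scored against binary replication outcomes, with fabricated prices and outcomes (the real data ship with the "pooledmaRket" R package):

```python
# Scoring market prices against binary replication outcomes: accuracy of
# the price > 0.5 rule and the price-outcome correlation. Prices and
# outcomes are fabricated; the real data ship with the "pooledmaRket"
# R package named in the abstract.
import numpy as np
from scipy.stats import pearsonr

prices = np.array([0.82, 0.35, 0.61, 0.15, 0.72, 0.48, 0.90, 0.28])  # final price = implied P(replicates)
replicated = np.array([1, 0, 1, 0, 1, 1, 1, 0])                      # observed outcomes

accuracy = np.mean((prices > 0.5).astype(int) == replicated)
r, p = pearsonr(prices, replicated)  # point-biserial correlation

print(f"accuracy = {accuracy:.2f}, correlation r = {r:.2f} (p = {p:.3f})")
```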


Subjects
Forecasting/methods , Reproducibility of Results , Research/statistics & numerical data , Humans , Research/trends , Surveys and Questionnaires
16.
R Soc Open Sci ; 8(7): 181308, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34295507

ABSTRACT

There is evidence that prediction markets are useful tools for aggregating information on researchers' beliefs about scientific results, including the outcome of replications. In this study, we use prediction markets to forecast the results of novel experimental designs that test established theories. We set up prediction markets for hypotheses tested in the Defense Advanced Research Projects Agency's (DARPA) Next Generation Social Science (NGS2) programme. Researchers were invited to bet on whether 22 hypotheses would be supported or not. We define support as a test result in the same direction as hypothesized, with a Bayes factor of at least 10 (i.e. the observed data are at least 10 times more likely under the tested hypothesis than under the null hypothesis). In addition to betting on this binary outcome, we asked participants to bet on the expected effect size (in Cohen's d) for each hypothesis. Our goal was to recruit at least 50 participants, and although at least 50 signed up for the markets, only 39 ended up actually trading. Participants also completed a survey on both the binary result and the effect size. We find that neither prediction markets nor surveys performed well in predicting outcomes for NGS2.
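
The support criterion can be illustrated for a simple two-group comparison using the BIC approximation to the Bayes factor, BF10 ≈ exp((BIC_null − BIC_alt)/2). This is a sketch on simulated data, not the NGS2 teams' Bayesian models:

```python
# BIC approximation to the Bayes factor, BF10 ~ exp((BIC_null - BIC_alt) / 2),
# for a two-group comparison on simulated data. The NGS2 teams specified
# their own Bayesian models; this only illustrates the BF >= 10 threshold
# (direction check omitted for brevity).
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 200)  # simulated control group
b = rng.normal(0.5, 1.0, 200)  # simulated treatment group (true d = 0.5)

y = np.concatenate([a, b])
n = len(y)

rss_null = np.sum((y - y.mean()) ** 2)                               # one shared mean
rss_alt = np.sum((a - a.mean()) ** 2) + np.sum((b - b.mean()) ** 2)  # group-specific means

bic_null = n * np.log(rss_null / n) + 1 * np.log(n)
bic_alt = n * np.log(rss_alt / n) + 2 * np.log(n)
bf10 = np.exp((bic_null - bic_alt) / 2)

print(f"BF10 ~ {bf10:.1f}; supported under the NGS2-style criterion: {bf10 >= 10}")
```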

17.
Cortex ; 144: 213-229, 2021 Nov.
Article in English | MEDLINE | ID: mdl-33965167

ABSTRACT

There is growing awareness across the neuroscience community that the replicability of findings about the relationship between brain activity and cognitive phenomena can be improved by conducting studies with high statistical power that adhere to well-defined and standardised analysis pipelines. Inspired by recent efforts from the psychological sciences, and with the desire to examine some of the foundational findings using electroencephalography (EEG), we have launched #EEGManyLabs, a large-scale international collaborative replication effort. Since its discovery in the early 20th century, EEG has had a profound influence on our understanding of human cognition, but there is limited evidence on the replicability of some of the most highly cited discoveries. After a systematic search and selection process, we have identified 27 of the most influential and continually cited studies in the field. We plan to directly test the replicability of key findings from 20 of these studies in teams of at least three independent laboratories. The design and protocol of each replication effort will be submitted as a Registered Report and peer-reviewed prior to data collection. Prediction markets, open to all EEG researchers, will be used as a forecasting tool to examine which findings the community expects to replicate. This project will update our confidence in some of the most influential EEG findings and generate a large open access database that can be used to inform future research practices. Finally, through this international effort, we hope to create a cultural shift towards inclusive, high-powered multi-laboratory collaborations.


Subjects
Electroencephalography , Neurosciences , Cognition , Humans , Reproducibility of Results
18.
Elife ; 10, 2021 Nov 09.
Article in English | MEDLINE | ID: mdl-34751133

ABSTRACT

Any large dataset can be analyzed in a number of ways, and it is possible that the use of different analysis strategies will lead to different results and conclusions. One way to assess whether the results obtained depend on the analysis strategy chosen is to employ multiple analysts and leave each of them free to follow their own approach. Here, we present consensus-based guidance for conducting and reporting such multi-analyst studies, and we discuss how broader adoption of the multi-analyst approach has the potential to strengthen the robustness of results and conclusions obtained from analyses of datasets in basic and applied research.


Subjects
Consensus , Data Analysis , Datasets as Topic , Research
20.
R Soc Open Sci ; 7(7): 200566, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32874648

ABSTRACT

The Defense Advanced Research Projects Agency (DARPA) programme 'Systematizing Confidence in Open Research and Evidence' (SCORE) aims to generate confidence scores for a large number of research claims from empirical studies in the social and behavioural sciences. The confidence scores will provide a quantitative assessment of how likely a claim will hold up in an independent replication. To create the scores, we follow earlier approaches and use prediction markets and surveys to forecast replication outcomes. Based on an initial set of forecasts for the overall replication rate in SCORE and its dependence on the academic discipline and the time of publication, we show that participants expect replication rates to increase over time. Moreover, they expect replication rates to differ between fields, with the highest replication rate in economics (average survey response 58%), and the lowest in psychology and in education (average survey response of 42% for both fields). These results reveal insights into the academic community's views of the replication crisis, including for research fields for which no large-scale replication studies have been undertaken yet.
