3.
Lakartidningen ; 120, 2023 05 15.
Article in Swedish | MEDLINE | ID: mdl-37191395

ABSTRACT

Analysis of research data entails many choices. As a result, a space of different analytical strategies is open to researchers. Different justifiable analyses may not give similar results. The method of multiple analysts is a way to study the analytical flexibility and behaviour of researchers under naturalistic conditions, as part of the field known as metascience. Analytical flexibility and risks of bias can be counteracted by open data sharing, pre-registration of analysis plans, and registration of clinical trials in trial registers. These measures are particularly important for retrospective studies where analytical flexibility can be greatest, although pre-registration is less useful in this context. Synthetic datasets can be an alternative to pre-registration when used to decide what analyses should be conducted on real datasets by independent parties. All these strategies help build trustworthiness in scientific reports, and improve the reliability of research findings.


Subjects
Biomedical Research, Humans, Reproducibility of Results, Retrospective Studies
4.
Proc Natl Acad Sci U S A ; 120(23): e2215572120, 2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37252958

ABSTRACT

Does competition affect moral behavior? This fundamental question has been debated among leading scholars for centuries, and more recently, it has been tested in experimental studies yielding a body of rather inconclusive empirical evidence. A potential source of conflicting empirical results on the same hypothesis is design heterogeneity: variation in true effect sizes across different reasonable experimental research protocols. To provide further evidence on whether competition affects moral behavior, and to examine whether the generalizability of a single experimental study is jeopardized by design heterogeneity, we invited independent research teams to contribute experimental designs to a crowd-sourced project. In a large-scale online data collection, 18,123 experimental participants were randomly allocated to 45 randomly selected experimental designs out of 95 submitted designs. In a meta-analysis of the pooled data, we find a small adverse effect of competition on moral behavior. The crowd-sourced design of our study allows for clean identification and estimation of the variation in effect sizes above and beyond what could be expected due to sampling variance. We find substantial design heterogeneity, estimated to be about 1.6 times as large as the average standard error of the effect size estimates of the 45 research designs, indicating that the informativeness and generalizability of results based on a single experimental design are limited. Drawing strong conclusions about the underlying hypotheses in the presence of substantive design heterogeneity requires moving toward much larger data collections covering various experimental designs that test the same hypothesis.
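
The comparison above, between design heterogeneity and the average standard error, can be made concrete with a standard random-effects calculation. The following is a minimal sketch, not the paper's actual estimation procedure: it applies the DerSimonian-Laird estimator of between-design variation (tau) to hypothetical effect estimates and standard errors for 45 designs.

```python
# Minimal sketch (not the paper's procedure): DerSimonian-Laird estimate of
# between-design heterogeneity (tau), compared with the average standard
# error, mirroring the "about 1.6 times" comparison in the abstract.
import numpy as np

def design_heterogeneity(effects, ses):
    """Return (tau, mean SE) given effect estimates and their standard errors."""
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    w = 1.0 / ses**2                           # inverse-variance weights
    mu = np.sum(w * effects) / np.sum(w)       # fixed-effect pooled estimate
    q = np.sum(w * (effects - mu) ** 2)        # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # DL between-design variance
    return np.sqrt(tau2), ses.mean()

# Hypothetical effect estimates and standard errors for 45 designs:
rng = np.random.default_rng(0)
ses = rng.uniform(0.03, 0.08, size=45)
effects = rng.normal(-0.10, 0.08, size=45) + rng.normal(0.0, ses)
tau, mean_se = design_heterogeneity(effects, ses)
print(f"tau = {tau:.3f}, mean SE = {mean_se:.3f}, ratio = {tau / mean_se:.2f}")
```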

5.
R Soc Open Sci ; 9(9): 220440, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36177198

ABSTRACT

Many publications on COVID-19 were released on preprint servers such as medRxiv and bioRxiv. It is unknown how reliable these preprints are, and which ones will eventually be published in scientific journals. In this study, we use crowdsourced human forecasts to predict publication outcomes and future citation counts for a sample of 400 preprints with high Altmetric scores. Most of these preprints were published within 1 year of upload to a preprint server (70%), with a considerable fraction (45%) appearing in a high-impact journal with a journal impact factor of at least 10. On average, the preprints received 162 citations within the first year. We found that forecasters can predict whether preprints will be published after 1 year and whether the publishing journal is high impact. Forecasts are also informative with respect to Google Scholar citations within 1 year of upload to a preprint server. For both types of assessment, we found statistically significant positive correlations between forecasts and observed outcomes. While the forecasts can help to provide a preliminary assessment of preprints at a faster pace than traditional peer review, it remains to be investigated whether such an assessment is suited to identifying methodological problems in preprints.

6.
Proc Natl Acad Sci U S A ; 119(30): e2120377119, 2022 Jul 26.
Article in English | MEDLINE | ID: mdl-35858443

ABSTRACT

This initiative examined systematically the extent to which a large set of archival research findings generalizes across contexts. We repeated the key analyses for 29 original strategic management effects in the same context (direct reproduction) as well as in 52 novel time periods and geographies; 45% of the direct reproductions returned results matching the original reports, as did 55% of tests in different spans of years and 40% of tests in novel geographies. Some original findings were associated with multiple new tests. Reproducibility was the best predictor of generalizability: of the findings that proved directly reproducible, 84% emerged in other available time periods and 57% emerged in other geographies. Overall, only limited empirical evidence emerged for context sensitivity. In a forecasting survey, independent scientists were able to anticipate which effects would find support in tests in new samples.

7.
Sci Rep ; 12(1): 7575, 2022 05 09.
Article in English | MEDLINE | ID: mdl-35534489

ABSTRACT

Scientists and policymakers seek to choose effective interventions that promote preventative health measures. We evaluated whether academics, behavioral science practitioners, and laypeople (N = 1034) were able to forecast the effectiveness of seven different messages, relative to a baseline message, for Republicans and Democrats separately. These messages were designed to nudge mask-wearing attitudes, intentions, and behaviors. When examining predictions across political parties, forecasters predicted larger effects than those observed for Democrats compared to Republicans, and made more accurate predictions for Republicans than for Democrats. These results are partly driven by a lack of nudge effects on Democrats, as reported in Gelfand et al. (J Exp Soc Psychol, 2021). Academics and practitioners made more accurate predictions than laypeople. Although forecasters' predictions were correlated with the observed effects of the nudge interventions, all groups overestimated the observed results. We discuss potential reasons why the forecasts did not perform better, and how more accurate forecasts of behavioral intervention outcomes could provide insight that helps save resources and increase the efficacy of interventions.


Subjects
Attitude, Politics, Behavior Therapy
8.
Annu Rev Psychol ; 73: 719-748, 2022 01 04.
Article in English | MEDLINE | ID: mdl-34665669

ABSTRACT

Replication, an important, uncommon, and misunderstood practice, is gaining appreciation in psychology. Achieving replicability is important for making research progress. If findings are not replicable, then prediction and theory development are stifled. If findings are replicable, then interrogation of their meaning and validity can advance knowledge. Assessing replicability can be productive for generating and testing hypotheses by actively confronting current understandings to identify weaknesses and spur innovation. For psychology, the 2010s might be characterized as a decade of active confrontation. Systematic and multi-site replication projects assessed current understandings and observed surprising failures to replicate many published findings. Replication efforts highlighted sociocultural challenges, such as disincentives to conduct replications and a tendency to frame replication as a personal attack rather than a healthy scientific practice, and they raised awareness that replication contributes to self-correction. Nevertheless, innovation in doing and understanding replication and its cousins, reproducibility and robustness, has positioned psychology to improve research practices and accelerate progress.


Subjects
Research Design, Humans, Reproducibility of Results
9.
Elife ; 10, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34751133

ABSTRACT

Any large dataset can be analyzed in a number of ways, and it is possible that the use of different analysis strategies will lead to different results and conclusions. One way to assess whether the results obtained depend on the analysis strategy chosen is to employ multiple analysts and leave each of them free to follow their own approach. Here, we present consensus-based guidance for conducting and reporting such multi-analyst studies, and we discuss how broader adoption of the multi-analyst approach has the potential to strengthen the robustness of results and conclusions obtained from analyses of datasets in basic and applied research.


Subjects
Consensus, Data Analysis, Datasets as Topic, Research
10.
R Soc Open Sci ; 8(7): 181308, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34295507

ABSTRACT

There is evidence that prediction markets are useful tools for aggregating information on researchers' beliefs about scientific results, including the outcomes of replications. In this study, we use prediction markets to forecast the results of novel experimental designs that test established theories. We set up prediction markets for hypotheses tested in the Defense Advanced Research Projects Agency's (DARPA) Next Generation Social Science (NGS2) programme. Researchers were invited to bet on whether 22 hypotheses would be supported or not. We define support as a test result in the hypothesized direction with a Bayes factor of at least 10 (i.e., the observed data are at least 10 times more likely under the tested hypothesis than under the null hypothesis). In addition to betting on this binary outcome, we asked participants to bet on the expected effect size (in Cohen's d) for each hypothesis. Our goal was to recruit at least 50 participants; although at least that many signed up for the markets, only 39 ended up actually trading. Participants also completed a survey on both the binary result and the effect size. We find that neither prediction markets nor surveys performed well in predicting outcomes for NGS2.
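
The support criterion above (hypothesized direction plus a Bayes factor of at least 10) can be illustrated with a toy calculation. The sketch below assumes the simplest tractable case, a normal mean with known sigma and a normal prior on the effect; the actual NGS2 Bayes factors depend on each hypothesis's model, so every name and number here is illustrative only.

```python
# Toy illustration of the support criterion: hypothesized direction AND
# BF10 >= 10. The closed-form Bayes factor assumes a normal mean with known
# sigma and a N(0, prior_sd^2) prior on the effect; the actual NGS2 Bayes
# factors depend on each hypothesis's model.
import math

def normal_pdf(x, var):
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bf10_normal_mean(xbar, sigma, n, prior_sd=1.0):
    """BF10 for H1: mu ~ N(0, prior_sd^2) versus H0: mu = 0."""
    se2 = sigma**2 / n                         # sampling variance of the mean
    return normal_pdf(xbar, prior_sd**2 + se2) / normal_pdf(xbar, se2)

def supported(xbar, sigma, n, hypothesized_sign=1):
    bf = bf10_normal_mean(xbar, sigma, n)
    return math.copysign(1, xbar) == hypothesized_sign and bf >= 10

print(supported(xbar=0.25, sigma=1.0, n=200))  # True: BF10 ~ 35, direction matches
```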

11.
Cortex ; 144: 213-229, 2021 11.
Article in English | MEDLINE | ID: mdl-33965167

ABSTRACT

There is growing awareness across the neuroscience community that the replicability of findings about the relationship between brain activity and cognitive phenomena can be improved by conducting studies with high statistical power that adhere to well-defined and standardised analysis pipelines. Inspired by recent efforts from the psychological sciences, and with the desire to examine some of the foundational findings using electroencephalography (EEG), we have launched #EEGManyLabs, a large-scale international collaborative replication effort. Since its discovery in the early 20th century, EEG has had a profound influence on our understanding of human cognition, but there is limited evidence on the replicability of some of the most highly cited discoveries. After a systematic search and selection process, we have identified 27 of the most influential and continually cited studies in the field. We plan to directly test the replicability of key findings from 20 of these studies in teams of at least three independent laboratories. The design and protocol of each replication effort will be submitted as a Registered Report and peer-reviewed prior to data collection. Prediction markets, open to all EEG researchers, will be used as a forecasting tool to examine which findings the community expects to replicate. This project will update our confidence in some of the most influential EEG findings and generate a large open access database that can be used to inform future research practices. Finally, through this international effort, we hope to create a cultural shift towards inclusive, high-powered multi-laboratory collaborations.


Subjects
Electroencephalography, Neurosciences, Cognition, Humans, Reproducibility of Results
12.
PLoS One ; 16(4): e0248780, 2021.
Article in English | MEDLINE | ID: mdl-33852589

ABSTRACT

The reproducibility of published research has become an important topic in science policy. A number of large-scale replication projects have been conducted to gauge the overall reproducibility in specific academic fields. Here, we present an analysis of data from four studies which sought to forecast the outcomes of replication projects in the social and behavioural sciences, using human experts who participated in prediction markets and answered surveys. Because the number of findings replicated and predicted in each individual study was small, pooling the data offers an opportunity to evaluate hypotheses regarding the performance of prediction markets and surveys at higher power. In total, peer beliefs were elicited for the replication outcomes of 103 published findings. We find that there is information within the scientific community about the replicability of scientific findings, and that both surveys and prediction markets can be used to elicit and aggregate this information. Our results show that prediction markets can determine the outcomes of direct replications with 73% accuracy (n = 103). Both the prediction market prices and the average survey responses are correlated with outcomes (0.581 and 0.564, respectively; both p < .001). We also found a significant relationship between the p-values of the original findings and replication outcomes. The dataset is made available through the R package "pooledmaRket" and can be used to further study community beliefs about replication outcomes as elicited in the surveys and prediction markets.
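
For readers who want to reproduce this kind of evaluation, a minimal sketch follows: it scores forecast beliefs against binary replication outcomes using a 0.5 decision rule for accuracy and a point-biserial (Pearson) correlation. The data shown are hypothetical; the real data are distributed via the "pooledmaRket" R package mentioned above.

```python
# Sketch: scoring final market prices (or mean survey beliefs) against binary
# replication outcomes. Data here are hypothetical; the real data ship with
# the "pooledmaRket" R package.
import numpy as np
from scipy.stats import pearsonr

def evaluate_forecasts(beliefs, outcomes):
    """beliefs in [0, 1]; outcomes: 1 = replicated, 0 = not replicated."""
    beliefs, outcomes = np.asarray(beliefs), np.asarray(outcomes)
    accuracy = np.mean((beliefs > 0.5) == (outcomes == 1))  # 0.5 decision rule
    r, p = pearsonr(beliefs, outcomes)  # point-biserial correlation
    return accuracy, r, p

prices = [0.81, 0.22, 0.65, 0.35, 0.90, 0.40]  # hypothetical market prices
outcomes = [1, 0, 1, 1, 1, 0]                  # hypothetical replication results
acc, r, p = evaluate_forecasts(prices, outcomes)
print(f"accuracy = {acc:.2f}, correlation = {r:.2f} (p = {p:.2f})")
```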


Subjects
Forecasting/methods, Reproducibility of Results, Research/statistics & numerical data, Humans, Research/trends, Surveys and Questionnaires
13.
R Soc Open Sci ; 7(7): 200566, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32874648

ABSTRACT

The Defense Advanced Research Projects Agency (DARPA) programme 'Systematizing Confidence in Open Research and Evidence' (SCORE) aims to generate confidence scores for a large number of research claims from empirical studies in the social and behavioural sciences. The confidence scores will provide a quantitative assessment of how likely a claim will hold up in an independent replication. To create the scores, we follow earlier approaches and use prediction markets and surveys to forecast replication outcomes. Based on an initial set of forecasts for the overall replication rate in SCORE and its dependence on the academic discipline and the time of publication, we show that participants expect replication rates to increase over time. Moreover, they expect replication rates to differ between fields, with the highest replication rate in economics (average survey response 58%), and the lowest in psychology and in education (average survey response of 42% for both fields). These results reveal insights into the academic community's views of the replication crisis, including for research fields for which no large-scale replication studies have been undertaken yet.

14.
Nature ; 582(7810): 84-88, 2020 06.
Article in English | MEDLINE | ID: mdl-32483374

ABSTRACT

Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same nine ex-ante hypotheses [1]. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset [2-5]. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed.
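
A minimal sketch of the team-aggregation idea follows. The study used a more sophisticated image-based meta-analysis that accounts for correlation between teams; the equal-weight Stouffer combination below, applied to hypothetical team-level z-maps, only illustrates how pooling across 70 teams can yield a consensus despite team-level variability.

```python
# Minimal sketch of pooling across analysis teams. The paper's image-based
# meta-analysis accounts for inter-team correlation; this equal-weight
# Stouffer combination on hypothetical z-maps only illustrates the idea.
import numpy as np
from scipy.stats import norm

def stouffer_consensus(z_maps):
    """z_maps: (n_teams, n_voxels) array -> consensus z and one-sided p per voxel."""
    z_maps = np.asarray(z_maps)
    z_meta = z_maps.sum(axis=0) / np.sqrt(z_maps.shape[0])  # Stouffer's method
    return z_meta, norm.sf(z_meta)

rng = np.random.default_rng(1)
z_maps = rng.normal(0.5, 1.0, size=(70, 1000))  # 70 teams, hypothetical maps
z_meta, p_meta = stouffer_consensus(z_maps)
print(f"voxels with consensus p < .001: {(p_meta < 0.001).mean():.0%}")
```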


Subjects
Data Analysis, Data Science/methods, Data Science/standards, Datasets as Topic, Functional Neuroimaging, Magnetic Resonance Imaging, Research Personnel/organization & administration, Brain/diagnostic imaging, Brain/physiology, Datasets as Topic/statistics & numerical data, Female, Humans, Logistic Models, Male, Meta-Analysis as Topic, Models, Neurological, Reproducibility of Results, Research Personnel/standards, Software
15.
Psychol Bull ; 146(5): 451-479, 2020 05.
Article in English | MEDLINE | ID: mdl-31944796

ABSTRACT

To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from 2 separate large samples (total N > 15,000) were then randomly assigned to complete 1 version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: Materials from different teams rendered statistically significant effects in opposite directions for 4 of 5 hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for 2 hypotheses and a lack of support for 3 hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, whereas considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim. (PsycInfo Database Record (c) 2020 APA, all rights reserved).


Subjects
Crowdsourcing, Psychology/methods, Research Design, Adult, Humans, Random Allocation
16.
PLoS One ; 14(12): e0225826, 2019.
Article in English | MEDLINE | ID: mdl-31805105

ABSTRACT

We measure how accurately replication of experimental results can be predicted by black-box statistical models. With data from four large-scale replication projects in experimental psychology and economics, and techniques from machine learning, we train predictive models and study which variables drive predictable replication. The models predict binary replication with a cross-validated accuracy rate of 70% (AUC of 0.77) and estimates of relative effect sizes with a Spearman ρ of 0.38. The accuracy level is similar to market-aggregated beliefs of peer scientists [1, 2]. The predictive power is validated in a pre-registered out-of-sample test on the outcomes of [3], where 71% (AUC of 0.73) of replications are predicted correctly and effect size correlations amount to ρ = 0.25. Basic features, such as the sample and effect sizes in original papers and whether reported effects are single-variable main effects or two-variable interactions, are predictive of successful replication. The models presented in this paper are simple tools for producing cheap, prognostic replicability metrics. These models could be useful in institutionalizing the process of evaluating new findings and guiding resources to those direct replications that are likely to be most informative.
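
A hedged sketch of this black-box approach is shown below: a cross-validated classifier over a few paper-level features, scored by accuracy and AUC as in the abstract. The features, labels, and model choice are hypothetical stand-ins, not the paper's specification, and effect size prediction would analogously use a regressor scored with Spearman's ρ.

```python
# Sketch of the black-box approach: a cross-validated classifier on a few
# paper-level features, scored by accuracy and AUC. Features, labels, and the
# model choice are hypothetical stand-ins, not the paper's specification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 131
X = np.column_stack([
    rng.integers(20, 400, n),      # original sample size
    rng.uniform(0.05, 0.90, n),    # original effect size
    rng.integers(0, 2, n),         # 1 = two-variable interaction, 0 = main effect
])
y = rng.integers(0, 2, n)          # 1 = replicated (hypothetical labels)

model = RandomForestClassifier(n_estimators=500, random_state=0)
proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
print("accuracy:", accuracy_score(y, proba > 0.5))
print("AUC:", roc_auc_score(y, proba))
```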


Subjects
Laboratories, Research, Social Sciences, Algorithms, Models, Statistical, ROC Curve, Regression Analysis, Reproducibility of Results
17.
Sci Data ; 6(1): 106, 2019 07 01.
Article in English | MEDLINE | ID: mdl-31263104

ABSTRACT

There is an ongoing debate about the replicability of neuroimaging research. It has been suggested that one of the main reasons for the high rate of false positive results is the many degrees of freedom researchers have during data analysis. In the Neuroimaging Analysis Replication and Prediction Study (NARPS), we aim to provide the first scientific evidence on the variability of results across analysis teams in neuroscience. We collected fMRI data from 108 participants during two versions of the mixed gambles task, which is often used to study decision-making under risk. For each participant, the dataset includes an anatomical (T1-weighted) scan and fMRI as well as behavioral data from four runs of the task. The dataset is shared through OpenNeuro and is formatted according to the Brain Imaging Data Structure (BIDS) standard. Data pre-processed with fMRIprep and quality control reports are also publicly shared. This dataset can be used to study decision-making under risk and to test the replicability and interpretability of previous results in the field.
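
Because the dataset follows the BIDS standard, it can be queried programmatically. The sketch below uses the pybids library, which is our choice of tooling rather than anything prescribed by the data descriptor, and a placeholder local path for data downloaded from OpenNeuro.

```python
# Sketch: querying the BIDS-formatted dataset with the pybids library (our
# choice of tooling, not prescribed by the data descriptor). The path is a
# placeholder for a local copy downloaded from OpenNeuro.
from bids import BIDSLayout

layout = BIDSLayout("/path/to/narps_dataset")
subjects = layout.get_subjects()                 # e.g. 108 participant labels
bold = layout.get(subject=subjects[0], suffix="bold",
                  extension=".nii.gz", return_type="filename")
anat = layout.get(subject=subjects[0], suffix="T1w",
                  extension=".nii.gz", return_type="filename")
print(len(subjects), "subjects;", len(bold), "BOLD runs;", len(anat), "T1w scan(s)")
```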


Subjects
Brain/diagnostic imaging, Neuroimaging, Brain/physiology, Brain Mapping, Humans, Image Processing, Computer-Assisted/methods, Magnetic Resonance Imaging/methods, Predictive Value of Tests
18.
J Econ Sci Assoc ; 5(2): 149-169, 2019.
Article in English | MEDLINE | ID: mdl-31894199

ABSTRACT

Many studies report on the association between 2D:4D, a putative marker for prenatal testosterone exposure, and economic preferences. However, most of these studies have limited sample sizes and test multiple hypotheses without preregistration. In this study, we replicate the specifications commonly found in the literature for the associations between the 2D:4D ratio and risk taking, willingness to compete, and dictator game giving, each tested separately. In a sample of 330 women, we find no robust associations between any of these economic preferences and 2D:4D. We find no evidence of a statistically significant relation for 16 of the 18 regressions we run. The two statistically significant regression specifications have not previously been reported, and their associations are not in the expected direction; they are therefore unlikely to represent a real effect.

19.
Sci Data ; 5: 180236, 2018 10 30.
Article in English | MEDLINE | ID: mdl-30375993

ABSTRACT

We present four datasets from a project examining the role of politics in social psychological research. These include thousands of independent raters who coded scientific abstracts for political relevance and for whether conservatives or liberals were treated as targets of explanation and characterized in a negative light. Further included are predictions about the empirical results by scientists participating in a forecasting survey, and coded publication outcomes for unpublished research projects varying in political overtones. Future researchers can leverage this corpus to test further hypotheses regarding political values and scientific research, perceptions of political bias, publication histories, and forecasting accuracy.


Subjects
Politics, Social Psychology, Research Design/trends, Humans, Social Psychology/methods, Research Design/statistics & numerical data, Surveys and Questionnaires
20.
Nat Hum Behav ; 2(9): 637-644, 2018 09.
Article in English | MEDLINE | ID: mdl-31346273

ABSTRACT

Being able to replicate scientific findings is crucial for scientific progress [1-15]. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 2015 [16-36]. The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies. We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators. Consistent with these results, the estimated true-positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility. Furthermore, we find that peer beliefs about replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.
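
The two headline indicators, a significant replication effect in the original direction and the replication-to-original effect size ratio, are simple to compute. A minimal sketch with hypothetical effect sizes and p-values follows; the paper's complementary indicators and Bayesian analysis are not shown.

```python
# Sketch of the two headline indicators: a significant replication effect in
# the original direction, and the replication-to-original effect size ratio.
# Effect sizes and p-values are hypothetical; the Bayesian analysis is omitted.
import numpy as np

def replication_indicators(orig_es, rep_es, rep_p, alpha=0.05):
    orig_es, rep_es, rep_p = map(np.asarray, (orig_es, rep_es, rep_p))
    same_dir_sig = (rep_p < alpha) & (np.sign(rep_es) == np.sign(orig_es))
    return same_dir_sig.mean(), (rep_es / orig_es).mean()

orig = [0.40, 0.25, 0.55, 0.30]    # hypothetical original effect sizes
rep = [0.22, 0.02, 0.30, 0.18]     # hypothetical replication effect sizes
p = [0.001, 0.41, 0.003, 0.02]     # hypothetical replication p-values
rate, rel = replication_indicators(orig, rep, p)
print(f"replication rate: {rate:.0%}; mean relative effect size: {rel:.0%}")
```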


Subjects
Reproducibility of Results, Research/statistics & numerical data, Social Sciences/statistics & numerical data, Bayes Theorem, Humans, Periodicals as Topic/statistics & numerical data, Sample Size, Social Sciences/methods