ABSTRACT
We tested whether large language models (LLMs) can help predict results from a complex behavioural science experiment. In study 1, we investigated the performance of the widely used LLMs GPT-3.5 and GPT-4 in forecasting the empirical findings of a large-scale experimental study of emotions, gender, and social perceptions. We found that GPT-4, but not GPT-3.5, matched the performance of a cohort of 119 human experts, with correlations of 0.89 (GPT-4), 0.07 (GPT-3.5) and 0.87 (human experts) between aggregated forecasts and realized effect sizes. In study 2, providing participants from a university subject pool the opportunity to query a GPT-4-powered chatbot significantly increased the accuracy of their forecasts. Results indicate promise for artificial intelligence (AI) to help anticipate, at scale and at minimal cost, which claims about human behaviour will find empirical support and which ones will not. Our discussion focuses on avenues for human-AI collaboration in science.
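As a minimal illustration of the accuracy metric reported above, the sketch below aggregates per-forecaster predictions of effect sizes and correlates the aggregate with realized effects (Pearson r). All numbers are hypothetical; this is not the study's data or analysis pipeline.

```python
# Minimal sketch: correlate aggregated forecasts with realized effect sizes.
# All values below are invented for illustration.
import numpy as np

# One row per forecaster (an LLM run or a human expert), one column per effect.
forecasts = np.array([
    [0.10, 0.35, -0.05, 0.20, 0.50],
    [0.15, 0.30,  0.00, 0.25, 0.45],
    [0.05, 0.40, -0.10, 0.15, 0.55],
])

# Hypothetical realized effect sizes from the experiment.
realized = np.array([0.12, 0.33, -0.02, 0.18, 0.49])

# Aggregate by averaging across forecasters, then compute Pearson r.
aggregated = forecasts.mean(axis=0)
r = np.corrcoef(aggregated, realized)[0, 1]
print(f"r between aggregated forecasts and realized effects: {r:.2f}")
```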
ABSTRACT
Demonstrating the limitations of the one-at-a-time approach, crowd initiatives reveal the surprisingly powerful role of analytic and design choices in shaping scientific results. At the same time, cross-cultural variability in effects is far below the levels initially expected. This highlights the value of "medium" science, leveraging diverse stimulus sets and extensive robustness checks to achieve integrative tests of competing theories.
ABSTRACT
Contradicting our earlier claims of American moral exceptionalism, recent self-replication evidence from our laboratory indicates that implicit puritanism characterizes the judgments of people across cultures. Implicit cultural evolution may lag behind explicit change, such that differences between traditional and non-traditional cultures are greater at a deliberative than an intuitive level. Not too deep down, perhaps we are all implicit puritans.
Subjects
Judgment, Morals, Humans, United States
ABSTRACT
By organizing crowds of scientists to independently tackle the same research questions, we can collectively overcome the generalizability crisis. Strategies to draw inferences from a heterogeneous set of research approaches include aggregation (for instance, meta-analyzing the effect sizes obtained by different investigators) and parsing (attempting to identify theoretically meaningful moderators that explain the variability in results).
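The aggregation strategy can be made concrete with a standard inverse-variance-weighted (fixed-effect) meta-analysis, sketched below with hypothetical team-level estimates rather than data from any project described here.

```python
# Minimal sketch: fixed-effect meta-analytic aggregation of effect sizes
# reported by different investigators. All values are hypothetical.
import numpy as np

d = np.array([0.30, 0.10, 0.45, -0.05])   # per-team effect sizes (Cohen's d)
se = np.array([0.10, 0.12, 0.15, 0.08])   # their standard errors

w = 1.0 / se**2                           # inverse-variance weights
d_pooled = np.sum(w * d) / np.sum(w)
se_pooled = np.sqrt(1.0 / np.sum(w))
print(f"pooled d = {d_pooled:.3f} (SE = {se_pooled:.3f})")
```

A random-effects model would additionally estimate between-team heterogeneity, which is the quantity the parsing strategy then tries to explain with moderators; the fixed-effect version is shown only for brevity.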
Subjects
Crowding, Humans
ABSTRACT
Science is often perceived to be a self-correcting enterprise. In principle, the assessment of scientific claims is supposed to proceed in a cumulative fashion, with the reigning theories of the day progressively approximating truth more accurately over time. In practice, however, cumulative self-correction tends to proceed less efficiently than one might naively suppose. Far from evaluating new evidence dispassionately and infallibly, individual scientists often cling stubbornly to prior findings. Here we explore the dynamics of scientific self-correction at an individual rather than collective level. In 13 written statements, researchers from diverse branches of psychology share why and how they have lost confidence in one of their own published findings. We qualitatively characterize these disclosures and explore their implications. A cross-disciplinary survey suggests that such loss-of-confidence sentiments are surprisingly common among members of the broader scientific population yet rarely become part of the public record. We argue that removing barriers to self-correction at the individual level is imperative if the scientific community as a whole is to achieve the ideal of efficient self-correction.
Subjects
Publications, Researchers, Attitude, Humans, Mental Processes, Writing
ABSTRACT
Critical aspects of the "rationality of rationalizations" thesis are open empirical questions. These include the frequency with which past behavior determines attitudes (as opposed to attitudes causing future behaviors), the extent to which post hoc justifications take on a life of their own and shape future actions, and whether rationalizers experience benefits in well-being, social influence, performance, or other desirable outcomes.
Subjects
Rationalization, Sexual Behavior, Attitude, Prevalence
ABSTRACT
To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from 2 separate large samples (total N > 15,000) were then randomly assigned to complete 1 version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: Materials from different teams rendered statistically significant effects in opposite directions for 4 of 5 hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for 2 hypotheses and a lack of support for 3 hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, whereas considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.
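For reference, the d metric reported above is Cohen's d; the sketch below computes it from two hypothetical condition samples using the standard pooled-standard-deviation formula. It is a textbook illustration, not the authors' analysis code.

```python
# Minimal sketch: Cohen's d between two experimental conditions,
# using simulated (hypothetical) responses.
import numpy as np

rng = np.random.default_rng(0)
cond_a = rng.normal(0.2, 1.0, size=500)  # hypothetical condition A responses
cond_b = rng.normal(0.0, 1.0, size=500)  # hypothetical condition B responses

n1, n2 = len(cond_a), len(cond_b)
# Pooled standard deviation across the two conditions.
s_pooled = np.sqrt(((n1 - 1) * cond_a.var(ddof=1) +
                    (n2 - 1) * cond_b.var(ddof=1)) / (n1 + n2 - 2))
d = (cond_a.mean() - cond_b.mean()) / s_pooled
print(f"Cohen's d = {d:.2f}")
```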
Subjects
Crowdsourcing, Psychology/methods, Research Design, Adult, Humans, Random Allocation
ABSTRACT
The widespread replication of research findings in independent laboratories prior to publication is suggested as a complement to traditional replication approaches. The pre-publication independent replication approach further addresses three key concerns from replication skeptics by systematically taking context into account, reducing reputational costs for original authors and replicators, and increasing the theoretical value of failed replications.
Subjects
Research, Reproducibility of Results
ABSTRACT
We present the data from a crowdsourced project seeking to replicate findings in independent laboratories before (rather than after) they are published. In this Pre-Publication Independent Replication (PPIR) initiative, 25 research groups attempted to replicate 10 moral judgment effects from a single laboratory's research pipeline of unpublished findings. The 10 effects were investigated using online and laboratory surveys containing psychological manipulations (vignettes) followed by questionnaires. Results revealed a mix of reliable, unreliable, and culturally moderated findings. Unlike any previous replication project, this dataset includes data not only from the replications but also from the original studies, creating a unique corpus that researchers can use to better understand reproducibility and irreproducibility in science.
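One simple use of a corpus that pairs original and replication estimates, sketched below with hypothetical numbers, is to check whether each replication effect falls inside the original study's 95% confidence interval. This is an illustrative consistency check, not the PPIR project's actual analysis.

```python
# Minimal sketch: compare replication effects against the original
# studies' 95% confidence intervals. All values are hypothetical.
import numpy as np

orig_d = np.array([0.40, 0.15, 0.60])    # original effect sizes
orig_se = np.array([0.12, 0.10, 0.20])   # original standard errors
rep_d = np.array([0.35, -0.02, 0.58])    # replication estimates

lo = orig_d - 1.96 * orig_se
hi = orig_d + 1.96 * orig_se
for i, (r, l, h) in enumerate(zip(rep_d, lo, hi), start=1):
    verdict = "consistent" if l <= r <= h else "inconsistent"
    print(f"effect {i}: replication d = {r:+.2f}, "
          f"original 95% CI = [{l:.2f}, {h:.2f}] -> {verdict}")
```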