Results 1 - 20 of 124
1.
Proc Natl Acad Sci U S A ; 121(24): e2318124121, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38830100

ABSTRACT

There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs; this is insufficient for making an informed decision about which LLMs are best to use in an interactive setting, and how that varies by setting. Static assessment therefore limits how we understand language model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analyzing MathConverse, we derive a taxonomy of human query behaviors and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness in LLM generations, among other findings. Further, we garner a more granular understanding of GPT-4 mathematical problem-solving through a series of case studies, contributed by experienced mathematicians. We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and can provide a concise rationale for their recommendations, may constitute better assistants. Humans should inspect LLM output carefully given their current shortcomings and potential for surprising fallibility.


Subject(s)
Language , Mathematics , Problem Solving , Humans , Problem Solving/physiology , Students/psychology
2.
Nat Hum Behav ; 8(6): 1035-1043, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38907029

ABSTRACT

Board, card or video games have been played by virtually every individual in the world. Games are popular because they are intuitive and fun. These distinctive qualities of games also make them ideal for studying the mind. By being intuitive, games provide a unique vantage point for understanding the inductive biases that support behaviour in more complex, ecological settings than traditional laboratory experiments. By being fun, games allow researchers to study new questions in cognition such as the meaning of 'play' and intrinsic motivation, while also supporting more extensive and diverse data collection by attracting many more participants. We describe the advantages and drawbacks of using games relative to standard laboratory-based experiments and lay out a set of recommendations on how to gain the most from using games to study cognition. We hope this Perspective will lead to a wider use of games as experimental paradigms, elevating the ecological validity, scale and robustness of research on the mind.


Subject(s)
Cognition , Video Games , Humans , Video Games/psychology , Games, Experimental , Motivation
3.
Cognition ; 250: 105790, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38908304

ABSTRACT

Rules help guide our behavior - particularly in complex social contexts. But rules sometimes give us the "wrong" answer. How do we know when it is okay to break the rules? In this paper, we argue that we sometimes use contractualist (agreement-based) mechanisms to determine when a rule can be broken. Our model draws on a theory of social interactions - "virtual bargaining" - that assumes that actors engage in a simulated bargaining process when navigating the social world. We present experimental data which suggests that rule-breaking decisions are sometimes driven by virtual bargaining and show that these data cannot be explained by more traditional rule-based or outcome-based approaches.


Subject(s)
Judgment , Morals , Humans , Judgment/physiology , Adult , Female , Male , Social Interaction , Young Adult , Decision Making/physiology , Negotiating
4.
Trends Cogn Sci ; 28(7): 628-642, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38616478

ABSTRACT

Humans often pursue idiosyncratic goals that appear remote from functional ends, including information gain. We suggest that this is valuable because goals (even prima facie foolish or unachievable ones) contain structured information that scaffolds thinking and planning. By evaluating hypotheses and plans with respect to their goals, humans can discover new ideas that go beyond prior knowledge and observable evidence. These hypotheses and plans can be transmitted independently of their original motivations, adapted across generations, and serve as an engine of cultural evolution. Here, we review recent empirical and computational research underlying goal generation and planning and discuss the ways that the flexibility of our motivational system supports cognitive gains for both individuals and societies.


Subject(s)
Cognition , Goals , Humans , Cognition/physiology , Motivation , Thinking/physiology
5.
Trends Cogn Sci ; 28(6): 517-540, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38508911

ABSTRACT

Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence (knowledge of linguistic rules and patterns) and functional linguistic competence (understanding and using language in the world). We ground this distinction in human neuroscience, which has shown that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. We posit that models that use language in human-like ways would need to master both of these competence types, which, in turn, could require the emergence of separate mechanisms specialized for formal versus functional linguistic competence.


Subject(s)
Language , Humans , Thinking/physiology , Linguistics
6.
Top Cogn Sci ; 16(1): 54-70, 2024 01.
Article in English | MEDLINE | ID: mdl-37962526

ABSTRACT

Great storytelling takes us on a journey the way ordinary reality rarely does. But what exactly do we mean by this "journey?" Recently, literary theorist Karin Kukkonen proposed that storytelling is "probability design:" the art of giving an audience pieces of information bit by bit, to craft the journey of their changing beliefs about the fictional world. A good "probability design" choreographs a delicate dance of certainty and surprise in the reader's mind as the story unfolds from beginning to end. In this paper, we computationally model this conception of storytelling. Building on the classic Bayesian inverse planning model of human social cognition, we treat storytelling as inverse inverse planning: the task of choosing actions to manipulate an inverse planner's inferences, and therefore a human audience's beliefs. First, we use an inverse inverse planner to depict social and physical situations, and present behavioral studies indicating that inverse inverse planning produces more expressive behavior than ordinary "naïve planning." Then, through a series of examples, we demonstrate how inverse inverse planning captures many storytelling elements from first principles: character, narrative arcs, plot twists, irony, flashbacks, and deus ex machina are all naturally encoded in the flexible language of probability design.


Subject(s)
Communication , Narration , Humans , Bayes Theorem , Language
7.
Nat Hum Behav ; 8(2): 320-335, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37996497

ABSTRACT

Many surface cues support three-dimensional shape perception, but humans can sometimes still see shape when these features are missing - such as when an object is covered with a draped cloth. Here we propose a framework for three-dimensional shape perception that explains perception in both typical and atypical cases as analysis-by-synthesis, or inference in a generative model of image formation. The model integrates intuitive physics to explain how shape can be inferred from the deformations it causes to other objects, as in cloth draping. Behavioural and computational studies comparing this account with several alternatives show that it best matches human observers (total n = 174) in both accuracy and response times, and is the only model that correlates significantly with human performance on difficult discriminations. We suggest that bottom-up deep neural network models are not fully adequate accounts of human shape perception, and point to how machine vision systems might achieve more human-like robustness.


Subject(s)
Form Perception , Humans , Form Perception/physiology , Neural Networks, Computer , Cues
8.
Psychon Bull Rev ; 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38049575

ABSTRACT

'Embodied cognition' suggests that our bodily experiences broadly shape our cognitive capabilities. We study how embodied experience affects the abstract physical problem-solving styles people use in a virtual task where embodiment does not affect action capabilities. We compare how groups with different embodied experience - 25 children and 35 adults with congenital limb differences versus 45 children and 40 adults born with two hands - perform this task, and find that while there is no difference in overall competence, the groups use different cognitive styles to find solutions. People born with limb differences think more before acting but take fewer attempts to reach solutions. Conversely, development affects the particular actions children use, as well as their persistence with their current strategy. Our findings suggest that while development alters action choices and persistence, differences in embodied experience drive changes in the acquisition of cognitive styles for balancing acting with thinking.

9.
Nat Hum Behav ; 7(10): 1767-1776, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37591983

ABSTRACT

Groups coordinate more effectively when individuals are able to learn from others' successes. But acquiring such knowledge is not always easy, especially in real-world environments where success is hidden from public view. We suggest that social inference capacities may help bridge this gap, allowing individuals to update their beliefs about others' underlying knowledge and success from observable trajectories of behaviour. We compared our social inference model against simpler heuristics in three studies of human behaviour in a collective-sensing task. Experiment 1 demonstrated that average performance improved as a function of group size at a rate greater than predicted by heuristic models. Experiment 2 introduced artificial agents to evaluate how individuals selectively rely on social information. Experiment 3 generalized these findings to a more complex reward landscape. Taken together, our findings provide insight into the relationship between individual social cognition and the flexibility of collective behaviour.

10.
Nat Hum Behav ; 7(9): 1481-1489, 2023 09.
Article in English | MEDLINE | ID: mdl-37488401

ABSTRACT

Studies of human exploration frequently cast people as serendipitously stumbling upon good options. Yet these studies may not capture the richness of exploration strategies that people exhibit in more complex environments. Here we study behaviour in a large dataset of 29,493 players of the richly structured online game 'Little Alchemy 2'. In this game, players start with four elements, which they can combine to create up to 720 complex objects. We find that players are driven not only by external reward signals, such as an attempt to produce successful outcomes, but also by an intrinsic motivation to create objects that empower them to create even more objects. We find that this drive for empowerment is eliminated when playing a game variant that lacks recognizable semantics, indicating that people use their knowledge about the world and its possibilities to guide their exploration. Our results suggest that the drive for empowerment may be a potent source of intrinsic motivation in richly structured domains, particularly those that lack explicit reward signals.


Subject(s)
Video Games , Humans , Exploratory Behavior , Motivation , Achievement , Reward
11.
Philos Trans A Math Phys Eng Sci ; 381(2251): 20220050, 2023 Jul 24.
Article in English | MEDLINE | ID: mdl-37271169

ABSTRACT

Expert problem-solving is driven by powerful languages for thinking about problems and their solutions. Acquiring expertise means learning these languages - systems of concepts, alongside the skills to use them. We present DreamCoder, a system that learns to solve problems by writing programs. It builds expertise by creating domain-specific programming languages for expressing domain concepts, together with neural networks to guide the search for programs within these languages. A 'wake-sleep' learning algorithm alternately extends the language with new symbolic abstractions and trains the neural network on imagined and replayed problems. DreamCoder solves both classic inductive programming tasks and creative tasks such as drawing pictures and building scenes. It rediscovers the basics of modern functional programming, vector algebra and classical physics, including Newton's and Coulomb's laws. Concepts are built compositionally from those learned earlier, yielding multilayered symbolic representations that are interpretable and transferrable to new tasks, while still growing scalably and flexibly with experience. This article is part of a discussion meeting issue 'Cognitive artificial intelligence'.

12.
Philos Trans A Math Phys Eng Sci ; 381(2251): 20220047, 2023 Jul 24.
Article in English | MEDLINE | ID: mdl-37271174

ABSTRACT

From sparse descriptions of events, observers can make systematic and nuanced predictions of what emotions the people involved will experience. We propose a formal model of emotion prediction in the context of a public high-stakes social dilemma. This model uses inverse planning to infer a person's beliefs and preferences, including social preferences for equity and for maintaining a good reputation. The model then combines these inferred mental contents with the event to compute 'appraisals': whether the situation conformed to the expectations and fulfilled the preferences. We learn functions mapping computed appraisals to emotion labels, allowing the model to match human observers' quantitative predictions of 20 emotions, including joy, relief, guilt and envy. Model comparison indicates that inferred monetary preferences are not sufficient to explain observers' emotion predictions; inferred social preferences are factored into predictions for nearly every emotion. Human observers and the model both use minimal individualizing information to adjust predictions of how different people will respond to the same event. Thus, our framework integrates inverse planning, event appraisals and emotion concepts in a single computational model to reverse-engineer people's intuitive theory of emotions. This article is part of a discussion meeting issue 'Cognitive artificial intelligence'.


Subject(s)
Theory of Mind , Humans , Artificial Intelligence , Emotions
14.
J Exp Psychol Gen ; 152(8): 2237-2269, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37093666

ABSTRACT

From building towers to picking an orange from a stack of fruit, assessing support is critical for successfully interacting with the physical world. But how do people determine whether one object supports another? In this paper, we develop a counterfactual simulation model (CSM) of causal judgments about physical support. The CSM predicts that people judge physical support by mentally simulating what would happen to a scene if the object of interest was removed. Three experiments test the model by asking one group of participants to judge what would happen to a tower if one of the blocks were removed, and another group of participants how responsible that block was for the tower's stability. The CSM accurately captures participants' predictions by running noisy simulations that incorporate different sources of uncertainty. Participants' responsibility judgments are closely related to counterfactual predictions: a block is more responsible when many other blocks would fall if it were removed. By construing physical support as preventing from falling, the CSM provides a unified account of how causal judgments in dynamic and static physical scenes arise from the process of counterfactual simulation. (PsycInfo Database Record (c) 2023 APA, all rights reserved).


Subject(s)
Judgment , Social Behavior , Humans , Uncertainty , Causality
15.
J Exp Psychol Gen ; 152(7): 1951-1966, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36939608

ABSTRACT

Having an internal model of one's attention can be useful for effectively managing limited perceptual and cognitive resources. While previous work has hinted at the existence of an internal model of attention, it is still unknown how rich and flexible this model is, whether it corresponds to one's own attention or to a generic person-invariant schema, and whether it is specified as a list of facts and rules or alternatively as a probabilistic simulation model. To this end, we tested participants' ability to estimate their own behavior in a visual search task with novel displays. In six online experiments (four pre-registered), prospective search time estimates reflected accurate metacognitive knowledge of key findings in the visual search literature, including the set-size effect, higher efficiency of color over conjunction search, and the asymmetric contributions of target and distractor identities to search difficulty. In contrast, estimates were biased to assume serial search, and demonstrated little to no insight into sizeable effects of search asymmetries for basic visual features, and of target-distractor similarity. Together, our findings reveal a complex picture, where internal models of visual search are sensitive to some, but not all, of the factors that make some searches more difficult than others. (PsycInfo Database Record (c) 2023 APA, all rights reserved).


Subject(s)
Attention , Metacognition , Humans , Prospective Studies , Computer Simulation , Visual Perception , Reaction Time
16.
Neuron ; 111(8): 1331-1344.e8, 2023 04 19.
Article in English | MEDLINE | ID: mdl-36898374

ABSTRACT

Humans learn internal models of the world that support planning and generalization in complex environments. Yet it remains unclear how such internal models are represented and learned in the brain. We approach this question using theory-based reinforcement learning, a strong form of model-based reinforcement learning in which the model is a kind of intuitive theory. We analyzed fMRI data from human participants learning to play Atari-style games. We found evidence of theory representations in prefrontal cortex and of theory updating in prefrontal cortex, occipital cortex, and fusiform gyrus. Theory updates coincided with transient strengthening of theory representations. Effective connectivity during theory updating suggests that information flows from prefrontal theory-coding regions to posterior theory-updating regions. Together, our results are consistent with a neural architecture in which top-down theory representations originating in prefrontal regions shape sensory predictions in visual areas, where factored theory prediction errors are computed and trigger bottom-up updates of the theory.


Subject(s)
Learning , Reinforcement, Psychology , Humans , Prefrontal Cortex/diagnostic imaging , Magnetic Resonance Imaging/methods
17.
Open Mind (Camb) ; 6: 211-231, 2022.
Article in English | MEDLINE | ID: mdl-36439074

ABSTRACT

Do infants appreciate that other people's actions may fail, and that these failures endow risky actions with varying degrees of negative utility (i.e., danger)? Three experiments, including a pre-registered replication, addressed this question by presenting 12- to 15-month-old infants (N = 104, 52 female, majority White) with an animated agent who jumped over trenches of varying depth towards its goals. Infants expected the agent to minimize the danger of its actions, and they learned which goal the agent preferred by observing how much danger it risked to reach each goal, even though the agent's actions were physically identical and never failed. When we tested younger, 10-month-old infants (N = 102, 52 female, majority White) in a fourth experiment, they did not succeed consistently on the same tasks. These findings provide evidence that one-year-old infants use the height that other agents could fall from in order to explain and predict those agents' actions.

18.
Nat Hum Behav ; 6(11): 1557-1568, 2022 11.
Article in English | MEDLINE | ID: mdl-36065061

ABSTRACT

Decades of research indicate that some of the epistemic practices that support scientific enquiry emerge as part of intuitive reasoning in early childhood. Here, we ask whether adults and young children can use intuitive statistical reasoning and metacognitive strategies to estimate how much information they might need to solve different discrimination problems, suggesting that they have some of the foundations for 'intuitive power analyses'. Across five experiments, both adults (N = 290) and children (N = 48, 6-8 years) were able to precisely represent the relative difficulty of discriminating populations and recognized that larger samples were required for populations with greater overlap. Participants were sensitive to the cost of sampling, as well as the perceptual nature of the stimuli. These findings indicate that both young children and adults metacognitively represent their own ability to make discriminations even in the absence of data, and can use this to guide efficient and effective exploration.


Subject(s)
Metacognition , Humans , Child, Preschool , Adult , Child , Problem Solving
19.
Nat Commun ; 13(1): 5024, 2022 08 30.
Article in English | MEDLINE | ID: mdl-36042196

ABSTRACT

Automated, data-driven construction and evaluation of scientific models and theories is a long-standing challenge in artificial intelligence. We present a framework for algorithmically synthesizing models of a basic part of human language: morpho-phonology, the system that builds word forms from sounds. We integrate Bayesian inference with program synthesis and representations inspired by linguistic theory and cognitive models of learning and discovery. Across 70 datasets from 58 diverse languages, our system synthesizes human-interpretable models for core aspects of each language's morpho-phonology, sometimes approaching models posited by human linguists. Joint inference across all 70 data sets automatically synthesizes a meta-model encoding interpretable cross-language typological tendencies. Finally, the same algorithm captures few-shot learning dynamics, acquiring new morphophonological rules from just one or a few examples. These results suggest routes to more powerful machine-enabled discovery of interpretable models in linguistics and other scientific domains.


Subject(s)
Artificial Intelligence , Language , Bayes Theorem , Humans , Learning , Linguistics
20.
Elife ; 11, 2022 05 30.
Article in English | MEDLINE | ID: mdl-35635277

ABSTRACT

Successful engagement with the world requires the ability to predict what will happen next. Here, we investigate how the brain makes a fundamental prediction about the physical world: whether the situation in front of us is stable, and hence likely to stay the same, or unstable, and hence likely to change in the immediate future. Specifically, we ask if judgments of stability can be supported by the kinds of representations that have proven to be highly effective at visual object recognition in both machines and brains, or instead if the ability to determine the physical stability of natural scenes may require generative algorithms that simulate the physics of the world. To find out, we measured responses in both convolutional neural networks (CNNs) and the brain (using fMRI) to natural images of physically stable versus unstable scenarios. We find no evidence for generalizable representations of physical stability in either standard CNNs trained on visual object and scene classification (ImageNet), or in the human ventral visual pathway, which has long been implicated in the same process. However, in frontoparietal regions previously implicated in intuitive physical reasoning we find both scenario-invariant representations of physical stability, and higher univariate responses to unstable than stable scenes. These results demonstrate abstract representations of physical stability in the dorsal but not ventral pathway, consistent with the hypothesis that the computations underlying stability entail not just pattern classification but forward physical simulation.


Subject(s)
Brain Mapping , Brain , Brain/diagnostic imaging , Humans , Magnetic Resonance Imaging/methods , Neural Networks, Computer , Photic Stimulation