ABSTRACT
Observers with autism spectrum disorders (ASDs) find it difficult to read intentions from movements. However, the computational bases of these difficulties are unknown. Do these difficulties reflect an intention readout deficit, or are they more likely rooted in kinematic (dis-)similarities between typical and ASD kinematics? We combined motion tracking, psychophysics, and computational analyses to uncover single-trial intention readout computations in typically developing (TD) children (n = 35) and children with ASD (n = 35) who observed actions performed by TD children and children with ASD. Average intention discrimination performance was above chance for TD observers but not for ASD observers. However, single-trial analysis showed that both TD and ASD observers read single-trial variations in movement kinematics. TD readers were better able to identify intention-informative kinematic features during observation of TD actions; conversely, ASD readers were better able to identify intention-informative features during observation of ASD actions. Crucially, while TD observers were generally able to extract the intention information encoded in movement kinematics, those with autism were unable to do so. These results extend existing conceptions of mind reading in ASD by suggesting that intention reading difficulties reflect both an interaction failure, rooted in kinematic dissimilarity between TD and ASD kinematics (at the level of feature identification), and an individual readout deficit (at the level of information extraction), accompanied by an overall reduced sensitivity of intention readout to single-trial variations in movement kinematics.
Subject(s)
Autism Spectrum Disorder/physiopathology; Biomechanical Phenomena/physiology; Pattern Recognition, Physiological/physiology; Adolescent; Autistic Disorder; Child; Child Development; Cognition; Comprehension/physiology; Emotions/physiology; Humans; Intention; Movement/physiology
ABSTRACT
At the core of what defines us as humans is the concept of theory of mind: the ability to track other people's mental states. The recent development of large language models (LLMs) such as ChatGPT has led to intense debate about the possibility that these models exhibit behaviour that is indistinguishable from human behaviour in theory of mind tasks. Here we compare human and LLM performance on a comprehensive battery of tests designed to measure different theory of mind abilities, from understanding false beliefs to interpreting indirect requests and recognizing irony and faux pas. We tested two families of LLMs (GPT and LLaMA2) repeatedly against these measures and compared their performance with that of a sample of 1,907 human participants. Across the battery of theory of mind tests, we found that GPT-4 models performed at, or even sometimes above, human levels at identifying indirect requests, false beliefs and misdirection, but struggled with detecting faux pas. Faux pas, however, was the only test where LLaMA2 outperformed humans. Follow-up manipulations of the belief likelihood revealed that the superiority of LLaMA2 was illusory, possibly reflecting a bias towards attributing ignorance. By contrast, the poor performance of GPT originated from a hyperconservative approach towards committing to conclusions rather than from a genuine failure of inference. These findings not only demonstrate that LLMs exhibit behaviour that is consistent with the outputs of mentalistic inference in humans but also highlight the importance of systematic testing to ensure a non-superficial comparison between human and artificial intelligences.