ABSTRACT
Humans form sequences of event models, representations of the current situation, to predict how activity will unfold. Multiple mechanisms have been proposed for how the cognitive system determines when to segment the stream of behavior and switch from one active event model to another. Here, we constructed a computational model that learns knowledge about event classes (event schemas), by combining recurrent neural networks for short-term dynamics with Bayesian inference over event classes for event-to-event transitions. This architecture represents event schemas and uses them to construct a series of event models. It was trained on one pass through 18 h of naturalistic human activities. Another 3.5 h of activities were used to test each variant for agreement with human segmentation and categorization. The architecture was able to learn to predict human activity, and it developed segmentation and categorization approaching human-like performance. We then compared two variants of this architecture designed to better emulate human event segmentation: one transitioned when the active event model produced high uncertainty in its prediction and the other transitioned when the active event model produced a large prediction error. The two variants learned to segment and categorize events, and the prediction uncertainty variant provided a somewhat closer match to human segmentation and categorization, despite being given no feedback about segmentation or categorization. These results suggest that event model transitioning based on prediction uncertainty or prediction error can reproduce two important features of human event comprehension.
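The abstract above contrasts two boundary signals: prediction uncertainty versus prediction error. A minimal toy sketch of that distinction (not the authors' model; the threshold, the three-state distributions, and the segmentation rule here are all illustrative assumptions):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete predictive distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def segment(predictions, observations, mode="uncertainty", threshold=1.0):
    """Mark an event boundary whenever the active model's prediction is
    too uncertain (high entropy) or too wrong (high surprise)."""
    boundaries = []
    for t, (pred, obs) in enumerate(zip(predictions, observations)):
        if mode == "uncertainty":
            signal = entropy(pred)                    # prediction uncertainty
        else:
            signal = -np.log(max(pred[obs], 1e-12))   # prediction error (surprise)
        if signal > threshold:
            boundaries.append(t)                      # switch to a new event model
    return boundaries

# Toy stream over 3 possible next states: confident early, uncertain at the end.
preds = [np.array([0.9, 0.05, 0.05]),
         np.array([0.8, 0.1, 0.1]),
         np.array([0.34, 0.33, 0.33])]
obs = [0, 0, 2]
print(segment(preds, obs, mode="uncertainty"))  # boundary where entropy spikes
print(segment(preds, obs, mode="error"))        # boundary where surprise spikes
```

In this contrived stream both signals flag the same time step; the paper's point is that on naturalistic activity the two rules can diverge, with the uncertainty rule matching human segmentation somewhat better.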
ABSTRACT
An important aspect of any social interaction involves inferring other people's mental states, intentions, and their likely next actions, by way of facial expression, body posture, eye gaze, and limb movements. An actor's production of actions during social interactions and the observer's perception of these actions are thus closely linked. In this review, we outline an action-observation methodology, which not only allows for separate analyses of production and perception, but also promotes the study of the dynamic interaction between these two sides of every social exchange. We review two lines of research that have benefited from its application. The first line focuses on individuals performing tasks alone and the observation of their actions by other individuals in order to make inferences about their attentional states. The second line of study focused on pairs of individuals performing collaborative tasks in naturalistic settings and the observation of these performances by other individuals. We offer several suggestions for how this methodology can be extended to improve on the limitations of the present studies, as well as some suggestions of how to use this methodology to venture into new territory. Our aim is to inspire future research applications of this methodology in order to advance our understanding of social action production and perception.
ABSTRACT
This review delves into the remarkable career and scientific contributions of Frans de Waal, a renowned figure in the fields of ethology and primatology, with important implications for social neuroscience. Rooted in the Dutch tradition of ethology and influenced by luminaries like Niko Tinbergen and Jan Van Hooff, De Waal's career began with groundbreaking research on chimpanzees, which questioned long-held beliefs about dominance and aggression in animal behavior. His work, epitomized in his influential books such as "Chimpanzee Politics", "The Ape and the Sushi Master", and "The Age of Empathy", not only revolutionized scientific thinking but also ignited discussions about empathy, morality, and complex cognitive functions in animals. De Waal's interdisciplinary approach extended to neuroscience, particularly in understanding empathy, contributing to the development of an original model: the Perception-Action Model (PAM). The fundamental concept of PAM is that even the most intricate forms of empathy stem from basic neural mechanisms of action-perception, such as mirror neurons. Some behavioral phenomena like motor mimicry and emotional contagion arise from a direct neuroanatomical network activity where sensory information about others' emotional states triggers corresponding behavioral responses. Intriguingly, even the most intricate forms of empathy, such as concern, consolation, and targeted helping, may have evolved from basic neural mechanisms of action-perception. Through these investigations and theoretical explorations, he advocated for a bottom-up approach to comprehending the cognitive abilities of animals. This approach challenged conventional anthropocentric perspectives and underscored the interconnected emotional and cognitive terrain shared among humans and other species. Beyond academia, De Waal's work has profound implications for how we perceive and interact with animals.
By debunking notions of human exceptionalism, he highlights the rich tapestry of emotions that bind all living beings. Through his efforts, De Waal has not only advanced our scientific understanding of animal minds but also fostered a more profound appreciation for the depth of emotional connections across species.
ABSTRACT
While the cognitivist school of thought holds that the mind is analogous to a computer, performing logical operations over internal representations, the tradition of ecological psychology contends that organisms can directly "resonate" to information for action and perception without the need for a representational intermediary. The concept of resonance has played an important role in ecological psychology, but it remains a metaphor. Supplying a mechanistic account of resonance requires a non-representational account of central nervous system (CNS) dynamics. Towards this, we present a series of simple models in which a reservoir network with homeostatic nodes is used to control a simple agent embedded in an environment. This network spontaneously produces behaviors that are adaptive in each context, including (1) visually tracking a moving object, (2) substantially above-chance performance in the arcade game Pong, and (3) avoiding walls while controlling a mobile agent. Upon analyzing the dynamics of the networks, we find that behavioral stability can be maintained without the formation of stable or recurring patterns of network activity that could be identified as neural representations. These results may represent a useful step towards a mechanistic grounding of resonance and a view of the CNS that is compatible with ecological psychology.
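The core architectural ingredient here, a fixed random reservoir whose nodes self-regulate their activity, can be sketched in a few lines. This is an illustrative toy, not the authors' model: the network size, target activity level, learning rate, and sinusoidal input are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
W = rng.normal(0, 1 / np.sqrt(N), (N, N))   # fixed random recurrent weights
W_in = rng.normal(0, 1.0, (N, 1))           # fixed random input weights
x = np.zeros(N)                              # node activations
gain = np.ones(N)                            # per-node homeostatic gain
target = 0.1                                 # desired mean-square activity per node
eta = 0.01                                   # homeostatic adaptation rate

for t in range(500):
    u = np.sin(0.1 * t)                      # toy sensory input stream
    x = np.tanh(gain * (W @ x + (W_in * u).ravel()))
    # Homeostasis: nudge each node's gain so its activity stays near target,
    # rather than learning weights toward any stored representation.
    gain += eta * (target - x**2)
    gain = np.clip(gain, 0.1, 5.0)

print(float(np.mean(x**2)))                  # network activity after adaptation
```

In the paper's models a readout from such a reservoir drives the agent's effectors; the point illustrated here is only that stability comes from local activity regulation, not from trained, stable activity patterns.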
ABSTRACT
Auditory localization is a fundamental ability that allows us to perceive the spatial location of a sound source in the environment. The present work aims to provide a comprehensive overview of the mechanisms and acoustic cues used by the human perceptual system to achieve such accurate auditory localization. Acoustic cues are derived from the physical properties of sound waves, and many factors enable and influence auditory localization abilities. This review presents the monaural and binaural perceptual mechanisms involved in auditory localization in three dimensions. Besides the main mechanisms of Interaural Time Difference, Interaural Level Difference, and Head Related Transfer Function, secondary but important elements such as reverberation and motion are also analyzed. For each mechanism, the perceptual limits of localization abilities are presented. A section is specifically devoted to reference systems in space and to the pointing methods used in experimental research. Finally, some cases of misperception and auditory illusion are described. More than a simple description of the perceptual mechanisms underlying localization, this paper also aims to provide practical information for experiments and work in the auditory field.
ABSTRACT
Misophonia is commonly characterized by intense emotional reactions to common everyday sounds. The condition affects both the mental health of its sufferers and society at large. As yet, formal models of the basis of misophonia are in their infancy. Based on developing behavioural and neuroscientific research, we are gaining a growing understanding of the phenomenology and empirical findings in misophonia, such as the importance of context, the types of coping strategies used, and the activation of particular brain regions. In this article, we argue for a model of misophonia that includes not only the sound but also the context within which the sound is perceived and the emotional reaction triggered. We review the current behavioural and neuroimaging literature, which lends support to this idea. Based on the current evidence, we propose that misophonia should be understood within the broader context of social perception and cognition, and not restricted to the narrow domain of being a disorder of auditory processing. We discuss the evidence in support of this hypothesis, as well as the implications for potential treatment approaches. This article is part of the theme issue 'Sensing and feeling: an integrative approach to sensory processing and emotional experience'.
Subjects
Emotions, Social Cognition, Humans, Emotions/physiology, Auditory Perception/physiology, Cognition, Social Perception
ABSTRACT
Even though actions we observe in everyday life seem to unfold in a continuous manner, they are automatically divided into meaningful chunks (single actions or segments), which provide information for the formation and updating of internal predictive models. Specifically, boundaries between actions constitute a hub for predictive processing, since the prediction of the current action comes to an end and calls for updating of predictions for the next action. In the current study, we investigated the neural processes that characterize such boundaries using a repertoire of complex action sequences with a predefined probabilistic structure. Action sequences consisted of actions that started with the hand touching an object (T) and ended with the hand releasing the object (U). These action boundaries were determined using an automatic computer vision algorithm. Participants learned all action sequences by imitating demo videos. Subsequently, they returned for an fMRI session during which the original action sequences were presented in addition to slightly modified versions thereof. Participants completed a post-fMRI memory test to assess the retention of the original action sequences. The exchange of individual actions, and thus a violation of action prediction, resulted in increased activation of the action observation network and the anterior insula. At U events, marking the end of an action, increased brain activation in the supplementary motor area, striatum, and lingual gyrus was indicative of the retrieval of the previously encoded action repertoire. As expected, brain activation at U events also reflected the predefined probabilistic branching structure of the action repertoire. At T events, marking the beginning of the next action, midline and hippocampal regions were recruited, reflecting the selected prediction of the unfolding action segment.
In conclusion, our findings contribute to a better understanding of the various cerebral processes characterizing prediction during the observation of complex action repertoires.
Subjects
Brain Mapping, Magnetic Resonance Imaging, Humans, Magnetic Resonance Imaging/methods, Male, Female, Adult, Young Adult, Brain Mapping/methods, Brain/physiology, Brain/diagnostic imaging, Psychomotor Performance/physiology
ABSTRACT
The ability to make sense of and predict others' actions is foundational for many socio-cognitive abilities. Dogs (Canis familiaris) constitute interesting comparative models for the study of action perception due to their marked sensitivity to human actions. We tested companion dogs (N = 21) in two screen-based eye-tracking experiments, adopting a task previously used with human infants and apes, to assess which aspects of an agent's action dogs consider relevant to the agent's underlying intentions. An agent was shown repeatedly acting upon the same one of two objects, positioned in the same location. We then presented the objects in swapped locations and the agent approached the objects centrally (Experiment 1) or approached the old object in the new location or the new object in the old location (Experiment 2). Dogs' anticipatory fixations and looking times did not reflect an expectation that agents should have continued approaching the same object nor the same location as witnessed during the brief familiarization phase; this contrasts with some findings with infants and apes, but aligns with findings in younger infants before they have sufficient motor experience with the observed action. However, dogs' pupil dilation and latency to make an anticipatory fixation suggested that, if anything, dogs expected the agents to keep approaching the same location rather than the same object, and their looking times showed sensitivity to the animacy of the agents. We conclude that dogs, lacking motor experience with the observed actions of grasping or kicking performed by a human or inanimate agent, might interpret such actions as directed toward a specific location rather than a specific object. Future research will need to further probe the suitability of anticipatory looking as a measure of dogs' socio-cognitive abilities, given differences between the visual systems of dogs and primates.
Subjects
Cognition, Hominidae, Humans, Dogs, Animals
ABSTRACT
Introduction: We investigated the factors underlying naturalistic action recognition and understanding, as well as the errors occurring during recognition failures. Methods: Participants saw full-light stimuli of ten different whole-body actions presented in three different conditions: as normal videos, as videos with the temporal order of the frames scrambled, and as single static representative frames. After each stimulus presentation participants completed one of two tasks: a forced-choice task, where they were given the ten potential action labels as options, or a free description task, where they could describe the action performed in each stimulus in their own words. Results: While a combination of form, motion, and temporal information generally led to the highest action understanding, for some actions form information alone was sufficient, and adding motion and temporal information did not increase recognition accuracy. We also analyzed errors in action recognition and found primarily two different types. Discussion: One type of error was on the semantic level, while the other consisted of reverting to the kinematic level of body part processing without any attribution of semantics. We elaborate on these results in the context of naturalistic action perception.
ABSTRACT
As part of ongoing perception, the human cognitive system segments others' activities into discrete episodes (event segmentation). Although prior research has shown that this process is likely related to changes in an actor's actions and goals, it has not yet been determined whether untrained observers can reliably identify action and goal changes as naturalistic activities unfold, or whether the changes they identify are tied to visual features of the activity (e.g., the beginnings and ends of object interactions). This study addressed these questions by examining untrained participants' identification of action changes, goal changes, and event boundaries while watching videos of everyday activities that were presented in both first-person and third-person perspectives. We found that untrained observers can identify goal changes and action changes consistently, and these changes are not explained by visual change or the onsets and offsets of contact with objects. Moreover, the action and goal changes identified by untrained observers were associated with event boundaries, even after accounting for objective visual features of the videos. These findings suggest that people can identify action and goal changes consistently and with high agreement, that they do so by using sensory information flexibly, and that the action and goal changes they identify may contribute to event segmentation.
Subjects
Goals, Humans, Adult, Young Adult, Female, Male, Social Perception, Visual Perception/physiology
ABSTRACT
Benefiting from a cooperative interaction requires people to estimate how cooperatively other members of a group will act so that they can calibrate their own behavior accordingly. We investigated whether the synchrony of a group's actions influences observers' estimates of cooperation. Participants (recruited through Prolific) watched animations of actors deciding how much to donate in a public-goods game and using a mouse to drag donations to a public pot. Participants then estimated how much was in the pot in total (as an index of how cooperative they thought the group members were). Experiment 1 (N = 136 adults) manipulated the synchrony between players' decision-making time, and Experiment 2 (N = 136 adults) manipulated the synchrony between players' decision-implementing movements. For both experiments, estimates of how much was in the pot were higher for synchronous than asynchronous groups, demonstrating that the temporal dynamics of an interaction contain signals of a group's level of cooperativity.
Subjects
Cooperative Behavior, Game Theory, Adult, Humans
ABSTRACT
Depth estimation is an ill-posed problem; objects of different shapes or dimensions, even at different distances, may project to the same image on the retina. Our brain uses several cues for depth estimation, including monocular cues such as motion parallax and binocular cues such as diplopia. However, it remains unclear how the computations required for depth estimation are implemented in biologically plausible ways. State-of-the-art approaches to depth estimation based on deep neural networks implicitly describe the brain as a hierarchical feature detector. Instead, in this paper we propose an alternative approach that casts depth estimation as a problem of active inference. We show that depth can be inferred by inverting a hierarchical generative model that simultaneously predicts the eyes' projections from a 2D belief over an object. Model inversion consists of a series of biologically plausible homogeneous transformations based on Predictive Coding principles. Under the plausible assumption of a nonuniform fovea resolution, depth estimation favors an active vision strategy that fixates the object with the eyes, rendering the depth belief more accurate. This strategy is not realized by first fixating on a target and then estimating the depth; instead, it combines the two processes through action-perception cycles, with a mechanism similar to that of saccades during object recognition. The proposed approach requires only local (top-down and bottom-up) message passing, which can be implemented in biologically plausible neural circuits.
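The inversion idea described above, inferring depth by iteratively reducing the error between predicted and observed eye projections, can be illustrated with a drastically simplified 1D toy (not the paper's hierarchical model: the geometry, learning rate, and Gaussian-free error term here are illustrative assumptions):

```python
import numpy as np

b = 0.03          # assumed half interocular distance (m)
z_true = 0.8      # true object depth (m)

def project(z):
    """Angular projections of a point at depth z onto eyes placed at x = +/- b."""
    return np.array([np.arctan2(b, z), np.arctan2(-b, z)])

obs = project(z_true)          # "observed" retinal projections
z_hat = 0.3                    # initial depth belief
lr = 2.0

for _ in range(2000):
    err = obs - project(z_hat)                # prediction error at each eye
    # Gradient of each projection w.r.t. depth: d/dz arctan2(+/-b, z) = -/+ b / (z^2 + b^2)
    grad = np.array([-b, b]) / (z_hat**2 + b**2)
    z_hat += lr * np.sum(err * grad)          # descend the squared prediction error

print(round(z_hat, 3))                        # belief converges toward z_true
```

This gradient descent on prediction error is the predictive-coding intuition behind the model inversion; the paper additionally couples it with eye movements (action) so that fixation itself sharpens the depth belief.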
ABSTRACT
Inferring intentions from verbal and nonverbal human behaviour is critical for everyday social life. Here, we combined Transcranial Magnetic Stimulation (TMS) with a behavioural priming paradigm to test whether key nodes of the Theory of Mind network (ToMn) contribute to understanding others' intentions by integrating prior knowledge about an agent with the observed action kinematics. We used a modified version of the Faked-Action Discrimination Task (FAD), a forced-choice paradigm in which participants watch videos of actors lifting a cube and judge whether the actors are trying to deceive them concerning the weight of the cube. Videos could be preceded (or not) by a verbal description (prior) of the agent's truthful or deceitful intent. We applied single-pulse TMS over three key nodes of the ToMn, namely the dorsomedial prefrontal cortex (dmPFC), right posterior superior temporal sulcus (pSTS), and right temporo-parietal junction (rTPJ). Sham-TMS served as a control (baseline) condition. Following sham or rTPJ stimulation, we observed no consistent influence of priors on FAD performance. In contrast, following dmPFC stimulation, and to a lesser extent pSTS stimulation, truthful and deceitful actions were perceived as more deceptive only when the prior suggested a dishonest intention. These findings highlight a functional role of dmPFC and pSTS in coupling prior knowledge about deceptive intents with observed action kinematics in order to judge faked actions. Our study provides causal evidence that fronto-temporal nodes of the ToMn are functionally relevant to mental state inference during action observation.
Subjects
Theory of Mind, Humans, Biomechanical Phenomena, Theory of Mind/physiology, Transcranial Magnetic Stimulation, Temporal Lobe/physiology, Prefrontal Cortex/physiology, Parietal Lobe/physiology
ABSTRACT
Interpersonal interactions rely on various communication channels, both verbal and non-verbal, through which information regarding one's intentions and emotions is perceived. Here, we investigated the neural correlates underlying the visual processing of hand postures conveying social affordances (i.e., hand-shaking), compared to control stimuli such as hands performing non-social actions (i.e., grasping) or showing no movement at all. Combining univariate and multivariate analyses of electroencephalography (EEG) data, our results indicate that occipito-temporal electrodes show early differential processing of stimuli conveying social information compared to non-social ones. First, the amplitude of the Early Posterior Negativity (EPN, an event-related potential related to the perception of body parts) is modulated differently during the perception of social and non-social content carried by hands. Moreover, our multivariate classification analysis (MultiVariate Pattern Analysis, MVPA) expanded the univariate results by revealing early (<200 ms) categorization of social affordances over occipito-parietal sites. In conclusion, we provide new evidence suggesting that socially relevant hand gestures are categorized in the early stages of visual processing.
Subjects
Emotions, Evoked Potentials, Humans, Reaction Time, Visual Perception, Electroencephalography
ABSTRACT
Efference copy-based forward model mechanisms may help us to distinguish between self-generated and externally generated sensory consequences. Previous studies have shown that self-initiation modulates neural and perceptual responses to identical stimulation. For example, event-related potentials (ERPs) elicited by tones that follow a button press are reduced in amplitude relative to ERPs elicited by passively attended tones. However, previous EEG studies investigating visual stimuli in this context are rare, provide inconclusive results, and lack adequate control conditions with passive movements. Furthermore, although self-initiation is known to modulate behavioral responses, it is not known whether differences in the amplitude of ERPs also reflect differences in the perception of sensory outcomes. In this study, we presented participants with visual stimuli consisting of gray discs that followed either active button presses or passive button presses, in which an electromagnet moved the participant's finger. Two discs presented visually 500-1250 ms apart followed each button press, and participants judged which of the two was more intense. Early components of the primary visual response (N1 and P2) over the occipital electrodes were suppressed in the active condition. Interestingly, suppression in the intensity judgment task was only correlated with suppression of the visual P2 component. These data support the notion of efference copy-based forward model predictions in the visual sensory modality, but especially later processes (P2) seem to be perceptually relevant. Taken together, the results challenge the assumption that N1 differences reflect perceptual suppression and emphasize the relevance of the P2 ERP component.
Subjects
Electroencephalography, Evoked Potentials, Auditory, Humans, Evoked Potentials, Auditory/physiology, Evoked Potentials/physiology, Fingers, Perception, Auditory Perception/physiology, Visual Perception/physiology, Acoustic Stimulation/methods
ABSTRACT
This study explains how the leader-follower relationship and turn-taking could develop in a dyadic imitative interaction by conducting robotic simulation experiments based on the free energy principle. Our prior study showed that introducing a parameter during the model training phase can determine leader and follower roles for subsequent imitative interactions. The parameter is defined as w, the so-called meta-prior, and is a weighting factor used to regulate the complexity term versus the accuracy term when minimizing the free energy. This can be read as sensory attenuation, in which the robot's prior beliefs about action are less sensitive to sensory evidence. The current extended study examines the possibility that the leader-follower relationship shifts depending on changes in w during the interaction phase. Using comprehensive simulation experiments with sweeps of w for both robots during the interaction, we identified a phase space structure with three distinct types of behavioral coordination. Behavior in which each robot ignored the other and followed its own intention was observed in the region in which both ws were set to large values. One robot leading and the other following was observed when one w was set larger and the other smaller. Spontaneous, random turn-taking between the leader and the follower was observed when both ws were set to smaller or intermediate values. Finally, we examined a case of slowly oscillating w in anti-phase between the two agents during the interaction. The simulation experiment resulted in turn-taking in which the leader-follower relationship switched in determined sequences, accompanied by periodic shifts of ws. An analysis using transfer entropy found that the direction of information flow between the two agents also shifted along with turn-taking. Herein, we discuss qualitative differences between random/spontaneous turn-taking and agreed-upon sequential turn-taking by reviewing both synthetic and empirical studies.
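The role of the meta-prior w, weighting complexity against accuracy, can be made concrete in the simplest Gaussian case. This is a hand-worked illustration, not the robots' network: with unit variances, the weighted objective F(mu) = w*(mu - prior)^2/2 + (obs - mu)^2/2 has a closed-form minimizer, and the value of w alone decides whether the belief clings to the prior (leader-like sensory attenuation) or tracks the evidence (follower-like).

```python
def posterior_mean(prior_mu, obs, w):
    """Minimizer of the w-weighted free energy for unit-variance Gaussians:
    F(mu) = w*(mu - prior_mu)**2 / 2 + (obs - mu)**2 / 2.
    Setting dF/dmu = 0 gives mu = (w*prior_mu + obs) / (w + 1)."""
    return (w * prior_mu + obs) / (w + 1.0)

prior, obs = 0.0, 1.0
print(posterior_mean(prior, obs, w=10.0))   # large w: belief stays near the prior (leader-like)
print(posterior_mean(prior, obs, w=0.1))    # small w: belief tracks the observation (follower-like)
```

The paper's result, that sweeping or oscillating w shifts which robot leads, follows this same logic played out in interacting recurrent networks rather than a single Gaussian.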
ABSTRACT
The nervous system is sensitive to statistical regularities of the external world and forms internal models of these regularities to predict environmental dynamics. Given the inherently social nature of human behavior, being capable of building reliable predictive models of others' actions may be essential for successful interaction. While social prediction might seem to be a daunting task, the study of human motor control has accumulated ample evidence that our movements follow a series of kinematic invariants, which can be used by observers to reduce their uncertainty during social exchanges. Here, we provide an overview of the most salient regularities that shape biological motion, examine the role of these invariants in recognizing others' actions, and speculate that anchoring socially-relevant perceptual decisions to such kinematic invariants provides a key computational advantage for inferring conspecifics' goals and intentions.
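The abstract above does not name specific invariants, but one well-documented example from the motor-control literature it draws on is the two-thirds power law, which relates tangential velocity to path curvature as v = K * kappa**(-1/3). A small numerical check (illustrative only; the ellipse and sampling are arbitrary choices, and elliptical motion at constant angular rate is a case known to satisfy the law exactly):

```python
import numpy as np

# Ellipse traced at constant angular rate.
t = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
a, b = 2.0, 1.0
x, y = a * np.cos(t), b * np.sin(t)

# Numerical first and second derivatives along the trajectory.
dx, dy = np.gradient(x, t), np.gradient(y, t)
ddx, ddy = np.gradient(dx, t), np.gradient(dy, t)

v = np.hypot(dx, dy)                          # tangential velocity
kappa = np.abs(dx * ddy - dy * ddx) / v**3    # path curvature
gain = v * kappa**(1.0 / 3.0)                 # two-thirds law predicts a constant K

# Relative spread of the velocity-curvature gain: near zero if the law holds.
print(round(float(np.std(gain) / np.mean(gain)), 4))
```

For an observer, such lawful regularities mean that a short glimpse of a movement strongly constrains how it must continue, which is the computational advantage the review argues for.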
Subjects
Intention, Movement, Humans, Movement/physiology, Perception
ABSTRACT
To study complex human activity and how it is perceived and remembered, it is valuable to have large-scale, well-characterized stimuli that are representative of such activity. We present the Multi-angle Extended Three-dimensional Activities (META) stimulus set, a structured and highly instrumented set of extended event sequences performed in naturalistic settings. Performances were captured with two color cameras and a Kinect v2 camera with color and depth sensors, allowing the extraction of three-dimensional skeletal joint positions. We tracked the positions and identities of objects for all chapters using a mixture of manual coding and an automated tracking pipeline, and hand-annotated the timings of high-level actions. We also performed an online experiment to collect normative event boundaries for all chapters at a coarse and fine grain of segmentation, which allowed us to quantify event durations and agreement across participants. We share these materials publicly to advance new discoveries in the study of complex naturalistic activity.
Subjects
Cognition, Humans
ABSTRACT
The main assumption underlying the present investigation is that action observation elicits a mandatory mental simulation representing the action forward in time. In Experiment 1, participants observed pairs of photos portraying the initial and the final still frames of an action video; then they observed a photo depicting the very same action but either forward or backward in time. Their task was to tell whether the action in the photo portrayed something that happened before or after the action seen at encoding. In this explicit task, the evaluation was faster for forward photos than for backward photos. Crucially, the effect was replicated when instructions asked only to evaluate at test whether the photo depicted a scene congruent with the action seen at encoding (implicit task from two still frames, Experiment 2), and when, at encoding, participants were presented with a single still frame and evaluated at test whether a photo depicted a scene congruent with the action seen at encoding (implicit task from a single still frame, Experiment 3). Overall, the results speak in favour of a mandatory mechanism through which our brain simulates actions even in tasks that do not explicitly require action simulation.