1.
Front Neurorobot ; 17: 1127642, 2023.
Article En | MEDLINE | ID: mdl-37440981

Deep reinforcement learning (RL) agents often suffer from catastrophic forgetting: when trained on new data, they forget previously found solutions in parts of the input space. Replay memories are a common solution to this problem, decorrelating and shuffling old and new training samples; however, they naively store state transitions as they arrive, without regard for redundancy. We introduce a novel cognitively inspired replay memory approach based on the Grow-When-Required (GWR) self-organizing network, which resembles a map-based mental model of the world. Our approach organizes stored transitions into a concise environment-model-like network of state nodes and transition edges, merging similar samples to reduce the memory size and increase the pair-wise distance among samples, which increases the relevancy of each sample. Overall, our study shows that map-based experience replay allows for significant memory reduction with only small decreases in performance.
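As a rough illustration of the map-based idea, the minimal Python sketch below merges similar incoming states into existing nodes and keeps transitions as edges. It is an assumption-laden sketch, not the paper's implementation; the activity threshold and merge rate are illustrative stand-ins for the GWR parameters.

```python
# A GWR-style replay memory: similar states are merged into existing
# nodes instead of being stored verbatim; transitions become edges.
import numpy as np

class MapReplayMemory:
    def __init__(self, state_dim, activity_threshold=0.9, merge_rate=0.1):
        self.nodes = np.empty((0, state_dim))  # merged state prototypes
        self.edges = {}                        # (i, j) -> (action, reward)
        self.activity_threshold = activity_threshold
        self.merge_rate = merge_rate

    def _best_match(self, state):
        if len(self.nodes) == 0:
            return None, 0.0
        dists = np.linalg.norm(self.nodes - state, axis=1)
        idx = int(np.argmin(dists))
        return idx, float(np.exp(-dists[idx]))  # GWR-style activity

    def _add_or_merge(self, state):
        idx, activity = self._best_match(state)
        if idx is not None and activity > self.activity_threshold:
            # Redundant sample: move the existing node slightly towards it.
            self.nodes[idx] += self.merge_rate * (state - self.nodes[idx])
            return idx
        self.nodes = np.vstack([self.nodes, state])  # novel sample: new node
        return len(self.nodes) - 1

    def store(self, state, action, reward, next_state):
        i = self._add_or_merge(np.asarray(state, dtype=float))
        j = self._add_or_merge(np.asarray(next_state, dtype=float))
        self.edges[(i, j)] = (action, reward)
```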

2.
Int J Soc Robot ; : 1-16, 2023 Apr 02.
Article En | MEDLINE | ID: mdl-37359433

To enhance human-robot social interaction, it is essential for robots to process multiple social cues in a complex real-world environment. However, incongruency of input information across modalities is inevitable and can be challenging for robots to process. To tackle this challenge, our study adopted the neurorobotic paradigm of crossmodal conflict resolution to make a robot express human-like social attention. For the human study, a behavioural experiment was conducted with 37 participants. We designed a round-table meeting scenario with three animated avatars to improve ecological validity. Each avatar wore a medical mask to obscure the facial cues of the nose, mouth, and jaw. The central avatar shifted its eye gaze while the peripheral avatars generated sound. Gaze direction and sound locations were either spatially congruent or incongruent. We observed that the central avatar's dynamic gaze could trigger crossmodal social attention responses. In particular, human performance was better under the congruent audio-visual condition than under the incongruent condition. For the robot study, our saliency prediction model was trained to detect social cues, predict audio-visual saliency, and attend selectively. After mounting the trained model on the iCub, the robot was exposed to laboratory conditions similar to those of the human experiment. While human performance was superior overall, our trained model demonstrated that it could replicate attention responses similar to those of humans.

3.
Neural Netw ; 161: 494-504, 2023 Apr.
Article En | MEDLINE | ID: mdl-36805264

Due to the dynamic nature of human language, automatic speech recognition (ASR) systems need to continuously acquire new vocabulary. Out-of-vocabulary (OOV) words, such as trending words and new named entities, pose problems to modern ASR systems that require long training times to adapt their large numbers of parameters. Different from most previous research focusing on language-model post-processing, we tackle this problem at an earlier processing level and eliminate the bias in acoustic modeling to recognize OOV words acoustically. We propose to generate OOV words using text-to-speech systems and to rescale losses to encourage neural networks to pay more attention to OOV words. Specifically, when fine-tuning a previously trained model on synthetic audio, we enlarge the classification loss for utterances containing OOV words (utterance-level) or rescale the gradients used for back-propagation for the OOV words themselves (word-level). To overcome catastrophic forgetting, we also explore combining loss rescaling with model regularization, i.e., L2 regularization and elastic weight consolidation (EWC). Compared with previous methods that just fine-tune on synthetic audio with EWC, the experimental results on the LibriSpeech benchmark reveal that our proposed loss rescaling approach achieves a significant improvement in the recall rate with only a slight decrease in the word error rate. Moreover, word-level rescaling is more stable than utterance-level rescaling and leads to higher recall and precision rates on OOV word recognition. Furthermore, the proposed combination of loss rescaling and weight consolidation supports continual learning of an ASR system.


Speech Perception; Vocabulary; Humans; Speech; Language; Neural Networks, Computer
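The two rescaling granularities described above might look roughly as follows in PyTorch; the scale factor, mask construction, and function names are illustrative assumptions rather than the paper's exact recipe.

```python
# Word-level vs. utterance-level loss rescaling for OOV words.
import torch
import torch.nn.functional as F

def word_level_loss(logits, targets, token_is_oov, oov_scale=2.0):
    # logits: (batch, time, vocab); targets, token_is_oov: (batch, time)
    loss = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    scale = torch.ones_like(loss)
    scale[token_is_oov] = oov_scale  # enlarge gradients for OOV tokens only
    return (scale * loss).mean()

def utterance_level_loss(logits, targets, utt_has_oov, oov_scale=2.0):
    # utt_has_oov: (batch,) boolean flag per utterance; an EWC or L2 penalty
    # on the parameters can be added on top to counteract forgetting.
    loss = F.cross_entropy(logits.transpose(1, 2), targets,
                           reduction="none").mean(dim=1)
    scale = torch.ones_like(loss)
    scale[utt_has_oov] = oov_scale   # enlarge the whole utterance's loss
    return (scale * loss).mean()
```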
4.
Front Robot AI ; 9: 669719, 2022.
Article En | MEDLINE | ID: mdl-36274912

We propose a neural learning approach for a humanoid exercise robot that can automatically analyze and correct physical exercises. Such an exercise robot should be able to train many different human partners over time and thus requires the ability for lifelong learning. To this end, we develop a modified Grow-When-Required (GWR) network with recurrent connections, episodic memory and a novel subnode mechanism for learning spatiotemporal relationships of body movements and poses. Once an exercise is successfully demonstrated, the pose and movement information for each frame is stored in the Subnode-GWR network. For every frame, the current pose and motion pair is compared against the predicted output of the GWR, allowing for feedback not only on the pose but also on the velocity of the motion. Since both pose and motion depend on a user's body morphology, an exercise demonstration by one individual cannot easily be used as a reference for other users. We therefore allow the GWR to grow online with each further demonstration. The subnode mechanism ensures that exercise information for individual humans is stored and retrieved correctly and is not forgotten over time. In the application scenario, a physical exercise is performed in the presence of an expert, such as a physiotherapist, and then used as a reference for a humanoid robot such as Pepper to give feedback on further executions of the same exercise. For evaluation, we developed a new synthetic exercise dataset with virtual avatars. We also test our method on real-world data recorded in an office scenario. Overall, we claim that our novel GWR-based architecture can use a learned exercise reference for different body variations through incremental online learning while preventing catastrophic forgetting, enabling an engaging long-term human-robot experience with a humanoid robot.
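The per-frame feedback loop could be sketched as follows; predict() stands in for the Subnode-GWR's prediction, and the tolerances are hypothetical values, not those used in the paper.

```python
# Per-frame feedback: compare the current (pose, velocity) pair against
# the reference the trained network predicts for this exercise step.
import numpy as np

def frame_feedback(predict, pose, velocity, pose_tol=0.1, vel_tol=0.05):
    """predict(pose) returns the reference (pose, velocity) expected
    at this point of the exercise."""
    ref_pose, ref_vel = predict(pose)
    feedback = []
    if np.linalg.norm(pose - ref_pose) > pose_tol:
        feedback.append("correct your posture")
    if np.linalg.norm(velocity - ref_vel) > vel_tol:
        feedback.append("adjust your movement speed")
    return feedback or ["well done"]
```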

5.
Front Neurorobot ; 16: 844753, 2022.
Article En | MEDLINE | ID: mdl-35966371

A cognitive agent performing in the real world needs to learn relevant concepts about its environment (e.g., objects, colors, and shapes) and react accordingly. In addition to learning the concepts themselves, it needs to learn relations between the concepts, in particular spatial relations between objects. In this paper, we propose three approaches that allow a cognitive agent to learn spatial relations. First, using an embodied model, the agent learns to reach toward an object based on simple instructions involving left-right relations. Since the level of realism and complexity of this approach does not permit large-scale and diverse experiences, we devise, as a second approach, a simple visual dataset for geometric feature learning and show that recent reasoning models can learn directional relations in different frames of reference. Yet, the embodied and simple simulation approaches together still do not provide sufficient experiences. To close this gap, we propose, third, utilizing knowledge bases for disembodied spatial relation reasoning. Since the three approaches (i.e., embodied learning, learning from simple visual data, and use of knowledge bases) are complementary, we conceptualize a cognitive architecture that combines them in the context of spatial relation learning.

6.
Article En | MEDLINE | ID: mdl-35867361

The aim of this work is to investigate the impact of crossmodal self-supervised pre-training for speech reconstruction (video-to-audio) by leveraging the natural co-occurrence of audio and visual streams in videos. We propose LipSound2, which consists of an encoder-decoder architecture and a location-aware attention mechanism to map face image sequences to mel-scale spectrograms directly, without requiring any human annotations. The proposed LipSound2 model is first pre-trained on ∼2,400 h of multilingual (e.g., English and German) audio-visual data (VoxCeleb2). To verify the generalizability of the proposed method, we then fine-tune the pre-trained model on domain-specific datasets (GRID and TCD-TIMIT) for English speech reconstruction and achieve a significant improvement in speech quality and intelligibility compared to previous approaches in speaker-dependent and speaker-independent settings. In addition to English, we conduct Chinese speech reconstruction on the Chinese Mandarin Lip Reading (CMLR) dataset to verify the method's transferability. Finally, we train the cascaded lip reading (video-to-text) system by fine-tuning a pre-trained speech recognition system on the generated audio and achieve state-of-the-art performance on both English and Chinese benchmark datasets.
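A shape-level sketch of the video-to-spectrogram mapping is given below; the small convolutional encoder and the plain linear projection are simplified stand-ins for LipSound2's actual encoder and location-aware attention decoder, chosen only to make the tensor shapes concrete.

```python
# Face-frame sequence -> mel-spectrogram frames (shape-level sketch).
import torch
import torch.nn as nn

class LipToMel(nn.Module):
    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        # 3D conv stack over (time, height, width) face crops
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),
        )
        self.rnn = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        self.to_mel = nn.Linear(2 * hidden, n_mels)  # stands in for the attention decoder

    def forward(self, frames):                            # frames: (B, 3, T, H, W)
        feats = self.encoder(frames).squeeze(-1).squeeze(-1)  # (B, 64, T)
        out, _ = self.rnn(feats.transpose(1, 2))          # (B, T, 2*hidden)
        return self.to_mel(out)                           # (B, T, n_mels)
```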

8.
Front Robot AI ; 9: 717193, 2022.
Article En | MEDLINE | ID: mdl-35265672

Collaborative interactions require social robots to share the users' perspective on the interactions and adapt to the dynamics of their affective behaviour. Yet, current approaches for affective behaviour generation in robots focus on instantaneous perception to generate a one-to-one mapping between observed human expressions and static robot actions. In this paper, we propose a novel framework for affect-driven behaviour generation in social robots. The framework consists of (i) a hybrid neural model for evaluating facial expressions and speech of the users, forming intrinsic affective representations in the robot, (ii) an Affective Core that employs self-organising neural models to embed behavioural traits like patience and emotional actuation that modulate the robot's affective appraisal, and (iii) a Reinforcement Learning model that uses the robot's appraisal to learn interaction behaviour. We investigate the effect of modelling different affective core dispositions on the affective appraisal and use this appraisal as the motivation to generate robot behaviours. For evaluation, we conduct a user study (n = 31) in which the NICO robot acts as a proposer in the Ultimatum Game. The effect of the robot's affective core on its negotiation strategy is perceived by participants, who rank a patient robot with high emotional actuation higher on persistence, while an impatient robot with low emotional actuation is rated higher on generosity and altruistic behaviour.

9.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1552-1565, 2022 03.
Article En | MEDLINE | ID: mdl-32877332

Generative adversarial networks conditioned on textual image descriptions are capable of generating realistic-looking images. However, current methods still struggle to generate images based on complex image captions from a heterogeneous domain. Furthermore, quantitatively evaluating these text-to-image models is challenging, as most evaluation metrics only judge image quality but not the conformity between an image and its caption. To address these challenges, we introduce a new model that explicitly models individual objects within an image, and a new evaluation metric called Semantic Object Accuracy (SOA) that specifically evaluates images given an image caption. SOA uses a pre-trained object detector to evaluate whether a generated image contains the objects mentioned in the caption, e.g., whether an image generated from "a car driving down the street" contains a car. We perform a user study comparing several text-to-image models and show that our SOA metric ranks the models the same way as humans, whereas other metrics such as the Inception Score do not. Our evaluation also shows that models which explicitly model objects outperform models which only model global image characteristics.


Neural Networks, Computer; Semantics; Algorithms; Humans; Image Processing, Computer-Assisted/methods
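The core of the SOA computation can be sketched as follows; the detector and caption-parsing callables (e.g., a pre-trained YOLO-style detector) are assumptions standing in for the paper's concrete choices.

```python
# Semantic Object Accuracy: check generated images for caption objects.
def semantic_object_accuracy(samples, detect_objects, caption_objects):
    """samples: iterable of (caption, generated_image) pairs.
    detect_objects(image) -> set of detected class labels.
    caption_objects(caption) -> set of object labels in the caption."""
    hits, total = 0, 0
    for caption, image in samples:
        expected = caption_objects(caption)
        detected = detect_objects(image)
        total += len(expected)
        hits += len(expected & detected)  # caption objects actually present
    return hits / max(total, 1)
```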
10.
Front Psychol ; 12: 716671, 2021.
Article En | MEDLINE | ID: mdl-34484079

Human language is inherently embodied and grounded in sensorimotor representations of the self and the world around it. This suggests that the body schema and ideomotor action-effect associations play an important role in language understanding, language generation, and verbal/physical interaction with others. There are computational models that focus purely on non-verbal interaction between humans and robots, and there are computational models for dialog systems that focus only on verbal interaction. However, there is a lack of research that integrates these approaches. We hypothesize that computational models of the self are well suited to handling joint verbal and physical interaction: they offer substantial potential to foster the psychological and cognitive understanding of language grounding, and significant potential to improve human-robot interaction methods and applications. This review is a first step toward developing models of the self that integrate verbal and non-verbal communication. To this end, we first analyze the relevant findings and mechanisms for language grounding in the psychological and cognitive literature on ideomotor theory. Second, we identify the existing computational methods that implement physical decision-making and verbal interaction. As a result, we outline how current computational methods can be used to create advanced computational interaction models that integrate language grounding with body schemas and self-representations.

11.
Front Robot AI ; 8: 668057, 2021.
Article En | MEDLINE | ID: mdl-34336933

Wizard-of-Oz experiments play a vital role in Human-Robot Interaction (HRI), as they allow for quick and simple hypothesis testing. Still, no general, publicly available tool for conducting such experiments currently exists in the research community, and researchers often develop and implement their own tools, customized for each individual experiment. Besides being inefficient in terms of programming effort, this also makes it harder for non-technical researchers to conduct Wizard-of-Oz experiments. In this paper, we present a general and easy-to-use tool for the Pepper robot, one of the most commonly used robots in this context. While we provide the concrete interface for Pepper robots only, the system architecture is independent of the type of robot and can be adapted for other robots. A configuration file, which stores experiment-specific parameters, enables a quick setup for reproducible and repeatable Wizard-of-Oz experiments. A central server provides a graphical interface via a browser while handling the mapping of user input to actions on the robot. In our interface, keyboard shortcuts may be assigned to phrases, gestures, and composite behaviors to simplify and speed up control of the robot. The interface is lightweight and independent of the operating system. Our initial tests confirm that the system is functional, flexible, and easy to use. The interface, including its source code, is made publicly available, and we hope that it will be useful for researchers of any background who want to conduct HRI experiments.
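An experiment-specific configuration of the kind described might resemble the following sketch; the JSON schema, keys, and values (including the robot address) are purely illustrative and not the tool's actual file format.

```python
# Writing a reusable experiment configuration: shortcuts mapped to
# phrases, gestures, and composite behaviours.
import json

config = {
    "robot": {"type": "pepper", "ip": "192.168.1.10", "port": 9559},
    "shortcuts": {
        "g": {"type": "phrase", "text": "Hello, nice to meet you!"},
        "w": {"type": "gesture", "name": "wave"},
        "q": {"type": "composite", "actions": ["wave", "say:Goodbye!"]},
    },
}

with open("experiment_config.json", "w") as f:
    json.dump(config, f, indent=2)  # reusable setup for repeatable runs
```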

12.
Front Neurorobot ; 15: 669534, 2021.
Article En | MEDLINE | ID: mdl-34276332

Long-term human-robot interaction requires the continuous acquisition of knowledge. This ability is referred to as lifelong learning (LL). LL is a long-standing challenge in machine learning due to catastrophic forgetting, whereby continuously learning from novel experiences degrades the performance of previously acquired knowledge. Two recently published LL approaches are the Growing Dual-Memory (GDM) and the Self-organizing Incremental Neural Network+ (SOINN+). Both are growing neural networks that create new neurons in response to novel sensory experiences. The latter approach shows state-of-the-art clustering performance on sequentially available data with low memory requirements regarding the number of nodes; however, its classification capabilities have not been investigated. Our paper makes two novel contributions: (I) an extended SOINN+ approach, called associative SOINN+ (A-SOINN+), which adopts two main properties of the GDM model to facilitate classification, and (II) a new LL object recognition dataset (v-NICO-World-LL), recorded in a nearly photorealistic virtual environment, where a virtual humanoid robot manipulates 100 different objects belonging to 10 classes. Real-world and artificially created background images, grouped into four different complexity levels, are utilized. The A-SOINN+ reaches classification accuracies similar to those of the best GDM architecture of this work while consisting of 30 to 350 times fewer neurons, evaluated on two LL object recognition datasets, the novel v-NICO-World-LL and the well-known CORe50. Furthermore, we observe an approximately 268 times lower training time. These reduced numbers result in lower memory and computational requirements, indicating higher suitability for autonomous social robots with low computational resources to facilitate a more efficient LL during long-term human-robot interactions.

13.
Front Robot AI ; 8: 644529, 2021.
Article En | MEDLINE | ID: mdl-34150857

As robots become more advanced and capable, developing trust is an important factor in human-robot interaction and cooperation. However, as multiple environmental and social factors can influence trust, it is important to develop more elaborate scenarios and methods to measure human-robot trust. A widely used measurement of trust in social science is the investment game. In this study, we propose a scaled-up, immersive, science-fiction Human-Robot Interaction (HRI) scenario for intrinsic motivation in human-robot collaboration, built upon the investment game and aimed at adapting it to measure human-robot trust. For this purpose, we utilize two Neuro-Inspired COmpanion (NICO) robots and a projected scenery. We investigate the applicability of our space-mission experiment design to measure trust and the impact of non-verbal communication. We observe a correlation of 0.43 (p = 0.02) between self-assessed trust and trust measured from the game, and a positive impact of non-verbal communication on trust (p = 0.0008) and on robot perception for anthropomorphism (p = 0.007) and animacy (p = 0.00002). We conclude that our scenario is an appropriate method to measure trust in human-robot interaction and also to study how non-verbal communication influences a human's trust in robots.
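The reported correlation between self-assessed and game-based trust can be computed in principle as below; the participant values are placeholders, not the study's data.

```python
# Comparing behavioural and self-assessed trust via Pearson correlation.
import numpy as np
from scipy.stats import pearsonr

# Fraction of credits each participant invested in the robot (behavioural trust)
invested_fraction = np.array([0.8, 0.5, 0.9, 0.4, 0.7])
# The same participants' questionnaire ratings (self-assessed trust)
questionnaire_trust = np.array([4.5, 3.0, 4.8, 2.5, 4.0])

r, p = pearsonr(invested_fraction, questionnaire_trust)
print(f"r = {r:.2f}, p = {p:.3f}")
```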

14.
Front Neurorobot ; 14: 52, 2020.
Article En | MEDLINE | ID: mdl-33154720

Human infants are able to acquire natural language seemingly easily at an early age. Their language learning seems to occur simultaneously with the learning of other cognitive functions and with playful interactions with the environment and caregivers. From a neuroscientific perspective, natural language is embodied, grounded in most, if not all, sensory and sensorimotor modalities, and acquired by means of crossmodal integration. However, characterizing the underlying mechanisms in the brain is difficult, and explaining the grounding of language in crossmodal perception and action remains challenging. In this paper, we present a neurocognitive model for language grounding which reflects bio-inspired mechanisms such as an implicit adaptation of timescales as well as end-to-end multimodal abstraction. It addresses developmental robotic interaction and extends its learning capabilities using larger-scale knowledge-based data. In our scenario, we utilize the humanoid robot NICO to obtain the EMIL data collection, in which the cognitive robot interacts with objects in a children's playground environment while receiving linguistic labels from a caregiver. The model analysis shows that crossmodally integrated representations are sufficient for acquiring language merely from sensory input through interaction with objects in an environment. The representations self-organize hierarchically and embed temporal and spatial information through composition and decomposition. This model can also provide the basis for further crossmodal integration of perceptually grounded cognitive representations.

15.
Front Neurorobot ; 14: 28, 2020.
Article En | MEDLINE | ID: mdl-32581759

To overcome novel challenges in complex domestic environments, humanoid robots can learn from human teachers. We propose that the capability for social interaction should be a key factor in this teaching process, benefiting both the subjective experience of the human user and the learning process itself. To support our hypothesis, we present a Human-Robot Interaction study on human-assisted visuomotor learning with the robot NICO, the Neuro-Inspired COmpanion, a child-sized humanoid. NICO is a flexible, social platform with sensing and manipulation abilities. We give a detailed description of NICO's design and a comprehensive overview of studies that use or evaluate NICO. To engage in social interaction, NICO can express stylized facial expressions and utter speech via an Embodied Dialogue System. NICO is characterized in particular by combining these social interaction capabilities with abilities for human-like object manipulation and crossmodal perception. In the presented study, NICO acquires visuomotor grasping skills by interacting with its environment. In contrast to methods like motor babbling, the learning process is, in part, supported by a human teacher. To begin the learning process, an object is placed into NICO's hand, and if this object is accidentally dropped, the human assistant has to recover it. The study was conducted with 24 participants with little or no prior experience with robots. In the robot-guided experimental condition, assistance is actively requested by NICO via the Embodied Dialogue System; in the human-guided condition, instructions are given by a human experimenter while NICO remains silent. Evaluation using established questionnaires such as the Godspeed, Mind Perception, and Uncanny Valley indices, along with a structured interview and video analysis of the interaction, shows that the robot's active requests for assistance foster the participants' engagement and benefit the learning process. This result supports the hypothesis that the ability for social interaction is a key factor for companion robots that learn with the help of non-expert teachers, as these robots become capable of communicating active requests or questions that are vital to their learning process. We also show how the design of NICO both enables and is driven by this approach.

16.
Front Integr Neurosci ; 14: 10, 2020.
Article En | MEDLINE | ID: mdl-32174816

Selective attention plays an essential role in information acquisition and utilization from the environment. In the past 50 years, research on selective attention has been a central topic in cognitive science. Compared with unimodal studies, crossmodal studies are more complex but necessary to solve real-world challenges in both human experiments and computational modeling. Although an increasing number of findings on crossmodal selective attention have shed light on humans' behavioral patterns and neural underpinnings, a much better understanding is still necessary to yield the same benefit for intelligent computational agents. This article reviews studies of selective attention in unimodal visual and auditory and crossmodal audiovisual setups from the multidisciplinary perspectives of psychology and cognitive neuroscience, and evaluates different ways to simulate analogous mechanisms in computational models and robotics. We discuss the gaps between these fields in this interdisciplinary review and provide insights about how to use psychological findings and theories in artificial intelligence from different perspectives.

17.
Front Robot AI ; 7: 540565, 2020.
Article En | MEDLINE | ID: mdl-33501309

The quality of crossmodal perception hinges on two factors: the accuracy of the independent unimodal perception and the ability to integrate information from different sensory systems. In humans, the ability for cognitively demanding crossmodal perception diminishes from young to old age. Here, we propose a new approach to investigate to what degree these different factors contribute to crossmodal processing and its age-related decline by replicating a medical study on visuo-tactile crossmodal pattern discrimination, utilizing state-of-the-art tactile sensing technology and artificial neural networks (ANNs). We implemented two ANN models to specifically focus on the relevance of early integration of sensory information during the crossmodal processing stream, a mechanism proposed for efficient processing in the human brain. Applying an adaptive staircase procedure, we approached comparable unimodal classification performance for both modalities in the human participants as well as in the ANNs. This allowed us to compare crossmodal performance between and within the systems, independent of the underlying unimodal processes. Our data show that the unimodal classification accuracies of the tactile sensing technology are comparable to those of humans. For crossmodal discrimination by the ANNs, integrating high-level unimodal features at earlier stages of the crossmodal processing stream yields higher accuracies than the late integration of independent unimodal classifications. In comparison to humans, the ANNs show higher accuracies than older participants in both the unimodal and the crossmodal condition, but lower accuracies than younger participants in the crossmodal task. Taken together, we show that state-of-the-art tactile sensing technology can perform a complex tactile recognition task at levels comparable to humans. For crossmodal processing, human-inspired early sensory integration seems to improve the performance of artificial neural networks. Still, younger participants seem to employ more efficient crossmodal integration mechanisms than those modeled in the proposed ANNs. Our work demonstrates how collaborative research in neuroscience and embodied artificial neurocognitive models can help derive models that inform the design of future neurocomputational architectures.
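The early-versus-late integration contrast can be sketched as two minimal PyTorch models; layer sizes, the averaging rule for late fusion, and the two-class setup are illustrative assumptions, not the study's architectures.

```python
# Early fusion merges unimodal features before classification;
# late fusion combines independent unimodal classifications afterwards.
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    def __init__(self, vis_dim, tac_dim, hidden=64, n_classes=2):
        super().__init__()
        self.vis = nn.Sequential(nn.Linear(vis_dim, hidden), nn.ReLU())
        self.tac = nn.Sequential(nn.Linear(tac_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, v, t):
        return self.head(torch.cat([self.vis(v), self.tac(t)], dim=-1))

class LateFusion(nn.Module):
    def __init__(self, vis_dim, tac_dim, n_classes=2):
        super().__init__()
        self.vis_clf = nn.Linear(vis_dim, n_classes)
        self.tac_clf = nn.Linear(tac_dim, n_classes)

    def forward(self, v, t):
        # Average the two independent class distributions.
        return (self.vis_clf(v).softmax(-1) + self.tac_clf(t).softmax(-1)) / 2
```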

18.
Front Neurorobot ; 13: 93, 2019.
Article En | MEDLINE | ID: mdl-31798437

The problem of generating structured Knowledge Graphs (KGs) is difficult and open, but relevant to a range of tasks related to decision making and information augmentation. A promising approach is to frame KG generation as producing a relational representation of inputs (e.g., textual paragraphs or natural images), where nodes represent the entities and edges represent the relations. This procedure naturally decomposes into two phases: extracting primary relations from the input, and completing the KG with reasoning. In this paper, we propose a hybrid KG builder that combines these two phases in a unified framework and generates KGs from scratch. Specifically, we employ a neural relation extractor resolving primary relations from the input and a differentiable inductive logic programming (ILP) model that iteratively completes the KG. We evaluate our framework in both textual and visual domains and achieve comparable performance on relation extraction datasets based on Wikidata and the Visual Genome. The framework surpasses neural baselines by a noticeable gap in reasoning out dense KGs and performs particularly well for rare relations overall.
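As a crisp, non-differentiable stand-in for the ILP completion phase, the sketch below applies simple chain rules to extracted triples until a fixed point is reached; the rule format is an assumption for illustration only.

```python
# Two-phase KG construction: extracted triples + rule-based completion.
def complete_kg(triples, rules, max_iters=10):
    """triples: set of (head, relation, tail); rules: list of
    (head_rel, (r1, r2)) meaning r1(a, b) and r2(b, c) => head_rel(a, c)."""
    kg = set(triples)
    for _ in range(max_iters):
        new = set()
        for head_rel, (r1, r2) in rules:
            for a, r, b in kg:
                if r != r1:
                    continue
                for b2, r_, c in kg:
                    if r_ == r2 and b2 == b:
                        new.add((a, head_rel, c))
        if new <= kg:  # fixed point reached
            break
        kg |= new
    return kg

# Example: parent_of chained twice implies grandparent_of.
triples = {("ann", "parent_of", "bob"), ("bob", "parent_of", "cid")}
rules = [("grandparent_of", ("parent_of", "parent_of"))]
print(complete_kg(triples, rules))
```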

19.
Int J Neural Syst ; 29(5): 1850052, 2019 Jun.
Article En | MEDLINE | ID: mdl-30764724

This paper addresses the typical problems of self-organizing growing network models, i.e., (a) the influence of the order of input data on the self-organizing ability, (b) instability on high-dimensional data and excessive sensitivity to noise, and (c) expensive computational cost, by integrating the Kernel Bayes Rule (KBR) and the Correntropy-Induced Metric (CIM) into the Adaptive Resonance Theory (ART) framework. KBR performs a covariance-free Bayesian computation, which maintains fast and stable computation. CIM is a generalized similarity measure that maintains a high noise-reduction ability even in high-dimensional spaces. In addition, a Growing Neural Gas (GNG)-based topology construction process is integrated into the ART framework to enhance its self-organizing ability. Simulation experiments with synthetic and real-world datasets show that the proposed model has an outstanding and stable self-organizing ability across various test environments.


Algorithms; Bayes Theorem; Computer Simulation; Neural Networks, Computer
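For reference, the correntropy estimate and the CIM as commonly defined in the correntropy literature; the paper's exact formulation may differ in kernel choice or normalization.

```latex
% Correntropy estimate and the Correntropy-Induced Metric (CIM) for two
% samples X = {x_i} and Y = {y_i} with a Gaussian kernel of bandwidth sigma.
\hat{V}_\sigma(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \kappa_\sigma(x_i - y_i),
\qquad
\kappa_\sigma(u) = \exp\!\left(-\frac{u^2}{2\sigma^2}\right)

\mathrm{CIM}(X, Y) = \left[\kappa_\sigma(0) - \hat{V}_\sigma(X, Y)\right]^{1/2}
```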
20.
Neural Netw ; 113: 54-71, 2019 May.
Article En | MEDLINE | ID: mdl-30780045

Humans and animals have the ability to continually acquire, fine-tune, and transfer knowledge and skills throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for computational learning systems and autonomous agents interacting in the real world and processing continuous streams of information. However, lifelong learning remains a long-standing challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic forgetting. Although significant advances have been made in domain-specific learning with neural networks, extensive research efforts are required for the development of robust lifelong learning on autonomous agents and robots. We discuss well-established and emerging research motivated by lifelong learning factors in biological systems such as structural plasticity, memory replay, curriculum and transfer learning, intrinsic motivation, and multisensory integration.


Machine Learning/trends; Neural Networks, Computer; Pattern Recognition, Automated/trends; Animals; Humans; Memory; Pattern Recognition, Automated/methods
...