ABSTRACT
A major challenge for systems neuroscience is to break the neural code. Computational algorithms for encoding information into neural activity and extracting information from measured activity afford understanding of how percepts, memories, thought, and knowledge are represented in patterns of brain activity. The past decade and a half has seen significant advances in the development of methods for decoding human neural activity, such as multivariate pattern classification, representational similarity analysis, hyperalignment, and stimulus-model-based encoding and decoding. This article reviews these advances and integrates neural decoding methods into a common framework organized around the concept of high-dimensional representational spaces.
Subject(s)
Brain Mapping/methods; Image Processing, Computer-Assisted; Models, Neurological; Neurons/physiology; Animals; Humans
ABSTRACT
Shared information content is represented across brains in idiosyncratic functional topographies. Hyperalignment addresses these idiosyncrasies by using neural responses to project individuals' brain data into a common model space while maintaining the geometric relationships between distinct patterns of activity or connectivity. The dimensions of this common model capture functional profiles that are shared across individuals such as cortical response profiles collected during a common time-locked stimulus presentation (e.g. movie viewing) or functional connectivity profiles. Hyperalignment can use either response-based or connectivity-based input data to derive transformations that project individuals' neural data from anatomical space into the common model space. Previously, only response or connectivity profiles were used in the derivation of these transformations. In this study, we developed a new hyperalignment algorithm, hybrid hyperalignment, that derives transformations based on both response-based and connectivity-based information. We used three different movie-viewing fMRI datasets to test the performance of our new algorithm. Hybrid hyperalignment derives a single common model space that aligns response-based information as well as or better than response hyperalignment while simultaneously aligning connectivity-based information better than connectivity hyperalignment. These results suggest that a single common information space can encode both shared cortical response and functional connectivity profiles across individuals.
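The idea of projecting one individual's data into a common space while preserving the geometry of pairwise pattern relationships can be sketched with a single orthogonal Procrustes fit. This is only a minimal illustration under stated assumptions: the data are simulated, the function names are hypothetical, and it covers one pairwise alignment step, not the hybrid hyperalignment algorithm described in the abstract.

```python
import numpy as np


def procrustes_rotation(source, target):
    """Return the orthogonal matrix R that minimizes ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def pairwise_distances(patterns):
    """Euclidean distances between all pairs of rows (pattern vectors)."""
    diff = patterns[:, None, :] - patterns[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Simulated data (an assumption for illustration): rows are time points,
# columns are features such as voxels or vertices.
rng = np.random.default_rng(0)
n_timepoints, n_features = 50, 8
target = rng.standard_normal((n_timepoints, n_features))

# A second "subject" carrying the same information in a rotated feature space,
# plus a small amount of noise.
q, _ = np.linalg.qr(rng.standard_normal((n_features, n_features)))
source = target @ q.T + 0.01 * rng.standard_normal((n_timepoints, n_features))

rotation = procrustes_rotation(source, target)
aligned = source @ rotation

error_before = np.linalg.norm(source - target)
error_after = np.linalg.norm(aligned - target)
print(f"misalignment before: {error_before:.3f}, after: {error_after:.3f}")
```

Because the fitted transformation is orthogonal, distances between pattern vectors are the same before and after projection, which is the sense in which the geometry is maintained in this sketch.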
Subject(s)
Brain Mapping/methods; Cerebral Cortex/diagnostic imaging; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Motion Pictures; Nerve Net/diagnostic imaging; Adult; Cerebral Cortex/physiology; Female; Humans; Male; Nerve Net/physiology; Photic Stimulation/methods
ABSTRACT
Subject-specific, functionally defined areas are conventionally estimated with functional localizers and a simple contrast analysis between responses to different stimulus categories. Compared with functional localizers, naturalistic stimuli provide several advantages such as stronger and widespread brain activation, greater engagement, and increased subject compliance. In this study we demonstrate that a subject's idiosyncratic functional topography can be estimated with high fidelity from that subject's fMRI data obtained while watching a naturalistic movie using hyperalignment to project other subjects' localizer data into that subject's idiosyncratic cortical anatomy. These findings lay the foundation for developing an efficient tool for mapping functional topographies for a wide range of perceptual and cognitive functions in new subjects based only on fMRI data collected while watching an engaging, naturalistic stimulus and other subjects' localizer data from a normative sample.
Subject(s)
Brain/diagnostic imaging; Brain/physiology; Facial Recognition/physiology; Magnetic Resonance Imaging/methods; Motion Pictures; Adult; Female; Forecasting; Humans; Male; Photic Stimulation/methods; Young Adult
ABSTRACT
Variation in cortical connectivity profiles is typically modeled as having a coarse spatial scale parcellated into interconnected brain areas. We created a high-dimensional common model of the human connectome to search for fine-scale structure that is shared across brains. Projecting individual connectivity data into this new common model connectome accounts for substantially more variance in the human connectome than do previous models. This newly discovered shared structure is closely related to fine-scale distinctions in representations of information. These results reveal a shared fine-scale structure that is a major component of the human connectome that coexists with coarse-scale, areal structure. This shared fine-scale structure was not captured in previous models and was, therefore, inaccessible to analysis and study.
Subject(s)
Connectome/statistics & numerical data; Models, Neurological; Acoustic Stimulation; Adult; Algorithms; Brain/anatomy & histology; Brain/physiology; Computational Biology; Computer Simulation; Female; Humans; Magnetic Resonance Imaging; Male; Motion Pictures; Photic Stimulation; Young Adult
ABSTRACT
Fine-grained functional organization of cortex is not well-conserved across individuals. As a result, individual differences in cortical functional architecture are confounded by topographic idiosyncrasies, i.e., differences in functional-anatomical correspondence. In this study, we used hyperalignment to align information encoded in topographically variable patterns to study individual differences in fine-grained cortical functional architecture in a common representational space. We characterized the structure of individual differences using three common functional indices, and assessed the reliability of this structure across independent samples of data in a natural vision paradigm. Hyperalignment markedly improved the reliability of individual differences across all three indices by resolving topographic idiosyncrasies and accommodating information encoded in spatially fine-grained response patterns. Our results demonstrate that substantial individual differences in cortical functional architecture exist at fine spatial scales, but are inaccessible with anatomical normalization alone.
Subject(s)
Brain Mapping/methods; Cerebral Cortex/physiology; Image Processing, Computer-Assisted/methods; Individuality; Magnetic Resonance Imaging/methods; Adult; Brain Mapping/standards; Cerebral Cortex/diagnostic imaging; Female; Humans; Image Processing, Computer-Assisted/standards; Magnetic Resonance Imaging/standards; Male; Reproducibility of Results; Young Adult
ABSTRACT
The rate of progress in human neurosciences is limited by the inability to easily apply a wide range of analysis methods to the plethora of different datasets acquired in labs around the world. In this work, we introduce a framework for creating, testing, versioning and archiving portable applications for analyzing neuroimaging data organized and described in compliance with the Brain Imaging Data Structure (BIDS). The portability of these applications (BIDS Apps) is achieved by using container technologies that encapsulate all binary and other dependencies in one convenient package. BIDS Apps run on all three major operating systems with no need for complex setup and configuration, and thanks to the comprehensiveness of the BIDS standard, they require little manual user input. Previous containerized data processing solutions were limited to single-user environments and not compatible with most multi-tenant High Performance Computing systems. BIDS Apps overcome this limitation by taking advantage of the Singularity container technology. As a proof of concept, this work is accompanied by 22 ready-to-use BIDS Apps, packaging a diverse set of commonly used neuroimaging algorithms.
Subject(s)
Brain/anatomy & histology; Image Interpretation, Computer-Assisted/methods; Neuroimaging/methods; Radiology Information Systems/organization & administration; Software; User-Computer Interface; Algorithms; Humans; Magnetic Resonance Imaging/methods
ABSTRACT
Neural models of a distributed system for face perception implicate a network of regions in the ventral visual stream for recognition of identity. Here, we report a functional magnetic resonance imaging (fMRI) neural decoding study in humans that shows that this pathway culminates in the right inferior frontal cortex face area (rIFFA) with a representation of individual identities that has been disentangled from variable visual features in different images of the same person. At earlier stages in the pathway, processing begins in early visual cortex and the occipital face area with representations of head view that are invariant across identities, and proceeds to an intermediate level of representation in the fusiform face area in which identity is emerging but still entangled with head view. Three-dimensional, view-invariant representation of identities in the rIFFA may be the critical link to the extended system for face perception, affording activation of person knowledge and emotional responses to familiar faces.
Subject(s)
Facial Recognition/physiology; Frontal Lobe/physiology; Nerve Net/physiology; Pattern Recognition, Visual/physiology; Recognition, Psychology/physiology; Visual Cortex/physiology; Adult; Brain Mapping; Female; Humans; Male; Neural Pathways/physiology
ABSTRACT
Humans prioritize different semantic qualities of a complex stimulus depending on their behavioral goals. These semantic features are encoded in distributed neural populations, yet it is unclear how attention might operate across these distributed representations. To address this, we presented participants with naturalistic video clips of animals behaving in their natural environments while the participants attended to either behavior or taxonomy. We used models of representational geometry to investigate how attentional allocation affects the distributed neural representation of animal behavior and taxonomy. Attending to animal behavior transiently increased the discriminability of distributed population codes for observed actions in anterior intraparietal, pericentral, and ventral temporal cortices. Attending to animal taxonomy while viewing the same stimuli increased the discriminability of distributed animal category representations in ventral temporal cortex. For both tasks, attention selectively enhanced the discriminability of response patterns along behaviorally relevant dimensions. These findings suggest that behavioral goals alter how the brain extracts semantic features from the visual world. Attention effectively disentangles population responses for downstream read-out by sculpting representational geometry in late-stage perceptual areas.
Subject(s)
Attention/physiology; Brain/physiology; Motion Perception/physiology; Semantics; Adult; Brain/diagnostic imaging; Brain Mapping/methods; Female; Humans; Magnetic Resonance Imaging; Male; Models, Statistical; Neural Pathways/diagnostic imaging; Neural Pathways/physiology; Neuropsychological Tests; Pattern Recognition, Visual/physiology
ABSTRACT
Common or folk knowledge about animals is dominated by three dimensions: (1) level of cognitive complexity or "animacy;" (2) dangerousness or "predacity;" and (3) size. We investigated the neural basis of the perceived dangerousness or aggressiveness of animals, which we refer to more generally as "perception of threat." Using functional magnetic resonance imaging (fMRI), we analyzed neural activity evoked by viewing images of animal categories that spanned the dissociable semantic dimensions of threat and taxonomic class. The results reveal a distributed network for perception of threat extending along the right superior temporal sulcus. We compared neural representational spaces with target representational spaces based on behavioral judgments and a computational model of early vision and found a processing pathway in which perceived threat emerges as a dominant dimension: whereas visual features predominate in early visual cortex and taxonomy in lateral occipital and ventral temporal cortices, these dimensions fall away progressively from posterior to anterior temporal cortices, leaving threat as the dominant explanatory variable. Our results suggest that the perception of threat in the human brain is associated with neural structures that underlie perception and cognition of social actions and intentions, suggesting a broader role for these regions than has been thought previously, one that includes the perception of potential threat from agents independent of their biological class.
SIGNIFICANCE STATEMENT: For centuries, philosophers have wondered how the human mind organizes the world into meaningful categories and concepts. Today this question is at the core of cognitive science, but our focus has shifted to understanding how knowledge manifests in dynamic activity of neural systems in the human brain. This study advances the young field of empirical neuroepistemology by characterizing the neural systems engaged by an important dimension in our cognitive representation of the animal kingdom ontological subdomain: how the brain represents the perceived threat, dangerousness, or "predacity" of animals. Our findings reveal how activity for domain-specific knowledge of animals overlaps the social perception networks of the brain, suggesting domain-general mechanisms underlying the representation of conspecifics and other animals.
Subject(s)
Brain/physiology; Connectome; Predatory Behavior/classification; Visual Perception; Adult; Amphibians/physiology; Animals; Arthropods/physiology; Brain/cytology; Cognition; Female; Humans; Magnetic Resonance Imaging; Male; Neurons/physiology; Reptiles/physiology
ABSTRACT
Current models of the functional architecture of human cortex emphasize areas that capture coarse-scale features of cortical topography but provide no account for population responses that encode information in fine-scale patterns of activity. Here, we present a linear model of shared representational spaces in human cortex that captures fine-scale distinctions among population responses with response-tuning basis functions that are common across brains and models cortical patterns of neural responses with individual-specific topographic basis functions. We derive a common model space for the whole cortex using a new algorithm, searchlight hyperalignment, and complex, dynamic stimuli that provide a broad sampling of visual, auditory, and social percepts. The model aligns representations across brains in occipital, temporal, parietal, and prefrontal cortices, as shown by between-subject multivariate pattern classification and intersubject correlation of representational geometry, indicating that structural principles for shared neural representations apply across widely divergent domains of information. The model provides a rigorous account for individual variability of well-known coarse-scale topographies, such as retinotopy and category selectivity, and goes further to account for fine-scale patterns that are multiplexed with coarse-scale topographies and carry finer distinctions.
Subject(s)
Auditory Perception/physiology; Brain Mapping/methods; Cerebral Cortex/physiology; Magnetic Resonance Imaging/methods; Models, Neurological; Visual Perception/physiology; Algorithms; Cerebral Cortex/diagnostic imaging; Female; Humans; Linear Models; Male; Neuropsychological Tests; Young Adult
ABSTRACT
Major theories for explaining the organization of semantic memory in the human brain are premised on the often-observed dichotomous dissociation between living and nonliving objects. Evidence from neuroimaging has been interpreted to suggest that this distinction is reflected in the functional topography of the ventral vision pathway as lateral-to-medial activation gradients. Recently, we observed that similar activation gradients also reflect differences among living stimuli consistent with the semantic dimension of graded animacy. Here, we address whether the salient dichotomous distinction between living and nonliving objects is actually reflected in observable measured brain activity or whether previous observations of a dichotomous dissociation were the illusory result of stimulus sampling biases. Using fMRI, we measured neural responses while participants viewed 10 animal species with high to low animacy and two inanimate categories. Representational similarity analysis of the activity in ventral vision cortex revealed a main axis of variation with high-animacy species maximally different from artifacts and with the least animate species closest to artifacts. Although the associated functional topography mirrored activation gradients observed for animate-inanimate contrasts, we found no evidence for a dichotomous dissociation. We conclude that a central organizing principle of human object vision corresponds to the graded psychological property of animacy with no clear distinction between living and nonliving stimuli. The lack of evidence for a dichotomous dissociation in the measured brain activity challenges theories based on this premise.
Subject(s)
Brain Mapping; Optical Illusions/physiology; Pattern Recognition, Visual/physiology; Semantics; Visual Cortex/physiology; Visual Pathways/physiology; Female; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Male; Oxygen/blood; Photic Stimulation; Principal Component Analysis; Reaction Time/physiology; Visual Cortex/blood supply; Visual Pathways/blood supply
ABSTRACT
Fascinating phenomena such as landmark vector cells and splitter cells are frequently discovered in the hippocampus. Without a unifying principle, each experiment seemingly uncovers new anomalies or coding types. Here, we provide a unifying principle that the mental representation of space is an emergent property of latent higher-order sequence learning. Treating space as a sequence resolves numerous phenomena and suggests that the place field mapping methodology that interprets sequential neuronal responses in Euclidean terms might itself be a source of anomalies. Our model, clone-structured causal graph (CSCG), employs higher-order graph scaffolding to learn latent representations by mapping aliased egocentric sensory inputs to unique contexts. Learning to compress sequential and episodic experiences using CSCGs yields allocentric cognitive maps that are suitable for planning, introspection, consolidation, and abstraction. By explicating the role of Euclidean place field mapping and demonstrating how latent sequential representations unify myriad observed phenomena, our work positions the hippocampus in a sequence-centric paradigm, challenging the prevailing space-centric view.
Subject(s)
Hippocampus; Hippocampus/physiology; Humans; Animals; Models, Neurological; Space Perception/physiology; Neurons/physiology; Learning/physiology
ABSTRACT
Evidence of category specificity from neuroimaging in the human visual system is generally limited to a few relatively coarse categorical distinctions (e.g., faces versus bodies, or animals versus artifacts), leaving unknown the neural underpinnings of fine-grained category structure within these large domains. Here we use fMRI to explore brain activity for a set of categories within the animate domain, including six animal species, two each from three very different biological classes: primates, birds, and insects. Patterns of activity throughout ventral object vision cortex reflected the biological classes of the stimuli. Specifically, the abstract representational space (measured as dissimilarity matrices defined between species-specific multivariate patterns of brain activity) correlated strongly with behavioral judgments of biological similarity of the same stimuli. This biological class structure was uncorrelated with structure measured in retinotopic visual cortex, which correlated instead with a dissimilarity matrix defined by a model of V1 cortex for the same stimuli. Additionally, analysis of the shape of the similarity space in ventral regions provides evidence for a continuum in the abstract representational space, with primates at one end and insects at the other. Further investigation into the cortical topography of activity that contributes to this category structure reveals the partial engagement of brain systems active normally for inanimate objects in addition to animate regions.
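The comparison of dissimilarity matrices described in this abstract can be sketched as follows. This is an illustrative sketch on simulated data, not the study's pipeline: the choice of correlation distance, the use of Pearson correlation to compare matrices, and all names are assumptions.

```python
import numpy as np


def dissimilarity_matrix(patterns):
    """Correlation distance (1 - Pearson r) between every pair of rows."""
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, one value per unique pair."""
    rows, cols = np.triu_indices_from(matrix, k=1)
    return matrix[rows, cols]


def compare_rdms(rdm_a, rdm_b):
    """Pearson correlation between the unique entries of two matrices."""
    return np.corrcoef(upper_triangle(rdm_a), upper_triangle(rdm_b))[0, 1]


# Simulated data (an assumption): one response pattern per condition,
# e.g. six animal species measured over 40 voxels.
rng = np.random.default_rng(1)
n_conditions, n_voxels = 6, 40
neural = rng.standard_normal((n_conditions, n_voxels))

# A "related" space built from a noisy copy of the neural patterns, and an
# "unrelated" space built from independent random patterns.
related = neural + 0.1 * rng.standard_normal((n_conditions, n_voxels))
unrelated = rng.standard_normal((n_conditions, n_voxels))

neural_rdm = dissimilarity_matrix(neural)
similarity_to_related = compare_rdms(neural_rdm, dissimilarity_matrix(related))
similarity_to_unrelated = compare_rdms(neural_rdm, dissimilarity_matrix(unrelated))
print(similarity_to_related, similarity_to_unrelated)
```

Only the upper triangle is compared because a dissimilarity matrix is symmetric with a zero diagonal, so the remaining entries carry no additional information.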
Subject(s)
Brain Mapping; Brain/physiology; Concept Formation/physiology; Judgment/physiology; Pattern Recognition, Visual/physiology; Recognition, Psychology/physiology; Adult; Brain/blood supply; Classification; Cluster Analysis; Female; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Male; Oxygen/blood; Photic Stimulation/methods; Reaction Time; Visual Pathways/blood supply; Visual Pathways/physiology; Young Adult
ABSTRACT
Inter-subject alignment of functional MRI (fMRI) data is necessary for group analyses. The standard approach to this problem matches anatomical features of the brain, such as major anatomical landmarks or cortical curvature. Precise alignment of functional cortical topographies, however, cannot be derived using only anatomical features. We propose a new inter-subject registration algorithm that aligns intra-subject patterns of functional connectivity across subjects. We derive functional connectivity patterns by correlating fMRI BOLD time-series, measured during movie viewing, between spatially remote cortical regions. We validate our technique extensively on real fMRI experimental data and compare our method to two state-of-the-art inter-subject registration algorithms. By cross-validating our method on independent datasets, we show that the derived alignment generalizes well to other experimental paradigms.
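Deriving functional connectivity patterns by correlating time series between regions can be sketched as below. This is a minimal sketch assuming simulated time series and a hypothetical function name; it shows only the correlation step, not the registration algorithm proposed in the abstract.

```python
import numpy as np


def connectivity_profiles(seed_ts, target_ts):
    """Pearson correlation of every seed time series with every target.

    seed_ts:   array of shape (n_timepoints, n_seeds)
    target_ts: array of shape (n_timepoints, n_targets)
    returns:   array of shape (n_seeds, n_targets)
    """
    seed_z = (seed_ts - seed_ts.mean(axis=0)) / seed_ts.std(axis=0)
    target_z = (target_ts - target_ts.mean(axis=0)) / target_ts.std(axis=0)
    return seed_z.T @ target_z / seed_ts.shape[0]


# Simulated time series (an assumption), standing in for BOLD signals.
rng = np.random.default_rng(2)
n_timepoints, n_seeds, n_targets = 200, 3, 5
targets = rng.standard_normal((n_timepoints, n_targets))
seeds = rng.standard_normal((n_timepoints, n_seeds))
seeds[:, 0] = targets[:, 2]  # seed 0 is made identical to target 2

profiles = connectivity_profiles(seeds, targets)
print(profiles.shape)
```

Each row of `profiles` is one seed's connectivity profile, a vector that could then be compared across subjects in place of anatomical features.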
Subject(s)
Algorithms; Brain Mapping/methods; Cerebral Cortex/anatomy & histology; Image Processing, Computer-Assisted/methods; Neural Pathways/anatomy & histology; Female; Humans; Magnetic Resonance Imaging; Male; Young Adult
ABSTRACT
How quickly can information about the neural response to a visual stimulus be detected in the hemodynamic response measured using fMRI? Multi-voxel pattern analysis (MVPA) uses pattern classification to detect subtle stimulus-specific information from patterns of responses among voxels, including information that cannot be detected in the average response across a given brain region. Here we use MVPA in combination with rapid temporal sampling of the fMRI signal to investigate the temporal evolution of classification accuracy and its relationship to the average regional hemodynamic response. In primary visual cortex (V1) stimulus information can be detected in the pattern of voxel responses more than a second before the average hemodynamic response of V1 deviates from baseline, and classification accuracy peaks before the peak of the average hemodynamic response. Both of these effects are restricted to early visual cortex, with higher level areas showing no difference or, in some cases, the opposite temporal relationship. These results have methodological implications for fMRI studies using MVPA because they demonstrate that information can be decoded from hemodynamic activity more quickly than previously assumed.
Subject(s)
Brain Mapping/methods; Pattern Recognition, Visual/physiology; Visual Cortex/physiology; Adult; Female; Hemodynamics; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Male; Photic Stimulation
ABSTRACT
Intelligent thought is the product of efficient neural information processing, which is embedded in fine-grained, topographically organized population responses and supported by fine-grained patterns of connectivity among cortical fields. Previous work on the neural basis of intelligence, however, has focused on coarse-grained features of brain anatomy and function because cortical topographies are highly idiosyncratic at a finer scale, obscuring individual differences in fine-grained connectivity patterns. We used a computational algorithm, hyperalignment, to resolve these topographic idiosyncrasies and found that predictions of general intelligence based on fine-grained (vertex-by-vertex) connectivity patterns were markedly stronger than predictions based on coarse-grained (region-by-region) patterns. Intelligence was best predicted by fine-grained connectivity in the default and frontoparietal cortical systems, both of which are associated with self-generated thought. Previous work overlooked fine-grained architecture because existing methods could not resolve idiosyncratic topographies, preventing investigation where the keys to the neural basis of intelligence are more likely to be found.
Subject(s)
Algorithms; Cerebral Cortex; Intelligence/physiology; Nerve Net; Adult; Cerebral Cortex/diagnostic imaging; Cerebral Cortex/physiology; Humans; Individuality; Magnetic Resonance Imaging; Nerve Net/diagnostic imaging; Nerve Net/physiology; Signal Processing, Computer-Assisted; Young Adult
ABSTRACT
Cognitive maps are mental representations of spatial and conceptual relationships in an environment, and are critical for flexible behavior. To form these abstract maps, the hippocampus has to learn to separate or merge aliased observations appropriately in different contexts in a manner that enables generalization and efficient planning. Here we propose a specific higher-order graph structure, clone-structured cognitive graph (CSCG), which forms clones of an observation for different contexts as a representation that addresses these problems. CSCGs can be learned efficiently using a probabilistic sequence model that is inherently robust to uncertainty. We show that CSCGs can explain a variety of cognitive map phenomena such as discovering spatial relations from aliased sensations, transitive inference between disjoint episodes, and formation of transferable schemas. Learning different clones for different contexts explains the emergence of splitter cells observed in maze navigation and event-specific responses in lap-running experiments. Moreover, learning and inference dynamics of CSCGs offer a coherent explanation for disparate place cell remapping phenomena. By lifting aliased observations into a hidden space, CSCGs reveal latent modularity useful for hierarchical abstraction and planning. Altogether, CSCG provides a simple unifying framework for understanding hippocampal function, and could be a pathway for forming relational abstractions in artificial intelligence.
Subject(s)
Cognition/physiology; Hippocampus/physiology; Learning/physiology; Models, Neurological; Neural Networks, Computer; Humans; Markov Chains
ABSTRACT
Despite the recent progress in AI powered by deep learning in solving narrow tasks, we are not close to human intelligence in its flexibility, versatility, and efficiency. Efficient learning and effective generalization come from inductive biases, and building Artificial General Intelligence (AGI) is an exercise in finding the right set of inductive biases that make fast learning possible while being general enough to be widely applicable in tasks that humans excel at. To make progress in AGI, we argue that we can look at the human brain for such inductive biases and principles of generalization. To that effect, we propose a strategy to gain insights from the brain by simultaneously looking at the world it acts upon and the computational framework to support efficient learning and generalization. We present a neuroscience-inspired generative model of vision as a case study for such an approach and discuss some open problems about the path to AGI.
ABSTRACT
Information that is shared across brains is encoded in idiosyncratic fine-scale functional topographies. Hyperalignment captures shared information by projecting pattern vectors for neural responses and connectivities into a common, high-dimensional information space, rather than by aligning topographies in a canonical anatomical space. Individual transformation matrices project information from individual anatomical spaces into the common model information space, preserving the geometry of pairwise dissimilarities between pattern vectors, and model cortical topography as mixtures of overlapping, individual-specific topographic basis functions, rather than as contiguous functional areas. The fundamental property of brain function that is preserved across brains is information content, rather than the functional properties of local features that support that content. In this Perspective, we present the conceptual framework that motivates hyperalignment, its computational underpinnings for joint modeling of a common information space and idiosyncratic cortical topographies, and discuss implications for understanding the structure of cortical functional architecture.
Subject(s)
Cerebral Cortex/anatomy & histology; Cerebral Cortex/physiology; Models, Neurological; Nerve Net/anatomy & histology; Nerve Net/physiology; Algorithms; Cerebral Cortex/diagnostic imaging; Connectome; Electroencephalography; Humans; Magnetoencephalography; Nerve Net/diagnostic imaging
ABSTRACT
Humans can infer concepts from image pairs and apply those in the physical world in a completely different setting, enabling tasks like IKEA assembly from diagrams. If robots could represent and infer high-level concepts, then it would notably improve their ability to understand our intent and to transfer tasks between different environments. To that end, we introduce a computational framework that replicates aspects of human concept learning. Concepts are represented as programs on a computer architecture consisting of a visual perception system, working memory, and action controller. The instruction set of this cognitive computer has commands for parsing a visual scene, directing gaze and attention, imagining new objects, manipulating the contents of a visual working memory, and controlling arm movement. Inferring a concept corresponds to inducing a program that can transform the input to the output. Some concepts require the use of imagination and recursion. Previously learned concepts simplify the learning of subsequent, more elaborate concepts and create a hierarchy of abstractions. We demonstrate how a robot can use these abstractions to interpret novel concepts presented to it as schematic images and then apply those concepts in very different situations. By bringing cognitive science ideas on mental imagery, perceptual symbols, embodied cognition, and deictic mechanisms into the realm of machine learning, our work brings us closer to the goal of building robots that have interpretable representations and common sense.