Results 1 - 20 of 62
1.
PLoS Biol ; 22(4): e3002564, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38557761

ABSTRACT

Behavioral and neuroscience studies in humans and primates have shown that memorability is an intrinsic property of an image that predicts its strength of encoding into and retrieval from memory. While previous work has independently probed when or where this memorability effect may occur in the human brain, a description of its spatiotemporal dynamics is missing. Here, we used representational similarity analysis (RSA) to combine functional magnetic resonance imaging (fMRI) with source-estimated magnetoencephalography (MEG) to simultaneously measure when and where the human cortex is sensitive to differences in image memorability. Results reveal that visual perception of high-memorable images, compared to low-memorable images, recruits a set of regions of interest (ROIs) distributed throughout the ventral visual cortex: a late memorability response (from around 300 ms) in early visual cortex (EVC), inferior temporal cortex, lateral occipital cortex, fusiform gyrus, and banks of the superior temporal sulcus. Image memorability magnitude is represented after high-level feature processing in visual regions and reflected in classical memory regions in the medial temporal lobe (MTL). Our results present, to our knowledge, the first unified spatiotemporal account of the visual memorability effect across the human cortex, further supporting the levels-of-processing theory of perception and memory.


Subject(s)
Brain, Visual Perception, Animals, Humans, Visual Perception/physiology, Brain/physiology, Cerebral Cortex/physiology, Temporal Lobe/diagnostic imaging, Temporal Lobe/physiology, Magnetoencephalography/methods, Magnetic Resonance Imaging/methods, Brain Mapping/methods
2.
Proc Natl Acad Sci U S A ; 117(37): 23011-23020, 2020 09 15.
Article in English | MEDLINE | ID: mdl-32839334

ABSTRACT

The fusiform face area responds selectively to faces and is causally involved in face perception. How does face-selectivity in the fusiform arise in development, and why does it develop so systematically in the same location across individuals? Preferential cortical responses to faces develop early in infancy, yet evidence is conflicting on the central question of whether visual experience with faces is necessary. Here, we revisit this question by scanning congenitally blind individuals with fMRI while they haptically explored 3D-printed faces and other stimuli. We found robust face-selective responses in the lateral fusiform gyrus of individual blind participants during haptic exploration of stimuli, indicating that neither visual experience with faces nor fovea-biased inputs is necessary for face-selectivity to arise in the lateral fusiform gyrus. Our results instead suggest a role for long-range connectivity in specifying the location of face-selectivity in the human brain.


Subject(s)
Face/physiology, Facial Recognition/physiology, Temporal Lobe/physiology, Visual Perception/physiology, Adult, Brain Mapping/methods, Female, Humans, Magnetic Resonance Imaging/methods, Male, Pattern Recognition, Visual/physiology, Photic Stimulation/methods, Recognition, Psychology/physiology
3.
Cogn Neuropsychol ; 38(7-8): 468-489, 2021.
Article in English | MEDLINE | ID: mdl-35729704

ABSTRACT

How does the auditory system categorize natural sounds? Here we apply multimodal neuroimaging to illustrate the progression from acoustic to semantically dominated representations. Combining magnetoencephalographic (MEG) and functional magnetic resonance imaging (fMRI) scans of observers listening to naturalistic sounds, we found superior temporal responses beginning ∼55 ms post-stimulus onset, spreading to extratemporal cortices by ∼100 ms. Early regions were distinguished less by onset/peak latency than by functional properties and overall temporal response profiles. Early acoustically-dominated representations trended systematically toward category dominance over time (after ∼200 ms) and space (beyond primary cortex). Semantic category representation was spatially specific: Vocalizations were preferentially distinguished in frontotemporal voice-selective regions and the fusiform; scenes and objects were distinguished in parahippocampal and medial place areas. Our results are consistent with real-world events coded via an extended auditory processing hierarchy, in which acoustic representations rapidly enter multiple streams specialized by category, including areas typically considered visual cortex.


Subject(s)
Brain Mapping, Semantics, Acoustic Stimulation/methods, Auditory Perception/physiology, Brain Mapping/methods, Cochlea, Humans, Magnetic Resonance Imaging/methods, Magnetoencephalography/methods
4.
J Cogn Neurosci ; 30(11): 1559-1576, 2018 11.
Article in English | MEDLINE | ID: mdl-29877767

ABSTRACT

Animacy and real-world size are properties that describe any object and thus bring basic order into our perception of the visual world. Here, we investigated how the human brain processes real-world size and animacy. For this, we applied representational similarity analysis to fMRI and MEG data to yield a view of brain activity with high spatial and temporal resolutions, respectively. Analysis of fMRI data revealed that a distributed and partly overlapping set of cortical regions extending from occipital to ventral and medial temporal cortex represented animacy and real-world size. Within this set, parahippocampal cortex stood out as the region representing animacy and size more strongly than most other regions. Further analysis of the detailed representational format revealed differences among regions involved in processing animacy. Analysis of MEG data revealed overlapping temporal dynamics of animacy and real-world size processing starting at around 150 msec and provided the first neuromagnetic signature of real-world object size processing. Finally, to investigate the neural dynamics of size and animacy processing simultaneously in space and time, we combined MEG and fMRI with a novel extension of MEG-fMRI fusion by representational similarity. This analysis revealed partly overlapping and distributed spatiotemporal dynamics, with parahippocampal cortex singled out as a region that represented size and animacy persistently when other regions did not. Furthermore, the analysis highlighted the role of early visual cortex in representing real-world size. A control analysis revealed that the neural dynamics of processing animacy and size were distinct from the neural dynamics of processing low-level visual features. Together, our results provide a detailed spatiotemporal view of animacy and size processing in the human brain.


Subject(s)
Brain Mapping/methods, Cerebral Cortex/diagnostic imaging, Cerebral Cortex/physiology, Photic Stimulation/methods, Space Perception/physiology, Adult, Female, Humans, Magnetic Resonance Imaging/methods, Magnetoencephalography/methods, Male, Time Factors, Young Adult
5.
Neuroimage ; 153: 346-358, 2017 06.
Article in English | MEDLINE | ID: mdl-27039703

ABSTRACT

Human scene recognition is a rapid multistep process evolving over time from single scene image to spatial layout processing. We used multivariate pattern analyses on magnetoencephalography (MEG) data to unravel the time course of this cortical process. Following an early signal for lower-level visual analysis of single scenes at ~100 ms, we found a marker of real-world scene size, i.e. spatial layout processing, at ~250 ms, indexing neural representations robust to changes in unrelated scene properties and viewing conditions. For a quantitative model of how scene size representations may arise in the brain, we compared MEG data to a deep neural network model trained on scene classification. Representations of scene size emerged intrinsically in the model and accounted for the emerging neural scene-size representations. Together, our data provide a first description of an electrophysiological signal for layout processing in humans, and suggest that deep neural networks are a promising framework to investigate how spatial layout representations emerge in the human brain.


Subject(s)
Brain Mapping/methods, Cerebral Cortex/physiology, Neural Networks, Computer, Pattern Recognition, Visual/physiology, Adult, Female, Humans, Magnetoencephalography, Male, Models, Neurological, Multivariate Analysis, Photic Stimulation, Young Adult
6.
Neuroimage ; 149: 141-152, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28132932

ABSTRACT

A long-standing question in neuroscience is how perceptual processes select stimuli for encoding and later retrieval by memory processes. Using a functional magnetic resonance imaging study with human participants, we report the discovery of a global, stimulus-driven processing stream that we call memorability. Memorability automatically tags the statistical distinctiveness of stimuli for later encoding, and shows separate neural signatures from both low-level perception (memorability shows no signal in early visual cortex) and classical subsequent memory based on individual memory. Memorability and individual subsequent memory show dissociable neural substrates: first, memorability effects consistently emerge in the medial temporal lobe (MTL), whereas individual subsequent memory effects emerge in the prefrontal cortex (PFC). Second, memorability effects remain consistent even in the absence of memory (i.e., for forgotten images). Third, the MTL shows higher correlations with memorability-based patterns, while the PFC shows higher correlations with individual-memory voxel patterns. Taken together, these results support a reformulated framework of the interplay between perception and memory, with the MTL determining stimulus statistics and distinctiveness to support later memory encoding, and the PFC comparing stimuli to specific individual memories. As stimulus memorability is a confound present in many previous memory studies, these findings should stimulate a re-examination of the neural streams dedicated to perception and memory.


Subject(s)
Brain/physiology, Memory/physiology, Visual Perception/physiology, Adult, Brain Mapping/methods, Female, Humans, Image Processing, Computer-Assisted, Magnetic Resonance Imaging, Male, Young Adult
7.
Cereb Cortex ; 26(8): 3563-3579, 2016 08.
Article in English | MEDLINE | ID: mdl-27235099

ABSTRACT

Every human cognitive function, such as visual object recognition, is realized in a complex spatio-temporal activity pattern in the brain. Current brain imaging techniques in isolation cannot resolve the brain's spatio-temporal dynamics, because they provide either high spatial or temporal resolution but not both. To overcome this limitation, we developed an integration approach that uses representational similarities to combine measurements of magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) to yield a spatially and temporally integrated characterization of neuronal activation. Applying this approach to 2 independent MEG-fMRI data sets, we observed that neural activity first emerged in the occipital pole at 50-80 ms, before spreading rapidly and progressively in the anterior direction along the ventral and dorsal visual streams. Further region-of-interest analyses established that dorsal and ventral regions showed MEG-fMRI correspondence in representations later than early visual cortex. Together, these results provide a novel and comprehensive, spatio-temporally resolved view of the rapid neural dynamics during the first few hundred milliseconds of object vision. They further demonstrate the feasibility of spatially unbiased representational similarity-based fusion of MEG and fMRI, promising new insights into how the brain computes complex cognitive functions.
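The fusion logic described here can be sketched compactly: build a representational dissimilarity matrix (RDM) from the fMRI patterns of a region and from the MEG patterns at each time point, then correlate their unique entries. The sketch below is illustrative only; the array shapes, names, and the use of Spearman correlation are assumptions, not the authors' code.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(patterns):
    # Representational dissimilarity matrix: 1 - Pearson correlation
    # between the response patterns of every pair of conditions.
    return 1.0 - np.corrcoef(patterns)

def fusion_timecourse(meg, fmri_roi):
    """Correlate the fMRI ROI RDM with the MEG RDM at every time point.

    meg: (time points, conditions, sensors); fmri_roi: (conditions, voxels).
    Returns one Spearman correlation per time point; peaks mark when the
    region's representational geometry emerges in the MEG signal.
    """
    iu = np.triu_indices(fmri_roi.shape[0], k=1)   # unique RDM entries
    target = rdm(fmri_roi)[iu]
    return np.array([spearmanr(rdm(meg[t])[iu], target)[0]
                     for t in range(meg.shape[0])])
```

Applied across many ROIs, the per-region time courses yield the kind of spatio-temporally resolved map the abstract describes.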


Subject(s)
Cerebral Cortex/physiology, Magnetic Resonance Imaging, Magnetoencephalography, Pattern Recognition, Visual/physiology, Recognition, Psychology/physiology, Adult, Brain Mapping/methods, Cerebral Cortex/diagnostic imaging, Feasibility Studies, Female, Humans, Magnetic Resonance Imaging/methods, Magnetoencephalography/methods, Male, Multimodal Imaging/methods, Neuropsychological Tests, Signal Processing, Computer-Assisted, Visual Pathways/diagnostic imaging, Visual Pathways/physiology
8.
Cereb Cortex ; 25(7): 1792-805, 2015 Jul.
Article in English | MEDLINE | ID: mdl-24436318

ABSTRACT

Estimating the size of a space and its degree of clutter are effortless and ubiquitous tasks of moving agents in a natural environment. Here, we examine how regions along the occipital-temporal lobe respond to pictures of indoor real-world scenes that parametrically vary in their physical "size" (the spatial extent of a space bounded by walls) and functional "clutter" (the organization and quantity of objects that fill up the space). Using a linear regression model on multivoxel pattern activity across regions of interest, we find evidence that both properties of size and clutter are represented in the patterns of parahippocampal cortex, while the retrosplenial cortex activity patterns are predominantly sensitive to the size of a space, rather than the degree of clutter. Parametric whole-brain analyses confirmed these results. Importantly, this size and clutter information was represented in a way that generalized across different semantic categories. These data provide support for a property-based representation of spaces, distributed across multiple scene-selective regions of the cerebral cortex.
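A toy version of this multivoxel regression analysis: fit a linear map from voxel patterns to the parametric property level (e.g., size) on one stimulus set and test whether it generalizes to another. The synthetic data and names below are hypothetical, not the study's pipeline.

```python
import numpy as np

def fit_property_decoder(patterns, levels):
    # Ordinary least-squares mapping from a multivoxel pattern to a
    # scalar property level (e.g. size 1-4), with an intercept column.
    X = np.column_stack([patterns, np.ones(len(patterns))])
    w, *_ = np.linalg.lstsq(X, levels, rcond=None)
    return w

def predict(w, patterns):
    X = np.column_stack([patterns, np.ones(len(patterns))])
    return X @ w
```

Generalization across semantic categories corresponds to training on patterns from one category and predicting property levels for held-out patterns from another.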


Subject(s)
Brain/physiology, Visual Perception/physiology, Adult, Brain Mapping, Female, Humans, Magnetic Resonance Imaging, Male, Multivariate Analysis, Photic Stimulation/methods, Young Adult
9.
Neuroimage ; 122: 408-16, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-26236029

ABSTRACT

While several cortical regions have been highlighted for their category selectivity (e.g., scene-selective regions like the parahippocampal place area, object-selective regions like the lateral occipital complex), a growing trend in cognitive neuroscience has been to investigate what particular perceptual properties these regions calculate. Classical scene-selective regions have been particularly targeted in recent work as being sensitive to object size or other related properties. Here we test to what extent these regions are sensitive to spatial information of stimuli at any size. We introduce the spatial object property of "interaction envelope," defined as the space through which a user traverses to interact with an object. In two functional magnetic resonance imaging experiments, we examined activity in a comprehensive set of perceptual regions of interest while human participants viewed object images varying along the dimensions of interaction envelope and physical size. Importantly, we controlled for confounding perceptual and semantic object properties. We find that scene-selective regions are in fact sensitive to object interaction envelope for small, manipulable objects regardless of real-world size and task. Meanwhile, small-scale entity regions maintain selectivity to stimulus physical size. These results indicate that regions traditionally associated with scene processing may not be solely sensitive to larger object and scene information, but instead are calculating local spatial information of objects and scenes of all sizes.


Subject(s)
Brain/physiology, Form Perception/physiology, Pattern Recognition, Visual/physiology, Adolescent, Adult, Brain Mapping, Female, Humans, Magnetic Resonance Imaging, Male, Photic Stimulation, Young Adult
10.
Nat Commun ; 15(1): 6241, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39048577

ABSTRACT

Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos' extensive metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into the parietal cortex. Furthermore, we reveal a match in hierarchical processing between cortical regions of interest and video-computable deep neural networks, and we showcase that BMD successfully captures temporal dynamics of visual events at second resolution. With its rich metadata, BMD offers new perspectives and accelerates research on the human brain basis of visual event perception.


Subject(s)
Brain Mapping, Brain, Magnetic Resonance Imaging, Metadata, Visual Perception, Humans, Magnetic Resonance Imaging/methods, Visual Perception/physiology, Male, Female, Brain Mapping/methods, Adult, Brain/physiology, Brain/diagnostic imaging, Parietal Lobe/physiology, Parietal Lobe/diagnostic imaging, Young Adult, Photic Stimulation, Video Recording
11.
Psychol Sci ; 24(6): 981-90, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23630219

ABSTRACT

Visual long-term memory can store thousands of objects with surprising visual detail, but just how detailed are these representations, and how can one quantify this fidelity? Using the property of color as a case study, we estimated the precision of visual information in long-term memory, and compared this with the precision of the same information in working memory. Observers were shown real-world objects in random colors and were asked to recall the colors after a delay. We quantified two parameters of performance: the variability of internal representations of color (fidelity) and the probability of forgetting an object's color altogether. Surprisingly, the fidelity of color information in long-term memory was comparable to the asymptotic precision of working memory. These results suggest that long-term memory and working memory may be constrained by a common limit, such as a bound on the fidelity required to retrieve a memory representation.
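The two performance parameters described above (fidelity and the probability of forgetting) are commonly estimated with a mixture model over circular color space: a uniform "forgotten" component plus a von Mises "remembered" component. Below is a grid-search maximum-likelihood sketch; it is illustrative, with assumed parameter grids, not the authors' fitting code.

```python
import numpy as np
from scipy.stats import vonmises

def fit_mixture(errors, kappas=np.linspace(1.0, 20.0, 39),
                gs=np.linspace(0.0, 0.9, 31)):
    """Grid-search MLE for a two-parameter mixture model of recall.

    With probability g the color is forgotten (uniform guess over the
    circle); otherwise it is recalled with von Mises noise whose
    concentration kappa indexes fidelity. errors: recalled minus true
    color, in radians on (-pi, pi]. Returns the best (g, kappa).
    """
    best, best_ll = None, -np.inf
    for g in gs:
        for k in kappas:
            ll = np.log(g / (2 * np.pi)
                        + (1 - g) * vonmises.pdf(errors, k)).sum()
            if ll > best_ll:
                best, best_ll = (g, k), ll
    return best
```

A high fitted kappa with a similar value for working- and long-term-memory data is the signature of comparable fidelity across the two stores.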


Subject(s)
Memory, Long-Term/physiology, Memory, Short-Term/physiology, Mental Recall/physiology, Visual Perception/physiology, Adolescent, Adult, Color Perception/physiology, Humans, Young Adult
12.
J Neurosci ; 31(4): 1333-40, 2011 Jan 26.
Article in English | MEDLINE | ID: mdl-21273418

ABSTRACT

Behavioral and computational studies suggest that visual scene analysis rapidly produces a rich description of both the objects and the spatial layout of surfaces in a scene. However, there is still a large gap in our understanding of how the human brain accomplishes these diverse functions of scene understanding. Here we probe the nature of real-world scene representations using multivoxel functional magnetic resonance imaging pattern analysis. We show that natural scenes are analyzed in a distributed and complementary manner by the parahippocampal place area (PPA) and the lateral occipital complex (LOC) in particular, as well as other regions in the ventral stream. Specifically, we study the classification performance of different scene-selective regions using images that vary in spatial boundary and naturalness content. We discover that, whereas both the PPA and LOC can accurately classify scenes, they make different errors: the PPA more often confuses scenes that have the same spatial boundaries, whereas the LOC more often confuses scenes that have the same content. By demonstrating that visual scene analysis recruits distinct and complementary high-level representations, our results testify to distinct neural pathways for representing the spatial boundaries and content of a visual scene.


Subject(s)
Occipital Lobe/physiology, Parahippocampal Gyrus/physiology, Visual Perception, Adult, Brain Mapping, Female, Humans, Magnetic Resonance Imaging, Male, Photic Stimulation, Young Adult
13.
Proc Natl Acad Sci U S A ; 106(18): 7345-50, 2009 May 05.
Article in English | MEDLINE | ID: mdl-19380739

ABSTRACT

There is a great deal of structural regularity in the natural environment, and such regularities confer an opportunity to form compressed, efficient representations. Although this concept has been extensively studied within the domain of low-level sensory coding, there has been limited focus on efficient coding in the field of visual attention. Here we show that spatial patterns of orientation information ("spatial ensemble statistics") can be efficiently encoded under conditions of reduced attention. In our task, observers monitored for changes to the spatial pattern of background elements while they were attentively tracking moving objects in the foreground. By using stimuli that enable us to dissociate changes in local structure from changes in the ensemble structure, we found that observers were more sensitive to changes to the background that altered the ensemble structure than to changes that did not alter the ensemble structure. We propose that reducing attention to the background increases the amount of noise in local feature representations, but that spatial ensemble statistics capitalize on structural regularities to overcome this noise by pooling across local measurements, gaining precision in the representation of the ensemble.
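The pooling account in the final sentence can be demonstrated numerically: averaging n independent noisy local measurements shrinks the noise standard deviation by a factor of sqrt(n). A minimal simulation (the orientation values and noise level are hypothetical):

```python
import numpy as np

def ensemble_noise(local_sd, n_elements, n_trials=10000, seed=3):
    """Noise in a pooled ensemble statistic vs. a single local estimate.

    Each trial draws n_elements independent local measurements with
    standard deviation local_sd; pooling (averaging) them reduces the
    noise by sqrt(n_elements) -- the gain in precision that spatial
    ensemble statistics exploit under reduced attention.
    """
    rng = np.random.default_rng(seed)
    local = rng.normal(0.0, local_sd, size=(n_trials, n_elements))
    return local[:, 0].std(), local.mean(axis=1).std()

# 16 local orientation estimates with 10-degree noise: the pooled
# ensemble estimate has roughly 10 / sqrt(16) = 2.5 degrees of noise.
single, pooled = ensemble_noise(10.0, 16)
```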


Subject(s)
Attention/physiology, Space Perception/physiology, Vision, Ocular/physiology, Adolescent, Adult, Humans, Young Adult
14.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9434-9445, 2022 12.
Article in English | MEDLINE | ID: mdl-34752386

ABSTRACT

Videos capture events that typically contain multiple sequential, and simultaneous, actions even in the span of only a few seconds. However, most large-scale datasets built to train models for action recognition in video only provide a single label per video. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled, and do not learn the full spectrum of information present in each video during training. To address this, we present the Multi-Moments in Time dataset (M-MiT), which includes over two million action labels for over one million three-second videos. This multi-label dataset introduces novel challenges on how to train and analyze models for multi-action detection. Here, we present baseline results for multi-action recognition using loss functions adapted for long-tail multi-label learning, provide improved methods for visualizing and interpreting models trained for multi-label action detection, and show the strength of transferring models trained on M-MiT to smaller datasets.
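Multi-label action detection replaces softmax cross-entropy with an independent binary decision per class, and long-tail adaptations commonly up-weight rare positive labels. A minimal NumPy sketch of such a loss (a generic weighted binary cross-entropy, offered as an assumption about the family of losses meant, not the paper's exact formulation):

```python
import numpy as np

def multi_label_bce(logits, targets, pos_weight=1.0):
    """Per-class binary cross-entropy for multi-label action detection.

    Unlike softmax cross-entropy, each of the C classes is an
    independent yes/no decision, so a video can carry several labels
    at once. pos_weight > 1 up-weights positive examples, a common
    counter to long-tail class imbalance.
    logits, targets: arrays of shape (batch, C); targets in {0, 1}.
    """
    p = 1.0 / (1.0 + np.exp(-logits))           # per-class sigmoid
    eps = 1e-12                                 # numerical safety
    loss = -(pos_weight * targets * np.log(p + eps)
             + (1 - targets) * np.log(1 - p + eps))
    return loss.mean()
```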


Subject(s)
Algorithms, Learning
15.
Proc Natl Acad Sci U S A ; 105(38): 14325-9, 2008 Sep 23.
Article in English | MEDLINE | ID: mdl-18787113

ABSTRACT

One of the major lessons of memory research has been that human memory is fallible, imprecise, and subject to interference. Thus, although observers can remember thousands of images, it is widely assumed that these memories lack detail. Contrary to this assumption, here we show that long-term memory is capable of storing a massive number of objects with details from the image. Participants viewed pictures of 2,500 objects over the course of 5.5 h. Afterward, they were shown pairs of images and indicated which of the two they had seen. The previously viewed item could be paired with either an object from a novel category, an object of the same basic-level category, or the same object in a different state or pose. Performance in each of these conditions was remarkably high (92%, 88%, and 87%, respectively), suggesting that participants successfully maintained detailed representations of thousands of images. These results have implications for cognitive models, in which capacity limitations impose a primary computational constraint (e.g., models of object recognition), and pose a challenge to neural models of memory storage and retrieval, which must be able to account for such a large and detailed storage capacity.


Subject(s)
Pattern Recognition, Visual/physiology, Recognition, Psychology/physiology, Adult, Humans, Photic Stimulation/methods, Time Factors
16.
Psychol Sci ; 21(11): 1551-6, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20921574

ABSTRACT

Observers can store thousands of object images in visual long-term memory with high fidelity, but the fidelity of scene representations in long-term memory is not known. Here, we probed scene-representation fidelity by varying the number of studied exemplars in different scene categories and testing memory using exemplar-level foils. Observers viewed thousands of scenes over 5.5 hr and then completed a series of forced-choice tests. Memory performance was high, even with up to 64 scenes from the same category in memory. Moreover, there was only a 2% decrease in accuracy for each doubling of the number of studied scene exemplars. Surprisingly, this degree of categorical interference was similar to the degree previously demonstrated for object memory. Thus, although scenes have often been defined as a superset of objects, our results suggest that scenes and objects may be entities at a similar level of abstraction in visual long-term memory.
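The reported interference effect implies a simple logarithmic accuracy model: each doubling of studied exemplars per category costs about two percentage points. A sketch (the base accuracy used here is illustrative, not a figure from the paper):

```python
import math

def predicted_accuracy(base_acc, n_exemplars, drop_per_doubling=0.02):
    # Forced-choice accuracy as a function of the number of studied
    # exemplars per scene category, assuming the reported ~2-point
    # drop for each doubling. base_acc is the accuracy with a single
    # exemplar in memory (a hypothetical anchor value).
    return base_acc - drop_per_doubling * math.log2(n_exemplars)

# e.g. from 96% with one exemplar down to 84% with 64 in memory,
# since 64 exemplars is six doublings (6 * 2 = 12 points).
acc_64 = predicted_accuracy(0.96, 64)
```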


Subject(s)
Attention, Discrimination, Psychological, Pattern Recognition, Visual, Retention, Psychology, Adult, Choice Behavior, Female, Humans, Male, Practice, Psychological, Recognition, Psychology, Young Adult
17.
J Vis ; 10(1): 2.1-25, 2010 Jan 08.
Article in English | MEDLINE | ID: mdl-20143895

ABSTRACT

The relationship between image features and scene structure is central to the study of human visual perception and computer vision, but many of the specifics of real-world layout perception remain unknown. We do not know which image features are relevant to perceiving layout properties, or whether those features provide the same information for every type of image. Furthermore, we do not know the spatial resolutions required for perceiving different properties. This paper describes an experiment and a computational model that provide new insights into these issues. Humans perceive global spatial layout properties, such as dominant depth, openness, and perspective, from a single image. This work describes an algorithm that reliably predicts human layout judgments. The model's predictions are general, not specific to the observers it was trained on. Analysis reveals that the optimal spatial resolutions for determining layout vary with the content of the space and the property being estimated. Openness is best estimated at high resolution, depth is best estimated at medium resolution, and perspective is best estimated at low resolution. Given the reliability and simplicity of estimating the global layout of real-world environments, this model could help resolve perceptual ambiguities encountered by more detailed scene reconstruction schemas.


Subject(s)
Artificial Intelligence, Models, Neurological, Visual Perception/physiology, Adolescent, Adult, Algorithms, Cues, Depth Perception/physiology, Form Perception/physiology, Humans, Photic Stimulation/methods, Space Perception/physiology, Young Adult
18.
Neuron ; 107(5): 772-781, 2020 09 09.
Article in English | MEDLINE | ID: mdl-32721379

ABSTRACT

Any cognitive function is mediated by a network of many cortical sites whose activity is orchestrated through complex temporal dynamics. To understand cognition, we need to identify brain responses simultaneously in space and time. Here we present a technique that does this by linking multivariate response patterns of the human brain recorded with functional magnetic resonance imaging (fMRI) and with magneto- or electroencephalography (M/EEG) based on representational similarity. We present the rationale and current applications of this non-invasive analysis technique, termed M/EEG-fMRI fusion, and discuss its pros and cons. We highlight its wide applicability in cognitive neuroscience and how its openness to further development and extension gives it strong potential for a deeper understanding of cognition in the future.


Subject(s)
Brain/physiology, Multimodal Imaging/methods, Neuroimaging/methods, Brain Mapping/methods, Electroencephalography/methods, Humans, Image Processing, Computer-Assisted/methods, Magnetic Resonance Imaging/methods, Magnetoencephalography/methods
19.
Sci Rep ; 10(1): 4638, 2020 03 13.
Article in English | MEDLINE | ID: mdl-32170209

ABSTRACT

Research at the intersection of computer vision and neuroscience has revealed a hierarchical correspondence between the layers of deep convolutional neural networks (DCNNs) and the cascade of regions along human ventral visual cortex. Recently, studies have uncovered the emergence of human-interpretable concepts within DCNN layers trained to identify visual objects and scenes. Here, we asked whether an artificial neural network (with convolutional structure) trained for visual categorization would demonstrate spatial correspondences with human brain regions showing central/peripheral biases. Using representational similarity analysis, we compared activations of convolutional layers of a DCNN trained for object and scene categorization with neural representations in human brain visual regions. Results reveal a brain-like topographical organization in the layers of the DCNN, such that activations of layer-units with central bias were associated with brain regions with foveal tendencies (e.g. fusiform gyrus), and activations of layer-units with selectivity for image backgrounds were associated with cortical regions showing peripheral preference (e.g. parahippocampal cortex). The emergence of a categorical topographical correspondence between DCNNs and brain regions suggests these models are a good approximation of the perceptual representation generated by biological neural networks.
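The layer-to-region comparison at the heart of this analysis can be sketched as follows: each brain region is assigned the DCNN layer whose representational geometry (RDM) correlates best with its own. The shapes and names below are assumptions for illustration, not the study's code.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(patterns):
    # 1 - Pearson correlation between condition response patterns
    return 1.0 - np.corrcoef(patterns)

def best_layer(layer_acts, region_patterns):
    """Index of the DCNN layer whose RDM best matches a brain region.

    Match is the Spearman correlation between the upper triangles of
    the layer RDM and the region RDM, computed over the same stimuli.
    layer_acts: list of (conditions, units) activation matrices.
    region_patterns: (conditions, voxels) response matrix.
    """
    iu = np.triu_indices(region_patterns.shape[0], k=1)
    target = rdm(region_patterns)[iu]
    scores = [spearmanr(rdm(a)[iu], target)[0] for a in layer_acts]
    return int(np.argmax(scores))
```

Mapping every region to its best layer yields the topographical layer-to-region correspondence the abstract reports.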


Subject(s)
Pattern Recognition, Visual/physiology, Visual Cortex/physiology, Adult, Female, Humans, Magnetic Resonance Imaging, Male, Models, Neurological, Neural Networks, Computer, Photic Stimulation, Visual Cortex/diagnostic imaging, Young Adult
20.
IEEE Trans Pattern Anal Mach Intell ; 42(2): 502-508, 2020 02.
Article in English | MEDLINE | ID: mdl-30802849

ABSTRACT

We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in 3 second videos poses many challenges: meaningful events do not include only people, but also objects, animals, and natural phenomena; visual and auditory events can be symmetrical in time ("opening" is "closing" in reverse), and either transient or sustained. We describe the annotation process of our dataset (each video is tagged with one action or activity label among 339 different classes), analyze its scale and diversity in comparison to other large-scale video datasets for action recognition, and report results of several baseline models addressing separately, and jointly, three modalities: spatial, temporal and auditory. The Moments in Time dataset, designed to have a large coverage and diversity of events in both visual and auditory modalities, can serve as a new challenge to develop models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.


Subject(s)
Databases, Factual, Video Recording, Animals, Human Activities/classification, Humans, Image Processing, Computer-Assisted, Pattern Recognition, Automated