Results 1 - 20 of 62
1.
PLoS Biol ; 22(4): e3002564, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38557761

ABSTRACT

Behavioral and neuroscience studies in humans and primates have shown that memorability is an intrinsic property of an image that predicts its strength of encoding into and retrieval from memory. While previous work has independently probed when or where this memorability effect may occur in the human brain, a description of its spatiotemporal dynamics is missing. Here, we used representational similarity analysis (RSA) to combine functional magnetic resonance imaging (fMRI) with source-estimated magnetoencephalography (MEG) to simultaneously measure when and where the human cortex is sensitive to differences in image memorability. Results reveal that visual perception of High Memorable images, compared to Low Memorable images, recruits a set of regions of interest (ROIs) distributed throughout the ventral visual cortex: a late memorability response (from around 300 ms) in early visual cortex (EVC), inferior temporal cortex, lateral occipital cortex, fusiform gyrus, and banks of the superior temporal sulcus. Image memorability magnitude results are represented after high-level feature processing in visual regions and reflected in classical memory regions in the medial temporal lobe (MTL). Our results present, to our knowledge, the first unified spatiotemporal account of visual memorability effect across the human cortex, further supporting the levels-of-processing theory of perception and memory.


Subjects
Brain; Visual Perception; Animals; Humans; Visual Perception/physiology; Brain/physiology; Cerebral Cortex/physiology; Temporal Lobe/diagnostic imaging; Temporal Lobe/physiology; Magnetoencephalography/methods; Magnetic Resonance Imaging/methods; Brain Mapping/methods
2.
Proc Natl Acad Sci U S A ; 117(37): 23011-23020, 2020 09 15.
Article in English | MEDLINE | ID: mdl-32839334

ABSTRACT

The fusiform face area responds selectively to faces and is causally involved in face perception. How does face-selectivity in the fusiform arise in development, and why does it develop so systematically in the same location across individuals? Preferential cortical responses to faces develop early in infancy, yet evidence is conflicting on the central question of whether visual experience with faces is necessary. Here, we revisit this question by scanning congenitally blind individuals with fMRI while they haptically explored 3D-printed faces and other stimuli. We found robust face-selective responses in the lateral fusiform gyrus of individual blind participants during haptic exploration of stimuli, indicating that neither visual experience with faces nor fovea-biased inputs is necessary for face-selectivity to arise in the lateral fusiform gyrus. Our results instead suggest a role for long-range connectivity in specifying the location of face-selectivity in the human brain.


Subjects
Face/physiology; Facial Recognition/physiology; Temporal Lobe/physiology; Visual Perception/physiology; Adult; Brain Mapping/methods; Female; Humans; Magnetic Resonance Imaging/methods; Male; Pattern Recognition, Visual/physiology; Photic Stimulation/methods; Recognition, Psychology/physiology
3.
Cogn Neuropsychol ; 38(7-8): 468-489, 2021.
Article in English | MEDLINE | ID: mdl-35729704

ABSTRACT

How does the auditory system categorize natural sounds? Here we apply multimodal neuroimaging to illustrate the progression from acoustic to semantically dominated representations. Combining magnetoencephalographic (MEG) and functional magnetic resonance imaging (fMRI) scans of observers listening to naturalistic sounds, we found superior temporal responses beginning ∼55 ms post-stimulus onset, spreading to extratemporal cortices by ∼100 ms. Early regions were distinguished less by onset/peak latency than by functional properties and overall temporal response profiles. Early acoustically-dominated representations trended systematically toward category dominance over time (after ∼200 ms) and space (beyond primary cortex). Semantic category representation was spatially specific: Vocalizations were preferentially distinguished in frontotemporal voice-selective regions and the fusiform; scenes and objects were distinguished in parahippocampal and medial place areas. Our results are consistent with real-world events coded via an extended auditory processing hierarchy, in which acoustic representations rapidly enter multiple streams specialized by category, including areas typically considered visual cortex.


Subjects
Brain Mapping; Semantics; Acoustic Stimulation/methods; Auditory Perception/physiology; Brain Mapping/methods; Cochlea; Humans; Magnetic Resonance Imaging/methods; Magnetoencephalography/methods
4.
J Cogn Neurosci ; 30(11): 1559-1576, 2018 11.
Article in English | MEDLINE | ID: mdl-29877767

ABSTRACT

Animacy and real-world size are properties that describe any object and thus bring basic order into our perception of the visual world. Here, we investigated how the human brain processes real-world size and animacy. For this, we applied representational similarity to fMRI and MEG data to yield a view of brain activity with high spatial and temporal resolutions, respectively. Analysis of fMRI data revealed that a distributed and partly overlapping set of cortical regions extending from occipital to ventral and medial temporal cortex represented animacy and real-world size. Within this set, parahippocampal cortex stood out as the region representing animacy and size stronger than most other regions. Further analysis of the detailed representational format revealed differences among regions involved in processing animacy. Analysis of MEG data revealed overlapping temporal dynamics of animacy and real-world size processing starting at around 150 msec and provided the first neuromagnetic signature of real-world object size processing. Finally, to investigate the neural dynamics of size and animacy processing simultaneously in space and time, we combined MEG and fMRI with a novel extension of MEG-fMRI fusion by representational similarity. This analysis revealed partly overlapping and distributed spatiotemporal dynamics, with parahippocampal cortex singled out as a region that represented size and animacy persistently when other regions did not. Furthermore, the analysis highlighted the role of early visual cortex in representing real-world size. A control analysis revealed that the neural dynamics of processing animacy and size were distinct from the neural dynamics of processing low-level visual features. Together, our results provide a detailed spatiotemporal view of animacy and size processing in the human brain.


Subjects
Brain Mapping/methods; Cerebral Cortex/diagnostic imaging; Cerebral Cortex/physiology; Photic Stimulation/methods; Space Perception/physiology; Adult; Female; Humans; Magnetic Resonance Imaging/methods; Magnetoencephalography/methods; Male; Time Factors; Young Adult
5.
Neuroimage ; 153: 346-358, 2017 06.
Article in English | MEDLINE | ID: mdl-27039703

ABSTRACT

Human scene recognition is a rapid multistep process evolving over time from single scene image to spatial layout processing. We used multivariate pattern analyses on magnetoencephalography (MEG) data to unravel the time course of this cortical process. Following an early signal for lower-level visual analysis of single scenes at ~100ms, we found a marker of real-world scene size, i.e. spatial layout processing, at ~250ms indexing neural representations robust to changes in unrelated scene properties and viewing conditions. For a quantitative model of how scene size representations may arise in the brain, we compared MEG data to a deep neural network model trained on scene classification. Representations of scene size emerged intrinsically in the model, and resolved emerging neural scene size representation. Together our data provide a first description of an electrophysiological signal for layout processing in humans, and suggest that deep neural networks are a promising framework to investigate how spatial layout representations emerge in the human brain.


Subjects
Brain Mapping/methods; Cerebral Cortex/physiology; Neural Networks, Computer; Pattern Recognition, Visual/physiology; Adult; Female; Humans; Magnetoencephalography; Male; Models, Neurological; Multivariate Analysis; Photic Stimulation; Young Adult
6.
Neuroimage ; 149: 141-152, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28132932

ABSTRACT

A long-standing question in neuroscience is how perceptual processes select stimuli for encoding and later retrieval by memory processes. Using a functional magnetic resonance imaging study with human participants, we report the discovery of a global, stimulus-driven processing stream that we call memorability. Memorability automatically tags the statistical distinctiveness of stimuli for later encoding, and shows separate neural signatures from both low-level perception (memorability shows no signal in early visual cortex) and classical subsequent memory based on individual memory. Memorability and individual subsequent memory show dissociable neural substrates: first, memorability effects consistently emerge in the medial temporal lobe (MTL), whereas individual subsequent memory effects emerge in the prefrontal cortex (PFC). Second, memorability effects remain consistent even in the absence of memory (i.e., for forgotten images). Third, the MTL shows higher correlations with memorability-based patterns, while the PFC shows higher correlations with individual memory voxel patterns. Taken together, these results support a reformulated framework of the interplay between perception and memory, with the MTL determining stimulus statistics and distinctiveness to support later memory encoding, and the PFC comparing stimuli to specific individual memories. As stimulus memorability is a confound present in many previous memory studies, these findings should stimulate a revisitation of the neural streams dedicated to perception and memory.


Subjects
Brain/physiology; Memory/physiology; Visual Perception/physiology; Adult; Brain Mapping/methods; Female; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Male; Young Adult
7.
Cereb Cortex ; 26(8): 3563-3579, 2016 08.
Article in English | MEDLINE | ID: mdl-27235099

ABSTRACT

Every human cognitive function, such as visual object recognition, is realized in a complex spatio-temporal activity pattern in the brain. Current brain imaging techniques in isolation cannot resolve the brain's spatio-temporal dynamics, because they provide either high spatial or temporal resolution but not both. To overcome this limitation, we developed an integration approach that uses representational similarities to combine measurements of magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) to yield a spatially and temporally integrated characterization of neuronal activation. Applying this approach to 2 independent MEG-fMRI data sets, we observed that neural activity first emerged in the occipital pole at 50-80 ms, before spreading rapidly and progressively in the anterior direction along the ventral and dorsal visual streams. Further region-of-interest analyses established that dorsal and ventral regions showed MEG-fMRI correspondence in representations later than early visual cortex. Together, these results provide a novel and comprehensive, spatio-temporally resolved view of the rapid neural dynamics during the first few hundred milliseconds of object vision. They further demonstrate the feasibility of spatially unbiased representational similarity-based fusion of MEG and fMRI, promising new insights into how the brain computes complex cognitive functions.


Subjects
Cerebral Cortex/physiology; Magnetic Resonance Imaging; Magnetoencephalography; Pattern Recognition, Visual/physiology; Recognition, Psychology/physiology; Adult; Brain Mapping/methods; Cerebral Cortex/diagnostic imaging; Feasibility Studies; Female; Humans; Magnetic Resonance Imaging/methods; Magnetoencephalography/methods; Male; Multimodal Imaging/methods; Neuropsychological Tests; Signal Processing, Computer-Assisted; Visual Pathways/diagnostic imaging; Visual Pathways/physiology
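
The representational-similarity fusion described in this abstract can be illustrated with a minimal sketch. It assumes, as a simplification, that dissimilarity is 1 minus the Pearson correlation between condition patterns and that MEG and fMRI dissimilarity matrices are compared with Spearman correlation; the abstract does not state these choices. The function names (`rdm`, `fusion_timecourse`) and all data values are hypothetical toy examples, not the authors' code or data.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def rdm(patterns):
    """Upper triangle of the dissimilarity matrix (1 - Pearson r) over conditions."""
    return [1.0 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def ranks(values):
    """Average ranks (tied values share the mean rank)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def fusion_timecourse(meg_by_time, fmri_roi):
    """Correlate the MEG dissimilarity structure at each time point with one ROI's."""
    roi_rdm = rdm(fmri_roi)
    return [spearman(rdm(meg_t), roi_rdm) for meg_t in meg_by_time]


# Hypothetical toy data: 4 conditions x 3 voxels for one fMRI region of interest.
fmri_roi = [[1, 2, 3], [1, 2, 4], [3, 1, 0], [4, 0, 1]]

# Hypothetical MEG data: 2 time points, each 4 conditions x sensors.
meg_by_time = [
    # Time point 0: condition geometry unrelated to the ROI.
    [[1, 0, 0, 2], [0, 3, 1, 0], [2, 1, 0, 0], [0, 1, 2, 3]],
    # Time point 1: same geometry as the ROI (rescaled and shifted patterns).
    [[2 * v + 5 for v in row] for row in fmri_roi],
]

timecourse = fusion_timecourse(meg_by_time, fmri_roi)
print(timecourse)  # similarity is higher at time point 1 than at time point 0
```

Because the comparison happens between dissimilarity structures rather than raw signals, the MEG sensors and fMRI voxels never need to be mapped onto each other directly.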
8.
Cereb Cortex ; 25(7): 1792-805, 2015 Jul.
Article in English | MEDLINE | ID: mdl-24436318

ABSTRACT

Estimating the size of a space and its degree of clutter are effortless and ubiquitous tasks of moving agents in a natural environment. Here, we examine how regions along the occipital-temporal lobe respond to pictures of indoor real-world scenes that parametrically vary in their physical "size" (the spatial extent of a space bounded by walls) and functional "clutter" (the organization and quantity of objects that fill up the space). Using a linear regression model on multivoxel pattern activity across regions of interest, we find evidence that both properties of size and clutter are represented in the patterns of parahippocampal cortex, while the retrosplenial cortex activity patterns are predominantly sensitive to the size of a space, rather than the degree of clutter. Parametric whole-brain analyses confirmed these results. Importantly, this size and clutter information was represented in a way that generalized across different semantic categories. These data provide support for a property-based representation of spaces, distributed across multiple scene-selective regions of the cerebral cortex.


Subjects
Brain/physiology; Visual Perception/physiology; Adult; Brain Mapping; Female; Humans; Magnetic Resonance Imaging; Male; Multivariate Analysis; Photic Stimulation/methods; Young Adult
9.
Neuroimage ; 122: 408-16, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-26236029

ABSTRACT

While several cortical regions have been highlighted for their category selectivity (e.g., scene-selective regions like the parahippocampal place area, object-selective regions like the lateral occipital complex), a growing trend in cognitive neuroscience has been to investigate what particular perceptual properties these regions calculate. Classical scene-selective regions have been particularly targeted in recent work as being sensitive to object size or other related properties. Here we test to what extent these regions are sensitive to spatial information of stimuli at any size. We introduce the spatial object property of "interaction envelope," defined as the space through which a user traverses to interact with an object. In two functional magnetic resonance imaging experiments, we examined activity in a comprehensive set of perceptual regions of interest for when human participants viewed object images varying along the dimensions of interaction envelope and physical size. Importantly, we controlled for confounding perceptual and semantic object properties. We find that scene-selective regions are in fact sensitive to object interaction envelope for small, manipulable objects regardless of real-world size and task. Meanwhile, small-scale entity regions maintain selectivity to stimulus physical size. These results indicate that regions traditionally associated with scene processing may not be solely sensitive to larger object and scene information, but instead are calculating local spatial information of objects and scenes of all sizes.


Subjects
Brain/physiology; Form Perception/physiology; Pattern Recognition, Visual/physiology; Adolescent; Adult; Brain Mapping; Female; Humans; Magnetic Resonance Imaging; Male; Photic Stimulation; Young Adult
10.
Nat Commun ; 15(1): 6241, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39048577

ABSTRACT

Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos' extensive metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into the parietal cortex. Furthermore, we reveal a match in hierarchical processing between cortical regions of interest and video-computable deep neural networks, and we showcase that BMD successfully captures temporal dynamics of visual events at second resolution. With its rich metadata, BMD offers new perspectives and accelerates research on the human brain basis of visual event perception.


Subjects
Brain Mapping; Brain; Magnetic Resonance Imaging; Metadata; Visual Perception; Humans; Magnetic Resonance Imaging/methods; Visual Perception/physiology; Male; Female; Brain Mapping/methods; Adult; Brain/physiology; Brain/diagnostic imaging; Parietal Lobe/physiology; Parietal Lobe/diagnostic imaging; Young Adult; Photic Stimulation; Video Recording
11.
Psychol Sci ; 24(6): 981-90, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23630219

ABSTRACT

Visual long-term memory can store thousands of objects with surprising visual detail, but just how detailed are these representations, and how can one quantify this fidelity? Using the property of color as a case study, we estimated the precision of visual information in long-term memory, and compared this with the precision of the same information in working memory. Observers were shown real-world objects in random colors and were asked to recall the colors after a delay. We quantified two parameters of performance: the variability of internal representations of color (fidelity) and the probability of forgetting an object's color altogether. Surprisingly, the fidelity of color information in long-term memory was comparable to the asymptotic precision of working memory. These results suggest that long-term memory and working memory may be constrained by a common limit, such as a bound on the fidelity required to retrieve a memory representation.


Subjects
Memory, Long-Term/physiology; Memory, Short-Term/physiology; Mental Recall/physiology; Visual Perception/physiology; Adolescent; Adult; Color Perception/physiology; Humans; Young Adult
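
The two performance parameters named in this abstract (variability of the remembered color and probability of forgetting it altogether) can be formalized as a mixture of a guessing component and a noisy-memory component. The sketch below is one possible formalization under stated assumptions: the abstract does not give the model, the Gaussian error distribution is a simplification that ignores the circularity of color space, and the names `mixture_pdf`, `guess_rate`, and `sd_deg` as well as the error values are hypothetical.

```python
import math


def mixture_pdf(error_deg, guess_rate, sd_deg):
    """Density of a color-report error (degrees on a 360-degree color wheel).

    With probability guess_rate the color is forgotten and the report is a
    uniform guess; otherwise the error is Gaussian with spread sd_deg
    (smaller sd_deg means higher fidelity).
    """
    uniform = 1.0 / 360.0
    normal = math.exp(-0.5 * (error_deg / sd_deg) ** 2) / (sd_deg * math.sqrt(2 * math.pi))
    return guess_rate * uniform + (1.0 - guess_rate) * normal


def log_likelihood(errors, guess_rate, sd_deg):
    """Log-likelihood of observed errors under given parameter values."""
    return sum(math.log(mixture_pdf(e, guess_rate, sd_deg)) for e in errors)


# Hypothetical report errors (degrees), clustered near the true color.
errors = [-10.0, 5.0, 0.0, 12.0, -3.0]

ll_precise = log_likelihood(errors, guess_rate=0.1, sd_deg=20.0)
ll_coarse = log_likelihood(errors, guess_rate=0.1, sd_deg=60.0)
print(ll_precise > ll_coarse)  # tightly clustered errors favor the smaller spread
```

Fitting both parameters (for example by maximizing `log_likelihood` over a grid) separates how often a color is lost from how precisely it is stored when it is retained.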
12.
J Neurosci ; 31(4): 1333-40, 2011 Jan 26.
Article in English | MEDLINE | ID: mdl-21273418

ABSTRACT

Behavioral and computational studies suggest that visual scene analysis rapidly produces a rich description of both the objects and the spatial layout of surfaces in a scene. However, there is still a large gap in our understanding of how the human brain accomplishes these diverse functions of scene understanding. Here we probe the nature of real-world scene representations using multivoxel functional magnetic resonance imaging pattern analysis. We show that natural scenes are analyzed in a distributed and complementary manner by the parahippocampal place area (PPA) and the lateral occipital complex (LOC) in particular, as well as other regions in the ventral stream. Specifically, we study the classification performance of different scene-selective regions using images that vary in spatial boundary and naturalness content. We discover that, whereas both the PPA and LOC can accurately classify scenes, they make different errors: the PPA more often confuses scenes that have the same spatial boundaries, whereas the LOC more often confuses scenes that have the same content. By demonstrating that visual scene analysis recruits distinct and complementary high-level representations, our results testify to distinct neural pathways for representing the spatial boundaries and content of a visual scene.


Subjects
Occipital Lobe/physiology; Parahippocampal Gyrus/physiology; Visual Perception; Adult; Brain Mapping; Female; Humans; Magnetic Resonance Imaging; Male; Photic Stimulation; Young Adult
13.
Proc Natl Acad Sci U S A ; 106(18): 7345-50, 2009 May 05.
Article in English | MEDLINE | ID: mdl-19380739

ABSTRACT

There is a great deal of structural regularity in the natural environment, and such regularities confer an opportunity to form compressed, efficient representations. Although this concept has been extensively studied within the domain of low-level sensory coding, there has been limited focus on efficient coding in the field of visual attention. Here we show that spatial patterns of orientation information ("spatial ensemble statistics") can be efficiently encoded under conditions of reduced attention. In our task, observers monitored for changes to the spatial pattern of background elements while they were attentively tracking moving objects in the foreground. By using stimuli that enable us to dissociate changes in local structure from changes in the ensemble structure, we found that observers were more sensitive to changes to the background that altered the ensemble structure than to changes that did not alter the ensemble structure. We propose that reducing attention to the background increases the amount of noise in local feature representations, but that spatial ensemble statistics capitalize on structural regularities to overcome this noise by pooling across local measurements, gaining precision in the representation of the ensemble.


Subjects
Attention/physiology; Space Perception/physiology; Vision, Ocular/physiology; Adolescent; Adult; Humans; Young Adult
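
The pooling argument in this abstract rests on a standard statistical fact: averaging N independent noisy measurements reduces the noise standard deviation by a factor of sqrt(N). The simulation below is a sketch of that fact only, not of the authors' task. It assumes, as simplifications, that every background element shares one true orientation, that noise is Gaussian and independent, and that orientation can be treated as a linear rather than circular quantity; all parameter values are hypothetical.

```python
import random
import statistics


def simulate(n_elements=16, noise_sd=20.0, n_trials=2000, seed=0):
    """Compare the spread of a single local estimate with a pooled estimate."""
    rng = random.Random(seed)
    true_orientation = 45.0  # degrees, hypothetical
    local, pooled = [], []
    for _ in range(n_trials):
        samples = [rng.gauss(true_orientation, noise_sd) for _ in range(n_elements)]
        local.append(samples[0])                  # one noisy local measurement
        pooled.append(statistics.fmean(samples))  # ensemble estimate: mean over elements
    return statistics.stdev(local), statistics.stdev(pooled)


local_sd, pooled_sd = simulate()
print(local_sd, pooled_sd)  # pooled spread is close to noise_sd / sqrt(n_elements)
```

With 16 elements the pooled estimate is expected to be about four times more precise than any single element, which is the sense in which ensemble statistics can survive noisier local representations.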
14.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9434-9445, 2022 12.
Article in English | MEDLINE | ID: mdl-34752386

ABSTRACT

Videos capture events that typically contain multiple sequential, and simultaneous, actions even in the span of only a few seconds. However, most large-scale datasets built to train models for action recognition in video only provide a single label per video. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled and do not learn the full spectrum of information present in each video in training. Towards this goal, we present the Multi-Moments in Time dataset (M-MiT) which includes over two million action labels for over one million three second videos. This multi-label dataset introduces novel challenges on how to train and analyze models for multi-action detection. Here, we present baseline results for multi-action recognition using loss functions adapted for long tail multi-label learning, provide improved methods for visualizing and interpreting models trained for multi-label action detection and show the strength of transferring models trained on M-MiT to smaller datasets.


Subjects
Algorithms; Learning
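
The penalty problem described in this abstract can be made concrete with a small example. The sketch uses plain per-label binary cross-entropy, which is a generic multi-label loss and not the long-tail-adapted loss functions the authors report; the action names and probabilities are hypothetical.

```python
import math


def binary_cross_entropy(probs, labels):
    """Mean per-label binary cross-entropy for one video."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total += y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
    return -total / len(probs)


# Hypothetical model output for one clip: P(running), P(jumping), P(swimming).
predicted = [0.9, 0.8, 0.1]

# The clip shows both running and jumping, but a single-label dataset records only one.
single_label_target = [1, 0, 0]
multi_label_target = [1, 1, 0]

loss_single = binary_cross_entropy(predicted, single_label_target)
loss_multi = binary_cross_entropy(predicted, multi_label_target)
print(loss_single, loss_multi)  # the single-label target penalizes the correct "jumping" prediction
```

Under the single-label target the model is pushed to suppress an action that is actually present, which is the incorrect penalty that multi-label annotation removes.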
15.
Proc Natl Acad Sci U S A ; 105(38): 14325-9, 2008 Sep 23.
Article in English | MEDLINE | ID: mdl-18787113

ABSTRACT

One of the major lessons of memory research has been that human memory is fallible, imprecise, and subject to interference. Thus, although observers can remember thousands of images, it is widely assumed that these memories lack detail. Contrary to this assumption, here we show that long-term memory is capable of storing a massive number of objects with details from the image. Participants viewed pictures of 2,500 objects over the course of 5.5 h. Afterward, they were shown pairs of images and indicated which of the two they had seen. The previously viewed item could be paired with either an object from a novel category, an object of the same basic-level category, or the same object in a different state or pose. Performance in each of these conditions was remarkably high (92%, 88%, and 87%, respectively), suggesting that participants successfully maintained detailed representations of thousands of images. These results have implications for cognitive models, in which capacity limitations impose a primary computational constraint (e.g., models of object recognition), and pose a challenge to neural models of memory storage and retrieval, which must be able to account for such a large and detailed storage capacity.


Subjects
Pattern Recognition, Visual/physiology; Recognition, Psychology/physiology; Adult; Humans; Photic Stimulation/methods; Time Factors
16.
Psychol Sci ; 21(11): 1551-6, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20921574

ABSTRACT

Observers can store thousands of object images in visual long-term memory with high fidelity, but the fidelity of scene representations in long-term memory is not known. Here, we probed scene-representation fidelity by varying the number of studied exemplars in different scene categories and testing memory using exemplar-level foils. Observers viewed thousands of scenes over 5.5 hr and then completed a series of forced-choice tests. Memory performance was high, even with up to 64 scenes from the same category in memory. Moreover, there was only a 2% decrease in accuracy for each doubling of the number of studied scene exemplars. Surprisingly, this degree of categorical interference was similar to the degree previously demonstrated for object memory. Thus, although scenes have often been defined as a superset of objects, our results suggest that scenes and objects may be entities at a similar level of abstraction in visual long-term memory.


Subjects
Attention; Discrimination, Psychological; Pattern Recognition, Visual; Retention, Psychology; Adult; Choice Behavior; Female; Humans; Male; Practice, Psychological; Recognition, Psychology; Young Adult
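
The interference figure in this abstract implies a simple log-linear relationship, worked through below. The sketch assumes that the reported 2% is a constant drop in percentage points per doubling across the whole range up to 64 exemplars; the abstract does not state this explicitly, and the function name `accuracy_drop` is hypothetical.

```python
import math


def accuracy_drop(n_exemplars, drop_per_doubling=0.02):
    """Cumulative accuracy drop relative to one studied exemplar per category."""
    doublings = math.log2(n_exemplars)
    return drop_per_doubling * doublings


for n in (1, 2, 4, 8, 16, 32, 64):
    print(n, round(accuracy_drop(n), 2))
# 64 exemplars is six doublings, so the assumed cumulative drop is 0.12
```

Under this assumption, a 64-fold increase in same-category exemplars costs about 12 percentage points of accuracy.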
17.
J Vis ; 10(1): 2.1-25, 2010 Jan 08.
Article in English | MEDLINE | ID: mdl-20143895

ABSTRACT

The relationship between image features and scene structure is central to the study of human visual perception and computer vision, but many of the specifics of real-world layout perception remain unknown. We do not know which image features are relevant to perceiving layout properties, or whether those features provide the same information for every type of image. Furthermore, we do not know the spatial resolutions required for perceiving different properties. This paper describes an experiment and a computational model that provides new insights on these issues. Humans perceive the global spatial layout properties such as dominant depth, openness, and perspective, from a single image. This work describes an algorithm that reliably predicts human layout judgments. This model's predictions are general, not specific to the observers it trained on. Analysis reveals that the optimal spatial resolutions for determining layout vary with the content of the space and the property being estimated. Openness is best estimated at high resolution, depth is best estimated at medium resolution, and perspective is best estimated at low resolution. Given the reliability and simplicity of estimating the global layout of real-world environments, this model could help resolve perceptual ambiguities encountered by more detailed scene reconstruction schemas.


Subjects
Artificial Intelligence; Models, Neurological; Visual Perception/physiology; Adolescent; Adult; Algorithms; Cues; Depth Perception/physiology; Form Perception/physiology; Humans; Photic Stimulation/methods; Space Perception/physiology; Young Adult
18.
Neuron ; 107(5): 772-781, 2020 09 09.
Article in English | MEDLINE | ID: mdl-32721379

ABSTRACT

Any cognitive function is mediated by a network of many cortical sites whose activity is orchestrated through complex temporal dynamics. To understand cognition, we need to identify brain responses simultaneously in space and time. Here we present a technique that does this by linking multivariate response patterns of the human brain recorded with functional magnetic resonance imaging (fMRI) and with magneto- or electroencephalography (M/EEG) based on representational similarity. We present the rationale and current applications of this non-invasive analysis technique, termed M/EEG-fMRI fusion, and discuss its pros and cons. We highlight its wide applicability in cognitive neuroscience and how its openness to further development and extension gives it strong potential for a deeper understanding of cognition in the future.


Subjects
Brain/physiology; Multimodal Imaging/methods; Neuroimaging/methods; Brain Mapping/methods; Electroencephalography/methods; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Magnetoencephalography/methods
19.
Sci Rep ; 10(1): 4638, 2020 03 13.
Article in English | MEDLINE | ID: mdl-32170209

ABSTRACT

Research at the intersection of computer vision and neuroscience has revealed hierarchical correspondence between layers of deep convolutional neural networks (DCNNs) and cascade of regions along human ventral visual cortex. Recently, studies have uncovered emergence of human interpretable concepts within DCNNs layers trained to identify visual objects and scenes. Here, we asked whether an artificial neural network (with convolutional structure) trained for visual categorization would demonstrate spatial correspondences with human brain regions showing central/peripheral biases. Using representational similarity analysis, we compared activations of convolutional layers of a DCNN trained for object and scene categorization with neural representations in human brain visual regions. Results reveal a brain-like topographical organization in the layers of the DCNN, such that activations of layer-units with central-bias were associated with brain regions with foveal tendencies (e.g. fusiform gyrus), and activations of layer-units with selectivity for image backgrounds were associated with cortical regions showing peripheral preference (e.g. parahippocampal cortex). The emergence of a categorical topographical correspondence between DCNNs and brain regions suggests these models are a good approximation of the perceptual representation generated by biological neural networks.


Subjects
Pattern Recognition, Visual/physiology; Visual Cortex/physiology; Adult; Female; Humans; Magnetic Resonance Imaging; Male; Models, Neurological; Neural Networks, Computer; Photic Stimulation; Visual Cortex/diagnostic imaging; Young Adult
20.
IEEE Trans Pattern Anal Mach Intell ; 42(2): 502-508, 2020 02.
Article in English | MEDLINE | ID: mdl-30802849

ABSTRACT

We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in 3 second videos poses many challenges: meaningful events do not include only people, but also objects, animals, and natural phenomena; visual and auditory events can be symmetrical in time ("opening" is "closing" in reverse), and either transient or sustained. We describe the annotation process of our dataset (each video is tagged with one action or activity label among 339 different classes), analyze its scale and diversity in comparison to other large-scale video datasets for action recognition, and report results of several baseline models addressing separately, and jointly, three modalities: spatial, temporal and auditory. The Moments in Time dataset, designed to have a large coverage and diversity of events in both visual and auditory modalities, can serve as a new challenge to develop models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.


Subjects
Databases, Factual; Video Recording; Animals; Human Activities/classification; Humans; Image Processing, Computer-Assisted; Pattern Recognition, Automated