Search | VHL Regional Portal

1.

Revealing Vision-Language Integration in the Brain with Multimodal Networks.

Subramaniam, Vighnesh; Conwell, Colin; Wang, Christopher; Kreiman, Gabriel; Katz, Boris; Cases, Ignacio; Barbu, Andrei.

ArXiv ; 2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38947929

ABSTRACT

We use (multi)modal deep neural networks (DNNs) to probe for sites of multimodal integration in the human brain by predicting stereoen-cephalography (SEEG) recordings taken while human subjects watched movies. We operationalize sites of multimodal integration as regions where a multimodal vision-language model predicts recordings better than unimodal language, unimodal vision, or linearly-integrated language-vision models. Our target DNN models span different architectures (e.g., convolutional networks and transformers) and multimodal training techniques (e.g., cross-attention and contrastive learning). As a key enabling step, we first demonstrate that trained vision and language models systematically outperform their randomly initialized counterparts in their ability to predict SEEG signals. We then compare unimodal and multimodal models against one another. Because our target DNN models often have different architectures, number of parameters, and training sets (possibly obscuring those differences attributable to integration), we carry out a controlled comparison of two models (SLIP and SimCLR), which keep all of these attributes the same aside from input modality. Using this approach, we identify a sizable number of neural sites (on average 141 out of 1090 total sites or 12.94%) and brain regions where multimodal integration seems to occur. Additionally, we find that among the variants of multimodal training techniques we assess, CLIP-style training is the best suited for downstream prediction of the neural activity in these sites.

2.

Neuron-level Prediction and Noise can Implement Flexible Reward-Seeking Behavior.

Li, Chenguang; Brenner, Jonah; Boesky, Adam; Ramanathan, Sharad; Kreiman, Gabriel.

bioRxiv ; 2024 May 22.

Article in English | MEDLINE | ID: mdl-38826332

ABSTRACT

We show that neural networks can implement reward-seeking behavior using only local predictive updates and internal noise. These networks are capable of autonomous interaction with an environment and can switch between explore and exploit behavior, which we show is governed by attractor dynamics. Networks can adapt to changes in their architectures, environments, or motor interfaces without any external control signals. When networks have a choice between different tasks, they can form preferences that depend on patterns of noise and initialization, and we show that these preferences can be biased by network architectures or by changing learning rates. Our algorithm presents a flexible, biologically plausible way of interacting with environments without requiring an explicit environmental reward function, allowing for behavior that is both highly adaptable and autonomous. Code is available at https://github.com/ccli3896/PaN.

3.

The Impact of Scene Context on Visual Object Recognition: Comparing Humans, Monkeys, and Computational Models.

Djambazovska, Sara; Zafer, Anaa; Ramezanpour, Hamidreza; Kreiman, Gabriel; Kar, Kohitij.

bioRxiv ; 2024 Jun 01.

Article in English | MEDLINE | ID: mdl-38854011

ABSTRACT

During natural vision, we rarely see objects in isolation but rather embedded in rich and complex contexts. Understanding how the brain recognizes objects in natural scenes by integrating contextual information remains a key challenge. To elucidate neural mechanisms compatible with human visual processing, we need an animal model that behaves similarly to humans, so that inferred neural mechanisms can provide hypotheses relevant to the human brain. Here we assessed whether rhesus macaques could model human context-driven object recognition by quantifying visual object identification abilities across variations in the amount, quality, and congruency of contextual cues. Behavioral metrics revealed strikingly similar context-dependent patterns between humans and monkeys. However, neural responses in the inferior temporal (IT) cortex of monkeys that were never explicitly trained to discriminate objects in context, as well as current artificial neural network models, could only partially explain this cross-species correspondence. The shared behavioral variance unexplained by context-naive neural data or computational models highlights fundamental knowledge gaps. Our findings demonstrate an intriguing alignment of human and monkey visual object processing that defies full explanation by either brain activity in a key visual region or state-of-the-art models.

4.

Feature-selective responses in macaque visual cortex follow eye movements during natural vision.

Xiao, Will; Sharma, Saloni; Kreiman, Gabriel; Livingstone, Margaret S.

Nat Neurosci ; 27(6): 1157-1166, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38684892

ABSTRACT

In natural vision, primates actively move their eyes several times per second via saccades. It remains unclear whether, during this active looking, visual neurons exhibit classical retinotopic properties, anticipate gaze shifts or mirror the stable quality of perception, especially in complex natural scenes. Here, we let 13 monkeys freely view thousands of natural images across 4.6 million fixations, recorded 883 h of neuronal responses in six areas spanning primary visual to anterior inferior temporal cortex and analyzed spatial, temporal and featural selectivity in these responses. Face neurons tracked their receptive field contents, indicated by category-selective responses. Self-consistency analysis showed that general feature-selective responses also followed eye movements and remained gaze-dependent over seconds of viewing the same image. Computational models of feature-selective responses located retinotopic receptive fields during free viewing. We found limited evidence for feature-selective predictive remapping and no viewing-history integration. Thus, ventral visual neurons represent the world in a predominantly eye-centered reference frame during natural vision.

Subject(s)

Eye Movements , Macaca mulatta , Neurons , Visual Cortex , Animals , Visual Cortex/physiology , Eye Movements/physiology , Neurons/physiology , Male , Photic Stimulation/methods , Visual Perception/physiology , Fixation, Ocular/physiology , Saccades/physiology , Vision, Ocular/physiology , Female

5.

Tuned Compositional Feature Replays for Efficient Stream Learning.

Talbot, Morgan B; Zawar, Rushikesh; Badkundri, Rohil; Zhang, Mengmi; Kreiman, Gabriel.

IEEE Trans Neural Netw Learn Syst ; PP2023 Dec 25.

Article in English | MEDLINE | ID: mdl-38145511

ABSTRACT

Our brains extract durable, generalizable knowledge from transient experiences of the world. Artificial neural networks come nowhere close to this ability. When tasked with learning to classify objects by training on nonrepeating video frames in temporal order (online stream learning), models that learn well from shuffled datasets catastrophically forget old knowledge upon learning new stimuli. We propose a new continual learning algorithm, compositional replay using memory blocks (CRUMB), which mitigates forgetting by replaying feature maps reconstructed by combining generic parts. CRUMB concatenates trainable and reusable memory block vectors to compositionally reconstruct feature map tensors in convolutional neural networks (CNNs). Storing the indices of memory blocks used to reconstruct new stimuli enables memories of the stimuli to be replayed during later tasks. This reconstruction mechanism also primes the neural network to minimize catastrophic forgetting by biasing it toward attending to information about object shapes more than information about image textures and stabilizes the network during stream learning by providing a shared feature-level basis for all training examples. These properties allow CRUMB to outperform an otherwise identical algorithm that stores and replays raw images while occupying only 3.6% as much memory. We stress-tested CRUMB alongside 13 competing methods on seven challenging datasets. To address the limited number of existing online stream learning datasets, we introduce two new benchmarks by adapting existing datasets for stream learning. With only 3.7%-4.1% as much memory and 15%-43% as much runtime, CRUMB mitigates catastrophic forgetting more effectively than the state-of-the-art. Our code is available at https://github.com/MorganBDT/crumb.git.

6.

Learning to Learn: How to Continuously Teach Humans and Machines.

Singh, Parantak; Li, You; Sikarwar, Ankur; Lei, Weixian; Gao, Difei; Talbot, Morgan B; Sun, Ying; Shou, Mike Zheng; Kreiman, Gabriel; Zhang, Mengmi.

IEEE Int Conf Comput Vis Workshops ; 2023: 11674-11685, 2023 Oct.

Article in English | MEDLINE | ID: mdl-38784111

ABSTRACT

Curriculum design is a fundamental component of education. For example, when we learn mathematics at school, we build upon our knowledge of addition to learn multiplication. These and other concepts must be mastered before our first algebra lesson, which also reinforces our addition and multiplication skills. Designing a curriculum for teaching either a human or a machine shares the underlying goal of maximizing knowledge transfer from earlier to later tasks, while also minimizing forgetting of learned tasks. Prior research on curriculum design for image classification focuses on the ordering of training examples during a single offline task. Here, we investigate the effect of the order in which multiple distinct tasks are learned in a sequence. We focus on the online class-incremental continual learning setting, where algorithms or humans must learn image classes one at a time during a single pass through a dataset. We find that curriculum consistently influences learning outcomes for humans and for multiple continual machine learning algorithms across several benchmark datasets. We introduce a novel-object recognition dataset for human curriculum learning experiments and observe that curricula that are effective for humans are highly correlated with those that are effective for machines. As an initial step towards automated curriculum design for online class-incremental learning, we propose a novel algorithm, dubbed Curriculum Designer (CD), that designs and ranks curricula based on inter-class feature similarities. We find significant overlap between curricula that are empirically highly effective and those that are highly ranked by our CD. Our study establishes a framework for further research on teaching humans and machines to learn continuously using optimized curricula. Our code and data are available through this link.

7.

LITTLE MEMORY EDITORS LIVING INSIDE YOUR BRAIN.

Zheng, Jie; Chen, Gordon; Kreiman, Gabriel; Rutishauser, Ueli.

Front Young Minds ; 112023.

Article in English | MEDLINE | ID: mdl-38186962

ABSTRACT

We interact with the world continuously. However, memories of our experiences are stored as individual events. For example, when we go on a road trip, we do not remember what happens second by second. Instead, we remember only a few special moments or events from a trip, such as dancing around the campfire. Our brains constantly extract memorable events while we interact with the world, and we organize those events based on their relevance. This process is like grouping road trip photos under different folders on the computer, so we can efficiently and accurately retrieve those memories in the future. How does the brain create these memorable events? In this article, you will learn about two groups of neurons inside the brain that help achieve this remarkable feat. You will also learn about how the activation of these neurons shapes the formation and retrieval of memories.

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL