Results 1 - 20 of 52
1.
PLoS Comput Biol ; 20(6): e1012159, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38870125

ABSTRACT

Humans are extremely robust in our ability to perceive and recognize objects-we see faces in tea stains and can recognize friends on dark streets. Yet, neurocomputational models of primate object recognition have focused on the initial feed-forward pass of processing through the ventral stream and less on the top-down feedback that likely underlies robust object perception and recognition. Aligned with the generative approach, we propose that the visual system actively facilitates recognition by reconstructing the object hypothesized to be in the image. Top-down attention then uses this reconstruction as a template to bias feedforward processing to align with the most plausible object hypothesis. Building on auto-encoder neural networks, our model makes detailed hypotheses about the appearance and location of the candidate objects in the image by reconstructing a complete object representation from potentially incomplete visual input due to noise and occlusion. The model then leverages the best object reconstruction, measured by reconstruction error, to direct the bottom-up process of selectively routing low-level features, a top-down biasing that captures a core function of attention. We evaluated our model using the MNIST-C (handwritten digits under corruptions) and ImageNet-C (real-world objects under corruptions) datasets. Not only did our model achieve superior performance on these challenging tasks designed to approximate real-world noise and occlusion viewing conditions, but it also better accounted for human behavioral reaction times and error patterns than a standard feedforward Convolutional Neural Network. Our model suggests that a complete understanding of object perception and recognition requires integrating top-down attentional feedback, which we propose is an object reconstruction.
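A minimal sketch of the reconstruction-guided attention idea described above, assuming a class-conditional auto-encoder with one decoder head per candidate class and a simple multiplicative biasing step; the layer sizes and the biasing rule are illustrative assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class ClassConditionalAutoencoder(nn.Module):
    def __init__(self, n_classes=10, img_dim=28 * 28, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        # One decoder head per candidate object class (an illustrative layout).
        self.decoders = nn.ModuleList(
            [nn.Linear(hidden, img_dim) for _ in range(n_classes)])

    def forward(self, x):
        z = self.encoder(x)
        recons = torch.stack([torch.sigmoid(d(z)) for d in self.decoders], dim=1)
        # Reconstruction error per class hypothesis (lower = more plausible).
        errors = ((recons - x.unsqueeze(1)) ** 2).mean(dim=2)
        best = errors.argmin(dim=1)
        template = recons[torch.arange(x.size(0)), best]
        # Top-down biasing (assumed multiplicative): attenuate input features
        # the winning reconstruction does not predict, then re-process.
        attended = x * template
        return best, errors, attended
```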


Subject(s)
Attention , Neural Networks, Computer , Pattern Recognition, Visual , Humans , Attention/physiology , Pattern Recognition, Visual/physiology , Computational Biology , Models, Neurological , Recognition, Psychology/physiology
2.
J Vis ; 23(5): 16, 2023 05 02.
Article in English | MEDLINE | ID: mdl-37212782

ABSTRACT

The visual system uses sequences of selective glimpses to objects to support goal-directed behavior, but how is this attention control learned? Here we present an encoder-decoder model inspired by the interacting bottom-up and top-down visual pathways making up the recognition-attention system in the brain. At every iteration, a new glimpse is taken from the image and is processed through the "what" encoder, a hierarchy of feedforward, recurrent, and capsule layers, to obtain an object-centric (object-file) representation. This representation feeds to the "where" decoder, where the evolving recurrent representation provides top-down attentional modulation to plan subsequent glimpses and impact routing in the encoder. We demonstrate how the attention mechanism significantly improves the accuracy of classifying highly overlapping digits. In a visual reasoning task requiring comparison of two objects, our model achieves near-perfect accuracy and significantly outperforms larger models in generalizing to unseen stimuli. Our work demonstrates the benefits of object-based attention mechanisms taking sequential glimpses of objects.


Subject(s)
Brain , Visual Perception , Humans , Photic Stimulation/methods , Recognition, Psychology , Problem Solving , Pattern Recognition, Visual
3.
J Vis ; 22(4): 13, 2022 03 02.
Article in English | MEDLINE | ID: mdl-35323870

ABSTRACT

The factors determining how attention is allocated during visual tasks have been studied for decades, but few studies have attempted to model the weighting of several of these factors within and across tasks to better understand their relative contributions. Here we consider the roles of saliency, center bias, target features, and object recognition uncertainty in predicting the first nine changes in fixation made during free viewing and visual search tasks in the OSIE and COCO-Search18 datasets, respectively. We focus on the latter-most and least familiar of these factors by proposing a new method of quantifying uncertainty in an image, one based on object recognition. We hypothesize that the greater the number of object categories competing for an object proposal, the greater the uncertainty of how that object should be recognized and, hence, the greater the need for attention to resolve this uncertainty. As expected, we found that target features best predicted target-present search, with their dominance obscuring the use of other features. Unexpectedly, we found that target features were only weakly used during target-absent search. We also found that object recognition uncertainty outperformed an unsupervised saliency model in predicting free-viewing fixations, although saliency was slightly more predictive of search. We conclude that uncertainty in object recognition, a measure that is image computable and highly interpretable, is better than bottom-up saliency in predicting attention during free viewing.
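The paper quantifies uncertainty by how many object categories compete for an object proposal; one simple way to operationalize that competition is the entropy of a detector's class posterior per proposal, accumulated into a priority map. The detector scores and box format below are assumptions, not the paper's implementation:

```python
import numpy as np

def proposal_uncertainty(class_scores):
    """class_scores: (n_proposals, n_classes) unnormalized detector scores.
    Entropy of the softmax posterior: more categories competing for a
    proposal -> flatter posterior -> higher uncertainty."""
    z = class_scores - class_scores.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def uncertainty_map(boxes, class_scores, image_shape):
    """Accumulate per-proposal uncertainty into a pixel-wise priority map.
    boxes: iterable of integer (x0, y0, x1, y1) proposal rectangles."""
    h, w = image_shape
    priority = np.zeros((h, w))
    for (x0, y0, x1, y1), u in zip(boxes, proposal_uncertainty(class_scores)):
        priority[y0:y1, x0:x1] = np.maximum(priority[y0:y1, x0:x1], u)
    return priority
```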


Subject(s)
Visual Perception , Bias , Humans , Uncertainty
4.
J Vis ; 21(13): 13, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34967860

ABSTRACT

Human visual recognition is outstandingly robust. People can recognize thousands of object classes in the blink of an eye (50-200 ms) even when the objects vary in position, scale, viewpoint, and illumination. What aspects of human category learning facilitate the extraction of invariant visual features for object recognition? Here, we explore the possibility that a contributing factor to learning such robust visual representations may be a taxonomic hierarchy communicated in part by common labels to which people are exposed as part of natural language. We did this by manipulating the taxonomic level of labels (e.g., superordinate-level [mammal, fruit, vehicle] and basic-level [dog, banana, van]), and the order in which these training labels were used during learning by a Convolutional Neural Network. We found that training the model with hierarchical labels yields visual representations that are more robust to image transformations (e.g., position/scale, illumination, noise, and blur), especially when images were first trained with superordinate labels and then fine-tuned with basic labels. We also found that superordinate-label followed by basic-label training best predicts functional magnetic resonance imaging responses in visual cortex and behavioral similarity judgments recorded while viewing naturalistic images. The benefits of training with superordinate labels in the earlier stages of category learning are discussed in the context of representational efficiency and generalization.
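A sketch of the two-stage labeling regime (superordinate labels first, then basic-level fine-tuning). The backbone, optimizer, epoch counts, and loader names are placeholders rather than the paper's training setup:

```python
import torch
import torch.nn as nn
from torchvision import models

def train_stage(model, loader, n_classes, epochs, lr=1e-3):
    """Swap in a fresh output head for this stage's label set and train."""
    model.fc = nn.Linear(model.fc.in_features, n_classes)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()
    return model

cnn = models.resnet18(weights=None)
# Stage 1: coarse, superordinate labels (e.g., mammal / fruit / vehicle).
# cnn = train_stage(cnn, superordinate_loader, n_classes=3, epochs=10)
# Stage 2: fine-tune the same backbone with basic-level labels (dog / banana / van ...).
# cnn = train_stage(cnn, basic_loader, n_classes=30, epochs=10)
```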


Subject(s)
Pattern Recognition, Visual , Visual Cortex , Humans , Magnetic Resonance Imaging , Neural Networks, Computer , Photic Stimulation
5.
J Neurosci ; 37(6): 1453-1467, 2017 02 08.
Article in English | MEDLINE | ID: mdl-28039373

ABSTRACT

Modern computational models of attention predict fixations using saliency maps and target maps, which prioritize locations for fixation based on feature contrast and target goals, respectively. But whereas many such models are biologically plausible, none have looked to the oculomotor system for design constraints or parameter specification. Conversely, although most models of saccade programming are tightly coupled to underlying neurophysiology, none have been tested using real-world stimuli and tasks. We combined the strengths of these two approaches in MASC, a model of attention in the superior colliculus (SC) that captures known neurophysiological constraints on saccade programming. We show that MASC predicted the fixation locations of humans freely viewing naturalistic scenes and performing exemplar and categorical search tasks, a breadth achieved by no other existing model. Moreover, it did this as well or better than its more specialized state-of-the-art competitors. MASC's predictive success stems from its inclusion of high-level but core principles of SC organization: an over-representation of foveal information, size-invariant population codes, cascaded population averaging over distorted visual and motor maps, and competition between motor point images for saccade programming, all of which cause further modulation of priority (attention) after projection of saliency and target maps to the SC. Only by incorporating these organizing brain principles into our models can we fully understand the transformation of complex visual information into the saccade programs underlying movements of overt attention. With MASC, a theoretical footing now exists to generate and test computationally explicit predictions of behavioral and neural responses in visually complex real-world contexts.

SIGNIFICANCE STATEMENT

The superior colliculus (SC) performs a visual-to-motor transformation vital to overt attention, but existing SC models cannot predict saccades to visually complex real-world stimuli. We introduce a brain-inspired SC model that outperforms state-of-the-art image-based competitors in predicting the sequences of fixations made by humans performing a range of everyday tasks (scene viewing and exemplar and categorical search), making clear the value of looking to the brain for model design. This work is significant in that it will drive new research by making computationally explicit predictions of SC neural population activity in response to naturalistic stimuli and tasks. It will also serve as a blueprint for the construction of other brain-inspired models, helping to usher in the next generation of truly intelligent autonomous systems.
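One of the SC principles listed above, population averaging over a point image on a distorted motor map, can be illustrated in a few lines; this toy readout is not MASC itself, and the Gaussian point-image width is an arbitrary choice:

```python
import numpy as np

def population_average_endpoint(priority_map, peak_xy, sigma_px=15.0):
    """priority_map: 2-D priority (e.g., saliency + target map) projected
    onto the motor map. peak_xy: (x, y) of the winning point image.
    The endpoint is the activity-weighted mean of map locations."""
    h, w = priority_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    point_image = np.exp(-((xs - peak_xy[0]) ** 2 + (ys - peak_xy[1]) ** 2)
                         / (2 * sigma_px ** 2))
    activity = priority_map * point_image
    activity = activity / activity.sum()
    return float((xs * activity).sum()), float((ys * activity).sum())
```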


Subject(s)
Eye Movements/physiology , Models, Neurological , Pattern Recognition, Visual/physiology , Photic Stimulation/methods , Superior Colliculi/physiology , Visual Perception/physiology , Female , Forecasting , Humans , Male , Models, Anatomic , Superior Colliculi/anatomy & histology
6.
J Vis ; 18(11): 4, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30347091

ABSTRACT

Objects often appear with some amount of occlusion. We fill in missing information using local shape features even before attending to those objects-a process called amodal completion. Here we explore the possibility that knowledge about common realistic objects can be used to "restore" missing information even in cases where amodal completion is not expected. We systematically varied whether visual search targets were occluded or not, both at preview and in search displays. Button-press responses were longest when the preview was unoccluded and the target was occluded in the search display. This pattern is consistent with a target-verification process that uses the features visible at preview but does not restore missing information in the search display. However, visual search guidance was weakest whenever the target was occluded in the search display, regardless of whether it was occluded at preview. This pattern suggests that information missing during the preview was restored and used to guide search, thereby resulting in a feature mismatch and poor guidance. If this process were preattentive, as with amodal completion, we should have found roughly equivalent search guidance across all conditions because the target would always be unoccluded or restored, resulting in no mismatch. We conclude that realistic objects are restored behind occluders during search target preview, even in situations not prone to amodal completion, and this restoration does not occur preattentively during search.


Subject(s)
Fixation, Ocular/physiology , Form Perception/physiology , Perceptual Masking/physiology , Humans , Male , Visual Perception/physiology , Young Adult
7.
J Vis ; 17(4): 2, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28388698

ABSTRACT

Saccades quite systematically undershoot a peripheral visual target by about 10% of its eccentricity while becoming more variable, mainly in amplitude, as the target becomes more peripheral. This undershoot phenomenon has been interpreted as the strategic adjustment of saccadic gain downstream of the superior colliculus (SC), where saccades are programmed. Here, we investigated whether the eccentricity-related increase in saccades' hypometria and imprecision might not instead result from overrepresentation of space closer to the fovea in the SC and visual-cortical areas. To test this magnification-factor (MF) hypothesis, we analyzed four parametric eye-movement data sets, collected while humans made saccades to single eccentric stimuli. We first established that the undershoot phenomenon generalizes to ordinary saccade amplitudes (0.5°-15°) and directions (0°-90°) and that landing-position distributions become not only increasingly elongated but also more skewed toward the fovea as target eccentricity increases. Moreover, we confirmed the MF hypothesis by showing (a) that the linear eccentricity-related increase in undershoot error and negative skewness canceled out when landing positions were log-scaled according to the MF in monkeys' SC and (b) that the spread, proportional to eccentricity outside an extended, 5°, foveal region, became circular and invariant in size in SC space. Yet the eccentricity-related increase in variability, slower near the fovea, yielded progressively larger and more elongated clusters toward foveal and vertical-meridian SC representations. What causes this latter, unexpected, pattern remains undetermined. Nevertheless, our findings clearly suggest that the undershoot phenomenon, and related variability, originate in, or upstream of, the SC, rather than reflecting downstream, adaptive, strategies.
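Testing the magnification-factor hypothesis amounts to re-expressing landing positions in collicular coordinates; a commonly used log-magnified mapping of eccentricity and direction into SC space is sketched below (the parameter values follow a standard monkey-SC parameterization and may differ from those used in the paper):

```python
import numpy as np

def visual_to_sc(R, phi_deg, A=3.0, Bu=1.4, Bv=1.8):
    """R: target eccentricity (deg); phi_deg: direction from the horizontal
    meridian (deg). Returns (u, v) in mm on the collicular map; the log term
    magnifies the fovea and compresses the periphery."""
    phi = np.deg2rad(phi_deg)
    u = Bu * np.log(np.sqrt(R ** 2 + 2 * A * R * np.cos(phi) + A ** 2) / A)
    v = Bv * np.arctan2(R * np.sin(phi), R * np.cos(phi) + A)
    return u, v

# Equal steps in eccentricity shrink on the map as targets move peripherally,
# which is how eccentricity-scaled landing scatter can become circular and
# size-invariant once it is expressed in SC coordinates.
for ecc in (2, 5, 10, 15):
    print(ecc, visual_to_sc(ecc, 0.0))
```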


Subject(s)
Saccades/physiology , Superior Colliculi/physiology , Visual Perception/physiology , Adolescent , Female , Fovea Centralis , Humans , Male , Vision, Binocular/physiology , Young Adult
8.
Psychol Sci ; 27(6): 870-84, 2016 06.
Article in English | MEDLINE | ID: mdl-27142461

ABSTRACT

This article introduces a generative model of category representation that uses computer vision methods to extract category-consistent features (CCFs) directly from images of category exemplars. The model was trained on 4,800 images of common objects, and CCFs were obtained for 68 categories spanning subordinate, basic, and superordinate levels in a category hierarchy. When participants searched for these same categories, targets cued at the subordinate level were preferentially fixated, but fixated targets were verified faster when they followed a basic-level cue. The subordinate-level advantage in guidance is explained by the number of target-category CCFs, a measure of category specificity that decreases with movement up the category hierarchy. The basic-level advantage in verification is explained by multiplying the number of CCFs by sibling distance, a measure of category distinctiveness. With this model, the visual representations of real-world object categories, each learned from the vast numbers of image exemplars accumulated throughout everyday experience, can finally be studied.
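A sketch of the CCF logic under simplifying assumptions: features that are both frequent and consistent across a category's exemplars count as category-consistent, guidance scales with the CCF count, and verification scales with CCF count times sibling distance. The feature representation and thresholds are placeholders, not the published generative model:

```python
import numpy as np

def ccf_mask(exemplar_features, freq_thresh=None, cv_thresh=None):
    """exemplar_features: (n_exemplars, n_features) feature histograms,
    one row per category exemplar. A feature counts as category-consistent
    if it occurs frequently and with low variability across exemplars."""
    mean = exemplar_features.mean(axis=0)
    cv = exemplar_features.std(axis=0) / (mean + 1e-12)
    freq_thresh = np.median(mean) if freq_thresh is None else freq_thresh
    cv_thresh = np.median(cv) if cv_thresh is None else cv_thresh
    return (mean > freq_thresh) & (cv < cv_thresh)

def guidance_strength(ccfs):
    # Guidance is predicted by category specificity: the raw CCF count.
    return int(ccfs.sum())

def verification_strength(ccfs, sibling_distance):
    # Verification is predicted by specificity times distinctiveness.
    return int(ccfs.sum()) * sibling_distance
```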


Subject(s)
Concept Formation/physiology , Models, Theoretical , Pattern Recognition, Visual/physiology , Adult , Humans , Young Adult
9.
J Vis ; 14(7)2014 Jun 05.
Article in English | MEDLINE | ID: mdl-24904121

ABSTRACT

We introduce the proto-object model of visual clutter perception. This unsupervised model segments an image into superpixels, then merges neighboring superpixels that share a common color cluster to obtain proto-objects-defined here as spatially extended regions of coherent features. Clutter is estimated by simply counting the number of proto-objects. We tested this model using 90 images of realistic scenes that were ranked by observers from least to most cluttered. Comparing this behaviorally obtained ranking to a ranking based on the model clutter estimates, we found a significant correlation between the two (Spearman's ρ = 0.814, p < 0.001). We also found that the proto-object model was highly robust to changes in its parameters and was generalizable to unseen images. We compared the proto-object model to six other models of clutter perception and demonstrated that it outperformed each, in some cases dramatically. Importantly, we also showed that the proto-object model was a better predictor of clutter perception than an actual count of the number of objects in the scenes, suggesting that the set size of a scene may be better described by proto-objects than objects. We conclude that the success of the proto-object model is due in part to its use of an intermediate level of visual representation-one between features and objects-and that this is evidence for the potential importance of a proto-object representation in many common visual percepts and tasks.
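A rough sketch of the proto-object pipeline: oversegment into superpixels, cluster superpixel colors, merge adjacent superpixels that share a color cluster, and count the merged regions. The segmentation and clustering choices here (SLIC, k-means on mean RGB) are stand-ins for the published implementation:

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.cluster import KMeans

def proto_object_count(image, n_segments=600, n_color_clusters=12):
    """image: RGB array. Returns the number of proto-objects."""
    labels = slic(image, n_segments=n_segments, start_label=0)
    n = labels.max() + 1
    mean_color = np.array([image[labels == i].mean(axis=0) for i in range(n)])
    cluster = KMeans(n_clusters=n_color_clusters, n_init=10).fit_predict(mean_color)

    # Union-find: merge 4-connected neighboring superpixels sharing a color cluster.
    parent = np.arange(n)
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for left, right in zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()):
        if left != right and cluster[left] == cluster[right]:
            union(left, right)
    for top, bottom in zip(labels[:-1, :].ravel(), labels[1:, :].ravel()):
        if top != bottom and cluster[top] == cluster[bottom]:
            union(top, bottom)

    return len({find(i) for i in range(n)})
```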


Subject(s)
Attention/physiology , Computer Simulation , Crowding , Visual Perception/physiology , Adolescent , Adult , Eye Movements/physiology , Humans , Young Adult
10.
J Vis ; 14(12)2014 Oct 01.
Article in English | MEDLINE | ID: mdl-25274990

ABSTRACT

The role of target typicality in a categorical visual search task was investigated by cueing observers with a target name, followed by a five-item target present/absent search array in which the target images were rated in a pretest to be high, medium, or low in typicality with respect to the basic-level target cue. Contrary to previous work, we found that search guidance was better for high-typicality targets compared to low-typicality targets, as measured by both the proportion of immediate target fixations and the time to fixate the target. Consistent with previous work, we also found an effect of typicality on target verification times, the time between target fixation and the search judgment; as target typicality decreased, verification times increased. To model these typicality effects, we trained Support Vector Machine (SVM) classifiers on the target categories, and tested these on the corresponding specific targets used in the search task. This analysis revealed significant differences in classifier confidence between the high-, medium-, and low-typicality groups, paralleling the behavioral results. Collectively, these findings suggest that target typicality broadly affects both search guidance and verification, and that differences in typicality can be predicted by distance from an SVM classification boundary.
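The typicality analysis reduces to reading out signed distances from an SVM decision boundary; a minimal version, with the image features left as placeholders:

```python
import numpy as np
from sklearn.svm import LinearSVC

def typicality_scores(category_feats, noncategory_feats, target_feats):
    """Train category vs. non-category, then score specific target images by
    their signed distance from the decision boundary (larger = more typical)."""
    X = np.vstack([category_feats, noncategory_feats])
    y = np.r_[np.ones(len(category_feats)), np.zeros(len(noncategory_feats))]
    clf = LinearSVC(C=1.0).fit(X, y)
    return clf.decision_function(target_feats)
```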


Subject(s)
Cues , Eye Movements/physiology , Analysis of Variance , Fixation, Ocular/physiology , Humans , Judgment/physiology , Pattern Recognition, Visual/physiology , Perceptual Masking/physiology , Photic Stimulation/methods
11.
J Vis ; 14(3): 8, 2014 Mar 05.
Article in English | MEDLINE | ID: mdl-24599946

ABSTRACT

The visual-search literature has assumed that the top-down target representation used to guide search resides in visual working memory (VWM). We directly tested this assumption using contralateral delay activity (CDA) to estimate the VWM load imposed by the target representation. In Experiment 1, observers previewed four photorealistic objects and were cued to remember the two objects appearing to the left or right of central fixation; Experiment 2 was identical except that observers previewed two photorealistic objects and were cued to remember one. CDA was measured during a delay following preview offset but before onset of a four-object search array. One of the targets was always present, and observers were asked to make an eye movement to it and press a button. We found lower-magnitude CDA on trials when the initial search saccade was directed to the target (strong guidance) compared to when it was not (weak guidance). This difference also tended to be larger shortly before search-display onset and was largely unaffected by VWM item-capacity limits or number of previews. Moreover, the difference between mean strong- and weak-guidance CDA was proportional to the increase in search time between mean strong- and weak-guidance trials (as measured by time-to-target and reaction-time difference scores). Contrary to most search models, our data suggest that maintaining more target features results in poorer search guidance to the target. We interpret these counterintuitive findings as evidence for strong search guidance using a small set of highly discriminative target features that remain after pruning from a larger set of features, with the load imposed on VWM varying with this feature-consolidation process.
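CDA itself is typically quantified as the contralateral-minus-ipsilateral amplitude over posterior electrodes during the retention interval; a schematic computation is shown below (channel names, the delay window, and the data layout are illustrative, not the study's exact analysis):

```python
import numpy as np

def cda_amplitude(epoch, times, cued_side, left_chs, right_chs,
                  window=(0.4, 0.9)):
    """epoch: dict mapping channel name -> 1-D voltage trace (same length
    as times, in seconds). Returns the mean contralateral-minus-ipsilateral
    amplitude over posterior channels within the delay window."""
    sel = (times >= window[0]) & (times <= window[1])
    left = np.mean([epoch[ch][sel] for ch in left_chs], axis=0)
    right = np.mean([epoch[ch][sel] for ch in right_chs], axis=0)
    contra, ipsi = (right, left) if cued_side == "left" else (left, right)
    return float((contra - ipsi).mean())
```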


Subject(s)
Cues , Memory, Short-Term/physiology , Pattern Recognition, Visual/physiology , Visual Perception/physiology , Adult , Eye Movements/physiology , Humans , Reaction Time , Young Adult
12.
bioRxiv ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39005469

ABSTRACT

The brain routes and integrates information from many sources during behavior. A number of models explain this phenomenon within the framework of mixed selectivity theory, yet it is difficult to compare their predictions to understand how neurons and circuits integrate information. In this work, we apply time-series partial information decomposition (PID) to compare models of integration on a dataset of superior colliculus (SC) recordings collected during a multi-target visual search task. On this task, SC must integrate target guidance, bottom-up salience, and previous fixation signals to drive attention. We find evidence that SC neurons integrate these factors in diverse ways, including decision-variable selectivity to expected value, functional specialization to previous fixation, and code-switching (to incorporate new visual input).

13.
J Vis ; 13(14)2013 Dec 12.
Article in English | MEDLINE | ID: mdl-24338446

ABSTRACT

Is it possible to infer a person's goal by decoding their fixations on objects? Two groups of participants categorically searched for either a teddy bear or butterfly among random category distractors, each rated as high, medium, or low in similarity to the target classes. Target-similar objects were preferentially fixated in both search tasks, demonstrating information about target category in looking behavior. Different participants then viewed the searchers' scanpaths, superimposed over the target-absent displays, and attempted to decode the target category (bear/butterfly). Bear searchers were classified perfectly; butterfly searchers were classified at 77%. Bear and butterfly Support Vector Machine (SVM) classifiers were also used to decode the same preferentially fixated objects and found to yield highly comparable classification rates. We conclude that information about a person's search goal exists in fixation behavior, and that this information can be behaviorally decoded to reveal a search target-essentially reading a person's mind by analyzing their fixations.


Subject(s)
Fixation, Ocular/physiology , Pattern Recognition, Visual/physiology , Artificial Intelligence , Eye Movements , Female , Humans , Male
14.
J Vis ; 13(3): 30, 2013 Oct 08.
Article in English | MEDLINE | ID: mdl-24105460

ABSTRACT

Search is commonly described as a repeating cycle of guidance to target-like objects, followed by the recognition of these objects as targets or distractors. Are these indeed separate processes using different visual features? We addressed this question by comparing observer behavior to that of support vector machine (SVM) models trained on guidance and recognition tasks. Observers searched for a categorically defined teddy bear target in four-object arrays. Target-absent trials consisted of random category distractors rated in their visual similarity to teddy bears. Guidance, quantified as first-fixated objects during search, was strongest for targets, followed by target-similar, medium-similarity, and target-dissimilar distractors. False positive errors to first-fixated distractors also decreased with increasing dissimilarity to the target category. To model guidance, nine teddy bear detectors, using features ranging in biological plausibility, were trained on unblurred bears then tested on blurred versions of the same objects appearing in each search display. Guidance estimates were based on target probabilities obtained from these detectors. To model recognition, nine bear/nonbear classifiers, trained and tested on unblurred objects, were used to classify the object that would be fixated first (based on the detector estimates) as a teddy bear or a distractor. Patterns of categorical guidance and recognition accuracy were modeled almost perfectly by an HMAX model in combination with a color histogram feature. We conclude that guidance and recognition in the context of search are not separate processes mediated by different features, and that what the literature knows as guidance is really recognition performed on blurred objects viewed in the visual periphery.
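The modeling logic, training one classifier family for recognition on clear objects and evaluating guidance on blurred versions that mimic peripheral viewing, can be sketched as follows (the color-histogram feature stands in for the nine feature sets actually compared):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.svm import SVC

def color_histogram(image, bins=8):
    """Joint RGB histogram of a uint8 color image, normalized to sum to 1."""
    h, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins,) * 3,
                          range=((0, 256),) * 3)
    return (h / h.sum()).ravel()

def fit_and_score(train_imgs, train_labels, test_imgs, test_labels,
                  test_blur_sigma=0.0):
    """Train on unblurred objects; optionally blur the test objects to
    approximate the degraded input available to peripheral vision."""
    def feats(ims, sigma):
        blurred = [gaussian_filter(im, sigma=(sigma, sigma, 0)) if sigma > 0 else im
                   for im in ims]
        return np.array([color_histogram(im) for im in blurred])

    clf = SVC(kernel="linear").fit(feats(train_imgs, 0.0), train_labels)
    return clf.score(feats(test_imgs, test_blur_sigma), test_labels)

# "Recognition" model: test_blur_sigma=0 (clear, foveated objects).
# "Guidance" model: test_blur_sigma>0 (blurred objects as seen peripherally).
```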


Subject(s)
Image Processing, Computer-Assisted , Pattern Recognition, Visual/physiology , Eye Movements/physiology , Humans , Reaction Time
15.
Psychophysiology ; 59(4): e13998, 2022 04.
Article in English | MEDLINE | ID: mdl-35001411

ABSTRACT

Are all real-world objects created equal? Visual search difficulty increases with the number of targets and as target-related visual working memory (VWM) load increases. Our goal was to investigate the load imposed by individual real-world objects held in VWM in the context of search. Measures of visual clutter attempt to quantify real-world set-size in the context of scenes. We applied one of these measures, the number of proto-objects, to individual real-world objects and used contralateral delay activity (CDA) to measure the resulting VWM load. The current study presented a real-world object as a target cue, followed by a delay where CDA was measured. This was followed by a four-object search array. We compared CDA and later search performance from target cues containing a high or low number of proto-objects. High proto-object target cues resulted in greater CDA, longer search RTs and target dwell times, and reduced search guidance, relative to low proto-object targets. These findings demonstrate that targets with more proto-objects result in a higher VWM load and reduced search performance. This shows that the number of proto-objects contained within individual objects produces set-size-like effects in VWM and suggests proto-objects may be a viable unit of measure of real-world VWM load. Importantly, this demonstrates that not all real-world objects are created equal.


Subject(s)
Evoked Potentials , Memory, Short-Term , Cues , Humans , Visual Perception
16.
Mem Cognit ; 39(4): 600-13, 2011 May.
Article in English | MEDLINE | ID: mdl-21264590

ABSTRACT

Do refixations serve a rehearsal function in visual working memory (VWM)? We analyzed refixations from observers freely viewing multiobject scenes. An eyetracker was used to limit the viewing of a scene to a specified number of objects fixated after the target (intervening objects), followed by a four-alternative forced choice recognition test. Results showed that the probability of target refixation increased with the number of fixated intervening objects, and these refixations produced a 16% accuracy benefit over the first five intervening-object conditions. Additionally, refixations most frequently occurred after fixations on only one to two other objects, regardless of the intervening-object condition. These behaviors could not be explained by random or minimally constrained computational models; a VWM component was required to completely describe these data. We explain these findings in terms of a monitor-refixate rehearsal system: The activations of object representations in VWM are monitored, with refixations occurring when these activations decrease suddenly.


Subject(s)
Attention , Fixation, Ocular , Memory, Short-Term , Pattern Recognition, Visual , Practice, Psychological , Color Perception , Humans , Intention , Models, Theoretical , Recognition, Psychology , Space Perception
17.
J Vis ; 11(8)2011 Jul 14.
Article in English | MEDLINE | ID: mdl-21757505

ABSTRACT

We asked how visual similarity relationships affect search guidance to categorically defined targets (no visual preview). Experiment 1 used a web-based task to collect visual similarity rankings between two target categories, teddy bears and butterflies, and random-category objects, from which we created search displays in Experiment 2 having either high-similarity distractors, low-similarity distractors, or "mixed" displays with high-, medium-, and low-similarity distractors. Analysis of target-absent trials revealed faster manual responses and fewer fixated distractors on low-similarity displays compared to high-similarity displays. On mixed displays, first fixations were more frequent on high-similarity distractors (bear = 49%; butterfly = 58%) than on low-similarity distractors (bear = 9%; butterfly = 12%). Experiment 3 used the same high/low/mixed conditions, but now these conditions were created using similarity estimates from a computer vision model that ranked objects in terms of color, texture, and shape similarity. The same patterns were found, suggesting that categorical search can indeed be guided by purely visual similarity. Experiment 4 compared cases where the model and human rankings differed and when they agreed. We found that similarity effects were best predicted by cases where the two sets of rankings agreed, suggesting that both human visual similarity rankings and the computer vision model captured features important for guiding search to categorical targets.


Subject(s)
Artificial Intelligence , Attention/physiology , Eye Movements/physiology , Form Perception/physiology , Pattern Recognition, Visual/physiology , Fixation, Ocular/physiology , Humans , Photic Stimulation/methods , Psychophysics
18.
J Vis ; 11(14)2011 Dec 07.
Article in English | MEDLINE | ID: mdl-22159628

ABSTRACT

We evaluated the use of visual clutter as a surrogate measure of set size effects in visual search by comparing the effects of subjective clutter (determined by independent raters) and objective clutter (as quantified by edge count and feature congestion) using "evolving" scenes, ones that varied incrementally in clutter while maintaining their semantic continuity. Observers searched for a target building in rural, suburban, and urban city scenes created using the game SimCity. Stimuli were 30 screenshots obtained for each scene type as the city evolved over time. Reaction times and search guidance (measured by scan path ratio) were fastest/strongest for sparsely cluttered rural scenes, slower/weaker for more cluttered suburban scenes, and slowest/weakest for highly cluttered urban scenes. Subjective within-city clutter estimates also increased as each city matured and correlated highly with RT and search guidance. However, multiple regression modeling revealed that adding objective estimates failed to better predict search performance over the subjective estimates alone. This suggests that within-city clutter may not be explained exclusively by low-level feature congestion; conceptual congestion (e.g., the number of different types of buildings in a scene), part of the subjective clutter measure, may also be important in determining the effects of clutter on search.
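Of the two objective clutter measures named above, edge count is simple to compute directly; a sketch is below (the Canny parameters are arbitrary, and feature congestion would require the fuller pipeline):

```python
from skimage.color import rgb2gray
from skimage.feature import canny

def edge_count_clutter(image, sigma=2.0):
    """Fraction of pixels marked as edges by a Canny detector (RGB input)."""
    edges = canny(rgb2gray(image), sigma=sigma)
    return float(edges.mean())
```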


Subject(s)
Attention/physiology , Eye Movements/physiology , Pattern Recognition, Visual/physiology , Humans , Photic Stimulation
19.
Article in English | MEDLINE | ID: mdl-34164631

ABSTRACT

Understanding how goals control behavior is a question ripe for interrogation by new methods from machine learning. These methods require large and labeled datasets to train models. To annotate a large-scale image dataset with observed search fixations, we collected 16,184 fixations from people searching for either microwaves or clocks in a dataset of 4,366 images (MS-COCO). We then used this behaviorally annotated dataset and the machine learning method of inverse-reinforcement learning (IRL) to learn target-specific reward functions and policies for these two target goals. Finally, we used these learned policies to predict the fixations of 60 new behavioral searchers (clock = 30, microwave = 30) in a disjoint test dataset of kitchen scenes depicting both a microwave and a clock (thus controlling for differences in low-level image contrast). We found that the IRL model predicted behavioral search efficiency and fixation-density maps using multiple metrics. Moreover, reward maps from the IRL model revealed target-specific patterns that suggest not just attention guidance by target features, but also guidance by scene context (e.g., fixations along walls when searching for clocks). Using machine learning and the psychologically meaningful principle of reward, it is possible to learn the visual features used in goal-directed attention control.

20.
J Vis ; 10(14)2010 Dec 29.
Article in English | MEDLINE | ID: mdl-21191133

ABSTRACT

We hypothesize that our ability to track objects through occlusions is mediated by timely assistance from gaze in the form of "rescue saccades"-eye movements to tracked objects that are in danger of being lost due to impending occlusion. Observers tracked 2-4 target sharks (out of 9) for 20 s as they swam through a rendered 3D underwater scene. Targets were either allowed to enter into occlusions (occlusion trials) or not (no-occlusion trials). Tracking accuracy with 2-3 targets was ≥ 92% regardless of target occlusion but dropped to 74% on occlusion trials with four targets (accuracy on no-occlusion trials remained high, at 83%). This pattern was mirrored in the frequency of rescue saccades. Rescue saccades accompanied approximately 50% of the Track 2-3 target occlusions, but only 34% of the Track 4 occlusions. Their frequency also decreased with increasing distance between a target and the nearest other object, suggesting that it is the potential for target confusion that summons a rescue saccade, not occlusion itself. These findings provide evidence for a tracking system that monitors for events that might cause track loss (e.g., occlusions) and requests help from the oculomotor system to resolve these momentary crises. As the number of crises increases with the number of targets, some requests for help go unsatisfied, resulting in degraded tracking.


Subject(s)
Attention/physiology , Depth Perception/physiology , Motion Perception/physiology , Saccades/physiology , Form Perception/physiology , Humans , Photic Stimulation/methods