Pesquisa | BVS IEC

1.

The attentive reconstruction of objects facilitates robust object recognition.

Ahn, Seoyoung; Adeli, Hossein; Zelinsky, Gregory J.

PLoS Comput Biol ; 20(6): e1012159, 2024 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-38870125

RESUMO

Humans are extremely robust in our ability to perceive and recognize objects-we see faces in tea stains and can recognize friends on dark streets. Yet, neurocomputational models of primate object recognition have focused on the initial feed-forward pass of processing through the ventral stream and less on the top-down feedback that likely underlies robust object perception and recognition. Aligned with the generative approach, we propose that the visual system actively facilitates recognition by reconstructing the object hypothesized to be in the image. Top-down attention then uses this reconstruction as a template to bias feedforward processing to align with the most plausible object hypothesis. Building on auto-encoder neural networks, our model makes detailed hypotheses about the appearance and location of the candidate objects in the image by reconstructing a complete object representation from potentially incomplete visual input due to noise and occlusion. The model then leverages the best object reconstruction, measured by reconstruction error, to direct the bottom-up process of selectively routing low-level features, a top-down biasing that captures a core function of attention. We evaluated our model using the MNIST-C (handwritten digits under corruptions) and ImageNet-C (real-world objects under corruptions) datasets. Not only did our model achieve superior performance on these challenging tasks designed to approximate real-world noise and occlusion viewing conditions, but also better accounted for human behavioral reaction times and error patterns than a standard feedforward Convolutional Neural Network. Our model suggests that a complete understanding of object perception and recognition requires integrating top-down and attention feedback, which we propose is an object reconstruction.

Assuntos

Atenção , Redes Neurais de Computação , Reconhecimento Visual de Modelos , Humanos , Atenção/fisiologia , Reconhecimento Visual de Modelos/fisiologia , Biologia Computacional , Modelos Neurológicos , Reconhecimento Psicológico/fisiologia

2.

A brain-inspired object-based attention network for multiobject recognition and visual reasoning.

Adeli, Hossein; Ahn, Seoyoung; Zelinsky, Gregory J.

J Vis ; 23(5): 16, 2023 05 02.

Artigo em Inglês | MEDLINE | ID: mdl-37212782

RESUMO

The visual system uses sequences of selective glimpses to objects to support goal-directed behavior, but how is this attention control learned? Here we present an encoder-decoder model inspired by the interacting bottom-up and top-down visual pathways making up the recognition-attention system in the brain. At every iteration, a new glimpse is taken from the image and is processed through the "what" encoder, a hierarchy of feedforward, recurrent, and capsule layers, to obtain an object-centric (object-file) representation. This representation feeds to the "where" decoder, where the evolving recurrent representation provides top-down attentional modulation to plan subsequent glimpses and impact routing in the encoder. We demonstrate how the attention mechanism significantly improves the accuracy of classifying highly overlapping digits. In a visual reasoning task requiring comparison of two objects, our model achieves near-perfect accuracy and significantly outperforms larger models in generalizing to unseen stimuli. Our work demonstrates the benefits of object-based attention mechanisms taking sequential glimpses of objects.

Assuntos

Encéfalo , Percepção Visual , Humanos , Estimulação Luminosa/métodos , Reconhecimento Psicológico , Resolução de Problemas , Reconhecimento Visual de Modelos

3.

Weighting the factors affecting attention guidance during free viewing and visual search: The unexpected role of object recognition uncertainty.

Chakraborty, Souradeep; Samaras, Dimitris; Zelinsky, Gregory J.

J Vis ; 22(4): 13, 2022 03 02.

Artigo em Inglês | MEDLINE | ID: mdl-35323870

RESUMO

The factors determining how attention is allocated during visual tasks have been studied for decades, but few studies have attempted to model the weighting of several of these factors within and across tasks to better understand their relative contributions. Here we consider the roles of saliency, center bias, target features, and object recognition uncertainty in predicting the first nine changes in fixation made during free viewing and visual search tasks in the OSIE and COCO-Search18 datasets, respectively. We focus on the latter-most and least familiar of these factors by proposing a new method of quantifying uncertainty in an image, one based on object recognition. We hypothesize that the greater the number of object categories competing for an object proposal, the greater the uncertainty of how that object should be recognized and, hence, the greater the need for attention to resolve this uncertainty. As expected, we found that target features best predicted target-present search, with their dominance obscuring the use of other features. Unexpectedly, we found that target features were only weakly used during target-absent search. We also found that object recognition uncertainty outperformed an unsupervised saliency model in predicting free-viewing fixations, although saliency was slightly more predictive of search. We conclude that uncertainty in object recognition, a measure that is image computable and highly interpretable, is better than bottom-up saliency in predicting attention during free viewing.

Assuntos

Percepção Visual , Viés , Humanos , Incerteza

4.

Use of superordinate labels yields more robust and human-like visual representations in convolutional neural networks.

Ahn, Seoyoung; Zelinsky, Gregory J; Lupyan, Gary.

J Vis ; 21(13): 13, 2021 12 01.

Artigo em Inglês | MEDLINE | ID: mdl-34967860

RESUMO

Human visual recognition is outstandingly robust. People can recognize thousands of object classes in the blink of an eye (50-200 ms) even when the objects vary in position, scale, viewpoint, and illumination. What aspects of human category learning facilitate the extraction of invariant visual features for object recognition? Here, we explore the possibility that a contributing factor to learning such robust visual representations may be a taxonomic hierarchy communicated in part by common labels to which people are exposed as part of natural language. We did this by manipulating the taxonomic level of labels (e.g., superordinate-level [mammal, fruit, vehicle] and basic-level [dog, banana, van]), and the order in which these training labels were used during learning by a Convolutional Neural Network. We found that training the model with hierarchical labels yields visual representations that are more robust to image transformations (e.g., position/scale, illumination, noise, and blur), especially when images were first trained with superordinate labels and then fine-tuned with basic labels. We also found that Superordinate-label followed by Basic-label training best predicts functional magnetic resonance imaging responses in visual cortex and behavioral similarity judgments recorded while viewing naturalistic images. The benefits of training with superordinate labels in the earlier stages of category learning is discussed in the context of representational efficiency and generalization.

Assuntos

Reconhecimento Visual de Modelos , Córtex Visual , Humanos , Imageamento por Ressonância Magnética , Redes Neurais de Computação , Estimulação Luminosa

5.

A Model of the Superior Colliculus Predicts Fixation Locations during Scene Viewing and Visual Search.

Adeli, Hossein; Vitu, Françoise; Zelinsky, Gregory J.

J Neurosci ; 37(6): 1453-1467, 2017 02 08.

Artigo em Inglês | MEDLINE | ID: mdl-28039373

RESUMO

Modern computational models of attention predict fixations using saliency maps and target maps, which prioritize locations for fixation based on feature contrast and target goals, respectively. But whereas many such models are biologically plausible, none have looked to the oculomotor system for design constraints or parameter specification. Conversely, although most models of saccade programming are tightly coupled to underlying neurophysiology, none have been tested using real-world stimuli and tasks. We combined the strengths of these two approaches in MASC, a model of attention in the superior colliculus (SC) that captures known neurophysiological constraints on saccade programming. We show that MASC predicted the fixation locations of humans freely viewing naturalistic scenes and performing exemplar and categorical search tasks, a breadth achieved by no other existing model. Moreover, it did this as well or better than its more specialized state-of-the-art competitors. MASC's predictive success stems from its inclusion of high-level but core principles of SC organization: an over-representation of foveal information, size-invariant population codes, cascaded population averaging over distorted visual and motor maps, and competition between motor point images for saccade programming, all of which cause further modulation of priority (attention) after projection of saliency and target maps to the SC. Only by incorporating these organizing brain principles into our models can we fully understand the transformation of complex visual information into the saccade programs underlying movements of overt attention. With MASC, a theoretical footing now exists to generate and test computationally explicit predictions of behavioral and neural responses in visually complex real-world contexts.SIGNIFICANCE STATEMENT The superior colliculus (SC) performs a visual-to-motor transformation vital to overt attention, but existing SC models cannot predict saccades to visually complex real-world stimuli. We introduce a brain-inspired SC model that outperforms state-of-the-art image-based competitors in predicting the sequences of fixations made by humans performing a range of everyday tasks (scene viewing and exemplar and categorical search), making clear the value of looking to the brain for model design. This work is significant in that it will drive new research by making computationally explicit predictions of SC neural population activity in response to naturalistic stimuli and tasks. It will also serve as a blueprint for the construction of other brain-inspired models, helping to usher in the next generation of truly intelligent autonomous systems.

Assuntos

Movimentos Oculares/fisiologia , Modelos Neurológicos , Reconhecimento Visual de Modelos/fisiologia , Estimulação Luminosa/métodos , Colículos Superiores/fisiologia , Percepção Visual/fisiologia , Feminino , Previsões , Humanos , Masculino , Modelos Anatômicos , Colículos Superiores/anatomia & histologia

6.

Occluded information is restored at preview but not during visual search.

Alexander, Robert G; Zelinsky, Gregory J.

J Vis ; 18(11): 4, 2018 10 01.

Artigo em Inglês | MEDLINE | ID: mdl-30347091

RESUMO

Objects often appear with some amount of occlusion. We fill in missing information using local shape features even before attending to those objects-a process called amodal completion. Here we explore the possibility that knowledge about common realistic objects can be used to "restore" missing information even in cases where amodal completion is not expected. We systematically varied whether visual search targets were occluded or not, both at preview and in search displays. Button-press responses were longest when the preview was unoccluded and the target was occluded in the search display. This pattern is consistent with a target-verification process that uses the features visible at preview but does not restore missing information in the search display. However, visual search guidance was weakest whenever the target was occluded in the search display, regardless of whether it was occluded at preview. This pattern suggests that information missing during the preview was restored and used to guide search, thereby resulting in a feature mismatch and poor guidance. If this process were preattentive, as with amodal completion, we should have found roughly equivalent search guidance across all conditions because the target would always be unoccluded or restored, resulting in no mismatch. We conclude that realistic objects are restored behind occluders during search target preview, even in situations not prone to amodal completion, and this restoration does not occur preattentively during search.

Assuntos

Fixação Ocular/fisiologia , Percepção de Forma/fisiologia , Mascaramento Perceptivo/fisiologia , Humanos , Masculino , Percepção Visual/fisiologia , Adulto Jovem

7.

The magnification factor accounts for the greater hypometria and imprecision of larger saccades: Evidence from a parametric human-behavioral study.

Vitu, Françoise; Casteau, Soazig; Adeli, Hossein; Zelinsky, Gregory J; Castet, Eric.

J Vis ; 17(4): 2, 2017 04 01.

Artigo em Inglês | MEDLINE | ID: mdl-28388698

RESUMO

Saccades quite systematically undershoot a peripheral visual target by about 10% of its eccentricity while becoming more variable, mainly in amplitude, as the target becomes more peripheral. This undershoot phenomenon has been interpreted as the strategic adjustment of saccadic gain downstream of the superior colliculus (SC), where saccades are programmed. Here, we investigated whether the eccentricity-related increase in saccades' hypometria and imprecision might not instead result from overrepresentation of space closer to the fovea in the SC and visual-cortical areas. To test this magnification-factor (MF) hypothesis, we analyzed four parametric eye-movement data sets, collected while humans made saccades to single eccentric stimuli. We first established that the undershoot phenomenon generalizes to ordinary saccade amplitudes (0.5°-15°) and directions (0°-90°) and that landing-position distributions become not only increasingly elongated but also more skewed toward the fovea as target eccentricity increases. Moreover, we confirmed the MF hypothesis by showing (a) that the linear eccentricity-related increase in undershoot error and negative skewness canceled out when landing positions were log-scaled according to the MF in monkeys' SC and (b) that the spread, proportional to eccentricity outside an extended, 5°, foveal region, became circular and invariant in size in SC space. Yet the eccentricity-related increase in variability, slower near the fovea, yielded progressively larger and more elongated clusters toward foveal and vertical-meridian SC representations. What causes this latter, unexpected, pattern remains undetermined. Nevertheless, our findings clearly suggest that the undershoot phenomenon, and related variability, originate in, or upstream of, the SC, rather than reflecting downstream, adaptive, strategies.

Assuntos

Movimentos Sacádicos/fisiologia , Colículos Superiores/fisiologia , Percepção Visual/fisiologia , Adolescente , Feminino , Fóvea Central , Humanos , Masculino , Visão Binocular/fisiologia , Adulto Jovem

8.

Searching for Category-Consistent Features: A Computational Approach to Understanding Visual Category Representation.

Yu, Chen-Ping; Maxfield, Justin T; Zelinsky, Gregory J.

Psychol Sci ; 27(6): 870-84, 2016 06.

Artigo em Inglês | MEDLINE | ID: mdl-27142461

RESUMO

This article introduces a generative model of category representation that uses computer vision methods to extract category-consistent features (CCFs) directly from images of category exemplars. The model was trained on 4,800 images of common objects, and CCFs were obtained for 68 categories spanning subordinate, basic, and superordinate levels in a category hierarchy. When participants searched for these same categories, targets cued at the subordinate level were preferentially fixated, but fixated targets were verified faster when they followed a basic-level cue. The subordinate-level advantage in guidance is explained by the number of target-category CCFs, a measure of category specificity that decreases with movement up the category hierarchy. The basic-level advantage in verification is explained by multiplying the number of CCFs by sibling distance, a measure of category distinctiveness. With this model, the visual representations of real-world object categories, each learned from the vast numbers of image exemplars accumulated throughout everyday experience, can finally be studied.

Assuntos

Formação de Conceito/fisiologia , Modelos Teóricos , Reconhecimento Visual de Modelos/fisiologia , Adulto , Humanos , Adulto Jovem

9.

Effects of target typicality on categorical search.

Maxfield, Justin T; Stalder, Westri D; Zelinsky, Gregory J.

J Vis ; 14(12)2014 Oct 01.

Artigo em Inglês | MEDLINE | ID: mdl-25274990

RESUMO

The role of target typicality in a categorical visual search task was investigated by cueing observers with a target name, followed by a five-item target present/absent search array in which the target images were rated in a pretest to be high, medium, or low in typicality with respect to the basic-level target cue. Contrary to previous work, we found that search guidance was better for high-typicality targets compared to low-typicality targets, as measured by both the proportion of immediate target fixations and the time to fixate the target. Consistent with previous work, we also found an effect of typicality on target verification times, the time between target fixation and the search judgment; as target typicality decreased, verification times increased. To model these typicality effects, we trained Support Vector Machine (SVM) classifiers on the target categories, and tested these on the corresponding specific targets used in the search task. This analysis revealed significant differences in classifier confidence between the high-, medium-, and low-typicality groups, paralleling the behavioral results. Collectively, these findings suggest that target typicality broadly affects both search guidance and verification, and that differences in typicality can be predicted by distance from an SVM classification boundary.

Assuntos

Sinais (Psicologia) , Movimentos Oculares/fisiologia , Análise de Variância , Fixação Ocular/fisiologia , Humanos , Julgamento/fisiologia , Reconhecimento Visual de Modelos/fisiologia , Mascaramento Perceptivo/fisiologia , Estimulação Luminosa/métodos

10.

Modeling visual clutter perception using proto-object segmentation.

Yu, Chen-Ping; Samaras, Dimitris; Zelinsky, Gregory J.

J Vis ; 14(7)2014 Jun 05.

Artigo em Inglês | MEDLINE | ID: mdl-24904121

RESUMO

We introduce the proto-object model of visual clutter perception. This unsupervised model segments an image into superpixels, then merges neighboring superpixels that share a common color cluster to obtain proto-objects-defined here as spatially extended regions of coherent features. Clutter is estimated by simply counting the number of proto-objects. We tested this model using 90 images of realistic scenes that were ranked by observers from least to most cluttered. Comparing this behaviorally obtained ranking to a ranking based on the model clutter estimates, we found a significant correlation between the two (Spearman's ρ = 0.814, p < 0.001). We also found that the proto-object model was highly robust to changes in its parameters and was generalizable to unseen images. We compared the proto-object model to six other models of clutter perception and demonstrated that it outperformed each, in some cases dramatically. Importantly, we also showed that the proto-object model was a better predictor of clutter perception than an actual count of the number of objects in the scenes, suggesting that the set size of a scene may be better described by proto-objects than objects. We conclude that the success of the proto-object model is due in part to its use of an intermediate level of visual representation-one between features and objects-and that this is evidence for the potential importance of a proto-object representation in many common visual percepts and tasks.

Assuntos

Atenção/fisiologia , Simulação por Computador , Aglomeração , Percepção Visual/fisiologia , Adolescente , Adulto , Movimentos Oculares/fisiologia , Humanos , Adulto Jovem

11.

More target features in visual working memory leads to poorer search guidance: evidence from contralateral delay activity.

Schmidt, Joseph; MacNamara, Annmarie; Proudfit, Greg Hajcak; Zelinsky, Gregory J.

J Vis ; 14(3): 8, 2014 Mar 05.

Artigo em Inglês | MEDLINE | ID: mdl-24599946

RESUMO

The visual-search literature has assumed that the top-down target representation used to guide search resides in visual working memory (VWM). We directly tested this assumption using contralateral delay activity (CDA) to estimate the VWM load imposed by the target representation. In Experiment 1, observers previewed four photorealistic objects and were cued to remember the two objects appearing to the left or right of central fixation; Experiment 2 was identical except that observers previewed two photorealistic objects and were cued to remember one. CDA was measured during a delay following preview offset but before onset of a four-object search array. One of the targets was always present, and observers were asked to make an eye movement to it and press a button. We found lower magnitude CDA on trials when the initial search saccade was directed to the target (strong guidance) compared to when it was not (weak guidance). This difference also tended to be larger shortly before search-display onset and was largely unaffected by VWM item-capacity limits or number of previews. Moreover, the difference between mean strong- and weak-guidance CDA was proportional to the increase in search time between mean strong-and weak-guidance trials (as measured by time-to-target and reaction-time difference scores). Contrary to most search models, our data suggest that trials resulting in the maintenance of more target features results in poor search guidance to a target. We interpret these counterintuitive findings as evidence for strong search guidance using a small set of highly discriminative target features that remain after pruning from a larger set of features, with the load imposed on VWM varying with this feature-consolidation process.

Assuntos

Sinais (Psicologia) , Memória de Curto Prazo/fisiologia , Reconhecimento Visual de Modelos/fisiologia , Percepção Visual/fisiologia , Adulto , Movimentos Oculares/fisiologia , Humanos , Tempo de Reação , Adulto Jovem

12.

Evaluating theories of neural information integration during visual search.

Leite, Abe; Adeli, Hossein; McPeek, Robert M; Zelinsky, Gregory J.

bioRxiv ; 2024 Jul 05.

Artigo em Inglês | MEDLINE | ID: mdl-39005469

RESUMO

The brain routes and integrates information from many sources during behavior. A number of models explain this phenomenon within the framework of mixed selectivity theory, yet it is difficult to compare their predictions to understand how neurons and circuits integrate information. In this work, we apply time-series partial information decomposition [PID] to compare models of integration on a dataset of superior colliculus [SC] recordings collected during a multi-target visual search task. On this task, SC must integrate target guidance, bottom-up salience, and previous fixation signals to drive attention. We find evidence that SC neurons integrate these factors in diverse ways, including decision-variable selectivity to expected value, functional specialization to previous fixation, and code-switching (to incorporate new visual input).

13.

Eye can read your mind: decoding gaze fixations to reveal categorical search targets.

Zelinsky, Gregory J; Peng, Yifan; Samaras, Dimitris.

J Vis ; 13(14)2013 Dec 12.

Artigo em Inglês | MEDLINE | ID: mdl-24338446

RESUMO

Is it possible to infer a person's goal by decoding their fixations on objects? Two groups of participants categorically searched for either a teddy bear or butterfly among random category distractors, each rated as high, medium, or low in similarity to the target classes. Target-similar objects were preferentially fixated in both search tasks, demonstrating information about target category in looking behavior. Different participants then viewed the searchers' scanpaths, superimposed over the target-absent displays, and attempted to decode the target category (bear/butterfly). Bear searchers were classified perfectly; butterfly searchers were classified at 77%. Bear and butterfly Support Vector Machine (SVM) classifiers were also used to decode the same preferentially fixated objects and found to yield highly comparable classification rates. We conclude that information about a person's search goal exists in fixation behavior, and that this information can be behaviorally decoded to reveal a search target-essentially reading a person's mind by analyzing their fixations.

Assuntos

Fixação Ocular/fisiologia , Reconhecimento Visual de Modelos/fisiologia , Inteligência Artificial , Movimentos Oculares , Feminino , Humanos , Masculino

14.

Modeling guidance and recognition in categorical search: bridging human and computer object detection.

Zelinsky, Gregory J; Peng, Yifan; Berg, Alexander C; Samaras, Dimitris.

J Vis ; 13(3): 30, 2013 Oct 08.

Artigo em Inglês | MEDLINE | ID: mdl-24105460

RESUMO

Search is commonly described as a repeating cycle of guidance to target-like objects, followed by the recognition of these objects as targets or distractors. Are these indeed separate processes using different visual features? We addressed this question by comparing observer behavior to that of support vector machine (SVM) models trained on guidance and recognition tasks. Observers searched for a categorically defined teddy bear target in four-object arrays. Target-absent trials consisted of random category distractors rated in their visual similarity to teddy bears. Guidance, quantified as first-fixated objects during search, was strongest for targets, followed by target-similar, medium-similarity, and target-dissimilar distractors. False positive errors to first-fixated distractors also decreased with increasing dissimilarity to the target category. To model guidance, nine teddy bear detectors, using features ranging in biological plausibility, were trained on unblurred bears then tested on blurred versions of the same objects appearing in each search display. Guidance estimates were based on target probabilities obtained from these detectors. To model recognition, nine bear/nonbear classifiers, trained and tested on unblurred objects, were used to classify the object that would be fixated first (based on the detector estimates) as a teddy bear or a distractor. Patterns of categorical guidance and recognition accuracy were modeled almost perfectly by an HMAX model in combination with a color histogram feature. We conclude that guidance and recognition in the context of search are not separate processes mediated by different features, and that what the literature knows as guidance is really recognition performed on blurred objects viewed in the visual periphery.

Assuntos

Processamento de Imagem Assistida por Computador , Reconhecimento Visual de Modelos/fisiologia , Movimentos Oculares/fisiologia , Humanos , Tempo de Reação

15.

Are all real-world objects created equal? Estimating the "set-size" of the search target in visual working memory.

Miuccio, Michael T; Zelinsky, Gregory J; Schmidt, Joseph.

Psychophysiology ; 59(4): e13998, 2022 04.

Artigo em Inglês | MEDLINE | ID: mdl-35001411

RESUMO

Are all real-world objects created equal? Visual search difficulty increases with the number of targets and as target-related visual working memory (VWM) load increases. Our goal was to investigate the load imposed by individual real-world objects held in VWM in the context of search. Measures of visual clutter attempt to quantify real-world set-size in the context of scenes. We applied one of these measures, the number of proto-objects, to individual real-world objects and used contralateral delay activity (CDA) to measure the resulting VWM load. The current study presented a real-world object as a target cue, followed by a delay where CDA was measured. This was followed by a four-object search array. We compared CDA and later search performance from target cues containing a high or low number of proto-objects. High proto-object target cues resulted in greater CDA, longer search RTs, target dwell times, and reduced search guidance, relative to low proto-object targets. These findings demonstrate that targets with more proto-objects result in a higher VWM load and reduced search performance. This shows that the number of proto-objects contained within individual objects produce set-size like effects in VWM and suggests proto-objects may be a viable unit of measure of real-world VWM load. Importantly, this demonstrates that not all real-world objects are created equal.

Assuntos

Potenciais Evocados , Memória de Curto Prazo , Sinais (Psicologia) , Humanos , Percepção Visual

16.

Target-absent Human Attention.

Yang, Zhibo; Mondal, Sounak; Ahn, Seoyoung; Zelinsky, Gregory; Hoai, Minh; Samaras, Dimitris.

Comput Vis ECCV ; 13664: 52-68, 2022 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-38144433

RESUMO

The prediction of human gaze behavior is important for building human-computer interaction systems that can anticipate the user's attention. Computer vision models have been developed to predict the fixations made by people as they search for target objects. But what about when the target is not in the image? Equally important is to know how people search when they cannot find a target, and when they would stop searching. In this paper, we propose a data-driven computational model that addresses the search-termination problem and predicts the scanpath of search fixations made by people searching for targets that do not appear in images. We model visual search as an imitation learning problem and represent the internal knowledge that the viewer acquires through fixations using a novel state representation that we call Foveated Feature Maps (FFMs). FFMs integrate a simulated foveated retina into a pretrained ConvNet that produces an in-network feature pyramid, all with minimal computational overhead. Our method integrates FFMs as the state representation in inverse reinforcement learning. Experimentally, we improve the state of the art in predicting human target-absent search behavior on the COCO-Search18 dataset. Code is available at: https://github.com/cvlab-stonybrook/Target-absent-Human-Attention.

17.

Do object refixations during scene viewing indicate rehearsal in visual working memory?

Zelinsky, Gregory J; Loschky, Lester C; Dickinson, Christopher A.

Mem Cognit ; 39(4): 600-13, 2011 May.

Artigo em Inglês | MEDLINE | ID: mdl-21264590

RESUMO

Do refixations serve a rehearsal function in visual working memory (VWM)? We analyzed refixations from observers freely viewing multiobject scenes. An eyetracker was used to limit the viewing of a scene to a specified number of objects fixated after the target (intervening objects), followed by a four-alternative forced choice recognition test. Results showed that the probability of target refixation increased with the number of fixated intervening objects, and these refixations produced a 16% accuracy benefit over the first five intervening-object conditions. Additionally, refixations most frequently occurred after fixations on only one to two other objects, regardless of the intervening-object condition. These behaviors could not be explained by random or minimally constrained computational models; a VWM component was required to completely describe these data. We explain these findings in terms of a monitor-refixate rehearsal system: The activations of object representations in VWM are monitored, with refixations occurring when these activations decrease suddenly.

Assuntos

Atenção , Fixação Ocular , Memória de Curto Prazo , Reconhecimento Visual de Modelos , Prática Psicológica , Percepção de Cores , Humanos , Intenção , Modelos Teóricos , Reconhecimento Psicológico , Percepção Espacial

18.

Cutting through the clutter: searching for targets in evolving complex scenes.

Neider, Mark B; Zelinsky, Gregory J.

J Vis ; 11(14)2011 Dec 07.

Artigo em Inglês | MEDLINE | ID: mdl-22159628

RESUMO

We evaluated the use of visual clutter as a surrogate measure of set size effects in visual search by comparing the effects of subjective clutter (determined by independent raters) and objective clutter (as quantified by edge count and feature congestion) using "evolving" scenes, ones that varied incrementally in clutter while maintaining their semantic continuity. Observers searched for a target building in rural, suburban, and urban city scenes created using the game SimCity. Stimuli were 30 screenshots obtained for each scene type as the city evolved over time. Reaction times and search guidance (measured by scan path ratio) were fastest/strongest for sparsely cluttered rural scenes, slower/weaker for more cluttered suburban scenes, and slowest/weakest for highly cluttered urban scenes. Subjective within-city clutter estimates also increased as each city matured and correlated highly with RT and search guidance. However, multiple regression modeling revealed that adding objective estimates failed to better predict search performance over the subjective estimates alone. This suggests that within-city clutter may not be explained exclusively by low-level feature congestion; conceptual congestion (e.g., the number of different types of buildings in a scene), part of the subjective clutter measure, may also be important in determining the effects of clutter on search.

Assuntos

Atenção/fisiologia , Movimentos Oculares/fisiologia , Reconhecimento Visual de Modelos/fisiologia , Humanos , Estimulação Luminosa

19.

Visual similarity effects in categorical search.

Alexander, Robert G; Zelinsky, Gregory J.

J Vis ; 11(8)2011 Jul 14.

Artigo em Inglês | MEDLINE | ID: mdl-21757505

RESUMO

We asked how visual similarity relationships affect search guidance to categorically defined targets (no visual preview). Experiment 1 used a web-based task to collect visual similarity rankings between two target categories, teddy bears and butterflies, and random-category objects, from which we created search displays in Experiment 2 having either high-similarity distractors, low-similarity distractors, or "mixed" displays with high-, medium-, and low-similarity distractors. Analysis of target-absent trials revealed faster manual responses and fewer fixated distractors on low-similarity displays compared to high-similarity displays. On mixed displays, first fixations were more frequent on high-similarity distractors (bear = 49%; butterfly = 58%) than on low-similarity distractors (bear = 9%; butterfly = 12%). Experiment 3 used the same high/low/mixed conditions, but now these conditions were created using similarity estimates from a computer vision model that ranked objects in terms of color, texture, and shape similarity. The same patterns were found, suggesting that categorical search can indeed be guided by purely visual similarity. Experiment 4 compared cases where the model and human rankings differed and when they agreed. We found that similarity effects were best predicted by cases where the two sets of rankings agreed, suggesting that both human visual similarity rankings and the computer vision model captured features important for guiding search to categorical targets.

Assuntos

Inteligência Artificial , Atenção/fisiologia , Movimentos Oculares/fisiologia , Percepção de Forma/fisiologia , Reconhecimento Visual de Modelos/fisiologia , Fixação Ocular/fisiologia , Humanos , Estimulação Luminosa/métodos , Psicofísica

20.

COCO-Search18 fixation dataset for predicting goal-directed attention control.

Chen, Yupei; Yang, Zhibo; Ahn, Seoyoung; Samaras, Dimitris; Hoai, Minh; Zelinsky, Gregory.

Sci Rep ; 11(1): 8776, 2021 04 22.

Artigo em Inglês | MEDLINE | ID: mdl-33888734

RESUMO

Attention control is a basic behavioral process that has been studied for decades. The currently best models of attention control are deep networks trained on free-viewing behavior to predict bottom-up attention control - saliency. We introduce COCO-Search18, the first dataset of laboratory-quality goal-directed behavior large enough to train deep-network models. We collected eye-movement behavior from 10 people searching for each of 18 target-object categories in 6202 natural-scene images, yielding [Formula: see text] 300,000 search fixations. We thoroughly characterize COCO-Search18, and benchmark it using three machine-learning methods: a ResNet50 object detector, a ResNet50 trained on fixation-density maps, and an inverse-reinforcement-learning model trained on behavioral search scanpaths. Models were also trained/tested on images transformed to approximate a foveated retina, a fundamental biological constraint. These models, each having a different reliance on behavioral training, collectively comprise the new state-of-the-art in predicting goal-directed search fixations. Our expectation is that future work using COCO-Search18 will far surpass these initial efforts, finding applications in domains ranging from human-computer interactive systems that can anticipate a person's intent and render assistance to the potentially early identification of attention-related clinical disorders (ADHD, PTSD, phobia) based on deviation from neurotypical fixation behavior.

Assuntos

Atenção , Fixação Ocular , Objetivos , Conjuntos de Dados como Assunto , Aprendizado Profundo , Humanos , Sistemas Homem-Máquina

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA