ABSTRACT
Past research suggests that recognizing scene gist, a viewer's holistic semantic representation of a scene acquired within a single eye fixation, involves purely feed-forward mechanisms. We investigated whether expectations can influence scene categorization. To do this, we embedded target scenes in more ecologically valid, first-person-viewpoint image sequences, along spatiotemporally connected routes (e.g., an office to a parking lot). We manipulated the sequences' spatiotemporal coherence by presenting them either coherently or in random order. Participants identified the category of one target scene in a 10-scene-image rapid serial visual presentation. Categorization accuracy was greater for targets in coherent sequences. Accuracy was also greater for targets with more visually similar primes. In Experiment 2, we investigated whether targets in coherent sequences were more predictable and whether predictable images were identified more accurately in Experiment 1 after accounting for the effect of prime-to-target visual similarity. To do this, we removed targets and had participants predict the category of the missing scene. Images were more accurately predicted in coherent sequences, and both image predictability and prime-to-target visual similarity independently contributed to performance in Experiment 1. To test whether prediction-based facilitation effects were solely due to response bias, participants performed a two-alternative forced-choice task in which they indicated whether the target was an intact or a phase-randomized scene. Critically, predictability of the target category was irrelevant to this task. Nevertheless, results showed that sensitivity, but not response bias, was greater for targets in coherent sequences. Predictions made prior to viewing a scene facilitate scene-gist recognition.
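The distinction between sensitivity and response bias in the final task can be made concrete with a small signal detection sketch. It assumes the standard equal-variance Gaussian model, with d′ as the sensitivity measure and c as the criterion; the abstract does not state which measures were computed. The function name and the example rates are hypothetical.

```python
from statistics import NormalDist


def sensitivity_and_bias(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under an equal-variance Gaussian model.

    Assumes both rates lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa             # separation of signal and noise distributions
    criterion = -0.5 * (z_hit + z_fa)  # 0 means no bias toward either response
    return d_prime, criterion


# Hypothetical rates for illustration only (not data from the study)
coherent = sensitivity_and_bias(0.80, 0.20)
random_order = sensitivity_and_bias(0.70, 0.30)
```

In this made-up case both conditions have a criterion of about zero while d′ is larger for the first, which is the pattern the abstract describes: a change in sensitivity without a change in response bias.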
Subjects
Ocular Fixation , Visual Pattern Recognition , Psychological Recognition , Visual Perception , Adult , Attention , Female , Humans , Computer-Assisted Image Processing , Male , Reaction Time , Ocular Vision , Young Adult
ABSTRACT
We investigated the relative contributions of central versus peripheral vision in scene-gist recognition with panoramic 180° scenes. Experiment 1 used the window/scotoma paradigm of Larson and Loschky (2009). We replicated their findings that peripheral vision was more important for rapid scene categorization, while central vision was more efficient, but those effects were greatly magnified. For example, in comparing our critical radius (which produced equivalent performance with mutually exclusive central and peripheral image regions) to that of Larson and Loschky, our critical radius of 10° had a ratio of central to peripheral image area that was 10 times smaller. Importantly, we found different functional relationships between the radius of centrally versus peripherally presented imagery (or the proportion of centrally versus peripherally presented image area) and scene-categorization sensitivity. For central vision, stimulus discriminability was an inverse function of image radius, while for peripheral vision the relationship was essentially linear. In Experiment 2, we tested the photographic-bias hypothesis that the greater efficiency of central vision for rapid scene categorization was due to more diagnostic information in the center of photographs. We factorially compared the effects of the eccentricity from which imagery was sampled versus the eccentricity at which imagery was presented. The presentation eccentricity effect was roughly 3 times greater than the sampling eccentricity effect, showing that the central-vision efficiency advantage was primarily due to the greater sensitivity of central vision. We discuss our results in terms of the eccentricity-dependent neurophysiology of vision and discuss implications for computationally modeling rapid scene categorization.
Subjects
Visual Pattern Recognition/physiology , Psychological Recognition/physiology , Visual Fields/physiology , Visual Perception/physiology , Adult , Female , Humans , Male , Visual Field Tests , Young Adult
ABSTRACT
Objective: We implemented a gaze-contingent useful field of view paradigm to examine older adult multitasking performance in a simulated driving environment. Background: Multitasking refers to the ability to manage multiple simultaneous streams of information. Recent work suggests that multitasking declines with age, yet the mechanisms supporting these declines are still debated. One possible framework to better understand this phenomenon is the useful field of view, or the area in the visual field where information can be attended and processed. In particular, the useful field of view allows for the discrimination of two competing theories of real-time multitasking, a general interference account and a tunneling account. Methods: Twenty-five older adult subjects completed a useful field of view task that involved discriminating the orientation of lines in gaze-contingent Gabor patches appearing at varying eccentricities (based on distance from the fovea) as they operated a vehicle in a driving simulator. In half of the driving scenarios, subjects also completed an auditory two-back task to manipulate cognitive workload, and during some trials, wind was introduced as a means to alter general driving difficulty. Results: Consistent with prior work, indices of driving performance were sensitive to both wind and workload. Interestingly, we also observed a decline in Gabor patch discrimination accuracy under high cognitive workload regardless of eccentricity, which provides support for a general interference account of multitasking. Conclusion: The results showed that our gaze-contingent useful field of view paradigm was able to successfully examine older adult multitasking performance in a simulated driving environment. Application: This study represents the first attempt to successfully measure dynamic changes in the useful field of view for older adults completing a multitasking scenario involving driving.
Subjects
Aging/physiology , Auditory Perception/physiology , Executive Function/physiology , Eye Movements/physiology , Visual Pattern Recognition/physiology , Psychomotor Performance/physiology , Visual Fields/physiology , Aged , Automobile Driving , Humans
ABSTRACT
Perceiving the visual world around us requires the brain to represent the features of stimuli and to categorize the stimulus based on these features. Incorrect categorization can result either from errors in visual representation or from errors in processes that lead to categorical choice. To understand the temporal relationship between the neural signatures of such systematic errors, we recorded whole-scalp magnetoencephalography (MEG) data from human subjects performing a rapid-scene categorization task. We built scene category decoders based on (1) spatiotemporally resolved neural activity, (2) spatial envelope (SpEn) image features, and (3) behavioral responses. Using confusion matrices, we tracked how well the pattern of errors from neural decoders could be explained by SpEn decoders and behavioral errors, over time and across cortical areas. Across the visual cortex and the medial temporal lobe, we found that both SpEn and behavioral errors explained unique variance in the errors of neural decoders. Critically, these effects were nearly simultaneous, and most prominent between 100 and 250 ms after stimulus onset. Thus, during rapid-scene categorization, neural processes that ultimately result in behavioral categorization are simultaneous and co-localized with neural processes underlying visual information representation.
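The idea of comparing error patterns across confusion matrices can be illustrated with a deliberately simplified sketch. It correlates the off-diagonal (error) cells of two matrices with a plain Pearson correlation; this is a stand-in, not the study's analysis, which the abstract describes in terms of unique explained variance. All function names and the example matrices are hypothetical.

```python
from math import sqrt


def off_diagonal(matrix):
    """Flatten a square confusion matrix, keeping only the error cells."""
    return [v for i, row in enumerate(matrix) for j, v in enumerate(row) if i != j]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def error_pattern_similarity(cm_a, cm_b):
    """Correlate the error cells of two confusion matrices of the same shape."""
    return pearson(off_diagonal(cm_a), off_diagonal(cm_b))


# Hypothetical 3-category confusion matrices (rows: true category, columns: response)
neural = [[8, 1, 1], [2, 7, 1], [0, 3, 7]]
behavior = [[9, 1, 0], [2, 8, 0], [0, 2, 8]]
similarity = error_pattern_similarity(neural, behavior)
```

Excluding the diagonal keeps overall accuracy from dominating the comparison, so the score reflects which categories are confused with which.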
Subjects
Brain Mapping/methods , Cerebral Cortex/physiology , Magnetoencephalography/methods , Visual Pattern Recognition/physiology , Psychological Recognition/physiology , Task Performance and Analysis , Adult , Female , Humans , Male , Nerve Net/physiology , Reaction Time/physiology
ABSTRACT
This study investigated the relative roles of visuospatial versus linguistic working memory (WM) systems in the online generation of bridging inferences while viewers comprehend visual narratives. We contrasted these relative roles in the visuospatial primacy hypothesis versus the shared (visuospatial & linguistic) systems hypothesis, and tested them in 3 experiments. Participants viewed picture stories containing multiple target episodes consisting of a beginning state, a bridging event, and an end state, respectively, and the presence of the bridging event was manipulated. When absent, viewers had to infer the bridging-event action to comprehend the end-state image. A pilot study showed that after viewing the end-state image, participants' think-aloud protocols contained more inferred actions when the bridging event was absent than when it was present. Likewise, Experiment 1 found longer viewing times for the end-state image when the bridging-event image was absent, consistent with viewing times revealing online inference generation processes. Experiment 2 showed that both linguistic and visuospatial WM loads attenuated the inference viewing time effect, consistent with the shared systems hypothesis. Importantly, however, Experiment 3 found that articulatory suppression did not attenuate the inference viewing time effect, indicating that (sub)vocalization did not support online inference generation during visual narrative comprehension. Thus, the results support a shared-systems hypothesis in which both visuospatial and linguistic WM systems support inference generation in visual narratives, with the linguistic WM system operating at a deeper level than (sub)vocalization.
Subjects
Comprehension/physiology , Language , Short-Term Memory/physiology , Visual Perception/physiology , Adult , Female , Humans , Male , Young Adult
ABSTRACT
A fundamental issue in visual attention is the relationship between the useful field of view (UFOV), the region of visual space where information is encoded within a single fixation, and eccentricity. A common assumption is that impairing attentional resources reduces the size of the UFOV (i.e., tunnel vision). However, most research has not accounted for eccentricity-dependent changes in spatial resolution, potentially conflating fixed visual properties with flexible changes in visual attention. Williams (1988, 1989) argued that foveal loads are necessary to reduce the size of the UFOV, producing tunnel vision. Without a foveal load, it is argued that the attentional decrement is constant across the visual field (i.e., general interference). However, other research asserts that auditory working memory (WM) loads produce tunnel vision. To date, foveal versus auditory WM loads have not been compared to determine if they differentially change the size of the UFOV. In two experiments, we tested the effects of a foveal (rotated L vs. T discrimination) task and an auditory WM (N-back) task on an extrafoveal (Gabor) discrimination task. Gabor patches were scaled for size and processing time to produce equal performance across the visual field under single-task conditions, thus removing the confound of eccentricity-dependent differences in visual sensitivity. The results showed that although both foveal and auditory loads reduced Gabor orientation sensitivity, only the foveal load interacted with retinal eccentricity to produce tunnel vision, clearly demonstrating task-specific changes to the form of the UFOV. This has theoretical implications for understanding the UFOV.
Subjects
Attention , Vision Disorders/physiopathology , Ocular Vision/physiology , Visual Fields/physiology , Visual Perception/physiology , Adult , Female , Humans , Male , Young Adult
ABSTRACT
OBJECTIVE: We aimed to develop and test a new dynamic measure of transient changes to the useful field of view (UFOV), utilizing a gaze-contingent paradigm for use in realistic simulated environments. BACKGROUND: The UFOV, the area from which an observer can extract visual information during a single fixation, has been correlated with driving performance and crash risk. However, some existing measures of the UFOV cannot be used dynamically in realistic simulators, and other UFOV measures involve constant stimuli at fixed locations. We propose a gaze-contingent UFOV measure (the GC-UFOV) that solves the above problems. METHODS: Twenty-five participants completed four simulated drives while they concurrently performed an occasional gaze-contingent Gabor orientation discrimination task. Gabors appeared randomly at one of three retinal eccentricities (5°, 10°, or 15°). Cognitive workload was manipulated both with a concurrent auditory working memory task and with driving task difficulty (via presence/absence of lateral wind). RESULTS: Cognitive workload had a detrimental effect on Gabor discrimination accuracy at all three retinal eccentricities. Interestingly, this accuracy cost was equivalent across eccentricities, consistent with previous findings of "general interference" rather than "tunnel vision." CONCLUSION: The results showed that the GC-UFOV method was able to measure transient changes in UFOV due to cognitive load in a realistic simulated environment. APPLICATION: The GC-UFOV paradigm developed and tested in this study is a novel and effective tool for studying transient changes in the UFOV due to cognitive load in the context of complex real-world tasks such as simulated driving.
Subjects
Automobile Driving , Ocular Fixation/physiology , Psychomotor Performance/physiology , Visual Fields/physiology , Adult , Humans
ABSTRACT
Scene gist, a viewer's holistic representation of a scene from a single eye fixation, has been extensively studied for terrestrial views, but not for aerial views. We compared rapid scene categorization of both views in three experiments to determine the degree to which diagnostic information is view dependent versus view independent. We found large differences in observers' ability to rapidly categorize aerial and terrestrial scene views, consistent with the idea that scene gist recognition is viewpoint dependent. In addition, computational modeling showed that training models on one view (aerial or terrestrial) led to poor performance on the other view, thereby providing further evidence of viewpoint dependence as a function of available information. Importantly, we found that rapid categorization of terrestrial views (but not aerial views) was strongly interfered with by image rotation, further suggesting that terrestrial-view scene gist recognition is viewpoint dependent, with aerial-view scene recognition being viewpoint independent. Furthermore, rotation-invariant texture images synthesized from aerial views of scenes were twice as recognizable as those synthesized from terrestrial views of scenes (which were at chance), providing further evidence that diagnostic information for rapid scene categorization of aerial views is viewpoint invariant. We discuss the results within a perceptual-expertise framework that distinguishes between configural and featural processing, where terrestrial views are more effectively processed due to their predictable view-dependent configurations whereas aerial views are processed less effectively due to reliance on view-independent features.
Subjects
Field Dependence-Independence , Visual Pattern Recognition/physiology , Adolescent , Female , Ocular Fixation/physiology , Humans , Male , Perceptual Masking , Space Perception/physiology
ABSTRACT
People spontaneously segment continuous ongoing actions into sequences of events. Prior research found that gaze similarity and pupil dilation increase at event boundaries and that older adults segment more idiosyncratically than do young adults. We used eye tracking to explore age-related differences in gaze similarity (i.e., the extent to which individuals look at the same places at the same time as others) and pupil dilation at event boundaries. Older and young adults watched naturalistic videos of actors performing everyday activities while we tracked their eye movements. Afterward, they segmented the videos into subevents. Replicating prior work, we found that pupil size and gaze similarity increased at event boundaries. Thus, there were fewer individual differences in eye position at boundaries. We also found that young adults had higher gaze similarity than older adults throughout an entire video and at event boundaries. This study is the first to show that age-related differences in how people parse continuous everyday activities into events may be partially explained by individual differences in gaze patterns. Those who segment less normatively may do so because they fixate less normative regions. Results have implications for future interventions designed to improve encoding in older adults. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Subjects
Aging , Eye Movements , Humans , Aged
ABSTRACT
Scene Perception and Event Comprehension Theory (SPECT) posits that understanding picture stories depends upon a coordination of two processes: (1) integrating new information into the current event model that is coherent with it (i.e., mapping) and (2) segmenting experiences into distinct event models (i.e., shifting). In two experiments, we investigated competing hypotheses regarding how viewers coordinate the mapping process of bridging inference generation and the shifting process of event segmentation by manipulating the presence/absence of Bridging Action pictures (i.e., creating coherence gaps) in wordless picture stories. The Computational Effort Hypothesis says that experiencing a coherence gap prompts event segmentation and the additional computational effort to generate bridging inferences. Thus, it predicted a positive relationship between event segmentation and explanations when Bridging Actions were absent. Alternatively, the Coherence Gap Resolution Hypothesis says that experiencing a coherence gap prompts the generation of a bridging inference to close the gap, which obviates segmentation. Thus, it predicted a negative relationship between event segmentation and the production of explanations. Replicating prior work, viewers were more likely to segment and generate explanations when Bridging Action pictures were absent than when they were present. Crucially, the relationship between explanations and segmentation was negative when Bridging Action pictures were absent, consistent with the Coherence Gap Resolution Hypothesis. Unexpectedly, the relationship was positive when Bridging Actions were present. The results are consistent with SPECT's assumption that mapping and shifting processes are coordinated, but how they are coordinated depends upon the experience of a coherence gap.
ABSTRACT
Viewers can recognize the gist of a scene (i.e., its holistic semantic representation, such as its category) in less time than a single fixation, and backward masking has traditionally been employed as a means to determine that time course. The masks used in those paradigms are often characterized by either specific amplitude spectra only, or amplitude and phase spectra-defined structural properties. However, it remains unclear whether there would be a differential contribution of amplitude only or amplitude + phase defined image statistics to the effective backward masking of rapid scene categorization. The current study addresses this issue. Experiments 1-3 explored amplitude spectra defined contributions to category masking and revealed that the slope of the amplitude spectrum was more important for modulating scene category masking strength than amplitude orientation. Further, the masking effects followed an "amplitude spectrum slope similarity principle" whereby the more similar the amplitude spectrum slope of the mask was to the target's amplitude spectrum slope, the stronger the masking. Experiment 5 showed that, when holding mask amplitude spectrum slope approximately constant, both categorically specific unrecognizable amplitude only and amplitude + phase statistical regularities disrupted rapid scene categorization. Specifically, the masking effects observed in Experiment 5 followed a target-mask categorical dissimilarity principle whereby the more dissimilar the mask category is to the target image category, the stronger the masking. Overall, the results support the notion that amplitude only or amplitude + phase-defined image statistics differentially contribute to the effective backward masking of rapid scene gist recognition.
Subjects
Field Dependence-Independence , Form Perception/physiology , Visual Pattern Recognition/physiology , Perceptual Masking/physiology , Adolescent , Adult , Female , Humans , Orientation , Reaction Time , Young Adult
ABSTRACT
When people see political advertisements on a polarized issue they take a stance on, what factors influence how they respond to and remember the advertisements' content? Across three studies, we tested competing hypotheses about how individual differences in social vigilantism (i.e., attitude superiority) and need for cognition relate to intentions to resist attitude change and memory for political advertisements concerning abortion. In Experiments 1 and 2, we examined participants' intentions to use resistance strategies to preserve their pre-existing attitudes about abortion, by either engaging against opposing opinions or disengaging from them. In Experiment 3, we examined participants' memory for information about both sides of the controversy presented in political advertisements. Our results suggest higher levels of social vigilantism are related to greater intentions to counterargue and better memory for attitude-incongruent information. These findings extend our understanding of individual differences in how people process and respond to controversial social and political discourse.
ABSTRACT
Viewers' attentional selection while looking at scenes is affected by both top-down and bottom-up factors. However, when watching film, viewers typically attend to the movie similarly irrespective of top-down factors, a phenomenon we call the tyranny of film. A key difference between still pictures and film is that film contains motion, which is a strong attractor of attention and highly predictive of gaze during film viewing. The goal of the present study was to test if the tyranny of film is driven by motion. To do this, we created a slideshow presentation of the opening scene of Touch of Evil. Context condition participants watched the full slideshow. No-context condition participants did not see the opening portion of the scene, which showed someone placing a time bomb into the trunk of a car. In prior research, we showed that despite producing very different understandings of the clip, this manipulation did not affect viewers' attention (i.e., the tyranny of film), as both context and no-context participants were equally likely to fixate on the car with the bomb when the scene was presented as a film. The current study found that when the scene was shown as a slideshow, the context manipulation produced differences in attentional selection (i.e., it attenuated attentional synchrony). We discuss these results in the context of the Scene Perception and Event Comprehension Theory, which specifies the relationship between event comprehension and attentional selection in the context of visual narratives.
Subjects
Comprehension , Eye Movements , Attention , Humans , Motion Pictures , Motivation , Visual Perception
ABSTRACT
Do refixations serve a rehearsal function in visual working memory (VWM)? We analyzed refixations from observers freely viewing multiobject scenes. An eyetracker was used to limit the viewing of a scene to a specified number of objects fixated after the target (intervening objects), followed by a four-alternative forced choice recognition test. Results showed that the probability of target refixation increased with the number of fixated intervening objects, and these refixations produced a 16% accuracy benefit over the first five intervening-object conditions. Additionally, refixations most frequently occurred after fixations on only one to two other objects, regardless of the intervening-object condition. These behaviors could not be explained by random or minimally constrained computational models; a VWM component was required to completely describe these data. We explain these findings in terms of a monitor-refixate rehearsal system: The activations of object representations in VWM are monitored, with refixations occurring when these activations decrease suddenly.
Subjects
Attention , Ocular Fixation , Short-Term Memory , Visual Pattern Recognition , Psychological Practice , Color Perception , Humans , Intention , Theoretical Models , Psychological Recognition , Space Perception
ABSTRACT
How does viewers' knowledge guide their attention while they watch everyday events, how does it affect their memory, and does it change with age? Older adults have diminished episodic memory for everyday events, but intact semantic knowledge. Indeed, research suggests that older adults may rely on their semantic memory to offset impairments in episodic memory, and when relevant knowledge is lacking, older adults' memory can suffer. Yet, the mechanism by which prior knowledge guides attentional selection when watching dynamic activity is unclear. To address this, we studied the influence of knowledge on attention and memory for everyday events in young and older adults by tracking their eyes while they watched videos. The videos depicted activities that older adults perform more frequently than young adults (balancing a checkbook, planting flowers) or activities that young adults perform more frequently than older adults (installing a printer, setting up a video game). Participants completed free recall, recognition, and order memory tests after each video. We found age-related memory deficits when older adults had little knowledge of the activities, but memory did not differ between age groups when older adults had relevant knowledge and experience with the activities. Critically, results showed that knowledge influenced where viewers fixated when watching the videos. Older adults fixated less goal-relevant information than young adults when watching young adult activities, but they fixated goal-relevant information similarly to young adults when watching older adult activities. Finally, results showed that fixating goal-relevant information predicted free recall of the everyday activities for both age groups. Thus, older adults may use relevant knowledge to more effectively infer the goals of actors, which guides their attention to goal-relevant actions, thus improving their episodic memory for everyday activities.
Subjects
Goals , Episodic Memory , Aged , Aging , Humans , Mental Recall , Psychological Recognition , Young Adult
ABSTRACT
Visual crowding, the impairment of object recognition in peripheral vision due to flanking objects, has generally been studied using simple stimuli on blank backgrounds. While crowding is widely assumed to occur in natural scenes, it has not yet been rigorously shown. Given that scene contexts can facilitate object recognition, crowding effects may be dampened in real-world scenes. Therefore, this study investigated crowding using objects in computer-generated real-world scenes. In two experiments, target objects were presented with four flanker objects placed uniformly around the target. Previous research indicates that crowding occurs when the distance between the target and flanker is less than approximately half the retinal eccentricity of the target. In each image, the spacing between the target and flanker objects was varied considerably above or below the standard (0.5) threshold to either suppress or facilitate the crowding effect. Experiment 1 cued the target location and then briefly flashed the scene image before participants could move their eyes. Participants then selected the target object's category from a 15-alternative forced choice response set (including all objects shown in the scene). Experiment 2 used eye tracking to ensure participants were centrally fixating at the beginning of each trial and showed the image for the duration of the participant's fixation. Both experiments found object recognition accuracy decreased with smaller spacing between targets and flanker objects. Thus, this study rigorously shows crowding of objects in semantically consistent real-world scenes.
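The spacing rule described above (crowding when target-flanker distance is below roughly half the target's eccentricity) can be written as a one-line check. This is a minimal sketch: the function name, the parameter names, and the example values are hypothetical, and the 0.5 factor is the approximate threshold quoted in the abstract rather than a sharp boundary.

```python
def crowding_expected(spacing_deg, eccentricity_deg, threshold_factor=0.5):
    """Return True when target-flanker spacing is below the critical spacing.

    The critical spacing is taken as threshold_factor * eccentricity,
    with both quantities in degrees of visual angle.
    """
    if eccentricity_deg <= 0:
        raise ValueError("eccentricity must be positive")
    return spacing_deg < threshold_factor * eccentricity_deg


# Hypothetical target at 10 degrees eccentricity: critical spacing is 5 degrees
close_flankers = crowding_expected(spacing_deg=2.0, eccentricity_deg=10.0)  # True
far_flankers = crowding_expected(spacing_deg=8.0, eccentricity_deg=10.0)    # False
```

Because the critical spacing scales with eccentricity, the same physical spacing can crowd a target far in the periphery and leave one nearer to fixation unaffected.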
ABSTRACT
Understanding how people comprehend visual narratives (including picture stories, comics, and film) requires the combination of traditionally separate theories that span the initial sensory and perceptual processing of complex visual scenes, the perception of events over time, and comprehension of narratives. Existing piecemeal approaches fail to capture the interplay between these levels of processing. Here, we propose the Scene Perception & Event Comprehension Theory (SPECT), as applied to visual narratives, which distinguishes between front-end and back-end cognitive processes. Front-end processes occur during single eye fixations and are comprised of attentional selection and information extraction. Back-end processes occur across multiple fixations and support the construction of event models, which reflect understanding of what is happening now in a narrative (stored in working memory) and over the course of the entire narrative (stored in long-term episodic memory). We describe relationships between front- and back-end processes, and medium-specific differences that likely produce variation in front-end and back-end processes across media (e.g., picture stories vs. film). We describe several novel research questions derived from SPECT that we have explored. By addressing these questions, we provide greater insight into how attention, information extraction, and event model processes are dynamically coordinated to perceive and understand complex naturalistic visual events in narratives and the real world.
Subjects
Attention/physiology , Cartoons as Topic , Comprehension/physiology , Eye Movements/physiology , Motion Pictures , Narration , Visual Pattern Recognition/physiology , Psychological Theory , Humans
ABSTRACT
Which region of the visual field is most useful for recognizing scene gist, central vision (the fovea and parafovea) based on its higher visual resolution and importance for object recognition, or the periphery, based on resolving lower spatial frequencies useful for scene gist recognition, and its large extent? Scenes were presented in two experimental conditions: a "Window," a circular region showing the central portion of a scene, and blocking peripheral information, or a "Scotoma," which blocks out the central portion of a scene and shows only the periphery. Results indicated the periphery was more useful than central vision for maximal performance (i.e., equal to seeing the entire image). Nevertheless, central vision was more efficient for scene gist recognition than the periphery on a per-pixel basis. A critical radius of 7.4 degrees was found where the Window and Scotoma performance curves crossed, producing equal performance. This value was compared to predicted critical radii from cortical magnification functions on the assumption that equal V1 activation would produce equal performance. However, these predictions were systematically smaller than the empirical critical radius, suggesting that the utility of central vision for gist recognition is less than predicted by V1 cortical magnification.
Subjects
Visual Pattern Recognition/physiology , Visual Fields/physiology , Adolescent , Humans , Neurological Models , Photic Stimulation/methods , Space Perception/physiology , Visual Cortex/physiology , Young Adult
ABSTRACT
What information do people use to categorize scenes? Computational scene classification models have proposed that unlocalized amplitude information, the distribution of spatial frequencies and orientations, is useful for categorizing scenes. Previous research has provided conflicting results regarding this claim. Our previous research (Loschky et al., 2007) has shown that randomly localizing amplitude information (i.e., randomizing phase) greatly disrupts scene categorization at the basic level. Conversely, studies suggesting the usefulness of unlocalized amplitude information have used binary distinctions, e.g., Natural/Man-made. We hypothesized that unlocalized amplitude information contributes more to the Natural/Man-made distinction than basic level distinctions. Using an established set of images and categories, we varied phase randomization and measured participants' ability to distinguish Natural versus Man-made scenes or scenes at the basic level. Results showed that eliminating localized information by phase randomization disrupted scene classification even for the Natural/Man-made distinction, demonstrating that amplitude localization is necessary for scene categorization.
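Phase randomization, the manipulation at the center of this abstract, keeps the amplitude of every frequency component while scrambling where that energy is located. The sketch below shows the idea on a one-dimensional signal using only the standard library. It is an assumption-laden illustration, not the study's procedure: the study used two-dimensional images and varied the degree of randomization, whereas this sketch fully randomizes the phases. All function names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_randomize(signal, rng):
    """Randomize the phases of a real signal while keeping its amplitude spectrum."""
    n = len(signal)
    spectrum = dft(signal)
    scrambled = list(spectrum)
    for k in range(1, (n + 1) // 2):
        phase = rng.uniform(-math.pi, math.pi)
        scrambled[k] = cmath.rect(abs(spectrum[k]), phase)
        # Mirror with the complex conjugate so the inverse transform stays real
        scrambled[n - k] = scrambled[k].conjugate()
    # The DC term (and the Nyquist term for even n) is left unchanged
    return [value.real for value in idft(scrambled)]


original = [0.0, 1.0, 3.0, 2.0, -1.0, 0.5, 2.5, 1.5]
scrambled_signal = phase_randomize(original, random.Random(0))
```

The scrambled signal has the same amplitude at every frequency as the original but a different shape, which is why disrupted categorization after phase randomization points to localized structure rather than the amplitude spectrum alone.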