Results 1 - 20 of 35
1.
Proc Natl Acad Sci U S A ; 120(40): e2211179120, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37769256

ABSTRACT

In modeling vision, there has been remarkable progress in recognizing a range of scene components, but the problem of analyzing full scenes, an ultimate goal of visual perception, is still largely open. To deal with complete scenes, recent work has focused on training models to extract the full graph-like structure of a scene. In contrast with scene graphs, humans' scene perception focuses on selected structures in the scene, starting with a limited interpretation and evolving sequentially in a goal-directed manner [G. L. Malcolm, I. I. A. Groen, C. I. Baker, Trends Cogn. Sci. 20, 843-856 (2016)]. Guidance is crucial throughout scene interpretation since the extraction of a full scene representation is often infeasible. Here, we present a model that performs human-like guided scene interpretation, using iterative bottom-up, top-down processing in a "counterstream" structure motivated by cortical circuitry. The process proceeds by the sequential application of top-down instructions that guide the interpretation process. The results show how scene structures of interest to the viewer are extracted by an automatically selected sequence of top-down instructions. The model shows two further benefits. One is an inherent capability to deal well with the problem of combinatorial generalization, that is, generalizing broadly to unseen scene configurations, which is limited in current network models [B. Lake, M. Baroni, 35th International Conference on Machine Learning, ICML 2018 (2018)]. The second is the ability to combine visual with nonvisual information at each cycle of the interpretation process, which is a key aspect for modeling human perception as well as advancing AI vision systems.


Subjects
Motivation; Visual Perception; Humans; Photic Stimulation/methods; Pattern Recognition, Visual
2.
Proc Natl Acad Sci U S A ; 119(20): e2117184119, 2022 05 17.
Article in English | MEDLINE | ID: mdl-35549552

ABSTRACT

Gaze understanding, a suggested precursor for understanding others' intentions, requires recovery of gaze direction from the observed person's head and eye position. This challenging computation is naturally acquired in infancy without explicit external guidance, but can it be learned later if vision is extremely poor throughout early childhood? We addressed this question by studying gaze following in Ethiopian patients with early bilateral congenital cataracts diagnosed and treated by us only in late childhood. This sight restoration provided a unique opportunity to directly address basic issues on the roles of "nature" and "nurture" in development, as it caused a selective perturbation to the natural process, eliminating some gaze-direction cues while leaving others still available. Following surgery, the patients' visual acuity typically improved substantially, allowing discrimination of pupil position in the eye. Yet, the patients failed to show eye gaze-following effects and fixated less than controls on the eyes, two spontaneous behaviors typically seen in controls. Our model for unsupervised learning of gaze direction explains how head-based gaze following can develop under severe image blur, resembling preoperative conditions. It also suggests why, despite acquiring sufficient resolution to extract eye position, automatic eye gaze following is not established after surgery due to lack of detailed early visual experience. We suggest that visual skills acquired in infancy in an unsupervised manner will be difficult or impossible to acquire when internal guidance is no longer available, even when sufficient image resolution for the task is restored. This creates fundamental barriers to spontaneous vision recovery following prolonged deprivation at an early age.


Subjects
Fixation, Ocular; Vision, Ocular; Attention; Blindness; Child; Humans; Visual Acuity
3.
Proc Natl Acad Sci U S A ; 118(34), 2021 08 24.
Article in English | MEDLINE | ID: mdl-34417308

ABSTRACT

Natural vision is a dynamic and continuous process. Under natural conditions, visual object recognition typically involves continuous interactions between ocular motion and visual contrasts, resulting in dynamic retinal activations. In order to identify the dynamic variables that participate in this process and are relevant for image recognition, we used a set of images that are just above and below the human recognition threshold and whose recognition typically requires >2 s of viewing. We recorded eye movements of participants while attempting to recognize these images within trials lasting 3 s. We then assessed the activation dynamics of retinal ganglion cells resulting from ocular dynamics using a computational model. We found that while the saccadic rate was similar between recognized and unrecognized trials, the fixational ocular speed was significantly larger for unrecognized trials. Interestingly, however, retinal activation level was significantly lower during these unrecognized trials. We used retinal activation patterns and oculomotor parameters of each fixation to train a binary classifier, classifying recognized from unrecognized trials. Only retinal activation patterns could predict recognition, reaching 80% correct classifications on the fourth fixation (on average, ∼2.5 s from trial onset). We thus conclude that the information that is relevant for visual perception is embedded in the dynamic interactions between the oculomotor sequence and the image. Hence, our results suggest that ocular dynamics play an important role in recognition and that understanding the dynamics of retinal activation is crucial for understanding natural vision.


Subjects
Fixation, Ocular; Retina/physiology; Visual Perception/physiology; Adult; Female; Humans; Male; Pilot Projects; Saccades; Young Adult
4.
J Cogn Neurosci ; 31(9): 1354-1367, 2019 09.
Article in English | MEDLINE | ID: mdl-31059350

ABSTRACT

Visual object recognition is performed effortlessly by humans notwithstanding the fact that it requires a series of complex computations, which are, as yet, not well understood. Here, we tested a novel account of the representations used for visual recognition and their neural correlates using fMRI. The rationale is based on previous research showing that a set of representations, termed "minimal recognizable configurations" (MIRCs), which are computationally derived and have unique psychophysical characteristics, serve as the building blocks of object recognition. We contrasted the BOLD responses elicited by three types of stimuli: MIRC images derived from different categories (faces, objects, and places); sub-MIRCs, which are visually similar to MIRCs but result in poor recognition; and scrambled, unrecognizable images. Stimuli were presented in blocks, and participants indicated yes/no recognition for each image. We confirmed that MIRCs elicited higher recognition performance compared to sub-MIRCs for all three categories. Whereas fMRI activation in early visual cortex for both MIRCs and sub-MIRCs of each category did not differ from that elicited by scrambled images, high-level visual regions exhibited overall greater activation for MIRCs compared to sub-MIRCs or scrambled images. Moreover, MIRCs and sub-MIRCs from each category elicited enhanced activation in corresponding category-selective regions including fusiform face area and occipital face area (faces), lateral occipital cortex (objects), and parahippocampal place area and transverse occipital sulcus (places). These findings reveal the psychological and neural relevance of MIRCs and enable us to make progress in developing a more complete account of object recognition.


Subjects
Pattern Recognition, Visual/physiology; Recognition, Psychology/physiology; Visual Cortex/physiology; Adult; Brain/physiology; Brain Mapping; Female; Humans; Magnetic Resonance Imaging; Male; Photic Stimulation; Young Adult
5.
Proc Natl Acad Sci U S A ; 113(10): 2744-9, 2016 Mar 08.
Article in English | MEDLINE | ID: mdl-26884200

ABSTRACT

Discovering the visual features and representations used by the brain to recognize objects is a central problem in the study of vision. Recently, neural network models of visual object recognition, including biological and deep network models, have shown remarkable progress and have begun to rival human performance in some challenging tasks. These models are trained on image examples and learn to extract features and representations and to use them for categorization. It remains unclear, however, whether the representations and learning processes discovered by current models are similar to those used by the human visual system. Here we show, by introducing and using minimal recognizable images, that the human visual system uses features and processes that are not used by current models and that are critical for recognition. We found by psychophysical studies that at the level of minimal recognizable images a minute change in the image can have a drastic effect on recognition, thus identifying features that are critical for the task. Simulations then showed that current models cannot explain this sensitivity to precise feature configurations and, more generally, do not learn to recognize minimal images at a human level. The role of the features shown here is revealed uniquely at the minimal level, where the contribution of each feature is essential. A full understanding of the learning and use of such features will extend our understanding of visual recognition and its cortical mechanisms and will enhance the capacity of computational models to learn from visual experience and to deal with recognition and detailed image interpretation.


Subjects
Neural Networks, Computer; Pattern Recognition, Visual/physiology; Vision, Ocular/physiology; Visual Perception/physiology; Brain/physiology; Humans; Models, Neurological; Nerve Net/physiology; Photic Stimulation; Psychophysics/methods; Visual Cortex/physiology; Visual Pathways/physiology
6.
Proc Natl Acad Sci U S A ; 109(44): 18215-20, 2012 Oct 30.
Article in English | MEDLINE | ID: mdl-23012418

ABSTRACT

Early in development, infants learn to solve visual problems that are highly challenging for current computational methods. We present a model that deals with two fundamental problems in which the gap between computational difficulty and infant learning is particularly striking: learning to recognize hands and learning to recognize gaze direction. The model is shown a stream of natural videos and learns without any supervision to detect human hands by appearance and by context, as well as direction of gaze, in complex natural scenes. The algorithm is guided by an empirically motivated innate mechanism: the detection of "mover" events in dynamic images, which are the events of a moving image region causing a stationary region to move or change after contact. Mover events provide an internal teaching signal, which is shown to be more effective than alternative cues and sufficient for the efficient acquisition of hand and gaze representations. The implications go beyond the specific tasks, by showing how domain-specific "proto concepts" can guide the system to acquire meaningful concepts, which are significant to the observer but statistically inconspicuous in the sensory input.


Subjects
Visual Perception; Hand; Humans; Task Performance and Analysis
7.
Nature ; 438(7065): E3; discussion E3-4, 2005 Nov 10.
Article in English | MEDLINE | ID: mdl-16280984

ABSTRACT

Any analysis of plastic reorganization at a neuronal locus needs a veridical measure of changes in the functional output, that is, the spiking responses of the neurons in question. In a study of the effect of retinal lesions on adult primary visual cortex (V1), Smirnakis et al. propose that there is no cortical reorganization. Their results are based, however, on BOLD (blood-oxygen-level-dependent) fMRI (functional magnetic resonance imaging), which provides an unreliable gauge of spiking activity. We therefore question their criterion for lack of plasticity, particularly in the light of the large body of earlier work that demonstrates cortical plasticity.


Subjects
Macaca/physiology; Neuronal Plasticity/physiology; Visual Cortex/physiology; Action Potentials/physiology; Adult; Animals; Humans; Magnetic Resonance Imaging; Reproducibility of Results; Retina/injuries; Retina/physiology
8.
Proc Natl Acad Sci U S A ; 105(38): 14298-303, 2008 Sep 23.
Article in English | MEDLINE | ID: mdl-18796607

ABSTRACT

The human visual system recognizes objects and their constituent parts rapidly and with high accuracy. Standard models of recognition by the visual cortex use feed-forward processing, in which an object's parts are detected before the complete object. However, parts are often ambiguous on their own and require the prior detection and localization of the entire object. We show how a cortical-like hierarchy obtains recognition and localization of objects and parts at multiple levels nearly simultaneously by a single feed-forward sweep from low to high levels of the hierarchy, followed by a feedback sweep from high- to low-level areas.


Subjects
Cerebral Cortex/physiology; Models, Neurological; Pattern Recognition, Visual/physiology; Feedback, Physiological; Humans; Photic Stimulation/methods
9.
J Vis ; 11(8): 18, 2011 Jul 28.
Article in English | MEDLINE | ID: mdl-21799022

ABSTRACT

Visual expertise is usually defined as the superior ability to distinguish between exemplars of a homogeneous category. Here, we ask how real-world expertise manifests at basic-level categorization and assess the contribution of stimulus-driven and top-down knowledge-based factors to this manifestation. Car experts and novices categorized computer-selected image fragments of cars, airplanes, and faces. Within each category, the fragments varied in their mutual information (MI), an objective quantifiable measure of feature diagnosticity. Categorization of face and airplane fragments was similar within and between groups, showing better performance with increasing MI levels. Novices categorized car fragments more slowly than face and airplane fragments, while experts categorized car fragments as fast as face and airplane fragments. The experts' advantage with car fragments was similar across MI levels, with similar functions relating RT with MI level for both groups. Accuracy was equal between groups for cars as well as faces and airplanes, but experts' response criteria were biased toward cars. These findings suggest that expertise does not entail only specific perceptual strategies. Rather, at the basic level, expertise manifests as a general processing advantage arguably involving application of top-down mechanisms, such as knowledge and attention, which helps experts to distinguish between object categories.


Subjects
Choice Behavior; Discrimination, Psychological/physiology; Learning/physiology; Recognition, Psychology/physiology; Visual Perception/physiology; Face; Humans; Photic Stimulation/methods
10.
Sci Rep ; 11(1): 7827, 2021 04 09.
Article in English | MEDLINE | ID: mdl-33837223

ABSTRACT

Humans recognize individual faces regardless of variation in the facial view. The view-tuned face neurons in the inferior temporal (IT) cortex are regarded as the neural substrate for view-invariant face recognition. This study approximated visual features encoded by these neurons as combinations of local orientations and colors, originating from natural image fragments. The resultant features reproduced the preference of these neurons for particular facial views. We also found that faces of one identity were separable from the faces of other identities in a space where each axis represented one of these features. These results suggested that view-invariant face representation was established by combining view-sensitive visual features. The face representation with these features suggested that, with respect to view-invariant face representation, the seemingly complex and deeply layered ventral visual pathway can be approximated via a shallow network, comprised of layers of low-level processing for local orientations and colors (V1/V2-level) and the layers which detect particular sets of low-level elements derived from natural image fragments (IT-level).


Subjects
Facial Recognition/physiology; Recognition, Psychology/physiology; Temporal Lobe/physiology; Visual Cortex/physiology; Visual Pathways/physiology; Animals; Brain Mapping; Face; Macaca fuscata; Nerve Net/physiology; Neurons/physiology
11.
Cognition ; 201: 104263, 2020 08.
Article in English | MEDLINE | ID: mdl-32325309

ABSTRACT

Objects and their parts can be visually recognized from purely spatial or purely temporal information but the mechanisms integrating space and time are poorly understood. Here we show that visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues in configurations where each source on its own is insufficient for recognition. This analysis is obtained by identifying minimal videos: these are short and tiny video clips in which objects, parts, and actions can be reliably recognized, but any reduction in either space or time makes them unrecognizable. Human recognition in minimal videos is invariably accompanied by full interpretation of the internal components of the video. State-of-the-art deep convolutional networks for dynamic recognition cannot replicate human behavior in these configurations. The gap between human and machine vision demonstrated here is due to critical mechanisms for full spatiotemporal interpretation that are lacking in current computational models.


Subjects
Recognition, Psychology; Vision, Ocular; Humans
12.
Cognition ; 183: 67-81, 2019 02.
Article in English | MEDLINE | ID: mdl-30419508

ABSTRACT

Rapid developments in the fields of learning and object recognition have been obtained by successfully developing and using methods for learning from a large number of labeled image examples. However, such current methods cannot explain infants' learning of new concepts based on their visual experience, in particular, the ability to learn complex concepts without external guidance, as well as the natural order in which related concepts are acquired. A remarkable example of early visual learning is the category of 'containers' and the notion of 'containment'. Surprisingly, this is one of the earliest spatial relations to be learned, starting already around 3 months of age, and preceding other common relations (e.g., 'support', 'in-between'). In this work we present a model, which explains infants' capacity of learning 'containment' and related concepts by 'just looking', together with their empirical development trajectory. Learning occurs in the model fast and without external guidance, relying only on perceptual processes that are present in the first months of life. Instead of labeled training examples, the system provides its own internal supervision to guide the learning process. We show how the detection of so-called 'paradoxical occlusion' provides natural internal supervision, which guides the system to gradually acquire a range of useful containment-related concepts. Similar mechanisms of using implicit internal supervision can have broad application in other cognitive domains as well as artificial intelligent systems, because they alleviate the need for supplying extensive external supervision, and because they can guide the learning process to extract concepts that are meaningful to the observer, even if they are not by themselves obvious, or salient in the input.


Subjects
Child Development/physiology; Learning/physiology; Models, Theoretical; Space Perception/physiology; Visual Perception/physiology; Humans; Infant
13.
Sci Adv ; 5(3): eaav1598, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30944855

ABSTRACT

Patterns are broad phenomena that relate to biology, chemistry, and physics. The dendritic growth of crystals is the most well-known ice pattern formation process. Tyndall figures are water-melting patterns that occur when ice absorbs light and becomes superheated. Here, we report a previously undescribed ice and water pattern formation process induced by near-infrared irradiation that heats one phase more than the other in a two-phase system. The pattern formed during the irradiation of ice crystals tens of micrometers thick in solution near equilibrium. Dynamic holes and a microchannel labyrinth then formed in specific regions and were characterized by a typical distance between melted points. We concluded that the differential absorption of water and ice was the driving force for the pattern formation. Heating ice by laser absorption might be useful in applications such as the cryopreservation of biological samples.

14.
Trends Cogn Sci ; 11(2): 58-64, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17188555

ABSTRACT

How do we learn to recognize visual categories, such as dogs and cats? Somehow, the brain uses limited variable examples to extract the essential characteristics of new visual categories. Here, I describe an approach to category learning and recognition that is based on recent computational advances. In this approach, objects are represented by a hierarchy of fragments that are extracted during learning from observed examples. The fragments are class-specific features and are selected to deliver a high amount of information for categorization. The same fragments hierarchy is then used for general categorization, individual object recognition and object-parts identification. Recognition is also combined with object segmentation, using stored fragments, to provide a top-down process that delineates object boundaries in complex cluttered scenes. The approach is computationally effective and provides a possible framework for categorization, recognition and segmentation in human vision.


Subjects
Form Perception/physiology; Pattern Recognition, Visual/physiology; Recognition, Psychology/physiology; Humans; Photic Stimulation
15.
Conscious Cogn ; 17(3): 587-601, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18082425

ABSTRACT

Recently, we proposed a fundamental subdivision of the human cortex into two complementary networks: an "extrinsic" one, which deals with the external environment, and an "intrinsic" one, which largely overlaps with the "default mode" system and deals with internally oriented and endogenous mental processes. Here we tested this hypothesis by contrasting decision making under external and internally derived conditions. Subjects were presented with an external cue, and were required to either follow an external instruction ("determined" condition) or to ignore it and follow a voluntary decision process ("free-will" condition). Our results show that a well-defined component of the intrinsic system, the right inferior parietal cortex, was preferentially activated during the "free-will" condition. Importantly, this activity was significantly higher than the baseline resting state. The results support a self-related role for the intrinsic system and provide clear evidence for both hemispheric and regional specialization in the human intrinsic system.


Subjects
Cerebral Cortex/physiology; Volition; Adult; Cerebral Cortex/anatomy & histology; Cues; Decision Making; Female; Fixation, Ocular; Functional Laterality/physiology; Humans; Magnetic Resonance Imaging; Male; Parietal Lobe/anatomy & histology; Parietal Lobe/physiology; Visual Perception/physiology
16.
IEEE Trans Pattern Anal Mach Intell ; 30(9): 1618-31, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18617719

ABSTRACT

We develop a novel method for class-based feature matching across large changes in viewing conditions. The method is based on the property that when objects share a similar part, the similarity is preserved across viewing conditions. Given a feature and a training set of object images, we first identify the subset of objects that share this feature. The transformation of the feature's appearance across viewing conditions is determined mainly by properties of the feature, rather than of the object in which it is embedded. Therefore, the transformed feature will be shared by approximately the same set of objects. Based on this consistency requirement, corresponding features can be reliably identified from a set of candidate matches. Unlike previous approaches, the proposed scheme compares feature appearances only in similar viewing conditions, rather than across different viewing conditions. As a result, the scheme is not restricted to locally planar objects or affine transformations. The approach also does not require examples of correct matches. We show that by using the proposed method, a dense set of accurate correspondences can be obtained. Experimental comparisons demonstrate that matching accuracy is significantly improved over previous schemes. Finally, we show that the scheme can be successfully used for invariant object recognition.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Sensitivity and Specificity
17.
IEEE Trans Pattern Anal Mach Intell ; 30(12): 2109-25, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18988946

ABSTRACT

We construct a segmentation scheme that combines top-down with bottom-up processing. In the proposed scheme, segmentation and recognition are intertwined rather than proceeding in a serial manner. The top-down part applies stored knowledge about object shapes acquired through learning, whereas the bottom-up part creates a hierarchy of segmented regions based on uniformity criteria. Beginning with unsegmented training examples of class and non-class images, the algorithm constructs a bank of class-specific fragments and determines their figure-ground segmentation. This bank is then used to segment novel images in a top-down manner: the fragments are first used to recognize images containing class objects, and then to create a complete cover that best approximates these objects. The resulting segmentation is then integrated with bottom-up multi-scale grouping to better delineate the object boundaries. Our experiments, applied to a large set of four classes (horses, pedestrians, cars, faces), demonstrate segmentation results that surpass those achieved by previous top-down or bottom-up schemes. The main novel aspects of this work are the fragment learning phase, which efficiently learns the figure-ground labeling of segmentation fragments, even in training sets with high object and background variability; combining the top-down segmentation with bottom-up criteria to draw on their relative merits; and the use of segmentation to improve recognition.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Reproducibility of Results; Sensitivity and Specificity
18.
Nat Neurosci ; 5(7): 682-7, 2002 Jul.
Article in English | MEDLINE | ID: mdl-12055634

ABSTRACT

The human visual system analyzes shapes and objects in a series of stages in which stimulus features of increasing complexity are extracted and analyzed. The first stages use simple local features, and the image is subsequently represented in terms of larger and more complex features. These include features of intermediate complexity and partial object views. The nature and use of these higher-order representations remain an open question in the study of visual processing by the primate cortex. Here we show that intermediate complexity (IC) features are optimal for the basic visual task of classification. Moderately complex features are more informative for classification than very simple or very complex ones, and so they emerge naturally by the simple coding principle of information maximization with respect to a class of images. Our findings suggest a specific role for IC features in visual processing and a principle for their extraction.


Subjects
Face; Image Processing, Computer-Assisted/methods; Pattern Recognition, Visual/classification; Computer Simulation; Humans
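
Illustrative note on entry 18: the abstract describes selecting features by information maximization with respect to a class of images. The Python sketch below is one simple way to score a single candidate feature under that principle, assuming each training image yields a binary detection (feature found or not) and a binary class label. The function name `mutual_information` and the toy data are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(detections, labels):
    """Return I(F; C) in bits for two equal-length sequences.

    detections[i] is 1 if the candidate feature was found in image i, else 0.
    labels[i] is 1 if image i belongs to the class, else 0.
    (Both encodings are assumptions made for this sketch.)
    """
    if len(detections) != len(labels) or not detections:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(detections)
    joint = Counter(zip(detections, labels))  # counts of (f, c) pairs
    count_f = Counter(detections)             # marginal counts of f
    count_c = Counter(labels)                 # marginal counts of c
    mi = 0.0
    for (f, c), count in joint.items():
        p_fc = count / n
        p_f = count_f[f] / n
        p_c = count_c[c] / n
        # Only observed pairs are summed, so p_fc > 0 and the log is defined.
        mi += p_fc * math.log2(p_fc / (p_f * p_c))
    return mi


# Toy data: a feature that fires exactly on class images versus one that
# fires independently of the class.
informative = mutual_information([1, 1, 0, 0], [1, 1, 0, 0])
uninformative = mutual_information([1, 0, 1, 0], [1, 1, 0, 0])
```

Under this scoring, a candidate whose detections track the class label receives a higher value than one that fires independently of it, so ranking candidates by the score would favor the more diagnostic features.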
19.
Cognition ; 171: 65-84, 2018 02.
Article in English | MEDLINE | ID: mdl-29107889

ABSTRACT

The goal in this work is to model the process of 'full interpretation' of object images, which is the ability to identify and localize all semantic features and parts that are recognized by human observers. The task is approached by dividing the interpretation of the complete object into the interpretation of multiple reduced but interpretable local regions. In such reduced regions, interpretation is simpler, since the number of semantic components is small, and the variability of possible configurations is low. We model the interpretation process by identifying primitive components and relations that play a useful role in local interpretation by humans. To identify useful components and relations used in the interpretation process, we consider the interpretation of 'minimal configurations': these are reduced local regions, which are minimal in the sense that further reduction renders them unrecognizable and uninterpretable. We show that such minimal interpretable images have useful properties, which we use to identify informative features and relations used for full interpretation. We describe our interpretation model, and show results of detailed interpretations of minimal configurations, produced automatically by the model. Finally, we discuss possible extensions and implications of full interpretation to difficult visual tasks, such as recognizing social interactions, which are beyond the scope of current models of visual recognition.


Subjects
Models, Theoretical; Pattern Recognition, Automated; Pattern Recognition, Visual/physiology; Humans
20.
Interface Focus ; 8(4): 20180020, 2018 Aug 06.
Article in English | MEDLINE | ID: mdl-29951197

ABSTRACT

Computational models of vision have advanced in recent years at a rapid rate, rivalling in some areas human-level performance. Much of the progress to date has focused on analysing the visual scene at the object level: the recognition and localization of objects in the scene. Human understanding of images reaches a richer and deeper image understanding both 'below' the object level, such as identifying and localizing object parts and sub-parts, as well as 'above' the object level, such as identifying object relations, and agents with their actions and interactions. In both cases, understanding depends on recovering meaningful structures in the image, and their components, properties and inter-relations, a process referred to here as 'image interpretation'. In this paper, we describe recent directions, based on human and computer vision studies, towards human-like image interpretation, beyond the reach of current schemes, both below the object level, as well as some aspects of image interpretation at the level of meaningful configurations beyond the recognition of individual objects, and in particular, interactions between two people in close contact. In both cases the recognition process depends on the detailed interpretation of so-called 'minimal images', and at both levels recognition depends on combining 'bottom-up' processing, proceeding from low to higher levels of a processing hierarchy, together with 'top-down' processing, proceeding from high- to lower-level stages of visual analysis.
