Results 1 - 20 of 42
1.
Research (Wash D C) ; 6: 0024, 2023.
Article in English | MEDLINE | ID: mdl-37223467

ABSTRACT

We overview several properties, old and new, of training overparameterized deep networks under the square loss. We first consider a model of the dynamics of gradient flow under the square loss in deep homogeneous rectified linear unit networks. We study the convergence to a solution with the absolute minimum ρ, which is the product of the Frobenius norms of each layer weight matrix, when normalization by Lagrange multipliers is used together with weight decay under different forms of gradient descent. A main property of the minimizers that bounds their expected error for a specific network architecture is ρ. In particular, we derive novel norm-based bounds for convolutional layers that are orders of magnitude better than classical bounds for dense networks. Next, we prove that quasi-interpolating solutions obtained by stochastic gradient descent in the presence of weight decay have a bias toward low-rank weight matrices, which should improve generalization. The same analysis predicts the existence of an inherent stochastic gradient descent noise for deep networks. In both cases, we verify our predictions experimentally. We then predict neural collapse and its properties without any specific assumption, unlike other published proofs. Our analysis supports the idea that the advantage of deep networks relative to other classifiers is greater for problems that are appropriate for sparse deep architectures such as convolutional neural networks. The reason is that compositionally sparse target functions can be approximated well by "sparse" deep networks without incurring the curse of dimensionality.
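
As an illustration of the quantity ρ described in this abstract, the following is a minimal sketch, not the authors' code. It computes ρ as the product of the Frobenius norms of the layer weight matrices of a small bias-free ReLU network, and checks the standard positive-homogeneity property that the network output equals ρ times the output of the same network with every layer rescaled to unit norm. The network weights, the input, and all function names are hypothetical and chosen only for illustration.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a matrix (list of rows)."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def rho(layers):
    """Product of the Frobenius norms of all layer weight matrices."""
    product = 1.0
    for weights in layers:
        product *= frobenius_norm(weights)
    return product


def normalize(layers):
    """Rescale every layer to unit Frobenius norm."""
    normalized = []
    for weights in layers:
        norm = frobenius_norm(weights)
        normalized.append([[v / norm for v in row] for row in weights])
    return normalized


def relu_net(layers, x):
    """Bias-free ReLU network: ReLU after every layer except the last."""
    h = list(x)
    for index, weights in enumerate(layers):
        h = [sum(w * a for w, a in zip(row, h)) for row in weights]
        if index < len(layers) - 1:
            h = [max(0.0, a) for a in h]
    return h


# Hypothetical two-layer network and input, chosen only for illustration.
layers = [[[1.0, -2.0], [0.5, 3.0]], [[2.0, -1.0]]]
x = [1.0, 2.0]

full_output = relu_net(layers, x)[0]
unit_output = relu_net(normalize(layers), x)[0]
print(rho(layers), full_output, rho(layers) * unit_output)
```

The last two printed values agree up to rounding, which is why the analysis can separate the scale ρ from the normalized weights. This identity relies on the network having no bias terms.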

2.
Proc Natl Acad Sci U S A ; 117(48): 30039-30045, 2020 12 01.
Article in English | MEDLINE | ID: mdl-32518109

ABSTRACT

While deep learning is successful in a number of applications, it is not yet well understood theoretically. A theoretical characterization of deep learning should answer questions about its approximation power, the dynamics of optimization, and good out-of-sample performance, despite overparameterization and the absence of explicit regularization. We review our recent results toward this goal. In approximation theory both shallow and deep networks are known to approximate any continuous function at an exponential cost. However, we proved that for certain types of compositional functions, deep networks of the convolutional type (even without weight sharing) can avoid the curse of dimensionality. In characterizing minimization of the empirical exponential loss we consider the gradient flow of the weight directions rather than the weights themselves, since the relevant function underlying classification corresponds to normalized networks. The dynamics of normalized weights turn out to be equivalent to those of the constrained problem of minimizing the loss subject to a unit norm constraint. In particular, the dynamics of typical gradient descent have the same critical points as the constrained problem. Thus there is implicit regularization in training deep networks under exponential-type loss functions during gradient flow. As a consequence, the critical points correspond to minimum norm infima of the loss. This result is especially relevant because it has been recently shown that, for overparameterized models, selection of a minimum norm solution optimizes cross-validation leave-one-out stability and thereby the expected error. Thus our results imply that gradient descent in deep networks minimizes the expected error.
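
To make the distinction between weights and weight directions concrete, here is a toy sketch under strong simplifying assumptions: a linear classifier (not a deep network) trained by plain gradient descent on an exponential loss over a hypothetical separable data set. It is not the analysis of the paper. It only illustrates that the weight norm keeps growing while the classification depends solely on the normalized direction. The data, learning rate, and function names are all assumptions made for illustration.

```python
import math

# Hypothetical linearly separable toy data: (features, label in {+1, -1}).
data = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]


def margins(w):
    """Signed margins y * <w, x> for every training point."""
    return [y * (w[0] * x[0] + w[1] * x[1]) for x, y in data]


def exp_loss_gradient(w):
    """Gradient of sum_i exp(-y_i <w, x_i>) with respect to w."""
    grad = [0.0, 0.0]
    for (x, y), m in zip(data, margins(w)):
        scale = -y * math.exp(-m)
        grad[0] += scale * x[0]
        grad[1] += scale * x[1]
    return grad


def train(steps, lr=0.1):
    """Plain gradient descent from w = 0; returns w and the norm after each step."""
    w = [0.0, 0.0]
    norms = []
    for _ in range(steps):
        g = exp_loss_gradient(w)
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
        norms.append(math.sqrt(w[0] ** 2 + w[1] ** 2))
    return w, norms


w, norms = train(200)
direction = [w[0] / norms[-1], w[1] / norms[-1]]
print("norm:", norms[-1], "direction:", direction)
```

In this toy case the norm increases at every step, while the unit-norm direction alone already classifies every training point correctly.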

3.
Nat Commun ; 11(1): 1027, 2020 02 24.
Article in English | MEDLINE | ID: mdl-32094327

ABSTRACT

Overparametrized deep networks predict well, despite the lack of an explicit complexity control during training, such as an explicit regularization term. For exponential-type loss functions, we solve this puzzle by showing an effective regularization effect of gradient descent in terms of the normalized weights that are relevant for classification.

4.
Sci Rep ; 10(1): 1411, 2020 Jan 29.
Article in English | MEDLINE | ID: mdl-31996698

ABSTRACT

Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object. The range of translation-invariance is limited, depending on the size and position of presented objects. To understand the underlying brain computation associated with the invariance properties, we compared experimental data with computational modeling results. Our results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons' receptive field sizes and sampling density that change with eccentricity. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy that differs in some key aspects from current deep learning architectures, being more data efficient and relying more critically on eye-movements.


Subjects
Eye Movements/physiology, Ocular Vision/physiology, Visual Perception/physiology, Deep Learning, Humans, Language, Learning/physiology, Photic Stimulation/methods
5.
Annu Rev Vis Sci ; 4: 403-422, 2018 09 15.
Article in English | MEDLINE | ID: mdl-30052494

ABSTRACT

Recognizing the people, objects, and actions in the world around us is a crucial aspect of human perception that allows us to plan and act in our environment. Remarkably, our proficiency in recognizing semantic categories from visual input is unhindered by transformations that substantially alter their appearance (e.g., changes in lighting or position). The ability to generalize across these complex transformations is a hallmark of human visual intelligence, which has been the focus of wide-ranging investigation in systems and computational neuroscience. However, while the neural machinery of human visual perception has been thoroughly described, the computational principles dictating its functioning remain unknown. Here, we review recent results in brain imaging, neurophysiology, and computational neuroscience in support of the hypothesis that the ability to support the invariant recognition of semantic entities in the visual world shapes which neural representations of sensory input are computed by human visual cortex.


Subjects
Psychological Discrimination/physiology, Neurological Models, Psychological Recognition/physiology, Visual Cortex/physiology, Visual Perception/physiology, Computational Biology, Humans
6.
J Neurophysiol ; 119(2): 631-640, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29118198

ABSTRACT

Humans can effortlessly recognize others' actions in the presence of complex transformations, such as changes in viewpoint. Several studies have located the regions in the brain involved in invariant action recognition; however, the underlying neural computations remain poorly understood. We use magnetoencephalography decoding and a data set of well-controlled, naturalistic videos of five actions (run, walk, jump, eat, drink) performed by different actors at different viewpoints to study the computational steps used to recognize actions across complex transformations. In particular, we ask when the brain discriminates between different actions, and when it does so in a manner that is invariant to changes in 3D viewpoint. We measure the latency difference between invariant and noninvariant action decoding when subjects view full videos as well as form-depleted and motion-depleted stimuli. We were unable to detect a difference in decoding latency or temporal profile between invariant and noninvariant action recognition in full videos. However, when either form or motion information is removed from the stimulus set, we observe a decrease and delay in invariant action decoding. Our results suggest that the brain recognizes actions and builds invariance to complex transformations at the same time and that both form and motion information are crucial for fast, invariant action recognition. NEW & NOTEWORTHY The human brain can quickly recognize actions despite transformations that change their visual appearance. We use neural timing data to uncover the computations underlying this ability. We find that within 200 ms action can be read out of magnetoencephalography data and that this representation is invariant to changes in viewpoint. We find form and motion are needed for this fast action decoding, suggesting that the brain quickly integrates complex spatiotemporal features to form invariant action representations.


Subjects
Brain/physiology, Motion Perception, Visual Pattern Recognition, Adult, Female, Humans, Male, Movement, Reaction Time
7.
PLoS Comput Biol ; 13(12): e1005859, 2017 12.
Article in English | MEDLINE | ID: mdl-29253864

ABSTRACT

Recognizing the actions of others from visual stimuli is a crucial aspect of human perception that allows individuals to respond to social cues. Humans are able to discriminate between similar actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of human visual intelligence. Advances in understanding action recognition at the neural level have not always translated into precise accounts of the computational principles underlying what representations of action sequences are constructed by human visual cortex. Here we test the hypothesis that invariant action discrimination might fill this gap. Recently, the study of artificial systems for static object perception has produced models, Convolutional Neural Networks (CNNs), that achieve human level performance in complex discriminative tasks. Within this class, architectures that better support invariant object recognition also produce image representations that better match those implied by human and primate neural data. However, whether these models produce representations of action sequences that support recognition across complex transformations and closely follow neural representations of actions remains unknown. Here we show that spatiotemporal CNNs accurately categorize video stimuli into action classes, and that deliberate model modifications that improve performance on an invariant action recognition task lead to data representations that better match human neural recordings. Our results support our hypothesis that performance on invariant discrimination dictates the neural representations of actions computed in the brain. These results broaden the scope of the invariant recognition framework for understanding visual intelligence from perception of inanimate objects and faces in static images to the study of human perception of action sequences.


Subjects
Psychological Recognition/physiology, Visual Perception/physiology, Computational Biology, Cues (Psychology), Psychological Discrimination/physiology, Humans, Magnetoencephalography, Neurological Models, Neural Networks (Computer), Photic Stimulation, Visual Cortex/physiology
8.
Curr Biol ; 27(1): 62-67, 2017 Jan 09.
Article in English | MEDLINE | ID: mdl-27916522

ABSTRACT

The primate brain contains a hierarchy of visual areas, dubbed the ventral stream, which rapidly computes object representations that are both specific for object identity and robust against identity-preserving transformations, like depth rotations [1, 2]. Current computational models of object recognition, including recent deep-learning networks, generate these properties through a hierarchy of alternating selectivity-increasing filtering and tolerance-increasing pooling operations, similar to simple-complex cells operations [3-6]. Here, we prove that a class of hierarchical architectures and a broad set of biologically plausible learning rules generate approximate invariance to identity-preserving transformations at the top level of the processing hierarchy. However, all past models tested failed to reproduce the most salient property of an intermediate representation of a three-level face-processing hierarchy in the brain: mirror-symmetric tuning to head orientation [7]. Here, we demonstrate that one specific biologically plausible Hebb-type learning rule generates mirror-symmetric tuning to bilaterally symmetric stimuli, like faces, at intermediate levels of the architecture and show why it does so. Thus, the tuning properties of individual cells inside the visual stream appear to result from group properties of the stimuli they encode and to reflect the learning rules that sculpted the information-processing system within which they reside.


Subjects
Brain/physiology, Facial Recognition/physiology, Head Movements/physiology, Learning/physiology, Macaca/physiology, Neurological Models, Animals, Orientation, Spatial Orientation, Visual Pattern Recognition, Photic Stimulation/methods, Visual Cortex/physiology
9.
PLoS One ; 11(3): e0150980, 2016.
Article in English | MEDLINE | ID: mdl-26985989

ABSTRACT

Faces are an important and unique class of visual stimuli, and have been of interest to neuroscientists for many years. Faces are known to elicit certain characteristic behavioral markers, collectively labeled "holistic processing", while non-face objects are not processed holistically. However, little is known about the underlying neural mechanisms. The main aim of this computational simulation work is to investigate the neural mechanisms that make face processing holistic. Using a model of primate visual processing, we show that a single key factor, "neural tuning size", is able to account for three important markers of holistic face processing: the Composite Face Effect (CFE), Face Inversion Effect (FIE) and Whole-Part Effect (WPE). Our proof-of-principle specifies the precise neurophysiological property that corresponds to the poorly-understood notion of holism, and shows that this one neural property controls three classic behavioral markers of holism. Our work is consistent with neurophysiological evidence, and makes further testable predictions. Overall, we provide a parsimonious account of holistic face processing, connecting computation, behavior and neurophysiology.


Subjects
Face/anatomy & histology, Psychological Recognition, Visual Perception, Animals, Computer Simulation, Humans, Biological Models, Primates
10.
PLoS Comput Biol ; 11(10): e1004390, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26496457

ABSTRACT

Is visual cortex made up of general-purpose information processing machinery, or does it consist of a collection of specialized modules? If prior knowledge, acquired from learning a set of objects is only transferable to new objects that share properties with the old, then the recognition system's optimal organization must be one containing specialized modules for different object classes. Our analysis starts from a premise we call the invariance hypothesis: that the computational goal of the ventral stream is to compute an invariant-to-transformations and discriminative signature for recognition. The key condition enabling approximate transfer of invariance without sacrificing discriminability turns out to be that the learned and novel objects transform similarly. This implies that the optimal recognition system must contain subsystems trained only with data from similarly-transforming objects and suggests a novel interpretation of domain-specific regions like the fusiform face area (FFA). Furthermore, we can define an index of transformation-compatibility, computable from videos, that can be combined with information about the statistics of natural vision to yield predictions for which object categories ought to have domain-specific regions in agreement with the available data. The result is a unifying account linking the large literature on view-based recognition with the wealth of experimental evidence concerning domain-specific regions.


Subjects
Neurological Models, Nerve Net/physiology, Visual Pattern Recognition/physiology, Psychological Recognition/physiology, Visual Cortex/physiology, Visual Pathways/physiology, Animals, Computer Simulation, Humans
11.
J Neurophysiol ; 111(1): 91-102, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24089402

ABSTRACT

The human visual system can rapidly recognize objects despite transformations that alter their appearance. The precise timing of when the brain computes neural representations that are invariant to particular transformations, however, has not been mapped in humans. Here we employ magnetoencephalography decoding analysis to measure the dynamics of size- and position-invariant visual information development in the ventral visual stream. With this method we can read out the identity of objects beginning as early as 60 ms. Size- and position-invariant visual information appears around 125 ms and 150 ms, respectively, and both develop in stages, with invariance to smaller transformations arising before invariance to larger transformations. Additionally, the magnetoencephalography sensor activity localizes to neural sources that are in the most posterior occipital regions at the early decoding times and then move temporally as invariant information develops. These results provide previously unknown latencies for key stages of human-invariant object recognition, as well as new and compelling evidence for a feed-forward hierarchical model of invariant object recognition where invariance increases at each successive visual area along the ventral stream.


Subjects
Visual Pattern Recognition, Reaction Time, Visual Cortex/physiology, Adolescent, Adult, Visual Evoked Potentials, Female, Humans, Male
13.
Ann N Y Acad Sci ; 1305: 72-82, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23773126

ABSTRACT

Object recognition has been a central yet elusive goal of computational vision. For many years, computer performance seemed highly deficient and unable to emulate the basic capabilities of the human recognition system. Over the past decade or so, computer scientists and neuroscientists have developed algorithms and systems (and models of visual cortex) that have come much closer to human performance in visual identification and categorization. In this personal perspective, we discuss the ongoing struggle of visual models to catch up with the visual cortex, identify key reasons for the relatively rapid improvement of artificial systems and models, and identify open problems for computational vision in this domain.


Subjects
Ocular Vision/physiology, Visual Perception/physiology, Computer Simulation, Humans, Neurological Models, Visual Cortex/physiology
15.
Article in English | MEDLINE | ID: mdl-22754523

ABSTRACT

Learning by temporal association rules such as Foldiak's trace rule is an attractive hypothesis that explains the development of invariance in visual recognition. Consistent with these rules, several recent experiments have shown that invariance can be broken at both the psychophysical and single cell levels. We show (1) that temporal association learning provides appropriate invariance in models of object recognition inspired by the visual cortex, (2) that we can replicate the "invariance disruption" experiments using these models with a temporal association learning rule to develop and maintain invariance, and (3) that despite dramatic single cell effects, a population of cells is very robust to these disruptions. We argue that these models account for the stability of perceptual invariance despite the underlying plasticity of the system, the variability of the visual world and expected noise in the biological mechanisms.

16.
Perception ; 41(9): 1017-23, 2012.
Article in English | MEDLINE | ID: mdl-23409366

ABSTRACT

I discuss the "levels of understanding" framework described in Marr's Vision and propose an updated version to capture the changes in computation and neuroscience over the last 30 years.


Subjects
Neurological Models, Ocular Vision, Humans, Learning, Neurosciences, Visual Pathways
17.
Proc Natl Acad Sci U S A ; 108(21): 8850-5, 2011 May 24.
Article in English | MEDLINE | ID: mdl-21555594

ABSTRACT

Recognizing objects in cluttered scenes requires attentional mechanisms to filter out distracting information. Previous studies have found several physiological correlates of attention in visual cortex, including larger responses for attended objects. However, it has been unclear whether these attention-related changes have a large impact on information about objects at the neural population level. To address this question, we trained monkeys to covertly deploy their visual attention from a central fixation point to one of three objects displayed in the periphery, and we decoded information about the identity and position of the objects from populations of ∼ 200 neurons from the inferior temporal cortex using a pattern classifier. The results show that before attention was deployed, information about the identity and position of each object was greatly reduced relative to when these objects were shown in isolation. However, when a monkey attended to an object, the pattern of neural activity, represented as a vector with dimensionality equal to the size of the neural population, was restored toward the vector representing the isolated object. Despite this nearly exclusive representation of the attended object, an increase in the salience of nonattended objects caused "bottom-up" mechanisms to override these "top-down" attentional enhancements. The method described here can be used to assess which attention-related physiological changes are directly related to object recognition, and should be helpful in assessing the role of additional physiological changes in the future.
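
The idea of a population response "restored toward the vector representing the isolated object" can be pictured with a small similarity computation. The sketch below is only an illustration with made-up firing rates for a three-neuron population (the study used populations of about 200 neurons and a pattern classifier). Cosine similarity is an assumed stand-in measure, not the method reported in the abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two population response vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical firing rates (one entry per neuron), for illustration only.
isolated = [1.0, 0.0, 2.0]             # object shown alone
clutter_unattended = [1.0, 1.0, 1.0]   # object in clutter, before attention
clutter_attended = [1.0, 0.2, 1.8]     # object in clutter, attended

before = cosine_similarity(clutter_unattended, isolated)
after = cosine_similarity(clutter_attended, isolated)
print("before attention:", before, "after attention:", after)
```

With these invented numbers, the attended response lies closer to the isolated-object response than the unattended one does, which is the qualitative pattern the abstract describes.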


Subjects
Attention/physiology, Psychological Recognition/physiology, Temporal Lobe/physiology, Visual Perception/physiology, Animals, Haplorhini, Neurons/physiology, Visual Cortex/physiology
18.
Nat Commun ; 1: 68, 2010 Sep 07.
Article in English | MEDLINE | ID: mdl-20842193

ABSTRACT

Neurobehavioural analysis of mouse phenotypes requires the monitoring of mouse behaviour over long periods of time. In this study, we describe a trainable computer vision system enabling the automated analysis of complex mouse behaviours. We provide software and an extensive manually annotated video database used for training and testing the system. Our system performs on par with human scoring, as measured from ground-truth manual annotations of thousands of clips of freely behaving mice. As a validation of the system, we characterized the home-cage behaviours of two standard inbred and two non-standard mouse strains. From these data, we were able to predict in a blind test the strain identity of individual animals with high accuracy. Our video-based software will complement existing sensor-based automated approaches and enable an adaptable, comprehensive, high-throughput, fine-grained, automated analysis of mouse behaviour.


Subjects
Animal Behavior, Animals, Female, Male, Mice
19.
J Neurosci ; 30(25): 8519-28, 2010 Jun 23.
Article in English | MEDLINE | ID: mdl-20573899

ABSTRACT

Items are categorized differently depending on the behavioral context. For instance, a lion can be categorized as an African animal or a type of cat. We recorded lateral prefrontal cortex (PFC) neural activity while monkeys switched between categorizing the same image set along two different category schemes with orthogonal boundaries. We found that each category scheme was largely represented by independent PFC neuronal populations and that activity reflecting a category distinction was weaker, but not absent, when that category was irrelevant. We suggest that the PFC represents competing category representations independently to reduce interference between them.


Subjects
Concept Formation/physiology, Decision Making/physiology, Neurons/physiology, Visual Pattern Recognition/physiology, Prefrontal Cortex/physiology, Action Potentials/physiology, Animals, Animal Behavior/physiology, Brain Mapping, Discrimination Learning/physiology, Electrophysiology, Computer-Assisted Image Processing, Macaca mulatta, Magnetic Resonance Imaging, Reaction Time/physiology
20.
Vision Res ; 50(22): 2233-47, 2010 Oct 28.
Article in English | MEDLINE | ID: mdl-20493206

ABSTRACT

In the theoretical framework of this paper, attention is part of the inference process that solves the visual recognition problem of what is where. The theory proposes a computational role for attention and leads to a model that predicts some of its main properties at the level of psychophysics and physiology. In our approach, the main goal of the visual system is to infer the identity and the position of objects in visual scenes: spatial attention emerges as a strategy to reduce the uncertainty in shape information while feature-based attention reduces the uncertainty in spatial information. Featural and spatial attention represent two distinct modes of a computational process solving the problem of recognizing and localizing objects, especially in difficult recognition tasks such as in cluttered natural scenes. We describe a specific computational model and relate it to the known functional anatomy of attention. We show that several well-known attentional phenomena, including bottom-up pop-out effects, multiplicative modulation of neuronal tuning curves and shift in contrast responses, all emerge naturally as predictions of the model. We also show that the Bayesian model is a good predictor of human eye fixations (considered as a proxy for shifts of attention) in natural scenes.


Subjects
Attention/physiology, Bayes Theorem, Visual Perception/physiology, Ocular Fixation/physiology, Form Perception/physiology, Humans, Theoretical Models, Psychological Recognition, Space Perception/physiology