Results 1 - 20 of 56
1.
Behav Brain Sci ; 46: e415, 2023 Dec 06.
Article in English | MEDLINE | ID: mdl-38054298

ABSTRACT

On several key issues we agree with the commentators. Perhaps most importantly, everyone seems to agree that psychology has an important role to play in building better models of human vision, and (most) everyone agrees (including us) that deep neural networks (DNNs) will play an important role in modelling human vision going forward. But there are also disagreements about what models are for, how DNN-human correspondences should be evaluated, the value of alternative modelling approaches, and the impact of marketing hype in the literature. In our view, these latter issues are contributing to many unjustified claims regarding DNN-human correspondences in vision and other domains of cognition. We explore all these issues in this response.


Subject(s)
Cognition; Neural Networks, Computer; Humans
2.
J Exp Psychol Gen ; 152(12): 3380-3402, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37695326

ABSTRACT

Humans are particularly sensitive to relationships between parts of objects. It remains unclear why this is. One hypothesis is that relational features are highly diagnostic of object categories and emerge as a result of learning to classify objects. We tested this by analyzing the internal representations of supervised convolutional neural networks (CNNs) trained to classify large sets of objects. We found that CNNs do not show the same sensitivity to relational changes as previously observed for human participants. Furthermore, when we precisely controlled the deformations to objects, human behavior was best predicted by the number of relational changes while CNNs were equally sensitive to all changes. Even changing the statistics of the learning environment by making relations uniquely diagnostic did not make networks more sensitive to relations in general. Our results show that learning to classify objects is not sufficient for the emergence of human shape representations. Instead, these results suggest that humans are selectively sensitive to relational changes because they build representations of distal objects from their retinal images and interpret relational changes as changes to these distal objects. This inferential process makes human shape representations qualitatively different from those of artificial neural networks optimized to perform image classification. (PsycInfo Database Record (c) 2023 APA, all rights reserved).


Subject(s)
Learning; Neural Networks, Computer; Humans
3.
Neural Netw ; 162: 199-211, 2023 May.
Article in English | MEDLINE | ID: mdl-36913820

ABSTRACT

Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition - an area ripe for such exploration - is, in humans, inherently robust to a number of transformations at various spectrotemporal granularities. To what extent are these robustness profiles accounted for by high-performing neural network systems? We bring together experiments in speech recognition under a single synthesis framework to evaluate state-of-the-art neural networks as stimulus-computable, optimized observers. In a series of experiments, we (1) clarify how influential speech manipulations in the literature relate to each other and to natural speech, (2) show the granularities at which machines exhibit out-of-distribution robustness, reproducing classical perceptual phenomena in humans, (3) identify the specific conditions where model predictions of human performance differ, and (4) demonstrate a crucial failure of all artificial systems to perceptually recover where humans do, suggesting alternative directions for theory and model building. These findings encourage a tighter synergy between the cognitive science and engineering of audition.


Subject(s)
Speech Perception; Speech; Humans; Neural Networks, Computer; Brain
4.
Neural Netw ; 161: 515-524, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36805266

ABSTRACT

Convolutional neural networks (CNNs) are often described as promising models of human vision, yet they show many differences from human abilities. We focus on a superhuman capacity of top-performing CNNs, namely, their ability to learn very large datasets of random patterns. We verify that human learning on such tasks is extremely limited, even with few stimuli. We argue that the performance difference is due to CNNs' overcapacity and introduce biologically inspired mechanisms to constrain it, while retaining the good test-set generalisation to structured images that is characteristic of CNNs. We investigate the efficacy of adding noise to hidden units' activations, restricting early convolutional layers with a bottleneck, and using a bounded activation function. Internal noise was the most potent intervention and the only one which, by itself, could reduce random data performance in the tested models to chance levels. We also investigated whether networks with biologically inspired capacity constraints show improved generalisation to out-of-distribution stimuli; however, little benefit was observed. Our results suggest that constraining networks with biologically motivated mechanisms paves the way for closer correspondence between network and human performance, but the few manipulations we have tested are only a small step towards that goal.


Subject(s)
Learning; Neural Networks, Computer; Humans; Generalization, Psychological
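The three capacity-limiting interventions examined in the abstract above (internal noise on hidden-unit activations, an early bottleneck, and a bounded activation function) can be sketched as toy functions. This is an illustrative sketch, not the paper's implementation; all names and default values are ours.

```python
import math
import random

def bounded_activation(x, ceiling=1.0):
    """A saturating nonlinearity (here a scaled tanh): activations
    can never exceed +/- ceiling, limiting the unit's dynamic range."""
    return ceiling * math.tanh(x)

def noisy_activation(x, sigma=0.5, rng=None):
    """Inject Gaussian internal noise into a hidden unit's activation,
    the intervention the abstract found most potent."""
    rng = rng or random.Random()
    return x + rng.gauss(0.0, sigma)

def bottleneck(channels, width=8):
    """Crudely restrict an early layer to its first `width` channels,
    standing in for a convolutional bottleneck."""
    return channels[:width]
```

Each mechanism trades raw memorisation capacity for a representation that cannot fit arbitrary random patterns, which is the hypothesised route to more human-like learning limits.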
5.
Behav Res Methods ; 55(3): 1314-1331, 2023 04.
Article in English | MEDLINE | ID: mdl-35650383

ABSTRACT

Nonword pronunciation is a critical challenge for models of reading aloud, but little attention has been given to identifying the best method for assessing model predictions. The most typical approach involves comparing the model's pronunciations of nonwords to pronunciations of the same nonwords by human participants and deeming the model's output correct if it matches any transcription of the human pronunciations. The present paper introduces a new ratings-based method, in which participants are shown printed nonwords and asked to rate the plausibility of the provided pronunciations, generated here by a speech synthesiser. We demonstrate this method with reference to a previously published database of 915 disyllabic nonwords (Mousikou et al., 2017). We evaluated two well-known psychological models, RC00 and CDP++, as well as an additional grapheme-to-phoneme algorithm known as Sequitur, and compared our model assessment with the corpus-based method adopted by Mousikou et al. We find that the ratings method: a) is much easier to implement than a corpus-based method, b) has a high hit rate and low false-alarm rate in assessing nonword reading accuracy, and c) provides a similar outcome to the corpus-based method in its assessment of RC00 and CDP++. However, the two methods differed in their evaluation of Sequitur, which performed much better under the ratings method. Indeed, our evaluation of Sequitur revealed that the corpus-based method introduced a number of false positives and, more often, false negatives. Implications of these findings are discussed.


Subject(s)
Phonetics; Reading; Humans; Attention; Models, Psychological; Algorithms
6.
Behav Brain Sci ; 46: e385, 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36453586

ABSTRACT

Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.


Subject(s)
Neural Networks, Computer; Visual Perception; Humans; Visual Perception/physiology; Vision, Ocular; Brain/diagnostic imaging; Brain/physiology; Magnetic Resonance Imaging/methods
7.
J Vis ; 22(10): 11, 2022 09 02.
Article in English | MEDLINE | ID: mdl-36094524

ABSTRACT

Same-different visual reasoning is a basic skill central to abstract combinatorial thought. This fact has led neural network researchers to test same-different classification on deep convolutional neural networks (DCNNs), which has resulted in a controversy regarding whether this skill is within the capacity of these models. However, most tests of same-different classification rely on testing on images that come from the same pixel-level distribution as the training images, rendering the results inconclusive. In this study, we tested relational same-different reasoning in DCNNs. In a series of simulations we show that models based on the ResNet architecture are capable of visual same-different classification, but only when the test images are similar to the training images at the pixel level. In contrast, when there is a shift in the testing distribution that does not change the relation between the objects in the image, the performance of DCNNs decreases substantially. This finding is true even when the DCNNs' training regime is expanded to include images taken from a wide range of different pixel-level distributions or when the model is trained on the testing distribution but on a different task in a multitask learning context. Furthermore, we show that the relation network, a deep learning architecture specifically designed to tackle visual relational reasoning problems, suffers the same kind of limitations. Overall, the results of this study suggest that learning same-different relations is beyond the scope of current DCNNs.


Subject(s)
Neural Networks, Computer; Humans
8.
PLoS Comput Biol ; 18(5): e1009572, 2022 05.
Article in English | MEDLINE | ID: mdl-35560155

ABSTRACT

Humans rely heavily on the shape of objects to recognise them. Recently, it has been argued that Convolutional Neural Networks (CNNs) can also show a shape-bias, provided their learning environment contains this bias. This has led to the proposal that CNNs provide good mechanistic models of shape-bias and, more generally, human visual processing. However, it is also possible that humans and CNNs show a shape-bias for very different reasons, namely, shape-bias in humans may be a consequence of architectural and cognitive constraints whereas CNNs show a shape-bias as a consequence of learning the statistics of the environment. We investigated this question by exploring shape-bias in humans and CNNs when they learn in a novel environment. We observed that, in this new environment, humans (i) focused on shape and overlooked many non-shape features, even when non-shape features were more diagnostic, (ii) learned based on only one out of multiple predictive features, and (iii) failed to learn when global features, such as shape, were absent. This behaviour contrasted with the predictions of a statistical inference model with no priors, showing the strong role that shape-bias plays in human feature selection. It also contrasted with CNNs that (i) preferred to categorise objects based on non-shape features, and (ii) increased reliance on these non-shape features as they became more predictive. This was the case even when the CNN was pre-trained to have a shape-bias and the convolutional backbone was frozen. These results suggest that shape-bias has a different source in humans and CNNs: while learning in CNNs is driven by the statistical properties of the environment, humans are highly constrained by their previous biases, which suggests that cognitive constraints play a key role in how humans learn to recognise novel objects.


Subject(s)
Neural Networks, Computer; Visual Perception; Bias; Blindness; Humans; Learning
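The prior-free statistical inference baseline that the abstract above contrasts with human learners can be sketched as a learner that scores every feature purely by how well its values predict the category, with no built-in preference for shape. This is a minimal illustration under our own assumptions, not the paper's model; all names are invented here.

```python
from collections import Counter, defaultdict

def most_diagnostic_feature(examples):
    """examples: list of (features, category) pairs, where features is a dict.

    Score each feature by the accuracy of a rule that predicts the
    majority category for each of that feature's values, and return the
    best-scoring feature plus all scores. A learner like this latches
    onto whatever cue is most predictive in the environment, which is
    how a shape-bias-free statistical learner would behave."""
    scores = {}
    for name in examples[0][0]:
        by_value = defaultdict(Counter)
        for feats, cat in examples:
            by_value[feats[name]][cat] += 1
        correct = sum(c.most_common(1)[0][1] for c in by_value.values())
        scores[name] = correct / len(examples)
    return max(scores, key=scores.get), scores
```

In a toy environment where shape perfectly predicts category but colour does not, such a learner picks shape only because of the statistics, not because of any prior; when colour is made more diagnostic, it switches, which is the CNN-like behaviour the study reports.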
9.
Neural Netw ; 150: 222-236, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35334437

ABSTRACT

Humans can identify objects following various spatial transformations such as scale and viewpoint. This extends to novel objects, after a single presentation at a single pose, sometimes referred to as online invariance. CNNs have been proposed as a compelling model of human vision, but their ability to identify objects across transformations is typically tested on held-out samples of trained categories after extensive data augmentation. This paper assesses whether standard CNNs can support human-like online invariance by training models to recognize images of synthetic 3D objects that undergo several transformations: rotation, scaling, translation, brightness, contrast, and viewpoint. Through the analysis of models' internal representations, we show that standard supervised CNNs trained on transformed objects can acquire strong invariances on novel classes even when trained with as few as 50 objects taken from 10 classes. This result extended to a different dataset of photographs of real objects. We also show that these invariances can be acquired in a self-supervised way, through solving a same/different task. We suggest that this latter approach may be similar to how humans acquire invariances.


Subject(s)
Learning; Neural Networks, Computer; Humans; Rotation
10.
Neural Netw ; 148: 96-110, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35114495

ABSTRACT

Deep Convolutional Neural Networks (DNNs) have achieved superhuman accuracy on standard image classification benchmarks. Their success has reignited significant interest in their use as models of the primate visual system, bolstered by claims of their architectural and representational similarities. However, closer scrutiny of these models suggests that they rely on various forms of shortcut learning to achieve their impressive performance, such as using texture rather than shape information. Such superficial solutions to image recognition have been shown to make DNNs brittle in the face of more challenging tests such as noise-perturbed or out-of-distribution images, casting doubt on their similarity to their biological counterparts. In the present work, we demonstrate that adding fixed biological filter banks, in particular banks of Gabor filters, helps to constrain the networks to avoid reliance on shortcuts, making them develop more structured internal representations and greater tolerance to noise. Importantly, they also gained around 20-35% in accuracy over standard end-to-end trained architectures when generalising to our novel out-of-distribution test image sets. We take these findings to suggest that these properties of the primate visual system should be incorporated into DNNs to make them better able to cope with real-world vision and better capture some of the more impressive aspects of human visual perception such as generalisation.


Subject(s)
Neural Networks, Computer; Visual Perception; Animals; Generalization, Psychological; Recognition, Psychology; Vision, Ocular
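A fixed Gabor filter of the kind placed in front of the networks in the abstract above can be built in a few lines: a Gaussian envelope multiplied by an oriented sinusoidal carrier. This is a generic textbook Gabor kernel, not the paper's exact filter bank, and the parameter names and defaults are ours.

```python
import math

def gabor_kernel(size=7, theta=0.0, wavelength=4.0, sigma=2.0):
    """Build a size x size Gabor kernel as a list of rows.

    The kernel is a Gaussian envelope (spread `sigma`) times a cosine
    carrier of the given `wavelength`, with coordinates rotated by
    `theta` radians so the filter responds to edges at that orientation.
    A bank of these at several orientations and scales approximates the
    fixed V1-like front end described in the abstract."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's preferred orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel
```

Because the kernels are fixed rather than learned, the first layer cannot drift toward texture-like shortcut features during training, which is the constraint the study exploits.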
11.
PLoS One ; 17(1): e0262260, 2022.
Article in English | MEDLINE | ID: mdl-35045116

ABSTRACT

There is growing interest in the role that morphological knowledge plays in literacy acquisition, but there is no research directly comparing the efficacy of different forms of morphological instruction. Here we compare two methods of teaching English morphology in the context of a memory experiment in which words were organized either by affix during study (e.g., a list of words that all share an affix) or by base during study (e.g., a list of words that all share a base). We show that memory for morphologically complex words is better in both conditions compared to a control condition that does not highlight the morphological composition of words and, most importantly, that studying words in a base-centric format improves memory further still. We argue that the morphological matrix that organizes words around a common base may provide an important new tool for literacy instruction.


Subject(s)
Education/methods; Literacy/trends; Teaching/education; Education/trends; Female; Humans; Language; Linguistics/methods; Male; Reading; Young Adult
12.
J Vis ; 21(2): 9, 2021 02 03.
Article in English | MEDLINE | ID: mdl-33620380

ABSTRACT

Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near complete invariance over 10 degrees and others reporting zero invariance at 4 degrees of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here, we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized at high accuracy rates following translations up to 18 degrees. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a global average pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with convolutional neural networks (CNNs).


Subject(s)
Neural Networks, Computer; Pattern Recognition, Automated/methods; Pattern Recognition, Visual/physiology; Deep Learning; Female; Humans; Male; Young Adult
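The global average pooling (GAP) layer mentioned in the abstract above can be illustrated directly: it averages each channel over all spatial positions, so a feature produces the same pooled output wherever in the image it appears. This is a minimal sketch with invented names, not the study's code.

```python
def global_average_pool(feature_map):
    """Collapse each channel's 2-D spatial grid to its mean activation.

    feature_map: a list of channels, each a list of rows of floats.
    Because the mean ignores *where* in the grid a feature fired, the
    pooled output is identical for translated copies of the same
    pattern, which is one route to translation tolerance."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled
```

For example, a single-channel map with one active unit in the top-left corner pools to exactly the same vector as a map with that unit moved to the bottom-right.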
13.
Elife ; 9, 2020 09 02.
Article in English | MEDLINE | ID: mdl-32876562

ABSTRACT

Deep convolutional neural networks (DCNNs) are frequently described as the best current models of human and primate vision. An obvious challenge to this claim is the existence of adversarial images that fool DCNNs but are uninterpretable to humans. However, recent research has suggested that there may be similarities in how humans and DCNNs interpret these seemingly nonsense images. We reanalysed data from a high-profile paper and conducted five experiments controlling for different ways in which these images can be generated and selected. We show human-DCNN agreement is much weaker and more variable than previously reported, and that the weak agreement is contingent on the choice of adversarial images and the design of the experiment. Indeed, we find there are well-known methods of generating images for which humans show no agreement with DCNNs. We conclude that adversarial images still pose a challenge to theorists using DCNNs as models of human vision.


Subject(s)
Vision, Ocular/physiology; Humans; Neural Networks, Computer
14.
Vision Res ; 176: 60-71, 2020 11.
Article in English | MEDLINE | ID: mdl-32781347

ABSTRACT

Various methods of measuring unit selectivity have been developed with the aim of better understanding how neural networks work. But the different measures provide divergent estimates of selectivity, and this has led to different conclusions regarding the conditions in which selective object representations are learned and the functional relevance of these representations. In an attempt to better characterize object selectivity, we undertake a comparison of various selectivity measures on a large set of units in AlexNet, including localist selectivity, precision, class-conditional mean activity selectivity (CCMAS), the human interpretation of activation maximization (AM) images, and standard signal-detection measures. We find that the different measures provide different estimates of object selectivity, with precision and CCMAS measures providing misleadingly high estimates. Indeed, the most selective units had a poor hit rate or a high false-alarm rate (or both) in object classification, making them poor object detectors. We fail to find any units that are even remotely as selective as the 'grandmother cell' units reported in recurrent neural networks. In order to generalize these results, we compared selectivity measures on units in VGG-16 and GoogLeNet, trained on the ImageNet or Places-365 datasets, that have been described as 'object detectors'. Again, we find poor hit rates and high false-alarm rates for object classification. We conclude that signal-detection measures provide a better assessment of single-unit selectivity than common alternative approaches, and that deep convolutional networks trained for image classification do not learn object detectors in their hidden layers.


Subject(s)
Neural Networks, Computer; Humans
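The signal-detection measures favoured in the abstract above can be sketched for a single unit: treat above-threshold responses to the putative preferred class as hits, above-threshold responses to everything else as false alarms, and summarise the separation as d'. This is an illustrative sketch; the function name, threshold convention, and rate clipping are ours, not the paper's.

```python
from statistics import NormalDist

def unit_selectivity(activations, labels, threshold, target):
    """Score one hidden unit as a detector for the `target` class.

    An activation above `threshold` counts as a 'yes' response.
    Returns (hit_rate, false_alarm_rate, d_prime), where d' is the
    separation of target and non-target responses in z-units.
    Rates are clipped away from 0 and 1 so d' stays finite."""
    hits = fas = n_target = n_other = 0
    for a, label in zip(activations, labels):
        if label == target:
            n_target += 1
            if a > threshold:
                hits += 1
        else:
            n_other += 1
            if a > threshold:
                fas += 1
    hit_rate, fa_rate = hits / n_target, fas / n_other
    clip = lambda p: min(max(p, 0.01), 0.99)
    z = NormalDist().inv_cdf
    return hit_rate, fa_rate, z(clip(hit_rate)) - z(clip(fa_rate))
```

A genuine 'object detector' should combine a high hit rate with a low false-alarm rate; the abstract's point is that units scoring high on precision or CCMAS often fail one or both of these criteria.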
15.
Vision Res ; 174: 57-68, 2020 09.
Article in English | MEDLINE | ID: mdl-32599343

ABSTRACT

When deep convolutional neural networks (CNNs) are trained "end-to-end" on raw data, some of the feature detectors they develop in their early layers resemble the representations found in early visual cortex. This result has been used to draw parallels between deep learning systems and human visual perception. In this study, we show that when CNNs are trained end-to-end they learn to classify images based on whatever feature is predictive of a category within the dataset. This can lead to bizarre results where CNNs learn idiosyncratic features such as high-frequency noise-like masks. In the extreme case, our results demonstrate image categorisation on the basis of a single pixel. Such features are extremely unlikely to play any role in human object recognition, where experiments have repeatedly shown a strong preference for shape. Through a series of empirical studies with standard high-performance CNNs, we show that these networks do not develop a shape-bias merely through regularisation methods or more ecologically plausible training regimes. These results raise doubts over the assumption that simply learning end-to-end in standard CNNs leads to the emergence of similar representations to the human visual system. In the second part of the paper, we show that CNNs are less reliant on these idiosyncratic features when we forgo end-to-end learning and introduce hard-wired Gabor filters designed to mimic early visual processing in V1.


Subject(s)
Neural Networks, Computer; Visual Perception; Humans
16.
Philos Trans R Soc Lond B Biol Sci ; 375(1791): 20190309, 2020 02 03.
Article in English | MEDLINE | ID: mdl-31840580

ABSTRACT

Combinatorial generalization-the ability to understand and produce novel combinations of already familiar elements-is considered to be a core capacity of the human mind and a major challenge to neural network models. A significant body of research suggests that conventional neural networks cannot solve this problem unless they are endowed with mechanisms specifically engineered for the purpose of representing symbols. In this paper, we introduce a novel way of representing symbolic structures in connectionist terms-the vectors approach to representing symbols (VARS), which allows training standard neural architectures to encode symbolic knowledge explicitly at their output layers. In two simulations, we show that neural networks not only can learn to produce VARS representations, but in doing so they achieve combinatorial generalization in their symbolic and non-symbolic output. This adds to other recent work that has shown improved combinatorial generalization under some training conditions, and raises the question of whether specific mechanisms or training routines are needed to support symbolic processing. This article is part of the theme issue 'Towards mechanistic models of meaning composition'.


Subject(s)
Neural Networks, Computer; Symbolism; Computer Simulation; Humans; Learning
17.
Bioessays ; 41(8): e1800248, 2019 08.
Article in English | MEDLINE | ID: mdl-31322760

ABSTRACT

There is widespread agreement in neuroscience and psychology that the visual system identifies objects and faces based on a pattern of activation over many neurons, each neuron being involved in representing many different categories. The hypothesis that the visual system includes finely tuned neurons for specific objects or faces for the sake of identification, so-called "grandmother cells", is widely rejected. Here it is argued that the rejection of grandmother cells is premature. Grandmother cells constitute a hypothesis of how familiar visual categories are identified, but the primary evidence against this hypothesis comes from studies that have failed to observe neurons that selectively respond to unfamiliar stimuli. These findings are reviewed and it is shown that they are irrelevant. Neuroscientists need to better understand existing models of face and object identification that include grandmother cells and then compare the selectivity of these units with single neurons responding to stimuli that can be identified.


Subject(s)
Computational Biology; Neurons/physiology; Recognition, Psychology/physiology; Visual Perception/physiology; Animals; Face; Facial Recognition/physiology; Haplorhini/psychology; Humans; Memory, Short-Term/physiology; Models, Neurological; Reward; Visual Cortex/physiology
18.
Q J Exp Psychol (Hove) ; 71(7): 1497-1500, 2018 07.
Article in English | MEDLINE | ID: mdl-29741459

ABSTRACT

Taylor, Davis, and Rastle employed an artificial language learning paradigm to compare phonics and meaning-based approaches to reading instruction. Adults were taught consonant, vowel, and consonant (CVC) words composed of novel letters when the mappings between letters and sounds were completely systematic and the mappings between letters and meaning were completely arbitrary. At test, performance on naming tasks was better following training that emphasised the phonological rather than the semantic mappings, whereas performance on semantic tasks was similar in the two conditions. The authors concluded that these findings support phonics for early reading instruction in English. However, in our view, these conclusions are not justified given that the artificial language mischaracterised both the phonological and semantic mappings in English. Furthermore, the way participants studied the arbitrary letter-meaning correspondences bears little relation to meaning-based strategies used in schools. To compare phonics with meaning-based instruction it must be determined whether phonics is better than alternative forms of instruction that fully exploit the regularities within the semantic route. This is rarely assessed because of a widespread and mistaken assumption that underpins so much basic and applied research, namely, that the main function of spellings is to represent sounds.


Subject(s)
Language; Reading; Adult; Humans; Learning; Phonetics; Semantics
19.
Psychon Bull Rev ; 25(2): 560-585, 2018 04.
Article in English | MEDLINE | ID: mdl-28875456

ABSTRACT

Phonemes play a central role in traditional theories as units of speech perception and access codes to lexical representations. Phonemes have two essential properties: they are 'segment-sized' (the size of a consonant or vowel) and abstract (a single phoneme may have different acoustic realisations). Nevertheless, there is a long history of challenging the phoneme hypothesis, with some theorists arguing for differently sized phonological units (e.g. features or syllables) and others rejecting abstract codes in favour of representations that encode detailed acoustic properties of the stimulus. The phoneme hypothesis is the minority view today. We defend the phoneme hypothesis in two complementary ways. First, we show that rejection of phonemes is based on a flawed interpretation of empirical findings. For example, it is commonly argued that the failure to find acoustic invariances for phonemes rules out phonemes. However, the lack of invariance is only a problem on the assumption that speech perception is a bottom-up process. If learned sublexical codes are modified by top-down constraints (which they are), then this argument loses all force. Second, we provide strong positive evidence for phonemes on the basis of linguistic data. Almost all findings that are taken (incorrectly) as evidence against phonemes are based on psycholinguistic studies of single words. However, phonemes were first introduced in linguistics, and the best evidence for phonemes comes from linguistic analyses of complex word forms and sentences. In short, the rejection of phonemes is based on a false analysis and a too-narrow consideration of the relevant data.


Subject(s)
Language; Phonetics; Speech Perception; Humans; Linguistics; Psycholinguistics
20.
Trends Cogn Sci ; 21(12): 950-961, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29100738

ABSTRACT

Parallel distributed processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely that all knowledge is coded in a distributed format and cognition is mediated by non-symbolic computations. These claims have long been debated in cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks learn units that respond selectively to meaningful categories, and researchers are finding that deep networks need to be supplemented with symbolic systems to perform some tasks. Given the close links between PDP and deep networks, it is surprising that research with deep networks is challenging PDP theory.


Subject(s)
Models, Psychological; Neural Networks, Computer; Cognition; Humans; Linguistics