1.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15883-15895, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37651494

ABSTRACT

Domain adaptation (DA) aims to alleviate the domain shift between source domain and target domain. Most DA methods require access to the source data, but often that is not possible (e.g., due to data privacy or intellectual property). In this paper, we address the challenging source-free domain adaptation (SFDA) problem, where the source pretrained model is adapted to the target domain in the absence of source data. Our method is based on the observation that target data, which might not align with the source domain classifier, still forms clear clusters. We capture this intrinsic structure by defining local affinity of the target data, and encourage label consistency among data with high local affinity. We observe that higher affinity should be assigned to reciprocal neighbors. To aggregate information with more context, we consider expanded neighborhoods with small affinity values. Furthermore, we consider the density around each target sample, which can alleviate the negative impact of potential outliers. In the experimental results we verify that the inherent structure of the target features is an important source of information for domain adaptation. We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood. Finally, we achieve state-of-the-art performance on several 2D image and 3D point cloud recognition datasets.
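As a rough illustration of the neighborhood-based affinity described above, the sketch below builds a k-nearest-neighbor affinity matrix over L2-normalized target features. It gives weight 1 to reciprocal neighbors and a smaller weight `r` to one-way neighbors, and scores label consistency as the affinity-weighted dot product of class-probability vectors. This is a hypothetical NumPy sketch, not the authors' implementation: the function names, the values of `k` and `r`, and the loss form are assumptions, and the expanded neighborhoods and density weighting mentioned in the abstract are omitted.

```python
import numpy as np

def knn_affinity(features: np.ndarray, k: int = 3, r: float = 0.1) -> np.ndarray:
    """Hypothetical affinity: 1.0 for reciprocal k-NN pairs, r for one-way pairs, else 0."""
    # L2-normalize so the dot product is cosine similarity
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbor
    n = f.shape[0]
    nn = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar samples
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.arange(n)[:, None], nn] = True  # is_nn[i, j]: j is among i's k-NN
    reciprocal = is_nn & is_nn.T
    return np.where(reciprocal, 1.0, np.where(is_nn, r, 0.0))

def neighbor_consistency_loss(probs: np.ndarray, affinity: np.ndarray) -> float:
    """Lower when samples with high affinity have similar class-probability vectors."""
    return float(-np.sum(affinity * (probs @ probs.T)) / probs.shape[0])
```

Minimizing this loss with respect to the adapted model's predictions pulls neighboring target samples toward the same label, which is the intuition the abstract describes; the actual weighting scheme in the paper may differ.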

2.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14611-14624, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37450360

ABSTRACT

Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training. The test samples can additionally contain seen categories in the generalized variant. Existing approaches rely on learning either shared or label-specific attention from the seen classes. Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge. In contrast, state-of-the-art single-label generative adversarial network (GAN) based approaches learn to directly synthesize the class-specific visual features from the corresponding class attribute embeddings. However, synthesizing multi-label features from GANs is still unexplored in the context of zero-shot setting. When multiple objects occur jointly in a single image, a critical question is how to effectively fuse multi-class information. In this work, we introduce different fusion approaches at the attribute-level, feature-level and cross-level (across attribute and feature-levels) for synthesizing multi-label features from their corresponding multi-label class embeddings. To the best of our knowledge, our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting. Our cross-level fusion-based generative approach outperforms the state-of-the-art on three zero-shot benchmarks: NUS-WIDE, Open Images and MS COCO. Furthermore, we show the generalization capabilities of our fusion approach in the zero-shot detection task on MS COCO, achieving favorable performance against existing methods.

3.
Front Hum Neurosci ; 17: 1079493, 2023.
Article in English | MEDLINE | ID: mdl-36742356

ABSTRACT

Negation is frequently used in natural language, yet relatively little is known about its processing. More importantly, what is known regarding the neurophysiological processing of negation is mostly based on results of studies using written stimuli (the word-by-word paradigm). While the results of these studies have suggested processing costs in connection to negation (increased negativities in brain responses), it is difficult to know how this translates into processing of spoken language. We therefore developed an auditory paradigm based on a previous visual study investigating processing of affirmatives, sentential negation (not), and prefixal negation (un-). The findings of processing costs were replicated but differed in the details. Importantly, the pattern of ERP effects suggested less effortful processing for auditorily presented negated forms (restricted to increased anterior and posterior positivities) in comparison to visually presented negated forms. We suggest that the natural flow of spoken language reduces variability in processing and therefore results in clearer ERP patterns.

4.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9116-9127, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35298386

ABSTRACT

In class-incremental semantic segmentation, we have no access to the labeled data of previous tasks. Therefore, when incrementally learning new classes, deep neural networks suffer from catastrophic forgetting of previously learned knowledge. To address this problem, we propose a self-training approach that leverages unlabeled data for rehearsal of previous knowledge. Specifically, we first learn a temporary model for the current task, and then compute pseudo labels for the unlabeled data by fusing information from the old model of the previous task and the current temporary model. In addition, we propose conflict reduction to resolve conflicts between the pseudo labels generated by the old and temporary models. We show that maximizing self-entropy can further improve results by smoothing overconfident predictions. Interestingly, the experiments show that the auxiliary data can differ from the training data and that even general-purpose but diverse auxiliary data can lead to large performance gains. The experiments demonstrate state-of-the-art results, with a relative gain of up to 114% on Pascal-VOC 2012 and 8.5% on the more challenging ADE20K compared to previous state-of-the-art methods.
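A minimal sketch of pseudo-label fusion for a batch of pixels, assuming the old model outputs probabilities over background (column 0) plus the old classes, and the temporary model outputs probabilities over background plus the new classes. The conflict rule here (when both models predict a foreground class, keep the label of the more confident one) is a hypothetical stand-in for the paper's conflict reduction; the function name and joint label layout are also illustrative assumptions.

```python
import numpy as np

def fuse_pseudo_labels(old_probs: np.ndarray, tmp_probs: np.ndarray) -> np.ndarray:
    """Fuse per-pixel predictions of the old and temporary models (hypothetical rule).

    old_probs: (N, 1 + n_old) probabilities, column 0 = background.
    tmp_probs: (N, 1 + n_new) probabilities, column 0 = background.
    Returns joint labels: 0 = background, 1..n_old = old classes,
    n_old + 1.. = new classes.
    """
    n_old = old_probs.shape[1] - 1
    old_lbl, tmp_lbl = old_probs.argmax(axis=1), tmp_probs.argmax(axis=1)
    old_conf, tmp_conf = old_probs.max(axis=1), tmp_probs.max(axis=1)
    # shift the temporary model's new-class indices past the old classes
    tmp_joint = np.where(tmp_lbl > 0, tmp_lbl + n_old, 0)
    old_fg, tmp_fg = old_lbl > 0, tmp_lbl > 0

    fused = np.zeros_like(old_lbl)  # both models predict background -> label 0
    only_old = old_fg & ~tmp_fg
    only_new = ~old_fg & tmp_fg
    fused[only_old] = old_lbl[only_old]
    fused[only_new] = tmp_joint[only_new]
    # conflict: both models predict a foreground class; keep the more confident one
    both = old_fg & tmp_fg
    fused[both] = np.where(old_conf[both] >= tmp_conf[both],
                           old_lbl[both], tmp_joint[both])
    return fused
```

A confidence comparison is only one simple way to break ties; the self-entropy term mentioned in the abstract would act on the student model's predictions during training and is not shown here.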

5.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 5513-5533, 2023 May.
Article in English | MEDLINE | ID: mdl-36215375

ABSTRACT

For future learning systems, incremental learning is desirable because it allows for: efficient resource usage by eliminating the need to retrain from scratch at the arrival of new data; reduced memory usage by preventing or limiting the amount of data required to be stored - also important when privacy limitations are imposed; and learning that more closely resembles human learning. The main challenge for incremental learning is catastrophic forgetting, which refers to the precipitous drop in performance on previously learned tasks after learning a new one. Incremental learning of deep neural networks has seen explosive growth in recent years. Initial work focused on task-incremental learning, where a task-ID is provided at inference time. Recently, we have seen a shift towards class-incremental learning where the learner must discriminate at inference time between all classes seen in previous tasks without recourse to a task-ID. In this paper, we provide a complete survey of existing class-incremental learning methods for image classification, and in particular, we perform an extensive experimental evaluation on thirteen class-incremental methods. We consider several new experimental scenarios, including a comparison of class-incremental methods on multiple large-scale image classification datasets, an investigation into small and large domain shifts, and a comparison of various network architectures.

6.
Logoped Phoniatr Vocol ; 47(3): 157-165, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33565897

ABSTRACT

AIM: Students with hearing loss (HL) often fall behind hearing peers in complex language tasks such as narrative writing. This study explored the effects of school grade, gender, cognitive and linguistic predisposition and audiological factors on narrative text quality in this target group. METHOD: Eleven students with HL in Grades 5-6 and 7-8 (age 12-15) who took part in a writing intervention wrote four narrative texts over six months. A trained panel rated text quality. The effects of the students' working memory capacity, language comprehension, reading comprehension, school grade and gender and the intervention were analyzed as a mixed-effects regression model. Audiological factors were considered separately. RESULTS: The analysis showed that throughout the period, texts written by female students in Grade 7-8 received the highest text quality ratings, while those written by male students in Grade 7-8 received the lowest ratings. There was no effect of the intervention, or of the linguistic and cognitive measures. The students with the lowest text quality ratings received amplification later than those with high ratings, but HL severity was not associated with text quality. CONCLUSION: Hearing loss severity was not a decisive factor in narrative text quality. The intervention which the students took part in is potentially effective, with some adaptation to the special needs of students with HL. The strong gender effects are discussed.


Subjects
Hearing Loss, Voice Quality, Adolescent, Child, Comprehension, Female, Humans, Male, Narration, Reading, Students/psychology, Writing

7.
IEEE Trans Image Process ; 30: 3069-3083, 2021.
Article in English | MEDLINE | ID: mdl-33621175

ABSTRACT

Modern computer vision requires processing large amounts of data, both while training the model and during inference once the model is deployed. Scenarios where images are captured and processed in physically separated locations are increasingly common (e.g. autonomous vehicles, cloud computing, smartphones). In addition, many devices have limited resources to store or transmit data (e.g. storage space, channel capacity). In these scenarios, lossy image compression plays a crucial role in effectively increasing the number of images that can be collected under such constraints. However, lossy compression entails some undesired degradation of the data that may harm the performance of the downstream analysis task, since important semantic information may be lost in the process. Moreover, we may only have compressed images at training time but be able to use original images at inference time (i.e. test), or vice versa, and in such a case the downstream model suffers from covariate shift. In this paper, we analyze this phenomenon, with a special focus on vision-based perception for autonomous driving as a paradigmatic scenario. We find that loss of semantic information and covariate shift do indeed exist, resulting in a drop in performance that depends on the compression rate. To address the problem, we propose dataset restoration, based on image restoration with generative adversarial networks (GANs). Our method is agnostic to both the particular image compression method and the downstream task, and it adds no cost to the deployed models, which is particularly important in resource-limited devices. The presented experiments focus on semantic segmentation as a challenging use case, cover a broad range of compression rates and diverse datasets, and show how our method is able to significantly alleviate the negative effects of compression on the downstream visual task.

8.
Logoped Phoniatr Vocol ; 46(1): 1-10, 2021 Apr.
Article in English | MEDLINE | ID: mdl-31910683

ABSTRACT

AIM: Self-efficacy for writing is an important motivational factor and considered to predict writing performance. Self-efficacy for narrative writing has been sparsely studied, and few studies focus on the effects of writing intervention on self-efficacy. Additionally, there is a lack of validated measures of self-efficacy for elementary school students. In a previous study, we found that a trained panel rated personal narrative text quality higher for girls than for boys, which led to our aim: to investigate boys' and girls' self-efficacy for narrative writing before and after an intervention, and to explore associations between self-efficacy and text quality. METHODS: An 18-item self-efficacy scale was developed. Fifty-five fifth-grade students (M 11:2 years, SD 3.7 months) filled out the scale before and after a five-lesson observational learning intervention. Self-efficacy was then related to writing performance as measured by holistic text quality ratings. RESULTS: The students demonstrated strong self-efficacy, which increased significantly post-intervention. Girls and boys demonstrated similar self-efficacy, despite girls' higher text quality. There were moderate correlations between self-efficacy and writing performance pre- and post-intervention. CONCLUSIONS: The results support previous findings of strong self-efficacy at this age. The interaction between writing self-efficacy and performance is complex. Young students may not be able to differentiate between self-efficacy, general writing skills, task performance, and self-regulation. Self-efficacy scales should thus be carefully constructed with respect to age, genre, instruction, and to students' general educational context.


Subjects
Self Efficacy, Voice Quality, Female, Humans, Male, Schools, Students, Writing

9.
Front Psychol ; 11: 1622, 2020.
Article in English | MEDLINE | ID: mdl-32760329

ABSTRACT

The present study looked at the extent to which 2-year-old children benefited from information conveyed by viewing a hiding event through an opening in a cardboard screen, seeing it as live video, as pre-recorded video, or by way of a mirror. Being encouraged to find the hidden object by selecting one out of two cups, the children successfully picked the baited cup significantly more often when they had viewed the hiding through the opening, or in live video, than when they viewed it in pre-recorded video, or by way of a mirror. All conditions rely on the perception of similarity. The study suggests, however, that contiguity - i.e., the perception of temporal and physical closeness between events - rather than similarity is the principal factor accounting for the results.

10.
Sensors (Basel) ; 20(3), 2020 Jan 21.
Article in English | MEDLINE | ID: mdl-31973078

ABSTRACT

On-board vision systems may need to increase the number of classes that can be recognized in a relatively short period. For instance, a traffic sign recognition system may suddenly be required to recognize new signs. Since collecting and annotating samples of such new classes may need more time than we wish, especially for uncommon signs, we propose a method to generate these samples by combining synthetic images and Generative Adversarial Network (GAN) technology. In particular, the GAN is trained on synthetic and real-world samples from known classes to perform synthetic-to-real domain adaptation, but applied to synthetic samples of the new classes. Using the Tsinghua dataset with a synthetic counterpart, SYNTHIA-TS, we have run an extensive set of experiments. The results show that the proposed method is indeed effective, provided that we use a proper Convolutional Neural Network (CNN) to perform the traffic sign recognition (classification) task as well as a proper GAN to transform the synthetic images. Here, a ResNet101-based classifier and domain adaptation based on CycleGAN performed extremely well for a ratio of ∼1/4 for new/known classes; even for more challenging ratios such as ∼4/1, the results are also very positive.

11.
Front Psychol ; 11: 584231, 2020.
Article in English | MEDLINE | ID: mdl-33510669

ABSTRACT

As humans interact in the world, they often orient one another's attention to objects through the use of spoken demonstrative expressions and head and/or hand movements to point to the objects. Although indicating behaviors have frequently been studied in lab settings, we know surprisingly little about how demonstratives and pointing are used to coordinate attention in large-scale space and in natural contexts. This study investigates how speakers of Quiahije Chatino, an indigenous language of Mexico, use demonstratives and pointing to give directions to named places in large-scale space across multiple scales (local activity, district, state). The results show that the use and coordination of demonstratives and pointing change as the scale of search space for the target grows. At larger scales, demonstratives and pointing are more likely to occur together, and the two signals appear to manage different aspects of the search for the target: demonstratives orient attention primarily to the gesturing body, while pointing provides cues for narrowing the search space. These findings underscore the distinct contributions of speech and gesture to the linguistic composite, while illustrating the dynamic nature of their interplay. Abstracts in Spanish and Quiahije Chatino are provided as appendices.

12.
Front Psychol ; 10: 1775, 2019.
Article in English | MEDLINE | ID: mdl-31456709

ABSTRACT

Production studies show that anaphoric reference is bimodal. Speakers can introduce a referent in speech by also using a localizing gesture, assigning a specific locus in space to it. Referring back to that referent, speakers then often accompany a spoken anaphor with a localizing anaphoric gesture (i.e., indicating the same locus). Speakers thus create visual anaphoricity in parallel to the anaphoric process in speech. In the current perception study, we examine whether addressees are sensitive to localizing anaphoric gestures and specifically to the (mis)match between recurrent use of space and spoken anaphora. The results of two reaction time experiments show that, when a single referent is gesturally tracked, addressees are sensitive to the presence of localizing gestures, but not to their spatial congruence. Addressees thus seem to integrate gestural information when processing bimodal anaphora, but their use of locational information in gestures is not obligatory in every discourse context.

13.
Cogn Sci ; 43(3): e12720, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30900290

ABSTRACT

In two miniature artificial language learning experiments, we compare the processing of narrow and broad negation, corresponding to prefixal negation (unhappy) and free-standing negation (not happy) respectively, with that of non-negation (happy). Three artificial prefixes were invented to express the three meanings above. The meaning scope expressed by the negation types was manipulated in the experiments, and the processing of the three forms was tested through a picture-word verification task. In Experiment 1, the scope expressed by prefixal negation was included in the scope expressed by free-standing negation, while in Experiment 2, there was no overlap between the two negation types and the scope of free-standing negation was limited to the intermediate range of a scale. Experiment 1 showed that narrow negation is more difficult to process than the non-negated meanings, but not as difficult as broad negation. Experiment 2 showed that when the meaning scope of broad negation was restricted to the middle range, the processing difficulty found in Experiment 1 disappeared, as it did not take longer for participants to identify the middle range compared to the ends of the scale. We show that the chunking of the negated meanings relative to one another plays a role in the processing cost of these forms.


Subjects
Comprehension, Language, Learning, Adult, Female, Humans, Male

14.
J Opt Soc Am A Opt Image Sci Vis ; 36(1): 105-114, 2019 Jan 01.
Article in English | MEDLINE | ID: mdl-30645344

ABSTRACT

In this work, we propose a convolutional neural network based approach to estimate the spectral reflectance of a surface and spectral power distribution of light from a single RGB image of a V-shaped surface. Interreflections happening in a concave surface lead to gradients of RGB values over its area. These gradients carry a lot of information concerning the physical properties of the surface and the illuminant. Our network is trained with only simulated data constructed using a physics-based interreflection model. Coupling interreflection effects with deep learning helps to retrieve the spectral reflectance under an unknown light and to estimate spectral power distribution of this light as well. In addition, it is more robust to the presence of image noise than classical approaches. Our results show that the proposed approach outperforms state-of-the-art learning-based approaches on simulated data. In addition, it gives better results on real data compared to other interreflection-based approaches.

15.
Article in English | MEDLINE | ID: mdl-30640609

ABSTRACT

In this paper, we extend the standard sequential belief propagation (BP) technique proposed in the tree-reweighted sequential method [15] to fully connected CRF models with the geodesic distance affinity. The proposed method has been applied to the stereo matching problem. We also propose a new approach to the BP marginal solution, which we call one-view occlusion detection (OVOD). In contrast to the standard winner-takes-all (WTA) estimation, the proposed OVOD solution makes it possible to find occluded regions in the disparity map and simultaneously improve the matching result. As a result, we need to perform only one energy minimization process and can avoid the cost calculation for the second view and the left-right check procedure. We show that the OVOD approach considerably improves results for cost augmentation and energy minimization techniques in comparison with the standard one-view affinity space implementation. We apply our method to the Middlebury data set and achieve state-of-the-art results, especially for the median, average and mean squared error metrics.

16.
Logoped Phoniatr Vocol ; 44(3): 115-123, 2019 Oct.
Article in English | MEDLINE | ID: mdl-29303017

ABSTRACT

Observational learning has been shown to be a successful intervention for writing. Until now, however, studies have only been performed with normal-hearing participants, usually high school or university students. Additionally, there have been conflicting results as to whether subjective text quality correlates with one or more objectively measured text characteristics. In this study, we measured the effect of observational learning in a group of four university students with hearing impairment, and compared the results with those of a group of 10 students with normal hearing who did the same intervention, and those of a control group of 10 students with normal hearing who did not do the intervention. Subjective text quality ratings and nine objectively measured text characteristics were collected for three argumentative texts written by each participant. In between writing these three texts, the participants in the experimental groups watched a video of a model writer who read aloud and corrected a similar kind of text. The statistical analysis showed significant correlations between the subjective ratings and four of the nine objective measures, but no significant intervention effect. These findings suggest that an observational-learning intervention is most effective when the model writer is a peer learner and when the intervention is stretched out over time. Additionally, the method may be better suited for learners younger than those included in the present study.


Subjects
Education of Hearing Disabled/methods, Learning, Persons With Hearing Impairments/psychology, Students/psychology, Universities, Writing, Adult, Case-Control Studies, Female, Humans, Male, Peer Influence, Time Factors, Young Adult

17.
IEEE Trans Image Process ; 28(4): 1837-1850, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30403630

ABSTRACT

The use of both off-the-shelf and end-to-end trained deep networks has significantly improved the performance of visual tracking on RGB videos. However, the lack of large labeled datasets hampers the use of convolutional neural networks for tracking in thermal infrared (TIR) images. Therefore, most state-of-the-art methods for tracking on TIR data are still based on handcrafted features. To address this problem, we propose to use image-to-image translation models. These models allow us to translate the abundantly available labeled RGB data to synthetic TIR data. We explore both paired and unpaired image translation models for this purpose. These methods provide us with a large labeled dataset of synthetic TIR sequences, on which we can train end-to-end optimal features for tracking. To the best of our knowledge, we are the first to train end-to-end features for TIR tracking. We perform extensive experiments on the VOT-TIR2017 dataset. We show that a network trained on a large dataset of synthetic TIR data obtains better performance than one trained on the available real TIR data. Combining both data sources leads to further improvement. In addition, when we combine the network with motion features, we outperform the state of the art with a relative gain of over 10%, clearly showing the efficiency of using synthetic data to train end-to-end TIR trackers.

18.
Anim Cogn ; 20(6): 1137-1146, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28929247

ABSTRACT

The ability to inhibit unproductive motor responses triggered by salient stimuli is a fundamental inhibitory skill. Such motor self-regulation is thought to underlie more complex cognitive mechanisms, like self-control. Recently, a large-scale study, comparing 36 species, found that absolute brain size best predicted competence in motor inhibition, with great apes as the best performers. This was challenged when three Corvus species (corvids) were found to parallel great apes despite having much smaller absolute brain sizes. However, new analyses suggest that it is the number of pallial neurons, and not absolute brain size per se, that correlates with levels of motor inhibition. Both studies used the cylinder task, a detour-reaching test where food is presented behind a transparent barrier. We tested four species from the order Psittaciformes (parrots) on this task. Like corvids, many parrots have relatively large brains, high numbers of pallial neurons, and solve challenging cognitive tasks. Nonetheless, parrots performed markedly worse than the Corvus species in the cylinder task and exhibited strong learning effects in performance and response times. Our results suggest either that parrots are poor at controlling their motor impulses, and hence that pallial neuronal numbers do not always correlate with such skills, or that the widely used cylinder task may not be a good measure of motor inhibition.


Subjects
Inhibition, Psychological, Parrots/physiology, Self-Control, Animals, Behavior, Animal, Female, Male, Psychomotor Performance, Research Design

19.
Acta Psychol (Amst) ; 180: 175-189, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28961495

ABSTRACT

In this eye-tracking and drawing study, we investigate the perceptual grounding of different types of spatial dimensions such as dense-sparse and top-bottom, focusing both on the participants' experiences of the opposite regions, e.g., O1: dense; O2: sparse, and the region that is experienced as intermediate, e.g., INT: neither dense nor sparse. Six spatial dimensions expected to have three different perceptual structures in terms of the point and range nature of O1, INT and O2 were analysed. Presented with images, the participants were instructed to identify each region (O1, INT, O2), first by looking at the region, and then by circumscribing it using the computer mouse. We measured the eye movements, identification times and various characteristics of the drawings such as the relative size of the three regions, overlaps and gaps. Three main results emerged. Firstly, generally speaking, intermediate regions were not different from the poles on any of the indicators: overall identification times, number of fixations, and locations. Some differences emerged with regard to the duration of fixations for point INTs and the number of fixations for range INTs between two range poles (O1, O2). Secondly, the analyses of the fixation locations showed that the poles support the identification of the intermediate region as much as the intermediate region supports the identification of the poles. Finally, the relative sizes of the three areas selected in the drawing task were consistent with the classification of the regions as points or ranges. The analyses of the gaps and the overlaps between the three areas showed that the intermediate is neither O1 nor O2, but an entity in its own right.


Subjects
Eye Movements/physiology, Fixation, Ocular/physiology, Hand, Pattern Recognition, Visual/physiology, Adult, Cues, Female, Humans, Male, Young Adult

20.
IEEE Trans Image Process ; 26(8): 3696-3706, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28541203

ABSTRACT

All known recursive filters based on the geodesic distance affinity are realized by two 1D recursions applied in two orthogonal directions of the image plane. This is not a valid 2D extension of the filter and has theoretical drawbacks, which lead to known artifacts. In this paper, a maximum influence propagation method is proposed to approximate the 2D extension of the geodesic distance-based recursive filter. The method partially overcomes the drawbacks of the 1D recursion approach. We show that our improved recursion better approximates the true geodesic distance filter, and that applying this improved filter to image denoising outperforms the existing recursive implementation of the geodesic distance filter. As an application, we consider a geodesic distance-based filter for image denoising. Experimental evaluation of our denoising method demonstrates comparable results, and for several test images better results, than state-of-the-art approaches, while our algorithm is considerably faster, with computational complexity O(8P).
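For reference, the 1D building block that such filters rely on can be written as a pair of recursions. The hypothetical sketch below computes, exactly in 1D, the normalized filter with weights exp(-D(i, j)/sigma), where D is the geodesic distance accumulated along the signal (each step costs 1 + lam times the absolute change in the guide signal). It then applies this filter separably along rows and columns, which is the two-pass 2D scheme the abstract criticizes. It does not implement the proposed maximum influence propagation; `sigma`, `lam`, the step-cost definition, and the function names are assumptions.

```python
import numpy as np

def geodesic_filter_1d(x, guide, sigma=1.0, lam=10.0):
    """Normalized 1D filter with weights exp(-D(i, j) / sigma), computed by recursion."""
    x = np.asarray(x, dtype=float)
    g = np.asarray(guide, dtype=float)
    n = x.size
    # a[i] = exp(-step / sigma): affinity between samples i-1 and i (a[0] unused)
    step = 1.0 + lam * np.abs(np.diff(g))
    a = np.concatenate(([0.0], np.exp(-step / sigma)))
    # causal pass: f_num[i] = sum_{j <= i} w(i, j) * x[j], f_den the matching weight sum
    f_num, f_den = x.copy(), np.ones(n)
    for i in range(1, n):
        f_num[i] += a[i] * f_num[i - 1]
        f_den[i] += a[i] * f_den[i - 1]
    # anti-causal pass: same sums over j >= i
    b_num, b_den = x.copy(), np.ones(n)
    for i in range(n - 2, -1, -1):
        b_num[i] += a[i + 1] * b_num[i + 1]
        b_den[i] += a[i + 1] * b_den[i + 1]
    # sample i (weight 1) is counted in both passes, so subtract it once
    return (f_num + b_num - x) / (f_den + b_den - 1.0)

def separable_geodesic_filter_2d(img, sigma=1.0, lam=10.0):
    """Row pass followed by column pass, each guided by the input image."""
    img = np.asarray(img, dtype=float)
    rows = np.stack([geodesic_filter_1d(r, r, sigma, lam) for r in img])
    return np.stack([geodesic_filter_1d(c, gc, sigma, lam)
                     for c, gc in zip(rows.T, img.T)], axis=1)
```

In 1D the two recursions reproduce the explicit weighted average exactly in O(n). In the separable 2D version, information only travels along horizontal-then-vertical paths rather than along the shortest geodesic path in the plane, which is one plausible reading of the drawback the abstract refers to.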
