Pesquisa | BVS Violência e Saúde

1.

Understanding the role of individual units in a deep neural network.

Bau, David; Zhu, Jun-Yan; Strobelt, Hendrik; Lapedriza, Agata; Zhou, Bolei; Torralba, Antonio.

Proc Natl Acad Sci U S A ; 117(48): 30071-30078, 2020 12 01.

Artigo em Inglês | MEDLINE | ID: mdl-32873639

RESUMO

Deep neural networks excel at finding hierarchical representations that solve complex tasks over large datasets. How can we humans understand these learned representations? In this work, we present network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks. First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts. We find evidence that the network has learned many object classes that play crucial roles in classifying scene classes. Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes. By analyzing changes made when small sets of units are activated or deactivated, we find that objects can be added and removed from the output scenes while adapting to the context. Finally, we apply our analytic framework to understanding adversarial attacks and to semantic image editing.

2.

Modeling Subjective Affect Annotations with Multi-Task Learning.

Hayat, Hassan; Ventura, Carles; Lapedriza, Agata.

Sensors (Basel) ; 22(14)2022 Jul 13.

Artigo em Inglês | MEDLINE | ID: mdl-35890925

RESUMO

In supervised learning, the generalization capabilities of trained models are based on the available annotations. Usually, multiple annotators are asked to annotate the dataset samples and, then, the common practice is to aggregate the different annotations by computing average scores or majority voting, and train and test models on these aggregated annotations. However, this practice is not suitable for all types of problems, especially when the subjective information of each annotator matters for the task modeling. For example, emotions experienced while watching a video or evoked by other sources of content, such as news headlines, are subjective: different individuals might perceive or experience different emotions. The aggregated annotations in emotion modeling may lose the subjective information and actually represent an annotation bias. In this paper, we highlight the weaknesses of models that are trained on aggregated annotations for modeling tasks related to affect. More concretely, we compare two generic Deep Learning architectures: a Single-Task (ST) architecture and a Multi-Task (MT) architecture. While the ST architecture models single emotional perception each time, the MT architecture jointly models every single annotation and the aggregated annotations at once. Our results show that the MT approach can more accurately model every single annotation and the aggregated annotations when compared to methods that are directly trained on the aggregated annotations. Furthermore, the MT approach achieves state-of-the-art results on the COGNIMUSE, IEMOCAP, and SemEval_2007 benchmarks.

3.

Incidents1M: A Large-Scale Dataset of Images With Natural Disasters, Damage, and Incidents.

Weber, Ethan; Papadopoulos, Dim P; Lapedriza, Agata; Ofli, Ferda; Imran, Muhammad; Torralba, Antonio.

IEEE Trans Pattern Anal Mach Intell ; 45(4): 4768-4781, 2023 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-35905065

RESUMO

Natural disasters, such as floods, tornadoes, or wildfires, are increasingly pervasive as the Earth undergoes global warming. It is difficult to predict when and where an incident will occur, so timely emergency response is critical to saving the lives of those endangered by destructive events. Fortunately, technology can play a role in these situations. Social media posts can be used as a low-latency data source to understand the progression and aftermath of a disaster, yet parsing this data is tedious without automated methods. Prior work has mostly focused on text-based filtering, yet image and video-based filtering remains largely unexplored. In this work, we present the Incidents1M Dataset, a large-scale multi-label dataset which contains 977,088 images, with 43 incident and 49 place categories. We provide details of the dataset construction, statistics and potential biases; introduce and train a model for incident detection; and perform image-filtering experiments on millions of images on Flickr and Twitter. We also present some applications on incident analysis to encourage and enable future work in computer vision for humanitarian aid. Code, data, and models are available at http://incidentsdataset.csail.mit.edu.

4.

Deploying a robotic positive psychology coach to improve college students' psychological well-being.

Jeong, Sooyeon; Aymerich-Franch, Laura; Arias, Kika; Alghowinem, Sharifa; Lapedriza, Agata; Picard, Rosalind; Park, Hae Won; Breazeal, Cynthia.

User Model User-adapt Interact ; 33(2): 571-615, 2023 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-38737788

RESUMO

Despite the increase in awareness and support for mental health, college students' mental health is reported to decline every year in many countries. Several interactive technologies for mental health have been proposed and are aiming to make therapeutic service more accessible, but most of them only provide one-way passive contents for their users, such as psycho-education, health monitoring, and clinical assessment. We present a robotic coach that not only delivers interactive positive psychology interventions but also provides other useful skills to build rapport with college students. Results from our on-campus housing deployment feasibility study showed that the robotic intervention showed significant association with increases in students' psychological well-being, mood, and motivation to change. We further found that students' personality traits were associated with the intervention outcomes as well as their working alliance with the robot and their satisfaction with the interventions. Also, students' working alliance with the robot was shown to be associated with their pre-to-post change in motivation for better well-being. Analyses on students' behavioral cues showed that several verbal and nonverbal behaviors were associated with the change in self-reported intervention outcomes. The qualitative analyses on the post-study interview suggest that the robotic coach's companionship made a positive impression on students, but also revealed areas for improvement in the design of the robotic coach. Results from our feasibility study give insight into how learning users' traits and recognizing behavioral cues can help an AI agent provide personalized intervention experiences for better mental health outcomes.

5.

Context Based Emotion Recognition Using EMOTIC Dataset.

Kosti, Ronak; Alvarez, Jose M; Recasens, Adria; Lapedriza, Agata.

IEEE Trans Pattern Anal Mach Intell ; 42(11): 2755-2766, 2020 11.

Artigo em Inglês | MEDLINE | ID: mdl-31095475

RESUMO

In our everyday lives and social interactions we often try to perceive the emotional states of people. There has been a lot of research in providing machines with a similar capacity of recognizing emotions. From a computer vision perspective, most of the previous efforts have been focusing in analyzing the facial expressions and, in some cases, also the body pose. Some of these methods work remarkably well in specific settings. However, their performance is limited in natural, unconstrained environments. Psychological studies show that the scene context, in addition to facial expression and body pose, provides important information to our perception of people's emotions. However, the processing of the context for automatic emotion recognition has not been explored in depth, partly due to the lack of proper data. In this paper we present EMOTIC, a dataset of images of people in a diverse set of natural situations, annotated with their apparent emotion. The EMOTIC dataset combines two different types of emotion representation: (1) a set of 26 discrete categories, and (2) the continuous dimensions Valence, Arousal, and Dominance. We also present a detailed statistical and algorithmic analysis of the dataset along with annotators' agreement analysis. Using the EMOTIC dataset we train different CNN models for emotion recognition, combining the information of the bounding box containing the person with the contextual information extracted from the scene. Our results show how scene context provides important information to automatically recognize emotional states and motivate further research in this direction.

6.

Places: A 10 Million Image Database for Scene Recognition.

Zhou, Bolei; Lapedriza, Agata; Khosla, Aditya; Oliva, Aude; Torralba, Antonio.

IEEE Trans Pattern Anal Mach Intell ; 40(6): 1452-1464, 2018 06.

Artigo em Inglês | MEDLINE | ID: mdl-28692961

RESUMO

The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification performance at tasks such as visual object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories, comprising a large and diverse list of the types of environments encountered in the world. Using the state-of-the-art Convolutional Neural Networks (CNNs), we provide scene classification CNNs (Places-CNNs) as baselines, that significantly outperform the previous approaches. Visualization of the CNNs trained on Places shows that object detectors emerge as an intermediate representation of scene classification. With its high-coverage and high-diversity of exemplars, the Places Database along with the Places-CNNs offer a novel resource to guide future progress on scene recognition problems.

7.

Boosted online learning for face recognition.

Masip, David; Lapedriza, Agata; Vitrià, Jordi.

IEEE Trans Syst Man Cybern B Cybern ; 39(2): 530-8, 2009 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-19095543

RESUMO

Face recognition applications commonly suffer from three main drawbacks: a reduced training set, information lying in high-dimensional subspaces, and the need to incorporate new people to recognize. In the recent literature, the extension of a face classifier in order to include new people in the model has been solved using online feature extraction techniques. The most successful approaches of those are the extensions of the principal component analysis or the linear discriminant analysis. In the current paper, a new online boosting algorithm is introduced: a face recognition method that extends a boosting-based classifier by adding new classes while avoiding the need of retraining the classifier each time a new person joins the system. The classifier is learned using the multitask learning principle where multiple verification tasks are trained together sharing the same feature space. The new classes are added taking advantage of the structure learned previously, being the addition of new classes not computationally demanding. The present proposal has been (experimentally) validated with two different facial data sets by comparing our approach with the current state-of-the-art techniques. The results show that the proposed online boosting algorithm fares better in terms of final accuracy. In addition, the global performance does not decrease drastically even when the number of classes of the base problem is multiplied by eight.

Assuntos

Inteligência Artificial , Face , Reconhecimento Automatizado de Padrão/métodos , Algoritmos , Análise por Conglomerados , Humanos , Distribuição de Poisson , Análise de Componente Principal

8.

Preferred spatial frequencies for human face processing are associated with optimal class discrimination in the machine.

Keil, Matthias S; Lapedriza, Agata; Masip, David; Vitria, Jordi.

PLoS One ; 3(7): e2590, 2008 Jul 02.

Artigo em Inglês | MEDLINE | ID: mdl-18596932

RESUMO

Psychophysical studies suggest that humans preferentially use a narrow band of low spatial frequencies for face recognition. Here we asked whether artificial face recognition systems have an improved recognition performance at the same spatial frequencies as humans. To this end, we estimated recognition performance over a large database of face images by computing three discriminability measures: Fisher Linear Discriminant Analysis, Non-Parametric Discriminant Analysis, and Mutual Information. In order to address frequency dependence, discriminabilities were measured as a function of (filtered) image size. All three measures revealed a maximum at the same image sizes, where the spatial frequency content corresponds to the psychophysical found frequencies. Our results therefore support the notion that the critical band of spatial frequencies for face recognition in humans and machines follows from inherent properties of face images, and that the use of these frequencies is associated with optimal face recognition performance.

Assuntos

Face , Interpretação de Imagem Assistida por Computador/métodos , Reconhecimento Automatizado de Padrão/métodos , Feminino , Humanos , Interpretação de Imagem Assistida por Computador/instrumentação , Masculino , Reconhecimento Visual de Modelos , Percepção Visual

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA