Results 1 - 12 of 12
1.
Front Robot AI ; 8: 555913, 2021.
Article in English | MEDLINE | ID: mdl-34277714

ABSTRACT

Listening to one another is essential to human-human interaction. In fact, we humans spend a substantial part of our day listening to other people, in private as well as in work settings. Attentive listening serves to gather information for oneself, but at the same time it signals to the speaker that they are being heard. To deduce whether our interlocutor is listening to us, we rely on reading their nonverbal cues, very much like how we use nonverbal cues to signal our own attention. Such signaling becomes more complex when we move from dyadic to multi-party interactions. Understanding how humans use nonverbal cues in a multi-party listening context not only increases our understanding of human-human communication but also aids the development of successful human-robot interactions. This paper brings together previous analyses of listener behavior in human-human multi-party interaction and provides novel insights into the gaze patterns between the listeners in particular. We investigate whether the gaze patterns and feedback behavior observed in human-human dialogue are also beneficial for the perception of a robot in multi-party human-robot interaction. To answer this question, we implement an attentive listening system that generates multimodal listening behavior based on our human-human analysis. We compare our system to a baseline system that does not differentiate between listener types in its behavior generation, and evaluate it in terms of the participants' perception of the robot, their behavior, and the perception of third-party observers.

2.
IEEE Trans Pattern Anal Mach Intell ; 43(3): 1092-1099, 2021 03.
Article in English | MEDLINE | ID: mdl-31804927

ABSTRACT

Most non-invasive gaze estimation methods regress gaze directions directly from a single face or eye image. However, due to important variability in eye shape and inner eye structure amongst individuals, universal models achieve limited accuracy, and their outputs usually exhibit high variance as well as subject-dependent biases. Accuracy is therefore usually increased through calibration, allowing gaze predictions for a subject to be mapped to their actual gaze. In this article, we introduce a novel approach which works by directly training a differential convolutional neural network to predict the gaze difference between two eye input images of the same subject. Then, given a set of subject-specific calibration images, we can use the inferred differences to predict the gaze direction of a novel eye sample. The assumption is that by comparing eye images of the same user, nuisance factors (alignment, eyelid closure, illumination perturbations) which usually plague single-image prediction methods are much reduced, allowing better predictions overall. Furthermore, the differential network itself can be adapted via fine-tuning to make predictions consistent with the available user reference pairs. Experiments on three public datasets validate our approach, which consistently outperforms state-of-the-art methods, including those relying on subject-specific gaze adaptation, even when using only one calibration sample.
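
The differential idea is compact enough to sketch. Below is a minimal, illustrative PyTorch sketch (hypothetical names, not the authors' released code) of a shared-backbone network that regresses the gaze difference between two eye images, plus inference that averages the calibration gazes shifted by the predicted differences.

# Minimal sketch of differential gaze prediction (hypothetical names, not
# the authors' code). Two eye images of the same subject pass through a
# shared CNN; the head regresses their gaze DIFFERENCE (yaw, pitch).
import torch
import torch.nn as nn

class DiffGazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(          # shared feature extractor
            nn.Conv2d(1, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(2)            # predicts gaze(a) - gaze(b)

    def forward(self, eye_a, eye_b):
        f = torch.cat([self.backbone(eye_a), self.backbone(eye_b)], dim=1)
        return self.head(f)

@torch.no_grad()
def predict_gaze(net, eye_new, calib_imgs, calib_gazes):
    """Average calibration gazes plus the inferred differences."""
    preds = [g + net(eye_new, img) for img, g in zip(calib_imgs, calib_gazes)]
    return torch.stack(preds).mean(dim=0)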

3.
IEEE Trans Pattern Anal Mach Intell ; 40(11): 2653-2667, 2018 11.
Article in English | MEDLINE | ID: mdl-29993569

ABSTRACT

Head pose estimation is a fundamental task for face-related and social research. Although 3D morphable model (3DMM) based methods relying on depth information usually achieve accurate results, they typically require frontal or mid-profile poses, which precludes a large set of applications where such conditions cannot be guaranteed, like monitoring natural interactions from fixed sensors placed in the environment. A major reason is that 3DMM models usually only cover the face region. In this paper, we present a framework which combines the strengths of a 3DMM fitted online with a prior-free reconstruction of a full 3D head model, providing support for pose estimation from any viewpoint. In addition, we propose a symmetry regularizer for accurate 3DMM fitting under partial observations, and exploit visual tracking to address natural head dynamics with fast accelerations. Extensive experiments show that our method achieves state-of-the-art performance on the public BIWI dataset, as well as accurate and robust results on UbiPose, an annotated dataset of natural interactions that we make public, in which adverse poses, occlusions, and fast motions regularly occur.


Subjects
Head/anatomy & histology; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Algorithms; Databases, Factual; Deep Learning; Face/anatomy & histology; Facial Expression; Female; Humans; Male; Models, Anatomic; Pattern Recognition, Automated/methods; Posture
4.
IEEE Trans Pattern Anal Mach Intell ; 38(8): 1583-97, 2016 08.
Article in English | MEDLINE | ID: mdl-26955020

ABSTRACT

This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, in which skeleton joint information and depth and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, opening the door to the use of deep learning techniques for further exploring multimodal time-series data.
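
The hybrid decoding step described here — deep networks supplying the HMM emission probabilities — can be sketched as follows (illustrative NumPy only; the array shapes and state priors are assumptions, not the paper's implementation). The network posteriors p(s|x) are converted to scaled likelihoods p(x|s) ∝ p(s|x)/p(s) and the gesture sequence is recovered with Viterbi decoding.

# Sketch of hybrid NN/HMM gesture decoding (illustrative, not the paper's
# code). Dividing per-frame posteriors p(s|x) by state priors p(s) gives
# scaled likelihoods p(x|s) used as HMM emission scores.
import numpy as np

def emission_scores(posteriors, priors, eps=1e-12):
    """Convert NN posteriors (T, S) into scaled log-likelihoods."""
    return np.log(posteriors + eps) - np.log(priors + eps)

def viterbi(log_emis, log_trans, log_init):
    """log_emis: (T, S) emission scores; returns the best state path."""
    T, S = log_emis.shape
    delta = log_init + log_emis[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # (S, S): prev state -> next
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emis[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):             # backtrack
        path.append(int(back[t][path[-1]]))
    return path[::-1]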


Subjects
Gestures; Neural Networks, Computer; Pattern Recognition, Automated; Algorithms; Humans; Learning; Normal Distribution
5.
IEEE Trans Image Process ; 23(7): 3040-56, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24835228

ABSTRACT

We present a conditional random field approach to tracking-by-detection in which we model pairwise factors linking pairs of detections and their hidden labels, as well as higher-order potentials defined in terms of label costs. In contrast to previous works, our method considers long-term connectivity between pairs of detections and models similarities as well as dissimilarities between them, based on position, color, and, as a novelty, visual motion cues. We introduce a set of feature-specific confidence scores which weight feature contributions according to their reliability. Pairwise potential parameters are then learned in an unsupervised way from detections or from tracklets. Label costs are defined so as to penalize the complexity of the labeling, based on prior knowledge about the scene such as the location of entry/exit zones. Experiments on the PETS'09, TUD, CAVIAR, Parking Lot, and Town Center public data sets demonstrate the validity of our approach, with performance similar to or better than recent state-of-the-art algorithms.
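
As a rough illustration of the confidence-weighted cue fusion (the feature names, scales, and linear form below are assumptions for illustration; the actual potential parameters are learned as described), a pairwise score between two detections might combine position, color, and motion cues like this:

# Rough illustration (hypothetical feature names and scales) of a
# confidence-weighted pairwise score between two detections.
import numpy as np

def pairwise_score(det_i, det_j, conf):
    """Combine position, color and motion cues, each weighted by a
    feature-specific confidence in [0, 1] reflecting its reliability."""
    cues = {
        "position": -np.linalg.norm(det_i["pos"] - det_j["pos"]) / 50.0,   # pixels
        "color":    -np.abs(det_i["hist"] - det_j["hist"]).sum(),          # histogram L1
        "motion":   -np.linalg.norm(det_i["flow"] - det_j["flow"]) / 5.0,  # px/frame
    }
    # A higher score favours giving the two detections the same label.
    return sum(conf[k] * v for k, v in cues.items())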


Subjects
Algorithms; Image Processing, Computer-Assisted/methods; Movement/physiology; Humans; Spatio-Temporal Analysis; Video Recording
6.
IEEE Trans Pattern Anal Mach Intell ; 36(1): 140-56, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24231872

ABSTRACT

In this paper, we present a new model for unsupervised discovery of recurrent temporal patterns (or motifs) in time series (or documents). The model is designed to handle the difficult case of multivariate time series obtained from a mixture of activities, that is, observations caused by the superposition of multiple phenomena occurring concurrently and with no synchronization. The model uses nonparametric Bayesian methods to describe both the motifs and their occurrences in documents. We derive an inference scheme to automatically and simultaneously recover the recurrent motifs (both their characteristics and their number) and their occurrence instants in each document. The model is widely applicable and is illustrated on datasets from multiple modalities, mainly videos from static cameras and audio localization data. The rich semantic interpretation that the model offers can be leveraged in tasks such as event counting or scene analysis. The approach is also used as a means of performing soft camera calibration in a camera network. A thorough study of the model parameters is provided, and a cross-platform implementation of the inference algorithm will be made publicly available.

7.
IEEE Trans Image Process ; 22(1): 272-85, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22851262

ABSTRACT

To improve visual tracking, a large number of papers study more powerful features or better cue-fusion mechanisms, such as adaptation or contextual models. A complementary approach consists of improving track management, that is, deciding when to add a target or stop tracking it, for example in case of failure. This is an essential component of effective multi-object tracking applications, and it is often not trivial. Deciding whether to stop a track is a compromise between avoiding erroneously stopping early while tracking is still successful and erroneously continuing to track after an actual failure. This decision process, very rarely addressed in the literature, is difficult due to object detector deficiencies and observation models that are insufficient to describe the full variability of tracked objects and deliver reliable likelihood (tracking) information. This paper addresses the track management issue and presents a real-time online multi-face tracking algorithm that effectively deals with the above difficulties. The tracking itself is formulated in a multi-object state-space Bayesian filtering framework solved with Markov chain Monte Carlo. Within this framework, an explicit probabilistic filtering step decides when to add or remove a target from the tracker, where decisions rely on multiple cues such as face detections, likelihood measures, long-term observations, and track state characteristics. The method has been applied to three challenging data sets of more than 9 hours in total, and demonstrates a significant performance increase compared to more traditional approaches (Markov chain Monte Carlo, reversible-jump Markov chain Monte Carlo) relying only on head detection and likelihood for track management.
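
A toy sketch of the kind of cue fusion behind the keep/remove decision follows (the cue names, weights, and log-odds form are assumptions for illustration; in the paper this decision is embedded in an explicit probabilistic filtering step rather than a hand-set threshold):

# Toy sketch (assumed cue names, weights and threshold) of fusing multiple
# cues into a keep/remove decision for one track.
import numpy as np

def continue_track(det_support, mean_loglik, longterm_sim,
                   w=(2.0, 1.0, 1.5), bias=-1.0):
    """det_support: fraction of recent frames with an associated face
    detection; mean_loglik: average tracker log-likelihood; longterm_sim:
    appearance similarity to the track's long-term observation model."""
    logodds = (w[0] * (det_support - 0.5)
               + w[1] * np.tanh(mean_loglik)
               + w[2] * (longterm_sim - 0.5)
               + bias)
    return logodds > 0.0   # True: keep the track; False: remove it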


Subjects
Biometric Identification/methods; Face/anatomy & histology; Image Processing, Computer-Assisted/methods; Algorithms; Bayes Theorem; Humans; Markov Chains; Monte Carlo Method; Video Recording
8.
IEEE Trans Pattern Anal Mach Intell ; 33(1): 101-16, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21088322

ABSTRACT

This paper introduces a novel contextual model for the recognition of people's visual focus of attention (VFOA) in meetings from audio-visual perceptual cues. More specifically, instead of independently recognizing the VFOA of each meeting participant from their own head pose, we propose to jointly recognize the participants' visual attention in order to introduce context-dependent interaction models that relate to group activity and the social dynamics of communication. Meeting contextual information is represented by the location of people, conversational events identifying floor-holding patterns, and a presentation activity variable. By modeling the interactions between the different contexts and their combined, and sometimes contradictory, impact on gazing behavior, our model allows us to handle VFOA recognition in difficult task-based meetings involving artifacts, presentations, and moving people. We validated our model through rigorous evaluation on a publicly available and challenging data set of 12 real meetings (5 hours of data). The results demonstrate that the integration of the presentation and conversation dynamical context using our model leads to significant performance improvements.


Subjects
Attention/physiology; Cues; Head Movements/physiology; Algorithms; Artificial Intelligence; Bayes Theorem; Computer Simulation; Humans
9.
IEEE Trans Syst Man Cybern B Cybern ; 39(1): 16-33, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19068430

ABSTRACT

We address the problem of recognizing the visual focus of attention (VFOA) of meeting participants based on their head pose. To this end, the head pose observations are modeled using a Gaussian mixture model (GMM) or a hidden Markov model (HMM) whose hidden states correspond to the VFOA. The novelties of this paper are threefold. First, contrary to previous studies on the topic, in our setup the potential VFOA of a person is not restricted to the other participants only; it also includes environmental targets (a table and a projection screen), which increases the complexity of the task, with more VFOA targets spread across both the pan and tilt gaze space. Second, we propose a geometric model to set the GMM or HMM parameters by exploiting results from cognitive science on saccadic eye motion, which allows the prediction of the head pose given a gaze target. Third, we propose an unsupervised parameter adaptation step that does not use any labeled data and accounts for the specific gazing behavior of each participant. Using a publicly available corpus of eight meetings featuring four persons, we analyze the above methods by evaluating, through objective performance measures, the recognition of the VFOA from head pose information obtained either with a magnetic sensor device or with a vision-based tracking system. The results clearly show that in such complex but realistic situations, VFOA recognition performance depends highly on how well the visual targets are separated for a given meeting participant. In addition, the results show that a geometric model with unsupervised adaptation achieves better results than using training data to set the HMM parameters.
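
The geometric model lends itself to a short sketch (NumPy; the head-contribution constant and Gaussian widths below are assumed values for illustration, not the paper's): saccade studies suggest the head performs only part of a gaze rotation, so the expected head pose for a target is a fixed fraction of the rotation toward it, and a Gaussian per target then classifies observed head poses.

# Sketch of the geometric VFOA prior (illustrative constants only).
import numpy as np

KAPPA = 0.5   # head's assumed share of the gaze rotation

def target_means(gaze_dirs, rest_pose):
    """Predicted mean head (pan, tilt) for each gaze target, as the
    rest pose plus a fraction of the rotation toward the target."""
    return {t: rest_pose + KAPPA * (g - rest_pose)
            for t, g in gaze_dirs.items()}

def classify_vfoa(head_pose, means, sigma=np.array([10.0, 8.0])):
    """Assign the VFOA target whose Gaussian best explains the pose."""
    def loglik(mu):
        return -0.5 * np.sum(((head_pose - mu) / sigma) ** 2)
    return max(means, key=lambda t: loglik(means[t]))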


Subjects
Attention/physiology; Head Movements/physiology; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Visual Fields/physiology; Algorithms; Artificial Intelligence; Computer Simulation; Humans; Markov Chains; Normal Distribution
10.
IEEE Trans Pattern Anal Mach Intell ; 30(7): 1212-29, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18550904

ABSTRACT

We define and address the problem of finding the visual focus of attention for a varying number of wandering people (VFOA-W), that is, determining where people are looking when their movement is unconstrained. VFOA-W estimation is a new and important problem with implications for behavior understanding and cognitive science, as well as real-world applications. One such application, which we present in this article, monitors the attention passers-by pay to an outdoor advertisement. Our approach to the VFOA-W problem proposes a multi-person tracking solution based on a dynamic Bayesian network that simultaneously infers the (variable) number of people in a scene, their body locations, their head locations, and their head pose. For efficient inference in the resulting large, variable-dimensional state space, we propose a Reversible-Jump Markov Chain Monte Carlo (RJMCMC) sampling scheme, as well as a novel global observation model which determines the number of people in the scene and localizes them. We also propose Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM) based VFOA-W models which use head pose and location information to determine people's focus state. Our models are evaluated for tracking performance and for the ability to recognize people looking at an outdoor advertisement, with results indicating good performance on sequences where a moderate number of people pass in front of the advertisement.


Subjects
Artificial Intelligence; Attention; Image Interpretation, Computer-Assisted/methods; Movement/physiology; Pattern Recognition, Automated/methods; Visual Fields/physiology; Visual Perception/physiology; Algorithms; Humans; Image Enhancement/methods; Imaging, Three-Dimensional/methods; Reproducibility of Results; Sensitivity and Specificity; Video Recording/methods
11.
IEEE Trans Pattern Anal Mach Intell ; 29(9): 1575-89, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17627045

ABSTRACT

This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate (1) whether a text-like bag-of-visterms representation (histogram of quantized local visual features) is suitable for scene (rather than object) classification, (2) whether some analogies between discrete scene representations and text documents exist, and (3) whether unsupervised, latent-space models can be used both as feature extractors for the classification task and to discover patterns of visual co-occurrence. Using several data sets, we validate our approach, presenting and discussing experiments on each of these issues. We first show, with extensive experiments on binary and multi-class scene classification tasks using a 9,500-image data set, that the bag-of-visterms representation consistently outperforms classical scene classification approaches. On other data sets, we show that our approach competes with or outperforms other recent, more complex methods. We also show that Probabilistic Latent Semantic Analysis (PLSA) generates a compact scene representation that is discriminative for accurate classification and more robust than the bag-of-visterms representation when less labeled training data is available. Finally, through aspect-based image ranking experiments, we show the ability of PLSA to automatically extract visually meaningful scene patterns, making such representations useful for browsing image collections.
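
The bag-of-visterms representation itself is easy to sketch (illustrative scikit-learn code; the paper's exact quantizer and vocabulary size may differ): local descriptors are quantized against a learned vocabulary, and each image becomes a normalized visterm histogram that can then feed a classifier or PLSA.

# Minimal bag-of-visterms sketch (illustrative; uses k-means as a generic
# quantizer, which may differ from the paper's exact choice).
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(all_descriptors, n_visterms=1000):
    """all_descriptors: (N, D) local features pooled over training images."""
    return MiniBatchKMeans(n_clusters=n_visterms).fit(all_descriptors)

def bag_of_visterms(image_descriptors, vocab):
    """Normalized histogram of quantized local features for one image."""
    words = vocab.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)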


Subjects
Algorithms; Artificial Intelligence; Databases, Factual; Image Interpretation, Computer-Assisted/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Image Enhancement/methods; Natural Language Processing; Reproducibility of Results; Sensitivity and Specificity
12.
IEEE Trans Image Process ; 15(11): 3514-30, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17076409

ABSTRACT

Particle filtering is now established as one of the most popular methods for visual tracking. Within this framework, there are two important considerations. The first refers to the generic assumption that the observations are temporally independent given the sequence of object states. The second, often made in the literature, is the use of the transition prior as the proposal distribution. The current observations are thus not taken into account, requiring the noise process of this prior to be large enough to handle abrupt trajectory changes. As a result, many particles are either wasted in low-likelihood regions of the state space, resulting in low sampling efficiency, or, more importantly, propagated to distractor regions of the image, resulting in tracking failures. In this paper, we propose to handle both considerations using motion. We first argue that, in general, observations are conditionally correlated, and propose a new model to account for this correlation, allowing for the natural introduction of implicit and/or explicit motion measurements in the likelihood term. Second, explicit motion measurements are used to drive the sampling process toward the most likely regions of the state space. Overall, the proposed model handles abrupt motion changes and filters out visual distractors when tracking objects with generic models based on shape or color distribution. Results were obtained on head-tracking experiments using several moving-camera sequences involving large dynamics. Compared against the Condensation algorithm, they demonstrate the superior tracking performance of our approach.
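
The second point — using explicit motion measurements to drive sampling — can be illustrated with a toy one-dimensional particle-filter step (all constants and densities below are assumptions, not the paper's model). Note that the importance weights divide by the full mixture proposal so the filter remains consistent.

# Toy sketch of a motion-driven mixture proposal (1-D state, assumed
# constants; not the paper's full model). A fraction of particles is
# sampled around an explicit motion measurement rather than from the
# transition prior alone.
import numpy as np
rng = np.random.default_rng(0)

def gauss(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def step(particles, weights, motion_meas, obs_lik,
         alpha=0.3, sig_prior=5.0, sig_motion=2.0):
    """One filtering step: sample from the mixture proposal, then weight
    by likelihood times transition prior over proposal density."""
    n = len(particles)
    use_motion = rng.random(n) < alpha
    new = np.where(use_motion,
                   motion_meas + sig_motion * rng.standard_normal(n),  # data-driven
                   particles + sig_prior * rng.standard_normal(n))     # prior
    prior = gauss(new, particles, sig_prior)                 # p(x_t | x_{t-1})
    prop = alpha * gauss(new, motion_meas, sig_motion) + (1 - alpha) * prior
    w = weights * obs_lik(new) * prior / prop
    return new, w / w.sum()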


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Cluster Analysis; Computer Simulation; Information Storage and Retrieval/methods; Models, Statistical; Motion; Stochastic Processes