Gaze Tracking Based on Concatenating Spatial-Temporal Features.

Hwang, Bor-Jiunn; Chen, Hui-Hui; Hsieh, Chaur-Heh; Huang, Deng-Yu

Affiliation

Hwang BJ; Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan.
Chen HH; Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan.
Hsieh CH; College of Artificial Intelligence, Yango University, Fuzhou 350015, China.
Huang DY; Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan.

Sensors (Basel); 22(2), 2022 Jan 11.

Article in English | MEDLINE | ID: mdl-35062502

ABSTRACT

Experimental observations show that consecutive gaze positions in visual behavior are correlated over time. Previous studies on gaze point estimation usually train models on individual images without taking into account the sequential relationship between them. This paper uses videos instead of images as the input data, so that temporal features are considered alongside spatial features to improve accuracy. To capture spatial and temporal features at the same time, a convolutional neural network (CNN) and a long short-term memory (LSTM) network are combined into one model: the CNN extracts the spatial features, and the LSTM correlates the temporal features. This paper presents a CNN Concatenating LSTM network (CCLN) that concatenates spatial and temporal features to improve gaze estimation performance when time-series videos are the training input. The proposed model is further optimized by exploring the number of LSTM layers and the influence of batch normalization (BN) and a global average pooling (GAP) layer on CCLN. Because larger amounts of training data are generally believed to lead to better models, we propose a method for constructing video datasets for gaze point estimation, which provides data for training and prediction. We also study the effectiveness of different commonly used general models and the impact of transfer learning. Extensive evaluation shows that the proposed method achieves better prediction accuracy than existing CNN-based methods. The best model reaches 93.1%, and the general model MobileNet reaches 92.6%.
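The idea of concatenating spatial and temporal features can be illustrated with a minimal, dependency-free sketch. It is not the authors' CCLN: the abstract does not give layer sizes, the exact fusion point, or the CNN backbone, so everything below is an assumption made for illustration. Precomputed per-frame feature maps stand in for CNN output, GAP reduces each map to one value, a single textbook LSTM cell with random weights tracks the sequence, and the last frame's spatial vector is concatenated with the final hidden state. All function names are hypothetical.

```python
import math
import random

def global_average_pool(feature_maps):
    """Collapse each 2-D feature map (a list of rows) to its mean value."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_lstm_params(input_size, hidden_size, rng):
    """Random weights and zero biases for the four LSTM gates (assumed init)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (matrix(hidden_size, input_size + hidden_size), [0.0] * hidden_size)
            for gate in ("i", "f", "g", "o")}

def lstm_step(params, x, h, c):
    """One standard LSTM step on input x with previous state (h, c)."""
    z = x + h  # list concatenation: current input followed by previous hidden state
    def affine(gate):
        weights, bias = params[gate]
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(weights, bias)]
    i = [sigmoid(v) for v in affine("i")]    # input gate
    f = [sigmoid(v) for v in affine("f")]    # forget gate
    g = [math.tanh(v) for v in affine("g")]  # candidate cell values
    o = [sigmoid(v) for v in affine("o")]    # output gate
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def concat_spatial_temporal(clip, params, hidden_size):
    """clip: list of frames; each frame is a list of 2-D feature maps."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    spatial = []
    for frame in clip:
        spatial = global_average_pool(frame)     # spatial features of this frame
        h, c = lstm_step(params, spatial, h, c)  # temporal state up to this frame
    # Assumed fusion: last frame's spatial vector joined with the final hidden state.
    return spatial + h

rng = random.Random(0)
channels, hidden_size, frames = 4, 3, 5
# Random 6x6 feature maps stand in for the output of a CNN backbone.
clip = [[[[rng.random() for _ in range(6)] for _ in range(6)]
         for _ in range(channels)] for _ in range(frames)]
params = make_lstm_params(channels, hidden_size, rng)
features = concat_spatial_temporal(clip, params, hidden_size)
print(len(features))  # channels + hidden_size = 7
```

In a trained model the concatenated vector would feed a classification or regression head that predicts the gaze point; the weights here are random, so only the shapes and data flow are meaningful.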

Subject(s)

Eye-Tracking Technology; Neural Networks, Computer; Memory, Long-Term; Time Factors

Keywords

convolutional neural network (CNN); deep learning; gaze tracking; long short-term memory (LSTM)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Neural Networks, Computer / Eye-Tracking Technology Type of study: Prognostic_studies Language: En Journal: Sensors (Basel) Year: 2022 Document type: Article Affiliation country: Taiwan
