ABSTRACT
We investigate cross-lingual sentiment analysis, which has attracted significant attention due to its applications in areas such as market research, politics, and the social sciences. In particular, we introduce a sentiment analysis framework in a multi-label setting that follows Plutchik's wheel of emotions. We introduce a novel dynamic weighting method that balances the contribution of each class during training, unlike previous static weighting methods that assign fixed weights based on class frequency. Moreover, we adapt the focal loss, which favors harder instances, from the single-label object recognition literature to our multi-label setting. Furthermore, we derive a method to choose optimal class-specific thresholds that maximize the macro-F1 score in linear time complexity. Through an extensive set of experiments, we show that our method obtains state-of-the-art performance in seven of nine metrics across three different languages using a single model, compared with common baselines and the best-performing methods from the SemEval competition. We publicly share the code for our model, which can perform sentiment analysis in 100 languages, to facilitate further research.
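To make the loss design concrete, here is a minimal PyTorch-style sketch of a focal loss adapted to the multi-label setting with per-class weights; it illustrates the general technique named above, not the authors' exact formulation, and the tensor layout and the gamma default are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def multilabel_focal_loss(logits, targets, class_weights, gamma=2.0):
    """logits, targets: (batch, n_classes) with float targets in {0, 1};
    class_weights: (n_classes,) per-class weights."""
    probs = torch.sigmoid(logits)
    # p_t: probability the model assigns to the ground-truth state of each label.
    p_t = torch.where(targets > 0.5, probs, 1.0 - probs)
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # (1 - p_t)**gamma down-weights easy instances, focusing training on hard ones.
    return (class_weights * (1.0 - p_t) ** gamma * bce).mean()
```

In a dynamic weighting scheme, `class_weights` would be recomputed during training (for example, from running per-class error rates) rather than fixed from class frequencies; the abstract does not specify the exact update rule, so it is left open here.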
ABSTRACT
Parkinson's disease (PD) diagnosis is based on clinical criteria, e.g., bradykinesia, rest tremor, and rigidity. Assessment of the severity of PD symptoms with clinical rating scales, however, is subject to inter-rater variability. In this paper, we propose a deep learning based automatic PD diagnosis method using videos to assist diagnosis in clinical practice. We deploy a 3D Convolutional Neural Network (CNN) as the baseline approach for PD severity classification and demonstrate its effectiveness. Given the scarcity of data in the clinical field, we explore the possibility of transfer learning from a non-medical dataset and show that PD severity classification can benefit from it. To bridge the domain discrepancy between medical and non-medical datasets, we let the network focus on subtle temporal visual cues, such as the frequency of tremors, by designing a Temporal Self-Attention (TSA) mechanism. Seven tasks from the Movement Disorder Society Unified PD Rating Scale (MDS-UPDRS) Part III are investigated, which reveal the symptoms of bradykinesia and postural tremor. Furthermore, we propose a multi-domain learning method to predict patient-level PD severity through task-assembling. We empirically show the effectiveness of the TSA mechanism and the task-assembling method on our PD video dataset, achieving a best Matthews correlation coefficient (MCC) of 0.55 for binary task-level classification and 0.39 for three-class patient-level classification.
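The following is a minimal sketch of a temporal self-attention layer operating on per-frame features from a 3D CNN backbone; it conveys the general idea of weighting informative frames, not the paper's exact TSA design, and all layer sizes are assumed.

```python
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    """Attention pooling over time: scores each frame, then takes a
    weighted sum so frames with informative motion cues dominate."""
    def __init__(self, feat_dim, hidden_dim=128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, frame_feats):                # (batch, time, feat_dim)
        weights = torch.softmax(self.score(frame_feats), dim=1)  # (batch, time, 1)
        return (weights * frame_feats).sum(dim=1)  # (batch, feat_dim)
```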
Subject(s)
Parkinson Disease, Humans, Hypokinesia/diagnosis, Mental Status and Dementia Tests, Parkinson Disease/diagnosis, Severity of Illness Index, Tremor/diagnosis
ABSTRACT
INTRODUCTION: The most prominent risk factor for Alzheimer's disease (AD) is aging. Aging also influences physical appearance. Our clinical experience suggests that patients with AD may appear younger than their actual age. Based on this empirical observation, we set out to test the hypothesis with human- and computer-based estimation systems. METHOD: We compared 50 early-stage AD patients with 50 age- and sex-matched controls. Facial images of all subjects were recorded using a high-resolution video camera with a frontal view and clear lighting. Subjects were recorded during natural conversation while performing the Mini-Mental State Examination, capturing spontaneous smiles in addition to static images. The images were used for age estimation by two methods: (1) computer-based age estimation and (2) human-based age estimation. The computer-based system used a state-of-the-art deep convolutional neural network classifier to process the facial images contained in a single video session and performed frame-based age estimation. The individuals who estimated age by visual inspection of the video sequences were chosen following a pilot selection phase. The mean error (ME) of the estimations was the main end point of this study. RESULTS: In human-based estimation, there was no statistically significant difference between the ME scores for AD patients and healthy controls (p = 0.33), although the difference favored younger estimates for the AD group. In the computer-based estimation system, the average ME score for AD patients was lower than that for healthy controls, indicating that AD patients were on average estimated to be younger than their actual age compared with controls; this difference was statistically significant (p = 0.007). CONCLUSION: Humans showed a tendency to estimate AD patients as younger, and computer-based estimation showed that AD patients were estimated to be younger than their real age compared with controls. The underlying mechanisms for this observation are unclear.
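A small sketch of the evaluation logic, under the assumption that per-frame predictions are averaged into a single per-subject estimate and that the mean error (ME) is the signed difference between estimated and actual age; function names are illustrative.

```python
import numpy as np

def mean_error(frame_predictions, actual_age):
    """Signed ME for one subject: average of per-frame age estimates minus
    actual age. Negative values mean the subject was estimated younger."""
    return float(np.mean(frame_predictions)) - actual_age

def group_mean_error(per_subject_predictions, actual_ages):
    """Average ME over a group (e.g., AD patients vs. controls)."""
    return float(np.mean([mean_error(p, a)
                          for p, a in zip(per_subject_predictions, actual_ages)]))
```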
Subject(s)
Aging/physiology, Alzheimer Disease, Physical Appearance, Statistics as Topic/methods, Age Factors, Aged, Alzheimer Disease/diagnosis, Alzheimer Disease/physiopathology, Computing Methodologies, Female, Humans, Male, Video Recording
ABSTRACT
The standard clinical assessment of pain is limited primarily to self-reported pain or clinician impression. While the self-reported measurement of pain is useful, in some circumstances it cannot be obtained. Automatic facial expression analysis has emerged as a potential solution for an objective, reliable, and valid measurement of pain. In this study, we propose a video-based approach for the automatic measurement of both self-reported pain and observer-rated pain intensity. To this end, we explore the added value of three self-reported pain scales, i.e., the Visual Analog Scale (VAS), the Sensory Scale (SEN), and the Affective Motivational Scale (AFF), as well as the Observer Pain Intensity (OPI) rating, for a reliable assessment of pain intensity from facial expression. Using a spatio-temporal Convolutional Neural Network - Recurrent Neural Network (CNN-RNN) architecture, we propose to jointly minimize the mean absolute error of pain score estimation for each of these scales while maximizing the consistency between them. The reliability of the proposed method is evaluated on the benchmark database for pain measurement from videos, namely the UNBC-McMaster Pain Archive. Our results show that enforcing consistency between the self-reported pain intensity scores collected using different pain scales enhances the quality of predictions and improves the state of the art in automatic self-reported pain estimation. The obtained results suggest that automatic assessment of self-reported pain intensity from videos is feasible and could be used as a complementary instrument to unburden caregivers, especially for vulnerable populations that need constant monitoring.
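A minimal sketch of the kind of joint objective described above: one MAE term per scale plus a pairwise consistency term between scale-specific predictions. The weighting factor `lam` and the dictionary interface are assumptions; the paper's exact loss may differ.

```python
import torch

def joint_pain_loss(preds, targets, lam=0.1):
    """preds, targets: dicts mapping scale name (e.g., 'VAS', 'SEN', 'AFF',
    'OPI') to (batch,) tensors of normalized pain scores."""
    # One mean-absolute-error term per pain scale.
    mae = sum(torch.mean(torch.abs(preds[s] - targets[s])) for s in preds)
    # Consistency: predictions on different (normalized) scales should agree.
    scales = list(preds)
    consistency = sum(torch.mean(torch.abs(preds[a] - preds[b]))
                      for i, a in enumerate(scales) for b in scales[i + 1:])
    return mae + lam * consistency
```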
ABSTRACT
The main challenges of age estimation from facial expression videos lie not only in modeling the static facial appearance, but also in capturing the temporal facial dynamics. Traditional approaches to this problem focus on constructing handcrafted features to explore the discriminative information contained in facial appearance and dynamics separately, which relies on sophisticated feature refinement and framework design. In this paper, we present an end-to-end architecture for age estimation, called the Spatially-Indexed Attention Model (SIAM), which is able to simultaneously learn both the appearance and the dynamics of age from raw videos of facial expressions. Specifically, we employ convolutional neural networks to extract effective latent appearance representations and feed them into recurrent networks to model the temporal dynamics. More importantly, we propose to leverage attention models for salience detection in both the spatial domain, for each individual image, and the temporal domain, for the whole video. We design a spatially-indexed attention mechanism among the convolutional layers to extract the salient facial regions in each individual image, and a temporal attention layer to assign attention weights to each frame. This two-pronged approach not only improves performance by allowing the model to focus on informative frames and facial areas, but also offers an interpretable correspondence between the spatial facial regions and temporal frames on the one hand and the task of age estimation on the other. We demonstrate the strong performance of our model in experiments on a large, gender-balanced database of 400 subjects with ages spanning from 8 to 76 years. Experiments reveal that our model exhibits significant superiority over state-of-the-art methods given sufficient training data.
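To illustrate the spatially-indexed component, here is a minimal sketch of a generic spatial attention layer over convolutional feature maps; it shows the mechanism of re-weighting salient facial regions, not SIAM's exact design, and the 1x1-convolution scorer is an assumed choice.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Scores every spatial location of a conv feature map, normalizes the
    scores with a softmax, and re-weights the map accordingly."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # one score per location

    def forward(self, fmap):                       # (batch, channels, H, W)
        b, _, h, w = fmap.shape
        attn = torch.softmax(self.score(fmap).view(b, 1, h * w), dim=-1)
        attn = attn.view(b, 1, h, w)
        # Salient facial regions receive more mass in the re-weighted map.
        return fmap * attn
```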
ABSTRACT
We present a new model for multivariate time-series classification, called the hidden-unit logistic model (HULM), that uses binary stochastic hidden units to model latent structure in the data. The hidden units are connected in a chain structure that models temporal dependencies in the data. Compared with prior models for time-series classification, such as the hidden conditional random field, our model can represent very complex decision boundaries, because the number of latent states grows exponentially with the number of hidden units. We demonstrate the strong performance of our model in experiments on a variety of (computer vision) tasks, including handwritten character recognition, speech recognition, facial expression recognition, and action recognition. We also present a state-of-the-art system for facial action unit detection based on the HULM.
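A small illustration of the capacity argument above: a chain of H binary hidden units spans 2^H joint latent configurations per time step, whereas a hidden conditional random field (HCRF) with K discrete states spans only K.

```python
# Capacity of H binary hidden units per time step: 2**H joint states,
# versus the K states of a hidden conditional random field (HCRF).
def n_latent_states(num_hidden_units: int) -> int:
    return 2 ** num_hidden_units

assert n_latent_states(10) == 1024       # 10 binary units ~ a 1024-state HCRF
assert n_latent_states(20) == 1_048_576  # 20 units already exceed a million states
```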
ABSTRACT
Depression is one of the most common psychiatric disorders worldwide, with over 350 million people affected. Current methods to screen for and assess depression depend almost entirely on clinical interviews and self-report scales. While useful, such measures lack objective, systematic, and efficient ways of incorporating behavioral observations that are strong indicators of depression presence and severity. Using dynamics of facial and head movement and vocalization, we trained classifiers to detect three levels of depression severity. Participants were a community sample diagnosed with major depressive disorder. They were recorded in clinical interviews (Hamilton Rating Scale for Depression, HRSD) at seven-week intervals over a period of 21 weeks. At each interview, they were scored by the HRSD as moderately to severely depressed, mildly depressed, or remitted. Logistic regression classifiers using leave-one-participant-out validation were compared for facial movement, head movement, and vocal prosody individually and in combination. Accuracy of depression severity measurement from facial movement dynamics was higher than that for head movement dynamics, and each was substantially higher than that for vocal prosody. Accuracy using all three modalities combined only marginally exceeded that of face and head combined. These findings suggest that automatic detection of depression severity from behavioral indicators in patients is feasible and that multimodal measures afford the most powerful detection.
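A minimal sketch of the validation protocol described above, using scikit-learn's LeaveOneGroupOut to hold out one participant at a time; the feature matrix, label vector, and group sizes below are synthetic placeholders for the extracted behavioral features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 16))        # e.g., facial/head/vocal dynamics features
y = rng.integers(0, 3, size=120)      # three severity levels
groups = np.repeat(np.arange(30), 4)  # 30 participants, 4 sessions each

# Every fold holds out all sessions of one participant, so no participant
# appears in both the training and test sets.
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y,
    groups=groups, cv=LeaveOneGroupOut(),
)
print(f"mean accuracy: {scores.mean():.3f}")
```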
Subject(s)
Depression, Diagnosis, Computer-Assisted/methods, Video Recording/methods, Adult, Aged, Depression/classification, Depression/diagnosis, Depression/physiopathology, Face/physiology, Female, Head Movements/physiology, Humans, Interviews as Topic, Middle Aged, Severity of Illness Index, Signal Processing, Computer-Assisted, Voice/physiology, Young Adult
ABSTRACT
Estimating the age of a human from captured images of his or her face is a challenging problem. In general, existing approaches to this problem use appearance features only. In this paper, we show that, in addition to appearance information, facial dynamics can be leveraged for age estimation. We propose a method to extract and use dynamic features for age estimation based on a person's smile. Our approach is tested on a large, gender-balanced database of 400 subjects with ages ranging from 8 to 76. In addition, we introduce a new database of posed disgust expressions with 324 subjects in the same age range, and evaluate the reliability of the proposed approach when used with another expression. State-of-the-art appearance-based age estimation methods from the literature are implemented as baselines. We demonstrate that for each of these methods, the addition of the proposed dynamic features results in a statistically significant improvement. We further propose a novel hierarchical age estimation architecture based on adaptive age grouping. We test our approach extensively, including an exploration of spontaneous versus posed smile dynamics and gender-specific age estimation. We show that using spontaneity information reduces the mean absolute error by up to 21%, advancing the state of the art for facial age estimation.
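A minimal sketch of one way to combine appearance and dynamic features, by concatenation ahead of a standard regressor; the placeholder features and the SVR regressor are stand-ins, not the paper's implementation.

```python
import numpy as np
from sklearn.svm import SVR

def fuse_features(appearance, dynamics):
    """Concatenate appearance and smile-dynamics features per subject."""
    return np.hstack([appearance, dynamics])

# Placeholder features for 400 subjects; real extractors would supply these.
rng = np.random.default_rng(0)
X_app = rng.normal(size=(400, 64))   # appearance descriptors
X_dyn = rng.normal(size=(400, 12))   # smile-dynamics descriptors
ages = rng.uniform(8, 76, size=400)

model = SVR().fit(fuse_features(X_app, X_dyn), ages)
pred = model.predict(fuse_features(X_app[:5], X_dyn[:5]))
```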
Subject(s)
Aging/physiology, Anthropometry/methods, Face/anatomy & histology, Face/physiology, Facial Expression, Image Interpretation, Computer-Assisted/methods, Adolescent, Adult, Aged, Aging/pathology, Artificial Intelligence, Biometry/methods, Child, Female, Humans, Male, Pattern Recognition, Automated/methods, Photography/methods, Reproducibility of Results, Sensitivity and Specificity, Young Adult
ABSTRACT
Current methods for depression assessment depend almost entirely on clinical interviews or self-report ratings. Such measures lack systematic and efficient ways of incorporating behavioral observations that are strong indicators of psychological disorder. We compared a clinical interview of depression severity with automatic measurement in 48 participants undergoing treatment for depression. Interviews were obtained at 7-week intervals on up to four occasions. Following standard cut-offs, participants at each session were classified as remitted, intermediate, or depressed. Logistic regression classifiers using leave-one-out validation were compared for facial movement dynamics, head movement dynamics, and vocal prosody, individually and in combination. Accuracy (remitted versus depressed) for facial movement dynamics was higher than that for head movement dynamics, and each was substantially higher than that for vocal prosody. Accuracy for all three modalities together reached 88.93%, exceeding that for any single modality or pair of modalities. These findings suggest that automatic detection of depression from behavioral indicators is feasible and that multimodal measures afford the most powerful detection.
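A minimal sketch of the modality comparison described above: the same classifier and participant-wise validation applied to each modality block and to their combination. The arrays are fabricated placeholders for the per-session features, and the helper name is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def modality_accuracy(blocks, y, groups):
    """Evaluate a set of per-session feature blocks (each (n_sessions, dim))
    with leave-one-participant-out validation; returns mean accuracy."""
    X = np.hstack(blocks)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut()).mean()

# Fabricated shapes only, to make the sketch runnable.
rng = np.random.default_rng(0)
face, head, voice = (rng.normal(size=(96, d)) for d in (32, 8, 12))
y = rng.integers(0, 2, size=96)       # remitted (0) vs depressed (1)
groups = np.repeat(np.arange(24), 4)  # participant id per session

acc_all = modality_accuracy([face, head, voice], y, groups)
acc_face = modality_accuracy([face], y, groups)
```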
ABSTRACT
Many facial-analysis approaches rely on robust and accurate automatic facial landmarking to function correctly. In this paper, we describe a statistical method for automatic facial-landmark localization. Our landmarking relies on a parsimonious mixture model of Gabor wavelet features, computed in a coarse-to-fine fashion and complemented with a shape prior. We assess the accuracy and the robustness of the proposed approach under extensive cross-database conditions on four face datasets (Face Recognition Grand Challenge, Cohn-Kanade, Bosphorus, and BioID). Our method attains 99.33% accuracy on the Bosphorus database and 97.62% accuracy on the BioID database on average, improving the state of the art. We show that the method is not significantly affected by low-resolution images, small rotations, facial expressions, or natural occlusions such as beards and mustaches. We further test the quality of the landmarks in a facial expression recognition application and report landmarking-induced improvement over the baseline on two separate databases for video-based expression recognition (Cohn-Kanade and BU-4DFE).
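A minimal sketch of extracting multi-orientation Gabor wavelet responses around a candidate landmark with OpenCV, the kind of local feature the mixture model above operates on; all filter parameters are illustrative defaults, not the paper's settings.

```python
import cv2
import numpy as np

def gabor_features(gray, x, y, patch=16):
    """Multi-orientation Gabor responses for a patch centered at (x, y);
    assumes the patch lies fully inside the grayscale image."""
    h = patch // 2
    roi = gray[y - h:y + h, x - h:x + h].astype(np.float32)
    feats = []
    for theta in np.linspace(0, np.pi, 4, endpoint=False):  # 4 orientations
        kernel = cv2.getGaborKernel((patch, patch), sigma=4.0, theta=theta,
                                    lambd=8.0, gamma=0.5, psi=0)
        # Mean filter response summarizes local texture at this orientation.
        feats.append(float(cv2.filter2D(roi, cv2.CV_32F, kernel).mean()))
    return np.array(feats)
```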