1.
J Neural Eng ; 21(3), 2024 May 03.
Article in English | MEDLINE | ID: mdl-38621380

ABSTRACT

Objective. Machine learning (ML) models have opened up enormous opportunities in the field of brain-computer interfaces (BCIs). Despite their great success, they usually face severe limitations when employed in real-life applications outside a controlled laboratory setting. Approach. Mixing causal reasoning, which identifies causal relationships between variables of interest, with brainwave modeling can change one's viewpoint on some of the major challenges found at various stages of the ML pipeline, ranging from data collection and pre-processing to training methods and techniques. Main results. In this work, we employ causal reasoning and present a framework aiming to break down and analyze important challenges of brainwave modeling for BCIs. Significance. Furthermore, we show how general ML practices as well as brainwave-specific techniques can be utilized to solve some of these identified challenges. Finally, we discuss appropriate evaluation schemes for measuring these techniques' performance and efficiently comparing them with other methods to be developed in the future.


Subject(s)
Brain-Computer Interfaces , Machine Learning , Brain-Computer Interfaces/trends , Humans , Electroencephalography/methods , Brain Waves/physiology , Brain/physiology , Algorithms
2.
J Neural Eng ; 21(3), 2024 May 13.
Article in English | MEDLINE | ID: mdl-38684154

ABSTRACT

Objective. The patterns of brain activity associated with different brain processes can be used to identify different brain states and make behavioural predictions. However, the relevant features are not readily apparent and accessible. Our aim is to design a system for learning informative latent representations from multichannel recordings of ongoing EEG activity. Approach. We propose a novel differentiable decoding pipeline consisting of learnable filters and a pre-determined feature extraction module. Specifically, we introduce filters parameterized by generalized Gaussian functions that offer a smooth derivative for stable end-to-end model training and allow for learning interpretable features. For the feature module, we use signal magnitude and functional connectivity estimates. Main results. We demonstrate the utility of our model on a new EEG dataset of unprecedented size (i.e. 721 subjects), where we identify consistent trends of music perception and related individual differences. Furthermore, we train and apply our model on two additional datasets, specifically for emotion recognition on SEED and workload classification on simultaneous task EEG workload. The discovered features align well with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening. This agrees with the specialisation of the temporal lobes regarding music perception proposed in the literature. Significance. The proposed method offers strong interpretability of learned features while reaching levels of accuracy similar to those achieved by black-box deep learning models. This improved trustworthiness may promote the use of deep learning models in real-world applications. The model code is available at https://github.com/SMLudwig/EEGminer/.
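As a rough illustration of the filtering idea above, here is a minimal PyTorch sketch of a learnable frequency-domain filter with a generalized Gaussian magnitude response, exp(-(|f - mu|/alpha)^beta). The parameter names, initial values, and the rFFT-domain application are assumptions made for illustration, not the authors' implementation; the linked repository is the authoritative reference.

```python
import torch
import torch.nn as nn

class GeneralizedGaussianFilter(nn.Module):
    """Hypothetical sketch: a smooth, differentiable band filter whose centre
    (mu), bandwidth (alpha) and shape (beta) are learned end-to-end."""
    def __init__(self, n_freqs: int):
        super().__init__()
        self.mu = nn.Parameter(torch.tensor(0.25))     # centre (normalized frequency)
        self.alpha = nn.Parameter(torch.tensor(0.10))  # bandwidth
        self.beta = nn.Parameter(torch.tensor(2.0))    # shape; 2.0 recovers a Gaussian
        self.register_buffer("freqs", torch.linspace(0.0, 0.5, n_freqs))

    def forward(self, x_fft: torch.Tensor) -> torch.Tensor:
        # x_fft: rFFT of an EEG signal, shape (..., n_freqs)
        response = torch.exp(
            -((self.freqs - self.mu).abs() / self.alpha.abs().clamp(min=1e-4))
            ** self.beta.clamp(min=1.0)
        )
        return x_fft * response

# Usage sketch:
# x = torch.randn(32, 1000)                       # (trials, samples)
# X = torch.fft.rfft(x)
# filt = GeneralizedGaussianFilter(X.shape[-1])
# x_filtered = torch.fft.irfft(filt(X), n=x.shape[-1])
```

Because the response is smooth in (mu, alpha, beta), gradients flow through the filter, which is what permits the stable end-to-end training the abstract describes.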


Subject(s)
Brain , Electroencephalography , Humans , Electroencephalography/methods , Brain/physiology , Male , Adult , Female , Music , Young Adult , Auditory Perception/physiology , Machine Learning , Emotions/physiology
3.
J Neural Eng ; 20(5), 2023 09 22.
Article in English | MEDLINE | ID: mdl-37678229

ABSTRACT

Objective. Brain-computer interfaces (BCIs) enable direct communication between the brain and the external world, using one's neural activity, measured by electroencephalography (EEG) signals. In recent years, convolutional neural networks (CNNs) have been widely used to perform automatic feature extraction and classification in various EEG-based tasks. However, their undeniable benefits are counterbalanced by their lack of interpretability and their inability to perform sufficiently well when only a limited amount of training data is available. Approach. In this work, we introduce a novel, lightweight, fully-learnable neural network architecture that relies on Gabor filters to delocalize EEG signal information into scattering decomposition paths along frequency and slow-varying temporal modulations. Main results. We utilize our network in two distinct modeling settings, for building either a generic (training across subjects) or a personalized (training within a subject) classifier. Significance. In both cases, using two different publicly available datasets and one in-house collected dataset, we demonstrate high performance for our model with considerably fewer trainable parameters and shorter training time compared to other state-of-the-art deep architectures. Moreover, our network demonstrates enhanced interpretability properties emerging at the level of the temporal filtering operation and enables us to train efficient personalized BCI models with a limited amount of training data.
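Since the architecture above is built on Gabor filters, a minimal NumPy sketch of a 1-D Gabor kernel may help fix ideas; the function name and parameterization are illustrative, not taken from the paper's code.

```python
import numpy as np

def gabor_kernel_1d(f0: float, sigma: float, length: int, fs: float = 128.0) -> np.ndarray:
    """Real 1-D Gabor kernel: a cosine at centre frequency f0 (Hz) windowed by
    a Gaussian envelope of width sigma (s), sampled at fs (Hz)."""
    t = (np.arange(length) - length // 2) / fs
    return np.exp(-t**2 / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * f0 * t)

# A bank of such kernels at several (f0, sigma) pairs, followed by a modulus and
# a second, slower filtering stage, yields a scattering-style decomposition:
alpha_band = np.abs(np.convolve(np.random.randn(1024),
                                gabor_kernel_1d(10.0, 0.1, 129), mode="same"))
```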


Subject(s)
Brain Waves , Brain-Computer Interfaces , Humans , Electroencephalography , Recognition, Psychology , Brain
4.
Article in English | MEDLINE | ID: mdl-37023162

ABSTRACT

Deep Convolutional Neural Networks (CNNs) have recently demonstrated impressive results in electroencephalogram (EEG) decoding for several Brain-Computer Interface (BCI) paradigms, including Motor-Imagery (MI). However, neurophysiological processes underpinning EEG signals vary across subjects, causing covariate shifts in data distributions and hence hindering the generalization of deep models across subjects. In this paper, we aim to address the challenge of inter-subject variability in MI. To this end, we employ causal reasoning to characterize all possible distribution shifts in the MI task and propose a dynamic convolution framework to account for shifts caused by inter-subject variability. Using publicly available MI datasets, we demonstrate improved generalization performance (up to 5%) across subjects in various MI tasks for four well-established deep architectures.
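For intuition, the sketch below shows generic dynamic 1-D convolution in PyTorch: K candidate kernels are mixed by input-conditioned attention, so the effective weights adapt to each trial (and hence each subject). This is the general technique named above, not necessarily the paper's exact architecture; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv1d(nn.Module):
    """Generic dynamic convolution sketch: K candidate kernels mixed by an
    input-conditioned attention vector, so effective weights vary per sample."""
    def __init__(self, in_ch: int, out_ch: int, ksize: int = 9, K: int = 4):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(K, out_ch, in_ch, ksize))
        self.attend = nn.Linear(in_ch, K)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, in_ch, T); attention from a cheap global summary of the trial
        a = F.softmax(self.attend(x.mean(dim=-1)), dim=-1)     # (B, K)
        w = torch.einsum("bk,koit->boit", a, self.weight)      # per-sample kernels
        B, _, T = x.shape
        out = F.conv1d(x.reshape(1, -1, T),                    # fold batch into groups
                       w.reshape(-1, w.shape[2], w.shape[3]),
                       groups=B, padding=self.weight.shape[-1] // 2)
        return out.reshape(B, -1, T)
```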


Subject(s)
Algorithms , Brain-Computer Interfaces , Humans , Neural Networks, Computer , Electroencephalography/methods , Generalization, Psychological , Imagination/physiology
5.
IEEE Trans Pattern Anal Mach Intell ; 44(8): 4021-4034, 2022 08.
Article in English | MEDLINE | ID: mdl-33571091

ABSTRACT

Deep convolutional neural networks (DCNNs) are currently the method of choice both for generative, as well as for discriminative learning in computer vision and machine learning. The success of DCNNs can be attributed to the careful selection of their building blocks (e.g., residual blocks, rectifiers, sophisticated normalization schemes, to mention but a few). In this paper, we propose Π-Nets, a new class of function approximators based on polynomial expansions. Π-Nets are polynomial neural networks, i.e., the output is a high-order polynomial of the input. The unknown parameters, which are naturally represented by high-order tensors, are estimated through a collective tensor factorization with factor sharing. We introduce three tensor decompositions that significantly reduce the number of parameters and show how they can be efficiently implemented by hierarchical neural networks. We empirically demonstrate that Π-Nets are very expressive and even produce good results without the use of non-linear activation functions in a large battery of tasks and signals, i.e., images, graphs, and audio. When used in conjunction with activation functions, Π-Nets produce state-of-the-art results in three challenging tasks, i.e., image generation, face verification and 3D mesh representation learning. The source code is available at https://github.com/grigorisg9gr/polynomial_nets.
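As a worked miniature of the polynomial idea, the following PyTorch sketch implements a second-degree expansion in the spirit of the factorized recursion (x1 = U1ᵀz, x2 = (U2ᵀz) ∘ x1 + x1, y = Cx2). The layer sizes and the degree are illustrative assumptions; the released source code above is the authoritative reference.

```python
import torch
import torch.nn as nn

class PiNetDegree2(nn.Module):
    """Second-degree polynomial network sketch: the Hadamard product of two
    linear maps of the input injects second-order terms without any
    nonlinear activation function."""
    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.U1 = nn.Linear(d_in, d_hidden, bias=False)
        self.U2 = nn.Linear(d_in, d_hidden, bias=False)
        self.C = nn.Linear(d_hidden, d_out)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x1 = self.U1(z)
        x2 = self.U2(z) * x1 + x1   # degree-2 term plus skip connection
        return self.C(x2)
```

Stacking such blocks multiplies the polynomial degree, which is why the full models remain expressive even without rectifiers.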


Subject(s)
Algorithms , Neural Networks, Computer , Machine Learning
6.
IEEE Trans Neural Netw Learn Syst ; 33(8): 3498-3509, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33531308

ABSTRACT

Recently, a multitude of methods for image-to-image translation have demonstrated impressive results on problems such as multidomain or multiattribute transfer. The vast majority of such works leverages the strengths of adversarial learning and deep convolutional autoencoders to achieve realistic results by capturing the target data distribution well. Nevertheless, the most prominent representatives of this class of methods do not facilitate semantic structure in the latent space and usually rely on binary domain labels for test-time transfer. This leads to rigid models, unable to capture the variance of each domain label. In this light, we propose a novel adversarial learning method that: 1) facilitates the emergence of latent structure by semantically disentangling sources of variation and 2) encourages learning generalizable, continuous, and transferable latent codes that enable flexible attribute mixing. This is achieved by introducing a novel loss function that encourages representations to result in uniformly distributed class posteriors for disentangled attributes. In tandem with an algorithm for inducing generalizable properties, the resulting representations can be utilized for a variety of tasks such as intensity-preserving multiattribute image translation and synthesis, without requiring labeled test data. We demonstrate the merits of the proposed method by a set of qualitative and quantitative experiments on popular databases such as MultiPIE, RaFD, and BU-3DFE, where our method outperforms other state-of-the-art methods in tasks such as intensity-preserving multiattribute transfer and synthesis.
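The key loss described above, encouraging uniformly distributed class posteriors for attributes that should be disentangled, can be sketched as a KL divergence to the uniform distribution (equivalently, entropy maximization); the paper's exact formulation may differ, and this sketch is an assumption.

```python
import torch
import torch.nn.functional as F

def uniform_posterior_loss(logits: torch.Tensor) -> torch.Tensor:
    """KL(q || Uniform) = log k - H(q): minimizing it flattens the class
    posterior q predicted from a representation, discouraging the
    representation from carrying that attribute."""
    log_q = F.log_softmax(logits, dim=-1)
    k = logits.shape[-1]
    neg_entropy = (log_q.exp() * log_q).sum(dim=-1).mean()
    return neg_entropy + torch.log(torch.tensor(float(k)))
```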

7.
IEEE Trans Pattern Anal Mach Intell ; 43(3): 1022-1040, 2021 03.
Article in English | MEDLINE | ID: mdl-31581074

ABSTRACT

Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in-the-wild are needed more than ever, as digital devices are increasingly becoming an indispensable part of our life. Accurately annotated real-world data are the crux in devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2,000 minutes of audio-visual data of 398 people coming from six cultures, 50 percent female, and uniformly spanning the age range of 18 to 65 years old. Subjects were recorded in two different contexts: while watching adverts and while discussing adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, and continuously valued valence, arousal, liking, agreement, and prototypic examples of (dis)liking. This database aims to be an extremely valuable resource for researchers in affective computing and automatic human sensing and is expected to push forward the research in human behaviour analysis, including cultural studies. Along with the database, we provide extensive baseline experiments for automatic FAU detection and automatic valence, arousal, and (dis)liking intensity estimation.


Subject(s)
Algorithms , Emotions , Adolescent , Adult , Aged , Attitude , Databases, Factual , Face , Female , Humans , Middle Aged , Young Adult
8.
Sci Rep ; 10(1): 19940, 2020 11 17.
Article in English | MEDLINE | ID: mdl-33203906

ABSTRACT

Brain structure in later life reflects both influences of intrinsic aging and those of lifestyle, environment and disease. We developed a deep neural network model trained on brain MRI scans of healthy people to predict "healthy" brain age. Brain regions most informative for the prediction included the cerebellum, hippocampus, amygdala and insular cortex. We then applied this model to data from an independent group of people not stratified for health. A phenome-wide association analysis of over 1,410 traits in the UK Biobank with differences between the predicted and chronological ages for the second group identified significant associations with over 40 traits including diseases (e.g., type I and type II diabetes), disease risk factors (e.g., increased diastolic blood pressure and body mass index), and poorer cognitive function. These observations highlight relationships between brain and systemic health and have implications for understanding contributions of the latter to late life dementia risk.


Subject(s)
Aging/pathology , Brain Diseases/pathology , Brain/pathology , Cardiovascular Diseases/pathology , Magnetic Resonance Imaging/methods , Metabolic Diseases/pathology , Quantitative Trait Loci , Aging/genetics , Brain Diseases/genetics , Cardiovascular Diseases/genetics , Humans , Mendelian Randomization Analysis , Metabolic Diseases/genetics , Neural Networks, Computer , Neuroimaging/methods
9.
IEEE Trans Cybern ; 50(5): 2288-2301, 2020 May.
Article in English | MEDLINE | ID: mdl-30561363

ABSTRACT

The ability to localize visual objects that are associated with an audio source and at the same time to separate the audio signal is a cornerstone in audio-visual signal-processing applications. However, available methods mainly focus on localizing only the visual objects, without audio separation abilities. Moreover, these methods often rely on either laborious preprocessing steps to segment video frames into semantic regions, or additional supervision to guide their localization. In this paper, we aim to address the problem of visual source localization and audio separation in an unsupervised manner and avoid all preprocessing or post-processing steps. To this end, we devise a novel structured matrix decomposition method that decomposes the data matrix of each modality as a superposition of three terms: 1) a low-rank matrix capturing the background information; 2) a sparse matrix capturing the correlated components among the two modalities and, hence, uncovering the sound source in visual modality and the associated sound in audio modality; and 3) a third sparse matrix accounting for uncorrelated components, such as distracting objects in visual modality and irrelevant sound in audio modality. The generality of the proposed method is demonstrated by applying it to three applications, namely: 1) visual localization of a sound source; 2) visually assisted audio separation; and 3) active speaker detection. Experimental results indicate the effectiveness of the proposed method on these application domains.
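Schematically, the three-term model described above can be written as follows for each modality m (the symbols, including the weights λ1 and λ2, are assumed notation for illustration; the paper's actual objective, including the cross-modal coupling of the correlated terms, is more involved):

```latex
\min_{\mathbf{L}_m,\,\mathbf{C}_m,\,\mathbf{U}_m}
\sum_{m \in \{a,\,v\}} \Big( \|\mathbf{L}_m\|_* + \lambda_1 \|\mathbf{C}_m\|_1 + \lambda_2 \|\mathbf{U}_m\|_1 \Big)
\quad \text{s.t.} \quad \mathbf{X}_m = \mathbf{L}_m + \mathbf{C}_m + \mathbf{U}_m ,
```

with L_m the low-rank background, C_m the sparse component correlated across the audio (a) and visual (v) modalities, and U_m the sparse uncorrelated residual.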


Subject(s)
Image Processing, Computer-Assisted/methods , Signal Processing, Computer-Assisted , Algorithms , Deep Learning , Humans , Sound Localization , Video Recording
10.
IEEE Trans Pattern Anal Mach Intell ; 41(10): 2349-2364, 2019 10.
Article in English | MEDLINE | ID: mdl-30843800

ABSTRACT

Robust principal component analysis (RPCA) is a powerful method for learning low-rank feature representations of various visual data. However, for certain types and significant amounts of error corruption, it fails to yield satisfactory results, a drawback that can be alleviated by exploiting domain-dependent prior knowledge or information. In this paper, we propose two models for the RPCA that take into account such side information, even in the presence of missing values. We apply this framework to the task of UV completion, which is widely used in pose-invariant face recognition. Moreover, we construct a generative adversarial network (GAN) to extract side information as well as subspaces. These subspaces not only assist in the recovery but also speed up the process in the case of large-scale data. We quantitatively and qualitatively evaluate the proposed approaches on both synthetic data and eight real-world datasets to verify their effectiveness.

11.
IEEE Trans Pattern Anal Mach Intell ; 41(4): 928-940, 2019 04.
Article in English | MEDLINE | ID: mdl-29993651

ABSTRACT

Networks have been a general tool for representing, analyzing, and modeling relational data arising in several domains. One of the most important aspects of network analysis is community detection, or network clustering. Until recently, the major focus has been on discovering community structure in single (i.e., monoplex) networks. However, with the advent of relational data with multiple modalities, multiplex networks, i.e., networks composed of multiple layers representing different aspects of relations, have emerged. Consequently, community detection in multiplex networks, i.e., detecting clusters of nodes shared by all layers, has become a new challenge. In this paper, we propose Network Fusion for Composite Community Extraction (NF-CCE), a new class of algorithms, based on four different non-negative matrix factorization models, capable of extracting composite communities in multiplex networks. Each algorithm works in two steps: first, it finds a non-negative, low-dimensional feature representation of each network layer; then, it fuses the feature representations of the layers into a common non-negative, low-dimensional feature representation via collective factorization. The composite clusters are extracted from the common feature representation. We demonstrate the superior performance of our algorithms over the state-of-the-art methods on various types of multiplex networks, including biological, social, economic, citation, phone communication, and brain multiplex networks.
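A loose, runnable approximation of the two-step recipe above (per-layer nonnegative embedding, then fusion and clustering) is sketched below in Python. Note that the fusion step here naively concatenates the layer embeddings, whereas NF-CCE performs a collective factorization, so this illustrates only the shape of the pipeline; all names and defaults are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

def composite_communities(layers, rank=10, n_comm=5):
    """layers: list of symmetric nonnegative (n, n) adjacency matrices, one per
    network layer. Returns one community label per node."""
    embeddings = []
    for A in layers:
        H = NMF(n_components=rank, init="nndsvd", max_iter=500).fit_transform(A)
        # row-normalize so every layer contributes comparably
        embeddings.append(H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12))
    fused = np.hstack(embeddings)        # crude stand-in for collective factorization
    return KMeans(n_clusters=n_comm, n_init=10).fit_predict(fused)
```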

12.
IEEE Trans Pattern Anal Mach Intell ; 41(10): 2365-2379, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30442601

ABSTRACT

Dictionary learning and component analysis models are fundamental for learning compact representations that are relevant to a given task (feature extraction, dimensionality reduction, denoising, etc.). The model complexity is encoded by means of specific structure, such as sparsity, low-rankness, or nonnegativity. Unfortunately, approaches like K-SVD - that learn dictionaries for sparse coding via Singular Value Decomposition (SVD) - are hard to scale to high-volume and high-dimensional visual data, and fragile in the presence of outliers. Conversely, robust component analysis methods such as the Robust Principal Component Analysis (RPCA) are able to recover low-complexity (e.g., low-rank) representations from data corrupted with noise of unknown magnitude and support, but do not provide a dictionary that respects the structure of the data (e.g., images), and also involve expensive computations. In this paper, we propose a novel Kronecker-decomposable component analysis model, coined as Robust Kronecker Component Analysis (RKCA), that combines ideas from sparse dictionary learning and robust component analysis. RKCA has several appealing properties, including robustness to gross corruption; it can be used for low-rank modeling, and leverages separability to solve significantly smaller problems. We design an efficient learning algorithm by drawing links with a restricted form of tensor factorization, and analyze its optimality and low-rankness properties. The effectiveness of the proposed approach is demonstrated on real-world applications, namely background subtraction and image denoising and completion, by performing a thorough comparison with the current state of the art.
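Schematically, a Kronecker-decomposable model of this kind represents each data sample X_i through two small factor dictionaries A and B with a code R_i and a sparse error E_i (the notation below is assumed for illustration and may not match the paper's exactly):

```latex
\mathbf{X}_i = \mathbf{A}\,\mathbf{R}_i\,\mathbf{B}^{\top} + \mathbf{E}_i
\quad\Longleftrightarrow\quad
\operatorname{vec}(\mathbf{X}_i) = (\mathbf{B} \otimes \mathbf{A})\,\operatorname{vec}(\mathbf{R}_i) + \operatorname{vec}(\mathbf{E}_i),
```

which makes the separability explicit: the effective dictionary is the Kronecker product of two much smaller ones, so the subproblems solved during learning are correspondingly smaller.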

13.
IEEE Trans Pattern Anal Mach Intell ; 40(11): 2682-2695, 2018 11.
Article in English | MEDLINE | ID: mdl-29990016

ABSTRACT

Statistical methods are of paramount importance in discovering the modes of variation in visual data. The Principal Component Analysis (PCA) is probably the most prominent method for extracting a single mode of variation in the data. However, in practice, several factors contribute to the appearance of visual objects, including pose, illumination, and deformation, to mention a few. To extract these modes of variation from visual data, several supervised methods, such as the TensorFaces relying on multilinear (tensor) decomposition, have been developed. The main drawback of such methods is that they require both labels regarding the modes of variation and the same number of samples under all modes of variation (e.g., the same face under different expressions, poses, etc.). Therefore, their applicability is limited to well-organised data, usually captured in well-controlled conditions. In this paper, we propose a novel general multilinear matrix decomposition method that discovers the multilinear structure of possibly incomplete sets of visual data in an unsupervised setting (i.e., without the presence of labels). We also propose extensions of the method with sparsity and low-rank constraints in order to handle noisy data, captured in unconstrained conditions. Besides that, a graph-regularised variant of the method is also developed in order to exploit available geometric or label information for some modes of variation. We demonstrate the applicability of the proposed method in several computer vision tasks, including Shape from Shading (SfS) (in the wild and with occlusion removal), expression transfer, and estimation of surface normals from images captured in the wild.

14.
IEEE Trans Pattern Anal Mach Intell ; 40(11): 2668-2681, 2018 11.
Article in English | MEDLINE | ID: mdl-29990036

ABSTRACT

A set of images depicting faces with different expressions or at various ages consists of components that are shared across all images (i.e., joint components), imparting to the depicted object the properties of human faces, as well as individual components that are related to different expressions or age groups. Discovering the common (joint) and individual components in facial images is crucial for applications such as facial expression transfer and age progression. The problem is rather challenging when the images are captured in unconstrained conditions, contain sparse non-Gaussian errors of large magnitude (i.e., sparse gross errors or outliers), and have missing data. In this paper, we investigate the use of a method recently introduced in statistics, the so-called Joint and Individual Variance Explained (JIVE) method, for the robust recovery of joint and individual components in visual facial data consisting of an arbitrary number of views. Since JIVE is not robust to sparse gross errors, we propose alternatives, which are (1) robust to sparse, gross, non-Gaussian noise, (2) able to automatically find the rank of the individual components, and (3) able to handle missing data. We demonstrate the effectiveness of the proposed methods in several computer vision applications, namely facial expression synthesis and 2D and 3D face age progression 'in-the-wild'.
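For reference, the underlying JIVE model expresses each view X_i of the data as a sum of a joint term (sharing a common row space across all views), an individual term, and noise; the robust variants proposed here additionally account for sparse gross errors (written S_i below, an assumed notation):

```latex
\mathbf{X}_i = \mathbf{J}_i + \mathbf{A}_i + \mathbf{E}_i
\;\;\longrightarrow\;\;
\mathbf{X}_i = \mathbf{J}_i + \mathbf{A}_i + \mathbf{S}_i + \mathbf{E}_i,
\qquad i = 1, \dots, k,
```

where J_i is the joint component, A_i the individual component of view i, S_i the sparse gross-error term, and E_i dense low-magnitude noise.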


Subject(s)
Face/anatomy & histology , Age Factors , Algorithms , Databases, Factual , Facial Expression , Female , Humans , Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Machine Learning , Male , Models, Anatomic , Pattern Recognition, Automated/methods
15.
IEEE Trans Pattern Anal Mach Intell ; 40(11): 2638-2652, 2018 11.
Article in English | MEDLINE | ID: mdl-29993707

ABSTRACT

3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and are among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral as well as expressive faces. However, all datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models that are sufficient to reconstruct faces captured in unconstrained conditions ("in-the-wild"). In this paper, we propose the first "in-the-wild" 3DMM by combining a statistical model of facial identity and expression shape with an "in-the-wild" texture model. We show that such an approach allows for the development of a greatly simplified fitting procedure for images and videos, as there is no need to optimise with regard to the illumination parameters. We have collected three new benchmarks that combine "in-the-wild" images and video with ground truth 3D facial geometry, the first of their kind, and report extensive quantitative evaluations using them that demonstrate our method is state-of-the-art.
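The shape component of the combined model described above follows the standard 3DMM form (the symbols below are assumed notation): a mean shape deformed by linear identity and expression bases,

```latex
\mathbf{s}(\mathbf{p}, \mathbf{q}) = \bar{\mathbf{s}} + \mathbf{U}_{\text{id}}\,\mathbf{p} + \mathbf{U}_{\text{exp}}\,\mathbf{q},
```

while the "in-the-wild" texture model is built directly from images captured in unconstrained conditions, which is what removes the illumination parameters from the fitting objective.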


Subject(s)
Face/anatomy & histology , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Algorithms , Databases, Factual , Facial Expression , Female , Humans , Image Processing, Computer-Assisted/methods , Machine Learning , Male , Models, Anatomic , Models, Statistical , Photography , Video Recording
16.
Int J Comput Vis ; 126(2): 333-357, 2018.
Article in English | MEDLINE | ID: mdl-31983807

ABSTRACT

Human behavior and affect are inherently dynamic phenomena involving the temporal evolution of patterns manifested through a multiplicity of non-verbal behavioral cues, including facial expressions, body postures and gestures, and vocal outbursts. A natural assumption for human behavior modeling is that a continuous-time characterization of behavior is the output of a linear time-invariant system when behavioral cues act as the input (e.g., continuous rather than discrete annotations of dimensional affect). Here we study the learning of such a dynamical system under real-world conditions, namely in the presence of noisy behavioral cue descriptors and possibly unreliable annotations, by employing structured rank minimization. To this end, a novel structured rank minimization method and its scalable variant are proposed. The generalizability of the proposed framework is demonstrated by conducting experiments on 3 distinct dynamic behavior analysis tasks, namely (i) conflict intensity prediction, (ii) prediction of valence and arousal, and (iii) tracklet matching. The attained results outperform those achieved by other state-of-the-art methods for these tasks and hence evidence the robustness and effectiveness of the proposed approach.
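The modeling assumption above corresponds to a standard discrete-time LTI state-space system (a schematic form; the paper itself works with rank minimization over structured matrices built from such systems):

```latex
\mathbf{x}_{t+1} = \mathbf{A}\mathbf{x}_t + \mathbf{B}\mathbf{u}_t, \qquad
\mathbf{y}_t = \mathbf{C}\mathbf{x}_t + \mathbf{D}\mathbf{u}_t,
```

with behavioral cue descriptors as inputs u_t and continuous affect annotations as outputs y_t. The order of such a system manifests as the rank of a Hankel matrix built from the data, which is what motivates casting the learning problem as structured rank minimization.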

17.
IEEE Trans Image Process ; 26(12): 5603-5617, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28783634

ABSTRACT

The analysis of high-dimensional, possibly temporally misaligned, and time-varying visual data is a fundamental task in disciplines such as image, vision, and behavior computing. In this paper, we focus on dynamic facial behavior analysis and in particular on the analysis of facial expressions. Distinct from previous approaches, where sets of facial landmarks are used for face representation, raw pixel intensities are exploited for: 1) unsupervised analysis of the temporal phases of facial expressions and facial action units (AUs) and 2) temporal alignment of a certain facial behavior displayed by two different persons. To this end, the slow features nonnegative matrix factorization (SFNMF) is proposed in order to learn slowly varying parts-based representations of time-varying sequences, capturing the underlying dynamics of temporal phenomena such as facial expressions. Moreover, the SFNMF is extended in order to handle two temporally misaligned data sequences depicting the same visual phenomenon. To do so, dynamic time warping is incorporated into the SFNMF, allowing the temporal alignment of the data sets onto the subspace spanned by the estimated nonnegative shared latent features of the two visual sequences. Extensive experimental results on two video databases demonstrate the effectiveness of the proposed methods in: 1) unsupervised detection of the temporal phases of posed and spontaneous facial events and 2) temporal alignment of facial expressions, outperforming the state-of-the-art methods they are compared against by a large margin.
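One natural way to write an NMF objective with a slowness prior, consistent with (though not necessarily identical to) the SFNMF described above, is

```latex
\min_{\mathbf{W} \ge 0,\, \mathbf{H} \ge 0}
\|\mathbf{X} - \mathbf{W}\mathbf{H}\|_F^2
+ \lambda \sum_{t=2}^{T} \|\mathbf{h}_t - \mathbf{h}_{t-1}\|_2^2,
```

where the columns h_t of H are the per-frame activations of the nonnegative parts stored in W, and the second term penalizes fast temporal variation of those activations.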

18.
Int J Comput Vis ; 122(2): 270-291, 2017.
Article in English | MEDLINE | ID: mdl-32226226

ABSTRACT

The unconstrained acquisition of facial data in real-world conditions may result in face images with significant pose variations, illumination changes, and occlusions, affecting the performance of facial landmark localization and recognition methods. In this paper, a novel method, robust to pose, illumination variations, and occlusions, is proposed for joint face frontalization and landmark localization. Unlike the state-of-the-art methods for landmark localization and pose correction, where a large amount of manually annotated images or 3D facial models is required, the proposed method relies on a small set of frontal images only. By observing that the frontal facial image of both humans and animals is the one having the minimum rank among all poses, a model is devised which jointly recovers the frontalized version of the face as well as the facial landmarks. To this end, a suitable optimization problem is solved, concerning minimization of the nuclear norm (a convex surrogate of the rank function) and the matrix ℓ1 norm accounting for occlusions. The proposed method is assessed in frontal view reconstruction of human and animal faces, landmark localization, pose-invariant face recognition, face verification in unconstrained conditions, and video inpainting, by conducting experiments on 9 databases. The experimental results demonstrate the effectiveness of the proposed method in comparison to the state-of-the-art methods for the target problems.
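Schematically, the optimization described above takes a nuclear-plus-ℓ1 form (the notation below is assumed for illustration):

```latex
\min_{\mathbf{L},\,\mathbf{E},\,\tau} \; \|\mathbf{L}\|_* + \lambda \|\mathbf{E}\|_1
\quad \text{s.t.} \quad \mathbf{D} \circ \tau = \mathbf{L} + \mathbf{E},
```

where D ∘ τ is the input image under a recovered pose-correcting transformation τ, the low-rank term L captures the frontalized face (the frontal pose being the minimum-rank one), and the sparse term E absorbs occlusions.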

19.
IEEE Trans Image Process ; 25(5): 2021-34, 2016 May.
Article in English | MEDLINE | ID: mdl-27008268

ABSTRACT

Face images convey rich information which can be perceived as a superposition of low-complexity components associated with attributes such as facial identity, expressions, and activation of facial action units (AUs). For instance, low-rank components characterizing neutral facial images are associated with identity, while sparse components capturing non-rigid deformations occurring in certain face regions reveal expressions and AU activations. In this paper, the discriminant incoherent component analysis (DICA) is proposed in order to extract low-complexity components, corresponding to facial attributes, which are mutually incoherent among different classes (e.g., identity, expression, and AU activation) from training data, even in the presence of gross sparse errors. To this end, a suitable optimization problem, involving minimization of the nuclear- and ℓ1-norms, is solved. Having found an ensemble of class-specific incoherent components by means of the DICA, an unseen (test) image is expressed as a group-sparse linear combination of these components, where the non-zero coefficients reveal the class(es) of the respective facial attribute(s) that it belongs to. The performance of the DICA is experimentally assessed on both synthetic and real-world data. Emphasis is placed on face analysis tasks, namely joint face and expression recognition, face recognition under varying percentages of training data corruption, subject-independent expression recognition, and AU detection, by conducting experiments on four datasets. The proposed method outperforms all compared methods across all tasks and experimental settings.

20.
IEEE Trans Pattern Anal Mach Intell ; 38(8): 1665-78, 2016 08.
Article in English | MEDLINE | ID: mdl-26552077

ABSTRACT

Recovering correlated and individual components of two, possibly temporally misaligned, sets of data is a fundamental task in disciplines such as image, vision, and behavior computing, with application to problems such as multi-modal fusion (via correlated components), predictive analysis, and clustering (via the individual ones). Here, we study the extraction of correlated and individual components under real-world conditions, namely i) the presence of gross non-Gaussian noise and ii) temporally misaligned data. In this light, we propose a method for the Robust Correlated and Individual Component Analysis (RCICA) of two sets of data in the presence of gross, sparse errors. We furthermore extend RCICA in order to handle temporal incongruities arising in the data. To this end, two suitable optimization problems are solved. The generality of the proposed methods is demonstrated by applying them onto 4 applications, namely i) heterogeneous face recognition, ii) multi-modal feature fusion for human behavior analysis (i.e., audio-visual prediction of interest and conflict), iii) face clustering, and iv) the temporal alignment of facial expressions. Experimental results on 2 synthetic and 7 real-world datasets indicate the robustness and effectiveness of the proposed methods on these application domains, outperforming other state-of-the-art methods in the field.


Subject(s)
Behavior , Cluster Analysis , Facial Expression , Pattern Recognition, Automated , Algorithms , Face , Humans