Search | VHL CLAP/WR-PAHO/WHO

1.

A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild".

Chrysos, Grigorios G; Antonakos, Epameinondas; Snape, Patrick; Asthana, Akshay; Zafeiriou, Stefanos.

Int J Comput Vis ; 126(2): 198-232, 2018.

Article in English | MEDLINE | ID: mdl-31983805

ABSTRACT

Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild". Until now, the performance has mainly been assessed qualitatively by visually assessing the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300 VW benchmark. We evaluate many different architectures focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic.

2.

Large Scale 3D Morphable Models.

Booth, James; Roussos, Anastasios; Ponniah, Allan; Dunaway, David; Zafeiriou, Stefanos.

Int J Comput Vis ; 126(2): 233-254, 2018.

Article in English | MEDLINE | ID: mdl-31983806

ABSTRACT

We present large scale facial model (LSFM)-a 3D Morphable Model (3DMM) automatically constructed from 9663 distinct facial identities. To the best of our knowledge LSFM is the largest-scale Morphable Model ever constructed, containing statistical information from a huge variety of the human population. To build such a large model we introduce a novel fully automated and robust Morphable Model construction pipeline, informed by an evaluation of state-of-the-art dense correspondence techniques. The dataset that LSFM is trained on includes rich demographic information about each subject, allowing for the construction of not only a global 3DMM model but also models tailored for specific age, gender or ethnicity groups. We utilize the proposed model to perform age classification from 3D shape alone and to reconstruct noisy out-of-sample data in the low-dimensional model space. Furthermore, we perform a systematic analysis of the constructed 3DMM models that showcases their quality and descriptive power. The presented extensive qualitative and quantitative evaluations reveal that the proposed 3DMM achieves state-of-the-art results, outperforming existing models by a large margin. Finally, for the benefit of the research community, we make publicly available the source code of the proposed automatic 3DMM construction pipeline, as well as the constructed global 3DMM and a variety of bespoke models tailored by age, gender and ethnicity.

3.

A Unified Framework for Compositional Fitting of Active Appearance Models.

Alabort-I-Medina, Joan; Zafeiriou, Stefanos.

Int J Comput Vis ; 121(1): 26-64, 2017.

Article in English | MEDLINE | ID: mdl-32355408

ABSTRACT

Active appearance models (AAMs) are one of the most popular and well-established techniques for modeling deformable objects in computer vision. In this paper, we study the problem of fitting AAMs using compositional gradient descent (CGD) algorithms. We present a unified and complete view of these algorithms and classify them with respect to three main characteristics: (i) cost function; (ii) type of composition; and (iii) optimization method. Furthermore, we extend the previous view by: (a) proposing a novel Bayesian cost function that can be interpreted as a general probabilistic formulation of the well-known project-out loss; (b) introducing two new types of composition, asymmetric and bidirectional, that combine the gradients of both image and appearance model to derive better convergent and more robust CGD algorithms; and (c) providing new valuable insights into existent CGD algorithms by reinterpreting them as direct applications of the Schur complement and the Wiberg method. Finally, in order to encourage open research and facilitate future comparisons with our work, we make the implementation of the algorithms studied in this paper publicly available as part of the Menpo Project (http://www.menpo.org).

4.

Robust Statistical Frontalization of Human and Animal Faces.

Sagonas, Christos; Panagakis, Yannis; Zafeiriou, Stefanos; Pantic, Maja.

Int J Comput Vis ; 122(2): 270-291, 2017.

Article in English | MEDLINE | ID: mdl-32226226

ABSTRACT

The unconstrained acquisition of facial data in real-world conditions may result in face images with significant pose variations, illumination changes, and occlusions, affecting the performance of facial landmark localization and recognition methods. In this paper, a novel method, robust to pose, illumination variations, and occlusions is proposed for joint face frontalization and landmark localization. Unlike the state-of-the-art methods for landmark localization and pose correction, where large amount of manually annotated images or 3D facial models are required, the proposed method relies on a small set of frontal images only. By observing that the frontal facial image of both humans and animals, is the one having the minimum rank of all different poses, a model which is able to jointly recover the frontalized version of the face as well as the facial landmarks is devised. To this end, a suitable optimization problem is solved, concerning minimization of the nuclear norm (convex surrogate of the rank function) and the matrix â 1 norm accounting for occlusions. The proposed method is assessed in frontal view reconstruction of human and animal faces, landmark localization, pose-invariant face recognition, face verification in unconstrained conditions, and video inpainting by conducting experiment on 9 databases. The experimental results demonstrate the effectiveness of the proposed method in comparison to the state-of-the-art methods for the target problems.

5.

EEGminer: discovering interpretable features of brain activity with learnable filters.

Ludwig, Siegfried; Bakas, Stylianos; Adamos, Dimitrios A; Laskaris, Nikolaos; Panagakis, Yannis; Zafeiriou, Stefanos.

J Neural Eng ; 21(3)2024 May 13.

Article in English | MEDLINE | ID: mdl-38684154

ABSTRACT

Objective. The patterns of brain activity associated with different brain processes can be used to identify different brain states and make behavioural predictions. However, the relevant features are not readily apparent and accessible. Our aim is to design a system for learning informative latent representations from multichannel recordings of ongoing EEG activity.Approach: We propose a novel differentiable decoding pipeline consisting of learnable filters and a pre-determined feature extraction module. Specifically, we introduce filters parameterized by generalized Gaussian functions that offer a smooth derivative for stable end-to-end model training and allow for learning interpretable features. For the feature module, we use signal magnitude and functional connectivity estimates.Main results.We demonstrate the utility of our model on a new EEG dataset of unprecedented size (i.e. 721 subjects), where we identify consistent trends of music perception and related individual differences. Furthermore, we train and apply our model in two additional datasets, specifically for emotion recognition on SEED and workload classification on simultaneous task EEG workload. The discovered features align well with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening. This agrees with the specialisation of the temporal lobes regarding music perception proposed in the literature.Significance. The proposed method offers strong interpretability of learned features while reaching similar levels of accuracy achieved by black box deep learning models. This improved trustworthiness may promote the use of deep learning models in real world applications. The model code is available athttps://github.com/SMLudwig/EEGminer/.

Subject(s)

Brain , Electroencephalography , Humans , Electroencephalography/methods , Brain/physiology , Male , Adult , Female , Music , Young Adult , Auditory Perception/physiology , Machine Learning , Emotions/physiology

6.

A causal perspective on brainwave modeling for brain-computer interfaces.

Barmpas, Konstantinos; Panagakis, Yannis; Zoumpourlis, Georgios; Adamos, Dimitrios A; Laskaris, Nikolaos; Zafeiriou, Stefanos.

J Neural Eng ; 21(3)2024 May 03.

Article in English | MEDLINE | ID: mdl-38621380

ABSTRACT

Objective. Machine learning (ML) models have opened up enormous opportunities in the field of brain-computer Interfaces (BCIs). Despite their great success, they usually face severe limitations when they are employed in real-life applications outside a controlled laboratory setting.Approach. Mixing causal reasoning, identifying causal relationships between variables of interest, with brainwave modeling can change one's viewpoint on some of these major challenges which can be found in various stages in the ML pipeline, ranging from data collection and data pre-processing to training methods and techniques.Main results. In this work, we employ causal reasoning and present a framework aiming to breakdown and analyze important challenges of brainwave modeling for BCIs.Significance. Furthermore, we present how general ML practices as well as brainwave-specific techniques can be utilized and solve some of these identified challenges. And finally, we discuss appropriate evaluation schemes in order to measure these techniques' performance and efficiently compare them with other methods that will be developed in the future.

Subject(s)

Brain-Computer Interfaces , Machine Learning , Brain-Computer Interfaces/trends , Humans , Electroencephalography/methods , Brain Waves/physiology , Brain/physiology , Algorithms

7.

Free-HeadGAN: Neural Talking Head Synthesis With Explicit Gaze Control.

Doukas, Michail Christos; Ververas, Evangelos; Sharmanska, Viktoriia; Zafeiriou, Stefanos.

IEEE Trans Pattern Anal Mach Intell ; 45(8): 9743-9756, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37028333

ABSTRACT

We present Free-HeadGAN, a person-generic neural talking head synthesis system. We show that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance, without relying on strong statistical priors of the face, such as 3D Morphable Models. Apart from 3D pose and facial expressions, our method is capable of fully transferring the eye gaze, from a driving actor to a source identity. Our complete pipeline consists of three components: a canonical 3D key-point estimator that regresses 3D pose and expression-related deformations, a gaze estimation network and a generator that is built upon the architecture of HeadGAN. We further experiment with an extension of our generator to accommodate few-shot learning using an attention mechanism, in case multiple source images are available. Compared to recent methods for reenactment and motion transfer, our system achieves higher photo-realism combined with superior identity preservation, while offering explicit gaze control.

Subject(s)

Algorithms , Face , Humans , Fixation, Ocular , Learning , Facial Expression

8.

Redesigning Multi-Scale Neural Network for Crowd Counting.

Du, Zhipeng; Shi, Miaojing; Deng, Jiankang; Zafeiriou, Stefanos.

IEEE Trans Image Process ; 32: 3664-3678, 2023.

Article in English | MEDLINE | ID: mdl-37384475

ABSTRACT

Perspective distortions and crowd variations make crowd counting a challenging task in computer vision. To tackle it, many previous works have used multi-scale architecture in deep neural networks (DNNs). Multi-scale branches can be either directly merged (e.g. by concatenation) or merged through the guidance of proxies (e.g. attentions) in the DNNs. Despite their prevalence, these combination methods are not sophisticated enough to deal with the per-pixel performance discrepancy over multi-scale density maps. In this work, we redesign the multi-scale neural network by introducing a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting. Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales; pixel-wise soft gating nets are introduced to provide pixel-wise soft weights for scale combinations in different hierarchies. The network is optimized using both the crowd density map and the local counting map, where the latter is obtained by local integration on the former. Optimizing both can be problematic because of their potential conflicts. We introduce a new relative local counting loss based on relative count differences among hard-predicted local regions in an image, which proves to be complementary to the conventional absolute error loss on the density map. Experiments show that our method achieves the state-of-the-art performance on five public datasets, i.e. ShanghaiTech, UCF_CC_50, JHU-CROWD++, NWPU-Crowd and Trancos. Our codes will be available at https://github.com/ZPDu/Redesigning-Multi-Scale-Neural-Network-for-Crowd-Counting.

9.

Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting.

Bouritsas, Giorgos; Frasca, Fabrizio; Zafeiriou, Stefanos; Bronstein, Michael M.

IEEE Trans Pattern Anal Mach Intell ; 45(1): 657-668, 2023 Jan.

Article in English | MEDLINE | ID: mdl-35201983

ABSTRACT

While Graph Neural Networks (GNNs) have achieved remarkable results in a variety of applications, recent studies exposed important shortcomings in their ability to capture the structure of the underlying graph. It has been shown that the expressive power of standard GNNs is bounded by the Weisfeiler-Leman (WL) graph isomorphism test, from which they inherit proven limitations such as the inability to detect and count graph substructures. On the other hand, there is significant empirical evidence, e.g. in network science and bioinformatics, that substructures are often intimately related to downstream tasks. To this end, we propose "Graph Substructure Networks" (GSN), a topologically-aware message passing scheme based on substructure encoding. We theoretically analyse the expressive power of our architecture, showing that it is strictly more expressive than the WL test, and provide sufficient conditions for universality. Importantly, we do not attempt to adhere to the WL hierarchy; this allows us to retain multiple attractive properties of standard GNNs such as locality and linear network complexity, while being able to disambiguate even hard instances of graph isomorphism. We perform an extensive experimental evaluation on graph classification and regression tasks and obtain state-of-the-art results in diverse real-world settings including molecular graphs and social networks.

10.

BrainWave-Scattering Net: a lightweight network for EEG-based motor imagery recognition.

Barmpas, Konstantinos; Panagakis, Yannis; Adamos, Dimitrios A; Laskaris, Nikolaos; Zafeiriou, Stefanos.

J Neural Eng ; 20(5)2023 09 22.

Article in English | MEDLINE | ID: mdl-37678229

ABSTRACT

Objective.Brain-computer interfaces (BCIs) enable a direct communication of the brain with the external world, using one's neural activity, measured by electroencephalography (EEG) signals. In recent years, convolutional neural networks (CNNs) have been widely used to perform automatic feature extraction and classification in various EEG-based tasks. However, their undeniable benefits are counterbalanced by the lack of interpretability properties as well as the inability to perform sufficiently when only limited amount of training data is available.Approach.In this work, we introduce a novel, lightweight, fully-learnable neural network architecture that relies on Gabor filters to delocalize EEG signal information into scattering decomposition paths along frequency and slow-varying temporal modulations.Main results.We utilize our network in two distinct modeling settings, for building either a generic (training across subjects) or a personalized (training within a subject) classifier.Significance.In both cases, using two different publicly available datasets and one in-house collected dataset, we demonstrate high performance for our model with considerably less number of trainable parameters as well as shorter training time compared to other state-of-the-art deep architectures. Moreover, our network demonstrates enhanced interpretability properties emerging at the level of the temporal filtering operation and enables us to train efficient personalized BCI models with limited amount of training data.

Subject(s)

Brain Waves , Brain-Computer Interfaces , Humans , Electroencephalography , Recognition, Psychology , Brain

11.

Inverse Image Frequency for Long-Tailed Image Recognition.

Alexandridis, Konstantinos Panagiotis; Luo, Shan; Nguyen, Anh; Deng, Jiankang; Zafeiriou, Stefanos.

IEEE Trans Image Process ; 32: 5721-5736, 2023.

Article in English | MEDLINE | ID: mdl-37824316

ABSTRACT

The long-tailed distribution is a common phenomenon in the real world. Extracted large scale image datasets inevitably demonstrate the long-tailed property and models trained with imbalanced data can obtain high performance for the over-represented categories, but struggle for the under-represented categories, leading to biased predictions and performance degradation. To address this challenge, we propose a novel de-biasing method named Inverse Image Frequency (IIF). IIF is a multiplicative margin adjustment transformation of the logits in the classification layer of a convolutional neural network. Our method achieves stronger performance than similar works and it is especially useful for downstream tasks such as long-tailed instance segmentation as it produces fewer false positive detections. Our extensive experiments show that IIF surpasses the state of the art on many long-tailed benchmarks such as ImageNet-LT, CIFAR-LT, Places-LT and LVIS, reaching 55.8% top-1 accuracy with ResNet50 on ImageNet-LT and 26.3% segmentation AP with MaskRCNN ResNet50 on LVIS. Code available at https://github.com/kostas1515/iif.

12.

Linear Complexity Self-Attention With 3rd Order Polynomials.

Babiloni, Francesca; Marras, Ioannis; Deng, Jiankang; Kokkinos, Filippos; Maggioni, Matteo; Chrysos, Grigorios; Torr, Philip; Zafeiriou, Stefanos.

IEEE Trans Pattern Anal Mach Intell ; 45(11): 12726-12737, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37030770

ABSTRACT

Self-attention mechanisms and non-local blocks have become crucial building blocks for state-of-the-art neural architectures thanks to their unparalleled ability in capturing long-range dependencies in the input. However their cost is quadratic with the number of spatial positions hence making their use impractical in many real case applications. In this work, we analyze these methods through a polynomial lens, and we show that self-attention can be seen as a special case of a 3 rd order polynomial. Within this polynomial framework, we are able to design polynomial operators capable of accessing the same data pattern of non-local and self-attention blocks while reducing the complexity from quadratic to linear. As a result, we propose two modules (Poly-NL and Poly-SA) that can be used as "drop-in" replacements for more-complex non-local and self-attention layers in state-of-the-art CNNs and ViT architectures. Our modules can achieve comparable, if not better, performance across a wide range of computer vision tasks while keeping a complexity equivalent to a standard linear layer.

13.

Improving Generalization of CNN-Based Motor-Imagery EEG Decoders via Dynamic Convolutions.

Barmpas, Konstantinos; Panagakis, Yannis; Bakas, Stylianos; Adamos, Dimitrios A; Laskaris, Nikolaos; Zafeiriou, Stefanos.

IEEE Trans Neural Syst Rehabil Eng ; 31: 1997-2005, 2023.

Article in English | MEDLINE | ID: mdl-37023162

ABSTRACT

Deep Convolutional Neural Networks (CNNs) have recently demonstrated impressive results in electroencephalogram (EEG) decoding for several Brain-Computer Interface (BCI) paradigms, including Motor-Imagery (MI). However, neurophysiological processes underpinning EEG signals vary across subjects causing covariate shifts in data distributions and hence hindering the generalization of deep models across subjects. In this paper, we aim to address the challenge of inter-subject variability in MI. To this end, we employ causal reasoning to characterize all possible distribution shifts in the MI task and propose a dynamic convolution framework to account for shifts caused by the inter-subject variability. Using publicly available MI datasets, we demonstrate improved generalization performance (up to 5%) across subjects in various MI tasks for four well-established deep architectures.

Subject(s)

Algorithms , Brain-Computer Interfaces , Humans , Neural Networks, Computer , Electroencephalography/methods , Generalization, Psychological , Imagination/physiology

14.

EDFace-Celeb-1 M: Benchmarking Face Hallucination With a Million-Scale Dataset.

Zhang, Kaihao; Li, Dongxu; Luo, Wenhan; Liu, Jingyu; Deng, Jiankang; Liu, Wei; Zafeiriou, Stefanos.

IEEE Trans Pattern Anal Mach Intell ; 45(3): 3968-3978, 2023 03.

Article in English | MEDLINE | ID: mdl-35687621

ABSTRACT

Recent deep face hallucination methods show stunning performance in super-resolving severely degraded facial images, even surpassing human ability. However, these algorithms are mainly evaluated on non-public synthetic datasets. It is thus unclear how these algorithms perform on public face hallucination datasets. Meanwhile, most of the existing datasets do not well consider the distribution of races, which makes face hallucination methods trained on these datasets biased toward some specific races. To address the above two problems, in this paper, we build a public Ethnically Diverse Face dataset, EDFace-Celeb-1 M, and design a benchmark task for face hallucination. Our dataset includes 1.7 million photos that cover different countries, with relatively balanced race composition. To the best of our knowledge, it is the largest-scale and publicly available face hallucination dataset in the wild. Associated with this dataset, this paper also contributes various evaluation protocols and provides comprehensive analysis to benchmark the existing state-of-the-art methods. The benchmark evaluations demonstrate the performance and limitations of state-of-the-art algorithms. https://github.com/HDCVLab/EDFace-Celeb-1M.

Subject(s)

Algorithms , Benchmarking , Humans , Hallucinations

15.

Fast-GANFIT: Generative Adversarial Network for High Fidelity 3D Face Reconstruction.

Gecer, Baris; Ploumpis, Stylianos; Kotsia, Irene; Zafeiriou, Stefanos.

IEEE Trans Pattern Anal Mach Intell ; 44(9): 4879-4893, 2022 09.

Article in English | MEDLINE | ID: mdl-34043505

ABSTRACT

A lot of work has been done towards reconstructing the 3D facial structure from single images by capitalizing on the power of deep convolutional neural networks (DCNNs). In the recent works, the texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, the quality of the facial texture reconstruction is still not capable of modeling facial texture with high-frequency details. In this paper, we take a radically different approach and harness the power of generative adversarial networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful facial texture prior from a large-scale 3D texture dataset. Then, we revisit the original 3D Morphable Models (3DMMs) fitting making use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image but under a new perspective. In order to be robust towards initialisation and expedite the fitting process, we propose a novel self-supervised regression based approach. We demonstrate excellent results in photorealistic and identity preserving 3D face reconstructions and achieve for the first time, to the best of our knowledge, facial texture reconstruction with high-frequency details.

Subject(s)

Algorithms , Neural Networks, Computer , Face/diagnostic imaging , Image Processing, Computer-Assisted/methods

16.

Deep Polynomial Neural Networks.

Chrysos, Grigorios G; Moschoglou, Stylianos; Bouritsas, Giorgos; Deng, Jiankang; Panagakis, Yannis; Zafeiriou, Stefanos.

IEEE Trans Pattern Anal Mach Intell ; 44(8): 4021-4034, 2022 08.

Article in English | MEDLINE | ID: mdl-33571091

ABSTRACT

Deep convolutional neural networks (DCNNs) are currently the method of choice both for generative, as well as for discriminative learning in computer vision and machine learning. The success of DCNNs can be attributed to the careful selection of their building blocks (e.g., residual blocks, rectifiers, sophisticated normalization schemes, to mention but a few). In this paper, we propose Π-Nets, a new class of function approximators based on polynomial expansions. Π-Nets are polynomial neural networks, i.e., the output is a high-order polynomial of the input. The unknown parameters, which are naturally represented by high-order tensors, are estimated through a collective tensor factorization with factors sharing. We introduce three tensor decompositions that significantly reduce the number of parameters and show how they can be efficiently implemented by hierarchical neural networks. We empirically demonstrate that Π-Nets are very expressive and they even produce good results without the use of non-linear activation functions in a large battery of tasks and signals, i.e., images, graphs, and audio. When used in conjunction with activation functions, Π-Nets produce state-of-the-art results in three challenging tasks, i.e., image generation, face verification and 3D mesh representation learning. The source code is available at https://github.com/grigorisg9gr/polynomial_nets.

Subject(s)

Algorithms , Neural Networks, Computer , Machine Learning

17.

AvatarMe⁺⁺: Facial Shape and BRDF Inference With Photorealistic Rendering-Aware GANs.

Lattas, Alexandros; Moschoglou, Stylianos; Ploumpis, Stylianos; Gecer, Baris; Ghosh, Abhijeet; Zafeiriou, Stefanos.

IEEE Trans Pattern Anal Mach Intell ; 44(12): 9269-9284, 2022 Dec.

Article in English | MEDLINE | ID: mdl-34748477

ABSTRACT

Over the last years, with the advent of Generative Adversarial Networks (GANs), many face analysis tasks have accomplished astounding performance, with applications including, but not limited to, face generation and 3D face reconstruction from a single "in-the-wild" image. Nevertheless, to the best of our knowledge, there is no method which can produce render-ready high-resolution 3D faces from "in-the-wild" images and this can be attributed to the: (a) scarcity of available data for training, and (b) lack of robust methodologies that can successfully be applied on very high-resolution data. In this paper, we introduce the first method that is able to reconstruct photorealistic render-ready 3D facial geometry and BRDF from a single "in-the-wild" image. To achieve this, we capture a large dataset of facial shape and reflectance, which we have made public. Moreover, we define a fast and photorealistic differentiable rendering methodology with accurate facial skin diffuse and specular reflection, self-occlusion and subsurface scattering approximation. With this, we train a network that disentangles the facial diffuse and specular reflectance components from a mesh and texture with baked illumination, scanned or reconstructed with a 3DMM fitting method. As we demonstrate in a series of qualitative and quantitative experiments, our method outperforms the existing arts by a significant margin and reconstructs authentic, 4K by 6K-resolution 3D faces from a single low-resolution image, that are ready to be rendered in various applications and bridge the uncanny valley.

Subject(s)

Algorithms , Image Processing, Computer-Assisted , Image Processing, Computer-Assisted/methods , Face/diagnostic imaging , Lighting

18.

ArcFace: Additive Angular Margin Loss for Deep Face Recognition.

Deng, Jiankang; Guo, Jia; Yang, Jing; Xue, Niannan; Kotsia, Irene; Zafeiriou, Stefanos.

IEEE Trans Pattern Anal Mach Intell ; 44(10): 5962-5979, 2022 10.

Article in English | MEDLINE | ID: mdl-34106845

ABSTRACT

Recently, a popular line of research in face recognition is adopting margins in the well-established softmax loss function to maximize class separability. In this paper, we first introduce an Additive Angular Margin Loss (ArcFace), which not only has a clear geometric interpretation but also significantly enhances the discriminative power. Since ArcFace is susceptible to the massive label noise, we further propose sub-center ArcFace, in which each class contains K sub-centers and training samples only need to be close to any of the K positive sub-centers. Sub-center ArcFace encourages one dominant sub-class that contains the majority of clean faces and non-dominant sub-classes that include hard or noisy faces. Based on this self-propelled isolation, we boost the performance through automatically purifying raw web faces under massive real-world noise. Besides discriminative feature embedding, we also explore the inverse problem, mapping feature vectors to face images. Without training any additional generator or discriminator, the pre-trained ArcFace model can generate identity-preserved face images for both subjects inside and outside the training data only by using the network gradient and Batch Normalization (BN) priors. Extensive experiments demonstrate that ArcFace can enhance the discriminative feature embedding as well as strengthen the generative face synthesis.

Subject(s)

Facial Recognition , Algorithms , Face , Humans

19.

Convolutional mesh autoencoders for the 3-dimensional identification of FGFR-related craniosynostosis.

O' Sullivan, Eimear; van de Lande, Lara S; Papaioannou, Athanasios; Breakey, Richard W F; Jeelani, N Owase; Ponniah, Allan; Duncan, Christian; Schievano, Silvia; Khonsari, Roman H; Zafeiriou, Stefanos; Dunaway, David J.

Sci Rep ; 12(1): 2230, 2022 02 09.

Article in English | MEDLINE | ID: mdl-35140239

ABSTRACT

Clinical diagnosis of craniofacial anomalies requires expert knowledge. Recent studies have shown that artificial intelligence (AI) based facial analysis can match the diagnostic capabilities of expert clinicians in syndrome identification. In general, these systems use 2D images and analyse texture and colour. They are powerful tools for photographic analysis but are not suitable for use with medical imaging modalities such as ultrasound, MRI or CT, and are unable to take shape information into consideration when making a diagnostic prediction. 3D morphable models (3DMMs), and their recently proposed successors, mesh autoencoders, analyse surface topography rather than texture enabling analysis from photography and all common medical imaging modalities and present an alternative to image-based analysis. We present a craniofacial analysis framework for syndrome identification using Convolutional Mesh Autoencoders (CMAs). The models were trained using 3D photographs of the general population (LSFM and LYHM), computed tomography data (CT) scans from healthy infants and patients with 3 genetically distinct craniofacial syndromes (Muenke, Crouzon, Apert). Machine diagnosis outperformed expert clinical diagnosis with an accuracy of 99.98%, sensitivity of 99.95% and specificity of 100%. The diagnostic precision of this technique supports its potential inclusion in clinical decision support systems. Its reliance on 3D topography characterisation make it suitable for AI assisted diagnosis in medical imaging as well as photographic analysis in the clinical setting.

Subject(s)

Artificial Intelligence , Craniosynostoses/classification , Craniosynostoses/diagnosis , Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Computer Simulation , Craniosynostoses/diagnostic imaging , Face/abnormalities , Head/abnormalities , Humans , Infant , Tomography, X-Ray Computed

20.

Growth patterns and shape development of the paediatric mandible - A 3D statistical model.

O' Sullivan, Eimear; van de Lande, Lara S; El Ghoul, Khalid; Koudstaal, Maarten J; Schievano, Silvia; Khonsari, Roman H; Dunaway, David J; Zafeiriou, Stefanos.

Bone Rep ; 16: 101528, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35399871

ABSTRACT

Background/aim: To develop a 3D morphable model of the normal paediatric mandible to analyse shape development and growth patterns for males and females. Methods: Computed tomography (CT) data was collected for 242 healthy children referred for CT scan between 2011 and 2018 aged between 0 and 47 months (mean, 20.6 ± 13.4 months, 59.9% male). Thresholding techniques were used to segment the mandible from the CT scans. All mandible meshes were annotated using a defined set of 52 landmarks and processed such that all meshes followed a consistent triangulation. Following this, the mandible meshes were rigidly aligned to remove translation and rotation effects, while size effects were retained. Principal component analysis (PCA) was applied to the processed meshes to construct a generative 3D morphable model. Partial least squares (PLS) regression was also applied to the processed data to extract the shape modes with which to evaluate shape differences for age and sex. Growth curves were constructed for anthropometric measurements. Results: A 3D morphable model of the paediatric mandible was constructed and validated with good generalisation, compactness, and specificity. Growth curves of the assessed anthropometric measurements were plotted without significant differences between male and female subjects. The first principal component was dominated by size effects and is highly correlated with age at time of scan (Spearman's r = 0.94, p < 0.01). As with PCA, the first extracted PLS mode captures much of the size variation within the dataset and is highly correlated with age (Spearman's r = -0.94, p < 0.01). Little correlation was observed between extracted shape modes and sex with either PCA or PLS for this study population. Conclusion: The presented 3D morphable model of the paediatric mandible enables an understanding of mandibular shape development and variation by age and sex. It allowed for the construction of growth curves, which contains valuable information that can be used to enhance our understanding of various disorders that affect the mandibular development. Knowledge of shape changes in the growing mandible has potential to improve diagnostic accuracy for craniofacial conditions that impact the mandibular morphology, objective evaluation, surgical planning, and patient follow-up.

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL