Results 1 - 17 of 17
1.
Sci Data ; 11(1): 87, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38238370

ABSTRACT

Oracle bone script is an ancient Chinese writing system engraved on turtle shells and animal bones, serving as a valuable resource for interpreting ancient culture, history, and language. We introduce the Oracle-MNIST dataset, comprising 28 × 28 grayscale images of 30,222 ancient characters from 10 categories, designed for benchmarking pattern classification, with particular challenges related to image noise and distortion. The training set contains 27,222 images in total, and the test set contains 300 images per class. Oracle-MNIST follows the same data format as the original MNIST dataset, enabling direct compatibility with all existing classifiers and systems, but it constitutes a more challenging classification task than MNIST. The images of ancient characters suffer from (1) severe and distinctive noise caused by three thousand years of burial and aging and (2) highly variable writing styles of ancient Chinese scribes, both of which make the dataset realistic for machine learning research.
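Because Oracle-MNIST reuses the MNIST IDX binary layout, a standard MNIST loader works unchanged. A minimal sketch of such a loader follows; gzip compression and the IDX3 magic number follow the MNIST convention, and any specific file name is an assumption:

```python
import gzip
import struct

import numpy as np

def load_idx_images(path):
    # MNIST-style IDX3 layout: big-endian magic (2051), image count, rows,
    # cols, followed by one unsigned byte per pixel.
    with gzip.open(path, "rb") as f:
        magic, num, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError("not an IDX3 image file")
        pixels = np.frombuffer(f.read(), dtype=np.uint8)
    return pixels.reshape(num, rows, cols)
```

The returned array plugs directly into any pipeline that already consumes MNIST, which is precisely the compatibility the abstract describes.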

2.
Math Biosci Eng ; 20(8): 13562-13580, 2023 Jun 14.
Article in English | MEDLINE | ID: mdl-37679102

ABSTRACT

The advancement of deep learning has led to significant improvements on various visual tasks. However, deep neural networks (DNNs) are vulnerable to well-designed adversarial examples, which can easily deceive DNNs by adding visually imperceptible perturbations to clean data. Prior research on adversarial attack methods mainly focused on single-task settings, i.e., generating adversarial examples to fool a network trained for a specific task. However, real-world artificial intelligence systems often need to solve multiple tasks simultaneously, and in such multi-task settings single-task adversarial attacks perform poorly on the unrelated tasks. To address this issue, the generation of multi-task adversarial examples should leverage knowledge that generalizes across tasks and reduce the influence of task-specific information during generation. In this study, we propose a multi-task adversarial attack method that generates adversarial examples from a multi-task learning network by applying attention distraction with gradient sharpening. Specifically, we first attack the attention heat maps, which contain more generalizable information than feature representations, by distracting the attention on the attack regions. Additionally, we use gradient-based adversarial example-generation schemes and propose to sharpen the gradients so that gradients carrying multi-task information, rather than only task-specific information, have a greater impact. Experimental results on the NYUD-V2 and PASCAL datasets demonstrate that the proposed method improves the generalization ability of adversarial examples across tasks and achieves better attack performance.
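A minimal sketch of a multi-task gradient-based step can illustrate the idea. Here per-task input gradients are averaged, and positions where all tasks agree in sign are upweighted as a stand-in for the paper's gradient sharpening (the upweighting rule and hyperparameters are illustrative assumptions, not the method's exact scheme):

```python
import numpy as np

def multi_task_fgsm(x, task_grads, eps=0.03, boost=0.5):
    # task_grads: array of shape (num_tasks, *x.shape), one input-gradient
    # per task. Averaging shares the attack across tasks; upweighting
    # positions where every task's gradient has the same sign emphasizes
    # directions carrying multi-task rather than task-specific information.
    g = np.mean(task_grads, axis=0)
    agree = np.all(np.sign(task_grads) == np.sign(task_grads[0]), axis=0)
    g = g * (1.0 + boost * agree)
    # FGSM-style signed step, clipped back to the valid pixel range.
    return np.clip(x + eps * np.sign(g), 0.0, 1.0)
```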

3.
J Cardiothorac Surg ; 18(1): 70, 2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36765357

ABSTRACT

OBJECTIVE: We aimed to estimate the prevalence of cardiovascular risk factors (CRFs) and investigate the associated socioeconomic factors among adults in coastal areas of Qinzhou, Guangxi. METHODS: A representative sample of 1836 participants aged 20 to 70 years was recruited in Qinzhou, Guangxi in 2020. Data were collected by questionnaire and by anthropometric and laboratory measurements. The standardized prevalence of CRFs, including hypertension, dyslipidemia, diabetes, overweight or obesity, alcohol consumption, and smoking, was calculated. Multivariate logistic regression analysis was performed to explore the independent factors associated with the presence of CRFs. RESULTS: The age-standardized prevalence of hypertension, dyslipidemia, diabetes, overweight or obesity, alcohol consumption, and smoking was 42.7%, 39.5%, 0.9%, 38.5%, 18.4% and 15.7%, respectively. The overall prevalence of clustering of at least one and of at least two cardiovascular disease risk factors was 82.2% and 45.3%, respectively. The aggregation of cardiovascular risk factors differed across age, education, and income levels: clustering of at least one and of at least two CRFs was higher among adults with a lower education level, a higher income level, and among the elderly. CONCLUSIONS: Compared with other regions of China, a higher prevalence of CRFs exists among adults in Guangxi, and several socioeconomic factors were associated with the presence of CRFs. These findings suggest that effective measures should be implemented to control CRFs and reduce the risk of cardiovascular disease in adults.
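The age-standardized prevalences reported above come from direct standardization: each age stratum's crude prevalence is weighted by that stratum's share of a standard population. A small sketch (the strata and standard-population counts here are made-up illustrations, not the study's data):

```python
def age_standardized_prevalence(cases, totals, std_pop):
    # Direct standardization: per-stratum crude prevalence (cases/totals)
    # weighted by the standard population's share of each stratum.
    weights = [p / sum(std_pop) for p in std_pop]
    rates = [c / t for c, t in zip(cases, totals)]
    return sum(w * r for w, r in zip(weights, rates))
```

For example, strata with crude prevalences of 10% and 20% and equal standard-population weights yield a standardized prevalence of 15%.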


Subject(s)
Cardiovascular Diseases, Diabetes Mellitus, Dyslipidemias, Hypertension, Adult, Aged, Humans, Cardiovascular Diseases/epidemiology, Cardiovascular Diseases/etiology, China/epidemiology, Risk Factors, Overweight/complications, Prevalence, Cross-Sectional Studies, Hypertension/complications, Obesity/epidemiology, Obesity/complications, Cluster Analysis, Diabetes Mellitus/epidemiology, Heart Disease Risk Factors, Dyslipidemias/complications
4.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3590-3603, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35588415

ABSTRACT

While convenient in daily life, face recognition technologies also raise privacy concerns for regular users on social media, since they can be used to analyze face images and videos efficiently and surreptitiously, without any security restrictions. In this paper, we investigate face privacy protection from a technical standpoint, based on a new type of customized cloak that can be applied to all images of a regular user to prevent malicious face recognition systems from uncovering their identity. Specifically, we propose a new method, named one person one mask (OPOM), to generate person-specific (class-wise) universal masks by optimizing each training sample in the direction away from the feature subspace of the source identity. To make full use of the limited training images, we investigate several modeling methods, including affine hulls, class centers, and convex hulls, to obtain a better description of the feature subspace of source identities. The effectiveness of the proposed method is evaluated on both common and celebrity datasets against black-box face recognition models with different loss functions and network architectures. In addition, we discuss the advantages and potential problems of the proposed method. In particular, we conduct an application study on the privacy protection of a video dataset, Sherlock, to demonstrate the potential practical usage of the proposed method.
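The core optimization can be sketched on a toy scale: one shared additive mask is driven away from the identity's class center in feature space. To keep gradients analytic, this sketch uses a *linear* feature extractor f(x) = W x and plain gradient ascent; the extractor, hyperparameters, and centroid model (class centers rather than affine/convex hulls) are illustrative assumptions:

```python
import numpy as np

def universal_mask(W, images, steps=200, lr=0.05, eps=0.1, seed=0):
    # One shared mask for all of a person's images, pushed away from the
    # person's feature centroid, with an L-infinity budget of eps.
    rng = np.random.default_rng(seed)
    center = (images @ W.T).mean(axis=0)
    mask = rng.normal(scale=1e-3, size=images.shape[1])  # break symmetry
    for _ in range(steps):
        diff = (images + mask) @ W.T - center        # per-image feature offset
        grad = (diff @ W).mean(axis=0)               # ascent on mean 0.5*||diff||^2
        mask = np.clip(mask + lr * grad, -eps, eps)  # keep perturbation small
    return mask

def mean_sq_dist(W, images, mask, center):
    d = (images + mask) @ W.T - center
    return float(np.mean(np.sum(d * d, axis=1)))
```

After optimization, the masked images sit measurably farther from the identity's centroid than the clean ones, which is the property the cloak relies on.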

5.
Article in English | MEDLINE | ID: mdl-35834457

ABSTRACT

Deep metric learning has become attractive for the zero-shot image retrieval and clustering (ZSRC) task, in which a good embedding/metric is required so that unseen classes can be distinguished well. Most existing works deem this "good" embedding to be merely a discriminative one and race to devise powerful metric objectives or hard-sample mining strategies for learning discriminative deep metrics. In this article, however, we first emphasize that generalization ability is also a core ingredient of this "good" metric and that, in fact, it largely affects metric performance in zero-shot settings. We then propose the confusion-based metric learning (CML) framework to explicitly optimize a robust metric. This is achieved mainly by introducing two regularization terms, the energy confusion (EC) and diversity confusion (DC) terms. These terms break away from the traditional deep metric learning idea of designing discriminative objectives and instead seek to "confuse" the learned model; they address local and global feature distribution confusion, respectively. We train these confusion terms together with the conventional deep metric objective in an adversarial manner. Although it may seem counterintuitive to "confuse" the model during learning, we show that CML serves as an efficient regularization framework for deep metric learning and is applicable to various conventional metric methods. This article empirically demonstrates the importance of learning an embedding/metric with good generalization, achieving state-of-the-art performance on the popular CUB, CARS, Stanford Online Products, and In-Shop datasets for ZSRC tasks.

6.
IEEE Trans Image Process ; 31: 4909-4921, 2022.
Article in English | MEDLINE | ID: mdl-35839179

ABSTRACT

In many real-world applications, face recognition models degenerate when the training data (the source domain) differ from the testing data (the target domain). To alleviate this mismatch, caused by factors such as pose and skin tone, using pseudo-labels generated by clustering algorithms is an effective approach in unsupervised domain adaptation. However, these algorithms always miss some hard positive samples. Supervision on pseudo-labeled samples attracts them toward their prototypes and causes an intra-domain gap between the pseudo-labeled samples and the remaining unlabeled samples within the target domain, which results in a lack of discrimination in face recognition. In this paper, considering the particularities of face recognition, we propose a novel adversarial information network (AIN) to address this issue. First, a novel adversarial mutual information (MI) loss is proposed that alternately minimizes MI with respect to the target classifier and maximizes MI with respect to the feature extractor. Through this min-max scheme, the positions of the target prototypes are adaptively modified so that unlabeled images cluster more easily and the intra-domain gap is mitigated. Second, to assist the adversarial MI loss, we utilize a graph convolution network to predict linkage likelihoods between target data and generate pseudo-labels. It leverages valuable information in the context of nodes and achieves more reliable results. The proposed method is evaluated under two scenarios, i.e., domain adaptation across poses and image conditions, and domain adaptation across faces with different skin tones. Extensive experiments show that AIN successfully improves cross-domain generalization and sets a new state of the art on the RFW dataset.


Subject(s)
Facial Recognition, Algorithms, Information Services
7.
IEEE Trans Image Process ; 31: 3137-3150, 2022.
Article in English | MEDLINE | ID: mdl-35420984

ABSTRACT

Oracle bone script is the earliest-known Chinese writing system, dating to the Shang dynasty, and is precious to archeology and philology. However, real-world scanned oracle data are rare and few experts are available for annotation, which makes the automatic recognition of scanned oracle characters a challenging task. We therefore explore unsupervised domain adaptation to transfer knowledge from handprinted oracle data, which are easy to acquire, to the scanned domain. We propose a structure-texture separation network (STSN), an end-to-end learning framework for joint disentanglement, transformation, adaptation, and recognition. First, STSN disentangles features into structure (glyph) and texture (noise) components using generative models, and then aligns handprinted and scanned data in the structure feature space, so that the negative influence of severe noise can be avoided during adaptation. Second, transformation is achieved by swapping the learned textures across domains, and a classifier is trained to predict the labels of the transformed scanned characters. This not only guarantees absolute separation but also enhances the discriminative ability of the learned features. Extensive experiments on the Oracle-241 dataset show that STSN outperforms other adaptation methods and successfully improves recognition performance on scanned data, even when the characters are degraded by long burial and careless excavation.

8.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8433-8448, 2022 11.
Article in English | MEDLINE | ID: mdl-34383643

ABSTRACT

Although deep face recognition has achieved impressive progress in recent years, controversy has arisen regarding discrimination based on skin tone, calling deployment in real-world scenarios into question. In this paper, we aim to study this bias systematically and scientifically from both the data and algorithm aspects. First, using the dermatologist-approved Fitzpatrick Skin Type classification system and the Individual Typology Angle, we contribute a benchmark called the Identity Shades (IDS) database, which effectively quantifies the degree of bias with respect to skin tone in existing face recognition algorithms and commercial APIs. Further, we provide two skin-tone-aware training datasets, the BUPT-Globalface dataset and the BUPT-Balancedface dataset, to remove bias from training data. Finally, to mitigate algorithmic bias, we propose a novel meta-learning algorithm, called Meta Balanced Network (MBN), which learns adaptive margins in a large-margin loss so that the model optimized by this loss performs fairly across people with different skin tones. To determine the margins, our method optimizes a meta skewness loss on a clean and unbiased meta set and uses backward-on-backward automatic differentiation to perform a second-order gradient descent step on the current margins. Extensive experiments show that MBN successfully mitigates bias and achieves more balanced performance for people with different skin tones in face recognition. The proposed datasets are available at http://www.whdeng.cn/RFW/index.html.
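The Individual Typology Angle mentioned above is a standard dermatological measure computed from CIELAB color coordinates, ITA° = arctan((L* − 50)/b*) × 180/π, where larger values correspond to lighter skin tones. A direct implementation:

```python
import math

def individual_typology_angle(L_star, b_star):
    # ITA in degrees from CIELAB lightness L* and the yellow/blue axis b*;
    # atan2 handles the b* = 0 edge case gracefully.
    return math.degrees(math.atan2(L_star - 50.0, b_star))
```

For instance, L* = 70 and b* = 20 give an ITA of 45 degrees.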


Subject(s)
Algorithms, Facial Recognition, Benchmarking, Databases, Factual, Humans
9.
IEEE Trans Cybern ; 52(12): 12649-12660, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34197333

ABSTRACT

In this article, we propose a simple yet effective approach, called point adversarial self mining (PASM), to improve recognition accuracy in facial expression recognition (FER). Unlike previous works that focus on designing specific architectures or loss functions for this problem, PASM boosts network capability by simulating human learning processes: providing updated learning materials and guidance from more capable teachers. Specifically, to generate new learning materials, PASM leverages a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task, generating harder learning samples to refine the network. The searched position is highly adaptive, since it considers both the statistical information of each sample and the capability of the teacher network. Besides being provided new learning materials, the student network also receives guidance from the teacher network. After the student network finishes training, it changes roles and acts as a teacher, generating new learning materials and providing stronger guidance to train a better student network. The adaptive generation of learning materials and the teacher/student updates can be conducted multiple times, improving network capability iteratively. Extensive experimental results validate the efficacy of our method over existing state-of-the-art methods for FER.
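A "point" adversarial step of the kind referenced above can be sketched minimally: perturb only the single position with the largest gradient magnitude with respect to the loss. This is an illustrative simplification; PASM's teacher-guided search for the most informative position is not modeled here:

```python
import numpy as np

def point_attack(x, grad, eps=0.5):
    # Perturb only the most informative position: the element of x with the
    # largest loss-gradient magnitude, moved by eps in the gradient's sign.
    idx = np.unravel_index(np.argmax(np.abs(grad)), grad.shape)
    x_adv = x.copy()
    x_adv[idx] += eps * np.sign(grad[idx])
    return x_adv
```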


Subject(s)
Facial Recognition, Humans, Learning
10.
IEEE Trans Image Process ; 30: 2587-2598, 2021.
Article in English | MEDLINE | ID: mdl-33417553

ABSTRACT

Deep face recognition has achieved great success thanks to large-scale training databases and rapidly developing loss functions. Existing algorithms strive to realize an ideal objective: minimizing the intra-class distance and maximizing the inter-class distance. However, they may neglect that there are also low-quality training images that should not be optimized in this strict way. Considering the imperfection of training databases, we propose that the intra-class and inter-class objectives can be optimized in a moderate way to mitigate overfitting, and we further propose a novel loss function, named the sigmoid-constrained hypersphere loss (SFace). Specifically, SFace imposes intra-class and inter-class constraints on a hypersphere manifold, controlled by two sigmoid gradient re-scaling functions, respectively. The sigmoid curves precisely re-scale the intra-class and inter-class gradients so that training samples are optimized only to an appropriate degree. SFace thus strikes a better balance between decreasing the intra-class distances for clean examples and avoiding overfitting to label noise, yielding more robust deep face recognition models. Extensive experiments with models trained on the CASIA-WebFace, VGGFace2, and MS-Celeb-1M databases and evaluated on several face recognition benchmarks, such as the LFW, MegaFace, and IJB-C databases, demonstrate the superiority of SFace.
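The sigmoid re-scaling idea can be sketched as a pair of weight curves over angular distances: the intra-class gradient weight decays once a sample's angle to its class center is already small, and the inter-class weight decays once other classes are far enough away. The slopes and thresholds below are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np

def sigmoid_rescale_weights(theta_intra, theta_inter, k=64.0, a=0.9, b=1.2):
    # w_intra -> 0 as theta_intra drops below a: clean, already-close pairs
    # stop being pulled, so noisy labels are not fit exactly.
    w_intra = 1.0 / (1.0 + np.exp(-k * (theta_intra - a)))
    # w_inter -> 0 as theta_inter rises above b: far-away negatives stop
    # being pushed, moderating the inter-class objective.
    w_inter = 1.0 / (1.0 + np.exp(k * (theta_inter - b)))
    return w_intra, w_inter
```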

11.
IEEE Trans Pattern Anal Mach Intell ; 41(3): 758-767, 2019 Mar.
Article in English | MEDLINE | ID: mdl-29994561

ABSTRACT

A binary descriptor typically consists of three stages: image filtering, binarization, and spatial histogram. This paper first demonstrates that the binary code of the maximum-variance filtering responses leads to the lowest bit error rate under Gaussian noise. Then, an optimal eigenfilter bank is derived from a universal assumption on the local stationary random field. Finally, compressive binary patterns (CBP) is designed by replacing the local derivative filters of local binary patterns (LBP) with these novel random-field eigenfilters, which leads to a compact and robust binary descriptor that characterizes the most stable local structures that are resistant to image noise and degradation. A scattering-like operator is subsequently applied to enhance the distinctiveness of the descriptor. Surprisingly, the results obtained from experiments on the FERET, LFW, and PaSC databases show that the scattering CBP (SCBP) descriptor, which is handcrafted by only 6 optimal eigenfilters under restrictive assumptions, outperforms the state-of-the-art learning-based face descriptors in terms of both matching accuracy and robustness. In particular, on probe images degraded with noise, blur, JPEG compression, and reduced resolution, SCBP outperforms other descriptors by a greater than 10 percent accuracy margin.
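For reference, the classic local binary pattern that CBP's eigenfilters replace thresholds each of a pixel's eight neighbors against the center and packs the comparisons into a byte. A compact vectorized sketch:

```python
import numpy as np

def lbp_codes(img):
    # 8-neighbor LBP for interior pixels: compare each neighbor with the
    # center and pack the 8 boolean results into one byte per pixel.
    c = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)
    return code
```

A spatial histogram of these codes yields the descriptor; CBP swaps the implicit derivative filters here for the noise-optimal random-field eigenfilters the abstract describes.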

12.
IEEE Trans Image Process ; 28(1): 356-370, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30183631

ABSTRACT

Facial expression is central to human experience, but most previous databases and studies are limited to posed facial behavior under controlled conditions. In this paper, we present a novel facial expression database, Real-world Affective Face Database (RAF-DB), which contains approximately 30,000 facial images with uncontrolled poses and illumination from thousands of individuals of diverse ages and races. During the crowdsourcing annotation, each image is independently labeled by approximately 40 annotators. An expectation-maximization algorithm is developed to reliably estimate the emotion labels, which reveals that real-world faces often express compound or even mixture emotions. A cross-database study between RAF-DB and CK+ database further indicates that the action units of real-world emotions are much more diverse than, or even deviate from, those of laboratory-controlled emotions. To address the recognition of multi-modal expressions in the wild, we propose a new deep locality-preserving convolutional neural network (DLP-CNN) method that aims to enhance the discriminative power of deep features by preserving the locality closeness while maximizing the inter-class scatter. Benchmark experiments on 7-class basic expressions and 11-class compound expressions, as well as additional experiments on CK+, MMI, and SFEW 2.0 databases, show that the proposed DLP-CNN outperforms the state-of-the-art handcrafted features and deep learning-based methods for expression recognition in the wild. To promote further study, we have made the RAF database, benchmarks, and descriptor encodings publicly available to the research community.

13.
IEEE Trans Image Process ; 27(12): 5813-5826, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30040643

ABSTRACT

One of the bottlenecks in acquiring a perfect database for deep learning is the tedious process of collecting and labeling data. In this paper, we propose a generative model trained with synthetic images rendered from 3D models, which reduces the burden of collecting real training data and makes the background conditions more realistic. Our architecture is composed of two sub-networks: a semantic foreground-object reconstruction network based on Bayesian inference, and a classification network trained with a multi-triplet cost. The latter avoids overfitting on the monotone synthetic object surfaces and exploits accurate information available for synthetic images, such as object poses and lighting conditions, which is helpful for recognizing regular photos. First, our generative model with metric learning uses the additional foreground-object channels generated by the reconstruction sub-network to recognize the original input images. The pose-based multi-triplet cost function used for metric learning makes it possible to train an effective categorical classifier purely on synthetic data. Second, we design a coordinated training strategy, with adaptive noise applied to the inputs of both concatenated sub-networks, so that they benefit from each other and avoid inharmonious parameter tuning due to their different convergence speeds. Our architecture achieves state-of-the-art accuracy of 50.5% on the ShapeNet database despite the data migration obstacle from synthetic to real images. This pipeline makes it possible to perform recognition on real images using only 3D models. Our code is available at https://github.com/wangyida/gm-cml.

14.
IEEE Trans Pattern Anal Mach Intell ; 40(10): 2513-2521, 2018 10.
Article in English | MEDLINE | ID: mdl-28976311

ABSTRACT

Collaborative representation methods, such as sparse subspace clustering (SSC) and sparse representation-based classification (SRC), have achieved great success in face clustering and classification by directly using the training images as dictionary bases. In this paper, we reveal that the superior performance of collaborative representation relies heavily on the sufficiently large class separability of controlled face datasets such as Extended Yale B. On uncontrolled or undersampled datasets, however, collaborative representation suffers from misleading coefficients of the incorrect classes. To address this limitation, inspired by the success of linear discriminant analysis (LDA), we develop a superposed linear representation classifier (SLRC) that casts the recognition problem as representing the test image in terms of a superposition of class centroids and shared intra-class differences. In spite of its simplicity and approximation, SLRC largely improves the generalization ability of collaborative representation and competes well with more sophisticated dictionary learning techniques in experiments on the AR and FRGC databases. Enforced with a sparsity constraint, SLRC achieves state-of-the-art performance on the FERET database using a single sample per person.
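The superposition idea can be sketched with plain least squares standing in for the regularized solver: express the test vector as class centroids plus shared variation atoms, then classify by the smallest class-wise residual (a simplified rendering of SLRC's decision rule):

```python
import numpy as np

def slrc_classify(y, C, V):
    # y: test vector; C: class centroids as columns; V: shared intra-class
    # variation atoms as columns. Solve y ~ C a + V b jointly, then score
    # each class i by the residual of its centroid term alone.
    A = np.hstack([C, V])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    k = C.shape[1]
    a, b = coef[:k], coef[k:]
    resid = [np.linalg.norm(y - C[:, i] * a[i] - V @ b) for i in range(k)]
    return int(np.argmin(resid))
```

When the test sample really is one centroid plus a combination of the shared variations, the correct class's residual vanishes and the classifier recovers it exactly.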

15.
IEEE Trans Pattern Anal Mach Intell ; 34(9): 1864-70, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22813959

ABSTRACT

Sparse Representation-Based Classification (SRC) is a face recognition breakthrough of recent years that successfully addresses the recognition problem when sufficient training images of each gallery subject are available. In this paper, we extend SRC to applications where there are very few, or even a single, training image per subject. Assuming that the intraclass variations of one subject can be approximated by a sparse linear combination of those of other subjects, the Extended Sparse Representation-Based Classifier (ESRC) applies an auxiliary intraclass variant dictionary to represent the possible variation between the training and testing images. The dictionary atoms typically represent intraclass sample differences computed from either the gallery faces themselves or generic faces outside the gallery. Experimental results on the AR and FERET databases show that ESRC has better generalization ability than SRC for undersampled face recognition under variable expressions, illuminations, disguises, and ages. The superior results of ESRC suggest that, if the dictionary is properly constructed, SRC algorithms can generalize well to the large-scale face recognition problem, even with a single training image per class.
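The auxiliary dictionary construction can be sketched directly: each atom is a generic face minus its subject's mean, so the atoms capture within-class variation (lighting, expression, disguise) that is assumed to transfer across identities. A minimal version:

```python
import numpy as np

def variant_dictionary(X, labels):
    # X: feature vectors as columns from generic subjects outside the
    # gallery; labels: subject id per column. Each class block is centered
    # on its own mean, leaving only intraclass variation as dictionary atoms.
    atoms = []
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        atoms.append(Xc - Xc.mean(axis=1, keepdims=True))
    return np.hstack(atoms)
```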


Subject(s)
Biometric Identification/methods, Databases, Factual, Face/anatomy & histology, Facial Expression, Image Processing, Computer-Assisted/methods, Algorithms, Female, Humans, Lighting, Male
16.
Science ; 321(5891): 912; author reply 912, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18703725

ABSTRACT

Jenkins and Burton (Brevia, 25 January 2008, p. 435) reported that image averaging increased the accuracy of automatic face recognition to 100% and thus could be applied to photo-identification documents. We argue that the feasibility of image averaging for identification documents is not fully supported by the presented evidence.
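The averaging technique under debate is simply a pixel-wise mean over aligned photos of the same person; the claim is that averaging cancels image-specific variation while preserving identity-specific structure. A one-line sketch (alignment is assumed to have been done beforehand):

```python
import numpy as np

def average_face(aligned_images):
    # Pixel-wise mean of aligned same-identity photos: image-specific
    # variation (lighting, expression) averages out, identity remains.
    return np.mean(np.stack(aligned_images), axis=0)
```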

17.
IEEE Trans Pattern Anal Mach Intell ; 30(8): 1503-4, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18566503

ABSTRACT

In [1], UDP is proposed to address the limitation of LPP for the clustering and classification tasks. In this communication, we show that the basic ideas of UDP and LPP are identical. In particular, UDP is just a simplified version of LPP on the assumption that the local density is uniform.
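The claimed relationship can be made concrete with the standard LPP objective. This is a hedged sketch using the common LPP formulation, not necessarily the notation of [1]:

```latex
% LPP: minimize local scatter under a density-weighted constraint,
% where W is the affinity matrix, D_{ii} = sum_j W_{ij}, and L = D - W:
\min_{w}\; w^{\top} X L X^{\top} w
\quad\text{s.t.}\quad w^{\top} X D X^{\top} w = 1 .
% Under a uniform local-density assumption, D \approx c\,I, the constraint
% reduces to a plain norm constraint on the projected data, and the
% resulting criterion coincides with UDP's up to scaling -- i.e., UDP as a
% simplified LPP.
```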


Subject(s)
Artificial Intelligence, Biometry/methods, Face/anatomy & histology, Hand/anatomy & histology, Image Interpretation, Computer-Assisted/methods, Pattern Recognition, Automated/methods, Algorithms, Computer Simulation, Discriminant Analysis, Humans, Models, Biological, Reproducibility of Results, Sensitivity and Specificity