Results 1 - 13 of 13
1.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15477-15493, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37531306

ABSTRACT

State-of-the-art (SOTA) Generative Models (GMs) can synthesize photo-realistic images that are hard for humans to distinguish from genuine photos. Identifying and understanding manipulated media are crucial to mitigating the social concerns about the potential misuse of GMs. We propose to reverse engineer GMs, inferring model hyperparameters from the images they generate. We define a novel problem, "model parsing", as estimating GM network architectures and training loss functions by examining their generated images, a task seemingly impossible for human beings. To tackle this problem, we propose a framework with two components: a Fingerprint Estimation Network (FEN), which estimates a GM fingerprint from a generated image by training with four constraints that encourage the fingerprint to have desired properties, and a Parsing Network (PN), which predicts network architectures and loss functions from the estimated fingerprints. To evaluate our approach, we collect a fake-image dataset with 100K images generated by 116 different GMs. Extensive experiments show encouraging results in parsing the hyperparameters of unseen models. Finally, our fingerprint estimation can be leveraged for deepfake detection and image attribution, as we show by reporting SOTA results on both the deepfake detection (Celeb-DF) and image attribution benchmarks.
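A rough sketch of this two-component design is given below, assuming small PyTorch stand-ins: the module shapes, the number of architecture hyperparameters, and the multi-label loss head are illustrative assumptions, not the paper's actual networks or its four fingerprint constraints.

import torch
import torch.nn as nn

class FingerprintEstimationNetwork(nn.Module):
    """Stand-in FEN: predicts an image-shaped 'fingerprint' residual."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, x):
        return self.net(x)

class ParsingNetwork(nn.Module):
    """Stand-in PN: maps a fingerprint to architecture hyperparameters
    (regression head) and loss-function types (multi-label head)."""
    def __init__(self, n_arch_params=15, n_loss_types=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.arch_head = nn.Linear(64, n_arch_params)
        self.loss_head = nn.Linear(64, n_loss_types)

    def forward(self, fingerprint):
        z = self.encoder(fingerprint)
        return self.arch_head(z), torch.sigmoid(self.loss_head(z))

fen, pn = FingerprintEstimationNetwork(), ParsingNetwork()
arch, losses = pn(fen(torch.randn(1, 3, 128, 128)))  # one generated image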

2.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 9122-9134, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37022222

ABSTRACT

We present a novel approach for disentangling the content of a text image from all aspects of its appearance. The appearance representation we derive can then be applied to new content, for one-shot transfer of the source style to that content. We learn this disentanglement in a self-supervised manner. Our method processes entire word boxes without requiring segmentation of text from background, per-character processing, or assumptions about string lengths. We show results in different text domains that were previously handled by specialized methods, e.g., scene text and handwritten text. To these ends, we make a number of technical contributions: (1) We disentangle the style and content of a textual image into a non-parametric, fixed-dimensional vector. (2) We propose a novel approach inspired by StyleGAN, but conditioned on the example style at different resolutions and on the content. (3) We present novel self-supervised training criteria that preserve both source style and target content, using a pre-trained font classifier and a text recognizer. Finally, (4) we introduce Imgur5K, a new challenging dataset of handwritten word images. We offer numerous photo-realistic qualitative results of our method. We further show that our method surpasses previous work in quantitative tests on scene text and handwriting datasets, as well as in a user study.
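The training criteria in contribution (3) can be sketched as two frozen-network losses: one ties the generated image's predicted font to that of the style example, the other forces a recognizer to read the target string. This is a hedged approximation: font_classifier is assumed to return font logits and recognizer per-timestep character logits of shape (T, B, classes), and the paper's actual criteria differ in detail.

import torch
import torch.nn.functional as F

def transfer_losses(generated, style_ref, target_ids, target_lens,
                    font_classifier, recognizer):
    # style term: the frozen font classifier should assign the generated
    # word image the same font as the style reference
    with torch.no_grad():
        style_labels = font_classifier(style_ref).argmax(dim=1)
    style_loss = F.cross_entropy(font_classifier(generated), style_labels)

    # content term: the frozen recognizer should read the target string
    log_probs = recognizer(generated).log_softmax(dim=2)   # (T, B, classes)
    input_lens = torch.full((generated.size(0),), log_probs.size(0),
                            dtype=torch.long)
    content_loss = F.ctc_loss(log_probs, target_ids, input_lens, target_lens)
    return style_loss, content_loss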

3.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 560-575, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35471874

ABSTRACT

We present Face Swapping GAN (FSGAN) for face swapping and reenactment. Unlike previous work, we offer a subject-agnostic swapping scheme that can be applied to pairs of faces without requiring training on those faces. We derive a novel iterative deep learning-based approach for face reenactment which adjusts for significant pose and expression variations and can be applied to a single image or a video sequence. For video sequences, we introduce a continuous interpolation of the face views based on reenactment, Delaunay triangulation, and barycentric coordinates. Occluded face regions are handled by a face completion network. Finally, we use a face blending network for seamless blending of the two faces while preserving the target skin color and lighting conditions. This network uses a novel Poisson blending loss, combining Poisson optimization with a perceptual loss. We compare our approach to existing state-of-the-art systems and show our results to be both qualitatively and quantitatively superior. This work describes extensions of the FSGAN method proposed in an earlier conference version (Nirkin et al. 2019), as well as additional experiments and results.
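The view-interpolation step lends itself to a short sketch: triangulate the poses of the available reenacted views, locate the triangle containing the target pose, and blend the three corner views with its barycentric coordinates. Parameterizing pose as (yaw, pitch) and blending whole frames are simplifying assumptions made here for illustration; FSGAN interpolates reenactment results rather than raw frames.

import numpy as np
from scipy.spatial import Delaunay

def interpolate_view(poses, frames, query_pose):
    """poses: (N, 2) yaw/pitch of available views; frames: (N, H, W, 3);
    query_pose: (2,) target pose."""
    tri = Delaunay(poses)
    s = int(tri.find_simplex(query_pose[None])[0])
    if s == -1:
        raise ValueError("query pose lies outside the triangulated range")
    T = tri.transform[s]                      # affine map to barycentric coords
    b = T[:2].dot(query_pose - T[2])
    weights = np.append(b, 1.0 - b.sum())     # weights of the three vertices
    return np.tensordot(weights, frames[tri.simplices[s]], axes=1)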

4.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6111-6121, 2022 10.
Article in English | MEDLINE | ID: mdl-34185639

ABSTRACT

We propose a method for detecting face swapping and other identity manipulations in single images. Face swapping methods, such as DeepFake, manipulate the face region, aiming to adjust the face to the appearance of its context while leaving the context unchanged. We show that this modus operandi produces discrepancies between the two regions (e.g., Fig. 1), and that these discrepancies offer exploitable telltale signs of manipulation. Our approach involves two networks: (i) a face identification network that considers the face region bounded by a tight semantic segmentation, and (ii) a context recognition network that considers the face context (e.g., hair, ears, neck). We describe a method that uses the recognition signals from these two networks to detect such discrepancies, providing a complementary detection signal that improves the conventional real-versus-fake classifiers commonly used for detecting fake images. Our method achieves state-of-the-art results on the FaceForensics++ and Celeb-DF-v2 face manipulation detection benchmarks, and even generalizes to detect fakes produced by unseen methods.
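A toy version of this cue is sketched below: embed the tightly segmented face and its context with two recognition networks and flag disagreement. Both networks (face_net, context_net) are placeholder callables assumed to produce embeddings in a shared space; in the paper that comparability is learned, and the score supplements a conventional real/fake classifier.

import torch.nn.functional as F

def discrepancy_score(face_crop, context_crop, face_net, context_net):
    """Returns a value in [0, 2]; higher means face and context disagree
    more, which is the telltale sign of a swap."""
    f = F.normalize(face_net(face_crop), dim=-1)
    c = F.normalize(context_net(context_crop), dim=-1)
    return 1.0 - F.cosine_similarity(f, c, dim=-1)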


Subject(s)
Algorithms, Face, Face/diagnostic imaging
5.
IEEE Trans Pattern Anal Mach Intell ; 41(2): 379-393, 2019 02.
Article in English | MEDLINE | ID: mdl-29994497

ABSTRACT

We propose a method designed to push the frontiers of unconstrained face recognition in the wild, with an emphasis on extreme out-of-plane pose variations. Existing methods either expect a single model to learn pose invariance by training on massive amounts of data, or else normalize images by aligning faces to a single frontal pose. In contrast, our method is designed to explicitly tackle pose variation. Our proposed Pose-Aware Models (PAM) process a face image using several pose-specific, deep convolutional neural networks (CNN). 3D rendering is used to synthesize multiple face poses from input images, both to train these models and to provide additional robustness to pose variations at test time. Our paper presents an extensive analysis of the IARPA Janus Benchmark A (IJB-A), evaluating the effects that landmark detection accuracy, CNN layer selection, and pose model selection all have on the performance of the recognition pipeline. It further provides comparative evaluations on IJB-A and the PIPA dataset. These tests show that our approach outperforms existing methods, surprisingly even matching the accuracy of methods that were specifically fine-tuned to the target dataset. Parts of this work previously appeared in [1] and [2].
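The pose-aware routing can be sketched as follows; the yaw buckets, canonical rendering targets, and the render_to_yaw helper are all hypothetical stand-ins for the paper's pose-specific CNNs and 3D rendering step.

import torch

def pam_embed(image, yaw_deg, models, render_to_yaw):
    """models: dict of pose-bucket name -> CNN; render_to_yaw: callable
    (hypothetical) that synthesizes the face at a given yaw via 3D rendering."""
    bucket = ("frontal" if abs(yaw_deg) < 30 else
              "half_profile" if abs(yaw_deg) < 60 else "profile")
    canonical = {"frontal": 0.0, "half_profile": 40.0, "profile": 75.0}[bucket]
    rendered = render_to_yaw(image, canonical)   # extra synthesized view
    with torch.no_grad():
        # pool the real and rendered views through the pose-matched CNN
        return (models[bucket](image) + models[bucket](rendered)) / 2.0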

6.
IEEE Trans Pattern Anal Mach Intell ; 40(12): 3067-3074, 2018 12.
Article in English | MEDLINE | ID: mdl-29990138

ABSTRACT

This paper concerns the problem of facial landmark detection. We provide a unique new analysis of the features produced at intermediate layers of a convolutional neural network (CNN) trained to regress facial landmark coordinates. This analysis shows that, while being processed by the CNN, face images can be partitioned in an unsupervised manner into subsets containing faces in similar poses (i.e., 3D views) and with similar facial properties (e.g., presence or absence of eye-wear). Based on this finding, we describe a novel CNN architecture specialized to regress the facial landmark coordinates of faces in specific poses and appearances. To address the shortage of training data, particularly in extreme profile poses, we additionally present data augmentation techniques designed to provide sufficient training examples for each of these specialized sub-networks. The proposed Tweaked CNN (TCNN) architecture is shown to outperform existing landmark detection methods in an extensive battery of tests on the AFW, AFLW, and 300W benchmarks. Finally, to promote reproducibility of our results, we make our code and trained models publicly available through our project webpage.
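The routing idea admits a compact sketch: cluster an intermediate feature in an unsupervised way (K-means here), then dispatch each face to a landmark-regression head specialized to its cluster. The backbone/head callables, the layer choice, and K are assumptions, not the paper's configuration.

import numpy as np
from sklearn.cluster import KMeans

class TweakedRegressor:
    def __init__(self, backbone, heads, k=4):
        self.backbone = backbone        # image -> intermediate feature vector
        self.heads = heads              # k specialized regression heads
        self.kmeans = KMeans(n_clusters=k, n_init=10)

    def fit_router(self, images):
        # unsupervised partition of faces into pose/appearance-like subsets
        self.kmeans.fit(np.stack([self.backbone(im) for im in images]))

    def predict(self, image):
        f = self.backbone(image)
        head = self.heads[int(self.kmeans.predict(f[None])[0])]
        return head(f)                  # (num_landmarks, 2) coordinates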

7.
IEEE Trans Pattern Anal Mach Intell ; 39(7): 1431-1443, 2017 07.
Article in English | MEDLINE | ID: mdl-27448341

ABSTRACT

Scale-invariant feature detectors often find stable scales in only a few image pixels. Consequently, methods for feature matching typically choose one of two extreme options: matching a sparse set of scale-invariant features, or dense matching using arbitrary scales. In this paper, we turn our attention to the overwhelming majority of pixels, those where stable scales are not found by standard techniques. We ask: is scale selection necessary for these pixels when dense, scale-invariant matching is required, and if so, how can it be achieved? We make the following contributions: (i) We show that features computed over different scales, even in low-contrast areas, can be different, and that selecting a single scale, arbitrarily or otherwise, may lead to poor matches when the images have different scales. (ii) We show that representing each pixel as a set of SIFTs, extracted at multiple scales, allows for far better matches than single-scale descriptors, but at a computational price. Finally, (iii) we demonstrate that each such set may be accurately represented by a low-dimensional, linear subspace. A subspace-to-point mapping may further be used to produce a novel descriptor representation, the Scale-Less SIFT (SLS), as an alternative to single-scale descriptors. These claims are verified by quantitative and qualitative tests, demonstrating significant improvements over existing methods. A preliminary version of this work appeared in [1].
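Contributions (ii)-(iii) can be approximated in a few lines: extract SIFT at several scales for one pixel, fit a low-rank subspace to the set, and compare subspaces through their principal angles. The scale list, subspace rank, and use of OpenCV's SIFT are assumptions; the actual SLS descriptor additionally applies a subspace-to-point mapping.

import numpy as np
import cv2

def sls_basis(gray, x, y, scales=(1.6, 3.2, 6.4, 12.8, 25.6), rank=3):
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), s) for s in scales]
    _, descs = sift.compute(gray, kps)          # one SIFT per scale
    descs = descs / np.linalg.norm(descs, axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(descs, full_matrices=False)
    return vt[:rank]                            # orthonormal basis, (rank, 128)

def subspace_distance(B1, B2):
    # chordal distance from the principal angles between the two subspaces
    s = np.linalg.svd(B1 @ B2.T, compute_uv=False)
    return np.sqrt(max(B1.shape[0] - np.sum(s ** 2), 0.0))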

8.
J Exp Biol ; 219(Pt 11): 1608-17, 2016 06 01.
Article in English | MEDLINE | ID: mdl-26994179

ABSTRACT

Using videography to extract quantitative data on animal movement and kinematics constitutes a major tool in biomechanics and behavioral ecology. Advanced recording technologies now enable acquisition of long video sequences encompassing sparse and unpredictable events. Although such events may be ecologically important, analysis of sparse data can be extremely time-consuming and potentially biased; data quality is often strongly dependent on the training level of the observer and subject to contamination by observer-dependent biases. These constraints often limit our ability to study animal performance and fitness. Using long videos of foraging fish larvae, we provide a framework for the automated detection of prey acquisition strikes, a behavior that is infrequent yet critical for larval survival. We compared the performance of four video descriptors and their combinations against manually identified feeding events. For our data, the best single descriptor provided a classification accuracy of 77-95% and detection accuracy of 88-98%, depending on fish species and size. Using a combination of descriptors improved the accuracy of classification by ∼2%, but did not improve detection accuracy. Our results indicate that the effort required by an expert to manually label videos can be greatly reduced to examining only the potential feeding detections in order to filter false detections. Thus, using automated descriptors reduces the amount of manual work needed to identify events of interest from weeks to hours, enabling the assembly of an unbiased large dataset of ecologically relevant behaviors.
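A stripped-down version of such a pipeline is sketched below: score overlapping frame windows with simple motion statistics, train a classifier on manually labeled windows, and surface only high-scoring windows for expert review. The descriptor, window length, and classifier are illustrative assumptions, not the four descriptors evaluated in the paper.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_descriptor(frames):
    """frames: (T, H, W) grayscale window -> small motion/appearance vector."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return np.array([diffs.mean(), diffs.max(), frames.std(), frames.mean()])

def train_detector(video, window_labels, win=16):
    # window_labels: one manual strike/no-strike label per extracted window
    X = [window_descriptor(video[i:i + win])
         for i in range(0, len(video) - win, win // 2)]
    return RandomForestClassifier(n_estimators=200).fit(X, window_labels)

def candidate_strikes(video, clf, win=16, thresh=0.5):
    starts = range(0, len(video) - win, win // 2)
    X = [window_descriptor(video[i:i + win]) for i in starts]
    scores = clf.predict_proba(X)[:, 1]     # P(window contains a strike)
    return [s for s, p in zip(starts, scores) if p > thresh]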


Subject(s)
Feeding Behavior/physiology, Fishes/physiology, Statistics as Topic/methods, Video Recording, Animals, Automation, Biomechanical Phenomena, Fishes/growth & development, Larva/physiology, Life Cycle Stages, Mouth/physiology, Spatio-Temporal Analysis, Time Factors
9.
IEEE Trans Pattern Anal Mach Intell ; 38(5): 875-88, 2016 May.
Article in English | MEDLINE | ID: mdl-26336115

ABSTRACT

We seek a practical method for establishing dense correspondences between two images with similar content, but possibly different 3D scenes. One of the challenges in designing such a system is the local scale differences of objects appearing in the two images. Previous methods often considered only a few image pixels, matching only those for which stable scales could be reliably estimated. Recently, others have considered dense correspondences, but at substantial cost for generating, storing, and matching scale-invariant descriptors. Our work is motivated by the observation that image pixels have contexts (the pixels around them) which may be exploited in order to reliably estimate local scales. We make the following contributions. (i) We show that scales estimated at sparse interest points may be propagated to neighboring pixels where this information cannot be reliably determined. Doing so allows scale-invariant descriptors to be extracted anywhere in the image. (ii) We explore three means for propagating this information: using the scales at detected interest points, using the underlying image information to guide scale propagation in each image separately, and using both images together. Finally, (iii) we provide extensive qualitative and quantitative results demonstrating that scale propagation allows accurate dense correspondences to be obtained even between very different images, with little computational cost beyond that required by existing methods.
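Contribution (i) can be sketched with off-the-shelf interpolation: take the scales found at sparse interest points and spread them to every pixel, then extract descriptors at the propagated scales. Plain linear/nearest interpolation stands in here for the image-guided propagation schemes explored in the paper.

import numpy as np
from scipy.interpolate import griddata

def propagate_scales(keypoints_xy, keypoint_scales, shape):
    """keypoints_xy: (N, 2) pixel coords; keypoint_scales: (N,) detected
    scales; shape: (H, W). Returns a dense per-pixel scale map."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    dense = griddata(keypoints_xy, keypoint_scales, (xs, ys), method="linear")
    holes = np.isnan(dense)                # pixels outside the convex hull
    dense[holes] = griddata(keypoints_xy, keypoint_scales,
                            (xs[holes], ys[holes]), method="nearest")
    return dense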

10.
J Atten Disord ; 18(7): 585-93, 2014 Oct.
Article in English | MEDLINE | ID: mdl-22628144

ABSTRACT

OBJECTIVE: Knowing how adults with ADHD interact with prerecorded video lessons at home may provide a novel means of early screening and long-term monitoring for ADHD. METHOD: Viewing patterns of 484 students with known ADHD were compared with those of 484 age-, gender-, and academically matched controls chosen from 8,699 non-ADHD students. Transcripts generated by their video playback software were analyzed using t tests and regression analysis. RESULTS: ADHD students displayed significant tendencies (p ≤ .05) to watch videos with more pauses and more reviews of previously watched parts. Other parameters showed similar tendencies. Regression analysis indicated that attentional deficits remained constant across age and gender but varied with learning experience. CONCLUSION: There were measurable and significant differences between the video-viewing habits of the ADHD and non-ADHD students. This provides a new perspective on how adults cope with attention deficits and suggests a novel means of early screening for ADHD.
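The core comparison reduces to paired tests on per-student viewing statistics; a minimal sketch with placeholder data (real playback transcripts would supply the pause counts) follows.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)                 # placeholder data only
pauses_adhd = rng.poisson(6.0, size=484)       # pauses per ADHD student
pauses_ctrl = rng.poisson(4.5, size=484)       # their matched controls

# matched pairs -> paired t-test
t, p = stats.ttest_rel(pauses_adhd, pauses_ctrl)
print(f"t = {t:.2f}, p = {p:.4g}")             # tendency significant if p <= .05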


Subject(s)
Attention Deficit Disorder with Hyperactivity/psychology, Attention, Computer-Assisted Instruction/methods, Videotape Recording, Adolescent, Adult, Case-Control Studies, Female, Humans, Learning, Male, Mass Screening/methods, Middle Aged, Regression Analysis, Young Adult
11.
IEEE Trans Pattern Anal Mach Intell ; 34(3): 615-21, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22262724

ABSTRACT

Recognizing actions in videos is rapidly becoming a topic of much research. To facilitate the development of methods for action recognition, several video collections, along with benchmark protocols, have previously been proposed. In this paper, we present a novel video database, the "Action Similarity LAbeliNg" (ASLAN) database, along with benchmark protocols. The ASLAN set includes thousands of videos collected from the web, in over 400 complex action classes. Our benchmark protocols focus on action similarity (same/not-same), rather than action classification, and testing is performed on never-before-seen actions. We propose this data set and benchmark as a means for gaining a more principled understanding of what makes actions different or similar, rather than learning the properties of particular action classes. We present baseline results on our benchmark, and compare them to human performance. To promote further study of action similarity techniques, we make the ASLAN database, benchmarks, and descriptor encodings publicly available to the research community.
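A same/not-same protocol of this kind can be evaluated in a few lines given any per-video descriptor: score each pair by similarity and report AUC and accuracy at a threshold. The cosine similarity and median threshold below are placeholder choices, not the benchmark's prescribed descriptors or splits.

import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_pairs(desc_a, desc_b, same):
    """desc_a, desc_b: (N, D) L2-normalized descriptors of paired videos;
    same: (N,) boolean same/not-same labels."""
    scores = np.sum(desc_a * desc_b, axis=1)    # cosine similarity
    auc = roc_auc_score(same, scores)
    thr = np.median(scores)                     # toy threshold selection
    acc = np.mean((scores > thr) == same)
    return auc, acc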


Subject(s)
Image Enhancement/methods, Image Interpretation, Computer-Assisted/methods, Algorithms, Benchmarking/methods, Databases, Factual, Humans, Video Recording
12.
IEEE Trans Pattern Anal Mach Intell ; 33(2): 266-78, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20513927

ABSTRACT

Subspaces offer convenient means of representing information in many pattern recognition, machine vision, and statistical learning applications. Despite the growing popularity of subspace representations, the problem of efficiently searching through large subspace databases has received little attention in the past. In this paper, we present a general solution to the problem of Approximate Nearest Subspace search. Our solution uniformly handles cases where the queries are points or subspaces, where query and database elements differ in dimensionality, and where the database contains subspaces of different dimensions. To this end, we present a simple mapping from subspaces to points, thus reducing the problem to the well-studied Approximate Nearest Neighbor problem on points. We provide theoretical proofs of correctness and error bounds for our construction and demonstrate its capabilities on synthetic and real data. Our experiments indicate that an approximate nearest subspace can be located significantly faster than the exact nearest subspace, with little loss of accuracy.
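The reduction admits a compact sketch: represent each subspace by its orthogonal projection matrix and flatten it to a point, after which Euclidean distances between points equal chordal distances between subspaces and any point-based (approximate) nearest-neighbor index applies. This is a simplified stand-in for the paper's mapping, which also handles point queries and mixed dimensions; the exact index here would be swapped for an ANN structure in practice.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def subspace_to_point(basis):
    """basis: (d, k) with orthonormal columns -> point in R^(d*d)."""
    return (basis @ basis.T).ravel()           # flattened projection matrix

d, k, n = 10, 3, 1000
rng = np.random.default_rng(1)
db = [np.linalg.qr(rng.standard_normal((d, k)))[0] for _ in range(n)]
points = np.stack([subspace_to_point(b) for b in db])

index = NearestNeighbors(n_neighbors=1).fit(points)  # swap in an ANN index
query = np.linalg.qr(rng.standard_normal((d, k)))[0]
dist, idx = index.kneighbors(subspace_to_point(query)[None])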

13.
IEEE Trans Pattern Anal Mach Intell ; 33(10): 1978-90, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21173442

ABSTRACT

Computer vision systems have demonstrated considerable improvement in recognizing and verifying faces in digital images. Still, recognizing faces appearing in unconstrained, natural conditions remains a challenging task. In this paper, we present a face-image pair-matching approach primarily developed and tested on the "Labeled Faces in the Wild" (LFW) benchmark, which reflects the challenges of face recognition from unconstrained images. The approach we propose makes the following contributions. 1) We present a family of novel face-image descriptors designed to capture statistics of local patch similarities. 2) We demonstrate how unlabeled background samples may be used to better evaluate image similarities. To this end, we describe a number of novel, effective similarity measures. 3) We show how labeled background samples, when available, may further improve classification performance, by employing a unique pair-matching pipeline. We present state-of-the-art results on the LFW pair-matching benchmarks. In addition, we show our system to be well suited for the multilabel face classification (recognition) problem, on both the LFW images and images from the laboratory-controlled Multi-PIE database.
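The background-sample idea in contribution 2 is in the spirit of these authors' One-Shot Similarity measure; the hedged sketch below uses LDA as a stand-in for the exact discriminative model: train "this descriptor vs. the background set", score the other descriptor with it, and symmetrize.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def one_shot_similarity(x1, x2, background):
    """x1, x2: (D,) face descriptors; background: (N, D) unlabeled samples."""
    def side(a, b):
        X = np.vstack([a[None], background])
        y = np.r_[1, np.zeros(len(background))]   # one positive vs. background
        clf = LinearDiscriminantAnalysis().fit(X, y)
        return clf.decision_function(b[None])[0]
    return 0.5 * (side(x1, x2) + side(x2, x1))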


Subject(s)
Biometric Identification/methods, Face/anatomy & histology, Image Processing, Computer-Assisted/methods, Gestures, Humans, ROC Curve