Results 1 - 18 of 18
1.
IEEE Trans Image Process ; 32: 6102-6114, 2023.
Article in English | MEDLINE | ID: mdl-37883291

ABSTRACT

While adversarial training and its variants have been shown to be the most effective defenses against adversarial attacks, their extremely slow training process makes them hard to scale to large datasets like ImageNet. The key idea of recent works on accelerating adversarial training is to replace multi-step attacks (e.g., PGD) with single-step attacks (e.g., FGSM). However, these single-step methods suffer from catastrophic overfitting, where the accuracy against PGD attacks suddenly drops to nearly 0% during training and the network entirely loses its robustness. In this work, we study the phenomenon from the perspective of training instances. We show that catastrophic overfitting is instance-dependent, and that fitting instances with a larger input gradient norm is more likely to cause it. Based on our findings, we propose a simple but effective method, Adversarial Training with Adaptive Step size (ATAS). ATAS learns an instance-wise adaptive step size that is inversely proportional to the instance's gradient norm. Our theoretical analysis shows that ATAS converges faster than the commonly adopted non-adaptive counterparts. Empirically, ATAS consistently mitigates catastrophic overfitting and achieves higher robust accuracy on CIFAR10, CIFAR100, and ImageNet when evaluated under various adversarial budgets. Our code is released at https://github.com/HuangZhiChao95/ATAS.
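The step-size rule summarized in this abstract — an instance-wise step that shrinks as the input gradient norm grows — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name and the constants are assumptions.

```python
import numpy as np

def adaptive_step_sizes(input_grad_norms, base_step=2/255, c=0.01):
    # Each instance i gets step_i = base_step * c / (c + ||g_i||),
    # so instances with large input gradient norms (the ones the
    # abstract links to catastrophic overfitting) take smaller
    # single-step perturbations. base_step and c are illustrative.
    return base_step * c / (c + np.asarray(input_grad_norms, dtype=float))
```

Such per-instance steps would then replace the fixed step size of a single-step (FGSM-style) attack during training.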

2.
IEEE Trans Neural Netw Learn Syst ; 34(6): 3146-3160, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34699369

ABSTRACT

Training certifiable neural networks enables us to obtain models with robustness guarantees against adversarial attacks. In this work, we introduce a framework that obtains a provable adversarial-free region in the neighborhood of the input data via a polyhedral envelope, which yields more fine-grained certified robustness than existing methods. We further introduce polyhedral envelope regularization (PER) to encourage larger adversarial-free regions and thus improve the provable robustness of the models. We demonstrate the flexibility and effectiveness of our framework on standard benchmarks; it applies to networks of different architectures and with general activation functions. Compared with the state of the art, PER has negligible computational overhead; it achieves better robustness guarantees and better accuracy on clean data in various settings.

3.
IEEE Trans Image Process ; 31: 1628-1640, 2022.
Article in English | MEDLINE | ID: mdl-35081026

ABSTRACT

Classic image-restoration algorithms use a variety of priors, either implicitly or explicitly. Their priors are hand-designed and their corresponding weights are heuristically assigned. Hence, deep learning methods often produce superior image restoration quality. Deep networks are, however, capable of inducing strong and hardly predictable hallucinations. Networks implicitly learn to be jointly faithful to the observed data while learning an image prior; the separation of original data from hallucinated data downstream is then not possible. This limits their widespread adoption in image restoration. Furthermore, it is often the hallucinated part that falls victim to degradation-model overfitting. We present an approach with decoupled network-prior-based hallucination and data fidelity terms. We refer to our framework as the Bayesian Integration of a Generative Prior (BIGPrior). Our method is rooted in a Bayesian framework and tightly connected to classic restoration methods. In fact, it can be viewed as a generalization of a large family of classic restoration algorithms. We use network inversion to extract image prior information from a generative network. We show that, on image colorization, inpainting, and denoising, our framework consistently improves the inversion results. Our method, though partly reliant on the quality of the generative network inversion, is competitive with state-of-the-art supervised and task-specific restoration methods. It also provides an additional metric that sets forth the degree of prior reliance per pixel relative to data fidelity.


Subjects
Algorithms , Image Processing, Computer-Assisted , Bayes Theorem , Hallucinations/diagnostic imaging , Humans
4.
Article in English | MEDLINE | ID: mdl-32149690

ABSTRACT

Blind and universal image denoising consists of using a single model that denoises images with any level of noise. This is especially practical, as noise levels do not need to be known when the model is developed or at test time. We propose a theoretically grounded blind and universal deep learning image denoiser for additive Gaussian noise removal. Our network is based on an optimal denoising solution, which we call fusion denoising. It is derived theoretically under a Gaussian image prior assumption. Synthetic experiments show our network's generalization strength to unseen additive noise levels. We also adapt the fusion denoising network architecture for image denoising on real images. Our approach improves real-world grayscale additive image denoising PSNR results both on the noise levels used for training and on noise levels not seen during training. It also improves state-of-the-art color image denoising performance on every single noise level, by an average of 0.1 dB, whether trained on or not.

5.
IEEE Trans Image Process ; 28(12): 6185-6197, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31265393

ABSTRACT

Removing reflection artefacts from a single image is a problem of both theoretical and practical interest, which still presents challenges because of the massively ill-posed nature of the problem. In this paper, we propose a technique based on a novel optimization problem. First, we introduce a simple user interaction scheme, which helps minimize information loss in the reflection-free regions. Second, we introduce an H2 fidelity term, which preserves fine detail while enforcing global color similarity. We show that this combination allows us to mitigate the shortcomings in structure and color preservation, which present some of the most prominent drawbacks in the existing methods for reflection removal. We demonstrate, through numerical and visual experiments, that our method is able to outperform the state-of-the-art model-based methods and compete with recent deep-learning approaches.

6.
J Opt Soc Am A Opt Image Sci Vis ; 34(5): 743-751, 2017 May 01.
Article in English | MEDLINE | ID: mdl-28463318

ABSTRACT

We present a method for hiding images in synthetic videos and revealing them by temporal averaging. The main challenge is to develop a visual masking method that hides the input image both spatially and temporally. Our masking approach consists of temporal and spatial pixel-by-pixel variations of the frequency band coefficients representing the image to be hidden. These variations ensure that the target image remains invisible in both the spatial and temporal domains. In addition, by applying a temporal masking function derived from a dither matrix, we allow the video to carry a visible message that is different from the hidden image. The image hidden in the video can be revealed by software averaging or, with a camera, by long-exposure photography. The presented work may find applications in the secure transmission of digital information.

7.
IEEE Trans Image Process ; 25(4): 1660-73, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26886992

ABSTRACT

Real camera systems have a limited depth of field (DOF), which may cause an image to be degraded by visible misfocus or a too-shallow DOF. In this paper, we present a blind deblurring pipeline able to restore such images by slightly extending their DOF and recovering sharpness in regions slightly out of focus. To address this severely ill-posed problem, our algorithm relies first on the estimation of the spatially varying defocus blur. Drawing on local frequency image features, a machine learning approach based on the recently introduced regression tree fields is used to train a model able to regress a coherent defocus blur map of the image, labeling each pixel with the scale of a defocus point spread function. A non-blind spatially varying deblurring algorithm is then used to properly extend the DOF of the image. The good performance of our algorithm is assessed both quantitatively, using realistic ground-truth data obtained with a novel approach based on a plenoptic camera, and qualitatively with real images.

8.
IEEE Trans Pattern Anal Mach Intell ; 36(8): 1672-8, 2014 Aug.
Article in English | MEDLINE | ID: mdl-26353346

ABSTRACT

We present a method to automatically detect shadows in a fast and accurate manner by taking advantage of the inherent sensitivity of digital camera sensors to the near-infrared (NIR) part of the spectrum. Dark objects, which confound many shadow detection algorithms, often have much higher reflectance in the NIR. We can thus build an accurate shadow candidate map based on image pixels that are dark both in the visible and NIR representations. We further refine the shadow map by incorporating ratios of the visible to the NIR image, based on the observation that commonly encountered light sources have very distinct spectra in the NIR band. The results are validated on a new database, which contains visible/NIR images for a large variety of real-world shadow creating illuminant conditions, as well as manually labeled shadow ground truth. Both quantitative and qualitative evaluations show that our method outperforms current state-of-the-art shadow detection algorithms in terms of accuracy and computational efficiency.
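The two-stage rule in this abstract — a candidate map from pixels that are dark in both the visible and NIR, refined with visible/NIR ratios — can be sketched as follows. The thresholds and function names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def shadow_candidates(vis_lum, nir, t_vis=0.3, t_nir=0.3):
    # Candidate shadows: dark in the visible AND dark in the NIR.
    # Dark objects that reflect strongly in the NIR are excluded,
    # which is the key observation in the abstract.
    return (vis_lum < t_vis) & (nir < t_nir)

def refine_with_ratio(candidates, vis_lum, nir, r_max=1.2):
    # Keep candidates whose visible-to-NIR ratio stays below a bound,
    # exploiting the distinct NIR spectra of common light sources.
    ratio = vis_lum / (nir + 1e-8)
    return candidates & (ratio < r_max)
```

A dark object with high NIR reflectance fails the first test, while a true shadow (dark in both bands, plausible ratio) survives both stages.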

9.
J Opt Soc Am A Opt Image Sci Vis ; 29(7): 1199-210, 2012 Jul 01.
Article in English | MEDLINE | ID: mdl-22751384

ABSTRACT

Many works in color assume that illumination change can be modeled by multiplying sensor responses by individual scaling factors. The early research in this area is sometimes grouped under the heading "von Kries adaptation": the scaling factors are applied to the cone responses. In more recent studies, both in psychophysics and in computational analysis, it has been proposed that scaling factors should be applied to linear combinations of the cones that have narrower support: they should be applied to the so-called "sharp sensors." In this paper, we generalize the computational approach to spectral sharpening in three important ways. First, we introduce spherical sampling as a tool that allows us to enumerate, in a principled way, all linear combinations of the cones. This allows us, second, to find the optimal sharp sensors that minimize a variety of error measures, including CIE Delta E (previous work on spectral sharpening minimized RMS) and color ratio stability. Lastly, we extend the spherical sampling paradigm to the multispectral case, where the objective is to model the interaction of light and surface in terms of color signal spectra. Spherical sampling is shown to improve on the state of the art.

10.
IEEE Trans Pattern Anal Mach Intell ; 34(11): 2274-82, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22641706

ABSTRACT

Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms on their ability to adhere to image boundaries, their speed, their memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory-efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.


Subjects
Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Reproducibility of Results , Sensitivity and Specificity
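The SLIC idea described in this abstract — grid-seeded k-means with a combined color + spatial distance, searched only within a local window around each center — can be sketched for a grayscale image as follows. This is an illustrative simplification: the published algorithm works in CIELAB on color images, perturbs seeds away from image gradients, and enforces connectivity as a post-process.

```python
import numpy as np

def slic_superpixels(img, k=4, m=10.0, iters=5):
    # Minimal grayscale SLIC sketch. img: 2-D float array in [0, 1].
    h, w = img.shape
    S = int(np.sqrt(h * w / k))                       # grid interval
    ys = np.arange(S // 2, h, S)
    xs = np.arange(S // 2, w, S)
    centers = np.array([[y, x, img[y, x]] for y in ys for x in xs], float)
    yy, xx = np.mgrid[0:h, 0:w]
    for _ in range(iters):
        dist = np.full((h, w), np.inf)
        labels = np.zeros((h, w), int)
        for i, (cy, cx, cv) in enumerate(centers):
            # Search only a (2S+1)^2 window around the center,
            # which is what makes the method linear in pixel count.
            y0, y1 = max(0, int(cy) - S), min(h, int(cy) + S + 1)
            x0, x1 = max(0, int(cx) - S), min(w, int(cx) + S + 1)
            dc = (img[y0:y1, x0:x1] - cv) ** 2        # intensity term
            ds = (yy[y0:y1, x0:x1] - cy) ** 2 + (xx[y0:y1, x0:x1] - cx) ** 2
            d = dc + (m / S) ** 2 * ds                # combined distance
            win = dist[y0:y1, x0:x1]
            better = d < win
            win[better] = d[better]
            labels[y0:y1, x0:x1][better] = i
        for i in range(len(centers)):                 # k-means update
            mask = labels == i
            if mask.any():
                centers[i] = [yy[mask].mean(), xx[mask].mean(), img[mask].mean()]
    return labels
```

The compactness parameter `m` trades boundary adherence against regular superpixel shape, mirroring the role it plays in the published algorithm.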
11.
IEEE Trans Pattern Anal Mach Intell ; 34(12): 2289-302, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22371428

ABSTRACT

We present a study of in-camera image processing through an extensive analysis of more than 10,000 images from over 30 cameras. The goal of this work is to investigate whether image values can be transformed to physically meaningful values and, if so, when and how this can be done. From our analysis, we found a major limitation of the imaging model employed in conventional radiometric calibration methods and propose a new in-camera imaging model that fits well with today's cameras. With the new model, we present associated calibration procedures that allow us to convert sRGB images back to their original CCD RAW responses in a manner that is significantly more accurate than existing methods. Additionally, we show how this new imaging model can be used to build an image correction application that converts an sRGB input image captured with the wrong camera settings into an sRGB output image that would have been recorded under the correct settings of a specific camera.

12.
IEEE Trans Image Process ; 17(1): 42-52, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18229803

ABSTRACT

Videos representing flames, water, smoke, etc., are often defined as dynamic textures: "textures" because they are characterized by the redundant repetition of a pattern, and "dynamic" because this repetition is also in time and not only in space. Dynamic textures have been modeled as linear dynamic systems by unfolding the video frames into column vectors and describing their trajectory as time evolves. After the projection of the vectors onto a lower-dimensional space by a singular value decomposition (SVD), the trajectory is modeled using system identification techniques. Synthesis is obtained by driving the system with random noise. In this paper, we show that the standard SVD can be replaced by a higher-order SVD (HOSVD), originally known as the Tucker decomposition. HOSVD decomposes the dynamic texture as a multidimensional signal (tensor) without unfolding the video frames into column vectors. This is a more natural and flexible decomposition, since it permits us to perform dimension reduction in the spatial, temporal, and chromatic domains, while standard SVD allows for temporal reduction only. We show that, for comparable synthesis quality, the HOSVD approach requires, on average, five times fewer parameters than the standard SVD approach. The analysis part is more expensive, but the synthesis has the same cost as existing algorithms. Our technique is thus well suited to dynamic texture synthesis on devices limited by memory and computational power, such as PDAs or mobile phones.


Subjects
Artificial Intelligence , Data Compression/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Video Recording/methods , Algorithms , Models, Biological , Reproducibility of Results , Sensitivity and Specificity
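The decomposition named in this abstract — a higher-order SVD (Tucker decomposition) applied to the video tensor without unfolding frames into column vectors — can be sketched as follows. This is a minimal sketch of truncated HOSVD only; the system-identification and noise-driven synthesis parts of the method are not shown, and the function names are assumptions.

```python
import numpy as np

def hosvd(tensor, ranks):
    # Truncated HOSVD: mode-n singular vectors from each unfolding,
    # then project the tensor onto them to obtain the core.
    factors = []
    for n, r in enumerate(ranks):
        unfolded = np.moveaxis(tensor, n, 0).reshape(tensor.shape[n], -1)
        U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(U[:, :r])
    core = tensor
    for n, U in enumerate(factors):
        # Mode-n product with U^T shrinks mode n to rank r_n.
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, n, 0), axes=1), 0, n)
    return core, factors

def reconstruct(core, factors):
    # Mode-n products with the factors map the core back to full size.
    out = core
    for n, U in enumerate(factors):
        out = np.moveaxis(np.tensordot(U, np.moveaxis(out, n, 0), axes=1), 0, n)
    return out
```

Truncating the mode ranks reduces the spatial, temporal, and chromatic dimensions independently, which is exactly the flexibility the abstract contrasts with frame-unfolded SVD.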
13.
J Opt Soc Am A Opt Image Sci Vis ; 24(9): 2807-16, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17767249

ABSTRACT

We present a tone mapping algorithm derived from a model of retinal processing. Our approach has two major improvements over existing methods. First, tone mapping is applied directly to the mosaic image captured by the sensor, analogous to the human visual system, which applies a nonlinearity to the chromatic responses captured by the cone mosaic. This reduces the number of necessary operations by a factor of 3. Second, we introduce a variation of the center/surround class of local tone mapping algorithms, which are known to increase the local contrast of images but tend to create artifacts. Our method gives a good improvement in contrast while avoiding halos and maintaining good global appearance. Like traditional center/surround algorithms, our method uses a weighted average of surrounding pixel values. Instead of being used directly, the weighted average serves as a variable in the Naka-Rushton equation, which models the photoreceptors' nonlinearity. Our algorithm provides pleasing results on various images with different scene content and dynamic range.
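The core idea here — a local surround average used not as the output but as the semi-saturation variable in the Naka-Rushton equation L/(L + sigma(x, y)) — can be sketched as follows. A box-filtered surround on a plain luminance image is an assumption made for brevity; the paper operates directly on the sensor mosaic with a weighted surround.

```python
import numpy as np

def naka_rushton_tonemap(lum, radius=8):
    # lum: 2-D array of positive luminance values.
    pad = np.pad(lum, radius, mode='edge')
    # Box-filtered surround via an integral image (summed-area table).
    ii = np.cumsum(np.cumsum(pad, axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))
    k = 2 * radius + 1
    surround = (ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]) / k**2
    # Naka-Rushton nonlinearity: the surround sets the local
    # semi-saturation point, compressing bright regions more.
    return lum / (lum + surround + 1e-8)
```

Because the surround varies per pixel, dark regions are compressed less than bright ones, which is what yields the local contrast gain the abstract describes.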

14.
Vis Neurosci ; 23(3-4): 591-6, 2006.
Article in English | MEDLINE | ID: mdl-16962001

ABSTRACT

We tested whether motion and configural complexity affect perceived transparency. A series of five coherent chromatic transformations in color space was applied across a figure: translation, convergence, shear, divergence, and rotation. The stimuli consisted of a bipartite or a checkerboard configuration (10 x 10 degrees), with a central static or moving overlay (5 x 5 degrees). Three different luminance conditions (the plane of chromatic transformation oriented toward higher, lower, or equal luminances) were also tested for each of three modulation depths. For each stimulus, the observer judged whether the overlay appeared transparent or not. The main results indicated an interaction between the type of chromatic transformation and stimulus motion and complexity. For example, convergences are judged to appear transparent significantly more often when motion is added for bipartite configurations, or when they are generated in a checkerboard configuration. Surprisingly, shears, which have been reported to appear opaque, are more frequently reported to appear transparent with short vector lengths and when combined with motion. Other transformations are also affected by motion, although the effect of figural complexity on transparency seems to depend on both the type of color shifts and the presence of motion. The results indicate that adding motion and stimulus complexity is not necessarily neutral with respect to the chromatic shifts evoking transparency. Thus, studies that have used motion to enhance transparency may yield different results about the color shifts supporting transparency perception from those that did not. The same might be supposed for stimulus complexity under some conditions.


Subjects
Color Perception/physiology , Motion Perception/physiology , Pattern Recognition, Visual/physiology , Color Perception Tests/methods , Female , Humans , Male , Motion , Photic Stimulation/methods
15.
IEEE Trans Image Process ; 15(9): 2820-30, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16948325

ABSTRACT

We propose a new method to render high dynamic range images that models global and local adaptation of the human visual system. Our method is based on the center/surround Retinex model and offers two novelties. First, we use an adaptive filter, whose shape follows the image's high-contrast edges, thus reducing the halo artifacts common to other methods. Second, only the luminance channel is processed; it is defined as the first component of a principal component analysis (PCA). PCA provides orthogonality between channels and thus reduces the chromatic changes caused by the modification of luminance. We show that our method efficiently renders high dynamic range images, and we compare our results with the current state of the art.


Subjects
Algorithms , Colorimetry/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Photography/methods , Biomimetics/methods , Humans , Information Storage and Retrieval/methods , Reproducibility of Results , Sensitivity and Specificity , Visual Perception/physiology
16.
IEEE Trans Image Process ; 14(4): 439-49, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15825479

ABSTRACT

There is an analogy between single-chip color cameras and the human visual system, in that both systems acquire only one limited wavelength-sensitivity band per spatial location. We have exploited this analogy, defining a model that characterizes a one-color-per-spatial-position image as a coding into luminance and chrominance of the corresponding three-colors-per-spatial-position image. Luminance is defined with full spatial resolution, while chrominance contains subsampled opponent colors. Moreover, luminance and chrominance follow a particular arrangement in the Fourier domain, allowing for demosaicing by spatial frequency filtering. This model shows that visual artifacts after demosaicing are due to aliasing between luminance and chrominance and could be solved using a preprocessing filter. This approach also gives new insights into the representation of single-color-per-spatial-location images and enables formal and controllable procedures to design demosaicing algorithms that perform well compared to concurrent approaches, as demonstrated by experiments.


Subjects
Algorithms , Biomimetics/methods , Colorimetry/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Signal Processing, Computer-Assisted , Artificial Intelligence , Color , Computer Graphics , Humans , Numerical Analysis, Computer-Assisted , Pattern Recognition, Automated/methods , Reproducibility of Results , Sensitivity and Specificity , Vision, Ocular/physiology
17.
IEEE Trans Pattern Anal Mach Intell ; 26(12): 1645-9, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15573825

ABSTRACT

For certain databases and classification tasks, analyzing images based on region features instead of image features results in more accurate classification. We introduce eigenregions, which are geometrical features that encompass the area, location, and shape properties of an image region, even if the region is spatially incoherent. Eigenregions are calculated using principal component analysis (PCA). On a database of 77,000 different regions obtained through the segmentation of 13,500 real-scene photographic images taken by nonprofessionals, eigenregions improved the detection of localized image classes by a noticeable amount. Additionally, eigenregions allow us to prove that the largest variance in natural image region geometry is due to area, not shape or position.
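The eigenregion computation summarized above — PCA over region geometry — can be sketched by applying PCA to vectorized binary region masks, which jointly encode area, location, and shape even for spatially incoherent regions. The function name and the SVD route to the principal components are illustrative assumptions.

```python
import numpy as np

def eigenregions(masks, n_components=8):
    # masks: (n_regions, H, W) boolean region masks on a common grid.
    X = masks.reshape(len(masks), -1).astype(float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data gives the principal components
    # without forming the covariance matrix explicitly.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (len(masks) - 1)
    return mean, Vt[:n_components], var[:n_components]
```

On real region data, inspecting the leading component would show which geometric property dominates the variance; the abstract reports that it is area.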

18.
Vis Neurosci ; 21(3): 291-4, 2004.
Article in English | MEDLINE | ID: mdl-15518202

ABSTRACT

The spectral properties of chromatic-detection mechanisms were investigated using a noise-masking paradigm. Contrast-detection thresholds were measured for a signal with a Gaussian spatial profile, modulated in the equiluminant plane in the presence of spatial chromatic noise. The noise was distributed within a sector in the equiluminant plane, centered on the signal direction. Each stimulus consisted of two adjacent fields, one of which contained the signal, separated horizontally by a gap with the same average chromaticity as the uniform background. Observers were asked to judge on which side of the central fixation point the signal was displayed via a two-alternative forced-choice (2AFC) paradigm. Contrast thresholds were measured for four color directions and three sector widths at increasing levels of the average energy of the axial component of the noise. Results show that contrast thresholds are unaffected by the width of the noise sector, as previously found for temporally modulated stimuli (D'Zmura & Knoblauch, 1998). The results are consistent with the existence of spectrally broadband linear-detection mechanisms tuned to the signal color direction and support the hypothesis of the existence of higher-order color mechanisms with sensitivities tuned to intermediate directions in color space.


Subjects
Color Perception/physiology , Contrast Sensitivity , Acoustic Stimulation , Adult , Humans , Noise , Reproducibility of Results , Sensory Thresholds