Results 1 - 20 of 27
1.
Article in English | MEDLINE | ID: mdl-38564348

ABSTRACT

Transformer-based methods have recently achieved great success in image inpainting. However, we find that these solutions regard each pixel as a token, and thus suffer from information loss in two respects: 1) they downsample the input image to much lower resolutions for efficiency; 2) they quantize 256³ RGB values to a small number (such as 512) of quantized color values. The indices of the quantized pixels are used as tokens for both the inputs and the prediction targets of the transformer. To mitigate these issues, we propose a new transformer-based framework called "PUT". Specifically, to avoid input downsampling while maintaining computational efficiency, we design a patch-based auto-encoder, P-VQVAE. The encoder converts the masked image into non-overlapping patch tokens, and the decoder recovers the masked regions from the inpainted tokens while keeping the unmasked regions unchanged. To eliminate the information loss caused by input quantization, an Un-quantized Transformer is applied: it takes features directly from the P-VQVAE encoder, without any quantization, and regards the quantized tokens only as prediction targets. Furthermore, to make the inpainting process more controllable, we introduce semantic and structural conditions as extra guidance. Extensive experiments show that our method greatly outperforms existing transformer-based methods on image fidelity and achieves much higher diversity and better fidelity than state-of-the-art pluralistic inpainting methods on complex large-scale datasets (e.g., ImageNet). Code is available at https://github.com/liuqk3/PUT.
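The patch-token idea described above is easiest to see with a toy tokenizer. The sketch below is illustrative only (it is not the authors' P-VQVAE, whose encoder and codebook are learned); it just maps non-overlapping patches to nearest-codebook indices, which is the discrete token a transformer would then consume:

```python
import numpy as np

def patch_tokenize(image, patch, codebook):
    """Map each non-overlapping patch of `image` (H, W, C) to the index of
    its nearest row in `codebook` (K, patch*patch*C) -- the discrete token
    a patch-based VQ-VAE would feed to a transformer."""
    H, W, C = image.shape
    tokens = np.empty((H // patch, W // patch), dtype=np.int64)
    for i in range(H // patch):
        for j in range(W // patch):
            vec = image[i*patch:(i+1)*patch, j*patch:(j+1)*patch].reshape(-1)
            # nearest codebook entry under squared Euclidean distance
            tokens[i, j] = int(((codebook - vec) ** 2).sum(axis=1).argmin())
    return tokens

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))          # small image
cb = rng.random((16, 4 * 4 * 3))     # 16-entry toy codebook
toks = patch_tokenize(img, 4, cb)    # 2 x 2 grid of token indices
```

Because each token covers a whole patch rather than one quantized pixel, an 8x8 image becomes a 2x2 token grid without downsampling the input itself.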

2.
Article in English | MEDLINE | ID: mdl-38526903

ABSTRACT

The intellectual property (IP) of deep networks can easily be "stolen" by surrogate model attacks. There has been significant progress in protecting model IP in classification tasks, but little attention has been devoted to protecting image processing models. Using consistent invisible spatial watermarks, the work [1] first considered model watermarking for deep image processing networks and demonstrated its efficacy in many downstream tasks. Its success rests on the hypothesis that if a consistent watermark exists in all prediction outputs, that watermark will be learned into the attacker's surrogate model. However, when the attacker applies common data augmentation attacks (e.g., rotation, cropping, and resizing) during surrogate model training, the method fails because the underlying watermark consistency is destroyed. To mitigate this issue, we propose a new watermarking methodology, "structure consistency", on which a new deep structure-aligned model watermarking algorithm is built. Specifically, the embedded watermarks are designed to be aligned with physically consistent image structures, such as edges or semantic regions. Experiments demonstrate that our method is more robust than the baseline in resisting data augmentation attacks. Beyond that, we test the generalization ability and robustness of our method against a broader range of adaptive attacks.

3.
IEEE Trans Image Process ; 33: 2183-2196, 2024.
Article in English | MEDLINE | ID: mdl-38451765

ABSTRACT

Notwithstanding their prominent performance in various applications, point cloud recognition models often suffer from natural corruptions and adversarial perturbations. In this paper, we delve into boosting the general robustness of point cloud recognition and propose Point-Cloud Contrastive Adversarial Training (PointCAT). The main intuition of PointCAT is to encourage the target recognition model to narrow the decision gap between clean and corrupted point clouds by devising feature-level, rather than logit-level, constraints. Specifically, we leverage a supervised contrastive loss to facilitate the alignment and uniformity of hypersphere representations, and design a pair of centralizing losses with dynamic prototype guidance to prevent features from deviating outside their category clusters. To generate more challenging corrupted point clouds, we adversarially train a noise generator concurrently with the recognition model from scratch, in contrast to previous adversarial training methods that use gradient-based attacks as the inner loop. Comprehensive experiments show that PointCAT outperforms baseline methods and significantly enhances the robustness of diverse point cloud recognition models under various corruptions, including isotropic point noise, LiDAR-simulated noise, random point dropping, and adversarial perturbations. Our code is available at: https://github.com/shikiw/PointCAT.
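The feature-level constraint mentioned above builds on the supervised contrastive loss of Khosla et al. (2020). A minimal numpy sketch of that loss (on toy 2-D features, not the authors' point cloud pipeline) shows how same-label samples are pulled together on the hypersphere:

```python
import numpy as np

def supcon_loss(features, labels, tau=0.1):
    """Supervised contrastive loss on L2-normalized features: each anchor is
    pulled toward other samples sharing its label and pushed away from the
    rest (Khosla et al., 2020)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = np.exp(f @ f.T / tau)          # exponentiated cosine similarities
    n = len(labels)
    total = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        denom = sum(sim[i, k] for k in range(n) if k != i)
        # average negative log-likelihood over this anchor's positives
        total += -sum(np.log(sim[i, j] / denom) for j in pos) / len(pos)
    return total / n

tight = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]])
loss_tight = supcon_loss(tight, [0, 0, 1, 1])   # labels match the clusters
loss_mixed = supcon_loss(tight, [0, 1, 0, 1])   # positives lie far apart
```

When labels agree with the geometry the loss is near zero; mislabeled positives drive it up, which is exactly the gap the training narrows.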

4.
IEEE Trans Image Process ; 33: 1683-1698, 2024.
Article in English | MEDLINE | ID: mdl-38416621

ABSTRACT

Image restoration under adverse weather conditions (e.g., rain, snow, and haze) is a fundamental computer vision problem with important implications for various downstream applications. Unlike early methods designed for specific types of weather, recent works tend to remove various adverse weather effects simultaneously, based on either spatial feature representation learning or semantic information embedding. Inspired by the success of large-scale pre-trained models (e.g., CLIP) in various applications, in this paper we explore their potential benefits for this task from both aspects: 1) For spatial feature representation learning, we design a Spatially Adaptive Residual (SAR) encoder to adaptively extract degraded areas. To facilitate its training, we propose a Soft Residual Distillation (CLIP-SRD) strategy to transfer spatial knowledge from CLIP between clean and adverse weather images. 2) For semantic information embedding, we propose a CLIP Weather Prior (CWP) embedding module that enables the network to adaptively respond to different weather conditions. This module integrates the sample-specific weather priors extracted by the CLIP image encoder with distribution-specific information learned by a set of parameters, and embeds them through a cross-attention mechanism. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance under various and severe adverse weather conditions. The code will be made available.
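The cross-attention mechanism named above is standard scaled dot-product attention with queries from one source and keys/values from another. A minimal single-head sketch (the prior embeddings here are random stand-ins, not actual CLIP outputs) looks like this:

```python
import numpy as np

def cross_attention(q, k, v):
    """Single-head cross-attention: each spatial query attends over a small
    set of prior embeddings and returns their softmax-weighted combination."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # rows sum to 1
    return w @ v

rng = np.random.default_rng(2)
queries = rng.standard_normal((6, 8))   # 6 spatial positions, dim 8
priors = rng.standard_normal((3, 8))    # 3 weather-prior embeddings (keys)
values = rng.standard_normal((3, 8))
out = cross_attention(queries, priors, values)
```

Each output row is a convex combination of the value rows, so the spatial features inherit information from whichever weather priors they attend to.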

5.
J Imaging ; 10(1)2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38249006

ABSTRACT

Face swapping is an intriguing and intricate task in computer vision. Currently, most mainstream face swapping methods employ face recognition models to extract identity features and inject them into the generation process. Nonetheless, such methods often struggle to transfer identity information effectively, leading to generated results that fail to achieve high identity similarity to the source face. Furthermore, if we can accurately disentangle identity information, we can achieve controllable face swapping and thereby offer users more choices. In pursuit of this goal, we propose a new face swapping framework, ControlFace, based on the disentanglement of identity information. We disentangle the structure and texture of the source face, encoding and characterizing each as a separate feature embedding. According to the semantic level of each feature representation, we inject them into the corresponding feature mappers and fuse them in the latent space of StyleGAN. Owing to this disentanglement of structure and texture, we can controllably transfer parts of the identity features. Extensive experiments and comparisons with state-of-the-art face swapping methods demonstrate the superiority of our framework in transferring identity information, producing high-quality face images, and enabling controllable face swapping.

6.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 881-895, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37871095

ABSTRACT

Image matting is a fundamental and challenging problem in computer vision and graphics. Most existing matting methods leverage a user-supplied trimap as an auxiliary input to produce a good alpha matte. However, obtaining a high-quality trimap is itself arduous. Recently, some hint-free methods have emerged; however, their matting quality still lags far behind that of trimap-based methods, mainly because some hints are essential for removing semantic ambiguity and improving matting quality. There is thus a trade-off between interaction cost and matting quality. To balance performance and user-friendliness, we propose an improved deep image matting framework that is trimap-free and needs only sparse user clicks or scribbles, minimizing the required auxiliary constraints while still allowing interactivity. Moreover, we introduce uncertainty estimation to predict which parts need polishing, and conduct uncertainty-guided refinement. To trade off runtime against refinement quality, users can also choose between different refinement modes. Experimental results show that our method performs better than existing trimap-free methods and comparably to state-of-the-art trimap-based methods with minimal user effort. Finally, we demonstrate the extensibility of our framework to video human matting without any structural modification, by adding optical-flow-based sparse hint propagation and a temporal consistency regularization imposed on single frames.

7.
Opt Express ; 31(16): 26301-26313, 2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37710493

ABSTRACT

We have developed a simple time-bin phase-encoding quantum key distribution (QKD) system using the optical injection locking technique. This setup combines the merits of simple, stable encoding and immunity to channel disturbance. We demonstrate an automated field implementation of QKD over long-distance deployed aerial fiber. During a 70-day field test, we achieved a secure key rate of approximately 1.0 kbps with stable performance. Our work takes an important step toward the widespread deployment of QKD systems in diverse and complex real-life scenarios.

8.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6247-6264, 2023 May.
Article in English | MEDLINE | ID: mdl-36166518

ABSTRACT

Semantic image synthesis, translating semantic layouts to photo-realistic images, is a one-to-many mapping problem. Though impressive progress has been made recently, diverse semantic synthesis that can efficiently produce semantic-level or even instance-level multimodal results still remains a challenge. In this article, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at both the semantic and instance levels. We achieve this by modeling class-level conditional modulation parameters as continuous probability distributions instead of discrete values, and by sampling per-instance modulation parameters through instance-adaptive stochastic sampling that is consistent across the network. Moreover, we propose prior noise remapping, through linear perturbation parameters encoded from paired references, to facilitate supervised training and exemplar-based instance style control at test time. To further extend user interaction, we also introduce sketches into the network. In addition, the specially designed generator modules, the Progressive Growing Module and the Multi-Scale Refinement Module, can be used as general modules to improve the performance of complex scene generation. Extensive experiments on multiple datasets show that our method achieves superior diversity and comparable quality relative to state-of-the-art methods. Code is available at https://github.com/tzt101/INADE.git.
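Modeling modulation parameters as distributions rather than fixed values can be sketched with the usual reparameterization trick. The toy below (shapes and the Gaussian parameterization are illustrative assumptions, not the paper's exact module) draws one sample per class and broadcasts it to all pixels of that class:

```python
import numpy as np

def sample_modulation(seg, mu, logvar, rng):
    """Per-class modulation sampling sketch: for every class present in the
    layout, draw one reparameterized sample gamma = mu + sigma * eps and
    broadcast it to all pixels of that class, so the modulation parameters
    are samples from a class-level distribution rather than fixed values."""
    C = mu.shape[1]
    gamma = np.zeros(seg.shape + (C,))
    for cls in np.unique(seg):
        eps = rng.standard_normal(C)
        gamma[seg == cls] = mu[cls] + np.exp(0.5 * logvar[cls]) * eps
    return gamma

rng = np.random.default_rng(3)
seg = np.array([[0, 0, 1], [0, 1, 1]])       # 2 x 3 semantic layout
mu = np.array([[0.0, 1.0], [2.0, -1.0]])     # per-class means, 2 channels
logvar = np.zeros((2, 2))                    # unit variance
g = sample_modulation(seg, mu, logvar, rng)
```

Resampling `eps` yields a different but internally consistent modulation per class, which is where the multimodal diversity comes from.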

9.
IEEE Trans Image Process ; 31: 5691-5705, 2022.
Article in English | MEDLINE | ID: mdl-36040942

ABSTRACT

Recent research shows that deep neural networks are vulnerable to different types of attacks, such as adversarial attacks, data poisoning attacks, and backdoor attacks. Among them, backdoor attacks are the most cunning and can occur in almost every stage of the deep learning pipeline; they have attracted considerable interest from both academia and industry. However, most existing backdoor attack methods are either visible or fragile to simple pre-processing such as common data transformations. To address these limitations, we propose a robust and invisible backdoor attack called "Poison Ink". Concretely, we first leverage image structures as the target poisoning areas and fill them with poison ink (information) to generate the trigger pattern. As image structure keeps its semantic meaning under data transformations, such a trigger pattern is inherently robust to them. We then leverage a deep injection network to embed this input-aware trigger pattern into the cover image to achieve stealthiness. Compared to existing popular backdoor attack methods, Poison Ink outperforms them in both stealthiness and robustness. Through extensive experiments, we demonstrate that Poison Ink is not only general across different datasets and network architectures but also flexible across different attack scenarios, and that it has very strong resistance against many state-of-the-art defense techniques.
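The structure-aligned trigger idea can be illustrated with a crude edge detector. The sketch below is only a caricature of the first step (the actual method uses a learned deep injection network to embed the pattern invisibly); it shows why a trigger placed on edges survives transformations that preserve structure:

```python
import numpy as np

def structure_trigger(gray, key, thresh=0.2):
    """Structure-aligned trigger sketch: find edges with finite differences
    and write the 'poison ink' key value only on edge pixels, so the
    trigger pattern follows image structures."""
    gx = np.abs(np.diff(gray, axis=1, prepend=gray[:, :1]))
    gy = np.abs(np.diff(gray, axis=0, prepend=gray[:1, :]))
    edges = (gx + gy) > thresh        # boolean edge map
    trigger = np.zeros_like(gray)
    trigger[edges] = key              # ink only where structure exists
    return trigger, edges

img = np.zeros((6, 6))
img[:, 3:] = 1.0                      # vertical step edge at column 3
trig, edges = structure_trigger(img, key=0.5)
```

Rotating or resizing the image moves the edge and the trigger together, which is the consistency the attack relies on.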


Subject(s)
Poisons; Ink; Neural Networks, Computer; Semantics
10.
IEEE Trans Image Process ; 31: 3267-3280, 2022.
Article in English | MEDLINE | ID: mdl-35439133

ABSTRACT

This paper studies the problem of StyleGAN inversion, which plays an essential role in enabling a pretrained StyleGAN to be used for real image editing tasks. The goal of StyleGAN inversion is to find the exact latent code of a given image in the latent space of StyleGAN, a problem with high demands on both quality and efficiency. Existing optimization-based methods can produce high-quality results, but the optimization often takes a long time; forward-based methods are usually faster, but the quality of their results is inferior. In this paper, we present a new feed-forward network, "E2Style", for StyleGAN inversion, with significant improvements in efficiency and effectiveness. In our inversion network, we introduce: 1) a shallower backbone with multiple efficient heads across scales; 2) multi-layer identity loss and multi-layer face parsing loss in the loss function; and 3) multi-stage refinement. Combining these designs forms an effective and efficient method that exploits the benefits of both optimization-based and forward-based approaches. Quantitative and qualitative results show that E2Style performs better than existing forward-based methods and comparably to state-of-the-art optimization-based methods, while maintaining the high efficiency of forward-based methods. Moreover, a number of real image editing applications demonstrate the efficacy of E2Style. Our code is available at https://github.com/wty-ustc/e2style.

11.
IEEE Trans Vis Comput Graph ; 28(12): 4403-4417, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34115588

ABSTRACT

Invertible grayscale is a special kind of grayscale image from which the original color can be recovered. Given an input color image, the seminal work in this area hides the color information in its grayscale counterpart while making any anomalies hard to perceive. This functionality is enabled by training a hiding sub-network and a restoring sub-network end to end. Despite its impressive results, two key limitations exist: 1) the restored color image often suffers from noticeable visual artifacts in smooth regions; 2) it is very sensitive to JPEG compression, i.e., the original color information cannot be well recovered once the intermediate grayscale image is JPEG-compressed. To overcome these two limitations, this article introduces adversarial training and a JPEG simulator, respectively. Specifically, two auxiliary adversarial networks are incorporated to make the intermediate grayscale images and final restored color images indistinguishable from normal grayscale and color images, and the JPEG simulator mimics real JPEG compression during online training so that the hiding and restoring sub-networks automatically learn to be robust to JPEG. Extensive experiments demonstrate that the proposed method is superior to the original invertible grayscale work both qualitatively and quantitatively while ensuring JPEG robustness. We further show that the proposed framework can be applied under different types of grayscale constraints and achieves excellent results.
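The abstract does not spell out the simulator's internals, but JPEG simulators commonly replace the hard rounding in the quantization step with a differentiable surrogate, e.g. round(x) + (x - round(x))³ from Shin and Song (2017). A minimal sketch of that lossy core (the quality table here is an arbitrary toy, not a real JPEG table):

```python
import numpy as np

def soft_round(x):
    """Differentiable rounding surrogate round(x) + (x - round(x))**3: it
    agrees with hard rounding at integers but keeps a nonzero gradient in
    between, so training can see through the quantization step."""
    r = np.round(x)
    return r + (x - r) ** 3

def jpeg_quantize(dct_coeffs, qtable):
    """The lossy core of JPEG: divide DCT coefficients by a quality table,
    round, then multiply back -- here with soft rounding."""
    return soft_round(dct_coeffs / qtable) * qtable

c = np.array([[52.0, -38.0], [14.0, 3.0]])   # toy DCT coefficients
q = np.array([[16.0, 11.0], [10.0, 16.0]])   # toy quantization table
approx = jpeg_quantize(c, q)
hard = np.round(c / q) * q                   # what real JPEG would produce
```

The surrogate stays within 0.125 of a quantization step of the hard result, close enough for the networks to learn JPEG-robust hiding.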

12.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 4852-4866, 2022 09.
Article in English | MEDLINE | ID: mdl-33914680

ABSTRACT

Spatially-adaptive normalization (SPADE), introduced by T. Park et al. (2019), has recently been remarkably successful in conditional semantic image synthesis: it modulates the normalized activations with spatially-varying transformations learned from semantic layouts, to prevent the semantic information from being washed away. Despite its impressive performance, a more thorough understanding of its advantages is still highly desirable, to help reduce the significant computation and parameter overhead introduced by this structure. In this paper, from a return-on-investment point of view, we conduct an in-depth analysis of the effectiveness of spatially-adaptive normalization and observe that its modulation parameters benefit more from semantic awareness than from spatial adaptiveness, especially for high-resolution input masks. Inspired by this observation, we propose class-adaptive normalization (CLADE), a lightweight but equally effective variant that is adaptive only to semantic class. To further improve spatial adaptiveness, we introduce an intra-class positional map encoding, calculated from the semantic layouts, to modulate the normalization parameters of CLADE, yielding a truly spatially-adaptive variant, CLADE-ICPE. Through extensive experiments on multiple challenging datasets, we demonstrate that CLADE generalizes to different SPADE-based methods and achieves generation quality comparable to SPADE while being much more efficient, with fewer extra parameters and lower computational cost. The code and pretrained models are available at https://github.com/tzt101/CLADE.git.
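Class-adaptive normalization reduces to a table lookup: instead of convolutionally predicting a modulation value for every pixel, each semantic class owns one gamma/beta pair. A minimal numpy sketch (a simplified stand-in for the paper's layer; real CLADE sits inside a network and normalizes per batch/channel statistics):

```python
import numpy as np

def clade(x, seg, gamma, beta, eps=1e-5):
    """Class-adaptive normalization sketch: normalize the feature map, then
    modulate each pixel with gamma/beta looked up by its semantic class --
    a lightweight alternative to predicting full spatial modulation maps."""
    mu = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    xn = (x - mu) / np.sqrt(var + eps)
    return gamma[seg] * xn + beta[seg]      # (H, W, C) via per-class lookup

rng = np.random.default_rng(4)
feat = rng.standard_normal((4, 4, 3))
seg = np.array([[0, 0, 1, 1]] * 4)                     # two classes
gamma = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])   # per-class scale
beta = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])    # per-class shift
out = clade(feat, seg, gamma, beta)
```

The lookup costs O(num_classes) parameters and no extra convolutions, which is the efficiency argument the abstract makes.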


Subject(s)
Algorithms; Semantics
13.
IEEE Trans Vis Comput Graph ; 28(12): 5006-5025, 2022 Dec.
Article in English | MEDLINE | ID: mdl-33886472

ABSTRACT

Three-dimensional (3-D) meshes are commonly used to represent virtual surfaces and volumes. Over the past decade, 3-D meshes have emerged in industrial, medical, and entertainment applications, making 3-D mesh steganography and steganalysis of large practical significance. In this article, we provide a systematic survey of the literature on 3-D mesh steganography and steganalysis. Compared with an earlier survey (Girdhar et al., 2017), we propose a new taxonomy of steganographic algorithms with four categories: 1) two-state domain, 2) LSB domain, 3) permutation domain, and 4) transform domain. Regarding steganalysis algorithms, we divide them into two categories: 1) universal steganalysis and 2) specific steganalysis. For each category, the history of technical developments and the current technological level are introduced and discussed. Finally, we highlight some promising future research directions and challenges in improving the performance of 3-D mesh steganography and steganalysis.

14.
IEEE Trans Pattern Anal Mach Intell ; 44(8): 4005-4020, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33687836

ABSTRACT

Despite their tremendous success, deep neural networks are exposed to serious IP infringement risks. Given a target deep model, an attacker with full knowledge of it can easily steal it by fine-tuning; even with access only to its outputs, a surrogate model can be trained through student-teacher learning by generating many input-output training pairs. Deep model IP protection is therefore important and necessary, yet it remains seriously under-researched. In this work, we propose a new model watermarking framework for protecting deep networks trained for low-level computer vision or image processing tasks. Specifically, a special task-agnostic barrier is added after the target model, embedding a unified and invisible watermark into its outputs. When the attacker trains a surrogate model on input-output pairs of the barriered target model, the hidden watermark is learned and can be extracted afterwards. To support watermarks ranging from binary bits to high-resolution images, a deep invisible watermarking mechanism is designed. By jointly training the target model and the watermark embedding, the extra barrier can even be absorbed into the target model. Through extensive experiments, we demonstrate the robustness of the proposed framework, which resists attacks with different network structures and objective functions.

15.
IEEE Trans Pattern Anal Mach Intell ; 43(1): 33-47, 2021 Jan.
Article in English | MEDLINE | ID: mdl-31265384

ABSTRACT

Many different deep networks have been used to approximate, accelerate, or improve traditional image operators. Many of these operators contain parameters that need to be tweaked to obtain satisfactory results; we refer to them as "parameterized image operators". However, most existing deep networks trained for these operators are designed for only one specific parameter configuration, which does not meet the needs of real scenarios that usually require flexible parameter settings. To overcome this limitation, we propose a new decoupled learning algorithm that learns, from the operator parameters, to dynamically adjust the weights of a deep network for image operators (denoted as the base network). The learned algorithm takes the form of another network, the weight learning network, which can be jointly trained end to end with the base network. Experiments demonstrate that the proposed framework can be successfully applied to many traditional parameterized image operators. To accelerate parameter tuning in practical scenarios, the framework can be further extended to dynamically change the weights of only a single layer of the base network while sharing most of the computation. We demonstrate that even this cheap parameter-tuning extension of the proposed decoupled learning framework outperforms state-of-the-art alternative approaches.
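The weight learning network is, in essence, a hypernetwork: a small model that maps an operator parameter to the weights of the base network. The toy below is an illustrative assumption about its shape (a two-layer MLP producing the flattened weights of one base layer), not the authors' architecture:

```python
import numpy as np

def weight_learning_net(param, W1, b1, W2, b2):
    """Toy 'weight learning network': a two-layer MLP maps a scalar operator
    parameter (e.g. a smoothing strength) to the flattened weights of one
    base-network layer, so a single model serves every parameter setting."""
    h = np.maximum(0.0, param * W1 + b1)   # hidden activation, shape (hidden,)
    return h @ W2 + b2                     # predicted weights, (n_base_weights,)

rng = np.random.default_rng(5)
W1, b1 = rng.standard_normal(8), rng.standard_normal(8)
W2, b2 = rng.standard_normal((8, 4)), rng.standard_normal(4)
w_small = weight_learning_net(0.1, W1, b1, W2, b2)  # weights for a mild setting
w_large = weight_learning_net(5.0, W1, b1, W2, b2)  # weights for a strong one
# the base network would then apply these weights as, e.g., a small filter
```

At inference time, changing the operator parameter only re-runs this tiny network, which is why restricting it to a single base layer makes parameter tuning cheap.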

16.
IEEE Trans Vis Comput Graph ; 27(1): 57-67, 2021 Jan.
Article in English | MEDLINE | ID: mdl-31331894

ABSTRACT

The standard tensor voting technique shows its versatility in tasks such as object recognition and semantic segmentation by recognizing feature points and sharp edges that can segment a model into several patches. We propose a neighborhood-level, representation-guided tensor voting model for 3-D mesh steganalysis. Because existing steganalytic methods do not analyze correlations among neighboring faces, they are not very effective at discriminating stego meshes from cover meshes; we therefore utilize a tensor voting model to reveal the artifacts caused by data embedding. In the proposed steganalytic scheme, the normal voting tensor (NVT) operation is performed separately on the original mesh faces and on smoothed mesh faces. The absolute differences between the eigenvalues of the two tensors (from the original face and the smoothed face) are then taken as features that capture intricate relationships among the vertices. Finally, the extracted features are processed with a nonlinear mapping to boost their effectiveness. Experimental results show that the proposed feature sets prevail over state-of-the-art feature sets, including LFS64 and ELFS124, under various steganographic schemes.
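The eigenvalue-difference feature can be sketched in a few lines. The toy below (a simplified stand-in: real NVT operations weight votes by face area and neighborhood, which is omitted here) accumulates a voting tensor from unit face normals and compares its spectrum before and after smoothing:

```python
import numpy as np

def nvt_features(normals_orig, normals_smooth):
    """Steganalysis feature sketch: accumulate a voting tensor sum(n n^T)
    from unit face normals of the original and smoothed meshes, and take
    absolute eigenvalue differences as features; embedding perturbs the
    normals, so the differences grow on stego meshes."""
    def voting_tensor(normals):
        return sum(np.outer(n, n) for n in normals)
    e_o = np.linalg.eigvalsh(voting_tensor(normals_orig))
    e_s = np.linalg.eigvalsh(voting_tensor(normals_smooth))
    return np.abs(e_o - e_s)

n = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
f_same = nvt_features(n, n)                 # identical normals: zero features
perturbed = n + 0.1                         # mimic embedding noise
perturbed /= np.linalg.norm(perturbed, axis=1, keepdims=True)
f_diff = nvt_features(n, perturbed)         # nonzero features
```

Unperturbed normals give a zero feature vector; embedding-like perturbations shift the eigenvalues, which is the signal the classifier learns.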

17.
IEEE Trans Pattern Anal Mach Intell ; 43(7): 2373-2387, 2021 07.
Article in English | MEDLINE | ID: mdl-31905133

ABSTRACT

Image style transfer re-renders the content of one image with the style of another. Most existing methods couple content and style information in their network structures and hyper-parameters and learn the transfer as a black box. For better understanding, this paper provides a new, explicitly decoupled perspective. Specifically, we propose StyleBank, which is composed of multiple convolution filter banks, each explicitly representing one style. To transfer an image to a specific style, the corresponding filter bank is applied to the intermediate feature produced by a single auto-encoder. The StyleBank and the auto-encoder are jointly learned in such a way that the auto-encoder encodes no style information. This explicit representation also enables incremental learning of new styles and fusing styles not only at the image level but also at the region level. Our method is the first style transfer network that links back to traditional texton mapping methods, and it provides new understanding of neural style transfer. We further apply this general filter-bank learning idea to two different multi-parameter image processing tasks: edge-aware image smoothing and denoising. Experiments demonstrate that it achieves results comparable to its single-parameter-setting counterparts.
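The decoupling can be caricatured with channel-mixing matrices standing in for the convolutional filter banks (an illustrative simplification; the real banks are spatial convolution kernels):

```python
import numpy as np

def stylize(feature, style_banks, style_id):
    """StyleBank sketch: the shared auto-encoder feature passes through the
    filter bank of the chosen style (here a 1x1 'convolution', i.e. a
    channel-mixing matrix per style); switching styles swaps only the bank,
    never the content auto-encoder."""
    return feature @ style_banks[style_id]

rng = np.random.default_rng(6)
feat = rng.standard_normal((4, 4, 8))    # H x W x C auto-encoder feature
banks = rng.standard_normal((3, 8, 8))   # one bank per style
out_a = stylize(feat, banks, 0)
out_b = stylize(feat, banks, 1)
```

Region-level style fusion corresponds to applying different banks on different spatial regions of the same shared feature, which this separation makes trivial.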


Subject(s)
Algorithms; Image Processing, Computer-Assisted
18.
Article in English | MEDLINE | ID: mdl-33026987

ABSTRACT

In recent years, the field of object detection has made significant progress. The success of most state-of-the-art object detectors derives from the use of feature pyramids and carefully designed anchor boxes. However, current methods of constructing feature pyramids usually integrate multi-scale representations blindly at each feature hierarchy, and these detectors also suffer from drawbacks brought by hand-designed anchors. To mitigate these adverse effects, we introduce a one-stage object detector, named the semi-anchor-free network with an enhanced feature pyramid (SAFNet). Specifically, to better construct the feature pyramid, we propose a novel enhanced feature pyramid generation paradigm consisting of two modules: an adaptive feature fusion module (AFFM) and a self-enhanced module (SEM). The paradigm adaptively integrates multi-scale representations in a non-linear manner while suppressing redundant semantic information at each pyramid level, so that a clean and enhanced feature pyramid is obtained. In addition, an adaptive anchor generator (AAG) is designed to yield fewer but more suitable anchor boxes for each input image. Benefiting from the enhanced feature pyramid, AAG generates more accurate anchor boxes with few priors, thus alleviating the drawbacks of preset anchor hyper-parameters and decreasing computation cost. Extensive experiments demonstrate the effectiveness of our approach. Profiting from the proposed modules, SAFNet significantly boosts detection performance, achieving 2 and 2.1 points higher Average Precision (AP) than RetinaNet (our baseline) on PASCAL VOC and MS COCO, respectively. Code will be publicly available soon.
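The contrast between blind integration and adaptive fusion can be shown with learned softmax weights over already-resized scale features. This is a minimal stand-in for the fusion idea, not the AFFM's actual architecture:

```python
import numpy as np

def adaptive_fuse(features, logits):
    """Adaptive fusion sketch: combine multi-scale feature maps (already
    resized to a common shape) with learned softmax weights, instead of the
    blind unweighted summation the abstract criticizes."""
    w = np.exp(logits - np.max(logits))   # stable softmax over scales
    w /= w.sum()
    return sum(wi * f for wi, f in zip(w, features))

rng = np.random.default_rng(7)
feats = [rng.standard_normal((4, 4)) for _ in range(3)]
fused = adaptive_fuse(feats, np.array([0.0, 0.0, 0.0]))  # equal logits
```

With equal logits this degenerates to a plain average; training the logits lets each pyramid level emphasize the scales that actually help it.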

19.
Article in English | MEDLINE | ID: mdl-29994677

ABSTRACT

Recursive code construction (RCC), based on the optimal transition probability matrix (OTPM), has been proposed to approach the rate-distortion bound of reversible data hiding (RDH). With existing methods, the OTPM can be effectively estimated only for a consistent distortion metric, i.e., when the host elements at different positions share the same distortion metric. In many applications, however, the distortion metrics are position-dependent and thus inconsistent. Inconsistent distortion metrics can usually be quantified as a multi-distortion metric. In this paper, we first formulate the rate-distortion problem of RDH under a multi-distortion metric and then propose a general framework to estimate the corresponding OTPM, with which RCC is extended to approach the rate-distortion bound of RDH under the multi-distortion metric. We apply the proposed framework to two examples of inconsistent distortion metrics: RDH in color images and reversible steganography. Experimental results show that the proposed method efficiently improves upon existing techniques.

20.
IEEE Trans Image Process ; 26(4): 1623-1625, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28252386

ABSTRACT

Message hiding in texture image synthesis is a novel steganographic approach in which a smaller texture image is resampled to synthesize a new texture image with a similar local appearance and an arbitrary size. However, the mirror operation over the image boundary is flawed and easy to attack. We propose an attack on this steganography that can not only detect the stego-images but also extract the hidden messages.
