Pesquisa | Secretaria de Estado da Saúde

1.

A composite visualization method for electrophysiology-morphous merging of human heart.

Yang, Fei; Zhang, Lei; Lu, Weigang; Zhang, Yue; Zuo, Wangmeng; Wang, Kuanquan; Zhang, Henggui.

Biomed Eng Online ; 16(1): 70, 2017 Jun 08.

Artigo em Inglês | MEDLINE | ID: mdl-28595607

RESUMO

BACKGROUND: Electrophysiological behavior is of great importance for analyzing the cardiac functional mechanism under cardiac physiological and pathological condition. Due to the complexity of cardiac structure and biophysiological function, visualization of a cardiac electrophysiological model compositively is still a challenge. The lack of either modality of the whole organ structure or cardiac electrophysiological behaviors makes analysis of the intricate mechanisms of cardiac dynamic function a difficult task. This study aims at exploring 3D conduction of stimulus and electrical excitation reactivity on the level of organ with the authentic fine cardiac anatomy structure. METHODS: In this paper, a cardiac electrical excitation propagation model is established based on the human cardiac cross-sectional data to explore detailed cardiac electrical activities. A novel biophysical merging visualization method is then presented for biophysical integration of cardiac anatomy and electrophysiological properties in the form of the merging optical model, which provides the corresponding position, spatial relationship and the whole process in 3D space with the context of anatomical structure for representing the biophysical detailed electrophysiological activity. RESULTS: The visualization result present the action potential propagation of the left ventricle within the excitation cycle with the authentic fine cardiac organ anatomy. In the visualized images, all vital organs are identified and distinguished without ambiguity. The three dimensional spatial position, relation and the process of cardiac excitation conduction and re-entry propagation in the anatomical structure during the phase of depolarization and repolarization is also shown in the result images, which exhibits the performance of a more detailed biophysical understanding of the electrophysiological kinetics of human heart in vivo. CONCLUSIONS: Results suggest that the proposed merging optical model can merge cardiac electrophysiological activity with the anatomy structure. By specifying the respective opacity for the cardiac anatomy structure and the electrophysiological model in the merging attenuation function, the visualized images can provide an in-depth insight into the biophysical detailed cardiac functioning phenomena and the corresponding electrophysiological behavior mechanism, which is helpful for further speculating cardiac physiological and pathological responses and is fundamental to the cardiac research and clinical diagnoses.

Assuntos

Fenômenos Eletrofisiológicos , Coração/fisiologia , Modelos Cardiovasculares , Estudos Transversais , Coração/diagnóstico por imagem , Sistema de Condução Cardíaco/diagnóstico por imagem , Sistema de Condução Cardíaco/fisiologia , Humanos , Imageamento por Ressonância Magnética , Tomografia Computadorizada por Raios X

2.

Multiview stereo and silhouette fusion via minimizing generalized reprojection error.

Li, Zhaoxin; Wang, Kuanquan; Jia, Wenyan; Chen, Hsin-Chen; Zuo, Wangmeng; Meng, Deyu; Sun, Mingui.

Image Vis Comput ; 33: 1-14, 2015 Jan 01.

Artigo em Inglês | MEDLINE | ID: mdl-25558120

RESUMO

Accurate reconstruction of 3D geometrical shape from a set of calibrated 2D multiview images is an active yet challenging task in computer vision. The existing multiview stereo methods usually perform poorly in recovering deeply concave and thinly protruding structures, and suffer from several common problems like slow convergence, sensitivity to initial conditions, and high memory requirements. To address these issues, we propose a two-phase optimization method for generalized reprojection error minimization (TwGREM), where a generalized framework of reprojection error is proposed to integrate stereo and silhouette cues into a unified energy function. For the minimization of the function, we first introduce a convex relaxation on 3D volumetric grids which can be efficiently solved using variable splitting and Chambolle projection. Then, the resulting surface is parameterized as a triangle mesh and refined using surface evolution to obtain a high-quality 3D reconstruction. Our comparative experiments with several state-of-the-art methods show that the performance of TwGREM based 3D reconstruction is among the highest with respect to accuracy and efficiency, especially for data with smooth texture and sparsely sampled viewpoints.

3.

Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations.

Zhang, Zhilu; Wang, Ruohao; Zhang, Hongzhi; Zuo, Wangmeng.

IEEE Trans Pattern Anal Mach Intell ; PP2024 Mar 20.

Artigo em Inglês | MEDLINE | ID: mdl-38507386

RESUMO

In this paper, we consider two challenging issues in reference-based super-resolution (RefSR) for smartphone, (i) how to choose a proper reference image, and (ii) how to learn RefSR in a self-supervised manner. Particularly, we propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms. Firstly, considering the popularity of multiple cameras in modern smartphones, the more zoomed (telephoto) image can be naturally leveraged as the reference to guide the super-resolution (SR) of the lesser zoomed (ultra-wide) image, which gives us a chance to learn a deep network that performs SR from the dual zoomed observations (DZSR). Secondly, for self-supervised learning of DZSR, we take the telephoto image instead of an additional high-resolution image as the supervision information, and select a center patch from it as the reference to super-resolve the corresponding ultra-wide image patch. To mitigate the effect of the misalignment between ultra-wide low-resolution (LR) patch and telephoto ground-truth (GT) image during training, we first adopt patch-based optical flow alignment to obtain the warped LR, then further design an auxiliary-LR to guide the deforming of the warped LR features. To generate visually pleasing results, we present local overlapped sliced Wasserstein loss to better represent the perceptual difference between GT and output in the feature space. During testing, DZSR can be directly deployed to super-solve the whole ultra-wide image with the reference of the telephoto image. In addition, we further take multiple zoomed observations to explore self-supervised RefSR, and present a progressive fusion scheme for the effective utilization of reference images. Experiments show that our methods achieve better quantitative and qualitative performance against state-of-the-arts. The code and pre-trained models will be publicly available.

4.

Learning Diverse Tone Styles for Image Retouching.

Wang, Haolin; Zhang, Jiawei; Liu, Ming; Wu, Xiaohe; Zuo, Wangmeng.

IEEE Trans Image Process ; 33: 310-321, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38090849

RESUMO

Image retouching, aiming to regenerate the visually pleasing renditions of given images, is a subjective task where the users are with different aesthetic sensations. Most existing methods adopt a deterministic model to learn the retouching style from a specific expert, making it less flexible to meet diverse subjective preferences. Besides, the intrinsic diversity of an expert due to the targeted processing of different images is also deficiently described. To circumvent such issues, we propose to learn diverse image retouching with normalizing flow-based architectures. Unlike current flow-based methods which directly generate the output image, we argue that learning in a one-dimensional style space could 1) disentangle the retouching styles from the image content, 2) lead to a stable style presentation form, and 3) avoid the spatial disharmony effects. For obtaining meaningful image tone style representations, a joint-training pipeline is delicately designed, which is composed of a style encoder, a conditional RetouchNet, and the image tone style normalizing flow (TSFlow) module. In particular, the style encoder predicts the target style representation of an input image, which serves as the conditional information in the RetouchNet for retouching, while the TSFlow maps the style representation vector into a Gaussian distribution in the forward pass. After training, the TSFlow can generate diverse image tone style vectors by sampling from the Gaussian distribution. Extensive experiments on MIT-Adobe FiveK and PPR10K datasets show that our proposed method performs favorably against state-of-the-art methods and is effective in generating diverse results to satisfy different human aesthetic preferences. Source codeterministic and pre-trained models are publicly available at https://github.com/SSRHeart/TSFlow.

Assuntos

Aprendizagem , Humanos , Distribuição Normal

5.

Layer-Specific Knowledge Distillation for Class Incremental Semantic Segmentation.

Wang, Qilong; Wu, Yiwen; Yang, Liu; Zuo, Wangmeng; Hu, Qinghua.

IEEE Trans Image Process ; 33: 1977-1989, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38451756

RESUMO

Recently, class incremental semantic segmentation (CISS) towards the practical open-world setting has attracted increasing research interest, which is mainly challenged by the well-known issue of catastrophic forgetting. Particularly, knowledge distillation (KD) techniques have been widely studied to alleviate catastrophic forgetting. Despite the promising performance, existing KD-based methods generally use the same distillation schemes for different intermediate layers to transfer old knowledge, while employing manually tuned and fixed trade-off weights to control the effect of KD. These KD-based methods take no consideration of feature characteristics from different intermediate layers, limiting the effectiveness of KD for CISS. In this paper, we propose a layer-specific knowledge distillation (LSKD) method to assign appropriate knowledge schemes and weights for various intermediate layers by considering feature characteristics, aiming to further explore the potential of KD in improving the performance of CISS. Specifically, we present a mask-guided distillation (MD) to alleviate the background shift on semantic features, which performs distillation by masking the features affected by the background. Furthermore, a mask-guided context distillation (MCD) is presented to explore global context information lying in high-level semantic features. Based on them, our LSKD assigns different distillation schemes according to feature characteristics. To adjust the effect of layer-specific distillation adaptively, LSKD introduces a regularized gradient equilibrium method to learn dynamic trade-off weights. Additionally, our LSKD makes an attempt to simultaneously learn distillation schemes and trade-off weights of different layers by developing a bi-level optimization method. Extensive experiments on widely used Pascal VOC 12 and ADE20K show our LSKD clearly outperforms its counterparts while achieving state-of-the-art results.

6.

A self-supervised network for image denoising and watermark removal.

Tian, Chunwei; Xiao, Jingyu; Zhang, Bob; Zuo, Wangmeng; Zhang, Yudong; Lin, Chia-Wen.

Neural Netw ; 174: 106218, 2024 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-38518709

RESUMO

In image watermark removal, popular methods depend on given reference non-watermark images in a supervised way to remove watermarks. However, reference non-watermark images are difficult to be obtained in the real world. At the same time, they often suffer from the influence of noise when captured by digital devices. To resolve these issues, in this paper, we present a self-supervised network for image denoising and watermark removal (SSNet). SSNet uses a parallel network in a self-supervised learning way to remove noise and watermarks. Specifically, each sub-network contains two sub-blocks. The upper sub-network uses the first sub-block to remove noise, according to noise-to-noise. Then, the second sub-block in the upper sub-network is used to remove watermarks, according to the distributions of watermarks. To prevent the loss of important information, the lower sub-network is used to simultaneously learn noise and watermarks in a self-supervised learning way. Moreover, two sub-networks interact via attention to extract more complementary salient information. The proposed method does not depend on paired images to learn a blind denoising and watermark removal model, which is very meaningful for real applications. Also, it is more effective than the popular image watermark removal methods in public datasets. Codes can be found at https://github.com/hellloxiaotian/SSNet.

7.

An Intermediate-Level Attack Framework on the Basis of Linear Regression.

Guo, Yiwen; Li, Qizhang; Zuo, Wangmeng; Chen, Hao.

IEEE Trans Pattern Anal Mach Intell ; 45(3): 2726-2735, 2023 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-35786551

RESUMO

This article substantially extends our work published at ECCV (Li et al., 2020), in which an intermediate-level attack was proposed to improve the transferability of some baseline adversarial examples. Specifically, we advocate a framework in which a direct linear mapping from the intermediate-level discrepancies (between adversarial features and benign features) to prediction loss of the adversarial example is established. By delving deep into the core components of such a framework, we show that a variety of linear regression models can all be considered in order to establish the mapping, the magnitude of the finally obtained intermediate-level adversarial discrepancy is correlated with the transferability, and further boost of the performance can be achieved by performing multiple runs of the baseline attack with random initialization. In addition, by leveraging these findings, we achieve new state-of-the-arts on transfer-based l∞ and l2 attacks. Our code is publicly available at https://github.com/qizhangli/ila-plus-plus-lr.

8.

Learning Context-Based Nonlocal Entropy Modeling for Image Compression.

Li, Mu; Zhang, Kai; Li, Jinxing; Zuo, Wangmeng; Timofte, Radu; Zhang, David.

IEEE Trans Neural Netw Learn Syst ; 34(3): 1132-1145, 2023 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-34428157

RESUMO

The entropy of the codes usually serves as the rate loss in the recent learned lossy image compression methods. Precise estimation of the probabilistic distribution of the codes plays a vital role in reducing the entropy and boosting the joint rate-distortion performance. However, existing deep learning based entropy models generally assume the latent codes are statistically independent or depend on some side information or local context, which fails to take the global similarity within the context into account and thus hinders the accurate entropy estimation. To address this issue, we propose a special nonlocal operation for context modeling by employing the global similarity within the context. Specifically, due to the constraint of context, nonlocal operation is incalculable in context modeling. We exploit the relationship between the code maps produced by deep neural networks and introduce the proxy similarity functions as a workaround. Then, we combine the local and the global context via a nonlocal attention block and employ it in masked convolutional networks for entropy modeling. Taking the consideration that the width of the transforms is essential in training low distortion models, we finally produce a U-net block in the transforms to increase the width with manageable memory consumption and time complexity. Experiments on Kodak and Tecnick datasets demonstrate the priority of the proposed context-based nonlocal attention block in entropy modeling and the U-net block in low distortion situations. On the whole, our model performs favorably against the existing image compression standards and recent deep image compression models.

9.

Learning Dual Memory Dictionaries for Blind Face Restoration.

Li, Xiaoming; Zhang, Shiguang; Zhou, Shangchen; Zhang, Lei; Zuo, Wangmeng.

IEEE Trans Pattern Anal Mach Intell ; 45(5): 5904-5917, 2023 May.

Artigo em Inglês | MEDLINE | ID: mdl-36251909

RESUMO

Blind face restoration is a challenging task due to the unknown, unsynthesizable and complex degradation, yet is valuable in many practical applications. To improve the performance of blind face restoration, recent works mainly treat the two aspects, i.e., generic and specific restoration, separately. In particular, generic restoration attempts to restore the results through general facial structure prior, while on the one hand, cannot generalize to real-world degraded observations due to the limited capability of direct CNNs' mappings in learning blind restoration, and on the other hand, fails to exploit the identity-specific details. On the contrary, specific restoration aims to incorporate the identity features from the reference of the same identity, in which the requirement of proper reference severely limits the application scenarios. Generally, it is a challenging and intractable task to improve the photo-realistic performance of blind restoration and adaptively handle the generic and specific restoration scenarios with a single unified model. Instead of implicitly learning the mapping from a low-quality image to its high-quality counterpart, this paper suggests a DMDNet by explicitly memorizing the generic and specific features through dual dictionaries. First, the generic dictionary learns the general facial priors from high-quality images of any identity, while the specific dictionary stores the identity-belonging features for each person individually. Second, to handle the degraded input with or without specific reference, dictionary transform module is suggested to read the relevant details from the dual dictionaries which are subsequently fused into the input features. Finally, multi-scale dictionaries are leveraged to benefit the coarse-to-fine restoration. The whole framework including the generic and specific dictionaries is optimized in an end-to-end manner and can be flexibly plugged into different application scenarios. Moreover, a new high-quality dataset, termed CelebRef-HQ, is constructed to promote the exploration of specific face restoration in the high-resolution space. Experimental results demonstrate that the proposed DMDNet performs favorably against the state of the arts in both quantitative and qualitative evaluation, and generates more photo-realistic results on the real-world low-quality images. The codes, models and the CelebRef-HQ dataset will be publicly available at https://github.com/csxmli2016/DMDNet.

10.

Infrared Small and Dim Target Detection With Transformer Under Complex Backgrounds.

Liu, Fangcen; Gao, Chenqiang; Chen, Fang; Meng, Deyu; Zuo, Wangmeng; Gao, Xinbo.

IEEE Trans Image Process ; 32: 5921-5932, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-37883292

RESUMO

The infrared small and dim (S&D) target detection is one of the key techniques in the infrared search and tracking system. Since the local regions similar to infrared S&D targets spread over the whole background, exploring the correlation amongst image features in large-range dependencies to mine the difference between the target and background is crucial for robust detection. However, existing deep learning-based methods are limited by the locality of convolutional neural networks, which impairs the ability to capture large-range dependencies. Additionally, the S&D appearance of the infrared target makes the detection model highly possible to miss detection. To this end, we propose a robust and general infrared S&D target detection method with the transformer. We adopt the self-attention mechanism of the transformer to learn the correlation of image features in a larger range. Moreover, we design a feature enhancement module to learn discriminative features of S&D targets to avoid miss-detections. After that, to avoid the loss of the target information, we adopt a decoder with the U-Net-like skip connection operation to contain more information of S&D targets. Finally, we get the detection result by a segmentation head. Extensive experiments on two public datasets show the obvious superiority of the proposed method over state-of-the-art methods, and the proposed method has a stronger generalization ability and better noise tolerance.

11.

Localization Distillation for Object Detection.

Zheng, Zhaohui; Ye, Rongguang; Hou, Qibin; Ren, Dongwei; Wang, Ping; Zuo, Wangmeng; Cheng, Ming-Ming.

IEEE Trans Pattern Anal Mach Intell ; 45(8): 10070-10083, 2023 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-37027640

RESUMO

Previous knowledge distillation (KD) methods for object detection mostly focus on feature imitation instead of mimicking the prediction logits due to its inefficiency in distilling the localization information. In this paper, we investigate whether logit mimicking always lags behind feature imitation. Towards this goal, we first present a novel localization distillation (LD) method which can efficiently transfer the localization knowledge from the teacher to the student. Second, we introduce the concept of valuable localization region that can aid to selectively distill the classification and localization knowledge for a certain region. Combining these two new components, for the first time, we show that logit mimicking can outperform feature imitation and the absence of localization distillation is a critical reason for why logit mimicking under-performs for years. The thorough studies exhibit the great potential of logit mimicking that can significantly alleviate the localization ambiguity, learn robust feature representation, and ease the training difficulty in the early stage. We also provide the theoretical connection between the proposed LD and the classification KD, that they share the equivalent optimization effect. Our distillation scheme is simple as well as effective and can be easily applied to both dense horizontal object detectors and rotated object detectors. Extensive experiments on the MS COCO, PASCAL VOC, and DOTA benchmarks demonstrate that our method can achieve considerable AP improvement without any sacrifice on the inference speed. Our source code and pretrained models are publicly available at https://github.com/HikariTJU/LD.

Assuntos

Algoritmos , Benchmarking , Humanos , Aprendizagem , Software

12.

Multi-View Knowledge Ensemble With Frequency Consistency for Cross-Domain Face Translation.

Cao, Bing; Wang, Qinghe; Zhu, Pengfei; Hu, Qinghua; Ren, Dongwei; Zuo, Wangmeng; Gao, Xinbo.

IEEE Trans Neural Netw Learn Syst ; PP2023 Jan 18.

Artigo em Inglês | MEDLINE | ID: mdl-37021850

RESUMO

Cross-domain face translation aims to transfer face images from one domain to another. It can be widely used in practical applications, such as photos/sketches in law enforcement, photos/drawings in digital entertainment, and near-infrared (NIR)/visible (VIS) images in security access control. Restricted by limited cross-domain face image pairs, the existing methods usually yield structural deformation or identity ambiguity, which leads to poor perceptual appearance. To address this challenge, we propose a multi-view knowledge (structural knowledge and identity knowledge) ensemble framework with frequency consistency (MvKE-FC) for cross-domain face translation. Due to the structural consistency of facial components, the multi-view knowledge learned from large-scale data can be appropriately transferred to limited cross-domain image pairs and significantly improve the generative performance. To better fuse multi-view knowledge, we further design an attention-based knowledge aggregation module that integrates useful information, and we also develop a frequency-consistent (FC) loss that constrains the generated images in the frequency domain. The designed FC loss consists of a multidirection Prewitt (mPrewitt) loss for high-frequency consistency and a Gaussian blur loss for low-frequency consistency. Furthermore, our FC loss can be flexibly applied to other generative models to enhance their overall performance. Extensive experiments on multiple cross-domain face datasets demonstrate the superiority of our method over state-of-the-art methods both qualitatively and quantitatively.

13.

Towards a Deeper Understanding of Global Covariance Pooling in Deep Learning: An Optimization Perspective.

Wang, Qilong; Zhang, Zhaolin; Gao, Mingze; Xie, Jiangtao; Zhu, Pengfei; Li, Peihua; Zuo, Wangmeng; Hu, Qinghua.

IEEE Trans Pattern Anal Mach Intell ; 45(12): 15802-15819, 2023 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-37782579

RESUMO

Global covariance pooling (GCP) as an effective alternative to global average pooling has shown good capacity to improve deep convolutional neural networks (CNNs) in a variety of vision tasks. Although promising performance, it is still an open problem on how GCP (especially its post-normalization) works in deep learning. In this paper, we make the effort towards understanding the effect of GCP on deep learning from an optimization perspective. Specifically, we first analyze behavior of GCP with matrix power normalization on optimization loss and gradient computation of deep architectures. Our findings show that GCP can improve Lipschitzness of optimization loss and achieve flatter local minima, while improving gradient predictiveness and functioning as a special pre-conditioner on gradients. Then, we explore the effect of post-normalization on GCP from the model optimization perspective, which encourages us to propose a simple yet effective normalization, namely DropCov. Based on above findings, we point out several merits of deep GCP that have not been recognized previously or fully explored, including faster convergence, stronger model robustness and better generalization across tasks. Extensive experimental results using both CNNs and vision transformers on diversified vision tasks provide strong support to our findings while verifying the effectiveness of our method.

14.

Deepfake Forensics via an Adversarial Game.

Wang, Zhi; Guo, Yiwen; Zuo, Wangmeng.

IEEE Trans Image Process ; 31: 3541-3552, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35544507

RESUMO

With the progress in AI-based facial forgery (i.e., deepfake), people are concerned about its abuse. Albeit effort has been made for training models to recognize such forgeries, existing models suffer from poor generalization to unseen forgery technologies and high sensitivity to changes in image/video quality. In this paper, we advocate robust training for improving the generalization ability. We believe training with samples that are adversarially crafted to attack the classification models improves the generalization ability considerably. Considering that AI-based face manipulation often leads to high-frequency artifacts that can be easily spotted (by models) yet difficult to generalize, we further propose a new adversarial training method that attempts to blur out these artifacts, by introducing pixel-wise Gaussian blurring. Plenty of empirical evidence show that, with adversarial training, models are forced to learn more discriminative and generalizable features. Our code: https://github.com/ah651/deepfake_adv.

15.

Delving Deeper Into Pixel Prior for Box-Supervised Semantic Segmentation.

Ma, Tianqi; Wang, Qilong; Zhang, Hongzhi; Zuo, Wangmeng.

IEEE Trans Image Process ; 31: 1406-1417, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35038294

RESUMO

Weakly supervised semantic segmentation (WSSS) based on bounding box annotations has attracted considerable recent attention and has achieved promising performance. However, most of existing methods focus on generation of high-quality pseudo labels for segmented objects using box indicators, but they fail to fully explore and exploit prior from bounding box annotations, which limits performance of WSSS methods, especially for fine parts and boundaries. To overcome above issues, this paper proposes a novel Pixel-as-Instance Prior (PIP) for WSSS methods by delving deeper into pixel prior from bounding box annotations. Specifically, the proposed PIP is built on two important observations on pixels around bounding boxes. First, since objects are usually irregularity and tightly close to bounding boxes (dubbed irregular-filling prior), so each row or column of bounding boxes basically have at least one pixel belonging to foreground objects and background, respectively. Second, pixels near the bounding boxes tend to be highly ambiguous and more difficult to classify (dubbed label-ambiguity prior). To implement our PIP, a constrained loss alike multiple instance learning (MIL) and a labeling-balance loss are developed to jointly train WSSS models, which regards each pixel as a weighted positive or negative instance while considering more effective prior (i.e., irregular-filling and label-ambiguity priors) from bounding box annotations in an efficient way. Note that our PIP can be flexibly integrated with various WSSS methods, while clearly improving their performance with negligible computational overload in training stage. The experiments are conducted on most widely used PASCAL VOC 2012 and Cityscapes benchmarks, and the results show that our PIP has a good ability to improve performance of various WSSS methods, while achieving very competitive results.

16.

Learning a Prototype Discriminator With RBF for Multimodal Image Synthesis.

Bi, Zhiwei; Cao, Bing; Zuo, Wangmeng; Hu, Qinghua.

IEEE Trans Image Process ; 31: 6664-6678, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-36260596

RESUMO

Multimodal image synthesis has emerged as a viable solution to the modality missing challenge. Most existing approaches employ softmax-based classifiers to provide modal constraints for the generated models. These methods, however, focus on learning to distinguish inter-domain differences while failing to build intra-domain compactness, resulting in inferior synthetic results. To provide sufficient domain-specific constraint, we hereby introduce a novel prototype discriminator for generative adversarial network (PT-GAN) to effectively estimate the missing or noisy modalities. Different from most previous works, we introduce the Radial Basis Function (RBF) network, endowing the discriminator with domain-specific prototypes, to improve the optimization of generative model. Since the prototype learning extracts more discriminative representation of each domain, and emphasizes intra-domain compactness, it reduces the sensitivity of discriminator to pixel changes in generated images. To address this dilemma, we further propose a reconstructive regularization term which connects the discriminator with the generator, thus enhancing its pixel detectability. To this end, the proposed PT-GAN provides not only consistent domain-specific constraints, but also reasonable uncertainty estimation of generated images with the RBF distance. Experimental results show that our method outperforms the state-of-the-art techniques. The source code will be available at: https://github.com/zhiweibi/PT-GAN.

17.

CrabNet: Fully Task-Specific Feature Learning for One-Stage Object Detection.

Wang, Hao; Wang, Qilong; Zhang, Hongzhi; Hu, Qinghua; Zuo, Wangmeng.

IEEE Trans Image Process ; 31: 2962-2974, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35353700

RESUMO

Object detection is usually solved by learning a deep architecture involving classification and localization tasks, where feature learning for these two tasks is shared using the same backbone model. Recent works have shown that suitable disentanglement of classification and localization tasks has the great potential to improve performance of object detection. Despite the promising performance, existing feature disentanglement methods usually suffer from two limitations. First, most of them only focus on the disentangled proposals or predication heads for classification and localization tasks after RPN. While little consideration has been given to that the features for these two different tasks actually are obtained by a shared backbone model before RPN. Second, they are suggested for two-stage objectors and are not applicable to one-stage methods. To overcome these limitations, this paper presents a novel fully task-specific feature learning method for one-stage object detection. Specifically, our method first learns disentangled features for classification and localization tasks using two separated backbone models, where auxiliary classification and localization heads are inserted at the end of the two backbone models for providing a fully task-specific features for classification and localization. Then, a feature interaction module is developed for aligning and fusing task-specific features, which are further used to produce the final detection result. Experiments on MS COCO show that our proposed method (dubbed CrabNet) can achieve clear improvement over counterparts with increasing limited inference time, while performing favorably against state-of-the-arts.

18.

Deep Cognitive Gate: Resembling Human Cognition for Saliency Detection.

Yan, Ke; Wang, Xiuying; Kim, Jinman; Zuo, Wangmeng; Feng, Dagan.

IEEE Trans Pattern Anal Mach Intell ; 44(9): 4776-4792, 2022 09.

Artigo em Inglês | MEDLINE | ID: mdl-33755558

RESUMO

Saliency detection by human refers to the ability to identify pertinent information using our perceptive and cognitive capabilities. While human perception is attracted by visual stimuli, our cognitive capability is derived from the inspiration of constructing concepts of reasoning. Saliency detection has gained intensive interest with the aim of resembling human 'perceptual' system. However, saliency related to human 'cognition', particularly the analysis of complex salient regions ('cogitating' process), is yet to be fully exploited. We propose to resemble human cognition, coupled with human perception, to improve saliency detection. We recognize saliency in three phases ('Seeing' - 'Perceiving' - 'Cogitating), mimicking human's perceptive and cognitive thinking of an image. In our method, 'Seeing' phase is related to human perception, and we formulate the 'Perceiving' and 'Cogitating' phases related to the human cognition systems via deep neural networks (DNNs) to construct a new module (Cognitive Gate) that enhances the DNN features for saliency detection. To the best of our knowledge, this is the first work that established DNNs to resemble human cognition for saliency detection. In our experiments, our approach outperformed 17 benchmarking DNN methods on six well-recognized datasets, demonstrating that resembling human cognition improves saliency detection.

Assuntos

Algoritmos , Redes Neurais de Computação , Cognição , Humanos

19.

Offset-decoupled deformable convolution for efficient crowd counting.

Zhong, Xin; Qin, Jing; Guo, Mingyue; Zuo, Wangmeng; Lu, Weigang.

Sci Rep ; 12(1): 12229, 2022 07 18.

Artigo em Inglês | MEDLINE | ID: mdl-35851829

RESUMO

Crowd counting is considered a challenging issue in computer vision. One of the most critical challenges in crowd counting is considering the impact of scale variations. Compared with other methods, better performance is achieved with CNN-based methods. However, given the limit of fixed geometric structures, the head-scale features are not completely obtained. Deformable convolution with additional offsets is widely used in the fields of image classification and pattern recognition, as it can successfully exploit the potential of spatial information. However, owing to the randomly generated parameters of offsets in network initialization, the sampling points of the deformable convolution are disorderly stacked, weakening the effectiveness of feature extraction. To handle the invalid learning of offsets and the inefficient utilization of deformable convolution, an offset-decoupled deformable convolution (ODConv) is proposed in this paper. It can completely obtain information within the effective region of sampling points, leading to better performance. In extensive experiments, average MAE of 62.3, 8.3, 91.9, and 159.3 are achieved using our method on the ShanghaiTech A, ShanghaiTech B, UCF-QNRF, and UCF_CC_50 datasets, respectively, outperforming the state-of-the-art methods and validating the effectiveness of the proposed ODConv.

Assuntos

Algoritmos

20.

Image super-resolution with an enhanced group convolutional neural network.

Tian, Chunwei; Yuan, Yixuan; Zhang, Shichao; Lin, Chia-Wen; Zuo, Wangmeng; Zhang, David.

Neural Netw ; 153: 373-385, 2022 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-35779445

RESUMO

CNNs with strong learning abilities are widely chosen to resolve super-resolution problem. However, CNNs depend on deeper network architectures to improve performance of image super-resolution, which may increase computational cost in general. In this paper, we present an enhanced super-resolution group CNN (ESRGCNN) with a shallow architecture by fully fusing deep and wide channel features to extract more accurate low-frequency information in terms of correlations of different channels in single image super-resolution (SISR). Also, a signal enhancement operation in the ESRGCNN is useful to inherit more long-distance contextual information for resolving long-term dependency. An adaptive up-sampling operation is gathered into a CNN to obtain an image super-resolution model with low-resolution images of different sizes. Extensive experiments report that our ESRGCNN surpasses the state-of-the-arts in terms of SISR performance, complexity, execution speed, image quality evaluation and visual effect in SISR. Code is found at https://github.com/hellloxiaotian/ESRGCNN.

Assuntos

Processamento de Imagem Assistida por Computador , Redes Neurais de Computação , Processamento de Imagem Assistida por Computador/métodos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa