Results 1 - 19 of 19
1.
IEEE Trans Image Process ; 33: 4840-4852, 2024.
Article in English | MEDLINE | ID: mdl-39042525

ABSTRACT

Spiking neural networks (SNNs), which efficiently encode temporal sequences, have shown great potential for extracting audio-visual joint feature representations. However, coupling SNNs (binary spike sequences) with transformers (floating-point sequences) to jointly explore temporal-semantic information still faces challenges. In this paper, we introduce a novel Spiking Tucker Fusion Transformer (STFT) for audio-visual zero-shot learning (ZSL). The STFT leverages temporal and semantic information from different time steps to generate robust representations. A time-step factor (TSF) is introduced to dynamically synthesize the subsequent inference information. To guide the formation of input membrane potentials and reduce spike noise, we propose a global-local pooling (GLP) scheme that combines max and average pooling operations. Furthermore, the thresholds of the spiking neurons are dynamically adjusted based on semantic and temporal cues. Integrating the temporal and semantic information extracted by SNNs and Transformers is difficult because a straightforward bilinear model requires a large number of parameters. To address this, we introduce a temporal-semantic Tucker fusion module, which achieves multi-scale fusion of SNN and Transformer outputs while maintaining full second-order interactions. Our experimental results demonstrate the effectiveness of the proposed approach in achieving state-of-the-art performance on three benchmark datasets. The harmonic mean (HM) improvements on VGGSound, UCF101, and ActivityNet are around 15.4%, 3.9%, and 14.9%, respectively.
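The parameter argument behind the Tucker fusion module — let a small core tensor carry the full second-order interaction between low-rank projections of the two modalities, instead of a full bilinear map — can be sketched as follows. This is only an illustrative NumPy sketch; the shapes, factor matrices, and the multi-scale, time-step-aware handling of the actual STFT are assumptions, not the paper's implementation.

```python
import numpy as np

def tucker_fusion(snn_feat, trans_feat, core, U_s, U_t, U_o):
    """Fuse an SNN feature and a Transformer feature through a Tucker-style
    bilinear model: project each modality to a low rank, contract with a
    small core tensor (full second-order interaction), then map to the
    output space. All shapes are illustrative."""
    s = U_s @ snn_feat              # (r_s,)  low-rank SNN projection
    t = U_t @ trans_feat            # (r_t,)  low-rank Transformer projection
    fused = np.einsum('i,j,ijk->k', s, t, core)   # contract with core (r_s, r_t, r_o)
    return U_o @ fused              # (d_out,) fused embedding

rng = np.random.default_rng(0)
snn_feat = rng.normal(size=512)     # e.g. time-averaged SNN output (hypothetical)
trans_feat = rng.normal(size=512)   # e.g. Transformer token feature (hypothetical)
core = rng.normal(size=(32, 32, 64)) * 0.01
U_s, U_t = rng.normal(size=(32, 512)), rng.normal(size=(32, 512))
U_o = rng.normal(size=(256, 64))
print(tucker_fusion(snn_feat, trans_feat, core, U_s, U_t, U_o).shape)   # (256,)
```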

2.
IEEE Trans Image Process ; 33: 3634-3647, 2024.
Article in English | MEDLINE | ID: mdl-38809732

ABSTRACT

For capturing dynamic scenes with ultra-fast motion, neuromorphic cameras with extremely high temporal resolution have demonstrated great capability and potential. Different from event cameras, which only record relative changes in light intensity, a spike camera fires a stream of spikes according to a full-time accumulation of photons, so it can recover texture details in both static and dynamic areas. Recently, a color spike camera has been invented to record color information of dynamic scenes using a color filter array (CFA). However, demosaicing for color spike cameras is an open and challenging problem. In this paper, we develop a demosaicing network, called CSpkNet, to reconstruct dynamic color visual signals from the spike stream captured by the color spike camera. Firstly, we develop a light inference module to convert binary spike streams into intensity estimates. In particular, a feature-based channel attention module is proposed to reduce the noise caused by quantization errors. Secondly, considering both the Bayer configuration and object motion, we propose a motion-guided filtering module to estimate the missing pixels of each color channel without undesired motion blur. Finally, we design a refinement module to improve the intensity and details, utilizing the color correlation. Experimental results demonstrate that CSpkNet can reconstruct color images from the Bayer-pattern spike stream with promising visual quality.
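For orientation, the naive baseline for the light inference step is a pure firing-rate estimate over a short temporal window: the denser the spikes, the brighter the pixel. The sketch below shows only this baseline (the window length is a hypothetical choice); CSpkNet's learned light inference and channel attention modules replace it.

```python
import numpy as np

def spikes_to_intensity(spike_stream, window=32):
    """Naive light inference from a binary spike stream: over a short
    temporal window, the firing rate approximates the photon accumulation
    rate and hence the pixel intensity. This is only the crude baseline;
    'window' is a hypothetical choice."""
    s = np.asarray(spike_stream, dtype=float)   # shape (T, H, W), entries 0/1
    return s[-window:].mean(axis=0)             # per-pixel firing rate in [0, 1]

# Toy usage: a brighter pixel fires more often than a darker one.
rng = np.random.default_rng(1)
stream = (rng.random((128, 2, 2)) < np.array([[0.8, 0.2], [0.5, 0.1]])).astype(np.uint8)
print(np.round(spikes_to_intensity(stream), 2))
```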

3.
Article in English | MEDLINE | ID: mdl-38215317

ABSTRACT

Video super-resolution (VSR) composes high-resolution (HR) video from low-resolution video. Recently, deformable alignment-based VSR methods have become increasingly popular. In these methods, the features extracted from the video are aligned to eliminate motion error and achieve high super-resolution (SR) quality. However, these methods often suffer from misalignment and from a lack of sufficient temporal information to compose HR frames, which induces artifacts in the SR result. In this article, we design a deep VSR network (DVSRNet) based on the proposed progressive deformable alignment (PDA) module and temporal-sparse enhancement (TSE) module. Specifically, the PDA module is designed to accurately align features and to eliminate artifacts via bidirectional information propagation. The TSE module is constructed to further eliminate artifacts and to generate clear details for the HR frames. In addition, we construct a lightweight deep optical flow network (OFNet) to obtain the bidirectional optical flows required by the PDA module. Moreover, two new loss functions are designed for the proposed method: the first is adopted in OFNet, and the second guarantees the generation of sharp and clear details for the HR frames. The experimental results demonstrate that our method performs better than the state-of-the-art methods.
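Bidirectional flow-guided warping is the basic operation that OFNet's optical flows enable. The sketch below shows a plain bilinear warp of neighbor-frame features as a simpler stand-in; the actual PDA module aligns features progressively with deformable offsets rather than a single warp.

```python
import torch
import torch.nn.functional as F

def warp_by_flow(feat, flow):
    """Warp neighbor-frame features toward the reference frame with bilinear
    sampling, guided by a dense optical flow field.
    feat: (B, C, H, W) features; flow: (B, 2, H, W) displacements in pixels."""
    b, c, h, w = feat.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xx, yy), dim=0).float().to(feat.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow                             # absolute sample coords
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0                 # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                          # (B, H, W, 2)
    return F.grid_sample(feat, grid, mode='bilinear', align_corners=True)

feat = torch.randn(1, 8, 16, 16)
flow = torch.zeros(1, 2, 16, 16)       # zero flow -> identity warp
print(torch.allclose(warp_by_flow(feat, flow), feat, atol=1e-5))
```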

4.
IEEE Trans Image Process ; 32: 3493-3506, 2023.
Article in English | MEDLINE | ID: mdl-37335802

ABSTRACT

Intra prediction is a crucial part of video compression; it utilizes local information in images to eliminate spatial redundancy. As the state-of-the-art video coding standard, Versatile Video Coding (H.266/VVC) employs multiple directional prediction modes in intra prediction to find the texture trend of local areas, and the prediction is then made from reference samples along the selected direction. Recently, neural network-based intra prediction has achieved great success: deep network models are trained and applied to assist the HEVC and VVC intra modes. In this paper, we propose a novel tree-structured, data clustering-driven neural network (dubbed TreeNet) for intra prediction, which builds the networks and clusters the training data in a tree-structured manner. Specifically, in each network split and training process of TreeNet, every parent network on a leaf node is split into two child networks by adding or subtracting Gaussian random noise. Data clustering-driven training is then applied to train the two derived child networks using the clustered training data of their parent. On the one hand, the networks at the same level of TreeNet are trained with non-overlapping clustered datasets, so they learn different prediction abilities. On the other hand, the networks at different levels are trained with hierarchically clustered datasets, so they have different generalization abilities. To test its performance, TreeNet is integrated into VVC to assist or replace the intra prediction modes, and a fast termination strategy is proposed to accelerate the search of TreeNet. The experimental results demonstrate that when TreeNet with a depth of 3 is used to assist the VVC intra modes, it brings an average bitrate saving of 3.78% (up to 8.12%) over VTM-17.0. If TreeNet with the same depth replaces all VVC intra modes, an average bitrate saving of 1.59% can be reached.
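One split step of the tree-structured training can be illustrated with a toy stand-in, shown below: two children are derived from a parent by adding and subtracting Gaussian noise, and each training pair goes to the child that already fits it better, yielding the non-overlapping clusters described above. A plain linear predictor and an arbitrary noise scale stand in for the real prediction network; this is not the paper's training code.

```python
import numpy as np

def split_and_cluster(parent_w, X, Y, sigma=0.01, seed=0):
    """One hypothetical TreeNet-style split: derive two child predictors by
    adding/subtracting Gaussian noise to the parent weights, then send each
    training pair to the child that predicts it with lower error."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=parent_w.shape)
    child_a, child_b = parent_w + noise, parent_w - noise
    err_a = np.sum((X @ child_a - Y) ** 2, axis=1)
    err_b = np.sum((X @ child_b - Y) ** 2, axis=1)
    to_a = err_a <= err_b
    return (child_a, X[to_a], Y[to_a]), (child_b, X[~to_a], Y[~to_a])

rng = np.random.default_rng(1)
X, Y = rng.normal(size=(1000, 16)), rng.normal(size=(1000, 4))
parent_w = rng.normal(size=(16, 4))
(wa, Xa, Ya), (wb, Xb, Yb) = split_and_cluster(parent_w, X, Y)
print(len(Xa), len(Xb))   # non-overlapping clusters for the two children
```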


Subjects
Data Compression; Neural Networks, Computer; Cluster Analysis
5.
IEEE Trans Med Imaging ; 42(3): 619-632, 2023 03.
Article in English | MEDLINE | ID: mdl-36279355

ABSTRACT

Lesion recognition in dermoscopy images is significant for automated skin cancer diagnosis. Most existing methods ignore the medical perspective, which is crucial since this task requires a large amount of medical knowledge. A few methods are designed according to medical knowledge, but they fail to follow doctors' entire learning and diagnosis process, in which particular strategies and steps are applied in practice. Thus, we put forward the Clinical-Inspired Network (CI-Net), which incorporates doctors' learning strategy and diagnosis process for better analysis. The diagnostic process contains three main steps: zooming, observing, and comparing. To simulate these, we introduce three corresponding modules: a lesion area attention module, a feature extraction module, and a lesion feature attention module. To simulate the distinguishing strategy commonly used by doctors, we introduce a distinguish module. We evaluate the proposed CI-Net on six challenging datasets, including the ISIC 2016, ISIC 2017, ISIC 2018, ISIC 2019, ISIC 2020, and PH2 datasets, and the results indicate that CI-Net outperforms existing work. The code is publicly available at https://github.com/lzh19961031/Dermoscopy_classification.


Subjects
Skin Diseases; Skin Neoplasms; Humans; Neural Networks, Computer; Image Processing, Computer-Assisted/methods; Dermoscopy/methods; Skin Diseases/diagnostic imaging; Skin Neoplasms/diagnostic imaging
6.
IEEE Trans Med Imaging ; 41(11): 3398-3410, 2022 11.
Article in English | MEDLINE | ID: mdl-35767510

ABSTRACT

Medical image segmentation is fundamental and essential for the analysis of medical images. Although convolutional neural networks (CNNs) have achieved widespread success, two challenges remain in medical image analysis: 1) the lack of discriminative features to handle similar textures of distinct structures and 2) the lack of selective features for potentially blurred boundaries in medical images. In this paper, we extend the concept of contrastive learning (CL) to the segmentation task to learn more discriminative representations. Specifically, we propose a novel patch-dragsaw contrastive regularization (PDCR) to perform patch-level tugging and repulsing. In addition, a new structure, namely the uncertainty-aware feature re-weighting block (UAFR), is designed to address potential high-uncertainty regions in the feature maps and to serve as a better feature re-weighting. Our proposed method achieves state-of-the-art results across 8 public datasets from 6 domains. Besides, the method also demonstrates robustness in the limited-data scenario. The code is publicly available at https://github.com/lzh19961031/PDCR_UAFR-MIS.
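As a rough stand-in for the patch-level "tugging and repulsing", the sketch below uses a standard supervised InfoNCE loss over patch embeddings: same-label patches attract and different-label patches repel. The exact patch-dragsaw formulation of PDCR differs; the temperature and tensor shapes here are assumptions.

```python
import torch
import torch.nn.functional as F

def patch_contrastive_loss(patch_emb, labels, tau=0.1):
    """Stand-in for patch-level contrastive regularization: a supervised
    InfoNCE over patch embeddings in which same-label patches attract
    ('tug') and different-label patches repel. Not the PDCR formulation."""
    z = F.normalize(patch_emb, dim=1)                       # (N, D) unit vectors
    sim = z @ z.t() / tau                                   # (N, N) similarities
    pos = (labels[:, None] == labels[None, :]).float()
    pos.fill_diagonal_(0)                                   # exclude self-pairs
    logits = sim - 1e9 * torch.eye(len(z))                  # mask self in softmax
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

emb = torch.randn(16, 64, requires_grad=True)               # 16 patch embeddings
labels = torch.randint(0, 2, (16,))                         # e.g. foreground/background
print(patch_contrastive_loss(emb, labels).item())
```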


Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Image Processing, Computer-Assisted/methods
7.
Article in English | MEDLINE | ID: mdl-32149688

ABSTRACT

Block-transform coded images usually suffer from annoying artifacts at low bit-rates because of the independent quantization of DCT coefficients. Image prior models play an important role in compressed image reconstruction. Natural image patches in a small neighborhood of the high-dimensional image space usually exhibit an underlying sub-manifold structure. To model the signal distribution, we extract this sub-manifold structure as prior knowledge and utilize graph Laplacian regularization to characterize it at the patch level. Similar patches are exploited as samples to estimate the distribution of a particular patch. Instead of using the Euclidean distance as the similarity metric, we propose to use a graph-domain distance to measure patch similarity. We then perform low-rank regularization on each similar-patch group and incorporate a non-convex lp penalty as a surrogate for the matrix rank. Finally, an alternating minimization strategy is employed to solve the non-convex problem. Experimental results show that the proposed method achieves more accurate reconstruction than state-of-the-art methods in both objective and perceptual quality.
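The low-rank step on a similar-patch group can be pictured as shrinking singular values with a non-convex lp threshold. The sketch below uses a generic generalized soft-thresholding iteration with illustrative parameters; the paper's actual solver and the graph Laplacian term are not reproduced.

```python
import numpy as np

def lp_lowrank_denoise(patch_group, lam=0.05, p=0.5, iters=3):
    """Hedged sketch of the non-convex low-rank step: shrink the singular
    values of a similar-patch matrix with a generalized (l_p, 0<p<1)
    threshold via a few fixed-point iterations, then rebuild the group."""
    U, s, Vt = np.linalg.svd(patch_group, full_matrices=False)
    s_hat = s.copy()
    for _ in range(iters):        # generalized soft-thresholding iteration
        s_hat = np.maximum(s - lam * p * np.power(s_hat + 1e-12, p - 1), 0.0)
    return (U * s_hat) @ Vt       # scale columns of U by shrunken singular values

group = np.random.default_rng(0).normal(size=(64, 20))    # 20 similar patches as columns
print(np.linalg.norm(group - lp_lowrank_denoise(group)))  # shrinkage changes the group
```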

8.
IEEE Trans Image Process ; 27(10): 4987-5001, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29985138

ABSTRACT

Transform and quantization account for a considerable amount of computation time in the video encoding process. However, a large number of discrete cosine transform coefficients are finally quantized into zeros. In essence, blocks whose quantized coefficients are all zero do not transmit any information, yet still occupy substantial unnecessary computational resources. As such, detecting all-zero blocks (AZBs) before transform and quantization has been recognized as an efficient approach to speed up the encoding process. Instead of considering hard-decision quantization (HDQ) only, in this paper we incorporate the properties of soft-decision quantization into AZB detection. In particular, we categorize AZBs into genuine AZBs (G-AZBs) and pseudo AZBs (P-AZBs) to distinguish their origins. For G-AZBs directly generated from HDQ, a sum-of-absolute-transformed-difference-based approach is adopted for early termination. For the classification of P-AZBs, which arise in the sense of rate-distortion optimization, rate-distortion models established from the transform coefficients, together with adaptive searching of the maximum transform coefficient, are jointly employed for the discrimination. Experimental results show that our algorithm can achieve up to 24.16% transform and quantization time savings with less than 0.06% RD performance loss. The total encoder time saving is about 5.18% on average, with a maximum of 9.12%. Moreover, the detection accuracy for larger TU sizes can reach 95% on average.
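The G-AZB early-termination idea — compare a cheap SATD measure of the residual against a quantization-step-dependent threshold before running the transform — can be sketched as follows. The threshold form and scaling constant below are hypothetical, not the paper's rate-distortion-derived model.

```python
import numpy as np

def hadamard(n):
    """Build an n x n Hadamard matrix (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def is_genuine_azb(residual, qstep, thr_scale=1.0):
    """Hedged sketch of G-AZB early termination: compute the sum of absolute
    transformed differences (SATD) of the residual block and skip transform/
    quantization when it falls below a quantization-step-dependent threshold.
    'thr_scale' and the threshold form are illustrative assumptions."""
    n = residual.shape[0]
    satd = np.abs(hadamard(n) @ residual @ hadamard(n).T).sum() / n
    return satd < thr_scale * qstep * n

res = np.random.default_rng(0).integers(-2, 3, size=(8, 8)).astype(float)
print(is_genuine_azb(res, qstep=40.0))
```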

9.
IEEE Trans Image Process ; 27(7): 3236-3247, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29641403

ABSTRACT

This paper proposes a deep learning method for intra prediction. Different from traditional methods that rely on fixed rules, we propose using a fully connected network to learn an end-to-end mapping from neighboring reconstructed pixels to the current block. In the proposed method, the network is fed by multiple reference lines. Compared with traditional single-line-based methods, more contextual information about the current block is utilized, so the proposed network has the potential to generate better predictions. In addition, the proposed network generalizes well across bitrate settings: a model trained at one bitrate setting also works well at other settings. Experimental results demonstrate the effectiveness of the proposed method. Compared with the High Efficiency Video Coding reference software HM-16.9, our network achieves an average of 3.4% bitrate saving. In particular, the average result on 4K sequences is 4.5% bitrate saving, with a maximum of 7.4%.
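A minimal version of such a predictor is a small fully connected network from the flattened reference-line pixels to the block, as sketched below; the layer widths and the count of context pixels are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class IntraPredFC(nn.Module):
    """Hypothetical fully connected intra predictor: maps the flattened
    pixels of several reconstructed reference lines to the current block."""
    def __init__(self, n_context, block=8, hidden=512):
        super().__init__()
        self.block = block
        self.net = nn.Sequential(
            nn.Linear(n_context, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, block * block))
    def forward(self, context):
        return self.net(context).view(-1, self.block, self.block)

n_ctx = 160                                 # illustrative count of reference-line pixels
model = IntraPredFC(n_context=n_ctx, block=8)
print(model(torch.randn(2, n_ctx)).shape)   # torch.Size([2, 8, 8])
```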

10.
IEEE Trans Image Process ; 27(8): 3827-3841, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29698212

ABSTRACT

The High Efficiency Video Coding (HEVC) standard achieves roughly half the bit-rate at the same quality compared with AVC. However, it still cannot satisfy the demand for higher quality in real applications, especially at low bit rates. To further improve the quality of reconstructed frames while reducing the bitrate, a residual highway convolutional neural network (RHCNN) is proposed in this paper for in-loop filtering in HEVC. The RHCNN is composed of several residual highway units and convolutional layers. In the highway units, certain paths allow information to pass unimpeded across several layers. Moreover, there is an identity skip connection (shortcut) from the beginning to the end, followed by one small convolutional layer. Without conflicting with the deblocking filter (DF) and sample adaptive offset (SAO) filter in HEVC, the RHCNN is employed as a high-dimensional filter following DF and SAO to enhance the quality of reconstructed frames. To facilitate real applications, we apply the proposed method to I frames, P frames, and B frames, respectively. To obtain better performance, the entire quantization parameter (QP) range is divided into several QP bands, and a dedicated RHCNN is trained for each band. Furthermore, we adopt a progressive training scheme in which the lower QP bands are trained first and their weights serve as initial weights for the higher QP bands. Experimental results demonstrate that the proposed method not only raises the PSNR of the reconstructed frames but also markedly reduces the bit-rate compared with the HEVC reference software.
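A highway unit with an unimpeded identity path, stacked under a global shortcut, can be sketched in PyTorch as below. Channel counts, unit count, and kernel sizes are illustrative; the actual RHCNN configuration (and its QP-band-specific models) is not reproduced.

```python
import torch
import torch.nn as nn

class HighwayUnit(nn.Module):
    """One residual highway unit: two conv layers plus an identity path
    that lets information pass unimpeded across the unit."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class RHCNNSketch(nn.Module):
    """Illustrative stack of highway units with a global shortcut and a
    small tail conv, applied to a reconstructed luma frame after DF/SAO."""
    def __init__(self, units=4, channels=64):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)
        self.units = nn.Sequential(*[HighwayUnit(channels) for _ in range(units)])
        self.tail = nn.Conv2d(channels, 1, 3, padding=1)
    def forward(self, frame):
        return frame + self.tail(self.units(self.head(frame)))

print(RHCNNSketch()(torch.randn(1, 1, 64, 64)).shape)   # torch.Size([1, 1, 64, 64])
```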

11.
IEEE Trans Image Process ; 27(4): 1966-1980, 2018 Apr.
Article in English | MEDLINE | ID: mdl-33156782

ABSTRACT

This paper proposes a single-image super-resolution scheme that introduces a gradient field sharpening transform, converting the blurry gradient field of the upsampled low-resolution (LR) image into the much sharper gradient field of the original high-resolution (HR) image. Different from existing methods that need to determine the whole gradient profile structure and locate edge points, we derive a new approach that sharpens the gradient field adaptively based only on the pixels in a small neighborhood. To maintain image contrast, the image gradient is adaptively scaled to keep the integral of the gradient field stable. Finally, the HR image is reconstructed by fusing the LR image with the sharpened HR gradient field. Experimental results demonstrate that the proposed algorithm generates a more accurate gradient field and produces super-resolved images with better objective and visual quality. Another advantage is that the proposed gradient sharpening transform is very fast and suitable for low-complexity applications.
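The contrast-preserving idea — sharpen the gradient magnitude locally, then rescale so the integral of the gradient field stays stable — can be illustrated with the simple power-law stand-in below. The exact sharpening transform derived in the paper is different; the exponent here is an arbitrary choice.

```python
import numpy as np

def sharpen_gradient_field(gx, gy, alpha=1.5, eps=1e-8):
    """Illustrative stand-in (not the paper's exact transform): raise the
    local gradient magnitude to a power to sharpen it, then rescale so the
    total gradient energy (a proxy for the integral of the field) is
    preserved, keeping image contrast roughly stable."""
    mag = np.sqrt(gx ** 2 + gy ** 2) + eps
    sharp = mag ** alpha
    scale = mag.sum() / sharp.sum()      # keep the integral of the field stable
    factor = scale * sharp / mag
    return gx * factor, gy * factor

gx = np.random.default_rng(0).normal(size=(32, 32))
gy = np.random.default_rng(1).normal(size=(32, 32))
sx, sy = sharpen_gradient_field(gx, gy)
print(np.hypot(gx, gy).sum(), np.hypot(sx, sy).sum())   # totals roughly match
```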

12.
IEEE Trans Image Process ; 26(1): 222-236, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27810810

ABSTRACT

Recently, there has been a resurgence of interest in uncoded transmission for wireless visual communication. While conventional coded systems suffer from a cliff effect as the channel condition varies dynamically, uncoded linear-transformed transmission (ULT) provides graceful quality degradation over a wide channel SNR range. ULT skips non-linear operations such as quantization and entropy coding; instead, it utilizes a linear decorrelation transform and linear-scaling power allocation to achieve optimized transmission. This paper presents a theoretical analysis of the power-distortion optimization of ULT. In addition to the observation in our previous work that a decorrelation transform can bring significant performance gain, this paper reveals that exploiting the energy diversity in the transformed signal is the key to achieving the full potential of the decorrelation transform. In particular, we investigate the efficiency of ULT with exact or inexact signal statistics, highlighting the impact of signal energy modeling accuracy. Based on that, we further propose two practical energy modeling schemes for ULT of visual signals. Experimental results show that the proposed schemes improve the quality of reconstructed images by 3-5 dB while reducing the signal modeling overhead from hundreds or thousands of metadata elements to only a few. The perceptual quality of the reconstruction is also significantly improved.
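For context, the classical linear-scaling power allocation used in SoftCast-style uncoded transmission scales each transform chunk by the inverse fourth root of its energy, which minimizes total MSE under a sum-power constraint; ULT's contribution lies in how those energies are modeled, which the sketch below does not cover.

```python
import numpy as np

def ult_power_allocation(chunk_energies, total_power):
    """Linear-scaling power allocation in the SoftCast/ULT style: chunk i
    with energy lambda_i is scaled by g_i proportional to lambda_i^(-1/4),
    then all gains are normalized to meet the sum-power budget. This is the
    standard result; the paper's energy modeling schemes are not shown."""
    lam = np.asarray(chunk_energies, dtype=float)
    g = lam ** -0.25
    # normalize so the transmitted power sum(g_i^2 * lambda_i) equals total_power
    scale = np.sqrt(total_power / np.sum(g ** 2 * lam))
    return g * scale

lam = np.array([900.0, 100.0, 25.0, 4.0])        # chunk energies (illustrative)
g = ult_power_allocation(lam, total_power=1.0)
print(np.round(g, 4), np.sum(g ** 2 * lam))      # power budget is met exactly
```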

13.
IEEE Trans Image Process ; 25(3): 1246-59, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26761774

ABSTRACT

Due to the independent and coarse quantization of the transform coefficients in each block, block-based transform coding usually introduces visually annoying blocking artifacts at low bitrates, which greatly prevents further bit reduction. To alleviate the conflict between bit reduction and quality preservation, deblocking as a post-processing strategy is an attractive and promising solution that requires no change to the existing codec. In this paper, in order to reduce blocking artifacts and obtain a high-quality image, image deblocking is formulated as an optimization problem within a maximum a posteriori framework, and a novel image deblocking algorithm using a constrained non-convex low-rank model is proposed. The ℓp (0 < p < 1) penalty function is applied to the singular values of a matrix to characterize the low-rank prior instead of the nuclear norm, while the quantization constraint is explicitly transformed into the feasible solution space to constrain the non-convex low-rank optimization. Moreover, a new quantization noise model is developed, and an alternating minimization strategy with adaptive parameter adjustment is devised to solve the proposed optimization problem. This parameter-free property makes the whole algorithm more attractive and practical. Experiments demonstrate that the proposed image deblocking algorithm outperforms the current state-of-the-art methods in both objective and perceptual quality.
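The quantization constraint amounts to keeping each recovered DCT coefficient inside the interval implied by its decoded value, which is a simple projection, as sketched below with an assumed uniform quantizer of step qstep.

```python
import numpy as np

def quantization_constraint_projection(coeff, quantized, qstep):
    """Sketch of the quantization constraint: each recovered DCT coefficient
    must stay inside the quantization interval implied by its decoded value,
    so clip it to [q - qstep/2, q + qstep/2] (uniform quantizer assumed)."""
    lower = quantized - qstep / 2.0
    upper = quantized + qstep / 2.0
    return np.clip(coeff, lower, upper)

coeff = np.array([5.2, -3.9, 0.7])       # coefficients proposed by the low-rank step
quantized = np.array([4.0, -4.0, 0.0])   # decoded (dequantized) values
print(quantization_constraint_projection(coeff, quantized, qstep=2.0))   # [ 5.  -3.9  0.7]
```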

14.
IEEE Trans Image Process ; 24(12): 6048-61, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26441447

ABSTRACT

Annoying compression artifacts exist in most lossy coded videos at low bit rates; they are caused by coarse quantization of transform coefficients or by motion compensation from distorted frames. In this paper, we propose a compression artifact reduction approach that utilizes both spatial and temporal correlation to form multi-hypothesis predictions from spatio-temporally similar blocks. For each transform block, three predictions and their reliabilities are estimated. The first prediction is constructed by inversely quantizing the transform coefficients directly, and its reliability is determined by the variance of the quantization noise. The second prediction is derived by representing each transform block with a temporal auto-regressive (TAR) model along its motion trajectory, and its reliability is estimated from the local prediction errors of the TAR model. The last prediction infers the original coefficients from similar blocks in non-local regions, and its reliability is estimated based on the distribution of coefficients in these similar blocks. Finally, all the predictions are adaptively fused according to their reliabilities to restore high-quality videos. Experimental results show that the proposed method can efficiently reduce most compression artifacts and improve both the subjective and objective quality of block-transform coded videos.
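The fusion step can be pictured as inverse-variance weighting of the three hypotheses, as in the hedged sketch below; the paper's exact reliability estimates and fusion rule may differ from this common choice.

```python
import numpy as np

def fuse_predictions(preds, variances):
    """Sketch of reliability-based fusion: weight each hypothesis by the
    inverse of its error variance (a common choice, used here as a stand-in
    for the paper's adaptive fusion) and combine them."""
    preds = np.asarray(preds, dtype=float)                  # (K, H, W)
    w = 1.0 / (np.asarray(variances, dtype=float) + 1e-12)  # (K,)
    w = w / w.sum()
    return np.tensordot(w, preds, axes=1)                   # (H, W)

preds = [np.full((4, 4), 10.0), np.full((4, 4), 12.0), np.full((4, 4), 11.0)]
variances = [4.0, 1.0, 2.0]      # the second hypothesis is the most reliable here
print(fuse_predictions(preds, variances)[0, 0])
```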

15.
IEEE Trans Image Process ; 22(12): 4613-26, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23893722

ABSTRACT

Block-transform coded images usually suffer from annoying artifacts at low bit rates, caused by the coarse quantization of transform coefficients. In this paper, we propose a new method to reduce compression artifacts by overlapped-block transform coefficient estimation from non-local blocks. In the proposed method, the discrete cosine transform coefficients of each block are estimated by adaptively fusing two predictions based on their reliabilities. One prediction is the quantized coefficient values decoded from the compressed bitstream, whose reliability is determined by the quantization steps. The other prediction is the weighted average of the coefficients in non-local blocks, whose reliability depends on the variance of the coefficients in those blocks. The weights, which reflect how effectively the coefficients in the non-local blocks predict the original coefficients, are determined by block similarity in the transform domain. To solve the optimization problem, the overlapped blocks are divided into several subsets; each subset contains non-overlapping blocks covering the whole image and is optimized independently. The overall optimization is therefore reduced to a set of sub-problems, which can be solved easily. Finally, we provide a strategy for parameter selection based on the compression level. Experimental results show that the proposed method can remarkably reduce compression artifacts and significantly improve both the subjective and objective quality of block-transform coded images.
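The non-local prediction can be sketched as a similarity-weighted average of DCT coefficients, where the weight decays with the transform-domain distance to the target block; the Gaussian kernel and decay parameter below are assumptions, not the paper's weighting.

```python
import numpy as np

def nonlocal_coeff_prediction(target_dct, candidate_dcts, h=10.0):
    """Sketch of the non-local prediction: weight each similar block's DCT
    coefficients by a Gaussian of its transform-domain distance to the
    target block ('h' is a hypothetical decay parameter)."""
    cands = np.asarray(candidate_dcts, dtype=float)          # (K, N, N)
    dists = np.sum((cands - target_dct) ** 2, axis=(1, 2))   # transform-domain distances
    w = np.exp(-dists / (h * h))
    w = w / w.sum()
    return np.tensordot(w, cands, axes=1)                    # (N, N) predicted coefficients

rng = np.random.default_rng(0)
target = rng.normal(size=(8, 8))
cands = [target + rng.normal(scale=0.1, size=(8, 8)) for _ in range(5)]
print(np.abs(nonlocal_coeff_prediction(target, cands) - target).mean())   # small
```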

16.
IEEE Trans Image Process ; 22(11): 4364-79, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23864205

ABSTRACT

This paper investigates priority encoding transmission (PET) protection for streaming scalably compressed video streams over erasure channels, in scenarios where a small number of retransmissions is allowed. In principle, the optimal protection depends not only on the importance of each stream element, but also on the expected channel behavior. By formulating a collection of hypotheses concerning its own behavior in future transmissions, limited-retransmission PET (LR-PET) effectively constructs channel codes spanning multiple transmission slots and thus offers better protection efficiency than the original PET. As the number of transmission opportunities increases, the optimization for LR-PET becomes very challenging because the number of hypothetical retransmission paths grows exponentially. As a key contribution, this paper develops a method to derive the effective recovery-probability versus redundancy-rate characteristic for the LR-PET procedure with any number of transmission opportunities. This significantly accelerates the protection assignment procedure of the original LR-PET with only two transmissions, and also makes quick, optimal protection assignment feasible in scenarios where more transmissions are possible. This paper also gives a concrete proof of the redundancy embedding property of the channel codes formed by LR-PET, which allows a decoupled optimization for sequentially dependent source elements with a convex utility-length characteristic. This essentially justifies the source-independent construction of the protection convex hull for LR-PET.
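The building block underneath PET-style protection is a standard erasure-code fact: an element protected by an (n, k) code is recovered exactly when at least k of its n packets arrive. The sketch below computes this recovery probability for independent packet losses; LR-PET's contribution — the characteristic across hypothesized retransmission paths — is not reproduced here.

```python
from math import comb

def recovery_probability(n_packets, k_needed, p_arrive):
    """Basic PET building block (a standard erasure-code fact, not the
    LR-PET derivation itself): an element protected by an (n, k) code is
    recovered iff at least k of the n packets arrive, with independent
    per-packet arrival probability p_arrive."""
    return sum(comb(n_packets, i) * p_arrive**i * (1 - p_arrive)**(n_packets - i)
               for i in range(k_needed, n_packets + 1))

# More redundancy (smaller k) buys a higher recovery probability.
for k in (10, 8, 6):
    print(k, round(recovery_probability(10, k, p_arrive=0.9), 4))
```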


Subjects
Algorithms; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Photography/methods; Signal Processing, Computer-Assisted; Video Recording/methods; Reproducibility of Results; Sensitivity and Specificity
17.
IEEE Trans Image Process ; 20(11): 3291-6, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21632305

ABSTRACT

Traditional methods for image downsampling aim to remove aliasing artifacts. However, their influence on the quality of the image interpolated from the downsampled one is usually neglected. To tackle this problem, in this paper we propose interpolation-dependent image downsampling (IDID), in which downsampling is tied to the interpolation method. Given an interpolation method, the goal of IDID is to obtain a downsampled image that minimizes the sum of squared errors between the input image and the one interpolated from the downsampled image. Using a least-squares formulation, the solution of IDID is derived as the inverse operator of upsampling. We also devise a content-dependent IDID for interpolation methods with varying interpolation coefficients. Numerous experimental results demonstrate the viability and efficiency of the proposed IDID.
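In matrix form the idea is compact: if U is the linear upsampling (interpolation) operator, the IDID output is the least-squares pre-image of the input under U, i.e. the pseudo-inverse of upsampling applied to the HR image. A minimal sketch with a toy 1-D interpolator follows; the operator and sizes are illustrative.

```python
import numpy as np

def idid_downsample(x, U):
    """Minimal sketch of IDID for a fixed linear interpolator: with U the
    (HW x hw) matrix form of the interpolation (upsampling) operator, the
    downsampled image minimizing ||x - U d||^2 is the least-squares solution."""
    d, *_ = np.linalg.lstsq(U, np.asarray(x, dtype=float).ravel(), rcond=None)
    return d

# Toy 1-D demo with a hypothetical factor-2 linear interpolator.
h = 4
U = np.zeros((2 * h, h))
for i in range(h):
    U[2 * i, i] = 1.0                    # even samples copy the LR pixel
    U[2 * i + 1, i] = 0.5                # odd samples average two LR neighbors
    U[2 * i + 1, (i + 1) % h] = 0.5
x = np.arange(2 * h, dtype=float)
d = idid_downsample(x, U)
print(np.round(U @ d, 2))                # interpolating d approximates x in the LS sense
```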

18.
IEEE Trans Image Process ; 20(12): 3455-69, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21571611

ABSTRACT

The linear regression model is a very attractive tool for designing effective image interpolation schemes. Several regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, interpolation with OLS may have undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting from the linear regression model, we replace the OLS error norm with the moving least squares (MLS) error norm, which leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l2-norm as an estimator complexity penalty. Moreover, motivated by recent progress in manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness-preserving constraint. The optimal model parameters can be obtained in closed form by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance compared with state-of-the-art interpolation algorithms, especially in preserving image edge structure.
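The closed-form core of such a scheme is a weighted ridge regression, as sketched below with MLS-style distance weights and an illustrative l2 penalty; the manifold-smoothness term built from unmeasured samples is omitted.

```python
import numpy as np

def rllr_fit(X, y, weights, lam=1e-2):
    """Sketch of the closed-form core of RLLR: weighted (MLS-style) least
    squares with an l2 complexity penalty. 'lam' is an illustrative value
    and the manifold regularization term is not included."""
    W = np.diag(np.asarray(weights, dtype=float))
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    b = X.T @ W @ y
    return np.linalg.solve(A, b)

# Toy usage: fit a local linear model to a pixel neighborhood.
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 3))                        # e.g. [1, dx, dy] per neighbor
y = X @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.normal(size=25)
w = np.exp(-np.arange(25) / 10.0)                   # MLS-style distance weights
print(np.round(rllr_fit(X, y, w), 2))               # close to [ 2. -1.  0.5]
```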

19.
IEEE Trans Image Process ; 19(9): 2382-95, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20378470

ABSTRACT

For streaming scalably compressed video streams over unreliable networks, Limited-Retransmission Priority Encoding Transmission (LR-PET) outperforms PET remarkably, since the opportunity to retransmit is fully exploited by hypothesizing the possible future retransmission behavior before the retransmission actually occurs. For the retransmission to be efficient in such a scheme, it is critical to receive adequate acknowledgment from a previous transmission before deciding which data to retransmit. However, in many scenarios, a stochastic packet delay process results in frequent late acknowledgements, while imperfect feedback channels can impair the server's knowledge of what the client has received. This paper proposes an extended LR-PET scheme that optimizes the PET protection of transmitted bitstreams while recognizing that the received feedback information is likely to be incomplete. As in the original LR-PET, the behavior of future retransmissions is hypothesized in the optimization objective of each transmission opportunity. As the key contribution, we develop a method to efficiently derive the effective recovery-probability versus redundancy-rate characteristic for the extended LR-PET communication process, which significantly simplifies the ultimate protection assignment procedure. This paper also demonstrates the advantage of the proposed strategy over several alternative strategies.
