1.
Sensors (Basel) ; 22(4)2022 Feb 10.
Article in English | MEDLINE | ID: mdl-35214252

ABSTRACT

The paper proposes a novel post-filtering method based on convolutional neural networks (CNNs) for quality enhancement of RGB/grayscale images and video sequences. The lossy images are encoded using common image codecs, such as JPEG and JPEG2000. The video sequences are encoded using previous and ongoing video coding standards, high-efficiency video coding (HEVC) and versatile video coding (VVC), respectively. A novel deep neural network architecture is proposed to estimate fine refinement details for full-, half-, and quarter-patch resolutions. The proposed architecture is built using a set of efficient processing blocks designed based on the following concepts: (i) the multi-head attention mechanism for refining the feature maps, (ii) the weight sharing concept for reducing the network complexity, and (iii) novel block designs of layer structures for multiresolution feature fusion. The proposed method provides substantial performance improvements compared with both common image codecs and video coding standards. Experimental results on high-resolution images and standard video sequences show that the proposed post-filtering method provides average BD-rate savings of 31.44% over JPEG and 54.61% over HEVC (x265) for RGB images, Y-BD-rate savings of 26.21% over JPEG and 15.28% over VVC (VTM) for grayscale images, and 15.47% over HEVC and 14.66% over VVC for video sequences.
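
For readers unfamiliar with the BD-rate figures quoted above, the following is a minimal sketch of how such a number can be computed from two rate-distortion curves. It is a simplification: the Bjøntegaard calculation is commonly done with a third-order polynomial fit per curve, whereas this sketch uses piecewise-linear interpolation of log-rate over PSNR so that it needs no external libraries. The function names and the sample points in the usage note are illustrative assumptions, not taken from the paper.

```python
import math


def _mean_log_rate(rates, psnrs, lo, hi):
    """Average of log(rate) over the PSNR interval [lo, hi].

    Uses piecewise-linear interpolation between measured points
    (a simplification of the usual cubic fit). PSNR values must be distinct.
    """
    pts = sorted(zip(psnrs, (math.log(r) for r in rates)))

    def interp(q):
        for (q0, l0), (q1, l1) in zip(pts, pts[1:]):
            if q0 <= q <= q1:
                return l0 + (l1 - l0) * (q - q0) / (q1 - q0)
        raise ValueError("quality value outside the measured range")

    # Integrate with the trapezoid rule between the interval ends and
    # every measured point that falls strictly inside the interval.
    knots = [lo] + [q for q, _ in pts if lo < q < hi] + [hi]
    area = 0.0
    for a, b in zip(knots, knots[1:]):
        area += 0.5 * (interp(a) + interp(b)) * (b - a)
    return area / (hi - lo)


def bd_rate_linear(anchor_rates, anchor_psnrs, test_rates, test_psnrs):
    """Approximate BD-rate in percent (negative = test saves bitrate)."""
    lo = max(min(anchor_psnrs), min(test_psnrs))
    hi = min(max(anchor_psnrs), max(test_psnrs))
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")
    diff = (_mean_log_rate(test_rates, test_psnrs, lo, hi)
            - _mean_log_rate(anchor_rates, anchor_psnrs, lo, hi))
    return (math.exp(diff) - 1.0) * 100.0
```

As a sanity check, a test codec that reaches the same PSNR values at exactly half the anchor's bitrate yields a BD-rate of about -50%, meaning it needs half the bits for the same quality.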


Subjects
Data Compression; Deep Learning; Data Compression/methods; Neural Networks, Computer; Video Recording/methods
2.
Sensors (Basel) ; 22(24)2022 Dec 19.
Article in English | MEDLINE | ID: mdl-36560383

ABSTRACT

The event sensor provides high temporal resolution and generates large amounts of raw event data. Efficient low-complexity coding solutions are required for integration into low-power event-processing chips with limited memory. In this paper, a novel lossless compression method is proposed for encoding the event data represented as asynchronous event sequences. The proposed method employs only low-complexity coding techniques so that it is suitable for hardware implementation into low-power event-processing chips. A first, novel, contribution consists of a low-complexity coding scheme which uses a decision tree to reduce the representation range of the residual error. The decision tree is formed by using a triplet threshold parameter which divides the input data range into several coding ranges arranged at concentric distances from an initial prediction, so that the residual error of the true value information is represented by using a reduced number of bits. Another novel contribution consists of an improved representation, which divides the input sequence into same-timestamp subsequences, wherein each subsequence collects the same timestamp events in ascending order of the largest dimension of the event spatial information. The proposed same-timestamp representation replaces the event timestamp information with the same-timestamp subsequence length and encodes it together with the event spatial and polarity information into a different bitstream. Another novel contribution is the random access to any time window by using additional header information. The experimental evaluation on a highly variable event density dataset demonstrates that the proposed low-complexity lossless coding method provides an average improvement of 5.49%, 11.45%, and 35.57% compared with the state-of-the-art performance-oriented lossless data compression codecs Bzip2, LZMA, and ZLIB, respectively. 
To our knowledge, the paper proposes the first low-complexity lossless compression method for encoding asynchronous event sequences that is suitable for hardware implementation in low-power chips.
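
The same-timestamp representation described above can be illustrated with a small sketch. It assumes integer timestamps counted from a known start tick, events given as `(x, y, t, p)` tuples, and `x` as the largest spatial dimension. The function names and the explicit zero-length entries for empty ticks are hypothetical choices; the actual bitstream layout is not given in this abstract.

```python
def encode_same_timestamp(events, t_start):
    """Split events into same-timestamp subsequences.

    events: iterable of (x, y, t, p) with integer t >= t_start.
    Returns (lengths, payload): lengths[k] is the number of events at
    tick t_start + k, and payload lists (x, y, p) without timestamps.
    Assumption: x is the largest spatial dimension, so each subsequence
    is sorted by ascending x (ties broken by y, then p).
    """
    by_t = {}
    for x, y, t, p in events:
        by_t.setdefault(t, []).append((x, y, p))
    if not by_t:
        return [], []

    lengths, payload = [], []
    for t in range(t_start, max(by_t) + 1):
        group = sorted(by_t.get(t, []))
        lengths.append(len(group))  # replaces the per-event timestamp
        payload.extend(group)
    return lengths, payload


def decode_same_timestamp(lengths, payload, t_start):
    """Rebuild (x, y, t, p) events from subsequence lengths and payload."""
    events, i = [], 0
    for offset, n in enumerate(lengths):
        for x, y, p in payload[i:i + n]:
            events.append((x, y, t_start + offset, p))
        i += n
    return events
```

The round trip recovers every event, but events sharing a timestamp come back in sorted order rather than their original arrival order. A practical coder would also need a compact way to skip long runs of empty ticks, which this sketch does not attempt.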


Subjects
Algorithms; Data Compression; Data Compression/methods
3.
Sensors (Basel) ; 22(21)2022 Nov 02.
Article in English | MEDLINE | ID: mdl-36366136

ABSTRACT

In recent years, Vehicle Make and Model Recognition (VMMR) has attracted a lot of attention as it plays a crucial role in Intelligent Transportation Systems (ITS). Accurate and efficient VMMR systems are required in real-world applications including intelligent surveillance and autonomous driving. The paper introduces a new large-scale dataset and a novel deep learning paradigm for VMMR. A new large-scale dataset dubbed Diverse large-scale VMM (DVMM) is proposed collecting image-samples with the most popular vehicle brands operating in Europe. A novel VMMR framework is proposed which follows a two-branch architecture performing make and model recognition respectively. A two-stage training procedure and a novel decision module are proposed to process the make and model predictions and compute the final model prediction. In addition, a novel metric based on the true positive rate is proposed to compare classification confusion of the proposed 2B-2S and the baseline methods. A complex experimental validation is carried out, demonstrating the generality, diversity, and practicality of the proposed DVMM dataset. The experimental results show that the proposed framework provides 93.95% accuracy over the more diverse DVMM dataset and 95.85% accuracy over traditional VMMR datasets. The proposed two-branch approach outperforms the conventional one-branch approach for VMMR over small-, medium-, and large-scale datasets by providing lower vehicle model confusion and reduced inter-make ambiguity. The paper demonstrates the advantages of the proposed two-branch VMMR paradigm in terms of robustness and lower confusion relative to single-branch designs.
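
The abstract mentions a metric based on the true positive rate without defining it. As background only, the sketch below computes the plain per-class true positive rate from a confusion matrix; it is not the paper's metric, and the function name and matrix layout are illustrative assumptions.

```python
def true_positive_rates(confusion):
    """Per-class true positive rate (recall).

    confusion[i][j] = number of samples whose true class is i and whose
    predicted class is j. Classes with no samples yield NaN.
    """
    rates = []
    for i, row in enumerate(confusion):
        total = sum(row)  # all samples whose true class is i
        rates.append(row[i] / total if total else float("nan"))
    return rates
```

For a two-class matrix `[[8, 2], [1, 9]]`, the rates are 0.8 and 0.9. The off-diagonal entries of each row show which other classes absorb the misclassified samples, which is the kind of confusion the abstract refers to.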


Subjects
Deep Learning; Research; Data Collection; Models, Biological; Intelligence
4.
Sensors (Basel) ; 21(9)2021 May 07.
Article in English | MEDLINE | ID: mdl-34067191

ABSTRACT

In this paper, we propose a novel filtering method based on deep attention networks for the quality enhancement of light field (LF) images captured by plenoptic cameras and compressed using the High Efficiency Video Coding (HEVC) standard. The proposed architecture was built using efficient complex processing blocks and novel attention-based residual blocks. The network takes advantage of the macro-pixel (MP) structure, specific to LF images, and processes each reconstructed MP in the luminance (Y) channel. The input patch is represented as a tensor that collects, from an MP neighbourhood, four Epipolar Plane Images (EPIs) at four different angles. The experimental results on a common LF image database showed high improvements over HEVC in terms of the structural similarity index (SSIM), with an average Y-Bjøntegaard Delta (BD)-rate savings of 36.57%, and an average Y-BD-PSNR improvement of 2.301 dB. Increased performance was achieved when the HEVC built-in filtering methods were skipped. The visual results illustrate that the enhanced image contains sharper edges and more texture details. The ablation study provides two robust solutions to reduce the inference time by 44.6% and the network complexity by 74.7%. The results demonstrate the potential of attention networks for the quality enhancement of LF images encoded by HEVC.
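
To make the macro-pixel structure concrete, here is a minimal sketch of how a lenslet image can be indexed. It assumes square macro-pixels of `size` x `size` samples aligned on a regular grid, with the image stored as a nested list of luminance values; both helper names are hypothetical, and the construction of the EPI input tensor is not reproduced here.

```python
def macro_pixel(lenslet, row, col, size):
    """Return the size x size macro-pixel at macro-pixel coordinates (row, col)."""
    top, left = row * size, col * size
    return [r[left:left + size] for r in lenslet[top:top + size]]


def sub_aperture_image(lenslet, u, v, size):
    """Collect angular sample (u, v) from every macro-pixel.

    u is the row offset and v the column offset inside each macro-pixel,
    so the result has one pixel per macro-pixel.
    """
    return [r[v::size] for r in lenslet[u::size]]
```

Each macro-pixel gathers the angular samples of one scene point, while each sub-aperture image gathers the same angular sample across all macro-pixels.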

5.
Sensors (Basel) ; 21(1)2021 Jan 03.
Article in English | MEDLINE | ID: mdl-33401627

ABSTRACT

The paper proposes a novel instance segmentation method for traffic videos devised for deployment on real-time embedded devices. A novel neural network architecture is proposed using a multi-resolution feature extraction backbone and improved network designs for the object detection and instance segmentation branches. A novel post-processing method is introduced to ensure a reduced rate of false detections by evaluating the quality of the output masks. An improved network training procedure is proposed based on a novel label assignment algorithm. An ablation study on the speed-vs.-performance trade-off further modifies the two branches and replaces the conventional ResNet-based performance-oriented backbone with a lightweight speed-oriented design. The proposed architectural variations achieve real-time performance when deployed on embedded devices. The experimental results demonstrate that the proposed instance segmentation method for traffic videos outperforms the You Only Look At CoefficienTs (YOLACT) algorithm, the state-of-the-art real-time instance segmentation method. The proposed architecture achieves qualitative results with 31.57 average precision on the COCO dataset, while its speed-oriented variations achieve speeds of up to 66.25 frames per second on the Jetson AGX Xavier module.
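
The average precision reported on COCO is built on the overlap between predicted and ground-truth masks, measured as intersection over union (IoU). A minimal sketch of mask IoU follows. The function name and the nested-list mask format are illustrative assumptions, and this is not the paper's mask-quality post-processing, whose details are not given in this abstract.

```python
def mask_iou(pred, truth):
    """Intersection over union of two binary masks of equal shape.

    Masks are nested lists of 0/1 (or False/True) values.
    Returns 0.0 when both masks are empty.
    """
    inter = union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter / union if union else 0.0
```

A detection is counted as correct at a given threshold when its IoU with a ground-truth mask of the same class meets that threshold, and COCO averages precision over several such thresholds.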

6.
Sensors (Basel) ; 20(21)2020 Oct 30.
Article in English | MEDLINE | ID: mdl-33143080

ABSTRACT

The paper presents a novel depth-estimation method for light-field (LF) images based on innovative multi-stereo matching and machine-learning techniques. In the first stage, a novel block-based stereo matching algorithm is employed to compute the initial estimation. The proposed algorithm is specifically designed to operate on any pair of sub-aperture images (SAIs) in the LF image and to compute the pair's corresponding disparity map. For the central SAI, a disparity fusion technique is proposed to compute the initial disparity map based on all available pairwise disparities. In the second stage, a novel pixel-wise deep-learning (DL)-based method for residual error prediction is employed to further refine the disparity estimation. A novel neural network architecture is proposed based on a new structure of layers. The proposed DL-based method is employed to predict the residual error of the initial estimation and to refine the final disparity map. The experimental results demonstrate the superiority of the proposed framework and reveal that the proposed method achieves an average improvement of 15.65% in root mean squared error (RMSE), 43.62% in mean absolute error (MAE), and 5.03% in structural similarity index (SSIM) over machine-learning-based state-of-the-art methods.
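
As background for the first stage, the sketch below shows textbook block matching with a sum-of-absolute-differences (SAD) cost on one horizontally displaced image pair. It is a generic baseline, not the paper's block-based algorithm or its disparity fusion; the function name, the horizontal-only search, and the nested-list image format are assumptions.

```python
def block_match_disparity(left, right, y, x, half, max_disp):
    """Estimate the disparity at pixel (y, x) of `left` against `right`.

    Compares the (2*half+1) x (2*half+1) block centred at (y, x) in `left`
    with blocks shifted d pixels to the left in `right`, for d in
    0..max_disp, and returns the shift with the lowest SAD cost.
    The caller must keep the block inside the image vertically and on
    the right-hand side; shifts that leave the image on the left are skipped.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - half < 0:
            break  # shifted block would cross the left image border
        cost = 0
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```

Running this at every pixel gives one disparity map per image pair. A refinement stage, such as the residual prediction described above, would then correct the errors that block matching makes in textureless or occluded areas.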

7.
IEEE Trans Image Process ; 22(11): 4195-210, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23807443

ABSTRACT

This paper introduces an efficient method for lossless compression of depth map images, using the representation of a depth image in terms of three entities: 1) the crack-edges; 2) the constant depth regions enclosed by them; and 3) the depth value over each region. The starting representation is identical with that used in a very efficient coder for palette images, the piecewise-constant image model coding, but the techniques used for coding the elements of the representation are more advanced and especially suitable for the type of redundancy present in depth images. Initially, the vertical and horizontal crack-edges separating the constant depth regions are transmitted by 2D context coding using optimally pruned context trees. Both the encoder and decoder can reconstruct the regions of constant depth from the transmitted crack-edge image. The depth value in a given region is encoded using the depth values of the neighboring regions already encoded, exploiting the natural smoothness of the depth variation, and the mutual exclusiveness of the values in neighboring regions. The encoding method is suitable for lossless compression of depth images, obtaining compression of about 10-65 times, and additionally can be used as the entropy coding stage for lossy depth compression.
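
The three-part representation can be illustrated with a short sketch that derives crack-edges from a depth image and then recovers the constant-depth regions from the crack-edges alone, as a decoder would. It assumes 4-connected regions and a nested-list image; the function names are hypothetical, and the context-tree coding of the edges is not shown.

```python
def crack_edges(depth):
    """Return (vertical, horizontal) crack-edge maps of a depth image.

    vertical[y][x] is 1 when pixels (y, x) and (y, x + 1) differ;
    horizontal[y][x] is 1 when pixels (y, x) and (y + 1, x) differ.
    """
    h, w = len(depth), len(depth[0])
    vertical = [[int(depth[y][x] != depth[y][x + 1]) for x in range(w - 1)]
                for y in range(h)]
    horizontal = [[int(depth[y][x] != depth[y + 1][x]) for x in range(w)]
                  for y in range(h - 1)]
    return vertical, horizontal


def label_regions(vertical, horizontal, h, w):
    """Label 4-connected regions enclosed by crack-edges via flood fill."""
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            labels[sy][sx] = next_label
            stack = [(sy, sx)]
            while stack:
                y, x = stack.pop()
                neighbours = []
                if x + 1 < w and not vertical[y][x]:
                    neighbours.append((y, x + 1))
                if x > 0 and not vertical[y][x - 1]:
                    neighbours.append((y, x - 1))
                if y + 1 < h and not horizontal[y][x]:
                    neighbours.append((y + 1, x))
                if y > 0 and not horizontal[y - 1][x]:
                    neighbours.append((y - 1, x))
                for ny, nx in neighbours:
                    if labels[ny][nx] == -1:
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels
```

Once the regions are labelled, only one depth value per region remains to be transmitted, which is the third entity in the representation.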


Subjects
Algorithms; Data Compression/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Models, Theoretical; Pattern Recognition, Automated/methods; Computer Simulation; Image Enhancement/methods; Reproducibility of Results; Sensitivity and Specificity