Results 1 - 9 of 9
1.
IEEE Trans Image Process ; 31: 2661-2672, 2022.
Article in English | MEDLINE | ID: mdl-35316184

ABSTRACT

High Dynamic Range (HDR) imaging via multi-exposure fusion is an important task for most modern imaging platforms. Despite recent developments in both hardware and algorithms, challenges remain over content association ambiguities caused by saturation, motion, and various artifacts introduced during multi-exposure fusion, such as ghosting, noise, and blur. In this work, we propose an Attention-guided Progressive Neural Texture Fusion (APNT-Fusion) HDR restoration model which aims to address these issues within one framework. An efficient two-stream structure is proposed which separately focuses on texture feature transfer over saturated regions and on multi-exposure tonal and texture feature fusion. A neural feature transfer mechanism is proposed which establishes spatial correspondence between different exposures based on multi-scale VGG features in the masked saturated HDR domain, providing discriminative contextual clues over the ambiguous image areas. A progressive texture blending module is designed to blend the encoded two-stream features in a multi-scale and progressive manner. In addition, we introduce several novel attention mechanisms: the motion attention module detects and suppresses content discrepancies among the reference images; the saturation attention module facilitates differentiating misalignment caused by saturation from that caused by motion; and the scale attention module ensures texture blending consistency between different encoder/decoder scales. We carry out comprehensive qualitative and quantitative evaluations and ablation studies, which validate that these novel modules work coherently under the same framework and outperform state-of-the-art methods.


Subjects
Algorithms
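The motion-attention idea above can be illustrated with a toy per-pixel weighting; this is a hypothetical simplification of my own (the paper's module operates on learned features, not raw pixels), assuming images as flat lists of normalized intensities:

```python
import math

def motion_attention(ref, aligned, sigma=0.1):
    """Toy per-pixel motion attention: down-weight pixels where an
    exposure-aligned image disagrees with the reference, since such
    discrepancies are likely motion and would cause ghosting if fused.
    Inputs are flat lists of intensities in [0, 1]."""
    return [math.exp(-((r - a) ** 2) / (2.0 * sigma ** 2))
            for r, a in zip(ref, aligned)]
```

Agreeing pixels receive weight close to 1 and are fused normally; disagreeing pixels are suppressed toward 0.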
2.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6094-6110, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34101585

ABSTRACT

Coded aperture is a promising approach for capturing the 4-D light field (LF), in which the 4-D data are compressively modulated into 2-D coded measurements that are further decoded by reconstruction algorithms. The bottleneck lies in the reconstruction algorithms, which so far have offered rather limited reconstruction quality. To tackle this challenge, we propose a novel learning-based framework for the reconstruction of high-quality LFs from acquisitions via learned coded apertures. The proposed method elegantly incorporates the measurement observation into the deep learning framework to avoid relying entirely on data-driven priors for LF reconstruction. Specifically, we first formulate the compressive LF reconstruction as an inverse problem with an implicit regularization term. Then, we construct the regularization term with a deep, efficient spatial-angular separable convolutional sub-network in the form of local and global residual learning to comprehensively explore the signal distribution, free from the limited representation ability and inefficiency of deterministic mathematical modeling. Furthermore, we extend this pipeline to LF denoising and spatial super-resolution, which can be considered variants of coded aperture imaging equipped with different degradation matrices. Extensive experimental results demonstrate that the proposed methods outperform state-of-the-art approaches to a significant extent both quantitatively and qualitatively, i.e., the reconstructed LFs not only achieve much higher PSNR/SSIM but also preserve the LF parallax structure better on both real and synthetic LF benchmarks. The code will be publicly available at https://github.com/MantangGuo/DRLF.
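The inverse problem described above can be written as follows (the notation here is my own shorthand, not the paper's):

```latex
\hat{L} \;=\; \arg\min_{L} \;\bigl\| \Phi L - y \bigr\|_2^2 \;+\; \lambda\, \mathcal{R}(L)
```

where $y$ is the 2-D coded measurement, $\Phi$ the modulation by the learned coded aperture, $\mathcal{R}$ the implicit regularization term realized by the spatial-angular separable sub-network, and $\lambda$ a balancing weight. Denoising and spatial super-resolution then follow by replacing $\Phi$ with the corresponding degradation matrix.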

3.
IEEE Trans Image Process ; 30: 2876-2887, 2021.
Article in English | MEDLINE | ID: mdl-33539297

ABSTRACT

In this article, we propose a novel self-training approach named Crowd-SDNet that enables a typical object detector trained only with point-level annotations (i.e., objects are labeled with points) to estimate both the center points and sizes of crowded objects. Specifically, during training, we utilize the available point annotations to directly supervise the estimation of the center points of objects. Based on a locally-uniform distribution assumption, we initialize pseudo object sizes from the point-level supervisory information, which are then leveraged to guide the regression of object sizes via a crowdedness-aware loss. Meanwhile, we propose a confidence and order-aware refinement scheme to continuously refine the initial pseudo object sizes such that the detector is increasingly able to detect and count objects in crowds simultaneously. Moreover, to address extremely crowded scenes, we propose an effective decoding method to improve the detector's representation ability. Experimental results on the WiderFace benchmark show that our approach significantly outperforms state-of-the-art point-supervised methods on both detection and counting tasks, i.e., our method improves the average precision by more than 10% and reduces the counting error by 31.2%. Moreover, our method obtains the best results on the crowd counting and localization datasets (i.e., ShanghaiTech and NWPU-Crowd) and vehicle counting datasets (i.e., CARPK and PUCPR+) compared with state-of-the-art counting-by-detection methods. The code will be publicly available at https://github.com/WangyiNTU/Point-supervised-crowd-detection.
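The locally-uniform initialization can be sketched as follows; `init_pseudo_sizes` is a hypothetical helper of my own that approximates each object's size by the distance to its nearest neighboring point annotation, which is the intuition behind the assumption (neighboring annotations in a locally uniform crowd sit roughly one object apart), not Crowd-SDNet's exact procedure:

```python
import math

def init_pseudo_sizes(points):
    """Given point annotations [(x, y), ...] of object centers, return
    an initial pseudo size per object: the Euclidean distance to its
    nearest neighboring annotation (locally-uniform density assumption)."""
    sizes = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(math.hypot(xi - xj, yi - yj)
                      for j, (xj, yj) in enumerate(points) if j != i)
        sizes.append(nearest)
    return sizes
```

These initial sizes would then be refined during training, as the abstract describes.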

4.
IEEE Trans Neural Netw Learn Syst ; 32(5): 2299-2304, 2021 05.
Article in English | MEDLINE | ID: mdl-32511095

ABSTRACT

Regularization is commonly used for alleviating overfitting in machine learning. For convolutional neural networks (CNNs), regularization methods such as DropBlock and Shake-Shake have demonstrated improved generalization performance. However, these methods lack a self-adaptive ability throughout training. That is, the regularization strength is fixed to a predefined schedule, and manual adjustments are required to adapt to various network architectures. In this article, we propose a dynamic regularization method for CNNs. Specifically, we model the regularization strength as a function of the training loss. According to the change of the training loss, our method can dynamically adjust the regularization strength in the training procedure, thereby balancing underfitting and overfitting of CNNs. With dynamic regularization, a large-scale model is automatically regularized by strong perturbations, and vice versa. Experimental results show that the proposed method can improve the generalization capability on off-the-shelf network architectures and outperform state-of-the-art regularization methods.
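A minimal sketch of the core idea: a regularization strength that is a decreasing function of the training loss, so a well-fitting model (low loss, overfitting risk) is perturbed strongly while an underfitting model (high loss) is left mostly alone. The linear form, the bounds, and the name `dynamic_strength` are illustrative assumptions of my own, not the paper's exact formulation:

```python
def dynamic_strength(train_loss, loss_min=0.0, loss_max=2.0,
                     s_min=0.0, s_max=1.0):
    """Map the current training loss to a regularization strength in
    [s_min, s_max]: low loss -> strong regularization (fight overfitting),
    high loss -> weak regularization (avoid underfitting)."""
    t = (train_loss - loss_min) / (loss_max - loss_min)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    return s_max - t * (s_max - s_min)
```

The returned value could drive, e.g., a drop probability, recomputed every epoch from the running training loss.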

5.
IEEE Trans Image Process ; 27(10): 4889-4900, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29969399

ABSTRACT

Depth estimation is a fundamental problem for light field photography applications. Numerous methods have been proposed in recent years, which either focus on crafting cost terms for more robust matching, or on analyzing the geometry of scene structures embedded in the epipolar-plane images. Significant improvements have been made in terms of overall depth estimation error; however, current state-of-the-art methods still show limitations in handling intricate occluding structures and complex scenes with multiple occlusions. To address these challenging issues, we propose a very effective depth estimation framework which focuses on regularizing the initial label confidence map and edge strength weights. Specifically, we first detect partially occluded boundary regions (POBR) via superpixel-based regularization. A series of shrinkage/reinforcement operations is then applied to the label confidence map and edge strength weights over the POBR. We show that after these weight manipulations, even a low-complexity weighted least squares model can produce much better depth estimation than the state-of-the-art methods in terms of average disparity error rate, occlusion boundary precision-recall rate, and the preservation of intricate visual features.
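The final-stage weighted least squares idea can be sketched in 1-D; this toy `wls_refine` is a simplification of my own (the paper works on 2-D maps with its manipulated weights), solving the exact tridiagonal system so that zero-confidence pixels are filled from their neighbors while small edge weights stop smoothing from crossing boundaries:

```python
def wls_refine(depth, conf, edge_w, lam=1.0):
    """Refine a 1-D depth scan u by minimizing
        sum_i conf[i]*(u[i]-depth[i])^2 + lam*sum_i edge_w[i]*(u[i+1]-u[i])^2
    via the Thomas algorithm for the resulting tridiagonal linear system."""
    n = len(depth)
    a = [0.0] * n              # sub-diagonal
    b = [conf[i] for i in range(n)]   # diagonal
    c = [0.0] * n              # super-diagonal
    d = [conf[i] * depth[i] for i in range(n)]
    for i in range(n - 1):
        w = lam * edge_w[i]
        b[i] += w
        b[i + 1] += w
        c[i] -= w
        a[i + 1] -= w
    for i in range(1, n):      # forward elimination
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    u = [0.0] * n              # back substitution
    u[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (d[i] - c[i] * u[i + 1]) / b[i]
    return u
```

With the outlier's confidence shrunk to zero, the smoothness term alone determines its value, which is the effect the POBR weight manipulation exploits.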

6.
IEEE J Biomed Health Inform ; 20(3): 915-924, 2016 05.
Article in English | MEDLINE | ID: mdl-25775501

ABSTRACT

A human-computer interface (the Facial position and expression Mouse system, FM) for persons with tetraplegia, based on a monocular infrared depth camera, is presented in this paper. The nose position along with the mouth status (closed/open) are detected by the proposed algorithm to control and navigate the cursor as computer user input. The algorithm is based on an improved Randomized Decision Tree, which is capable of detecting the facial information efficiently and accurately. A more comfortable user experience is achieved by mapping the nose motion to the cursor motion via a nonlinear function. The infrared depth camera makes the system independent of illumination and color changes both in the background and on the human face, which is a critical advantage over RGB camera-based options. Extensive experimental results show that the proposed system outperforms existing assistive technologies in terms of quantitative and qualitative assessments.


Subjects
Face/physiology , Facial Expression , Quadriplegia/rehabilitation , Assistive Technology , User-Computer Interface , Adult , Algorithms , Decision Trees , Female , Humans , Male , Young Adult
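A nonlinear nose-to-cursor mapping might look like the following sketch; the power-law gain, the dead zone, and all constants are illustrative assumptions of my own, not the paper's calibrated function:

```python
import math

def cursor_delta(dx, dy, k=8.0, p=1.5, dead_zone=0.02):
    """Map a nose displacement (dx, dy) to a cursor displacement:
    motions inside the dead zone are ignored (steady pointing), larger
    motions are amplified super-linearly (p > 1) for fast traversal."""
    r = math.hypot(dx, dy)
    if r < dead_zone:
        return (0.0, 0.0)
    gain = k * r ** (p - 1.0)  # gain grows with the motion magnitude
    return (gain * dx, gain * dy)
```

Tiny involuntary tremors are thus filtered out, while deliberate large motions cross the screen quickly.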
7.
IEEE J Biomed Health Inform ; 19(2): 430-9, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24771601

ABSTRACT

The elderly population is increasing rapidly all over the world. One major risk for elderly people is fall accidents, especially for those living alone. In this paper, we propose a robust fall detection approach that analyzes the tracked key joints of the human body using a single depth camera. Compared with rival approaches that rely on RGB inputs, the proposed scheme is independent of lighting conditions and can work even in a dark room. In our scheme, a pose-invariant randomized decision tree algorithm is proposed for the key joint extraction, which requires low computational cost during training and testing. Then, a support vector machine classifier, whose input is the 3-D trajectory of the head joint, is employed to determine whether a fall motion occurs. The experimental results demonstrate that the proposed fall detection method is more accurate and robust compared with the state-of-the-art methods.


Subjects
Accidental Falls , Image Processing, Computer-Assisted/methods , Monitoring, Ambulatory/instrumentation , Monitoring, Ambulatory/methods , Video Recording/instrumentation , Adult , Female , Head/physiology , Humans , Male , Reproducibility of Results , Young Adult
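The paper feeds the 3-D head-joint trajectory to an SVM; purely as an illustration of the kind of signal involved, here is a hypothetical hand-crafted feature extractor of my own (not the paper's pipeline), computing quantities a fall classifier could plausibly use:

```python
def head_features(traj, fps=30.0):
    """Features from a head-joint trajectory [(x, y, z), ...] with
    y = height in meters: total height drop, peak downward speed (m/s),
    and final height. A fall shows a large drop, a high peak downward
    speed, and a low final height."""
    ys = [p[1] for p in traj]
    drop = max(ys) - ys[-1]
    peak_down_speed = max((ys[i] - ys[i + 1]) * fps
                          for i in range(len(ys) - 1))
    return [drop, peak_down_speed, ys[-1]]
```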
8.
IEEE Trans Image Process ; 16(11): 2830-41, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17990759

ABSTRACT

This paper proposes a new algorithm to integrate image registration into image super-resolution (SR). Image SR is a process to reconstruct a high-resolution (HR) image by fusing multiple low-resolution (LR) images. A critical step in image SR is accurate registration of the LR images or, in other words, effective estimation of motion parameters. Conventional SR algorithms assume either that the motion parameters estimated by existing registration methods are error-free or that the motion parameters are known a priori. This assumption, however, is impractical in many applications, as most existing registration algorithms still suffer various degrees of error, and the motion parameters among the LR images are generally unknown a priori. In view of this, this paper presents a new framework that performs simultaneous image registration and HR image reconstruction. As opposed to other current methods that treat image registration and HR reconstruction as disjoint processes, the new framework enables image registration and HR reconstruction to be estimated simultaneously and improved progressively. Further, unlike most algorithms that focus on the translational motion model, the proposed method adopts a more generic motion model that includes both translation and rotation. An iterative scheme is developed to solve the arising nonlinear least squares problem. Experimental results show that the proposed method is effective in performing image registration and SR for simulated as well as real-life images.


Subjects
Algorithms , Artificial Intelligence , Data Interpretation, Statistical , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Computer Simulation , Information Storage and Retrieval/methods , Least-Squares Analysis , Models, Statistical , Nonlinear Dynamics , Reproducibility of Results , Sensitivity and Specificity
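The joint estimation above can be posed as a nonlinear least squares problem; the notation below is my own shorthand for the standard SR observation model, not necessarily the paper's:

```latex
\min_{X,\,\{\theta_k\}} \;\sum_{k=1}^{K} \bigl\| Y_k - D\,H\,M(\theta_k)\,X \bigr\|_2^2
```

where $X$ is the HR image, $Y_k$ the $k$-th LR image, $M(\theta_k)$ warps by the translation-plus-rotation parameters $\theta_k$, $H$ models blur, and $D$ downsamples. The problem is nonlinear in $\theta_k$, so the iterative scheme alternates between refining the motion parameters and the HR reconstruction, improving both progressively.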
9.
IEEE Trans Image Process ; 15(6): 1323-30, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16764260

ABSTRACT

In this paper, we address the problem of unequal error protection (UEP) for scalable video transmission over a wireless packet-erasure channel. Unequal amounts of protection are allocated to the different frames (I- or P-frames) of a group-of-pictures (GOP), and within each frame, unequal amounts of protection are allocated to the progressive bit-stream of scalable video to provide a graceful degradation of video quality as the packet loss rate varies. We use a genetic algorithm (GA) to quickly find the allocation pattern, which is hard to obtain with conventional methods such as hill climbing. Theoretical analysis and experimental results both demonstrate the advantage of the proposed algorithm.


Subjects
Artifacts , Computer Communication Networks , Data Compression/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Signal Processing, Computer-Assisted , Video Recording/methods , Algorithms , Computer Simulation , Data Interpretation, Statistical , Models, Genetic , Models, Statistical
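The GA search can be sketched as follows; the erasure model, fitness function, and genetic operators here are simplified assumptions of my own, meant only to show how an allocation summing to a fixed protection budget is evolved, not the paper's actual formulation:

```python
import random

def expected_quality(alloc, loss_rate, layer_value):
    """Expected quality of a progressive stream under a toy erasure
    model: layer i (with alloc[i] extra protection packets) is decodable
    only if it and every earlier layer survive, and its survival
    probability is 1 - loss_rate ** (alloc[i] + 1)."""
    quality, survive = 0.0, 1.0
    for n, v in zip(alloc, layer_value):
        survive *= 1.0 - loss_rate ** (n + 1)
        quality += survive * v
    return quality

def ga_allocate(budget, layer_value, loss_rate=0.1,
                pop_size=30, generations=80, seed=0):
    """Tiny GA: evolve non-negative integer allocations summing to
    `budget`, using truncation selection, one-point crossover, and a
    repair step that restores the budget constraint."""
    rng = random.Random(seed)
    n = len(layer_value)

    def repair(a):
        a = [max(0, g) for g in a]
        while sum(a) > budget:
            i = rng.randrange(n)
            if a[i] > 0:
                a[i] -= 1
        while sum(a) < budget:
            a[rng.randrange(n)] += 1
        return a

    pop = [repair([rng.randint(0, budget) for _ in range(n)])
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: -expected_quality(a, loss_rate, layer_value))
        elite = pop[:pop_size // 2]
        pop = elite[:]
        while len(pop) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            pop.append(repair(p1[:cut] + p2[cut:]))
    return max(pop, key=lambda a: expected_quality(a, loss_rate, layer_value))
```

Because later layers are useless without earlier ones, the fitness naturally pushes protection toward the base layer, which is the UEP behavior the abstract describes.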