Results 1 - 19 of 19
1.
Sensors (Basel) ; 24(9)2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38732817

ABSTRACT

Existing Retinex-based low-light image enhancement strategies focus heavily on crafting complex networks for Retinex decomposition but often result in imprecise estimations. To overcome the limitations of previous methods, we introduce a straightforward yet effective strategy for Retinex decomposition, dividing images into colormaps and graymaps as new estimations for the reflectance and illumination maps. These maps are then enhanced separately using a diffusion model for improved restoration. Furthermore, we address the dual challenge of perturbation removal and brightness adjustment in illumination maps by incorporating brightness guidance, which aids in precisely adjusting the brightness while eliminating disturbances, ensuring a more effective enhancement process. Extensive quantitative and qualitative experimental analyses demonstrate that our proposed method improves performance by approximately 4.4% on the LOL dataset compared to other state-of-the-art diffusion-based methods, while also validating the model's generalizability across multiple real-world datasets.
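
The colormap/graymap split can be illustrated with a minimal sketch (the function name and the max-over-channels illumination proxy are assumptions of this sketch; the paper's actual decomposition may differ):

```python
import numpy as np

def decompose(image, eps=1e-6):
    """Hypothetical Retinex-style split of an RGB image into a graymap
    (illumination estimate) and a colormap (reflectance estimate) such
    that graymap * colormap reconstructs the input."""
    # Per-pixel maximum over channels is a common illumination proxy.
    graymap = image.max(axis=-1, keepdims=True)
    # The ratio image carries the chromatic (reflectance-like) content.
    colormap = image / (graymap + eps)
    return graymap, colormap

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))
graymap, colormap = decompose(img)
recon = graymap * colormap  # near-exact reconstruction
```

In the paper, each of the two maps would then be restored by its own diffusion model before recombination.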

2.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12408-12426, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37819806

ABSTRACT

Natural untrimmed videos provide rich visual content for self-supervised learning. Yet most previous efforts to learn spatio-temporal representations rely on manually trimmed videos, such as the Kinetics dataset (Carreira and Zisserman 2017), resulting in limited diversity in visual patterns and limited performance gains. In this work, we aim to improve video representations by leveraging the rich information in natural untrimmed videos. For this purpose, we propose learning a hierarchy of temporal consistencies in videos, i.e., visual consistency and topical consistency, corresponding respectively to clip pairs that tend to be visually similar when separated by a short time span, and clip pairs that share similar topics when separated by a long time span. Specifically, we present a Hierarchical Consistency (HiCo++) learning framework, in which visually consistent pairs are encouraged to share the same feature representations through contrastive learning, while topically consistent pairs are coupled through a topical classifier that distinguishes whether they are topic-related, i.e., from the same untrimmed video. Additionally, we impose a gradual sampling algorithm for the proposed hierarchical consistency learning and demonstrate its theoretical superiority. Empirically, we show that HiCo++ can not only generate stronger representations on untrimmed videos, but also improve representation quality when applied to trimmed videos. This contrasts with standard contrastive learning, which fails to learn powerful representations from untrimmed videos. Source code will be made available.

3.
IEEE Trans Image Process ; 32: 3717-3731, 2023.
Article in English | MEDLINE | ID: mdl-37405882

ABSTRACT

Improving boundary segmentation results has recently attracted increasing attention in the field of semantic segmentation. Since existing popular methods usually exploit the long-range context, the boundary cues are obscure in the feature space, leading to poor boundary results. In this paper, we propose a novel conditional boundary loss (CBL) for semantic segmentation to improve the performance of the boundaries. The CBL creates a unique optimization goal for each boundary pixel, conditioned on its surrounding neighbors. The conditional optimization of the CBL is easy yet effective. In contrast, most previous boundary-aware methods have difficult optimization goals or may cause potential conflicts with the semantic segmentation task. Specifically, the CBL enhances the intra-class consistency and inter-class difference, by pulling each boundary pixel closer to its unique local class center and pushing it away from its different-class neighbors. Moreover, the CBL filters out noisy and incorrect information to obtain precise boundaries, since only surrounding neighbors that are correctly classified participate in the loss calculation. Our loss is a plug-and-play solution that can be used to improve the boundary segmentation performance of any semantic segmentation network. We conduct extensive experiments on ADE20K, Cityscapes, and Pascal Context, and the results show that applying the CBL to various popular segmentation networks can significantly improve the mIoU and boundary F-score performance.
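
The pull-push idea behind the CBL can be sketched per pixel (a toy NumPy version under heavy simplifications — a single pixel, a squared-distance pull, a hinge push; the actual loss is defined over full feature maps):

```python
import numpy as np

def conditional_boundary_loss(embed, labels, preds, i, j, radius=1, margin=1.0):
    """Toy pull-push loss for one boundary pixel (i, j): pull it toward the
    mean embedding of correctly classified same-class neighbours and push it
    away from correctly classified different-class neighbours."""
    h, w, _ = embed.shape
    center = embed[i, j]
    same, diff = [], []
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            ni, nj = i + di, j + dj
            if (di == 0 and dj == 0) or not (0 <= ni < h and 0 <= nj < w):
                continue
            if preds[ni, nj] != labels[ni, nj]:
                continue  # only correctly classified neighbours participate
            (same if labels[ni, nj] == labels[i, j] else diff).append(embed[ni, nj])
    pull = float(np.sum((center - np.mean(same, axis=0)) ** 2)) if same else 0.0
    push = sum(max(0.0, margin - float(np.linalg.norm(center - d))) for d in diff)
    return pull + push

embed = np.zeros((3, 3, 2))
labels = np.zeros((3, 3), dtype=int)
flat_loss = conditional_boundary_loss(embed, labels, labels.copy(), 1, 1)
labels[0, 0] = 1  # one correctly classified different-class neighbour
mixed_loss = conditional_boundary_loss(embed, labels, labels.copy(), 1, 1)
```

Misclassified neighbours are skipped, mirroring the paper's filtering of noisy information from the loss calculation.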

4.
IEEE Trans Cybern ; 53(3): 1641-1652, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34506295

ABSTRACT

Human parsing is a fine-grained semantic segmentation task that requires understanding human semantic parts. Most existing methods model human parsing as general semantic segmentation, ignoring the inherent relationships among hierarchical human parts. In this work, we propose a pose-guided hierarchical semantic decomposition and composition framework for human parsing. Specifically, our method includes a semantic maintained decomposition and composition (SMDC) module and a pose distillation (PD) module. SMDC progressively disassembles the human body to focus on more concise regions of interest in the decomposition stage and then gradually assembles human parts under the guidance of pose information in the composition stage. Notably, SMDC maintains the atomic semantic labels during both stages to avoid the error-propagation issue of the hierarchical structure. To further exploit the relationships among human parts, we introduce pose information as explicit guidance for the composition. However, the discrete structure prediction of pose estimation conflicts with the continuous-region requirement of human parsing. To this end, we design the PD module to broadcast the maximum responses of pose estimation into continuous structures via knowledge distillation. Experimental results on the Look-Into-Person (LIP) and PASCAL-Person-Part datasets demonstrate the superiority of our method over state-of-the-art methods, with 55.21% mean Intersection over Union (mIoU) on LIP and 69.88% mIoU on PASCAL-Person-Part.


Subjects
Semantics, Humans
5.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7319-7337, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36355744

ABSTRACT

Person search aims at localizing and recognizing query persons from raw video frames, combining two sub-tasks: pedestrian detection and person re-identification. The dominant approach, termed one-step person search, jointly optimizes detection and identification in a unified network and exhibits higher efficiency. However, major challenges remain: (i) conflicting objectives of multiple sub-tasks in a shared feature space, (ii) an inconsistent memory bank caused by the limited batch size, and (iii) underutilized unlabeled identities during identification learning. To address these issues, we develop an enhanced decoupled and memory-reinforced network (DMRNet++). First, we simplify the standard tightly coupled pipelines and establish a task-decoupled framework (TDF). Second, we build a memory-reinforced mechanism (MRM) with a slow-moving average of the network to better encode the consistency of the memorized features. Third, considering the potential of unlabeled samples, we model the recognition process as semi-supervised learning. An unlabeled-aided contrastive loss (UCL) is developed to boost identification feature learning by exploiting the aggregation of unlabeled identities. Experimentally, the proposed DMRNet++ obtains mAP of 94.5% and 52.1% on the CUHK-SYSU and PRW datasets, respectively, exceeding most existing methods.
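
The slow-moving-average memory idea is, in spirit, the familiar momentum update of a feature memory bank; a sketch under assumed naming, not DMRNet++'s exact formulation:

```python
import numpy as np

def update_memory(memory, ids, feats, momentum=0.9):
    """Momentum (slow-moving average) update of an identity memory bank:
    memory[pid] <- m * memory[pid] + (1 - m) * feat, then L2-renormalised,
    so entries drift slowly and stay consistent across small batches."""
    for pid, f in zip(ids, feats):
        mixed = momentum * memory[pid] + (1.0 - momentum) * f
        memory[pid] = mixed / (np.linalg.norm(mixed) + 1e-12)
    return memory

memory = np.array([[1.0, 0.0], [0.0, 1.0]])  # two identities, 2-D features
memory = update_memory(memory, ids=[0], feats=[np.array([0.0, 1.0])])
```

A high momentum keeps each entry close to its running history, which is the point of the memory-reinforced mechanism: features memorized under a small batch size stay consistent with the slowly evolving network.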

6.
Appl Opt ; 61(25): 7498-7507, 2022 Sep 01.
Article in English | MEDLINE | ID: mdl-36256055

ABSTRACT

In an uncooled infrared imaging system, thermal radiation effects are caused by the heat source from the target or the detection window, which seriously degrades target detection, tracking, and recognition. To address this problem, a multi-scale correction method via fast surface fitting with Chebyshev polynomials is proposed. To the best of our knowledge, high-precision Chebyshev polynomial surface fitting is introduced into thermal radiation bias-field estimation for the first time. Surface fitting in the gradient domain is added to the thermal radiation effects correction model as a regularization term, which overcomes the ill-posed matrix problem of high-order bivariate polynomial surface fitting and achieves higher accuracy at the same order. Additionally, a multi-scale iterative strategy and a vector representation are adopted to speed up the iterative optimization and the surface fitting, respectively. The vector representation greatly reduces the number of basis function calls and achieves fast surface fitting. In addition, split Bregman optimization is used to solve the minimization problem of the correction model, decomposing the multivariable optimization problem into multiple univariate optimization sub-problems. Experimental results on simulated and real degraded images demonstrate that our proposed method performs favorably against the state of the art in thermal radiation effects correction.
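
The core fitting step — a smooth low-order Chebyshev surface approximating the radiation bias field — can be sketched with NumPy's Chebyshev utilities (a plain least-squares fit; the paper's gradient-domain regularization and multi-scale iteration are omitted):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def fit_chebyshev_surface(z, deg=(3, 3)):
    """Least-squares fit of a 2-D Chebyshev polynomial surface to an image,
    a stand-in for thermal radiation bias-field estimation."""
    h, w = z.shape
    yy, xx = np.mgrid[-1:1:h * 1j, -1:1:w * 1j]  # domain scaled to [-1, 1]
    A = C.chebvander2d(xx.ravel(), yy.ravel(), deg)  # design matrix
    coef, *_ = np.linalg.lstsq(A, z.ravel(), rcond=None)
    return (A @ coef).reshape(h, w)

# A smooth synthetic bias field is recovered almost exactly.
yy, xx = np.mgrid[-1:1:32j, -1:1:32j]
bias = 0.5 + 0.3 * xx ** 2 + 0.2 * xx * yy
fitted = fit_chebyshev_surface(bias, deg=(3, 3))
```

Chebyshev bases are well conditioned on [-1, 1], which is why they behave better than raw high-order bivariate monomials in the least-squares system.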

7.
Article in English | MEDLINE | ID: mdl-32203019

ABSTRACT

Dynamic scene blur is usually caused by object motion, depth variation, and camera shake. Most existing methods address this problem using image segmentation or fully end-to-end trainable deep convolutional neural networks that consider different object motions or camera shakes. However, these algorithms are less effective when depth variations exist. In this work, we propose a deep convolutional neural network that exploits the depth map for dynamic scene deblurring. Given a blurred image, we first extract the depth map and adopt a depth refinement network to restore the edges and structure in the depth map. To effectively exploit the depth map, we adopt a spatial feature transform layer to extract depth features and fuse them with the image features through scaling and shifting. Our image deblurring network thus learns to restore a clear image under the guidance of the depth map. Through substantial experiments and analysis, we show that the depth information is crucial to the performance of the proposed model. Finally, extensive quantitative and qualitative evaluations demonstrate that the proposed model performs favorably against state-of-the-art dynamic scene deblurring approaches as well as conventional depth-based deblurring algorithms.
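
The scaling-and-shifting fusion follows the standard spatial feature transform (SFT) idea; a minimal sketch where the learned 1×1 convolutions are replaced by fixed random channel mixes (a simplification for illustration only):

```python
import numpy as np

def spatial_feature_transform(img_feat, depth_feat):
    """SFT-style fusion: depth features are mapped to per-position scale and
    shift maps that modulate the image features elementwise."""
    rng = np.random.default_rng(0)
    c_img, c_dep = img_feat.shape[-1], depth_feat.shape[-1]
    # Stand-ins for learned 1x1 convolutions (random channel mixes).
    w_scale = 0.1 * rng.standard_normal((c_dep, c_img))
    w_shift = 0.1 * rng.standard_normal((c_dep, c_img))
    scale = 1.0 + depth_feat @ w_scale  # centred around the identity
    shift = depth_feat @ w_shift
    return img_feat * scale + shift

img_feat = np.ones((4, 4, 8))
depth_zero = np.zeros((4, 4, 3))  # zero depth features leave the input unchanged
out = spatial_feature_transform(img_feat, depth_zero)
```

Centering the scale around 1 means that where the depth branch is uninformative, the image features pass through unchanged.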

8.
Article in English | MEDLINE | ID: mdl-31751272

ABSTRACT

We present an effective semi-supervised learning algorithm for single image dehazing. The proposed algorithm applies a deep Convolutional Neural Network (CNN) containing a supervised learning branch and an unsupervised learning branch. In the supervised branch, the deep neural network is constrained by the supervised loss functions, which are mean squared, perceptual, and adversarial losses. In the unsupervised branch, we exploit the properties of clean images via sparsity of dark channel and gradient priors to constrain the network. We train the proposed network on both the synthetic data and real-world images in an end-to-end manner. Our analysis shows that the proposed semi-supervised learning algorithm is not limited to synthetic training datasets and can be generalized well to real-world images. Extensive experimental results demonstrate that the proposed algorithm performs favorably against the state-of-the-art single image dehazing algorithms on both benchmark datasets and real-world images.
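
The dark channel sparsity prior used in the unsupervised branch is straightforward to compute; a small sketch (naive loops, no guided filtering):

```python
import numpy as np

def dark_channel(image, patch=3):
    """Dark channel: per-pixel minimum over colour channels followed by a
    local minimum filter. For haze-free outdoor images this map is mostly
    near zero, which can serve as a sparsity prior on dehazed outputs."""
    mins = image.min(axis=-1)
    h, w = mins.shape
    r = patch // 2
    padded = np.pad(mins, r, mode="edge")
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

# Any pixel with one dark colour channel yields a zero dark channel.
img = np.stack([np.ones((4, 4)), np.zeros((4, 4)), np.ones((4, 4))], axis=-1)
dc = dark_channel(img)
```

Penalizing the L1 norm of this map on network outputs pushes dehazed images toward the statistics of clean images, which is how the unsupervised branch constrains the network without paired labels.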

9.
Article in English | MEDLINE | ID: mdl-31536000

ABSTRACT

This work develops a biologically inspired neural network for contour detection in natural images by combining the nonclassical receptive field modulation mechanism with a deep learning framework. The input image is first convolved with the local feature detectors to produce the classical receptive field responses, and then a corresponding modulatory kernel is constructed for each feature map to model the nonclassical receptive field modulation behaviors. The modulatory effects can activate a larger cortical area and thus allow cortical neurons to integrate a broader range of visual information to recognize complex cases. Additionally, to characterize spatial structures at various scales, a multiresolution technique is used to represent visual field information from fine to coarse. Different scale responses are combined to estimate the contour probability. Our method achieves state-of-the-art results among all biologically inspired contour detection models. This study provides a method for improving visual modeling of contour detection and inspires new ideas for integrating more brain cognitive mechanisms into deep neural networks.

10.
Sensors (Basel) ; 19(18)2019 Sep 06.
Article in English | MEDLINE | ID: mdl-31500196

ABSTRACT

Most existing person re-identification methods focus on matching still person images across non-overlapping camera views. Despite their excellent performance in some circumstances, these methods still suffer from occlusion and changes of pose, viewpoint, or lighting. Video-based re-identification is a natural way to overcome these problems by exploiting space-time information from videos. One of the most challenging problems in video-based person re-identification is temporal alignment, in addition to spatial alignment. To address this problem, we propose an effective superpixel-based, temporally aligned representation for video-based person re-identification, which represents a video sequence using only one walking cycle. In particular, we first build a candidate set of walking cycles by extracting motion information at the superpixel level, which is more robust than the pixel level. Then, from the candidate set, we propose an effective criterion to select the walking cycle that best matches the intrinsic periodicity of walking persons. Finally, we propose a temporally aligned pooling scheme to describe the video data in the selected walking cycle. In addition, to characterize the individual still images in the cycle, we propose a superpixel-based representation to improve spatial alignment. Extensive experimental results on three public datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.

11.
J Opt Soc Am A Opt Image Sci Vis ; 33(6): 1207-13, 2016 Jun 01.
Article in English | MEDLINE | ID: mdl-27409451

ABSTRACT

Manifold regularization (MR) has become one of the most widely used approaches in semi-supervised learning. It has shown superiority by exploiting the local manifold structure of both labeled and unlabeled data. The manifold structure is modeled by constructing a Laplacian graph and is then incorporated into learning through a smoothness regularization term, so that the labels of labeled and unlabeled data vary smoothly along the geodesics on the manifold. However, MR ignores the discriminative ability of the labeled and unlabeled data. To address this problem, we propose an enhanced MR framework for semi-supervised classification in which the local discriminative information of the labeled and unlabeled data is explicitly exploited. To make full use of labeled data, we first employ a semi-supervised clustering method to discover the underlying structure of the whole dataset. We then construct a local discrimination graph to model the discriminative information of labeled and unlabeled data according to the discovered intrinsic structure. Therefore, data points that may come from different clusters, though similar on the manifold, are enforced to be far apart. Finally, the discrimination graph is incorporated into the MR framework. In particular, we utilize semi-supervised fuzzy c-means and Laplacian regularized Kernel Minimum Squared Error for semi-supervised clustering and classification, respectively. Experimental results on several benchmark datasets and face recognition demonstrate the effectiveness of our proposed method.
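
The smoothness term at the heart of MR is the quadratic form f^T L f on a neighborhood graph; a compact sketch (dense kNN graph with binary weights — real implementations use sparse matrices and heat-kernel weights):

```python
import numpy as np

def laplacian_smoothness(X, f, k=2):
    """Manifold smoothness penalty f^T L f on a symmetric kNN graph:
    small when the predictions f vary smoothly over nearby points."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # k nearest neighbours, self excluded
        W[i, nbrs] = W[nbrs, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W  # unnormalised graph Laplacian
    return float(f @ L @ f)

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 2))
s_const = laplacian_smoothness(X, np.ones(8))  # constant labels: no penalty
s_rand = laplacian_smoothness(X, rng.standard_normal(8))
```

Since f^T L f equals half the sum of W_ij (f_i - f_j)^2 over edges, it vanishes for constant labelings and grows when connected neighbours receive different labels — exactly the smoothness the enhanced framework keeps while adding the discrimination graph.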

12.
Sensors (Basel) ; 16(4)2016 Apr 15.
Article in English | MEDLINE | ID: mdl-27092505

ABSTRACT

Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. The exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Building on it, we improve the ELDA tracking algorithm with deep convolutional neural network (CNN) features and adaptive model updates. Deep CNN features have been successfully used in various computer vision tasks, but extracting CNN features on all candidate windows is time consuming. To address this problem, a two-step CNN feature extraction method is proposed that computes the convolutional layers and the fully-connected layers separately. Owing to the strong discriminative ability of CNN features and the exemplar-based model, we update both the object and background models to improve their adaptivity and to handle the tradeoff between discriminative ability and adaptivity. An object updating method is proposed to select the "good" models (detectors), which are highly discriminative and uncorrelated with the other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM) to adapt to complex scenes; it is initialized offline and updated online. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges, where it achieves the best overall performance among the compared state-of-the-art trackers, demonstrating the effectiveness and robustness of our tracking algorithm.

13.
J Opt Soc Am A Opt Image Sci Vis ; 32(2): 173-85, 2015 Feb 01.
Article in English | MEDLINE | ID: mdl-26366588

ABSTRACT

Recent methods based on midlevel visual concepts have shown promising capabilities in the human action recognition field. Automatically discovering semantic entities such as action parts remains challenging. In this paper, we present a method of automatically discovering distinctive midlevel action parts from video for recognition of human actions. We address this problem by learning and selecting a collection of discriminative and representative action part detectors directly from video data. We initially train a large collection of candidate exemplar-linear discriminant analysis detectors from clusters obtained by clustering spatiotemporal patches in whitened space. To select the most effective detectors from the vast array of candidates, we propose novel coverage-entropy curves (CE curves) to evaluate a detector's capability of distinguishing actions. The CE curves characterize the correlation between the representative and discriminative power of detectors. In the experiments, we apply the mined part detectors as a visual vocabulary to the task of action recognition on four datasets: KTH, Olympic Sports, UCF50, and HMDB51. The experimental results demonstrate the effectiveness of the proposed method and show the state-of-the-art recognition performance.

14.
J Opt Soc Am A Opt Image Sci Vis ; 32(4): 566-75, 2015 Apr 01.
Article in English | MEDLINE | ID: mdl-26366765

ABSTRACT

In recent decades, Gaussian Mixture Models (GMMs) have attracted considerable interest in data mining and pattern recognition. A GMM-based clustering algorithm models a dataset with a mixture of multiple Gaussian components and estimates the model parameters using the Expectation-Maximization (EM) algorithm. Recently, a Locally Consistent GMM (LCGMM) has been proposed to improve clustering performance by exploiting the local manifold structure of the data using a p-nearest-neighbor graph. In addition to the underlying manifold structure, many other forms of prior knowledge may guide the clustering process and improve performance. In this paper, we introduce a Semi-Supervised LCGMM (Semi-LCGMM), where prior knowledge is provided in the form of class labels for part of the data. In particular, Semi-LCGMM incorporates the prior knowledge into the maximum likelihood function of the original LCGMM, and the model parameters are estimated using the EM algorithm. It is worth noting that, in our algorithm, each class may be modeled by multiple Gaussian components, while in the unsupervised setting each class is modeled by a single Gaussian component. Our algorithm has shown promising results in many different applications, including clustering breast cancer data, heart disease data, handwritten digit images, and human face images, as well as image segmentation.


Subjects
Algorithms, Statistical Models, Breast Neoplasms, Cluster Analysis, Data Mining, Face, Heart Diseases, Humans, Computer-Assisted Image Processing, Normal Distribution
15.
PLoS One ; 9(5): e98447, 2014.
Article in English | MEDLINE | ID: mdl-24871350

ABSTRACT

This paper presents a novel object detection method using a single instance from the object category. Our method uses biologically inspired global scene context criteria to check whether each individual location of the image can be naturally replaced by the query instance, which indicates whether there is a similar object at that location. Unlike traditional detection methods that only look at individual locations for the desired objects, our method evaluates the consistency of the entire scene. It is therefore robust to large intra-class variations, occlusions, moderate pose variations, low-resolution conditions, background clutter, etc., and requires no off-line training. The experimental results on four datasets and two video sequences clearly show the superior robustness of the proposed method, suggesting that global scene context is important for visual detection/localization.


Subjects
Algorithms, Biological Models, Automated Pattern Recognition/methods, Computer Simulation, Humans, Computer-Assisted Image Interpretation/methods, Visual Pattern Recognition/physiology, Visual Perception/physiology
16.
J Opt Soc Am A Opt Image Sci Vis ; 31(1): 1-6, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24561932

ABSTRACT

Face recognition is one of the most important applications of machine learning and computer vision. The traditional supervised learning methods require a large amount of labeled face images to achieve good performance. In practice, however, labeled images are usually scarce while unlabeled ones may be abundant. In this paper, we introduce a semi-supervised face recognition method, in which semi-supervised linear discriminant analysis (SDA) and affinity propagation (AP) are integrated into a self-training framework. In particular, SDA is employed to compute the face subspace using both labeled and unlabeled images, and AP is used to identify the exemplars of different face classes in the subspace. The unlabeled data can then be classified according to the exemplars and the newly labeled data with the highest confidence are added to the labeled data, and the whole procedure iterates until convergence. A series of experiments on four face datasets are carried out to evaluate the performance of our algorithm. Experimental results illustrate that our algorithm outperforms the other unsupervised, semi-supervised, and supervised methods.


Subjects
Algorithms, Artificial Intelligence, Face, Discriminant Analysis, Humans, Computer-Assisted Image Processing
17.
Opt Lett ; 37(1): 76-8, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22212796

ABSTRACT

We present an instance-based attention model to predict where humans could look first when searching for an object instance, and we show its application in image synthesis. The proposed model learns configurational rules from a vast set of scene images described by global scene representations. The rules are then used to predict the focus of attention when searching for a given object instance with a specific scale and pose. Finally, image synthesis results are obtained by placing the object instance into the scene at the position that attracts the most attention. Promising experimental results demonstrate the effectiveness of the proposed model.


Subjects
Attention, Vision/physiology, Humans, Biological Models, Photography, Probability
18.
Comput Med Imaging Graph ; 33(2): 140-7, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19095408

ABSTRACT

This paper presents a versatile nonlinear diffusion method to visually enhance angiogram images for improved clinical diagnosis. Traditional nonlinear diffusion has been shown to be very effective for edge-preserving smoothing of images. However, existing nonlinear diffusion models suffer from several drawbacks: sensitivity to the choice of the conductance parameter, a limited range of edge enhancement, and sensitivity to the selection of the evolution time. The new anisotropic diffusion we propose is based on the facet model, which resolves these issues adaptively according to the image content. The method uses the facet model to fit the image and reduce noise, and simultaneously uses the sum of squared eigenvalues of the Hessian as the criterion for selecting the conductance parameter. The handling of noise and the conductance parameter thus adapt throughout the diffusion process. Moreover, our method is not sensitive to the choice of evolution time. Experimental results show that the new method is more effective than the original anisotropic diffusion.
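
For context, the "original anisotropic diffusion" being improved on here is the classical Perona-Malik scheme; a minimal sketch (periodic borders and an exponential conductance are choices of this sketch):

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=0.5, dt=0.2):
    """Classical Perona-Malik nonlinear diffusion: smooth within regions
    while large gradients (edges) suppress the diffusion flux."""
    def g(d):
        return np.exp(-(d / kappa) ** 2)  # edge-stopping conductance

    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Differences to the four neighbours (periodic border for brevity).
        dn = np.roll(u, 1, axis=0) - u
        ds = np.roll(u, -1, axis=0) - u
        de = np.roll(u, 1, axis=1) - u
        dw = np.roll(u, -1, axis=1) - u
        u += dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

rng = np.random.default_rng(0)
noisy = rng.random((16, 16))
smoothed = perona_malik(noisy)
```

The fixed conductance parameter kappa and the fixed stopping time are precisely the sensitivities the facet-model variant sets adaptively from the image content.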


Subjects
Angiography/methods, Artifacts, Structural Models, Radiographic Image Enhancement/methods, Computer-Assisted Signal Processing, Anisotropy, Coronary Vessels/anatomy & histology, Energy Transfer, Fuzzy Logic, Humans, Nonlinear Dynamics, Automated Pattern Recognition/methods, Regression Analysis, Sensitivity and Specificity
19.
Conf Proc IEEE Eng Med Biol Soc ; 2005: 1736-8, 2005.
Article in English | MEDLINE | ID: mdl-17282549

ABSTRACT

In this paper, we present a new method for X-ray angiogram image enhancement using contrast-modulated nonlinear diffusion. The original nonlinear diffusion is gradient driven, which makes it heavily dependent on accurate edge estimation. However, accurate edge estimation is very difficult for X-ray angiogram images, which are characterized by complex backgrounds, so the model needs improvement. By designing a new concept of contrast space according to the characteristics of these images, we turn the original nonlinear diffusion into a contrast-modulated nonlinear diffusion. Compared with the traditional method, this new approach is shown to achieve better enhancement performance.
