Results 1 - 13 of 13
1.
Article in English | MEDLINE | ID: mdl-38502627

ABSTRACT

The remarkable performance of recent stereo depth estimation models benefits from the successful use of convolutional neural networks to regress dense disparity. As with most tasks, this requires gathering training data that covers the heterogeneous scenes encountered at deployment time. In practical applications, however, training samples are typically acquired continuously, making the ability to learn new scenes continually even more crucial. To this end, we propose continual stereo matching, in which a model is tasked to 1) continually learn new scenes, 2) overcome forgetting of previously learned scenes, and 3) continuously predict disparities at inference. We achieve this goal by introducing a Reusable Architecture Growth (RAG) framework, which leverages task-specific neural unit search and architecture growth to learn new scenes continually in both supervised and self-supervised manners. RAG maintains high reusability during growth by reusing previous units while still obtaining good performance. Additionally, we present a Scene Router module that adaptively selects the scene-specific architecture path at inference. Comprehensive experiments on numerous datasets show that our framework performs impressively under various weather, road, and city conditions and surpasses state-of-the-art methods in the more challenging cross-dataset settings. Further experiments demonstrate the adaptability of our method to unseen scenes, which can facilitate end-to-end stereo architecture learning and practical deployment.
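
As a minimal illustration of the routing step, the following Python sketch dispatches an input to the stored scene-specific path whose prototype embedding is most similar. The SceneRouter class, its prototypes, and the path callables are hypothetical stand-ins for illustration, not the paper's API.

```python
import numpy as np

class SceneRouter:
    def __init__(self):
        self.prototypes = []   # one embedding per learned scene
        self.paths = []        # one architecture path (a callable) per scene

    def add_scene(self, prototype, path):
        self.prototypes.append(prototype)
        self.paths.append(path)

    def route(self, embedding):
        # cosine similarity against every stored scene prototype
        sims = [p @ embedding / (np.linalg.norm(p) * np.linalg.norm(embedding))
                for p in self.prototypes]
        return self.paths[int(np.argmax(sims))]

router = SceneRouter()
router.add_scene(np.array([1.0, 0.0]), lambda x: "rainy-scene disparity net")
router.add_scene(np.array([0.0, 1.0]), lambda x: "sunny-scene disparity net")
print(router.route(np.array([0.9, 0.1]))(None))  # -> rainy-scene path
```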

2.
IEEE Trans Image Process; 33: 354-365, 2024.
Article in English | MEDLINE | ID: mdl-38117623

ABSTRACT

The sparse signals provided by external sources have been leveraged as guidance for improving dense disparity estimation. However, previous methods assume that depth measurements are randomly sampled, which limits performance gains due to under-sampling in challenging regions and over-sampling in well-estimated areas. In this work, we introduce the Active Disparity Sampling problem, which selects suitable sampling patterns to maximize the utility of depth measurements under arbitrary sampling budgets. We achieve this goal by learning an Adjoint Network for a deep stereo model to measure its pixel-wise disparity quality. Specifically, we design a hard-soft prior supervision mechanism that provides hierarchical supervision for learning the quality map. A Bayesian-optimized disparity sampling policy is further proposed to sample depth measurements under the guidance of the disparity quality. Extensive experiments on standard datasets with various stereo models demonstrate that our method is applicable and effective across different stereo architectures and outperforms existing fixed and adaptive sampling methods at different sampling rates. Remarkably, the proposed method yields substantial improvements when generalized to heterogeneous unseen domains.
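
A minimal sketch of budgeted quality-guided sampling, assuming a precomputed pixel-wise quality map (here random noise standing in for the Adjoint Network's output): the fixed budget is spent on the pixels predicted to be least reliable rather than sampled uniformly.

```python
import numpy as np

rng = np.random.default_rng(0)
quality = rng.random((240, 320))   # stand-in for the predicted disparity-quality map
budget = 500                       # number of sparse depth measurements allowed

flat = quality.ravel()
idx = np.argpartition(flat, budget)[:budget]   # indices of the lowest-quality pixels
mask = np.zeros(flat.shape, dtype=bool)
mask[idx] = True
mask = mask.reshape(quality.shape)             # sampling pattern for the depth sensor
print("sampled", mask.sum(), "pixels; mean quality at samples:",
      flat[idx].mean().round(3))
```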

3.
IEEE Trans Pattern Anal Mach Intell; 43(9): 2905-2920, 2021 Sep.
Article in English | MEDLINE | ID: mdl-32866094

ABSTRACT

Neural architecture search (NAS) is inherently subject to a gap between the architectures used during searching and validating. To bridge this gap effectively, we develop Differentiable ArchiTecture Approximation (DATA) with an Ensemble Gumbel-Softmax (EGS) estimator and an Architecture Distribution Constraint (ADC) to automatically approximate architectures during searching and validating in a differentiable manner. Technically, the EGS estimator consists of a group of Gumbel-Softmax estimators, which can convert probability vectors to binary codes while passing gradients backward, reducing the estimation bias in a differentiable way. Furthermore, to narrow the distribution gap between the sampled architectures and the supernet, the ADC is introduced to reduce the sampling variance during searching. Benefiting from such modeling, architecture probabilities and network weights in the NAS model can be jointly optimized with standard back-propagation, yielding an end-to-end learning mechanism for searching deep neural architectures in an extended search space. Consequently, in the validating process, a high-performance architecture that approaches the one learned during searching is readily built. Extensive experiments on various tasks, including image classification, few-shot learning, unsupervised clustering, semantic segmentation, and language modeling, strongly demonstrate that DATA can discover high-performance architectures while guaranteeing the required efficiency. Code is available at https://github.com/XinbangZhang/DATA-NAS.
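
The core building block is standard enough to sketch directly: a single Gumbel-Softmax estimator (PyTorch's F.gumbel_softmax) samples a near-one-hot operation choice in the forward pass while passing gradients to the architecture logits in the backward pass. EGS ensembles a group of such estimators; the toy loss below is only for demonstration.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(5, requires_grad=True)       # 5 candidate operations
tau = 0.5                                         # softmax temperature
y = F.gumbel_softmax(logits, tau=tau, hard=True)  # one-hot forward, soft backward
loss = (y * torch.arange(5.0)).sum()              # toy objective on the sampled choice
loss.backward()
print(y, logits.grad)                             # gradients reach the logits
```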

4.
IEEE Trans Pattern Anal Mach Intell; 42(4): 793-808, 2020 Apr.
Article in English | MEDLINE | ID: mdl-30571616

ABSTRACT

Baseline estimation is a critical preprocessing step for many document image processing and analysis tasks. The problem is very challenging due to arbitrarily complicated page layouts and various types of image quality degradation. This paper proposes a method based on slope-field recovery for extracting curved baselines from a distorted document image captured by a hand-held camera. Our method treats curved baselines as the solution curves of an ordinary differential equation defined on a slope field. Assuming the page shape is a smooth, developable surface, we investigate a class of intrinsic geometric constraints on baselines to estimate the latent slope field. The curved baselines are finally obtained by solving the ordinary differential equation with the Euler method. Unlike traditional text-line-based methods, ours requires no text-line detection or segmentation. It can exploit multiple visual cues beyond horizontal text lines for baseline extraction and is quite robust to document scripts, various types of image quality degradation (e.g., distortion, blur, and non-uniform illumination), large areas of non-textual objects, and complex page layouts. Extensive experiments on synthetic and real-captured document images evaluate the performance of the proposed method.
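
A worked toy example of the final step, tracing a baseline as the solution curve of an ODE with the Euler method; the slope function here is a synthetic stand-in for the slope field the paper recovers from geometric cues.

```python
import numpy as np

def slope(x, y):
    # stand-in slope field; the paper estimates this from intrinsic
    # geometric constraints of the page surface
    return 0.2 * np.cos(x)

x, y, h = 0.0, 100.0, 1.0        # start pixel and step size (in pixels)
curve = [(x, y)]
for _ in range(300):             # trace across a 300-px-wide page region
    y += h * slope(x, y)         # Euler step: y_{n+1} = y_n + h * f(x_n, y_n)
    x += h
    curve.append((x, y))
print("baseline endpoint:", curve[-1])
```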

5.
IEEE Trans Pattern Anal Mach Intell; 42(4): 809-823, 2020 Apr.
Article in English | MEDLINE | ID: mdl-30596571

ABSTRACT

Clustering is a crucial but challenging task in pattern analysis and machine learning. Existing methods often ignore the combination of representation learning and clustering. To tackle this problem, we revisit the clustering task from its definition and develop Deep Self-Evolution Clustering (DSEC) to jointly learn representations and cluster data. To this end, the clustering task is recast as a binary pairwise-classification problem that estimates whether pairwise patterns are similar. Specifically, similarities between pairwise patterns are defined by the dot product between indicator features generated by a deep neural network (DNN). To learn informative representations for clustering, clustering constraints are imposed on the indicator features so that specific concepts are represented by specific representations. Since ground-truth similarities are unavailable in clustering, an alternating iterative algorithm called Self-Evolution Clustering Training (SECT) is presented to select similar and dissimilar pairwise patterns and to train the DNN alternately. Consequently, the indicator features tend to become one-hot vectors, and patterns can be clustered by locating the largest response of the learned indicator features. Extensive experiments provide strong evidence that DSEC consistently outperforms current models on twelve popular image, text, and audio datasets.
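
A small numeric sketch of the pairwise-classification view, with arbitrary thresholds: similarities are dot products of softmax-normalized indicator features, confident pairs would drive training, and cluster assignments come from the largest indicator response.

```python
import numpy as np

rng = np.random.default_rng(1)
logits = rng.normal(size=(6, 3))                   # 6 samples, 3 clusters
ind = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)  # indicator features
sim = ind @ ind.T                                  # pairwise similarities in [0, 1]
similar = sim > 0.8                                # confident "same cluster" pairs
dissimilar = sim < 0.2                             # confident "different cluster" pairs
labels = ind.argmax(1)                             # cluster = largest response
print(labels, similar.sum(), dissimilar.sum())
```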

6.
IEEE Trans Pattern Anal Mach Intell; 42(11): 2874-2886, 2020 Nov.
Article in English | MEDLINE | ID: mdl-31071020

ABSTRACT

Convolutional neural networks (CNNs) provide a dramatically powerful class of models, but rely on traditional convolution, which can only aggregate permutation-ordered and dimension-equal local inputs. As a result, CNNs can manage only signals on Euclidean or grid-like domains (e.g., images), not those on non-Euclidean or graph domains (e.g., traffic networks). To eliminate this limitation, we develop a local-aggregation function, a sharable nonlinear operation, to aggregate permutation-unordered and dimension-unequal local inputs on non-Euclidean domains. Drawing on function approximation theory, the local-aggregation function is parameterized with a group of orthonormal polynomials in an effective and efficient manner. By replacing the traditional convolution in CNNs with this parameterized local-aggregation function, Local-Aggregation Graph Networks (LAGNs) are readily established; they can fit nonlinear functions without activation functions and can be conveniently trained with standard back-propagation. Extensive experiments on various datasets strongly demonstrate the effectiveness and efficiency of LAGNs, leading to superior performance on numerous pattern recognition and machine learning tasks, including text categorization, molecular activity detection, taxi flow prediction, and image classification.
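
A hypothetical sketch of a local-aggregation function built from orthonormal (here Legendre) polynomial bases: evaluating a learned polynomial at each scalar neighbor feature and averaging is permutation-invariant and handles variable-size neighborhoods, the two properties the abstract requires.

```python
import numpy as np
from numpy.polynomial import legendre

def aggregate(neighbor_feats, coeffs):
    # evaluate the learned polynomial at each scalar neighbor feature,
    # then average: invariant to neighbor order and neighborhood size
    vals = legendre.legval(neighbor_feats, coeffs)
    return vals.mean()

coeffs = np.array([0.1, 0.5, -0.2])   # learnable expansion coefficients (stand-ins)
print(aggregate(np.array([0.3, -0.7, 0.9, 0.1]), coeffs))
```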

7.
IEEE Trans Pattern Anal Mach Intell; 41(11): 2660-2676, 2019 Nov.
Article in English | MEDLINE | ID: mdl-30176580

ABSTRACT

Most existing hashing methods resort to binary codes for large-scale similarity search, owing to their high efficiency of computation and storage. However, binary codes lack sufficient capability for similarity preservation, resulting in less desirable performance. To address this issue, we propose Nonlinear Asymmetric Multi-Valued Hashing (NAMVH), supported by two distinct non-binary embeddings. Specifically, a real-valued embedding represents a newly arriving query through a nonlinear transformation, while a multi-integer embedding compresses the whole database and is modeled by Binary Sparse Representation (BSR) with fixed sparsity. With these two non-binary embeddings, NAMVH preserves more precise similarities between data points and supports incremental extension as database samples evolve dynamically. To perform meaningful asymmetric similarity computation for efficient semantic search, the embeddings are jointly learned by preserving pairwise label-based similarity. Technically, this results in a mixed-integer programming problem, which is efficiently solved by a well-designed alternating optimization method. Extensive experiments on seven large-scale datasets demonstrate that our approach not only outperforms existing binary hashing methods in search accuracy but also retains their query and storage efficiency.
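
A toy sketch of the asymmetric similarity computation, with a random dictionary and codes standing in for the learned BSR factors: the real-valued query is compared against database items stored as sparse integer codes over a small shared dictionary.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 8, 4, 100
dictionary = rng.normal(size=(k, d))        # k shared basis vectors (learned in NAMVH)
codes = rng.integers(-2, 3, size=(n, k))    # multi-integer codes for n database items
query = rng.normal(size=d)                  # real-valued query embedding

# asymmetric similarity: project the query once, then score all codes cheaply
scores = codes @ (dictionary @ query)
print("top-5 matches:", np.argsort(-scores)[:5])
```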

8.
Article in English | MEDLINE | ID: mdl-30334760

ABSTRACT

Recent research has shown the potential of using convolutional neural networks (CNNs) to accomplish single-image dehazing. In this work, we take one step further and explore the possibility of exploiting a network to perform haze removal for videos. Unlike single-image dehazing, video-based approaches can take advantage of the abundant information that exists across neighboring frames. Assuming that a scene point yields highly correlated transmission values between adjacent video frames, we develop a deep learning solution for video dehazing in which a CNN is trained end-to-end to learn how to accumulate information across frames for transmission estimation. The estimated transmission map is subsequently used to recover a haze-free frame via the atmospheric scattering model. In addition, since the semantic information of a scene provides a strong prior for image restoration, we propose to incorporate global semantic priors as input to regularize the transmission maps, so that the estimated maps are smooth within regions of the same object and discontinuous only across the boundaries of different objects. To train this network, we generate a dataset consisting of synthetic hazy and haze-free videos for supervision, based on the NYU depth dataset. We show that the features learned from this dataset are capable of removing haze that arises in outdoor scenes in a wide range of videos. Extensive experiments demonstrate that the proposed algorithm performs favorably against state-of-the-art methods on both synthetic and real-world videos.
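
For reference, the recovery step uses the standard atmospheric scattering model I = J·t + A·(1 − t); once the transmission map t is estimated, a haze-free frame follows in closed form. The frame, transmission map, and atmospheric light below are stand-in values.

```python
import numpy as np

I = np.random.rand(4, 4, 3)               # hazy frame (stand-in)
t = np.random.rand(4, 4, 1) * 0.9 + 0.1   # estimated transmission map (stand-in)
A = 0.95                                  # global atmospheric light (assumed)
t0 = 0.1                                  # lower bound to avoid noise amplification
J = (I - A) / np.maximum(t, t0) + A       # recovered haze-free frame
print(J.shape, float(J.min()), float(J.max()))
```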

9.
IEEE Trans Image Process; 23(12): 5412-27, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25330488

ABSTRACT

Hyperspectral unmixing, the process of estimating a common set of spectral bases and their corresponding composite percentages at each pixel, is an important task for hyperspectral analysis, visualization, and understanding. From an unsupervised learning perspective, this problem is very challenging: both the spectral bases and their composite percentages are unknown, making the solution space too large. To reduce the solution space, various priors have been introduced in previous work. In practice, however, these priors can easily lead to unsuitable solutions, because they apply an identical strength of constraint to all factors, which does not hold in practice. To overcome this limitation, we propose a novel sparsity-based method that learns a data-guided map (DgMap) to describe the individual mixing level of each pixel. Through this DgMap, an ℓp (0 < p < 1) constraint is applied in an adaptive manner. This implementation not only matches the practical situation but also guides the spectral bases toward the pixels under highly sparse constraints. Moreover, an elegant optimization scheme, along with its convergence proof, is provided in this paper. Extensive experiments on several datasets demonstrate that the DgMap is feasible and that high-quality unmixing results can be obtained with our method.
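
A small numeric illustration of the adaptive constraint (with made-up values): each pixel's abundance vector receives its own exponent p from the DgMap, instead of one global p for every pixel.

```python
import numpy as np

rng = np.random.default_rng(3)
abundances = rng.random((5, 4))               # 5 pixels, 4 endmembers (stand-ins)
p_map = np.array([0.3, 0.5, 0.9, 0.5, 0.7])   # per-pixel mixing level from the DgMap

# adaptive lp penalty: sum_i sum_j |a_ij|^{p_i}, one exponent per pixel
penalty = sum((np.abs(a) ** p).sum() for a, p in zip(abundances, p_map))
print("adaptive sparsity penalty:", round(penalty, 3))
```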

10.
IEEE Trans Pattern Anal Mach Intell; 35(7): 1730-43, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23681999

ABSTRACT

A scanned image of an opened book page often suffers from scanning artifacts known as scanning shading and dark-border noise. These artifacts degrade the quality of scanned images and cause many problems for subsequent document image analysis. In this paper, we propose an effective method to rectify these scanning artifacts. Our method rests on two observations: the shading surface of most scanned book pages is quasi-concave, and document contents are usually printed on a sheet of plain, bright paper. Based on these observations, a shading image can be accurately extracted via convex-hull-based image reconstruction. The proposed method proves surprisingly effective for shading correction and dark-border removal: it restores the desired shading-free image while yielding a high-quality illumination surface. More importantly, the method is nonparametric and thus involves no user interaction or parameter fine-tuning, which makes it very appealing to non-expert users. Extensive experiments on synthetic and real-scanned document images demonstrate the effectiveness of the proposed method.
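
A 1-D sketch of the convex-hull idea under simplifying assumptions: the quasi-concave shading profile of a single scanline is estimated as the upper concave envelope of its intensities and then divided out. The paper works on full 2-D images; this reduction is only illustrative.

```python
import numpy as np

def upper_envelope(y):
    # monotone-chain upper hull over the points (i, y[i]); the envelope is
    # then linearly interpolated back to every pixel of the scanline
    hull = []
    for x, v in enumerate(y):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # keep the last point only if the chain turns clockwise here,
            # which keeps the envelope concave from above
            if (x2 - x1) * (v - y1) - (y2 - y1) * (x - x1) < 0:
                break
            hull.pop()
        hull.append((x, v))
    xs, ys = zip(*hull)
    return np.interp(np.arange(len(y)), xs, ys)

rng = np.random.default_rng(4)
scanline = 200 * np.sin(np.linspace(0.5, 2.6, 64)) + rng.random(64) * 10
shading = upper_envelope(scanline)              # estimated shading profile
corrected = scanline / np.maximum(shading, 1e-6)  # shading-free scanline
print(round(float(corrected.min()), 3), round(float(corrected.max()), 3))
```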

11.
IEEE Trans Pattern Anal Mach Intell; 34(4): 707-22, 2012 Apr.
Article in English | MEDLINE | ID: mdl-21808093

ABSTRACT

In this paper, we propose a metric rectification method that restores a flat document image from a single camera-captured image. The core idea is to construct an isometric image mesh by exploiting the geometry of the page surface and the camera. Our method uses a general cylindrical surface (GCS) to model the curved page shape. Under a few mild assumptions, the printed horizontal text lines are shown to be line-convergent symmetric, and this property is used to constrain the estimation of the model parameters under perspective projection. We also introduce a paraperspective projection to approximate the nonlinear perspective projection, from which a set of closed-form formulas is derived for estimating the GCS directrix and the document aspect ratio. Our method provides a straightforward framework for metric rectification and is insensitive to camera position, viewing angle, and the shape of the document page. To evaluate the proposed method, we conducted comprehensive experiments on both synthetic and real-captured images; the results demonstrate its effectiveness. We also carried out a comparative experiment on the public CBDAR2007 dataset, where our method outperforms state-of-the-art methods in terms of OCR accuracy and rectification error.
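
A toy illustration of the flattening step, assuming the GCS directrix has already been estimated: cumulative arc length along the directrix gives the metrically rectified horizontal coordinate. The sinusoidal directrix below is a stand-in, not an estimated one.

```python
import numpy as np

x = np.linspace(0, 1, 200)
y = 0.05 * np.sin(2 * np.pi * x)      # stand-in directrix of the bent page
dx, dy = np.diff(x), np.diff(y)
# cumulative arc length = unrolled (rectified) coordinate along the page
arc = np.concatenate([[0.0], np.cumsum(np.hypot(dx, dy))])
print("flattened page width:", round(arc[-1], 4), "vs chord length:", 1.0)
```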

12.
IEEE Trans Neural Netw Learn Syst; 23(11): 1738-54, 2012 Nov.
Article in English | MEDLINE | ID: mdl-24808069

ABSTRACT

This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes within the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes to move in opposite directions, so that the distances between classes are enlarged. The ε-draggings are then integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form that avoids training multiple mutually independent two-class machines. Thanks to this compact form, the model extends naturally to feature selection: imposing the L2,1 norm on the transformation matrix yields a sparse learning model. Both the multiclass classification model and its feature selection extension are finally solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.
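
A minimal numeric example of ε-dragging (with an arbitrary uniform dragging amount; in the model the amounts are nonnegative variables learned jointly with the regression): targets of the true class are pushed up and all others pushed down, enlarging between-class target distances.

```python
import numpy as np

Y = np.eye(3)[[0, 1, 2, 0]]       # one-hot targets for 4 samples, 3 classes
B = np.where(Y == 1, 1.0, -1.0)   # dragging directions: +1 true class, -1 otherwise
M = np.full_like(Y, 0.2)          # nonnegative dragging amounts (learned in LSR)
T = Y + B * M                     # relaxed targets: 1.2 vs -0.2
print(T)
```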

13.
IEEE Trans Image Process; 19(7): 1837-46, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20236898

ABSTRACT

This paper proposes a general-purpose method for estimating the skew angles of document images. Rather than deriving a skew angle merely from text lines, the proposed method exploits various types of visual cues to image skew available in local image regions. The visual cues are extracted by the Radon transform, and their outliers are iteratively rejected through a floating cascade. A bagging (bootstrap aggregating) estimator is finally employed to combine the estimates from local image blocks. Our experimental results show significant improvements over state-of-the-art methods in execution speed and estimation accuracy, as well as robustness to short and sparse text lines, multiple different skews, and the presence of non-textual objects of various types and quantities.
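
A simplified projection-profile sketch in the spirit of the method's Radon-transform cues, not the paper's exact estimator: each candidate angle shears a local block, and the angle that most sharply concentrates ink into rows wins; per-block estimates would then be combined by the bagging step.

```python
import numpy as np

def estimate_skew(block, angles):
    h, w = block.shape
    best, best_score = 0.0, -np.inf
    cols = np.arange(w)
    for a in angles:
        shift = np.round(np.tan(np.radians(a)) * cols).astype(int)
        rows = np.arange(h)[:, None] + shift[None, :]     # sheared row indices
        profile = np.bincount((rows % h).ravel(),
                              weights=block.ravel(), minlength=h)
        score = profile.var()        # sharper row profile -> better aligned
        if score > best_score:
            best, best_score = a, score
    return best

block = np.zeros((60, 100))
block[20:23, :] = 1                  # a synthetic horizontal text line
print(estimate_skew(block, np.linspace(-5, 5, 21)))  # ~0 degrees
```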
