Results 1 - 20 of 29
1.
Neural Netw; 178: 106484, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38954894

ABSTRACT

Graph neural networks (GNNs) have demonstrated exceptional performance in processing various types of graph data, such as citation networks and social networks. Although many of these GNNs prove their superiority in handling homophilic graphs, they often overlook the other widespread kind, heterophilic graphs, in which adjacent nodes tend to have different classes or dissimilar features. Recent methods attempt to address heterophilic graphs in the spatial domain, trying to aggregate more similar nodes or to repel dissimilar nodes with negative weights. However, they may neglect valuable heterophilic information or extract it ineffectively, which can cause poor performance on downstream tasks over heterophilic graphs, such as node classification and graph classification. Hence, a novel framework named GARN is proposed to effectively extract both homophilic and heterophilic information. First, we analyze the shortcomings of most GNNs in tackling heterophilic graphs from the perspective of graph spectral and spatial theory. Then, motivated by these analyses, a Graph Aggregating-Repelling Convolution (GARC) mechanism is designed to fuse low-pass and high-pass graph filters: it learns positive attention weights as a low-pass filter to aggregate similar adjacent nodes, and negative attention weights as a high-pass filter to repel dissimilar adjacent nodes. A learnable integration weight adaptively fuses these two filters and balances the proportion of the learned positive and negative weights, which lets GARC evolve into different types of graph filters and prevents it from over-relying on high intra-class similarity. Finally, the GARN framework is established by simply stacking several GARC layers, and its graph representation learning ability is evaluated on both node classification and image-converted graph classification tasks. Extensive experiments on multiple homophilic and heterophilic graphs and on complex real-world image-converted graphs indicate the effectiveness of the proposed framework and mechanism over several representative GNN baselines.
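To make the aggregating-repelling idea concrete, here is a minimal sketch of a GARC-style layer in PyTorch, assuming an edge-list graph representation; the attention parameterization and names (att_pos, att_neg, fuse) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GARCLayer(nn.Module):
    """Aggregating-repelling convolution sketch: positive attention acts as a
    low-pass filter, negative attention as a high-pass filter, fused by a
    learnable integration weight. Parameterization details are assumptions."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.att_pos = nn.Linear(2 * out_dim, 1)        # low-pass attention
        self.att_neg = nn.Linear(2 * out_dim, 1)        # high-pass attention
        self.fuse = nn.Parameter(torch.tensor(0.5))     # integration weight

    def forward(self, x, edge_index):
        # edge_index: (2, E) long tensor of (source, target) node indices
        h = self.lin(x)
        src, dst = edge_index
        pair = torch.cat([h[src], h[dst]], dim=-1)
        a_pos = torch.sigmoid(self.att_pos(pair))       # in (0, 1): aggregate
        a_neg = -torch.sigmoid(self.att_neg(pair))      # in (-1, 0): repel
        w = self.fuse * a_pos + (1 - self.fuse) * a_neg
        out = torch.zeros_like(h)
        out.index_add_(0, dst, w * h[src])              # signed aggregation
        return h + out                                  # keep self-information
```

Pushing the integration weight toward 1 recovers a purely low-pass (GCN-like) filter, toward 0 a purely high-pass one, matching the filter-evolution behavior described above.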

2.
Article in English | MEDLINE | ID: mdl-38502627

ABSTRACT

The remarkable performance of recent stereo depth estimation models benefits from the successful use of convolutional neural networks to regress dense disparity. As in most tasks, this requires gathering training data that covers the heterogeneous scenes encountered at deployment time. However, training samples are typically acquired continuously in practical applications, making the capability to learn new scenes continually even more crucial. For this purpose, we propose continual stereo matching, in which a model is tasked to 1) continually learn new scenes, 2) overcome forgetting previously learned scenes, and 3) continuously predict disparities at inference. We achieve this goal by introducing a Reusable Architecture Growth (RAG) framework. RAG leverages task-specific neural unit search and architecture growth to learn new scenes continually in both supervised and self-supervised manners. By reusing previous units, it maintains high reusability during growth while achieving good performance. Additionally, we present a Scene Router module that adaptively selects the scene-specific architecture path at inference. Comprehensive experiments on numerous datasets show that our framework performs impressively in various weather, road, and city circumstances and surpasses state-of-the-art methods in the more challenging cross-dataset settings. Further experiments also demonstrate the adaptability of our method to unseen scenes, which can facilitate end-to-end stereo architecture learning and practical deployment.
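A minimal sketch of the routing idea, assuming a prototype-per-scene organization (an assumption of this sketch; the paper's Scene Router may be organized differently):

```python
import torch
import torch.nn as nn

class SceneRouter(nn.Module):
    """Sketch of inference-time routing: keep one learned prototype per seen
    scene and dispatch the input to the expert (architecture path) whose
    prototype is nearest."""
    def __init__(self, feat_dim, experts):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(len(experts), feat_dim))
        self.experts = nn.ModuleList(experts)  # one grown sub-network per scene

    def forward(self, scene_feat, stereo_pair):
        # scene_feat: (feat_dim,) global descriptor of the input stereo pair
        dists = torch.cdist(scene_feat[None], self.prototypes)  # (1, n_scenes)
        return self.experts[dists.argmin().item()](stereo_pair)
```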

3.
IEEE Trans Image Process; 33: 354-365, 2024.
Article in English | MEDLINE | ID: mdl-38117623

ABSTRACT

The sparse signals provided by external sources have been leveraged as guidance for improving dense disparity estimation. However, previous methods assume depth measurements to be randomly sampled, which restricts performance improvements due to under-sampling in challenging regions and over-sampling in well-estimated areas. In this work, we introduce an Active Disparity Sampling problem that selects suitable sampling patterns to enhance the utility of depth measurements under arbitrary sampling budgets. We achieve this goal by learning an Adjoint Network for a deep stereo model to measure its pixel-wise disparity quality. Specifically, we design a hard-soft prior supervision mechanism that provides hierarchical supervision for learning the quality map. A Bayesian-optimized disparity sampling policy is further proposed to sample depth measurements under the guidance of the disparity quality. Extensive experiments on standard datasets with various stereo models demonstrate that our method is effective across different stereo architectures and outperforms existing fixed and adaptive sampling methods at different sampling rates. Remarkably, the proposed method yields substantial improvements when generalized to heterogeneous unseen domains.
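As a rough illustration of quality-guided sampling, here is a greedy stand-in for the Bayesian-optimized policy described above: spend the budget on the pixels the adjoint network rates least reliable. This is a deliberate simplification, not the paper's policy.

```python
import numpy as np

def sample_by_quality(quality, budget):
    """Greedy sketch: spend the depth-measurement budget on the pixels rated
    least reliable. quality: (H, W) map in [0, 1] from the adjoint network."""
    idx = np.argsort(quality.ravel())[:budget]   # worst-quality pixels first
    ys, xs = np.unravel_index(idx, quality.shape)
    return np.stack([ys, xs], axis=1)            # (budget, 2) pixel coordinates
```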

4.
Neural Netw; 165: 909-924, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37441908

ABSTRACT

Graph Convolutional Networks (GCNs) with naive message-passing mechanisms have limited performance due to their isotropic aggregation strategy. To remedy this drawback, some recent works focus on designing anisotropic aggregation strategies with tricks on feature mapping or structure mining. However, these models still lack the expressiveness and long-range modeling ability that high performance in practice demands. To this end, this paper proposes a tree-guided anisotropic GCN, which applies an anisotropic aggregation strategy with competitive expressiveness and a large receptive field. Specifically, the anisotropic aggregation is decoupled into two stages. The first stage establishes the message-passing paths on a tree-like hypergraph consisting of substructures. The second aggregates the messages with constrained intensities by employing an effective gating mechanism. In addition, a novel anisotropic readout mechanism is constructed to generate representative and discriminative graph-level features for downstream tasks. Our model outperforms baseline methods and recent works on several synthetic benchmarks and on datasets from different real-world tasks. In addition, extensive ablation studies and theoretical analyses indicate the effectiveness of our proposed method.
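A sketch of the second (gated aggregation) stage only, assuming the tree-guided paths are given as an edge list; the gate form and names are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class GatedAggregation(nn.Module):
    """Sketch of the second stage: messages along the tree-guided paths are
    scaled by a learned gate before summation. Path construction on the
    hypergraph (the first stage) is assumed to happen elsewhere."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, h, paths):
        # paths: (2, E) long tensor of (source, target) pairs along the tree
        src, dst = paths
        g = self.gate(torch.cat([h[src], h[dst]], dim=-1))  # message intensity
        out = torch.zeros_like(h)
        out.index_add_(0, dst, g * h[src])                  # gated summation
        return out
```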


Subjects
Neural Networks, Computer
5.
Entropy (Basel); 25(2), 2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36832697

ABSTRACT

Topological Data Analysis (TDA) is an approach to analyzing the shape of data using techniques from algebraic topology. The staple of TDA is Persistent Homology (PH). Recent years have seen a trend of combining PH and Graph Neural Networks (GNNs) in an end-to-end manner to capture topological features from graph data. Though effective, these methods are limited by the shortcomings of PH: incomplete topological information and an irregular output format. Extended Persistent Homology (EPH), a variant of PH, addresses these problems elegantly. In this paper, we propose a plug-in topological layer for GNNs, termed Topological Representation with Extended Persistent Homology (TREPH). Taking advantage of the uniformity of EPH, a novel aggregation mechanism is designed to collate topological features of different dimensions with the local positions that determine their life spans. The proposed layer is provably differentiable and more expressive than PH-based representations, which are in turn strictly stronger than message-passing GNNs in expressive power. Experiments on real-world graph classification tasks demonstrate the competitiveness of TREPH compared with state-of-the-art approaches.
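For readers who want to see extended persistence on a toy graph, the following sketch uses GUDHI's SimplexTree API (extend_filtration / extended_persistence), which returns the four diagram types EPH provides; treat the exact calls and their output format as assumptions of this sketch rather than a reproduction of the paper's pipeline.

```python
import gudhi

# Toy graph: a path whose node values define the filtration.
st = gudhi.SimplexTree()
values = [0.0, 2.0, 1.0, 3.0]
for v, f in enumerate(values):
    st.insert([v], filtration=f)
for u, v in [(0, 1), (1, 2), (2, 3)]:
    st.insert([u, v], filtration=max(values[u], values[v]))

st.extend_filtration()                # prepare the extended filtration
diagrams = st.extended_persistence()  # [ordinary, relative, ext+, ext-]
for name, dgm in zip(["ordinary", "relative", "ext+", "ext-"], diagrams):
    print(name, dgm)
```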

6.
Neural Netw; 161: 213-227, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36774861

ABSTRACT

The development of deep learning techniques has greatly benefited CNN-based object detectors, leading to unprecedented progress in recent years. However, the distribution variance between training and testing domains causes significant performance degradation. Since labeling data for new scenarios is costly and time-consuming, most existing domain adaptation methods perform feature alignment through adversarial training. While this can improve the accuracy of detectors in unlabeled target domains, the unconstrained domain alignment also causes negative transfer of the feature distribution, which compromises the recognition ability of the model. To address this problem, we propose the Knowledge Transfer Network (KTNet), which consists of an object intrinsic knowledge mining module and a category relational knowledge constraint module. Specifically, a binary classifier shared by the source and target domains is designed to extract common attribute knowledge of objects, which can adaptively align foreground and background features from different data domains. Then, we construct relational knowledge graphs to explicitly constrain the category correlations in the source, target, and cross-domain settings. These two modules guide the detector to learn object-related and domain-invariant representations, enabling the proposed KTNet to perform well in four commonly used cross-domain scenarios. Furthermore, ablation experiments show that our method is scalable to more complex backbone networks and different detection architectures.
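The shared binary classifier can be pictured as a single foreground-vs-background head applied to region features from either domain; a minimal sketch follows, where the 256-dim feature size and the head shape are hypothetical.

```python
import torch
import torch.nn as nn

# Sketch: one binary (foreground vs. background) head shared across domains,
# so its decision must rely on attributes common to source and target objects.
shared_head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))

def objectness_loss(region_feats, fg_labels):
    # region_feats: (N, 256) features from either domain; fg_labels: (N,) floats
    logits = shared_head(region_feats).squeeze(1)
    return nn.functional.binary_cross_entropy_with_logits(logits, fg_labels)
```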


Subjects
Knowledge, Recognition (Psychology)
7.
Article in English | MEDLINE | ID: mdl-36306294

ABSTRACT

Studying the relationship between linear discriminant analysis (LDA) and least squares regression (LSR) is of great theoretical and practical significance. It is well known that two-class LDA is equivalent to an LSR problem; directly casting multiclass LDA as an LSR problem, however, is more challenging. A recent study reveals that the equivalence between multiclass LDA and LSR can be established based on a special class indicator matrix, but only under a mild condition that may not hold in scenarios with low-dimensional or oversampled data. In this article, we show that the equivalence between multiclass LDA and LSR can be established based on arbitrary linearly independent class indicator vectors and without any condition. In addition, we show that LDA is also equivalent to a constrained LSR based on data-dependent indicator vectors. It can further be concluded that, under exactly the same mild condition, both of these regressions are equivalent to the null-space LDA method. Illuminated by the equivalence of LDA and LSR, we propose a direct LDA classifier to replace the conventional framework of LDA plus an extra classifier. Extensive experiments validate the above theoretical analysis.
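The two-class equivalence is easy to verify numerically. The sketch below compares the Fisher direction with the least-squares solution under the classic N/N1, -N/N2 target coding; the multiclass, condition-free construction of the article is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, (50, 2))   # class 1
X2 = rng.normal([3.0, 2.0], 1.0, (60, 2))   # class 2
X = np.vstack([X1, X2])
n, n1, n2 = len(X), len(X1), len(X2)

# Fisher/LDA direction: Sw^{-1} (m2 - m1)
m1, m2 = X1.mean(0), X2.mean(0)
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_lda = np.linalg.solve(Sw, m2 - m1)

# Least squares on centered data with class-indicator targets
t = np.r_[np.full(n1, n / n1), np.full(n2, -n / n2)]
Xc = X - X.mean(0)
w_lsr, *_ = np.linalg.lstsq(Xc, t, rcond=None)

cos = w_lda @ w_lsr / (np.linalg.norm(w_lda) * np.linalg.norm(w_lsr))
print(abs(cos))   # ~= 1.0: the two directions coincide up to scale and sign
```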

8.
Neural Netw; 154: 190-202, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35905653

ABSTRACT

Spatial-temporal graph modeling has been widely studied in many fields, such as traffic forecasting and energy analysis, where data have both time and space properties. Existing methods focus on capturing stable and dynamic spatial correlations by constructing physical and virtual graphs along with graph convolution and temporal modeling. However, existing methods tend to smooth node features, which may obscure the spatial-temporal patterns among nodes. Worse, the graph structure is not always available in some fields, and manually constructed stable or dynamic graphs do not necessarily reflect the true spatial correlations. This paper proposes a Subgraph-Aware Graph Structure Revision network (SAGSR) to overcome these limitations. Architecturally, a subgraph-aware structure revision graph convolution module (SASR-GCM) is designed, which revises the learned stable graph to obtain a dynamic one and thus automatically infers the dynamics of spatial correlations. Each of these two graphs is separated into a homophilic subgraph and a heterophilic subgraph by a subgraph-aware graph convolution mechanism, which aggregates similar nodes in the homophilic subgraph with positive weights, while keeping nodes with dissimilar features in the heterophilic subgraph mutually apart with negative aggregation weights to avoid pattern obfuscation. Combined with a gated multi-scale temporal convolution module (GMS-TCM) for temporal modeling, SAGSR can efficiently capture spatial-temporal correlations and extract complex spatial-temporal graph features. Extensive experiments on two specific tasks, traffic flow forecasting and energy consumption forecasting, indicate the effectiveness and superiority of our proposed approach over several competitive baselines.
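The subgraph-aware mechanism can be sketched as splitting a learned signed adjacency by weight sign and aggregating with each part separately; normalization and the structure-revision step itself are omitted, and names are assumptions.

```python
import numpy as np

def subgraph_aware_aggregate(A_signed, H):
    """Sketch: split a learned signed adjacency (n, n) into a homophilic
    (positive) part and a heterophilic (negative) part, then aggregate node
    features H (n, d) with each separately."""
    A_pos = np.maximum(A_signed, 0.0)   # homophilic subgraph: attract
    A_neg = np.minimum(A_signed, 0.0)   # heterophilic subgraph: repel
    return A_pos @ H + A_neg @ H        # signed neighborhood aggregation
```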

9.
IEEE Trans Neural Netw Learn Syst; 32(9): 4026-4038, 2021 Sep.
Article in English | MEDLINE | ID: mdl-32841126

ABSTRACT

External memory-based neural networks, such as differentiable neural computers (DNCs), have recently gained importance and popularity for solving complex sequential learning tasks that pose challenges to conventional neural networks. However, a trained DNC usually has low memory-utilization efficiency. This article introduces a variation of the DNC architecture with convertible short-term and long-term memory, named CSLM-DNC. Unlike the memory architecture of the original DNC, the new scheme of short-term and long-term memories assigns memory locations different importance for reading and writing, and locations can be converted between the two types over time. This is mainly motivated by the human brain, where short-term memory stores large amounts of noisy and unimportant information and decays rapidly, while long-term memory stores important information and lasts for a long time. The conversion between the two types of memory is learned according to their read and write frequencies. We quantitatively and qualitatively evaluate the proposed CSLM-DNC architecture on question answering, copy, and repeat-copy tasks, showing that it can significantly improve memory efficiency and learning performance.
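A toy sketch of the convertible-memory idea, with frequently accessed slots promoted to slow-decaying long-term storage; thresholds and decay rates are invented for illustration, whereas the real CSLM-DNC learns this conversion.

```python
import numpy as np

class ConvertibleMemory:
    """Toy sketch: slots decay fast by default; frequently accessed slots
    are promoted to long-term status and decay slowly."""
    def __init__(self, n_slots, width, promote_after=5):
        self.M = np.zeros((n_slots, width))
        self.freq = np.zeros(n_slots)
        self.long_term = np.zeros(n_slots, dtype=bool)
        self.promote_after = promote_after

    def access(self, idx):
        self.freq[idx] += 1
        self.long_term |= self.freq >= self.promote_after  # convert slots
        decay = np.where(self.long_term, 0.999, 0.9)       # slow vs. fast
        self.M *= decay[:, None]
        return self.M[idx]
```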

10.
IEEE Trans Pattern Anal Mach Intell; 43(9): 2891-2904, 2021 Sep.
Article in English | MEDLINE | ID: mdl-32866093

ABSTRACT

Recently, neural architecture search (NAS) has raised great interest in both academia and industry. However, it remains challenging because of its huge and non-continuous search space. Instead of applying evolutionary algorithms or reinforcement learning as in previous works, this paper proposes a direct sparse optimization NAS (DSO-NAS) method. The motivation behind DSO-NAS is to address the task from the viewpoint of model pruning. To achieve this goal, we start from a completely connected block and introduce scaling factors to scale the information flow between operations. Next, sparse regularizations are imposed to prune useless connections in the architecture. Lastly, an efficient and theoretically sound optimization method is derived to solve the resulting problem. Our method enjoys the advantages of both differentiability and efficiency, so it can be applied directly to large datasets like ImageNet and to tasks beyond classification. In particular, DSO-NAS achieves an average test error of 2.74 percent on CIFAR-10, and 25.4 percent test error on ImageNet under 600M FLOPs, searched with 8 GPUs in 18 hours. On the semantic segmentation task, DSO-NAS also achieves competitive results compared with manually designed architectures on the PASCAL VOC dataset. Code is available at https://github.com/XinbangZhang/DSO-NAS.
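The pruning view can be sketched as attaching a scaling factor to every candidate connection and soft-thresholding it after each gradient step; this proximal step captures only the flavor of the sparse regularization, not the paper's derived optimizer.

```python
import torch

def prox_l1_(scales: torch.Tensor, lam: float, lr: float) -> None:
    """In-place soft-thresholding of the connection scaling factors,
    applied after each gradient step (a proximal-gradient sketch)."""
    with torch.no_grad():
        scales.copy_(scales.sign() * (scales.abs() - lr * lam).clamp(min=0.0))
```

Connections whose scaling factor reaches exactly zero are pruned from the final architecture.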

11.
IEEE Trans Pattern Anal Mach Intell; 43(9): 2905-2920, 2021 Sep.
Article in English | MEDLINE | ID: mdl-32866094

ABSTRACT

Neural architecture search (NAS) inherently suffers from a gap between the architectures used during searching and validating. To bridge this gap effectively, we develop Differentiable ArchiTecture Approximation (DATA) with an Ensemble Gumbel-Softmax (EGS) estimator and an Architecture Distribution Constraint (ADC) to automatically approximate architectures during searching and validating in a differentiable manner. Technically, the EGS estimator consists of a group of Gumbel-Softmax estimators, which can convert probability vectors to binary codes and pass gradients backward, reducing the estimation bias in a differentiable way. Further, to narrow the distribution gap between sampled architectures and the supernet, the ADC is introduced to reduce the sampling variance during searching. Benefiting from such modeling, architecture probabilities and network weights in the NAS model can be jointly optimized with standard back-propagation, yielding an end-to-end learning mechanism for searching deep neural architectures in an extended search space. Consequently, in the validating process, a high-performance architecture that approaches the one learned during searching is readily built. Extensive experiments on various tasks including image classification, few-shot learning, unsupervised clustering, semantic segmentation, and language modeling strongly demonstrate that DATA can discover high-performance architectures while guaranteeing the required efficiency. Code is available at https://github.com/XinbangZhang/DATA-NAS.
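A minimal sketch of an ensemble Gumbel-Softmax step, assuming k independent samples averaged and then binarized with a straight-through estimator; the grouping and ADC details of the paper are omitted.

```python
import torch
import torch.nn.functional as F

def ensemble_gumbel_softmax(logits, k=4, tau=1.0):
    """Average k Gumbel-Softmax samples, then binarize with a
    straight-through trick so gradients still reach the logits."""
    samples = torch.stack(
        [F.gumbel_softmax(logits, tau=tau) for _ in range(k)]).mean(0)
    hard = F.one_hot(samples.argmax(-1), logits.size(-1)).float()
    return hard - samples.detach() + samples  # forward: binary; backward: soft
```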

12.
IEEE Trans Pattern Anal Mach Intell; 42(4): 793-808, 2020 Apr.
Article in English | MEDLINE | ID: mdl-30571616

ABSTRACT

Baseline estimation is a critical preprocessing step for many document image processing and analysis tasks. The problem is very challenging due to arbitrarily complicated page layouts and various types of image quality degradation. This paper proposes a method based on slope-field recovery for extracting curved baselines from a distorted document image captured by a hand-held camera. Our method treats the curved baselines as solution curves of an ordinary differential equation defined on a slope field. By assuming the page shape is a smooth, developable surface, we investigate a type of intrinsic geometric constraint of baselines to estimate the latent slope field. The curved baselines are finally obtained by solving an ordinary differential equation with the Euler method. Unlike traditional text-line-based methods, our method is free from text-line detection and segmentation. It can exploit multiple visual cues beyond horizontal text-lines available in images for baseline extraction and is quite robust to document scripts, various types of image quality degradation (e.g., distortion, blur, and non-uniform illumination), large areas of non-textual objects, and complex page layouts. Extensive experiments on synthetic and real-captured document images are conducted to evaluate the performance of the proposed method.
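The final step is standard Euler integration over the recovered slope field; here is a sketch assuming a dense per-pixel slope map and a nearest-pixel lookup rather than interpolation.

```python
import numpy as np

def trace_baseline(slope_field, x0, y0, step=1.0, n_steps=500):
    """Euler integration of y' = s(x, y) over a per-pixel slope field
    (H x W array of slopes), starting from (x0, y0)."""
    h, w = slope_field.shape
    xs, ys = [float(x0)], [float(y0)]
    x, y = float(x0), float(y0)
    for _ in range(n_steps):
        xi, yi = int(round(x)), int(round(y))
        if not (0 <= yi < h and 0 <= xi < w):
            break                              # left the image
        y += step * slope_field[yi, xi]        # Euler update
        x += step
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
```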

13.
IEEE Trans Pattern Anal Mach Intell; 42(4): 809-823, 2020 Apr.
Article in English | MEDLINE | ID: mdl-30596571

ABSTRACT

Clustering is a crucial but challenging task in pattern analysis and machine learning. Existing methods often ignore the interplay between representation learning and clustering. To tackle this problem, we reconsider the clustering task from its definition and develop Deep Self-Evolution Clustering (DSEC) to jointly learn representations and cluster data. For this purpose, the clustering task is recast as a binary pairwise-classification problem that estimates whether pairwise patterns are similar. Specifically, similarities between pairwise patterns are defined by the dot product between indicator features generated by a deep neural network (DNN). To learn informative representations for clustering, clustering constraints are imposed on the indicator features so that specific concepts are represented by specific representations. Since ground-truth similarities are unavailable in clustering, an alternating iterative algorithm called Self-Evolution Clustering Training (SECT) is presented to select similar and dissimilar pairwise patterns and to train the DNN alternately. Consequently, the indicator features tend toward one-hot vectors, and patterns can be clustered by locating the largest response of the learned indicator features. Extensive experiments strongly demonstrate that DSEC consistently outperforms current models on twelve popular image, text, and audio datasets.
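One SECT-style selection step can be sketched as follows: indicator features are softmax outputs, pairs with very high or very low dot-product similarity serve as pseudo-labels, and the rest are ignored. The thresholds here are hypothetical.

```python
import torch
import torch.nn.functional as F

def sect_step(indicator_logits, upper=0.95, lower=0.45):
    """One self-evolution selection step (sketch): confident pairs train
    the pairwise binary classifier; ambiguous pairs are masked out."""
    p = F.softmax(indicator_logits, dim=1)
    sim = p @ p.t()                          # dot-product similarity in (0, 1]
    similar = sim > upper                    # pseudo-positive pairs
    dissimilar = sim < lower                 # pseudo-negative pairs
    mask = (similar | dissimilar).float()    # ignore the uncertain middle
    return F.binary_cross_entropy(sim.clamp(0, 1), similar.float(), weight=mask)
```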

14.
IEEE Trans Pattern Anal Mach Intell; 42(11): 2874-2886, 2020 Nov.
Article in English | MEDLINE | ID: mdl-31071020

ABSTRACT

Convolutional neural networks (CNNs) provide a remarkably powerful class of models, but they rely on traditional convolution, which can only aggregate permutation-ordered, dimension-equal local inputs. As a result, CNNs can only handle signals on Euclidean or grid-like domains (e.g., images), not on non-Euclidean or graph domains (e.g., traffic networks). To eliminate this limitation, we develop a local-aggregation function, a sharable nonlinear operation, to aggregate permutation-unordered, dimension-unequal local inputs on non-Euclidean domains. In the context of function approximation theory, the local-aggregation function is parameterized with a group of orthonormal polynomials in an effective and efficient manner. By replacing the traditional convolution in CNNs with this parameterized local-aggregation function, Local-Aggregation Graph Networks (LAGNs) are readily established; they can fit nonlinear functions without activation functions and can be conveniently trained with standard back-propagation. Extensive experiments on various datasets strongly demonstrate the effectiveness and efficiency of LAGNs, leading to superior performance on numerous pattern recognition and machine learning tasks, including text categorization, molecular activity detection, taxi flow prediction, and image classification.
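A sketch of one such aggregation unit, using a Chebyshev basis as the orthonormal polynomial family; the basis choice, the tanh squashing into [-1, 1], and the scalar output are assumptions of this sketch, not the paper's parameterization.

```python
import torch
import torch.nn as nn

class LocalAggregation(nn.Module):
    """One aggregation unit: sums learned polynomial transforms of each
    neighbor feature, so the output is invariant to neighbor order and
    tolerant of unequal feature lengths."""
    def __init__(self, order=3):
        super().__init__()
        self.coeff = nn.Parameter(torch.randn(order + 1))

    def forward(self, neighbors):
        # neighbors: list of 1-D tensors, possibly of different lengths
        out = torch.zeros(())
        for x in neighbors:
            x = torch.tanh(x)                    # map into [-1, 1]
            T = [torch.ones_like(x), x]          # Chebyshev T0, T1
            for _ in range(2, len(self.coeff)):
                T.append(2 * x * T[-1] - T[-2])  # three-term recurrence
            out = out + sum(c * t.sum() for c, t in zip(self.coeff, T))
        return out                               # permutation-invariant scalar
```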

15.
IEEE Trans Image Process; 28(10): 4774-4789, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30969920

ABSTRACT

This paper presents a comprehensive, comparative evaluation of state-of-the-art local features for the task of image-based 3D reconstruction. The evaluated local features cover both recently developed learned features, built with powerful machine learning techniques, and elaborately designed handcrafted features, including both float-type and binary features. Two kinds of datasets are used in this evaluation. One contains many different scene types with ground-truth 3D points, with images captured at fixed positions, for quantitative evaluation of the local features in a controlled image-capturing situation. The other contains Internet-scale image sets of several landmarks, with many unrelated images, for qualitative evaluation in the free image-collection situation. Our experimental results show that binary features can reconstruct scenes from controlled image sequences in only a fraction of the processing time required by float-type features. However, for a large-scale image set with many distracting images, float-type features show a clear advantage over binary ones. The classic SIFT remains very stable across scene types in this specific task and produces very competitive reconstruction results among all the evaluated local features. Meanwhile, although the learned binary features are not yet as competitive as the handcrafted ones, learning float-type features with CNNs is promising but still requires considerable future effort.
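To reproduce the float-versus-binary contrast in miniature, one can extract SIFT (float, L2-matched) and ORB (binary, Hamming-matched) features with OpenCV; the image path below is a placeholder.

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()  # float descriptors (128-D), matched with L2
orb = cv2.ORB_create()    # binary descriptors (256-bit), matched with Hamming
kp_s, des_s = sift.detectAndCompute(img, None)
kp_o, des_o = orb.detectAndCompute(img, None)
bf_float = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
bf_binary = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
```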

16.
IEEE Trans Pattern Anal Mach Intell; 41(11): 2660-2676, 2019 Nov.
Article in English | MEDLINE | ID: mdl-30176580

ABSTRACT

Most existing hashing methods resort to binary codes for large-scale similarity search, owing to their high efficiency of computation and storage. However, binary codes lack sufficient capability for similarity preservation, resulting in less desirable performance. To address this issue, we propose Nonlinear Asymmetric Multi-Valued Hashing (NAMVH), supported by two distinct non-binary embeddings. Specifically, a real-valued embedding represents each newly arriving query through a nonlinear transformation, while a multi-integer embedding compresses the whole database, modeled by Binary Sparse Representation (BSR) with fixed sparsity. With these two non-binary embeddings, NAMVH preserves more precise similarities between data points and supports incremental extension as database samples evolve dynamically. To perform meaningful asymmetric similarity computation for efficient semantic search, the embeddings are jointly learned by preserving pairwise label-based similarity. Technically, this results in a mixed integer programming problem, which is efficiently solved by a well-designed alternating optimization method. Extensive experiments on seven large-scale datasets demonstrate that our approach not only outperforms existing binary hashing methods in search accuracy, but also retains their query and storage efficiency.
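Asymmetric search of this kind can be sketched in a few lines: the real-valued query is compared against the dictionary once, and the database scores follow from the integer codes. Matrix names and shapes are assumptions of this sketch.

```python
import numpy as np

def asymmetric_scores(q, C, B):
    """Sketch of asymmetric search: the real-valued query embedding q (d,)
    is compared with the dictionary C (d, k) once; database items stored as
    sparse integer codes B (k, n) are then scored without decompression."""
    lut = q @ C     # query-vs-atom inner products, computed once per query
    return lut @ B  # similarity scores against the whole compressed database
```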

17.
IEEE Trans Neural Netw Learn Syst; 29(1): 87-103, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28113786

ABSTRACT

Hashing-based semantic similarity search is becoming increasingly important for building large-scale content-based retrieval systems. State-of-the-art supervised hashing techniques use a flexible two-step strategy to learn hash functions: the first step learns binary codes for the training data by solving binary optimization problems with millions of variables, usually requiring intensive computation. Despite its simplicity and efficiency, locality-sensitive hashing (LSH) has never been recognized as a good way to generate such codes, due to its poor performance in traditional approximate neighbor search. We claim in this paper that the true merit of LSH lies in transforming the semantic labels to obtain the binary codes, resulting in an effective and efficient two-step hashing framework. Specifically, we develop locality-sensitive two-step hashing (LS-TSH), which generates the binary codes through LSH rather than any complex optimization technique. Theoretically, under proper assumptions, LS-TSH is itself a valid LSH scheme, so it preserves the label-based semantic similarity and possesses sublinear query complexity for hash lookup. Experimentally, LS-TSH achieves retrieval accuracy comparable to the state of the art with two to three orders of magnitude faster training.
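The first step, as described here, amounts to hashing label vectors rather than features; a minimal random-hyperplane sketch follows (the paper's exact LSH family is an assumption of this sketch).

```python
import numpy as np

def lsh_codes_from_labels(Y, n_bits=32, seed=0):
    """Random-hyperplane LSH applied to label vectors Y (n_samples x n_labels)
    to produce binary codes, sketching the first step of the framework."""
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(Y.shape[1], n_bits))  # random hyperplanes
    return (Y @ H > 0).astype(np.uint8)        # one bit per hyperplane
```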

18.
IEEE Trans Image Process; 24(11): 4027-4040, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26186789

ABSTRACT

Hyperspectral unmixing is a crucial step for many hyperspectral applications. It has proved to be a difficult task in unsupervised settings, where both the endmembers and the abundances are unknown, and it becomes even more challenging when the spectral bands are degraded by noise. This paper presents a robust model for unsupervised hyperspectral unmixing. Specifically, our model is built on a correntropy-based metric, with nonnegativity constraints imposed on both endmembers and abundances to preserve physical significance. In addition, a sparsity prior is explicitly formulated to constrain the distribution of the abundances of each endmember. To solve the model, a half-quadratic optimization technique is developed to convert the original complex optimization problem into an iteratively reweighted nonnegative matrix factorization with sparsity constraints. As a result, the optimization can adaptively assign small weights to noisy bands and put more emphasis on noise-free bands. Moreover, with the sparsity constraints, our model naturally generates sparse abundances. Experiments on synthetic and real data demonstrate the effectiveness of our model in comparison with related state-of-the-art unmixing models.
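A minimal sketch of the iteratively reweighted scheme: per-band weights from a Gaussian (correntropy) kernel of the residual plug into standard weighted-NMF multiplicative updates. This shows the reweighting idea only; the sparsity term and the authors' exact half-quadratic algorithm are omitted.

```python
import numpy as np

def correntropy_unmix(X, r, n_iter=200, sigma=1.0, eps=1e-9):
    """Iteratively reweighted NMF sketch: X (bands x pixels) ~ E @ A with
    per-band weights that shrink for noisy bands."""
    bands, pixels = X.shape
    rng = np.random.default_rng(0)
    E = rng.random((bands, r))           # endmembers
    A = rng.random((r, pixels))          # abundances
    for _ in range(n_iter):
        R = X - E @ A
        w = np.exp(-(R ** 2).sum(1) / (2 * sigma ** 2))  # band weights
        W = w[:, None]
        E *= ((W * X) @ A.T) / ((W * (E @ A)) @ A.T + eps)
        A *= (E.T @ (W * X)) / (E.T @ (W * (E @ A)) + eps)
    return E, A
```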

19.
IEEE Trans Neural Netw Learn Syst; 26(9): 2206-2213, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25474813

ABSTRACT

This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to learn the regression targets directly from the data, rather than using the traditional zero-one matrix as targets. The learned target matrix guarantees a large-margin constraint, as required for the correct classification of each data point. Compared with traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single, compact model, so there is no need to train multiple independent two-class (binary) machines. The convex optimization problem of ReLSR is solved elegantly and efficiently by an alternating procedure with regression and retargeting as substeps. Experimental evaluation over a range of databases demonstrates the validity of our method.
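The alternating procedure can be sketched with a simplified retargeting step that lifts each sample's true-class target to keep a margin over the largest rival score; the paper's closed-form retargeting is replaced here by this heuristic.

```python
import numpy as np

def relsr(X, y, n_iter=20, lam=1e-2, margin=1.0):
    """Simplified ReLSR sketch: alternate ridge regression with a heuristic
    retargeting step enforcing a margin for the true class of each sample."""
    n, d = X.shape
    c = y.max() + 1
    T = np.eye(c)[y].astype(float)            # start from 0/1 targets
    Xb = np.hstack([X, np.ones((n, 1))])      # append a bias column
    for _ in range(n_iter):
        # regression substep (ridge)
        W = np.linalg.solve(Xb.T @ Xb + lam * np.eye(d + 1), Xb.T @ T)
        S = Xb @ W                            # current scores
        # retargeting substep: lift the true-class target above rivals
        T = S.copy()
        rival = np.where(np.eye(c)[y] == 1, -np.inf, S).max(1)
        T[np.arange(n), y] = np.maximum(S[np.arange(n), y], rival + margin)
    return W
```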

20.
IEEE Trans Image Process; 23(12): 5412-5427, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25330488

ABSTRACT

Hyperspectral unmixing, the process of estimating a common set of spectral bases and their corresponding composite percentages at each pixel, is an important task for hyperspectral analysis, visualization, and understanding. From an unsupervised learning perspective, this problem is very challenging: both the spectral bases and their composite percentages are unknown, making the solution space very large. To reduce the solution space, priors are commonly imposed. In practice, however, these priors can easily lead to unsuitable solutions, because they apply an identical strength of constraint to all factors, which does not hold in practice. To overcome this limitation, we propose a novel sparsity-based method that learns a data-guided map (DgMap) describing the individual mixing level of each pixel. Through this DgMap, the lp (0 < p < 1) constraint is applied in an adaptive manner. This not only matches the practical situation, but also guides the spectral bases toward the pixels under highly sparse constraints. Moreover, an optimization scheme together with its convergence proof is provided in this paper. Extensive experiments on several datasets demonstrate that the DgMap is feasible and that high-quality unmixing results are obtained by our method.
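The adaptive constraint can be pictured as giving every pixel its own sparsity exponent taken from the DgMap; a one-function sketch follows (how the map itself is learned is omitted, and names are assumptions).

```python
import numpy as np

def adaptive_lp_penalty(A, p_map):
    """Sketch of a data-guided sparsity term: abundances A (r x n) are
    penalized with a per-pixel exponent p in (0, 1) from p_map (n,),
    so highly mixed pixels receive a weaker sparsity constraint."""
    return np.sum(np.abs(A) ** p_map[None, :])
```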
