Results 1 - 9 of 9
1.
Neural Netw ; 174: 106265, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38552351

ABSTRACT

Graph Transformers (GTs) have achieved impressive results on various graph-related tasks. However, the huge computational cost of GTs hinders their deployment and application, especially in resource-constrained environments. Therefore, in this paper, we explore the feasibility of sparsifying GTs, a significant yet under-explored topic. We first discuss the redundancy of GTs based on the characteristics of existing GT models, and then propose a comprehensive Graph Transformer SParsification (GTSP) framework that helps to reduce the computational complexity of GTs from four dimensions: the input graph data, attention heads, model layers, and model weights. Specifically, GTSP designs differentiable masks for each individual compressible component, enabling effective end-to-end pruning. We examine our GTSP through extensive experiments on prominent GTs, including GraphTrans, Graphormer, and GraphGPS. The experimental results demonstrate that GTSP effectively reduces computational costs, with only marginal decreases in accuracy or, in some instances, even improvements. For example, GTSP results in a 30% reduction in Floating Point Operations while contributing to a 1.8% increase in Area Under the Curve accuracy on the OGBG-HIV dataset. Furthermore, we provide several insights on the characteristics of attention heads and the behavior of attention mechanisms, all of which have immense potential to inspire future research endeavors in this domain. Our code is available at https://github.com/LiuChuang0059/GTSP.
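The differentiable-mask idea behind GTSP can be illustrated with a minimal sketch. This is not the paper's implementation: the sigmoid gating, the 0.5 threshold, and the function names are assumptions chosen only to show how a soft mask stays differentiable during training while yielding a hard pruning decision at inference.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked_heads(head_outputs, mask_logits, threshold=0.5):
    """Gate attention heads with differentiable masks (sketch).

    During training, the soft sigmoid gate scales each head's output so the
    mask logits receive gradients; at inference, heads whose gate falls
    below `threshold` are pruned outright.
    """
    gates = sigmoid(mask_logits)           # soft, differentiable gates
    keep = gates >= threshold              # hard decision for pruning
    pruned = head_outputs * gates[:, None] * keep[:, None]
    return pruned, keep

# Toy example: 4 attention heads, feature dimension 3
heads = np.ones((4, 3))
logits = np.array([3.0, -3.0, 2.0, -2.0])  # learned mask parameters
out, kept = masked_heads(heads, logits)
```

The same gating pattern applies, with suitable shapes, to the other compressible components (input nodes, layers, weights); the threshold step is what converts the learned soft scores into actual compute savings.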

2.
Neural Netw ; 170: 548-563, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38052151

ABSTRACT

Siamese tracking has witnessed tremendous progress in the visual tracking paradigm. However, its default box estimation pipeline still faces a crucial inconsistency issue: the bounding box selected by the classification score does not always overlap best with the ground truth, which harms performance. To this end, we explore a novel, simple tracking paradigm based on intersection over union (IoU) value prediction. To bypass this inconsistency issue, we propose a concise target state predictor termed IoUformer, which, instead of the default box estimation pipeline, directly predicts the IoU values related to tracking performance metrics. In detail, it extends the long-range dependency modeling ability of the transformer to jointly grasp target-aware interactions between the target template and search region, as well as search sub-region interactions, thus neatly unifying global semantic interaction and target state prediction. Thanks to this joint strength, IoUformer can predict reliable IoU values that are near-linear with the ground truth, which paves a safe way for our new IoU-based siamese tracking paradigm. Since it is non-trivial to realize this paradigm with satisfactory efficacy and portability, we offer the respective network components and two alternative localization schemes. Experimental results show that our IoUformer-based tracker achieves promising results with less training data. Demonstrating its applicability, it also serves as a refinement module that consistently boosts existing advanced trackers.
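The quantity IoUformer learns to predict is the standard intersection-over-union between boxes. A minimal reference computation (the ground-truth target a predictor would regress toward; the box format is the usual `(x1, y1, x2, y2)` convention, not anything specified by the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 region share 1/7 of their union
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Predicting this value directly, rather than a classification score, is what removes the score-vs-overlap inconsistency the abstract describes.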


Subject(s)
Benchmarking , Semantics
3.
Neural Netw ; 168: 539-548, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37837743

ABSTRACT

As a graph data mining task, graph classification has high academic value and wide practical application. Among the available approaches, graph neural network-based methods are mainstream. Most graph neural networks (GNNs) follow the message passing paradigm and can thus be called Message Passing Neural Networks (MPNNs), achieving good results on structure-related tasks. However, it has also been reported that these methods suffer from over-squashing and limited expressive power. In recent years, many works have proposed different solutions to these problems separately, but none has yet considered these shortcomings comprehensively. Considering these aspects together, we identify two specific defects: information loss caused by local information aggregation, and an inability to capture higher-order structures. To solve these issues, we propose a plug-and-play framework based on Commute Time Distance (CTD), in which information is propagated within commute time distance neighborhoods. By considering both local and global graph connections, the commute time distance between two nodes is evaluated with reference to both the path length and the number of paths in the whole graph. Moreover, the proposed framework, CTD-MPNNs (Commute Time Distance-based Message Passing Neural Networks), can capture higher-order structural information by utilizing commute paths, enhancing the expressive power of GNNs. Thus, our proposed framework can propagate and aggregate messages from defined important neighbors and model more powerful GNNs. We conduct extensive experiments using various real-world graph classification benchmarks. The experimental performance demonstrates the effectiveness of our framework. Code is released at https://github.com/Haldate-Yu/CTD-MPNNs.
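Commute time distance has a closed form via the pseudoinverse of the graph Laplacian: CTD(u, v) = vol(G) * (L+_uu + L+_vv - 2 L+_uv), where vol(G) is the sum of node degrees. A small sketch of that computation (standard spectral graph theory, not the paper's code; how CTD-MPNNs then build neighborhoods from these distances is an assumption left out here):

```python
import numpy as np

def commute_time_distances(adj):
    """All-pairs commute time distances from a symmetric adjacency matrix.

    Uses CTD(u, v) = vol(G) * (L+_uu + L+_vv - 2 * L+_uv), where L+ is the
    Moore-Penrose pseudoinverse of the combinatorial Laplacian L = D - A
    and vol(G) is the total degree.
    """
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj
    lp = np.linalg.pinv(lap)
    vol = deg.sum()
    d = np.diag(lp)
    return vol * (d[:, None] + d[None, :] - 2 * lp)

# Path graph 0-1-2: on a tree, CTD(u, v) = 2 * |E| * shortest-path distance
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
C = commute_time_distances(A)
```

Because CTD grows with path length but shrinks when many alternative paths exist, neighborhoods defined by it mix local and global connectivity, which is the property the framework exploits.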


Subject(s)
Benchmarking , Data Mining , Neural Networks, Computer
4.
Neural Netw ; 167: 559-571, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37696073

ABSTRACT

Graph Neural Networks (GNNs) have been successfully applied to graph-level tasks in various fields such as biology, social networks, computer vision, and natural language processing. For the graph-level representations learning of GNNs, graph pooling plays an essential role. Among many pooling techniques, node drop pooling has garnered significant attention and is considered as a leading approach. However, existing node drop pooling methods, which typically retain the top-k nodes based on their significance scores, often overlook the diversity inherent in node features and graph structures. This limitation leads to suboptimal graph-level representations. To overcome this, we introduce a groundbreaking plug-and-play score scheme, termed MID. MID comprises a Multidimensional score space and two key operations: flIpscore and Dropscore. The multidimensional score space depicts the significance of nodes by multiple criteria; the flipscore process promotes the preservation of distinct node features; the dropscore compels the model to take into account a range of graph structures rather than focusing on local structures. To evaluate the effectiveness of our proposed MID, we have conducted extensive experiments by integrating it with a broad range of recent node drop pooling methods, such as TopKPool, SAGPool, GSAPool, and ASAP. In particular, MID has proven to bring a significant average improvement of approximately 2.8% over the four aforementioned methods when tested on 17 real-world graph classification datasets. Code is available at https://github.com/whuchuang/mid.
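The top-k node drop step that MID plugs into can be sketched as follows. This is a simplified illustration under stated assumptions: the multiple criteria are aggregated by a plain mean, and the flipscore/dropscore perturbations are only described in the comment, not implemented, since their exact form is specific to the paper.

```python
import numpy as np

def multidim_topk_pool(scores, ratio=0.5):
    """Top-k node drop pooling over a multidimensional score space (sketch).

    `scores` has shape (num_nodes, num_criteria). Nodes are ranked by an
    aggregate of the criteria and only the top `ratio` fraction survives.
    MID additionally perturbs these scores (flipscore to preserve distinct
    features, dropscore to avoid over-reliance on local structure); that
    step is omitted here.
    """
    agg = scores.mean(axis=1)                 # aggregate the criteria
    k = max(1, int(round(scores.shape[0] * ratio)))
    keep = np.argsort(-agg)[:k]               # indices of the retained nodes
    return np.sort(keep)

# 4 nodes scored on 2 criteria; node 2 scores highest overall, then node 0
scores = np.array([[0.9, 0.1], [0.2, 0.2], [0.8, 0.8], [0.1, 0.1]])
print(multidim_topk_pool(scores))
```

Methods like TopKPool and SAGPool differ mainly in how `scores` is produced; a plug-and-play scheme such as MID only changes the scoring, so the surrounding pooling machinery stays untouched.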


Subject(s)
Learning , Natural Language Processing , Neural Networks, Computer , Social Networking
5.
Article in English | MEDLINE | ID: mdl-37440376

ABSTRACT

Contrastive learning (CL) is a prominent technique for self-supervised representation learning, which aims to contrast semantically similar (i.e., positive) and dissimilar (i.e., negative) pairs of examples under different augmented views. Recently, CL has provided unprecedented potential for learning expressive graph representations without external supervision. In graph CL, the negative nodes are typically uniformly sampled from augmented views to formulate the contrastive objective. However, this uniform negative sampling strategy limits the expressive power of contrastive models. To be specific, not all the negative nodes can provide sufficiently meaningful knowledge for effective contrastive representation learning. In addition, the negative nodes that are semantically similar to the anchor are undesirably repelled from it, leading to degraded model performance. To address these limitations, in this article, we devise an adaptive sampling strategy termed "AdaS." The proposed AdaS framework can be trained to adaptively encode the importance of different negative nodes, so as to encourage learning from the most informative graph nodes. Meanwhile, an auxiliary polarization regularizer is proposed to suppress the adverse impacts of the false negatives and enhance the discrimination ability of AdaS. The experimental results on a variety of real-world datasets firmly verify the effectiveness of our AdaS in improving the performance of graph CL.
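The move from uniform to adaptive negative sampling can be seen in a weighted InfoNCE-style objective. A minimal sketch: uniform sampling corresponds to all negative weights being equal, while an AdaS-like scheme would learn the weights (the weights below are placeholders, and the loss form is a generic contrastive objective, not the paper's exact formulation).

```python
import numpy as np

def weighted_contrastive_loss(anchor, positive, negatives, neg_weights, tau=0.5):
    """InfoNCE-style loss where each negative carries its own weight.

    A weight near zero removes a negative's influence (useful for false
    negatives that are semantically similar to the anchor); larger weights
    let informative negatives dominate the denominator.
    """
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pos = np.exp(sim(anchor, positive) / tau)
    neg = sum(w * np.exp(sim(anchor, n) / tau)
              for w, n in zip(neg_weights, negatives))
    return -np.log(pos / (pos + neg))

anchor = np.array([1.0, 0.0])
positive = np.array([1.0, 0.0])
negatives = [np.array([0.0, 1.0])]
```

Down-weighting a semantically similar negative toward zero is exactly the effect the polarization regularizer is meant to encourage for false negatives.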

6.
Article in English | MEDLINE | ID: mdl-37368807

ABSTRACT

Graph neural networks (GNNs) tend to suffer from high computation costs due to the exponentially increasing scale of graph data and a large number of model parameters, which restricts their utility in practical applications. To this end, some recent works focus on sparsifying GNNs (including graph structures and model parameters) with the lottery ticket hypothesis (LTH) to reduce inference costs while maintaining performance levels. However, the LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in an extremely large training computation cost, and 2) they only trim graph structures and model parameters but ignore the node feature dimension, where vast redundancy exists. To overcome the above limitations, we propose a comprehensive graph gradual pruning framework termed CGP. This is achieved by designing a during-training graph pruning paradigm to dynamically prune GNNs within one training process. Unlike LTH-based methods, the proposed CGP approach requires no retraining, which significantly reduces the computation costs. Furthermore, we design a cosparsifying strategy to comprehensively trim all three core elements of GNNs: graph structures, node features, and model parameters. Next, to refine the pruning operation, we introduce a regrowth process into our CGP framework, to reestablish the pruned but important connections. The proposed CGP is evaluated over a node classification task across six GNN architectures, including shallow models graph convolutional network (GCN) and graph attention network (GAT), shallow-but-deep-propagation models simple graph convolution (SGC) and approximate personalized propagation of neural predictions (APPNP), and deep models GCN via initial residual and identity mapping (GCNII) and residual GCN (ResGCN), on a total of 14 real-world graph datasets, including large-scale graph datasets from the challenging Open Graph Benchmark (OGB). Experiments reveal that the proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of the existing methods.
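One during-training prune-and-regrow step can be sketched as follows. The magnitude-based drop criterion and gradient-based regrowth criterion here are common choices in the gradual-pruning literature, used as stand-ins; the paper's actual criteria and schedule may differ.

```python
import numpy as np

def prune_and_regrow(weights, mask, grads, prune_frac=0.5, regrow_frac=0.5):
    """One prune-and-regrow step over a flat weight vector (sketch).

    Drops the smallest-magnitude surviving weights, then re-enables the
    pruned connections with the largest gradient magnitude -- the regrowth
    step that restores pruned-but-important connections without retraining.
    """
    alive = np.flatnonzero(mask)
    n_drop = int(len(alive) * prune_frac)
    if n_drop:
        drop = alive[np.argsort(np.abs(weights[alive]))[:n_drop]]
        mask[drop] = False
    dead = np.flatnonzero(~mask)
    n_grow = int(len(dead) * regrow_frac)
    if n_grow:
        grow = dead[np.argsort(-np.abs(grads[dead]))[:n_grow]]
        mask[grow] = True
    return mask

weights = np.array([0.1, 0.9, 0.05, 0.8])
mask = np.array([True, True, True, True])
grads = np.array([1.0, 0.0, 2.0, 0.0])
```

Repeating this step on a schedule during a single training run is what distinguishes the during-training paradigm from LTH-style iterative train-prune-retrain cycles; the same step applies to edges, feature dimensions, or parameters when cosparsifying.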

7.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 11270-11282, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37027256

ABSTRACT

Point cloud registration is a fundamental problem in 3D computer vision. Previous learning-based methods for LiDAR point cloud registration can be categorized into two schemes: dense-to-dense matching methods and sparse-to-sparse matching methods. However, for large-scale outdoor LiDAR point clouds, solving dense point correspondences is time-consuming, whereas sparse keypoint matching easily suffers from keypoint detection error. In this paper, we propose SDMNet, a novel Sparse-to-Dense Matching Network for large-scale outdoor LiDAR point cloud registration. Specifically, SDMNet performs registration in two sequential stages: a sparse matching stage and a local-dense matching stage. In the sparse matching stage, we sample a set of sparse points from the source point cloud and then match them to the dense target point cloud using a spatial consistency enhanced soft matching network and a robust outlier rejection module. Furthermore, a novel neighborhood matching module is developed to incorporate local neighborhood consensus, significantly improving performance. The local-dense matching stage then follows for fine-grained registration, where dense correspondences are efficiently obtained by performing point matching in local spatial neighborhoods of high-confidence sparse correspondences. Extensive experiments on three large-scale outdoor LiDAR point cloud datasets demonstrate that the proposed SDMNet achieves state-of-the-art performance with high efficiency.
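The sparse-to-dense matching pattern can be illustrated with a crude nearest-neighbor stand-in. This is emphatically not SDMNet's learned soft matching: the hard nearest-neighbor assignment and the fixed distance threshold for outlier rejection are simplifying assumptions that only show the data flow of the sparse stage.

```python
import numpy as np

def sparse_to_dense_match(sparse_src, dense_tgt, max_dist=1.0):
    """Match sampled source points to a dense target cloud (sketch).

    Each sparse source point is paired with its nearest dense target point;
    pairs farther apart than `max_dist` are rejected as outliers, standing
    in for the learned soft matching and outlier rejection modules.
    """
    # Pairwise distances: (num_sparse, num_dense)
    d = np.linalg.norm(sparse_src[:, None, :] - dense_tgt[None, :, :], axis=-1)
    nn = d.argmin(axis=1)                      # nearest target per source point
    keep = d[np.arange(len(sparse_src)), nn] <= max_dist
    return nn, keep

sparse_src = np.array([[0.0, 0.0], [10.0, 10.0]])
dense_tgt = np.array([[0.1, 0.0], [5.0, 5.0]])
```

The local-dense stage would then repeat matching only inside small neighborhoods around the surviving high-confidence pairs, which is what keeps dense correspondence search tractable at outdoor-LiDAR scale.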

8.
IEEE Trans Image Process ; 31: 6635-6648, 2022.
Article in English | MEDLINE | ID: mdl-36256710

ABSTRACT

Image dehazing aims to remove haze from images to improve their quality. However, most image dehazing methods depend heavily on strict prior knowledge and a paired training strategy, which hinders generalization and performance on unseen scenes. In this paper, to address this problem, we propose Bidirectional Normalizing Flow (BiN-Flow), which exploits no prior knowledge and constructs a neural network through weakly-paired training, yielding better generalization for image dehazing. Specifically, BiN-Flow designs 1) Feature Frequency Decoupling (FFD) for mining various texture details through multi-scale residual blocks and 2) Bidirectional Propagation Flow (BPF) for exploiting the one-to-many relationships between hazy and haze-free images using a sequence of invertible Flow steps. In addition, BiN-Flow constructs a reference mechanism (RM) that uses a small number of paired hazy and haze-free images and a large number of haze-free reference images for weakly-paired training. Essentially, the mutual relationships between hazy and haze-free images can be effectively learned to further improve generalization and performance for image dehazing. We conduct extensive experiments on five commonly-used datasets to validate BiN-Flow. The experimental results, in which BiN-Flow outperforms all state-of-the-art competitors, demonstrate its capability and generalization. Besides, BiN-Flow can produce diverse dehazed images for the same hazy input by considering restoration diversity.
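What makes a normalizing flow bidirectional is that every step is invertible by construction. A minimal affine coupling step illustrates this (a generic flow building block, not BiN-Flow's actual architecture; the `scale`/`shift` functions stand in for learned sub-networks):

```python
import numpy as np

def coupling_forward(x, scale, shift):
    """One affine coupling step of a normalizing flow (sketch).

    The first half of the dimensions passes through unchanged and
    parameterizes an affine transform of the second half, so the map is
    invertible in closed form -- which is what lets a flow run in both
    directions (e.g., hazy -> haze-free and back).
    """
    x1, x2 = x[: len(x) // 2], x[len(x) // 2:]
    s, t = scale(x1), shift(x1)
    y2 = x2 * np.exp(s) + t
    return np.concatenate([x1, y2])

def coupling_inverse(y, scale, shift):
    """Exact inverse of `coupling_forward`."""
    y1, y2 = y[: len(y) // 2], y[len(y) // 2:]
    s, t = scale(y1), shift(y1)
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2])

scale = lambda v: 0.5 * v   # stand-ins for learned sub-networks
shift = lambda v: v + 1.0
x = np.array([1.0, 2.0, 3.0, 4.0])
```

Sampling different latent codes through the inverse direction is also how a flow produces diverse outputs for the same conditioning input, matching the restoration-diversity claim above.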

9.
Article in English | MEDLINE | ID: mdl-36136920

ABSTRACT

Few-shot visual recognition refers to recognizing novel visual concepts from a few labeled instances. Many few-shot visual recognition methods adopt the metric-based meta-learning paradigm, comparing the query representation with class representations to predict the category of a query instance. However, current metric-based methods generally treat all instances equally and consequently often obtain biased class representations, since not all instances are equally significant when summarizing instance-level representations into a class-level representation. For example, some instances may contain unrepresentative information, such as excessive background or content from unrelated concepts, which skews the results. To address these issues, we propose a novel metric-based meta-learning framework termed the instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition. Specifically, we develop an adaptive instance revaluing network (AIRN) capable of addressing the biased-representation issue when generating the class representation, by learning and assigning adaptive weights to different instances according to their relative significance in the support set of the corresponding class. In addition, we design an improved bilinear instance representation and incorporate two novel structural losses, i.e., an intraclass instance clustering loss and an interclass representation distinguishing loss, to further regulate the instance revaluation process and refine the class representation. We conduct extensive experiments on four commonly adopted few-shot benchmarks: the miniImageNet, tieredImageNet, CIFAR-FS, and FC100 datasets. The experimental results, compared with state-of-the-art approaches, demonstrate the superiority of our ICRL-Net.
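The shift from a plain mean prototype to an instance-adaptive one can be sketched as below. The weighting rule here (softmax over each instance's similarity to the unweighted mean) is a hypothetical stand-in for the learned AIRN weights, chosen only to show the effect of down-weighting unrepresentative support instances.

```python
import numpy as np

def adaptive_prototype(support, tau=1.0):
    """Instance-adaptive class representation (sketch).

    Instead of a plain mean, each support instance is weighted by its
    agreement with the others (softmax over cosine similarity to the
    unweighted mean), so outlier instances contribute less to the
    class-level representation.
    """
    mean = support.mean(axis=0)
    sims = support @ mean / (
        np.linalg.norm(support, axis=1) * np.linalg.norm(mean) + 1e-12)
    w = np.exp(sims / tau)
    w = w / w.sum()                    # normalized instance weights
    return (w[:, None] * support).sum(axis=0)

# Two clean instances near [1, 0] and one outlier near [0, 1]:
support = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
proto = adaptive_prototype(support)
```

Compared with the plain mean [2/3, 1/3], the adaptive prototype is pulled toward the majority direction, which is the biased-representation correction the abstract describes.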
