Results 1 - 20 of 135
1.
Article in English | MEDLINE | ID: mdl-38728126

ABSTRACT

The presence of label noise in the training data has a profound impact on the generalization of deep neural networks (DNNs). In this study, we introduce and theoretically justify a simple feature noise (FN) method, which directly adds noise to the features of training data and can enhance the generalization of DNNs under label noise. Specifically, our theoretical analysis reveals that label noise weakens DNN generalization by loosening the generalization bound, whereas FN improves generalization by imposing an upper bound on the mutual information between the model weights and the features, which in turn constrains the generalization bound. Furthermore, we conduct a qualitative analysis to discuss the ideal type of FN for good label noise generalization. Finally, extensive experiments on several popular datasets demonstrate that the FN method can significantly enhance the label noise generalization of state-of-the-art methods. The source code of the FN method is available at https://github.com/zlzenglu/FN.
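
The core operation is simple enough to sketch. Below is a minimal PyTorch illustration of feature noise, assuming Gaussian noise and a hypothetical `sigma` hyperparameter; the paper's qualitative analysis of the ideal noise type is not captured here.

```python
import torch

def add_feature_noise(features: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Perturb training features with zero-mean Gaussian noise.

    `sigma` is a hypothetical hyperparameter chosen for illustration;
    the paper discusses which noise types work best.
    """
    return features + sigma * torch.randn_like(features)

# Hypothetical usage inside a training step:
# noisy_x = add_feature_noise(x)       # x: a batch of input features
# loss = criterion(model(noisy_x), y)  # y may itself carry label noise
```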

2.
Article in English | MEDLINE | ID: mdl-38652629

ABSTRACT

Despite significant progress, cross-modal retrieval still suffers from one-to-many matching cases, where a single query may correspond to multiple semantic instances in the other modality. However, existing approaches usually map heterogeneous data into the learned space as deterministic point vectors. Despite their remarkable performance in matching the single most similar instance, such deterministic point embeddings cannot adequately represent the rich semantics of one-to-many correspondence. To address this limitation, we extend a deterministic point into a closed geometry and develop geometric representation learning methods for cross-modal retrieval. A set of points inside such a geometry can thus be semantically related to many candidates, allowing us to effectively capture semantic uncertainty. We then introduce two types of geometric matching for one-to-many correspondence: point-to-rectangle matching (dubbed P2RM) and rectangle-to-rectangle matching (termed R2RM). The former treats all retrieved candidates as rectangles with zero volume (equivalent to points) and the query as a box, while the latter encodes all heterogeneous data into rectangles. We can therefore evaluate semantic similarity among heterogeneous data by the Euclidean distance from a point to a rectangle or by the volume of the intersection between two rectangles. Additionally, both strategies can be easily plugged into off-the-shelf approaches to further improve the retrieval performance of baselines. Under various evaluation metrics, extensive experiments and ablation studies on several commonly used datasets, two for image-text matching and two for video-text retrieval, demonstrate the effectiveness and superiority of our method.
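
The two geometric similarity measures are straightforward to compute. A sketch under the assumption that rectangles are parameterized by lower and upper corner vectors (the paper's exact parameterization may differ):

```python
import torch

def point_to_rect_distance(p, lo, hi):
    """Euclidean distance from point p to the axis-aligned box [lo, hi].

    Per dimension, the gap is zero when p lies within the box's extent.
    """
    gap = torch.clamp(lo - p, min=0) + torch.clamp(p - hi, min=0)
    return gap.norm(dim=-1)

def rect_intersection_volume(lo1, hi1, lo2, hi2):
    """Volume of the intersection of two axis-aligned boxes (0 if disjoint)."""
    edge = torch.clamp(torch.min(hi1, hi2) - torch.max(lo1, lo2), min=0)
    return edge.prod(dim=-1)
```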

3.
Am J Med Sci; 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38460926

ABSTRACT

BACKGROUND: Stroke is prevalent in the hypertensive population. It has been suggested that unsaturated fatty acids (USFAs) have a protective effect against stroke, while the effect of saturated fatty acids (SFAs) on stroke remains unclear. We therefore studied the relationship between circulating fatty acids and acute ischemic stroke (AIS) in hypertensive patients. METHODS: Eighty-nine pairs comprising 100 men and 78 women, matched by sex and age, were recruited. Each pair included a hypertensive patient within 48 h of AIS onset and a hypertensive patient without stroke. Six circulating fatty acids were methylated before their concentrations were determined; determinations were repeated twice and percent recovery was estimated. RESULTS: Stroke and non-stroke participants differed in educational level (P = 0.002) and occupation (P < 0.001). All six fatty acid levels were higher in non-stroke participants (P = 0.017 for palmitoleic acid, 0.001 for palmitic acid, <0.001 for linoleic acid, <0.001 for behenic acid, <0.001 for nervonic acid, and 0.002 for lignoceric acid). In logistic regression analysis, AIS was inversely associated with all fatty acid levels except lignoceric acid. After adjustment for education and occupation, palmitoleic acid and palmitic acid levels were no longer inversely associated with AIS. After further adjustment for systolic blood pressure, smoking, drinking, total cholesterol, and triglycerides, the inverse associations of linoleic acid (OR = 0.965, 95% CI = 0.942-0.990, P = 0.005), behenic acid (OR = 0.778, 95% CI = 0.664-0.939, P = 0.009), and nervonic acid (OR = 0.323, 95% CI = 0.121-0.860, P = 0.024) with AIS remained significant. CONCLUSIONS: Circulating fatty acids except lignoceric acid were inversely associated with AIS. Both USFAs and SFAs may have a beneficial effect on stroke prevention in the hypertensive population.
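
For readers unfamiliar with how the reported odds ratios and confidence intervals arise, here is a generic sketch of the back-transformation from a fitted logistic-regression coefficient; this is an illustration, not the authors' analysis code.

```python
import numpy as np

def odds_ratio_ci(beta: float, se: float):
    """Odds ratio and two-sided 95% CI from a logistic-regression coefficient.

    beta: fitted coefficient (log odds ratio) for a fatty-acid level;
    se: its standard error.
    """
    z = 1.959964  # standard normal quantile for a 95% interval
    return np.exp(beta), (np.exp(beta - z * se), np.exp(beta + z * se))

# Relating to the reported linoleic acid estimate (OR = 0.965):
# beta = np.log(0.965) is about -0.0356, i.e., the odds of AIS fall
# roughly 3.5% per unit increase in linoleic acid level.
```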

4.
IEEE Trans Image Process; 33: 2226-2237, 2024.
Article in English | MEDLINE | ID: mdl-38470583

ABSTRACT

Cross-modal retrieval (e.g., querying with an image to obtain a semantically similar sentence, and vice versa) is an important but challenging task, as a heterogeneous gap and inconsistent distributions exist between modalities. The dominant approaches attempt to bridge this heterogeneity by capturing common representations of heterogeneous data in a constructed subspace that reflects semantic closeness. However, they give insufficient consideration to the fact that the learned latent representations are actually heavily entangled with semantic-unrelated features, which further compounds the challenges of cross-modal retrieval. To alleviate this difficulty, this work assumes that the data are jointly characterized by two independent features: semantic-shared and semantic-unrelated representations. The former captures the consistent semantics shared by different modalities, while the latter reflects modality-specific characteristics unrelated to semantics, such as background, illumination, and other low-level information. This paper therefore aims to disentangle the shared semantics from the entangled features, so that the purer semantic representation can promote the closeness of paired data. Specifically, we design a novel Semantics Disentangling approach for Cross-Modal Retrieval (termed SDCMR) that explicitly decouples the two features based on a variational auto-encoder. Reconstruction is then performed by exchanging shared semantics to enforce semantic consistency. Moreover, a dual adversarial mechanism is designed to disentangle the two independent features via a pushing-and-pulling strategy. Comprehensive experiments on four widely used datasets demonstrate the effectiveness and superiority of the proposed SDCMR, which sets a new performance bar against 15 state-of-the-art methods.
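
A minimal sketch of the disentangling idea, with illustrative dimensions and a plain two-headed encoder standing in for SDCMR's VAE; variational sampling and the adversarial mechanism are omitted.

```python
import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Split a modality feature into a semantic-shared code `s` and a
    semantic-unrelated code `u`. Dimensions and architecture here are
    illustrative assumptions, not SDCMR's actual design."""
    def __init__(self, dim_in=1024, dim_s=128, dim_u=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(dim_in, 512), nn.ReLU())
        self.to_s = nn.Linear(512, dim_s)  # shared semantics
        self.to_u = nn.Linear(512, dim_u)  # modality-specific residue

    def forward(self, x):
        h = self.backbone(x)
        return self.to_s(h), self.to_u(h)

# Cross reconstruction: decode image features from (text s, image u) and
# vice versa, so only the shared part can carry the semantics.
# x_img_hat = decoder_img(torch.cat([s_txt, u_img], dim=-1))
# x_txt_hat = decoder_txt(torch.cat([s_img, u_txt], dim=-1))
```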

5.
Article in English | MEDLINE | ID: mdl-38379234

ABSTRACT

Unsupervised domain adaptation (UDA) aims to alleviate domain shift by transferring knowledge learned from a labeled source dataset to an unlabeled target domain. Although UDA has seen promising progress recently, it requires access to data from both domains, which is problematic when source data are absent. In this article, we investigate a practical task, source-free domain adaptation (SFDA), which removes the requirement of the widely studied UDA setting to access source and target data simultaneously. In addition, we further study the imbalanced SFDA (ISFDA) problem, which addresses intra-domain class imbalance and inter-domain label shift in SFDA. We observe two key properties of SFDA: 1) target data form clusters in the representation space regardless of whether the target data points are aligned with the source classifier, and 2) target samples with higher classification confidence are more reliable and show less variation in their confidence during adaptation. Motivated by these observations, we propose a unified method, named intrinsic consistency preservation with adaptively reliable samples (ICPR), to jointly cope with SFDA and ISFDA. Specifically, ICPR first encourages intrinsic consistency in the predictions of neighbors for unlabeled samples under weak augmentation (standard flip-and-shift), regardless of their reliability. ICPR then generates strongly augmented views specifically for adaptively selected reliable samples and is trained to enforce intrinsic consistency between the weakly and strongly augmented views of the same image, with respect to both the predictions of neighbors and their own. Additionally, we propose a prototype-like classifier to avoid the classification confusion caused by severe intra-domain class imbalance and inter-domain label shift. We demonstrate the effectiveness and general applicability of ICPR on six benchmarks covering both SFDA and ISFDA tasks. The reproducible code of our proposed ICPR method is available at https://github.com/CFM-MSG/Code_ICPR.
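
A simplified sketch of the reliable-sample consistency step: the neighbor-based consistency is reduced here to plain prediction matching, and the fixed confidence threshold is a hypothetical stand-in for ICPR's adaptive selection.

```python
import torch
import torch.nn.functional as F

def consistency_step(model, x_weak, x_strong, conf_threshold=0.9):
    """Enforce consistency between weak and strong views for samples
    whose weak-view predictions are confident enough to be reliable."""
    with torch.no_grad():
        p_weak = F.softmax(model(x_weak), dim=1)
        conf, pseudo = p_weak.max(dim=1)
        reliable = conf > conf_threshold  # higher confidence => more reliable
    logits_strong = model(x_strong)
    if reliable.any():
        return F.cross_entropy(logits_strong[reliable], pseudo[reliable])
    return logits_strong.sum() * 0.0      # no reliable samples in this batch
```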

6.
Article in English | MEDLINE | ID: mdl-38416606

ABSTRACT

Over the past decade, domain adaptation has become a widely studied branch of transfer learning that aims to improve performance on target domains by leveraging knowledge from the source domain. Conventional domain adaptation methods often assume simultaneous access to both source and target domain data, which may not be feasible in real-world scenarios due to privacy and confidentiality concerns. As a result, research on Source-Free Domain Adaptation (SFDA), which adapts to the target domain using only the source-trained model and unlabeled target data, has drawn growing attention in recent years. Despite the rapid growth of SFDA work, there has been no timely and comprehensive survey of the field. To fill this gap, we provide a comprehensive survey of recent advances in SFDA and organize them into a unified categorization scheme based on the framework of transfer learning. Instead of presenting each approach independently, we modularize several components of each method to more clearly illustrate their relationships and mechanisms in light of each method's composite properties. Furthermore, we compare the results of more than 30 representative SFDA methods on three popular classification benchmarks, namely Office-31, Office-Home, and VisDA, to explore the effectiveness of various technical routes and the combination effects among them. Additionally, we briefly introduce applications of SFDA and related fields. Drawing on our analysis of the challenges confronting SFDA, we offer insights into future research directions and potential settings.

7.
Article in English | MEDLINE | ID: mdl-38194384

ABSTRACT

Unsupervised anomaly detection (UAD), in which only anomaly-free samples are available for training, attracts substantial research interest and drives widespread applications. Some UAD applications further aim to locate anomalous regions without any anomaly information. Although the absence of anomalous samples and annotations deteriorates UAD performance, an inconspicuous yet powerful statistical model, the normalizing flow, is well suited to anomaly detection (AD) and localization in an unsupervised fashion. Flow-based probabilistic models, trained only on anomaly-free data, can efficiently distinguish unpredictable anomalies by assigning them much lower likelihoods than normal data. Nevertheless, the size variation of unpredictable anomalies complicates flow-based methods for high-precision AD and localization. To generalize across anomaly size variation, we propose a novel multiscale flow-based framework (MSFlow) composed of asymmetrical parallel flows followed by a fusion flow that exchanges multiscale perceptions. Moreover, different multiscale aggregation strategies are adopted for image-wise AD and pixel-wise anomaly localization according to the discrepancy between the two tasks. The proposed MSFlow is evaluated on three AD datasets, significantly outperforming existing methods. Notably, on the challenging MVTec AD benchmark, MSFlow achieves a new state of the art with a detection AUROC of up to 99.7%, a localization AUROC of 98.8%, and a PRO score of 97.1%.
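
The flow-based scoring principle can be sketched independently of MSFlow's multiscale architecture. The `flow` interface below (returning latents plus log-determinant) is an assumption in the style of common normalizing-flow libraries.

```python
import math
import torch

def anomaly_score(flow, x):
    """Anomaly score from a normalizing flow trained on normal data only.

    Assumed interface: flow(x) returns z = f(x) and log|det df/dx|.
    Low likelihood under the standard Gaussian base => high anomaly score.
    """
    z, log_det_j = flow(x)
    log_pz = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
    log_px = log_pz + log_det_j  # change-of-variables formula
    return -log_px               # anomalies receive large scores
```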

8.
IEEE Trans Image Process; 33: 1032-1044, 2024.
Article in English | MEDLINE | ID: mdl-38241118

ABSTRACT

The majority of existing works explore Unsupervised Domain Adaptation (UDA) under the ideal assumption that samples in both domains are available and complete. In real-world applications, however, this assumption does not always hold. For instance, as data privacy becomes a growing concern, source domain samples may not be publicly available for training, leading to a typical Source-Free Domain Adaptation (SFDA) problem. Traditional UDA methods fail to handle SFDA because of two challenges: the data incompleteness issue and the domain gap issue. In this paper, we propose an SFDA method named Adversarial Style Matching (ASM) to address both issues. Specifically, we first train a style generator to generate source-style samples from the target images, solving the data incompleteness issue. We use the auxiliary information stored in the pre-trained source model to ensure that the generated samples are statistically aligned with the source samples, and use pseudo labels to keep them semantically consistent. We then feed the target domain samples and the corresponding source-style samples into a feature generator network to reduce the domain gap with a self-supervised loss. An adversarial scheme is employed to further expand the distributional coverage of the generated source-style samples. The experimental results verify that our method achieves performance comparable to traditional UDA methods that use source samples for training.
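
The abstract does not specify which auxiliary information is used. One common realization, assumed here purely for illustration, is matching the running statistics stored in the frozen source model's BatchNorm layers; the source model is assumed to be in eval mode.

```python
import torch
import torch.nn as nn

def bn_statistics_alignment_loss(source_model: nn.Module, fake_src: torch.Tensor):
    """Encourage generated 'source-style' images to match the channel-wise
    statistics stored in the source model's BatchNorm layers.

    NOTE: this BN-matching realization is an assumption for illustration;
    the abstract only says auxiliary information in the source model is used.
    """
    captured, hooks = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            captured[name] = inputs[0]  # feature map entering the BN layer
        return hook

    for name, module in source_model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            hooks.append(module.register_forward_hook(make_hook(name)))
    source_model(fake_src)
    loss = fake_src.new_zeros(())
    for name, module in source_model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            feat = captured[name]
            mu = feat.mean(dim=(0, 2, 3))
            var = feat.var(dim=(0, 2, 3), unbiased=False)
            loss = loss + (mu - module.running_mean).pow(2).mean() \
                        + (var - module.running_var).pow(2).mean()
    for h in hooks:
        h.remove()
    return loss
```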

9.
Article in English | MEDLINE | ID: mdl-38127607

ABSTRACT

Generating consecutive descriptions for videos, that is, video captioning, requires taking full advantage of visual representations along with the generation process. Existing video captioning methods focus on exploring spatial-temporal representations and their relationships to produce inferences. However, such methods exploit only the superficial associations contained in a video itself, without considering the intrinsic visual commonsense knowledge that exists across a video dataset, which may hinder their ability to reason over that knowledge and generate accurate descriptions. To address this problem, we propose a simple yet effective method, called the visual commonsense-aware representation network (VCRN), for video captioning. Specifically, we construct a Video Dictionary, a plug-and-play component obtained by clustering all video features from the entire dataset into multiple cluster centers without additional annotation. Each center implicitly represents a visual commonsense concept in the video domain and is utilized in our proposed visual concept selection (VCS) component to obtain video-related concept features. A concept-integrated generation (CIG) component is then proposed to enhance caption generation. Extensive experiments on three public video captioning benchmarks, MSVD, MSR-VTT, and VATEX, demonstrate that our method achieves state-of-the-art performance. In addition, our method is integrated into an existing video question answering (VideoQA) method and improves its performance, further demonstrating its generalization capability. The source code has been released at https://github.com/zchoi/VCRN.
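
A sketch of how such a Video Dictionary could be built with plain k-means over dataset-level video features; the number of centers and the clustering routine are illustrative assumptions.

```python
import torch

def build_video_dictionary(video_feats: torch.Tensor, k: int = 512, iters: int = 50):
    """K-means over all video features; each center stands for a visual
    commonsense concept. video_feats: (num_videos, dim)."""
    centers = video_feats[torch.randperm(len(video_feats))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(video_feats, centers).argmin(dim=1)
        for j in range(k):
            members = video_feats[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(dim=0)
    return centers

# Concept selection for one video v: take its nearest centers as concepts.
# idx = torch.cdist(v.unsqueeze(0), centers).topk(8, largest=False).indices
```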

10.
IEEE Trans Image Process; 32: 6386-6400, 2023.
Article in English | MEDLINE | ID: mdl-37963006

ABSTRACT

Action Quality Assessment (AQA) plays an important role in video analysis, where it is applied to evaluate the quality of specific actions, e.g., sports activities. It remains challenging, however, because action sequences often differ only in small details against similar backgrounds, while current approaches mostly adopt holistic video representations, so fine-grained intra-class variations cannot be captured. To address this challenge, we propose a Fine-grained Spatio-temporal Parsing Network (FSPN) composed of an intra-sequence action parsing module and a spatiotemporal multiscale transformer module, which learns fine-grained spatiotemporal sub-action representations for more reliable AQA. The intra-sequence action parsing module performs semantic sub-action parsing by mining sub-actions at fine-grained levels, enabling a correct description of the subtle differences between action sequences. The spatiotemporal multiscale transformer module learns motion-oriented action features and captures long-range dependencies among sub-actions at different scales. Furthermore, we design a group contrastive loss to train the model and learn more discriminative feature representations for sub-actions without explicit supervision. We exhaustively evaluate our approach on the FineDiving, AQA-7, and MTL-AQA datasets. Extensive experimental results demonstrate its effectiveness and feasibility, outperforming state-of-the-art methods by a significant margin.
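
One plausible reading of the group contrastive loss, sketched under the assumption that corresponding sub-actions across a pair of sequences serve as positives; the paper's exact grouping rule may differ.

```python
import torch
import torch.nn.functional as F

def group_contrastive_loss(sub_a: torch.Tensor, sub_b: torch.Tensor, tau: float = 0.1):
    """Illustrative group contrastive loss: the i-th parsed sub-action of one
    sequence is the positive for the i-th sub-action of its paired sequence;
    all other sub-actions act as negatives.

    sub_a, sub_b: (num_subactions, dim) sub-action features.
    """
    a = F.normalize(sub_a, dim=-1)
    b = F.normalize(sub_b, dim=-1)
    logits = a @ b.t() / tau  # temperature-scaled cosine similarities
    targets = torch.arange(len(a), device=a.device)
    return F.cross_entropy(logits, targets)
```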

11.
IEEE Trans Image Process; 32: 6485-6499, 2023.
Article in English | MEDLINE | ID: mdl-37991910

ABSTRACT

Existing supervised quantization methods usually learn quantizers from pair-wise, triplet, or anchor-based losses, which capture relationships only locally, without aligning them globally. This may cause inadequate use of the entire space and severe intersection among different semantics, leading to inferior retrieval performance. Furthermore, to enable quantizers to learn end to end, current practices usually relax the non-differentiable quantization operation by substituting softmax for it, which unfortunately is biased, leading to an unsatisfying suboptimal solution. To address these issues, we present Spherical Centralized Quantization (SCQ), which contains a Prior Knowledge based Feature Alignment (PKFA) module for the global alignment of feature vectors and an Annealing Regulation Semantic Quantization (ARSQ) module for low-biased optimization. Specifically, the PKFA module first applies Semantic Center Allocation (SCA) to obtain semantic centers based on prior knowledge, and then adopts Centralized Feature Alignment (CFA) to gather feature vectors around their corresponding semantic centers. SCA and CFA globally optimize inter-class separability and intra-class compactness, respectively. The ARSQ module then performs a partial-soft relaxation to tackle biases, and applies an Annealing Regulation Quantization loss to further escape local optima. Experimental results show that SCQ outperforms state-of-the-art algorithms by a large margin (2.1%, 3.6%, and 5.5% mAP, respectively) on CIFAR-10, NUS-WIDE, and ImageNet with a code length of 8 bits. Code is publicly available: https://github.com/zzb111/Spherical-Centralized-Quantization.
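
A sketch of the centralized feature alignment idea: normalized features are pulled toward pre-allocated class centers on the unit sphere. The cosine form is an illustrative choice, not necessarily SCQ's exact loss.

```python
import torch
import torch.nn.functional as F

def centralized_alignment_loss(feats, labels, centers):
    """Pull each normalized feature toward the semantic center allocated
    for its class.

    feats: (batch, dim); labels: (batch,); centers: (num_classes, dim).
    """
    f = F.normalize(feats, dim=-1)
    c = F.normalize(centers[labels], dim=-1)
    return (1 - (f * c).sum(dim=-1)).mean()  # 1 - cosine similarity
```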

12.
IEEE Trans Pattern Anal Mach Intell; 45(11): 13921-13940, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37788219

ABSTRACT

The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., "woman-on/standing on/walking on-beach". As general SGG models tend to predict head predicates while re-balancing strategies prefer tail categories, neither can appropriately handle hard-to-distinguish predicates. To tackle this issue, inspired by fine-grained image classification, which focuses on differentiating hard-to-distinguish objects, we propose Adaptive Fine-Grained Predicates Learning (FGPL-A), which aims at differentiating hard-to-distinguish predicates for SGG. First, we introduce an Adaptive Predicate Lattice (PL-A) to identify hard-to-distinguish predicates, which adaptively explores predicate correlations in keeping with the model's dynamic learning pace. In practice, PL-A is initialized from the SGG dataset and refined by exploring the model's predictions on the current mini-batch. Utilizing PL-A, we propose an Adaptive Category Discriminating Loss (CDL-A) and an Adaptive Entity Discriminating Loss (EDL-A), which progressively regularize the model's discriminating process with fine-grained supervision attuned to its dynamic learning status, ensuring a balanced and efficient learning process. Extensive experimental results show that our proposed model-agnostic strategy significantly boosts the performance of benchmark models on the VG-SGG and GQA-SGG datasets by up to 175% and 76% in Mean Recall@100, achieving new state-of-the-art performance. Moreover, experiments on Sentence-to-Graph Retrieval and Image Captioning tasks further demonstrate the practicability of our method.

13.
IEEE Trans Image Process; 32: 5909-5920, 2023.
Article in English | MEDLINE | ID: mdl-37883290

ABSTRACT

The optical flow guidance strategy is ideal for obtaining motion information of objects in video and is widely utilized in video segmentation tasks. However, existing optical flow-based methods depend heavily on optical flow, which results in poor performance when the optical flow estimation fails for a particular scene. The temporal consistency provided by optical flow can be effectively supplemented by modeling it in a structural form. This paper proposes a new hierarchical graph neural network (GNN) architecture, dubbed hierarchical graph pattern understanding (HGPU), for zero-shot video object segmentation (ZS-VOS). Inspired by the strong ability of GNNs to capture structural relations, HGPU innovatively leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames. Specifically, a hierarchical graph pattern encoder with message aggregation is introduced to acquire different levels of motion and appearance features in a sequential manner. Furthermore, a decoder is designed for hierarchically parsing and understanding the transformed multi-modal contexts to achieve more accurate and robust results. HGPU achieves state-of-the-art performance on four publicly available benchmarks (DAVIS-16, YouTube-Objects, Long-Videos, and DAVIS-17). Code and a pre-trained model can be found at https://github.com/NUST-Machine-Intelligence-Laboratory/HGPU.

14.
Opt Lett; 48(15): 3909-3912, 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37527080

ABSTRACT

Reversed nonlinear dynamics is predicted to be capable of enhancing quantum sensing in unprecedented ways. Here, we report the experimental demonstration of a loss-tolerant (with respect to external loss) quantum-enhanced interferometer. Two cascaded optical parametric amplifiers are used to judiciously construct an interferometer with two orthogonal squeezing operations. As a consequence, a weak displacement introduced by a test cavity can be amplified for measurement, and the measured signal-to-noise ratio is better than that of both the conventional photon shot-noise-limited and the squeezed-light-assisted interferometer. We further confirm its superior loss tolerance by varying the external losses and comparing against both conventional configurations, illustrating its potential application in gravitational wave detection.

16.
Article in English | MEDLINE | ID: mdl-37163399

ABSTRACT

Building multi-person pose estimation (MPPE) models that can handle complex foregrounds and uncommon scenes is an important challenge in computer vision. Aside from designing novel models, strengthening the training data is a promising but largely under-explored direction for the MPPE task. In this article, we systematically identify the key deficiencies of existing pose datasets that prevent the power of well-designed models from being fully exploited, and propose corresponding solutions. Specifically, we find that traditional data augmentation techniques are inadequate for addressing two key deficiencies: imbalanced instance complexity (evaluated by our new metric, IC) and insufficiently realistic scenes. To overcome them, we propose a model-agnostic full-view data generation (Full-DG) method that enriches the training data from the perspectives of both poses and scenes. By hallucinating images with more balanced pose complexity and richer real-world scenes, Full-DG helps improve pose estimators' robustness and generalizability. In addition, we introduce a plug-and-play adaptive category-aware loss (AC-loss) to alleviate the severe pixel-level imbalance between keypoints and backgrounds (around 1:600). Full-DG together with AC-loss can be readily applied to both bottom-up and top-down models to improve their accuracy. Notably, plugged into the representative estimators HigherHRNet and HRNet, our method achieves substantial performance gains of 1.0%-2.9% AP on the COCO benchmark and 1.0%-5.1% AP on the CrowdPose benchmark.

17.
IEEE Trans Image Process; 32: 5017-5030, 2023.
Article in English | MEDLINE | ID: mdl-37186535

ABSTRACT

Lately, video-language pre-training and text-video retrieval have attracted significant attention with the explosion of multimedia data on the Internet. However, existing approaches for video-language pre-training typically under-exploit the hierarchical semantic information in videos, such as frame-level and global video-level semantics. In this work, we present an end-to-end pre-training network with Hierarchical Matching and Momentum Contrast, named HMMC. The key idea is to explore the hierarchical semantic information in videos via multilevel semantic matching between videos and texts. This design is motivated by the observation that if a video semantically matches a text (be it a title, tag, or caption), the frames in that video usually have semantic connections with the text and show higher similarity than frames in other videos. Hierarchical matching is mainly realized by two proxy tasks: Video-Text Matching (VTM) and Frame-Text Matching (FTM). A third proxy task, Frame Adjacency Matching (FAM), is proposed to enhance single-modality visual representations when training from scratch. Furthermore, a momentum contrast framework is introduced into HMMC to form a multimodal momentum contrast framework, enabling HMMC to incorporate more negative samples for contrastive learning, which contributes to the generalization of the representations. We also collected a large-scale Chinese video-language dataset (over 763k unique videos), named CHVTT, to explore the multilevel semantic connections between videos and texts. Experimental results on two major text-video retrieval benchmark datasets demonstrate the advantages of our method. We release our code at https://github.com/cheetah003/HMMC.
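
The momentum contrast ingredient follows the MoCo recipe, whose defining update is small enough to show. A sketch with the conventional momentum coefficient assumed:

```python
import torch

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m: float = 0.999):
    """MoCo-style momentum update: the key encoder trails the query encoder
    as an exponential moving average, keeping queued negatives consistent.
    m = 0.999 is the conventional default, assumed here."""
    for q, k in zip(query_encoder.parameters(), key_encoder.parameters()):
        k.data.mul_(m).add_(q.data, alpha=1 - m)
```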

18.
IEEE Trans Image Process; 32: 2348-2359, 2023.
Article in English | MEDLINE | ID: mdl-37074884

ABSTRACT

Zero-shot video object segmentation (ZS-VOS) aims to segment foreground objects in a video sequence without prior knowledge of those objects. However, existing ZS-VOS methods often struggle to distinguish foreground from background or to keep track of the foreground in complex scenarios, and the common practice of introducing motion information, such as optical flow, can lead to over-reliance on optical flow estimation. To address these challenges, we propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects. Specifically, our model is built upon multiple collaborative evolutions of the parallel co-attention module (PCM) and the cross co-attention module (CCM). PCM captures common foreground regions between adjacent appearance and motion features, while CCM further exploits and fuses the cross-modal motion features returned by PCM. Our method is progressively trained to achieve hierarchical spatio-temporal feature propagation across the entire video. Experimental results demonstrate that HCPN outperforms all previous methods on public benchmarks, showcasing its effectiveness for ZS-VOS. Code and a pre-trained model can be found at https://github.com/NUST-Machine-Intelligence-Laboratory/HCPN.
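
Parallel co-attention between appearance and motion features can be sketched generically; the block below is in the spirit of PCM, not HCPN's exact design.

```python
import torch
import torch.nn as nn

class ParallelCoAttention(nn.Module):
    """Co-attention between appearance and motion feature maps.

    Inputs: app, mot of shape (batch, channels, height * width),
    i.e., spatially flattened feature maps from the two streams.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.w = nn.Linear(channels, channels, bias=False)

    def forward(self, app, mot):
        # Affinity between every appearance location and every motion location.
        affinity = torch.bmm(self.w(app.transpose(1, 2)), mot)  # (B, N_a, N_m)
        att_app = torch.softmax(affinity, dim=1)  # weights over appearance locations
        att_mot = torch.softmax(affinity, dim=2)  # weights over motion locations
        app_ctx = torch.bmm(app, att_app)                       # (B, C, N_m)
        mot_ctx = torch.bmm(mot, att_mot.transpose(1, 2))       # (B, C, N_a)
        return app_ctx, mot_ctx
```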

19.
IEEE Trans Image Process; 32: 2399-2412, 2023.
Article in English | MEDLINE | ID: mdl-37015122

ABSTRACT

Multi-Codebook Quantization (MCQ) is a generalized version of existing codebook-based quantizations for Approximate Nearest Neighbor (ANN) search. Specifically, MCQ picks one codeword from each sub-codebook independently and takes the sum of the picked codewords to approximate the original vector. The objective function involves no constraints; therefore, MCQ theoretically has the potential to achieve the best performance, because the solutions of other codebook-based quantization methods are all covered by MCQ's solution space under the same codebook size. However, finding the optimal solution to MCQ is proven to be NP-hard due to its encoding process, i.e., converting an input vector to a binary code. To tackle this, researchers either impose constraints to find near-optimal solutions or employ heuristic algorithms that are still time-consuming for encoding. Unlike previous approaches, this paper makes the first attempt to find a deep solution to MCQ. The encoding network is designed to be as simple as possible, so the very complex encoding problem reduces to a single feed-forward pass. Compared with other methods on three datasets, our method shows state-of-the-art performance. Notably, our method is 11×-38× faster than heuristic algorithms at encoding, making it more practical for real-world large-scale retrieval. Our code is publicly available: https://github.com/DeepMCQ/DeepQ.
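
The MCQ decoding step, summing one codeword per sub-codebook, is easy to make concrete; the deep encoder appears only as a hypothetical interface in the trailing comment.

```python
import torch

def mcq_reconstruct(codebooks: torch.Tensor, codes: torch.Tensor) -> torch.Tensor:
    """Multi-Codebook Quantization decoding: approximate each vector as the
    sum of one chosen codeword from each sub-codebook.

    codebooks: (M, K, D) -- M sub-codebooks of K codewords each
    codes:     (N, M)    -- index of the picked codeword per sub-codebook
    """
    M = codebooks.shape[0]
    picked = torch.stack([codebooks[m, codes[:, m]] for m in range(M)], dim=0)
    return picked.sum(dim=0)  # (N, D) approximate vectors

# A deep encoder, as in the paper, reduces encoding to a feed-forward pass:
# logits = encoder(x)            # hypothetical network, output (N, M, K)
# codes = logits.argmax(dim=-1)  # one codeword index per sub-codebook
```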

20.
Article in English | MEDLINE | ID: mdl-37022401

ABSTRACT

Contrastive learning has been successfully applied in unsupervised representation learning. However, the generalization ability of the learned representations is limited by the fact that the loss of downstream tasks (e.g., classification) is rarely taken into account when designing contrastive methods. In this article, we propose a new contrastive-based unsupervised graph representation learning (UGRL) framework that 1) maximizes the mutual information (MI) between the semantic information and the structural information of the data and 2) designs three constraints to simultaneously consider downstream tasks and representation learning. As a result, our method outputs robust low-dimensional representations. Experimental results on 11 public datasets demonstrate that our method is superior to recent state-of-the-art methods across different downstream tasks. Our code is available at https://github.com/LarryUESTC/GRLC.
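
The MI-maximization objective is typically realized with an InfoNCE-style lower bound. A sketch assuming the semantic and structural embeddings of the same node form a positive pair; this pairing is an illustrative choice, not necessarily GRLC's.

```python
import torch
import torch.nn.functional as F

def infonce_mi_bound(sem: torch.Tensor, struct: torch.Tensor, tau: float = 0.2):
    """InfoNCE as a tractable lower bound on the MI between semantic and
    structural embeddings of the same nodes.

    sem, struct: (num_nodes, dim) embeddings from the two views of the graph.
    """
    sem = F.normalize(sem, dim=-1)
    struct = F.normalize(struct, dim=-1)
    logits = sem @ struct.t() / tau
    targets = torch.arange(len(sem), device=sem.device)
    return F.cross_entropy(logits, targets)  # minimizing maximizes the MI bound
```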
