Results 1 - 20 of 96
1.
Article in English | MEDLINE | ID: mdl-38598383

ABSTRACT

A long-standing topic in artificial intelligence is the effective recognition of patterns from noisy images. In this regard, the recent data-driven paradigm considers 1) improving the representation robustness by adding noisy samples to the training phase (i.e., data augmentation) or 2) pre-processing the noisy image by learning to solve the inverse problem (i.e., image denoising). However, such methods generally exhibit inefficient processing and unstable results, limiting their practical applications. In this paper, we explore a non-learning paradigm that aims to derive robust representations directly from noisy images, without denoising as a pre-processing step. Here, the noise-robust representation is designed as Fractional-order Moments in Radon space (FMR), which also enjoys the beneficial properties of orthogonality and rotation invariance. Unlike earlier integer-order methods, our work is a more generic design that takes such classical methods as special cases, and the introduced fractional-order parameter offers a time-frequency analysis capability that is not available in classical methods. Formally, both implicit and explicit paths for constructing the FMR are discussed in detail. Extensive simulation experiments and robust visual applications are provided to demonstrate the uniqueness and usefulness of our FMR, especially for noise robustness, rotation invariance, and time-frequency discriminability.
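The construction can be made concrete with a small sketch. The following is a minimal, assumed reading of moments in Radon space with a fractional-order radial kernel; the paper's exact FMR basis, normalization, and orthogonalization are not reproduced here, and the function name and kernel choice are illustrative only.

```python
import numpy as np
from skimage.transform import radon

def fractional_radon_moment(image, alpha, n, num_angles=180):
    """Hypothetical moment M_{alpha,n}: a fractional-order radial weight
    |s|^alpha paired with an angular harmonic exp(-i*n*theta)."""
    thetas = np.linspace(0.0, 180.0, num_angles, endpoint=False)
    sino = radon(image, theta=thetas)            # rows: radial offset s, cols: angle
    s = np.linspace(-1.0, 1.0, sino.shape[0])    # normalized radial coordinate
    radial = np.abs(s) ** alpha                  # fractional-order kernel
    angular = np.exp(-1j * n * np.deg2rad(thetas))
    return (radial[:, None] * sino * angular[None, :]).sum()
```

Because rotating the image shifts the angular axis of the Radon transform, the magnitude |M_{alpha,n}| of such separable radial-angular moments is rotation-invariant, which matches the invariance property claimed above.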

2.
Article in English | MEDLINE | ID: mdl-38530739

ABSTRACT

Fast adversarial training (FAT) is an efficient method for improving robustness in white-box attack scenarios. However, the original FAT suffers from catastrophic overfitting, which dramatically and suddenly reduces robustness after a few training epochs. Although various FAT variants have been proposed to prevent overfitting, they require long training times. In this paper, we investigate the relationship between adversarial example quality and catastrophic overfitting by comparing the training processes of standard adversarial training and FAT. We find that catastrophic overfitting occurs when the attack success rate of adversarial examples deteriorates. Based on this observation, we propose a positive prior-guided adversarial initialization that prevents overfitting by improving adversarial example quality without extra training time. This initialization is generated from high-quality adversarial perturbations accumulated over the historical training process. We provide a theoretical analysis of the proposed initialization and propose a prior-guided regularization method that boosts the smoothness of the loss function. Additionally, we design a prior-guided ensemble FAT method that averages the weights of historical models using different decay rates. Our proposed method, called FGSM-PGK, assembles the prior-guided knowledge, i.e., the prior-guided initialization and model weights, acquired during the historical training process. The proposed method can effectively improve a model's adversarial robustness in white-box attack scenarios. Evaluations on four datasets demonstrate the superiority of the proposed method.
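The two prior-guided ingredients can be sketched briefly. Below is a hedged PyTorch sketch, assuming the historical perturbation is reused as the FGSM initialization and that model weights are averaged with an exponential moving average; names such as prior_delta and decay are illustrative, not the paper's API.

```python
import torch
import torch.nn.functional as F

def fgsm_with_prior(model, x, y, prior_delta, eps, step):
    # one-step FGSM starting from a prior-guided initialization
    delta = prior_delta.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    return (delta + step * grad.sign()).clamp(-eps, eps).detach()

@torch.no_grad()
def ema_update(ema_model, model, decay):
    # weight averaging over historical models; an ensemble would keep
    # several such averages with different decay rates
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)
```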

3.
Article in English | MEDLINE | ID: mdl-38412089

ABSTRACT

Optical aberration is a ubiquitous form of degradation in realistic lens-based imaging systems. Optical aberrations are caused by differences in optical path length as light travels through different regions of the camera lens at different incident angles. The resulting blur and chromatic aberrations vary significantly as the optical system changes. This work designs a transferable and effective image simulation system for simple lenses via multi-wavelength, depth-aware, spatially-variant four-dimensional point spread function (4D-PSF) estimation, requiring only a small number of lens-dependent parameters to be changed. The image simulation system alleviates the overhead of dataset collection and exploits the principles of computational imaging for effective optical aberration correction. Guided by domain knowledge about the image formation model provided by the 4D-PSFs, we establish a multi-scale optical aberration correction network for degraded image reconstruction, which consists of a scene depth estimation branch and an image restoration branch. Specifically, we propose to predict adaptive filters with the depth-aware PSFs and carry out dynamic convolutions, which facilitate the model's generalization across various scenes. We also employ convolution and self-attention mechanisms for global and local feature extraction to realize spatially-variant restoration. The multi-scale feature extraction complements features across different scales and provides fine details and contextual features. Extensive experiments demonstrate that our proposed algorithm performs favorably against state-of-the-art restoration methods. The source code and trained models are available to the public.

4.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 1093-1108, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37930909

ABSTRACT

Image restoration aims to reconstruct the latent sharp image from its corrupted counterpart. Besides dealing with this long-standing task in the spatial domain, a few approaches seek solutions in the frequency domain, motivated by the large discrepancy between the spectra of sharp/degraded image pairs. However, these algorithms commonly utilize transformation tools, e.g., the wavelet transform, to split features into several frequency parts, which is not flexible enough to select the most informative frequency component to recover. In this paper, we exploit a multi-branch and content-aware module to decompose features into separate frequency subbands dynamically and locally, and then accentuate the useful ones via channel-wise attention weights. In addition, to handle large-scale blur degradation, we propose an extremely simple decoupling and modulation module that enlarges the receptive field via global and window-based average pooling. Furthermore, we merge the paradigm of multi-stage networks into a single U-shaped network to pursue multi-scale receptive fields and improve efficiency. Finally, by integrating the above designs into a convolutional backbone, the proposed Frequency Selection Network (FSNet) performs favorably against state-of-the-art algorithms on 20 different benchmark datasets for 6 representative image restoration tasks, including single-image defocus deblurring, image dehazing, image motion deblurring, image desnowing, image deraining, and image denoising.
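As a rough illustration of the decoupling-and-modulation idea, the sketch below combines global and window-based average pooling to enlarge the receptive field and modulates the input features with the pooled context; it is an assumed simplification, not FSNet's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupleModulate(nn.Module):
    def __init__(self, channels, window=8):
        super().__init__()
        self.window = window
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        g = F.adaptive_avg_pool2d(x, 1).expand_as(x)            # global context
        w = F.avg_pool2d(x, self.window, stride=self.window)    # window context
        w = F.interpolate(w, size=x.shape[-2:], mode='nearest')
        return x * torch.sigmoid(self.proj(torch.cat([g, w], dim=1)))
```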

5.
Zhongguo Dang Dai Er Ke Za Zhi ; 25(11): 1107-1112, 2023 Nov 15.
Article in Chinese | MEDLINE | ID: mdl-37990453

ABSTRACT

OBJECTIVES: To study the efficacy and safety of Xiyanping injection through intramuscular injection for the treatment of acute bronchitis in children. METHODS: A prospective study was conducted from December 2021 to October 2022, including 78 children with acute bronchitis from three hospitals using a multicenter, randomized, parallel-controlled design. The participants were divided into a test group (conventional treatment plus Xiyanping injection; n=36) and a control group (conventional treatment alone; n=37) in a 1:1 ratio. Xiyanping injection was administered at a dose of 0.3 mL/(kg·d) (total daily dose ≤8 mL), twice daily via intramuscular injection, with a treatment duration of ≤4 days and a follow-up period of 7 days. The treatment efficacy and safety were compared between the two groups. RESULTS: The total effective rate on the 3rd day after treatment in the test group was significantly higher than that in the control group (P<0.05), while there was no significant difference in the total effective rate on the 5th day between the two groups (P>0.05). The rates of fever relief, cough relief, and lung rale relief in the test group on the 3rd day after treatment were higher than those in the control group (P<0.05). The cough relief rate on the 5th day after treatment in the test group was higher than that in the control group (P<0.05), while there was no significant difference in the fever relief rate and lung rale relief rate between the two groups (P>0.05). The cough relief time, daily cough relief time, and nocturnal cough relief time in the test group were significantly shorter than those in the control group (P<0.05), while there were no significant differences in the fever duration and lung rale relief time between the two groups (P>0.05). There was no significant difference in the incidence of adverse events between the two groups (P>0.05). CONCLUSIONS: The overall efficacy of combined routine treatment with intramuscular injection of Xiyanping injection in the treatment of acute bronchitis in children is superior to that of routine treatment alone, without an increase in the incidence of adverse reactions.
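For readers checking the regimen arithmetic, the dose rule above reduces to a one-line calculation; the snippet below merely restates it and is not clinical guidance.

```python
def daily_dose_ml(weight_kg, rate_ml_per_kg=0.3, cap_ml=8.0):
    # 0.3 mL/(kg·d), capped at 8 mL/day, split into two IM injections
    return min(rate_ml_per_kg * weight_kg, cap_ml)

# e.g., a 20 kg child: 6.0 mL/day -> 3.0 mL per injection;
# a 30 kg child reaches the cap: 8.0 mL/day -> 4.0 mL per injection.
```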


Subjects
Bronchitis, Cough, Humans, Child, Intramuscular Injections, Cough/drug therapy, Prospective Studies, Respiratory Sounds, Bronchitis/drug therapy, Treatment Outcome
6.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7668-7685, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37819793

ABSTRACT

Nowadays, machine learning (ML) and deep learning (DL) methods have become fundamental building blocks for a wide range of AI applications. The popularity of these methods also leaves them widely exposed to malicious attacks, which may cause severe security concerns. To understand the security properties of ML/DL methods, researchers have recently turned their focus to adversarial attack algorithms that can corrupt the model or the clean data owned by the victim with imperceptible perturbations. In this paper, we study the Label Flipping Attack (LFA) problem, where the attacker aims to degrade an ML/DL model's performance by flipping a small fraction of the labels in the training data. Prior art along this direction formulates LFA as a combinatorial optimization problem, which scales poorly to deep learning models. To address this, we propose a novel minimax problem that provides an efficient reformulation of the sample selection process in LFA. In the new optimization problem, the sample selection operation can be implemented with a single thresholding parameter. This leads to a novel training algorithm called Sample Thresholding. Since the objective function is differentiable and the model complexity does not depend on the sample size, we can apply Sample Thresholding to attack deep learning models. Moreover, since the victim's behavior is not predictable in a poisoning attack setting, we have to employ surrogate models to simulate the true model employed by the victim. In light of this, we provide a theoretical analysis of such a surrogate paradigm. Specifically, we show that the performance gap between the true model employed by the victim and the surrogate model is small under mild conditions. On top of this paradigm, we extend Sample Thresholding to the crowdsourced ranking task, where labels collected from annotators are vulnerable to adversarial attacks. Finally, experimental analyses on three real-world datasets speak to the efficacy of our method.
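The single-threshold selection can be made concrete. The sketch below assumes a per-sample attack score (e.g., the surrogate model's loss) and flips the labels of the samples above a quantile threshold chosen to respect the flipping budget; this is a schematic of the idea, not the paper's exact procedure.

```python
import torch

def sample_thresholding(scores, flip_fraction):
    # a single thresholding parameter controls how many labels flip
    tau = torch.quantile(scores, 1.0 - flip_fraction)
    return scores > tau  # boolean mask of labels to flip

# usage on binary labels y in {0, 1}:
# mask = sample_thresholding(per_sample_loss, 0.05); y[mask] = 1 - y[mask]
```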

7.
IEEE Trans Image Process ; 32: 5465-5477, 2023.
Article in English | MEDLINE | ID: mdl-37773909

ABSTRACT

Context modeling and multi-level feature fusion methods have proven effective in improving semantic segmentation performance. However, they are not specialized to deal with the problems of pixel-context mismatch and spatial feature misalignment, and their high computational complexity hinders widespread application in real-time scenarios. In this work, we propose a lightweight Context and Spatial Feature Calibration Network (CSFCN) to address these issues with pooling-based and sampling-based attention mechanisms. CSFCN contains two core modules: the Context Feature Calibration (CFC) module and the Spatial Feature Calibration (SFC) module. CFC adopts a cascaded pyramid pooling module to efficiently capture nested contexts, and then aggregates private contexts for each pixel based on pixel-context similarity to realize context feature calibration. SFC splits features into multiple groups of sub-features along the channel dimension and propagates them via learnable sampling to achieve spatial feature calibration. Extensive experiments on the Cityscapes and CamVid datasets illustrate that our method achieves a state-of-the-art trade-off between speed and accuracy. Concretely, our method achieves 78.7% mIoU at 70.0 FPS and 77.8% mIoU at 179.2 FPS on the Cityscapes and CamVid test sets, respectively. The code is available at https://nave.vr3i.com/ and https://github.com/kaigelee/CSFCN.
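The cascaded pyramid pooling can be sketched as follows, where each stage pools the previous stage's output so that the captured contexts are nested; this is an assumed structure for illustration, not the released CSFCN code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedPyramidPooling(nn.Module):
    def __init__(self, channels, bins=(6, 3, 1)):
        super().__init__()
        self.bins = bins
        self.fuse = nn.Conv2d(channels * (len(bins) + 1), channels, kernel_size=1)

    def forward(self, x):
        outs, cur = [x], x
        for b in self.bins:
            cur = F.adaptive_avg_pool2d(cur, b)   # pool the *previous* stage
            outs.append(F.interpolate(cur, size=x.shape[-2:],
                                      mode='bilinear', align_corners=False))
        return self.fuse(torch.cat(outs, dim=1))
```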

8.
IEEE Trans Image Process ; 32: 5394-5407, 2023.
Article in English | MEDLINE | ID: mdl-37721874

ABSTRACT

Human parsing aims to segment each pixel of a human image into fine-grained semantic categories. However, current human parsers trained on clean data are easily confused by numerous image corruptions such as blur and noise. To improve the robustness of human parsers, in this paper we construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to help evaluate the risk tolerance of human parsing models. Inspired by data augmentation strategies, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions. Specifically, two types of data augmentation from different views, i.e., image-aware augmentation and model-aware image-to-image transformation, are integrated sequentially to adapt to unforeseen image corruptions. The image-aware augmentation enriches the diversity of training images with the help of common image operations. The model-aware augmentation strategy improves the diversity of input data by exploiting the model's randomness. The proposed method is model-agnostic and can be plugged into arbitrary state-of-the-art human parsing frameworks. The experimental results show that the proposed method demonstrates good universality, improving the robustness of human parsing models and even semantic segmentation models in the face of various common image corruptions, while still achieving comparable performance on clean data.
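A schematic composition of the two augmentation views is sketched below; aug_net is a hypothetical image-to-image network kept in train mode so that its stochastic layers (e.g., dropout) supply the model randomness, and the chosen image operations are illustrative.

```python
import torch
import torchvision.transforms as T

image_aware = T.Compose([  # common image operations
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    T.GaussianBlur(kernel_size=5),
])

@torch.no_grad()
def heterogeneous_augment(img, aug_net):
    x = image_aware(img)   # 1) image-aware augmentation
    aug_net.train()        # keep stochastic layers active
    return aug_net(x)      # 2) model-aware image-to-image transformation
```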

9.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15345-15363, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37751347

ABSTRACT

Positive-Unlabeled (PU) data arise frequently in a wide range of fields such as medical diagnosis, anomaly analysis and personalized advertising. The absence of any known negative labels makes it very challenging to learn binary classifiers from such data. Many state-of-the-art methods reformulate the original classification risk with individual risks over positive and unlabeled data, and explicitly minimize the risk of classifying unlabeled data as negative. This, however, usually leads to classifiers with a bias toward negative predictions, i.e., they tend to recognize most unlabeled data as negative. In this paper, we propose a label distribution alignment formulation for PU learning to alleviate this issue. Specifically, we align the distribution of predicted labels with the ground-truth, which is constant for a given class prior. In this way, the proportion of samples predicted as negative is explicitly controlled from a global perspective, and thus the bias toward negative predictions could be intrinsically eliminated. On top of this, we further introduce the idea of functional margins to enhance the model's discriminability, and derive a margin-based learning framework named Positive-Unlabeled learning with Label Distribution Alignment (PULDA). This framework is also combined with the class prior estimation process for practical scenarios, and theoretically supported by a generalization analysis. Moreover, a stochastic mini-batch optimization algorithm based on the exponential moving average strategy is tailored for this problem with a convergence guarantee. Finally, comprehensive empirical results demonstrate the effectiveness of the proposed method.
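The alignment idea admits a very small sketch: keep the global proportion of predicted positives close to the class prior. The form below is an assumption for illustration; PULDA's actual objective additionally involves functional margins and prior estimation.

```python
import torch

def label_distribution_alignment(scores, class_prior):
    # penalize the gap between the predicted-positive rate and the prior
    pred_pos_rate = torch.sigmoid(scores).mean()
    return (pred_pos_rate - class_prior).abs()

# used as a regularizer, e.g.:
# loss = pu_classification_risk + lam * label_distribution_alignment(scores, pi)
```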

10.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15494-15511, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37561614

ABSTRACT

The Area Under the ROC curve (AUC) is a popular metric for long-tail classification. Many efforts have been devoted to AUC optimization methods over the past decades. However, little exploration has been done to make them survive adversarial attacks. Among the few exceptions, AdAUC presents an early trial of AUC-oriented adversarial training with a convergence guarantee. This algorithm generates adversarial perturbations globally for all the training examples. However, it implicitly assumes that the attacker knows in advance that the victim is using an AUC-based loss function and training technique, which is too strong an assumption to be met in real-world scenarios. Moreover, whether a straightforward generalization bound for AdAUC exists is unclear due to the technical difficulty of decomposing each adversarial example. By carefully revisiting the AUC-oriented adversarial training problem, we present three reformulations of the original objective function and propose an inducing algorithm. On top of this, we show that: 1) under mild conditions, AdAUC can be optimized equivalently with score-based or instance-wise-loss-based perturbations, which is compatible with most of the popular adversarial example generation methods; 2) AUC-oriented AT does have an explicit error bound that ensures its generalization ability; and 3) one can construct a fast SVRG-based gradient descent-ascent algorithm to accelerate the AdAUC method. Finally, extensive experimental results show the performance and robustness of our algorithm on five long-tail datasets.
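For orientation, a standard pairwise surrogate for 1 - AUC is sketched below; AdAUC-style training would generate perturbations against such an objective, and this generic loss is shown only to fix ideas, not as the paper's exact formulation.

```python
import torch

def auc_squared_hinge(pos_scores, neg_scores, margin=1.0):
    # every positive should outscore every negative by the margin
    diff = margin - (pos_scores[:, None] - neg_scores[None, :])
    return diff.clamp(min=0).pow(2).mean()
```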

11.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14161-14174, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37561615

ABSTRACT

The Area Under the ROC curve (AUC) is a crucial metric for machine learning, and it is often a reasonable choice for applications like disease prediction and fraud detection where the datasets often exhibit a long-tail nature. However, most of the existing AUC-oriented learning methods assume that the training data and test data are drawn from the same distribution; how to deal with domain shift remains widely open. This paper presents an early trial at AUC-oriented Unsupervised Domain Adaptation (UDA), hereafter denoted AUCUDA. Specifically, we first construct a generalization bound that exploits a new distributional discrepancy for AUC. The critical challenge is that the AUC risk cannot be expressed as a sum of independent loss terms, making the standard theoretical technique unavailable. We propose a new result that not only addresses this interdependency issue but also yields a much sharper bound under weaker assumptions about the loss function. Turning theory into practice, the original discrepancy requires complete annotations on the target domain, which is incompatible with UDA. To fix this issue, we propose a pseudo-labeling strategy and present an end-to-end training framework. Finally, empirical studies over five real-world datasets speak to the efficacy of our framework.

12.
IEEE Trans Image Process ; 32: 4393-4406, 2023.
Article in English | MEDLINE | ID: mdl-37490377

ABSTRACT

Sketch classification models have been extensively investigated by designing task-driven deep neural networks. Despite their successful performance, few works have attempted to explain the predictions of sketch classifiers. An intuitive way to explain a classifier's prediction is to visualize activation maps by computing gradients. However, visualization-based explanations are constrained by several factors when applied directly to sketch classifiers: (i) the visualized regions carry low semantic value for human understanding, and (ii) the inter-class correlations among distinct categories are neglected. To address these issues, we introduce a novel explanation method that interprets the decisions of sketch classifiers with stroke-level evidence. Specifically, to obtain stroke-level semantic regions, we first develop a sketch parser that parses the sketch into strokes while preserving their geometric structures. Then, we design a counterfactual map generator to discover the stroke-level principal components for a specific category. Finally, based on the counterfactual feature maps, our model explains the question of "why the sketch is classified as X" by providing positive and negative semantic explanation evidence. Experiments conducted on two public sketch benchmarks, Sketchy-COCO and TU-Berlin, demonstrate the effectiveness of our proposed model. Furthermore, our model provides more discriminative and human-understandable explanations than existing works.

13.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13117-13133, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37390000

ABSTRACT

Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories. We first contribute the Product1M dataset and define two realistic, practical instance-level retrieval tasks that enable evaluation on price comparison and personalized recommendation. For both instance-level tasks, accurately identifying the intended product target mentioned in visual-linguistic data and mitigating the impact of irrelevant content is quite challenging. To address this, we devise a more effective cross-modal pretraining model capable of adaptively incorporating key concept information from multi-modal data. This is accomplished by utilizing an entity graph, where nodes represent entities and edges denote the similarity relations between them. Specifically, a novel Entity-Graph Enhanced Cross-Modal Pretraining (EGE-CMP) model is proposed for instance-level commodity retrieval, which explicitly injects entity knowledge in both node-based and subgraph-based ways into the multi-modal networks via a self-supervised hybrid-stream transformer. This reduces the confusion between different object contents, thereby effectively guiding the network to focus on entities with real semantics. Experimental results sufficiently verify the efficacy and generalizability of our EGE-CMP, which outperforms several SOTA cross-modal baselines such as CLIP (Radford et al., 2021), UNITER (Chen et al., 2020), and CAPTURE (Zhan et al., 2021).

14.
Article in English | MEDLINE | ID: mdl-37363849

ABSTRACT

Current 3D mesh steganography algorithms relying on geometric modification are prone to detection by steganalyzers. In traditional steganography, adaptive steganography has proven to be an efficient means of enhancing steganography security. Taking inspiration from this, we propose a highly adaptive embedding algorithm, guided by the principle of minimizing a carefully crafted distortion through efficient steganography codes. Specifically, we tailor a payload-limited embedding optimization problem for 3D settings and devise a feature-preserving distortion (FPD) to measure the impact of message embedding. The distortion takes on an additive form and is defined as a weighted difference of the effective steganalytic subfeatures utilized by the current 3D steganalyzers. With practicality in mind, we refine the distortion to enhance robustness and computational efficiency. By minimizing the FPD, our algorithm can preserve mesh features to a considerable extent, including steganalytic and geometric features, while achieving a high embedding capacity. During the practical embedding phase, we employ the Q-layered syndrome trellis code (STC). However, calculating the bit modification probability (BMP) for each layer of the Q-layered STC, given the variation of Q, can be cumbersome. To address this issue, we design a universal and automatic approach for the BMP calculation. The experimental results demonstrate that our algorithm achieves state-of-the-art performance in countering 3D steganalysis.
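The additive distortion admits a compact sketch. The weighted-difference form below is an assumed simplification of the FPD, with the subfeature extraction and weight choice left abstract.

```python
import numpy as np

def feature_preserving_distortion(features_cover, features_stego, weights):
    # additive cost: weighted absolute change of steganalytic subfeatures
    return float(np.sum(weights * np.abs(features_stego - features_cover)))
```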

15.
Article in English | MEDLINE | ID: mdl-37204958

ABSTRACT

Restoring missing areas without leaving visible traces has become a trivial task with Photoshop inpainting tools. However, such tools have potentially illegal or unethical uses, such as removing specific objects in images to deceive the public. Despite the emergence of many image inpainting forensics methods, their detection ability remains insufficient when confronted with professional Photoshop inpainting. Motivated by this, we propose a novel method termed the primary-secondary network (PS-Net) to localize Photoshop-inpainted regions in images. To the best of our knowledge, this is the first forensic method devoted specifically to Photoshop inpainting. The PS-Net is designed to deal with delicate and professionally inpainted images. It consists of two subnetworks: the primary network (P-Net) and the secondary network (S-Net). The P-Net mines the frequency clues of subtle inpainting features through a convolutional network and further identifies the tampered region. The S-Net enables the model to mitigate compression and noise attacks to some extent by increasing the weights of co-occurring features and providing features that are not captured by the P-Net. Furthermore, dense connections, Ghost modules, and channel attention blocks (C-A blocks) are adopted to further strengthen the localization ability of PS-Net. Extensive experimental results illustrate that PS-Net can successfully distinguish forged regions in elaborately inpainted images, outperforming several state-of-the-art solutions. The proposed PS-Net is also robust against several postprocessing operations commonly used in Photoshop.

16.
IEEE Trans Image Process ; 32: 3040-3053, 2023.
Article in English | MEDLINE | ID: mdl-37163394

ABSTRACT

In this paper, we address the problem of video-based rain streak removal by developing an event-aware multi-patch progressive neural network. Rain streaks in video exhibit correlations in both the temporal and spatial dimensions, which existing methods have difficulty modeling. Based on this observation, we propose a module that encodes events from neuromorphic cameras to facilitate deraining. Events are captured asynchronously at the pixel level only when the intensity change exceeds a certain threshold. Due to this property, events contain considerable information about moving objects, including rain streaks passing through the camera's field of view across adjacent frames. Thus, we suggest that properly utilizing events improves deraining performance non-trivially. In addition, we develop a multi-patch progressive neural network. The multi-patch design enables varied receptive fields by partitioning the input into patches, and progressive learning across patch levels makes the model emphasize each level to a different extent. Extensive experiments show that our event-guided method outperforms state-of-the-art methods by a large margin on synthetic and real-world datasets.

17.
IEEE Trans Image Process ; 32: 2468-2480, 2023.
Article in English | MEDLINE | ID: mdl-37115831

ABSTRACT

Human-object relationship detection reveals the fine-grained relationships between humans and objects, supporting comprehensive video understanding. Previous human-object relationship detection approaches are mainly developed with object features and relation features, without exploring information specific to humans. In this paper, we propose a novel Relation-Pose Transformer (RPT) for human-object relationship detection. Inspired by the coordination of eye-head-body movements in cognitive science, we employ the head pose to find the crucial objects that humans focus on and use the body pose with skeleton information to represent multiple actions. Then, we utilize a spatial encoder to capture the spatially contextualized information of the relation pair, which integrates the relation features and pose features. Next, a temporal decoder models the temporal dependency of the relationship. Finally, we adopt multiple classifiers to predict different types of relationships. Extensive experiments on the Action Genome benchmark validate the effectiveness of our proposed method and show state-of-the-art performance compared with related methods.


Subjects
Cognition, Object Attachment, Humans, Benchmarking, Head Movements, Skeleton
18.
Article in English | MEDLINE | ID: mdl-37022900

ABSTRACT

Most multi-exposure image fusion (MEF) methods perform unidirectional alignment within limited local regions, which ignores the effects of augmented locations and preserves deficient global features. In this work, we propose a multi-scale bidirectional alignment network via deformable self-attention to perform adaptive image fusion. The proposed network exploits differently exposed images and aligns them to the normal exposure to varying degrees. Specifically, we design a novel deformable self-attention module that considers variable long-distance attention and interaction and implements bidirectional alignment for image fusion. To realize adaptive feature alignment, we employ a learnable weighted summation of different inputs and predict offsets in the deformable self-attention module, which helps the model generalize well across various scenes. In addition, the multi-scale feature extraction strategy makes features across different scales complementary and provides fine details and contextual features. Extensive experiments demonstrate that our proposed algorithm performs favorably against state-of-the-art MEF methods.

19.
IEEE Trans Cybern ; 53(1): 454-467, 2023 Jan.
Article in English | MEDLINE | ID: mdl-34797770

ABSTRACT

Although convolutional neural networks (CNNs) have shown high-quality reconstruction for single-image dehazing, recovering natural and realistic dehazed results remains a challenging problem due to semantic confusion in the hazy scene. In this article, we show that it is possible to recover textures faithfully by incorporating a semantic prior into the dehazing network, since objects in haze-free images tend to exhibit certain shapes, textures, and colors. We propose a semantic-aware dehazing network (SDNet) in which the semantic prior serves as a color constraint for dehazing, benefiting the acquisition of a reasonable scene configuration. In addition, we design a densely connected block to capture global and local information for dehazing and semantic prior estimation. To eliminate the unnatural appearance of some objects, we propose to fuse the features from shallow and deep layers adaptively. Experimental results demonstrate that our proposed model performs favorably against state-of-the-art single-image dehazing approaches.

20.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 1017-1035, 2023 Jan.
Article in English | MEDLINE | ID: mdl-34995181

ABSTRACT

The recently proposed Collaborative Metric Learning (CML) paradigm has aroused wide interest in the area of recommendation systems (RS) owing to its simplicity and effectiveness. Typically, the existing CML literature depends largely on negative sampling to alleviate the time-consuming burden of pairwise computation. However, in this work, through a theoretical analysis, we find that negative sampling leads to a biased estimation of the generalization error. Specifically, we show that sampling-based CML introduces a bias term into the generalization bound, which is quantified by the per-user Total Variance (TV) between the distribution induced by negative sampling and the ground-truth distribution. This suggests that optimizing the sampling-based CML loss function does not ensure a small generalization error even with sufficiently large training data. Moreover, we show that the bias term vanishes without the negative sampling strategy. Motivated by this, we propose an efficient alternative to CML without negative sampling, named Sampling-Free Collaborative Metric Learning (SFCML), to get rid of the sampling bias in a practical sense. Finally, comprehensive experiments over seven benchmark datasets speak to the superiority of the proposed algorithm.
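To make the sampling-free objective concrete, the naive O(U·I²) pairwise hinge loss it targets is sketched below; SFCML's contribution is an efficient way to optimize this quantity, which the brute-force version here does not capture.

```python
import torch

def full_pair_cml_loss(user_emb, item_emb, pos_mask, margin=1.0):
    # squared distances between every user and every item, shape (U, I)
    d2 = torch.cdist(user_emb, item_emb).pow(2)
    # hinge[u, i, j] = margin + d(u, i)^2 - d(u, j)^2, clamped at zero
    hinge = (margin + d2.unsqueeze(2) - d2.unsqueeze(1)).clamp(min=0)
    # keep only (positive i, negative j) pairs for each user
    pair_mask = pos_mask.unsqueeze(2) & (~pos_mask).unsqueeze(1)
    return hinge[pair_mask].mean()
```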
