Results 1 - 20 of 93
1.
Article in English | MEDLINE | ID: mdl-38861429

ABSTRACT

Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices exploit a unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this setting, a unique user representation might induce preference bias, especially when the item category distribution is imbalanced. To address this issue, we propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML), in the hope of capturing the commonly ignored minority interests of users. The key idea behind DPCML is to introduce a set of multiple representations for each user in the system, where a user's preference toward an item is computed by taking the minimum item-user distance over the user's embedding set. Specifically, we instantiate two effective assignment strategies to explore a proper number of vectors for each user. Meanwhile, a Diversity Control Regularization Scheme (DCRS) is developed to better accommodate the multi-vector representation strategy. Theoretically, we show that DPCML could induce a smaller generalization error than traditional CML. Furthermore, we note that CML-based approaches usually require negative sampling to reduce the heavy computational burden caused by their pairwise objective. In this paper, we reveal the fundamental limitation of the widely adopted hard-aware sampling from the One-Way Partial AUC (OPAUC) perspective and then develop an effective sampling alternative for the CML-based paradigm. Finally, comprehensive experiments over a range of benchmark datasets speak to the efficacy of DPCML.
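As a minimal sketch of the multi-vector preference rule described above, the snippet below scores a user-item pair by the smallest Euclidean distance between the item embedding and any of the user's interest embeddings; the embedding sizes, variable names, and toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dpcml_preference(user_embeds: np.ndarray, item_embed: np.ndarray) -> float:
    """Score a (user, item) pair as the minimum distance between the item
    and any of the user's C interest embeddings (smaller = preferred)."""
    # user_embeds: (C, d) -- C interest vectors per user; item_embed: (d,)
    dists = np.linalg.norm(user_embeds - item_embed[None, :], axis=1)
    return float(dists.min())

rng = np.random.default_rng(0)
C, d = 3, 8                       # 3 interest vectors, 8-dim embeddings
user = rng.normal(size=(C, d))
item_a, item_b = rng.normal(size=d), rng.normal(size=d)

# Rank two candidate items for this user: smaller min-distance ranks first.
scores = {"item_a": dpcml_preference(user, item_a),
          "item_b": dpcml_preference(user, item_b)}
print(sorted(scores, key=scores.get))
```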

2.
IEEE Trans Med Imaging ; PP, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38861436

ABSTRACT

Medical image reporting, which aims to automatically generate diagnostic reports from medical images, has garnered growing research attention. In this task, learning cross-modal alignment between images and reports is crucial. However, the exposure bias problem in autoregressive text generation poses a notable challenge, as the model is optimized with a word-level loss function under the teacher-forcing strategy. To this end, we propose a novel Token-Mixer framework that learns to bind image and text in one embedding space for medical image reporting. Concretely, Token-Mixer enhances the cross-modal alignment by matching image-to-text generation with text-to-text generation, which suffers less from exposure bias. The framework contains an image encoder, a text encoder and a text decoder. In training, images and paired reports are first encoded into image tokens and text tokens, and these tokens are randomly mixed to form mixed tokens. Then, the text decoder accepts image tokens, text tokens or mixed tokens as prompt tokens and conducts text generation for network optimization. Furthermore, we introduce a tailored text decoder and an alternative training strategy that integrate well with our Token-Mixer framework. Extensive experiments across three publicly available datasets demonstrate that Token-Mixer successfully enhances the image-text alignment and thereby attains state-of-the-art performance. Related codes are available at https://github.com/yangyan22/Token-Mixer.
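The random token-mixing step can be illustrated with a small sketch; the shared token length, mixing ratio, and names below are assumptions for illustration rather than the authors' code.

```python
import torch

def mix_tokens(image_tokens: torch.Tensor, text_tokens: torch.Tensor,
               p_image: float = 0.5) -> torch.Tensor:
    """Randomly interleave image and text tokens position-by-position.

    image_tokens, text_tokens: (B, N, D) -- assumed already projected into a
    shared embedding space and truncated/padded to the same length N.
    """
    assert image_tokens.shape == text_tokens.shape
    B, N, _ = image_tokens.shape
    # For each position, draw whether the mixed token comes from the image.
    take_image = torch.rand(B, N, 1) < p_image
    return torch.where(take_image, image_tokens, text_tokens)

B, N, D = 2, 16, 64
mixed = mix_tokens(torch.randn(B, N, D), torch.randn(B, N, D))
print(mixed.shape)  # torch.Size([2, 16, 64])
```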

3.
Sensors (Basel) ; 24(11), 2024 May 21.
Article in English | MEDLINE | ID: mdl-38894065

ABSTRACT

An energy-efficient switching scheme with adjustable 9/10-bit resolution and one-LSB common-mode voltage variation is proposed for SAR ADCs. Based on capacitor-splitting and common-mode conversion techniques, the proposed switching scheme reduces the DAC switching energy by 96.41% compared to the conventional scheme. Its low complexity and one-LSB common-mode voltage offset stem from switching the reference voltages of the corresponding capacitors in the positive and negative arrays simultaneously throughout the entire reference-voltage switching process, and the reference voltage of each capacitor switches through no more than two levels. Post-layout results show that the ADC achieves an SNDR of 54.96 dB, an SFDR of 61.73 dB, and a power consumption of 0.67 µW in 10-bit mode, and an SNDR of 48.33 dB, an SFDR of 54.17 dB, and a power consumption of 0.47 µW in 9-bit mode, in a 180 nm process at a 100 kS/s sampling rate.

4.
Article in English | MEDLINE | ID: mdl-38896521

ABSTRACT

Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, and other fields. Given the enormous social impact and the consequent incentives, a potential adversary has a strong motivation to manipulate the ranking list. However, existing methods assume an ideal attack opportunity and excessive adversarial capability, which makes them impractical. To fully explore the potential risks, we leverage an online attack on the vulnerable data collection process. Since the data collection process is independent of rank aggregation and lacks effective protection mechanisms, we disrupt it by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From a game-theoretic perspective, the confrontation between the online manipulator and the ranker who controls the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. We then demonstrate that the equilibrium of this game is potentially favorable to the adversary by analyzing the vulnerability of sampling algorithms such as Bernoulli and reservoir sampling. Based on this theoretical analysis, different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, corroborating empirical evidence shows that the proposed method manipulates the results of rank aggregation methods in a sequential manner.
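The attack policies themselves are not reproduced here; the sketch below only illustrates the reservoir sampling step named above, with hypothetical pairwise-comparison data, to show why injected comparisons can end up in the retained sample.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream of unknown length
    (classic Algorithm R). Each arriving item, genuine or fabricated, has the
    same chance of being retained -- which is why an adversary who injects
    pairwise comparisons into the stream can influence the aggregated ranking."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Hypothetical pairwise comparisons "winner beats loser"; the last two are fabricated.
genuine = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")]
fabricated = [("D", "A"), ("D", "B")]
print(reservoir_sample(genuine + fabricated, k=4))
```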

5.
Sci Rep ; 14(1): 9324, 2024 04 23.
Article in English | MEDLINE | ID: mdl-38654056

ABSTRACT

This study constructs a composite indicator system covering the core dimensions of medical equipment input and output. Based on this system, an innovative cone-constrained data envelopment analysis (DEA) model is designed. The model integrates the advantages of the analytic hierarchy process (AHP) with an improved criterion importance through intercriteria correlation (CRITIC) method to determine subjective and objective weights and employs game theory to obtain the final combined weights, which are further incorporated as constraints to form the cone-constrained DEA model. Finally, a bidirectional long short-term memory (Bi-LSTM) model with an attention mechanism is introduced for integration, aiming to provide a novel and practical model for evaluating the effectiveness of medical equipment. The proposed model has essential reference value for optimizing medical equipment management decision-making and investment strategies.
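As a rough illustration of the weighting pipeline described above, the sketch below computes CRITIC objective weights and combines them with a hypothetical AHP weight vector through a simple least-squares, game-theory-style combination; the indicator matrix, AHP weights, and combination rule are illustrative assumptions, not the paper's model.

```python
import numpy as np

def critic_weights(X: np.ndarray) -> np.ndarray:
    """CRITIC objective weights from an (n_units, n_indicators) matrix X
    (indicators assumed benefit-type and min-max normalized here)."""
    Z = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)
    sigma = Z.std(0, ddof=1)                     # contrast intensity
    r = np.corrcoef(Z, rowvar=False)             # inter-indicator correlation
    info = sigma * (1.0 - r).sum(0)              # information content C_j
    return info / info.sum()

def game_theory_combine(weight_sets):
    """Combine several weight vectors by solving the linear system that
    minimizes the deviation of the combination from each individual set."""
    W = np.asarray(weight_sets)                  # (m, n_indicators)
    A = W @ W.T
    b = np.diag(A)
    alpha = np.linalg.solve(A, b)
    alpha = np.abs(alpha) / np.abs(alpha).sum()  # normalized combination coefficients
    return alpha @ W

rng = np.random.default_rng(1)
X = rng.random((12, 4))                          # 12 hospitals x 4 toy indicators
w_ahp = np.array([0.4, 0.3, 0.2, 0.1])           # hypothetical AHP (subjective) weights
w_critic = critic_weights(X)
print(game_theory_combine([w_ahp, w_critic]))
```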


Subject(s)
Equipment and Supplies , Humans , Models, Theoretical , Game Theory , Algorithms
6.
IEEE Trans Image Process ; 33: 3115-3129, 2024.
Article in English | MEDLINE | ID: mdl-38656836

ABSTRACT

Long-term Video Question Answering (VideoQA) is a challenging vision-and-language bridging task focusing on semantic understanding of untrimmed long-term videos and diverse free-form questions, while simultaneously emphasizing comprehensive cross-modal reasoning to yield precise answers. The canonical approaches often rely on off-the-shelf feature extractors to sidestep the expensive computational overhead, but this often results in domain-independent, modality-unrelated representations. Furthermore, the inherent gradient blocking between unimodal comprehension and cross-modal interaction hinders reliable answer generation. In contrast, recent emerging successful video-language pre-training models enable cost-effective end-to-end modeling but fall short in domain-specific ratiocination and exhibit disparities in task formulation. Toward this end, we present an entirely end-to-end solution for long-term VideoQA: the Multi-granularity Contrastive cross-modal collaborative Generation (MCG) model. To derive discriminative representations that capture high-level visual concepts, we introduce Joint Unimodal Modeling (JUM) on a clip-bone architecture and leverage Multi-granularity Contrastive Learning (MCL) to harness the intrinsically or explicitly exhibited semantic correspondences. To alleviate the task formulation discrepancy problem, we propose a Cross-modal Collaborative Generation (CCG) module to reformulate VideoQA as a generative task instead of the conventional classification scheme, empowering the model with the capability for cross-modal high-semantic fusion and generation so as to reason and answer. Extensive experiments conducted on six publicly available VideoQA datasets underscore the superiority of our proposed method.
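A generic contrastive alignment term of the kind mentioned above can be sketched as a symmetric InfoNCE loss between pooled video and text embeddings; the temperature, pooling, and names are assumptions, and this is not the MCG implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(video_feats: torch.Tensor, text_feats: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired (video, text) embeddings.

    video_feats, text_feats: (B, D); the i-th video and i-th text are a pair.
    """
    v = F.normalize(video_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    logits = v @ t.T / temperature              # (B, B) similarity matrix
    targets = torch.arange(v.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```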

7.
Article in English | MEDLINE | ID: mdl-38683713

ABSTRACT

Crowd localization aims to predict the positions of humans in images of crowded scenes. While existing methods have made significant progress, two primary challenges remain: (i) a fixed number of evenly distributed anchors can cause excessive or insufficient predictions across regions of an image with varying crowd densities, and (ii) ranking inconsistency of predictions between the testing and training phases leads to a sub-optimal model at inference. To address these issues, we propose a Consistency-Aware Anchor Pyramid Network (CAAPN) comprising two key components: an Adaptive Anchor Generator (AAG) and a Localizer with Augmented Matching (LAM). The AAG module adaptively generates anchors based on the estimated crowd density in local regions to alleviate the anchor deficiency or excess problem. It also exploits the spatial distribution of heads as a prior for better performance. The LAM module is designed to augment the predictions used to optimize the neural network during training by introducing an extra set of target candidates and correctly matching them to the ground truth. The proposed method achieves favorable performance against state-of-the-art approaches on five challenging datasets: ShanghaiTech A and B, UCF-QNRF, JHU-CROWD++, and NWPU-Crowd. The source code and trained models will be released at https://github.com/ucasyan/CAAPN.
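A toy sketch of density-adaptive anchor placement, in the spirit of the AAG module described above; the cell size, per-cell anchor budget, and layout are illustrative assumptions, not the paper's design.

```python
import numpy as np

def adaptive_anchors(density_map: np.ndarray, cell: int = 32,
                     max_per_cell: int = 8) -> np.ndarray:
    """Place more anchor points in cells with higher estimated crowd density.

    density_map: (H, W) non-negative density estimate; returns (N, 2) anchor
    coordinates laid out on a regular sub-grid inside each cell.
    """
    H, W = density_map.shape
    anchors = []
    for y0 in range(0, H, cell):
        for x0 in range(0, W, cell):
            mass = density_map[y0:y0 + cell, x0:x0 + cell].sum()
            n = int(np.clip(np.ceil(mass), 1, max_per_cell))   # anchors in this cell
            side = int(np.ceil(np.sqrt(n)))
            ys = np.linspace(y0, min(y0 + cell, H), side + 2)[1:-1]
            xs = np.linspace(x0, min(x0 + cell, W), side + 2)[1:-1]
            anchors.extend([(y, x) for y in ys for x in xs][:n])
    return np.array(anchors)

density = np.zeros((128, 128))
density[:32, :32] = 0.01          # toy example: a dense top-left region
print(adaptive_anchors(density).shape)
```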

8.
Article in English | MEDLINE | ID: mdl-38683715

ABSTRACT

Video activity anticipation aims to predict what will happen in the future, with broad application prospects ranging from robot vision to autonomous driving. Despite recent progress, the data uncertainty issue, reflected in the content evolution process and the dynamic correlation among event labels, has been largely ignored. This reduces the model's generalization ability and deep understanding of video content, leading to serious error accumulation and degraded performance. In this paper, we address the uncertainty learning problem and propose an uncertainty-boosted robust video activity anticipation framework, which generates uncertainty values to indicate the credibility of the anticipation results. The uncertainty value is used to derive a temperature parameter in the softmax function to modulate the predicted target activity distribution. To guarantee the distribution adjustment, we construct a reasonable target activity label representation by incorporating the activity evolution from the temporal class correlation and the semantic relationship. Moreover, we quantify the uncertainty into relative values by comparing the uncertainty among sample pairs and their temporal lengths. This relative strategy provides a more accessible way to model uncertainty than quantifying absolute uncertainty values over the whole dataset. Experiments on multiple backbones and benchmarks show that our framework achieves promising performance and better robustness/interpretability. Source codes are available at https://github.com/qzhb/UbRV2A.
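The uncertainty-to-temperature idea described above can be sketched as follows; the temperature range and the mapping from uncertainty to temperature are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def uncertainty_scaled_probs(logits: torch.Tensor, uncertainty: torch.Tensor,
                             t_min: float = 0.5, t_max: float = 2.0) -> torch.Tensor:
    """Turn a per-sample uncertainty score in [0, 1] into a softmax temperature:
    confident samples get a sharp distribution, uncertain ones a flatter one.

    logits: (B, C), uncertainty: (B,) in [0, 1].
    """
    temperature = t_min + (t_max - t_min) * uncertainty      # (B,)
    return F.softmax(logits / temperature.unsqueeze(1), dim=-1)

logits = torch.randn(1, 10).repeat(4, 1)                     # same logits, varying uncertainty
uncertainty = torch.tensor([0.05, 0.3, 0.7, 0.95])
probs = uncertainty_scaled_probs(logits, uncertainty)
print(probs.max(dim=-1).values)   # the peak probability flattens as uncertainty grows
```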

9.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 5062-5079, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38315603

ABSTRACT

Stochastic optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning. Despite extensive studies on AUPRC optimization, generalization is still an open problem. In this work, we present the first trial in the algorithm-dependent generalization of stochastic AUPRC optimization. The obstacles to our destination are three-fold. First, according to the consistency analysis, the majority of existing stochastic estimators are biased with biased sampling strategies. To address this issue, we propose a stochastic estimator with sampling-rate-invariant consistency and reduce the consistency error by estimating the full-batch scores with score memory. Second, standard techniques for algorithm-dependent generalization analysis cannot be directly applied to listwise losses. To fill this gap, we extend the model stability from instance-wise losses to listwise losses. Third, AUPRC optimization involves a compositional optimization problem, which brings complicated computations. In this work, we propose to reduce the computational complexity by matrix spectral decomposition. Based on these techniques, we derive the first algorithm-dependent generalization bound for AUPRC optimization. Motivated by theoretical results, we propose a generalization-induced learning framework, which improves the AUPRC generalization by equivalently increasing the batch size and the number of valid training examples. Practically, experiments on image retrieval and long-tailed classification speak to the effectiveness and soundness of our framework.
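As a simplified illustration of the score-memory idea mentioned above (not the paper's sampling-rate-invariant estimator), the sketch below keeps the latest score of every example and computes average precision, a common surrogate for AUPRC, from those full-batch scores after a mini-batch update.

```python
import numpy as np

class ScoreMemory:
    """Keep the latest score for every training example so that a ranking
    metric such as average precision can be estimated from 'full-batch'
    scores even though only a mini-batch is re-scored at each step."""
    def __init__(self, n_examples: int):
        self.scores = np.zeros(n_examples)

    def update(self, indices: np.ndarray, batch_scores: np.ndarray) -> None:
        self.scores[indices] = batch_scores          # overwrite stale entries

    def average_precision(self, labels: np.ndarray) -> float:
        order = np.argsort(-self.scores)
        y = labels[order]
        precision_at_k = np.cumsum(y) / (np.arange(len(y)) + 1)
        return float((precision_at_k * y).sum() / max(y.sum(), 1))

rng = np.random.default_rng(0)
labels = (rng.random(100) < 0.2).astype(float)       # imbalanced toy labels
mem = ScoreMemory(100)
idx = rng.choice(100, size=16, replace=False)        # one mini-batch of indices
mem.update(idx, rng.normal(size=16))
print(mem.average_precision(labels))
```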

10.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 4926-4943, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38349824

ABSTRACT

Change captioning aims to describe the semantic change between two similar images. In this process, viewpoint change, the most typical distractor, leads to pseudo changes in the appearance and position of objects, thereby overwhelming the real change. Besides, since the visual signal of change appears in a local region with weak features, it is difficult for the model to directly translate the learned change features into a sentence. In this paper, we propose a syntax-calibrated multi-aspect relation transformer to learn effective change features under different scenes, and to build reliable cross-modal alignment between the change features and linguistic words during caption generation. Specifically, a multi-aspect relation learning network is designed to 1) explore the fine-grained changes under irrelevant distractors (e.g., viewpoint change) by embedding the relations of semantics and relative position into the features of each image; 2) learn two view-invariant image representations by strengthening their global contrastive alignment relation, so as to help capture a stable difference representation; and 3) provide the model with prior knowledge about whether and where the semantic change happened by measuring the relation between the representations of the captured difference and the image pair. In this way, the model can learn effective change features for caption generation. Further, we introduce the syntax knowledge of Part-of-Speech (POS) and devise a POS-based visual switch to calibrate the transformer decoder. The POS-based visual switch dynamically utilizes visual information when generating different words, based on their POS. This enables the decoder to build reliable cross-modal alignment and thus generate a high-level linguistic sentence about the change. Extensive experiments show that the proposed method achieves state-of-the-art performance on three public datasets.

11.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 4850-4865, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38261483

ABSTRACT

Although stereo image restoration has been extensively studied, most existing work focuses on restoring stereo images with limited horizontal parallax due to the binocular symmetry constraint. Stereo images with unlimited parallax (e.g., large ranges and asymmetrical types) are more challenging in real-world applications and have rarely been explored so far. To restore high-quality stereo images with unlimited parallax, this paper proposes an attention-guided correspondence learning method, which learns both self- and cross-view feature correspondence guided by parallax and omnidirectional attention. To learn cross-view feature correspondence, a Selective Parallax Attention Module (SPAM) is proposed to interact with cross-view features under the guidance of parallax attention that adaptively selects receptive fields for different parallax ranges. Furthermore, to handle asymmetrical parallax, we propose a Non-local Omnidirectional Attention Module (NOAM) to learn the non-local correlation of both self- and cross-view contexts, which guides the aggregation of global contextual features. Finally, we propose an Attention-guided Correspondence Learning Restoration Network (ACLRNet) upon SPAMs and NOAMs to restore stereo images by associating the features of two views based on the learned correspondence. Extensive experiments on five benchmark datasets demonstrate the effectiveness and generalization of the proposed method on three stereo image restoration tasks including super-resolution, denoising, and compression artifact reduction.

12.
IEEE Trans Image Process ; 33: 1059-1069, 2024.
Article in English | MEDLINE | ID: mdl-38265894

ABSTRACT

This paper presents a novel fine-grained task for traffic accident analysis. Accident detection in surveillance or dashcam videos is a common task in video-based traffic accident analysis. However, common accident detection does not analyze the specific particulars of an accident; it only identifies the accident's existence or occurrence time in a video. In this paper, we define a novel fine-grained accident detection task which comprises fine-grained accident classification, temporal-spatial occurrence region localization, and accident severity estimation. A transformer-based framework combining the RGB and optical-flow information of videos is proposed for fine-grained accident detection. Additionally, we introduce a challenging Fine-grained Accident Detection (FAD) database that covers multiple tasks in surveillance videos and places more emphasis on the overall perspective. Experimental results demonstrate that our model can effectively extract video features for multiple tasks, indicating that current traffic accident analysis has limitations in dealing with the FAD task and that further research is indeed needed.

13.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 3509-3521, 2024 May.
Article in English | MEDLINE | ID: mdl-38090835

ABSTRACT

There are two mainstream approaches for object detection: top-down and bottom-up. The state-of-the-art approaches are mainly top-down methods. In this paper, we demonstrate that bottom-up approaches show competitive performance compared with top-down approaches and have higher recall rates. Our approach, named CenterNet, detects each object as a triplet of keypoints (top-left and bottom-right corners and the center keypoint). We first group the corners according to some designed cues and confirm the object locations based on the center keypoints. The corner keypoints allow the approach to detect objects of various scales and shapes and the center keypoint reduces the confusion introduced by a large number of false-positive proposals. Our approach is an anchor-free detector because it does not need to define explicit anchor boxes. We adapt our approach to backbones with different structures, including 'hourglass'-like networks and 'pyramid'-like networks, which detect objects in single-resolution and multi-resolution feature maps, respectively. On the MS-COCO dataset, CenterNet with Res2Net-101 and Swin-Transformer achieve average precisions (APs) of 53.7% and 57.1%, respectively, outperforming all existing bottom-up detectors and achieving state-of-the-art performance. We also design a real-time CenterNet model, which achieves a good trade-off between accuracy and speed, with an AP of 43.6% at 30.5 frames per second (FPS).
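A minimal sketch of the center-keypoint verification described above: a corner pair is kept only if a detected center keypoint falls in the central region of the box it defines. The central-region fraction and coordinates are illustrative, and the real grouping also uses corner embeddings and scale-dependent regions.

```python
import numpy as np

def center_check(top_left, bottom_right, center_points, central_frac=1/3):
    """Keep a corner pair only if some detected center keypoint falls inside
    the central region of the box the pair defines (the keypoint-triplet idea)."""
    x1, y1 = top_left
    x2, y2 = bottom_right
    w, h = x2 - x1, y2 - y1
    cx1 = x1 + w * (1 - central_frac) / 2
    cy1 = y1 + h * (1 - central_frac) / 2
    cx2, cy2 = cx1 + w * central_frac, cy1 + h * central_frac
    return any(cx1 <= x <= cx2 and cy1 <= y <= cy2 for x, y in center_points)

centers = [(50.0, 52.0)]
print(center_check((10, 10), (90, 90), centers))   # True: a center keypoint supports the box
print(center_check((10, 10), (30, 30), centers))   # False: the proposal is rejected
```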

14.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 957-974, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37878433

ABSTRACT

To improve user experience, recommender systems have been widely used on many online platforms. In these systems, recommendation models are typically learned from positive/negative feedback that is collected automatically. Notably, recommender systems differ somewhat from general supervised learning tasks: there are factors (e.g., previous recommendation models or the operation strategies of an online platform) that determine which items can be exposed to each individual user. Normally, the previous exposure results are not only relevant to the instances' features (i.e., user or item), but also affect their feedback ratings, thus leading to confounding bias in the recommendation models. To mitigate this bias, researchers have already provided a variety of strategies. However, two issues remain underappreciated: 1) previous debiased RS approaches cannot effectively capture recommendation-specific, exposure-specific and their common knowledge simultaneously; 2) the true exposure results of the user-item pairs are partially inaccessible, so noise would be introduced if we approximated them by their observability, as existing approaches do. Motivated by this, we develop a novel debiasing recommendation approach. More specifically, we first propose a mutual information-based counterfactual learning framework based on the causal relationship among the instance features, exposure status, and ratings. This framework can 1) capture recommendation-specific, exposure-specific and their common knowledge by explicitly modeling the relationship among the causal factors, and 2) achieve robustness towards partially inaccessible exposure results via a pairwise learning strategy. Under such a framework, we implement an optimizable loss function with theoretical analysis. By minimizing this loss, we expect to obtain an unbiased recommendation model that reflects the users' real interests. Meanwhile, we also prove that our loss function is robust to the partial inaccessibility of the exposure status. Finally, extensive experiments on public datasets manifest the superiority of our proposed method in boosting the recommendation performance.

15.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 1049-1064, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37878438

ABSTRACT

Video captioning aims to generate natural language descriptions for a given video clip. Existing methods mainly focus on end-to-end representation learning via word-by-word comparison between predicted captions and ground-truth texts. Although significant progress has been made, such supervised approaches neglect semantic alignment between visual and linguistic entities, which may negatively affect the generated captions. In this work, we propose a hierarchical modular network to bridge video representations and linguistic semantics at four granularities before generating captions: entity, verb, predicate, and sentence. Each level is implemented by one module to embed corresponding semantics into video representations. Additionally, we present a reinforcement learning module based on the scene graph of captions to better measure sentence similarity. Extensive experimental results show that the proposed method performs favorably against the state-of-the-art models on three widely-used benchmark datasets, including the Microsoft Research Video Description Corpus (MSVD), MSR-Video to Text (MSR-VTT), and Video-and-TExt (VATEX).

16.
Med Eng Phys ; 122: 104073, 2023 12.
Article in English | MEDLINE | ID: mdl-38092490

ABSTRACT

OBJECTIVE: The ambulatory arterial stiffness index (AASI) is an index of arterial stiffness. This work aims to explore the mathematical relationship between AASI and the mean value of pulse pressure (mean PP), and to reveal the importance of mean PP in AASI estimation. A well-performing AASI estimation model is also presented. METHODS: To estimate AASI, the electrocardiogram (ECG) signal, photoplethysmogram (PPG) signal and arterial blood pressure (ABP) are used as sources, and features are extracted from these three signals. Meanwhile, curve fitting and regression models are used to describe the relationship between AASI and mean PP. RESULTS: Among the three curves fitted to AASI and mean PP, the cubic polynomial performs best. Introducing mean PP as a feature in AASI estimation reduced LR's MAE from 0.0556 to 0.0372, SVMR's MAE from 0.0413 to 0.0343, and RFR's MAE from 0.0386 to 0.0256. All three estimation models improve considerably, especially the previously worst-performing linear regression. SIGNIFICANCE: This work presents the mathematical association between AASI and mean PP. AASI estimation with regression models can be significantly improved by including mean PP as a key feature, which is not only meaningful for exploring the connection between vascular elasticity and pulse pressure, but also important for the early diagnosis of cardiovascular arteriosclerosis and atherosclerosis.
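The effect of adding mean PP as a feature can be illustrated with synthetic data and an off-the-shelf regressor; the data, coefficients, and model below are toy assumptions, not the study's signals or models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
mean_pp = rng.normal(50, 10, n)                      # synthetic mean pulse pressure
other = rng.normal(size=(n, 4))                      # other toy ECG/PPG/ABP-derived features
aasi = 0.004 * mean_pp + 0.02 * other[:, 0] + rng.normal(0, 0.02, n)  # toy target

def mae_with(features):
    """Fit a regressor on the given feature set and report held-out MAE."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, aasi, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    return mean_absolute_error(y_te, model.predict(X_te))

print("MAE without mean PP:", mae_with(other))
print("MAE with mean PP:   ", mae_with(np.column_stack([other, mean_pp])))
```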


Subject(s)
Vascular Stiffness , Blood Pressure/physiology , Linear Models , Elasticity
17.
Micromachines (Basel) ; 14(12), 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38138413

ABSTRACT

A low-power SAR ADC with a capacitor-splitting, energy-efficient switching scheme is proposed for wearable biosensor applications. Based on capacitor-splitting, an additional reference voltage Vcm, and common-mode techniques, the proposed switching scheme achieves 93.76% less switching energy than the conventional scheme, with a common-mode voltage shift of one LSB. With this switching scheme, the proposed SAR ADC lowers the dependency on the accuracy of Vcm and reduces the complexity of the digital control logic and DAC driver circuits. Furthermore, the SAR ADC employs low-noise, low-power dynamic comparators with multi-clock control, low-sampling-error sampling switches based on the bootstrap technique, and dynamic SAR logic. Simulation results demonstrate that the ADC achieves a 61.77 dB SNDR and a 78.06 dB SFDR and consumes 4.45 µW in a 180 nm process with a 1 V power supply, a full-swing input signal frequency of 93.33 kHz, and a sampling rate of 200 kS/s.

18.
Article in English | MEDLINE | ID: mdl-38032778

ABSTRACT

Multilabel image recognition (MLR) aims to annotate an image with comprehensive labels and suffers from object occlusion or small object sizes within images. Although the existing works attempt to capture and exploit label correlations to tackle these issues, they predominantly rely on global statistical label correlations as prior knowledge for guiding label prediction, neglecting the unique label correlations present within each image. To overcome this limitation, we propose a semantic and correlation disentangled graph convolution (SCD-GC) method, which builds an image-specific graph and employs graph propagation to reason about the labels effectively. Specifically, we introduce a semantic disentangling module to extract category-wise semantic features as graph nodes and develop a correlation disentangling module to extract image-specific label correlations as graph edges. Performing graph convolutions on this image-specific graph allows for better mining of difficult labels with weak visual representations. Visualization experiments reveal that our approach successfully disentangles the dominant label correlations existing within the input image. Through extensive experimentation, we demonstrate that our method achieves superior results on the challenging Microsoft COCO (MS-COCO), PASCAL visual object classes (PASCAL-VOC), NUS web image dataset (NUS-WIDE), and Visual Genome 500 (VG-500) datasets. Code is available at GitHub: https://github.com/caigitrepo/SCDGC.
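A single generic graph-convolution step over an image-specific label graph, of the kind described above, might look like the sketch below; the normalization and shapes are standard GCN choices rather than the SCD-GC modules themselves.

```python
import torch

def graph_propagate(node_feats: torch.Tensor, adj: torch.Tensor,
                    weight: torch.Tensor) -> torch.Tensor:
    """One graph-convolution step over an image-specific label graph.

    node_feats: (C, D) per-category semantic features (graph nodes),
    adj:        (C, C) image-specific label-correlation matrix (graph edges),
    weight:     (D, D_out) learnable projection.
    """
    # Symmetrically normalize the adjacency with self-loops included.
    a_hat = adj + torch.eye(adj.size(0))
    d_inv_sqrt = torch.diag(a_hat.sum(1).clamp(min=1e-6).pow(-0.5))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return torch.relu(a_norm @ node_feats @ weight)

C, D, D_out = 5, 32, 32
out = graph_propagate(torch.randn(C, D), torch.rand(C, C), torch.randn(D, D_out))
print(out.shape)   # torch.Size([5, 32])
```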

19.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7668-7685, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37819793

ABSTRACT

Nowadays, machine learning (ML) and deep learning (DL) methods have become fundamental building blocks for a wide range of AI applications. The popularity of these methods also leaves them widely exposed to malicious attacks, which may cause severe security concerns. To understand the security properties of ML/DL methods, researchers have recently started to focus on adversarial attack algorithms that could successfully corrupt the model or the clean data owned by the victim with imperceptible perturbations. In this paper, we study the Label Flipping Attack (LFA) problem, where the attacker expects to corrupt an ML/DL model's performance by flipping a small fraction of the labels in the training data. Prior art along this direction formulates combinatorial optimization problems, which limits scalability to deep learning models. To this end, we propose a novel minimax problem which provides an efficient reformulation of the sample selection process in LFA. In the new optimization problem, the sample selection operation can be implemented with a single thresholding parameter. This leads to a novel training algorithm called Sample Thresholding. Since the objective function is differentiable and the model complexity does not depend on the sample size, we can apply Sample Thresholding to attack deep learning models. Moreover, since the victim's behavior is not predictable in a poisoning attack setting, we have to employ surrogate models to simulate the true model employed by the victim. Given this, we provide a theoretical analysis of such a surrogate paradigm. Specifically, we show that the performance gap between the true model employed by the victim and the surrogate model is small under mild conditions. On top of this paradigm, we extend Sample Thresholding to the crowdsourced ranking task, where labels collected from annotators are vulnerable to adversarial attacks. Finally, experimental analyses on three real-world datasets speak to the efficacy of our method.
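A toy sketch of the single-threshold sample-selection idea behind Sample Thresholding; the margin criterion, threshold, and flip budget are illustrative assumptions, not the paper's minimax formulation.

```python
import numpy as np

def sample_thresholding_flips(margins: np.ndarray, threshold: float,
                              budget: int) -> np.ndarray:
    """Select training samples to label-flip with a single thresholding rule:
    candidates are samples whose margin under a surrogate model exceeds the
    threshold, and at most `budget` of the most confident ones are flipped."""
    candidates = np.where(margins > threshold)[0]
    order = candidates[np.argsort(-margins[candidates])]
    return order[:budget]

rng = np.random.default_rng(0)
margins = rng.normal(0.0, 1.0, size=200)      # surrogate-model margins on training data
flip_idx = sample_thresholding_flips(margins, threshold=1.0, budget=10)
print(len(flip_idx), flip_idx[:5])
```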

20.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15345-15363, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37751347

ABSTRACT

Positive-Unlabeled (PU) data arise frequently in a wide range of fields such as medical diagnosis, anomaly analysis and personalized advertising. The absence of any known negative labels makes it very challenging to learn binary classifiers from such data. Many state-of-the-art methods reformulate the original classification risk with individual risks over positive and unlabeled data, and explicitly minimize the risk of classifying unlabeled data as negative. This, however, usually leads to classifiers with a bias toward negative predictions, i.e., they tend to recognize most unlabeled data as negative. In this paper, we propose a label distribution alignment formulation for PU learning to alleviate this issue. Specifically, we align the distribution of predicted labels with the ground-truth, which is constant for a given class prior. In this way, the proportion of samples predicted as negative is explicitly controlled from a global perspective, and thus the bias toward negative predictions could be intrinsically eliminated. On top of this, we further introduce the idea of functional margins to enhance the model's discriminability, and derive a margin-based learning framework named Positive-Unlabeled learning with Label Distribution Alignment (PULDA). This framework is also combined with the class prior estimation process for practical scenarios, and theoretically supported by a generalization analysis. Moreover, a stochastic mini-batch optimization algorithm based on the exponential moving average strategy is tailored for this problem with a convergence guarantee. Finally, comprehensive empirical results demonstrate the effectiveness of the proposed method.
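One simple way to express a label distribution alignment term, in the spirit described above, is to penalize the gap between the predicted-positive rate and the known class prior; the squared penalty below is an illustrative assumption, not PULDA's margin-based objective.

```python
import torch

def distribution_alignment_penalty(scores: torch.Tensor, class_prior: float) -> torch.Tensor:
    """Penalize the gap between the proportion of samples predicted positive
    and the known class prior, so the classifier is not biased toward negatives.

    scores: (B,) raw classifier outputs on a mixed positive/unlabeled batch.
    """
    predicted_positive_rate = torch.sigmoid(scores).mean()
    return (predicted_positive_rate - class_prior) ** 2

scores = torch.randn(256)
print(distribution_alignment_penalty(scores, class_prior=0.3))
```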
