Results 1 - 20 of 72
1.
Article in English | MEDLINE | ID: mdl-38833389

ABSTRACT

Weakly supervised object localization (WSOL) is a pivotal task in computer vision that entails localizing objects using only image-level labels. Contemporary WSOL approaches leverage foreground prediction maps (FPMs) and yield commendable results. However, existing FPM-based techniques are predominantly confined to rudimentary strategies that either augment the foreground or diminish the background presence. We argue for exploring and exploiting the intricate interplay between an object's foreground and its background to achieve efficient object localization. In this paper, we introduce an innovative framework, termed adaptive zone learning (AZL), which operates in a coarse-to-fine manner to refine FPMs through three adaptive zone mechanisms. First, an adversarial learning mechanism (ALM) orchestrates an interplay between the foreground and background regions, accentuating coarse-grained object regions in a mutually adversarial manner. Subsequently, an oriented learning mechanism (OLM) harnesses local cues from both foreground and background in a fine-grained manner, delineating object regions with greater granularity and thereby generating better FPMs. Furthermore, we propose a reinforced learning mechanism (RLM) as a compensatory mechanism for the adversarial design, by which undesirable foreground maps are refined again. Extensive experiments on the CUB-200-2011 and ILSVRC datasets demonstrate that AZL achieves significant and consistent performance improvements over other state-of-the-art WSOL methods.
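The abstract leaves the three mechanisms at a high level; as a rough illustration of an adversarial foreground/background interplay on a foreground prediction map, here is a minimal PyTorch sketch. The module structure, the uniform-background objective, and the 0.1 loss weight are assumptions for illustration, not AZL's actual design.

```python
# Minimal sketch (assumed, not the paper's code): an FPM head whose foreground-masked
# features should classify the image correctly, while background-masked features are
# pushed toward an uninformative (uniform) prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPMHead(nn.Module):
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.fpm = nn.Conv2d(in_ch, 1, kernel_size=1)             # foreground prediction map
        self.cls = nn.Conv2d(in_ch, num_classes, kernel_size=1)

    def forward(self, feats, labels):
        m = torch.sigmoid(self.fpm(feats))                        # B x 1 x H x W, in [0, 1]
        fg_logits = self.cls(feats * m).mean(dim=(2, 3))          # classify foreground region
        bg_logits = self.cls(feats * (1.0 - m)).mean(dim=(2, 3))  # classify background region
        loss_fg = F.cross_entropy(fg_logits, labels)
        # adversarial-style term: the background should carry no class evidence,
        # i.e. its posterior is driven toward uniform
        loss_bg = -F.log_softmax(bg_logits, dim=1).mean()
        return m, loss_fg + 0.1 * loss_bg                         # 0.1 is an assumed weight
```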

2.
Article in English | MEDLINE | ID: mdl-38593014

ABSTRACT

Visible-infrared person re-identification (VI-ReID) is the task of matching the same individuals across the visible and infrared modalities. Its main challenge lies in the modality gap caused by the cameras operating on different spectra. Existing VI-ReID methods mainly focus on learning general features across modalities, often at the expense of feature discriminability. To address this issue, we present a novel cycle-construction-based network for neutral yet discriminative feature learning, termed CycleTrans. Specifically, CycleTrans uses a lightweight knowledge capturing module (KCM) to capture rich semantics from the modality-relevant feature maps according to pseudo anchors. Afterward, a discrepancy modeling module (DMM) is deployed to transform these features into neutral ones according to the modality-irrelevant prototypes. To ensure feature discriminability, another two KCMs are further deployed for feature cycle construction. With cycle construction, our method can learn effective neutral features for visible and infrared images while preserving their salient semantics. Extensive experiments on the SYSU-MM01 and RegDB datasets validate the merits of CycleTrans against a range of state-of-the-art (SOTA) methods, with gains of +1.88% rank-1 on SYSU-MM01 and +1.1% rank-1 on RegDB. Our code is available at https://github.com/DoubtedSteam/CycleTrans.
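As a sketch of how a knowledge capturing module might gather semantics according to pseudo anchors, the snippet below realizes it as multi-head attention from a set of learnable anchor vectors to the flattened feature map. The anchor count, head count, and attention formulation are assumptions, not the released CycleTrans implementation.

```python
# Minimal sketch (assumed): a knowledge capturing module as attention from learnable
# pseudo anchors (queries) to flattened feature-map tokens (keys/values).
import torch
import torch.nn as nn

class KCM(nn.Module):
    def __init__(self, dim, num_anchors=16):
        super().__init__()
        # dim must be divisible by the head count
        self.anchors = nn.Parameter(torch.randn(num_anchors, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, feat_map):                        # feat_map: B x C x H x W
        b, c, h, w = feat_map.shape
        tokens = feat_map.flatten(2).transpose(1, 2)    # B x HW x C
        q = self.anchors.unsqueeze(0).expand(b, -1, -1) # B x num_anchors x C
        captured, _ = self.attn(q, tokens, tokens)      # semantics gathered per anchor
        return captured
```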

3.
IEEE Trans Image Process ; 33: 2158-2170, 2024.
Article in English | MEDLINE | ID: mdl-38470575

ABSTRACT

Depth information opens up new opportunities for video object segmentation (VOS) to be more accurate and robust in complex scenes. However, the RGBD VOS task is largely unexplored due to the expensive collection of RGBD data and time-consuming annotation of segmentation. In this work, we first introduce a new benchmark for RGBD VOS, named DepthVOS, which contains 350 videos (over 55k frames in total) annotated with masks and bounding boxes. We further propose a novel, strong baseline model, the Fused Color-Depth Network (FusedCDNet), which can be trained solely under the supervision of bounding boxes and used to generate masks guided only by a bounding box in the first frame. The model thereby possesses three major advantages: a weakly-supervised training strategy to overcome the high-cost annotation, a cross-modal fusion module to handle complex scenes, and weakly-supervised inference to promote ease of use. Extensive experiments demonstrate that our proposed method performs on par with top fully-supervised algorithms. We will open-source our project at https://github.com/yjybuaa/depthvos/ to facilitate the development of RGBD VOS.
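The cross-modal fusion module is only named in the abstract; below is a minimal, assumed sketch of one plausible form, a gated fusion of RGB and depth feature maps in PyTorch. The gating design and channel layout are illustrative assumptions, not the FusedCDNet architecture.

```python
# Assumed sketch of a cross-modal fusion block: a per-pixel gate decides how much to
# trust the RGB versus the depth features, with a residual projection of both.
import torch
import torch.nn as nn

class ColorDepthFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())
        self.proj = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, rgb_feat, depth_feat):            # both: B x C x H x W
        x = torch.cat([rgb_feat, depth_feat], dim=1)
        g = self.gate(x)                                 # per-pixel modality weighting
        fused = g * rgb_feat + (1.0 - g) * depth_feat
        return fused + self.proj(x)                      # residual projection of the concat
```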

4.
Article in English | MEDLINE | ID: mdl-38502629

ABSTRACT

PSNR-oriented models are a critical class of super-resolution models with applications across various fields. However, these models tend to generate over-smoothed images, a problem that has been analyzed previously from the perspectives of models or loss functions, but without taking into account the impact of data properties. In this paper, we present a novel phenomenon that we term the center-oriented optimization (COO) problem, where a model's output converges towards the center point of similar high-resolution images, rather than towards the ground truth. We demonstrate that the strength of this problem is related to the uncertainty of the data, which we quantify using entropy. We prove that as the entropy of high-resolution images increases, their center point will move further away from the clean image distribution, and the model will generate over-smoothed images. Perceptual-driven approaches, such as perceptual loss, model structure optimization, or GAN-based methods, can be viewed as implicitly optimizing the COO problem. We propose an explicit solution to the COO problem, called Detail Enhanced Contrastive Loss (DECLoss). DECLoss utilizes the clustering property of contrastive learning to directly reduce the variance of the potential high-resolution distribution and thereby decrease the entropy. We evaluate DECLoss on multiple super-resolution benchmarks and demonstrate that it improves the perceptual quality of PSNR-oriented models. Moreover, when applied to GAN-based methods such as RaGAN, DECLoss helps to achieve state-of-the-art performance, e.g., 0.093 LPIPS with 24.51 PSNR on 4× downsampled Urban100, validating the effectiveness and generalization of our approach.
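DECLoss itself is not specified here beyond its use of contrastive clustering; the snippet below is an illustrative InfoNCE-style patch contrastive loss in that spirit (an assumption, not the paper's exact formulation), where each super-resolved patch treats its own high-resolution patch as the positive and the other patches in the batch as negatives.

```python
# Illustrative InfoNCE-style patch contrastive loss (assumed, in the spirit of DECLoss):
# SR patch embeddings are pulled toward their own HR patch embeddings and pushed away
# from other patches, tightening the predicted distribution around the ground truth.
import torch
import torch.nn.functional as F

def patch_contrastive_loss(sr_patches, hr_patches, temperature=0.1):
    # sr_patches, hr_patches: N x D encoded patches; row i of each forms a positive pair
    sr = F.normalize(sr_patches, dim=1)
    hr = F.normalize(hr_patches, dim=1)
    logits = sr @ hr.t() / temperature            # N x N similarity matrix
    targets = torch.arange(sr.size(0), device=sr.device)
    return F.cross_entropy(logits, targets)       # diagonal entries are the positives
```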

5.
Article in English | MEDLINE | ID: mdl-38502633

ABSTRACT

Transformers have shown remarkable performance; however, their architecture design is a time-consuming process that demands expertise and trial-and-error. Thus, it is worthwhile to investigate efficient methods for automatically searching high-performance Transformers via Transformer Architecture Search (TAS). In order to improve search efficiency, training-free proxy based methods have been widely adopted in Neural Architecture Search (NAS). However, these proxies have been found to be inadequate in generalizing well to Transformer search spaces, as confirmed by several studies and our own experiments. This paper presents an effective scheme for TAS called TRansformer Architecture search with ZerO-cost pRoxy guided evolution (T-Razor) that achieves exceptional efficiency. First, through theoretical analysis, we discover that the synaptic diversity of multi-head self-attention (MSA) and the saliency of multi-layer perceptron (MLP) are correlated with the performance of the corresponding Transformers. The properties of synaptic diversity and synaptic saliency motivate us to introduce the ranks of synaptic diversity and saliency, denoted as DSS++, for evaluating and ranking Transformers. DSS++ incorporates correlation information among sampled Transformers to provide unified scores for both synaptic diversity and synaptic saliency. We then propose a block-wise evolution search guided by DSS++ to find optimal Transformers. DSS++ determines the positions for mutation and crossover, enhancing the exploration ability. Experimental results demonstrate that our T-Razor performs competitively against state-of-the-art manually or automatically designed Transformer architectures across four popular Transformer search spaces. Notably, T-Razor improves the searching efficiency across different Transformer search spaces, e.g., reducing the required GPU days from more than 24 to less than 0.4 and outperforming existing zero-cost approaches. We also apply T-Razor to the BERT search space and find that the searched Transformers achieve competitive GLUE results on several Natural Language Processing (NLP) datasets. This work provides insights into training-free TAS, revealing the usefulness of evaluating Transformers based on the properties of their different blocks.
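As an illustration of the kind of zero-cost proxy discussed here, the sketch below computes a synaptic-saliency-style score (the sum of |w · ∂L/∂w| over parameters from one mini-batch), in the spirit of SNIP/SynFlow-type indicators rather than the exact DSS++ metric.

```python
# Generic synaptic-saliency-style zero-cost proxy (an assumption modeled on SNIP/SynFlow
# scores, not DSS++): one backward pass on a mini-batch, then sum |w * grad| over weights.
# Higher scores rank a candidate architecture higher without any training.
import torch

def synaptic_saliency_score(model, inputs, targets, loss_fn):
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    score = 0.0
    for p in model.parameters():
        if p.grad is not None:
            score += (p.detach() * p.grad.detach()).abs().sum().item()
    return score
```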

6.
BMC Med ; 22(1): 62, 2024 02 08.
Article in English | MEDLINE | ID: mdl-38331793

ABSTRACT

BACKGROUND: The distal transradial access (dTRA) has become an attractive alternative to the conventional transradial access (TRA) for cardiovascular interventional diagnosis and/or treatment. Randomized clinical trials evaluating the effect of dTRA on long-term radial artery occlusion (RAO) have been lacking. METHODS: This was a prospective, randomized controlled study. The primary endpoint was the incidence of long-term RAO at 3 months after discharge. The secondary endpoints included the successful puncture rate, puncture time, and other access-related complications. RESULTS: The incidence of long-term RAO was 0.8% (3/361) for dTRA and 3.3% (12/365) for TRA (risk ratio = 0.25, 95% confidence interval = 0.07-0.88, P = 0.02). The incidence of RAO at 24 h was significantly lower in the dTRA group than in the TRA group (2.5% vs. 6.7%, P < 0.01). The puncture success rate (96.0% vs. 98.5%, P = 0.03) and the rate of success with a single puncture attempt (70.9% vs. 83.9%, P < 0.01) were significantly lower in the dTRA group than in the TRA group, and the number of puncture attempts and puncture time were higher in the dTRA group. The dTRA group had a lower incidence of bleeding than the TRA group (1.5% vs. 6.0%, P < 0.01). There was no difference in the success rate of the procedure, total fluoroscopy time, or incidence of other access-related complications between the two groups. In the per-protocol analysis, the incidence of mEASY type ≥ II haematoma was significantly lower in the dTRA group, which was consistent with the as-treated analysis. CONCLUSIONS: The dTRA significantly reduced the incidence of long-term RAO, bleeding, and haematoma. TRIAL REGISTRATION: ClinicalTrials.gov identifier: NCT05253820.
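The reported effect size can be sanity-checked from the raw counts given above (3/361 vs. 12/365) with the standard log risk-ratio normal approximation; a small Python check follows. The paper's exact CI method is not stated, so a slight rounding difference in the upper bound is expected.

```python
# Reproducing the reported effect size from the given counts (3/361 dTRA vs. 12/365 TRA)
# using the standard log risk-ratio normal approximation for the 95% CI.
import math

a, n1 = 3, 361     # long-term RAO events / patients, dTRA
b, n2 = 12, 365    # long-term RAO events / patients, TRA

rr = (a / n1) / (b / n2)
se = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)
lo = math.exp(math.log(rr) - 1.96 * se)
hi = math.exp(math.log(rr) + 1.96 * se)
print(f"RR = {rr:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
# prints RR = 0.25, 95% CI = 0.07-0.89 (the paper reports 0.07-0.88)
```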


Subject(s)
Arterial Occlusive Diseases, Percutaneous Coronary Intervention, Humans, Radial Artery/surgery, Prospective Studies, Arterial Occlusive Diseases/diagnostic imaging, Arterial Occlusive Diseases/epidemiology, Hemorrhage, Hematoma/etiology, Hematoma/complications, Coronary Angiography/adverse effects, Coronary Angiography/methods, Percutaneous Coronary Intervention/adverse effects, Percutaneous Coronary Intervention/methods, Treatment Outcome
7.
Article in English | MEDLINE | ID: mdl-37934637

ABSTRACT

Unsupervised domain adaptation (UDA) person reidentification (Re-ID) aims to identify pedestrian images within an unlabeled target domain with an auxiliary labeled source-domain dataset. Many existing works attempt to recover reliable identity information by considering multiple homogeneous networks and then use the generated labels to train the model in the target domain. However, these homogeneous networks identify people in approximate subspaces and equally exchange their knowledge with others or their mean net to improve their ability, which inevitably limits the scope of available knowledge and leads them to make the same mistakes. This article proposes a dual-level asymmetric mutual learning (DAML) method to learn discriminative representations from a broader knowledge scope with diverse embedding spaces. Specifically, two heterogeneous networks mutually learn knowledge from asymmetric subspaces through pseudo label generation in a hard distillation manner. The knowledge transfer between the two networks is based on an asymmetric mutual learning (AML) manner. The teacher network learns to identify both the target and source domains while adapting to the target domain distribution based on the knowledge of the student. Meanwhile, the student network is trained on the target dataset and employs the ground-truth labels through the knowledge of the teacher. Extensive experiments on the Market-1501, CUHK-SYSU, and MSMT17 public datasets verify the superiority of DAML over state-of-the-art (SOTA) methods.
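As a rough sketch of clustering-based pseudo-label generation with hard distillation (an assumed simplification of the pipeline described above, not the paper's code), the snippet below clusters teacher embeddings of the target set into identities and trains the student with hard cross-entropy on those labels.

```python
# Assumed simplification (not the DAML code): teacher features over the unlabeled target
# set are clustered into pseudo identities; the student is then supervised with these
# hard labels ("hard distillation") via cross-entropy.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

@torch.no_grad()
def make_pseudo_labels(teacher, target_loader, num_ids=500, device="cuda"):
    # target_loader is assumed to yield batches of unlabeled target images
    feats = [teacher(images.to(device)).cpu() for images in target_loader]
    feats = torch.cat(feats).numpy()
    return KMeans(n_clusters=num_ids, n_init=10).fit_predict(feats)  # one pseudo id per image

def student_step(student, images, batch_pseudo_labels, device="cuda"):
    # batch_pseudo_labels: the pseudo-label entries corresponding to this batch
    logits = student(images.to(device))                 # B x num_ids classifier output
    labels = torch.as_tensor(batch_pseudo_labels, dtype=torch.long, device=device)
    return F.cross_entropy(logits, labels)              # hard pseudo-label supervision
```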

8.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14990-15004, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37669203

ABSTRACT

Network pruning is an effective approach to reducing network complexity with an acceptable performance compromise. Existing studies achieve the sparsity of neural networks via time-consuming weight training or complex searching on networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing, sparse sub-networks that require no weight training, termed "lottery jackpots", exist in pre-trained models with unexpanded width. Our presented lottery jackpots are traceable through empirical and theoretical outcomes. For example, we obtain a lottery jackpot that has only 10% of the parameters and still reaches the performance of the original dense VGGNet-19 on CIFAR-10 without any modification of the pre-trained weights. Furthermore, we improve the efficiency of searching for lottery jackpots from two perspectives. First, we observe that the sparse masks derived from many existing pruning criteria have a high overlap with the searched mask of our lottery jackpot; among them, magnitude-based pruning results in the mask most similar to ours. In line with this insight, we initialize our sparse mask using magnitude-based pruning, resulting in at least a 3× cost reduction for the lottery jackpot search while achieving comparable or even better performance. Second, we conduct an in-depth analysis of the searching process for lottery jackpots. Our theoretical result suggests that the decrease in training loss during weight searching can be disturbed by the dependency between weights in modern networks. To mitigate this, we propose a novel short restriction method to restrict changes of masks that may have potential negative impacts on the training loss, which leads to faster convergence and reduced oscillation when searching for lottery jackpots. Consequently, our searched lottery jackpot removes 90% of the weights in ResNet-50 while easily obtaining more than 70% top-1 accuracy using only 5 searching epochs on ImageNet.
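The magnitude-based mask initialization mentioned above can be sketched in a few lines of PyTorch; this is a generic top-k-by-magnitude mask per layer (the keep ratio and the parameter filter are assumptions), not the authors' search code.

```python
# Generic magnitude-based mask initialization: per weight tensor, keep the top-k entries
# by absolute value (e.g. 10% as in the VGGNet-19 example) as the starting binary mask.
import torch

def magnitude_masks(model, keep_ratio=0.10):
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:                      # skip biases / norm parameters (assumption)
            continue
        k = max(1, int(keep_ratio * p.numel()))
        thresh = p.detach().abs().flatten().kthvalue(p.numel() - k + 1).values
        masks[name] = (p.detach().abs() >= thresh).float()
    return masks                             # apply as p.data *= masks[name]
```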

9.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10478-10487, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37030750

ABSTRACT

The mainstream approach for filter pruning is usually either to force a hard-coded importance estimation upon a computation-heavy pretrained model to select "important" filters, or to impose a hyperparameter-sensitive sparse constraint on the loss objective to regularize the network training. In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), to derive compact CNNs in a computation-economical and regularization-free manner for efficient image classification. Each filter in our DCFF is first given an inter-similarity distribution with a temperature parameter as a filter proxy, on top of which a fresh Kullback-Leibler divergence based dynamic-coded criterion is proposed to evaluate the filter importance. In contrast to simply keeping high-score filters as in other methods, we propose the concept of filter fusion, i.e., weighted averages using the assigned proxies, as our preserved filters. We obtain a one-hot inter-similarity distribution as the temperature parameter approaches infinity. Thus, the relative importance of each filter can vary along with the training of the compact CNN, leading to dynamically changeable fused filters without either depending on the pretrained model or introducing sparse constraints. Extensive experiments on classification benchmarks demonstrate the superiority of our DCFF over the compared counterparts. For example, our DCFF derives a compact VGGNet-16 with only 72.77M FLOPs and 1.06M parameters while reaching a top-1 accuracy of 93.47% on CIFAR-10. A compact ResNet-50 is obtained with 63.8% FLOP and 58.6% parameter reductions, retaining 75.60% top-1 accuracy on ILSVRC-2012. Our code, narrower models and training logs are available at https://github.com/lmbxmu/DCFF.
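A compact sketch of the filter-fusion idea (temperature-controlled inter-similarity distributions as proxies, preserved filters formed as proxy-weighted averages) is given below. The importance score used to select which fused filters to keep is only loosely modeled on the KL-based criterion and is an assumption, as is the overall layout; it is not the official DCFF code.

```python
# Assumed approximation of DCFF-style filter fusion for one convolution layer.
import torch

def fuse_filters(weight, keep=32, temperature=1.0):
    # weight: out_ch x in_ch x k x k convolution kernel
    flat = weight.flatten(1)                              # out_ch x (in_ch*k*k)
    dist = torch.cdist(flat, flat)                        # pairwise Euclidean distances
    proxy = torch.softmax(-dist / temperature, dim=1)     # inter-similarity distribution
    # assumed importance score, loosely following the KL-based criterion:
    # divergence of each filter's proxy distribution from the layer-average distribution
    avg = proxy.mean(dim=0, keepdim=True)
    score = (proxy * (proxy.clamp_min(1e-12) / avg.clamp_min(1e-12)).log()).sum(dim=1)
    idx = score.topk(keep).indices
    fused = proxy[idx] @ flat                             # proxy-weighted averages of all filters
    return fused.view(keep, *weight.shape[1:])
```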

10.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 11108-11119, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37023149

ABSTRACT

A resource-adaptive supernet adjusts its subnets for inference to fit the dynamically available resources. In this paper, we propose prioritized subnet sampling to train a resource-adaptive supernet, termed PSS-Net. We maintain multiple subnet pools, each of which stores the information of substantial subnets with similar resource consumption. Given a resource constraint, subnets conditioned on this constraint are sampled from a pre-defined subnet structure space, and high-quality ones are inserted into the corresponding subnet pool. As training proceeds, sampling gradually shifts toward drawing subnets from the subnet pools. Moreover, when sampling from a subnet pool, subnets with better performance metrics are assigned higher priority for training our PSS-Net. At the end of training, our PSS-Net retains the best subnet in each pool to enable a fast switch to high-quality subnets for inference when the available resources vary. Experiments on ImageNet using MobileNet-V1/V2 and ResNet-50 show that our PSS-Net outperforms state-of-the-art resource-adaptive supernets. Our project is publicly available at https://github.com/chenbong/PSS-Net.
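The sampling policy described above can be outlined as a short Python routine; the pool layout, the warm-up switch, and the metric-weighted sampling are assumed simplifications for illustration, not the released PSS-Net logic.

```python
# Assumed schematic of prioritized subnet sampling: each resource level keeps a pool of
# (subnet_config, metric) entries; training starts with random sampling from the structure
# space and gradually switches to priority-weighted sampling from the pools.
import random

def sample_subnet(pools, resource, step, warmup_steps, random_sampler):
    pool = pools[resource]                      # list of (config, metric) for this budget
    if step < warmup_steps or not pool:
        return random_sampler(resource)         # explore the predefined structure space
    configs, metrics = zip(*pool)               # metrics assumed positive (e.g. accuracy)
    return random.choices(configs, weights=metrics, k=1)[0]  # better subnets sampled more often
```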

11.
IEEE Trans Neural Netw Learn Syst ; 34(1): 134-143, 2023 Jan.
Article in English | MEDLINE | ID: mdl-34197327

ABSTRACT

Referring expression comprehension (REC) is an emerging research topic in computer vision, which refers to the detection of a target region in an image given a text description. Most existing REC methods follow a multistage pipeline, which is computationally expensive and greatly limits the applications of REC. In this article, we propose a one-stage model toward real-time REC, termed real-time global inference network (RealGIN). RealGIN addresses the issues of expression diversity and complexity in REC with two innovative designs: adaptive feature selection (AFS) and Global Attentive ReAsoNing (GARAN). Expression diversity concerns varying expression content, which includes information such as colors, attributes, locations, and fine-grained categories. To address this issue, AFS adaptively fuses features at different semantic levels to handle changes in expression content. In contrast, expression complexity concerns the complex relational conditions in expressions that are used to identify the referent. To this end, GARAN uses the textual feature as a pivot to collect expression-aware visual information from all regions and then diffuses this information back to each region, which provides sufficient context for modeling the relational conditions in expressions. On five benchmark datasets, i.e., RefCOCO, RefCOCO+, RefCOCOg, ReferIT, and Flickr30k, the proposed RealGIN outperforms most existing methods and achieves very competitive performance against the most advanced one, i.e., MAttNet. More importantly, under the same hardware, RealGIN can boost the processing speed by 10-20 times over existing methods.
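The AFS component is described only at a high level; as an assumed sketch, the module below resizes multi-level backbone features to a common resolution and fuses them with softmax weights predicted from their global descriptors. The channel sizes and gating form are illustrative, not RealGIN's implementation.

```python
# Assumed sketch of adaptive feature selection: multi-level features (same channel count)
# are resized to one resolution and combined with learned, input-dependent softmax weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFeatureSelection(nn.Module):
    def __init__(self, channels, num_levels=3):
        super().__init__()
        self.gate = nn.Linear(channels * num_levels, num_levels)

    def forward(self, feats):                              # list of B x C x Hi x Wi tensors
        size = feats[-1].shape[-2:]
        feats = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
                 for f in feats]
        desc = torch.cat([f.mean(dim=(2, 3)) for f in feats], dim=1)   # B x (C*levels)
        w = torch.softmax(self.gate(desc), dim=1)                      # B x levels
        return sum(w[:, i, None, None, None] * f for i, f in enumerate(feats))
```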

12.
IEEE Trans Neural Netw Learn Syst ; 34(11): 8743-8752, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35254994

ABSTRACT

Existing online knowledge distillation approaches either adopt the student with the best performance or construct an ensemble model for better holistic performance. However, the former strategy ignores the other students' information, while the latter increases the computational complexity during deployment. In this article, we propose a novel method for online knowledge distillation, termed feature fusion and self-distillation (FFSD), which comprises two key components, feature fusion and self-distillation, toward solving the above problems in a unified framework. Different from previous works, where all students are treated equally, the proposed FFSD splits them into a leader student set and a common student set. Then, the feature fusion module converts the concatenation of feature maps from all common students into a fused feature map. The fused representation is used to assist the learning of the leader student. To enable the leader student to absorb more diverse information, we design an enhancement strategy to increase the diversity among students. Besides, a self-distillation module is adopted to convert the feature map of deeper layers into a shallower one. Then, the shallower layers are encouraged to mimic the transformed feature maps of the deeper layers, which helps the students generalize better. After training, we simply adopt the leader student, which achieves superior performance over the common students, without increasing the storage or inference cost. Extensive experiments on CIFAR-100 and ImageNet demonstrate the superiority of our FFSD over existing works. The code is available at https://github.com/SJLeo/FFSD.
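A minimal sketch of the feature-fusion step (assumed shapes and a simple 1×1 convolution projection, not the released FFSD module) is shown below, with the leader student encouraged to mimic the fused map via an MSE term.

```python
# Assumed sketch: concatenate common students' feature maps, project them to one fused
# map, and distill the fused map into the leader student's feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    def __init__(self, ch, num_common_students):
        super().__init__()
        self.proj = nn.Conv2d(ch * num_common_students, ch, kernel_size=1)

    def forward(self, common_feats, leader_feat):          # all maps: B x C x H x W
        fused = self.proj(torch.cat(common_feats, dim=1))
        distill = F.mse_loss(leader_feat, fused.detach())  # leader mimics the fusion
        return fused, distill
```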

13.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9139-9148, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35294359

ABSTRACT

This article focuses on filter-level network pruning. A novel pruning method, termed CLR-RNF, is proposed. We first reveal a "long-tail" pruning problem in magnitude-based weight pruning methods and then propose a computation-aware measurement of individual weight importance, followed by a cross-layer ranking (CLR) of weights to identify and remove the bottom-ranked weights. The resulting per-layer sparsity makes up the pruned network structure in our filter pruning. Then, we introduce a recommendation-based filter selection scheme where each filter recommends a group of its closest filters. To pick the preserved filters from these recommended groups, we further devise a k-reciprocal nearest filter (RNF) selection scheme where the selected filters fall into the intersection of these recommended groups. Both our pruned network structure and the filter selection are nonlearning processes, which significantly reduces the pruning complexity and differentiates our method from existing works. We conduct image classification on CIFAR-10 and ImageNet to demonstrate the superiority of our CLR-RNF over state-of-the-art methods. For example, on CIFAR-10, CLR-RNF removes 74.1% of FLOPs and 95.0% of parameters from VGGNet-16 with even a 0.3% accuracy improvement. On ImageNet, it removes 70.2% of FLOPs and 64.8% of parameters from ResNet-50 with only a 1.7% top-five accuracy drop. Our project is available at https://github.com/lmbxmu/CLR-RNF.
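The k-reciprocal selection idea can be illustrated with a short routine: each filter's k nearest neighbors form its recommendation list, and a filter is kept only when the recommendation is mutual. This is an assumed approximation of the scheme named above, not the CLR-RNF implementation.

```python
# Assumed approximation of k-reciprocal nearest filter (RNF) selection for one layer.
import torch

def k_reciprocal_filters(weight, k=5):
    flat = weight.flatten(1)                               # out_ch x (in_ch*k*k)
    dist = torch.cdist(flat, flat)
    knn = dist.topk(k + 1, largest=False).indices[:, 1:]   # k nearest filters, excluding self
    keep = []
    for i in range(flat.size(0)):
        recs = knn[i]
        # reciprocal test: i must also appear in the k-NN lists of the filters it recommends
        if all((knn[j] == i).any() for j in recs.tolist()):
            keep.append(i)
    return keep                                            # indices of preserved filters
```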

14.
IEEE Trans Neural Netw Learn Syst ; 34(10): 7946-7955, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35157600

ABSTRACT

Channel pruning has long been studied as a way to compress convolutional neural networks (CNNs) and significantly reduce the overall computation. Prior works implement channel pruning in an unexplainable manner, which tends to reduce the final classification errors while failing to consider the internal influence of each channel. In this article, we conduct channel pruning in a white box. Through deep visualization of feature maps activated by different channels, we observe that different channels contribute differently to different categories in image classification. Inspired by this, we choose to preserve the channels contributing to most categories. Specifically, to model the contribution of each channel to differentiating categories, we develop a class-wise mask for each channel, implemented in a dynamic training manner with respect to the input image's category. On the basis of the learned class-wise masks, we perform a global voting mechanism to remove channels with less category discrimination. Lastly, a fine-tuning process is conducted to recover the performance of the pruned model. To the best of our knowledge, this is the first time that CNN interpretability theory has been considered to guide channel pruning. Extensive experiments on representative image classification tasks demonstrate the superiority of our White-Box over many state-of-the-art methods (SOTAs). For instance, on CIFAR-10, it reduces floating-point operations (FLOPs) by 65.23% with even a 0.62% accuracy improvement for ResNet-110. On ILSVRC-2012, White-Box achieves a 45.6% FLOP reduction with only a small loss of 0.83% in the top-1 accuracy for ResNet-50. Code is available at https://github.com/zyxxmu/White-Box.
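A toy version of the global voting step is sketched below (the mask layout, vote threshold, and prune ratio are assumptions): each class's learned mask votes for the channels it relies on, and the least-supported channels are removed.

```python
# Assumed sketch of global voting over learned class-wise masks for one layer.
import torch

def vote_and_prune(class_masks, prune_ratio=0.4, vote_threshold=0.5):
    # class_masks: num_classes x num_channels, values in [0, 1]
    votes = (class_masks > vote_threshold).sum(dim=0)       # votes received per channel
    num_prune = int(prune_ratio * class_masks.size(1))
    pruned = votes.argsort()[:num_prune]                    # channels supported by fewest classes
    keep_mask = torch.ones(class_masks.size(1), dtype=torch.bool)
    keep_mask[pruned] = False
    return keep_mask                                        # True = channel is preserved
```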

15.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3181-3199, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35696461

ABSTRACT

Graph neural networks (GNNs) have attracted increasing attention in recent years. However, existing GNN frameworks are deployed on simple graphs, which limits their applications in dealing with the complex correlations of multi-modal/multi-type data in practice. A few hypergraph-based methods have recently been proposed to address multi-modal/multi-type data correlation by directly concatenating the hypergraphs constructed from each individual modality/type, which makes it difficult to learn an adaptive weight for each modality/type. In this paper, we extend the original conference version, HGNN, and introduce a general high-order multi-modal/multi-type data correlation modeling framework called HGNN+ to learn an optimal representation in a single hypergraph-based framework. This is achieved by bridging multi-modal/multi-type data and hyperedges with hyperedge groups. Specifically, in our method, hyperedge groups are first constructed to represent latent high-order correlations in each specific modality/type with explicit or implicit graph structures. An adaptive hyperedge group fusion strategy is then used to effectively fuse the correlations from different modalities/types in a unified hypergraph. After that, a new hypergraph convolution scheme performed in the spatial domain is used to learn a general data representation for various tasks. We have evaluated this framework on several popular datasets and compared it with recent state-of-the-art methods. The comprehensive evaluations indicate that the proposed HGNN+ framework consistently outperforms existing methods by a significant margin, especially when modeling implicit data correlations. We also release a toolbox called THU-DeepHypergraph for the proposed framework, which can be used for various applications, such as data classification, retrieval, and recommendation.
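For reference, the classical HGNN-style hypergraph convolution X' = σ(D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Θ) can be written compactly in PyTorch as below; this is the standard dense formulation, not the released THU-DeepHypergraph code.

```python
# Standard dense hypergraph convolution layer in the HGNN/HGNN+ spirit.
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim)

    def forward(self, x, H, edge_w=None):
        # x: N x in_dim node features, H: N x E incidence matrix, edge_w: E hyperedge weights
        n, e = H.shape
        W = torch.ones(e, device=H.device) if edge_w is None else edge_w
        Dv = (H * W).sum(dim=1).clamp_min(1e-6).pow(-0.5)   # node degrees ^ (-1/2)
        De = H.sum(dim=0).clamp_min(1e-6).reciprocal()      # hyperedge degrees ^ (-1)
        Hn = Dv.unsqueeze(1) * H                            # Dv^{-1/2} H
        A = (Hn * (W * De)) @ Hn.t()                        # N x N propagation matrix
        return torch.relu(A @ self.theta(x))
```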

16.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 2945-2951, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35588416

ABSTRACT

Few-shot class-incremental learning (FSCIL) is challenged by catastrophic forgetting of old classes and over-fitting to new classes. As revealed by our analyses, these problems are caused by feature distribution crumbling, which leads to class confusion when continuously embedding few samples into a fixed feature space. In this study, we propose a Dynamic Support Network (DSN), an adaptively updating network with compressive node expansion to "support" the feature space. In each training session, DSN tentatively expands network nodes to enlarge the feature representation capacity for incremental classes. It then dynamically compresses the expanded network by node self-activation to pursue compact feature representation, which alleviates over-fitting. Simultaneously, DSN selectively recalls old class distributions during incremental learning to support feature distributions and avoid confusion between classes. DSN with compressive node expansion and class distribution recalling provides a systematic solution to the problems of catastrophic forgetting and overfitting. Experiments on the CUB, CIFAR-100, and miniImageNet datasets show that DSN significantly improves upon the baseline approach, achieving new state-of-the-art results.

17.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6277-6288, 2023 May.
Article in English | MEDLINE | ID: mdl-36215372

ABSTRACT

Binary neural networks (BNNs) have attracted broad research interest due to their efficient storage and computational ability. Nevertheless, a significant challenge of BNNs lies in handling discrete constraints while ensuring bit entropy maximization, which typically makes their weight optimization very difficult. Existing methods relax the learning using the sign function, which simply encodes positive weights into +1s, and -1s otherwise. Alternatively, we formulate an angle alignment objective that constrains the weight binarization to {0,+1} to solve this challenge. In this article, we show that our weight binarization provides an analytical solution by encoding high-magnitude weights into +1s, and 0s otherwise. Therefore, a high-quality discrete solution is established in a computationally efficient manner without the sign function. We prove that the learned weights of binarized networks roughly follow a Laplacian distribution that does not allow entropy maximization, and further demonstrate that this can be effectively solved by simply removing the l2 regularization during network training. Our method, dubbed sign-to-magnitude network binarization (SiMaN), is evaluated on CIFAR-10 and ImageNet, demonstrating its superiority over sign-based state-of-the-art methods. Our source code, experimental settings, training logs and binary models are available at https://github.com/lmbxmu/SiMaN.
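The sign-free encoding described above (high-magnitude weights to 1, the rest to 0) can be sketched as a per-layer top-k threshold; the 0.5 keep ratio here is an assumed hyperparameter, whereas the paper derives the split analytically.

```python
# Assumed sketch of a sign-to-magnitude style encoding for one weight tensor:
# the highest-magnitude fraction of weights is coded to 1, the rest to 0.
import torch

def magnitude_binarize(w, keep_ratio=0.5):
    flat = w.detach().abs().flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    thresh = flat.kthvalue(flat.numel() - k + 1).values
    return (w.detach().abs() >= thresh).to(w.dtype)      # binary codes in {0, 1}, no sign()
```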

18.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 5800-5815, 2023 05.
Article in English | MEDLINE | ID: mdl-36155478

ABSTRACT

Patient survival prediction based on gigapixel whole-slide histopathological images (WSIs) has become increasingly prevalent in recent years. A key challenge of this task is achieving an informative survival-specific global representation from WSIs with highly complicated data correlation. This article proposes a multi-hypergraph based learning framework, called "HGSurvNet," to tackle this challenge. HGSurvNet achieves an effective high-order global representation of WSIs via multilateral correlation modeling in multiple spaces and a general hypergraph convolution network. It alleviates the over-fitting caused by the lack of training data by using a new convolution structure called hypergraph max-mask convolution. Extensive validation experiments were conducted on three widely-used carcinoma datasets: Lung Squamous Cell Carcinoma (LUSC), Glioblastoma Multiforme (GBM), and National Lung Screening Trial (NLST). Quantitative analysis demonstrated that the proposed method, coupled with the Bayesian Concordance Readjust loss, consistently outperforms state-of-the-art methods. We also demonstrate the individual effectiveness of each module of the proposed framework and its application potential for pathology diagnosis and reporting, empowered by its interpretability.


Subject(s)
Algorithms, Learning, Humans, Bayes Theorem
19.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 3999-4008, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35917571

ABSTRACT

Though network pruning has gained popularity for reducing the complexity of convolutional neural networks (CNNs), it remains an open issue to maintain model accuracy while achieving significant speedups on general CPUs. In this paper, we propose a novel 1×N pruning pattern to break this limitation. In particular, consecutive N output kernels with the same input channel index are grouped into one block, which serves as the basic pruning granularity of our pruning pattern. Our 1×N pattern prunes these blocks when they are considered unimportant. We also provide a workflow of filter rearrangement that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements, and then applies a similar rearrangement to the next-layer weights in the input channel dimension to ensure correct convolutional operations. Moreover, the output computation after our 1×N pruning can be realized via a parallelized block-wise vectorized operation, leading to significant speedups on general CPUs. The efficacy of our pruning pattern is demonstrated with experiments on ILSVRC-2012. For example, given a pruning rate of 50% and N=4, our pattern obtains about 3.0% improvement over filter pruning in the top-1 accuracy of MobileNet-V2. Meanwhile, it obtains 56.04ms inference savings on a Cortex-A7 CPU over weight pruning. Our project is made available at https://github.com/lmbxmu/1xN.
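The block grouping can be made concrete with a small masking routine: weights are reshaped so that every N consecutive output kernels sharing an input channel index form a block, blocks are scored, and the lowest-scoring fraction is zeroed. The L1-norm scoring and the 50% rate are assumptions for illustration; only the 1×N grouping itself follows the description above.

```python
# Assumed sketch of a 1xN block mask for one convolution layer.
import torch

def one_by_n_mask(weight, n=4, prune_rate=0.5):
    # weight: out_ch x in_ch x k x k, with out_ch divisible by n
    o, i, kh, kw = weight.shape
    blocks = weight.detach().abs().reshape(o // n, n, i, kh, kw)   # n consecutive output kernels
    scores = blocks.sum(dim=(1, 3, 4))                             # (o//n) x in_ch block L1 norms
    num_prune = max(1, int(prune_rate * scores.numel()))
    thresh = scores.flatten().kthvalue(num_prune).values
    block_mask = (scores > thresh).float()                         # keep blocks above threshold
    mask = block_mask[:, None, :, None, None].expand(o // n, n, i, kh, kw)
    return mask.reshape(o, i, kh, kw)                              # multiply element-wise with weight
```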

20.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 5158-5173, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35917573

ABSTRACT

Variation in scale or aspect ratio has been one of the main challenges in tracking. To overcome this challenge, most existing methods adopt either multi-scale search or anchor-based schemes, which use a predefined search space in a handcrafted way and therefore limit their performance in complicated scenes. To address this problem, recent anchor-free trackers have been proposed that use no prior scale or anchor information. However, an inconsistency problem between classification and regression degrades the tracking performance. To address the above issues, we propose a simple yet effective tracker (named Siamese Box Adaptive Network, SiamBAN) to learn a target-aware scale handling schema in a data-driven manner. Our basic idea is to predict the target boxes in a per-pixel fashion through a fully convolutional network, which is anchor-free. Specifically, SiamBAN divides the tracking problem into classification and regression tasks, which directly predict objectness and regress bounding boxes, respectively. A no-prior box design is proposed to avoid tuning hyper-parameters related to candidate boxes, which makes SiamBAN more flexible. SiamBAN further uses a target-aware branch to address the inconsistency problem. Experiments on benchmarks including VOT2018, VOT2019, OTB100, UAV123, LaSOT and TrackingNet show that SiamBAN achieves promising performance and runs at 35 FPS.
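A minimal anchor-free, per-pixel prediction head in the spirit of this description is sketched below; the channel sizes, the two-layer tower, and the exp-parameterized box distances are assumptions, not the SiamBAN release.

```python
# Assumed sketch of an anchor-free head: per spatial location, a foreground/background
# score and a 4-value box offset (left/top/right/bottom distances to the box edges).
import torch
import torch.nn as nn

class AnchorFreeHead(nn.Module):
    def __init__(self, in_ch=256):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.cls = nn.Conv2d(in_ch, 2, 3, padding=1)    # foreground / background per pixel
        self.reg = nn.Conv2d(in_ch, 4, 3, padding=1)    # box distances per pixel

    def forward(self, fused_feat):                      # B x C x H x W correlation feature
        x = self.tower(fused_feat)
        return self.cls(x), self.reg(x).exp()           # exp keeps box distances positive
```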
