Results 1 - 20 of 43
1.
BMC Genomics ; 21(1): 691, 2020 Oct 06.
Article in English | MEDLINE | ID: mdl-33023466

ABSTRACT

BACKGROUND: Head and neck squamous cell carcinoma (HNSCC) is a fatal malignancy owing to the lack of effective tools for predicting overall survival (OS). MicroRNAs (miRNAs) play an important role in HNSCC occurrence, development, invasion and metastasis, significantly affecting patient OS. Thus, miRNA-based risk signatures and nomograms are desirable for predicting the OS of patients with HNSCC. Accordingly, in the present study, miRNA sequencing data of 71 HNSCC and 13 normal samples downloaded from The Cancer Genome Atlas (TCGA) were screened to identify differentially expressed miRNAs (DEMs) between HNSCC patients and normal controls. After applying the exclusion criteria, the clinical information and miRNA sequencing data of 67 HNSCC samples were selected and used to establish a miRNA-based signature and a prognostic nomogram. Forty-three HNSCC samples were assigned to an internal validation cohort to verify the credibility and accuracy of the primary cohort. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed to explore the functions of 11 miRNA target genes.

RESULTS: In total, 11 DEMs were identified. An 11-miRNA risk signature and a prognostic nomogram were constructed based on the expression levels of these 11 DEMs and clinical information. The signature and nomogram were further validated using the C-index, the area under the receiver operating characteristic (ROC) curve (AUC), and calibration curves, which indicated promising performance. The internal validation cohort confirmed the reliable predictive accuracy of both the miRNA-based signature and the prognostic nomogram. GO and KEGG analyses revealed that numerous signaling pathways participate in HNSCC proliferation and metastasis.

CONCLUSION: Overall, we constructed an 11-miRNA-based signature and a prognostic nomogram with excellent accuracy for predicting the OS of patients with HNSCC.
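The C-index used to validate the signature can be computed directly from follow-up times, event indicators, and predicted risks. The Python sketch below assumes the signature's risk score is a linear combination of miRNA expression values weighted by Cox regression coefficients, a common construction for such signatures that this abstract does not confirm; the miRNA names, coefficients, and patients are hypothetical placeholders.

```python
from itertools import combinations


def risk_score(expression, coefficients):
    """Assumed linear risk score: sum of coefficient * expression over signature miRNAs."""
    return sum(weight * expression[mirna] for mirna, weight in coefficients.items())


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up time had
    an observed event (events[i] == 1). It is concordant when that patient also
    has the higher predicted risk; tied risks count as half a concordance.
    Pairs with tied follow-up times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        short, long_ = (i, j) if times[i] < times[j] else (j, i)
        if not events[short]:
            continue  # shorter follow-up was censored: pair is not comparable
        comparable += 1
        if risks[short] > risks[long_]:
            concordant += 1.0
        elif risks[short] == risks[long_]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Hypothetical two-miRNA signature and three hypothetical patients.
coefficients = {"miR-A": 0.8, "miR-B": -0.5}
patients = [
    {"miR-A": 2.0, "miR-B": 1.0},
    {"miR-A": 1.0, "miR-B": 1.0},
    {"miR-A": 0.5, "miR-B": 2.0},
]
times = [12, 30, 48]  # follow-up in months
events = [1, 1, 0]    # 1 = death observed, 0 = censored
risks = [risk_score(p, coefficients) for p in patients]
print(harrell_c_index(times, events, risks))
```

In this toy cohort every comparable pair is ranked correctly, so the index is 1.0; a value of 0.5 corresponds to random ordering.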


Subjects
Biomarkers, Tumor/genetics; Carcinoma, Squamous Cell/genetics; Head and Neck Neoplasms/genetics; MicroRNAs/genetics; Biomarkers, Tumor/metabolism; Biomarkers, Tumor/standards; Carcinoma, Squamous Cell/pathology; Head and Neck Neoplasms/pathology; Humans; MicroRNAs/metabolism; MicroRNAs/standards; Nomograms
2.
Biol Pharm Bull ; 37(2): 248-54, 2014.
Article in English | MEDLINE | ID: mdl-24492721

ABSTRACT

20(S)-Ginsenoside Rh2 (GRh2) and ginsenoside Rg3 (GRg3) are members of the protopanaxadiol family and have been investigated for possible chemopreventive activity. This study explored the biological and apoptotic mechanisms induced by 20(S)-GRh2 in the human acute leukaemia cell line Reh. Reh cells were treated in vitro with different concentrations of 20(S)-GRh2. Cell viability was determined by Cell Counting Kit-8 and Annexin V/7-AAD assays. Mitochondrial membrane potential (MMP) was examined by JC-1 staining. Activation of caspases associated with the mitochondria-mediated apoptosis pathway was determined by Western blot. The survival of Reh cells decreased in a concentration-dependent manner after exposure to 20(S)-GRh2. Moreover, 20(S)-GRh2 induced mitochondrial depolarization in Reh cells, as evidenced by the shift in JC-1 fluorescence from red to green. In addition, 20(S)-GRh2 induced the release of mitochondrial cytochrome c and the activation of caspase-9 and caspase-3 in Reh cells. These results indicate that 20(S)-GRh2 could induce apoptosis through the mitochondrial pathway, demonstrating its potential as a chemotherapeutic agent for leukaemia therapy.


Subjects
Antineoplastic Agents, Phytogenic/pharmacology; Apoptosis/drug effects; Ginsenosides/pharmacology; Leukemia/drug therapy; Mitochondria/drug effects; Panax/chemistry; Phytotherapy; Annexin A5/metabolism; Antineoplastic Agents, Phytogenic/therapeutic use; Apoptosis/physiology; Caspase 3/metabolism; Caspase 9/metabolism; Cell Line, Tumor; Cytochromes c/metabolism; Dose-Response Relationship, Drug; Drugs, Chinese Herbal/pharmacology; Drugs, Chinese Herbal/therapeutic use; Ginsenosides/therapeutic use; Humans; Leukemia/metabolism; Membrane Potential, Mitochondrial/drug effects; Mitochondria/metabolism; Mitochondria/physiology; Signal Transduction
3.
Article in English | MEDLINE | ID: mdl-39024083

ABSTRACT

Conventional neural architecture search (NAS) algorithms typically work on search spaces with short-distance node connections. We argue that such designs, though safe and stable, are obstacles to exploring more effective network architectures. In this brief, we explore search algorithms on a complicated search space with long-distance connections and show that existing weight-sharing search algorithms fail due to the existence of interleaved connections (ICs). Based on this observation, we present a simple yet effective algorithm, termed interleaving-free neural architecture search (IF-NAS). We further design a periodic sampling strategy to construct subnetworks during the search procedure, preventing ICs from emerging in any of them. In the proposed search space, IF-NAS outperforms both random sampling and previous weight-sharing search algorithms by significant margins. It also generalizes well to microcell-based spaces. This study emphasizes the importance of macrostructure, and we look forward to further efforts in this direction. The code is available at github.com/sunsmarterjie/IFNAS.

4.
Article in English | MEDLINE | ID: mdl-38980785

ABSTRACT

Under low-data regimes, few-shot object detection (FSOD) transfers related knowledge from base classes with sufficient annotations to novel classes with limited samples in a two-step paradigm consisting of base training and balanced fine-tuning. In base training, the learned embedding space needs to be dispersed with large class margins to facilitate novel class accommodation and avoid feature aliasing, while in balanced fine-tuning it needs to be properly concentrated with small margins to represent novel classes precisely. Although the discrimination-representation dilemma has stimulated substantial progress, research on balancing class margins within the embedding space is still ongoing. In this study, we propose a class margin optimization scheme, termed explicit margin equilibrium (EME), which explicitly leverages the quantified relationship between base and novel classes. EME first maximizes base-class margins to reserve adequate space for novel class adaptation. During fine-tuning, it quantifies the interclass semantic relationships by calculating equilibrium coefficients, based on the assumption that novel instances can be represented by linear combinations of base-class prototypes. EME finally reweights the margin loss using the equilibrium coefficients to adapt base knowledge for novel instance learning, aided by instance disturbance (ID) augmentation. As a plug-and-play module, EME can also be applied to few-shot classification. Consistent performance gains across various baseline methods and benchmarks validate the generality and efficacy of EME. The code is available at github.com/Bohao-Lee/EME.
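The abstract assumes that a novel instance can be written as a linear combination of base-class prototypes. One plain way to obtain such coefficients is ordinary least squares, sketched below in Python by solving the normal equations for small, hypothetical prototypes. This is an illustration under that assumption, not EME's published formula, and how EME turns the coefficients into margin-loss weights is not covered.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("prototypes are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def equilibrium_coefficients(novel_feature, base_prototypes):
    """Weights w minimising ||x - sum_k w_k * p_k||^2 via (P^T P) w = P^T x."""
    gram = [[dot(p, q) for q in base_prototypes] for p in base_prototypes]
    rhs = [dot(p, novel_feature) for p in base_prototypes]
    return solve(gram, rhs)


prototypes = [[1.0, 1.0], [1.0, -1.0]]  # hypothetical base-class prototypes
novel = [3.0, 1.0]                      # hypothetical novel-instance feature
print(equilibrium_coefficients(novel, prototypes))  # [2.0, 1.0]
```

Least squares is chosen here only because it is the simplest estimator of linear-combination weights; a constrained or regularized variant would be equally plausible.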

5.
Article in English | MEDLINE | ID: mdl-38241099

ABSTRACT

Multidomain crowd counting aims to learn a general model for multiple diverse datasets. However, deep networks tend to model the distributions of the dominant domains rather than all domains, a problem known as domain bias. In this study, we propose a simple yet effective modulating domain-specific knowledge network (MDKNet) to handle domain bias in multidomain crowd counting. MDKNet is built on the idea of "modulating", which enables the deep network to balance and model the different distributions of diverse datasets with little bias. Specifically, we propose an instance-specific batch normalization (IsBN) module, which serves as a base modulator to refine the information flow so that it adapts to domain distributions. To modulate the domain-specific information precisely, a domain-guided virtual classifier (DVC) is then introduced to learn a domain-separable latent space. This space serves as input guidance for the IsBN modulator, so that the mixture distributions of multiple datasets can be handled well. Extensive experiments on popular benchmarks, including Shanghai-tech A/B, QNRF, and NWPU, validate the superiority of MDKNet in tackling multidomain crowd counting and its effectiveness for multidomain learning. Code is available at https://github.com/csguomy/MDKNet.

6.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 4908-4925, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38306258

ABSTRACT

Point-based object localization (POL), which pursues high-performance object sensing under low-cost data annotation, has attracted increasing attention. However, the point annotation mode inevitably introduces semantic variance due to the inconsistency of annotated points. Existing POL methods rely heavily on strict annotation rules, which are difficult to define and apply, to handle this problem. In this study, we propose coarse point refinement (CPR), which, to the best of our knowledge, is the first attempt to alleviate semantic variance from an algorithmic perspective. CPR reduces the semantic variance by selecting a semantic centre point in a neighbourhood region to replace the initially annotated point. Furthermore, we design a sampling region estimation module to dynamically compute a sampling region for each object and use a cascaded structure to achieve end-to-end optimization. We further integrate a variance regularization into the structure to concentrate the predicted scores, yielding CPR++. We observe that CPR++ can obtain scale information and further reduce the semantic variance in a global region, thus guaranteeing high-performance object localization. Extensive experiments on four challenging datasets validate the effectiveness of both CPR and CPR++. We hope our work can inspire more research on designing algorithms rather than annotation rules to address the semantic variance problem in POL.

7.
Article in English | MEDLINE | ID: mdl-39046859

ABSTRACT

We propose the integrally pre-trained transformer pyramid network (iTPN), which jointly optimizes the network backbone and the neck so that the transfer gap between representation models and downstream tasks is minimal. iTPN has two elaborate designs: 1) the first pre-trained feature pyramid built upon the vision transformer (ViT); 2) multi-stage supervision of the feature pyramid using masked feature modeling (MFM). iTPN is further updated to Fast-iTPN, which reduces computational memory overhead and accelerates inference through two flexible designs: 1) token migration, which drops redundant tokens of the backbone while replenishing them in the feature pyramid without attention operations; 2) token gathering, which reduces the computation cost of global attention by introducing a few gathering tokens. The base/large-level Fast-iTPN models achieve 88.75%/89.5% top-1 accuracy on ImageNet-1K. With a 1× training schedule using DINO, the base/large-level Fast-iTPN models achieve 58.4%/58.8% box AP on COCO object detection, and 57.5%/58.7% mIoU on ADE20K semantic segmentation using MaskDINO. Fast-iTPN can accelerate inference by up to 70% with negligible performance loss, demonstrating its potential as a powerful backbone for downstream vision tasks. The code is available at github.com/sunsmarterjie/iTPN.

8.
IEEE Trans Neural Netw Learn Syst ; 34(12): 9832-9846, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35358053

ABSTRACT

In this study, we propose a novel pretext task and a self-supervised motion perception (SMP) method for spatiotemporal representation learning. The pretext task is defined as video playback rate perception, which uses temporal dilated sampling to augment each video clip into multiple duplicates with different temporal resolutions. The SMP method is built upon discriminative and generative motion perception models, which collaboratively capture representations related to motion dynamics and appearance from video clips of multiple temporal resolutions. To enhance this collaboration, we further propose difference and convolution motion attention (MA), which drives the generative model to focus on motion-related appearance, and leverage multiple granularity perception (MG) to extract accurate motion dynamics. Extensive experiments demonstrate SMP's effectiveness for video motion perception and the state-of-the-art performance of self-supervised representation models on target tasks, including action recognition and video retrieval. Code for SMP is available at github.com/yuanyao366/SMP.
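Temporal dilated sampling for a playback-rate pretext task can be sketched as taking frames at several strides from the same video and labelling each clip by its stride class. The dilation set (1, 2, 4, 8), the fixed start frame, and the helper names are assumptions for illustration, not details taken from SMP.

```python
def dilated_clip_indices(num_frames, clip_len, dilation, start=0):
    """Indices of clip_len frames taken every `dilation` frames from `start`."""
    last = start + (clip_len - 1) * dilation
    if last >= num_frames:
        raise ValueError("video too short for this clip length and dilation")
    return [start + k * dilation for k in range(clip_len)]


def playback_rate_samples(num_frames, clip_len, dilations=(1, 2, 4, 8), start=0):
    """One clip per dilation; the label is the dilation's index (its playback-rate class)."""
    return [
        (dilated_clip_indices(num_frames, clip_len, d, start), label)
        for label, d in enumerate(dilations)
    ]


for indices, label in playback_rate_samples(num_frames=64, clip_len=4):
    print(label, indices)
```

Larger dilations cover more of the video with the same number of frames, so each duplicate looks like the same content played at a faster rate; in practice the start frame would usually be randomized.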

9.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12699-12706, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37145941

ABSTRACT

Few-shot class-incremental learning (FSCIL) faces the challenges of memorizing old class distributions and estimating new class distributions from few training samples. In this study, we propose a learnable distribution calibration (LDC) approach that systematically solves these two challenges within a unified framework. LDC is built upon a parameterized calibration unit (PCU), which initializes biased distributions for all classes based on classifier vectors (memory-free) and a single covariance matrix. The covariance matrix is shared by all classes, so memory costs are fixed. During base training, PCU learns to calibrate biased distributions by recurrently updating sampled features under the supervision of real distributions. During incremental learning, PCU recovers distributions for old classes to avoid 'forgetting', and estimates distributions and augments samples for new classes to alleviate the 'over-fitting' caused by the biased distributions of few-shot samples. LDC is theoretically grounded, as it can be formulated as a variational inference procedure. It improves FSCIL's flexibility because the training procedure requires no prior on class similarity. Experiments on the CUB200, CIFAR100, and mini-ImageNet datasets show that LDC outperforms the state of the art by 4.64%, 1.98%, and 3.97%, respectively. LDC's effectiveness is also validated in few-shot learning scenarios.
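The memory argument above (one classifier vector per class plus one shared covariance matrix) can be illustrated by drawing features for each class from a Gaussian whose mean is that class's classifier vector and whose covariance is shared. This Python sketch assumes exactly that Gaussian form and omits the learned calibration that PCU performs; the class names and covariance values are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L @ L^T == cov (cov must be symmetric positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def sample_class_features(classifier_vectors, shared_cov, per_class, rng):
    """Draw per_class features for every class from N(classifier_vector, shared_cov).

    Memory stays fixed at one vector per class plus a single covariance matrix.
    """
    low = cholesky(shared_cov)
    dim = len(shared_cov)
    samples = {}
    for label, mean in classifier_vectors.items():
        feats = []
        for _ in range(per_class):
            z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            # x = mean + L z gives covariance L L^T = shared_cov
            feats.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                          for i in range(dim)])
        samples[label] = feats
    return samples


classifier_vectors = {"old_class": [1.0, 0.0], "new_class": [0.0, 1.0]}  # hypothetical
shared_cov = [[0.04, 0.01], [0.01, 0.09]]  # hypothetical shared covariance
feats = sample_class_features(classifier_vectors, shared_cov, per_class=2000,
                              rng=random.Random(0))
```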

10.
Article in English | MEDLINE | ID: mdl-37988202

ABSTRACT

Adapting object detectors learned with sufficient supervision to novel classes under low-data regimes is appealing yet challenging. In few-shot object detection (FSOD), a two-step training paradigm is widely adopted to mitigate the severe sample imbalance: holistic pre-training on base classes, followed by partial fine-tuning in a balanced setting with all classes. Because unlabeled instances are suppressed as background in the base training phase, the learned region proposal network (RPN) is prone to producing biased proposals for novel instances, resulting in dramatic performance degradation. Unfortunately, extreme data scarcity aggravates this proposal distribution bias, hindering the region of interest (RoI) head from evolving toward novel classes. In this brief, we introduce a simple yet effective proposal distribution calibration (PDC) approach that enhances the localization and classification abilities of the RoI head by recycling the localization ability acquired in base training and enriching high-quality positive samples for semantic fine-tuning. Specifically, we sample proposals based on base proposal statistics to calibrate the distribution bias, and impose additional localization and classification losses on the sampled proposals to rapidly extend the base detector to novel classes. Experiments on the commonly used Pascal VOC and MS COCO datasets, where PDC achieves state-of-the-art performance, justify its efficacy for FSOD. Code is available at github.com/Bohao-Lee/PDC.
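As an illustration of sampling proposals from base proposal statistics, the sketch below fits a per-coordinate Gaussian to the (dx, dy, dw, dh) offsets of base-class proposals relative to their ground-truth boxes, then draws jittered boxes around a novel-class box and keeps those with an IoU of at least 0.5. The Gaussian model, the offset parameterization, the 0.5 threshold, and all box coordinates are assumptions; the abstract does not specify which statistics PDC uses.

```python
import math
import random
import statistics


def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def encode(box, gt):
    """(dx, dy, dw, dh) offsets of a box relative to a ground-truth box."""
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    bw, bh = box[2] - box[0], box[3] - box[1]
    dx = ((box[0] + bw / 2) - (gt[0] + gw / 2)) / gw
    dy = ((box[1] + bh / 2) - (gt[1] + gh / 2)) / gh
    return (dx, dy, math.log(bw / gw), math.log(bh / gh))


def decode(offsets, gt):
    """Inverse of encode: turn offsets back into an (x1, y1, x2, y2) box."""
    dx, dy, dw, dh = offsets
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    cx, cy = gt[0] + gw / 2 + dx * gw, gt[1] + gh / 2 + dy * gh
    w, h = gw * math.exp(dw), gh * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def offset_statistics(base_pairs):
    """Per-coordinate (mean, std) of offsets over (proposal, gt) pairs from base classes."""
    columns = zip(*(encode(p, g) for p, g in base_pairs))
    return [(statistics.mean(c), statistics.pstdev(c)) for c in columns]


def sample_proposals(gt, stats, n, min_iou=0.5, rng=None, max_tries=10_000):
    """Draw up to n positive proposals around a novel-class box from the offset statistics."""
    if rng is None:
        rng = random.Random()
    kept = []
    for _ in range(max_tries):
        if len(kept) == n:
            break
        box = decode([rng.gauss(mu, sigma) for mu, sigma in stats], gt)
        if iou(box, gt) >= min_iou:  # keep only high-quality positives
            kept.append(box)
    return kept


base_pairs = [  # hypothetical base-class proposals paired with their ground truth
    ((10, 10, 50, 50), (12, 8, 52, 48)),
    ((0, 0, 20, 40), (1, 2, 21, 41)),
    ((30, 30, 90, 70), (28, 31, 92, 72)),
]
stats = offset_statistics(base_pairs)
novel_gt = (100, 100, 160, 140)  # hypothetical novel-class box
proposals = sample_proposals(novel_gt, stats, n=5, rng=random.Random(0))
```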

11.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 2945-2951, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35588416

ABSTRACT

Few-shot class-incremental learning (FSCIL) is challenged by catastrophic forgetting of old classes and over-fitting to new classes. Our analyses reveal that these problems are caused by feature distribution crumbling, which leads to class confusion when few samples are continuously embedded into a fixed feature space. In this study, we propose a Dynamic Support Network (DSN), an adaptively updating network with compressive node expansion that "supports" the feature space. In each training session, DSN tentatively expands network nodes to enlarge the feature representation capacity for incremental classes. It then dynamically compresses the expanded network by node self-activation to pursue compact feature representation, which alleviates over-fitting. Simultaneously, DSN selectively recalls old class distributions during incremental learning to support feature distributions and avoid confusion between classes. DSN, with compressive node expansion and class distribution recalling, provides a systematic solution to catastrophic forgetting and over-fitting. Experiments on the CUB, CIFAR-100, and miniImageNet datasets show that DSN significantly improves upon the baseline approach, achieving new state-of-the-art results.

12.
IEEE Trans Image Process ; 32: 29-42, 2023.
Article in English | MEDLINE | ID: mdl-36459604

ABSTRACT

Unsupervised person re-identification (re-ID) remains a challenging task. While extensive research has focused on framework design and loss functions, this paper shows that the sampling strategy plays an equally important role. We analyze the reasons for the performance differences between various sampling strategies under the same framework and loss function. We suggest that deteriorated over-fitting is an important cause of poor performance, and that enhancing statistical stability can rectify this problem. Inspired by this, we propose a simple yet effective approach, termed group sampling, which gathers samples from the same class into groups. The model is thereby trained using normalized group samples, which helps alleviate the negative impact of individual samples. Group sampling updates the pseudo-label generation pipeline by guaranteeing that samples are more efficiently classified into the correct classes. It regulates the representation learning process, progressively enhancing the statistical stability of the feature representation. Extensive experiments on Market-1501, DukeMTMC-reID and MSMT17 show that group sampling achieves performance comparable to state-of-the-art methods and outperforms current techniques under purely camera-agnostic settings. Code is available at https://github.com/ucas-vg/GroupSampling.
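A minimal sketch of the grouping step, assuming pseudo-labels come from a clustering stage in which -1 marks unclustered outliers: indices sharing a pseudo-label are chunked into groups of fixed size, and each batch is built from whole groups. The function name, group size, and batch layout are hypothetical, and the normalization of group samples described above is not shown.

```python
import random
from collections import defaultdict


def group_batches(pseudo_labels, group_size, groups_per_batch, seed=0):
    """Yield index batches built from groups of `group_size` samples sharing a pseudo-label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(pseudo_labels):
        if label >= 0:  # assumption: -1 marks outliers left out by clustering
            by_class[label].append(idx)
    groups = []
    for members in by_class.values():
        rng.shuffle(members)
        # chunk each class into equal-size groups; drop a trailing partial chunk
        for start in range(0, len(members) - group_size + 1, group_size):
            groups.append(members[start:start + group_size])
    rng.shuffle(groups)
    for start in range(0, len(groups) - groups_per_batch + 1, groups_per_batch):
        batch = []
        for group in groups[start:start + groups_per_batch]:
            batch.extend(group)
        yield batch


labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, -1]  # hypothetical pseudo-labels
for batch in group_batches(labels, group_size=2, groups_per_batch=2):
    print(batch, [labels[i] for i in batch])
```

Because every batch contains several same-label samples side by side, per-group statistics can be computed within the batch, which is the property group sampling relies on.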

13.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12133-12147, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37200122

ABSTRACT

Despite the substantial progress of active learning for image recognition, a systematic investigation of instance-level active learning for object detection is still lacking. In this paper, we propose to unify instance uncertainty calculation with image uncertainty estimation for informative image selection, creating a multiple instance differentiation learning (MIDL) method for instance-level active learning. MIDL consists of a classifier prediction differentiation module and a multiple instance differentiation module. The former leverages two adversarial instance classifiers trained on the labeled and unlabeled sets to estimate the instance uncertainty of the unlabeled set. The latter treats unlabeled images as instance bags and re-estimates image-instance uncertainty using the instance classification model in a multiple instance learning fashion. By weighting the instance uncertainty with the instance class probability and the instance objectness probability under the total probability formula, MIDL unifies image uncertainty with instance uncertainty in a Bayesian framework. Extensive experiments validate that MIDL sets a solid baseline for instance-level active learning. On commonly used object detection datasets, it outperforms other state-of-the-art methods by significant margins, particularly when the labeled sets are small.

14.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9454-9468, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37022836

ABSTRACT

With convolution operations, Convolutional Neural Networks (CNNs) are good at extracting local features but have difficulty capturing global representations. With cascaded self-attention modules, vision transformers can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, that takes advantage of both convolution operations and self-attention mechanisms for enhanced representation learning. Conformer is rooted in the interactive coupling of CNN local features and transformer global representations at different resolutions. Conformer adopts a dual structure so that local details and global dependencies are retained to the maximum extent. We also propose a Conformer-based detector (ConformerDet), which learns to predict and refine object proposals by performing region-level feature coupling in an augmented cross-attention fashion. Experiments on the ImageNet and MS COCO datasets validate Conformer's superiority for visual recognition and object detection, demonstrating its potential as a general backbone network.


Subjects
Algorithms; Learning; Neural Networks, Computer
15.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12535-12549, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37155380

ABSTRACT

Vision-and-language navigation (VLN) asks an agent to follow a given language instruction to navigate through a real 3D environment. Despite significant advances, conventional VLN agents are typically trained in disturbance-free environments and may easily fail in real-world navigation scenarios, since they are unaware of how to deal with possible disturbances, such as sudden obstacles or human interruptions, which are common and often cause unexpected route deviations. In this paper, we present a model-agnostic training paradigm, called Progressive Perturbation-aware Contrastive Learning (PROPER), to enhance the real-world generalization ability of existing VLN agents by requiring them to learn deviation-robust navigation. Specifically, a simple yet effective path perturbation scheme is introduced to implement route deviation, under which the agent is still required to navigate successfully following the original instruction. Since directly forcing the agent to learn perturbed trajectories may lead to insufficient and inefficient training, a progressively perturbed trajectory augmentation strategy is designed, in which the agent self-adaptively learns to navigate under perturbation as its navigation performance on each specific trajectory improves. To encourage the agent to capture the differences introduced by perturbation and to adapt to both perturbation-free and perturbation-based environments, a perturbation-aware contrastive learning mechanism is further developed that contrasts perturbation-free trajectory encodings with their perturbation-based counterparts. Extensive experiments on the standard Room-to-Room (R2R) benchmark show that PROPER can benefit multiple state-of-the-art VLN baselines in perturbation-free scenarios. We further collect perturbed path data to construct an introspection subset based on R2R, called Path-Perturbed R2R (PP-R2R). The results on PP-R2R show the unsatisfying robustness of popular VLN agents and the capability of PROPER to improve navigation robustness under deviation.
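Contrasting perturbation-free trajectory encodings with perturbation-based counterparts can be illustrated with a standard InfoNCE loss: the clean and perturbed encodings of the same trajectory form a positive pair, and the other perturbed encodings in the batch act as negatives. This is a generic contrastive loss under that assumption, not PROPER's exact objective; the temperature of 0.1 and the 2-D encodings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def info_nce(clean, perturbed, temperature=0.1):
    """Mean InfoNCE loss; clean[i] and perturbed[i] encode the same trajectory."""
    total = 0.0
    for i, anchor in enumerate(clean):
        logits = [cosine(anchor, p) / temperature for p in perturbed]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive pair
    return total / len(clean)


clean = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical perturbation-free encodings
aligned = [[0.9, 0.1], [0.1, 0.9]]    # perturbed encodings close to their clean pairs
swapped = [[0.1, 0.9], [0.9, 0.1]]    # perturbed encodings matched to the wrong pairs
print(info_nce(clean, aligned) < info_nce(clean, swapped))
```

The loss is small when each perturbed encoding stays closest to its own clean trajectory and grows when the pairing is broken, which is the behaviour a perturbation-aware contrastive objective aims to reward.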

16.
Article in English | MEDLINE | ID: mdl-37934637

ABSTRACT

Unsupervised domain adaptation (UDA) person reidentification (Re-ID) aims to identify pedestrian images within an unlabeled target domain with the help of an auxiliary labeled source-domain dataset. Many existing works attempt to recover reliable identity information by considering multiple homogeneous networks and use the generated labels to train the model in the target domain. However, these homogeneous networks identify people in approximate subspaces and exchange their knowledge equally with one another or with their mean network to improve their ability, which inevitably limits the scope of available knowledge and leads them to make the same mistakes. This article proposes a dual-level asymmetric mutual learning (DAML) method to learn discriminative representations from a broader knowledge scope with diverse embedding spaces. Specifically, two heterogeneous networks mutually learn knowledge from asymmetric subspaces through pseudo-label generation in a hard distillation manner. Knowledge transfer between the two networks follows an asymmetric mutual learning (AML) scheme. The teacher network learns to identify both the target and source domains while adapting to the target domain distribution based on the knowledge of the student. Meanwhile, the student network is trained on the target dataset and employs the ground-truth labels through the knowledge of the teacher. Extensive experiments on the public Market-1501, CUHK-SYSU, and MSMT17 datasets verify the superiority of DAML over state-of-the-art (SOTA) methods.

17.
IEEE Trans Pattern Anal Mach Intell ; 44(6): 3096-3109, 2022 06.
Article in English | MEDLINE | ID: mdl-33434120

ABSTRACT

Modern CNN-based object detectors assign anchors to ground-truth objects under the restriction of object-anchor Intersection-over-Union (IoU). In this study, we propose a learning-to-match (LTM) method that breaks the IoU restriction, allowing objects to match anchors flexibly. LTM replaces hand-crafted anchor assignment with "free" anchor matching by formulating detector training in the Maximum Likelihood Estimation (MLE) framework. During training, LTM is implemented by converting the detection likelihood into plug-and-play anchor matching loss functions. Minimizing the matching loss functions drives the learning and selection of features that best explain a class of objects with respect to both classification and localization. LTM is extended from anchor-based detectors to anchor-free detectors, validating the general applicability of the learnable object-feature matching mechanism for visual object detection. Experiments on the MS COCO dataset demonstrate that LTM detectors consistently outperform their counterparts by significant margins. Last but not least, LTM requires negligible computational cost in both training and inference, as it involves no additional architecture or parameters. Code has been made publicly available.


Subjects
Algorithms; Neural Networks, Computer
18.
IEEE Trans Neural Netw Learn Syst ; 33(10): 5452-5466, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33861707

ABSTRACT

Weakly supervised object detection (WSOD) is a challenging task that requires simultaneously learning object detectors and estimating object locations under the supervision of image category labels. Many WSOD methods that adopt multiple instance learning (MIL) have nonconvex objective functions and are therefore prone to getting stuck in local minima (falsely localizing object parts) while missing the full object extent during training. In this article, we introduce classical continuation optimization into MIL, thereby creating continuation MIL (C-MIL), with the aim of alleviating the nonconvexity problem in a systematic way. To this end, we partition instances into class-related and spatially related subsets and approximate MIL's objective function with a series of smoothed objective functions defined within the subsets. We further propose a parametric strategy to implement continuation smooth functions, which enables C-MIL to be applied to instance selection tasks in a uniform manner. Optimizing smoothed loss functions prevents the training procedure from falling prematurely into local minima and facilitates learning the full object extent. Extensive experiments demonstrate the superiority of C-MIL over conventional MIL methods. As a general instance selection method, C-MIL is also applied to supervised object detection to optimize anchors/features, improving detection performance by a significant margin.

19.
Article in English | MEDLINE | ID: mdl-36417732

ABSTRACT

Weakly supervised object localization (WSOL), which trains object localization models using only image category annotations, remains a challenging problem. Existing approaches based on convolutional neural networks (CNNs) tend to miss the full object extent while activating discriminative object parts. Based on our analysis, this is caused by an intrinsic characteristic of CNNs: they have difficulty capturing object semantics at long distances. In this article, we introduce the vision transformer to WSOL, with the aim of capturing long-range semantic dependencies of features by leveraging the transformer's cascaded self-attention mechanism. We propose the token semantic coupled attention map (TS-CAM) method, which first decomposes class-aware semantics and then couples the semantics with attention maps for semantic-aware activation. To capture object semantics at long distances and avoid partial activation, TS-CAM performs spatial embedding by partitioning an image into a set of patch tokens. To incorporate object category information into patch tokens, TS-CAM reallocates category-related semantics to each patch token. The patch tokens are finally coupled with semantic-agnostic attention maps to perform semantic-aware object localization. By introducing semantic tokens to produce semantic-aware attention maps, we further explore the capability of TS-CAM for multicategory object localization. Experiments show that TS-CAM outperforms its CNN-CAM counterpart by 11.6% and 28.9% on the ILSVRC and CUB-200-2011 datasets, respectively, improving the state of the art by large margins. TS-CAM also demonstrates superiority for multicategory object localization on the Pascal VOC dataset. The code is available at github.com/yuanyao366/ts-cam-extension.
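The final coupling step can be sketched as an element-wise product between a semantic-agnostic attention vector (one weight per patch token, for example attention from the class token) and the class-aware score of each patch token, reshaped to the patch grid. The input layout, the min-max normalization, and the toy 2x2 grid are assumptions made for illustration, not TS-CAM's exact implementation.

```python
def couple_maps(attention, patch_scores, class_index, grid_h, grid_w):
    """Couple a semantic-agnostic attention vector with class-aware patch scores.

    attention: N weights, one per patch token (N = grid_h * grid_w)
    patch_scores: N lists of per-class scores, one list per patch token
    Returns a grid_h x grid_w localization map min-max normalised to [0, 1].
    """
    n = grid_h * grid_w
    if len(attention) != n or len(patch_scores) != n:
        raise ValueError("expected one attention weight and one score vector per patch")
    # semantic-aware activation: attention weight times the class score of that patch
    coupled = [a * s[class_index] for a, s in zip(attention, patch_scores)]
    lo, hi = min(coupled), max(coupled)
    span = hi - lo
    norm = [(v - lo) / span if span > 0 else 0.0 for v in coupled]
    return [norm[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


attention = [0.1, 0.2, 0.3, 0.4]  # hypothetical class-token attention over 4 patches
patch_scores = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]  # two hypothetical classes
print(couple_maps(attention, patch_scores, class_index=0, grid_h=2, grid_w=2))
```

A patch with high attention but no evidence for the queried class (the third patch here) is suppressed, which is how coupling turns a class-agnostic attention map into a class-specific one.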

20.
IEEE Trans Neural Netw Learn Syst ; 33(1): 117-129, 2022 01.
Article in English | MEDLINE | ID: mdl-33119512

ABSTRACT

Visual commonsense knowledge has received growing attention in reasoning about long-tailed visual relationships that are biased in terms of object and relation labels. Most current methods collect and use external knowledge for visual relationships by following the fixed reasoning path of {subject, object → predicate} to facilitate the recognition of infrequent relationships. However, incorporating knowledge along such a fixed multidependent path suffers from dataset bias and exponentially growing combinations of object and relation labels, and ignores the semantic gap between commonsense knowledge and real scenes. To alleviate this, we propose configurable graph reasoning (CGR), which decomposes the reasoning path of visual relationships and the incorporation of external knowledge, achieving configurable knowledge selection and personalized graph reasoning for each relation type in each image. Given a commonsense knowledge graph, CGR learns to match and retrieve knowledge for different subpaths and selectively compose the knowledge-routed path. CGR adaptively configures the reasoning path based on the knowledge graph, bridges the semantic gap between commonsense knowledge and real-world scenes, and achieves better knowledge generalization. Extensive experiments show that CGR consistently outperforms previous state-of-the-art methods on several popular benchmarks and works well with different knowledge graphs. Detailed analyses demonstrate that CGR learns explainable and compelling configurations of reasoning paths.


Subjects
Algorithms; Neural Networks, Computer; Knowledge; Recognition, Psychology; Semantics