Results 1 - 20 of 31
1.
IEEE Trans Image Process; 33: 1136-1148, 2024.
Article in English | MEDLINE | ID: mdl-38300774

ABSTRACT

The image-level label has prevailed in weakly supervised semantic segmentation tasks due to its easy availability. Since image-level labels can only indicate the existence or absence of specific categories of objects, visualization-based techniques have been widely adopted to provide object location clues. Considering class activation maps (CAMs) can only locate the most discriminative part of objects, recent approaches usually adopt an expansion strategy to enlarge the activation area for more integral object localization. However, without proper constraints, the expanded activation will easily intrude into the background region. In this paper, we propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation of attention expansion. Specifically, we propose a CAM-driven reconstruction module to directly reconstruct the input image from deep CAM features, which constrains the diffusion of last-layer object attention by preserving the coarse spatial structure of the image content. Moreover, we propose an activation self-modulation module to refine CAMs with finer spatial structure details by enhancing regional consistency. Without external saliency models to provide background clues, our approach achieves 72.7% and 47.0% mIoU on the PASCAL VOC 2012 and COCO datasets, respectively, demonstrating the superiority of our proposed approach. The source codes and models have been made available at https://github.com/NUST-Machine-Intelligence-Laboratory/SSC.

2.
Clin Chem Lab Med; 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38217085

ABSTRACT

OBJECTIVES: Lymphocyte subsets are predictors of disease diagnosis, treatment, and prognosis. Determination of lymphocyte subsets is usually carried out by flow cytometry. Despite recent advances in flow cytometry analysis, manual gating of flow cytometry data remains labor-intensive, time-consuming, and error-prone. This study aimed to develop an automated method to identify lymphocyte subsets. METHODS: We propose a method that combines knowledge-driven and data-driven approaches to gate automatically and achieve subset identification. To improve accuracy and stability, we implemented a Loop Adjustment Gating to optimize the gating result of the lymphocyte population. Furthermore, we incorporated an anomaly detection mechanism to issue warnings for samples that might not have been successfully analyzed, ensuring the quality of the results. RESULTS: The evaluation showed a 99.2% correlation between our method's results and manual analysis on a dataset of 2,000 individual cases from lymphocyte subset assays. Our proposed method attained 97.7% accuracy for all cases and 100% for the high-confidence cases. With our automated method, 99.1% of manual labor can be saved when reviewing only the low-confidence cases, while the average turnaround time is only 29 s, a reduction of 83.7%. CONCLUSIONS: Our proposed method achieves high accuracy on flow cytometry data from lymphocyte subset assays. Additionally, it saves manual labor and reduces turnaround time, giving it strong potential for laboratory application.

3.
Article in English | MEDLINE | ID: mdl-38051621

ABSTRACT

Visual grounding (VG) aims to locate a specific target in an image based on a given language query. The discriminative information from context is important for distinguishing the target from other objects, particularly for targets that have the same category as others. However, most previous methods underestimate such information. Moreover, they are usually designed for the standard scene (without any novel object), which limits their generalization to the open-vocabulary scene. In this paper, we propose a novel framework with context disentangling and prototype inheriting for robust visual grounding to handle both scenes. Specifically, the context disentangling disentangles the referent and context features, which achieves better discrimination between them. The prototype inheriting inherits the prototypes discovered from the disentangled visual features by a prototype bank to fully utilize the seen data, especially for the open-vocabulary scene. The fused features, obtained by applying the Hadamard product to the disentangled linguistic and visual features of the prototypes to avoid sharply adjusting the importance between the two types of features, are then concatenated with a special token and fed to a vision Transformer encoder for bounding box regression. Extensive experiments are conducted on both standard and open-vocabulary scenes. The performance comparisons indicate that our method outperforms the state-of-the-art methods in both scenarios. The code is available at https://github.com/WayneTomas/TransCP.

4.
Article in English | MEDLINE | ID: mdl-38100344

ABSTRACT

Wireless sensor networks (WSNs) are an emerging and promising area in the intelligent sensing field. Due to various factors, such as sudden sensor breakdowns or deliberately shutting down some nodes to save energy, the sensing data collected from WSNs always contain massive missing entries. Low-rank matrix approximation (LRMA) is a typical and effective approach for pattern analysis and missing data recovery in WSNs. However, existing LRMA-based approaches ignore the adverse effects of outliers inevitably mixed with the collected data, which may dramatically degrade their recovery accuracy. To address this issue, this article proposes a latent feature analysis (LFA)-based spatiotemporal signal recovery (STSR) model, named LFA-STSR. Its main idea is twofold: 1) incorporating the spatiotemporal correlation into an LFA model as a regularization constraint to improve its recovery accuracy and 2) adopting the L1-norm in the loss part of the LFA model to improve its robustness to outliers. As such, LFA-STSR can accurately recover missing data from partially observed data mixed with outliers in WSNs. To evaluate the proposed LFA-STSR model, extensive experiments have been conducted on four real-world WSN datasets. The results demonstrate that LFA-STSR significantly outperforms six related state-of-the-art models in terms of both recovery accuracy and robustness to outliers.
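The robustness argument for the L1-norm loss can be seen in a toy example (illustrative only; LFA-STSR itself optimizes a full latent-factor model):

```python
import numpy as np

# A toy recovery problem: estimate the true signal value from readings
# corrupted by one gross outlier. The L2-optimal estimate (mean) is
# dragged toward the outlier; the L1-optimal estimate (median) is not.
readings = np.array([1.0, 1.1, 0.9, 1.0, 50.0])  # 50.0 is an outlier
l2_fit = readings.mean()      # minimizes the sum of squared errors
l1_fit = np.median(readings)  # minimizes the sum of absolute errors
print(l2_fit, l1_fit)  # 10.8 1.0
```

The L2 estimate is pulled an order of magnitude away from the true value, while the L1 estimate stays at 1.0, which is the behavior the model relies on for outlier-contaminated sensor data.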

5.
Article in English | MEDLINE | ID: mdl-37995167

ABSTRACT

This article proposes a new hashing framework named relational consistency induced self-supervised hashing (RCSH) for large-scale image retrieval. To capture the potential semantic structure of data, RCSH explores the relational consistency between data samples in different spaces, which learns reliable data relationships in the latent feature space and then preserves the learned relationships in the Hamming space. The data relationships are uncovered by learning a set of prototypes that group similar data samples in the latent feature space. By uncovering the semantic structure of the data, meaningful data-to-prototype and data-to-data relationships are jointly constructed. The data-to-prototype relationships are captured by constraining the prototype assignments generated from different augmented views of an image to be the same. Meanwhile, these data-to-prototype relationships are preserved to learn informative compact hash codes by matching them with these reliable prototypes. To accomplish this, a novel dual prototype contrastive loss is proposed to maximize the agreement of prototype assignments in the latent feature space and Hamming space. The data-to-data relationships are captured by enforcing the distribution of pairwise similarities in the latent feature space and Hamming space to be consistent, which makes the learned hash codes preserve meaningful similarity relationships. Extensive experimental results on four widely used image retrieval datasets demonstrate that the proposed method significantly outperforms the state-of-the-art methods. Besides, the proposed method achieves promising performance in out-of-domain retrieval tasks, which shows its good generalization ability. The source code and models are available at https://github.com/IMAG-LuJin/RCSH.

6.
Article in English | MEDLINE | ID: mdl-37527324

ABSTRACT

Canonical correlation analysis (CCA) is a correlation analysis technique that is widely used in statistics and the machine-learning community. However, the high complexity involved in the training process lays a heavy burden on the processing units and memory system, making CCA nearly impractical on large-scale data. To overcome this issue, a novel CCA method that carries out the analysis in the Fourier domain is developed in this article. Applying the Fourier transform to the data, we can convert the traditional eigenvector computation of CCA into finding predefined discriminative Fourier bases that can be learned with only element-wise products and sum operations, without complex, time-consuming calculations. Since the eigenvalues come from the sum of individual sample products, they can be estimated in parallel. Moreover, thanks to the pattern repeatability of the data, the eigenvalues can be well estimated with partial samples. Accordingly, a progressive estimation scheme is proposed, in which the eigenvalues are estimated by feeding data batch by batch until the order of the eigenvalue sequence is stable. As a result, the proposed method is extraordinarily fast and memory-efficient. Furthermore, we extend this idea to nonlinear kernel and deep models and obtain satisfactory accuracy with extremely fast training, as expected. An extensive discussion of the fast Fourier transform (FFT)-CCA is provided in terms of time and memory efficiency. Experimental results on several large-scale correlation datasets, such as MNIST8M, X-RAY MICROBEAM SPEECH, and Twitter Users Data, demonstrate the superiority of the proposed algorithm over state-of-the-art (SOTA) large-scale CCA methods: it achieves almost the same accuracy while training up to 1,000 times faster. This makes our proposed models best-practice choices for large-scale correlation datasets. The source code is available at https://github.com/Mrxuzhao/FFTCCA.
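The core Fourier-domain trick the abstract relies on, namely that correlation becomes an element-wise product of spectra after an FFT, can be sketched as follows (a minimal NumPy illustration, not the authors' FFTCCA code):

```python
import numpy as np

# Circular cross-correlation two ways. With real signals, the
# correlation theorem turns an O(n^2) loop into an element-wise
# product of spectra plus one inverse FFT.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = rng.standard_normal(64)

# Direct O(n^2) computation: c[k] = sum_n x[n] * y[(n + k) mod N]
direct = np.array([np.dot(x, np.roll(y, -k)) for k in range(len(x))])

# Fourier-domain O(n log n) computation: only element-wise products
fast = np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(y)).real

print(np.allclose(direct, fast))  # True
```

The two results agree to numerical precision, which is why expensive pairwise computations reduce to cheap per-element operations once the data are transformed.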

7.
Article in English | MEDLINE | ID: mdl-37022403

ABSTRACT

Deep learning-based models have been shown to outperform human beings in many computer vision tasks when massive labeled training data are available. However, humans have an amazing ability to easily recognize images of novel categories by browsing only a few examples of these categories. Few-shot learning has therefore emerged to make machines learn from extremely limited labeled examples. One possible reason why human beings can learn novel concepts quickly and efficiently is that they have sufficient visual and semantic prior knowledge. Toward this end, this work proposes a novel knowledge-guided semantic transfer network (KSTNet) for few-shot image recognition from a supplementary perspective by introducing auxiliary prior knowledge. The proposed network jointly incorporates vision inferring, knowledge transferring, and classifier learning into one unified framework for optimal compatibility. A category-guided visual learning module is developed in which a visual classifier is learned based on the feature extractor along with cosine similarity and contrastive loss optimization. To fully explore prior knowledge of category correlations, a knowledge transfer network is then developed to propagate knowledge information among all categories to learn the semantic-visual mapping, thus inferring a knowledge-based classifier for novel categories from base categories. Finally, we design an adaptive fusion scheme to infer the desired classifiers by effectively integrating the above knowledge and visual information. Extensive experiments are conducted on the widely used Mini-ImageNet and Tiered-ImageNet benchmarks to validate the effectiveness of KSTNet. Compared with the state of the art, the results show that the proposed method achieves favorable performance with minimal bells and whistles, especially in the case of one-shot learning.

8.
IEEE Trans Neural Netw Learn Syst; 34(4): 1838-1851, 2023 Apr.
Article in English | MEDLINE | ID: mdl-32502968

ABSTRACT

Hashing has been widely applied to multimodal retrieval on large-scale multimedia data due to its efficiency in computation and storage. In this article, we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable image-text and video-text retrieval. The proposed deep hashing framework leverages 2-D convolutional neural networks (CNNs) as the backbone network to capture the spatial information for image-text retrieval, and 3-D CNNs as the backbone network to capture the spatial and temporal information for video-text retrieval. In the DSMHN, two sets of modality-specific hash functions are jointly learned by explicitly preserving both intermodality similarities and intramodality semantic labels. Specifically, under the assumption that the learned hash codes should be optimal for the classification task, two stream networks are jointly trained to learn the hash functions by embedding the semantic labels in the resultant hash codes. Moreover, a unified deep multimodal hashing framework is proposed to learn compact and high-quality hash codes by simultaneously exploiting feature representation learning, intermodality similarity-preserving learning, semantic label-preserving learning, and hash function learning with different types of loss functions. The proposed DSMHN method is a generic and scalable deep hashing framework for both image-text and video-text retrieval, which can be flexibly integrated with different types of loss functions. We conduct extensive experiments for both single-modal and cross-modal retrieval tasks on four widely used multimodal retrieval data sets. Experimental results on both image-text and video-text retrieval tasks demonstrate that the DSMHN significantly outperforms the state-of-the-art methods.

9.
IEEE Trans Neural Netw Learn Syst; 34(4): 1732-1741, 2023 Apr.
Article in English | MEDLINE | ID: mdl-33064658

ABSTRACT

The adaptive neurofuzzy inference system (ANFIS) is a structured multioutput learning machine that has been successfully adopted in learning problems without noise or outliers. However, it does not work well for learning problems with noise or outliers. High-accuracy real-time forecasting of traffic flow is extremely difficult due to the noise and outliers arising from complex traffic conditions. In this study, a novel probabilistic learning system, a probabilistic regularized extreme learning machine combined with ANFIS (probabilistic R-ELANFIS), is proposed to capture the correlations among traffic flow data and, thereby, improve the accuracy of traffic flow forecasting. The new learning system adopts an objective function that minimizes both the mean and the variance of the model bias. The results of an experiment based on real-world traffic flow data showed that, compared with some kernel-based approaches, neural network approaches, and conventional ANFIS learning systems, the proposed probabilistic R-ELANFIS achieves competitive performance in terms of forecasting ability and generalizability.
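A minimal sketch of an objective that penalizes both the mean and the variance of the model bias, the key idea behind the loss described above (the linear model and the weight-decay term here are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

# Hypothetical linear model; the point is only the shape of the loss:
# (mean of bias)^2 + variance of bias + L2 weight penalty.
def objective(w, X, y, lam=0.1):
    bias = X @ w - y
    return bias.mean() ** 2 + bias.var() + lam * np.dot(w, w)

w = np.array([1.0])
X = np.array([[1.0], [2.0]])
y = np.array([1.0, 2.0])
print(objective(w, X, y))  # perfect fit: only the 0.1 weight penalty remains
```

Penalizing the variance of the bias, rather than only its squared mean, discourages solutions whose errors are small on average but wildly inconsistent across samples.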

10.
IEEE Trans Pattern Anal Mach Intell; 45(3): 3003-3018, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35759595

ABSTRACT

Weakly supervised Referring Expression Grounding (REG) aims to ground a particular target in an image described by a language expression while lacking the correspondence between target and expression. Two main problems exist in weakly supervised REG. First, the lack of region-level annotations introduces ambiguities between proposals and queries. Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects. To address the above challenges, we design an entity-enhanced adaptive reconstruction network (EARN). Specifically, EARN includes three modules: entity enhancement, adaptive grounding, and collaborative reconstruction. In entity enhancement, we calculate semantic similarity as supervision to select the candidate proposals. Adaptive grounding calculates the ranking score of candidate proposals based on subject, location, and context with hierarchical attention. Collaborative reconstruction measures the ranking result from three perspectives: adaptive reconstruction, language reconstruction, and attribute classification. The adaptive mechanism helps to alleviate the variance of different referring expressions. Experiments on five datasets show that EARN outperforms existing state-of-the-art methods. Qualitative results demonstrate that the proposed EARN can better handle situations where multiple objects of a particular category are situated together.

11.
IEEE Trans Image Process; 31: 314-326, 2022.
Article in English | MEDLINE | ID: mdl-34871171

ABSTRACT

Fine-grained image hashing is challenging due to the difficulties of capturing discriminative local information to generate hash codes. On the one hand, existing methods usually extract local features with a dense attention mechanism that focuses on dense local regions, which cannot contain diverse local information for fine-grained hashing. On the other hand, hash codes of the same class suffer from the large intra-class variation of fine-grained images. To address the above problems, this work proposes a novel sub-Region Localized Hashing (sRLH) to learn intra-class compact and inter-class separable hash codes that also contain diverse subtle local information for efficient fine-grained image retrieval. Specifically, to localize diverse local regions, a sub-region localization module is developed to learn discriminative local features by locating the peaks of non-overlapping sub-regions in the feature map. Different from localizing dense local regions, these peaks can guide the sub-region localization module to capture multifarious local discriminative information by paying close attention to dispersive local regions. To mitigate intra-class variations, hash codes of the same class are enforced to approach one common binary center. Meanwhile, Gram-Schmidt orthogonalization is performed on the binary centers to make the hash codes inter-class separable. Extensive experimental results on four widely used fine-grained image retrieval datasets demonstrate the superiority of sRLH over several state-of-the-art methods. The source code of sRLH will be released at https://github.com/ZhangYajie-NJUST/sRLH.git.
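The sub-region peak idea can be sketched as follows: partition the feature map into non-overlapping cells and take one peak per cell, which spreads the selected locations across the object instead of concentrating them (the function name and grid size are illustrative, not from the sRLH code):

```python
import numpy as np

# Locate one activation peak per non-overlapping sub-region of a 2-D
# feature map, yielding dispersed local responses rather than a single
# dense attention blob.
def subregion_peaks(fmap, grid=2):
    h, w = fmap.shape
    ch, cw = h // grid, w // grid
    peaks = []
    for i in range(grid):
        for j in range(grid):
            cell = fmap[i*ch:(i+1)*ch, j*cw:(j+1)*cw]
            r, c = np.unravel_index(np.argmax(cell), cell.shape)
            peaks.append((int(i*ch + r), int(j*cw + c)))  # map back to fmap coords
    return peaks

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(subregion_peaks(fmap))  # [(1, 1), (1, 3), (3, 1), (3, 3)]
```

A global argmax would return only the single strongest location; the per-cell peaks guarantee one discriminative location per sub-region.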

12.
IEEE Trans Pattern Anal Mach Intell; 44(12): 9904-9917, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34855586

ABSTRACT

Contextual information has been shown to be powerful for semantic segmentation. This work proposes a novel Context-based Tandem Network (CTNet) by interactively exploring the spatial contextual information and the channel contextual information, which can discover the semantic context for semantic segmentation. Specifically, the Spatial Contextual Module (SCM) is leveraged to uncover the spatial contextual dependency between pixels by exploring the correlation between pixels and categories. Meanwhile, the Channel Contextual Module (CCM) is introduced to learn the semantic features including the semantic feature maps and class-specific features by modeling the long-term semantic dependence between channels. The learned semantic features are utilized as the prior knowledge to guide the learning of SCM, which can make SCM obtain more accurate long-range spatial dependency. Finally, to further improve the performance of the learned representations for semantic segmentation, the results of the two context modules are adaptively integrated to achieve better results. Extensive experiments are conducted on four widely-used datasets, i.e., PASCAL-Context, Cityscapes, ADE20K and PASCAL VOC2012. The results demonstrate the superior performance of the proposed CTNet by comparison with several state-of-the-art methods. The source code and models are available at https://github.com/syp2ysy/CTNet.

13.
IEEE Trans Neural Netw Learn Syst; 33(1): 130-144, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33180734

ABSTRACT

Recently, many works on discriminant analysis have promoted the robustness of models against outliers by using the L1- or L2,1-norm as the distance metric. However, both their robustness and their discriminant power are limited. In this article, we present a new robust discriminant subspace (RDS) learning method for feature extraction, with an objective function formulated in a different form. To guarantee that the subspace is robust and discriminative, we measure the within-class distances based on the [Formula: see text]-norm and use the [Formula: see text]-norm to measure the between-class distances. This also endows our method with rotational invariance. Since the proposed model involves both [Formula: see text]-norm maximization and [Formula: see text]-norm minimization, it is very challenging to solve. To address this problem, we present an efficient nongreedy iterative algorithm. Besides, motivated by the trace ratio criterion, a mechanism is found that automatically balances the contributions of the different terms in our objective. RDS is very flexible, as it can be extended to other existing feature extraction techniques. An in-depth theoretical analysis of the algorithm's convergence is presented in this article. Experiments are conducted on several typical databases for image classification, and the promising results indicate the effectiveness of RDS.

14.
IEEE Trans Neural Netw Learn Syst; 33(4): 1752-1764, 2022 Apr.
Article in English | MEDLINE | ID: mdl-33378265

ABSTRACT

Recent studies on semantic segmentation exploit contextual information to address the problems of inconsistent parsing predictions for big objects and the ignorance of small objects. However, they utilize multilevel contextual information equally across pixels, overlooking that different pixels may demand different levels of context. Motivated by this intuition, we propose a novel global-guided selective context network (GSCNet) to adaptively select contextual information for improving scene parsing. Specifically, we introduce two global-guided modules, the global-guided global module (GGM) and the global-guided local module (GLM), to select, respectively, global context (GC) and local context (LC) for pixels. Given an input feature map, GGM jointly employs the input feature map and its globally pooled feature to learn the global contextual demand, based on which per-pixel GC is selected. GLM, in turn, adopts low-level features from the adjacent stage as LC and synthetically models the input feature map, its globally pooled feature, and the LC to generate the local contextual demand, based on which per-pixel LC is selected. Furthermore, we combine these two modules into a selective context block (SCB) and insert such SCBs at different levels of the network to propagate contextual information in a coarse-to-fine manner. Finally, we conduct extensive experiments to verify the effectiveness of the proposed model and achieve state-of-the-art performance on four challenging scene parsing data sets, i.e., Cityscapes, ADE20K, PASCAL Context, and COCO Stuff. In particular, GSCNet-101 obtains 82.6% on the Cityscapes test set without using coarse data and 56.22% on the ADE20K test set.


Subjects
Algorithms; Neural Networks, Computer
15.
Article in English | MEDLINE | ID: mdl-32386154

ABSTRACT

Haze interferes with the transmission of scene radiation and significantly degrades the color and details of outdoor images. Existing deep neural network-based image dehazing algorithms usually use common, general-purpose networks whose design does not model the image formation of the haze process well, which accordingly leads to dehazed images containing artifacts and haze residuals in some special scenes. In this paper, we propose a task-oriented network for image dehazing, where the network design is motivated by the image formation of the haze process. The task-oriented network involves a hybrid network containing an encoder-decoder network and a spatially variant recurrent neural network derived from the haze process. In addition, we develop a multi-stage dehazing algorithm to further improve the accuracy by filtering haze residuals in a step-by-step fashion. To constrain the proposed network, we develop a dual composition loss, a content-based pixel-wise loss, and a total variation constraint. We train the proposed network in an end-to-end manner and analyze its effect on image dehazing. Experimental results demonstrate that the proposed algorithm achieves favorable performance against state-of-the-art dehazing methods.

16.
IEEE Trans Neural Netw Learn Syst; 30(12): 3818-3832, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31725389

ABSTRACT

Recently, many studies on robust discriminant analysis have adopted the L1-norm as the distance metric, but their results are not robust enough to gain universal acceptance. To overcome this problem, the authors of this article present a nonpeaked discriminant analysis (NPDA) technique in which a cutting L1-norm is adopted as the distance metric. As this kind of norm can better eliminate heavy outliers in learning models, the proposed algorithm is expected to be stronger at feature extraction for data representation than existing robust discriminant analysis techniques based on the L1-norm distance metric. The authors also present a comprehensive analysis showing that the cutting L1-norm distance can be computed equally well using the difference between two special convex functions. Against this background, an efficient iterative algorithm is designed to optimize the proposed objective. Theoretical proofs of the algorithm's convergence are also presented. Theoretical insights and the effectiveness of the proposed method are validated by experiments on several real data sets.
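A capped (cutting) L1 distance can be illustrated in a few lines: each coordinate's contribution is truncated at a threshold, so a heavy outlier cannot dominate the metric (the truncation form shown here is an assumption for illustration; the paper's exact definition may differ):

```python
import numpy as np

# Capped L1 distance: each coordinate contributes at most c.
def capped_l1(x, y, c=1.0):
    return float(np.minimum(np.abs(x - y), c).sum())

x = np.array([0.0, 0.0, 0.0])
y = np.array([0.5, 0.2, 100.0])  # last coordinate is a gross outlier
print(capped_l1(x, y))  # about 1.7, where plain L1 would be about 100.7
```

The outlier coordinate contributes only the cap c instead of 100, which is the sense in which this family of norms "eliminates heavy outliers".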

17.
IEEE Trans Pattern Anal Mach Intell; 41(8): 2027-2034, 2019 Aug.
Article in English | MEDLINE | ID: mdl-30908192

ABSTRACT

Image retagging aims to improve the tag quality of social images by completing the missing tags, rectifying the noise-corrupted tags, and assigning new high-quality tags. Recent approaches simultaneously explore visual, user and tag information to improve the performance of image retagging by mining the tag-image-user associations. However, such methods will become computationally infeasible with the rapidly increasing number of images, tags and users. It has been proven that the anchor graph can significantly accelerate large-scale graph-based learning by exploring only a small number of anchor points. Inspired by this, we propose a novel Social anchor-Unit GrAph Regularized Tensor Completion (SUGAR-TC) method to efficiently refine the tags of social images, which is insensitive to the scale of data. First, we construct an anchor-unit graph across multiple domains (e.g., image and user domains) rather than traditional anchor graph in a single domain. Second, a tensor completion based on Social anchor-Unit GrAph Regularization (SUGAR) is implemented to refine the tags of the anchor images. Finally, we efficiently assign tags to non-anchor images by leveraging the relationship between the non-anchor units and the anchor units. Experimental results on a real-world social image database well demonstrate the effectiveness and efficiency of SUGAR-TC, outperforming the state-of-the-art methods.
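The final anchor-unit propagation step can be sketched as a convex combination of the anchors' refined tags (matrix names are illustrative; SUGAR-TC's actual formulation involves a tensor completion across image and user domains):

```python
import numpy as np

# Non-anchor tag scores as a convex combination of anchor tags:
# only the small anchor set needs the expensive refinement.
def propagate_tags(W, anchor_tags):
    # W: (n_images, n_anchors), rows sum to 1 (affinity to anchors)
    return W @ anchor_tags

W = np.array([[0.7, 0.3],
              [0.2, 0.8]])
anchor_tags = np.array([[1.0, 0.0],   # anchor 1 carries tag A
                        [0.0, 1.0]])  # anchor 2 carries tag B
print(propagate_tags(W, anchor_tags))
```

Because the cost of this step scales with the number of anchors rather than the number of images, the overall method stays insensitive to the data scale, as the abstract claims.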

18.
IEEE Trans Neural Netw Learn Syst; 30(5): 1429-1440, 2019 May.
Article in English | MEDLINE | ID: mdl-30281496

ABSTRACT

Cross-modal hashing has attracted increasing research attention due to its efficiency for large-scale multimedia retrieval. With simultaneous feature representation and hash function learning, deep cross-modal hashing (DCMH) methods have shown superior performance. However, most existing DCMH methods adopt binary quantization functions (e.g., [Formula: see text]) to generate hash codes, which limits the retrieval performance since binary quantization functions are sensitive to variations in numeric values. Toward this end, we propose, in this paper, a novel end-to-end ranking-based hashing framework, termed deep semantic-preserving ordinal hashing (DSPOH), to learn hash functions with deep neural networks by exploring the ranking structure of the feature dimensions. In DSPOH, the ordinal representation, which encodes the relative rank ordering of the feature dimensions, is explored to generate hash codes. Such ordinal embedding benefits from the numeric stability of rank correlation measures. To make the hash codes discriminative, the ordinal representation is expected to predict the class labels well, so that the ranking-based hash function learning is optimally compatible with the label prediction. Meanwhile, the intermodality similarity is preserved to guarantee that the hash codes of different modalities are consistent. Importantly, DSPOH can be effectively integrated with different types of network architectures, which demonstrates the flexibility and scalability of our proposed hashing framework. Extensive experiments on three widely used multimodal data sets show that DSPOH outperforms the state of the art for cross-modal retrieval tasks.
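Why ordinal representations are numerically stable can be shown in a couple of lines: the relative rank ordering of feature dimensions is invariant to any monotone rescaling of the features, while a fixed-threshold quantizer is not (a toy illustration, not the DSPOH architecture):

```python
import numpy as np

# Ordinal code from adjacent-dimension rank comparisons. Any monotone
# rescaling of the features leaves the code unchanged, unlike a fixed
# sign/threshold quantizer.
def ordinal_code(f):
    return (f[:-1] > f[1:]).astype(int)

f = np.array([0.9, 0.1, 0.5, 0.4])
print(ordinal_code(f))           # [1 0 1]
print(ordinal_code(10 * f + 3))  # identical code after monotone rescaling
```

A sign-based quantizer such as thresholding at zero would flip bits under the same rescaling, which is exactly the sensitivity to numeric variations the abstract criticizes.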

19.
IEEE Trans Image Process; 28(5): 2173-2186, 2019 May.
Article in English | MEDLINE | ID: mdl-30507504

ABSTRACT

Hashing has attracted increasing research attention in recent years due to its high efficiency of computation and storage in image retrieval. Recent works have demonstrated the superiority of simultaneously learning feature representations and hash functions with deep neural networks. However, most existing deep hashing methods learn the hash functions directly by encoding the global semantic information while ignoring the local spatial information of images. The loss of local spatial structure becomes the performance bottleneck of the hash functions, limiting their application to accurate similarity retrieval. In this paper, we propose a novel deep ordinal hashing (DOH) method, which learns ordinal representations to generate ranking-based hash codes by leveraging the ranking structure of the feature space from both local and global views. In particular, to effectively build the ranking structure, we propose to learn the rank correlation space by simultaneously exploiting the local spatial information from a fully convolutional network and the global semantic information from a convolutional neural network. More specifically, an effective spatial attention model is designed to capture the local spatial information by selectively learning well-specified locations closely related to the target objects. In this hashing framework, the local spatial and global semantic nature of images is captured in an end-to-end ranking-to-hashing manner. Experimental results on three widely used datasets demonstrate that the proposed DOH method significantly outperforms state-of-the-art hashing methods.

20.
IEEE Trans Pattern Anal Mach Intell; 41(9): 2070-2083, 2019 Sep.
Article in English | MEDLINE | ID: mdl-29994391

ABSTRACT

In this work, we investigate the problem of learning knowledge from the massive community-contributed images with rich weakly-supervised context information, which can benefit multiple image understanding tasks simultaneously, such as social image tag refinement and assignment, content-based image retrieval, tag-based image retrieval and tag expansion. Towards this end, we propose a Deep Collaborative Embedding (DCE) model to uncover a unified latent space for images and tags. The proposed method incorporates the end-to-end learning and collaborative factor analysis in one unified framework for the optimal compatibility of representation learning and latent space discovery. A nonnegative and discrete refined tagging matrix is learned to guide the end-to-end learning. To collaboratively explore the rich context information of social images, the proposed method integrates the weakly-supervised image-tag correlation, image correlation and tag correlation simultaneously and seamlessly. The proposed model is also extended to embed new tags in the uncovered space. To verify the effectiveness of the proposed method, extensive experiments are conducted on two widely-used social image benchmarks for multiple social image understanding tasks. The encouraging performance of the proposed method over the state-of-the-art approaches demonstrates its superiority.
