CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval.

Li, Yewen; Ge, Mingyuan; Li, Mingyong; Li, Tiansong; Xiang, Sen

Affiliation

Li Y; School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China.
Ge M; School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China.
Li M; School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China.
Li T; School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China.
Xiang S; School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China.

Sensors (Basel); 23(7), 2023 Mar 24.

Article in English | MEDLINE | ID: mdl-37050499

ABSTRACT

With the proliferation of multi-modal data generated by various sensors, unsupervised multi-modal hashing retrieval has been extensively studied owing to its advantages in storage, retrieval efficiency, and label independence. However, existing unsupervised methods still face two obstacles: (1) because they cannot fully capture the complementary and co-occurrence information of multi-modal data, their similarity measures are inaccurate; (2) they suffer from unbalanced multi-modal learning, and the semantic structure of the data is corrupted when hash codes are binarized. To address these obstacles, we devise an effective CLIP-based Adaptive Graph Attention Network (CAGAN) for large-scale unsupervised multi-modal hashing retrieval. First, we use the multi-modal model CLIP to extract fine-grained semantic features, mine similarity information from different perspectives of the multi-modal data, and perform similarity fusion and enhancement. In addition, we propose an adaptive graph attention network to assist the learning of hash codes; it uses an attention mechanism to learn adaptive graph similarity across modalities and further aggregates the intrinsic neighborhood information of neighboring data nodes through a graph convolutional network to generate more discriminative hash codes. Finally, we employ an iterative approximate optimization strategy to mitigate the information loss in the binarization process. Extensive experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms several representative hashing methods in unsupervised multi-modal retrieval tasks.
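The pipeline outlined in the abstract (similarity fusion, attention-weighted neighborhood aggregation, and gradual binarization) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: random vectors stand in for CLIP features, the fusion weight `alpha`, the single linear layer `w`, and the `tanh(beta * z)` relaxation with growing `beta` are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import numpy as np

def cosine_similarity(x):
    """Pairwise cosine similarity between the rows of x."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

def fuse_similarity(img_feat, txt_feat, alpha=0.5):
    """Weighted sum of per-modality similarity matrices (alpha is an assumed weight)."""
    return alpha * cosine_similarity(img_feat) + (1 - alpha) * cosine_similarity(txt_feat)

def row_softmax(a):
    """Numerically stable softmax over each row."""
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def graph_layer(h, sim, w):
    """One aggregation step: attention weights from similarities, then a linear map."""
    attn = row_softmax(sim)   # each node attends to all nodes, weighted by similarity
    return attn @ h @ w       # aggregate neighbor features, project to hash length

def relaxed_hash(z, beta):
    """Smooth surrogate for sign(z); approaches sign(z) as beta grows."""
    return np.tanh(beta * z)

rng = np.random.default_rng(0)
n, d_img, d_txt, bits = 8, 16, 12, 4
img = rng.normal(size=(n, d_img))   # stand-in for image features (assumption)
txt = rng.normal(size=(n, d_txt))   # stand-in for text features (assumption)

sim = fuse_similarity(img, txt)
h = np.concatenate([img, txt], axis=1)
w = rng.normal(size=(d_img + d_txt, bits))
z = graph_layer(h, sim, w)

codes = np.where(z >= 0, 1, -1)     # final binary hash codes
# Gap between the relaxed codes and the binary codes for increasing beta.
gaps = [np.abs(relaxed_hash(z, beta) - codes).max() for beta in (1.0, 5.0, 50.0)]
```

In a trained model, `w` and the attention weights would be learned and `beta` would be increased over training iterations, so that the continuous outputs drift toward binary values instead of being quantized in a single lossy step.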

Keywords

attention mechanism; deep hashing; graph convolutional networks; multi-modal retrieval; unsupervised learning


Full text: 1 | Collection: 01-international | Database: MEDLINE | Language: English | Journal: Sensors (Basel) | Year: 2023 | Document type: Article | Country of affiliation: China
