Hierarchical matching and reasoning for multi-query image retrieval.
Ji, Zhong; Li, Zhihao; Zhang, Yan; Wang, Haoran; Pang, Yanwei; Li, Xuelong.
Affiliation
  • Ji Z; School of Electrical and Information Engineering, Tianjin Key Laboratory of Brain-inspired Intelligence Technology, Tianjin University, Tianjin, 300072, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China. Electronic address: jizhong@tju.edu.cn.
  • Li Z; School of Electrical and Information Engineering, Tianjin Key Laboratory of Brain-inspired Intelligence Technology, Tianjin University, Tianjin, 300072, China. Electronic address: zh_li@tju.edu.cn.
  • Zhang Y; School of Electrical and Information Engineering, Tianjin Key Laboratory of Brain-inspired Intelligence Technology, Tianjin University, Tianjin, 300072, China. Electronic address: yzhang1995@tju.edu.cn.
  • Wang H; Baidu Research, Beijing, 100193, China. Electronic address: wanghaoran09@baidu.com.
  • Pang Y; School of Electrical and Information Engineering, Tianjin Key Laboratory of Brain-inspired Intelligence Technology, Tianjin University, Tianjin, 300072, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China. Electronic address: pyw@tju.edu.cn.
  • Li X; School of Artificial Intelligence, OPtics and ElectroNics (iOPEN) and the Key Laboratory of Intelligent Interaction and Applications, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi'an, 710072, China. Electronic address: li@nwpu.edu.cn.
Neural Netw; 173: 106200, 2024 May.
Article in En | MEDLINE | ID: mdl-38422836
ABSTRACT
Multi-Query Image Retrieval (MQIR) is a promising field that aims to retrieve the semantically relevant image given multiple region-specific text queries. Existing works mainly focus on a single-level similarity between image regions and text queries, which neglects the hierarchical guidance of multi-level similarities and results in incomplete alignments. In addition, the high-level semantic correlations that intrinsically connect different region-query pairs are rarely considered. To address these limitations, we propose a novel Hierarchical Matching and Reasoning Network (HMRN) for MQIR. It disentangles MQIR into three hierarchical semantic representations, which are responsible for capturing fine-grained local details, contextual global scopes, and high-level inherent correlations. HMRN consists of two modules: a Scalar-based Matching (SM) module and a Vector-based Reasoning (VR) module. Specifically, the SM module characterizes the multi-level alignment similarity, which consists of a fine-grained local-level similarity and a context-aware global-level similarity. The VR module is then developed to uncover the potential semantic correlations among multiple region-query pairs, which yields a high-level reasoning similarity. Finally, these three levels of similarity are aggregated in a joint similarity space to form the final similarity. Extensive experiments on the benchmark dataset demonstrate that HMRN substantially surpasses the current state-of-the-art methods. For instance, compared with the best existing method, Drill-down, the R@1 metric in the last round is improved by 23.4%. Our source code will be released at https://github.com/LZH-053/HMRN.
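The abstract names three similarity levels and the R@1 metric but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the HMRN implementation. It assumes regions and queries are already embedded as equal-length vectors, takes the local level to be the mean cosine over aligned region-query pairs, takes the global level to be the cosine between mean-pooled regions and mean-pooled queries, and treats the reasoning level as a caller-supplied placeholder score. All function names and the weighted-sum aggregation are hypothetical. Recall@K follows its usual definition: the fraction of query sets whose ground-truth image appears among the top K ranked results.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def local_similarity(regions, queries):
    """Assumed local level: mean cosine over aligned region-query pairs."""
    return sum(cosine(r, q) for r, q in zip(regions, queries)) / len(regions)


def global_similarity(regions, queries):
    """Assumed global level: cosine between mean-pooled regions and queries."""
    return cosine(mean_vector(regions), mean_vector(queries))


def joint_similarity(regions, queries, reasoning=0.0, weights=(1.0, 1.0, 1.0)):
    """Hypothetical aggregation: weighted sum of the three levels.

    `reasoning` is a placeholder for a high-level score; the abstract does not
    say how the VR module computes it, so it is not modelled here.
    """
    w_local, w_global, w_reason = weights
    return (w_local * local_similarity(regions, queries)
            + w_global * global_similarity(regions, queries)
            + w_reason * reasoning)


def recall_at_k(ranked_ids_per_query, true_ids, k=1):
    """Fraction of query sets whose ground-truth image is in the top k."""
    hits = sum(1 for ranked, true in zip(ranked_ids_per_query, true_ids)
               if true in ranked[:k])
    return hits / len(true_ids)


# Toy data (made up for illustration): two queries, two candidate images.
queries = [[1.0, 0.0], [0.0, 1.0]]
gallery = {
    "img_a": [[1.0, 0.0], [0.0, 1.0]],  # regions line up with the queries
    "img_b": [[0.0, 1.0], [1.0, 0.0]],  # same regions, but misaligned
}
ranked = sorted(gallery,
                key=lambda image_id: joint_similarity(gallery[image_id], queries),
                reverse=True)
r_at_1 = recall_at_k([ranked], ["img_a"], k=1)
```

In this toy case the two images have the same pooled (global) representation, so only the local term separates them, which illustrates why a single similarity level can be insufficient.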

Full text: 1 Database: MEDLINE Main subject: Problem Solving / Benchmarking Language: En Publication year: 2024 Document type: Article
