Coarse-to-Fine Semantic Alignment for Cross-Modal Moment Localization.

Hu, Yupeng; Nie, Liqiang; Liu, Meng; Wang, Kun; Wang, Yinglong; Hua, Xian-Sheng

IEEE Trans Image Process; 30: 5933-5943, 2021.

Article in English | MEDLINE | ID: mdl-34166192

ABSTRACT

Video moment localization, an important branch of video content analysis, has attracted extensive attention in recent years. However, it remains in its infancy because of two challenges: cross-modal semantic alignment and localization efficiency. To address them, we present a cross-modal semantic alignment network. Specifically, we first design a video encoder that generates moment candidates, learns their representations, and models their semantic relevance. Meanwhile, we design a query encoder to understand diverse query intentions. We then introduce a multi-granularity interaction module to explore in depth the semantic correlation between the two modalities. With this cross-modal semantic understanding, the model can localize the target moment effectively. Moreover, we introduce a semantic pruning strategy that reduces cross-modal retrieval overhead and thereby improves localization efficiency. Experimental results on two benchmark datasets demonstrate the superiority of our model over several state-of-the-art competitors.
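
The pipeline described in the abstract (generate moment candidates, score them against the query, prune before the expensive comparison) can be illustrated with a toy sketch. This is not the authors' network: mean pooling and cosine similarity stand in for the learned video and query encoders, the coarse per-clip score stands in for the semantic pruning strategy, and every name here (`generate_candidates`, `localize`, `max_len`, `keep`) is a hypothetical chosen for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]
Span = Tuple[int, int]  # half-open clip range [start, end)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors (assumed stand-in
    for a learned moment representation)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def generate_candidates(num_clips: int, max_len: int) -> List[Span]:
    """Enumerate every contiguous span of at most max_len clips."""
    return [
        (s, e)
        for s in range(num_clips)
        for e in range(s + 1, min(s + max_len, num_clips) + 1)
    ]


def localize(clip_feats: List[Vector], query_vec: Vector,
             max_len: int = 2, keep: int = 3) -> Tuple[Span, float]:
    """Return the best-matching span and its score for a non-empty video.

    Coarse stage: rank candidates by the average per-clip similarity,
    which reuses scores computed once per clip, and keep the top `keep`.
    Fine stage: re-score only the survivors with a pooled representation.
    """
    clip_scores = [cosine(c, query_vec) for c in clip_feats]
    candidates = generate_candidates(len(clip_feats), max_len)

    def coarse(span: Span) -> float:
        s, e = span
        return sum(clip_scores[s:e]) / (e - s)

    survivors = sorted(candidates, key=coarse, reverse=True)[:keep]
    scored = [
        (span, cosine(mean_pool(clip_feats[span[0]:span[1]]), query_vec))
        for span in survivors
    ]
    return max(scored, key=lambda item: item[1])


# Toy data: clips 1 and 2 together match the query direction [1, 1].
clips = [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
best_span, best_score = localize(clips, [1.0, 1.0])
```

In this toy setting the fine stage is evaluated on only `keep` candidates rather than all of them, which is the general idea behind pruning for efficiency; how the paper's strategy actually selects candidates is not specified in the abstract.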

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Language: En | Journal: IEEE Trans Image Process | Journal subject: Medical Informatics | Publication year: 2021 | Document type: Article
