Weakly supervised temporal action localization with actionness-guided false positive suppression.
Li, Zhilin; Wang, Zilei; Liu, Qinying.
Affiliation
  • Li Z; National Engineering Laboratory for Brain-inspired Intelligence Technology and Application (NEL-BITA), University of Science and Technology of China, Hefei, 230026, China. Electronic address: lizhilin@mail.ustc.edu.cn.
  • Wang Z; National Engineering Laboratory for Brain-inspired Intelligence Technology and Application (NEL-BITA), University of Science and Technology of China, Hefei, 230026, China. Electronic address: zlwang@ustc.edu.cn.
  • Liu Q; National Engineering Laboratory for Brain-inspired Intelligence Technology and Application (NEL-BITA), University of Science and Technology of China, Hefei, 230026, China. Electronic address: lydyc@mail.ustc.edu.cn.
Neural Netw ; 175: 106307, 2024 Jul.
Article in En | MEDLINE | ID: mdl-38626617
ABSTRACT
Weakly supervised temporal action localization aims to locate the temporal boundaries of action instances in untrimmed videos using only video-level labels and to assign each instance the corresponding action category. It is generally solved by a "localization-by-classification" pipeline, which finds action instances by classifying video snippets. However, because this approach optimizes a video-level classification objective, the generated activation sequences often suffer interference from class-related scenes, resulting in a large number of false positives in the predictions. Many existing works treat background as an independent category, forcing the model to learn to distinguish background snippets. Under weakly supervised conditions, however, the background information is fuzzy and uncertain, making this approach extremely difficult. To alleviate the impact of false positives, we propose a new actionness-guided false positive suppression framework. Our method seeks to suppress false positive backgrounds without introducing a background category. First, we propose a self-training actionness branch to learn class-agnostic actionness, which minimizes the interference of class-related scene information by ignoring the video labels. Second, we propose a false positive suppression module to mine false positive snippets and suppress them. Finally, we introduce a foreground enhancement module, which guides the model to learn the foreground with the help of an attention mechanism as well as the class-agnostic actionness. We conduct extensive experiments on three benchmarks (THUMOS14, ActivityNet1.2, and ActivityNet1.3). The results demonstrate the effectiveness of our method in suppressing false positives, and it achieves state-of-the-art performance. Code: https://github.com/lizhilin-ustc/AFPS.
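The abstract describes three components: a class-agnostic actionness branch, a false positive suppression step, and an attention-based foreground enhancement. The sketch below is a minimal, hypothetical PyTorch illustration of how such snippet-level heads and an actionness-guided suppression rule could fit together; the module names, feature dimensions, thresholds, and the specific suppression rule are assumptions for illustration and are not taken from the authors' released code.

```python
# Hedged sketch of the actionness-guided false positive suppression idea.
# All names, shapes, and the suppression rule are assumptions, NOT the
# authors' implementation (see https://github.com/lizhilin-ustc/AFPS).
import torch
import torch.nn as nn


class AFPSSketch(nn.Module):
    """Hypothetical snippet-level heads: class activation, attention, actionness."""

    def __init__(self, feat_dim=2048, num_classes=20):
        super().__init__()
        # Class activation sequence (CAS) head: per-snippet class scores.
        self.cls_head = nn.Conv1d(feat_dim, num_classes, kernel_size=1)
        # Foreground attention head (supervised through the video-level loss).
        self.att_head = nn.Conv1d(feat_dim, 1, kernel_size=1)
        # Actionness head: class-agnostic; in the paper it is trained by
        # self-training, i.e. without using the video-level category labels.
        self.act_head = nn.Conv1d(feat_dim, 1, kernel_size=1)

    def forward(self, x):
        # x: (batch, T, feat_dim) snippet features -> (batch, feat_dim, T) for Conv1d.
        x = x.transpose(1, 2)
        cas = self.cls_head(x).transpose(1, 2)                         # (B, T, C)
        attention = torch.sigmoid(self.att_head(x)).transpose(1, 2)    # (B, T, 1)
        actionness = torch.sigmoid(self.act_head(x)).transpose(1, 2)   # (B, T, 1)
        return cas, attention, actionness


def suppress_false_positives(cas, attention, actionness, thresh=0.5):
    """Assumed suppression rule: snippets ranked highly by the attention but scored
    low by the class-agnostic actionness are treated as likely false positives
    (e.g. class-related scenes) and their class activations are down-weighted."""
    fp_mask = (attention > thresh) & (actionness < thresh)             # (B, T, 1)
    weights = torch.where(fp_mask, actionness, torch.ones_like(actionness))
    return cas * weights


if __name__ == "__main__":
    model = AFPSSketch()
    feats = torch.randn(2, 100, 2048)        # 2 videos, 100 snippets each
    cas, att, act = model(feats)
    cas = suppress_false_positives(cas, att, act)
    # Video-level scores via attention-weighted pooling, for a MIL classification loss.
    video_scores = (cas * att).sum(dim=1) / att.sum(dim=1).clamp(min=1e-6)
    print(video_scores.shape)                # torch.Size([2, 20])
```

In this sketch the attention head drives the video-level classification objective, while the actionness head acts as an independent, label-free signal used only to veto snippets that the classifier activates for scene-related reasons; the exact interaction in the paper may differ.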

Full text: 1 Database: MEDLINE Main subject: Video Recording Limit: Humans Language: En Journal: Neural Netw Journal subject: NEUROLOGY Publication year: 2024 Document type: Article