UNIMEMnet: Learning long-term motion and appearance dynamics for video prediction with a unified memory network.
Neural Netw; 168: 256-271, 2023 Nov.
Article
En | MEDLINE | ID: mdl-37774512
ABSTRACT
As a pixel-wise dense forecasting task, video prediction is challenging because of its high computational complexity, large future uncertainty, and highly complicated spatial-temporal patterns. Many deep learning methods have been proposed for the task and have brought significant improvements. However, they focus on modeling short-term spatial-temporal dynamics and fail to exploit long-term ones sufficiently; as a result, they tend to perform poorly when long-term forecasts are required. In this article, we propose a novel unified memory network (UNIMEMnet) for long-term video prediction, which can effectively exploit long-term motion-appearance dynamics and unify short-term and long-term spatial-temporal dynamics in a single architecture. In the UNIMEMnet, a dual-branch multi-scale memory module is carefully designed to extract and preserve long-term spatial-temporal patterns. In addition, a short-term spatial-temporal dynamics module and an alignment and fusion module are devised to capture short-term motion-appearance dynamics and coordinate them with the long-term ones recalled from the memory module. Extensive experiments on five video prediction datasets covering both synthetic and real-world scenarios validate the effectiveness and superiority of UNIMEMnet over state-of-the-art methods.
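The abstract gives the data flow but no implementation details. The following is a minimal PyTorch sketch of that flow (encode frames, model short-term dynamics, recall long-term patterns from a memory, then fuse the two), written under stated assumptions: every module name, layer choice, and shape here is hypothetical, and the single attention-based memory bank (`LongTermMemory`) merely stands in for the paper's dual-branch multi-scale memory module; this is not the authors' architecture.

```python
# Hypothetical sketch of the UNIMEMnet data flow described in the abstract.
# Module names, layer choices, and shapes are assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LongTermMemory(nn.Module):
    """Learnable memory bank read by attention; a stand-in for the
    dual-branch multi-scale memory module named in the abstract."""
    def __init__(self, num_slots=64, dim=32):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(num_slots, dim))

    def forward(self, query):                          # query: (B, C, H, W)
        b, c, h, w = query.shape
        q = query.flatten(2).transpose(1, 2)           # (B, HW, C)
        attn = torch.softmax(q @ self.slots.t(), -1)   # (B, HW, slots)
        out = attn @ self.slots                        # (B, HW, C)
        return out.transpose(1, 2).view(b, c, h, w)


class UNIMEMnetSketch(nn.Module):
    def __init__(self, in_ch=1, dim=32):
        super().__init__()
        self.encoder = nn.Conv2d(in_ch, dim, 3, padding=1)
        # Short-term spatial-temporal dynamics: a conv over two stacked frames.
        self.short_term = nn.Conv2d(dim * 2, dim, 3, padding=1)
        self.memory = LongTermMemory(dim=dim)
        # Alignment and fusion: merge short-term and recalled long-term features.
        self.fuse = nn.Conv2d(dim * 2, dim, 3, padding=1)
        self.decoder = nn.Conv2d(dim, in_ch, 3, padding=1)

    def forward(self, frames):                         # frames: (B, T, C, H, W), T >= 2
        f_prev = F.relu(self.encoder(frames[:, -2]))
        f_curr = F.relu(self.encoder(frames[:, -1]))
        short = F.relu(self.short_term(torch.cat([f_prev, f_curr], 1)))
        long = self.memory(short)                      # recall long-term patterns
        fused = F.relu(self.fuse(torch.cat([short, long], 1)))
        return self.decoder(fused)                     # next-frame prediction


x = torch.randn(2, 4, 1, 32, 32)                       # toy clip: 4 grayscale frames
print(UNIMEMnetSketch()(x).shape)                      # torch.Size([2, 1, 32, 32])
```

The softmax attention read over slot similarities is one common way to realize a learnable long-term memory; how UNIMEMnet actually writes memories and at which spatial scales is not specified in the abstract.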
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Motion (Physics)
Study type: Prognostic_studies / Risk_factors_studies
Language: En
Journal: Neural Netw
Journal subject: NEUROLOGY
Year: 2023
Document type: Article
Affiliation country: China