Video domain adaptation for semantic segmentation using perceptual consistency matching.
Ullah, Ihsan; An, Sion; Kang, Myeongkyun; Chikontwe, Philip; Lee, Hyunki; Choi, Jinwoo; Park, Sang Hyun.
Affiliations
  • Ullah I; Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea; Division of Intelligent Robotics, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea.
  • An S; Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea.
  • Kang M; Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea.
  • Chikontwe P; Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea.
  • Lee H; Division of Intelligent Robotics, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea.
  • Choi J; Department of Computer Science and Engineering, Kyung Hee University, Yongin, South Korea.
  • Park SH; Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea. Electronic address: shpark13135@dgist.ac.kr.
Neural Netw ; 179: 106505, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39002205
ABSTRACT
Unsupervised domain adaptation (UDA) aims to transfer knowledge from previously labeled, related datasets (sources) to a new unlabeled dataset (target). Despite impressive performance, existing approaches have largely focused on image-based UDA, while video-based UDA remains understudied due to the difficulty of adapting diverse modal video features and of modeling temporal associations efficiently. To address this, existing studies use optical flow to capture motion cues between consecutive in-domain frames, but optical flow incurs heavy compute requirements, and modeling flow patterns across diverse domains is equally challenging. In this work, we propose an adversarial domain adaptation approach for video semantic segmentation that aligns temporally associated pixels in successive source- and target-domain frames without relying on optical flow. Specifically, we introduce a Perceptual Consistency Matching (PCM) strategy that leverages perceptual similarity to identify pixels with high correlation across consecutive frames, and infers that such pixels should correspond to the same class. We can therefore improve prediction accuracy for video UDA by enforcing consistency not only between in-domain frames but also across domains, using PCM objectives during model training. Extensive experiments on public datasets show the benefit of our approach over existing state-of-the-art UDA methods. Our approach not only addresses a crucial task in video domain adaptation but also offers notable performance improvements with faster inference times.
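The core PCM idea described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the authors' implementation): it compares co-located per-pixel features of two consecutive frames by cosine similarity, masks in pixels whose similarity exceeds a threshold (assumed to belong to the same class), and penalizes disagreement between the two frames' class predictions on those pixels with a symmetric KL term. The function name, the threshold `tau`, and the use of co-located (unwarped) pixels are all simplifying assumptions for illustration.

```python
import numpy as np

def perceptual_consistency_loss(feat_t, feat_t1, pred_t, pred_t1, tau=0.9):
    """Illustrative sketch of a Perceptual Consistency Matching objective.

    feat_t, feat_t1 : (H, W, C) per-pixel features of consecutive frames
    pred_t, pred_t1 : (H, W, K) per-pixel class probabilities for the same frames
    tau             : similarity threshold selecting highly correlated pixels
    """
    eps = 1e-8
    # Cosine similarity between co-located pixel features of the two frames.
    a = feat_t / (np.linalg.norm(feat_t, axis=-1, keepdims=True) + eps)
    b = feat_t1 / (np.linalg.norm(feat_t1, axis=-1, keepdims=True) + eps)
    sim = (a * b).sum(axis=-1)                      # (H, W)

    # Keep only pixels whose perceptual similarity exceeds the threshold;
    # such pixels are assumed to share the same semantic class.
    mask = sim > tau
    if not mask.any():
        return 0.0

    # Symmetric KL divergence between the matched pixels' predictions
    # encourages temporally consistent labels.
    p, q = pred_t[mask], pred_t1[mask]
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))
```

In the paper's setting this kind of objective would be applied both within a domain (consecutive source or target frames) and across domains during adversarial training; the sketch shows only the per-pair loss.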

Full text: 1 Database: MEDLINE Main subject: Semantics / Video Recording Limits: Humans Language: English Journal: Neural Netw Journal subject: Neurology Publication year: 2024 Document type: Article Country of affiliation: South Korea