Enhancing Query Formulation for Universal Image Segmentation.

Qu, Yipeng; Kim, Joohee

Affiliation

Qu Y; Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA.
Kim J; Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA.

Sensors (Basel); 24(6), 2024 Mar 14.

Article in English | MEDLINE | ID: mdl-38544142

ABSTRACT

Recent advancements in image segmentation have been notably driven by Vision Transformers. These transformer-based models offer one versatile network structure capable of handling a variety of segmentation tasks. Despite their effectiveness, the pursuit of enhanced capabilities often leads to more intricate architectures and greater computational demands. OneFormer has responded to these challenges by introducing a query-text contrastive learning strategy active during training only. However, this approach has not completely addressed the inefficiency issues in text generation and the contrastive loss computation. To solve these problems, we introduce Efficient Query Optimizer (EQO), an approach that efficiently utilizes multi-modal data to refine query optimization in image segmentation. Our strategy significantly reduces the complexity of parameters and computations by distilling inter-class and inter-task information from an image into a single template sentence. Furthermore, we propose a novel attention-based contrastive loss. It is designed to facilitate a one-to-many matching mechanism in the loss computation, which helps object queries learn more robust representations. Beyond merely reducing complexity, our model demonstrates superior performance compared to OneFormer across all three segmentation tasks using the Swin-T backbone. Our evaluations on the ADE20K dataset reveal that our model outperforms OneFormer in multiple metrics by 0.2% in mean Intersection over Union (mIoU), 0.6% in Average Precision (AP), and 0.8% in Panoptic Quality (PQ). These results highlight the efficacy of our model in advancing the field of image segmentation.
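For readers unfamiliar with the semantic segmentation metric cited above, the following is a minimal sketch of how mean Intersection over Union (mIoU) is commonly computed from per-pixel class labels. It is not taken from the paper: the function name `mean_iou`, its parameters, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions. Benchmark implementations typically accumulate intersections and unions over an entire dataset before dividing, whereas this sketch scores a single flattened label map.

```python
from typing import Optional, Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: Optional[int] = None,
) -> float:
    """Mean IoU over flattened per-pixel class labels (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where prediction and truth agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c in the ground truth

    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue  # unlabeled pixels do not contribute to any class
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: classes absent from both maps are skipped
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with four pixels and two classes.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.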

Keywords

computer vision; image segmentation; panoptic segmentation; semantic segmentation; transformer

Full text: 1 | Database: MEDLINE | Language: English | Journal: Sensors (Basel) | Year: 2024 | Document type: Article | Country of affiliation: United States