Búsqueda | OPS/OMS Uruguay

Understanding Whitening Loss in Self-supervised Learning.

Huang, Lei; Ni, Yunhao; Weng, Xi; Anwer, Rao Muhammad; Khan, Salman; Yang, Ming-Hsuan; Khan, Fahad Shahbaz.

IEEE Trans Pattern Anal Mach Intell ; PP2024 Jun 21.

Artículo en Inglés | MEDLINE | ID: mdl-38905085

RESUMEN

A desirable objective in self-supervised learning (SSL) is to avoid feature collapse. Whitening loss guarantees collapse avoidance by minimizing the distance between embeddings of positive pairs under the conditioning that the embeddings from different views are whitened. In this paper, we propose a framework with an informative indicator to analyze whitening loss, which provides a clue to demystify several interesting phenomena and a pivoting point connecting to other SSL methods. We show that batch whitening (BW) based methods do not impose whitening constraints on the embedding but only require the embedding to be full-rank. This full-rank constraint is also sufficient to avoid dimensional collapse. We further demonstrate that the stable rank of the embedding is invariant during training by gradient descent, given the assumption that embedding is updated with an infinitely small learning rate. Based on our analysis, we propose channel whitening with random group partition (CW-RGP), which exploits the advantages of BW-based methods in preventing collapse and avoids their disadvantages requiring large batch size. Experimental results on ImageNet classification and COCO object detection reveal that the proposed CW-RGP possesses a promising potential for learning good representations.

SipMaskv2: Enhanced Fast Image and Video Instance Segmentation.

Cao, Jiale; Pang, Yanwei; Anwer, Rao Muhammad; Cholakkal, Hisham; Khan, Fahad Shahbaz; Shao, Ling.

IEEE Trans Pattern Anal Mach Intell ; 45(3): 3798-3812, 2023 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-37815954

RESUMEN

We propose a fast single-stage method for both image and video instance segmentation, called SipMask, that preserves the instance spatial information by performing multiple sub-region mask predictions. The main module in our method is a light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for the sub-regions within a bounding-box, enabling a better delineation of spatially adjacent instances. To better correlate mask prediction with object detection, we further propose a mask alignment weighting loss and a feature alignment scheme. In addition, we identify two issues that impede the performance of single-stage instance segmentation and introduce two modules, including a sample selection scheme and an instance refinement module, to address these two issues. Experiments are performed on both image instance segmentation dataset MS COCO and video instance segmentation dataset YouTube-VIS. On MS COCO test-dev set, our method achieves a state-of-the-art performance. In terms of real-time capabilities, it outperforms YOLACT by a gain of 3.0% (mask AP) under the similar settings, while operating at a comparable speed. On YouTube-VIS validation set, our method also achieves promising results. The source code is available at https://github.com/JialeCao001/SipMask.

Mask-Guided Attention Network and Occlusion-Sensitive Hard Example Mining for Occluded Pedestrian Detection.

Xie, Jin; Pang, Yanwei; Khan, Muhammad Haris; Anwer, Rao Muhammad; Khan, Fahad Shahbaz; Shao, Ling.

IEEE Trans Image Process ; 30: 3872-3884, 2021.

Artículo en Inglés | MEDLINE | ID: mdl-33275581

RESUMEN

Pedestrian detection relying on deep convolution neural networks has made significant progress. Though promising results have been achieved on standard pedestrians, the performance on heavily occluded pedestrians remains far from satisfactory. The main culprits are intra-class occlusions involving other pedestrians and inter-class occlusions caused by other objects, such as cars and bicycles. These result in a multitude of occlusion patterns. We propose an approach for occluded pedestrian detection with the following contributions. First, we introduce a novel mask-guided attention network that fits naturally into popular pedestrian detection pipelines. Our attention network emphasizes on visible pedestrian regions while suppressing the occluded ones by modulating full body features. Second, we propose the occlusion-sensitive hard example mining method and occlusion-sensitive loss that mines hard samples according to the occlusion level and assigns higher weights to the detection errors occurring at highly occluded pedestrians. Third, we empirically demonstrate that weak box-based segmentation annotations provide reasonable approximation to their dense pixel-wise counterparts. Experiments are performed on CityPersons, Caltech and ETH datasets. Our approach sets a new state-of-the-art on all three datasets. Our approach obtains an absolute gain of 10.3% in log-average miss rate, compared with the best reported results on the heavily occluded HO pedestrian set of the CityPersons test set. Code and models are available at: https://github.com/Leotju/MGAN.

Asunto(s)

Minería de Datos/métodos , Procesamiento de Imagen Asistido por Computador/métodos , Redes Neurales de la Computación , Peatones/clasificación , Humanos , Grabación en Video

Recognizing Actions Through Action-Specific Person Detection.

Khan, Fahad Shahbaz; Xu, Jiaolong; van de Weijer, Joost; Bagdanov, Andrew D; Anwer, Rao Muhammad; Lopez, Antonio M.

IEEE Trans Image Process ; 24(11): 4422-32, 2015 Nov.

Artículo en Inglés | MEDLINE | ID: mdl-26259079

RESUMEN

Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) the existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, the direct training of action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, the transfer learning is able to adapt an existing detector to propose higher quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: 1) Stanford-40 and 2) PASCAL VOC 2012. For the action detection task (i.e., both person localization and classification of the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach for the task of action classification (i.e., recognizing actions without localizing them). For this task, our approach, without using any ground-truth person localization at test time, outperforms on both data sets state-of-the-art methods, which do use person locations.

Semantic pyramids for gender and action recognition.

Khan, Fahad Shahbaz; van de Weijer, Joost; Anwer, Rao Muhammad; Felsberg, Michael; Gatta, Carlo.

IEEE Trans Image Process ; 23(8): 3633-45, 2014 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-24956369

RESUMEN

Person description is a challenging problem in computer vision. We investigated two major aspects of person description: 1) gender and 2) action recognition in still images. Most state-of-the-art approaches for gender and action recognition rely on the description of a single body part, such as face or full-body. However, relying on a single body part is suboptimal due to significant variations in scale, viewpoint, and pose in real-world images. This paper proposes a semantic pyramid approach for pose normalization. Our approach is fully automatic and based on combining information from full-body, upper-body, and face regions for gender and action recognition in still images. The proposed approach does not require any annotations for upper-body and face of a person. Instead, we rely on pretrained state-of-the-art upper-body and face detectors to automatically extract semantic information of a person. Given multiple bounding boxes from each body part detector, we then propose a simple method to select the best candidate bounding box, which is used for feature extraction. Finally, the extracted features from the full-body, upper-body, and face regions are combined into a single representation for classification. To validate the proposed approach for gender recognition, experiments are performed on three large data sets namely: 1) human attribute; 2) head-shoulder; and 3) proxemics. For action recognition, we perform experiments on four data sets most used for benchmarking action recognition in still images: 1) Sports; 2) Willow; 3) PASCAL VOC 2010; and 4) Stanford-40. Our experiments clearly demonstrate that the proposed approach, despite its simplicity, outperforms state-of-the-art methods for gender and action recognition.

Asunto(s)

Actigrafía/métodos , Biometría/métodos , Interpretación de Imagen Asistida por Computador/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Análisis para Determinación del Sexo/métodos , Imagen de Cuerpo Entero/métodos , Algoritmos , Inteligencia Artificial , Femenino , Humanos , Aumento de la Imagen/métodos , Masculino , Reproducibilidad de los Resultados , Semántica , Sensibilidad y Especificidad

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA