ABSTRACT
Contextual information and the dependencies between dimensions are vital in image semantic segmentation. In this paper, we propose a multiple-attention mechanism network (MANet) for effective and efficient semantic segmentation. Concretely, the contributions are as follows: (1) a novel dual-attention mechanism for capturing feature dependencies in the spatial and channel dimensions, in which the adjacent position attention captures the dependencies between pixels; (2) a new cross-dimensional interactive attention feature fusion module, which strengthens the fusion of fine location structure information in low-level features with category semantic information in high-level features. We conduct extensive experiments on semantic segmentation benchmarks, including the PASCAL VOC 2012 and Cityscapes datasets. MANet achieves mIoU scores of 75.5% and 72.8% on PASCAL VOC 2012 and Cityscapes, respectively, outperforming previous popular semantic segmentation networks under the same conditions.
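The mIoU metric quoted above averages the per-class intersection-over-union between predicted and ground-truth labels. The following is a minimal sketch of that calculation on flattened label lists. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration, not the evaluation protocol used in the paper, and benchmark scores are normally accumulated over a whole dataset rather than a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that occur in pred or target.

    pred and target are equal-length sequences of integer class labels,
    e.g. a flattened segmentation map. (Hypothetical helper for illustration.)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both (an assumption)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the small example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.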
Subjects
Neural Networks, Computer; Volatile Organic Compounds; Image Processing, Computer-Assisted/methods; Semantics
ABSTRACT
Object detection is one of the key tasks in an automatic driving system. To address the difficulty of achieving high detection speed and high detection accuracy at the same time, a real-time object detection algorithm (MobileYOLO) based on YOLOv4 is proposed. First, the feature extraction network is replaced with MobileNetV2 to reduce the number of model parameters. Next, part of the standard convolution in PANet and the head network is replaced by depthwise separable convolution to further reduce the number of model parameters. Finally, an improved lightweight channel attention module, Efficient Channel Attention (ECA), is introduced to improve the feature expression ability during feature fusion, and the Single-Stage Headless (SSH) context module is added to the small object detection branch to increase the receptive field. The experimental results show that the improved algorithm has an accuracy rate of 90.7% on the KITTI dataset. Compared with YOLOv4, the proposed MobileYOLO model has 52.11 M fewer parameters, is one-fifth the size, and detects 70% faster.
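To illustrate why swapping standard convolution for depthwise separable convolution reduces the parameter count, the sketch below compares the weight counts of the two layer types. The function names and the example layer size (3×3 kernel, 256 input channels, 512 output channels) are assumptions chosen for illustration and are not taken from the MobileYOLO architecture. Bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1
    pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size, not from the paper.
    k, c_in, c_out = 3, 256, 512
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For this layer the standard convolution needs 1,179,648 weights and the separable version 133,376, roughly a ninefold reduction. The saving for a full network depends on which layers are replaced.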
Subjects
Algorithms; Automobile Driving; Research Design
ABSTRACT
This article presents a new perspective from control theory to interpret and solve the instability and mode collapse problems of generative adversarial networks (GANs). The dynamics of GANs are parameterized in the function space, and control-directed methods are applied to investigate GANs. First, linear control theory is used to analyze and understand GANs. It is proved that the stability depends only on the control parameters. Second, a proportional-integral-derivative (PID) controller is designed to improve the stability of GANs. GANs can be controlled to adaptively generate images by an overshoot rate that is related only to the PID control parameters. Third, a new PIDGAN is derived with a theoretical guarantee of stability. Fourth, to exploit the nonlinear characteristics of GANs, nonlinear control theory is applied to further analyze GANs and develop a feedback linearization control-based PIDGAN named NPIDGAN. Both PIDGAN and NPIDGAN not only improve stability but also prevent mode collapse. On five datasets covering a wide variety of image domains, the proposed models achieve superior performance at 1024×1024 resolution compared with state-of-the-art GANs, even when data are limited.
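For readers unfamiliar with PID control, the sketch below shows a textbook discrete PID controller driving a simple first-order system toward a setpoint. It is not the PIDGAN or NPIDGAN formulation, which acts on GAN dynamics in function space. The class `PID`, the function `simulate`, the plant dx/dt = -x + u, and the gain values are all assumptions made for illustration.

```python
class PID:
    """Textbook discrete PID controller (illustrative, not the paper's method)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        # Integral term accumulates the error over time.
        self.integral += error * self.dt
        # Derivative term uses a backward difference; zero on the first step.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(kp, ki, kd, setpoint=1.0, dt=0.01, steps=2000):
    """Drive the hypothetical plant dx/dt = -x + u with forward-Euler steps."""
    pid = PID(kp, ki, kd, dt)
    x = 0.0
    history = []
    for _ in range(steps):
        u = pid.step(setpoint - x)
        x += dt * (-x + u)
        history.append(x)
    return history


if __name__ == "__main__":
    trajectory = simulate(kp=2.0, ki=1.0, kd=0.05)
    print(f"final state: {trajectory[-1]:.4f}, peak: {max(trajectory):.4f}")
```

In this toy setting the gains alone determine how quickly the state settles and whether it overshoots the setpoint, which is the general property of PID control that the abstract refers to.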