ABSTRACT
Accuracy and speed are the most important metrics for evaluating object tracking algorithms. However, when a deep fully convolutional neural network (CNN) is used for tracking, its deep features tend to cause tracking drift because of the effects of convolution padding, the receptive field (RF), and the overall network stride. The speed of the tracker also decreases. This article proposes a fully convolutional siamese network object tracking algorithm that combines an attention mechanism with the feature pyramid network (FPN), and uses heterogeneous convolution kernels to reduce the amount of computation (FLOPs) and the number of parameters. The tracker first uses a new fully convolutional network to extract image features, and introduces a channel attention mechanism into the feature extraction process to improve the representational ability of the convolutional features. It then uses the FPN to fuse the convolutional features of the high and low layers, learns the similarity of the fused features, and trains the fully convolutional network. Finally, heterogeneous convolution kernels replace the standard convolution kernels to improve the speed of the algorithm, thereby compensating for the efficiency loss caused by the feature pyramid model. The tracker is experimentally verified and analyzed on the VOT-2017, VOT-2018, OTB-2013, and OTB-2015 datasets. The results show that our tracker outperforms state-of-the-art trackers on these benchmarks.
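To make the claimed saving from heterogeneous kernels concrete, the following is a minimal sketch of how the cost of one layer might be counted. It assumes a HetConv-style layout that the abstract does not specify: in every filter, a fraction 1/P of the input channels keeps a k x k kernel and the remaining channels use a 1 x 1 kernel. The function names, the parameter `p`, and the layer sizes are hypothetical and chosen only for illustration; the cost is counted in multiply-accumulate operations, ignoring bias terms.

```python
def conv_macs(out_size, in_ch, out_ch, k):
    """Multiply-accumulates of a standard k x k convolution layer.

    out_size is the spatial side length of the (square) output feature map.
    """
    return out_size * out_size * in_ch * out_ch * k * k


def hetconv_macs(out_size, in_ch, out_ch, k, p):
    """Multiply-accumulates of a heterogeneous layer (assumed layout).

    Assumption: each filter keeps in_ch / p kernels at k x k and
    replaces the remaining kernels with 1 x 1 kernels.
    """
    if p < 1 or in_ch % p != 0:
        raise ValueError("p must be a positive divisor of in_ch")
    large = in_ch // p        # kernels kept at k x k in each filter
    small = in_ch - large     # kernels reduced to 1 x 1 in each filter
    per_position = out_ch * (large * k * k + small)
    return out_size * out_size * per_position


def reduction_ratio(k, p):
    """Closed-form cost of the heterogeneous layer relative to the standard one."""
    return 1.0 / p + (1.0 - 1.0 / p) / (k * k)


if __name__ == "__main__":
    # Hypothetical layer: 16 x 16 output, 256 input and output channels, 3 x 3 kernels.
    standard = conv_macs(16, 256, 256, 3)
    hetero = hetconv_macs(16, 256, 256, 3, p=4)
    print(standard, hetero, hetero / standard)
```

Under these assumptions, a 3 x 3 layer with P = 4 costs one third of the standard layer, and P = 1 reduces to the standard convolution. The same expression without the spatial factor gives the parameter count, so the parameters shrink by the same ratio.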