Search | VHL Regional Portal

Learning feature relationships in CNN model via relational embedding convolution layer.

Xiong, Shengzhou; Tan, Yihua; Wang, Guoyou; Yan, Pei; Xiang, Xuanyu.

Neural Netw ; 179: 106510, 2024 Nov.

Article in English | MEDLINE | ID: mdl-39024707

ABSTRACT

Establishing the relationships among hierarchical visual attributes of objects in the visual world is crucial for human cognition. The classic convolution neural network (CNN) can successfully extract hierarchical features but ignore the relationships among features, resulting in shortcomings compared to humans in areas like interpretability and domain generalization. Recently, algorithms have introduced feature relationships by external prior knowledge and special auxiliary modules, which have been proven to bring multiple improvements in many computer vision tasks. However, prior knowledge is often difficult to obtain, and auxiliary modules bring additional consumption of computing and storage resources, which limits the flexibility and practicality of the algorithm. In this paper, we aim to drive the CNN model to learn the relationships among hierarchical deep features without prior knowledge and consumption increasing, while enhancing the fundamental performance of some aspects. Firstly, the task of learning the relationships among hierarchical features in CNN is defined and three key problems related to this task are pointed out, including the quantitative metric of connection intensity, the threshold of useless connections, and the updating strategy of relation graph. Secondly, Relational Embedding Convolution (RE-Conv) layer is proposed for the representation of feature relationships in convolution layer, followed by a scheme called use & disuse strategy which aims to address the three problems of feature relation learning. Finally, the improvements brought by the proposed feature relation learning scheme have been demonstrated through numerous experiments, including interpretability, domain generalization, noise robustness, and inference efficiency. In particular, the proposed scheme outperforms many state-of-the-art methods in the domain generalization community and can be seamlessly integrated with existing methods for further improvement. Meanwhile, it maintains comparable precision to the original CNN model while reducing floating point operations (FLOPs) by approximately 50%.

Subject(s)

Algorithms , Neural Networks, Computer , Humans , Deep Learning

Learning to Holistically Detect Bridges From Large-Size VHR Remote Sensing Imagery.

Li, Yansheng; Luo, Junwei; Zhang, Yongjun; Tan, Yihua; Yu, Jin-Gang; Bai, Song.

IEEE Trans Pattern Anal Mach Intell ; PP2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38683714

ABSTRACT

Bridge detection in remote sensing images (RSIs) plays a crucial role in various applications, but it poses unique challenges compared to the detection of other objects. In RSIs, bridges exhibit considerable variations in terms of their spatial scales and aspect ratios. Therefore, to ensure the visibility and integrity of bridges, it is essential to perform holistic bridge detection in large-size very-high-resolution (VHR) RSIs. However, the lack of datasets with large-size VHR RSIs limits the deep learning algorithms' performance on bridge detection. Due to the limitation of GPU memory in tackling large-size images, deep learning-based object detection methods commonly adopt the cropping strategy, which inevitably results in label fragmentation and discontinuous prediction. To ameliorate the scarcity of datasets, this paper proposes a large-scale dataset named GLH-Bridge comprising 6,000 VHR RSIs sampled from diverse geographic locations across the globe. These images encompass a wide range of sizes, varying from 2,048 × 2,048 to 16,384 × 16,384 pixels, and collectively feature 59,737 bridges. These bridges span diverse backgrounds, and each of them has been manually annotated, using both an oriented bounding box (OBB) and a horizontal bounding box (HBB). Furthermore, we present an efficient network for holistic bridge detection (HBD-Net) in large-size RSIs. The HBD-Net presents a separate detector-based feature fusion (SDFF) architecture and is optimized via a shape-sensitive sample re-weighting (SSRW) strategy. The SDFF architecture performs inter-layer feature fusion (IFF) to incorporate multi-scale context in the dynamic image pyramid (DIP) of the large-size image, and the SSRW strategy is employed to ensure an equitable balance in the regression weight of bridges with various aspect ratios. Based on the proposed GLH-Bridge dataset, we establish a bridge detection benchmark including the OBB and HBB tasks, and validate the effectiveness of the proposed HBD-Net. Additionally, cross-dataset generalization experiments on two publicly available datasets illustrate the strong generalization capability of the GLH-Bridge dataset. The dataset and source code will be released at https://luo-z13.github.io/GLH-Bridge-page/.

NCA-Net for Tracking Multiple Objects across Multiple Cameras.

Tan, Yihua; Tai, Yuan; Xiong, Shengzhou.

Sensors (Basel) ; 18(10)2018 Oct 11.

Article in English | MEDLINE | ID: mdl-30314285

ABSTRACT

Tracking multiple pedestrians across multi-camera scenarios is an important part of intelligent video surveillance and has great potential application for public security, which has been an attractive topic in the literature in recent years. In most previous methods, artificial features such as color histograms, HOG descriptors and Haar-like feature were adopted to associate objects among different cameras. But there are still many challenges caused by low resolution, variation of illumination, complex background and posture change. In this paper, a feature extraction network named NCA-Net is designed to improve the performance of multiple objects tracking across multiple cameras by avoiding the problem of insufficient robustness caused by hand-crafted features. The network combines features learning and metric learning via a Convolutional Neural Network (CNN) model and the loss function similar to neighborhood components analysis (NCA). The loss function is adapted from the probability loss of NCA aiming at object tracking. The experiments conducted on the NLPR_MCT dataset show that we obtain satisfactory results even with a simple matching operation. In addition, we embed the proposed NCA-Net with two existing tracking systems. The experimental results on the corresponding datasets demonstrate that the extracted features using NCA-net can effectively make improvement on the tracking performance.

Cauchy graph embedding based diffusion model for salient object detection.

Tan, Yihua; Li, Yansheng; Chen, Chen; Yu, Jin-Gang; Tian, Jinwen.

J Opt Soc Am A Opt Image Sci Vis ; 33(5): 887-98, 2016 05 01.

Article in English | MEDLINE | ID: mdl-27140886

ABSTRACT

Salient object detection has been a rather hot research topic recently, due to its potential applications in image compression, scene classification, image registration, and so forth. The overwhelming majority of existing computational models are designed based on computer vision techniques by using lots of image cues and priors. Actually, salient object detection is derived from the biological perceptual mechanism, and biological evidence shows that the spread of the spatial attention generates the object attention. Inspired by this, we attempt to utilize the emerging spread mechanism of object attention to construct a new computational model. A novel Cauchy graph embedding based diffusion (CGED) model is proposed to fulfill the spread process. Combining the diffusion model and attention prediction model, a salient object detection approach is presented through perceptually grouping the multiscale diffused attention maps. The effectiveness of the proposed approach is validated on the salient object dataset. The experimental results show that the CGED process can obviously improve the performance of salient object detection compared with the input spatial attention map, and the proposed approach can achieve performance comparable to that of state-of-the-art approaches.

Subject(s)

Diagnostic Imaging/methods , Pattern Recognition, Automated/methods , Algorithms , Animals , Computer Simulation , Diffusion , Humans , Image Processing, Computer-Assisted/methods , Medical Informatics , Models, Statistical , Optics and Photonics

Aircraft Detection in High-Resolution SAR Images Based on a Gradient Textural Saliency Map.

Tan, Yihua; Li, Qingyun; Li, Yansheng; Tian, Jinwen.

Sensors (Basel) ; 15(9): 23071-94, 2015 Sep 11.

Article in English | MEDLINE | ID: mdl-26378543

ABSTRACT

This paper proposes a new automatic and adaptive aircraft target detection algorithm in high-resolution synthetic aperture radar (SAR) images of airport. The proposed method is based on gradient textural saliency map under the contextual cues of apron area. Firstly, the candidate regions with the possible existence of airport are detected from the apron area. Secondly, directional local gradient distribution detector is used to obtain a gradient textural saliency map in the favor of the candidate regions. In addition, the final targets will be detected by segmenting the saliency map using CFAR-type algorithm. The real high-resolution airborne SAR image data is used to verify the proposed algorithm. The results demonstrate that this algorithm can detect aircraft targets quickly and accurately, and decrease the false alarm rate.

Maximal entropy random walk for region-based visual saliency.

Yu, Jin-Gang; Zhao, Ji; Tian, Jinwen; Tan, Yihua.

IEEE Trans Cybern ; 44(9): 1661-72, 2014 Sep.

Article in English | MEDLINE | ID: mdl-25137693

ABSTRACT

Visual saliency is attracting more and more research attention since it is beneficial to many computer vision applications. In this paper, we propose a novel bottom-up saliency model for detecting salient objects in natural images. First, inspired by the recent advance in the realm of statistical thermodynamics, we adopt a novel mathematical model, namely, the maximal entropy random walk (MERW) to measure saliency. We analyze the rationality and superiority of MERW for modeling visual saliency. Then, based on the MERW model, we establish a generic framework for saliency detection. Different from the vast majority of existing saliency models, our method is built on a purely region-based strategy, which is able to yield high-resolution saliency maps with well preserved object shapes and uniformly highlighted salient regions. In the proposed framework, the input image is first over-segmented into superpixels, which are taken as the primary units for subsequent procedures, and regional features are extracted. Then, saliency is measured according to two principles, i.e., uniqueness and visual organization, both implemented in a unified approach, i.e., the MERW model based on graph representation. Intensive experimental results on publicly available datasets demonstrate that our method outperforms the state-of-the-art saliency models.

Biologically inspired multilevel approach for multiple moving targets detection from airborne forward-looking infrared sequences.

Li, Yansheng; Tan, Yihua; Li, Hang; Li, Tao; Tian, Jinwen.

J Opt Soc Am A Opt Image Sci Vis ; 31(4): 734-44, 2014 Apr 01.

Article in English | MEDLINE | ID: mdl-24695135

ABSTRACT

In this paper, a biologically inspired multilevel approach for simultaneously detecting multiple independently moving targets from airborne forward-looking infrared (FLIR) sequences is proposed. Due to the moving platform, low contrast infrared images, and nonrepeatability of the target signature, moving targets detection from FLIR sequences is still an open problem. Avoiding six parameter affine or eight parameter planar projective transformation matrix estimation of two adjacent frames, which are utilized by existing moving targets detection approaches to cope with the moving infrared camera and have become the bottleneck for the further elevation of the moving targets detection performance, the proposed moving targets detection approach comprises three sequential modules: motion perception for efficiently extracting motion cues, attended motion views extraction for coarsely localizing moving targets, and appearance perception in the local attended motion views for accurately detecting moving targets. Experimental results demonstrate that the proposed approach is efficient and outperforms the compared state-of-the-art approaches.

Subject(s)

Air , Biomimetics/methods , Infrared Rays , Motion , Optical Imaging/methods , Motion Perception

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL