Results 1 - 20 of 53

1.
Sensors (Basel) ; 24(8)2024 Apr 21.
Article in English | MEDLINE | ID: mdl-38676267

ABSTRACT

The rapid increase in the number of vehicles has led to growing traffic congestion, traffic accidents, and motor vehicle crime rates, and the management of parking lots has become increasingly challenging. Vehicle-type recognition technology can reduce the human workload in vehicle management operations, so the application of image technology to vehicle-type recognition is of great significance for integrated traffic management. In this paper, an improved Faster Region-based Convolutional Neural Network (Faster R-CNN) model was proposed for vehicle-type recognition. Firstly, the output features of different convolution layers were combined to improve recognition accuracy. Then, the average precision (AP) of the recognition model was improved through the contextual features of the original image and an object bounding box optimization strategy. Finally, a comparison experiment was conducted on a vehicle image dataset with three vehicle types: cars, sport utility vehicles (SUVs), and vans. The experimental results show that the improved model can effectively identify vehicle types in images. The AP for the three vehicle types is 83.2%, 79.2%, and 78.4%, respectively, and the mean average precision (mAP) is 1.7% higher than that of the traditional Faster R-CNN model.

2.
Sensors (Basel) ; 24(2)2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38257525

ABSTRACT

Robotic manipulation requires object pose knowledge for the objects of interest. In order to perform typical household chores, a robot needs to be able to estimate 6D poses for objects such as water glasses or salad bowls. This is especially difficult for glass objects, as their depth data are mostly corrupted, and in RGB images, objects occluded by glass remain visible through it. Thus, in this paper, we propose to redefine the ground truth for training RGB-based pose estimators in two ways: (a) we apply a transparency-aware multisegmentation, in which an image pixel can belong to more than one object, and (b) we use transparency-aware bounding boxes, which always enclose whole objects, even if parts of an object are formally occluded by another object. The latter approach ensures that the size and scale of an object remain more consistent across different images. We train our pose estimator, originally designed for opaque objects, with three different ground-truth types on the ClearPose dataset. Simply by changing the training data to our transparency-aware segmentation, with no additional glass-specific feature changes in the estimator, the ADD-S AUC value increases by 4.3%. Such a multisegmentation can be created for every dataset that provides a 3D model of the object and its ground-truth pose.

3.
BMC Infect Dis ; 23(1): 32, 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36658559

ABSTRACT

BACKGROUND: Nontuberculous mycobacterial lung disease (NTM-LD) and Mycobacterium tuberculosis lung disease (MTB-LD) have similar clinical characteristics. As a result, NTM-LD is sometimes misdiagnosed as MTB-LD and treated inappropriately. To address this difficulty, we aimed to distinguish the two diseases on chest X-ray images using deep learning technology, which has recently been applied in various fields. METHODS: We retrospectively collected chest X-ray images from 3314 patients infected with Mycobacterium tuberculosis (MTB) or nontuberculous mycobacteria (NTM). After selecting the data according to the diagnostic criteria, various experiments were conducted to create the optimal deep learning model. Model performance was compared with that of a radiologist, and was additionally verified using newly collected MTB-LD and NTM-LD patient data. RESULTS: Among the implemented deep learning models, an ensemble model combining EfficientNet B4 and ResNet 50 performed best on the test data. The ensemble model also outperformed the radiologist on all evaluation metrics. In addition, the accuracy of the ensemble model was 0.85 for MTB-LD and 0.78 for NTM-LD on an additional validation dataset of newly collected patients. CONCLUSIONS: Previous studies reported that MTB-LD and NTM-LD are difficult to distinguish on chest X-ray images, but we have successfully distinguished the two diseases using deep learning methods. This study has the potential to aid clinical decisions when the two diseases need to be differentiated.


Subject(s)
Lung Diseases , Mycobacterium Infections, Nontuberculous , Mycobacterium tuberculosis , Pneumonia , Humans , Retrospective Studies , X-Rays , Mycobacterium Infections, Nontuberculous/diagnostic imaging , Mycobacterium Infections, Nontuberculous/drug therapy , Nontuberculous Mycobacteria , Machine Learning
4.
Sensors (Basel) ; 23(10)2023 May 22.
Article in English | MEDLINE | ID: mdl-37430876

ABSTRACT

Bounding box regression is a crucial step in object detection, directly affecting the localization performance of the detected objects. In small object detection especially, a well-designed bounding box regression loss can significantly alleviate the problem of missed small objects. However, the broad family of Intersection over Union (IoU) losses (BIoU losses) used in bounding box regression has two major problems: (i) BIoU losses cannot provide effective fitting information as the predicted box approaches the target box, resulting in slow convergence and inaccurate regression results; (ii) most localization loss functions do not fully utilize the spatial information of the target, namely the target's foreground area, during the fitting process. Therefore, this paper proposes the Corner-point and Foreground-area IoU loss (CFIoU loss) to overcome these issues. First, we use the normalized corner-point distance between the two boxes instead of the normalized center-point distance used in the BIoU losses, which effectively suppresses the degradation of BIoU losses to the plain IoU loss when the two boxes are close. Second, we add adaptive target information to the loss function to provide richer target information for optimizing the bounding box regression process, especially for small object detection. Finally, we conducted simulation experiments on bounding box regression to validate our hypothesis, and quantitatively compared the current mainstream BIoU losses and the proposed CFIoU loss on the small object public datasets VisDrone2019 and SODA-D using the anchor-based YOLOv5 and anchor-free YOLOv8 object detection algorithms. The experimental results demonstrate that YOLOv5s (+3.12% Recall, +2.73% mAP@0.5, and +1.91% mAP@0.5:0.95) and YOLOv8s (+1.72% Recall and +0.60% mAP@0.5), both incorporating the CFIoU loss, achieved the highest performance improvement on the VisDrone2019 test set. Similarly, YOLOv5s (+6% Recall, +13.08% mAP@0.5, and +14.29% mAP@0.5:0.95) and YOLOv8s (+3.36% Recall, +3.66% mAP@0.5, and +4.05% mAP@0.5:0.95), both incorporating the CFIoU loss, achieved the highest performance improvement on the SODA-D test set. These results indicate the effectiveness and superiority of the CFIoU loss in small object detection. Additionally, we conducted comparative experiments by fusing the CFIoU loss and the BIoU losses with the SSD algorithm, which is not proficient in small object detection. The SSD algorithm incorporating the CFIoU loss achieved the highest improvement in AP (+5.59%) and AP75 (+5.37%), indicating that the CFIoU loss can also improve the performance of algorithms that are not specialized for small object detection.
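
As a concrete illustration of the corner-point idea, the Python sketch below replaces DIoU-style normalized center-point distance with a normalized corner-point distance. The exact CFIoU formulation and its adaptive foreground-area term are not given in the abstract, so the penalty here is an assumption, not the authors' code.

```python
import torch

def corner_iou_loss(pred, target, eps=1e-7):
    """Sketch of a corner-point IoU-style loss for (x1, y1, x2, y2) boxes.

    The corner-distance penalty follows the abstract's stated idea
    (normalized corner-point distance in place of center-point distance);
    the adaptive foreground term of CFIoU is omitted here.
    """
    # Intersection area
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared diagonal of the smallest enclosing box, used for normalization
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    # Mean squared distance between matching corners; unlike a center-point
    # penalty, this stays nonzero when centers coincide but sizes differ.
    d_tl = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d_br = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    corner_penalty = (d_tl + d_br) / (2 * diag2)

    return 1 - iou + corner_penalty
```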

5.
Sensors (Basel) ; 23(10)2023 May 10.
Article in English | MEDLINE | ID: mdl-37430525

ABSTRACT

In the era of coronavirus disease (COVID-19), wearing a mask can effectively protect people from the risk of infection and largely reduce transmission in public places. To prevent the spread of the virus, public places need instruments that monitor whether people are wearing masks, which places high demands on the accuracy and speed of detection algorithms. To meet the demand for high accuracy and real-time monitoring, we propose a single-stage approach based on YOLOv4 that detects faces and determines whether masks are worn in a standardized manner. In this approach, we propose a new feature pyramid network based on the attention mechanism to reduce the loss of object information caused by sampling and pooling in convolutional neural networks. The network deeply mines the feature map for spatial and channel information, and multi-scale feature fusion equips the feature map with both location and semantic information. Based on the complete intersection over union (CIoU), a norm-based penalty function is proposed to improve positioning accuracy, which is more accurate for the detection of small objects; the new bounding box regression function is called Norm CIoU (NCIoU). This function is applicable to various object-detection bounding box regression tasks. A combination of the two functions is used to calculate the confidence loss, mitigating the algorithm's bias toward predicting no objects in the image. Moreover, we provide a dataset for recognizing faces and masks (RFM) that includes 12,133 realistic images. The dataset contains three categories: face, standardized mask, and non-standardized mask. Experiments conducted on the dataset demonstrate that the proposed approach achieves 69.70% mAP@0.5:0.95 and 73.80% AP75, outperforming the compared methods.


Subject(s)
COVID-19 , Humans , Algorithms , Recognition, Psychology , Neural Networks, Computer , Communication
6.
Sensors (Basel) ; 23(4)2023 Feb 04.
Article in English | MEDLINE | ID: mdl-36850373

ABSTRACT

Vision-based recognition of water-surface targets is easily influenced by reflections and ripples, resulting in misidentification. This paper proposes a detection method based on the fusion of 3D point clouds and visual information to detect and locate water-surface targets: the point clouds help reduce the impact of ripples and reflections, while the visual information enhances recognition accuracy. The method consists of three steps. Firstly, the water-surface target is detected using the CornerNet-Lite network, which yields the candidate target box and the camera detection confidence. Secondly, the 3D point cloud is projected onto the two-dimensional pixel plane, and the LiDAR detection confidence is calculated from the ratio between the projected area of the point cloud and the pixel area of the bounding box. The target confidence is calculated from the camera and LiDAR detection confidences, and the water-surface target is determined by combining the detection thresholds. Finally, the bounding box is used to select the 3D points of the target and estimate its 3D coordinates. The experimental results showed that this method reduced the misidentification rate and achieved 15.5% higher accuracy than the original CornerNet-Lite network. By incorporating the depth information from LiDAR, the position of the target relative to the origin of the detection coordinate system could be accurately estimated.
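
The Python sketch below illustrates the described confidence computation under assumed details: points are projected with a pinhole model, the projected point cloud area inside the candidate box is approximated by the inliers' axis-aligned extent, and the two confidences are fused with a simple convex combination (the paper's exact fusion rule is not stated in the abstract).

```python
import numpy as np

def lidar_box_confidence(points_xyz, K, box):
    """LiDAR confidence for one candidate box (assumed formulation).

    points_xyz: (N, 3) LiDAR points already in the camera frame.
    K: 3x3 camera intrinsic matrix. box: (x1, y1, x2, y2) in pixels.
    """
    pts = points_xyz[points_xyz[:, 2] > 0]        # keep points in front of camera
    uv = (K @ pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]                   # perspective division

    x1, y1, x2, y2 = box
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
             (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    if inside.sum() < 3:
        return 0.0, pts[inside]

    # Approximate the projected area by the axis-aligned extent of the inliers
    u, v = uv[inside, 0], uv[inside, 1]
    proj_area = (u.max() - u.min()) * (v.max() - v.min())
    box_area = (x2 - x1) * (y2 - y1)
    return min(proj_area / box_area, 1.0), pts[inside]

def fused_confidence(cam_conf, lidar_conf, w=0.5):
    # Simple convex combination as a placeholder fusion rule
    return w * cam_conf + (1 - w) * lidar_conf
```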

7.
Sensors (Basel) ; 22(21)2022 Nov 01.
Article in English | MEDLINE | ID: mdl-36366090

ABSTRACT

CNN-based object detectors have achieved great success in recent years. Available detectors adopt horizontal bounding boxes to locate various objects. However, in some unique scenarios, objects such as buildings and vehicles in aerial images may be densely arranged and have apparent orientations. Therefore, some approaches extend the horizontal bounding box to an oriented bounding box to better enclose such objects, usually by directly regressing the angle or the corners. However, this suffers from the discontinuous boundary problem caused by angular periodicity or corner ordering. In this paper, we propose a simple but efficient oriented object detector based on the YOLOv4 architecture. We regress the offset of an object's front point instead of its angle or corners to avoid the abovementioned problems. In addition, we introduce an intersection over union (IoU) correction factor to make the training process more stable. The experimental results on two public datasets, DOTA and HRSC2016, demonstrate that the proposed method significantly outperforms other methods in terms of detection speed while maintaining high accuracy. On DOTA, our proposed method achieved the highest mAP for the classes with prominent front-side appearance, such as small vehicles, large vehicles, and ships. The highly efficient YOLOv4 architecture increases detection speed by more than 25% compared to the other approaches.

8.
Sensors (Basel) ; 22(19)2022 Oct 02.
Article in English | MEDLINE | ID: mdl-36236593

ABSTRACT

Conventional point cloud simplification algorithms suffer from nonuniform simplification, a deficient reflection of point cloud characteristics, unreasonable weight distribution, and high computational complexity. A simplification algorithm, the multi-index weighting simplification algorithm (MIWSA), is proposed in this paper. First, the point cloud is organized with a bounding box and kd-trees to find the neighborhood of each point, and the points are divided into small segments. Second, a set of feature indexes is calculated for each point to characterize it. Third, the analytic hierarchy process (AHP) and criteria importance through intercriteria correlation (CRITIC) are applied to weight these indexes and determine whether each point is a feature point. Fourth, non-feature points are kept or discarded according to their spatial relationship with the feature points. To verify the effect of the MIWSA, both 3D model scanning datasets and field area scanning datasets are processed and analyzed. Accuracy for the 3D model scanning datasets is assessed by the surface area and patch numbers of the encapsulated surfaces, and that for the field area scanning datasets is evaluated by DEM error statistics. Compared with existing algorithms, the overall accuracy of the MIWSA is 5% to 15% better, and its running time is shorter than that of most. The experimental results illustrate that the MIWSA simplifies point clouds more precisely and uniformly.
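
As an illustration of neighborhood-based feature indexes, the sketch below computes one plausible index, the surface variation of each point's k-neighborhood found with a kd-tree; the MIWSA's actual indexes and its AHP/CRITIC weighting are not specified in the abstract, so this is an assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_variation(points, k=16):
    """One candidate per-point feature index: surface variation
    lambda_min / (lambda_1 + lambda_2 + lambda_3) of the local covariance,
    which is large near edges and corners and small on flat regions."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)      # neighbor 0 is the point itself
    scores = np.empty(len(points))
    for i, nbrs in enumerate(idx):
        nbr_pts = points[nbrs[1:]]
        cov = np.cov(nbr_pts.T)               # 3x3 covariance of the neighborhood
        eig = np.sort(np.linalg.eigvalsh(cov))
        scores[i] = eig[0] / (eig.sum() + 1e-12)
    return scores   # points with high scores are candidate feature points
```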

9.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 39(3): 462-470, 2022 Jun 25.
Article in Chinese | MEDLINE | ID: mdl-35788515

ABSTRACT

Percutaneous pulmonary puncture guided by computed tomography (CT) is one of the most effective tools for obtaining lung tissue and diagnosing lung cancer. Path planning is an important step for avoiding puncture complications and reducing patient pain and puncture mortality. In this work, a path planning method for lung puncture based on multi-level constraints is proposed. A digital model of the chest is first established from the patient's CT images. Fibonacci lattice sampling is then conducted on an ideal sphere centered on the tumor lesion to obtain a set of candidate paths. Finally, taking clinical puncture guidelines into account, an optimal path is obtained by the proposed multi-level constraint strategy, which combines the oriented bounding box tree (OBBTree) algorithm with a Pareto optimization algorithm. Results of simulation experiments demonstrated the effectiveness of the proposed method, which performs well at avoiding physical and physiological barriers. Hence, the method could aid physicians in selecting the puncture path.
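
Fibonacci lattice sampling itself is standard; the sketch below generates nearly uniform candidate entry points on a sphere around the lesion, from which candidate paths to the lesion center can be formed. The center and radius values are placeholders, not from the paper.

```python
import numpy as np

def fibonacci_sphere(n, center, radius):
    """Fibonacci lattice sampling of n nearly uniform points on a sphere.

    Each returned point is a candidate entry point; the candidate path is
    the segment from that point to `center` (the lesion)."""
    golden = (1 + 5 ** 0.5) / 2
    i = np.arange(n)
    z = 1 - 2 * (i + 0.5) / n              # uniform heights in (-1, 1)
    theta = 2 * np.pi * i / golden         # golden-angle steps in azimuth
    r = np.sqrt(1 - z ** 2)
    pts = np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)
    return np.asarray(center) + radius * pts

# e.g. 500 candidate entry points on a 150 mm sphere around a lesion
candidates = fibonacci_sphere(500, center=(12.0, -40.5, 88.0), radius=150.0)
```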


Subject(s)
Lung Neoplasms , Humans , Lung/diagnostic imaging , Lung Neoplasms/diagnostic imaging , Punctures , Thorax , Tomography, X-Ray Computed
10.
Sensors (Basel) ; 21(23)2021 Dec 03.
Article in English | MEDLINE | ID: mdl-34884103

ABSTRACT

Bounding box estimation by overlap maximization has significantly improved the state of the art in visual tracking, yet the gains in robustness and accuracy are restricted by the limited reference information, i.e., the initial target. In this paper, we present DCOM, a novel bounding box estimation method for visual tracking based on distribution calibration and overlap maximization. We assume every dimension of the modulation vector follows a Gaussian distribution, so that its mean and variance can borrow from those of similar targets in large-scale training datasets. As such, sufficient and reliable reference information can be obtained from the calibrated distribution, leading to more robust and accurate target estimation. Additionally, an updating strategy for the modulation vector is proposed to adapt to variations of the target object. Our method can be built on top of off-the-shelf networks without fine-tuning or extra parameters. It yields state-of-the-art performance on three popular benchmarks, GOT-10k, LaSOT, and NfS, while running at around 40 FPS, confirming its effectiveness and efficiency.
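
A minimal sketch of the calibration step, assuming the usual distribution-calibration formulation: each dimension's Gaussian borrows mean and variance from the k most similar targets in the training statistics. The function names and the blending rule are assumptions, since the abstract gives no formulas.

```python
import numpy as np

def calibrate_distribution(query_vec, base_means, base_vars, k=2, alpha=0.5):
    """Calibrate a modulation vector's Gaussian from similar training targets.

    query_vec: (D,) modulation vector of the current target.
    base_means / base_vars: (M, D) per-target statistics from training data.
    """
    # Find the k base targets whose mean vectors are closest to the query
    dists = np.linalg.norm(base_means - query_vec, axis=1)
    nearest = np.argsort(dists)[:k]

    # Calibrated Gaussian: blend the query with the neighbors' statistics
    mean = (query_vec + base_means[nearest].sum(axis=0)) / (k + 1)
    var = base_vars[nearest].mean(axis=0) + alpha   # alpha inflates the spread

    # Sample calibrated modulation vectors as extra reference information
    return np.random.normal(mean, np.sqrt(var), size=(10, query_vec.size))
```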


Subject(s)
Calibration , Normal Distribution
11.
Sensors (Basel) ; 21(5)2021 Mar 01.
Article in English | MEDLINE | ID: mdl-33804330

ABSTRACT

In object detection on remote sensing images, anchor-free detectors often suffer from false boxes and sample imbalance due to the use of single oriented features and a key-point-based boxing strategy. This paper presents a simple and effective anchor-free approach, RatioNet, with fewer parameters and higher accuracy for remote sensing images, which assigns all points in ground-truth boxes as positive samples to alleviate the sample imbalance problem. To deal with false boxes arising from single oriented features, the global features of objects are exploited in a novel regression that predicts boxes via the width and height of objects and the corresponding ratios l_ratio and t_ratio, which encode the location of objects. Besides, we introduce a ratio-center to assign different weights to pixels, which successfully preserves high-quality boxes and effectively improves performance. On the MS-COCO test-dev set, the proposed RatioNet achieves 49.7% AP.
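
One plausible reading of these regression targets is shown below: for a pixel inside the object, l_ratio and t_ratio are the fractions of the object's width and height lying to the left of and above the pixel, so a box can be decoded from a single point. This geometry is an assumption based on the abstract, not the paper's published code.

```python
def decode_ratio_box(px, py, w, h, l_ratio, t_ratio):
    """Decode an (x1, y1, x2, y2) box from RatioNet-style predictions
    (assumed geometry): l_ratio/t_ratio locate the pixel within the box."""
    x1 = px - l_ratio * w
    y1 = py - t_ratio * h
    return x1, y1, x1 + w, y1 + h

# A pixel at (120, 80) with w=60, h=40, sitting 25% from the left and 50%
# from the top of the object, decodes to the box (105, 60, 165, 100).
print(decode_ratio_box(120, 80, 60, 40, 0.25, 0.5))
```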

12.
Sensors (Basel) ; 21(9)2021 Apr 22.
Article in English | MEDLINE | ID: mdl-33922124

ABSTRACT

This paper provides an efficient way of addressing the problem of detecting or estimating the 6-Dimensional (6D) pose of objects from an RGB image. A quaternion is used to define an object's three-dimensional rotation, but the poses represented by q and -q are equivalent while the L2 loss between them is very large. Therefore, we define a new quaternion pose loss function to solve this problem. Based on this, we designed a new convolutional neural network named Q-Net to estimate an object's pose. Considering that the quaternion output is a unit vector, a normalization layer is added in Q-Net to keep the pose output on a four-dimensional unit sphere. We also propose a new algorithm, called the Bounding Box Equation, to obtain the 3D translation quickly and effectively from 2D bounding boxes. The algorithm provides an entirely new way of recovering the 3D rotation (R) and 3D translation (t) from only one RGB image, and it can upgrade any traditional 2D-box prediction algorithm to a 3D prediction model. We evaluated our model on the LineMod dataset, and experiments have shown that our methodology is more accurate and efficient in terms of L2 loss and computational time.
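
The sign ambiguity described here can be removed by taking the minimum L2 loss over both quaternion signs; the sketch below shows this standard remedy, which may differ in detail from the paper's exact loss function.

```python
import torch

def quaternion_pose_loss(q_pred, q_gt):
    """Sign-invariant quaternion loss for (N, 4) unit quaternions.

    q and -q encode the same rotation, so a plain L2 loss can be large even
    for a perfect prediction; taking the minimum over both signs removes
    the ambiguity."""
    l2_pos = torch.sum((q_pred - q_gt) ** 2, dim=1)
    l2_neg = torch.sum((q_pred + q_gt) ** 2, dim=1)
    return torch.min(l2_pos, l2_neg).mean()
```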

13.
Sensors (Basel) ; 21(21)2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34770381

ABSTRACT

Mobile construction machinery is accident-prone on dynamic construction sites, as the site environment is constantly changing and continuous safety monitoring by humans is impossible. These accidents usually take the form of machinery overturning or collapsing into risk areas, including foundation pits, slopes, or soft soil areas. Preventing mobile construction machinery from entering risk areas is therefore key, yet practical safety management techniques to achieve this are currently lacking. Using a wireless sensor device to collect the location of mobile construction machinery, this research develops a safety warning algorithm that prevents machinery from moving into risk areas and reduces onsite overturning and collapsing accidents. A modified axis-aligned bounding box method is proposed according to the movement patterns of mobile construction machinery, and the warning algorithm is developed based on onsite safety management regulations. The algorithm is validated in a real case simulation in which machinery enters the warning zone. The simulation results showed that the overall algorithm, combining location sensing technology and the modified bounding box method, could detect risk and give warnings in a timely manner. The algorithm can be implemented for the safety monitoring of mobile construction machinery in daily onsite management.
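
A minimal sketch of an axis-aligned bounding box warning check follows, assuming the common pattern of buffering each risk area into a surrounding warning zone; the paper's modified AABB method and its regulation-derived thresholds are not detailed in the abstract.

```python
def aabb_overlap(box_a, box_b):
    """Axis-aligned bounding box intersection test in 2D site coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

def check_machine(machine_box, risk_areas, buffer=5.0):
    """Classify a machine's AABB against risk areas expanded by a buffer
    distance (assumed warning rule, in meters of site coordinates)."""
    for x1, y1, x2, y2 in risk_areas:
        if aabb_overlap(machine_box, (x1, y1, x2, y2)):
            return "danger"    # machinery overlaps the risk area itself
        warn_zone = (x1 - buffer, y1 - buffer, x2 + buffer, y2 + buffer)
        if aabb_overlap(machine_box, warn_zone):
            return "warning"   # machinery inside the buffered warning zone
    return "safe"
```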


Subject(s)
Algorithms , Safety Management , Computer Simulation , Humans
14.
Sensors (Basel) ; 21(23)2021 Nov 26.
Article in English | MEDLINE | ID: mdl-34883887

ABSTRACT

The 3D vehicle trajectory in complex traffic conditions, such as crossroads and heavy traffic, is very useful in practice for autonomous driving. To accurately extract the 3D vehicle trajectory from a perspective camera at a crossroad, where vehicles span an angular range of 360 degrees, several problems must be solved: the narrow visual angle of single-camera scenes, vehicle occlusion under low camera perspectives, and the lack of vehicle physical information. In this paper, we propose a method for estimating the 3D bounding boxes of vehicles and extracting their trajectories using a deep convolutional neural network (DCNN) in an overlapping multi-camera crossroad scene. First, traffic data were collected using overlapping multi-cameras to obtain a wide range of trajectories around the crossroad. Then, 3D bounding boxes of vehicles were estimated and tracked in each single-camera scene through DCNN models (YOLOv4, multi-branch CNN) combined with camera calibration. Using this information, the 3D vehicle trajectory could be extracted on the ground plane of the crossroad by combining the results from the overlapping multi-cameras with a homography matrix. Finally, in experiments, the errors of the extracted trajectories were corrected through simple linear interpolation and regression, and the accuracy of the proposed method was verified by computing the difference against ground-truth data. Compared with other previously reported methods, our approach is more accurate and more practical.
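
Mapping per-camera detections onto the common ground plane with a homography is the step shared across cameras; a minimal sketch follows, assuming a calibrated 3x3 image-to-ground homography H is available for each camera.

```python
import numpy as np

def image_to_ground(uv, H):
    """Map (N, 2) pixel coordinates to ground-plane coordinates using a
    3x3 image-to-ground homography H obtained from camera calibration."""
    pts = np.hstack([uv, np.ones((len(uv), 1))])   # homogeneous coordinates
    ground = (H @ pts.T).T
    return ground[:, :2] / ground[:, 2:3]          # divide out the scale factor
```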

15.
J Digit Imaging ; 34(4): 846-852, 2021 08.
Article in English | MEDLINE | ID: mdl-34322753

ABSTRACT

Patients who are intubated with endotracheal tubes often receive chest x-ray (CXR) imaging to determine whether the tube is correctly positioned. When these CXRs are interpreted by a radiologist, they evaluate whether the tube needs to be repositioned and typically provide a measurement in centimeters between the endotracheal tube tip and carina. In this project, a large dataset of endotracheal tube and carina bounding boxes was annotated on CXRs, and a machine-learning model was trained to generate these boxes on new CXRs and to calculate a distance measurement between the tube and carina. This model was applied to a gold standard annotated dataset, as well as to all prospective data passing through our radiology system for two weeks. Inter-radiologist variability was also measured on a test dataset. The distance measurements for both the gold standard dataset (mean error = 0.70 cm) and prospective dataset (mean error = 0.68 cm) were noninferior to inter-radiologist variability (mean error = 0.70 cm) within an equivalence bound of 0.1 cm. This suggests that this model performs at an accuracy similar to human measurements, and these distance calculations can be used for clinical report auto-population and/or worklist prioritization of severely malpositioned tubes.
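
A minimal sketch of the distance calculation, assuming the tube tip is taken as the bottom-center of the endotracheal tube box, the carina as the center of its box, and that the DICOM pixel spacing converts pixels to centimeters; the model's actual post-processing may differ.

```python
import numpy as np

def tube_carina_distance(tube_box, carina_box, pixel_spacing_mm):
    """Distance in cm between the ETT tip and the carina from two
    (x1, y1, x2, y2) pixel boxes (assumed landmark conventions)."""
    tip = np.array([(tube_box[0] + tube_box[2]) / 2, tube_box[3]])
    carina = np.array([(carina_box[0] + carina_box[2]) / 2,
                       (carina_box[1] + carina_box[3]) / 2])
    dist_px = np.linalg.norm(tip - carina)
    return dist_px * pixel_spacing_mm / 10.0   # mm -> cm

# e.g. with a hypothetical 0.139 mm/pixel spacing:
# tube_carina_distance((980, 640, 1010, 820), (940, 900, 1040, 980), 0.139)
```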


Subject(s)
Intubation, Intratracheal , Trachea , Humans , Prospective Studies , Radiography , Trachea/diagnostic imaging , X-Rays
16.
Entropy (Basel) ; 23(11)2021 Nov 19.
Article in English | MEDLINE | ID: mdl-34828241

ABSTRACT

Several supervised machine learning algorithms focused on binary classification for solving daily problems can be found in the literature. The straight-line segment classifier stands out for its low complexity and its competitiveness with well-known conventional classifiers. This binary classifier is based on distances between points and two labeled sets of straight-line segments. Its training phase consists of finding the placement of the labeled straight-line segment extremities (and consequently their lengths) that gives the minimum mean square error. However, during the training phase, the straight-line segment lengths can grow significantly, negatively impacting the classification rate. Therefore, this paper proposes an approach for adjusting the placement of the labeled straight-line segment extremities to build reliable classifiers in a constrained search space (tuned by a scale factor parameter) in order to restrict their lengths. Ten artificial datasets and eight datasets from the UCI Machine Learning Repository were used to show that our approach yields promising results compared to other classifiers. We conclude that this classifier can be used in industry for decision-making problems, due to its straightforward interpretation and competitive classification rates.

17.
BMC Med Imaging ; 20(1): 37, 2020 04 15.
Article in English | MEDLINE | ID: mdl-32293303

ABSTRACT

BACKGROUND: Renal cancer is one of the 10 most common cancers in human beings. Laparoscopic partial nephrectomy (LPN) is an effective way to treat renal cancer, and localization and delineation of the renal tumor from pre-operative CT angiography (CTA) is an important step in LPN surgery planning. Recently, with the development of deep learning, deep neural networks can be trained to provide accurate pixel-wise renal tumor segmentation in CTA images. However, constructing a training dataset with a large number of pixel-wise annotations is a time-consuming task for radiologists, so weakly-supervised approaches are attracting growing research interest. METHODS: In this paper, we propose a novel weakly-supervised convolutional neural network (CNN) for renal tumor segmentation. A three-stage framework is introduced to train the CNN with weak annotations of renal tumors, i.e., the bounding boxes of renal tumors. The framework includes pseudo-mask generation, group training, and weighted training phases. Clinical abdominal CT angiographic images of 200 patients were used for evaluation. RESULTS: Extensive experimental results show that the proposed method achieves a higher dice coefficient (DSC) of 0.826 than two other existing weakly-supervised deep neural networks. Furthermore, the segmentation performance is close to that of a fully supervised deep CNN. CONCLUSIONS: The proposed strategy improves not only the efficiency of network training but also the precision of the segmentation.


Subject(s)
Computed Tomography Angiography/methods , Image Processing, Computer-Assisted/methods , Kidney Neoplasms/diagnostic imaging , Clinical Competence , Humans , Kidney Neoplasms/blood supply , Neural Networks, Computer , Preoperative Period , Supervised Machine Learning
18.
Sensors (Basel) ; 20(7)2020 Mar 30.
Article in English | MEDLINE | ID: mdl-32235541

ABSTRACT

With an infrared circumferential scanning system (IRCSS), we can realize long-time surveillance over a large field of view. Automatically recognizing targets in the field of view is a crucial component of improving environmental awareness under the trend of informatization, especially in defense systems. Target recognition consists of two subtasks: detection and identification, corresponding to the position and category of the target, respectively. In this study, we propose a deep convolutional neural network (DCNN)-based method to realize end-to-end target recognition in the IRCSS. Existing DCNN-based methods require a large annotated dataset for training, while public infrared datasets are mostly intended for target tracking. Therefore, we built an infrared target recognition dataset to both overcome the shortage of data and enhance the adaptability of the algorithm to various scenes. We then used data augmentation and exploited an optimal cross-domain transfer learning strategy for network training, designing the smoother L1 loss for bounding box regression to obtain better localization performance. In the experiments, the proposed method achieved 82.7 mAP, accomplishing effective end-to-end infrared target recognition with high accuracy.

19.
Sensors (Basel) ; 19(2)2019 Jan 15.
Article in English | MEDLINE | ID: mdl-30650645

ABSTRACT

Building damage accounts for a high percentage of post-natural-disaster assessment, so extracting buildings from optical remote sensing images is of great significance for natural disaster reduction and assessment. Traditional methods are mainly semi-automatic, requiring human-computer interaction, or rely purely on human interpretation. In this paper, inspired by recently developed deep learning techniques, we propose an improved Mask Region-based Convolutional Neural Network (Mask R-CNN) method that simultaneously detects the rotated bounding boxes of buildings and segments them from very complex backgrounds. The proposed method has two major improvements that make it well suited to the building extraction task. Firstly, instead of predicting horizontal rectangular bounding boxes as many other detectors do, we obtain the minimum enclosing rectangles of buildings by adding a new regression term: the principal direction θ of the rectangle. Secondly, a new layer integrating the advantages of both atrous convolution and the inception block is designed and inserted into the segmentation branch of the Mask R-CNN to help the branch learn more representative features. We test the proposed method on a newly collected large Google Earth remote sensing dataset with diverse buildings and very complex backgrounds. Experiments demonstrate that it obtains promising results.

20.
Sensors (Basel) ; 18(8)2018 Aug 17.
Article in English | MEDLINE | ID: mdl-30126096

ABSTRACT

Since remote sensing images are captured from above the target, for example from a satellite or airborne platform, ship targets can appear at any orientation. When detecting ships using horizontal bounding boxes, background clutter is included in the box, which makes it harder to detect the ship and find its precise location, especially when targets are in close proximity or close to the shore. To solve these problems, this paper proposes a deep learning algorithm using multiscale rotated bounding boxes to detect ships in complex backgrounds and obtain their location and orientation. When labeling the oriented targets, we use the five-parameter method to keep the box shape rectangular. The algorithm uses a pretrained deep network to extract features and two separate flow paths to output the result: one path predicts the target class, while the other predicts the location and angle information. In the training stage, we match the prior multiscale rotated bounding boxes to the ground-truth bounding boxes to obtain the positive sample information and use it to train the deep learning model; when matching the rotated bounding boxes, we narrow down the selection scope to reduce the amount of computation. In the testing stage, we use the trained model to predict and obtain the final result after score thresholding and non-maximum suppression post-processing. Experiments conducted on a remote sensing dataset show that the algorithm robustly detects ship targets under complex conditions, such as wave clutter backgrounds, targets in close proximity, ships close to the shore, and multiscale variation. Compared to other algorithms, our algorithm not only exhibits better ship detection performance but also obtains precise location and orientation information.
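
For reference, the sketch below converts a five-parameter rotated box (center, width, height, angle) to its four corner points, a common convention for oriented labels; the paper's exact parameter ordering and angle convention may differ.

```python
import numpy as np

def rotated_box_corners(cx, cy, w, h, theta):
    """Return the four corners of a box centered at (cx, cy) with width w,
    height h, rotated by theta radians (counter-clockwise)."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    half = np.array([[-w / 2, -h / 2], [w / 2, -h / 2],
                     [w / 2,  h / 2], [-w / 2,  h / 2]])
    return (half @ rot.T) + np.array([cx, cy])   # rotate, then translate

# A 100x30 ship box rotated 30 degrees about (250, 180):
print(rotated_box_corners(250, 180, 100, 30, np.radians(30)))
```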
