ABSTRACT
Currently, lightweight small object detection algorithms for unmanned aerial vehicles (UAVs) often employ group convolutions, resulting in high Memory Access Cost (MAC) and rendering them unsuitable for edge devices that rely on parallel computing. To address this issue, we propose the SOD-YOLO model based on YOLOv7, which incorporates a DSDM-LFIM backbone network and a small object detection branch. The DSDM-LFIM backbone, which combines Deep-Shallow Downsampling Modules (DSD Modules) and Lightweight Feature Integration Modules (LFI Modules), avoids excessive use of group convolutions and element-wise operations. The DSD Module extracts both deep and shallow features from feature maps with fewer parameters to obtain richer feature representations. The LFI Module is a dual-branch feature integration module designed to consolidate feature information. Experimental results demonstrate that SOD-YOLO achieves an AP50 of 50.7% at 72.5 FPS on the VisDrone validation set. Compared to YOLOv7, our model reduces computational cost by 20.25% and the number of parameters by 17.89%. After scaling the number of channels in the model, it achieves an AP50 of 33.4% with an inference time of 27.3 ms on the Atlas 200I DK A2. These results indicate that the SOD-YOLO model can effectively perform small object detection in the large volumes of aerial images captured by UAVs.
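The MAC-versus-FLOPs argument above can be illustrated with a simplified cost model (a sketch with hypothetical layer sizes, not SOD-YOLO's actual configuration): grouped convolution divides the multiply-accumulates by the group count, but feature-map traffic, the dominant term in MAC, is unchanged.

```python
def conv_flops(h, w, c_in, c_out, k, groups=1):
    # Multiply-accumulates of a k x k convolution over an h x w output map.
    return h * w * k * k * (c_in // groups) * c_out

def conv_mac(h, w, c_in, c_out, k, groups=1):
    # Simplified memory access cost: read input map + read weights + write output map.
    return h * w * c_in + k * k * (c_in // groups) * c_out + h * w * c_out

# Hypothetical layer: 40x40 feature map, 256 -> 256 channels, 3x3 kernel.
std_flops = conv_flops(40, 40, 256, 256, 3)
grp_flops = conv_flops(40, 40, 256, 256, 3, groups=8)
std_mac = conv_mac(40, 40, 256, 256, 3)
grp_mac = conv_mac(40, 40, 256, 256, 3, groups=8)
# FLOPs drop by the full group factor of 8, but MAC drops far less,
# which is why grouped layers are memory-bound on parallel hardware.
```

Under this model the grouped layer does 8x less arithmetic yet still pays most of the memory traffic, matching the motivation stated in the abstract.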
ABSTRACT
This research presents an innovative methodology for monitoring the jet trajectory during the jetting process using imagery captured by unmanned aerial vehicles (UAVs). The approach integrates UAV imagery with an offline learnable prompt vector module (OPVM) to enhance trajectory monitoring accuracy and stability. Leveraging a high-resolution camera mounted on a UAV, image enhancement is proposed to address geometric and photometric distortion in jet trajectory images, and the Faster R-CNN network is deployed to detect objects within the images and precisely identify the jet trajectory in the video stream. Subsequently, the offline learnable prompt vector module is incorporated to further refine trajectory predictions, thereby improving monitoring accuracy and stability. In particular, the OPVM not only learns the visual characteristics of the jet trajectory but also incorporates its textual features, adopting a bimodal approach to trajectory analysis. Additionally, the OPVM is trained offline, minimizing additional memory and computational resource requirements. Experimental findings underscore the method's remarkable precision of 95.4% and its efficiency in monitoring the jet trajectory, laying a solid foundation for advancements in trajectory detection and tracking. This methodology holds significant potential for application in firefighting systems and industrial processes, offering a robust framework to address dynamic trajectory monitoring challenges and augment computer vision capabilities in practical scenarios.
ABSTRACT
Lodging is a crucial factor limiting wheat yield and quality in wheat breeding. Accurate and timely grading of winter wheat lodging is therefore of great practical importance for agricultural insurance companies assessing agricultural losses and for selecting good varieties. However, in actual production, manual field investigation of the inclination angle and lodging area of winter wheat is time-consuming, laborious, subjective, and unreliable. This study addresses these issues by designing MLP_U-Net, a classification-semantic segmentation multitask neural network model that can accurately estimate the inclination angle and lodging area of winter wheat lodging, and can comprehensively, qualitatively, and quantitatively evaluate winter wheat lodging grading. The model is based on the U-Net architecture and improves the shift MLP module structure to achieve network refinement and segmentation for complex tasks. It utilizes a shared encoder to enhance robustness, improve classification accuracy, and strengthen the segmentation network, exploiting the correlation between the lodging-degree and lodging-area parameters. This study used 82 winter wheat varieties sourced from the national winter wheat regional experiment (water land group, southern Huang-Huai-Hai area) at the Henan Modern Agriculture Research and Development Base, located in Xinxiang City, Henan Province. Winter wheat lodging images were collected using a UAV remote sensing platform, and lodging datasets were created from different time sequences and different UAV flight heights. These datasets aid in segmenting and classifying winter wheat lodging degrees and areas. The results show that MLP_U-Net demonstrates superior detection performance on a small-sample dataset.
The accuracies of winter wheat lodging degree and lodging area grading were 96.1% and 92.2%, respectively, at a UAV flight height of 30 m, and 84.1% and 84.7%, respectively, at a flight height of 50 m. These findings indicate that MLP_U-Net is highly robust and efficient in accurately completing the winter wheat lodging-grading task, providing a technical reference for UAV remote sensing of winter wheat disaster severity and loss assessment.
ABSTRACT
Global navigation satellite systems (GNSSs) provide a common positioning method that utilizes satellite signals to determine the spatial location of a receiver. However, standalone GNSS positioning is subject to several error sources, including instrumental, procedural, and environmental factors arising during signal transmission, and the final positioning error can reach several meters or more. Thus, real-time kinematic (RTK) correction and post-mission precise point positioning (PPP) processing technologies have been proposed to improve accuracy and accomplish precise position measurements. To evaluate the geolocation accuracy of mosaicked UAV images of an abandoned mine site, we compared the orthomosaic images and digital elevation models obtained using standalone GNSS positioning, differential (RTK) GNSS positioning, and post-mission PPP processing. Across the three error evaluation measures (i.e., relative camera location error, ground control point-based absolute image mapping error, and volumetric difference of mine tailings), we found that RTK GNSS positioning performed best in terms of relative camera location error and absolute image mapping error, and that PPP post-processing effectively reduced the error (69.5% of the average total relative camera location error and 59.3% of the average total absolute image mapping error) relative to standalone GNSS positioning. Although differential (RTK) GNSS positioning is widely used in applications that require very high accuracy, post-mission PPP processing can also be used in fields where operating expensive RTK GNSS receiving equipment is not feasible or network RTK services are unavailable.
Subjects
Technology, Biomechanical Phenomena
ABSTRACT
As one of the important timber species in China, Cunninghamia lanceolata is widely distributed in southern China. Information on individual trees and their crowns plays an important role in accurately monitoring forest resources, so it is particularly important to obtain such information accurately for individual C. lanceolata trees. For stands with high canopy closure, the key to correctly extracting this information is whether mutually occluded and adhering crowns can be accurately segmented. Taking the Fujian Jiangle State-owned Forest Farm as the research area and using UAV images as the data source, we developed a method to extract individual-tree crown information based on deep learning and the watershed algorithm. First, the deep learning neural network model U-Net was used to segment the canopy coverage area of C. lanceolata, and then a traditional image segmentation algorithm was used to segment individual trees and obtain the number of trees and individual-tree crown information. Keeping the same training, validation, and test sets, the extraction results for the canopy coverage area were compared between the U-Net model and traditional machine learning methods [random forest (RF) and support vector machine (SVM)]. Then, two individual-tree segmentation results were compared: one using the marker-controlled watershed algorithm alone, and the other combining the U-Net model with the marker-controlled watershed algorithm. The results showed that the segmentation accuracy (SA), precision, IoU (intersection over union), and F1-score (harmonic mean of precision and recall) of the U-Net model were higher than those of RF and SVM. Compared with RF, the four indicators increased by 4.6%, 14.9%, 7.6%, and 0.05, respectively; compared with SVM, they increased by 3.3%, 8.5%, 8.1%, and 0.05, respectively.
In terms of extracting the number of trees, the overall accuracy (OA) of the U-Net model combined with the marker-controlled watershed algorithm was 3.7% higher than that of the marker-controlled watershed algorithm alone, with the mean absolute error (MAE) decreasing by 3.1%. In terms of extracting individual-tree crown area and crown width, R2 increased by 0.11 and 0.09, mean squared error decreased by 8.49 m2 and 4.27 m, and MAE decreased by 2.93 m2 and 1.72 m, respectively. The combination of the deep learning U-Net model and the watershed algorithm overcomes the challenges in accurately extracting the number of trees and individual-tree crown information in high-density pure C. lanceolata plantations. It is an efficient and low-cost method for extracting tree crown parameters, which can provide a basis for developing intelligent forest resource monitoring.
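The second stage above can be sketched as a minimal marker-controlled watershed on a binary canopy mask (a simplified stand-in for the paper's pipeline, assuming markers are regional maxima of the distance transform and flooding is a plain priority-queue version):

```python
import heapq
import numpy as np
from scipy import ndimage

def marker_watershed(mask):
    """Split touching crowns in a binary canopy mask (e.g., a U-Net output).

    Markers are regional maxima of the distance transform; flooding then
    proceeds from the highest distance outward via a priority queue.
    """
    dist = ndimage.distance_transform_edt(mask)
    local_max = (dist == ndimage.maximum_filter(dist, size=3)) & mask
    labels, _ = ndimage.label(local_max)          # one marker per maxima blob
    heap = [(-dist[i, j], int(i), int(j)) for i, j in zip(*np.nonzero(labels))]
    heapq.heapify(heap)
    h, w = mask.shape
    while heap:
        _, i, j = heapq.heappop(heap)
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= ni < h and 0 <= nj < w and mask[ni, nj] and labels[ni, nj] == 0:
                labels[ni, nj] = labels[i, j]     # flood the marker's label
                heapq.heappush(heap, (-dist[ni, nj], ni, nj))
    return labels

# Two overlapping disks stand in for two adhering crowns.
yy, xx = np.mgrid[:21, :31]
mask = ((yy - 10) ** 2 + (xx - 10) ** 2 <= 36) | ((yy - 10) ** 2 + (xx - 20) ** 2 <= 36)
labels = marker_watershed(mask)
```

In the sketch the two disk centres receive distinct labels even though the disks touch, which is exactly the occlusion/adhesion case the abstract describes.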
Subjects
Cunninghamia, Humans, Algorithms, Random Forest Algorithm, China, Neural Networks (Computer)
ABSTRACT
Longan yield estimation is an important practice before longan harvests. Statistical longan yield data can provide an important reference for market pricing and improving harvest efficiency and can directly determine the economic benefits of longan orchards. At present, the statistical work concerning longan yields requires high labor costs. Aiming at the task of longan yield estimation, and combining deep learning and regression analysis, this study proposed a method to calculate longan yield in complex natural environments. First, a UAV was used to collect video images of the longan canopy at the mature stage. Second, the CF-YD and SF-YD models were constructed to identify Cluster_Fruits and Single_Fruits, respectively, automatically counting targets directly from images. Finally, according to sample data collected from real orchards, a regression analysis was carried out between the target quantity detected by the models and the real target quantity, and estimation models were constructed for the number of Cluster_Fruits on a single longan tree and the number of Single_Fruits on a single Cluster_Fruit. An error analysis against manual counts showed an average error rate of 2.66% for the number of Cluster_Fruits and 2.99% for the number of Single_Fruits. The results show that the proposed method is effective at estimating longan yields and can provide guidance for improving the efficiency of longan fruit harvests.
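The final regression step can be sketched as a least-squares calibration of detected counts against manual counts (the counts below are hypothetical illustration data, not the study's measurements):

```python
import numpy as np

def fit_count_calibration(detected, actual):
    """Least-squares line: actual ~ a * detected + b, calibrating the
    detector's counts against manually counted ground truth."""
    a, b = np.polyfit(detected, actual, 1)
    return a, b

def mean_error_rate(estimated, actual):
    # Average relative error between estimated and true counts.
    return float(np.mean(np.abs(np.asarray(estimated) - actual) / actual))

# Hypothetical per-tree cluster counts; the detector undercounts occluded fruit.
detected = np.array([38, 45, 52, 61, 70])
actual = np.array([41, 50, 57, 66, 77])
a, b = fit_count_calibration(detected, actual)
estimated = a * detected + b
```

The fitted slope exceeds 1, compensating for systematic undercounting, and the mean error rate of the calibrated estimates is the kind of figure the abstract reports (2.66% and 2.99% in the study).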
ABSTRACT
Visual geo-localization plays a crucial role in positioning and navigation for unmanned aerial vehicles; its goal is to match the same geographic target from different views. This is a challenging task due to drastic variations in viewpoint and appearance. Previous methods have focused on mining features inside the images but underestimated the influence of external elements and the interaction of various representations. Inspired by multimodal learning and bilinear pooling, we propose a pioneering feature fusion network (MBF) to address the inherent differences between drone and satellite views. We observe that a UAV's status, such as flight height, changes the size of the image's field of view, and that local parts of the target scene play an important role in extracting discriminative features. We therefore present two approaches to exploit these priors. The first module adds status information to the network by transforming it into word embeddings, which are concatenated with image embeddings in the Transformer block to learn status-aware features. Second, global and local part feature maps from the same viewpoint are correlated and reinforced by hierarchical bilinear pooling (HBP) to improve the robustness of the feature representation. Through these approaches, we achieve more discriminative deep representations that facilitate geo-localization more effectively. Our experiments on existing benchmark datasets show significant performance gains, reaching new state-of-the-art results. Remarkably, recall@1 reaches 89.05% in the drone localization task and 93.15% in the drone navigation task on University-1652, and the model shows strong robustness at different flight heights on the SUES-200 dataset.
Subjects
Awareness, Benchmarking, Humans, Electric Power Supplies, Learning, Unmanned Aerial Devices
ABSTRACT
Wheat is one of the important food crops, and it is often subjected to different stresses during its growth. Lodging is a common disaster for wheat at the filling and maturity stages, which not only affects the quality of wheat grains but also causes severe yield reduction. Assessing the degree of wheat lodging is of great significance for yield estimation, wheat harvesting, and agricultural insurance claims. In particular, point cloud data extracted from unmanned aerial vehicle (UAV) images have provided technical support for accurately assessing the degree of wheat lodging. However, point cloud data are difficult to process due to their cluttered distribution, which limits their wide application. Therefore, a classification method for wheat lodging degree based on dimensionality-reduction images from point cloud data was proposed. Firstly, 2D images were obtained from the 3D point cloud data of UAV images of the wheat field, generated by dimensionality reduction based on the Hotelling transform and a point cloud interpolation method. Then three convolutional neural network (CNN) models, AlexNet, VGG16, and MobileNetV2, were used to classify different lodging degrees of wheat. Finally, a self-built wheat lodging dataset was used to evaluate the classification models, aiming to improve the universality and scalability of the lodging discrimination method. The results showed that, with MobileNetV2, the dimensionality-reduction images from point clouds obtained by the proposed method achieved good results in identifying the lodging degree of wheat: the F1-score of the classification model was 96.7% at filling and 94.6% at maturity. In conclusion, the point cloud dimensionality-reduction method proposed in this study enables accurate identification of wheat lodging degree at the field scale.
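The dimensionality-reduction step, projecting the 3D cloud onto its principal plane via the Hotelling (PCA) transform, can be sketched as follows (a minimal version; the paper additionally interpolates the projected points into a dense 2D image):

```python
import numpy as np

def hotelling_project(points):
    """Project an N x 3 point cloud onto its two principal axes
    (the Hotelling/PCA transform used for dimensionality reduction)."""
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)              # eigenvalues ascending
    axes = vecs[:, np.argsort(vals)[::-1][:2]]    # top-2 principal axes
    return centered @ axes                        # N x 2 coordinates

# Synthetic planar cloud: all variance lies in two directions, so the
# projection should preserve the total variance.
rng = np.random.default_rng(0)
pts = np.c_[rng.normal(size=(200, 2)), np.zeros(200)]
proj = hotelling_project(pts)
```

For a planar cloud the third eigenvalue is zero, so the 2D projection loses no variance; for a real wheat canopy cloud the discarded axis is the one with the least spread.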
ABSTRACT
The affine scale-invariant feature transform (ASIFT) algorithm is a feature extraction algorithm with affine and scale invariance, which is suitable for image feature matching using unmanned aerial vehicles (UAVs). However, the matching process suffers from problems such as low efficiency and mismatches. To improve matching efficiency, the proposed algorithm first simulates image distortion based on position and orientation system (POS) information from real-time UAV measurements to reduce the number of simulated images. Then, the scale-invariant feature transform (SIFT) algorithm is used for feature point detection, and the extracted feature points are combined with the binary robust invariant scalable keypoints (BRISK) descriptor to generate binary feature descriptors, which are matched using the Hamming distance. Finally, to improve the matching accuracy of the UAV images, a false-match elimination algorithm based on random sample consensus (RANSAC) is proposed. Through four groups of experiments, the proposed algorithm is compared with SIFT and ASIFT. The results show that the algorithm optimizes the matching effect and improves the matching speed.
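The binary matching and RANSAC filtering stages can be sketched in NumPy (a simplified stand-in: brute-force Hamming matching of packed binary descriptors, and a plain RANSAC affine fit rather than the paper's exact elimination scheme):

```python
import numpy as np

def hamming_match(desc_a, desc_b):
    """Brute-force Hamming matching of packed binary descriptors (uint8 rows)."""
    x = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])
    dist = np.unpackbits(x, axis=-1).sum(axis=-1)   # popcount per pair
    return dist.argmin(axis=1), dist.min(axis=1)

def ransac_affine(src, dst, n_iter=200, thresh=2.0, seed=0):
    """Estimate a 2-D affine transform and flag inliers (mismatch rejection)."""
    rng = np.random.default_rng(seed)
    A_h = np.c_[src, np.ones(len(src))]             # homogeneous source points
    best_inliers = np.zeros(len(src), bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), 3, replace=False)
        M, *_ = np.linalg.lstsq(A_h[idx], dst[idx], rcond=None)
        inl = np.linalg.norm(A_h @ M - dst, axis=1) < thresh
        if inl.sum() > best_inliers.sum():
            best_inliers = inl
    M, *_ = np.linalg.lstsq(A_h[best_inliers], dst[best_inliers], rcond=None)
    return M, best_inliers

# Synthetic check: 17 correct matches under an affine map, 3 gross mismatches.
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, (20, 2))
dst = np.c_[src, np.ones(20)] @ np.array([[0.9, 0.1], [-0.1, 0.9], [5.0, -3.0]])
dst[17:] += 50.0                                    # inject false matches
M, inliers = ransac_affine(src, dst)
```

In practice the descriptors would come from BRISK and the keypoints from SIFT detection, as in the abstract; the sketch only shows the distance metric and the consensus-based false-match rejection.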
ABSTRACT
Yellow rust is a widespread disease that causes great damage to wheat. The traditional method of manually identifying wheat yellow rust is very inefficient. To improve this situation, this study proposed a deep-learning-based method for identifying wheat yellow rust from unmanned aerial vehicle (UAV) images. The method was based on the pyramid scene parsing network (PSPNet) semantic segmentation model to classify healthy wheat, yellow rust wheat, and bare soil in small-scale UAV images, and to investigate the spatial generalization of the model. In addition, it was proposed to use the high-accuracy classification results of traditional algorithms as weak samples for wheat yellow rust identification. The recognition accuracy of the PSPNet model in this study reached 98%. On this basis, the trained semantic segmentation model was used to recognize another wheat field; the results showed that the method had a certain generalization ability, with accuracy again reaching 98%. In addition, the high-accuracy classification result of a support vector machine was used as a weak label under weak supervision, which better solved the labeling problem for large images, with a final recognition accuracy of 94%. The proposed method therefore facilitates timely control measures to reduce economic losses.
Subjects
Basidiomycota, Deep Learning, Plant Diseases, Support Vector Machine, Triticum
ABSTRACT
Scene reconstruction uses images or videos as input to reconstruct a 3D model of a real scene and has important applications in smart cities, surveying and mapping, the military, and other fields. Structure from motion (SFM) is a key step in scene reconstruction, recovering sparse point clouds from image sequences. However, large-scale scenes cannot be reconstructed on a single compute node, and image matching and geometric filtering take up much of the time in the traditional SFM pipeline. In this paper, we propose a novel divide-and-conquer framework to solve the distributed SFM problem. First, we use the global navigation satellite system (GNSS) information from the images to calculate GNSS neighborhoods. The number of image pairs to match is greatly reduced by matching each image only to its valid GNSS neighbors, yielding a robust matching relationship. Second, the calculated matching relationship is used as the initial camera graph, which is divided into multiple subgraphs by a clustering algorithm. Local SFM is executed on several computing nodes to register the local cameras. Finally, all of the local camera poses are integrated and optimized to complete the global camera registration. Experiments show that our system can accurately and efficiently solve the structure from motion problem in large-scale scenes.
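The GNSS-neighborhood idea in the first stage can be sketched as a radius query over per-image positions (assuming positions are already expressed in a local metric frame; the radius value is a hypothetical parameter):

```python
import numpy as np

def gnss_pairs(positions, radius):
    """Return image index pairs whose GNSS positions lie within `radius`.

    Restricting feature matching to these pairs replaces exhaustive
    O(n^2) pairwise matching over all images.
    """
    p = np.asarray(positions, dtype=float)
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    i, j = np.nonzero(np.triu(d <= radius, k=1))    # upper triangle: each pair once
    return list(zip(i.tolist(), j.tolist()))

# Four images along a flight line, 30 m apart except the last.
positions = [[0, 0], [30, 0], [60, 0], [200, 0]]
pairs = gnss_pairs(positions, radius=40)
```

Only consecutive images within 40 m are matched; the distant fourth image is excluded from all pairs, which is how the candidate-pair count shrinks.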
ABSTRACT
Unmanned aerial vehicles (UAVs) are one of the latest technologies for high-spatial-resolution 3D modeling of the Earth. The objectives of this study are to assess low-cost UAV data using image radiometric transformation techniques and to investigate their effects on the global and local accuracy of the digital surface model (DSM). This research uses UAV light detection and ranging (LIDAR) data from an 80-meter flying height and UAV image data from 300- and 500-meter flying heights. Raw UAV images acquired from the 500-meter flying height are radiometrically transformed in MATLAB. UAV images from the 300-meter flying height are processed for the generation of a 3D point cloud and DSM in Pix4D Mapper. The UAV LIDAR data are used for the acquisition of ground control points (GCPs) and for the accuracy assessment of the UAV image data products. The enhanced DSM and the DSM generated from the 300-meter flight height were analyzed for point cloud number, density, and distribution. The root mean square error (RMSE) of Z improved from ±2.15 meters to ±0.11 meters. For the local accuracy assessment of the DSM, four different types of land cover were statistically compared with the UAV LIDAR data, showing the compatibility of the enhancement technique with UAV LIDAR accuracy.
ABSTRACT
Soil salinization is an important factor affecting winter wheat growth in coastal areas. The rapid, accurate, and efficient estimation of soil salt content is of great significance for agricultural production. The Kenli area in the Yellow River Delta was taken as the research area. Three machine learning inversion models, namely, a BP neural network (BPNN), support vector machine (SVM), and random forest (RF), were constructed using ground-measured data and UAV images. The optimal model was applied to the UAV images to obtain a salinity inversion result, which was then used as the true salt value for the Sentinel-2A image to establish BPNN, SVM, and RF collaborative inversion models; the optimal collaborative model was applied to the study area. The results showed that the RF collaborative inversion model was optimal, with R2 = 0.885. The inversion results were verified using measured soil salt data in the study area and were significantly better than those of the direct satellite remote sensing inversion method. This study integrates the advantages of multi-scale data and proposes an effective "Satellite-UAV-Ground" collaborative inversion method for soil salinity, obtaining more accurate soil information and providing more effective technical support for agricultural production.
Subjects
Rivers, Salinity, Soil/chemistry, China, Remote Sensing Technology, Triticum/growth & development
ABSTRACT
Tobacco planting information is an important part of tobacco production management. Unmanned aerial vehicle (UAV) remote sensing systems have become a popular topic worldwide because they are mobile, rapid, and economical. In this paper, an automatic identification method for tobacco fields based on UAV images is developed by combining supervised classification with image morphological operations, and this method was applied in Yunnan Province, the top tobacco-planting province in China. The results show that the producer accuracy, user accuracy, and overall accuracy of tobacco field identification using the proposed method are 92.59%, 96.61%, and 95.93%, respectively. The proposed method has the advantages of automation, a streamlined workflow, high accuracy, and easy operation, but the ground sampling distance (GSD) of the UAV image affects its accuracy: when the image GSD was reduced to 1 m, the overall accuracy decreased by approximately 10%. To solve this problem, we further introduced a convolution method into the proposed approach, which keeps the recognition accuracy of tobacco fields above 90% when the GSD is less than or equal to 1 m. Some other potential improvements to methods for mapping tobacco fields are also discussed in this paper.
ABSTRACT
Ghosting and seams are two major challenges in creating unmanned aerial vehicle (UAV) image mosaics. In response to these problems, this paper proposes an improved method for UAV image seam-line searching. First, an image matching algorithm is used to extract and match the features of adjacent images so that they can be transformed into the same coordinate system. Then, the grayscale difference, the gradient minimum, and the optical flow value of pixels in the overlapped area of adjacent images are calculated within a neighborhood and used to create an energy function for seam-line searching. Based on that, an improved dynamic programming algorithm is proposed to search for the optimal seam-lines to complete the UAV image mosaic. This algorithm adopts a more adaptive energy aggregation and traversal strategy, which finds a better splicing path between adjacent UAV images and better avoids crossing ground objects. The experimental results show that the proposed method can effectively solve the problems of ghosting and seams in panoramic UAV images.
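The seam-search core, dynamic programming over a per-pixel energy map, can be sketched as a minimal top-to-bottom seam finder (the energy here is a plain array; the paper's energy combines grayscale difference, gradient, and optical flow, and its aggregation strategy is more adaptive):

```python
import numpy as np

def seam_dp(energy):
    """Minimum-cost top-to-bottom seam through an energy map via dynamic
    programming; each row may move at most one column from the row above."""
    h, w = energy.shape
    cost = energy.astype(float).copy()
    back = np.zeros((h, w), dtype=int)
    for r in range(1, h):
        for c in range(w):
            lo, hi = max(c - 1, 0), min(c + 2, w)
            k = lo + int(np.argmin(cost[r - 1, lo:hi]))   # cheapest predecessor
            back[r, c] = k
            cost[r, c] += cost[r - 1, k]
    seam = [int(np.argmin(cost[-1]))]                      # best end column
    for r in range(h - 1, 0, -1):
        seam.append(int(back[r, seam[-1]]))                # backtrack upward
    return seam[::-1]

# A zero-energy column (a featureless strip) should attract the seam.
energy = np.ones((5, 5))
energy[:, 2] = 0.0
seam = seam_dp(energy)
```

In a mosaic, low energy corresponds to pixels where the two images agree, so the recovered seam runs through regions where the splice is least visible.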
ABSTRACT
Dislocation is one of the major challenges in unmanned aerial vehicle (UAV) image stitching. In this paper, we propose a new algorithm for seamlessly stitching UAV images based on a dynamic programming approach. Our solution consists of two steps: first, an image matching algorithm is used to correct the images so that they are in the same coordinate system; second, a new dynamic programming algorithm is developed based on the concept of stereo dual-channel energy accumulation. A new energy aggregation and traversal strategy is adopted in our solution, which can find a more optimal seam line for image stitching. Our algorithm overcomes a theoretical limitation of the classical Duplaquet algorithm. Experiments show that the algorithm can effectively solve the dislocation problem in UAV image stitching, especially in dense urban areas. Our solution is also direction-independent, giving it better adaptability and robustness for stitching images.
ABSTRACT
The estimation of scale parameters (ESP) image segmentation tool was used to determine the ideal segmentation scale, and the optimal segmented image was created by the multi-scale segmentation method. Based on the visible vegetation indices derived from mini-UAV imaging data, we chose a set of optimal vegetation indices from a series of candidates and built a decision tree rule. A membership function was used to automatically classify the study area, and an aquatic vegetation map was generated. The results showed that the overall accuracy of pixel-based supervised classification was 53.7%, while the overall accuracy of object-based image analysis (OBIA) was 91.7%; the corresponding Kappa values were 0.4 and 0.9. Compared with the pixel-based supervised classification method, the OBIA method significantly improved the classification result and further increased the accuracy of extracting aquatic vegetation. The experimental results demonstrated that extracting aquatic vegetation using visible vegetation indices derived from mini-UAV data and the OBIA method is feasible and could be applied in other physically similar areas.
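A representative visible-band vegetation index, excess green (ExG), can be computed as below (one common candidate for such an index set; the study's exact choice of indices is not specified here):

```python
import numpy as np

def excess_green(rgb):
    """ExG = 2g - r - b on chromaticity-normalised channels, a common
    visible-band vegetation index for RGB-only (mini-UAV) imagery."""
    rgb = rgb.astype(float)
    s = rgb.sum(axis=-1, keepdims=True)
    s[s == 0] = 1.0                       # avoid division by zero on black pixels
    r, g, b = np.moveaxis(rgb / s, -1, 0)  # per-pixel chromaticity coordinates
    return 2 * g - r - b

# Pure green scores high; neutral gray scores zero.
img = np.array([[[0, 255, 0], [100, 100, 100]]], dtype=np.uint8)
exg = excess_green(img)
```

Vegetation pixels yield large positive ExG while water, soil, and gray surfaces sit near zero, which is what makes such indices usable inside a decision tree rule.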
Subjects
Environmental Monitoring, Image Processing (Computer-Assisted), Plants, Aquatic Organisms
ABSTRACT
In various unmanned aerial vehicle (UAV) imaging applications, multisensor super-resolution (SR) has remained a challenging problem and attracted increasing attention. Multisensor SR algorithms utilize multispectral low-resolution (LR) images to produce a higher-resolution (HR) image and improve the performance of the UAV imaging system. The primary objective of this paper is to develop a multisensor SR method based on the existing multispectral imaging framework instead of using additional sensors. In order to restore image details without noise amplification or unnatural post-processing artifacts, this paper presents an improved regularized SR algorithm combining directionally-adaptive constraints with a multiscale non-local means (NLM) filter. As a result, the proposed method can overcome the physical limitation of multispectral sensors by estimating the color HR image from a set of multispectral LR images using intensity-hue-saturation (IHS) image fusion. Experimental results show that the proposed method provides better SR results than existing state-of-the-art SR methods in terms of objective measures.
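The IHS fusion step can be sketched as intensity substitution (a minimal version of the idea; the paper's method additionally applies regularized SR with directionally-adaptive constraints and multiscale NLM filtering before fusion):

```python
import numpy as np

def intensity_substitution(rgb_lr_up, intensity_hr):
    """Replace the intensity of an upsampled low-resolution colour image with
    a high-resolution intensity channel -- the substitution step at the heart
    of IHS-style fusion (intensity taken as the channel mean for simplicity)."""
    rgb = rgb_lr_up.astype(float)
    i_lr = rgb.mean(axis=-1, keepdims=True)
    # Shift every channel so the per-pixel mean matches the HR intensity,
    # preserving the LR image's hue/saturation differences between channels.
    return rgb + (intensity_hr[..., None] - i_lr)

rng = np.random.default_rng(2)
rgb_lr_up = rng.uniform(0, 255, (4, 4, 3))   # stand-in for the upsampled LR colour image
intensity_hr = rng.uniform(0, 255, (4, 4))   # stand-in for the HR intensity estimate
fused = intensity_substitution(rgb_lr_up, intensity_hr)
```

By construction the fused image's per-pixel intensity equals the HR channel exactly, while chromatic differences between bands come from the multispectral LR input.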