Results 1 - 20 of 35
1.
Sensors (Basel); 24(12), 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38931695

ABSTRACT

Remote sensing image classification plays a crucial role in remote sensing interpretation. With the exponential growth of multi-source remote sensing data, accurately extracting target features and comprehending target attributes from complex images significantly impacts classification accuracy. To address these challenges, we propose a Canny edge-enhanced multi-level attention feature fusion network (CAF) for remote sensing image classification. The original image is fed into a convolutional network to extract global features, and increasing the depth of the convolutional layers enables feature extraction at multiple levels. Additionally, to emphasize detailed target features, we employ the Canny operator to extract edge information and use a convolutional layer to capture deep edge features. Finally, leveraging the Attentional Feature Fusion (AFF) network, we fuse global and detailed features to obtain more discriminative representations for scene classification tasks. The performance of CAF is evaluated on three openly accessible datasets for remote sensing scene classification: NWPU-RESISC45, UCM, and MSTAR. The experimental findings indicate that our approach, which incorporates edge detail information, outperforms methods relying solely on global features.
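
A minimal sketch of the two-branch idea this abstract describes: a global RGB branch plus a Canny edge branch, fused by an attention-style gate. This is not the authors' CAF implementation; the layer sizes, the gating scheme, and all names below are illustrative assumptions.

```python
# Illustrative two-branch fusion: global RGB features + Canny edge features.
import cv2
import numpy as np
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    def __init__(self, num_classes=45):           # e.g. NWPU-RESISC45
        super().__init__()
        self.global_branch = nn.Sequential(        # global features from RGB
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.edge_branch = nn.Sequential(          # deep features from edges
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Attention-style gate deciding how much each branch contributes.
        self.gate = nn.Sequential(nn.Linear(64, 32), nn.Sigmoid())
        self.head = nn.Linear(32, num_classes)

    def forward(self, rgb, edges):
        g, e = self.global_branch(rgb), self.edge_branch(edges)
        a = self.gate(torch.cat([g, e], dim=1))    # per-channel fusion weights
        return self.head(a * g + (1 - a) * e)

img = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)  # stand-in scene
edge = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_RGB2GRAY), 100, 200)
rgb_t = torch.from_numpy(img).permute(2, 0, 1)[None].float() / 255
edge_t = torch.from_numpy(edge)[None, None].float() / 255
print(TwoBranchFusion()(rgb_t, edge_t).shape)      # torch.Size([1, 45])
```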

2.
Sensors (Basel); 24(12), 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38931766

ABSTRACT

Currently, complex-scene classification strategies are limited to high-definition image scene sets, while low-quality scene sets are overlooked. Although a few studies have focused on artificially noised images or specific image sets, none have involved actual low-resolution scene images. Designing classification models around practicality is therefore of paramount importance. To solve these problems, this paper proposes a two-stage classification optimization algorithm based on MPSO, achieving high-precision classification of low-quality scene images. First, to verify the rationality of the proposed model, three internationally recognized scene datasets were used in comparative experiments against 21 existing methods. The proposed model performs better, especially on the 15-scene dataset, with 1.54% higher accuracy than the best existing method, ResNet-ELM. Second, to prove the necessity of the model's pre-reconstruction stage, the same classification architecture was used to compare the proposed reconstruction method with six existing preprocessing methods on seven self-built sets of low-quality news scene frames. The results show that the proposed model yields a higher improvement rate for outdoor scenes. Finally, to test the model's application potential in outdoor environments, an adaptive test experiment was conducted on two self-built scene sets affected by lighting and weather. The results indicate that the proposed model is suitable for weather-affected scene classification, with an average accuracy improvement of 1.42%.
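
The abstract does not expand "MPSO"; assuming it denotes a modified particle swarm optimization, the loop below shows the bare-bones PSO that such a two-stage pipeline would build on. This is a generic sketch on a toy objective, not the paper's algorithm.

```python
# Generic particle swarm optimization on a toy objective (minimize ||x||^2).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum(x ** 2, axis=1)       # objective, one value per particle
pos = rng.uniform(-5, 5, (30, 2))          # 30 particles in 2-D
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), f(pos)      # personal bests
for _ in range(100):
    gbest = pbest[pbest_val.argmin()]      # swarm-best position so far
    vel = (0.7 * vel                       # inertia
           + 1.5 * rng.random(pos.shape) * (pbest - pos)   # cognitive pull
           + 1.5 * rng.random(pos.shape) * (gbest - pos))  # social pull
    pos = pos + vel
    vals = f(pos)
    better = vals < pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
print(pbest_val.min())                     # approaches 0
```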

3.
Neural Netw; 174: 106241, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38508050

ABSTRACT

Remarkable achievements have been made in remote sensing cross-scene classification in recent years. However, most methods directly align the entire image features for cross-scene knowledge transfer. They usually ignore the high background complexity and low category consistency of remote sensing images, which can significantly impair the performance of distribution alignment. In addition, shortcomings of the adversarial training paradigm and the inability to guarantee prediction discriminability and diversity can also hinder cross-scene classification performance. To alleviate these problems, we propose a novel cross-scene classification framework in a discriminator-free adversarial paradigm, called Adversarial Pair-wise Distribution Matching (APDM), to avoid irrelevant knowledge transfer and enable effective cross-domain modeling. Specifically, we propose the pair-wise cosine discrepancy for both inter-domain and intra-domain prediction measurement to fully leverage the prediction information, which can suppress negative semantic features and implicitly align the cross-scene distributions. Nuclear-norm maximization and minimization are introduced to enhance the target prediction quality and increase the applicability of the source knowledge, respectively. As a general cross-scene framework, APDM can easily be combined with existing methods to boost performance. Experimental results and analyses demonstrate that APDM achieves competitive and effective performance on cross-scene classification tasks.
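
The two loss ingredients named above can be sketched directly in PyTorch: a pair-wise cosine discrepancy between prediction batches, plus nuclear-norm terms on the prediction matrices. The pairing scheme and the weighting below are assumptions, not APDM's exact formulation.

```python
# Sketch of pair-wise cosine discrepancy and a nuclear-norm term.
import torch
import torch.nn.functional as F

def pairwise_cosine_discrepancy(p, q):
    """Mean cosine dissimilarity between all prediction pairs of two batches."""
    p, q = F.normalize(p, dim=1), F.normalize(q, dim=1)
    return (1 - p @ q.T).mean()

p_s = torch.randn(8, 10).softmax(1)        # source predictions (B, C)
p_t = torch.randn(8, 10).softmax(1)        # target predictions (B, C)

align = pairwise_cosine_discrepancy(p_s, p_t)   # implicit distribution match
# Maximizing the target nuclear norm encourages discriminable, diverse
# predictions; the source term is treated analogously in the paper's text.
nuc_t = torch.linalg.matrix_norm(p_t, ord='nuc')
loss = align - 0.1 * nuc_t                 # illustrative weighting
print(float(loss))
```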


Subjects
Knowledge, Remote Sensing Technology, Semantics
4.
Sensors (Basel); 23(21), 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37960535

ABSTRACT

Scene classification in autonomous navigation is a highly complex task due to variations in the inspected scenes, such as light conditions and dynamic objects; it is also a challenge for small-form-factor computers to run modern, highly demanding algorithms. In this contribution, we introduce a novel method for classifying scenes in simultaneous localization and mapping (SLAM) using the boundary object function (BOF) descriptor on RGB-D points. Our method aims to reduce complexity with almost no performance cost. All the BOF-based descriptors from each object in a scene are combined to define the scene class. Instead of traditional descriptors such as ORB or SIFT, we use the BOF descriptor to classify scenes. Through an RGB-D camera, we capture points and adjust them onto layers that are perpendicular to the camera plane. From each plane, we extract the boundaries of objects such as furniture, ceilings, walls, or doors. The extracted features compose a bag of visual words that is classified by a support vector machine. The proposed method achieves almost the same accuracy in scene classification as a SIFT-based algorithm and is 2.38× faster. The experimental results demonstrate the effectiveness of the proposed method in terms of accuracy and robustness on the 7-Scenes and SUNRGBD datasets.
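
One plausible reading of a boundary object function is the centroid-to-contour distance sampled over angle. The sketch below computes such a descriptor with OpenCV and feeds it to an SVM, as in the pipeline described above; the binning, normalization, and toy shapes are assumptions.

```python
# BOF-style descriptor: centroid-to-boundary distance binned by angle.
import cv2
import numpy as np
from sklearn.svm import SVC

def bof_descriptor(mask, bins=64):
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    c = max(cnts, key=cv2.contourArea).reshape(-1, 2).astype(float)
    center = c.mean(axis=0)
    d = np.linalg.norm(c - center, axis=1)
    ang = np.arctan2(c[:, 1] - center[1], c[:, 0] - center[0])
    idx = np.digitize(ang, np.linspace(-np.pi, np.pi, bins + 1)) - 1
    desc = np.array([d[idx == i].mean() if (idx == i).any() else 0.0
                     for i in range(bins)])      # mean radius per angular bin
    return desc / (desc.max() + 1e-8)            # scale invariance

# Toy example: a square "door" versus a circular "table top".
square = np.zeros((100, 100), np.uint8); square[20:80, 20:80] = 255
circle = np.zeros((100, 100), np.uint8); cv2.circle(circle, (50, 50), 30, 255, -1)
clf = SVC(kernel='linear').fit([bof_descriptor(square), bof_descriptor(circle)], [0, 1])
print(clf.predict([bof_descriptor(circle)]))     # [1]
```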

5.
Sensors (Basel); 23(19), 2023 Sep 28.
Article in English | MEDLINE | ID: mdl-37836958

ABSTRACT

Identifying special traffic events early is crucial for efficient traffic control and management. If a sufficient number of vehicles are equipped with automatic event-detection and reporting gadgets, a more rapid response to special events, including road debris, unexpected pedestrians, accidents, and malfunctioning vehicles, becomes possible. To address the needs of such a system and service, we propose a framework for an in-vehicle module-based special traffic event and emergency detection and safe driving monitoring service, which utilizes a modified ResNet classification algorithm to improve the efficiency of traffic management on highways. Because this type of classification problem has scarcely been addressed, we adapted various classification algorithms and built corresponding datasets specifically designed for detecting special traffic events. Using datasets containing road debris and malfunctioning or crashed vehicles recorded on Korean highways, we demonstrate the feasibility of our algorithms. Our main contributions encompass a thorough adaptation of various deep-learning algorithms and class definitions aimed at detecting actual emergencies on highways, together with a dataset and detection algorithm specifically tailored for this task. Furthermore, our final end-to-end algorithm shows a notable 9.2% improvement in performance compared with the object-accident-detection-based algorithm.
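
As a hedged illustration of adapting a ResNet classifier to custom highway-event classes (the paper's exact ResNet modification is not given here), the final fully connected layer can be swapped out; the class list below is an assumption.

```python
# Replace the classification head of a ResNet for custom event classes.
import torch
import torch.nn as nn
from torchvision import models

classes = ["road_debris", "pedestrian", "crashed_vehicle", "normal"]  # assumed
net = models.resnet18(weights=None)       # or ImageNet weights for fine-tuning
net.fc = nn.Linear(net.fc.in_features, len(classes))
print(net(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 4])
```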

6.
Sensors (Basel); 23(15), 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37571676

ABSTRACT

Numerous deep learning methods for acoustic scene classification (ASC) have been proposed to improve the classification accuracy of sound events. However, only a few studies have focused on continual learning (CL), wherein a model continually learns as tasks change. Therefore, in this study, we systematically analyzed the performance of ten recent CL methods, two regularization-based and eight replay-based, to provide guidelines on their use. First, we defined realistic and difficult scenarios, such as online class-incremental (OCI) and online domain-incremental (ODI) cases, for three public sound datasets. Then, we systematically analyzed the performance of each CL method in terms of average accuracy, average forgetting, and training time. In OCI scenarios, iCaRL and SCR showed the best performance for small buffer sizes, and GDumb showed the best performance for large buffer sizes. In ODI scenarios, SCR, which adopts supervised contrastive learning, consistently outperformed the other methods regardless of the memory buffer size. Most replay-based methods have an almost constant training time regardless of the memory buffer size, and their performance increases with the memory buffer size. Based on these results, GDumb and SCR should be the first candidates among continual learning methods for ASC.
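
Most of the replay-based methods compared above share a memory buffer as their core, commonly filled by reservoir sampling. The sketch below shows only that shared core, not any specific method's retrieval or loss strategy.

```python
# Reservoir-sampling replay buffer for online continual learning.
import random

class ReplayBuffer:
    def __init__(self, capacity):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)
            # Each example seen so far is kept with probability capacity/seen.
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

buf = ReplayBuffer(capacity=100)
for i in range(1000):
    buf.add(("clip_%d" % i, i % 10))      # (audio clip id, scene label)
print(len(buf.data), buf.sample(3))
```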

7.
Math Biosci Eng; 20(7): 12889-12907, 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37501471

ABSTRACT

Recently, convolutional neural networks (CNNs) have performed well in object classification and recognition. However, due to the particularities of geographic data, labeled samples are seriously insufficient, which limits the practical application of CNN methods in remote sensing (RS) image processing. To address the problem of small-sample RS image classification, a discrete wavelet-based multi-level deep feature fusion method is proposed. First, deep features are extracted from RS images using pre-trained deep CNNs and discrete wavelet transform (DWT) methods. Next, a modified discriminant correlation analysis (DCA) approach, based on the between-class distance coefficient, is proposed to effectively distinguish easily confused categories. The proposed approach can effectively integrate deep feature information across frequency bands, thereby obtaining low-dimensional features with good discrimination, as demonstrated through experiments on four benchmark datasets. Compared with several state-of-the-art methods, the proposed method achieves outstanding performance under limited training samples, especially with one or two training samples per class.
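
The wavelet side of such a pipeline can be sketched with PyWavelets: decompose the image level by level and pool simple statistics per frequency band, to be fused later with CNN features. The statistics and level count below are assumptions; the modified DCA step is omitted.

```python
# Multi-level 2-D DWT features: band-energy statistics per decomposition level.
import numpy as np
import pywt

def dwt_features(img, wavelet='haar', levels=2):
    feats, arr = [], img.astype(float)
    for _ in range(levels):
        cA, (cH, cV, cD) = pywt.dwt2(arr, wavelet)
        feats += [np.abs(b).mean() for b in (cA, cH, cV, cD)]
        arr = cA                          # recurse into the low-frequency band
    return np.array(feats)

img = np.random.rand(64, 64)              # stand-in grayscale RS image
print(dwt_features(img).shape)            # (8,) = 4 bands x 2 levels
```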

8.
Data Brief; 48: 109146, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37128585

ABSTRACT

Accurate perception and awareness of the environment surrounding the automobile remain a challenge in automotive research. This article presents A3CarScene, a dataset recorded while driving a research vehicle equipped with audio and video sensors on public roads in the Marche Region, Italy. The sensor suite includes eight microphones installed inside and outside the passenger compartment and two dashcams mounted on the front and rear windows. Approximately 31 h of data per device were collected during October and November 2022 by driving about 1500 km along diverse roads and landscapes, in variable weather conditions, during daytime and nighttime hours. All key information for the scene-understanding process of automated vehicles has been accurately annotated. For each route, annotations with beginning and end timestamps report the type of road traveled (motorway, trunk, primary, secondary, tertiary, residential, and service roads), the degree of urbanization of the area (city, town, suburban area, village, exurban and rural areas), the weather conditions (clear, cloudy, overcast, and rainy), the level of lighting (daytime, evening, night, and tunnel), the type (asphalt or cobblestones) and moisture status (dry or wet) of the road pavement, and the state of the windows (open or closed). This large-scale dataset is valuable for developing new driving-assistance technologies based on audio or video data alone or in a multimodal manner and for improving the performance of systems currently in use. Because data were acquired with sensors in multiple locations, the best installation placement for a given task can be assessed. Deep learning engineers can use this dataset to build new baselines, as a comparative benchmark, and to extend existing databases for autonomous driving.

9.
Sensors (Basel); 23(2), 2023 Jan 10.
Article in English | MEDLINE | ID: mdl-36679569

ABSTRACT

As an auxiliary means of remote sensing (RS) intelligent interpretation, remote sensing scene classification (RSSC) attracts considerable attention, and its performance has been improved significantly by popular deep convolutional neural networks (DCNNs). However, several challenges still hinder the practical application of RSSC, such as the complex composition of land cover, scale variation of objects, and redundant and noisy areas. To mitigate the impact of these issues, we propose an adaptive discriminative regions learning network for RSSC, referred to as ADRL-Net, which locates discriminative regions effectively to boost RSSC performance by utilizing a novel self-supervision mechanism. ADRL-Net consists of three main modules: a discriminative region generator, a region discriminator, and a region scorer. Specifically, the discriminative region generator first generates candidate regions that could be informative for RSSC. Then, the region discriminator evaluates the regions generated by the region generator and provides feedback for the generator to update the informative regions. Finally, the region scorer produces prediction scores for the whole image using the discriminative regions. In this manner, the three modules of ADRL-Net cooperate with each other, focus on the most informative regions of an image, and reduce the interference of redundant regions in the final classification, making the approach robust to complex scene composition, varying object scales, and irrelevant information. To validate the efficacy of the proposed network, we conduct experiments on four widely used benchmark datasets; the experimental results demonstrate that ADRL-Net consistently outperforms other state-of-the-art RSSC methods.


Subjects
Neural Networks (Computer), Remote Sensing Technology, Remote Sensing Technology/methods, Benchmarking, Intelligence
10.
Sensors (Basel); 22(19), 2022 Oct 09.
Article in English | MEDLINE | ID: mdl-36236744

ABSTRACT

Object detection is an essential function for mobile robots, allowing them to carry out missions efficiently. In recent years, various deep learning models based on convolutional neural networks have achieved good object detection performance. However, when robots must carry out missions in a particular environment, using a model trained without considering that environment degrades object detection performance and leads to failed missions. This poor accuracy stems from the class imbalance problem, in which the occurrence frequencies of the object classes in the training dataset differ significantly. In this study, we propose a systematic solution to the class imbalance problem that trains multiple object detection models and uses them effectively for robots that move through various environments to carry out missions. Moreover, we show through experiments that the proposed multi-model object detection framework with environment-context awareness can effectively overcome the class imbalance problem. In the experiment, CPU usage decreased by 45.49% and latency by more than 60%, while object detection accuracy increased by 6.6% on average.
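
At its core, the environment-aware multi-model idea routes each frame to the detector trained for the current context. The sketch below shows that dispatch logic only, with stand-in detectors; the context classifier and model set are assumptions.

```python
# Route frames to the detector specialized for the current environment.
detectors = {
    "indoor": lambda img: ["chair", "person"],    # stand-in for a model
    "outdoor": lambda img: ["car", "tree"],       # trained per environment
}

def detect(img, context):
    return detectors[context](img)        # environment-context dispatch

print(detect(None, "outdoor"))            # ['car', 'tree']
```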


Subjects
Robotics, Neural Networks (Computer)
11.
Sensors (Basel); 22(14), 2022 Jul 20.
Article in English | MEDLINE | ID: mdl-35891109

ABSTRACT

Multi-label aerial scene image classification is a long-standing and challenging research problem in the remote sensing field. As land cover objects usually co-exist in an aerial scene image, modeling label dependencies is a compelling approach to improving performance. Previous methods generally model the label dependencies among all categories in the target dataset directly. However, most of the semantic features extracted from an image are relevant to the objects that are present, so the dependencies among nonexistent categories cannot be evaluated effectively. These redundant label dependencies may introduce noise and further decrease classification performance. To solve this problem, we propose S-MAT, a Semantic-driven Masked Attention Transformer for multi-label aerial scene image classification. S-MAT adopts a Masked Attention Transformer (MAT) to capture the correlations among the label embeddings constructed by a Semantic Disentanglement Module (SDM). Moreover, the proposed masked attention in MAT can filter out the redundant dependencies and enhance the robustness of the model. As a result, the proposed method can explicitly and accurately capture the label dependencies, achieving CF1 scores of 89.21%, 90.90%, and 88.31% on three multi-label aerial scene image classification benchmark datasets: UC-Merced Multi-label, AID Multi-label, and MLRSNet, respectively. In addition, extensive ablation studies and empirical analysis demonstrate the effectiveness of the essential components of our method under different factors.
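
The masked-attention idea, filtering dependencies for categories predicted absent, can be sketched with a standard attention layer and a boolean mask. The dimensions, presence scores, and 0.5 threshold below are illustrative assumptions, not S-MAT's actual design.

```python
# Label embeddings attend to each other; absent labels are masked out as keys.
import torch
import torch.nn as nn

C, D = 17, 64                              # e.g. 17 labels, 64-dim embeddings
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
label_emb = torch.randn(1, C, D)           # label embeddings (SDM-like module)
presence = torch.rand(C)                   # stand-in per-label presence scores

absent = presence < 0.5                    # True = treat label as nonexistent
mask = absent.unsqueeze(0).expand(C, C).clone()   # (query, key) mask
mask.fill_diagonal_(False)                 # always allow self-attention
out, _ = attn(label_emb, label_emb, label_emb, attn_mask=mask)
print(out.shape)                           # torch.Size([1, 17, 64])
```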


Subjects
Algorithms, Semantics, Attention, Electric Power Supplies, Research Design
12.
J Imaging; 8(8), 2022 Jul 26.
Article in English | MEDLINE | ID: mdl-35893087

ABSTRACT

Indoor scene recognition and semantic information can be helpful for social robots. Recently, in the field of indoor scene recognition, researchers have incorporated object-level information and shown improved performance. In line with these advances, this paper demonstrates that scene recognition can be performed solely using object-level information. A state-of-the-art object detection model was trained to detect objects typically found in indoor environments and then used to detect objects in scene data. These predicted objects were then used as features to predict room categories. This paper successfully combines approaches conventionally used in computer vision and natural language processing (YOLO and TF-IDF, respectively). These approaches could be further helpful in the fields of embodied research and dynamic scene classification, which we elaborate on.
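
The objects-as-words pipeline is straightforward to sketch: treat each scene's detected object names as a document, weight them with TF-IDF, and classify the room. The detector is replaced by hard-coded outputs below, and the labels are toy examples.

```python
# Detected object names -> TF-IDF features -> room classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

scenes = ["bed lamp wardrobe pillow",      # pretend YOLO detections per scene
          "oven sink fridge kettle",
          "bed pillow curtain",
          "sink fridge microwave"]
rooms = ["bedroom", "kitchen", "bedroom", "kitchen"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(scenes, rooms)
print(clf.predict(["fridge sink oven"]))   # ['kitchen']
```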

13.
Sensors (Basel); 21(23), 2021 Nov 28.
Article in English | MEDLINE | ID: mdl-34883955

ABSTRACT

Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened up a new research area of using depth information in addition to color image (RGB) data for scene understanding. Transfer learning of deep convolutional networks with pairs of RGB and depth (RGB-D) images has to deal with integrating these two modalities. Single-channel depth images are often converted to three-channel images by extracting horizontal disparity, height above ground, and the angle of the pixel's local surface normal (HHA) to apply transfer learning using networks trained on the Places365 dataset. The high computational cost of HHA encoding can be a major disadvantage for the real-time prediction of scenes, although this may be less important during the training phase. We propose a new, computationally efficient encoding method that can be integrated with any convolutional neural network. We show that our encoding approach performs equally well or better in a multimodal transfer learning setup for scene classification. Our encoding is implemented in a customized and pretrained VGG16 Net. We address the class imbalance problem seen in the image dataset using a method based on the synthetic minority oversampling technique (SMOTE) at the feature level. With appropriate image augmentation and fine-tuning, our network achieves scene classification accuracy comparable to that of other state-of-the-art architectures.
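
Feature-level SMOTE, as described above, can be sketched with imbalanced-learn: oversample the minority scene classes in the CNN-feature space before training the classifier. The feature vectors below are random stand-ins for extracted features.

```python
# Balance scene classes by synthesizing minority samples in feature space.
from collections import Counter

import numpy as np
from imblearn.over_sampling import SMOTE

X = np.random.rand(120, 512)               # stand-in CNN feature vectors
y = np.array([0] * 100 + [1] * 20)         # imbalanced scene labels
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y), Counter(y_res))          # minority class synthesized to 100
```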


Subjects
Machine Learning, Neural Networks (Computer), Data Collection
14.
PeerJ Comput Sci; 7: e666, 2021.
Article in English | MEDLINE | ID: mdl-34616882

ABSTRACT

Image understanding and scene classification are keystone tasks in computer vision. The development of technologies and the profusion of existing datasets leave ample room for improvement in image classification and recognition research. Notwithstanding the optimal performance of existing machine learning models in image understanding and scene classification, obstacles remain. All models are data-dependent and can only classify samples close to the training set, and they require large amounts of data for training and learning. The first problem is addressed by few-shot learning, which achieves optimal performance in object detection and classification but has received little attention in scene classification. Motivated by these findings, in this paper we introduce two models for few-shot scene classification. To trace the behavior of these models, we also introduce two datasets (MiniSun and MiniPlaces) for image scene classification. Experimental results show that the proposed models outperform the benchmark approaches in terms of classification accuracy.
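
The abstract does not specify its two models, so the sketch below shows the generic prototypical-network episode that serves as the standard few-shot baseline in this space: class prototypes are support-set mean embeddings, and queries are classified by distance to them.

```python
# One prototypical-network episode over precomputed embeddings.
import torch

def proto_episode(support, support_y, query, n_way):
    # Class prototype = mean embedding of that class's support samples.
    protos = torch.stack([support[support_y == c].mean(0) for c in range(n_way)])
    dists = torch.cdist(query, protos)     # Euclidean distance to prototypes
    return (-dists).softmax(dim=1)         # nearest prototype = highest prob

support = torch.randn(15, 64)              # 5-way, 3-shot support embeddings
support_y = torch.arange(5).repeat_interleave(3)
query = torch.randn(10, 64)
print(proto_episode(support, support_y, query, n_way=5).shape)  # [10, 5]
```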

15.
Sensors (Basel); 21(18), 2021 Sep 18.
Article in English | MEDLINE | ID: mdl-34577488

ABSTRACT

The detection of obstacles at rail level crossings (RLCs) is an important task for ensuring the safety of train traffic. Traffic control systems require reliable sensors for determining the state of an RLC. Fusing information from a number of sensors located at the site increases the capability to react to dangerous situations. One such source is video from monitoring cameras. This paper presents a method for processing video data, using deep learning, to determine the state of the area (region of interest, ROI) vital for a safe passage of the train. The proposed approach is validated using video surveillance material from a number of RLC sites in Poland. The footage includes 24/7 observations in all weather conditions and all seasons of the year. Results show that recall reaches 0.98 while using significantly reduced processing resources. The solution can be used as an auxiliary source of signals for train control systems, together with other sensor data, and the fused dataset can meet railway safety standards.


Subjects
Deep Learning, Railroads, Traffic Accidents, Seasons, Weather
16.
Sensors (Basel); 21(12), 2021 Jun 09.
Article in English | MEDLINE | ID: mdl-34207736

ABSTRACT

Wildfires have affected global forests and the Mediterranean area with increasing recurrence and intensity in recent years, with climate change resulting in reduced precipitation and higher temperatures. To assess the impact of wildfires on the environment, burned area mapping has become progressively more relevant. Initially carried out via field sketches, the advent of satellite remote sensing opened new possibilities, reducing the cost, uncertainty, and safety risks of the previous techniques. In the present study, an experimental methodology was adopted to test the potential of advanced remote sensing techniques, such as multispectral Sentinel-2, PRISMA hyperspectral satellite, and UAV (unmanned aerial vehicle) remotely sensed data, for multitemporal mapping of burned areas by soil-vegetation recovery analysis in two test sites in Portugal and Italy. In case study one, innovative multiplatform data classification was performed by correlating Sentinel-2 RBR (relativized burn ratio) fire severity classes with the scene hyperspectral signature, using a pixel-by-pixel comparison leading to a converging classification. In the adopted methodology, RBR burned area analysis and vegetation recovery were tested for accordance with biophysical vegetation parameters (LAI, fCover, and fAPAR). In case study two, a UAV-sensed NDVI index was adopted for high-resolution mapping data collection. At a large scale, the Sentinel-2 RBR index proved to be efficient for burned area analysis, from the perspectives of both fire severity and vegetation recovery. Despite the time elapsed between the event and the acquisition, the PRISMA hyperspectral converging classification based on Sentinel-2 was able to detect and discriminate different spectral signatures corresponding to different fire severity classes. At a slope scale, the UAV platform proved to be an effective tool for mapping and characterizing the burned area, giving it a clear advantage over field GPS mapping. The results highlighted that UAV platforms, if equipped with a hyperspectral sensor and used in a synergistic approach with PRISMA, would be a useful tool for classifying satellite-acquired scenes, allowing the acquisition of ground truth.
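
The indices named above follow standard definitions: NBR from NIR and SWIR reflectances, dNBR as the pre/post-fire difference, and RBR as dNBR relativized by the pre-fire NBR (after Parks et al.). The arrays below are random stand-ins for Sentinel-2 bands, and severity-class thresholds are not reproduced.

```python
# NBR, dNBR, and RBR computed per pixel from stand-in band arrays.
import numpy as np

def nbr(nir, swir):
    return (nir - swir) / (nir + swir + 1e-9)   # normalized burn ratio

pre_nir, pre_swir = np.random.rand(4, 4), np.random.rand(4, 4)
post_nir, post_swir = np.random.rand(4, 4), np.random.rand(4, 4)

dnbr = nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)
rbr = dnbr / (nbr(pre_nir, pre_swir) + 1.001)   # relativized burn ratio
print(rbr.shape)                                 # (4, 4) severity map
```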


Subjects
Fires, Wildfires, Forests, Italy, Portugal
17.
PeerJ Comput Sci; 7: e557, 2021.
Article in English | MEDLINE | ID: mdl-34141887

ABSTRACT

Convolutional neural networks are widely used for image classification, typically through pretraining on ImageNet followed by fine-tuning, whereby features are adapted to the target task. ImageNet is a large database consisting of 15 million images belonging to 22,000 categories; images collected from the Web are labeled by human annotators using the Amazon Mechanical Turk crowd-sourcing tool. ImageNet is useful for transfer learning because of the sheer volume of its dataset and the number of object classes available. Transfer learning using pretrained models helps to build computer vision models in an accurate and inexpensive manner: models pretrained on substantial datasets are repurposed for our requirements. Scene recognition is a widely used application of computer vision in many communities and industries, such as tourism. This study shows multilabel scene classification using five architectures, namely VGG16, VGG19, ResNet50, InceptionV3, and Xception, with the ImageNet weights available in the Keras library, and comprehensively compares the performance of the different architectures. Finally, EnsemV3X is presented. The proposed model, with a reduced number of parameters, is superior to the state-of-the-art models Inception and Xception, demonstrating an accuracy of 91%.
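
A soft-voting ensemble over ImageNet-pretrained Keras backbones, in the spirit of the comparison above, can be sketched in a few lines. EnsemV3X itself (a reduced-parameter ensemble) is not reproduced, and the per-model preprocess_input steps are omitted for brevity.

```python
# Average ImageNet class probabilities across pretrained Keras backbones.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3, ResNet50

members = [(ResNet50(weights="imagenet"), 224),
           (InceptionV3(weights="imagenet"), 299)]
img = np.random.rand(1, 300, 300, 3).astype("float32")   # stand-in image

probs = np.mean([m.predict(tf.image.resize(img, (s, s)))  # resize per member
                 for m, s in members], axis=0)
print(probs.shape)                         # (1, 1000) soft-voted probabilities
```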

18.
Trends Hear; 25: 23312165211014118, 2021.
Article in English | MEDLINE | ID: mdl-34028332

ABSTRACT

Clinical speech perception tests with simple presentation conditions often overestimate the impact of signal preprocessing on speech perception in complex listening environments. A new procedure was developed to assess speech perception in interleaved acoustic environments of different complexity, allowing investigation of the impact of an automatic scene classification (ASC) algorithm on speech perception. The procedure was applied in cohorts of normal-hearing (NH) controls and unilateral and bilateral cochlear implant (CI) users. Speech reception thresholds (SRTs) were measured by means of a matrix sentence test in five acoustic environments that included different noise conditions (amplitude-modulated and continuous), two spatial configurations, and reverberation. The acoustic environments were encapsulated in a single experimental run in randomized, mixed order. The acoustic room simulation was played back through a loudspeaker auralization setup with 128 loudspeakers. Eighteen NH listeners, 16 unilateral, and 16 bilateral CI users participated. SRTs were evaluated for each individual acoustic environment and as a mean SRT. Mean SRTs improved by 2.4 dB signal-to-noise ratio for unilateral and 1.3 dB signal-to-noise ratio for bilateral CI users with activated ASC. Without ASC, the mean SRT of bilateral CI users was 3.7 dB better than that of unilateral CI users. The mean SRT indicated significant differences, with the NH group performing best and unilateral CI users performing worst, with a difference of up to 13 dB compared with NH. The proposed speech test procedure successfully demonstrated that speech perception and the benefit of ASC depend on the acoustic environment.


Subjects
Cochlear Implantation, Cochlear Implants, Speech Perception, Acoustics, Humans, Noise
19.
Sensors (Basel); 21(7), 2021 Mar 29.
Article in English | MEDLINE | ID: mdl-33805349

ABSTRACT

Recent studies have applied the superior performance of deep learning to mobile devices, enabling deep learning models to run on devices with limited computing power. However, a deep learning model suffers performance degradation when deployed on mobile devices because each device has different sensors. To solve this issue, a network model specific to each mobile device must be trained. Therefore, herein, we propose an acceleration method for on-device learning that mitigates device heterogeneity. The proposed method efficiently utilizes unified memory to reduce the latency of data transfer during network model training. In addition, we propose a layer-wise processor selection method that accounts for the latency incurred when different processors perform the forward-propagation and backpropagation steps of the same layer. The experiments were performed on an ODROID-XU4 with the ResNet-18 model, and the results indicate that the proposed method reduces latency by up to 28.4% compared with the central processing unit (CPU) and up to 21.8% compared with the graphics processing unit (GPU). Through experiments using various batch sizes to measure average power consumption, we confirmed that the proposed on-device learning method alleviates device heterogeneity.
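
Layer-wise processor selection reduces to picking, per layer and per pass, the processor with the lower measured latency. The sketch below shows only that selection logic; the latency numbers are made up, and the unified-memory mechanism is orthogonal to it.

```python
# Pick the faster processor per layer from (hypothetical) profiled latencies.
forward_ms = {"conv1": {"cpu": 4.1, "gpu": 1.2},
              "conv2": {"cpu": 6.3, "gpu": 1.5},
              "fc":    {"cpu": 0.7, "gpu": 1.1}}  # small layers can favor CPU

plan = {layer: min(t, key=t.get) for layer, t in forward_ms.items()}
total = sum(t[plan[layer]] for layer, t in forward_ms.items())
print(plan, "%.1f ms" % total)  # e.g. {'conv1': 'gpu', 'conv2': 'gpu', 'fc': 'cpu'}
```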

20.
Sensors (Basel); 21(5), 2021 Feb 24.
Article in English | MEDLINE | ID: mdl-33668138

ABSTRACT

While a growing number of instruments generate ever more airborne and satellite images, the bottleneck in remote sensing (RS) scene classification has shifted from data limits toward a lack of ground truth samples. Many challenges remain when facing unknown environments, especially those with insufficient training data. Few-shot classification offers a different picture under the umbrella of meta-learning: mining rich knowledge from a few samples is possible. In this work, we propose a method named RS-SSKD for few-shot RS scene classification, from the perspective of generating a powerful representation for the downstream meta-learner. First, we propose a novel two-branch network that takes three pairs of original-transformed images as inputs and incorporates Class Activation Maps (CAMs) to drive the network to mine the most relevant category-specific regions. This strategy ensures that the network generates discriminative embeddings. Second, we apply a round of self-knowledge distillation to prevent overfitting and boost performance. Our experiments show that the proposed method surpasses current state-of-the-art approaches on two challenging RS scene datasets: NWPU-RESISC45 and RSD46-WHU. Finally, we conduct various ablation experiments to investigate the effect of each component of the proposed method and analyze the training time of state-of-the-art methods and ours.
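
The self-knowledge-distillation round mentioned above is commonly a temperature-scaled KL term between the network's current outputs and its own earlier (teacher) outputs; the temperature, weighting, and random logits below are assumptions.

```python
# Self-distillation loss: hard cross-entropy plus soft KL to own soft targets.
import torch
import torch.nn.functional as F

def self_distill_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    return (1 - alpha) * hard + alpha * soft

s, t = torch.randn(8, 45), torch.randn(8, 45)   # 45 = NWPU-RESISC45 classes
y = torch.randint(0, 45, (8,))
print(float(self_distill_loss(s, t, y)))
```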
