Results 1 - 20 of 69
1.
Sensors (Basel) ; 24(19)2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39409392

ABSTRACT

Occlusion presents a major obstacle in the development of pedestrian detection technologies utilizing computer vision. This challenge includes both inter-class occlusion caused by environmental objects obscuring pedestrians, and intra-class occlusion resulting from interactions between pedestrians. In complex and variable urban settings, these compounded occlusion patterns critically limit the efficacy of both one-stage and two-stage pedestrian detectors, leading to suboptimal detection performance. To address this, we introduce a novel architecture termed the Attention-Guided Feature Enhancement Network (AGFEN), designed within the deep convolutional neural network framework. AGFEN improves the semantic information of high-level features by mapping it onto low-level feature details through sampling, creating an effect comparable to mask modulation. This technique enhances both channel-level and spatial-level features concurrently without incurring additional annotation costs. Furthermore, we transition from a traditional one-to-one correspondence between proposals and predictions to a one-to-multiple paradigm, facilitating non-maximum suppression using the prediction set as the fundamental unit. Additionally, we integrate these methodologies by aggregating local features between regions of interest (RoI) through the reuse of classification weights, effectively mitigating false positives. Our experimental evaluations on three widely used datasets demonstrate that AGFEN achieves a 2.38% improvement over the baseline detector on the CrowdHuman dataset, underscoring its effectiveness and potential for advancing pedestrian detection technologies.
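
The one-to-multiple paradigm can be illustrated with a small sketch. The abstract does not give AGFEN's exact procedure, so the code below assumes a set-based variant of greedy non-maximum suppression in which predictions emitted by the same proposal never suppress one another. The names `iou` and `set_nms`, the tuple layout, and the 0.5 threshold are hypothetical choices made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def set_nms(predictions, iou_threshold=0.5):
    """Greedy NMS in which the prediction set of one proposal is the unit.

    predictions: list of (box, score, proposal_id) tuples (assumed layout).
    A candidate is dropped only if it overlaps a kept box that came from a
    DIFFERENT proposal; boxes from the same proposal may coexist.
    """
    ordered = sorted(predictions, key=lambda p: p[1], reverse=True)
    kept = []
    for box, score, pid in ordered:
        suppressed = any(
            pid != kept_pid and iou(box, kept_box) > iou_threshold
            for kept_box, _, kept_pid in kept
        )
        if not suppressed:
            kept.append((box, score, pid))
    return kept
```

With two heavily overlapping boxes, both survive when they share a proposal id and only the higher-scoring one survives when they do not, which is the behaviour that lets a single proposal report two pedestrians standing close together.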

2.
Sensors (Basel) ; 24(11)2024 May 30.
Article in English | MEDLINE | ID: mdl-38894331

ABSTRACT

In view of the frequent failures occurring in rolling bearings, the strong background noise present in signals, weak features, and difficulties associated with extracting fault characteristics, a method of enhancing and diagnosing rolling bearing faults based on coarse-grained lattice features (CGLFs) is proposed. First, the vibrational signals of bearings are subjected to adaptive filtering to eliminate background noise. Second, frequency-domain transformation is performed, and a coarse-grained approach is used to continuously segment the spectrum. Within each segment, amplitude-enhancement operations are executed, transforming the data into a CGLF graph that enhances fault characteristics. This graph is then fed into a Swin Transformer-based pattern-recognition network. Third and finally, a high-precision fault diagnosis model is constructed using fully connected layers and Softmax, enabling the diagnosis of bearing faults. The fault recognition accuracy reaches 98.30% and 98.50% with public datasets and laboratory data, respectively, thereby validating the feasibility and effectiveness of the proposed method. This research offers an efficient and feasible fault diagnosis approach for rolling bearings.
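
The spectrum segmentation step can be sketched as follows. The abstract does not define the amplitude-enhancement operation or the layout of the CGLF graph, so this example assumes that each segment is represented by its peak amplitude, normalized by the largest peak, and that the values are then wrapped into rows. The function names, the segment count, and the row width are all hypothetical.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """One-sided DFT magnitude (plain O(n^2) DFT, fine for short signals)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))) / n
        for k in range(n // 2)
    ]


def coarse_grain(spectrum, n_segments):
    """Split the spectrum into contiguous segments and keep one value each.

    Assumed enhancement: the segment peak, scaled so the largest peak is 1.
    """
    seg_len = len(spectrum) // n_segments
    peaks = [max(spectrum[i * seg_len:(i + 1) * seg_len])
             for i in range(n_segments)]
    top = max(peaks)
    return [p / top if top > 0 else 0.0 for p in peaks]


def to_lattice(values, width):
    """Wrap the 1D segment features into rows to form a small 2D grid."""
    return [values[i:i + width] for i in range(0, len(values), width)]
```

For a 64-sample sine wave at bin 5, `coarse_grain(magnitude_spectrum(x), 8)` puts the full weight in the second segment, and `to_lattice(..., 4)` turns the eight values into a 2 x 4 grid that an image-based network could consume.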

3.
Sensors (Basel) ; 24(9)2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38732944

ABSTRACT

Sea ice is an important component of the Earth's ecosystem, and its thickness has a profound impact on global climate and human activities. Therefore, the inversion of sea ice thickness has important research significance. Due to environmental and equipment-related limitations, the number of samples available for remote sensing inversion is currently insufficient. At high spatial resolutions, remote sensing data contain limited information and noise interference, which seriously affect the accuracy of sea ice thickness inversion. In response to these issues, we conducted experiments using ice draft data from the Beaufort Sea and designed an improved GBDT method that integrates feature-enhancement and active-learning strategies (IFEAL-GBDT). In this method, the incident angle and time series are used to perform spatiotemporal correction of the data, reducing both temporal and spatial impacts. Meanwhile, based on the original polarization information, effective multi-attribute features are generated to expand the information content and improve the separability of sea ice with different thicknesses. Taking into account the growth cycle and age of sea ice, attributes were added for month and seawater temperature. In addition, we studied an active learning strategy based on the maximum standard deviation to select more informative and representative samples and improve the model's generalization ability. The improved GBDT model was used for training and prediction, offering advantages in dealing with nonlinear, high-dimensional, and noisy data, further expanding the effectiveness of the feature-enhancement and active-learning strategies. Compared with other methods, IFEAL-GBDT achieves the best inversion accuracy, with an average absolute error of 8 cm, a root mean square error of 13.7 cm, and a correlation coefficient of 0.912. This research proves the effectiveness of our method, which is suitable for the high-precision inversion of sea ice thickness determined using Sentinel-1 data.
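
A selection rule based on the maximum standard deviation can be sketched as below. The abstract does not say how the standard deviation is obtained, so the example assumes a committee of regressors whose predictions on unlabeled samples are compared; `select_most_uncertain` and the input layout are hypothetical.

```python
from statistics import pstdev


def select_most_uncertain(ensemble_predictions, k):
    """Return indices of the k samples on which the models disagree most.

    ensemble_predictions[m][i] is the thickness predicted by model m for
    unlabeled sample i (assumed layout). Disagreement is measured as the
    population standard deviation across models.
    """
    n_samples = len(ensemble_predictions[0])
    spread = [
        pstdev([model[i] for model in ensemble_predictions])
        for i in range(n_samples)
    ]
    ranked = sorted(range(n_samples), key=lambda i: spread[i], reverse=True)
    return ranked[:k]
```

The selected samples would then be labeled and added to the training set before the next round of fitting.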

4.
Sensors (Basel) ; 24(13)2024 Jun 27.
Article in English | MEDLINE | ID: mdl-39000965

ABSTRACT

Because bearing fault features are difficult to extract from vibration signals with strong background noise, and because one-dimensional (1D) signals provide limited fault information, an optimal time-frequency fusion symmetric dot pattern (SDP) method for bearing fault feature enhancement and diagnosis is proposed. Firstly, the vibration signals are transformed into two-dimensional (2D) features by the time-frequency fusion SDP algorithm, which supports multi-scale analysis of small-scale signal fluctuations and enhances bearing fault features. Secondly, the bat algorithm is employed to optimize the SDP parameters adaptively, which effectively improves the distinctions between various types of faults. Finally, the fault diagnosis model is constructed using a deep convolutional neural network (DCNN). To validate the effectiveness of the proposed method, Case Western Reserve University's (CWRU) bearing fault dataset and a bearing fault dataset from a laboratory experimental platform were used. The experimental results illustrate that the fault diagnosis accuracy of the proposed method is 100%, which supports the feasibility and effectiveness of the proposed method. A comparison with other 2D transformer methods shows that the proposed method achieves the highest accuracy in bearing fault diagnosis, validating the superiority of the proposed methodology.
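
For orientation, the basic SDP mapping in its commonly cited form can be sketched as follows: each sample sets a radius, and the sample `lag` steps later sets an angular offset that is mirrored around evenly spaced symmetry planes. This is not the paper's time-frequency fusion variant, which the abstract does not spell out. The function name and the default values of `lag`, `gain_deg`, and `n_mirrors` are hypothetical; they are the kind of parameters an optimizer such as the bat algorithm would tune.

```python
import math


def sdp_points(signal, lag=1, gain_deg=30.0, n_mirrors=6):
    """Map a 1D signal to Cartesian points of a symmetric dot pattern.

    r(i)     = (x[i] - min) / (max - min)
    angle(i) = plane +/- (x[i + lag] - min) / (max - min) * gain_deg
    where plane runs over n_mirrors evenly spaced symmetry planes.
    """
    lo, hi = min(signal), max(signal)
    span = hi - lo
    if span == 0:
        raise ValueError("signal must not be constant")
    points = []
    for i in range(len(signal) - lag):
        radius = (signal[i] - lo) / span
        offset = (signal[i + lag] - lo) / span * gain_deg
        for m in range(n_mirrors):
            plane = 360.0 * m / n_mirrors
            for angle in (plane + offset, plane - offset):
                rad = math.radians(angle)
                points.append((radius * math.cos(rad), radius * math.sin(rad)))
    return points
```

Plotting the returned points gives a snowflake-like image whose arm shape and thickness change with the signal, which is what makes it usable as input to an image classifier.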

5.
Sensors (Basel) ; 24(3)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38339579

ABSTRACT

The recognition of human activity is crucial as the Internet of Things (IoT) progresses toward future smart homes. Wi-Fi-based motion-recognition stands out due to its non-contact nature and widespread applicability. However, the channel state information (CSI) related to human movement in indoor environments changes with the direction of movement, which poses challenges for existing Wi-Fi movement-recognition methods. These challenges include limited directions of movement that can be detected, short detection distances, and inaccurate feature extraction, all of which significantly constrain the wide-scale application of Wi-Fi action-recognition. To address this issue, we propose a direction-independent CSI fusion and sharing model named CSI-F, one which combines Convolutional Neural Networks (CNN) and Gated Recurrent Units (GRU). Specifically, we have introduced a series of signal-processing techniques that utilize antenna diversity to eliminate random phase shifts, thereby removing noise influences unrelated to motion information. Later, by amplifying the Doppler frequency shift effect through cyclic actions and generating a spectrogram, we further enhance the impact of actions on CSI. To demonstrate the effectiveness of this method, we conducted experiments on datasets collected in natural environments. We confirmed that the superposition of periodic actions on CSI can improve the accuracy of the process. CSI-F can achieve higher recognition accuracy compared with other methods and a monitoring coverage of up to 6 m.
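
One common way to use antenna diversity against random phase shifts is conjugate multiplication between two receive antennas that share the same oscillator. The abstract does not confirm that CSI-F uses this technique, so the sketch below is an assumption, and `remove_common_phase` is a hypothetical name.

```python
import cmath


def remove_common_phase(csi_ant_a, csi_ant_b):
    """Cancel a phase offset shared by two antennas of the same receiver.

    If both antennas observe H_a * e^{j*phi} and H_b * e^{j*phi} with the
    same random phi per packet, then a * conj(b) = H_a * conj(H_b), and the
    random term disappears.
    """
    return [a * b.conjugate() for a, b in zip(csi_ant_a, csi_ant_b)]
```

The product still carries the motion-induced variation of both antennas, so later steps such as spectrogram generation would operate on it rather than on the raw CSI.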


Subjects
Internet of Things, Movement, Humans, Motion, Doppler Effect, Environment
6.
J Environ Manage ; 351: 119894, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38154219

ABSTRACT

Deep learning methods exhibit significant advantages in mapping highly nonlinear relationships with acceptable computational speed, and have been widely used to predict water quality. However, differing model selection and construction methods result in differences in prediction accuracy and performance. Hence, a unified deep learning framework for water quality prediction was established in the paper, comprising a data processing module, a feature enhancement module, and a data prediction module. In the established model, the data processing module, based on the wavelet transform, decomposes complex nonlinear meteorology, hydrology, and water quality data into multiple frequency-domain signals in order to extract the cyclic and fluctuation characteristics of the data. The feature enhancement module, based on the Informer Encoder, enhances the feature encoding of time series data in different frequency domains to discover global time-dependent features of the variables. Finally, the data prediction module, based on the stacked bidirectional long short-term memory network (SBiLSTM), strengthens the local correlation of feature sequences and predicts the water quality. The established model framework was applied to the Lijiang River in Guilin, China. The maximum relative errors between the predicted and observed values for dissolved oxygen (DO) and chemical oxygen demand (CODMn) were 12.4% and 20.7%, suggesting a satisfactory prediction performance of the established model. The validation results showed that the established model was superior to all other models in terms of prediction accuracy, with RMSE values of 0.329 and 0.121, MAE values of 0.217 and 0.057, and SMAPE values of 0.022 and 0.063 for DO and CODMn, respectively. Ablation tests confirmed the necessity and rationality of each module for the established model framework. The established method provides a unified deep learning framework for water quality prediction.
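
The error measures named in the abstract can be computed as in the sketch below. SMAPE has several definitions in the literature; this example assumes the form mean(2|p - o| / (|p| + |o|)), which may differ from the one used in the paper. All function names are illustrative.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def smape(observed, predicted):
    """Symmetric MAPE as a fraction (assumed definition, see lead-in).

    Undefined when an observed and predicted value are both zero.
    """
    return sum(
        2.0 * abs(p - o) / (abs(p) + abs(o))
        for o, p in zip(observed, predicted)
    ) / len(observed)


def max_relative_error(observed, predicted):
    """Largest |p - o| / |o| over all samples; observed values must be nonzero."""
    return max(abs(p - o) / abs(o) for o, p in zip(observed, predicted))
```

Because RMSE and MAE are in the units of the variable while SMAPE and relative error are dimensionless, the two groups are not directly comparable across DO and CODMn.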


Subjects
Deep Learning, Water Quality, China, Hydrology, Meteorology, Oxygen
7.
Entropy (Basel) ; 26(8)2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39202151

ABSTRACT

In order to minimize the disparity between visible and infrared modalities and enhance pedestrian feature representation, a cross-modality person re-identification method is proposed, which integrates modality generation and feature enhancement. Specifically, a lightweight network is used for dimension reduction and augmentation of visible images, and intermediate modalities are generated to bridge the gap between visible images and infrared images. The Convolutional Block Attention Module is embedded into the ResNet50 backbone network to selectively emphasize key features sequentially from both channel and spatial dimensions. Additionally, the Gradient Centralization algorithm is introduced into the Stochastic Gradient Descent optimizer to accelerate convergence speed and improve generalization capability of the network model. Experimental results on SYSU-MM01 and RegDB datasets demonstrate that our improved network model achieves significant performance gains, with an increase in Rank-1 accuracy of 7.12% and 6.34%, as well as an improvement in mAP of 4.00% and 6.05%, respectively.
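
Gradient Centralization can be sketched for a single fully connected layer as follows: before the SGD update, the mean of each output unit's gradient vector is subtracted so that every row sums to zero. The list-of-lists layout (one row per output unit), the function names, and the learning rate are assumptions for illustration; how the paper integrates the step with momentum or weight decay is not stated in the abstract.

```python
def centralize_gradient(grad):
    """Subtract the per-row mean from a 2D gradient (rows = output units)."""
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


def sgd_step(weights, grad, lr=0.1):
    """Plain SGD update that uses the centralized gradient."""
    centered = centralize_gradient(grad)
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, centered)
    ]
```

For convolutional weights the same idea applies with the mean taken over every dimension except the output-channel dimension.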

8.
Sensors (Basel) ; 23(19)2023 Sep 27.
Article in English | MEDLINE | ID: mdl-37836932

ABSTRACT

To address the color distortion and loss of detail information seen in most dehazing algorithms, an end-to-end image dehazing network based on multi-scale feature enhancement is proposed. Firstly, the feature extraction enhancement module is used to capture the detailed information of hazy images and expand the receptive field. Secondly, the channel attention mechanism and pixel attention mechanism of the feature fusion enhancement module are used to dynamically adjust the weights of different channels and pixels. Thirdly, the context enhancement module is used to enhance the context semantic information, suppress redundant information, and obtain the haze density image with higher detail. Finally, our method removes haze, preserves image color, and ensures image details. The proposed method achieved a PSNR score of 33.74, an SSIM score of 0.9843, and an LPIPS distance of 0.0040 on the SOTS-outdoor dataset. Compared with representative dehazing methods, it demonstrates better dehazing performance on synthetic hazy images. Combined with dehazing experiments on real hazy images, the results show that our method can effectively improve dehazing performance while preserving more image details and achieving color fidelity.
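
Of the three reported metrics, PSNR is simple enough to show in a few lines. The sketch below assumes images given as nested lists of intensities in the range 0 to `max_value`; the function name and the default range are illustrative.

```python
import math


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared = [
        (r - s) ** 2
        for ref_row, res_row in zip(reference, restored)
        for r, s in zip(ref_row, res_row)
    ]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher is better for PSNR and SSIM, while LPIPS is a distance, so lower is better.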

9.
Sensors (Basel) ; 23(7)2023 Mar 24.
Article in English | MEDLINE | ID: mdl-37050483

ABSTRACT

Facial expression recognition (FER) faces problems such as facial occlusion and head pose variations. These two problems lead to incomplete facial information in images, making feature extraction extremely difficult. Most current methods use prior knowledge or fixed-size patches to perform local cropping, thereby enhancing the ability to acquire fine-grained features. However, the former requires extra data processing work and is prone to errors; the latter destroys the integrity of local features. In this paper, we propose a local Sliding Window Attention Network (SWA-Net) for FER. Specifically, we propose a sliding window strategy for feature-level cropping, which preserves the integrity of local features and does not require complex preprocessing. Moreover, the local feature enhancement module mines fine-grained features with intraclass semantics through a multiscale depth network. The adaptive local feature selection module is introduced to prompt the model to find more essential local features. Extensive experiments demonstrate that our SWA-Net model achieves performance comparable to that of state-of-the-art methods, with scores of 90.03% on RAF-DB, 89.22% on FERPlus, and 63.97% on AffectNet.
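
Feature-level cropping with a sliding window can be sketched as below for a single-channel feature map stored as a nested list. The window size, stride, and function name are hypothetical; the abstract does not give the values used in SWA-Net.

```python
def sliding_windows(feature_map, window, stride):
    """Return all window x window crops of a 2D feature map.

    With stride < window, neighbouring crops overlap, so a local pattern
    that straddles one crop boundary is still contained whole in another.
    """
    height, width = len(feature_map), len(feature_map[0])
    crops = []
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            crops.append(
                [row[left:left + window]
                 for row in feature_map[top:top + window]]
            )
    return crops
```

A 4 x 4 map with a 2 x 2 window yields four disjoint crops at stride 2 and nine overlapping crops at stride 1.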


Subjects
Facial Recognition, Face, Knowledge, Semantics, Facial Expression
10.
Sensors (Basel) ; 23(5)2023 Feb 22.
Article in English | MEDLINE | ID: mdl-36904643

ABSTRACT

As small commodity features are often few in number and easily occluded by hands, the overall detection accuracy is low, and small commodity detection is still a great challenge. Therefore, in this study, a new algorithm for occlusion detection is proposed. Firstly, a super-resolution algorithm with an outline feature extraction module is used to process the input video frames to restore high-frequency details, such as the contours and textures of the commodities. Next, residual dense networks are used for feature extraction, and the network is guided to extract commodity feature information under the effects of an attention mechanism. As small commodity features are easily ignored by the network, a new local adaptive feature enhancement module is designed to enhance the regional commodity features in the shallow feature map to enhance the expression of the small commodity feature information. Finally, a small commodity detection box is generated through the regional regression network to complete the small commodity detection task. Compared to RetinaNet, the F1-score improved by 2.6%, and the mean average precision improved by 2.45%. The experimental results reveal that the proposed method can effectively enhance the expressions of the salient features of small commodities and further improve the detection accuracy for small commodities.

11.
Entropy (Basel) ; 25(9)2023 Sep 17.
Article in English | MEDLINE | ID: mdl-37761649

ABSTRACT

The House-Tree-Person (HTP) sketch test is a psychological analysis technique designed to assess the mental health status of test subjects. Nowadays, there are mature methods for the recognition of depression using the HTP sketch test. However, existing works primarily rely on manual analysis of drawing features, which has the drawbacks of strong subjectivity and low automation. Only a small number of works automatically recognize depression using machine learning and deep learning methods, but their complex data preprocessing pipelines and multi-stage computational processes indicate a relatively low level of automation. To overcome the above issues, we present a novel deep learning-based one-stage approach for depression recognition in HTP sketches, which has a simple data preprocessing pipeline and calculation process with a high accuracy rate. In terms of data, we use a hand-drawn HTP sketch dataset, which contains drawings of normal people and patients with depression. In the model aspect, we design a novel network called Feature-Enhanced Bi-Level Attention Network (FBANet), which contains feature enhancement and bi-level attention modules. Due to the limited size of the collected data, transfer learning is employed, where the model is pre-trained on a large-scale sketch dataset and fine-tuned on the HTP sketch dataset. On the HTP sketch dataset, utilizing cross-validation, FBANet achieves a maximum accuracy of 99.07% on the validation dataset, with an average accuracy of 97.71%, outperforming traditional classification models and previous works. In summary, the proposed FBANet, after pre-training, demonstrates superior performance on the HTP sketch dataset and is expected to be a method for the auxiliary diagnosis of depression.

12.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 40(3): 409-417, 2023 Jun 25.
Article in Chinese | MEDLINE | ID: mdl-37380378

ABSTRACT

High-frequency steady-state asymmetric visual evoked potential (SSaVEP) provides a new paradigm for designing comfortable and practical brain-computer interface (BCI) systems. However, because high-frequency signals have weak amplitude and strong noise, it is important to study how to enhance their signal features. In this study, a 30 Hz high-frequency visual stimulus was used, and the peripheral visual field was equally divided into eight annular sectors. Eight kinds of annular sector pairs were selected based on the mapping relationship of visual space onto the primary visual cortex (V1), and three phases (in-phase [0°, 0°], anti-phase [0°, 180°], and anti-phase [180°, 0°]) were designed for each annular sector pair to explore response intensity and signal-to-noise ratio under phase modulation. A total of 8 healthy subjects were recruited for the experiment. The results showed that three annular sector pairs exhibited significant differences in SSaVEP features under phase modulation at 30 Hz high-frequency stimulation. The spatial feature analysis showed that the two types of features of the annular sector pair in the lower visual field were significantly higher than those in the upper visual field. This study further used the filter bank and ensemble task-related component analysis to calculate the classification accuracy of annular sector pairs under three-phase modulations, and the average accuracy was up to 91.5%, which proved that the phase-modulated SSaVEP features could be used to encode high-frequency SSaVEP. In summary, the results of this study provide new ideas for enhancing the features of high-frequency SSaVEP signals and expanding the instruction set of the traditional steady-state visual evoked potential paradigm.


Subjects
Brain-Computer Interfaces, Visual Evoked Potentials, Humans, Healthy Volunteers, Signal-to-Noise Ratio
13.
Sensors (Basel) ; 22(14)2022 Jul 20.
Article in English | MEDLINE | ID: mdl-35891112

ABSTRACT

Regularization has become an important method in adversarial defense. However, the existing regularization-based defense methods do not discuss which features in convolutional neural networks (CNN) are more suitable for regularization. Thus, in this paper, we propose a multi-stage feature fusion network with a feature regularization operation, which is called Enhanced Multi-Stage Feature Fusion Network (EMSF2Net). EMSF2Net mainly combines three parts: multi-stage feature enhancement (MSFE), multi-stage feature fusion (MSF2), and regularization. Specifically, MSFE aims to obtain enhanced and expressive features in each stage by multiplying the features of each channel; MSF2 aims to fuse the enhanced features of different stages to further enrich the information of the feature, and the regularization part can regularize the fused and original features during the training process. EMSF2Net has proved that if the regularization term of the enhanced multi-stage feature is added, the adversarial robustness of CNN will be significantly improved. The experimental results on extensive white-box attacks on the CIFAR-10 dataset illustrate the robustness and effectiveness of the proposed method.


Subjects
Neural Networks (Computer)
14.
Sensors (Basel) ; 22(14)2022 Jul 12.
Article in English | MEDLINE | ID: mdl-35890899

ABSTRACT

This paper proposes a tracking method combining feature enhancement and template update, aiming to solve the problems of existing trackers lacking global information attention, weak feature characterization ability, and not being well adapted to the changing appearance of the target. Pre-extracted features are enhanced in context and on channels through a feature enhancement network consisting of channel attention and transformer architectures. The enhanced feature information is input into classification and regression networks to achieve the final target state estimation. At the same time, the template update strategy is introduced to update the sample template judiciously. Experimental results show that the proposed tracking method exhibits good tracking performance on the OTB100, LaSOT, and GOT-10k benchmark datasets.


Subjects
Attention, Computer-Assisted Image Processing, Physiological Adaptation, Electric Power Supplies, Computer-Assisted Image Processing/methods
15.
Sensors (Basel) ; 22(3)2022 Jan 18.
Article in English | MEDLINE | ID: mdl-35161462

ABSTRACT

Ship type classification is an essential task in maritime navigation domains, contributing to shipping monitoring, analysis, and forecasting. Presently, with the development of ship positioning and monitoring systems, many ship trajectory acquisitions make it possible to classify ships according to their movement pattern. Existing methods of ship classification based on trajectory include classical sequence analysis and deep learning methods. However, the real ship trajectories are unevenly distributed in geographical space, which leads to many problems in inferring the ship movement mode on the original ship trajectory. This paper proposes a hierarchical spatial-temporal embedding method based on enhanced trajectory features for ship type classification. We first preprocess the trajectory and combine the port information to transform the original ship trajectory into the moored records of ships, removing the unevenly distributed points in the trajectory data and enhancing key points' semantic information. Then, we propose a Hierarchical Spatial-Temporal Embedding Method (Hi-STEM) for ship classification. Hi-STEM maps moored records in the original geographical space into the feature space and can efficiently find the classification plane in the feature space. Experiments are conducted on real-world datasets and compared with several existing methods. The result shows that our approach has high accuracy in ship classification on ship moored records. We make the source code and datasets publicly available.


Subjects
Semantics, Ships
16.
Sensors (Basel) ; 21(5)2021 Mar 05.
Article in English | MEDLINE | ID: mdl-33807795

ABSTRACT

Existing pedestrian detection algorithms cannot effectively extract features of heavily occluded targets, which results in lower detection accuracy. To handle heavy occlusion in crowds, we propose a multi-scale feature pyramid network based on ResNet (MFPN) to enhance the features of occluded targets and improve the detection accuracy. MFPN includes two modules, namely a double feature pyramid network (FPN) integrated with ResNet (DFR) and repulsion loss of minimum (RLM). The double FPN improves the architecture to further enhance the semantic information and contours of occluded pedestrians, and provides a new way to extract features of occluded targets. The features extracted by our network are more separated and clearer, especially for heavily occluded pedestrians. Repulsion loss is introduced to improve the loss function, keeping predicted boxes away from the ground truths of unrelated targets. In experiments on the public CrowdHuman dataset, we obtain 90.96% AP, the best performance, a gain of 5.16% AP over the FPN-ResNet50 baseline. Compared with state-of-the-art works, our method boosts the performance of the pedestrian detection system.

17.
Sensors (Basel) ; 21(14)2021 Jul 07.
Article in English | MEDLINE | ID: mdl-34300386

ABSTRACT

In recent years, more and more frameworks have been applied to brain-computer interface technology, and electroencephalogram-based motor imagery (MI-EEG) is developing rapidly. However, it is still a challenge to improve the accuracy of MI-EEG classification. A deep learning framework termed IS-CBAM-convolutional neural network (CNN) is proposed to address the non-stationary nature, the temporal localization of excitation occurrence, and the frequency band distribution characteristics of the MI-EEG signal in this paper. First, according to the logically symmetrical relationship between the C3 and C4 channels, the result of the time-frequency image subtraction (IS) for the MI-EEG signal is used as the input of the classifier. It both reduces the redundancy and increases the feature differences of the input data. Second, the attention module is added to the classifier. A convolutional neural network is built as the base classifier, and information on the temporal location and frequency distribution of MI-EEG signal occurrences are adaptively extracted by introducing the Convolutional Block Attention Module (CBAM). This approach reduces irrelevant noise interference while increasing the robustness of the pattern. The performance of the framework was evaluated on BCI competition IV dataset 2b, where the mean accuracy reached 79.6%, and the average kappa value reached 0.592. The experimental results validate the feasibility of the framework and show the performance improvement of MI-EEG signal classification.
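
The kappa value reported alongside accuracy can be computed from a confusion matrix as in this sketch. The function name and the matrix layout (rows = true class, columns = predicted class) are illustrative.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix of counts.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    (accuracy) and p_e is the agreement expected by chance from the row
    and column totals.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)
```

For a two-class problem with balanced classes, chance agreement is 0.5, so kappa reduces to 2 x accuracy - 1. An accuracy of 79.6% then corresponds to a kappa of 0.592, which matches the pair of values quoted in the abstract.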


Subjects
Brain-Computer Interfaces, Imagination, Algorithms, Electroencephalography, Neural Networks (Computer)
18.
J Digit Imaging ; 33(1): 273-285, 2020 02.
Article in English | MEDLINE | ID: mdl-31270646

ABSTRACT

Speckle noise reduction algorithms are extensively used in the field of ultrasound image analysis with the aim of improving image quality and diagnostic accuracy. However, significant speckle filtering induces blurring, and this requires the enhancement of features and fine details. We propose a novel framework for both multiplicative noise suppression and robust contrast enhancement and demonstrate its effectiveness using a wide range of clinical ultrasound scans. Our approach to noise suppression uses a novel algorithm based on a convolutional neural network that is first trained on synthetically modeled ultrasound images and then applied on real ultrasound videos. The feature improvement stage uses an improved contrast-limited adaptive histogram equalization (CLAHE) method for enhancing texture features, contrast, resolvable details, and image structures to which the human visual system is sensitive in ultrasound video frames. The proposed CLAHE algorithm also considers an automatic system for evaluating the grid size using entropy, and three different target distribution functions (uniform, Rayleigh, and exponential), and interpolation techniques (B-spline, cubic, and Lanczos-3). An extensive comparative study has been performed to find the most suitable distribution and interpolation techniques and also the optimal clip limit for ultrasound video feature enhancement after speckle suppression. Subjective assessments by four radiologists and experimental validation using three quality metrics clearly indicate that the proposed framework generates superior performance compared with other well-established methods. The processing pipeline reduces speckle effectively while preserving essential information and enhancing the overall visual quality and therefore could find immediate applications in real-time ultrasound video segmentation and classification algorithms.
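
The role of the clip limit in CLAHE can be illustrated with the histogram clipping step alone. This is a minimal single-pass sketch with integer counts, not the paper's improved variant (which adds entropy-based grid sizing, alternative target distributions, and interpolation choices); `clip_histogram` is a hypothetical name.

```python
def clip_histogram(hist, clip_limit):
    """Clip each bin at clip_limit and spread the excess over all bins.

    hist: list of non-negative integer counts for one tile.
    The total count is preserved. A single pass is used here, so some bins
    can end slightly above clip_limit; practical implementations often
    repeat the step until the excess is negligible.
    """
    excess = sum(max(0, count - clip_limit) for count in hist)
    clipped = [min(count, clip_limit) for count in hist]
    share, remainder = divmod(excess, len(hist))
    return [
        count + share + (1 if i < remainder else 0)
        for i, count in enumerate(clipped)
    ]
```

A lower clip limit flattens the histogram more, which limits how much equalization can amplify noise in nearly uniform regions; this is why the choice of clip limit matters after speckle suppression.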


Subjects
Image Enhancement, Algorithms, Computer Systems, Humans, Ultrasonography
19.
Sensors (Basel) ; 19(4)2019 Feb 16.
Article in English | MEDLINE | ID: mdl-30781487

ABSTRACT

Electroencephalography (EEG) provides a non-invasive, portable and low-cost way to convert neural signals into electrical signals. Using EEG to monitor people's cognitive workload is valuable, especially for tasks demanding high attention. Before deep neural networks became a research hotspot, the use of spectrum information and the common spatial pattern algorithm (CSP) was the most popular method to classify EEG-based cognitive workloads. Recently, spectral maps have been combined with deep neural networks to achieve a final accuracy of 91.1% across four levels of cognitive workload. In this study, a parallel mechanism of spectral feature-enhanced maps is proposed which enhances the expression of structural information that may be compressed by inter- and intra-subject differences. A public dataset and milestone neural networks, such as AlexNet, VGGNet, ResNet, and DenseNet, are used to measure the effectiveness of this approach. As a result, the classification accuracy is improved from 91.10% to 93.71%.

20.
Sensors (Basel) ; 18(3)2018 Mar 16.
Article in English | MEDLINE | ID: mdl-29547551

ABSTRACT

In object detection systems for autonomous driving, LIDAR sensors provide very useful information. However, problems occur because the object representation is greatly distorted by changes in distance. To solve this problem, we propose a LIDAR shape set that reconstructs the shape surrounding the object more clearly by using the LIDAR point information projected on the object. The LIDAR shape set restores object shape edges from a bird's eye view by filtering LIDAR points projected on a 2D pixel-based front view. In this study, we use this shape set for two purposes. The first is to supplement the shape set with a LIDAR feature map, and the second is to divide the entire shape set according to the gradient of the depth and density to create a 2D and 3D bounding box proposal for each object. We present a multimodal fusion framework that classifies objects and restores the 3D pose of each object using enhanced feature maps and shape-based proposals. The network structure consists of a VGG-based object classifier that receives multiple inputs and a LIDAR-based Region Proposal Network (RPN) that identifies object poses. It works in a very intuitive and efficient manner and can be extended to classes other than vehicles. On the KITTI data sets, our method outperforms the latest studies in object classification accuracy (Average Precision, AP) and 3D pose restoration accuracy (3D bounding box recall rate).
