Pesquisa | Secretaria de Estado da Saúde

1.

SEG-SLAM: Dynamic Indoor RGB-D Visual SLAM Integrating Geometric and YOLOv5-Based Semantic Information.

Cong, Peichao; Li, Jiaxing; Liu, Junjie; Xiao, Yixuan; Zhang, Xin.

Sensors (Basel) ; 24(7)2024 Mar 25.

Artigo em Inglês | MEDLINE | ID: mdl-38610313

RESUMO

Simultaneous localisation and mapping (SLAM) is crucial in mobile robotics. Most visual SLAM systems assume that the environment is static. However, in real life, there are many dynamic objects, which affect the accuracy and robustness of these systems. To improve the performance of visual SLAM systems, this study proposes a dynamic visual SLAM (SEG-SLAM) system based on the orientated FAST and rotated BRIEF (ORB)-SLAM3 framework and you only look once (YOLO)v5 deep-learning method. First, based on the ORB-SLAM3 framework, the YOLOv5 deep-learning method is used to construct a fusion module for target detection and semantic segmentation. This module can effectively identify and extract prior information for obviously and potentially dynamic objects. Second, differentiated dynamic feature point rejection strategies are developed for different dynamic objects using the prior information, depth information, and epipolar geometry method. Thus, the localisation and mapping accuracy of the SEG-SLAM system is improved. Finally, the rejection results are fused with the depth information, and a static dense 3D mapping without dynamic objects is constructed using the Point Cloud Library. The SEG-SLAM system is evaluated using public TUM datasets and real-world scenarios. The proposed method is more accurate and robust than current dynamic visual SLAM algorithms.

2.

Nonintrusive and Effective Volume Reconstruction Model of Swimming Sturgeon Based on RGB-D Sensor.

Lin, Kai; Zhang, Shiyu; Hu, Junjie; Li, Hongsong; Guo, Wenzhong; Hu, Hongxia.

Sensors (Basel) ; 24(15)2024 Aug 03.

Artigo em Inglês | MEDLINE | ID: mdl-39124084

RESUMO

The sturgeon is an important commercial aquaculture species in China. The measurement of sturgeon mass plays a remarkable role in aquaculture management. Furthermore, the measurement of sturgeon mass serves as a key phenotype, offering crucial information for enhancing growth traits through genetic improvement. Until now, the measurement of sturgeon mass is usually conducted by manual sampling, which is work intensive and time consuming for farmers and invasive and stressful for the fish. Therefore, a noninvasive volume reconstruction model for estimating the mass of swimming sturgeon based on RGB-D sensor was proposed in this paper. The volume of individual sturgeon was reconstructed by integrating the thickness of the upper surface of the sturgeon, where the difference in depth between the surface and the bottom was used as the thickness measurement. To verify feasibility, three experimental groups were conducted, achieving prediction accuracies of 0.897, 0.861, and 0.883, which indicated that the method can obtain the reliable, accurate mass of the sturgeon. The strategy requires no special hardware or intensive calculation, and it provides a key to uncovering noncontact, high-throughput, and highly sensitive mass evaluation of sturgeon while holding potential for evaluating the mass of other cultured fishes.

Assuntos

Aquicultura , Peixes , Natação , Animais , Peixes/fisiologia , Natação/fisiologia , Aquicultura/métodos

3.

Revolutionizing Robotic Depalletizing: AI-Enhanced Parcel Detecting with Adaptive 3D Machine Vision and RGB-D Imaging for Automated Unloading.

Kim, Seongje; Truong, Van-Doi; Lee, Kwang-Hee; Yoon, Jonghun.

Sensors (Basel) ; 24(5)2024 Feb 24.

Artigo em Inglês | MEDLINE | ID: mdl-38475009

RESUMO

Detecting parcels accurately and efficiently has always been a challenging task when unloading from trucks onto conveyor belts because of the diverse and complex ways in which parcels are stacked. Conventional methods struggle to quickly and accurately classify the various shapes and surface patterns of unordered parcels. In this paper, we propose a parcel-picking surface detection method based on deep learning and image processing for the efficient unloading of diverse and unordered parcels. Our goal is to develop a systematic image processing algorithm that emphasises the boundaries of parcels regardless of their shape, pattern, or layout. The core of the algorithm is the utilisation of RGB-D technology for detecting the primary boundary lines regardless of obstacles such as adhesive labels, tapes, or parcel surface patterns. For cases where detecting the boundary lines is difficult owing to narrow gaps between parcels, we propose using deep learning-based boundary line detection through the You Only Look at Coefficients (YOLACT) model. Using image segmentation techniques, the algorithm efficiently predicts boundary lines, enabling the accurate detection of irregularly sized parcels with complex surface patterns. Furthermore, even for rotated parcels, we can extract their edges through complex mathematical operations using the depth values of the specified position, enabling the detection of the wider surfaces of the rotated parcels. Finally, we validate the accuracy and real-time performance of our proposed method through various case studies, achieving mAP (50) values of 93.8% and 90.8% for randomly sized and rotationally covered boxes with diverse colours and patterns, respectively.

4.

Amount Estimation Method for Food Intake Based on Color and Depth Images through Deep Learning.

Lee, Dong-Seok; Kwon, Soon-Kak.

Sensors (Basel) ; 24(7)2024 Mar 22.

Artigo em Inglês | MEDLINE | ID: mdl-38610258

RESUMO

In this paper, we propose an amount estimation method for food intake based on both color and depth images. Two pairs of color and depth images are captured pre- and post-meals. The pre- and post-meal color images are employed to detect food types and food existence regions using Mask R-CNN. The post-meal color image is spatially transformed to match the food region locations between the pre- and post-meal color images. The same transformation is also performed on the post-meal depth image. The pixel values of the post-meal depth image are compensated to reflect 3D position changes caused by the image transformation. In both the pre- and post-meal depth images, a space volume for each food region is calculated by dividing the space between the food surfaces and the camera into multiple tetrahedra. The food intake amounts are estimated as the difference in space volumes calculated from the pre- and post-meal depth images. From the simulation results, we verify that the proposed method estimates the food intake amount with an error of up to 2.2%.

Assuntos

Aprendizado Profundo , Simulação por Computador , Alimentos , Período Pós-Prandial , Ingestão de Alimentos

5.

SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection.

Peng, Yanbin; Zhai, Zhinian; Feng, Mingkun.

Sensors (Basel) ; 24(4)2024 Feb 08.

Artigo em Inglês | MEDLINE | ID: mdl-38400274

RESUMO

Salient Object Detection (SOD) in RGB-D images plays a crucial role in the field of computer vision, with its central aim being to identify and segment the most visually striking objects within a scene. However, optimizing the fusion of multi-modal and multi-scale features to enhance detection performance remains a challenge. To address this issue, we propose a network model based on semantic localization and multi-scale fusion (SLMSF-Net), specifically designed for RGB-D SOD. Firstly, we designed a Deep Attention Module (DAM), which extracts valuable depth feature information from both channel and spatial perspectives and efficiently merges it with RGB features. Subsequently, a Semantic Localization Module (SLM) is introduced to enhance the top-level modality fusion features, enabling the precise localization of salient objects. Finally, a Multi-Scale Fusion Module (MSF) is employed to perform inverse decoding on the modality fusion features, thus restoring the detailed information of the objects and generating high-precision saliency maps. Our approach has been validated across six RGB-D salient object detection datasets. The experimental results indicate an improvement of 0.20~1.80%, 0.09~1.46%, 0.19~1.05%, and 0.0002~0.0062, respectively in maxF, maxE, S, and MAE metrics, compared to the best competing methods (AFNet, DCMF, and C2DFNet).

6.

A Computer Vision-Based System to Help Health Professionals to Apply Tests for Fall Risk Assessment.

Blasco-García, Jesús Damián; García-López, Gabriel; Jiménez-Muñoz, Marta; López-Riquelme, Juan Antonio; Feliu-Batlle, Jorge Juan; Pavón-Pulido, Nieves; Herrero, María-Trinidad.

Sensors (Basel) ; 24(6)2024 Mar 21.

Artigo em Inglês | MEDLINE | ID: mdl-38544276

RESUMO

The increase in life expectancy, and the consequent growth of the elderly population, represents a major challenge to guarantee adequate health and social care. The proposed system aims to provide a tool that automates the evaluation of gait and balance, essential to prevent falls in older people. Through an RGB-D camera, it is possible to capture and digitally represent certain parameters that describe how users carry out certain human motions and poses. Such individual motions and poses are actually related to items included in many well-known gait and balance evaluation tests. According to that information, therapists, who would not need to be present during the execution of the exercises, evaluate the results of such tests and could issue a diagnosis by storing and analyzing the sequences provided by the developed system. The system was validated in a laboratory scenario, and subsequently a trial was carried out in a nursing home with six residents. Results demonstrate the usefulness of the proposed system and the ease of objectively evaluating the main items of clinical tests by using the parameters calculated from information acquired with the RGB-D sensor. In addition, it lays the future foundations for creating a Cloud-based platform for remote fall risk assessment and its integration with a mobile assistant robot, and for designing Artificial Intelligence models that can detect patterns and identify pathologies for enabling therapists to prevent falls in users under risk.

Assuntos

Inteligência Artificial , Terapia por Exercício , Humanos , Idoso , Medição de Risco/métodos , Computadores

7.

FusionVision: A Comprehensive Approach of 3D Object Reconstruction and Segmentation from RGB-D Cameras Using YOLO and Fast Segment Anything.

El Ghazouali, Safouane; Mhirit, Youssef; Oukhrid, Ali; Michelucci, Umberto; Nouira, Hichem.

Sensors (Basel) ; 24(9)2024 Apr 30.

Artigo em Inglês | MEDLINE | ID: mdl-38732995

RESUMO

In the realm of computer vision, the integration of advanced techniques into the pre-processing of RGB-D camera inputs poses a significant challenge, given the inherent complexities arising from diverse environmental conditions and varying object appearances. Therefore, this paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery. Traditional computer vision systems face limitations in simultaneously capturing precise object boundaries and achieving high-precision object detection on depth maps, as they are mainly proposed for RGB cameras. To address this challenge, FusionVision adopts an integrated approach by merging state-of-the-art object detection techniques, with advanced instance segmentation methods. The integration of these components enables a holistic (unified analysis of information obtained from both color RGB and depth D channels) interpretation of RGB-D data, facilitating the extraction of comprehensive and accurate object information in order to improve post-processes such as object 6D pose estimation, Simultanious Localization and Mapping (SLAM) operations, accurate 3D dataset extraction, etc. The proposed FusionVision pipeline employs YOLO for identifying objects within the RGB image domain. Subsequently, FastSAM, an innovative semantic segmentation model, is applied to delineate object boundaries, yielding refined segmentation masks. The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation, enhancing overall precision in 3D object segmentation.

8.

A Smart Cane Based on 2D LiDAR and RGB-D Camera Sensor-Realizing Navigation and Obstacle Recognition.

Mai, Chunming; Chen, Huaze; Zeng, Lina; Li, Zaijin; Liu, Guojun; Qiao, Zhongliang; Qu, Yi; Li, Lianhe; Li, Lin.

Sensors (Basel) ; 24(3)2024 Jan 29.

Artigo em Inglês | MEDLINE | ID: mdl-38339588

RESUMO

In this paper, an intelligent blind guide system based on 2D LiDAR and RGB-D camera sensing is proposed, and the system is mounted on a smart cane. The intelligent guide system relies on 2D LiDAR, an RGB-D camera, IMU, GPS, Jetson nano B01, STM32, and other hardware. The main advantage of the intelligent guide system proposed by us is that the distance between the smart cane and obstacles can be measured by 2D LiDAR based on the cartographer algorithm, thus achieving simultaneous localization and mapping (SLAM). At the same time, through the improved YOLOv5 algorithm, pedestrians, vehicles, pedestrian crosswalks, traffic lights, warning posts, stone piers, tactile paving, and other objects in front of the visually impaired can be quickly and effectively identified. Laser SLAM and improved YOLOv5 obstacle identification tests were carried out inside a teaching building on the campus of Hainan Normal University and on a pedestrian crossing on Longkun South Road in Haikou City, Hainan Province. The results show that the intelligent guide system developed by us can drive the omnidirectional wheels at the bottom of the smart cane and provide the smart cane with a self-leading blind guide function, like a "guide dog", which can effectively guide the visually impaired to avoid obstacles and reach their predetermined destination, and can quickly and effectively identify the obstacles on the way out. The mapping and positioning accuracy of the system's laser SLAM is 1 m ± 7 cm, and the laser SLAM speed of this system is 25~31 FPS, which can realize the short-distance obstacle avoidance and navigation function both in indoor and outdoor environments. The improved YOLOv5 helps to identify 86 types of objects. The recognition rates for pedestrian crosswalks and for vehicles are 84.6% and 71.8%, respectively; the overall recognition rate for 86 types of objects is 61.2%, and the obstacle recognition rate of the intelligent guide system is 25-26 FPS.

9.

Optimally leveraging depth features to enhance segmentation of recyclables from cluttered construction and demolition waste streams.

Prasad, Vineet; Arashpour, Mehrdad.

J Environ Manage ; 354: 120313, 2024 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-38367501

RESUMO

This paper addresses the critical environmental issue of effectively managing construction and demolition waste (CDW), which has seen a global surge due to rapid urbanization. With the advent of deep learning-based computer vision, this study focuses on improving intelligent identification of valuable recyclables from cluttered and heterogeneous CDW streams in material recovery facilities (MRFs) by optimally leveraging both visual and spatial features (depth). A high-quality CDW RGB-D dataset was curated to capture MRF stream complexities often overlooked in prior studies, and comprises over 3500 images for each modality and more than 160,000 dense object instances of diverse CDW materials with high resource value. In contrast to former studies which directly concatenate RGB and depth features, this study introduces a new depth fusion strategy that utilizes computationally efficient convolutional operations at the end of the conventional waste segmentation architecture to effectively fuse colour and depth information. This avoids cross-modal interference and maximizes the use of distinct information present in the two different modalities. Despite the high clutter and diversity of waste objects, the proposed RGB-DL architecture achieves a 13% increase in segmentation accuracy and a 36% reduction in inference time when compared to the direct concatenation of features. The findings of this study emphasize the benefit of effectively incorporating geometrical features to complement visual cues. This approach helps to deal with the cluttered and varied nature of CDW streams, enhancing automated waste recognition accuracy to improve resource recovery in MRFs. This, in turn, promotes intelligent solid waste management for efficiently managing environmental concerns.

Assuntos

Indústria da Construção , Gerenciamento de Resíduos , Indústria da Construção/métodos , Materiais de Construção , Reciclagem/métodos , Gerenciamento de Resíduos/métodos , Resíduos Sólidos/análise , Resíduos Industriais/análise

10.

Robot Pose Estimation and Normal Trajectory Generation on Curved Surface Using an Enhanced Non-Contact Approach.

Shah, Syed Humayoon; Lin, Chyi-Yeu; Tran, Chi-Cuong; Ahmad, Anton Royanto.

Sensors (Basel) ; 23(8)2023 Apr 07.

Artigo em Inglês | MEDLINE | ID: mdl-37112157

RESUMO

The use of robots for machining operations has become very popular in the last few decades. However, the challenge of the robotic-based machining process, such as surface finishing on curved surfaces, still persists. Prior studies (non-contact- and contact-based) have their own limitations, such as fixture error and surface friction. To cope with these challenges, this study proposes an advanced technique for path correction and normal trajectory generation while tracking a curved workpiece's surface. Initially, a key-point selection approach is used to estimate a reference workpiece's coordinates using a depth measuring tool. This approach overcomes the fixture errors and enables the robot to track the desired path, i.e., where the surface normal trajectory is needed. Subsequently, this study employs an attached RGB-D camera on the end-effector of the robot for determining the depth and angle between the robot and the contact surface, which nullifies surface friction issues. The point cloud information of the contact surface is employed by the pose correction algorithm to guarantee the robot's perpendicularity and constant contact with the surface. The efficiency of the proposed technique is analyzed by carrying out several experimental trials using a 6 DOF robot manipulator. The results reveal a better normal trajectory generation than previous state-of-the-art research, with an average angle and depth error of 1.8 degrees and 0.4 mm.

11.

The Effect of Surrounding Vegetation on Basal Stem Measurements Acquired Using Low-Cost Depth Sensors in Urban and Native Forest Environments.

McGlade, James; Wallace, Luke; Hally, Bryan; Reinke, Karin; Jones, Simon.

Sensors (Basel) ; 23(8)2023 Apr 12.

Artigo em Inglês | MEDLINE | ID: mdl-37112278

RESUMO

Three colour and depth (RGB-D) devices were compared, to assess the effect of depth image misalignment, resulting from simultaneous localisation and mapping (SLAM) error, due to forest structure complexity. Urban parkland (S1) was used to assess stem density, and understory vegetation (≤1.3 m) was assessed in native woodland (S2). Individual stem and continuous capture approaches were used, with stem diameter at breast height (DBH) estimated. Misalignment was present within point clouds; however, no significant differences in DBH were observed for stems captured at S1 with either approach (Kinect p = 0.16; iPad p = 0.27; Zed p = 0.79). Using continuous capture, the iPad was the only RGB-D device to maintain SLAM in all S2 plots. There was significant correlation between DBH error and surrounding understory vegetation with the Kinect device (p = 0.04). Conversely, there was no significant relationship between DBH error and understory vegetation for the iPad (p = 0.55) and Zed (p = 0.86). The iPad had the lowest DBH root-mean-square error (RMSE) across both individual stem (RMSE = 2.16cm) and continuous (RMSE = 3.23cm) capture approaches. The results suggest that the assessed RGB-D devices are more capable of operation within complex forest environments than previous generations.

Assuntos

Florestas

12.

WE3DS: An RGB-D Image Dataset for Semantic Segmentation in Agriculture.

Kitzler, Florian; Barta, Norbert; Neugschwandtner, Reinhard W; Gronauer, Andreas; Motsch, Viktoria.

Sensors (Basel) ; 23(5)2023 Mar 01.

Artigo em Inglês | MEDLINE | ID: mdl-36904917

RESUMO

Smart farming (SF) applications rely on robust and accurate computer vision systems. An important computer vision task in agriculture is semantic segmentation, which aims to classify each pixel of an image and can be used for selective weed removal. State-of-the-art implementations use convolutional neural networks (CNN) that are trained on large image datasets. In agriculture, publicly available RGB image datasets are scarce and often lack detailed ground-truth information. In contrast to agriculture, other research areas feature RGB-D datasets that combine color (RGB) with additional distance (D) information. Such results show that including distance as an additional modality can improve model performance further. Therefore, we introduce WE3DS as the first RGB-D image dataset for multi-class plant species semantic segmentation in crop farming. It contains 2568 RGB-D images (color image and distance map) and corresponding hand-annotated ground-truth masks. Images were taken under natural light conditions using an RGB-D sensor consisting of two RGB cameras in a stereo setup. Further, we provide a benchmark for RGB-D semantic segmentation on the WE3DS dataset and compare it with a solely RGB-based model. Our trained models achieve up to 70.7% mean Intersection over Union (mIoU) for discriminating between soil, seven crop species, and ten weed species. Finally, our work confirms the finding that additional distance information improves segmentation quality.

13.

RGB-D-Based Stair Detection and Estimation Using Deep Learning.

Wang, Chen; Pei, Zhongcai; Qiu, Shuang; Tang, Zhiyong.

Sensors (Basel) ; 23(4)2023 Feb 15.

Artigo em Inglês | MEDLINE | ID: mdl-36850775

RESUMO

Stairs are common vertical traffic structures in buildings, and stair detection tasks are important in environmental perception for autonomous mobile robots. Most existing algorithms have difficulty combining the visual information from binocular sensors effectively and ensuring reliable detection at night and in the case of extremely fuzzy visual clues. To solve these problems, we propose a stair detection network with red-green-blue (RGB) and depth inputs. Specifically, we design a selective module, which can make the network learn the complementary relationship between the RGB feature maps and the depth feature maps and fuse the features effectively in different scenes. In addition, we propose several postprocessing algorithms, including a stair line clustering algorithm and a coordinate transformation algorithm, to obtain the stair geometric parameters. Experiments show that our method has better performance than existing the state-of-the-art deep learning method, and the accuracy, recall, and runtime are improved by 5.64%, 7.97%, and 3.81 ms, respectively. The improved indexes show the effectiveness of the multimodal inputs and the selective module. The estimation values of stair geometric parameters have root mean square errors within 15 mm when ascending stairs and 25 mm when descending stairs. Our method also has extremely fast detection speed, which can meet the requirements of most real-time applications.

14.

Assessing Handrail-Use Behavior during Stair Ascent or Descent Using Ambient Sensing Technology.

Miyazaki, Yusuke; Shoda, Kohei; Kitamura, Koji; Nishida, Yoshifumi.

Sensors (Basel) ; 23(4)2023 Feb 16.

Artigo em Inglês | MEDLINE | ID: mdl-36850832

RESUMO

The increasing geriatric population across the world has necessitated the early detection of frailty through the analysis of daily-life behavioral patterns. This paper presents a system for ambient, automatic, and the continuous measurement and analysis of ascent and descent motions and long-term handrail-use behaviors of participants in their homes using an RGB-D camera. The system automatically stores information regarding the environment and three-dimensional skeletal coordinates of the participant only when they appear within the camera's angle of view. Daily stair ascent and descent motions were measured in two houses: one house with two participants in their 20s and two in their 50s, and another with two participants in their 70s. The recorded behaviors were analyzed in terms of the stair ascent/descent speed, handrail grasping points, and frequency determined using the decision tree algorithm. The participants in their 70s exhibited a decreased stair ascent/descent speed compared to other participants; those in their 50s and 70s exhibited increased handrail usage area and frequency. The outcomes of the study indicate the system's ability to accurately detect a decline in physical function through the continuous measurement of daily stair ascent and descent motions.

Assuntos

Algoritmos , Fragilidade , Idoso , Humanos , Estilo de Vida , Movimento (Física) , Tecnologia

15.

KD-Net: Continuous-Keystroke-Dynamics-Based Human Identification from RGB-D Image Sequences.

Dai, Xinxin; Zhao, Ran; Hu, Pengpeng; Munteanu, Adrian.

Sensors (Basel) ; 23(20)2023 Oct 10.

Artigo em Inglês | MEDLINE | ID: mdl-37896466

RESUMO

Keystroke dynamics is a soft biometric based on the assumption that humans always type in uniquely characteristic manners. Previous works mainly focused on analyzing the key press or release events. Unlike these methods, we explored a novel visual modality of keystroke dynamics for human identification using a single RGB-D sensor. In order to verify this idea, we created a dataset dubbed KD-MultiModal, which contains 243.2 K frames of RGB images and depth images, obtained by recording a video of hand typing with a single RGB-D sensor. The dataset comprises RGB-D image sequences of 20 subjects (10 males and 10 females) typing sentences, and each subject typed around 20 sentences. In the task, only the hand and keyboard region contributed to the person identification, so we also propose methods of extracting Regions of Interest (RoIs) for each type of data. Unlike the data of the key press or release, our dataset not only captures the velocity of pressing and releasing different keys and the typing style of specific keys or combinations of keys, but also contains rich information on the hand shape and posture. To verify the validity of our proposed data, we adopted deep neural networks to learn distinguishing features from different data representations, including RGB-KD-Net, D-KD-Net, and RGBD-KD-Net. Simultaneously, the sequence of point clouds also can be obtained from depth images given the intrinsic parameters of the RGB-D sensor, so we also studied the performance of human identification based on the point clouds. Extensive experimental results showed that our idea works and the performance of the proposed method based on RGB-D images is the best, which achieved 99.44% accuracy based on the unseen real-world data. To inspire more researchers and facilitate relevant studies, the proposed dataset will be publicly accessible together with the publication of this paper.

Assuntos

Antropologia Forense , Redes Neurais de Computação , Humanos , Postura , Biometria , Mãos

16.

Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection.

Wang, Shuaihui; Jiang, Fengyi; Xu, Boqian.

Sensors (Basel) ; 23(21)2023 Oct 29.

Artigo em Inglês | MEDLINE | ID: mdl-37960501

RESUMO

Salient object detection (SOD), which is used to identify the most distinctive object in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of a CNN-based network limits the performance of CNN-based methods. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD in which the Swin Transformer is employed as a powerful feature extractor to capture the global context. An edge-guided cross-modal interaction module is proposed to effectively enhance and fuse features. In particular, we employed the Swin Transformer as the backbone to extract features from RGB images and depth maps. Then, we introduced the edge extraction module (EEM) to extract edge features and the depth enhancement module (DEM) to enhance depth features. Additionally, a cross-modal interaction module (CIM) was used to integrate cross-modal features from global and local contexts. Finally, we employed a cascaded decoder to refine the prediction map in a coarse-to-fine manner. Extensive experiments demonstrated that our SwinEGNet achieved the best performance on the LFSD, NLPR, DES, and NJU2K datasets and achieved comparable performance on the STEREO dataset compared to 14 state-of-the-art methods. Our model achieved better performance compared to SwinNet, with 88.4% parameters and 77.2% FLOPs. Our code will be publicly available.

17.

Global Guided Cross-Modal Cross-Scale Network for RGB-D Salient Object Detection.

Wang, Shuaihui; Jiang, Fengyi; Xu, Boqian.

Sensors (Basel) ; 23(16)2023 Aug 17.

Artigo em Inglês | MEDLINE | ID: mdl-37631757

RESUMO

RGB-D saliency detection aims to accurately localize salient regions using the complementary information of a depth map. Global contexts carried by the deep layer are key to salient objection detection, but they are diluted when transferred to shallower layers. Besides, depth maps may contain misleading information due to the depth sensors. To tackle these issues, in this paper, we propose a new cross-modal cross-scale network for RGB-D salient object detection, where the global context information provides global guidance to boost performance in complex scenarios. First, we introduce a global guided cross-modal and cross-scale module named G2CMCSM to realize global guided cross-modal cross-scale fusion. Then, we employ feature refinement modules for progressive refinement in a coarse-to-fine manner. In addition, we adopt a hybrid loss function to supervise the training of G2CMCSNet over different scales. With all these modules working together, G2CMCSNet effectively enhances both salient object details and salient object localization. Extensive experiments on challenging benchmark datasets demonstrate that our G2CMCSNet outperforms existing state-of-the-art methods.

18.

Affine Iterative Closest Point Algorithm Based on Color Information and Correntropy for Precise Point Set Registration.

Liang, Lexian; Pei, Hailong.

Sensors (Basel) ; 23(14)2023 Jul 17.

Artigo em Inglês | MEDLINE | ID: mdl-37514769

RESUMO

In this paper, we propose a novel affine iterative closest point algorithm based on color information and correntropy, which can effectively deal with the registration problems with a large number of noise and outliers and small deformations in RGB-D datasets. Firstly, to alleviate the problem of low registration accuracy for data with weak geometric structures, we consider introducing color features into traditional affine algorithms to establish more accurate and reliable correspondences. Secondly, we introduce the correntropy measurement to overcome the influence of a large amount of noise and outliers in the RGB-D datasets, thereby further improving the registration accuracy. Experimental results demonstrate that the proposed registration algorithm has higher registration accuracy, with error reduction of almost 10 times, and achieves more stable robustness than other advanced algorithms.

19.

Absolute and Relative Depth-Induced Network for RGB-D Salient Object Detection.

Kong, Yuqiu; Wang, He; Kong, Lingwei; Liu, Yang; Yao, Cuili; Yin, Baocai.

Sensors (Basel) ; 23(7)2023 Mar 30.

Artigo em Inglês | MEDLINE | ID: mdl-37050670

RESUMO

Detecting salient objects in complicated scenarios is a challenging problem. Except for semantic features from the RGB image, spatial information from the depth image also provides sufficient cues about the object. Therefore, it is crucial to rationally integrate RGB and depth features for the RGB-D salient object detection task. Most existing RGB-D saliency detectors modulate RGB semantic features with absolution depth values. However, they ignore the appearance contrast and structure knowledge indicated by relative depth values between pixels. In this work, we propose a depth-induced network (DIN) for RGB-D salient object detection, to take full advantage of both absolute and relative depth information, and further, enforce the in-depth fusion of the RGB-D cross-modalities. Specifically, an absolute depth-induced module (ADIM) is proposed, to hierarchically integrate absolute depth values and RGB features, to allow the interaction between the appearance and structural information in the encoding stage. A relative depth-induced module (RDIM) is designed, to capture detailed saliency cues, by exploring contrastive and structural information from relative depth values in the decoding stage. By combining the ADIM and RDIM, we can accurately locate salient objects with clear boundaries, even from complex scenes. The proposed DIN is a lightweight network, and the model size is much smaller than that of state-of-the-art algorithms. Extensive experiments on six challenging benchmarks, show that our method outperforms most existing RGB-D salient object detection models.

20.

A New Method for Classifying Scenes for Simultaneous Localization and Mapping Using the Boundary Object Function Descriptor on RGB-D Points.

Lomas-Barrie, Victor; Suarez-Espinoza, Mario; Hernandez-Chavez, Gerardo; Neme, Antonio.

Sensors (Basel) ; 23(21)2023 Oct 30.

Artigo em Inglês | MEDLINE | ID: mdl-37960535

RESUMO

Scene classification in autonomous navigation is a highly complex task due to variations, such as light conditions and dynamic objects, in the inspected scenes; it is also a challenge for small-factor computers to run modern and highly demanding algorithms. In this contribution, we introduce a novel method for classifying scenes in simultaneous localization and mapping (SLAM) using the boundary object function (BOF) descriptor on RGB-D points. Our method aims to reduce complexity with almost no performance cost. All the BOF-based descriptors from each object in a scene are combined to define the scene class. Instead of traditional image classification methods such as ORB or SIFT, we use the BOF descriptor to classify scenes. Through an RGB-D camera, we capture points and adjust them onto layers than are perpendicular to the camera plane. From each plane, we extract the boundaries of objects such as furniture, ceilings, walls, or doors. The extracted features compose a bag of visual words classified by a support vector machine. The proposed method achieves almost the same accuracy in scene classification as a SIFT-based algorithm and is 2.38× faster. The experimental results demonstrate the effectiveness of the proposed method in terms of accuracy and robustness for the 7-Scenes and SUNRGBD datasets.

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa