Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 33
Filtrar
Más filtros

Banco de datos
Tipo del documento
Intervalo de año de publicación
1.
Sensors (Basel) ; 23(7)2023 Mar 25.
Artículo en Inglés | MEDLINE | ID: mdl-37050507

RESUMEN

Recently, transformer architectures have shown superior performance compared to their CNN counterparts in many computer vision tasks. The self-attention mechanism enables transformer networks to connect visual dependencies over short as well as long distances, thus generating a large, sometimes even a global receptive field. In this paper, we propose our Parallel Local-Global Vision Transformer (PLG-ViT), a general backbone model that fuses local window self-attention with global self-attention. By merging these local and global features, short- and long-range spatial interactions can be effectively and efficiently represented without the need for costly computational operations such as shifted windows. In a comprehensive evaluation, we demonstrate that our PLG-ViT outperforms CNN-based as well as state-of-the-art transformer-based architectures in image classification and in complex downstream tasks such as object detection, instance segmentation, and semantic segmentation. In particular, our PLG-ViT models outperformed similarly sized networks like ConvNeXt and Swin Transformer, achieving Top-1 accuracy values of 83.4%, 84.0%, and 84.5% on ImageNet-1K with 27M, 52M, and 91M parameters, respectively.

2.
Sensors (Basel) ; 23(2)2023 Jan 10.
Artículo en Inglés | MEDLINE | ID: mdl-36679616

RESUMEN

In-car activity monitoring is a key enabler of various automotive safety functions. Existing approaches are largely based on vision systems. Radar, however, can provide a low-cost, privacy-preserving alternative. To this day, such systems based on the radar are not widely researched. In our work, we introduce a novel approach that uses the Doppler signal of an ultra-wideband (UWB) radar as an input to deep neural networks for the classification of driving activities. In contrast to previous work in the domain, we focus on generalization to unseen persons and make a new radar driving activity dataset (RaDA) available to the scientific community to encourage comparison and the benchmarking of future methods.


Asunto(s)
Conducción de Automóvil , Procesamiento de Señales Asistido por Computador , Radar , Monitoreo Fisiológico/métodos , Redes Neurales de la Computación
3.
Sensors (Basel) ; 22(3)2022 Jan 21.
Artículo en Inglés | MEDLINE | ID: mdl-35161563

RESUMEN

The problem of accurate three-dimensional reconstruction is important for many research and industrial applications. Light field depth estimation utilizes many observations of the scene and hence can provide accurate reconstruction. We present a method, which enhances existing reconstruction algorithm with per-layer disparity filtering and consistency-based holes filling. Together with that we reformulate the reconstruction result to a form of point cloud from different light field viewpoints and propose a non-linear optimization of it. The capability of our method to reconstruct scenes with acceptable quality was verified by evaluation on a publicly available dataset.

4.
Sensors (Basel) ; 22(22)2022 Nov 14.
Artículo en Inglés | MEDLINE | ID: mdl-36433393

RESUMEN

This paper presents a novel architecture for simultaneous estimation of highly accurate optical flows and rigid scene transformations for difficult scenarios where the brightness assumption is violated by strong shading changes. In the case of rotating objects or moving light sources, such as those encountered for driving cars in the dark, the scene appearance often changes significantly from one view to the next. Unfortunately, standard methods for calculating optical flows or poses are based on the expectation that the appearance of features in the scene remains constant between views. These methods may fail frequently in the investigated cases. The presented method fuses texture and geometry information by combining image, vertex and normal data to compute an illumination-invariant optical flow. By using a coarse-to-fine strategy, globally anchored optical flows are learned, reducing the impact of erroneous shading-based pseudo-correspondences. Based on the learned optical flows, a second architecture is proposed that predicts robust rigid transformations from the warped vertex and normal maps. Particular attention is paid to situations with strong rotations, which often cause such shading changes. Therefore, a 3-step procedure is proposed that profitably exploits correlations between the normals and vertices. The method has been evaluated on a newly created dataset containing both synthetic and real data with strong rotations and shading effects. These data represent the typical use case in 3D reconstruction, where the object often rotates in large steps between the partial reconstructions. Additionally, we apply the method to the well-known Kitti Odometry dataset. Even if, due to fulfillment of the brightness assumption, this is not the typical use case of the method, the applicability to standard situations and the relation to other methods is therefore established.

5.
Sensors (Basel) ; 22(10)2022 May 12.
Artículo en Inglés | MEDLINE | ID: mdl-35632112

RESUMEN

In recent years, due to the advancements in machine learning, object detection has become a mainstream task in the computer vision domain. The first phase of object detection is to find the regions where objects can exist. With the improvements in deep learning, traditional approaches, such as sliding windows and manual feature selection techniques, have been replaced with deep learning techniques. However, object detection algorithms face a problem when performed in low light, challenging weather, and crowded scenes, similar to any other task. Such an environment is termed a challenging environment. This paper exploits pixel-level information to improve detection under challenging situations. To this end, we exploit the recently proposed hybrid task cascade network. This network works collaboratively with detection and segmentation heads at different cascade levels. We evaluate the proposed methods on three complex datasets of ExDark, CURE-TSD, and RESIDE, and achieve a mAP of 0.71, 0.52, and 0.43, respectively. Our experimental results assert the efficacy of the proposed approach.


Asunto(s)
Algoritmos , Aprendizaje Automático , Cara
6.
Sensors (Basel) ; 22(13)2022 Jun 27.
Artículo en Inglés | MEDLINE | ID: mdl-35808357

RESUMEN

The generally unsupervised nature of autoencoder models implies that the main training metric is formulated as the error between input images and their corresponding reconstructions. Different reconstruction loss variations and latent space regularizations have been shown to improve model performances depending on the tasks to solve and to induce new desirable properties such as disentanglement. Nevertheless, measuring the success in, or enforcing properties by, the input pixel space is a challenging endeavour. In this work, we want to make use of the available data more efficiently and provide design choices to be considered in the recording or generation of future datasets to implicitly induce desirable properties during training. To this end, we propose a new sampling technique which matches semantically important parts of the image while randomizing the other parts, leading to salient feature extraction and a neglection of unimportant details. The proposed method can be combined with any existing reconstruction loss and the performance gain is superior to the triplet loss. We analyse the resulting properties on various datasets and show improvements on several computer vision tasks: illumination and unwanted features can be normalized or smoothed out and shadows are removed such that classification or other tasks work more reliably; a better invariances with respect to unwanted features is induced; the generalization capacities from synthetic to real images is improved, such that more of the semantics are preserved; uncertainty estimation is superior to Monte Carlo Dropout and an ensemble of models, particularly for datasets of higher visual complexity. Finally, classification accuracy by means of simple linear classifiers in the latent space is improved compared to the triplet loss. For each task, the improvements are highlighted on several datasets commonly used by the research community, as well as in automotive applications.


Asunto(s)
Iluminación , Método de Montecarlo
7.
Sensors (Basel) ; 22(21)2022 Nov 06.
Artículo en Inglés | MEDLINE | ID: mdl-36366238

RESUMEN

Supervised image-to-image translation has been proven to generate realistic images with sharp details and to have good quantitative performance. Such methods are trained on a paired dataset, where an image from the source domain already has a corresponding translated image in the target domain. However, this paired dataset requirement imposes a huge practical constraint, requires domain knowledge or is even impossible to obtain in certain cases. Due to these problems, unsupervised image-to-image translation has been proposed, which does not require domain expertise and can take advantage of a large unlabeled dataset. Although such models perform well, they are hard to train due to the major constraints induced in their loss functions, which make training unstable. Since CycleGAN has been released, numerous methods have been proposed which try to address various problems from different perspectives. In this review, we firstly describe the general image-to-image translation framework and discuss the datasets and metrics involved in the topic. Furthermore, we revise the current state-of-the-art with a classification of existing works. This part is followed by a small quantitative evaluation, for which results were taken from papers.


Asunto(s)
Procesamiento de Imagen Asistido por Computador , Procesamiento de Imagen Asistido por Computador/métodos
8.
Sensors (Basel) ; 22(11)2022 May 25.
Artículo en Inglés | MEDLINE | ID: mdl-35684612

RESUMEN

We present TIMo (Time-of-flight Indoor Monitoring), a dataset for video-based monitoring of indoor spaces captured using a time-of-flight (ToF) camera. The resulting depth videos feature people performing a set of different predefined actions, for which we provide detailed annotations. Person detection for people counting and anomaly detection are the two targeted applications. Most existing surveillance video datasets provide either grayscale or RGB videos. Depth information, on the other hand, is still a rarity in this class of datasets in spite of being popular and much more common in other research fields within computer vision. Our dataset addresses this gap in the landscape of surveillance video datasets. The recordings took place at two different locations with the ToF camera set up either in a top-down or a tilted perspective on the scene. Moreover, we provide experimental evaluation results from baseline algorithms.


Asunto(s)
Algoritmos , Humanos
9.
Sensors (Basel) ; 22(21)2022 Nov 07.
Artículo en Inglés | MEDLINE | ID: mdl-36366281

RESUMEN

Object detection is a computer vision task that involves localisation and classification of objects in an image. Video data implicitly introduces several challenges, such as blur, occlusion and defocus, making video object detection more challenging in comparison to still image object detection, which is performed on individual and independent images. This paper tackles these challenges by proposing an attention-heavy framework for video object detection that aggregates the disentangled features extracted from individual frames. The proposed framework is a two-stage object detector based on the Faster R-CNN architecture. The disentanglement head integrates scale, spatial and task-aware attention and applies it to the features extracted by the backbone network across all the frames. Subsequently, the aggregation head incorporates temporal attention and improves detection in the target frame by aggregating the features of the support frames. These include the features extracted from the disentanglement network along with the temporal features. We evaluate the proposed framework using the ImageNet VID dataset and achieve a mean Average Precision (mAP) of 49.8 and 52.5 using the backbones of ResNet-50 and ResNet-101, respectively. The improvement in performance over the individual baseline methods validates the efficacy of the proposed approach.

10.
Sensors (Basel) ; 22(18)2022 Sep 14.
Artículo en Inglés | MEDLINE | ID: mdl-36146318

RESUMEN

Depth maps produced by LiDAR-based approaches are sparse. Even high-end LiDAR sensors produce highly sparse depth maps, which are also noisy around the object boundaries. Depth completion is the task of generating a dense depth map from a sparse depth map. While the earlier approaches focused on directly completing this sparsity from the sparse depth maps, modern techniques use RGB images as a guidance tool to resolve this problem. Whilst many others rely on affinity matrices for depth completion. Based on these approaches, we have divided the literature into two major categories; unguided methods and image-guided methods. The latter is further subdivided into multi-branch and spatial propagation networks. The multi-branch networks further have a sub-category named image-guided filtering. In this paper, for the first time ever we present a comprehensive survey of depth completion methods. We present a novel taxonomy of depth completion approaches, review in detail different state-of-the-art techniques within each category for depth completion of LiDAR data, and provide quantitative results for the approaches on KITTI and NYUv2 depth completion benchmark datasets.

11.
Sensors (Basel) ; 21(1)2021 Jan 05.
Artículo en Inglés | MEDLINE | ID: mdl-33466293

RESUMEN

Estimation and tracking of 6DoF poses of objects in images is a challenging problem of great importance for robotic interaction and augmented reality. Recent approaches applying deep neural networks for pose estimation have shown encouraging results. However, most of them rely on training with real images of objects with severe limitations concerning ground truth pose acquisition, full coverage of possible poses, and training dataset scaling and generalization capability. This paper presents a novel approach using a Convolutional Neural Network (CNN) trained exclusively on single-channel Synthetic images of objects to regress 6DoF object Poses directly (SynPo-Net). The proposed SynPo-Net is a network architecture specifically designed for pose regression and a proposed domain adaptation scheme transforming real and synthetic images into an intermediate domain that is better fit for establishing correspondences. The extensive evaluation shows that our approach significantly outperforms the state-of-the-art using synthetic training in terms of both accuracy and speed. Our system can be used to estimate the 6DoF pose from a single frame, or be integrated into a tracking system to provide the initial pose.

12.
Sensors (Basel) ; 21(21)2021 Nov 06.
Artículo en Inglés | MEDLINE | ID: mdl-34770698

RESUMEN

In this paper, we present the idea of Self Supervised learning on the shape completion and classification of point clouds. Most 3D shape completion pipelines utilize AutoEncoders to extract features from point clouds used in downstream tasks such as classification, segmentation, detection, and other related applications. Our idea is to add contrastive learning into AutoEncoders to encourage global feature learning of the point cloud classes. It is performed by optimizing triplet loss. Furthermore, local feature representations learning of point cloud is performed by adding the Chamfer distance function. To evaluate the performance of our approach, we utilize the PointNet classifier. We also extend the number of classes for evaluation from 4 to 10 to show the generalization ability of the learned features. Based on our results, embeddings generated from the contrastive AutoEncoder enhances shape completion and classification performance from 84.2% to 84.9% of point clouds achieving the state-of-the-art results with 10 classes.

13.
Sensors (Basel) ; 21(15)2021 Jul 28.
Artículo en Inglés | MEDLINE | ID: mdl-34372351

RESUMEN

Recent progress in deep learning has led to accurate and efficient generic object detection networks. Training of highly reliable models depends on large datasets with highly textured and rich images. However, in real-world scenarios, the performance of the generic object detection system decreases when (i) occlusions hide the objects, (ii) objects are present in low-light images, or (iii) they are merged with background information. In this paper, we refer to all these situations as challenging environments. With the recent rapid development in generic object detection algorithms, notable progress has been observed in the field of deep learning-based object detection in challenging environments. However, there is no consolidated reference to cover the state of the art in this domain. To the best of our knowledge, this paper presents the first comprehensive overview, covering recent approaches that have tackled the problem of object detection in challenging environments. Furthermore, we present a quantitative and qualitative performance analysis of these approaches and discuss the currently available challenging datasets. Moreover, this paper investigates the performance of current state-of-the-art generic object detection algorithms by benchmarking results on the three well-known challenging datasets. Finally, we highlight several current shortcomings and outline future directions.


Asunto(s)
Aprendizaje Profundo , Redes Neurales de la Computación , Algoritmos , Humanos
14.
Sensors (Basel) ; 20(10)2020 May 16.
Artículo en Inglés | MEDLINE | ID: mdl-32429341

RESUMEN

The estimation of human hand pose has become the basis for many vital applications where the user depends mainly on the hand pose as a system input. Virtual reality (VR) headset, shadow dexterous hand and in-air signature verification are a few examples of applications that require to track the hand movements in real-time. The state-of-the-art 3D hand pose estimation methods are based on the Convolutional Neural Network (CNN). These methods are implemented on Graphics Processing Units (GPUs) mainly due to their extensive computational requirements. However, GPUs are not suitable for the practical application scenarios, where the low power consumption is crucial. Furthermore, the difficulty of embedding a bulky GPU into a small device prevents the portability of such applications on mobile devices. The goal of this work is to provide an energy efficient solution for an existing depth camera based hand pose estimation algorithm. First, we compress the deep neural network model by applying the dynamic quantization techniques on different layers to achieve maximum compression without compromising accuracy. Afterwards, we design a custom hardware architecture. For our device we selected the FPGA as a target platform because FPGAs provide high energy efficiency and can be integrated in portable devices. Our solution implemented on Xilinx UltraScale+ MPSoC FPGA is 4.2× faster and 577.3× more energy efficient than the original implementation of the hand pose estimation algorithm on NVIDIA GeForce GTX 1070.


Asunto(s)
Algoritmos , Mano , Redes Neurales de la Computación , Humanos , Movimiento , Fenómenos Físicos
15.
Sensors (Basel) ; 19(17)2019 Aug 31.
Artículo en Inglés | MEDLINE | ID: mdl-31480461

RESUMEN

Hand shape and pose recovery is essential for many computer vision applications such as animation of a personalized hand mesh in a virtual environment. Although there are many hand pose estimation methods, only a few deep learning based algorithms target 3D hand shape and pose from a single RGB or depth image. Jointly estimating hand shape and pose is very challenging because none of the existing real benchmarks provides ground truth hand shape. For this reason, we propose a novel weakly-supervised approach for 3D hand shape and pose recovery (named WHSP-Net) from a single depth image by learning shapes from unlabeled real data and labeled synthetic data. To this end, we propose a novel framework which consists of three novel components. The first is the Convolutional Neural Network (CNN) based deep network which produces 3D joints positions from learned 3D bone vectors using a new layer. The second is a novel shape decoder that recovers dense 3D hand mesh from sparse joints. The third is a novel depth synthesizer which reconstructs 2D depth image from 3D hand mesh. The whole pipeline is fine-tuned in an end-to-end manner. We demonstrate that our approach recovers reasonable hand shapes from real world datasets as well as from live stream of depth camera in real-time. Our algorithm outperforms state-of-the-art methods that output more than the joint positions and shows competitive performance on 3D pose estimation task.

16.
Sensors (Basel) ; 19(20)2019 Oct 22.
Artículo en Inglés | MEDLINE | ID: mdl-31652665

RESUMEN

Recovery of articulated 3D structure from 2D observations is a challenging computer vision problem with many applications. Current learning-based approaches achieve state-of-the-art accuracy on public benchmarks but are restricted to specific types of objects and motions covered by the training datasets. Model-based approaches do not rely on training data but show lower accuracy on these datasets. In this paper, we introduce a model-based method called Structure from Articulated Motion (SfAM), which can recover multiple object and motion types without training on extensive data collections. At the same time, it performs on par with learning-based state-of-the-art approaches on public benchmarks and outperforms previous non-rigid structure from motion (NRSfM) methods. SfAM is built upon a general-purpose NRSfM technique while integrating a soft spatio-temporal constraint on the bone lengths. We use alternating optimization strategy to recover optimal geometry (i.e., bone proportions) together with 3D joint positions by enforcing the bone lengths consistency over a series of frames. SfAM is highly robust to noisy 2D annotations, generalizes to arbitrary objects and does not rely on training data, which is shown in extensive experiments on public benchmarks and real video sequences. We believe that it brings a new perspective on the domain of monocular 3D recovery of articulated structures, including human motion capture.

17.
Sensors (Basel) ; 18(11)2018 Nov 10.
Artículo en Inglés | MEDLINE | ID: mdl-30423837

RESUMEN

In-air signature is a new modality which is essential for user authentication and access control in noncontact mode and has been actively studied in recent years. However, it has been treated as a conventional online signature, which is essentially a 2D spatial representation. Notably, this modality bears a lot more potential due to an important hidden depth feature. Existing methods for in-air signature verification neither capture this unique depth feature explicitly nor fully explore its potential in verification. Moreover, these methods are based on heuristic approaches for fingertip or hand palm center detection, which are not feasible in practice. Inspired by the great progress in deep-learning-based hand pose estimation, we propose a real-time in-air signature acquisition method which estimates hand joint positions in 3D using a single depth image. The predicted 3D position of fingertip is recorded for each frame. We present four different implementations of a verification module, which are based on the extracted depth and spatial features. An ablation study was performed to explore the impact of the depth feature in particular. For matching, we employed the most commonly used multidimensional dynamic time warping (MD-DTW) algorithm. We created a new database which contains 600 signatures recorded from 15 different subjects. Extensive evaluations were performed on our database. Our method, called 3DAirSig, achieved an equal error rate (EER) of 0 . 46 %. Experiments showed that depth itself is an important feature, which is sufficient for in-air signature verification. The dataset will be publicly available (https://goo.gl/yFdfdL).

18.
Sensors (Basel) ; 17(6)2017 Jun 01.
Artículo en Inglés | MEDLINE | ID: mdl-28587178

RESUMEN

Motion tracking based on commercial inertial measurements units (IMUs) has been widely studied in the latter years as it is a cost-effective enabling technology for those applications in which motion tracking based on optical technologies is unsuitable. This measurement method has a high impact in human performance assessment and human-robot interaction. IMU motion tracking systems are indeed self-contained and wearable, allowing for long-lasting tracking of the user motion in situated environments. After a survey on IMU-based human tracking, five techniques for motion reconstruction were selected and compared to reconstruct a human arm motion. IMU based estimation was matched against motion tracking based on the Vicon marker-based motion tracking system considered as ground truth. Results show that all but one of the selected models perform similarly (about 35 mm average position estimation error).


Asunto(s)
Movimiento (Física) , Fenómenos Biomecánicos , Humanos , Encuestas y Cuestionarios , Extremidad Superior
19.
Open Res Eur ; 4: 33, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-38953016

RESUMEN

In-field human motion capture (HMC) is drawing increasing attention due to the multitude of application areas. Plenty of research is currently invested in camera-based (markerless) HMC, with the advantage of no infrastructure being required on the body, and additional context information being available from the surroundings. However, the inherent drawbacks of camera-based approaches are the limited field of view and occlusions. In contrast, inertial HMC (IHMC) does not suffer from occlusions, thus being a promising approach for capturing human motion outside the laboratory. However, one major challenge of such methods is the necessity of spatial registration. Typically, during a predefined calibration sequence, the orientation and location of each inertial sensor are registered with respect to the underlying skeleton model. This work contributes to calibration-free IHMC, as it proposes a recursive estimator for the simultaneous online estimation of all sensor poses and joint positions of a kinematic chain model like the human skeleton. The full derivation from an optimization objective is provided. The approach can directly be applied to a synchronized data stream from a body-mounted inertial sensor network. Successful evaluations are demonstrated on noisy simulated data from a three-link chain, real lower-body walking data from 25 young, healthy persons, and walking data captured from a humanoid robot. The estimated and derived quantities, global and relative sensor orientations, joint positions, and segment lengths can be exploited for human motion analysis and anthropometric measurements, as well as in the context of hybrid markerless visual-inertial HMC.

20.
J Imaging ; 8(10)2022 Oct 17.
Artículo en Inglés | MEDLINE | ID: mdl-36286381

RESUMEN

Augmented reality (AR), combining virtual elements with the real world, has demonstrated impressive results in a variety of application fields and gained significant research attention in recent years due to its limitless potential [...].

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA