1.
Sensors (Basel) ; 22(17)2022 Sep 01.
Article in English | MEDLINE | ID: mdl-36081084

ABSTRACT

Pedestrians are often obstructed by other objects or people in real-world vision sensors. These obstructions make pedestrian-attribute recognition (PAR) difficult; hence, occlusion handling for visual sensing is a key issue in PAR. To address this problem, we first formulate the identification of non-occluded frames as temporal attention based on the sparsity of a crowded video. In other words, a PAR model is guided to avoid attending to occluded frames. However, we found that this approach cannot capture correlations between attributes when occlusion occurs. For example, "boots" and "shoe color" cannot be recognized simultaneously when the foot is invisible. To address this uncorrelated-attention issue, we propose a novel temporal-attention module based on group sparsity. Group sparsity is applied across the attention weights of correlated attributes. Accordingly, physically adjacent pedestrian attributes are grouped, and the attention weights of a group are forced to focus on the same frames. Experimental results indicate that the proposed method achieved 1.18% and 6.21% higher F1-scores than the advanced baseline on the occlusion samples of the DukeMTMC-VideoReID and MARS video-based PAR datasets, respectively.
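The grouped temporal-attention constraint can be sketched as a group-L2,1 penalty on per-attribute attention weights over frames. This is a minimal illustration, assuming attention is stored as a matrix; the attribute grouping shown ("boots" with "shoe color") and the penalty form are illustrative, not the paper's exact loss:

```python
import numpy as np

def group_sparse_attention_penalty(attn, groups):
    """L2,1-style penalty encouraging attributes in the same group
    (e.g. "boots" and "shoe color") to attend to the same frames.

    attn:   (num_attributes, num_frames) temporal attention weights.
    groups: list of attribute-index lists, one per group of
            physically adjacent attributes.
    """
    penalty = 0.0
    for g in groups:
        # L2 across the attributes of a group, L1 across frames:
        # a frame is either used by the whole group or by none of it.
        penalty += np.sqrt((attn[g] ** 2).sum(axis=0)).sum()
    return penalty
```

For equal total attention mass, the penalty is lower when grouped attributes share frames than when they attend to disjoint frames, which is the pressure that aligns a group's attention.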


Subjects
Pedestrians , Algorithms , Humans , Image Processing, Computer-Assisted/methods , Recognition, Psychology , Video Recording
2.
Sensors (Basel) ; 21(22)2021 Nov 12.
Article in English | MEDLINE | ID: mdl-34833615

ABSTRACT

Universal domain adaptation (UDA) is a crucial research topic for efficiently training deep learning models on data from various imaging sensors. However, its development is hindered by the target data being unlabeled. Moreover, the absence of prior knowledge about the source and target domains makes model training even more challenging for UDA. I hypothesize that the degradation of trained models in the target domain is caused by the lack of a direct training loss that improves the discriminative power of the target-domain data. As a result, the target data adapted to the source representations are biased toward the source domain. I found that the degradation was more pronounced when I used synthetic data for the source domain and real data for the target domain. In this paper, I propose a UDA method with target-domain contrastive learning. The proposed method enables models to leverage synthetic data for the source domain and to train the discriminativeness of target features in an unsupervised manner. In addition, the target-domain feature extraction network is shared with the source-domain classification task, preventing unnecessary computational growth. Extensive experimental results on VisDA-2017 and MNIST-to-SVHN demonstrate that the proposed method significantly outperforms the baseline, by 2.7% and 5.1%, respectively.
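Unsupervised contrastive learning on target features is commonly instantiated as an NT-Xent loss over two augmented views of the same unlabeled batch. The sketch below is that common formulation, written in NumPy for clarity; the abstract does not specify the paper's exact loss, so treat this as one plausible instantiation:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss between two augmented views of the same
    unlabeled target batch (a common choice; the paper's exact
    formulation may differ).

    z1, z2: (batch, dim) L2-normalized feature embeddings.
    """
    z = np.concatenate([z1, z2], axis=0)      # (2B, dim)
    sim = z @ z.T / tau                       # temperature-scaled cosine sims
    np.fill_diagonal(sim, -np.inf)            # exclude self-similarity
    b = z1.shape[0]
    # row i's positive is its other view: i+B for the first half, i-B after
    pos = np.concatenate([np.arange(b, 2 * b), np.arange(b)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - sim[np.arange(2 * b), pos]))
```

Minimizing this loss pulls the two views of each target sample together and pushes different samples apart, which is the unsupervised discriminativeness the abstract describes.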

3.
Sensors (Basel) ; 20(12)2020 Jun 20.
Article in English | MEDLINE | ID: mdl-32575708

ABSTRACT

Emotion recognition plays an important role in the field of human-computer interaction (HCI). An electroencephalogram (EEG) is widely used to estimate human emotion owing to its convenience and mobility. Deep neural network (DNN) approaches using an EEG for emotion recognition have recently shown remarkable improvement in recognition accuracy. However, most studies in this field still require a separate process for extracting handcrafted features, despite the ability of a DNN to extract meaningful features by itself. In this paper, we propose a novel method for recognizing emotion based on three-dimensional convolutional neural networks (3D CNNs), with an efficient spatio-temporal representation of EEG signals. First, we spatially reconstruct raw EEG signals, represented as stacks of one-dimensional (1D) time-series data, into two-dimensional (2D) EEG frames according to the original electrode positions. We then represent a 3D EEG stream by concatenating the 2D EEG frames along the time axis. These 3D reconstructions of the raw EEG signals can be efficiently combined with 3D CNNs, which have shown remarkable feature representation on spatio-temporal data. Herein, we demonstrate the emotional classification accuracy of the proposed method through extensive experiments on the DEAP (a Dataset for Emotion Analysis using EEG, Physiological, and video signals) dataset. Experimental results show that the proposed method achieves classification accuracies of 99.11%, 99.74%, and 99.73% for the binary classification of valence, the binary classification of arousal, and four-class classification, respectively. We investigate the spatio-temporal effectiveness of the proposed method by comparing it to several types of input methods with 2D/3D CNNs. We then experimentally verify the best-performing shapes of both the kernel and the input data. We verify that an efficient representation of an EEG, combined with a network that fully exploits the data characteristics, can outperform methods that rely on handcrafted features.
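The spatial reconstruction step can be sketched as follows: each channel's 1D time series is scattered into a sparse 2D frame at its electrode's grid position, and the frames are stacked along time. The grid size and the electrode coordinates below are hypothetical placeholders for a 10-20-system layout, not the paper's exact mapping:

```python
import numpy as np

# Hypothetical 2D grid positions for a few 10-20-system electrodes;
# the actual channel-to-grid mapping used in the paper may differ.
ELECTRODE_POS = {"Fp1": (0, 3), "Fp2": (0, 5), "C3": (4, 2),
                 "Cz": (4, 4), "C4": (4, 6), "O1": (8, 3), "O2": (8, 5)}

def eeg_to_3d(signals, grid=(9, 9)):
    """Reconstruct raw EEG into a 3D stream for a 3D CNN.

    signals: dict channel-name -> 1D array of T samples.
    Returns a (T, H, W) stack of sparse 2D EEG frames, one per sample,
    with each channel's value placed at its electrode's grid position.
    """
    T = len(next(iter(signals.values())))
    frames = np.zeros((T, *grid), dtype=np.float32)
    for ch, x in signals.items():
        r, c = ELECTRODE_POS[ch]
        frames[:, r, c] = x
    return frames
```

The resulting `(T, H, W)` tensor (plus a channel axis) is what a 3D convolution can consume directly, letting the kernel span both electrode neighborhoods and time.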


Subjects
Electroencephalography , Emotions , Neural Networks, Computer , Arousal , Humans , Spatio-Temporal Analysis
4.
Sensors (Basel) ; 19(19)2019 Oct 04.
Article in English | MEDLINE | ID: mdl-31590266

ABSTRACT

As artificial intelligence (AI)- or deep-learning-based technologies become more popular, the main research interest in the field is not only their accuracy, but also their efficiency, e.g., the ability to give immediate results on the users' inputs. To achieve this, there have been many attempts to embed deep learning technology in intelligent sensors. However, there are still many obstacles to embedding a deep network in sensors with limited resources. Most importantly, there is an apparent trade-off between the complexity of a network and its processing time, and finding a structure with a better trade-off curve is vital for successful applications in intelligent sensors. In this paper, we propose two strategies for designing a compact deep network that maintains the required level of performance even after minimizing the computations. The first strategy is to automatically determine the number of parameters of a network by utilizing group sparsity and knowledge distillation (KD) in the training process. By doing so, KD can compensate for the possible losses in accuracy caused by enforcing sparsity. Nevertheless, a problem in applying the first strategy is the difficulty of determining the balance between the accuracy improvement due to KD and the parameter reduction by sparse regularization. To handle this balancing problem, we propose a second strategy: a feedback control mechanism based on proportional control theory. The feedback control logic determines the amount of emphasis to be put on network sparsity during training, based on the comparative accuracy losses of the teacher and student models. A surprising fact here is that this control scheme not only determines an appropriate trade-off point, but also improves the trade-off curve itself. The results of experiments on the CIFAR-10, CIFAR-100, and ImageNet32×32 datasets show that the proposed method is effective in building a compact network while preventing performance degradation due to sparsity regularization much better than other baselines.
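The proportional feedback idea can be sketched as a one-line controller: the gap between teacher and student losses drives the sparsity weight up when the student keeps pace and down when it lags. The gain `k_p` and the clamping range are illustrative choices, not values from the paper:

```python
def update_sparsity_weight(lam, teacher_loss, student_loss, k_p=0.1,
                           lam_min=0.0, lam_max=1.0):
    """Proportional controller for the sparsity-regularization weight
    (a sketch of the paper's feedback idea): when the student's loss
    exceeds the teacher's, ease off the group-sparsity emphasis so KD
    can recover accuracy; when the student keeps up, sparsify harder.
    k_p, lam_min, and lam_max are illustrative, not the paper's values.
    """
    error = teacher_loss - student_loss  # > 0: student is doing fine
    lam = lam + k_p * error
    return min(max(lam, lam_min), lam_max)
```

Called once per epoch with the two models' validation losses, this keeps the accuracy-vs-parameter-count trade-off near the point where distillation can still absorb the sparsity-induced loss.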

5.
Sensors (Basel) ; 14(11): 21151-73, 2014 Nov 10.
Article in English | MEDLINE | ID: mdl-25390406

ABSTRACT

This paper proposes VibeComm, a novel communication method for smart devices using a built-in vibrator and accelerometer. The proposed approach is ideal for low-rate offline communication, and its communication medium is the object on which the smart devices are placed, such as a table or desk. When two or more smart devices are placed on an object and one device wants to transmit a message to the others, the transmitting device generates a sequence of vibrations. The vibrations propagate through the object on which the devices are placed. The receiving devices analyze their accelerometer readings to decode incoming messages. The proposed method can serve as an alternative when conventional radio communication is unavailable. VibeComm is implemented on Android smartphones, and a comprehensive set of experiments is conducted to show its feasibility.
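A vibrate-and-sense channel like this can be sketched as on-off keying: vibrate during a bit window for "1", stay silent for "0", and have the receiver threshold the mean accelerometer energy per window. The abstract does not state VibeComm's actual modulation scheme, so the encoding below is a simplified stand-in:

```python
import numpy as np

def encode_bits(bits, samples_per_bit=50, amp=1.0):
    """On-off-keying sketch: vibrate for '1' bits, stay silent for '0'.
    VibeComm's real modulation and framing may differ."""
    return np.concatenate([np.full(samples_per_bit, amp if b else 0.0)
                           for b in bits])

def decode_bits(accel, samples_per_bit=50, threshold=0.5):
    """Recover bits by thresholding mean |acceleration| per bit window."""
    n = len(accel) // samples_per_bit
    windows = accel[:n * samples_per_bit].reshape(n, samples_per_bit)
    return [int(np.abs(w).mean() > threshold) for w in windows]
```

In practice the receiver would also need synchronization and band-pass filtering around the vibrator's resonant frequency; the threshold here only survives mild noise.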

6.
Sci Rep ; 13(1): 15243, 2023 09 14.
Article in English | MEDLINE | ID: mdl-37709828

ABSTRACT

Polyp segmentation is challenging because the boundary between polyps and mucosa is ambiguous. Several models have considered the use of attention mechanisms to solve this problem. However, these models use only the finite information obtained from a single type of attention. We propose a new dual-attention network based on shallow and reverse attention modules for colon polyp segmentation, called SRaNet. The shallow attention mechanism removes background noise while emphasizing locality by focusing on the foreground. In contrast, reverse attention helps distinguish the boundary between polyps and mucosa more clearly by focusing on the background. The two attention mechanisms are adaptively fused using a "Softmax Gate". Combining the two types of attention enables the model to capture complementary foreground and boundary features. Therefore, the proposed model predicts the boundaries of polyps more accurately than other models. We present the results of extensive experiments on polyp benchmarks to show that the proposed method outperforms existing models on both seen and unseen data. Furthermore, the results show that the proposed dual-attention module increases the explainability of the model.
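The "Softmax Gate" fusion can be sketched as a per-position softmax over the two branches, so each spatial location decides how much to trust the shallow (foreground) versus reverse (background/boundary) attention. In the real network the gate logits are learned from features; here they are passed in as an argument:

```python
import numpy as np

def softmax_gate(shallow_feat, reverse_feat, gate_logits):
    """Adaptively fuse two attention branches with per-position softmax
    weights (a sketch of SRaNet's "Softmax Gate"; the real model learns
    gate_logits from intermediate features).

    shallow_feat, reverse_feat: (H, W) branch outputs.
    gate_logits:                (2, H, W) unnormalized gate scores.
    """
    # softmax over the branch axis, stabilized against overflow
    e = np.exp(gate_logits - gate_logits.max(axis=0, keepdims=True))
    w = e / e.sum(axis=0, keepdims=True)
    return w[0] * shallow_feat + w[1] * reverse_feat
```

Because the weights sum to 1 at every pixel, the fused map is always a convex combination of the two branches: interior pixels can lean on shallow attention while boundary pixels lean on reverse attention.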


Subjects
Polyps , Humans , Benchmarking , Colon
7.
IEEE Trans Pattern Anal Mach Intell ; 43(2): 623-637, 2021 Feb.
Article in English | MEDLINE | ID: mdl-31369369

ABSTRACT

Much progress has been made in non-rigid structure from motion (NRSfM) during the last two decades, making it possible to provide reasonable solutions for synthetically created benchmark data. In order to utilize these NRSfM techniques in more realistic situations, however, two important problems must now be solved. First, general scenes contain complex deformations as well as multiple objects, which violates the usual assumptions of previous NRSfM proposals. Second, there are many unreconstructable regions in a video, either because their 2D trajectories are discontinued or because they remain static relative to the camera, and these require careful handling. In this paper, we show that a consensus-based reconstruction framework can handle these issues effectively. Even though the entire scene is complex, its parts usually have simpler deformations, and even though there are some unreconstructable parts, they can be weeded out to reduce their harmful effect on the entire reconstruction. The main difficulty of this approach lies in identifying appropriate parts; however, this difficulty can be effectively avoided by sampling parts stochastically and then aggregating their reconstructions afterwards. Experimental results show that the proposed method sets a new state of the art on popular benchmark data under much harsher environments, i.e., narrow camera view ranges, and that it can reconstruct video-based real-world data effectively for as many areas as it can, without elaborate user input.
