1.
Sensors (Basel) ; 24(8)2024 Apr 12.
Article En | MEDLINE | ID: mdl-38676108

Egocentric activity recognition is a prominent computer vision task that is based on the use of wearable cameras. Since egocentric videos are captured from the perspective of the person wearing the camera, the wearer's body motion severely complicates the video content, imposing several challenges. In this work we propose a novel approach for domain-generalized egocentric human activity recognition. Typical approaches use a large amount of training data, aiming to cover all possible variants of each action. Moreover, several recent approaches have attempted to handle discrepancies between domains with a variety of costly and mostly unsupervised domain adaptation methods. In our approach we show that, through simple manipulation of the available source domain data and with minor involvement of the target domain, we are able to produce robust models that adequately predict human activity in egocentric video sequences. To this end, we introduce a novel three-stream deep neural network architecture combining elements of vision transformers and residual neural networks, trained using multi-modal data. We evaluate the proposed approach on a challenging egocentric video dataset and demonstrate its superiority over recent state-of-the-art approaches.
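
As a rough illustration of the kind of architecture described above (not the authors' exact network), the following PyTorch sketch fuses one transformer stream with two residual streams. The choice of RGB, optical-flow, and depth inputs, the backbones (vit_b_16, resnet18), and the feature sizes are assumptions.

```python
# A minimal three-stream sketch mixing a vision transformer with residual
# backbones; modalities, backbones, and sizes are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18, vit_b_16

class ThreeStreamHAR(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.rgb_stream = vit_b_16(weights=None)    # transformer stream
        self.rgb_stream.heads = nn.Identity()       # expose 768-d features
        self.flow_stream = resnet18(weights=None)   # residual stream
        self.flow_stream.fc = nn.Identity()         # expose 512-d features
        self.depth_stream = resnet18(weights=None)  # residual stream
        self.depth_stream.fc = nn.Identity()
        self.classifier = nn.Sequential(
            nn.Linear(768 + 512 + 512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, rgb, flow, depth):
        feats = torch.cat([self.rgb_stream(rgb),
                           self.flow_stream(flow),
                           self.depth_stream(depth)], dim=1)
        return self.classifier(feats)

model = ThreeStreamHAR(num_classes=10)
rgb = flow = depth = torch.randn(2, 3, 224, 224)    # dummy 224x224 inputs
print(model(rgb, flow, depth).shape)                # torch.Size([2, 10])
```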


Neural Networks, Computer , Video Recording , Humans , Video Recording/methods , Algorithms , Pattern Recognition, Automated/methods , Image Processing, Computer-Assisted/methods , Human Activities , Wearable Electronic Devices
2.
Int J Neural Syst ; 33(9): 2350047, 2023 Sep.
Article En | MEDLINE | ID: mdl-37602705

In real-life scenarios, Human Activity Recognition (HAR) from video data is prone to occlusion of one or more body parts of the human subjects involved. Although the recognition of most activities clearly depends on the motion of particular body parts, whose occlusion compromises the performance of recognition approaches, this problem is often underestimated in contemporary research. Currently, training and evaluation are based on datasets that have been shot under laboratory (ideal) conditions, i.e., without any kind of occlusion. In this work, we propose an approach for HAR in the presence of partial occlusion, in cases wherein up to two body parts are involved. We assume that human motion is modeled using a set of 3D skeletal joints and that occluded body parts remain occluded for the whole duration of the activity. We solve this problem using regression, performed by a novel deep Convolutional Recurrent Neural Network (CRNN). Specifically, given a partially occluded skeleton, we attempt to reconstruct the missing information regarding the motion of its occluded part(s). We evaluate our approach using four publicly available human motion datasets. Our experimental results indicate a significant increase in performance compared to baseline approaches, wherein networks trained using only non-occluded or both occluded and non-occluded samples are evaluated using occluded samples. To the best of our knowledge, this is the first research work that formulates and copes with the problem of HAR under occlusion as a regression task.
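
A minimal sketch of framing occluded-joint recovery as regression, assuming a 25-joint skeleton flattened into per-frame coordinate vectors; the Conv1d/GRU layout below is illustrative, not the paper's CRNN.

```python
# Reconstruct the coordinates of occluded joints from a partially observed
# skeleton sequence; architecture and joint count are assumptions.
import torch
import torch.nn as nn

NUM_JOINTS, T = 25, 60                       # assumed skeleton size and length

class OcclusionRegressor(nn.Module):
    def __init__(self, num_joints=NUM_JOINTS, hidden=128):
        super().__init__()
        in_dim = num_joints * 3
        self.conv = nn.Sequential(           # convolutional front end over time
            nn.Conv1d(in_dim, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)   # recurrent part
        self.out = nn.Linear(hidden, in_dim)                  # regress all joints

    def forward(self, x):                    # x: (batch, time, joints*3)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.rnn(h)
        return self.out(h)                   # (batch, time, joints*3)

# Occlusion is simulated by zeroing one body part for the whole sequence.
seq = torch.randn(4, T, NUM_JOINTS * 3)
occluded = seq.clone()
occluded[:, :, :5 * 3] = 0.0                 # e.g. the first 5 joints hidden
loss = nn.MSELoss()(OcclusionRegressor()(occluded), seq)
print(loss.item())
```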


Human Activities , Neural Networks, Computer , Humans
3.
Sensors (Basel) ; 23(10)2023 May 19.
Article En | MEDLINE | ID: mdl-37430811

The presence of occlusion in human activity recognition (HAR) tasks hinders the performance of recognition algorithms, as it is responsible for the loss of crucial motion data. Although occlusion may intuitively occur in almost any real-life environment, it is often underestimated in most research works, which tend to rely on datasets collected under ideal conditions, i.e., without any occlusion. In this work, we present an approach aimed at dealing with occlusion in an HAR task. We relied on previous work on HAR and artificially created occluded data samples, assuming that occlusion may prevent the recognition of one or two body parts. The HAR approach we used is based on a Convolutional Neural Network (CNN) that has been trained using 2D representations of 3D skeletal motion. We considered cases in which the network was trained with and without occluded samples and evaluated our approach in single-view, cross-view, and cross-subject settings, using two large-scale human motion datasets. Our experimental results indicate that the proposed training strategy is able to provide a significant boost in performance in the presence of occlusion.
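
A sketch of the data-preparation idea, assuming hypothetical joint groupings per body part and a simple coordinates-as-rows 2D representation; the paper's actual representation and joint layout may differ.

```python
# Render a 3D skeletal sequence as a 2D image-like array and create
# artificially occluded copies by zeroing one or two body parts.
import numpy as np

BODY_PARTS = {                 # hypothetical joint indices per body part
    "left_arm": [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "left_leg": [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def skeleton_to_image(seq: np.ndarray) -> np.ndarray:
    """seq: (frames, joints, 3) -> (joints*3, frames) array scaled to [0, 1]."""
    img = seq.reshape(seq.shape[0], -1).T            # rows: joint coordinates
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)

def occlude(seq: np.ndarray, parts) -> np.ndarray:
    """Zero out the listed body parts for the whole sequence."""
    out = seq.copy()
    for p in parts:
        out[:, BODY_PARTS[p], :] = 0.0
    return out

seq = np.random.rand(60, 20, 3)                      # 60 frames, 20 joints
clean = skeleton_to_image(seq)
occluded = skeleton_to_image(occlude(seq, ["left_arm", "right_leg"]))
print(clean.shape, occluded.shape)                   # (60, 60) (60, 60)
```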


Algorithms , Human Activities , Humans , Motion , Neural Networks, Computer , Recognition, Psychology
4.
Int J Neural Syst ; 33(1): 2350002, 2023 Jan.
Article En | MEDLINE | ID: mdl-36573880

The problem of human activity recognition (HAR) has been attracting increasing attention from the research community and has several applications. It consists of recognizing human motion and/or behavior within a given image or video sequence, using raw sensor measurements as input. In this paper, a multimodal approach addressing the task of video-based HAR is proposed. It is based on 3D visual data collected using an RGB + depth camera, resulting in both raw video and 3D skeletal sequences. These data are transformed into six different 2D image representations: four of them are in the spectral domain and one is a pseudo-colored image, all five derived from the skeletal data. The sixth is a "dynamic" image, an artificially created image that summarizes the RGB data of the whole video sequence in a visually comprehensible way. To classify a given activity video, all the aforementioned 2D images are first extracted and six trained convolutional neural networks are then used to extract visual features. The latter are fused into a single feature vector and fed into a support vector machine for classification into human activities. For evaluation purposes, a challenging motion activity recognition dataset is used, while single-view, cross-view and cross-subject experiments are performed. Moreover, the proposed approach is compared to three other state-of-the-art methods, demonstrating superior performance in most experiments.
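
A sketch of the late-fusion stage only: per-representation CNN features are concatenated into one vector and classified with an SVM. The feature dimensions and the data below are dummy placeholders, not outputs of the paper's six networks.

```python
# Fuse per-representation feature vectors and classify with a linear SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_videos, n_classes = 200, 5
# One 128-d feature vector per 2D representation, six representations per video.
per_stream = [rng.normal(size=(n_videos, 128)) for _ in range(6)]
fused = np.concatenate(per_stream, axis=1)            # (200, 768) fused vectors
labels = rng.integers(0, n_classes, size=n_videos)

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)
print("accuracy on dummy data:", clf.score(X_te, y_te))
```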


Human Activities , Neural Networks, Computer , Humans , Support Vector Machine
5.
Adv Exp Med Biol ; 1194: 105-114, 2020.
Article En | MEDLINE | ID: mdl-32468527

In this paper we present an approach toward human action detection for activities of daily living (ADLs) that uses a convolutional neural network (CNN). The network is trained on discrete Fourier transform (DFT) images that result from raw sensor readings, i.e., each human action is ultimately described by an image. More specifically, we work with 3D skeletal positions of human joints, which originate from the processing of raw RGB sequences enhanced by depth information. The motion of each joint may be described by a combination of three 1D signals, representing its coordinates in 3D Euclidean space. All such signals from a set of human joints are concatenated to form an image, which is then transformed by the DFT and used for training and evaluation of a CNN. We evaluate our approach using a publicly available, challenging dataset of human actions that may involve one or more body parts simultaneously, and for two sets of actions which resemble common ADLs.
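
A small sketch of how joint trajectories could be stacked into a 2D array and mapped to a DFT magnitude image, under assumed frame and joint counts; the exact image construction in the paper may differ.

```python
# Turn skeletal joint trajectories into a DFT "image": each joint contributes
# three 1D coordinate signals, the signals are stacked into a 2D array, and
# the 2D DFT magnitude of that array becomes the CNN input.
import numpy as np

def joints_to_dft_image(seq: np.ndarray) -> np.ndarray:
    """seq: (frames, joints, 3) skeletal sequence -> DFT magnitude image."""
    signals = seq.reshape(seq.shape[0], -1).T        # (joints*3, frames)
    spectrum = np.fft.fft2(signals)                  # 2D discrete Fourier transform
    magnitude = np.abs(np.fft.fftshift(spectrum))    # center the zero frequency
    return np.log1p(magnitude)                       # compress dynamic range

seq = np.random.rand(100, 20, 3)                     # 100 frames, 20 joints
image = joints_to_dft_image(seq)
print(image.shape)                                   # (60, 100)
```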


Activities of Daily Living , Bone and Bones , Deep Learning , Joints , Range of Motion, Articular , Bone and Bones/diagnostic imaging , Humans , Joints/diagnostic imaging , Neural Networks, Computer
6.
J Imaging ; 5(7)2019 Jun 30.
Article En | MEDLINE | ID: mdl-34460457

In recent years, following the tremendous growth of the Web, extremely large amounts of digital multimedia content are produced every day and shared online, mainly through newly emerged channels such as social networks [...].

7.
Comput Math Methods Med ; 2018: 2026962, 2018.
Article En | MEDLINE | ID: mdl-30250496

Wireless Capsule Endoscopy (WCE) is a noninvasive diagnostic technique enabling the inspection of the whole gastrointestinal (GI) tract by capturing and wirelessly transmitting thousands of color images. Proprietary software "stitches" the images into videos for examination by accredited readers. However, the resulting videos are long, and the reading task consequently becomes harder and more prone to human error. Automating the WCE reading process could contribute to both reducing the examination time and improving its diagnostic accuracy. In this paper, we present a novel feature extraction methodology for automated WCE image analysis. It aims at discriminating various kinds of abnormalities from the normal contents of WCE images within a machine learning-based classification framework. The extraction of the proposed features involves an unsupervised color-based saliency detection scheme which, unlike current approaches, combines both point- and region-level saliency information, and the estimation of local and global image color descriptors. The salient point detection process involves the estimation of DIstaNces On Selective Aggregation of chRomatic image Components (DINOSARC). The descriptors are extracted from superpixels by co-evaluating both point- and region-level information. The main conclusions of the experiments performed on a publicly available dataset of WCE images are that (a) the proposed salient point detection scheme results in significantly fewer and more relevant salient points; and (b) the proposed descriptors are more discriminative than relevant state-of-the-art descriptors, promising a wider adoption of the proposed approach for computer-aided diagnosis in WCE.
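
The DINOSARC scheme itself is not reproduced here; the sketch below only illustrates the surrounding pipeline under simple assumptions, using SLIC superpixels and mean/std Lab color statistics as stand-in region descriptors ahead of the classification stage.

```python
# Segment a WCE frame into superpixels, describe each region by simple color
# statistics, and feed the descriptors to a classifier (dummy labels here).
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab
from sklearn.svm import SVC

def superpixel_color_descriptors(image: np.ndarray, n_segments: int = 100):
    """image: (H, W, 3) RGB in [0, 1] -> (n_regions, 6) mean/std Lab features."""
    lab = rgb2lab(image)
    labels = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    feats = []
    for r in np.unique(labels):
        region = lab[labels == r]                    # pixels of one superpixel
        feats.append(np.concatenate([region.mean(axis=0), region.std(axis=0)]))
    return np.asarray(feats)

frame = np.random.rand(128, 128, 3)                  # stand-in WCE frame
desc = superpixel_color_descriptors(frame)
labels = np.arange(len(desc)) % 2                    # dummy normal/abnormal labels
clf = SVC(kernel="rbf").fit(desc, labels)            # ML classification stage
print(desc.shape, "fit score on dummy labels:", clf.score(desc, labels))
```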


Algorithms , Capsule Endoscopy , Diagnosis, Computer-Assisted , Software , Color , Gastrointestinal Tract/diagnostic imaging , Humans
8.
Adv Exp Med Biol ; 989: 155-164, 2017.
Article En | MEDLINE | ID: mdl-28971424

Emotion recognition plays an important role in several applications, such as human-computer interaction and understanding the affective state of users in certain tasks, e.g., within a learning process, monitoring of the elderly, interactive entertainment, etc. It may be based upon several modalities, e.g., by analyzing facial expressions and/or speech, or by using electroencephalograms, electrocardiograms, etc. In certain applications the only available modality is the user's (speaker's) voice. In this paper we aim to analyze speakers' emotions based solely on paralinguistic information, i.e., without depending on the linguistic aspect of speech. We compare two machine learning approaches, namely a Convolutional Neural Network and a Support Vector Machine. The former is trained using raw speech information, while the latter is trained on a set of extracted low-level features. Aiming to provide a multilingual approach, the training and testing datasets contain speech from different languages.
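
A minimal sketch (not the paper's network) of a 1D CNN that classifies emotions directly from a raw speech waveform; kernel sizes, sampling rate, and the number of emotion classes are assumptions. The SVM alternative would instead operate on extracted low-level features.

```python
# Classify emotions from raw audio with a small 1D CNN; sizes are illustrative.
import torch
import torch.nn as nn

class RawSpeechEmotionCNN(nn.Module):
    def __init__(self, num_emotions: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.AdaptiveAvgPool1d(1),                  # collapse the time axis
        )
        self.classifier = nn.Linear(32, num_emotions)

    def forward(self, waveform):                      # waveform: (batch, 1, samples)
        return self.classifier(self.features(waveform).squeeze(-1))

wave = torch.randn(8, 1, 16000)                       # 1 s of 16 kHz dummy audio
print(RawSpeechEmotionCNN()(wave).shape)              # torch.Size([8, 6])
```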


Emotions , Neural Networks, Computer , Speech , Support Vector Machine , Voice , Humans
9.
Comput Biol Med ; 89: 429-440, 2017 10 01.
Article En | MEDLINE | ID: mdl-28886480

Wireless capsule endoscopy (WCE) is performed with a miniature swallowable endoscope enabling the visualization of the whole gastrointestinal (GI) tract. One of the most challenging problems in WCE is the localization of the capsule endoscope (CE) within the GI lumen. Contemporary radiation-free localization approaches are mainly based on the use of external sensors and transit time estimation techniques, with practically low localization accuracy. The latest advances toward the solution of this problem include localization approaches based solely on visual information from the CE camera. In this paper we present a novel visual localization approach based on an intelligent artificial neural network architecture which implements a generic visual odometry (VO) framework capable of estimating the motion of the CE in physical units. Unlike conventional geometric VO approaches, the proposed one is adaptive to the geometric model of the CE used; therefore, it does not require any prior knowledge about the camera model or its intrinsic parameters. Furthermore, it exploits color as a cue to increase localization accuracy and robustness. Experiments were performed using a robotic-assisted setup providing ground truth information about the actual location of the CE. The lowest average localization error achieved is 2.70 ± 1.62 cm, which is significantly lower than the error obtained with the geometric approach. This result constitutes a promising step towards the in vivo application of VO, which will open new horizons for accurate local treatment, including drug infusion and surgical interventions.
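
A generic sketch of learned visual odometry from frame pairs, regressing displacement and rotation in physical units; the layer sizes, input resolution, and two-value output are assumptions rather than the architecture described in the paper.

```python
# Regress capsule motion (displacement in cm, rotation in degrees) from two
# consecutive frames stacked along the channel axis; sizes are illustrative.
import torch
import torch.nn as nn

class CapsuleVO(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                 # frame-pair encoder
            nn.Conv2d(6, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 2)                  # [displacement, rotation]

    def forward(self, prev_frame, next_frame):        # each: (batch, 3, H, W)
        return self.head(self.encoder(torch.cat([prev_frame, next_frame], dim=1)))

prev = torch.randn(4, 3, 128, 128)
nxt = torch.randn(4, 3, 128, 128)
print(CapsuleVO()(prev, nxt).shape)                   # torch.Size([4, 2])
```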


Capsule Endoscopes , Capsule Endoscopy/methods , Image Processing, Computer-Assisted , Neural Networks, Computer , Humans
10.
Comput Biol Med ; 65: 297-307, 2015 Oct 01.
Article En | MEDLINE | ID: mdl-26073184

Wireless capsule endoscopy (WCE) enables the non-invasive examination of the gastrointestinal (GI) tract by a swallowable device equipped with a miniature camera. Accurate localization of the capsule in the GI tract enables accurate localization of abnormalities for medical interventions such as biopsy and polyp resection; therefore, the optimization of the localization outcome is important. Current approaches to endoscopic capsule localization are mainly based on external sensors and transit time estimations. Recently, we demonstrated the feasibility of capsule localization based entirely on visual features, without the use of external sensors. This technique relies on a motion estimation algorithm that enables measurements of the distance and the rotation of the capsule from the acquired video frames. Towards determining an optimal visual feature extraction technique for capsule motion estimation, an extensive comparative assessment of several state-of-the-art techniques, using a publicly available dataset, is presented. The results show that minimizing the localization error is possible at the cost of computational efficiency. A localization error approximately one order of magnitude higher than the minimal one can be considered a reasonable compromise for using current, computationally efficient feature extraction techniques.
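
As a hedged illustration of the feature extraction and matching step that such a comparison evaluates, the sketch below uses ORB features (one of many possible choices, not necessarily among those assessed) to recover the in-plane rotation between two consecutive frames.

```python
# Detect local features in consecutive frames, match them, and recover the
# in-plane rotation from the matched points; ORB is used purely as an example.
import cv2
import numpy as np

def frame_rotation_deg(prev_gray: np.ndarray, next_gray: np.ndarray) -> float:
    orb = cv2.ORB_create(nfeatures=500)
    kp1, d1 = orb.detectAndCompute(prev_gray, None)
    kp2, d2 = orb.detectAndCompute(next_gray, None)
    if d1 is None or d2 is None:
        return float("nan")
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    if len(matches) < 3:
        return float("nan")
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    M, _ = cv2.estimateAffinePartial2D(pts1, pts2)    # similarity transform
    if M is None:
        return float("nan")
    return float(np.degrees(np.arctan2(M[1, 0], M[0, 0])))

# Dummy frames: the second is the first rotated by about 5 degrees.
prev = np.random.randint(0, 256, (240, 240), dtype=np.uint8)
R = cv2.getRotationMatrix2D((120, 120), 5, 1.0)
nxt = cv2.warpAffine(prev, R, (240, 240))
print("estimated rotation (deg):", frame_rotation_deg(prev, nxt))
```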


Capsule Endoscopy/methods , Image Processing, Computer-Assisted/methods , Capsule Endoscopy/instrumentation , Female , Humans , Image Processing, Computer-Assisted/instrumentation , Male