Results 1 - 20 of 109
1.
Sensors (Basel) ; 24(13)2024 Jul 04.
Article in English | MEDLINE | ID: mdl-39001122

ABSTRACT

Human Activity Recognition (HAR) and Ambient Assisted Living (AAL) are integral components of smart homes, sports, surveillance, and investigation activities. To recognize daily activities, researchers are focusing on lightweight, cost-effective, wearable sensor-based technologies, since traditional vision-based technologies compromise the privacy of the elderly, a fundamental right of every human. However, it is challenging to extract discriminative features from 1D multi-sensor data. This research therefore focuses on extracting distinguishable patterns and deep features from spectral images obtained by time-frequency-domain analysis of 1D multi-sensor data. Wearable sensor data, particularly accelerometer and gyroscope data, serve as input signals for different daily activities and provide rich information under time-frequency analysis. This time-series information is mapped into spectral images known as scalograms, derived from the continuous wavelet transform. Deep activity features are extracted from the activity images using deep learning models such as CNN, MobileNetV3, ResNet, and GoogleNet, and are subsequently classified with a conventional classifier. To validate the proposed model, the SisFall and PAMAP2 benchmark datasets are used. The experimental results show that the proposed model achieves optimal performance for activity recognition, obtaining an accuracy of 98.4% on SisFall and 98.1% on PAMAP2 using the Morlet mother wavelet with ResNet-101 and a softmax classifier, outperforming state-of-the-art algorithms.
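
[Editor's note] A minimal sketch of the scalogram step described above, using PyWavelets' continuous wavelet transform with the Morlet mother wavelet; the sampling rate, window length, and scale range are illustrative assumptions, not values from the paper.

```python
import numpy as np
import pywt

fs = 50.0                        # assumed sampling rate (Hz) of the wearable IMU
t = np.arange(0, 2.56, 1 / fs)   # one ~2.5 s activity window (128 samples)
accel = np.sin(2 * np.pi * 3 * t) + 0.1 * np.random.randn(t.size)  # stand-in signal

# Continuous wavelet transform with the Morlet mother wavelet ('morl'),
# as named in the abstract; the scale range here is an illustrative choice.
scales = np.arange(1, 64)
coeffs, freqs = pywt.cwt(accel, scales, "morl", sampling_period=1 / fs)

# The scalogram is the magnitude of the coefficients; rendered as an image,
# it becomes the input to a CNN backbone such as ResNet-101.
scalogram = np.abs(coeffs)       # shape: (len(scales), len(t))
print(scalogram.shape)
```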


Subject(s)
Human Activities; Wavelet Analysis; Humans; Human Activities/classification; Algorithms; Deep Learning; Wearable Electronic Devices; Activities of Daily Living; Neural Networks, Computer; Image Processing, Computer-Assisted/methods
2.
Sensors (Basel) ; 24(14)2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39065907

ABSTRACT

Activity recognition combined with artificial intelligence is a vital area of research, spanning diverse domains from sports and healthcare to smart homes. In the industrial domain, and on manual assembly lines in particular, the emphasis shifts to human-machine interaction and thus to human activity recognition (HAR) within complex operational environments. Developing models and methods that can reliably and efficiently identify human activities, traditionally categorized simply as either simple or complex, remains a key challenge in the field. Existing methods and approaches are limited by their inability to account for the contextual complexities of the performed activities. Our approach to this challenge is to create different levels of activity abstraction, which allow for a more nuanced comprehension of activities and define their underlying patterns. Specifically, we propose a new hierarchical taxonomy for human activity abstraction levels, based on the context of the performed activities, that can be used in HAR. The proposed hierarchy consists of five levels: atomic, micro, meso, macro, and mega. We compare this taxonomy with approaches that divide activities into simple and complex categories, as well as other similar classification schemes, and provide real-world examples in different applications to demonstrate its efficacy. With regard to advanced technologies such as artificial intelligence, our study aims to guide and optimize industrial assembly procedures, particularly in uncontrolled non-laboratory environments, by shaping workflows to enable structured data analysis and by highlighting correlations across levels throughout the assembly progression. In addition, it establishes effective communication and a shared understanding between researchers and industry professionals, while providing them with the essential resources to develop systems, sensors, and algorithms for custom industrial use cases that adapt to the level of abstraction.
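
[Editor's note] One way to make the five-level hierarchy concrete is as a small data structure; the assembly-line examples attached to each level below are hypothetical illustrations in the spirit of the taxonomy, not examples taken from the paper.

```python
from enum import IntEnum

class AbstractionLevel(IntEnum):
    """The five abstraction levels of the proposed hierarchy, ordered
    from finest to coarsest granularity."""
    ATOMIC = 1   # e.g., a single grasp or hand movement (hypothetical example)
    MICRO = 2    # e.g., picking up a screw
    MESO = 3     # e.g., fastening one screw with a screwdriver
    MACRO = 4    # e.g., mounting a cover plate
    MEGA = 5     # e.g., assembling the complete product

# A recognized activity can then carry its level explicitly, so downstream
# analysis can correlate observations across abstraction levels.
assembly_step = {"label": "fasten_screw", "level": AbstractionLevel.MESO}
print(assembly_step["level"].name, int(assembly_step["level"]))
```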


Subject(s)
Artificial Intelligence; Humans; Algorithms; Human Activities/classification
3.
Sensors (Basel) ; 24(14)2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39065968

ABSTRACT

Human action recognition based on optical and infrared video is strongly affected by the environment, and feature extraction in traditional machine learning classification methods is complex; therefore, this paper proposes a method for human action recognition using Frequency Modulated Continuous Wave (FMCW) radar based on an asymmetric convolutional residual network. First, the radar echo data are analyzed and processed to extract micro-Doppler time-domain spectrograms of different actions. Second, a strategy combining asymmetric convolution with the Mish activation function is adopted in the residual blocks of the ResNet18 network to address the limitations of the blocks' linear and nonlinear transformations for micro-Doppler spectrum recognition and to enhance the network's ability to learn features effectively. Finally, the Improved Convolutional Block Attention Module (ICBAM) is integrated into the residual block to sharpen the model's attention to, and comprehension of, the input data. The experimental results demonstrate that the proposed method achieves a high accuracy of 98.28% in action recognition and classification within complex scenes, surpassing classic deep learning approaches. Moreover, the method significantly improves recognition accuracy for actions with similar micro-Doppler features and demonstrates excellent recognition performance under noise.
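
[Editor's note] A minimal PyTorch sketch of a residual block pairing asymmetric (3x1 / 1x3) convolutions with the Mish activation, in the spirit of the modification described; channel sizes, normalization placement, and the omission of the ICBAM attention module are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AsymmetricResBlock(nn.Module):
    """Residual block replacing a square 3x3 convolution with an
    asymmetric 3x1 + 1x3 pair and using Mish activations: a sketch of
    the kind of modification described for ResNet18."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv_v = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.Mish()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.act(self.bn1(self.conv_v(x)))
        out = self.bn2(self.conv_h(out))
        return self.act(out + x)   # identity shortcut

# Example: a batch of micro-Doppler spectrogram feature maps (16 channels, 64x64).
x = torch.randn(4, 16, 64, 64)
block = AsymmetricResBlock(16)
print(block(x).shape)   # torch.Size([4, 16, 64, 64])
```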


Subject(s)
Neural Networks, Computer; Radar; Humans; Algorithms; Machine Learning; Human Activities/classification; Deep Learning; Pattern Recognition, Automated/methods
4.
IEEE J Biomed Health Inform ; 28(5): 2687-2698, 2024 May.
Article in English | MEDLINE | ID: mdl-38442051

ABSTRACT

Self-supervised Human Activity Recognition (HAR) has been gaining increasing attention in the ubiquitous computing community. Its current focus lies primarily in overcoming the challenge of manually labeling complicated, intricate, and often hard-to-interpret sensor data from wearable devices. However, current self-supervised algorithms face three main challenges: performance variability caused by data augmentations in the contrastive learning paradigm, limitations imposed by traditional self-supervised models, and the computational load that mainstream transformer encoders place on wearable devices. To tackle these challenges comprehensively, this paper proposes a powerful self-supervised approach for HAR from the novel perspective of the denoising autoencoder, the first of its kind to explore how to reconstruct masked sensor data on top of a commonly employed, well-designed, and computationally efficient fully convolutional network. Extensive experiments demonstrate that the proposed Masked Convolutional AutoEncoder (MaskCAE) outperforms current state-of-the-art algorithms in self-supervised, fully supervised, and semi-supervised settings without relying on any data augmentation, filling the gap of masked sensor data modeling in the HAR area. Visualization analyses show that MaskCAE effectively captures temporal semantics in time-series sensor data, indicating its great potential for modeling abstracted sensor data. An actual implementation is evaluated on an embedded platform.
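
[Editor's note] A sketch of the masked-reconstruction idea on 1D sensor windows: random temporal patches are zeroed out and a small fully convolutional autoencoder learns to reconstruct them, with the loss computed only on the hidden positions. Architecture depth, mask ratio, and patch size are illustrative assumptions, not the MaskCAE configuration.

```python
import torch
import torch.nn as nn

def mask_segments(x, patch=8, ratio=0.5):
    """Zero out a random subset of temporal patches; returns the masked
    input and a boolean mask marking the hidden positions."""
    b, c, t = x.shape
    n_patches = t // patch
    keep = torch.rand(b, n_patches) > ratio          # True = visible
    mask = keep.repeat_interleave(patch, dim=1)      # (b, t)
    return x * mask.unsqueeze(1), ~mask

encoder = nn.Sequential(
    nn.Conv1d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv1d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose1d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose1d(32, 3, 4, stride=2, padding=1),
)

x = torch.randn(16, 3, 128)          # 3-axis accelerometer windows (stand-in data)
x_masked, hidden = mask_segments(x)
recon = decoder(encoder(x_masked))
# Reconstruction loss only on the masked (hidden) positions.
loss = ((recon - x)[hidden.unsqueeze(1).expand_as(x)] ** 2).mean()
print(loss.item())
```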


Subject(s)
Algorithms; Human Activities; Humans; Human Activities/classification; Signal Processing, Computer-Assisted; Wearable Electronic Devices; Supervised Machine Learning; Neural Networks, Computer
5.
IEEE J Biomed Health Inform ; 28(5): 2733-2744, 2024 May.
Article in English | MEDLINE | ID: mdl-38483804

ABSTRACT

Human Activity Recognition (HAR) has recently attracted widespread attention, and its effective application helps people in areas such as healthcare, smart homes, and gait analysis. Deep learning methods have shown remarkable performance in HAR. A pivotal challenge is the trade-off between recognition accuracy and computational efficiency, especially on resource-constrained mobile devices. This challenge necessitates models that enhance feature representation capabilities without imposing additional computational burden. To address this, we introduce a novel deep learning HAR model designed to navigate the accuracy-efficiency trade-off. The model comprises two innovative modules: 1) a Pyramid Multi-scale Convolutional Network (PMCN), which has a symmetric structure and obtains a rich receptive field at a fine level through its multiscale representation capability; and 2) a Cross-Attention Mechanism, which establishes interrelationships among the sensor, temporal, and channel dimensions, effectively enhancing useful information while suppressing irrelevant data. The proposed model is rigorously evaluated on four diverse datasets: UCI, WISDM, PAMAP2, and OPPORTUNITY. Ablation and comparative studies are also conducted to comprehensively assess its performance. Experimental results demonstrate that the proposed model achieves superior activity recognition accuracy while maintaining low computational overhead.
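
[Editor's note] A minimal sketch of a pyramid-style multiscale convolution: parallel 1D convolutions with increasing kernel sizes whose outputs are concatenated channel-wise. Branch counts and kernel sizes are illustrative, not the PMCN specification.

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Parallel 1D convolutions at several kernel sizes, concatenated
    channel-wise: a sketch of the multiscale idea behind PMCN."""
    def __init__(self, in_ch, branch_ch, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, branch_ch, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        # Each branch sees the same input at a different receptive field size.
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(8, 6, 128)         # 6 inertial channels, 128 time steps (stand-in)
out = MultiScaleConv(6, 16)(x)
print(out.shape)                   # torch.Size([8, 64, 128])
```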


Subject(s)
Deep Learning; Human Activities; Humans; Human Activities/classification; Signal Processing, Computer-Assisted; Neural Networks, Computer; Algorithms; Databases, Factual; Monitoring, Ambulatory/methods; Monitoring, Ambulatory/instrumentation
6.
IEEE Trans Pattern Anal Mach Intell ; 46(8): 5345-5361, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38376962

ABSTRACT

Federated human activity recognition (FHAR) has attracted much attention due to its great potential for privacy protection. Existing FHAR methods can collaboratively learn a global activity recognition model from unimodal or multimodal data distributed across local clients. However, it remains questionable whether existing methods work well in the more common scenario where local data come from different modalities, e.g., some local clients may provide motion signals while others can only provide visual data. In this article, we study a new problem of cross-modal federated human activity recognition (CM-FHAR), which is conducive to promoting large-scale use of HAR models on more local devices. CM-FHAR presents at least three dedicated challenges: 1) distributed common cross-modal feature learning, 2) modality-dependent discriminative feature learning, and 3) the modality imbalance issue. To address these challenges, we propose a modality-collaborative activity recognition network (MCARN) that jointly learns a global activity classifier shared across all clients and multiple modality-dependent private activity classifiers. To produce modality-agnostic and modality-specific features, we learn an altruistic encoder and an egocentric encoder under the constraint of a separation loss and an adversarial modality discriminator learned collaboratively in the hypersphere. To address the modality imbalance issue, we propose an angular margin adjustment scheme that improves the modality discriminator on modality-imbalanced data by enhancing the intra-modality compactness of the dominant modality and increasing the inter-modality discrepancy. Moreover, we propose a relation-aware global-local calibration mechanism to constrain class-level pairwise relationships for the parameters of the private classifiers. Finally, through decentralized optimization with alternating steps of adversarial local updating and modality-aware global aggregation, the proposed MCARN obtains state-of-the-art performance on both modality-balanced and modality-imbalanced data.
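
[Editor's note] The decentralized optimization can be pictured as a FedAvg-style loop in which only the shared global classifier is averaged across clients while each client keeps its modality-dependent private classifier local. This sketch assumes plain parameter averaging and omits the adversarial discriminator and calibration terms; dimensions are illustrative.

```python
import copy
import torch
import torch.nn as nn

# Each client holds a shared global classifier (aggregated across clients)
# and a private, modality-dependent classifier that never leaves the device.
clients = [
    {"shared": nn.Linear(64, 10), "private": nn.Linear(64, 10)}
    for _ in range(4)
]

def fedavg_shared(clients):
    """Average only the shared classifier's parameters across clients."""
    avg = copy.deepcopy(clients[0]["shared"].state_dict())
    for key in avg:
        avg[key] = torch.stack(
            [c["shared"].state_dict()[key] for c in clients]
        ).mean(dim=0)
    for c in clients:
        c["shared"].load_state_dict(avg)

# One aggregation round; adversarial local updating of encoders and both
# classifier heads would precede this step in each round.
fedavg_shared(clients)
```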


Subject(s)
Algorithms; Human Activities; Pattern Recognition, Automated; Humans; Human Activities/classification; Pattern Recognition, Automated/methods; Machine Learning
7.
IEEE Trans Image Process ; 30: 6240-6254, 2021.
Article in English | MEDLINE | ID: mdl-34224352

ABSTRACT

The task of human interaction understanding involves both recognizing the action of each individual in the scene and decoding the interaction relationships among people, which is useful for a series of vision applications such as camera surveillance, video-based sports analysis, and event retrieval. This paper divides the task into two problems, grouping people into clusters and assigning labels to each cluster, and presents an approach to solving them jointly. Our method does not assume the number of groups is known beforehand, as this would substantially restrict its applicability. Observing that the two challenges are highly correlated, the key idea is to model the pairwise interaction relations among people via a complete graph and its associated energy function, so that the labeling and grouping problems are translated into minimizing the energy function. We implement this joint framework by fusing both deep features and rich contextual cues, and we learn the fusion parameters from data. An alternating search algorithm is developed to efficiently solve the associated inference problem. By combining the grouping and labeling results obtained with our method, we achieve a semantic-level understanding of human interactions. Extensive experiments qualitatively and quantitatively evaluate the effectiveness of our approach, which outperforms state-of-the-art methods on several important benchmarks. An ablation study verifies the effectiveness of the different modules within our approach.


Subject(s)
Human Activities/classification; Image Processing, Computer-Assisted/methods; Machine Learning; Algorithms; Humans; Video Recording
8.
IEEE Trans Image Process ; 30: 6583-6593, 2021.
Article in English | MEDLINE | ID: mdl-34270424

ABSTRACT

Human-Object Interaction (HOI) detection is an important task for understanding how humans interact with objects. Most existing works treat this task as an exhaustive triplet 〈human, verb, object〉 classification problem. In this paper, we decompose it and propose a novel two-stage graph model that learns the knowledge of interactiveness and interaction in one network, namely the Interactiveness Proposal Graph Network (IPGN). In the first stage, we design a fully connected graph for learning interactiveness, which distinguishes whether a human-object pair is interactive. Concretely, it generates interactiveness features that encode high-level semantic interactiveness knowledge for each pair. Class-agnostic interactiveness is a more general and simpler objective, and it provides reasonable proposals for the graph construction in the second stage. In the second stage, a sparsely connected graph is constructed from all interactive pairs selected by the first stage. Specifically, we use the interactiveness knowledge to guide the message passing; in contrast with feature similarity, it explicitly represents the connections between the nodes. Benefiting from valid graph reasoning, the node features are well encoded for interaction learning. Experiments show that the proposed method achieves state-of-the-art performance on both the V-COCO and HICO-DET datasets.


Subject(s)
Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Algorithms; Animals; Databases, Factual; Human Activities/classification; Humans; Semantics
9.
IEEE Trans Image Process ; 30: 3691-3704, 2021.
Article in English | MEDLINE | ID: mdl-33705316

ABSTRACT

This article presents a novel keypoints-based attention mechanism for visual recognition in still images. Deep Convolutional Neural Networks (CNNs) have shown great success in recognizing images with distinctive classes, but their performance in discriminating fine-grained changes is not at the same level. We address this by proposing an end-to-end CNN model that learns meaningful features linking fine-grained changes using our novel attention mechanism. It captures the spatial structures in images by identifying semantic regions (SRs) and their spatial distributions, which proves to be key to modeling subtle changes in images. We automatically identify these SRs by grouping the keypoints detected in a given image. The "usefulness" of these SRs for image recognition is measured using our attention mechanism, which focuses on the parts of the image most relevant to a given task. The framework applies to both traditional and fine-grained image recognition tasks and does not require manually annotated regions (e.g., bounding boxes of body parts, objects, etc.) for learning and prediction. Moreover, the proposed keypoints-driven attention mechanism can be easily integrated into existing CNN models. The framework is evaluated on six diverse benchmark datasets, where the model outperforms state-of-the-art approaches by a considerable margin on Distracted Driver V1 (Acc: 3.39%), Distracted Driver V2 (Acc: 6.58%), Stanford-40 Actions (mAP: 2.15%), People Playing Musical Instruments (mAP: 16.05%), Food-101 (Acc: 6.30%), and Caltech-256 (Acc: 2.59%).


Subject(s)
Deep Learning; Human Activities/classification; Image Processing, Computer-Assisted/methods; Female; Humans; Male; Semantics
10.
IEEE Trans Image Process ; 30: 2562-2574, 2021.
Article in English | MEDLINE | ID: mdl-33232232

ABSTRACT

Human motion prediction, which aims at predicting future human skeletons given past ones, is a typical sequence-to-sequence problem, and extensive efforts have been devoted to exploring different RNN-based encoder-decoder architectures. However, by generating target poses conditioned on previously generated ones, these models are prone to issues such as error accumulation. In this paper, we argue that this issue is mainly caused by the autoregressive decoding manner. Hence, a novel Non-AuToregressive model (NAT) is proposed with a completely non-autoregressive decoding scheme, along with a context encoder and a positional encoding module. More specifically, the context encoder embeds the given poses from temporal and spatial perspectives. The frame decoder predicts each future pose independently. The positional encoding module injects a positional signal into the model to indicate temporal order. In addition, a multitask training paradigm is presented for both low-level human skeleton prediction and high-level human action recognition, yielding considerable improvement on the prediction task. Our approach is evaluated on the Human3.6M and CMU-Mocap benchmarks and outperforms state-of-the-art autoregressive methods.
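
[Editor's note] The positional encoding module can be sketched with the standard sinusoidal formulation, which gives a non-autoregressive decoder the temporal order it would otherwise lack; the dimensions and the trivial "decoder" below are illustrative stand-ins, not the NAT architecture.

```python
import math
import torch

def sinusoidal_positions(length: int, dim: int) -> torch.Tensor:
    """Standard sinusoidal positional encoding; injected so that
    independently predicted frames still carry temporal order."""
    pos = torch.arange(length).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
    pe = torch.zeros(length, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

context = torch.randn(25, 128)            # encoded past poses (illustrative dims)
future = context.mean(0, keepdim=True) + sinusoidal_positions(10, 128)
# Each of the 10 future frames can now be decoded independently (in parallel),
# rather than conditioning on previously generated frames.
print(future.shape)                       # torch.Size([10, 128])
```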


Subject(s)
Image Processing, Computer-Assisted/methods; Machine Learning; Movement/physiology; Human Activities/classification; Humans; Intention; Models, Statistical; Video Recording
11.
Sensors (Basel) ; 20(5)2020 Mar 08.
Article in English | MEDLINE | ID: mdl-32182668

ABSTRACT

Over the past few years, the Internet of Things (IoT) has developed rapidly, one instance being smart home devices gradually entering people's lives. To maximize the impact of such deployments, home-based activity recognition is required to first recognize behaviors within smart home environments and then use this information to provide better health and social care services. Activity recognition identifies people's activities from information about their interaction with the environment, collected by sensors embedded within the home. In this paper, binary data collected by anonymous binary sensors, such as pressure sensors, contact sensors, and passive infrared sensors, are used to recognize activities. A radial basis function neural network (RBFNN) with a localized stochastic-sensitive autoencoder (LiSSA) method is proposed for home-based activity recognition. An autoencoder (AE) is introduced to extract useful features from the binary sensor data by converting binary inputs into continuous ones, exposing more of the hidden information. The generalization capability of the proposed method is enhanced by minimizing both the training error and the stochastic sensitivity measure, improving the classifier's tolerance of uncertainties in the sensor data. Four binary home-based activity recognition datasets, OrdonezA, OrdonezB, Ulster, and the activities-of-daily-living data from van Kasteren (vanKasterenADL), are used to evaluate the effectiveness of the proposed method. Compared with well-known benchmark approaches, including the support vector machine (SVM), multilayer perceptron neural network (MLPNN), random forest, and an RBFNN-based method, the proposed method yielded the best performance, with 98.35%, 86.26%, 96.31%, and 92.31% accuracy on the four datasets, respectively.
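
[Editor's note] A sketch of the autoencoder step: binary sensor snapshots are mapped to continuous hidden features that a downstream classifier (an RBFNN in the paper) can consume. Layer sizes and sensor count are illustrative, and the stochastic-sensitivity term of LiSSA is omitted.

```python
import torch
import torch.nn as nn

n_sensors = 14                       # e.g., contact/pressure/PIR channels (assumed)
autoencoder = nn.Sequential(
    nn.Linear(n_sensors, 8), nn.Sigmoid(),   # continuous hidden features
    nn.Linear(8, n_sensors), nn.Sigmoid(),   # reconstruction of the binary input
)

x = (torch.rand(32, n_sensors) > 0.5).float()   # batch of binary snapshots
recon = autoencoder(x)
loss = nn.functional.binary_cross_entropy(recon, x)
loss.backward()

# After training, the encoder half yields continuous features for the
# downstream RBFNN classifier.
features = autoencoder[:2](x)
print(features.shape)                # torch.Size([32, 8])
```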


Subject(s)
Human Activities/classification; Monitoring, Ambulatory/methods; Nerve Net; Adult; Home Care Services; Humans; Internet of Things; Male; Stochastic Processes; Support Vector Machine
12.
Nat Commun ; 11(1): 1551, 2020 03 25.
Article in English | MEDLINE | ID: mdl-32214095

ABSTRACT

Recognizing human physical activities using wireless sensor networks has attracted significant research interest due to its broad range of applications, such as healthcare, rehabilitation, athletics, and senior monitoring. There are critical challenges inherent in designing a sensor-based activity recognition system that operates in and around a lossy medium such as the human body while balancing power consumption, cost, computational complexity, and accuracy. We introduce an innovative wireless system based on magnetic induction for human activity recognition to tackle these challenges and constraints. The magnetic induction system is integrated with machine learning techniques to detect a wide range of human motions. This approach is successfully evaluated using synthesized datasets, laboratory measurements, and deep recurrent neural networks.


Subject(s)
Deep Learning; Human Activities/classification; Magnetic Phenomena; Monitoring, Physiologic/methods; Signal Processing, Computer-Assisted; Humans; Motion; Wearable Electronic Devices; Wireless Technology
13.
IEEE Trans Neural Netw Learn Syst ; 31(5): 1747-1756, 2020 05.
Article in English | MEDLINE | ID: mdl-31329134

ABSTRACT

Recent years have witnessed the success of deep learning methods in human activity recognition (HAR). The longstanding shortage of labeled activity data inherently calls for semisupervised learning methods, and one of the most challenging and common issues in semisupervised learning is the imbalanced distribution of labeled data over classes. Although the problem has long existed in a broad range of real-world HAR applications, it is rarely explored in the literature. In this paper, we propose a semisupervised deep model for imbalanced activity recognition from multimodal wearable sensory data. We aim to address not only the challenges of multimodal sensor data (e.g., interperson variability and interclass similarity) but also the limited-label and class-imbalance issues simultaneously. In particular, we propose a pattern-balanced semisupervised framework to extract and preserve diverse latent patterns of activities. Furthermore, we exploit the independence of the multiple modalities of sensory data and attentively identify salient regions that are indicative of human activities from the inputs with our recurrent convolutional attention networks. Our experimental results demonstrate that the proposed model achieves performance competitive with a multitude of state-of-the-art methods, both semisupervised and supervised, using only 10% labeled training data. The results also show the robustness of our method on imbalanced, small training datasets.


Subject(s)
Human Activities/classification; Neural Networks, Computer; Pattern Recognition, Automated/classification; Pattern Recognition, Automated/methods; Supervised Machine Learning/classification; Humans
14.
IEEE J Biomed Health Inform ; 24(1): 131-143, 2020 01.
Article in English | MEDLINE | ID: mdl-30716055

ABSTRACT

Detecting irregularity in the daily behaviors of the elderly is an important issue in homecare. Many mechanisms have been developed to detect the health condition of the elderly based on the explicit irregularity of several biomedical parameters or specific behaviors. However, few research works focus on detecting implicit irregularity involving combinations of diverse behaviors, which can assess the cognitive and physical wellbeing of elders but cannot be directly identified from sensor data. This paper proposes an Implicit IRregularity Detection (IIRD) mechanism that detects implicit irregularity with an unsupervised learning algorithm based on daily behaviors. The proposed IIRD mechanism identifies the distance and similarity between daily behaviors, which are important features for distinguishing regular from irregular daily behaviors and for detecting implicit irregularity in the elderly's health condition. Performance results show that the proposed IIRD outperforms existing unsupervised machine-learning mechanisms in terms of detection accuracy and irregularity recall.
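
[Editor's note] A toy sketch of the distance idea: each day is represented as a behavior profile vector, and days far from the typical profile are flagged. The centroid-plus-threshold rule and the synthetic data are illustrative stand-ins for the paper's unsupervised algorithm, not its actual method.

```python
import numpy as np

rng = np.random.default_rng(7)

# Each row: one day's behavior profile (e.g., minutes spent per activity).
days = rng.normal(60, 5, size=(30, 6))
days[28] += np.array([0, -40, 0, 35, 0, 0])   # an atypical day (illustrative)

centroid = days.mean(axis=0)
dist = np.linalg.norm(days - centroid, axis=1)

# Flag days whose distance from the typical profile is unusually large.
threshold = dist.mean() + 2 * dist.std()
irregular = np.flatnonzero(dist > threshold)
print(irregular)                               # -> [28]
```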


Subject(s)
Home Care Services; Human Activities/classification; Unsupervised Machine Learning; Aged; Algorithms; Databases, Factual; Humans; Monitoring, Physiologic
15.
IEEE Trans Pattern Anal Mach Intell ; 42(1): 126-139, 2020 01.
Article in English | MEDLINE | ID: mdl-30296212

ABSTRACT

With the popularity of mobile sensor technology, smart wearable devices open an unprecedented opportunity to solve the challenging human activity recognition (HAR) problem by learning expressive representations from multi-dimensional daily sensor signals. This inspires us to develop a new algorithm applicable to both camera-based and wearable sensor-based HAR systems. Although competitive classification accuracy has been reported, existing methods often struggle to distinguish visually similar activities composed of the same activity patterns in different temporal orders. In this paper, we propose a novel probabilistic algorithm to compactly encode the temporal order of activity patterns for HAR. Specifically, the algorithm learns an optimal set of latent patterns whose temporal structures really matter in recognizing different human activities. Then, a novel probabilistic First-Take-All (pFTA) approach generates compact features from the orders of these latent patterns to encode the entire sequence, and the temporal structural similarity between sequences can be efficiently measured by the Hamming distance between their compact features. Experiments on three public HAR datasets show that the proposed pFTA approach achieves competitive performance in terms of both accuracy and efficiency.
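
[Editor's note] The First-Take-All idea can be illustrated deterministically: for each sampled pair of latent patterns, one bit records which pattern occurs first in the sequence, and sequences are compared by the Hamming distance over these bits. The pattern-detection step here (time of peak activation) is a stand-in for pFTA's learned latent patterns and probabilistic encoding.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_occurrence(activations):
    """Time step of each pattern's peak activation, used here as a
    stand-in for learned pattern detectors. activations: (time, n_patterns)."""
    return np.argmax(activations, axis=0)

def fta_code(activations, pairs):
    """One bit per pattern pair: does pattern i occur before pattern j?"""
    first = first_occurrence(activations)
    return np.array([first[i] < first[j] for i, j in pairs], dtype=np.uint8)

pairs = [(i, j) for i in range(8) for j in range(i + 1, 8)]   # 28 bits
seq_a = rng.random((100, 8))            # stand-in pattern activations
seq_b = rng.random((100, 8))

# Temporal-structure similarity = Hamming distance between compact codes.
hamming = np.count_nonzero(fta_code(seq_a, pairs) != fta_code(seq_b, pairs))
print(hamming)
```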


Subject(s)
Human Activities/classification; Pattern Recognition, Automated/methods; Algorithms; Databases, Factual; Humans; Image Processing, Computer-Assisted; Models, Statistical; Video Recording; Wearable Electronic Devices
16.
IEEE Trans Pattern Anal Mach Intell ; 42(3): 622-635, 2020 03.
Article in English | MEDLINE | ID: mdl-30489262

ABSTRACT

A first-person video captures what the camera wearer (the actor) experiences through physical interactions with the surroundings. In this paper, we focus on the problem of Force from Motion: estimating, from a first-person video, the active force and torque the actor exerts to drive her/his activity. We use two physical cues inherent in first-person video. (1) Ego-motion: the camera motion is generated by a resultant of force interactions, which allows us to understand the effect of the active force using Newtonian mechanics. (2) Visual semantics: the first-person visual scene affords the actor's activity and is indicative of its physical context. We estimate the active force and torque using a dynamical system that describes the transition (dynamics) of the actor's physical state (position, orientation, and linear/angular momentum), where the latent physical state is indirectly observed through the first-person video. We approximate the physical state with the 3D camera trajectory, reconstructed up to scale and orientation. The absolute scale factor and the gravitational field are learned from the ego-motion and the visual semantics of the video. Inspired by optimal control theory, we solve the dynamical system by minimizing the reprojection error. Our method produces reconstructions quantitatively equivalent to IMU measurements in terms of gravity and scale recovery, and it outperforms methods based on 2D optical flow on an active action recognition task. We apply our method to first-person videos of mountain biking, urban bike racing, skiing, speedflying with a parachute, and wingsuit flying, where inertial measurements are not accessible.


Subject(s)
Human Activities/classification; Image Processing, Computer-Assisted/methods; Movement/physiology; Video Recording/methods; Acceleration; Humans; Sports
17.
IEEE J Biomed Health Inform ; 24(1): 292-299, 2020 01.
Article in English | MEDLINE | ID: mdl-30969934

ABSTRACT

Human activity recognition has been widely used in healthcare applications such as elderly monitoring, exercise supervision, and rehabilitation monitoring. Compared with other approaches, sensor-based wearable human activity recognition is less affected by environmental noise and is therefore promising for achieving higher recognition accuracy. However, one major issue of existing wearable methods is that, although the average recognition accuracy is acceptable, the accuracy for some activities (e.g., ascending and descending stairs) is low, mainly due to relatively scarce training data and complex behavior patterns for these activities. Another issue is that recognition accuracy is low when training data from the test subject are limited, which is common in practice. In addition, the use of neural networks leads to large computational complexity and thus high power consumption. To address these issues, we propose a new human activity recognition method with a two-stage end-to-end convolutional neural network and a data augmentation method. Compared with state-of-the-art methods (both neural network based methods and others), the proposed method achieves significantly improved recognition accuracy and reduced computational complexity.
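
[Editor's note] The data-augmentation side can be sketched with common inertial-signal transforms (jittering and scaling); these specific transforms are standard practice and an assumption here, since the abstract does not detail the augmentation method used.

```python
import numpy as np

rng = np.random.default_rng(42)

def jitter(window, sigma=0.05):
    """Add small Gaussian noise to each sample."""
    return window + rng.normal(0.0, sigma, size=window.shape)

def scale(window, sigma=0.1):
    """Multiply each axis by a random factor close to 1."""
    return window * rng.normal(1.0, sigma, size=(1, window.shape[1]))

window = rng.standard_normal((128, 3))     # one 3-axis accelerometer window
augmented = [jitter(window), scale(window), jitter(scale(window))]
# Augmented copies enlarge the training set for under-represented
# activities such as ascending/descending stairs.
print(len(augmented), augmented[0].shape)
```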


Subject(s)
Human Activities/classification; Movement/physiology; Neural Networks, Computer; Accelerometry/methods; Adult; Female; Humans; Male; Middle Aged; Monitoring, Ambulatory/methods; Pattern Recognition, Automated; Wearable Electronic Devices; Young Adult
18.
IEEE J Biomed Health Inform ; 24(1): 27-38, 2020 01.
Article in English | MEDLINE | ID: mdl-31107668

ABSTRACT

PURPOSE: To evaluate and enhance the generalization performance of machine learning physical activity intensity prediction models, developed with raw acceleration data, on populations monitored by different activity monitors. METHOD: Five datasets from four studies, each containing only hip- or wrist-based raw acceleration data (two hip- and three wrist-based), were extracted. The five datasets were then used to develop and validate artificial neural networks (ANN) in three setups to classify activity intensity categories (sedentary behavior, light, and moderate-to-vigorous). To examine generalizability, the ANN models were developed using within-dataset (leave-one-subject-out) cross validation and then cross-tested on the other datasets with different accelerometers. To enhance the models' generalizability, a combination of four of the five datasets was used for training and the fifth dataset for validation. Finally, all five datasets were merged to develop a single model generalizable across the datasets (50% of the subjects from each dataset for training, the remainder for validation). RESULTS: The models showed high performance in within-dataset cross validation (accuracy 71.9-95.4%, kappa K = 0.63-0.94). The performance of the within-dataset validated models decreased when applied to datasets with different accelerometers (41.2-59.9%, K = 0.21-0.48). Models trained on merged datasets containing both hip and wrist data predicted the left-out dataset with acceptable performance (65.9-83.7%, K = 0.61-0.79). The model trained with all five datasets performed acceptably across the datasets (80.4-90.7%, K = 0.68-0.89). CONCLUSIONS: Integrating heterogeneous datasets into the training set appears to be a viable approach for enhancing the generalization performance of the models. In contrast, within-dataset validation is not sufficient to understand how the models perform on other populations monitored with different accelerometers.
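
[Editor's note] A sketch of the leave-one-dataset-out protocol: train on four merged datasets, test on the held-out fifth. The random arrays stand in for extracted features, and scikit-learn's MLPClassifier stands in for the study's ANN; hyperparameters are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(1)

# Stand-ins for the five datasets: (features, intensity labels 0/1/2).
datasets = [
    (rng.standard_normal((200, 20)), rng.integers(0, 3, 200)) for _ in range(5)
]

for held_out in range(5):
    # Merge the other four datasets for training; test on the held-out one.
    X_train = np.vstack([X for i, (X, y) in enumerate(datasets) if i != held_out])
    y_train = np.hstack([y for i, (X, y) in enumerate(datasets) if i != held_out])
    X_test, y_test = datasets[held_out]

    model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(X_train, y_train)
    pred = model.predict(X_test)
    print(held_out, accuracy_score(y_test, pred), cohen_kappa_score(y_test, pred))
```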


Subject(s)
Accelerometry/methods; Exercise/physiology; Machine Learning; Pattern Recognition, Automated/methods; Adult; Databases, Factual; Human Activities/classification; Humans; Models, Statistical; Monitoring, Physiologic; Neural Networks, Computer
19.
IEEE Trans Pattern Anal Mach Intell ; 42(2): 502-508, 2020 02.
Article in English | MEDLINE | ID: mdl-30802849

ABSTRACT

We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling spatial-audio-temporal dynamics, even for actions occurring in 3-second videos, poses many challenges: meaningful events involve not only people but also objects, animals, and natural phenomena; and visual and auditory events can be symmetrical in time ("opening" is "closing" in reverse) and either transient or sustained. We describe the annotation process for our dataset (each video is tagged with one action or activity label among 339 different classes), analyze its scale and diversity in comparison to other large-scale video datasets for action recognition, and report results of several baseline models addressing, separately and jointly, three modalities: spatial, temporal, and auditory. The Moments in Time Dataset, designed for broad coverage and diversity of events in both visual and auditory modalities, can serve as a new challenge for developing models that scale to the level of complexity and abstract reasoning that a human processes daily.


Subject(s)
Databases, Factual; Video Recording; Animals; Human Activities/classification; Humans; Image Processing, Computer-Assisted; Pattern Recognition, Automated
20.
IEEE Trans Pattern Anal Mach Intell ; 42(10): 2684-2701, 2020 10.
Article in English | MEDLINE | ID: mdl-31095476

ABSTRACT

Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representations for action recognition. However, existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including a lack of large-scale training samples, a realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, collected from 106 distinct subjects and containing more than 114 thousand video samples and 8 million frames. The dataset contains 120 different action classes, including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset and show the advantage of applying deep learning methods to 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, for which we propose a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework that yields promising results for recognizing novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.


Subject(s)
Deep Learning; Human Activities/classification; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Algorithms; Benchmarking; Humans; Semantics; Video Recording