1.
BMC Geriatr ; 24(1): 586, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38977995

ABSTRACT

OBJECTIVE: Through a randomized controlled trial on older adults with sarcopenia, this study compared the training effects of an AI-based remote training group using deep learning-based 3D human pose estimation technology with those of a face-to-face traditional training group and a general remote training group. METHODS: Seventy-five older adults with sarcopenia aged 60-75 from community organizations in Changchun city were randomly divided into a face-to-face traditional training group (TRHG), a general remote training group (GTHG), and an AI-based remote training group (AITHG). All groups underwent a 3-month program consisting of 24-form Taichi exercises, with a frequency of 3 sessions per week and each session lasting 40 min. The participants underwent Appendicular Skeletal Muscle Mass Index (ASMI), grip strength, 6-meter walking pace, Timed Up and Go test (TUGT), and quality of life score (QoL) tests before the experiment, at the mid-term, and after the experiment. This study used SPSS 26.0 software to perform one-way ANOVA and repeated measures ANOVA tests to compare the differences among the three groups. A significance level of p < 0.05 was defined as significant, and p < 0.01 as highly significant. RESULTS: (1) The comparison between the mid-term and pre-term indicators showed that TRHG experienced highly significant improvements in ASMI, 6-meter walking pace, and QoL (p < 0.01), and a significant improvement in the TUGT timing test (p < 0.05); GTHG experienced highly significant improvements in 6-meter walking pace and QoL (p < 0.01); AITHG experienced highly significant improvements in ASMI, 6-meter walking pace, and QoL (p < 0.01), and a significant improvement in the TUGT timing test (p < 0.05). 
(2) The comparison between the post-term and pre-term indicators showed that TRHG experienced highly significant improvements in the TUGT timing test (p < 0.01); GTHG experienced significant improvements in ASMI and the TUGT timing test (p < 0.05); and AITHG experienced highly significant improvements in the TUGT timing test (p < 0.01). (3) At the mid-term, there was no significant difference among the groups in any test (p > 0.05). The same held for the post-term tests (p > 0.05). CONCLUSION: Compared to the pre-experiment baseline, there was no significant post-experiment difference between the AI-based remote training group and the face-to-face traditional training group in the recovery effects on muscle quality, physical activity ability, and quality of life of patients with sarcopenia. 3D pose estimation is equally as effective as traditional rehabilitation methods in enhancing muscle quality, functionality, and quality of life in older adults with sarcopenia. TRIAL REGISTRATION: The trial was registered in ClinicalTrials.gov (NCT05767710).
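The one-way ANOVA used above to compare the three groups can be sketched in a few lines. This is a minimal pure-Python illustration; the grip-strength values below are invented for demonstration and are not data from the trial.

```python
# Illustrative one-way ANOVA across three training groups (fabricated data).
def one_way_anova_f(*groups):
    """Return the F statistic for a one-way ANOVA over the given groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares (k - 1 degrees of freedom)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (n - k degrees of freedom)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

trhg = [24.1, 25.3, 23.8, 26.0]   # face-to-face group (hypothetical values)
gthg = [23.5, 24.0, 22.9, 24.6]   # general remote group (hypothetical values)
aithg = [24.8, 25.1, 24.0, 25.7]  # AI-based remote group (hypothetical values)
f_stat = one_way_anova_f(trhg, gthg, aithg)
```

In practice this F statistic would be compared against the F distribution with (2, 72) degrees of freedom to obtain the p-value; a statistics package such as SPSS or scipy handles that step.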


Subject(s)
Sarcopenia , Telerehabilitation , Humans , Sarcopenia/physiopathology , Sarcopenia/rehabilitation , Sarcopenia/therapy , Aged , Male , Female , Middle Aged , Posture/physiology , Imaging, Three-Dimensional/methods , Quality of Life , Deep Learning
2.
Sensors (Basel) ; 24(2)2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38257488

ABSTRACT

As an important direction in computer vision, human pose estimation has received extensive attention in recent years. A High-Resolution Network (HRNet) can achieve effective estimation results as a classical human pose estimation method. However, the complex structure of the model is not conducive to deployment under limited computer resources. Therefore, an improved Efficient and Lightweight HRNet (EL-HRNet) model is proposed. In detail, point-wise and grouped convolutions were used to construct a lightweight residual module, replacing the original 3 × 3 module to reduce the parameters. To compensate for the information loss caused by the network's lightweight nature, the Convolutional Block Attention Module (CBAM) is introduced after the new lightweight residual module to construct the Lightweight Attention Basicblock (LA-Basicblock) module to achieve high-precision human pose estimation. To verify the effectiveness of the proposed EL-HRNet, experiments were carried out using the COCO2017 and MPII datasets. The experimental results show that the EL-HRNet model requires only 5 million parameters and 2.0 GFlops calculations and achieves an AP score of 67.1% on the COCO2017 validation set. In addition, PCKh@0.5mean is 87.7% on the MPII validation set, and EL-HRNet shows a good balance between model complexity and human pose estimation accuracy.
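The parameter saving behind the lightweighting idea above can be checked with simple arithmetic: replacing a standard 3x3 convolution with a grouped 3x3 convolution plus a 1x1 (point-wise) convolution. The channel count and group count below are assumptions chosen only to illustrate the reduction, not EL-HRNet's actual configuration.

```python
# Rough weight-count comparison: plain 3x3 conv vs. grouped 3x3 + point-wise 1x1.
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2D convolution layer (bias terms ignored)."""
    return (c_in // groups) * c_out * k * k

c = 256  # channel count (hypothetical)
g = 8    # number of groups (hypothetical)
standard = conv_params(c, c, 3)                                 # plain 3x3 conv
light = conv_params(c, c, 3, groups=g) + conv_params(c, c, 1)   # grouped 3x3 + 1x1
# The grouped + point-wise pair uses far fewer weights than the plain 3x3 conv.
```

With these numbers the pair needs 139,264 weights versus 589,824 for the plain convolution, roughly a 4x reduction at this layer.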

3.
Sensors (Basel) ; 24(3)2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38339546

ABSTRACT

Recently, monocular 3D human pose estimation (HPE) methods have been used to accurately predict 3D pose by solving the ill-posed problem caused by 3D-2D projection. However, monocular 3D HPE remains challenging owing to the inherent depth ambiguity and occlusions. To address this issue, previous studies have proposed diffusion model-based approaches (DDPM) that learn to reconstruct a correct 3D pose from a noisy initial 3D pose. In addition, these approaches use 2D keypoints or context encoders that encode spatial and temporal information to inform the model. However, they often fall short of peak performance, or require an extended period to converge to the target pose. In this paper, we propose HDPose, which can converge rapidly and predict 3D poses accurately. Our approach aggregates spatial and temporal information from the condition into a denoising model in a hierarchical structure. We observed that the post-hierarchical structure achieved the best performance among various condition structures. Further, we evaluated our model on the widely used Human3.6M and MPI-INF-3DHP datasets. The proposed model demonstrated competitive performance with state-of-the-art models, achieving high accuracy with faster convergence while being considerably more lightweight.


Subject(s)
Algorithms , Imaging, Three-Dimensional , Humans , Imaging, Three-Dimensional/methods
4.
Sensors (Basel) ; 24(8)2024 Apr 14.
Article in English | MEDLINE | ID: mdl-38676133

ABSTRACT

Two-dimensional (2D) clinical gait analysis systems are more affordable and portable than contemporary three-dimensional (3D) clinical models. Using the Vicon 3D motion capture system as the standard, we evaluated the internal statistics of the Imasen and open-source OpenPose gait measurement systems, both designed for 2D input, to validate their output based on the similarity of results and the legitimacy of their inner statistical processes. We measured time factors, distance factors, and joint angles of the hip and knee joints in the sagittal plane while varying speeds and gaits during level walking in three in-person walking experiments under normal, maximum-speed, and tandem scenarios. The intraclass correlation coefficients of the 2D models were greater than 0.769 for all gait parameters compared with those of Vicon, except for some knee joint angles. The relative agreement was excellent for the time-distance gait parameters and moderate-to-excellent for each gait motion contraction range, except for hip joint angles. Cronbach's alpha coefficients were high (0.899-0.993) for the time-distance gait parameters but lower (0.298-0.971) for the joint angle parameters. Correlation coefficients were greater than 0.571 for time-distance gait parameters but lower for joint angle parameters, particularly hip joint angles. Our study elucidates areas in which to improve 2D models for their widespread clinical application.
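Cronbach's alpha, reported above as an internal-consistency measure between measurement systems, is straightforward to compute. This is a minimal sketch; the stride-time "items" below are fabricated for illustration, not data from the study.

```python
# Minimal Cronbach's alpha over a list of items (one score list per item).
def cronbach_alpha(items):
    """items: list of equally long score lists, one per item (e.g., per system)."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Hypothetical stride times (s) for four subjects from two systems:
system_a = [1.00, 1.10, 0.90, 1.20]
system_b = [1.02, 1.12, 0.88, 1.19]
alpha = cronbach_alpha([system_a, system_b])  # close to 1: high consistency
```

Values near 1 (like the 0.899-0.993 range above) indicate that the systems rank subjects almost identically; the lower joint-angle values indicate weaker agreement.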


Subject(s)
Algorithms , Gait Analysis , Gait , Hip Joint , Knee Joint , Walking , Humans , Gait Analysis/methods , Gait/physiology , Hip Joint/physiology , Knee Joint/physiology , Walking/physiology , Male , Biomechanical Phenomena/physiology , Adult , Range of Motion, Articular/physiology , Posture/physiology , Female
5.
Sensors (Basel) ; 24(12)2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38931606

ABSTRACT

Human pose estimation (HPE) is a technique used in computer vision and artificial intelligence to detect and track human body parts and poses using images or videos. Widely used in augmented reality, animation, fitness applications, and surveillance, HPE methods that employ monocular cameras are highly versatile and applicable to standard videos and CCTV footage. These methods have evolved from two-dimensional (2D) to three-dimensional (3D) pose estimation. However, in real-world environments, current 3D HPE methods trained on laboratory-based motion capture data encounter challenges, such as limited training data, depth ambiguity, left/right switching, and issues with occlusions. In this study, four 3D HPE methods were compared based on their strengths and weaknesses using real-world videos. Joint position correction techniques were proposed to eliminate and correct anomalies such as left/right inversion and false detections of joint positions in daily life motions. Joint angle trajectories were obtained for intuitive and informative human activity recognition using an optimization method based on a 3D humanoid simulator, with the joint position corrected by the proposed technique as the input. The efficacy of the proposed method was verified by applying it to three types of freehand gymnastic exercises and comparing the joint angle trajectories during motion.
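The left/right-inversion correction described above can be sketched with a simple temporal-consistency rule: if the left and right joints of a pair each sit closer to the *opposite* side's position in the previous frame, the detector has probably swapped them, so swap them back. This is only a sketch of the general idea, not the paper's exact method; the frame data are invented.

```python
import math

def fix_lr_switch(prev, curr, pairs):
    """prev/curr: dicts joint_name -> (x, y, z); pairs: [('l_knee', 'r_knee'), ...]."""
    fixed = dict(curr)
    for left, right in pairs:
        swapped_cost = math.dist(curr[left], prev[right]) + math.dist(curr[right], prev[left])
        kept_cost = math.dist(curr[left], prev[left]) + math.dist(curr[right], prev[right])
        if swapped_cost < kept_cost:  # labels fit the previous frame better when swapped
            fixed[left], fixed[right] = curr[right], curr[left]
    return fixed

# Hypothetical frames: the detector has mislabeled the knees in `curr`.
prev = {'l_knee': (0.20, 0.5, 0.0), 'r_knee': (-0.20, 0.5, 0.0)}
curr = {'l_knee': (-0.21, 0.5, 0.0), 'r_knee': (0.19, 0.5, 0.0)}
corrected = fix_lr_switch(prev, curr, [('l_knee', 'r_knee')])
```

A real pipeline would also gate this by a motion-plausibility threshold so that genuinely fast crossing movements are not "corrected" away.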


Subject(s)
Deep Learning , Joints , Posture , Humans , Posture/physiology , Joints/physiology , Imaging, Three-Dimensional/methods , Algorithms , Movement/physiology , Video Recording/methods
6.
Sensors (Basel) ; 24(6)2024 Mar 17.
Article in English | MEDLINE | ID: mdl-38544186

ABSTRACT

In biomechanics, movement is typically recorded by tracking the trajectories of anatomical landmarks previously marked using passive instrumentation, which entails several inconveniences. To overcome these disadvantages, researchers are exploring different markerless methods, such as pose estimation networks, to capture movement with equivalent accuracy to marker-based photogrammetry. However, pose estimation models usually only provide joint centers, which are incomplete data for calculating joint angles in all anatomical axes. Recently, marker augmentation models based on deep learning have emerged. These models transform pose estimation data into complete anatomical data. Building on this concept, this study presents three marker augmentation models of varying complexity that were compared to a photogrammetry system. The errors in anatomical landmark positions and the derived joint angles were calculated, and a statistical analysis of the errors was performed to identify the factors that most influence their magnitude. The proposed Transformer model improved upon the errors reported in the literature, yielding position errors of less than 1.5 cm for anatomical landmarks and 4.4 degrees for all seven movements evaluated. Anthropometric data did not influence the errors, while anatomical landmarks and movement influenced position errors, and model, rotation axis, and movement influenced joint angle errors.


Subject(s)
Deep Learning , Movement , Rotation , Biomechanical Phenomena , Photogrammetry
7.
Sensors (Basel) ; 24(18)2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39338702

ABSTRACT

Parkinson's disease (PD) is the second most common movement disorder in the world. It is characterized by motor and non-motor symptoms that have a profound impact on the independence and quality of life of people affected by the disease, which increases caregivers' burdens. The use of the quantitative gait data of people with PD and deep learning (DL) approaches based on gait are emerging as increasingly promising methods to support and aid clinical decision making, with the aim of providing a quantitative and objective diagnosis, as well as an additional tool for disease monitoring. This will allow for the early detection of the disease, assessment of progression, and implementation of therapeutic interventions. In this paper, the authors provide a systematic review of emerging DL techniques recently proposed for the analysis of PD by using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The Scopus, PubMed, and Web of Science databases were searched across an interval of six years (between 2018, when the first article was published, and 2023). A total of 25 articles were included in this review, which reports studies on the movement analysis of PD patients using both wearable and non-wearable sensors. Additionally, these studies employed DL networks for classification, diagnosis, and monitoring purposes. The authors demonstrate that there is a wide employment in the field of PD of convolutional neural networks for analyzing signals from wearable sensors and pose estimation networks for motion analysis from videos. In addition, the authors discuss current difficulties and highlight future solutions for PD monitoring and disease progression.


Subject(s)
Deep Learning , Gait , Parkinson Disease , Humans , Parkinson Disease/physiopathology , Parkinson Disease/diagnosis , Gait/physiology , Gait Analysis/methods , Wearable Electronic Devices , Quality of Life
8.
Sensors (Basel) ; 24(13)2024 Jul 08.
Article in English | MEDLINE | ID: mdl-39001202

ABSTRACT

Three-dimensional human pose estimation focuses on generating 3D pose sequences from 2D videos. It has enormous potential in the fields of human-robot interaction, remote sensing, virtual reality, and computer vision. Existing methods primarily explore spatial or temporal encoding to achieve 3D pose inference. However, these architectures exploit the independent effects of spatial and temporal cues on 3D pose estimation while neglecting their spatial-temporal synergistic influence. To address this issue, this paper proposes a novel 3D pose estimation method with a dual-adaptive spatial-temporal transformer (DASTFormer) and additional supervised training. The DASTFormer contains attention-adaptive (AtA) and pure-adaptive (PuA) modes, which enhance pose inference from 2D to 3D by adaptively learning spatial-temporal effects, considering both their cooperative and independent influences. In addition, additional supervised training with a batch variance loss is proposed in this work. Unlike the common training strategy, a two-round parameter update is conducted on the same batch data. Not only can it better explore the potential relationship between spatial-temporal encoding and 3D poses, but it can also alleviate the batch size limitations imposed by graphics cards on transformer-based frameworks. Extensive experimental results show that the proposed method significantly outperforms most state-of-the-art approaches on the Human3.6M and HumanEva datasets.
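The two-round update strategy above amounts to applying two consecutive parameter updates to the same batch instead of one. The toy loop below illustrates that schedule only; the quadratic loss and learning rate are stand-ins for demonstration, not the paper's batch variance loss.

```python
# Toy gradient descent comparing one vs. two parameter updates per batch.
def grad(w, batch):
    """Gradient of mean squared error of scalar w against batch targets."""
    return sum(2 * (w - t) for t in batch) / len(batch)

def train(batches, rounds_per_batch, lr=0.1):
    w = 0.0
    for batch in batches:
        for _ in range(rounds_per_batch):  # two-round: rounds_per_batch = 2
            w -= lr * grad(w, batch)
    return w

batches = [[3.0, 3.2], [2.8, 3.1]] * 20  # fabricated scalar targets
one_round = train(batches, rounds_per_batch=1)
two_round = train(batches, rounds_per_batch=2)
# Both settings converge toward the overall target mean (~3.0); the
# two-round variant simply extracts two updates from each batch.
```

The practical appeal is that when GPU memory caps the batch size, each batch can still contribute more optimization progress before being discarded.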


Subject(s)
Algorithms , Imaging, Three-Dimensional , Humans , Imaging, Three-Dimensional/methods , Posture/physiology , Robotics/methods
9.
Sensors (Basel) ; 24(11)2024 May 26.
Article in English | MEDLINE | ID: mdl-38894216

ABSTRACT

In this paper, we propose a novel, vision-transformer-based end-to-end pose estimation method, LidPose, for real-time human skeleton estimation in non-repetitive circular scanning (NRCS) lidar point clouds. Building on the ViTPose architecture, we introduce novel adaptations to address the unique properties of NRCS lidars, namely, the sparsity and unusual rosetta-like scanning pattern. The proposed method addresses a common issue of NRCS lidar-based perception, namely, the sparsity of the measurement, which needs balancing between the spatial and temporal resolution of the recorded data for efficient analysis of various phenomena. LidPose utilizes foreground and background segmentation techniques for the NRCS lidar sensor to select a region of interest (RoI), making LidPose a complete end-to-end approach to moving pedestrian detection and skeleton fitting from raw NRCS lidar measurement sequences captured by a static sensor for surveillance scenarios. To evaluate the method, we have created a novel, real-world, multi-modal dataset, containing camera images and lidar point clouds from a Livox Avia sensor, with annotated 2D and 3D human skeleton ground truth.

10.
Ergonomics ; 67(2): 240-256, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37264831

ABSTRACT

The aim is to develop a computer-based assessment model for novel dynamic postural evaluation using RULA. The present study proposed a camera-based, three-dimensional (3D) dynamic human pose estimation model using 'BlazePose' with a data set of 50,000 action-level-based images. The model was investigated using the Deep Neural Network (DNN) and Transfer Learning (TL) approach. The model has been trained to evaluate the posture with high accuracy, precision, and recall for each output prediction class. The model can quickly analyse the ergonomics of dynamic posture online and offline with a promising accuracy of 94.12%. A novel dynamic postural estimator using BlazePose and transfer learning is proposed and assessed for accuracy. The model is subjected to a constant muscle loading factor and foot support score and can evaluate one person with good image clarity at a time. Practitioner summary: A detailed investigation of dynamic work postures is largely missing in the literature. Experimental analysis has been performed using transfer learning, BlazePose, and RULA action levels. An overall accuracy of 94.12% is achieved for dynamic postural assessment.
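The link between a pose estimator's keypoints and a RULA-style score can be illustrated with a single joint: compute the elbow flexion angle from three 2D keypoints, then map it to a score bracket. The bracket below follows the published RULA lower-arm scoring (60-100 degrees scores 1, otherwise 2); the keypoint coordinates are invented, and a full RULA assessment covers many more body segments.

```python
import math

def joint_angle(a, b, c):
    """Angle at b (degrees) formed by segments b->a and b->c."""
    ang = math.degrees(math.atan2(c[1] - b[1], c[0] - b[0])
                       - math.atan2(a[1] - b[1], a[0] - b[0]))
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

def rula_lower_arm_score(flexion_deg):
    """RULA lower-arm bracket: 60-100 degrees of flexion scores 1, else 2."""
    return 1 if 60 <= flexion_deg <= 100 else 2

# Hypothetical 2D keypoints (e.g., normalized image coordinates):
shoulder, elbow, wrist = (0.0, 1.0), (0.0, 0.5), (0.4, 0.6)
angle = joint_angle(shoulder, elbow, wrist)
score = rula_lower_arm_score(angle)
```

Running such per-joint scoring on every frame of a video is what turns a static RULA worksheet into the dynamic, online assessment the abstract describes.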


Subject(s)
Neural Networks, Computer , Posture , Humans , Posture/physiology , Learning , Ergonomics/methods , Machine Learning
11.
J Sports Sci Med ; 23(1): 515-525, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39228769

ABSTRACT

OpenPose-based motion analysis (OpenPose-MA), utilizing deep learning methods, has emerged as a compelling technique for estimating human motion. It addresses the drawbacks associated with conventional three-dimensional motion analysis (3D-MA) and human visual detection-based motion analysis (Human-MA), including costly equipment, time-consuming analysis, and restricted experimental settings. This study aims to assess the precision of OpenPose-MA in comparison to Human-MA, using 3D-MA as the reference standard. The study involved a cohort of 21 young and healthy adults. OpenPose-MA employed the OpenPose algorithm, a deep learning-based open-source two-dimensional (2D) pose estimation method. Human-MA was conducted by a skilled physiotherapist. The knee valgus angle during a drop vertical jump (DVJ) task was computed by OpenPose-MA and Human-MA using the same frontal-plane video image, with 3D-MA serving as the reference standard. Various metrics were utilized to assess the reproducibility, accuracy, and similarity of the knee valgus angle between the different methods, including the intraclass correlation coefficient (ICC) (1, 3), mean absolute error (MAE), coefficient of multiple correlation (CMC) for waveform pattern similarity, and Pearson's correlation coefficients (OpenPose-MA vs. 3D-MA, Human-MA vs. 3D-MA). Unpaired t-tests were conducted to compare MAEs and CMCs between OpenPose-MA and Human-MA. The ICCs (1,3) for OpenPose-MA, Human-MA, and 3D-MA demonstrated excellent reproducibility in the DVJ trial. No significant difference between OpenPose-MA and Human-MA was observed in terms of the MAEs (OpenPose: 2.4° [95%CI: 1.9-3.0°], Human: 3.2° [95%CI: 2.1-4.4°]) or CMCs (OpenPose: 0.83 [range: 0.53-0.99], Human: 0.87 [range: 0.24-0.98]) of knee valgus angles. The Pearson's correlation coefficients of OpenPose-MA and Human-MA relative to that of 3D-MA were 0.97 and 0.98, respectively. 
This study demonstrated that OpenPose-MA achieved satisfactory reproducibility, accuracy and exhibited waveform similarity comparable to 3D-MA, similar to Human-MA. Both OpenPose-MA and Human-MA showed a strong correlation with 3D-MA in terms of knee valgus angle excursion.


Subject(s)
Deep Learning , Humans , Reproducibility of Results , Young Adult , Male , Female , Biomechanical Phenomena , Knee Joint/physiology , Video Recording , Adult , Time and Motion Studies , Algorithms , Exercise Test/methods , Plyometric Exercise , Range of Motion, Articular/physiology , Imaging, Three-Dimensional
12.
Sensors (Basel) ; 23(21)2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37960561

ABSTRACT

Physical rehabilitation plays a crucial role in restoring motor function following injuries or surgeries. However, the challenge of overcrowded waiting lists often hampers doctors' ability to monitor patients' recovery progress in person. Deep Learning methods offer a solution by enabling doctors to optimize their time with each patient and distinguish between those requiring specific attention and those making positive progress. Doctors use the flexion angle of limbs as a cue to assess a patient's mobility level during rehabilitation. From a Computer Vision perspective, this task can be framed as automatically estimating the pose of the target body limbs in an image. The objectives of this study can be summarized as follows: (i) evaluating and comparing multiple pose estimation methods; (ii) analyzing how the subject's position and camera viewpoint impact the estimation; and (iii) determining whether 3D estimation methods are necessary or if 2D estimation suffices for this purpose. To conduct this technical study, and due to the limited availability of public datasets related to physical rehabilitation exercises, we introduced a new dataset featuring 27 individuals performing eight diverse physical rehabilitation exercises focusing on various limbs and body positions. Each exercise was recorded using five RGB cameras capturing different viewpoints of the person. An infrared tracking system named OptiTrack was utilized to establish the ground truth positions of the joints in the limbs under study. The results, supported by statistical tests, show that not all state-of-the-art pose estimators perform equally in the presented situations (e.g., patient lying on the stretcher vs. standing). Statistical differences exist between camera viewpoints, with the frontal view being the most convenient. Additionally, the study concludes that 2D pose estimators are adequate for estimating joint angles given the selected camera viewpoints.


Subject(s)
Exercise Therapy , Posture , Humans , Exercise , Exercise Therapy/methods , Extremities , Standing Position
13.
Sensors (Basel) ; 23(17)2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37687768

ABSTRACT

Human pose estimation is an important Computer Vision problem, whose goal is to estimate the human body through joints. Currently, methods that employ deep learning techniques excel in the task of 2D human pose estimation. However, the use of 3D poses can bring more accurate and robust results. Since 3D pose labels can only be acquired in restricted scenarios, fully convolutional methods tend to perform poorly on the task. One strategy to solve this problem is to use 2D pose estimators, to estimate 3D poses in two steps using 2D pose inputs. Due to database acquisition constraints, the performance improvement of this strategy can only be observed in controlled environments, therefore domain adaptation techniques can be used to increase the generalization capability of the system by inserting information from synthetic domains. In this work, we propose a novel method called Domain Unified approach, aimed at solving pose misalignment problems on a cross-dataset scenario, through a combination of three modules on top of the pose estimator: pose converter, uncertainty estimator, and domain classifier. Our method led to a 44.1mm (29.24%) error reduction, when training with the SURREAL synthetic dataset and evaluating with Human3.6M over a no-adaption scenario, achieving state-of-the-art performance.


Subject(s)
Acclimatization , Environment, Controlled , Humans , Databases, Factual , Uncertainty
14.
Sensors (Basel) ; 23(17)2023 Sep 03.
Article in English | MEDLINE | ID: mdl-37688082

ABSTRACT

Human pose estimation is the basis of many downstream tasks, such as motor intervention, behavior understanding, and human-computer interaction. Existing human pose estimation methods rely too much on the similarity of keypoints at the image feature level, which leaves them vulnerable to three problems: object occlusion, keypoint ghosting, and neighbor-pose interference. We propose a dual-space-driven topology model for the human pose estimation task. Firstly, the model extracts relatively accurate keypoint features through a Transformer-based feature extraction method. Then, the correlation of keypoints in physical space is introduced to alleviate the erroneous localization caused by excessive dependence on the model's feature-level representation. Finally, through a graph convolutional neural network, the spatial correlation of keypoints and the feature correlation are effectively fused to obtain more accurate human pose estimation results. The experimental results on real datasets further verify the effectiveness of our proposed model.


Subject(s)
Electric Power Supplies , Neural Networks, Computer , Humans
15.
Sensors (Basel) ; 23(9)2023 Apr 30.
Article in English | MEDLINE | ID: mdl-37177628

ABSTRACT

Hybrid models which combine the convolution and transformer model achieve impressive performance on human pose estimation. However, the existing hybrid models for human pose estimation, which typically stack self-attention modules after convolution, are prone to mutual conflict. This conflict forces one type of module to dominate in such hybrid sequential models. Consequently, the performance of higher-precision keypoint localization is not consistent with overall performance. To alleviate this mutual conflict, we developed a hybrid parallel network that parallelizes the self-attention modules and the convolution modules, which helps leverage their complementary capabilities effectively. The parallel network ensures that the self-attention branch models the long-range dependency to enhance the semantic representation, whereas the local sensitivity of the convolution branch contributes to high-precision localization simultaneously. To further mitigate the conflict, we proposed a cross-branch attention module to gate the features generated by both branches along the channel dimension. The hybrid parallel network achieves 75.6% and 75.4% AP on the COCO validation and test-dev sets and achieves consistent performance on both higher-precision localization and overall performance. The experiments show that our hybrid parallel network is on par with the state-of-the-art human pose estimation models.


Subject(s)
Electric Power Supplies , Semantics , Humans
16.
Sensors (Basel) ; 23(6)2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36991904

ABSTRACT

Axial postural abnormalities (aPA) are common features of Parkinson's disease (PD) and manifest in over 20% of patients during the course of the disease. aPA form a spectrum of functional trunk misalignment, ranging from a typical Parkinsonian stooped posture to progressively greater degrees of spine deviation. Current research has not yet led to a sufficient understanding of the pathophysiology and management of aPA in PD, partly due to a lack of agreement on validated, user-friendly, automatic tools for measuring and analysing the differences in the degree of aPA, according to patients' therapeutic conditions and tasks. In this context, human pose estimation (HPE) software based on deep learning could be a valid support as it automatically extrapolates spatial coordinates of the human skeleton keypoints from images or videos. Nevertheless, standard HPE platforms have two limitations that prevent their adoption in such clinical practice. First, standard HPE keypoints are inconsistent with the keypoints needed to assess aPA (degrees and fulcrum). Second, aPA assessment either requires advanced RGB-D sensors or, when based on the processing of RGB images, is most likely sensitive to the adopted camera and to the scene (e.g., sensor-subject distance, lighting, background-subject clothing contrast). This article presents a software that augments the human skeleton extrapolated by state-of-the-art HPE software from RGB pictures with exact bone points for posture evaluation through computer vision post-processing primitives. This article shows the software's robustness and accuracy on the processing of 76 RGB images with different resolutions and sensor-subject distances from 55 PD patients with different degrees of anterior and lateral trunk flexion.


Subject(s)
Parkinson Disease , Humans , Parkinson Disease/diagnosis , Posture/physiology , Software , Videotape Recording , Bone and Bones , Postural Balance/physiology
17.
Sensors (Basel) ; 24(1)2023 Dec 25.
Article in English | MEDLINE | ID: mdl-38202972

ABSTRACT

In the recent era, 2D human pose estimation (HPE) has become an integral part of advanced computer vision (CV) applications, particularly in understanding human behaviors. Despite challenges such as occlusion, unfavorable lighting, and motion blur, advancements in deep learning have significantly enhanced the performance of 2D HPE by enabling automatic feature learning from data and improving model generalization. Given the crucial role of 2D HPE in accurately identifying and classifying human body joints, optimization is imperative. In response, we introduce the Spatially Oriented Attention-Infused Structured-Feature-enabled PoseResNet (SOCA-PRNet) for enhanced 2D HPE. This model incorporates a novel element, Spatially Oriented Attention (SOCA), designed to enhance accuracy without significantly increasing the parameter count. Leveraging the strength of ResNet34 and integrating Global Context Blocks (GCBs), SOCA-PRNet precisely captures detailed human poses. Empirical evaluations demonstrate that our model outperforms existing state-of-the-art approaches, achieving a Percentage of Correct Keypoints at a 50% threshold (PCKh@0.5) of 90.877 and a Mean Precision (Mean@0.1) score of 41.137. These results underscore the potential of SOCA-PRNet in real-world applications such as robotics, gaming, and human-computer interaction, where precise and efficient 2D HPE is paramount.
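The PCKh@0.5 metric reported above counts a predicted keypoint as correct when it lies within 0.5 times the head segment length of its ground-truth position. A minimal sketch of the computation, with fabricated poses:

```python
import math

def pckh(pred, gt, head_len, thresh=0.5):
    """Fraction of predicted keypoints within thresh * head_len of ground truth."""
    ok = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= thresh * head_len)
    return ok / len(gt)

# Hypothetical 2D keypoints (pixels); the second prediction is far off.
gt = [(100, 200), (120, 240), (140, 300)]
pred = [(103, 198), (150, 240), (141, 302)]
head_len = 20.0
score = pckh(pred, gt, head_len)  # 2 of 3 joints fall within the 10 px radius
```

Normalizing by head size (rather than a fixed pixel radius) is what makes the metric comparable across subjects at different distances from the camera.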


Subject(s)
Lighting , Robotics , Humans , Motion
18.
Sensors (Basel) ; 23(15)2023 Aug 07.
Article in English | MEDLINE | ID: mdl-37571779

ABSTRACT

As the use of construction robots continues to increase, ensuring safety and productivity while working alongside human workers becomes crucial. To prevent collisions, robots must recognize human behavior in close proximity. However, single RGB or RGB-depth cameras have limitations, such as detection failure, sensor malfunction, occlusions, unconstrained lighting, and motion blur. Therefore, this study proposes a multiple-camera approach for human activity recognition during human-robot collaborative activities in construction. The proposed approach employs a particle filter to estimate the 3D human pose by fusing 2D joint locations extracted from multiple cameras, and applies a long short-term memory (LSTM) network to recognize ten activities associated with human and robot collaboration tasks in construction. The study compared the performance of human activity recognition models using one, two, three, and four cameras. Results showed that using multiple cameras enhances recognition performance, providing a more accurate and reliable means of identifying and differentiating between various activities. The results of this study are expected to contribute to the advancement of human activity recognition and its utilization in human-robot collaboration in construction.


Subject(s)
Robotics , Humans , Robotics/methods , Motion , Lighting
19.
Sensors (Basel) ; 23(19)2023 Oct 01.
Article in English | MEDLINE | ID: mdl-37837046

ABSTRACT

Due to the growing interest in climbing, increasing importance has been given to research in the field of non-invasive, camera-based motion analysis. While existing work uses invasive technologies such as wearables or modified walls and holds, or focuses on competitive sports, we present, for the first time, a system that uses video analysis to automatically recognize six movement errors that are typical for novices with limited climbing experience. Climbing a complete route consists of three repetitive climbing phases. A characteristic joint arrangement may therefore be detected as an error in one climbing phase, while the same arrangement may not be considered an error in another. For this reason, we introduce a finite state machine that determines the current phase and checks for the errors that commonly occur in it. The transition between phases depends on which joints are being used. To capture joint movements, we use a fourth-generation iPad Pro with LiDAR to record climbing sequences, converting the climber's 2D skeleton provided by Apple's Vision framework into 3D joints using the LiDAR depth information. We then introduce a method that determines whether a joint is moving, which in turn determines the current phase. Finally, the 3D joints are analyzed against defined characteristic joint arrangements to identify possible motion errors. To present feedback to the climber, we imitate a virtual mentor through an iPad application that generates an analysis immediately after the climber finishes the route, pointing out the detected errors and giving suggestions for improvement. Quantitative tests with three experienced climbers, who were able to climb reference routes both without errors and with intentional errors, produced precision-recall curves evaluating the error detection performance.
The results demonstrate that while the number of false positives is still in an acceptable range, the number of detected errors is sufficient to provide climbing novices with adequate suggestions for improvement. Moreover, our study reveals limitations that mainly originate from incorrect joint localizations caused by the LiDAR sensor range. With human pose estimation becoming increasingly reliable and with the advance of sensor capabilities, these limitations will have a decreasing impact on our system performance.
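The phase-tracking idea described above can be sketched as a small finite state machine whose transitions are driven by joint-movement events. The phase names and the event-to-transition rules below are illustrative assumptions; the paper defines its own three climbing phases and joint criteria.

```python
# Hypothetical phase names and transition events, for illustration only.
TRANSITIONS = {
    ("reach", "hand_stopped"): "pull",
    ("pull", "hip_raised"): "stabilize",
    ("stabilize", "hand_moving"): "reach",
}

class ClimbPhaseFSM:
    """Track the current climbing phase from joint-movement events."""

    def __init__(self, start="reach"):
        self.phase = start

    def step(self, event):
        """Advance on a recognized event; unknown events keep the phase."""
        self.phase = TRANSITIONS.get((self.phase, event), self.phase)
        return self.phase
```

Keeping the error checks per phase then reduces to a lookup keyed on `fsm.phase`, which is exactly why a characteristic joint arrangement can count as an error in one phase but not in another.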

20.
Sensors (Basel) ; 23(6)2023 Mar 12.
Article in English | MEDLINE | ID: mdl-36991768

ABSTRACT

The accurate estimation of a 3D human pose is of great importance in many fields, such as human-computer interaction, motion recognition, and autonomous driving. Given the difficulty of obtaining 3D ground-truth labels for 3D pose estimation datasets, we take 2D images as the research object and propose a self-supervised 3D pose estimation model called Pose ResNet. ResNet50 is used as the backbone network to extract features. First, a convolutional block attention module (CBAM) is introduced to refine the selection of significant pixels. Then, a waterfall atrous spatial pooling (WASP) module is used to capture multi-scale contextual information from the extracted features and to increase the receptive field. Finally, the features are fed into a deconvolution network to obtain the volumetric heat map, which is then processed by a soft argmax function to obtain the joint coordinates. In addition to the two learning strategies of transfer learning and synthetic occlusion, a self-supervised training method is used, in which 3D labels constructed by the epipolar geometry transformation supervise the training of the network. Without 3D ground truths for the dataset, an accurate estimate of the 3D human pose can be obtained from a single 2D image. The results show a mean per joint position error (MPJPE) of 74.6 mm without 3D ground-truth labels. Compared with other approaches, the proposed method achieves better results.
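The soft-argmax step mentioned above can be sketched in one dimension: a softmax over the heat map yields weights, and the expected index under those weights is a differentiable substitute for argmax. The `beta` temperature below is an assumed hyperparameter for illustration; the paper applies the operation per axis to a volumetric heat map.

```python
import math

def soft_argmax(heatmap, beta=100.0):
    """Differentiable argmax: softmax-weighted expected coordinate.

    A minimal 1-D sketch of soft-argmax; large `beta` sharpens the
    softmax so the result approaches the hard argmax while remaining
    differentiable for end-to-end training.
    """
    m = max(h * beta for h in heatmap)                 # stabilize exp()
    exps = [math.exp(h * beta - m) for h in heatmap]
    total = sum(exps)
    # Expected index under the softmax distribution.
    return sum(i * e / total for i, e in enumerate(exps))
```

For a volumetric heat map, the same reduction is applied independently along the x, y, and z axes to recover continuous joint coordinates.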


Subject(s)
Automobile Driving , Self-Management , Humans , Hot Temperature , Learning , Motion