Results 1 - 20 of 18,141
1.
Sensors (Basel) ; 24(13)2024 Jun 25.
Article in English | MEDLINE | ID: mdl-39000889

ABSTRACT

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora whose emotional factors differ from those in the training databases. Regularization approaches and metric losses have been studied to construct SER models that are robust to unseen corpora. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that assigns higher gradients to the samples in a minibatch whose emotion labels are more difficult to estimate. Because annotators may label an emotion based on expression that resides in the conversational context or in another modality but is not apparent in the speech utterance itself, some emotion labels may be unreliable, and such unreliable labels can affect the proposed loss function especially severely. We therefore apply label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that adopting the proposed loss function together with label smoothing on the misclassified data improved SER performance on unseen corpora.
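The abstract gives no implementation details; the PyTorch sketch below only illustrates the general idea of combining a difficulty-sensitive sample weighting with selective label smoothing. The function name, the focal-style difficulty proxy, and all hyperparameters are assumptions, not the authors' Proxy-Anchor-based formulation.

```python
import torch
import torch.nn.functional as F

def difficulty_weighted_ce(logits, targets, smooth_mask, eps=0.1, gamma=2.0):
    """Cross-entropy that up-weights hard samples in the minibatch and applies
    label smoothing only where smooth_mask is True (samples a pre-trained SER
    model misclassified). logits: (B, C); targets: (B,); smooth_mask: (B,) bool."""
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)

    # Smoothed one-hot targets for unreliable (misclassified) samples only.
    one_hot = F.one_hot(targets, num_classes).float()
    smoothed = one_hot * (1.0 - eps) + eps / num_classes
    soft_targets = torch.where(smooth_mask.unsqueeze(1), smoothed, one_hot)

    per_sample = -(soft_targets * log_probs).sum(dim=1)           # (B,)
    # Difficulty proxy: low predicted probability of the true class => hard sample.
    p_true = log_probs.gather(1, targets.unsqueeze(1)).exp().squeeze(1)
    weights = (1.0 - p_true).detach() ** gamma                    # focal-style weight
    return (weights * per_sample).mean()
```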


Subjects
Emotions , Speech , Humans , Emotions/physiology , Speech/physiology , Algorithms , Reproducibility of Results , Pattern Recognition, Automated/methods , Databases, Factual
2.
Sensors (Basel) ; 24(13)2024 Jun 26.
Article in English | MEDLINE | ID: mdl-39000930

ABSTRACT

Convolutional neural networks (CNNs) have made significant progress in facial expression recognition (FER). However, occlusion, lighting variations, and changes in head pose keep FER in real-world environments highly challenging. Moreover, methods based solely on CNNs rely heavily on local spatial features, lack global information, and struggle to balance computational complexity against recognition accuracy, so CNN-based models still fall short for FER. To address these issues, we propose a lightweight facial expression recognition method based on a hybrid vision transformer. The method captures multi-scale facial features through an improved attention module, achieving richer feature integration, sharpening the network's perception of key facial expression regions, and improving feature extraction. To further enhance performance, we design a patch dropping (PD) module that emulates the attention allocation of the human visual system for local features, guiding the network to focus on the most discriminative features, reducing the influence of irrelevant ones, and directly lowering computational cost. Extensive experiments demonstrate that our approach significantly outperforms other methods, achieving an accuracy of 86.51% on RAF-DB and nearly 70% on FER2013, with a model size of only 3.64 MB. These results offer a new perspective for the field of facial expression recognition.
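A sketch of the patch-dropping idea: rank patch tokens by an importance score and keep only the top fraction. The scoring source (e.g., CLS-to-patch attention) and the keep ratio are assumptions; the paper's PD module is not specified in the abstract.

```python
import torch

def drop_patches(tokens, scores, keep_ratio=0.7):
    """Keep only the highest-scoring patch tokens, emulating attention-guided
    patch dropping. tokens: (B, N, D); scores: (B, N) per-patch importance."""
    B, N, D = tokens.shape
    k = max(1, int(N * keep_ratio))
    idx = scores.topk(k, dim=1).indices                 # (B, k) best patches
    idx = idx.unsqueeze(-1).expand(-1, -1, D)           # (B, k, D)
    return tokens.gather(1, idx)

# Example: drop 30% of 49 patch tokens using random scores.
tokens = torch.randn(2, 49, 128)
scores = torch.rand(2, 49)
print(drop_patches(tokens, scores).shape)  # torch.Size([2, 34, 128])
```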


Subjects
Facial Expression , Neural Networks, Computer , Humans , Automated Facial Recognition/methods , Algorithms , Image Processing, Computer-Assisted/methods , Face , Pattern Recognition, Automated/methods
3.
Sensors (Basel) ; 24(13)2024 Jun 28.
Article in English | MEDLINE | ID: mdl-39000981

ABSTRACT

This work presents a novel approach to elbow gesture recognition using an array of inductive sensors and a machine learning algorithm (MLA). The paper describes the design of the inductive sensor array, which is integrated into a flexible, wearable sleeve. The array consists of coils sewn onto the sleeve that form LC tank circuits together with externally connected inductors and capacitors. Changes in elbow position modulate the inductance of these coils, allowing the array to capture a range of elbow movements. The signal processing chain and the random forest MLA used to recognize 10 different elbow gestures are described. Evaluation on 8 subjects, with data augmentation that expanded the dataset to 1270 trials per gesture, enabled the system to achieve accuracies of 98.3% and 98.5% under 5-fold and leave-one-subject-out cross-validation, respectively. Test performance was then assessed on data collected from five new subjects; the high classification accuracy of 94% demonstrates the generalizability of the system. The proposed solution addresses the limitations of existing elbow gesture recognition designs and offers a practical, effective approach to intuitive human-machine interaction.
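Leave-one-subject-out evaluation of a random forest, as described, maps directly onto scikit-learn; a minimal sketch with synthetic stand-in data (the shapes and feature count are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Synthetic stand-in: 8 subjects x 10 gestures x 20 trials, 12 sensor features.
rng = np.random.default_rng(0)
X = rng.normal(size=(8 * 10 * 20, 12))
y = np.tile(np.repeat(np.arange(10), 20), 8)        # gesture labels
groups = np.repeat(np.arange(8), 10 * 20)           # subject IDs

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=groups)
print(f"LOSO accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```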


Subjects
Algorithms , Elbow , Gestures , Machine Learning , Humans , Elbow/physiology , Wearable Electronic Devices , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Male , Adult , Female
4.
Int J Neural Syst ; 34(9): 2450049, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39010725

ABSTRACT

Abnormal behavior recognition detects and identifies activities or events that deviate from normal behavior patterns, with wide applications in fields such as network security, financial fraud detection, and video surveillance. In recent years, deep convolutional networks (ConvNets) have been widely applied to abnormal behavior recognition and have achieved significant results. However, existing algorithms mainly focus on improving accuracy and have not explored real-time recognition, which is crucial for quickly identifying abnormal behavior in public places and improving urban public safety. This paper therefore proposes an abnormal behavior recognition algorithm based on three-dimensional (3D) dense connections. The algorithm uses a multi-instance learning strategy to classify various types of abnormal behavior, and employs dense connection modules and soft-threshold attention mechanisms to reduce the model's parameter count and improve computational efficiency. Finally, attention allocation reduces redundant information in the sequence, mitigating its negative impact on recognition. Experiments show that the method achieves a recognition accuracy of 95.61% on the UCF-Crime dataset, and comparative experiments demonstrate strong performance in both recognition accuracy and speed.
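The paper's soft-threshold attention is not specified in the abstract; the sketch below shows one common form of channel-wise soft thresholding (in the style of deep residual shrinkage networks), with all names, shapes, and the gating design assumed:

```python
import torch
import torch.nn as nn

class SoftThresholdAttention(nn.Module):
    """Channel-wise soft thresholding: a small gating branch predicts a
    per-channel threshold tau from feature magnitudes, then
    sign(x) * relu(|x| - tau) shrinks low-magnitude (noisy) responses to zero."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, T, H, W) 3D features
        mag = x.abs().mean(dim=(2, 3, 4))      # (B, C) average magnitude
        tau = (mag * self.gate(mag)).view(x.size(0), x.size(1), 1, 1, 1)
        return torch.sign(x) * torch.relu(x.abs() - tau)
```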


Subjects
Neural Networks, Computer , Humans , Pattern Recognition, Automated/methods , Deep Learning , Algorithms , Crime , Behavior/physiology
5.
Sci Rep ; 14(1): 15310, 2024 07 03.
Article in English | MEDLINE | ID: mdl-38961136

ABSTRACT

Human activity recognition has a wide range of applications, such as video surveillance, virtual reality, and intelligent human-computer interaction, and has emerged as a significant research area in computer vision. Graph convolutional networks (GCNs) have recently been widely used in these fields and perform well, but challenges remain, including the over-smoothing caused by stacked graph convolutions and insufficient semantic correlation for capturing large movements across time sequences. The Vision Transformer (ViT), meanwhile, has achieved impressive results in many 2D and 3D imaging tasks. In this work, we propose a novel human activity recognition method based on ViT (HAR-ViT). We integrate the enhanced AGCL (eAGCL) from 2s-AGCN into ViT so that it can process spatio-temporal data (3D skeletons) and make full use of spatial features. A position encoder module orders the otherwise unordered information, while the transformer encoder efficiently compresses sequence features to speed up computation. Activities are classified with a multi-layer perceptron (MLP) head. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on three widely used datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics-Skeleton 400.
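A minimal sketch of the transformer stage (learned positional embedding, encoder, MLP head) operating on per-frame skeleton features; the eAGCL front end is omitted and all dimensions are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class SkeletonTransformer(nn.Module):
    """ViT-style action classifier over per-frame skeleton features: learned
    positional embeddings order the frames, a transformer encoder compresses
    the sequence, and an MLP head classifies the mean token."""
    def __init__(self, feat_dim=256, num_frames=64, num_classes=60):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, num_frames, feat_dim))
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Sequential(nn.LayerNorm(feat_dim),
                                  nn.Linear(feat_dim, num_classes))

    def forward(self, x):          # x: (B, T, feat_dim) graph-conv frame features
        z = self.encoder(x + self.pos)
        return self.head(z.mean(dim=1))

logits = SkeletonTransformer()(torch.randn(2, 64, 256))
print(logits.shape)  # torch.Size([2, 60])
```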


Subjects
Human Activities , Humans , Neural Networks, Computer , Algorithms , Pattern Recognition, Automated/methods , Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional/methods
6.
Sci Rep ; 14(1): 17155, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39060307

ABSTRACT

Gait recognition has become an increasingly promising area of research in the search for noninvasive and effective methods of person identification, with potential applications in security systems and medical diagnosis. However, precisely recognizing and assessing gait patterns is difficult, particularly under changing conditions or from multiple viewpoints. In this study, we use the widely adopted CASIA-B dataset to evaluate our proposed gait recognition model, aiming to address some of the existing limitations in this field. Fifty individuals are randomly selected from the dataset, and the resulting data are split evenly between training and testing. We first extract features from gait images using two well-known deep networks, MobileNetV1 and Xception, then concatenate these features and reduce their dimensionality via principal component analysis (PCA) to improve performance. We assess the model with two distinct classifiers: a random forest and a one-against-all support vector machine (OaA-SVM). The findings indicate that the OaA-SVM classifier performs best, with a mean accuracy of 98.77% over eleven viewing angles. This study contributes to the development of effective gait recognition algorithms that can be applied to strengthen security and promote well-being.
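The classification pipeline (feature concatenation, PCA, one-against-all SVM) maps directly onto scikit-learn; a sketch with random stand-ins for the CNN embeddings (the embedding dimensions and PCA size are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-ins for per-image CNN embeddings of 50 subjects; real features would
# come from pretrained MobileNetV1 (1024-D) and Xception (2048-D) backbones.
rng = np.random.default_rng(0)
mobilenet_feats = rng.normal(size=(500, 1024))
xception_feats = rng.normal(size=(500, 2048))
y = rng.integers(0, 50, size=500)

X = np.hstack([mobilenet_feats, xception_feats])    # feature-level fusion
model = make_pipeline(StandardScaler(),
                      PCA(n_components=128),
                      OneVsRestClassifier(SVC(kernel="rbf")))
model.fit(X, y)
print(model.predict(X[:5]))
```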


Subjects
Gait , Principal Component Analysis , Support Vector Machine , Humans , Gait/physiology , Algorithms , Deep Learning , Female , Male , Pattern Recognition, Automated/methods , Adult
7.
Sensors (Basel) ; 24(14)2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39065968

ABSTRACT

Human action recognition based on optical and infrared video is strongly affected by the environment, and feature extraction in traditional machine learning classifiers is complex. This paper therefore proposes a human action recognition method for Frequency Modulated Continuous Wave (FMCW) radar based on an asymmetric convolutional residual network. First, the radar echo data are analyzed and processed to extract micro-Doppler time-domain spectrograms of different actions. Second, a strategy combining asymmetric convolution and the Mish activation function is adopted in the residual blocks of a ResNet18 network to address the limitations of the blocks' linear and nonlinear transformations for micro-Doppler spectrum recognition and to enhance feature learning. Finally, an Improved Convolutional Block Attention Module (ICBAM) is integrated into the residual blocks to sharpen the model's attention to, and comprehension of, the input data. Experiments show that the proposed method achieves an accuracy of 98.28% for action recognition and classification in complex scenes, surpassing classic deep learning approaches. The method also significantly improves recognition of actions with similar micro-Doppler signatures and exhibits excellent noise robustness.
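A sketch of a residual block combining asymmetric convolutions with Mish, in the spirit of the description above; the exact branch arrangement is an assumption, and the ICBAM module is omitted:

```python
import torch
import torch.nn as nn

class AsymResBlock(nn.Module):
    """ResNet-style block where the 3x3 convolution is paired with parallel
    3x1 and 1x3 branches (summed), and ReLU is replaced by Mish."""
    def __init__(self, channels):
        super().__init__()
        self.square = nn.Conv2d(channels, channels, 3, padding=1)
        self.vert = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.horiz = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.Mish()

    def forward(self, x):  # x: (B, C, H, W) micro-Doppler spectrogram features
        out = self.square(x) + self.vert(x) + self.horiz(x)
        return self.act(x + self.bn(out))    # residual connection

y = AsymResBlock(16)(torch.randn(1, 16, 64, 64))
print(y.shape)  # torch.Size([1, 16, 64, 64])
```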


Subjects
Neural Networks, Computer , Radar , Humans , Algorithms , Machine Learning , Human Activities/classification , Deep Learning , Pattern Recognition, Automated/methods
8.
Sensors (Basel) ; 24(14)2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39065979

ABSTRACT

By leveraging artificial intelligence and big data to analyze and assess classroom conditions, teaching quality can be significantly enhanced. However, many existing studies concentrate on evaluating classroom conditions for groups of students, neglecting personalized instructional support for individuals. To provide a more focused analysis of individual students in the classroom, we implemented an embedded application that combines face recognition with target detection. The InsightFace face recognition algorithm identifies students using a purpose-built classroom face dataset, while classroom behavioral data are collected to train a YOLOv5 model that detects students' body regions; body regions are then correlated with facial regions to identify each student. These models were deployed on an embedded device, the Atlas 200 DK, enabling the recording of both overall classroom conditions and individual student behaviors. Test results show a detection precision above 0.67 for all behavior types and an average false detection rate for face recognition of 41.5%. The embedded application can detect student behavior in a classroom setting, identify students, and capture image sequences of body regions associated with negative behavior for better management. These data help teachers gain a deeper understanding of their students, which is crucial for improving teaching quality and addressing individual needs.
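As a toy illustration of correlating body detections with face detections, one simple rule assigns each face to the body box that best contains it. The matching rule and the 0.5 containment threshold are assumptions, not the paper's method:

```python
def associate_faces_to_bodies(face_boxes, body_boxes):
    """Assign each recognized face to the detected body box that best contains
    it. Boxes are (x1, y1, x2, y2); returns {face_index: body_index}."""
    def containment(face, body):
        ix1, iy1 = max(face[0], body[0]), max(face[1], body[1])
        ix2, iy2 = min(face[2], body[2]), min(face[3], body[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        face_area = (face[2] - face[0]) * (face[3] - face[1])
        return inter / face_area if face_area > 0 else 0.0

    matches = {}
    for i, face in enumerate(face_boxes):
        scores = [containment(face, body) for body in body_boxes]
        best = max(range(len(body_boxes)), key=lambda j: scores[j])
        if scores[best] > 0.5:          # face mostly inside the body box
            matches[i] = best
    return matches

print(associate_faces_to_bodies([(40, 20, 60, 45)], [(30, 10, 80, 120)]))  # {0: 0}
```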


Subjects
Algorithms , Humans , Students , Artificial Intelligence , Face/physiology , Facial Recognition/physiology , Automated Facial Recognition/methods , Image Processing, Computer-Assisted/methods , Female , Pattern Recognition, Automated/methods
9.
Sensors (Basel) ; 24(14)2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39066043

ABSTRACT

Human activity recognition (HAR) is pivotal to applications ranging from healthcare monitoring to interactive gaming. Traditional HAR systems, which rely primarily on a single data source, cannot capture the full spectrum of human activities. This study introduces a comprehensive approach to HAR that integrates two complementary modalities: RGB imaging and advanced pose-estimation features. Our methodology leverages the strengths of each modality to overcome the drawbacks of unimodal systems, providing a richer and more accurate representation of activities. We propose a two-stream network that processes skeletal and RGB data in parallel, enhanced by pose-estimation techniques for refined feature extraction, and integrated through fusion algorithms that significantly improve recognition accuracy. Extensive experiments on the UTD multimodal human action dataset (UTD-MHAD) show that the proposed approach exceeds the performance of existing state-of-the-art algorithms. The study sets a new benchmark for HAR systems, highlights the importance of feature engineering in capturing the complexity of human movement, and paves the way for more sophisticated, reliable HAR systems in real-world scenarios.
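The abstract does not specify its fusion algorithm; as an illustration of decision-level fusion for a two-stream model, here is a softmax-averaging sketch (the weighting scheme and values are assumptions):

```python
import numpy as np

def late_fuse(rgb_logits, pose_logits, w_rgb=0.5):
    """Decision-level fusion of a two-stream model: softmax each stream's
    logits, then take a weighted average of the class probabilities."""
    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    probs = w_rgb * softmax(rgb_logits) + (1 - w_rgb) * softmax(pose_logits)
    return probs.argmax(axis=1)

rgb = np.array([[2.0, 0.1, 0.3]])
pose = np.array([[0.2, 1.5, 0.1]])
print(late_fuse(rgb, pose, w_rgb=0.7))  # the RGB stream dominates: [0]
```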


Subjects
Algorithms , Human Activities , Humans , Image Processing, Computer-Assisted/methods , Movement/physiology , Posture/physiology , Pattern Recognition, Automated/methods
10.
IEEE J Biomed Health Inform ; 28(7): 3872-3881, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38954558

ABSTRACT

Electroencephalogram (EEG) signals are widely used in emotion recognition because of their high temporal resolution and reliability. However, individual differences and the non-stationary nature of EEG, together with the complexity and variability of emotions, make it hard to generalize emotion recognition models across subjects. In this paper, an end-to-end framework is proposed to improve cross-subject emotion recognition: a novel evolutionary programming (EP)-based optimization strategy with neural networks (NNs) as base classifiers, termed NN ensemble with EP (EPNNE). The method is evaluated on the publicly available DEAP, FACED, SEED, and SEED-IV datasets, and numerical results demonstrate that it is superior to state-of-the-art cross-subject emotion recognition methods. The framework aids biomedical researchers in effectively assessing individual emotional states, thereby enabling efficient treatment and interventions.
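The EPNNE strategy is not detailed in the abstract; the NumPy sketch below shows one plausible evolutionary-programming loop that evolves ensemble combination weights by Gaussian mutation. The population size, mutation scale, and weight-fusion scheme are all assumptions:

```python
import numpy as np

def ep_ensemble_weights(member_probs, y_val, pop=20, gens=50, sigma=0.1, seed=0):
    """Evolve the combination weights of an NN ensemble by Gaussian mutation,
    keeping the fitter half of parents and offspring each generation.
    member_probs: (M, N, C) validation class probabilities of M base networks."""
    rng = np.random.default_rng(seed)
    M = member_probs.shape[0]

    def fitness(w):
        w = np.abs(w) / np.abs(w).sum()                   # normalized weights
        fused = np.tensordot(w, member_probs, axes=1)     # (N, C)
        return (fused.argmax(axis=1) == y_val).mean()     # validation accuracy

    parents = rng.random((pop, M))
    for _ in range(gens):
        offspring = parents + rng.normal(0, sigma, parents.shape)
        both = np.vstack([parents, offspring])
        fits = np.array([fitness(w) for w in both])
        parents = both[np.argsort(fits)[-pop:]]           # survivor selection
    best = parents[-1]
    return np.abs(best) / np.abs(best).sum()

# Toy check: 3 ensemble members, 100 samples, 4 emotion classes.
rng = np.random.default_rng(1)
probs = rng.random((3, 100, 4))
y = rng.integers(0, 4, 100)
print(ep_ensemble_weights(probs, y))
```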


Subjects
Electroencephalography , Emotions , Signal Processing, Computer-Assisted , Humans , Electroencephalography/methods , Emotions/physiology , Neural Networks, Computer , Machine Learning , Algorithms , Pattern Recognition, Automated/methods , Databases, Factual , Adult , Female , Male
11.
Sensors (Basel) ; 24(11)2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38894423

ABSTRACT

Gesture recognition from electromyography (EMG) signals has recently prevailed in human-computer interaction for controlling intelligent prosthetics. Machine learning and deep learning are currently the two most common approaches to classifying hand gestures. Although traditional machine learning methods already achieve impressive performance, manual feature extraction remains laborious, while existing deep learning methods rely on complex architectures that can suffer from overfitting and poor adaptability. To improve on this situation, we propose a novel lightweight model, a dual-stream LSTM feature fusion classifier, which concatenates five time-domain features of the EMG signals with the raw data; both streams are processed with one-dimensional convolutional layers and LSTM layers to perform classification. The proposed method effectively captures global features of EMG signals with a simple architecture and hence a low computational cost. An experiment is conducted on the public DB1 dataset, which contains 52 gestures, each repeated 10 times by each of 27 subjects. The model achieves an accuracy of 89.66%, comparable to far more complex deep networks, with an inference time of 87.6 ms per gesture, suitable for a real-time control system. The model is further validated in a subject-wise experiment on 10 of the 40 subjects in the DB2 dataset, achieving a mean accuracy of 91.74%. These results illustrate its ability to fuse time-domain features and raw data to extract more effective information from the sEMG signal within an efficient, lightweight network.
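The five time-domain features are not named in the abstract; a common choice in the sEMG literature (assumed here) is mean absolute value, root mean square, waveform length, zero crossings, and slope sign changes:

```python
import numpy as np

def td_features(window, thresh=0.01):
    """Five classic time-domain sEMG features for one analysis window:
    MAV, RMS, waveform length (WL), zero crossings (ZC), and
    slope sign changes (SSC); thresh suppresses noise-level crossings."""
    d = np.diff(window)
    mav = np.mean(np.abs(window))
    rms = np.sqrt(np.mean(window ** 2))
    wl = np.sum(np.abs(d))
    zc = np.sum((window[:-1] * window[1:] < 0) & (np.abs(d) > thresh))
    ssc = np.sum((d[:-1] * d[1:] < 0) &
                 ((np.abs(d[:-1]) > thresh) | (np.abs(d[1:]) > thresh)))
    return np.array([mav, rms, wl, zc, ssc])

rng = np.random.default_rng(0)
window = np.sin(np.linspace(0, 20, 200)) + 0.05 * rng.normal(size=200)
print(td_features(window))
```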


Subjects
Deep Learning , Electromyography , Gestures , Neural Networks, Computer , Electromyography/methods , Humans , Signal Processing, Computer-Assisted , Pattern Recognition, Automated/methods , Algorithms , Machine Learning , Hand/physiology , Memory, Short-Term/physiology
12.
Math Biosci Eng ; 21(4): 5007-5031, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38872524

ABSTRACT

In demanding scenarios such as clinical psychotherapy and criminal interrogation, accurate recognition of micro-expressions is vitally important but poses significant challenges, chief among them capturing weak, fleeting facial features well enough to support reliable recognition. To address this, this paper proposes a novel architecture based on a multi-scale 3D residual convolutional neural network. The algorithm uses a deep 3D-ResNet50 as the backbone and takes micro-expression optical-flow feature maps as input. To reflect the complex spatial and temporal structure of micro-expressions, the network incorporates multi-scale convolutional modules of varying sizes that integrate global and local information, and an attention-based feature fusion module enhances the model's contextual awareness. Finally, a discriminative network structure with multiple output channels is constructed to optimize the prediction. The algorithm is evaluated on the public SMIC, SAMM, and CASME II datasets, achieving recognition accuracies of 74.6%, 84.77%, and 91.35%, respectively. This marked gain in efficiency over existing mainstream methods for extracting subtle micro-expression features improves recognition performance and accuracy, making this work a useful reference for researchers pursuing high-precision micro-expression recognition.
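A minimal sketch of a multi-scale 3D convolutional module of the kind described, with parallel kernels of different sizes concatenated channel-wise; the branch sizes and channel counts are assumptions:

```python
import torch
import torch.nn as nn

class MultiScale3D(nn.Module):
    """Parallel 3D convolutions of different kernel sizes over optical-flow
    clips; concatenating the branches mixes global and local spatiotemporal
    cues before the residual backbone."""
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.b1 = nn.Conv3d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv3d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv3d(in_ch, branch_ch, kernel_size=5, padding=2)

    def forward(self, x):                    # x: (B, C, T, H, W)
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

out = MultiScale3D(2, 8)(torch.randn(1, 2, 16, 56, 56))  # 2-channel optical flow
print(out.shape)  # torch.Size([1, 24, 16, 56, 56])
```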


Subjects
Algorithms , Facial Expression , Neural Networks, Computer , Humans , Imaging, Three-Dimensional/methods , Face , Databases, Factual , Pattern Recognition, Automated/methods , Image Processing, Computer-Assisted/methods
13.
Article in English | MEDLINE | ID: mdl-38833396

ABSTRACT

The global trend of population aging presents an urgent challenge in ensuring the safety and well-being of elderly individuals, especially those living alone. A promising approach is Human Action Recognition (HAR) that integrates data from multiple sensors. However, HAR has struggled to balance accuracy against response time: technological advances have improved recognition accuracy, but complex algorithms often come at the expense of latency. To address this, we introduce an asynchronous detection method called Rapid Response Elderly Safety Monitoring (RESAM), based on progressive hierarchical action recognition and multi-sensor data fusion. Initial analysis of inertial sensor data with Kernel Principal Component Analysis (KPCA) and multi-class classifiers efficiently reduces processing time and lowers the false-negative rate (FNR); this inertial stage serves as a pre-filter that flags candidate abnormal signals. Decision-level data fusion then combines ResNet-based skeleton image analysis with the inertial data from the first step, enabling accurate discrimination between normal and abnormal behaviors. RESAM achieves 97.4% accuracy on the UTD-MHAD database with a delay of only 1.22 s. On our internally collected database it attains 99% accuracy, ranking among the most accurate state-of-the-art methods. These results underscore the practicality and effectiveness of the approach for the swift, precise responses demanded in healthcare scenarios.
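A rough sketch of the inertial pre-filter stage, using scikit-learn's KernelPCA with an SVM standing in for the unspecified multi-class classifier; the data shapes and component count are assumptions:

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in inertial features; a real deployment would use windowed IMU data.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 24))
y = rng.integers(0, 2, size=400)            # 0 = normal, 1 = abnormal candidate

prefilter = make_pipeline(StandardScaler(),
                          KernelPCA(n_components=10, kernel="rbf"),
                          SVC())
prefilter.fit(X, y)
# Only windows flagged as abnormal would be forwarded to the (heavier)
# ResNet skeleton-image branch for decision-level fusion.
suspects = prefilter.predict(X) == 1
print(f"{suspects.sum()} windows forwarded to the vision stage")
```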


Subjects
Algorithms , Principal Component Analysis , Humans , Aged , Male , Female , Pattern Recognition, Automated/methods , Safety , Aged, 80 and over
14.
Opt Express ; 32(10): 16645-16656, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38858865

ABSTRACT

Single-Photon Avalanche Diode (SPAD) direct Time-of-Flight (dToF) sensors provide depth imaging over long distances, enabling the detection of objects even in the absence of contrast in colour or texture. However, distant objects are represented by just a few pixels and are subject to noise from solar interference, limiting the applicability of existing computer vision techniques for high-level scene interpretation. We present a new SPAD-based vision system for human activity recognition, based on convolutional and recurrent neural networks, which is trained entirely on synthetic data. In tests using real data from a 64×32 pixel SPAD, captured over a distance of 40 m, the scheme successfully overcomes the limited transverse resolution (in which human limbs are approximately one pixel across), achieving an average accuracy of 89% in distinguishing between seven different activities. The approach analyses continuous streams of video-rate depth data at a maximal rate of 66 FPS when executed on a GPU, making it well-suited for real-time applications such as surveillance or situational awareness in autonomous systems.


Subjects
Photons , Humans , Human Activities , Neural Networks, Computer , Pattern Recognition, Automated/methods , Equipment Design
15.
Article in English | MEDLINE | ID: mdl-38869995

ABSTRACT

Gesture recognition is crucial for enhancing human-computer interaction and is particularly pivotal in rehabilitation, aiding individuals recovering from physical impairments and significantly improving their mobility and interactive capabilities. However, current wearable hand gesture recognition approaches are often limited in detection performance, wearability, and generalization. We therefore introduce EchoGest, a novel hand gesture recognition system based on a soft, stretchable, transparent artificial skin with integrated ultrasonic waveguides; it is the first system to use soft ultrasonic waveguides for hand gesture recognition. Ecoflex 00-31 and Ecoflex 00-45 Near Clear silicone elastomers were employed to fabricate the artificial skin and waveguides, while 0.1 mm diameter silver-plated copper wires connected the transducers in the waveguides to the electronics. The wires are enclosed within an additional elastomer layer, yielding a sensing skin with a total thickness of around 500 µm. Ten participants wore the EchoGest system and performed static hand gestures from two sets: 8 daily-life gestures and the 10 American Sign Language (ASL) digits 0-9. Leave-one-subject-out cross-validation yielded accuracies of 91.13% for the daily-life gestures and 88.5% for the ASL digits. EchoGest has significant potential in rehabilitation, particularly for tracking and evaluating hand mobility, which could substantially reduce therapists' workload in both clinical and home settings. Integrating this technology could advance hand gesture recognition applications from real-time sign language translation to innovative rehabilitation techniques.


Subjects
Gestures , Hand , Pattern Recognition, Automated , Wearable Electronic Devices , Humans , Female , Hand/physiology , Adult , Male , Pattern Recognition, Automated/methods , Young Adult , Ultrasonics , Algorithms , Silicone Elastomers , Skin , Reproducibility of Results
16.
PLoS One ; 19(6): e0303451, 2024.
Article in English | MEDLINE | ID: mdl-38870195

ABSTRACT

Infrared target detection is widely used in industry, for example in environmental monitoring and autonomous driving, and the detection of weak targets is one of the most challenging topics in the field: such targets are small and carry little intrinsic or contextual information, which makes them difficult to detect and recognize. To address these issues, this paper proposes YOLO-ISTD, an improved infrared small-target detector based on the YOLOv5-S framework. First, we propose a feature extraction module called SACSP that incorporates the Shuffle Attention mechanism and adjusts the CSP structure, strengthening feature extraction and improving detector performance. Second, we introduce a feature fusion module called NL-SPPF whose NL-Block lets the network capture richer long-range features, better modeling the correlation between background and targets and thereby improving small-target detection. Lastly, we propose K-means_DIOU, a modified K-means clustering algorithm based on Distance-IoU (DIoU), to improve clustering accuracy and generate anchors suited to the task. Additionally, the detection heads in YOLOv5-S are modified: the original 8x, 16x, and 32x downsampling heads are replaced with 4x, 8x, and 16x heads, capturing more informative fine-grained features and improving the representation and localization of small targets. Experimental results on the NUST-SIRST dataset show improvements of 8.568% in mAP@0.5 and 8.618% in mAP@0.95. Compared with the baseline models, the proposed approach effectively reduces missed detections and false alarms and substantially improves precision, recall, and convergence speed.
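K-means anchor clustering with an IoU-based distance is the standard YOLO recipe; a NumPy sketch follows. Note the hedge: with box centers aligned (the usual convention for width/height clustering), the DIoU center-offset penalty vanishes, so this degenerates to plain IoU k-means rather than the paper's full K-means_DIOU:

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """K-means over (width, height) pairs with 1 - IoU as the distance.
    Boxes are center-aligned, so intersection is min(w)*min(h)."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        inter = (np.minimum(wh[:, None, 0], centers[None, :, 0]) *
                 np.minimum(wh[:, None, 1], centers[None, :, 1]))
        union = (wh[:, 0] * wh[:, 1])[:, None] + \
                (centers[:, 0] * centers[:, 1])[None, :] - inter
        assign = (1 - inter / union).argmin(axis=1)       # nearest by IoU
        for j in range(k):
            if np.any(assign == j):
                centers[j] = np.median(wh[assign == j], axis=0)
    return centers[np.argsort(centers.prod(axis=1))]      # sorted by area

wh = np.abs(np.random.default_rng(1).normal(20, 8, size=(500, 2))) + 1
print(kmeans_anchors(wh, k=3))
```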


Subjects
Algorithms , Infrared Rays , Cluster Analysis , Pattern Recognition, Automated/methods
17.
PLoS One ; 19(6): e0300837, 2024.
Article in English | MEDLINE | ID: mdl-38870208

ABSTRACT

Recognizing postures in multi-person dance scenarios is challenging: body parts occlude one another, and distortions vary across dance actions, with differences in proximity and scale demanding precise capture of fine details to convey expressiveness. Robust recognition is crucial in complex real-world environments. To tackle these issues, our study introduces a novel approach to multi-person dance posture recognition built on Cross Progressive Multi-Resolution Representation Integration (CPMRI) and Tiered Posture Recognition (TPR) modules. The CPMRI module merges high-level features, rich in semantic information, with low-level features that provide precise spatial details. Using a cross-progressive approach, it retains semantic understanding while enhancing spatial precision, bolstering the network's feature representation. Through feature concatenation it efficiently blends high- and low-resolution features into a comprehensive multi-resolution representation, significantly improving robustness, especially for intricate dance postures with scale variations. The TPR module classifies body keypoints into core torso joints and extremity joints according to their distinct distortion characteristics. Employing a three-tier network, it progressively refines posture recognition; by solving the optimal matching problem between torso and extremity joints, it ensures accurate connections and refines keypoint locations. Experimental evaluations against state-of-the-art methods on MSCOCO2017 and a custom Chinese dance dataset, using Object Keypoint Similarity (OKS)-based Average Precision (AP), mean Average Precision (mAP), and Average Recall (AR), validate the approach's effectiveness.
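Object Keypoint Similarity, on which the cited AP metric is based, is a standard COCO quantity; a compact reference implementation (the example values are illustrative only):

```python
import numpy as np

def oks(pred, gt, visible, area, kappa):
    """Object Keypoint Similarity: a per-keypoint Gaussian of the prediction
    error, normalized by object area and a per-keypoint constant, averaged
    over visible keypoints. pred/gt: (K, 2); visible: (K,) bool;
    area: object segment area s^2; kappa: (K,) COCO falloff constants."""
    d2 = np.sum((pred - gt) ** 2, axis=1)
    sim = np.exp(-d2 / (2 * area * kappa ** 2))
    return sim[visible].mean()

pred = np.array([[10.0, 10.0], [20.0, 22.0]])
gt = np.array([[10.0, 11.0], [20.0, 20.0]])
print(oks(pred, gt, np.array([True, True]),
          area=900.0, kappa=np.array([0.107, 0.107])))
```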


Subjects
Dancing , Posture , Humans , Posture/physiology , Dancing/physiology , Algorithms , Pattern Recognition, Automated/methods
18.
Sci Rep ; 14(1): 13156, 2024 06 07.
Article in English | MEDLINE | ID: mdl-38849454

ABSTRACT

This research investigates the recognition of basketball technique actions using three-dimensional (3D) convolutional neural networks (CNNs), aiming at accurate, automated identification of the various actions in basketball games. Basketball action sequences are first extracted from publicly available datasets, followed by preprocessing that includes image sampling, data augmentation, and label processing. A novel recognition model is then proposed that combines 3D convolutions with Long Short-Term Memory (LSTM) networks to capture the spatiotemporal relationships and temporal structure of actions, enabling automatic learning of the spatiotemporal features associated with basketball movements. Optimization techniques such as adaptive learning-rate adjustment and regularization further improve performance and robustness. The method is verified on three public basketball action datasets: NTU RGB+D, Basketball-Action-Dataset, and B3D Dataset. The results indicate that the approach outperforms two common traditional methods across datasets: accuracy improves by 15.1% over a frame-difference-based method and by 12.4% over an optical-flow-based method. Moreover, the method is robust, accurately recognizing actions under diverse lighting conditions and scenes with an average accuracy of 93.1%. The work shows that the reported method effectively captures the spatiotemporal relationships of basketball actions, providing reliable technical assessment tools for coaches and players.
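A compact sketch of a 3D-CNN + LSTM recognizer of the kind described; the layer sizes, pooling scheme, and class count are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class Conv3DLSTM(nn.Module):
    """3D convolutions extract short spatiotemporal features per clip, an LSTM
    models their temporal order, and a linear head predicts the action class."""
    def __init__(self, num_classes=10, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),     # keep time, pool space
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                 # x: (B, 3, T, H, W) RGB clip
        z = self.conv(x).squeeze(-1).squeeze(-1)      # (B, 32, T)
        out, _ = self.lstm(z.transpose(1, 2))         # (B, T, hidden)
        return self.fc(out[:, -1])                    # last time step

print(Conv3DLSTM()(torch.randn(2, 3, 16, 64, 64)).shape)  # torch.Size([2, 10])
```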


Subjects
Algorithms , Basketball , Neural Networks, Computer , Humans , Image Processing, Computer-Assisted/methods , Pattern Recognition, Automated/methods
19.
Sensors (Basel) ; 24(12)2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38931728

ABSTRACT

There has been a resurgence of applications focused on human activity recognition (HAR) in smart homes, especially in the field of ambient intelligence and assisted-living technologies. However, such applications present numerous significant challenges to any automated analysis system operating in the real world, such as variability, sparsity, and noise in sensor measurements. Although state-of-the-art HAR systems have made considerable strides in addressing some of these challenges, they suffer from a practical limitation: they require successful pre-segmentation of continuous sensor data streams prior to automated recognition, i.e., they assume that an oracle is present during deployment, and that it is capable of identifying time windows of interest across discrete sensor events. To overcome this limitation, we propose a novel graph-guided neural network approach that performs activity recognition by learning explicit co-firing relationships between sensors. We accomplish this by learning a more expressive graph structure representing the sensor network in a smart home in a data-driven manner. Our approach maps discrete input sensor measurements to a feature space through the application of attention mechanisms and hierarchical pooling of node embeddings. We demonstrate the effectiveness of our proposed approach by conducting several experiments on CASAS datasets, showing that the resulting graph-guided neural network outperforms the state-of-the-art method for HAR in smart homes across multiple datasets and by large margins. These results are promising because they push HAR for smart homes closer to real-world applications.
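A minimal sketch of the core idea of a learned sensor graph, where a trainable score matrix is normalized into edge weights over which node features are aggregated; the paper's actual attention and hierarchical pooling design is not specified here, and all names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class LearnedGraphLayer(nn.Module):
    """Graph layer with a data-driven sensor graph: a learnable score matrix is
    softmax-normalized into edge weights (the 'co-firing' structure), and each
    node aggregates its neighbors' features through it."""
    def __init__(self, num_sensors, in_dim, out_dim):
        super().__init__()
        self.edge_scores = nn.Parameter(torch.zeros(num_sensors, num_sensors))
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                  # x: (B, num_sensors, in_dim)
        adj = torch.softmax(self.edge_scores, dim=-1)   # learned adjacency
        return torch.relu(self.proj(adj @ x))           # neighbor aggregation

layer = LearnedGraphLayer(num_sensors=30, in_dim=8, out_dim=16)
print(layer(torch.randn(4, 30, 8)).shape)  # torch.Size([4, 30, 16])
```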


Subjects
Human Activities , Neural Networks, Computer , Humans , Algorithms , Pattern Recognition, Automated/methods
20.
Sensors (Basel) ; 24(12)2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38931754

ABSTRACT

Electromyography-based gesture recognition is a challenging problem in decoding fine hand movements. Recent research has improved recognition accuracy by increasing model complexity, but training a complex model requires a large amount of data, escalating both user burden and computational cost. Moreover, owing to the considerable variability of surface electromyography (sEMG) signals across users, conventional machine learning approaches that rely on a single feature cannot deliver gesture recognition precisely tailored to individual users. To address the twin problems of high computational cost and poor cross-user pattern recognition, we propose a feature selection method that combines mutual information, principal component analysis, and the Pearson correlation coefficient (MPP). The method filters out the optimal feature subset for a specific user and, combined with an SVM classifier, recognizes the user's gestures accurately and efficiently. To validate the method, we designed an experiment covering five gesture actions. Compared with the classification accuracy obtained from any single feature, the optimally selected subset improved accuracy by about 5% regardless of the classifier used. This study provides an effective basis for user-specific decoding of fine hand movements from sEMG signals.
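A hedged sketch of an MPP-style filter: rank features by mutual information with the gesture label, then greedily drop candidates too strongly Pearson-correlated with an already-kept feature. The paper's PCA component and exact combination rule are omitted; thresholds and shapes are assumptions:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import SVC

def mpp_select(X, y, n_keep=10, corr_max=0.9):
    """Rank features by mutual information, then prune by Pearson correlation
    so that no two kept features are nearly redundant."""
    mi = mutual_info_classif(X, y, random_state=0)
    corr = np.corrcoef(X, rowvar=False)
    kept = []
    for f in np.argsort(mi)[::-1]:                  # best MI first
        if all(abs(corr[f, g]) < corr_max for g in kept):
            kept.append(f)
        if len(kept) == n_keep:
            break
    return kept

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))                      # windowed sEMG features
y = rng.integers(0, 5, size=300)                    # five gesture classes
subset = mpp_select(X, y)
clf = SVC().fit(X[:, subset], y)
print(subset, clf.score(X[:, subset], y))
```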


Subjects
Electromyography , Forearm , Gestures , Hand , Pattern Recognition, Automated , Humans , Electromyography/methods , Hand/physiology , Forearm/physiology , Pattern Recognition, Automated/methods , Male , Adult , Principal Component Analysis , Female , Algorithms , Movement/physiology , Young Adult , Support Vector Machine , Machine Learning