Results 1 - 20 of 18,102
1.
PLoS One; 19(5): e0301862, 2024.
Article in English | MEDLINE | ID: mdl-38753628

ABSTRACT

Recognition of the key text of Chinese seals can speed up document approval and improve the office efficiency of enterprises and government administrative departments. Due to image blurring and occlusion, the accuracy of Chinese seal recognition is low, and real datasets are very limited. To solve these problems, we improve the differentiable binarization detection algorithm (DBnet) to construct a text-region detection model, DB-ECA, and propose a text recognition model named LSTR (Lightweight Seal Text Recognition). An efficient channel attention module is added to the differentiable binarization network to resolve the feature pyramid conflict, and the convolutional layer structure is modified to delay downsampling and reduce semantic feature loss. LSTR uses a lightweight CNN better suited to small-sample generalization and dynamically fuses positional and visual information through a self-attention-based inference layer to predict the label distribution of feature sequences in parallel. The inference layer not only compensates for the weak discriminative power of the shallow CNN layers but also helps CTC (Connectionist Temporal Classification) align feature regions accurately with target characters. In experiments on the homemade dataset presented in this paper, DB-ECA achieved the best precision, recall, and F-measure (90.29%, 85.17%, and 87.65%, respectively) compared with five commonly used detection models, and LSTR achieved the highest accuracy (91.29%) compared with five recognition models from the last three years, while requiring few parameters and offering fast inference. The experimental results demonstrate the novelty and effectiveness of our models.
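Illustrative sketch (not from the paper): an efficient channel attention (ECA) block of the kind the abstract adds to DBnet, in PyTorch. The kernel size and where the block sits in the network are assumptions.

```python
# Minimal ECA block: global average pool, 1-D conv across channels, sigmoid gate.
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                          # x: (N, C, H, W)
        y = x.mean(dim=(2, 3))                     # global average pool -> (N, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # local cross-channel interaction
        w = self.sigmoid(y).unsqueeze(-1).unsqueeze(-1)
        return x * w                               # reweight feature channels

feats = torch.randn(2, 64, 32, 32)
print(ECA()(feats).shape)                          # torch.Size([2, 64, 32, 32])
```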


Subject(s)
Algorithms; Neural Networks, Computer; Pattern Recognition, Automated/methods
2.
Sci Rep; 14(1): 10560, 2024 05 08.
Article in English | MEDLINE | ID: mdl-38720020

ABSTRACT

Research on video analytics, especially in the area of human behavior recognition, has become increasingly popular recently. It is widely applied in virtual reality, video surveillance, and video retrieval. With the advancement of deep learning algorithms and computer hardware, the conventional two-dimensional convolution technique for training video models has been replaced by three-dimensional convolution, which enables the extraction of spatio-temporal features. Specifically, the use of 3D convolution in human behavior recognition has been the subject of growing interest. However, the increased dimensionality has led to challenges such as a dramatic increase in the number of parameters, increased time complexity, and a strong dependence on GPUs for effective spatio-temporal feature extraction. Training can be considerably slow without the support of powerful GPU hardware. To address these issues, this study proposes an Adaptive Time Compression (ATC) module. Functioning as an independent component, ATC can be seamlessly integrated into existing architectures and achieves data compression by eliminating redundant frames within video data. The ATC module effectively reduces GPU computing load and time complexity with negligible loss of accuracy, thereby facilitating real-time human behavior recognition.
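Illustrative sketch (an analogy only): the ATC module is learned end to end, but the effect of eliminating redundant frames can be mimicked with a simple inter-frame difference threshold.

```python
# Keep a frame only if it differs enough from the last kept frame
# (mean absolute pixel difference); threshold is an arbitrary choice.
import numpy as np

def compress_frames(frames: np.ndarray, threshold: float = 4.0) -> np.ndarray:
    """frames: (T, H, W, C) uint8 video -> compressed (T', H, W, C)."""
    kept = [frames[0]]
    for frame in frames[1:]:
        diff = np.abs(frame.astype(np.int16) - kept[-1].astype(np.int16)).mean()
        if diff > threshold:
            kept.append(frame)
    return np.stack(kept)

video = np.random.randint(0, 255, size=(30, 112, 112, 3), dtype=np.uint8)
print(compress_frames(video).shape)
```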


Subject(s)
Algorithms; Data Compression; Video Recording; Humans; Data Compression/methods; Human Activities; Deep Learning; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods
3.
Sensors (Basel); 24(9), 2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38732808

ABSTRACT

Currently, surface EMG signals have a wide range of applications in human-computer interaction systems. However, selecting features for gesture recognition models based on traditional machine learning can be challenging and may not yield satisfactory results. Considering the strong nonlinear generalization ability of neural networks, this paper proposes a two-stream residual network model with an attention mechanism for gesture recognition. One branch processes surface EMG signals, while the other processes hand acceleration signals. Segmented networks are utilized to fully extract the physiological and kinematic features of the hand. To enhance the model's capacity to learn crucial information, we introduce an attention mechanism after global average pooling. This mechanism strengthens relevant features and weakens irrelevant ones. Finally, the deep features obtained from the two branches of learning are fused to further improve the accuracy of multi-gesture recognition. The experiments conducted on the NinaPro DB2 public dataset resulted in a recognition accuracy of 88.25% for 49 gestures. This demonstrates that our network model can effectively capture gesture features, enhancing accuracy and robustness across various gestures. This approach to multi-source information fusion is expected to provide more accurate and real-time commands for exoskeleton robots and myoelectric prosthetic control systems, thereby enhancing the user experience and the naturalness of robot operation.
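Illustrative sketch (assumed sizes): a two-stream model with squeeze-and-excitation style attention after global average pooling and late fusion of the EMG and acceleration branches. The 12 EMG and 36 accelerometer channels follow the NinaPro DB2 sensor layout; everything else is a placeholder, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    def __init__(self, in_ch: int, feat: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, feat, 5, padding=2), nn.ReLU(),
            nn.Conv1d(feat, feat, 5, padding=2), nn.ReLU(),
        )
        # attention on pooled features: strengthen relevant, weaken irrelevant
        self.att = nn.Sequential(nn.Linear(feat, feat // 4), nn.ReLU(),
                                 nn.Linear(feat // 4, feat), nn.Sigmoid())

    def forward(self, x):               # x: (N, channels, time)
        f = self.conv(x).mean(dim=2)    # global average pooling -> (N, feat)
        return f * self.att(f)

class TwoStream(nn.Module):
    def __init__(self, n_classes: int = 49):
        super().__init__()
        self.emg, self.acc = Branch(in_ch=12), Branch(in_ch=36)
        self.head = nn.Linear(128, n_classes)

    def forward(self, emg, acc):        # fuse deep features from both branches
        return self.head(torch.cat([self.emg(emg), self.acc(acc)], dim=1))

logits = TwoStream()(torch.randn(4, 12, 200), torch.randn(4, 36, 200))
print(logits.shape)                     # torch.Size([4, 49])
```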


Subject(s)
Electromyography; Gestures; Neural Networks, Computer; Humans; Electromyography/methods; Signal Processing, Computer-Assisted; Pattern Recognition, Automated/methods; Acceleration; Algorithms; Hand/physiology; Machine Learning; Biomechanical Phenomena/physiology
4.
Sensors (Basel); 24(9), 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38732843

ABSTRACT

As the number of electronic gadgets in our daily lives increases, and most of them require some form of human interaction, innovative and convenient input methods are in demand. State-of-the-art (SotA) ultrasound-based hand gesture recognition (HGR) systems have limitations in terms of robustness and accuracy. This research presents a novel machine learning (ML)-based end-to-end solution for hand gesture recognition with low-cost micro-electromechanical system (MEMS) ultrasonic transducers. In contrast to prior methods, our ML model processes the raw echo samples directly instead of using pre-processed data. Consequently, the processing flow presented in this work leaves it to the ML model to extract the important information from the echo data. The success of this approach is demonstrated as follows. Four MEMS ultrasonic transducers are placed in three different geometrical arrangements. For each arrangement, different types of ML models are optimized and benchmarked on datasets acquired with the presented custom hardware (HW): convolutional neural networks (CNNs), gated recurrent units (GRUs), long short-term memory (LSTM), vision transformer (ViT), and cross-attention multi-scale vision transformer (CrossViT). The last three of these models reached more than 88% accuracy. The most important finding of this research is that little pre-processing is necessary to obtain high accuracy in ultrasonic HGR for several arrangements of cost-effective, low-power MEMS ultrasonic transducer arrays; even the computationally intensive Fourier transform can be omitted. The presented approach is further compared to HGR systems using other sensor types, such as vision, WiFi, and radar, and to state-of-the-art ultrasound-based HGR systems. Direct processing of the sensor signals by a compact model makes ultrasonic hand gesture recognition a true low-cost and power-efficient input method.
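Illustrative sketch (assumed shapes): the core idea of feeding raw, untransformed echo samples from four transducers straight into a small model. A 1-D CNN stands in here; it is not any of the paper's benchmarked architectures.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(4, 16, 15, stride=4), nn.ReLU(),   # 4 transducer channels
    nn.Conv1d(16, 32, 15, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 8),                            # e.g. 8 gesture classes (assumed)
)
echoes = torch.randn(2, 4, 4096)                 # raw echo samples, no FFT
print(model(echoes).shape)                       # torch.Size([2, 8])
```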


Subject(s)
Gestures; Hand; Machine Learning; Neural Networks, Computer; Humans; Hand/physiology; Pattern Recognition, Automated/methods; Ultrasonography/methods; Ultrasonography/instrumentation; Ultrasonics/instrumentation; Algorithms
5.
Sensors (Basel); 24(9), 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38732846

ABSTRACT

Brain-computer interfaces (BCIs) allow information to be transmitted directly from the human brain to a computer, enhancing the ability of human brain activity to interact with the environment. In particular, BCI-based control systems are highly desirable because they can control equipment used by people with disabilities, such as wheelchairs and prosthetic legs. BCIs make use of electroencephalograms (EEGs) to decode the human brain's status. This paper presents an EEG-based facial gesture recognition method based on a self-organizing map (SOM). The proposed method uses the α, β, and θ power bands of the EEG signals as gesture features, and a SOM-Hebb classifier to classify the feature vectors. We used the proposed method to develop an online facial gesture recognition system, with gestures defined by combining facial movements that are easy to detect in EEG signals. The recognition accuracy of the system, examined through experiments, ranged from 76.90% to 97.57% depending on the number of gestures recognized. The lowest accuracy (76.90%) occurred when recognizing seven gestures, which is still quite accurate compared with other EEG-based recognition systems. The online recognition system was implemented in MATLAB and took 5.7 s to complete the recognition flow.
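Illustrative sketch (assumed band edges and sampling rate): extracting the θ, α, and β band-power features named above from an EEG window via a Welch power spectrum.

```python
import numpy as np
from scipy.signal import welch
from scipy.integrate import trapezoid

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}  # Hz, assumed

def band_powers(eeg: np.ndarray, fs: float = 256.0) -> np.ndarray:
    """eeg: (n_channels, n_samples) -> (n_channels * 3,) feature vector."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(fs))
    feats = []
    for lo, hi in BANDS.values():
        m = (freqs >= lo) & (freqs < hi)
        feats.append(trapezoid(psd[:, m], freqs[m], axis=1))  # integrate band
    return np.concatenate(feats)

print(band_powers(np.random.randn(8, 1024)).shape)            # (24,)
```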


Subject(s)
Brain-Computer Interfaces; Electroencephalography; Gestures; Humans; Electroencephalography/methods; Face/physiology; Algorithms; Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted; Brain/physiology; Male
6.
Sensors (Basel); 24(9), 2024 May 05.
Article in English | MEDLINE | ID: mdl-38733038

ABSTRACT

With the continuous advancement of autonomous driving and monitoring technologies, there is increasing attention on non-intrusive target monitoring and recognition. This paper proposes an ArcFace SE-attention model-agnostic meta-learning approach (AS-MAML) by integrating attention mechanisms into residual networks for pedestrian gait recognition using frequency-modulated continuous-wave (FMCW) millimeter-wave radar through meta-learning. We enhance the feature extraction capability of the base network using channel attention mechanisms and integrate the additive angular margin loss function (ArcFace loss) into the inner loop of MAML to constrain inner loop optimization and improve radar discrimination. Then, this network is used to classify small-sample micro-Doppler images obtained from millimeter-wave radar as the data source for pose recognition. Experimental tests were conducted on pose estimation and image classification tasks. The results demonstrate significant detection and recognition performance, with an accuracy of 94.5%, accompanied by a 95% confidence interval. Additionally, on the open-source dataset DIAT-µRadHAR, which is specially processed to increase classification difficulty, the network achieves a classification accuracy of 85.9%.
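Illustrative sketch: the additive angular margin (ArcFace) logits the abstract places in the MAML inner loop. The scale s and margin m are assumed values, and the MAML loop itself is omitted.

```python
import torch
import torch.nn.functional as F

def arcface_logits(embeddings, weights, labels, s=30.0, m=0.5):
    """Cosine logits with an additive angular margin on the target class."""
    cos = F.normalize(embeddings) @ F.normalize(weights).t()   # (N, n_classes)
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, weights.shape[0]).bool()
    cos_m = torch.cos(torch.where(target, theta + m, theta))   # add margin
    return s * cos_m

emb, w = torch.randn(8, 128), torch.randn(10, 128)             # toy sizes
labels = torch.randint(0, 10, (8,))
loss = F.cross_entropy(arcface_logits(emb, w, labels), labels)
print(float(loss))
```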


Subject(s)
Pedestrians; Radar; Humans; Algorithms; Gait/physiology; Pattern Recognition, Automated/methods; Machine Learning
7.
PLoS One; 19(5): e0298373, 2024.
Article in English | MEDLINE | ID: mdl-38691542

ABSTRACT

Pulse repetition interval modulation (PRIM) is integral to radar identification in modern electronic support measure (ESM) and electronic intelligence (ELINT) systems. Various distortions, including missing pulses, spurious pulses, unintended jitters, and noise from radar antenna scans, often hinder the accurate recognition of PRIM. This research introduces a novel three-stage approach for PRIM recognition, emphasizing the innovative use of PRI sound. A transfer-learning-aided deep convolutional neural network (DCNN) is initially used for feature extraction, followed by an extreme learning machine (ELM) for real-time PRIM classification. Finally, a gray wolf optimizer (GWO) refines the network's robustness. To evaluate the proposed method, we developed a real experimental dataset consisting of the sounds of six common PRI patterns. We evaluated eight pre-trained DCNN architectures, with VGG16 and ResNet50V2 notably achieving recognition accuracies of 97.53% and 96.92%, respectively. Integrating the ELM and GWO further optimized these accuracies to 98.80% and 97.58%, respectively. This research advances radar identification by offering an enhanced method for PRIM recognition, emphasizing the potential of PRI sound to address real-world distortions in ESM and ELINT systems.
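Illustrative sketch: a minimal extreme learning machine of the kind used as the classifier stage, with a random hidden layer and a closed-form least-squares output layer. All sizes are placeholders.

```python
import numpy as np

class ELM:
    def __init__(self, n_hidden: int = 500, seed: int = 0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # fixed, random
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)            # random hidden features
        T = np.eye(n_classes)[y]                    # one-hot targets
        self.beta = np.linalg.pinv(H) @ T           # closed-form output weights
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)

X, y = np.random.randn(200, 64), np.random.randint(0, 6, 200)   # 6 PRI classes
print((ELM().fit(X, y).predict(X) == y).mean())                 # training accuracy
```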


Subject(s)
Deep Learning; Neural Networks, Computer; Sound; Radar; Algorithms; Pattern Recognition, Automated/methods
8.
J Neural Eng; 21(3), 2024 May 17.
Article in English | MEDLINE | ID: mdl-38757187

ABSTRACT

Objective. For brain-computer interface (BCI) research, it is crucial to design an MI-EEG recognition model that possesses high classification accuracy and strong generalization ability and does not rely on a large number of labeled training samples. Approach. In this paper, we propose a self-supervised MI-EEG recognition method based on one-dimensional multi-task convolutional neural networks and long short-term memory (1-D MTCNN-LSTM). The model is divided into two stages: a signal transform identification stage and a pattern recognition stage. In the signal transform identification stage, the signal transform dataset is recognized by the upstream 1-D MTCNN-LSTM network model. Subsequently, the backbone network from the signal transform identification stage is transferred to the pattern recognition stage, where it is fine-tuned using a trace amount of labeled data to obtain the final motion recognition model. Main results. The upstream stage of this study achieves more than 95% recognition accuracy for EEG signal transforms, up to 100%. For MI-EEG pattern recognition, the model obtained recognition accuracies of 82.04% and 87.14%, with F1 scores of 0.7856 and 0.839, on the BCIC-IV-2b and BCIC-IV-2a datasets. Significance. The improved accuracy demonstrates the superiority of the proposed method, which is a promising candidate for accurate classification of MI-EEG in BCI systems.
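Illustrative sketch (assumed transform set): the pretext-task pattern behind a signal transform identification stage, where the pseudo-label for each unlabeled EEG window is simply which transform was applied to it.

```python
import numpy as np

TRANSFORMS = [
    lambda x: x,                                    # 0: identity
    lambda x: -x,                                   # 1: sign flip
    lambda x: x[:, ::-1],                           # 2: time reversal
    lambda x: x + 0.5 * np.random.randn(*x.shape),  # 3: additive noise
    lambda x: 1.5 * x,                              # 4: amplitude scaling
]

def make_pretext_batch(windows):
    """windows: list of (channels, samples) arrays -> (inputs, pseudo_labels)."""
    labels = np.random.randint(0, len(TRANSFORMS), len(windows))
    inputs = [TRANSFORMS[l](w).copy() for w, l in zip(windows, labels)]
    return np.stack(inputs), labels

X, y = make_pretext_batch([np.random.randn(3, 250) for _ in range(16)])
print(X.shape, y[:5])                               # (16, 3, 250) ...
```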


Subject(s)
Brain-Computer Interfaces; Electroencephalography; Imagination; Neural Networks, Computer; Electroencephalography/methods; Humans; Imagination/physiology; Supervised Machine Learning; Pattern Recognition, Automated/methods
9.
PLoS One; 19(5): e0302590, 2024.
Article in English | MEDLINE | ID: mdl-38758731

ABSTRACT

Automatic Urdu handwritten text recognition is a challenging task in the OCR industry. Unlike printed text, Urdu handwriting lacks a uniform font and structure; this lack of uniformity causes data inconsistencies and recognition issues. Different writing styles, cursive script, and limited data make Urdu text recognition a complicated task. Major languages, such as English, have seen advances in automated recognition, whereas low-resource languages, such as Urdu, still lag. Transformer-based models are promising for automated recognition in both high- and low-resource languages. This paper presents a transformer-based method called ET-Network that integrates self-attention into EfficientNet for feature extraction and uses a transformer for language modeling. The self-attention layers in EfficientNet help extract global and local features that capture long-range dependencies. These features are then fed into a vanilla transformer to generate text, and prefix beam search is used to select the best output. Three datasets, NUST-UHWR, UPTI2.0, and MMU-OCR-21, are used to train and test the ET-Network on handwritten Urdu script. The ET-Network improved the character error rate by 4% and the word error rate by 1.55%, establishing a new state-of-the-art character error rate of 5.27% and word error rate of 19.09% for Urdu handwritten text.
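For reference, the reported character and word error rates are standard edit-distance metrics; a minimal implementation:

```python
import numpy as np

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, by dynamic programming."""
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i, j] = min(d[i - 1, j] + 1,                       # deletion
                          d[i, j - 1] + 1,                       # insertion
                          d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[-1, -1]

def cer(ref, hyp):                    # character error rate
    return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)

def wer(ref, hyp):                    # word error rate over whitespace tokens
    return edit_distance(ref.split(), hyp.split()) / max(len(ref.split()), 1)

print(cer("salam", "salaam"), wer("a b c", "a c"))
```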


Subject(s)
Deep Learning; Handwriting; Humans; Language; Pattern Recognition, Automated/methods; Algorithms
10.
J Neural Eng; 21(3), 2024 May 17.
Article in English | MEDLINE | ID: mdl-38722304

ABSTRACT

Discrete myoelectric control-based gesture recognition has recently gained interest as a possible input modality for many emerging ubiquitous computing applications. Unlike the continuous control commonly employed in powered prostheses, discrete systems seek to recognize the dynamic sequences associated with gestures to generate event-based inputs. More akin to those used in general-purpose human-computer interaction, these could include, for example, a flick of the wrist to dismiss a phone call or a double tap of the index finger and thumb to silence an alarm. Myoelectric control systems have been shown to achieve near-perfect classification accuracy, but only in highly constrained offline settings. Real-world, online systems are subject to 'confounding factors' (i.e. factors that hinder the real-world robustness of myoelectric control that are not accounted for during typical offline analyses), which inevitably degrade system performance, limiting their practical use. Although these factors have been widely studied in continuous prosthesis control, there has been little exploration of their impacts on discrete myoelectric control systems for emerging applications and use cases. Correspondingly, this work examines, for the first time, three confounding factors and their effect on the robustness of discrete myoelectric control: (1) limb position variability, (2) cross-day use, and a newly identified confound faced by discrete systems, (3) gesture elicitation speed. Results from four different discrete myoelectric control architectures, (1) majority-vote LDA, (2) dynamic time warping, (3) an LSTM network trained with cross-entropy loss, and (4) an LSTM network trained with contrastive learning, show that classification accuracy is significantly degraded (p<0.05) by each of these confounds. This work establishes that confounding factors are a critical barrier that must be addressed to enable the real-world adoption of discrete myoelectric control for robust and reliable gesture recognition.
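Illustrative sketch: dynamic time warping, one of the four benchmarked architectures, used as nearest-template gesture classification. Feature dimensions and templates are placeholders.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """a: (Ta, d), b: (Tb, d) feature sequences -> alignment cost."""
    Ta, Tb = len(a), len(b)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Ta, Tb]

# nearest-template classification: label of the closest training gesture
templates = {"wrist_flick": np.random.randn(40, 8),
             "double_tap": np.random.randn(55, 8)}
query = np.random.randn(48, 8)
print(min(templates, key=lambda k: dtw_distance(templates[k], query)))
```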


Subject(s)
Electromyography; Gestures; Pattern Recognition, Automated; Humans; Electromyography/methods; Male; Pattern Recognition, Automated/methods; Female; Adult; Young Adult; Artificial Limbs
11.
PLoS One; 19(4): e0298699, 2024.
Article in English | MEDLINE | ID: mdl-38574042

ABSTRACT

Sign language recognition presents significant challenges due to the intricate nature of hand gestures and the necessity to capture fine-grained details. In response to these challenges, a novel approach is proposed: the Lightweight Attentive VGG16 with Random Forest (LAVRF) model. LAVRF introduces a refined adaptation of VGG16 integrated with attention modules, complemented by a Random Forest classifier. By streamlining the VGG16 architecture, the Lightweight Attentive VGG16 effectively manages complexity while incorporating attention mechanisms that dynamically concentrate on pertinent regions within input images, resulting in enhanced representation learning. The Random Forest classifier provides notable benefits, including proficient handling of high-dimensional feature representations, reduction of variance and overfitting concerns, and resilience against noisy and incomplete data. Additionally, model performance is further improved through hyperparameter optimization, utilizing Optuna in conjunction with hill climbing, which efficiently explores the hyperparameter space to discover optimal configurations. The proposed LAVRF model demonstrates outstanding accuracy on three datasets, achieving remarkable results of 99.98%, 99.90%, and 100% on the American Sign Language, American Sign Language with Digits, and NUS Hand Posture datasets, respectively.
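Illustrative sketch (assumed search space): an Optuna study tuning a random forest over extracted deep features by cross-validated accuracy; the hill-climbing refinement is omitted.

```python
import optuna
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.randn(300, 512)            # stand-in for VGG16 features
y = np.random.randint(0, 10, 300)        # stand-in class labels

def objective(trial):
    clf = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 50, 400),
        max_depth=trial.suggest_int("max_depth", 4, 32),
        min_samples_split=trial.suggest_int("min_samples_split", 2, 10),
        random_state=0,
    )
    return cross_val_score(clf, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```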


Subject(s)
Random Forest; Sign Language; Humans; Pattern Recognition, Automated/methods; Gestures; Upper Extremity
12.
Article in English | MEDLINE | ID: mdl-38598402

ABSTRACT

Canonical correlation analysis (CCA), the multivariate synchronization index (MSI), and their extended methods have been widely used for target recognition in brain-computer interfaces (BCIs) based on steady-state visual evoked potentials (SSVEPs), and covariance calculation is an important step in these algorithms. Some studies have shown that embedding time-local information into the covariance can improve the recognition performance of the above algorithms. However, the improvement can only be observed in the recognition results, and the mechanism by which time-local information helps cannot be explained. Therefore, we propose a time-local weighted transformation (TT) recognition framework that directly embeds time-local information into the electroencephalography signal through a weighted transformation. The influence of time-local information on the SSVEP signal can then be observed in the frequency domain: low-frequency noise is suppressed at the cost of sacrificing part of the SSVEP fundamental-frequency energy, and the harmonic energy of the SSVEP is enhanced at the cost of introducing a small amount of high-frequency noise. The experimental results show that the TT recognition framework significantly improves the recognition ability of the algorithms and the separability of the extracted features. Its enhancement effect is significantly better than that of the traditional time-local covariance extraction method, giving it considerable application potential.
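For context, a minimal version of the baseline the TT framework wraps: standard CCA-based SSVEP recognition against sine-cosine reference signals. The time-local weighting itself would multiply the EEG by a window before this step; its exact form is not given in the abstract.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def ssvep_score(eeg, freq, fs=250.0, harmonics=2):
    """eeg: (samples, channels); max canonical correlation with freq references."""
    t = np.arange(eeg.shape[0]) / fs
    ref = np.column_stack([f(2 * np.pi * freq * h * t)
                           for h in range(1, harmonics + 1)
                           for f in (np.sin, np.cos)])
    u, v = CCA(n_components=1).fit_transform(eeg, ref)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

eeg = np.random.randn(500, 8)                  # 2 s of 8-channel EEG (toy data)
targets = [8.0, 10.0, 12.0]                    # candidate stimulus frequencies
print(max(targets, key=lambda f: ssvep_score(eeg, f)))
```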


Subject(s)
Brain-Computer Interfaces; Humans; Evoked Potentials, Visual; Pattern Recognition, Automated/methods; Recognition, Psychology; Electroencephalography/methods; Algorithms; Photic Stimulation
13.
PLoS One; 19(4): e0301093, 2024.
Article in English | MEDLINE | ID: mdl-38662662

ABSTRACT

Feature enhancement plays a crucial role in improving the quality and discriminative power of features used in matching tasks. By enhancing the informative and invariant aspects of features, the matching process becomes more robust and reliable, enabling accurate predictions even in challenging scenarios, such as occlusion and reflection in stereo matching. In this paper, we propose an end-to-end dual-dimension feature modulation network called DFMNet to address the issue of mismatches in interference areas. DFMNet utilizes dual-dimension feature modulation (DFM) to capture spatial and channel information separately. This approach enables the adaptive combination of local features with more extensive contextual information, resulting in an enhanced feature representation that is more effective in dealing with challenging scenarios. Additionally, we introduce the concept of cost filter volume (CFV) by utilizing guide weights derived from group-wise correlation. CFV aids in filtering the concatenated volume adaptively, effectively discarding redundant information, and further improving matching accuracy. To enable real-time performance, we designed a fast version named Fast-GFM. Fast-GFM employs the global feature modulation (GFM) block to enhance the feature expression ability, improving the accuracy and stereo matching robustness. The accurate DFMNet and the real-time Fast-GFM achieve state-of-the-art performance across multiple benchmarks, including Scene Flow, KITTI, ETH3D, and Middlebury. These results demonstrate the effectiveness of our proposed methods in enhancing feature representation and significantly improving matching accuracy in various stereo matching scenarios.
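Illustrative sketch: the group-wise correlation from which the cost filter volume's guide weights are derived, computed per disparity shift. Shapes and the shift handling are assumptions.

```python
import torch

def groupwise_correlation(left, right, max_disp=8, groups=4):
    """left/right: (N, C, H, W) -> cost volume (N, groups, max_disp, H, W)."""
    N, C, H, W = left.shape
    lf = left.view(N, groups, C // groups, H, W)
    vol = left.new_zeros(N, groups, max_disp, H, W)
    for d in range(max_disp):
        shifted = right.roll(shifts=d, dims=3).view(N, groups, C // groups, H, W)
        vol[:, :, d] = (lf * shifted).mean(dim=2)   # per-group correlation
        vol[:, :, d, :, :d] = 0                     # zero wrapped (invalid) columns
    return vol

cost = groupwise_correlation(torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16))
print(cost.shape)                                   # torch.Size([1, 4, 8, 16, 16])
```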


Subject(s)
Algorithms; Neural Networks, Computer; Humans; Pattern Recognition, Automated/methods
14.
Asian Pac J Cancer Prev; 25(4): 1265-1270, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38679986

ABSTRACT

PURPOSE: This study aims to compare the accuracy of the ADNEX MR scoring system and a pattern recognition system in evaluating adnexal lesions that are indeterminate on ultrasound (US) examination. METHODS: In this cross-sectional retrospective study, pelvic DCE-MRI of 245 patients with 340 adnexal masses was evaluated with both the ADNEX MR scoring system and the pattern recognition system. RESULTS: The ADNEX MR scoring system, with a sensitivity of 96.6% and a specificity of 91%, has an accuracy of 92.9%. The pattern recognition system's sensitivity, specificity, and accuracy are 95.8%, 93.3%, and 94.7%, respectively. PPV and NPV for the ADNEX MR scoring system were 85.1% and 98.1%, respectively; PPV and NPV for the pattern recognition system were 89.7% and 97.7%, respectively. The areas under the ROC curve for the ADNEX MR scoring system and the pattern recognition system are 0.938 (95% CI, 0.909-0.967) and 0.950 (95% CI, 0.922-0.977). Pairwise comparison of these AUCs showed no significant difference (p = 0.052). CONCLUSION: The pattern recognition system is less sensitive than the ADNEX MR scoring system, yet more specific.
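For reference, the reported sensitivity, specificity, PPV, NPV, and accuracy all follow from a 2x2 confusion table. The counts below are illustrative values chosen to roughly reproduce the ADNEX MR row, not data taken from the study:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2-table diagnostic statistics."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# hypothetical counts for 340 masses (~118 malignant), matching ~96.6% / 91%
print(diagnostic_metrics(tp=114, fp=20, fn=4, tn=202))
```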


Subject(s)
Adnexal Diseases; Magnetic Resonance Imaging; Humans; Female; Cross-Sectional Studies; Retrospective Studies; Middle Aged; Adnexal Diseases/diagnostic imaging; Adnexal Diseases/pathology; Adnexal Diseases/diagnosis; Adult; Magnetic Resonance Imaging/methods; Aged; Prognosis; ROC Curve; Follow-Up Studies; Adolescent; Young Adult; Pattern Recognition, Automated/methods; Adnexa Uteri/pathology; Adnexa Uteri/diagnostic imaging
15.
Sensors (Basel); 24(8), 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38676024

ABSTRACT

In recent decades, technological advancements have transformed industry, highlighting the efficiency and safety benefits of automation. The integration of augmented reality (AR) and gesture recognition has emerged as an innovative approach to creating interactive environments for industrial equipment. Gesture recognition enhances AR applications by allowing intuitive interactions. This study presents a web-based architecture for the integration of AR and gesture recognition, designed for interaction with industrial equipment. Emphasizing hardware-agnostic compatibility, the proposed structure offers intuitive interaction with equipment control systems through natural gestures. Experimental validation, conducted using Google Glass, demonstrated the practical viability and potential of this approach in industrial operations. The development focused on optimizing the system's software and implementing techniques such as normalization, clamping, conversion, and filtering to achieve accurate and reliable gesture recognition under different usage conditions. The proposed approach promotes safer and more efficient industrial operations, contributing to research in AR and gesture recognition. Future work will include improving gesture recognition accuracy, exploring alternative gestures, and expanding platform integration to improve the user experience.
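Illustrative sketch (assumed ranges and smoothing constant): the kind of clamping, normalization, and filtering steps mentioned above, applied to a raw hand keypoint.

```python
import numpy as np

def preprocess_keypoint(xy, frame_w, frame_h, prev=None, alpha=0.4):
    """Raw (x, y) pixel keypoint -> clamped, normalized, low-pass filtered."""
    x = min(max(xy[0], 0.0), frame_w)          # clamp to the frame
    y = min(max(xy[1], 0.0), frame_h)
    p = np.array([x / frame_w, y / frame_h])   # normalize to [0, 1]
    if prev is None:
        return p
    return alpha * p + (1 - alpha) * prev      # exponential smoothing filter

prev = None
for raw in [(120, 80), (130, 85), (2000, -5)]:  # last sample is an outlier
    prev = preprocess_keypoint(raw, 640, 480, prev)
    print(prev)
```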


Subject(s)
Augmented Reality; Gestures; Humans; Industry; Software; Pattern Recognition, Automated/methods; User-Computer Interface
16.
Sensors (Basel); 24(8), 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38676108

ABSTRACT

Egocentric activity recognition is a prominent computer vision task that is based on the use of wearable cameras. Since egocentric videos are captured through the perspective of the person wearing the camera, her/his body motions severely complicate the video content, imposing several challenges. In this work we propose a novel approach for domain-generalized egocentric human activity recognition. Typical approaches use a large amount of training data, aiming to cover all possible variants of each action. Moreover, several recent approaches have attempted to handle discrepancies between domains with a variety of costly and mostly unsupervised domain adaptation methods. In our approach we show that through simple manipulation of available source domain data and with minor involvement from the target domain, we are able to produce robust models, able to adequately predict human activity in egocentric video sequences. To this end, we introduce a novel three-stream deep neural network architecture combining elements of vision transformers and residual neural networks which are trained using multi-modal data. We evaluate the proposed approach using a challenging, egocentric video dataset and demonstrate its superiority over recent, state-of-the-art research works.
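Illustrative sketch (placeholder streams): score-level fusion of three per-modality streams. The paper's ViT/ResNet backbones and modalities are only named in the abstract, so plain linear heads stand in for them here.

```python
import torch
import torch.nn as nn

class Stream(nn.Module):
    def __init__(self, in_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_classes))
    def forward(self, x):
        return self.net(x)

class ThreeStream(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.rgb = Stream(768, n_classes)     # e.g. a ViT CLS embedding (assumed)
        self.flow = Stream(512, n_classes)    # e.g. ResNet pooled features (assumed)
        self.depth = Stream(512, n_classes)
    def forward(self, rgb, flow, depth):      # average the per-stream logits
        return (self.rgb(rgb) + self.flow(flow) + self.depth(depth)) / 3

out = ThreeStream()(torch.randn(2, 768), torch.randn(2, 512), torch.randn(2, 512))
print(out.shape)                              # torch.Size([2, 10])
```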


Subject(s)
Neural Networks, Computer; Video Recording; Humans; Video Recording/methods; Algorithms; Pattern Recognition, Automated/methods; Image Processing, Computer-Assisted/methods; Human Activities; Wearable Electronic Devices
17.
Sensors (Basel); 24(8), 2024 Apr 14.
Article in English | MEDLINE | ID: mdl-38676137

ABSTRACT

Human action recognition (HAR) is a growing area of machine learning with a wide range of applications. One challenging aspect of HAR is recognizing human actions while playing music, further complicated by the need to recognize the musical notes being played. This paper proposes a deep learning-based method for simultaneous HAR and musical note recognition in music performances. We conducted experiments on performances of the Morin khuur, a traditional Mongolian instrument. The proposed method consists of two stages. First, we created a new dataset of Morin khuur performances, using motion capture systems and depth sensors to collect data that includes hand keypoints, instrument segmentation information, and detailed movement information. We then analyzed RGB images, depth images, and motion data to determine which type of data provides the most valuable features for recognizing actions and notes in music performances. The second stage utilizes a Spatial Temporal Attention Graph Convolutional Network (STA-GCN) to recognize musical notes as continuous gestures. The STA-GCN model is designed to learn the relationships between hand keypoints and instrument segmentation information, which are crucial for accurate recognition. Evaluation on our dataset demonstrates that our model outperforms the traditional ST-GCN model, achieving an accuracy of 81.4%.
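Illustrative sketch: the graph-convolution step at the core of ST-GCN/STA-GCN style models, on a toy 4-joint graph; the attention and temporal components are omitted.

```python
import numpy as np

def gcn_layer(X, A, W):
    """X: (joints, in_dim), A: (joints, joints) adjacency with self-loops,
    W: (in_dim, out_dim). One graph-convolution step with ReLU."""
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt        # symmetric normalization
    return np.maximum(A_hat @ X @ W, 0.0)      # aggregate neighbors + ReLU

A = np.array([[1, 1, 0, 0],                    # a 4-joint chain with self-loops
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
X = np.random.randn(4, 3)                      # 3-D joint coordinates
print(gcn_layer(X, A, np.random.randn(3, 16)).shape)   # (4, 16)
```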


Subject(s)
Deep Learning; Music; Humans; Neural Networks, Computer; Human Activities; Pattern Recognition, Automated/methods; Gestures; Algorithms; Movement/physiology
18.
Sensors (Basel); 24(8), 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38676207

ABSTRACT

Teaching gesture recognition is a technique used to recognize the hand movements of teachers in classroom teaching scenarios. This technology is widely used in education, including for classroom teaching evaluation, enhancing online teaching, and assisting special education. However, current research on gesture recognition in teaching mainly focuses on detecting the static gestures of individual students and analyzing their classroom behavior. To analyze the teacher's gestures and mitigate the difficulty of single-target dynamic gesture recognition in multi-person teaching scenarios, this paper proposes skeleton-based teaching gesture recognition (ST-TGR), which learns through spatio-temporal representation. This method mainly uses the human pose estimation technique RTMPose to extract the coordinates of the keypoints of the teacher's skeleton and then inputs the recognized sequence of the teacher's skeleton into the MoGRU action recognition network for classifying gesture actions. The MoGRU action recognition module mainly learns the spatio-temporal representation of target actions by stacking a multi-scale bidirectional gated recurrent unit (BiGRU) and using improved attention mechanism modules. To validate the generalization of the action recognition network model, we conducted comparative experiments on datasets including NTU RGB+D 60, UT-Kinect Action3D, SBU Kinect Interaction, and Florence 3D. The results indicate that, compared with most existing baseline models, the model proposed in this article exhibits better performance in recognition accuracy and speed.
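Illustrative sketch (assumed sizes): a stacked bidirectional GRU over skeleton keypoint sequences with a classification head, i.e. the recurrent core of a MoGRU-style model without the multi-scale and attention parts.

```python
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    def __init__(self, in_dim=34, hidden=128, n_layers=2, n_classes=8):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, n_layers,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                # x: (batch, time, keypoint features)
        out, _ = self.gru(x)
        return self.head(out.mean(dim=1))   # temporal average pooling

seq = torch.randn(4, 60, 34)             # 60 frames of 17 (x, y) keypoints
print(BiGRUClassifier()(seq).shape)      # torch.Size([4, 8])
```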


Subject(s)
Gestures; Humans; Pattern Recognition, Automated/methods; Algorithms; Teaching
19.
Article in English | MEDLINE | ID: mdl-38683719

ABSTRACT

To overcome the challenges posed by the complex structure and large parameter requirements of existing classification models, the authors propose an improved extreme learning machine (ELM) classifier for human locomotion intent recognition in this study, resulting in enhanced classification accuracy. The structure of the ELM algorithm is enhanced using the logistic regression (LR) algorithm, significantly reducing the number of hidden layer nodes. Hence, this algorithm can be adopted for real-time human locomotion intent recognition on portable devices with only 234 parameters to store. Additionally, a hybrid grey wolf optimization and slime mould algorithm (GWO-SMA) is proposed to optimize the hidden layer bias of the improved ELM classifier. Numerical results demonstrate that the proposed model successfully recognizes nine daily motion modes including low-, mid-, and fast-speed level ground walking, ramp ascent/descent, sit/stand, and stair ascent/descent. Specifically, it achieves 96.75% accuracy with 5-fold cross-validation while maintaining a real-time prediction time of only 2 ms. These promising findings highlight the potential of onboard real-time recognition of continuous locomotion modes based on our model for the high-level control of powered knee prostheses.
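Illustrative sketch: a textbook grey wolf optimizer on a toy objective, the kind of search used to tune the hidden-layer bias; the paper's hybrid GWO-SMA variant adds a slime mould step not shown here.

```python
import numpy as np

def gwo(fitness, dim, n_wolves=10, n_iters=50, lb=-1.0, ub=1.0, seed=0):
    """Minimize fitness over [lb, ub]^dim with the standard GWO updates."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))
    for t in range(n_iters):
        order = np.argsort([fitness(x) for x in X])
        alpha, beta, delta = X[order[:3]]            # three best wolves lead
        a = 2.0 - 2.0 * t / n_iters                  # linearly decreasing
        for i in range(n_wolves):
            guided = []
            for leader in (alpha, beta, delta):
                A = a * (2 * rng.random(dim) - 1)    # A = 2a*r1 - a
                C = 2 * rng.random(dim)              # C = 2*r2
                guided.append(leader - A * np.abs(C * leader - X[i]))
            X[i] = np.clip(np.mean(guided, axis=0), lb, ub)
    return X[np.argmin([fitness(x) for x in X])]

best_bias = gwo(lambda b: np.sum(b ** 2), dim=16)    # toy objective
print(np.round(best_bias, 3))
```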


Subject(s)
Algorithms; Amputees; Intention; Knee Prosthesis; Machine Learning; Humans; Amputees/rehabilitation; Male; Logistic Models; Locomotion/physiology; Walking; Femur; Pattern Recognition, Automated/methods; Adult
20.
Sensors (Basel); 24(6), 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38544240

ABSTRACT

Radio frequency (RF) technology has been applied to enable advanced behavioral sensing in human-computer interaction, owing to its device-free sensing capability and wide availability on Internet of Things devices. Enabling finger-gesture-based identification with high accuracy can be challenging due to low RF signal resolution and user heterogeneity. In this paper, we propose MeshID, a novel RF-based user identification scheme that enables identification through finger gestures with high accuracy. MeshID significantly improves sensing sensitivity to RF signal interference, and is hence able to extract subtle individual biometrics through velocity distribution profiling (VDP) features from less-distinct finger motions such as drawing digits in the air. We design an efficient few-shot model retraining framework based on a first component reverse module, achieving high model robustness and performance in complex environments. We conduct comprehensive real-world experiments, and the results show that MeshID achieves a user identification accuracy of 95.17% on average in three indoor environments. The results indicate that MeshID outperforms the state-of-the-art in identification performance with lower cost.


Subject(s)
Algorithms; Gestures; Humans; Pattern Recognition, Automated/methods; Fingers; Motion