Results 1 - 20 of 57
1.
Front Hum Neurosci ; 18: 1391531, 2024.
Article in English | MEDLINE | ID: mdl-39099602

ABSTRACT

Hand gestures are a natural and intuitive form of communication, and integrating this communication method into robotic systems presents significant potential to improve human-robot collaboration. Recent advances in motor neuroscience have focused on replicating human hand movements from synergies, also known as movement primitives. Synergies, the fundamental building blocks of movement, serve as a potential strategy adopted by the central nervous system to generate and control movements. Identifying how synergies contribute to movement can help in the dexterous control of robots, exoskeletons, and prosthetics, and can extend their applications to rehabilitation. In this paper, 33 static hand gestures were recorded through a single RGB camera and identified in real time through the MediaPipe framework as participants made various postures with their dominant hand. Assuming an open palm as the initial posture, uniform joint angular velocities were obtained for all these gestures. By applying a dimensionality reduction method, kinematic synergies were obtained from these joint angular velocities. Kinematic synergies that explain 98% of the movement variance were used to reconstruct new hand gestures using convex optimization. Reconstructed hand gestures and selected kinematic synergies were translated onto a humanoid robot, Mitra, in real time as the participants demonstrated various hand gestures. The results showed that, using only a few kinematic synergies, it is possible to generate various hand gestures with 95.7% accuracy. Furthermore, utilizing low-dimensional synergies to control high-dimensional end effectors holds promise for enabling near-natural human-robot collaboration.
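
The synergy-extraction step described above can be illustrated with a short sketch. It is a minimal example assuming joint angular velocities are already stacked into a samples-by-joints matrix; names and shapes are illustrative, and reconstruction here uses plain least squares rather than the paper's convex optimization.

    # Sketch: extract kinematic synergies with PCA and reconstruct gestures.
    # `velocities` is assumed to be an (n_samples, n_joints) array of joint
    # angular velocities; shapes and names are illustrative only.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    velocities = rng.normal(size=(500, 21))      # placeholder data

    pca = PCA(n_components=0.98)                 # keep 98% of the variance
    scores = pca.fit_transform(velocities)       # synergy activations
    synergies = pca.components_                  # one synergy per row

    # Least-squares reconstruction from the retained synergies.
    reconstructed = scores @ synergies + pca.mean_
    print(synergies.shape[0], "synergies retained")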

2.
Neural Netw ; 179: 106587, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39111160

ABSTRACT

Continuous Sign Language Recognition (CSLR) is the task of converting a sign language video into a gloss sequence. Existing deep-learning-based sign language recognition methods usually rely on large-scale training data and rich supervised information. However, current sign language datasets are limited, and they are annotated only at the sentence level rather than the frame level. Inadequate supervision of sign language data poses a serious challenge for sign language recognition and may result in insufficient training of recognition models. To address these problems, we propose a cross-modal knowledge distillation method for continuous sign language recognition, which contains two teacher models and one student model. One teacher is the Sign2Text dialogue teacher model, which takes a sign language video and a dialogue sentence as input and outputs the sign language recognition result. The other is the Text2Gloss translation teacher model, which aims to translate a text sentence into a gloss sequence. Both teacher models provide information-rich soft labels to assist the training of the student model, which is a general sign language recognition model. We conduct extensive experiments on multiple commonly used sign language datasets, i.e., PHOENIX 2014T, CSL-Daily, and QSL. The results show that the proposed cross-modal knowledge distillation method can effectively improve sign language recognition accuracy by transferring multi-modal information from the teacher models to the student model. Code is available at https://github.com/glq-1992/cross-modal-knowledge-distillation_new.
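
The two-teacher soft-label idea can be sketched as a single loss term. This is an assumption-laden illustration (temperature, loss weights, and output shapes are not taken from the paper), and in practice it would be combined with the usual hard-label loss on gloss annotations.

    # Sketch: distill soft labels from two teachers into one student.
    # Logits are per-frame gloss scores; tau and the weights are assumptions.
    import torch
    import torch.nn.functional as F

    def two_teacher_kd_loss(student_logits, t1_logits, t2_logits,
                            tau=2.0, w1=0.5, w2=0.5):
        log_p_s = F.log_softmax(student_logits / tau, dim=-1)
        p_t1 = F.softmax(t1_logits / tau, dim=-1)
        p_t2 = F.softmax(t2_logits / tau, dim=-1)
        kd1 = F.kl_div(log_p_s, p_t1, reduction="batchmean") * tau ** 2
        kd2 = F.kl_div(log_p_s, p_t2, reduction="batchmean") * tau ** 2
        return w1 * kd1 + w2 * kd2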

3.
Sensors (Basel) ; 24(14)2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39066011

ABSTRACT

The aim of this study is to develop a practical software solution for real-time recognition of sign language words using two arms. This will facilitate communication between hearing-impaired individuals and those who can hear. We are aware of several sign language recognition systems developed using different technologies, including cameras, armbands, and gloves. However, the system we propose in this study stands out for its practicality, utilizing surface electromyography (muscle activity) and inertial measurement unit (motion dynamics) data from both arms. We address the drawbacks of other methods, such as high costs, low accuracy due to ambient light and obstacles, and complex hardware requirements, which have limited their practical application. Our software can run on different operating systems using digital signal processing and machine learning methods specific to this study. For the test, we created a dataset of 80 words based on their frequency of use in daily life and performed a thorough feature extraction process. We tested the recognition performance using various classifiers and parameters and compared the results. The random forest algorithm emerged as the most successful, achieving a remarkable 99.875% accuracy, while the naïve Bayes algorithm had the lowest success rate with 87.625% accuracy. The new system promises to significantly improve communication for people with hearing disabilities and ensures seamless integration into daily life without compromising user comfort or lifestyle quality.
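
A minimal sketch of this kind of pipeline is shown below: window-level sEMG/IMU features fed to a random forest. The feature set and dimensions are common defaults for illustration, not the study's exact processing chain.

    # Sketch: classify sign words from windowed sEMG + IMU features with a
    # random forest. Features and sizes are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def window_features(window):
        """window: (n_samples, n_channels) raw sEMG/IMU segment."""
        mav = np.mean(np.abs(window), axis=0)                   # mean absolute value
        rms = np.sqrt(np.mean(window ** 2, axis=0))             # root mean square
        wl = np.sum(np.abs(np.diff(window, axis=0)), axis=0)    # waveform length
        return np.concatenate([mav, rms, wl])

    rng = np.random.default_rng(0)
    X = rng.normal(size=(800, 48))            # one feature vector per gesture window
    y = rng.integers(0, 80, size=800)         # 80 word labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("accuracy:", clf.score(X_te, y_te))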


Subject(s)
Algorithms, Electromyography, Sign Language, Wearable Electronic Devices, Humans, Electromyography/methods, Electromyography/instrumentation, Machine Learning, Computer-Assisted Signal Processing, Adult, Male, Female, Bayes Theorem
4.
ACS Appl Mater Interfaces ; 16(29): 38780-38791, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39010653

ABSTRACT

Flexible strain sensors have been widely researched in fields such as smart wearables, human health monitoring, and biomedical applications. However, simultaneously achieving a wide sensing range and high sensitivity in flexible strain sensors remains a challenge, limiting their further applications. To address these issues, a cross-scale combinatorial bionic hierarchical design featuring microscale morphology combined with a macroscale base is presented to balance the sensing range and sensitivity. Inspired by the combination of serpentine and butterfly wing structures, this study employs three-dimensional printing, prestretching, and mold transfer processes to construct a combinatorial bionic hierarchical flexible strain sensor (CBH-sensor) with serpentine-shaped inverted-V-groove/wrinkling-cracking structures. The CBH-sensor offers both a wide sensing range of 150% and high sensitivity, with a gauge factor of up to 2416.67. In addition, the CBH-sensor array is demonstrated in sign language gesture recognition, successfully identifying nine different sign language gestures with an impressive accuracy of 100% with the assistance of machine learning. The CBH-sensor exhibits considerable promise for enabling unobstructed communication between individuals who use sign language and those who do not. Furthermore, it has wide-ranging possibilities for gesture-driven interactions in human-computer interfaces.


Subject(s)
Machine Learning, Sign Language, Humans, Bionics, Wearable Electronic Devices, Gestures, Three-Dimensional Printing
5.
Front Artif Intell ; 7: 1297347, 2024.
Article in English | MEDLINE | ID: mdl-38957453

ABSTRACT

Addressing the increasing demand for accessible sign language learning tools, this paper introduces an innovative Machine Learning-Driven Web Application dedicated to Sign Language Learning. This web application represents a significant advancement in sign language education. Unlike traditional approaches, the application's unique methodology involves assigning users different words to spell. Users are tasked with signing each letter of the word, earning a point upon correctly signing the entire word. The paper delves into the development, features, and the machine learning framework underlying the application. Developed using HTML, CSS, JavaScript, and Flask, the web application seamlessly accesses the user's webcam for a live video feed, displaying the model's predictions on-screen to facilitate interactive practice sessions. The primary aim is to provide a learning platform for those who are not familiar with sign language, offering them the opportunity to acquire this essential skill and fostering inclusivity in the digital age.
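
A hypothetical Flask endpoint of the kind such an application might expose is sketched below: the browser posts a webcam frame and receives the predicted letter as JSON. The route name, payload format, and placeholder model are assumptions, not the authors' code.

    # Sketch: minimal Flask route returning a letter prediction for an
    # uploaded webcam frame. All names here are illustrative assumptions.
    import io

    import numpy as np
    from flask import Flask, jsonify, request
    from PIL import Image

    app = Flask(__name__)
    LABELS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]

    def predict_letter(frame: np.ndarray) -> str:
        # Placeholder for the trained sign-classification model.
        return LABELS[int(frame.mean()) % len(LABELS)]

    @app.route("/predict", methods=["POST"])
    def predict():
        image = Image.open(io.BytesIO(request.files["frame"].read())).convert("RGB")
        return jsonify({"prediction": predict_letter(np.asarray(image))})

    if __name__ == "__main__":
        app.run(debug=True)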

6.
PeerJ Comput Sci ; 10: e2054, 2024.
Article in English | MEDLINE | ID: mdl-38855212

ABSTRACT

This article presents an innovative approach for the task of isolated sign language recognition (SLR); this approach centers on the integration of pose data with motion history images (MHIs) derived from these data. Our research combines spatial information obtained from body, hand, and face poses with the comprehensive details provided by three-channel MHI data concerning the temporal dynamics of the sign. Particularly, our developed finger pose-based MHI (FP-MHI) feature significantly enhances the recognition success, capturing the nuances of finger movements and gestures, unlike existing approaches in SLR. This feature improves the accuracy and reliability of SLR systems by more accurately capturing the fine details and richness of sign language. Additionally, we enhance the overall model accuracy by predicting missing pose data through linear interpolation. Our study, based on the randomized leaky rectified linear unit (RReLU) enhanced ResNet-18 model, successfully handles the interaction between manual and non-manual features through the fusion of extracted features and classification with a support vector machine (SVM). This innovative integration demonstrates competitive and superior results compared to current methodologies in the field of SLR across various datasets, including BosphorusSign22k-general, BosphorusSign22k, LSA64, and GSL, in our experiments.
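
The motion history image idea can be sketched in a few lines: recent motion is written at full intensity and older motion decays. The mask construction and decay constant below are assumptions for illustration, not the FP-MHI recipe itself.

    # Sketch: build a motion history image (MHI) from per-frame motion masks.
    # tau and decay are illustrative; masks would come from pose keypoints.
    import numpy as np

    def motion_history(masks, tau=255, decay=15):
        """masks: iterable of (H, W) binary arrays, one per frame."""
        mhi = None
        for mask in masks:
            if mhi is None:
                mhi = np.zeros_like(mask, dtype=np.float32)
            mhi = np.where(mask > 0, tau, np.maximum(mhi - decay, 0))
        return mhi.astype(np.uint8)

    # Example: ten frames of a small blob moving left to right.
    frames = []
    for t in range(10):
        m = np.zeros((64, 64), dtype=np.uint8)
        m[20:30, 5 + 5 * t:15 + 5 * t] = 1
        frames.append(m)
    print(motion_history(frames).max())   # most recent motion stays brightest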

7.
Sensors (Basel) ; 24(11)2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38894473

ABSTRACT

Sign language is an essential means of communication for individuals with hearing disabilities. However, there is a significant shortage of sign language interpreters in some languages, especially in Saudi Arabia. This shortage results in a large proportion of the hearing-impaired population being deprived of services, especially in public places. This paper aims to address this gap in accessibility by leveraging technology to develop systems capable of recognizing Arabic Sign Language (ArSL) using deep learning techniques. In this paper, we propose a hybrid model to capture the spatio-temporal aspects of sign language (i.e., letters and words). The hybrid model consists of a Convolutional Neural Network (CNN) classifier to extract spatial features from sign language data and a Long Short-Term Memory (LSTM) classifier to extract spatial and temporal characteristics from sequential data (i.e., hand movements). To demonstrate the feasibility of our proposed hybrid model, we created an ArSL dataset of 20 words: 4000 images for the 10 static gesture words and 500 videos for the 10 dynamic gesture words. Our proposed hybrid model demonstrates promising performance, with the CNN and LSTM classifiers achieving accuracy rates of 94.40% and 82.70%, respectively. These results indicate that our approach can significantly enhance communication accessibility for the hearing-impaired community in Saudi Arabia. Thus, this paper represents a major step toward promoting inclusivity and improving the quality of life for the hearing impaired.
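
The two branches of such a hybrid can be sketched in Keras as below. Input sizes, layer widths, and the per-frame feature representation fed to the LSTM are assumptions, not the paper's exact architecture.

    # Sketch: CNN branch for static gesture images and LSTM branch for
    # dynamic gesture sequences. All dimensions are illustrative.
    from tensorflow.keras import layers, models

    cnn = models.Sequential([
        layers.Input((64, 64, 3)),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),     # 10 static gesture words
    ])

    lstm = models.Sequential([
        layers.Input((30, 128)),                    # 30 frames of per-frame features
        layers.LSTM(64),
        layers.Dense(10, activation="softmax"),     # 10 dynamic gesture words
    ])

    for m in (cnn, lstm):
        m.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])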


Subject(s)
Deep Learning, Neural Networks (Computer), Sign Language, Humans, Saudi Arabia, Language, Gestures
8.
J Imaging ; 10(6)2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38921626

ABSTRACT

Sign language recognition technology can help people with hearing impairments communicate with non-hearing-impaired people. At present, with the rapid development of society, deep learning also provides technical support for sign language recognition work. In sign language recognition tasks, the traditional convolutional neural networks used to extract spatio-temporal features from sign language videos suffer from insufficient feature extraction, resulting in low recognition rates. Moreover, large video-based sign language datasets require significant computing resources for training while ensuring the generalization of the network, which poses a challenge for recognition. In this paper, we present a video-based sign language recognition method based on Residual Network (ResNet) and Long Short-Term Memory (LSTM). As the number of network layers increases, the ResNet network can effectively alleviate gradient degradation and obtain better time-series features. We use the ResNet convolutional network as the backbone model. LSTM uses the concept of gates to control unit states and update the output feature values of sequences. ResNet extracts the sign language features; the learned feature space is then used as the input of the LSTM network to obtain long-sequence features. This combination effectively extracts the spatio-temporal features in sign language videos and improves the recognition rate of sign language actions. An extensive experimental evaluation demonstrates the effectiveness and superior performance of the proposed method, with an accuracy of 85.26%, an F1-score of 84.98%, and a precision of 87.77% on Argentine Sign Language (LSA64).

9.
Sensors (Basel) ; 24(10)2024 May 14.
Article in English | MEDLINE | ID: mdl-38793964

ABSTRACT

Deaf and hard-of-hearing people mainly communicate using sign language, which is a set of signs made using hand gestures combined with facial expressions to make meaningful and complete sentences. The problem that faces deaf and hard-of-hearing people is the lack of automatic tools that translate sign languages into written or spoken text, which has led to a communication gap between them and their communities. Most state-of-the-art vision-based sign language recognition approaches focus on translating non-Arabic sign languages, with few targeting the Arabic Sign Language (ArSL) and even fewer targeting the Saudi Sign Language (SSL). This paper proposes a mobile application that helps deaf and hard-of-hearing people in Saudi Arabia to communicate efficiently with their communities. The prototype is an Android-based mobile application that applies deep learning techniques to translate isolated SSL to text and audio and includes unique features that are not available in other related applications targeting ArSL. The proposed approach, when evaluated on a comprehensive dataset, has demonstrated its effectiveness by outperforming several state-of-the-art approaches and producing results that are comparable to these approaches. Moreover, testing the prototype on several deaf and hard-of-hearing users, in addition to hearing users, proved its usefulness. In the future, we aim to improve the accuracy of the model and enrich the application with more features.


Subject(s)
Deep Learning, Sign Language, Humans, Saudi Arabia, Mobile Applications, Deafness/physiopathology, Persons With Hearing Impairments
10.
Data Brief ; 51: 109797, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38075609

ABSTRACT

Sign language is a communication medium for people with speech and hearing disabilities. It takes various forms with patterns that are difficult for the general public to comprehend. Bengali Sign Language (BdSL) is among the more difficult sign languages due to its large number of alphabetic signs, words, and expression techniques. Machine translation can ease the difficulty for disabled people of communicating with the general public. From the machine learning (ML) domain, computer vision can provide the solution, and every ML solution requires an optimized model and a proper dataset. Therefore, in this research work, we created a BdSL dataset named 'KU-BdSL', which consists of 30 classes describing 38 consonants ('banjonborno') of the Bengali alphabet. The dataset includes 1500 images of hand signs in total, each representing Bengali consonant(s). Thirty-nine participants (30 males and 9 females) of different ages (21-38 years) took part in the creation of this dataset. We used smartphones to capture the images because of the availability of their high-definition cameras. We believe that this dataset can be beneficial to the deaf and hard-of-hearing community. Identification of Bengali consonants of BdSL from images or videos is feasible using the dataset. It can also be employed in a human-machine interface for disabled people. In the future, we will work on the vowel and word levels of BdSL.

11.
Data Brief ; 51: 109799, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38075615

ABSTRACT

Sign Language Recognition (SLR) is crucial for enabling communication between the deaf-mute and hearing communities. Nevertheless, the development of a comprehensive sign language dataset is a challenging task due to the complexity and variations in hand gestures. This challenge is particularly evident in the case of Bangla Sign Language (BdSL), where the limited availability of depth datasets impedes accurate recognition. To address this issue, we propose BdSL47, an open-access depth dataset for 47 one-handed static signs (10 digits, from ০ to ৯; and 37 letters, from অ to ँ) of BdSL. The dataset was created using the MediaPipe framework for extracting depth information. To classify the signs, we developed an Artificial Neural Network (ANN) model with a 63-node input layer, a 47-node output layer, and 4 hidden layers that included dropout in the last two hidden layers, an Adam optimizer, and a ReLU activation function. Based on the selected hyperparameters, the proposed ANN model effectively learns the spatial relationships and patterns from the depth-based gestural input features and gives an F1 score of 97.84 %, indicating the effectiveness of the approach compared to the baselines provided. The availability of BdSL47 as a comprehensive dataset can have an impact on improving the accuracy of SLR for BdSL using more advanced deep-learning models.
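
The described topology translates almost directly into Keras. In the sketch below, the hidden-layer widths and dropout rate are assumptions, while the 63-node input, four hidden ReLU layers with dropout on the last two, 47-way output, and Adam optimizer follow the abstract.

    # Sketch of the described ANN: 63 inputs (21 MediaPipe hand landmarks
    # x 3 coordinates), 4 hidden ReLU layers with dropout on the last two,
    # and a 47-way softmax output. Widths and dropout rate are assumptions.
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input((63,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(47, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])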

12.
J Imaging ; 9(11)2023 Oct 25.
Article in English | MEDLINE | ID: mdl-37998082

ABSTRACT

Communication between Deaf and hearing individuals remains a persistent challenge requiring attention to foster inclusivity. Despite notable efforts in the development of digital solutions for sign language recognition (SLR), several issues persist, such as cross-platform interoperability and strategies for tokenizing signs to enable continuous conversations and coherent sentence construction. To address such issues, this paper proposes a non-invasive Portuguese Sign Language (Língua Gestual Portuguesa or LGP) interpretation system-as-a-service, leveraging skeletal posture sequence inference powered by long short-term memory (LSTM) architectures. To address the scarcity of examples during machine learning (ML) model training, dataset augmentation strategies are explored. Additionally, a buffer-based interaction technique is introduced to facilitate the tokenization of LGP terms. This technique provides real-time feedback to users, allowing them to gauge the time remaining to complete a sign, which aids in the construction of grammatically coherent sentences based on inferred terms/words. To support human-like conditioning rules for interpretation, a large language model (LLM) service is integrated. Experiments reveal that LSTM-based neural networks, trained with 50 LGP terms and subjected to data augmentation, achieved accuracy levels ranging from 80% to 95.6%. Users unanimously reported a high level of intuitiveness when using the buffer-based interaction strategy for term/word tokenization. Furthermore, tests with an LLM (specifically ChatGPT) demonstrated promising semantic correlation between generated sentences and the expected sentences.

13.
J Imaging ; 9(11)2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37998085

ABSTRACT

Supervised deep learning models can be optimised by applying regularisation techniques to reduce overfitting, which can prove difficult when fine-tuning the associated hyperparameters. Not all hyperparameters are equal, and understanding the effect each hyperparameter and regularisation technique has on the performance of a given model is of paramount importance in research. We present the first comprehensive, large-scale ablation study for an encoder-only transformer to model sign language using the improved Word-level American Sign Language dataset (WLASL-alt) and human pose estimation keypoint data, with a view to constraining the potential to optimise the task. We measure the impact that a range of model parameter regularisation and data augmentation techniques have on sign classification accuracy. We demonstrate that, within the quoted uncertainties, none of the regularisation techniques we employ other than ℓ2 parameter regularisation have an appreciable positive impact on performance, which we find to be in contradiction to results reported by other similar, albeit smaller-scale, studies. We also demonstrate that performance for this task is bounded by the small dataset size rather than by the choice of model parameter regularisation or common or basic dataset augmentation techniques. Furthermore, using the base model configuration, we report a new maximum top-1 classification accuracy of 84% on 100 signs, thereby improving on the previous benchmark result for this model architecture and dataset.
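
For concreteness, the kind of encoder-only transformer and ℓ2 parameter regularisation discussed above might look as follows in PyTorch; the dimensions, depth, and decay strength are assumptions rather than the study's configuration.

    # Sketch: encoder-only transformer over pose-keypoint sequences, with
    # l2 regularisation applied via the optimiser's weight_decay.
    import torch
    import torch.nn as nn

    class SignEncoder(nn.Module):
        def __init__(self, keypoint_dim=134, d_model=128, num_classes=100):
            super().__init__()
            self.embed = nn.Linear(keypoint_dim, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=4)
            self.head = nn.Linear(d_model, num_classes)

        def forward(self, x):                   # x: (batch, frames, keypoint_dim)
            h = self.encoder(self.embed(x))
            return self.head(h.mean(dim=1))     # temporal average pooling

    model = SignEncoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)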

14.
Sensors (Basel) ; 23(22)2023 Nov 09.
Article in English | MEDLINE | ID: mdl-38005455

ABSTRACT

Sign language recognition, an essential interface between the hearing and deaf-mute communities, faces challenges with high false positive rates and computational costs, even with the use of advanced deep learning techniques. Our proposed solution is a stacked encoded model, combining artificial intelligence (AI) with the Internet of Things (IoT), which refines feature extraction and classification to overcome these challenges. We leverage a lightweight backbone model for preliminary feature extraction and use stacked autoencoders to further refine these features. Our approach harnesses the scalability of big data, showing notable improvement in accuracy, precision, recall, F1-score, and complexity analysis. Our model's effectiveness is demonstrated through testing on the ArSL2018 benchmark dataset, showcasing superior performance compared to state-of-the-art approaches. Additional validation through an ablation study with pre-trained convolutional neural network (CNN) models affirms our model's efficacy across all evaluation metrics. Our work paves the way for the sustainable development of high-performing, IoT-based sign-language-recognition applications.
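
One plausible reading of the stacked-encoder design is sketched below: backbone features are compressed by a stacked autoencoder and the bottleneck codes are classified. The feature size, layer widths, and 32-class output are assumptions for illustration, not the paper's specification.

    # Sketch: refine backbone features with a stacked autoencoder, then
    # classify the bottleneck codes. Sizes are illustrative assumptions.
    from tensorflow.keras import layers, models

    feat_dim = 1280                              # pooled features from a light backbone

    encoder = models.Sequential([
        layers.Input((feat_dim,)),
        layers.Dense(512, activation="relu"),
        layers.Dense(128, activation="relu"),    # bottleneck code
    ])
    decoder = models.Sequential([
        layers.Input((128,)),
        layers.Dense(512, activation="relu"),
        layers.Dense(feat_dim, activation="linear"),
    ])

    autoencoder = models.Sequential([encoder, decoder])
    autoencoder.compile(optimizer="adam", loss="mse")   # fit on (features, features)

    classifier = models.Sequential([encoder, layers.Dense(32, activation="softmax")])
    classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])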


Subject(s)
Artificial Intelligence, Deep Learning, Humans, Machine Learning, Sign Language, Neural Networks (Computer)
15.
Nanotechnology ; 35(8)2023 Dec 08.
Article in English | MEDLINE | ID: mdl-37995377

ABSTRACT

In recent years, the synaptic properties of transistors have been extensively studied. Compared with transistors based on liquid or organic materials, inorganic solid electrolyte-gated transistors have the advantage of better chemical stability. This study uses a simple, low-cost solution technology to prepare In2O3 transistors gated by an AlLiO solid electrolyte. The electrochemical performance of the device is achieved by forming an electric double layer and through electrochemical doping, which can mimic basic functions of biological synapses, such as excitatory postsynaptic current, paired-pulse facilitation, and spike-timing-dependent plasticity. Furthermore, complex synaptic behaviors such as Pavlovian classical conditioning are successfully emulated. An artificial neural network based on these transistors is built to recognize sign language with 95% identification accuracy and enable sign language interpretation. Additionally, the handwritten-digit identification accuracy is 94%. Even with various levels of Gaussian noise, the recognition rate remains above 84%. These findings demonstrate the potential of the In2O3/AlLiO TFT in shaping the next generation of artificial intelligence.

16.
Sensors (Basel) ; 23(19)2023 Oct 09.
Article in English | MEDLINE | ID: mdl-37837173

ABSTRACT

The analysis and recognition of sign languages are currently active fields of research focused on sign recognition. Various approaches differ in terms of analysis methods and the devices used for sign acquisition. Traditional methods rely on video analysis or spatial positioning data calculated using motion capture tools. In contrast to these conventional recognition and classification approaches, electromyogram (EMG) signals, which measure muscle electrical activity, offer potential technology for detecting gestures. These EMG-based approaches have recently gained attention due to their advantages. This prompted us to conduct a comprehensive study on the methods, approaches, and projects utilizing EMG sensors for sign language handshape recognition. In this paper, we provided an overview of the sign language recognition field through a literature review, with the objective of offering an in-depth review of the most significant techniques. These techniques were categorized in this article based on their respective methodologies. The survey discussed the progress and challenges in sign language recognition systems based on surface electromyography (sEMG) signals. These systems have shown promise but face issues like sEMG data variability and sensor placement. Multiple sensors enhance reliability and accuracy. Machine learning, including deep learning, is used to address these challenges. Common classifiers in sEMG-based sign language recognition include SVM, ANN, CNN, KNN, HMM, and LSTM. While SVM and ANN are widely used, random forest and KNN have shown better performance in some cases. A multilayer perceptron neural network achieved perfect accuracy in one study. CNN, often paired with LSTM, ranks as the third most popular classifier and can achieve exceptional accuracy, reaching up to 99.6% when utilizing both EMG and IMU data. LSTM is highly regarded for handling sequential dependencies in EMG signals, making it a critical component of sign language recognition systems. In summary, the survey highlights the prevalence of SVM and ANN classifiers but also suggests the effectiveness of alternative classifiers like random forests and KNNs. LSTM emerges as the most suitable algorithm for capturing sequential dependencies and improving gesture recognition in EMG-based sign language recognition systems.


Subject(s)
Automated Pattern Recognition, Sign Language, Humans, Reproducibility of Results, Automated Pattern Recognition/methods, Neural Networks (Computer), Algorithms, Electromyography/methods, Gestures
17.
Sensors (Basel) ; 23(15)2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37571476

ABSTRACT

Finding ways to enable seamless communication between deaf and able-bodied individuals has been a challenging and pressing issue. This paper proposes a solution to this problem by designing a low-cost data glove that utilizes multiple inertial sensors to achieve efficient and accurate sign language recognition. In this study, four machine learning models were employed to recognize 20 different types of dynamic sign language data used by deaf individuals: decision tree (DT), support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF). Additionally, an attention-based bidirectional long short-term memory network (Attention-BiLSTM) was employed. Furthermore, this study verifies the impact of the number and position of data glove nodes on the accuracy of recognizing complex dynamic sign language. Finally, the proposed method is compared with existing state-of-the-art algorithms using nine public datasets. The results indicate that the Attention-BiLSTM and RF algorithms achieve the highest performance in recognizing the twenty dynamic sign language gestures, with accuracies of 98.85% and 97.58%, respectively. This provides evidence for the feasibility of our proposed data glove and recognition methods. This study may serve as a valuable reference for the development of wearable sign language recognition devices and promote easier communication between deaf and able-bodied individuals.
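
A minimal Attention-BiLSTM of the kind described can be sketched in PyTorch; the input feature size, hidden width, and attention form are assumptions rather than the paper's exact model.

    # Sketch: BiLSTM with additive attention pooling over data-glove
    # sensor sequences. Dimensions are illustrative assumptions.
    import torch
    import torch.nn as nn

    class AttentionBiLSTM(nn.Module):
        def __init__(self, in_dim=30, hidden=64, num_classes=20):
            super().__init__()
            self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
            self.attn = nn.Linear(2 * hidden, 1)
            self.head = nn.Linear(2 * hidden, num_classes)

        def forward(self, x):                            # x: (batch, time, in_dim)
            h, _ = self.lstm(x)                          # (batch, time, 2*hidden)
            weights = torch.softmax(self.attn(h), dim=1) # attention over time steps
            context = (weights * h).sum(dim=1)           # weighted temporal pooling
            return self.head(context)

    logits = AttentionBiLSTM()(torch.randn(8, 100, 30))  # 8 gestures, 100 time steps
    print(logits.shape)                                  # torch.Size([8, 20])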


Subject(s)
Sign Language, Wearable Electronic Devices, Humans, Speech, Algorithms, Hearing
18.
Sensors (Basel) ; 23(12)2023 Jun 14.
Article in English | MEDLINE | ID: mdl-37420722

ABSTRACT

Hand gesture recognition (HGR) is a crucial area of research that enhances communication by overcoming language barriers and facilitating human-computer interaction. Although previous works in HGR have employed deep neural networks, they fail to encode the orientation and position of the hand in the image. To address this issue, this paper proposes HGR-ViT, a Vision Transformer (ViT) model with an attention mechanism for hand gesture recognition. A hand gesture image is first split into fixed-size patches, which are projected into patch embeddings. Positional embeddings are added to these embeddings to form learnable vectors that capture the positional information of the hand patches. The resulting sequence of vectors then serves as the input to a standard Transformer encoder to obtain the hand gesture representation. A multilayer perceptron head is added to the output of the encoder to classify the hand gesture into the correct class. The proposed HGR-ViT obtains accuracies of 99.98%, 99.36%, and 99.85% on the American Sign Language (ASL) dataset, the ASL with Digits dataset, and the National University of Singapore (NUS) hand gesture dataset, respectively.
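
The patch-embedding, positional-embedding, encoder, and MLP-head pipeline reads almost directly as code. The sketch below is a generic, much smaller ViT-style classifier (patch size, width, depth, and class count are assumptions), not HGR-ViT itself.

    # Sketch: minimal ViT-style classifier for hand-gesture images.
    # All hyperparameters here are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TinyViT(nn.Module):
        def __init__(self, img=224, patch=16, dim=192, depth=4, heads=3, classes=26):
            super().__init__()
            n_patches = (img // patch) ** 2
            self.to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
            self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
            layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, depth)
            self.mlp_head = nn.Linear(dim, classes)

        def forward(self, x):                                   # x: (B, 3, H, W)
            p = self.to_patches(x).flatten(2).transpose(1, 2)   # (B, n_patches, dim)
            h = self.encoder(p + self.pos)                      # add positional embedding
            return self.mlp_head(h.mean(dim=1))                 # pool instead of a CLS token

    print(TinyViT()(torch.randn(2, 3, 224, 224)).shape)          # torch.Size([2, 26])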


Subject(s)
Gestures, Automated Pattern Recognition, Humans, Automated Pattern Recognition/methods, Neural Networks (Computer), Upper Extremity, Sign Language, Hand
19.
Sensors (Basel) ; 23(14)2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37514679

ABSTRACT

This article addresses the problem of converting sign language into coherent text with intonation markup for subsequent speech synthesis of the signed phrases with appropriate intonation. The paper proposes an improved method for continuous sign language recognition, the results of which are passed to a natural language processor based on analyzers of the morphology, syntax, and semantics of the Kazakh language, including morphological inflection and the construction of an intonation model for simple sentences. This approach has significant practical and social value, as it can lead to the development of technologies that help people with disabilities to communicate and improve their quality of life. Cross-validation of the model yielded an average test accuracy of 0.97 and an average validation accuracy of 0.90. We also identified 20 sentence structures of the Kazakh language together with their intonation models.


Subject(s)
Speech Perception, Speech, Humans, Sign Language, Quality of Life, Language
20.
Sensors (Basel) ; 23(13)2023 Jun 21.
Article in English | MEDLINE | ID: mdl-37447625

ABSTRACT

Deaf and hearing-impaired people always face communication barriers. Non-invasive surface electromyography (sEMG) sensor-based sign language recognition (SLR) technology can help them to better integrate into social life. Since the traditional tandem convolutional neural network (CNN) structure used in most CNN-based studies inadequately captures the features of the input data, we propose a novel inception architecture with a residual module and dilated convolution (IRDC-net) to enlarge the receptive fields and enrich the feature maps, applying it to SLR tasks for the first time. This work first transformed the time domain signal into a time-frequency domain using discrete Fourier transformation. Second, an IRDC-net was constructed to recognize ten Chinese sign language signs. Third, the tandem CNN networks VGG-net and ResNet-18 were compared with our proposed parallel structure network, IRDC-net. Finally, the public dataset Ninapro DB1 was utilized to verify the generalization performance of the IRDC-net. The results showed that after transforming the time domain sEMG signal into the time-frequency domain, the classification accuracy (acc) increased from 84.29% to 91.70% when using the IRDC-net on our sign language dataset. Furthermore, for the time-frequency information of the public dataset Ninapro DB1, the classification accuracy reached 89.82%; this value is higher than that achieved in other recent studies. As such, our findings contribute to research into SLR tasks and to improving deaf and hearing-impaired people's daily lives.
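
The time-frequency step and the parallel (inception-style) dilated convolution can be sketched as follows; the window length, channel counts, and dilation rates are assumptions, not the IRDC-net's actual settings.

    # Sketch: STFT of one sEMG channel, then a small inception-style block
    # mixing ordinary and dilated convolutions with a residual connection.
    import numpy as np
    import torch
    import torch.nn as nn
    from scipy.signal import stft

    fs = 1000                                       # sampling rate (Hz), assumed
    emg = np.random.randn(4 * fs)                   # 4 s of one sEMG channel
    _, _, Z = stft(emg, fs=fs, nperseg=128, noverlap=64)
    tf_image = np.abs(Z)                            # (freq_bins, time_frames)

    class InceptionDilated(nn.Module):
        def __init__(self, in_ch=1, out_ch=16):
            super().__init__()
            self.branch1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
            self.branch2 = nn.Conv2d(in_ch, out_ch, 3, padding=2, dilation=2)
            self.proj = nn.Conv2d(in_ch, 2 * out_ch, 1)          # residual projection

        def forward(self, x):
            y = torch.cat([self.branch1(x), self.branch2(x)], dim=1)
            return torch.relu(y + self.proj(x))

    x = torch.tensor(tf_image, dtype=torch.float32)[None, None]  # (1, 1, F, T)
    print(InceptionDilated()(x).shape)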


Subject(s)
Automated Pattern Recognition, Sign Language, Humans, Electromyography/methods, Automated Pattern Recognition/methods, Neural Networks (Computer), Recognition (Psychology)