Results 1 - 12 of 12
1.
Bioengineering (Basel) ; 10(10)2023 Sep 27.
Article in English | MEDLINE | ID: mdl-37892863

ABSTRACT

Human skeleton data obtained using a depth camera have been used for pathological gait recognition to support clinicians' diagnostic decisions. Most studies on skeleton-based pathological gait recognition have used either raw skeleton sequences directly or gait features, such as gait parameters and joint angles, extracted from raw skeleton sequences. We hypothesize that using the skeleton, joint angles, and gait parameters together can improve recognition performance. This study aims to develop a deep neural network model that effectively combines different types of input data. We propose a hybrid deep neural network framework composed of a graph convolutional network, a recurrent neural network, and an artificial neural network to effectively encode skeleton sequences, joint angle sequences, and gait parameters, respectively. The features extracted from the three input data types are fused and fed into the final classification layer. We evaluate the proposed model on two different skeleton datasets (a simulated pathological gait dataset and a vestibular disorder gait dataset) that were collected using an Azure Kinect. The proposed model, with multiple types of input, improved pathological gait recognition performance compared with single-input models on both datasets. Furthermore, it achieved the best performance among the state-of-the-art models for skeleton-based action recognition.
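
A minimal sketch of the three-branch fusion idea, assuming PyTorch; the HybridGaitNet name, layer sizes, and single-step graph convolution are illustrative stand-ins, not the authors' exact architecture:

import torch
import torch.nn as nn

class HybridGaitNet(nn.Module):
    def __init__(self, n_joints=32, n_angles=8, n_params=6, n_classes=6, adj=None):
        super().__init__()
        # Graph branch: one graph-convolution step (A X W) per frame,
        # a stand-in for a full GCN over the skeleton graph.
        self.adj = adj if adj is not None else torch.eye(n_joints)
        self.graph_w = nn.Linear(3, 64)
        # Recurrent branch for joint-angle sequences.
        self.gru = nn.GRU(n_angles, 64, batch_first=True)
        # Dense branch for scalar gait parameters.
        self.mlp = nn.Sequential(nn.Linear(n_params, 64), nn.ReLU())
        self.classifier = nn.Linear(64 * 3, n_classes)

    def forward(self, skel, angles, params):
        # skel: (B, T, J, 3); angles: (B, T, n_angles); params: (B, n_params)
        g = torch.relu(self.graph_w(torch.einsum('jk,btkc->btjc', self.adj, skel)))
        g = g.mean(dim=(1, 2))               # pool over time and joints
        _, h = self.gru(angles)              # final hidden state
        fused = torch.cat([g, h[-1], self.mlp(params)], dim=1)
        return self.classifier(fused)

logits = HybridGaitNet()(torch.randn(2, 60, 32, 3),
                         torch.randn(2, 60, 8), torch.randn(2, 6))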

2.
Sensors (Basel) ; 23(4)2023 Feb 17.
Article in English | MEDLINE | ID: mdl-36850876

ABSTRACT

With the development of wearable devices such as smartwatches, several studies have been conducted on the recognition of various human activities. Various types of data are used, e.g., acceleration data collected using an inertial measurement unit sensor. Most studies segment the entire time-series data with a fixed window size before performing recognition. However, this approach limits performance because the execution time of a human activity is usually unknown. Therefore, many attempts have been made to solve this problem by sliding the classification window along the time axis. In this study, we propose a method that classifies every frame rather than a window-based recognition method. For implementation, features extracted using multiple convolutional neural networks with different kernel sizes were fused. In addition, similar to the convolutional block attention module, an attention layer is applied at the channel and spatial levels to improve recognition performance. To verify the performance of the proposed model and prove the effectiveness of the proposed method for human activity recognition, evaluation experiments were performed. For comparison, we applied models built from various basic deep learning modules, as well as models that classify all frames to recognize a specific wave in electrocardiography data. The proposed model reported the best F1-score (over 0.9) for all target activities compared with the other deep learning-based recognition models. Furthermore, to verify the improvement of the proposed CEF method, it was compared with three types of sliding-window (SW) methods; the proposed method's F1-score was 0.154 higher than that of the SW methods, and with the designed model the margin was 0.184.
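
A minimal sketch of per-frame classification with multi-kernel 1-D convolutions and a squeeze-excite-style channel gate, assuming PyTorch; the FrameClassifier name, channel counts, and kernel sizes are illustrative, and the spatial half of the CBAM-style attention is omitted for brevity:

import torch
import torch.nn as nn

class FrameClassifier(nn.Module):
    def __init__(self, in_ch=3, n_classes=5):
        super().__init__()
        # Parallel convolutions with different kernel sizes, fused by concatenation.
        self.convs = nn.ModuleList([
            nn.Conv1d(in_ch, 32, k, padding=k // 2) for k in (3, 5, 7)])
        # Channel attention: global pooling -> bottleneck -> sigmoid gate.
        self.att = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Conv1d(96, 24, 1), nn.ReLU(),
            nn.Conv1d(24, 96, 1), nn.Sigmoid())
        self.head = nn.Conv1d(96, n_classes, 1)   # one label per frame

    def forward(self, x):                          # x: (B, 3, T) accelerometer
        f = torch.cat([c(x) for c in self.convs], dim=1)
        f = f * self.att(f)                        # reweight channels
        return self.head(f)                        # (B, n_classes, T)

frame_logits = FrameClassifier()(torch.randn(2, 3, 200))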


Subjects
Deep Learning, Humans, Semantics, Acceleration, Human Activities, Attention
3.
Sensors (Basel) ; 22(21)2022 Oct 25.
Article in English | MEDLINE | ID: mdl-36365858

ABSTRACT

Among existing wireless and wearable indoor pedestrian tracking solutions, ultra-wideband (UWB) and inertial measurement unit (IMU) sensors are popular options, the former for its accurate and globally referenced positioning and the latter for its low cost and compact size. However, UWB position accuracy is compromised by indoor non-line-of-sight (NLOS) conditions, and IMU estimation suffers from orientation drift as well as requiring position initialization. To overcome these limitations, this paper proposes a low-cost, foot-placed UWB and IMU fusion-based indoor pedestrian tracking system. Our data fusion model is an improved loosely coupled Kalman filter that includes valid-UWB-observation detection. In this manner, the proposed system not only corrects the consumer-grade IMU's accumulated drift but also filters out NLOS instances in the UWB observations. We validated the performance of the proposed system in two experimental scenarios in a complex indoor environment. The root mean square (RMS) positioning accuracy of our data fusion model is enhanced by 60%, 53%, and 27% compared with IMU-based pedestrian dead reckoning, the raw UWB position, and a conventional fusion model, respectively, in the single-lap NLOS scenario, and by 70%, 34%, and 12%, respectively, in the multi-lap LOS+NLOS scenario.
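
A minimal sketch of the valid-observation idea, assuming NumPy: a constant-velocity Kalman filter applies a UWB position update only when the innovation passes a chi-square gate, so NLOS outliers are skipped. The matrices, noise levels, and gate threshold are illustrative, not the paper's tuning:

import numpy as np

dt = 0.1
F = np.block([[np.eye(2), dt * np.eye(2)], [np.zeros((2, 2)), np.eye(2)]])
H = np.hstack([np.eye(2), np.zeros((2, 2))])    # UWB measures position only
Q = 0.01 * np.eye(4)                            # process noise (IMU drift)
R = 0.05 * np.eye(2)                            # UWB measurement noise
GATE = 9.21                                     # chi-square gate, 2 dof, ~99%

def step(x, P, z_uwb):
    x, P = F @ x, F @ P @ F.T + Q               # predict
    y = z_uwb - H @ x                           # innovation
    S = H @ P @ H.T + R
    if y @ np.linalg.solve(S, y) < GATE:        # accept only plausible fixes
        K = P @ H.T @ np.linalg.inv(S)
        x, P = x + K @ y, (np.eye(4) - K @ H) @ P
    return x, P                                 # NLOS outliers leave x unchanged

x, P = step(np.zeros(4), np.eye(4), np.array([0.3, -0.1]))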


Subjects
Pedestrians, Humans, Algorithms
4.
Sensors (Basel) ; 22(20)2022 Oct 12.
Article in English | MEDLINE | ID: mdl-36298089

ABSTRACT

Speech is a commonly used interaction-recognition technique in edutainment-based systems and is a key technology for smooth educational learning and user-system interaction. However, its application to real environments is limited owing to various noise disruptions. In this study, we propose a multimodal interaction system based on audio and visual information that makes speech-driven virtual aquarium systems robust to ambient noise. For audio-based speech recognition, the list of words recognized by a speech API is expressed as word vectors using a pretrained model. Meanwhile, vision-based speech recognition uses a composite end-to-end deep neural network. The vectors derived from the API and vision are then concatenated and classified. The signal-to-noise ratio of the proposed system was determined based on data from four types of noise environments. Furthermore, the system was tested for accuracy and efficiency against existing single-mode strategies for extracting visual features and audio speech recognition. Its average recognition rate was 91.42% when only speech was used, and improved by 6.7% to 98.12% when audio and visual information were combined. This method can be helpful in various real-world settings where speech recognition is regularly used, such as cafés, museums, music halls, and kiosks.
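
A minimal sketch of the late-fusion step, assuming PyTorch; the AVFusion name, feature dimensions, and word inventory size are hypothetical:

import torch
import torch.nn as nn

class AVFusion(nn.Module):
    def __init__(self, audio_dim=300, visual_dim=256, n_words=50):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, 128), nn.ReLU(),
            nn.Linear(128, n_words))

    def forward(self, audio_vec, visual_vec):
        # Concatenate the two modality vectors, then classify the word.
        return self.classifier(torch.cat([audio_vec, visual_vec], dim=1))

# audio_vec would come from the word-vector lookup on the API transcript,
# visual_vec from the lip-reading network's final feature layer.
model = AVFusion()
logits = model(torch.randn(1, 300), torch.randn(1, 256))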


Subjects
Speech Perception, Speech, Speech Recognition Software, Noise, Signal-to-Noise Ratio
5.
Sensors (Basel) ; 22(9)2022 Apr 20.
Article in English | MEDLINE | ID: mdl-35590844

ABSTRACT

Skeleton data, often used in the HCI field, is a data structure that can efficiently express human poses and gestures because it consists of the 3D positions of joints. The advancement of RGB-D sensors, such as Kinect sensors, has enabled the easy capture of skeleton data from depth or RGB images. However, when tracking a target with a single sensor, occlusion randomly degrades the quality of invisible joints. Consequently, multiple sensors should be used to reliably track a target in all directions over a wide range. In this paper, we propose a new method for combining multiple inaccurate skeleton data sets, obtained from sensors that capture a target from different angles, into a single accurate skeleton. The proposed algorithm uses density-based spatial clustering of applications with noise (DBSCAN) to prevent noisy, inaccurate joint candidates from participating in the merging process. After merging the inlier candidates, we used a Kalman filter to smooth the jitter in the joints' movements. We evaluated the proposed algorithm's performance using the best view as the ground truth, and analyzed the results for different sizes of the DBSCAN search area. With the proposed algorithm, the joint position accuracy of the merged skeleton improved as the number of sensors increased, and the highest performance was obtained with a DBSCAN search area of 10 cm.
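
A minimal sketch of the outlier-rejecting merge for a single joint, assuming scikit-learn; merge_joint is a hypothetical helper, the fallback rule is an assumption, and the temporal Kalman smoothing step is omitted:

import numpy as np
from sklearn.cluster import DBSCAN

def merge_joint(candidates, eps=0.10):
    """candidates: (n_sensors, 3) positions of one joint, in metres."""
    labels = DBSCAN(eps=eps, min_samples=2).fit_predict(candidates)
    inliers = candidates[labels != -1]          # -1 marks noise points
    if len(inliers) == 0:                       # no consensus: fall back
        return candidates.mean(axis=0)
    return inliers.mean(axis=0)                 # average the agreeing sensors

merged = merge_joint(np.array([[0.10, 1.00, 2.00],
                               [0.12, 1.01, 2.02],
                               [0.90, 1.50, 2.50]]))  # third sensor occluded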


Subjects
Algorithms, Musculoskeletal System, Humans, Movement, Skeleton
6.
Sensors (Basel) ; 22(9)2022 May 09.
Article in English | MEDLINE | ID: mdl-35591284

ABSTRACT

Concomitant with the recent advances in deep learning, automatic speech recognition and visual speech recognition (VSR) have received considerable attention. However, although VSR systems must identify speech from both frontal and profile faces in real-world scenarios, most VSR studies have focused solely on frontal face pictures. To address this issue, we propose an end-to-end sentence-level multi-view VSR architecture for faces captured from four different perspectives (frontal, 30°, 45°, and 60°). The encoder uses multiple convolutional neural networks with a spatial attention module to detect minor changes in the mouth patterns of similarly pronounced words, and the decoder uses cascaded local self-attention connectionist temporal classification to collect the details of local contextual information in the immediate vicinity, which results in a substantial performance boost and speedy convergence. For the experiments, the OuluVS2 dataset was divided into the four perspectives; the proposed model improved on the existing state-of-the-art performance by 3.31% (0°), 4.79% (30°), 5.51% (45°), and 6.18% (60°), a mean improvement of 4.95%, and the average performance improved by 9.1% compared with the baseline. Thus, the suggested design enhances the performance of multi-view VSR and boosts its usefulness in real-world applications.
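
A minimal sketch of the spatial-attention idea in the encoder, assuming PyTorch: a CBAM-style gate built from channel-pooled maps, not the authors' exact module:

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):                       # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)       # channel-average map
        mx, _ = x.max(dim=1, keepdim=True)      # channel-max map
        att = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * att                          # emphasize mouth-region pixels

features = SpatialAttention()(torch.randn(2, 64, 28, 28))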


Subjects
Lipreading, Neural Networks (Computer), Attention, Humans, Language, Speech
7.
Sensors (Basel) ; 22(8)2022 Apr 12.
Article in English | MEDLINE | ID: mdl-35458932

ABSTRACT

Deep learning technology has encouraged research on noise-robust automatic speech recognition (ASR). The combination of cloud computing technologies and artificial intelligence has significantly improved the performance of open cloud-based speech recognition application programming interfaces (OCSR APIs). Noise-robust ASRs for application in different environments are being developed. This study proposes noise-robust OCSR APIs based on an end-to-end lip-reading architecture for practical applications in various environments. Several OCSR APIs, including Google, Microsoft, Amazon, and Naver, were evaluated using the Google Voice Command Dataset v2 to determine the optimum performer. Based on this evaluation, the Microsoft API was integrated with Google's trained word2vec model to enrich the recognized keywords with semantic information. The extracted word vector was integrated with the proposed lip-reading architecture for audio-visual speech recognition. Three forms of convolutional neural networks (3D CNN, 3D dense connection CNN, and multilayer 3D CNN) were used in the proposed lip-reading architecture. The vectors extracted from the API and the visual stream were concatenated and then classified. The proposed architecture enhanced the average accuracy rate of the OCSR APIs by 14.42%, measured using standard ASR evaluation metrics along with the signal-to-noise ratio. The proposed model exhibits improved performance in various noise settings, increasing the dependability of OCSR APIs for practical applications.
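
A minimal sketch of embedding the API hypothesis with Google's pretrained word2vec model, assuming gensim and a locally downloaded GoogleNews-vectors-negative300.bin file; embed_hypothesis and the averaging rule are illustrative:

import numpy as np
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin',
                                       binary=True)

def embed_hypothesis(words):
    """Average the vectors of in-vocabulary words from the API transcript."""
    vecs = [kv[w] for w in words if w in kv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

audio_vec = embed_hypothesis(['turn', 'left'])  # later concatenated with the
                                                # visual feature vector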


Subjects
Artificial Intelligence, Speech, Cloud Computing, Neural Networks (Computer), Speech Recognition Software
8.
Sensors (Basel) ; 23(1)2022 Dec 26.
Article in English | MEDLINE | ID: mdl-36616844

ABSTRACT

The identification of attention deficit hyperactivity disorder (ADHD) in children, whose prevalence is increasing every year worldwide, is very important for early diagnosis and treatment. However, because ADHD is not a simple disease that can be diagnosed with a simple test, doctors require considerable time and effort for accurate diagnosis and treatment. Currently, ADHD classification studies using various datasets and machine learning or deep learning algorithms are actively being conducted for the screening diagnosis of ADHD. However, there has been no study of ADHD classification using only skeleton data. We hypothesized that the main symptoms of ADHD, such as distraction, hyperactivity, and impulsivity, could be differentiated through skeleton data. Thus, we devised a game system for the screening and diagnosis of ADHD in children and acquired their skeleton data using five Azure Kinect units equipped with depth sensors while the game was being played. In the screening game, a robot first travels along a specific path, after which the child must remember the path the robot took and follow it. The skeleton data used in this study were divided into two categories: standby data, obtained while a child waits as the robot demonstrates the path; and game data, obtained while a child plays the game. The acquired data were classified using recurrent architectures (GRU, vanilla RNN, and LSTM), a bidirectional layer, and a weighted cross-entropy loss function. Among these, an LSTM using a bidirectional layer and a weighted cross-entropy loss function obtained a classification accuracy of 97.82%.
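
A minimal sketch of the best-performing configuration, a bidirectional LSTM with a class-weighted cross-entropy loss, assuming PyTorch; the ADHDClassifier name, joint count, and class weights are illustrative:

import torch
import torch.nn as nn

class ADHDClassifier(nn.Module):
    def __init__(self, n_joints=32, hidden=128, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_joints * 3, hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(hidden * 2, n_classes)

    def forward(self, x):                       # x: (B, T, n_joints*3)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])              # classify from the last step

# A weighted loss counters class imbalance between ADHD and control samples.
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 3.0]))
logits = ADHDClassifier()(torch.randn(4, 100, 96))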


Subjects
Attention Deficit Disorder with Hyperactivity, Deep Learning, Musculoskeletal System, Humans, Child, Attention Deficit Disorder with Hyperactivity/diagnosis, Attention Deficit Disorder with Hyperactivity/therapy, Skeleton
9.
Sensors (Basel) ; 23(1)2022 Dec 27.
Article in English | MEDLINE | ID: mdl-36616875

ABSTRACT

Although attention deficit hyperactivity disorder (ADHD) in children is rising worldwide, fewer studies have focused on screening than on the treatment of ADHD. Most previous ADHD classification studies distinguished only the ADHD and normal classes. However, medical professionals believe that better distinguishing an ADHD-RISK class would assist them socially and medically. We created a projection-based game in which stimuli and responses can be observed to better understand children's abnormal behavior. The developed screening game is divided into 11 stages: children play five games, each split into a waiting stage and a game stage, yielding 10 stages, plus an additional explanation stage in which the robot explains the first game while the child waits. Herein, we classified the normal, ADHD-RISK, and ADHD classes using skeleton data obtained through the ADHD screening games and a bidirectional long short-term memory-based deep learning model. We verified the importance of each stage by passing the per-stage features through a channel attention layer. The final classification accuracy over the three classes was 98.15% using the bidirectional LSTM with channel attention model. Additionally, the attention scores obtained through the channel attention layer indicated that data from the latter part of the game are heavily involved in learning the ADHD-RISK case. These results imply that, as the game repeats, the attention of children in the ADHD-RISK class decreases in the second half.
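
A minimal sketch of scoring per-stage features with an attention gate, assuming PyTorch; StageAttention is a hypothetical SE-style layer standing in for the paper's channel attention, with illustrative sizes:

import torch
import torch.nn as nn

class StageAttention(nn.Module):
    def __init__(self, n_stages=11, dim=128):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim // 4), nn.ReLU(),
                                  nn.Linear(dim // 4, 1))

    def forward(self, stage_feats):             # stage_feats: (B, n_stages, dim)
        scores = torch.softmax(self.gate(stage_feats).squeeze(-1), dim=1)
        fused = (stage_feats * scores.unsqueeze(-1)).sum(dim=1)
        return fused, scores                    # scores reveal stage importance

fused, scores = StageAttention()(torch.randn(2, 11, 128))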


Subjects
Attention Deficit Disorder with Hyperactivity, Deep Learning, Problem Behavior, Robotics, Video Games, Humans, Child, Attention Deficit Disorder with Hyperactivity/diagnosis, Attention Deficit Disorder with Hyperactivity/therapy
10.
Sensors (Basel) ; 22(1)2021 Dec 23.
Article in English | MEDLINE | ID: mdl-35009612

ABSTRACT

In visual speech recognition (VSR), speech is transcribed using only visual information to interpret tongue and teeth movements. Recently, deep learning has shown outstanding performance in VSR, with accuracy exceeding that of lipreaders on benchmark datasets. However, several problems still exist when using VSR systems. A major challenge is the distinction of words with similar pronunciation, called homophones; these lead to word ambiguity. Another technical limitation of traditional VSR systems is that visual information does not provide sufficient data for learning words such as "a", "an", "eight", and "bin" because their lengths are shorter than 0.02 s. This report proposes a novel lipreading architecture that combines three different convolutional neural networks (CNNs; a 3D CNN, a densely connected 3D CNN, and a multi-layer feature fusion 3D CNN), which are followed by a two-layer bi-directional gated recurrent unit. The entire network was trained using connectionist temporal classification. The results of the standard automatic speech recognition evaluation metrics show that the proposed architecture reduced the character and word error rates of the baseline model by 5.681% and 11.282%, respectively, for the unseen-speaker dataset. Our proposed architecture exhibits improved performance even when visual ambiguity arises, thereby increasing VSR reliability for practical applications.
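
A minimal sketch of the CTC back end, assuming PyTorch; the three-CNN front end is abbreviated to a single 3-D convolution, and the alphabet size and clip dimensions are illustrative:

import torch
import torch.nn as nn

conv = nn.Conv3d(3, 32, (3, 5, 5), padding=(1, 2, 2))   # (B, 3, T, H, W) input
gru = nn.GRU(32, 64, num_layers=2, batch_first=True, bidirectional=True)
head = nn.Linear(128, 28)                      # 26 letters + space + blank
ctc = nn.CTCLoss(blank=27)

x = torch.randn(2, 3, 75, 50, 100)             # two 75-frame mouth clips
f = conv(x).mean(dim=(3, 4)).transpose(1, 2)   # pool space -> (B, T, 32)
logits = head(gru(f)[0]).log_softmax(-1)       # (B, T, 28)
targets = torch.randint(0, 27, (2, 20))        # dummy character labels
loss = ctc(logits.transpose(0, 1), targets,    # CTC aligns frames to labels
           torch.full((2,), 75, dtype=torch.long),
           torch.full((2,), 20, dtype=torch.long))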


Subjects
Speech Perception, Speech, Humans, Lipreading, Neural Networks (Computer), Reproducibility of Results
11.
Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 542-545, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31945957

ABSTRACT

Gait is an important indicator of specific diseases. Abnormal gait patterns are caused by various factors, such as physical, neurological, and sensory problems. If abnormal gait patterns can be recognized in the early stage of the related disease, patients can receive proper treatment early and prevent secondary accidents such as falls caused by an unbalanced gait. In this paper, we propose a gait recognition system that can recognize five abnormal gait patterns. Our system uses 3D joint information obtained from multiple Kinect v2 sensors and an RNN-LSTM model. In particular, abnormal gaits caused by physical problems, such as injury, muscle weakness, and joint problems, are targeted for recognition. The purpose of this paper is to find the optimal conditions for gait recognition when using multiple Kinect v2 sensors. Experiments were conducted comparing the test accuracies of 14 combinations of human joints. Through these experiments, we selected the joints that produced the best results so that our gait recognition model performs optimally. The results show that the ankles, wrists, and head are the most influential joints for the RNN-LSTM model. We applied information from 25 joints of the human body to recognize gait patterns and achieved an accuracy of over 97%.
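
A minimal sketch of the joint-subset comparison, assuming PyTorch; GaitLSTM, the subset definitions, and the joint indices are illustrative, and training is elided:

import torch
import torch.nn as nn

class GaitLSTM(nn.Module):
    def __init__(self, n_joints, n_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(n_joints * 3, 64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (B, T, n_joints*3)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])

# Each candidate subset gets its own model; test accuracy decides the winner.
SUBSETS = {'all': list(range(25)),
           'ankles_wrists_head': [3, 6, 10, 14, 18]}   # indices illustrative
seq = torch.randn(2, 90, 25, 3)                # two 90-frame walking clips
for name, idx in SUBSETS.items():
    sel = seq[:, :, idx, :].flatten(2)         # keep chosen joints only
    print(name, GaitLSTM(len(idx))(sel).shape)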


Subjects
Gait, Ankle, Biomechanical Phenomena, Humans
12.
PLoS One ; 10(4): e0123251, 2015.
Article in English | MEDLINE | ID: mdl-25898367

ABSTRACT

The purpose of this study was to investigate whether multi-domain cognitive training, especially robot-assisted training, alters cortical thickness in the brains of elderly participants. A controlled trial was conducted with 85 volunteers aged 60 years or older without cognitive impairment. Participants were first randomized into two groups: 48 who would receive cognitive training and 37 who would not. The training group was then randomly divided into two subgroups: 24 received traditional cognitive training and 24 received robot-assisted cognitive training. The training for both groups consisted of daily 90-min sessions, five days a week, for a total of 12 weeks. The primary outcome was the change in cortical thickness. Compared with the control group, both groups that underwent cognitive training demonstrated attenuation of age-related cortical thinning in the frontotemporal association cortices. When the robot-assisted and traditional interventions were directly compared, the robot group showed less cortical thinning in the anterior cingulate cortices. Our results suggest that cognitive training can mitigate age-associated structural brain changes in the elderly. TRIAL REGISTRATION: ClinicalTrials.gov NCT01596205.


Subjects
Brain/pathology, Cognitive Behavioral Therapy/methods, Dementia/prevention & control, Aged, Cognition, Female, Humans, Independent Living, Male, Middle Aged, Organ Size, Robotics, Treatment Outcome