1.
Int J Med Sci ; 21(12): 2252-2260, 2024.
Article in English | MEDLINE | ID: mdl-39310268

ABSTRACT

Background: The early detection of arteriovenous (AV) access dysfunction is crucial for maintaining the patency of vascular access. This study aimed to use deep learning to predict AV access malfunction necessitating further vascular management. Methods: This prospective cohort study enrolled prevalent hemodialysis (HD) patients with an AV fistula or AV graft from a single HD center. Their AV access bruit sounds were recorded weekly before HD sessions using an electronic stethoscope at three sites (the arterial needle site, the venous needle site, and the midpoint between them). The audio signals were converted to Mel spectrograms using the Fourier transform and used to develop deep learning models. Three models, (1) a Convolutional Neural Network (CNN), (2) a Convolutional Recurrent Neural Network (CRNN), and (3) a Vision Transformer-Gated Recurrent Unit (ViT-GRU) model, were trained and compared in predicting the likelihood of dysfunctional AV access. Results: A total of 437 audio recordings were obtained from 84 patients. The CNN model outperformed the other models on the test set, with an F1 score of 0.7037 and an area under the receiver operating characteristic curve (AUROC) of 0.7112. The ViT-GRU model performed well on out-of-fold predictions (F1 score 0.7131, AUROC 0.7745) but generalized poorly to the test set (F1 score 0.5225, AUROC 0.5977). Conclusions: The CNN model based on Mel spectrograms could predict malfunctioning AV access requiring vascular intervention within 10 days. This approach could serve as a useful screening tool for high-risk AV access.
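
A minimal sketch of the audio-to-Mel-spectrogram step described above, using librosa; the file name, sample rate, and FFT settings are illustrative assumptions, not parameters reported by the study.

```python
import librosa
import numpy as np

def audio_to_mel(path, sr=16000, n_mels=128, n_fft=1024, hop_length=256):
    """Load a bruit recording and convert it to a log-scaled Mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, n_frames)

# spec = audio_to_mel("bruit_arterial_site.wav")  # hypothetical file name
```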


Subjects
Arteriovenous Shunt, Surgical; Deep Learning; Renal Dialysis; Humans; Female; Male; Middle Aged; Prospective Studies; Aged; Renal Dialysis/methods; ROC Curve; Sound Spectrography/methods; Neural Networks, Computer
2.
Sensors (Basel) ; 24(5)2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38474965

ABSTRACT

Deep learning has driven breakthroughs in emotion recognition in many fields, especially speech emotion recognition (SER). As a key part of SER, extracting the most relevant acoustic features has long attracted researchers' attention. To address the problem that emotional information in speech signals is dispersed and that local and global information are not integrated comprehensively, this paper presents a network model based on a gated recurrent unit (GRU) and multi-head attention. We evaluate the proposed emotion model on the IEMOCAP and Emo-DB corpora. The experimental results show that the network model based on a Bi-GRU and multi-head attention is significantly better than traditional network models across multiple evaluation metrics. We also apply the model to a speech sentiment analysis task; on the CH-SIMS and MOSI datasets, the model shows excellent generalization performance.
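
The Bi-GRU plus multi-head attention combination can be sketched roughly as below; the layer sizes, pooling choice, and number of emotion classes are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BiGRUAttention(nn.Module):
    """Bi-GRU encoder followed by multi-head self-attention and a classifier."""
    def __init__(self, n_feats=40, hidden=128, heads=4, n_classes=4):
        super().__init__()
        self.gru = nn.GRU(n_feats, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):              # x: (batch, time, n_feats)
        h, _ = self.gru(x)             # (batch, time, 2 * hidden)
        a, _ = self.attn(h, h, h)      # self-attention across time steps
        return self.fc(a.mean(dim=1))  # mean-pool over time, then classify

logits = BiGRUAttention()(torch.randn(8, 100, 40))  # dummy batch check
```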


Subjects
Perception; Speech; Acoustics; Emotions; Recognition, Psychology
3.
Sensors (Basel) ; 23(22)2023 Nov 10.
Article in English | MEDLINE | ID: mdl-38005472

ABSTRACT

Recent successes in deep learning have inspired researchers to apply deep neural networks to Acoustic Event Classification (AEC). While deep learning methods can train effective AEC models, the models' high complexity makes them susceptible to overfitting. In this paper, we introduce EnViTSA, an approach that tackles key challenges in AEC by combining an ensemble of Vision Transformers with the SpecAugment data augmentation technique. Raw acoustic signals are transformed into log Mel spectrograms using the short-time Fourier transform, yielding a fixed-size spectrogram representation. To address data scarcity and overfitting, SpecAugment generates additional training samples through time masking and frequency masking. The core of EnViTSA is its ensemble of pre-trained Vision Transformers, which harnesses the strengths of the Vision Transformer architecture; the ensemble both reduces inductive biases and mitigates overfitting. We evaluate EnViTSA on three benchmark datasets: ESC-10, ESC-50, and UrbanSound8K. The experimental results underscore the efficacy of the approach, with accuracy scores of 93.50%, 85.85%, and 83.20% on ESC-10, ESC-50, and UrbanSound8K, respectively. EnViTSA represents a substantial advance in AEC, demonstrating the potential of Vision Transformers and SpecAugment in the acoustic domain.
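
The time- and frequency-masking steps of SpecAugment reduce, in essence, to zeroing out random bands of the spectrogram. A minimal NumPy sketch, with mask widths chosen arbitrarily (the paper's settings are not given here):

```python
import numpy as np

def spec_augment(spec, max_f=16, max_t=24, rng=None):
    """Apply one frequency mask and one time mask to a copy of `spec`.

    Assumes `spec` (shape: mels x frames) is larger than the mask widths.
    """
    if rng is None:
        rng = np.random.default_rng()
    s = spec.copy()
    n_mels, n_frames = s.shape
    f = int(rng.integers(0, max_f))
    t = int(rng.integers(0, max_t))
    f0 = int(rng.integers(0, n_mels - f))
    t0 = int(rng.integers(0, n_frames - t))
    s[f0:f0 + f, :] = 0.0  # frequency masking
    s[:, t0:t0 + t] = 0.0  # time masking
    return s
```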

4.
Sensors (Basel) ; 23(6)2023 Mar 13.
Article in English | MEDLINE | ID: mdl-36991794

ABSTRACT

In the industrial sector, tool health monitoring has taken on significant importance because it saves labor costs, time, and waste. This research uses spectrograms of airborne acoustic emission data and a convolutional neural network variant, the Residual Network, to monitor the tool health of an end-milling machine. The dataset was created using three types of cutting tools: new, moderately used, and worn out. The acoustic emission signals generated by these tools were recorded for cut depths ranging from 1 mm to 3 mm. Two distinct kinds of wood, hardwood (pine) and softwood (Himalayan spruce), were employed in the experiment. For each case, 28 samples totaling 10 s were captured. The trained model's prediction accuracy was evaluated on 710 samples, showing an overall classification accuracy of 99.7%; total testing accuracy was 100% for classifying hardwood and 99.5% for classifying softwood.
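
The spectrogram front end for such acoustic emission signals might look like the following SciPy sketch; the sampling rate and STFT parameters are assumptions, since the abstract does not report them.

```python
import numpy as np
from scipy import signal

fs = 96_000                    # assumed acquisition rate for acoustic emission
x = np.random.randn(fs)        # stand-in for one second of recorded emission
f, t, Sxx = signal.spectrogram(x, fs=fs, nperseg=1024, noverlap=512)
log_spec = 10 * np.log10(Sxx + 1e-10)  # dB-scaled image fed to the ResNet
```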

5.
Sensors (Basel) ; 23(16)2023 Aug 13.
Article in English | MEDLINE | ID: mdl-37631690

ABSTRACT

Hydraulic systems are used in all kinds of industries: mills, manufacturing plants, robotics, and ports all require hydraulic equipment. Many industries prefer hydraulic systems because of their numerous advantages over electrical and mechanical systems, so demand for them has grown steadily over time. Given this vast variety of applications, faults in hydraulic systems can cause costly breakdowns. Using artificial intelligence (AI)-based approaches, faults can be classified and predicted to avoid downtime and ensure sustainable operations. This work proposes a novel approach for classifying the cooling behavior of a hydraulic test rig. Three fault conditions of the test rig's cooling system were used, and spectrograms were generated from the time series data for each condition. A CNN variant, the Residual Network, was used to classify the fault conditions. Performance was assessed with a confusion matrix and metrics including the F-score, precision, accuracy, and recall. The data contained 43,680 attributes and 2205 instances. After training, validation, and testing, the accuracy of the ResNet-18 architecture was found to be close to 95%.
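
Adapting a stock ResNet-18 to the three cooler fault classes is a one-line head swap in torchvision; this sketch assumes spectrograms rendered as 3-channel 224x224 images, which the abstract does not specify.

```python
import torch
import torchvision.models as models

net = models.resnet18(weights=None)               # untrained ResNet-18 backbone
net.fc = torch.nn.Linear(net.fc.in_features, 3)   # three fault conditions
out = net(torch.randn(4, 3, 224, 224))            # batch of spectrogram images
```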

6.
Sensors (Basel) ; 23(6)2023 Mar 08.
Article in English | MEDLINE | ID: mdl-36991659

ABSTRACT

The Internet of things (IoT)-enabled wireless body area network (WBAN) is an emerging technology that combines medical devices, wireless devices, and non-medical devices for healthcare management applications. Speech emotion recognition (SER) is an active research field in the healthcare domain and in machine learning: a technique for automatically identifying speakers' emotions from their speech. However, SER systems, especially in the healthcare domain, face several challenges, including low prediction accuracy, high computational complexity, delays in real-time prediction, and the difficulty of identifying appropriate speech features. Motivated by these research gaps, we propose an emotion-aware IoT-enabled WBAN system within the healthcare framework in which data processing and long-range data transmission are performed by an edge AI system for real-time prediction of patients' speech emotions and for capturing changes in emotions before and after treatment. We also investigated the effectiveness of different machine learning and deep learning algorithms in terms of classification performance, feature extraction methods, and normalization methods. We developed a hybrid deep learning model combining a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM), as well as a regularized CNN model, and combined these models with different optimization strategies and regularization techniques to improve prediction accuracy, reduce generalization error, and reduce the computational complexity of the neural networks in terms of time, power, and space. Different experiments were performed to check the efficiency and effectiveness of the proposed algorithms, and the proposed models were compared with a related existing model using standard performance metrics such as prediction accuracy, precision, recall, F1 score, the confusion matrix, and the differences between actual and predicted values. The experimental results showed that one of the proposed models outperformed the existing model with an accuracy of about 98%.
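
A hedged sketch of one plausible CNN-BiLSTM layout of the kind described; the layer sizes, dropout rate, and seven-class output are illustrative guesses, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """1D CNN front end feeding a bidirectional LSTM, then a classifier."""
    def __init__(self, n_feats=40, n_classes=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_feats, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(), nn.MaxPool1d(2), nn.Dropout(0.3),
        )
        self.lstm = nn.LSTM(64, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):                  # x: (batch, n_feats, time)
        h = self.conv(x).transpose(1, 2)   # -> (batch, time/2, 64)
        h, _ = self.lstm(h)                # -> (batch, time/2, 128)
        return self.fc(h[:, -1])           # last step -> emotion logits
```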


Subjects
Internet of Things; Speech; Humans; Neural Networks, Computer; Machine Learning; Emotions
7.
BMC Med Inform Decis Mak ; 22(1): 226, 2022 08 29.
Article in English | MEDLINE | ID: mdl-36038901

ABSTRACT

BACKGROUND: The application of machine learning to cardiac auscultation has the potential to improve the accuracy and efficiency of both routine and point-of-care screenings. The use of convolutional neural networks (CNN) on heart sound spectrograms in particular has defined state-of-the-art performance. However, the relative paucity of patient data remains a significant barrier to creating models that can adapt to the wide range of potential variability. To that end, we examined a CNN model's performance on automated heart sound classification before and after various forms of data augmentation, aiming to identify the optimal augmentation methods for cardiac spectrogram analysis. RESULTS: We built a standard CNN model to classify cardiac sound recordings as either normal or abnormal. The baseline control model achieved a PR AUC of 0.763 ± 0.047. Among the single data augmentation techniques explored, horizontal flipping of the spectrogram image improved model performance the most, with a PR AUC of 0.819 ± 0.044. Principal component analysis (PCA) color augmentation and perturbations of saturation-value (SV) on the hue-saturation-value (HSV) color scale achieved PR AUCs of 0.779 ± 0.045 and 0.784 ± 0.037, respectively. Time and frequency masking resulted in a PR AUC of 0.772 ± 0.050. Pitch shifting, time stretching and compressing, noise injection, vertical flipping, and random color filters negatively impacted model performance. Combining the best-performing augmentation (horizontal flip) with PCA and SV perturbations improved model performance further. CONCLUSION: Data augmentation can improve classification accuracy by expanding and diversifying the dataset, which protects against overfitting to random variance. However, data augmentation is necessarily domain-specific. For example, noise injection has found success in other areas of automated sound classification, but in cardiac sound analysis it can mimic the presence of murmurs and worsen model performance. Care should therefore be taken to choose clinically appropriate forms of data augmentation.
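
The two augmentations that helped most, horizontal flipping and saturation-value jitter, can be approximated with stock torchvision transforms; the jitter ranges below are illustrative, not the study's values, and ColorJitter's saturation/brightness knobs are used here as rough proxies for the S and V channels.

```python
import numpy as np
from PIL import Image
from torchvision import transforms

flip = transforms.RandomHorizontalFlip(p=1.0)                 # mirror time axis
sv_jitter = transforms.ColorJitter(saturation=0.2, brightness=0.2)

img = Image.fromarray((np.random.rand(224, 224, 3) * 255).astype("uint8"))
augmented = sv_jitter(flip(img))   # stand-in for a color-mapped spectrogram
```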


Subjects
Heart Sounds; Humans; Machine Learning; Neural Networks, Computer
8.
Pattern Recognit ; 127: 108656, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35313619

ABSTRACT

This study presents the Auditory Cortex ResNet (AUCO ResNet), a biologically inspired deep neural network designed for sound classification and, more specifically, for COVID-19 recognition from audio recordings of coughs and breaths. Unlike other approaches, it can be trained end-to-end, optimizing with gradient descent all modules of the learning algorithm: mel-like filter design, feature extraction, feature selection, dimensionality reduction, and prediction. The network includes three attention mechanisms: the squeeze-and-excitation mechanism, the convolutional block attention module, and a novel sinusoidal learnable attention. These attention mechanisms merge relevant information from activation maps at various levels of the network. The network takes raw audio files as input and can also fine-tune the feature extraction stage: a mel-like filter is designed during training, adapting the filter banks to the important frequencies. AUCO ResNet provides state-of-the-art results on many datasets. It was first tested on several datasets containing COVID-19 coughs and breaths; these sounds are language-independent, allowing cross-dataset tests aimed at generalization. The tests demonstrate that the approach can be adopted as a low-cost, fast, and remote COVID-19 pre-screening tool. The network was also tested on the well-known UrbanSound8K dataset, achieving state-of-the-art accuracy without any data preprocessing or augmentation.
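
The trainable mel-like front end can be imagined as a non-negative filter-bank matrix learned jointly with the rest of the network; this PyTorch sketch is a loose analogy under that assumption, not the AUCO ResNet's actual module.

```python
import torch
import torch.nn as nn

class LearnableFilterBank(nn.Module):
    """Mel-like filter bank whose weights are optimized with the network."""
    def __init__(self, n_fft_bins=513, n_filters=64):
        super().__init__()
        self.weights = nn.Parameter(torch.rand(n_filters, n_fft_bins) * 0.01)

    def forward(self, stft_mag):           # (batch, n_fft_bins, time)
        fb = torch.relu(self.weights)      # keep filter gains non-negative
        return torch.log1p(fb @ stft_mag)  # (batch, n_filters, time)
```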

9.
Sensors (Basel) ; 22(24)2022 Dec 07.
Article in English | MEDLINE | ID: mdl-36559944

ABSTRACT

Non-invasive electrocardiogram (ECG) signals are useful for assessing heart condition and diagnosing cardiac diseases. However, traditional interpretation through medical consultation requires effort, knowledge, and time because of the volume and complexity of the data. Neural networks have recently been shown to be efficient at interpreting biomedical signals, including ECG and EEG. The novelty of the proposed work is the use of spectrograms instead of raw signals. Spectrograms can easily be reduced by eliminating frequencies that carry no ECG information. Moreover, spectrogram calculation via the short-time Fourier transform (STFT) is time-efficient and presents the reduced data in a well-distinguishable form to a convolutional neural network (CNN). Data reduction was performed by frequency filtration with a specific cutoff value. These steps keep the CNN architecture simple while achieving high accuracy, and the proposed approach reduced memory usage and computational power by avoiding complex CNN models. The large, publicly available PTB-XL dataset was used, and two datasets, spectrograms and raw signals, were prepared for binary classification. The proposed approach achieved the highest accuracy of 99.06%, indicating that spectrograms outperform raw signals for ECG classification. Accuracies were also obtained for signals up- and down-sampled at various sampling rates.
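
The STFT-plus-cutoff data reduction can be sketched in a few lines of SciPy; PTB-XL provides 500 Hz recordings, while the 40 Hz cutoff and window length here are illustrative assumptions.

```python
import numpy as np
from scipy import signal

fs = 500                        # PTB-XL sampling rate (a 100 Hz version also exists)
ecg = np.random.randn(10 * fs)  # stand-in for a 10 s single-lead signal
f, t, Zxx = signal.stft(ecg, fs=fs, nperseg=256)
cutoff_hz = 40                  # assumed cutoff; discard uninformative bands
spec = np.abs(Zxx)[f <= cutoff_hz, :]   # reduced spectrogram for the CNN
```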


Subjects
Heart Diseases; Neural Networks, Computer; Humans; Heart Rate; Electrocardiography; Filtration; Algorithms
10.
Sensors (Basel) ; 22(5)2022 Mar 03.
Article in English | MEDLINE | ID: mdl-35271130

ABSTRACT

The periodic inspection of railroad tracks is very important for finding the structural and geometrical problems that lead to railway accidents. Currently, in Pakistan, rail tracks are inspected by an acoustic-based manual system that requires a railway engineer, as a domain expert, to differentiate between different rail track faults, which is cumbersome, laborious, and error-prone. This study proposes combining the traditional acoustic-based system with deep learning models to increase performance and reduce train accidents. Two convolutional neural network (CNN) models, convolutional 1D and convolutional 2D, and one recurrent neural network (RNN) model, a long short-term memory (LSTM) network, are used. Initially, three track conditions are considered: superelevation, wheel burnt, and normal tracks. Contrary to traditional acoustic-based systems, where the spectrogram dataset is generated before model training, the proposed approach extracts features on the fly by generating spectrograms as a layer of the deep learning model. Different lengths of audio samples are used to analyze their effect on performance: each 17 s audio sample is split into variations of 1.7, 3.4, and 8.5 s, and all three deep learning models are trained and tested for each split. Various combinations of audio data augmentation are analyzed extensively. The results suggest that the LSTM with the 8.5 s split gives the best results, with an accuracy of 99.7%, precision of 99.5%, recall of 99.5%, and F1 score of 99.5%.
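
On-the-fly spectrogram extraction as a model layer can be reproduced with torchaudio, so raw audio is the network input; the layer sizes and sample rate here are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torchaudio

class OnTheFlySpec(nn.Module):
    """Mel spectrogram computed inside the model, as a first 'layer'."""
    def __init__(self, n_classes=3, sr=16000):
        super().__init__()
        self.spec = torchaudio.transforms.MelSpectrogram(sample_rate=sr, n_mels=64)
        self.db = torchaudio.transforms.AmplitudeToDB()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(16, n_classes),
        )

    def forward(self, wav):                        # wav: (batch, samples)
        x = self.db(self.spec(wav)).unsqueeze(1)   # (batch, 1, mels, frames)
        return self.net(x)

out = OnTheFlySpec()(torch.randn(2, 16000))  # raw 1 s clips in, logits out
```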


Subjects
Deep Learning; Acoustics; Neural Networks, Computer
11.
Sensors (Basel) ; 21(5)2021 Mar 08.
Article in English | MEDLINE | ID: mdl-33800348

ABSTRACT

Many speech emotion recognition systems have been designed using different features and classification methods, yet there is still a lack of knowledge and reasoning about the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect accuracy, and to what extent. This study extends the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and classification regions, including their lengths and overlaps), frequency ranges, frequency scales, processing of the whole speech signal (spectrograms), the vocal tract (filter banks, linear prediction coefficient (LPC) modeling), the excitation signal (inverse LPC filtering), magnitude and phase manipulations, cepstral features, etc. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross-validation, the paired t-test, and rank and Pearson correlations. The results revealed several settings in the 75% accuracy range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0-8 kHz frequency range. Spectrograms carrying vocal tract and excitation information also scored well. Even basic processing choices, such as pre-emphasis, segmentation, and magnitude modifications, can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
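
The vocal-tract/excitation split via LPC modeling and inverse LPC filtering is a standard operation; a brief sketch with librosa and SciPy, using a random stand-in signal and an assumed model order of 12:

```python
import numpy as np
import librosa
from scipy import signal

y = np.random.randn(8000).astype(np.float32)  # stand-in for a speech segment
a = librosa.lpc(y, order=12)                  # all-pole vocal tract model
residual = signal.lfilter(a, [1.0], y)        # inverse LPC filter -> excitation
```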


Subjects
Emotions; Speech; Databases, Factual; Perception
12.
J Psycholinguist Res ; 50(3): 463-505, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33423116

ABSTRACT

The sixteen slips of the tongue (SOTs) examined are on-air errors produced by native English TV presenters and anchors. Although these SOTs seem funny, they reveal a great deal about how naturalistic speech is assembled and produced. Acoustic analysis is also brought to bear on the investigation with the aim of providing accurate findings. Several psycholinguistic models are invoked in the analysis, and Praat 6 is used to provide spectrograms and waveforms for the errors detected. The study concludes that the SOTs examined in the present corpus reveal much about the processing of erroneous speech. Substitution errors, the most prominent type, exhibit uniform processing through replacement at the phonemic or higher levels. Anticipation errors, by contrast, prove irregular in their production. Other error types are sparse in the corpus and cannot be generalized over a wide range of instances, since they occur only once or twice.
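
For readers who want to reproduce the Praat analysis programmatically, the parselmouth Python bindings expose the same spectrogram computation; the study itself used Praat 6 directly, and the file name below is hypothetical.

```python
import parselmouth

snd = parselmouth.Sound("sot_utterance.wav")  # hypothetical recording of an SOT
spec = snd.to_spectrogram()                   # Praat's spectrogram analysis
```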


Subjects
Psycholinguistics; Speech; Acoustics; Humans; Language; Tongue
13.
Cereb Cortex ; 29(6): 2668-2681, 2019 06 01.
Article in English | MEDLINE | ID: mdl-29897408

ABSTRACT

Event-related fluctuations of neural oscillatory amplitude are widely reported in the context of cognitive processing and are typically interpreted as a marker of brain "activity". However, the precise nature of these effects remains unclear; in particular, whether such fluctuations reflect local dynamics, integration between regions, or both is unknown. Here, using magnetoencephalography, we show that movement-induced oscillatory modulation is associated with transient connectivity between sensorimotor regions. Further, in resting-state data, we demonstrate a significant association between oscillatory modulation and dynamic connectivity. A confound with such empirical measurements is that increased amplitude necessarily means increased signal-to-noise ratio (SNR), leaving unanswered the question of whether amplitude and connectivity are genuinely coupled or whether increased connectivity is observed purely because of increased SNR. Here, we counter this problem by analogy with computational models, which show that, in the presence of global network coupling and local multistability, the link between oscillatory modulation and long-range connectivity is a natural consequence of neural networks. Our results provide evidence for the notion that connectivity is mediated by neural oscillations and suggest that time-frequency spectrograms are not merely a description of local synchrony but also reflect fluctuations in long-range connectivity.
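
One common way to relate oscillatory amplitude to connectivity is to band-limit the signals, take the Hilbert envelope, and correlate envelopes across regions; this amplitude-envelope correlation is a generic stand-in, not necessarily the dynamic connectivity measure used in the study.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 600                                           # illustrative MEG sampling rate
x = np.random.randn(2, 60 * fs)                    # two region time courses (stand-ins)
b, a = butter(4, [13, 30], btype="band", fs=fs)    # beta band
env = np.abs(hilbert(filtfilt(b, a, x), axis=-1))  # oscillatory amplitude envelope
connectivity = np.corrcoef(env)[0, 1]              # amplitude-envelope coupling
```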


Subjects
Brain/physiology; Models, Neurological; Nerve Net/physiology; Neurons/physiology; Adult; Computer Simulation; Female; Humans; Magnetoencephalography; Male; Psychomotor Performance/physiology
14.
Sensors (Basel) ; 20(8)2020 Apr 20.
Article in English | MEDLINE | ID: mdl-32325959

ABSTRACT

Delamination is one of the most detrimental defects in laminated composite materials; it often arises from manufacturing defects or in-service loadings (e.g., low/high velocity impacts). Most contemporary research efforts are dedicated to high-frequency guided wave and mode shape-based methods for the assessment (i.e., detection, quantification, localization) of delamination. This paper presents a deep learning framework for the structural vibration-based assessment of delamination in smart composite laminates. A number of small (4.5% of total area) inner and edge delaminations are simulated using an electromechanically coupled model of the piezo-bonded laminated composite. Healthy and delaminated structures are excited with random loads, and the corresponding transient responses are transformed into spectrograms using optimal values of window size, overlap rate, window type, and fast Fourier transform (FFT) resolution. A convolutional neural network (CNN) is designed to automatically extract discriminative features from the vibration-based spectrograms and use them to distinguish intact from delaminated cases of the smart composite laminate. The proposed CNN architecture showed a training accuracy of 99.9%, a validation accuracy of 97.1%, and a test accuracy of 94.5% on an unseen dataset. The testing confusion chart of the pre-trained network revealed interesting results regarding severity and detectability for the in-plane and through-the-thickness delamination scenarios.
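
The spectrogram parameters the authors tune (window size, overlap rate, window type, FFT resolution) map directly onto SciPy's spectrogram arguments; the values below are placeholders, not the optimal settings found in the paper.

```python
import numpy as np
from scipy import signal

fs = 10_000                    # assumed sampling rate of the transient response
vib = np.random.randn(2 * fs)  # stand-in vibration signal
f, t, Sxx = signal.spectrogram(
    vib, fs=fs, window="hann", nperseg=512, noverlap=384, nfft=1024
)  # window type, size, overlap, and FFT resolution are the tuned knobs
```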

15.
Sensors (Basel) ; 20(18)2020 Sep 12.
Article in English | MEDLINE | ID: mdl-32932723

ABSTRACT

Artificial intelligence (AI) and machine learning (ML) are employed to make systems smarter. Today, speech emotion recognition (SER) systems evaluate the emotional state of a speaker by investigating his or her speech signal. Emotion recognition is a challenging task for a machine, and making machines smart enough to recognize emotions efficiently is equally challenging. The speech signal is hard to examine using signal processing methods because it consists of different frequencies and features that vary according to emotions such as anger, fear, sadness, happiness, boredom, disgust, and surprise. Even though different algorithms are being developed for SER, success rates remain low and depend on the language, the emotions, and the database. In this paper, we propose a new, lightweight, effective SER model with low computational complexity and high recognition accuracy. The suggested method uses a convolutional neural network (CNN) to learn deep frequency features by using a plain rectangular filter with a modified pooling strategy that has more discriminative power for SER. The proposed CNN model was trained on the frequency features extracted from the speech data and then tested to predict emotions. The proposed SER model was evaluated on two benchmarks, the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Berlin Emotional Speech Database (EMO-DB) datasets, obtaining recognition accuracies of 77.01% and 92.02%, respectively. The experimental results demonstrate that the proposed CNN-based SER system achieves better recognition performance than state-of-the-art SER systems.
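
A plain rectangular filter with frequency-only pooling can be expressed directly in PyTorch; the kernel and pooling shapes below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=(9, 3), padding=(4, 1)),  # rectangular filter
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(2, 1)),  # pool frequency only (modified pooling)
)
out = block(torch.randn(8, 1, 128, 200))  # (batch, 1, mel bins, frames)
```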


Subjects
Artificial Intelligence; Emotions; Speech; Female; Humans; Machine Learning; Male; Neural Networks, Computer
16.
Sensors (Basel) ; 20(19)2020 Sep 28.
Article in English | MEDLINE | ID: mdl-32998382

ABSTRACT

Speech emotion recognition (SER) classifies emotions using low-level features or a spectrogram of an utterance. When SER methods are trained and tested on different datasets, their performance drops. Cross-corpus SER research identifies speech emotion across different corpora and languages, and recent work has aimed to improve generalization. To improve cross-corpus SER performance, we pretrained our visual attention convolutional neural network (VACNN), a 2D CNN base model with channel-wise and spatial visual attention modules, on the log-mel spectrograms of the source dataset. When training on the target dataset, we extracted a feature vector using a bag of visual words (BOVW) to assist the fine-tuned model. Because visual words represent local features in the image, the BOVW helps the VACNN learn global and local features in the log-mel spectrogram by constructing a frequency histogram of visual words. The proposed method achieves an overall accuracy of 83.33%, 86.92%, and 75.00% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EmoDB), and Surrey Audio-Visual Expressed Emotion (SAVEE), respectively. Experimental results on RAVDESS, EmoDB, and SAVEE demonstrate improvements of 7.73%, 15.12%, and 2.34% over existing state-of-the-art cross-corpus SER approaches.
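
The BOVW feature is a histogram of quantized local descriptors; a minimal scikit-learn sketch, where the patch descriptors and vocabulary size are placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans

def bovw_histogram(patches, kmeans):
    """Map local spectrogram patches to visual words; return their histogram."""
    words = kmeans.predict(patches)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

patches = np.random.rand(500, 64)           # stand-in local descriptors
kmeans = KMeans(n_clusters=50, n_init=10).fit(patches)
feature = bovw_histogram(patches, kmeans)   # global BOVW feature vector
```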


Subjects
Emotions; Neural Networks, Computer; Speech; Language; Perception
17.
Sensors (Basel) ; 19(12)2019 Jun 24.
Article in English | MEDLINE | ID: mdl-31238537

ABSTRACT

Falls are the major cause of fatal and non-fatal injury among people aged over 65 years. Because of the grave consequences of falls, thorough research on fall detection is necessary. This paper presents a method for fall detection using surface electromyography (sEMG) based on an improved dual parallel channels convolutional neural network (IDPC-CNN). The proposed IDPC-CNN model is designed to distinguish falls from daily activities using the spectral features of sEMG. First, the classification accuracies of time-domain features and spectrograms are compared using linear discriminant analysis (LDA), k-nearest neighbors (KNN), and support vector machines (SVM). The results show that spectrograms provide a richer way to extract pattern information and better classification performance. Therefore, the spectrogram features of sEMG are selected as the input of the IDPC-CNN to distinguish between daily activities and falls. Finally, the IDPC-CNN is compared with the SVM and three CNNs of different structures under the same conditions. Experimental results show that the proposed IDPC-CNN achieves 92.55% accuracy, 95.71% sensitivity, and 91.7% specificity. Overall, the IDPC-CNN outperforms the comparison methods in accuracy, efficiency, training, and generalization.
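
"Dual parallel channels" suggests two convolutional branches over the same input, merged before classification; this sketch follows that reading, with all sizes invented for illustration.

```python
import torch
import torch.nn as nn

class DualChannelCNN(nn.Module):
    """Two parallel convolutional branches over one sEMG spectrogram."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.branch_a = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(4))
        self.branch_b = nn.Sequential(nn.Conv2d(1, 8, 5, padding=2), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(4))
        self.fc = nn.Linear(2 * 8 * 4 * 4, n_classes)

    def forward(self, x):                    # x: (batch, 1, freq, time)
        a = self.branch_a(x).flatten(1)
        b = self.branch_b(x).flatten(1)
        return self.fc(torch.cat([a, b], dim=1))

out = DualChannelCNN()(torch.randn(4, 1, 64, 128))  # fall vs. daily activity
```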


Subjects
Accidental Falls; Support Vector Machine; Discriminant Analysis; Electromyography; Humans; Neural Networks, Computer; Pattern Recognition, Automated
18.
Sensors (Basel) ; 19(22)2019 Nov 16.
Article in English | MEDLINE | ID: mdl-31744136

ABSTRACT

This paper aims to design and implement a system capable of distinguishing between different activities carried out during a tennis match, with the goal of correctly classifying a set of tennis strokes. The system must be robust to variability in the height, age, or sex of the subjects performing the actions, and a new database was developed to meet this objective. The system is based on two sensor nodes that use Bluetooth Low Energy (BLE) wireless technology to communicate with a PC, which acts as the central device collecting the information received from the sensors. The data provided by these sensors are processed to calculate their spectrograms. Applying deep learning techniques with semi-supervised training makes it possible to extract characteristics and classify activities. Preliminary results obtained with a dataset of eight players, four women and four men, show that our approach is able to address the diversity of players' builds, weights, and sexes, providing accuracy greater than 96.5% in recognizing the tennis strokes of a new player never seen before by the system.


Subjects
Athletic Performance/physiology; Tennis/physiology; Wearable Electronic Devices; Adult; Female; Humans; Male; Monitoring, Physiologic; Wireless Technology
19.
Sensors (Basel) ; 19(8)2019 Apr 15.
Article in English | MEDLINE | ID: mdl-30991690

ABSTRACT

We applied deep learning to create an algorithm for breathing phase detection in lung sound recordings and compared the breathing phases detected by the algorithm with those manually annotated by two experienced lung sound researchers. Our algorithm uses a convolutional neural network with spectrograms as the features, removing the need to specify features explicitly. We trained and evaluated the algorithm using three subsets larger than any previously seen in the literature, and we evaluated its performance in two ways. First, a discrete count of agreed breathing phases (counting a pair of annotation boxes as agreeing when they overlap by at least 50%) shows a mean agreement with the lung sound experts of 97% for inspiration and 87% for expiration. Second, the fraction of time in agreement (in seconds) gives higher pseudo-kappa values for inspiration (0.73-0.88) than expiration (0.63-0.84), with an average sensitivity of 97% and an average specificity of 84%. With both evaluation methods, the agreement between the annotators and the algorithm indicates human-level performance. The developed algorithm is valid for detecting breathing phases in lung sound recordings.
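
The 50% box-overlap criterion can be read as: two annotated phases agree when their intersection covers at least half of the shorter interval. A tiny sketch under that assumption (the paper may define the overlap slightly differently):

```python
def overlap_fraction(a, b):
    """Fraction of the shorter (start, end) interval covered by the overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    inter = max(0.0, end - start)
    return inter / min(a[1] - a[0], b[1] - b[0])

agreed = overlap_fraction((0.0, 1.2), (0.3, 1.4)) >= 0.5  # counts as agreement
```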

20.
Entropy (Basel) ; 21(5)2019 May 08.
Article in English | MEDLINE | ID: mdl-33267193

ABSTRACT

Detecting human intentions and emotions helps improve human-robot interactions, and emotion recognition has been a challenging research direction over the past decade. This paper proposes an emotion recognition system based on the analysis of speech signals. First, we split each speech signal into overlapping frames of the same length. Next, we extract an 88-dimensional vector of audio features, including Mel-frequency cepstral coefficients (MFCC), pitch, and intensity, for each frame; in parallel, the spectrogram of each frame is generated. In the final preprocessing step, we apply k-means clustering to the extracted features of all frames of each audio signal and select the k most discriminant frames, namely keyframes, to summarize the speech signal. The sequence of spectrograms corresponding to the keyframes is then encapsulated in a 3D tensor. These tensors are used to train and test a 3D convolutional neural network using 10-fold cross-validation. The proposed 3D CNN has two convolutional layers and one fully connected layer. Experiments are conducted on the Surrey Audio-Visual Expressed Emotion (SAVEE), Ryerson Multimedia Laboratory (RML), and eNTERFACE'05 databases, and the results are superior to the state-of-the-art methods reported in the literature.
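
The keyframe-selection step can be approximated by picking, for each k-means cluster, the frame closest to its centroid; this nearest-to-centroid reading is an assumption, since the abstract does not state how the k discriminant frames are chosen.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_keyframes(frame_feats, k=9):
    """Return indices of the frames nearest each k-means centroid."""
    km = KMeans(n_clusters=k, n_init=10).fit(frame_feats)
    idx = [int(np.argmin(np.linalg.norm(frame_feats - c, axis=1)))
           for c in km.cluster_centers_]
    return sorted(idx)

feats = np.random.rand(120, 88)      # 88-dim feature vectors per frame
keyframes = select_keyframes(feats)  # the k 'keyframes' summarizing the clip
```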
