1.
Sensors (Basel) ; 22(16), 2022 Aug 22.
Article in English | MEDLINE | ID: mdl-36016068

ABSTRACT

Speech and audio processing applications are numerous and their number is growing. They cover a wide range of tasks, each placing different requirements on the processed speech or audio signals and therefore, indirectly, on the audio sensors as well. This article reports tests and an evaluation of how basic physical properties of speech and audio signals affect the recognition accuracy of major speech and audio processing applications, i.e., speech recognition, speaker recognition, speech emotion recognition, and audio event recognition. Particular focus is placed on frequency ranges, time intervals, precision of representation (quantization), and the complexity of models suitable for each class of applications. Using domain-specific datasets, suitable feature extraction methods, and complex neural network models, the effect of basic speech and audio signal properties on the achieved accuracy was tested and evaluated for each group of applications. The tests confirmed that the basic parameters do affect overall performance and, moreover, that this effect is domain-dependent. Accurate knowledge of the extent of these effects can therefore be valuable to system designers when selecting appropriate hardware, sensors, architecture, and software for a particular application, especially when resources are limited.
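As a minimal sketch of what "precision of representation (quantization)" means in practice, the following Python snippet reduces the bit depth of a synthetic tone and measures the resulting signal-to-noise ratio. The test tone, the sample rate, and the function names are illustrative assumptions, not the setup used in the article.

```python
import math

def quantize(samples, bits):
    """Uniformly quantize samples in [-1.0, 1.0) to the given signed bit depth."""
    levels = 2 ** (bits - 1)                    # signed codes: -levels .. levels - 1
    out = []
    for x in samples:
        q = round(x * levels)
        q = max(-levels, min(levels - 1, q))    # clip to the representable range
        out.append(q / levels)
    return out

def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB between a reference and its degraded copy."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    return 10.0 * math.log10(signal_power / noise_power)

# Assumed test signal: a 440 Hz tone sampled at 16 kHz with amplitude 0.9.
SAMPLE_RATE = 16000
tone = [0.9 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(1600)]

for bits in (4, 8, 16):
    print(bits, "bits:", round(snr_db(tone, quantize(tone, bits)), 1), "dB")
```

Each additional bit adds roughly 6 dB of signal-to-noise ratio for a signal like this, which is one way to reason about how coarse a representation a given application might tolerate.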


Subjects
Computer Neural Networks , Speech , Emotions , Software
2.
Sensors (Basel) ; 21(5), 2021 Mar 08.
Article in English | MEDLINE | ID: mdl-33800348

ABSTRACT

Many speech emotion recognition systems have been designed using different features and classification methods. Still, there is a lack of knowledge and reasoning about the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect the accuracy, and to what extent. This study extends the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and the lengths and overlaps of classification regions), frequency ranges, frequency scales, processing of the whole speech signal (spectrograms), vocal tract signals (filter banks, linear prediction coefficient (LPC) modeling), and excitation signals (inverse LPC filtering), magnitude and phase manipulations, cepstral features, etc. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross-validation, paired t-tests, and rank and Pearson correlations. The results revealed several settings in a 75% accuracy range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0-8 kHz frequency range. Spectrograms carrying both vocal tract and excitation information also scored well. Even basic processing such as pre-emphasis, segmentation, or magnitude modifications can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
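The evaluation procedure named in this abstract (N-fold cross-validation followed by a paired t-test) can be sketched in Python as follows. The fold splitting scheme, the function names, and the per-fold accuracies are hypothetical and serve only to illustrate the statistics; they are not values or code from the study.

```python
import math
import statistics

def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired per-fold scores of two settings."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)           # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Made-up per-fold accuracies of two feature settings on the same five folds.
setting_a = [0.74, 0.76, 0.75, 0.77, 0.73]
setting_b = [0.71, 0.74, 0.70, 0.75, 0.72]
t_value, dof = paired_t_statistic(setting_a, setting_b)
print("t =", round(t_value, 3), "with", dof, "degrees of freedom")
```

The test is paired because both settings are scored on the same folds, so the fold-to-fold variation cancels out in the differences. The resulting t value would then be compared against the critical value of Student's t distribution for the chosen significance level.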


Subjects
Emotions , Speech , Factual Databases , Perception
3.
Sci Rep ; 11(1): 15687, 2021 08 03.
Article in English | MEDLINE | ID: mdl-34344972

ABSTRACT

A new detection method for cognitive impairments is presented that utilizes eye-tracking signals recorded in a text reading test. This research extends published work that extracts combinations of various features. It does so by processing entire eye-tracking records either in time or in frequency while applying only basic signal pre-processing. Such signals were classified as a whole by Convolutional Neural Networks (CNNs) that hierarchically extract substantial features scattered either in time or in frequency and nonlinearly bind them using machine learning to minimize the detection error. The experiments used 100-fold cross-validation and a dataset containing signals of 185 subjects (88 subjects with low risk and 97 subjects with high risk of dyslexia). In a series of experiments, a magnitude-spectrum-based representation of time-interpolated eye-tracking signals gave the best results: an average accuracy of 96.6%, compared with 95.6%, the best published result on the same database. These findings suggest that a holistic approach involving small but sufficiently complex CNNs applied to properly pre-processed and expressed signals provides even better results than a combination of meticulously selected well-known features.
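One plausible reading of a "magnitude spectrum based representation of time interpolated eye-tracking signals" is sketched below in Python: irregularly timed gaze samples are linearly resampled onto a uniform time grid, and the magnitude of their discrete Fourier transform is taken. The interpolation scheme, the function names, and the sample data are assumptions; the abstract does not specify the exact pre-processing.

```python
import cmath
import math

def interpolate_uniform(times, values, n_points):
    """Linearly resample (times, values) onto n_points evenly spaced instants.

    Assumes times are strictly increasing and n_points >= 2.
    """
    t0, t1 = times[0], times[-1]
    step = (t1 - t0) / (n_points - 1)
    out = []
    j = 0
    for i in range(n_points):
        t = t0 + i * step
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        w = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] * (1 - w) + values[j + 1] * w)
    return out

def magnitude_spectrum(signal):
    """Magnitudes of DFT bins 0 .. n // 2, computed directly (O(n^2))."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * m / n)
                for m, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]

# Hypothetical horizontal gaze positions recorded at irregular timestamps (s).
times = [0.0, 0.9, 2.1, 3.0, 4.2, 5.0, 5.8, 7.0]
gaze_x = [100.0, 180.0, 260.0, 120.0, 200.0, 280.0, 140.0, 220.0]
uniform = interpolate_uniform(times, gaze_x, 8)
print([round(m, 1) for m in magnitude_spectrum(uniform)])
```

The direct DFT keeps the sketch dependency-free; for real recordings a fast Fourier transform routine would be the usual choice.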


Subjects
Dyslexia/diagnosis , Eye Movements , Eye-Tracking Technology , Algorithms , Data Analysis , Humans , Machine Learning , Computer Neural Networks , Reading
4.
IEEE J Biomed Health Inform ; 24(11): 3055-3065, 2020 11.
Article in English | MEDLINE | ID: mdl-32750936

ABSTRACT

Psychiatry is currently a medical field lacking an automated diagnostic process. The presence of a mental disorder is established by observing its typical symptoms. Eye-movement specifics have already been established as an "endophenotype" for schizophrenia, but an automated diagnostic process based on eye-movement analysis is still lacking. This article presents several novel approaches for the automatic detection of a schizophrenic disorder based on a free-view image test using a Rorschach inkblot and an eye tracker. Several features that made it possible to analyse the eye-tracker signal as a whole as well as its specific parts were tested. The features span global (heat maps, gaze plots), sequences of features (means, variances, and spectra), static (x and y signals as 2D images), dynamic (velocities), and model-based (limiting probabilities and transition matrices) categories. For each set of features, a suitable modelling and classification method was designed (convolutional, recurrent, fully connected and combined neural networks; Hidden Markov models). In this way, it was possible to determine the importance of each feature and its physical representation using k-fold cross-validation and a paired t-test. The dataset was collected from 22 people with schizophrenia and 22 healthy individuals. The most successful approach was based on heat maps using all data and convolutional networks, reaching a 78.8% accuracy, which is a 10.5% improvement over the reference method. Of all tested methods, there are two in an 85% accuracy range and over fifteen others in a 75% accuracy range at a 10% significance level.
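To illustrate the "heat map" feature category, the Python sketch below bins gaze points into a coarse grid of counts that could be fed to a convolutional network as a 2D image. The grid size, the screen dimensions, and the function name are hypothetical; the article's heat maps may well use smoothing or fixation durations, which this plain count grid does not model.

```python
def gaze_heat_map(points, width, height, cols, rows):
    """Count gaze points per cell of a rows x cols grid covering the screen.

    Points outside the width x height screen area are ignored.
    """
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            col = int(x * cols / width)
            row = int(y * rows / height)
            grid[row][col] += 1
    return grid

# Hypothetical gaze samples (pixels) on an assumed 100 x 100 stimulus area.
samples = [(10, 10), (10, 12), (90, 90), (150, 50)]
for row in gaze_heat_map(samples, 100, 100, 2, 2):
    print(row)
```

A count grid like this discards the temporal order of the gaze samples, which is why it falls under the global features rather than the sequence-based or dynamic ones.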


Subjects
Schizophrenia , Eye Movements , Eye-Tracking Technology , Humans , Computer Neural Networks , Schizophrenia/diagnosis