Search | BVS CLAP/SMR-OPAS/OMS

Optimising Speaker-Dependent Feature Extraction Parameters to Improve Automatic Speech Recognition Performance for People with Dysarthria.

Marini, Marco; Vanello, Nicola; Fanucci, Luca.

Sensors (Basel); 21(19), 2021 Sep 27.

Article in English | MEDLINE | ID: mdl-34640780

ABSTRACT

Within the field of Automatic Speech Recognition (ASR), impaired speech is a major challenge because standard approaches are ineffective in the presence of dysarthria. The first aim of our work is to confirm the effectiveness of a new speech analysis technique for speakers with dysarthria. This approach fine-tunes the size and shift parameters of the spectral analysis window used to compute the initial short-time Fourier transform, in order to improve the performance of a speaker-dependent ASR system. The second aim is to determine whether there is a correlation between the speaker's voice features and the optimal window and shift parameters that minimise the error of an ASR system for that specific speaker. For our experiments, we used both impaired and unimpaired Italian speech. Specifically, we used 30 speakers with dysarthria from the IDEA database and 10 professional speakers from the CLIPS database. Both databases are freely available. The results confirm that, if a standard ASR system performs poorly with a speaker with dysarthria, it can be improved by using the new speech analysis. Conversely, the new approach is ineffective for unimpaired and mildly impaired speech. Furthermore, there is a correlation between some of the speaker's voice features and their optimal parameters.
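The window size and shift mentioned in the abstract are the two parameters that control how a signal is cut into frames before the short-time Fourier transform. The following is a minimal sketch of that framing step, not the authors' implementation: the function names `stft_magnitude` and `candidate_grid`, the Hamming window, and the millisecond values are all assumptions chosen for illustration. A speaker-dependent search would score each (window, shift) pair with an ASR system, which is outside the scope of this sketch.

```python
import numpy as np


def stft_magnitude(signal, sample_rate, window_ms, shift_ms):
    """Magnitude STFT with an explicit analysis window size and shift (both in ms).

    Hypothetical helper for illustration; the Hamming window is an assumption.
    """
    win = int(round(sample_rate * window_ms / 1000.0))  # window length in samples
    hop = int(round(sample_rate * shift_ms / 1000.0))   # shift (hop) in samples
    if win <= 0 or hop <= 0:
        raise ValueError("window and shift must be positive")

    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < win:
        # Zero-pad very short signals so that at least one frame exists.
        signal = np.pad(signal, (0, win - len(signal)))

    n_frames = 1 + (len(signal) - win) // hop
    window = np.hamming(win)
    frames = np.stack(
        [signal[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    # One row per frame, one column per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))


def candidate_grid(window_sizes_ms, shifts_ms):
    """All (window, shift) pairs in which the shift does not exceed the window."""
    return [(w, s) for w in window_sizes_ms for s in shifts_ms if s <= w]


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz test tone
    for w, s in candidate_grid([25, 50], [10, 20]):
        spec = stft_magnitude(tone, sr, w, s)
        print(f"window={w} ms, shift={s} ms -> frames x bins = {spec.shape}")
```

A longer window gives finer frequency resolution but fewer, coarser time steps, and a larger shift reduces the number of frames. This trade-off is what a per-speaker parameter search would explore.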

Subjects

Dysarthria, Speech Perception, Humans, Speech, Speech Disorders, Speech Recognition Software

Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data.

Pervaiz, Ayesha; Hussain, Fawad; Israr, Huma; Tahir, Muhammad Ali; Raja, Fawad Riasat; Baloch, Naveed Khan; Ishmanov, Farruh; Zikria, Yousaf Bin.

Sensors (Basel); 20(8), 2020 Apr 19.

Article in English | MEDLINE | ID: mdl-32325814

ABSTRACT

The advent of new devices, technologies, and machine learning techniques, together with the availability of large free speech corpora, has made rapid and accurate speech recognition possible. In the last two decades, researchers and organizations have carried out extensive work on new techniques and their applications in speech processing systems. There are several speech command based applications in robotics, IoT, ubiquitous computing, and various human-computer interfaces. Various researchers have worked on enhancing the efficiency of speech command based systems and have used the speech command dataset. However, none of them addressed noise in that dataset. Noise is one of the major challenges in any speech recognition system, as real-time noise is a highly variable and unavoidable factor that affects the performance of speech recognition systems, particularly those that have not learned the noise efficiently. We thoroughly analyse the latest trends in speech recognition and evaluate the speech command dataset with different machine learning based and deep learning based techniques. A novel technique for noise robustness is proposed, which augments the training data with noise. Our proposed technique is tested on clean and noisy data along with locally generated data and achieves much better results than existing state-of-the-art techniques, thus setting a new benchmark.
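The abstract does not describe how noise is added to the training data. A common way to do this is to mix each clean clip with a noise recording scaled to a chosen signal-to-noise ratio (SNR). The sketch below assumes that approach; the names `mix_at_snr` and `augment`, and the SNR values in the example, are hypothetical and not taken from the paper.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean clip so that the mixture has the requested SNR in dB."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so that it matches the clean clip length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("clean and noise signals must have non-zero power")

    # SNR_dB = 10 * log10(P_clean / P_noise), solved for the required noise power.
    target_p_noise = p_clean / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_p_noise / p_noise)
    return clean + scale * noise


def augment(clean_clips, noise_clips, snr_choices_db, rng):
    """Return each clean clip followed by one noisy copy at a randomly chosen SNR."""
    augmented = []
    for clip in clean_clips:
        augmented.append(np.asarray(clip, dtype=np.float64))  # keep the original
        noise = noise_clips[rng.integers(len(noise_clips))]
        snr_db = float(rng.choice(snr_choices_db))
        augmented.append(mix_at_snr(clip, noise, snr_db))
    return augmented


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000
    clean = np.sin(2 * np.pi * 300.0 * t)  # stand-in for a speech command clip
    noise = rng.standard_normal(8000)      # stand-in for a background recording
    training_set = augment([clean], [noise], [0, 5, 10, 20], rng)
    print(len(training_set), "clips after augmentation")
```

Keeping the clean clips alongside the noisy copies lets a model see both conditions during training. Whether the paper's technique does this is not stated in the abstract.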

Subjects

Noise, Speech Recognition Software, Deep Learning, Humans, Machine Learning, Neural Networks, Computer, Speech Perception/physiology
