Assessment of Pepper Robot's Speech Recognition System through the Lens of Machine Learning.

Pande, Akshara; Mishra, Deepti

Affiliation

Pande A; Educational Technology Laboratory, Intelligent System and Analytics Group, Department of Computer Science (IDI), Norwegian University of Science and Technology, 2815 Gjøvik, Norway.
Mishra D; Educational Technology Laboratory, Intelligent System and Analytics Group, Department of Computer Science (IDI), Norwegian University of Science and Technology, 2815 Gjøvik, Norway.

Biomimetics (Basel); 9(7), 2024 Jun 27.

Article in English | MEDLINE | ID: mdl-39056832

ABSTRACT

Speech comprehension can be challenging due to multiple factors, causing inconvenience for both the speaker and the listener. In such situations, the humanoid robot Pepper can help by displaying the corresponding text on its screen. Before that is possible, however, the accuracy of the audio recordings captured by Pepper must be carefully assessed. In this study, an experiment was conducted with eight participants, with the primary objective of examining Pepper's speech recognition system using audio features such as Mel-Frequency Cepstral Coefficients, spectral centroid, spectral flatness, zero-crossing rate, pitch, and energy. The K-means algorithm was then employed to create clusters based on these features, and the most suitable cluster was selected with the help of the speech-to-text conversion tool Whisper. The best cluster is the one containing the most high-accuracy data points; to identify it, data points with a word error rate (WER) above 0.3 are discarded. The findings suggest that a distance of up to one meter from Pepper is suitable for capturing the best speech recordings, whereas age and gender do not influence the accuracy of recorded speech. The proposed system would be particularly useful in settings where subtitles are required to improve the comprehension of spoken statements.
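The WER criterion and cluster selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `wer` and `best_cluster` are hypothetical, WER is computed as the standard word-level edit distance divided by the reference length, and "best cluster" is assumed to mean the cluster retaining the most recordings after those with WER above 0.3 are discarded. The cluster labels would come from K-means over the audio features and the hypotheses from a transcription tool such as Whisper; both are passed in here as plain lists.

```python
from collections import Counter


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def best_cluster(labels, wers, threshold=0.3):
    """Return the cluster label with the most recordings at or below the threshold.

    Assumption: recordings with WER above the threshold are discarded, and the
    cluster keeping the largest number of recordings is taken as the best one.
    Returns None if every recording is discarded.
    """
    kept = Counter(label for label, w in zip(labels, wers) if w <= threshold)
    if not kept:
        return None
    return kept.most_common(1)[0][0]
```

For example, `best_cluster([0, 0, 1, 1, 1], [0.1, 0.5, 0.2, 0.0, 0.4])` keeps one recording in cluster 0 and two in cluster 1, so it selects cluster 1.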

Keywords

K-means clustering; Pepper robot; audio features; evaluation metrics; speech recognition

Full text: 1 | Database: MEDLINE | Language: English | Year of publication: 2024 | Document type: Article
