Phonetic posteriorgram-based voice conversion system to improve speech intelligibility of dysarthric patients.

Zheng, Wei-Zhong; Han, Ji-Yan; Lee, Chen-Kai; Lin, Yu-Yi; Chang, Shu-Han; Lai, Ying-Hui

Affiliation

Zheng WZ; Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan.
Han JY; Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan.
Lee CK; Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan.
Lin YY; Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan.
Chang SH; Monte Vista Christian School, CA, USA.
Lai YH; Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan; Medical Device Innovation & Translation Center, National Yang Ming Chiao Tung University, Taipei, Taiwan. Electronic address: yh.lai@nycu.edu.tw.

Comput Methods Programs Biomed ; 215: 106602, 2022 Mar.

Article in English | MEDLINE | ID: mdl-35021138

ABSTRACT

BACKGROUND AND OBJECTIVE: Most dysarthric patients encounter communication problems due to unintelligible speech. Many voice-driven systems aim to improve their speech intelligibility; however, the performance of these systems is affected by challenging application conditions (e.g., time variance of the patient's speech and background noise). To alleviate these problems, we proposed a dysarthria voice conversion (DVC) system for dysarthric patients and investigated its benefits under challenging application conditions.
METHOD: A deep learning-based voice conversion system with phonetic posteriorgram (PPG) features, called the DVC-PPG system, was proposed in this study. An objective evaluation metric based on the Google automatic speech recognition (Google ASR) system and a listening test were used to demonstrate the speech intelligibility benefits of DVC-PPG under quiet and noisy test conditions. In addition, a well-known voice conversion system using mel-spectrogram features, DVC-Mels, was used for comparison to verify the benefits of the proposed DVC-PPG system.
RESULTS: Averaged over two subjects, the Google ASR objective evaluation showed that, under quiet conditions, the DVC-PPG system provided higher speech recognition rates in the duplicate and outside test conditions (83.2% and 67.5%) than dysarthric speech (36.5% and 26.9%) and DVC-Mels (52.9% and 33.8%). The DVC-PPG system also provided more stable performance than DVC-Mels under noisy test conditions. In addition, the listening test showed that the speech intelligibility of DVC-PPG was better than that of the dysarthric speech and DVC-Mels under both the duplicate and outside conditions.
CONCLUSIONS: The objective evaluation metric and listening test results showed that the recognition rate of the proposed DVC-PPG system was significantly higher than the rates obtained with the original dysarthric speech and the DVC-Mels system. Therefore, it can be inferred from our study that the DVC-PPG system can improve the ability of dysarthric patients to communicate with people under challenging application conditions.
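
As a reading aid, the quiet-condition recognition rates quoted in the abstract can be turned into absolute gains in percentage points. The short Python sketch below does only that arithmetic. The dictionary layout and the helper name `gain_over` are illustrative and not part of the study, and it assumes that each reported pair lists the duplicate condition first and the outside condition second.

```python
# Recognition rates (%) under quiet conditions, as quoted in the abstract.
# Assumption: in each reported pair the first value is the "duplicate"
# condition and the second is the "outside" condition.
RATES = {
    "dysarthric": {"duplicate": 36.5, "outside": 26.9},
    "DVC-Mels": {"duplicate": 52.9, "outside": 33.8},
    "DVC-PPG": {"duplicate": 83.2, "outside": 67.5},
}


def gain_over(baseline, system="DVC-PPG"):
    """Absolute gain of `system` over `baseline`, in percentage points."""
    return {
        condition: round(RATES[system][condition] - RATES[baseline][condition], 1)
        for condition in RATES[system]
    }


if __name__ == "__main__":
    for baseline in ("dysarthric", "DVC-Mels"):
        print(baseline, gain_over(baseline))
```

These differences are simple subtractions of the averaged rates; they say nothing about per-subject variation or statistical significance, which the abstract does not report in detail.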

Subjects

Speech Intelligibility; Voice; Dysarthria; Humans; Phonetics; Speech Production Measurement

Keywords

Deep learning; Dysarthric patient; Phonetic posteriorgram; Voice conversion

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Speech Intelligibility / Voice Limits: Humans Language: En Journal: Comput Methods Programs Biomed Journal subject: MEDICAL INFORMATICS Publication year: 2022 Document type: Article Affiliation country: Taiwan Publication country: Ireland
