Automatic speech recognition (ASR) for the diagnosis of pronunciation of speech sound disorders in Korean children.

Ahn, Taekyung; Hong, Yeonjung; Im, Younggon; Kim, Do Hyung; Kang, Dayoung; Jeong, Joo Won; Kim, Jae Won; Kim, Min Jung; Cho, Ah-Ra; Nam, Hosung; Jang, Dae-Hyun

Affiliations

Ahn T; Department of English Language and Literature, Korea University, Seoul, Republic of Korea.
Hong Y; AI R&D Group, MediaZen, Seongnam-si, Republic of Korea.
Im Y; AI R&D Group, MediaZen, Seongnam-si, Republic of Korea.
Kim DH; AI R&D Group, MediaZen, Seongnam-si, Republic of Korea.
Kang D; Department of Rehabilitation Medicine, Incheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
Jeong JW; Department of Rehabilitation Medicine, Incheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
Kim JW; Department of Rehabilitation Medicine, Incheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
Kim MJ; Department of Rehabilitation Medicine, Incheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
Cho AR; Department of Special Education, Dankook University, Youngin-si, Republic of Korea.
Nam H; Department of Rehabilitation Medicine, Eunpyeong St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
Jang DH; Department of English Language and Literature, Korea University, Seoul, Republic of Korea.

Clin Linguist Phon ; : 1-14, 2024 Aug 20.

Article in English | MEDLINE | ID: mdl-39162064

ABSTRACT

This study presents an automatic speech recognition (ASR) model designed to diagnose pronunciation errors in children with speech sound disorders (SSDs), with the aim of replacing manual transcription in clinical procedures. Because general-purpose ASR models mainly map input speech to the standard spellings of words, well-known high-performance ASR models are not suitable for evaluating the pronunciation of children with SSDs. We fine-tuned the wav2vec 2.0 XLS-R model to transcribe words as children actually pronounce them, rather than converting the speech into standard spellings. The model was fine-tuned on a speech dataset of 137 children with SSDs pronouncing 73 Korean words selected for use in clinical diagnosis. When the model's predictions were compared with human annotations of the pronunciations as heard, its Phoneme Error Rate (PER) was only 10%. In contrast, despite its robust performance on general tasks, the state-of-the-art ASR model Whisper showed limitations in recognising the speech of children with SSDs, with a PER of approximately 50%. Although the model still needs improvement in recognising unclear pronunciations, this study demonstrates that ASR models can streamline complex pronunciation-error diagnostic procedures in clinical settings.
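The abstract reports PER figures but does not state how PER was computed. A common definition is PER = (S + D + I) / N, where S, D and I are the phoneme substitutions, deletions and insertions in the minimum-edit alignment of the predicted sequence against the reference, and N is the number of reference phonemes. The following is a minimal sketch of that standard definition, assuming phoneme sequences are given as lists of string symbols; the function names, the romanised phoneme symbols and the example words are hypothetical illustrations, not the study's data or code, and whether the authors pooled errors across utterances (as done here) is not stated in this record.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance where substitutions, deletions and insertions each cost 1."""
    # prev[j] holds the distance between the first i-1 reference phonemes and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)]


def phoneme_error_rate(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER: total edits divided by total number of reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcriptions contain no phonemes")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_ref


if __name__ == "__main__":
    # Hypothetical example: reference = human annotation of what was heard,
    # hypothesis = model output; one /s/ -> /t/ substitution in a 5-phoneme word.
    refs = [["s", "a", "k", "w", "a"]]
    hyps = [["t", "a", "k", "w", "a"]]
    print(f"PER = {phoneme_error_rate(refs, hyps):.3f}")  # prints "PER = 0.200"
```

Pooling edits over all utterances before dividing weights long words more heavily than averaging per-word PER would; either convention is plausible, so the choice here is an assumption.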

Keywords

Automatic speech recognition; Korean; children; mispronunciation detection; speech sound disorders

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Language: English | Journal: Clin Linguist Phon | Journal subject: Speech-Language Pathology | Year of publication: 2024 | Document type: Article