Partial compensation for coarticulatory vowel nasalization across concatenative and neural text-to-speech.

Zellou, Georgia; Cohn, Michelle; Block, Aleese

Zellou G; Phonetics Laboratory, Linguistics Department, University of California, Davis, 469 Kerr Hall, One Shields Avenue, Davis, California 95616, USA.
Cohn M; Phonetics Laboratory, Linguistics Department, University of California, Davis, 469 Kerr Hall, One Shields Avenue, Davis, California 95616, USA.
Block A; Phonetics Laboratory, Linguistics Department, University of California, Davis, 469 Kerr Hall, One Shields Avenue, Davis, California 95616, USA.

J Acoust Soc Am; 149(5): 3424, 2021 May.

Article in English | MEDLINE | ID: mdl-34241128

ABSTRACT

This study investigates the perception of coarticulatory vowel nasality generated using different text-to-speech (TTS) methods in American English. Experiment 1 compared concatenative and neural TTS using a 4IAX task, where listeners discriminated between a word pair containing either two oral or two nasalized vowels and a word pair containing one oral and one nasalized vowel. Vowels occurred in either identical or alternating consonant contexts across pairs to reveal perceptual sensitivity and compensatory behavior, respectively. For identical contexts, listeners were better at discriminating between oral and nasalized vowels in neural than in concatenative TTS for nasalized same-vowel trials, but better discrimination for concatenative TTS was observed for oral same-vowel trials. Meanwhile, listeners displayed less compensation for coarticulation in neural than in concatenative TTS. To determine whether apparent roboticity of the TTS voice shapes vowel discrimination and compensation patterns, a "roboticized" version of neural TTS was generated (monotonized f0 and addition of an echo), holding phonetic nasality constant; a ratings study (experiment 2) confirmed that the manipulation resulted in different apparent roboticity. Experiment 3 compared the discrimination of unmodified neural TTS and roboticized neural TTS: listeners displayed lower accuracy in identical contexts for roboticized relative to unmodified neural TTS, yet performance in alternating contexts was similar.

Subject(s)

Speech Perception; Voice; Language; Phonetics; Speech; Speech Acoustics

Full text: 1 Database: MEDLINE Main subject: Speech Perception / Voice Language: En Year: 2021 Document type: Article
