Results 1 - 9 of 9
1.
J Acoust Soc Am; 155(5): 3060-3070, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38717210

ABSTRACT

Speakers tailor their speech to different types of interlocutors. For example, speech directed to voice technology has different acoustic-phonetic characteristics than speech directed to a human. The present study investigates the perceptual consequences of human- and device-directed registers in English. We compare two groups of speakers: participants whose first language is English (L1) and bilingual L1 Mandarin-L2 English talkers. Participants produced short sentences in several conditions: an initial production and a repeat production after a human or device guise indicated either understanding or misunderstanding. In experiment 1, a separate group of L1 English listeners heard these sentences and transcribed the target words. In experiment 2, the same productions were transcribed by an automatic speech recognition (ASR) system. Results show that transcription accuracy was highest for L1 talkers for both human and ASR transcribers. Furthermore, there were no overall differences in transcription accuracy between human- and device-directed speech. Finally, while human listeners showed an intelligibility benefit for coda repair productions, the ASR transcriber did not benefit from these enhancements. Findings are discussed in terms of models of register adaptation, phonetic variation, and human-computer interaction.
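
As a concrete illustration of the dependent measure, a minimal Python sketch of word-level transcription scoring follows. The scoring rule (target word appears in the transcript) and the example data are hypothetical stand-ins, not the authors' materials.

def word_accuracy(target_words, transcripts):
    # Proportion of trials whose transcript contains the target word.
    hits = sum(
        target.lower() in transcript.lower().split()
        for target, transcript in zip(target_words, transcripts)
    )
    return hits / len(target_words)

# Hypothetical responses from a human listener and an ASR system.
targets = ["coat", "code", "cone"]
human = ["she said coat", "she said code", "she said cone"]
asr = ["she said coat", "she said coat", "she said cone"]

print(f"human accuracy: {word_accuracy(targets, human):.2f}")  # 1.00
print(f"ASR accuracy:   {word_accuracy(targets, asr):.2f}")    # 0.67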


Subject(s)
Multilingualism, Speech Intelligibility, Speech Perception, Humans, Male, Female, Adult, Young Adult, Speech Acoustics, Phonetics, Speech Recognition Software
2.
J Acoust Soc Am; 153(2): 1084, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36859167

ABSTRACT

This study investigates the impact of wearing a face mask on the production and perception of coarticulatory vowel nasalization. Speakers produced monosyllabic American English words with oral and nasal codas (i.e., CVC and CVN), containing either tense or lax vowels, in face-masked and un-face-masked conditions to a real human interlocutor. Acoustic analyses indicate that speakers produced greater coarticulatory vowel nasality in CVN items when wearing a face mask, particularly when the vowel was lax, suggesting targeted enhancement of the oral-nasalized contrast in this condition. No such enhancement was observed for tense vowels. In a perception study, participants heard CV syllables excised from the recorded words and identified the coda. For lax vowels, listeners were more accurate at identifying the coda in the face-masked condition, indicating that they benefited from the speakers' production adjustments. Overall, the results indicate that speakers adapt their speech in specific contexts when wearing a face mask, and that these adjustments influence listeners' ability to identify words in the speech signal.


Subject(s)
Masks, Speech, Humans, Personal Protective Equipment, Acoustics, Perception
3.
J Acoust Soc Am; 149(5): 3424, 2021 May.
Article in English | MEDLINE | ID: mdl-34241128

ABSTRACT

This study investigates the perception of coarticulatory vowel nasality generated using different text-to-speech (TTS) methods in American English. Experiment 1 compared concatenative and neural TTS using a 4IAX task, in which listeners discriminated between a word pair containing either two oral or two nasalized vowels and a word pair containing one oral and one nasalized vowel. Vowels occurred either in identical or in alternating consonant contexts across pairs, to reveal perceptual sensitivity and compensatory behavior, respectively. For identical contexts, listeners discriminated oral and nasalized vowels better in neural than in concatenative TTS on nasalized same-vowel trials, whereas concatenative TTS yielded better discrimination on oral same-vowel trials. Meanwhile, listeners displayed less compensation for coarticulation in neural than in concatenative TTS. To determine whether the apparent roboticity of the TTS voice shapes vowel discrimination and compensation patterns, a "roboticized" version of the neural TTS was generated (monotonized f0 and an added echo), holding phonetic nasality constant; a ratings study (experiment 2) confirmed that the manipulation changed apparent roboticity. Experiment 3 compared discrimination of unmodified and roboticized neural TTS: listeners were less accurate in identical contexts for roboticized than for unmodified neural TTS, yet performance in alternating contexts was similar.
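
Of the two roboticization steps, f0 flattening is typically done by PSOLA resynthesis (e.g., in Praat), while the echo is straightforward to sketch with NumPy. The delay and decay values below are illustrative choices, not the paper's settings.

import numpy as np

def add_echo(signal, sr, delay_s=0.08, decay=0.4):
    # Mix a delayed, attenuated copy of the signal back into itself.
    d = int(delay_s * sr)
    out = signal.astype(float).copy()
    out[d:] += decay * signal[:-d]
    return out / np.abs(out).max()  # rescale to avoid clipping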


Subject(s)
Speech Perception, Voice, Language, Phonetics, Speech, Speech Acoustics
4.
JASA Express Lett; 4(4), 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38687585

ABSTRACT

This paper examines the adaptations African American English speakers make when imagining talking to a voice assistant, compared to a close friend/family member and to a stranger. Results show that speakers slowed their speaking rate and produced less pitch variation in voice-assistant-directed speech (DS) relative to human-DS. These adjustments were not mediated by how often participants reported experiencing errors with automatic speech recognition. Overall, this paper addresses a limitation in the range of language varieties examined in work on technology-DS registers and contributes to our understanding of the dynamics of human-computer interaction.


Subject(s)
Black or African American, Humans, Male, Female, Adult, Imagination, Speech, Language, Young Adult, Speech Acoustics
5.
Sci Rep; 14(1): 15611, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38971806

ABSTRACT

This study compares how English-speaking adults and children from the United States adapt their speech when talking to a real person and a smart speaker (Amazon Alexa) in a psycholinguistic experiment. Overall, participants produced more effortful speech when talking to a device (longer duration and higher pitch). These differences also varied by age: children produced even higher pitch in device-directed speech, suggesting a stronger expectation to be misunderstood by the system. In support of this, we see that after a staged recognition error by the device, children increased pitch even more. Furthermore, both adults and children displayed the same degree of variation in their responses for whether "Alexa seems like a real person or not", further indicating that children's conceptualization of the system's competence shaped their register adjustments, rather than an increased anthropomorphism response. This work speaks to models on the mechanisms underlying speech production, and human-computer interaction frameworks, providing support for routinized theories of spoken interaction with technology.
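
The two acoustic measures at issue, utterance duration and pitch, can be pulled from a recording with the praat-parselmouth package. The sketch below uses a hypothetical file name and the library's default pitch settings, not the study's analysis pipeline.

import parselmouth  # pip install praat-parselmouth

snd = parselmouth.Sound("utterance.wav")   # hypothetical recording
duration = snd.xmax - snd.xmin             # utterance duration in seconds

pitch = snd.to_pitch()                     # default autocorrelation analysis
f0 = pitch.selected_array["frequency"]     # f0 per frame in Hz; 0 = unvoiced
voiced = f0[f0 > 0]

print(f"duration: {duration:.2f} s, mean f0: {voiced.mean():.1f} Hz")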


Subject(s)
Speech, Humans, Adult, Child, Male, Female, Speech/physiology, Young Adult, Adolescent, Psycholinguistics
6.
JASA Express Lett; 3(12), 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38117232

ABSTRACT

This study investigates how California English speakers adjust nasal coarticulation and hyperarticulation on vowels across three speech styles: speaking slowly and clearly (imagining a hard-of-hearing addressee), casually (imagining a friend/family-member addressee), and speaking quickly and clearly (imagining being an auctioneer). Results show covariation between speaking rate and vowel hyperarticulation across the styles. Additionally, speakers produce more extensive anticipatory nasal coarticulation in the slow-clear style, over and above its slower speech rate. These findings are interpreted in terms of accounts of coarticulation in which speakers selectively tune their production of nasal coarticulation to the speaking style.


Subject(s)
Geraniaceae, Speech, Humans, Friends, Language, Nose
7.
J Speech Lang Hear Res; 66(2): 545-564, 2023 Feb 13.
Article in English | MEDLINE | ID: mdl-36729698

ABSTRACT

PURPOSE: This study addresses the debate over whether musicians, through years of targeted auditory training, have an advantage in speech-in-noise perception. We also consider the effect of age on any such advantage, comparing musicians and nonmusicians who all had normal hearing. We manipulate the degree of fundamental frequency (f0) separation between competing talkers and use different tasks to probe the attentional differences that might shape a musician's advantage across ages. METHOD: Participants (ranging in age from 18 to 66 years) included 29 musicians and 26 nonmusicians. They completed two tasks varying in attentional demands: (a) a selective attention task in which listeners identified a target sentence presented with a one-talker interferer (Experiment 1) and (b) a divided attention task in which listeners heard two vowels played simultaneously and identified both (Experiment 2). In both paradigms, the f0 separation between the two voices was manipulated (Δf0 = 0, 0.156, 0.306, 1, 2, or 3 semitones). RESULTS: Larger f0 separations led to higher accuracy on both tasks. Additionally, we find evidence for a musician's advantage across the two studies. In the sentence identification task, younger adult musicians showed higher accuracy overall, as well as stronger reliance on f0 separation, but this advantage declined with musicians' age. In the double vowel task, musicians of all ages showed an across-the-board advantage in detecting the presence of two vowels, and relied more on f0 separation to aid stream separation, but showed no consistent advantage in identifying both vowels. CONCLUSIONS: Overall, we find support for a hybrid auditory encoding-attention account of music-to-speech transfer. The musician's advantage includes f0, but the benefit also depends on the attentional demands of the task and on listeners' age. Taken together, this study suggests that a musician's advantage reflects a complex relationship among age, musical experience, and the speech-in-speech paradigm. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21956777.
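
The Δf0 values are expressed on the semitone scale, where the separation between frequencies f1 and f2 is 12·log2(f2/f1). A quick Python check follows; the 100 Hz reference voice is an arbitrary illustration, not a study parameter.

import math

def semitones(f1_hz, f2_hz):
    # Signed f0 separation in semitones between two frequencies.
    return 12 * math.log2(f2_hz / f1_hz)

base = 100.0                                # arbitrary reference f0 in Hz
for delta in (0, 0.156, 0.306, 1, 2, 3):    # the study's separations
    shifted = base * 2 ** (delta / 12)      # f0 delta semitones above base
    print(f"{delta:>5} st -> {shifted:6.2f} Hz "
          f"(check: {semitones(base, shifted):.3f})")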


Subject(s)
Music, Speech Perception, Adult, Humans, Adolescent, Young Adult, Middle Aged, Aged, Speech, Hearing, Noise, Attention
8.
JASA Express Lett; 2(4): 045204, 2022 Apr.
Article in English | MEDLINE | ID: mdl-36154231

ABSTRACT

This study examined how speaking style and guise influence the intelligibility of text-to-speech (TTS) and naturally produced human voices. Results showed that TTS voices were less intelligible overall. Although a clear speech style improved intelligibility for both human and TTS voices (here, "newscaster" neural TTS), the clear-speech benefit was larger for TTS voices. Finally, a visual device guise decreased intelligibility regardless of voice type. The results suggest that both speaking style and visual guise affect the intelligibility of human and TTS voices. Findings are discussed in terms of theories about the role of social information in speech perception.


Subject(s)
Speech Perception, Text Messaging, Voice, Cognition, Humans, Speech Intelligibility
9.
Cognition; 210: 104570, 2021 May.
Article in English | MEDLINE | ID: mdl-33450446

ABSTRACT

This study investigates the impact of wearing a fabric face mask on speech comprehension, an underexplored topic that can inform theories of speech production. Speakers produced sentences in three speech styles (casual, clear, positive-emotional) in both face-masked and non-face-masked conditions. Listeners identifying words in multi-talker babble were most accurate for sentences produced in clear speech and least accurate for casual speech, with emotional-speech accuracy numerically in between. In the clear speaking style, face-masked speech was more intelligible than non-face-masked speech, suggesting that speakers make clarity adjustments specifically for face masks. In contrast, in the emotional condition, face-masked speech was less intelligible than non-face-masked speech, and in the casual condition no difference was observed, suggesting that 'emotional' and 'casual' speech are not styles produced with the explicit intent of being intelligible to listeners. These findings are discussed in terms of automatic versus targeted speech adaptation accounts.
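
Word identification "in multi-talker babble" implies mixing each target sentence with babble at a fixed signal-to-noise ratio. Below is a minimal NumPy sketch of that standard mixing step; the SNR value and arrays are illustrative, not the study's stimulus parameters.

import numpy as np

def mix_at_snr(speech, babble, snr_db):
    # Scale the babble so that speech power / babble power equals the target SNR.
    babble = babble[: len(speech)]            # trim the masker to the target length
    p_s = np.mean(speech ** 2)
    p_b = np.mean(babble ** 2)
    scale = np.sqrt(p_s / (p_b * 10 ** (snr_db / 10)))
    mix = speech + scale * babble
    return mix / np.abs(mix).max()            # rescale to avoid clipping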


Subject(s)
Masks, Speech Perception, Adaptation, Physiological, Emotions, Humans, Speech Intelligibility