Results 1 - 9 of 9
1.
Psychon Bull Rev ; 29(2): 627-634, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34731443

ABSTRACT

The mapping between speech acoustics and phonemic representations is highly variable across talkers, and listeners are slower to recognize words when listening to multiple talkers compared with a single talker. Listeners' speech processing efficiency in mixed-talker settings improves when given time to reorient their attention to each new talker. However, it remains unknown how much time is needed to fully reorient attention to a new talker in mixed-talker settings so that speech processing becomes as efficient as when listening to a single talker. In this study, we examined how speech processing efficiency improves in mixed-talker settings as a function of the duration of continuous speech from a talker. In single-talker and mixed-talker conditions, listeners identified target words either in isolation or preceded by a carrier vowel of parametrically varying durations from 300 to 1,500 ms. Listeners' word identification was significantly slower in every mixed-talker condition compared with the corresponding single-talker condition. The costs associated with processing mixed-talker speech declined significantly as the duration of the speech carrier increased from 0 to 600 ms. However, increasing the carrier duration beyond 600 ms did not achieve further reduction in talker variability-related processing costs. These results suggest that two parallel mechanisms support processing talker variability: A stimulus-driven mechanism that operates on short timescales to reorient attention to new auditory sources, and a top-down mechanism that operates over longer timescales to allocate the cognitive resources needed to accommodate uncertainty in acoustic-phonemic correspondences during contexts where speech may come from multiple talkers.
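The cost pattern described above (a drop in the mixed-talker penalty up to a 600 ms carrier, then a plateau) amounts to comparing mixed- minus single-talker reaction times at each carrier duration. A minimal sketch with entirely hypothetical mean RTs, shaped only to illustrate the reported pattern (none of these numbers come from the study):

```python
import numpy as np

# Hypothetical mean reaction times (ms) per carrier duration,
# loosely shaped like the reported pattern: the mixed-talker
# cost shrinks up to ~600 ms of carrier speech, then plateaus.
durations = np.array([0, 300, 600, 900, 1200, 1500])   # carrier duration, ms
single    = np.array([520, 515, 512, 511, 512, 510])   # single-talker RT
mixed     = np.array([600, 560, 532, 531, 533, 530])   # mixed-talker RT

cost = mixed - single                      # talker-variability cost per duration
early_drop = cost[0] - cost[2]             # reduction from 0 to 600 ms
late_drop  = cost[2] - cost[5]             # further reduction beyond 600 ms
```

With data of this shape, `early_drop` is large while `late_drop` is near zero, mirroring the two-mechanism interpretation: a fast stimulus-driven reduction followed by a stable residual cost.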


Subject(s)
Speech Perception , Adaptation, Physiological , Auditory Perception , Humans , Speech , Speech Acoustics
2.
Brain Lang ; 201: 104722, 2020 02.
Article in English | MEDLINE | ID: mdl-31835154

ABSTRACT

Adjusting to the vocal characteristics of a new talker is important for speech recognition. Previous research has indicated that adjusting to talker differences is an active cognitive process that depends on attention and working memory (WM). These studies have not examined how talker variability affects perception and neural responses in fluent speech. Here we use source analysis from high-density EEG to show that perceiving fluent speech in which the talker changes recruits early involvement of parietal and temporal cortical areas, suggesting functional involvement of WM and attention in talker normalization. We extend these findings to acoustic source change in general by examining the understanding of environmental sounds in spoken sentence contexts. Though there may be differences in cortical recruitment for the processing demands of non-speech sounds versus a changing talker, the underlying mechanisms are similar, supporting the view that shared cognitive-general mechanisms assist both talker normalization and speech-to-nonspeech transitions.


Subject(s)
Cerebral Cortex/physiology , Speech Perception , Adult , Attention , Comprehension , Female , Humans , Male , Memory, Short-Term , Speech Acoustics , Voice
3.
Cognition ; 192: 103982, 2019 11.
Article in English | MEDLINE | ID: mdl-31229740

ABSTRACT

Perceptual adaptation to a talker enables listeners to efficiently resolve the many-to-many mapping between variable speech acoustics and abstract linguistic representations. However, models of speech perception have not delved into the variety or the quantity of information necessary for successful adaptation, nor how adaptation unfolds over time. In three experiments using speeded classification of spoken words, we explored how the quantity (duration), quality (phonetic detail), and temporal continuity of talker-specific context contribute to facilitating perceptual adaptation to speech. In single- and mixed-talker conditions, listeners identified phonetically-confusable target words in isolation or preceded by carrier phrases of varying lengths and phonetic content, spoken by the same talker as the target word. Word identification was always slower in mixed-talker conditions than single-talker ones. However, interference from talker variability decreased as the duration of preceding speech increased but was not affected by the amount of preceding talker-specific phonetic information. Furthermore, efficiency gains from adaptation depended on temporal continuity between preceding speech and the target word. These results suggest that perceptual adaptation to speech may be understood via models of auditory streaming, where perceptual continuity of an auditory object (e.g., a talker) facilitates allocation of attentional resources, resulting in more efficient perceptual processing.


Subject(s)
Adaptation, Psychological , Phonetics , Speech Acoustics , Speech Perception , Acoustic Stimulation , Adult , Attention , Female , Humans , Male , Time Factors , Young Adult
4.
Neuroimage Clin ; 23: 101814, 2019.
Article in English | MEDLINE | ID: mdl-30978657

ABSTRACT

Despite the lack of invariance in the mapping between the acoustic signal and phonological representation, typical listeners are capable of using information about a talker's vocal characteristics to recognize phonemes, a process known as "talker normalization". The current study investigated the time course of talker normalization in typical listeners and individuals with congenital amusia, a neurodevelopmental disorder of refined pitch processing. We examined the event-related potentials (ERPs) underlying lexical tone processing in 24 Cantonese-speaking amusics and 24 typical listeners (controls) in two conditions: blocked-talker and mixed-talker conditions. The results demonstrated that for typical listeners, effects of talker variability can be observed as early as in the N1 time-window (100-150 ms), with the N1 amplitude reduced in the mixed-talker condition. Significant effects were also found in later components: the N2b/c peaked significantly earlier and the P3a and P3b amplitudes were enhanced in the blocked-talker condition relative to the mixed-talker condition, especially for the tone pair that is more difficult to discriminate. These results suggest that the blocked-talker mode of stimulus presentation probably facilitates auditory processing and requires less attentional effort with easier speech categorization than the mixed-talker condition, providing neural evidence for the "active control theory". On the other hand, amusics exhibited comparable N1 amplitude to controls in both conditions, but deviated from controls in later components. They demonstrated an overall later N2b/c peak latency, significantly reduced P3a amplitude in the blocked-talker condition, and reduced P3b amplitude irrespective of talker condition.
These results suggest that the amusic brain was intact in the auditory processing of talker normalization processes, as reflected by the comparable N1 amplitude, but exhibited reduced automatic attentional switch to tone changes in the blocked-talker condition, as captured by the reduced P3a amplitude, which presumably underlies a previously reported perceptual "anchoring" deficit in amusics. Altogether, these findings revealed the time course of talker normalization processes in typical listeners and extended the finding that conscious pitch processing is impaired in the amusic brain.
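The component effects above are typically quantified as the mean voltage within a fixed post-stimulus time window (e.g., 100-150 ms for the N1). A minimal sketch of that computation on a single hypothetical epoch, not the authors' analysis pipeline:

```python
import numpy as np

def mean_amplitude(epoch, times, t_start, t_end):
    """Mean voltage of a single-channel ERP epoch within a
    time window (times in ms, epoch in microvolts)."""
    mask = (times >= t_start) & (times <= t_end)
    return epoch[mask].mean()

# Hypothetical epoch: 1000 ms sampled at 1 kHz, containing one
# negative Gaussian deflection ("N1-like") centred on 125 ms.
times = np.arange(0, 1000)                                   # ms
epoch = -2.0 * np.exp(-((times - 125) ** 2) / (2 * 15 ** 2))  # microvolts
n1 = mean_amplitude(epoch, times, 100, 150)  # N1 window from the study
```

Condition differences such as the reduced mixed-talker N1 would then be tested by comparing these per-condition window means across participants.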


Subject(s)
Auditory Perceptual Disorders/physiopathology , Brain/physiopathology , Pitch Perception/physiology , Speech Perception/physiology , Acoustic Stimulation , Adolescent , Adult , Auditory Perceptual Disorders/psychology , Evoked Potentials, Auditory , Female , Humans , Male , Phonetics , Young Adult
5.
J Phon ; 56: 66-74, 2016 May.
Article in English | MEDLINE | ID: mdl-28867850

ABSTRACT

One challenge for speech perception is between-speaker variability in the acoustic parameters of speech. For example, the same phoneme (e.g. the vowel in "cat") may have substantially different acoustic properties when produced by two different speakers and yet the listener must be able to interpret these disparate stimuli as equivalent. Perceptual tuning, the use of contextual information to adjust phonemic representations, may be one mechanism that helps listeners overcome obstacles they face due to this variability during speech perception. Here we test whether visual contextual cues to speaker identity may facilitate the formation and maintenance of distributional representations for individual speakers, allowing listeners to adjust phoneme boundaries in a speaker-specific manner. We familiarized participants to an audiovisual continuum between /aba/ and /ada/. During familiarization, the "B-face" mouthed /aba/ when an ambiguous token was played, while the "D-face" mouthed /ada/. At test, the same ambiguous token was more likely to be identified as /aba/ when paired with a still image of the "B-face" than with an image of the "D-face." This was not the case in the control condition when the two faces were paired equally with the ambiguous token. Together, these results suggest that listeners may form speaker-specific phonemic representations using facial identity cues.

6.
Front Psychol ; 5: 698, 2014.
Article in English | MEDLINE | ID: mdl-25076919

ABSTRACT

A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

7.
Brain Lang ; 126(2): 193-202, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23792769

ABSTRACT

This event-related potential (ERP) study examines the time course of context-dependent talker normalization in spoken word identification. We found three ERP components, the N1 (100-220 ms), the N400 (250-500 ms) and the Late Positive Component (500-800 ms), which are conjectured to involve (a) auditory processing, (b) talker normalization and lexical retrieval, and (c) decisional processes/lexical selection, respectively. Talker normalization likely occurs in the time window of the N400 and overlaps with the lexical retrieval process. Compared with the nonspeech context, the speech contexts, no matter whether they have semantic content or not, enable listeners to tune to a talker's pitch range. In this way, speech contexts induce more efficient talker normalization during the activation of potential lexical candidates and lead to more accurate selection of the intended word in spoken word identification.


Subject(s)
Brain/physiology , Evoked Potentials/physiology , Speech Perception/physiology , Electroencephalography , Female , Humans , Male , Signal Processing, Computer-Assisted , Time Factors , Young Adult
8.
Front Psychol ; 3: 203, 2012.
Article in English | MEDLINE | ID: mdl-22737140

ABSTRACT

Voices have unique acoustic signatures, contributing to the acoustic variability listeners must contend with in perceiving speech, and it has long been proposed that listeners normalize speech perception to information extracted from a talker's speech. Initial attempts to explain talker normalization relied on extraction of articulatory referents, but recent studies of context-dependent auditory perception suggest that general auditory referents such as the long-term average spectrum (LTAS) of a talker's speech similarly affect speech perception. The present study aimed to differentiate the contributions of articulatory/linguistic versus auditory referents for context-driven talker normalization effects and, more specifically, to identify the specific constraints under which such contexts impact speech perception. Synthesized sentences manipulated to sound like different talkers influenced categorization of a subsequent speech target only when differences in the sentences' LTAS were in the frequency range of the acoustic cues relevant for the target phonemic contrast. This effect was true both for speech targets preceded by spoken sentence contexts and for targets preceded by non-speech tone sequences that were LTAS-matched to the spoken sentence contexts. Specific LTAS characteristics, rather than perceived talker, predicted the results suggesting that general auditory mechanisms play an important role in effects considered to be instances of perceptual talker normalization.
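The long-term average spectrum (LTAS) invoked above is simply the magnitude spectrum of a signal averaged over its whole duration. A minimal illustrative sketch (not the authors' stimulus-matching code) using Welch-style averaging of windowed frames:

```python
import numpy as np

def ltas(signal, fs, frame_len=1024, hop=512):
    """Long-term average spectrum: mean magnitude spectrum
    over overlapping Hann-windowed frames."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    mags = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, mags.mean(axis=0)

# Example: the LTAS of a 1 s, 440 Hz tone peaks near 440 Hz.
fs = 16000
t = np.arange(fs) / fs
freqs, spectrum = ltas(np.sin(2 * np.pi * 440 * t), fs)
peak_freq = freqs[np.argmax(spectrum)]
```

Matching non-speech contexts to speech contexts, as in the study, would amount to constructing tone sequences whose `spectrum` profile approximates that of the spoken sentence in the frequency region of the target contrast.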

9.
Front Psychol ; 3: 10, 2012.
Article in English | MEDLINE | ID: mdl-22347198

ABSTRACT

Perceptual aftereffects have been referred to as "the psychologist's microelectrode" because they can expose dimensions of representation through the residual effect of a context stimulus upon perception of a subsequent target. The present study uses such context-dependence to examine the dimensions of representation involved in a classic demonstration of "talker normalization" in speech perception. Whereas most accounts of talker normalization have emphasized talker-, speech-, or articulatory-specific dimensions' significance, the present work tests an alternative hypothesis: that the long-term average spectrum (LTAS) of speech context is responsible for patterns of context-dependent perception considered to be evidence for talker normalization. In support of this hypothesis, listeners' vowel categorization was equivalently influenced by speech contexts manipulated to sound as though they were spoken by different talkers and non-speech analogs matched in LTAS to the speech contexts. Since the non-speech contexts did not possess talker, speech, or articulatory information, general perceptual mechanisms are implicated. Results are described in terms of adaptive perceptual coding.
