ABSTRACT
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle the effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise-vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (nondegraded) version of the speech. This intermediate priming, which generates a "pop-out" percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate temporal response functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. mTRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex, in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
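The mTRF method invoked in this abstract can be illustrated, in generic form, as ridge regression of the neural response on time-lagged stimulus features (e.g., envelope and envelope onsets). This is a minimal sketch of that general technique, not the authors' actual pipeline; the sampling rate, lag range, regularization value, and placeholder arrays are all assumptions.

```python
import numpy as np

def lagged_design(stim, lags):
    """Build a time-lagged design matrix: column block i holds stim delayed by lags[i]."""
    n_times, n_feats = stim.shape
    X = np.zeros((n_times, n_feats * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(stim, lag, axis=0)
        if lag > 0:
            shifted[:lag] = 0.0          # zero the samples that wrapped around
        elif lag < 0:
            shifted[lag:] = 0.0
        X[:, i * n_feats:(i + 1) * n_feats] = shifted
    return X

def fit_mtrf(stim, resp, lags, alpha=1.0):
    """Ridge estimate of mTRF weights; resp is (n_times, n_channels)."""
    X = lagged_design(stim, lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ resp)

# Illustrative use: two features (envelope, envelope onset), lags 0-500 ms at 100 Hz
fs = 100
lags = range(0, int(0.5 * fs) + 1)
stim = np.random.randn(6000, 2)          # placeholder stimulus features
resp = np.random.randn(6000, 157)        # placeholder MEG channel data
weights = fit_mtrf(stim, resp, lags, alpha=10.0)
```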
Subjects
Speech Intelligibility, Speech Perception, Speech Intelligibility/physiology, Acoustic Stimulation/methods, Speech/physiology, Noise, Acoustics, Magnetoencephalography/methods, Speech Perception/physiology
ABSTRACT
Humans have an impressive ability to comprehend signal-degraded speech; however, the extent to which comprehension of degraded speech relies on human-specific features of speech perception vs. more general cognitive processes is unknown. Since dogs live alongside humans and regularly hear speech, they can be used as a model to differentiate between these possibilities. One often-studied type of degraded speech is noise-vocoded speech (sometimes thought of as cochlear-implant-simulation speech). Noise-vocoded speech is made by dividing the speech signal into frequency bands (channels), identifying the amplitude envelope of each individual band, and then using these envelopes to modulate bands of noise centered over the same frequency regions - the result is a signal with preserved temporal cues, but vastly reduced frequency information. Here, we tested dogs' recognition of familiar words produced in 16-channel vocoded speech. In the first study, dogs heard their names and unfamiliar dogs' names (foils) in vocoded speech as well as natural speech. In the second study, dogs heard 16-channel vocoded speech only. Dogs listened longer to their vocoded name than vocoded foils in both experiments, showing that they can comprehend a 16-channel vocoded version of their name without prior exposure to vocoded speech, and without immediate exposure to the natural-speech version of their name. Dogs' name recognition in the second study was mediated by the number of phonemes in the dogs' name, suggesting that phonological context plays a role in degraded speech comprehension.
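Because this abstract spells out the noise-vocoding recipe (band-split, envelope extraction, envelope-modulated noise carriers), a minimal Python sketch of that general procedure follows. The log-spaced band edges, filter order, Hilbert-envelope choice, and RMS matching are illustrative assumptions, not parameters taken from the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels=16, f_lo=100.0, f_hi=8000.0):
    """Band-filter speech, extract each band's amplitude envelope, and use it to
    modulate noise filtered into the same band; sum the modulated bands."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)    # log-spaced edges (assumption)
    noise = np.random.randn(len(speech))
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band = sosfiltfilt(sos, speech)
        env = np.abs(hilbert(band))                     # amplitude envelope
        out += env * sosfiltfilt(sos, noise)            # envelope-modulated noise carrier
    out *= np.sqrt(np.sum(speech ** 2) / np.sum(out ** 2))  # roughly match overall level
    return out

# Illustrative call on a placeholder signal (use a real recording in practice)
fs = 22050
speech = np.random.randn(fs * 2)
vocoded = noise_vocode(speech, fs, n_channels=16)
```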
Subjects
Speech Perception, Speech, Humans, Animals, Dogs, Cues (Psychology), Hearing, Linguistics
ABSTRACT
Infants begin to segment word forms from fluent speech, a crucial task in lexical processing, between 4 and 7 months of age. Prior work has established that infants rely on a variety of cues available in the speech signal (i.e., prosodic, statistical, acoustic-segmental, and lexical) to accomplish this task. In two experiments with French-learning 6- and 10-month-olds, we use a psychoacoustic approach to examine if and how degradation of the two fundamental acoustic components extracted from speech by the auditory system, namely, temporal (both frequency and amplitude modulation) and spectral information, impacts word form segmentation. Infants were familiarized with passages containing target words, in which frequency modulation (FM) information was replaced with pure tones using a vocoder, while amplitude modulation (AM) was preserved in either 8 or 16 spectral bands. Infants were then tested on their recognition of the target versus novel control words. While the 6-month-olds were unable to segment in either condition, the 10-month-olds succeeded, although only in the 16-spectral-band condition. These findings suggest that 6-month-olds need FM temporal cues for speech segmentation while 10-month-olds do not, although they need the AM cues to be presented in enough spectral bands (i.e., 16). This developmental change in infants' sensitivity to spectrotemporal cues likely results from an increase in the range of available segmentation procedures and/or a shift from a vowel to a consonant bias in lexical processing between the two ages, as vowels are more affected by our acoustic manipulations. RESEARCH HIGHLIGHTS: Although segmenting speech into word forms is crucial for lexical acquisition, the acoustic information that infants' auditory system extracts to process continuous speech remains unknown. We examined infants' sensitivity to spectrotemporal cues in speech segmentation using vocoded speech and revealed a developmental change between 6 and 10 months of age. We showed that FM information, that is, the fast temporal modulations of speech, is necessary for 6- but not 10-month-old infants to segment word forms. Moreover, reducing the number of spectral bands impacts 10-month-olds' segmentation abilities: they succeed when 16 bands are preserved but fail with 8 bands.
Subjects
Language Development, Speech Perception, Humans, Infant, Speech Perception/physiology, Female, Male, Acoustic Stimulation, Cues (Psychology), Phonetics, Speech/physiology, Speech Acoustics, Psychoacoustics
ABSTRACT
Listening to speech with poor signal quality is challenging. Neural speech tracking of degraded speech has been used to advance the understanding of how brain processes and speech intelligibility are interrelated. However, the temporal dynamics of neural speech tracking and their relation to speech intelligibility are not clear. In the present MEG study, we exploited temporal response functions (TRFs), which have been used to describe the time course of speech tracking on a gradient from intelligible to unintelligible degraded speech. In addition, we used inter-related facets of neural speech tracking (e.g., speech envelope reconstruction, speech-brain coherence, and components of broadband coherence spectra) to corroborate our TRF findings. Our TRF analysis yielded marked temporally differential effects of vocoding: ~50-110 ms (M50TRF), ~175-230 ms (M200TRF), and ~315-380 ms (M350TRF). Reduction of intelligibility went along with large increases in the early peak response M50TRF, but strongly reduced responses in M200TRF. In the late response M350TRF, the maximum response occurred for degraded speech that was still comprehensible and then declined with reduced intelligibility. Furthermore, we related the TRF components to our other neural "tracking" measures and found that M50TRF and M200TRF play a differential role in the shifting center frequency of the broadband coherence spectra. Overall, our study highlights the importance of time-resolved computation of neural speech tracking and decomposition of coherence spectra, and provides a better understanding of degraded speech processing.
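Speech-brain coherence, one of the tracking measures listed above, is generally computed as magnitude-squared coherence between the speech envelope and a neural signal. The sketch below shows that generic computation with SciPy; the placeholder signals, sampling rate, and window length are assumptions, not this study's parameters.

```python
import numpy as np
from scipy.signal import coherence

fs = 100                                  # assumed common sampling rate after downsampling
envelope = np.random.randn(fs * 120)      # placeholder speech envelope
neural = np.random.randn(fs * 120)        # placeholder MEG sensor/source signal

f, cxy = coherence(envelope, neural, fs=fs, nperseg=2 * fs)   # ~0.5 Hz resolution
band = (f >= 1) & (f <= 10)               # low-frequency range where speech tracking is typically strongest
peak = np.argmax(cxy[band])
print("peak coherence %.3f at %.1f Hz" % (cxy[band][peak], f[band][peak]))
```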
Subjects
Speech Intelligibility, Speech Perception, Humans, Speech Intelligibility/physiology, Speech Perception/physiology, Brain/physiology, Auditory Perception, Cognition, Acoustic Stimulation
ABSTRACT
Listeners can adapt to acoustically degraded speech with perceptual training. Such learning over long periods underlies the rehabilitation of patients with hearing aids or cochlear implants. Perceptual learning of acoustically degraded speech has been associated with the frontotemporal cortices. However, neural processes during and after long-term perceptual learning remain unclear. Here we conducted perceptual training of noise-vocoded speech sounds (NVSS), which are spectrally degraded signals, and measured cortical activity across seven days of training and at follow-up testing (approximately 1 year later) to investigate changes in neural activation patterns using functional magnetic resonance imaging. We demonstrated that young adult participants (n = 5) improved their performance across the seven experimental days, and the gains were maintained after 10 months or more. Representational similarity analysis showed that the neural activation patterns of NVSS relative to clear speech in the left posterior superior temporal sulcus (pSTS) were significantly different across the seven training days, accompanied by neural changes in frontal cortices. In addition, the distinct activation patterns to NVSS in the frontotemporal cortices were also observed 10-13 months after the training. We therefore propose that perceptual training can induce plastic changes and long-term effects on neural representations of the trained degraded speech in the frontotemporal cortices. These behavioral improvements and neural changes induced by the perceptual learning of degraded speech will provide insights into cortical mechanisms underlying adaptive processes in difficult listening situations and long-term rehabilitation of auditory disorders.
Subjects
Speech Perception, Speech, Young Adult, Humans, Animals, Speech/physiology, Speech Perception/physiology, Acoustic Stimulation, Learning/physiology, Auditory Perception
ABSTRACT
This study investigated brain activation during auditory processing as a biomarker for the prediction of future perceptual learning performance. Cochlear-implant-simulated sounds (vocoded sounds) are degraded signals. Participants with normal hearing who were trained with these ambiguous sounds showed varied speech comprehension levels. We discovered that the neuronal signatures of untrained participants forecasted their future ambiguous speech comprehension levels. Participants' brain activations during auditory information processing were measured before (t1) they underwent a five-day vocoded-sound training session. We showed that pre-training (t1) activity in the inferior frontal gyrus (IFG) correlates with fifth-day (t2) vocoded-sound comprehension performance. To further predict participants' future (t2) performance, we split the participants into two groups (i.e., good and bad learners) based on their fifth-day performance; a linear support vector machine (SVM) was trained to classify (predict) the remaining participants' groups. We found that pre-training (t1) fMRI activity in the bilateral IFG, angular gyrus (AG), and supramarginal gyrus (SMG) showed discriminability between future (t2) good and bad learners. These findings suggest that neural correlates of individual differences in auditory processing can potentially be used to predict participants' future cognition and behavior.
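The classification analysis described here (a linear SVM separating future good from bad learners from pre-training activity) follows a standard recipe that can be sketched as below with scikit-learn. The feature matrix, labels, and leave-one-out scheme are placeholders for illustration, not the study's data or exact cross-validation setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((24, 300))   # placeholder: participants x pre-training activation features
y = rng.integers(0, 2, 24)           # placeholder: 1 = good learner, 0 = bad learner

clf = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0))
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())   # predict each held-out participant
print("leave-one-out accuracy: %.2f" % scores.mean())
```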
Subjects
Cochlear Implants, Speech Perception, Acoustic Stimulation, Brain Mapping, Comprehension/physiology, Humans, Individuality, Magnetic Resonance Imaging, Speech/physiology, Speech Perception/physiology
ABSTRACT
In this study we used functional near-infrared spectroscopy (fNIRS) to investigate neural responses in normal-hearing adults as a function of speech recognition accuracy, intelligibility of the speech stimulus, and the manner in which speech is distorted. Participants listened to sentences and reported aloud what they heard. Speech quality was distorted artificially by vocoding (simulated cochlear implant speech) or naturally by adding background noise. Each type of distortion included high- and low-intelligibility conditions. Sentences in quiet were used as a baseline comparison. fNIRS data were analyzed using a newly developed image reconstruction approach. First, elevated cortical responses in the middle temporal gyrus (MTG) and middle frontal gyrus (MFG) were associated with speech recognition during the low-intelligibility conditions. Second, activation in the MTG was associated with recognition of vocoded speech with low intelligibility, whereas MFG activity was largely driven by recognition of speech in background noise, suggesting that the cortical response varies as a function of distortion type. Lastly, an accuracy effect in the MFG demonstrated significantly higher activation during correct perception relative to incorrect perception of speech. These results suggest that normal-hearing adults (i.e., untrained listeners of vocoded stimuli) do not exploit the same attentional mechanisms of the frontal cortex used to resolve naturally degraded speech and may instead rely on segmental and phonetic analyses in the temporal lobe to discriminate vocoded speech.
Subjects
Acoustic Stimulation/methods, Cochlear Implants, Frontal Lobe/physiology, Speech Intelligibility/physiology, Speech Perception/physiology, Temporal Lobe/physiology, Adolescent, Adult, Female, Frontal Lobe/diagnostic imaging, Humans, Male, Noise/adverse effects, Near-Infrared Spectroscopy/methods, Temporal Lobe/diagnostic imaging, Young Adult
ABSTRACT
Human perception is shaped by past experience on multiple timescales. Sudden and dramatic changes in perception occur when prior knowledge or expectations match stimulus content. These immediate effects contrast with the longer-term, more gradual improvements that are characteristic of perceptual learning. Despite extensive investigation of these two experience-dependent phenomena, there is considerable debate about whether they result from common or dissociable neural mechanisms. Here we test single- and dual-mechanism accounts of experience-dependent changes in perception using concurrent magnetoencephalographic and EEG recordings of neural responses evoked by degraded speech. When speech clarity was enhanced by prior knowledge obtained from matching text, we observed reduced neural activity in a peri-auditory region of the superior temporal gyrus (STG). Critically, longer-term improvements in the accuracy of speech recognition following perceptual learning resulted in reduced activity in a nearly identical STG region. Moreover, short-term neural changes caused by prior knowledge and longer-term neural changes arising from perceptual learning were correlated across subjects with the magnitude of learning-induced changes in recognition accuracy. These experience-dependent effects on neural processing could be dissociated from the neural effect of hearing physically clearer speech, which similarly enhanced perception but increased rather than decreased STG responses. Hence, the observed neural effects of prior knowledge and perceptual learning cannot be attributed to epiphenomenal changes in listening effort that accompany enhanced perception. Instead, our results support a predictive coding account of speech perception; computational simulations show how a single mechanism, minimization of prediction error, can drive immediate perceptual effects of prior knowledge and longer-term perceptual learning of degraded speech.
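The prediction-error-minimization mechanism invoked above can be caricatured with a toy update rule: a top-down estimate is nudged by a fraction of the residual between the degraded input and the current prediction. This is only a toy illustration of the principle, not the authors' computational simulations; the prior value, learning rate, and noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
true_feature = 1.0        # the feature actually present in the degraded input
estimate = 0.6            # prior expectation (e.g., seeded by matching text)
lr = 0.1                  # fraction of the prediction error used for each update
sq_errors = []
for trial in range(50):
    sensory = true_feature + 0.3 * rng.standard_normal()   # noisy, degraded evidence
    prediction_error = sensory - estimate
    estimate += lr * prediction_error                      # error-driven update
    sq_errors.append(prediction_error ** 2)
print("mean squared prediction error, first vs. last 10 trials: %.3f -> %.3f"
      % (np.mean(sq_errors[:10]), np.mean(sq_errors[-10:])))
```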
Subjects
Neurological Models, Phonetics, Speech Intelligibility, Speech Perception/physiology, Temporal Lobe/physiology, Adolescent, Adult, Brain Mapping, Computer Simulation, Electroencephalography, Female, Humans, Learning/physiology, Magnetoencephalography, Male, Multimodal Imaging, Time Factors, Young Adult
ABSTRACT
Traditionally, speech perception training paradigms have not adequately taken into account the possibility that there may be modality-specific requirements for perceptual learning with auditory-only (AO) versus visual-only (VO) speech stimuli. The study reported here investigated the hypothesis that there are modality-specific differences in how prior information is used by normal-hearing participants during vocoded versus VO speech training. Two different experiments, one with vocoded AO speech (Experiment 1) and one with VO (lipread) speech (Experiment 2), investigated the effects of giving different types of prior information to trainees on each trial during training. The training comprised four ~20 min sessions, during which participants learned to label novel visual images using novel spoken words. Participants were assigned to different types of prior information during training: Word Group trainees saw a printed version of each training word (e.g., "tethon"), and Consonant Group trainees saw only its consonants (e.g., "t_th_n"). Additional groups received no prior information (i.e., Experiment 1, AO Group; Experiment 2, VO Group) or a spoken version of the stimulus in a different modality from the training stimuli (Experiment 1, Lipread Group; Experiment 2, Vocoder Group). That is, in each experiment, there was a group that received prior information in the modality of the training stimuli from the other experiment. In both experiments, the Word Groups had difficulty retaining the novel words they attempted to learn during training. However, when the training stimuli were vocoded, the Word Group improved their phoneme identification. When the training stimuli were visual speech, the Consonant Group improved their phoneme identification and their open-set sentence lipreading. The results are considered in light of theoretical accounts of perceptual learning in relation to perceptual modality.
ABSTRACT
Speech perception performance for degraded speech can improve with practice or exposure. Such perceptual learning is thought to be reliant on attention, and theoretical accounts such as the predictive coding framework suggest a key role for attention in supporting learning. However, it is unclear whether speech perceptual learning requires undivided attention. We evaluated the role of divided attention in speech perceptual learning in two online experiments (N = 336). Experiment 1 tested the reliance of perceptual learning on undivided attention. Participants completed a speech recognition task in which they repeated forty noise-vocoded sentences in a between-group design. Participants performed the speech task alone or concurrently with a domain-general visual task (dual task) at one of three difficulty levels. We observed perceptual learning under divided attention for all four groups, moderated by dual-task difficulty. Listeners in the easy and intermediate visual conditions improved as much as the single-task group. Those who completed the most challenging visual task showed faster learning and achieved similar ending performance compared to the single-task group. Experiment 2 tested whether learning relies on domain-specific or domain-general processes. Participants completed a single speech task or performed this task together with a dual task aiming to recruit domain-specific (lexical or phonological) or domain-general (visual) processes. All secondary task conditions produced patterns and amounts of learning comparable to the single speech task. Our results demonstrate that the impact of divided attention on perceptual learning is not strictly dependent on domain-general or domain-specific processes and that speech perceptual learning persists under divided attention.
Subjects
Speech Perception, Speech, Humans, Learning, Noise/adverse effects, Language
ABSTRACT
When listening to degraded speech, such as speech delivered by a cochlear implant (CI), listeners make use of top-down linguistic knowledge to facilitate speech recognition. Lexical knowledge supports speech recognition and enhances the perceived clarity of speech. Yet, the extent to which lexical knowledge can be used to effectively compensate for degraded input may depend on the degree of degradation and the listener's age. The current study investigated lexical effects in the compensation for speech that was degraded via noise-vocoding in younger and older listeners. In an online experiment, younger and older normal-hearing (NH) listeners rated the clarity of noise-vocoded sentences on a scale from 1 ("very unclear") to 7 ("completely clear"). Lexical information was provided by matching text primes and the lexical content of the target utterance. Half of the sentences were preceded by a matching text prime, while half were preceded by a non-matching prime. Each sentence also consisted of three key words of high or low lexical frequency and neighborhood density. Sentences were processed to simulate CI hearing, using an eight-channel noise vocoder with varying filter slopes. Results showed that lexical information impacted the perceived clarity of noise-vocoded speech. Noise-vocoded speech was perceived as clearer when preceded by a matching prime, and when sentences included key words with high lexical frequency and low neighborhood density. However, the strength of the lexical effects depended on the level of degradation. Matching text primes had a greater impact for speech with poorer spectral resolution, but lexical content had a smaller impact for speech with poorer spectral resolution. Finally, lexical information appeared to benefit both younger and older listeners. Findings demonstrate that lexical knowledge can be employed by younger and older listeners in cognitive compensation during the processing of noise-vocoded speech. However, lexical content may not be as reliable when the signal is highly degraded. Clinical implications are that for adult CI users, lexical knowledge might be used to compensate for the degraded speech signal, regardless of age, but some CI users may be hindered by a relatively poor signal.
ABSTRACT
The perception of lexical pitch accent in Japanese was assessed using noise-excited vocoder speech, which contained no fundamental frequency (fo) or its harmonics. While prosodic information such as lexical stress in English and lexical tone in Mandarin Chinese is known to be encoded in multiple acoustic dimensions, such multidimensionality is less understood for lexical pitch accent in Japanese. In the present study, listeners were tested under four different conditions to investigate the contribution of non-fo properties to the perception of Japanese pitch accent: noise-vocoded speech stimuli consisting of 10 3-ERB_N-wide bands and 15 2-ERB_N-wide bands, created from a male and a female speaker. Results showed that listeners were able to identify minimal pairs of final-accented and unaccented words at a rate better than chance in all conditions, indicating the presence of secondary cues to Japanese pitch accent. Subsequent analyses were conducted to investigate whether the listeners' ability to distinguish minimal pairs was correlated with duration, intensity, or formant information. These analyses found no strong or consistent correlation, suggesting the possibility that listeners used different cues depending on the information available in the stimuli. Furthermore, comparison of the current results with equivalent studies in English and Mandarin Chinese suggests that, although lexical prosodic information exists in multiple acoustic dimensions in Japanese, the primary cue is more salient than in other languages.
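The band widths here are specified in ERB_N numbers (the Cam scale). Under the Glasberg and Moore (1990) formula, ERB-number band edges can be converted to Hz as sketched below; the 100 Hz starting frequency is an illustrative assumption, not a value reported in the abstract.

```python
import numpy as np

def hz_to_erbn(f_hz):
    """ERB-number (Cam) scale of Glasberg and Moore (1990)."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def erbn_to_hz(erb):
    """Inverse of hz_to_erbn."""
    return (10.0 ** (erb / 21.4) - 1.0) * 1000.0 / 4.37

# e.g., 15 bands, each 2 ERB_N wide, starting at an assumed 100 Hz lower edge
start = hz_to_erbn(100.0)
edges_hz = erbn_to_hz(start + 2.0 * np.arange(16))
print(np.round(edges_hz, 1))
```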
ABSTRACT
When speech is presented in their second language (L2), bilinguals have more difficulties with speech perception in noise than monolinguals do. However, how noise affects speech perception of bilinguals in their first language (L1) is still unclear. In addition, it is not clear whether bilinguals' speech perception in challenging listening conditions is specific to the type of degradation, or whether there is a shared mechanism for bilingual speech processing under complex listening conditions. Therefore, the current study examined the speech perception of 60 Arabic-Hebrew bilinguals and a control group of native Hebrew speakers during degraded (speech in noise, vocoded speech) and quiet listening conditions. Between-participant comparisons (comparing native Hebrew speakers' and bilinguals' perceptual performance in L1) and within-participant comparisons (perceptual performance of bilinguals in L1 and L2) were conducted. The findings showed that bilinguals in L1 had more difficulty in noisy conditions than their control counterparts did, even though they performed like controls under favorable listening conditions. However, bilingualism did not hinder language learning mechanisms. Bilinguals in L1 outperformed native Hebrew speakers in the perception of vocoded speech, demonstrating more extended learning processes. Bilinguals' perceptual performance in L1 versus L2 varied by task complexity. Correlation analyses revealed that bilinguals who coped better with noise degradation were more successful in perceiving the vocoding distortion. Together, these results provide insights into the mechanisms that contribute to speech perceptual performance in challenging listening conditions and suggest that bilinguals' language proficiency and age of language acquisition are not the only factors that affect performance. Rather, duration of exposure to languages, co-activation, and the ability to benefit from exposure to novel stimuli appear to affect the perceptual performance of bilinguals, even when they are operating in their dominant language. Our findings suggest that bilinguals use a shared mechanism for speech processing under challenging listening conditions.
Subjects
Multilingualism, Speech Perception, Humans, Language, Noise, Speech, Speech Perception/physiology
ABSTRACT
Individuals with autism spectrum disorder (ASD) are found to have difficulties in understanding speech in adverse conditions. In this study, we used noise-vocoded speech (VS) to investigate neural processing of degraded speech in individuals with ASD. We ran fMRI experiments in the ASD group and a typically developed control (TDC) group while they listened to clear speech (CS), VS, and spectrally rotated VS (SRVS), and they were requested to pay attention to the heard sentence and answer whether it was intelligible or not. The VS used in this experiment was spectrally degraded but still intelligible, whereas the SRVS was unintelligible. We recruited 21 right-handed adult males with ASD and 24 age-matched, right-handed male TDC participants for this experiment. Compared with the TDC group, we observed reduced functional connectivity (FC) between the left dorsal premotor cortex and left temporoparietal junction in the ASD group for the effect of task difficulty in speech processing, computed as VS - (CS + SRVS)/2. Furthermore, the observed reduced FC was negatively correlated with their Autism-Spectrum Quotient scores. This observation supports our hypothesis that the disrupted dorsal stream for attentive processing of degraded speech in individuals with ASD might be related to their difficulty in understanding speech in adverse conditions.
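Functional connectivity of the kind analyzed here is commonly quantified as the correlation between two regions' time courses, often Fisher z-transformed before group statistics. The sketch below shows that generic computation on placeholder time series; it does not reproduce the study's seed regions, preprocessing, or task contrast.

```python
import numpy as np

rng = np.random.default_rng(3)
n_volumes = 200
roi_a = rng.standard_normal(n_volumes)                   # placeholder: left dorsal premotor time course
roi_b = 0.4 * roi_a + rng.standard_normal(n_volumes)     # placeholder: left TPJ time course

r = np.corrcoef(roi_a, roi_b)[0, 1]   # Pearson correlation = functional connectivity
z = np.arctanh(r)                     # Fisher z-transform for group-level comparison
print("FC r = %.2f, Fisher z = %.2f" % (r, z))
```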
Subjects
Autism Spectrum Disorder, Speech, Adult, Autism Spectrum Disorder/complications, Autism Spectrum Disorder/diagnostic imaging, Brain/diagnostic imaging, Brain Mapping, Humans, Magnetic Resonance Imaging, Male
ABSTRACT
Under an acoustically degraded condition, the degree of speech comprehension fluctuates within individuals. Understanding the relationship between such fluctuations in comprehension and neural responses might reveal perceptual processing for distorted speech. In this study we investigated the cerebral activity associated with the degree of subjective comprehension of noise-vocoded speech sounds (NVSS) using functional magnetic resonance imaging. Our results indicate that higher comprehension of NVSS sentences was associated with greater activation in the right superior temporal cortex, and that activity in the left inferior frontal gyrus (Broca's area) was increased when a listener recognized words in a sentence they did not fully comprehend. In addition, results of laterality analysis demonstrated that recognition of words in an NVSS sentence led to less lateralized responses in the temporal cortex, though a left-lateralization was observed when no words were recognized. The data suggest that variation in comprehension within individuals can be associated with changes in lateralization in the temporal auditory cortex.
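Laterality analyses of this kind are often summarized with a laterality index comparing homologous left- and right-hemisphere activation, commonly LI = (L - R) / (L + R). The sketch below uses that common formulation with placeholder values; the study's exact index and thresholding are not given in the abstract.

```python
def laterality_index(left, right):
    """Common laterality index: +1 = fully left-lateralized, -1 = fully right-lateralized."""
    return (left - right) / (left + right)

# Placeholder activation magnitudes (e.g., suprathreshold voxel counts or mean betas)
print(laterality_index(left=120.0, right=80.0))   # 0.2, i.e., modest left-lateralization
```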
Subjects
Noise, Speech Perception, Speech, Brain Mapping, Comprehension, Humans, Magnetic Resonance Imaging, Noise/adverse effects
ABSTRACT
Previous studies have shown that at moderate levels of spectral degradation, semantic predictability facilitates language comprehension. It has been argued that when speech is degraded, listeners have narrowed expectations about sentence endings; i.e., semantic prediction may be limited to only the most highly predictable sentence completions. The main objectives of this study were to (i) examine whether listeners form narrowed expectations or whether they form predictions across a wide range of probable sentence endings, (ii) assess whether the facilitatory effect of semantic predictability is modulated by perceptual adaptation to degraded speech, and (iii) use and establish a sensitive metric for the measurement of language comprehension. For this, we created 360 German Subject-Verb-Object sentences that varied in the semantic predictability of a sentence-final target word in a graded manner (high, medium, and low) and in level of spectral degradation (1-, 4-, 6-, and 8-channel noise-vocoding). These sentences were presented auditorily to two groups: one group (n = 48) performed a listening task in an unpredictable channel context in which the degraded speech levels were randomized, while the other group (n = 50) performed the task in a predictable channel context in which the degraded speech levels were blocked. The results showed that at 4-channel noise-vocoding, response accuracy was higher for high-predictability sentences than for medium-predictability sentences, which in turn was higher than for low-predictability sentences. This suggests that, in contrast to the narrowed-expectations view, comprehension of moderately degraded speech is facilitated in a graded manner across low-, medium-, and high-predictability sentences; listeners probabilistically preactivate upcoming words from a wide range of semantic space, not limited only to highly probable sentence endings. Additionally, in both channel contexts we did not observe learning effects; i.e., response accuracy did not increase over the course of the experiment, and response accuracy was higher in the predictable than in the unpredictable channel context. We speculate from these observations that when there is no trial-by-trial variation in the levels of speech degradation, listeners adapt to speech quality at a long timescale; however, when there is trial-by-trial variation of a high-level semantic feature (e.g., sentence predictability), listeners do not adapt to a low-level perceptual property (e.g., speech quality) at a short timescale.
ABSTRACT
The Irrelevant Sound Effect (ISE) is the finding that background sound impairs accuracy for visually presented serial recall tasks. Among various auditory backgrounds, speech typically acts as the strongest distractor. Based on the changing-state hypothesis, speech is a disruptive background because it is more complex than other nonspeech backgrounds. In the current study, we evaluate an alternative explanation by examining whether the speech-likeness of the background (speech fidelity) contributes, beyond signal complexity, to the ISE. We did this by using noise-vocoded speech as a background. In Experiment 1, we varied the complexity of the background by manipulating the number of vocoding channels. Results indicate that the ISE increases with the number of channels, suggesting that more complex signals produce greater ISEs. In Experiment 2, we varied complexity and speech fidelity independently. At each channel level, we selectively reversed a subset of channels to design a low-fidelity signal that was equated in overall complexity. Experiment 2 results indicated that speech-like noise-vocoded speech produces a larger ISE than selectively reversed noise-vocoded speech. Finally, in Experiment 3, we evaluated the locus of the speech-fidelity effect by assessing the distraction produced by these stimuli in a missing-item task. In this task, even though noise-vocoded speech disrupted task performance relative to silence, neither its complexity nor speech fidelity contributed to this effect. Together, these findings indicate a clear role for speech fidelity of the background beyond its changing-state quality and its attention capture potential.
Subjects
Attention/physiology, Noise, Perceptual Masking, Speech Perception/physiology, Speech/physiology, Acoustic Stimulation, Adult, Analysis of Variance, Female, Humans, Male, Mental Recall/physiology, Young Adult
ABSTRACT
We examined the frequency specificity of amplitude envelope patterns in 4 frequency bands, which universally appeared through factor analyses applied to power fluctuations of critical-band filtered speech sounds in 8 different languages/dialects [Ueda and Nakajima (2017). Sci. Rep., 7 (42468)]. A series of 3 perceptual experiments with noise-vocoded speech of Japanese sentences was conducted. Nearly perfect (92-94%) mora recognition was achieved, without any extensive training, in a control condition in which 4-band noise-vocoded speech was employed (Experiments 1-3). Blending amplitude envelope patterns of the frequency bands, which resulted in reducing the number of amplitude envelope patterns while keeping the average spectral levels unchanged, revealed a clear deteriorating effect on intelligibility (Experiment 1). Exchanging amplitude envelope patterns brought generally detrimental effects on intelligibility, especially when involving the 2 lowest bands (≲1850 Hz; Experiment 2). Exchanging spectral levels averaged in time had a small but significant deteriorating effect on intelligibility in a few conditions (Experiment 3). Frequency specificity in low-frequency-band envelope patterns thus turned out to be conspicuous in speech perception.
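The blending and exchanging manipulations described above operate on band-wise amplitude envelopes before vocoder resynthesis. The sketch below shows those two operations in their simplest form; the averaging rule for blending and the array layout are assumptions for illustration, and the study's level-equating procedure is not reproduced.

```python
import numpy as np

def blend_bands(envelopes, i, j):
    """Replace the envelopes of bands i and j with their mean, reducing the number
    of distinct envelope patterns (sketch)."""
    env = envelopes.copy()
    mean = (env[i] + env[j]) / 2.0
    env[i], env[j] = mean, mean.copy()
    return env

def exchange_bands(envelopes, i, j):
    """Swap the envelope patterns of bands i and j (sketch)."""
    env = envelopes.copy()
    env[[i, j]] = env[[j, i]]
    return env

# envelopes: (n_bands, n_samples) band-wise amplitude envelopes from a vocoder analysis
envelopes = np.abs(np.random.randn(4, 16000))   # placeholder
blended = blend_bands(envelopes, 0, 1)
swapped = exchange_bands(envelopes, 0, 1)
```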
Subjects
Cues (Psychology), Noise/adverse effects, Physiological Pattern Recognition, Perceptual Masking, Pitch Perception, Speech Acoustics, Speech Intelligibility, Speech Perception, Acoustic Stimulation, Acoustics, Adult, Speech Audiometry, Comprehension, Female, Humans, Male, Middle Aged, Recognition (Psychology), Sound Spectrography, Young Adult
ABSTRACT
OBJECTIVE: Cochlear implants process the acoustic speech signal and convert it into electrical impulses. During this processing, many parameters contribute to speech perception. The available literature has examined the effect of manipulating one or two such parameters on speech intelligibility, but multiple parameters are seldom manipulated together. METHOD: Acoustic parameters, including pulse rate, number of channels, 'n of m', number of electrodes, and channel spacing, were manipulated in acoustic simulations of cochlear implant hearing, and 90 different combinations were created. Speech intelligibility at the sentence level was measured using subjective and objective tests. RESULTS: Principal component analysis was employed to select only those components with maximum factor loading, thus reducing the number of components to a reasonable limit. Perceptual speech intelligibility was highest for signal-processing manipulations of 'n of m' and pulse rate (pulses/s). Regression analysis revealed that a lower rate (=500 pps/ch) and fewer stimulating electrodes per cycle (2-4) contributed maximally to speech intelligibility. The Perceptual Evaluation of Speech Quality (PESQ) and composite measures of spectral weights and likelihood ratio correlated with subjective speech intelligibility scores. DISCUSSION: The findings are consistent with the literature, indicating that fewer stimulated channels per cycle reduce electrode interaction and hence improve the spectral resolution of speech. A reduced pulse rate (pulses/s) enhances the temporal resolution of speech. Thus, these two components contribute significantly to speech intelligibility. CONCLUSION: Pulse rate per channel and 'n of m' contribute maximally to speech intelligibility, at least in simulations of electric hearing.
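The analysis pipeline reported here (principal component analysis over the manipulated parameter combinations, followed by regression onto intelligibility) can be sketched generically with scikit-learn as below. The placeholder matrices and the number of retained components are assumptions, not the study's data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
params = rng.standard_normal((90, 5))        # placeholder: 90 combinations x 5 manipulated parameters
intelligibility = rng.standard_normal(90)    # placeholder sentence-level intelligibility scores

pca = PCA(n_components=2).fit(params)        # retain the components with the largest loadings
components = pca.transform(params)
model = LinearRegression().fit(components, intelligibility)
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 2))
print("R^2 on retained components: %.2f" % model.score(components, intelligibility))
```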