ABSTRACT
Mounting evidence suggests that listeners perceptually compensate for the adverse effects of room reverberation when listening to speech monaurally. However, it is not clear whether the underlying perceptual mechanism would be effective at all under the high levels of stimulus uncertainty that are present in everyday listening. Three experiments investigated monaural compensation with a consonant identification task in which listeners heard different speech on each trial. Consonant confusions frequently arose when more reverberation was added to a test word than to its surrounding context, but compensation became apparent in conditions where the context's reverberation was increased to match that of the test word; here, the confusions were largely resolved. A second experiment showed that information from the test word itself can also effect compensation. Finally, the time course of compensation was examined by applying reverberation to a portion of the preceding context; consonant identification improved as this portion increased in duration. These findings indicate a monaural compensation mechanism that is likely to be effective in everyday listening, allowing listeners to recalibrate as their reverberant environment changes.
Subjects
Perceptual Distortion, Perceptual Masking, Phonetics, Speech Acoustics, Speech Perception, Adult, Female, Humans, Male, Sound Spectrography
ABSTRACT
Room reverberation usually degrades speech reception, such as when listeners identify test words from a 'sir'-to-'stir' continuum. Here, substantial reverberation introduces a 'tail' from the [s], which tends to fill the gap that cues the [t], and a degradation effect arises as listeners report correspondingly fewer 'stir' sounds. This effect is particularly clear when test words are preceded by a precursor phrase (e.g. 'next you'll get') that contains much less reverberation than the test word. When the precursor's reverberation is increased to be the same as in the test word, the degradation diminishes as more 'stir' sounds are heard once again. This last effect has been attributed to a perceptual compensation mechanism that is informed by the precursor's reverberation level. However, a recent claim is that the degradation is caused by 'modulation masking' from precursors with a low level of reverberation. Such masking is likely to diminish when the precursor's reverberation level is raised, because reverberation acts as a low-pass modulation filter. Support for this hypothesis comes from results in conditions where degradation effects seem to be entirely absent, despite substantial reverberation. In these conditions, test words were played in isolation, with no precursor, and reverberation was kept at the same level in the test words of every trial. The experiments reported here have similar conditions, except that reverberation in test words is varied unpredictably from trial to trial, so that substantial-level trials are interspersed with trials that have a much lower level of reverberation. The result is that, under these conditions, the degradation effect is entirely restored, allowing the modulation-masking hypothesis to be rejected.
An alternative account is that some perceptual compensation comes from reverberation information within the test words, and that its effects accumulate over sequences of trials as long as the test word's reverberation level stays the same from trial to trial.
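The statement that reverberation acts as a low-pass modulation filter can be made concrete with the classic modulation transfer function of Houtgast and Steeneken for an idealized reverberant tail. This is a minimal sketch, assuming a noise-free diffuse field whose intensity decays exponentially with a single reverberation time (RT60); the function name `reverb_mtf` and the RT60 values in the demo are hypothetical illustrations, not parameters from the study above.

```python
import math

# A 60 dB decay is an intensity ratio of 10**6, so ln(10**6) is about 13.8.
LN_1E6 = math.log(1e6)


def reverb_mtf(mod_freq_hz: float, rt60_s: float) -> float:
    """Modulation transfer factor m(F) for an idealized reverberant tail.

    Assumes the impulse response's intensity decays exponentially with
    time constant tau = rt60 / ln(10**6) and that there is no background
    noise. The result, 1 / sqrt(1 + (2*pi*F*tau)**2), is a first-order
    low-pass characteristic in the modulation-frequency domain.
    """
    if mod_freq_hz < 0 or rt60_s < 0:
        raise ValueError("modulation frequency and RT60 must be non-negative")
    tau = rt60_s / LN_1E6
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * tau) ** 2)


if __name__ == "__main__":
    # Hypothetical reverberation times, chosen only for illustration.
    for rt60 in (0.1, 0.5, 1.0):
        row = [round(reverb_mtf(f, rt60), 3) for f in (1, 4, 16)]
        print(f"RT60={rt60:.1f} s  m(1, 4, 16 Hz) = {row}")
```

Under these assumptions, longer reverberation times attenuate fast envelope fluctuations much more than slow ones. This is the property the modulation-masking account relies on when it predicts less masking from a more reverberant precursor.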
Subjects
Acoustics, Adaptation, Physiological/physiology, Cues, Phonetics, Speech Perception/physiology, Acoustic Stimulation/methods, Environment, Humans, Perceptual Masking/physiology, Sound Spectrography, Speech, Speech Intelligibility
ABSTRACT
When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener's ability to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387-399 (2010)] proposed a model which combines effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers, with real-room listening simulated over headphones by convolving anechoic stimuli with binaural room impulse responses measured with dummy-head transducers in five rooms. Without fitting any parameter of the model, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model's components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex "intelligibility maps" from room designs.
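The stimulus-generation step described above (convolving an anechoic signal with a two-channel binaural room impulse response) and the basic better-ear idea can be sketched in plain Python. This is a deliberately simplified illustration. The signals and impulse responses are made-up toy values, the helper names (`spatialize`, `mix`, `better_ear_snr_db`) are hypothetical, and the SNR is computed broadband. It does not reproduce the Lavandier and Culling model, which also accounts for binaural unmasking.

```python
import math
from typing import List, Sequence, Tuple

Stereo = Tuple[List[float], List[float]]  # (left-ear signal, right-ear signal)


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full-length direct linear convolution (len(x) + len(h) - 1 samples)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def spatialize(signal: Sequence[float], brir_left: Sequence[float],
               brir_right: Sequence[float]) -> Stereo:
    """Render a mono anechoic signal at both ears through a two-channel BRIR."""
    return convolve(signal, brir_left), convolve(signal, brir_right)


def _sum_channels(channels: Sequence[Sequence[float]]) -> List[float]:
    """Sample-wise sum of signals of possibly different lengths (zero-padded)."""
    out = [0.0] * max(len(c) for c in channels)
    for c in channels:
        for k, v in enumerate(c):
            out[k] += v
    return out


def mix(*sources: Stereo) -> Stereo:
    """Combine several spatialized sources, e.g. multiple interferers."""
    return (_sum_channels([s[0] for s in sources]),
            _sum_channels([s[1] for s in sources]))


def energy(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def better_ear_snr_db(target: Stereo, masker: Stereo) -> float:
    """Broadband SNR (dB) at whichever ear has the more favourable ratio."""
    snrs = [10.0 * math.log10(energy(t) / energy(m))
            for t, m in zip(target, masker)]
    return max(snrs)


if __name__ == "__main__":
    speech = [1.0, -0.5, 0.25, 0.0]   # toy "anechoic" target
    noise = [0.3, 0.3, -0.3, 0.3]     # toy interferer
    # Hypothetical BRIRs: target ahead, interferer on the right, so the
    # left ear is partly shadowed from the interferer.
    target = spatialize(speech, [1.0, 0.2], [1.0, 0.2])
    masker = mix(spatialize(noise, [0.4, 0.1], [1.0, 0.3]))
    print(round(better_ear_snr_db(target, masker), 2))
```

Taking the maximum over ears captures why head shadow helps. The ear facing away from an interferer can have a markedly better SNR than either ear would have without the head.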
Subjects
Noise, Perceptual Masking/physiology, Speech Intelligibility/physiology, Acoustic Stimulation/methods, Analysis of Variance, Auditory Threshold/physiology, Distance Perception/physiology, Humans, Models, Biological, Signal Detection, Psychological/physiology, Sound Localization/physiology, Speech Perception/physiology
ABSTRACT
Three experiments measured constancy in speech perception, using natural-speech messages or noise-band vocoder versions of them. The eight vocoder bands had equally log-spaced center frequencies and the shapes of the corresponding "auditory" filters. Consequently, the bands had the temporal envelopes that arise in these auditory filters when the speech is played. The "sir" or "stir" test words were distinguished by degrees of amplitude modulation and were played in the context "next you'll get _ to click on." Listeners identified test words appropriately, even in the vocoder conditions where the speech had a "noise-like" quality. Constancy was assessed by comparing the identification of test words with low or high levels of room reflections across conditions where the context had either a low or a high level of reflections. Constancy was obtained with both the natural and the vocoded speech, indicating that the effect arises through temporal-envelope processing. Two further experiments assessed perceptual weighting of the different bands, both in the test word and in the context. The resulting weighting functions both increase monotonically with frequency, following the spectral characteristics of the test word's [s]. It is suggested that these two weighting functions are similar because they both come about through the perceptual grouping of the test word's bands.
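"Equally log-spaced center frequencies" means that consecutive bands differ by a constant frequency ratio. The sketch below computes such centers and pairs each with an equivalent rectangular bandwidth (ERB), one common description of auditory-filter width (Glasberg and Moore, 1990). It assumes things the abstract does not state: the 100 to 6400 Hz range in the demo is hypothetical, and using the Glasberg and Moore ERB is an assumption about how "auditory" filter shapes might be chosen, not a description of the study's vocoder. The function names are likewise hypothetical.

```python
import math
from typing import List


def log_spaced_centres(f_low_hz: float, f_high_hz: float,
                       n_bands: int = 8) -> List[float]:
    """Centre frequencies equally spaced on a logarithmic frequency axis.

    Consecutive centres differ by the constant ratio
    (f_high / f_low) ** (1 / (n_bands - 1)).
    """
    if n_bands < 2 or not 0 < f_low_hz < f_high_hz:
        raise ValueError("need n_bands >= 2 and 0 < f_low < f_high")
    ratio = (f_high_hz / f_low_hz) ** (1.0 / (n_bands - 1))
    return [f_low_hz * ratio ** k for k in range(n_bands)]


def erb_hz(fc_hz: float) -> float:
    """Equivalent rectangular bandwidth of the auditory filter at fc (Hz)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


if __name__ == "__main__":
    # Hypothetical range: the abstract does not give the band limits.
    for fc in log_spaced_centres(100.0, 6400.0):
        print(f"fc = {fc:7.1f} Hz, ERB = {erb_hz(fc):6.1f} Hz")
```

Because the ERB grows roughly in proportion to frequency above a few hundred hertz, log-spaced bands with auditory-filter widths stay roughly comparable in relative bandwidth across the range.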
Subjects
Phonetics, Speech Acoustics, Speech Perception, Acoustic Stimulation, Audiometry, Speech, Facility Design and Construction, Humans, Noise/adverse effects, Perceptual Masking, Recognition, Psychology, Sound Spectrography, Time Factors
ABSTRACT
Perceptual compensation for reverberation was measured by embedding test words in contexts that were either spoken phrases or processed versions of this speech. The processing gave steady-spectrum contexts with no changes in the shape of the short-term spectral envelope over time, but with fluctuations in the temporal envelope. Test words were from a continuum between "sir" and "stir." When the amount of reverberation in test words was increased to a level above the amount in the context, they sounded more like "sir." However, when the amount of reverberation in the context was also increased to the level present in the test word, there was perceptual compensation in some conditions, so that test words sounded more like "stir" again. Experiments here found compensation with speech contexts and with some steady-spectrum contexts, indicating that fluctuations in the context's temporal envelope can be sufficient for compensation. Other results suggest that the effectiveness of speech contexts is partly due to the narrow-band "frequency channels" of the auditory periphery, where temporal-envelope fluctuations can be more pronounced than they are in the sound's broadband temporal envelope. Further results indicate that, for compensation to influence speech, the context needs to occupy a broad range of frequency channels.
Subjects
Acoustics, Speech Acoustics, Speech Intelligibility, Speech Perception, Environment, Humans, Phonetics
ABSTRACT
Listeners were asked to identify modified recordings of the words "sir" and "stir," which were spoken by an adult male British-English speaker. Steps along a continuum between the words were obtained by a pointwise interpolation of their temporal-envelopes. These test words were embedded in a longer "context" utterance, and played with different amounts of reverberation. Increasing only the test-word's reverberation shifts the listener's category boundary so that more "sir"-identifications are made. This effect reduces when the context's reverberation is also increased, indicating perceptual compensation that is informed by the context. Experiment 1 finds that compensation is more prominent in rapid speech, that it varies between rooms, that it is more prominent when the test-word's reverberation is high, and that it increases with the context's reverberation. Further experiments show that compensation persists when the room is switched between the context and the test word, when presentation is monaural, and when the context is reversed. However, compensation reduces when the context's reverberation pattern is reversed, as well as when noise-versions of the context are used. "Tails" that reverberation introduces at the ends of sounds and at spectral transitions may inform the compensation mechanism about the amount of reflected sound in the signal.
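The two quantitative ideas above are a continuum built by pointwise interpolation of temporal envelopes and a category boundary that shifts with reverberation. Both can be sketched as follows. This is a minimal illustration under stated assumptions: the envelopes are taken to be time-aligned and of equal length, the interpolation weights are assumed to be linear and equally spaced, and the boundary is estimated by simple linear interpolation to the 50% "stir" point. That estimate is a stand-in for illustration and not necessarily the analysis used in the study. Function names and response proportions are hypothetical.

```python
from typing import List, Sequence


def envelope_continuum(env_sir: Sequence[float], env_stir: Sequence[float],
                       n_steps: int) -> List[List[float]]:
    """Pointwise (sample-by-sample) linear interpolation between envelopes.

    Step 0 reproduces the 'sir' envelope and step n_steps - 1 the 'stir'
    envelope; intermediate steps use equally spaced weights.
    """
    if len(env_sir) != len(env_stir):
        raise ValueError("envelopes must have the same length")
    if n_steps < 2:
        raise ValueError("need at least two steps")
    continuum = []
    for k in range(n_steps):
        w = k / (n_steps - 1)  # weight given to the 'stir' envelope
        continuum.append([(1.0 - w) * a + w * b
                          for a, b in zip(env_sir, env_stir)])
    return continuum


def category_boundary(p_stir: Sequence[float]) -> float:
    """Continuum step where the proportion of 'stir' responses first rises
    through 0.5, found by linear interpolation between adjacent steps."""
    for k in range(1, len(p_stir)):
        lo, hi = p_stir[k - 1], p_stir[k]
        if lo < 0.5 <= hi:
            return (k - 1) + (0.5 - lo) / (hi - lo)
    raise ValueError("responses never rise through 0.5")


if __name__ == "__main__":
    # Hypothetical identification data for two reverberation conditions.
    low_reverb = [0.0, 0.1, 0.6, 0.9, 1.0]
    high_reverb = [0.0, 0.0, 0.2, 0.6, 0.9]  # more 'sir' responses
    print(category_boundary(low_reverb), category_boundary(high_reverb))
```

With more "sir" responses, the 50% point moves to a higher step number. This is the kind of boundary shift the abstract attributes to increasing only the test word's reverberation.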