ABSTRACT
Better communication with older people requires not only improving speech intelligibility but also understanding how well emotions can be conveyed and the effect of age and hearing loss (HL) on emotion perception. In this paper, emotion discrimination experiments were conducted using a vocal morphing method and an HL simulator in young normal hearing (YNH) and older participants. Speech sounds were morphed to represent intermediate emotions between all combinations of happiness, sadness, and anger. Discrimination performance was compared when the YNH listened to normal sounds, when the same YNH listened to HL simulated sounds, and when older people listened to the same normal sounds. The results showed that there was no significant difference between discrimination with and without HL simulation, suggesting that peripheral HL may not affect emotion perception. The discrimination performance of the older participants was significantly worse only for the anger-happiness pair than for the other emotion pairs and for the YNH. It was also found that the difficulty increases with age, not just with hearing level.
Subjects
Emotions, Hearing Loss, Speech Perception, Humans, Emotions/physiology, Male, Female, Aged, Speech Perception/physiology, Hearing Loss/psychology, Hearing Loss/physiopathology, Adult, Young Adult, Aging/physiology, Aging/psychology, Middle Aged, Age Factors
ABSTRACT
Auditory filter (AF) shape has traditionally been estimated with a combination of a notched-noise (NN) masking experiment and a power spectrum model (PSM) of masking. However, there are several challenges that remain in both the simultaneous and forward masking paradigms. We hypothesized that AF shape estimation would be improved if absolute threshold (AT) and a level-dependent internal noise were explicitly represented in the PSM. To document the interaction between NN threshold and AT in normal hearing (NH) listeners, a large set of NN thresholds was measured at four center frequencies (500, 1000, 2000, and 4000 Hz) with the emphasis on low-level maskers. The proposed PSM, consisting of the compressive gammachirp (cGC) filter and three nonfilter parameters, allowed AF estimation over a wide range of frequencies and levels with fewer coefficients and less error than previous models. The results also provided new insights into the nonfilter parameters. The detector signal-to-noise ratio (K) was found to be constant across signal frequencies, suggesting that no frequency dependence hypothesis is required in the postfiltering process. The ANSI standard "Hearing Level-0dB" function, i.e., AT of NH listeners, could be applied to the frequency distribution of the noise floor for the best AF estimation. The introduction of a level-dependent internal noise could mitigate the nonlinear effects that occur in the simultaneous NN masking paradigm. The new PSM improves the applicability of the model, particularly when the sound pressure level of the NN threshold is close to AT.
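As a rough illustration of the power spectrum model described in this abstract, the sketch below predicts notched-noise thresholds from the noise power passed by an auditory filter plus a fixed noise floor. It is not the authors' model: a symmetric roex(p) filter stands in for the cGC filter, the floor is level-independent, and all names and parameter values (`p`, `band`, `floor_db`, and so on) are assumptions chosen for illustration.

```python
import numpy as np

def roex_integral(p, g_lo, g_hi):
    """Closed-form integral of W(g) = (1 + p*g) * exp(-p*g) from g_lo to g_hi.

    g is the normalized distance from the center frequency, |f - fc| / fc.
    """
    def antiderivative(g):
        return -(2.0 + p * g) / p * np.exp(-p * g)
    return antiderivative(g_hi) - antiderivative(g_lo)

def psm_threshold_db(notch, fc_hz, n0_db, k_db, p=25.0, band=0.4, floor_db=-np.inf):
    """Predicted signal threshold (dB) for a symmetric notched-noise masker.

    notch    : normalized distance from fc to the inner edge of each noise band
    n0_db    : noise spectrum level (dB per Hz)
    k_db     : detector signal-to-noise ratio K at threshold (dB)
    band     : normalized width of each noise band (assumed value)
    floor_db : fixed internal noise floor (dB); -inf disables it (assumption,
               not the level-dependent internal noise of the abstract)
    """
    notch = np.asarray(notch, dtype=float)
    # Noise power passed by the filter from the two symmetric bands.
    noise_power = 2.0 * fc_hz * 10.0 ** (n0_db / 10.0) * roex_integral(p, notch, notch + band)
    floor_power = 10.0 ** (floor_db / 10.0)
    return k_db + 10.0 * np.log10(noise_power + floor_power)

if __name__ == "__main__":
    notches = np.array([0.0, 0.1, 0.2, 0.4])
    print(psm_threshold_db(notches, 1000.0, 20.0, 0.0))                 # no floor
    print(psm_threshold_db(notches, 1000.0, 20.0, 0.0, floor_db=10.0))  # with floor
```

With a finite `floor_db`, the predicted thresholds flatten at wide notches instead of falling indefinitely, which is the regime near absolute threshold that the abstract emphasizes.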
Subjects
Noise, Perceptual Masking, Humans, Auditory Threshold, Noise/adverse effects, Pressure, Signal-To-Noise Ratio
ABSTRACT
Psychotherapists, who use their communicative skills to assist people, review their dialogue practices and improve their skills from their experiences. However, technology has not been fully exploited for this purpose. In this study, we analyze the use of head movements during actual psychotherapeutic dialogues between two participants (therapist and client) using video recordings and head-mounted accelerometers. Accelerometers have been utilized in the mental health domain but not for analyzing mental-health-related communications. We examined the relationship between the state of the interaction and temporally varying head nod and movement patterns in psychological counseling sessions. Head nods were manually annotated and the head movements were measured using accelerometers. Head nod counts were analyzed based on annotations taken from video data. We conducted cross-correlation analysis of the head movements of the two participants using the accelerometer data. The results of two case studies suggest that upward and downward head nod count patterns may reflect stage transitions in counseling dialogues and that peaks of head movement synchrony may be related to emphasis in the interaction.
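The cross-correlation analysis mentioned in this abstract can be sketched as a lagged Pearson correlation between two movement signals. This is a minimal sketch under assumptions: the abstract does not describe the preprocessing, window length, or lag range, and the function name `lagged_cross_correlation` and the synthetic signals are hypothetical.

```python
import numpy as np

def lagged_cross_correlation(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for lag in [-max_lag, max_lag].

    A positive lag at the peak means y follows x by that many samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    if max_lag >= len(x) - 1:
        raise ValueError("max_lag must leave at least two overlapping samples")
    lags = np.arange(-max_lag, max_lag + 1)
    corr = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        corr[i] = np.corrcoef(a, b)[0, 1]
    return lags, corr

if __name__ == "__main__":
    # Synthetic stand-ins for two accelerometer magnitude traces (assumption).
    rng = np.random.default_rng(0)
    therapist = rng.standard_normal(500)
    client = np.roll(therapist, 5)  # client trace delayed by 5 samples
    lags, corr = lagged_cross_correlation(therapist, client, max_lag=20)
    print("lag at peak synchrony:", lags[np.argmax(corr)])
```

In practice such a function would likely be applied over sliding windows so that peaks of synchrony can be located in time; that windowing is left out here.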
Subjects
Head Movements, Head, Accelerometry, Communication, Movement, Video Recording
ABSTRACT
This study aims to find an effective chirp signal that enhances the amplitude of wave-I of auditory brainstem response (ABR) to diagnose "cochlear synaptopathy." Although several chirp signals have been proposed to enhance the amplitude of wave-V, the effect on wave-I has not been clarified yet. Ten chirp signals, which have shorter group delays than the commonly used "CE-chirp," were produced to measure the amplitudes of wave-I and wave-V of the ABRs. The results show that one of the chirp signals significantly enhanced the amplitude of wave-I, where the group delay is approximately half of the CE-chirp.
Subjects
Acoustic Stimulation, Brain Stem/physiology, Cochlea/physiology, Electroencephalography, Auditory Brainstem Evoked Potentials, Hearing, Adult, Auditory Threshold, Cochlear Diseases/diagnosis, Cochlear Diseases/physiopathology, Female, Healthy Volunteers, Humans, Male, Middle Aged, Predictive Value of Tests, Reaction Time, Time Factors, Young Adult
ABSTRACT
OBJECTIVE: The temporal modulation transfer function (TMTF) has been proposed to estimate the temporal resolution abilities of listeners with normal hearing and listeners with hearing loss. The TMTF data of patients would be useful for clinical diagnosis and for adjusting hearing instruments at clinical and fitting sites. However, practical application is precluded by the long measurement time of the conventional method, which requires several measurement points. This article presents a new method to measure the TMTF that requires only two measurement points. DESIGN: Experiments were performed to estimate the TMTF of listeners with normal hearing and listeners with hearing loss to demonstrate that the two-point method can estimate the TMTF parameters as well as the conventional method. Sixteen listeners with normal hearing and 21 listeners with hearing loss participated, and the estimated TMTF parameters and the measurement times of the two methods were compared. RESULTS: The TMTF parameters (the peak sensitivity Lps and cutoff frequency fcutoff) estimated by the conventional and two-point methods showed significantly high correlations: the correlation coefficient for Lps was 0.91 (t(45) = 14.3; p < 10) and that for fcutoff was 0.89 (t(45) = 13.2; p < 10). There were no fixed or proportional biases. Therefore, the estimated values were in good agreement. Moreover, there was no systematic bias depending on the subject's profile. The measurement time of the two-point method was approximately 10 min, which is approximately one-third that of the conventional method. CONCLUSION: The two-point method enables the introduction of TMTF measurement in clinical diagnosis.
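To show why two measurement points can suffice for a two-parameter TMTF, the sketch below assumes a first-order low-pass model, in which the modulation detection threshold (in dB) rises with modulation frequency above the cutoff. The abstract does not state the model form or which two points are measured, so the model, the function names, and the example values are all assumptions, not the authors' procedure.

```python
import math

def tmtf_threshold_db(fm_hz, lps_db, fcutoff_hz):
    """Threshold (dB) at modulation frequency fm_hz under an assumed first-order low-pass TMTF."""
    return lps_db + 10.0 * math.log10(1.0 + (fm_hz / fcutoff_hz) ** 2)

def fit_tmtf_two_points(fm1_hz, l1_db, fm2_hz, l2_db):
    """Recover (Lps, fcutoff) from two (modulation frequency, threshold) points.

    In the power domain the assumed model is linear:
        10**(L/10) = P + Q * fm**2, with P = 10**(Lps/10) and Q = P / fcutoff**2,
    so two points determine P and Q exactly.
    """
    a1 = 10.0 ** (l1_db / 10.0)
    a2 = 10.0 ** (l2_db / 10.0)
    q = (a2 - a1) / (fm2_hz ** 2 - fm1_hz ** 2)
    p = a1 - q * fm1_hz ** 2
    if p <= 0.0 or q <= 0.0:
        raise ValueError("points are not consistent with a first-order low-pass TMTF")
    return 10.0 * math.log10(p), math.sqrt(p / q)

if __name__ == "__main__":
    # Hypothetical listener: Lps = -20 dB, fcutoff = 100 Hz.
    l_low = tmtf_threshold_db(8.0, -20.0, 100.0)
    l_high = tmtf_threshold_db(200.0, -20.0, 100.0)
    print(fit_tmtf_two_points(8.0, l_low, 200.0, l_high))
```

With noisy thresholds the two points give an exact but noise-sensitive solution, which is presumably why agreement with the conventional multi-point method needed to be checked experimentally.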
Subjects
Hearing Loss/physiopathology, Hearing Tests/methods, Adult, Aged, Aged 80 and over, Case-Control Studies, Female, Humans, Male, Middle Aged, Young Adult
ABSTRACT
Hearing-impaired (HI) people often have difficulty understanding speech in multi-speaker or noisy environments. With HI listeners, however, it is often difficult to specify which stage, or stages, of auditory processing are responsible for the deficit. There might also be cognitive problems associated with age. In this paper, an HI simulator, based on the dynamic, compressive gammachirp (dcGC) filterbank, was used to measure the effect of a loss of compression on syllable recognition. The HI simulator can counteract the cochlear compression in normal-hearing (NH) listeners and, thereby, isolate the deficit associated with a loss of compression in speech perception. Listeners were required to identify the second syllable in a three-syllable "nonsense word", and between trials, the relative level of the second syllable was varied, or the level of the entire sequence was varied. The difference between the Speech Reception Threshold (SRT) in these two conditions reveals the effect of compression on speech perception. The HI simulator adjusted an NH listener's compression to that of the "average 80-year-old" with either normal compression or complete loss of compression. A reference condition was included where the HI simulator applied a simple 30-dB reduction in stimulus level. The results show that the loss of compression has its largest effect on recognition when the second syllable is attenuated relative to the first and third syllables. This is probably because the internal level of the second syllable is attenuated proportionately more when there is a loss of compression.
Subjects
Hearing Loss/physiopathology, Perceptual Masking, Speech Perception, Adult, Female, Humans, Male, Speech Reception Threshold Test
ABSTRACT
This chapter presents a unified gammachirp framework for estimating cochlear compression and synthesizing sounds with inverse compression that cancels the compression of a normal-hearing (NH) listener to simulate the experience of a hearing-impaired (HI) listener. The compressive gammachirp (cGC) filter was fitted to notched-noise masking data to derive level-dependent filter shapes and the cochlear compression function (e.g., Patterson et al., J Acoust Soc Am 114:1529-1542, 2003). The procedure is based on the analysis/synthesis technique of Irino and Patterson (IEEE Trans Audio Speech Lang Process 14:2222-2232, 2006) using a dynamic cGC filterbank (dcGC-FB). The level dependency of the dcGC-FB can be reversed to produce inverse compression and resynthesize sounds in a form that cancels the compression applied by the auditory system of the NH listener. The chapter shows that the estimation of compression in simultaneous masking is improved if the notched-noise procedure for the derivation of auditory filter shape includes noise bands with different levels. Since both the estimation and resynthesis are performed within the gammachirp framework, it is possible for a specific NH listener to experience the loss of a specific HI listener.
Subjects
Cochlea/physiology, Hearing Loss/physiopathology, Hearing/physiology, Biological Models, Perceptual Masking/physiology, Acoustic Stimulation, Auditory Threshold/physiology, Humans, Noise, Psychoacoustics
ABSTRACT
Although the rounded-exponential (roex) filter has been successfully used to represent the magnitude response of the auditory filter, recent studies with the roex(p, w, t) filter reveal two serious problems: the fits to notched-noise masking data are somewhat unstable unless the filter is reduced to a physically unrealizable form, and there is no time-domain version of the roex(p, w, t) filter to support modeling of the perception of complex sounds. This paper describes a compressive gammachirp (cGC) filter with the same architecture as the roex(p, w, t) which can be implemented in the time domain. The gain and asymmetry of this parallel cGC filter are shown to be comparable to those of the roex(p, w, t) filter, but the fits to masking data are still somewhat unstable. The roex(p, w, t) and parallel cGC filters were also compared with the cascade cGC filter [Patterson et al., J. Acoust. Soc. Am. 114, 1529-1542 (2003)], which was found to provide an equivalent fit with 25% fewer coefficients. Moreover, the fits were stable. The advantage of the cascade cGC filter appears to derive from its parsimonious representation of the high-frequency side of the filter. It is concluded that cGC filters offer better prospects than roex filters for the representation of the auditory filter.
Subjects
Acoustics, Cochlea/physiology, Hearing/physiology, Biological Models, Humans, Noise, Perceptual Masking, Sound, Time Factors
ABSTRACT
It is now common to use knowledge about human auditory processing in the development of audio signal processors. Until recently, however, such systems were limited by their linearity. The auditory filter system is known to be level-dependent as evidenced by psychophysical data on masking, compression, and two-tone suppression. However, there were no analysis/synthesis schemes with nonlinear filterbanks. This paper describes such a scheme based on the compressive gammachirp (cGC) auditory filter. It was developed to extend the gammatone filter concept to accommodate the changes in psychophysical filter shape that are observed to occur with changes in stimulus level in simultaneous, tone-in-noise masking. In models of simultaneous noise masking, the temporal dynamics of the filtering can be ignored. Analysis/synthesis systems, however, are intended for use with speech sounds where the glottal cycle can be long with respect to auditory time constants, and so they require specification of the temporal dynamics of the auditory filter. In this paper, we describe a fast-acting level control circuit for the cGC filter and show how psychophysical data involving two-tone suppression and compression can be used to estimate the parameter values for this dynamic version of the cGC filter (referred to as the "dcGC" filter). One important advantage of analysis/synthesis systems with a dcGC filterbank is that they can inherit previously refined signal processing algorithms developed with conventional short-time Fourier transforms (STFTs) and linear filterbanks.
ABSTRACT
We propose a new method to segregate concurrent speech sounds using an auditory version of a channel vocoder. The auditory representation of sound, referred to as an "auditory image," preserves fine temporal information, unlike conventional window-based processing systems. This makes it possible to segregate speech sources with an event synchronous procedure. Fundamental frequency information is used to estimate the sequence of glottal pulse times for a target speaker, and to repress the glottal events of other speakers. The procedure leads to robust extraction of the target speech and effective segregation even when the signal-to-noise ratio is as low as 0 dB. Moreover, the segregation performance remains high when the speech contains jitter, or when the estimate of the fundamental frequency F0 is inaccurate. This contrasts with conventional comb-filter methods where errors in F0 estimation produce a marked reduction in performance. We compared the new method to a comb-filter method using a cross-correlation measure and perceptual recognition experiments. The results suggest that the new method has the potential to supplant comb-filter and harmonic-selection methods for speech enhancement.
ABSTRACT
There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.
Subjects
Pitch Perception, Speech Perception/physiology, Humans, Phonetics, Sound, Speech Discrimination Tests
ABSTRACT
This paper presents a new method for robust and accurate fundamental frequency (F0) estimation in the presence of background noise and spectral distortion. Degree of dominance and dominance spectrum are defined based on instantaneous frequencies. The degree of dominance allows one to evaluate the magnitude of individual harmonic components of the speech signals relative to background noise while reducing the influence of spectral distortion. The fundamental frequency is more accurately estimated from reliable harmonic components which are easy to select given the dominance spectra. Experiments are performed using white and babble background noise with and without spectral distortion as produced by a SRAEN filter. The results show that the present method is better than previously reported methods in terms of both gross and fine F0 errors.
Subjects
Noise/adverse effects, Perceptual Distortion, Pitch Discrimination, Sound Spectrography, Speech Acoustics, Adult, Attention, Female, Humans, Male, Phonetics, Speech Discrimination Tests
ABSTRACT
The gammatone filter was imported from auditory physiology to provide a time-domain version of the roex auditory filter and enable the development of a realistic auditory filterbank for models of auditory perception [Patterson et al., J. Acoust. Soc. Am. 98, 1890-1894 (1995)]. The gammachirp auditory filter was developed to extend the domain of the gammatone auditory filter and simulate the changes in filter shape that occur with changes in stimulus level. Initially, the gammachirp filter was limited to center frequencies in the 2.0-kHz region where there were sufficient "notched-noise" masking data to define its parameters accurately. Recently, however, the range of the masking data has been extended in two massive studies. This paper reports how a compressive version of the gammachirp auditory filter was fitted to these new data sets to define the filter parameters over the extended frequency range. The results show that the shape of the filter can be specified for the entire domain of the data using just six constants (center frequencies from 0.25 to 6.0 kHz and levels from 30 to 80 dB SPL). The compressive, gammachirp auditory filter also has the advantage of being consistent with physiological studies of cochlear filtering insofar as the compression of the filter is mainly limited to the passband and the form of the chirp in the impulse response is largely independent of level.
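For orientation, the sketch below generates the impulse response of the basic analytic gammachirp, g(t) = t^(n-1) exp(-2 pi b ERB(fc) t) cos(2 pi fc t + c ln t + phase), which reduces to the gammatone when c = 0. This is not the compressive gammachirp fitted in the paper, which adds a level-dependent stage; the parameter values (n = 4, b = 1.019, c = -2) and the function names are illustrative assumptions, and the ERB formula is the commonly used Glasberg and Moore approximation.

```python
import numpy as np

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 + 0.108 * fc_hz

def gammachirp(fc_hz, fs_hz, duration_s=0.025, n=4, b=1.019, c=-2.0, phase=0.0):
    """Peak-normalized gammachirp impulse response; c = 0 gives a gammatone."""
    # Start one sample after zero so that log(t) is finite.
    t = np.arange(1, int(duration_s * fs_hz) + 1) / fs_hz
    envelope = t ** (n - 1) * np.exp(-2.0 * np.pi * b * erb_hz(fc_hz) * t)
    carrier = np.cos(2.0 * np.pi * fc_hz * t + c * np.log(t) + phase)
    g = envelope * carrier
    return t, g / np.max(np.abs(g))

def peak_frequency_hz(g, fs_hz, nfft=1 << 15):
    """Frequency of the largest magnitude-spectrum bin."""
    spectrum = np.abs(np.fft.rfft(g, nfft))
    return np.argmax(spectrum) * fs_hz / nfft

if __name__ == "__main__":
    fs = 48000.0
    _, tone = gammachirp(2000.0, fs, c=0.0)
    _, chirp = gammachirp(2000.0, fs, c=-2.0)
    print("gammatone peak (Hz):", peak_frequency_hz(tone, fs))
    print("gammachirp peak (Hz):", peak_frequency_hz(chirp, fs))
```

A negative chirp parameter `c` makes the carrier glide upward toward `fc_hz`, which shifts the spectral peak below `fc_hz` and makes the filter shape asymmetric.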