ABSTRACT
Cochlear implants (CIs) do not offer the same level of effectiveness in noisy environments as in quiet settings. Current single-microphone noise reduction algorithms in hearing aids and CIs remove only predictable, stationary noise and are ineffective against realistic, non-stationary noise such as multi-talker interference. Recent developments in deep neural network (DNN) algorithms have achieved noteworthy performance in speech enhancement and separation, especially in removing speech noise. However, more work is needed to investigate the potential of DNN algorithms in removing speech noise when tested with listeners fitted with CIs. Here, we implemented two DNN algorithms that are well suited for applications in speech audio processing: (1) recurrent neural network (RNN) and (2) SepFormer. The algorithms were trained with a customized dataset (~30 h) and then tested with thirteen CI listeners. Both the RNN and SepFormer algorithms significantly improved CI listeners' speech intelligibility in noise without compromising the perceived quality of speech overall. These algorithms not only increased intelligibility in stationary non-speech noise, but also introduced a substantial improvement in non-stationary noise, where conventional signal processing strategies fall short and offer little benefit. These results show the promise of DNN algorithms as a solution for listening challenges in multi-talker noise interference.
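The mask-based enhancement idea behind such networks can be illustrated with a toy, untrained single-step recurrent cell; this is only a sketch, not the study's actual architecture (the RNN and SepFormer models used are far larger and were trained on ~30 h of data), and all names below are illustrative:

```python
import math

def rnn_mask_step(x, h, Wx, Wh, Wo):
    """One step of a toy mask-estimating recurrent cell.

    x  : input feature vector (e.g. log-magnitude spectrum of one frame)
    h  : previous hidden state
    Wx, Wh, Wo : weight matrices, given as lists of rows
    Returns the new hidden state and a per-bin mask in [0, 1] that would be
    multiplied onto the noisy spectrum to suppress noise-dominated bins.
    """
    def matvec(W, v):
        return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

    # Elman-style recurrence: h' = tanh(Wx x + Wh h)
    h_new = [math.tanh(a + b) for a, b in zip(matvec(Wx, x), matvec(Wh, h))]
    # sigmoid output layer produces the enhancement mask
    mask = [1.0 / (1.0 + math.exp(-a)) for a in matvec(Wo, h_new)]
    return h_new, mask
```

In a real enhancer this step runs frame by frame over a spectrogram and the weights are learned from paired noisy/clean speech.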
Subject(s)
Algorithms , Cochlear Implants , Deep Learning , Noise , Speech Intelligibility , Humans , Female , Middle Aged , Male , Speech Perception/physiology , Aged , Adult , Neural Networks, Computer
ABSTRACT
The present study investigated three different reverberation suppression rules based on the parametric ideal ratio mask, which is a generalization of the classical Wiener filter with additional parameters controlling the threshold and slope. Automatic selection of parameter values for the ideal ratio mask was performed using particle swarm optimization. Three different parameter sets were tested using sentences corrupted by reverberation. The results demonstrated that when optimizing parameters based on an objective measure of speech quality rather than intelligibility, cochlear implant users were able to perform at a level equivalent to that attainable with anechoic stimuli.
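As a rough sketch of the mask described above (the study's exact parameterization is not given in the abstract, so the roles assigned to the threshold and slope parameters below are assumptions), a parametric ideal ratio mask can be written as a Wiener-type gain with a slope exponent and an SNR threshold:

```python
def parametric_irm(snr, beta=1.0, threshold=0.0):
    """Parametric ideal ratio mask for one time-frequency unit.

    snr       : a-priori SNR (linear scale, speech power / noise power)
    beta      : slope-like exponent; beta = 1 recovers the classical
                Wiener-filter gain snr / (snr + 1)
    threshold : SNR floor (linear scale) below which the unit is zeroed
    """
    if snr < threshold:
        return 0.0
    return (snr ** beta) / (snr ** beta + 1.0)
```

In the study, parameter values of this kind were chosen automatically by particle swarm optimization against an objective speech-quality measure.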
Subject(s)
Cochlear Implantation , Cochlear Implants , Speech Perception , Perceptual Masking , Speech Intelligibility
ABSTRACT
In speech-in-speech recognition, listeners' performance improves when spatial and linguistic properties of background and target speech differ. However, it is unclear if these benefits interact or whether they persist under reverberant conditions typical of indoor listening. To address these issues, these benefits were tested together in the presence and absence of reverberation. Results demonstrate that, in the anechoic condition, both spatial and linguistic benefits are obtained but they do not constrain each other. Under reverberation, only the spatial benefit was obtained. This demonstrates the critical need to consider reverberation effects in order to fully characterize the challenges of speech-in-speech listening.
Subject(s)
Auditory Perception/physiology , Perceptual Masking/physiology , Speech Intelligibility/physiology , Speech Perception/physiology , Adult , Auscultation/methods , Female , Humans , Noise
ABSTRACT
Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the target. We also assessed whether the spectral resolution of the noise-vocoded stimuli affected the presence of LRM and SRM under these conditions. Method: In Experiment 1, a mixed factorial design was used to simultaneously manipulate the masker language (within-subject, English vs. Dutch), the simulated masker location (within-subject, right, center, left), and the spectral resolution (between-subjects, 6 vs. 12 channels) of noise-vocoded target-masker combinations presented at +25 dB signal-to-noise ratio (SNR). In Experiment 2, the study was repeated using a spectral resolution of 12 channels at +15 dB SNR. Results: In both experiments, listeners' intelligibility of noise-vocoded targets was better when the background masker was Dutch, demonstrating reliable LRM in all conditions. The pattern of results in Experiment 1 was not reliably different across the 6- and 12-channel noise-vocoded speech. Finally, a reliable spatial benefit (SRM) was detected only in the more challenging SNR condition (Experiment 2). Conclusion: The current study is the first to report a clear LRM benefit in noise-vocoded speech-in-speech recognition. Our results indicate that this benefit is available even under spectrally degraded conditions and that it may augment the benefit due to spatial separation of target speech and competing backgrounds.
Subject(s)
Linguistics , Perceptual Masking , Speech Perception , Acoustic Stimulation/methods , Adolescent , Humans , Noise , Psychoacoustics , Young Adult
ABSTRACT
PURPOSE: The purpose of this study was to investigate whether bilateral cochlear implant (CI) listeners who are fitted with clinical processors are able to benefit from binaural advantages under reverberant conditions. Another aim of this contribution was to determine whether the magnitude of each binaural advantage observed inside a highly reverberant environment differs significantly from the magnitude measured in a near-anechoic environment. METHOD: Ten adults with postlingual deafness who are bilateral CI users fitted with either Nucleus 5 or Nucleus 6 clinical sound processors (Cochlear Corporation) participated in this study. Speech reception thresholds were measured in sound field under 2 different reverberation conditions (0.06 and 0.6 s) as a function of the listening condition (left, right, both) and the spatial location of the noise (left, front, right). RESULTS: The presence of the binaural effects of head-shadow, squelch, summation, and spatial release from masking in the 2 reverberation conditions tested was determined using nonparametric statistical analysis. In the bilateral population tested, when the ambient reverberation time was 0.6 s, results indicated a strong positive effect of head-shadow and a weaker spatial release from masking advantage, whereas binaural squelch and summation contributed no statistically significant benefit to bilateral performance under this acoustic condition. These findings are consistent with those of previous studies, which have demonstrated that head-shadow yields the most pronounced advantage in noise. The finding that spatial release from masking produced little to no benefit in bilateral listeners is consistent with the hypothesis that additive reverberation degrades spatial cues and negatively affects binaural performance.
CONCLUSIONS: The magnitude of 4 different binaural advantages was measured in the same group of bilateral CI subjects fitted with clinical processors in 2 different reverberation conditions. The results of this work demonstrate the impeding effect of reverberation on binaural speech understanding. In addition, the results indicate that CI recipients who struggle in everyday listening environments are also likely to benefit less from their bilateral processors in highly reverberant environments.
Subject(s)
Cochlear Implantation/methods , Cochlear Implants/statistics & numerical data , Deafness/rehabilitation , Psychoacoustics , Speech Intelligibility , Speech Perception , Adult , Age of Onset , Aged , Audiometry/methods , Auditory Perception , Deafness/diagnosis , Deafness/psychology , Female , Humans , Male , Middle Aged , Quality of Life , Sampling Studies , Sound Localization , Treatment Outcome , Young Adult
ABSTRACT
Several studies demonstrate that in complex auditory scenes, speech recognition is improved when the competing background and target speech differ linguistically. However, such studies typically utilize spatially co-located speech sources which may not fully capture typical listening conditions. Furthermore, co-located presentation may overestimate the observed benefit of linguistic dissimilarity. The current study examines the effect of spatial separation on linguistic release from masking. Results demonstrate that linguistic release from masking does extend to spatially separated sources. The overall magnitude of the observed effect, however, appears to be diminished relative to the co-located presentation conditions.
ABSTRACT
The smearing effects of room reverberation can significantly impair the ability of cochlear implant (CI) listeners to understand speech. To ameliorate the effects of reverberation, current dereverberation algorithms focus on recovering the direct sound from the reverberated signal by inverse filtering the reverberation process. This contribution describes and evaluates a spectral subtraction (SS) strategy capable of suppressing late reflections, which become the most detrimental to speech intelligibility for CI listeners as reverberation increases. By tackling only the late reflections, it is shown that users of CI devices can benefit from the proposed strategy even in highly reverberant rooms. The proposed strategy is also compared against an ideal reverberant (binary) masking approach. Speech intelligibility results indicate that the proposed SS solution is able to suppress additive reverberant energy to a degree comparable to that achieved by an ideal binary mask. The added advantage is that the SS strategy proposed in this work allows for a potential real-time implementation in clinical CI processors.
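A minimal per-frame sketch of the idea, assuming a simple statistical model in which late reverberant power is a delayed, scaled copy of earlier frame power (the delay, scale, and floor values below are illustrative placeholders, not the parameters of the proposed strategy):

```python
def suppress_late_reverb(frames, delay_frames=5, scale=0.3, floor=0.05):
    """Frame-by-frame spectral subtraction of late reverberant energy.

    frames : list of per-frame power spectra (each a list of floats).
    The late reverberant power in frame t is modeled as a scaled copy of
    the spectrum `delay_frames` earlier; it is subtracted from the current
    frame with a spectral floor to avoid negative (musical-noise) values.
    """
    out = []
    for t, frame in enumerate(frames):
        if t < delay_frames:
            # not enough history yet: pass the frame through unchanged
            out.append(list(frame))
            continue
        late = [scale * p for p in frames[t - delay_frames]]
        out.append([max(p - l, floor * p) for p, l in zip(frame, late)])
    return out
```

Only the late-reflection estimate is subtracted; the direct sound and early reflections are left intact, matching the strategy's goal.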
Subject(s)
Acoustics , Cochlear Implants , Perceptual Masking , Speech Intelligibility , Subtraction Technique , Wavelet Analysis , Acoustic Stimulation , Adolescent , Adult , Aged , Algorithms , Facility Design and Construction , Female , Hearing Loss, Sensorineural/physiopathology , Hearing Loss, Sensorineural/psychology , Hearing Loss, Sensorineural/therapy , Humans , Male , Middle Aged , Phonetics , Sound , Speech Perception , Young Adult
ABSTRACT
Behind-the-ear (BTE) processors of cochlear implant (CI) devices offer little to no protection from wind noise at most incidence angles. To assess speech intelligibility, eight CI recipients were tested in 3 and 9 m/s wind. Results indicated that speech intelligibility decreased substantially as the wind velocity, and in turn the wind sound pressure level, increased. A two-microphone wind noise suppression strategy was developed. Scores obtained with this strategy indicated substantial gains in speech intelligibility over other conventional noise reduction strategies tested.
Subject(s)
Cochlear Implants , Noise , Speech Intelligibility , Wind , Aged , Cochlear Implantation/instrumentation , Equipment Design , Female , Humans , Male , Middle Aged , Pressure , Speech Acoustics , Speech Perception , Transducers , Young Adult
ABSTRACT
PURPOSE: The purpose of this study was to evaluate the contribution of a contralateral hearing aid to the perception of consonants, in terms of voicing, manner, and place-of-articulation cues in reverberation and noise by adult cochlear implantees aided by bimodal fittings. METHOD: Eight postlingually deafened adult cochlear implant (CI) listeners with a fully inserted CI in 1 ear and low-frequency hearing in the other ear were tested on consonant perception. They were presented with consonant stimuli processed in the following experimental conditions: 1 quiet condition, 2 different reverberation times (0.3 s and 1.0 s), and the combination of 2 reverberation times with a single signal-to-noise ratio (5 dB). RESULTS: Consonant perception improved significantly when listening in combination with a contralateral hearing aid as opposed to listening with a CI alone in 0.3 s and 1.0 s of reverberation. Significantly higher scores were also noted when noise was added to 0.3 s of reverberation. CONCLUSIONS: A considerable benefit was noted from the additional acoustic information in conditions of reverberation and reverberation plus noise. The bimodal benefit observed was more pronounced for voicing and manner of articulation than for place of articulation.
Subject(s)
Cochlear Implants , Correction of Hearing Impairment/methods , Hearing Aids , Phonetics , Speech Perception , Adult , Aged , Aged, 80 and over , Combined Modality Therapy , Cues , Female , Hearing Loss, Sensorineural/rehabilitation , Humans , Male , Middle Aged , Noise , Signal-To-Noise Ratio
ABSTRACT
OBJECTIVE: To investigate a set of acoustic features and classification methods for the classification of three groups of fricative consonants differing in place of articulation. METHOD: A support vector machine (SVM) algorithm was used to classify the fricatives extracted from the TIMIT database in quiet and also in speech babble noise at various signal-to-noise ratios (SNRs). Spectral features including four spectral moments, peak, slope, Mel-frequency cepstral coefficients (MFCCs), Gammatone filter outputs, and magnitudes of the fast Fourier transform (FFT) spectrum were used for the classification. The analysis frame was restricted to only 8 ms. In addition, commonly used linear and nonlinear principal component analysis dimensionality reduction techniques that project a high-dimensional feature vector onto a lower-dimensional space were examined. RESULTS: With 13 MFCC coefficients, or 14 or 24 Gammatone filter outputs, classification performance was greater than or equal to 85% in quiet and at +10 dB SNR. Using 14 Gammatone filter outputs above 1 kHz, classification accuracy remained high (greater than 80%) for a wide range of SNRs from +20 to +5 dB. CONCLUSIONS: High levels of classification accuracy for fricative consonants in quiet and in noise could be achieved using only spectral features extracted from a short time window. Results of this work have a direct impact on the development of speech enhancement algorithms for hearing devices.
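The four spectral moments mentioned above can be computed directly from a single short-time power spectrum; a minimal sketch (windowing, the other features, and the SVM classifier itself are omitted):

```python
def spectral_moments(power, freqs):
    """First four spectral moments of a short-time power spectrum:
    centroid (mean), variance, skewness, and kurtosis, with the power
    spectrum treated as a probability distribution over frequency."""
    total = sum(power)
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    var = sum(((f - centroid) ** 2) * p for f, p in zip(freqs, power)) / total
    std = var ** 0.5
    # normalized third and fourth central moments
    skew = sum(((f - centroid) ** 3) * p
               for f, p in zip(freqs, power)) / total / std ** 3
    kurt = sum(((f - centroid) ** 4) * p
               for f, p in zip(freqs, power)) / total / std ** 4
    return centroid, var, skew, kurt
```

For fricatives, the centroid in particular separates sibilants like /s/ (high-frequency energy) from non-sibilants like /f/.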
Subject(s)
Hearing Aids , Speech , Humans , Support Vector Machine
ABSTRACT
The purpose of this study was to determine the overall impact of early and late reflections on the intelligibility of reverberated speech by cochlear implant listeners. Two specific reverberation times were assessed. For each reverberation time, sentences were presented in three different conditions wherein the target signal was filtered through the early, late or entire part of the acoustic impulse response. Results obtained with seven cochlear implant listeners indicated that while early reflections neither enhanced nor reduced overall speech perception performance, late reflections severely reduced speech intelligibility in both reverberant conditions tested.
Subject(s)
Cochlear Implantation/instrumentation , Cochlear Implants , Correction of Hearing Impairment/instrumentation , Persons With Hearing Impairments/rehabilitation , Speech Acoustics , Speech Intelligibility , Speech Perception , Acoustic Stimulation , Adult , Audiometry, Speech , Female , Humans , Male , Middle Aged , Motion , Perceptual Masking , Persons With Hearing Impairments/psychology , Sound , Time Factors , Vibration , Young Adult
ABSTRACT
This paper investigates to what extent users of bilateral and bimodal fittings should expect to benefit from all three different binaural advantages found to be present in normal-hearing listeners. Head-shadow and binaural squelch are advantages occurring under spatially separated speech and noise, while summation emerges when speech and noise coincide in space. For 14 bilateral or bimodal listeners, speech reception thresholds in the presence of four-talker babble were measured in sound-field under various speech and noise configurations. Statistical analysis revealed significant advantages of head-shadow and summation for both bilateral and bimodal listeners. Squelch was significant only for bimodal listeners.
Subject(s)
Cochlear Implantation/instrumentation , Cochlear Implants , Correction of Hearing Impairment/instrumentation , Noise/adverse effects , Perceptual Masking , Persons With Hearing Impairments/rehabilitation , Speech Perception , Acoustic Stimulation , Adult , Aged , Aged, 80 and over , Audiometry, Speech , Female , Humans , Male , Middle Aged , Persons With Hearing Impairments/psychology , Prosthesis Design , Recognition, Psychology , Young Adult
ABSTRACT
To restore hearing sensation, cochlear implants deliver electrical pulses to the auditory nerve by relying on sophisticated signal processing algorithms that convert acoustic inputs to electrical stimuli. Although individuals fitted with cochlear implants perform well in quiet, their speech intelligibility is more susceptible to background noise than that of normal-hearing listeners. Traditionally, single-microphone noise reduction strategies have been used to increase performance in noise. More recently, a number of approaches have suggested that speech intelligibility in noise can be improved further by making use of two or more microphones instead. Processing strategies based on multiple microphones can better exploit the spatial diversity of speech and noise because such strategies rely mostly on spatial information about the relative positions of competing sound sources. In this article, we identify and elucidate the most significant theoretical aspects that underpin single- and multi-microphone noise reduction strategies for cochlear implants. More specifically, we focus on strategies of both types that have been shown to be promising for use in current-generation implant devices. We present data from past and more recent studies, and we outline the direction that future research in the area of noise reduction for cochlear implants could follow.
Subject(s)
Cochlear Implantation/instrumentation , Cochlear Implants , Correction of Hearing Impairment/psychology , Noise/prevention & control , Perceptual Masking , Persons With Hearing Impairments/rehabilitation , Speech Perception , Algorithms , Humans , Noise/adverse effects , Persons With Hearing Impairments/psychology , Prosthesis Design , Signal Processing, Computer-Assisted , Speech Intelligibility
ABSTRACT
The purpose of this study is to determine the relative impact of reverberant self-masking and overlap-masking effects on speech intelligibility by cochlear implant listeners. Sentences were presented in two conditions: one wherein reverberant consonant segments were replaced with clean consonants, and another wherein reverberant vowel segments were replaced with clean vowels. The underlying assumption is that self-masking effects would dominate in the first condition, whereas overlap-masking effects would dominate in the second condition. Results indicated that the degradation of speech intelligibility in reverberant conditions is caused primarily by self-masking effects that give rise to flattened formant transitions.
Subject(s)
Cochlear Implantation/instrumentation , Cochlear Implants , Correction of Hearing Impairment/psychology , Noise/adverse effects , Perceptual Masking , Persons With Hearing Impairments/rehabilitation , Speech Intelligibility , Acoustic Stimulation , Aged , Analysis of Variance , Audiometry, Speech , Humans , Middle Aged , Persons With Hearing Impairments/psychology , Recognition, Psychology , Sound Spectrography , Speech Acoustics , Time Factors , Vibration
ABSTRACT
Little is known about the extent to which reverberation affects speech intelligibility by cochlear implant (CI) listeners. Experiment 1 assessed CI users' performance using Institute of Electrical and Electronics Engineers (IEEE) sentences corrupted with varying degrees of reverberation. Reverberation times of 0.30, 0.60, 0.80, and 1.0 s were used. Results indicated that for all subjects tested, speech intelligibility decreased exponentially with an increase in reverberation time. A decaying-exponential model provided an excellent fit to the data. Experiment 2 evaluated (offline) a speech coding strategy for reverberation suppression using a channel-selection criterion based on the signal-to-reverberant ratio (SRR) of individual frequency channels. The SRR reflects implicitly the ratio of the energies of the signal originating from the early (and direct) reflections and the signal originating from the late reflections. Channels with SRR larger than a preset threshold were selected, while channels with SRR smaller than the threshold were zeroed out. Results in a highly reverberant scenario indicated that the proposed strategy led to substantial gains (over 60 percentage points) in speech intelligibility over the subjects' daily strategy. Further analysis indicated that the proposed channel-selection criterion reduces the temporal envelope smearing effects introduced by reverberation and also diminishes the self-masking effects responsible for flattened formants.
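The channel-selection criterion can be sketched as a per-channel binary decision (a simplification of the strategy described above; how the early and late envelope powers are estimated, and the threshold value, are placeholders here):

```python
import math

def srr_channel_select(early_env, late_env, threshold_db=-5.0):
    """Channel selection for reverberation suppression.

    early_env, late_env : per-channel envelope powers attributed to the
    early (plus direct) reflections and the late reflections, respectively.
    Channels whose signal-to-reverberant ratio (SRR) exceeds the threshold
    are retained (gain 1.0); the rest are zeroed out (gain 0.0).
    """
    gains = []
    for e, l in zip(early_env, late_env):
        srr_db = 10.0 * math.log10(e / l) if l > 0 else float("inf")
        gains.append(1.0 if srr_db > threshold_db else 0.0)
    return gains
```

Zeroing low-SRR channels removes the envelope segments dominated by late reverberant energy, which is the smearing the strategy targets.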
Subject(s)
Cochlear Implants/standards , Perceptual Distortion , Speech Acoustics , Speech Perception , Aged , Algorithms , Equipment Design , Female , Hearing Loss, Sensorineural/physiopathology , Hearing Loss, Sensorineural/psychology , Hearing Loss, Sensorineural/surgery , Humans , Male , Middle Aged , Phonetics , Speech Intelligibility , Time Factors
ABSTRACT
Bilateral cochlear implant (BI-CI) recipients achieve high word recognition scores in quiet listening conditions. Still, there is a substantial drop in speech recognition performance in the presence of reverberation and more than one interferer. BI-CI users utilize information from just two directional microphones placed on opposite sides of the head in a so-called independent stimulation mode. To enhance the ability of BI-CI users to communicate in noise, two computationally inexpensive multi-microphone adaptive noise reduction strategies are proposed that exploit information simultaneously collected by the microphones associated with two behind-the-ear (BTE) processors (one per ear). To this end, as many as four microphones are employed (two omni-directional and two directional), two in each of the two BTE processors (one per ear). In the proposed two-microphone binaural strategies, all four microphones (two behind each ear) are used in a coordinated stimulation mode. The hypothesis is that such strategies combine spatial information from all microphones to form a better representation of the target than that made available with only a single input. Speech intelligibility was assessed in BI-CI listeners using IEEE sentences corrupted by up to three steady speech-shaped noise sources. Results indicate that the multi-microphone strategies improve speech understanding in single- and multi-noise-source scenarios.
Subject(s)
Cochlear Implantation/instrumentation , Cochlear Implants , Correction of Hearing Impairment , Noise/adverse effects , Perceptual Masking , Signal Processing, Computer-Assisted , Speech Intelligibility , Speech Perception , Acoustic Stimulation , Adult , Aged , Audiometry, Speech , Comprehension , Correction of Hearing Impairment/psychology , Female , Humans , Male , Middle Aged , Prosthesis Design , Speech Acoustics , Vibration
ABSTRACT
In this paper, we propose a new blind multichannel adaptive filtering scheme, which incorporates a partial-updating mechanism in the error gradient of the update equation. The proposed blind processing algorithm operates in the time domain by updating only a selected portion of the adaptive filters. The algorithm steers all computational resources to the filter taps having the largest-magnitude gradient components on the error surface. Therefore, it requires only a small number of updates at each iteration and substantially reduces overall computational complexity. Numerical experiments carried out in realistic blind identification scenarios indicate that the performance of the proposed algorithm is comparable to that of its full-update counterpart, but with the added benefit of a greatly reduced computational complexity.
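A minimal time-domain sketch of partial updating in an LMS-type filter, assuming an M-max-style selection of the taps with the largest-magnitude gradient components (the paper's blind multichannel formulation is more involved; this shows only the update-selection idea):

```python
def mmax_lms_update(w, x, d, mu=0.1, m=2):
    """One M-max partial-update LMS step.

    w  : current filter taps
    x  : current input (regressor) vector
    d  : desired sample
    Only the m taps with the largest-magnitude gradient components
    |e * x[k]| are updated, reducing the per-iteration cost relative
    to the full-update filter.
    """
    y = sum(wk * xk for wk, xk in zip(w, x))
    e = d - y
    # rank taps by gradient magnitude, update only the top m
    order = sorted(range(len(w)), key=lambda k: abs(e * x[k]), reverse=True)
    w = list(w)
    for k in order[:m]:
        w[k] += mu * e * x[k]
    return w, e
```

Since the error e is common to all components, the ranking reduces to picking the taps with the largest |x[k]|, which is what makes the selection cheap.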
Subject(s)
Signal Processing, Computer-Assisted , Speech , Algorithms , Computer Simulation , Humans , Male , Models, Theoretical , Neural Networks, Computer , Numerical Analysis, Computer-Assisted , Pattern Recognition, Visual , Reproducibility of Results , Software , Speech Acoustics , Speech Recognition Software , Time Factors
ABSTRACT
In this letter we propose a novel approach for two-microphone enhancement of speech corrupted by reverberation. Our approach steers computational resources to filter coefficients having the largest impact on the error surface and therefore only updates a subset of coefficients in every iteration. Experimental results carried out in a realistically reverberant setup indicate that the performance of the proposed algorithm is comparable to the performance of its full-update counterpart.
ABSTRACT
Bilateral cochlear implants seek to restore the advantages of binaural hearing by improving access to binaural cues. Bilateral implant users are currently fitted with two processors, one in each ear, operating independently of one another. In this work, a different approach to bilateral processing is explored based on blind source separation (BSS), utilizing two implants driven by a single processor. Sentences corrupted by interfering speech or speech-shaped noise were presented to bilateral cochlear implant users at 0 dB signal-to-noise ratio in order to evaluate the performance of the proposed BSS method. Subjects were tested in both anechoic and reverberant settings, wherein the target and masker signals were spatially separated. Results indicate substantial improvements in performance, in both anechoic and reverberant settings, over the subjects' daily strategies for both masker conditions and at various masker locations. It is speculated that such improvements arise because the proposed BSS algorithm capitalizes on the variations in interaural level differences and interaural time delays present in the mixtures received by the two microphones, and exploits that information to spatially separate the target from the masker signals.
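A toy illustration of one of the binaural cues at work (not the study's actual BSS algorithm, which jointly exploits interaural level and time cues): a binary mask that keeps time-frequency units whose interaural level difference (ILD) points to the target side.

```python
import math

def ild_binary_mask(left_pow, right_pow, threshold_db=0.0):
    """Assign each time-frequency unit to the target or the masker by
    its interaural level difference.

    left_pow, right_pow : per-unit powers at the left and right microphones.
    Units whose ILD (left re right, in dB) exceeds threshold_db are
    attributed to the source on the left (assumed target side here) and
    kept; the rest are zeroed out.
    """
    mask = []
    for lp, rp in zip(left_pow, right_pow):
        ild = 10.0 * math.log10(lp / rp)
        mask.append(1.0 if ild > threshold_db else 0.0)
    return mask
```

With spatially separated sources, head shadow makes the target and masker dominate different units with opposite ILD signs, which is the spatial information the BSS approach exploits.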