Results 1 - 20 of 63
1.
Ear Hear ; 45(6): 1444-1460, 2024.
Article in English | MEDLINE | ID: mdl-38816900

ABSTRACT

OBJECTIVES: This study aimed to determine the speech-to-background ratios (SBRs) at which normal-hearing (NH) and hearing-impaired (HI) listeners can recognize both speech and environmental sounds when the two types of signals are mixed. Also examined were the effect of individual sounds on speech recognition and environmental sound recognition (ESR), and the impact of divided versus selective attention on these tasks.

DESIGN: In Experiment 1 (divided attention), 11 NH and 10 HI listeners heard sentences mixed with environmental sounds at various SBRs and performed speech recognition and ESR tasks concurrently in each trial. In Experiment 2 (selective attention), 20 NH listeners performed these tasks in separate trials. Psychometric functions were generated for each task, listener group, and environmental sound. The range over which speech recognition and ESR were both high was determined, as was the optimal SBR for balancing recognition with ESR, defined as the point of intersection between each pair of normalized psychometric functions.

RESULTS: The NH listeners achieved greater than 95% accuracy on concurrent speech recognition and ESR over an SBR range of approximately 20 dB or greater. The optimal SBR for maximizing both speech recognition and ESR for NH listeners was approximately +12 dB. For the HI listeners, the range over which 95% performance was observed on both tasks was far smaller (span of 1 dB), with an optimal value of +5 dB. Acoustic analyses indicated that the speech and environmental sound stimuli were similarly audible, regardless of the hearing status of the listener, but that the speech fluctuated more than the environmental sounds. Divided versus selective attention conditions produced differences in performance that were statistically significant yet only modest in magnitude. In all conditions and for both listener groups, recognition was higher for environmental sounds than for speech when presented at equal intensities (i.e., 0 dB SBR), indicating that the environmental sounds were more effective maskers of speech than the converse. Each of the 25 environmental sounds used in this study (with one exception) had a span of SBRs over which speech recognition and ESR were both higher than 95%. These ranges tended to overlap substantially.

CONCLUSIONS: A range of SBRs exists over which speech and environmental sounds can be simultaneously recognized with high accuracy by NH and HI listeners, but this range is larger for NH listeners. The single optimal SBR for jointly maximizing speech recognition and ESR also differs between NH and HI listeners. The greater masking effectiveness of the environmental sounds relative to the speech may be related to the lower degree of fluctuation present in the environmental sounds as well as possibly task differences between speech recognition and ESR (open versus closed set). The observed differences between the NH and HI results may possibly be related to the HI listeners' smaller fluctuating masker benefit. As noise-reduction systems become increasingly effective, the current results could potentially guide the design of future systems that provide listeners with highly intelligible speech without depriving them of access to important environmental sounds.
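The "optimal SBR" in this abstract is defined as the intersection of a pair of normalized psychometric functions. A minimal sketch of that idea follows. The logistic shape, the midpoints and slopes, and every function name are illustrative assumptions, not values or procedures reported by the study.

```python
import math

def logistic(x, midpoint, slope):
    """Normalized psychometric function rising from 0 to 1 as x (dB) increases."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def speech_score(sbr_db, midpoint=-5.0, slope=0.4):
    # Assumed: speech recognition improves as the speech-to-background ratio rises.
    return logistic(sbr_db, midpoint, slope)

def esr_score(sbr_db, midpoint=25.0, slope=0.4):
    # Assumed: environmental sound recognition falls as the SBR rises.
    return 1.0 - logistic(sbr_db, midpoint, slope)

def crossing_point(lo=-40.0, hi=60.0, tol=1e-6):
    """Bisection on speech_score - esr_score, which increases with SBR."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if speech_score(mid) - esr_score(mid) < 0:
            lo = mid   # speech still below ESR: crossing lies at a higher SBR
        else:
            hi = mid   # speech at or above ESR: crossing lies at a lower SBR
    return 0.5 * (lo + hi)

# SBR at which the two hypothetical functions intersect.
optimal_sbr = crossing_point()
```

With the made-up parameters above the two curves are mirror images, so the crossing falls midway between the two midpoints. Fitted functions from real data would generally differ in slope and shift the crossing accordingly.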


Subjects
Attention , Speech Perception , Humans , Speech Perception/physiology , Adult , Female , Male , Middle Aged , Young Adult , Attention/physiology , Aged , Noise , Case-Control Studies , Hearing Loss, Sensorineural/rehabilitation , Hearing Loss, Sensorineural/physiopathology , Psychometrics
2.
J Acoust Soc Am ; 153(5): 2751, 2023 05 01.
Article in English | MEDLINE | ID: mdl-37133814

ABSTRACT

Recent years have brought considerable advances to our ability to increase intelligibility through deep-learning-based noise reduction, especially for hearing-impaired (HI) listeners. In this study, intelligibility improvements resulting from a current algorithm are assessed. These benefits are compared to those resulting from the initial demonstration of deep-learning-based noise reduction for HI listeners ten years ago in Healy, Yoho, Wang, and Wang [(2013). J. Acoust. Soc. Am. 134, 3029-3038]. The stimuli and procedures were broadly similar across studies. However, whereas the initial study involved highly matched training and test conditions, as well as non-causal operation, which prevented real-world operation, the current attentive recurrent network employed different noise types, talkers, and speech corpora for training versus test, as required for generalization, and it was fully causal, as required for real-time operation. Significant intelligibility benefit was observed in every condition and averaged 51 percentage points across conditions for HI listeners. Further, benefit was comparable to that obtained in the initial demonstration, despite the considerable additional demands placed on the current algorithm. The retention of large benefit despite the systematic removal of various constraints as required for real-world operation reflects the substantial advances made to deep-learning-based noise reduction.


Subjects
Deep Learning , Hearing Aids , Hearing Loss, Sensorineural , Hearing Loss , Speech Perception , Humans , Speech Intelligibility , Auditory Threshold
3.
J Acoust Soc Am ; 149(6): 3943, 2021 06.
Article in English | MEDLINE | ID: mdl-34241481

ABSTRACT

Real-time operation is critical for noise reduction in hearing technology. The essential requirement of real-time operation is causality: that an algorithm does not use future time-frame information and, instead, completes its operation by the end of the current time frame. This requirement is extended currently through the concept of "effectively causal," in which future time-frame information within the brief delay tolerance of the human speech-perception mechanism is used. Effectively causal deep learning was used to separate speech from background noise and improve intelligibility for hearing-impaired listeners. A single-microphone, gated convolutional recurrent network was used to perform complex spectral mapping. By estimating both the real and imaginary parts of the noise-free speech, both the magnitude and phase of the estimated noise-free speech were obtained. The deep neural network was trained using a large set of noises and tested using complex noises not employed during training. Significant algorithm benefit was observed in every condition, which was largest for those with the greatest hearing loss. Allowable delays across different communication settings are reviewed and assessed. The current work demonstrates that effectively causal deep learning can significantly improve intelligibility for one of the largest populations of need in challenging conditions involving untrained background noises.
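The distinction between strictly causal and "effectively causal" processing can be sketched with a toy frame-wise operation. The averaging below merely stands in for a network and is not the algorithm of the study. The function names, the `past` and `lookahead` parameters, and the frame-shift value used in the test are all hypothetical.

```python
def process_frames(frames, past=2, lookahead=0):
    """Average each frame with `past` earlier frames and `lookahead` later frames.

    lookahead=0 is strictly causal: frame t depends only on frames <= t.
    A small lookahead > 0 uses a bounded amount of future context, which is
    the idea behind effectively causal operation.
    """
    out = []
    n = len(frames)
    for t in range(n):
        start = max(0, t - past)
        stop = min(n, t + lookahead + 1)
        window = frames[start:stop]
        out.append(sum(window) / len(window))
    return out

def added_delay_ms(lookahead, frame_shift_ms):
    # Output for frame t cannot be emitted until frame t + lookahead has arrived.
    return lookahead * frame_shift_ms

frames = [0.0, 0.0, 1.0, 0.0, 0.0]
causal = process_frames(frames, past=1, lookahead=0)
effectively_causal = process_frames(frames, past=1, lookahead=1)
```

In the causal output, the impulse at frame 2 cannot influence frame 1. With one frame of lookahead it does, at the cost of one frame shift of additional output delay.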


Subjects
Deep Learning , Hearing Aids , Hearing Loss, Sensorineural , Hearing Loss , Speech Perception , Algorithms , Hearing , Hearing Loss/diagnosis , Hearing Loss, Sensorineural/diagnosis , Humans , Speech Intelligibility
4.
J Acoust Soc Am ; 150(5): 3976, 2021 11.
Article in English | MEDLINE | ID: mdl-34852625

ABSTRACT

The fundamental requirement for real-time operation of a speech-processing algorithm is causality: that it operate without utilizing future time frames. In the present study, the performance of a fully causal deep computational auditory scene analysis algorithm was assessed. Target sentences were isolated from complex interference consisting of an interfering talker and concurrent room reverberation. The talker- and corpus/channel-independent model used Dense-UNet and temporal convolutional networks and estimated both magnitude and phase of the target speech. It was found that mean algorithm benefit was significant in every condition. Mean benefit for hearing-impaired (HI) listeners across all conditions was 46.4 percentage points. The cost of converting the algorithm to causal processing was also assessed by comparing to a prior non-causal version. Intelligibility decrements for HI and normal-hearing listeners from non-causal to causal processing were present in most but not all conditions, and these decrements were statistically significant in half of the conditions tested, namely those representing the greater levels of complex interference. Although a cost associated with causal processing was present in most conditions, it may be considered modest relative to the overall level of benefit.


Subjects
Deep Learning , Hearing Loss, Sensorineural , Speech Perception , Algorithms , Humans , Speech Intelligibility
5.
J Acoust Soc Am ; 150(4): 2526, 2021 10.
Article in English | MEDLINE | ID: mdl-34717521

ABSTRACT

The practical efficacy of deep learning based speaker separation and/or dereverberation hinges on its ability to generalize to conditions not employed during neural network training. The current study was designed to assess the ability to generalize across extremely different training versus test environments. Training and testing were performed using different languages having no known common ancestry and correspondingly large linguistic differences: English for training and Mandarin for testing. Additional generalizations included untrained speech corpus/recording channel, target-to-interferer energy ratios, reverberation room impulse responses, and test talkers. A deep computational auditory scene analysis algorithm, employing complex time-frequency masking to estimate both magnitude and phase, was used to segregate two concurrent talkers and simultaneously remove large amounts of room reverberation to increase the intelligibility of a target talker. Significant intelligibility improvements were observed for the normal-hearing listeners in every condition. Benefit averaged 43.5 percentage points across conditions and was comparable to that obtained when training and testing were both performed in English. Benefit is projected to be considerably larger for individuals with hearing impairment. It is concluded that a properly designed and trained deep speaker separation/dereverberation network can be capable of generalization across vastly different acoustic environments that include different languages.


Subjects
Deep Learning , Hearing Loss , Speech Perception , Humans , Language , Perceptual Masking , Speech Intelligibility
6.
J Acoust Soc Am ; 148(3): 1552, 2020 09.
Article in English | MEDLINE | ID: mdl-33003879

ABSTRACT

Adverse listening conditions involve glimpses of spectro-temporal speech information. This study investigated whether the acoustic organization of the spectro-temporal masking pattern affects speech glimpsing in "checkerboard" noise. The regularity and coherence of the masking pattern were varied. Regularity was reduced by randomizing the spectral or temporal gating of the masking noise. Coherence involved the spectral alignment of frequency bands across time or the temporal alignment of gated onsets/offsets across frequency bands. Experiment 1 investigated the effect of spectral or temporal coherence. Experiment 2 investigated independent and combined factors of regularity and coherence. Performance was best in spectro-temporally modulated noise having larger glimpses. Generally, performance also improved as the regularity and coherence of masker fluctuations increased, with regularity having a stronger effect than coherence. An acoustic glimpsing model suggested that the effect of regularity (but not coherence) could be partially attributed to the availability of glimpses retained after energetic masking. Performance tended to be better with maskers that were spectrally coherent as compared to temporally coherent. Overall, performance was best when the spectro-temporal masking pattern imposed even spectral sampling and minimal temporal uncertainty, indicating that listeners use reliable masking patterns to aid in spectro-temporal speech glimpsing.


Subjects
Perceptual Masking , Speech Perception , Acoustic Stimulation , Adolescent , Female , Humans , Speech , Speech Intelligibility , Uncertainty , Young Adult
7.
J Acoust Soc Am ; 147(6): 4106, 2020 06.
Article in English | MEDLINE | ID: mdl-32611178

ABSTRACT

Deep learning based speech separation or noise reduction needs to generalize to voices not encountered during training and to operate under multiple corruptions. The current study provides such a demonstration for hearing-impaired (HI) listeners. Sentence intelligibility was assessed under conditions of a single interfering talker and substantial amounts of room reverberation. A talker-independent deep computational auditory scene analysis (CASA) algorithm was employed, in which talkers were separated and dereverberated in each time frame (simultaneous grouping stage), then the separated frames were organized to form two streams (sequential grouping stage). The deep neural networks consisted of specialized convolutional neural networks, one based on U-Net and the other a temporal convolutional network. It was found that every HI (and normal-hearing, NH) listener received algorithm benefit in every condition. Benefit averaged across all conditions ranged from 52 to 76 percentage points for individual HI listeners and averaged 65 points. Further, processed HI intelligibility significantly exceeded unprocessed NH intelligibility. Although the current utterance-based model was not implemented as a real-time system, a perspective on this important issue is provided. It is concluded that deep CASA represents a powerful framework capable of producing large increases in HI intelligibility for potentially any two voices.


Subjects
Deep Learning , Hearing Loss, Sensorineural , Speech Perception , Algorithms , Hearing , Humans , Speech Intelligibility
8.
J Acoust Soc Am ; 145(3): 1378, 2019 03.
Article in English | MEDLINE | ID: mdl-31067936

ABSTRACT

For deep learning based speech segregation to have translational significance as a noise-reduction tool, it must perform in a wide variety of acoustic environments. In the current study, performance was examined when target speech was subjected to interference from a single talker and room reverberation. Conditions were compared in which an algorithm was trained to remove both reverberation and interfering speech, or only interfering speech. A recurrent neural network incorporating bidirectional long short-term memory was trained to estimate the ideal ratio mask corresponding to target speech. Substantial intelligibility improvements were found for hearing-impaired (HI) and normal-hearing (NH) listeners across a range of target-to-interferer ratios (TIRs). HI listeners performed better with reverberation removed, whereas NH listeners demonstrated no difference. Algorithm benefit averaged 56 percentage points for the HI listeners at the least-favorable TIR, allowing these listeners to perform numerically better than young NH listeners without processing. The current study highlights the difficulty associated with perceiving speech in reverberant-noisy environments, and it extends the range of environments in which deep learning based speech segregation can be effectively applied. This increasingly wide array of environments includes not only a variety of background noises and interfering speech, but also room reverberation.


Subjects
Deep Learning , Hearing Aids/standards , Hearing Loss, Sensorineural/rehabilitation , Speech Intelligibility , Speech Recognition Software , Aged , Female , Humans , Male , Middle Aged , Signal-To-Noise Ratio , Speech Perception
9.
J Acoust Soc Am ; 145(6): EL581, 2019 06.
Article in English | MEDLINE | ID: mdl-31255108

ABSTRACT

Hearing-impaired listeners' intolerance to background noise during speech perception is well known. The current study employed speech materials free of ceiling effects to reveal the optimal trade-off between rejecting noise and retaining speech during time-frequency masking. This relative criterion value (-7 dB) was found to hold across noise types that differ in acoustic spectro-temporal complexity. It was also found that listeners with hearing impairment and those with normal hearing performed optimally at this same value, suggesting no true noise intolerance once time-frequency units containing speech are extracted.


Subjects
Auditory Threshold/physiology , Hearing Loss/physiopathology , Noise , Speech Perception/physiology , Speech/physiology , Adult , Auditory Perception/physiology , Female , Hearing Loss, Sensorineural/physiopathology , Humans , Young Adult
10.
J Acoust Soc Am ; 144(3): 1392, 2018 09.
Article in English | MEDLINE | ID: mdl-30424638

ABSTRACT

Time-frequency (T-F) masks represent powerful tools to increase the intelligibility of speech in background noise. Translational relevance is provided by their accurate estimation based only on the signal-plus-noise mixture, using deep learning or other machine-learning techniques. In the current study, a technique is designed to capture the benefits of existing techniques. In the ideal quantized mask (IQM), speech and noise are partitioned into T-F units, and each unit receives one of N attenuations according to its signal-to-noise ratio. It was found that as few as four to eight attenuation steps (IQM4, IQM8) improved intelligibility over the ideal binary mask (IBM, having two attenuation steps), and equaled the intelligibility resulting from the ideal ratio mask (IRM, having a theoretically infinite number of steps). Sound-quality ratings and rankings of noisy speech processed by the IQM4 and IQM8 were also superior to that processed by the IBM and equaled or exceeded that processed by the IRM. It is concluded that the intelligibility and sound-quality advantages of infinite attenuation resolution can be captured by an IQM having only a very small number of steps. Further, the classification-based nature of the IQM might provide algorithmic advantages over the regression-based IRM during machine estimation.
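The three mask types compared in this abstract can be contrasted per T-F unit with a short sketch. The square-root ratio form is one common definition of the IRM. The uniform quantization used for the IQM is an assumption made here for illustration, since the abstract does not give the study's attenuation values or SNR boundaries. All function names are hypothetical.

```python
import math

def irm(speech_energy, noise_energy):
    """Ideal ratio mask gain for one T-F unit (square-root form, a common choice)."""
    total = speech_energy + noise_energy
    return math.sqrt(speech_energy / total) if total > 0 else 0.0

def ibm(speech_energy, noise_energy, criterion_db=0.0):
    """Ideal binary mask: keep the unit only if its local SNR exceeds the criterion."""
    if noise_energy == 0:
        return 1.0
    if speech_energy == 0:
        return 0.0
    snr_db = 10.0 * math.log10(speech_energy / noise_energy)
    return 1.0 if snr_db > criterion_db else 0.0

def iqm(speech_energy, noise_energy, n_steps=4):
    """Quantized mask: snap the ratio-mask gain to one of n_steps evenly spaced gains.

    Assumed scheme: uniform bins on the IRM gain, not the study's actual steps.
    """
    gain = irm(speech_energy, noise_energy)
    level = min(int(gain * n_steps), n_steps - 1)  # bin index 0 .. n_steps - 1
    return level / (n_steps - 1)                   # gains 0, 1/(N-1), ..., 1
```

Under this assumed scheme, `iqm` with `n_steps=2` yields only the gains 0 and 1, like a binary mask, while larger `n_steps` values approximate the continuous ratio mask more closely.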


Subjects
Acoustic Stimulation/methods , Noise , Perceptual Masking/physiology , Sound Spectrography/methods , Speech Intelligibility/physiology , Speech Perception/physiology , Adult , Female , Humans , Male , Speech Acoustics , Young Adult
11.
J Acoust Soc Am ; 144(3): 1627, 2018 09.
Article in English | MEDLINE | ID: mdl-30424625

ABSTRACT

Recently, deep learning based speech segregation has been shown to improve human speech intelligibility in noisy environments. However, one important factor not yet considered is room reverberation, which characterizes typical daily environments. The combination of reverberation and background noise can severely degrade speech intelligibility for hearing-impaired (HI) listeners. In the current study, a deep learning based time-frequency masking algorithm was proposed to address both room reverberation and background noise. Specifically, a deep neural network was trained to estimate the ideal ratio mask, where anechoic-clean speech was considered as the desired signal. Intelligibility testing was conducted under reverberant-noisy conditions with reverberation time T60 = 0.6 s, plus speech-shaped noise or babble noise at various signal-to-noise ratios. The experiments demonstrated that substantial speech intelligibility improvements were obtained for HI listeners. The algorithm was also somewhat beneficial for normal-hearing (NH) listeners. In addition, sentence intelligibility scores for HI listeners with algorithm processing approached or matched those of young-adult NH listeners without processing. The current study represents a step toward deploying deep learning algorithms to help the speech understanding of HI listeners in everyday conditions.


Subjects
Algorithms , Deep Learning , Hearing Loss, Sensorineural/therapy , Noise , Perceptual Masking/physiology , Speech Intelligibility/physiology , Aged , Female , Hearing Aids , Hearing Loss, Sensorineural/physiopathology , Humans , Male , Middle Aged , Noise/adverse effects , Young Adult
12.
J Acoust Soc Am ; 143(4): 2527, 2018 04.
Article in English | MEDLINE | ID: mdl-29716288

ABSTRACT

The degrading influence of noise on various critical bands of speech was assessed. A modified version of the compound method [Apoux and Healy (2012) J. Acoust. Soc. Am. 132, 1078-1087] was employed to establish this noise susceptibility for each speech band. Noise was added to the target speech band at various signal-to-noise ratios to determine the amount of noise required to reduce the contribution of that band by 50%. It was found that noise susceptibility is not equal across the speech spectrum, as is commonly assumed and incorporated into modern indexes. Instead, the signal-to-noise ratio required to equivalently impact various speech bands differed by as much as 13 dB. This noise susceptibility formed an irregular pattern across frequency, despite the use of multi-talker speech materials designed to reduce the potential influence of a particular talker's voice. But basic trends in the pattern of noise susceptibility across the spectrum emerged. Further, no systematic relationship was observed between noise susceptibility and speech band importance. It is argued here that susceptibility to noise and band importance are different phenomena, and that this distinction may be underappreciated in previous works.


Subjects
Acoustic Stimulation/methods , Auditory Perception/physiology , Auditory Threshold/physiology , Hearing/physiology , Noise , Speech Perception/physiology , Adult , Female , Humans , Male , Signal-To-Noise Ratio , Speech Discrimination Tests , Young Adult
13.
J Acoust Soc Am ; 143(5): 3047, 2018 05.
Article in English | MEDLINE | ID: mdl-29857753

ABSTRACT

Speech recognition in fluctuating maskers is influenced by the spectro-temporal properties of the noise. Three experiments examined different temporal and spectro-temporal noise properties. Experiment 1 replicated previous work by highlighting maximum performance at a temporal gating rate of 4-8 Hz. Experiment 2 involved spectro-temporal glimpses. Performance was best with the largest glimpses, and performance with small glimpses approached that for continuous noise matched to the average level of the modulated noise. Better performance occurred with periodic than for random spectro-temporal glimpses. Finally, time and frequency for spectro-temporal glimpses were dissociated in experiment 3. Larger spectral glimpses were more beneficial than smaller, and minimum performance was observed at a gating rate of 4-8 Hz. The current results involving continuous speech in gated noise (slower and larger glimpses most advantageous) run counter to several results involving gated and/or filtered speech, where a larger number of smaller speech samples is often advantageous. This is because mechanisms of masking dominate, negating the advantages of better speech-information sampling. It is suggested that spectro-temporal glimpsing combines temporal glimpsing with additional processes of simultaneous masking and uncomodulation, and continuous speech in gated noise is a better model for real-world glimpsing than is gated and/or filtered speech.


Subjects
Acoustic Stimulation/methods , Noise , Perceptual Masking/physiology , Speech Intelligibility/physiology , Speech Perception/physiology , Female , Humans , Male , Time Factors , Young Adult
14.
J Acoust Soc Am ; 143(3): 1417, 2018 03.
Article in English | MEDLINE | ID: mdl-29604719

ABSTRACT

Band-importance functions created using the compound method [Apoux and Healy (2012). J. Acoust. Soc. Am. 132, 1078-1087] provide more detail than those generated using the ANSI technique, necessitating and allowing a re-examination of the influences of speech material and talker on the shape of the band-importance function. More specifically, the detailed functions may reflect, to a larger extent, acoustic idiosyncrasies of the individual talker's voice. Twenty-one band functions were created using standard speech materials and recordings by different talkers. The band-importance functions representing the same speech-material type produced by different talkers were found to be more similar to one another than functions representing the same talker producing different speech-material types. Thus, the primary finding was the relative strength of a speech-material effect and weakness of a talker effect. This speech-material effect extended to other materials in the same broad class (different sentence corpora) despite considerable differences in the specific materials. Characteristics of individual talkers' voices were not readily apparent in the functions, and the talker effect was restricted to more global aspects of talker (i.e., gender). Finally, the use of multiple talkers diminished any residual effect of the talker.


Subjects
Speech Intelligibility , Speech Perception , Voice Quality , Adult , Audiometry, Pure-Tone , Audiometry, Speech , Auditory Threshold , Female , Humans , Male , Noise , Perceptual Masking , Sex Factors , Speech Acoustics , Young Adult
15.
J Acoust Soc Am ; 141(6): 4230, 2017 06.
Article in English | MEDLINE | ID: mdl-28618817

ABSTRACT

Individuals with hearing impairment have particular difficulty perceptually segregating concurrent voices and understanding a talker in the presence of a competing voice. In contrast, individuals with normal hearing perform this task quite well. This listening situation represents a very different problem for both the human and machine listener, when compared to perceiving speech in other types of background noise. A machine learning algorithm is introduced here to address this listening situation. A deep neural network was trained to estimate the ideal ratio mask for a male target talker in the presence of a female competing talker. The monaural algorithm was found to produce sentence-intelligibility increases for hearing-impaired (HI) and normal-hearing (NH) listeners at various signal-to-noise ratios (SNRs). This benefit was largest for the HI listeners and averaged 59 percentage points at the least-favorable SNR, with a maximum of 87 percentage points. The mean intelligibility achieved by the HI listeners using the algorithm was equivalent to that of young NH listeners without processing, under conditions of identical interference. Possible reasons for the limited ability of HI listeners to perceptually segregate concurrent voices are reviewed as are possible implementation considerations for algorithms like the current one.


Subjects
Correction of Hearing Impairment/instrumentation , Deep Learning , Hearing Aids , Hearing Loss, Sensorineural/rehabilitation , Perceptual Masking , Persons With Hearing Impairments/rehabilitation , Signal Processing, Computer-Assisted , Speech Intelligibility , Speech Perception , Acoustic Stimulation , Aged , Audiometry, Speech , Auditory Threshold , Comprehension , Female , Hearing , Hearing Loss, Sensorineural/diagnosis , Hearing Loss, Sensorineural/physiopathology , Hearing Loss, Sensorineural/psychology , Humans , Male , Middle Aged , Persons With Hearing Impairments/psychology , Signal-To-Noise Ratio , Young Adult
16.
J Acoust Soc Am ; 140(4): 2542, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27794278

ABSTRACT

Listeners can reliably perceive speech in noisy conditions, but it is not well understood what specific features of speech they use to do this. This paper introduces a data-driven framework to identify the time-frequency locations of these features. Using the same speech utterance mixed with many different noise instances, the framework is able to compute the importance of each time-frequency point in the utterance to its intelligibility. The mixtures have approximately the same global signal-to-noise ratio at each frequency, but very different recognition rates. The difference between these intelligible vs unintelligible mixtures is the alignment between the speech and spectro-temporally modulated noise, providing different combinations of "glimpses" of speech in each mixture. The current results reveal the locations of these important noise-robust phonetic features in a restricted set of syllables. Classification models trained to predict whether individual mixtures are intelligible based on the location of these glimpses can generalize to new conditions, successfully predicting the intelligibility of novel mixtures. They are able to generalize to novel noise instances, novel productions of the same word by the same talker, novel utterances of the same word spoken by different talkers, and, to some extent, novel consonants.


Subjects
Speech , Comprehension , Noise , Phonetics , Speech Perception
17.
J Acoust Soc Am ; 139(5): 2604, 2016 05.
Article in English | MEDLINE | ID: mdl-27250154

ABSTRACT

Supervised speech segregation has been recently shown to improve human speech intelligibility in noise, when trained and tested on similar noises. However, a major challenge involves the ability to generalize to entirely novel noises. Such generalization would enable hearing aid and cochlear implant users to improve speech intelligibility in unknown noisy environments. This challenge is addressed in the current study through large-scale training. Specifically, a deep neural network (DNN) was trained on 10 000 noises to estimate the ideal ratio mask, and then employed to separate sentences from completely new noises (cafeteria and babble) at several signal-to-noise ratios (SNRs). Although the DNN was trained at the fixed SNR of -2 dB, testing using hearing-impaired listeners demonstrated that speech intelligibility increased substantially following speech segregation using the novel noises and unmatched SNR conditions of 0 dB and 5 dB. Sentence intelligibility benefit was also observed for normal-hearing listeners in most noisy conditions. The results indicate that DNN-based supervised speech segregation with large-scale training is a very promising approach for generalization to new acoustic environments.
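Training and test mixtures at a fixed SNR, such as the -2 dB, 0 dB, and 5 dB conditions mentioned above, are typically built by scaling the noise relative to the speech before adding the two. The sketch below shows one way to do this with an RMS-based SNR. It is not the study's stimulus-generation code, and the helper names and toy signals are hypothetical.

```python
import math

def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def measure_snr_db(speech, noise):
    """SNR in dB computed from the RMS levels of the two signals."""
    return 20.0 * math.log10(rms(speech) / rms(noise))

def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so that speech RMS / noise RMS matches target_snr_db, then add."""
    gain = rms(speech) / (rms(noise) * 10.0 ** (target_snr_db / 20.0))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise

# Toy signals standing in for a sentence and a noise segment of equal length.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixture, scaled_noise = mix_at_snr(speech, noise, -2.0)
```

Scaling the noise rather than the speech keeps the speech level constant across SNR conditions. Whether a given study fixes the speech level, the noise level, or the overall mixture level is a separate design choice.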


Subjects
Hearing Aids , Hearing Loss, Sensorineural/rehabilitation , Noise/adverse effects , Perceptual Masking , Persons With Hearing Impairments/rehabilitation , Signal Processing, Computer-Assisted , Speech Acoustics , Speech Intelligibility , Speech Perception , Acoustic Stimulation , Adult , Aged , Algorithms , Auditory Threshold , Case-Control Studies , Electric Stimulation , Female , Hearing Loss, Sensorineural/diagnosis , Hearing Loss, Sensorineural/physiopathology , Hearing Loss, Sensorineural/psychology , Humans , Male , Middle Aged , Neural Networks, Computer , Persons With Hearing Impairments/psychology , Sound Spectrography , Young Adult
18.
J Acoust Soc Am ; 138(3): 1469-80, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26428784

ABSTRACT

Speech intelligibility in noise can be degraded by using vocoder processing to alter the temporal fine structure (TFS). Here it is argued that this degradation is not attributable to the loss of speech information potentially present in the TFS. Instead it is proposed that the degradation results from the loss of sound-source segregation information when two or more carriers (i.e., TFS) are substituted with only one as a consequence of vocoder processing. To demonstrate this segregation role, vocoder processing involving two carriers, one for the target and one for the background, was implemented. Because this approach does not preserve the speech TFS, it may be assumed that any improvement in intelligibility can only be a consequence of the preserved carrier duality and associated segregation cues. Three experiments were conducted using this "dual-carrier" approach. All experiments showed substantial improvements in sentence intelligibility in noise compared to traditional single-carrier conditions. In several conditions, the improvement was so substantial that intelligibility approximated that for unprocessed speech in noise. A foreseeable and potentially promising implication for the dual-carrier approach involves implementation into cochlear implant speech processors, where it may provide the TFS cues necessary to segregate speech from noise.


Subjects
Cochlear Implants , Cues , Speech Perception/physiology , Acoustic Stimulation , Adult , Audiometry, Pure-Tone , Audiometry, Speech , Female , Humans , Male , Noise , Perceptual Masking/physiology , Speech Intelligibility/physiology , Young Adult
19.
J Acoust Soc Am ; 138(3): 1660-9, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26428803

ABSTRACT

Machine learning algorithms to segregate speech from background noise hold considerable promise for alleviating limitations associated with hearing impairment. One of the most important considerations for implementing these algorithms into devices such as hearing aids and cochlear implants involves their ability to generalize to conditions not employed during the training stage. A major challenge involves the generalization to novel noise segments. In the current study, sentences were segregated from multi-talker babble and from cafeteria noise using an algorithm that employs deep neural networks to estimate the ideal ratio mask. Importantly, the algorithm was trained on segments of noise and tested using entirely novel segments of the same nonstationary noise type. Substantial sentence-intelligibility benefit was observed for hearing-impaired listeners in both noise types, despite the use of unseen noise segments during the test stage. Interestingly, normal-hearing listeners displayed benefit in babble but not in cafeteria noise. This result highlights the importance of evaluating these algorithms not only in human subjects, but in members of the actual target population.


Subjects
Hearing Loss/physiopathology , Speech Intelligibility/physiology , Adult , Aged , Algorithms , Female , Humans , Male , Middle Aged , Noise , Perceptual Masking/physiology , Speech Acoustics , Speech Perception/physiology
20.
J Acoust Soc Am ; 135(2): 581-4, 2014 Feb.
Article in English | MEDLINE | ID: mdl-25234867

ABSTRACT

The relative independence of time-unit processing during speech reception was examined. It was found that temporally interpolated noise, even at very high levels, had little effect on sentence recognition using masking-release conditions similar to those of Kwon et al. [(2012). J. Acoust. Soc. Am. 131, 3111-3119]. The current data confirm the earlier conclusions of Kwon et al. involving masking release based on the relative timing of speech and noise. These data also indicate substantial levels of independence in the time domain, which has implications for current theories of speech perception in noise.


Subjects
Cochlear Implants , Hearing Loss, Sensorineural/physiopathology , Perceptual Masking/physiology , Recognition, Psychology/physiology , Speech Perception/physiology , Female , Humans , Male