|

1.

Do They Know It's Christmash? Lexical Knowledge Directly Impacts Speech Perception.

Luthra, Sahil; Crinnion, Anne Marie; Saltzman, David; Magnuson, James S.

Cogn Sci ; 48(5): e13449, 2024 May.

Article En | MEDLINE | ID: mdl-38773754

We recently reported strong, replicable (i.e., replicated) evidence for lexically mediated compensation for coarticulation (LCfC; Luthra et al., 2021), whereby lexical knowledge influences a prelexical process. Critically, evidence for LCfC provides robust support for interactive models of cognition that include top-down feedback and is inconsistent with autonomous models that allow only feedforward processing. McQueen, Jesse, and Mitterer (2023) offer five counter-arguments against our interpretation; we respond to each of those arguments here and conclude that top-down feedback provides the most parsimonious explanation of extant data.

Speech Perception , Humans , Speech Perception/physiology , Cognition , Language

2.

Feasibility of decoding covert speech in ECoG with a Transformer trained on overt speech.

Komeiji, Shuji; Mitsuhashi, Takumi; Iimura, Yasushi; Suzuki, Hiroharu; Sugano, Hidenori; Shinoda, Koichi; Tanaka, Toshihisa.

Sci Rep ; 14(1): 11491, 2024 05 20.

Article En | MEDLINE | ID: mdl-38769115

Several attempts for speech brain-computer interfacing (BCI) have been made to decode phonemes, sub-words, words, or sentences using invasive measurements, such as the electrocorticogram (ECoG), during auditory speech perception, overt speech, or imagined (covert) speech. Decoding sentences from covert speech is a challenging task. Sixteen epilepsy patients with intracranially implanted electrodes participated in this study, and ECoGs were recorded during overt speech and covert speech of eight Japanese sentences, each consisting of three tokens. In particular, Transformer neural network model was applied to decode text sentences from covert speech, which was trained using ECoGs obtained during overt speech. We first examined the proposed Transformer model using the same task for training and testing, and then evaluated the model's performance when trained with overt task for decoding covert speech. The Transformer model trained on covert speech achieved an average token error rate (TER) of 46.6% for decoding covert speech, whereas the model trained on overt speech achieved a TER of 46.3% ( p > 0.05 ; d = 0.07 ) . Therefore, the challenge of collecting training data for covert speech can be addressed using overt speech. The performance of covert speech can improve by employing several overt speeches.

Brain-Computer Interfaces , Electrocorticography , Speech , Humans , Female , Male , Adult , Speech/physiology , Speech Perception/physiology , Young Adult , Feasibility Studies , Epilepsy/physiopathology , Neural Networks, Computer , Middle Aged , Adolescent

3.

Attention capture by own name decreases with speech compression.

Li, Simon Y W; Lee, Alan L F; Chiu, Jenny W S; Loeb, Robert G; Sanderson, Penelope M.

Cogn Res Princ Implic ; 9(1): 29, 2024 05 12.

Article En | MEDLINE | ID: mdl-38735013

Auditory stimuli that are relevant to a listener have the potential to capture focal attention even when unattended, the listener's own name being a particularly effective stimulus. We report two experiments to test the attention-capturing potential of the listener's own name in normal speech and time-compressed speech. In Experiment 1, 39 participants were tested with a visual word categorization task with uncompressed spoken names as background auditory distractors. Participants' word categorization performance was slower when hearing their own name rather than other names, and in a final test, they were faster at detecting their own name than other names. Experiment 2 used the same task paradigm, but the auditory distractors were time-compressed names. Three compression levels were tested with 25 participants in each condition. Participants' word categorization performance was again slower when hearing their own name than when hearing other names; the slowing was strongest with slight compression and weakest with intense compression. Personally relevant time-compressed speech has the potential to capture attention, but the degree of capture depends on the level of compression. Attention capture by time-compressed speech has practical significance and provides partial evidence for the duplex-mechanism account of auditory distraction.

Attention , Names , Speech Perception , Humans , Attention/physiology , Female , Male , Speech Perception/physiology , Adult , Young Adult , Speech/physiology , Reaction Time/physiology , Acoustic Stimulation

4.

Stapedotomy Versus Cochlear Implantation for Far Advanced Otosclerosis: Insights From a Patient With Matched Preoperative and Postoperative Function.

Khandalavala, Karl R; Dornhoffer, Jim R; Lane, John I; Carlson, Matthew L.

Otol Neurotol ; 45(5): e381-e384, 2024 Jun 01.

Article En | MEDLINE | ID: mdl-38728553

OBJECTIVE: To examine patient preference after stapedotomy versus cochlear implantation in a unique case of a patient with symmetrical profound mixed hearing loss and similar postoperative speech perception improvement. PATIENTS: An adult patient with bilateral symmetrical far advanced otosclerosis, with profound mixed hearing loss. INTERVENTION: Stapedotomy in the left ear, cochlear implantation in the right ear. MAIN OUTCOME MEASURE: Performance on behavioral audiometry, and subjective report of hearing and intervention preference. RESULTS: A patient successfully underwent left stapedotomy and subsequent cochlear implantation on the right side, per patient preference. Preoperative audiometric characteristics were similar between ears (pure-tone average [PTA] [R: 114; L: 113 dB]; word recognition score [WRS]: 22%). Postprocedural audiometry demonstrated significant improvement after stapedotomy (PTA: 59 dB, WRS: 75%) and from cochlear implant (PTA: 20 dB, WRS: 60%). The patient subjectively reported a preference for the cochlear implant ear despite having substantial gains from stapedotomy. A nuanced discussion highlighting potentially overlooked benefits of cochlear implants in far advanced otosclerosis is conducted. CONCLUSION: In comparison with stapedotomy and hearing aids, cochlear implantation generally permits greater access to sound among patients with far advanced otosclerosis. Though the cochlear implant literature mainly focuses on speech perception outcomes, an underappreciated benefit of cochlear implantation is the high likelihood of achieving "normal" sound levels across the audiogram.

Cochlear Implantation , Otosclerosis , Speech Perception , Stapes Surgery , Humans , Otosclerosis/surgery , Stapes Surgery/methods , Cochlear Implantation/methods , Speech Perception/physiology , Treatment Outcome , Male , Middle Aged , Hearing Loss, Mixed Conductive-Sensorineural/surgery , Audiometry, Pure-Tone , Patient Preference , Female , Adult

5.

Perceptual Consequences of Cochlear Deafferentation in Humans.

Bramhall, Naomi F; McMillan, Garnett P.

Trends Hear ; 28: 23312165241239541, 2024.

Article En | MEDLINE | ID: mdl-38738337

Cochlear synaptopathy, a form of cochlear deafferentation, has been demonstrated in a number of animal species, including non-human primates. Both age and noise exposure contribute to synaptopathy in animal models, indicating that it may be a common type of auditory dysfunction in humans. Temporal bone and auditory physiological data suggest that age and occupational/military noise exposure also lead to synaptopathy in humans. The predicted perceptual consequences of synaptopathy include tinnitus, hyperacusis, and difficulty with speech-in-noise perception. However, confirming the perceptual impacts of this form of cochlear deafferentation presents a particular challenge because synaptopathy can only be confirmed through post-mortem temporal bone analysis and auditory perception is difficult to evaluate in animals. Animal data suggest that deafferentation leads to increased central gain, signs of tinnitus and abnormal loudness perception, and deficits in temporal processing and signal-in-noise detection. If equivalent changes occur in humans following deafferentation, this would be expected to increase the likelihood of developing tinnitus, hyperacusis, and difficulty with speech-in-noise perception. Physiological data from humans is consistent with the hypothesis that deafferentation is associated with increased central gain and a greater likelihood of tinnitus perception, while human data on the relationship between deafferentation and hyperacusis is extremely limited. Many human studies have investigated the relationship between physiological correlates of deafferentation and difficulty with speech-in-noise perception, with mixed findings. A non-linear relationship between deafferentation and speech perception may have contributed to the mixed results. When differences in sample characteristics and study measurements are considered, the findings may be more consistent.

Cochlea , Speech Perception , Tinnitus , Humans , Cochlea/physiopathology , Tinnitus/physiopathology , Tinnitus/diagnosis , Animals , Speech Perception/physiology , Hyperacusis/physiopathology , Noise/adverse effects , Auditory Perception/physiology , Synapses/physiology , Hearing Loss, Noise-Induced/physiopathology , Hearing Loss, Noise-Induced/diagnosis , Loudness Perception

6.

Extending Subcortical EEG Responses to Continuous Speech to the Sound-Field.

Bachmann, Florine L; Kulasingham, Joshua P; Eskelund, Kasper; Enqvist, Martin; Alickovic, Emina; Innes-Brown, Hamish.

Trends Hear ; 28: 23312165241246596, 2024.

Article En | MEDLINE | ID: mdl-38738341

The auditory brainstem response (ABR) is a valuable clinical tool for objective hearing assessment, which is conventionally detected by averaging neural responses to thousands of short stimuli. Progressing beyond these unnatural stimuli, brainstem responses to continuous speech presented via earphones have been recently detected using linear temporal response functions (TRFs). Here, we extend earlier studies by measuring subcortical responses to continuous speech presented in the sound-field, and assess the amount of data needed to estimate brainstem TRFs. Electroencephalography (EEG) was recorded from 24 normal hearing participants while they listened to clicks and stories presented via earphones and loudspeakers. Subcortical TRFs were computed after accounting for non-linear processing in the auditory periphery by either stimulus rectification or an auditory nerve model. Our results demonstrated that subcortical responses to continuous speech could be reliably measured in the sound-field. TRFs estimated using auditory nerve models outperformed simple rectification, and 16âminutes of data was sufficient for the TRFs of all participants to show clear wave V peaks for both earphones and sound-field stimuli. Subcortical TRFs to continuous speech were highly consistent in both earphone and sound-field conditions, and with click ABRs. However, sound-field TRFs required slightly more data (16âminutes) to achieve clear wave V peaks compared to earphone TRFs (12âminutes), possibly due to effects of room acoustics. By investigating subcortical responses to sound-field speech stimuli, this study lays the groundwork for bringing objective hearing assessment closer to real-life conditions, which may lead to improved hearing evaluations and smart hearing technologies.

Acoustic Stimulation , Electroencephalography , Evoked Potentials, Auditory, Brain Stem , Speech Perception , Humans , Evoked Potentials, Auditory, Brain Stem/physiology , Male , Female , Speech Perception/physiology , Acoustic Stimulation/methods , Adult , Young Adult , Auditory Threshold/physiology , Time Factors , Cochlear Nerve/physiology , Healthy Volunteers

7.

Effects of spatial configuration and fundamental frequency on speech intelligibility in multiple-talker conditions in the ipsilateral horizontal plane and median planea).

Yao, Dingding; Zhao, Jiale; Wang, Linyi; Shang, Zengqiang; Gu, Jianjun; Wang, Yunan; Jia, Maoshen; Li, Junfeng.

J Acoust Soc Am ; 155(5): 2934-2947, 2024 May 01.

Article En | MEDLINE | ID: mdl-38717201

Spatial separation and fundamental frequency (F0) separation are effective cues for improving the intelligibility of target speech in multi-talker scenarios. Previous studies predominantly focused on spatial configurations within the frontal hemifield, overlooking the ipsilateral side and the entire median plane, where localization confusion often occurs. This study investigated the impact of spatial and F0 separation on intelligibility under the above-mentioned underexplored spatial configurations. The speech reception thresholds were measured through three experiments for scenarios involving two to four talkers, either in the ipsilateral horizontal plane or in the entire median plane, utilizing monotonized speech with varying F0s as stimuli. The results revealed that spatial separation in symmetrical positions (front-back symmetry in the ipsilateral horizontal plane or front-back, up-down symmetry in the median plane) contributes positively to intelligibility. Both target direction and relative target-masker separation influence the masking release attributed to spatial separation. As the number of talkers exceeds two, the masking release from spatial separation diminishes. Nevertheless, F0 separation remains as a remarkably effective cue and could even facilitate spatial separation in improving intelligibility. Further analysis indicated that current intelligibility models encounter difficulties in accurately predicting intelligibility in scenarios explored in this study.

Cues , Perceptual Masking , Sound Localization , Speech Intelligibility , Speech Perception , Humans , Female , Male , Young Adult , Adult , Speech Perception/physiology , Acoustic Stimulation , Auditory Threshold , Speech Acoustics , Speech Reception Threshold Test , Noise

8.

Aging affects auditory contributions to focus perception in Jianghuai Mandarina).

Zhao, Xinxian; Yang, Xiaohu.

J Acoust Soc Am ; 155(5): 2990-3004, 2024 May 01.

Article En | MEDLINE | ID: mdl-38717206

Speakers can place their prosodic prominence on any locations within a sentence, generating focus prosody for listeners to perceive new information. This study aimed to investigate age-related changes in the bottom-up processing of focus perception in Jianghuai Mandarin by clarifying the perceptual cues and the auditory processing abilities involved in the identification of focus locations. Young, middle-aged, and older speakers of Jianghuai Mandarin completed a focus identification task and an auditory perception task. The results showed that increasing age led to a decrease in listeners' accuracy rate in identifying focus locations, with all participants performing the worst when dynamic pitch cues were inaccessible. Auditory processing abilities did not predict focus perception performance in young and middle-aged listeners but accounted significantly for the variance in older adults' performance. These findings suggest that age-related deteriorations in focus perception can be largely attributed to declined auditory processing of perceptual cues. Poor ability to extract frequency modulation cues may be the most important underlying psychoacoustic factor for older adults' difficulties in perceiving focus prosody in Jianghuai Mandarin. The results contribute to our understanding of the bottom-up mechanisms involved in linguistic prosody processing in aging adults, particularly in tonal languages.

Aging , Cues , Speech Perception , Humans , Middle Aged , Aged , Male , Female , Aging/psychology , Aging/physiology , Young Adult , Adult , Speech Perception/physiology , Age Factors , Speech Acoustics , Acoustic Stimulation , Pitch Perception , Language , Voice Quality , Psychoacoustics , Audiometry, Speech

9.

Impact of speech rate on perception of vowel and consonant duration by bilinguals and monolinguals.

Hisagi, Miwako; Higby, Eve; Zandona, Mike; Acosta, Annett P; Kent, Justin; Tajima, Keiichi.

JASA Express Lett ; 4(5)2024 May 01.

Article En | MEDLINE | ID: mdl-38717469

The perceptual boundary between short and long categories depends on speech rate. We investigated the influence of speech rate on perceptual boundaries for short and long vowel and consonant contrasts by Spanish-English bilingual listeners and English monolinguals. Listeners tended to adapt their perceptual boundaries to speech rates, but the strategy differed between groups, especially for consonants. Understanding the factors that influence auditory processing in this population is essential for developing appropriate assessments of auditory comprehension. These findings have implications for the clinical care of older populations whose ability to rely on spectral and/or temporal information in the auditory signal may decline.

Multilingualism , Speech Perception , Humans , Speech Perception/physiology , Female , Male , Adult , Phonetics , Young Adult

10.

Cochlear-implant listeners benefit from training with time-compressed speech, even at advanced ages.

Ezenwa, Amara C; Goupell, Matthew J; Gordon-Salant, Sandra.

JASA Express Lett ; 4(5)2024 May 01.

Article En | MEDLINE | ID: mdl-38717468

This study evaluated whether adaptive training with time-compressed speech produces an age-dependent improvement in speech recognition in 14 adult cochlear-implant users. The protocol consisted of a pretest, 5 h of training, and a posttest using time-compressed speech and an adaptive procedure. There were significant improvements in time-compressed speech recognition at the posttest session following training (>5% in the average time-compressed speech recognition threshold) but no effects of age. These results are promising for the use of adaptive training in aural rehabilitation strategies for cochlear-implant users across the adult lifespan and possibly using speech signals, such as time-compressed speech, to train temporal processing.

Cochlear Implants , Speech Perception , Humans , Speech Perception/physiology , Aged , Male , Middle Aged , Female , Adult , Aged, 80 and over , Cochlear Implantation/methods , Time Factors

11.

The Multimodal Trust Effects of Face, Voice, and Sentence Content.

Syed, Isar; Baart, Martijn; Vroomen, Jean.

Multisens Res ; 37(2): 125-141, 2024 Apr 03.

Article En | MEDLINE | ID: mdl-38714314

Trust is an aspect critical to human social interaction and research has identified many cues that help in the assimilation of this social trait. Two of these cues are the pitch of the voice and the width-to-height ratio of the face (fWHR). Additionally, research has indicated that the content of a spoken sentence itself has an effect on trustworthiness; a finding that has not yet been brought into multisensory research. The current research aims to investigate previously developed theories on trust in relation to vocal pitch, fWHR, and sentence content in a multimodal setting. Twenty-six female participants were asked to judge the trustworthiness of a voice speaking a neutral or romantic sentence while seeing a face. The average pitch of the voice and the fWHR were varied systematically. Results indicate that the content of the spoken message was an important predictor of trustworthiness extending into multimodality. Further, the mean pitch of the voice and fWHR of the face appeared to be useful indicators in a multimodal setting. These effects interacted with one another across modalities. The data demonstrate that trust in the voice is shaped by task-irrelevant visual stimuli. Future research is encouraged to clarify whether these findings remain consistent across genders, age groups, and languages.

Face , Trust , Voice , Humans , Female , Voice/physiology , Young Adult , Adult , Face/physiology , Speech Perception/physiology , Pitch Perception/physiology , Facial Recognition/physiology , Cues , Adolescent

12.

Neural processing of speech comprehension in noise predicts individual age using fNIRS-based brain-behavior models.

Liu, Yi; Wang, Songjian; Lu, Jing; Ding, Junhua; Chen, Younuo; Yang, Liu; Wang, Shuo.

Cereb Cortex ; 34(5)2024 May 02.

Article En | MEDLINE | ID: mdl-38715408

Speech comprehension in noise depends on complex interactions between peripheral sensory and central cognitive systems. Despite having normal peripheral hearing, older adults show difficulties in speech comprehension. It remains unclear whether the brain's neural responses could indicate aging. The current study examined whether individual brain activation during speech perception in different listening environments could predict age. We applied functional near-infrared spectroscopy to 93 normal-hearing human adults (20 to 70 years old) during a sentence listening task, which contained a quiet condition and 4 different signal-to-noise ratios (SNR = 10, 5, 0, -5 dB) noisy conditions. A data-driven approach, the region-based brain-age predictive modeling was adopted. We observed a significant behavioral decrease with age under the 4 noisy conditions, but not under the quiet condition. Brain activations in SNR = 10 dB listening condition could successfully predict individual's age. Moreover, we found that the bilateral visual sensory cortex, left dorsal speech pathway, left cerebellum, right temporal-parietal junction area, right homolog Wernicke's area, and right middle temporal gyrus contributed most to prediction performance. These results demonstrate that the activations of regions about sensory-motor mapping of sound, especially in noisy conditions, could be sensitive measures for age prediction than external behavior measures.

Aging , Brain , Comprehension , Noise , Spectroscopy, Near-Infrared , Speech Perception , Humans , Adult , Speech Perception/physiology , Male , Female , Spectroscopy, Near-Infrared/methods , Middle Aged , Young Adult , Aged , Comprehension/physiology , Brain/physiology , Brain/diagnostic imaging , Aging/physiology , Brain Mapping/methods , Acoustic Stimulation/methods

13.

Eye movements track prioritized auditory features in selective attention to natural speech.

Gehmacher, Quirin; Schubert, Juliane; Schmidt, Fabian; Hartmann, Thomas; Reisinger, Patrick; Rösch, Sebastian; Schwarz, Konrad; Popov, Tzvetan; Chait, Maria; Weisz, Nathan.

Nat Commun ; 15(1): 3692, 2024 May 01.

Article En | MEDLINE | ID: mdl-38693186

Over the last decades, cognitive neuroscience has identified a distributed set of brain regions that are critical for attention. Strong anatomical overlap with brain regions critical for oculomotor processes suggests a joint network for attention and eye movements. However, the role of this shared network in complex, naturalistic environments remains understudied. Here, we investigated eye movements in relation to (un)attended sentences of natural speech. Combining simultaneously recorded eye tracking and magnetoencephalographic data with temporal response functions, we show that gaze tracks attended speech, a phenomenon we termed ocular speech tracking. Ocular speech tracking even differentiates a target from a distractor in a multi-speaker context and is further related to intelligibility. Moreover, we provide evidence for its contribution to neural differences in speech processing, emphasizing the necessity to consider oculomotor activity in future research and in the interpretation of neural differences in auditory cognition.

Attention , Eye Movements , Magnetoencephalography , Speech Perception , Speech , Humans , Attention/physiology , Eye Movements/physiology , Male , Female , Adult , Young Adult , Speech Perception/physiology , Speech/physiology , Acoustic Stimulation , Brain/physiology , Eye-Tracking Technology

14.

Clear speech effects in production of sentence-medial Mandarin lexical tonesa).

Rittenberry, Jack; Shport, Irina A.

JASA Express Lett ; 4(5)2024 May 01.

Article En | MEDLINE | ID: mdl-38804812

Adding to limited research on clear speech in tone languages, productions of Mandarin lexical tones were examined in pentasyllabic sentences. Fourteen participants read sentences imagining a hard-of-hearing addressee or a friend in a casual social setting. Tones produced in clear speech had longer duration, higher intensity, and larger F0 values. This style effect was rarely modulated by tone, preceding tonal context, or syllable position, consistent with an overall signal enhancement strategy. Possible evidence for tone enhancement was observed only in one set of analysis for F0 minimum and F0 range, contrasting tones with low targets and tones with high targets.

Language , Humans , Female , Male , Speech Acoustics , Adult , Young Adult , Speech , Speech Perception/physiology , Phonetics

15.

A multimodal interface for speech perception: the role of the left superior temporal sulcus in social cognition and autism.

Kausel, Leonie; Michon, Maëva; Soto-Icaza, Patricia; Aboitiz, Francisco.

Cereb Cortex ; 34(13): 84-93, 2024 May 02.

Article En | MEDLINE | ID: mdl-38696598

Multimodal integration is crucial for human interaction, in particular for social communication, which relies on integrating information from various sensory modalities. Recently a third visual pathway specialized in social perception was proposed, which includes the right superior temporal sulcus (STS) playing a key role in processing socially relevant cues and high-level social perception. Importantly, it has also recently been proposed that the left STS contributes to audiovisual integration of speech processing. In this article, we propose that brain areas along the right STS that support multimodal integration for social perception and cognition can be considered homologs to those in the left, language-dominant hemisphere, sustaining multimodal integration of speech and semantic concepts fundamental for social communication. Emphasizing the significance of the left STS in multimodal integration and associated processes such as multimodal attention to socially relevant stimuli, we underscore its potential relevance in comprehending neurodevelopmental conditions characterized by challenges in social communication such as autism spectrum disorder (ASD). Further research into this left lateral processing stream holds the promise of enhancing our understanding of social communication in both typical development and ASD, which may lead to more effective interventions that could improve the quality of life for individuals with atypical neurodevelopment.

Social Cognition , Speech Perception , Temporal Lobe , Humans , Temporal Lobe/physiology , Temporal Lobe/physiopathology , Speech Perception/physiology , Social Perception , Autistic Disorder/physiopathology , Autistic Disorder/psychology , Functional Laterality/physiology

16.

Deep learning-based auditory attention decoding in listeners with hearing impairment.

Tanveer, M Asjid; Skoglund, Martin A; Bernhardsson, Bo; Alickovic, Emina.

J Neural Eng ; 21(3)2024 May 22.

Article En | MEDLINE | ID: mdl-38729132

Objective.This study develops a deep learning (DL) method for fast auditory attention decoding (AAD) using electroencephalography (EEG) from listeners with hearing impairment (HI). It addresses three classification tasks: differentiating noise from speech-in-noise, classifying the direction of attended speech (left vs. right) and identifying the activation status of hearing aid noise reduction algorithms (OFF vs. ON). These tasks contribute to our understanding of how hearing technology influences auditory processing in the hearing-impaired population.Approach.Deep convolutional neural network (DCNN) models were designed for each task. Two training strategies were employed to clarify the impact of data splitting on AAD tasks: inter-trial, where the testing set used classification windows from trials that the training set had not seen, and intra-trial, where the testing set used unseen classification windows from trials where other segments were seen during training. The models were evaluated on EEG data from 31 participants with HI, listening to competing talkers amidst background noise.Main results.Using 1 s classification windows, DCNN models achieve accuracy (ACC) of 69.8%, 73.3% and 82.9% and area-under-curve (AUC) of 77.2%, 80.6% and 92.1% for the three tasks respectively on inter-trial strategy. In the intra-trial strategy, they achieved ACC of 87.9%, 80.1% and 97.5%, along with AUC of 94.6%, 89.1%, and 99.8%. Our DCNN models show good performance on short 1 s EEG samples, making them suitable for real-world applications. Conclusion: Our DCNN models successfully addressed three tasks with short 1 s EEG windows from participants with HI, showcasing their potential. While the inter-trial strategy demonstrated promise for assessing AAD, the intra-trial approach yielded inflated results, underscoring the important role of proper data splitting in EEG-based AAD tasks.Significance.Our findings showcase the promising potential of EEG-based tools for assessing auditory attention in clinical contexts and advancing hearing technology, while also promoting further exploration of alternative DL architectures and their potential constraints.

Attention , Auditory Perception , Deep Learning , Electroencephalography , Hearing Loss , Humans , Attention/physiology , Female , Electroencephalography/methods , Male , Middle Aged , Hearing Loss/physiopathology , Hearing Loss/rehabilitation , Hearing Loss/diagnosis , Aged , Auditory Perception/physiology , Noise , Adult , Hearing Aids , Speech Perception/physiology , Neural Networks, Computer

17.

The Latency of Auditory Event-Related Potential P300 Prolonged in School-Age Students with Unilateral Hearing Loss in a Mandarin Learning Environment.

Foo, Hiu Che; Tang, Chenwei; Kao, Yuting; Wu, Hsingmei; Chang, Chelun; Wu, Meiyao; Lo, Yuchun; Weng, Shihming.

Am Ann Deaf ; 168(5): 241-257, 2024.

Article En | MEDLINE | ID: mdl-38766937

Our study investigated the differences in speech performance and neurophysiological response in groups of school-age children with unilateral hearing loss (UHL) who were otherwise typically developing (TD). We recruited a total of 16 primary school-age children for our study (UHL = 9/TD = 7), who were screened by doctors at Shin Kong Wu-Ho-Su Memorial Hospital. We used the Peabody Picture Vocabulary Test-Revised (PPVT-R) to test word comprehension, and the PPVT-R percentile rank (PR) value was proportional to the auditory memory score (by The Children's Oral Comprehension Test) in both groups. Later, we assessed the latency and amplitude of auditory ERP P300 and found that the latency of auditory ERP P300 in the UHL group was prolonged compared with that in the TD group. Although students with UHL have typical hearing in one ear, based on our results, long-term UHL might be the cause of atypical organization of brain areas responsible for auditory processing or even visual perceptions attributed to speech delay and learning difficulties.

Event-Related Potentials, P300 , Hearing Loss, Unilateral , Humans , Child , Event-Related Potentials, P300/physiology , Male , Female , Hearing Loss, Unilateral/physiopathology , Hearing Loss, Unilateral/rehabilitation , Reaction Time/physiology , Speech Perception/physiology , Evoked Potentials, Auditory/physiology , China , Case-Control Studies , Language , Comprehension

18.

Visual scanning patterns of a talking face when evaluating phonetic information in a native and non-native language.

Deng, Xizi; McClay, Elise; Jastrzebski, Erin; Wang, Yue; Yeung, H Henny.

PLoS One ; 19(5): e0304150, 2024.

Article En | MEDLINE | ID: mdl-38805447

When comprehending speech, listeners can use information encoded in visual cues from a face to enhance auditory speech comprehension. For example, prior work has shown that the mouth movements reflect articulatory features of speech segments and durational information, while pitch and speech amplitude are primarily cued by eyebrow and head movements. Little is known about how the visual perception of segmental and prosodic speech information is influenced by linguistic experience. Using eye-tracking, we studied how perceivers' visual scanning of different regions on a talking face predicts accuracy in a task targeting both segmental versus prosodic information, and also asked how this was influenced by language familiarity. Twenty-four native English perceivers heard two audio sentences in either English or Mandarin (an unfamiliar, non-native language), which sometimes differed in segmental or prosodic information (or both). Perceivers then saw a silent video of a talking face, and judged whether that video matched either the first or second audio sentence (or whether both sentences were the same). First, increased looking to the mouth predicted correct responses only for non-native language trials. Second, the start of a successful search for speech information in the mouth area was significantly delayed in non-native versus native trials, but just when there were only prosodic differences in the auditory sentences, and not when there were segmental differences. Third, (in correct trials) the saccade amplitude in native language trials was significantly greater than in non-native trials, indicating more intensely focused fixations in the latter. Taken together, these results suggest that mouth-looking was generally more evident when processing a non-native versus native language in all analyses, but fascinatingly, when measuring perceivers' latency to fixate the mouth, this language effect was largest in trials where only prosodic information was useful for the task.

Language , Phonetics , Speech Perception , Humans , Female , Male , Adult , Speech Perception/physiology , Young Adult , Face/physiology , Visual Perception/physiology , Eye Movements/physiology , Speech/physiology , Eye-Tracking Technology

19.

The human auditory system uses amplitude modulation to distinguish music from speech.

Chang, Andrew; Teng, Xiangbin; Assaneo, M Florencia; Poeppel, David.

PLoS Biol ; 22(5): e3002631, 2024 May.

Article En | MEDLINE | ID: mdl-38805517

Music and speech are complex and distinct auditory signals that are both foundational to the human experience. The mechanisms underpinning each domain are widely investigated. However, what perceptual mechanism transforms a sound into music or speech and how basic acoustic information is required to distinguish between them remain open questions. Here, we hypothesized that a sound's amplitude modulation (AM), an essential temporal acoustic feature driving the auditory system across processing levels, is critical for distinguishing music and speech. Specifically, in contrast to paradigms using naturalistic acoustic signals (that can be challenging to interpret), we used a noise-probing approach to untangle the auditory mechanism: If AM rate and regularity are critical for perceptually distinguishing music and speech, judging artificially noise-synthesized ambiguous audio signals should align with their AM parameters. Across 4 experiments (N = 335), signals with a higher peak AM frequency tend to be judged as speech, lower as music. Interestingly, this principle is consistently used by all listeners for speech judgments, but only by musically sophisticated listeners for music. In addition, signals with more regular AM are judged as music over speech, and this feature is more critical for music judgment, regardless of musical sophistication. The data suggest that the auditory system can rely on a low-level acoustic property as basic as AM to distinguish music from speech, a simple principle that provokes both neurophysiological and evolutionary experiments and speculations.

Acoustic Stimulation , Auditory Perception , Music , Speech Perception , Humans , Male , Female , Adult , Auditory Perception/physiology , Acoustic Stimulation/methods , Speech Perception/physiology , Young Adult , Speech/physiology , Adolescent

20.

The Impact of Spectral and Temporal Degradation on Vocoded Speech Recognition in Early-Blind Individuals.

Choi, Hyo Jung; Kyong, Jeong-Sug; Lee, Jae Hee; Han, Seung Ho; Shim, Hyun Joon.

eNeuro ; 11(5)2024 May.

Article En | MEDLINE | ID: mdl-38811162

This study compared the impact of spectral and temporal degradation on vocoded speech recognition between early-blind and sighted subjects. The participants included 25 early-blind subjects (30.32 ± 4.88âyears; male:female, 14:11) and 25 age- and sex-matched sighted subjects. Tests included monosyllable recognition in noise at various signal-to-noise ratios (-18 to -4âdB), matrix sentence-in-noise recognition, and vocoded speech recognition with different numbers of channels (4, 8, 16, and 32) and temporal envelope cutoff frequencies (50 vs 500âHz). Cortical-evoked potentials (N2 and P3b) were measured in response to spectrally and temporally degraded stimuli. The early-blind subjects displayed superior monosyllable and sentence recognition than sighted subjects (all p < 0.01). In the vocoded speech recognition test, a three-way repeated-measure analysis of variance (two groups × four channels × two cutoff frequencies) revealed significant main effects of group, channel, and cutoff frequency (all p < 0.001). Early-blind subjects showed increased sensitivity to spectral degradation for speech recognition, evident in the significant interaction between group and channel (p = 0.007). N2 responses in early-blind subjects exhibited shorter latency and greater amplitude in the 8-channel (p = 0.022 and 0.034, respectively) and shorter latency in the 16-channel (p = 0.049) compared with sighted subjects. In conclusion, early-blind subjects demonstrated speech recognition advantages over sighted subjects, even in the presence of spectral and temporal degradation. Spectral degradation had a greater impact on speech recognition in early-blind subjects, while the effect of temporal degradation was similar in both groups.

Blindness , Speech Perception , Humans , Male , Female , Speech Perception/physiology , Adult , Blindness/physiopathology , Young Adult , Electroencephalography/methods , Acoustic Stimulation , Recognition, Psychology/physiology , Evoked Potentials, Auditory/physiology