Pesquisa | Biblioteca Virtual em Saúde

1.

Joint population coding and temporal coherence link an attended talker's voice and location features in naturalistic multi-talker scenes.

van der Heijden, Kiki; Patel, Prachi; Bickel, Stephan; Herrero, Jose L; Mehta, Ashesh D; Mesgarani, Nima.

bioRxiv ; 2024 May 14.

Artigo em Inglês | MEDLINE | ID: mdl-38798551

RESUMO

Listeners readily extract multi-dimensional auditory objects such as a 'localized talker' from complex acoustic scenes with multiple talkers. Yet, the neural mechanisms underlying simultaneous encoding and linking of different sound features - for example, a talker's voice and location - are poorly understood. We analyzed invasive intracranial recordings in neurosurgical patients attending to a localized talker in real-life cocktail party scenarios. We found that sensitivity to an individual talker's voice and location features was distributed throughout auditory cortex and that neural sites exhibited a gradient from sensitivity to a single feature to joint sensitivity to both features. On a population level, cortical response patterns of both dual-feature sensitive sites but also single-feature sensitive sites revealed simultaneous encoding of an attended talker's voice and location features. However, for single-feature sensitive sites, the representation of the primary feature was more precise. Further, sites which selective tracked an attended speech stream concurrently encoded an attended talker's voice and location features, indicating that such sites combine selective tracking of an attended auditory object with encoding of the object's features. Finally, we found that attending a localized talker selectively enhanced temporal coherence between single-feature voice sensitive sites and single-feature location sensitive sites, providing an additional mechanism for linking voice and location in multi-talker scenes. These results demonstrate that a talker's voice and location features are linked during multi-dimensional object formation in naturalistic multi-talker scenes by joint population coding as well as by temporal coherence between neural sites. SIGNIFICANCE STATEMENT: Listeners effortlessly extract auditory objects from complex acoustic scenes consisting of multiple sound sources in naturalistic, spatial sound scenes. Yet, how the brain links different sound features to form a multi-dimensional auditory object is poorly understood. We investigated how neural responses encode and integrate an attended talker's voice and location features in spatial multi-talker sound scenes to elucidate which neural mechanisms underlie simultaneous encoding and linking of different auditory features. Our results show that joint population coding as well as temporal coherence mechanisms contribute to distributed multi-dimensional auditory object encoding. These findings shed new light on cortical functional specialization and multidimensional auditory object formation in complex, naturalistic listening scenes. HIGHLIGHTS: Cortical responses to an single talker exhibit a distributed gradient, ranging from sites that are sensitive to both a talker's voice and location (dual-feature sensitive sites) to sites that are sensitive to either voice or location (single-feature sensitive sites).Population response patterns of dual-feature sensitive sites encode voice and location features of the attended talker in multi-talker scenes jointly and with equal precision.Despite their sensitivity to a single feature at the level of individual cortical sites, population response patterns of single-feature sensitive sites also encode location and voice features of a talker jointly, but with higher precision for the feature they are primarily sensitive to.Neural sites which selectively track an attended speech stream concurrently encode the attended talker's voice and location features.Attention selectively enhances temporal coherence between voice and location selective sites over time.Joint population coding as well as temporal coherence mechanisms underlie distributed multi-dimensional auditory object encoding in auditory cortex.

2.

Large-scale single-neuron speech sound encoding across the depth of human cortex.

Leonard, Matthew K; Gwilliams, Laura; Sellers, Kristin K; Chung, Jason E; Xu, Duo; Mischler, Gavin; Mesgarani, Nima; Welkenhuysen, Marleen; Dutta, Barundeb; Chang, Edward F.

Nature ; 626(7999): 593-602, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-38093008

RESUMO

Understanding the neural basis of speech perception requires that we study the human brain both at the scale of the fundamental computational unit of neurons and in their organization across the depth of cortex. Here we used high-density Neuropixels arrays1-3 to record from 685 neurons across cortical layers at nine sites in a high-level auditory region that is critical for speech, the superior temporal gyrus4,5, while participants listened to spoken sentences. Single neurons encoded a wide range of speech sound cues, including features of consonants and vowels, relative vocal pitch, onsets, amplitude envelope and sequence statistics. Neurons at each cross-laminar recording exhibited dominant tuning to a primary speech feature while also containing a substantial proportion of neurons that encoded other features contributing to heterogeneous selectivity. Spatially, neurons at similar cortical depths tended to encode similar speech features. Activity across all cortical layers was predictive of high-frequency field potentials (electrocorticography), providing a neuronal origin for macroelectrode recordings from the cortical surface. Together, these results establish single-neuron tuning across the cortical laminae as an important dimension of speech encoding in human superior temporal gyrus.

Assuntos

Córtex Auditivo , Neurônios , Percepção da Fala , Lobo Temporal , Humanos , Estimulação Acústica , Córtex Auditivo/citologia , Córtex Auditivo/fisiologia , Neurônios/fisiologia , Fonética , Fala , Percepção da Fala/fisiologia , Lobo Temporal/citologia , Lobo Temporal/fisiologia , Sinais (Psicologia) , Eletrodos

3.

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation.

Han, Cong; Choudhari, Vishal; Li, Yinghao Aaron; Mesgarani, Nima.

Annu Int Conf IEEE Eng Med Biol Soc ; 2023: 1-5, 2023 07.

Artigo em Inglês | MEDLINE | ID: mdl-38083559

RESUMO

Auditory attention decoding (AAD) is a technique used to identify and amplify the talker that a listener is focused on in a noisy environment. This is done by comparing the listener's brainwaves to a representation of all the sound sources to find the closest match. The representation is typically the waveform or spectrogram of the sounds. The effectiveness of these representations for AAD is uncertain. In this study, we examined the use of self-supervised learned speech representation in improving the accuracy and speed of AAD. We recorded the brain activity of three subjects using invasive electrocorticography (ECoG) as they listened to two conversations and focused on one. We used WavLM to extract a latent representation of each talker and trained a spatiotemporal filter to map brain activity to intermediate representations of speech. During the evaluation, the reconstructed representation is compared to each speaker's representation to determine the target speaker. Our results indicate that speech representation from WavLM provides better decoding accuracy and speed than the speech envelope and spectrogram. Our findings demonstrate the advantages of self-supervised learned speech representation for auditory attention decoding and pave the way for developing brain-controlled hearable technologies.

Assuntos

Córtex Auditivo , Percepção da Fala , Humanos , Fala , Estimulação Acústica/métodos , Atenção

4.

naplib-python: Neural acoustic data processing and analysis tools in python.

Mischler, Gavin; Raghavan, Vinay; Keshishian, Menoua; Mesgarani, Nima.

Softw Impacts ; 172023 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-37771949

RESUMO

Recently, the computational neuroscience community has pushed for more transparent and reproducible methods across the field. In the interest of unifying the domain of auditory neuroscience, naplib-python provides an intuitive and general data structure for handling all neural recordings and stimuli, as well as extensive preprocessing, feature extraction, and analysis tools which operate on that data structure. The package removes many of the complications associated with this domain, such as varying trial durations and multi-modal stimuli, and provides a general-purpose analysis framework that interfaces easily with existing toolboxes used in the field.

5.

STYLETTS-VC: ONE-SHOT VOICE CONVERSION BY KNOWLEDGE TRANSFER FROM STYLE-BASED TTS MODELS.

Li, Yinghao Aaron; Han, Cong; Mesgarani, Nima.

SLT Workshop Spok Lang Technol ; 2022: 920-927, 2023 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-37577031

RESUMO

One-shot voice conversion (VC) aims to convert speech from any source speaker to an arbitrary target speaker with only a few seconds of reference speech from the target speaker. This relies heavily on disentangling the speaker's identity and speech content, a task that still remains challenging. Here, we propose a novel approach to learning disentangled speech representation by transfer learning from style-based text-to-speech (TTS) models. With cycle consistent and adversarial training, the style-based TTS models can perform transcription-guided one-shot VC with high fidelity and similarity. By learning an additional mel-spectrogram encoder through a teacher-student knowledge transfer and novel data augmentation scheme, our approach results in disentangled speech representation without needing the input text. The subjective evaluation shows that our approach can significantly outperform the previous state-of-the-art one-shot voice conversion models in both naturalness and similarity.

6.

PHONEME-LEVEL BERT FOR ENHANCED PROSODY OF TEXT-TO-SPEECH WITH GRAPHEME PREDICTIONS.

Aaron Li, Yinghao; Han, Cong; Jiang, Xilin; Mesgarani, Nima.

Proc IEEE Int Conf Acoust Speech Signal Process ; 20232023 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-37577179

RESUMO

Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns. However, these models are usually word-level or sup-phoneme-level and jointly trained with phonemes, making them inefficient for the downstream TTS task where only phonemes are needed. In this work, we propose a phoneme-level BERT (PL-BERT) with a pretext task of predicting the corresponding graphemes along with the regular masked phoneme predictions. Subjective evaluations show that our phoneme-level BERT encoder has significantly improved the mean opinion scores (MOS) of rated naturalness of synthesized speech compared with the state-of-the-art (SOTA) StyleTTS baseline on out-of-distribution (OOD) texts.

7.

ONLINE BINAURAL SPEECH SEPARATION OF MOVING SPEAKERS WITH A WAVESPLIT NETWORK.

Han, Cong; Mesgarani, Nima.

Proc IEEE Int Conf Acoust Speech Signal Process ; 20232023 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-37577180

RESUMO

Binaural speech separation in real-world scenarios often involves moving speakers. Most current speech separation methods use utterance-level permutation invariant training (u-PIT) for training. In inference time, however, the order of outputs can be inconsistent over time particularly in long-form speech separation. This situation which is referred to as the speaker swap problem is even more problematic when speakers constantly move in space and therefore poses a challenge for consistent placement of speakers in output channels. Here, we describe a real-time binaural speech separation model based on a Wavesplit network to mitigate the speaker swap problem for moving speaker separation. Our model computes a speaker embedding for each speaker at each time frame from the mixed audio, aggregates embeddings using online clustering, and uses cluster centroids as speaker profiles to track each speaker throughout the long duration. Experimental results on reverberant, long-form moving multitalker speech separation show that the proposed method is less prone to speaker swap and achieves comparable performance with u-PIT based models with ground truth tracking in both separation accuracy and preserving the interaural cues.

8.

Distinct neural encoding of glimpsed and masked speech in multitalker situations.

Raghavan, Vinay S; O'Sullivan, James; Bickel, Stephan; Mehta, Ashesh D; Mesgarani, Nima.

PLoS Biol ; 21(6): e3002128, 2023 06.

Artigo em Inglês | MEDLINE | ID: mdl-37279203

RESUMO

Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception.

Assuntos

Percepção da Fala , Fala , Humanos , Fala/fisiologia , Estimulação Acústica , Fonética , Percepção da Fala/fisiologia , Tempo de Reação

9.

Automatic speaker diarization for natural conversation analysis in autism clinical trials.

O'Sullivan, James; Bogaarts, Guy; Schoenenberger, Philipp; Tillmann, Julian; Slater, David; Mesgarani, Nima; Eule, Eckhart; Kilchenmann, Timothy; Murtagh, Lorraine; Hipp, Joerg; Lindemann, Michael; Lipsmeier, Florian; Cheng, Wei-Yi; Nobbs, David; Chatham, Christopher.

Sci Rep ; 13(1): 10270, 2023 06 24.

Artigo em Inglês | MEDLINE | ID: mdl-37355730

RESUMO

Challenges in social communication is one of the core symptom domains in autism spectrum disorder (ASD). Novel therapies are under development to help individuals with these challenges, however the ability to show a benefit is dependent on a sensitive and reliable measure of treatment effect. Currently, measuring these deficits requires the use of time-consuming and subjective techniques. Objective measures extracted from natural conversations could be more ecologically relevant, and administered more frequently-perhaps giving them added sensitivity to change. While several studies have used automated analysis methods to study autistic speech, they require manual transcriptions. In order to bypass this time-consuming process, an automated speaker diarization algorithm must first be applied. In this paper, we are testing whether a speaker diarization algorithm can be applied to natural conversations between autistic individuals and their conversational partner in a natural setting at home over the course of a clinical trial. We calculated the average duration that a participant would speak for within their turn. We found a significant correlation between this feature and the Vineland Adaptive Behaviour Scales (VABS) expressive communication score (r = 0.51, p = 7 × 10-5). Our results show that natural conversations can be used to obtain measures of talkativeness, and that this measure can be derived automatically, thus showing the promise of objectively evaluating communication challenges in ASD.

Assuntos

Transtorno do Espectro Autista , Transtorno Autístico , Humanos , Transtorno Autístico/terapia , Transtorno do Espectro Autista/terapia , Transtorno do Espectro Autista/diagnóstico , Comunicação , Fala

10.

naplib-python: Neural Acoustic Data Processing and Analysis Tools in Python.

Mischler, Gavin; Raghavan, Vinay; Keshishian, Menoua; Mesgarani, Nima.

ArXiv ; 2023 Apr 04.

Artigo em Inglês | MEDLINE | ID: mdl-37064534

RESUMO

Recently, the computational neuroscience community has pushed for more transparent and reproducible methods across the field. In the interest of unifying the domain of auditory neuroscience, naplib-python provides an intuitive and general data structure for handling all neural recordings and stimuli, as well as extensive preprocessing, feature extraction, and analysis tools which operate on that data structure. The package removes many of the complications associated with this domain, such as varying trial durations and multi-modal stimuli, and provides a general-purpose analysis framework that interfaces easily with existing toolboxes used in the field.

11.

Joint, distributed and hierarchically organized encoding of linguistic features in the human auditory cortex.

Keshishian, Menoua; Akkol, Serdar; Herrero, Jose; Bickel, Stephan; Mehta, Ashesh D; Mesgarani, Nima.

Nat Hum Behav ; 7(5): 740-753, 2023 05.

Artigo em Inglês | MEDLINE | ID: mdl-36864134

RESUMO

The precise role of the human auditory cortex in representing speech sounds and transforming them to meaning is not yet fully understood. Here we used intracranial recordings from the auditory cortex of neurosurgical patients as they listened to natural speech. We found an explicit, temporally ordered and anatomically distributed neural encoding of multiple linguistic features, including phonetic, prelexical phonotactics, word frequency, and lexical-phonological and lexical-semantic information. Grouping neural sites on the basis of their encoded linguistic features revealed a hierarchical pattern, with distinct representations of prelexical and postlexical features distributed across various auditory areas. While sites with longer response latencies and greater distance from the primary auditory cortex encoded higher-level linguistic features, the encoding of lower-level features was preserved and not discarded. Our study reveals a cumulative mapping of sound to meaning and provides empirical evidence for validating neurolinguistic and psycholinguistic models of spoken word recognition that preserve the acoustic variations in speech.

Assuntos

Córtex Auditivo , Percepção da Fala , Humanos , Córtex Auditivo/fisiologia , Percepção da Fala/fisiologia , Percepção Auditiva/fisiologia , Fala/fisiologia , Fonética

12.

Deep neural networks effectively model neural adaptation to changing background noise and suggest nonlinear noise filtering methods in auditory cortex.

Mischler, Gavin; Keshishian, Menoua; Bickel, Stephan; Mehta, Ashesh D; Mesgarani, Nima.

Neuroimage ; 266: 119819, 2023 02 01.

Artigo em Inglês | MEDLINE | ID: mdl-36529203

RESUMO

The human auditory system displays a robust capacity to adapt to sudden changes in background noise, allowing for continuous speech comprehension despite changes in background environments. However, despite comprehensive studies characterizing this ability, the computations that underly this process are not well understood. The first step towards understanding a complex system is to propose a suitable model, but the classical and easily interpreted model for the auditory system, the spectro-temporal receptive field (STRF), cannot match the nonlinear neural dynamics involved in noise adaptation. Here, we utilize a deep neural network (DNN) to model neural adaptation to noise, illustrating its effectiveness at reproducing the complex dynamics at the levels of both individual electrodes and the cortical population. By closely inspecting the model's STRF-like computations over time, we find that the model alters both the gain and shape of its receptive field when adapting to a sudden noise change. We show that the DNN model's gain changes allow it to perform adaptive gain control, while the spectro-temporal change creates noise filtering by altering the inhibitory region of the model's receptive field. Further, we find that models of electrodes in nonprimary auditory cortex also exhibit noise filtering changes in their excitatory regions, suggesting differences in noise filtering mechanisms along the cortical hierarchy. These findings demonstrate the capability of deep neural networks to model complex neural adaptation and offer new hypotheses about the computations the auditory cortex performs to enable noise-robust speech perception in real-world, dynamic environments.

Assuntos

Córtex Auditivo , Humanos , Estimulação Acústica/métodos , Percepção Auditiva , Neurônios , Redes Neurais de Computação

13.

Interaction of bottom-up and top-down neural mechanisms in spatial multi-talker speech perception.

Patel, Prachi; van der Heijden, Kiki; Bickel, Stephan; Herrero, Jose L; Mehta, Ashesh D; Mesgarani, Nima.

Curr Biol ; 32(18): 3971-3986.e4, 2022 09 26.

Artigo em Inglês | MEDLINE | ID: mdl-35973430

RESUMO

How the human auditory cortex represents spatially separated simultaneous talkers and how talkers' locations and voices modulate the neural representations of attended and unattended speech are unclear. Here, we measured the neural responses from electrodes implanted in neurosurgical patients as they performed single-talker and multi-talker speech perception tasks. We found that spatial separation between talkers caused a preferential encoding of the contralateral speech in Heschl's gyrus (HG), planum temporale (PT), and superior temporal gyrus (STG). Location and spectrotemporal features were encoded in different aspects of the neural response. Specifically, the talker's location changed the mean response level, whereas the talker's spectrotemporal features altered the variation of response around response's baseline. These components were differentially modulated by the attended talker's voice or location, which improved the population decoding of attended speech features. Attentional modulation due to the talker's voice only appeared in the auditory areas with longer latencies, but attentional modulation due to location was present throughout. Our results show that spatial multi-talker speech perception relies upon a separable pre-attentive neural representation, which could be further tuned by top-down attention to the location and voice of the talker.

Assuntos

Córtex Auditivo , Percepção da Fala , Voz , Córtex Auditivo/fisiologia , Humanos , Fala , Percepção da Fala/fisiologia , Lobo Temporal

14.

The Spatial Reach of Neuronal Coherence and Spike-Field Coupling across the Human Neocortex.

Myers, John C; Smith, Elliot H; Leszczynski, Marcin; O'Sullivan, James; Yates, Mark J; McKhann, Guy; Mesgarani, Nima; Schroeder, Charles; Schevon, Catherine; Sheth, Sameer A.

J Neurosci ; 42(32): 6285-6294, 2022 08 10.

Artigo em Inglês | MEDLINE | ID: mdl-35790403

RESUMO

Neuronal coherence is thought to be a fundamental mechanism of communication in the brain, where synchronized field potentials coordinate synaptic and spiking events to support plasticity and learning. Although the spread of field potentials has garnered great interest, little is known about the spatial reach of phase synchronization, or neuronal coherence. Functional connectivity between different brain regions is known to occur across long distances, but the locality of synchronization across the neocortex is understudied. Here we used simultaneous recordings from electrocorticography (ECoG) grids and high-density microelectrode arrays to estimate the spatial reach of neuronal coherence and spike-field coherence (SFC) across frontal, temporal, and occipital cortices during cognitive tasks in humans. We observed the strongest coherence within a 2-3 cm distance from the microelectrode arrays, potentially defining an effective range for local communication. This range was relatively consistent across brain regions, spectral frequencies, and cognitive tasks. The magnitude of coherence showed power law decay with increasing distance from the microelectrode arrays, where the highest coherence occurred between ECoG contacts, followed by coherence between ECoG and deep cortical local field potential (LFP), and then SFC (i.e., ECoG > LFP > SFC). The spectral frequency of coherence also affected its magnitude. Alpha coherence (8-14 Hz) was generally higher than other frequencies for signals nearest the microelectrode arrays, whereas delta coherence (1-3 Hz) was higher for signals that were farther away. Action potentials in all brain regions were most coherent with the phase of alpha oscillations, which suggests that alpha waves could play a larger, more spatially local role in spike timing than other frequencies. These findings provide a deeper understanding of the spatial and spectral dynamics of neuronal synchronization, further advancing knowledge about how activity propagates across the human brain.SIGNIFICANCE STATEMENT Coherence is theorized to facilitate information transfer across cerebral space by providing a convenient electrophysiological mechanism to modulate membrane potentials in spatiotemporally complex patterns. Our work uses a multiscale approach to evaluate the spatial reach of phase coherence and spike-field coherence during cognitive tasks in humans. Locally, coherence can reach up to 3 cm around a given area of neocortex. The spectral properties of coherence revealed that alpha phase-field and spike-field coherence were higher within ranges <2 cm, whereas lower-frequency delta coherence was higher for contacts farther away. Spatiotemporally shared information (i.e., coherence) across neocortex seems to reach farther than field potentials alone.

Assuntos

Neocórtex , Potenciais de Ação/fisiologia , Eletrocorticografia , Humanos , Microeletrodos , Neurônios/fisiologia

15.

Editorial: Neural Tracking: Closing the Gap Between Neurophysiology and Translational Medicine.

Di Liberto, Giovanni M; Hjortkjær, Jens; Mesgarani, Nima.

Front Neurosci ; 16: 872600, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35368278

16.

Improved Speech Hearing in Noise with Invasive Electrical Brain Stimulation.

Patel, Prachi; Khalijhinejad, Bahar; Herrero, Jose L; Bickel, Stephan; Mehta, Ashesh D; Mesgarani, Nima.

J Neurosci ; 42(17): 3648-3658, 2022 04 27.

Artigo em Inglês | MEDLINE | ID: mdl-35347046

RESUMO

Speech perception in noise is a challenging everyday task with which many listeners have difficulty. Here, we report a case in which electrical brain stimulation of implanted intracranial electrodes in the left planum temporale (PT) of a neurosurgical patient significantly and reliably improved subjective quality (up to 50%) and objective intelligibility (up to 97%) of speech in noise perception. Stimulation resulted in a selective enhancement of speech sounds compared with the background noises. The receptive fields of the PT sites whose stimulation improved speech perception were tuned to spectrally broad and rapidly changing sounds. Corticocortical evoked potential analysis revealed that the PT sites were located between the sites in Heschl's gyrus and the superior temporal gyrus. Moreover, the discriminability of speech from nonspeech sounds increased in population neural responses from Heschl's gyrus to the PT to the superior temporal gyrus sites. These findings causally implicate the PT in background noise suppression and may point to a novel potential neuroprosthetic solution to assist in the challenging task of speech perception in noise.SIGNIFICANCE STATEMENT Speech perception in noise remains a challenging task for many individuals. Here, we present a case in which the electrical brain stimulation of intracranially implanted electrodes in the planum temporale of a neurosurgical patient significantly improved both the subjective quality (up to 50%) and objective intelligibility (up to 97%) of speech perception in noise. Stimulation resulted in a selective enhancement of speech sounds compared with the background noises. Our local and network-level functional analyses placed the planum temporale sites in between the sites in the primary auditory areas in Heschl's gyrus and nonprimary auditory areas in the superior temporal gyrus. These findings causally implicate planum temporale in acoustic scene analysis and suggest potential neuroprosthetic applications to assist hearing in noise.

Assuntos

Córtex Auditivo , Percepção da Fala , Estimulação Acústica , Córtex Auditivo/fisiologia , Encéfalo , Mapeamento Encefálico/métodos , Audição , Humanos , Imageamento por Ressonância Magnética/métodos , Fala/fisiologia , Percepção da Fala/fisiologia

17.

Multiscale temporal integration organizes hierarchical computation in human auditory cortex.

Norman-Haignere, Sam V; Long, Laura K; Devinsky, Orrin; Doyle, Werner; Irobunda, Ifeoma; Merricks, Edward M; Feldstein, Neil A; McKhann, Guy M; Schevon, Catherine A; Flinker, Adeen; Mesgarani, Nima.

Nat Hum Behav ; 6(3): 455-469, 2022 03.

Artigo em Inglês | MEDLINE | ID: mdl-35145280

RESUMO

To derive meaning from sound, the brain must integrate information across many timescales. What computations underlie multiscale integration in human auditory cortex? Evidence suggests that auditory cortex analyses sound using both generic acoustic representations (for example, spectrotemporal modulation tuning) and category-specific computations, but the timescales over which these putatively distinct computations integrate remain unclear. To answer this question, we developed a general method to estimate sensory integration windows-the time window when stimuli alter the neural response-and applied our method to intracranial recordings from neurosurgical patients. We show that human auditory cortex integrates hierarchically across diverse timescales spanning from ~50 to 400 ms. Moreover, we find that neural populations with short and long integration windows exhibit distinct functional properties: short-integration electrodes (less than ~200 ms) show prominent spectrotemporal modulation selectivity, while long-integration electrodes (greater than ~200 ms) show prominent category selectivity. These findings reveal how multiscale integration organizes auditory computation in the human brain.

Assuntos

Córtex Auditivo , Estimulação Acústica/métodos , Percepção Auditiva , Encéfalo , Mapeamento Encefálico/métodos , Humanos

18.

Functional characterization of human Heschl's gyrus in response to natural speech.

Khalighinejad, Bahar; Patel, Prachi; Herrero, Jose L; Bickel, Stephan; Mehta, Ashesh D; Mesgarani, Nima.

Neuroimage ; 235: 118003, 2021 07 15.

Artigo em Inglês | MEDLINE | ID: mdl-33789135

RESUMO

Heschl's gyrus (HG) is a brain area that includes the primary auditory cortex in humans. Due to the limitations in obtaining direct neural measurements from this region during naturalistic speech listening, the functional organization and the role of HG in speech perception remain uncertain. Here, we used intracranial EEG to directly record neural activity in HG in eight neurosurgical patients as they listened to continuous speech stories. We studied the spatial distribution of acoustic tuning and the organization of linguistic feature encoding. We found a main gradient of change from posteromedial to anterolateral parts of HG. We also observed a decrease in frequency and temporal modulation tuning and an increase in phonemic representation, speaker normalization, speech sensitivity, and response latency. We did not observe a difference between the two brain hemispheres. These findings reveal a functional role for HG in processing and transforming simple to complex acoustic features and inform neurophysiological models of speech processing in the human auditory cortex.

Assuntos

Córtex Auditivo/fisiologia , Mapeamento Encefálico , Percepção da Fala/fisiologia , Adulto , Eletrocorticografia , Epilepsia/diagnóstico , Epilepsia/cirurgia , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Procedimentos Neurocirúrgicos

19.

Learning Speech Production and Perception through Sensorimotor Interactions.

Shamma, Shihab; Patel, Prachi; Mukherjee, Shoutik; Marion, Guilhem; Khalighinejad, Bahar; Han, Cong; Herrero, Jose; Bickel, Stephan; Mehta, Ashesh; Mesgarani, Nima.

Cereb Cortex Commun ; 2(1): tgaa091, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-33506209

RESUMO

Action and perception are closely linked in many behaviors necessitating a close coordination between sensory and motor neural processes so as to achieve a well-integrated smoothly evolving task performance. To investigate the detailed nature of these sensorimotor interactions, and their role in learning and executing the skilled motor task of speaking, we analyzed ECoG recordings of responses in the high-Î³ band (70-150 Hz) in human subjects while they listened to, spoke, or silently articulated speech. We found elaborate spectrotemporally modulated neural activity projecting in both "forward" (motor-to-sensory) and "inverse" directions between the higher-auditory and motor cortical regions engaged during speaking. Furthermore, mathematical simulations demonstrate a key role for the forward projection in "learning" to control the vocal tract, beyond its commonly postulated predictive role during execution. These results therefore offer a broader view of the functional role of the ubiquitous forward projection as an important ingredient in learning, rather than just control, of skilled sensorimotor tasks.

20.

Neural representation of linguistic feature hierarchy reflects second-language proficiency.

Liberto, Giovanni M Di; Nie, Jingping; Yeaton, Jeremy; Khalighinejad, Bahar; Shamma, Shihab A; Mesgarani, Nima.

Neuroimage ; 227: 117586, 2021 02 15.

Artigo em Inglês | MEDLINE | ID: mdl-33346131

RESUMO

Acquiring a new language requires individuals to simultaneously and gradually learn linguistic attributes on multiple levels. Here, we investigated how this learning process changes the neural encoding of natural speech by assessing the encoding of the linguistic feature hierarchy in second-language listeners. Electroencephalography (EEG) signals were recorded from native Mandarin speakers with varied English proficiency and from native English speakers while they listened to audio-stories in English. We measured the temporal response functions (TRFs) for acoustic, phonemic, phonotactic, and semantic features in individual participants and found a main effect of proficiency on linguistic encoding. This effect of second-language proficiency was particularly prominent on the neural encoding of phonemes, showing stronger encoding of "new" phonemic contrasts (i.e., English contrasts that do not exist in Mandarin) with increasing proficiency. Overall, we found that the nonnative listeners with higher proficiency levels had a linguistic feature representation more similar to that of native listeners, which enabled the accurate decoding of language proficiency. This result advances our understanding of the cortical processing of linguistic information in second-language learners and provides an objective measure of language proficiency.

Assuntos

Encéfalo/fisiologia , Compreensão/fisiologia , Multilinguismo , Percepção da Fala/fisiologia , Adolescente , Adulto , Eletroencefalografia , Feminino , Humanos , Idioma , Masculino , Pessoa de Meia-Idade , Fonética , Adulto Jovem

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA