Results 1 - 20 of 63
1.
Nature; 626(7999): 593-602, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38093008

ABSTRACT

Understanding the neural basis of speech perception requires that we study the human brain both at the scale of the fundamental computational unit of neurons and in their organization across the depth of cortex. Here we used high-density Neuropixels arrays (refs. 1-3) to record from 685 neurons across cortical layers at nine sites in a high-level auditory region that is critical for speech, the superior temporal gyrus (refs. 4,5), while participants listened to spoken sentences. Single neurons encoded a wide range of speech sound cues, including features of consonants and vowels, relative vocal pitch, onsets, amplitude envelope and sequence statistics. Each cross-laminar recording exhibited dominant tuning to a primary speech feature while also containing a substantial proportion of neurons that encoded other features, contributing to heterogeneous selectivity. Spatially, neurons at similar cortical depths tended to encode similar speech features. Activity across all cortical layers was predictive of high-frequency field potentials (electrocorticography), providing a neuronal origin for macroelectrode recordings from the cortical surface. Together, these results establish single-neuron tuning across the cortical laminae as an important dimension of speech encoding in human superior temporal gyrus.


Subjects
Auditory Cortex, Neurons, Speech Perception, Temporal Lobe, Humans, Acoustic Stimulation, Auditory Cortex/cytology, Auditory Cortex/physiology, Neurons/physiology, Phonetics, Speech, Speech Perception/physiology, Temporal Lobe/cytology, Temporal Lobe/physiology, Cues, Electrodes
2.
PLoS Biol; 21(6): e3002128, 2023 06.
Article in English | MEDLINE | ID: mdl-37279203

ABSTRACT

Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception.
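For readers unfamiliar with temporal response function (TRF) modeling, the sketch below shows one common way such encoding models are fit: ridge regression on a time-lagged stimulus feature matrix. The feature count, lag range, and regularization value are illustrative assumptions, not the parameters used in this study.

```python
import numpy as np

def lagged_design(stim, lags):
    """Stack time-lagged copies of the stimulus features (time x features)."""
    T, F = stim.shape
    X = np.zeros((T, F * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(stim, lag, axis=0)
        if lag > 0:
            shifted[:lag] = 0        # zero out samples that wrapped around
        elif lag < 0:
            shifted[lag:] = 0
        X[:, i * F:(i + 1) * F] = shifted
    return X

def fit_trf(stim, response, lags, alpha=1.0):
    """Ridge-regression TRF mapping lagged stimulus features to neural activity (time x electrodes)."""
    X = lagged_design(stim, lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ response)   # (lags * features) x electrodes

# Illustrative use with random stand-in data (100 Hz frame rate, lags 0-400 ms)
rng = np.random.default_rng(0)
stim = rng.standard_normal((5000, 8))     # e.g. 8 acoustic/phonetic features over time
resp = rng.standard_normal((5000, 4))     # e.g. 4 high-gamma electrode channels
weights = fit_trf(stim, resp, lags=range(0, 41), alpha=10.0)
```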


Subjects
Speech Perception, Speech, Humans, Speech/physiology, Acoustic Stimulation, Phonetics, Speech Perception/physiology, Reaction Time
3.
J Neurosci; 42(17): 3648-3658, 2022 04 27.
Article in English | MEDLINE | ID: mdl-35347046

ABSTRACT

Speech perception in noise is a challenging everyday task with which many listeners have difficulty. Here, we report a case in which electrical brain stimulation of implanted intracranial electrodes in the left planum temporale (PT) of a neurosurgical patient significantly and reliably improved subjective quality (up to 50%) and objective intelligibility (up to 97%) of speech-in-noise perception. Stimulation resulted in a selective enhancement of speech sounds compared with the background noises. The receptive fields of the PT sites whose stimulation improved speech perception were tuned to spectrally broad and rapidly changing sounds. Corticocortical evoked potential analysis revealed that the PT sites were located between the sites in Heschl's gyrus and the superior temporal gyrus. Moreover, the discriminability of speech from nonspeech sounds increased in population neural responses from Heschl's gyrus to the PT to the superior temporal gyrus sites. These findings causally implicate the PT in background noise suppression and may point to a potential neuroprosthetic solution to assist in the challenging task of speech perception in noise. SIGNIFICANCE STATEMENT: Speech perception in noise remains a challenging task for many individuals. Here, we present a case in which electrical brain stimulation of intracranially implanted electrodes in the planum temporale of a neurosurgical patient significantly improved both the subjective quality (up to 50%) and objective intelligibility (up to 97%) of speech perception in noise. Stimulation resulted in a selective enhancement of speech sounds compared with the background noises. Our local and network-level functional analyses placed the planum temporale sites between the sites in the primary auditory areas in Heschl's gyrus and nonprimary auditory areas in the superior temporal gyrus. These findings causally implicate the planum temporale in acoustic scene analysis and suggest potential neuroprosthetic applications to assist hearing in noise.


Subjects
Auditory Cortex, Speech Perception, Acoustic Stimulation, Auditory Cortex/physiology, Brain, Brain Mapping/methods, Hearing, Humans, Magnetic Resonance Imaging/methods, Speech/physiology, Speech Perception/physiology
4.
J Neurosci; 42(32): 6285-6294, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35790403

ABSTRACT

Neuronal coherence is thought to be a fundamental mechanism of communication in the brain, where synchronized field potentials coordinate synaptic and spiking events to support plasticity and learning. Although the spread of field potentials has garnered great interest, little is known about the spatial reach of phase synchronization, or neuronal coherence. Functional connectivity between different brain regions is known to occur across long distances, but the locality of synchronization across the neocortex is understudied. Here we used simultaneous recordings from electrocorticography (ECoG) grids and high-density microelectrode arrays to estimate the spatial reach of neuronal coherence and spike-field coherence (SFC) across frontal, temporal, and occipital cortices during cognitive tasks in humans. We observed the strongest coherence within a 2-3 cm distance from the microelectrode arrays, potentially defining an effective range for local communication. This range was relatively consistent across brain regions, spectral frequencies, and cognitive tasks. The magnitude of coherence showed power-law decay with increasing distance from the microelectrode arrays, where the highest coherence occurred between ECoG contacts, followed by coherence between ECoG and deep cortical local field potential (LFP), and then SFC (i.e., ECoG > LFP > SFC). The spectral frequency of coherence also affected its magnitude. Alpha coherence (8-14 Hz) was generally higher than other frequencies for signals nearest the microelectrode arrays, whereas delta coherence (1-3 Hz) was higher for signals that were farther away. Action potentials in all brain regions were most coherent with the phase of alpha oscillations, which suggests that alpha waves could play a larger, more spatially local role in spike timing than other frequencies. These findings provide a deeper understanding of the spatial and spectral dynamics of neuronal synchronization, further advancing knowledge about how activity propagates across the human brain. SIGNIFICANCE STATEMENT: Coherence is theorized to facilitate information transfer across cerebral space by providing a convenient electrophysiological mechanism to modulate membrane potentials in spatiotemporally complex patterns. Our work uses a multiscale approach to evaluate the spatial reach of phase coherence and spike-field coherence during cognitive tasks in humans. Locally, coherence can reach up to 3 cm around a given area of neocortex. The spectral properties of coherence revealed that alpha phase-field and spike-field coherence were higher within ranges <2 cm, whereas lower-frequency delta coherence was higher for contacts farther away. Spatiotemporally shared information (i.e., coherence) across neocortex seems to reach farther than field potentials alone.
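As a rough illustration of the quantities involved (not the authors' pipeline), the sketch below computes field-field coherence between two simulated channels and a simple spike-field coherence from a binned spike train using scipy; the sampling rate, durations, and band limits are assumed values.

```python
import numpy as np
from scipy.signal import coherence

fs = 1000.0                                  # assumed sampling rate (Hz)
rng = np.random.default_rng(1)
t = np.arange(0, 60, 1 / fs)                 # 60 s of toy data

# Stand-ins for an ECoG surface contact and a deep cortical LFP channel
ecog = np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)
lfp = np.sin(2 * np.pi * 10 * t + 0.3) + rng.standard_normal(t.size)

# Field-field coherence as a function of frequency
f, cxy = coherence(ecog, lfp, fs=fs, nperseg=1024)
alpha = (f >= 8) & (f <= 14)
print("mean alpha-band coherence:", cxy[alpha].mean())

# Spike-field coherence: bin spike times into a count train at the field sampling rate
spike_times = np.sort(rng.uniform(0, 60, size=3000))
spike_train = np.histogram(spike_times, bins=t.size, range=(0, 60))[0].astype(float)
f_sfc, sfc = coherence(spike_train, ecog, fs=fs, nperseg=1024)
```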


Subjects
Neocortex, Action Potentials/physiology, Electrocorticography, Humans, Microelectrodes, Neurons/physiology
5.
Neuroimage; 266: 119819, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36529203

ABSTRACT

The human auditory system displays a robust capacity to adapt to sudden changes in background noise, allowing for continuous speech comprehension despite changes in background environments. However, despite comprehensive studies characterizing this ability, the computations that underlie this process are not well understood. The first step towards understanding a complex system is to propose a suitable model, but the classical and easily interpreted model for the auditory system, the spectro-temporal receptive field (STRF), cannot match the nonlinear neural dynamics involved in noise adaptation. Here, we utilize a deep neural network (DNN) to model neural adaptation to noise, illustrating its effectiveness at reproducing the complex dynamics at the levels of both individual electrodes and the cortical population. By closely inspecting the model's STRF-like computations over time, we find that the model alters both the gain and shape of its receptive field when adapting to a sudden noise change. We show that the DNN model's gain changes allow it to perform adaptive gain control, while the spectro-temporal change creates noise filtering by altering the inhibitory region of the model's receptive field. Further, we find that models of electrodes in nonprimary auditory cortex also exhibit noise filtering changes in their excitatory regions, suggesting differences in noise filtering mechanisms along the cortical hierarchy. These findings demonstrate the capability of deep neural networks to model complex neural adaptation and offer new hypotheses about the computations the auditory cortex performs to enable noise-robust speech perception in real-world, dynamic environments.
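As a sketch of what a DNN encoding model of this general kind looks like (the architecture below is a generic placeholder, not the network used in the study), a small temporal convolutional network can map a mel spectrogram to an electrode's high-gamma envelope:

```python
import torch
import torch.nn as nn

class SpectrogramToResponse(nn.Module):
    """Tiny convolutional model mapping a mel spectrogram to one electrode's activity."""
    def __init__(self, n_mels=64, hidden=32, kernel=9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=kernel, padding=kernel // 2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=kernel, padding=kernel // 2),
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=1),
        )

    def forward(self, spec):                 # spec: (batch, n_mels, time)
        return self.net(spec).squeeze(1)     # predicted activity: (batch, time)

# Illustrative training step on random stand-in data (two 5 s clips at 100 frames/s)
model = SpectrogramToResponse()
spec = torch.randn(2, 64, 500)
target = torch.randn(2, 500)                 # e.g. measured high-gamma envelope
loss = nn.functional.mse_loss(model(spec), target)
loss.backward()
```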


Subjects
Auditory Cortex, Humans, Acoustic Stimulation/methods, Auditory Perception, Neurons, Neural Networks (Computer)
6.
J Neurosci; 40(44): 8530-8542, 2020 10 28.
Article in English | MEDLINE | ID: mdl-33023923

ABSTRACT

Natural conversation is multisensory: when we can see the speaker's face, visual speech cues improve our comprehension. The neuronal mechanisms underlying this phenomenon remain unclear. The two main alternatives are visually mediated phase modulation of neuronal oscillations (excitability fluctuations) in auditory neurons and visual input-evoked responses in auditory neurons. Investigating this question using naturalistic audiovisual speech with intracranial recordings in humans of both sexes, we find evidence for both mechanisms. Remarkably, auditory cortical neurons track the temporal dynamics of purely visual speech using the phase of their slow oscillations and phase-related modulations in broadband high-frequency activity. Consistent with known perceptual enhancement effects, the visual phase reset amplifies the cortical representation of concomitant auditory speech. In contrast to this, and in line with earlier reports, visual input reduces the amplitude of evoked responses to concomitant auditory input. We interpret the combination of improved phase tracking and reduced response amplitude as evidence for more efficient and reliable stimulus processing in the presence of congruent auditory and visual speech inputs. SIGNIFICANCE STATEMENT: Watching the speaker can facilitate our understanding of what is being said. The mechanisms responsible for this influence of visual cues on the processing of speech remain incompletely understood. We studied these mechanisms by recording the electrical activity of the human brain through electrodes implanted surgically inside the brain. We found that visual inputs can operate by directly activating auditory cortical areas, and also indirectly by modulating the strength of cortical responses to auditory input. Our results help to understand the mechanisms by which the brain merges auditory and visual speech into a unitary perception.


Subjects
Auditory Cortex/physiology, Evoked Potentials/physiology, Nonverbal Communication/physiology, Adult, Drug-Resistant Epilepsy/surgery, Electrocorticography, Auditory Evoked Potentials/physiology, Visual Evoked Potentials/physiology, Female, Humans, Middle Aged, Neurons/physiology, Nonverbal Communication/psychology, Photic Stimulation, Young Adult
7.
Neuroimage; 227: 117586, 2021 02 15.
Article in English | MEDLINE | ID: mdl-33346131

ABSTRACT

Acquiring a new language requires individuals to simultaneously and gradually learn linguistic attributes on multiple levels. Here, we investigated how this learning process changes the neural encoding of natural speech by assessing the encoding of the linguistic feature hierarchy in second-language listeners. Electroencephalography (EEG) signals were recorded from native Mandarin speakers with varied English proficiency and from native English speakers while they listened to audio-stories in English. We measured the temporal response functions (TRFs) for acoustic, phonemic, phonotactic, and semantic features in individual participants and found a main effect of proficiency on linguistic encoding. This effect of second-language proficiency was particularly prominent on the neural encoding of phonemes, showing stronger encoding of "new" phonemic contrasts (i.e., English contrasts that do not exist in Mandarin) with increasing proficiency. Overall, we found that the nonnative listeners with higher proficiency levels had a linguistic feature representation more similar to that of native listeners, which enabled the accurate decoding of language proficiency. This result advances our understanding of the cortical processing of linguistic information in second-language learners and provides an objective measure of language proficiency.


Subjects
Brain/physiology, Comprehension/physiology, Multilingualism, Speech Perception/physiology, Adolescent, Adult, Electroencephalography, Female, Humans, Language, Male, Middle Aged, Phonetics, Young Adult
8.
Neuroimage; 235: 118003, 2021 07 15.
Article in English | MEDLINE | ID: mdl-33789135

ABSTRACT

Heschl's gyrus (HG) is a brain area that includes the primary auditory cortex in humans. Due to the limitations in obtaining direct neural measurements from this region during naturalistic speech listening, the functional organization and the role of HG in speech perception remain uncertain. Here, we used intracranial EEG to directly record neural activity in HG in eight neurosurgical patients as they listened to continuous speech stories. We studied the spatial distribution of acoustic tuning and the organization of linguistic feature encoding. We found a main gradient of change from posteromedial to anterolateral parts of HG: along this axis, tuning to frequency and temporal modulation decreased, while phonemic representation, speaker normalization, speech sensitivity, and response latency increased. We did not observe a difference between the two brain hemispheres. These findings reveal a functional role for HG in processing and transforming simple to complex acoustic features and inform neurophysiological models of speech processing in the human auditory cortex.


Subjects
Auditory Cortex/physiology, Brain Mapping, Speech Perception/physiology, Adult, Electrocorticography, Epilepsy/diagnosis, Epilepsy/surgery, Female, Humans, Male, Middle Aged, Neurosurgical Procedures
9.
Neuroimage; 223: 117282, 2020 12.
Article in English | MEDLINE | ID: mdl-32828921

ABSTRACT

Hearing-impaired people often struggle to follow the speech stream of an individual talker in noisy environments. Recent studies show that the brain tracks attended speech and that the attended talker can be decoded from neural data on a single-trial level. This raises the possibility of "neuro-steered" hearing devices in which the brain-decoded intention of a hearing-impaired listener is used to enhance the voice of the attended speaker from a speech separation front-end. So far, methods that use this paradigm have focused on optimizing the brain decoding and the acoustic speech separation independently. In this work, we propose a novel framework called brain-informed speech separation (BISS), in which the information about the attended speech, as decoded from the subject's brain, is directly used to perform speech separation in the front-end. We present a deep learning model that uses neural data to extract the clean audio signal that a listener is attending to from a multi-talker speech mixture. We show that the framework can be applied successfully to the decoded output from either invasive intracranial electroencephalography (iEEG) or non-invasive electroencephalography (EEG) recordings from hearing-impaired subjects. It also results in improved speech separation, even in scenes with background noise. The generalization capability of the system renders it a strong candidate for neuro-steered hearing-assistive devices.
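For context, BISS feeds the brain-decoded signal directly into a deep separation network; the sketch below shows only the simpler envelope-matching step that is often used as a neuro-steered baseline, with assumed variable names, not the BISS model itself.

```python
import numpy as np

def select_attended(decoded_env, source_envs):
    """Pick the separated source whose envelope best matches the brain-decoded envelope.

    decoded_env : (time,) speech envelope reconstructed from EEG/iEEG
    source_envs : list of (time,) candidate envelopes from a separation front-end
    Returns the index of the best-matching source and all correlation scores.
    """
    scores = [np.corrcoef(decoded_env, s)[0, 1] for s in source_envs]
    return int(np.argmax(scores)), scores

# Illustrative use with random stand-in envelopes
rng = np.random.default_rng(2)
decoded = rng.standard_normal(6000)
candidates = [decoded + rng.standard_normal(6000), rng.standard_normal(6000)]
best, corrs = select_attended(decoded, candidates)   # expect best == 0
```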


Subjects
Brain/physiology, Electroencephalography, Computer-Assisted Signal Processing, Speech Acoustics, Speech Perception/physiology, Acoustic Stimulation, Adult, Algorithms, Deep Learning, Hearing Loss/physiopathology, Humans, Middle Aged
10.
Nature; 495(7441): 327-32, 2013 Mar 21.
Article in English | MEDLINE | ID: mdl-23426266

ABSTRACT

Speaking is one of the most complex actions that we perform, but nearly all of us learn to do it effortlessly. Production of fluent speech requires the precise, coordinated movement of multiple articulators (for example, the lips, jaw, tongue and larynx) over rapid time scales. Here we used high-resolution, multi-electrode cortical recordings during the production of consonant-vowel syllables to determine the organization of speech sensorimotor cortex in humans. We found speech-articulator representations that are arranged somatotopically on ventral pre- and post-central gyri, and that partially overlap at individual electrodes. These representations were coordinated temporally as sequences during syllable production. Spatial patterns of cortical activity showed an emergent, population-level representation, which was organized by phonetic features. Over tens of milliseconds, the spatial patterns transitioned between distinct representations for different consonants and vowels. These results reveal the dynamic organization of speech sensorimotor cortex during the generation of multi-articulator movements that underlies our ability to speak.


Subjects
Cerebral Cortex/physiology, Speech/physiology, Electromagnetic Phenomena, Sensory Feedback/physiology, Humans, Phonetics, Principal Component Analysis, Time Factors
11.
J Acoust Soc Am; 146(6): EL459, 2019 12.
Article in English | MEDLINE | ID: mdl-31893764

ABSTRACT

Analysis of more than 2000 spectro-temporal receptive fields (STRFs) obtained from the ferret primary auditory cortex revealed their dominant encoding properties along the time and frequency domains. Results showed that the peak responses of the STRFs were roughly aligned along the time axis and enhanced low-rate modulation components of the signal around 3 Hz. In contrast, the peaks of the STRF population along the frequency axis varied and were widely distributed. Further analyses revealed general properties along the frequency axis around the best frequency of each STRF: the STRFs enhanced spectral modulation frequencies around 0.25 cycles/octave. These findings are consistent with signal-processing techniques reported in the literature to improve speech recognition performance, highlighting the potential of biological insights for the design of better machines.
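A hedged sketch of how the dominant temporal and spectral modulations of an STRF can be read off its 2D Fourier transform (the modulation transfer function); the sampling grid and toy Gabor-shaped STRF below are illustrative assumptions, and peaks are only resolved up to the FFT bin spacing.

```python
import numpy as np

def strf_modulation_peak(strf, dt, doct):
    """Return the peak temporal (Hz) and spectral (cycles/octave) modulation of an STRF.

    strf : (n_freq, n_time) array sampled every `doct` octaves and `dt` seconds.
    """
    mtf = np.abs(np.fft.fft2(strf))                   # modulation transfer function
    temp_mod = np.fft.fftfreq(strf.shape[1], d=dt)    # temporal modulation axis (Hz)
    spec_mod = np.fft.fftfreq(strf.shape[0], d=doct)  # spectral modulation axis (cyc/oct)
    i, j = np.unravel_index(np.argmax(mtf), mtf.shape)
    return abs(temp_mod[j]), abs(spec_mod[i])

# Toy Gabor-like STRF tuned near 3 Hz and 0.25 cycles/octave
dt, doct = 0.005, 0.1
t = np.arange(0, 0.25, dt)
f = np.arange(0, 4, doct)                             # octaves relative to best frequency
T, F = np.meshgrid(t, f)
strf = np.cos(2 * np.pi * (3 * T + 0.25 * F)) * np.exp(-((T - 0.12) ** 2) / 0.005 - (F - 2) ** 2)
print(strf_modulation_peak(strf, dt, doct))           # near (3 Hz, 0.25 cyc/oct), up to bin resolution
```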


Subjects
Action Potentials/physiology, Auditory Cortex/physiology, Auditory Perception/physiology, Auditory Evoked Potentials/physiology, Acoustic Stimulation/methods, Animals, Ferrets, Neurological Models, Neurons/physiology
12.
J Neurosci; 37(8): 2176-2185, 2017 02 22.
Article in English | MEDLINE | ID: mdl-28119400

ABSTRACT

Humans are unique in their ability to communicate using spoken language. However, it remains unclear how the speech signal is transformed and represented in the brain at different stages of the auditory pathway. In this study, we characterized electroencephalography responses to continuous speech by obtaining the time-locked responses to phoneme instances (phoneme-related potential). We showed that responses to different phoneme categories are organized by phonetic features. We found that each instance of a phoneme in continuous speech produces multiple distinguishable neural responses occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Comparing the patterns of phoneme similarity in the neural responses and the acoustic signals confirms a repetitive appearance of acoustic distinctions of phonemes in the neural data. Analysis of the phonetic and speaker information in neural activations revealed that different time intervals jointly encode the acoustic similarity of both phonetic and speaker categories. These findings provide evidence for a dynamic neural transformation of low-level speech features as they propagate along the auditory pathway, and form an empirical framework to study the representational changes in learning, attention, and speech disorders. SIGNIFICANCE STATEMENT: We characterized the properties of evoked neural responses to phoneme instances in continuous speech. We show that each instance of a phoneme in continuous speech produces several observable neural responses at different times occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Each temporal event explicitly encodes the acoustic similarity of phonemes, and linguistic and nonlinguistic information are best represented at different time intervals. Finally, we show a joint encoding of phonetic and speaker information, where the neural representation of speakers is dependent on phoneme category. These findings provide compelling new evidence for dynamic processing of speech sounds in the auditory pathway.
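A minimal sketch of the epoching-and-averaging step behind a phoneme-related potential: EEG segments time-locked to each phoneme onset are extracted and averaged within a phoneme category. The sampling rate, epoch window, and array names are illustrative assumptions.

```python
import numpy as np

def phoneme_related_potential(eeg, onsets_s, fs=128, tmin=-0.1, tmax=0.4):
    """Average EEG epochs time-locked to phoneme onsets.

    eeg      : (n_channels, n_samples) continuous recording
    onsets_s : onset times in seconds for one phoneme category
    Returns an array of shape (n_channels, n_epoch_samples).
    """
    pre, post = int(round(-tmin * fs)), int(round(tmax * fs))
    epochs = []
    for t0 in onsets_s:
        s = int(round(t0 * fs))
        if s - pre >= 0 and s + post <= eeg.shape[1]:
            epochs.append(eeg[:, s - pre:s + post])
    return np.mean(epochs, axis=0)

# Illustrative use: responses to two phoneme categories from a toy recording
rng = np.random.default_rng(3)
eeg = rng.standard_normal((32, 128 * 600))     # 32 channels, 10 minutes at 128 Hz
prp_b = phoneme_related_potential(eeg, rng.uniform(1, 590, 200))
prp_p = phoneme_related_potential(eeg, rng.uniform(1, 590, 200))
```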


Subjects
Brain Mapping, Auditory Evoked Potentials/physiology, Phonetics, Speech Perception/physiology, Speech/physiology, Acoustic Stimulation, Acoustics, Electroencephalography, Female, Humans, Language, Male, Reaction Time, Statistics as Topic, Time Factors
13.
Nature; 485(7397): 233-6, 2012 May 10.
Article in English | MEDLINE | ID: mdl-22522927

ABSTRACT

Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.


Subjects
Acoustic Stimulation, Attention/physiology, Auditory Cortex/physiology, Speech Perception/physiology, Speech, Acoustics, Electrodes, Female, Humans, Language, Male, Neurological Models, Noise, Sound Spectrography
14.
J Neurosci; 36(49): 12338-12350, 2016 12 07.
Article in English | MEDLINE | ID: mdl-27927954

ABSTRACT

A primary goal of auditory neuroscience is to identify the sound features extracted and represented by auditory neurons. Linear encoding models, which describe neural responses as a function of the stimulus, have been primarily used for this purpose. Here, we provide theoretical arguments and experimental evidence in support of an alternative approach, based on decoding the stimulus from the neural response. We used a Bayesian normative approach to predict the responses of neurons detecting relevant auditory features, despite ambiguities and noise. We compared the model predictions to recordings from the primary auditory cortex of ferrets and found that: (1) the decoding filters of auditory neurons resemble the filters learned from the statistics of speech sounds; (2) the decoding model captures the dynamics of responses better than a linear encoding model of similar complexity; and (3) the decoding model accounts for the accuracy with which the stimulus is represented in neural activity, whereas the linear encoding model performs very poorly. Most importantly, our model predicts that neuronal responses are fundamentally shaped by "explaining away," a divisive competition between alternative interpretations of the auditory scene. SIGNIFICANCE STATEMENT: Neural responses in the auditory cortex are dynamic, nonlinear, and hard to predict. Traditionally, encoding models have been used to describe neural responses as a function of the stimulus. However, in addition to external stimulation, neural activity is strongly modulated by the responses of other neurons in the network. We hypothesized that auditory neurons aim to collectively decode their stimulus. In particular, a stimulus feature that is decoded (or explained away) by one neuron is not explained by another. We demonstrated that this novel Bayesian decoding model is better at capturing the dynamic responses of cortical neurons in ferrets. Whereas the linear encoding model poorly reflects the selectivity of neurons, the decoding model can account for the strong nonlinearities observed in neural data.


Subjects
Auditory Perception/physiology, Ferrets/physiology, Sensory Receptor Cells/physiology, Acoustic Stimulation, Algorithms, Animals, Auditory Cortex/physiology, Bayes Theorem, Female, Male, Neurological Models, Nerve Net/cytology, Nerve Net/physiology, Noise, Phonetics
15.
J Neurosci; 36(6): 2014-26, 2016 Feb 10.
Article in English | MEDLINE | ID: mdl-26865624

ABSTRACT

The human superior temporal gyrus (STG) is critical for speech perception, yet the organization of spectrotemporal processing of speech within the STG is not well understood. Here, to characterize the spatial organization of spectrotemporal processing of speech across the human STG, we use high-density cortical surface field potential recordings while participants listened to natural continuous speech. While synthetic broad-band stimuli did not yield sustained activation of the STG, spectrotemporal receptive fields could be reconstructed from vigorous responses to speech stimuli. We find that the human STG displays a robust anterior-posterior spatial distribution of spectrotemporal tuning in which the posterior STG is tuned for temporally fast varying speech sounds that have relatively constant energy across the frequency axis (low spectral modulation), while the anterior STG is tuned for temporally slow varying speech sounds that have a high degree of spectral variation across the frequency axis (high spectral modulation). This work illustrates the organization of spectrotemporal processing in the human STG and illuminates the processing of ethologically relevant speech signals in a region of the brain specialized for speech perception. SIGNIFICANCE STATEMENT: Considerable evidence has implicated the human superior temporal gyrus (STG) in speech processing. However, the gross organization of spectrotemporal processing of speech within the STG is not well characterized. Here we use natural speech stimuli and advanced receptive field characterization methods to show that spectrotemporal features within speech are well organized along the posterior-to-anterior axis of the human STG. These findings demonstrate robust functional organization based on spectrotemporal modulation content, and illustrate that much of the encoded information in the STG represents the physical acoustic properties of speech stimuli.


Subjects
Speech Perception/physiology, Temporal Lobe/physiology, Acoustic Stimulation, Adult, Algorithms, Brain Mapping, Energy Metabolism/physiology, Evoked Potentials/physiology, Female, Humans, Male, Phonetics
16.
Proc Natl Acad Sci U S A; 111(18): 6792-7, 2014 May 06.
Article in English | MEDLINE | ID: mdl-24753585

ABSTRACT

Humans and animals can reliably perceive behaviorally relevant sounds in noisy and reverberant environments, yet the neural mechanisms behind this phenomenon are largely unknown. To understand how neural circuits represent degraded auditory stimuli with additive and reverberant distortions, we compared single-neuron responses in ferret primary auditory cortex to speech and vocalizations in four conditions: clean, additive white and pink (1/f) noise, and reverberation. Despite substantial distortion, responses of neurons to the vocalization signal remained stable, maintaining the same statistical distribution in all conditions. Stimulus spectrograms reconstructed from population responses to the distorted stimuli resembled more the original clean than the distorted signals. To explore mechanisms contributing to this robustness, we simulated neural responses using several spectrotemporal receptive field models that incorporated either a static nonlinearity or subtractive synaptic depression and multiplicative gain normalization. The static model failed to suppress the distortions. A dynamic model incorporating feed-forward synaptic depression could account for the reduction of additive noise, but only the combined model with feedback gain normalization was able to predict the effects across both additive and reverberant conditions. Thus, both mechanisms can contribute to the abilities of humans and animals to extract relevant sounds in diverse noisy environments.
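To make the two dynamic mechanisms concrete, here is a generic sketch (not the fitted model from the paper) of feed-forward synaptic depression in the Tsodyks-Markram style followed by a running divisive gain normalization applied to a linear STRF prediction; all constants are assumed values.

```python
import numpy as np

def synaptic_depression(drive, dt=0.001, tau_rec=0.5, u=0.2):
    """Feed-forward depression: resources r recover with time constant tau_rec
    and are consumed in proportion to the rectified input drive."""
    r, out = 1.0, np.empty_like(drive)
    for i, x in enumerate(np.maximum(drive, 0)):
        out[i] = u * r * x
        r += dt * ((1.0 - r) / tau_rec) - u * r * x * dt
        r = min(max(r, 0.0), 1.0)
    return out

def divisive_gain(signal, dt=0.001, tau=0.2, sigma=0.1):
    """Divisive normalization by a running (leaky) estimate of local signal power."""
    power, out = 0.0, np.empty_like(signal)
    for i, x in enumerate(signal):
        power += dt * (x * x - power) / tau
        out[i] = x / (sigma + np.sqrt(power))
    return out

# Illustrative use on a toy linear-STRF output with added stationary noise
rng = np.random.default_rng(4)
linear_pred = np.sin(2 * np.pi * 3 * np.arange(0, 2, 0.001)) + 0.5 * rng.standard_normal(2000)
adapted = divisive_gain(synaptic_depression(linear_pred))
```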


Subjects
Auditory Cortex/physiology, Speech Perception/physiology, Acoustic Stimulation, Animals, Female, Ferrets/physiology, Humans, Neurological Models, Neurons/physiology, Noise, Nonlinear Dynamics, Animal Vocalization
17.
Cereb Cortex; 25(7): 1697-706, 2015 Jul.
Article in English | MEDLINE | ID: mdl-24429136

ABSTRACT

How humans solve the cocktail party problem remains unknown. However, progress has been made recently thanks to the realization that cortical activity tracks the amplitude envelope of speech. This has led to the development of regression methods for studying the neurophysiology of continuous speech. One such method, known as stimulus-reconstruction, has been successfully utilized with cortical surface recordings and magnetoencephalography (MEG). However, the former is invasive and gives a relatively restricted view of processing along the auditory hierarchy, whereas the latter is expensive and rare. Thus it would be extremely useful for research in many populations if stimulus-reconstruction was effective using electroencephalography (EEG), a widely available and inexpensive technology. Here we show that single-trial (≈60 s) unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multispeaker environment. Furthermore, we show a significant correlation between our EEG-based measure of attention and performance on a high-level attention task. In addition, by attempting to decode attention at individual latencies, we identify neural processing at ∼200 ms as being critical for solving the cocktail party problem. These findings open up new avenues for studying the ongoing dynamics of cognition using EEG and for developing effective and natural brain-computer interfaces.


Subjects
Attention/physiology, Brain/physiology, Electroencephalography/methods, Computer-Assisted Signal Processing, Speech Perception/physiology, Acoustic Stimulation, Adult, Female, Humans, Male, Neuropsychological Tests, Time Factors
18.
PLoS Biol; 10(1): e1001251, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22303281

ABSTRACT

How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex.
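A minimal sketch of linear stimulus (spectrogram) reconstruction: each spectrogram channel is regressed onto time-lagged neural responses with ridge regularization. The lag range, regularization strength, and array names are assumptions for illustration; the nonlinear modulation-energy model described in the abstract is not shown.

```python
import numpy as np

def reconstruct_spectrogram(resp, spec, alpha=1.0, lags=range(-40, 1)):
    """Fit a linear reconstruction model: spectrogram (time x freq) from time-lagged
    neural responses (time x electrodes) with ridge regularization. Negative lags mean
    the stimulus at time t is predicted from responses at times t..t+40 samples."""
    T, E = resp.shape
    X = np.zeros((T, E * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(resp, lag, axis=0)
        if lag > 0:
            shifted[:lag] = 0
        elif lag < 0:
            shifted[lag:] = 0
        X[:, i * E:(i + 1) * E] = shifted
    W = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ spec)
    return X @ W, W        # reconstructed spectrogram and reconstruction filters

# Illustrative use with random stand-ins (100 Hz frame rate assumed)
rng = np.random.default_rng(5)
resp = rng.standard_normal((6000, 16))     # 16 electrodes
spec = rng.standard_normal((6000, 32))     # 32 spectrogram channels
recon, filters = reconstruct_spectrogram(resp, spec, alpha=10.0)
```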


Subjects
Auditory Cortex/physiology, Brain Mapping, Speech Acoustics, Algorithms, Computer Simulation, Implanted Electrodes, Electroencephalography, Female, Humans, Linear Models, Male, Biological Models
19.
J Acoust Soc Am; 135(3): 1380-91, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24606276

ABSTRACT

Sounds such as the voice or musical instruments can be recognized on the basis of timbre alone. Here, sound recognition was investigated with severely reduced timbre cues. Short snippets of naturally recorded sounds were extracted from a large corpus. Listeners were asked to report a target category (e.g., sung voices) among other sounds (e.g., musical instruments). All sound categories covered the same pitch range, so the task had to be solved on timbre cues alone. The minimum duration for which performance was above chance was found to be short, on the order of a few milliseconds, with the best performance for voice targets. Performance was independent of pitch and was maintained when stimuli contained less than a full waveform cycle. Recognition was not generally better when the sound snippets were time-aligned with the sound onset compared to when they were extracted with a random starting time. Finally, performance did not depend on feedback or training, suggesting that the cues used by listeners in the artificial gating task were similar to those relevant for longer, more familiar sounds. The results show that timbre cues for sound recognition are available at a variety of time scales, including very short ones.


Subjects
Cues (Psychology), Psychological Discrimination, Pitch Discrimination, Recognition (Psychology), Acoustic Stimulation, Adult, Analysis of Variance, Audiometry, Psychological Feedback, Female, Humans, Male, Music, Sensory Gating, Psychological Signal Detection, Singing, Sound Spectrography, Time Factors, Voice Quality, Young Adult
20.
Article in English | MEDLINE | ID: mdl-39049981

ABSTRACT

In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode the 'what' and 'where' of spatial audio. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audio, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods to randomly swap the order of the microphones and mask Mel and GCC channels. Using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also perform a comprehensive analysis of the effect of each augmentation method and a comparison of the fine-tuning performance using different amounts of labeled data.
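The channel-wise augmentations named in the abstract (random microphone-order swapping and random masking of Mel/GCC channels) are simple to express; the hedged sketch below applies them to a multi-channel feature tensor, with shapes and probabilities chosen as assumptions rather than the paper's settings.

```python
import numpy as np

def swap_microphones(feats, rng):
    """Randomly permute the microphone (channel) axis of a (mics, bins, frames) feature tensor."""
    return feats[rng.permutation(feats.shape[0])]

def mask_channels(feats, rng, p=0.3):
    """Zero out each per-microphone feature map independently with probability p."""
    keep = rng.random(feats.shape[0]) >= p
    return feats * keep[:, None, None]

# Illustrative use: build one augmented "view" of a 4-microphone Mel-spectrogram clip
rng = np.random.default_rng(6)
mel = rng.standard_normal((4, 64, 100))       # (mics, Mel bins, frames)
view = mask_channels(swap_microphones(mel, rng), rng)
```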
