Results 1 - 20 of 61
1.
Nature ; 626(7999): 593-602, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38093008

ABSTRACT

Understanding the neural basis of speech perception requires that we study the human brain both at the scale of the fundamental computational unit of neurons and in their organization across the depth of cortex. Here we used high-density Neuropixels arrays [1-3] to record from 685 neurons across cortical layers at nine sites in a high-level auditory region that is critical for speech, the superior temporal gyrus [4,5], while participants listened to spoken sentences. Single neurons encoded a wide range of speech sound cues, including features of consonants and vowels, relative vocal pitch, onsets, amplitude envelope and sequence statistics. Each cross-laminar recording exhibited dominant tuning to a primary speech feature, while a substantial proportion of its neurons encoded other features, contributing to heterogeneous selectivity. Spatially, neurons at similar cortical depths tended to encode similar speech features. Activity across all cortical layers was predictive of high-frequency field potentials (electrocorticography), providing a neuronal origin for macroelectrode recordings from the cortical surface. Together, these results establish single-neuron tuning across the cortical laminae as an important dimension of speech encoding in the human superior temporal gyrus.


Subject(s)
Auditory Cortex; Neurons; Speech Perception; Temporal Lobe; Humans; Acoustic Stimulation; Auditory Cortex/cytology; Auditory Cortex/physiology; Neurons/physiology; Phonetics; Speech; Speech Perception/physiology; Temporal Lobe/cytology; Temporal Lobe/physiology; Cues; Electrodes
2.
PLoS Biol ; 21(6): e3002128, 2023 06.
Article in English | MEDLINE | ID: mdl-37279203

ABSTRACT

Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception.
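
The temporal response function (TRF) modeling mentioned above can be sketched, in a simplified and hypothetical form, as ridge regression from time-lagged stimulus features to a neural signal. The feature set, lag range, sampling rate, and regularization below are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal TRF sketch (assumed setup): predict a neural signal such as a high-gamma
# envelope from time-lagged stimulus features using ridge regression.
import numpy as np
from sklearn.linear_model import Ridge

def lagged_design(features, max_lag):
    """Stack time-lagged copies of a (time x feature) matrix, lags 0..max_lag samples."""
    T, F = features.shape
    X = np.zeros((T, F * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[lag:, lag * F:(lag + 1) * F] = features[:T - lag]
    return X

rng = np.random.default_rng(0)
T = 5000                                   # time samples (e.g., 50 s at an assumed 100 Hz)
features = rng.standard_normal((T, 8))     # placeholder phonetic-feature time series
neural = rng.standard_normal(T)            # placeholder high-gamma response at one electrode

X = lagged_design(features, max_lag=30)    # roughly 300 ms of lags at 100 Hz (assumed)
model = Ridge(alpha=1.0).fit(X, neural)    # ridge penalty chosen arbitrarily here
trf = model.coef_.reshape(31, 8)           # lags x features: the estimated TRF
print(trf.shape)
```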


Subject(s)
Speech Perception; Speech; Humans; Speech/physiology; Acoustic Stimulation; Phonetics; Speech Perception/physiology; Reaction Time
3.
J Neurosci ; 42(17): 3648-3658, 2022 04 27.
Article in English | MEDLINE | ID: mdl-35347046

ABSTRACT

Speech perception in noise is a challenging everyday task with which many listeners have difficulty. Here, we report a case in which electrical brain stimulation of implanted intracranial electrodes in the left planum temporale (PT) of a neurosurgical patient significantly and reliably improved the subjective quality (up to 50%) and objective intelligibility (up to 97%) of speech perception in noise. Stimulation resulted in a selective enhancement of speech sounds compared with the background noises. The receptive fields of the PT sites whose stimulation improved speech perception were tuned to spectrally broad and rapidly changing sounds. Corticocortical evoked potential analysis revealed that the PT sites were located between the sites in Heschl's gyrus and the superior temporal gyrus. Moreover, the discriminability of speech from nonspeech sounds increased in population neural responses from Heschl's gyrus to the PT to the superior temporal gyrus sites. These findings causally implicate the PT in background noise suppression and may point to a novel potential neuroprosthetic solution to assist in the challenging task of speech perception in noise. SIGNIFICANCE STATEMENT: Speech perception in noise remains a challenging task for many individuals. Here, we present a case in which the electrical brain stimulation of intracranially implanted electrodes in the planum temporale of a neurosurgical patient significantly improved both the subjective quality (up to 50%) and objective intelligibility (up to 97%) of speech perception in noise. Stimulation resulted in a selective enhancement of speech sounds compared with the background noises. Our local and network-level functional analyses placed the planum temporale sites in between the sites in the primary auditory areas in Heschl's gyrus and nonprimary auditory areas in the superior temporal gyrus. These findings causally implicate planum temporale in acoustic scene analysis and suggest potential neuroprosthetic applications to assist hearing in noise.


Subject(s)
Auditory Cortex; Speech Perception; Acoustic Stimulation; Auditory Cortex/physiology; Brain; Brain Mapping/methods; Hearing; Humans; Magnetic Resonance Imaging/methods; Speech/physiology; Speech Perception/physiology
4.
J Neurosci ; 42(32): 6285-6294, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35790403

ABSTRACT

Neuronal coherence is thought to be a fundamental mechanism of communication in the brain, where synchronized field potentials coordinate synaptic and spiking events to support plasticity and learning. Although the spread of field potentials has garnered great interest, little is known about the spatial reach of phase synchronization, or neuronal coherence. Functional connectivity between different brain regions is known to occur across long distances, but the locality of synchronization across the neocortex is understudied. Here we used simultaneous recordings from electrocorticography (ECoG) grids and high-density microelectrode arrays to estimate the spatial reach of neuronal coherence and spike-field coherence (SFC) across frontal, temporal, and occipital cortices during cognitive tasks in humans. We observed the strongest coherence within a 2-3 cm distance from the microelectrode arrays, potentially defining an effective range for local communication. This range was relatively consistent across brain regions, spectral frequencies, and cognitive tasks. The magnitude of coherence showed power law decay with increasing distance from the microelectrode arrays, where the highest coherence occurred between ECoG contacts, followed by coherence between ECoG and deep cortical local field potential (LFP), and then SFC (i.e., ECoG > LFP > SFC). The spectral frequency of coherence also affected its magnitude. Alpha coherence (8-14 Hz) was generally higher than other frequencies for signals nearest the microelectrode arrays, whereas delta coherence (1-3 Hz) was higher for signals that were farther away. Action potentials in all brain regions were most coherent with the phase of alpha oscillations, which suggests that alpha waves could play a larger, more spatially local role in spike timing than other frequencies. These findings provide a deeper understanding of the spatial and spectral dynamics of neuronal synchronization, further advancing knowledge about how activity propagates across the human brain. SIGNIFICANCE STATEMENT: Coherence is theorized to facilitate information transfer across cerebral space by providing a convenient electrophysiological mechanism to modulate membrane potentials in spatiotemporally complex patterns. Our work uses a multiscale approach to evaluate the spatial reach of phase coherence and spike-field coherence during cognitive tasks in humans. Locally, coherence can reach up to 3 cm around a given area of neocortex. The spectral properties of coherence revealed that alpha phase-field and spike-field coherence were higher within ranges <2 cm, whereas lower-frequency delta coherence was higher for contacts farther away. Spatiotemporally shared information (i.e., coherence) across neocortex seems to reach farther than field potentials alone.
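
As a generic illustration of the kind of coherence measure discussed above (not the study's exact pipeline), magnitude-squared coherence between two recorded channels can be computed as a function of frequency; the sampling rate, duration, and simulated signals below are placeholders.

```python
# Sketch: magnitude-squared coherence between two simulated field-potential channels,
# e.g., an ECoG contact and a nearby LFP, evaluated across frequencies.
import numpy as np
from scipy.signal import coherence

fs = 1000.0                                   # sampling rate in Hz (assumed)
rng = np.random.default_rng(1)
t = np.arange(0, 60, 1 / fs)                  # 60 s of data

shared = np.sin(2 * np.pi * 10 * t)           # common 10 Hz (alpha-band) component
sig_a = shared + rng.standard_normal(t.size)  # stand-ins for two recording sites
sig_b = 0.5 * shared + rng.standard_normal(t.size)

f, Cxy = coherence(sig_a, sig_b, fs=fs, nperseg=2048)
alpha_band = (f >= 8) & (f <= 14)
print("mean alpha-band coherence:", Cxy[alpha_band].mean())
```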


Subject(s)
Neocortex; Action Potentials/physiology; Electrocorticography; Humans; Microelectrodes; Neurons/physiology
5.
Neuroimage ; 266: 119819, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36529203

ABSTRACT

The human auditory system displays a robust capacity to adapt to sudden changes in background noise, allowing for continuous speech comprehension despite changes in background environments. However, despite comprehensive studies characterizing this ability, the computations that underlie this process are not well understood. The first step towards understanding a complex system is to propose a suitable model, but the classical and easily interpreted model for the auditory system, the spectro-temporal receptive field (STRF), cannot match the nonlinear neural dynamics involved in noise adaptation. Here, we utilize a deep neural network (DNN) to model neural adaptation to noise, illustrating its effectiveness at reproducing the complex dynamics at the levels of both individual electrodes and the cortical population. By closely inspecting the model's STRF-like computations over time, we find that the model alters both the gain and shape of its receptive field when adapting to a sudden noise change. We show that the DNN model's gain changes allow it to perform adaptive gain control, while the spectro-temporal change creates noise filtering by altering the inhibitory region of the model's receptive field. Further, we find that models of electrodes in nonprimary auditory cortex also exhibit noise filtering changes in their excitatory regions, suggesting differences in noise filtering mechanisms along the cortical hierarchy. These findings demonstrate the capability of deep neural networks to model complex neural adaptation and offer new hypotheses about the computations the auditory cortex performs to enable noise-robust speech perception in real-world, dynamic environments.
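
One common way to inspect a network's STRF-like computation (a generic technique, not necessarily the method used in this study) is to take the gradient of the predicted response with respect to the input spectrogram, giving a locally linear, stimulus-dependent receptive field. The toy network and dimensions below are assumptions.

```python
# Sketch: a locally linear "effective STRF" of an encoding DNN, obtained as the
# gradient of the predicted neural response with respect to the input spectrogram.
import torch
import torch.nn as nn

n_freq, n_time = 32, 100                     # spectrogram bins (assumed)
model = nn.Sequential(                        # toy stand-in for a fitted encoding DNN
    nn.Flatten(),
    nn.Linear(n_freq * n_time, 64),
    nn.ReLU(),
    nn.Linear(64, 1),                         # predicted response of one electrode
)

spec = torch.randn(1, n_freq, n_time, requires_grad=True)   # input spectrogram segment
model(spec).sum().backward()

effective_strf = spec.grad[0]                 # freq x time sensitivity map at this stimulus
print(effective_strf.shape)                   # torch.Size([32, 100])
```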


Subject(s)
Auditory Cortex; Humans; Acoustic Stimulation/methods; Auditory Perception; Neurons; Neural Networks, Computer
6.
J Neurosci ; 40(44): 8530-8542, 2020 10 28.
Article in English | MEDLINE | ID: mdl-33023923

ABSTRACT

Natural conversation is multisensory: when we can see the speaker's face, visual speech cues improve our comprehension. The neuronal mechanisms underlying this phenomenon remain unclear. The two main alternatives are visually mediated phase modulation of neuronal oscillations (excitability fluctuations) in auditory neurons and visual input-evoked responses in auditory neurons. Investigating this question using naturalistic audiovisual speech with intracranial recordings in humans of both sexes, we find evidence for both mechanisms. Remarkably, auditory cortical neurons track the temporal dynamics of purely visual speech using the phase of their slow oscillations and phase-related modulations in broadband high-frequency activity. Consistent with known perceptual enhancement effects, the visual phase reset amplifies the cortical representation of concomitant auditory speech. In contrast to this, and in line with earlier reports, visual input reduces the amplitude of evoked responses to concomitant auditory input. We interpret the combination of improved phase tracking and reduced response amplitude as evidence for more efficient and reliable stimulus processing in the presence of congruent auditory and visual speech inputs. SIGNIFICANCE STATEMENT: Watching the speaker can facilitate our understanding of what is being said. The mechanisms responsible for this influence of visual cues on the processing of speech remain incompletely understood. We studied these mechanisms by recording the electrical activity of the human brain through electrodes implanted surgically inside the brain. We found that visual inputs can operate by directly activating auditory cortical areas, and also indirectly by modulating the strength of cortical responses to auditory input. Our results help to understand the mechanisms by which the brain merges auditory and visual speech into a unitary perception.


Subject(s)
Auditory Cortex/physiology; Evoked Potentials/physiology; Nonverbal Communication/physiology; Adult; Drug Resistant Epilepsy/surgery; Electrocorticography; Evoked Potentials, Auditory/physiology; Evoked Potentials, Visual/physiology; Female; Humans; Middle Aged; Neurons/physiology; Nonverbal Communication/psychology; Photic Stimulation; Young Adult
7.
Neuroimage ; 227: 117586, 2021 02 15.
Article in English | MEDLINE | ID: mdl-33346131

ABSTRACT

Acquiring a new language requires individuals to simultaneously and gradually learn linguistic attributes on multiple levels. Here, we investigated how this learning process changes the neural encoding of natural speech by assessing the encoding of the linguistic feature hierarchy in second-language listeners. Electroencephalography (EEG) signals were recorded from native Mandarin speakers with varied English proficiency and from native English speakers while they listened to audio-stories in English. We measured the temporal response functions (TRFs) for acoustic, phonemic, phonotactic, and semantic features in individual participants and found a main effect of proficiency on linguistic encoding. This effect of second-language proficiency was particularly prominent on the neural encoding of phonemes, showing stronger encoding of "new" phonemic contrasts (i.e., English contrasts that do not exist in Mandarin) with increasing proficiency. Overall, we found that the nonnative listeners with higher proficiency levels had a linguistic feature representation more similar to that of native listeners, which enabled the accurate decoding of language proficiency. This result advances our understanding of the cortical processing of linguistic information in second-language learners and provides an objective measure of language proficiency.


Subject(s)
Brain/physiology; Comprehension/physiology; Multilingualism; Speech Perception/physiology; Adolescent; Adult; Electroencephalography; Female; Humans; Language; Male; Middle Aged; Phonetics; Young Adult
8.
Neuroimage ; 235: 118003, 2021 07 15.
Article in English | MEDLINE | ID: mdl-33789135

ABSTRACT

Heschl's gyrus (HG) is a brain area that includes the primary auditory cortex in humans. Due to the limitations in obtaining direct neural measurements from this region during naturalistic speech listening, the functional organization and the role of HG in speech perception remain uncertain. Here, we used intracranial EEG to directly record neural activity in HG in eight neurosurgical patients as they listened to continuous speech stories. We studied the spatial distribution of acoustic tuning and the organization of linguistic feature encoding. We found a main gradient of change from posteromedial to anterolateral parts of HG. Along this gradient, we observed a decrease in frequency and temporal modulation tuning and an increase in phonemic representation, speaker normalization, speech sensitivity, and response latency. We did not observe a difference between the two brain hemispheres. These findings reveal a functional role for HG in processing and transforming simple to complex acoustic features and inform neurophysiological models of speech processing in the human auditory cortex.


Subject(s)
Auditory Cortex/physiology; Brain Mapping; Speech Perception/physiology; Adult; Electrocorticography; Epilepsy/diagnosis; Epilepsy/surgery; Female; Humans; Male; Middle Aged; Neurosurgical Procedures
9.
Neuroimage ; 223: 117282, 2020 12.
Article in English | MEDLINE | ID: mdl-32828921

ABSTRACT

Hearing-impaired people often struggle to follow the speech stream of an individual talker in noisy environments. Recent studies show that the brain tracks attended speech and that the attended talker can be decoded from neural data on a single-trial level. This raises the possibility of "neuro-steered" hearing devices in which the brain-decoded intention of a hearing-impaired listener is used to enhance the voice of the attended speaker from a speech separation front-end. So far, methods that use this paradigm have focused on optimizing the brain decoding and the acoustic speech separation independently. In this work, we propose a novel framework called brain-informed speech separation (BISS), in which the information about the attended speech, as decoded from the subject's brain, is directly used to perform speech separation in the front-end. We present a deep learning model that uses neural data to extract the clean audio signal that a listener is attending to from a multi-talker speech mixture. We show that the framework can be applied successfully to the decoded output from either invasive intracranial electroencephalography (iEEG) or non-invasive electroencephalography (EEG) recordings from hearing-impaired subjects. It also results in improved speech separation, even in scenes with background noise. The generalization capability of the system renders it a perfect candidate for neuro-steered hearing-assistive devices.


Subject(s)
Brain/physiology; Electroencephalography; Signal Processing, Computer-Assisted; Speech Acoustics; Speech Perception/physiology; Acoustic Stimulation; Adult; Algorithms; Deep Learning; Hearing Loss/physiopathology; Humans; Middle Aged
10.
Nature ; 495(7441): 327-32, 2013 Mar 21.
Article in English | MEDLINE | ID: mdl-23426266

ABSTRACT

Speaking is one of the most complex actions that we perform, but nearly all of us learn to do it effortlessly. Production of fluent speech requires the precise, coordinated movement of multiple articulators (for example, the lips, jaw, tongue and larynx) over rapid time scales. Here we used high-resolution, multi-electrode cortical recordings during the production of consonant-vowel syllables to determine the organization of speech sensorimotor cortex in humans. We found speech-articulator representations that are arranged somatotopically on ventral pre- and post-central gyri, and that partially overlap at individual electrodes. These representations were coordinated temporally as sequences during syllable production. Spatial patterns of cortical activity showed an emergent, population-level representation, which was organized by phonetic features. Over tens of milliseconds, the spatial patterns transitioned between distinct representations for different consonants and vowels. These results reveal the dynamic organization of speech sensorimotor cortex during the generation of multi-articulator movements that underlies our ability to speak.


Subject(s)
Cerebral Cortex/physiology; Speech/physiology; Electromagnetic Phenomena; Feedback, Sensory/physiology; Humans; Phonetics; Principal Component Analysis; Time Factors
11.
J Acoust Soc Am ; 146(6): EL459, 2019 12.
Article in English | MEDLINE | ID: mdl-31893764

ABSTRACT

Analyses of more than 2000 spectro-temporal receptive fields (STRFs) obtained from the ferret primary auditory cortex revealed their dominant encoding properties along the time and frequency domains. The results showed that the peak responses of the STRFs were roughly aligned along the time axis and enhanced low temporal-modulation components of the signal around 3 Hz. In contrast, the peaks of the STRF population along the frequency axis varied and were widely distributed. Further analyses revealed general properties along the frequency axis around the best frequency of each STRF: the STRFs enhanced spectral modulation frequencies around 0.25 cycles/octave. These findings are consistent with techniques reported in the literature to improve speech recognition performance, highlighting the significant potential of biological insights for the design of better machines.
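
The reported temporal (about 3 Hz) and spectral (about 0.25 cycles/octave) preferences can be read off the two-dimensional Fourier transform of a STRF, its modulation transfer function. The sketch below uses an idealized Gabor-like STRF as a stand-in for measured data; the bin sizes and ranges are assumptions.

```python
# Sketch: estimate the peak temporal (Hz) and spectral (cycles/octave) modulation
# of an STRF from its 2-D Fourier transform (modulation transfer function).
import numpy as np

dt = 0.005                      # 5 ms time bins (assumed)
d_oct = 0.1                     # 0.1-octave frequency bins (assumed)
t = np.arange(0, 1.0, dt)       # temporal lags
x = np.arange(-2, 2, d_oct)     # octaves around the best frequency

# Idealized STRF: ~3 Hz temporal and ~0.25 cyc/oct spectral modulation.
spectral = np.cos(2 * np.pi * 0.25 * x) * np.exp(-(x / 1.5) ** 2)
temporal = np.sin(2 * np.pi * 3 * t) * np.exp(-t / 0.3)
strf = np.outer(spectral, temporal)

mtf = np.abs(np.fft.fftshift(np.fft.fft2(strf)))
spec_mod = np.fft.fftshift(np.fft.fftfreq(strf.shape[0], d=d_oct))   # cycles/octave
temp_mod = np.fft.fftshift(np.fft.fftfreq(strf.shape[1], d=dt))      # Hz

i, j = np.unravel_index(np.argmax(mtf), mtf.shape)
print("peak spectral modulation (cyc/oct):", abs(spec_mod[i]))
print("peak temporal modulation (Hz):", abs(temp_mod[j]))
```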


Subject(s)
Action Potentials/physiology; Auditory Cortex/physiology; Auditory Perception/physiology; Evoked Potentials, Auditory/physiology; Acoustic Stimulation/methods; Animals; Ferrets; Models, Neurological; Neurons/physiology
12.
J Neurosci ; 37(8): 2176-2185, 2017 02 22.
Article in English | MEDLINE | ID: mdl-28119400

ABSTRACT

Humans are unique in their ability to communicate using spoken language. However, it remains unclear how the speech signal is transformed and represented in the brain at different stages of the auditory pathway. In this study, we characterized electroencephalography responses to continuous speech by obtaining the time-locked responses to phoneme instances (phoneme-related potential). We showed that responses to different phoneme categories are organized by phonetic features. We found that each instance of a phoneme in continuous speech produces multiple distinguishable neural responses occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Comparing the patterns of phoneme similarity in the neural responses and the acoustic signals confirms a repetitive appearance of acoustic distinctions of phonemes in the neural data. Analysis of the phonetic and speaker information in neural activations revealed that different time intervals jointly encode the acoustic similarity of both phonetic and speaker categories. These findings provide evidence for a dynamic neural transformation of low-level speech features as they propagate along the auditory pathway, and form an empirical framework to study the representational changes in learning, attention, and speech disorders. SIGNIFICANCE STATEMENT: We characterized the properties of evoked neural responses to phoneme instances in continuous speech. We show that each instance of a phoneme in continuous speech produces several observable neural responses at different times occurring as early as 50 ms and as late as 400 ms after the phoneme onset. Each temporal event explicitly encodes the acoustic similarity of phonemes, and linguistic and nonlinguistic information are best represented at different time intervals. Finally, we show a joint encoding of phonetic and speaker information, where the neural representation of speakers is dependent on phoneme category. These findings provide compelling new evidence for dynamic processing of speech sounds in the auditory pathway.
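
The phoneme-related potential described above is, at its core, an average of EEG segments time-locked to phoneme onsets. A minimal sketch, with channel count, sampling rate, and onset times as placeholder assumptions:

```python
# Sketch: compute a phoneme-related potential by epoching continuous EEG around
# phoneme onsets (here -100 ms to +400 ms) and averaging across instances.
import numpy as np

fs = 500                                      # EEG sampling rate in Hz (assumed)
rng = np.random.default_rng(2)
eeg = rng.standard_normal((62, fs * 600))     # channels x samples: 10 min of placeholder EEG
onsets_s = np.sort(rng.uniform(1, 599, 3000)) # placeholder phoneme onset times (s)

pre, post = int(0.1 * fs), int(0.4 * fs)      # 100 ms before to 400 ms after onset
epochs = []
for onset in onsets_s:
    i = int(round(onset * fs))
    if i - pre >= 0 and i + post <= eeg.shape[1]:
        epochs.append(eeg[:, i - pre:i + post])

prp = np.mean(epochs, axis=0)                 # channels x time: phoneme-related potential
print(prp.shape)                              # (62, 250)
```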


Subject(s)
Brain Mapping; Evoked Potentials, Auditory/physiology; Phonetics; Speech Perception/physiology; Speech/physiology; Acoustic Stimulation; Acoustics; Electroencephalography; Female; Humans; Language; Male; Reaction Time; Statistics as Topic; Time Factors
13.
Nature ; 485(7397): 233-6, 2012 May 10.
Article in English | MEDLINE | ID: mdl-22522927

ABSTRACT

Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
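
A much-simplified way to operationalize this kind of attention decoding (not the paper's exact classifier) is to reconstruct the stimulus envelope from neural responses with a decoder trained on single-talker data, then assign attention to the talker whose envelope correlates best with the reconstruction. Everything below is a placeholder illustration of that logic.

```python
# Sketch: decide which of two talkers is attended by correlating a (simulated)
# neurally reconstructed envelope with each talker's actual speech envelope.
import numpy as np

rng = np.random.default_rng(3)
T = 6000                                      # time samples
env_a = np.abs(rng.standard_normal(T))        # envelope of talker A (placeholder)
env_b = np.abs(rng.standard_normal(T))        # envelope of talker B (placeholder)

# Stand-in for a decoder output trained on single-talker examples: here simply a
# noisy copy of talker A's envelope, i.e., talker A is "attended" by construction.
reconstructed = env_a + 0.8 * rng.standard_normal(T)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

attended = "A" if corr(reconstructed, env_a) > corr(reconstructed, env_b) else "B"
print("decoded attended talker:", attended)
```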


Subject(s)
Acoustic Stimulation; Attention/physiology; Auditory Cortex/physiology; Speech Perception/physiology; Speech; Acoustics; Electrodes; Female; Humans; Language; Male; Models, Neurological; Noise; Sound Spectrography
14.
J Neurosci ; 36(49): 12338-12350, 2016 12 07.
Article in English | MEDLINE | ID: mdl-27927954

ABSTRACT

A primary goal of auditory neuroscience is to identify the sound features extracted and represented by auditory neurons. Linear encoding models, which describe neural responses as a function of the stimulus, have been primarily used for this purpose. Here, we provide theoretical arguments and experimental evidence in support of an alternative approach, based on decoding the stimulus from the neural response. We used a Bayesian normative approach to predict the responses of neurons detecting relevant auditory features, despite ambiguities and noise. We compared the model predictions to recordings from the primary auditory cortex of ferrets and found that: (1) the decoding filters of auditory neurons resemble the filters learned from the statistics of speech sounds; (2) the decoding model captures the dynamics of responses better than a linear encoding model of similar complexity; and (3) the decoding model accounts for the accuracy with which the stimulus is represented in neural activity, whereas the linear encoding model performs very poorly. Most importantly, our model predicts that neuronal responses are fundamentally shaped by "explaining away," a divisive competition between alternative interpretations of the auditory scene. SIGNIFICANCE STATEMENT: Neural responses in the auditory cortex are dynamic, nonlinear, and hard to predict. Traditionally, encoding models have been used to describe neural responses as a function of the stimulus. However, in addition to external stimulation, neural activity is strongly modulated by the responses of other neurons in the network. We hypothesized that auditory neurons aim to collectively decode their stimulus. In particular, a stimulus feature that is decoded (or explained away) by one neuron is not explained by another. We demonstrated that this novel Bayesian decoding model is better at capturing the dynamic responses of cortical neurons in ferrets. Whereas the linear encoding model poorly reflects selectivity of neurons, the decoding model can account for the strong nonlinearities observed in neural data.


Subject(s)
Auditory Perception/physiology; Ferrets/physiology; Sensory Receptor Cells/physiology; Acoustic Stimulation; Algorithms; Animals; Auditory Cortex/physiology; Bayes Theorem; Female; Male; Models, Neurological; Nerve Net/cytology; Nerve Net/physiology; Noise; Phonetics
15.
J Neurosci ; 36(6): 2014-26, 2016 Feb 10.
Article in English | MEDLINE | ID: mdl-26865624

ABSTRACT

The human superior temporal gyrus (STG) is critical for speech perception, yet the organization of spectrotemporal processing of speech within the STG is not well understood. Here, to characterize the spatial organization of spectrotemporal processing of speech across human STG, we use high-density cortical surface field potential recordings while participants listened to natural continuous speech. While synthetic broad-band stimuli did not yield sustained activation of the STG, spectrotemporal receptive fields could be reconstructed from vigorous responses to speech stimuli. We find that the human STG displays a robust anterior-posterior spatial distribution of spectrotemporal tuning in which the posterior STG is tuned for temporally fast varying speech sounds that have relatively constant energy across the frequency axis (low spectral modulation) while the anterior STG is tuned for temporally slow varying speech sounds that have a high degree of spectral variation across the frequency axis (high spectral modulation). This work illustrates organization of spectrotemporal processing in the human STG, and illuminates processing of ethologically relevant speech signals in a region of the brain specialized for speech perception. SIGNIFICANCE STATEMENT: Considerable evidence has implicated the human superior temporal gyrus (STG) in speech processing. However, the gross organization of spectrotemporal processing of speech within the STG is not well characterized. Here we use natural speech stimuli and advanced receptive field characterization methods to show that spectrotemporal features within speech are well organized along the posterior-to-anterior axis of the human STG. These findings demonstrate robust functional organization based on spectrotemporal modulation content, and illustrate that much of the encoded information in the STG represents the physical acoustic properties of speech stimuli.


Subject(s)
Speech Perception/physiology; Temporal Lobe/physiology; Acoustic Stimulation; Adult; Algorithms; Brain Mapping; Energy Metabolism/physiology; Evoked Potentials/physiology; Female; Humans; Male; Phonetics
16.
Proc Natl Acad Sci U S A ; 111(18): 6792-7, 2014 May 06.
Article in English | MEDLINE | ID: mdl-24753585

ABSTRACT

Humans and animals can reliably perceive behaviorally relevant sounds in noisy and reverberant environments, yet the neural mechanisms behind this phenomenon are largely unknown. To understand how neural circuits represent degraded auditory stimuli with additive and reverberant distortions, we compared single-neuron responses in ferret primary auditory cortex to speech and vocalizations in four conditions: clean, additive white and pink (1/f) noise, and reverberation. Despite substantial distortion, responses of neurons to the vocalization signal remained stable, maintaining the same statistical distribution in all conditions. Stimulus spectrograms reconstructed from population responses to the distorted stimuli resembled more the original clean than the distorted signals. To explore mechanisms contributing to this robustness, we simulated neural responses using several spectrotemporal receptive field models that incorporated either a static nonlinearity or subtractive synaptic depression and multiplicative gain normalization. The static model failed to suppress the distortions. A dynamic model incorporating feed-forward synaptic depression could account for the reduction of additive noise, but only the combined model with feedback gain normalization was able to predict the effects across both additive and reverberant conditions. Thus, both mechanisms can contribute to the abilities of humans and animals to extract relevant sounds in diverse noisy environments.
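
A toy, heavily simplified sketch of the two dynamic mechanisms named above, feed-forward synaptic depression and divisive feedback gain normalization, applied to a one-dimensional input drive. The update rules and parameters are illustrative assumptions, not the paper's model.

```python
# Toy sketch: feed-forward synaptic depression plus divisive feedback gain
# normalization acting on a fluctuating input drive with a sudden level increase.
import numpy as np

rng = np.random.default_rng(4)
T, dt = 2000, 0.001
drive = np.clip(rng.standard_normal(T) + 1.0, 0, None)   # nonnegative input drive
drive[1000:] += 2.0                                       # step increase, e.g., added noise

r = np.zeros(T)             # model response
d = 1.0                     # synaptic resource (depression variable), between 0 and 1
g = 1.0                     # slow divisive gain signal

tau_d, use = 0.2, 0.5       # depression recovery time constant (s), release fraction
tau_g, k = 0.5, 1.0         # gain time constant (s), normalization strength

for n in range(1, T):
    syn_input = drive[n] * d                   # depressed feed-forward input
    r[n] = syn_input / (1.0 + k * g)           # divisive normalization by the slow gain
    d += dt * (1.0 - d) / tau_d - use * d * drive[n] * dt   # resource depletion and recovery
    d = min(max(d, 0.0), 1.0)
    g += dt * (r[n] - g) / tau_g               # gain tracks recent output (feedback)

print("mean response before vs. after the step:", r[200:1000].mean(), r[1200:].mean())
```

The point of the toy is only that the combination of depression and divisive gain compresses the response to the sustained level increase, a qualitative analogue of the robustness described in the abstract.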


Subject(s)
Auditory Cortex/physiology; Speech Perception/physiology; Acoustic Stimulation; Animals; Female; Ferrets/physiology; Humans; Models, Neurological; Neurons/physiology; Noise; Nonlinear Dynamics; Vocalization, Animal
17.
Cereb Cortex ; 25(7): 1697-706, 2015 Jul.
Article in English | MEDLINE | ID: mdl-24429136

ABSTRACT

How humans solve the cocktail party problem remains unknown. However, progress has been made recently thanks to the realization that cortical activity tracks the amplitude envelope of speech. This has led to the development of regression methods for studying the neurophysiology of continuous speech. One such method, known as stimulus-reconstruction, has been successfully utilized with cortical surface recordings and magnetoencephalography (MEG). However, the former is invasive and gives a relatively restricted view of processing along the auditory hierarchy, whereas the latter is expensive and rare. Thus it would be extremely useful for research in many populations if stimulus-reconstruction was effective using electroencephalography (EEG), a widely available and inexpensive technology. Here we show that single-trial (≈60 s) unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multispeaker environment. Furthermore, we show a significant correlation between our EEG-based measure of attention and performance on a high-level attention task. In addition, by attempting to decode attention at individual latencies, we identify neural processing at ∼200 ms as being critical for solving the cocktail party problem. These findings open up new avenues for studying the ongoing dynamics of cognition using EEG and for developing effective and natural brain-computer interfaces.


Subject(s)
Attention/physiology; Brain/physiology; Electroencephalography/methods; Signal Processing, Computer-Assisted; Speech Perception/physiology; Acoustic Stimulation; Adult; Female; Humans; Male; Neuropsychological Tests; Time Factors
18.
PLoS Biol ; 10(1): e1001251, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22303281

ABSTRACT

How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex.
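
The linear reconstruction described above can be sketched as a regularized regression from time-lagged neural responses back to the auditory spectrogram; the electrode count, lag range, and regularization below are assumptions, and the nonlinear modulation-energy variant is not shown.

```python
# Sketch: linear stimulus reconstruction, mapping time-lagged neural responses
# (electrodes x time) back to an auditory spectrogram (time x frequency bands).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)
T, n_elec, n_freq = 4000, 64, 32
neural = rng.standard_normal((T, n_elec))     # placeholder high-gamma responses
spec = rng.standard_normal((T, n_freq))       # placeholder target spectrogram

max_lag = 20                                  # neural samples following each stimulus frame (assumed)
X = np.zeros((T, n_elec * (max_lag + 1)))
for lag in range(max_lag + 1):
    X[:T - lag, lag * n_elec:(lag + 1) * n_elec] = neural[lag:]   # response lags the stimulus

decoder = Ridge(alpha=10.0).fit(X[:3000], spec[:3000])   # train on the first portion
reconstruction = decoder.predict(X[3000:])                # reconstruct held-out spectrogram frames
print(reconstruction.shape)                               # (1000, 32)
```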


Subject(s)
Auditory Cortex/physiology; Brain Mapping; Speech Acoustics; Algorithms; Computer Simulation; Electrodes, Implanted; Electroencephalography; Female; Humans; Linear Models; Male; Models, Biological
19.
J Acoust Soc Am ; 135(3): 1380-91, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24606276

ABSTRACT

Sounds such as the voice or musical instruments can be recognized on the basis of timbre alone. Here, sound recognition was investigated with severely reduced timbre cues. Short snippets of naturally recorded sounds were extracted from a large corpus. Listeners were asked to report a target category (e.g., sung voices) among other sounds (e.g., musical instruments). All sound categories covered the same pitch range, so the task had to be solved on timbre cues alone. The minimum duration for which performance was above chance was found to be short, on the order of a few milliseconds, with the best performance for voice targets. Performance was independent of pitch and was maintained when stimuli contained less than a full waveform cycle. Recognition was not generally better when the sound snippets were time-aligned with the sound onset compared to when they were extracted with a random starting time. Finally, performance did not depend on feedback or training, suggesting that the cues used by listeners in the artificial gating task were similar to those relevant for longer, more familiar sounds. The results show that timbre cues for sound recognition are available at a variety of time scales, including very short ones.


Subject(s)
Cues; Discrimination, Psychological; Pitch Discrimination; Recognition, Psychology; Acoustic Stimulation; Adult; Analysis of Variance; Audiometry; Feedback, Psychological; Female; Humans; Male; Music; Sensory Gating; Signal Detection, Psychological; Singing; Sound Spectrography; Time Factors; Voice Quality; Young Adult
20.
bioRxiv ; 2024 May 14.
Article in English | MEDLINE | ID: mdl-38798551

ABSTRACT

Listeners readily extract multi-dimensional auditory objects such as a 'localized talker' from complex acoustic scenes with multiple talkers. Yet, the neural mechanisms underlying simultaneous encoding and linking of different sound features - for example, a talker's voice and location - are poorly understood. We analyzed invasive intracranial recordings in neurosurgical patients attending to a localized talker in real-life cocktail party scenarios. We found that sensitivity to an individual talker's voice and location features was distributed throughout auditory cortex and that neural sites exhibited a gradient from sensitivity to a single feature to joint sensitivity to both features. On a population level, cortical response patterns of both dual-feature sensitive sites and single-feature sensitive sites revealed simultaneous encoding of an attended talker's voice and location features. However, for single-feature sensitive sites, the representation of the primary feature was more precise. Further, sites that selectively tracked an attended speech stream concurrently encoded an attended talker's voice and location features, indicating that such sites combine selective tracking of an attended auditory object with encoding of the object's features. Finally, we found that attending to a localized talker selectively enhanced temporal coherence between single-feature voice sensitive sites and single-feature location sensitive sites, providing an additional mechanism for linking voice and location in multi-talker scenes. These results demonstrate that a talker's voice and location features are linked during multi-dimensional object formation in naturalistic multi-talker scenes by joint population coding as well as by temporal coherence between neural sites. SIGNIFICANCE STATEMENT: Listeners effortlessly extract auditory objects from complex acoustic scenes consisting of multiple sound sources in naturalistic, spatial sound scenes. Yet, how the brain links different sound features to form a multi-dimensional auditory object is poorly understood. We investigated how neural responses encode and integrate an attended talker's voice and location features in spatial multi-talker sound scenes to elucidate which neural mechanisms underlie simultaneous encoding and linking of different auditory features. Our results show that joint population coding as well as temporal coherence mechanisms contribute to distributed multi-dimensional auditory object encoding. These findings shed new light on cortical functional specialization and multidimensional auditory object formation in complex, naturalistic listening scenes.
HIGHLIGHTS:
- Cortical responses to a single talker exhibit a distributed gradient, ranging from sites that are sensitive to both a talker's voice and location (dual-feature sensitive sites) to sites that are sensitive to either voice or location (single-feature sensitive sites).
- Population response patterns of dual-feature sensitive sites encode voice and location features of the attended talker in multi-talker scenes jointly and with equal precision.
- Despite their sensitivity to a single feature at the level of individual cortical sites, population response patterns of single-feature sensitive sites also encode location and voice features of a talker jointly, but with higher precision for the feature they are primarily sensitive to.
- Neural sites that selectively track an attended speech stream concurrently encode the attended talker's voice and location features.
- Attention selectively enhances temporal coherence between voice and location selective sites over time.
- Joint population coding as well as temporal coherence mechanisms underlie distributed multi-dimensional auditory object encoding in auditory cortex.
