Results 1 - 20 of 20
1.
Hear Res ; 433: 108767, 2023 06.
Article in English | MEDLINE | ID: mdl-37060895

ABSTRACT

The goal of describing how the human brain responds to complex acoustic stimuli has driven auditory neuroscience research for decades. Often, a systems-based approach has been taken, in which neurophysiological responses are modeled based on features of the presented stimulus. This includes a wealth of work modeling electroencephalogram (EEG) responses to complex acoustic stimuli such as speech. Examples of the acoustic features used in such modeling include the amplitude envelope and spectrogram of speech. These models implicitly assume a direct mapping from stimulus representation to cortical activity. However, in reality, the representation of sound is transformed as it passes through early stages of the auditory pathway, such that inputs to the cortex are fundamentally different from the raw audio signal that was presented. Thus, it could be valuable to account for the transformations taking place in lower-order auditory areas, such as the auditory nerve, cochlear nucleus, and inferior colliculus (IC), when predicting cortical responses to complex sounds. Specifically, because IC responses are more similar to cortical inputs than acoustic features derived directly from the audio signal, we hypothesized that linear mappings (temporal response functions; TRFs) fit to the outputs of an IC model would better predict EEG responses to speech stimuli. To this end, we modeled responses to the acoustic stimuli as they passed through the auditory nerve, cochlear nucleus, and inferior colliculus before fitting a TRF to the output of the modeled IC responses. Results showed that using model-IC responses in traditional systems analyses resulted in better predictions of EEG activity than using the envelope or spectrogram of a speech stimulus. Further, model-IC-derived TRFs were found to predict different aspects of the EEG than acoustic-feature TRFs, and combining both types of TRF models provided a more accurate prediction of the EEG response.
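
Below is a minimal Python sketch of the kind of TRF analysis this abstract describes: a ridge-regularised, time-lagged linear regression from a stimulus feature (here a surrogate envelope, though the same code could take the output of a modelled IC stage) to a single EEG channel. All variable names, lag ranges, and regularisation settings are illustrative assumptions, not the study's actual pipeline.

import numpy as np

def lagged_design(stim, lags):
    # Build a [time x lags] design matrix from a 1-D stimulus feature.
    n = len(stim)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stim[:n - lag]
        else:
            X[:lag, j] = stim[-lag:]
    return X

def fit_trf(stim, eeg_channel, fs, tmin=-0.1, tmax=0.4, alpha=1.0):
    # Ridge-regularised TRF mapping a stimulus feature onto one EEG channel.
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = lagged_design(stim, lags)
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ eeg_channel)
    return lags / fs, w  # TRF time axis (seconds) and weights

# Surrogate example: 60 s of "envelope" and one EEG channel at 128 Hz.
fs = 128
rng = np.random.default_rng(0)
envelope = np.abs(rng.standard_normal(60 * fs))
eeg_channel = np.convolve(envelope, rng.standard_normal(32), mode="full")[:60 * fs]
times, trf = fit_trf(envelope, eeg_channel, fs)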


Subjects
Auditory Cortex, Inferior Colliculi, Humans, Speech/physiology, Auditory Pathways/physiology, Electroencephalography, Auditory Cortex/physiology, Inferior Colliculi/physiology, Acoustic Stimulation/methods, Auditory Perception/physiology
2.
J Neurosci ; 42(4): 682-691, 2022 01 26.
Article in English | MEDLINE | ID: mdl-34893546

ABSTRACT

Humans have the remarkable ability to selectively focus on a single talker in the midst of other competing talkers. The neural mechanisms that underlie this phenomenon remain incompletely understood. In particular, there has been longstanding debate over whether attention operates at an early or late stage in the speech processing hierarchy. One way to better understand this is to examine how attention might differentially affect neurophysiological indices of hierarchical acoustic and linguistic speech representations. In this study, we do this by using encoding models to identify neural correlates of speech processing at various levels of representation. Specifically, we recorded EEG from fourteen human subjects (nine female and five male) during a "cocktail party" attention experiment. Model comparisons based on these data revealed phonetic feature processing for attended, but not unattended speech. Furthermore, we show that attention specifically enhances isolated indices of phonetic feature processing, but that such attention effects are not apparent for isolated measures of acoustic processing. These results provide new insights into the effects of attention on different prelexical representations of speech, insights that complement recent anatomic accounts of the hierarchical encoding of attended speech. Furthermore, our findings support the notion that, for attended speech, phonetic features are processed as a distinct stage, separate from the processing of the speech acoustics. SIGNIFICANCE STATEMENT: Humans are very good at paying attention to one speaker in an environment with multiple speakers. However, the details of how attended and unattended speech are processed differently by the brain are not completely clear. Here, we explore how attention affects the processing of the acoustic sounds of speech as well as the mapping of those sounds onto categorical phonetic features. We find evidence of categorical phonetic feature processing for attended, but not unattended speech. Furthermore, we find evidence that categorical phonetic feature processing is enhanced by attention, but acoustic processing is not. These findings add an important new layer to our understanding of how the human brain solves the cocktail party problem.
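
As a rough illustration of the encoding-model comparison logic described above, the following Python sketch compares cross-validated EEG prediction accuracy for an acoustic-only feature set against acoustics plus phonetic features; the feature matrices, dimensions, and regularisation are surrogate values, not the study's data or settings.

import numpy as np

def cv_prediction_r(X, y, alpha=1.0, n_folds=5):
    # Mean cross-validated Pearson r of a ridge encoding model.
    folds = np.array_split(np.arange(len(y)), n_folds)
    rs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        w = np.linalg.solve(X[train].T @ X[train] + alpha * np.eye(X.shape[1]),
                            X[train].T @ y[train])
        rs.append(np.corrcoef(X[test] @ w, y[test])[0, 1])
    return float(np.mean(rs))

rng = np.random.default_rng(1)
n = 6000
acoustic = rng.standard_normal((n, 16))    # e.g. spectrogram bands (surrogate)
phonetic = rng.standard_normal((n, 19))    # e.g. phonetic-feature indicators (surrogate)
eeg_channel = (acoustic @ rng.standard_normal(16)
               + phonetic @ rng.standard_normal(19)
               + rng.standard_normal(n))

r_acoustic = cv_prediction_r(acoustic, eeg_channel)
r_combined = cv_prediction_r(np.hstack([acoustic, phonetic]), eeg_channel)
# The gain of the combined model over acoustics alone indexes phonetic-feature encoding.
print(r_acoustic, r_combined)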


Subjects
Acoustic Stimulation/methods, Attention/physiology, Phonetics, Speech Perception/physiology, Speech/physiology, Adult, Electroencephalography/methods, Female, Humans, Male, Photic Stimulation/methods, Young Adult
3.
PLoS Comput Biol ; 17(9): e1009358, 2021 09.
Article in English | MEDLINE | ID: mdl-34534211

ABSTRACT

The human brain tracks amplitude fluctuations of both speech and music, which reflects acoustic processing in addition to the encoding of higher-order features and one's cognitive state. Comparing neural tracking of speech and music envelopes can elucidate stimulus-general mechanisms, but direct comparisons are confounded by differences in their envelope spectra. Here, we use a novel method of frequency-constrained reconstruction of stimulus envelopes using EEG recorded during passive listening. We expected to see music reconstruction match speech in a narrow range of frequencies, but instead we found that speech was reconstructed better than music for all frequencies we examined. Additionally, models trained on all stimulus types performed as well or better than the stimulus-specific models at higher modulation frequencies, suggesting a common neural mechanism for tracking speech and music. However, speech envelope tracking at low frequencies, below 1 Hz, was associated with increased weighting over parietal channels, which was not present for the other stimuli. Our results highlight the importance of low-frequency speech tracking and suggest an origin from speech-specific processing in the brain.
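
One plausible reading of "frequency-constrained reconstruction" is sketched below in Python: band-pass the stimulus envelope into a narrow modulation band, train a backward (EEG-to-envelope) ridge decoder, and score reconstruction accuracy within that band. This is an assumption-laden illustration, not the authors' method; all parameters and names are invented.

import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, lo, hi, fs, order=2):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x, axis=0)

def reconstruction_r(eeg, target, fs, max_lag_s=0.25, alpha=1e2):
    # Backward model: lagged EEG -> band-limited envelope (edge wrap-around ignored).
    lags = range(int(max_lag_s * fs))
    X = np.hstack([np.roll(eeg, lag, axis=0) for lag in lags])
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ target)
    return np.corrcoef(X @ w, target)[0, 1]

fs = 64
rng = np.random.default_rng(2)
eeg = rng.standard_normal((120 * fs, 32))       # surrogate EEG, 120 s x 32 channels
env = np.abs(rng.standard_normal(120 * fs))     # surrogate broadband envelope
env_band = bandpass(env, 0.5, 1.0, fs)          # e.g. the below-1-Hz band of interest
print("reconstruction r:", reconstruction_r(eeg, env_band, fs))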


Subjects
Auditory Perception/physiology, Brain/physiology, Music, Speech Perception/physiology, Speech/physiology, Acoustic Stimulation/methods, Adolescent, Adult, Computational Biology, Computer Simulation, Electroencephalography/statistics & numerical data, Female, Humans, Linear Models, Male, Models, Neurological, Principal Component Analysis, Speech Acoustics, Young Adult
4.
J Neurosci ; 41(23): 4991-5003, 2021 06 09.
Article in English | MEDLINE | ID: mdl-33824190

ABSTRACT

Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid the recognition of specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight into these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy. SIGNIFICANCE STATEMENT: During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
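
A hedged sketch of how encoding strength might be quantified with canonical correlation analysis, as named in the abstract: CCA between time-lagged stimulus features and time-lagged EEG, with the resulting canonical correlations serving as the encoding measure. The data here are surrogate noise and the dimensions are arbitrary.

import numpy as np
from sklearn.cross_decomposition import CCA

def lag_matrix(x, n_lags):
    # Stack time-lagged copies of a [time x features] matrix (wrap-around ignored).
    return np.hstack([np.roll(x, lag, axis=0) for lag in range(n_lags)])

rng = np.random.default_rng(3)
n_samples = 5000
spectrogram = rng.standard_normal((n_samples, 16))   # surrogate: 16 frequency bands
eeg = rng.standard_normal((n_samples, 32))           # surrogate: 32 EEG channels

X = lag_matrix(spectrogram, n_lags=10)   # stimulus side with temporal context
Y = lag_matrix(eeg, n_lags=10)           # response side with temporal context

cca = CCA(n_components=3, max_iter=1000)
cca.fit(X, Y)
Xc, Yc = cca.transform(X, Y)
corrs = [np.corrcoef(Xc[:, k], Yc[:, k])[0, 1] for k in range(3)]
print("canonical correlations:", corrs)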


Subjects
Brain/physiology, Comprehension/physiology, Cues, Speech Perception/physiology, Visual Perception/physiology, Acoustic Stimulation, Brain Mapping, Electroencephalography, Female, Humans, Male, Phonetics, Photic Stimulation
5.
Eur J Neurosci ; 50(11): 3831-3842, 2019 12.
Article in English | MEDLINE | ID: mdl-31287601

ABSTRACT

Speech is central to communication among humans. Meaning is largely conveyed by the selection of linguistic units such as words, phrases and sentences. However, prosody, that is the variation of acoustic cues that tie linguistic segments together, adds another layer of meaning. There are various features underlying prosody, one of the most important being pitch and how it is modulated. Recent fMRI and ECoG studies have suggested that there are cortical regions for pitch which respond primarily to resolved harmonics and that high-gamma cortical activity encodes intonation as represented by relative pitch. Importantly, this latter result was shown to be independent of the cortical tracking of the acoustic energy of speech, a commonly used measure. Here, we investigate whether we can isolate low-frequency EEG indices of pitch processing of continuous narrative speech from those reflecting the tracking of other acoustic and phonetic features. Harmonic resolvability was found to contain unique predictive power in delta and theta phase, but it was highly correlated with the envelope and tracked even when stimuli were pitch-impoverished. As such, we are circumspect about whether its contribution is truly pitch-specific. Crucially however, we found a unique contribution of relative pitch to EEG delta-phase prediction, and this tracking was absent when subjects listened to pitch-impoverished stimuli. This finding suggests the possibility of a separate processing stream for prosody that might operate in parallel to acoustic-linguistic processing. Furthermore, it provides a novel neural index that could be useful for testing prosodic encoding in populations with speech processing deficits and for improving cognitively controlled hearing aids.


Subjects
Auditory Cortex/physiology, Delta Rhythm/physiology, Phonetics, Pitch Perception/physiology, Speech Perception/physiology, Acoustic Stimulation/methods, Electroencephalography/methods, Female, Humans, Magnetoencephalography/methods, Male
6.
eNeuro ; 6(3)2019.
Article in English | MEDLINE | ID: mdl-31171606

ABSTRACT

Characterizing how the brain responds to stimuli has been a goal of sensory neuroscience for decades. One key approach has been to fit linear models to describe the relationship between sensory inputs and neural responses. This has included models aimed at predicting spike trains, local field potentials, BOLD responses, and EEG/MEG. In the case of EEG/MEG, one explicit use of this linear modeling approach has been the fitting of so-called temporal response functions (TRFs). TRFs have been used to study how auditory cortex tracks the amplitude envelope of acoustic stimuli, including continuous speech. However, such linear models typically assume that variations in the amplitude of the stimulus feature (i.e., the envelope) produce variations in the magnitude but not the latency or morphology of the resulting neural response. Here, we show that by amplitude binning the stimulus envelope, and then using it to fit a multivariate TRF, we can better account for these amplitude-dependent changes, and that this leads to a significant improvement in model performance for both amplitude-modulated noise and continuous speech in humans. We also show that this performance can be further improved through the inclusion of an additional envelope representation that emphasizes onsets and positive changes in the stimulus, consistent with the idea that while some neurons track the entire envelope, others respond preferentially to onsets in the stimulus. We contend that these results have practical implications for researchers interested in modeling brain responses to amplitude modulated sounds.
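
The amplitude-binning idea can be illustrated with a short Python sketch: the 1-D envelope is split into amplitude bins so that each bin drives its own TRF channel, and an onset-emphasising channel (a half-wave-rectified derivative) can be appended. Bin counts and the onset definition are illustrative assumptions, not the paper's exact choices.

import numpy as np

def bin_envelope(env, n_bins=8):
    # Return a [time x n_bins] matrix; each column holds the envelope only
    # where it falls within that amplitude bin, and is zero elsewhere.
    edges = np.quantile(env, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(env, edges[1:-1]), 0, n_bins - 1)
    binned = np.zeros((env.size, n_bins))
    binned[np.arange(env.size), idx] = env
    return binned

rng = np.random.default_rng(4)
env = np.abs(rng.standard_normal(10000))                  # surrogate envelope
env_binned = bin_envelope(env)                            # one TRF channel per amplitude bin
onsets = np.clip(np.diff(env, prepend=env[0]), 0, None)   # half-wave-rectified derivative
features = np.hstack([env_binned, onsets[:, None]])       # optional onset-emphasising channel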


Subjects
Auditory Pathways/physiology, Cerebral Cortex/physiology, Evoked Potentials, Auditory, Models, Neurological, Signal Processing, Computer-Assisted, Speech Perception/physiology, Acoustic Stimulation, Adult, Auditory Cortex/physiology, Electroencephalography, Female, Humans, Magnetoencephalography, Male, Speech Acoustics, Young Adult
7.
J Neural Eng ; 16(3): 036017, 2019 06.
Article in English | MEDLINE | ID: mdl-30836345

ABSTRACT

OBJECTIVE: It has been shown that attentional selection in a simple dichotic listening paradigm can be decoded offline by reconstructing the stimulus envelope from single-trial neural response data. Here, we test the efficacy of this approach in an environment with non-stationary talkers. We then look beyond the envelope reconstructions themselves and consider whether incorporating the decoder values-which reflect the weightings applied to the multichannel EEG data at different time lags and scalp locations when reconstructing the stimulus envelope-can improve decoding performance. APPROACH: High-density EEG was recorded as subjects attended to one of two talkers. The two speech streams were filtered using head-related transfer functions (HRTFs), and the talkers were alternated between the left and right locations at varying intervals to simulate a dynamic environment. We trained spatio-temporal decoders mapping from EEG data to the attended and unattended stimulus envelopes. We then decoded auditory attention by (1) using the attended decoder to reconstruct the envelope and (2) exploiting the fact that decoder weightings themselves contain signatures of attention, resulting in consistent patterns across subjects that can be classified. MAIN RESULTS: The previously established decoding approach was found to be effective even with non-stationary talkers. Signatures of attentional selection and attended direction were found in the spatio-temporal structure of the decoders and were consistent across subjects. The inclusion of decoder weights into the decoding algorithm resulted in significantly improved decoding accuracies (from 61.07% to 65.31% for 4 s windows). An attempt was made to include alpha power lateralization as another feature to improve decoding, although this was unsuccessful at the single-trial level. SIGNIFICANCE: This work suggests that the spatio-temporal decoder weights can be utilised to improve decoding. More generally, looking beyond envelope reconstruction and incorporating other signatures of attention is an avenue that should be explored to improve selective auditory attention decoding.


Subjects
Acoustic Stimulation/methods, Auditory Cortex/physiology, Electroencephalography/methods, Noise, Sound Localization/physiology, Speech Perception/physiology, Adult, Female, Humans, Male
8.
Sci Rep ; 8(1): 13745, 2018 09 13.
Article in English | MEDLINE | ID: mdl-30214000

ABSTRACT

This study assessed cortical tracking of temporal information in incoming natural speech in seven-month-old infants. Cortical tracking refers to the process by which neural activity follows the dynamic patterns of the speech input. In adults, it has been shown to involve attentional mechanisms and to facilitate effective speech encoding. However, in infants, neither cortical tracking nor its effects on speech processing have been investigated. This study measured cortical tracking of speech in infants and, given the involvement of attentional mechanisms in this process, compared cortical tracking of infant-directed speech (IDS), which is highly attractive to infants, with that of the less captivating adult-directed speech (ADS). IDS is the speech register parents use when addressing young infants. In comparison to ADS, it is characterised by several acoustic qualities that capture infants' attention to linguistic input and assist language learning. Seven-month-old infants' cortical responses were recorded via electroencephalography as they listened to IDS or ADS recordings. Results showed stronger low-frequency cortical tracking of the speech envelope in IDS than in ADS. This suggests that IDS has a privileged status in facilitating successful cortical tracking of incoming speech which may, in turn, augment infants' early speech processing and even later language development.


Subjects
Brain/physiology, Language Development, Speech/physiology, Acoustic Stimulation, Attention/physiology, Auditory Perception/physiology, Brain/diagnostic imaging, Electroencephalography, Female, Humans, Infant, Male, Speech Perception/physiology
9.
Hear Res ; 348: 70-77, 2017 05.
Article in English | MEDLINE | ID: mdl-28246030

ABSTRACT

Speech is central to human life. As such, any delay or impairment in receptive speech processing can have a profoundly negative impact on the social and professional life of a person. Thus, being able to assess the integrity of speech processing in different populations is an important goal. Current standardized assessment is mostly based on psychometric measures that do not capture the full extent of a person's speech processing abilities and that are difficult to administer in some subject groups. A potential alternative to these tests would be to derive "direct", objective measures of speech processing from cortical activity. One such approach was recently introduced and showed that it is possible to use electroencephalography (EEG) to index cortical processing at the level of phonemes from responses to continuous natural speech. However, a large amount of data was required for such analyses. This limits the usefulness of this approach for assessing speech processing in particular cohorts for whom data collection is difficult. Here, we used EEG data from 10 subjects to assess whether measures reflecting phoneme-level processing could be reliably obtained using only 10 min of recording time from each subject. This was done successfully using a generic modeling approach wherein the data from a training group composed of 9 subjects were combined to derive robust predictions of the EEG signal for new subjects. This allowed the derivation of indices of cortical activity at the level of phonemes and the disambiguation of responses to specific phonetic features (e.g., stop, plosive, and nasal consonants) with limited data. This objective approach has the potential to complement psychometric measures of speech processing in a wide variety of subjects.
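
One simple way to realise such a generic model, sketched below under the assumption that per-subject forward-model weights can be averaged (the paper's actual pipeline may differ): fit a ridge model per training subject, average the weights, and evaluate the averaged model on the held-out subject's short recording. All data and dimensions are surrogate.

import numpy as np

def fit_ridge(X, y, alpha=1.0):
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def generic_model_r(train_X, train_Y, test_X, test_y):
    # Fit one forward model per training subject, average the weights, and
    # score the averaged model on the held-out subject.
    weights = [fit_ridge(X, y) for X, y in zip(train_X, train_Y)]
    w_generic = np.mean(weights, axis=0)
    return np.corrcoef(test_X @ w_generic, test_y)[0, 1]

rng = np.random.default_rng(5)
n, p = 4000, 40                       # ~10 min of lagged features per subject (surrogate)
w_true = rng.standard_normal(p)
subj_X = [rng.standard_normal((n, p)) for _ in range(10)]
subj_y = [X @ w_true + rng.standard_normal(n) for X in subj_X]
r = generic_model_r(subj_X[:9], subj_y[:9], subj_X[9], subj_y[9])   # 9 train, 1 test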


Subjects
Auditory Cortex/physiology, Evoked Potentials, Auditory/physiology, Language Disorders/diagnosis, Speech Perception/physiology, Acoustic Stimulation, Adult, Algorithms, Electroencephalography, Female, Healthy Volunteers, Humans, Language, Male, Phonetics, Signal Processing, Computer-Assisted, Speech/physiology, Young Adult
10.
J Neurosci ; 36(38): 9888-95, 2016 09 21.
Article in English | MEDLINE | ID: mdl-27656026

ABSTRACT

Speech comprehension is improved by viewing a speaker's face, especially in adverse hearing conditions, a principle known as inverse effectiveness. However, the neural mechanisms that help to optimize how we integrate auditory and visual speech in such suboptimal conversational environments are not yet fully understood. Using human EEG recordings, we examined how visual speech enhances the cortical representation of auditory speech at a signal-to-noise ratio that maximized the perceptual benefit conferred by multisensory processing relative to unisensory processing. We found that the influence of visual input on the neural tracking of the audio speech signal was significantly greater in noisy than in quiet listening conditions, consistent with the principle of inverse effectiveness. Although envelope tracking during audio-only speech was greatly reduced by background noise at an early processing stage, it was markedly restored by the addition of visual speech input. In background noise, multisensory integration occurred at much lower frequencies and was shown to predict the multisensory gain in behavioral performance at a time lag of ∼250 ms. Critically, we demonstrated that inverse effectiveness, in the context of natural audiovisual (AV) speech processing, relies on crossmodal integration over long temporal windows. Our findings suggest that disparate integration mechanisms contribute to the efficient processing of AV speech in background noise. SIGNIFICANCE STATEMENT: The behavioral benefit of seeing a speaker's face during conversation is especially pronounced in challenging listening environments. However, the neural mechanisms underlying this phenomenon, known as inverse effectiveness, have not yet been established. Here, we examine this in the human brain using natural speech-in-noise stimuli that were designed specifically to maximize the behavioral benefit of audiovisual (AV) speech. We find that this benefit arises from our ability to integrate multimodal information over longer periods of time. Our data also suggest that the addition of visual speech restores early tracking of the acoustic speech signal during excessive background noise. These findings support and extend current mechanistic perspectives on AV speech perception.


Subjects
Evoked Potentials/physiology, Models, Neurological, Speech Perception/physiology, Visual Perception/physiology, Acoustic Stimulation, Adult, Analysis of Variance, Electroencephalography, Female, Humans, Male, Photic Stimulation, Sound Spectrography, Time Factors, Young Adult
11.
J Neurosci ; 35(42): 14195-204, 2015 Oct 21.
Article in English | MEDLINE | ID: mdl-26490860

ABSTRACT

Congruent audiovisual speech enhances our ability to comprehend a speaker, even in noise-free conditions. When incongruent auditory and visual information is presented concurrently, it can hinder a listener's perception and even cause him or her to perceive information that was not presented in either modality. Efforts to investigate the neural basis of these effects have often focused on the special case of discrete audiovisual syllables that are spatially and temporally congruent, with less work done on the case of natural, continuous speech. Recent electrophysiological studies have demonstrated that cortical response measures to continuous auditory speech can be easily obtained using multivariate analysis methods. Here, we apply such methods to the case of audiovisual speech and, importantly, present a novel framework for indexing multisensory integration in the context of continuous speech. Specifically, we examine how the temporal and contextual congruency of ongoing audiovisual speech affects the cortical encoding of the speech envelope in humans using electroencephalography. We demonstrate that the cortical representation of the speech envelope is enhanced by the presentation of congruent audiovisual speech in noise-free conditions. Furthermore, we show that this is likely attributable to the contribution of neural generators that are not particularly active during unimodal stimulation and that it is most prominent at the temporal scale corresponding to syllabic rate (2-6 Hz). Finally, our data suggest that neural entrainment to the speech envelope is inhibited when the auditory and visual streams are incongruent both temporally and contextually. SIGNIFICANCE STATEMENT: Seeing a speaker's face as he or she talks can greatly help in understanding what the speaker is saying. This is because the speaker's facial movements relay information not only about what the speaker is saying but also, importantly, about when the speaker is saying it. Studying how the brain uses this timing relationship to combine information from continuous auditory and visual speech has traditionally been methodologically difficult. Here we introduce a new approach for doing this using relatively inexpensive and noninvasive scalp recordings. Specifically, we show that the brain's representation of auditory speech is enhanced when the accompanying visual speech signal shares the same timing. Furthermore, we show that this enhancement is most pronounced at a time scale that corresponds to mean syllable length.


Subjects
Evoked Potentials, Auditory/physiology, Evoked Potentials, Visual/physiology, Speech Perception/physiology, Visual Perception/physiology, Acoustic Stimulation, Adult, Analysis of Variance, Brain Mapping, Electroencephalography, Electromyography, Female, Humans, Male, Photic Stimulation, Reaction Time, Young Adult
12.
J Neurosci ; 35(18): 7256-63, 2015 May 06.
Article in English | MEDLINE | ID: mdl-25948273

ABSTRACT

The human brain has evolved to operate effectively in highly complex acoustic environments, segregating multiple sound sources into perceptually distinct auditory objects. A recent theory seeks to explain this ability by arguing that stream segregation occurs primarily due to the temporal coherence of the neural populations that encode the various features of an individual acoustic source. This theory has received support from both psychoacoustic and functional magnetic resonance imaging (fMRI) studies that use stimuli which model complex acoustic environments. Termed stochastic figure-ground (SFG) stimuli, they are composed of a "figure" and background that overlap in spectrotemporal space, such that the only way to segregate the figure is by computing the coherence of its frequency components over time. Here, we extend these psychoacoustic and fMRI findings by using the greater temporal resolution of electroencephalography to investigate the neural computation of temporal coherence. We present subjects with modified SFG stimuli wherein the temporal coherence of the figure is modulated stochastically over time, which allows us to use linear regression methods to extract a signature of the neural processing of this temporal coherence. We do this under both active and passive listening conditions. Our findings show an early effect of coherence during passive listening, lasting from ∼115 to 185 ms post-stimulus. When subjects are actively listening to the stimuli, these responses are larger and last longer, up to ∼265 ms. These findings provide evidence for early and preattentive neural computations of temporal coherence that are enhanced by active analysis of an auditory scene.


Subjects
Acoustic Stimulation/methods, Auditory Pathways/physiology, Auditory Perception/physiology, Brain Mapping/methods, Psychoacoustics, Adult, Electroencephalography/methods, Female, Humans, Magnetic Resonance Imaging/methods, Male, Time Factors, Young Adult
13.
Cereb Cortex ; 25(7): 1697-706, 2015 Jul.
Article in English | MEDLINE | ID: mdl-24429136

ABSTRACT

How humans solve the cocktail party problem remains unknown. However, progress has been made recently thanks to the realization that cortical activity tracks the amplitude envelope of speech. This has led to the development of regression methods for studying the neurophysiology of continuous speech. One such method, known as stimulus-reconstruction, has been successfully utilized with cortical surface recordings and magnetoencephalography (MEG). However, the former is invasive and gives a relatively restricted view of processing along the auditory hierarchy, whereas the latter is expensive and rare. Thus it would be extremely useful for research in many populations if stimulus-reconstruction was effective using electroencephalography (EEG), a widely available and inexpensive technology. Here we show that single-trial (≈60 s) unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multispeaker environment. Furthermore, we show a significant correlation between our EEG-based measure of attention and performance on a high-level attention task. In addition, by attempting to decode attention at individual latencies, we identify neural processing at ∼200 ms as being critical for solving the cocktail party problem. These findings open up new avenues for studying the ongoing dynamics of cognition using EEG and for developing effective and natural brain-computer interfaces.
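
The core stimulus-reconstruction attention decode can be sketched as follows: train a backward model on the attended envelope, reconstruct the envelope from a trial's EEG, and label the trial by whichever talker's envelope correlates better with the reconstruction. Lag ranges, regularisation, and the surrogate data below are illustrative only, not the study's parameters.

import numpy as np

def lagged(eeg, n_lags):
    # Stack time-lagged copies of the EEG (wrap-around ignored for brevity).
    return np.hstack([np.roll(eeg, k, axis=0) for k in range(n_lags)])

def train_decoder(eeg, attended_env, n_lags=16, alpha=1e2):
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ attended_env)

def decode_trial(eeg, env_a, env_b, decoder, n_lags=16):
    recon = lagged(eeg, n_lags) @ decoder
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return "A" if r_a > r_b else "B"   # decoded attended talker for this trial

rng = np.random.default_rng(6)
n = 60 * 64                                   # one ~60 s trial at 64 Hz (surrogate)
eeg = rng.standard_normal((n, 32))
env_attended = np.abs(rng.standard_normal(n))
env_unattended = np.abs(rng.standard_normal(n))
decoder = train_decoder(eeg, env_attended)
print(decode_trial(eeg, env_attended, env_unattended, decoder))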


Subjects
Attention/physiology, Brain/physiology, Electroencephalography/methods, Signal Processing, Computer-Assisted, Speech Perception/physiology, Acoustic Stimulation, Adult, Female, Humans, Male, Neuropsychological Tests, Time Factors
14.
J Neurophysiol ; 111(7): 1400-8, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24401714

ABSTRACT

Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information.


Subjects
Auditory Perception/physiology, Brain Waves/physiology, Cerebral Cortex/physiology, Speech/physiology, Visual Perception/physiology, Acoustic Stimulation, Adult, Brain Mapping, Electroencephalography, Female, Humans, Male, Photic Stimulation, Reaction Time, Young Adult
15.
Neuroreport ; 25(4): 219-25, 2014 Mar 05.
Article in English | MEDLINE | ID: mdl-24231831

ABSTRACT

Auditory selective attention is the ability to enhance the processing of a single sound source, while simultaneously suppressing the processing of other competing sound sources. Recent research has addressed a long-running debate by showing that endogenous attention produces effects on obligatory sensory responses to continuous and competing auditory stimuli. However, until now, this result has only been shown under conditions where the competing stimuli differed in both their frequency characteristics and, importantly, their spatial location. Thus, it is unknown whether endogenous selective attention based only on nonspatial features modulates obligatory sensory processing. Here, we investigate this issue using a diotic paradigm, such that the competing auditory stimuli differed in frequency but had no separation in space. We find a significant effect of attention on electroencephalogram-based measures of obligatory sensory processing at several poststimulus latencies. We discuss these results in terms of previous research on feature-based attention and by comparing our findings with the previous work using stimuli that differed both in terms of spatial and frequency-based characteristics.


Subjects
Attention/physiology, Auditory Perception, Brain/physiology, Discrimination, Psychological, Acoustic Stimulation, Adult, Electroencephalography, Evoked Potentials, Auditory, Female, Humans, Male, Psychoacoustics, Task Performance and Analysis, Time Factors, Young Adult
16.
Article in English | MEDLINE | ID: mdl-24110309

ABSTRACT

Traditionally, the use of electroencephalography (EEG) to study the neural processing of natural stimuli in humans has been hampered by the need to repeatedly present discrete stimuli. Progress has been made recently by the realization that cortical population activity tracks the amplitude envelope of speech stimuli. This has led to studies using linear regression methods which allow the presentation of continuous speech. One such method, known as stimulus reconstruction, has so far only been utilized in multi-electrode cortical surface recordings and magnetoencephalography (MEG). Here, in two studies, we show that such an approach is also possible with EEG, despite the poorer signal-to-noise ratio of the data. In the first study, we show that it is possible to decode attention in a naturalistic cocktail party scenario on a single trial (≈60 s) basis. In the second, we show that the representation of the envelope of auditory speech in the cortex is more robust when accompanied by visual speech. The sensitivity of this inexpensive, widely-accessible technology for the online monitoring of natural stimuli has implications for the design of future studies of the cocktail party problem and for the implementation of EEG-based brain-computer interfaces.


Subjects
Attention/physiology, Electroencephalography/methods, Speech/physiology, Visual Perception/physiology, Acoustic Stimulation, Adult, Behavior, Female, Humans, Male
17.
Eur J Neurosci ; 35(9): 1497-503, 2012 May.
Article in English | MEDLINE | ID: mdl-22462504

ABSTRACT

Distinguishing between speakers and focusing attention on one speaker in multi-speaker environments is extremely important in everyday life. Exactly how the brain accomplishes this feat and, in particular, the precise temporal dynamics of this attentional deployment are as yet unknown. A long history of behavioral research using dichotic listening paradigms has debated whether selective attention to speech operates at an early stage of processing based on the physical characteristics of the stimulus or at a later stage during semantic processing. With its poor temporal resolution fMRI has contributed little to the debate, while EEG-ERP paradigms have been hampered by the need to average the EEG in response to discrete stimuli which are superimposed onto ongoing speech. This presents a number of problems, foremost among which is that early attention effects in the form of endogenously generated potentials can be so temporally broad as to mask later attention effects based on the higher level processing of the speech stream. Here we overcome this issue by utilizing the AESPA (auditory evoked spread spectrum analysis) method which allows us to extract temporally detailed responses to two concurrently presented speech streams in natural cocktail-party-like attentional conditions without the need for superimposed probes. We show attentional effects on exogenous stimulus processing in the 200-220 ms range in the left hemisphere. We discuss these effects within the context of research on auditory scene analysis and in terms of a flexible locus of attention that can be deployed at a particular processing stage depending on the task.


Subjects
Attention/physiology, Evoked Potentials, Auditory/physiology, Speech Perception/physiology, Acoustic Stimulation, Adult, Analysis of Variance, Electroencephalography, Female, Functional Laterality, Humans, Male, Reaction Time, Young Adult
18.
Cereb Cortex ; 21(6): 1223-30, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21068187

ABSTRACT

Endogenous attention is the self-directed focus of attention to a region or feature of the environment. In this study, we assess the effects of endogenous attention on temporally detailed responses to continuous and competing auditory stimuli obtained using the novel auditory evoked spread spectrum analysis (AESPA) method. There is some debate as to whether an enhancement of sensory processing is involved in endogenous attention. It has been suggested that attentional effects are not due to increased sensory activity but are due to engagement of separate temporally overlapping nonsensory attention-related activity. There are also issues with the fact that the influence of exogenous attention grabbing mechanisms may hamper studies of endogenous attention. Due to the nature of the AESPA method, the obtained responses represent activity directly related to the stimulus envelope and thus predominantly correspond to obligatory sensory processing. In addition, the continuous nature of the stimuli minimizes exogenous attentional influence. We found attentional modulations at ~136 ms (during the Nc component of the AESPA response) and localized this to auditory cortex. Although the involvement of separate nonsensory attentional centers cannot be ruled out, these findings clearly demonstrate that endogenous attention does modulate obligatory sensory activity in auditory cortex.


Subjects
Attention/physiology, Auditory Cortex/physiology, Auditory Perception/physiology, Space Perception/physiology, Acoustic Stimulation/methods, Adult, Analysis of Variance, Brain Mapping, Electroencephalography, Evoked Potentials, Auditory/physiology, Female, Humans, Male, Psychoacoustics, Reaction Time/physiology, Young Adult
19.
Eur J Neurosci ; 31(1): 189-93, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20092565

ABSTRACT

The human auditory system has evolved to efficiently process individual streams of speech. However, obtaining temporally detailed responses to distinct continuous natural speech streams has hitherto been impracticable using standard neurophysiological techniques. Here a method is described which provides for the estimation of a temporally precise electrophysiological response to uninterrupted natural speech. We have termed this response AESPA (Auditory Evoked Spread Spectrum Analysis) and it represents an estimate of the impulse response of the auditory system. It is obtained by assuming that the recorded electrophysiological function represents a convolution of the amplitude envelope of a continuous speech stream with the to-be-estimated impulse response. We present examples of these responses using both scalp and intracranially recorded human EEG, which were obtained while subjects listened to a binaurally presented recording of a male speaker reading naturally from a classic work of fiction. This method expands the arsenal of stimulation types that can now be effectively used to derive auditory evoked responses and allows for the use of considerably more ecologically valid stimulation parameters. Some implications for future research efforts are presented.
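
Under the stated convolution assumption (the recorded EEG approximates the speech envelope convolved with an impulse response), one way to estimate that response is a regularised frequency-domain deconvolution, sketched below on surrogate data. This is an illustrative estimator, not necessarily the estimation procedure used for AESPA in the paper.

import numpy as np

def estimate_impulse_response(envelope, eeg, n_taps, reg=1e-3):
    # Regularised frequency-domain deconvolution: cross-spectrum divided by
    # the (regularised) power spectrum of the envelope.
    n = len(envelope)
    E = np.fft.rfft(envelope, n)
    R = np.fft.rfft(eeg, n)
    H = (np.conj(E) * R) / (np.abs(E) ** 2 + reg * np.mean(np.abs(E) ** 2))
    return np.fft.irfft(H, n)[:n_taps]   # first n_taps samples of the estimate

fs = 128
rng = np.random.default_rng(7)
true_h = np.exp(-np.arange(32) / 8.0) * np.sin(np.arange(32) / 2.0)   # toy response
env = np.abs(rng.standard_normal(120 * fs))                           # surrogate envelope
eeg = np.convolve(env, true_h, mode="full")[:len(env)] + rng.standard_normal(len(env))
h_est = estimate_impulse_response(env, eeg, n_taps=32)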


Subjects
Brain/physiology, Evoked Potentials, Auditory, Speech Perception/physiology, Acoustic Stimulation, Adult, Electrodes, Implanted, Electroencephalography/methods, Epilepsy, Female, Humans, Male, Speech, Time Factors, Young Adult
20.
J Neurophysiol ; 102(1): 349-59, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19439675

ABSTRACT

In natural environments complex and continuous auditory stimulation is virtually ubiquitous. The human auditory system has evolved to efficiently process an infinity of everyday sounds, which range from short, simple bursts of noise to signals with a much higher order of information such as speech. Investigation of temporal processing in this system using the event-related potential (ERP) technique has led to great advances in our knowledge. However, this method is restricted by the need to present simple, discrete, repeated stimuli to obtain a useful response. Alternatively the continuous auditory steady-state response is used, although this method reduces the evoked response to its fundamental frequency component at the expense of useful information on the timing of response transmission through the auditory system. In this report, we describe a method for eliciting a novel ERP, which circumvents these limitations, known as the AESPA (auditory-evoked spread spectrum analysis). This method uses rapid amplitude modulation of audio carrier signals to estimate the impulse response of the auditory system. We show AESPA responses with high signal-to-noise ratios obtained using two types of carrier wave: a 1-kHz tone and broadband noise. To characterize these responses, they are compared with auditory-evoked potentials elicited using standard techniques. A number of similarities and differences between the responses are noted and these are discussed in light of the differing stimulation and analysis methods used. Data are presented that demonstrate the generalizability of the AESPA method and a number of applications are proposed.
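
A sketch of how such a rapidly amplitude-modulated stimulus might be constructed: a band-limited random modulation signal multiplied onto either a 1 kHz tone or a broadband-noise carrier, with the known modulator later regressed against the EEG to estimate the impulse response. The modulation bandwidth and scaling below are assumptions for illustration, not the paper's stimulus parameters.

import numpy as np
from scipy.signal import butter, filtfilt

fs = 44100
dur_s = 10.0
t = np.arange(int(fs * dur_s)) / fs
rng = np.random.default_rng(8)

# Band-limited (here 0-30 Hz, an assumed value) random modulation signal scaled to [0, 1]
b, a = butter(4, 30 / (fs / 2), btype="low")
mod = filtfilt(b, a, rng.standard_normal(t.size))
mod = (mod - mod.min()) / (mod.max() - mod.min())

tone_carrier = np.sin(2 * np.pi * 1000 * t)     # 1 kHz tone carrier
noise_carrier = rng.standard_normal(t.size)     # broadband noise carrier

stimulus = mod * tone_carrier   # or mod * noise_carrier; the known modulation
# signal is what gets regressed against the EEG to estimate the impulse response.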


Subjects
Acoustic Stimulation/methods, Auditory Pathways/physiology, Auditory Perception/physiology, Brain Mapping, Evoked Potentials, Auditory/physiology, Adult, Auditory Threshold/physiology, Electroencephalography/methods, Female, Fourier Analysis, Humans, Male, Reaction Time/physiology, Statistics as Topic, Young Adult