ABSTRACT
Selective attention is essential for the processing of multi-speaker auditory scenes because they require the perceptual segregation of the relevant speech ("target") from irrelevant speech ("distractors"). For simple sounds, it has been suggested that the processing of multiple distractor sounds depends on bottom-up factors affecting task performance. However, it remains unclear whether such dependency applies to naturalistic multi-speaker auditory scenes. In this study, we tested the hypothesis that increased perceptual demand (the processing requirement posed by the scene to separate the target speech) reduces the cortical processing of distractor speech, thereby decreasing its perceptual segregation. Human participants were presented with auditory scenes including three speakers and asked to selectively attend to one speaker while their EEG was acquired. The perceptual demand of this selective listening task was varied by introducing an auditory cue (interaural time differences, ITDs) for segregating the target from the distractor speakers, while the distractors were matched to each other in ITD and loudness. We obtained a quantitative measure of the cortical segregation of distractor speakers by assessing the difference in how accurately speech-envelope-following EEG responses could be predicted by models of averaged distractor speech versus models of individual distractor speech. In agreement with our hypothesis, results show that interaural segregation cues led to improved behavioral word-recognition performance and stronger cortical segregation of the distractor speakers. The neural effect was strongest in the δ-band and at early delays (0-200 ms). Our results indicate that during low perceptual demand, the human cortex represents individual distractor speech signals as more segregated. This suggests that, in addition to purely acoustical properties, the cortical processing of distractor speakers depends on factors such as perceptual demand.
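The central comparison here, between encoding models based on the averaged versus the individual distractor envelopes, can be illustrated with a minimal sketch. The code below uses synthetic signals and hypothetical variable names; ridge regression over 0-200 ms lags stands in for the authors' envelope-following-response models.

```python
# Minimal sketch (not the authors' code): compare how well lagged linear models based on
# (a) the averaged distractor envelope vs. (b) the individual distractor envelopes
# predict a single EEG channel. All signals are synthetic placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

fs = 64                                      # assumed sampling rate after downsampling (Hz)
rng = np.random.default_rng(0)
n = 60 * fs                                  # one minute of data
env1, env2 = rng.random(n), rng.random(n)    # placeholder distractor envelopes
eeg = rng.standard_normal(n)                 # placeholder EEG channel

def lagged(x, max_lag):
    """Stack time-lagged copies of x (0 .. max_lag samples) as regression columns."""
    return np.column_stack([np.roll(x, lag) for lag in range(max_lag + 1)])

max_lag = int(0.2 * fs)                      # 0-200 ms delays, as in the study
X_avg = lagged((env1 + env2) / 2, max_lag)   # integrated-distractor model
X_ind = np.hstack([lagged(env1, max_lag),    # segregated-distractor model
                   lagged(env2, max_lag)])

for name, X in [("averaged", X_avg), ("individual", X_ind)]:
    # values near zero are expected for random data; real data would separate the models
    r2 = cross_val_score(Ridge(alpha=1.0), X, eeg, cv=5).mean()
    print(f"{name} distractor model: mean cross-validated R^2 = {r2:.3f}")
```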
Subject(s)
Attention/physiology; Cerebral Cortex/physiology; Speech Perception/physiology; Acoustic Stimulation/methods; Adult; Electroencephalography/methods; Female; Humans; Male; Noise; Signal Processing, Computer-Assisted; Young Adult
ABSTRACT
Multi-Voxel Pattern Analysis (MVPA) is a well-established tool to disclose weak, distributed effects in brain activity patterns. A model's generalization ability is assessed by testing it on new, unseen data. However, when only limited data are available, decoding success is estimated using cross-validation. There is general consensus on assessing the statistical significance of cross-validated accuracy with non-parametric permutation tests. In this work, we focus on the false-positive control of different permutation strategies and on the statistical power of different cross-validation schemes. With simulations, we show that estimating the entire cross-validation error on each permuted dataset is the only statistically valid permutation strategy. Furthermore, using both simulations and real data from the HCP WU-Minn 3T fMRI dataset, we show that, among the different cross-validation schemes, repeated split-half cross-validation is the most powerful, despite yielding slightly lower classification accuracies than the other schemes. Our findings provide additional insights into the optimization of the experimental design for MVPA, highlighting the benefits of having many short runs.
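The permutation strategy described as valid, re-running the entire cross-validation on every label-permuted dataset, is what scikit-learn's permutation_test_score implements; combined with a repeated split-half scheme, a minimal sketch on synthetic data might look as follows (trial counts and voxel numbers are arbitrary assumptions).

```python
# Minimal sketch (synthetic data): for each permutation, labels are shuffled and the
# whole cross-validation is recomputed; the cross-validation itself is repeated split-half.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import RepeatedStratifiedKFold, permutation_test_score

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 200))          # 40 trials x 200 voxels (placeholder patterns)
y = np.repeat([0, 1], 20)                   # two experimental conditions

cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=25, random_state=0)  # repeated split-half
acc, perm_scores, p = permutation_test_score(
    LinearSVC(max_iter=5000), X, y, cv=cv, n_permutations=100, scoring="accuracy")
print(f"accuracy = {acc:.2f}, permutation p-value = {p:.3f}")
```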
Subject(s)
Brain/diagnostic imaging; Functional Neuroimaging/methods; Image Processing, Computer-Assisted/methods; Computer Simulation; Humans; Magnetic Resonance Imaging; Research Design
ABSTRACT
When speech perception is difficult, one way listeners adjust is by reconfiguring phoneme category boundaries, drawing on contextual information. Both lexical knowledge and lipreading cues are used in this way, but it remains unknown whether these two differing forms of perceptual learning are similar at a neural level. This study compared phoneme boundary adjustments driven by lexical or audiovisual cues, using ultra-high-field 7-T fMRI. During imaging, participants heard exposure stimuli and test stimuli. Exposure stimuli for lexical retuning were audio recordings of words, and those for audiovisual recalibration were audio-video recordings of lip movements during utterances of pseudowords. Test stimuli were ambiguous phonetic strings presented without context, and listeners reported what phoneme they heard. Reports reflected phoneme biases in preceding exposure blocks (e.g., more reported /p/ after /p/-biased exposure). Analysis of corresponding brain responses indicated that both forms of cue use were associated with a network of activity across the temporal cortex, plus parietal, insula, and motor areas. Audiovisual recalibration also elicited significant occipital cortex activity despite the lack of visual stimuli. Activity levels in several ROIs also covaried with strength of audiovisual recalibration, with greater activity accompanying larger recalibration shifts. Similar activation patterns appeared for lexical retuning, but here, no significant ROIs were identified. Audiovisual and lexical forms of perceptual learning thus induce largely similar brain response patterns. However, audiovisual recalibration involves additional visual cortex contributions, suggesting that previously acquired visual information (on lip movements) is retrieved and deployed to disambiguate auditory perception.
Subject(s)
Phonetics; Speech Perception; Auditory Perception/physiology; Humans; Learning; Lipreading; Speech Perception/physiology
ABSTRACT
The primary and posterior auditory cortex (AC) are known for their sensitivity to spatial information, but how this information is processed is not yet understood. AC that is sensitive to spatial manipulations is also modulated by the number of auditory streams present in a scene (Smith et al., 2010), suggesting that spatial and nonspatial cues are integrated for stream segregation. We reasoned that, if this is the case, then it is the distance between sounds rather than their absolute positions that is essential. To test this hypothesis, we measured human brain activity in response to spatially separated concurrent sounds with fMRI at 7 tesla in five men and five women. Stimuli were spatialized amplitude-modulated broadband noises recorded for each participant via in-ear microphones before scanning. Using a linear support vector machine classifier, we investigated whether sound location and/or location plus spatial separation between sounds could be decoded from the activity in Heschl's gyrus and the planum temporale. The classifier was successful only when comparing patterns associated with the conditions that had the largest difference in perceptual spatial separation. Our pattern of results suggests that the representation of spatial separation is not merely the combination of single locations, but rather is an independent feature of the auditory scene.
SIGNIFICANCE STATEMENT: Often, when we think of auditory spatial information, we think of where sounds are coming from, that is, the process of localization. However, this information can also be used in scene analysis, the process of grouping and segregating features of a soundwave into objects. Essentially, when sounds are further apart, they are more likely to be segregated into separate streams. Here, we provide evidence that activity in the human auditory cortex represents the spatial separation between sounds rather than their absolute locations, indicating that scene analysis and localization processes may be independent.
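A minimal sketch of the kind of pairwise decoding analysis described above: cross-validated linear-SVM classification of ROI voxel patterns across pairs of spatial conditions, with leave-one-run-out cross-validation. Data, condition names, and dimensions are all placeholders, not the authors' design.

```python
# Minimal sketch (synthetic data, hypothetical condition labels): pairwise linear-SVM
# decoding of spatial conditions from ROI voxel patterns, cross-validated across runs.
import itertools
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)
n_runs, n_conditions, n_voxels = 8, 4, 150                 # assumed design
conditions = ["co-located", "near", "mid", "far"]           # hypothetical separation conditions
X = rng.standard_normal((n_runs * n_conditions, n_voxels))  # one pattern per run and condition
y = np.tile(np.arange(n_conditions), n_runs)
runs = np.repeat(np.arange(n_runs), n_conditions)

logo = LeaveOneGroupOut()
for a, b in itertools.combinations(range(n_conditions), 2):
    sel = np.isin(y, [a, b])
    acc = cross_val_score(SVC(kernel="linear"), X[sel], y[sel],
                          groups=runs[sel], cv=logo).mean()
    print(f"{conditions[a]} vs {conditions[b]}: accuracy = {acc:.2f}")
```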
Subject(s)
Auditory Cortex/physiology; Sound Localization/physiology; Space Perception/physiology; Acoustic Stimulation; Adult; Auditory Cortex/diagnostic imaging; Auditory Perception/physiology; Brain Mapping; Computer Simulation; Female; Humans; Magnetic Resonance Imaging; Male; Support Vector Machine
ABSTRACT
Viewing a speaker's lip movements can improve the brain's ability to 'track' the amplitude envelope of the auditory speech signal and facilitate intelligibility. Whether such neurobehavioral benefits can also arise from tactually sensing the speech envelope on the skin is unclear. We hypothesized that tactile speech envelopes can improve neural tracking of auditory speech and thereby facilitate intelligibility. To test this, we applied continuous auditory speech and vibrotactile speech-envelope-shaped stimulation at various asynchronies to the ears and index fingers of normally-hearing human listeners while simultaneously assessing speech-recognition performance and cortical speech-envelope tracking with electroencephalography. Results indicate that tactile speech-shaped envelopes improve the cortical tracking, but not intelligibility, of degraded auditory speech. The cortical speech-tracking benefit occurs for tactile input leading the auditory input by 100 ms or less, emerges in the EEG during an early time window (~0-150 ms), and in particular involves cortical activity in the delta (1-4 Hz) range. These characteristics hint at a predictive mechanism for multisensory integration of complex slow time-varying inputs that might play a role in tactile speech communication.
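A minimal sketch of the delta-band tracking logic (not the authors' pipeline): band-limit both signals to 1-4 Hz and inspect envelope-EEG correlations over a range of lags. Sampling rate, lag grid, and signal names are illustrative assumptions.

```python
# Minimal sketch (synthetic signals): delta-band envelope tracking as a lag-resolved
# correlation between a speech-shaped envelope and a single EEG channel.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 128
t = np.arange(0, 60, 1 / fs)
envelope = np.abs(np.sin(2 * np.pi * 2.5 * t))            # toy speech-shaped envelope
rng = np.random.default_rng(2)
eeg = np.roll(envelope, int(0.1 * fs)) + 0.5 * rng.standard_normal(t.size)  # EEG lags by ~100 ms

b, a = butter(3, [1, 4], btype="bandpass", fs=fs)          # delta band (1-4 Hz)
eeg_delta = filtfilt(b, a, eeg)
env_delta = filtfilt(b, a, envelope)

for lag_ms in (0, 50, 100, 150, 200):
    lag = int(lag_ms / 1000 * fs)
    r = np.corrcoef(env_delta[: t.size - lag], eeg_delta[lag:])[0, 1]
    print(f"envelope -> EEG lag {lag_ms:3d} ms: r = {r:.2f}")
```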
Subject(s)
Cerebral Cortex/physiology; Delta Rhythm/physiology; Electroencephalography; Speech Intelligibility; Speech Perception/physiology; Touch Perception/physiology; Adolescent; Adult; Female; Humans; Male; Middle Aged; Physical Stimulation; Time Factors; Young Adult
ABSTRACT
Often, in everyday life, we encounter auditory scenes comprising multiple simultaneous sounds and succeed in selectively attending to only one sound, typically the most relevant for ongoing behavior. Studies using basic sounds and two-talker stimuli have shown that auditory selective attention aids this by enhancing the neural representations of the attended sound in auditory cortex. It remains unknown, however, whether and how this selective attention mechanism operates on representations of auditory scenes containing natural sounds of different categories. In this high-field fMRI study, we presented participants with simultaneous voices and musical instruments while manipulating their focus of attention. We found an attentional enhancement of neural sound representations in temporal cortex (as defined by spatial activation patterns) at locations that depended on the attended category (i.e., voices or instruments). In contrast, we found that in frontal cortex the site of enhancement was independent of the attended category and the same regions could flexibly represent any attended sound regardless of its category. These results help elucidate the interacting mechanisms of bottom-up and top-down processing when listening to real-life scenes composed of multiple sound categories.
Subject(s)
Attention/physiology; Auditory Perception/physiology; Cerebral Cortex/physiology; Acoustic Stimulation/methods; Adult; Brain Mapping/methods; Female; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Male; Young Adult
ABSTRACT
In everyday life, we process mixtures of a variety of sounds. This processing involves the segregation of auditory input and the attentive selection of the stream that is most relevant to current goals. For natural scenes with multiple irrelevant sounds, however, it is unclear how the human auditory system represents all the unattended sounds. In particular, it remains elusive whether the sensory input to the human auditory cortex of unattended sounds biases the cortical integration/segregation of these sounds in a similar way as for attended sounds. In this study, we tested this by asking participants to selectively listen to one of two speakers or music in an ongoing 1-min sound mixture while their cortical neural activity was measured with EEG. Using a stimulus reconstruction approach, we find better reconstruction of mixed unattended sounds compared to individual unattended sounds at two early cortical stages (70 ms and 150 ms) of the auditory processing hierarchy. Crucially, at the earlier processing stage (70 ms), this cortical bias to represent unattended sounds as integrated rather than segregated increases with increasing similarity of the unattended sounds. Our results reveal an important role of acoustical properties for the cortical segregation of unattended auditory streams in natural listening situations. They further corroborate the notion that selective attention contributes functionally to cortical stream segregation. These findings highlight that a common, acoustics-based grouping principle governs the cortical representation of auditory streams not only inside but also outside the listener's focus of attention.
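A minimal sketch of a stimulus-reconstruction (backward-model) comparison, using synthetic data and hypothetical names: a ridge decoder maps multichannel EEG back to an envelope, and the reconstruction is then compared against the summed (integrated) versus the individual unattended envelopes. Time lags are omitted for brevity.

```python
# Minimal sketch (synthetic data): reconstruct an envelope from multichannel EEG with
# ridge regression and compare the reconstruction to the mixture vs. individual sources.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
fs, n = 64, 64 * 60
env_a, env_b = rng.random(n), rng.random(n)                 # two unattended-source envelopes
mix = env_a + env_b
eeg = np.column_stack([mix + 0.5 * rng.standard_normal(n) for _ in range(32)])  # 32 channels

X_train, X_test, idx_train, idx_test = train_test_split(
    eeg, np.arange(n), test_size=0.25, shuffle=False)        # hold out the final quarter
model = Ridge(alpha=10.0).fit(X_train, mix[idx_train])       # decoder trained on the mixture
recon = model.predict(X_test)

for name, env in [("mixture", mix), ("source A", env_a), ("source B", env_b)]:
    r = np.corrcoef(recon, env[idx_test])[0, 1]
    print(f"reconstruction vs {name}: r = {r:.2f}")
```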
Subject(s)
Attention/physiology; Auditory Perception/physiology; Cerebral Cortex/physiology; Electroencephalography/methods; Functional Neuroimaging/methods; Music; Speech Perception/physiology; Adolescent; Adult; Auditory Cortex/physiology; Female; Humans; Male; Young Adult
ABSTRACT
Pitch is a perceptual attribute related to the fundamental frequency (or periodicity) of a sound. So far, the cortical processing of pitch has been investigated mostly using synthetic sounds. However, the complex harmonic structure of natural sounds may require different mechanisms for the extraction and analysis of pitch. This study investigated the neural representation of pitch in human auditory cortex using model-based encoding and decoding analyses of high-field (7 T) functional magnetic resonance imaging (fMRI) data collected while participants listened to a wide range of real-life sounds. Specifically, we modeled the fMRI responses as a function of the sounds' perceived pitch height and salience (related to the fundamental frequency and the harmonic structure, respectively), which we estimated with a computational pitch-extraction algorithm (de Cheveigné and Kawahara, 2002). First, using single-voxel fMRI encoding, we identified a pitch-coding region in the antero-lateral Heschl's gyrus (HG) and adjacent superior temporal gyrus (STG). In these regions, the pitch representation model combining height and salience predicted the fMRI responses comparatively better than other models of acoustic processing and, in the right hemisphere, better than pitch representations based on height or salience alone. Second, using model-based decoding, we showed that multi-voxel response patterns of the identified regions are more informative of perceived pitch than those of the remainder of the auditory cortex. Further multivariate analyses showed that complementing a multi-resolution spectro-temporal sound representation with pitch produces a small but significant improvement to the decoding of complex sounds from fMRI response patterns. In sum, this work extends model-based fMRI encoding and decoding methods (previously employed to examine the representation and processing of acoustic sound features in the human auditory system) to the representation and processing of a relevant perceptual attribute such as pitch. Taken together, the results of our model-based encoding and decoding analyses indicated that the pitch of complex real-life sounds is extracted and processed in lateral HG/STG regions, at locations consistent with those indicated in several previous fMRI studies using synthetic sounds. Within these regions, pitch-related sound representations reflect the modulatory combination of the height and the salience of the pitch percept.
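The modeling logic, pitch height and salience estimated by a YIN-style tracker and then used as predictors of voxel responses, can be sketched as follows. The sketch assumes the librosa package (its example audio is fetched on first use), treats the voiced probability from pyin as a crude salience proxy, and simulates the voxel time course; none of this reproduces the authors' pipeline.

```python
# Minimal sketch: YIN-style pitch tracking plus a toy encoding model. The voxel signal
# is simulated, and voiced probability is only a rough stand-in for pitch salience.
import numpy as np
import librosa
from sklearn.linear_model import LinearRegression

y, sr = librosa.load(librosa.ex("trumpet"))                 # bundled example sound
f0, voiced, voiced_prob = librosa.pyin(y, fmin=60, fmax=1000, sr=sr)

height = np.log2(np.where(np.isnan(f0), np.nanmedian(f0), f0))   # log pitch height per frame
salience = voiced_prob                                            # crude salience proxy

X = np.column_stack([height, salience])
rng = np.random.default_rng(4)
voxel = 0.8 * height + 0.3 * salience + 0.1 * rng.standard_normal(height.size)  # simulated voxel
print("encoding-model R^2:", LinearRegression().fit(X, voxel).score(X, voxel))
```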
Subject(s)
Auditory Cortex/physiology; Brain Mapping/methods; Models, Neurological; Pitch Perception/physiology; Acoustic Stimulation; Adult; Evoked Potentials, Auditory/physiology; Female; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Male
ABSTRACT
In many everyday listening situations, an otherwise audible sound may go unnoticed amid multiple other sounds. This auditory phenomenon, called informational masking (IM), is sensitive to visual input and involves early (50-250 msec) activity in the auditory cortex (the so-called awareness-related negativity). It is still unclear whether and how the timing of visual input influences the neural correlates of IM in auditory cortex. To address this question, we obtained simultaneous behavioral and neural measures of IM from human listeners in the presence of a visual input stream and varied the asynchrony between the visual stream and the rhythmic auditory target stream (in-phase, antiphase, or random). Results show effects of cross-modal asynchrony on both target detectability (RT and sensitivity) and the awareness-related negativity measured with EEG, which were driven primarily by antiphasic audiovisual stimuli. The neural effect was limited to the interval shortly before listeners' behavioral report of the target. Our results indicate that the relative timing of visual input can influence the IM of a target sound in the human auditory cortex. They further show that this audiovisual influence occurs early during the perceptual buildup of the target sound. In summary, these findings provide novel insights into the interplay of IM and multisensory integration in the human brain.
Subject(s)
Auditory Cortex/physiology; Auditory Perception/physiology; Electroencephalography/methods; Evoked Potentials/physiology; Perceptual Masking/physiology; Visual Perception/physiology; Adolescent; Adult; Female; Humans; Male; Young Adult
ABSTRACT
Perceived roughness is associated with a variety of physical factors and multiple peripheral afferent types. The current study investigated whether this complexity of the mapping between physical and perceptual space is reflected at the cortical level. In an integrative psychophysical and imaging approach, we used dot pattern stimuli for which previous studies reported a simple linear relationship of interdot spacing and perceived spatial density and a more complex function of perceived roughness. Thus, by using both a roughness and a spatial estimation task, the physical and perceived stimulus characteristics could be dissociated, with the spatial density task controlling for the processing of low-level sensory aspects. Multivoxel pattern analysis was used to investigate which brain regions hold information indicative of the level of the perceived texture characteristics. While information about differences in perceived roughness was primarily available in higher-order cortices, that is, the operculo-insular cortex and a ventral visual cortex region, information about perceived spatial density could already be derived from early somatosensory and visual regions. This result indicates that cortical processing reflects the different complexities of the evaluated haptic texture dimensions. Furthermore, this study is to our knowledge the first to show a contribution of the visual cortex to tactile roughness perception.
Subject(s)
Space Perception/physiology; Touch Perception/physiology; Touch/physiology; Visual Cortex/physiology; Visual Perception/physiology; Adult; Brain Mapping; Cerebral Cortex/physiology; Female; Humans; Male; Somatosensory Cortex/physiology; Young Adult
ABSTRACT
Multivariate pattern analysis (MVPA) in fMRI has been used to extract information from distributed cortical activation patterns, which may go undetected in conventional univariate analysis. However, little is known about the physical and physiological underpinnings of MVPA in fMRI as well as about the effect of spatial smoothing on its performance. Several studies have addressed these issues, but their investigation was limited to the visual cortex at 3T with conflicting results. Here, we used ultra-high field (7T) fMRI to investigate the effect of spatial resolution and smoothing on decoding of speech content (vowels) and speaker identity from auditory cortical responses. To that end, we acquired high-resolution (1.1 mm isotropic) fMRI data and additionally reconstructed them at 2.2 and 3.3 mm in-plane spatial resolutions from the original k-space data. Furthermore, the data at each resolution were spatially smoothed with different 3D Gaussian kernel sizes (i.e. no smoothing or 1.1, 2.2, 3.3, 4.4, or 8.8 mm kernels). For all spatial resolutions and smoothing kernels, we demonstrate the feasibility of decoding speech content (vowel) and speaker identity at 7T using support vector machine (SVM) MVPA. In addition, we found that high spatial frequencies are informative for vowel decoding and that the relative contribution of high and low spatial frequencies is different across the two decoding tasks. Moderate smoothing (up to 2.2 mm) improved the accuracies for both decoding of vowels and speakers, possibly due to reduction of noise (e.g. residual motion artifacts or instrument noise) while still preserving information at high spatial frequency. In summary, our results show that, even with the same stimuli and within the same brain areas, the optimal spatial resolution for MVPA in fMRI depends on the specific decoding task of interest.
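A minimal sketch of the smoothing comparison, assuming synthetic volumes and arbitrary dimensions: apply 3D Gaussian kernels of increasing FWHM and track cross-validated SVM decoding accuracy.

```python
# Minimal sketch (synthetic volumes): 3D Gaussian smoothing at several kernel widths,
# followed by cross-validated linear-SVM decoding, analogous in spirit to the comparison above.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
voxel_mm, shape, n_trials = 1.1, (20, 20, 10), 60
y = np.repeat([0, 1], n_trials // 2)
signal = rng.standard_normal(shape) * 0.3                   # weak condition-specific pattern
vols = np.array([rng.standard_normal(shape) + lbl * signal for lbl in y])

for fwhm in (0.0, 1.1, 2.2, 3.3, 4.4, 8.8):                 # smoothing kernels in mm
    sigma = fwhm / (2.355 * voxel_mm)                        # FWHM -> Gaussian sigma in voxels
    sm = vols if fwhm == 0 else np.array([gaussian_filter(v, sigma) for v in vols])
    X = sm.reshape(n_trials, -1)
    acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
    print(f"FWHM {fwhm:3.1f} mm: accuracy = {acc:.2f}")
```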
Subject(s)
Brain Mapping/methods; Brain/physiology; Signal Processing, Computer-Assisted; Acoustic Stimulation; Adult; Female; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Male; Multivariate Analysis; Pattern Recognition, Automated; Speech Perception; Support Vector Machine
ABSTRACT
Selective attention to relevant sound properties is essential for everyday listening situations. It enables the formation of different perceptual representations of the same acoustic input and is at the basis of flexible and goal-dependent behavior. Here, we investigated the role of the human auditory cortex in forming behavior-dependent representations of sounds. We used single-trial fMRI and analyzed cortical responses collected while subjects listened to the same speech sounds (vowels /a/, /i/, and /u/) spoken by different speakers (boy, girl, male) and performed a delayed-match-to-sample task on either speech sound or speaker identity. Univariate analyses showed a task-specific activation increase in the right superior temporal gyrus/sulcus (STG/STS) during speaker categorization and in the right posterior temporal cortex during vowel categorization. Beyond regional differences in activation levels, multivariate classification of single-trial responses demonstrated that the success with which single speakers and vowels can be decoded from auditory cortical activation patterns depends on task demands and subjects' behavioral performance. Speaker/vowel classification relied on distinct but overlapping regions across the (right) mid-anterior STG/STS (speakers) and bilateral mid-posterior STG/STS (vowels), as well as the superior temporal plane including Heschl's gyrus/sulcus. The task dependency of speaker/vowel classification demonstrates that the informative fMRI response patterns reflect the top-down enhancement of behaviorally relevant sound representations. Furthermore, our findings suggest that successful selection, processing, and retention of task-relevant sound properties relies on the joint encoding of information across early and higher-order regions of the auditory cortex.
Subject(s)
Auditory Cortex/physiology; Phonetics; Speech Perception/physiology; Acoustic Stimulation/methods; Adult; Auditory Cortex/blood supply; Brain Mapping; Female; Functional Laterality; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Male; Oxygen; Psychoacoustics; Sound Spectrography; Young Adult
ABSTRACT
Bilinguals derive the same semantic concepts from equivalent, but acoustically different, words in their first and second languages. The neural mechanisms underlying the representation of language-independent concepts in the brain remain unclear. Here, we measured fMRI in human bilingual listeners and reveal that response patterns to individual spoken nouns in one language (e.g., "horse" in English) accurately predict the response patterns to equivalent nouns in the other language (e.g., "paard" in Dutch). Stimuli were four monosyllabic words in both languages, all from the category of "animal" nouns. For each word, pronunciations from three different speakers were included, allowing the investigation of speaker-independent representations of individual words. We used multivariate classifiers and a searchlight method to map the informative fMRI response patterns that enable decoding spoken words within languages (within-language discrimination) and across languages (across-language generalization). Response patterns discriminative of spoken words within language were distributed in multiple cortical regions, reflecting the complexity of the neural networks recruited during speech and language processing. Response patterns discriminative of spoken words across language were limited to localized clusters in the left anterior temporal lobe, the left angular gyrus and the posterior bank of the left postcentral gyrus, the right posterior superior temporal sulcus/superior temporal gyrus, the right medial anterior temporal lobe, the right anterior insula, and bilateral occipital cortex. These results corroborate the existence of "hub" regions organizing semantic-conceptual knowledge in abstract form at the fine-grained level of within semantic category discriminations.
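The across-language generalization test, training a classifier on word patterns from one language and testing it on the equivalent words in the other, can be sketched with simulated patterns (word labels, trial counts, and voxel numbers below are placeholders).

```python
# Minimal sketch (synthetic data): within-language decoding vs. across-language
# generalization, i.e., train on "English" trials and test on matched "Dutch" trials.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
words = ["horse", "duck", "bull", "shark"]                  # hypothetical animal-noun labels
n_rep, n_vox = 12, 100
y = np.tile(np.arange(len(words)), n_rep)
prototypes = rng.standard_normal((len(words), n_vox))       # shared concept-level patterns
X_en = prototypes[y] + rng.standard_normal((y.size, n_vox)) # "English" trial patterns
X_nl = prototypes[y] + rng.standard_normal((y.size, n_vox)) # "Dutch" trial patterns

clf = SVC(kernel="linear")
within = cross_val_score(clf, X_en, y, cv=5).mean()
across = clf.fit(X_en, y).score(X_nl, y)                    # train English, test Dutch
print(f"within-language accuracy: {within:.2f}, across-language accuracy: {across:.2f}")
```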
Subject(s)
Acoustic Stimulation/methods; Magnetic Resonance Imaging/methods; Multilingualism; Speech Perception/physiology; Temporal Lobe/physiology; Adult; Female; Humans; Language; Male; Speech/physiology; Young Adult
ABSTRACT
When multivariate pattern decoding is applied to fMRI studies entailing more than two experimental conditions, the most common approach is to transform the multiclass classification problem into a series of binary problems. Furthermore, for decoding analyses, classification accuracy is often the only outcome reported, although the topology of activation patterns in the high-dimensional feature space may provide additional insights into underlying brain representations. Here we propose to decode and visualize voxel patterns of fMRI datasets consisting of multiple conditions with a supervised variant of self-organizing maps (SSOMs). Using simulations and real fMRI data, we evaluated the performance of our SSOM-based approach. Specifically, the analysis of simulated fMRI data with varying signal-to-noise and contrast-to-noise ratio suggested that SSOMs perform better than a k-nearest-neighbor classifier for medium and large numbers of features (i.e. 250 to 1000 or more voxels) and similar to support vector machines (SVMs) for small and medium numbers of features (i.e. 100 to 600 voxels). However, for a larger number of features (>800 voxels), SSOMs performed worse than SVMs. When applied to a challenging 3-class fMRI classification problem with datasets collected to examine the neural representation of three human voices at individual speaker level, the SSOM-based algorithm was able to decode speaker identity from auditory cortical activation patterns. Classification performances were similar between SSOMs and other decoding algorithms; however, the ability to visualize decoding models and underlying data topology of SSOMs promotes a more comprehensive understanding of classification outcomes. We further illustrated this visualization ability of SSOMs with a re-analysis of a dataset examining the representation of visual categories in the ventral visual cortex (Haxby et al., 2001). This analysis showed that SSOMs could retrieve and visualize topography and neighborhood relations of the brain representation of eight visual categories. We conclude that SSOMs are particularly suited for decoding datasets consisting of more than two classes and are optimally combined with approaches that reduce the number of voxels used for classification (e.g. region-of-interest or searchlight approaches).
Subject(s)
Artificial Intelligence; Brain Mapping/methods; Cerebral Cortex/physiology; Magnetic Resonance Imaging/methods; Nerve Net/physiology; Pattern Recognition, Automated/methods; Pattern Recognition, Physiological/physiology; Algorithms; Humans; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Multivariate Analysis; Reproducibility of Results; Sensitivity and Specificity; User-Computer Interface
ABSTRACT
Invasive and non-invasive electrophysiological measurements during "cocktail-party"-like listening indicate that neural activity in the human auditory cortex (AC) "tracks" the envelope of relevant speech. However, due to limited coverage and/or spatial resolution, the distinct contribution of primary and non-primary areas remains unclear. Here, using 7-Tesla fMRI, we measured brain responses of participants attending to one speaker, in the presence and absence of another speaker. Through voxel-wise modeling, we observed envelope tracking in bilateral Heschl's gyrus (HG), right middle superior temporal sulcus (mSTS) and left temporo-parietal junction (TPJ), despite the signal's sluggish nature and slow temporal sampling. Neurovascular activity correlated positively (HG) or negatively (mSTS, TPJ) with the envelope. Further analyses comparing the similarity between spatial response patterns in the single speaker and concurrent speakers conditions and envelope decoding indicated that tracking in HG reflected both relevant and (to a lesser extent) non-relevant speech, while mSTS represented the relevant speech signal. Additionally, in mSTS, the similarity strength correlated with the comprehension of relevant speech. These results indicate that the fMRI signal tracks cortical responses and attention effects related to continuous speech and support the notion that primary and non-primary AC process ongoing speech in a push-pull of acoustic and linguistic information.
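A minimal sketch of voxel-wise envelope tracking in fMRI, under the common assumption that the speech envelope is first convolved with a hemodynamic response function before being related to voxel time courses. The HRF parameters and data below are illustrative, not the authors' model.

```python
# Minimal sketch (synthetic data): convolve a speech envelope with a double-gamma-style
# HRF and correlate the resulting predictor with simulated voxel time courses.
import numpy as np
from scipy.stats import gamma

tr, n_scans = 1.0, 300
t = np.arange(0, 30, tr)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)             # illustrative canonical-like HRF
hrf /= hrf.sum()

rng = np.random.default_rng(7)
envelope = rng.random(n_scans)                               # envelope resampled to the TR grid
predictor = np.convolve(envelope, hrf)[:n_scans]

voxels = np.column_stack([
    0.8 * predictor + rng.standard_normal(n_scans),          # positively tracking voxel
    -0.8 * predictor + rng.standard_normal(n_scans),         # negatively tracking voxel
    rng.standard_normal(n_scans),                            # non-tracking voxel
])
r = [np.corrcoef(predictor, voxels[:, i])[0, 1] for i in range(voxels.shape[1])]
print("envelope-tracking correlations per voxel:", np.round(r, 2))
```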
Subject(s)
Auditory Cortex; Magnetic Resonance Imaging; Speech Perception; Humans; Auditory Cortex/physiology; Auditory Cortex/diagnostic imaging; Male; Female; Adult; Speech Perception/physiology; Young Adult; Noise; Brain Mapping/methods; Speech/physiology; Acoustic Stimulation; Auditory Perception/physiology
ABSTRACT
Human hearing is constructive. For example, when a voice is partially replaced by an extraneous sound (e.g., on the telephone due to a transmission problem), the auditory system may restore the missing portion so that the voice can be perceived as continuous (Miller and Licklider, 1950; for review, see Bregman, 1990; Warren, 1999). The neural mechanisms underlying this continuity illusion have been studied mostly with schematic stimuli (e.g., simple tones) and are still a matter of debate (for review, see Petkov and Sutter, 2011). The goal of the present study was to elucidate how these mechanisms operate under more natural conditions. Using psychophysics and electroencephalography (EEG), we assessed simultaneously the perceived continuity of a human vowel sound through interrupting noise and the concurrent neural activity. We found that vowel continuity illusions were accompanied by a suppression of the 4 Hz EEG power in auditory cortex (AC) that was evoked by the vowel interruption. This suppression was stronger than the suppression accompanying continuity illusions of a simple tone. Finally, continuity perception and 4 Hz power depended on the intactness of the sound that preceded the vowel (i.e., the auditory context). These findings show that a natural sound may be restored during noise due to the suppression of 4 Hz AC activity evoked early during the noise. This mechanism may attenuate sudden pitch changes, adapt the resistance of the auditory system to extraneous sounds across auditory scenes, and provide a useful model for assisted hearing devices.
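A minimal sketch of how 4 Hz power around an interruption might be quantified from a single EEG channel (band-pass filtering plus the Hilbert envelope); comparing such power between trials perceived as continuous versus interrupted would mirror the suppression effect described above. All signals and parameters are simulated assumptions.

```python
# Minimal sketch (synthetic single-channel EEG): 4 Hz power in an early window
# after a noise interruption, via narrow band-pass and Hilbert envelope.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250
t = np.arange(0, 2, 1 / fs)                                  # 2-s epoch, interruption at 1 s
rng = np.random.default_rng(8)
eeg = rng.standard_normal(t.size)
eeg[t >= 1.0] += 0.8 * np.sin(2 * np.pi * 4 * t[t >= 1.0])   # toy 4 Hz response to the interruption

b, a = butter(3, [3, 5], btype="bandpass", fs=fs)            # narrow band around 4 Hz
power = np.abs(hilbert(filtfilt(b, a, eeg))) ** 2
window = (t >= 1.0) & (t < 1.5)                              # early post-interruption window
print(f"mean 4 Hz power 0-500 ms after the interruption: {power[window].mean():.2f}")
```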
Subject(s)
Auditory Cortex/physiology; Hearing/physiology; Noise; Speech Perception/physiology; Acoustic Stimulation; Adaptation, Psychological/physiology; Adult; Data Interpretation, Statistical; Electroencephalography; Evoked Potentials, Auditory/physiology; Female; Humans; Illusions/psychology; Male; Principal Component Analysis; Psychomotor Performance/physiology; Psychophysics; Young Adult
ABSTRACT
The formation of new sound categories is fundamental to everyday goal-directed behavior. Categorization requires the abstraction of discrete classes from continuous physical features as required by context and task. Electrophysiology in animals has shown that learning to categorize novel sounds alters their spatiotemporal neural representation at the level of early auditory cortex. However, functional magnetic resonance imaging (fMRI) studies so far did not yield insight into the effects of category learning on sound representations in human auditory cortex. This may be due to the use of overlearned speech-like categories and fMRI subtraction paradigms, leading to insufficient sensitivity to distinguish the responses to learning-induced, novel sound categories. Here, we used fMRI pattern analysis to investigate changes in human auditory cortical response patterns induced by category learning. We created complex novel sound categories and analyzed distributed activation patterns during passive listening to a sound continuum before and after category learning. We show that only after training, sound categories could be successfully decoded from early auditory areas and that learning-induced pattern changes were specific to the category-distinctive sound feature (i.e., pitch). Notably, the similarity between fMRI response patterns for the sound continuum mirrored the sigmoid shape of the behavioral category identification function. Our results indicate that perceptual representations of novel sound categories emerge from neural changes at early levels of the human auditory processing hierarchy.
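The similarity analysis, relating pairwise fMRI pattern correlations along the sound continuum to the sigmoid-shaped behavioral identification function, can be sketched with simulated data as follows (all quantities below are synthetic).

```python
# Minimal sketch (synthetic data): compare neural pattern dissimilarity across a
# 7-step sound continuum with dissimilarity derived from a sigmoid identification function.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(9)
steps = np.linspace(-3, 3, 7)                                # 7-step sound continuum
p_catB = 1 / (1 + np.exp(-2 * steps))                        # behavioral identification (sigmoid)

n_vox = 80
proto_A, proto_B = rng.standard_normal(n_vox), rng.standard_normal(n_vox)
patterns = np.array([(1 - p) * proto_A + p * proto_B + 0.3 * rng.standard_normal(n_vox)
                     for p in p_catB])                       # simulated post-learning patterns

i, j = np.triu_indices(len(steps), k=1)                      # all continuum pairs
neural_dissim = 1 - np.corrcoef(patterns)[i, j]
behav_dissim = np.abs(p_catB[i] - p_catB[j])
rho, p = spearmanr(neural_dissim, behav_dissim)
print(f"neural-behavioral similarity match: Spearman rho = {rho:.2f} (p = {p:.3f})")
```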
Subject(s)
Auditory Cortex/physiology; Auditory Perception/physiology; Brain Mapping; Learning/physiology; Sound; Acoustic Stimulation/classification; Adult; Analysis of Variance; Auditory Cortex/blood supply; Female; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Male; Normal Distribution; Oxygen/blood; Psychoacoustics; Spectrum Analysis; Young Adult
ABSTRACT
Pattern recognition algorithms are increasingly used in functional neuroimaging. These algorithms exploit information contained in temporal, spatial, or spatio-temporal patterns of independent variables (features) to detect subtle but reliable differences between brain responses to external stimuli or internal brain states. When applied to the analysis of electroencephalography (EEG) or magnetoencephalography (MEG) data, a choice needs to be made on how the input features to the algorithm are obtained from the signal amplitudes measured at the various channels. In this article, we consider six types of pattern analyses deriving from the combination of three types of feature selection in the temporal domain (predefined windows, shifting window, whole trial) with two approaches to handle the channel dimension (channel-wise, multi-channel). We combined these different types of analyses with a Gaussian Naïve Bayes classifier and analyzed a multi-subject EEG data set from a study aimed at understanding the task dependence of the cortical mechanisms for encoding speaker's identity and speech content (vowels) from short speech utterances (Bonte, Valente, & Formisano, 2009). Outcomes of the analyses showed that different groupings of the available features help highlight complementary (i.e. temporal, topographic) aspects of information content in the data. A shifting window/multi-channel approach proved especially valuable in tracing both the early build-up of neural information reflecting speaker or vowel identity and the late and task-dependent maintenance of relevant information reflecting the performance of a working memory task. Because it exploits the high temporal resolution of EEG (and MEG), such a shifting window approach with sequential multi-channel classifications seems the most appropriate choice for tracing the temporal profile of neural information processing.
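A minimal sketch of the shifting-window, multi-channel approach favored here: concatenate all channels within each successive time window and classify trials with a Gaussian Naive Bayes model, yielding a time course of decoding accuracy. Epoch dimensions and effect timing are invented for illustration.

```python
# Minimal sketch (synthetic epochs): shifting-window, multi-channel Gaussian Naive Bayes
# classification of EEG trials, producing decoding accuracy as a function of time.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
n_trials, n_chan, n_times, fs = 80, 32, 200, 200             # 1-s epochs at 200 Hz
y = np.repeat([0, 1], n_trials // 2)
epochs = rng.standard_normal((n_trials, n_chan, n_times))
epochs[y == 1, :, 60:100] += 0.4                              # class effect from ~300-500 ms

win = 20                                                      # 100-ms shifting window
for start in range(0, n_times - win + 1, win):
    X = epochs[:, :, start:start + win].reshape(n_trials, -1) # all channels x window samples
    acc = cross_val_score(GaussianNB(), X, y, cv=5).mean()
    print(f"{1000 * start / fs:4.0f}-{1000 * (start + win) / fs:4.0f} ms: accuracy = {acc:.2f}")
```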
Subject(s)
Auditory Perception/physiology; Electroencephalography; Pattern Recognition, Physiological/physiology; Speech; Voice; Brain Mapping; Female; Humans; Male
ABSTRACT
Numerous neuroimaging studies have demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced compared to the other irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only to its individual entities (i.e., segregation) but also to multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained through selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction for the relevant instrument during a middle-latency window for both the bassoon and cello and during a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that subsequent to a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas such an enhancement is not observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and, furthermore, inform current theories on polyphonic music perception.
ABSTRACT
In this issue of Neuron, O'Sullivan et al. (2019) measured electro-cortical responses to "cocktail party" speech mixtures in neurosurgical patients and demonstrated that the selective enhancement of attended speech is achieved through the adaptive weighting of primary auditory cortex output by non-primary auditory cortex.