ABSTRACT
This paper suggests an explanation for listeners' greater tolerance to positive than to negative mistuning of the higher tone within an octave pair. It hypothesizes a neural circuit, tuned to cancel the lower tone, that also cancels the higher tone if that tone is in tune. Imperfect cancellation is the cue to mistuning of the octave. The circuit involves two neural pathways, one delayed with respect to the other, that feed a coincidence-sensitive neuron via excitatory and inhibitory synapses. A mismatch between the time constants of these two synapses results in an asymmetry in sensitivity to mistuning. Specifically, if the time constant of the delayed pathway is greater than that of the direct pathway, there is greater tolerance to positive mistuning than to negative mistuning. The model is directly applicable to the harmonic octave (concurrent tones); extending it to the melodic octave (successive tones) requires additional assumptions, which are discussed. The paper reviews evidence from auditory psychophysics and physiology for and against this explanation.
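The asymmetry argument lends itself to a small simulation. The sketch below (numpy, with illustrative parameter values rather than those of the paper) models the direct and delayed pathways as exponential synaptic kernels and computes the cancellation residual as a function of timing mismatch; when the time constants differ, the residual grows faster on one side than on the other. How the sign of mistuning maps onto the sign of mismatch depends on the circuit details argued in the paper.

```python
import numpy as np

# Minimal sketch of the asymmetry mechanism, under assumed exponential
# synaptic kernels. All parameter values are illustrative.
fs = 100_000                                 # sampling rate (Hz)
t = np.arange(0, 0.02, 1 / fs)               # 20 ms time axis
tau_direct, tau_delayed = 0.5e-3, 1.5e-3     # synaptic time constants (s)
base = 5e-3                                  # nominal arrival time (s)

def kernel(tau, onset):
    """Unit-area exponential kernel starting at `onset` seconds."""
    h = np.where(t >= onset, np.exp(-(t - onset) / tau), 0.0)
    return h / h.sum()

def residual_power(mismatch):
    """Power left after the delayed (slow) kernel, offset by `mismatch`,
    is subtracted from the direct (fast) kernel."""
    exc = kernel(tau_direct, base)
    inh = kernel(tau_delayed, base + mismatch)
    return np.sum((exc - inh) ** 2)

# With tau_delayed > tau_direct, the residual grows at different rates
# for positive and for negative mismatch: tolerance is asymmetric.
for m in (-1e-3, 0.0, +1e-3):
    print(f"mismatch {m * 1e3:+.1f} ms -> residual {residual_power(m):.3e}")
```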
Subject(s)
Brain Stem, Neurons, Neurons/physiology, Auditory Perception/physiology, Acoustic Stimulation

ABSTRACT
A model of early auditory processing is proposed in which each peripheral channel is processed by a delay-and-subtract cancellation filter, tuned independently for each channel with a criterion of minimum power. For a channel dominated by a pure tone or a resolved partial of a complex tone, the optimal delay is its period. For a channel responding to harmonically related partials, the optimal delay is their common fundamental period. Each peripheral channel is thus split into two subchannels-one that is cancellation-filtered and the other that is not. Perception can involve either or both, depending on the task. The model is illustrated by applying it to the masking asymmetry between pure tones and narrowband noise: a noise target masked by a tone is more easily detectable than a tone target masked by noise. The model is one of a wider class of models, monaural or binaural, that cancel irrelevant stimulus dimensions to attain invariance to competing sources. Similar to occlusion in the visual domain, cancellation yields sensory evidence that is incomplete, thus requiring Bayesian inference of an internal model of the world along the lines of Helmholtz's doctrine of unconscious inference.
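As a concrete illustration, here is a minimal numpy sketch of the per-channel delay-and-subtract filter, tuned by brute-force search for the minimum-power delay; the function name and the exhaustive search are illustrative choices, not the paper's implementation.

```python
import numpy as np

def cancellation_split(x, fs, max_delay_s=0.02):
    """Split channel x into a cancellation-filtered subchannel and the
    original, choosing the delay that minimizes residual power."""
    best_delay, best_power = 1, np.inf
    for d in range(1, int(max_delay_s * fs)):
        residual = x[d:] - x[:-d]              # y(t) = x(t) - x(t - d)
        p = np.mean(residual ** 2)
        if p < best_power:
            best_delay, best_power = d, p
    return x[best_delay:] - x[:-best_delay], best_delay

# A channel dominated by a 100 Hz harmonic sound: the optimal delay
# should come out near one fundamental period (fs / 100 samples).
fs = 16_000
t = np.arange(0, 0.5, 1 / fs)
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
_, delay = cancellation_split(x, fs)
print(f"optimal delay: {delay} samples ({delay / fs * 1e3:.2f} ms)")
```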
Subject(s)
Auditory Perception, Perceptual Masking, Auditory Threshold, Bayes Theorem, Noise/adverse effects

ABSTRACT
Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid the recognition of specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight into this question by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features, and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was more robust in AV speech responses than would be expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence that the integration effects may change with listening conditions; however, this was an exploratory analysis, and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy.
SIGNIFICANCE STATEMENT During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and to vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing, using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
Subject(s)
Brain/physiology, Comprehension/physiology, Cues, Speech Perception/physiology, Visual Perception/physiology, Acoustic Stimulation, Brain Mapping, Electroencephalography, Female, Humans, Male, Phonetics, Photic Stimulation

ABSTRACT
Power line artifacts are the bane of animal and human electrophysiology. A number of methods are available to help attenuate or eliminate them, but each has its own set of drawbacks. In this brief note I present a simple method that combines the advantages of spectral and spatial filtering, while minimizing their downsides. A perfect-reconstruction filterbank is used to split the data into two parts, one noise-free and the other contaminated by line artifact. The artifact-contaminated stream is processed by a spatial filter to project out line components, and added to the noise-free part to obtain clean data. This method is applicable to multichannel data such as electroencephalography (EEG), magnetoencephalography (MEG), or multichannel local field potentials (LFP). I briefly review past methods, pointing out their drawbacks, describe the new method, and evaluate the outcome using synthetic and real data.
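The sketch below illustrates the two-path structure in numpy, using a one-period moving average as the perfect-reconstruction split (the smoothed path nulls the line frequency and its harmonics, and the two paths sum exactly to the input) and PCA as a stand-in for the spatial filter used in the note; both simplifications, and all parameter values, are mine rather than the note's.

```python
import numpy as np

def remove_line(data, fs, f_line=50.0, n_remove=1):
    """data: (n_samples, n_channels). Return data with the line
    artifact attenuated. Assumes fs ~ a multiple of f_line."""
    # 1) Perfect-reconstruction split: a boxcar of one line period nulls
    #    f_line and its harmonics; smoothed + residual == data exactly.
    period = int(round(fs / f_line))
    kernel = np.ones(period) / period
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, data)
    line_part = data - smoothed            # carries the line artifact
    # 2) Spatial filter on the contaminated path: PCA, then drop the
    #    components with the most power at f_line.
    u, s, vt = np.linalg.svd(line_part, full_matrices=False)
    comps = u * s
    freqs = np.fft.rfftfreq(comps.shape[0], 1 / fs)
    line_bin = np.argmin(np.abs(freqs - f_line))
    power_at_line = np.abs(np.fft.rfft(comps, axis=0))[line_bin]
    comps[:, np.argsort(power_at_line)[-n_remove:]] = 0.0
    # 3) Recombine the cleaned path with the untouched line-free path.
    return smoothed + comps @ vt
```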
Subject(s)
Brain/abnormalities, Electroencephalography, Magnetoencephalography, Signal Processing, Computer-Assisted/instrumentation, Algorithms, Artifacts, Brain/physiology, Electroencephalography/methods, Humans, Magnetoencephalography/methods, Noise

ABSTRACT
Humans comprehend speech despite challenges such as mispronunciation and noisy environments. Our auditory system is robust to such disruptions thanks to the integration of the sensory input with prior knowledge and expectations built on language-specific regularities. One such regularity concerns the permissible phoneme sequences, which determine the likelihood that a word belongs to a given language (phonotactic probability; "blick" is more likely to be an English word than "bnick"). Previous research demonstrated that violations of these rules modulate brain-evoked responses. However, several fundamental questions remain unresolved, especially regarding the neural encoding and integration strategy of phonotactics in naturalistic conditions, when there are no (or few) violations. Here, we used linear modelling to assess the influence of phonotactic probabilities on the brain responses to narrative speech measured with non-invasive EEG. We found that the relationship between continuous speech and EEG responses is best described when the stimulus descriptor includes phonotactic probabilities. This indicates that low-frequency cortical signals (<9 Hz) reflect the integration of phonotactic information during natural speech perception, providing a measure of phonotactic processing at the individual-subject level. Furthermore, phonotactics-related signals showed the strongest speech-EEG interactions at latencies of 100-500 ms, supporting a pre-lexical role of phonotactic information.
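For readers unfamiliar with this kind of linear (temporal response function) modelling, the sketch below shows the basic machinery with placeholder data: EEG is ridge-regressed on time-lagged stimulus features, and the fit of a model with a phonotactic regressor is compared to one without. All names, values, and the synthetic data are illustrative.

```python
import numpy as np

def lag_matrix(features, lags):
    """Stack time-shifted copies of features (n_times, n_features)."""
    n, f = features.shape
    out = np.zeros((n, f * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(features, lag, axis=0)
        shifted[:max(lag, 0)] = 0          # zero wrapped-around samples
        out[:, i * f:(i + 1) * f] = shifted
    return out

def trf_fit(stim, eeg, lags, ridge=1e2):
    X = lag_matrix(stim, lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ eeg)

def trf_score(stim, eeg, w, lags):
    pred = lag_matrix(stim, lags) @ w
    return np.mean([np.corrcoef(pred[:, c], eeg[:, c])[0, 1]
                    for c in range(eeg.shape[1])])

# Placeholder data; in practice `phonotactic` would hold phonotactic
# probabilities as impulses at phoneme onsets.
rng = np.random.default_rng(0)
n = 5_000
envelope = rng.standard_normal((n, 1))
phonotactic = rng.standard_normal((n, 1))
eeg = rng.standard_normal((n, 8))
lags = list(range(0, 50, 5))               # ~0-500 ms at 100 Hz
w0 = trf_fit(envelope, eeg, lags)
w1 = trf_fit(np.hstack([envelope, phonotactic]), eeg, lags)
# With real data, a reliably higher score for the model that includes
# phonotactics is the signature of phonotactic encoding.
print(trf_score(envelope, eeg, w0, lags),
      trf_score(np.hstack([envelope, phonotactic]), eeg, w1, lags))
```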
Subject(s)
Cerebral Cortex/physiology, Phonetics, Speech Perception/physiology, Acoustic Stimulation, Adult, Evoked Potentials, Auditory, Female, Humans, Male, Young Adult

ABSTRACT
Brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and related techniques often have poor signal-to-noise ratios due to the presence of multiple competing sources and artifacts. A common remedy is to average responses over repeats of the same stimulus, but this is not applicable for temporally extended stimuli that are presented only once (speech, music, movies, natural sound). An alternative is to average responses over multiple subjects that were presented with identical stimuli, but differences in geometry of brain sources and sensors reduce the effectiveness of this solution. Multiway canonical correlation analysis (MCCA) brings a solution to this problem by allowing data from multiple subjects to be fused in such a way as to extract components common to all. This paper reviews the method, offers application examples that illustrate its effectiveness, and outlines the caveats and risks entailed by the method.
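One common way to compute MCCA, sketched below in numpy, is to whiten each subject's data by PCA, concatenate the whitened channels across subjects, and apply a second PCA; the leading group components then capture activity shared across subjects. This is a simplified reading of the method, and the rank handling and normalization are choices of mine.

```python
import numpy as np

def mcca(datasets, n_keep=10):
    """datasets: list of (n_times, n_channels) arrays sharing n_times.
    Returns the n_keep components most shared across subjects."""
    whitened = []
    for x in datasets:
        x = x - x.mean(axis=0)
        u, s, vt = np.linalg.svd(x, full_matrices=False)
        whitened.append(u)          # equal-variance, decorrelated basis
    y = np.hstack(whitened)         # (n_times, total rank)
    u, s, vt = np.linalg.svd(y, full_matrices=False)
    # A singular value near sqrt(n_subjects) marks a component present
    # in every subject; a value near 1 marks subject-specific activity.
    return (u * s)[:, :n_keep], s
```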
Subject(s)
Brain/physiology, Data Interpretation, Statistical, Electroencephalography/methods, Magnetoencephalography/methods, Models, Theoretical, Adult, Humans

ABSTRACT
Electroencephalography (EEG), magnetoencephalography (MEG) and related techniques are prone to glitches, slow drift, steps, etc., that contaminate the data and interfere with the analysis and interpretation. These artifacts are usually addressed in a preprocessing phase that attempts to remove them or minimize their impact. This paper offers a set of useful techniques for this purpose: robust detrending, robust rereferencing, outlier detection, data interpolation (inpainting), step removal, and filter ringing artifact removal. These techniques provide a less wasteful alternative to discarding corrupted trials or channels, and they are relatively immune to artifacts that disrupt alternative approaches such as filtering. Robust detrending allows slow drifts and common mode signals to be factored out while avoiding the deleterious effects of glitches. Robust rereferencing reduces the impact of artifacts on the reference. Inpainting allows corrupt data to be interpolated from intact parts based on the correlation structure estimated over the intact parts. Outlier detection allows the corrupt parts to be identified. Step removal fixes the high-amplitude flux jump artifacts that are common with some MEG systems. Ringing removal allows the ringing response of the antialiasing filter to glitches (steps, pulses) to be suppressed. The performance of the methods is illustrated and evaluated using synthetic data and data from real EEG and MEG systems. These methods, which are mainly automatic and require little tuning, can greatly improve the quality of the data.
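As an illustration of the first of these tools, here is a minimal numpy sketch of robust detrending: a polynomial trend is fit with sample weights that are iteratively zeroed for outlying samples, so glitches do not drag the fit. The order, threshold, and iteration count are illustrative.

```python
import numpy as np

def robust_detrend(x, order=3, n_iter=3, thresh=3.0):
    """Remove a polynomial trend from x, ignoring outlying samples."""
    t = np.linspace(-1, 1, len(x))
    w = np.ones(len(x))
    for _ in range(n_iter):
        coefs = np.polynomial.polynomial.polyfit(t, x, order, w=w)
        trend = np.polynomial.polynomial.polyval(t, coefs)
        resid = x - trend
        sigma = resid[w > 0].std()
        w = (np.abs(resid) < thresh * sigma).astype(float)  # drop outliers
    return x - trend, w

# Slow drift plus a glitch: the glitch is excluded from the fit rather
# than being allowed to distort the estimated trend.
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 1000) ** 2 + 0.1 * rng.standard_normal(1000)
x[500:510] += 50
detrended, weights = robust_detrend(x)
```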
Subject(s)
Artifacts, Brain/physiology, Electroencephalography/methods, Magnetoencephalography/methods, Signal Processing, Computer-Assisted, Brain Mapping/methods, Humans

ABSTRACT
The relation between a stimulus and the evoked brain response can shed light on perceptual processes within the brain. Signals derived from this relation can also be harnessed to control external devices for Brain Computer Interface (BCI) applications. While the classic event-related potential (ERP) is appropriate for isolated stimuli, more sophisticated "decoding" strategies are needed to address continuous stimuli such as speech, music or environmental sounds. Here we describe an approach based on Canonical Correlation Analysis (CCA) that finds the optimal transform to apply to both the stimulus and the response to reveal correlations between the two. Compared to prior methods based on forward or backward models for stimulus-response mapping, CCA finds significantly higher correlation scores, thus providing increased sensitivity to relatively small effects, and supports classifier schemes that yield higher classification scores. CCA strips the brain response of variance unrelated to the stimulus, and the stimulus representation of variance that does not affect the response, and thus improves observations of the relation between stimulus and response.
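A toy example of the approach, using scikit-learn's CCA on synthetic data in which stimulus and response share two latent components; in practice the stimulus matrix would hold time-lagged features (envelope, spectrogram) and the response matrix the multichannel EEG or MEG.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 2_000
latent = rng.standard_normal((n, 2))               # shared activity
stim = latent @ rng.standard_normal((2, 10)) + rng.standard_normal((n, 10))
resp = latent @ rng.standard_normal((2, 32)) + 3 * rng.standard_normal((n, 32))

cca = CCA(n_components=2)
stim_c, resp_c = cca.fit_transform(stim, resp)     # canonical variates
for k in range(2):
    r = np.corrcoef(stim_c[:, k], resp_c[:, k])[0, 1]
    print(f"canonical pair {k}: r = {r:.2f}")
# The transforms strip each side of variance irrelevant to the other,
# which is what yields the higher correlation scores described above.
```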
Subject(s)
Brain Mapping/methods, Brain/physiology, Signal Processing, Computer-Assisted, Acoustic Stimulation, Electroencephalography/methods, Evoked Potentials, Auditory/physiology, Humans, Magnetoencephalography/methods

ABSTRACT
Studies that measure frequency discrimination often use 2, 3, or 4 tones per trial. This paper investigates a two-alternative forced-choice (2AFC) task in which each tone of a series is judged relative to the previous tone ("sliding 2AFC"). Potential advantages are a greater yield (more responses per unit time) and a more uniform history of stimulation, useful for studying context effects or for relating time-varying performance to cortical activity. The new task was evaluated relative to a classic 2-tone-per-trial 2AFC task with similar stimulus parameters. For each task, conditions with different stimulus parameters were compared. The main results were as follows: (1) thresholds did not differ significantly between tasks when similar parameters were used. (2) Thresholds did differ between conditions for the new task, showing a deleterious effect of inserting relatively large steps in the frequency sequence. (3) Thresholds also differed between conditions for the classic task, showing an advantage for a fixed frequency standard. There was no indication that results were more variable with either task, and no reason was found not to use the new sliding 2AFC task in lieu of the classic 2-tone-per-trial 2AFC task.
ABSTRACT
Studies that measure pitch discrimination relate a subject's response on each trial to the stimuli presented on that trial, but there is evidence that behavior depends also on earlier stimulation. Here, listeners heard a sequence of tones and reported after each tone whether it was higher or lower in pitch than the previous tone. Frequencies were determined by an adaptive staircase targeting 75% correct, with interleaved tracks to ensure independence between consecutive frequency changes. Responses for this specific task were predicted by a model that took into account the frequency interval on the current trial, as well as the interval and response on the previous trial. This model was superior to simpler models. The dependence on the previous interval was positive (assimilative) for all subjects, consistent with persistence of the sensory trace. The dependence on the previous response was either positive or negative, depending on the subject, consistent with a subject-specific suboptimal response strategy. It is argued that a full stimulus + response model is necessary to account for effects of stimulus history and obtain an accurate estimate of sensory noise.
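A sketch of how such a stimulus + response model can be fit, using logistic regression on synthetic data from a simulated observer with an assimilative stimulus bias and a response-repetition bias; the weights and the simulation are illustrative, not the paper's values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2_000
interval = rng.standard_normal(n)         # frequency change on trial t
resp = np.zeros(n, dtype=int)             # 1 = "higher", 0 = "lower"
for t in range(1, n):
    drive = (interval[t]
             + 0.3 * interval[t - 1]            # assimilative history bias
             + 0.2 * (2 * resp[t - 1] - 1))     # response-repetition bias
    resp[t] = int(drive + rng.standard_normal() > 0)

# Predictors: current interval, previous interval, previous response.
X = np.column_stack([interval[1:], interval[:-1], resp[:-1]])
model = LogisticRegression().fit(X, resp[1:])
print("weights (interval, prev. interval, prev. response):",
      model.coef_.round(2))
```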
Subject(s)
Discrimination, Psychological, Judgment, Pitch Discrimination, Acoustic Stimulation, Adaptation, Psychological, Adult, Audiometry, Pure-Tone, Female, Humans, Male, Psychoacoustics, Young Adult

ABSTRACT
We review a simple yet versatile approach for the analysis of multichannel data, focusing in particular on brain signals measured with EEG, MEG, ECoG, LFP or optical imaging. Sensors are combined linearly with weights that are chosen to provide optimal signal-to-noise ratio. Signal and noise can be variably defined to match the specific need, e.g. reproducibility over trials, frequency content, or differences between stimulus conditions. We demonstrate how the method can be used to remove power line or cardiac interference, enhance stimulus-evoked or stimulus-induced activity, isolate narrow-band cortical activity, and so on. The approach involves decorrelating both the original and filtered data by joint diagonalization of their covariance matrices. We trace its origins; offer an easy-to-understand explanation; review a range of applications; and chart failure scenarios that might lead to misleading results, in particular due to overfitting. In addition to its flexibility and effectiveness, a major appeal of the method is that it is easy to understand.
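The core computation, sketched below in numpy/scipy: estimate two covariance matrices, one for the data as a whole and one for the "signal" as defined by the bias of interest (here, reproducibility over trials, i.e. the covariance of the trial average), and jointly diagonalize them by solving a generalized eigenproblem. The sketch assumes full-rank data; otherwise dimensionality should be reduced first, in line with the overfitting caveats above.

```python
import numpy as np
from scipy.linalg import eigh

def bias_weights(epochs):
    """epochs: (n_trials, n_times, n_channels). Return channel weights
    ordered from most to least reproducible over trials."""
    n_tr, n_t, n_ch = epochs.shape
    x = epochs - epochs.mean(axis=(0, 1), keepdims=True)
    c0 = np.einsum('ktc,ktd->cd', x, x) / (n_tr * n_t)  # total covariance
    avg = x.mean(axis=0)
    c1 = avg.T @ avg / n_t                     # covariance of trial average
    evals, evecs = eigh(c1, c0)                # joint diagonalization
    return evecs[:, ::-1], evals[::-1]         # best component first

# usage: w, score = bias_weights(epochs)
# component = epochs @ w[:, 0]   # most reproducible linear combination
```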
Subject(s)
Artifacts, Brain/physiology, Electrodiagnosis/methods, Models, Statistical, Signal Processing, Computer-Assisted, Electroencephalography/methods, Humans, Magnetoencephalography/methods, Optical Imaging/methods, Signal-to-Noise Ratio

ABSTRACT
In animal models, single-neuron response properties such as stimulus-specific adaptation have been described as possible precursors to mismatch negativity, a human brain response to stimulus change. In the present study, we attempted to bridge the gap between human and animal studies by characterising responses to changes in the frequency of repeated tone series in the anesthetised guinea pig using small-animal magnetoencephalography (MEG). We showed that 1) auditory evoked fields (AEFs) qualitatively similar to those observed in human MEG studies can be detected noninvasively in rodents using small-animal MEG; 2) guinea pig AEF amplitudes reduce rapidly with tone repetition, and this AEF reduction is largely complete by the second tone in a repeated series; and 3) differences between responses to the first (deviant) and later (standard) tones after a frequency transition resemble those previously observed in awake humans using a similar stimulus paradigm.
Subject(s)
Auditory Perception/physiology, Evoked Potentials, Auditory, Magnetoencephalography, Neural Inhibition, Acoustic Stimulation, Animals, Guinea Pigs, Humans, Male

ABSTRACT
Local field potentials (LFPs) recorded in the auditory cortex of mammals are known to reveal weakly selective and often multimodal spectrotemporal receptive fields in contrast to spiking activity. This may in part reflect the wider "listening sphere" of LFPs relative to spikes due to the greater current spread at low than high frequencies. We recorded LFPs and spikes from auditory cortex of guinea pigs using 16-channel electrode arrays. LFPs were processed by a component analysis technique that produces optimally tuned linear combinations of electrode signals. Linear combinations of LFPs were found to have sharply tuned responses, closer to spike-related tuning. The existence of a sharply tuned component implies that a cortical neuron (or group of neurons) capable of forming a linear combination of its inputs has access to that information. Linear combinations of signals from electrode arrays reveal information latent in the subspace spanned by multichannel LFP recordings and are justified by the fact that the observations themselves are linear combinations of neural sources.
Subject(s)
Auditory Cortex/physiology, Auditory Perception/physiology, Neurons/physiology, Action Potentials/physiology, Animals, Guinea Pigs, Principal Component Analysis

ABSTRACT
I present a method for analyzing multichannel recordings in response to repeated stimulus presentation. Quadratic Component Analysis (QCA) extracts responses that are stimulus-induced (triggered by the stimulus but not precisely locked in time), as opposed to stimulus-evoked (time-locked to the stimulus). Induced responses are often found in neural response data from magnetoencephalography (MEG), electroencephalography (EEG), or multichannel electrophysiological and optical recordings. The instantaneous power of a linear combination of channels can be expressed as a weighted sum of instantaneous cross-products between channel waveforms. Based on this fact, a technique known as Denoising Source Separation (DSS) is used to find the most reproducible "quadratic component" (linear combination of cross-products). The linear component with a square most similar to this quadratic component is taken to approximate the most reproducible evoked activity. Projecting out the component and repeating the analysis allows multiple induced components to be extracted by deflation. The method is illustrated with synthetic data, as well as real MEG data. At unfavorable signal-to-noise ratios, it can reveal stimulus-induced activity that is invisible to other approaches such as time-frequency analysis.
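A sketch of the central step, reusing the same trial-reproducibility criterion as the joint-diagonalization method above: expand the multichannel epochs into instantaneous cross-products, then find the combination of cross-products that is most reproducible over trials. The mapping back to a linear component and the deflation loop are omitted, all names are mine, and the sketch assumes enough data for the cross-product covariance to be full rank.

```python
import numpy as np
from scipy.linalg import eigh

def quadratic_expand(epochs):
    """(n_trials, n_times, n_ch) -> instantaneous cross-products."""
    i, j = np.triu_indices(epochs.shape[-1])
    return epochs[:, :, i] * epochs[:, :, j]   # (n_trials, n_times, n_pairs)

def best_quadratic_component(epochs):
    x = quadratic_expand(epochs)
    n_tr, n_t, n_f = x.shape
    x = x - x.mean(axis=(0, 1), keepdims=True)
    c0 = np.einsum('ktf,ktg->fg', x, x) / (n_tr * n_t)
    avg = x.mean(axis=0)
    c1 = avg.T @ avg / n_t
    evals, evecs = eigh(c1, c0)        # most reproducible combination last
    return x @ evecs[:, -1]            # quadratic component, per trial
```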
Subject(s)
Electroencephalography/methods, Magnetoencephalography/methods, Principal Component Analysis, Algorithms, Artifacts, Humans, Signal Processing, Computer-Assisted

ABSTRACT
We investigated the effect of a biasing tone close to 5, 15, or 30 Hz on the response to higher-frequency probe tones, behaviorally, and by measuring distortion-product otoacoustic emissions (DPOAEs). The amplitude of the biasing tone was adjusted for criterion suppression of cubic DPOAE elicited by probe tones presented between 0.7 and 8 kHz, or criterion loudness suppression of a train of tone-pip probes in the range 0.125-8 kHz. For DPOAEs, the biasing-tone level for criterion suppression increased with probe-tone frequency by 8-9 dB/octave, consistent with an apex-to-base gradient of biasing-tone-induced basilar membrane displacement, as we verified by computational simulation. In contrast, the biasing-tone level for criterion loudness suppression increased with probe frequency by only 1-3 dB/octave, reminiscent of previously published data on low-side suppression of auditory nerve responses to characteristic frequency tones. These slopes were independent of biasing-tone frequency, but the biasing-tone sensation level required for criterion suppression was ~ 10 dB lower for the two infrasound biasing tones than for the 30-Hz biasing tone. On average, biasing-tone sensation levels as low as 5 dB were sufficient to modulate the perception of higher frequency sounds. Our results are relevant for recent debates on perceptual effects of environmental noise with very low-frequency content and might offer insight into the mechanism underlying low-side suppression.
Subject(s)
Cochlea, Otoacoustic Emissions, Spontaneous, Acoustic Stimulation, Basilar Membrane, Cochlea/physiology, Noise, Otoacoustic Emissions, Spontaneous/physiology, Sound

ABSTRACT
This paper reviews the hypothesis of harmonic cancellation according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of Auditory Scene Analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support it, or not. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results, with the aim to understand the inconsistencies and come up with a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.
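The basic operation is easy to demonstrate: a delay-and-subtract (comb) filter with delay equal to the interferer's fundamental period nulls all of its harmonics, while a target that is not harmonically related is merely comb-filtered. A toy numpy example, with arbitrary frequencies:

```python
import numpy as np

fs = 16_000
t = np.arange(0, 0.5, 1 / fs)
interferer = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 6))
target = np.sin(2 * np.pi * 530 * t)        # not a harmonic of 100 Hz
x = target + interferer

d = fs // 100                               # one interferer period (samples)
filtered = x[d:] - x[:-d]                   # harmonic cancellation

print("interferer power, before/after:",
      round(np.mean(interferer ** 2), 2),
      round(np.mean((interferer[d:] - interferer[:-d]) ** 2), 6))
print("target power, before/after:",
      round(np.mean(target ** 2), 2),
      round(np.mean((target[d:] - target[:-d]) ** 2), 2))
```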
ABSTRACT
Objective. An auditory stimulus can be related to the brain response that it evokes by a stimulus-response model fit to the data. This offers insight into perceptual processes within the brain and is also of potential use for devices such as brain-computer interfaces (BCIs). The quality of the model can be quantified by measuring the fit with a regression problem, or by applying it to a classification task and measuring its performance. Approach. Here we focus on a match-mismatch (MM) task that entails deciding whether a segment of brain signal matches, via a model, the auditory stimulus that evoked it. Main results. Using these metrics, we describe a range of models of increasing complexity that we compare to methods in the literature, showing state-of-the-art performance. We document in detail one particular implementation, calibrated on a publicly-available database, that can serve as a robust reference to evaluate future developments. Significance. The MM task allows stimulus-response models to be evaluated in the limit of very high model accuracy, making it an attractive alternative to the more commonly used task of auditory attention detection. The MM task does not require class labels, so it is immune to mislabeling, and it is applicable to data recorded in listening scenarios with only one sound source, so large quantities of training and testing data are cheap to obtain. Performance metrics from this task, associated with regression accuracy, provide complementary insights into the relation between stimulus and response, as well as information about discriminatory power directly applicable to BCI applications.
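In outline, the MM decision is simple. The sketch below assumes a backward model (decoder) that reconstructs the stimulus envelope from EEG and decides which of two candidate segments matches; the proportion of correct decisions over many segments is the performance metric. The decoder and data here are placeholders, not the documented implementation.

```python
import numpy as np

def match_mismatch(eeg_segment, stim_a, stim_b, decoder):
    """Decide which stimulus segment evoked eeg_segment.
    eeg_segment: (n_times, n_channels); decoder: (n_channels,)."""
    recon = eeg_segment @ decoder            # reconstructed envelope
    r_a = np.corrcoef(recon, stim_a)[0, 1]
    r_b = np.corrcoef(recon, stim_b)[0, 1]
    return r_a > r_b                         # True = segment a matches

# Toy check with random data: the EEG is the true envelope mixed into
# 16 channels plus noise; accuracy over segments is the figure of merit.
rng = np.random.default_rng(0)
decoder = rng.standard_normal(16)
n_correct = 0
for _ in range(100):
    stim = rng.standard_normal(500)
    eeg = np.outer(stim, decoder) + 5.0 * rng.standard_normal((500, 16))
    n_correct += match_mismatch(eeg, stim, rng.standard_normal(500), decoder)
print(f"toy MM accuracy: {n_correct}%")
```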
Subject(s)
Brain-Computer Interfaces, Electroencephalography, Attention, Auditory Perception, Brain

ABSTRACT
Human engagement in music rests on underlying elements such as the listeners' cultural background and interest in music. These factors modulate how listeners anticipate musical events, a process that induces instantaneous neural responses as the music confronts these expectations. Measuring such neural correlates would represent a direct window into high-level brain processing. Here we recorded cortical signals as participants listened to Bach melodies. We assessed the relative contributions of the acoustic versus melodic components of the music to the neural signal. Melodic features included information on pitch progressions and their tempo, which were extracted from a predictive model of musical structure based on Markov chains. We related the music to brain activity with temporal response functions, demonstrating, for the first time, distinct cortical encoding of pitch and note-onset expectations during naturalistic music listening. This encoding was most pronounced at response latencies up to 350 ms, and in both planum temporale and Heschl's gyrus.
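To make the melodic side concrete, here is a sketch of how note-level expectation values can be derived from a Markov chain. A first-order (bigram) model with add-alpha smoothing is used for brevity, standing in for the richer predictive model used in the study; the resulting per-note surprisal would serve as a regressor in the temporal response function analysis.

```python
import numpy as np
from collections import defaultdict

def bigram_surprisal(train_melodies, test_melody, alpha=1.0):
    """Per-note surprisal, -log2 p(note | previous note), from bigram
    counts over a training corpus (melodies as lists of MIDI pitches)."""
    counts = defaultdict(lambda: defaultdict(float))
    alphabet = {p for melody in train_melodies for p in melody}
    for melody in train_melodies:
        for a, b in zip(melody[:-1], melody[1:]):
            counts[a][b] += 1
    out = []
    for a, b in zip(test_melody[:-1], test_melody[1:]):
        total = sum(counts[a].values()) + alpha * len(alphabet)
        out.append(-np.log2((counts[a][b] + alpha) / total))
    return np.array(out)      # one value per note after the first

corpus = [[60, 62, 64, 65, 67, 65, 64, 62, 60],
          [67, 65, 64, 62, 60, 62, 64, 65, 67]]
print(bigram_surprisal(corpus, [60, 62, 64, 62, 60]).round(2))
```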
Subject(s)
Auditory Perception/physiology, Music, Temporal Lobe/physiology, Acoustic Stimulation, Electroencephalography, Evoked Potentials, Auditory/physiology, Humans, Reaction Time

ABSTRACT
Filters are commonly used to reduce noise and improve data quality. Filter theory is part of a scientist's training, yet the impact of filters on interpreting data is not always fully appreciated. This paper reviews the issue and explains what a filter is, what problems are to be expected when using them, how to choose the right filter, and how to avoid filtering by using alternative tools. Time-frequency analysis shares some of the same problems that filters have, particularly in the case of wavelet transforms. We recommend reporting filter characteristics with sufficient details, including a plot of the impulse or step response as an inset.
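The recommendation in the last sentence is easy to follow; for example, with scipy and matplotlib (filter type and parameters chosen arbitrarily for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

fs = 250.0                                        # sampling rate (Hz)
b, a = signal.butter(4, 8.0, btype="low", fs=fs)  # 4th-order 8 Hz low-pass

n = 200
t = np.arange(n) / fs
impulse = np.zeros(n); impulse[0] = 1.0
step = np.ones(n)

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].plot(t, signal.lfilter(b, a, impulse))
axes[0].set_title("impulse response")
axes[1].plot(t, signal.lfilter(b, a, step))
axes[1].set_title("step response")
for ax in axes:
    ax.set_xlabel("time (s)")
fig.tight_layout()
plt.show()
```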
Subject(s)
Artifacts, Data Accuracy, Signal Processing, Computer-Assisted, Signal-to-Noise Ratio, Causality, Fourier Analysis, Humans, Neurosciences, Wavelet Analysis

ABSTRACT
Auditory environments vary as a result of the appearance and disappearance of acoustic sources, as well as fluctuations characteristic of the sources themselves. The appearance of an object is often manifest as a transition in the pattern of ongoing fluctuation, rather than an onset or offset of acoustic power. How does the system detect and process such transitions? Based on magnetoencephalography data, we show that the temporal dynamics and response morphology of the neural temporal-edge detection processes depend in precise ways on the nature of the change. We measure auditory cortical responses to transitions between "disorder," modeled as a sequence of random frequency tone pips, and "order," modeled as a constant tone. Such transitions embody key characteristics of natural auditory edges. Early cortical responses (from approximately 50 ms post-transition) reveal that order-disorder transitions, and vice versa, are processed by different neural mechanisms. Their dynamics suggest that the auditory cortex optimally adjusts to stimulus statistics, even when this is not required for overt behavior. Furthermore, this response profile bears a striking similarity to that measured from another order-disorder transition, between interaurally correlated and uncorrelated noise, a radically different stimulus. This parallelism suggests the existence of a general mechanism that operates early in the processing stream on the abstract statistics of the auditory input, and is putatively related to the processes of constructing a new representation or detecting a deviation from a previously acquired model of the auditory scene. Together, the data reveal information about the mechanisms with which the brain samples, represents, and detects changes in the environment.