ABSTRACT
The cocktail party problem requires listeners to infer individual sound sources from mixtures of sound. The problem can be solved only by leveraging regularities in natural sound sources, but little is known about how such regularities are internalized. We explored whether listeners learn source "schemas" (the abstract structure shared by different occurrences of the same type of sound source) and use them to infer sources from mixtures. We measured the ability of listeners to segregate mixtures of time-varying sources. In each experiment a subset of trials contained schema-based sources generated from a common template by transformations (transposition and time dilation) that introduced acoustic variation but preserved abstract structure. Across several tasks and classes of sound sources, schema-based sources consistently aided source separation, in some cases producing rapid improvements in performance over the first few exposures to a schema. Learning persisted across blocks that did not contain the learned schema, and listeners were able to learn and use multiple schemas simultaneously. No learning was evident when schemas were presented in the task-irrelevant (i.e., distractor) source. However, learning from task-relevant stimuli showed signs of being implicit, in that listeners were no more likely to report that sources recurred in experiments containing schema-based sources than in control experiments containing no schema-based sources. The results implicate a mechanism for rapidly internalizing abstract sound structure, facilitating accurate perceptual organization of sound sources that recur in the environment.
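An illustrative sketch of the stimulus logic described above: schema-based variants of a common template generated by transposition and time dilation. This is not the authors' stimulus code; the trajectory representation, function names, and parameter values are assumptions made for illustration only.

```python
# Minimal sketch: a "schema" is represented here as a log-frequency trajectory,
# and variants are produced by the two transformations named in the abstract
# (transposition and time dilation). All specifics are assumed, not from the study.
import numpy as np

def make_schema_variant(template_logf, transpose_semitones=0.0, dilation=1.0):
    """Return a transposed, time-dilated copy of a template pitch trajectory.

    template_logf      : 1-D array of log2(frequency) values over time
    transpose_semitones: uniform shift in pitch (12 semitones = 1 octave = 1 log2 unit)
    dilation           : >1 stretches the trajectory in time, <1 compresses it
    """
    n = len(template_logf)
    n_out = max(2, int(round(n * dilation)))            # new number of time samples
    t_old = np.linspace(0.0, 1.0, n)
    t_new = np.linspace(0.0, 1.0, n_out)
    dilated = np.interp(t_new, t_old, template_logf)    # time dilation by resampling
    return dilated + transpose_semitones / 12.0         # transposition in log2 units

# Example: one template, two acoustically different variants sharing its structure
template = np.log2(300.0) + 0.3 * np.sin(np.linspace(0, 4 * np.pi, 200))
variant_a = make_schema_variant(template, transpose_semitones=+3, dilation=1.2)
variant_b = make_schema_variant(template, transpose_semitones=-5, dilation=0.8)
```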
Subjects
Attention/physiology, Auditory Perception/physiology, Learning/physiology, Noise, Sound Localization/physiology, Acoustic Stimulation, Cues (Psychology), Humans

ABSTRACT
Background music is widely used to sustain attention, but little is known about what musical properties aid attention. This may be due to inter-individual variability in neural responses to music. Here we find that music with amplitude modulations added at specific rates can sustain attention differentially for those with varying levels of attentional difficulty. We first tested the hypothesis that music with strong amplitude modulation would improve sustained attention, and found it did so when it occurred early in the experiment. Rapid modulations in music elicited greater activity in attentional networks in fMRI, as well as greater stimulus-brain coupling in EEG. Finally, to test the idea that specific modulation properties would differentially affect listeners based on their level of attentional difficulty, we parametrically manipulated the depth and rate of amplitude modulations inserted in otherwise-identical music, and found that beta-range modulations helped more than other modulation ranges for participants with more ADHD symptoms. Results suggest the possibility of an oscillation-based neural mechanism for targeted music to support improved cognitive performance.
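A minimal sketch of the parametric manipulation described above: imposing sinusoidal amplitude modulation of a chosen rate and depth on an audio signal. The exact modulation procedure, rates, and depths used in the study are not given in the abstract, so the values and function below are assumptions for illustration.

```python
# Sketch only: apply amplitude modulation at a given rate (Hz) and depth (0-1)
# to an otherwise unchanged signal. Not the study's actual processing pipeline.
import numpy as np

def amplitude_modulate(signal, sample_rate, mod_rate_hz, depth):
    """Multiply `signal` by a sinusoidal envelope with the given rate and depth."""
    t = np.arange(len(signal)) / sample_rate
    envelope = 1.0 + depth * np.sin(2.0 * np.pi * mod_rate_hz * t)
    envelope /= envelope.max()          # rescale so the envelope peaks at 1 (avoids clipping)
    return signal * envelope

# Example: 16 Hz (beta-range) modulation at moderate depth on a placeholder signal
fs = 44100
music = np.random.randn(fs * 2)         # stand-in for a 2-s music excerpt
modulated = amplitude_modulate(music, fs, mod_rate_hz=16.0, depth=0.5)
```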
Subjects
Attention, Auditory Perception, Magnetic Resonance Imaging, Music, Humans, Music/psychology, Attention/physiology, Male, Female, Auditory Perception/physiology, Young Adult, Adult, Electroencephalography, Acoustic Stimulation/methods, Brain/physiology, Brain/physiopathology

ABSTRACT
Psychophysical experiments conducted remotely over the internet permit data collection from large numbers of participants but sacrifice control over sound presentation and therefore are not widely employed in hearing research. To help standardize online sound presentation, we introduce a brief psychophysical test for determining whether online experiment participants are wearing headphones. Listeners judge which of three pure tones is quietest, with one of the tones presented 180° out of phase across the stereo channels. This task is intended to be easy over headphones but difficult over loudspeakers due to phase cancellation. We validated the test in the lab by testing listeners known to be wearing headphones or listening over loudspeakers. The screening test was effective and efficient, discriminating between the two modes of listening with a small number of trials. When run online, a bimodal distribution of scores was obtained, suggesting that some participants performed the task over loudspeakers despite instructions to use headphones. The ability to detect and screen out these participants mitigates concerns over sound quality for online experiments, a first step toward opening auditory perceptual research to the possibilities afforded by crowdsourcing.
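An illustrative sketch of the stimulus construction behind the screening task described above: three pure tones, one of which is presented 180° out of phase across the stereo channels so that it cancels acoustically over loudspeakers but not over headphones. The specific frequency, level difference, and duration below are assumptions, not values taken from the abstract.

```python
# Sketch only: build the three intervals of a headphone-check trial.
# Frequency (200 Hz), attenuation (-6 dB), and duration (1 s) are assumed values.
import numpy as np

def pure_tone(freq_hz, dur_s, fs, level_db=0.0, antiphase=False):
    """Return a stereo pure tone; if antiphase, the right channel is inverted."""
    t = np.arange(int(dur_s * fs)) / fs
    amp = 10.0 ** (level_db / 20.0)
    left = amp * np.sin(2.0 * np.pi * freq_hz * t)
    right = -left if antiphase else left            # 180° interaural phase difference
    return np.column_stack([left, right])

fs = 44100
# Three intervals: a reference tone, a genuinely quieter target, and an antiphase
# foil that sounds quietest over loudspeakers but not over headphones.
tones = [
    pure_tone(200.0, 1.0, fs),                      # reference
    pure_tone(200.0, 1.0, fs, level_db=-6.0),       # truly quietest (correct answer)
    pure_tone(200.0, 1.0, fs, antiphase=True),      # cancels over loudspeakers
]
```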
Subjects
Acoustic Stimulation/methods, Auditory Perception/physiology, Hearing Tests/instrumentation, Hearing Tests/methods, Internet, Adult, Female, Hearing/physiology, Humans, Male

ABSTRACT
Auditory scenes often contain concurrent sound sources, but listeners are typically interested in just one of these and must somehow select it for further processing. One challenge is that real-world sounds such as speech vary over time and as a consequence often cannot be separated or selected based on particular values of their features (e.g., high pitch). Here we show that human listeners can circumvent this challenge by tracking sounds with a movable focus of attention. We synthesized pairs of voices that changed in pitch and timbre over random, intertwined trajectories, lacking distinguishing features or linguistic information. Listeners were cued beforehand to attend to one of the voices. We measured their ability to extract this cued voice from the mixture by subsequently presenting the ending portion of one voice and asking whether it came from the cued voice. We found that listeners could perform this task but that performance was mediated by attention: listeners who performed best were also more sensitive to perturbations in the cued voice than in the uncued voice. Moreover, the task was impossible if the source trajectories did not maintain sufficient separation in feature space. The results suggest a locus of attention that can follow a sound's trajectory through a feature space, likely aiding selection and segregation amid similar distractors.
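A conceptual sketch of the trajectory construction described above: two random, intertwined paths through a two-dimensional feature space (pitch, timbre) whose moment-by-moment separation can be measured, since the abstract reports that the task was impossible without sufficient separation. This is not the authors' synthesis code; the random-walk parameterization, starting offset, and units are assumptions.

```python
# Sketch only: generate two smoothed random trajectories in a (pitch, timbre) plane
# and compute their minimum separation, which an experimenter could use to select
# or reject stimulus pairs. All parameters are illustrative assumptions.
import numpy as np

def smooth_random_walk(n_steps, step_sd, smooth=10, rng=None):
    """Return a smoothed 2-D random walk of shape (n_steps, 2)."""
    rng = rng or np.random.default_rng()
    steps = rng.normal(0.0, step_sd, size=(n_steps, 2))
    walk = np.cumsum(steps, axis=0)
    kernel = np.ones(smooth) / smooth
    return np.column_stack([np.convolve(walk[:, d], kernel, mode="same")
                            for d in range(2)])

def make_trajectory_pair(n_steps=500, start_offset=2.0, rng=None):
    """Return two trajectories plus their minimum distance in feature space."""
    rng = rng or np.random.default_rng()
    a = smooth_random_walk(n_steps, step_sd=0.1, rng=rng)
    b = smooth_random_walk(n_steps, step_sd=0.1, rng=rng) + np.array([start_offset, 0.0])
    separation = np.linalg.norm(a - b, axis=1)       # moment-by-moment distance
    return a, b, separation.min()

# Example: pairs whose minimum separation falls below a criterion could be re-drawn.
voice_a, voice_b, min_sep = make_trajectory_pair()
```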
Subjects
Attention, Cues (Psychology), Speech Perception, Adult, Female, Humans, Male, Perceptual Masking, Sound Spectrography, Speech Acoustics, Young Adult

ABSTRACT
Voice or speaker recognition is critical in a wide variety of social contexts. In this study, we investigated the contributions of acoustic, phonological, lexical, and semantic information toward voice recognition. Native English-speaking participants were trained to recognize five speakers in five conditions: non-speech, Mandarin, German, pseudo-English, and English. We showed that voice recognition significantly improved as more information became available, from purely acoustic features in non-speech to additional phonological information varying in familiarity. Moreover, we found that recognition performance transferred between training and testing in phonologically familiar conditions (German, pseudo-English, and English), but not in unfamiliar (Mandarin) or non-speech conditions. These results provide evidence suggesting that bottom-up acoustic analysis and top-down influence from phonological processing collaboratively govern voice recognition.