1.
Adv Neural Inf Process Syst; 36: 638-654, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38434255

ABSTRACT

Modern language models excel at integrating across long temporal scales needed to encode linguistic meaning and show non-trivial similarities to biological neural systems. Prior work suggests that human brain responses to language exhibit hierarchically organized "integration windows" that substantially constrain the overall influence of an input token (e.g., a word) on the neural response. However, little prior work has attempted to use integration windows to characterize computations in large language models (LLMs). We developed a simple word-swap procedure for estimating integration windows from black-box language models that does not depend on access to gradients or knowledge of the model architecture (e.g., attention weights). Using this method, we show that trained LLMs exhibit stereotyped integration windows that are well-fit by a convex combination of an exponential and a power-law function, with a partial transition from exponential to power-law dynamics across network layers. We then introduce a metric for quantifying the extent to which these integration windows vary with structural boundaries (e.g., sentence boundaries), and using this metric, we show that integration windows become increasingly yoked to structure at later network layers. None of these findings were observed in an untrained model, which as expected integrated uniformly across its input. These results suggest that LLMs learn to integrate information in natural language using a stereotyped pattern: integrating across position-yoked, exponential windows at early layers, followed by structure-yoked, power-law windows at later layers. The methods we describe in this paper provide a general-purpose toolkit for understanding temporal integration in language models, facilitating cross-disciplinary research at the intersection of biological and artificial intelligence.
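The word-swap logic described above is simple enough to sketch. Below is a minimal, hypothetical Python illustration (not the paper's code): `model_logprob` stands in for any black-box LM query, the influence measurements are synthetic, and the fitted curve is the convex exponential/power-law mixture named in the abstract.

```python
# Hypothetical sketch of the word-swap procedure: swap the token `lag`
# positions before a target, measure the change in the model's target
# log-probability, then fit the influence-vs-lag curve with a convex
# mixture of exponential and power-law decay.
import numpy as np
from scipy.optimize import curve_fit

def influence_at_lag(model_logprob, tokens, target_idx, lag, swap_token):
    """|delta log p(target)| after swapping the token `lag` positions back.
    `model_logprob(tokens, idx)` is a stand-in for a black-box LM query."""
    base = model_logprob(tokens, target_idx)
    swapped = list(tokens)
    swapped[target_idx - lag] = swap_token
    return abs(model_logprob(swapped, target_idx) - base)

def exp_powerlaw(d, w, tau, alpha, scale):
    """Convex combination of an exponential and a power-law window."""
    return scale * (w * np.exp(-d / tau) + (1 - w) * (d + 1.0) ** (-alpha))

# Synthetic influence measurements stand in for averages over many swaps.
lags = np.arange(1, 64)
influence = exp_powerlaw(lags, 0.6, 5.0, 1.2, 1.0) + 0.01 * np.random.rand(lags.size)
(w, tau, alpha, scale), _ = curve_fit(
    exp_powerlaw, lags, influence, p0=[0.5, 5.0, 1.0, 1.0],
    bounds=([0, 0.1, 0.1, 0], [1, 100, 5, 10]))
print(f"mixture weight w={w:.2f}, tau={tau:.1f} tokens, alpha={alpha:.2f}")
```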

3.
Curr Biol; 32(7): 1470-1484.e12, 2022 Apr 11.
Article in English | MEDLINE | ID: mdl-35196507

ABSTRACT

How is music represented in the brain? While neuroimaging has revealed some spatial segregation between responses to music versus other sounds, little is known about the neural code for music itself. To address this question, we developed a method to infer canonical response components of human auditory cortex using intracranial responses to natural sounds, and further used the superior coverage of fMRI to map their spatial distribution. The inferred components replicated many prior findings, including distinct neural selectivity for speech and music, but also revealed a novel component that responded nearly exclusively to music with singing. Song selectivity was not explainable by standard acoustic features, was located near speech- and music-selective responses, and was also evident in individual electrodes. These results suggest that representations of music are fractionated into subpopulations selective for different types of music, one of which is specialized for the analysis of song.
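As a rough illustration of the component-inference idea (the paper's own algorithm uses different statistical criteria), a matrix factorization such as NMF can split an electrode-by-sound response matrix into component response profiles and electrode weights; all data and indices below are synthetic placeholders.

```python
# Illustrative only: factor an [electrodes x sounds] response matrix into
# component response profiles and per-electrode weights. sklearn's NMF
# stands in for the paper's decomposition method.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
responses = rng.random((200, 165))        # 200 electrodes x 165 natural sounds
model = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
weights = model.fit_transform(responses)  # electrode weights, [200 x 10]
profiles = model.components_              # component response profiles, [10 x 165]

# A song-selective component would respond strongly to sung music and weakly
# to everything else; the column indices here are hypothetical.
sung = profiles[:, :20].mean(axis=1)
other = profiles[:, 20:].mean(axis=1)
selectivity = (sung - other) / (sung + other + 1e-9)
print("component selectivity indices:", np.round(selectivity, 2))
```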


Subject(s)
Auditory Cortex; Music; Speech Perception; Acoustic Stimulation/methods; Auditory Cortex/physiology; Auditory Perception/physiology; Brain Mapping/methods; Humans; Speech/physiology; Speech Perception/physiology
4.
Nat Hum Behav; 6(3): 455-469, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35145280

ABSTRACT

To derive meaning from sound, the brain must integrate information across many timescales. What computations underlie multiscale integration in human auditory cortex? Evidence suggests that auditory cortex analyses sound using both generic acoustic representations (for example, spectrotemporal modulation tuning) and category-specific computations, but the timescales over which these putatively distinct computations integrate remain unclear. To answer this question, we developed a general method to estimate sensory integration windows (the time window within which stimuli alter the neural response) and applied our method to intracranial recordings from neurosurgical patients. We show that human auditory cortex integrates hierarchically across diverse timescales spanning from ~50 to 400 ms. Moreover, we find that neural populations with short and long integration windows exhibit distinct functional properties: short-integration electrodes (less than ~200 ms) show prominent spectrotemporal modulation selectivity, while long-integration electrodes (greater than ~200 ms) show prominent category selectivity. These findings reveal how multiscale integration organizes auditory computation in the human brain.
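The logic of the integration-window estimate can be conveyed with a toy simulation (this is not the paper's estimation procedure): responses to identical segments embedded in two random contexts become correlated once the segment duration exceeds a population's window.

```python
# Toy demonstration: cross-context correlation rises with segment duration
# and saturates once segments exceed the (hypothetical) integration window.
import numpy as np

rng = np.random.default_rng(1)
window_ms = 200                            # hypothetical electrode window
for dur in [62, 125, 250, 500, 1000]:      # segment durations, ms
    shared = min(dur / window_ms, 1.0)     # fraction of window driven by segment
    common = rng.standard_normal(500)      # segment-driven response component
    resp_a = shared * common + (1 - shared) * rng.standard_normal(500)
    resp_b = shared * common + (1 - shared) * rng.standard_normal(500)
    r = np.corrcoef(resp_a, resp_b)[0, 1]
    print(f"{dur:5d} ms: cross-context r = {r:.2f}")
```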


Subject(s)
Auditory Cortex; Acoustic Stimulation/methods; Auditory Perception; Brain; Brain Mapping/methods; Humans
5.
Cognition; 214: 104627, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34044231

ABSTRACT

Sound is caused by physical events in the world. Do humans infer these causes when recognizing sound sources? We tested whether the recognition of common environmental sounds depends on the inference of a basic physical variable: the source intensity (i.e., the power that produces a sound). A source's intensity can be inferred from the intensity it produces at the ear and its distance, which is normally conveyed by reverberation. Listeners could thus use intensity at the ear and reverberation to constrain recognition by inferring the underlying source intensity. Alternatively, listeners might separate these acoustic cues from their representation of a sound's identity in the interest of invariant recognition. We compared these two hypotheses by measuring recognition accuracy for sounds with typically low or high source intensity (e.g., pepper grinders vs. trucks) that were presented across a range of intensities at the ear or with reverberation cues to distance. The recognition of low-intensity sources (e.g., pepper grinders) was impaired by high presentation intensities or reverberation that conveyed distance, either of which implies high source intensity. Neither effect occurred for high-intensity sources. The results suggest that listeners implicitly use the intensity at the ear along with distance cues to infer a source's power and constrain its identity. The recognition of real-world sounds thus appears to depend upon the inference of their physical generative parameters, even generative parameters whose cues might otherwise be separated from the representation of a sound's identity.
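The inference attributed to listeners can be made concrete with a back-of-envelope calculation, assuming an idealized free-field inverse-square law (a simplification, not a calculation from the paper's analyses):

```python
# Under the inverse-square law, equal intensity at the ear implies a far
# more powerful source when distance cues indicate a distant source.
def inferred_source_power(intensity_at_ear, distance_m):
    """Relative source power implied by ear-level intensity and distance."""
    return intensity_at_ear * distance_m ** 2

near = inferred_source_power(1.0, 1.0)   # same ear-level intensity, 1 m away
far = inferred_source_power(1.0, 10.0)   # same intensity, but cues say 10 m
print(f"implied power ratio (far/near): {far / near:.0f}x")  # 100x
```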


Subject(s)
Auditory Perception; Sound Localization; Acoustic Stimulation; Cues; Humans; Sound
6.
J Neurophysiol; 125(6): 2237-2263, 2021 Jun 1.
Article in English | MEDLINE | ID: mdl-33596723

ABSTRACT

Recent work has shown that human auditory cortex contains neural populations anterior and posterior to primary auditory cortex that respond selectively to music. However, it is unknown how this selectivity for music arises. To test whether musical training is necessary, we measured fMRI responses to 192 natural sounds in 10 people with almost no musical training. When voxel responses were decomposed into underlying components, this group exhibited a music-selective component that was very similar in response profile and anatomical distribution to that previously seen in individuals with moderate musical training. We also found that musical genres that were less familiar to our participants (e.g., Balinese gamelan) produced strong responses within the music component, as did drum clips with rhythm but little melody, suggesting that these neural populations are broadly responsive to music as a whole. Our findings demonstrate that the signature properties of neural music selectivity do not require musical training to develop, showing that the music-selective neural populations are a fundamental and widespread property of the human brain.

NEW & NOTEWORTHY: We show that music-selective neural populations are clearly present in people without musical training, demonstrating that they are a fundamental and widespread property of the human brain. Additionally, we show that music-selective neural populations respond strongly to music from unfamiliar genres as well as music with rhythm but little pitch information, suggesting that they are broadly responsive to music as a whole.
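One step of this comparison, matching a newly recovered component profile against a previously reported one, reduces to a correlation across the sound set; the sketch below uses synthetic profiles and placeholder category indices purely for illustration.

```python
# Hypothetical sketch: correlate a music component's response profile
# recovered in non-musicians with a reference profile across 192 sounds,
# and compare mean responses for music vs. non-music clips.
import numpy as np

rng = np.random.default_rng(2)
reference = rng.random(192)                              # stand-in reference profile
recovered = reference + 0.1 * rng.standard_normal(192)   # stand-in new profile

print(f"profile similarity r = {np.corrcoef(reference, recovered)[0, 1]:.2f}")

music = recovered[:60]        # placeholder music-clip indices
non_music = recovered[60:]
print(f"music mean = {music.mean():.2f}, non-music mean = {non_music.mean():.2f}")
```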


Subject(s)
Auditory Cortex/physiology; Auditory Perception/physiology; Brain Mapping; Music; Practice, Psychological; Adult; Brain Mapping/methods; Female; Humans; Magnetic Resonance Imaging; Male; Young Adult
7.
Adv Neural Inf Process Syst; 34: 24455-24467, 2021 Dec.
Article in English | MEDLINE | ID: mdl-38737583

ABSTRACT

Natural signals such as speech are hierarchically structured across many different timescales, spanning tens (e.g., phonemes) to hundreds (e.g., words) of milliseconds, each of which is highly variable and context-dependent. While deep neural networks (DNNs) excel at recognizing complex patterns from natural signals, relatively little is known about how DNNs flexibly integrate across multiple timescales. Here, we show how a recently developed method for studying temporal integration in biological neural systems, the temporal context invariance (TCI) paradigm, can be used to understand temporal integration in DNNs. The method is simple: we measure responses to a large number of stimulus segments presented in two different contexts and estimate the smallest segment duration needed to achieve a context-invariant response. We applied our method to understand how the popular DeepSpeech2 model learns to integrate across time in speech. We find that nearly all of the model units, even in recurrent layers, have a compact integration window within which stimuli substantially alter the response and outside of which stimuli have little effect. We show that training causes these integration windows to shrink at early layers and expand at higher layers, creating a hierarchy of integration windows across the network. Moreover, by measuring integration windows for time-stretched/compressed speech, we reveal a transition point, midway through the trained network, where integration windows become yoked to the duration of stimulus structures (e.g., phonemes or words) rather than absolute time. Similar phenomena were observed in a purely recurrent and a purely convolutional network, although structure-yoked integration was more prominent in the recurrent network. These findings suggest that deep speech recognition systems use a common motif to encode the hierarchical structure of speech: integrating across short, time-yoked windows at early layers and long, structure-yoked windows at later layers. Our method provides a straightforward and general-purpose toolkit for understanding temporal integration in black-box machine learning models.
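Continuing the toy simulation style used above (again, a sketch rather than the paper's actual estimator), the TCI measurement for a single network unit can be framed as: find the shortest segment duration whose cross-context correlation clears a criterion. In practice the paired responses would be layer activations from a model such as DeepSpeech2 at matched segment positions; here they are simulated.

```python
# Sketch: estimate a unit's integration window as the shortest segment
# duration whose responses are context-invariant (cross-context r above
# a criterion). Responses are simulated for a hypothetical unit.
import numpy as np

def estimate_window(durations_ms, resp_pairs, criterion=0.75):
    """Shortest duration whose cross-context correlation exceeds criterion."""
    for dur, (a, b) in zip(durations_ms, resp_pairs):
        if np.corrcoef(a, b)[0, 1] >= criterion:
            return dur
    return None  # window longer than the longest tested segment

rng = np.random.default_rng(3)
durations_ms = [31, 62, 125, 250, 500]
true_window = 125.0                        # hypothetical unit window, ms
pairs = []
for dur in durations_ms:
    shared = min(dur / true_window, 1.0)
    common = rng.standard_normal(400)
    pairs.append((shared * common + (1 - shared) * rng.standard_normal(400),
                  shared * common + (1 - shared) * rng.standard_normal(400)))
print(f"estimated window: {estimate_window(durations_ms, pairs)} ms")
```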

8.
Front Neurosci; 13: 1165, 2019.
Article in English | MEDLINE | ID: mdl-31736698

ABSTRACT

Machine learning classification techniques are frequently applied to structural and resting-state fMRI data to identify brain-based biomarkers for developmental disorders. However, task-related fMRI has rarely been used as a diagnostic tool. Here, we used structural MRI, resting-state connectivity and task-based fMRI data to detect congenital amusia, a pitch-specific developmental disorder. All approaches discriminated amusics from controls in meaningful brain networks at similar levels of accuracy. Interestingly, the classifier outcome was specific to deficit-related neural circuits, as the group classification failed for fMRI data acquired during a verbal task for which amusics were unimpaired. Most importantly, classifier outputs of task-related fMRI data predicted individual behavioral performance on an independent pitch-based task, whereas this relationship was not observed for structural or resting-state data. These results suggest that task-related imaging data can potentially serve as a powerful diagnostic tool for identifying developmental disorders, as they allow for the prediction of symptom severity.
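The general shape of such a pipeline (not the paper's exact features or classifier settings) is a cross-validated linear classifier on imaging features, whose decision values can then be related to independent behavioral scores; everything below is toy data.

```python
# Schematic classification pipeline: linear SVM on fMRI-derived features
# with cross-validation; decision values are then correlated with
# (toy) behavioral scores to mimic the severity-prediction analysis.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score, cross_val_predict

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 500))   # 30 subjects x 500 voxel features
y = np.repeat([0, 1], 15)            # amusic vs. control labels
X[y == 1] += 0.15                    # inject a toy group difference

clf = LinearSVC(C=1.0, max_iter=10000)
acc = cross_val_score(clf, X, y, cv=5).mean()
decision = cross_val_predict(clf, X, y, cv=5, method="decision_function")
behavior = decision + 0.5 * rng.standard_normal(30)  # toy behavioral scores
r = np.corrcoef(decision, behavior)[0, 1]
print(f"cv accuracy = {acc:.2f}, decision-behavior r = {r:.2f}")
```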

9.
Nat Neurosci; 22(7): 1057-1060, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31182868

ABSTRACT

We report a difference between humans and macaque monkeys in the functional organization of cortical regions implicated in pitch perception. Humans but not macaques showed regions with a strong preference for harmonic sounds compared to noise, measured with both synthetic tones and macaque vocalizations. In contrast, frequency-selective tonotopic maps were similar between the two species. This species difference may be driven by the unique demands of speech and music perception in humans.


Subject(s)
Auditory Cortex/physiology; Pitch Perception/physiology; Speech Perception/physiology; Acoustic Stimulation; Adult; Animals; Auditory Cortex/anatomy & histology; Brain Mapping; Female; Humans; Macaca mulatta; Magnetic Resonance Imaging; Male; Music; Species Specificity; Vocalization, Animal
10.
PLoS Biol; 16(12): e2005127, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30507943

ABSTRACT

A central goal of sensory neuroscience is to construct models that can explain neural responses to natural stimuli. As a consequence, sensory models are often tested by comparing neural responses to natural stimuli with model responses to those stimuli. One challenge is that distinct model features are often correlated across natural stimuli, and thus model features can predict neural responses even if they do not in fact drive them. Here, we propose a simple alternative for testing a sensory model: we synthesize a stimulus that yields the same model response as each of a set of natural stimuli, and test whether the natural and "model-matched" stimuli elicit the same neural responses. We used this approach to test whether a common model of auditory cortex, in which spectrogram-like peripheral input is processed by linear spectrotemporal filters, can explain fMRI responses in humans to natural sounds. Prior studies have shown that this model has good predictive power throughout auditory cortex, but this finding could reflect feature correlations in natural stimuli. We observed that fMRI responses to natural and model-matched stimuli were nearly equivalent in primary auditory cortex (PAC) but that nonprimary regions, including those selective for music or speech, showed highly divergent responses to the two sound sets. This dissociation between primary and nonprimary regions was less clear from model predictions due to the influence of feature correlations across natural stimuli. Our results provide a signature of hierarchical organization in human auditory cortex, and suggest that nonprimary regions compute higher-order stimulus properties that are not well captured by traditional models. Our methodology enables stronger tests of sensory models and could be broadly applied in other domains.
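The synthesis logic generalizes beyond the auditory model used in the paper. In the toy version below the "model" is a fixed random linear feature map, so matching reduces to gradient descent on the feature-space error; the paper's model is nonlinear, but the outline (start from noise, match model features, compare responses) is the same.

```python
# Toy model-matching: adjust a synthetic signal until a model's features
# match those of a 'natural' signal, starting from noise.
import numpy as np

rng = np.random.default_rng(5)
W = rng.standard_normal((64, 256)) / 16.0  # stand-in model: 64 linear features
natural = rng.standard_normal(256)         # stand-in natural signal
target = W @ natural                       # features to be matched

x = rng.standard_normal(256)               # synthetic signal, initialized as noise
for _ in range(2000):
    err = W @ x - target
    x -= 0.1 * (W.T @ err)                 # gradient step on 0.5 * ||W x - f*||^2

print(f"feature error: {np.linalg.norm(W @ x - target):.2e}")
print(f"signal difference: {np.linalg.norm(x - natural):.2f}")
# The signal difference stays large even when the features match exactly:
# that freedom is what makes the natural/model-matched comparison a test.
```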


Subject(s)
Auditory Cortex/physiology; Auditory Perception/physiology; Acoustic Stimulation/methods; Adult; Algorithms; Female; Humans; Linear Models; Magnetic Resonance Imaging/methods; Male; Models, Neurological; Music; Sound; Speech/physiology
11.
Neuron; 98(3): 630-644.e16, 2018 May 2.
Article in English | MEDLINE | ID: mdl-29681533

ABSTRACT

A core goal of auditory neuroscience is to build quantitative models that predict cortical responses to natural sounds. Reasoning that a complete model of auditory cortex must solve ecologically relevant tasks, we optimized hierarchical neural networks for speech and music recognition. The best-performing network contained separate music and speech pathways following early shared processing, potentially replicating human cortical organization. The network performed both tasks as well as humans and exhibited human-like errors despite not being optimized to do so, suggesting common constraints on network and human performance. The network predicted fMRI voxel responses substantially better than traditional spectrotemporal filter models throughout auditory cortex. It also provided a quantitative signature of cortical representational hierarchy: primary and non-primary responses were best predicted by intermediate and late network layers, respectively. The results suggest that task optimization provides a powerful set of tools for modeling sensory systems.
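The voxel-prediction analysis has a standard skeleton, sketched below with random stand-ins for network activations: regularized linear regression from a layer's activations to a voxel's responses, scored by cross-validated correlation, and repeated per layer to obtain the layer-vs-region profile described above.

```python
# Schematic voxelwise encoding analysis: ridge regression from layer
# activations (toy data) to a voxel response, with cross-validated scoring.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(6)
activations = rng.standard_normal((165, 1024))  # 165 sounds x 1024 layer units
true_w = rng.standard_normal(1024) / 32.0       # toy ground-truth weights
voxel = activations @ true_w + 0.5 * rng.standard_normal(165)

preds = np.zeros(165)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for train, test in cv.split(activations):
    model = Ridge(alpha=100.0).fit(activations[train], voxel[train])
    preds[test] = model.predict(activations[test])
print(f"cross-validated r = {np.corrcoef(preds, voxel)[0, 1]:.2f}")
```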


Subject(s)
Acoustic Stimulation/methods; Auditory Cortex/diagnostic imaging; Auditory Cortex/physiology; Magnetic Resonance Imaging/methods; Nerve Net/diagnostic imaging; Nerve Net/physiology; Psychomotor Performance/physiology; Adolescent; Adult; Aged; Female; Forecasting; Humans; Male; Middle Aged; Young Adult
12.
J Neurosci; 36(10): 2986-94, 2016 Mar 9.
Article in English | MEDLINE | ID: mdl-26961952

ABSTRACT

Congenital amusia is a lifelong deficit in music perception thought to reflect an underlying impairment in the perception and memory of pitch. The neural basis of amusic impairments is actively debated. Some prior studies have suggested that amusia stems from impaired connectivity between auditory and frontal cortex. However, it remains possible that impairments in pitch coding within auditory cortex also contribute to the disorder, in part because prior studies have not measured responses from the cortical regions most implicated in pitch perception in normal individuals. We addressed this question by measuring fMRI responses in 11 subjects with amusia and 11 age- and education-matched controls to a stimulus contrast that reliably identifies pitch-responsive regions in normal individuals: harmonic tones versus frequency-matched noise. Our findings demonstrate that amusic individuals with a substantial pitch perception deficit exhibit clusters of pitch-responsive voxels that are comparable in extent, selectivity, and anatomical location to those of control participants. We discuss possible explanations for why amusics might be impaired at perceiving pitch relations despite exhibiting normal fMRI responses to pitch in their auditory cortex: (1) individual neurons within the pitch-responsive region might exhibit abnormal tuning or temporal coding not detectable with fMRI, (2) anatomical tracts that link pitch-responsive regions to other brain areas (e.g., frontal cortex) might be altered, and (3) cortical regions outside of pitch-responsive cortex might be abnormal. The ability to identify pitch-responsive regions in individual amusic subjects will make it possible to ask more precise questions about their role in amusia in future work.
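The localizer logic (harmonic tones versus frequency-matched noise) reduces to a voxelwise paired comparison; the sketch below uses toy data and a simple paired t-test rather than the paper's full fMRI analysis.

```python
# Minimal sketch: flag a voxel as pitch-responsive when its response to
# harmonic tones reliably exceeds its response to frequency-matched noise.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(7)
n_voxels, n_runs = 1000, 12
noise_resp = rng.standard_normal((n_runs, n_voxels))
tone_resp = noise_resp + 0.1 * rng.standard_normal((n_runs, n_voxels))
tone_resp[:, :50] += 1.0                  # toy pitch-responsive voxels

t, p = ttest_rel(tone_resp, noise_resp, axis=0)
pitch_voxels = (t > 0) & (p < 0.001)
print(f"pitch-responsive voxels: {pitch_voxels.sum()} / {n_voxels}")
```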


Subject(s)
Auditory Perceptual Disorders/complications; Auditory Perceptual Disorders/pathology; Cerebral Cortex/physiopathology; Pitch Discrimination/physiology; Acoustic Stimulation; Adolescent; Adult; Case-Control Studies; Cerebral Cortex/blood supply; Female; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Male; Middle Aged; Oxygen/blood; Regression Analysis; Young Adult