ABSTRACT
Adults can learn to identify nonnative speech sounds with training, albeit with substantial variability in learning behavior. Increases in behavioral accuracy are associated with increased separability of sound representations in cortical speech areas. However, it remains unclear whether individual auditory neural populations all show the same types of changes with learning, or whether there are heterogeneous encoding patterns. Here, we used high-resolution direct neural recordings to examine local population response patterns while native English listeners learned to recognize unfamiliar vocal pitch patterns in Mandarin Chinese tones. We found a distributed set of neural populations in bilateral superior temporal gyrus and ventrolateral frontal cortex, where the encoding of Mandarin tones changed throughout training as a function of trial-by-trial accuracy ("learning effect"), including both increases and decreases in the separability of tones. These populations were distinct from populations that showed changes as a function of exposure to the stimuli regardless of trial-by-trial accuracy. These learning effects were driven in part by more variable neural responses to repeated presentations of acoustically identical stimuli. Finally, learning effects could be predicted from speech-evoked activity even before training, suggesting that intrinsic properties of these populations make them amenable to behavior-related changes. Together, these results demonstrate that nonnative speech sound learning involves a wide array of changes in neural representations across a distributed set of brain regions.
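The "separability" of tone representations in a local neural population is commonly operationalized as cross-validated decoding accuracy: the better a classifier can recover tone identity from the population's responses, the more separable the representations. The sketch below illustrates that idea on synthetic data with a leave-one-out nearest-centroid decoder; all variable names, dimensions, and noise levels are illustrative assumptions, not the authors' actual analysis pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_channels, n_tones = 200, 16, 4

# Synthetic stand-in for a local population's responses: each Mandarin
# tone has a mean activity pattern, plus trial-to-trial noise.
tones = rng.integers(0, n_tones, size=n_trials)          # tone label per trial
templates = rng.normal(0.0, 1.0, size=(n_tones, n_channels))
responses = templates[tones] + rng.normal(0.0, 0.5, size=(n_trials, n_channels))

def separability(X, y, n_classes):
    """Leave-one-out nearest-centroid decoding accuracy (chance = 1/n_classes)."""
    correct = 0
    for i in range(len(y)):
        # Class centroids computed without the held-out trial i
        keep = np.arange(len(y)) != i
        cents = np.stack([X[(y == k) & keep].mean(axis=0)
                          for k in range(n_classes)])
        correct += np.argmin(((cents - X[i]) ** 2).sum(axis=1)) == y[i]
    return correct / len(y)

acc = separability(responses, tones, n_tones)
print(f"separability (decoding accuracy): {acc:.2f}")
```

With this framing, a "learning effect" corresponds to the decoding accuracy of a population rising or falling over training as a function of trial-by-trial behavioral accuracy.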
Subjects
Frontal Lobe/physiology, Learning/physiology, Speech Perception/physiology, Acoustic Stimulation, Adult, Brain/physiology, Evoked Potentials, Auditory/physiology, Female, Humans, Language, Male, Middle Aged, Phonetics, Pitch Perception/physiology, Speech/physiology, Speech Acoustics, Temporal Lobe/physiology

ABSTRACT
Adults struggle to learn non-native speech contrasts even after years of exposure. While laboratory-based training approaches yield learning, the optimal training conditions for maximizing speech learning in adulthood are currently unknown. Vagus nerve stimulation has been shown to prime adult sensory-perceptual systems towards plasticity in animal models. Precise temporal pairing with auditory stimuli can enhance auditory cortical representations with a high degree of specificity. Here, we examined whether sub-perceptual threshold transcutaneous vagus nerve stimulation (tVNS), paired with non-native speech sounds, enhances speech category learning in adults. Twenty-four native English speakers were trained to identify non-native Mandarin tone categories. Across two groups, tVNS was paired with the tone categories that were easier or harder to learn. A control group received no stimulation but followed an identical thresholding procedure as the intervention groups. We found that tVNS robustly enhanced speech category learning and retention of correct stimulus-response associations, but only when stimulation was paired with the easier-to-learn categories. This effect emerged rapidly, generalized to new exemplars, and was qualitatively different from the normal individual variability observed in hundreds of learners who have performed the same task without stimulation. Electroencephalography recorded before and after training indicated no evidence of tVNS-induced changes in the sensory representation of auditory stimuli. These results suggest that paired tVNS induces a temporally precise neuromodulatory signal that selectively enhances the perception and memory consolidation of perceptually salient categories.
ABSTRACT
INTRODUCTION: Scalp-recorded electrophysiological responses to complex, periodic auditory signals reflect phase-locked activity from neural ensembles within the auditory system. These responses, referred to as frequency-following responses (FFRs), have been widely utilized to index typical and atypical representation of speech signals in the auditory system. One of the major limitations of the FFR is the low signal-to-noise ratio at the level of single trials. For this reason, analysis relies on averaging across thousands of trials. The ability to examine the quality of single-trial FFRs would allow investigation of the trial-by-trial dynamics of the FFR, which has been impossible with the averaging approach. METHODS: In a novel, data-driven approach, we used machine learning principles to decode information related to the speech signal from single-trial FFRs. FFRs were collected from participants while they listened to two vowels produced by two speakers. Scalp-recorded electrophysiological responses were projected onto a low-dimensional spectral feature space independently derived from the same two vowels produced by 40 speakers, which were not presented to the participants. A supervised machine learning classifier was trained to discriminate vowel tokens on a subset of FFRs from each participant and tested on the remaining subset. RESULTS: We demonstrate reliable decoding of speech signals at the level of single trials by decomposing the raw FFR based on information-bearing spectral features in the speech signal that were independently derived. CONCLUSIONS: Taken together, the ability to extract interpretable features at the level of single trials in a data-driven manner offers uncharted possibilities for the noninvasive assessment of human auditory function.
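The pipeline described in METHODS can be sketched in four steps: derive a low-dimensional spectral feature space from an independent reference set, project noisy single-trial responses into that space, train a classifier on one subset of trials, and test on the remainder. The toy version below uses synthetic sinusoids as stand-ins for vowels and noisy single-trial FFRs, PCA (via SVD) for the feature space, and a nearest-centroid classifier; every signal parameter, vowel label, and model choice here is an illustrative assumption, not the authors' actual method.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, dur = 2000, 0.25                      # assumed sampling rate (Hz) and duration (s)
t = np.arange(int(fs * dur)) / fs
f0 = {"/a/": 100.0, "/i/": 120.0}         # hypothetical fundamental frequencies

def spectrum(x):
    return np.abs(np.fft.rfft(x))         # magnitude spectrum of one epoch

# 1) Reference set: many productions per vowel (stands in for the 40-speaker set)
ref, ref_lab = [], []
for lab, f in f0.items():
    for _ in range(40):
        jitter = rng.normal(0.0, 2.0)     # speaker-to-speaker F0 variation
        ref.append(spectrum(np.sin(2 * np.pi * (f + jitter) * t)))
        ref_lab.append(lab)
ref = np.array(ref)

# 2) Low-dimensional spectral feature space: PCA on the reference spectra
mu = ref.mean(axis=0)
_, _, Vt = np.linalg.svd(ref - mu, full_matrices=False)

def project(X):
    return (X - mu) @ Vt[:5].T            # keep the top 5 components

# 3) Synthetic single-trial "FFRs": phase-locked signal buried in noise
trials, labels = [], []
for lab, f in f0.items():
    for _ in range(60):
        x = np.sin(2 * np.pi * f * t) + rng.normal(0.0, 2.0, t.size)  # low SNR
        trials.append(spectrum(x))
        labels.append(lab)
feats = project(np.array(trials))
labels = np.array(labels)

# 4) Train on half the trials, test on the rest (nearest-centroid classifier)
idx = rng.permutation(len(labels))
train, test = idx[: len(idx) // 2], idx[len(idx) // 2:]
cents = {lab: feats[train][labels[train] == lab].mean(axis=0) for lab in f0}
pred = [min(f0, key=lambda lab: np.linalg.norm(feats[i] - cents[lab])) for i in test]
acc = float(np.mean(np.array(pred) == labels[test]))
print(f"single-trial decoding accuracy: {acc:.2f}")
```

The key design point is that the feature space is fit only on the independent reference productions, so above-chance test accuracy reflects stimulus information recoverable from individual noisy trials rather than overfitting to the recorded FFRs.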