ABSTRACT
Few studies have examined neural correlates of late talking in toddlers, which could aid in understanding the etiology and improving the diagnosis of developmental language disorder (DLD). Greater frontal gamma activity has been linked to better language skills, but findings vary by risk for developmental disorders, and this has not been investigated in late talkers. This study examined whether frontal gamma power (30-50 Hz), from baseline-state electroencephalography (EEG), was related to DLD risk (categorical late-talker status) and to a continuous measure of expressive language in n = 124 toddlers. Frontal gamma power was significantly associated with late-talker status when controlling for demographic factors and concurrent receptive language (β = 1.96, McFadden's pseudo-R² = 0.21). Demographic factors and receptive language did not significantly moderate the association between frontal gamma power and late-talker status. A continuous measure of expressive language ability was not significantly associated with gamma (r = -0.07). Findings suggest that frontal gamma power may be useful for discriminating between groups of children who differ in DLD risk, but not for indexing expressive language along a continuous spectrum of ability.
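A minimal sketch of how such a metric can be computed, assuming a standard Python stack (NumPy/SciPy/scikit-learn) rather than the authors' actual pipeline: Welch PSD for 30-50 Hz frontal power, a logistic regression on late-talker status, and McFadden's pseudo-R². All data, channel counts, and covariates below are simulated placeholders.

```python
import numpy as np
from scipy.signal import welch
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

FS = 500  # Hz, assumed sampling rate

def frontal_gamma_power(eeg, fs=FS, band=(30.0, 50.0)):
    """eeg: array (n_frontal_channels, n_samples) of baseline-state EEG."""
    freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs, axis=-1)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    band_power = psd[:, mask].mean(axis=-1)   # mean PSD across the gamma band
    return np.log(band_power.mean())          # average over frontal channels, log-scale

# Hypothetical group-level model: gamma power plus covariates -> status.
rng = np.random.default_rng(0)
X = rng.normal(size=(124, 3))        # [gamma, receptive language, SES], simulated
y = rng.integers(0, 2, size=124)     # 1 = late talker (simulated labels)
model = LogisticRegression().fit(X, y)

# McFadden's pseudo-R^2 = 1 - LL_model / LL_null
ll_model = -log_loss(y, model.predict_proba(X), normalize=False)
p0 = np.full(y.size, y.mean())
ll_null = -log_loss(y, np.c_[1 - p0, p0], normalize=False)
print(model.coef_, 1 - ll_model / ll_null)
```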
Subjects
Electroencephalography; Frontal Lobe; Gamma Rhythm; Language Development Disorders; Humans; Female; Male; Child, Preschool; Frontal Lobe/physiology; Gamma Rhythm/physiology; Language Development Disorders/physiopathology; Infant; Language; Language Development; Speech/physiology; Child Language
ABSTRACT
Pitch peaks tend to be higher at the beginning of longer utterances than in shorter ones (e.g., 'The Santa is decorating the Christmas tree and the window' vs. 'The Santa is decorating the Christmas trees'). Given that a rise in pitch frequently occurs in response to increased mental effort, we explore the link between higher pitch at the beginning of an utterance and the cognitive demands of sentence planning for speech production. To modulate the cognitive resources available for generating a message in a visual world speech production task, the study implemented a dual-task paradigm. Participants described pictures depicting events with multiple actors. In one-half of these descriptions, the participants memorized three nouns, later recalling them and answering related questions. The results demonstrate both cognitive and linguistic influences on sentence intonation. Specifically, intonation peaks at the beginning of longer utterances were higher than in shorter ones, and they were lower under memory load than under no load. Eye-gaze measurements indicated a very short processing delay at the outset of picture and sentence processing, which was rapidly overcome by the start of speech. The short time frame of restricted cognitive resources was thus manifested in the lowering of the intonation peaks. These findings establish a novel link between language-related memory span and sentence intonation and warrant further study to investigate the cognitive mechanisms of the planning of intonation.
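For illustration, a hedged sketch of the utterance-initial pitch-peak measure this abstract implies, assuming the librosa library and a hypothetical recording; the 800 ms window and F0 bounds are arbitrary choices, not the study's parameters.

```python
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)   # hypothetical recording
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
times = librosa.times_like(f0, sr=sr)
onset_window = (times < 0.8) & voiced_flag       # voiced frames in first 800 ms
initial_peak_hz = np.nanmax(f0[onset_window])    # utterance-initial pitch peak
print(f"utterance-initial pitch peak: {initial_peak_hz:.1f} Hz")
```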
Subjects
Cognition; Speech; Humans; Speech/physiology; Male; Cognition/physiology; Female; Adult; Young Adult
ABSTRACT
Speaker recognition is a technology that identifies the speaker of an input utterance by extracting speaker-distinguishing features from the speech signal. Speaker recognition is used for system security and authentication; it is therefore crucial to extract unique features of the speaker to achieve high recognition rates. Representative methods for extracting these features include classification-based training, or contrastive learning of the speaker relationships between representations, with embeddings then extracted from a specific layer of the model. This paper introduces a framework for developing robust speaker recognition models through contrastive learning. This approach aims to minimize the similarity to hard negative samples: those that are genuine negatives but have features extremely similar to the positives, leading to potential misclassification. Specifically, our proposed method trains the model by estimating hard negative samples within a mini-batch during contrastive learning, and then utilizes a cross-attention mechanism to determine speaker agreement for pairs of utterances. To demonstrate the effectiveness of our proposed method, we compared the performance of a deep learning model trained with a conventional loss function used in speaker recognition against that of a deep learning model trained with our proposed method, as measured by the equal error rate (EER), an objective performance metric. Our results indicate that when trained on the VoxCeleb2 dataset, the proposed method achieved an EER of 0.98% on the VoxCeleb1-E dataset and 1.84% on the VoxCeleb1-H dataset.
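As a concrete reference, this is how the EER reported above is conventionally computed from verification scores: the operating point where the false-acceptance and false-rejection rates cross. The scores below are simulated; this sketch is not the paper's evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """labels: 1 for same-speaker trials, 0 for different-speaker trials.
    scores: higher means 'more likely same speaker' (e.g., cosine similarity)."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))   # point where FAR ~= FRR
    return (fpr[idx] + fnr[idx]) / 2.0

rng = np.random.default_rng(0)
genuine = rng.normal(1.0, 0.5, 1000)    # simulated same-speaker similarity scores
impostor = rng.normal(0.0, 0.5, 1000)   # simulated different-speaker scores
scores = np.concatenate([genuine, impostor])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])
print(f"EER = {equal_error_rate(labels, scores):.2%}")
```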
Subjects
Speech; Humans; Speech/physiology; Algorithms; Deep Learning; Pattern Recognition, Automated/methods; Speech Recognition Software
ABSTRACT
BACKGROUND: Several studies indicate that people who stutter show greater variability in speech movements than people who do not stutter, even when the speech produced is perceptibly fluent. Speaking to the beat of a metronome reliably increases fluency in people who stutter, regardless of the severity of stuttering. OBJECTIVES: Here, we aimed to test whether metronome-timed speech reduces articulatory variability. METHOD: We analysed vocal tract MRI data from 24 people who stutter and 16 controls. Participants repeated sentences with and without a metronome. Midsagittal images of the vocal tract from lips to larynx were reconstructed at 33.3 frames per second. Any utterances containing dysfluencies or non-speech movements (e.g. swallowing) were excluded. For each participant, we measured the variability of movements (coefficient of variation) from the alveolar, palatal and velar regions of the vocal tract. RESULTS: People who stutter had more variability than control speakers when speaking without a metronome; with the metronome, their variability was reduced to the same level as controls. The velar region showed more variability than the alveolar and palatal regions, which were similar. CONCLUSIONS: These results demonstrate that kinematic variability during perceptibly fluent speech is increased in people who stutter compared with controls when repeating naturalistic sentences without any alteration or disruption to the speech. This extends our previous finding of greater movement variability in people who stutter when producing perceptibly fluent nonwords compared with controls. These results also show that, in addition to increasing fluency in people who stutter, metronome-timed speech reduces articulatory variability to the level seen in control speakers.
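A minimal sketch of the variability measure named in the METHOD section, the coefficient of variation (SD / mean), computed per vocal-tract region across repeated utterances. The displacement values are made up; the authors' exact kinematic quantity and preprocessing are not specified here.

```python
import numpy as np

def coefficient_of_variation(displacements):
    """displacements: array (n_repetitions,) of e.g. peak constriction
    displacement in one region (alveolar, palatal, or velar) per utterance."""
    d = np.asarray(displacements, dtype=float)
    return d.std(ddof=1) / d.mean()

# Hypothetical example: 10 repetitions of the same sentence (values in mm).
alveolar = [4.1, 4.3, 3.9, 4.0, 4.2, 4.4, 3.8, 4.1, 4.0, 4.2]
print(coefficient_of_variation(alveolar))
```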
Subjects
Magnetic Resonance Imaging; Speech; Stuttering; Humans; Stuttering/physiopathology; Male; Adult; Female; Speech/physiology; Biomechanical Phenomena; Young Adult; Middle Aged; Case-Control Studies
ABSTRACT
Two experiments served to examine how people arrive at stimulus-specific prospective judgments about the distracting effects of speech on cognitive performance. The direct-access account implies that people have direct metacognitive access to the cognitive effects of sounds that determine distraction. The processing-fluency account implies that people rely on the processing-fluency heuristic to predict the distracting effects of sounds on cognitive performance. To test these accounts against each other, we manipulated the processing fluency of speech by playing speech forward or backward and by playing speech in the participants' native or a foreign language. Forward speech and native speech disrupted serial recall to the same degree as backward speech and foreign speech, respectively. However, the more fluently experienced forward speech and native speech were incorrectly predicted to be less distracting than backward speech and foreign speech. This provides evidence of a metacognitive illusion in stimulus-specific prospective judgments of distraction by speech, supporting the processing-fluency account over the direct-access account. The difference between more and less fluently experienced speech was largely absent in the participants' global retrospective judgments of distraction, suggesting that people gain access to comparatively valid cues when experiencing the distracting effects of speech on their serial-recall performance firsthand.
Subjects
Illusions; Judgment; Metacognition; Speech; Humans; Male; Female; Judgment/physiology; Metacognition/physiology; Speech/physiology; Adult; Young Adult; Illusions/physiology; Attention/physiology; Mental Recall/physiology; Speech Perception/physiology; Language
ABSTRACT
Listeners adapt to the speech rate of talkers. Many studies of speech rate adaptation have focused on the influence of rate information on the perception of word segmentation or segmental perception in English. The effects of immediately adjacent (i.e., proximal) rate information on the perception of both segments and segmentation are generally strong, but the effects of rate information temporally remote from (i.e., distal to) ambiguous speech signals are less clear, especially for segments. The present study examines the influence of distal rate adaptation on the perception of geminate consonants in Arabic that straddle a morpheme boundary (i.e., heteromorphemic geminates). Participants heard sentences that at one point were ambiguous with respect to the presence of the Arabic definite clitic al, which, under certain circumstances, can be realized as gemination of the subsequent word-initial consonant. The sentences were recorded either with or without the clitic and with three possible distal speech rates in the context of the clitic. Participants transcribed what they heard, and those transcriptions were analyzed for the contributions of the original recording and the distal speech rate to the perception of al. It was found that the perception of geminates in Arabic is rate dependent. This extends the knowledge of the effects of distal rate cues to Arabic, showing that Arabic geminate consonants are perceived relative to the rate of the distal context.
Subjects
Language; Phonetics; Speech Perception; Humans; Speech Perception/physiology; Female; Male; Adult; Young Adult; Speech/physiology
ABSTRACT
Many studies have shown that input in more than one language influences children's phonemic development. In this study, we examined the neural processes supporting perception of Voice Onset Time (VOT) in bilingual Italian-German children and their monolingual German peers. While German contrasts short-lag and long-lag VOT, Italian contrasts short-lag VOT and voicing lead. We examined whether bilinguals' phonetic/phonological systems for the two languages develop independently or influence each other, and what role language input plays in the formation of phonetic/phonological categories. Forty 5-year-old children (16 monolingual German, 24 bilingual Italian-German) were tested in an oddball design expected to elicit a neural Mismatch Response (MMR). The stimuli were bilabial stop VOT contrasts with the short-lag stop, common to both languages, as the standard. Four deviant VOTs were selected: 92 ms and 36 ms lag for German; 112 ms and 36 ms voicing lead for Italian. Bilingual children's language background was assessed using a caregiver questionnaire. The Italian-German bilingual children and German monolingual controls showed similar MMRs to German long-lag and Italian voicing lead VOT, except for the 36 ms long-lag deviant; this acoustically difficult distinction did not elicit a robust negative MMR in the bilingual children. The lack of a difference between the bilinguals and monolinguals for voicing lead suggests that the bilinguals' amount of Italian input was not sufficient to yield an advantage over the monolingual German children. Alternatively, the finding could indicate that voicing lead is easier to discriminate than voicing lag.
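For readers unfamiliar with the MMR, a small sketch of the standard deviant-minus-standard difference wave that this oddball design yields; epoch arrays are simulated and dimensions assumed, not taken from the study.

```python
import numpy as np

def mismatch_response(standard_epochs, deviant_epochs):
    """Each input: array (n_trials, n_channels, n_times) of baseline-corrected
    epochs. Returns the deviant-minus-standard difference wave."""
    return deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)

# Hypothetical use: one difference wave per deviant VOT (e.g., the 92 ms lag).
rng = np.random.default_rng(0)
std = rng.normal(size=(200, 32, 300))   # simulated standard epochs
dev = rng.normal(size=(50, 32, 300))    # simulated 92 ms-lag deviant epochs
mmr = mismatch_response(std, dev)       # (n_channels, n_times)
```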
Subjects
Multilingualism; Phonetics; Humans; Male; Female; Child, Preschool; Germany; Italy; Speech Perception/physiology; Language; Child; Language Development; Speech/physiology
ABSTRACT
Behavioral speech tasks have been widely used to understand the mechanisms of speech motor control in typical speakers as well as in various clinical populations. However, determining which neural functions differ between typical speakers and clinical populations based on behavioral data alone is difficult because multiple mechanisms may lead to the same behavioral differences. For example, individuals with cerebellar ataxia (CA) produce atypically large compensatory responses to pitch perturbations in their auditory feedback, compared to typical speakers, but this pattern could have many explanations. Here, computational modeling techniques were used to address this challenge. Bayesian inference was used to fit a state feedback control (SFC) model of voice fundamental frequency (fo) control to the behavioral pitch perturbation responses of speakers with CA and typical speakers. This fitting process yielded estimates of posterior distributions for five model parameters (sensory feedback delays, absolute and relative levels of auditory and somatosensory feedback noise, and controller gain), which were compared between the two groups. Results suggest that speakers with CA may weight auditory and somatosensory feedback proportionally differently from typical speakers. Specifically, the CA group showed a greater relative sensitivity to auditory feedback than the control group. There were also large group differences in the controller gain parameter, suggesting increased motor output responses to target errors in the CA group. These modeling results generate hypotheses about how CA may affect the speech motor system, which could help guide future empirical investigations in CA. This study also demonstrates the overall proof-of-principle of using this Bayesian inference approach to understand behavioral speech data in terms of interpretable parameters of speech motor control models.
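A toy illustration, not the authors' SFC model: Bayesian inference over a single interpretable parameter (a controller gain) in a drastically simplified pitch-compensation model, using a grid posterior with a flat prior and an assumed known delay and noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated trial: 100-cent downward pitch shift applied mid-utterance,
# true compensation gain 0.6, sensory delay of 5 samples.
t = np.arange(200)
perturb = np.where(t > 50, -100.0, 0.0)          # perturbation in cents
delayed = np.roll(perturb, 5)
delayed[:5] = 0.0                                # no wrap-around
response = -0.6 * delayed + rng.normal(0, 5, t.size)

# Grid posterior over the gain: Gaussian likelihood, flat prior.
gains = np.linspace(0.0, 2.0, 201)
sigma = 5.0
log_lik = np.array([-0.5 * np.sum((response + g * delayed) ** 2) / sigma**2
                    for g in gains])
posterior = np.exp(log_lik - log_lik.max())
posterior /= posterior.sum()
print(gains[posterior.argmax()])                 # posterior mode ~ 0.6
```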
Subjects
Bayes Theorem; Cerebellar Ataxia; Feedback, Sensory; Humans; Cerebellar Ataxia/physiopathology; Male; Feedback, Sensory/physiology; Female; Middle Aged; Adult; Computational Biology; Speech/physiology; Computer Simulation
ABSTRACT
Human-Computer Interaction (HCI) is a multidisciplinary field focused on designing and using computer technology, emphasizing the interface between computers and humans. HCI aims to build systems that allow users to interact with computers effectively, efficiently, and pleasantly. Multiple Spoken Language Identification (SLI) for HCI (MSLI for HCI) denotes the ability of a computer system to recognize and distinguish various spoken languages, enabling more complete and convenient interactions between users and technology. SLI using deep learning (DL) involves applying artificial neural networks (ANNs), a subset of DL models, to automatically detect and recognize the language spoken in an audio signal. DL techniques, particularly neural networks (NNs), have succeeded in various pattern detection tasks, including speech and language processing. This paper develops a novel Coot Optimizer Algorithm with a DL-Driven Multiple SLI and Detection (COADL-MSLID) technique for HCI applications. The COADL-MSLID approach aims to detect multiple spoken languages from input audio regardless of gender, speaking style, and age. In the COADL-MSLID technique, the audio files are first transformed into spectrogram images. The technique then employs the SqueezeNet model to produce feature vectors, with the COA applied to tune the hyperparameters of the SqueezeNet method, and uses a convolutional autoencoder (CAE) model for the SLID process. To underline the importance of the COADL-MSLID technique, a series of experiments were conducted on a benchmark dataset. The experimental validation of the COADL-MSLID technique exhibits a higher accuracy of 98.33% than other techniques.
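A hedged sketch of the first preprocessing step named above, audio-to-spectrogram-image conversion, using SciPy; the file name, FFT settings, and image scaling are assumptions, and the SqueezeNet/COA/CAE stages are omitted.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, audio = wavfile.read("utterance.wav")      # hypothetical input file
if audio.ndim > 1:
    audio = audio.mean(axis=1)                   # mix down to mono
freqs, times, sxx = spectrogram(audio, fs=rate, nperseg=512, noverlap=384)
log_sxx = 10 * np.log10(sxx + 1e-10)             # dB scale
span = log_sxx.max() - log_sxx.min()
img = (255 * (log_sxx - log_sxx.min()) / (span + 1e-10)).astype(np.uint8)
# `img` can now be saved as a grayscale image and fed to a CNN such as SqueezeNet.
```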
Subjects
Algorithms; Deep Learning; Language; Neural Networks, Computer; Humans; Speech/physiology; Female; Male; User-Computer Interface
ABSTRACT
Within species, vocal and auditory systems presumably coevolved to converge on a critical temporal acoustic structure that can best be produced and perceived. While dogs cannot produce articulated sounds, they respond to speech, raising the question of whether this heterospecific receptive ability is shaped by exposure to speech or remains bounded by their own sensorimotor capacity. Using acoustic analyses of dog vocalisations, we show that their main production rhythm is slower than the dominant (syllabic) speech rate, and that human dog-directed speech falls halfway in between. Comparative exploration of neural (electroencephalography) and behavioural responses to speech reveals that comprehension in dogs relies on slower speech-rhythm tracking (delta) than in humans (theta), even though dogs are equally sensitive to speech content and prosody. Thus, dogs' audio-motor tuning differs from humans', and we hypothesise that humans may adjust their speech rate to this shared temporal channel as a means to improve communication efficacy.
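One plausible way to estimate the "main production rhythm" described here is the peak of the amplitude-envelope modulation spectrum; the sketch below uses a synthetic amplitude-modulated signal and is not the study's analysis code.

```python
import numpy as np
from scipy.signal import hilbert, welch

def dominant_rhythm_hz(signal, fs):
    envelope = np.abs(hilbert(signal))               # amplitude envelope
    env = envelope - envelope.mean()
    freqs, psd = welch(env, fs=fs, nperseg=4 * fs)   # 4-s analysis windows
    band = (freqs >= 0.5) & (freqs <= 12.0)          # plausible vocal-rhythm range
    return freqs[band][np.argmax(psd[band])]

# Synthetic check: a 200 Hz carrier amplitude-modulated at 3 Hz.
fs = 1000
t = np.arange(0, 10, 1 / fs)
toy = (1 + 0.8 * np.sin(2 * np.pi * 3.0 * t)) * np.sin(2 * np.pi * 200.0 * t)
print(dominant_rhythm_hz(toy, fs))                   # ~3 Hz
```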
Subjects
Speech; Vocalization, Animal; Animals; Dogs; Humans; Vocalization, Animal/physiology; Speech/physiology; Male; Female; Electroencephalography; Auditory Perception/physiology; Adult; Human-Animal Interaction; Acoustic Stimulation; Speech Perception/physiology
ABSTRACT
The pelvic floor responds to changes in trunk pressure, elevating during low-pressure exhalation and descending during high-pressure exhalation. Voicing occurs during exhalation, spanning low to high trunk pressure, yet it is unknown how voicing affects the pelvic floor. The aim of this study was to quantify the pelvic floor response to voicing and identify whether responses differ for women with stress urinary incontinence. We hypothesized that shouting would cause pelvic floor descent, with greater magnitude for incontinent women. Sixty women (38 incontinent, 22 continent) performed four voicing tasks (counting to "4" in a speaking/shouting/low-pitch/high-pitch voice) while transperineal ultrasound measured changes in pelvic floor morphology. ANOVA compared the variance of responses to voicing, and t-tests compared groups. Bladder neck height shortened, levator plate length increased, and levator plate angle decreased more during shouting than during speaking, consistent with pelvic floor straining. There were no differences for high- versus low-pitch voicing, and there were small group differences based on continence status. Voicing causes the pelvic floor muscles to strain, with greater strain during shouting. Changing vocal pitch does not affect pelvic floor morphology, and incontinent women differed only slightly from continent women. Voicing may be a safe way to lengthen the pelvic floor without provoking incontinence.
Subjects
Pelvic Floor; Pressure; Humans; Female; Pelvic Floor/diagnostic imaging; Pelvic Floor/physiology; Pelvic Floor/physiopathology; Middle Aged; Adult; Cross-Sectional Studies; Urinary Incontinence, Stress/physiopathology; Urinary Incontinence, Stress/diagnostic imaging; Aged; Speech/physiology
ABSTRACT
Learning information may benefit from movement: Items that are spoken aloud are more accurately remembered than items that are silently read (the production effect). Candidate mechanisms for this phenomenon suggest that speaking may enrich or improve the feature content of memory traces, yet research suggests that prior language skill also plays a role. Recent work showed a larger production effect in bilinguals for words in their second language (L2) compared to their first language (L1), potentially suggesting that bilinguals engage different or additional linguistic features when speaking L2 compared to L1 words. The current study examined whether the increased L2 production effect is reduced for L2 and L1 pseudowords, which may similarly engage mainly phonological features. German (L1)-English (L2) bilinguals first read (out loud or silently) and subsequently recognized German or English words or pseudowords following German or English phonology. The production effect increased for L2 compared to L1 items and for words compared to pseudowords. Modest evidence suggested L2-L1 similarity in production effect scores for pseudowords, but different L2-L1 scores for words. Integrating feature models of memory with models of bilingual language production, we propose that speaking an L2 may engage more extensive and diverse linguistic features than an L1.
Subjects
Multilingualism; Humans; Female; Male; Young Adult; Adult; Reading; Speech/physiology; Phonetics; Memory/physiology; Verbal Learning/physiology; Psycholinguistics
ABSTRACT
This study investigates the role of morphology during speech planning in Mandarin Chinese. In a long-lag priming experiment, thirty-two native speakers of Mandarin Chinese were asked to name target pictures (e.g., 山 /shan1/ "mountain"). The design involved pictures referring to morpheme-related compound words (e.g., 山羊 /shan1yang2/ "goat") sharing a morpheme with the first position of the targets (e.g., 山 /shan1/ "mountain") or the second (e.g., 脑 /nao3/ "brain" with the prime 电脑 /dian4nao3/ "computer"), as well as unrelated control items. Behavioral and electrophysiological data were collected. Interestingly, the behavioral results went against earlier findings in Indo-European languages: target picture naming was not facilitated by morphologically related primes. This suggests no morphological priming of individual constituents in producing Mandarin Chinese disyllabic compound words. However, in the ERP analyses, targets in the morpheme-related condition elicited a reduced N400 compared with targets in the morpheme-unrelated condition for first-position overlap, but not for second-position overlap, suggesting automatic activation of the first constituent in noun compound production. Implications of these findings are discussed.
Subjects
Language; Speech; Humans; Male; Female; Young Adult; Adult; Speech/physiology; Electroencephalography; Evoked Potentials/physiology; China; Reaction Time/physiology; East Asian People
ABSTRACT
Humans excel at extracting structurally determined meaning from speech despite inherent physical variability. This study explores the brain's ability to predict and understand spoken language robustly. It investigates the relationship between structural and statistical language knowledge in brain dynamics, focusing on phase and amplitude modulation. Using syntactic features from constituent hierarchies and surface statistics from a transformer model as predictors in forward encoding models, we reconstructed cross-frequency neural dynamics from MEG data during audiobook listening. Our findings challenge a strict separation of linguistic structure and statistics in the brain, with both aiding neural signal reconstruction. Syntactic features have a more temporally spread impact, and both word entropy and the number of closing syntactic constituents are linked to the phase-amplitude coupling of neural dynamics, implying a role in temporal prediction and cortical oscillation alignment during speech processing. Our results indicate that structured and statistical information jointly shape neural dynamics during spoken language comprehension and suggest an integration process via a cross-frequency coupling mechanism.
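A minimal sketch of a forward (encoding) model of the kind described: time-lagged feature regressors fit to a neural signal with closed-form ridge regression. The feature, kernel, and "MEG" series are simulated stand-ins, not the study's predictors.

```python
import numpy as np

def lagged_design(feature, max_lag):
    """Stack time-shifted copies of a feature (lags 0..max_lag samples)."""
    X = np.zeros((feature.size, max_lag + 1))
    for lag in range(max_lag + 1):
        X[lag:, lag] = feature[:feature.size - lag]
    return X

def fit_trf(feature, meg, max_lag=60, alpha=1.0):
    """Closed-form ridge: w = (X'X + aI)^-1 X'y; returns the temporal kernel."""
    X = lagged_design(feature, max_lag)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ meg)

rng = np.random.default_rng(0)
feature = rng.normal(size=5000)                  # e.g., word-onset entropy (simulated)
true_kernel = np.exp(-np.arange(61) / 10.0)      # simulated brain response kernel
meg = np.convolve(feature, true_kernel)[:5000] + rng.normal(0, 1, 5000)
w = fit_trf(feature, meg)                        # recovered kernel ~ true_kernel
```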
Subjects
Brain; Comprehension; Language; Magnetoencephalography; Speech Perception; Humans; Comprehension/physiology; Male; Speech Perception/physiology; Female; Adult; Brain/physiology; Young Adult; Speech/physiology
ABSTRACT
We developed a model that synthesizes average frequency components from selected sentences in an electromagnetic articulography database. The model reveals the dual roles of the tongue: the dorsum acts like a carrier wave, while the tip acts as a modulation signal within the articulatory domain, illuminating the subtleties of anticipatory coarticulation during speech planning. The model undergoes a rigorous two-stage optimization: statistical estimation followed by refinement to capture carryover and anticipation. Its base layer, rooted in physiological insights, deciphers carryover targets, while its upper layer captures anticipation. Optimization pinpointed unique phonetic targets for each phoneme, providing deep insights into virtual target formation during speech planning. Simulations aligning closely with empirical data, with an average error of only 0.18 cm, along with extensive listening tests, attest to the model's accuracy and enhanced speech-synthesis quality.
Subjects
Phonetics; Tongue; Humans; Tongue/physiology; Male; Female; Speech Acoustics; Speech/physiology; Adult; Speech Perception/physiology; Learning; Models, Biological
ABSTRACT
BACKGROUND: Remote objective tests may supplement in-clinic examination to better inform treatment decisions. Previous cross-sectional studies presented objective speech metrics as potential markers of Multiple Sclerosis (MS) disease progression. OBJECTIVE: To examine the short-term stability and long-term sensitivity of speech metrics to MS progression. METHODS: We prospectively recorded speech from people with MS at baseline, at six and twelve weeks, and at ten months or longer after baseline (1y+). Only people with a definite diagnosis of MS and without other potential causes of dysarthria were included. Speech tasks comprised 1) a sustained vowel /a/, 2) saying the days of the week, 3) repeating the non-word pa-ta-ka multiple times as fast as possible, 4) reading the Grandfather Passage, and 5) telling a personal story. We selected speech metrics of interest according to their association with MS presence, correlation with general disability, and short-term metric stability in the absence of disease progression. Selected speech metrics were analysed for short- versus long-term changes in the whole MS cohort and in the clinically stable versus progression subgroups at 1y+. RESULTS: Sixty-nine people with MS participated (76.8 % female, mean age 47.5 ± 11.1 (SD), median EDSS 3.5, interquartile range 3.5). Twenty-six unique speech metrics satisfied the suitability criteria. On average, reading rate improved 3.5 % for all people with MS and 6.5 % for slow readers with MS from baseline to the six-week assessment, driven by a reduction in pauses. At 1y+, participants showed a 3.1 % average reduction in vocalization time during the reading task, which was similar in the progression (n = 29) and non-progression (n = 40) groups and thus unrelated to disease progression. Both findings are in the opposite direction of what would generally be expected for deterioration in speech performance and might be attributable to familiarity and training effects. Other speech metrics showed either negligible change or similar variability between short-term and long-term differences. CONCLUSION: Most individual long-term changes were small and within short-term variability intervals, irrespective of clinical disease progression. Familiarity and practice effects might have blunted the measurement of change. The present lack of longitudinal sensitivity of speech in MS contradicts previous cross-sectional findings and requires further investigation.
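For intuition, a hedged sketch of how vocalization time and pause counts can be derived from a reading recording with a simple energy gate; the threshold, frame size, and minimum pause duration are assumptions, not the study's algorithm.

```python
import numpy as np

def vocalization_metrics(audio, fs, frame_s=0.02, min_pause_s=0.15):
    """Return (total vocalization time in s, number of pauses) from raw audio."""
    n = int(frame_s * fs)
    frames = audio[:audio.size // n * n].reshape(-1, n)
    rms = np.sqrt((frames.astype(float) ** 2).mean(axis=1))
    voiced = rms > 0.1 * rms.max()              # crude energy gate
    vocalization_time = voiced.sum() * frame_s
    # Count silent runs longer than the minimum pause duration.
    pause_frames = int(min_pause_s / frame_s)
    run, count = 0, 0
    for v in voiced:
        run = 0 if v else run + 1
        if run == pause_frames:                 # run just reached pause length
            count += 1
    return vocalization_time, count
```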
Subjects
Disease Progression; Multiple Sclerosis; Humans; Female; Male; Adult; Middle Aged; Longitudinal Studies; Multiple Sclerosis/complications; Multiple Sclerosis/physiopathology; Multiple Sclerosis/diagnosis; Dysarthria/etiology; Dysarthria/physiopathology; Dysarthria/diagnosis; Speech/physiology; Prospective Studies
ABSTRACT
Humans perceive continuous speech signals as discrete sequences. To clarify the temporal segmentation window of speech information processing in the human auditory cortex, the relationship between speech perception and cortical responses was investigated using auditory evoked magnetic fields (AEFs). AEFs were measured while participants heard the synthetic Japanese word /atataka/. There were eight versions of /atataka/ with different speech rates, with word durations ranging from 75 to 600 ms. The results revealed a clear correspondence between the AEFs and syllables. Specifically, when the word durations were between 375 and 600 ms, the superior temporal area exhibited four clear M100 responses, corresponding not only to the onset of speech but also to each consonant/vowel syllable unit. The number of evoked M100 responses was correlated with the duration of the stimulus as well as with the number of perceived syllables. The temporal segmentation window limit of speech perception was estimated to lie between approximately 75 and 94 ms. This finding may contribute to optimizing the temporal performance of high-speed synthesized speech generation systems.
Subjects
Auditory Cortex; Evoked Potentials, Auditory; Speech Perception; Humans; Auditory Cortex/physiology; Speech Perception/physiology; Male; Female; Evoked Potentials, Auditory/physiology; Adult; Young Adult; Acoustic Stimulation; Speech/physiology
ABSTRACT
Perceptual segregation of complex sounds, such as speech and music simultaneously emanating from multiple sources, is a remarkable ability shared by humans and other animals alike. Unlike animal physiological experiments with simplified sounds or human investigations with spatially broad imaging techniques, this study combines insights from animal single-unit recordings with the segregation of speech-like sound mixtures. Ferrets are trained to attend to a female voice and detect a target word, both in the presence and absence of a concurrent, equally salient male voice. Recordings are made in primary and secondary auditory cortical fields, and in frontal cortex. During task performance, the representation of the female words becomes enhanced relative to the male words in all regions, but especially in the higher cortical regions. Analysis of the temporal and spectral response characteristics during task performance reveals how speech segregation gradually emerges in the auditory cortex. A computational model evaluated on the same voice mixtures replicates and extends these results to different attentional targets (attention to the female or the male voice). These findings underscore the role of the principle of temporal coherence, whereby attention to a target voice binds together all neural responses coherently modulated with the target, ultimately forming and extracting a common auditory stream.
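A toy illustration of the temporal-coherence principle, not the paper's model: channels whose envelopes are coherently modulated with an attended target channel are bound into one stream. All signals are simulated, and the 0.5 correlation cutoff is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 2000)
mod_a = 1 + np.sin(2 * np.pi * 4 * t)        # "female voice" modulation
mod_b = 1 + np.sin(2 * np.pi * 7 * t + 1.0)  # "male voice" modulation
# Ten frequency channels: half follow modulation A, half follow modulation B.
envelopes = np.array([mod_a if i < 5 else mod_b for i in range(10)])
envelopes += rng.normal(0, 0.2, envelopes.shape)

target = envelopes[0]                        # attend to a channel of voice A
coherence = np.array([np.corrcoef(target, ch)[0, 1] for ch in envelopes])
stream = coherence > 0.5                     # channels bound with the target
print(stream)                                # first five True, last five False
```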
Subjects
Acoustic Stimulation; Auditory Cortex; Ferrets; Animals; Ferrets/physiology; Auditory Cortex/physiology; Female; Male; Auditory Perception/physiology; Speech Perception/physiology; Speech/physiology; Attention/physiology
ABSTRACT
Research has shown that talkers reliably coordinate the timing of articulator movements across variation in production rate and syllable stress, and that this precision of inter-articulator timing instantiates phonetic structure in the resulting acoustic signal. Here, we tested the hypothesis that immediate auditory feedback helps regulate this consistent articulatory timing control. Talkers with normal hearing recorded 480 /tV#Cat/ utterances using electromagnetic articulography, with alternating V (/É/-/É/) and C (/t/-/d/), across variation in production rate (fast vs. normal) and stress (first syllable stressed vs. unstressed). Utterances were split between two listening conditions: unmasked and masked. To quantify the effect of immediate auditory feedback on the coordination between the jaw and tongue tip, the timing of tongue-tip raising onset for C, relative to the jaw opening-closing cycle for V, was obtained in each listening condition. Across both listening conditions, any manipulation that shortened the jaw opening-closing cycle reduced the latency of tongue-tip movement onset relative to the onset of jaw opening. Moreover, tongue-tip latencies were strongly affiliated with utterance type. During auditory masking, however, tongue-tip latencies were less strongly affiliated with utterance type, demonstrating that talkers use afferent auditory signals in real time to regulate the precision of inter-articulator timing in service of phonetic structure.
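A small sketch of a conventional kinematic onset measure relevant to the latency analysis described: movement onset defined as the first sample where speed exceeds a fraction of its peak. The 20% threshold and 250 Hz sampling rate are assumptions, not the study's parameters.

```python
import numpy as np

def movement_onset(position, fs, threshold=0.2):
    """Return onset time (s): first sample where |velocity| exceeds
    `threshold` * peak |velocity| of the trajectory."""
    speed = np.abs(np.gradient(position) * fs)       # samples/s -> units/s
    return np.argmax(speed > threshold * speed.max()) / fs

# Hypothetical latency: tongue-tip raising onset minus jaw opening onset.
# latency = movement_onset(tongue_tip_y, 250) - movement_onset(jaw_y, 250)
```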
Subjects
Feedback, Sensory; Phonetics; Speech Perception; Tongue; Humans; Tongue/physiology; Male; Female; Adult; Feedback, Sensory/physiology; Young Adult; Speech Perception/physiology; Jaw/physiology; Speech Acoustics; Speech Production Measurement/methods; Time Factors; Speech/physiology; Perceptual Masking
ABSTRACT
Conversations encompass continuous exchanges of verbal and nonverbal information. Previous research has demonstrated that interlocutors' gestures dynamically entrain each other and that speakers tend to align their vocal properties. While gesture and speech are known to synchronize at the intrapersonal level, few studies have investigated the multimodal dynamics of gesture and speech between individuals. The present study aims to extend our understanding of the unimodal dynamics of speech and gesture to multimodal speech/gesture dynamics. We used an online dataset of 14 dyads engaged in unstructured conversation. Speech and gesture synchronization was measured with cross-wavelet analysis at different timescales. Results supported previous research on intrapersonal speech/gesture coordination, finding synchronization at all timescales of the conversation. Extending the literature, we also found interpersonal synchronization between speech and gesture. Given that the unimodal and multimodal synchronization occurred at similar timescales, we suggest that synchronization likely depends on the vocal channel, particularly on the turn-taking dynamics of the conversation.
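A hedged sketch of a cross-wavelet synchronization measure in the spirit of the analysis described, using the PyWavelets library on simulated series (e.g., one speaker's speech envelope and the partner's gesture speed); a real analysis would add coherence normalization and surrogate-based significance testing.

```python
import numpy as np
import pywt

def cross_wavelet_power(x, y, fs, scales=None):
    """Magnitude of W_x * conj(W_y) across timescales for two time series."""
    if scales is None:
        scales = np.geomspace(2, 256, 64)            # fast-to-slow timescales
    wx, freqs = pywt.cwt(x, scales, "cmor1.5-1.0", sampling_period=1 / fs)
    wy, _ = pywt.cwt(y, scales, "cmor1.5-1.0", sampling_period=1 / fs)
    return np.abs(wx * np.conj(wy)), freqs           # (n_scales, n_times)

rng = np.random.default_rng(0)
fs = 20                                              # Hz, e.g., motion-capture rate
t = np.arange(0, 60, 1 / fs)
shared = np.sin(2 * np.pi * 0.3 * t)                 # shared slow rhythm
speech = shared + rng.normal(0, 0.5, t.size)
gesture = np.roll(shared, 10) + rng.normal(0, 0.5, t.size)  # lagged partner signal
power, freqs = cross_wavelet_power(speech, gesture, fs)
```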