Results 1 - 20 of 28,811
1.
F1000Res ; 13: 867, 2024.
Article in English | MEDLINE | ID: mdl-39310814

ABSTRACT

Background: There is increasing interest in cross-linguistic influences of the second language (L2) on the first (L1), but its communicative impact remains to be elucidated. This study investigates how L2 learners' L1 pronunciation is perceived as foreign-accented and (in)comprehensible as a function of their L2 learning experience and proficiency levels. Methods: Read speech of 154 L1 Japanese learners of L2 English in the J-AESOP corpus was examined; approximately one-third of the speakers had lived in English-speaking countries and the rest had never lived outside of Japan. Their L1 speech was rated by another group of native Japanese listeners for accentedness and comprehensibility (from October 25, 2022 to August 20, 2023), while their L2 speech was previously rated by native American English listeners for nativelikeness or proficiency. The speakers' vowel acoustics were also examined. Results: More proficient L2 speakers were perceived as more foreign-accented in their L1, but only if they had lived overseas; their length of residence abroad predicted the degree of perceived accentedness. In contrast, more proficient L2 speakers were consistently perceived as more comprehensible in the L1, regardless of prior overseas experience. Acoustic analyses indicated that perceived accentedness is associated with a clockwise chain shift of all vowel categories in the vowel space. It was also revealed that the dispersion, rather than compactness, of vowel production contributed to perceived comprehensibility, although the degree of L1 vowel dispersion did not predict L2 proficiency. Conclusions: The overall results suggest two main conclusions. First, perceptible L1 foreign accent likely results from L1 disuse rather than L2 interference, such that L1 pronunciation differs from native norms at a system-wide rather than category-specific level. Second, L2 learning has a positive influence on perceived L1 comprehensibility, rather than individuals with clearer and more comprehensible L1 speech being inherently better L2 learners.
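The abstract does not specify how vowel dispersion was quantified; a common operationalization is the mean Euclidean distance of a speaker's vowel tokens from the centroid of their F1-F2 space. A minimal Python sketch under that assumption, with hypothetical formant values:

```python
import numpy as np

def vowel_space_dispersion(f1, f2):
    """Mean Euclidean distance of vowel tokens from the F1-F2 centroid.

    f1, f2: arrays of first- and second-formant values (Hz) for one speaker.
    Larger values indicate a more dispersed (expanded) vowel space.
    """
    points = np.column_stack([f1, f2])
    centroid = points.mean(axis=0)
    return float(np.linalg.norm(points - centroid, axis=1).mean())

# Hypothetical formant measurements for one speaker's vowel tokens (Hz).
f1 = np.array([310, 320, 700, 720, 500, 480])
f2 = np.array([2200, 2150, 1200, 1250, 1700, 1650])
print(vowel_space_dispersion(f1, f2))
```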


Subject(s)
Learning , Multilingualism , Humans , Female , Male , Adult , Speech Perception , Comprehension , Language , Speech , Young Adult , Japan
2.
Article in English | MEDLINE | ID: mdl-39338053

ABSTRACT

In recent years, an increasing number of studies have begun to use conversational data from spontaneous speech to estimate cognitive function in older people. Such conversations with older people used to be conducted by physicians and licensed psychologists, but it is now possible to hold them with fully automatic AI agents. However, it has not yet been clarified how conversational communication with older people differs depending on whether the examiner is a human or an AI agent. The subjects of this study were community-dwelling older adults attending a silver human resource center or a day service center. Dialogues were conducted using generic interview items for estimating cognitive function through daily conversation, developed in prior research on estimation methods for cognitive function. From the dialogue data, we compared the effects of a human versus an AI interlocutor on the number of utterances, speaking time, and silence time. This study was conducted at a facility in Japan and included 32 subjects (12 males and 20 females). The results showed significant differences between human and AI dialogue in the number of utterances and silence time. This study suggests the effectiveness of AI in communication with older people and explores the possibility of using AI in social welfare.
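The abstract reports significant differences between the human and AI conditions but does not name the statistical test; since the same 32 participants took part in both conditions, a paired nonparametric comparison such as the Wilcoxon signed-rank test is one natural choice. A sketch with hypothetical per-participant counts (the data below are simulated, not the study's):

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant utterance counts in the two interlocutor conditions.
rng = np.random.default_rng(0)
utterances_human = rng.poisson(40, size=32)
utterances_ai = utterances_human - rng.poisson(5, size=32)

# Paired nonparametric comparison of human-led vs. AI-led dialogue.
stat, p = stats.wilcoxon(utterances_human, utterances_ai)
print(f"Wilcoxon W = {stat:.1f}, p = {p:.4f}")
```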


Subject(s)
Speech , Humans , Female , Male , Aged , Aged, 80 and over , Cognition/drug effects , Artificial Intelligence , Communication , Japan
3.
Sci Rep ; 14(1): 20756, 2024 09 05.
Article in English | MEDLINE | ID: mdl-39237702

ABSTRACT

The basic function of the tongue in producing diadochokinetic and other syllables is not fully understood. This study investigates the influence of sound pressure levels and syllables on tongue pressure and muscle activity in 19 healthy adults (mean age: 28.2 years; range: 22-33 years). Tongue pressure and posterior tongue activity (measured using electromyography; EMG) were recorded while the velar stops /ka/, /ko/, /ga/, and /go/ were pronounced at 70, 60, 50, and 40 dB. Spearman's rank correlation revealed a significant, yet weak, positive association between tongue pressure and EMG activity (ρ = 0.14, p < 0.05). Mixed-effects model analysis showed that tongue pressure and EMG activity significantly increased at 70 dB compared to the other sound pressure levels. While syllable identity did not significantly affect tongue pressure, the syllable /ko/ significantly increased EMG activity (coefficient = 0.048, p = 0.013). Although no significant differences in tongue pressure were observed among the velar stops /ka/, /ko/, /ga/, and /go/, the findings suggest that articulation is achieved by altering the activity of both extrinsic and intrinsic tongue muscles. These findings highlight the importance of considering both tongue pressure and muscle activity when examining the physiological factors contributing to sound pressure levels during speech.
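The two analyses named above (Spearman's rank correlation and a mixed-effects model) can be sketched as follows. The data frame, effect sizes, and random-intercept-per-participant structure are assumptions for illustration only; the abstract does not report the exact model specification.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per trial (participant x syllable x dB level).
rng = np.random.default_rng(1)
n = 19 * 4 * 4
df = pd.DataFrame({
    "participant": np.repeat(np.arange(19), 16),
    "syllable": np.tile(np.repeat(["ka", "ko", "ga", "go"], 4), 19),
    "spl_db": np.tile([40, 50, 60, 70], 19 * 4),
})
df["emg"] = 0.02 * df["spl_db"] + rng.normal(0, 0.5, n)
df["tongue_pressure"] = 0.1 * df["spl_db"] + rng.normal(0, 3, n)

# Spearman rank correlation between tongue pressure and EMG activity.
rho, p = stats.spearmanr(df["tongue_pressure"], df["emg"])
print(f"rho = {rho:.2f}, p = {p:.3f}")

# Mixed-effects model with random intercepts per participant (assumed structure).
model = smf.mixedlm("emg ~ C(spl_db) + C(syllable)", df, groups=df["participant"]).fit()
print(model.summary())
```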


Subject(s)
Electromyography , Pressure , Speech , Tongue , Humans , Tongue/physiology , Electromyography/methods , Adult , Male , Female , Young Adult , Speech/physiology , Phonetics
4.
Nat Commun ; 15(1): 7629, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39223110

ABSTRACT

Recent advances in technology for hyper-realistic visual and audio effects provoke the concern that deepfake videos of political speeches will soon be indistinguishable from authentic video. We conduct 5 pre-registered randomized experiments with N = 2215 participants to evaluate how accurately humans distinguish real political speeches from fabrications across base rates of misinformation, audio sources, question framings with and without priming, and media modalities. We do not find that base rates of misinformation have statistically significant effects on discernment. We find that deepfakes with audio produced by state-of-the-art text-to-speech algorithms are harder to discern than the same deepfakes with voice actor audio. Moreover, across all experiments and question framings, we find that audio and visual information enables more accurate discernment than text alone: human discernment relies more on how something is said (the audio-visual cues) than on what is said (the speech content).


Subject(s)
Politics , Speech , Video Recording , Humans , Female , Male , Adult , Young Adult , Communication , Algorithms
5.
Hum Brain Mapp ; 45(13): e70023, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39268584

ABSTRACT

The relationship between speech production and perception is a topic of ongoing debate. Some argue that there is little interaction between the two, while others claim they share representations and processes. One perspective suggests increased recruitment of the speech motor system in demanding listening situations to facilitate perception. However, uncertainties persist regarding the specific regions involved and the listening conditions influencing its engagement. This study used activation likelihood estimation in coordinate-based meta-analyses to investigate the neural overlap between speech production and three speech perception conditions: speech-in-noise, spectrally degraded speech, and linguistically complex speech. Neural overlap was observed in the left frontal, insular, and temporal regions. Key nodes included the left frontal operculum (FOC), left posterior lateral part of the inferior frontal gyrus (IFG), left planum temporale (PT), and left pre-supplementary motor area (pre-SMA). Left IFG activation was consistently observed during linguistic processing, suggesting sensitivity to the linguistic content of speech. In comparison, left pre-SMA activation was observed when processing degraded and noisy signals, indicating sensitivity to signal quality. Activation of the left PT and FOC was noted in all conditions, with the posterior FOC area overlapping across conditions. Our meta-analysis reveals context-independent (FOC, PT) and context-dependent (pre-SMA, posterior lateral IFG) regions within the speech motor system during challenging speech perception. These regions could contribute to sensorimotor integration and executive cognitive control for perception and production.


Subject(s)
Speech Perception , Speech , Humans , Speech Perception/physiology , Speech/physiology , Brain Mapping , Likelihood Functions , Motor Cortex/physiology , Cerebral Cortex/physiology , Cerebral Cortex/diagnostic imaging
6.
Nat Commun ; 15(1): 7897, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39284848

ABSTRACT

Historically, eloquent functions have been viewed as localized to focal areas of human cerebral cortex, while more recent studies suggest they are encoded by distributed networks. We examined the network properties of cortical sites defined by stimulation to be critical for speech and language, using electrocorticography from sixteen participants during word-reading. We discovered distinct network signatures for sites where stimulation caused speech arrest and language errors. Both demonstrated lower local and global connectivity, whereas sites causing language errors exhibited higher inter-community connectivity, identifying them as connectors between modules in the language network. We used machine learning to classify these site types with reasonably high accuracy, even across participants, suggesting that a site's pattern of connections within the task-activated language network helps determine its importance to function. These findings help to bridge the gap in our understanding of how focal cortical stimulation interacts with complex brain networks to elicit language deficits.
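The connectivity properties named above (lower local and global connectivity, higher inter-community connections for language-error sites) can be illustrated with standard graph metrics. The adjacency matrix, community detection method, and feature choices below are hypothetical and only sketch the general idea, not the authors' electrocorticography pipeline.

```python
import numpy as np
import networkx as nx

def node_network_features(G, communities):
    """Per-node connectivity features: local clustering, degree (a simple
    connectivity proxy), and participation coefficient, which indexes how
    evenly a node's edges are spread across network modules."""
    comm_of = {n: i for i, c in enumerate(communities) for n in c}
    clustering = nx.clustering(G)
    features = {}
    for n in G.nodes:
        k = G.degree(n)
        counts = {}
        for nb in G.neighbors(n):
            counts[comm_of[nb]] = counts.get(comm_of[nb], 0) + 1
        participation = 1 - sum((c / k) ** 2 for c in counts.values()) if k else 0.0
        features[n] = (clustering[n], k, participation)
    return features

# Hypothetical binary functional-connectivity graph over 30 recording sites.
rng = np.random.default_rng(2)
A = (rng.random((30, 30)) > 0.7).astype(int)
A = np.triu(A, 1); A = A + A.T
G = nx.from_numpy_array(A)
communities = list(nx.algorithms.community.greedy_modularity_communities(G))
print(list(node_network_features(G, communities).items())[:3])
```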


Subject(s)
Cerebral Cortex , Electrocorticography , Language , Speech , Humans , Male , Female , Cerebral Cortex/physiology , Adult , Speech/physiology , Nerve Net/physiology , Young Adult , Machine Learning , Brain Mapping
7.
Hum Brain Mapp ; 45(14): e70030, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39301700

ABSTRACT

Psychosis involves changes across a broad range of cognitive functions. These functions are cortically organized in the form of a hierarchy ranging from primary sensorimotor (unimodal) to higher-order association cortices, which support functions such as language (transmodal). Language has long been documented as undergoing structural changes in psychosis. We hypothesized that these changes, as revealed in spontaneous speech patterns, may act as readouts of alterations in the configuration of this unimodal-to-transmodal axis of cortical organization in psychosis. Results from 29 patients with first-episode psychosis (FEP) and 29 controls scanned with 7 T resting-state fMRI confirmed a compression of the cortical hierarchy in FEP, which affected metrics of the hierarchical distance between the sensorimotor and default mode networks, and of the hierarchical organization within the semantic network. These organizational changes were predicted by graphs representing semantic and syntactic associations between meaningful units in speech produced during picture descriptions. These findings unite psychosis, language, and the cortical hierarchy in a single conceptual scheme, which helps to situate language within the neurocognition of psychosis and opens the clinical prospect of mental dysfunction becoming computationally measurable in spontaneous speech.


Subject(s)
Magnetic Resonance Imaging , Psychotic Disorders , Speech , Humans , Psychotic Disorders/diagnostic imaging , Psychotic Disorders/physiopathology , Psychotic Disorders/pathology , Male , Adult , Female , Speech/physiology , Young Adult , Nerve Net/diagnostic imaging , Nerve Net/physiopathology , Nerve Net/pathology , Cerebral Cortex/diagnostic imaging , Cerebral Cortex/physiopathology , Default Mode Network/diagnostic imaging , Default Mode Network/physiopathology
8.
Sensors (Basel) ; 24(17)2024 Aug 25.
Article in English | MEDLINE | ID: mdl-39275417

ABSTRACT

Speech emotion recognition (SER) addresses a ubiquitous aspect of everyday communication and is a central focus in the field of human-computer interaction. However, SER faces several challenges, including difficulties in detecting subtle emotional nuances and the complicated task of recognizing speech emotions in noisy environments. To effectively address these challenges, we introduce a Transformer-based model called MelTrans, which is designed to distill critical clues from speech data by learning core features and long-range dependencies. At the heart of our approach is a dual-stream framework. Using the Transformer architecture as its foundation, MelTrans deciphers broad dependencies within speech mel-spectrograms, facilitating a nuanced understanding of emotional cues embedded in speech signals. Comprehensive experimental evaluations on the EmoDB (92.52%) and IEMOCAP (76.54%) datasets demonstrate the effectiveness of MelTrans. These results highlight MelTrans's ability to capture critical cues and long-range dependencies in speech data, setting a new benchmark on these datasets, and demonstrate the effectiveness of the proposed model in addressing the complex challenges posed by SER tasks.
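The general mel-spectrogram plus Transformer-encoder recipe described above can be sketched in a few lines. This is not the authors' MelTrans (its dual-stream details are not given in the abstract); the model size, class count, and pooling below are assumptions chosen only to illustrate long-range dependency modeling over mel-spectrogram frames.

```python
import torch
import torch.nn as nn
import torchaudio

class MelTransformerSER(nn.Module):
    """Minimal sketch: a Transformer classifier over mel-spectrogram frames."""
    def __init__(self, n_mels=64, d_model=128, n_classes=7):
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=n_mels)
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, wav):                       # wav: (batch, samples)
        mel = self.melspec(wav).transpose(1, 2)   # (batch, frames, n_mels)
        h = self.encoder(self.proj(mel.log1p()))  # long-range dependencies over frames
        return self.head(h.mean(dim=1))           # pool frames, predict emotion logits

model = MelTransformerSER()
logits = model(torch.randn(2, 16000))             # two one-second dummy waveforms
print(logits.shape)                               # torch.Size([2, 7])
```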


Subject(s)
Emotions , Speech , Humans , Emotions/physiology , Speech/physiology , Algorithms , Speech Recognition Software
9.
Sensors (Basel) ; 24(17)2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39275431

ABSTRACT

Advancements in deep learning speech representations have facilitated the effective use of extensive unlabeled speech datasets for Parkinson's disease (PD) modeling with minimal annotated data. This study employs the non-fine-tuned wav2vec 1.0 architecture to develop machine learning models for PD speech diagnosis tasks, such as cross-database classification and regression to predict demographic and articulation characteristics. The primary aim is to analyze overlapping components within the embeddings on both classification and regression tasks, investigating whether latent speech representations in PD are shared across models, particularly for related tasks. Firstly, evaluation using three multi-language PD datasets showed that wav2vec accurately detected PD based on speech, outperforming feature extraction using mel-frequency cepstral coefficients in the proposed cross-database classification scenarios. In cross-database scenarios using Italian and English-read texts, wav2vec demonstrated performance comparable to intra-dataset evaluations. We also compared our cross-database findings against those of other related studies. Secondly, wav2vec proved effective in regression, modeling various quantitative speech characteristics related to articulation and aging. Ultimately, subsequent analysis of important features examined the presence of significant overlaps between classification and regression models. The feature importance experiments discovered shared features across trained models, with increased sharing for related tasks, further suggesting that wav2vec contributes to improved generalizability. The study proposes wav2vec embeddings as a next promising step toward a speech-based universal model to assist in the evaluation of PD.
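The embed-then-classify recipe described above can be sketched with a frozen self-supervised speech encoder and a simple classifier. Note the swap: torchaudio ships wav2vec 2.0 checkpoints, whereas the study used non-fine-tuned wav2vec 1.0 from fairseq, so this only illustrates the general idea; the data and labels are dummies.

```python
import torch
import torchaudio
from sklearn.linear_model import LogisticRegression

# Frozen self-supervised speech encoder (wav2vec 2.0 here, as a stand-in).
bundle = torchaudio.pipelines.WAV2VEC2_BASE
encoder = bundle.get_model().eval()

def embed(waveform):
    """Mean-pooled final-layer embedding for one utterance (1, samples) at 16 kHz."""
    with torch.no_grad():
        features, _ = encoder.extract_features(waveform)
    return features[-1].mean(dim=1).squeeze(0).numpy()

# Hypothetical data: a few dummy utterances with PD / control labels.
X = [embed(torch.randn(1, 16000)) for _ in range(8)]
y = [0, 1] * 4
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```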


Subject(s)
Databases, Factual , Parkinson Disease , Speech , Parkinson Disease/physiopathology , Humans , Speech/physiology , Deep Learning , Male , Female , Aged , Machine Learning , Middle Aged
10.
Sensors (Basel) ; 24(17)2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39275615

ABSTRACT

Speech emotion recognition is key to many fields, including human-computer interaction, healthcare, and intelligent assistance. While acoustic features extracted from human speech are essential for this task, not all of them contribute to emotion recognition effectively. Thus, successful emotion recognition models require a reduced number of features. This work aimed to investigate whether splitting the features into two subsets based on their distribution and then applying commonly used feature reduction methods would impact accuracy. Filter reduction was employed using the Kruskal-Wallis test, followed by principal component analysis (PCA) and independent component analysis (ICA). A set of features was investigated to determine whether the indiscriminate use of parametric feature reduction techniques affects the accuracy of emotion recognition. For this investigation, data from three databases (Berlin EmoDB, SAVEE, and RAVDESS) were organized into subsets according to their distribution before applying both PCA and ICA. The results showed a reduction from 6373 features to 170 for the Berlin EmoDB database with an accuracy of 84.3%; a final size of 130 features for SAVEE, with a corresponding accuracy of 75.4%; and 150 features for RAVDESS, with an accuracy of 59.9%.
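The filter-then-project pipeline named above (Kruskal-Wallis filtering followed by PCA and ICA) can be sketched as follows. The feature matrix, thresholds, and component counts are hypothetical; the subset-splitting step based on feature distribution is omitted for brevity.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.decomposition import PCA, FastICA

# Hypothetical feature matrix: utterances x acoustic features, with emotion labels.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 500))
y = rng.integers(0, 4, size=300)

# Filter step: keep features whose distributions differ across emotions (Kruskal-Wallis).
keep = [j for j in range(X.shape[1])
        if kruskal(*[X[y == c, j] for c in np.unique(y)]).pvalue < 0.05]
X_filt = X[:, keep]

# Parametric reduction applied after the filter: PCA and ICA on the retained features.
X_pca = PCA(n_components=10, random_state=0).fit_transform(X_filt)
X_ica = FastICA(n_components=10, random_state=0).fit_transform(X_filt)
print(X_pca.shape, X_ica.shape)
```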


Subject(s)
Emotions , Principal Component Analysis , Speech , Humans , Emotions/physiology , Speech/physiology , Databases, Factual , Algorithms , Pattern Recognition, Automated/methods
11.
Sensors (Basel) ; 24(17)2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39275707

ABSTRACT

Emotion recognition through speech is a technique employed in various scenarios of Human-Computer Interaction (HCI). Existing approaches have achieved significant results; however, limitations persist, most notably in the quantity and diversity of data required when deep learning techniques are used. The lack of a standard for feature selection leads to continuous development and experimentation, and choosing and designing an appropriate network architecture constitutes another challenge. This study addresses the challenge of recognizing emotions in the human voice using deep learning techniques, proposing a comprehensive approach that includes preprocessing and feature selection stages and a dataset called EmoDSc built by combining several available databases. The synergy between spectral features and spectrogram images is investigated. Independently, the weighted accuracy obtained using only spectral features was 89%, while using only spectrogram images, the weighted accuracy reached 90%. These results, although surpassing previous research, highlight the strengths and limitations of each representation when operating in isolation. Based on this exploration, a neural network architecture composed of a CNN1D, a CNN2D, and an MLP that fuses spectral features and spectrogram images is proposed. The model, supported by the unified dataset EmoDSc, demonstrates a remarkable accuracy of 96%.


Subject(s)
Deep Learning , Emotions , Neural Networks, Computer , Humans , Emotions/physiology , Speech/physiology , Databases, Factual , Algorithms , Pattern Recognition, Automated/methods
12.
Sci Justice ; 64(5): 485-497, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39277331

ABSTRACT

Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, auditory and acoustic analyses are usually performed to carry out such a verification task, considering a diversity of features such as language competence, pronunciation, or other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer whether, when, and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to derive a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant dataset FRIDA and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log-likelihood-ratio cost (Cllr) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with noise in the background.
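The evaluation metrics named above have standard definitions that can be sketched directly: Cllr averages log2(1 + 1/LR) over same-speaker trials and log2(1 + LR) over different-speaker trials, and EER is the operating point where false-accept and false-reject rates are equal. The scores below are simulated, and the logistic-regression fusion is a simple stand-in for the fusion approaches listed in the abstract (SVM, bivariate normal fitting, score passing), not the authors' method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

def cllr(llrs_same, llrs_diff):
    """Log-likelihood-ratio cost, with log-LRs given in natural log."""
    c_ss = np.mean(np.log2(1 + np.exp(-llrs_same)))
    c_ds = np.mean(np.log2(1 + np.exp(llrs_diff)))
    return 0.5 * (c_ss + c_ds)

def eer(scores, labels):
    """Equal error rate from scores (higher = more same-speaker-like)."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    i = np.nanargmin(np.abs(fpr - fnr))
    return (fpr[i] + fnr[i]) / 2

# Hypothetical per-trial scores from an acoustic system and a frequent-word system.
rng = np.random.default_rng(4)
labels = np.array([1] * 200 + [0] * 200)            # 1 = same speaker
acoustic = np.where(labels == 1, rng.normal(1.5, 1, 400), rng.normal(-1.5, 1, 400))
frequent_word = np.where(labels == 1, rng.normal(0.8, 1, 400), rng.normal(-0.8, 1, 400))

# Score-level fusion of the two systems (illustrative only).
scores = np.column_stack([acoustic, frequent_word])
fused_llr = LogisticRegression().fit(scores, labels).decision_function(scores)
print("Cllr:", round(cllr(fused_llr[labels == 1], fused_llr[labels == 0]), 3))
print("EER:", round(eer(fused_llr, labels), 3))
```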


Subject(s)
Forensic Sciences , Humans , Forensic Sciences/methods , Likelihood Functions , Linguistics , Support Vector Machine , Speech Acoustics , Algorithms , Speech
13.
PLoS One ; 19(9): e0309831, 2024.
Article in English | MEDLINE | ID: mdl-39321138

ABSTRACT

Conversations encompass continuous exchanges of verbal and nonverbal information. Previous research has demonstrated that gestures dynamically entrain each other and that speakers tend to align their vocal properties. While gesture and speech are known to synchronize at the intrapersonal level, few studies have investigated the multimodal dynamics of gesture/speech between individuals. The present study aims to extend our comprehension of unimodal dynamics of speech and gesture to multimodal speech/gesture dynamics. We used an online dataset of 14 dyads engaged in unstructured conversation. Speech and gesture synchronization was measured with cross-wavelets at different timescales. Results supported previous research on intrapersonal speech/gesture coordination, finding synchronization at all timescales of the conversation. Extending the literature, we also found interpersonal synchronization between speech and gesture. Given that the unimodal and multimodal synchronization occurred at similar timescales, we suggest that synchronization likely depends on the vocal channel, particularly on the turn-taking dynamics of the conversation.
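As a rough illustration of measuring speech/gesture synchronization across timescales, magnitude-squared coherence between two continuous signals can serve as a simplified stand-in for the cross-wavelet analysis used in the study; the sampling rate, signals, and frequency band below are hypothetical.

```python
import numpy as np
from scipy import signal

# Hypothetical speech-envelope and gesture-velocity time series at 50 Hz.
rng = np.random.default_rng(5)
fs = 50.0
t = np.arange(0, 120, 1 / fs)
shared = np.sin(2 * np.pi * 0.25 * t)        # slow shared rhythm (~turn-taking timescale)
speech_envelope = shared + rng.normal(0, 1, t.size)
gesture_velocity = shared + rng.normal(0, 1, t.size)

# Coherence spectrum: synchronization strength as a function of frequency (timescale).
freqs, coh = signal.coherence(speech_envelope, gesture_velocity, fs=fs, nperseg=1024)
slow_band = (freqs > 0.1) & (freqs < 1.0)
print("mean coherence in 0.1-1 Hz band:", round(coh[slow_band].mean(), 3))
```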


Subject(s)
Cues , Gestures , Speech , Humans , Female , Male , Adult , Speech/physiology , Nonverbal Communication/psychology , Young Adult , Verbal Behavior/physiology , Interpersonal Relations , Communication
14.
J Speech Lang Hear Res ; 67(9): 2964-2976, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39265154

ABSTRACT

INTRODUCTION: Transcribing disordered speech can be useful when diagnosing motor speech disorders such as primary progressive apraxia of speech (PPAOS), in which speakers produce sound additions, deletions, and substitutions, or distortions, and/or slow, segmented speech. Since transcribing speech can be a laborious process and requires an experienced listener, using automatic speech recognition (ASR) systems for diagnosis and treatment monitoring is appealing. This study evaluated the efficacy of a readily available ASR system (wav2vec 2.0) in transcribing speech of PPAOS patients to determine whether the word error rate (WER) output by the ASR can differentiate between healthy speech and PPAOS and/or among its subtypes, whether WER correlates with AOS severity, and how the ASR's errors compare to those noted in manual transcriptions. METHOD: Forty-five patients with PPAOS and 22 healthy controls were recorded repeating 13 words, 3 times each, which were transcribed manually and using wav2vec 2.0. The WER and phonetic and prosodic speech errors were compared between groups, and ASR results were compared against manual transcriptions. RESULTS: Mean overall WER was 0.88 for patients and 0.33 for controls. WER significantly correlated with AOS severity and accurately distinguished between patients and controls but not between AOS subtypes. The phonetic and prosodic errors from the ASR transcriptions were also unable to distinguish between subtypes, whereas errors calculated from human transcriptions were. There was poor agreement in the number of phonetic and prosodic errors between the ASR and human transcriptions. CONCLUSIONS: This study demonstrates that ASR can be useful in differentiating healthy from disordered speech and evaluating PPAOS severity but does not distinguish PPAOS subtypes. ASR transcriptions showed weak agreement with human transcriptions; thus, ASR may be a useful tool for the transcription of speech in PPAOS, but the research questions posed must be carefully considered within the context of its limitations. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.26359417.
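The WER metric reported above has a standard definition: the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. A self-contained sketch (the example transcriptions are hypothetical, not from the study):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

# Hypothetical reference (manual) vs. ASR output for one repeated target word.
print(word_error_rate("catastrophe catastrophe catastrophe",
                      "catastrophe cat catastrophe astrophe"))
```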


Subject(s)
Speech Recognition Software , Humans , Male , Female , Aged , Middle Aged , Speech/physiology , Apraxias/diagnosis , Speech Production Measurement/methods , Phonetics , Aphasia, Primary Progressive/diagnosis , Case-Control Studies
15.
Brain Lang ; 256: 105463, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39243486

ABSTRACT

We investigated how neural oscillations code the hierarchical nature of stress rhythms in speech and how stress processing varies with language experience. By measuring phase synchrony of multilevel EEG-acoustic tracking and intra-brain cross-frequency coupling, we show the encoding of stress involves different neural signatures (delta rhythms = stress foot rate; theta rhythms = syllable rate), is stronger for amplitude vs. duration stress cues, and induces nested delta-theta coherence mirroring the stress-syllable hierarchy in speech. Only native English, but not Mandarin, speakers exhibited enhanced neural entrainment at central stress (2 Hz) and syllable (4 Hz) rates intrinsic to natural English. English individuals with superior cortical-stress tracking capabilities also displayed stronger neural hierarchical coherence, highlighting a nuanced interplay between internal nesting of brain rhythms and external entrainment rooted in language-specific speech rhythms. Our cross-language findings reveal brain-speech synchronization is not purely a "bottom-up" but benefits from "top-down" processing from listeners' language-specific experience.
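Phase synchrony between an EEG channel and the speech amplitude envelope, of the kind described above, is often quantified with the phase-locking value after band-pass filtering and a Hilbert transform; this sketch uses simulated signals and assumed band edges around the stress-foot (~2 Hz) and syllable (~4 Hz) rates named in the abstract.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_phase(x, fs, lo, hi):
    """Instantaneous phase of x band-passed to [lo, hi] Hz (4th-order Butterworth)."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.angle(hilbert(filtfilt(b, a, x)))

def plv(phase_a, phase_b):
    """Phase-locking value between two phase series (0 = none, 1 = perfect)."""
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b))))

# Hypothetical EEG channel and speech amplitude envelope sampled at 250 Hz.
rng = np.random.default_rng(6)
fs = 250.0
t = np.arange(0, 60, 1 / fs)
envelope = np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 4 * t)
eeg = envelope + rng.normal(0, 1, t.size)

# Delta-band (stress-foot rate) and theta-band (syllable rate) EEG-acoustic tracking.
print("delta PLV:", round(plv(band_phase(eeg, fs, 1, 3), band_phase(envelope, fs, 1, 3)), 3))
print("theta PLV:", round(plv(band_phase(eeg, fs, 3, 5), band_phase(envelope, fs, 3, 5)), 3))
```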


Subject(s)
Speech Perception , Humans , Female , Male , Speech Perception/physiology , Adult , Electroencephalography , Brain/physiology , Young Adult , Speech/physiology , Language , Acoustic Stimulation
16.
Turk J Med Sci ; 54(4): 700-709, 2024.
Article in English | MEDLINE | ID: mdl-39295620

ABSTRACT

Background/aim: Individuals with multiple sclerosis (MS) may experience various speech-related issues, including decreased speech rate, increased pauses, and changes in speech rhythms. The purpose of this study was to compare the volumes of speech-related neuroanatomical structures in MS patients with those in a control group. Materials and methods: The research was conducted in the Neurology and Radiology Departments of Malatya Training and Research Hospital. The records of patients who presented to the Neurology Department between 2019 and 2022 were examined. The study included the magnetic resonance imaging (MRI) findings of 100 individuals who had presented to the hospital in the specified years: 50 in the control group and 50 patients with MS. VolBrain is a free system that works automatically over the internet (http://volbrain.upv.es/), enabling the measurement of brain volumes without human interaction. The acquired images were analyzed using the VolBrain program. Results: A significant decrease was found in the volume of 18 of 26 speech-related regions in MS patients. Whole-brain volume was also decreased in the MS group compared to the control group. Conclusion: Unlike the few related studies previously conducted, our study measured the volumes of a larger number of speech-related areas. We observed significant atrophy in the speech-related areas of the frontal, temporal, and parietal lobes of MS patients.
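A region-by-region group comparison of the kind summarized above can be sketched as follows; the abstract does not state which test was used, so independent-samples t-tests with FDR correction are an assumption, and the regional volumes below are simulated rather than VolBrain output.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Hypothetical regional volumes (cm^3) for 26 speech-related regions, 50 per group.
rng = np.random.default_rng(7)
controls = rng.normal(10, 1, size=(50, 26))
ms_patients = rng.normal(9.6, 1, size=(50, 26))      # simulated atrophy

# Per-region group comparison with false-discovery-rate correction.
pvals = [stats.ttest_ind(controls[:, j], ms_patients[:, j]).pvalue for j in range(26)]
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("regions with significantly reduced volume:", int(reject.sum()), "of 26")
```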


Subject(s)
Brain , Magnetic Resonance Imaging , Multiple Sclerosis , Humans , Multiple Sclerosis/pathology , Multiple Sclerosis/complications , Multiple Sclerosis/diagnostic imaging , Male , Female , Adult , Brain/pathology , Brain/diagnostic imaging , Middle Aged , Speech/physiology , Atrophy/pathology , Speech Disorders/etiology , Speech Disorders/pathology , Speech Disorders/diagnostic imaging , Organ Size
17.
PLoS One ; 19(9): e0307158, 2024.
Article in English | MEDLINE | ID: mdl-39292701

ABSTRACT

This study aimed to investigate integration of alternating speech, a stimulus which classically produces a V-shaped speech intelligibility function with minimum at 2-6 Hz in typical-hearing (TH) listeners. We further studied how degraded speech impacts intelligibility across alternating rates (2, 4, 8, and 32 Hz) using vocoded speech, either in the right ear or bilaterally, to simulate single-sided deafness with a cochlear implant (SSD-CI) and bilateral CIs (BiCI), respectively. To assess potential cortical signatures of across-ear integration, we recorded activity in the bilateral auditory cortices (AC) and dorsolateral prefrontal cortices (DLPFC) during the task using functional near-infrared spectroscopy (fNIRS). For speech intelligibility, the V-shaped function was reproduced only in the BiCI condition; TH (with ceiling scores) and SSD-CI conditions had significantly higher scores across all alternating rates compared to the BiCI condition. For fNIRS, the AC and DLPFC exhibited significantly different activity across alternating rates in the TH condition, with altered activity patterns in both regions in the SSD-CI and BiCI conditions. Our results suggest that degraded speech inputs in one or both ears impact across-ear integration and that different listening strategies were employed for speech integration manifested as differences in cortical activity across conditions.


Subject(s)
Auditory Cortex , Cochlear Implants , Spectroscopy, Near-Infrared , Speech Perception , Humans , Spectroscopy, Near-Infrared/methods , Male , Female , Adult , Speech Perception/physiology , Auditory Cortex/physiology , Auditory Cortex/diagnostic imaging , Young Adult , Speech Intelligibility/physiology , Acoustic Stimulation , Dorsolateral Prefrontal Cortex/physiology , Deafness/physiopathology , Speech/physiology
18.
J Int Med Res ; 52(9): 3000605241265338, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39291423

ABSTRACT

Functional MRI (fMRI) is gaining importance in the preoperative assessment of language for presurgical planning. However, inconsistencies with the Wada test might arise. This case report describes a very rare case of a patient with epilepsy who exhibited bilateral distribution (right > left) in the inferior frontal gyrus (laterality index [LI] = -0.433) and complete right dominance in the superior temporal gyrus (LI = -1). However, the Wada test revealed a dissociation: his motor speech was located in the left hemisphere, while he could understand vocal instructions with his right hemisphere. A clinical implication is that the LIs obtained by fMRI should be used cautiously to determine Broca's area in atypical patients; for example, even when complete right dominance is found in the temporal cortex in right-handed patients. Theoretically, because the functions of motor speech and language comprehension were spatially separated (by the combined results of fMRI and Wada) and further temporally separated (by the intracarotid amobarbital procedure) in this case, these findings might provide direct support for Broca's initial conclusion that Broca's area is associated with acquired motor speech impairment, but not with language comprehension per se. Moreover, the current finding supports the idea that, once produced, motor speech can be independent of language comprehension.
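The laterality index values quoted above are consistent with the common fMRI definition LI = (L - R) / (L + R), where L and R are activation measures (for example, suprathreshold voxel counts) in homologous left and right regions of interest; whether the study used exactly this formulation is an assumption, and the voxel counts below are hypothetical.

```python
def laterality_index(left_activation, right_activation):
    """Common fMRI laterality index: (L - R) / (L + R).

    Values near +1 indicate left dominance and values near -1 right dominance;
    LI = -0.433 and LI = -1 in the abstract correspond to bilateral (right > left)
    and fully right-lateralized activation, respectively.
    """
    return (left_activation - right_activation) / (left_activation + right_activation)

# Hypothetical activated-voxel counts in homologous left/right regions of interest.
print(laterality_index(85, 215))   # about -0.43: bilateral, right > left
print(laterality_index(0, 150))    # -1.0: completely right-dominant
```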


Subject(s)
Functional Laterality , Language , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Male , Broca Area/diagnostic imaging , Broca Area/physiopathology , Adult , Temporal Lobe/diagnostic imaging , Temporal Lobe/physiopathology , Brain Mapping/methods , Epilepsy/diagnostic imaging , Epilepsy/surgery , Epilepsy/physiopathology , Epilepsy/diagnosis , Speech/physiology
19.
J Acoust Soc Am ; 156(3): 1850-1861, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39287467

ABSTRACT

Research has shown that talkers reliably coordinate the timing of articulator movements across variation in production rate and syllable stress, and that this precision of inter-articulator timing instantiates phonetic structure in the resulting acoustic signal. We here tested the hypothesis that immediate auditory feedback helps regulate that consistent articulatory timing control. Talkers with normal hearing recorded 480 /tV#Cat/ utterances using electromagnetic articulography, with alternative V (/ɑ/-/ɛ/) and C (/t/-/d/), across variation in production rate (fast-normal) and stress (first syllable stressed-unstressed). Utterances were split between two listening conditions: unmasked and masked. To quantify the effect of immediate auditory feedback on the coordination between the jaw and tongue-tip, the timing of tongue-tip raising onset for C, relative to the jaw opening-closing cycle for V, was obtained in each listening condition. Across both listening conditions, any manipulation that shortened the jaw opening-closing cycle reduced the latency of tongue-tip movement onset, relative to the onset of jaw opening. Moreover, tongue-tip latencies were strongly affiliated with utterance type. During auditory masking, however, tongue-tip latencies were less strongly affiliated with utterance type, demonstrating that talkers use afferent auditory signals in real-time to regulate the precision of inter-articulator timing in service to phonetic structure.


Subject(s)
Feedback, Sensory , Phonetics , Speech Perception , Tongue , Humans , Tongue/physiology , Male , Female , Adult , Feedback, Sensory/physiology , Young Adult , Speech Perception/physiology , Jaw/physiology , Speech Acoustics , Speech Production Measurement/methods , Time Factors , Speech/physiology , Perceptual Masking
20.
Cogn Sci ; 48(9): e13495, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39283264

ABSTRACT

Causation is a core feature of human cognition and language. How children learn intricate causal meanings remains unresolved. Here, we focus on how children learn verbs that express causation. Such verbs, known as lexical causatives (e.g., break and raise), lack explicit morphosyntactic markers indicating causation, thus requiring the child to generalize the causal meaning from context. The language addressed to children presumably plays a crucial role in this learning process. Hence, we tested whether adults adapt their use of lexical causatives to children when talking to them in day-to-day interactions. We analyzed naturalistic longitudinal data from 12 children in the Manchester corpus (spanning 20 to 36 months of age). To detect semantic generalization, we employed a network approach with semantics learned from cross-situational contexts. Our results show an increasing trend in the expansion of causative semantics, observable in both child speech and child-directed speech. Adults consistently maintain somewhat more intricate causative semantic networks than children, but both groups display evolving patterns. Around 28-30 months of age, children undergo a reduction in the degree of causative generalization, followed by a slightly time-lagged adjustment by adults in their speech directed to children. These findings substantiate adults' adaptation in child-directed speech, extending to semantics. They highlight child-directed speech as a highly adaptive and subconscious teaching tool that facilitates the dynamic processes of language acquisition.


Subject(s)
Language Development , Semantics , Speech , Humans , Child, Preschool , Adult , Male , Female , Infant , Learning , Longitudinal Studies , Language , Child Language