Results 1 - 20 of 28,807
1.
Cogn Sci ; 48(9): e13495, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39283264

ABSTRACT

Causation is a core feature of human cognition and language. How children learn such intricate causal meanings remains unresolved. Here, we focus on how children learn verbs that express causation. Such verbs, known as lexical causatives (e.g., break and raise), lack explicit morphosyntactic markers indicating causation, thus requiring the child to generalize the causal meaning from context. The language addressed to children presumably plays a crucial role in this learning process. Hence, we tested whether adults adapt their use of lexical causatives when talking to children in day-to-day interactions. We analyzed naturalistic longitudinal data from 12 children in the Manchester corpus (spanning 20 to 36 months of age). To detect semantic generalization, we employed a network approach with semantics learned from cross-situational contexts. Our results show an increasing trend in the expansion of causative semantics, observable in both child speech and child-directed speech. Adults consistently maintain somewhat more intricate causative semantic networks than children; however, both groups display evolving patterns. Around 28-30 months of age, children undergo a reduction in the degree of causative generalization, followed by a slightly time-lagged adjustment by adults in their speech directed to children. These findings substantiate adults' adaptation in child-directed speech, extending to semantics. They highlight child-directed speech as a highly adaptive and subconscious teaching tool that facilitates the dynamic processes of language acquisition.
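The cross-situational network idea described above can be pictured with a small sketch: build a graph linking causative verbs to the situational contexts they occur in, then track how densely connected that graph becomes across age bins. This is a hypothetical, minimal illustration using networkx; the verb-context pairs and the density metric are assumptions for illustration, not the authors' actual pipeline.

```python
import networkx as nx

# Toy child-directed utterances: (verb, situational context) pairs per age bin.
# These example pairs are invented; the study derives contexts
# cross-situationally from the Manchester corpus.
utterances = {
    "24mo": [("break", "toy"), ("open", "door"), ("break", "cup")],
    "30mo": [("break", "toy"), ("raise", "arm"), ("open", "box"),
             ("drop", "cup"), ("break", "stick"), ("raise", "flag")],
}

for age, pairs in utterances.items():
    G = nx.Graph()
    for verb, context in pairs:
        G.add_node(verb, kind="verb")
        G.add_node(context, kind="context")
        G.add_edge(verb, context)
    # Graph density as a crude proxy for how broadly causative verbs
    # generalize across contexts at this age.
    print(age, "nodes:", G.number_of_nodes(), "density:", round(nx.density(G), 3))
```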


Subject(s)
Language Development , Semantics , Speech , Humans , Child, Preschool , Adult , Male , Female , Infant , Learning , Longitudinal Studies , Language , Child Language
2.
Acta Neurochir (Wien) ; 166(1): 369, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39283500

ABSTRACT

BACKGROUND: Speech changes significantly impact the quality of life for Parkinson's disease (PD) patients. Deep Brain Stimulation (DBS) of the Subthalamic Nucleus (STN) is a standard treatment for advanced PD, but its effects on speech remain unclear. This study aimed to investigate the relationship between STN-DBS and speech changes in PD patients using comprehensive clinical assessments and tractography. METHODS: Forty-seven PD patients underwent STN-DBS, with preoperative and 3-month postoperative assessments. Speech analyses included acoustic measurements, auditory-perceptual evaluations, and fluency-intelligibility tests. Structures within the volume of tissue activated (VTA) were identified using MRI and DTI. Clinical and demographic data and the structures associated with the VTA (Corticospinal tract, Internal capsule, Dentato-rubro-thalamic tract, Medial forebrain bundle, Medial lemniscus, Substantia nigra, Red nucleus) were compared with the speech analyses. RESULTS: The majority of patients exhibited either improved (36.2-55.4%) or unchanged (29.7-53.1%) speech quality following STN-DBS. Only a small percentage (8.5-14.9%) experienced deterioration. Older patients and those with worsened motor symptoms postoperatively were more likely to experience negative speech changes (p < 0.05). Interestingly, stimulation of the right Substantia Nigra correlated with improved speech quality (p < 0.05). No significant relationship was found between the other structures affected by the VTA and speech changes. CONCLUSIONS: This study suggests that STN-DBS does not predominantly negatively impact speech in PD patients, with potential benefits observed, especially in younger patients. These findings underscore the importance of individualized treatment approaches and highlight the need for further long-term studies to optimize therapeutic outcomes and better understand the effects of STN-DBS on speech.


Subject(s)
Deep Brain Stimulation , Diffusion Tensor Imaging , Parkinson Disease , Speech , Subthalamic Nucleus , Humans , Subthalamic Nucleus/diagnostic imaging , Subthalamic Nucleus/surgery , Deep Brain Stimulation/methods , Male , Female , Middle Aged , Parkinson Disease/therapy , Parkinson Disease/diagnostic imaging , Aged , Diffusion Tensor Imaging/methods , Prospective Studies , Speech/physiology , Speech Disorders/etiology , Treatment Outcome , Adult
3.
Turk J Med Sci ; 54(4): 700-709, 2024.
Article in English | MEDLINE | ID: mdl-39295620

ABSTRACT

Background/aim: Individuals with multiple sclerosis (MS) may experience various speech-related issues, including decreased speech rate, increased pauses, and changes in speech rhythm. The purpose of this study was to compare the volumes of speech-related neuroanatomical structures in MS patients with those in a control group. Materials and methods: The research was conducted in the Neurology and Radiology Departments of Malatya Training and Research Hospital. The records of patients who presented to the Neurology Department between 2019 and 2022 were examined. The study included the magnetic resonance imaging (MRI) findings of 100 individuals: 50 patients with MS and 50 controls. VolBrain (http://volbrain.upv.es/) is a free system that runs automatically over the internet and measures brain volumes without human interaction; the acquired images were analyzed using this program. Results: A significant volume decrease was found in 18 of 26 speech-related regions in MS patients. Whole-brain volume was also reduced in the MS group compared to the control group. Conclusion: Unlike the few related studies conducted previously, we measured the volumes of a broader set of speech-related areas. We observed significant atrophy in the speech-related areas of the frontal, temporal, and parietal lobes of MS patients.


Subject(s)
Brain , Magnetic Resonance Imaging , Multiple Sclerosis , Humans , Multiple Sclerosis/pathology , Multiple Sclerosis/complications , Multiple Sclerosis/diagnostic imaging , Male , Female , Adult , Brain/pathology , Brain/diagnostic imaging , Middle Aged , Speech/physiology , Atrophy/pathology , Speech Disorders/etiology , Speech Disorders/pathology , Speech Disorders/diagnostic imaging , Organ Size
4.
Codas ; 36(5): e20230194, 2024.
Article in English | MEDLINE | ID: mdl-39230179

ABSTRACT

PURPOSE: To describe the effects of subthalamic nucleus deep brain stimulation (STN-DBS) on the speech of Spanish-speaking Parkinson's disease (PD) patients during the first year of treatment. METHODS: The speech measures (SMs) studied were maximum phonation time, acoustic voice measures, speech rate, speech intelligibility measures, and oral diadochokinesis rates. They were assessed in nine Colombian idiopathic PD patients (four females and five males; age = 63 ± 7 years; years with PD = 10 ± 7; UPDRS-III = 57 ± 6; H&Y = 2 ± 0.3) in OFF and ON medication states before surgery and every three months during the first year after STN-DBS surgery. Praat software and healthy native listeners' ratings were used for speech analysis. Statistical analysis tested for significant differences in the SMs during follow-up (Friedman test) and between medication states (Wilcoxon paired test). In addition, a reference pre-surgery variation interval (PSVI) was calculated for each participant and SM to allow an individual analysis of post-surgery variation. RESULTS: No significant post-surgery or medication-state-related differences in the SMs were found. Nevertheless, at the individual level, based on the PSVIs, the SMs showed no variation, inconsistent variation, or consistent variation during post-surgery follow-up, in different combinations depending on the medication state. CONCLUSION: As a group, participants did not show a shared post-surgery pattern of change in any SM. Instead, based on the PSVIs, the SMs varied differently in every participant, which suggests that in Spanish-speaking PD patients the effects of STN-DBS on speech during the first year of treatment could be highly variable.
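As a rough illustration of the statistical comparisons named above (a Friedman test across follow-up visits and a Wilcoxon signed-rank test between medication states), here is a minimal SciPy sketch on made-up speech-measure values; the arrays, visit count, and effect sizes are placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

# Hypothetical maximum phonation time (seconds) for 9 patients at
# pre-surgery and 3, 6, 9, 12 months post-surgery (ON medication).
rng = np.random.default_rng(0)
visits = rng.normal(loc=15, scale=3, size=(9, 5))

# Friedman test: any significant change across the five visits?
stat, p_followup = stats.friedmanchisquare(*visits.T)
print(f"Friedman chi2={stat:.2f}, p={p_followup:.3f}")

# Wilcoxon signed-rank test: OFF vs. ON medication at a single visit.
off_state = rng.normal(loc=14, scale=3, size=9)
on_state = off_state + rng.normal(loc=0.5, scale=1.0, size=9)
stat, p_med = stats.wilcoxon(off_state, on_state)
print(f"Wilcoxon W={stat:.2f}, p={p_med:.3f}")
```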


Subject(s)
Deep Brain Stimulation , Parkinson Disease , Subthalamic Nucleus , Humans , Parkinson Disease/therapy , Parkinson Disease/physiopathology , Male , Female , Middle Aged , Aged , Speech Intelligibility/physiology , Language , Speech Disorders/etiology , Speech Disorders/therapy , Speech/physiology , Speech Production Measurement , Treatment Outcome
5.
PLoS One ; 19(9): e0307158, 2024.
Article in English | MEDLINE | ID: mdl-39292701

ABSTRACT

This study aimed to investigate the integration of alternating speech, a stimulus that classically produces a V-shaped speech intelligibility function with a minimum at 2-6 Hz in typical-hearing (TH) listeners. We further studied how degraded speech impacts intelligibility across alternating rates (2, 4, 8, and 32 Hz) using vocoded speech, either in the right ear or bilaterally, to simulate single-sided deafness with a cochlear implant (SSD-CI) and bilateral CIs (BiCI), respectively. To assess potential cortical signatures of across-ear integration, we recorded activity in the bilateral auditory cortices (AC) and dorsolateral prefrontal cortices (DLPFC) during the task using functional near-infrared spectroscopy (fNIRS). For speech intelligibility, the V-shaped function was reproduced only in the BiCI condition; the TH (with ceiling scores) and SSD-CI conditions had significantly higher scores across all alternating rates compared to the BiCI condition. For fNIRS, the AC and DLPFC exhibited significantly different activity across alternating rates in the TH condition, with altered activity patterns in both regions in the SSD-CI and BiCI conditions. Our results suggest that degraded speech input in one or both ears impacts across-ear integration and that different listening strategies were employed for speech integration, manifested as differences in cortical activity across conditions.


Subject(s)
Auditory Cortex , Cochlear Implants , Spectroscopy, Near-Infrared , Speech Perception , Humans , Spectroscopy, Near-Infrared/methods , Male , Female , Adult , Speech Perception/physiology , Auditory Cortex/physiology , Auditory Cortex/diagnostic imaging , Young Adult , Speech Intelligibility/physiology , Acoustic Stimulation , Dorsolateral Prefrontal Cortex/physiology , Deafness/physiopathology , Speech/physiology
6.
J Int Med Res ; 52(9): 3000605241265338, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39291423

ABSTRACT

Functional MRI (fMRI) is gaining importance in the preoperative assessment of language for presurgical planning. However, inconsistencies with the Wada test might arise. This case report describes a very rare case of a patient with epilepsy who exhibited bilateral distribution (right > left) in the inferior frontal gyrus (laterality index [LI] = -0.433) and complete right dominance in the superior temporal gyrus (LI = -1). However, the Wada test revealed a dissociation: his motor speech was located in the left hemisphere, while he could understand vocal instructions with his right hemisphere. A clinical implication is that LIs obtained by fMRI should be used cautiously to determine Broca's area in atypical patients, for example, even when complete right dominance is found in the temporal cortex of right-handed patients. Theoretically, because the spatially separated functions of motor speech and language comprehension (shown by the combined results of fMRI and the Wada test) can be further temporally separated (by the intracarotid amobarbital procedure) in this case, these findings might provide direct support for Broca's initial conclusion that Broca's area is associated with acquired motor speech impairment, but not with language comprehension per se. Moreover, the present finding supports the idea that, once produced, motor speech can be independent of language comprehension.
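The laterality index (LI) quoted above is conventionally computed as (L - R) / (L + R) from left- and right-hemisphere activation within a region of interest, giving +1 for complete left dominance and -1 for complete right dominance. Below is a minimal sketch of that formula; the voxel counts are made up, chosen only so the outputs match the two LI values reported in the case.

```python
def laterality_index(left_count: int, right_count: int) -> float:
    """LI = (L - R) / (L + R); +1 = fully left-lateralized, -1 = fully right."""
    return (left_count - right_count) / (left_count + right_count)

# Hypothetical suprathreshold voxel counts chosen to reproduce the reported LIs.
print(round(laterality_index(85, 215), 3))   # inferior frontal gyrus: -0.433
print(laterality_index(0, 120))              # superior temporal gyrus: -1.0
```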


Subject(s)
Functional Laterality , Language , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Male , Broca Area/diagnostic imaging , Broca Area/physiopathology , Adult , Temporal Lobe/diagnostic imaging , Temporal Lobe/physiopathology , Brain Mapping/methods , Epilepsy/diagnostic imaging , Epilepsy/surgery , Epilepsy/physiopathology , Epilepsy/diagnosis , Speech/physiology
7.
Nat Commun ; 15(1): 7897, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39284848

ABSTRACT

Historically, eloquent functions have been viewed as localized to focal areas of human cerebral cortex, while more recent studies suggest they are encoded by distributed networks. We examined the network properties of cortical sites defined by stimulation to be critical for speech and language, using electrocorticography from sixteen participants during word-reading. We discovered distinct network signatures for sites where stimulation caused speech arrest and language errors. Both demonstrated lower local and global connectivity, whereas sites causing language errors exhibited higher inter-community connectivity, identifying them as connectors between modules in the language network. We used machine learning to classify these site types with reasonably high accuracy, even across participants, suggesting that a site's pattern of connections within the task-activated language network helps determine its importance to function. These findings help to bridge the gap in our understanding of how focal cortical stimulation interacts with complex brain networks to elicit language deficits.
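One way to picture the graph metrics mentioned above (local connectivity versus the inter-community connectivity that marks "connector" nodes between modules) is the participation coefficient of Guimerà and Amaral, which is high when a node's edges spread across many modules. The sketch below is a generic networkx illustration on a toy graph; it is not the authors' electrocorticography pipeline, and the graph, community detection method, and metrics are stand-ins.

```python
import networkx as nx
from networkx.algorithms import community

# Toy graph standing in for a task-activated language network.
G = nx.powerlaw_cluster_graph(60, 3, 0.3, seed=1)
modules = community.greedy_modularity_communities(G)
module_of = {node: i for i, nodes in enumerate(modules) for node in nodes}

def participation_coefficient(G, node):
    """P = 1 - sum_s (k_is / k_i)^2, where k_is counts edges into module s."""
    k_i = G.degree(node)
    if k_i == 0:
        return 0.0
    counts = {}
    for nbr in G.neighbors(node):
        counts[module_of[nbr]] = counts.get(module_of[nbr], 0) + 1
    return 1.0 - sum((k_is / k_i) ** 2 for k_is in counts.values())

# Local connectivity (clustering) vs. connector-ness (participation) per node.
for node in list(G.nodes)[:5]:
    print(node, round(nx.clustering(G, node), 2),
          round(participation_coefficient(G, node), 2))
```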


Subject(s)
Cerebral Cortex , Electrocorticography , Language , Speech , Humans , Male , Female , Cerebral Cortex/physiology , Adult , Speech/physiology , Nerve Net/physiology , Young Adult , Machine Learning , Brain Mapping
8.
Article in English | MEDLINE | ID: mdl-39255187

ABSTRACT

OBJECTIVE: Speech brain-computer interfaces (speech BCIs), which convert brain signals into spoken words or sentences, have demonstrated great potential for high-performance BCI communication. Phonemes are the basic pronunciation units. For monosyllabic languages such as Mandarin Chinese, where a word usually contains fewer than three phonemes, accurate decoding of phonemes plays a vital role. We found that in the neural representation space, phonemes with similar pronunciations are often inseparable, leading to confusion in phoneme classification. METHODS: We mapped the neural signals of phoneme pronunciation into a hyperbolic space for a more distinct phoneme representation. Critically, we proposed a hyperbolic hierarchical clustering approach to learn a phoneme-level structure that guides the representation. RESULTS: We found that this representation increased the distance between similar phonemes, effectively reducing confusion. In the phoneme decoding task, our approach achieved an average accuracy of 75.21% for 21 phonemes and outperformed existing methods across different experimental days. CONCLUSION: Our approach showed high accuracy in phoneme classification. By learning the phoneme-level neural structure, the representations of neural signals became more discriminative and interpretable. SIGNIFICANCE: Our approach can potentially facilitate high-performance speech BCIs for Chinese and other monosyllabic languages.
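For readers unfamiliar with hyperbolic embeddings, the distance in the Poincaré ball model, a common choice for hierarchy-preserving representations, is d(u, v) = arccosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The sketch below only illustrates that distance on invented 2-D points; it does not reproduce the paper's mapping of neural signals or its hierarchical clustering objective.

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Geodesic distance in the Poincare ball model (all points have norm < 1)."""
    diff_sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * diff_sq / denom))

# Hypothetical 2-D embeddings of three phonemes: points near the boundary
# behave like leaves of a hierarchy, points near the origin like shared parents.
b = np.array([0.70, 0.10])   # e.g. /b/
p = np.array([0.72, 0.15])   # e.g. /p/, acoustically similar to /b/
a = np.array([0.05, 0.02])   # e.g. a vowel near the root

print(poincare_distance(b, p))  # small: similar phonemes remain close
print(poincare_distance(b, a))  # larger: far apart in the hierarchy
```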


Subject(s)
Algorithms , Brain-Computer Interfaces , Electroencephalography , Neural Networks, Computer , Humans , Electroencephalography/methods , Male , Female , Young Adult , Speech/physiology , Adult , Phonetics , Cluster Analysis , Language
9.
Sci Justice ; 64(5): 485-497, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39277331

ABSTRACT

Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, auditory and acoustic analyses are usually performed to carry out such a verification task, considering a diversity of features such as language competence, pronunciation, or other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer whether, when, and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to derive a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant dataset FRIDA and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log-likelihood-ratio cost (Cllr) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with noise in the background.
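The two evaluation metrics named above have standard definitions: the log-likelihood-ratio cost Cllr is half the sum of the mean of log2(1 + 1/LR) over same-speaker trials and the mean of log2(1 + LR) over different-speaker trials, and the equal error rate is the operating point where false-acceptance and false-rejection rates coincide. Below is a minimal NumPy sketch of both on placeholder scores (not the FRIDA or FISHER data, and not the paper's fusion code).

```python
import numpy as np

def cllr(lr_same: np.ndarray, lr_diff: np.ndarray) -> float:
    """Log-likelihood-ratio cost; 0 is perfect, 1 matches an uninformative system."""
    return 0.5 * (np.mean(np.log2(1 + 1 / lr_same)) +
                  np.mean(np.log2(1 + lr_diff)))

def eer(scores_same: np.ndarray, scores_diff: np.ndarray) -> float:
    """Equal error rate: threshold sweep where miss rate ~= false-alarm rate."""
    thresholds = np.sort(np.concatenate([scores_same, scores_diff]))
    best, eer_val = 1.1, 0.5
    for t in thresholds:
        miss = np.mean(scores_same < t)          # same-speaker pairs rejected
        false_alarm = np.mean(scores_diff >= t)  # different-speaker pairs accepted
        if abs(miss - false_alarm) < best:
            best, eer_val = abs(miss - false_alarm), (miss + false_alarm) / 2
    return eer_val

# Placeholder likelihood ratios for same- and different-speaker comparisons.
lr_same = np.array([8.0, 3.5, 12.0, 0.9, 6.0])
lr_diff = np.array([0.2, 0.05, 1.4, 0.3, 0.1])
print(cllr(lr_same, lr_diff), eer(np.log(lr_same), np.log(lr_diff)))
```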


Subject(s)
Forensic Sciences , Humans , Forensic Sciences/methods , Likelihood Functions , Linguistics , Support Vector Machine , Speech Acoustics , Algorithms , Speech
10.
Hum Brain Mapp ; 45(13): e70023, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39268584

ABSTRACT

The relationship between speech production and perception is a topic of ongoing debate. Some argue that there is little interaction between the two, while others claim they share representations and processes. One perspective suggests increased recruitment of the speech motor system in demanding listening situations to facilitate perception. However, uncertainties persist regarding the specific regions involved and the listening conditions influencing its engagement. This study used activation likelihood estimation in coordinate-based meta-analyses to investigate the neural overlap between speech production and three speech perception conditions: speech-in-noise, spectrally degraded speech, and linguistically complex speech. Neural overlap was observed in the left frontal, insular, and temporal regions. Key nodes included the left frontal operculum (FOC), the left posterior lateral part of the inferior frontal gyrus (IFG), the left planum temporale (PT), and the left pre-supplementary motor area (pre-SMA). Left IFG activation was consistently observed during linguistic processing, suggesting sensitivity to the linguistic content of speech. In comparison, left pre-SMA activation was observed when processing degraded and noisy signals, indicating sensitivity to signal quality. Activation of the left PT and FOC was noted in all conditions, with the posterior FOC area overlapping across conditions. Our meta-analysis reveals context-independent (FOC, PT) and context-dependent (pre-SMA, posterior lateral IFG) regions within the speech motor system during challenging speech perception. These regions could contribute to sensorimotor integration and executive cognitive control for perception and production.


Subject(s)
Speech Perception , Speech , Humans , Speech Perception/physiology , Speech/physiology , Brain Mapping , Likelihood Functions , Motor Cortex/physiology , Cerebral Cortex/physiology , Cerebral Cortex/diagnostic imaging
11.
J Acoust Soc Am ; 156(3): 1850-1861, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39287467

ABSTRACT

Research has shown that talkers reliably coordinate the timing of articulator movements across variation in production rate and syllable stress, and that this precision of inter-articulator timing instantiates phonetic structure in the resulting acoustic signal. We here tested the hypothesis that immediate auditory feedback helps regulate that consistent articulatory timing control. Talkers with normal hearing recorded 480 /tV#Cat/ utterances using electromagnetic articulography, with alternative V (/ɑ/-/ɛ/) and C (/t/-/d/), across variation in production rate (fast-normal) and stress (first syllable stressed-unstressed). Utterances were split between two listening conditions: unmasked and masked. To quantify the effect of immediate auditory feedback on the coordination between the jaw and tongue-tip, the timing of tongue-tip raising onset for C, relative to the jaw opening-closing cycle for V, was obtained in each listening condition. Across both listening conditions, any manipulation that shortened the jaw opening-closing cycle reduced the latency of tongue-tip movement onset, relative to the onset of jaw opening. Moreover, tongue-tip latencies were strongly affiliated with utterance type. During auditory masking, however, tongue-tip latencies were less strongly affiliated with utterance type, demonstrating that talkers use afferent auditory signals in real-time to regulate the precision of inter-articulator timing in service to phonetic structure.


Subject(s)
Feedback, Sensory , Phonetics , Speech Perception , Tongue , Humans , Tongue/physiology , Male , Female , Adult , Feedback, Sensory/physiology , Young Adult , Speech Perception/physiology , Jaw/physiology , Speech Acoustics , Speech Production Measurement/methods , Time Factors , Speech/physiology , Perceptual Masking
12.
Hum Brain Mapp ; 45(14): e70030, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39301700

ABSTRACT

Psychosis involves changes across a broad range of cognitive functions. These functions are cortically organized in the form of a hierarchy ranging from primary sensorimotor (unimodal) to higher-order association cortices, which support functions such as language (transmodal). Language has long been documented as undergoing structural changes in psychosis. We hypothesized that these changes, as revealed in spontaneous speech patterns, may act as readouts of alterations in the configuration of this unimodal-to-transmodal axis of cortical organization in psychosis. Results from 29 patients with first-episode psychosis (FEP) and 29 controls scanned with 7 T resting-state fMRI confirmed a compression of the cortical hierarchy in FEP, which affected metrics of the hierarchical distance between the sensorimotor and default mode networks and of the hierarchical organization within the semantic network. These organizational changes were predicted by graphs representing semantic and syntactic associations between meaningful units in speech produced during picture descriptions. These findings unite psychosis, language, and the cortical hierarchy in a single conceptual scheme, which helps to situate language within the neurocognition of psychosis and opens the clinical prospect of measuring mental dysfunction computationally in spontaneous speech.


Subject(s)
Magnetic Resonance Imaging , Psychotic Disorders , Speech , Humans , Psychotic Disorders/diagnostic imaging , Psychotic Disorders/physiopathology , Psychotic Disorders/pathology , Male , Adult , Female , Speech/physiology , Young Adult , Nerve Net/diagnostic imaging , Nerve Net/physiopathology , Nerve Net/pathology , Cerebral Cortex/diagnostic imaging , Cerebral Cortex/physiopathology , Default Mode Network/diagnostic imaging , Default Mode Network/physiopathology
13.
Nat Commun ; 15(1): 7629, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39223110

ABSTRACT

Recent advances in technology for hyper-realistic visual and audio effects provoke the concern that deepfake videos of political speeches will soon be indistinguishable from authentic video. We conduct 5 pre-registered randomized experiments with N = 2215 participants to evaluate how accurately humans distinguish real political speeches from fabrications across base rates of misinformation, audio sources, question framings with and without priming, and media modalities. We do not find that base rates of misinformation have statistically significant effects on discernment. We find that deepfakes with audio produced by state-of-the-art text-to-speech algorithms are harder to discern than the same deepfakes with voice-actor audio. Moreover, across all experiments and question framings, we find that audio and visual information enables more accurate discernment than text alone: human discernment relies more on how something is said (the audio-visual cues) than on what is said (the speech content).


Subject(s)
Politics , Speech , Video Recording , Humans , Female , Male , Adult , Young Adult , Communication , Algorithms
14.
Cogn Sci ; 48(9): e13484, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39228272

ABSTRACT

When people talk about kinship systems, they often use co-speech gestures and other representations to elaborate. This paper investigates such polysemiotic (spoken, gestured, and drawn) descriptions of kinship relations, to see if they display recurring patterns of conventionalization that capture specific social structures. We present an exploratory, hypothesis-generating study of descriptions produced by an ethnolinguistic community little known to the cognitive sciences: the Paamese people of Vanuatu. Forty Paamese speakers were asked to talk about their family in semi-guided kinship interviews. Analyses of the speech, gesture, and drawings produced during these interviews revealed that lineality (i.e., mother's side vs. father's side) is lateralized in the speaker's gesture space. In other words, kinship members of the speaker's matriline are placed on the left side of the speaker's body and those of the patriline on the right side when they are mentioned in speech. Moreover, we find that the gestures produced by Paamese participants during verbal descriptions of marital relations are performed significantly more often along two diagonal directions of the sagittal axis. We show that these diagonals are also found in the few diagrams that participants drew on the ground to augment their verbo-gestural descriptions of marriage practices. We interpret this behavior as evidence of a spatial template, which Paamese speakers activate to think and communicate about family relations. We therefore argue that extending investigations of kinship structures beyond kinship terminologies alone can unveil additional key factors that shape kinship cognition and communication, and thereby provide further insights into the diversity of social structures.


Subject(s)
Cognition , Communication , Family , Gestures , Humans , Male , Female , Family/psychology , Adult , Speech , Middle Aged
15.
Elife ; 132024 Sep 10.
Article in English | MEDLINE | ID: mdl-39255194

ABSTRACT

Across the animal kingdom, neural responses in the auditory cortex are suppressed during vocalization, and humans are no exception. A common hypothesis is that suppression increases sensitivity to auditory feedback, enabling the detection of vocalization errors. This hypothesis has previously been confirmed in non-human primates; however, a direct link between auditory suppression and sensitivity in human speech monitoring remains elusive. To address this issue, we obtained intracranial electroencephalography (iEEG) recordings from 35 neurosurgical participants during speech production. We first characterized the detailed topography of auditory suppression, which varied across the superior temporal gyrus (STG). Next, we performed a delayed auditory feedback (DAF) task to determine whether the suppressed sites were also sensitive to auditory feedback alterations. Indeed, overlapping sites showed enhanced responses to feedback, indicating sensitivity. Importantly, there was a strong correlation between the degree of auditory suppression and feedback sensitivity, suggesting that suppression might be a key mechanism underlying speech monitoring. Further, we found that when participants produced speech with simultaneous auditory feedback, the posterior STG was selectively activated when participants were engaged in a DAF paradigm, suggesting that increased attentional load can modulate auditory feedback sensitivity.


The brain lowers its response to inputs we generate ourselves, such as moving or speaking. Essentially, our brain 'knows' what will happen next when we carry out these actions, and therefore does not need to react as strongly as it would to unexpected events. This is why we cannot tickle ourselves, and why the brain does not react as much to our own voice as it does to someone else's. Quieting down the brain's response also allows us to focus on things that are new or important without getting distracted by our own movements or sounds. Studies in non-human primates showed that neurons in the auditory cortex (the region of the brain responsible for processing sound) displayed suppressed levels of activity when the animals made sounds. Interestingly, when the primates heard an altered version of their own voice, many of these same neurons became more active. But it was unclear whether this also happens in humans. To investigate, Ozker et al. used a technique called electrocorticography to record neural activity in different regions of the human brain while participants spoke. The results showed that most areas of the brain involved in auditory processing showed suppressed activity when individuals were speaking. However, when people heard an altered version of their own voice which had an unexpected delay, those same areas displayed increased activity. In addition, Ozker et al. found that the higher the level of suppression in the auditory cortex, the more sensitive these areas were to changes in a person's speech. These findings suggest that suppressing the brain's response to self-generated speech may help in detecting errors during speech production. Speech deficits are common in various neurological disorders, such as stuttering, Parkinson's disease, and aphasia. Ozker et al. hypothesize that these deficits may arise because individuals fail to suppress activity in auditory regions of the brain, causing a struggle when detecting and correcting errors in their own speech. However, further experiments are needed to test this theory.


Subject(s)
Feedback, Sensory , Speech , Humans , Male , Female , Adult , Feedback, Sensory/physiology , Speech/physiology , Young Adult , Auditory Cortex/physiology , Temporal Lobe/physiology , Speech Perception/physiology , Electroencephalography , Electrocorticography , Acoustic Stimulation
16.
PLoS One ; 19(9): e0310244, 2024.
Article in English | MEDLINE | ID: mdl-39255303

ABSTRACT

BACKGROUND: Alexithymia, characterized by difficulty identifying and describing emotions and an externally oriented thinking style, is a personality trait linked to various mental health issues. Despite its recognized importance, research on alexithymia in early childhood is sparse. This study addresses this gap by investigating alexithymia in preschool-aged children and its correlation with psychopathology, along with parental alexithymia. METHODS: Data were analyzed from 174 parents of preschoolers aged 3 to 6, including 27 children in an interdisciplinary intervention program, all of whom attended regular preschools. Parents filled out online questionnaires assessing their children's alexithymia (Perth Alexithymia Questionnaire-Parent Report) and psychopathology (Strengths and Difficulties Questionnaire), as well as their own alexithymia (Perth Alexithymia Questionnaire) and emotion recognition (Reading Mind in the Eyes Test). Linear multivariable regressions were computed to predict child psychopathology based on both child and parental alexithymia. RESULTS: Preschool children's alexithymia could be predicted by their parents' alexithymia and parents' emotion recognition skills. Internalizing symptomatology could be predicted by overall child alexithymia, whereas externalizing symptomatology was predicted by difficulties describing negative feelings only. Parental alexithymia was linked to both child alexithymia and psychopathology. CONCLUSIONS: The findings provide first evidence of the importance of alexithymia as a possible risk factor in early childhood and contribute to understanding the presentation and role of alexithymia. This could inform future research aimed at investigating the causes, prevention, and intervention strategies for psychopathology in children.


Subject(s)
Affective Symptoms , Emotions , Parents , Humans , Affective Symptoms/psychology , Child, Preschool , Male , Female , Parents/psychology , Child , Surveys and Questionnaires , Speech , Adult
17.
Sensors (Basel) ; 24(17)2024 Aug 25.
Article in English | MEDLINE | ID: mdl-39275417

ABSTRACT

Speech emotion recognition (SER) is not only a ubiquitous aspect of everyday communication but also a central focus in the field of human-computer interaction. However, SER faces several challenges, including difficulty detecting subtle emotional nuances and the complicated task of recognizing speech emotions in noisy environments. To address these challenges, we introduce a Transformer-based model called MelTrans, which is designed to distill critical clues from speech data by learning core features and long-range dependencies. At the heart of our approach is a dual-stream framework. Using the Transformer architecture as its foundation, MelTrans deciphers broad dependencies within speech mel-spectrograms, facilitating a nuanced understanding of the emotional cues embedded in speech signals. Comprehensive experimental evaluations on the EmoDB (92.52%) and IEMOCAP (76.54%) datasets demonstrate the effectiveness of MelTrans. These results highlight MelTrans's ability to capture critical cues and long-range dependencies in speech data, setting a new benchmark on these specific datasets and underscoring the effectiveness of the proposed model in addressing the complex challenges posed by SER tasks.
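Since MelTrans operates on speech mel-spectrograms, a short librosa sketch of how such an input representation is typically computed may help; the frame settings and the file name below are assumptions, not necessarily those used for EmoDB or IEMOCAP.

```python
import librosa
import numpy as np

# Load an utterance (the path is a placeholder) and compute a log-mel spectrogram,
# the kind of time-frequency input a Transformer-based SER model consumes.
y, sr = librosa.load("utterance.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=80)
log_mel = librosa.power_to_db(mel, ref=np.max)   # shape: (80, n_frames)
print(log_mel.shape)
```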


Subject(s)
Emotions , Speech , Humans , Emotions/physiology , Speech/physiology , Algorithms , Speech Recognition Software
18.
Sensors (Basel) ; 24(17)2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39275431

ABSTRACT

Advancements in deep learning speech representations have facilitated the effective use of extensive unlabeled speech datasets for Parkinson's disease (PD) modeling with minimal annotated data. This study employs the non-fine-tuned wav2vec 1.0 architecture to develop machine learning models for PD speech diagnosis tasks, such as cross-database classification and regression to predict demographic and articulation characteristics. The primary aim is to analyze overlapping components within the embeddings on both classification and regression tasks, investigating whether latent speech representations in PD are shared across models, particularly for related tasks. Firstly, evaluation using three multi-language PD datasets showed that wav2vec accurately detected PD based on speech, outperforming feature extraction using mel-frequency cepstral coefficients in the proposed cross-database classification scenarios. In cross-database scenarios using Italian and English-read texts, wav2vec demonstrated performance comparable to intra-dataset evaluations. We also compared our cross-database findings against those of other related studies. Secondly, wav2vec proved effective in regression, modeling various quantitative speech characteristics related to articulation and aging. Ultimately, subsequent analysis of important features examined the presence of significant overlaps between classification and regression models. The feature importance experiments discovered shared features across trained models, with increased sharing for related tasks, further suggesting that wav2vec contributes to improved generalizability. The study proposes wav2vec embeddings as a next promising step toward a speech-based universal model to assist in the evaluation of PD.
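The study above used the non-fine-tuned wav2vec 1.0 model distributed with fairseq; as a rough stand-in, the sketch below extracts embeddings with torchaudio's pretrained Wav2Vec2 bundle, mean-pools them over time, and fits a simple classifier. The model choice, pooling strategy, file names, and classifier are illustrative assumptions, not the paper's pipeline.

```python
import torch
import torchaudio
from sklearn.linear_model import LogisticRegression

bundle = torchaudio.pipelines.WAV2VEC2_BASE   # stand-in for wav2vec 1.0
model = bundle.get_model().eval()

def utterance_embedding(path: str) -> torch.Tensor:
    """Mean-pooled representation from the last feature-extraction layer."""
    waveform, sr = torchaudio.load(path)
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.inference_mode():
        features, _ = model.extract_features(waveform)
    return features[-1].mean(dim=1).squeeze(0)   # (feature_dim,)

# Hypothetical file lists; labels: 1 = PD, 0 = healthy control.
paths = ["pd_001.wav", "hc_001.wav"]
labels = [1, 0]
X = torch.stack([utterance_embedding(p) for p in paths]).numpy()
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```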


Subject(s)
Databases, Factual , Parkinson Disease , Speech , Parkinson Disease/physiopathology , Humans , Speech/physiology , Deep Learning , Male , Female , Aged , Machine Learning , Middle Aged
19.
Sensors (Basel) ; 24(17)2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39275615

ABSTRACT

Speech emotion recognition is key to many fields, including human-computer interaction, healthcare, and intelligent assistance. While acoustic features extracted from human speech are essential for this task, not all of them contribute to emotion recognition effectively; successful emotion recognition models therefore require a reduced set of features. This work investigated whether splitting the features into two subsets based on their distribution and then applying commonly used feature reduction methods would impact accuracy. Filter reduction was employed using the Kruskal-Wallis test, followed by principal component analysis (PCA) and independent component analysis (ICA). A set of features was investigated to determine whether the indiscriminate use of parametric feature reduction techniques affects the accuracy of emotion recognition. For this investigation, data from three databases (Berlin EmoDB, SAVEE, and RAVDESS) were organized into subsets according to their distribution before applying both PCA and ICA. The results showed a reduction from 6373 features to 170 for the Berlin EmoDB database with an accuracy of 84.3%, a final size of 130 features for SAVEE with a corresponding accuracy of 75.4%, and 150 features for RAVDESS with an accuracy of 59.9%.
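The reduction pipeline described above (a Kruskal-Wallis filter followed by PCA or ICA) can be sketched generically with SciPy and scikit-learn; the feature matrix, p-value cutoff, and component counts below are placeholders rather than the paper's settings.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 500))          # 300 utterances x 500 acoustic features
y = rng.integers(0, 4, size=300)         # 4 hypothetical emotion classes

# Filter step: keep features whose distributions differ across classes
# according to the Kruskal-Wallis H-test.
keep = []
for j in range(X.shape[1]):
    groups = [X[y == c, j] for c in np.unique(y)]
    if kruskal(*groups).pvalue < 0.05:
        keep.append(j)
X_filtered = X[:, keep]

# Projection step: PCA (or FastICA) down to a compact final feature set.
n_comp = min(50, X_filtered.shape[1])
X_pca = PCA(n_components=n_comp).fit_transform(X_filtered)
X_ica = FastICA(n_components=n_comp, random_state=0).fit_transform(X_filtered)
print(X_filtered.shape, X_pca.shape, X_ica.shape)
```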


Subject(s)
Emotions , Principal Component Analysis , Speech , Humans , Emotions/physiology , Speech/physiology , Databases, Factual , Algorithms , Pattern Recognition, Automated/methods
20.
Sensors (Basel) ; 24(17)2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39275707

ABSTRACT

Emotion recognition through speech is a technique employed in various scenarios of Human-Computer Interaction (HCI). Existing approaches have achieved significant results; however, limitations persist, most notably the quantity and diversity of data required when deep learning techniques are used. The lack of a standard for feature selection leads to continuous development and experimentation, and choosing and designing the appropriate network architecture constitutes another challenge. This study addresses the challenge of recognizing emotions in the human voice using deep learning techniques, proposing a comprehensive approach that develops preprocessing and feature selection stages and constructs a dataset, EmoDSc, by combining several available databases. The synergy between spectral features and spectrogram images is investigated. Independently, the weighted accuracy obtained using only spectral features was 89%, while using only spectrogram images the weighted accuracy reached 90%. These results, although surpassing previous research, highlight the strengths and limitations of each representation when operating in isolation. Based on this exploration, a neural network architecture composed of a CNN1D, a CNN2D, and an MLP that fuses spectral features and spectrogram images is proposed. The model, supported by the unified dataset EmoDSc, demonstrates a remarkable accuracy of 96%.
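A compact PyTorch sketch of the kind of fusion architecture described above: a 1-D convolutional branch over frame-wise spectral features, a 2-D convolutional branch over spectrogram images, and an MLP head over their concatenated embeddings. Layer sizes, input shapes, and the number of emotion classes are assumptions; this is a toy analogue, not the authors' exact network.

```python
import torch
import torch.nn as nn

class FusionSER(nn.Module):
    """Toy fusion model: CNN1D branch + CNN2D branch -> MLP classifier."""
    def __init__(self, n_spectral=40, n_classes=7):
        super().__init__()
        self.branch1d = nn.Sequential(              # input: (B, n_spectral, frames)
            nn.Conv1d(n_spectral, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.branch2d = nn.Sequential(              # input: (B, 1, mels, frames)
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.mlp = nn.Sequential(
            nn.Linear(64 + 32, 128), nn.ReLU(), nn.Linear(128, n_classes),
        )

    def forward(self, spectral, spectrogram):
        z1 = self.branch1d(spectral).flatten(1)       # (B, 64)
        z2 = self.branch2d(spectrogram).flatten(1)    # (B, 32)
        return self.mlp(torch.cat([z1, z2], dim=1))   # (B, n_classes)

# Dummy batch: 8 utterances, 40 spectral coefficients x 100 frames,
# plus 1-channel 80x100 spectrogram images.
model = FusionSER()
logits = model(torch.randn(8, 40, 100), torch.randn(8, 1, 80, 100))
print(logits.shape)   # torch.Size([8, 7])
```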


Subject(s)
Deep Learning , Emotions , Neural Networks, Computer , Humans , Emotions/physiology , Speech/physiology , Databases, Factual , Algorithms , Pattern Recognition, Automated/methods