Results 1 - 20 of 27,534
1.
Sci Rep ; 14(1): 20756, 2024 09 05.
Article in English | MEDLINE | ID: mdl-39237702

ABSTRACT

The basic function of the tongue in producing diadochokinetic and other syllables is not fully understood. This study investigated the influence of sound pressure level and syllable on tongue pressure and muscle activity in 19 healthy adults (mean age: 28.2 years; range: 22-33 years). Tongue pressure was measured, and activity of the posterior tongue was recorded by electromyography (EMG), while the velar stops /ka/, /ko/, /ga/, and /go/ were pronounced at 70, 60, 50, and 40 dB. Spearman's rank correlation revealed a significant, yet weak, positive association between tongue pressure and EMG activity (ρ = 0.14, p < 0.05). Mixed-effects model analysis showed that tongue pressure and EMG activity increased significantly at 70 dB compared with the other sound pressure levels. While syllable did not significantly affect tongue pressure, the syllable /ko/ significantly increased EMG activity (coefficient = 0.048, p = 0.013). Although no significant differences in tongue pressure were observed across the velar stops /ka/, /ko/, /ga/, and /go/, the results suggest that articulation is achieved by altering the activity of both extrinsic and intrinsic tongue muscles. These findings highlight the importance of considering both tongue pressure and muscle activity when examining the physiological factors contributing to sound pressure levels during speech.
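As a rough illustration of the statistical approach described above (not the authors' code), the sketch below computes Spearman's rank correlation between tongue pressure and EMG activity and fits a mixed-effects model with sound pressure level and syllable as fixed effects and participant as a random intercept. The column names and synthetic data are assumptions made purely for demonstration.

```python
# Minimal sketch of the described analysis; data layout and values are invented.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 19 * 4 * 4  # 19 participants x 4 syllables x 4 sound pressure levels
df = pd.DataFrame({
    "participant": np.repeat(np.arange(19), 16),
    "syllable": np.tile(np.repeat(["ka", "ko", "ga", "go"], 4), 19),
    "spl_db": np.tile([70, 60, 50, 40], 19 * 4),
    "tongue_pressure": rng.normal(10, 2, n),
    "emg_activity": rng.normal(0.5, 0.1, n),
})

# Spearman's rank correlation between tongue pressure and EMG activity
rho, p = spearmanr(df["tongue_pressure"], df["emg_activity"])
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")

# Mixed-effects model: fixed effects for level and syllable, random intercept per participant
model = smf.mixedlm("emg_activity ~ C(spl_db) + C(syllable)",
                    data=df, groups=df["participant"])
print(model.fit().summary())
```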


Subjects
Electromyography, Pressure, Speech, Tongue, Humans, Tongue/physiology, Electromyography/methods, Adult, Male, Female, Young Adult, Speech/physiology, Phonetics
2.
Cogn Sci ; 48(9): e13484, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39228272

ABSTRACT

When people talk about kinship systems, they often use co-speech gestures and other representations to elaborate. This paper investigates such polysemiotic (spoken, gestured, and drawn) descriptions of kinship relations to see whether they display recurring patterns of conventionalization that capture specific social structures. We present an exploratory, hypothesis-generating study of descriptions produced by an ethnolinguistic community little known to the cognitive sciences: the Paamese people of Vanuatu. Forty Paamese speakers were asked to talk about their family in semi-guided kinship interviews. Analyses of the speech, gesture, and drawings produced during these interviews revealed that lineality (i.e., mother's side vs. father's side) is lateralized in the speaker's gesture space. In other words, kinship members of the speaker's matriline are placed on the left side of the speaker's body and those of the patriline on their right side when they are mentioned in speech. Moreover, we find that the gestures produced by Paamese participants during verbal descriptions of marital relations are performed significantly more often along two diagonal directions of the sagittal axis. We show that these diagonals are also found in the few diagrams that participants drew on the ground to augment their verbo-gestural descriptions of marriage practices with drawing. We interpret this behavior as evidence of a spatial template that Paamese speakers activate to think and communicate about family relations. We therefore argue that extending investigations of kinship structures beyond kinship terminologies alone can unveil additional key factors that shape kinship cognition and communication, and thereby provide further insights into the diversity of social structures.


Subjects
Cognition, Communication, Family, Gestures, Humans, Male, Female, Family/psychology, Adult, Speech, Middle Aged
3.
Cogn Sci ; 48(9): e13495, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39283264

ABSTRACT

Causation is a core feature of human cognition and language. How children learn intricate causal meanings remains unresolved. Here, we focus on how children learn verbs that express causation. Such verbs, known as lexical causatives (e.g., break and raise), lack explicit morphosyntactic markers indicating causation, requiring the child to generalize the causal meaning from context. The language addressed to children presumably plays a crucial role in this learning process. Hence, we tested whether adults adapt their use of lexical causatives to children when talking to them in day-to-day interactions. We analyzed naturalistic longitudinal data from 12 children in the Manchester corpus (spanning 20 to 36 months of age). To detect semantic generalization, we employed a network approach with semantics learned from cross-situational contexts. Our results show an increasing trend in the expansion of causative semantics, observable in both child speech and child-directed speech. Adults consistently maintain somewhat more intricate causative semantic networks than children, but both groups display evolving patterns. Around 28-30 months of age, children undergo a reduction in the degree of causative generalization, followed by a slightly time-lagged adjustment by adults in their speech directed to children. These findings substantiate adults' adaptation in child-directed speech, extending it to semantics, and highlight child-directed speech as a highly adaptive and largely subconscious teaching tool that facilitates the dynamic processes of language acquisition.


Subjects
Language Development, Semantics, Speech, Humans, Child, Preschool, Adult, Male, Female, Infant, Learning, Longitudinal Studies, Language, Child Language
4.
Acta Neurochir (Wien) ; 166(1): 369, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39283500

ABSTRACT

BACKGROUND: Speech changes significantly impact the quality of life of Parkinson's disease (PD) patients. Deep brain stimulation (DBS) of the subthalamic nucleus (STN) is a standard treatment for advanced PD, but its effects on speech remain unclear. This study aimed to investigate the relationship between STN-DBS and speech changes in PD patients using comprehensive clinical assessments and tractography. METHODS: Forty-seven PD patients underwent STN-DBS, with preoperative and 3-month postoperative assessments. Speech analyses included acoustic measurements, auditory-perceptual evaluations, and fluency-intelligibility tests. In parallel, structures within the volume of tissue activated (VTA) were identified using MRI and DTI. Clinical and demographic data and the structures associated with the VTA (corticospinal tract, internal capsule, dentato-rubro-thalamic tract, medial forebrain bundle, medial lemniscus, substantia nigra, red nucleus) were compared against the speech analyses. RESULTS: The majority of patients exhibited either improved (36.2-55.4%) or unchanged (29.7-53.1%) speech quality following STN-DBS; only a small percentage (8.5-14.9%) experienced deterioration. Older patients and those whose motor symptoms worsened postoperatively were more likely to experience negative speech changes (p < 0.05). Interestingly, stimulation of the right substantia nigra correlated with improved speech quality (p < 0.05). No significant relationship was found between the other structures affected by the VTA and speech changes. CONCLUSIONS: This study suggests that STN-DBS does not predominantly negatively impact speech in PD patients, with potential benefits observed, especially in younger patients. These findings underscore the importance of individualized treatment approaches and highlight the need for further long-term studies to optimize therapeutic outcomes and better understand the effects of STN-DBS on speech.


Subjects
Deep Brain Stimulation, Diffusion Tensor Imaging, Parkinson Disease, Speech, Subthalamic Nucleus, Humans, Subthalamic Nucleus/diagnostic imaging, Subthalamic Nucleus/surgery, Deep Brain Stimulation/methods, Male, Female, Middle Aged, Parkinson Disease/therapy, Parkinson Disease/diagnostic imaging, Aged, Diffusion Tensor Imaging/methods, Prospective Studies, Speech/physiology, Speech Disorders/etiology, Treatment Outcome, Adult
5.
Nat Commun ; 15(1): 7897, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39284848

ABSTRACT

Historically, eloquent functions have been viewed as localized to focal areas of human cerebral cortex, while more recent studies suggest they are encoded by distributed networks. We examined the network properties of cortical sites defined by stimulation to be critical for speech and language, using electrocorticography from sixteen participants during word-reading. We discovered distinct network signatures for sites where stimulation caused speech arrest and language errors. Both demonstrated lower local and global connectivity, whereas sites causing language errors exhibited higher inter-community connectivity, identifying them as connectors between modules in the language network. We used machine learning to classify these site types with reasonably high accuracy, even across participants, suggesting that a site's pattern of connections within the task-activated language network helps determine its importance to function. These findings help to bridge the gap in our understanding of how focal cortical stimulation interacts with complex brain networks to elicit language deficits.


Subjects
Cerebral Cortex, Electrocorticography, Language, Speech, Humans, Male, Female, Cerebral Cortex/physiology, Adult, Speech/physiology, Nerve Net/physiology, Young Adult, Machine Learning, Brain Mapping
6.
Nat Commun ; 15(1): 7629, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39223110

ABSTRACT

Recent advances in technology for hyper-realistic visual and audio effects provoke the concern that deepfake videos of political speeches will soon be indistinguishable from authentic video. We conduct 5 pre-registered randomized experiments with N = 2215 participants to evaluate how accurately humans distinguish real political speeches from fabrications across base rates of misinformation, audio sources, question framings with and without priming, and media modalities. We do not find that base rates of misinformation have statistically significant effects on discernment. We find that deepfakes with audio produced by state-of-the-art text-to-speech algorithms are harder to discern than the same deepfakes with voice-actor audio. Moreover, across all experiments and question framings, we find that audio and visual information enables more accurate discernment than text alone: human discernment relies more on how something is said (the audio-visual cues) than on what is said (the speech content).


Subjects
Politics, Speech, Video Recording, Humans, Female, Male, Adult, Young Adult, Communication, Algorithms
7.
Codas ; 36(5): e20230194, 2024.
Article in English | MEDLINE | ID: mdl-39230179

ABSTRACT

PURPOSE: To describe the effects of subthalamic nucleus deep brain stimulation (STN-DBS) on the speech of Spanish-speaking Parkinson's disease (PD) patients during the first year of treatment. METHODS: The following speech measures (SMs) were studied in nine Colombian idiopathic PD patients (four females and five males; age = 63 ± 7 years; years with PD = 10 ± 7; UPDRS-III = 57 ± 6; H&Y = 2 ± 0.3): maximum phonation time, acoustic voice measures, speech rate, speech intelligibility measures, and oral diadochokinesis rates. Measurements were taken in the OFF and ON medication states before surgery and every three months during the first year after STN-DBS surgery. Praat software and ratings by healthy native listeners were used for speech analysis. Statistical analysis tested for significant differences in the SMs during follow-up (Friedman test) and between medication states (Wilcoxon paired test). In addition, a reference pre-surgery variation interval (PSVI) was calculated for every participant and SM to support an individual analysis of post-surgery variation. RESULTS: No significant post-surgery or medication-state-related differences in the SMs were found at the group level. Individually, however, based on the PSVIs, the SMs showed no variation, inconsistent variation, or consistent variation during post-surgery follow-up in different combinations, depending on the medication state. CONCLUSION: As a group, participants did not share a post-surgery pattern of change in any SM. Instead, based on the PSVIs, the SMs varied differently in every participant, suggesting that in Spanish-speaking PD patients the effects of STN-DBS on speech during the first year of treatment can be highly variable.
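A minimal sketch of the group-level tests named above (Friedman across follow-up visits, Wilcoxon between medication states), assuming a simple participants-by-visits layout with synthetic values; it is not the authors' analysis pipeline.

```python
# Sketch only: hypothetical speech-measure values for 9 participants at 5 visits.
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(1)
sm = rng.normal(loc=15, scale=2, size=(9, 5))   # rows = participants, cols = visits

# Friedman test for change across the follow-up visits
stat_f, p_follow_up = friedmanchisquare(*[sm[:, t] for t in range(sm.shape[1])])

# Wilcoxon signed-rank test between OFF- and ON-medication measurements (paired)
off_med = sm[:, 0]
on_med = off_med + rng.normal(scale=0.5, size=9)
stat_w, p_med = wilcoxon(off_med, on_med)
print(f"Friedman p = {p_follow_up:.3f}, Wilcoxon p = {p_med:.3f}")
```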


Subjects
Deep Brain Stimulation, Parkinson Disease, Subthalamic Nucleus, Humans, Parkinson Disease/therapy, Parkinson Disease/physiopathology, Male, Female, Middle Aged, Aged, Speech Intelligibility/physiology, Language, Speech Disorders/etiology, Speech Disorders/therapy, Speech/physiology, Speech Production Measurement, Treatment Outcome
8.
Sensors (Basel) ; 24(17)2024 Aug 25.
Article in English | MEDLINE | ID: mdl-39275417

ABSTRACT

Speech emotion recognition (SER) addresses a ubiquitous aspect of everyday communication and is a central focus in the field of human-computer interaction. However, SER faces several challenges, including difficulty detecting subtle emotional nuances and the complicated task of recognizing speech emotions in noisy environments. To address these challenges, we introduce a Transformer-based model called MelTrans, which is designed to distill critical clues from speech data by learning core features and long-range dependencies. At the heart of our approach is a dual-stream framework. Using the Transformer architecture as its foundation, MelTrans deciphers broad dependencies within speech mel-spectrograms, facilitating a nuanced understanding of the emotional cues embedded in speech signals. Comprehensive experimental evaluations on the EmoDB (92.52%) and IEMOCAP (76.54%) datasets demonstrate the effectiveness of MelTrans. These results highlight MelTrans's ability to capture critical cues and long-range dependencies in speech data, setting a new benchmark on these datasets and underscoring the model's effectiveness in addressing the complex challenges posed by SER tasks.
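The MelTrans architecture itself is not reproduced here; the sketch below only illustrates the general pattern the abstract describes, a mel-spectrogram fed to a Transformer encoder for utterance-level emotion classification. All layer sizes, the class count, and the synthetic waveform are assumptions.

```python
# Illustrative mel-spectrogram + Transformer encoder sketch (not MelTrans itself).
import torch
import torch.nn as nn
import torchaudio

class MelTransformerSketch(nn.Module):
    def __init__(self, n_mels=80, d_model=128, n_heads=4, n_layers=2, n_classes=7):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)                    # frame-wise projection
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)                 # emotion logits

    def forward(self, mel):                                       # mel: (batch, time, n_mels)
        h = self.encoder(self.proj(mel))
        return self.head(h.mean(dim=1))                           # mean-pool over time

waveform = torch.randn(1, 16000)                                  # stand-in for a 1 s utterance
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=80)(waveform)
mel = mel.squeeze(0).transpose(0, 1).unsqueeze(0)                 # (1, time, n_mels)
print(MelTransformerSketch()(mel).shape)                          # torch.Size([1, 7])
```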


Subjects
Emotions, Speech, Humans, Emotions/physiology, Speech/physiology, Algorithms, Speech Recognition Software
9.
Sensors (Basel) ; 24(17)2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39275431

ABSTRACT

Advancements in deep learning speech representations have facilitated the effective use of extensive unlabeled speech datasets for Parkinson's disease (PD) modeling with minimal annotated data. This study employs the non-fine-tuned wav2vec 1.0 architecture to develop machine learning models for PD speech diagnosis tasks, such as cross-database classification and regression to predict demographic and articulation characteristics. The primary aim is to analyze overlapping components within the embeddings on both classification and regression tasks, investigating whether latent speech representations in PD are shared across models, particularly for related tasks. First, evaluation on three multi-language PD datasets showed that wav2vec accurately detected PD from speech, outperforming feature extraction based on mel-frequency cepstral coefficients in the proposed cross-database classification scenarios. In cross-database scenarios using Italian and English read texts, wav2vec performed comparably to intra-dataset evaluations. We also compared our cross-database findings against those of other related studies. Second, wav2vec proved effective in regression, modeling various quantitative speech characteristics related to articulation and aging. Finally, analysis of feature importance examined the overlap between classification and regression models; these experiments revealed shared features across trained models, with greater sharing for related tasks, further suggesting that wav2vec contributes to improved generalizability. The study proposes wav2vec embeddings as a promising next step toward a speech-based universal model to assist in the evaluation of PD.
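The study used the non-fine-tuned wav2vec 1.0 model; as a stand-in, the hedged sketch below uses torchaudio's bundled wav2vec 2.0 model to turn recordings into averaged frame embeddings and fit a simple classifier. The toy waveforms and labels are placeholders, not the study's data or pipeline.

```python
# Sketch: utterance embeddings from a pretrained wav2vec 2.0 bundle + linear classifier.
import torch
import torchaudio
from sklearn.linear_model import LogisticRegression

bundle = torchaudio.pipelines.WAV2VEC2_BASE          # stand-in for wav2vec 1.0
model = bundle.get_model().eval()

def utterance_embedding(waveform):
    """Average the last-layer frame embeddings into one vector per recording."""
    with torch.no_grad():
        features, _ = model.extract_features(waveform)
    return features[-1].mean(dim=1).squeeze(0).numpy()

# Toy stand-ins for recordings; y: 1 = PD, 0 = healthy control
X = [utterance_embedding(torch.randn(1, bundle.sample_rate)) for _ in range(4)]
y = [1, 1, 0, 0]
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:1]))
```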


Subjects
Databases, Factual, Parkinson Disease, Speech, Parkinson Disease/physiopathology, Humans, Speech/physiology, Deep Learning, Male, Female, Aged, Machine Learning, Middle Aged
10.
Sensors (Basel) ; 24(17)2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39275615

ABSTRACT

Speech emotion recognition is key to many fields, including human-computer interaction, healthcare, and intelligent assistance. While acoustic features extracted from human speech are essential for this task, not all of them contribute to emotion recognition effectively, so successful emotion recognition models require reduced feature sets. This work investigated whether splitting the features into two subsets based on their distribution before applying commonly used feature reduction methods affects accuracy. Filter reduction was employed using the Kruskal-Wallis test, followed by principal component analysis (PCA) and independent component analysis (ICA). A set of features was investigated to determine whether the indiscriminate use of parametric feature reduction techniques affects the accuracy of emotion recognition. For this investigation, data from three databases (Berlin EmoDB, SAVEE, and RAVDESS) were organized into subsets according to their distribution before applying PCA and ICA. The results showed a reduction from 6373 features to 170 for the Berlin EmoDB database with an accuracy of 84.3%, a final size of 130 features for SAVEE with a corresponding accuracy of 75.4%, and 150 features for RAVDESS with an accuracy of 59.9%.
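A minimal sketch of the reduction pipeline described above, under assumed data shapes: a Kruskal-Wallis filter step followed by PCA and ICA. The placeholder matrix, label counts, and component numbers are illustrative only.

```python
# Kruskal-Wallis filter, then PCA/ICA reduction, on a placeholder feature matrix.
import numpy as np
from scipy.stats import kruskal
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))     # placeholder matrix (the paper used 6373 features)
y = rng.integers(0, 7, size=200)     # placeholder emotion labels

# Filter step: keep features whose distributions differ across emotion classes
keep = [j for j in range(X.shape[1])
        if kruskal(*[X[y == k, j] for k in np.unique(y)]).pvalue < 0.05]
X_filtered = X[:, keep]

# Reduction step: PCA (or ICA) down to a compact representation
n_comp = min(170, X_filtered.shape[1], X_filtered.shape[0])
X_pca = PCA(n_components=n_comp).fit_transform(X_filtered)
X_ica = FastICA(n_components=min(20, n_comp), max_iter=1000).fit_transform(X_filtered)
print(X_pca.shape, X_ica.shape)
```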


Subjects
Emotions, Principal Component Analysis, Speech, Humans, Emotions/physiology, Speech/physiology, Databases, Factual, Algorithms, Pattern Recognition, Automated/methods
11.
Sensors (Basel) ; 24(17)2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39275707

ABSTRACT

Emotion recognition from speech is a technique employed in various scenarios of human-computer interaction (HCI). Existing approaches have achieved significant results; however, limitations persist, particularly in the quantity and diversity of available data when deep learning techniques are used. The lack of a standard for feature selection leads to continuous development and experimentation, and choosing and designing an appropriate network architecture constitutes another challenge. This study addresses the challenge of recognizing emotions in the human voice using deep learning techniques, proposing a comprehensive approach: it develops preprocessing and feature selection stages and constructs a dataset called EmoDSc by combining several available databases. The synergy between spectral features and spectrogram images is investigated. Independently, the weighted accuracy obtained using only spectral features was 89%, while using only spectrogram images it reached 90%. These results, although surpassing previous research, highlight the strengths and limitations of each representation when used in isolation. Based on this exploration, a neural network architecture composed of a 1D CNN, a 2D CNN, and an MLP that fuses spectral features and spectrogram images is proposed. The model, supported by the unified EmoDSc dataset, demonstrates a remarkable accuracy of 96%.
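A very rough sketch of the fusion idea described above (assumed shapes and layer sizes, not the authors' architecture): a 1D CNN branch over spectral feature vectors, a 2D CNN branch over spectrogram images, and an MLP that classifies the concatenated branch outputs.

```python
# Two-branch fusion sketch: 1D CNN (spectral features) + 2D CNN (spectrogram) + MLP.
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.branch1d = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32), nn.Flatten())        # -> 16 * 32
        self.branch2d = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten())     # -> 16 * 8 * 8
        self.mlp = nn.Sequential(
            nn.Linear(16 * 32 + 16 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, n_classes))

    def forward(self, spectral, spectrogram):
        z = torch.cat([self.branch1d(spectral), self.branch2d(spectrogram)], dim=1)
        return self.mlp(z)

logits = FusionSketch()(torch.randn(4, 1, 193), torch.randn(4, 1, 128, 128))
print(logits.shape)   # torch.Size([4, 7])
```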


Subjects
Deep Learning, Emotions, Neural Networks, Computer, Humans, Emotions/physiology, Speech/physiology, Databases, Factual, Algorithms, Pattern Recognition, Automated/methods
12.
Sci Justice ; 64(5): 485-497, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39277331

ABSTRACT

Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. Given disputed and reference speech material, the question can be addressed within the likelihood ratio framework, the recommended and scientifically accepted approach for reporting evidential strength in court. In forensic practice, auditory and acoustic analyses are usually performed to carry out such a verification task, considering a diversity of features such as language competence, pronunciation, and other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input; additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer whether, when, and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words within the likelihood ratio framework. We consider three different approaches to deriving a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant FRIDA dataset and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log-likelihood-ratio cost (Cllr) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with background noise.
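For readers unfamiliar with the reported metrics, the sketch below computes the log-likelihood-ratio cost (Cllr) and equal error rate (EER) from same-speaker and different-speaker log-likelihood ratios; the toy scores are assumptions, and the implementation is a generic one rather than the authors' evaluation code.

```python
# Generic Cllr and EER computation from natural-log likelihood ratios.
import numpy as np
from sklearn.metrics import roc_curve

def cllr(llr_same, llr_diff):
    """Log-likelihood-ratio cost from same- and different-speaker trials."""
    c_same = np.mean(np.log2(1 + np.exp(-llr_same)))   # penalize low LRs on same-speaker trials
    c_diff = np.mean(np.log2(1 + np.exp(llr_diff)))    # penalize high LRs on different-speaker trials
    return 0.5 * (c_same + c_diff)

def eer(llr_same, llr_diff):
    scores = np.concatenate([llr_same, llr_diff])
    labels = np.concatenate([np.ones_like(llr_same), np.zeros_like(llr_diff)])
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))               # point where miss and false-alarm rates meet
    return (fpr[idx] + fnr[idx]) / 2

# Toy scores: positive LLRs expected for same-speaker, negative for different-speaker trials
same = np.array([2.1, 1.5, 0.3, 3.0])
diff = np.array([-1.8, -0.4, 0.2, -2.5])
print(cllr(same, diff), eer(same, diff))
```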


Subjects
Forensic Sciences, Humans, Forensic Sciences/methods, Likelihood Functions, Linguistics, Support Vector Machine, Speech Acoustics, Algorithms, Speech
13.
Hum Brain Mapp ; 45(13): e70023, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39268584

ABSTRACT

The relationship between speech production and perception is a topic of ongoing debate. Some argue that there is little interaction between the two, while others claim they share representations and processes. One perspective suggests increased recruitment of the speech motor system in demanding listening situations to facilitate perception. However, uncertainties persist regarding the specific regions involved and the listening conditions that influence its engagement. This study used activation likelihood estimation in coordinate-based meta-analyses to investigate the neural overlap between speech production and three speech perception conditions: speech in noise, spectrally degraded speech, and linguistically complex speech. Neural overlap was observed in left frontal, insular, and temporal regions. Key nodes included the left frontal operculum (FOC), the left posterior lateral part of the inferior frontal gyrus (IFG), the left planum temporale (PT), and the left pre-supplementary motor area (pre-SMA). Left IFG activation was consistently observed during linguistic processing, suggesting sensitivity to the linguistic content of speech, whereas left pre-SMA activation was observed when processing degraded and noisy signals, indicating sensitivity to signal quality. Activations of the left PT and FOC were noted in all conditions, with the posterior FOC area overlapping across conditions. Our meta-analysis reveals context-independent (FOC, PT) and context-dependent (pre-SMA, posterior lateral IFG) regions within the speech motor system during challenging speech perception. These regions could contribute to sensorimotor integration and executive cognitive control for perception and production.


Subjects
Speech Perception, Speech, Humans, Speech Perception/physiology, Speech/physiology, Brain Mapping, Likelihood Functions, Motor Cortex/physiology, Cerebral Cortex/physiology, Cerebral Cortex/diagnostic imaging
14.
Elife ; 132024 Sep 10.
Article in English | MEDLINE | ID: mdl-39255194

ABSTRACT

Across the animal kingdom, neural responses in the auditory cortex are suppressed during vocalization, and humans are no exception. A common hypothesis is that suppression increases sensitivity to auditory feedback, enabling the detection of vocalization errors. This hypothesis has previously been confirmed in non-human primates; however, a direct link between auditory suppression and sensitivity in human speech monitoring remains elusive. To address this issue, we obtained intracranial electroencephalography (iEEG) recordings from 35 neurosurgical participants during speech production. We first characterized the detailed topography of auditory suppression, which varied across the superior temporal gyrus (STG). Next, we performed a delayed auditory feedback (DAF) task to determine whether the suppressed sites were also sensitive to auditory feedback alterations. Indeed, overlapping sites showed enhanced responses to feedback, indicating sensitivity. Importantly, there was a strong correlation between the degree of auditory suppression and feedback sensitivity, suggesting that suppression might be a key mechanism underlying speech monitoring. Further, we found that when participants produced speech with simultaneous auditory feedback, posterior STG was selectively activated if participants were engaged in a DAF paradigm, suggesting that increased attentional load can modulate auditory feedback sensitivity.


The brain lowers its response to inputs we generate ourselves, such as moving or speaking. Essentially, our brain 'knows' what will happen next when we carry out these actions, and therefore does not need to react as strongly as it would to unexpected events. This is why we cannot tickle ourselves, and why the brain does not react as much to our own voice as it does to someone else's. Quieting down the brain's response also allows us to focus on things that are new or important without getting distracted by our own movements or sounds. Studies in non-human primates showed that neurons in the auditory cortex (the region of the brain responsible for processing sound) displayed suppressed levels of activity when the animals made sounds. Interestingly, when the primates heard an altered version of their own voice, many of these same neurons became more active. But it was unclear whether this also happens in humans. To investigate, Ozker et al. used a technique called electrocorticography to record neural activity in different regions of the human brain while participants spoke. The results showed that most areas of the brain involved in auditory processing showed suppressed activity when individuals were speaking. However, when people heard an altered version of their own voice which had an unexpected delay, those same areas displayed increased activity. In addition, Ozker et al. found that the higher the level of suppression in the auditory cortex, the more sensitive these areas were to changes in a person's speech. These findings suggest that suppressing the brain's response to self-generated speech may help in detecting errors during speech production. Speech deficits are common in various neurological disorders, such as stuttering, Parkinson's disease, and aphasia. Ozker et al. hypothesize that these deficits may arise because individuals fail to suppress activity in auditory regions of the brain, causing a struggle when detecting and correcting errors in their own speech. However, further experiments are needed to test this theory.


Subjects
Feedback, Sensory, Speech, Humans, Male, Female, Adult, Feedback, Sensory/physiology, Speech/physiology, Young Adult, Auditory Cortex/physiology, Temporal Lobe/physiology, Speech Perception/physiology, Electroencephalography, Electrocorticography, Acoustic Stimulation
15.
PLoS One ; 19(9): e0310244, 2024.
Article in English | MEDLINE | ID: mdl-39255303

ABSTRACT

BACKGROUND: Alexithymia, characterized by difficulty identifying and describing emotions and an externally oriented thinking style, is a personality trait linked to various mental health issues. Despite its recognized importance, research on alexithymia in early childhood is sparse. This study addresses this gap by investigating alexithymia in preschool-aged children and its correlation with psychopathology, along with parental alexithymia. METHODS: Data were analyzed from 174 parents of preschoolers aged 3 to 6, including 27 children in an interdisciplinary intervention program, all of whom attended regular preschools. Parents filled out online questionnaires assessing their children's alexithymia (Perth Alexithymia Questionnaire-Parent Report) and psychopathology (Strengths and Difficulties Questionnaire), as well as their own alexithymia (Perth Alexithymia Questionnaire) and emotion recognition (Reading Mind in the Eyes Test). Linear multivariable regressions were computed to predict child psychopathology based on both child and parental alexithymia. RESULTS: Preschool children's alexithymia could be predicted by their parents' alexithymia and parents' emotion recognition skills. Internalizing symptomatology could be predicted by overall child alexithymia, whereas externalizing symptomatology was predicted by difficulties describing negative feelings only. Parental alexithymia was linked to both child alexithymia and psychopathology. CONCLUSIONS: The findings provide first evidence of the importance of alexithymia as a possible risk factor in early childhood and contribute to understanding the presentation and role of alexithymia. This could inform future research aimed at investigating the causes, prevention, and intervention strategies for psychopathology in children.


Subjects
Affective Symptoms, Emotions, Parents, Humans, Affective Symptoms/psychology, Child, Preschool, Male, Female, Parents/psychology, Child, Surveys and Questionnaires, Speech, Adult
16.
J Speech Lang Hear Res ; 67(9): 2964-2976, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39265154

ABSTRACT

INTRODUCTION: Transcribing disordered speech can be useful when diagnosing motor speech disorders such as primary progressive apraxia of speech (PPAOS), in which patients exhibit sound additions, deletions, and substitutions, or distortions and/or slow, segmented speech. Since transcribing speech is a laborious process that requires an experienced listener, using automatic speech recognition (ASR) systems for diagnosis and treatment monitoring is appealing. This study evaluated the efficacy of a readily available ASR system (wav2vec 2.0) in transcribing the speech of PPAOS patients to determine whether the word error rate (WER) output by the ASR can differentiate between healthy speech and PPAOS and/or among its subtypes, whether WER correlates with AOS severity, and how the ASR's errors compare with those noted in manual transcriptions. METHOD: Forty-five patients with PPAOS and 22 healthy controls were recorded repeating 13 words, 3 times each, which were transcribed manually and using wav2vec 2.0. The WER and phonetic and prosodic speech errors were compared between groups, and ASR results were compared against manual transcriptions. RESULTS: Mean overall WER was 0.88 for patients and 0.33 for controls. WER significantly correlated with AOS severity and accurately distinguished between patients and controls but not between AOS subtypes. The phonetic and prosodic errors from the ASR transcriptions were also unable to distinguish between subtypes, whereas errors calculated from human transcriptions were. There was poor agreement in the number of phonetic and prosodic errors between the ASR and human transcriptions. CONCLUSIONS: This study demonstrates that ASR can be useful in differentiating healthy from disordered speech and in evaluating PPAOS severity but does not distinguish PPAOS subtypes. ASR transcriptions showed weak agreement with human transcriptions; thus, ASR may be a useful tool for transcribing speech in PPAOS, but research questions must be carefully considered within the context of its limitations. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.26359417.
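As a reference for the reported metric, the sketch below shows a generic word-level edit-distance WER calculation; it is not the study's scoring script, and the example strings are invented.

```python
# Generic word error rate: edit distance between reference and hypothesis word sequences.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / max(len(ref), 1)

print(wer("catastrophe catastrophe catastrophe", "catastrophe cat tas trophe"))
```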


Subjects
Speech Recognition Software, Humans, Male, Female, Aged, Middle Aged, Speech/physiology, Apraxias/diagnosis, Speech Production Measurement/methods, Phonetics, Aphasia, Primary Progressive/diagnosis, Case-Control Studies
17.
Brain Lang ; 256: 105463, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39243486

ABSTRACT

We investigated how neural oscillations code the hierarchical nature of stress rhythms in speech and how stress processing varies with language experience. By measuring phase synchrony of multilevel EEG-acoustic tracking and intra-brain cross-frequency coupling, we show the encoding of stress involves different neural signatures (delta rhythms = stress foot rate; theta rhythms = syllable rate), is stronger for amplitude vs. duration stress cues, and induces nested delta-theta coherence mirroring the stress-syllable hierarchy in speech. Only native English, but not Mandarin, speakers exhibited enhanced neural entrainment at central stress (2 Hz) and syllable (4 Hz) rates intrinsic to natural English. English individuals with superior cortical-stress tracking capabilities also displayed stronger neural hierarchical coherence, highlighting a nuanced interplay between internal nesting of brain rhythms and external entrainment rooted in language-specific speech rhythms. Our cross-language findings reveal brain-speech synchronization is not purely a "bottom-up" but benefits from "top-down" processing from listeners' language-specific experience.


Subjects
Speech Perception, Humans, Female, Male, Speech Perception/physiology, Adult, Electroencephalography, Brain/physiology, Young Adult, Speech/physiology, Language, Acoustic Stimulation
18.
J Psychiatr Res ; 178: 66-76, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39121709

ABSTRACT

BACKGROUND: Objective diagnostic approaches need to be tested to enhance the efficacy of depression detection, and non-invasive EEG-based identification represents a promising area. AIMS: The present EEG study addresses two central questions: 1) whether the inner or the overt speech condition results in higher diagnostic accuracy of depression detection; and 2) whether the affective nature of the presented emotion words matters in such a diagnostic approach. METHODS: A matched case-control sample consisting of 10 depressed subjects and 10 healthy controls was assessed. An EEG headcap containing 64 electrodes measured neural responses to experimental cues presented in the form of 15 different words belonging to three emotional categories: neutral, positive, and negative. 120 experimental cues were presented to every participant, each containing an "inner speech" and an "overt speech" segment. An EEGNet neural network was utilized. RESULTS: The highest diagnostic accuracy of the EEGNet model was observed in the overt speech condition (69.5%), and the model achieved an overall subject-wise accuracy of 80%. Only a negligible difference in diagnostic accuracy was found between aggregated emotion word categories, with the highest accuracy (70.2%) associated with the presentation of positive emotion words. Model decisions were primarily influenced by electrodes over the left parietal, left temporal, and middle frontal areas. CONCLUSIONS: While the generalizability of our results is limited by the small sample size and potentially uncontrolled confounders, depression was associated with sensitive and presumably network-like aspects of these brain areas, potentially implying a higher level of emotion regulation that increases primarily in open communication.
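A highly simplified, EEGNet-style classifier sketch for 64-channel EEG epochs is shown below for orientation; the layer sizes, epoch length, and random input are illustrative assumptions and do not reproduce the study's exact EEGNet configuration.

```python
# Simplified EEGNet-style CNN: temporal convolution, depthwise spatial convolution,
# pooling, and a linear read-out for binary classification (depressed vs. control).
import torch
import torch.nn as nn

class EEGNetSketch(nn.Module):
    def __init__(self, n_channels=64, n_samples=512, f1=8, d=2, n_classes=2):
        super().__init__()
        self.temporal = nn.Conv2d(1, f1, (1, 64), padding=(0, 32), bias=False)
        self.bn1 = nn.BatchNorm2d(f1)
        self.spatial = nn.Conv2d(f1, f1 * d, (n_channels, 1), groups=f1, bias=False)
        self.bn2 = nn.BatchNorm2d(f1 * d)
        self.pool = nn.AvgPool2d((1, 4))
        self.drop = nn.Dropout(0.5)
        with torch.no_grad():  # dry run to size the read-out layer
            n_feats = self._features(torch.zeros(1, 1, n_channels, n_samples)).shape[1]
        self.fc = nn.Linear(n_feats, n_classes)

    def _features(self, x):
        x = self.bn1(self.temporal(x))
        x = nn.functional.elu(self.bn2(self.spatial(x)))
        x = self.drop(self.pool(x))
        return x.flatten(1)

    def forward(self, x):                       # x: (batch, 1, channels, samples)
        return self.fc(self._features(x))

logits = EEGNetSketch()(torch.randn(4, 1, 64, 512))
print(logits.shape)                             # torch.Size([4, 2])
```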


Subjects
Cues, Electroencephalography, Emotions, Machine Learning, Humans, Female, Male, Adult, Emotions/physiology, Case-Control Studies, Depression/diagnosis, Depression/physiopathology, Speech/physiology, Young Adult, Middle Aged, Brain/physiopathology
19.
Sci Rep ; 14(1): 20270, 2024 08 31.
Article in English | MEDLINE | ID: mdl-39217249

ABSTRACT

Dysphagia, a disorder affecting the ability to swallow, has a high prevalence among older adults and can lead to serious health complications, so early detection is important. This study evaluated the effectiveness of a newly developed deep learning model that analyzes syllable-segmented data for diagnosing dysphagia, an aspect not addressed in prior studies. Audio data from daily conversations were collected from 16 patients with dysphagia and 24 controls. The presence of dysphagia was determined by videofluoroscopic swallowing study. The data were segmented into syllables using a speech-to-text model and analyzed with a convolutional neural network to perform binary classification between the dysphagia patients and the control group. The proposed model was assessed in two ways. First, with syllable-segmented analysis, it demonstrated a diagnostic accuracy of 0.794 for dysphagia, a sensitivity of 0.901, a specificity of 0.687, a positive predictive value of 0.742, and a negative predictive value of 0.874. Second, at the individual level, it achieved an overall accuracy of 0.900 and an area under the curve of 0.953. This research highlights the potential of deep learning models as an early, non-invasive, and simple method for detecting dysphagia in everyday environments.
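The individual-level metrics reported above can be computed from predictions and probabilities as in the sketch below; the label and probability arrays are invented placeholders, not the study's results.

```python
# Computing accuracy, sensitivity, specificity, PPV, NPV, and AUC from toy outputs.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])                 # 1 = dysphagia, 0 = control
y_prob = np.array([0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3])  # model probabilities
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy    ", (tp + tn) / len(y_true))
print("sensitivity ", tp / (tp + fn))
print("specificity ", tn / (tn + fp))
print("PPV         ", tp / (tp + fp))
print("NPV         ", tn / (tn + fn))
print("AUC         ", roc_auc_score(y_true, y_prob))
```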


Subjects
Deep Learning, Deglutition Disorders, Speech, Humans, Deglutition Disorders/diagnosis, Deglutition Disorders/physiopathology, Male, Female, Aged, Speech/physiology, Aged, 80 and over, Middle Aged, Deglutition/physiology, Neural Networks, Computer
20.
Sci Rep ; 14(1): 20069, 2024 08 29.
Article in English | MEDLINE | ID: mdl-39209957

ABSTRACT

Communication is a fundamental aspect of human interaction, yet many individuals must speak in less-than-ideal acoustic environments daily. Adapting their speech to ensure intelligibility in these varied settings can impose a significant cognitive burden. Understanding this burden on talkers has significant implications for the design of public spaces and workplace environments, as well as speaker training programs. The aim of this study was to examine how room acoustics and speaking style affect cognitive load through self-rating of mental demand and pupillometry. Nineteen adult native speakers of American English were instructed to read sentences in both casual and clear speech (a technique known to enhance intelligibility) across three levels of reverberation (0.05 s, 1.2 s, and 1.83 s at 500-1000 Hz). Our findings revealed that speaking style consistently affects the cognitive load on talkers more than room acoustics across the tested reverberation range. Specifically, pupillometry data suggested that speaking in clear speech elevates cognitive load comparably to speaking in a room with long reverberation, challenging the conventional view of clear speech as an 'easy' strategy for improving intelligibility. These results underscore the importance of accounting for talkers' cognitive load when optimizing room acoustics and developing speech production training.


Subjects
Cognition, Speech Intelligibility, Humans, Male, Female, Cognition/physiology, Adult, Speech Intelligibility/physiology, Speech/physiology, Young Adult, Speech Perception/physiology, Acoustics