Results 1 - 20 of 8,398
1.
Hum Brain Mapp ; 45(14): e70030, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39301700

ABSTRACT

Psychosis implicates changes across a broad range of cognitive functions. These functions are cortically organized in the form of a hierarchy ranging from primary sensorimotor (unimodal) to higher-order association cortices, which involve functions such as language (transmodal). Language has long been documented as undergoing structural changes in psychosis. We hypothesized that these changes, as revealed in spontaneous speech patterns, may act as readouts of alterations in the configuration of this unimodal-to-transmodal axis of cortical organization in psychosis. Results from 29 patients with first-episode psychosis (FEP) and 29 controls scanned with 7 T resting-state fMRI confirmed a compression of the cortical hierarchy in FEP, which affected metrics of the hierarchical distance between the sensorimotor and default mode networks, and of the hierarchical organization within the semantic network. These organizational changes were predicted by graphs representing semantic and syntactic associations between meaningful units in speech produced during picture descriptions. These findings unite psychosis, language, and the cortical hierarchy in a single conceptual scheme, which helps to situate language within the neurocognition of psychosis and opens the clinical prospect for mental dysfunction to become computationally measurable in spontaneous speech.
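The speech-graph idea mentioned above can be illustrated with a small sketch. This is a generic word co-occurrence graph built with networkx, not the study's actual graph construction; the tokenization, window size, and summary metrics are assumptions for illustration only.

```python
# Minimal sketch of a speech graph: words from a transcribed picture
# description become nodes, and adjacency within a sliding window adds
# edges; graph metrics then summarize the connectivity of the speech.
# (Illustrative only; the study's actual graph definition may differ.)
import networkx as nx

def speech_graph(tokens, window=2):
    g = nx.Graph()
    g.add_nodes_from(set(tokens))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            g.add_edge(w, tokens[j])
    return g

tokens = "the boy reaches for the cookie jar while the stool tips".split()
g = speech_graph(tokens)
print(g.number_of_nodes(), g.number_of_edges(),
      nx.average_clustering(g), nx.density(g))
```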


Subjects
Magnetic Resonance Imaging, Psychotic Disorders, Speech, Humans, Psychotic Disorders/diagnostic imaging, Psychotic Disorders/physiopathology, Psychotic Disorders/pathology, Male, Adult, Female, Speech/physiology, Young Adult, Nerve Net/diagnostic imaging, Nerve Net/physiopathology, Nerve Net/pathology, Cerebral Cortex/diagnostic imaging, Cerebral Cortex/physiopathology, Default Mode Network/diagnostic imaging, Default Mode Network/physiopathology
2.
Turk J Med Sci ; 54(4): 700-709, 2024.
Article in English | MEDLINE | ID: mdl-39295620

ABSTRACT

Background/aim: Individuals with multiple sclerosis (MS) may experience various speech-related issues, including decreased speech rate, increased pauses, and changes in speech rhythm. The purpose of this study was to compare the volumes of speech-related neuroanatomical structures in MS patients with those in a control group. Materials and methods: The research was conducted in the Neurology and Radiology Departments of Malatya Training and Research Hospital. The records of patients who presented to the Neurology Department between 2019 and 2022 were examined. The study included the magnetic resonance imaging (MRI) findings of 100 individuals who presented to the hospital during those years: 50 in the control group and 50 patients with MS. VolBrain is a free system that works automatically over the internet (http://volbrain.upv.es/), enabling the measurement of brain volumes without human interaction. The acquired images were analyzed using the VolBrain program. Results: A significant decrease was found in the volume of 18 of 26 speech-related regions in MS patients. Whole-brain volumes were also lower in the MS group than in the control group. Conclusion: Unlike the few related studies conducted previously, our study measured the volumes of a larger number of speech-related areas. We observed significant atrophy in the speech-related areas of the frontal, temporal, and parietal lobes of MS patients.
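As a rough illustration of the kind of group comparison reported here, the sketch below contrasts a region's volume between MS patients and controls. The volumes are synthetic placeholders standing in for VolBrain output, and the Mann-Whitney U test is an assumed choice, since the abstract does not name the statistic used.

```python
# Sketch: compare one speech-related region's volume (cm^3) between MS
# patients and controls. Data are synthetic placeholders, not VolBrain output.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
ms_volumes = rng.normal(loc=9.5, scale=1.0, size=50)       # hypothetical MS group
control_volumes = rng.normal(loc=10.2, scale=1.0, size=50) # hypothetical controls

stat, p = mannwhitneyu(ms_volumes, control_volumes, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4g}")
```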


Assuntos
Encéfalo , Imageamento por Ressonância Magnética , Esclerose Múltipla , Humanos , Esclerose Múltipla/patologia , Esclerose Múltipla/complicações , Esclerose Múltipla/diagnóstico por imagem , Masculino , Feminino , Adulto , Encéfalo/patologia , Encéfalo/diagnóstico por imagem , Pessoa de Meia-Idade , Fala/fisiologia , Atrofia/patologia , Distúrbios da Fala/etiologia , Distúrbios da Fala/patologia , Distúrbios da Fala/diagnóstico por imagem , Tamanho do Órgão
3.
Brain Lang ; 256: 105463, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39243486

ABSTRACT

We investigated how neural oscillations code the hierarchical nature of stress rhythms in speech and how stress processing varies with language experience. By measuring phase synchrony of multilevel EEG-acoustic tracking and intra-brain cross-frequency coupling, we show that the encoding of stress involves different neural signatures (delta rhythms = stress foot rate; theta rhythms = syllable rate), is stronger for amplitude than for duration stress cues, and induces nested delta-theta coherence mirroring the stress-syllable hierarchy in speech. Only native English speakers, but not Mandarin speakers, exhibited enhanced neural entrainment at the central stress (2 Hz) and syllable (4 Hz) rates intrinsic to natural English. English individuals with superior cortical stress-tracking capabilities also displayed stronger neural hierarchical coherence, highlighting a nuanced interplay between internal nesting of brain rhythms and external entrainment rooted in language-specific speech rhythms. Our cross-language findings reveal that brain-speech synchronization is not a purely "bottom-up" process but benefits from "top-down" processing shaped by listeners' language-specific experience.
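Phase synchrony between EEG and the speech signal, of the kind used for delta- and theta-rate tracking here, is often quantified with a phase-locking value (PLV). The sketch below is a generic PLV computation on toy signals; the band edges, filter order, and data are assumptions, not the study's pipeline.

```python
# Sketch of band-limited phase synchrony (phase-locking value, PLV) between
# an EEG channel and the speech amplitude envelope, for delta (~2 Hz) and
# theta (~4 Hz) rates. Band limits and filter settings are illustrative.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def plv(x, y, fs, band):
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    phx = np.angle(hilbert(sosfiltfilt(sos, x)))
    phy = np.angle(hilbert(sosfiltfilt(sos, y)))
    return np.abs(np.mean(np.exp(1j * (phx - phy))))

fs = 250
rng = np.random.default_rng(0)
t = np.arange(0, 10, 1 / fs)
eeg = np.sin(2 * np.pi * 2 * t) + 0.5 * rng.standard_normal(t.size)  # toy EEG
envelope = np.sin(2 * np.pi * 2 * t + 0.3)                           # toy speech envelope
print("delta PLV:", plv(eeg, envelope, fs, (1, 3)))
print("theta PLV:", plv(eeg, envelope, fs, (3, 7)))
```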


Assuntos
Percepção da Fala , Humanos , Feminino , Masculino , Percepção da Fala/fisiologia , Adulto , Eletroencefalografia , Encéfalo/fisiologia , Adulto Jovem , Fala/fisiologia , Idioma , Estimulação Acústica
4.
Elife ; 13, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39255194

ABSTRACT

Across the animal kingdom, neural responses in the auditory cortex are suppressed during vocalization, and humans are no exception. A common hypothesis is that suppression increases sensitivity to auditory feedback, enabling the detection of vocalization errors. This hypothesis has previously been confirmed in non-human primates; however, a direct link between auditory suppression and sensitivity in human speech monitoring remains elusive. To address this issue, we obtained intracranial electroencephalography (iEEG) recordings from 35 neurosurgical participants during speech production. We first characterized the detailed topography of auditory suppression, which varied across the superior temporal gyrus (STG). Next, we performed a delayed auditory feedback (DAF) task to determine whether the suppressed sites were also sensitive to auditory feedback alterations. Indeed, overlapping sites showed enhanced responses to feedback, indicating sensitivity. Importantly, there was a strong correlation between the degree of auditory suppression and feedback sensitivity, suggesting that suppression might be a key mechanism underlying speech monitoring. Further, we found that when participants produced speech with simultaneous auditory feedback, the posterior STG was selectively activated if participants were engaged in a DAF paradigm, suggesting that increased attentional load can modulate auditory feedback sensitivity.


The brain lowers its response to inputs we generate ourselves, such as moving or speaking. Essentially, our brain 'knows' what will happen next when we carry out these actions, and therefore does not need to react as strongly as it would to unexpected events. This is why we cannot tickle ourselves, and why the brain does not react as much to our own voice as it does to someone else's. Quieting down the brain's response also allows us to focus on things that are new or important without getting distracted by our own movements or sounds. Studies in non-human primates showed that neurons in the auditory cortex (the region of the brain responsible for processing sound) displayed suppressed levels of activity when the animals made sounds. Interestingly, when the primates heard an altered version of their own voice, many of these same neurons became more active. But it was unclear whether this also happens in humans. To investigate, Ozker et al. used a technique called electrocorticography to record neural activity in different regions of the human brain while participants spoke. The results showed that most areas of the brain involved in auditory processing showed suppressed activity when individuals were speaking. However, when people heard an altered version of their own voice which had an unexpected delay, those same areas displayed increased activity. In addition, Ozker et al. found that the higher the level of suppression in the auditory cortex, the more sensitive these areas were to changes in a person's speech. These findings suggest that suppressing the brain's response to self-generated speech may help in detecting errors during speech production. Speech deficits are common in various neurological disorders, such as stuttering, Parkinson's disease, and aphasia. Ozker et al. hypothesize that these deficits may arise because individuals fail to suppress activity in auditory regions of the brain, causing a struggle when detecting and correcting errors in their own speech. However, further experiments are needed to test this theory.


Assuntos
Retroalimentação Sensorial , Fala , Humanos , Masculino , Feminino , Adulto , Retroalimentação Sensorial/fisiologia , Fala/fisiologia , Adulto Jovem , Córtex Auditivo/fisiologia , Lobo Temporal/fisiologia , Percepção da Fala/fisiologia , Eletroencefalografia , Eletrocorticografia , Estimulação Acústica
5.
PLoS One ; 19(9): e0307158, 2024.
Article in English | MEDLINE | ID: mdl-39292701

ABSTRACT

This study aimed to investigate integration of alternating speech, a stimulus which classically produces a V-shaped speech intelligibility function with minimum at 2-6 Hz in typical-hearing (TH) listeners. We further studied how degraded speech impacts intelligibility across alternating rates (2, 4, 8, and 32 Hz) using vocoded speech, either in the right ear or bilaterally, to simulate single-sided deafness with a cochlear implant (SSD-CI) and bilateral CIs (BiCI), respectively. To assess potential cortical signatures of across-ear integration, we recorded activity in the bilateral auditory cortices (AC) and dorsolateral prefrontal cortices (DLPFC) during the task using functional near-infrared spectroscopy (fNIRS). For speech intelligibility, the V-shaped function was reproduced only in the BiCI condition; TH (with ceiling scores) and SSD-CI conditions had significantly higher scores across all alternating rates compared to the BiCI condition. For fNIRS, the AC and DLPFC exhibited significantly different activity across alternating rates in the TH condition, with altered activity patterns in both regions in the SSD-CI and BiCI conditions. Our results suggest that degraded speech inputs in one or both ears impact across-ear integration and that different listening strategies were employed for speech integration manifested as differences in cortical activity across conditions.
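To make the stimulus concrete: alternating speech routes successive segments of a speech signal to opposite ears at a fixed rate. The sketch below shows one possible gating convention on a placeholder signal; the study's actual stimulus generation and vocoding are not reproduced here.

```python
# Sketch of generating "alternating speech": a mono signal is switched
# between the left and right ears at a fixed rate. The gating convention
# (one 1/rate-second segment per ear) is an illustrative assumption.
import numpy as np

def alternate_between_ears(signal, fs, rate_hz):
    """Route successive 1/rate_hz-second segments to alternating ears."""
    t = np.arange(signal.size) / fs
    left_on = (np.floor(t * rate_hz) % 2 == 0)
    stereo = np.zeros((signal.size, 2))
    stereo[left_on, 0] = signal[left_on]
    stereo[~left_on, 1] = signal[~left_on]
    return stereo

fs = 16000
speech = np.random.default_rng(0).standard_normal(fs * 2)  # placeholder for speech
stereo_4hz = alternate_between_ears(speech, fs, rate_hz=4)
print(stereo_4hz.shape)  # (samples, 2)
```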


Assuntos
Córtex Auditivo , Implantes Cocleares , Espectroscopia de Luz Próxima ao Infravermelho , Percepção da Fala , Humanos , Espectroscopia de Luz Próxima ao Infravermelho/métodos , Masculino , Feminino , Adulto , Percepção da Fala/fisiologia , Córtex Auditivo/fisiologia , Córtex Auditivo/diagnóstico por imagem , Adulto Jovem , Inteligibilidade da Fala/fisiologia , Estimulação Acústica , Córtex Pré-Frontal Dorsolateral/fisiologia , Surdez/fisiopatologia , Fala/fisiologia
6.
Sensors (Basel) ; 24(17), 2024 Aug 25.
Article in English | MEDLINE | ID: mdl-39275417

ABSTRACT

Speech emotion recognition (SER) is not only a ubiquitous aspect of everyday communication but also a central focus in the field of human-computer interaction. However, SER faces several challenges, including difficulty in detecting subtle emotional nuances and the complicated task of recognizing speech emotions in noisy environments. To address these challenges effectively, we introduce a Transformer-based model called MelTrans, which is designed to distill critical cues from speech data by learning core features and long-range dependencies. At the heart of our approach is a dual-stream framework. Using the Transformer architecture as its foundation, MelTrans deciphers broad dependencies within speech mel-spectrograms, facilitating a nuanced understanding of the emotional cues embedded in speech signals. Comprehensive experimental evaluations on the EmoDB (92.52%) and IEMOCAP (76.54%) datasets demonstrate the effectiveness of MelTrans. These results highlight MelTrans's ability to capture critical cues and long-range dependencies in speech data, setting a new benchmark on these specific datasets and demonstrating the effectiveness of the proposed model in addressing the complex challenges posed by SER tasks.
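The general pipeline named in the abstract (mel-spectrogram input to a Transformer encoder followed by emotion classification) can be sketched as below. This is a generic stand-in, not the MelTrans dual-stream architecture; all layer sizes, the sample rate, and the class count are assumptions.

```python
# Generic sketch: waveform -> mel-spectrogram -> Transformer encoder ->
# utterance-level emotion logits. Not the MelTrans model itself.
import torch
import torch.nn as nn
import torchaudio

class MelTransformerClassifier(nn.Module):
    def __init__(self, n_mels=64, d_model=128, n_heads=4, n_layers=2, n_classes=7):
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=n_mels)
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, wav):                        # wav: (batch, samples)
        mel = self.melspec(wav).transpose(1, 2)    # (batch, frames, n_mels)
        h = self.encoder(self.proj(torch.log1p(mel)))
        return self.head(h.mean(dim=1))            # pooled utterance logits

logits = MelTransformerClassifier()(torch.randn(2, 16000))
print(logits.shape)  # (2, 7)
```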


Assuntos
Emoções , Fala , Humanos , Emoções/fisiologia , Fala/fisiologia , Algoritmos , Interface para o Reconhecimento da Fala
7.
Sensors (Basel) ; 24(17), 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39275431

ABSTRACT

Advancements in deep learning speech representations have facilitated the effective use of extensive unlabeled speech datasets for Parkinson's disease (PD) modeling with minimal annotated data. This study employs the non-fine-tuned wav2vec 1.0 architecture to develop machine learning models for PD speech diagnosis tasks, such as cross-database classification and regression to predict demographic and articulation characteristics. The primary aim is to analyze overlapping components within the embeddings on both classification and regression tasks, investigating whether latent speech representations in PD are shared across models, particularly for related tasks. Firstly, evaluation using three multi-language PD datasets showed that wav2vec accurately detected PD based on speech, outperforming feature extraction using mel-frequency cepstral coefficients in the proposed cross-database classification scenarios. In cross-database scenarios using Italian and English-read texts, wav2vec demonstrated performance comparable to intra-dataset evaluations. We also compared our cross-database findings against those of other related studies. Secondly, wav2vec proved effective in regression, modeling various quantitative speech characteristics related to articulation and aging. Ultimately, subsequent analysis of important features examined the presence of significant overlaps between classification and regression models. The feature importance experiments discovered shared features across trained models, with increased sharing for related tasks, further suggesting that wav2vec contributes to improved generalizability. The study proposes wav2vec embeddings as a next promising step toward a speech-based universal model to assist in the evaluation of PD.
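A minimal sketch of the frozen-embedding approach follows. The study used non-fine-tuned wav2vec 1.0; here torchaudio's wav2vec 2.0 bundle is used only as a readily available stand-in, and the time-averaged pooling plus logistic-regression classifier are illustrative assumptions, not the paper's setup.

```python
# Sketch: extract frozen self-supervised speech embeddings and train a
# simple PD-vs-control classifier on top of them.
import torch
import torchaudio
from sklearn.linear_model import LogisticRegression  # illustrative classifier choice

bundle = torchaudio.pipelines.WAV2VEC2_BASE   # stand-in for the non-fine-tuned model
model = bundle.get_model().eval()

def utterance_embedding(wav_path):
    wav, sr = torchaudio.load(wav_path)       # assumes a mono recording
    wav = torchaudio.functional.resample(wav, sr, bundle.sample_rate)
    with torch.no_grad():
        features, _ = model.extract_features(wav)
    return features[-1].mean(dim=1).squeeze(0).numpy()  # time-averaged last layer

# Hypothetical usage, assuming 'paths' and binary 'labels' (1 = PD) exist:
# X = np.stack([utterance_embedding(p) for p in paths])
# clf = LogisticRegression(max_iter=1000).fit(X, labels)
```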


Assuntos
Bases de Dados Factuais , Doença de Parkinson , Fala , Doença de Parkinson/fisiopatologia , Humanos , Fala/fisiologia , Aprendizado Profundo , Masculino , Feminino , Idoso , Aprendizado de Máquina , Pessoa de Meia-Idade
8.
Sensors (Basel) ; 24(17), 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39275615

ABSTRACT

Speech emotion recognition is key to many fields, including human-computer interaction, healthcare, and intelligent assistance. While acoustic features extracted from human speech are essential for this task, not all of them contribute to emotion recognition effectively. Thus, reduced numbers of features are required within successful emotion recognition models. This work aimed to investigate whether splitting the features into two subsets based on their distribution and then applying commonly used feature reduction methods would impact accuracy. Filter reduction was employed using the Kruskal-Wallis test, followed by principal component analysis (PCA) and independent component analysis (ICA). A set of features was investigated to determine whether the indiscriminate use of parametric feature reduction techniques affects the accuracy of emotion recognition. For this investigation, data from three databases (Berlin EmoDB, SAVEE, and RAVDESS) were organized into subsets according to their distribution before applying both PCA and ICA. The results showed a reduction from 6373 features to 170 for the Berlin EmoDB database with an accuracy of 84.3%; a final size of 130 features for SAVEE, with a corresponding accuracy of 75.4%; and 150 features for RAVDESS, with an accuracy of 59.9%.
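The two-stage reduction described above (a Kruskal-Wallis filter followed by PCA or ICA) can be sketched as follows; the feature matrix is a synthetic placeholder, and the retained-feature count and component numbers are arbitrary choices rather than the paper's settings.

```python
# Sketch: Kruskal-Wallis filter to rank features across emotion classes,
# then PCA / FastICA to compress the retained subset.
import numpy as np
from scipy.stats import kruskal
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 500))      # placeholder acoustic feature matrix
y = rng.integers(0, 4, size=300)     # placeholder emotion labels

# 1) Filter: rank features by Kruskal-Wallis p-value across classes.
pvals = np.array([kruskal(*[X[y == c, j] for c in np.unique(y)]).pvalue
                  for j in range(X.shape[1])])
X_kept = X[:, np.argsort(pvals)[:50]]   # keep the 50 most discriminative features

# 2) Project the retained features to a small number of components.
X_pca = PCA(n_components=20).fit_transform(X_kept)
X_ica = FastICA(n_components=20, max_iter=1000).fit_transform(X_kept)
print(X.shape, "->", X_kept.shape, "->", X_pca.shape)
```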


Assuntos
Emoções , Análise de Componente Principal , Fala , Humanos , Emoções/fisiologia , Fala/fisiologia , Bases de Dados Factuais , Algoritmos , Reconhecimento Automatizado de Padrão/métodos
9.
Sensors (Basel) ; 24(17), 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39275707

ABSTRACT

Emotion recognition through speech is a technique employed in various scenarios of Human-Computer Interaction (HCI). Existing approaches have achieved significant results; however, limitations persist, with the quantity and diversity of data being more notable when deep learning techniques are used. The lack of a standard in feature selection leads to continuous development and experimentation. Choosing and designing the appropriate network architecture constitutes another challenge. This study addresses the challenge of recognizing emotions in the human voice using deep learning techniques, proposing a comprehensive approach and developing preprocessing and feature selection stages while constructing a dataset called EmoDSc by combining several available databases. The synergy between spectral features and spectrogram images is investigated. Independently, the weighted accuracy obtained using only spectral features was 89%, while using only spectrogram images, the weighted accuracy reached 90%. These results, although surpassing previous research, highlight the strengths and limitations of each representation when operating in isolation. Based on this exploration, a neural network architecture composed of a CNN1D, a CNN2D, and an MLP that fuses spectral features and spectrogram images is proposed. The model, supported by the unified dataset EmoDSc, demonstrates a remarkable accuracy of 96%.
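A minimal sketch of the fusion idea (a 1-D CNN over spectral feature sequences and a 2-D CNN over spectrogram images feeding a shared MLP head) is shown below; input shapes, channel counts, and the class number are assumptions, and the actual proposed architecture is not reproduced here.

```python
# Sketch of a dual-branch fusion network for speech emotion recognition.
import torch
import torch.nn as nn

class FusionSER(nn.Module):
    def __init__(self, n_spectral=40, n_classes=7):
        super().__init__()
        self.branch1d = nn.Sequential(                 # spectral feature sequence
            nn.Conv1d(n_spectral, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.branch2d = nn.Sequential(                 # spectrogram image
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten())
        self.mlp = nn.Sequential(
            nn.Linear(64 + 16 * 4 * 4, 128), nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, spectral_seq, spectrogram_img):
        h = torch.cat([self.branch1d(spectral_seq), self.branch2d(spectrogram_img)], dim=1)
        return self.mlp(h)

logits = FusionSER()(torch.randn(2, 40, 100), torch.randn(2, 1, 128, 128))
print(logits.shape)  # (2, 7)
```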


Assuntos
Aprendizado Profundo , Emoções , Redes Neurais de Computação , Humanos , Emoções/fisiologia , Fala/fisiologia , Bases de Dados Factuais , Algoritmos , Reconhecimento Automatizado de Padrão/métodos
10.
Nat Commun ; 15(1): 7897, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39284848

ABSTRACT

Historically, eloquent functions have been viewed as localized to focal areas of human cerebral cortex, while more recent studies suggest they are encoded by distributed networks. We examined the network properties of cortical sites defined by stimulation to be critical for speech and language, using electrocorticography from sixteen participants during word-reading. We discovered distinct network signatures for sites where stimulation caused speech arrest and language errors. Both demonstrated lower local and global connectivity, whereas sites causing language errors exhibited higher inter-community connectivity, identifying them as connectors between modules in the language network. We used machine learning to classify these site types with reasonably high accuracy, even across participants, suggesting that a site's pattern of connections within the task-activated language network helps determine its importance to function. These findings help to bridge the gap in our understanding of how focal cortical stimulation interacts with complex brain networks to elicit language deficits.
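The node-level network measures described above (local clustering, global connectivity, and connector-like inter-community connectivity) can be illustrated on a toy graph. The definitions below follow standard graph-theory usage, including a participation coefficient; they are not the paper's code, and the karate-club graph is only a placeholder for a task-activated connectivity network.

```python
# Sketch of node-level network measures on a toy connectivity graph.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

g = nx.karate_club_graph()                      # placeholder connectivity graph
communities = list(greedy_modularity_communities(g))
membership = {n: i for i, comm in enumerate(communities) for n in comm}

def participation_coefficient(g, node):
    """1 - sum over modules of (within-module degree fraction)^2."""
    k = g.degree(node)
    if k == 0:
        return 0.0
    frac = sum((sum(1 for nb in g.neighbors(node) if membership[nb] == i) / k) ** 2
               for i in range(len(communities)))
    return 1.0 - frac

node = 0
print("local clustering:", nx.clustering(g, node))
print("global efficiency (whole graph):", nx.global_efficiency(g))
print("participation coefficient:", participation_coefficient(g, node))
```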


Assuntos
Córtex Cerebral , Eletrocorticografia , Idioma , Fala , Humanos , Masculino , Feminino , Córtex Cerebral/fisiologia , Adulto , Fala/fisiologia , Rede Nervosa/fisiologia , Adulto Jovem , Aprendizado de Máquina , Mapeamento Encefálico
11.
J Int Med Res ; 52(9): 3000605241265338, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39291423

ABSTRACT

Functional MRI (fMRI) is gaining importance in the preoperative assessment of language for presurgical planning. However, inconsistencies with the Wada test might arise. This current case report describes a very rare case of an epileptic patient who exhibited bilateral distribution (right > left) in the inferior frontal gyrus (laterality index [LI] = -0.433) and completely right dominance in the superior temporal gyrus (LI = -1). However, the Wada test revealed a dissociation: his motor speech was located in the left hemisphere, while he could understand vocal instructions with his right hemisphere. A clinical implication is that the LIs obtained by fMRI should be cautiously used to determine Broca's area in atypical patients; for example, even when complete right dominance is found in the temporal cortex in right-handed patients. Theoretically, as the spatially separated functions of motor speech and language comprehension (by the combined results of fMRI and Wada) can be further temporally separated (by the intracarotid amobarbital procedure) in this case report, these findings might provide direct support to Broca's initial conclusions that Broca's area is associated with acquired motor speech impairment, but not language comprehension per se. Moreover, this current finding supports the idea that once produced, motor speech can be independent from language comprehension.
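The laterality index values quoted above follow the common fMRI convention LI = (L - R) / (L + R), where L and R are, for example, suprathreshold voxel counts in homologous left and right regions. The sketch below uses hypothetical counts chosen only to reproduce the reported magnitudes; the report's actual thresholding procedure is not given here.

```python
# Laterality index as commonly defined in fMRI language mapping:
# positive values indicate left dominance, negative values right dominance.
def laterality_index(left_count, right_count):
    return (left_count - right_count) / (left_count + right_count)

# Hypothetical voxel counts consistent with the reported indices:
print(laterality_index(170, 430))   # about -0.433 (bilateral, right > left)
print(laterality_index(0, 250))     # -1.0 (complete right dominance)
```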


Assuntos
Lateralidade Funcional , Idioma , Imageamento por Ressonância Magnética , Humanos , Imageamento por Ressonância Magnética/métodos , Masculino , Área de Broca/diagnóstico por imagem , Área de Broca/fisiopatologia , Adulto , Lobo Temporal/diagnóstico por imagem , Lobo Temporal/fisiopatologia , Mapeamento Encefálico/métodos , Epilepsia/diagnóstico por imagem , Epilepsia/cirurgia , Epilepsia/fisiopatologia , Epilepsia/diagnóstico , Fala/fisiologia
12.
Codas ; 36(5): e20230194, 2024.
Article in English | MEDLINE | ID: mdl-39230179

ABSTRACT

PURPOSE: To describe the effects of subthalamic nucleus deep brain stimulation (STN-DBS) on the speech of Spanish-speaking Parkinson's disease (PD) patients during the first year of treatment. METHODS: The speech measures (SMs) maximum phonation time, acoustic voice measures, speech rate, speech intelligibility measures, and oral diadochokinesis rates of nine Colombian idiopathic PD patients (four females and five males; age = 63 ± 7 years; years of PD = 10 ± 7 years; UPDRS-III = 57 ± 6; H&Y = 2 ± 0.3) were studied in OFF and ON medication states before and every three months during the first year after STN-DBS surgery. Praat software and healthy native listeners' ratings were used for speech analysis. Statistical analysis sought significant differences in the SMs during follow-up (Friedman test) and between medication states (Wilcoxon paired test). In addition, a reference pre-surgery variation interval (PSVI) was calculated for every participant and SM to allow an individual analysis of post-surgery variation. RESULTS: No significant post-surgery or medication-state-related differences in the SMs were found. Nevertheless, individually, based on the PSVIs, the SMs exhibited no variation, inconsistent variation, or consistent variation during post-surgery follow-up in different combinations, depending on the medication state. CONCLUSION: As a group, participants did not have a shared post-surgery pattern of change in any SM. Instead, based on the PSVIs, the SMs varied differently in every participant, which suggests that in Spanish-speaking PD patients, the effects of STN-DBS on speech during the first year of treatment could be highly variable.
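The group-level statistics named above can be sketched as follows; the per-patient values are synthetic placeholders, and the follow-up layout (baseline plus four visits) is an assumption used only to show how the Friedman and Wilcoxon tests would be applied.

```python
# Sketch: Friedman test across repeated follow-ups and Wilcoxon signed-rank
# test between OFF and ON medication states for one speech measure (SM).
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(1)
# One SM (e.g., speech rate) for 9 patients at baseline, 3, 6, 9, 12 months.
sm = rng.normal(loc=4.0, scale=0.3, size=(9, 5))
stat, p = friedmanchisquare(*[sm[:, t] for t in range(sm.shape[1])])
print(f"Friedman: chi2 = {stat:.2f}, p = {p:.3f}")

off_state = rng.normal(4.0, 0.3, size=9)
on_state = rng.normal(4.1, 0.3, size=9)
stat, p = wilcoxon(off_state, on_state)
print(f"Wilcoxon (OFF vs ON): W = {stat:.1f}, p = {p:.3f}")
```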


Assuntos
Estimulação Encefálica Profunda , Doença de Parkinson , Núcleo Subtalâmico , Humanos , Doença de Parkinson/terapia , Doença de Parkinson/fisiopatologia , Masculino , Feminino , Pessoa de Meia-Idade , Idoso , Inteligibilidade da Fala/fisiologia , Idioma , Distúrbios da Fala/etiologia , Distúrbios da Fala/terapia , Fala/fisiologia , Medida da Produção da Fala , Resultado do Tratamento
13.
Acta Neurochir (Wien) ; 166(1): 369, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39283500

ABSTRACT

BACKGROUND: Speech changes significantly impact the quality of life of Parkinson's disease (PD) patients. Deep brain stimulation (DBS) of the subthalamic nucleus (STN) is a standard treatment for advanced PD, but its effects on speech remain unclear. This study aimed to investigate the relationship between STN-DBS and speech changes in PD patients using comprehensive clinical assessments and tractography. METHODS: Forty-seven PD patients underwent STN-DBS, with preoperative and 3-month postoperative assessments. Speech analyses included acoustic measurements, auditory-perceptual evaluations, and fluency-intelligibility tests. In parallel, structures within the volume of tissue activated (VTA) were identified using MRI and DTI. The clinical and demographic data and the structures associated with the VTA (corticospinal tract, internal capsule, dentato-rubro-thalamic tract, medial forebrain bundle, medial lemniscus, substantia nigra, red nucleus) were compared with the speech analyses. RESULTS: The majority of patients (36.2-55.4% good, 29.7-53.1% same) exhibited either improved or unchanged speech quality following STN-DBS. Only a small percentage (8.5-14.9%) experienced deterioration. Older patients and those with worsened motor symptoms postoperatively were more likely to experience negative speech changes (p < 0.05). Interestingly, stimulation of the right substantia nigra correlated with improved speech quality (p < 0.05). No significant relationship was found between the other structures affected by the VTA and speech changes. CONCLUSIONS: This study suggests that STN-DBS does not predominantly negatively impact speech in PD patients, with potential benefits observed, especially in younger patients. These findings underscore the importance of individualized treatment approaches and highlight the need for further long-term studies to optimize therapeutic outcomes and better understand the effects of STN-DBS on speech.


Assuntos
Estimulação Encefálica Profunda , Imagem de Tensor de Difusão , Doença de Parkinson , Fala , Núcleo Subtalâmico , Humanos , Núcleo Subtalâmico/diagnóstico por imagem , Núcleo Subtalâmico/cirurgia , Estimulação Encefálica Profunda/métodos , Masculino , Feminino , Pessoa de Meia-Idade , Doença de Parkinson/terapia , Doença de Parkinson/diagnóstico por imagem , Idoso , Imagem de Tensor de Difusão/métodos , Estudos Prospectivos , Fala/fisiologia , Distúrbios da Fala/etiologia , Resultado do Tratamento , Adulto
14.
Hum Brain Mapp ; 45(13): e70023, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39268584

ABSTRACT

The relationship between speech production and perception is a topic of ongoing debate. Some argue that there is little interaction between the two, while others claim they share representations and processes. One perspective suggests increased recruitment of the speech motor system in demanding listening situations to facilitate perception. However, uncertainties persist regarding the specific regions involved and the listening conditions influencing its engagement. This study used activation likelihood estimation in coordinate-based meta-analyses to investigate the neural overlap between speech production and three speech perception conditions: speech-in-noise, spectrally degraded speech, and linguistically complex speech. Neural overlap was observed in the left frontal, insular, and temporal regions. Key nodes included the left frontal operculum (FOC), the left posterior lateral part of the inferior frontal gyrus (IFG), the left planum temporale (PT), and the left pre-supplementary motor area (pre-SMA). Left IFG activation was consistently observed during linguistic processing, suggesting sensitivity to the linguistic content of speech. In comparison, left pre-SMA activation was observed when processing degraded and noisy signals, indicating sensitivity to signal quality. Activation of the left PT and FOC was noted in all conditions, with the posterior FOC area overlapping across conditions. Our meta-analysis reveals context-independent (FOC, PT) and context-dependent (pre-SMA, posterior lateral IFG) regions within the speech motor system during challenging speech perception. These regions could contribute to sensorimotor integration and executive cognitive control for perception and production.


Assuntos
Percepção da Fala , Fala , Humanos , Percepção da Fala/fisiologia , Fala/fisiologia , Mapeamento Encefálico , Funções Verossimilhança , Córtex Motor/fisiologia , Córtex Cerebral/fisiologia , Córtex Cerebral/diagnóstico por imagem
15.
Sci Rep ; 14(1): 20756, 2024 09 05.
Article in English | MEDLINE | ID: mdl-39237702

ABSTRACT

The basic function of the tongue in producing diadochokinetic and other syllables is not fully understood. This study investigated the influence of sound pressure levels and syllables on tongue pressure and muscle activity in 19 healthy adults (mean age: 28.2 years; range: 22-33 years). Tongue pressure and activity of the posterior tongue were measured, the latter using electromyography (EMG), while the velar stops /ka/, /ko/, /ga/, and /go/ were pronounced at 70, 60, 50, and 40 dB. Spearman's rank correlation revealed a significant, yet weak, positive association between tongue pressure and EMG activity (ρ = 0.14, p < 0.05). Mixed-effects model analysis showed that tongue pressure and EMG activity increased significantly at 70 dB compared with the other sound pressure levels. While syllables did not significantly affect tongue pressure, the syllable /ko/ significantly increased EMG activity (coefficient = 0.048, p = 0.013). Although no significant differences in tongue pressure were observed among the velar stops /ka/, /ko/, /ga/, and /go/, it is suggested that articulation is achieved by altering the activity of both extrinsic and intrinsic tongue muscles. These findings highlight the importance of considering both tongue pressure and muscle activity when examining the physiological factors contributing to sound pressure levels during speech.
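The two analyses named above (Spearman's rank correlation and a mixed-effects model with participant as a random effect) can be sketched as follows; the data-frame layout, column names, and values are placeholders, not the study's data.

```python
# Sketch: Spearman correlation between tongue pressure and EMG activity,
# plus a mixed-effects model with a random intercept per participant.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 19 * 16   # assumed layout: 19 participants x (4 syllables x 4 levels)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(19), 16),
    "spl_db": np.tile(np.repeat([70, 60, 50, 40], 4), 19),
    "pressure": rng.normal(10, 2, n),
    "emg": rng.normal(50, 10, n),
})

rho, p = spearmanr(df["pressure"], df["emg"])
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")

model = smf.mixedlm("emg ~ pressure + C(spl_db)", df, groups=df["participant"]).fit()
print(model.summary())
```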


Assuntos
Eletromiografia , Pressão , Fala , Língua , Humanos , Língua/fisiologia , Eletromiografia/métodos , Adulto , Masculino , Feminino , Adulto Jovem , Fala/fisiologia , Fonética
16.
J Speech Lang Hear Res ; 67(9): 2964-2976, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39265154

ABSTRACT

INTRODUCTION: Transcribing disordered speech can be useful when diagnosing motor speech disorders such as primary progressive apraxia of speech (PPAOS), in which speakers produce sound additions, deletions, and substitutions, or distortions and/or slow, segmented speech. Since transcribing speech can be a laborious process and requires an experienced listener, using automatic speech recognition (ASR) systems for diagnosis and treatment monitoring is appealing. This study evaluated the efficacy of a readily available ASR system (wav2vec 2.0) in transcribing the speech of PPAOS patients to determine whether the word error rate (WER) output by the ASR can differentiate between healthy speech and PPAOS and/or among its subtypes, whether WER correlates with AOS severity, and how the ASR's errors compare to those noted in manual transcriptions. METHOD: Forty-five patients with PPAOS and 22 healthy controls were recorded repeating 13 words, 3 times each, which were transcribed manually and using wav2vec 2.0. The WER and phonetic and prosodic speech errors were compared between groups, and ASR results were compared against manual transcriptions. RESULTS: Mean overall WER was 0.88 for patients and 0.33 for controls. WER significantly correlated with AOS severity and accurately distinguished between patients and controls but not between AOS subtypes. The phonetic and prosodic errors from the ASR transcriptions were also unable to distinguish between subtypes, whereas errors calculated from human transcriptions were. There was poor agreement in the number of phonetic and prosodic errors between the ASR and human transcriptions. CONCLUSIONS: This study demonstrates that ASR can be useful in differentiating healthy from disordered speech and in evaluating PPAOS severity but does not distinguish PPAOS subtypes. ASR transcriptions showed weak agreement with human transcriptions; thus, ASR may be a useful tool for the transcription of speech in PPAOS, but the research questions posed must be carefully considered within the context of its limitations. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.26359417.
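Word error rate, the score used above, is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch:

```python
# Word error rate via dynamic-programming edit distance over word sequences.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the boy reaches for the cookie jar",
          "the boy reach for cookie jar"))  # about 0.29: 1 sub + 1 del over 7 words
```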


Assuntos
Interface para o Reconhecimento da Fala , Humanos , Masculino , Feminino , Idoso , Pessoa de Meia-Idade , Fala/fisiologia , Apraxias/diagnóstico , Medida da Produção da Fala/métodos , Fonética , Afasia Primária Progressiva/diagnóstico , Estudos de Casos e Controles
17.
Article in English | MEDLINE | ID: mdl-39255187

ABSTRACT

OBJECTIVE: Speech brain-computer interfaces (speech BCIs), which convert brain signals into spoken words or sentences, have demonstrated great potential for high-performance BCI communication. Phonemes are the basic pronunciation units. For monosyllabic languages such as Chinese Mandarin, where a word usually contains less than three phonemes, accurate decoding of phonemes plays a vital role. We found that in the neural representation space, phonemes with similar pronunciations are often inseparable, leading to confusion in phoneme classification. METHODS: We mapped the neural signals of phoneme pronunciation into a hyperbolic space for a more distinct phoneme representation. Critically, we proposed a hyperbolic hierarchical clustering approach to specifically learn a phoneme-level structure to guide the representation. RESULTS: We found such representation facilitated greater distance between similar phonemes, effectively reducing confusion. In the phoneme decoding task, our approach demonstrated an average accuracy of 75.21% for 21 phonemes and outperformed existing methods across different experimental days. CONCLUSION: Our approach showed high accuracy in phoneme classification. By learning the phoneme-level neural structure, the representations of neural signals were more discriminative and interpretable. SIGNIFICANCE: Our approach can potentially facilitate high-performance speech BCIs for Chinese and other monosyllabic languages.
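The hyperbolic geometry underlying this approach can be made concrete with the Poincare-ball distance, which grows rapidly near the boundary and thus gives hierarchical (tree-like) structure more room than Euclidean space. The embedding vectors below are arbitrary placeholders; the paper's hyperbolic hierarchical clustering itself is not reproduced here.

```python
# Geodesic distance between points inside the unit Poincare ball.
import numpy as np

def poincare_distance(u, v):
    du, dv = np.dot(u, u), np.dot(v, v)
    duv = np.dot(u - v, u - v)
    return np.arccosh(1 + 2 * duv / ((1 - du) * (1 - dv)))

a = np.array([0.10, 0.00])   # placeholder phoneme embeddings
b = np.array([0.85, 0.00])
c = np.array([0.88, 0.05])
print(poincare_distance(a, b))  # center-to-boundary pair: large distance
print(poincare_distance(b, c))  # near-boundary neighbors: farther apart than
                                # their small Euclidean gap would suggest
```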


Assuntos
Algoritmos , Interfaces Cérebro-Computador , Eletroencefalografia , Redes Neurais de Computação , Humanos , Eletroencefalografia/métodos , Masculino , Feminino , Adulto Jovem , Fala/fisiologia , Adulto , Fonética , Análise por Conglomerados , Idioma
18.
J Acoust Soc Am ; 156(3): 1850-1861, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39287467

ABSTRACT

Research has shown that talkers reliably coordinate the timing of articulator movements across variation in production rate and syllable stress, and that this precision of inter-articulator timing instantiates phonetic structure in the resulting acoustic signal. We here tested the hypothesis that immediate auditory feedback helps regulate that consistent articulatory timing control. Talkers with normal hearing recorded 480 /tV#Cat/ utterances using electromagnetic articulography, with alternative V (/ɑ/-/ɛ/) and C (/t/-/d/), across variation in production rate (fast-normal) and stress (first syllable stressed-unstressed). Utterances were split between two listening conditions: unmasked and masked. To quantify the effect of immediate auditory feedback on the coordination between the jaw and tongue-tip, the timing of tongue-tip raising onset for C, relative to the jaw opening-closing cycle for V, was obtained in each listening condition. Across both listening conditions, any manipulation that shortened the jaw opening-closing cycle reduced the latency of tongue-tip movement onset, relative to the onset of jaw opening. Moreover, tongue-tip latencies were strongly affiliated with utterance type. During auditory masking, however, tongue-tip latencies were less strongly affiliated with utterance type, demonstrating that talkers use afferent auditory signals in real-time to regulate the precision of inter-articulator timing in service to phonetic structure.


Assuntos
Retroalimentação Sensorial , Fonética , Percepção da Fala , Língua , Humanos , Língua/fisiologia , Masculino , Feminino , Adulto , Retroalimentação Sensorial/fisiologia , Adulto Jovem , Percepção da Fala/fisiologia , Arcada Osseodentária/fisiologia , Acústica da Fala , Medida da Produção da Fala/métodos , Fatores de Tempo , Fala/fisiologia , Mascaramento Perceptivo
19.
Behav Brain Res ; 475: 115216, 2024 Oct 18.
Article in English | MEDLINE | ID: mdl-39214421

ABSTRACT

Engaging in dialog requires interlocutors to coordinate sending and receiving linguistic signals to build a discourse based upon interpretations and perceptions interconnected with a range of emotions. Conversing in a foreign language may induce emotions such as anxiety, which influence the quality of communication. The neural processes underpinning these interactions are crucial to understanding foreign language anxiety (FLA). Electroencephalography (EEG) studies reveal that anxiety is often reflected in hemispheric frontal alpha asymmetry (FAA). To examine the neural mechanisms underlying FLA, we collected self-reported data on the listening and speaking sections of the second language skill-specific anxiety scale (L2AS), covering the behavioral, cognitive, and somatic domains, and recorded EEG signals during word-chain turn-taking activities in the first (L1, Chinese) and second (L2, English) languages. Regression analysis showed that FAA in the L2 condition was a significant predictor primarily of the behavioral and somatic domains of the L2AS speaking section. The results are discussed along with implications for improving communication during L2 interactions.
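Frontal alpha asymmetry is commonly computed as the difference of log alpha-band power between homologous right and left frontal electrodes (for example F4 and F3). The sketch below uses toy data; the electrode pair, band limits, and spectral estimator are assumptions rather than the study's exact pipeline.

```python
# Sketch of frontal alpha asymmetry (FAA): ln(right alpha power) - ln(left alpha power).
import numpy as np
from scipy.signal import welch

def alpha_power(x, fs, band=(8, 13)):
    f, pxx = welch(x, fs=fs, nperseg=fs * 2)
    mask = (f >= band[0]) & (f <= band[1])
    return pxx[mask].sum() * (f[1] - f[0])   # integrated band power

fs = 250
rng = np.random.default_rng(3)
f3 = rng.standard_normal(fs * 60)   # placeholder 60 s EEG, left frontal site
f4 = rng.standard_normal(fs * 60)   # placeholder 60 s EEG, right frontal site

faa = np.log(alpha_power(f4, fs)) - np.log(alpha_power(f3, fs))
print(f"FAA = {faa:.3f}")            # sign convention: ln(right) - ln(left)
```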


Assuntos
Ritmo alfa , Ansiedade , Eletroencefalografia , Multilinguismo , Humanos , Masculino , Ansiedade/fisiopatologia , Feminino , Adulto Jovem , Ritmo alfa/fisiologia , Adulto , Fala/fisiologia , Lateralidade Funcional/fisiologia , Lobo Frontal/fisiologia , Idioma , Adolescente
20.
Sci Rep ; 14(1): 20270, 2024 08 31.
Article in English | MEDLINE | ID: mdl-39217249

ABSTRACT

Dysphagia, a disorder affecting the ability to swallow, has a high prevalence among older adults and can lead to serious health complications. Therefore, early detection of dysphagia is important. This study evaluated the effectiveness of a newly developed deep learning model that analyzes syllable-segmented data for diagnosing dysphagia, an aspect not addressed in prior studies. Audio data of daily conversations were collected from 16 patients with dysphagia and 24 controls. The presence of dysphagia was determined by videofluoroscopic swallowing study. The data were segmented into syllables using a speech-to-text model and analyzed with a convolutional neural network to perform binary classification between dysphagia patients and the control group. The proposed model was assessed in two different aspects. First, with syllable-segmented analysis, it demonstrated a diagnostic accuracy of 0.794 for dysphagia, a sensitivity of 0.901, a specificity of 0.687, a positive predictive value of 0.742, and a negative predictive value of 0.874. Second, at the individual level, it achieved an overall accuracy of 0.900 and an area under the curve of 0.953. This research highlights the potential of the deep learning model as an early, non-invasive, and simple method for detecting dysphagia in everyday environments.
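The reported screening metrics follow directly from a confusion matrix of dysphagia (positive) versus control (negative) decisions. The sketch below uses hypothetical counts chosen only to roughly reproduce the reported rates, not the study's actual data.

```python
# Screening metrics from confusion-matrix counts (hypothetical counts).
def screening_metrics(tp, fn, fp, tn):
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # recall for dysphagia
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Roughly reproduces the reported rates (0.794 / 0.901 / 0.687 / 0.742 / 0.874):
print(screening_metrics(tp=90, fn=10, fp=31, tn=69))
```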


Assuntos
Aprendizado Profundo , Transtornos de Deglutição , Fala , Humanos , Transtornos de Deglutição/diagnóstico , Transtornos de Deglutição/fisiopatologia , Masculino , Feminino , Idoso , Fala/fisiologia , Idoso de 80 Anos ou mais , Pessoa de Meia-Idade , Deglutição/fisiologia , Redes Neurais de Computação