1.
Curr Biol ; 34(9): R348-R351, 2024 May 06.
Article En | MEDLINE | ID: mdl-38714162

A recent study has used scalp-recorded electroencephalography to obtain evidence of semantic processing of human speech and objects by domesticated dogs. The results suggest that dogs do comprehend the meaning of familiar spoken words, in that a word can evoke the mental representation of the object to which it refers.


Cognition , Semantics , Animals , Dogs/psychology , Cognition/physiology , Humans , Electroencephalography , Speech/physiology , Speech Perception/physiology , Comprehension/physiology
2.
Cogn Res Princ Implic ; 9(1): 29, 2024 05 12.
Article En | MEDLINE | ID: mdl-38735013

Auditory stimuli that are relevant to a listener have the potential to capture focal attention even when unattended, the listener's own name being a particularly effective stimulus. We report two experiments to test the attention-capturing potential of the listener's own name in normal speech and time-compressed speech. In Experiment 1, 39 participants were tested with a visual word categorization task with uncompressed spoken names as background auditory distractors. Participants' word categorization performance was slower when hearing their own name rather than other names, and in a final test, they were faster at detecting their own name than other names. Experiment 2 used the same task paradigm, but the auditory distractors were time-compressed names. Three compression levels were tested with 25 participants in each condition. Participants' word categorization performance was again slower when hearing their own name than when hearing other names; the slowing was strongest with slight compression and weakest with intense compression. Personally relevant time-compressed speech has the potential to capture attention, but the degree of capture depends on the level of compression. Attention capture by time-compressed speech has practical significance and provides partial evidence for the duplex-mechanism account of auditory distraction.


Attention , Names , Speech Perception , Humans , Attention/physiology , Female , Male , Speech Perception/physiology , Adult , Young Adult , Speech/physiology , Reaction Time/physiology , Acoustic Stimulation
3.
Cereb Cortex ; 34(5)2024 May 02.
Article En | MEDLINE | ID: mdl-38715409

Behavioral and brain-related changes in word production have been claimed to predominantly occur after 70 years of age. Most studies investigating age-related changes in adulthood only compared young to older adults, failing to determine whether neural processes underlying word production change at an earlier age than observed in behavior. This study aims to fill this gap by investigating whether changes in neurophysiological processes underlying word production are aligned with behavioral changes. Behavior and electrophysiological event-related potential patterns of word production were assessed during a picture naming task in 95 participants across five adult lifespan age groups (ranging from 16 to 80 years old). While behavioral performance decreased only from 70 years of age, significant neurophysiological changes were already present at 40 years of age, in a time window (150-220 ms) likely associated with the lexical-semantic processes underlying referential word production. These results show that neurophysiological modifications precede behavioral changes in language production; they can be interpreted in line with the suggestion that lexical-semantic reorganization in mid-adulthood helps maintain language skills longer than other cognitive functions.


Aging , Electroencephalography , Evoked Potentials , Humans , Adult , Aged , Male , Middle Aged , Female , Young Adult , Adolescent , Aged, 80 and over , Aging/physiology , Evoked Potentials/physiology , Brain/physiology , Speech/physiology , Semantics
4.
Nat Commun ; 15(1): 3692, 2024 May 01.
Article En | MEDLINE | ID: mdl-38693186

Over the last decades, cognitive neuroscience has identified a distributed set of brain regions that are critical for attention. Strong anatomical overlap with brain regions critical for oculomotor processes suggests a joint network for attention and eye movements. However, the role of this shared network in complex, naturalistic environments remains understudied. Here, we investigated eye movements in relation to (un)attended sentences of natural speech. Combining simultaneously recorded eye tracking and magnetoencephalographic data with temporal response functions, we show that gaze tracks attended speech, a phenomenon we termed ocular speech tracking. Ocular speech tracking even differentiates a target from a distractor in a multi-speaker context and is further related to intelligibility. Moreover, we provide evidence for its contribution to neural differences in speech processing, emphasizing the necessity to consider oculomotor activity in future research and in the interpretation of neural differences in auditory cognition.
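
The temporal response function (TRF) approach at the core of this study can be illustrated with a minimal ridge-regression sketch in Python; all signals below are simulated, and the lag range, sampling rate, and regularization strength are illustrative assumptions rather than the study's actual settings.

import numpy as np

fs = 100                                # sampling rate in Hz (assumed)
lags = np.arange(0, 40)                 # 0-390 ms of causal lags (assumed)
lam = 1.0                               # ridge regularization strength (assumed)

rng = np.random.default_rng(0)
envelope = rng.standard_normal(5000)    # stand-in speech envelope
true_trf = np.exp(-lags / 10.0)         # toy "ground truth" response
gaze = np.convolve(envelope, true_trf)[:5000] + rng.standard_normal(5000)

# Lagged design matrix: column k holds the envelope delayed by k samples.
X = np.column_stack([np.roll(envelope, k) for k in lags])
X[: lags.max()] = 0.0                   # discard wrap-around samples from np.roll

# Ridge solution: w = (X'X + lam*I)^-1 X'y; w is the estimated TRF.
w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ gaze)
print("estimated TRF peak at lag (ms):", lags[np.argmax(w)] * 1000 / fs)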


Attention , Eye Movements , Magnetoencephalography , Speech Perception , Speech , Humans , Attention/physiology , Eye Movements/physiology , Male , Female , Adult , Young Adult , Speech Perception/physiology , Speech/physiology , Acoustic Stimulation , Brain/physiology , Eye-Tracking Technology
5.
J Hist Ideas ; 85(2): 209-235, 2024.
Article En | MEDLINE | ID: mdl-38708647

In 1644 George Wither stood outside or without the doors of the House of Commons and delivered a speech to Parliament and the nation simultaneously. Not only did this "print oration" function as a prototype for Areopagitica, A Speech of John Milton [. . .] to the Parliament of England, but it inspired a genre of print pamphlets that would extend well into the eighteenth century. This article identifies and argues for the popular consequences of the genre, detailing its contribution to England's developing structure of political communication and representation.


Politics , History, 18th Century , England , History, 17th Century , Speech
6.
Ugeskr Laeger ; 186(18)2024 Apr 29.
Article Da | MEDLINE | ID: mdl-38704717

Ankyloglossia, or tongue-tie, is a condition in which anatomical variation of the sublingual frenulum can limit normal tongue function. In Denmark, as in other countries, an increase in the number of children treated for ankyloglossia has been described in recent years. Whether ankyloglossia and its release affect speech has also been increasingly discussed on Danish television and social media. In this review, the possible connection between ankyloglossia, its surgical treatment, and speech development in children is discussed.


Ankyloglossia , Humans , Ankyloglossia/surgery , Child , Language Development , Tongue/surgery , Lingual Frenum/surgery , Lingual Frenum/abnormalities , Speech , Infant
7.
J Acoust Soc Am ; 155(5): 3206-3212, 2024 May 01.
Article En | MEDLINE | ID: mdl-38738937

Modern humans and chimpanzees share a common ancestor on the phylogenetic tree, yet chimpanzees do not spontaneously produce speech or speech sounds. The lab exercise presented in this paper was developed for undergraduate students in a course entitled "What's Special About Human Speech?" The exercise is based on acoustic analyses of the words "cup" and "papa" as spoken by Viki, a home-raised, speech-trained chimpanzee, as well as the words spoken by a human. The analyses allow students to relate differences in articulation and vocal abilities between Viki and humans to the known anatomical differences in their vocal systems. Anatomical and articulation differences between humans and Viki include (1) potential tongue movements, (2) presence or absence of laryngeal air sacs, (3) presence or absence of vocal membranes, and (4) exhalation vs inhalation during production.
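
A spectrogram comparison of the kind this lab exercise builds on can be sketched in a few lines of Python; the filename is a placeholder, and the analysis parameters are assumptions, not those of the published exercise.

import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, audio = wavfile.read("cup.wav")     # hypothetical recording of the word "cup"
if audio.ndim > 1:
    audio = audio.mean(axis=1)          # mix stereo down to mono
freqs, times, power = spectrogram(audio.astype(float), fs=fs, nperseg=512)
# Formant-like structure appears as horizontal bands of energy across time;
# comparing such plots for Viki's and a human's tokens is the heart of the exercise.
print(power.shape, "frequency bins x time frames")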


Pan troglodytes , Speech Acoustics , Speech , Humans , Animals , Pan troglodytes/physiology , Speech/physiology , Tongue/physiology , Tongue/anatomy & histology , Vocalization, Animal/physiology , Species Specificity , Speech Production Measurement , Larynx/physiology , Larynx/anatomy & histology , Phonetics
8.
Cereb Cortex ; 34(5)2024 May 02.
Article En | MEDLINE | ID: mdl-38741267

The role of the left temporoparietal cortex in speech production has been extensively studied during native language processing, proving crucial in controlled lexico-semantic retrieval under varying cognitive demands. Yet, its role in bilinguals, fluent in both native and second languages, remains poorly understood. Here, we employed continuous theta burst stimulation (cTBS) to disrupt neural activity in the left posterior middle-temporal gyrus (pMTG) and angular gyrus (AG) while Italian-Friulian bilinguals performed a cued picture-naming task. The task involved between-language (naming objects in Italian or Friulian) and within-language blocks (naming objects ["knife"] or associated actions ["cut"] in a single language) in which participants could either maintain (non-switch) or change (switch) instructions based on cues. During within-language blocks, cTBS over the pMTG entailed faster naming in high-demanding switch trials, while cTBS to the AG elicited slower latencies in low-demanding non-switch trials. No cTBS effects were observed in the between-language block. Our findings suggest a causal involvement of the left pMTG and AG in lexico-semantic processing across languages, with distinct contributions to controlled vs. "automatic" retrieval, respectively. However, they do not support the existence of shared control mechanisms within and between language(s) production. Altogether, these results inform neurobiological models of semantic control in bilinguals.


Multilingualism , Parietal Lobe , Speech , Temporal Lobe , Transcranial Magnetic Stimulation , Humans , Male , Temporal Lobe/physiology , Female , Young Adult , Adult , Parietal Lobe/physiology , Speech/physiology , Cues
9.
PLoS One ; 19(5): e0302739, 2024.
Article En | MEDLINE | ID: mdl-38728329

BACKGROUND: Deep brain stimulation (DBS) reliably ameliorates cardinal motor symptoms in Parkinson's disease (PD) and essential tremor (ET). However, the effects of DBS on speech, voice and language have been inconsistent and have not been examined comprehensively in a single study. OBJECTIVE: We conducted a systematic analysis of the literature by reviewing studies that examined the effects of DBS on speech, voice and language in PD and ET. METHODS: A total of 675 publications were retrieved from the PubMed, Embase, CINAHL, Web of Science, Cochrane Library and Scopus databases. Based on our selection criteria, 90 papers were included in our analysis. The selected publications were categorized into four subcategories: Fluency, Word production, Articulation and phonology, and Voice quality. RESULTS: The results suggested a long-term decline in verbal fluency, with more studies reporting deficits in phonemic fluency than semantic fluency following DBS. Additionally, high frequency stimulation, left-sided and bilateral DBS were associated with worse verbal fluency outcomes. Naming improved in the short-term following DBS-ON compared to DBS-OFF, with no long-term differences between the two conditions. Bilateral and low-frequency DBS demonstrated a relative improvement for phonation and articulation. Nonetheless, long-term DBS exacerbated phonation and articulation deficits. The effect of DBS on voice was highly variable, with both improvements and deterioration in different measures of voice. CONCLUSION: This was the first study that aimed to combine the outcome of speech, voice, and language following DBS in a single systematic review. The findings revealed a heterogeneous pattern of results for speech, voice, and language across DBS studies, and provided directions for future studies.


Deep Brain Stimulation , Language , Parkinson Disease , Speech , Voice , Deep Brain Stimulation/methods , Humans , Parkinson Disease/therapy , Parkinson Disease/physiopathology , Speech/physiology , Voice/physiology , Essential Tremor/therapy , Essential Tremor/physiopathology
10.
J Psycholinguist Res ; 53(3): 45, 2024 May 13.
Article En | MEDLINE | ID: mdl-38739304

English is widely regarded as a global language, and it has become increasingly important for global communication. As a result, the demand for English language education has been on the rise. In China, a significant number of individuals are engaged in learning the English language. However, many English learners in China encounter challenges when it comes to developing their speaking skills. This study aims to investigate the factors influencing the speaking skills of English learners in China. Employing a mixed-methods approach, data were collected through a questionnaire from 455 college students from three different courses (arts, science & business, and commerce) in China. The study findings identified several factors impacting the speaking skills of English learners in China, including limited opportunities for speaking practice, fear of making mistakes, limited exposure to English-speaking environments, inadequate teacher training, and the influence of the Chinese language on English pronunciation. Additionally, the study highlighted that learners who have greater exposure to English-speaking environments and more opportunities for speaking practice tend to demonstrate better speaking skills. The novelty of this study lies in its valuable insights into the factors influencing the speaking skills of English learners in China. Based on the findings, it is recommended that English teachers receive enhanced training to effectively teach speaking skills, and learners should be provided with increased opportunities for speaking practice, such as participating in group discussions or engaging in speaking activities.


Learning , Humans , China , Female , Male , Learning/physiology , Young Adult , Multilingualism , Speech , Language , Students/psychology , Adult , Surveys and Questionnaires , Phonetics , East Asian People
11.
eNeuro ; 11(5)2024 May.
Article En | MEDLINE | ID: mdl-38658138

A growing number of patients worldwide are diagnosed with dementia, which emphasizes the urgent need for early detection markers. In this study, we built on the auditory hypersensitivity theory of a previous study, which postulated that responses to auditory input are enhanced in cognitive decline at both subcortical and cortical levels, and examined auditory encoding of natural continuous speech at both neural levels for its indicative potential for cognitive decline. We recruited study participants aged 60 years and older, who were divided into two groups based on the Montreal Cognitive Assessment: one group with low scores (n = 19, participants with signs of cognitive decline) and a control group (n = 25). Participants completed an audiometric assessment, and we then recorded their electroencephalography while they listened to an audiobook and click sounds. We derived temporal response functions and evoked potentials from the data and examined response amplitudes for their potential to predict cognitive decline, controlling for hearing ability and age. Contrary to our expectations, no evidence of auditory hypersensitivity was observed in participants with signs of cognitive decline; response amplitudes were comparable in both cognitive groups. Moreover, the combination of response amplitudes showed no predictive value for cognitive decline. These results challenge the proposed hypothesis and emphasize the need for further research to identify reliable auditory markers for the early detection of cognitive decline.
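
The group-prediction analysis described here, response amplitudes predicting cognitive status while controlling for hearing ability and age, can be sketched as a logistic regression; the data below are simulated stand-ins, not the study's measurements.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
group = np.r_[np.ones(19), np.zeros(25)]   # 1 = signs of cognitive decline
n = group.size
amplitude = rng.standard_normal(n)         # stand-in neural response amplitude
hearing = rng.standard_normal(n)           # covariate: audiometric score
age = rng.normal(70, 5, n)                 # covariate: age in years

# Logistic regression of group on amplitude, controlling for hearing and age.
X = sm.add_constant(np.column_stack([amplitude, hearing, age]))
fit = sm.Logit(group, X).fit(disp=0)
print(fit.params)                          # x1 is the amplitude effect of interest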


Cognitive Dysfunction , Electroencephalography , Evoked Potentials, Auditory , Humans , Female , Male , Aged , Cognitive Dysfunction/physiopathology , Cognitive Dysfunction/diagnosis , Middle Aged , Evoked Potentials, Auditory/physiology , Speech Perception/physiology , Aged, 80 and over , Cerebral Cortex/physiology , Cerebral Cortex/physiopathology , Acoustic Stimulation , Speech/physiology
12.
J Neural Eng ; 21(3)2024 May 07.
Article En | MEDLINE | ID: mdl-38648782

OBJECTIVE: Brain-computer interfaces (BCIs) have the potential to reinstate lost communication faculties. Results from speech decoding studies indicate that a usable speech BCI based on activity in the sensorimotor cortex (SMC) can be achieved using subdurally implanted electrodes. However, the optimal characteristics for a successful speech implant are largely unknown. We address this topic in a high-field blood oxygenation level dependent functional magnetic resonance imaging (fMRI) study, by assessing the decodability of spoken words as a function of hemisphere, gyrus, sulcal depth, and position along the ventral/dorsal axis. APPROACH: Twelve subjects conducted a 7T fMRI experiment in which they pronounced 6 different pseudo-words over 6 runs. We divided the SMC by hemisphere, gyrus, sulcal depth, and position along the ventral/dorsal axis. Classification was performed in these SMC areas using a multiclass support vector machine (SVM). MAIN RESULTS: Significant classification was possible from the SMC, but no preference for the left or right hemisphere, nor for the precentral or postcentral gyrus, was detected for optimal word classification. Classification using information from the cortical surface was slightly better than classification using information from deep in the central sulcus, and was highest within the ventral 50% of the SMC. Confusion matrices were highly similar across the entire SMC. An SVM-searchlight analysis revealed significant classification in the superior temporal gyrus and left planum temporale in addition to the SMC. SIGNIFICANCE: The current results support a unilateral implant using surface electrodes, covering the ventral 50% of the SMC. The added value of depth electrodes is unclear. We did not observe evidence for variations in the qualitative nature of information across the SMC. The current results need to be confirmed in paralyzed patients performing attempted speech.
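
The multiclass SVM classification step can be sketched with scikit-learn on simulated voxel patterns; the feature count, signal strength, and cross-validation scheme are illustrative assumptions (the six folds merely echo the study's six runs).

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_words, n_trials, n_voxels = 6, 36, 200      # 6 pseudo-words; sizes assumed
labels = np.repeat(np.arange(n_words), n_trials)
prototypes = rng.standard_normal((n_words, n_voxels))
# Each trial is its word's prototype pattern, weakly expressed, plus noise.
X = 0.3 * prototypes[labels] + rng.standard_normal((labels.size, n_voxels))

clf = SVC(kernel="linear")                    # SVC is multiclass out of the box
scores = cross_val_score(clf, X, labels, cv=6)
print(f"mean accuracy: {scores.mean():.2f} (chance = {1 / n_words:.2f})")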


Brain-Computer Interfaces , Magnetic Resonance Imaging , Speech , Humans , Magnetic Resonance Imaging/methods , Male , Adult , Female , Speech/physiology , Young Adult , Electrodes, Implanted , Brain Mapping/methods
13.
J Acoust Soc Am ; 155(4): 2603-2611, 2024 Apr 01.
Article En | MEDLINE | ID: mdl-38629881

Open science practices have led to an increase in available speech datasets for researchers interested in acoustic analysis. Accurate evaluation of these databases frequently requires manual or semi-automated analysis. The time-intensive nature of these analyses makes them ideally suited for research assistants (RAs) in laboratories focused on speech and voice production. However, the completion of high-quality, consistent, and reliable analyses requires clear rules and guidelines for all research assistants to follow. This tutorial provides guidance on training and mentoring RAs to complete these analyses, covering RA training, ongoing monitoring of data analysis, and the documentation needed for reliable and reproducible findings.


Voice Disorders , Voice , Humans , Acoustics , Speech
14.
Elife ; 13, 2024 Apr 18.
Article En | MEDLINE | ID: mdl-38635312

Complex skills like speech and dance are composed of ordered sequences of simpler elements, but the neuronal basis for the syntactic ordering of actions is poorly understood. Birdsong is a learned vocal behavior composed of syntactically ordered syllables, controlled in part by the songbird premotor nucleus HVC (proper name). Here, we test whether one of HVC's recurrent inputs, mMAN (medial magnocellular nucleus of the anterior nidopallium), contributes to sequencing in adult male Bengalese finches (Lonchura striata domestica). Bengalese finch song includes several patterns: (1) chunks, comprising stereotyped syllable sequences; (2) branch points, where a given syllable can be followed probabilistically by multiple syllables; and (3) repeat phrases, where individual syllables are repeated variable numbers of times. We found that following bilateral lesions of mMAN, acoustic structure of syllables remained largely intact, but sequencing became more variable, as evidenced by 'breaks' in previously stereotyped chunks, increased uncertainty at branch points, and increased variability in repeat numbers. Our results show that mMAN contributes to the variable sequencing of vocal elements in Bengalese finch song and demonstrate the influence of recurrent projections to HVC. Furthermore, they highlight the utility of species with complex syntax in investigating neuronal control of ordered sequences.
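
Branch-point variability of the kind quantified here can be illustrated by estimating first-order syllable transition probabilities and their entropy; the toy songs below are invented, not Bengalese finch data.

from collections import Counter, defaultdict
from math import log2

songs = ["abcabdabcabd", "abcabcabdabd"]      # invented syllable strings

transitions = defaultdict(Counter)
for song in songs:
    for cur, nxt in zip(song, song[1:]):
        transitions[cur][nxt] += 1

for syllable, counts in sorted(transitions.items()):
    total = sum(counts.values())
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    # entropy = 0 for stereotyped transitions; > 0 marks a branch point
    print(syllable, dict(counts), f"entropy = {entropy:.2f} bits")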


Songbirds , Male , Animals , Speech , Acoustics , Memory , Stereotyped Behavior
15.
Int J Yoga Therap ; 34(2024)2024 Apr 01.
Article En | MEDLINE | ID: mdl-38640400

A previous study discovered that two speakers with moderate apraxia of speech increased their sequential motion rates after unilateral forced-nostril breathing (UFNB) practiced as an adjunct to speech-language therapy in an AB repeated-measures design. The current study sought to: (1) delineate possible UFNB plus practice effects from practice effects alone in motor speech skills; (2) examine the relationships between UFNB integrity, participant-reported stress levels, and motor speech performance; and (3) sample a participant-led UFNB training schedule to contribute to the literature's growing understanding of UFNB dosage. A single-subject (n-of-1 trial), ABAB reversal design was used across four motor speech behaviors. A 60-year-old female with chronic, severe apraxia of speech participated. The researchers developed a breathing app to assess UFNB practice integrity and administer the Simple Aphasia Stress Scale after each UFNB session. The participant improved from overall severe to moderate apraxia of speech on the Apraxia Battery for Adults. Visual inspection of graphs confirmed robust motor speech practice effects for all variables. Articulatory-kinematic variables demonstrated sensitivity to the UFNB-plus-practice condition and correlated to stress scale scores but not UFNB integrity scores. The participant achieved 20-minute UFNB sessions 4 times per week. Removal of UFNB during A2 (UFNB withdrawal) and after a 10-day break during B2 (UFNB full dosage) revealed UFNB practice effects on stress scale scores. UFNB with motor speech practice may benefit articulatory-kinematic skills compared to motor speech practice alone. Regular, cumulative UFNB practice appeared to lower self-perceived stress levels. These findings, along with prior work, provide a foundation to further explore yoga breathing and its use with speakers who have apraxia of speech.


Aphasia , Apraxias , Yoga , Adult , Female , Humans , Middle Aged , Speech , Apraxias/therapy , Respiration , Aphasia/therapy
16.
Sensors (Basel) ; 24(8)2024 Apr 17.
Article En | MEDLINE | ID: mdl-38676191

This paper addresses a joint training approach applied to a pipeline comprising speech enhancement (SE) and automatic speech recognition (ASR) models, where an acoustic tokenizer is included in the pipeline to transfer linguistic information from the ASR model to the SE model. The acoustic tokenizer takes the outputs of the ASR encoder and provides a pseudo-label through K-means clustering. To transfer the linguistic information, represented by pseudo-labels, from the acoustic tokenizer to the SE model, a cluster-based pairwise contrastive (CBPC) loss function is proposed: a self-supervised contrastive loss that is combined with an information noise contrastive estimation (infoNCE) loss function. This combined loss function prevents the SE model from overfitting to outlier samples and represents the pronunciation variability among samples with the same pseudo-label. The effectiveness of the proposed CBPC loss function is evaluated on a noisy LibriSpeech dataset by measuring both speech quality scores and the word error rate (WER). The experimental results reveal that the proposed joint training approach using the described CBPC loss function achieves a lower WER than conventional joint training approaches. In addition, it is demonstrated that the speech quality scores of the SE model trained using the proposed approach are higher than those of the standalone SE model and of SE models trained using conventional joint training approaches. An ablation study is also conducted to investigate the effects of different combinations of loss functions on the speech quality scores and WER. Here, it is revealed that the proposed CBPC loss function combined with infoNCE contributes to a reduced WER and an increase in most of the speech quality scores.
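
The infoNCE component that the proposed CBPC loss builds on has a standard form, sketched below in PyTorch; the cluster-based pairing of the CBPC loss itself is not reproduced, and the embedding sizes and temperature are assumptions.

import torch
import torch.nn.functional as F

def info_nce(queries, keys, temperature=0.1):
    """queries[i] and keys[i] form a positive pair; other keys act as negatives."""
    q = F.normalize(queries, dim=1)
    k = F.normalize(keys, dim=1)
    logits = q @ k.T / temperature            # cosine similarities, scaled
    targets = torch.arange(q.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

queries = torch.randn(8, 128)   # e.g. SE-model frame embeddings (shapes assumed)
keys = torch.randn(8, 128)      # e.g. matching ASR-encoder embeddings
print(info_nce(queries, keys))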


Noise , Speech Recognition Software , Humans , Cluster Analysis , Algorithms , Speech/physiology
17.
Sensors (Basel) ; 24(8)2024 Apr 20.
Article En | MEDLINE | ID: mdl-38676246

Stuttering, affecting approximately 1% of the global population, is a complex speech disorder significantly impacting individuals' quality of life. Prior studies using electromyography (EMG) to examine orofacial muscle activity in stuttering have presented mixed results, highlighting the variability in neuromuscular responses during stuttering episodes. Fifty-five participants with stuttering and 30 individuals without stuttering, aged between 18 and 40, participated in the study. EMG signals from five facial and cervical muscles were recorded during speech tasks and analyzed for mean amplitude and frequency activity in the 5-15 Hz range to identify significant differences. Upon analysis of the 5-15 Hz frequency range, a higher average amplitude was observed in the zygomaticus major muscle for participants while stuttering (p < 0.05). Additionally, when assessing the overall EMG signal amplitude, a higher average amplitude was observed in samples obtained from disfluencies in participants who did not stutter, particularly in the depressor anguli oris muscle (p < 0.05). Significant differences in muscle activity were observed between the two groups, particularly in the depressor anguli oris and zygomaticus major muscles. These results suggest that the underlying neuromuscular mechanisms of stuttering might involve subtle aspects of timing and coordination in muscle activation. Therefore, these findings may contribute to the field of biosensors by providing valuable perspectives on neuromuscular mechanisms and the relevance of electromyography in stuttering research. Further research in this area has the potential to advance the development of biosensor technology for language-related applications and therapeutic interventions in stuttering.
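
The band-limited amplitude measure used here, mean EMG amplitude in the 5-15 Hz range, can be sketched with SciPy; the signal is simulated and the sampling rate and filter order are assumptions.

import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000                                  # EMG sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
emg = rng.standard_normal(10 * fs)         # 10 s of stand-in EMG

# 4th-order Butterworth band-pass restricted to the 5-15 Hz range.
b, a = butter(4, [5, 15], btype="bandpass", fs=fs)
band = filtfilt(b, a, emg)                 # zero-phase filtering
print("mean 5-15 Hz amplitude:", np.abs(band).mean())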


Electromyography , Facial Muscles , Speech , Stuttering , Humans , Electromyography/methods , Male , Adult , Female , Stuttering/physiopathology , Speech/physiology , Facial Muscles/physiology , Facial Muscles/physiopathology , Biomechanical Phenomena/physiology , Young Adult , Adolescent , Muscle Contraction/physiology
18.
Alzheimers Dement ; 20(5): 3416-3428, 2024 May.
Article En | MEDLINE | ID: mdl-38572850

INTRODUCTION: Screening for Alzheimer's disease neuropathologic change (ADNC) in individuals with atypical presentations is challenging but essential for clinical management. We trained automatic speech-based classifiers to distinguish frontotemporal dementia (FTD) patients with ADNC from those with frontotemporal lobar degeneration (FTLD). METHODS: We trained automatic classifiers with 99 speech features from 1-minute speech samples of 179 participants (ADNC = 36, FTLD = 60, healthy controls [HC] = 89). Patients' pathology was assigned based on autopsy or cerebrospinal fluid analytes. Structural network-based magnetic resonance imaging analyses identified anatomical correlates of distinct speech features. RESULTS: Our classifier showed 0.88 ± 0.03 area under the curve (AUC) for ADNC versus FTLD and 0.93 ± 0.04 AUC for patients versus HC. Noun frequency and pause rate correlated with gray matter volume loss in the limbic and salience networks, respectively. DISCUSSION: Brief naturalistic speech samples can be used for screening FTD patients for underlying ADNC in vivo. This work supports the future development of digital assessment tools for FTD. HIGHLIGHTS: We trained machine learning classifiers for frontotemporal dementia patients using natural speech. We grouped participants by neuropathological diagnosis (autopsy) or cerebrospinal fluid biomarkers. Classifiers well distinguished underlying pathology (Alzheimer's disease vs. frontotemporal lobar degeneration) in patients. We identified important features through an explainable artificial intelligence approach. This work lays the groundwork for a speech-based neuropathology screening tool.
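
The screening-classifier evaluation can be sketched as a cross-validated model scored by ROC AUC; the features below are simulated (with an injected group difference so the example is non-trivial), and the choice of logistic regression is an assumption, not the study's classifier.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_adnc, n_ftld, n_features = 36, 60, 99    # group sizes and feature count from the abstract
X = rng.standard_normal((n_adnc + n_ftld, n_features))
y = np.r_[np.ones(n_adnc), np.zeros(n_ftld)]   # 1 = ADNC, 0 = FTLD
X[y == 1, :5] += 0.8                       # inject a weak group difference

clf = LogisticRegression(max_iter=1000)
aucs = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {aucs.mean():.2f} +/- {aucs.std():.2f}")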


Alzheimer Disease , Frontotemporal Dementia , Magnetic Resonance Imaging , Speech , Humans , Female , Alzheimer Disease/pathology , Male , Aged , Frontotemporal Dementia/pathology , Speech/physiology , Middle Aged , Phenotype , Frontotemporal Lobar Degeneration/pathology , Machine Learning
19.
IEEE J Transl Eng Health Med ; 12: 382-389, 2024.
Article En | MEDLINE | ID: mdl-38606392

Acoustic features extracted from speech can help with the diagnosis of neurological diseases and monitoring of symptoms over time. Temporal segmentation of audio signals into individual words is an important pre-processing step needed prior to extracting acoustic features. Machine learning techniques could be used to automate speech segmentation via automatic speech recognition (ASR) and sequence to sequence alignment. While state-of-the-art ASR models achieve good performance on healthy speech, their performance significantly drops when evaluated on dysarthric speech. Fine-tuning ASR models on impaired speech can improve performance in dysarthric individuals, but it requires representative clinical data, which is difficult to collect and may raise privacy concerns. This study explores the feasibility of using two augmentation methods to increase ASR performance on dysarthric speech: 1) healthy individuals varying their speaking rate and loudness (as is often used in assessments of pathological speech); 2) synthetic speech with variations in speaking rate and accent (to ensure more diverse vocal representations and fairness). Experimental evaluations showed that fine-tuning a pre-trained ASR model with data from these two sources outperformed a model fine-tuned only on real clinical data and matched the performance of a model fine-tuned on the combination of real clinical data and synthetic speech. When evaluated on held-out acoustic data from 24 individuals with various neurological diseases, the best performing model achieved an average word error rate of 5.7% and a mean correct count accuracy of 94.4%. In segmenting the data into individual words, a mean intersection-over-union of 89.2% was obtained against manual parsing (ground truth). It can be concluded that emulated and synthetic augmentations can significantly reduce the need for real clinical data of dysarthric speech when fine-tuning ASR models and, in turn, for speech segmentation.
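
The intersection-over-union (IoU) metric used to score word segmentation against manual parsing is straightforward to compute; the word intervals below are invented examples.

def interval_iou(a, b):
    """Intersection-over-union of two (start, end) time intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

manual = [(0.00, 0.42), (0.55, 0.90), (1.10, 1.60)]     # ground-truth word spans
predicted = [(0.02, 0.40), (0.50, 0.95), (1.05, 1.58)]  # ASR-aligned word spans

ious = [interval_iou(m, p) for m, p in zip(manual, predicted)]
print(f"mean IoU: {sum(ious) / len(ious):.3f}")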


Speech Perception , Speech , Humans , Speech Recognition Software , Dysarthria/diagnosis , Speech Disorders
20.
Sensors (Basel) ; 24(7)2024 Mar 22.
Article En | MEDLINE | ID: mdl-38610256

The ongoing biodiversity crisis, driven by factors such as land-use change and global warming, emphasizes the need for effective ecological monitoring methods. Acoustic monitoring of biodiversity has emerged as an important monitoring tool. Detecting human voices in soundscape monitoring projects is useful both for analyzing human disturbance and for privacy filtering. Despite significant strides in deep learning in recent years, the deployment of large neural networks on compact devices poses challenges due to memory and latency constraints. Our approach focuses on leveraging knowledge distillation techniques to design efficient, lightweight student models for speech detection in bioacoustics. In particular, we employed the MobileNetV3-Small-Pi model to create compact yet effective student architectures to compare against the larger EcoVAD teacher model, a well-regarded voice detection architecture in eco-acoustic monitoring. The comparative analysis included examining various configurations of the MobileNetV3-Small-Pi-derived student models to identify optimal performance. Additionally, a thorough evaluation of different distillation techniques was conducted to ascertain the most effective method for model selection. Our findings revealed that the distilled models exhibited comparable performance to the EcoVAD teacher model, indicating a promising approach to overcoming computational barriers for real-time ecological monitoring.
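
The response-based knowledge distillation used to train the student models has a standard formulation, sketched below in PyTorch; the temperature, loss weighting, and binary speech/no-speech framing are illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: student matches the teacher's temperature-softened outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                               # standard gradient-scale correction
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(16, 2)    # speech / no-speech logits (shapes assumed)
teacher = torch.randn(16, 2)    # e.g. outputs of the larger EcoVAD model
labels = torch.randint(0, 2, (16,))
print(distillation_loss(student, teacher, labels))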


Speech , Voice , Humans , Acoustics , Biodiversity , Knowledge
...