Results 1 - 20 of 316
1.
Stud Health Technol Inform ; 315: 300-304, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39049272

ABSTRACT

The complex nature of verbal patient-nurse communication holds valuable insights for nursing research, but traditional documentation methods often miss these crucial details. This study explores the emerging role of speech processing technology in nursing research, emphasizing patient-nurse verbal communication. We conducted case studies across various healthcare settings, revealing a substantial gap in electronic health records for capturing vital patient-nurse encounters. Our research demonstrates that speech processing technology can effectively bridge this gap, enhancing documentation accuracy and enriching data for quality care assessment and risk prediction. The technology's application in home healthcare, outpatient settings, and specialized areas like dementia care illustrates its versatility. It offers the potential for real-time decision support, improved communication training, and enhanced telehealth practices. This paper provides insights into the promises and challenges of integrating speech processing into nursing practice, paving the way for future advancements in patient care and healthcare data management.


Subject(s)
Electronic Health Records , Nurse-Patient Relations , Humans , Speech Recognition Software , Nursing Records , Nursing Research , Information Sources
2.
Front Psychol ; 15: 1322665, 2024.
Article in English | MEDLINE | ID: mdl-38988379

ABSTRACT

Young children's language and social development is influenced by the linguistic environment of their classrooms, including their interactions with teachers and peers. Measurement of the classroom linguistic environment typically relies on observational methods, often providing limited 'snapshots' of children's interactions, from which broad generalizations are made. Recent technological advances, including artificial intelligence, provide opportunities to capture children's interactions using continuous recordings representing much longer durations of time. The goal of the present study was to evaluate the accuracy of the Interaction Detection in Early Childhood Settings (IDEAS) system on 13 automated indices of language output using recordings collected from 19 children and three teachers over two weeks in an urban preschool classroom. The accuracy of language outputs processed via IDEAS was compared to ground truth via linear correlations and median absolute relative error. Findings indicate high correlations between IDEAS and ground truth data on measures of teacher and child speech, and relatively low error rates on the majority of IDEAS language output measures. Study findings indicate that IDEAS may provide a useful measurement tool for advancing knowledge about children's classroom experiences and their role in shaping development.
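To make the evaluation described above concrete, the following is a minimal sketch, assuming synthetic per-recording counts, of how an automated language-output index might be compared with ground-truth transcriptions via linear (Pearson) correlation and median absolute relative error, the two accuracy metrics named in the abstract; variable names and values are illustrative, not the IDEAS data.

```python
# Minimal sketch: comparing automated language-output indices with ground-truth
# transcription counts via Pearson correlation and median absolute relative error.
import numpy as np
from scipy.stats import pearsonr

def median_absolute_relative_error(truth, estimate):
    """Median of |estimate - truth| / truth across recordings."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.median(np.abs(estimate - truth) / truth))

# Hypothetical per-recording word counts for one index (e.g., teacher word count).
ground_truth = [1200, 950, 1430, 800, 1100]
automated    = [1150, 990, 1390, 760, 1180]

r, p = pearsonr(ground_truth, automated)
mare = median_absolute_relative_error(ground_truth, automated)
print(f"Pearson r = {r:.2f} (p = {p:.3f}), median absolute relative error = {mare:.2%}")
```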

3.
Front Neurosci ; 18: 1428256, 2024.
Article in English | MEDLINE | ID: mdl-38988764

ABSTRACT

Encoding artificial perceptions through brain stimulation, especially that of higher cognitive functions such as speech perception, is one of the most formidable challenges in brain-computer interfaces (BCIs). Brain stimulation has been used for functional mapping in clinical practice for the last 70 years to treat various disorders affecting the nervous system, including epilepsy, Parkinson's disease, essential tremor, and dystonia. Recently, direct electrical stimulation has been used to evoke various forms of perception in humans, ranging from sensorimotor, auditory, and visual to speech cognition. Successfully evoking and fine-tuning artificial perceptions could revolutionize communication for individuals with speech disorders and significantly enhance the capabilities of brain-computer interface technologies. However, despite the extensive literature on encoding various perceptions and the rising popularity of speech BCIs, inducing artificial speech perception is still largely unexplored, and its potential has yet to be determined. In this paper, we examine the various stimulation techniques used to evoke complex percepts and the target brain areas for the input of speech-like information. Finally, we discuss strategies to address the challenges of speech encoding and consider the prospects of these approaches.

4.
IEEE Open J Signal Process ; 5: 738-749, 2024.
Article in English | MEDLINE | ID: mdl-38957540

ABSTRACT

The ADReSS-M Signal Processing Grand Challenge was held at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023. The challenge targeted difficult automatic prediction problems of great societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD) and the estimation of cognitive test scores. Participants were invited to create models for the assessment of cognitive function based on spontaneous speech data. Most of these models employed signal processing and machine learning methods. The ADReSS-M challenge was designed to assess the extent to which predictive models built based on speech in one language generalise to another language. The language data compiled and made available for ADReSS-M comprised English, for model training, and Greek, for model testing and validation. To the best of our knowledge, no previous shared research task investigated acoustic features of the speech signal or linguistic characteristics in the context of multilingual AD detection. This paper describes the context of the ADReSS-M challenge, its data sets, its predictive tasks, the evaluation methodology we employed, our baseline models and results, and the top five submissions. The paper concludes with a summary discussion of the ADReSS-M results, and our critical assessment of the future outlook in this field.

5.
Article in English | MEDLINE | ID: mdl-38898345

ABSTRACT

We used a novel nonword detection task to examine the lexical competition principle postulated in most models of spoken word recognition. To do so, in Experiment 1 we presented sequences of spoken words with half of the sequences containing a nonword, and the target nonword (i.e., press a response key whenever you detect a nonword in the sequence) could either be phonologically related (a phonological neighbor) or unrelated to the immediately preceding word. We reasoned that the reactivation of a phonological neighbor during target nonword processing should delay the moment at which a nonword decision can be made. Contrary to our hypothesis, participants were faster at detecting nonwords when they were preceded by a phonological neighbor compared with an unrelated word. In Experiment 2, an inhibitory effect of phonological relatedness on nonword decisions was observed in a classic priming situation using the same set of related and unrelated word-nonword pairs. We discuss the implications of these findings with regard to the main models of spoken word recognition, and conclude that our specific experimental set-up with phonological neighbors embedded in spoken sentences is more sensitive to cooperative interactions between co-activated sublexical representations than to lexical competition between co-activated lexical representations, with the latter being modulated by whether or not the words compete for the same slot in time.

6.
Hum Brain Mapp ; 45(8): e26676, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38798131

ABSTRACT

Aphasia is a communication disorder that affects processing of language at different levels (e.g., acoustic, phonological, semantic). Recording brain activity via electroencephalography while people listen to a continuous story makes it possible to analyze brain responses to acoustic and linguistic properties of speech. When the neural activity aligns with these speech properties, it is referred to as neural tracking. Even though measuring neural tracking of speech may present an interesting approach to studying aphasia in an ecologically valid way, it has not yet been investigated in individuals with stroke-induced aphasia. Here, we explored processing of acoustic and linguistic speech representations in individuals with aphasia in the chronic phase after stroke and age-matched healthy controls. We found decreased neural tracking of acoustic speech representations (envelope and envelope onsets) in individuals with aphasia. In addition, word surprisal displayed decreased amplitudes in individuals with aphasia around 195 ms over frontal electrodes, although this effect was not corrected for multiple comparisons. These results show that there is potential to capture language processing impairments in individuals with aphasia by measuring neural tracking of continuous speech. However, more research is needed to validate these results. Nonetheless, this exploratory study shows that neural tracking of naturalistic, continuous speech presents a powerful approach to studying aphasia.
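For readers unfamiliar with the acoustic speech representations mentioned above, the following is a hedged sketch of one common way to derive a broadband amplitude envelope and envelope onsets from a speech waveform; the file name, sampling rates and the half-wave-rectified-derivative definition of onsets are assumptions, not the authors' exact pipeline.

```python
# Sketch: deriving an amplitude envelope and envelope onsets from a speech waveform.
# File name and sampling rates are illustrative assumptions.
import numpy as np
from scipy.signal import hilbert, resample_poly
from scipy.io import wavfile

fs, audio = wavfile.read("story.wav")          # mono speech recording (placeholder path)
audio = audio.astype(float)

envelope = np.abs(hilbert(audio))              # broadband amplitude envelope
target_fs = 128                                # assumed EEG analysis rate
envelope = resample_poly(envelope, target_fs, fs)

onsets = np.diff(envelope, prepend=envelope[0])
onsets[onsets < 0] = 0                         # half-wave rectified derivative = envelope onsets
```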


Subject(s)
Aphasia , Electroencephalography , Stroke , Humans , Aphasia/physiopathology , Aphasia/etiology , Aphasia/diagnostic imaging , Male , Female , Middle Aged , Stroke/complications , Stroke/physiopathology , Aged , Speech Perception/physiology , Adult , Speech/physiology
7.
Biol Futur ; 75(1): 145-158, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38805154

ABSTRACT

The speech multi-feature MMN (Mismatch Negativity) offers a means to explore the neurocognitive background of the processing of multiple speech features in a short time, by capturing the time-locked electrophysiological activity of the brain known as event-related brain potentials (ERPs). Originating from the pioneering work of Näätänen et al. (Clin Neurophysiol 115:140-144, 2004), this paradigm introduces several infrequent deviant stimuli alongside standard ones, each differing in various speech features. In this study, we aimed to refine the multi-feature MMN paradigm used previously to encompass both segmental and suprasegmental (prosodic) features of speech. In the experiment, a two-syllable pseudoword was presented as a standard, and the deviant stimuli included alterations in consonants (deviation by place or place and mode of articulation), vowels (deviation by place or mode of articulation), and stress pattern in the first syllable of the pseudoword. Results indicated the emergence of MMN components across all segmental and prosodic contrasts, with the expected fronto-central amplitude distribution. Subsequent analyses revealed subtle differences in MMN responses to the deviants, suggesting varying sensitivity to phonetic contrasts. Furthermore, individual differences in MMN amplitudes were noted, partially attributable to participants' musical and language backgrounds. These findings underscore the utility of the multi-feature MMN paradigm for rapid and efficient investigation of the neurocognitive mechanisms underlying speech processing. Moreover, the paradigm demonstrated its potential for use in further research to study speech processing abilities in various populations.


Subject(s)
Speech Perception , Adult , Female , Humans , Male , Young Adult , Electroencephalography/methods , Evoked Potentials/physiology , Evoked Potentials, Auditory/physiology , Speech Perception/physiology
8.
Cogn Emot ; : 1-10, 2024 May 19.
Article in English | MEDLINE | ID: mdl-38764186

ABSTRACT

Older adults process emotional speech differently than young adults, relying less on prosody (tone) relative to semantics (words). This study aimed to elucidate the mechanisms underlying these age-related differences via an emotional speech-in-noise test. A sample of 51 young and 47 older adults rated spoken sentences with emotional content on both prosody and semantics, presented on the background of wideband speech-spectrum noise (sensory interference) or on the background of multi-talker babble (sensory/cognitive interference). The presence of wideband noise eliminated age-related differences in semantics but not in prosody when processing emotional speech. Conversely, the presence of babble resulted in the elimination of age-related differences across all measures. The results suggest that both sensory and cognitive-linguistic factors contribute to age-related changes in emotional speech processing. Because real-world conditions typically involve noisy backgrounds, our results highlight the importance of testing under such conditions.

9.
Trends Hear ; 28: 23312165241246596, 2024.
Article in English | MEDLINE | ID: mdl-38738341

ABSTRACT

The auditory brainstem response (ABR) is a valuable clinical tool for objective hearing assessment, which is conventionally detected by averaging neural responses to thousands of short stimuli. Moving beyond these unnatural stimuli, recent studies have detected brainstem responses to continuous speech presented via earphones using linear temporal response functions (TRFs). Here, we extend earlier studies by measuring subcortical responses to continuous speech presented in the sound-field, and assess the amount of data needed to estimate brainstem TRFs. Electroencephalography (EEG) was recorded from 24 normal-hearing participants while they listened to clicks and stories presented via earphones and loudspeakers. Subcortical TRFs were computed after accounting for non-linear processing in the auditory periphery by either stimulus rectification or an auditory nerve model. Our results demonstrated that subcortical responses to continuous speech could be reliably measured in the sound-field. TRFs estimated using auditory nerve models outperformed simple rectification, and 16 minutes of data was sufficient for the TRFs of all participants to show clear wave V peaks for both earphone and sound-field stimuli. Subcortical TRFs to continuous speech were highly consistent in both earphone and sound-field conditions, and with click ABRs. However, sound-field TRFs required slightly more data (16 minutes) to achieve clear wave V peaks compared to earphone TRFs (12 minutes), possibly due to effects of room acoustics. By investigating subcortical responses to sound-field speech stimuli, this study lays the groundwork for bringing objective hearing assessment closer to real-life conditions, which may lead to improved hearing evaluations and smart hearing technologies.
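As a rough illustration of the TRF idea behind these analyses, the following is a simplified single-channel sketch that estimates a temporal response function by ridge regression on a lagged stimulus representation; the sampling rate, lag window, regularization and synthetic data are assumptions and do not reproduce the authors' subcortical pipeline.

```python
# Simplified single-channel TRF estimation by ridge regression: EEG(t) is modeled
# as a weighted sum of lagged copies of a stimulus feature (e.g., the rectified
# speech waveform or an auditory-nerve-model output). Edge wrap-around from
# np.roll is a simplification tolerated here for brevity.
import numpy as np

def estimate_trf(stimulus, eeg, fs, tmin=-0.005, tmax=0.015, alpha=1e2):
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = np.column_stack([np.roll(stimulus, lag) for lag in lags])        # lagged design matrix
    w = np.linalg.solve(X.T @ X + alpha * np.eye(len(lags)), X.T @ eeg)  # ridge solution
    return lags / fs, w

# Illustrative synthetic example (sampling rate and durations are assumptions).
fs = 4096
rng = np.random.default_rng(0)
stim = np.abs(rng.standard_normal(fs * 20))          # stand-in for a rectified stimulus
kernel = np.exp(-np.arange(30) / 5.0)                # toy "brainstem" impulse response
eeg = np.convolve(stim, kernel)[: len(stim)] + rng.standard_normal(len(stim))
times, trf = estimate_trf(stim, eeg, fs)
```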


Subject(s)
Acoustic Stimulation , Electroencephalography , Evoked Potentials, Auditory, Brain Stem , Speech Perception , Humans , Evoked Potentials, Auditory, Brain Stem/physiology , Male , Female , Speech Perception/physiology , Acoustic Stimulation/methods , Adult , Young Adult , Auditory Threshold/physiology , Time Factors , Cochlear Nerve/physiology , Healthy Volunteers
10.
Article in English | MEDLINE | ID: mdl-38726473

ABSTRACT

BACKGROUND: Cleft lip and palate is one of the most common oral and maxillofacial deformities associated with a variety of functional disorders. Cleft palate speech disorder (CPSD) occurs most frequently and manifests a series of characteristic speech features, which are called cleft speech characteristics. Some scholars believe that children with CPSD and poor speech outcomes may also have weaknesses in speech input processing ability, but evidence is still lacking. AIMS: (1) To explore whether children with CPSD and speech output disorders also have defects in speech input processing abilities; (2) to explore the correlation between speech input and output processing abilities. METHODS & PROCEDURES: Children in the experimental group were enrolled from Beijing Stomatological Hospital, Capital Medical University, and healthy volunteers were recruited as controls. Three tasks containing real and pseudo words were then performed sequentially. Reaction time, accuracy and other indicators in the three tasks were collected and then analysed. OUTCOMES & RESULTS: The indicators in the experimental group were significantly lower than those in the control group. There was a strong correlation between speech input and output processing tasks. The performance of both groups when processing pseudo words in the three tasks was worse than when dealing with real words. CONCLUSIONS & IMPLICATIONS: Compared with normal controls, children with CPSD have deficits in both speech input and output processing, and there is a strong correlation between speech input and output processing abilities. In addition, the pseudo-word task was more challenging than the real-word task for both groups. WHAT THIS PAPER ADDS: What is already known on the subject Children with cleft lip and palate often have speech sound disorders known as cleft palate speech disorder (CPSD). CPSD is characterised by consonant errors called cleft speech characteristics, which can persist even after surgery. Some studies suggest that poor speech outcomes in children with CPSD may be associated with deficits in processing speech input. However, this has not been validated in mainland China. What this paper adds to existing knowledge The results of our study indicate that children with CPSD exhibit poorer performance in three tasks assessing speech input and output abilities compared to healthy controls, suggesting deficits in both speech input and output processing. Furthermore, a significant correlation was observed between speech input and output processing abilities. Additionally, both groups demonstrated greater difficulty in processing pseudo words compared to real words, as evidenced by their worse performance when dealing with pseudo words. What are the potential or actual clinical implications of this work? The pseudo-word tasks designed and implemented in our study can be employed in future research and assessment of speech input and output abilities in Mandarin-speaking children with CPSD. Additionally, our findings highlight the importance, for speech and language therapists, of considering both speech output processing abilities and possible speech input processing deficits when evaluating and developing treatment options for children with CPSD, as these abilities are also important for literacy development.

11.
Front Psychol ; 15: 1345906, 2024.
Article in English | MEDLINE | ID: mdl-38596333

ABSTRACT

Introduction: Temporal co-ordination between speech and gestures has been thoroughly studied in natural production. In most cases gesture strokes precede or coincide with the stressed syllable in words that they are semantically associated with. Methods: To understand whether processing of speech and gestures is attuned to such temporal coordination, we investigated the effect of delaying, preposing or eliminating individual gestures on the memory for words in an experimental study in which 83 participants watched video sequences of naturalistic 3D-animated speakers generated based on motion capture data. A target word in the sequence appeared (a) with a gesture presented in its original position synchronized with speech, (b) temporally shifted 500 ms before or (c) after the original position, or (d) with the gesture eliminated. Participants were asked to retell the videos in a free recall task. The strength of recall was operationalized as the inclusion of the target word in the free recall. Results: Both eliminated and delayed gesture strokes resulted in reduced recall rates compared to synchronized strokes, whereas there was no difference between advanced (preposed) and synchronized strokes. An item-level analysis also showed that the greater the interval between the onsets of delayed strokes and stressed syllables in target words, the greater the negative effect was on recall. Discussion: These results indicate that speech-gesture synchrony affects memory for speech, and that temporal patterns that are common in production lead to the best recall. Importantly, the study also showcases a procedure for using motion capture-based 3D-animated speakers to create an experimental paradigm for the study of speech-gesture comprehension.

12.
J Neurosci ; 44(22), 2024 May 29.
Article in English | MEDLINE | ID: mdl-38589232

ABSTRACT

In developmental language disorder (DLD), learning to comprehend and express oneself with spoken language is impaired, but the reason for this remains unknown. Using millisecond-scale magnetoencephalography recordings combined with machine learning models, we investigated whether the possible neural basis of this disruption lies in poor cortical tracking of speech. The stimuli were common spoken Finnish words (e.g., dog, car, hammer) and sounds with corresponding meanings (e.g., dog bark, car engine, hammering). In both children with DLD (10 boys and 7 girls) and typically developing (TD) control children (14 boys and 3 girls), aged 10-15 years, the cortical activation to spoken words was best modeled as time-locked to the unfolding speech input at ∼100 ms latency between sound and cortical activation. Amplitude envelope (amplitude changes) and spectrogram (detailed time-varying spectral content) of the spoken words, but not other sounds, were very successfully decoded based on time-locked brain responses in bilateral temporal areas; based on the cortical responses, the models could tell at ∼75-85% accuracy which of the two sounds had been presented to the participant. However, the cortical representation of the amplitude envelope information was poorer in children with DLD compared with TD children at longer latencies (at ∼200-300 ms lag). We interpret this effect as reflecting poorer retention of acoustic-phonetic information in short-term memory. This impaired tracking could potentially affect the processing and learning of words as well as continuous speech. The present results offer an explanation for the problems in language comprehension and acquisition in DLD.
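The decoding accuracies reported above come from models trained on time-locked brain responses; the following is a minimal sketch, on synthetic data, of a cross-validated two-class decoder of which stimulus was presented. The trial counts, sensor counts and classifier are illustrative assumptions, not the study's models.

```python
# Sketch: cross-validated two-class decoding of which sound was presented,
# from (synthetic) time-locked sensor responses. Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_trials, n_sensors, n_times = 120, 20, 50
y = rng.integers(0, 2, n_trials)                       # 0 = spoken word, 1 = matched sound
X = rng.standard_normal((n_trials, n_sensors, n_times))
X[y == 1, :, 20:30] += 0.3                             # injected class difference
X = X.reshape(n_trials, -1)                            # flatten sensors x time

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)
print(f"Decoding accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
```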


Subject(s)
Language Development Disorders , Magnetoencephalography , Speech Perception , Humans , Male , Female , Child , Adolescent , Magnetoencephalography/methods , Language Development Disorders/physiopathology , Speech Perception/physiology , Cerebral Cortex/physiopathology , Acoustic Stimulation/methods , Speech/physiology
13.
Curr Biol ; 34(8): 1750-1754.e4, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38521063

ABSTRACT

Using words to refer to objects in the environment is a core feature of the human language faculty. Referential understanding assumes the formation of mental representations of these words.1,2 Such understanding of object words has not yet been demonstrated as a general capacity in any non-human species,3 despite multiple behavior-based case reports.4,5,6,7,8,9,10 In human event-related potential (ERP) studies, object word knowledge is typically tested using the semantic violation paradigm, where words are presented either with their referent (match) or another object (mismatch).11,12 Such mismatch elicits an N400 effect, a well-established neural correlate of semantic processing.12,13 Reports of preverbal infant N400 evoked by semantic violations14 assert the use of this paradigm to probe mental representations of object words in nonverbal populations. Here, measuring dogs' (Canis familiaris) ERPs to objects primed with matching or mismatching object words, we found a mismatch effect at a frontal electrode, with a latency (206-606 ms) comparable to the human N400. A greater difference for words that dogs knew better, according to owner reports, further supported a semantic interpretation of this effect. Semantic expectations emerged irrespective of vocabulary size, demonstrating the prevalence of referential understanding in dogs. These results provide the first neural evidence for object word knowledge in a non-human animal. VIDEO ABSTRACT.


Subject(s)
Evoked Potentials , Semantics , Animals , Dogs/physiology , Male , Female , Evoked Potentials/physiology , Comprehension/physiology , Electroencephalography , Humans
14.
Data Brief ; 53: 110229, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38445201

ABSTRACT

Obtaining real-world multi-channel speech recordings is expensive and time-consuming. Therefore, multi-channel recordings are often artificially generated by convolving existing monaural speech recordings with simulated Room Impulse Responses (RIRs) from a so-called shoebox room [1] for stationary (not moving) speakers. Far-field speech processing for home automation or smart assistants has to cope with moving speakers in reverberant environments. With this dataset, we aim to support the generation of realistic speech data by providing multiple directional RIRs along a fine grid of locations in a real room. We provide directional RIR recordings for a classroom and a large corridor. These RIRs can be used to simulate moving speakers by generating random trajectories on that grid and quantizing them along the grid points. For each matching grid point, the monaural speech recording can be convolved with the RIR at this grid point. Then, the spatialized recording can be compiled using the overlap-add method for each grid point [2]. An example is provided with the data.
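The following is a small sketch of the usage described above, assuming a mono speech signal, a quantized trajectory expressed as an ordered list of per-grid-point RIRs, and a fixed number of samples per grid point; it convolves each speech frame with the RIR of the matching grid point and overlap-adds the spatialized segments.

```python
# Sketch of the moving-speaker simulation described above: for each grid point on
# a quantized trajectory, convolve the corresponding speech frame with that point's
# RIR and overlap-add the results. Shapes and hop size are illustrative assumptions.
import numpy as np
from scipy.signal import fftconvolve

def spatialize_moving_speaker(speech, rirs, frame_len):
    """speech: mono signal; rirs: per-grid-point RIRs in trajectory order (one per
    frame); frame_len: samples spent at each grid point."""
    rir_len = max(len(r) for r in rirs)
    out = np.zeros(len(speech) + rir_len - 1)
    for i, rir in enumerate(rirs):
        start = i * frame_len
        frame = speech[start:start + frame_len]
        if len(frame) == 0:
            break
        seg = fftconvolve(frame, rir)          # spatialize this frame
        out[start:start + len(seg)] += seg     # overlap-add
    return out
```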

15.
JMIR Aging ; 7: e50537, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38386279

ABSTRACT

BACKGROUND: The rise in life expectancy is associated with an increase in long-term and gradual cognitive decline. Treatment effectiveness is enhanced at the early stage of the disease. Therefore, there is a need to find low-cost and ecological solutions for mass screening of community-dwelling older adults. OBJECTIVE: This work aims to exploit automatic analysis of free speech to identify signs of cognitive function decline. METHODS: A sample of 266 participants older than 65 years was recruited in Italy and Spain and divided into 3 groups according to their Mini-Mental Status Examination (MMSE) scores. People were asked to tell a story and describe a picture, and voice recordings were used to extract high-level features on different time scales automatically. Based on these features, machine learning algorithms were trained to solve binary and multiclass classification problems by using both mono- and cross-lingual approaches. The algorithms were enriched using Shapley Additive Explanations for model explainability. RESULTS: In the Italian data set, healthy participants (MMSE score ≥ 27) were automatically discriminated from participants with mildly impaired cognitive function (20 ≤ MMSE score ≤ 26) and from those with moderate to severe impairment of cognitive function (11 ≤ MMSE score ≤ 19) with accuracies of 80% and 86%, respectively. Slightly lower performance was achieved in the Spanish and multilanguage data sets. CONCLUSIONS: This work proposes a transparent and unobtrusive assessment method, which might be included in a mobile app for large-scale monitoring of cognitive functionality in older adults. Voice is confirmed to be an important biomarker of cognitive decline due to its noninvasive and easily accessible nature.
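As a hedged sketch of the modelling approach described above, the following trains a classifier on tabular speech features to separate two MMSE-defined groups and computes Shapley Additive Explanations for it; the feature names and data are synthetic stand-ins, not the study's features or recordings.

```python
# Sketch: binary classification of MMSE-defined groups from tabular speech features,
# with SHAP values for explainability. Feature names and data are synthetic stand-ins.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
features = ["pause_rate", "speech_rate", "pitch_sd", "ttr"]   # hypothetical feature names
X = pd.DataFrame(rng.standard_normal((200, len(features))), columns=features)
y = (X["pause_rate"] - X["speech_rate"] + rng.standard_normal(200) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X)        # per-feature contributions per sample
```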


Subject(s)
Cognitive Dysfunction , Speech , Humans , Aged , Female , Male , Cognitive Dysfunction/diagnosis , Cross-Sectional Studies , Italy/epidemiology , Aged, 80 and over , Speech/physiology , Spain/epidemiology , Mental Status and Dementia Tests , Machine Learning , Algorithms
16.
Neuroimage ; 289: 120546, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38387743

ABSTRACT

The neuronal signatures of sensory and cognitive load provide access to brain activities related to complex listening situations. Sensory and cognitive loads are typically reflected in measures like response time (RT) and event-related potential (ERP) components. It is, however, difficult to distinguish the underlying brain processes solely from these measures. In this study, along with RT and ERP analysis, we performed time-frequency analysis and source localization of oscillatory activity in participants performing two different auditory tasks with varying degrees of complexity and related them to sensory and cognitive load. We studied neuronal oscillatory activity in both the period before the behavioral response (pre-response) and the period after it (post-response). Robust oscillatory activities were found in both periods and were differentially affected by sensory and cognitive load. Oscillatory activity under sensory load was characterized by a decrease in pre-response (early) theta activity and increased alpha activity. Oscillatory activity under cognitive load was characterized by increased theta activity, mainly in the post-response (late) period. Furthermore, source localization revealed specific brain regions responsible for processing these loads, such as the temporal and frontal lobes, cingulate cortex and precuneus. The results provide evidence that in complex listening situations, the brain processes sensory and cognitive loads differently. These neural processes have specific oscillatory signatures and are long lasting, extending beyond the behavioral response.
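To illustrate the kind of time-frequency contrast described above, the following is a minimal sketch that compares theta- and alpha-band power between pre-response and post-response windows of a single synthetic channel via a spectrogram; the sampling rate, window lengths and band limits are assumptions, not the study's analysis parameters.

```python
# Sketch: theta/alpha band power in pre- vs post-response windows of one synthetic
# channel, the kind of contrast described above. Parameters are illustrative.
import numpy as np
from scipy.signal import spectrogram

fs = 500
rng = np.random.default_rng(0)
eeg = rng.standard_normal(4 * fs)          # 4 s of toy data around the response
response_time = 2.0                        # assume the behavioral response at t = 2 s

f, t, Sxx = spectrogram(eeg, fs=fs, nperseg=fs // 2, noverlap=fs // 4)

def band_power(Sxx, f, t, fband, twindow):
    """Mean spectrogram power in a frequency band and time window."""
    fmask = (f >= fband[0]) & (f <= fband[1])
    tmask = (t >= twindow[0]) & (t <= twindow[1])
    return Sxx[np.ix_(fmask, tmask)].mean()

theta_pre  = band_power(Sxx, f, t, (4, 8), (0, response_time))    # early theta
theta_post = band_power(Sxx, f, t, (4, 8), (response_time, 4))    # late theta
alpha_pre  = band_power(Sxx, f, t, (8, 12), (0, response_time))
print(theta_pre, theta_post, alpha_pre)
```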


Subject(s)
Electroencephalography , Evoked Potentials , Humans , Electroencephalography/methods , Evoked Potentials/physiology , Brain/physiology , Frontal Lobe , Cognition/physiology
17.
Int J Lang Commun Disord ; 59(4): 1422-1435, 2024.
Article in English | MEDLINE | ID: mdl-38237606

ABSTRACT

BACKGROUND: Perceptual measures such as speech intelligibility are known to be biased, variant and subjective, and automatic approaches have been proposed as a more reliable alternative. On the other hand, automatic approaches tend to lack explainability, an aspect that can prevent the widespread clinical usage of these technologies. AIMS: In the present work, we aim to study the relationship between four perceptual parameters and speech intelligibility by automatically modelling the behaviour of six perceptual judges, in the context of head and neck cancer. From this evaluation we want to assess the different levels of relevance of each parameter as well as the different judge profiles that arise, both perceptually and automatically. METHODS AND PROCEDURES: Based on a passage reading task from the Carcinologic Speech Severity Index (C2SI) corpus, six expert listeners assessed the voice quality, resonance, prosody and phonemic distortions, as well as the speech intelligibility, of patients treated for oral or oropharyngeal cancer. A statistical analysis and an ensemble of automatic systems, one per judge, were devised, in which speech intelligibility is predicted as a function of the four aforementioned perceptual parameters of voice quality, resonance, prosody and phonemic distortions. OUTCOMES AND RESULTS: The results suggest that we can automatically predict speech intelligibility as a function of the four aforementioned perceptual parameters, achieving a high correlation of 0.775 (Spearman's ρ). Furthermore, different judge profiles were found perceptually that were successfully modelled automatically. CONCLUSIONS AND IMPLICATIONS: The four investigated perceptual parameters influence the global rating of speech intelligibility, showing that different judge profiles emerge. The proposed automatic approach displayed a more uniform profile across all judges, yielding a more reliable, unbiased and objective prediction. The system also adds an extra layer of interpretability, since speech intelligibility is regressed as a direct function of the individual predictions of the four perceptual parameters, an improvement over more black-box approaches. WHAT THIS PAPER ADDS: What is already known on this subject Speech intelligibility is a clinical measure typically used in the post-treatment assessment of speech-affecting disorders, such as head and neck cancer. Perceptual assessment is currently the main method of evaluation; however, it is known to be quite subjective, since intelligibility can be seen as a combination of other perceptual parameters (voice quality, resonance, etc.). Given this, automatic approaches have been seen as a more viable alternative to the traditionally used perceptual assessments. What this study adds to existing knowledge The present work introduces a study of the relationship between four perceptual parameters (voice quality, resonance, prosody and phonemic distortions) and speech intelligibility, by automatically modelling the behaviour of six perceptual judges. The results suggest that different judge profiles arise, both in the perceptual ratings and in the automatic models. These profiles showcase the different schools of thought among perceptual judges, in contrast to the automatic judges, which display more uniform levels of relevance across all four perceptual parameters. This aspect shows that an automatic approach promotes unbiased, reliable and more objective predictions.
What are the clinical implications of this work? The automatic prediction of speech intelligibility, using a combination of four perceptual parameters, shows that these approaches can achieve high correlations with the reference scores while maintaining a certain degree of explainability. The more uniform judge profiles found in the automatic case also display less biased results towards the four perceptual parameters. This aspect facilitates the clinical implementation of this class of systems, as opposed to the more subjective and harder-to-reproduce perceptual assessments.
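As a rough sketch of the prediction setup described above, the following regresses a speech-intelligibility rating on four perceptual parameters and scores the cross-validated predictions with Spearman's ρ; the data and the linear model are illustrative assumptions, not the per-judge ensemble used in the study.

```python
# Sketch: regressing speech intelligibility on the four perceptual parameters
# (voice quality, resonance, prosody, phonemic distortions) and scoring with
# Spearman's rho. Data are synthetic.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 100
X = rng.uniform(0, 10, size=(n, 4))                  # four perceptual ratings per patient
true_w = np.array([0.4, 0.2, 0.1, 0.3])              # hypothetical weighting
y = X @ true_w + rng.normal(0, 0.5, n)               # simulated intelligibility rating

pred = cross_val_predict(LinearRegression(), X, y, cv=10)
rho, _ = spearmanr(y, pred)
print(f"Spearman's rho = {rho:.3f}")
```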


Subject(s)
Head and Neck Neoplasms , Speech Intelligibility , Humans , Male , Female , Head and Neck Neoplasms/psychology , Middle Aged , Aged , Judgment , Speech Perception , Voice Quality , Adult
18.
Hum Brain Mapp ; 45(1): e26577, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38224542

ABSTRACT

Healthy aging leads to complex changes in the functional network of speech processing in noisy environments. The dual-route neural architecture has been applied to the study of speech processing. Although evidence suggests that senescence increases activity in brain regions across the dorsal and ventral streams to offset reduced peripheral input, the regulatory mechanism of the dual-route functional networks underlying such compensation remains largely unknown. Here, using functional near-infrared spectroscopy (fNIRS), we investigated the compensatory mechanism of dual-route functional connectivity and its relationship with healthy aging, using a speech perception task at varying signal-to-noise ratios (SNR) in healthy individuals (young adults, middle-aged adults, and older adults). Results showed a significant age-related decrease in speech perception scores as the SNR decreased. Analysis of the dual-route speech processing networks showed an age-related increase in the functional connectivity between Wernicke's area and its homolog. To further clarify the age-related characteristics of the dual-route speech processing networks, graph-theoretical network analysis was applied; it revealed an age-related increase in network efficiency, and age-related differences in nodal characteristics were found in both Wernicke's area and its homolog under noisy conditions. Thus, Wernicke's area might be a key network hub for maintaining efficient information transfer across the speech processing network in healthy aging. Moreover, older adults recruit more resources from the homologous Wernicke's area in a noisy environment. The recruitment of the homolog of Wernicke's area might provide a means of compensation for older adults for decoding speech in adverse listening environments. Together, our results characterized dual-route speech processing networks across varying noise environments and provide new insight into compensatory theories of how aging modulates the dual-route speech processing functional networks.
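For the graph-theoretical indices mentioned above, the following is a minimal sketch that builds a graph from a (synthetic) functional connectivity matrix and computes global efficiency plus one simple nodal characteristic with networkx; the channel count and threshold are arbitrary assumptions.

```python
# Sketch: global efficiency and a nodal characteristic of a functional connectivity
# network, the kind of graph-theoretical index referred to above. Synthetic data.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n_channels = 20                                    # e.g., fNIRS channels (assumption)
fc = rng.uniform(size=(n_channels, n_channels))
fc = (fc + fc.T) / 2                               # symmetric connectivity matrix
np.fill_diagonal(fc, 0)

adjacency = (fc > 0.6).astype(int)                 # binarize with an arbitrary threshold
G = nx.from_numpy_array(adjacency)
print("Global efficiency:", nx.global_efficiency(G))
print("Degree of node 0 :", G.degree[0])           # one simple nodal characteristic
```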


Subject(s)
Speech Perception , Speech , Middle Aged , Young Adult , Humans , Aged , Magnetic Resonance Imaging , Aging , Brain/diagnostic imaging
19.
Article in English | MEDLINE | ID: mdl-38206308

ABSTRACT

BACKGROUND: Classification systems in healthcare support shared understanding of conditions for clinical communication, service monitoring and development, and research. Children born with cleft palate with or without cleft lip (CP+/-L) are at high risk of developing cleft-related speech sound disorder (SSD). The way cleft-related SSD is represented and described in SSD classification systems varies. Reflecting on the potential causal pathways for different cleft-related speech features, including the role of speech processing skills, may inform how cleft-related SSD is represented in classification systems. AIM & APPROACH: To explore and reflect on how cleft-related SSD is represented in current SSD classification systems in the context of considering how speech processing skills and other factors may be involved in causal pathways of cleft speech characteristics (CSCs). MAIN CONTRIBUTION: Variation in the representation of cleft-related SSD in classification systems is described. Potential causal pathways for passive cleft-related speech features and different active CSCs are explored. The factors involved in the development and/or persistence of different active CSCs may vary. Some factors may be specific to children born with CP+/-L, but if speech processing skills are also involved, this is an overlap with other SSD subtypes. Current evidence regarding relationships between different speech processing skills and active CSCs is limited. Implications for the representation of cleft-related SSD in SSD classification systems are discussed. CONCLUSION: There are different categories of cleft-related speech features which are essential to understand and identify in children with cleft-related SSD to ensure appropriate management. Representation of these feature categories in classification systems could support understanding of speech in this population. Speech processing skills could be involved in the development and/or persistence of different active CSCs in individual children. Reflection and discussion on how cleft-related SSD is represented in classification systems in relation to other SSD subtypes may inform future iterations of these systems. Further work is needed to understand factors influencing the development and/or persistence of active CSCs, including speech processing skills. WHAT THIS PAPER ADDS: What is already known on the subject Cleft-related speech sound disorder (SSD) is commonly described as being of known origin. The features of cleft-related SSD have been described extensively and several authors have also examined factors which may contribute to speech development and outcomes in children born with cleft palate +/- lip. There is limited evidence regarding the role of speech processing in the development and persistence of cleft-related SSD. What this study adds This paper reflects on how cleft-related SSD is represented in SSD classification systems in relation to key feature categories of cleft-related SSD and possible causal pathways for passive features and active cleft speech characteristics (CSCs). The role of cognitive speech processing skills is specifically considered alongside other factors that may contribute to the development of active CSCs. What are the clinical implications of this work? Causal pathways for different features of cleft-related SSD may vary, particularly between passive and active features, but also between different active CSCs.
Speech and language therapists (SLTs) need to differentially diagnose passive speech features and active CSCs. Consideration of the role of different speech processing skills and interactions with other potentially influencing factors in relation to active CSCs may inform clinical hypotheses and speech and language therapy (SLT) intervention. Representing key features of cleft-related SSD in classification systems may support understanding of cleft-related SSD in relation to other SSD subtypes.

20.
Cereb Cortex ; 34(2), 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38265297

ABSTRACT

Numerous studies have been devoted to neural mechanisms of a variety of linguistic tasks (e.g. speech comprehension and production). To date, however, whether and how the neural patterns underlying different linguistic tasks are similar or differ remains elusive. In this study, we compared the neural patterns underlying 3 linguistic tasks mainly concerning speech comprehension and production. To address this, multivariate regression approaches with lesion/disconnection symptom mapping were applied to data from 216 stroke patients with damage to the left hemisphere. The results showed that lesion/disconnection patterns could predict both poststroke scores of speech comprehension and production tasks; these patterns exhibited shared regions on the temporal pole of the left hemisphere as well as unique regions contributing to the prediction for each domain. Lower scores in speech comprehension tasks were associated with lesions/abnormalities in the superior temporal gyrus and middle temporal gyrus, while lower scores in speech production tasks were associated with lesions/abnormalities in the left inferior parietal lobe and frontal lobe. These results suggested an important role of the ventral and dorsal stream pathways in speech comprehension and production (i.e. supporting the dual stream model) and highlighted the applicability of the novel multivariate disconnectome-based symptom mapping in cognitive neuroscience research.
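As a hedged illustration of treating multivariate lesion-symptom mapping as a prediction problem, the following regresses a behavioral score on vectorized (synthetic) lesion maps with ridge regression and cross-validation; the voxel count, regularization and simulated lesions are assumptions, not the authors' disconnectome-based method.

```python
# Sketch: multivariate lesion-symptom mapping as a prediction problem, i.e.
# regressing a behavioral score on vectorized lesion maps with regularization
# and cross-validation. Lesion data here are synthetic stand-ins.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_patients, n_voxels = 216, 5000               # 216 patients as in the study; voxel count assumed
lesions = rng.integers(0, 2, size=(n_patients, n_voxels))    # binary lesion maps
critical = rng.choice(n_voxels, size=50, replace=False)      # toy "critical" voxels
score = 100 - lesions[:, critical].sum(axis=1) * 1.5 + rng.normal(0, 2, n_patients)

pred = cross_val_predict(Ridge(alpha=10.0), lesions, score, cv=10)
r, _ = pearsonr(score, pred)
print(f"Cross-validated prediction r = {r:.2f}")
```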


Subject(s)
Brain Mapping , Stroke , Humans , Brain Mapping/methods , Magnetic Resonance Imaging/methods , Linguistics , Stroke/complications , Stroke/diagnostic imaging , Comprehension