ABSTRACT
Auditory perceptual evaluation is considered the gold standard for assessing voice quality, but its reliability is limited by inter-rater variability and coarse rating scales. This study investigates a continuous, objective approach to evaluating hoarseness severity that combines machine learning (ML) and sustained phonation. For this purpose, 635 acoustic recordings of the sustained vowel /a/ and subjective ratings based on the roughness, breathiness, and hoarseness scale were collected from 595 subjects. A total of 50 temporal, spectral, and cepstral features were extracted from each recording and used to identify suitable ML algorithms. Using variance and correlation analysis followed by backward elimination, a subset of relevant features was selected. Recordings were classified into two levels of hoarseness, H<2 and H≥2, yielding a continuous probability score y∈[0,1]. An accuracy of 0.867 and a correlation of 0.805 between the model's predictions and the subjective ratings were obtained using only five acoustic features and logistic regression (LR). Further examination of recordings made pre- and post-treatment revealed high qualitative agreement with the change in subjectively determined hoarseness levels; quantitatively, a moderate correlation of 0.567 was obtained. This quantitative approach to hoarseness severity estimation shows promising results and potential for improving the assessment of voice quality.
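The pipeline described above — variance filtering, backward feature elimination down to five features, and logistic regression yielding a probability score in [0,1] — can be sketched as follows. The data are synthetic stand-ins and the use of scikit-learn's `RFE` as the backward-elimination step is an illustrative assumption, not the authors' implementation.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold, RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 200 recordings x 50 acoustic features
X = rng.normal(size=(200, 50))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)  # H>=2 vs H<2

# 1) Drop near-constant features
X_var = VarianceThreshold(threshold=1e-3).fit_transform(X)

# 2) Backward elimination down to five features, as in the study
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5, step=1)
selector.fit(X_var, y)

# 3) Continuous hoarseness score in [0, 1] from the fitted LR model
model = LogisticRegression(max_iter=1000).fit(X_var[:, selector.support_], y)
scores = model.predict_proba(X_var[:, selector.support_])[:, 1]  # one probability per recording
```

Because LR outputs a class-membership probability rather than a hard label, the same model yields both the binary decision (threshold at 0.5) and the continuous severity score correlated with the subjective ratings.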
Subject(s)
Dysphonia, Hoarseness, Humans, Hoarseness/diagnosis, Reproducibility of Results, Voice Quality, Phonation, Acoustics, Speech Acoustics, Speech Production Measurement
ABSTRACT
BACKGROUND: Dysarthric symptoms in Parkinson's disease (PD) vary greatly across cohorts. Abundant research suggests that such heterogeneity could reflect subject-level and task-related cognitive factors. However, the interplay of these variables during motor speech remains underexplored, let alone by administering validated materials to carefully matched samples with varying cognitive profiles and combining automated tools with machine learning methods. OBJECTIVE: We aimed to identify which speech dimensions best identify patients with PD in cognitively heterogeneous, cognitively preserved, and cognitively impaired groups through tasks with low (reading) and high (retelling) processing demands. METHODS: We used support vector machines to analyze prosodic, articulatory, and phonemic identifiability features. Patient groups were compared with healthy control subjects and against each other in both tasks, using each measure separately and in combination. RESULTS: Relative to control subjects, patients in cognitively heterogeneous and cognitively preserved groups were best discriminated by combined dysarthric signs during reading (accuracy = 84% and 80.2%). Conversely, patients with cognitive impairment were maximally discriminated from control subjects when considering phonemic identifiability during retelling (accuracy = 86.9%). This same pattern maximally distinguished between cognitively spared and impaired patients (accuracy = 72.1%). Also, cognitive (executive) symptom severity was predicted by prosody in cognitively preserved patients and by phonemic identifiability in cognitively heterogeneous and impaired groups. No measure predicted overall motor dysfunction in any group. CONCLUSIONS: Predominant dysarthric symptoms appear to be best captured through undemanding tasks in cognitively heterogeneous and preserved cohorts and through cognitively loaded tasks in patients with cognitive impairment. 
Further applications of this framework could enhance dysarthria assessments in PD. © 2021 International Parkinson and Movement Disorder Society.
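The comparison scheme used above — evaluating each speech dimension separately and in combination with a support vector machine — can be sketched as follows. Feature dimensionalities, sample counts, and labels are made up for illustration; only the evaluation pattern mirrors the study design.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical per-speaker feature blocks: prosodic (12-d),
# articulatory (20-d), and phonemic-identifiability (8-d)
prosody = rng.normal(size=(120, 12))
artic = rng.normal(size=(120, 20))
phonemic = rng.normal(size=(120, 8))
labels = rng.integers(0, 2, size=120)  # patient vs. control (synthetic)

# Evaluate each measure separately and all measures combined
for name, block in [("prosody", prosody), ("articulation", artic),
                    ("phonemic", phonemic),
                    ("combined", np.hstack([prosody, artic, phonemic]))]:
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    acc = cross_val_score(clf, block, labels, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```

With random synthetic labels the accuracies hover near chance; on real cohorts, the block (or combination) with the highest cross-validated accuracy identifies the most discriminative speech dimension for that group and task.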
Subject(s)
Cognitive Dysfunction, Parkinson Disease, Cognition, Dysarthria/diagnosis, Dysarthria/etiology, Humans, Machine Learning, Speech
ABSTRACT
OBJECTIVE: To systematically evaluate the evidence for the reliability, sensitivity, and specificity of existing measures of vowel-initial voice onset. METHODS: A literature search was conducted across electronic databases for published studies (MEDLINE, EMBASE, Scopus, Web of Science, CINAHL, PubMed Central, IEEE Xplore) and grey literature (ProQuest for unpublished dissertations) measuring vowel onset. Eligibility criteria included research of any study design type or context focused on measuring human voice onset on an initial vowel. Two independent reviewers were involved at each stage of title and abstract screening, data extraction, and analysis. Data extracted included the measures used and their reliability, sensitivity, and specificity. Risk of bias and certainty of evidence were assessed using GRADE as the data of interest were extracted. RESULTS: The search retrieved 6,983 records. Titles and abstracts were screened against the inclusion criteria by two independent reviewers, with a third reviewer responsible for conflict resolution. Thirty-five papers were included in the review, which identified five categories of voice onset measurement: auditory perceptual, acoustic, aerodynamic, physiological, and visual imaging. Reliability was explored in 14 papers with varied reliability ratings, sensitivity was rarely assessed, and no assessment of specificity was conducted in any of the included records. Certainty of evidence ranged from very low to moderate, with high variability in the methodologies and voice onset measures used. CONCLUSIONS: A range of vowel-initial voice onset measurements has been applied throughout the literature; however, there is a lack of evidence regarding their sensitivity, specificity, and reliability in the detection and discrimination of voice onset types. Heterogeneity in study populations and methods precludes conclusions on the most valid measures.
There is a clear need for standardisation of research methodology, and for future studies to examine the practicality of these measures in research and clinical settings.
Subject(s)
Sensitivity and Specificity, Humans, Reproducibility of Results, Voice
ABSTRACT
BACKGROUND: The integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where privacy is especially vital, has not been extensively examined. METHODS: This study investigates anonymization's impact on pathological speech across over 2,700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore both deep-learning-based and signal-processing-based anonymization methods. RESULTS: We document substantial privacy improvements across disorders, evidenced by equal error rate increases of up to 1933%, with minimal overall impact on utility. Specific disorders such as dysarthria, dysphonia, and cleft lip and palate experience minimal utility changes, while dysglossia shows slight improvements. Our findings underscore that the impact of anonymization varies substantially across disorders, necessitating disorder-specific anonymization strategies to optimally balance privacy with diagnostic utility. Additionally, our fairness analysis reveals consistent anonymization effects across most demographics. CONCLUSIONS: This study demonstrates the effectiveness of anonymization in pathological speech for enhancing privacy, while also highlighting the importance of customized, disorder-specific approaches to account for inversion attacks.
When someone's way of speaking is disrupted due to health issues, making it hard for them to communicate clearly, it is described as pathological speech. Our study explores whether this type of speech can be modified to protect patient privacy without losing its ability to help diagnose health conditions. We evaluated automatic anonymization for over 2,700 speakers. The results show that these methods can substantially enhance privacy while still maintaining the usefulness of speech in medical diagnostics. This means we can keep speech data private whilst still being able to use it to identify health issues. However, our results show the effectiveness of these methods can vary depending on the specific condition being diagnosed. Our study provides a method that can help maintain patient privacy, whilst highlighting that further customized approaches will be required to ensure optimal privacy.
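The equal error rate (EER) cited above is the standard privacy metric in speaker verification: the operating point where the false-acceptance and false-rejection rates coincide, so a higher EER means speakers are harder to re-identify. A minimal sketch with made-up score distributions:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: threshold where the false-acceptance rate (impostor scores
    above threshold) equals the false-rejection rate (genuine scores
    below threshold)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([np.mean(impostor >= t) for t in thresholds])
    frr = np.array([np.mean(genuine < t) for t in thresholds])
    i = np.argmin(np.abs(far - frr))  # closest crossing of the two curves
    return (far[i] + frr[i]) / 2

rng = np.random.default_rng(2)
# Made-up verification scores; effective anonymization pushes the two
# distributions together, raising the EER toward chance (50%)
genuine = rng.normal(2.0, 1.0, size=500)
impostor = rng.normal(0.0, 1.0, size=500)
eer = equal_error_rate(genuine, impostor)
```

An EER increase "up to 1933%" therefore means the post-anonymization EER is roughly twenty times the original, i.e., much closer to the 50% chance level.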
ABSTRACT
Tagged magnetic resonance imaging (MRI) has been successfully used to track the motion of internal tissue points within moving organs. Typically, to analyze motion using tagged MRI, cine MRI data in the same coordinate system are acquired, incurring additional time and costs. Consequently, tagged-to-cine MR synthesis holds the potential to reduce the extra acquisition time and costs associated with cine MRI, without disrupting downstream motion analysis tasks. Previous approaches have processed each frame independently, thereby overlooking the fact that complementary information from occluded regions of the tag patterns could be present in neighboring frames exhibiting motion. Furthermore, inconsistent visual appearance across frames, e.g., tag fading, can reduce synthesis performance. To address this, we propose an efficient framework for tagged-to-cine MR sequence synthesis that leverages both spatial and temporal information with relatively limited data. Specifically, we follow a split-and-integral protocol to balance spatial-temporal modeling efficiency and consistency. The light spatial-temporal transformer (LiST2) is designed to exploit local and global attention in the motion sequence with relatively few trainable parameters. A directional product relative position-time bias makes the model aware of spatial-temporal correlation, while shifted windows are used for motion alignment. A recurrent sliding fine-tuning (ReST) scheme is then applied to further enhance temporal consistency. Our framework is evaluated on paired tagged and cine MRI sequences, demonstrating superior performance over comparison methods.
ABSTRACT
OBJECTIVES: The Nyquist plot provides a graphical representation of glottal cycles as elliptical trajectories in a 2D plane. This study proposes a methodology to parameterize the Nyquist plot in support of the quantitative analysis of voice disorders. METHODS: We considered high-speed videoendoscopy recordings of 33 functional dysphonia (FD) patients and 33 normophonic controls (NC). Quantitative analysis was performed by computing four shape-based parameters from the Nyquist plot: Variability, Size (Perimeter and Area), and Consistency. Additionally, we performed automatic classification using a linear support vector machine and feature importance analysis, combining the proposed features with state-of-the-art glottal area waveform (GAW) parameters. RESULTS: We found that inter-cycle variability was significantly higher in FD patients than in NC. We achieved a classification accuracy of 83% when the top 30 most important features were used, and the proposed Nyquist plot features ranked among the top 12 most important features. CONCLUSIONS: The Nyquist plot provides complementary information for subjective and objective assessment of voice disorders. On the one hand, visual inspection reveals intra- and inter-glottal-cycle irregularities during sustained phonation. On the other hand, the shape-based parameters quantify such irregularities and provide information complementary to state-of-the-art GAW parameters.
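The Size parameters (Perimeter and Area) of a closed cycle trajectory can be computed directly from its 2D coordinates; the sketch below uses the polygon (shoelace) formula on sampled points and is an illustration of the idea, not the authors' exact parameterization.

```python
import numpy as np

def cycle_shape_params(x, y):
    """Perimeter and enclosed area (shoelace formula) of one closed
    glottal-cycle trajectory sampled in the 2D Nyquist plane."""
    dx = np.diff(x, append=x[:1])  # wrap around to close the trajectory
    dy = np.diff(y, append=y[:1])
    perimeter = np.hypot(dx, dy).sum()
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return perimeter, area

# Sanity check on a unit circle sampled at 1000 points:
# perimeter approaches 2*pi, area approaches pi
t = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
p, a = cycle_shape_params(np.cos(t), np.sin(t))
```

Inter-cycle Variability and Consistency could then be derived from how these per-cycle values (and the trajectories themselves) spread across successive glottal cycles.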
ABSTRACT
PURPOSE: The aim of this study was to investigate the speech prosody of postlingually deaf cochlear implant (CI) users compared with control speakers without hearing or speech impairment. METHOD: Speech recordings of 74 CI users (37 males and 37 females) and 72 age-balanced control speakers (36 males and 36 females) are considered. All participants are German native speakers and read Der Nordwind und die Sonne (The North Wind and the Sun), a standard text in pathological speech analysis and phonetic transcription. Automatic acoustic analysis is performed considering pitch, loudness, and duration features, including speech rate and rhythm. RESULTS: In general, duration and rhythm features differ between CI users and control speakers. CI users read more slowly and have a lower voiced segment ratio than control speakers. A lower voiced ratio goes along with a prolongation of voiced segments in male CI users and with a prolongation of pauses in female CI users. Rhythm features in CI users show higher variability in the duration of vowels and consonants than in control speakers. The use of bilateral CIs showed no advantage in speech prosody features over unilateral CI use. CONCLUSIONS: Even after cochlear implantation and rehabilitation, the speech of postlingually deaf adults deviates from that of control speakers, which might be due to changed auditory feedback. We suggest considering changes in temporal aspects of speech in future rehabilitation strategies. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21579171.
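The voiced segment ratio reported above is the fraction of the recording judged voiced by a frame-level voicing decision. The sketch below uses a crude energy/zero-crossing heuristic on a synthetic signal; real analyses would use a proper voicing detector, and the frame sizes and thresholds here are illustrative assumptions.

```python
import numpy as np

def voiced_ratio(signal, sr, frame_ms=25, hop_ms=10):
    """Crude voiced-segment ratio: a frame counts as voiced when its
    energy is high and its zero-crossing rate is low (heuristic sketch)."""
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    flags = []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame]
        energy = np.mean(x ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2  # crossings per sample
        flags.append(energy > 1e-4 and zcr < 0.3)
    return np.mean(flags)

sr = 16000
t = np.arange(sr) / sr
voiced = 0.5 * np.sin(2 * np.pi * 120 * t[: sr // 2])  # 120 Hz "voiced" half
silence = np.zeros(sr // 2)                            # silent half
ratio = voiced_ratio(np.concatenate([voiced, silence]), sr)
```

On this half-voiced test signal the ratio lands near 0.5; a prolongation of pauses (as in the female CI users) lowers it, while prolonged voiced segments raise the voiced portion of what speech remains.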
Subject(s)
Cochlear Implantation, Cochlear Implants, Deafness, Speech Perception, Adult, Male, Female, Humans, Deafness/rehabilitation, Hearing, Acoustics
ABSTRACT
Aim: This paper introduces Apkinson, a mobile application for motor evaluation and monitoring of Parkinson's disease patients. Materials & methods: The App is based on previously reported methods, for instance, the evaluation of articulation and pronunciation in speech, regularity and freezing of gait in walking, and tapping accuracy in hand movement. Results: Preliminary experiments indicate that most of the measurements are suitable for discriminating patients from controls. Significance is evaluated through statistical tests. Conclusion: Although the reported results correspond to preliminary experiments, we think that Apkinson is a very useful App that can help patients, caregivers, and clinicians perform more accurate monitoring of disease progression. Additionally, the mobile App can serve as a personal health assistant.
Subject(s)
Mobile Applications, Parkinson Disease/physiopathology, Smartphone, Aged, Aged, 80 and over, Female, Gait, Humans, Male, Middle Aged, Movement, Severity of Illness Index, Speech
ABSTRACT
Parkinson's disease is a neurodegenerative disorder characterized by a variety of motor symptoms. In particular, difficulty starting and stopping movements has been observed in patients. From a technical/diagnostic point of view, these movement changes can be assessed by modeling the transitions between voiced and unvoiced segments in speech, the movement when the patient starts or stops a new stroke in handwriting, and the movement when the patient starts or stops walking. This study proposes a methodology to model such start/stop difficulties using information from speech, handwriting, and gait. We used those transitions to train convolutional neural networks to classify patients and healthy subjects. The neurological state of the patients was also evaluated according to different stages of the disease (initial, intermediate, and advanced). In addition, we evaluated the robustness of the proposed approach when considering speech signals in three different languages: Spanish, German, and Czech. According to the results, the fusion of information from the three modalities classifies patients and healthy subjects with high accuracy and proves suitable for assessing the neurological state of patients across several stages of the disease. We also aimed to interpret the feature maps obtained from the deep learning architectures with respect to the presence or absence of the disease and the neurological state of the patients. To the best of our knowledge, this is one of the first works that considers multimodal information to assess Parkinson's disease following a deep learning approach.
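The transition modeling described above can be illustrated by extracting fixed-width signal windows centered on voiced/unvoiced boundaries; such chunks are the kind of input a convolutional network would classify. The framing parameters and the boolean-flag interface below are illustrative assumptions, not the authors' exact preprocessing.

```python
import numpy as np

def transition_windows(signal, voiced_flags, hop, width):
    """Extract fixed-width signal chunks centered on the sample positions
    where the frame-level voicing decision flips (onset or offset)."""
    edges = np.flatnonzero(np.diff(voiced_flags.astype(int)) != 0)
    chunks = []
    for e in edges:
        center = (e + 1) * hop          # boundary frame -> sample position
        lo, hi = center - width // 2, center + width // 2
        if lo >= 0 and hi <= len(signal):  # skip windows that run off the signal
            chunks.append(signal[lo:hi])
    return np.stack(chunks) if chunks else np.empty((0, width))

# Demo: 10 unvoiced, 10 voiced, 10 unvoiced frames at a 160-sample hop
flags = np.array([False] * 10 + [True] * 10 + [False] * 10)
windows = transition_windows(np.zeros(30 * 160), flags, hop=160, width=320)
```

The same start/stop idea carries over to the other modalities: windows around pen-down/pen-up events in handwriting and around heel-strike/toe-off events in gait, which is what makes the multimodal fusion possible.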