ABSTRACT
The production of phonation involves highly complex processes linked to the physical, clinical, and emotional state of the speaker. Thus, in populations with neurological diseases, the voice signal may carry the imprint left by the deterioration of certain cortical areas or of part of the neurocognitive mechanisms involved in speech. In previous work, the authors established a relationship between the pathological characteristics of the voices of speakers with Smith-Magenis syndrome (SMS) and lower cepstral peak prominence (CPP) values relative to normative speakers. They also described the presence of subharmonics in these voices. OBJECTIVES: The present study aims to verify whether both characteristics can be used simultaneously to differentiate SMS voices from neurotypical voices. It also analyzes whether the formant trajectories vary in coincidence with the subharmonics. METHODS: To this end, the effect of subharmonics in the voices of 12 individuals with SMS was isolated to determine whether the subharmonics were responsible for the lower CPP values. The CPP was also evaluated in the segments where subharmonics were present, measured from the cepstral peak corresponding to f0 rather than from the most prominent peak. This provided a baseline CPP value in the presence of subharmonics. The study also checked whether formant changes occurred synchronously with the appearance of the subharmonics; if so, the muscles that control the position of the jaw and tongue would be affected at the same time as the larynx. This last point was difficult to observe because the samples were very short. Phonatory performance on a sustained /a/ was compared between a neurotypical group and a non-neurotypical group of children, balanced and matched for age and gender. The Spanish Association of Smith-Magenis Syndrome (ASME) represents almost 20% of the SMS population in Spain.
RESULTS: The CPP differentiates between normative speakers and speakers with SMS, even when the effect of subharmonics is isolated. CONCLUSIONS: The CPP is a robust index for determining the degree of dysphonia; it differentiates pathological voices from healthy voices even when subharmonics are present. Subharmonics are characteristic of the voices of individuals with SMS and are not present in healthy voices. Both indices can be used simultaneously to differentiate SMS voices from neurotypical voices.
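The CPP measurement described in this abstract can be illustrated with a minimal, standard-library-only Python sketch. It follows the common textbook recipe: compute the log power spectrum, then the cepstrum in dB, then measure the peak height above a least-squares regression line fitted over the plausible f0 quefrency range. The abstract does not give the authors' settings, so several choices here are assumptions: the Hann window, the 60-400 Hz search range, and the dB scaling. The hypothetical `f0_hint` option is also an assumption. It takes the peak near the quefrency of a known f0 instead of the most prominent peak, which mirrors the subharmonic-robust evaluation described above.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT so the sketch needs only the standard library."""
    n = len(x)
    tw = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    return [sum(x[k] * tw[(k * m) % n] for k in range(n)) for m in range(n)]


def cepstrum_db(frame):
    """Real cepstrum (dB) of a Hann-windowed frame."""
    n = len(frame)
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    spec = dft([s * w for s, w in zip(frame, win)])
    log_pow = [10 * math.log10(abs(c) ** 2 + 1e-12) for c in spec]
    # log_pow is real and even, so its forward DFT divided by n equals its inverse DFT.
    return [20 * math.log10(abs(c) / n + 1e-12) for c in dft(log_pow)]


def cpp(frame, fs, fmin=60.0, fmax=400.0, f0_hint=None, search=2):
    """Return (CPP in dB, peak quefrency in samples).

    Without f0_hint the most prominent peak in [fs/fmax, fs/fmin] is used; with
    f0_hint (hypothetical option) the peak is searched within +/- `search`
    samples of fs/f0_hint, i.e. the peak that reflects f0.
    """
    c = cepstrum_db(frame)
    qs = list(range(int(fs / fmax), int(fs / fmin) + 1))
    if f0_hint is None:
        peak = max(qs, key=lambda q: c[q])
    else:
        centre = round(fs / f0_hint)
        peak = max(range(centre - search, centre + search + 1), key=lambda q: c[q])
    # Least-squares regression line c(q) = a*q + b over the search range.
    mq = sum(qs) / len(qs)
    mc = sum(c[q] for q in qs) / len(qs)
    a = sum((q - mq) * (c[q] - mc) for q in qs) / sum((q - mq) ** 2 for q in qs)
    b = mc - a * mq
    return c[peak] - (a * peak + b), peak


# Synthetic signals (not study data): 10 harmonics of 200 Hz plus weak noise, and pure noise.
fs, n, f0 = 8000, 512, 200.0
rng = random.Random(0)
voiced = [sum(math.cos(2 * math.pi * h * f0 * i / fs) for h in range(1, 11)) + 0.01 * rng.gauss(0, 1)
          for i in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
cpp_voiced, q_voiced = cpp(voiced, fs)
cpp_noise, _ = cpp(noise, fs)
print(f"voiced: CPP={cpp_voiced:.1f} dB at q={q_voiced}; noise: CPP={cpp_noise:.1f} dB")
```

The naive DFT only keeps the example dependency-free. A real analysis would use an FFT library and aggregate CPP over many frames.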
ABSTRACT
This research work introduces a novel, nonintrusive method for the automatic identification of Smith-Magenis syndrome, a condition traditionally identified through genetic markers. The method feeds cepstral peak prominence, a single metric computed by the research group, to various machine learning techniques. The performance of these techniques is evaluated in two case studies, each employing a different data preprocessing approach. A proprietary data "windowing" technique is also developed to derive a more representative dataset. The synthetic minority oversampling technique (SMOTE) is then applied for data augmentation to address class imbalance. These preprocessing techniques yielded promising results from a limited initial dataset. The study concludes that k-nearest neighbors and linear discriminant analysis perform best and that cepstral peak prominence is a promising measure for identifying Smith-Magenis syndrome.
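The preprocessing chain summarized above (windowing, SMOTE oversampling, then a classifier such as k-nearest neighbors) might be sketched as follows. The authors' windowing technique is proprietary and not described, so `window_features` is only a hypothetical placeholder. It computes the mean and standard deviation of a CPP track over sliding windows, with assumed `size` and `step` parameters. The `smote` function follows the standard SMOTE interpolation rule, and `knn_predict` is a plain Euclidean majority-vote k-NN. Linear discriminant analysis is omitted to keep the sketch short.

```python
import math
import random
from collections import Counter


def window_features(cpp_track, size, step):
    """Hypothetical windowing: (mean, std) of a per-frame CPP track in sliding windows."""
    feats = []
    for start in range(0, len(cpp_track) - size + 1, step):
        w = cpp_track[start:start + size]
        mu = sum(w) / size
        feats.append((mu, math.sqrt(sum((v - mu) ** 2 for v in w) / size)))
    return feats


def smote(minority, n_new, k=3, seed=0):
    """SMOTE: x_new = x + u * (nn - x), u ~ U(0, 1), nn among the k nearest minority neighbours of x.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        nn = rng.choice(sorted(others, key=lambda p: math.dist(x, p))[:k])
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return synthetic


def knn_predict(train, labels, query, k=3):
    """Majority label among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Illustrative feature vectors only (not data from the study).
minority = [(7.9, 0.4), (8.3, 0.6), (8.0, 0.5), (7.6, 0.7)]
majority = [(12.1, 0.3), (11.8, 0.2), (12.5, 0.4), (12.0, 0.3), (11.6, 0.5), (12.3, 0.2)]
augmented = minority + smote(minority, len(majority) - len(minority))
train = majority + augmented
labels = ["HC"] * len(majority) + ["SMS"] * len(augmented)
print(knn_predict(train, labels, (8.1, 0.5)))  # prints SMS for this toy set
```

Oversampling is applied only to the training data here. Synthetic samples should never leak into the evaluation set.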
ABSTRACT
Speech is controlled by axial neuromotor systems; therefore, it is highly sensitive to the effects of neurodegenerative diseases such as Parkinson's Disease (PD). Patients with PD present important speech alterations, manifested in phonation, articulation, prosody, and fluency. These alterations may be evaluated using statistical methods on features obtained from glottal, spectral, cepstral, or fractal descriptions of speech. This work introduces an evaluation paradigm based on Information Theory (IT) to differentiate the effects of PD and aging on glottal amplitude distributions. The study uses a database of 48 PD patients (24 males, 24 females), 48 age-matched healthy controls (HC; 24 males, 24 females), and 48 mid-age normative subjects (NS; 24 males, 24 females). Hierarchical Clustering (HiCl) methods produce a clear separation between the phonation of PD patients and that of NS subjects (accuracy of 89.6% for both the male and female subsets). However, they separate PD patients from HC subjects less efficiently (accuracy of 75.0% for the male subset and 70.8% for the female subset). Conversely, with feature selection and Support Vector Machine (SVM) classification, the differentiation between PD and HC improves substantially (accuracy of 94.8% for the male subset and 92.8% for the female subset). This improvement was driven mainly by feature selection, at the cost of losses in information and generalization. The results suggest that speech deterioration with aging may affect HC phonation, reducing its difference from PD phonation.
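The abstract does not specify which information-theoretic measures are used, so the following Python sketch only illustrates the general idea under stated assumptions. Amplitudes of an estimated glottal signal are binned into a normalized histogram, with hypothetical `bins`, `lo`, and `hi` parameters, and summarized by Shannon entropy. Two distributions are then compared with the Jensen-Shannon divergence. That divergence is chosen here because, unlike the Kullback-Leibler divergence, it stays finite when a histogram bin is empty.

```python
import math


def histogram(values, bins, lo, hi):
    """Probability mass of `values` in equal-width bins on [lo, hi]; outliers are clipped to the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = max(0, min(bins - 1, math.floor((v - lo) / width)))
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def entropy_bits(p):
    """Shannon entropy H(p) = -sum p log2 p (empty bins contribute 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_bits(p, q):
    """Symmetric divergence bounded in [0, 1] bit, via the mixture m = (p + q) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)


# Toy normalized amplitude samples (not study data).
p = histogram([0.1, 0.2, 0.25, 0.8, 0.9], bins=4, lo=0.0, hi=1.0)
q = histogram([0.4, 0.45, 0.5, 0.55, 0.6], bins=4, lo=0.0, hi=1.0)
print(entropy_bits(p), entropy_bits(q), jensen_shannon_bits(p, q))
```

Such per-speaker quantities could then serve as features for clustering or SVM classification. The study's actual feature set is not reproduced here.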
Subject(s)
Aging/physiology , Parkinson Disease/physiopathology , Phonation/physiology , Speech Disorders/physiopathology , Support Vector Machine , Aged , Diagnosis, Differential , Female , Humans , Male , Parkinson Disease/complications , Speech Acoustics , Speech Disorders/etiology
ABSTRACT
Speech articulation is produced by the movements of muscles in the larynx, pharynx, mouth, and face. Therefore, speech exhibits acoustic features such as formants, which are directly related to the neuromotor actions of these muscles. The first two formants are strongly related to jaw and tongue muscular activity. Speech can be used as a simple and ubiquitous signal, easy to record and process either locally or on e-Health platforms. This may open a wide range of applications in the functional grading and monitoring of neurodegenerative diseases. A relevant question in this sense is to what extent speech correlates and neuromotor actions are related. This preliminary study seeks to answer this question using surface electromyographic recordings of the masseter and the acoustic kinematics related to the first formant. The study shows that relevant correlations can be found between the surface electromyographic activity (dynamic muscle behavior) and the positions and first derivatives of the first formant. These are kinematic variables related to the vertical velocity and acceleration of the joint jaw-tongue biomechanical system. As an application example, the probability density function associated with these kinematic variables is shown to be more sensitive than classical features, such as the Vowel Space Area (VSA) or the Formant Centralization Ratio (FCR), in characterizing neuromotor degeneration in Parkinson's Disease.
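The kinematic variables and classical vowel measures named above can be made concrete with a short, standard-library-only Python sketch. Several choices here are assumptions, because the study's filtering, sampling, and correlation settings are not given in the abstract. Velocity is taken as the first derivative of a uniformly sampled F1 track, using central differences. Acceleration is the derivative of that velocity. Association with a surface EMG envelope would be measured with a Pearson correlation. VSA and FCR use the widely cited triangular corner-vowel formulations for /i/, /a/, and /u/. The formant values in the usage lines are illustrative only.

```python
import math


def derivative(x, fs):
    """Central-difference derivative of a track sampled at fs Hz (one-sided at the ends; needs len(x) >= 2)."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((x[hi] - x[lo]) * fs / (hi - lo))
    return out


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences (e.g. F1 velocity vs. sEMG envelope)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def vsa(f1, f2):
    """Triangular vowel space area (Hz^2) spanned by /i/, /a/, /u/ in the F1-F2 plane."""
    return abs(f1["i"] * (f2["a"] - f2["u"]) + f1["a"] * (f2["u"] - f2["i"])
               + f1["u"] * (f2["i"] - f2["a"])) / 2


def fcr(f1, f2):
    """Formant centralization ratio: (F2u + F2a + F1i + F1u) / (F2i + F1a)."""
    return (f2["u"] + f2["a"] + f1["i"] + f1["u"]) / (f2["i"] + f1["a"])


# Illustrative values only (not study data).
f1_track = [500.0, 560.0, 640.0, 700.0, 660.0, 580.0]  # Hz, one value every 10 ms
velocity = derivative(f1_track, fs=100)               # Hz/s
acceleration = derivative(velocity, fs=100)           # Hz/s^2
f1 = {"i": 300.0, "a": 700.0, "u": 300.0}
f2 = {"i": 2300.0, "a": 1200.0, "u": 900.0}
print(vsa(f1, f2), fcr(f1, f2))  # 280000.0 0.9
```

VSA and FCR each condense a speaker's vowels into a single number. The kinematic approach instead keeps whole velocity and acceleration distributions, which is where a probability density function can be estimated.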
Subject(s)
Electromyography/methods , Masseter Muscle/physiology , Models, Neurological , Speech Production Measurement/methods , Speech/physiology , Adult , Aged , Biomechanical Phenomena , Dysarthria/diagnosis , Dysarthria/etiology , Humans , Jaw/physiology , Middle Aged , Parkinson Disease/complications , Parkinson Disease/diagnosis , Tongue/physiology
ABSTRACT
Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have had to be replaced by alternative biometric characteristics such as voice, since sometimes this is the only feature available. The present study builds on the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that recognition rates will improve over classical approaches when two elements are combined. The first is a gender-dependent characterization of speakers. The second is a set of features derived from the components obtained by deconstructing the voice into its glottal source and vocal tract estimates. A general description of the main hypothesis and of the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out on two databases: one recorded under highly controlled acoustic conditions, and one recorded over a mobile phone network under non-controlled acoustic conditions.
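The paper's own source-filter decomposition is not described in the abstract. As a hedged illustration of the general idea only, the classic linear-prediction (LPC) approach models the vocal tract as an all-pole filter. That filter is estimated with the autocorrelation method and the Levinson-Durbin recursion. The inverse-filtered residual is then taken as a crude glottal-source estimate. The sketch below implements this textbook method and checks it on a synthetic first-order autoregressive signal. The function names and the model order are hypothetical choices, and real speech frames would need framing and a higher order.

```python
import random


def autocorr(x, order):
    """Unnormalized autocorrelation r[0..order] of a frame (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns the coefficients [1, a1, ..., ap] and the final prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[j] * r[m - j] for j in range(1, m))) / err  # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1 - k * k
    return a, err


def inverse_filter(x, a):
    """Residual e[n] = sum_j a[j] x[n - j]: removes the all-pole (vocal-tract) model from x."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]


# Synthetic AR(1) "tract" x[n] = 0.9 x[n-1] + e[n] driven by white noise (not speech data).
rng = random.Random(1)
x = [0.0]
for _ in range(1999):
    x.append(0.9 * x[-1] + rng.gauss(0, 1))
a, err = levinson(autocorr(x, 1), 1)
residual = inverse_filter(x, a)
print(a, err)
```

Here `a[1]` should come out close to -0.9, and the residual approximates the white excitation. In the source-filter analogy, features would then be derived separately from the filter coefficients (vocal tract) and from the residual (source).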