ABSTRACT
Imprecise articulation is the major issue reported in various types of dysarthria, and detecting articulation errors can help in diagnosis. Cues derived from both the burst and the formant transitions contribute to discriminating the place of articulation of stops. Acoustic deviations in stops caused by articulation errors can therefore be analyzed by deriving features around the burst and the voicing onset, and these features can be used to discriminate normal from dysarthric speech. In this work, a method is proposed to differentiate the voiceless stops produced by normal speakers from those produced by dysarthric speakers using three feature sets: spectral moments, the two-dimensional discrete cosine transform of the linear prediction spectrum, and Mel-frequency cepstral coefficients. These features and a cosine-distance-based classifier are used to classify normal and dysarthric speech.
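As a rough illustration of two of the ingredients named above, the following sketch computes spectral moments of a magnitude spectrum and assigns a feature vector to the closer of two class templates by cosine distance. The function names, the template dictionary, and the nearest-template decision rule are assumptions made for illustration; the abstract does not describe the classifier at this level of detail.

```python
import math

def spectral_moments(freqs, mags):
    """Treat the normalized magnitude spectrum as a distribution over
    frequency and return (mean, variance, skewness, kurtosis)."""
    total = sum(mags)
    p = [m / total for m in mags]
    mean = sum(f * w for f, w in zip(freqs, p))
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs, p))
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs, p)) / var ** 1.5
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs, p)) / var ** 2
    return mean, var, skew, kurt

def cosine_distance(a, b):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def classify(feature, templates):
    """Return the label of the template closest to `feature`.
    `templates` is a hypothetical {label: reference_vector} mapping,
    e.g. per-class mean feature vectors."""
    return min(templates, key=lambda label: cosine_distance(feature, templates[label]))

# Toy usage with made-up two-dimensional vectors (not real speech features).
templates = {"normal": [1.0, 0.0], "dysarthric": [0.0, 1.0]}
label = classify([0.9, 0.1], templates)  # "normal"
```

In practice the feature vector would be built from frames around the burst and voicing onset; the toy vectors here only exercise the distance computation.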
Subjects
Dysarthria/diagnosis, Dysarthria/physiopathology, Speech Acoustics, Female, Humans, Male, Middle Aged, Psycholinguistics
ABSTRACT
Assessment of intelligibility is required to characterize the overall speech production capability and to measure the speech outcome of different interventions for individuals with cleft lip and palate (CLP). Researchers have found that articulation error and hypernasality have a significant effect on the degradation of CLP speech intelligibility. Motivated by this finding, the present work proposes an objective measure of sentence-level intelligibility by combining the information of articulation deficits and hypernasality. These two speech disorders represent different aspects of CLP speech. Hence, it is expected that the composite measure based on them may utilize complementary clinical information. The objective scores of articulation and hypernasality are used as features to train a regression model, and the output of the model is considered as the predicted intelligibility score. The Spearman's correlation coefficient based analysis shows a significant correlation between the predicted and perceptual intelligibility scores (ρ = 0.77, p < 0.001).
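The evaluation step described above, comparing predicted and perceptual scores with Spearman's correlation, can be sketched as follows. This is a minimal illustration under stated assumptions: it computes ρ as the Pearson correlation of average ranks (the standard tie-aware definition), it does not compute the p-value, and it does not reproduce the regression model, whose type the abstract does not specify. The function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(predicted, perceptual):
    """Spearman's rank correlation between two score lists."""
    return pearson(average_ranks(predicted), average_ranks(perceptual))
```

Because only ranks matter, any monotonic relationship between predicted and perceptual scores yields ρ = 1, which is why the measure suits ordinal perceptual ratings.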
Subjects
Cleft Lip/physiopathology, Cleft Palate/physiopathology, Nasal Cavity/physiology, Speech Intelligibility, Voice/physiology, Child, Cleft Lip/complications, Cleft Palate/complications, Female, Humans, Male, Speech Acoustics
ABSTRACT
The present work explores the acoustic characteristics of articulatory deviations near g(lottis) landmarks to derive correlates of cleft lip and palate speech intelligibility. The speech region around the g landmark is used to compute two different acoustic features: joint spectro-temporal features based on the two-dimensional discrete cosine transform, and Mel-frequency cepstral coefficients. Sentence-specific acoustic models are built using these features extracted from the normal speakers' group. The mean log-likelihood score of each test utterance under the corresponding model is computed and tested as an acoustic correlate of intelligibility. The derived intelligibility measure shows a significant correlation (ρ = 0.78, p < 0.001) with the perceptual ratings.
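The scoring idea, fitting a model on normal speakers' features and averaging the log-likelihood over a test utterance's frames, can be sketched as below. The abstract does not state which model family is used, so a single diagonal-covariance Gaussian is assumed here purely as a stand-in; the function names and the variance floor are likewise illustrative.

```python
import math

def fit_diag_gaussian(frames, var_floor=1e-6):
    """Estimate per-dimension mean and variance from training frames
    (each frame is a list of feature values). `var_floor` is an assumed
    safeguard against zero variance."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    var = [max(sum((f[k] - mean[k]) ** 2 for f in frames) / n, var_floor)
           for k in range(d)]
    return mean, var

def frame_log_likelihood(frame, mean, var):
    """Log density of one frame under the diagonal Gaussian."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))

def mean_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood of an utterance; a higher value
    means the utterance is closer to the normal-speaker model."""
    return sum(frame_log_likelihood(f, mean, var) for f in frames) / len(frames)
```

Averaging over frames keeps the score comparable across utterances of different lengths, which matters when the score is later correlated with perceptual ratings.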
Subjects
Cleft Lip/physiopathology, Glottis/anatomy & histology, Palate/physiopathology, Speech Intelligibility/classification, Algorithms, Child, Cleft Lip/complications, Female, Fourier Analysis, Glottis/physiology, Humans, India/epidemiology, Male, Palate/abnormalities, Speech Acoustics, Speech Disorders/physiopathology, Speech Disorders/rehabilitation, Speech Intelligibility/physiology, Speech Perception/physiology, Speech Production Measurement/methods
ABSTRACT
Intelligibility is considered one of the primary measures for speech rehabilitation of individuals with a cleft lip and palate (CLP). Objective methods based on speech processing and machine learning are attracting growing research interest as a way to quantify speech intelligibility. In this work, joint spectro-temporal features computed from a time-frequency representation of speech are explored to derive speech representations based on Gaussian posteriograms. A comparative framework using dynamic time warping (DTW) is used to quantify the intelligibility of child CLP speech. The DTW distance is used to score sentence-level intelligibility and is tested for correlation with perceptual intelligibility ratings obtained from expert speech-language pathologists. A baseline DTW system using conventional Mel-frequency cepstral coefficients (MFCCs) is also developed for comparison with the proposed system. Spearman's rank correlation coefficient between the objective intelligibility scores and the perceptual intelligibility ratings is studied, and a Williams significance test is conducted to assess the statistical significance of the correlation difference between the methods. The results show that the system based on joint spectro-temporal features significantly outperforms the MFCC-based system.
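The DTW comparison at the core of this framework can be sketched with the classic dynamic-programming recurrence. This is a minimal illustration: the Euclidean local distance and the unnormalized accumulated cost are assumptions, since the abstract does not say which local distance is applied to the posteriograms or whether the score is length-normalized.

```python
import math

def euclidean(a, b):
    """Local distance between two feature vectors (assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Accumulated cost of the best monotonic alignment between two
    sequences of feature vectors, allowing match, insertion and deletion."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # seq_a frame repeated against same seq_b frame
                                 cost[i][j - 1],      # seq_b frame repeated against same seq_a frame
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

A test utterance whose frames follow the reference closely, even at a different speaking rate, accumulates little cost, so a larger distance is read as lower intelligibility.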
ABSTRACT
In this paper, an acoustic analysis of misarticulated trills in cleft lip and palate speakers is carried out using two groups of features. The excitation source features, strength of excitation and fundamental frequency, are derived from the zero-frequency filtered signal. The vocal tract system features, first formant frequency (F1) and trill frequency, are derived from linear prediction analysis and an autocorrelation approach, respectively. These features are found to be statistically significant in discriminating normal from misarticulated trills. Using these acoustic features, a dynamic time warping based trill misarticulation detection system is demonstrated. The proposed system achieves an F1-score of 73.44%, compared with 66.11% for conventional Mel-frequency cepstral coefficients.
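One way an autocorrelation approach can yield a trill frequency is to look for the lag at which a slowly varying contour best matches a shifted copy of itself. The sketch below is an assumption-laden illustration, not the paper's procedure: the choice of input contour (for example an amplitude envelope), the search bounds `fmin` and `fmax`, and the function name are all hypothetical.

```python
import math

def autocorr_peak_frequency(contour, sample_rate, fmin, fmax):
    """Estimate the dominant modulation frequency (Hz) of `contour` by
    picking the autocorrelation peak among lags that correspond to
    frequencies between fmin and fmax."""
    n = len(contour)
    mean = sum(contour) / n
    x = [c - mean for c in contour]           # remove the DC offset
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(n - 1, int(sample_rate / fmin))
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag

# Synthetic check: a 25 Hz sinusoidal contour sampled at 1000 Hz.
contour = [math.sin(2.0 * math.pi * 25.0 * t / 1000.0) for t in range(400)]
estimate = autocorr_peak_frequency(contour, 1000, 15.0, 40.0)
```

Restricting the lag range keeps the search away from lag zero and from multiples of the true period, which would otherwise compete with the peak of interest.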