Results 1 - 20 of 1,194
1.
J Acoust Soc Am ; 155(6): 3848-3860, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38884524

ABSTRACT

Accurately classifying accents and assessing accentedness in non-native speakers are challenging tasks, primarily because of the complexity and diversity of accent and dialect variations. In this study, embeddings from advanced pretrained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment. Findings demonstrate that pretrained LID and SID models effectively encode accent/dialect information in speech. Furthermore, the accent information encoded by the LID and SID models complements an end-to-end (E2E) accent identification (AID) model trained from scratch. By incorporating all three embeddings, the proposed multi-embedding AID system achieves superior accuracy in AID. Next, automatic speech recognition (ASR) and AID models are investigated for accentedness estimation. The ASR model is an E2E connectionist temporal classification model trained exclusively with American English (en-US) utterances. The ASR error rate and the en-US output of the AID model are leveraged as objective accentedness scores. Evaluation results demonstrate a strong correlation between the scores estimated by the two models. Additionally, a robust correlation between objective accentedness scores and subjective scores based on human perception is demonstrated, providing evidence for the reliability and validity of using AID-based and ASR-based systems for accentedness assessment in non-native speech. Such advanced systems would benefit accent assessment in language learning, speech and speaker assessment for intelligibility and quality, and advances in speaker diarization and speech recognition.
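
As an illustrative sketch only (not the authors' implementation), an ASR-based accentedness score of this kind can be approximated by the word error rate: the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical usage: a higher value would suggest stronger accentedness
# as "heard" by an en-US-only recognizer.
score = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
```

A score like this is only meaningful relative to other speakers scored with the same recognizer and the same prompts.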


Subjects
Speech Perception , Speech Recognition Software , Humans , Speech Perception/physiology , Speech Acoustics , Phonetics , Language , Speech Production Measurement/methods , Female , Male
2.
J Acoust Soc Am ; 155(6): 3877-3888, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38888391

ABSTRACT

The quality of speech input influences the efficiency of L1 and L2 acquisition. This study examined modifications in infant-directed speech (IDS) and foreigner-directed speech (FDS) in Standard Mandarin, a tonal language, and explored how IDS and FDS features were manifested in disyllabic words and a longer discourse. The study aimed to determine which characteristics of IDS and FDS were enhanced in comparison with adult-directed speech (ADS), and how IDS and FDS differed when measured in a common set of acoustic parameters. For words, it was found that tone-bearing vowel duration, mean and range of fundamental frequency (F0), and the lexical tone contours were enhanced in IDS and FDS relative to ADS, except for the dipping Tone 3, which exhibited an unexpected lowering in FDS but no modification in IDS when compared with ADS. For the discourse, different aspects of temporal and F0 enhancements were emphasized in IDS and FDS: the mean F0 was higher in IDS whereas the total discourse duration was greater in FDS. These findings add to the growing literature on L1 and L2 speech input characteristics and their role in language acquisition.


Subjects
Speech Acoustics , Humans , Female , Male , Infant , Adult , Phonetics , Speech Production Measurement/methods , Young Adult , Multilingualism , Voice Quality , Acoustics , Language , Time Factors , Speech Perception
3.
J Acoust Soc Am ; 155(4): 2836-2848, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38682915

ABSTRACT

This paper evaluates an innovative framework for spoken dialect density prediction on children's and adults' African American English. A speaker's dialect density is defined as the frequency with which dialect-specific language characteristics occur in their speech. Rather than treating the presence or absence of a target dialect in a user's speech as a binary decision, a classifier is trained to predict the level of dialect density to provide a higher degree of specificity in downstream tasks. For this, self-supervised learning representations from HuBERT, handcrafted grammar-based features extracted from ASR transcripts, prosodic features, and other feature sets are tested as inputs to an XGBoost classifier. The classifier is then trained to assign dialect density labels to short recorded utterances. High dialect density level classification accuracy is achieved for child and adult speech, and robust performance is demonstrated across age and regional varieties of dialect. Additionally, this work is used as a basis for analyzing which acoustic and grammatical cues affect machine perception of dialect.
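
The definition of dialect density above can be sketched as a simple ratio. The abstract does not state the normalizing unit or the level boundaries, so the per-word normalization, the function names, and the cut points below are all assumptions made for illustration.

```python
def dialect_density(feature_count: int, word_count: int) -> float:
    """Dialect-specific features per word in one utterance (assumed unit)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return feature_count / word_count


def density_level(density: float, thresholds=(0.05, 0.15)) -> str:
    """Map a density to a coarse level; the cut points are illustrative only."""
    low, high = thresholds
    if density < low:
        return "low"
    if density < high:
        return "medium"
    return "high"


# Hypothetical utterance: 3 dialect-specific features in 30 words.
level = density_level(dialect_density(3, 30))
```

Labels produced this way are the kind of target a multi-class classifier could be trained on, in place of a binary dialect/no-dialect decision.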


Subjects
Black or African American , Speech Acoustics , Humans , Adult , Child , Male , Female , Speech Production Measurement/methods , Language , Preschool Child , Young Adult , Speech Perception , Adolescent , Phonetics , Child Language
4.
Clin Linguist Phon ; 38(2): 97-115, 2024 03.
Article in English | MEDLINE | ID: mdl-36592050

ABSTRACT

This study examined the possibility of using acoustic parameters, namely the Acoustic Voice Quality Index (AVQI) and Maximum Phonation Time (MPT), to predict the degree of lung involvement in COVID-19 patients. This cross-sectional case-control study was conducted on voice samples collected from 163 healthy individuals and 181 patients with COVID-19. Each participant produced a sustained vowel /a/ and a phonetically balanced Persian text containing 36 syllables. AVQI and MPT were measured using Praat scripts. Each patient underwent a non-enhanced chest computed tomographic scan, and the Total Opacity (TO) score was rated to assess the degree of lung involvement. The results revealed significant differences between patients with COVID-19 and healthy individuals in terms of AVQI and MPT. A significant difference was also observed between male and female participants in AVQI and MPT. Receiver operating characteristic curve analysis indicated that MPT (area under the curve 0.909) had higher diagnostic accuracy than AVQI (0.771). A significant relationship was observed between AVQI and TO scores; no such relationship was observed for MPT. The findings indicated that MPT was a better classifier than AVQI in differentiating patients from healthy individuals. The results also showed that AVQI can be used as a predictor of the degree of lung involvement in patients and recovered individuals. A formula is suggested for calculating the degree of lung involvement using AVQI.
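
For readers unfamiliar with the area-under-the-curve figures quoted above, the sketch below computes an AUC directly from two groups of scores, using the rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name and the sample values are hypothetical and are not taken from the study.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical phonation times in seconds. Shorter times are assumed to
# indicate patients, so the values are negated to make "higher = patient".
patients = [-8.0, -9.5, -11.0]
controls = [-10.0, -15.0, -18.0]
auc = roc_auc(patients, controls)
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation of the two groups.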


Subjects
COVID-19 , Dysphonia , Humans , Male , Female , Dysphonia/diagnosis , Speech Acoustics , Case-Control Studies , Feasibility Studies , Cross-Sectional Studies , Reproducibility of Results , Severity of Illness Index , Acoustics , Tomography , Speech Production Measurement/methods
5.
Int J Lang Commun Disord ; 58(4): 1251-1267, 2023.
Article in English | MEDLINE | ID: mdl-36861494

ABSTRACT

BACKGROUND: Speech-language pathologists often multitask in order to be efficient with their commonly large caseloads. In stuttering assessment, multitasking often involves collecting multiple measures simultaneously. AIMS: The present study sought to determine reliability when collecting multiple measures simultaneously versus individually. METHODS & PROCEDURES: Over two time periods, 50 graduate students viewed videos of four persons who stutter (PWS) and counted the number of stuttered syllables and total number of syllables uttered, and rated speech naturalness. Students were randomly assigned to one of two groups: the simultaneous group, in which all measures were gathered during one viewing; and the individual group, in which one measure was gathered per viewing. Relative and absolute intra- and inter-rater reliability values were calculated for each measure. OUTCOMES & RESULTS: The following results were notable: better intra-rater relative reliability for the number of stuttered syllables for the individual group (intraclass correlation coefficient (ICC) = 0.839) compared with the simultaneous group (ICC = 0.350), smaller intra-rater standard error of measurement (SEM) (i.e., better absolute reliability) for the number of stuttered syllables for the individual group (7.40) versus the simultaneous group (15.67), and better inter-rater absolute reliability for the total number of syllables for the individual group (88.29) compared with the simultaneous group (125.05). Absolute reliability was unacceptable for all measures across both groups. CONCLUSIONS & IMPLICATIONS: These findings show that judges are likely to be more reliable when identifying stuttered syllables in isolation than when simultaneously collecting them with total syllables spoken and naturalness data. 
Results are discussed in terms of narrowing the reliability gap between data collection methods for stuttered syllables, improving overall reliability of stuttering measurements, and a procedural change when implementing widely used stuttering assessment protocols. WHAT THIS PAPER ADDS: What is already known on the subject: The reliability of stuttering judgments has been found to be unacceptable across a number of studies, including those examining the reliability of the most popular stuttering assessment tool, the Stuttering Severity Instrument (4th edition). The SSI-4, and other assessment applications, involve collecting multiple measures simultaneously. It has been suggested, but not examined, that collecting measures simultaneously, which occurs in the most popular stuttering assessment protocols, may result in substantially inferior reliability when compared to collecting measures individually. What this paper adds to existing knowledge: The present study has multiple novel findings. First, relative and absolute intra-rater reliability were substantially better when stuttered syllables data were collected individually compared to when the same data were collected simultaneously with total number of syllables and speech naturalness data. Second, inter-rater absolute reliability for total number of syllables was also substantially better when collected individually. Third, intra-rater and inter-rater reliability were similar when speech naturalness ratings were given individually compared to when they were given while simultaneously counting stuttered and fluent syllables. What are the potential or actual clinical implications of this work? Clinicians can be more reliable when identifying stuttered syllables individually compared to when they judge stuttering along with other clinical measures of stuttering. 
In addition, when clinicians and researchers use current popular protocols for assessing stuttering that recommend simultaneous data collection, including the SSI-4, they should instead consider collecting stuttering event counts individually. This procedural change will lead to more reliable data and stronger clinical decision making.
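
The abstract reports both relative reliability (ICC) and absolute reliability (SEM). A common textbook relation between the two is SEM = SD × sqrt(1 − ICC); the abstract does not say whether the authors used exactly this formula, and it does not report the standard deviations, so the SD in the sketch below is a hypothetical placeholder.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM from a score standard deviation and a reliability coefficient."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if icc > 1.0:
        raise ValueError("icc cannot exceed 1")
    return sd * math.sqrt(1.0 - icc)


# Hypothetical SD of 10 stuttered syllables, combined with the two
# intra-rater ICC values quoted in the abstract.
sem_individual = standard_error_of_measurement(10.0, 0.839)
sem_simultaneous = standard_error_of_measurement(10.0, 0.350)
```

With the spread held equal, the lower ICC yields the larger SEM, which matches the direction of the difference reported between the two groups.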


Subjects
Stuttering , Humans , Reproducibility of Results , Severity of Illness Index , Speech , Speech Production Measurement/methods , Stuttering/diagnosis
6.
Int J Lang Commun Disord ; 58(2): 279-294, 2023 03.
Article in English | MEDLINE | ID: mdl-36117378

ABSTRACT

BACKGROUND: Auditory-perceptual assessment of voice is a subjective procedure. Artificial intelligence with deep learning (DL) may improve the consistency and accessibility of this task. It is unclear how a DL model performs on different acoustic features. AIMS: To develop a generalizable DL framework for identifying dysphonia using a multidimensional acoustic feature. METHODS & PROCEDURES: Recordings of sustained phonations of /a/ and /i/ were retrospectively collected from a clinical database. Subjects comprised 238 dysphonic and 223 vocally healthy speakers of Chinese Mandarin. All audio clips were split into multiple 1.5-s segments and normalized to the same loudness level. Mel frequency cepstral coefficients and mel-spectrogram were extracted from these standardized segments. Each set of features was used in a convolutional neural network (CNN) to perform a binary classification task. The best feature was obtained through a five-fold cross-validation on a random selection of 80% of the data. The resultant DL framework was tested on the remaining 20% of the data and a public German voice database. The performance of the DL framework was compared with those of two baseline machine-learning models. OUTCOMES & RESULTS: The mel-spectrogram yielded the best model performance, with a mean area under the receiver operating characteristic curve of 0.972 and an accuracy of 92% in classifying audio segments. The resultant DL framework significantly outperformed both baseline models in detecting dysphonic subjects on both test sets. The best outcomes were achieved when classifications were made based on all segments of both vowels, with 95% accuracy, 92% recall, 98% precision and 98% specificity on the Chinese test set, and 92%, 95%, 90% and 89%, respectively, on the German set. CONCLUSIONS & IMPLICATIONS: This study demonstrates the feasibility of DL for automatic detection of dysphonia. The mel-spectrogram is a preferred acoustic feature for the task. 
This framework may be used for vocal health screening and facilitate automatic perceptual evaluation of voice in the era of big data. WHAT THIS PAPER ADDS: What is already known on this subject: Auditory-perceptual assessment is the current gold standard in clinical evaluation of voice quality, but its value may be limited by the rater's reliability and accessibility. DL is a new method of artificial intelligence that can overcome these disadvantages and promote automatic voice assessment. This study explored the feasibility of a DL approach for automatic detection of dysphonia, along with a quantitative comparison of two common sets of acoustic features. What this study adds to existing knowledge: A CNN model is excellent at decoding multidimensional acoustic features, outperforming the baseline parameter-based models in identifying dysphonic voices. The first 13 mel-frequency cepstral coefficients (MFCCs) are sufficient for this task. The mel-spectrogram results in greater performance, indicating the acoustic features are presented in a more favourable way than the MFCCs to the CNN model. What are the potential or actual clinical implications of this work? DL is a feasible method for the detection of dysphonia. The current DL framework may be used for remote vocal health screening or documenting voice recovery after treatment. In future, DL models may potentially be used to perform auditory-perceptual tasks in an automatic, efficient, reliable and low-cost manner.
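
Two steps mentioned in this abstract lend themselves to a short sketch: cutting a recording into fixed 1.5-s segments, and deriving accuracy, recall, precision and specificity from a confusion matrix. This is not the authors' code. Dropping the trailing partial segment is an assumption, and the sample rate and confusion-matrix counts are invented for illustration.

```python
def split_into_segments(samples, sample_rate, seconds=1.5):
    """Cut a sample sequence into non-overlapping fixed-length segments.

    Any trailing part shorter than one full segment is dropped (assumption).
    """
    seg_len = int(sample_rate * seconds)
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    if min(tp + fn, tp + fp, tn + fp) == 0:
        raise ValueError("each metric needs a non-zero denominator")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),        # sensitivity: dysphonic found
        "precision": tp / (tp + fp),     # flagged cases that are dysphonic
        "specificity": tn / (tn + fp),   # healthy correctly passed
    }


# Hypothetical counts, with "positive" meaning dysphonic.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
```

Reporting all four figures matters in screening because accuracy alone can hide a poor recall or a poor specificity when the classes are unbalanced.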


Subjects
Deep Learning , Dysphonia , Humans , Dysphonia/diagnosis , Speech Acoustics , Retrospective Studies , Artificial Intelligence , Reproducibility of Results , Speech Production Measurement/methods , Acoustics
7.
Sensors (Basel) ; 23(11), 2023 May 30.
Article in English | MEDLINE | ID: mdl-37299922

ABSTRACT

Biometrics-based authentication has become the most well-established form of user recognition in systems that demand a certain level of security. Commonplace examples include access to the work environment or to one's own bank account. Among all biometrics, voice receives special attention due to factors such as ease of collection, the low cost of reading devices, and the large body of literature and software packages available for use. However, the ability of this biometric to represent the individual may be impaired by the phenomenon known as dysphonia, a change in the sound signal caused by a disease that acts on the vocal apparatus. As a consequence, for example, a user with the flu may not be properly authenticated by the recognition system. Therefore, it is important that automatic voice dysphonia detection techniques be developed. In this work, we propose a new framework based on the representation of the voice signal by the multiple projection of cepstral coefficients to detect dysphonic alterations in the voice through machine learning techniques. Most of the best-known cepstral coefficient extraction techniques in the literature are mapped and analyzed separately and together with measures related to the fundamental frequency of the voice signal, and their representation capacity is evaluated on three classifiers. Finally, experiments on a subset of the Saarbruecken Voice Database demonstrate the effectiveness of the proposed framework in detecting the presence of dysphonia in the voice.


Subjects
Dysphonia , Voice , Humans , Dysphonia/diagnosis , Speech Acoustics , Voice Quality , Speech Production Measurement/methods
8.
Clin Linguist Phon ; 37(3): 258-271, 2023 03 04.
Article in English | MEDLINE | ID: mdl-35652557

ABSTRACT

This study aimed to determine the effect of phonological and morphological factors on the dysfluencies of Nepali-speaking adults who stutter. Eighteen Nepali-speaking adult speakers with mild to very severe developmental stuttering were recruited. Spontaneous speech samples were audio-video recorded and orthographically transcribed. A total of 350 syllables were analysed to calculate stuttering frequency. Phoneme position, phoneme category, and word length were considered as the phonological factors and word class as the morphological factor. The percentage of stuttering for each of these variables was computed. The results showed a significant effect of phoneme position and word length but no effect of phoneme category. Significantly more stuttering was observed in word-initial position and on longer words than in word-medial position and on shorter words, respectively. Among the morphological factors, content words and content-function words had a greater stuttering rate than function words. This study showed a significant effect of phoneme position, word length, and grammatical class on the frequency of dysfluency in Nepali-speaking adults who stutter but no effect of phoneme category. The phonetic complexity of these variables may lead to an increase in motor planning demand resulting in more stuttering.


Subjects
Stuttering , Adult , Humans , Speech Production Measurement/methods , Language , Speech , Phonetics
9.
Clin Linguist Phon ; 37(4-6): 454-472, 2023 06 03.
Article in English | MEDLINE | ID: mdl-35801560

ABSTRACT

There is a general need for more knowledge on the development of French phonology, and little information is currently available for typically developing French-speaking three-year-old children. This study took place in Belgium and explores the accuracy of speech production of 34 typically developing French-speaking children using a picture naming task. Measures of speech accuracy revealed lower performance than previously seen in the literature. We investigated speech accuracy across different phonological contexts in light of characteristics of target words that are known to have an influence on speech production, namely the condition of production (spontaneous vs. imitated), the length of the word (in number of syllables), syllable complexity (singleton vs. cluster) and positional complexity (onset vs. coda). Results indicate that the accuracy of words produced spontaneously did not differ from imitated words. The presence of consonant clusters in the target word was associated with lower performance on measures of Percentage of Consonants Correct and Whole Word Proximity for both 1- and 4-syllable words. Singleton codas were produced less accurately than onsets in 1-syllable words. Word-internal singleton codas were produced less accurately than final codas. In our sample, 1-syllable words showed surprisingly low levels of performance which we can explain by an over-representation of phonologically complex properties in the target words used in the present study. These results highlight the importance of assessing various aspects of phonological complexity in French speech tasks in order to detect developmental errors in typically developing children and, ultimately, help identify children with speech sound disorders.
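
Percentage of Consonants Correct, one of the accuracy measures named above, is the share of target consonants that the child produced correctly. The sketch below assumes that the target and produced consonants have already been aligned position by position, with `None` marking an omission; real scoring requires a phonetic alignment step that is not shown. The function name and the sample word are hypothetical.

```python
def percentage_consonants_correct(target, produced):
    """PCC for pre-aligned consonant sequences (None marks an omission)."""
    if len(target) != len(produced):
        raise ValueError("sequences must be aligned to the same length")
    if not target:
        raise ValueError("at least one target consonant is required")
    correct = sum(1 for t, p in zip(target, produced) if t == p)
    return 100.0 * correct / len(target)


# Hypothetical item: a word with the cluster /tr/ plus a final /n/, in
# which the child omits /r/ (cluster reduction).
pcc = percentage_consonants_correct(["t", "r", "n"], ["t", None, "n"])
```

Because a reduced cluster costs one consonant out of only a few, short words with clusters can produce low PCC values, which is consistent with the pattern the abstract describes for 1-syllable words.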


Subjects
Language , Phonetics , Humans , Child , Preschool Child , Speech Production Measurement/methods , Speech , Child Language
10.
Clin Linguist Phon ; 37(1): 1-16, 2023 01 02.
Article in English | MEDLINE | ID: mdl-34844496

ABSTRACT

This study aimed to investigate the linguistic factors involved in stuttering among Japanese-speaking preschool children. The participants included 10 Japanese children who stutter, with a mean age of 5 years and 9 months. Speech samples comprised spontaneous conversations of the participants with their parents for about 20 minutes. We compared the percentages of the occurrence of stuttering-like disfluencies (SLDs) at the word and sentence levels, using the Wilcoxon signed-rank test. The results showed no significant differences in SLDs based on syllable structure when comparing light and heavy syllables and comparing consonants and vowels in the initial position of each content word. SLDs occurred more frequently in the initial than non-initial position of words and in longer rather than shorter words. Additionally, SLDs occurred more frequently in sentences that contained more 'bunsetsu' (a kind of linguistic unit in Japanese). Our study is the first to show that both word and sentence-level factors could contribute to SLDs in preschool children who stutter in agglutinating languages, such as Japanese. This aspect is rarely reported in psycholinguistic studies based on stuttering occurrence in inflecting languages, such as English.
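
The Wilcoxon signed-rank test used in this study compares paired measurements, for example each child's SLD percentage in word-initial versus non-initial position. The sketch below computes only the two rank sums, using the standard procedure of dropping zero differences and assigning average ranks to ties. Converting the rank sums to a p-value is omitted, and the sample percentages are invented.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical SLD percentages for four children in two conditions.
initial = [12.0, 11.0, 13.0, 14.0]
non_initial = [10.0, 12.0, 9.0, 14.0]
w_plus, w_minus = wilcoxon_signed_rank(initial, non_initial)
```

The smaller of the two sums is the usual test statistic. With only 10 participants, as in this study, an exact reference table is generally preferred over the normal approximation.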


Subjects
Stuttering , Humans , Preschool Child , East Asian People , Speech Production Measurement/methods , Linguistics/methods , Speech
11.
Clin Linguist Phon ; 37(12): 1141-1156, 2023 12 02.
Article in English | MEDLINE | ID: mdl-36592037

ABSTRACT

Speech-language pathologists regularly use perceptual methods in clinical practice to assess children's speech. In this study, we examined relationships between measures of speech intelligibility, clinical articulation test results, age, and perceptual ratings of articulatory goodness for children. We also examined the extent to which established measures of intelligibility and clinical articulation test results predicted articulatory goodness ratings, and whether goodness ratings were influenced by intelligibility. A sample of 164 typically developing children (aged 30-47 months) provided speech samples and completed a standardised articulation test. Single word intelligibility scores and ratings of articulatory goodness were gathered from 328 naïve listeners. Bivariate Pearson correlation, linear regression, and linear mixed effects modelling were used for analysis. Results showed that articulatory goodness ratings had the highest correlation with intelligibility, followed by age, followed by articulation score. Age and clinical articulation scores were both significant predictors of goodness ratings, but articulation scores made only a small contribution to prediction. Articulatory goodness ratings were substantially lower for unintelligible words compared to intelligible words, but articulatory goodness scores increased with age at the same rate for unintelligible and intelligible words. Perceptual ratings of articulatory goodness are sensitive to developmental changes in speech production (regardless of intelligibility) and yield a different kind of information than clinical articulation scores from standardised measures.
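
The bivariate Pearson correlation named in the analysis can be written out in a few lines. The sketch below is a plain-Python version of the standard formula; the goodness and intelligibility values are made up for illustration and do not come from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-child goodness ratings and intelligibility scores.
goodness = [1.0, 2.0, 3.0, 4.0]
intelligibility = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(goodness, intelligibility)
```

A correlation of this kind describes association across children only. The mixed-effects models mentioned in the abstract are what account for repeated ratings from the same listeners and children.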


Subjects
Phonetics , Speech Intelligibility , Child , Preschool Child , Humans , Cognition , Speech Production Measurement/methods , Articulation Disorders
12.
Clin Linguist Phon ; 37(1): 52-76, 2023 01 02.
Article in English | MEDLINE | ID: mdl-34955083

ABSTRACT

Speech intelligibility is an essential though complex construct in speech pathology. In this paper, we investigated the interrater reliability and validity of two types of intelligibility measures: a rating-based measure, through Visual Analogue Scales (VAS), and a transcription-based measure called Accuracy of Words (AcW), through two forms of orthographic transcriptions, one containing only existing words (EWTrans) and one allowing all sorts of words, including both existing words and pseudowords (AWTrans). Both VAS and AcW scores were collected from five expert raters. We selected speakers with various severity levels of dysarthria (SevL) and employed two types of speech materials, i.e. meaningful sentences and word lists. To measure reliability, we applied Generalizability Theory, which is relatively unknown in the field of pathological speech and language research but enables more comprehensive analyses than traditional methods, e.g., the intraclass correlation coefficient. The results convincingly indicate that five expert raters were sufficient to provide reliable rating-based (VAS) and transcription-based (AcW) measures, and that reliability increased as the number of raters or utterances increased. Generalizability Theory has proved effective in systematically dealing with reliability issues in our experimental design. We also investigated construct and concurrent validity. Construct validity was addressed by exploring the correlations between VAS and AcW within and across speech materials. Concurrent validity was addressed by exploring the correlations between our measures, i.e. VAS and AcW, and two external measures, i.e. phoneme intelligibility and SevL. The correlations corroborate the validity of VAS and AcW to assess speech intelligibility, both in sentences and word lists.


Subjects
Speech Intelligibility , Speech-Language Pathology , Humans , Reproducibility of Results , Dysarthria/diagnosis , Speech Production Measurement/methods
13.
Clin Linguist Phon ; 37(4-6): 345-362, 2023 06 03.
Article in English | MEDLINE | ID: mdl-36106455

ABSTRACT

Accumulating evidence suggests that ultrasound visual feedback increases the treatment efficacy for persistent speech sound errors. However, the available evidence is mostly from English. This is a feasibility study of ultrasound visual feedback for treating distortion of Finnish [r]. We developed a web-based application for auditory-perceptual judgement. We investigated the impact of listener's experience on perceptual judgement and the intra-rater reliability of listeners. Four boys (10-11 years) with distortion of [r] and otherwise typical development took part in eight ultrasound treatment sessions. In total, 117 [r] samples collected pre- and post-intervention were judged on a visual analogue scale (VAS) by two listener groups: five speech and language therapists (SLTs) and six SLT students. We constructed a linear mixed-effects model with fixed effects for time and listener group and several random effects. Our findings indicate that measurement time had a significant main effect on judgement results, χ² = 78.82, p < 0.001. The effect of listener group was non-significant, but a significant group × time interaction was observed, χ² = 6.33, p < 0.012. We further explored the effect of group with nested models, and results revealed a non-significant effect of group. The average intra-rater correlation of the 11 listeners was 0.83 for the pre-intervention samples and 0.92 for the post-intervention samples, showing a good or excellent degree of agreement. The Finnish [r] sound can be evaluated with a VAS, and ultrasound visual feedback is a feasible and promising method for treating distortion of [r]; its efficacy should be further assessed.


Subjects
Sensory Feedback , Speech Perception , Male , Humans , Visual Analog Scale , Reproducibility of Results , Finland , Feasibility Studies , Speech Production Measurement/methods
14.
Clin Linguist Phon ; 37(10): 935-957, 2023 10 03.
Article in English | MEDLINE | ID: mdl-35971981

ABSTRACT

This multiple case pilot study explored how nonword imitation influences articulatory and segmental performance in children with and without speech disorder. Eight children aged 4 to 8 years participated, including two children with childhood apraxia of speech (CAS), four children with phonological disorder (PD), and two children with typical development (TD). Tokens included two complexity types and were presented in random order. Minimal feedback was provided and nonwords were never associated with a referent. Kinematic and transcription data were analysed to examine articulatory variability, segmental accuracy, and segmental variability in session 1 and session 5. Descriptive statistics, percent change, effect sizes, and Pearson correlations are reported. In session 1, the two participants with CAS showed high articulatory variability, low segmental accuracy, and high segmental variability compared to the participants with PD and TD. By session 5, both participants with CAS, two with PD, and one with TD showed increased articulatory variability in the lowest complexity nonword. Segmental accuracy remained low and variability remained high for the two participants with CAS in session 5, whereas several participants with PD and TD showed improved segmental performance. Articulatory and segmental variability were not significantly correlated. The results of this study suggest that motor practice with minimal feedback and no assignment of a lexical referent can instantiate positive changes to segmental performance for children without apraxia. Positive changes to segmental performance are not necessarily related to increased articulatory control; these two processing levels can show distinct and disparate learning trajectories.


Subjects
Apraxias , Speech , Humans , Child , Preschool Child , Pilot Projects , Speech Disorders , Speech Production Measurement/methods
15.
Clin Linguist Phon ; 37(2): 169-195, 2023 02 01.
Article in English | MEDLINE | ID: mdl-35243947

ABSTRACT

Speech sound disorders can pose a challenge to communication in children that may persist into adulthood. As some speech sounds are known to require differential control of anterior versus posterior regions of the tongue body, valid measurement of the degree of differentiation of a given tongue shape has the potential to shed light on development of motor skill in typical and disordered speakers. The current study sought to compare the success of multiple techniques in quantifying tongue shape complexity as an index of degree of lingual differentiation in child and adult speakers. Using a pre-existing data set of ultrasound images of tongue shapes from adult speakers producing a variety of phonemes, we compared the extent to which three metrics of tongue shape complexity differed across phonemes/phoneme classes that were expected to differ in articulatory complexity. We then repeated this process with ultrasound tongue shapes produced by a sample of young children. The results of these comparisons suggested that a modified curvature index and a metric representing the number of inflection points best reflected small changes in tongue shapes across individuals differing in vocal tract size. Ultimately, these metrics have the potential to reveal delays in motor skill in young children, which could inform assessment procedures and treatment decisions for children with speech delays and disorders.
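
One of the metrics mentioned above, the number of inflection points, can be illustrated on a traced contour. The sketch below counts how often the turning direction changes along a sequence of (x, y) points, using the sign of the cross product of successive segments. The abstract does not describe the authors' exact procedure (for example, any smoothing), so this is an assumption-laden approximation with invented contours.

```python
def count_inflection_points(points, tol=1e-9):
    """Count changes in turning direction along a polyline of (x, y) points."""
    turns = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        # Cross product of consecutive segments: its sign gives the turn
        # direction (positive = counter-clockwise, negative = clockwise).
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if abs(cross) > tol:  # ignore locally straight stretches
            turns.append(1 if cross > 0 else -1)
    return sum(1 for a, b in zip(turns, turns[1:]) if a != b)


# Hypothetical contours: a single arch, and an S-shaped curve with one
# change of curvature direction.
arch = [(0, 0), (1, 1), (2, 1.5), (3, 1), (4, 0)]
s_curve = [(0, 0), (1, 1), (2, 0), (3, -1), (4, 0)]
n_arch = count_inflection_points(arch)
n_s = count_inflection_points(s_curve)
```

On real ultrasound traces the contour would normally be smoothed first, since measurement noise alone can flip the sign of the cross product.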


Subjects
Benchmarking , Phonetics , Adult , Humans , Child , Preschool Child , Speech Production Measurement/methods , Speech , Tongue/diagnostic imaging , Ultrasonography/methods
16.
Vestn Otorinolaringol ; 88(5): 23-26, 2023.
Article in Russian | MEDLINE | ID: mdl-37970766

ABSTRACT

The Cepstral Peak Prominence (CPP) test was used to evaluate the effectiveness of treatment in patients with functional dysphonia. Twenty dysphonic women aged 18 to 47 years were under observation. The control group consisted of 20 healthy women of similar age. Patients underwent 5-7 sessions of electrostimulation of the laryngeal muscles and phonopedic treatment, after which complete restoration of the voice was noted. The Praat clinical program was used, installed on a Hewlett-Packard 630 laptop (Pentium B960, 2.2 GHz), together with a SHURE SM94 condenser microphone. In the control group, the results were as follows: M=7.49 (SD=1.26) dB. In the main group before treatment: M=5.00 (SD=1.07) dB, after treatment: M=7.95 (SD=1.34) dB. Differences in CPP values in the main group before and after treatment (5.00 dB and 7.95 dB, respectively) were significant at p<0.0001. Differences in CPP values in the main group before treatment (5.00 dB) and in the control group (7.49 dB) were significant at p<0.0001. Differences in CPP values in the main group after treatment (7.95 dB) and in the control group (7.49 dB) were not significant (p>0.05). The study showed high sensitivity of the method. The CPP data after treatment were higher than those before treatment and did not differ from the control values. It is concluded that CPP is a highly sensitive method for evaluating the degree of periodicity of an acoustic signal and can be used to evaluate the effectiveness of treatment in patients with functional dysphonia.


Subjects
Dysphonia , Voice , Humans , Female , Dysphonia/diagnosis , Dysphonia/therapy , Speech Acoustics , Speech Production Measurement/methods , Acoustics
17.
Eur Arch Otorhinolaryngol ; 279(9): 4617-4621, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35522325

ABSTRACT

PURPOSE: To investigate whether the Acoustic Voice Quality Index (AVQI) and the Acoustic Breathiness Index (ABI) remain valid and comparable to previous unmasked measurements when the speaker wears a surgical mask or an FFP-2 mask to reduce the risk of transmitting airborne viruses such as SARS-CoV-2. METHODS: A convenience sample of 31 subjectively healthy participants underwent AVQI and ABI voice examination four times: twice wearing no mask, once with a surgical mask, and once with an FFP-2 mask as used regularly in our hospital. The order of the four mask conditions was randomized. The difference in results between the two recordings without a mask was then compared to the differences between the recordings with each mask and one recording without a mask. RESULTS: Sixty-two percent of the AVQI readings without a mask represented perfectly healthy voices; the largest AVQI value without a mask was 4.0. The mean absolute difference in AVQI was 0.45 between the measurements without masks, 0.48 between no mask and surgical mask, and 0.51 between no mask and FFP-2 mask. These differences were neither clinically nor statistically significant. For the ABI, the corresponding absolute differences (in the same order) were 0.48, 0.69 and 0.56, again neither clinically nor statistically significant. CONCLUSION: Based on a convenience sample of healthy or only mildly impaired voices, wearing COVID-19 protective masks does not substantially impair either AVQI or ABI results.


Subjects
COVID-19 , Dysphonia , Acoustics , COVID-19/prevention & control , Dysphonia/diagnosis , Humans , Masks , Reproducibility of Results , SARS-CoV-2 , Severity of Illness Index , Speech Acoustics , Speech Production Measurement/methods , Voice Quality
18.
J Acoust Soc Am ; 152(1): 580, 2022 07.
Article in English | MEDLINE | ID: mdl-35931551

ABSTRACT

Recent studies have advocated for the use of connected speech in clinical voice and speech assessment. This suggestion is based on the presence of clinically relevant information within the onset, offset, and variation in connected speech. Existing works on connected speech utilize methods originally designed for analysis of sustained vowels and, hence, cannot properly quantify the transient behavior of connected speech. This study presents a non-parametric approach to analysis based on a two-dimensional, temporal-spectral representation of speech. Variations along horizontal and vertical axes corresponding to the temporal and spectral dynamics of speech were quantified using two statistical models. The first, a spectral model, was defined as the probability of changes between the energy of two consecutive frequency sub-bands at a fixed time segment. The second, a temporal model, was defined as the probability of changes in the energy of a sub-band between consecutive time segments. As the first step of demonstrating the efficacy and utility of the proposed method, a diagnostic framework was adopted in this study. Data obtained revealed that the proposed method has (at minimum) significant discriminatory power over the existing alternative approaches.
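The two models can be illustrated with a toy estimate of change probabilities on a time-by-sub-band energy grid. This is a sketch under stated assumptions, not the authors' method: the abstract does not say how a "change" is quantized, so the three labels (`down`, `flat`, `up`), the tolerance `tol`, and all function names below are hypothetical.

```python
from collections import Counter

LABELS = ("down", "flat", "up")


def change_label(delta, tol):
    """Quantize an energy difference into one of three assumed categories."""
    if delta > tol:
        return "up"
    if delta < -tol:
        return "down"
    return "flat"


def _normalize(counts):
    """Turn raw counts into relative frequencies over the three labels."""
    total = sum(counts.values())
    return {k: counts[k] / total for k in LABELS} if total else {}


def spectral_model(energy, tol=1.0):
    """Change probabilities between adjacent sub-bands within each time segment.

    energy[t][b] is the energy (e.g. in dB) of sub-band b at time segment t.
    """
    counts = Counter()
    for frame in energy:
        for low, high in zip(frame, frame[1:]):
            counts[change_label(high - low, tol)] += 1
    return _normalize(counts)


def temporal_model(energy, tol=1.0):
    """Change probabilities of each sub-band between consecutive time segments."""
    counts = Counter()
    for prev, cur in zip(energy, energy[1:]):
        for a, b in zip(prev, cur):
            counts[change_label(b - a, tol)] += 1
    return _normalize(counts)
```

In a diagnostic setting of the kind the abstract describes, distributions like these could serve as features for a classifier; that step is left out here because the abstract gives no details about it.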


Subjects
Speech Perception , Speech , Acoustics , Speech Acoustics , Speech Production Measurement/methods
19.
J Acoust Soc Am ; 152(6): 3444, 2022 12.
Article in English | MEDLINE | ID: mdl-36586884

ABSTRACT

When making voice interactions with hands-free speech communication devices, direction-of-arrival estimation is an essential step. To address the detrimental influence of unavoidable background noise and interference speech on direction-of-arrival estimation, this paper introduces a stacked self-attention network system, a supervised deep learning method that enables utterance level estimation without requirement for any pre-processing such as voice activity detection. Specifically, alternately stacked time- and frequency-dependent self-attention blocks are designed to process information in terms of time and frequency, respectively. The former blocks focus on the importance of each time frame of the received audio mixture and perform temporal selection to reduce the influence of non-speech and interference frames, while the latter blocks are utilized to derive inner-correlation among different frequencies. Additionally, the non-causal convolution and self-attention networks are replaced by causal ones, enabling real-time direction-of-arrival estimation with a latency of only 6.25 ms. Experiments with simulated and measured room impulse responses, as well as real recordings, verify the advantages of the proposed method over the state-of-the-art baselines.
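The alternation of time-wise and frequency-wise attention can be sketched on a small time-by-frequency feature grid. This is a deliberately stripped-down illustration, not the network from the paper: it uses identity query/key/value projections, a single head, and no residual connections, normalization, or convolution, all of which a trained model would normally have. The function names and the `n_pairs` parameter are assumptions.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(rows, causal=False):
    """Scaled dot-product self-attention with identity projections.

    rows is a list of equal-length vectors. With causal=True, row i only
    attends to rows 0..i, so no future rows are used.
    """
    d = len(rows[0])
    out = []
    for i, q in enumerate(rows):
        keys = rows[: i + 1] if causal else rows
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return out


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def time_block(x, causal=False):
    """Attend across time frames; x[t][f] has one row per frame."""
    return self_attention(x, causal=causal)


def freq_block(x):
    """Attend across frequency bins by transposing to one row per bin."""
    return transpose(self_attention(transpose(x)))


def stacked(x, n_pairs=2, causal=False):
    """Alternate time and frequency blocks n_pairs times."""
    for _ in range(n_pairs):
        x = freq_block(time_block(x, causal=causal))
    return x
```

The `causal` flag applies only along the time axis, since that is where using future frames would add latency; attention across frequency bins within a frame needs no such restriction.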


Subjects
Noise , Speech , Speech Production Measurement/methods
20.
Int J Lang Commun Disord ; 57(5): 1023-1049, 2022 09.
Article in English | MEDLINE | ID: mdl-35714104

ABSTRACT

'Dysarthria' is a group of motor speech disorders resulting from a disturbance in neuromuscular control. Most individuals with dysarthria cope with communicative restrictions due to speech impairments and reduced intelligibility. Thus, language-sensitive measurements of intelligibility are important in the neurological assessment of dysarthria. The Frenchay Dysarthria Assessment, 2nd edition (FDA-2), is a validated tool for the identification of the nature and patterns of oro-motor movements associated with different types of dysarthria. The current study conducted a careful culturally and linguistically sensitive adaptation of the two intelligibility subtests of the FDA-2 (words and sentences) to Hebrew and performed a preliminary validation with relevant clinical populations. First, sets of Hebrew words and sentences were constructed, based on the criteria defined in FDA-2, as well as on several other factors that may affect performance: emotional valence, arousal and familiarity. Second, the new subtests were validated in healthy older adults (n = 20), and in two clinical groups (acquired dysarthria, n = 15; and developmental dysarthria, n = 19). Analysis indicated that the new subtests were specific and sensitive, valid and reliable: scores differed significantly between healthy older adults and adults with dysarthria, correlated with other subjective measures of intelligibility, and showed high test-retest reliability. The words and sentences intelligibility subtests can be used to evaluate speech disorders in various populations of Hebrew speakers, and thus may be an important addition to the speech-language pathologist's toolbox, for clinical work as well as for research purposes.

WHAT THIS PAPER ADDS:

What is already known on the subject: 'Dysarthria' is a group of disorders reflecting impairments in the strength, speed and precision of movements required for adequate control of the various speech subsystems. Reduced speech intelligibility is one of the main consequences of all dysarthria subtypes, irrespective of their underlying cause. Indeed, most individuals with dysarthria cope with communicative restrictions due to speech impairments. Thus, language-sensitive measurements of intelligibility are important in dysarthria assessment. The FDA-2's words and sentences subtests present standardized and validated tools for the identification of the nature and patterns of oro-motor movements associated with different types of dysarthria.

What this paper adds to existing knowledge: The lack of assessment tools in Hebrew poses challenges to clinical evaluation as well as to research. The current study conducted a careful culturally and linguistically sensitive adaptation of the FDA-2 intelligibility subtests to Hebrew and performed a preliminary validation with relevant clinical populations. First, sets of Hebrew words and sentences were constructed, based on the criteria defined in FDA-2, as well as on several other factors that may affect performance: emotional valence, arousal and familiarity. Second, the new subtests were validated in healthy older adults (n = 20), and in two clinical groups (adults with acquired dysarthria, n = 15; and young adults with developmental dysarthria, n = 19).

What are the potential or actual clinical implications of this work? Analyses indicated that the new word and sentence subtests are specific, sensitive, valid and reliable. Namely, (1) they successfully differentiate between healthy individuals and individuals with dysarthria; (2) they correlate with other subjective measures of intelligibility; and (3) they show high test-retest reliability. The words and sentences intelligibility subtests can be used to evaluate speech disorders in various populations of Hebrew speakers. Thus, they may be an important addition to the speech-language pathologist's toolbox, for clinical and research purposes. The methods described here can be emulated for the adaptation of speech assessment tools to other languages.


Subjects
Dysarthria , Speech Intelligibility , Aged , Dysarthria/psychology , Humans , Linguistics , Reproducibility of Results , Speech Disorders/complications , Speech Production Measurement/methods , Young Adult