Results 1 - 6 of 6
1.
Speech Commun; 59: 1-9, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24910484

ABSTRACT

Previous studies have shown that "clear" speech, in which the speaker intentionally tries to enunciate, is more intelligible than "conversational" speech, which is produced in regular conversation. However, conversational and clear speech vary along a number of acoustic dimensions, and it is unclear which aspects of clear speech lead to better intelligibility. Previously, Kain et al. [J. Acoust. Soc. Am. 124 (4), 2308-2319 (2008)] showed that a combination of short-term spectra and duration was responsible for the improved intelligibility of one speaker. This study investigates subsets of specific features of short-term spectra, including temporal aspects. As in Kain's study, hybrid stimuli were synthesized with a combination of features from clear speech and complementary features from conversational speech to determine which acoustic features cause the improved intelligibility of clear speech. Our results indicate that, although steady-state formant values of tense vowels contributed to the intelligibility of clear speech, neither the steady-state portion nor the formant transition alone was sufficient to yield intelligibility comparable to that of clear speech. In contrast, when the entire formant contour of conversational speech, including phoneme duration, was replaced by that of clear speech, intelligibility was comparable to that of clear speech. This indicates that the combination of formant contour and duration information underlies the improved intelligibility of clear speech. The study provides a better understanding of the relevance of different aspects of formant contours to the improved intelligibility of clear speech.
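
A minimal sketch of the feature-swapping idea, assuming both renditions of a sentence have been segmented into the same phoneme sequence; the Phone container and hybridize function are illustrative, not the authors' code, and resynthesis from the hybrid features (e.g., with a formant synthesizer) is omitted:

from dataclasses import dataclass
import numpy as np

@dataclass
class Phone:
    label: str
    duration: float       # seconds
    formants: np.ndarray  # frames x 3 array of F1, F2, F3 in Hz

def hybridize(cnv: list[Phone], clr: list[Phone]) -> list[Phone]:
    """Replace each conversational phone's formant contour and duration
    with those of the clear-speech rendition (hypothetical helper)."""
    hybrid = []
    for p_cnv, p_clr in zip(cnv, clr):
        assert p_cnv.label == p_clr.label, "renditions must align"
        hybrid.append(Phone(p_cnv.label, p_clr.duration, p_clr.formants.copy()))
    return hybrid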

2.
Augment Altern Commun; 26(4): 267-77, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21091303

ABSTRACT

This study described preliminary work with the Supplemented Speech Recognition (SSR) system for speakers with dysarthria. SSR incorporated automatic speech recognition optimized for dysarthric speech, alphabet supplementation, and word prediction. Participants included seven individuals with a range of dysarthria severity. Keystroke savings using SSR averaged 68.2% for typical sentences and 67.5% for atypical phrases, significantly different from word prediction alone. The SSR correctly identified an average of 80.7% of target stimulus words for typical sentences and 82.8% for atypical phrases. No statistically significant relation was found between sentence intelligibility and keystroke savings, or between sentence intelligibility and system performance. The results suggest that individuals with dysarthria using SSR could achieve comparable keystroke savings regardless of speech severity.
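
The abstract does not give the keystroke-savings formula; a common definition in the AAC literature, assumed here, is the percentage of keystrokes saved relative to typing every character of the message:

def keystroke_savings(keys_with_aid: int, keys_unaided: int) -> float:
    """Percentage of keystrokes saved relative to fully typing the text."""
    return 100.0 * (1 - keys_with_aid / keys_unaided)

# e.g., 11 keystrokes with the aid instead of 34 typed characters:
print(f"{keystroke_savings(11, 34):.1f}% saved")  # -> 67.6% saved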


Subjects
Communication Aids for Disabled/standards, Dysarthria/physiopathology, Dysarthria/rehabilitation, Adult, Female, Humans, Male, Middle Aged, Severity of Illness Index, Speech Intelligibility, Speech Recognition Software/standards, Young Adult
3.
Speech Commun; 51(4): 352-368, 2009 Apr.
Article in English | MEDLINE | ID: mdl-20161342

ABSTRACT

Determining the location of phonemes is important to a number of speech applications, including training of automatic speech recognition systems, building text-to-speech systems, and research on human speech processing. Human agreement on the location of phoneme boundaries averages 93.78% within 20 msec on a variety of corpora, and 93.49% within 20 msec on the TIMIT corpus. We describe a baseline forced-alignment system and a proposed system with several modifications to this baseline. Modifications include the addition of energy-based features to the standard cepstral feature set, the use of probabilities of a state transition given an observation, and the computation of probabilities of distinctive phonetic features instead of phoneme-level probabilities. Performance of the baseline system on the test partition of the TIMIT corpus is 91.48% within 20 msec; performance of the proposed system on this corpus is 93.36% within 20 msec. The proposed system achieves a 22% relative reduction in error over the baseline system, and a 14% relative reduction in error over results from a non-HMM alignment system. This result of 93.36% agreement is, to our knowledge, the best reported result on the TIMIT corpus.
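
A sketch of the agreement metric as described: the percentage of predicted phoneme boundaries falling within a fixed tolerance (20 msec here) of the reference boundaries; the function name and inputs are illustrative:

def boundary_agreement(pred: list[float], ref: list[float],
                       tol: float = 0.020) -> float:
    """pred/ref: boundary times in seconds for the same phoneme sequence."""
    assert len(pred) == len(ref), "alignments must cover the same boundaries"
    hits = sum(abs(p - r) <= tol for p, r in zip(pred, ref))
    return 100.0 * hits / len(ref)

print(boundary_agreement([0.10, 0.31, 0.55], [0.11, 0.30, 0.58]))  # ~66.7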

4.
J Acoust Soc Am; 124(4): 2308-19, 2008 Oct.
Article in English | MEDLINE | ID: mdl-19062869

ABSTRACT

Speakers naturally adopt a special "clear" (CLR) speaking style in order to be better understood by listeners who are moderately impaired in their ability to understand speech due to a hearing impairment, the presence of background noise, or both. In contrast, speech intended for nonimpaired listeners in quiet environments is referred to as "conversational" (CNV). Studies have shown that the intelligibility of CLR speech is usually higher than that of CNV speech in adverse circumstances. It is not known which individual acoustic features or combinations of features cause the higher intelligibility of CLR speech. The objective of this study is to determine the contribution of some acoustic features to intelligibility for a single speaker. The proposed method creates "hybrid" (HYB) speech stimuli that selectively combine acoustic features of one sentence spoken in the CNV and CLR styles. The intelligibility of these stimuli is then measured in perceptual tests, using 96 phonetically balanced sentences. Results for one speaker show significant sentence-level intelligibility improvements over CNV speech when replacing certain combinations of short-term spectra, phoneme identities, and phoneme durations of CNV speech with those from CLR speech, but no improvements for combinations involving fundamental frequency, energy, or nonspeech events (pauses).
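
Schematically, each HYB stimulus is defined by choosing, per feature stream, whether it comes from the CLR or the CNV rendition. The stream names follow the abstract, but this dict-based representation is an assumption, not the paper's implementation:

STREAMS = ("spectra", "phonemes", "durations", "f0", "energy", "pauses")

def select_features(cnv: dict, clr: dict, from_clr: set[str]) -> dict:
    """Build the feature set for one hybrid stimulus: streams named in
    from_clr are taken from clear speech, the rest from conversational."""
    return {s: (clr[s] if s in from_clr else cnv[s]) for s in STREAMS}

# e.g., one combination the paper found effective for this speaker:
# hyb = select_features(cnv, clr, {"spectra", "phonemes", "durations"})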


Subjects
Speech Acoustics, Speech Intelligibility, Acoustic Stimulation, Adolescent, Adult, Algorithms, Speech Audiometry, Comprehension, Female, Hearing Loss/physiopathology, Humans, Male, Biological Models, Noise, Perceptual Masking, Phonetics, Young Adult
5.
J Med Speech Lang Pathol; 12(4): 167-171, 2004 Dec.
Article in English | MEDLINE | ID: mdl-17066124

ABSTRACT

We report findings from two feasibility studies using automatic speech recognition (ASR) methods in childhood speech sound disorders. The studies implemented and evaluated the automation of two recently proposed diagnostic markers for suspected Apraxia of Speech (AOS), termed the Lexical Stress Ratio (LSR) and the Coefficient of Variation Ratio (CVR). The LSR is a weighted composite of amplitude area, frequency area, and duration in the stressed vowel compared to the unstressed vowel, obtained from a speaker's productions of eight trochaic word forms. Composite weightings for the three stress parameters were determined from a principal components analysis. The CVR expresses the average normalized variability of the durations of pause and speech events obtained from a conversational speech sample. We describe the automation procedures used to obtain LSR and CVR scores for four children with suspected AOS and report comparative findings. The LSR values obtained with ASR were within 1.2% to 6.7% of those obtained manually using Computerized Speech Lab (CSL). The CVR values obtained with ASR were within 0.7% to 2.7% of those obtained manually using Matlab. These results indicate the potential of ASR-based techniques to compute these and other diagnostic markers of childhood speech sound disorders.
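
The abstract describes both markers only at a high level; the following sketch encodes one plausible reading of each, with the exact composites, weightings, and event definitions treated as assumptions:

import numpy as np

def lexical_stress_ratio(stressed: dict, unstressed: dict,
                         weights: dict) -> float:
    """Ratio of a weighted composite of amplitude area, frequency area,
    and duration in the stressed vs. the unstressed vowel; the PCA-derived
    weights are simply inputs here (keys such as 'amp_area', 'freq_area',
    'duration' are hypothetical)."""
    num = sum(weights[k] * stressed[k] for k in weights)
    den = sum(weights[k] * unstressed[k] for k in weights)
    return num / den

def coefficient_of_variation_ratio(pause_durs, speech_durs) -> float:
    """CV (std/mean) of pause-event durations over the CV of speech-event
    durations: one reading of 'average normalized variability'."""
    cv = lambda d: float(np.std(d) / np.mean(d))
    return cv(pause_durs) / cv(speech_durs)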

6.
IEEE Trans Audio Speech Lang Process; 19(7): 2081-2090, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-22199464

ABSTRACT

Spoken responses produced by subjects during neuropsychological exams can provide diagnostic markers beyond exam performance. In particular, characteristics of the spoken language itself can discriminate between subject groups. We present results on the utility of such markers in discriminating between healthy elderly subjects and subjects with mild cognitive impairment (MCI). Given the audio and transcript of a spoken narrative recall task, a range of markers are automatically derived. These markers include speech features such as pause frequency and duration, and many linguistic complexity measures. We examine measures calculated from manually annotated time alignments (of the transcript with the audio) and syntactic parse trees, as well as the same measures calculated from automatic (forced) time alignments and automatic parses. We show statistically significant differences between clinical subject groups for a number of measures. These differences are largely preserved with automation. We then present classification results, and demonstrate a statistically significant improvement in the area under the ROC curve (AUC) when using automatic spoken language derived features in addition to the neuropsychological test scores. Our results indicate that using multiple, complementary measures can aid in automatic detection of MCI.
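
A minimal sketch of the evaluation idea on synthetic data: compare the cross-validated ROC AUC of a classifier using neuropsychological scores alone against one that also uses automatically derived speech and language features. The feature construction and model choice here are assumptions, not the authors' pipeline:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)                          # 0 = healthy, 1 = MCI
scores = y + rng.normal(0, 1.5, n)                 # neuropsychological scores
speech = y[:, None] + rng.normal(0, 1.5, (n, 4))   # pause/complexity features

X_base = scores[:, None]
X_full = np.hstack([X_base, speech])

for name, X in [("scores only", X_base), ("scores + speech", X_full)]:
    prob = cross_val_predict(LogisticRegression(), X, y, cv=5,
                             method="predict_proba")[:, 1]
    print(name, round(roc_auc_score(y, prob), 3))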
