Pesquisa | Portal Regional da BVS

1.

Speech Entrainment in Adolescent Conversations: A Developmental Perspective.

Wynn, Camille J; Barrett, Tyson S; Berisha, Visar; Liss, Julie M; Borrie, Stephanie A.

J Speech Lang Hear Res ; 66(8S): 3132-3150, 2023 08 17.

Artigo em Inglês | MEDLINE | ID: mdl-37071795

RESUMO

PURPOSE: Defined as the similarity of speech behaviors between interlocutors, speech entrainment plays an important role in successful adult conversations. According to theoretical models of entrainment and research on motoric, cognitive, and social developmental milestones, the ability to entrain should develop throughout adolescence. However, little is known about the specific developmental trajectory or the role of speech entrainment in conversational outcomes of this age group. The purpose of this study is to characterize speech entrainment patterns in the conversations of neurotypical early adolescents. METHOD: This study utilized a corpus of 96 task-based conversations between adolescents between the ages of 9 and 14 years and a comparison corpus of 32 task-based conversations between adults. For each conversational turn, two speech entrainment scores were calculated for 429 acoustic features across rhythmic, articulatory, and phonatory dimensions. Predictive modeling was used to evaluate the degree of entrainment and relationship between entrainment and two metrics of conversational success. RESULTS: Speech entrainment increased throughout early adolescence but did not reach the level exhibited in conversations between adults. Additionally, speech entrainment was predictive of both conversational quality and conversational efficiency. Furthermore, models that included all acoustic features and both entrainment types performed better than models that only included individual acoustic feature sets or one type of entrainment. CONCLUSIONS: Our findings show that speech entrainment skills are largely developed during early adolescence with continued development possibly occurring across later adolescence. Additionally, results highlight the role of speech entrainment in successful conversation in this population, suggesting the import of continued exploration of this phenomenon in both neurotypical and neurodivergent adolescents. We also provide evidence of the value of using holistic measures that capture the multidimensionality of speech entrainment and provide a validated methodology for investigating entrainment across multiple acoustic features and entrainment types.

Assuntos

Comunicação , Fala , Adulto , Humanos , Adolescente , Criança , Fonação , Medida da Produção da Fala , Acústica

2.

Language Analytics for Assessment of Mental Health Status and Functional Competency.

Voleti, Rohit; Woolridge, Stephanie M; Liss, Julie M; Milanovic, Melissa; Stegmann, Gabriela; Hahn, Shira; Harvey, Philip D; Patterson, Thomas L; Bowie, Christopher R; Berisha, Visar.

Schizophr Bull ; 49(Suppl_2): S183-S195, 2023 03 22.

Artigo em Inglês | MEDLINE | ID: mdl-36946533

RESUMO

BACKGROUND AND HYPOTHESIS: Automated language analysis is becoming an increasingly popular tool in clinical research involving individuals with mental health disorders. Previous work has largely focused on using high-dimensional language features to develop diagnostic and prognostic models, but less work has been done to use linguistic output to assess downstream functional outcomes, which is critically important for clinical care. In this work, we study the relationship between automated language composites and clinical variables that characterize mental health status and functional competency using predictive modeling. STUDY DESIGN: Conversational transcripts were collected from a social skills assessment of individuals with schizophrenia (n = 141), bipolar disorder (n = 140), and healthy controls (n = 22). A set of composite language features based on a theoretical framework of speech production were extracted from each transcript and predictive models were trained. The prediction targets included clinical variables for assessment of mental health status and social and functional competency. All models were validated on a held-out test sample not accessible to the model designer. STUDY RESULTS: Our models predicted the neurocognitive composite with Pearson correlation PCC = 0.674; PANSS-positive with PCC = 0.509; PANSS-negative with PCC = 0.767; social skills composite with PCC = 0.785; functional competency composite with PCC = 0.616. Language features related to volition, affect, semantic coherence, appropriateness of response, and lexical diversity were useful for prediction of clinical variables. CONCLUSIONS: Language samples provide useful information for the prediction of a variety of clinical variables that characterize mental health status and functional competency.

Assuntos

Transtorno Bipolar , Esquizofrenia , Humanos , Esquizofrenia/diagnóstico , Fala , Comunicação , Nível de Saúde

3.

TorchDIVA: An extensible computational model of speech production built on an open-source machine learning library.

Kinahan, Sean P; Liss, Julie M; Berisha, Visar.

PLoS One ; 18(2): e0281306, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-36800358

RESUMO

The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; however, this is less than ideal as most of the development in speech technology research is done in Python. This means there is a wealth of machine learning tools which are freely available in the Python ecosystem that cannot be easily integrated with DIVA. We present TorchDIVA, a full rebuild of DIVA in Python using PyTorch tensors. DIVA source code was directly translated from Matlab to Python, and built-in Simulink signal blocks were implemented from scratch. After implementation, the accuracy of each module was evaluated via systematic block-by-block validation. The TorchDIVA model is shown to produce outputs that closely match those of the original DIVA model, with a negligible difference between the two. We additionally present an example of the extensibility of TorchDIVA as a research platform. Speech quality enhancement in TorchDIVA is achieved through an integration with an existing PyTorch generative vocoder called DiffWave. A modified DiffWave mel-spectrum upsampler was trained on human speech waveforms and conditioned on the TorchDIVA speech production. The results indicate improved speech quality metrics in the DiffWave-enhanced output as compared to the baseline. This enhancement would have been difficult or impossible to accomplish in the original Matlab implementation. This proof-of-concept demonstrates the value TorchDIVA can bring to the research community. Researchers can download the new implementation at: https://github.com/skinahan/DIVA_PyTorch.

Assuntos

Ecossistema , Fala , Humanos , Software , Simulação por Computador , Aprendizado de Máquina

4.

Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation.

Mathad, Vikram C; Liss, Julie M; Chapman, Kathy; Scherer, Nancy; Berisha, Visar.

IEEE/ACM Trans Audio Speech Lang Process ; 31: 86-95, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-36712557

RESUMO

Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trained to classify between different consonants using CV regions as input. We demonstrate that the OAM is correlated with perceptual measures in a variety of contexts including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.

5.

Spatio-Semantic Graphs From Picture Description: Applications to Detection of Cognitive Impairment.

Ambadi, Pranav S; Basche, Kristin; Koscik, Rebecca L; Berisha, Visar; Liss, Julie M; Mueller, Kimberly D.

Front Neurol ; 12: 795374, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-34956070

RESUMO

Clinical assessments often use complex picture description tasks to elicit natural speech patterns and magnify changes occurring in brain regions implicated in Alzheimer's disease and dementia. As The Cookie Theft picture description task is used in the largest Alzheimer's disease and dementia cohort studies available, we aimed to create algorithms that could characterize the visual narrative path a participant takes in describing what is happening in this image. We proposed spatio-semantic graphs, models based on graph theory that transform the participants' narratives into graphs that retain semantic order and encode the visuospatial information between content units in the image. The resulting graphs differ between Cognitively Impaired and Unimpaired participants in several important ways. Cognitively Impaired participants consistently scored higher on features that are heavily associated with symptoms of cognitive decline, including repetition, evidence of short-term memory lapses, and generally disorganized narrative descriptions, while Cognitively Unimpaired participants produced more efficient narrative paths. These results provide evidence that spatio-semantic graph analysis of these tasks can generate important insights into a participant's cognitive performance that cannot be generated from semantic analysis alone.

6.

A Deep Learning Algorithm for Objective Assessment of Hypernasality in Children With Cleft Palate.

Mathad, Vikram C; Scherer, Nancy; Chapman, Kathy; Liss, Julie M; Berisha, Visar.

IEEE Trans Biomed Eng ; 68(10): 2986-2996, 2021 10.

Artigo em Inglês | MEDLINE | ID: mdl-33566756

RESUMO

OBJECTIVES: Evaluation of hypernasality requires extensive perceptual training by clinicians and extending this training on a large scale internationally is untenable; this compounds the health disparities that already exist among children with cleft. In this work, we present the objective hypernasality measure (OHM), a speech-based algorithm that automatically measures hypernasality in speech, and validate it relative to a group of trained clinicians. METHODS: We trained a deep neural network (DNN) on approximately 100 hours of a publicly-available healthy speech corpus to detect the presence of nasal acoustic cues generated through the production of nasal consonants and nasalized phonemes in speech. Importantly, this model does not require any clinical data for training. The posterior probabilities of the deep learning model were aggregated at the sentence and speaker-levels to compute the OHM. RESULTS: The results showed that the OHM was significantly correlated with perceptual hypernasality ratings from the Americleft database (r = 0.797, p < 0.001) and the New Mexico Cleft Palate Center (NMCPC) database (r = 0.713, p < 0.001). In addition, we evaluated the relationship between the OHM and articulation errors; the sensitivity of the OHM in detecting the presence of very mild hypernasality; and established the internal reliability of the metric. Further, the performance of the OHM was compared with a DNN regression algorithm directly trained on the hypernasal speech samples. SIGNIFICANCE: The results indicate that the OHM is able to measure the severity of hypernasality on par with Americleft-trained clinicians on thisdataset.

Assuntos

Fissura Palatina , Aprendizado Profundo , Distúrbios da Voz , Criança , Fissura Palatina/diagnóstico , Humanos , Reprodutibilidade dos Testes , Medida da Produção da Fala

7.

The Orofacial Somatosensory System Is Modulated During Speech Planning and Production.

McGuffin, Brianna J; Liss, Julie M; Daliri, Ayoub.

J Speech Lang Hear Res ; 63(8): 2637-2648, 2020 08 10.

Artigo em Inglês | MEDLINE | ID: mdl-32697611

RESUMO

Purpose In our previous studies, we showed that the brain modulates the auditory system, and the modulation starts during speech planning. However, it remained unknown whether the brain uses similar mechanisms to modulate the orofacial somatosensory system. Here, we developed a novel behavioral paradigm to (a) examine whether the somatosensory system is modulated during speech planning and (b) determine the somatosensory modulation's time course during planning and production. Method Participants (N = 20) completed two experiments in which we applied electrical current stimulation to the lower lip to induce somatosensory sensation. In the first experiment, we used a staircase method (one-up, four-down) to determine each participant's perceptual threshold at rest (i.e., the stimulus that the participant detected on 85% of trials). In the second experiment, we estimated each participant's detection ratio of electrical stimuli (with a magnitude equivalent of their perceptual threshold) delivered at various time points before speaking and during a control condition (silent reading). Results We found that the overall detection ratio in the silent reading condition remained unchanged relative to the detection ratio at rest. Approximately 536 ms before speech onset, the detection ratio in the speaking condition was similar to that in the silent reading condition; however, the detection ratio in the speaking condition gradually started to decrease and reached its lowest level at 58 ms before speech onset. Conclusions Overall, we provided compelling behavioral evidence that, as the speech motor system prepares speech movements, it also modulates the orofacial somatosensory system in a temporally specific manner.

Assuntos

Percepção da Fala , Fala , Encéfalo , Humanos , Lábio , Leitura

8.

A Review of Automated Speech and Language Features for Assessment of Cognitive and Thought Disorders.

Voleti, Rohit; Liss, Julie M; Berisha, Visar.

IEEE J Sel Top Signal Process ; 14(2): 282-298, 2020 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-33907590

RESUMO

It is widely accepted that information derived from analyzing speech (the acoustic signal) and language production (words and sentences) serves as a useful window into the health of an individual's cognitive ability. In fact, most neuropsychological testing batteries have a component related to speech and language where clinicians elicit speech from patients for subjective evaluation across a broad set of dimensions. With advances in speech signal processing and natural language processing, there has been recent interest in developing tools to detect more subtle changes in cognitive-linguistic function. This work relies on extracting a set of features from recorded and transcribed speech for objective assessments of speech and language, early diagnosis of neurological disease, and tracking of disease after diagnosis. With an emphasis on cognitive and thought disorders, in this paper we provide a review of existing speech and language features used in this domain, discuss their clinical application, and highlight their advantages and disadvantages. Broadly speaking, the review is split into two categories: language features based on natural language processing and speech features based on speech signal processing. Within each category, we consider features that aim to measure complementary dimensions of cognitive-linguistics, including language diversity, syntactic complexity, semantic coherence, and timing. We conclude the review with a proposal of new research directions to further advance the field.

9.

Sync Pending: Characterizing Conversational Entrainment in Dysarthria Using a Multidimensional, Clinically Informed Approach.

Borrie, Stephanie A; Barrett, Tyson S; Liss, Julie M; Berisha, Visar.

J Speech Lang Hear Res ; 63(1): 83-94, 2020 01 22.

Artigo em Inglês | MEDLINE | ID: mdl-31855608

RESUMO

Purpose Despite the import of conversational entrainment to successful spoken dialogue, the systematic characterization of this behavioral syncing phenomenon represents a critical gap in the field of speech pathology. The goal of this study was to acoustically characterize conversational entrainment in the context of dysarthria using a multidimensional approach previously validated in healthy populations (healthy conversations; Borrie, Barrett, Willi, & Berisha, 2019). Method A large corpus of goal-oriented conversations between participants with dysarthria and healthy participants (disordered conversations) was elicited using a "spot the difference" task. Expert clinical assessment of entrainment and a measure of conversational success (communicative efficiency) was obtained for each of the audio-recorded conversations. Conversational entrainment of acoustic features representing rhythmic, articulatory, and phonatory dimensions of speech was identified using cross-recurrence quantification analysis with clinically informed model parameters and validated with a sham condition involving conversational participants who did not converse with one another. The relationship between conversational entrainment and communicative efficiency was examined. Results Acoustic evidence of entrainment was observed in phonatory, but not rhythmic and articulatory, behavior, a finding that differs from healthy conversations in which entrainment was observed in all speech signal dimensions. This result, that disordered conversations showed less acoustic entrainment than healthy conversations, is corroborated by clinical assessment of entrainment in which the disordered conversations were rated, overall, as being less in sync than healthy conversations. Furthermore, acoustic entrainment was predictive of communicative efficiency, corroborated by a relationship between clinical assessment and the same outcome measure. Conclusions The findings confirm our hypothesis that the pathological speech production parameters of dysarthria disrupt the seemingly ubiquitous phenomenon of conversational entrainment, thus advancing entrainment deficits as an important variable in dysarthria, one that may have causative effects on the success of everyday communication. Results further reveal that while this approach provides a broad overview, methodologies for characterizing conversational entrainment in dysarthria must continue to be developed and refined, with a focus on clinical utility. Supplemental Material https://osf.io/ktg5q.

Assuntos

Comunicação , Disartria/fisiopatologia , Fonação/fisiologia , Acústica da Fala , Comportamento Verbal/fisiologia , Adulto , Eficiência/fisiologia , Feminino , Humanos , Masculino , Medida da Produção da Fala , Qualidade da Voz

10.

The Effects of Speech Compression Algorithms on the Intelligibility of Two Individuals With Dysarthric Speech.

Utianski, Rene L; Sandoval, Steven; Berisha, Visar; Lansford, Kaitlin L; Liss, Julie M.

Am J Speech Lang Pathol ; 28(1): 195-203, 2019 02 21.

Artigo em Inglês | MEDLINE | ID: mdl-30515518

RESUMO

Purpose Telemedicine, used to offset disparities in access to speech-language therapy, relies on technology that utilizes compression algorithms to transmit signals efficiently. These algorithms have been thoroughly evaluated on healthy speech; however, the effects of compression algorithms on the intelligibility of disordered speech have not been adequately explored. Method This case study assessed acoustic and perceptual effects of resampling and speech compression (i.e., transcoding) on the speech of 2 individuals with dysarthria. Forced-choice vowel identification and transcription tasks were utilized, completed by 20 naive undergraduate listeners. Results Results showed relative improvements and decrements in intelligibility, on various measures, based on the speakers' acoustic profiles. The transcoding of the speech compression algorithm resulted in an enlarged vowel space area and associated improvements in vowel identification for 1 speaker and a smaller vowel space area and decreased vowel identification for the other speaker. Interestingly, there was an overall decrease in intelligibility in the transcription task in this condition for both speakers. Conclusions There is a complex interplay between dysarthria and compression algorithms that warrants further exploration. The findings suggest that it is critical to be mindful of apparent changes in intelligibility secondary to compression algorithms necessary for practicing telemedicine. Supplemental Material https://doi.org/10.23641/asha.7291940.

Assuntos

Algoritmos , Compressão de Dados/métodos , Disartria/psicologia , Inteligibilidade da Fala , Telemedicina/métodos , Adulto , Humanos , Masculino , Fonética , Processamento de Sinais Assistido por Computador , Acústica da Fala , Percepção da Fala , Medida da Produção da Fala/métodos

11.

INVESTIGATING THE EFFECTS OF WORD SUBSTITUTION ERRORS ON SENTENCE EMBEDDINGS.

Voleti, Rohit; Liss, Julie M; Berisha, Visar.

Proc IEEE Int Conf Acoust Speech Signal Process ; 2019: 7315-7319, 2019 May.

Artigo em Inglês | MEDLINE | ID: mdl-33376454

RESUMO

A key initial step in several natural language processing (NLP) tasks involves embedding phrases of text to vectors of real numbers that preserve semantic meaning. To that end, several methods have been recently proposed with impressive results on semantic similarity tasks. However, all of these approaches assume that perfect transcripts are available when generating the embeddings. While this is a reasonable assumption for analysis of written text, it is limiting for analysis of transcribed text. In this paper we investigate the effects of word substitution errors, such as those coming from automatic speech recognition errors (ASR), on several state-of-the-art sentence embedding methods. To do this, we propose a new simulator that allows the experimenter to induce ASR-plausible word substitution errors in a corpus at a desired word error rate. We use this simulator to evaluate the robustness of several sentence embedding methods. Our results show that pre-trained neural sentence encoders are both robust to ASR errors and perform well on textual similarity tasks after errors are introduced. Meanwhile, unweighted averages of word vectors perform well with perfect transcriptions, but their performance degrades rapidly on textual similarity tasks for text with word substitution errors.

12.

Predicting Intelligibility Gains in Individuals With Dysarthria From Baseline Speech Features.

Fletcher, Annalise R; McAuliffe, Megan J; Lansford, Kaitlin L; Sinex, Donal G; Liss, Julie M.

J Speech Lang Hear Res ; 60(11): 3043-3057, 2017 11 09.

Artigo em Inglês | MEDLINE | ID: mdl-29075753

RESUMO

Purpose: Across the treatment literature, behavioral speech modifications have produced variable intelligibility changes in speakers with dysarthria. This study is the first of two articles exploring whether measurements of baseline speech features can predict speakers' responses to these modifications. Methods: Fifty speakers (7 older individuals and 43 speakers with dysarthria) read a standard passage in habitual, loud, and slow speaking modes. Eighteen listeners rated how easy the speech samples were to understand. Baseline acoustic measurements of articulation, prosody, and voice quality were collected with perceptual measures of severity. Results: Cues to speak louder and reduce rate did not confer intelligibility benefits to every speaker. The degree to which cues to speak louder improved intelligibility could be predicted by speakers' baseline articulation rates and overall dysarthria severity. Improvements in the slow condition could be predicted by speakers' baseline severity and temporal variability. Speakers with a breathier voice quality tended to perform better in the loud condition than in the slow condition. Conclusions: Assessments of baseline speech features can be used to predict appropriate treatment strategies for speakers with dysarthria. Further development of these assessments could provide the basis for more individualized treatment programs.

Assuntos

Disartria/diagnóstico , Inteligibilidade da Fala , Medida da Produção da Fala , Adulto , Idoso , Idoso de 80 Anos ou mais , Tomada de Decisão Clínica , Sinais (Psicologia) , Disartria/terapia , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Prognóstico , Leitura , Reprodutibilidade dos Testes , Índice de Gravidade de Doença , Acústica da Fala , Fonoterapia

13.

Predicting Intelligibility Gains in Dysarthria Through Automated Speech Feature Analysis.

Fletcher, Annalise R; Wisler, Alan A; McAuliffe, Megan J; Lansford, Kaitlin L; Liss, Julie M.

J Speech Lang Hear Res ; 60(11): 3058-3068, 2017 11 09.

Artigo em Inglês | MEDLINE | ID: mdl-29075755

RESUMO

Purpose: Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, & Liss, 2017). This study reexamines these features and assesses whether automated acoustic assessments can also be used to predict intelligibility gains. Method: Fifty speakers (7 older individuals and 43 with dysarthria) read a passage in habitual, loud, and slow speaking modes. Automated measurements of long-term average spectra, envelope modulation spectra, and Mel-frequency cepstral coefficients were extracted from short segments of participants' baseline speech. Intelligibility gains were statistically modeled, and the predictive power of the baseline speech measures was assessed using cross-validation. Results: Statistical models could predict the intelligibility gains of speakers they had not been trained on. The automated acoustic features were better able to predict speakers' improvement in the loud condition than the manual measures reported in the companion article. Conclusions: These acoustic analyses present a promising tool for rapidly assessing treatment options. Automated measures of baseline speech patterns may enable more selective inclusion criteria and stronger group outcomes within treatment studies.

Assuntos

Disartria/diagnóstico , Reconhecimento Automatizado de Padrão , Acústica da Fala , Inteligibilidade da Fala , Medida da Produção da Fala/métodos , Adulto , Idoso , Idoso de 80 Anos ou mais , Tomada de Decisão Clínica , Sinais (Psicologia) , Disartria/terapia , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Modelos Estatísticos , Reconhecimento Automatizado de Padrão/métodos , Prognóstico , Leitura , Reprodutibilidade dos Testes , Índice de Gravidade de Doença , Interface para o Reconhecimento da Fala , Fonoterapia

14.

Assessing Vowel Centralization in Dysarthria: A Comparison of Methods.

Fletcher, Annalise R; McAuliffe, Megan J; Lansford, Kaitlin L; Liss, Julie M.

J Speech Lang Hear Res ; 60(2): 341-354, 2017 02 01.

Artigo em Inglês | MEDLINE | ID: mdl-28124069

RESUMO

Purpose: The strength of the relationship between vowel centralization measures and perceptual ratings of dysarthria severity has varied considerably across reports. This article evaluates methods of acoustic-perceptual analysis to determine whether procedural changes can strengthen the association between these measures. Method: Sixty-one speakers (17 healthy individuals and 44 speakers with dysarthria) read a standard passage. To obtain acoustic data, 2 points of formant extraction (midpoint and articulatory point) and 2 frequency measures (Hz and Bark) were trialed. Both vowel space area and an adapted formant centralization ratio were calculated using first and second formants of speakers' corner vowels. Twenty-eight listeners rated speech samples using different prompts: one with a focus on intelligibility, the other on speech precision. Results: Perceptually, listener ratings of speech precision provided the best index of acoustic change. Acoustically, the combined use of an articulatory-based formant extraction point, Bark frequency units, and the formant centralization ratio was most effective in explaining perceptual ratings. This combination of procedures resulted in an increase of 17% to 27% explained variance between measures. Conclusions: The procedures researchers use to assess articulatory impairment can significantly alter the strength of relationship between acoustic and perceptual measures. Procedures that maximize this relationship are recommended.

Assuntos

Disartria/diagnóstico , Fonética , Acústica da Fala , Medida da Produção da Fala/métodos , Adulto , Idoso , Idoso de 80 Anos ou mais , Disartria/fisiopatologia , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Leitura , Reprodutibilidade dos Testes , Espectrografia do Som , Inteligibilidade da Fala

15.

The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance.

Tu, Ming; Wisler, Alan; Berisha, Visar; Liss, Julie M.

J Acoust Soc Am ; 140(5): EL416, 2016 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-27908075

RESUMO

State-of-the-art automatic speech recognition (ASR) engines perform well on healthy speech; however recent studies show that their performance on dysarthric speech is highly variable. This is because of the acoustic variability associated with the different dysarthria subtypes. This paper aims to develop a better understanding of how perceptual disturbances in dysarthric speech relate to ASR performance. Accurate ratings of a representative set of 32 dysarthric speakers along different perceptual dimensions are obtained and the performance of a representative ASR algorithm on the same set of speakers is analyzed. This work explores the relationship between these ratings and ASR performance and reveals that ASR performance can be predicted from perceptual disturbances in dysarthric speech with articulatory precision contributing the most to the prediction followed by prosody.

Assuntos

Disartria , Algoritmos , Humanos , Fala , Inteligibilidade da Fala , Medida da Produção da Fala , Interface para o Reconhecimento da Fala

16.

The relationship between speech segment duration and vowel centralization in a group of older speakers.

Fletcher, Annalise R; McAuliffe, Megan J; Lansford, Kaitlin L; Liss, Julie M.

J Acoust Soc Am ; 138(4): 2132-9, 2015 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-26520296

RESUMO

This study examined the relationship between average vowel duration and spectral vowel quality across a group of 149 New Zealand English speakers aged 65 to 90 yr. The primary intent was to determine whether participants who had a natural tendency to speak slowly would also produce more spectrally distinct vowel segments. As a secondary aim, this study investigated whether advancing age exhibited a measurable effect on vowel quality and vowel durations within the group. In examining vowel quality, both flexible and static formant extraction points were compared. Two formant measurements, from selected [É:], [ i:], and [ o:] vowels, were extracted from a standard passage and used to calculate two measurements of vowel space area (VSA) for each speaker. Average vowel duration was calculated from segments across the passage. The study found a statistically significant relationship between speakers' average vowel durations and VSA measurements indicating that, on average, speakers with slower speech rates produced more acoustically distinct speech segments. As expected, increases in average vowel duration were found with advancing age. However, speakers' formant values remained unchanged. It is suggested that the use of a habitually slower speaking rate may assist speakers in maintaining acoustically distinct vowels.

Assuntos

Idoso/psicologia , Fonação , Fonética , Fatores Etários , Idoso de 80 Anos ou mais , Feminino , Hábitos , Humanos , Masculino , Fatores Sexuais , Espectrografia do Som , Acústica da Fala , Medida da Produção da Fala , Fatores de Tempo , Comportamento Verbal

17.

Cortical characterization of the perception of intelligible and unintelligible speech measured via high-density electroencephalography.

Utianski, Rene L; Caviness, John N; Liss, Julie M.

Brain Lang ; 140: 49-54, 2015 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-25513975

RESUMO

High-density electroencephalography was used to evaluate cortical activity during speech comprehension via a sentence verification task. Twenty-four participants assigned true or false to sentences produced with 3 noise-vocoded channel levels (1--unintelligible, 6--decipherable, 16--intelligible), during simultaneous EEG recording. Participant data were sorted into higher- (HP) and lower-performing (LP) groups. The identification of a late-event related potential for LP listeners in the intelligible condition and in all listeners when challenged with a 6-Ch signal supports the notion that this induced potential may be related to either processing degraded speech, or degraded processing of intelligible speech. Different cortical locations are identified as neural generators responsible for this activity; HP listeners are engaging motor aspects of their language system, utilizing an acoustic-phonetic based strategy to help resolve the sentence, while LP listeners do not. This study presents evidence for neurophysiological indices associated with more or less successful speech comprehension performance across listening conditions.

Assuntos

Compreensão/fisiologia , Eletroencefalografia , Inteligibilidade da Fala , Percepção da Fala/fisiologia , Adulto , Potenciais Evocados/fisiologia , Feminino , Lobo Frontal/fisiologia , Audição/fisiologia , Humanos , Masculino , Pessoa de Meia-Idade , Ruído , Fonética , Lobo Temporal/fisiologia , Adulto Jovem

18.

Free-classification of perceptually similar speakers with dysarthria.

Lansford, Kaitlin L; Liss, Julie M; Norton, Rebecca E.

J Speech Lang Hear Res ; 57(6): 2051-64, 2014 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-25057892

RESUMO

PURPOSE: In this investigation, the construct of perceptual similarity was explored in the dysarthrias. Specifically, we employed an auditory free-classification task to determine whether listeners could cluster speakers by perceptual similarity, whether the clusters mapped to acoustic metrics, and whether the clusters were constrained by dysarthria subtype diagnosis. METHOD: Twenty-three listeners blinded to speakers' medical and dysarthria subtype diagnoses participated. The task was to group together (drag and drop) the icons corresponding to 33 speakers with dysarthria on the basis of how similar they sounded. Cluster analysis and multidimensional scaling (MDS) modeled the perceptual dimensions underlying similarity. Acoustic metrics and perceptual judgments were used in correlation analyses to facilitate interpretation of the derived dimensions. RESULTS: Six clusters of similar-sounding speakers and 3 perceptual dimensions underlying similarity were revealed. The clusters of similar-sounding speakers were not constrained by dysarthria subtype diagnosis. The 3 perceptual dimensions revealed by MDS were correlated with metrics for articulation rate, intelligibility, and vocal quality, respectively. CONCLUSIONS: This study shows (a) feasibility of a free-classification approach for studying perceptual similarity in dysarthria, (b) correspondence between acoustic and perceptual metrics to clusters of similar-sounding speakers, and (c) similarity judgments transcended dysarthria subtype diagnosis.

Assuntos

Disartria/classificação , Fonética , Acústica da Fala , Inteligibilidade da Fala , Percepção da Fala , Adulto , Idoso , Idoso de 80 Anos ou mais , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Qualidade da Voz

19.

Vowel acoustics in dysarthria: speech disorder diagnosis and classification.

Lansford, Kaitlin L; Liss, Julie M.

J Speech Lang Hear Res ; 57(1): 57-67, 2014 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-24687467

RESUMO

PURPOSE: The purpose of this study was to determine the extent to which vowel metrics are capable of distinguishing healthy from dysarthric speech and among different forms of dysarthria. METHOD: A variety of vowel metrics were derived from spectral and temporal measurements of vowel tokens embedded in phrases produced by 45 speakers with dysarthria and 12 speakers with no history of neurological disease. Via means testing and discriminant function analysis (DFA), the acoustic metrics were used to (a) detect the presence of dysarthria and (b) classify the dysarthria subtype. RESULTS: Significant differences between dysarthric and healthy control speakers were revealed for all vowel metrics. However, the results of the DFA demonstrated some metrics (particularly metrics that capture vowel distinctiveness) to be more sensitive and specific predictors of dysarthria. Only the vowel metrics that captured slope of the second formant (F2) demonstrated between-group differences across the dysarthrias. However, when subjected to DFA, these metrics proved unreliable classifiers of dysarthria subtype. CONCLUSION: The results of these analyses suggest that some vowel metrics may be useful clinically for the detection of dysarthria but may not be reliable indicators of dysarthria subtype using the current dysarthria classification scheme.

Assuntos

Disartria/classificação , Disartria/diagnóstico , Fonética , Acústica da Fala , Medida da Produção da Fala/métodos , Adulto , Idoso , Idoso de 80 Anos ou mais , Esclerose Lateral Amiotrófica/complicações , Bases de Dados Factuais , Disartria/etiologia , Feminino , Humanos , Doença de Huntington/complicações , Masculino , Pessoa de Meia-Idade , Doença de Parkinson/complicações

20.

Vowel acoustics in dysarthria: mapping to perception.

Lansford, Kaitlin L; Liss, Julie M.

J Speech Lang Hear Res ; 57(1): 68-80, 2014 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-24687468

RESUMO

PURPOSE: The aim of the present report was to explore whether vowel metrics, demonstrated to distinguish dysarthric and healthy speech in a companion article (Lansford & Liss, 2014), are able to predict human perceptual performance. METHOD: Vowel metrics derived from vowels embedded in phrases produced by 45 speakers with dysarthria were compared with orthographic transcriptions of these phrases collected from 120 healthy listeners. First, correlation and stepwise multiple regressions were conducted to identify acoustic metrics that had predictive value for perceptual measures. Next, discriminant function analysis misclassifications were compared with listeners' misperceptions to examine more directly the perceptual consequences of degraded vowel acoustics. RESULTS: Several moderate correlative relationships were found between acoustic metrics and perceptual measures, with predictive models accounting for 18%-75% of the variance in measures of intelligibility and vowel accuracy. Results of the second analysis showed that listeners better identified acoustically distinctive vowel tokens. In addition, the level of agreement between misclassified-to-misperceived vowel tokens supports some specificity of degraded acoustic profiles on the resulting percept. CONCLUSION: Results provide evidence that degraded vowel acoustics have some effect on human perceptual performance, even in the presence of extravowel variables that naturally exert influence in phrase perception.

Assuntos

Disartria/diagnóstico , Fonética , Acústica da Fala , Inteligibilidade da Fala , Percepção da Fala , Adolescente , Adulto , Esclerose Lateral Amiotrófica/complicações , Ataxia Cerebelar/complicações , Disartria/etiologia , Feminino , Humanos , Doença de Huntington/complicações , Masculino , Pessoa de Meia-Idade , Doença de Parkinson/complicações , Valor Preditivo dos Testes , Adulto Jovem

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA