Pesquisa | Secretaria de Estado da Saúde

1.

Altered speech patterns in subjects with post-traumatic headache due to mild traumatic brain injury.

Chong, Catherine D; Zhang, Jianwei; Li, Jing; Wu, Teresa; Dumkrieger, Gina; Nikolova, Simona; Ross, Katherine; Stegmann, Gabriela; Liss, Julie; Schwedt, Todd J; Jayasuriya, Suren; Berisha, Visar.

J Headache Pain ; 22(1): 82, 2021 Jul 23.

Artigo em Inglês | MEDLINE | ID: mdl-34301180

RESUMO

BACKGROUND/OBJECTIVE: Changes in speech can be detected objectively before and during migraine attacks. The goal of this study was to interrogate whether speech changes can be detected in subjects with post-traumatic headache (PTH) attributed to mild traumatic brain injury (mTBI) and whether there are within-subject changes in speech during headaches compared to the headache-free state. METHODS: Using a series of speech elicitation tasks uploaded via a mobile application, PTH subjects and healthy controls (HC) provided speech samples once every 3 days, over a period of 12 weeks. The following speech parameters were assessed: vowel space area, vowel articulation precision, consonant articulation precision, average pitch, pitch variance, speaking rate and pause rate. Speech samples of subjects with PTH were compared to HC. To assess speech changes associated with PTH, speech samples of subjects during headache were compared to speech samples when subjects were headache-free. All analyses were conducted using a mixed-effect model design. RESULTS: Longitudinal speech samples were collected from nineteen subjects with PTH (mean age = 42.5, SD = 13.7) who were an average of 14 days (SD = 32.2) from their mTBI at the time of enrollment and thirty-one HC (mean age = 38.7, SD = 12.5). Regardless of headache presence or absence, PTH subjects had longer pause rates and reductions in vowel and consonant articulation precision relative to HC. On days when speech was collected during a headache, there were longer pause rates, slower sentence speaking rates and less precise consonant articulation compared to the speech production of HC. During headache, PTH subjects had slower speaking rates yet more precise vowel articulation compared to when they were headache-free. CONCLUSIONS: Compared to HC, subjects with acute PTH demonstrate altered speech as measured by objective features of speech production. For individuals with PTH, speech production may have been more effortful resulting in slower speaking rates and more precise vowel articulation during headache vs. when they were headache-free, suggesting that speech alterations were related to PTH and not solely due to the underlying mTBI.

Assuntos

Concussão Encefálica , Transtornos de Enxaqueca , Cefaleia Pós-Traumática , Adulto , Concussão Encefálica/complicações , Cefaleia , Humanos , Cefaleia Pós-Traumática/etiologia , Fala

2.

Inter-rater Agreement of Clinicians' Treatment Recommendations Based on Modified Barium Swallow Study Reports.

Slovarp, Laurie; Danielson, Jennifer; Liss, Julie.

Dysphagia ; 33(6): 818-826, 2018 12.

Artigo em Inglês | MEDLINE | ID: mdl-29882104

RESUMO

The modified barium swallow study (MBSS) is a commonly used radiographic procedure for diagnosis and treatment of swallowing disorders. Despite attempts by dysphagia specialists to standardize the MBSS, most institutions have not adopted such standardized procedures. High variability of assessment patterns arguably contribute to variability of treatment recommendations made from diagnostic information derived from the MBSS report. An online survey was distributed to speech-language pathologists (SLPs) participating in American Speech Language Hearing Association (ASHA) listservs. Sixty-three SLPs who treat swallowing disorders participated. Participating SLPs reviewed two MBSS reports and chose physiologic treatment targets (e.g., tongue base retraction) based on each report. One report primarily contained symptomatology (e.g., aspiration, pharyngeal residue) with minimal information on impaired physiology (e.g., laryngeal incompetence, reduced hyolaryngeal elevation/excursion). In contrast, the second report contained a clear description of impaired physiology to explain the dysphagia symptoms. Fleiss kappa coefficients were used to analyze inter-rater agreement across the high and low physiology report types. Results revealed significantly higher inter-rater agreement across clinicians when reviewing reports with clear explanation(s) of physiologic impairment relative to reports that primarily focused on symptomatology. Clinicians also reported significantly greater satisfaction and treatment confidence following review of reports with clear description(s) of impaired physiology.

Assuntos

Sulfato de Bário/farmacologia , Transtornos de Deglutição/diagnóstico , Deglutição/fisiologia , Fluoroscopia/métodos , Laringe/diagnóstico por imagem , Faringe/diagnóstico por imagem , Atitude do Pessoal de Saúde , Meios de Contraste/farmacologia , Transtornos de Deglutição/fisiopatologia , Transtornos de Deglutição/terapia , Humanos , Laringe/fisiopatologia , Otorrinolaringologistas , Planejamento de Assistência ao Paciente , Seleção de Pacientes , Faringe/fisiopatologia , Reprodutibilidade dos Testes , Estados Unidos

3.

A Pilot Study of the Tongue Pull-Back Exercise for Improving Tongue-Base Retraction and Two Novel Methods to Add Resistance to the Tongue Pull-Back.

Slovarp, Laurie; King, Lauren; Off, Catherine; Liss, Julie.

Dysphagia ; 31(3): 416-23, 2016 06.

Artigo em Inglês | MEDLINE | ID: mdl-26857465

RESUMO

This pilot study investigated the tongue pull-back (TPB) exercise to improve tongue-base retraction as well as two methods to add resistance to the TPB. Surface electromyography (sEMG) to the submental triangle was used as an indication of tongue-base activity on 13 healthy adults during: (1) saliva swallow, (2) 15 mL water swallow, (3) effortful swallow, (4) unassisted TPB, (5) TPB with added resistance by holding the tongue with gauze (finger-resisted TPB), and (6) TPB with the tongue clipped to a spring-loaded tension resistance device (device-resisted TPB). Order of the exercises was randomized. The exercises fell into two groups-weak and intense. Weak exercises included saliva swallow, water swallow, and unassisted TPB (mean sEMG = 19.07 µV, p = .593). Intense exercises included effortful swallow, finger-resisted TPB, and device-resisted TPB (mean sEMG = 36.44 µV, p = .315). Each intense exercise resulted in significantly higher mean sEMG peak amplitude than each weak exercise (p < .05), with one exception; the effortful swallow was not significantly different than the unassisted TPB (p = .171). This study provides preliminary evidence that the unassisted TPB may not be any more helpful for improving tongue-base retraction than normal swallowing. Adding resistance to the TPB by holding the tongue with gauze may be an effective alternative. This study also demonstrates proof-of-concept for creating a device to attach to the tongue and provide tension resistance during the TPB exercise. Further research with a more sophisticated design is needed before such a device can be fully developed and implemented clinically.

Assuntos

Deglutição/fisiologia , Treinamento Resistido/métodos , Língua/fisiologia , Ingestão de Líquidos , Eletromiografia/métodos , Feminino , Voluntários Saudáveis , Humanos , Masculino , Projetos Piloto , Treinamento Resistido/instrumentação , Saliva , Adulto Jovem

4.

The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance.

Tu, Ming; Wisler, Alan; Berisha, Visar; Liss, Julie M.

J Acoust Soc Am ; 140(5): EL416, 2016 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-27908075

RESUMO

State-of-the-art automatic speech recognition (ASR) engines perform well on healthy speech; however recent studies show that their performance on dysarthric speech is highly variable. This is because of the acoustic variability associated with the different dysarthria subtypes. This paper aims to develop a better understanding of how perceptual disturbances in dysarthric speech relate to ASR performance. Accurate ratings of a representative set of 32 dysarthric speakers along different perceptual dimensions are obtained and the performance of a representative ASR algorithm on the same set of speakers is analyzed. This work explores the relationship between these ratings and ASR performance and reveals that ASR performance can be predicted from perceptual disturbances in dysarthric speech with articulatory precision contributing the most to the prediction followed by prosody.

Assuntos

Disartria , Algoritmos , Humanos , Fala , Inteligibilidade da Fala , Medida da Produção da Fala , Interface para o Reconhecimento da Fala

5.

The role of stress and word size in Spanish speech segmentation.

LaCross, Amy; Liss, Julie; Barragan, Beatriz; Adams, Ashley; Berisha, Visar; McAuliffe, Megan; Fromont, Robert.

J Acoust Soc Am ; 140(6): EL484, 2016 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-28040010

RESUMO

In English, the predominance of stressed syllables as word onsets aids lexical segmentation in degraded listening conditions. Yet it is unlikely that these findings would readily transfer to languages with differing rhythmic structure. In the current study, the authors seek to examine whether listeners exploit both common word size (syllable number) and stress cues to aid lexical segmentation in Spanish. Forty-seven Spanish-speaking listeners transcribed two-word Spanish phrases in noise. As predicted by the statistical probabilities of Spanish, error analysis revealed that listeners preferred two- and three-syllable words with penultimate stress in their attempts to parse the degraded speech signal. These findings provide insight into the importance of stress in tandem with word size in the segmentation of Spanish words and suggest testable hypotheses for cross-linguistic studies that examine the effects of degraded acoustic cues on lexical segmentation.

Assuntos

Fala , Sinais (Psicologia) , Humanos , Fonética , Percepção da Fala

6.

The relationship between speech segment duration and vowel centralization in a group of older speakers.

Fletcher, Annalise R; McAuliffe, Megan J; Lansford, Kaitlin L; Liss, Julie M.

J Acoust Soc Am ; 138(4): 2132-9, 2015 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-26520296

RESUMO

This study examined the relationship between average vowel duration and spectral vowel quality across a group of 149 New Zealand English speakers aged 65 to 90 yr. The primary intent was to determine whether participants who had a natural tendency to speak slowly would also produce more spectrally distinct vowel segments. As a secondary aim, this study investigated whether advancing age exhibited a measurable effect on vowel quality and vowel durations within the group. In examining vowel quality, both flexible and static formant extraction points were compared. Two formant measurements, from selected [É:], [ i:], and [ o:] vowels, were extracted from a standard passage and used to calculate two measurements of vowel space area (VSA) for each speaker. Average vowel duration was calculated from segments across the passage. The study found a statistically significant relationship between speakers' average vowel durations and VSA measurements indicating that, on average, speakers with slower speech rates produced more acoustically distinct speech segments. As expected, increases in average vowel duration were found with advancing age. However, speakers' formant values remained unchanged. It is suggested that the use of a habitually slower speaking rate may assist speakers in maintaining acoustically distinct vowels.

Assuntos

Idoso/psicologia , Fonação , Fonética , Fatores Etários , Idoso de 80 Anos ou mais , Feminino , Hábitos , Humanos , Masculino , Fatores Sexuais , Espectrografia do Som , Acústica da Fala , Medida da Produção da Fala , Fatores de Tempo , Comportamento Verbal

7.

Characterizing the distribution of the quadrilateral vowel space area.

Berisha, Visar; Sandoval, Steven; Utianski, Rene; Liss, Julie; Spanias, Andreas.

J Acoust Soc Am ; 135(1): 421-7, 2014 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-24437782

RESUMO

The vowel space area (VSA) has been studied as a quantitative index of intelligibility to the extent it captures articulatory working space and reductions therein. The majority of such studies have been empirical wherein measures of VSA are correlated with perceptual measures of intelligibility. However, the literature contains minimal mathematical analysis of the properties of this metric. This paper further develops the theoretical underpinnings of this metric by presenting a detailed analysis of the statistical properties of the VSA and characterizing its distribution through the moment generating function. The theoretical analysis is confirmed by a series of experiments where empirically estimated and theoretically predicted statistics of this function are compared. The results show that on the Hillenbrand and TIMIT data, the theoretically predicted values of the higher-order statistics of the VSA match very well with the empirical estimates of the same.

Assuntos

Acústica da Fala , Inteligibilidade da Fala , Percepção da Fala , Qualidade da Voz , Simulação por Computador , Humanos , Modelos Estatísticos , Análise Numérica Assistida por Computador , Fonética

8.

Operationalizing Clinical Speech Analytics: Moving From Features to Measures for Real-World Clinical Impact.

Liss, Julie; Berisha, Visar.

J Speech Lang Hear Res ; : 1-7, 2024 Jun 05.

Artigo em Inglês | MEDLINE | ID: mdl-38838248

RESUMO

OBJECTIVE: This research note advocates for a methodological shift in clinical speech analytics, emphasizing the transition from high-dimensional speech feature representations to clinically validated speech measures designed to operationalize clinically relevant constructs of interest. The aim is to enhance model generalizability and clinical applicability in real-world settings. METHOD: We outline the challenges of using conventional supervised machine learning models in clinical speech analytics, particularly their limited generalizability and interpretability. We propose a new framework focusing on speech measures that are closely tied to specific speech constructs and have undergone rigorous validation. This research note discusses a case study involving the development of a measure for articulatory precision in amyotrophic lateral sclerosis (ALS), detailing the process from ideation through Food and Drug Administration (FDA) breakthrough status designation. RESULTS: The case study demonstrates how the operationalization of the articulatory precision construct into a quantifiable measure yields robust, clinically meaningful results. The measure's validation followed the V3 framework (verification, analytical validation, and clinical validation), showing high correlation with clinical status and speech intelligibility. The practical application of these measures is exemplified in a clinical trial and designation by the FDA as a breakthrough status device, underscoring their real-world impact. CONCLUSIONS: Transitioning from speech features to speech measures offers a more targeted approach for developing speech analytics tools in clinical settings. This shift ensures that models are not only technically sound but also clinically relevant and interpretable, thereby bridging the gap between laboratory research and practical health care applications. We encourage further exploration and adoption of this approach for developing interpretable speech representations tailored to specific clinical needs.

9.

Responsible development of clinical speech AI: Bridging the gap between clinical research and technology.

Berisha, Visar; Liss, Julie M.

NPJ Digit Med ; 7(1): 208, 2024 Aug 09.

Artigo em Inglês | MEDLINE | ID: mdl-39122889

RESUMO

This perspective article explores the challenges and potential of using speech as a biomarker in clinical settings, particularly when constrained by the small clinical datasets typically available in such contexts. We contend that by integrating insights from speech science and clinical research, we can reduce sample complexity in clinical speech AI models with the potential to decrease timelines to translation. Most existing models are based on high-dimensional feature representations trained with limited sample sizes and often do not leverage insights from speech science and clinical research. This approach can lead to overfitting, where the models perform exceptionally well on training data but fail to generalize to new, unseen data. Additionally, without incorporating theoretical knowledge, these models may lack interpretability and robustness, making them challenging to troubleshoot or improve post-deployment. We propose a framework for organizing health conditions based on their impact on speech and promote the use of speech analytics in diverse clinical contexts beyond cross-sectional classification. For high-stakes clinical use cases, we advocate for a focus on explainable and individually-validated measures and stress the importance of rigorous validation frameworks and ethical considerations for responsible deployment. Bridging the gap between AI research and clinical speech research presents new opportunities for more efficient translation of speech-based AI tools and advancement of scientific discoveries in this interdisciplinary space, particularly if limited to small or retrospective datasets.

10.

Electroencephalographic Classification Reveals Atypical Speech Motor Planning in Stuttering Adults.

Kinahan, Sean P; Saidi, Pouria; Daliri, Ayoub; Liss, Julie; Berisha, Visar.

J Speech Lang Hear Res ; 67(7): 2053-2076, 2024 Jul 09.

Artigo em Inglês | MEDLINE | ID: mdl-38924389

RESUMO

PURPOSE: This study explores speech motor planning in adults who stutter (AWS) and adults who do not stutter (ANS) by applying machine learning algorithms to electroencephalographic (EEG) signals. In this study, we developed a technique to holistically examine neural activity differences in speaking and silent reading conditions across the entire cortical surface. This approach allows us to test the hypothesis that AWS will exhibit lower separability of the speech motor planning condition. METHOD: We used the silent reading condition as a control condition to isolate speech motor planning activity. We classified EEG signals from AWS and ANS individuals into speaking and silent reading categories using kernel support vector machines. We used relative complexities of the learned classifiers to compare speech motor planning discernibility for both classes. RESULTS: AWS group classifiers require a more complex decision boundary to separate speech motor planning and silent reading classes. CONCLUSIONS: These findings indicate that the EEG signals associated with speech motor planning are less discernible in AWS, which may result from altered neuronal dynamics in AWS. Our results support the hypothesis that AWS exhibit lower inherent separability of the silent reading and speech motor planning conditions. Further investigation may identify and compare the features leveraged for speech motor classification in AWS and ANS. These observations may have clinical value for developing novel speech therapies or assistive devices for AWS.

Assuntos

Eletroencefalografia , Fala , Gagueira , Humanos , Gagueira/fisiopatologia , Gagueira/classificação , Eletroencefalografia/métodos , Adulto , Fala/fisiologia , Masculino , Feminino , Adulto Jovem , Leitura , Máquina de Vetores de Suporte , Aprendizado de Máquina

11.

Automated speech analytics in ALS: higher sensitivity of digital articulatory precision over the ALSFRS-R.

Stegmann, Gabriela; Krantsevich, Chelsea; Liss, Julie; Charles, Sherman; Bartlett, Meredith; Shefner, Jeremy; Rutkove, Seward; Kawabata, Kan; Talkar, Tanya; Berisha, Visar.

Amyotroph Lateral Scler Frontotemporal Degener ; : 1-9, 2024 Jun 26.

Artigo em Inglês | MEDLINE | ID: mdl-38932502

RESUMO

Objective: Although studies have shown that digital measures of speech detected ALS speech impairment and correlated with the ALSFRS-R speech item, no study has yet compared their performance in detecting speech changes. In this study, we compared the performances of the ALSFRS-R speech item and an algorithmic speech measure in detecting clinically important changes in speech. Importantly, the study was part of a FDA submission which received the breakthrough device designation for monitoring ALS; we provide this paper as a roadmap for validating other speech measures for monitoring disease progression. Methods: We obtained ALSFRS-R speech subscores and speech samples from participants with ALS. We computed the minimum detectable change (MDC) of both measures; using clinician-reported listener effort and a perceptual ratings of severity, we calculated the minimal clinically important difference (MCID) of each measure with respect to both sets of clinical ratings. Results: For articulatory precision, the MDC (.85) was lower than both MCID measures (2.74 and 2.28), and for the ALSFRS-R speech item, MDC (.86) was greater than both MCID measures (.82 and .72), indicating that while the articulatory precision measure detected minimal clinically important differences in speech, the ALSFRS-R speech item did not. Conclusion: The results demonstrate that the digital measure of articulatory precision effectively detects clinically important differences in speech ratings, outperforming the ALSFRS-R speech item. Taken together, the results herein suggest that this speech outcome is a clinically meaningful measure of speech change.

12.

Automatic assessment of vowel space area.

Sandoval, Steven; Berisha, Visar; Utianski, Rene L; Liss, Julie M; Spanias, Andreas.

J Acoust Soc Am ; 134(5): EL477-83, 2013 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-24181994

RESUMO

Vowel space area (VSA) is an attractive metric for the study of speech production deficits and reductions in intelligibility, in addition to the traditional study of vowel distinctiveness. Traditional VSA estimates are not currently sufficiently sensitive to map to production deficits. The present report describes an automated algorithm using healthy, connected speech rather than single syllables and estimates the entire vowel working space rather than corner vowels. Analyses reveal a strong correlation between the traditional VSA and automated estimates. When the two methods diverge, the automated method seems to provide a more accurate area since it accounts for all vowels.

Assuntos

Processamento de Sinais Assistido por Computador , Acústica da Fala , Inteligibilidade da Fala , Medida da Produção da Fala/métodos , Qualidade da Voz , Algoritmos , Automação , Feminino , Humanos , Masculino , Fonética , Espectrografia do Som , Fatores de Tempo

13.

The role of linguistic and indexical information in improved recognition of dysarthric speech.

Borrie, Stephanie A; McAuliffe, Megan J; Liss, Julie M; O'Beirne, Greg A; Anderson, Tim J.

J Acoust Soc Am ; 133(1): 474-82, 2013 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-23297919

RESUMO

This investigation examined perceptual learning of dysarthric speech. Forty listeners were randomly assigned to one of two identification training tasks, aimed at highlighting either the linguistic (word identification task) or indexical (speaker identification task) properties of the neurologically degraded signal. Twenty additional listeners served as a control group, passively exposed to the training stimuli. Immediately following exposure to dysarthric speech, all three listener groups completed an identical phrase transcription task. Analysis of listener transcripts revealed remarkably similar intelligibility improvements for listeners trained to attend to either the linguistic or the indexical properties of the signal. Perceptual learning effects were also evaluated with regards to underlying error patterns indicative of segmental and suprasegmental processing. The findings of this study suggest that elements within both the linguistic and indexical properties of the dysarthric signal are learnable and interact to promote improved processing of this type and severity of speech degradation. Thus, the current study extends support for the development of a model of perceptual processing in which the learning of indexical properties is encoded and retained in conjunction with linguistic properties of the signal.

Assuntos

Aprendizagem por Discriminação , Disartria/fisiopatologia , Fonética , Reconhecimento Psicológico , Acústica da Fala , Inteligibilidade da Fala , Percepção da Fala , Qualidade da Voz , Estimulação Acústica , Adulto , Análise de Variância , Atenção , Audiometria de Tons Puros , Audiometria da Fala , Limiar Auditivo , Distribuição de Qui-Quadrado , Sinais (Psicologia) , Feminino , Humanos , Aprendizagem , Masculino , Modelos Psicológicos , Índice de Gravidade de Doença , Adulto Jovem

14.

Crosslinguistic application of English-centric rhythm descriptors in motor speech disorders.

Liss, Julie M; Utianski, Rene; Lansford, Kaitlin.

Folia Phoniatr Logop ; 65(1): 3-19, 2013.

Artigo em Inglês | MEDLINE | ID: mdl-24157596

RESUMO

BACKGROUND: Rhythmic disturbances are a hallmark of motor speech disorders, in which the motor control deficits interfere with the outward flow of speech and by extension speech understanding. As the functions of rhythm are language-specific, breakdowns in rhythm should have language-specific consequences for communication. OBJECTIVE: The goals of this paper are to (i) provide a review of the cognitive-linguistic role of rhythm in speech perception in a general sense and crosslinguistically; (ii) present new results of lexical segmentation challenges posed by different types of dysarthria in American English, and (iii) offer a framework for crosslinguistic considerations for speech rhythm disturbances in the diagnosis and treatment of communication disorders associated with motor speech disorders. SUMMARY: This review presents theoretical and empirical reasons for considering speech rhythm as a critical component of communication deficits in motor speech disorders, and addresses the need for crosslinguistic research to explore language-universal versus language-specific aspects of motor speech disorders.

Assuntos

Idioma , Transtornos dos Movimentos/complicações , Periodicidade , Distúrbios da Fala , Ataxia/complicações , Ataxia/fisiopatologia , Barreiras de Comunicação , Sinais (Psicologia) , Disartria/etiologia , Disartria/fisiopatologia , Disartria/psicologia , Humanos , Transtornos dos Movimentos/fisiopatologia , Reconhecimento Fisiológico de Modelo , Distúrbios da Fala/etiologia , Distúrbios da Fala/fisiopatologia , Distúrbios da Fala/psicologia , Inteligibilidade da Fala , Percepção da Fala

15.

Dysarthria detection based on a deep learning model with a clinically-interpretable layer.

Xu, Lingfeng; Liss, Julie; Berisha, Visar.

JASA Express Lett ; 3(1): 015201, 2023 01.

Artigo em Inglês | MEDLINE | ID: mdl-36725533

RESUMO

Studies have shown deep neural networks (DNN) as a potential tool for classifying dysarthric speakers and controls. However, representations used to train DNNs are largely not clinically interpretable, which limits clinical value. Here, a model with a bottleneck layer is trained to jointly learn a classification label and four clinically-interpretable features. Evaluation of two dysarthria subtypes shows that the proposed method can flexibly trade-off between improved classification accuracy and discovery of clinically-interpretable deficit patterns. The analysis using Shapley additive explanation shows the model learns a representation consistent with the disturbances that define the two dysarthria subtypes considered in this work.

Assuntos

Aprendizado Profundo , Disartria , Humanos , Disartria/diagnóstico , Redes Neurais de Computação

16.

TorchDIVA: An extensible computational model of speech production built on an open-source machine learning library.

Kinahan, Sean P; Liss, Julie M; Berisha, Visar.

PLoS One ; 18(2): e0281306, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-36800358

RESUMO

The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; however, this is less than ideal as most of the development in speech technology research is done in Python. This means there is a wealth of machine learning tools which are freely available in the Python ecosystem that cannot be easily integrated with DIVA. We present TorchDIVA, a full rebuild of DIVA in Python using PyTorch tensors. DIVA source code was directly translated from Matlab to Python, and built-in Simulink signal blocks were implemented from scratch. After implementation, the accuracy of each module was evaluated via systematic block-by-block validation. The TorchDIVA model is shown to produce outputs that closely match those of the original DIVA model, with a negligible difference between the two. We additionally present an example of the extensibility of TorchDIVA as a research platform. Speech quality enhancement in TorchDIVA is achieved through an integration with an existing PyTorch generative vocoder called DiffWave. A modified DiffWave mel-spectrum upsampler was trained on human speech waveforms and conditioned on the TorchDIVA speech production. The results indicate improved speech quality metrics in the DiffWave-enhanced output as compared to the baseline. This enhancement would have been difficult or impossible to accomplish in the original Matlab implementation. This proof-of-concept demonstrates the value TorchDIVA can bring to the research community. Researchers can download the new implementation at: https://github.com/skinahan/DIVA_PyTorch.

Assuntos

Ecossistema , Fala , Humanos , Software , Simulação por Computador , Aprendizado de Máquina

17.

Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection.

Zhang, Jianwei; Liss, Julie; Jayasuriya, Suren; Berisha, Visar.

IEEE/ACM Trans Audio Speech Lang Process ; 31: 1348-1359, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-37899766

RESUMO

Approximately 1.2% of the world's population has impaired voice production. As a result, automatic dysphonic voice detection has attracted considerable academic and clinical interest. However, existing methods for automated voice assessment often fail to generalize outside the training conditions or to other related applications. In this paper, we propose a deep learning framework for generating acoustic feature embeddings sensitive to vocal quality and robust across different corpora. A contrastive loss is combined with a classification loss to train our deep learning model jointly. Data warping methods are used on input voice samples to improve the robustness of our method. Empirical results demonstrate that our method not only achieves high in-corpus and cross-corpus classification accuracy but also generates good embeddings sensitive to voice quality and robust across different corpora. We also compare our results against three baseline methods on clean and three variations of deteriorated in-corpus and cross-corpus datasets and demonstrate that the proposed model consistently outperforms the baseline methods.

18.

Reliability and validity of a widely-available AI tool for assessment of stress based on speech.

Yawer, Batul A; Liss, Julie; Berisha, Visar.

Sci Rep ; 13(1): 20224, 2023 11 18.

Artigo em Inglês | MEDLINE | ID: mdl-37980431

RESUMO

Cigna's online stress management toolkit includes an AI-based tool that purports to evaluate a person's psychological stress level based on analysis of their speech, the Cigna StressWaves Test (CSWT). In this study, we evaluate the claim that the CSWT is a "clinical grade" tool via an independent validation. The results suggest that the CSWT is not repeatable and has poor convergent validity; the public availability of the CSWT despite insufficient validation data highlights concerns regarding premature deployment of digital health tools for stress and anxiety management.

Assuntos

Inteligência Artificial , Fala , Humanos , Reprodutibilidade dos Testes

19.

Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation.

Mathad, Vikram C; Liss, Julie M; Chapman, Kathy; Scherer, Nancy; Berisha, Visar.

IEEE/ACM Trans Audio Speech Lang Process ; 31: 86-95, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-36712557

RESUMO

Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trained to classify between different consonants using CV regions as input. We demonstrate that the OAM is correlated with perceptual measures in a variety of contexts including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.

20.

A speech-based prognostic model for dysarthria progression in ALS.

Stegmann, Gabriela; Charles, Sherman; Liss, Julie; Shefner, Jeremy; Rutkove, Seward; Berisha, Visar.

Amyotroph Lateral Scler Frontotemporal Degener ; : 1-6, 2023 Jun 12.

Artigo em Inglês | MEDLINE | ID: mdl-37309077

RESUMO

Objective: We demonstrated that it was possible to predict ALS patients' degree of future speech impairment based on past data. We used longitudinal data from two ALS studies where participants recorded their speech on a daily or weekly basis and provided ALSFRS-R speech subscores on a weekly or quarterly basis (quarter-annually). Methods: Using their speech recordings, we measured articulatory precision (a measure of the crispness of pronunciation) using an algorithm that analyzed the acoustic signal of each phoneme in the words produced. First, we established the analytical and clinical validity of the measure of articulatory precision, showing that the measure correlated with perceptual ratings of articulatory precision (r = .9). Second, using articulatory precision from speech samples from each participant collected over a 45-90 day model calibration period, we showed it was possible to predict articulatory precision 30-90 days after the last day of the model calibration period. Finally, we showed that the predicted articulatory precision scores mapped onto ALSFRS-R speech subscores. Results: the mean absolute error was as low as 4% for articulatory precision and 14% for ALSFRS-R speech subscores relative to the total range of their respective scales. Conclusion: Our results demonstrated that a subject-specific prognostic model for speech predicts future articulatory precision and ALSFRS-R speech values accurately.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa