Results 1 - 20 of 40
1.
J Acoust Soc Am ; 140(5): EL416, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27908075

ABSTRACT

State-of-the-art automatic speech recognition (ASR) engines perform well on healthy speech; however, recent studies show that their performance on dysarthric speech is highly variable. This is because of the acoustic variability associated with the different dysarthria subtypes. This paper aims to develop a better understanding of how perceptual disturbances in dysarthric speech relate to ASR performance. Accurate ratings of a representative set of 32 dysarthric speakers along different perceptual dimensions are obtained, and the performance of a representative ASR algorithm on the same set of speakers is analyzed. This work explores the relationship between these ratings and ASR performance and reveals that ASR performance can be predicted from perceptual disturbances in dysarthric speech, with articulatory precision contributing the most to the prediction, followed by prosody.


Subjects
Dysarthria , Algorithms , Humans , Speech , Speech Intelligibility , Speech Production Measurement , Speech Recognition Software
2.
J Acoust Soc Am ; 138(4): 2132-9, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26520296

ABSTRACT

This study examined the relationship between average vowel duration and spectral vowel quality across a group of 149 New Zealand English speakers aged 65 to 90 years. The primary intent was to determine whether participants who had a natural tendency to speak slowly would also produce more spectrally distinct vowel segments. As a secondary aim, this study investigated whether advancing age exhibited a measurable effect on vowel quality and vowel durations within the group. In examining vowel quality, both flexible and static formant extraction points were compared. Two formant measurements, from selected [ɐ:], [i:], and [o:] vowels, were extracted from a standard passage and used to calculate two measurements of vowel space area (VSA) for each speaker. Average vowel duration was calculated from segments across the passage. The study found a statistically significant relationship between speakers' average vowel durations and VSA measurements indicating that, on average, speakers with slower speech rates produced more acoustically distinct speech segments. As expected, increases in average vowel duration were found with advancing age. However, speakers' formant values remained unchanged. It is suggested that the use of a habitually slower speaking rate may assist speakers in maintaining acoustically distinct vowels.
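
As an illustration of the VSA calculation mentioned in this abstract, the following is a minimal sketch, assuming the area is taken as the polygon spanned by each vowel's (F1, F2) point and computed with the shoelace formula. The function name and the formant values are hypothetical and are not taken from the study.

```python
def vowel_space_area(formants):
    """Area of the polygon whose vertices are (F1, F2) pairs in Hz.

    Uses the shoelace formula; vertices must be listed in order
    around the polygon (three corner vowels give a triangle).
    """
    n = len(formants)
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = formants[i]
        f1_b, f2_b = formants[(i + 1) % n]  # wrap around to the first vertex
        twice_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_area) / 2.0


# Hypothetical (F1, F2) values in Hz for three corner vowels.
corner_vowels = [(300.0, 2300.0), (750.0, 1300.0), (400.0, 800.0)]
print(vowel_space_area(corner_vowels))  # area in Hz^2
```

A larger area indicates corner vowels that are farther apart in formant space, which is the sense of "more acoustically distinct" used above.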


Subjects
Aged/psychology , Phonation , Phonetics , Age Factors , Aged, 80 and over , Female , Habits , Humans , Male , Sex Factors , Sound Spectrography , Speech Acoustics , Speech Production Measurement , Time Factors , Verbal Behavior
3.
NPJ Digit Med ; 7(1): 208, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39122889

ABSTRACT

This perspective article explores the challenges and potential of using speech as a biomarker in clinical settings, particularly when constrained by the small clinical datasets typically available in such contexts. We contend that by integrating insights from speech science and clinical research, we can reduce sample complexity in clinical speech AI models with the potential to decrease timelines to translation. Most existing models are based on high-dimensional feature representations trained with limited sample sizes and often do not leverage insights from speech science and clinical research. This approach can lead to overfitting, where the models perform exceptionally well on training data but fail to generalize to new, unseen data. Additionally, without incorporating theoretical knowledge, these models may lack interpretability and robustness, making them challenging to troubleshoot or improve post-deployment. We propose a framework for organizing health conditions based on their impact on speech and promote the use of speech analytics in diverse clinical contexts beyond cross-sectional classification. For high-stakes clinical use cases, we advocate for a focus on explainable and individually-validated measures and stress the importance of rigorous validation frameworks and ethical considerations for responsible deployment. Bridging the gap between AI research and clinical speech research presents new opportunities for more efficient translation of speech-based AI tools and advancement of scientific discoveries in this interdisciplinary space, particularly if limited to small or retrospective datasets.

4.
J Acoust Soc Am ; 134(5): EL477-83, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24181994

ABSTRACT

Vowel space area (VSA) is an attractive metric for the study of speech production deficits and reductions in intelligibility, in addition to the traditional study of vowel distinctiveness. Traditional VSA estimates are not currently sufficiently sensitive to map to production deficits. The present report describes an automated algorithm using healthy, connected speech rather than single syllables and estimates the entire vowel working space rather than corner vowels. Analyses reveal a strong correlation between the traditional VSA and automated estimates. When the two methods diverge, the automated method seems to provide a more accurate area since it accounts for all vowels.


Subjects
Signal Processing, Computer-Assisted , Speech Acoustics , Speech Intelligibility , Speech Production Measurement/methods , Voice Quality , Algorithms , Automation , Female , Humans , Male , Phonetics , Sound Spectrography , Time Factors
5.
J Acoust Soc Am ; 133(1): 474-82, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23297919

ABSTRACT

This investigation examined perceptual learning of dysarthric speech. Forty listeners were randomly assigned to one of two identification training tasks, aimed at highlighting either the linguistic (word identification task) or indexical (speaker identification task) properties of the neurologically degraded signal. Twenty additional listeners served as a control group, passively exposed to the training stimuli. Immediately following exposure to dysarthric speech, all three listener groups completed an identical phrase transcription task. Analysis of listener transcripts revealed remarkably similar intelligibility improvements for listeners trained to attend to either the linguistic or the indexical properties of the signal. Perceptual learning effects were also evaluated with regard to underlying error patterns indicative of segmental and suprasegmental processing. The findings of this study suggest that elements within both the linguistic and indexical properties of the dysarthric signal are learnable and interact to promote improved processing of this type and severity of speech degradation. Thus, the current study extends support for the development of a model of perceptual processing in which the learning of indexical properties is encoded and retained in conjunction with linguistic properties of the signal.


Subjects
Discrimination Learning , Dysarthria/physiopathology , Phonetics , Recognition, Psychology , Speech Acoustics , Speech Intelligibility , Speech Perception , Voice Quality , Acoustic Stimulation , Adult , Analysis of Variance , Attention , Audiometry, Pure-Tone , Audiometry, Speech , Auditory Threshold , Chi-Square Distribution , Cues , Female , Humans , Learning , Male , Models, Psychological , Severity of Illness Index , Young Adult
6.
Folia Phoniatr Logop ; 65(1): 3-19, 2013.
Article in English | MEDLINE | ID: mdl-24157596

ABSTRACT

BACKGROUND: Rhythmic disturbances are a hallmark of motor speech disorders, in which the motor control deficits interfere with the outward flow of speech and by extension speech understanding. As the functions of rhythm are language-specific, breakdowns in rhythm should have language-specific consequences for communication. OBJECTIVE: The goals of this paper are to (i) provide a review of the cognitive-linguistic role of rhythm in speech perception in a general sense and crosslinguistically; (ii) present new results of lexical segmentation challenges posed by different types of dysarthria in American English, and (iii) offer a framework for crosslinguistic considerations for speech rhythm disturbances in the diagnosis and treatment of communication disorders associated with motor speech disorders. SUMMARY: This review presents theoretical and empirical reasons for considering speech rhythm as a critical component of communication deficits in motor speech disorders, and addresses the need for crosslinguistic research to explore language-universal versus language-specific aspects of motor speech disorders.


Subjects
Language , Movement Disorders/complications , Periodicity , Speech Disorders , Ataxia/complications , Ataxia/physiopathology , Communication Barriers , Cues , Dysarthria/etiology , Dysarthria/physiopathology , Dysarthria/psychology , Humans , Movement Disorders/physiopathology , Pattern Recognition, Physiological , Speech Disorders/etiology , Speech Disorders/physiopathology , Speech Disorders/psychology , Speech Intelligibility , Speech Perception
7.
PLoS One ; 18(2): e0281306, 2023.
Article in English | MEDLINE | ID: mdl-36800358

ABSTRACT

The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; however, this is less than ideal as most of the development in speech technology research is done in Python. This means there is a wealth of machine learning tools which are freely available in the Python ecosystem that cannot be easily integrated with DIVA. We present TorchDIVA, a full rebuild of DIVA in Python using PyTorch tensors. DIVA source code was directly translated from Matlab to Python, and built-in Simulink signal blocks were implemented from scratch. After implementation, the accuracy of each module was evaluated via systematic block-by-block validation. The TorchDIVA model is shown to produce outputs that closely match those of the original DIVA model, with a negligible difference between the two. We additionally present an example of the extensibility of TorchDIVA as a research platform. Speech quality enhancement in TorchDIVA is achieved through an integration with an existing PyTorch generative vocoder called DiffWave. A modified DiffWave mel-spectrum upsampler was trained on human speech waveforms and conditioned on the TorchDIVA speech production. The results indicate improved speech quality metrics in the DiffWave-enhanced output as compared to the baseline. This enhancement would have been difficult or impossible to accomplish in the original Matlab implementation. This proof-of-concept demonstrates the value TorchDIVA can bring to the research community. Researchers can download the new implementation at: https://github.com/skinahan/DIVA_PyTorch.


Subjects
Ecosystem , Speech , Humans , Software , Computer Simulation , Machine Learning
8.
Article in English | MEDLINE | ID: mdl-36712557

ABSTRACT

Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trained to classify between different consonants using CV regions as input. We demonstrate that the OAM is correlated with perceptual measures in a variety of contexts including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.

9.
J Speech Lang Hear Res ; 66(8S): 3132-3150, 2023 Aug 17.
Article in English | MEDLINE | ID: mdl-37071795

ABSTRACT

PURPOSE: Defined as the similarity of speech behaviors between interlocutors, speech entrainment plays an important role in successful adult conversations. According to theoretical models of entrainment and research on motoric, cognitive, and social developmental milestones, the ability to entrain should develop throughout adolescence. However, little is known about the specific developmental trajectory or the role of speech entrainment in conversational outcomes of this age group. The purpose of this study is to characterize speech entrainment patterns in the conversations of neurotypical early adolescents. METHOD: This study utilized a corpus of 96 task-based conversations between adolescents between the ages of 9 and 14 years and a comparison corpus of 32 task-based conversations between adults. For each conversational turn, two speech entrainment scores were calculated for 429 acoustic features across rhythmic, articulatory, and phonatory dimensions. Predictive modeling was used to evaluate the degree of entrainment and relationship between entrainment and two metrics of conversational success. RESULTS: Speech entrainment increased throughout early adolescence but did not reach the level exhibited in conversations between adults. Additionally, speech entrainment was predictive of both conversational quality and conversational efficiency. Furthermore, models that included all acoustic features and both entrainment types performed better than models that only included individual acoustic feature sets or one type of entrainment. CONCLUSIONS: Our findings show that speech entrainment skills are largely developed during early adolescence with continued development possibly occurring across later adolescence. Additionally, results highlight the role of speech entrainment in successful conversation in this population, suggesting the import of continued exploration of this phenomenon in both neurotypical and neurodivergent adolescents. We also provide evidence of the value of using holistic measures that capture the multidimensionality of speech entrainment and provide a validated methodology for investigating entrainment across multiple acoustic features and entrainment types.


Subjects
Communication , Speech , Adult , Humans , Adolescent , Child , Phonation , Speech Production Measurement , Acoustics
10.
Schizophr Bull ; 49(Suppl_2): S183-S195, 2023 Mar 22.
Article in English | MEDLINE | ID: mdl-36946533

ABSTRACT

BACKGROUND AND HYPOTHESIS: Automated language analysis is becoming an increasingly popular tool in clinical research involving individuals with mental health disorders. Previous work has largely focused on using high-dimensional language features to develop diagnostic and prognostic models, but less work has been done to use linguistic output to assess downstream functional outcomes, which is critically important for clinical care. In this work, we study the relationship between automated language composites and clinical variables that characterize mental health status and functional competency using predictive modeling. STUDY DESIGN: Conversational transcripts were collected from a social skills assessment of individuals with schizophrenia (n = 141), bipolar disorder (n = 140), and healthy controls (n = 22). A set of composite language features based on a theoretical framework of speech production were extracted from each transcript and predictive models were trained. The prediction targets included clinical variables for assessment of mental health status and social and functional competency. All models were validated on a held-out test sample not accessible to the model designer. STUDY RESULTS: Our models predicted the neurocognitive composite with Pearson correlation PCC = 0.674; PANSS-positive with PCC = 0.509; PANSS-negative with PCC = 0.767; social skills composite with PCC = 0.785; functional competency composite with PCC = 0.616. Language features related to volition, affect, semantic coherence, appropriateness of response, and lexical diversity were useful for prediction of clinical variables. CONCLUSIONS: Language samples provide useful information for the prediction of a variety of clinical variables that characterize mental health status and functional competency.
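
The PCC values reported in this abstract are Pearson correlations between model predictions and observed clinical scores. A minimal sketch of that computation follows; the function name and the sample numbers are hypothetical and are not data from the study.

```python
import math


def pearson(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    # Square roots of the sums of squared deviations.
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (dev_p * dev_o)


# Hypothetical held-out predictions and clinician-rated scores.
predicted_scores = [2.1, 3.4, 1.8, 4.0, 2.9]
observed_scores = [2.0, 3.9, 1.5, 3.6, 3.1]
print(round(pearson(predicted_scores, observed_scores), 3))
```

Computing the coefficient only on a held-out sample, as the abstract describes, keeps the estimate from being inflated by overfitting to the training data.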


Subjects
Bipolar Disorder , Schizophrenia , Humans , Schizophrenia/diagnosis , Speech , Communication , Health Status
11.
J Acoust Soc Am ; 131(2): EL112-8, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22352609

ABSTRACT

There is substantial performance variability among listeners who transcribe degraded speech. Error patterns from 88 listeners who transcribed dysarthric speech were examined to identify differential use of syllabic strength cues for lexical segmentation. Transcripts from listeners were divided into four groups (ranging from Better- to Poorer-performing). Phrases classified as Higher- and Lower-intelligibility were analyzed separately for each performance group to assess the independent variable of severity. Results revealed that all four listener groups used syllabic strength cues for lexical segmentation of Higher-intelligibility speech, but only the Poorer listeners persisted with this strategy for the Lower-intelligibility phrases. This finding and additional analyses suggest testable hypotheses to address the role of cue-use and performance patterns.


Subjects
Auditory Perception/physiology , Cues , Dysarthria/psychology , Speech Intelligibility/physiology , Adolescent , Adult , Female , Humans , Male , Middle Aged , Observer Variation , Phonetics , Young Adult
12.
J Acoust Soc Am ; 132(2): EL102-8, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22894306

ABSTRACT

Differences in perceptual strategies for lexical segmentation of moderate hypokinetic dysarthric speech, apparently related to the conditions of the familiarization procedure, have been previously reported [Borrie et al., Language and Cognitive Processes (2012)]. The current follow-up investigation examined whether this difference was also observed when familiarization stimuli highlighted syllabic strength contrast cues. Forty listeners completed an identical transcription task following familiarization with dysarthric phrases presented under either passive or explicit learning conditions. Lexical boundary error patterns revealed that syllabic strength cues were exploited in both familiarization conditions. Comparisons with data previously reported afford further insight into perceptual learning of dysarthric speech.


Subjects
Cues , Dysarthria/physiopathology , Recognition, Psychology , Speech Acoustics , Speech Intelligibility , Speech Perception , Acoustic Stimulation , Adult , Analysis of Variance , Audiometry, Speech , Humans , Learning , Photic Stimulation , Reading , Young Adult
13.
J Med Speech Lang Pathol ; 19(4): 25-36, 2011 Dec 01.
Article in English | MEDLINE | ID: mdl-24569812

ABSTRACT

Benefits to speech intelligibility can be achieved by enhancing a listener's ability to decipher it. However, much remains to be learned about the variables that influence the effectiveness of various listener-based manipulations. This study examined the benefit of providing listeners with the topic of some phrases produced by speakers with either hypokinetic or ataxic dysarthria. Total and topic word accuracy, topic-related substitutions, and lexical boundary errors were calculated from the listener transcripts. Data were compared with those who underwent a familiarization process (reported by Liss, Spitzer, Caviness, & Adler, 2002) and with those inexperienced with disordered speech (reported by Liss, Spitzer, Caviness, & Adler, 2000). Results revealed that listeners of ataxic speech provided with topic knowledge obtained higher intelligibility scores than naïve listeners. The magnitude of benefit was similar to the familiarization condition. However, topic word and word substitution analyses revealed different underlying perceptual mechanisms responsible for the observed benefit. No differences attributable to listening condition were discovered in lexical segmentation patterns. Overall, the results support the need for further study of listener-based manipulations to elucidate the mechanisms responsible for the observed perceptual benefits for each dysarthria type.

14.
IEEE Trans Biomed Eng ; 68(10): 2986-2996, 2021 Oct.
Article in English | MEDLINE | ID: mdl-33566756

ABSTRACT

OBJECTIVES: Evaluation of hypernasality requires extensive perceptual training by clinicians and extending this training on a large scale internationally is untenable; this compounds the health disparities that already exist among children with cleft. In this work, we present the objective hypernasality measure (OHM), a speech-based algorithm that automatically measures hypernasality in speech, and validate it relative to a group of trained clinicians. METHODS: We trained a deep neural network (DNN) on approximately 100 hours of a publicly-available healthy speech corpus to detect the presence of nasal acoustic cues generated through the production of nasal consonants and nasalized phonemes in speech. Importantly, this model does not require any clinical data for training. The posterior probabilities of the deep learning model were aggregated at the sentence and speaker levels to compute the OHM. RESULTS: The results showed that the OHM was significantly correlated with perceptual hypernasality ratings from the Americleft database (r = 0.797, p < 0.001) and the New Mexico Cleft Palate Center (NMCPC) database (r = 0.713, p < 0.001). In addition, we evaluated the relationship between the OHM and articulation errors; the sensitivity of the OHM in detecting the presence of very mild hypernasality; and established the internal reliability of the metric. Further, the performance of the OHM was compared with a DNN regression algorithm directly trained on the hypernasal speech samples. SIGNIFICANCE: The results indicate that the OHM is able to measure the severity of hypernasality on par with Americleft-trained clinicians on this dataset.


Subjects
Cleft Palate , Deep Learning , Voice Disorders , Child , Cleft Palate/diagnosis , Humans , Reproducibility of Results , Speech Production Measurement
15.
Front Neurol ; 12: 795374, 2021.
Article in English | MEDLINE | ID: mdl-34956070

ABSTRACT

Clinical assessments often use complex picture description tasks to elicit natural speech patterns and magnify changes occurring in brain regions implicated in Alzheimer's disease and dementia. As the Cookie Theft picture description task is used in the largest Alzheimer's disease and dementia cohort studies available, we aimed to create algorithms that could characterize the visual narrative path a participant takes in describing what is happening in this image. We proposed spatio-semantic graphs, models based on graph theory that transform the participants' narratives into graphs that retain semantic order and encode the visuospatial information between content units in the image. The resulting graphs differ between Cognitively Impaired and Unimpaired participants in several important ways. Cognitively Impaired participants consistently scored higher on features that are heavily associated with symptoms of cognitive decline, including repetition, evidence of short-term memory lapses, and generally disorganized narrative descriptions, while Cognitively Unimpaired participants produced more efficient narrative paths. These results provide evidence that spatio-semantic graph analysis of these tasks can generate important insights into a participant's cognitive performance that cannot be generated from semantic analysis alone.

16.
J Speech Lang Hear Res ; 63(8): 2637-2648, 2020 Aug 10.
Article in English | MEDLINE | ID: mdl-32697611

ABSTRACT

Purpose: In our previous studies, we showed that the brain modulates the auditory system, and the modulation starts during speech planning. However, it remained unknown whether the brain uses similar mechanisms to modulate the orofacial somatosensory system. Here, we developed a novel behavioral paradigm to (a) examine whether the somatosensory system is modulated during speech planning and (b) determine the somatosensory modulation's time course during planning and production. Method: Participants (N = 20) completed two experiments in which we applied electrical current stimulation to the lower lip to induce somatosensory sensation. In the first experiment, we used a staircase method (one-up, four-down) to determine each participant's perceptual threshold at rest (i.e., the stimulus that the participant detected on 85% of trials). In the second experiment, we estimated each participant's detection ratio of electrical stimuli (with a magnitude equivalent of their perceptual threshold) delivered at various time points before speaking and during a control condition (silent reading). Results: We found that the overall detection ratio in the silent reading condition remained unchanged relative to the detection ratio at rest. Approximately 536 ms before speech onset, the detection ratio in the speaking condition was similar to that in the silent reading condition; however, the detection ratio in the speaking condition gradually started to decrease and reached its lowest level at 58 ms before speech onset. Conclusions: Overall, we provided compelling behavioral evidence that, as the speech motor system prepares speech movements, it also modulates the orofacial somatosensory system in a temporally specific manner.
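
The one-up, four-down staircase named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the step size, starting level, trial count, and the deterministic stand-in observer are all assumptions. In the standard transformed up-down rule, the level is lowered after four consecutive detections and raised after any miss, which converges near the level detected on about 84% of trials (0.5 ** 0.25 ≈ 0.841), close to the 85% quoted above.

```python
def run_staircase(detects, start, step, n_trials, n_down=4):
    """Return the stimulus level presented on each trial.

    detects: callable taking a level and returning True if it was detected.
    The level drops by `step` after `n_down` consecutive detections
    and rises by `step` after any single miss (one-up, n-down rule).
    """
    level = start
    streak = 0  # consecutive detections at the current level
    history = []
    for _ in range(n_trials):
        history.append(level)
        if detects(level):
            streak += 1
            if streak == n_down:
                level -= step
                streak = 0
        else:
            level += step
            streak = 0
    return history


# Hypothetical idealized observer: detects any stimulus at or above 5.0 units.
levels = run_staircase(lambda x: x >= 5.0, start=8.0, step=1.0, n_trials=20)
print(levels)
```

With a real participant the response is probabilistic, so the threshold is usually estimated by averaging the levels at the last several reversals rather than reading off a single trial.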


Subjects
Speech Perception , Speech , Brain , Humans , Lip , Reading
17.
IEEE J Sel Top Signal Process ; 14(2): 282-298, 2020 Feb.
Article in English | MEDLINE | ID: mdl-33907590

ABSTRACT

It is widely accepted that information derived from analyzing speech (the acoustic signal) and language production (words and sentences) serves as a useful window into the health of an individual's cognitive ability. In fact, most neuropsychological testing batteries have a component related to speech and language where clinicians elicit speech from patients for subjective evaluation across a broad set of dimensions. With advances in speech signal processing and natural language processing, there has been recent interest in developing tools to detect more subtle changes in cognitive-linguistic function. This work relies on extracting a set of features from recorded and transcribed speech for objective assessments of speech and language, early diagnosis of neurological disease, and tracking of disease after diagnosis. With an emphasis on cognitive and thought disorders, in this paper we provide a review of existing speech and language features used in this domain, discuss their clinical application, and highlight their advantages and disadvantages. Broadly speaking, the review is split into two categories: language features based on natural language processing and speech features based on speech signal processing. Within each category, we consider features that aim to measure complementary dimensions of cognitive-linguistics, including language diversity, syntactic complexity, semantic coherence, and timing. We conclude the review with a proposal of new research directions to further advance the field.

18.
J Speech Lang Hear Res ; 63(1): 83-94, 2020 Jan 22.
Article in English | MEDLINE | ID: mdl-31855608

ABSTRACT

Purpose: Despite the import of conversational entrainment to successful spoken dialogue, the systematic characterization of this behavioral syncing phenomenon represents a critical gap in the field of speech pathology. The goal of this study was to acoustically characterize conversational entrainment in the context of dysarthria using a multidimensional approach previously validated in healthy populations (healthy conversations; Borrie, Barrett, Willi, & Berisha, 2019). Method: A large corpus of goal-oriented conversations between participants with dysarthria and healthy participants (disordered conversations) was elicited using a "spot the difference" task. Expert clinical assessment of entrainment and a measure of conversational success (communicative efficiency) was obtained for each of the audio-recorded conversations. Conversational entrainment of acoustic features representing rhythmic, articulatory, and phonatory dimensions of speech was identified using cross-recurrence quantification analysis with clinically informed model parameters and validated with a sham condition involving conversational participants who did not converse with one another. The relationship between conversational entrainment and communicative efficiency was examined. Results: Acoustic evidence of entrainment was observed in phonatory, but not rhythmic and articulatory, behavior, a finding that differs from healthy conversations in which entrainment was observed in all speech signal dimensions. This result, that disordered conversations showed less acoustic entrainment than healthy conversations, is corroborated by clinical assessment of entrainment in which the disordered conversations were rated, overall, as being less in sync than healthy conversations. Furthermore, acoustic entrainment was predictive of communicative efficiency, corroborated by a relationship between clinical assessment and the same outcome measure. Conclusions: The findings confirm our hypothesis that the pathological speech production parameters of dysarthria disrupt the seemingly ubiquitous phenomenon of conversational entrainment, thus advancing entrainment deficits as an important variable in dysarthria, one that may have causative effects on the success of everyday communication. Results further reveal that while this approach provides a broad overview, methodologies for characterizing conversational entrainment in dysarthria must continue to be developed and refined, with a focus on clinical utility. Supplemental Material: https://osf.io/ktg5q.


Subjects
Communication , Dysarthria/physiopathology , Phonation/physiology , Speech Acoustics , Verbal Behavior/physiology , Adult , Efficiency/physiology , Female , Humans , Male , Speech Production Measurement , Voice Quality
19.
J Speech Lang Hear Res ; 52(5): 1334-52, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19717656

ABSTRACT

PURPOSE: In this study, the authors examined whether rhythm metrics capable of distinguishing languages with high and low temporal stress contrast also can distinguish among control and dysarthric speakers of American English with perceptually distinct rhythm patterns. METHODS: Acoustic measures of vocalic and consonantal segment durations were obtained for speech samples from 55 speakers across 5 groups (hypokinetic, hyperkinetic, flaccid-spastic, ataxic dysarthrias, and controls). Segment durations were used to calculate standard and new rhythm metrics. Discriminant function analyses (DFAs) were used to determine which sets of predictor variables (rhythm metrics) best discriminated between groups (control vs. dysarthrias; and among the 4 dysarthrias). A cross-validation method was used to test the robustness of each original DFA. RESULTS: The majority of classification functions were more than 80% successful in classifying speakers into their appropriate group. New metrics that combined successive vocalic and consonantal segments emerged as important predictor variables. DFAs pitting each dysarthria group against the combined others resulted in unique constellations of predictor variables that yielded high levels of classification accuracy. CONCLUSIONS: This study confirms the ability of rhythm metrics to distinguish control speech from dysarthrias and to discriminate dysarthria subtypes. Rhythm metrics show promise for use as a rational and objective clinical tool.
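
The abstract does not list which rhythm metrics were computed. As a hedged illustration of how such metrics are derived from segment durations, the sketch below implements two widely used ones, %V (percentage of utterance duration that is vocalic) and the normalized pairwise variability index (nPVI). Treating these as among the "standard" metrics is an assumption, and the durations are hypothetical.

```python
def percent_v(vocalic, consonantal):
    """%V: share of total segment duration that is vocalic, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def npvi(durations):
    """Normalized pairwise variability index over successive intervals.

    Averages |d_k - d_(k+1)| divided by the pair's mean duration,
    then scales by 100. Requires at least two durations.
    """
    pairs = list(zip(durations[:-1], durations[1:]))
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vocalic and consonantal interval durations in seconds.
vocalic_s = [0.08, 0.15, 0.06, 0.12]
consonantal_s = [0.07, 0.10, 0.09, 0.05]
print(percent_v(vocalic_s, consonantal_s), npvi(vocalic_s))
```

Metrics like these would then serve as the predictor variables in a discriminant function analysis of the kind the abstract describes.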


Subjects
Dysarthria/diagnosis , Dysarthria/physiopathology , Speech Articulation Tests , Speech/physiology , Analysis of Variance , Ataxia/diagnosis , Ataxia/physiopathology , Humans , Language , Predictive Value of Tests , Speech Acoustics , Time Factors
20.
Article in English | MEDLINE | ID: mdl-33376454

ABSTRACT

A key initial step in several natural language processing (NLP) tasks involves embedding phrases of text to vectors of real numbers that preserve semantic meaning. To that end, several methods have been recently proposed with impressive results on semantic similarity tasks. However, all of these approaches assume that perfect transcripts are available when generating the embeddings. While this is a reasonable assumption for analysis of written text, it is limiting for analysis of transcribed text. In this paper we investigate the effects of word substitution errors, such as those coming from automatic speech recognition (ASR) errors, on several state-of-the-art sentence embedding methods. To do this, we propose a new simulator that allows the experimenter to induce ASR-plausible word substitution errors in a corpus at a desired word error rate. We use this simulator to evaluate the robustness of several sentence embedding methods. Our results show that pre-trained neural sentence encoders are both robust to ASR errors and perform well on textual similarity tasks after errors are introduced. Meanwhile, unweighted averages of word vectors perform well with perfect transcriptions, but their performance degrades rapidly on textual similarity tasks for text with word substitution errors.
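
The idea of inducing substitution errors at a chosen word error rate can be sketched as below. This is not the simulator from the paper: the confusion table, the function name, and the uniform random choice of which words to corrupt are all hypothetical stand-ins for whatever model of ASR-plausible confusions the authors used.

```python
import random


def corrupt(words, target_wer, confusions, seed=0):
    """Substitute words so that substitutions / len(words) is near target_wer.

    confusions maps a word to a list of plausible replacements
    (each different from the word itself). Only words present in
    the table can be substituted.
    """
    rng = random.Random(seed)  # seeded for reproducible corruption
    candidates = [i for i, w in enumerate(words) if w in confusions]
    n_errors = min(len(candidates), round(target_wer * len(words)))
    corrupted = list(words)
    for i in rng.sample(candidates, n_errors):
        corrupted[i] = rng.choice(confusions[words[i]])
    return corrupted


# Hypothetical confusion table of acoustically similar words.
confusion_table = {
    "the": ["a"], "cat": ["cap"], "sat": ["set"],
    "on": ["in"], "mat": ["map"],
}
reference = "the cat sat on the mat".split()
hypothesis = corrupt(reference, target_wer=0.5, confusions=confusion_table)
print(" ".join(hypothesis))
```

Because only substitutions are introduced, the achieved word error rate is simply the number of changed positions divided by the reference length, which makes the corruption level easy to verify.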
