Results 1 - 16 of 16
1.
Curr Alzheimer Res ; 17(7): 658-666, 2020.
Article in English | MEDLINE | ID: mdl-33032509

ABSTRACT

BACKGROUND: Current conventional cognitive assessments are limited in their efficiency and sensitivity, often relying on a single score such as the total correct items. Typically, multiple features of the response go uncaptured. OBJECTIVES: We aim to explore a new set of automatically derived features from the Digit Span (DS) task that address some of the drawbacks of conventional scoring and are also useful for distinguishing subjects with Mild Cognitive Impairment (MCI) from those with intact cognition. METHODS: Audio recordings of the DS tests administered to 85 subjects (22 MCI and 63 healthy controls, mean age 90.2 years) were transcribed using an Automatic Speech Recognition (ASR) system. Next, five correctness measures were generated from a Levenshtein distance analysis of responses: the number of correct, incorrect, deleted, inserted, and substituted words compared to the test item. These per-item features were aggregated across all test items for both the Forward Digit Span (FDS) and Backward Digit Span (BDS) tasks using summary statistical functions, constructing a global feature vector representing the detailed assessment of each subject's response. A support vector machine (SVM) classifier distinguished MCI from cognitively intact participants. RESULTS: Conventional DS scores did not differentiate MCI participants from controls. The automated multi-feature DS-derived metric achieved an AUC-ROC of 73% with the SVM classifier, independent of additional clinical features (77% when combined with subjects' demographic features), well above the 50% chance level. CONCLUSION: Our analysis verifies the effectiveness of the introduced measures, derived solely from the DS task, in differentiating subjects with MCI from those with intact cognition.
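The per-item correctness measures described above can be obtained from a Levenshtein (edit distance) alignment between the target item and the ASR-transcribed response. The sketch below is an illustrative reimplementation, not the authors' code, and the example digits are made up:

```python
def align_counts(target, response):
    """Count correct, substituted, deleted, and inserted tokens
    via a standard Levenshtein alignment (dynamic programming)."""
    n, m = len(target), len(response)
    # dp[i][j] = minimum edit cost aligning target[:i] with response[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (target[i - 1] != response[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Trace back through the table to classify each alignment step
    counts = {"correct": 0, "substituted": 0, "deleted": 0, "inserted": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (target[i - 1] != response[j - 1]):
            counts["correct" if target[i - 1] == response[j - 1] else "substituted"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            counts["deleted"] += 1      # target digit missing from response
            i -= 1
        else:
            counts["inserted"] += 1     # extra digit in response
            j -= 1
    return counts

# e.g. target "5 2 9", subject repeats "5 9" (one digit dropped)
print(align_counts(["5", "2", "9"], ["5", "9"]))
```

Aggregating such per-item counts with summary statistics across FDS and BDS items would yield the global feature vector the abstract describes (the separate "incorrect" count can be derived from these, e.g. substitutions plus deletions).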

2.
Proc Conf Assoc Comput Linguist Meet ; 2020: 177-185, 2020 Jul.
Article in English | MEDLINE | ID: mdl-33060888

ABSTRACT

Many clinical assessment instruments used to diagnose language impairments in children include a task in which the subject must formulate a sentence to describe an image using a specific target word. Because producing sentences in this way requires the speaker to integrate syntactic and semantic knowledge in a complex manner, responses are typically evaluated on several different dimensions of appropriateness, yielding a single composite score for each response. In this paper, we present a dataset consisting of non-clinically elicited responses for three related sentence formulation tasks, and we propose an approach for automatically evaluating their appropriateness. Using neural machine translation, we generate correct-incorrect sentence pairs to serve as synthetic data in order to increase the amount and diversity of training data for our scoring model. Our scoring model uses transfer learning to facilitate automatic sentence appropriateness evaluation. We further compare custom word embeddings with pre-trained contextualized embeddings serving as features for our scoring model. We find that transfer learning improves scoring accuracy, particularly when using pre-trained contextualized embeddings.

3.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 6111-6114, 2020 07.
Article in English | MEDLINE | ID: mdl-33019365

ABSTRACT

This study describes a fully automated method of expressive language assessment based on vocal responses of children to a sentence repetition task (SRT), a language test that taps into core language skills. Our proposed method automatically transcribes the vocal responses using a test-specific automatic speech recognition system. From the transcriptions, a regression model predicts the gold standard test scores provided by speech-language pathologists. Our preliminary experimental results on audio recordings of 104 children (43 with typical development and 61 with a neurodevelopmental disorder) verify the feasibility of the proposed automatic method for predicting gold standard scores on this language test, with an averaged mean absolute error of 6.52 (on an observed score range from 0 to 90 with a mean value of 49.56) between observed and predicted ratings. Clinical relevance: We describe the use of fully automatic voice-based scoring in language assessment, including the clinical impact this development may have on the field of speech-language pathology. The automated test also creates a technological foundation for the computerization of a broad array of tests for voice-based language assessment.


Subjects
Speech-Language Pathology, Voice, Child, Humans, Language, Language Development, Language Tests
4.
Front Psychol ; 11: 535, 2020.
Article in English | MEDLINE | ID: mdl-32328008

ABSTRACT

Introduction: Clinically relevant information can go uncaptured in the conventional scoring of a verbal fluency test. We hypothesize that characterizing the temporal aspects of the response through a set of time-related measures will be useful in distinguishing those with MCI from cognitively intact controls. Methods: Audio recordings of an animal fluency test administered to 70 demographically matched older adults (mean age 90.4 years), 28 with mild cognitive impairment (MCI) and 42 cognitively intact (CI), were professionally transcribed and fed into an automatic speech recognition (ASR) system to estimate the start time of each recalled word in the response. Next, we semantically cluster participant-generated animal names and, through a novel set of time-based measures, characterize the semantic search strategy of subjects in retrieving words from animal name clusters. This set of time-based features, along with standard count-based features (e.g., the number of correctly retrieved animal names), was then used in a machine learning algorithm trained to distinguish those with MCI from CI controls. Results: The combination of both count-based and time-based features, automatically derived from the test response, achieved an AUC-ROC of 77% with the support vector machine (SVM) classifier, outperforming the model trained only on the raw test score (AUC, 65%) and well above the chance model (AUC, 50%). Conclusion: This approach supports the value of introducing time-based measures to the assessment of verbal fluency, in the context of this generative task, for differentiating subjects with MCI from those with intact cognition.
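The time-based characterization of semantic search can be illustrated with a small sketch. This is a hypothetical simplification of the paper's measures: the cluster labels, the example timings, and the particular summary statistics below are assumptions, not the authors' taxonomy:

```python
def transition_times(words):
    """Split inter-word retrieval gaps into within-cluster and
    between-cluster (switch) transitions.
    `words` is a list of (animal, start_time_sec, cluster_label) tuples."""
    within, switches = [], []
    for (w1, t1, c1), (w2, t2, c2) in zip(words, words[1:]):
        gap = t2 - t1                       # time to retrieve the next word
        (within if c1 == c2 else switches).append(gap)
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return {"mean_within_gap": mean(within),
            "mean_switch_gap": mean(switches),
            "num_switches": len(switches)}

# Illustrative response with ASR-estimated word start times
resp = [("cat", 1.0, "pets"), ("dog", 2.2, "pets"),
        ("lion", 6.0, "wild"), ("tiger", 7.5, "wild")]
print(transition_times(resp))
```

Switches between clusters typically take longer than within-cluster retrievals, so statistics over these gaps are natural candidates for the time-based features the abstract describes.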

5.
J Voice ; 33(5): 721-727, 2019 Sep.
Article in English | MEDLINE | ID: mdl-29884509

ABSTRACT

INTRODUCTION: Adductor spasmodic dysphonia (ADSD) is one of the most disabling voice disorders, with no permanent cure. Patients with ADSD suffer from poor voice quality and repeated interruption of phonation that leads to limitations in daily communication. Botox (BT) injection, considered the gold standard treatment for ADSD, reduces the amount of voice breaks and improves voice quality for a limited period. In this study, patients with ADSD were followed after a single BT injection to track the changes in quality of life (QOL) and perceptual voice quality over a 6-month period. METHOD: This is a prospective, longitudinal study. Fifteen patients with ADSD were evaluated preinjection and 1, 3, and 6 months postinjection. They completed the Voice Activity and Participation Profile-Persian Version (VAPPP) and read a passage at each recording period. Perceptual assessment was done by three expert speech-language pathologists with knowledge of ADSD using the grade, roughness, breathiness, asthenia, strain (GRBAS) scale. The data were analyzed using Friedman, Wilcoxon, and McNemar tests. The significance level was set at P < 0.05. RESULTS: The VAPPP total score and each of the domain scores reached their peak at 3 months postinjection. At 6 months postinjection, the VAPPP scores increased significantly in comparison with the 3-month scores but remained lower than the preinjection scores. GRBAS results also indicated that patients' voices at 1 and 3 months postinjection were significantly less severe in terms of strain and roughness (P = 0.01; P < 0.001, respectively). CONCLUSION: BT injection resulted in improvement of subjects' QOL. The improvement was greatest at 3 months postinjection but remained above the preinjection values at 6 months after injection. Voice quality also improved but was not judged as normal.


Subjects
Acetylcholine Release Inhibitors/administration & dosage, Botulinum Toxins/administration & dosage, Dysphonia/drug therapy, Phonation/drug effects, Quality of Life, Vocal Cords/drug effects, Voice Quality/drug effects, Adult, Aged, Dysphonia/diagnosis, Dysphonia/physiopathology, Female, Humans, Injections, Longitudinal Studies, Male, Middle Aged, Prospective Studies, Recovery of Function, Time Factors, Treatment Outcome, Vocal Cords/physiopathology
6.
Interspeech ; 2019: 11-15, 2019 Sep.
Article in English | MEDLINE | ID: mdl-33088838

ABSTRACT

This study explores building and improving an automatic speech recognition (ASR) system for children aged 6-9 years and diagnosed with autism spectrum disorder (ASD), language impairment (LI), or both. Working with only 1.5 hours of target data in which children perform the Clinical Evaluation of Language Fundamentals Recalling Sentences task, we apply deep neural network (DNN) weight transfer techniques to adapt a large DNN model trained on the LibriSpeech corpus of adult speech. To begin, we aim to find the best proportional training rates of the DNN layers. Our best configuration yields a 29.38% word error rate (WER). Using this configuration, we explore the effects of quantity and similarity of data augmentation in transfer learning. We augment our training with portions of the OGI Kids' Corpus, adding 4.6 hours of typically developing speakers aged kindergarten through 3rd grade. We find that 2nd grade data alone - approximately the mean age of the target data - outperforms other grades and all the sets combined. Doubling the data for 1st, 2nd, and 3rd grade, we again compare each grade as well as pairs of grades. We find the combination of 1st and 2nd grade performs best at a 26.21% WER.

7.
Comput Speech Lang ; 50: 62-84, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29628620

ABSTRACT

Computer-Assisted Pronunciation Training (CAPT) systems aim to help a child learn the correct pronunciations of words. However, while there are many online commercial CAPT apps, there is no consensus among Speech Language Therapists (SLPs) or non-professionals about which CAPT systems, if any, work well. The prevailing assumption is that practicing with such programs is less reliable and thus does not provide the feedback necessary to allow children to improve their performance. The most common method for assessing pronunciation performance is the Goodness of Pronunciation (GOP) technique. Our paper proposes two new GOP techniques. We have found that pronunciation models that use explicit knowledge about error pronunciation patterns can lead to more accurate classification of whether a phoneme was correctly pronounced or not. We evaluate the proposed pronunciation assessment methods against a baseline state-of-the-art GOP approach, and show that the proposed techniques lead to classification performance that is more similar to that of a human expert.
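For reference, the baseline Goodness of Pronunciation score that such techniques are compared against is commonly defined (in Witt and Young's formulation; the exact variant used in this paper may differ) as the normalized log ratio of the likelihood of the intended phone to the best competing phone:

```latex
\mathrm{GOP}(p) \;=\; \frac{1}{N_p}\,
  \log \frac{p\left(\mathbf{O}^{(p)} \mid p\right)}
            {\max_{q \in Q} \; p\left(\mathbf{O}^{(p)} \mid q\right)}
```

where \(\mathbf{O}^{(p)}\) are the acoustic frames aligned to phone \(p\), \(N_p\) is the number of those frames, and \(Q\) is the phone inventory; scores near zero indicate a well-pronounced phone, and large negative scores flag likely mispronunciations.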

8.
Alzheimers Dement (N Y) ; 3(2): 219-228, 2017 Jun.
Article in English | MEDLINE | ID: mdl-29067328

ABSTRACT

INTRODUCTION: Trials in Alzheimer's disease are increasingly focusing on prevention in asymptomatic individuals. We hypothesized that indicators of mild cognitive impairment (MCI) may be present in the content of spoken language in older adults and be useful in distinguishing those with MCI from those who are cognitively intact. To test this hypothesis, we performed linguistic analyses of spoken words in participants with MCI and those with intact cognition participating in a clinical trial. METHODS: Data came from a randomized controlled behavioral clinical trial to examine the effect of unstructured conversation on cognitive function among older adults with either normal cognition or MCI (ClinicalTrials.gov: NCT01571427). Unstructured conversations (but with standardized preselected topics across subjects) were recorded between interviewers and interviewees during the intervention sessions of the trial from 14 MCI and 27 cognitively intact participants. From the transcriptions of interviewees' recordings, we grouped spoken words using Linguistic Inquiry and Word Count (LIWC), a structured table of words which categorizes 2,500 words into 68 different word subcategories, such as positive and negative words, fillers, and physical states. The number of words in each LIWC word subcategory constructed a vector of 68 dimensions representing the linguistic features of each subject. We used support vector machine and random forest classifiers to distinguish MCI from cognitively intact participants. RESULTS: MCI participants were distinguished from those with intact cognition using linguistic features obtained by LIWC with 84% classification accuracy, well above the 60% chance level. DISCUSSION: Linguistic analyses of spoken language may be a powerful tool in distinguishing MCI subjects from those with intact cognition. Further studies to assess whether spoken-language-derived measures could detect changes in cognitive functions in clinical trials are warranted.
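The LIWC-style feature construction amounts to counting, per subcategory, how many spoken tokens fall in that subcategory's word list. A minimal sketch with a hypothetical three-category mini-lexicon (the real LIWC dictionary, with its 68 subcategories, is proprietary, and the words below are invented examples):

```python
# Hypothetical mini-lexicon; the real LIWC dictionary maps ~2,500 words
# into 68 subcategories such as positive/negative emotion and fillers.
LEXICON = {
    "posemo": {"happy", "good", "nice"},
    "negemo": {"sad", "bad", "awful"},
    "filler": {"um", "uh", "like"},
}

def liwc_vector(transcript, lexicon=LEXICON):
    """Count transcript tokens per subcategory; one dimension per category."""
    tokens = transcript.lower().split()
    return {cat: sum(t in words for t in tokens)
            for cat, words in lexicon.items()}

print(liwc_vector("Um I felt happy , like , really good"))
```

Stacking these counts across all subcategories gives the 68-dimensional per-subject vector that the classifiers in the abstract operate on.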

9.
Proc Int Conf Mach Learn Appl ; 2017: 304-308, 2017 Dec.
Article in English | MEDLINE | ID: mdl-33215167

ABSTRACT

In this study, we explore the feasibility of speech-based techniques to automatically evaluate a nonword repetition (NWR) test. NWR tests, a useful marker for detecting language impairment, require repetition of pronounceable nonwords, such as "D OY F", presented aurally by an examiner or via a recording. Our proposed method leverages ASR techniques to first transcribe verbal responses. Second, it applies machine learning techniques to the ASR output to predict gold standard scores provided by speech and language pathologists. Our experimental results for a sample of 101 children (42 with autism spectrum disorders, or ASD; 18 with specific language impairment, or SLI; and 41 typically developing, or TD) show that the proposed approach is successful in predicting scores on this test, with averaged product-moment correlations of 0.74 and a mean absolute error of 0.06 (on an observed score range from 0.34 to 0.97) between observed and predicted ratings.

10.
Annu Int Conf IEEE Eng Med Biol Soc ; 2016: 570-573, 2016 Aug.
Article in English | MEDLINE | ID: mdl-28268395

ABSTRACT

Automatic detection of falls is important for enabling older adults to live safely and independently in their homes for longer. Current automated fall detection systems are typically designed using inertial sensors positioned on the body that generate an alert if there is an abrupt change in motion. These inertial sensors provide no information about the context of the person being monitored and are prone to false positives that can limit their ongoing usage. We describe a fall-detection system consisting of a wearable inertial measurement unit (IMU) and an RF time-of-flight (ToF) transceiver that ranges with other ToF beacons positioned throughout a home. The ToF ranging enables the system to track the position of the person as they move around the home. We describe and show results from three machine learning algorithms that integrate context-related position information with IMU-based fall detection to enable a deeper understanding of where falls are occurring and to improve the specificity of fall detection. The beacons used to localize the falls were able to track position to within 0.39 meters of specific waypoints in a simulated home environment. Each of the three algorithms was evaluated with and without context-based false alarm detection on simulated falls performed by three volunteer subjects in a simulated home. False positive rates were reduced by 50% when context was included.
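One way context-based false alarm suppression might look is to gate IMU alerts on the ToF-tracked position. The zone list, coordinates, and gating rule below are illustrative assumptions, not the paper's three algorithms; the 0.39 m radius simply mirrors the reported ranging accuracy:

```python
def confirmed_fall(imu_alert, position, risky_zones, radius=0.39):
    """Suppress an IMU fall alert unless the tracked (x, y) position is
    within `radius` meters of a location where falls are plausible
    (e.g. stairs, bathroom)."""
    if not imu_alert:
        return False
    px, py = position
    return any((px - zx) ** 2 + (py - zy) ** 2 <= radius ** 2
               for zx, zy in risky_zones)

zones = [(0.0, 0.0), (4.0, 2.5)]   # hypothetical waypoint coordinates
print(confirmed_fall(True, (4.2, 2.6), zones))
```

A learned model would replace this hard threshold, but the design idea is the same: position context vetoes alerts that occur in implausible locations.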


Subjects
Accidental Falls, Algorithms, Ambulatory Monitoring/methods, Humans, Ambulatory Monitoring/instrumentation, Ambulatory Monitoring/standards, Sensitivity and Specificity
11.
Text Speech Dialog ; 9924: 470-477, 2016 Sep.
Article in English | MEDLINE | ID: mdl-33244525

ABSTRACT

In this paper, we propose an automatic scoring approach for assessing the language deficit in a sentence repetition task used to evaluate children with language disorders. From ASR-transcribed sentences, we extract sentence similarity measures, including word error rate (WER) and Levenshtein distance, and use them as the input features in a regression model to predict the reference scores manually rated by experts. Our experimental analysis on subject-level scores of 46 children, 33 diagnosed with autism spectrum disorders (ASD) and 13 with specific language impairment (SLI), shows that the proposed approach is successful in predicting scores, with averaged product-moment correlations of 0.84 between observed and predicted ratings across test folds.
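The WER feature mentioned above is the word-level edit distance normalized by the reference length. A minimal sketch using a rolling-array Levenshtein computation (the example sentences are made up):

```python
def wer(reference, hypothesis):
    """Word error rate = (substitutions + deletions + insertions) / N,
    computed via Levenshtein distance over word tokens."""
    r, h = reference.split(), hypothesis.split()
    # Rolling single-row DP: dp[j] = edit distance between r[:i] and h[:j]
    dp = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(h) + 1):
            cur = dp[j]
            dp[j] = min(prev + (r[i - 1] != h[j - 1]),  # match/substitute
                        dp[j - 1] + 1,                  # insertion
                        dp[j] + 1)                      # deletion
            prev = cur
    return dp[-1] / len(r)

print(wer("the big dog ran home", "the dog ran to home"))
```

Here one deletion ("big") and one insertion ("to") against a five-word reference give a WER of 0.4; such per-sentence values become the regression inputs.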

12.
Curr Alzheimer Res ; 12(6): 513-9, 2015.
Article in English | MEDLINE | ID: mdl-26027814

ABSTRACT

BACKGROUND: Detecting early signs of Alzheimer's disease (AD) and mild cognitive impairment (MCI) during the pre-symptomatic phase is becoming increasingly important for cost-effective clinical trials and for deriving maximum benefit from currently available treatment strategies. However, distinguishing early signs of MCI from normal cognitive aging is difficult. Biomarkers have been extensively examined as early indicators of the pathological process for AD, but assessing these biomarkers is expensive and challenging to apply widely among pre-symptomatic community-dwelling older adults. Here we propose assessment of social markers, which could provide an alternative or complementary and ecologically valid strategy for identifying the pre-symptomatic phase leading to MCI and AD. METHODS: The data came from a larger randomized controlled clinical trial (RCT), where we examined whether daily conversational interactions using remote video telecommunications software could improve cognitive functions of older adult participants. We assessed the proportion of words generated by participants out of total words produced by both participants and staff interviewers, using transcribed conversations during the intervention trial, as an indicator of how two people (participants and interviewers) interact with each other in one-on-one conversations. We examined whether the proportion differed between those with intact cognition and MCI using, first, generalized estimating equations with the proportion as outcome, and second, logistic regression models with cognitive status as outcome in order to estimate the area under the ROC curve (ROC AUC). RESULTS: Compared to those with normal cognitive function, MCI participants generated a greater proportion of words out of the total number of words during the timed conversation sessions (p=0.01). This difference remained after controlling for participant age, gender, interviewer, and time of assessment (p=0.03). The logistic regression models showed the ROC AUC of identifying MCI (vs. normal cognition) was 0.71 (95% Confidence Interval: 0.54 - 0.89) when the average proportion of word counts spoken by subjects was entered univariately into the model. CONCLUSION: An ecologically valid social marker such as the proportion of spoken words produced during spontaneous conversations may be sensitive to transitions from normal cognition to MCI.


Subjects
Cognitive Dysfunction/psychology, Cognitive Dysfunction/rehabilitation, Psychological Interview/methods, Social Behavior, Speech/physiology, Aged, Aged 80 and over, Alzheimer Disease/psychology, Alzheimer Disease/rehabilitation, Asymptomatic Diseases/rehabilitation, Biomarkers, Disease Progression, Female, Humans, Logistic Models, Male, Neuropsychological Tests
13.
Comput Speech Lang ; 29(1): 172-185, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25382935

ABSTRACT

For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach where a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from multiple (three) clinics. We elicited speech using three tasks - the sustained phonation task, the diadochokinetic task, and a reading task - all within a time budget of 4 minutes, prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compare the effectiveness of three strategies for learning a regularized regression and find that ridge regression performs better than lasso and support vector regression for our task. We refine the feature extraction to capture pitch-related cues, including jitter and shimmer, more accurately using a time-varying harmonic model of speech. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance and consistently well above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than the diadochokinetic or sustained phonation tasks. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical application in PD. The techniques reported here are more widely applicable to other paralinguistic tasks in the clinical domain.
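Ridge regression on a high-dimensional acoustic feature matrix can be sketched in closed form. The data here are random stand-ins that merely reuse the paper's dimensions (168 subjects, 1582 features); the train/test split, noise level, and regularization strength are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the openSMILE feature matrix: 168 subjects x 1582
# features, with severity driven by a handful of them plus noise.
X = rng.standard_normal((168, 1582))
true_w = np.zeros(1582)
true_w[:10] = rng.standard_normal(10)
y = X @ true_w + 0.5 * rng.standard_normal(168)

def ridge_fit(X, y, alpha=10.0):
    """Closed-form ridge solution: w = (X'X + alpha*I)^(-1) X'y.
    The alpha*I term keeps the system well-posed even when the number
    of features exceeds the number of subjects, as it does here."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

w = ridge_fit(X[:120], y[:120])              # fit on 120 subjects
mae = np.abs(X[120:] @ w - y[120:]).mean()   # evaluate on the held-out 48
print(f"held-out MAE: {mae:.2f}")
```

The regularizer is what lets ridge cope with 1582 features from only 168 subjects, which is plausibly why it edged out lasso and support vector regression in the paper's comparison.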

14.
Annu Int Conf IEEE Eng Med Biol Soc ; 2015: 5573-6, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26737555

ABSTRACT

Phonological disorders affect 10% of preschool and school-age children, adversely affecting their communication, academic performance, and level of interaction. Effective pronunciation training requires prolonged supervised practice and interaction. Unfortunately, many children have no access, or only limited access, to a speech-language pathologist. Computer-assisted pronunciation training has the potential to be a highly effective teaching aid; however, to date such systems remain incapable of identifying pronunciation errors with sufficient accuracy. In this paper, we propose to improve accuracy by (1) learning acoustic models from a large children's speech database, (2) using an explicit model of typical pronunciation errors of children in the target age range, and (3) explicitly modeling the acoustics of distorted phonemes.


Subjects
Speech Sound Disorder, Child, Humans, Phonetics, Speech, Speech Production Measurement
15.
Article in English | MEDLINE | ID: mdl-33288990

ABSTRACT

In this paper, we investigate the problem of detecting depression from recordings of subjects' speech using speech processing and machine learning. There has been considerable interest in this problem in recent years due to the potential for developing objective assessments from real-world behaviors, which may provide valuable supplementary clinical information or be useful in screening. The cues for depression may be present in "what is said" (content) and "how it is said" (prosody). Given the limited amount of text data, even in this relatively large study, it is difficult to employ the standard method of learning models from n-gram features. Instead, we learn models using word representations in an alternative feature space of valence and arousal. This is akin to embedding words into a real vector space, albeit with manual ratings instead of those learned with deep neural networks [1]. For extracting prosody, we employ standard feature extractors such as those implemented in openSMILE and compare them with features extracted from harmonic models that we have been developing in recent years. Our experiments show that our harmonic-model features improve the performance of detecting depression from spoken utterances over the alternatives. The content features provide additional improvements, achieving an accuracy of about 74%, sufficient to be useful in screening applications.

16.
Article in English | MEDLINE | ID: mdl-21095825

ABSTRACT

Parkinson's disease is known to cause mild to profound communication impairments depending on the stage of progression of the disease. There is growing interest in home-based assessment tools for measuring the severity of Parkinson's disease, and speech is an appealing source of evidence. This paper reports tasks to elicit a versatile sample of voice production, algorithms to extract useful information from speech, and models to predict the severity of the disease. Apart from standard features from the time domain (e.g., energy, speaking rate), spectral domain (e.g., pitch, spectral entropy), and cepstral domain (e.g., mel-frequency warped cepstral coefficients), we also estimate the harmonic-to-noise ratio, shimmer, and jitter using our recently developed algorithms. In a preliminary study, we evaluate the proposed paradigm on data collected through 2 clinics from 82 subjects in 116 assessment sessions. Our results show that the information extracted from speech, elicited through 3 tasks, can predict the severity of the disease to within a mean absolute error of 5.7 with respect to the clinical assessment using the Unified Parkinson's Disease Rating Scale; the range of the target motor sub-scale is 0 to 108. Our analysis shows that elicitation of speech through less constrained tasks provides useful information not captured in the widely employed phonation task. While still preliminary, our results demonstrate that the proposed computational approach has promising real-world applications, such as home-based assessment or telemonitoring of Parkinson's disease.
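Jitter and shimmer are perturbation statistics over extracted pitch periods and peak amplitudes. A minimal sketch of the common "local" variants (one textbook definition; the paper's own algorithms are more sophisticated, and the period values below are made up):

```python
def jitter_local(periods):
    """Local jitter: mean absolute difference between consecutive pitch
    periods, normalized by the mean period (a fraction; often reported
    as a percentage)."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_local(amplitudes):
    """Local shimmer: the same cycle-to-cycle statistic computed over
    peak amplitudes instead of periods."""
    return jitter_local(amplitudes)

# Pitch periods in seconds for a slightly irregular ~100 Hz voice
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
print(f"jitter: {jitter_local(periods):.4f}")
```

Elevated cycle-to-cycle perturbation of this kind is one of the vocal irregularities associated with dysarthric speech, which is why these measures appear alongside the harmonic-to-noise ratio in the feature set.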


Subjects
Parkinson Disease/pathology, Parkinson Disease/physiopathology, Severity of Illness Index, Speech/physiology, Humans, Regression Analysis, Reproducibility of Results