Results 1 - 18 of 18
1.
medRxiv ; 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38978682

ABSTRACT

Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that severely impacts affected persons' speech and motor functions, yet early detection and tracking of disease progression remain challenging. The current gold standard for monitoring ALS progression, the ALS Functional Rating Scale-Revised (ALSFRS-R), is based on subjective ratings of symptom severity and may not capture subtle but clinically meaningful changes due to a lack of granularity. Multimodal speech measures, which can be collected from patients automatically and remotely, can bridge this gap because they are continuous-valued and therefore potentially more granular in capturing disease progression. Here we investigate the responsiveness and sensitivity of multimodal speech measures collected from persons with ALS (pALS) via a remote patient monitoring platform, in an effort to quantify how long it takes to detect a clinically meaningful change associated with disease progression. We recorded audio and video from 278 participants and automatically extracted multimodal speech biomarkers (acoustic, orofacial, linguistic) from the data. We find that the timing alignment of pALS speech relative to a canonical elicitation of the same prompt, and the number of words used to describe a picture, are the most responsive measures for detecting such change in pALS with both bulbar (n = 36) and non-bulbar onset (n = 107). Interestingly, the responsiveness of these measures is stable even at small sample sizes. We further found that certain speech measures are sensitive enough to track bulbar decline even when there is no patient-reported clinical change, i.e., the ALSFRS-R speech score remains unchanged at 3 out of a possible 4. The findings of this study have the potential to facilitate improved, accelerated, and cost-effective clinical trials and care.
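A minimal sketch of how the responsiveness of a longitudinal speech measure of this kind could be quantified, using the standardized response mean (SRM) with a bootstrap confidence interval; the values and the measure are simulated placeholders, not the study's data or pipeline.

```python
# Minimal sketch (simulated data, not the study's pipeline): responsiveness of
# a longitudinal speech measure quantified with the standardized response mean
# (SRM = mean change / SD of change), plus a bootstrap confidence interval.
import numpy as np

rng = np.random.default_rng(0)

def standardized_response_mean(baseline, followup):
    """SRM of the change between two visits; larger |SRM| = more responsive."""
    change = followup - baseline
    return change.mean() / change.std(ddof=1)

# Hypothetical measure (e.g., speaking rate in words/min) for 40 participants,
# at baseline and after three months of simulated decline.
baseline = rng.normal(loc=170.0, scale=25.0, size=40)
followup = baseline - rng.normal(loc=8.0, scale=10.0, size=40)

srm = standardized_response_mean(baseline, followup)

# Bootstrap over participants to see how stable the estimate is at this n.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(baseline), size=len(baseline))
    boot.append(standardized_response_mean(baseline[idx], followup[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"SRM = {srm:.2f} (95% bootstrap CI {lo:.2f} to {hi:.2f})")
```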

2.
J Speech Lang Hear Res ; : 1-13, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38984943

ABSTRACT

PURPOSE: Automated remote assessment and monitoring of patients' neurological and mental health is increasingly becoming an essential component of the digital clinic and telehealth ecosystem, especially after the COVID-19 pandemic. This article reviews the various modalities of health information that are useful for developing such remote clinical assessments in the real world at scale. APPROACH: We first present an overview of the various modalities of health information-speech acoustics, natural language, conversational dynamics, orofacial or full-body movement, eye gaze, respiration, cardiopulmonary, and neural-which can each be extracted from various signal sources-audio, video, text, or sensors. We further motivate their clinical utility with examples of how information from each modality can help us characterize how different disorders affect different aspects of patients' spoken communication. We then elucidate the advantages of combining one or more of these modalities toward a more holistic, informative, and robust assessment. FINDINGS: We find that combining multiple modalities of health information allows for improved scientific interpretability, improved performance on downstream health applications such as early detection and progress monitoring, improved technological robustness, and improved user experience. We illustrate how these principles can be leveraged for remote clinical assessment at scale using a real-world case study of the Modality assessment platform. CONCLUSION: This review motivates the combination of human-centric information from multiple modalities to measure various aspects of patients' health, arguing that remote clinical assessment that integrates this complementary information can be more effective and lead to better clinical outcomes than using any one data stream in isolation.

3.
Interspeech ; 2023: 5441-5445, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37791043

ABSTRACT

We investigate the feasibility, task compliance, and audiovisual data quality of a multimodal dialog-based solution for remote assessment of Amyotrophic Lateral Sclerosis (ALS). A total of 53 people with ALS and 52 healthy controls interacted with Tina, a cloud-based conversational agent, performing speech tasks designed to probe various aspects of motor speech function while their audio and video were recorded. We rated a total of 250 recordings for audio/video quality and participant task compliance, along with the relative frequency of different issues observed. We observed excellent rates of task compliance (98%), audio quality (95.2%), and video quality (84.8%), resulting in an overall yield of 80.8% of recordings that were both compliant and of high quality. Furthermore, recording quality and compliance were not affected by level of speech severity and did not differ significantly across end devices. These findings support the utility of dialog systems for remote monitoring of speech in ALS.

4.
Front Psychol ; 14: 1135469, 2023.
Article in English | MEDLINE | ID: mdl-37767217

ABSTRACT

Background: The rise in depression, anxiety, and suicide rates has led to increased demand for telemedicine-based mental health screening and remote patient monitoring (RPM) solutions to alleviate the burden on, and enhance the efficiency of, mental health practitioners. Multimodal dialog systems (MDS) that conduct on-demand, structured interviews offer a scalable and cost-effective solution to address this need. Objective: This study evaluates the feasibility of a cloud-based MDS agent, Tina, for mental state characterization in participants with depression, anxiety, and suicide risk. Method: Sixty-eight participants were recruited through an online health registry and completed 73 sessions, with 15 (20.6%), 21 (28.8%), and 26 (35.6%) sessions screening positive for depression, anxiety, and suicide risk, respectively, using conventional screening instruments. Participants then interacted with Tina as they completed a structured interview designed to elicit calibrated, open-ended responses regarding their feelings and emotional state. Simultaneously, the platform streamed their speech and video recordings in real time to a HIPAA-compliant cloud server to compute speech, language, and facial-movement biomarkers. After their sessions, participants completed user experience surveys. Machine learning models were developed using the extracted features and evaluated with the area under the receiver operating characteristic curve (AUC). Results: For both depression and suicide risk, affected individuals tended to have a higher percent pause time, while those positive for anxiety showed reduced lip movement relative to healthy controls. In terms of single-modality classification models, speech features performed best for depression (AUC = 0.64; 95% CI = 0.51-0.78), facial features for anxiety (AUC = 0.57; 95% CI = 0.43-0.71), and text features for suicide risk (AUC = 0.65; 95% CI = 0.52-0.78). The best overall performance was achieved by decision fusion of all models in identifying suicide risk (AUC = 0.76; 95% CI = 0.65-0.87). Participants reported that the experience was comfortable and that they were able to share their feelings. Conclusion: MDS is a feasible, useful, effective, and interpretable solution for RPM in real-world clinical populations with depression, anxiety, and suicide risk. Facial information is more informative for anxiety classification, while speech and language are more discriminative of depression and suicidality. In general, combining speech, language, and facial information improved model performance on all classification tasks.
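A hedged sketch of the decision-fusion idea above: one classifier per modality, with out-of-fold probabilities averaged and scored by ROC AUC. The feature arrays and labels are simulated placeholders, not the study's features or models.

```python
# Illustrative sketch (simulated data, not the study's models): per-modality
# classifiers whose predicted probabilities are fused at the decision level,
# evaluated with the area under the ROC curve (AUC).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 73                                   # number of sessions
y = rng.integers(0, 2, size=n)           # 1 = screened positive (placeholder labels)

# Placeholder feature blocks for three modalities (speech, facial, text).
modalities = {
    "speech": rng.normal(size=(n, 12)) + 0.4 * y[:, None],
    "facial": rng.normal(size=(n, 8)) + 0.3 * y[:, None],
    "text":   rng.normal(size=(n, 10)) + 0.5 * y[:, None],
}

# Out-of-fold probability estimates for each single-modality model.
probs = {
    name: cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                            cv=5, method="predict_proba")[:, 1]
    for name, X in modalities.items()
}
for name, p in probs.items():
    print(f"{name:6s} AUC = {roc_auc_score(y, p):.2f}")

# Decision fusion: average the per-modality probabilities.
fused = np.mean(list(probs.values()), axis=0)
print(f"fusion AUC = {roc_auc_score(y, fused):.2f}")
```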

5.
PLoS Comput Biol ; 19(7): e1011244, 2023 07.
Article in English | MEDLINE | ID: mdl-37506120

ABSTRACT

Upon perceiving sensory errors during movements, the human sensorimotor system updates future movements to compensate for the errors, a phenomenon called sensorimotor adaptation. One component of this adaptation is thought to be driven by sensory prediction errors-discrepancies between predicted and actual sensory feedback. However, the mechanisms by which prediction errors drive adaptation remain unclear. Here, auditory prediction error-based mechanisms involved in speech auditory-motor adaptation were examined via the Feedback-Aware Control of Tasks in Speech (FACTS) model. Consistent with theoretical perspectives in both non-speech and speech motor control, the hierarchical architecture of FACTS relies on both higher-level task (vocal tract constriction) and lower-level articulatory state representations. Importantly, FACTS also computes sensory prediction errors as part of its state feedback control mechanism, a well-established framework in the field of motor control. We explored potential adaptation mechanisms and found that adaptive behavior was present only when prediction errors updated the articulatory-to-task state transformation. In contrast, designs in which prediction errors updated forward sensory prediction models alone did not generate adaptation. Thus, FACTS demonstrated that 1) prediction errors can drive adaptation through task-level updates, and 2) adaptation is likely driven by updates to task-level control rather than (only) to forward predictive models. Additionally, simulating adaptation with FACTS generated a number of important hypotheses regarding previously reported phenomena, such as the source(s) of incomplete adaptation and the factor(s) driving changes in second formant frequency during adaptation to a first formant perturbation. The proposed model design paves the way for a hierarchical state feedback control framework to be examined in the context of sensorimotor adaptation in both speech and non-speech effector systems.
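A toy scalar illustration of the prediction-error-driven principle described above: heard-minus-predicted feedback updates an internal mapping, shifting subsequent productions opposite to a sustained perturbation. This is a deliberately simplified sketch, not the FACTS architecture, and all constants are arbitrary.

```python
# Toy illustration (not the FACTS model): trial-by-trial adaptation in which an
# auditory prediction error (heard minus predicted feedback) updates an internal
# bias estimate, so production drifts opposite to a sustained feedback shift.
target = 700.0        # intended F1 (Hz)
perturbation = 100.0  # feedback shift applied from trial 20 onward (Hz)
learning_rate = 0.25

bias_estimate = 0.0   # internal model's current estimate of the feedback shift
produced_f1 = []
for trial in range(100):
    command = target - bias_estimate        # compensate for the estimated shift
    shift = perturbation if trial >= 20 else 0.0
    heard = command + shift                 # altered auditory feedback
    predicted = command + bias_estimate     # forward-model prediction
    prediction_error = heard - predicted    # drives the internal-model update
    bias_estimate += learning_rate * prediction_error
    produced_f1.append(command)

print(f"produced F1 before the shift: {produced_f1[19]:.1f} Hz")
print(f"produced F1 at the end:       {produced_f1[-1]:.1f} Hz "
      f"(counteracting the {perturbation:.0f} Hz shift)")
```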


Subject(s)
Adaptation, Physiological; Speech; Humans; Feedback; Feedback, Sensory; Movement
6.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 3464-3467, 2022 07.
Article in English | MEDLINE | ID: mdl-36086652

ABSTRACT

We present a cloud-based multimodal dialogue platform for the remote assessment and monitoring of speech, facial, and fine motor function in Parkinson's Disease (PD) at scale, along with a preliminary investigation of the efficacy of the various metrics automatically extracted by the platform. Twenty-two healthy controls and 38 people with Parkinson's Disease (pPD) were instructed to complete four interactive sessions, spaced a week apart, on the platform. Each session involved a battery of tasks designed to elicit speech, facial movements, and finger movements. We find that speech, facial kinematic, and finger movement dexterity metrics show statistically significant differences between controls and pPD. We further investigate the sensitivity, specificity, reliability, and generalisability of these metrics. Our results offer encouraging evidence for the utility of automatically extracted audiovisual analytics in remote monitoring of PD and other movement disorders.
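A small sketch of the kind of group comparison summarized above, run on simulated values for one hypothetical metric: a nonparametric Mann-Whitney U test with a rank-biserial effect size for controls versus pPD.

```python
# Sketch (simulated values, not the platform's data): nonparametric comparison
# of one automatically extracted metric between controls and people with PD.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
controls = rng.normal(loc=5.2, scale=0.6, size=22)   # e.g., syllable rate (syll/s)
ppd = rng.normal(loc=4.6, scale=0.8, size=38)

u_stat, p_value = mannwhitneyu(controls, ppd, alternative="two-sided")
# Rank-biserial correlation as a simple effect size for the U statistic.
effect_size = 1.0 - 2.0 * u_stat / (len(controls) * len(ppd))
print(f"U = {u_stat:.1f}, p = {p_value:.4f}, rank-biserial r = {effect_size:.2f}")
```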


Subject(s)
Parkinson Disease; Speech; Fingers; Humans; Movement; Parkinson Disease/diagnosis; Reproducibility of Results
7.
JASA Express Lett ; 1(12): 124402, 2021 Dec.
Article in English | MEDLINE | ID: mdl-35005711

ABSTRACT

The Maeda model was used to generate a large set of vocoid-producing vocal tract configurations. The resulting dataset (a) produced a comprehensive range of formant frequencies and (b) displayed discrete tongue body constriction locations (palatal, velar/uvular, and lower pharyngeal). The discrete parameterization of constriction location across the vowel space suggests this is likely a fundamental characteristic of the human vocal tract, and not limited to any specific set of vowel contrasts. These findings suggest that in addition to established articulatory-acoustic constraints, fundamental biomechanical constraints of the vocal tract may also explain such discreteness.

8.
J Acoust Soc Am ; 148(6): 3682, 2020 12.
Article in English | MEDLINE | ID: mdl-33379892

ABSTRACT

A hallmark feature of speech motor control is its ability to learn to anticipate and compensate for persistent feedback alterations, a process referred to as sensorimotor adaptation. Because this process involves adjusting articulation to counter the perceived effects of altering acoustic feedback, there are a number of factors that affect it, including the complex relationship between acoustics and articulation and non-uniformities of speech perception. As a consequence, sensorimotor adaptation is hypothesised to vary as a function of the direction of the applied auditory feedback alteration in vowel formant space. This hypothesis was tested in two experiments where auditory feedback was altered in real time, shifting the frequency values of the first and second formants (F1 and F2) of participants' speech. Shifts were designed on a subject-by-subject basis and sensorimotor adaptation was quantified with respect to the direction of applied shift, normalised for individual speakers. Adaptation was indeed found to depend on the direction of the applied shift in vowel formant space, independent of shift magnitude. These findings have implications for models of sensorimotor adaptation of speech.
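A minimal sketch, with made-up formant values, of quantifying adaptation with respect to the direction of an applied (F1, F2) shift: the change in produced formants is projected onto the axis of the shift. A real analysis would additionally normalize shifts and responses per speaker.

```python
# Minimal sketch (hypothetical numbers): adaptation expressed as the component
# of the produced formant change that opposes the applied (F1, F2) shift.
import numpy as np

baseline = np.array([700.0, 1200.0])   # mean produced (F1, F2) before the shift, Hz
adapted = np.array([655.0, 1230.0])    # mean produced (F1, F2) late in the shift phase
shift = np.array([100.0, -80.0])       # applied feedback alteration, Hz

change = adapted - baseline
shift_direction = shift / np.linalg.norm(shift)

# Positive values mean production moved opposite to the applied shift.
adaptation_hz = -np.dot(change, shift_direction)
adaptation_fraction = adaptation_hz / np.linalg.norm(shift)
print(f"adaptation along the shift axis: {adaptation_hz:.1f} Hz "
      f"({100 * adaptation_fraction:.0f}% of the applied shift)")
```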


Subject(s)
Speech Perception; Speech; Feedback; Feedback, Sensory; Humans; Speech Acoustics
9.
PLoS Comput Biol ; 15(9): e1007321, 2019 09.
Article in English | MEDLINE | ID: mdl-31479444

ABSTRACT

We present a new computational model of speech motor control: the Feedback-Aware Control of Tasks in Speech (FACTS) model. FACTS employs a hierarchical state feedback control architecture to control a simulated vocal tract and produce intelligible speech. The model includes higher-level control of speech tasks and lower-level control of speech articulators. The task controller is modeled as a dynamical system governing the creation of desired constrictions in the vocal tract, following Task Dynamics. Both the task and articulatory controllers rely on an internal estimate of the current state of the vocal tract to generate motor commands. This estimate is derived, based on an efference copy of the applied controls, from a forward model that predicts both the next vocal tract state and the expected auditory and somatosensory feedback. A comparison between predicted and actual feedback is then used to update the internal state prediction. FACTS is able to qualitatively replicate many characteristics of the human speech system: the model is robust to noise in both the sensory and motor pathways, is relatively unaffected by a loss of auditory feedback but is more significantly impacted by the loss of somatosensory feedback, and responds appropriately to externally imposed alterations of auditory and somatosensory feedback. The model also replicates previously hypothesized trade-offs between reliance on auditory and somatosensory feedback, and shows for the first time how this relationship may be mediated by acuity in each sensory domain. These results have important implications for our understanding of the speech motor control system in humans.
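A toy scalar observer illustrating the predict-then-correct state estimation described above: an efference-copy-based prediction is corrected in proportion to the sensory prediction error, and control is computed from the estimated state. This is a generic state feedback sketch with arbitrary constants, not the FACTS implementation.

```python
# Toy observer sketch (not the FACTS implementation): estimate the state by
# combining an efference-copy prediction with a correction proportional to the
# sensory prediction error, and control the plant from that estimate.
import numpy as np

rng = np.random.default_rng(0)
A, B, C = 0.95, 0.5, 1.0        # toy scalar plant and observation model
K = 0.6                         # observer (correction) gain

x_true, x_est = 0.0, 0.0
target = 1.0
for step in range(50):
    u = 2.0 * (target - x_est)                            # control uses the *estimate*
    x_true = A * x_true + B * u + rng.normal(scale=0.02)  # plant with motor noise
    y = C * x_true + rng.normal(scale=0.05)               # noisy sensory feedback
    x_pred = A * x_est + B * u                            # prediction from efference copy
    sensory_error = y - C * x_pred                        # sensory prediction error
    x_est = x_pred + K * sensory_error                    # corrected state estimate

print(f"true state {x_true:.3f}, estimated state {x_est:.3f}, target {target}")
```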


Subject(s)
Models, Biological; Motor Skills/physiology; Speech/physiology; Computational Biology; Feedback, Sensory/physiology; Humans; Sensorimotor Cortex/physiology
10.
IEEE/ACM Trans Audio Speech Lang Process ; 26(5): 967-980, 2018 May.
Article in English | MEDLINE | ID: mdl-30271810

ABSTRACT

We present a method for speech enhancement of data collected in extremely noisy environments, such as those obtained during magnetic resonance imaging (MRI) scans. We propose an algorithm based on dictionary learning to perform this enhancement. We use complex nonnegative matrix factorization with intra-source additivity (CMF-WISA) to learn dictionaries of the noise and speech+noise portions of the data, and use these to factor the noisy spectrum into estimated speech and noise components. We augment the CMF-WISA cost function with spectral and temporal regularization terms to improve the noise modeling. Based on both objective and subjective assessments, we find that our algorithm significantly outperforms traditional techniques such as Least Mean Squares (LMS) filtering, while not requiring the prior knowledge or specific assumptions, such as periodicity of the noise waveforms, that current state-of-the-art algorithms require.
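A simplified, semi-supervised NMF sketch in the spirit of the dictionary-learning approach above, though it is not CMF-WISA itself and omits the complex-valued model and the regularizers: a noise dictionary is learned from a noise-only magnitude spectrogram, speech plus fixed noise dictionaries are then fit to the noisy spectrogram, and a Wiener-style mask yields the enhanced magnitude. The spectrograms here are random placeholders.

```python
# Simplified semi-supervised NMF enhancement sketch (not CMF-WISA): learn a
# noise dictionary from noise-only data, fit [speech | fixed noise] dictionaries
# to the noisy spectrogram, and apply a Wiener-style mask. Placeholder data.
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-10

def nmf(V, W, H, n_iter=200, trainable_cols=None):
    """Euclidean NMF multiplicative updates; only trainable_cols of W are updated."""
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if trainable_cols is not None:
            num = V @ H.T
            den = W @ H @ H.T + eps
            W[:, trainable_cols] *= num[:, trainable_cols] / den[:, trainable_cols]
    return W, H

n_freq, k_noise, k_speech = 257, 20, 40
V_noise = np.abs(rng.normal(size=(n_freq, 300)))   # placeholder noise-only spectrogram
V_noisy = np.abs(rng.normal(size=(n_freq, 500)))   # placeholder noisy-speech spectrogram

# 1) Learn the noise dictionary from the noise-only recording.
W_noise = np.abs(rng.normal(size=(n_freq, k_noise)))
H0 = np.abs(rng.normal(size=(k_noise, V_noise.shape[1])))
W_noise, _ = nmf(V_noise, W_noise, H0, trainable_cols=np.arange(k_noise))

# 2) Fit [speech | noise] dictionaries to the noisy data, keeping noise fixed.
W = np.hstack([np.abs(rng.normal(size=(n_freq, k_speech))), W_noise])
H = np.abs(rng.normal(size=(k_speech + k_noise, V_noisy.shape[1])))
W, H = nmf(V_noisy, W, H, trainable_cols=np.arange(k_speech))

# 3) Wiener-style mask from the speech part of the reconstruction.
speech_hat = W[:, :k_speech] @ H[:k_speech]
noise_hat = W[:, k_speech:] @ H[k_speech:]
mask = speech_hat / (speech_hat + noise_hat + eps)
enhanced = mask * V_noisy   # enhanced magnitude; recombine with noisy phase for audio
print("mask range:", float(mask.min()), float(mask.max()))
```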

11.
Comput Speech Lang ; 36: 330-346, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26688612

ABSTRACT

How the human speech production and perception systems evolved remains a mystery. Previous research suggests that human auditory systems are able, and have possibly evolved, to preserve maximal information about the speaker's articulatory gestures. This paper attempts an initial step towards answering the complementary question of whether speakers' articulatory mechanisms have also evolved to produce sounds that can be optimally discriminated by the listener's auditory system. To this end, we explicitly model, using computational methods, the extent to which derived representations of "primitive movements" of speech articulation can be used to discriminate between broad phone categories. We extract interpretable spatio-temporal primitive movements as recurring patterns in a data matrix of human speech articulation, i.e., the trajectories of vocal tract articulators over time. Specifically, we propose a weakly supervised learning method that attempts to find a part-based representation of the data in terms of recurring basis trajectory units (or primitives) and their corresponding activations over time. For each phone interval, we then derive a feature representation that captures the co-occurrences between the activations of the various bases over different time lags. We show that this feature, derived entirely from activations of these primitive movements, achieves greater discrimination than conventional features on an interval-based phone classification task. We discuss the implications of these findings in furthering our understanding of speech signal representations and the links between speech production and perception systems.
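A rough sketch of one plausible form of the lag-based co-occurrence feature mentioned above, computed from a simulated activation matrix; the exact construction used in the paper may differ.

```python
# Rough sketch (simulated activations): a per-interval feature built from the
# co-occurrence of primitive activations at a small set of time lags.
import numpy as np

rng = np.random.default_rng(0)
n_primitives, n_frames = 8, 200
H = np.abs(rng.normal(size=(n_primitives, n_frames)))   # placeholder activations

def cooccurrence_feature(H, start, end, lags=(1, 2, 4)):
    """Flattened activation co-occurrence matrices over the given time lags."""
    feats = []
    for lag in lags:
        seg_a = H[:, start:end - lag]
        seg_b = H[:, start + lag:end]
        feats.append((seg_a @ seg_b.T).ravel())   # primitives x primitives block
    return np.concatenate(feats)

feature = cooccurrence_feature(H, start=50, end=90)
print(feature.shape)   # (len(lags) * n_primitives**2,) -> (192,)
```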

12.
Comput Speech Lang ; 36: 196-211, 2016 Mar.
Article in English | MEDLINE | ID: mdl-28496292

ABSTRACT

We propose a practical, feature-level and score-level fusion approach that combines acoustic and estimated articulatory information for both text-independent and text-dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with conventional acoustic features. For text-independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves performance dramatically. However, since directly measuring articulatory data is not feasible in many real-world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature-level and score-level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we also include inverted articulatory trajectories in text-dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong-password trials and improve performance after score-level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks.
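A hedged sketch of the score-level fusion step: a weighted sum of acoustic and articulatory verification scores, evaluated with the equal error rate (EER). The scores are simulated, and the fusion weight would normally be tuned on held-out data.

```python
# Sketch (simulated scores, not the paper's systems): weighted-sum score fusion
# of an acoustic and an articulatory verification system, evaluated with EER.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)

def equal_error_rate(labels, scores):
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

labels = np.r_[np.ones(200), np.zeros(2000)]   # 1 = target trial, 0 = impostor trial
acoustic = rng.normal(1.2, 1.0, size=2200) * labels + rng.normal(0.0, 1.0, size=2200)
articulatory = rng.normal(0.9, 1.0, size=2200) * labels + rng.normal(0.0, 1.0, size=2200)

weight = 0.6                                   # would be tuned on development data
fused = weight * acoustic + (1 - weight) * articulatory

for name, s in [("acoustic", acoustic), ("articulatory", articulatory), ("fused", fused)]:
    print(f"{name:12s} EER = {100 * equal_error_rate(labels, s):.1f}%")
```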

13.
J Acoust Soc Am ; 136(3): 1307, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25190403

ABSTRACT

USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community.


Subject(s)
Acoustics; Biomedical Research; Databases, Factual; Electromagnetic Phenomena; Magnetic Resonance Imaging; Pharynx/physiology; Speech Acoustics; Speech Production Measurement; Voice Quality; Acoustics/instrumentation; Adult; Biomechanical Phenomena; Female; Humans; Male; Middle Aged; Pharynx/anatomy & histology; Signal Processing, Computer-Assisted; Software; Speech Production Measurement/instrumentation; Time Factors; Transducers
14.
PLoS One ; 9(8): e104168, 2014.
Article in English | MEDLINE | ID: mdl-25133544

ABSTRACT

We address the hypothesis that postures adopted during grammatical pauses in speech production are more "mechanically advantageous" than absolute rest positions for facilitating efficient postural motor control of vocal tract articulators. We quantify vocal tract posture corresponding to inter-speech pauses, absolute rest intervals, and vowel and consonant intervals using automated analysis of video captured with real-time magnetic resonance imaging during production of read and spontaneous speech by five healthy speakers of American English. We then use locally weighted linear regression to estimate the articulatory forward map from low-level articulator variables to high-level task/goal variables for these postures. We quantify the overall magnitude of the first derivative of the forward map as a measure of mechanical advantage. We find that postures assumed during grammatical pauses in speech, as well as speech-ready postures, are significantly more mechanically advantageous than postures assumed during absolute rest. Further, these postures represent empirical extremes of mechanical advantage, between which lie the postures assumed during various vowels and consonants. The relative mechanical advantage of different postures might be an important physical constraint influencing the planning and control of speech production.
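A minimal sketch, on random placeholder data, of the derivative-based measure described above: a locally weighted linear regression of task variables on articulator variables is fit around a query posture, and the norm of the local slope matrix (the estimated first derivative of the forward map) serves as the mechanical-advantage measure.

```python
# Minimal sketch (random placeholder data, not the study's variables): locally
# weighted linear regression around a query posture; the norm of the local slope
# matrix approximates the overall magnitude of the forward map's first derivative.
import numpy as np

rng = np.random.default_rng(0)
n, n_artic, n_task = 500, 6, 3
X = rng.normal(size=(n, n_artic))                               # articulator variables
A_true = rng.normal(size=(n_artic, n_task))
Y = np.tanh(X @ A_true) + 0.05 * rng.normal(size=(n, n_task))   # task/goal variables

def local_slope(X, Y, x0, bandwidth=1.5):
    """Gaussian-weighted least-squares fit around x0; returns the slope matrix."""
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * bandwidth ** 2))
    Xc = np.hstack([np.ones((len(X), 1)), X - x0])   # intercept + centered inputs
    WX = Xc * w[:, None]
    coef = np.linalg.solve(Xc.T @ WX, WX.T @ Y)      # shape (1 + n_artic, n_task)
    return coef[1:]                                  # drop the intercept row

posture = X[0]                                       # e.g., a pause posture
J = local_slope(X, Y, posture)
mechanical_advantage = np.linalg.norm(J)             # overall derivative magnitude
print(f"estimated derivative magnitude at this posture: {mechanical_advantage:.2f}")
```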


Subject(s)
Speech Acoustics; Biomechanical Phenomena; Female; Humans; Jaw/physiology; Lip/physiology; Motor Skills; Posture; Tongue/physiology; Vocal Cords/physiology
15.
Phonetica ; 71(4): 229-48, 2014.
Article in English | MEDLINE | ID: mdl-25997724

ABSTRACT

The English past tense allomorph following a coronal stop (e.g., /bɑndəd/) includes a vocoid that has traditionally been transcribed as a schwa or a barred i. Previous evidence has suggested that this entity does not involve a specific articulatory gesture of any kind. Rather, its presence may simply result from the temporal coordination of the two adjacent coronal gestures, while the interval between those gestures remains voiced and is acoustically reminiscent of a schwa. The acoustic and articulatory characteristics of this vocoid are reexamined in this work using real-time MRI with synchronized audio, which affords complete midsagittal views of the vocal tract. A novel statistical analysis is developed to address the issue of articulatory targetlessness, based on previous models that predict articulatory action from segmental context. Results reinforce the idea that this vocoid is different, both acoustically and articulatorily, from lexical schwa, but its targetless nature is not supported. The data suggest that an articulatory target does exist, especially in the pharynx, where it is revealed by the new data acquisition methodology. Moreover, substantial articulatory differences are observed between subjects, which highlights both the difficulty in characterizing this entity previously and the need for further study with additional subjects.


Subject(s)
Language; Magnetic Resonance Imaging/methods; Phonetics; Speech Acoustics; Speech Articulation Tests/methods; Gestures; Humans; Male; Pharynx/physiology; Young Adult
16.
J Acoust Soc Am ; 134(2): 1378-94, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23927134

ABSTRACT

This paper presents a computational approach to derive interpretable movement primitives from speech articulation data. It puts forth a convolutive Nonnegative Matrix Factorization algorithm with sparseness constraints (cNMFsc) to decompose a given data matrix into a set of spatiotemporal basis sequences and an activation matrix. The algorithm optimizes a cost function that trades off the mismatch between the proposed model and the input data against the number of primitives that are active at any given instant. The method is applied to both measured articulatory data obtained through electromagnetic articulography as well as synthetic data generated using an articulatory synthesizer. The paper then describes how to evaluate the algorithm performance quantitatively and further performs a qualitative assessment of the algorithm's ability to recover compositional structure from data. This is done using pseudo ground-truth primitives generated by the articulatory synthesizer based on an Articulatory Phonology framework [Browman and Goldstein (1995). "Dynamics and articulatory phonology," in Mind as motion: Explorations in the dynamics of cognition, edited by R. F. Port and T. van Gelder (MIT Press, Cambridge, MA), pp. 175-194]. The results suggest that the proposed algorithm extracts movement primitives from human speech production data that are linguistically interpretable. Such a framework might aid the understanding of longstanding issues in speech production such as motor control and coarticulation.


Subject(s)
Larynx/physiology; Models, Theoretical; Mouth/physiology; Speech Acoustics; Voice Quality; Algorithms; Biomechanical Phenomena; Computer Simulation; Electromagnetic Phenomena; Female; Humans; Male; Motor Skills; Numerical Analysis, Computer-Assisted; Reproducibility of Results; Speech Production Measurement; Time Factors
17.
J Acoust Soc Am ; 134(1): 510-9, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23862826

ABSTRACT

This paper presents an automatic procedure to analyze articulatory setting in speech production using real-time magnetic resonance imaging of the moving human vocal tract. The procedure extracts frames corresponding to inter-speech pauses, speech-ready intervals and absolute rest intervals from magnetic resonance imaging sequences of read and spontaneous speech elicited from five healthy speakers of American English and uses automatically extracted image features to quantify vocal tract posture during these intervals. Statistical analyses show significant differences between vocal tract postures adopted during inter-speech pauses and those at absolute rest before speech; the latter also exhibits a greater variability in the adopted postures. In addition, the articulatory settings adopted during inter-speech pauses in read and spontaneous speech are distinct. The results suggest that adopted vocal tract postures differ on average during rest positions, ready positions and inter-speech pauses, and might, in that order, involve an increasing degree of active control by the cognitive speech planning mechanism.


Subject(s)
Epiglottis/physiology; Glottis/physiology; Image Interpretation, Computer-Assisted/methods; Lip/physiology; Magnetic Resonance Imaging/methods; Palate, Soft/physiology; Pharynx/physiology; Phonation/physiology; Phonetics; Speech/physiology; Tongue/physiology; Algorithms; Female; Humans; Muscle Contraction/physiology; Pulmonary Ventilation/physiology; Supine Position/physiology
18.
J Acoust Soc Am ; 126(5): EL160-5, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19894792

ABSTRACT

It is hypothesized that pauses at major syntactic boundaries (i.e., grammatical pauses), but not ungrammatical (e.g., word search) pauses, are planned by a high-level cognitive mechanism that also controls the rate of articulation around these junctures. Real-time magnetic resonance imaging is used to analyze articulation at and around grammatical and ungrammatical pauses in spontaneous speech. Measures quantifying the speed of articulators were developed and applied during these pauses as well as during their immediate neighborhoods. Grammatical pauses were found to have an appreciable drop in speed at the pause itself as compared to ungrammatical pauses, which is consistent with our hypothesis that grammatical pauses are indeed choreographed by a central cognitive planner.
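A toy sketch of the kind of speed measure described above, on a synthetic two-dimensional articulator trajectory: frame-to-frame speed is averaged inside a pause interval and in its immediate neighborhood. The frame rate and interval boundaries are placeholders.

```python
# Toy sketch (synthetic trajectory): articulator speed inside a pause versus in
# its immediate neighborhood, from frame-to-frame displacements.
import numpy as np

rng = np.random.default_rng(0)
fps = 23.0                                    # placeholder frame rate (frames/s)
t = np.arange(0.0, 6.0, 1.0 / fps)
# Synthetic 2-D articulator trajectory that slows down between 2 s and 3 s.
envelope = np.where((t > 2.0) & (t < 3.0), 0.2, 1.0)
traj = np.cumsum(envelope[:, None] * rng.normal(size=(len(t), 2)), axis=0)

speed = np.linalg.norm(np.diff(traj, axis=0), axis=1) * fps   # units per second

pause = (t[1:] > 2.0) & (t[1:] < 3.0)
neighborhood = ((t[1:] > 1.0) & (t[1:] <= 2.0)) | ((t[1:] >= 3.0) & (t[1:] < 4.0))
print(f"mean speed in the pause:        {speed[pause].mean():.2f}")
print(f"mean speed in the neighborhood: {speed[neighborhood].mean():.2f}")
```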


Subject(s)
Auditory Cortex/physiology; Magnetic Resonance Imaging/methods; Speech Articulation Tests; Speech/physiology; Vocal Cords/physiology; Humans; Phonetics