Results 1 - 20 of 22
1.
J Autism Dev Disord; 2024 Apr 13.
Article in English | MEDLINE | ID: mdl-38613592

ABSTRACT

PURPOSE: Non-verbal utterances are an important tool of communication for individuals who are non- or minimally-speaking. While these utterances are typically understood by caregivers, they can be challenging for the broader community to interpret. To date, little work has been done to detect and characterize the vocalizations produced by non- or minimally-speaking individuals. This paper aims to characterize five categories of utterances across a set of seven non- or minimally-speaking individuals. METHODS: The characterization is accomplished using a correlation structure methodology, acting as a proxy measurement for motor coordination, to localize similarities and differences to specific speech production systems. RESULTS: We specifically find that frustrated and dysregulated utterances show similar correlation structure outputs, especially when compared to self-talk, request, and delighted utterances. We additionally observe higher complexity of coordination between articulatory and respiratory subsystems, and lower complexity of coordination between laryngeal and respiratory subsystems, in frustration and dysregulation as compared to self-talk, request, and delight. Finally, we observe lower complexity of coordination across all three speech subsystems in request utterances as compared to self-talk and delight. CONCLUSION: The insights from this work aid in understanding the modifications that non- or minimally-speaking individuals make to accomplish specific goals in non-verbal communication.
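
The correlation-structure methodology is only named in this abstract. As a rough illustration of how such a motor-coordination proxy is often built, the sketch below (Python with NumPy; the delay count and channel layout are illustrative assumptions, not details from the paper) computes the eigenvalue spectrum of a channel-delay correlation matrix from a set of acoustic time series:

```python
import numpy as np

def correlation_structure_eigenvalues(signals, delays=15):
    """Eigenvalue spectrum of a channel-delay correlation matrix.

    signals: (n_channels, n_samples) array of acoustic time series
    (e.g., fundamental frequency, formants, waveform envelope).
    """
    n_ch, n_samp = signals.shape
    # Time-delay embedding: each channel contributes `delays`
    # shifted copies of itself as rows of the observation matrix.
    rows = [signals[ch, d:n_samp - delays + d + 1]
            for ch in range(n_ch) for d in range(delays)]
    R = np.corrcoef(np.vstack(rows))       # channel-delay correlations
    # Sorted eigenvalues summarize the complexity of coordination:
    # a flatter spectrum suggests less redundant coupling across channels.
    return np.linalg.eigvalsh(R)[::-1]
```

Subsystem-specific contrasts such as those in the results (articulatory-respiratory versus laryngeal-respiratory coupling) would be obtained by restricting the channel set before computing the matrix.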

2.
Sci Rep; 13(1): 1567, 2023 01 28.
Article in English | MEDLINE | ID: mdl-36709368

ABSTRACT

In the face of the global COVID-19 pandemic, researchers have increasingly turned to simple measures to detect and monitor the presence of the disease in individuals at home. We sought to determine whether measures of neuromotor coordination derived from acoustic time series, as well as phoneme-based and standard acoustic features extracted from recordings of simple speech tasks, could aid in detecting the presence of COVID-19. We further hypothesized that these features would aid in characterizing the effect of COVID-19 on speech production systems. A protocol consisting of a variety of speech tasks was administered to 12 individuals with COVID-19 and 15 individuals with other viral infections at University Hospital Galway. From these recordings, we extracted a set of acoustic time series representative of speech production subsystems, as well as their univariate statistics. The time series were further used to derive correlation-based features, a proxy for speech production motor coordination. We additionally extracted phoneme-based features. These features were used to create machine learning models to distinguish between the COVID-19-positive and other-viral-infection groups, with respiratory- and laryngeal-based features yielding the highest performance. Coordination-based features derived from harmonic-to-noise-ratio time series from read speech discriminated between the two groups with an area under the ROC curve (AUC) of 0.94. A longitudinal case study of two subjects, one from each group, revealed differences in laryngeal-based acoustic features, consistent with observed physiological differences between the two groups. These results highlight the promise of nonintrusive sensing through simple speech recordings for early warning and tracking of COVID-19.
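
As a minimal sketch of how a discrimination figure like the reported AUC could be computed for a cohort of this size (assuming scikit-learn, a logistic-regression stand-in for the paper's unspecified models, and placeholder features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

# Placeholder matrix standing in for coordination features derived
# from harmonic-to-noise-ratio time series.
rng = np.random.default_rng(0)
X = rng.normal(size=(27, 10))
y = np.array([1] * 12 + [0] * 15)   # 12 COVID-19, 15 other viral infection

# Leave-one-out cross-validation is a common choice for cohorts this small.
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y,
    cv=LeaveOneOut(), method="predict_proba")[:, 1]
print("AUC:", roc_auc_score(y, proba))
```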


Subject(s)
COVID-19, Humans, COVID-19/diagnosis, Speech/physiology, Acoustics, Noise, Speech Production Measurement/methods
3.
Eur J Neurosci; 55(5): 1262-1277, 2022 03.
Article in English | MEDLINE | ID: mdl-35098604

ABSTRACT

Everyday environments often contain distracting competing talkers and background noise, requiring listeners to focus their attention on one acoustic source and reject others. During this auditory attention task, listeners may naturally interrupt their sustained attention and switch attended sources. The effort required to perform this attention switch has not been well studied in the context of competing continuous speech. In this work, we developed two variants of endogenous attention switching and a sustained-attention control. We characterized these three experimental conditions in the context of decoding auditory attention, while simultaneously evaluating listening effort and neural markers of spatial-audio cues. A least-squares, electroencephalography (EEG)-based attention decoding algorithm was implemented across all conditions. It achieved an accuracy of 69.4% and 64.0% when computed over nonoverlapping 10-s and 5-s correlation windows, respectively. Both decoders showed smooth transitions in the attended-talker prediction through switches at approximately half the analysis window size (e.g., the mean lag taken across the two switch conditions was 2.2 s when the 5-s correlation window was used). Expended listening effort, as measured by simultaneous EEG and pupillometry, was also a strong indicator of whether the listeners sustained attention or performed an endogenous attention switch (peak pupil diameter, p = 0.034; minimum parietal alpha power, p = 0.016). We additionally found evidence of talker spatial cues in the form of centrotemporal alpha-power lateralization (p = 0.0428). These results suggest that listener effort and spatial cues may be promising features to pursue in a decoding context, in addition to speech-based features.
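
The abstract describes a least-squares EEG decoder evaluated over correlation windows. Below is a compact sketch of the standard linear stimulus-reconstruction pipeline this description suggests; the lag count, regularization constant, and helper names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def fit_decoder(eeg, envelope, lags=32, lam=1e3):
    """Ridge-regularized least-squares backward model mapping
    time-lagged EEG (n_samples, n_channels) to the attended
    speech envelope (n_samples,)."""
    X = np.hstack([np.roll(eeg, lag, axis=0) for lag in range(lags)])
    X, y = X[lags:], envelope[lags:]        # drop wrapped-around rows
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def decode_attention(eeg, env_a, env_b, w, win, lags=32):
    """Label each correlation window with the talker whose envelope
    best matches the reconstructed envelope."""
    X = np.hstack([np.roll(eeg, lag, axis=0) for lag in range(lags)])
    recon = X @ w
    labels = []
    for s in range(0, len(recon) - win + 1, win):   # nonoverlapping windows
        seg = slice(s, s + win)
        r_a = np.corrcoef(recon[seg], env_a[seg])[0, 1]
        r_b = np.corrcoef(recon[seg], env_b[seg])[0, 1]
        labels.append("A" if r_a > r_b else "B")
    return labels
```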


Subject(s)
Speech Perception, Acoustic Stimulation/methods, Attention, Electroencephalography, Listening Effort, Pupil
4.
Front Neurol; 12: 665338, 2021.
Article in English | MEDLINE | ID: mdl-34295299

ABSTRACT

Repeated subconcussive blows to the head during sports or other contact activities may have a cumulative and long-lasting effect on cognitive functioning. Unobtrusive measurement and tracking of cognitive functioning is needed to enable preventative interventions for people at elevated risk of concussive injury. The focus of the present study is to investigate the potential for using passive measurements of fine motor movements (smooth-pursuit eye tracking and read speech) and resting-state brain activity (measured using fMRI) to complement existing diagnostic tools, such as the Immediate Post-concussion Assessment and Cognitive Testing (ImPACT), that are used for this purpose. Thirty-one high school American football and soccer athletes were tracked through the course of a sports season. The hypotheses were that (1) measures of complexity of fine motor coordination and of resting-state brain activity are predictive of cognitive functioning measured by the ImPACT test, and (2) within-subject changes in these measures over the course of a sports season are predictive of changes in ImPACT scores. The first principal component of the six ImPACT composite scores was used as a latent factor representing cognitive functioning. This latent factor was positively correlated with four of the ImPACT composites: verbal memory, visual memory, visual motor speed, and reaction speed. Correlations ranging from r = 0.26 to r = 0.49 were found between this latent factor and complexity features derived from each sensor modality. Based on a regression model, the complexity features were combined across sensor modalities and used to predict the latent factor on out-of-sample subjects. The predictions correlated with the true latent factor with r = 0.71. Within-subject changes over time were predicted with r = 0.34. These results indicate the potential to predict cognitive performance from passive monitoring of fine motor movements and brain activity, offering initial support for future application in detecting performance deficits associated with subconcussive events.
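
A sketch of the latent-factor construction, assuming scikit-learn and placeholder scores in place of the six ImPACT composites:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# composites: (n_athletes, 6) matrix of ImPACT composite scores.
rng = np.random.default_rng(1)
composites = rng.normal(size=(31, 6))   # placeholder data

# Standardize, then take the first principal component as the
# latent cognitive-functioning factor.
z = StandardScaler().fit_transform(composites)
latent = PCA(n_components=1).fit_transform(z).ravel()

# Loading of each composite on the latent factor (sign is arbitrary).
loadings = [np.corrcoef(latent, z[:, j])[0, 1] for j in range(6)]
print(loadings)
```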

5.
Neural Netw; 140: 136-147, 2021 Aug.
Article in English | MEDLINE | ID: mdl-33765529

ABSTRACT

Future wearable technology may provide for enhanced communication in noisy environments and for the ability to pick out a single talker of interest in a crowded room simply by the listener shifting their attentional focus. Such a system relies on two components, speaker separation and decoding the listener's attention to acoustic streams in the environment. To address the former, we present a system for joint speaker separation and noise suppression, referred to as the Binaural Enhancement via Attention Masking Network (BEAMNET). The BEAMNET system is an end-to-end neural network architecture based on self-attention. Binaural input waveforms are mapped to a joint embedding space via a learned encoder, and separate multiplicative masking mechanisms are included for noise suppression and speaker separation. Pairs of output binaural waveforms are then synthesized using learned decoders, each capturing a separated speaker while maintaining spatial cues. A key contribution of BEAMNET is that the architecture contains a separation path, an enhancement path, and an autoencoder path. This paper proposes a novel loss function which simultaneously trains these paths, so that disabling the masking mechanisms during inference causes BEAMNET to reconstruct the input speech signals. This allows dynamic control of the level of suppression applied by BEAMNET via a minimum gain level, which is not possible in other state-of-the-art approaches to end-to-end speaker separation. This paper also proposes a perceptually-motivated waveform distance measure. Using objective speech quality metrics, the proposed system is demonstrated to perform well at separating two equal-energy talkers, even in high levels of background noise. Subjective testing shows an improvement in speech intelligibility across a range of noise levels, for signals with artificially added head-related transfer functions and background noise. Finally, when used as part of an auditory attention decoder (AAD) system using existing electroencephalogram (EEG) data, BEAMNET is found to maintain the decoding accuracy achieved with ideal speaker separation, even in severe acoustic conditions. These results suggest that this enhancement system is highly effective at decoding auditory attention in realistic noise environments, and could possibly lead to improved speech perception in a cognitively controlled hearing aid.
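
BEAMNET itself (binaural I/O, self-attention, and separate separation, enhancement, and autoencoder paths) is not specified in enough detail in this abstract to reproduce. The toy PyTorch sketch below illustrates only the two mechanisms the text does describe: a learned encoder/decoder with a multiplicative mask, and a minimum-gain control that, at a gain of 1, disables masking so the network reconstructs its input (the autoencoder path). All layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class MaskedSeparator(nn.Module):
    """Minimal encoder / multiplicative-mask / decoder sketch."""
    def __init__(self, n_filters=64, kernel=16):
        super().__init__()
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=kernel // 2)
        self.mask_net = nn.Sequential(
            nn.Conv1d(n_filters, n_filters, 1), nn.Sigmoid())
        self.decoder = nn.ConvTranspose1d(
            n_filters, 1, kernel, stride=kernel // 2)

    def forward(self, wav, min_gain=0.0):
        # wav: (batch, 1, samples)
        z = self.encoder(wav)
        # Mask constrained to [min_gain, 1]; min_gain=1 disables masking
        # entirely, so the model falls back to reconstructing its input.
        m = min_gain + (1 - min_gain) * self.mask_net(z)
        return self.decoder(z * m)

model = MaskedSeparator()
out = model(torch.randn(2, 1, 16000))
```

Exposing a minimum-gain knob at inference is what permits dynamic control of the suppression level, the property the paper highlights over other end-to-end separators.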


Subject(s)
Cognition, Hearing Aids/standards, Noise, Adult, Attention, Crowding, Cues, Evoked Potentials, Auditory, Humans, Male, Speech Perception
6.
Front Neurol; 12: 584684, 2021.
Article in English | MEDLINE | ID: mdl-33746869

ABSTRACT

There is mounting evidence linking the cumulative effects of repetitive head impacts to neurodegenerative conditions. Robust clinical assessment tools to identify mild traumatic brain injuries are needed to assist with timely diagnosis for return-to-field decisions and to appropriately guide rehabilitation. The focus of the present study is to investigate the potential for oculomotor features to complement existing diagnostic tools, such as measurements of Optic Nerve Sheath Diameter (ONSD) and Immediate Post-concussion Assessment and Cognitive Testing (ImPACT). Thirty-one high school American football and soccer athletes were tracked through the course of a sports season. Given the high risk of repetitive head impacts associated with both soccer and football, our hypotheses were that (1) ONSD and ImPACT scores would worsen through the season and (2) oculomotor features would effectively capture both the neurophysiological changes reflected by ONSD and the neuro-functional status assessed via ImPACT. Oculomotor features were used as input to linear mixed-effects regression models to predict ONSD and ImPACT scores as outcomes. Prediction accuracy was evaluated to identify explicit relationships between eye movements, ONSD, and ImPACT scores. Significant Pearson correlations were observed between predicted and actual outcomes for ONSD (raw = 0.70; normalized = 0.45) and for ImPACT (raw = 0.86; normalized = 0.71), demonstrating the capability of oculomotor features to capture neurological changes detected by both ONSD and ImPACT. The most predictive features were found to relate to motor control and visual-motor processing. In future work, oculomotor models linking neural structures to oculomotor function can be built to gain extended mechanistic insights into neurophysiological changes observed through seasons of participation in contact sports.
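
A minimal sketch of the prediction setup, assuming statsmodels and hypothetical oculomotor feature names (`saccade_latency`, `pursuit_gain`); the study's actual features and model specification are not given in the abstract:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder longitudinal data: repeated sessions per athlete.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "athlete": np.repeat(np.arange(31), 4),
    "saccade_latency": rng.normal(200, 20, 124),
    "pursuit_gain": rng.normal(0.9, 0.1, 124),
})
df["onsd"] = 4 + 0.005 * df["saccade_latency"] + rng.normal(0, 0.2, 124)

# Mixed-effects model: fixed effects for the oculomotor features,
# random intercept per athlete to absorb between-subject variation.
result = smf.mixedlm("onsd ~ saccade_latency + pursuit_gain",
                     df, groups=df["athlete"]).fit()
pred = result.predict(df)
print(np.corrcoef(pred, df["onsd"])[0, 1])  # predicted vs. actual
```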

7.
Annu Int Conf IEEE Eng Med Biol Soc; 2020: 832-836, 2020 07.
Article in English | MEDLINE | ID: mdl-33018114

ABSTRACT

Lapses in vigilance and slowed reactions due to mental fatigue can increase the risk of accidents and injuries and degrade performance. This paper describes a method for rapid, unobtrusive detection of mental fatigue based on changes in electrodermal activity (EDA) and changes in neuromotor coordination derived from speaking. Twenty-nine Soldiers completed a 2-hour battery of cognitive tasks intended to induce fatigue. Behavioral markers derived from audio and video during speech were acquired before and after the 2-hour cognitive load tasks, as was EDA. Exposure to cognitive load produced detectable increases in neuromotor variability in speech and facial measures after load, and even after a recovery period. A Gaussian mixture model classifier with cross-validation and fusion across speech, video, and EDA detected a change in cognitive fatigue relative to a personalized baseline with an area under the ROC curve (AUC) of 0.99.
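
A sketch of one plausible form of the classifier, assuming scikit-learn: one Gaussian mixture per class scored by log-likelihood ratio, with late fusion by averaging per-modality scores. The component count and fusion rule are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_llr(X_train, y_train, X_test, n_components=2):
    """Fit one GMM per class and score test samples by the
    log-likelihood ratio between 'fatigued' and 'baseline' models."""
    g1 = GaussianMixture(n_components, random_state=0).fit(X_train[y_train == 1])
    g0 = GaussianMixture(n_components, random_state=0).fit(X_train[y_train == 0])
    return g1.score_samples(X_test) - g0.score_samples(X_test)

# Late fusion across modalities: average the per-modality scores
# (speech, video, EDA) before thresholding, e.g.:
# fused = np.mean([gmm_llr(Xs, y, Xs_te), gmm_llr(Xv, y, Xv_te),
#                  gmm_llr(Xe, y, Xe_te)], axis=0)
```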


Subject(s)
Arousal, Mental Fatigue, Cognition, Humans, Mental Fatigue/diagnosis, Speech, Wakefulness
8.
Sci Rep; 10(1): 14773, 2020 09 08.
Article in English | MEDLINE | ID: mdl-32901067

ABSTRACT

Current clinical tests lack the sensitivity needed for detecting subtle balance impairments associated with mild traumatic brain injury (mTBI). Patient-reported symptoms can be significant and have a substantial impact on daily life, yet impairments may remain undetected or poorly quantified using clinical measures. Our central hypothesis was that provocative sensorimotor perturbations, delivered in a highly instrumented, immersive virtual environment, would challenge the sensory subsystems recruited for balance through conflicting multi-sensory evidence, and would therefore reveal that not all subsystems are performing optimally. The results show that, compared to standard clinical tests, the provocative perturbations illuminate balance impairments in subjects who have had mild traumatic brain injuries. Perturbations delivered while subjects were walking provided greater discriminability between mTBI subjects and healthy controls (average accuracy ≈ 0.90) than those delivered during standing (average accuracy ≈ 0.65). Of the categories of features extracted to characterize balance, the lower-limb accelerometry-based metrics proved most informative. Further, in response to perturbations, subjects with an mTBI utilized hip strategies more than ankle strategies to prevent loss of balance, and also showed less variability in gait patterns. We have shown that sensorimotor conflicts illuminate otherwise-hidden balance impairments, which can be used to increase the sensitivity of current clinical procedures. This augmentation is vital for robustly detecting the presence of balance impairments after mTBI and potentially defining a phenotype of balance dysfunction that increases risk of injury.


Subject(s)
Brain Concussion/complications, Environment, Gait Disorders, Neurologic/pathology, Postural Balance, Walking, Accelerometry, Adolescent, Adult, Aged, Aged, 80 and over, Female, Follow-Up Studies, Gait Disorders, Neurologic/etiology, Humans, Male, Middle Aged, Prognosis, Young Adult
9.
Clin Neuropsychol; 34(6): 1190-1214, 2020 08.
Article in English | MEDLINE | ID: mdl-32657221

ABSTRACT

OBJECTIVE: Military job and training activities place significant demands on service members' (SMs') cognitive resources, increasing risk of injury and degrading performance. Early detection of cognitive fatigue is essential to reduce risk and support optimal function. This paper describes a multimodal approach, based on changes in measures of speech motor coordination and electrodermal activity (EDA), for predicting changes in performance following sustained cognitive effort. METHODS: Twenty-nine active duty SMs completed computer-based cognitive tasks for 2 h (load period). Measures of speech derived from audio were acquired, along with concurrent measures of EDA, before and after the load period. Cognitive performance was assessed before and during the load period using the Automated Neuropsychological Assessment Metrics Military Battery (ANAM MIL). Subjective assessments of cognitive effort and alertness were obtained intermittently. RESULTS: Across the load period, participants' ratings of cognitive workload increased, while alertness ratings declined. Cognitive performance declined significantly during the first half of the load period. Three speech and arousal features predicted cognitive performance changes during this period with statistically significant accuracy: EDA (r = 0.43, p = 0.01), articulator velocity coordination (r = 0.50, p < 0.01), and vocal creak (r = 0.35, p = 0.03). Fusing predictions from these features predicted performance changes with r = 0.68 (p < 0.01). CONCLUSIONS: The results suggest that speech and arousal measures may be used to predict changes in performance associated with cognitive fatigue. This work supports ongoing efforts to develop reliable, unobtrusive measures for cognitive state assessment aimed at reducing injury risk, informing return-to-work decisions, and supporting diverse mobile healthcare applications in civilian and military settings.


Subject(s)
Cognition/physiology, Galvanic Skin Response/physiology, Military Personnel/psychology, Neuropsychological Tests/standards, Speech/physiology, Adolescent, Adult, Female, Humans, Longitudinal Studies, Male, Middle Aged, Young Adult
10.
Front Hum Neurosci; 14: 222, 2020.
Article in English | MEDLINE | ID: mdl-32719593

ABSTRACT

Modern operational environments can place significant demands on a service member's cognitive resources, increasing the risk of errors or mishaps due to overburden. The ability to monitor cognitive burden and associated performance within operational environments is critical to improving mission readiness. As a key step toward a field-ready system, we developed a simulated marksmanship scenario with an embedded working memory task in an immersive virtual reality environment. As participants performed the marksmanship task, they were instructed to remember numbered targets and recall the sequence of those targets at the end of the trial. Low and high cognitive load conditions were defined as the recall of three- and six-digit strings, respectively. Physiological and behavioral signals recorded included speech, heart rate, breathing rate, and body movement. These features were input into a random forest classifier that significantly discriminated between the low- and high-cognitive load conditions (AUC = 0.94). Behavioral features of gait were the most informative, followed by features of speech. We also showed the capability to predict performance on the digit recall (AUC = 0.71) and marksmanship (AUC = 0.58) tasks. The experimental framework can be leveraged in future studies to quantify the interaction of other types of stressors and their impact on operational cognitive and physical performance.
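
A minimal sketch of the classification and feature-ranking steps, assuming scikit-learn and placeholder features standing in for the speech, cardiorespiratory, and movement signals:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import roc_auc_score

# X: multimodal features; y: 0 = low load (3 digits), 1 = high load (6 digits).
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 12))        # placeholder features
y = rng.integers(0, 2, 200)           # placeholder labels

proba = cross_val_predict(
    RandomForestClassifier(n_estimators=300, random_state=0), X, y,
    cv=StratifiedKFold(5), method="predict_proba")[:, 1]
print("AUC:", roc_auc_score(y, proba))

# Feature importances indicate which modalities carry the most signal
# (gait features ranked highest in the study, followed by speech).
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
print(rf.feature_importances_)
```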

11.
J Speech Lang Hear Res; 63(4): 917-930, 2020 04 27.
Article in English | MEDLINE | ID: mdl-32302242

ABSTRACT

PURPOSE: A common way of eliciting speech from individuals is by using passages of written language that are intended to be read aloud. Read passages afford the opportunity for increased control over the phonetic properties of elicited speech, of which phonetic balance is an often-noted example. No comprehensive analysis of the phonetic balance of read passages has been reported in the literature. The present article provides a quantitative comparison of the phonetic balance of widely used passages in English. METHOD: Assessment of phonetic balance is carried out by comparing the distribution of phonemes in several passages to distributions consistent with typical spoken English. Data regarding the distribution of phonemes in spoken American English are aggregated from the published literature and large speech corpora. Phoneme distributions are compared using the Spearman rank-order correlation coefficient to quantify similarities of phoneme counts across those sources. RESULTS: Correlations between phoneme distributions in read passages and aggregated material representative of spoken American English ranged from .70 to .89. Correlations between phoneme counts from all passages, literature sources, and corpus sources ranged from .55 to .99. All correlations were statistically significant at the Bonferroni-adjusted level. CONCLUSIONS: The passages considered in the present work provide high, but not ideal, phonetic balance. Space exists for the creation of new passages that more closely match the phoneme distributions observed in spoken American English. The Caterpillar provided the best phonetic balance, but phoneme distributions in all considered materials were highly similar to each other.
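
The comparison reduces to rank-correlating two phoneme-count vectors over a shared inventory. A sketch with SciPy, using made-up counts and an illustrative comparison count for the Bonferroni adjustment:

```python
from scipy.stats import spearmanr

# Hypothetical phoneme counts over the same phoneme inventory, ordered
# identically for a read passage and a spoken-English reference.
passage_counts = [912, 740, 698, 401, 388, 205, 112, 64]
reference_counts = [1004, 700, 650, 455, 300, 240, 100, 80]

rho, p = spearmanr(passage_counts, reference_counts)
m = 28   # illustrative number of pairwise comparisons being adjusted for
print(f"rho={rho:.2f}, significant={p < 0.05 / m}")
```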


Subject(s)
Phonetics, Speech Perception, Humans, Language, Reading, Speech
12.
IEEE Open J Eng Med Biol; 1: 203-206, 2020.
Article in English | MEDLINE | ID: mdl-35402959

ABSTRACT

GOAL: We propose a speech modeling and signal-processing framework to detect and track COVID-19 through asymptomatic and symptomatic stages. METHODS: The approach is based on the complexity of neuromotor coordination across the speech subsystems involved in respiration, phonation, and articulation. It is motivated by the distinct nature of COVID-19, which involves lower (i.e., bronchial, diaphragm, lower tracheal) rather than upper (i.e., laryngeal, pharyngeal, oral and nasal) respiratory tract inflammation, as well as by growing evidence of the virus' neurological manifestations. PRELIMINARY RESULTS: An exploratory study with audio interviews of five subjects provides Cohen's d effect sizes between pre-COVID-19 (pre-exposure) and post-COVID-19 (after positive diagnosis but presumed asymptomatic) using: coordination of respiration (as measured through acoustic waveform amplitude) and laryngeal motion (fundamental frequency and cepstral peak prominence), and coordination of laryngeal and articulatory (formant center frequencies) motion. CONCLUSIONS: While there is strong subject dependence, the group-level morphology of effect sizes indicates a reduced complexity of subsystem coordination. Validation is needed with larger, more controlled datasets, and confounding influences such as different recording conditions, unbalanced data quantities, and changes in underlying vocal status from pre- to post-recordings must be addressed.
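
For reference, a small sketch of the pooled-standard-deviation Cohen's d, the effect-size measure named in the results:

```python
import numpy as np

def cohens_d(pre, post):
    """Cohen's d between pre- and post-COVID-19 feature samples,
    using a pooled standard deviation."""
    pre, post = np.asarray(pre), np.asarray(post)
    n1, n2 = len(pre), len(post)
    pooled = np.sqrt(((n1 - 1) * pre.var(ddof=1) +
                      (n2 - 1) * post.var(ddof=1)) / (n1 + n2 - 2))
    return (post.mean() - pre.mean()) / pooled
```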

13.
IEEE Access; 8: 127535-127545, 2020.
Article in English | MEDLINE | ID: mdl-33747676

ABSTRACT

Autism Spectrum Disorder (ASD) is a developmental disorder characterized by difficulty in communication, which includes a high incidence of speech production errors. We hypothesize that these errors are partly due to underlying deficits in motor coordination and control, which are also manifested in degraded fine motor control of facial expressions and purposeful hand movements. In this pilot study, we computed correlations of acoustic, video, and handwriting time-series derived from five children with ASD and five children with neurotypical development during speech and handwriting tasks. These correlations and eigenvalues derived from the correlations act as a proxy for motor coordination across articulatory, laryngeal, and respiratory speech production systems and for fine motor skills. We utilized features derived from these correlations to discriminate between children with and without ASD. Eigenvalues derived from these correlations highlighted differences in complexity of coordination across speech subsystems and during handwriting, and helped discriminate between the two subject groups. These results suggest differences in coupling within speech production and fine motor skill systems in children with ASD. Our long-term goal is to create a platform assessing motor coordination in children with ASD in order to track progress from speech and motor interventions administered by clinicians.

14.
Front Neurosci; 14: 588448, 2020.
Article in English | MEDLINE | ID: mdl-33384579

ABSTRACT

Many individuals struggle to understand speech in listening scenarios that include reverberation and background noise. An individual's ability to understand speech arises from a combination of peripheral auditory function, central auditory function, and general cognitive abilities. The interaction of these factors complicates the prescription of treatment or therapy to improve hearing function. Damage to the auditory periphery can be studied in animals; however, this method alone is not enough to understand the impact of hearing loss on speech perception. Computational auditory models bridge the gap between animal studies and human speech perception. Perturbations to the modeled auditory systems can permit mechanism-based investigations into observed human behavior. In this study, we propose a computational model that accounts for the complex interactions between different hearing damage mechanisms and simulates human speech-in-noise perception. The model performs a digit classification task as a human would, with only acoustic sound pressure as input. Thus, we can use the model's performance as a proxy for human performance. This two-stage model consists of a biophysical cochlear-nerve spike generator followed by a deep neural network (DNN) classifier. We hypothesize that sudden damage to the periphery affects speech perception and that central nervous system adaptation over time may compensate for peripheral hearing damage. Our model achieved human-like performance across signal-to-noise ratios (SNRs) under normal-hearing (NH) cochlear settings, achieving 50% digit recognition accuracy at -20.7 dB SNR. Results were comparable to eight NH participants on the same task who achieved 50% behavioral performance at -22 dB SNR. We also simulated medial olivocochlear reflex (MOCR) and auditory nerve fiber (ANF) loss, which worsened digit-recognition accuracy at lower SNRs compared to higher SNRs. Our simulated performance following ANF loss is consistent with the hypothesis that cochlear synaptopathy impacts communication in background noise more so than in quiet. Following the insult of various cochlear degradations, we implemented extreme and conservative adaptation through the DNN. At the lowest SNRs (<0 dB), both adapted models were unable to fully recover NH performance, even with hundreds of thousands of training samples. This implies a limit on performance recovery following peripheral damage in our human-inspired DNN architecture.
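
The 50% thresholds quoted are points on a psychometric curve. A sketch of how such a threshold can be read off by interpolation, using illustrative accuracy values rather than the paper's data:

```python
import numpy as np

# Digit-classification accuracy at each tested SNR (dB); illustrative only.
snrs = np.array([-30, -25, -20, -15, -10, -5, 0])
acc  = np.array([0.12, 0.28, 0.47, 0.71, 0.88, 0.95, 0.98])

# Linear interpolation of the SNR yielding 50% accuracy, the summary
# threshold the study reports (-20.7 dB for the normal-hearing model).
snr_50 = np.interp(0.5, acc, snrs)
print(f"50% threshold: {snr_50:.1f} dB SNR")
```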

15.
Ear Hear; 41(1): 82-94, 2020.
Article in English | MEDLINE | ID: mdl-31045653

ABSTRACT

OBJECTIVES: Hearing-protection devices (HPDs) are made available, and often are required, for industrial use as well as military training exercises and operational duties. However, these devices often are disliked, and consequently not worn, in part because they compromise situational awareness through reduced sound detection and localization performance as well as degraded speech intelligibility. In this study, we carried out a series of tests, involving normal-hearing subjects and multiple background-noise conditions, designed to evaluate the performance of four HPDs in terms of their modifications of auditory-detection thresholds, sound-localization accuracy, and speech intelligibility. In addition, we assessed their impact on listening effort to understand how the additional effort required to perceive and process auditory signals while wearing an HPD reduces available cognitive resources for other tasks. DESIGN: Thirteen normal-hearing subjects participated in a protocol, which included auditory tasks designed to measure detection and localization performance, speech intelligibility, and cognitive load. Each participant repeated the battery of tests with unoccluded ears and four hearing protectors, two active (electronic) and two passive. The tasks were performed both in quiet and in background noise. RESULTS: Our findings indicate that, in variable degrees, all of the tested HPDs induce performance degradation on most of the conducted tasks as compared to the open ear. Of particular note in this study is the finding of increased cognitive load or listening effort, as measured by visual reaction time, for some hearing protectors during a dual-task, which added working-memory demands to the speech-intelligibility task. CONCLUSIONS: These results indicate that situational awareness can vary greatly across the spectrum of HPDs, and that listening effort is another aspect of performance that should be considered in future studies. The increased listening effort induced by hearing protectors may lead to earlier cognitive fatigue in noisy environments. Further study is required to characterize how auditory performance is limited by the combination of hearing impairment and the use of HPDs, and how the effects of such limitations can be linked to safe and effective use of hearing protection to maximize job performance.


Subject(s)
Sound Localization, Speech Perception, Auditory Perception, Awareness, Hearing, Humans
16.
Sci Rep; 9(1): 11538, 2019 08 08.
Article in English | MEDLINE | ID: mdl-31395905

ABSTRACT

Auditory attention decoding (AAD) through a brain-computer interface has seen a flowering of developments since it was first introduced by Mesgarani and Chang (2012) using electrocorticograph recordings. AAD has been pursued for its potential application to hearing-aid design, in which an attention-guided algorithm selects, from multiple competing acoustic sources, which should be enhanced for the listener and which should be suppressed. Traditionally, researchers have separated the AAD problem into two stages: reconstruction of a representation of the attended audio from neural signals, followed by determining the similarity between the candidate audio streams and the reconstruction. Here, we compare the traditional two-stage approach with a novel neural-network architecture that subsumes the explicit similarity step. We compare this new architecture against linear and non-linear (neural-network) baselines using both wet and dry electroencephalogram (EEG) systems. Our results indicate that the new architecture outperforms the baseline linear stimulus-reconstruction method, improving decoding accuracy from 66% to 81% using wet EEG and from 59% to 87% for dry EEG. Also of note was the finding that the dry EEG system can deliver comparable or even better results than the wet system, despite having only one third as many EEG channels. The 11-subject, wet-electrode AAD dataset for two competing, co-located talkers, the 11-subject, dry-electrode AAD dataset, and our software are available for further validation, experimentation, and modification.


Subject(s)
Attention/physiology, Auditory Cortex/physiology, Brain-Computer Interfaces, Electroencephalography, Acoustic Stimulation, Algorithms, Auditory Cortex/diagnostic imaging, Electrocorticography, Hearing Aids/trends, Humans, Linear Models, Neural Networks, Computer, Noise, Nonlinear Dynamics, Speech Perception/physiology
17.
J Acoust Soc Am; 145(3): 1456, 2019 03.
Article in English | MEDLINE | ID: mdl-31067944

ABSTRACT

This paper reviews the current state of several formal models of speech motor control, with particular focus on the low-level control of the speech articulators. Further development of speech motor control models may be aided by a comparison of model attributes. The review builds an understanding of existing models from first principles before moving into a discussion of several models, showing how each is constructed out of the same basic domain-general ideas and components, e.g., generalized feedforward, feedback, and model-predictive components. This approach allows direct comparisons to be made in terms of where the models differ and their points of agreement. Substantial differences among models can be observed in their use of feedforward control, their process of estimating system state, and their method of incorporating feedback signals into control. However, many commonalities exist among the models in terms of their reliance on higher-level motor planning, use of feedback signals, lack of time-variant adaptation, and focus on kinematic aspects of control and biomechanics. Ongoing research bridging hybrid feedforward/feedback pathways with forward dynamic control, as well as feedback/internal-model-based state estimation, is discussed.

18.
PLoS One; 13(9): e0202180, 2018.
Article in English | MEDLINE | ID: mdl-30192767

ABSTRACT

Speech motor actions are performed quickly, while simultaneously maintaining a high degree of accuracy. Are speed and accuracy in conflict during speech production? Speed-accuracy tradeoffs have been shown in many domains of human motor action, but have not been directly examined in the domain of speech production. The present work seeks evidence for Fitts' law, a rigorous formulation of this fundamental tradeoff, in speech articulation kinematics by analyzing USC-TIMIT, a real-time magnetic resonance imaging data set of speech production. A theoretical framework for considering Fitts' law with respect to models of speech motor control is elucidated. Methodological challenges in seeking relationships consistent with Fitts' law are addressed, including the operational definitions and measurement of key variables in real-time MRI data. Results suggest the presence of speed-accuracy tradeoffs for certain types of speech production actions, with wide variability across syllable position, and substantial variability also across subjects. Coda consonant targets immediately following the syllabic nucleus show the strongest evidence of this tradeoff, with correlations as high as 0.72 between speed and accuracy. A discussion is provided concerning the potentially limited applicability of Fitts' law in the context of speech production, as well as the theoretical context for interpreting the results.
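
For reference, the conventional statement of Fitts' law, with MT the movement time, D the movement distance, W the target width, and a, b empirically fitted constants:

```latex
MT = a + b\,\log_2\!\left(\frac{2D}{W}\right),
\qquad ID = \log_2\!\left(\frac{2D}{W}\right)
```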


Subject(s)
Motor Cortex/physiology, Psychomotor Performance/physiology, Reaction Time/physiology, Speech/physiology, Algorithms, Biomechanical Phenomena, Humans, Larynx/diagnostic imaging, Larynx/physiology, Magnetic Resonance Imaging, Models, Biological, Vocal Folds/diagnostic imaging, Vocal Folds/physiology
19.
IEEE/ACM Trans Audio Speech Lang Process; 25(8): 1718-1730, 2017 Aug.
Article in English | MEDLINE | ID: mdl-34268444

ABSTRACT

Glottal inverse filtering aims to estimate the glottal airflow signal from a speech signal for applications such as speaker recognition and clinical voice assessment. Nonetheless, evaluation of inverse filtering algorithms has been challenging due to the practical difficulty of directly measuring glottal airflow. Moreover, it is acknowledged that the performance of many methods degrades in voice conditions that are of great interest, such as breathiness, high pitch, soft voice, and running speech. This paper presents a comprehensive, objective, and comparative evaluation of state-of-the-art inverse filtering algorithms that takes advantage of speech and glottal airflow signals generated by a physiological speech synthesizer. The synthesizer provides a physics-based simulation of the voice production process and thus an adequate test bed for revealing the temporal and spectral performance characteristics of each algorithm. Included in the synthetic data are continuous speech utterances and sustained vowels, which are produced with multiple voice qualities (pressed, slightly pressed, modal, slightly breathy, and breathy), fundamental frequencies, and subglottal pressures to simulate the natural variation in real speech. In evaluating the accuracy of a glottal flow estimate, multiple error measures are used, including an error in the estimated signal that measures overall waveform deviation, as well as an error in each of several clinically relevant features extracted from the glottal flow estimate. Waveform errors calculated from glottal flow estimation experiments exhibited mean values of around 30% of the amplitude of the true glottal flow derivative for sustained vowels, and around 40% for continuous speech. Closed-phase approaches showed remarkable stability across different voice qualities and subglottal pressures. The algorithms of choice, as suggested by significance tests, are closed-phase covariance analysis for the analysis of sustained vowels and sparse linear prediction for the analysis of continuous speech. Results of a data subset analysis suggest that the analysis of close rounded vowels is an additional challenge in glottal flow estimation.
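
One plausible reading of the waveform error reported here, sketched below, is an RMS deviation expressed as a percentage of the true signal's amplitude range; the exact definition is not given in the abstract.

```python
import numpy as np

def waveform_error(estimate, reference):
    """RMS deviation of a glottal-flow-derivative estimate, expressed
    as a percentage of the amplitude range of the true signal."""
    estimate, reference = np.asarray(estimate), np.asarray(reference)
    return 100 * np.sqrt(np.mean((estimate - reference) ** 2)) / np.ptp(reference)
```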

20.
J Speech Lang Hear Res; 54(1): 47-54, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20699347

ABSTRACT

PURPOSE: In prior work, a manually derived measure of vocal fold vibratory phase asymmetry correlated to varying degrees with visual judgments made from laryngeal high-speed videoendoscopy (HSV) recordings. This investigation extended this work by establishing an automated HSV-based framework to quantify 3 categories of vocal fold vibratory asymmetry. METHOD: HSV-based analysis provided for cycle-to-cycle estimates of left-right phase asymmetry, left-right amplitude asymmetry, and axis shift during glottal closure for 52 speakers with no vocal pathology producing comfortable and pressed phonation. An initial cross-validation of the automated left-right phase asymmetry measure was performed by correlating the measure with other objective and subjective assessments of phase asymmetry. RESULTS: Vocal fold vibratory asymmetry was exhibited to a similar extent in both comfortable and pressed phonations. The automated measure of left-right phase asymmetry strongly correlated with manually derived measures and moderately correlated with visual-perceptual ratings. Correlations with the visual-perceptual ratings remained relatively consistent as the automated measure was derived from kymograms taken at different glottal locations. CONCLUSIONS: An automated HSV-based framework for the quantification of vocal fold vibratory asymmetry was developed and initially validated. This framework serves as a platform for investigating relationships between vocal fold tissue motion and acoustic measures of voice function.


Subject(s)
Endoscopy/methods, Models, Biological, Videotape Recording/methods, Vocal Folds/physiology, Voice/physiology, Algorithms, Endoscopy/standards, Humans, Reproducibility of Results, Speech Acoustics, Vibration, Videotape Recording/standards