ABSTRACT
Speech neuroprosthetics aim to provide a natural communication channel to individuals who are unable to speak due to physical or neurological impairments. Real-time synthesis of acoustic speech directly from measured neural activity could enable natural conversations and notably improve quality of life, particularly for individuals who have severely limited means of communication. Recent advances in decoding approaches have led to high-quality reconstructions of acoustic speech from invasively measured neural activity. However, most prior research utilizes data collected during open-loop experiments of articulated speech, which might not directly translate to imagined speech processes. Here, we present an approach that synthesizes audible speech in real time for both imagined and whispered speech conditions. With a participant implanted with stereotactic depth electrodes, we were able to reliably generate audible speech in real time. The decoding models rely predominantly on frontal activity, suggesting that speech processes have similar representations when vocalized, whispered, or imagined. While reconstructed audio is not yet intelligible, our real-time synthesis approach represents an essential step towards investigating how patients will learn to operate a closed-loop speech neuroprosthesis based on imagined speech.
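A minimal sketch of what one cycle of such a real-time decoding loop could look like, assuming high-gamma band-power features, a linear neural-to-spectral mapping, and a downstream vocoder; the sampling rate, channel count, frame length, feature set, and all names are illustrative assumptions rather than the study's actual implementation, and placeholder random data stands in for the sEEG stream and the trained model.

```python
# Hypothetical closed-loop decoding cycle (illustrative only).
import numpy as np

FS_NEURAL = 1024          # assumed sEEG sampling rate (Hz)
N_CHANNELS = 64           # assumed number of depth-electrode contacts
FRAME_MS = 50             # assumed analysis window per synthesis frame

def high_gamma_power(window: np.ndarray) -> np.ndarray:
    """Crude per-channel high-gamma (70-170 Hz) log power estimate via FFT."""
    spectrum = np.abs(np.fft.rfft(window, axis=0)) ** 2
    freqs = np.fft.rfftfreq(window.shape[0], d=1.0 / FS_NEURAL)
    band = (freqs >= 70) & (freqs <= 170)
    return np.log(spectrum[band].mean(axis=0) + 1e-8)

# Stand-in decoder: in a real system this would be a model trained on
# simultaneously recorded neural and acoustic data.
decoder_weights = np.random.randn(N_CHANNELS, 80) * 0.01   # 80 mel bins assumed

def decode_frame(window: np.ndarray) -> np.ndarray:
    features = high_gamma_power(window)        # (N_CHANNELS,)
    return features @ decoder_weights          # predicted spectral frame

# Simulated acquisition loop; a real deployment would read from the amplifier
# and hand each predicted frame to a vocoder and the audio output device.
samples_per_frame = int(FS_NEURAL * FRAME_MS / 1000)
for _ in range(10):
    window = np.random.randn(samples_per_frame, N_CHANNELS)   # placeholder sEEG
    spectral_frame = decode_frame(window)
    # vocoder.synthesize(spectral_frame); audio_out.play(...)  # hardware-dependent
```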
Subjects
Brain-Computer Interfaces, Implanted Electrodes/statistics & numerical data, Neural Prostheses/statistics & numerical data, Quality of Life, Speech, Female, Humans, Young Adult
ABSTRACT
Millions of individuals suffer from impairments that significantly disrupt or completely eliminate their ability to speak. An ideal intervention would restore one's natural ability to physically produce speech. Recent progress has been made in decoding speech-related brain activity to generate synthesized speech. Our vision is to extend these recent advances toward the goal of restoring physical speech production, using decoded speech-related brain activity to modulate electrical stimulation of the orofacial musculature involved in speech. In this pilot study, we take a step toward this vision by investigating the feasibility of stimulating orofacial muscles during vocalization in order to alter acoustic production. The results of our study provide a necessary foundation for eventual orofacial stimulation controlled directly by decoded speech-related brain activity.
Subjects
Electric Stimulation, Facial Muscles/physiology, Movement, Speech, Brain/physiology, Humans, Pilot Projects
ABSTRACT
Neural interfaces that directly produce intelligible speech from brain activity would allow people with severe impairment from neurological disorders to communicate more naturally. Here, we record neural population activity in motor, premotor and inferior frontal cortices during speech production using electrocorticography (ECoG) and show that ECoG signals alone can be used to generate intelligible speech output that can preserve conversational cues. To produce speech directly from neural data, we adapted a method from the field of speech synthesis called unit selection, in which units of speech are concatenated to form audible output. In our approach, which we call Brain-To-Speech, we chose subsequent units of speech based on the measured ECoG activity to generate audio waveforms directly from the neural recordings. Brain-To-Speech employed the user's own voice to generate speech that sounded very natural and included features such as prosody and accentuation. By investigating the brain areas involved in speech production separately, we found that speech motor cortex provided more information for the reconstruction process than the other cortical areas.
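A rough illustration of the unit-selection idea described above, under the assumption of a simple nearest-neighbour match between the current ECoG feature vector and the neural features stored with each speech unit; the inventory size, feature dimensionality, distance measure, and all names are placeholders, not the paper's actual method.

```python
# Hypothetical unit-selection step guided by neural features.
import numpy as np

# Unit inventory built from training data: each entry pairs the ECoG feature
# vector observed while a short speech unit was produced with that unit's audio.
unit_neural_feats = np.random.randn(500, 128)                # 500 units, 128-dim features
unit_waveforms = [np.random.randn(800) for _ in range(500)]  # ~50 ms each at 16 kHz

def select_unit(ecog_feat: np.ndarray) -> np.ndarray:
    """Pick the audio unit whose training-time neural features are closest
    to the currently measured ECoG feature vector."""
    dists = np.linalg.norm(unit_neural_feats - ecog_feat, axis=1)
    return unit_waveforms[int(np.argmin(dists))]

def brain_to_speech(ecog_frames: np.ndarray) -> np.ndarray:
    """Concatenate the selected units frame by frame into one output waveform.
    A real system would also cross-fade at unit boundaries to reduce artifacts."""
    return np.concatenate([select_unit(f) for f in ecog_frames])

output = brain_to_speech(np.random.randn(20, 128))           # 20 decoded frames
```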
ABSTRACT
This paper presents early-stage results of our investigations into the direct conversion of facial surface electromyographic (EMG) signals into audible speech in a real-time setting, enabling novel avenues for research and system improvement through real-time feedback. The system uses a pipeline approach to enable online acquisition of EMG data, extraction of EMG features, mapping of EMG features to audio features, synthesis of audio waveforms from audio features, and output of the audio waveforms via speakers or headphones. Our system allows for performing EMG-to-Speech conversion with low latency and on a continuous stream of EMG data, enabling near-instantaneous audio output during audible as well as silent speech production. In this paper, we present an analysis of our system's components with respect to the latency they incur, as well as the trade-offs between conversion quality, latency, and required training duration.
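The staged pipeline described above could be arranged roughly as follows; the feature set (here mean absolute value and RMS per channel), the linear feature mapping, the frame sizes, and the playback stage are illustrative stand-ins, not the system's actual components.

```python
# Hypothetical staging of an EMG-to-Speech pipeline (illustrative only).
import numpy as np

def acquire_emg(n_frames: int, n_channels: int = 8, frame_len: int = 160):
    """Stand-in for online EMG acquisition, yielding one short frame at a time."""
    for _ in range(n_frames):
        yield np.random.randn(frame_len, n_channels)

def emg_features(frame: np.ndarray) -> np.ndarray:
    """Simple time-domain features per channel (mean absolute value, RMS)."""
    return np.concatenate([np.abs(frame).mean(axis=0),
                           np.sqrt((frame ** 2).mean(axis=0))])

def map_to_audio_features(feat: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Linear stand-in for the trained EMG-to-audio feature mapping."""
    return feat @ weights

weights = np.random.randn(16, 25) * 0.01      # 16 EMG features -> 25 audio params
for frame in acquire_emg(n_frames=100):
    audio_feat = map_to_audio_features(emg_features(frame), weights)
    # waveform = vocoder(audio_feat); sound_device.write(waveform)  # playback stage
```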
Subjects
Electromyography/methods, Computer-Assisted Signal Processing, Speech/physiology, Humans
ABSTRACT
Most current Brain-Computer Interfaces (BCIs) achieve high information transfer rates using spelling paradigms based on stimulus-evoked potentials. Despite the success of these interfaces, this mode of communication can be cumbersome and unnatural. Direct synthesis of speech from neural activity represents a more natural mode of communication that would enable users to convey verbal messages in real time. In this pilot study with one participant, we demonstrate that electrocorticographic (ECoG) intracranial activity from temporal areas can be used to resynthesize speech in real time. This is accomplished by reconstructing the audio magnitude spectrogram from neural activity and subsequently creating the audio waveform from these reconstructed spectrograms. We show that significant correlations between the original and reconstructed spectrograms and temporal waveforms can be achieved. While this pilot study uses audibly spoken speech to train the models, it represents a first step towards speech synthesis from speech imagery.
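One common way to realize the second stage, turning a reconstructed magnitude spectrogram back into a waveform, is Griffin-Lim phase estimation; the sketch below assumes this approach with illustrative STFT parameters, and uses random values in place of the neural decoder's output rather than the study's actual pipeline.

```python
# Griffin-Lim waveform reconstruction from a magnitude spectrogram (illustrative).
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude: np.ndarray, fs: int = 16000, nperseg: int = 400,
                n_iter: int = 32) -> np.ndarray:
    """Iteratively estimate a phase that is consistent with the given
    magnitude spectrogram, then invert to a time-domain waveform."""
    phase = np.exp(2j * np.pi * np.random.rand(*magnitude.shape))
    for _ in range(n_iter):
        _, audio = istft(magnitude * phase, fs=fs, nperseg=nperseg)
        _, _, spec = stft(audio, fs=fs, nperseg=nperseg)
        spec = spec[:, :magnitude.shape[1]]          # keep frame count aligned
        phase = np.exp(1j * np.angle(spec))
    _, audio = istft(magnitude * phase, fs=fs, nperseg=nperseg)
    return audio

# In a closed-loop system the magnitude spectrogram would come from the
# neural decoder; random values stand in for it here.
reconstructed_spec = np.abs(np.random.randn(201, 50))   # 201 freq bins, 50 frames
waveform = griffin_lim(reconstructed_spec)
```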