Results 1 - 11 of 11
1.
Nat Rev Neurosci ; 25(7): 473-492, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38745103

ABSTRACT

Loss of speech after paralysis is devastating, but circumventing motor-pathway injury by directly decoding speech from intact cortical activity has the potential to restore natural communication and self-expression. Recent discoveries have defined how key features of speech production are facilitated by the coordinated activity of vocal-tract articulatory and motor-planning cortical representations. In this Review, we highlight such progress and how it has led to successful speech decoding, first in individuals implanted with intracranial electrodes for clinical epilepsy monitoring and subsequently in individuals with paralysis as part of early feasibility clinical trials to restore speech. We discuss high-spatiotemporal-resolution neural interfaces and the adaptation of state-of-the-art speech computational algorithms that have driven rapid and substantial progress in decoding neural activity into text, audible speech, and facial movements. Although restoring natural speech is a long-term goal, speech neuroprostheses already have performance levels that surpass communication rates offered by current assistive-communication technology. Given this accelerated rate of progress in the field, we propose key evaluation metrics for speed and accuracy, among others, to help standardize across studies. We finish by highlighting several directions to more fully explore the multidimensional feature space of speech and language, which will continue to accelerate progress towards a clinically viable speech neuroprosthesis.
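Among the evaluation metrics proposed for standardization, word error rate is the accuracy measure reported across these studies. As a concrete illustration, a minimal sketch of the standard word-level Levenshtein computation (the function name and example sentences are illustrative, not taken from the Review):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for word-level Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("i want some water", "i want water"))  # one deletion -> 0.25
```

A word error rate of 25%, as reported in several of the studies below, thus means one word in four must be edited to recover the intended sentence.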


Subjects
Brain-Computer Interfaces, Speech, Humans, Speech/physiology, Neural Prostheses, Animals
2.
Nature ; 620(7976): 1037-1046, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37612505

ABSTRACT

Speech neuroprostheses have the potential to restore communication to people living with paralysis, but naturalistic speed and expressivity are elusive1. Here we use high-density surface recordings of the speech cortex in a clinical-trial participant with severe limb and vocal paralysis to achieve high-performance real-time decoding across three complementary speech-related output modalities: text, speech audio and facial-avatar animation. We trained and evaluated deep-learning models using neural data collected as the participant attempted to silently speak sentences. For text, we demonstrate accurate and rapid large-vocabulary decoding with a median rate of 78 words per minute and median word error rate of 25%. For speech audio, we demonstrate intelligible and rapid speech synthesis and personalization to the participant's pre-injury voice. For facial-avatar animation, we demonstrate the control of virtual orofacial movements for speech and non-speech communicative gestures. The decoders reached high performance with less than two weeks of training. Our findings introduce a multimodal speech-neuroprosthetic approach that has substantial promise to restore full, embodied communication to people living with severe paralysis.


Subjects
Face, Neural Prostheses, Paralysis, Speech, Humans, Cerebral Cortex/physiology, Cerebral Cortex/physiopathology, Clinical Trials as Topic, Communication, Deep Learning, Gestures, Movement, Neural Prostheses/standards, Paralysis/physiopathology, Paralysis/rehabilitation, Vocabulary, Voice
3.
N Engl J Med ; 385(3): 217-227, 2021 07 15.
Article in English | MEDLINE | ID: mdl-34260835

ABSTRACT

BACKGROUND: Technology to restore the ability to communicate in paralyzed persons who cannot speak has the potential to improve autonomy and quality of life. An approach that decodes words and sentences directly from the cerebral cortical activity of such patients may represent an advancement over existing methods for assisted communication. METHODS: We implanted a subdural, high-density, multielectrode array over the area of the sensorimotor cortex that controls speech in a person with anarthria (the loss of the ability to articulate speech) and spastic quadriparesis caused by a brain-stem stroke. Over the course of 48 sessions, we recorded 22 hours of cortical activity while the participant attempted to say individual words from a vocabulary set of 50 words. We used deep-learning algorithms to create computational models for the detection and classification of words from patterns in the recorded cortical activity. We applied these computational models, as well as a natural-language model that yielded next-word probabilities given the preceding words in a sequence, to decode full sentences as the participant attempted to say them. RESULTS: We decoded sentences from the participant's cortical activity in real time at a median rate of 15.2 words per minute, with a median word error rate of 25.6%. In post hoc analyses, we detected 98% of the attempts by the participant to produce individual words, and we classified words with 47.1% accuracy using cortical signals that were stable throughout the 81-week study period. CONCLUSIONS: In a person with anarthria and spastic quadriparesis caused by a brain-stem stroke, words and sentences were decoded directly from cortical activity during attempted speech with the use of deep-learning models and a natural-language model. (Funded by Facebook and others; ClinicalTrials.gov number, NCT03698149.).
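The combination described in the METHODS, per-word classification probabilities merged with a language model's next-word probabilities, can be sketched as a small beam search over the vocabulary. The following is a toy illustration only; the vocabulary, probability tables, and function names are invented placeholders, not the study's actual models:

```python
import math

VOCAB = ["hello", "how", "are", "you", "i", "am", "good"]

def neural_word_probs(t):
    """Placeholder for the classifier's P(word | cortical activity at step t)."""
    probs = {w: 0.05 for w in VOCAB}
    target = ["hello", "how", "are", "you"]
    if t < len(target):
        probs[target[t]] = 0.7
    return probs

def lm_next_word_probs(prev):
    """Placeholder for the language model's P(next word | preceding word)."""
    table = {"hello": {"how": 0.5}, "how": {"are": 0.6}, "are": {"you": 0.8}}
    base = {w: 0.02 for w in VOCAB}
    base.update(table.get(prev, {}))
    return base

def beam_decode(n_steps, beam_width=3):
    beams = [([], 0.0)]  # (word sequence, cumulative log probability)
    for t in range(n_steps):
        candidates = []
        for seq, score in beams:
            neural = neural_word_probs(t)
            lm = (lm_next_word_probs(seq[-1]) if seq
                  else {w: 1 / len(VOCAB) for w in VOCAB})
            for w in VOCAB:
                candidates.append(
                    (seq + [w], score + math.log(neural[w]) + math.log(lm[w])))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

print(beam_decode(4))  # -> ['hello', 'how', 'are', 'you']
```

The point of the sketch is the scoring rule: each candidate sentence accumulates log probability from both the neural classifier and the language model, so the language model can rescue words the classifier is unsure about.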


Subjects
Brain Stem Infarctions/complications, Brain-Computer Interfaces, Deep Learning, Dysarthria/rehabilitation, Neural Prostheses, Speech, Adult, Dysarthria/etiology, Electrocorticography, Electrodes, Implanted, Humans, Male, Natural Language Processing, Quadriplegia/etiology, Sensorimotor Cortex/physiology
5.
Nat Biomed Eng ; 2024 May 20.
Article in English | MEDLINE | ID: mdl-38769157

ABSTRACT

Advancements in decoding speech from brain activity have focused on decoding a single language. Hence, the extent to which bilingual speech production relies on unique or shared cortical activity across languages has remained unclear. Here, we leveraged electrocorticography, along with deep-learning and statistical natural-language models of English and Spanish, to record and decode activity from speech-motor cortex of a Spanish-English bilingual with vocal-tract and limb paralysis into sentences in either language. This was achieved without requiring the participant to manually specify the target language. Decoding models relied on shared vocal-tract articulatory representations across languages, which allowed us to build a syllable classifier that generalized across a shared set of English and Spanish syllables. Transfer learning expedited training of the bilingual decoder by enabling neural data recorded in one language to improve decoding in the other language. Overall, our findings suggest shared cortical articulatory representations that persist after paralysis and enable the decoding of multiple languages without the need to train separate language-specific decoders.

6.
Nat Commun ; 13(1): 6510, 2022 11 08.
Article in English | MEDLINE | ID: mdl-36347863

ABSTRACT

Neuroprostheses have the potential to restore communication to people who cannot speak or type due to paralysis. However, it is unclear if silent attempts to speak can be used to control a communication neuroprosthesis. Here, we translated direct cortical signals in a clinical-trial participant (ClinicalTrials.gov; NCT03698149) with severe limb and vocal-tract paralysis into single letters to spell out full sentences in real time. We used deep-learning and language-modeling techniques to decode letter sequences as the participant attempted to silently spell using code words that represented the 26 English letters (e.g. "alpha" for "a"). We leveraged broad electrode coverage beyond speech-motor cortex to include supplemental control signals from hand cortex and complementary information from low- and high-frequency signal components to improve decoding accuracy. We decoded sentences using words from a 1,152-word vocabulary at a median character error rate of 6.13% and speed of 29.4 characters per minute. In offline simulations, we showed that our approach generalized to large vocabularies containing over 9,000 words (median character error rate of 8.23%). These results illustrate the clinical viability of a silently controlled speech neuroprosthesis to generate sentences from a large vocabulary through a spelling-based approach, complementing previous demonstrations of direct full-word decoding.
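The code-word spelling scheme described here maps each decoded code word to one of the 26 English letters. The abstract confirms only the example "alpha" for "a"; the full mapping below is an illustrative standard NATO-style alphabet, not necessarily the study's exact word set:

```python
# NATO-style code words; only "alpha" -> "a" is confirmed by the abstract.
CODE_WORDS = {
    "alpha": "a", "bravo": "b", "charlie": "c", "delta": "d", "echo": "e",
    "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i", "juliett": "j",
    "kilo": "k", "lima": "l", "mike": "m", "november": "n", "oscar": "o",
    "papa": "p", "quebec": "q", "romeo": "r", "sierra": "s", "tango": "t",
    "uniform": "u", "victor": "v", "whiskey": "w", "xray": "x",
    "yankee": "y", "zulu": "z",
}

def decode_spelled_word(decoded_code_words):
    """Convert a sequence of decoded code words into the spelled-out word."""
    return "".join(CODE_WORDS[cw] for cw in decoded_code_words)

print(decode_spelled_word(["hotel", "india"]))  # -> "hi"
```

Multisyllabic code words are easier to distinguish in neural activity than single letters, which is one motivation for spelling this way rather than attempting direct letter classification.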


Subjects
Speech Perception, Speech, Humans, Language, Vocabulary, Paralysis
7.
J Assoc Res Otolaryngol ; 23(3): 319-349, 2022 06.
Article in English | MEDLINE | ID: mdl-35441936

ABSTRACT

Use of artificial intelligence (AI) is a burgeoning field in otolaryngology and the communication sciences. A virtual symposium on the topic was convened from Duke University on October 26, 2020, and was attended by more than 170 participants worldwide. This review presents summaries of all but one of the talks presented during the symposium; recordings of all the talks, along with the discussions for the talks, are available at https://www.youtube.com/watch?v=ktfewrXvEFg and https://www.youtube.com/watch?v=-gQ5qX2v3rg . Each of the summaries is about 2500 words in length and each summary includes two figures. This level of detail far exceeds the brief summaries presented in traditional reviews and thus provides a more-informed glimpse into the power and diversity of current AI applications in otolaryngology and the communication sciences and how to harness that power for future applications.


Subjects
Artificial Intelligence, Otolaryngology, Communication, Humans
8.
Nat Neurosci ; 23(4): 575-582, 2020 04.
Article in English | MEDLINE | ID: mdl-32231340

ABSTRACT

A decade after speech was first decoded from human brain signals, accuracy and speed remain far below that of natural speech. Here we show how to decode the electrocorticogram with high accuracy and at natural-speech rates. Taking a cue from recent advances in machine translation, we train a recurrent neural network to encode each sentence-length sequence of neural activity into an abstract representation, and then to decode this representation, word by word, into an English sentence. For each participant, data consist of several spoken repeats of a set of 30-50 sentences, along with the contemporaneous signals from ~250 electrodes distributed over peri-Sylvian cortices. Average word error rates across a held-out repeat set are as low as 3%. Finally, we show how decoding with limited data can be improved with transfer learning, by training certain layers of the network under multiple participants' data.


Subjects
Brain-Computer Interfaces, Brain/physiology, Neural Networks, Computer, Speech Perception, Speech, Adult, Electrocorticography, Female, Humans, Middle Aged
9.
Nat Commun ; 10(1): 3096, 2019 07 30.
Article in English | MEDLINE | ID: mdl-31363096

ABSTRACT

Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance's identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate.
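The contextual integration described here amounts to a Bayesian update: decoded question likelihoods re-weight the prior over plausible answers before the answer decoder's own evidence is applied. A minimal sketch, in which the questions, answers, and probability values are invented for illustration:

```python
def context_integrated_answer_probs(question_likelihoods, answer_given_question,
                                    answer_likelihoods):
    """P(answer) proportional to P(neural|answer) * sum_q P(answer|q) * P(q|neural)."""
    # Context prior over answers, marginalizing over decoded question likelihoods.
    prior = {}
    for q, p_q in question_likelihoods.items():
        for a, p_a_given_q in answer_given_question[q].items():
            prior[a] = prior.get(a, 0.0) + p_q * p_a_given_q
    # Combine with the answer decoder's own likelihoods and normalize.
    unnorm = {a: answer_likelihoods.get(a, 0.0) * prior.get(a, 0.0)
              for a in set(answer_likelihoods) | set(prior)}
    z = sum(unnorm.values())
    return {a: p / z for a, p in unnorm.items()}

# Illustrative example: the decoded question is probably about the room, which
# makes "bright" far more plausible than "five" even at equal answer likelihood.
posterior = context_integrated_answer_probs(
    question_likelihoods={"how_is_your_room": 0.9, "how_many_fingers": 0.1},
    answer_given_question={"how_is_your_room": {"bright": 0.5, "dark": 0.5},
                           "how_many_fingers": {"five": 0.5, "two": 0.5}},
    answer_likelihoods={"bright": 0.4, "five": 0.4, "dark": 0.1, "two": 0.1},
)
print(max(posterior, key=posterior.get))  # -> "bright"
```

Because only certain answers are plausible responses to each question, the decoded question acts as a dynamic prior, which is why answer decoding improves when the context is integrated.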


Subjects
Brain Mapping/methods, Cerebral Cortex/physiology, Speech/physiology, Brain-Computer Interfaces, Electrocorticography/instrumentation, Electrocorticography/methods, Electrodes, Implanted, Epilepsy/diagnosis, Epilepsy/physiopathology, Female, Humans, Time Factors
10.
J Neural Eng ; 15(3): 036005, 2018 06.
Article in English | MEDLINE | ID: mdl-29378977

ABSTRACT

OBJECTIVE: Recent research has characterized the anatomical and functional basis of speech perception in the human auditory cortex. These advances have made it possible to decode speech information from activity in brain regions like the superior temporal gyrus, but no published work has demonstrated this ability in real-time, which is necessary for neuroprosthetic brain-computer interfaces. APPROACH: Here, we introduce a real-time neural speech recognition (rtNSR) software package, which was used to classify spoken input from high-resolution electrocorticography signals in real-time. We tested the system with two human subjects implanted with electrode arrays over the lateral brain surface. Subjects listened to multiple repetitions of ten sentences, and rtNSR classified what was heard in real-time from neural activity patterns using direct sentence-level and HMM-based phoneme-level classification schemes. MAIN RESULTS: We observed single-trial sentence classification accuracies of [Formula: see text] or higher for each subject with less than 7 minutes of training data, demonstrating the ability of rtNSR to use cortical recordings to perform accurate real-time speech decoding in a limited vocabulary setting. SIGNIFICANCE: Further development and testing of the package with different speech paradigms could influence the design of future speech neuroprosthetic applications.


Subjects
Acoustic Stimulation/methods, Auditory Cortex/physiology, Brain-Computer Interfaces, Computer Systems, Speech Perception/physiology, Speech/physiology, Acoustic Stimulation/instrumentation, Electrodes, Implanted, Humans
11.
J Neural Eng ; 13(5): 056004, 2016 10.
Article in English | MEDLINE | ID: mdl-27484713

ABSTRACT

OBJECTIVE: The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. APPROACH: The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. MAIN RESULTS: The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. SIGNIFICANCE: These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.
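The Viterbi decoder described in the APPROACH combines per-frame phoneme likelihood estimates (from the LDA model) with n-gram transition probabilities. A minimal bigram sketch, with toy likelihood and transition values standing in for the study's fitted models:

```python
import math

def viterbi(phonemes, frame_likelihoods, transition_probs, initial_probs):
    """Most likely phoneme sequence given per-frame likelihoods and bigram transitions."""
    # Log probability of the best path ending in each phoneme at the first frame.
    best = {p: math.log(initial_probs[p]) + math.log(frame_likelihoods[0][p])
            for p in phonemes}
    back = []  # backpointers, one dict per subsequent frame
    for frame in frame_likelihoods[1:]:
        prev_best, best, pointers = best, {}, {}
        for p in phonemes:
            prev, score = max(
                ((q, prev_best[q] + math.log(transition_probs[q][p]))
                 for q in phonemes),
                key=lambda t: t[1])
            best[p] = score + math.log(frame[p])
            pointers[p] = prev
        back.append(pointers)
    # Trace back from the best final phoneme.
    path = [max(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy example with two phonemes over two frames.
seq = viterbi(
    ["k", "ae"],
    frame_likelihoods=[{"k": 0.8, "ae": 0.2}, {"k": 0.3, "ae": 0.7}],
    transition_probs={"k": {"k": 0.2, "ae": 0.8}, "ae": {"k": 0.5, "ae": 0.5}},
    initial_probs={"k": 0.6, "ae": 0.4},
)
print(seq)  # -> ['k', 'ae']
```

This is the same decoding machinery used in conventional HMM speech recognizers, which is the paper's central point: once phoneme likelihoods can be estimated from cortical activity, standard speech recognition infrastructure applies directly.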


Subjects
Cerebral Cortex/physiology, Speech Recognition Software, Acoustic Stimulation, Algorithms, Auditory Cortex/physiology, Computer Simulation, Discriminant Analysis, Electrocorticography, Electrodes, Implanted, Female, Gamma Rhythm, Humans, Likelihood Functions, Male, Markov Chains, Reproducibility of Results, Sex Characteristics, Temporal Lobe/physiology