Results 1 - 4 of 4
1.
J Neurosci; 38(46): 9803-9813, 2018 Nov 14.
Article in English | MEDLINE | ID: mdl-30257858

ABSTRACT

Speech is a critical form of human communication and is central to our daily lives. Yet, despite decades of study, an understanding of the fundamental neural control of speech production remains incomplete. Current theories model speech production as a hierarchy from sentences and phrases down to words, syllables, speech sounds (phonemes), and the actions of vocal tract articulators used to produce speech sounds (articulatory gestures). Here, we investigate the cortical representation of articulatory gestures and phonemes in ventral precentral and inferior frontal gyri in men and women. Our results indicate that ventral precentral cortex represents gestures to a greater extent than phonemes, while inferior frontal cortex represents both gestures and phonemes. These findings suggest that speech production shares a common cortical representation with that of other types of movement, such as arm and hand movements. This has important implications both for our understanding of speech production and for the design of brain-machine interfaces to restore communication to people who cannot speak.

SIGNIFICANCE STATEMENT: Despite being studied for decades, the production of speech by the brain is not fully understood. In particular, the most elemental parts of speech, speech sounds (phonemes) and the movements of vocal tract articulators used to produce these sounds (articulatory gestures), have both been hypothesized to be encoded in motor cortex. Using direct cortical recordings, we found evidence that primary motor and premotor cortices represent gestures to a greater extent than phonemes. Inferior frontal cortex (part of Broca's area) appears to represent both gestures and phonemes. These findings suggest that speech production shares a similar cortical organizational structure with the movement of other body parts.


Subjects
Brain Mapping/methods, Electrocorticography/methods, Frontal Lobe/physiology, Gestures, Prefrontal Cortex/physiology, Speech/physiology, Adult, Brain Mapping/instrumentation, Female, Humans, Male, Movement/physiology, Photic Stimulation/methods
2.
IEEE Trans Pattern Anal Mach Intell; 31(9): 1700-7, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19574628

ABSTRACT

We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DBN)-based models consisting of multiple sequences of hidden states, each corresponding to an articulatory feature (AF) such as lip opening (LO) or lip rounding (LR). A bank of discriminative articulatory feature classifiers provides input to the DBN, in the form of either virtual evidence (VE) (scaled likelihoods) or raw classifier margin outputs. We present experiments on two tasks, a medium-vocabulary word-ranking task and a small-vocabulary phrase recognition task. We show that articulatory feature-based models outperform baseline models, and we study several aspects of the models, such as the effects of allowing articulatory asynchrony, of using dictionary-based versus whole-word models, and of incorporating classifier outputs via virtual evidence versus alternative observation models.
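The virtual-evidence formulation described above can be sketched as follows. This is an illustrative reconstruction, not code from the paper: the function name, the three lip-opening states, and all numeric values are hypothetical, and it assumes virtual evidence is computed as classifier posteriors scaled by inverse class priors before being attached to the corresponding DBN state sequence.

```python
import math

def margins_to_virtual_evidence(margins, priors):
    """Convert raw classifier margins for one articulatory feature
    (e.g. lip opening) into scaled likelihoods, the 'virtual
    evidence' form a DBN observation model can consume.
    Both arguments are dicts keyed by AF value."""
    # A softmax over margins approximates the posteriors p(state | obs).
    exps = {k: math.exp(v) for k, v in margins.items()}
    z = sum(exps.values())
    posteriors = {k: e / z for k, e in exps.items()}
    # Dividing each posterior by its class prior yields a scaled
    # likelihood proportional to p(obs | state), usable as virtual
    # evidence on the DBN's hidden articulatory-feature state.
    return {k: posteriors[k] / priors[k] for k in margins}

# Hypothetical margins for three lip-opening states in one video frame.
ve = margins_to_virtual_evidence(
    {"closed": 2.0, "narrow": 0.5, "wide": -1.0},
    {"closed": 0.3, "narrow": 0.4, "wide": 0.3},
)
```

Because virtual evidence is only defined up to a constant factor per frame, the DBN's inference result is unchanged by any global rescaling of these values, which is what makes raw classifier outputs usable in place of true likelihoods.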


Subjects
Computer-Assisted Image Interpretation/methods, Lip/anatomy & histology, Lip/physiology, Lipreading, Biological Models, Speech Production Measurement/methods, Speech Recognition Software, Algorithms, Computer Simulation, Humans, Image Enhancement/methods, Anatomic Models, Automated Pattern Recognition/methods
3.
J Acoust Soc Am; 121(2): 723-42, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17348495

ABSTRACT

Although much is known about how speech is produced, and research into speech production has yielded measured articulatory data, feature systems of various kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech that cannot be easily analyzed from either the acoustic signal or a phonetic transcription alone. This article surveys a growing body of work in which such representations are used to improve automatic speech recognition.


Subjects
Phonation, Phonetics, Speech Acoustics, Speech Production Measurement, Speech Recognition Software, Bayes Theorem, Humans, Likelihood Functions, Markov Chains, Neural Networks (Computer), Semantics, Sound Spectrography, Speech Articulation Tests
4.
Article in English | MEDLINE | ID: mdl-19212454

ABSTRACT

Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multiframe acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
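The final rescoring step above, combining per-model log-probability scores via log-linear combination, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, model names, scores, and weights are all hypothetical, and in practice the weights would be tuned on held-out data rather than fixed by hand.

```python
import math

def log_linear_combine(scores, weights):
    """Combine per-model log-probability scores for one word
    hypothesis from the first-pass lattice into a single
    second-pass rescoring score. Both arguments are dicts
    keyed by model name."""
    return sum(weights[m] * scores[m] for m in scores)

# Hypothetical scores for one lattice word hypothesis: the
# first-pass recognizer's score plus a landmark-based
# pronunciation-model score, weighted log-linearly.
combined = log_linear_combine(
    {"first_pass": math.log(0.02), "pron_model": math.log(0.10)},
    {"first_pass": 1.0, "pron_model": 0.5},
)
```

Working in log space keeps the combination numerically stable and makes each weight an interpretable exponent on the corresponding model's probability.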
