Surface Electromyography-Based Recognition, Synthesis, and Perception of Prosodic Subvocal Speech.

Vojtech, Jennifer M; Chan, Michael D; Shiwani, Bhawna; Roy, Serge H; Heaton, James T; Meltzner, Geoffrey S; Contessa, Paola; De Luca, Gianluca; Patel, Rupal; Kline, Joshua C

Affiliation

Vojtech JM; Delsys/Altec, Inc., Natick, MA.
Chan MD; Boston University, MA.
Shiwani B; Delsys/Altec, Inc., Natick, MA.
Roy SH; Delsys/Altec, Inc., Natick, MA.
Heaton JT; Delsys/Altec, Inc., Natick, MA.
Meltzner GS; Massachusetts General Hospital Department of Surgery, Boston.
Contessa P; VocaliD, Inc., Belmont, MA.
De Luca G; Delsys/Altec, Inc., Natick, MA.
Patel R; Delsys/Altec, Inc., Natick, MA.
Kline JC; VocaliD, Inc., Belmont, MA.

J Speech Lang Hear Res; 64(6S): 2134-2153, 2021 Jun 18.

Article in En | MEDLINE | ID: mdl-33979177

ABSTRACT

Purpose: This study aimed to evaluate a novel communication system designed to translate surface electromyographic (sEMG) signals from articulatory muscles into speech using a personalized, digital voice. The system was evaluated for word recognition, prosodic classification, and listener perception of synthesized speech.

Method: sEMG signals were recorded from the face and neck as speakers with (n = 4) and without (n = 4) laryngectomy subvocally recited (silently mouthed) a speech corpus comprising 750 phrases (150 phrases with variable phrase-level stress). Corpus tokens were then translated into speech via personalized voice synthesis (n = 8 synthetic voices) and compared against phrases produced by each speaker when using their typical mode of communication (n = 4 natural voices, n = 4 electrolaryngeal [EL] voices). Naïve listeners (n = 12) evaluated synthetic, natural, and EL speech for acceptability and intelligibility in a visual sort-and-rate task, as well as phrasal stress discriminability via a classification mechanism.

Results: Recorded sEMG signals were processed to translate sEMG muscle activity into lexical content and to categorize variations in phrase-level stress, achieving mean accuracies of 96.3% (SD = 3.10%) and 91.2% (SD = 4.46%), respectively. Synthetic speech was rated significantly higher in acceptability and intelligibility than EL speech and also led to greater phrasal stress classification accuracy, whereas natural speech was rated as the most acceptable and intelligible, with the greatest phrasal stress classification accuracy.

Conclusion: This proof-of-concept study establishes the feasibility of using subvocal sEMG-based alternative communication not only for lexical recognition but also for prosodic communication in healthy individuals, as well as those living with vocal impairments and residual articulatory function.

Supplemental Material: https://doi.org/10.23641/asha.14558481

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Speech Perception / Voice | Limits: Humans | Language: En | Journal: J Speech Lang Hear Res | Journal subject: Audiology / Speech-Language Pathology | Year: 2021 | Document type: Article | Affiliation country: United States