Search | Biblioteca Virtual em Saúde (Virtual Health Library)

Future Speech Interfaces with Sensors and Machine Intelligence.

Denby, Bruce; Csapó, Tamás Gábor; Wand, Michael.

Sensors (Basel) ; 23(4)2023 Feb 10.

Article in English | MEDLINE | ID: mdl-36850569

ABSTRACT

Speech is the most spontaneous and natural means of communication. Speech is also becoming the preferred modality for interacting with mobile or fixed electronic devices. However, speech interfaces have drawbacks, including a lack of user privacy; non-inclusivity for certain users; poor robustness in noisy conditions; and the difficulty of creating complex man-machine interfaces. To help address these problems, the Special Issue "Future Speech Interfaces with Sensors and Machine Intelligence" assembles eleven contributions covering multimodal and silent speech interfaces; lip reading applications; novel sensors for speech interfaces; and enhanced speech inclusivity tools for future speech interfaces. Short summaries of the articles are presented, followed by an overall evaluation. The success of this Special Issue has led to its being re-issued as "Future Speech Interfaces with Sensors and Machine Intelligence-II" with a deadline in March of 2023.

Subjects

Communication; Speech; Humans; Artificial Intelligence; Electronics; Privacy

Optimizing the Ultrasound Tongue Image Representation for Residual Network-Based Articulatory-to-Acoustic Mapping.

Csapó, Tamás Gábor; Gosztolya, Gábor; Tóth, László; Shandiz, Amin Honarmandi; Markó, Alexandra.

Sensors (Basel) ; 22(22)2022 Nov 08.

Article in English | MEDLINE | ID: mdl-36433196

ABSTRACT

Within speech processing, articulatory-to-acoustic mapping (AAM) methods can use ultrasound tongue imaging (UTI) as an input. (Micro)convex transducers are mostly used, which provide a wedge-shaped image. However, this image is optimized for visual inspection by the human eye, and the signal is often post-processed by the equipment. With newer ultrasound equipment, it is now possible to access the raw scanline data (i.e., the ultrasound echo return) without any internal post-processing. In this study, we compared the raw scanline representation with the wedge-shaped processed UTI as the input for the residual network applied for AAM, and we also investigated the optimal size of the input image. We found no significant differences between the performance attained using the raw data and the wedge-shaped image extrapolated from it. We found the optimal image size to be 64 × 43 pixels for the raw scanline input, and 64 × 64 pixels when transformed to a wedge. Therefore, it is not necessary to use the full original 64 × 842 pixel raw scanline; a smaller image is enough. This allows smaller networks to be built, and will benefit the development of session- and speaker-independent methods for practical applications. The target application of AAM systems is a "silent speech interface", which could help the speaking-impaired communicate and could be useful in military applications or in extremely noisy conditions.
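
As an illustration of the size reduction mentioned in the abstract above, the following minimal sketch shrinks a 64 × 842 raw scanline frame to 64 × 43. It is not the authors' code. The function name `resize_scanlines`, the use of linear interpolation, and the reading of 64 × 842 as 64 scanlines of 842 echo samples each are all assumptions, since the abstract does not say how the images were resized. The input frame is random placeholder data, not ultrasound data.

```python
import numpy as np


def resize_scanlines(raw, target_depth):
    """Resample each scanline (row) to target_depth samples.

    Hypothetical helper: linear interpolation along the depth axis is an
    assumption, not the resizing method reported in the paper.
    """
    raw = np.asarray(raw, dtype=np.float64)
    n_lines, n_samples = raw.shape
    # Normalised sample positions of the original and the resized scanline.
    src = np.linspace(0.0, 1.0, n_samples)
    dst = np.linspace(0.0, 1.0, target_depth)
    # Interpolate every scanline independently and stack the results.
    return np.stack([np.interp(dst, src, line) for line in raw])


# Placeholder frame: 64 scanlines x 842 echo samples of random intensities.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 842)).astype(np.float64)

small = resize_scanlines(frame, 43)
print(small.shape)  # (64, 43)
```

Plain linear interpolation discards most samples without low-pass filtering, so a real pipeline might prefer an anti-aliased resize. The sketch only shows the change in input dimensions that makes a smaller network possible.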

Subjects

Acoustics; Tongue; Humans; Tongue/diagnostic imaging; Ultrasonography; Speech; Noise

Convolutional neural network-based automatic classification of midsagittal tongue gestural targets using B-mode ultrasound images.

Xu, Kele; Roussel, Pierre; Csapó, Tamás Gábor; Denby, Bruce.

J Acoust Soc Am ; 141(6): EL531, 2017 06.

Article in English | MEDLINE | ID: mdl-28618815

ABSTRACT

Tongue gestural target classification is of great interest to researchers in the speech production field. Recently, deep convolutional neural networks (CNN) have shown superiority to standard feature extraction techniques in a variety of domains. In this letter, both CNN-based speaker-dependent and speaker-independent tongue gestural target classification experiments are conducted to classify tongue gestures during natural speech production. The CNN-based method achieves state-of-the-art performance, even though no pre-training of the CNN (with the exception of a data augmentation preprocessing) was carried out.

Subjects

Gestures; Neural Networks, Computer; Signal Processing, Computer-Assisted; Speech Acoustics; Tongue/diagnostic imaging; Tongue/physiology; Ultrasonography/methods; Voice Quality; Biomechanical Phenomena; Deep Learning; Female; Humans; Male; Pattern Recognition, Automated

Concept and Pictogram-Based User-Interface Design of a Helper Tool for People with Aphasia.

Mayer, Peter; Werner, Katharina; Al-Radhi, Mohammed; Csapo, Tamas Gabor; Czeba, Bálint; Nemeth, Géza; Rocha, Ana Patrícia; Oliveira, Ilídio C; Silva, Samuel; Szeker, Melinda; Teixeira, António; Panek, Paul.

Stud Health Technol Inform ; 301: 77-82, 2023 May 02.

Article in English | MEDLINE | ID: mdl-37172157

ABSTRACT

BACKGROUND: Aphasia describes the lack of the already gained ability to use language in a common way. "Language" here covers all variations of forming or understanding messages. OBJECTIVES: The APH-Alarm project aims to develop a service concept that provides alternative communication options for people with Aphasia to trigger timely help when needed. It considers that a typical user may not be familiar with modern technologies and offers several simple and intuitive options. METHODS: The approach is based on event detection of gestures (during daytime or in bed), movement pattern recognition in bed, and an easy-to-use pictogram-based smartphone app. RESULTS: Agile evaluation of the smartphone app showed a promising outcome. CONCLUSION: The idea of a versatile and comprehensive solution for aphasic people to easily contact private or public helpers based on their actions or automatic detection is promising and will be further investigated in an upcoming field trial.

Subjects

Aphasia; Communication Aids for Disabled; Mobile Applications; Humans; Language; Gestures
