CST: Complex Sparse Transformer for Low-SNR Speech Enhancement.

Tan, Kaijun; Mao, Wenyu; Guo, Xiaozhou; Lu, Huaxiang; Zhang, Chi; Cao, Zhanzhong; Wang, Xingang

Affiliations

Tan K; Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China.
Mao W; University of Chinese Academy of Sciences, Beijing 100089, China.
Guo X; Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China.
Lu H; Chinese Association of Artificial Intelligence, Beijing 100876, China.
Zhang C; Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China.
Cao Z; University of Chinese Academy of Sciences, Beijing 100089, China.
Wang X; Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China.

Sensors (Basel); 23(5), 2023 Feb 21.

Article in En | MEDLINE | ID: mdl-36904579

ABSTRACT

Speech enhancement for audio with a low signal-to-noise ratio (SNR) is challenging. Existing speech enhancement methods are mainly designed for high-SNR audio, and they usually use RNNs to model audio sequence features, which leaves the model unable to learn long-distance dependencies and limits its performance in low-SNR speech enhancement tasks. We design a complex transformer module with sparse attention to overcome this problem. Unlike the traditional transformer model, this model is extended to effectively model complex-domain sequences. It uses a sparse attention mask to balance the model's attention between long-distance and nearby relations, introduces a pre-layer positional embedding module to enhance the model's perception of position information, and adds a channel attention module that lets the model dynamically adjust the weight distribution between channels according to the input audio. The experimental results show that, in low-SNR speech enhancement tests, our models achieve noticeable improvements in both speech quality and intelligibility.
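The abstract does not specify the layout of the sparse attention mask, so the following is only a minimal sketch of the general idea of mixing nearby and long-distance attention. The function names, the `window` and `stride` parameters, and the pattern itself (a local band plus every `stride`-th position) are assumptions made for illustration, not the authors' design, and the scores are real-valued rather than complex.

```python
import math

def sparse_attention_mask(seq_len, window=2, stride=4):
    """Return a seq_len x seq_len boolean mask (True = may attend).

    Hypothetical pattern (an assumption, not taken from the paper):
    each position sees neighbours within `window` frames, plus every
    `stride`-th frame for long-distance context.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            nearby = abs(i - j) <= window   # local band around the diagonal
            distant = j % stride == 0       # coarse, evenly spaced anchors
            row.append(nearby or distant)
        mask.append(row)
    return mask

def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, ignoring masked-out positions."""
    kept = [s for s, m in zip(scores, mask_row) if m]
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

# Example: 8 frames, each attending to its direct neighbours and frames 0 and 4.
mask = sparse_attention_mask(8, window=1, stride=4)
weights = masked_softmax([0.0] * 8, mask[2])
```

With this pattern, frame 2 attends to frames 0 to 4 only, so the attention weight is shared among five positions instead of all eight. A denser or sparser trade-off follows from changing `window` and `stride`.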

Subjects

Speech Perception; Speech; Cognition; Learning

Keywords

UNET architecture; attention mechanisms; speech enhancement; transformer

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Speech / Speech Perception | Language: En | Publication year: 2023 | Document type: Article