A dual-region speech enhancement method based on voiceprint segmentation.

Li, Yang; Zhang, Wei-Tao; Lou, Shun-Tian

Affiliation

Li Y; School of Electronic Engineering, Xidian University, Xi'an 710071, China.
Zhang WT; School of Electronic Engineering, Xidian University, Xi'an 710071, China. Electronic address: zhwt-work@foxmail.com.
Lou ST; School of Electronic Engineering, Xidian University, Xi'an 710071, China.

Neural Netw; 180: 106683, 2024 Aug 31.

Article in English | MEDLINE | ID: mdl-39255636

ABSTRACT

Single-channel speech enhancement primarily relies on deep learning models to recover clean speech signals from noise-contaminated speech. These models learn a mapping from noisy to clean speech. However, because speech energy is sparsely distributed across the time-frequency spectrogram, this mapping differs significantly between regions where speech energy is concentrated and regions where it is not. Using a single deep model to address these two distinct regression tasks simultaneously increases the complexity of the mapping it must learn and consequently limits the model's performance. To validate this hypothesis, we propose a dual-region speech enhancement model based on voiceprint region segmentation. Specifically, we first train a voiceprint segmentation model to divide noisy speech into two regions. We then build a dedicated speech enhancement model for each region, so that the two models concurrently learn the mapping from noise-corrupted to clean speech within their respective regions. Finally, the complete restored speech is obtained by merging their outputs. Experimental results on public datasets demonstrate that our method achieves competitive speech enhancement performance, outperforming state-of-the-art methods. Ablation study results confirm the effectiveness of the proposed approach in improving model performance.
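The abstract describes a three-step pipeline: segment the noisy spectrogram into two regions, enhance each region with its own model, and merge the outputs. The sketch below illustrates only that data flow, in plain Python, and is not the authors' implementation. As loudly labelled assumptions, the trained voiceprint segmentation model is replaced by a hypothetical energy-threshold rule (`segment_regions`, `threshold_ratio`), and the two deep enhancement models are replaced by a hypothetical fixed-gain placeholder (`scale`); all names are illustrative.

```python
# Illustrative sketch only: NOT the authors' implementation.
# The trained voiceprint segmentation model is replaced by a hypothetical
# energy threshold, and the two deep enhancement models by fixed gains.
from typing import Callable, List

Spectrogram = List[List[float]]  # magnitudes indexed as [frame][frequency_bin]
Mask = List[List[bool]]          # True = speech-concentrated bin


def segment_regions(noisy: Spectrogram, threshold_ratio: float = 0.1) -> Mask:
    """Hypothetical stand-in for the segmentation model: a bin is labelled
    speech-concentrated if its magnitude exceeds threshold_ratio * peak."""
    peak = max(max(frame) for frame in noisy)
    cutoff = threshold_ratio * peak
    return [[value > cutoff for value in frame] for frame in noisy]


def scale(spec: Spectrogram, gain: float) -> Spectrogram:
    """Hypothetical placeholder 'enhancer' that multiplies every bin by a gain."""
    return [[gain * value for value in frame] for frame in spec]


def dual_region_enhance(
    noisy: Spectrogram,
    mask: Mask,
    enhance_speech: Callable[[Spectrogram], Spectrogram],
    enhance_background: Callable[[Spectrogram], Spectrogram],
) -> Spectrogram:
    """Run one enhancer per region, then merge bin by bin using the mask."""
    speech_out = enhance_speech(noisy)
    background_out = enhance_background(noisy)
    return [
        [s if m else b for s, b, m in zip(s_row, b_row, m_row)]
        for s_row, b_row, m_row in zip(speech_out, background_out, mask)
    ]


# Usage: a toy 2-frame x 3-bin magnitude spectrogram.
noisy = [
    [0.05, 2.0, 0.10],
    [0.02, 1.5, 0.80],
]
mask = segment_regions(noisy, threshold_ratio=0.25)
# mask == [[False, True, False], [False, True, True]]
enhanced = dual_region_enhance(
    noisy,
    mask,
    enhance_speech=lambda spec: scale(spec, 0.9),      # light attenuation in speech bins
    enhance_background=lambda spec: scale(spec, 0.1),  # strong attenuation elsewhere
)
print(enhanced)
```

In this sketch, merging by a binary mask means every time-frequency bin is taken from exactly one regional model, which mirrors the idea that each model only needs to handle the mapping for its own region. A real system would replace both placeholders with trained networks.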

Keywords

Deep learning; Regression task; Speech enhancement; Voiceprint segmentation

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Language: English | Journal: Neural Netw | Journal subject: Neurology | Publication year: 2024 | Document type: Article | Country of affiliation: China