A dual-region speech enhancement method based on voiceprint segmentation.
Neural Netw
; 180: 106683, 2024 Aug 31.
Article
em En
| MEDLINE
| ID: mdl-39255636
ABSTRACT
Single-channel speech enhancement primarily relies on deep learning models to recover clean speech signals from noise-contaminated speech. These models establish a mapping relationship between noisy and clean speech. However, considering the sparse distribution characteristics of speech energy across the entire time-frequency spectrogram, constructing the mapping relationship from noisy to clean speech exhibits significant differences in regions where speech energy is concentrated and non-concentrated. Utilizing one deep model to simultaneously address these two distinct regression tasks increases the complexity of the mapping relationships, consequently restricting the model's performance. To validate our hypothesis, we propose a dual-region speech enhancement model based on voiceprint region segmentation. Specifically, we first train a voiceprint segmentation model to classify noisy speech into two regions. Subsequently, we establish dedicated speech enhancement models for each region, with the dual-region models concurrently constructing mapping relationships for noise-corrupted speech to clean speech in distinct regions. Finally, by merging the results, the complete restored speech can be obtained. Experimental results on public datasets demonstrate that our method achieves competitive speech enhancement performance, outperforming the state-of-the-art. Ablation study results confirm the effectiveness of the proposed approach in enhancing model performance.
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Idioma:
En
Revista:
Neural Netw
Assunto da revista:
NEUROLOGIA
Ano de publicação:
2024
Tipo de documento:
Article
País de afiliação:
China