ABSTRACT
Packet loss concealment (PLC) aims to mitigate speech impairments caused by packet losses so as to improve speech perceptual quality. This paper proposes an end-to-end PLC algorithm with a time-frequency hybrid generative adversarial network, which incorporates a dilated residual convolution and the integration of a time-domain discriminator and a frequency-domain discriminator into a convolutional encoder-decoder architecture. The dilated residual convolution is employed to aggregate the short-term and long-term context information of lost speech frames through two network receptive fields with different dilation rates, and the integrated time-frequency discriminators are proposed to learn multi-resolution time-frequency features from correctly received speech frames with both time-domain waveforms and frequency-domain complex spectra. Both causal and noncausal strategies are proposed for the packet-loss problem, which can effectively reduce the transitional distortion caused by lost speech frames while significantly reducing the number of training parameters and the computational complexity. The experimental results show that the proposed method achieves better performance in terms of three objective measures: the signal-to-noise ratio, perceptual evaluation of speech quality, and short-time objective intelligibility. The results of the subjective listening test further confirm the improvement in speech perceptual quality.
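The abstract does not give the network's kernel sizes or dilation rates, but the core idea of covering short-term and long-term context with two different receptive fields can be illustrated with a small receptive-field calculation for stacked dilated 1-D convolutions. The kernel size and dilation rates below are illustrative assumptions, not the paper's actual values:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples/frames) of a stack of dilated
    1-D convolutions applied one after another."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Two illustrative branches: constant (small) dilation rates capture
# short-term context, while exponentially growing rates reach much
# further back with the same number of layers.
short_branch = receptive_field(3, [1, 1, 1, 1])      # covers 9 frames
long_branch = receptive_field(3, [1, 2, 4, 8, 16])   # covers 63 frames
```

The contrast between the two branches shows why different dilation rates, rather than simply deeper stacks, are an economical way to aggregate context around a lost frame.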
Subjects
Algorithms, Speech, Auditory Perception, Signal-to-Noise Ratio
ABSTRACT
Traditional stereophonic acoustic echo cancellation algorithms need to estimate the acoustic echo paths from the stereo loudspeakers to a microphone, which often suffers from the nonuniqueness problem caused by the high correlation between the two far-end signals of these stereo loudspeakers. Many decorrelation methods have been proposed to mitigate this problem; however, they may reduce the audio quality and/or the stereophonic spatial perception. This paper proposes to use a convolutional recurrent network (CRN) to suppress the stereophonic echo components by estimating a nonlinear gain, which is then multiplied by the complex spectrum of the microphone signal to obtain the estimated near-end speech without any decorrelation procedure. The CRN includes an encoder-decoder module and a two-layer gated recurrent network module, which can simultaneously take advantage of the feature extraction capability of convolutional neural networks and the temporal modeling capability of recurrent neural networks. The magnitude spectra of the two far-end signals are used directly as input features without any decorrelation preprocessing, and thus both the audio quality and the stereophonic spatial perception can be maintained. The experimental results in both simulated and real acoustic environments show that the proposed algorithm outperforms traditional algorithms such as the normalized least-mean-square (NLMS) and Wiener algorithms, especially in situations of low signal-to-echo ratio and high reverberation time RT60.
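The gain-multiplication step described above is a standard time-frequency masking operation, which can be sketched in a few lines of NumPy. The spectrogram dimensions and the random gain below are placeholders; in the paper the gain would come from the trained CRN:

```python
import numpy as np

def apply_gain(mic_spec, gain):
    """Estimate near-end speech by multiplying the microphone's
    complex spectrum by a real-valued gain in [0, 1], one value
    per time-frequency bin. The gain here is a placeholder for
    the CRN's output."""
    return np.clip(gain, 0.0, 1.0) * mic_spec

rng = np.random.default_rng(0)
F, T = 161, 100  # frequency bins x frames (illustrative sizes)
mic_spec = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
gain = rng.uniform(0.0, 1.0, (F, T))  # stand-in for the network output
near_end_est = apply_gain(mic_spec, gain)
```

Because the gain is real-valued and bounded by 1, the operation attenuates each bin's magnitude without altering its phase, which is what lets the method skip decorrelation entirely.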
Subjects
Deep Learning, Acoustics, Algorithms, Least-Squares Analysis, Neural Networks (Computer)
ABSTRACT
Previous studies have shown the importance of applying power compression to both the feature and the target when only the magnitude is considered in the dereverberation task. When both the real and imaginary components are estimated without power compression, it has been shown to be important to take the magnitude constraint into account. In this paper, both power compression and phase estimation are considered to show their equal importance in the dereverberation task, and we propose to reconstruct the compressed real and imaginary components (cRI) for training. Both objective and subjective results reveal that better dereverberation can be achieved when using cRI.
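The cRI construction described above amounts to compressing the magnitude of a complex spectrum while keeping its phase, then splitting the result into real and imaginary parts. A minimal sketch follows; the compression exponent p=0.5 is a common choice in the power-compression literature and is assumed here for illustration, not taken from the paper:

```python
import numpy as np

def compress_cRI(spec, p=0.5):
    """Power-compress a complex spectrum: raise the magnitude to the
    power p while keeping the phase unchanged, then return the
    compressed real and imaginary parts (a cRI-style training target)."""
    mag = np.abs(spec)
    phase = np.angle(spec)
    compressed = (mag ** p) * np.exp(1j * phase)
    return compressed.real, compressed.imag

spec = np.array([3 + 4j])  # magnitude 5, phase atan2(4, 3)
r, i = compress_cRI(spec, p=0.5)
# the compressed magnitude is 5**0.5; the phase is unchanged
```

Compressing the magnitude reduces the dynamic range that the network must model, while keeping the real/imaginary representation preserves the phase information that magnitude-only approaches discard.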