Búsqueda | Portal Regional de la BVS

DGSD: Dynamical graph self-distillation for EEG-based auditory spatial attention detection.

Fan, Cunhang; Zhang, Hongyu; Huang, Wei; Xue, Jun; Tao, Jianhua; Yi, Jiangyan; Lv, Zhao; Wu, Xiaopei.

Neural Netw ; 179: 106580, 2024 Jul 26.

Artículo en Inglés | MEDLINE | ID: mdl-39096751

RESUMEN

Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data like images. This makes it challenging to handle EEG signals, which possess non-Euclidean characteristics. In order to address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input. Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals. In addition, to further improve AAD detection performance, self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer, is integrated. These strategies leverage features and classification results from the deepest network layers to guide the learning of shallow layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. Under a 1-second time window, we achieve results of 90.0% and 79.6% accuracy on KUL and DTU, respectively. We compare our DGSD method with competitive baselines, and the experimental results indicate that the detection performance of our proposed DGSD method is not only superior to the best reproducible baseline but also significantly reduces the number of trainable parameters by approximately 100 times.

Spatial reconstructed local attention Res2Net with F0 subband for fake speech detection.

Fan, Cunhang; Xue, Jun; Tao, Jianhua; Yi, Jiangyan; Wang, Chenglong; Zheng, Chengshi; Lv, Zhao.

Neural Netw ; 175: 106320, 2024 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-38640696

RESUMEN

The rhythm of bonafide speech is often difficult to replicate, which causes that the fundamental frequency (F0) of synthetic speech is significantly different from that of real speech. It is expected that the F0 feature contains the discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to effectively model the F0 subband so as to improve the performance of FSD, the spatial reconstructed local attention Res2Net (SR-LA Res2Net) is proposed. Specifically, Res2Net is used as a backbone network to obtain multiscale information, and enhanced with a spatial reconstruction mechanism to avoid losing important information when the channel group is constantly superimposed. In addition, local attention is designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof 2019 LA dataset show that our proposed method obtains an equal error rate (EER) of 0.47% and a minimum tandem detection cost function (min t-DCF) of 0.0159, achieving the state-of-the-art performance among all of the single systems.

Asunto(s)

Redes Neurales de la Computación , Humanos , Habla , Atención/fisiología , Algoritmos

RESUMEN

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA