Búsqueda | Biblioteca Virtual en Salud

Sound Event Localization and Detection Using Imbalanced Real and Synthetic Data via Multi-Generator.

Shin, Yeongseo; Chun, Chanjun.

Sensors (Basel) ; 23(7)2023 Mar 23.

Artículo en Inglés | MEDLINE | ID: mdl-37050458

RESUMEN

This study proposes a sound event localization and detection (SELD) method using imbalanced real and synthetic data via a multi-generator. The proposed method is based on a residual convolutional neural network (RCNN) and a transformer encoder for real spatial sound scenes. SELD aims to classify the sound event, detect the onset and offset of the classified event, and estimate the direction of the sound event. In Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Task 3, SELD is performed with a few real spatial sound scene data and a relatively large number of synthetic data. When a model is trained using imbalanced data, it can proceed by focusing only on a larger number of data. Thus, a multi-generator that samples real and synthetic data at a specific rate in one batch is proposed to prevent this problem. We applied the data augmentation technique SpecAugment and used time-frequency masking to the dataset. Furthermore, we propose a neural network architecture to apply the RCNN and transformer encoder. Several models were trained with various structures and hyperparameters, and several ensemble models were obtained by "cherry-picking" specific models. Based on the experiment, the single model of the proposed method and the model applied with the ensemble exhibited improved performance compared with the baseline model.

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

Detalles de la búsqueda