Search | BVS Regional Portal

Bridging auditory perception and natural language processing with semantically informed deep neural networks.

Esposito, Michele; Valente, Giancarlo; Plasencia-Calaña, Yenisel; Dumontier, Michel; Giordano, Bruno L; Formisano, Elia.

Sci Rep; 14(1): 20994, 2024-09-09.

Article in English | MEDLINE | ID: mdl-39251659

ABSTRACT

Sound recognition is effortless for humans but poses a significant challenge for artificial hearing systems. Deep neural networks (DNNs), especially convolutional neural networks (CNNs), have recently surpassed traditional machine learning in sound classification. However, current DNNs map sounds to labels using binary categorical variables, neglecting the semantic relations between labels. Cognitive neuroscience research suggests that human listeners exploit such semantic information in addition to acoustic cues. Hence, our hypothesis is that incorporating semantic information improves DNNs' sound recognition performance, emulating human behaviour. In our approach, sound recognition is framed as a regression problem, with CNNs trained to map spectrograms to continuous semantic representations from NLP models (Word2Vec, BERT, and CLAP text encoder). Two DNN types were trained: semDNN with continuous embeddings and catDNN with categorical labels, both with a dataset extracted from a collection of 388,211 sounds enriched with semantic descriptions. Evaluations across four external datasets confirmed the superiority of semantic labeling from semDNN compared to catDNN, preserving higher-level relations. Importantly, an analysis of human similarity ratings for natural sounds showed that semDNN approximated human listener behaviour better than catDNN, other DNNs, and NLP models. Our work contributes to understanding the role of semantics in sound recognition, bridging the gap between artificial systems and human auditory perception.
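
The contrast between categorical and semantic targets described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three labels, the 3-dimensional embedding values, and the helper names (`one_hot`, `cosine`, `nearest_label`) are hypothetical and invented for illustration, whereas the actual models use embeddings from Word2Vec, BERT, or the CLAP text encoder. The sketch only shows that one-hot targets treat all distinct labels as equally unrelated, while continuous embeddings can encode that some labels are closer than others.

```python
import math

# Hypothetical toy label embeddings (assumption: values invented for
# illustration; real embeddings would come from an NLP model).
LABEL_EMBEDDINGS = {
    "dog bark": [0.9, 0.1, 0.0],
    "cat meow": [0.8, 0.3, 0.1],
    "car engine": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def one_hot(label, labels):
    """Categorical target: 1.0 at the label's position, 0.0 elsewhere."""
    return [1.0 if name == label else 0.0 for name in labels]


def nearest_label(predicted, embeddings):
    """Decode a predicted embedding to the most similar label embedding."""
    return max(embeddings, key=lambda name: cosine(predicted, embeddings[name]))


labels = list(LABEL_EMBEDDINGS)

# Categorical targets: any two different labels are orthogonal.
cat_sim = cosine(one_hot("dog bark", labels), one_hot("cat meow", labels))

# Semantic targets: related labels lie closer together than unrelated ones.
sem_sim_animals = cosine(LABEL_EMBEDDINGS["dog bark"], LABEL_EMBEDDINGS["cat meow"])
sem_sim_dog_car = cosine(LABEL_EMBEDDINGS["dog bark"], LABEL_EMBEDDINGS["car engine"])

# A hypothetical regression output is mapped back to a label by similarity.
decoded = nearest_label([0.9, 0.05, 0.0], LABEL_EMBEDDINGS)

print(cat_sim, sem_sim_animals, sem_sim_dog_car, decoded)
```

Decoding by nearest label embedding is one plausible way to turn a regression output into a label; the abstract does not state which decoding rule or loss function the study used.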

Subjects

Auditory Perception; Natural Language Processing; Neural Networks, Computer; Semantics; Humans; Auditory Perception/physiology; Deep Learning; Sound

What do we mean with sound semantics, exactly? A survey of taxonomies and ontologies of everyday sounds.

Giordano, Bruno L; de Miranda Azevedo, Ricardo; Plasencia-Calaña, Yenisel; Formisano, Elia; Dumontier, Michel.

Front Psychol; 13: 964209, 2022.

Article in English | MEDLINE | ID: mdl-36312201

ABSTRACT

Taxonomies and ontologies for the characterization of everyday sounds have been developed in several research fields, including auditory cognition, soundscape research, artificial hearing, sound design, and medicine. Here, we surveyed 36 such knowledge organization systems, which we identified through a systematic literature search. To evaluate the semantic domains covered by these systems within a homogeneous framework, we introduced a comprehensive set of verbal sound descriptors (sound source properties; attributes of sensation; sound signal descriptors; onomatopoeias; music genres), which we used to manually label the surveyed descriptor classes. We reveal that most taxonomies and ontologies were developed to characterize higher-level semantic relations between sound sources in terms of the sound-generating objects and actions involved (what/how), or in terms of the environmental context (where). This indicates the current lack of a comprehensive ontology of everyday sounds that simultaneously covers all semantic aspects of the relation between sounds. Such an ontology may have a wide range of applications and purposes, ranging from extending our scientific knowledge of auditory processes in the real world to developing artificial hearing systems.
