Automatic recognition of giant panda vocalizations using wide spectrum features and deep neural network.

Liao, Zhiwu; Hu, Shaoxiang; Hou, Rong; Liu, Meiling; Xu, Ping; Zhang, Zhihe; Chen, Peng

Affiliations

Liao Z; Key Laboratory of Land Resources Evaluation and Monitoring in Southwest China, Ministry of Education, Sichuan Normal University, Chengdu, China.
Hu S; Academy of Global Governance and Area Studies, Sichuan Normal University, Chengdu, China.
Hou R; School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China.
Liu M; Chengdu Research Base of Giant Panda Breeding, Sichuan Key Laboratory of Conservation Biology for Endangered Wildlife, Chengdu 610081, China.
Xu P; Giant Panda National Park Chengdu Administration, Chengdu 610096, China.
Zhang Z; Giant Panda National Park Chengdu Administration, Chengdu 610096, China.
Chen P; Giant Panda National Park Chengdu Administration, Chengdu 610096, China.

Math Biosci Eng; 20(8): 15456-15475, 2023 07 24.

Article in En | MEDLINE | ID: mdl-37679187

ABSTRACT

This study presents an automatic vocalization recognition system for giant pandas (GPs). Over 12,800 vocal samples of GPs were recorded at the Chengdu Research Base of Giant Panda Breeding (CRBGPB) and labeled by CRBGPB animal husbandry staff. The samples were divided into 16 categories of 800 samples each. A novel deep neural network (DNN), named 3Fbank-GRU, is proposed to label GP vocalizations automatically. Existing human vocalization recognition frameworks based on the Mel filter bank (Fbank) use only the low-frequency features of the voice. In contrast, we extracted high-, medium- and low-frequency features using Fbank together with two filter banks we derived ourselves, the Medium Mel Filter bank (MFbank) and the Reversed Mel Filter bank (RFbank). The three sets of frequency features were fed into 3Fbank-GRU for training and testing. With models trained on the datasets labeled by CRBGPB animal husbandry staff and then tested on recognition tasks, the proposed method achieved a recognition accuracy of over 95%, which indicates that the system can be used to accurately label large data sets of GP vocalizations collected by camera traps or other recording methods.
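
As an illustration of the filter-bank idea described in the abstract, the following Python sketch builds a standard triangular Mel filter bank with NumPy, plus a frequency-mirrored copy of it. The mirroring is only an assumption about what a "reversed" Mel filter bank could look like. The abstract does not define MFbank or RFbank, so no medium-frequency bank is attempted here. The function names, the 16 kHz sample rate, the 512-point FFT and the 26 filters are all hypothetical choices, not values taken from the paper.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale (common 2595*log10 form)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale.
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # A triangle is the lower of the two slopes, floored at zero.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def reversed_filter_bank(bank):
    """ASSUMPTION: mirror the bank along the frequency axis (and keep the
    filters ordered from low to high), so resolution is finest at the top
    of the spectrum. The paper's RFbank may be defined differently."""
    return bank[::-1, ::-1]


def log_filter_bank_features(power_spectrum, bank):
    """Log filter-bank energies for frames of shape (n_frames, n_bins)."""
    return np.log(power_spectrum @ bank.T + 1e-10)


# Hypothetical settings, chosen only for illustration.
fbank = mel_filter_bank(n_filters=26, n_fft=512, sample_rate=16000)
rfbank = reversed_filter_bank(fbank)
```

Under these assumptions, `fbank` places most of its filters in the lower half of the spectrum and `rfbank` places most of them in the upper half. Features from several such banks could then be computed per frame with `log_filter_bank_features` and passed to a recurrent model such as a GRU.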

Subjects

Ursidae; Animals; Humans; Animal Husbandry; Neural Networks, Computer

Keywords

3Fbank-GRU; Mel filter bank (Fbank); deep neural network (DNN); gated recurrent unit (GRU); medium Mel filter bank (MFbank); reversed Mel filter bank (RFbank); vocalization recognition

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Ursidae Limits: Animals / Humans Language: En Year of publication: 2023 Document type: Article
