Results 1 - 3 of 3
1.
J Acoust Soc Am ; 151(4): 2773, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35461490

ABSTRACT

Recognizing background information in human speech signals is useful in a wide range of practical applications, and many articles on background sound classification have been published. The task has not, however, been addressed for background sounds embedded in real-world human speech signals. This work therefore proposes a lightweight deep convolutional neural network (CNN) that uses spectrograms for efficient background sound classification in practical human speech signals. The proposed model classifies 11 background sounds embedded in human speech signals: airplane, airport, babble, car, drone, exhibition, helicopter, restaurant, station, street, and train. The model consists of four convolution layers, four max-pooling layers, and one fully connected layer, and it is tested on human speech signals with varying signal-to-noise ratios (SNRs). Using spectrograms, it achieves an overall background sound classification accuracy of 95.2% on human speech signals across a wide range of SNRs. The proposed model also outperforms the benchmark models in both accuracy and inference time when evaluated on edge computing devices.


Subject(s)
Neural Networks, Computer; Speech; Humans; Sound
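
To make the layer arrangement in the abstract above more concrete, here is a minimal Python sketch that traces tensor shapes and parameter counts through four convolution layers, four max-pooling layers, and one fully connected layer with 11 outputs. Only the layer counts and the number of classes come from the abstract. The input size, filter counts, kernel size, padding, and pooling size are assumptions chosen for illustration, so this is not the authors' model.

```python
# Hypothetical hyperparameters: the abstract gives only the layer counts
# and the 11 classes. Everything marked "assumed" is illustrative.
INPUT_SHAPE = (128, 128, 1)        # assumed spectrogram height, width, channels
CONV_FILTERS = (16, 32, 64, 128)   # assumed filters per convolution layer
KERNEL = 3                         # assumed 3x3 kernels with "same" padding
POOL = 2                           # assumed 2x2 max pooling with stride 2
NUM_CLASSES = 11                   # from the abstract


def trace_cnn(input_shape=INPUT_SHAPE, filters=CONV_FILTERS,
              kernel=KERNEL, pool=POOL, num_classes=NUM_CLASSES):
    """Return (layer summaries, total trainable parameter count)."""
    h, w, c = input_shape
    layers, total = [], 0
    for i, f in enumerate(filters, start=1):
        params = kernel * kernel * c * f + f   # weights plus biases
        total += params
        c = f                                  # "same" padding keeps h and w
        layers.append((f"conv{i}", (h, w, c), params))
        h, w = h // pool, w // pool            # max pooling has no parameters
        layers.append((f"pool{i}", (h, w, c), 0))
    flat = h * w * c                           # flattened feature vector length
    fc_params = flat * num_classes + num_classes
    total += fc_params
    layers.append(("fc", (num_classes,), fc_params))
    return layers, total


if __name__ == "__main__":
    layers, total = trace_cnn()
    for name, shape, params in layers:
        print(f"{name:6s} {str(shape):16s} {params:>8d}")
    print("total parameters:", total)
```

Under these assumed settings the single fully connected layer holds a large share of the parameters, which is one reason a lightweight design would typically pool aggressively before the classifier.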
2.
Data Brief ; 42: 108037, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35341036

ABSTRACT

This paper presents an update to the previously published low-resolution thermal imaging dataset. The new dataset contains high-resolution thermal images of various hand gestures captured using the FLIR Lepton 3.5 thermal camera and the Purethermal 2 breakout board. The camera resolution is 160 × 120, a calibrated array of 19,200 pixels. The images captured by the thermal camera are light-independent. The dataset consists of 14,400 images, split equally between color and grayscale, and covers 10 different hand gestures. Each gesture has 24 images per person, with 30 persons in the whole dataset. The dataset also contains images captured under different hand orientations and different lighting conditions.
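
The figures in this abstract can be cross-checked with a few lines of Python. The sketch below assumes that the 24 images per gesture per person count each capture once, and that every capture is stored in both a color and a grayscale rendering. The abstract does not state that interpretation, but it is the reading under which the totals agree.

```python
# Values taken from the abstract.
WIDTH, HEIGHT = 160, 120               # camera resolution
GESTURES = 10                          # distinct hand gestures
IMAGES_PER_GESTURE_PER_PERSON = 24
PERSONS = 30

# Assumption (not stated in the abstract): each capture is stored twice,
# once in color and once in grayscale.
RENDERINGS = 2

pixels = WIDTH * HEIGHT
captures = GESTURES * IMAGES_PER_GESTURE_PER_PERSON * PERSONS
total_images = captures * RENDERINGS

print("pixels per image:", pixels)
print("captures:", captures)
print("total images:", total_images)
```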

3.
Data Brief ; 41: 107977, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35242951

ABSTRACT

The dataset contains low-resolution thermal images of sign language digits formed by hand and captured using the Omron D6T thermal camera. The camera resolution is 32 × 32 pixels. Because of this low resolution, machine learning models for detecting and classifying sign language digits face additional challenges. Furthermore, the sensor's position and quality have a significant impact on the quality of the captured images, which is also affected by external factors such as the temperature of the surface relative to the temperature of the hand. The dataset consists of 3200 images corresponding to the ten sign digits, 0-9, so each digit is represented by 320 images collected from different persons. The hand is oriented in various ways so that the dataset captures these variations.
