ABSTRACT
Emotion recognition through speech is a technique employed in various scenarios of Human-Computer Interaction (HCI). Existing approaches have achieved significant results; however, limitations persist, the most notable being the quantity and diversity of data available when deep learning techniques are used. The lack of a standard for feature selection leads to continuous development and experimentation, and choosing and designing an appropriate network architecture constitutes another challenge. This study addresses the challenge of recognizing emotions in the human voice using deep learning techniques, proposing a comprehensive approach that develops preprocessing and feature selection stages and constructs a dataset, EmoDSc, by combining several available databases. The synergy between spectral features and spectrogram images is investigated. Independently, the weighted accuracy obtained using only spectral features was 89%, while using only spectrogram images it reached 90%. These results, although surpassing previous research, highlight the strengths and limitations of each representation when used in isolation. Based on this exploration, a neural network architecture composed of a CNN1D, a CNN2D, and an MLP that fuses spectral features and spectrogram images is proposed. The model, supported by the unified EmoDSc dataset, demonstrates a remarkable accuracy of 96%.
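As an illustration of the described fusion architecture, the following is a minimal sketch of a dual-input network with a CNN1D branch for spectral feature vectors, a CNN2D branch for spectrogram images, and an MLP head over the concatenated representations. The input shapes, layer sizes, and number of emotion classes are illustrative assumptions, not values reported by the study.

```python
# Minimal sketch of a dual-branch fusion network; shapes and layer sizes are
# assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_fusion_model(n_spectral=193, spec_shape=(128, 128, 1), n_classes=7):
    # 1-D branch over a vector of spectral features (e.g. MFCC statistics).
    feat_in = layers.Input(shape=(n_spectral, 1), name="spectral_features")
    x1 = layers.Conv1D(64, 5, activation="relu")(feat_in)
    x1 = layers.MaxPooling1D(2)(x1)
    x1 = layers.Conv1D(128, 5, activation="relu")(x1)
    x1 = layers.GlobalAveragePooling1D()(x1)

    # 2-D branch over the spectrogram image.
    img_in = layers.Input(shape=spec_shape, name="spectrogram")
    x2 = layers.Conv2D(32, 3, activation="relu")(img_in)
    x2 = layers.MaxPooling2D(2)(x2)
    x2 = layers.Conv2D(64, 3, activation="relu")(x2)
    x2 = layers.GlobalAveragePooling2D()(x2)

    # MLP head over the fused (concatenated) representations.
    z = layers.concatenate([x1, x2])
    z = layers.Dense(256, activation="relu")(z)
    z = layers.Dropout(0.3)(z)
    out = layers.Dense(n_classes, activation="softmax")(z)
    return Model([feat_in, img_in], out)

model = build_fusion_model()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```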
Subjects
Deep Learning , Emotions , Neural Networks, Computer , Humans , Emotions/physiology , Speech/physiology , Databases, Factual , Algorithms , Pattern Recognition, Automated/methods
ABSTRACT
Berry production is increasing worldwide each year; however, high production leads to labor shortages and an increase in wasted fruit during harvest seasons. This problem opened new research opportunities in computer vision, as one main challenge to address is the uncontrolled light conditions in greenhouses and open fields. The high light variation between zones can lead to underexposure of the regions of interest, making it difficult to distinguish between vegetation, ripe, and unripe blackberries due to their black color. Therefore, the aim of this work is to automate the classification of blackberry ripeness stages under normal and low-light conditions by exploring image fusion methods to improve the quality of the input image before the inference process. The proposed algorithm combines information from three sources: the visible image, an enhanced version of the visible image, and a sensor that captures images in the near-infrared spectrum, obtaining a mean F1 score of 0.909±0.074 and 0.962±0.028 on underexposed images, without and with model fine-tuning, respectively, which in some cases is an increase of up to 12% in classification rates. Furthermore, the analysis of the fusion metrics showed that the method can be used on outdoor images to enhance their quality; the weighted fusion helps to improve only the underexposed vegetation, increasing the contrast of objects in the image without significant changes in saturation and colorfulness.
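To make the three-source idea concrete, the sketch below combines a visible image, a contrast-enhanced version of it, and a near-infrared image with a simple weighted sum. The CLAHE enhancement, the fusion weights, and the file names are assumptions for illustration; they are not the exact pipeline or parameters used in the work.

```python
# Illustrative three-source weighted fusion, assuming the images are already
# co-registered; weights, enhancement, and file names are placeholders.
import cv2
import numpy as np

def enhance_visible(bgr):
    # Simple contrast enhancement of the visible image (CLAHE on the L channel).
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    return cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

def fuse(visible, nir, w_vis=0.5, w_enh=0.3, w_nir=0.2):
    enhanced = enhance_visible(visible)
    nir_bgr = cv2.cvtColor(nir, cv2.COLOR_GRAY2BGR)
    fused = (w_vis * visible.astype(np.float32)
             + w_enh * enhanced.astype(np.float32)
             + w_nir * nir_bgr.astype(np.float32))
    return np.clip(fused, 0, 255).astype(np.uint8)

# Hypothetical file names for a registered visible/NIR pair.
visible = cv2.imread("blackberry_rgb.png")
nir = cv2.imread("blackberry_nir.png", cv2.IMREAD_GRAYSCALE)
cv2.imwrite("blackberry_fused.png", fuse(visible, nir))
```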
Subjects
Deep Learning , Rubus , Fruit , Algorithms , Light
ABSTRACT
In image classification, few-shot learning deals with recognizing visual categories from a few tagged examples. The degree of expressiveness of the encoded features in this scenario is a crucial question that needs to be addressed in the models being trained. Recent approaches have achieved encouraging results in improving few-shot models in deep learning, but designing a competitive and simple architecture is challenging, especially considering its requirement in many practical applications. This work proposes an improved few-shot model based on a multi-layer feature fusion (FMLF) method. The presented approach includes extended feature extraction and fusion mechanisms in the Convolutional Neural Network (CNN) backbone, as well as an effective metric to compute the divergences at the end. To evaluate the proposed method, a challenging visual classification problem is addressed: maize crop insect classification with specific pest and beneficial categories, which serves both as a test of our model and as a means to introduce a novel dataset. Experiments were carried out to compare the results with ResNet50, VGG16, and MobileNetv2 used as feature extraction backbones, and the FMLF method demonstrated higher accuracy with fewer parameters. The proposed FMLF method improved accuracy scores by up to 3.62% in one-shot and 2.82% in five-shot classification tasks compared to a traditional backbone that uses only global image features.
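As a rough illustration of multi-layer feature fusion for few-shot classification, the sketch below pools features from two intermediate stages of a ResNet50 backbone, concatenates them, and classifies query images by cosine similarity to class prototypes. The choice of layers and the prototype-based metric are assumptions for exposition and do not reproduce the FMLF design.

```python
# Hedged sketch: fuse mid- and high-level ResNet features, then do
# nearest-prototype classification over the fused descriptors.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

backbone = resnet50(weights=None).eval()

def fused_embedding(images):
    # Run the stem and the first three stages, pooling two of them.
    x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(images))))
    x = backbone.layer1(x)
    x = backbone.layer2(x)
    f_mid = F.adaptive_avg_pool2d(x, 1).flatten(1)    # mid-level features
    x = backbone.layer3(x)
    f_high = F.adaptive_avg_pool2d(x, 1).flatten(1)   # higher-level features
    return torch.cat([f_mid, f_high], dim=1)          # fused descriptor

@torch.no_grad()
def classify(query, support, support_labels, n_way):
    # Prototypes are the mean fused embedding of each class's support images.
    emb_s = F.normalize(fused_embedding(support), dim=1)
    emb_q = F.normalize(fused_embedding(query), dim=1)
    protos = torch.stack([emb_s[support_labels == c].mean(0)
                          for c in range(n_way)])
    return (emb_q @ protos.t()).argmax(dim=1)         # nearest prototype
```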
ABSTRACT
Precise instrument segmentation helps surgeons navigate the body more easily and increases patient safety. While accurate real-time tracking of surgical instruments plays a crucial role in minimally invasive computer-assisted surgeries, it is a challenging task to achieve, mainly due to (1) the complex surgical environment and (2) the model design trade-off between optimal accuracy and speed. Deep learning offers the opportunity to learn such complex environments from large collections of surgical scenes and instrument placements in real-world scenarios. The Robust Medical Instrument Segmentation 2019 challenge (ROBUST-MIS) provides more than 10,000 frames with surgical tools in different clinical settings. In this paper, we propose a lightweight single-stage instance segmentation model complemented with a convolutional block attention module to achieve both fast and accurate inference. We further improve accuracy through data augmentation and optimal anchor localization strategies. To our knowledge, this is the first work that explicitly focuses on both real-time performance and improved accuracy. Our approach outperformed the top team performances in the most recent edition of the ROBUST-MIS challenge, with over 44% improvement on the area-based multi-instance dice metric MI_DSC and 39% on the distance-based multi-instance normalized surface dice MI_NSD. We also demonstrate real-time performance (>60 frames per second) with different but competitive variants of our final approach.
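For readers unfamiliar with the attention component, the following is a minimal CBAM-style block of the kind used to complement a segmentation backbone; the reduction ratio and spatial kernel size are common defaults, not values reported in this paper.

```python
# Minimal CBAM-style block: channel attention followed by spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP over average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: convolution over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

feats = torch.randn(1, 256, 64, 64)   # e.g. a feature map from the detector neck
print(CBAM(256)(feats).shape)         # torch.Size([1, 256, 64, 64])
```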
Subjects
Surgery, Computer-Assisted , Surgical Instruments , Attention , Humans , Image Processing, Computer-Assisted , Minimally Invasive Surgical Procedures
ABSTRACT
Vehicle re-identification (re-id) plays an important role in public safety and has received increasing attention. Local features (e.g., hanging decorations and stickers) are widely used for vehicle re-id, but a local feature visible from one perspective may not be visible from other perspectives. In this paper, we first verify experimentally that there is low linear correlation between global features of different dimensions. We then propose a technique that uses global features instead of local features to distinguish the nuances between different vehicles. We design a vehicle re-identification method, the generated multi-branch feature fusion (GMBFF) method, to make full use of the complementarity between global features with different dimensions. All branches of the proposed GMBFF model are derived from the same model, with only slight differences among them, and each branch extracts highly discriminative features of a different dimension. Finally, we fuse the features extracted by these branches; unlike existing research, we use global vehicle features for fusion. We also propose two feature fusion methods, the single fusion method (SFM) and the multi fusion method (MFM). In SFM, features with larger dimensions occupy more weight in the fused features; MFM overcomes this disadvantage. Extensive experiments on two widely used datasets, VeRi-776 and VehicleID, show that the proposed method outperforms state-of-the-art vehicle re-identification methods.
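The following sketch illustrates one way global features of different dimensions can be fused for retrieval, contrasting plain concatenation (where a higher-dimensional branch dominates the distance) with per-branch normalization before concatenation. This is an illustrative reading of the SFM/MFM distinction, not the paper's exact definitions, and the branch dimensions are hypothetical.

```python
# Illustrative fusion of global re-id features with different dimensions.
import torch
import torch.nn.functional as F

def single_fusion(branch_feats):
    # Concatenate branch outputs directly; a 2048-d branch outweighs a 512-d one.
    return F.normalize(torch.cat(branch_feats, dim=1), dim=1)

def multi_fusion(branch_feats):
    # Normalize each branch first so every branch contributes comparable energy.
    normed = [F.normalize(f, dim=1) for f in branch_feats]
    return F.normalize(torch.cat(normed, dim=1), dim=1)

# Hypothetical outputs of three branches with different embedding sizes.
gallery = multi_fusion([torch.randn(4, 512), torch.randn(4, 1024),
                        torch.randn(4, 2048)])
query = multi_fusion([torch.randn(1, 512), torch.randn(1, 1024),
                      torch.randn(1, 2048)])
scores = query @ gallery.t()      # cosine similarity for retrieval ranking
print(scores.argmax(dim=1))
```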
ABSTRACT
Assessment of the emotional state induced by music may provide theoretical support and assistance for music-assisted therapy. The key to assessing the state of emotion is feature extraction from the emotional electroencephalogram (EEG). In this paper, we study the performance optimization of the feature extraction algorithm. A public multimodal database for emotion analysis using physiological signals (DEAP), proposed by Koelstra et al., was applied. Eight kinds of positive and negative emotions were extracted from the dataset, comprising data from fourteen channels in different regions of the brain. Based on the wavelet transform, δ, θ, α and β rhythms were extracted. This paper analyzed and compared the performance of three kinds of EEG features for emotion classification, namely wavelet features (wavelet coefficient energy and wavelet entropy), approximate entropy, and the Hurst exponent. On this basis, an EEG feature fusion algorithm based on principal component analysis (PCA) was proposed. Principal components with a cumulative contribution rate of more than 85% were retained, and the parameters with large variation in their characteristic roots were selected. A support vector machine was used to assess the emotional state. The results showed that the average accuracy rates of emotion classification with wavelet features, approximate entropy, and the Hurst exponent were 73.15%, 50.00%, and 45.54%, respectively. By combining these three methods, the features fused with PCA achieved an accuracy of about 85%. The classification accuracy obtained with the proposed PCA-based fusion algorithm was at least 12% higher than that obtained with any single feature, providing assistance for emotional EEG feature extraction and music therapy.
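A minimal sketch of the PCA-based fusion step follows: the three feature sets are concatenated, principal components are kept up to 85% cumulative explained variance, and an SVM performs the classification. The synthetic arrays and their dimensions are placeholders standing in for the real wavelet, approximate-entropy, and Hurst features extracted from DEAP.

```python
# Sketch of PCA-based feature fusion followed by SVM classification;
# the random arrays below are placeholders for the real EEG features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_trials = 320
wavelet = rng.normal(size=(n_trials, 14 * 8))  # energy + entropy per channel/rhythm
apen = rng.normal(size=(n_trials, 14))         # approximate entropy per channel
hurst = rng.normal(size=(n_trials, 14))        # Hurst exponent per channel
X = np.hstack([wavelet, apen, hurst])          # fused feature matrix
y = rng.integers(0, 8, size=n_trials)          # eight emotion classes

clf = make_pipeline(StandardScaler(),
                    PCA(n_components=0.85),    # retain >=85% cumulative variance
                    SVC(kernel="rbf", C=1.0))
print(cross_val_score(clf, X, y, cv=5).mean())
```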