Results 1 - 20 of 96
1.
Sensors (Basel) ; 24(11)2024 May 25.
Article in English | MEDLINE | ID: mdl-38894200

ABSTRACT

Chicken behavior recognition is crucial for a number of reasons, including promoting animal welfare, enabling the early detection of health issues, optimizing farm management practices, and contributing to more sustainable and ethical poultry farming. In this paper, we introduce a technique for recognizing chicken behavior on edge computing devices based on video sensing mosaicing. Our method combines video sensing mosaicing with deep learning to accurately identify specific chicken behaviors from videos. It achieves an accuracy of 79.61% with MobileNetV2 for chickens exhibiting three types of behavior. These findings underscore the efficacy and promise of our approach to chicken behavior recognition on edge computing devices, making it adaptable to diverse applications. The ongoing exploration and identification of further behavioral patterns will contribute to a more comprehensive understanding of chicken behavior, enhancing the scope and accuracy of behavior analysis in diverse contexts.
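The paper does not include an implementation, but the core mosaicing step can be sketched: sample frames evenly from a clip and tile them into one image that a compact classifier such as MobileNetV2 could consume. The grid size and even-sampling strategy below are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def frames_to_mosaic(frames, grid=(2, 2)):
    """Tile evenly sampled video frames into one mosaic image.

    frames: array of shape (n_frames, H, W, C); grid: (rows, cols).
    Returns an array of shape (rows*H, cols*W, C).
    """
    rows, cols = grid
    n = rows * cols
    # Sample n frames evenly across the clip.
    idx = np.linspace(0, len(frames) - 1, n).astype(int)
    tiles = frames[idx]
    h, w = tiles.shape[1:3]
    mosaic = np.zeros((rows * h, cols * w) + tiles.shape[3:], dtype=tiles.dtype)
    for k, tile in enumerate(tiles):
        r, c = divmod(k, cols)
        mosaic[r * h:(r + 1) * h, c * w:(c + 1) * w] = tile
    return mosaic
```

The resulting mosaic is a single image, so any off-the-shelf image classifier can be applied to it unchanged.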


Subjects
Animal Husbandry , Behavior, Animal , Chickens , Computing Methodologies , Animal Husbandry/instrumentation , Animal Husbandry/methods , Video Recording , Animals , Deep Learning
2.
Methods ; 202: 22-30, 2022 06.
Article in English | MEDLINE | ID: mdl-33838272

ABSTRACT

This paper focuses on automatic cholangiocarcinoma (CC) diagnosis from a microscopic hyperspectral (HSI) pathological dataset using a deep learning method. The first benchmark based on microscopic hyperspectral pathological images is established. In particular, 880 scenes of multidimensional hyperspectral cholangiocarcinoma images are collected, with each pixel manually labeled as either tumor or non-tumor for supervised learning. Moreover, each scene from the slide is given a binary label indicating whether it comes from a patient or a healthy person. Unlike traditional RGB images, HSI acquires pixels over multiple spectral intervals, which are added as an extension along the channel dimension of the 3-channel RGB image. This work aims at fully exploiting the spatial-spectral HSI data through a deep convolutional neural network (CNN). The whole scene is first divided into several patches, which are fed into the CNN for tumor/non-tumor binary prediction and tumor-area regression. The final scene-level diagnosis is made by a random forest using features derived from the patch predictions. Experiments show that HSI provides more accurate results than RGB images. Moreover, a spectral-interval convolution and normalization scheme is proposed to further mine the spectral information in HSI, which demonstrates the effectiveness of spatial-spectral data for CC diagnosis.
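As a rough sketch of the two-stage design described above (patch-level prediction followed by scene-level aggregation), the snippet below splits a scene into patches and turns per-patch tumor probabilities into a small scene-level feature vector that a random forest could consume. The patch size and the choice of aggregate statistics are illustrative assumptions.

```python
import numpy as np

def split_into_patches(scene, patch=32):
    """Split an (H, W, Bands) hyperspectral scene into non-overlapping patches."""
    H, W = scene.shape[:2]
    ph, pw = H // patch, W // patch
    out = []
    for i in range(ph):
        for j in range(pw):
            out.append(scene[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch])
    return np.stack(out)

def scene_features(patch_probs, thresh=0.5):
    """Aggregate per-patch tumor probabilities into a scene-level feature
    vector (mean, max, fraction of predicted-tumor patches) for a
    downstream classifier such as a random forest."""
    p = np.asarray(patch_probs, dtype=float)
    return np.array([p.mean(), p.max(), (p > thresh).mean()])
```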


Subjects
Cholangiocarcinoma , Neural Networks, Computer , Cholangiocarcinoma/diagnosis , Humans
3.
Chemometr Intell Lab Syst ; 233: 104750, 2023 Feb 15.
Article in English | MEDLINE | ID: mdl-36619376

ABSTRACT

Deep learning (DL) algorithms have demonstrated a high ability to perform speedy and accurate COVID-19 diagnosis from computed tomography (CT) and X-ray scans. Most relevant studies used the spatial information in these images to train DL models. However, training these models with images generated by radiomics approaches can enhance diagnostic accuracy, and combining information from several radiomics approaches with time-frequency representations of the COVID-19 patterns can increase performance even further. This study introduces "RADIC", an automated tool that uses three DL models trained on radiomics-generated images to detect COVID-19. First, four radiomics approaches are used to analyze the original CT and X-ray images. Next, each of the three DL models is trained on a different set of radiomics, X-ray, and CT images. Then, for each DL model, deep features are obtained and their dimensionality is reduced using the fast Walsh-Hadamard transform, yielding a time-frequency representation of the COVID-19 patterns. The tool then uses the discrete cosine transform to fuse these deep features. Four classification models are then used for the final classification. To validate the performance of RADIC, two benchmark COVID-19 datasets (CT and X-ray) are employed. The final accuracy attained using RADIC is 99.4% and 99% for the first and second datasets, respectively. To demonstrate its competitive ability, RADIC's performance is compared with related studies in the literature; the results reflect that RADIC achieves superior performance. The results also show that a DL model can be trained more effectively with images generated by radiomics techniques than with the original X-ray and CT images, and that incorporating deep features extracted from DL models trained with multiple radiomics approaches improves diagnostic accuracy.
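The feature pipeline named in the abstract (fast Walsh-Hadamard transform for dimensionality reduction, discrete cosine transform for fusion) can be sketched in a few lines. This is a minimal illustration of the two transforms, not the RADIC implementation; the number of retained coefficients is an assumption.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (input length must be a power of 2)."""
    x = np.asarray(x, dtype=float).copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def dct2(x):
    """Orthonormal DCT-II, used here to fuse concatenated feature vectors."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(n)
    X = np.array([np.sum(x * np.cos(np.pi * (2 * k + 1) * m / (2 * n)))
                  for m in range(n)])
    X *= np.sqrt(2.0 / n)
    X[0] /= np.sqrt(2.0)
    return X

def fuse_deep_features(feature_sets, keep=8):
    """Transform each model's deep features with the FWHT, keep the first
    `keep` coefficients, concatenate, and fuse with the DCT."""
    reduced = [fwht(f)[:keep] for f in feature_sets]
    return dct2(np.concatenate(reduced))
```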

4.
Sensors (Basel) ; 23(10)2023 May 11.
Article in English | MEDLINE | ID: mdl-37430568

ABSTRACT

Two convolution neural network (CNN) models are introduced to accurately classify event-related potentials (ERPs) by fusing frequency, time, and spatial domain information acquired from the continuous wavelet transform (CWT) of the ERPs recorded from multiple spatially distributed channels. The multidomain models fuse the multichannel Z-scalograms and the V-scalograms, which are generated from the standard CWT scalogram by zeroing-out and by discarding the inaccurate artifact coefficients that are outside the cone of influence (COI), respectively. In the first multidomain model, the input to the CNN is generated by fusing the Z-scalograms of the multichannel ERPs into a frequency-time-spatial cuboid. The input to the CNN in the second multidomain model is formed by fusing the frequency-time vectors of the V-scalograms of the multichannel ERPs into a frequency-time-spatial matrix. Experiments are designed to demonstrate (a) customized classification of ERPs, where the multidomain models are trained and tested with the ERPs of individual subjects for brain-computer interface (BCI)-type applications, and (b) group-based ERP classification, where the models are trained on the ERPs from a group of subjects and tested on single subjects not included in the training set for applications such as brain disorder classification. Results show that both multidomain models yield high classification accuracies for single trials and small-average ERPs with a small subset of top-ranked channels, and the multidomain fusion models consistently outperform the best unichannel classifiers.
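The Z-scalogram construction (zeroing coefficients outside the cone of influence) and the fusion of channels into a cuboid can be sketched as follows. The linear relation between scale and COI width used here is an illustrative assumption; the exact COI depends on the wavelet.

```python
import numpy as np

def z_scalogram(scalogram, scales, coi_factor=1.0):
    """Zero out CWT coefficients outside the cone of influence (COI).

    scalogram: (n_scales, n_times) CWT magnitudes; at each scale s, the
    first and last `coi_factor * s` samples are treated as edge artifacts.
    """
    out = np.array(scalogram, dtype=float, copy=True)
    n_times = out.shape[1]
    for i, s in enumerate(scales):
        w = min(int(np.ceil(coi_factor * s)), n_times // 2)
        out[i, :w] = 0.0
        if w:
            out[i, n_times - w:] = 0.0
    return out

def fuse_channels(scalograms):
    """Stack per-channel Z-scalograms into a frequency-time-spatial cuboid."""
    return np.stack(scalograms, axis=-1)  # (n_scales, n_times, n_channels)
```

The cuboid is the shape of input the first multidomain model feeds to its CNN; the V-scalogram variant would instead discard the zeroed columns entirely.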


Assuntos
Artefatos , Encefalopatias , Humanos , Encéfalo , Potenciais Evocados , Redes Neurais de Computação
5.
Sensors (Basel) ; 23(4)2023 Feb 11.
Article in English | MEDLINE | ID: mdl-36850647

ABSTRACT

Non-intrusive load monitoring systems that are based on deep learning methods produce high-accuracy end use detection; however, they are mainly designed with the one vs. one strategy. This strategy dictates that one model is trained to disaggregate only one appliance, which is sub-optimal in production. Due to the high number of parameters and the different models, training and inference can be very costly. A promising solution to this problem is the design of an NILM system in which all the target appliances can be recognized by only one model. This paper suggests a novel multi-appliance power disaggregation model. The proposed architecture is a multi-target regression neural network consisting of two main parts. The first part is a variational encoder with convolutional layers, and the second part has multiple regression heads which share the encoder's parameters. Considering the total consumption of an installation, the multi-regressor outputs the individual consumption of all the target appliances simultaneously. The experimental setup includes a comparative analysis against other multi- and single-target state-of-the-art models.

6.
Sensors (Basel) ; 23(3)2023 Jan 26.
Article in English | MEDLINE | ID: mdl-36772427

ABSTRACT

Emotions have a crucial function in the mental life of humans. They are vital for understanding a person's behaviour and mental condition. Speech Emotion Recognition (SER) is the task of extracting a speaker's emotional state from their speech signal. SER is a growing discipline in human-computer interaction, and it has recently attracted significant interest. This is because there are only a limited number of universal emotions, so any intelligent system with sufficient computational capacity can learn to recognise them. The difficulty, however, is that human speech is immensely diverse, making it hard to create a single, standardised recipe for detecting hidden emotions. This work addresses this research difficulty by combining a multilingual emotional dataset with the construction of a more generalised and effective model for recognising human emotions. A two-step process was used to develop the model: the first stage involved feature extraction, and the second stage involved classification of the extracted features. The zero-crossing rate (ZCR), root mean square energy (RMSE), and the well-known mel-frequency cepstral (MFC) coefficients were extracted as features. Two proposed models, a 1D CNN combined with LSTM and attention and a proprietary 2D CNN architecture, were used for classification. The outcomes demonstrated that the proposed 1D CNN with LSTM and attention performed better than the 2D CNN. For the EMO-DB, SAVEE, ANAD, and BAVED datasets, the model's accuracy was 96.72%, 97.13%, 96.72%, and 88.39%, respectively. The model beat several earlier efforts on the same datasets, demonstrating the generality and efficacy of recognising multiple emotions across various languages.
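Of the features listed above, ZCR and RMS energy are simple enough to sketch directly (MFCC extraction needs a full filterbank pipeline and is omitted). The frame length and hop size below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def zcr(frame):
    """Zero-crossing rate: fraction of consecutive samples whose sign flips."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

def rms_energy(frame):
    """Root-mean-square energy of one analysis frame."""
    return float(np.sqrt(np.mean(np.square(frame))))

def frame_features(signal, frame_len=256, hop=128):
    """Slide a window over the signal and collect (ZCR, RMS) per frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        f = signal[start:start + frame_len]
        feats.append((zcr(f), rms_energy(f)))
    return np.array(feats)
```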


Assuntos
Redes Neurais de Computação , Fala , Humanos , Atenção , Emoções , Computadores
7.
Cluster Comput ; 26(2): 1389-1403, 2023.
Article in English | MEDLINE | ID: mdl-36034678

ABSTRACT

Coronavirus disease (COVID-19) is rapidly spreading worldwide. Recent studies show that radiological images contain accurate data for detecting the coronavirus. This paper proposes a pre-trained convolutional neural network (VGG16) combined with Capsule Neural Networks (CapsNet) to detect COVID-19 from unbalanced datasets. CapsNet is chosen for its ability to model features such as perspective, orientation, and size. The Synthetic Minority Over-sampling Technique (SMOTE) was employed to ensure that new samples were generated close to the sample center, avoiding the production of outliers or changes in the data distribution. As the results may change with the capsule network parameters (capsule dimensionality and routing number), the Gaussian optimization method was used to optimize these parameters. Four experiments were conducted: (1) CapsNet with the unbalanced datasets, (2) CapsNet with datasets balanced by class weighting, (3) CapsNet with datasets balanced by SMOTE, and (4) CapsNet hyperparameter optimization with datasets balanced by SMOTE. The performance improved, achieving an accuracy of 96.58% and an F1-score of 97.08%, a competitive optimized model compared to other related models.
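The SMOTE step can be sketched independently of the network: each synthetic sample is a random interpolation between a minority-class sample and one of its k nearest minority-class neighbours. This is the generic SMOTE idea only; k and the RNG handling are illustrative choices.

```python
import numpy as np

def smote(minority, n_new, k=3, rng=None):
    """Generate synthetic minority samples by interpolating between a
    sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class (excluding x).
        d = np.linalg.norm(minority - x, axis=1)
        nn = np.argsort(d)[1:k + 1]
        neighbour = minority[rng.choice(nn)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(x + gap * (neighbour - x))
    return np.array(synthetic)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the minority class's convex hull, which is what keeps SMOTE from producing outliers.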

8.
BMC Med Inform Decis Mak ; 22(Suppl 6): 318, 2022 12 07.
Article in English | MEDLINE | ID: mdl-36476613

ABSTRACT

BACKGROUND: In recent years, neuroimaging with deep learning (DL) algorithms has made remarkable advances in the diagnosis of neurodegenerative disorders. However, applying DL in different medical domains is usually challenged by a lack of labeled data. To address this challenge, transfer learning (TL) has been applied, using state-of-the-art convolutional neural networks pre-trained on natural images. Yet, there are differences in characteristics between medical and natural images, as well as between generic image classification and targeted medical diagnosis tasks. The purpose of this study is to investigate the performance of specialized models and TL in the classification of neurodegenerative disorders using 3D volumes of 18F-FDG-PET brain scans. RESULTS: Results show that TL models are suboptimal for classification of neurodegenerative disorders, especially when the objective is to separate more than two disorders. Additionally, the specialized CNN model provides better interpretations of the predicted diagnosis. CONCLUSIONS: TL can indeed lead to superior performance on binary classification in a timely and data-efficient manner, yet for detecting more than a single disorder, TL models do not perform well. Additionally, the custom 3D model performs comparably to TL models on binary classification and, interestingly, performs better for diagnosis of multiple disorders. The results confirm the superiority of the custom 3D-CNN in providing a more explainable model compared to the TL-adopted ones.


Assuntos
Redes Neurais de Computação , Doenças Neurodegenerativas , Humanos , Aprendizado de Máquina
9.
Sensors (Basel) ; 22(5)2022 Mar 03.
Article in English | MEDLINE | ID: mdl-35271130

ABSTRACT

The periodic inspection of railroad tracks is very important for finding structural and geometrical problems that lead to railway accidents. Currently, in Pakistan, rail tracks are inspected by an acoustic-based manual system that requires a railway engineer, as a domain expert, to differentiate between different rail track faults, which is cumbersome, laborious, and error-prone. This study proposes augmenting traditional acoustic-based systems with deep learning models to increase performance and reduce train accidents. Two convolutional neural network (CNN) models, convolutional 1D and convolutional 2D, and one recurrent neural network (RNN) model, a long short-term memory (LSTM) model, are used in this regard. Initially, three classes are considered: superelevation, wheel burnt, and normal tracks. Contrary to traditional acoustic-based systems, where the spectrogram dataset is generated before model training, the proposed approach uses on-the-fly feature extraction by generating spectrograms as a layer of the deep learning model. Different lengths of audio samples are used to analyze their performance with each model. Each 17 s audio sample is split into variations of 1.7, 3.4, and 8.5 s, and all three deep learning models are trained and tested against each split time. Various combinations of audio data augmentation are analyzed extensively to investigate the models' performance. The results suggest that the LSTM with the 8.5 s split time gives the best results, with an accuracy of 99.7%, a precision of 99.5%, a recall of 99.5%, and an F1 score of 99.5%.
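The split-time preprocessing described above amounts to plain array reshaping; a sketch follows, with the sample rate as an illustrative assumption (the paper does not state it here).

```python
import numpy as np

def split_audio(samples, sample_rate, split_sec):
    """Split a 1-D audio signal into equal non-overlapping segments of
    split_sec seconds, dropping any incomplete trailing segment."""
    seg_len = int(split_sec * sample_rate)
    n_segs = len(samples) // seg_len
    return np.asarray(samples[:n_segs * seg_len]).reshape(n_segs, seg_len)
```

For a 17 s clip this yields 10, 5, or 2 segments for split times of 1.7, 3.4, and 8.5 s, respectively.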


Assuntos
Aprendizado Profundo , Acústica , Redes Neurais de Computação
10.
Sensors (Basel) ; 22(10)2022 May 11.
Article in English | MEDLINE | ID: mdl-35632070

ABSTRACT

Deep learning-based methods, especially convolutional neural networks, have been developed to automatically process the images of concrete surfaces for crack identification tasks. Although deep learning-based methods claim very high accuracy, they often ignore the complexity of the image collection process. Real-world images are often impacted by complex illumination conditions, shadows, the randomness of crack shapes and sizes, blemishes, and concrete spall. Published literature and available shadow databases are oriented towards images taken in laboratory conditions. In this paper, we explore the complexity of image classification for concrete crack detection in the presence of demanding illumination conditions. Challenges associated with the application of deep learning-based methods for detecting concrete cracks in the presence of shadows are elaborated on in this paper. Novel shadow augmentation techniques are developed to increase the accuracy of automatic detection of concrete cracks.
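A minimal form of shadow augmentation, darkening a band of the image to simulate a cast shadow, can be sketched as below. The band geometry and uniform attenuation are simplifying assumptions; the paper's augmentation techniques are more elaborate.

```python
import numpy as np

def add_shadow(image, x0, x1, strength=0.5):
    """Darken the vertical band [x0, x1) of a grayscale image to mimic a
    cast shadow; strength=0 leaves the image unchanged."""
    out = np.array(image, dtype=float, copy=True)
    out[:, x0:x1] *= (1.0 - strength)
    return out
```

Applying such transforms at training time exposes the classifier to the illumination variation it will meet in field images.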


Assuntos
Algoritmos , Processamento de Imagem Assistida por Computador , Processamento de Imagem Assistida por Computador/métodos , Redes Neurais de Computação
11.
Sensors (Basel) ; 22(12)2022 Jun 18.
Article in English | MEDLINE | ID: mdl-35746398

ABSTRACT

Object detection is one of the most important and challenging branches of computer vision. It is widely used in everyday applications such as surveillance security and autonomous driving. We propose a novel dual-path multi-scale object detection paradigm to extract richer feature information and address the multi-scale object detection problem, and on this basis we design a single-stage general object detection algorithm called the Dual-Path Single-Shot Detector (DPSSD). The dual path, i.e., a residual path and a concatenation path, ensures that shallow features can be more easily exploited to improve detection accuracy. Our improved dual-path network is more adaptable to multi-scale object detection tasks, and we combine it with a feature fusion module to generate a multi-scale feature learning paradigm called the "Dual-Path Feature Pyramid". We trained the models on the PASCAL VOC and COCO datasets with 320-pixel and 512-pixel inputs, respectively, and performed inference experiments to validate the structures in the neural network. The experimental results show that our algorithm outperforms anchor-based single-stage object detection algorithms and achieves a competitive level of average accuracy. Researchers can replicate the reported results of this paper.


Assuntos
Condução de Veículo , Redes Neurais de Computação , Algoritmos , Progressão da Doença , Humanos , Aprendizagem
12.
Sensors (Basel) ; 22(14)2022 Jul 11.
Article in English | MEDLINE | ID: mdl-35890872

ABSTRACT

Nowadays, the grinding process is mostly automatic, yet post-grinding quality inspection is mostly carried out manually. Although the conventional inspection technique may have cumbersome setup and tuning processes, the data-driven model, with its vision-based dataset, provides an opportunity to automate the inspection process. In this study, a convolutional neural network technique with transfer learning is proposed for three kinds of inspections based on 750-1000 surface raw images of the ground workpieces in each task: classifying the grit number of the abrasive belt that grinds the workpiece, estimating the surface roughness of the ground workpiece, and classifying the degree of wear of the abrasive belts. The results show that a deep convolutional neural network can recognize the texture on the abrasive surface images and that the classification model can achieve an accuracy of 0.9 or higher. In addition, the external coaxial white light was the most suitable light source among the three tested light sources: the external coaxial white light, the high-angle ring light, and the external coaxial red light. Finally, the model that classifies the degree of wear of the abrasive belts can also be utilized as the abrasive belt life estimator.

13.
Sensors (Basel) ; 22(9)2022 Apr 28.
Article in English | MEDLINE | ID: mdl-35591069

ABSTRACT

The average error rate in liver cirrhosis classification on B-mode ultrasound images using the traditional pattern recognition approach is still too high. To improve liver cirrhosis classification performance, this work focuses on image correction methods and a convolutional neural network (CNN) approach. The impact of image correction methods on the region of interest (ROI) images that are input into the CNN for classifying liver cirrhosis from B-mode ultrasound images is investigated. In this paper, image correction methods based on tone curves are developed. The experimental results show positive benefits from the image correction methods through improved ROI image quality. By enhancing the contrast of the ROI images, the image quality improves and thus the generalization ability of the CNN also improves.
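A gamma curve is the simplest example of the tone-curve corrections discussed above; the exponent below is an illustrative choice, not the paper's fitted curve.

```python
import numpy as np

def gamma_tone_curve(image, gamma=0.7):
    """Apply a gamma tone curve to an 8-bit grayscale ROI image.

    gamma < 1 brightens mid-tones, stretching contrast in dark regions,
    which is where speckle texture relevant to cirrhosis tends to sit."""
    x = np.asarray(image, dtype=float) / 255.0
    return np.clip(255.0 * np.power(x, gamma), 0, 255).astype(np.uint8)
```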


Assuntos
Processamento de Imagem Assistida por Computador , Cirrose Hepática , Humanos , Processamento de Imagem Assistida por Computador/métodos , Cirrose Hepática/diagnóstico por imagem , Redes Neurais de Computação , Ultrassonografia
14.
Proc Natl Acad Sci U S A ; 115(2): 254-259, 2018 01 09.
Article in English | MEDLINE | ID: mdl-29279403

ABSTRACT

Deep convolutional neural networks have been successfully applied to many image-processing problems in recent works. Popular network architectures often add additional operations and connections to the standard architecture to enable training deeper networks. To achieve accurate results in practice, a large number of trainable parameters are often required. Here, we introduce a network architecture based on using dilated convolutions to capture features at different image scales and densely connecting all feature maps with each other. The resulting architecture is able to achieve accurate results with relatively few parameters and consists of a single set of operations, making it easier to implement, train, and apply in practice, and automatically adapts to different problems. We compare results of the proposed network architecture with popular existing architectures for several segmentation problems, showing that the proposed architecture is able to achieve accurate results with fewer parameters, with a reduced risk of overfitting the training data.


Assuntos
Diagnóstico por Imagem/métodos , Aprendizado de Máquina , Modelos Teóricos , Redes Neurais de Computação , Algoritmos , Simulação por Computador
15.
BMC Biol ; 18(1): 113, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32883273

ABSTRACT

BACKGROUND: Studies of mammalian sexual dimorphism have traditionally involved the measurement of selected dimensions of particular skeletal elements and use of single data-analysis procedures. Consequently, such studies have been limited by a variety of both practical and conceptual constraints. To compare and contrast what might be gained from a more exploratory, multifactorial approach to the quantitative assessment of form-variation, images of a small sample of modern Israeli gray wolf (Canis lupus) crania were analyzed via elliptical Fourier analysis of cranial outlines, a Naïve Bayes machine-learning approach to the analysis of these same outline data, and a deep-learning analysis of whole images in which all aspects of these cranial morphologies were represented. The statistical significance and stability of each discriminant result were tested using bootstrap and jackknife procedures. RESULTS: Our results reveal no evidence for statistically significant sexual size dimorphism, but significant sex-mediated shape dimorphism. These are consistent with the findings of prior wolf sexual dimorphism studies and extend these studies by identifying new aspects of dimorphic variation. Additionally, our results suggest that shape-based sexual dimorphism in the C. lupus cranial complex may be more widespread morphologically than had been appreciated by previous researchers. CONCLUSION: Our results suggest that size and shape dimorphism can be detected in small samples and may be dissociated in mammalian morphologies. This result is particularly noteworthy in that it implies there may be a need to refine allometric hypothesis tests that seek to account for phenotypic sexual dimorphism. The methods we employed in this investigation are fully generalizable and can be applied to a wide range of biological materials and could facilitate the rapid evaluation of a diverse array of morphological/phenomic hypotheses.


Assuntos
Aprendizado de Máquina , Caracteres Sexuais , Crânio/anatomia & histologia , Lobos/anatomia & histologia , Animais , Teorema de Bayes , Feminino , Análise de Fourier , Israel , Masculino
16.
Sensors (Basel) ; 21(18)2021 Sep 18.
Article in English | MEDLINE | ID: mdl-34577468

ABSTRACT

This paper outlines a system for detecting printing errors and misidentifications on hard disk drive sliders, which may contribute to shipping tracking problems and incorrect product delivery to end users. A deep-learning-based technique is proposed for determining the printed identity of a slider serial number from images captured by a digital camera. Our approach starts with image preprocessing methods that deal with differences in lighting and printing positions and then progresses to deep learning character detection based on the You-Only-Look-Once (YOLO) v4 algorithm and finally character classification. For character classification, four convolutional neural networks (CNN) were compared for accuracy and effectiveness: DarkNet-19, EfficientNet-B0, ResNet-50, and DenseNet-201. Experimenting on almost 15,000 photographs yielded accuracy greater than 99% on four CNN networks, proving the feasibility of the proposed technique. The EfficientNet-B0 network outperformed highly qualified human readers with the best recovery rate (98.4%) and fastest inference time (256.91 ms).


Assuntos
Aprendizado Profundo , Algoritmos , Humanos , Redes Neurais de Computação
17.
Sensors (Basel) ; 21(6)2021 Mar 18.
Article in English | MEDLINE | ID: mdl-33803891

ABSTRACT

Human activity recognition (HAR) remains a challenging yet crucial problem to address in computer vision. HAR is primarily intended to be used with other technologies, such as the Internet of Things, to assist in healthcare and eldercare. With the development of deep learning, automatic high-level feature extraction has become a possibility and has been used to optimize HAR performance. Furthermore, deep-learning techniques have been applied in various fields for sensor-based HAR. This study introduces a new methodology using convolution neural networks (CNN) with varying kernel dimensions along with bi-directional long short-term memory (BiLSTM) to capture features at various resolutions. The novelty of this research lies in the effective selection of the optimal video representation and in the effective extraction of spatial and temporal features from sensor data using traditional CNN and BiLSTM. Wireless sensor data mining (WISDM) and UCI datasets are used for this proposed methodology in which data are collected through diverse methods, including accelerometers, sensors, and gyroscopes. The results indicate that the proposed scheme is efficient in improving HAR. It was thus found that unlike other available methods, the proposed method improved accuracy, attaining a higher score in the WISDM dataset compared to the UCI dataset (98.53% vs. 97.05%).
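Sensor streams like those in WISDM are typically segmented into fixed-length overlapping windows before being fed to a CNN/BiLSTM; a sketch of that windowing step follows. The window length and overlap are common defaults used here as illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def make_windows(data, window=128, overlap=0.5):
    """Segment (n_samples, n_axes) sensor data into fixed-length,
    overlapping windows, the usual input shape for a CNN/BiLSTM HAR model."""
    step = int(window * (1.0 - overlap))
    starts = range(0, len(data) - window + 1, step)
    return np.stack([data[s:s + window] for s in starts])
```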


Assuntos
Aprendizado Profundo , Mineração de Dados , Atividades Humanas , Humanos , Memória de Longo Prazo , Redes Neurais de Computação
18.
Sensors (Basel) ; 21(18)2021 Sep 15.
Article in English | MEDLINE | ID: mdl-34577402

ABSTRACT

In the recent era, various diseases have severely affected the lifestyle of individuals, especially adults. Among these, bone diseases, including Knee Osteoarthritis (KOA), have a great impact on quality of life. KOA is a knee joint problem mainly produced due to decreased Articular Cartilage between femur and tibia bones, producing severe joint pain, effusion, joint movement constraints and gait anomalies. To address these issues, this study presents a novel KOA detection at early stages using deep learning-based feature extraction and classification. Firstly, the input X-ray images are preprocessed, and then the Region of Interest (ROI) is extracted through segmentation. Secondly, features are extracted from preprocessed X-ray images containing knee joint space width using hybrid feature descriptors such as Convolutional Neural Network (CNN) through Local Binary Patterns (LBP) and CNN using Histogram of oriented gradient (HOG). Low-level features are computed by HOG, while texture features are computed employing the LBP descriptor. Lastly, multi-class classifiers, that is, Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbour (KNN), are used for the classification of KOA according to the Kellgren-Lawrence (KL) system. The Kellgren-Lawrence system consists of Grade I, Grade II, Grade III, and Grade IV. Experimental evaluation is performed on various combinations of the proposed framework. The experimental results show that the HOG features descriptor provides approximately 97% accuracy for the early detection and classification of KOA for all four grades of KL.
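The LBP descriptor mentioned above has a compact core: compare each pixel's eight neighbours with the centre and pack the comparisons into a byte. A minimal (non-rotation-invariant) sketch:

```python
import numpy as np

def lbp_pixel(patch3x3):
    """Basic 8-neighbour LBP code for the centre of a 3x3 patch."""
    c = patch3x3[1, 1]
    # Clockwise from top-left; each neighbour >= centre sets one bit.
    neighbours = [patch3x3[0, 0], patch3x3[0, 1], patch3x3[0, 2],
                  patch3x3[1, 2], patch3x3[2, 2], patch3x3[2, 1],
                  patch3x3[2, 0], patch3x3[1, 0]]
    return sum((1 << i) for i, n in enumerate(neighbours) if n >= c)

def lbp_image(img):
    """LBP code map for the interior pixels of a grayscale image."""
    img = np.asarray(img)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i - 1, j - 1] = lbp_pixel(img[i - 1:i + 2, j - 1:j + 2])
    return out
```

A histogram of the resulting codes gives the texture feature vector that, in the paper's pipeline, is consumed by a CNN alongside HOG features.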


Assuntos
Osteoartrite do Joelho , Humanos , Articulação do Joelho/diagnóstico por imagem , Redes Neurais de Computação , Osteoartrite do Joelho/diagnóstico por imagem , Qualidade de Vida , Máquina de Vetores de Suporte
19.
Sensors (Basel) ; 21(12)2021 Jun 12.
Article in English | MEDLINE | ID: mdl-34204695

ABSTRACT

As the performance of devices that conduct large-scale computations has rapidly improved, various deep learning models have been successfully utilized in many applications. In particular, convolutional neural networks (CNNs) have shown remarkable performance in image processing tasks such as image classification and segmentation. Accordingly, more stable and robust optimization methods are required to train them effectively. However, the traditional optimizers used in deep learning still show unsatisfactory training performance for models with many layers and weights. In this paper, we therefore propose a new Adam-based hybrid optimization method called HyAdamC for training CNNs effectively. HyAdamC uses three new velocity control functions to adjust its search strength carefully in terms of initial, short-term, and long-term velocities. Moreover, HyAdamC utilizes an adaptive coefficient computation method to prevent the search direction determined by the first momentum from being distorted by outlier gradients. These components are then combined into one hybrid method. In our experiments, HyAdamC showed not only notable test accuracies but also significantly stable and robust optimization behavior when training various CNN models. Furthermore, we found that HyAdamC can be applied not only to image classification but also to image segmentation tasks.
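HyAdamC itself is not reproduced here, but the standard Adam update it extends can be sketched in a few lines; the velocity-control functions and adaptive coefficients described above would modify this baseline.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One step of the standard Adam update that HyAdamC builds on.

    m, v are the first/second moment estimates; t is the 1-based step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)  # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

HyAdamC's velocity-control functions would rescale the effective step, and its adaptive coefficients would replace the fixed b1 when outlier gradients are detected.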


Assuntos
Algoritmos , Redes Neurais de Computação , Processamento de Imagem Assistida por Computador
20.
Sensors (Basel) ; 21(12)2021 Jun 20.
Article in English | MEDLINE | ID: mdl-34203112

ABSTRACT

Emotion is a form of high-level paralinguistic information that is intrinsically conveyed by human speech. Automatic speech emotion recognition is an essential challenge for various applications, including mental disease diagnosis, audio surveillance, human behavior understanding, e-learning, and human-machine/robot interaction. In this paper, we introduce a novel speech emotion recognition method based on the Squeeze-and-Excitation ResNet (SE-ResNet) model fed with spectrogram inputs. To overcome the limitations of state-of-the-art techniques, which fail to provide a robust feature representation at the utterance level, the CNN architecture is extended with a trainable discriminative GhostVLAD clustering layer that aggregates the audio features into a compact, single-utterance vector representation. In addition, an end-to-end neural embedding approach is introduced, based on an emotionally constrained triplet loss function. The loss function integrates the relations between the various emotional patterns and thus improves the latent space data representation. The proposed methodology achieves global accuracy rates of 83.35% and 64.92% on the publicly available RAVDESS and CREMA-D datasets, respectively. Compared with the results obtained by human observers, the gains in global accuracy exceed 24%. Finally, an objective comparative evaluation against state-of-the-art techniques demonstrates accuracy gains of more than 3%.
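The emotionally constrained triplet loss builds on the standard triplet loss, which can be sketched as follows (the emotional-relation constraints described above are beyond this sketch):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull same-emotion embeddings together and
    push different-emotion embeddings at least `margin` apart."""
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)
    d_an = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_ap - d_an + margin)
```

The loss is zero once the negative is more than `margin` farther from the anchor than the positive, so training effort concentrates on the triplets that still violate the margin.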


Assuntos
Redes Neurais de Computação , Fala , Emoções , Humanos , Aprendizado de Máquina , Percepção