Results 1 - 10 of 10
1.
Sensors (Basel); 23(10), 2023 May 15.
Article in English | MEDLINE | ID: mdl-37430689

ABSTRACT

Human facial emotion detection is one of the challenging tasks in computer vision. Owing to high intra-class variation, it is hard for machine learning models to predict facial emotions accurately. Moreover, a single person can display several facial emotions, which increases the diversity and complexity of the classification problem. In this paper, we propose a novel and intelligent approach for the classification of human facial emotions. The approach customizes ResNet18 by employing transfer learning with a triplet loss function (TLF), followed by an SVM classification model. The pipeline consists of a face detector that locates and refines the face bounding box and a classifier that identifies the facial expression class of the detected faces. RetinaFace extracts the identified face regions from the source image, a ResNet18 model trained on the cropped face images with triplet loss retrieves the deep features, and an SVM classifier categorizes the facial expression based on those features. The proposed method outperforms state-of-the-art (SoTA) methods on the JAFFE and MMI datasets, achieving accuracies of 98.44% and 99.02%, respectively, on seven emotions; meanwhile, its performance on the FER2013 and AFFECTNET datasets still needs to be fine-tuned.
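
As an illustration of the pipeline described above, here is a minimal sketch in PyTorch: a ResNet18 backbone fine-tuned with a triplet loss to produce face embeddings, then an SVM fitted on those deep features. The embedding size, margin, learning rate, and SVM kernel are illustrative assumptions, not the authors' reported configuration.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

class EmbeddingNet(nn.Module):
    """ResNet18 backbone whose classification head is replaced by an embedding layer."""
    def __init__(self, embedding_dim=128):  # embedding_dim is an assumption
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Linear(backbone.fc.in_features, embedding_dim)
        self.backbone = backbone

    def forward(self, x):
        # L2-normalize so the triplet margin is scale-independent.
        return nn.functional.normalize(self.backbone(x), dim=1)

model = EmbeddingNet()
triplet_loss = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(anchor, positive, negative):
    """One optimization step on a batch of (anchor, positive, negative) face crops."""
    optimizer.zero_grad()
    loss = triplet_loss(model(anchor), model(positive), model(negative))
    loss.backward()
    optimizer.step()
    return loss.item()

def fit_svm(face_batches, labels):
    """Embed cropped faces (e.g., from RetinaFace) and fit an SVM on the features."""
    model.eval()
    with torch.no_grad():
        feats = torch.cat([model(b) for b in face_batches]).cpu().numpy()
    clf = SVC(kernel="rbf")  # kernel choice is an assumption
    clf.fit(feats, labels)
    return clf
```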


Subject(s)
Emotions , Support Vector Machine , Humans , Intelligence , Machine Learning
2.
Sensors (Basel); 23(13), 2023 Jun 25.
Article in English | MEDLINE | ID: mdl-37447738

ABSTRACT

Detecting dense text in scene images is a challenging task due to the high variability, complexity, and overlap of text regions. To adequately distinguish densely packed text instances in scenes, we propose an efficient approach called DenseTextPVT. We first generate high-resolution features at different levels to enable accurate dense text detection, which is essential for dense prediction tasks. To enhance the feature representation, we additionally design the Deep Multi-scale Feature Refinement Network (DMFRN), which effectively detects texts of varying sizes, shapes, and fonts, including small-scale texts. In the post-processing step, DenseTextPVT draws on Pixel Aggregation (PA) similarity-vector algorithms to cluster text pixels into the correct text kernels. In this way, the proposed method enhances the precision of text detection and effectively reduces overlap between text regions in densely adjacent text in natural images. Comprehensive experiments on the TotalText, CTW1500, and ICDAR-2015 benchmark datasets demonstrate the effectiveness of our method in comparison to existing methods.
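
The post-processing idea lends itself to a compact sketch: label the predicted kernels, then attach each remaining text pixel to the kernel whose mean similarity vector is closest. The array shapes, the `dist_thresh` value, and the use of Euclidean distance are assumptions for illustration, not the exact PA formulation.

```python
import numpy as np
from scipy import ndimage

def aggregate_pixels(text_mask, kernel_mask, sim_vectors, dist_thresh=0.8):
    """
    text_mask:   (H, W) bool, pixels predicted as text
    kernel_mask: (H, W) bool, shrunken text kernels
    sim_vectors: (H, W, D) float, per-pixel similarity embeddings
    Returns an (H, W) int map with one label per text instance.
    """
    labels, num = ndimage.label(kernel_mask)  # connected kernels = instances
    out = labels.copy()
    if num == 0:
        return out
    # Mean embedding of each kernel.
    means = np.stack([sim_vectors[labels == k].mean(axis=0)
                      for k in range(1, num + 1)])
    # Attach the remaining text pixels to the nearest kernel in embedding space.
    ys, xs = np.where(text_mask & (labels == 0))
    for y, x in zip(ys, xs):
        d = np.linalg.norm(means - sim_vectors[y, x], axis=1)
        k = int(d.argmin())
        if d[k] < dist_thresh:
            out[y, x] = k + 1
    return out
```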


Subject(s)
Algorithms , Benchmarking , Electric Power Supplies
3.
Sensors (Basel); 23(1), 2022 Dec 24.
Article in English | MEDLINE | ID: mdl-36616796

ABSTRACT

Speech emotion recognition (SER) is one of the most exciting topics that many researchers have recently been involved in. Although much research has been conducted on this topic, emotion recognition from non-verbal speech (known as the vocal burst) remains sparse. The vocal burst is brief and carries no semantic content, which makes it harder to deal with than verbal speech. Therefore, in this paper, we propose a self-relation attention and temporal awareness (SRA-TA) module to tackle this problem with vocal bursts; it captures long-term dependencies and focuses on the salient parts of the audio signal. Our proposed method consists of three main stages. First, latent features are extracted with a self-supervised learning model from the raw audio signal and its Mel-spectrogram. After the SRA-TA module captures the valuable information in the latent features, all features are concatenated and fed into ten individual fully-connected layers to predict the scores of ten emotions. Our proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, ranking first in the high-dimensional emotion task of the 2022 ACII Affective Vocal Burst Workshop & Challenge.
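
For reference, the reported metric, the concordance correlation coefficient, is straightforward to compute; below is a small self-contained implementation averaged over the ten emotion dimensions (the array shapes are illustrative assumptions).

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D score arrays."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))
    return 2 * cov / (y_true.var() + y_pred.var() + (mu_t - mu_p) ** 2)

def mean_ccc(true_scores, pred_scores):
    """Average CCC over emotion dimensions; inputs are (N, 10) arrays."""
    return np.mean([ccc(true_scores[:, i], pred_scores[:, i])
                    for i in range(true_scores.shape[1])])
```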


Subject(s)
Emotions , Speech Perception , Speech , Attention
4.
Front Oncol; 11: 697178, 2021.
Article in English | MEDLINE | ID: mdl-34660267

ABSTRACT

Segmentation of liver tumors from Computerized Tomography (CT) images remains a challenge due to the natural variation in tumor shape and structure as well as the noise in CT images. A key assumption is that the performance of liver tumor segmentation depends on the characteristics of multiple features extracted by multiple filters. In this paper, we design an enhanced approach based on a two-class (liver, tumor) convolutional neural network that discriminates both tumor and liver in CT images. First, the contrast and intensity values in the CT images are adjusted and high frequencies are removed using Hounsfield unit (HU) filtering and standardization. Then, the liver tumor is segmented from the entire image with a multiple-filter U-net (MFU-net). Finally, a quantitative analysis is carried out to evaluate the segmentation results using three kinds of metrics: boundary-distance-based, size-based, and overlap-based. The proposed method is validated on CT images from the 3Dircadb and LiTS datasets. The results demonstrate that the multiple filters are useful for extracting local and global features simultaneously and for minimizing boundary-distance errors, and that our approach performs better in heterogeneous tumor regions of CT images.
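
As a sketch of the preprocessing and of one overlap-based metric, the following assumes a liver-typical HU window of [-100, 400]; the actual window and normalization constants used by the authors are not given in the abstract.

```python
import numpy as np

def preprocess_ct(volume_hu, hu_min=-100.0, hu_max=400.0):
    """Clip a CT volume to an assumed liver HU window, then standardize."""
    clipped = np.clip(volume_hu, hu_min, hu_max)
    return (clipped - clipped.mean()) / (clipped.std() + 1e-8)

def dice(pred_mask, true_mask):
    """Overlap-based metric of the kind used in the quantitative analysis."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    return 2.0 * inter / (pred_mask.sum() + true_mask.sum() + 1e-8)
```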

5.
Sensors (Basel); 21(15), 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-34372327

ABSTRACT

Besides facial or gesture-based emotion recognition, Electroencephalogram (EEG) data have been drawing attention thanks to their capability to counter the effect of deceptive external expressions of humans, like faces or speech. Emotion recognition based on EEG signals heavily relies on the features and their delineation, which requires the selection of feature categories converted from the raw signals and types of expressions that can display the intrinsic properties of an individual signal or a group of them. Moreover, the correlation or interaction among channels and frequency bands also contains crucial information for emotional state prediction, yet it is commonly disregarded in conventional approaches. Therefore, in our method, the correlation between the 32 channels and the frequency bands is exploited to enhance emotion prediction performance. The extracted features, chosen from the time domain, are arranged into feature-homogeneous matrices, with their positions following the corresponding electrodes placed on the scalp. Based on this 3D representation of the EEG signals, the model must be able to learn the local and global patterns that describe the short- and long-range relations of EEG channels, along with the embedded features. To deal with this problem, we propose a 2D CNN in which convolutional layers with different kernel sizes are assembled into a convolution block, combining features distributed over small and large regions. Ten-fold cross-validation on the DEAP dataset proves the effectiveness of our approach: we achieve average accuracies of 98.27% and 98.36% for arousal and valence binary classification, respectively.
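
A minimal sketch of such a convolution block in PyTorch: parallel 2D convolutions with different kernel sizes over the electrode-grid representation, concatenated so the block combines small- and large-region patterns. The grid size, feature count, channel counts, and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    def __init__(self, in_ch, out_ch_per_branch=16, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch_per_branch, k, padding=k // 2)
            for k in kernel_sizes
        ])
        self.act = nn.ReLU()

    def forward(self, x):
        # x: (B, F, H, W) feature-homogeneous matrices laid out like the scalp.
        return self.act(torch.cat([b(x) for b in self.branches], dim=1))

# Example: a batch of assumed 9x9 electrode grids with 8 time-domain features.
x = torch.randn(4, 8, 9, 9)
y = MultiKernelBlock(in_ch=8)(x)  # -> (4, 48, 9, 9)
```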


Subject(s)
Electroencephalography , Neural Networks, Computer , Arousal , Electrodes , Emotions , Humans
6.
Sensors (Basel); 21(13), 2021 Jul 02.
Article in English | MEDLINE | ID: mdl-34283090

ABSTRACT

One essential step in radiotherapy treatment planning is the segmentation of organs at risk in Computed Tomography (CT) images. Many recent studies have focused on several organs such as the lung, heart, esophagus, trachea, liver, aorta, kidney, and prostate. Among these, however, the esophagus is one of the most difficult organs to segment because of its small size, ambiguous boundary, and very low contrast in CT images. To address these challenges, we propose a fully automated framework for esophagus segmentation from CT images. The proposed method processes slice images from the original three-dimensional (3D) volume, so it does not require large computational resources. We employ a spatial attention mechanism with the atrous spatial pyramid pooling module to locate the esophagus effectively, which enhances the segmentation performance. To optimize our model, we use group normalization because its computation is independent of batch size and its performance is stable. We also use the simultaneous truth and performance level estimation (STAPLE) algorithm to obtain robust segmentation results: the model is first trained by k-fold cross-validation, and the candidate labels generated by each fold are then combined with the STAPLE algorithm, which improves the Dice and Hausdorff Distance scores of our segmentation results. Our method is evaluated on the SegTHOR and StructSeg 2019 datasets, and the experiments show that it outperforms the state-of-the-art methods in esophagus segmentation, which remains challenging in medical analysis.
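
The label-fusion step can be sketched as a compact EM loop in the spirit of STAPLE, estimating each fold's sensitivity and specificity together with a per-voxel consensus probability. Flattened binary masks, the initial parameter values, and the iteration count are assumptions; this is not the authors' implementation.

```python
import numpy as np

def staple(masks, n_iter=30, prior=None):
    """
    masks: (R, N) binary array, R candidate segmentations over N voxels.
    Returns the per-voxel posterior probability of foreground.
    """
    D = np.asarray(masks, dtype=float)
    f = D.mean() if prior is None else prior  # foreground prevalence
    R = D.shape[0]
    p = np.full(R, 0.9)  # per-rater sensitivity (initial guess)
    q = np.full(R, 0.9)  # per-rater specificity (initial guess)
    for _ in range(n_iter):
        # E-step: posterior that each voxel is truly foreground.
        a = f * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - f) * np.prod(np.where(D == 1, 1 - q[:, None], q[:, None]), axis=0)
        w = a / (a + b + 1e-12)
        # M-step: re-estimate each rater's performance parameters.
        p = (D * w).sum(axis=1) / (w.sum() + 1e-12)
        q = ((1 - D) * (1 - w)).sum(axis=1) / ((1 - w).sum() + 1e-12)
    return w  # threshold at 0.5 for the fused segmentation
```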


Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Algorithms , Esophagus/diagnostic imaging , Humans , Male , Tomography, X-Ray Computed
7.
BMC Bioinformatics; 22(1): 192, 2021 Apr 15.
Article in English | MEDLINE | ID: mdl-33858319

ABSTRACT

BACKGROUND: The Cox proportional hazards model is commonly used to predict the hazard ratio, which is the risk or probability of occurrence of an event of interest. However, the Cox proportional hazards model cannot directly generate an individual survival time. To do this, survival analysis with the Cox model converts the hazard ratio to survival times through distributions such as the exponential, Weibull, Gompertz, or log-normal distributions. In other words, to generate the survival time, the Cox model has to select a specific distribution over time. RESULTS: This study presents a method to predict the survival time by integrating a hazard network and a distribution function network. The Cox proportional hazards network from DeepSurv is adapted for the prediction of the hazard ratio, and a distribution function network is applied to generate the survival time. To evaluate the performance of the proposed method, we propose a new evaluation metric that calculates the intersection over union between the predicted curve and the ground truth. To further understand significant prognostic factors, we use the 1D gradient-weighted class activation mapping method to highlight the network activations as a heat-map visualization over the input data. The performance of the proposed method was experimentally verified and the results compared to other existing methods. CONCLUSIONS: Our results confirm that the combination of the two networks, the Cox proportional hazards network and the distribution function network, can effectively generate accurate survival times.
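
The proposed evaluation metric admits a simple sketch: intersection over union of the areas under the predicted and ground-truth survival curves on a shared time grid. The trapezoidal integration and the curve representation are assumptions.

```python
import numpy as np

def curve_iou(pred_curve, true_curve, times):
    """IoU of the areas under two survival curves sampled at `times`."""
    intersection = np.trapz(np.minimum(pred_curve, true_curve), times)
    union = np.trapz(np.maximum(pred_curve, true_curve), times)
    return intersection / (union + 1e-12)
```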


Subject(s)
Research Design , Probability , Proportional Hazards Models , Survival Analysis
8.
Sensors (Basel); 21(7), 2021 Mar 27.
Article in English | MEDLINE | ID: mdl-33801739

ABSTRACT

Emotion recognition plays an important role in human-computer interaction. Recent studies of video emotion recognition in the wild have run into difficulties related to occlusion, illumination, complex behavior over time, and auditory cues. State-of-the-art methods use multiple modalities, such as frame-level, spatiotemporal, and audio approaches; however, they have difficulty exploiting long-term dependencies in temporal information, capturing contextual information, and integrating multi-modal information. In this paper, we introduce a flexible multi-modal system for video-based emotion recognition in the wild. Our system tracks and votes on the significant faces corresponding to persons of interest in a video to classify seven basic emotions. The key contribution of this study is the use of face feature extraction with context-aware and statistical information for emotion recognition. We also build two model architectures to effectively exploit long-term dependencies in temporal information: a temporal-pyramid model and a spatiotemporal model with a "Conv2D+LSTM+3DCNN+Classify" architecture. Finally, we propose a best-selection ensemble to improve the accuracy of the multi-modal fusion; it selects the best combination of the spatiotemporal and temporal-pyramid models for classifying the seven basic emotions. In our experiments, we benchmark our system on the AFEW dataset and achieve high accuracy.
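
One plausible reading of the best-selection ensemble is an exhaustive search over model combinations on a validation set, keeping the combination whose averaged class probabilities score highest; the search strategy and the (N, 7) output shape are assumptions.

```python
import numpy as np
from itertools import combinations

def best_selection_ensemble(model_probs, val_labels):
    """model_probs: dict name -> (N, 7) probability array. Returns (names, accuracy)."""
    best_combo, best_acc = None, -1.0
    names = list(model_probs)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            fused = np.mean([model_probs[m] for m in combo], axis=0)
            acc = float((fused.argmax(axis=1) == val_labels).mean())
            if acc > best_acc:
                best_combo, best_acc = combo, acc
    return best_combo, best_acc
```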


Subject(s)
Awareness , Emotions , Humans , Photic Stimulation , Physical Therapy Modalities
9.
Appl Opt; 53(33): 7924-36, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25607869

ABSTRACT

Most methods for the detection and removal of specular reflections suffer from nonuniform highlight regions and/or nonconverged artifacts induced by discontinuities in the surface colors, especially when dealing with highly textured, multicolored images. In this paper, a novel noniterative, predefined-constraint-free method based on tensor voting is proposed to detect and remove the highlight components of a single color image. The distribution of diffuse and specular pixels in the original image is determined using the tensors' saliency analysis, instead of comparing color information among neighboring pixels. The recovered diffuse reflectance distribution is then used to remove the specular components. The proposed method is evaluated quantitatively and qualitatively on a dataset of highly textured, multicolored images. The experimental results show that our method outperforms other state-of-the-art techniques.
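
A heavily simplified sketch of the saliency analysis mentioned above: build a second-moment tensor from normalized color vectors in a local window and measure how "stick-like" it is via the eigenvalue gap. This illustrates tensor saliency only, not the paper's full voting scheme; the window radius and the diffuse/specular interpretation in the comments are assumptions.

```python
import numpy as np

def stick_saliency(image, y, x, radius=2):
    """image: (H, W, 3) float RGB; (y, x) at least `radius` pixels from the border."""
    patch = image[y - radius:y + radius + 1, x - radius:x + radius + 1]
    vecs = patch.reshape(-1, 3)
    vecs = vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-8)
    tensor = vecs.T @ vecs / len(vecs)  # 3x3 second-moment tensor
    eigvals = np.sort(np.linalg.eigvalsh(tensor))[::-1]
    # Large gap: neighboring colors agree on one chromaticity direction;
    # small gap: mixed diffuse + specular contributions.
    return eigvals[0] - eigvals[1]
```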

10.
J Comput Assist Tomogr; 35(2): 280-9, 2011.
Article in English | MEDLINE | ID: mdl-21412104

ABSTRACT

OBJECTIVE: This article presents a new computerized scheme that aims to accurately and robustly separate the left and right lungs on computed tomography (CT) examinations. METHODS: We developed and tested a method that separates the left and right lungs using sequential CT information and a guided dynamic programming algorithm with adaptively and automatically selected start and end points, even in the presence of especially severe and multiple connections. RESULTS: The scheme successfully identified and separated all 827 connections on the 4034 CT images in an independent testing set of CT examinations. The proposed scheme separated multiple connections regardless of their locations, and the guided dynamic programming algorithm reduced the computation time to approximately 4.6% of that of traditional dynamic programming while avoiding permeation of the separation boundary into normal lung tissue. CONCLUSIONS: The proposed method robustly and accurately disconnects all connections between the left and right lungs, and the guided dynamic programming algorithm removes redundant processing.
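
The core dynamic programming step can be sketched as a minimal-cost top-to-bottom path through a cost image (the junction between the lungs is brighter than aerated lung tissue, so intensity itself can serve as the cost). The guidance and banding logic of the paper is omitted; this is the plain DP baseline with assumed one-column moves per row.

```python
import numpy as np

def separation_path(cost):
    """cost: (H, W) array. Returns one column index per row (the separation seam)."""
    H, W = cost.shape
    acc = cost.astype(float).copy()
    back = np.zeros((H, W), dtype=int)
    for i in range(1, H):
        for j in range(W):
            lo, hi = max(j - 1, 0), min(j + 2, W)  # predecessor columns j-1..j+1
            k = lo + int(np.argmin(acc[i - 1, lo:hi]))
            back[i, j] = k
            acc[i, j] += acc[i - 1, k]
    # Backtrack from the cheapest endpoint on the last row.
    path = [int(acc[-1].argmin())]
    for i in range(H - 1, 0, -1):
        path.append(back[i, path[-1]])
    return path[::-1]
```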


Subject(s)
Algorithms , Imaging, Three-Dimensional/methods , Lung/diagnostic imaging , Pattern Recognition, Automated/methods , Pulmonary Disease, Chronic Obstructive/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Adult , Artificial Intelligence , Female , Humans , Male , Radiographic Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity , Subtraction Technique