Results 1 - 9 of 9
1.
IEEE J Biomed Health Inform ; 28(7): 3953-3964, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38652609

ABSTRACT

Emotion recognition from electroencephalogram (EEG) signals is a critical domain in biomedical research with applications ranging from mental disorder regulation to human-computer interaction. In this paper, we address two fundamental aspects of EEG emotion recognition: continuous regression of emotional states and discrete classification of emotions. While classification methods have garnered significant attention, regression methods remain relatively under-explored. To bridge this gap, we introduce MASA-TCN, a novel unified model that extends Temporal Convolutional Networks (TCNs) with spatial learning capabilities for EEG emotion regression and classification tasks. The key innovation is a space-aware temporal layer, which empowers the TCN to capture spatial relationships among EEG electrodes, enhancing its ability to discern nuanced emotional states. Additionally, we design a multi-anchor block with attentive fusion, enabling the model to adaptively learn dynamic temporal dependencies within the EEG signals. Experiments on two publicly available datasets show that MASA-TCN outperforms state-of-the-art methods on both EEG emotion regression and classification tasks.
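For orientation, the sketch below shows a dilated causal temporal convolution block, the kind of building block a TCN-based model such as MASA-TCN extends; the layer sizes, channel counts, and the absence of the space-aware and multi-anchor attention parts are illustrative assumptions, not the published architecture.

# Minimal sketch of a dilated causal TCN block (assumed dimensions, not the paper's model).
import torch
import torch.nn as nn

class CausalTCNBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        # Left-pad so the convolution only sees past samples (causal).
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()
        self.residual = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):                      # x: (batch, channels, time)
        y = nn.functional.pad(x, (self.pad, 0))
        y = self.relu(self.conv(y))
        return y + self.residual(x)            # residual connection

# Toy usage: 32 EEG channels, 256 time steps.
x = torch.randn(8, 32, 256)
block = CausalTCNBlock(32, 64, dilation=2)
print(block(x).shape)                          # torch.Size([8, 64, 256])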


Subject(s)
Electroencephalography , Emotions , Neural Networks, Computer , Signal Processing, Computer-Assisted , Humans , Electroencephalography/methods , Emotions/physiology , Emotions/classification , Algorithms
2.
Entropy (Basel) ; 25(10)2023 Oct 12.
Article in English | MEDLINE | ID: mdl-37895561

ABSTRACT

Multimodal emotion recognition (MER) refers to the identification and understanding of human emotional states by combining different signals, including, but not limited to, text, speech, and face cues. MER plays a crucial role in the human-computer interaction (HCI) domain. With the recent progression of deep learning technologies and the increasing availability of multimodal datasets, the MER domain has witnessed considerable development, resulting in numerous significant research breakthroughs. However, thorough, focused reviews of these deep learning-based MER achievements remain scarce. This survey aims to bridge this gap by providing a comprehensive overview of recent advancements in deep learning-based MER. The paper first analyzes the current multimodal datasets, emphasizing their advantages and constraints. Subsequently, we scrutinize diverse methods for multimodal emotional feature extraction, highlighting the merits and demerits of each method. Moreover, we analyze various MER algorithms, with particular focus on model-agnostic fusion methods (including early fusion, late fusion, and hybrid fusion) and fusion based on intermediate layers of deep models (encompassing simple concatenation fusion, utterance-level interaction fusion, and fine-grained interaction fusion). We assess the strengths and weaknesses of these fusion strategies, providing guidance to help researchers select the most suitable techniques for their studies. In summary, this survey aims to provide a thorough and insightful review of the field of deep learning-based MER and is intended as a valuable guide to aid researchers in furthering the evolution of this dynamic and impactful field.
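As a rough illustration of the model-agnostic fusion categories mentioned above, the toy sketch below contrasts early (feature-level) and late (decision-level) fusion for two modalities; the classifiers, feature dimensions, and synthetic data are placeholders, not any surveyed system.

# Toy early-vs-late fusion comparison (synthetic data, placeholder classifiers).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
audio_feats = rng.normal(size=(n, 20))          # e.g. speech features
text_feats = rng.normal(size=(n, 30))           # e.g. text embeddings
labels = rng.integers(0, 2, size=n)

# Early fusion: concatenate modality features, train one classifier.
early_clf = LogisticRegression(max_iter=1000).fit(
    np.hstack([audio_feats, text_feats]), labels)

# Late fusion: train one classifier per modality, average their probabilities.
clf_a = LogisticRegression(max_iter=1000).fit(audio_feats, labels)
clf_t = LogisticRegression(max_iter=1000).fit(text_feats, labels)
late_probs = (clf_a.predict_proba(audio_feats) + clf_t.predict_proba(text_feats)) / 2
late_preds = late_probs.argmax(axis=1)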

3.
Front Neurorobot ; 16: 987146, 2022.
Article in English | MEDLINE | ID: mdl-36187564

ABSTRACT

In this paper, we investigate a challenging but interesting task in the research of speech emotion recognition (SER), i.e., cross-corpus SER. Unlike the conventional SER, the training (source) and testing (target) samples in cross-corpus SER come from different speech corpora, which results in a feature distribution mismatch between them. Hence, the performance of most existing SER methods may sharply decrease. To cope with this problem, we propose a simple yet effective deep transfer learning method called progressive distribution adapted neural networks (PDAN). PDAN employs convolutional neural networks (CNN) as the backbone and the speech spectrum as the inputs to achieve an end-to-end learning framework. More importantly, its basic idea for solving cross-corpus SER is very straightforward, i.e., enhancing the backbone's corpus invariant feature learning ability by incorporating a progressive distribution adapted regularization term into the original loss function to guide the network training. To evaluate the proposed PDAN, extensive cross-corpus SER experiments on speech emotion corpora including EmoDB, eNTERFACE, and CASIA are conducted. Experimental results showed that the proposed PDAN outperforms most well-performing deep and subspace transfer learning methods in dealing with the cross-corpus SER tasks.
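The sketch below illustrates the general idea of adding a distribution-adaptation regularizer to the task loss; the simple linear-kernel MMD term, the weight lam, and the function names are stand-in assumptions, not PDAN's progressive regularizer.

# Sketch of a task loss plus a distribution-adaptation regularizer (stand-in for PDAN's term).
import torch

def linear_mmd(source_feats, target_feats):
    # Squared distance between the mean embeddings of the two domains.
    return (source_feats.mean(dim=0) - target_feats.mean(dim=0)).pow(2).sum()

def total_loss(logits, labels, source_feats, target_feats, lam=0.1):
    task = torch.nn.functional.cross_entropy(logits, labels)   # source supervision
    adapt = linear_mmd(source_feats, target_feats)             # domain alignment
    return task + lam * adapt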

4.
Entropy (Basel) ; 24(9)2022 Sep 05.
Article in English | MEDLINE | ID: mdl-36141136

ABSTRACT

In this paper, we focus on a challenging, but interesting, task in speech emotion recognition (SER), i.e., cross-corpus SER. Unlike conventional SER, a feature distribution mismatch may exist between the labeled source (training) and target (testing) speech samples in cross-corpus SER because they come from different speech emotion corpora, which degrades the performance of most well-performing SER methods. To address this issue, we propose a novel transfer subspace learning method called multiple distribution-adapted regression (MDAR) to bridge the gap between speech samples from different corpora. Specifically, MDAR aims to learn a projection matrix to build the relationship between the source speech features and emotion labels. A novel regularization term called multiple distribution adaption (MDA), consisting of a marginal and two conditional distribution-adapted operations, is designed to collaboratively enable such a discriminative projection matrix to be applicable to the target speech samples, regardless of speech corpus variance. Consequently, by resorting to the learned projection matrix, we are able to predict the emotion labels of target speech samples when only the source label information is given. To evaluate the proposed MDAR method, extensive cross-corpus SER tasks based on three different speech emotion corpora, i.e., EmoDB, eNTERFACE, and CASIA, were designed. Experimental results showed that the proposed MDAR outperformed most recent state-of-the-art transfer subspace learning methods and even performed better than several well-performing deep transfer learning methods in dealing with cross-corpus SER tasks.
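For intuition, the sketch below learns a projection matrix from source speech features to one-hot emotion labels via plain ridge regression; MDAR's marginal and conditional distribution-adaption regularizers are omitted, so this is only the base regression step, with made-up dimensions and data.

# Minimal ridge-regression sketch of the projection-matrix idea (MDAR regularizers omitted).
import numpy as np

def fit_projection(X_src, Y_src, lam=1.0):
    # W minimizes ||X W - Y||^2 + lam * ||W||^2  (closed-form ridge solution).
    d = X_src.shape[1]
    return np.linalg.solve(X_src.T @ X_src + lam * np.eye(d), X_src.T @ Y_src)

rng = np.random.default_rng(0)
X_src = rng.normal(size=(100, 40))                  # source speech features (toy)
Y_src = np.eye(4)[rng.integers(0, 4, size=100)]     # one-hot emotion labels (toy)
W = fit_projection(X_src, Y_src)
X_tgt = rng.normal(size=(30, 40))                   # target-corpus features (toy)
pred = (X_tgt @ W).argmax(axis=1)                   # predicted emotion labels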

5.
Entropy (Basel) ; 24(8)2022 Jul 29.
Article in English | MEDLINE | ID: mdl-36010710

ABSTRACT

Cross-corpus speech emotion recognition (SER) is a challenging task whose difficulty lies in the mismatch between the feature distributions of the training (source domain) and testing (target domain) data, which degrades performance when the model deals with new-domain data. Previous works explore domain adaptation (DA) to eliminate the domain shift between the source and target domains and have achieved promising performance in SER. However, these methods mainly treat cross-corpus tasks simply as a DA problem, directly aligning the distributions across domains in a common feature space. In this case, excessively narrowing the domain distance impairs the emotion discrimination of speech features, since it is difficult to maintain the completeness of the emotion space with an emotion classifier alone. To overcome this issue, we propose a progressively discriminative transfer network (PDTN) for cross-corpus SER, which can enhance the emotion discrimination ability of speech features while eliminating the mismatch between the source and target corpora. In detail, we design two special losses in the feature layers of PDTN, i.e., an emotion discriminant loss Ld and a distribution alignment loss La. By incorporating prior knowledge of speech emotion into feature learning (i.e., high- and low-valence speech emotion features have their respective cluster centers), we integrate a valence-aware center loss Lv and an emotion-aware center loss Lc into Ld to guarantee discriminative learning of speech emotions beyond the emotion classifier alone. Furthermore, a multi-layer distribution alignment loss La is adopted to more precisely eliminate the discrepancy of feature distributions between the source and target domains. Finally, by optimizing PDTN with a combination of three losses, i.e., the cross-entropy loss Le, Ld, and La, we can gradually eliminate the domain mismatch between the source and target corpora while maintaining the emotion discrimination of speech features. Extensive experimental results on six cross-corpus tasks over three datasets, i.e., Emo-DB, eNTERFACE, and CASIA, reveal that the proposed PDTN outperforms state-of-the-art methods.
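The sketch below shows the shape of a three-part objective like Le + Ld + La; the plain center-loss form of Ld and the mean-difference form of La, as well as the weights w_d and w_a, are simplified assumptions rather than the paper's valence-aware and multi-layer versions.

# Simplified sketch of a combined Le + Ld + La objective (not PDTN's exact losses).
import torch

def center_loss(feats, labels, centers):
    # Pull each sample's feature toward the center of its emotion class.
    return (feats - centers[labels]).pow(2).sum(dim=1).mean()

def pdtn_style_loss(logits, labels, feats_src, feats_tgt, centers, w_d=0.1, w_a=0.1):
    le = torch.nn.functional.cross_entropy(logits, labels)            # Le
    ld = center_loss(feats_src, labels, centers)                      # Ld (simplified)
    la = (feats_src.mean(dim=0) - feats_tgt.mean(dim=0)).pow(2).sum() # La (simplified)
    return le + w_d * ld + w_a * la

# Toy call with random tensors.
feats_s, feats_t = torch.randn(16, 8), torch.randn(16, 8)
logits, labels = torch.randn(16, 4), torch.randint(0, 4, (16,))
centers = torch.randn(4, 8)
print(pdtn_style_loss(logits, labels, feats_s, feats_t, centers))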

6.
Front Psychiatry ; 12: 837149, 2021.
Article in English | MEDLINE | ID: mdl-35368726

ABSTRACT

The main characteristic of depression is emotional dysfunction, manifested by increased levels of negative emotions and decreased levels of positive emotions. Therefore, accurate emotion recognition is an effective way to assess depression. Among the various signals used for emotion recognition, the electroencephalogram (EEG) has attracted widespread attention due to its advantages, such as the rich spatiotemporal information in multi-channel EEG signals. First, we use filtering and Euclidean alignment for data preprocessing. For feature extraction, we use the short-time Fourier transform and the Hilbert-Huang transform to extract time-frequency features, and convolutional neural networks to extract spatial features. Finally, a bi-directional long short-term memory network models the temporal relationships. Before the convolution operation, the EEG features are converted into 3D tensors according to the topology of the EEG channels. This study achieved good results on two emotion databases: SEED and the Emotional BCI database of the 2020 WORLD ROBOT COMPETITION. We applied this method to EEG-based depression recognition and achieved a recognition rate of more than 70% under five-fold cross-validation. In addition, the subject-independent protocol on the SEED data achieved a state-of-the-art recognition rate, exceeding existing methods. We propose a novel EEG emotion recognition framework for depression detection, which provides a robust algorithm for real-time clinical depression detection based on EEG.
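As a concrete picture of the time-frequency step, the sketch below applies a short-time Fourier transform to each EEG channel and stacks the magnitude spectrograms into a (channels, frequencies, frames) array that a CNN + BiLSTM pipeline could consume; the channel count, sampling rate, and window length are assumptions, and the data are random placeholders.

# Sketch of STFT time-frequency feature extraction per EEG channel (assumed parameters).
import numpy as np
from scipy.signal import stft

fs = 200                                   # assumed sampling rate (Hz)
eeg = np.random.randn(62, fs * 4)          # 62 channels, 4 s of toy data

features = []
for channel in eeg:
    f, t, Zxx = stft(channel, fs=fs, nperseg=fs)   # 1 s windows
    features.append(np.abs(Zxx))                   # magnitude spectrogram
features = np.stack(features)              # shape: (62, freqs, frames)
print(features.shape)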

7.
IEEE Trans Neural Syst Rehabil Eng ; 28(11): 2401-2410, 2020 11.
Article in English | MEDLINE | ID: mdl-32991285

ABSTRACT

It has been reported that the symptoms of autism spectrum disorder (ASD) can be improved by effective early interventions, which creates an urgent need for large-scale early identification of ASD. Until now, the screening of ASD has relied on a child psychiatrist collecting the medical history and conducting behavioral observations with the help of psychological assessment tools. Such screening measures inevitably have disadvantages, including strong subjectivity, reliance on experts, and low efficiency. With the development of computer science, it is possible to realize computer-aided screening for ASD and alleviate the disadvantages of manual evaluation. In this study, we propose a behavior-based automated screening method to identify high-risk ASD (HR-ASD) in babies aged 8-24 months. The still-face paradigm (SFP) was used to elicit the baby's spontaneous social behavior through a face-to-face interaction, in which the mother was required to maintain a normal interaction to amuse her baby for 2 minutes (a baseline episode) and then suddenly switch to a no-reaction, no-expression state for 1 minute (a still-face episode). Multiple cues derived from the baby's social stress response behavior during the latter episode, including head movements, facial expressions, and vocal characteristics, were statistically compared between the HR-ASD and typically developing (TD) groups. An automated identification model of HR-ASD was constructed based on these multi-cue features and a support vector machine (SVM) classifier; its screening performance was satisfactory, as accuracy, specificity, and sensitivity all exceeded 90% on the cases included in this study. The experimental results suggest its feasibility for the early screening of HR-ASD.
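To make the classification step concrete, the sketch below runs a cross-validated SVM on a matrix of multi-cue behavioral features; the feature count, the RBF kernel, the C value, and the synthetic data are illustrative assumptions, not the study's configuration or data.

# Illustrative SVM screening pipeline on placeholder multi-cue features.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))          # 60 infants x 12 multi-cue features (toy)
y = rng.integers(0, 2, size=60)        # 1 = HR-ASD, 0 = TD (toy labels)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())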


Subject(s)
Autism Spectrum Disorder , Autism Spectrum Disorder/diagnosis , Child , Cues , Feasibility Studies , Female , Humans , Infant , Mass Screening , Social Behavior
8.
Front Pediatr ; 8: 290, 2020.
Article in English | MEDLINE | ID: mdl-32582594

ABSTRACT

Background: Although autism spectrum disorder (ASD) can currently be diagnosed at the age of 2 years, the age at ASD diagnosis is still 40 months or even later. To enable earlier and more objective screening for ASD, behavioral videos have been used in a number of recent studies. Method: The still-face paradigm (SFP) was adopted to measure the frequency and duration of non-social smiling, protest behavior, eye contact, social smiling, and active social engagement in a high-risk ASD group (HR) and a typical development group (TD) (HR: n = 45; TD: n = 43). The HR group was followed up until the age of 2 years to confirm the final diagnosis. Machine learning methods were used to establish models for early screening of ASD. Results: During the face-to-face interaction (FF) episode of the SFP, there were statistically significant differences in the duration and frequency of eye contact, social smiling, and active social engagement between the two groups. During the still-face (SF) episode, there were statistically significant differences in the duration and frequency of eye contact and active social engagement between the two groups. The 45 children in the HR group were reclassified into two groups after follow-up: five children in the N-ASD group who did not meet the criteria for ASD, and 40 children in the ASD group. The results showed that the accuracy of Support Vector Machine (SVM) classification was 83.35% for the SF episode. Conclusion: Social behavior indicators from the SFP obtained before the age of 2 years can effectively predict the clinical diagnosis of a high-risk child at the age of 2 years. The screening model constructed using SVM based on the SF episode of the SFP performed best. This also shows that the SFP has value for screening high-risk children for autism spectrum disorder. In addition, because of its convenience, it can provide a self-screening mode for use at home. Trial registration: Chinese Clinical Trial Registry, ChiCTR-OPC-17011995.
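The sketch below shows one way coded behavior events from a single SFP episode could be turned into the frequency/duration features described above; the behavior list, event times, and feature layout are made up for illustration and are not the study's coding scheme.

# Sketch of frequency/duration feature construction from coded SFP events (hypothetical coding).
import numpy as np

BEHAVIORS = ["eye_contact", "social_smiling", "active_engagement",
             "non_social_smiling", "protest"]

def episode_features(events):
    """events: list of (behavior, start_s, end_s) tuples for one episode."""
    feats = []
    for b in BEHAVIORS:
        spans = [(end - start) for name, start, end in events if name == b]
        feats += [len(spans), float(np.sum(spans))]   # frequency, total duration
    return np.array(feats)

toy_events = [("eye_contact", 3.0, 4.5), ("social_smiling", 10.0, 12.0),
              ("eye_contact", 20.0, 21.0)]
print(episode_features(toy_events))   # 10-dim vector for a downstream classifier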

9.
Zhongguo Dang Dai Er Ke Za Zhi ; 22(4): 361-367, 2020 Apr.
Article in Chinese | MEDLINE | ID: mdl-32312376

ABSTRACT

OBJECTIVE: To study the characteristics of vocalization during the still-face paradigm (SFP) before the age of 2 years and their correlation with the severity of autism spectrum disorder (ASD) symptoms at diagnosis in children with ASD. METHODS: A total of 43 children aged 7-23 months, who were suspected of ASD, were enrolled as the suspected ASD group, and 37 typical development (TD) children, aged 7-23 months, were enrolled as the TD group. The frequency and durations of vocalization in the SFP were measured. The children in the suspected ASD group were followed up to the age of 2 years, and 34 children were diagnosed with ASD. Autism Diagnostic Observation Schedule (ADOS) was used to assess the severity of symptoms. The correlation of the characteristics of vocalization before the age of 2 years with the severity of ASD symptoms was analyzed. RESULTS: Compared with the TD group, the ASD group had significant reductions in the frequency and durations of meaningful vocalization and vocalization towards people and a significant increase in the duration of vocalization toward objects (P<0.05). The Spearman correlation analysis showed that in the ASD group, the frequency and durations of total vocalization, non-speech vocalization, babbling, vocalization towards people, and vocalization towards objects were negatively correlated with the score of communication in ADOS (P<0.05). The frequency and durations of total vocalization, babbling, and vocalization towards people and the duration of vocalization towards objects were negatively correlated with the score of reciprocal social interaction in ADOS (P<0.05). The frequency of total vocalization, the duration of babbling, and the frequency and duration of vocalization towards people were negatively correlated with the score of play in ADOS (P<0.05). The frequency of total vocalization and non-speech vocalization and the frequency and durations of vocalization towards people were negatively correlated with the score of stereotyped behaviors and restricted interests in ADOS (P<0.05). The multiple linear regression analysis showed that the frequency of total vocalization was a negative predictive factor for the score of communication in ADOS (P<0.001), and the duration of vocalization towards people was a negative predictive factor for the score of reciprocal social interaction in ADOS (P<0.05). CONCLUSIONS: SFP can better highlight the abnormal vocalization of ASD children before the age of 2 years, and such abnormalities can predict the severity of ASD symptoms early.
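For readers unfamiliar with the analysis, the sketch below reproduces its general shape: a Spearman correlation between one vocalization measure and one ADOS domain score, followed by a multiple linear regression; the variable names, distributions, and data are synthetic placeholders, not the study's measurements.

# Sketch of Spearman correlation and multiple linear regression on placeholder data.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 34                                        # children diagnosed with ASD
total_vocal_freq = rng.poisson(20, size=n)    # toy vocalization counts
vocal_to_people_dur = rng.gamma(2.0, 5.0, size=n)
ados_communication = rng.integers(2, 9, size=n)

rho, p = spearmanr(total_vocal_freq, ados_communication)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")

X = np.column_stack([total_vocal_freq, vocal_to_people_dur])
reg = LinearRegression().fit(X, ados_communication)
print(reg.coef_, reg.intercept_)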


Subject(s)
Autism Spectrum Disorder , Humans , Infant , Interpersonal Relations