1.
Int J Neural Syst ; 34(5): 2450024, 2024 May.
Article in English | MEDLINE | ID: mdl-38533631

ABSTRACT

Emotion recognition plays an essential role in human-human interaction since it is key to understanding the emotional states and reactions of human beings as they experience events and engagements in everyday life. Moving towards human-computer interaction, the study of emotions becomes fundamental because it underlies the design of advanced systems supporting a broad spectrum of application areas, including forensic, rehabilitative, educational, and many others. An effective method for discriminating emotions is based on ElectroEncephaloGraphy (EEG) data analysis, which is used as input for classification systems. However, collecting brain signals across several channels and for a wide range of emotions produces cumbersome datasets that are hard to manage, transmit, and use in varied applications. In this context, the paper introduces the Empátheia system, which explores a different EEG representation by encoding EEG signals into images prior to their classification. In particular, the proposed system extracts spatio-temporal image encodings, or atlases, from EEG data through the Processing and transfeR of Interaction States and Mappings through Image-based eNcoding (PRISMIN) framework, thus obtaining a compact representation of the input signals. The atlases are then classified through the Empátheia architecture, which comprises branches based on convolutional, recurrent, and transformer models designed and tuned to capture the spatial and temporal aspects of emotions. Extensive experiments were conducted on the public Shanghai Jiao Tong University (SJTU) Emotion EEG Dataset (SEED), where the proposed system significantly reduced the input data size while retaining high classification performance. The results highlight the effectiveness of the proposed approach and suggest new avenues for data representation in emotion recognition from EEG signals.
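As a hedged illustration of the multi-branch classification stage described above, the sketch below combines convolutional, recurrent, and transformer branches over image-encoded EEG atlases. It is not the published Empátheia/PRISMIN code: the atlas resolution, every layer size, and the class count are illustrative assumptions.

```python
# Minimal sketch of a multi-branch classifier over image-encoded EEG "atlases".
# All sizes and names are illustrative assumptions, not the published system.
import torch
import torch.nn as nn

class MultiBranchEEGClassifier(nn.Module):
    def __init__(self, n_classes=3, img_size=64):
        super().__init__()
        # Convolutional branch: spatial structure of the atlas image
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Recurrent branch: treat atlas rows as a temporal sequence
        self.gru = nn.GRU(input_size=img_size, hidden_size=32, batch_first=True)
        # Transformer branch: atlas rows as tokens
        enc_layer = nn.TransformerEncoderLayer(d_model=img_size, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.head = nn.Linear(32 + 32 + img_size, n_classes)

    def forward(self, x):                       # x: (B, 1, H, W)
        c = self.conv(x)                        # (B, 32)
        rows = x.squeeze(1)                     # (B, H, W)
        _, h = self.gru(rows)                   # h: (1, B, 32)
        t = self.transformer(rows).mean(dim=1)  # (B, W)
        return self.head(torch.cat([c, h.squeeze(0), t], dim=1))

model = MultiBranchEEGClassifier()
logits = model(torch.randn(4, 1, 64, 64))       # four dummy atlases
```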


Subjects
Brain , Emotions , Humans , China , Electroencephalography/methods , Compulsive Behavior
2.
Comput Methods Programs Biomed ; 245: 108037, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38271793

ABSTRACT

BACKGROUND: Aortic stenosis is a common heart valve disease that mainly affects older people in developed countries. Its early detection is crucial to prevent irreversible disease progression and, eventually, death. A typical screening technique to detect stenosis uses echocardiograms; however, variations introduced by other tissues, camera movements, and uneven lighting can hamper visual inspection, leading to misdiagnosis. To address these issues, effective solutions involve employing deep learning algorithms to assist clinicians in detecting and classifying stenosis through models that can predict this pathology from single heart views. Although promising, the visual information conveyed by a single image may not be sufficient for an accurate diagnosis, especially when using an automatic system; thus, different solutions should be explored. METHODOLOGY: Following this rationale, this paper proposes a novel deep learning architecture, composed of a multi-view, multi-scale feature extractor and a transformer encoder (MV-MS-FETE), to predict stenosis from parasternal long- and short-axis views. In particular, starting from these views, the designed model extracts relevant features at multiple scales along its feature extractor component and takes advantage of a transformer encoder to perform the final classification. RESULTS: Experiments were performed on the recently released Tufts medical echocardiogram public dataset, which comprises 27,788 images split into training, validation, and test sets. Due to the recent release of this collection, tests were also conducted on several state-of-the-art models to create multi-view and single-view benchmarks. For all models, standard classification metrics were computed (e.g., precision, F1-score). The obtained results show that the proposed approach outperforms other multi-view methods in terms of accuracy and F1-score, and has more stable performance throughout the training procedure. Furthermore, the experiments also highlight that multi-view methods generally perform better than their single-view counterparts. CONCLUSION: This paper introduces a novel multi-view and multi-scale model for aortic stenosis recognition, together with three benchmarks to evaluate it, providing multi-view and single-view comparisons that fully highlight the model's effectiveness in aiding clinicians while also producing several baselines for the aortic stenosis recognition task.
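The following is a minimal sketch of a multi-view, multi-scale extractor feeding a transformer encoder, in the spirit of the MV-MS-FETE design outlined above; all layer sizes, scale counts, and the two-class head are assumptions rather than the paper's actual configuration.

```python
# Hedged sketch: per-view multi-scale tokens fused by a transformer encoder.
import torch
import torch.nn as nn

class MultiScaleExtractor(nn.Module):
    """Returns one token per scale for a single echocardiogram view."""
    def __init__(self, dim=64):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU())
        self.b2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.p1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim))
        self.p2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))

    def forward(self, x):
        f1 = self.b1(x)                                         # coarse scale
        f2 = self.b2(f1)                                        # finer scale
        return torch.stack([self.p1(f1), self.p2(f2)], dim=1)   # (B, 2 scales, dim)

class MVMSFETE(nn.Module):
    def __init__(self, n_classes=2, dim=64):
        super().__init__()
        self.long_axis = MultiScaleExtractor(dim)
        self.short_axis = MultiScaleExtractor(dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, plax, psax):  # parasternal long- and short-axis views
        tokens = torch.cat([self.long_axis(plax), self.short_axis(psax)], dim=1)
        return self.head(self.encoder(tokens).mean(dim=1))

model = MVMSFETE()
out = model(torch.randn(2, 1, 112, 112), torch.randn(2, 1, 112, 112))
```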


Subjects
Aortic Valve Stenosis , Humans , Aged , Constriction, Pathologic , Aortic Valve Stenosis/diagnostic imaging , Echocardiography , Heart , Algorithms
3.
Neural Netw ; 153: 386-398, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35785610

ABSTRACT

Improving existing neural network architectures can involve several design choices, such as manipulating the loss functions, employing a diverse learning strategy, exploiting gradient evolution at training time, optimizing the network hyper-parameters, or increasing the architecture depth. The latter approach is a straightforward solution, since it directly enhances the representation capabilities of a network; however, the increased depth generally incurs the well-known vanishing gradient problem. In this paper, borrowing from different methods that address this issue, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing gradient in relation to the object classification task. The presented methodology directly improves a convolutional neural network (CNN) by preserving information from the input image through interlaced auto-encoders (AEs), and further refines the base network architecture by means of skip and residual connections. To validate the presented methodology, a simple CNN and various implementations of well-known networks are extended via the SIRe strategy and extensively tested on five collections, i.e., MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and Caltech-256; the SIRe-extended architectures achieve significantly higher performance across all models and datasets, thus confirming the effectiveness of the presented approach.
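To make the interlaced auto-encoder idea concrete, the sketch below attaches a small reconstruction decoder to each residual block, so that gradients also flow through per-depth reconstruction losses alongside the classification loss. The block structure, depth, and loss weighting are illustrative assumptions, not the published SIRe implementation.

```python
# Hedged sketch of interlaced auto-encoders with residual connections:
# each stage yields features plus a reconstruction of the input image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SIReBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.decoder = nn.Conv2d(ch, 1, 1)  # lightweight per-depth decoder

    def forward(self, x):
        y = F.relu(x + self.conv(x))        # residual connection
        return y, self.decoder(y)           # features + reconstruction

class SIReNet(nn.Module):
    def __init__(self, n_classes=10, depth=3, ch=16):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, padding=1)
        self.blocks = nn.ModuleList(SIReBlock(ch) for _ in range(depth))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(ch, n_classes))

    def forward(self, x):
        h, recons = self.stem(x), []
        for blk in self.blocks:
            h, r = blk(h)
            recons.append(r)
        return self.head(h), recons

net = SIReNet()
x = torch.randn(8, 1, 28, 28)               # MNIST-sized dummy batch
logits, recons = net(x)
loss = F.cross_entropy(logits, torch.randint(0, 10, (8,)))
loss = loss + 0.1 * sum(F.mse_loss(r, x) for r in recons)  # multi-task objective
```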


Subjects
Learning , Neural Networks, Computer
4.
Int J Neural Syst ; 32(10): 2250040, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35881015

ABSTRACT

Human feelings expressed through verbal (e.g., voice) and non-verbal communication channels (e.g., face or body) can influence both human actions and interactions. In the literature, most of the attention has been given to facial expressions for the analysis of emotions conveyed through non-verbal behaviors. Nevertheless, psychology highlights that the body is an important indicator of the human affective state during daily life activities. Therefore, this paper presents a novel method for affective action and interaction recognition from videos, exploiting multi-view representation learning and only full-body handcrafted characteristics selected following psychological and proxemic studies. Specifically, 2D skeletal data are extracted from RGB video sequences to derive diverse low-level skeleton features, i.e., multi-views, modeled through the bag-of-visual-words clustering approach to generate a condition-related codebook. In this way, each affective action and interaction within a video can be represented as a frequency histogram of codewords. During the learning phase, for each affective class, training samples are used to compute a global histogram of codewords, which is stored in a database and later used for the recognition task. In the recognition phase, the video frequency histogram representation is matched against the database of class histograms and classified as the closest affective class in terms of Euclidean distance. The effectiveness of the proposed system is evaluated on a specifically collected dataset containing 6 emotions for both actions and interactions, on which the proposed system obtains 93.64% and 90.83% accuracy, respectively. In addition, when tested on a public collection containing 6 emotions plus a neutral state, the devised strategy achieves performance in line with other literature works based on deep learning, demonstrating the effectiveness of the presented approach and confirming the findings of psychological and proxemic studies.
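The bag-of-visual-words pipeline described above can be sketched with scikit-learn as follows: a k-means codebook over low-level skeleton descriptors, per-video codeword histograms, and nearest-class-histogram matching by Euclidean distance. The descriptor dimensionality, codebook size, and class labels are stand-ins, not the paper's settings.

```python
# Hedged sketch of the bag-of-visual-words recognition pipeline with dummy data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
K = 32  # codebook size (assumption)

# 1) Build a codebook from low-level skeleton descriptors of training videos.
train_descriptors = rng.normal(size=(2000, 10))   # stand-in skeleton features
codebook = KMeans(n_clusters=K, n_init=5, random_state=0).fit(train_descriptors)

def video_histogram(descriptors):
    """Represent a video as a normalized frequency histogram of codewords."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / max(hist.sum(), 1.0)

# 2) Learning phase: one global histogram per affective class, kept in a database.
class_histograms = {
    label: video_histogram(rng.normal(size=(300, 10)))
    for label in ["happy", "sad", "angry"]
}

# 3) Recognition: assign the class whose histogram is closest in Euclidean distance.
query = video_histogram(rng.normal(size=(120, 10)))
pred = min(class_histograms, key=lambda c: np.linalg.norm(query - class_histograms[c]))
print(pred)
```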


Subjects
Algorithms , Facial Expression , Cluster Analysis , Human Activities , Humans , Skeleton
5.
Comput Methods Programs Biomed ; 221: 106833, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35537296

ABSTRACT

BACKGROUND: Over the last year, the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and its variants have highlighted the importance of screening tools with high diagnostic accuracy for new illnesses such as COVID-19. In that regard, deep learning approaches have proven to be effective solutions for pneumonia classification, especially when considering chest X-ray images. However, this lung infection can also be caused by other viral, bacterial, or fungal pathogens. Consequently, efforts are being devoted to distinguishing the infection source to help clinicians diagnose the correct disease origin. Following this tendency, this study further explores the effectiveness of established neural network architectures on the pneumonia classification task through the transfer learning paradigm. METHODOLOGY: To present a comprehensive comparison, 12 well-known ImageNet pre-trained models were fine-tuned and used to discriminate among chest X-rays of healthy people and those showing pneumonia symptoms derived from either a viral (i.e., generic or SARS-CoV-2) or bacterial source. Furthermore, since a common public collection distinguishing between such categories is currently not available, two distinct datasets of chest X-ray images covering the aforementioned sources were combined and employed to evaluate the various architectures. RESULTS: The experiments were performed using a total of 6330 images split between training, validation, and test sets. For all models, standard classification metrics were computed (e.g., precision, F1-score), and most architectures obtained significant performances, reaching, among others, up to an 84.46% average F1-score when discriminating the four identified classes. Moreover, execution times, areas under the receiver operating characteristic curve (AUROC), confusion matrices, activation maps computed via the Grad-CAM algorithm, and additional experiments assessing the robustness of each model using only 50%, 20%, and 10% of the training set were also reported to present an informed discussion on the networks' classifications. CONCLUSION: This paper examines the effectiveness of well-known architectures on a joint collection of chest X-rays presenting pneumonia cases derived from either viral or bacterial sources, with particular attention to SARS-CoV-2 contagions among viral pathogens; it demonstrates that existing architectures can effectively diagnose pneumonia sources and suggests that the transfer learning paradigm could be a crucial asset in diagnosing future unknown illnesses.
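A minimal transfer-learning sketch of the setup described above follows: an ImageNet-pretrained backbone (here ResNet-18, one of many possible choices among the 12 evaluated families) with its classifier head replaced for the four classes. The learning rate, freezing schedule, and dummy batch are assumptions.

```python
# Hedged sketch of fine-tuning an ImageNet-pretrained model for 4-class
# pneumonia-source classification (healthy / viral / SARS-CoV-2 / bacterial).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 4)  # new 4-class head

# Optionally freeze the backbone and train only the new head first.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One dummy training step (replace with a real chest X-ray DataLoader).
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 4, (4,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```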


Subjects
COVID-19 , Deep Learning , Pneumonia , COVID-19/diagnostic imaging , Humans , Pneumonia/diagnostic imaging , SARS-CoV-2 , X-Rays
6.
Int J Neural Syst ; 32(5): 2250015, 2022 May.
Article in English | MEDLINE | ID: mdl-35209810

ABSTRACT

The increasing availability of wireless access points (APs) is leading toward human sensing applications based on Wi-Fi signals as support or alternative tools to the widespread visual sensors, since such signals make it possible to address well-known vision-related problems such as illumination changes or occlusions. Indeed, using image synthesis techniques to translate radio frequencies into the visible spectrum can become essential to obtain otherwise unavailable visual data. This domain-to-domain translation is feasible because both objects and people affect electromagnetic waves, causing variations in radio and optical frequencies. In the literature, models capable of inferring radio-to-visual feature mappings have gained momentum in the last few years, since frequency changes can be observed in the radio domain through the channel state information (CSI) of Wi-Fi APs, enabling signal-based feature extraction, e.g., amplitude. On this account, this paper presents a novel two-branch generative neural network that effectively maps radio data into visual features, following a teacher-student design that exploits a cross-modality supervision strategy. The latter conditions signal-based features on the visual domain so as to completely replace visual data. Once trained, the proposed method synthesizes human silhouette and skeleton videos using exclusively Wi-Fi signals. The approach is evaluated on publicly available data, where it obtains remarkable results for both silhouette and skeleton video generation, demonstrating the effectiveness of the proposed cross-modality supervision strategy.
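The cross-modality supervision strategy can be illustrated roughly as follows: a frozen visual teacher encodes silhouettes into features, a CSI student learns to predict those features, and a decoder maps them back to a silhouette. All shapes, including the 90-subcarrier CSI vector, are assumptions, and the sketch omits the paper's two-branch generative design details.

```python
# Hedged teacher-student sketch: visual teacher supervises a Wi-Fi CSI student.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(  # visual encoder (kept frozen during student training)
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(16 * 16, 128),
)
student = nn.Sequential(  # maps CSI amplitude vectors into the same feature space
    nn.Linear(90, 256), nn.ReLU(), nn.Linear(256, 128),
)
decoder = nn.Sequential(  # features -> coarse silhouette
    nn.Linear(128, 16 * 16), nn.Unflatten(1, (1, 16, 16)), nn.Sigmoid(),
)

csi = torch.randn(8, 90)               # dummy CSI amplitudes (90 subcarriers, assumed)
silhouette = torch.rand(8, 1, 16, 16)  # paired visual ground truth

with torch.no_grad():                  # teacher provides fixed supervision targets
    target_feat = teacher(F.interpolate(silhouette, size=(32, 32)))

feat = student(csi)
loss = F.mse_loss(feat, target_feat) \
     + F.binary_cross_entropy(decoder(feat), silhouette)
loss.backward()
```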


Subjects
Radio Waves , Wireless Technology , Humans , Skeleton
7.
Int J Neural Syst ; 31(2): 2050068, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33200620

ABSTRACT

Deception detection is a relevant ability in high-stakes situations such as police interrogations or court trials, where the outcome is highly influenced by the behavior of the interviewed person. With the use of specific devices, e.g., a polygraph or magnetic resonance, the subject is aware of being monitored and can change their behavior, thus compromising the interrogation result. For this reason, video analysis-based methods for automatic deception detection are receiving ever-increasing interest. In this paper, a deception detection approach based on RGB videos, leveraging both facial features and a stacked generalization ensemble, is proposed. First, the face, which is well known to present several meaningful cues for deception detection, is identified, aligned, and masked to build video signatures. These signatures are constructed from five different descriptors, which allow the system to capture both static and dynamic facial characteristics. Then, the video signatures are given as input to four base-level algorithms, which are subsequently fused by applying the stacked generalization technique, resulting in a more robust meta-level classifier used to predict deception. By exploiting relevant cues via specific features, the proposed system achieves improved performance on a public dataset of famous court trials with respect to other state-of-the-art methods based on facial features, highlighting the effectiveness of the proposed method.
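The stacked generalization step maps naturally onto scikit-learn's StackingClassifier, as in the hedged sketch below. The four base learners and the meta-level logistic regression are illustrative choices, since the abstract does not name them, and the features are dummies standing in for the five facial video signatures.

```python
# Hedged sketch of stacked generalization: four base classifiers fused by a
# meta-level classifier trained on out-of-fold base predictions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))       # stand-in video signatures
y = rng.integers(0, 2, size=200)     # deceptive vs. truthful labels

stack = StackingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=50)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier()),
    ],
    final_estimator=LogisticRegression(),  # meta-level classifier
    cv=5,  # base predictions come from out-of-fold data, as stacking requires
)
stack.fit(X, y)
print(stack.predict(X[:5]))
```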


Subjects
Cues , Deception , Algorithms , Humans
8.
Sensors (Basel) ; 20(18)2020 Sep 18.
Article in English | MEDLINE | ID: mdl-32962168

ABSTRACT

Person re-identification is concerned with matching people across disjoint camera views at different places and different time instants. This task is of great interest in computer vision, especially in video surveillance applications, where the re-identification and tracking of persons are required in uncontrolled, crowded spaces and after long time periods. The latter aspects are responsible for most of the currently unsolved problems of person re-identification; in fact, the presence of many people in a location, as well as the passing of hours or days, gives rise to important changes in people's visual appearance, for example, in clothes, lighting, and occlusions, thus making person re-identification a very hard task. In this paper, for the first time in the state-of-the-art, a meta-feature-based Long Short-Term Memory (LSTM) hashing model for person re-identification is presented. Starting from 2D skeletons extracted from RGB video streams, the proposed method computes a set of novel meta-features based on movement, gait, and bone proportions. These features are analyzed by a network composed of a single LSTM layer and two dense layers. The LSTM layer is used to create a pattern of the person's identity, and the dense layers are then used to generate a bodyprint hash through binary coding. The effectiveness of the proposed method is tested on three challenging datasets, i.e., iLIDS-VID, PRID 2011, and MARS. In particular, the reported results show that the proposed method, which is not based on the visual appearance of people, is fully competitive with other methods based on visual features. In addition, thanks to its skeleton model abstraction, the method offers a concrete contribution toward addressing open problems, such as long-term re-identification and severe illumination changes, which tend to heavily influence the visual appearance of persons.
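A minimal sketch of the LSTM-based bodyprint hashing idea described above: a sequence of skeleton meta-features passes through one LSTM layer and two dense layers, and the output is binarized into a hash code that can be matched by Hamming distance. The feature dimensionality, hidden size, and hash length are assumptions.

```python
# Hedged sketch of bodyprint hashing: one LSTM layer + two dense layers,
# with the final logits binarized into a hash code.
import torch
import torch.nn as nn

class BodyprintHasher(nn.Module):
    def __init__(self, feat_dim=20, hidden=64, hash_bits=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)  # identity pattern
        self.dense1 = nn.Linear(hidden, hidden)
        self.dense2 = nn.Linear(hidden, hash_bits)                # hash logits

    def forward(self, seq):                  # seq: (B, T, feat_dim)
        _, (h, _) = self.lstm(seq)           # last hidden state summarizes the sequence
        z = torch.tanh(self.dense1(h.squeeze(0)))
        return self.dense2(z)

hasher = BodyprintHasher()
meta_features = torch.randn(2, 50, 20)       # 50 frames of movement/gait/bone features
codes = (hasher(meta_features) > 0).int()    # binary bodyprint hash
# Identities can then be compared via Hamming distance between codes.
```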


Subjects
Algorithms , Long-Term Memory , Gait , Humans