ABSTRACT
Alzheimer's disease (AD) is a neurodegenerative disorder that affects millions of individuals worldwide. Real-world AD imaging datasets complicate the construction of reliable longitudinal models owing to imaging-modality uncertainty. In addition, existing models often fail to retain or acquire important information about disease progression between previous and follow-up time points; for example, the output values of the gates in current recurrent models can sit close to a specific value that indicates the model is uncertain about whether to retain or forget information. In this study, we propose a model that extracts and constrains each modality into a common representation space, capturing the inter-modality interactions associated with modality uncertainty to predict AD progression. In addition, we provide an auxiliary function that enables the recurrent gates to control the flow of information over longitudinal data robustly and effectively. We conducted a comparative analysis on data from the Alzheimer's Disease Neuroimaging Initiative database, and our model outperformed other methods across all evaluation metrics. The proposed model therefore provides a promising solution to the modality-uncertainty challenge in multimodal longitudinal AD progression prediction.
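A minimal sketch of what such an auxiliary gate term could look like, assuming the "specific value" indicating uncertainty is a sigmoid output of 0.5 and that the regularizer simply pushes gate activations away from it; the paper's exact formulation is not given here, and `gate_saturation_loss`, `forget_gate`, and `aux_weight` are illustrative names:

```python
import torch

def gate_saturation_loss(gate_values: torch.Tensor) -> torch.Tensor:
    """Penalize gate activations that linger near 0.5.

    A sigmoid gate output of ~0.5 means the cell neither clearly keeps
    nor clearly discards information.  The term 0.25 - (g - 0.5)^2 is
    largest (0.25) at g = 0.5 and zero at g in {0, 1}, so minimizing it
    pushes gates toward decisive values.
    """
    return (0.25 - (gate_values - 0.5).pow(2)).mean()

# Hypothetical usage: add the term to the main prediction loss.
# `forget_gate` would be collected from the recurrent cell at each step.
forget_gate = torch.sigmoid(torch.randn(8, 64))   # stand-in activations
aux_weight = 0.1                                  # assumed coefficient
total_aux = aux_weight * gate_saturation_loss(forget_gate)
print(float(total_aux))
```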
ABSTRACT
Parkinson's disease (PD) is one of the most common neurodegenerative disorders. The increasing demand for high-accuracy forecasts of disease progression has led to a surge in research employing multi-modality variables for prediction. For this review, we selected articles published from 2016 through June 2024 that strictly met our inclusion and exclusion criteria; each employed at least two types of variables among clinical, genetic, biomarker, and neuroimaging modalities. We comprehensively review and discuss the application of multi-modality approaches to predicting PD progression, along with the predictive mechanisms, advantages, and shortcomings of the key modalities involved. The findings suggest that integrating multiple modalities yields more accurate predictions than using fewer modalities under similar conditions. Furthermore, we identify several limitations of the existing field. Future studies that harness advances in multi-modality variables and machine learning algorithms can mitigate these limitations and enhance predictive accuracy for PD progression.
ABSTRACT
BACKGROUND AND OBJECTIVE: Survival analysis plays an essential role in the medical field for optimal treatment decision-making. Recently, survival analysis based on deep learning (DL) has been proposed and is demonstrating promising results. However, developing an ideal prediction model requires integrating large datasets across multiple institutions, which poses challenges concerning medical data privacy. METHODS: In this paper, we propose FedSurv, an asynchronous federated learning (FL) framework designed to predict survival time using clinical information and positron emission tomography (PET)-based features. This study used two datasets: a public radiogenomic dataset of non-small cell lung cancer (NSCLC) from The Cancer Imaging Archive (RNSCLC), and an in-house dataset from Chonnam National University Hwasun Hospital (CNUHH) in South Korea, consisting of clinical risk factors and F-18 fluorodeoxyglucose (FDG) PET images of NSCLC patients. Initially, each dataset was divided into multiple clients according to histological attributes, and each client was trained with the proposed DL model to predict individual survival time. The FL framework collected weights and parameters from the clients and incorporated them into the global model. Finally, the global model aggregated all weights and parameters and redistributed the updated model weights to each client. We evaluated different frameworks, including a single-client-based approach, centralized learning, and FL. RESULTS: We evaluated our method on the two independent datasets. On the RNSCLC dataset, the mean absolute error (MAE) was 490.80±22.95 d and the C-index was 0.69±0.01. On the CNUHH dataset, the MAE was 494.25±40.16 d and the C-index was 0.71±0.01. The FL approach matched the performance of the centralized method in PET-based survival time prediction and outperformed the single-client-based approaches. CONCLUSIONS: Our results demonstrate the feasibility and effectiveness of employing FL for individual survival prediction in NSCLC patients, using clinical information and PET-based features.
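The aggregation step can be pictured as a FedAvg-style weighted average. The abstract does not spell out FedSurv's asynchronous update rule, so the following is only a simplified, synchronous sketch with hypothetical names (`aggregate_weights`, per-client sample counts as weights):

```python
from typing import Dict, List
import numpy as np

def aggregate_weights(client_weights: List[Dict[str, np.ndarray]],
                      client_sizes: List[int]) -> Dict[str, np.ndarray]:
    """FedAvg-style aggregation: average each parameter tensor across
    clients, weighted by the number of training samples per client."""
    total = float(sum(client_sizes))
    global_weights = {}
    for name in client_weights[0]:
        global_weights[name] = sum(
            w[name] * (n / total) for w, n in zip(client_weights, client_sizes)
        )
    return global_weights

# Toy example with two clients holding one layer each.
c1 = {"fc.weight": np.ones((2, 2))}
c2 = {"fc.weight": np.zeros((2, 2))}
# Every entry is 0.75 = (300 * 1 + 100 * 0) / 400.
print(aggregate_weights([c1, c2], [300, 100])["fc.weight"])
```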
Subjects
Carcinoma, Non-Small-Cell Lung; Lung Neoplasms; Humans; Carcinoma, Non-Small-Cell Lung/diagnostic imaging; Lung Neoplasms/diagnostic imaging; Positron-Emission Tomography; Prognosis; Hospitals, University
ABSTRACT
Human facial emotion detection is a challenging task in computer vision. Owing to high inter-class variance, it is hard for machine learning models to predict facial emotions accurately, and the fact that one person can display several facial emotions adds to the diversity and complexity of the classification problem. In this paper, we propose a novel, intelligent approach to the classification of human facial emotions: a customized ResNet18, trained via transfer learning with a triplet loss function (TLF), followed by an SVM classification model. The pipeline consists of a face detector, which locates and refines the face bounding box, and a classifier, which identifies the facial expression class of the detected faces using deep features from the customized ResNet18. RetinaFace extracts the detected face regions from the source image, the ResNet18 model is trained on the cropped face images with triplet loss to produce those features, and an SVM classifier categorizes the facial expression from the acquired deep features. The proposed method outperforms state-of-the-art (SoTA) methods on the JAFFE and MMI datasets, achieving accuracies of 98.44% and 99.02%, respectively, on seven emotions; its performance on the FER2013 and AFFECTNET datasets, however, still needs fine-tuning.
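A condensed sketch of the embedding-plus-SVM stage under stated assumptions, not the authors' implementation: random tensors stand in for RetinaFace-cropped faces, and the triplet sampling strategy and full training loop are omitted:

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

# Assumed backbone: ResNet18 with the classifier head replaced so it
# emits 512-d embeddings, to be trained with a triplet margin loss.
backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()                 # keep the pooled 512-d feature
triplet_loss = nn.TripletMarginLoss(margin=1.0)

anchor = torch.randn(4, 3, 224, 224)        # stand-ins for cropped faces
positive = torch.randn(4, 3, 224, 224)      # same expression as anchor
negative = torch.randn(4, 3, 224, 224)      # different expression
loss = triplet_loss(backbone(anchor), backbone(positive), backbone(negative))
loss.backward()                             # one illustrative training step

# After training, frozen embeddings feed a conventional SVM classifier.
with torch.no_grad():
    feats = backbone(anchor).numpy()
svm = SVC(kernel="rbf").fit(feats, [0, 1, 2, 3])   # dummy emotion labels
print(svm.predict(feats))
```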
Subjects
Emotions; Support Vector Machine; Humans; Intelligence; Machine Learning
ABSTRACT
Diffuse large B-cell lymphoma (DLBCL) is a common and aggressive subtype of lymphoma, and accurate survival prediction is crucial for treatment decisions. This study develops a robust survival prediction strategy that effectively integrates various risk factors, including clinical risk factors and Deauville scores from positron-emission tomography/computed tomography at different treatment stages, using a deep-learning-based approach. We conduct a multi-institutional study on clinical data from 604 DLBCL patients and validate the model on 220 patients from an independent institution. We propose a survival prediction model using a transformer architecture and a categorical-feature-embedding technique that can handle high-dimensional and categorical data. Comparisons with deep-learning survival models such as DeepSurv, CoxTime, and CoxCC, based on the concordance index (C-index) and mean absolute error (MAE), demonstrate that the categorical features obtained with the transformer improved both the MAE and the C-index. The proposed model outperforms the best existing method by approximately 185 days in terms of the MAE for survival time estimation on the testing set. Using the Deauville score obtained during treatment yielded a 0.02 improvement in the C-index and a 53.71-day improvement in the MAE, highlighting its prognostic importance. Our deep-learning model could improve survival prediction accuracy and treatment personalization for DLBCL patients.
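A minimal sketch of the categorical-feature-embedding idea, assuming each categorical clinical variable is embedded as a token that a transformer encoder attends over; the dimensions, feature names, and cardinalities here are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class CategoricalEmbedding(nn.Module):
    """Map each categorical clinical variable (e.g., Deauville score,
    stage) to a learned vector so a transformer encoder can attend
    over the features as a sequence of tokens."""
    def __init__(self, cardinalities, dim=32):
        super().__init__()
        self.embeds = nn.ModuleList(nn.Embedding(c, dim) for c in cardinalities)

    def forward(self, x):                  # x: (batch, n_features) int codes
        tokens = [emb(x[:, i]) for i, emb in enumerate(self.embeds)]
        return torch.stack(tokens, dim=1)  # (batch, n_features, dim)

# Hypothetical features: Deauville score (5 levels), stage (4 levels).
emb = CategoricalEmbedding([5, 4])
tokens = emb(torch.tensor([[2, 1], [4, 3]]))
layer = nn.TransformerEncoderLayer(32, 4, batch_first=True)
encoder = nn.TransformerEncoder(layer, 2)
print(encoder(tokens).shape)               # torch.Size([2, 2, 32])
```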
ABSTRACT
Combating mental illnesses such as depression and anxiety has become a global concern. Out of the necessity of finding effective ways to battle these problems, machine learning approaches have been incorporated into healthcare systems for the diagnosis of mental health conditions and the prediction of probable treatment outcomes. With the growing interest in machine and deep learning methods, an analysis of existing work is needed to guide future research directions. In this study, 33 articles on the diagnosis of schizophrenia, depression, anxiety, bipolar disorder, post-traumatic stress disorder (PTSD), anorexia nervosa, and attention deficit hyperactivity disorder (ADHD) were retrieved from various search databases using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology. These publications were chosen for their use of machine learning and deep learning technologies and assessed individually, and their proposed methodologies were classified by the disorders covered in this study. In addition, the difficulties encountered by the researchers are discussed, and a list of public datasets is provided.
ABSTRACT
Alzheimer's disease (AD), if left untreated, is a leading cause of slowly progressive dementia; it is therefore critical to detect AD early to slow its progression. In this study, we propose a bidirectional progressive recurrent network with imputation (BiPro) that uses longitudinal data, including patient demographics and magnetic resonance imaging (MRI) biomarkers, to forecast clinical diagnoses and phenotypic measurements at multiple timepoints. To compensate for missing observations in the longitudinal data, an imputation module inspects both the temporal and multivariate relations associated with the mean and forward relations inherent in the time series. To encode the imputed information, we modify the long short-term memory (LSTM) cell with a progressive module that computes the progression score of each biomarker between a given timepoint and the baseline through a negative exponential function; these features are then used for the prediction tasks. The proposed system is an end-to-end deep recurrent network that can accomplish multiple tasks at the same time: (1) imputing missing values, (2) forecasting phenotypic measurements, and (3) predicting the clinical status of a patient from longitudinal data. We experimented on 1,335 participants from The Alzheimer's Disease Prediction of Longitudinal Evolution (TADPOLE) challenge cohort. The proposed method achieved a mean area under the receiver-operating characteristic curve (mAUC) of 78% for predicting clinical status, a mean absolute error (MAE) of 3.5 ml for forecasting MRI biomarkers, and an MAE of 6.9 ml for missing-value imputation. The results confirm that our proposed model outperforms prevalent approaches and can be used to help minimize the progression of Alzheimer's disease.
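The progression-score computation might look like the following sketch; the exact functional form is not given in the abstract, so the `1 - exp(-|Δ|)` shape and the values below are assumptions chosen only to illustrate the negative-exponential squashing:

```python
import torch

def progression_score(x_t: torch.Tensor, x_base: torch.Tensor) -> torch.Tensor:
    """Assumed progression score: squash the absolute change of a
    biomarker from baseline with a negative exponential, so no change
    gives 0 and large deviations saturate toward 1."""
    return 1.0 - torch.exp(-torch.abs(x_t - x_base))

baseline = torch.tensor([12.0])   # e.g., baseline ventricle volume (a.u.)
followup = torch.tensor([12.1])
print(progression_score(followup, baseline))  # small change -> small score
```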
Subjects
Alzheimer Disease; Alzheimer Disease/diagnostic imaging; Biomarkers; Forecasting; Humans; Magnetic Resonance Imaging/methods
ABSTRACT
Speech emotion recognition (SER) is one of the most exciting topics many researchers have recently been involved in. Although much research has been conducted on the topic, emotion recognition from non-verbal speech (known as vocal bursts) remains sparsely studied. Vocal bursts are brief and carry no verbal content, which makes them harder to deal with than verbal speech. Therefore, in this paper, we propose a self-relation attention and temporal awareness (SRA-TA) module to tackle this problem: it captures long-term dependencies and focuses on the salient parts of the audio signal. Our proposed method consists of three main stages. First, latent features are extracted by a self-supervised learning model from the raw audio signal and its Mel-spectrogram. After the SRA-TA module captures the valuable information from the latent features, all features are concatenated and fed into ten individual fully-connected layers to predict the scores of ten emotions. Our proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, ranking first in the high-dimensional emotion task of the 2022 ACII Affective Vocal Burst Workshop & Challenge.
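As a rough sketch of the self-relation idea, plain scaled dot-product self-attention over the latent frames already lets every frame weight every other frame; the actual SRA-TA module presumably adds its temporal-awareness mechanism on top of something like this, which is not reproduced here:

```python
import torch
import torch.nn.functional as F

def self_relation_attention(latents: torch.Tensor) -> torch.Tensor:
    """Plain scaled dot-product self-attention over latent frames:
    every frame attends to every other frame, so salient parts of a
    short vocal burst can be weighted over the whole clip."""
    d = latents.size(-1)
    scores = latents @ latents.transpose(-2, -1) / d ** 0.5  # (B, T, T)
    return F.softmax(scores, dim=-1) @ latents               # (B, T, d)

x = torch.randn(2, 50, 256)  # 2 clips, 50 latent frames, 256-d features
print(self_relation_attention(x).shape)
```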
Subjects
Emotions; Speech Perception; Speech; Attention
ABSTRACT
With the development of sensing technologies and machine learning, techniques that can identify the emotions and inner states of a human through physiological signals such as electroencephalography (EEG) have been actively developed and applied to various domains, including automobiles, robotics, healthcare, and customer-support services. Thus, the demand for acquiring and analyzing EEG signals in real time is increasing. In this paper, we acquire a new EEG dataset based on discrete emotion theory, termed WeDea (Wireless-based EEG Data for emotion analysis), and propose a new analysis pipeline for it. As emotional stimuli for the collection of WeDea, we used video clips selected by 15 volunteers. Consequently, WeDea is a multi-way dataset measured while 30 subjects watched the 79 selected video clips in five different emotional states using a convenient portable headset device. Furthermore, we designed a framework for recognizing human emotional states using this new database. Practical results on different types of emotions show that WeDea is a promising resource for emotion analysis and can be applied in the field of neuroscience.
Subjects
Electroencephalography; Machine Learning; Databases, Factual; Electroencephalography/methods; Emotions/physiology; Humans
ABSTRACT
Segmentation of liver tumors from computerized tomography (CT) images remains a challenge because of the natural variation in tumor shape and structure and the noise in CT images. A key assumption is that liver tumor segmentation performance depends on the characteristics of multiple features extracted by multiple filters. In this paper, we design an enhanced approach based on a two-class (liver, tumor) convolutional neural network that discriminates both tumor and liver in CT images. First, the contrast and intensity values of the CT images are adjusted and high frequencies are removed using Hounsfield unit (HU) filtering and standardization. Then, the liver tumor is segmented from the entire image with a multiple-filter U-Net (MFU-Net). Finally, a quantitative analysis evaluates the segmentation results using three kinds of measures: boundary-distance-based, size-based, and overlap-based metrics. The proposed method is validated on CT images from the 3Dircadb and LiTS datasets. The results demonstrate that the multiple filters are useful for extracting local and global features simultaneously and for minimizing boundary-distance errors, and that our approach performs better in heterogeneous tumor regions of CT images.
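The HU filtering and standardization step can be sketched as a clip-and-normalize operation; the [-100, 400] window below is a common liver window assumed for illustration, not necessarily the paper's setting:

```python
import numpy as np

def hu_window(volume: np.ndarray, lo: float = -100.0, hi: float = 400.0):
    """Clip a CT volume to a Hounsfield-unit window and standardize it.
    Values outside the window (air, dense bone) carry no liver or tumor
    information and are treated as noise."""
    v = np.clip(volume.astype(np.float32), lo, hi)
    return (v - v.mean()) / (v.std() + 1e-8)

ct = np.random.randint(-1000, 1000, (4, 64, 64)).astype(np.float32)
print(hu_window(ct).shape)   # (4, 64, 64), zero mean, unit variance
```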
ABSTRACT
Besides facial- or gesture-based emotion recognition, electroencephalogram (EEG) data have been drawing attention thanks to their ability to counter the effect of deceptive external expressions, such as faces or speech. Emotion recognition based on EEG signals relies heavily on the features and their delineation, which requires selecting the feature categories converted from the raw signals and the types of representation that can display the intrinsic properties of an individual signal or a group of them. Moreover, the correlation or interaction among channels and frequency bands also contains crucial information for emotional-state prediction, which conventional approaches commonly disregard. Therefore, in our method, the correlations between the 32 channels and the frequency bands are used to enhance emotion prediction performance. The features extracted from the time domain are arranged into feature-homogeneous matrices, with positions following the corresponding electrodes placed on the scalp. Based on this 3D representation of the EEG signals, the model must learn both the local and the global patterns that describe the short- and long-range relations of EEG channels, along with the embedded features. To this end, we propose a 2D CNN whose convolutional layers with different kernel sizes are assembled into a convolution block, combining features distributed over small and large regions. Ten-fold cross-validation on the DEAP dataset demonstrates the effectiveness of our approach: we achieved average accuracies of 98.27% and 98.36% for arousal and valence binary classification, respectively.
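A minimal sketch of such a multi-kernel convolution block, assuming a 9x9 electrode-grid input and kernel sizes of 3, 5, and 7 (all illustrative choices, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    """Parallel 2D convolutions with small and large kernels over the
    electrode-grid representation, concatenated so local (neighboring
    electrodes) and global (distant regions) patterns are combined."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7)
        )

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

# 9x9 electrode grid, one channel per time-domain feature (assumed shape).
x = torch.randn(8, 4, 9, 9)
print(MultiKernelBlock(4, 16)(x).shape)  # torch.Size([8, 48, 9, 9])
```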
Subjects
Electroencephalography; Neural Networks, Computer; Arousal; Electrodes; Emotions; Humans
ABSTRACT
One essential step in radiotherapy treatment planning is the segmentation of organs at risk in computed tomography (CT). Many recent studies have focused on organs such as the lung, heart, esophagus, trachea, liver, aorta, kidney, and prostate. Among these, the esophagus is one of the most difficult organs to segment because of its small size, ambiguous boundary, and very low contrast in CT images. To address these challenges, we propose a fully automated framework for esophagus segmentation from CT images. The proposed method processes slice images from the original three-dimensional (3D) image, so it does not require large computational resources. We employ a spatial attention mechanism with an atrous spatial pyramid pooling module to locate the esophagus effectively, which enhances segmentation performance. To optimize the model we use group normalization, whose computation is independent of batch size and whose performance is stable. We also use the simultaneous truth and performance level estimation (STAPLE) algorithm to reach robust segmentation results: the model is first trained with k-fold cross-validation, and the candidate labels generated by each fold are then combined using STAPLE, improving the Dice and Hausdorff distance scores of our segmentation results. Our method was evaluated on the SegTHOR and StructSeg 2019 datasets, and the experiments show that it outperforms state-of-the-art methods on esophagus segmentation, which remains challenging in medical image analysis.
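STAPLE proper estimates per-fold reliabilities with an EM algorithm; as a simplified stand-in that conveys only the fusion step, the sketch below applies a plain majority vote over the per-fold candidate masks (shapes and fold count are illustrative):

```python
import numpy as np

def consensus_fusion(masks):
    """Fuse candidate esophagus masks produced by the CV folds.
    STAPLE proper weighs each fold by an EM-estimated reliability;
    this stand-in keeps voxels that more than half the folds label
    as esophagus (a plain majority vote)."""
    stack = np.stack(masks).astype(np.float32)   # (n_folds, D, H, W)
    return (stack.mean(axis=0) > 0.5).astype(np.uint8)

folds = [np.random.randint(0, 2, (8, 32, 32)) for _ in range(5)]
print(consensus_fusion(folds).shape)             # (8, 32, 32)
```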
Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Algorithms; Esophagus/diagnostic imaging; Humans; Male; Tomography, X-Ray Computed
ABSTRACT
BACKGROUND: This study aimed to investigate the feasibility of using circulating tumor cells (CTCs), peripheral blood cells (PBCs), and circulating cell-free DNA (cfDNA) as biomarkers of immune checkpoint inhibitor treatment response in patients with advanced non-small cell lung cancer (NSCLC). METHODS: We recruited patients diagnosed with advanced NSCLC who received pembrolizumab or atezolizumab between July 2019 and June 2020. Blood was collected before each treatment cycle (C1-C4) to calculate the absolute neutrophil count (ANC), neutrophil-to-lymphocyte ratio (NLR), derived NLR (dNLR), and platelet-to-lymphocyte ratio (PLR). CTCs, isolated using the CD-PRIME™ system, exhibited an EpCAM/CK+/CD45- phenotype in BioViewCCBS™. The cfDNA was extracted from plasma at the beginning of C1 and C4. RESULTS: The durable clinical benefit (DCB) rate among the 83 response-evaluable patients was 34%. CTC, PBC, and cfDNA levels at baseline (C1) were not significantly correlated with treatment response, although patients with DCB had lower CTC counts from C2 to C4. However, patients with low NLR, dNLR, PLR, and cfDNA levels at C1 had improved progression-free survival (PFS) and overall survival (OS). Patients with decreased CTC counts from C1 to C2 had longer median PFS (6.2 vs. 2.3 months; P=0.078) and OS (not reached vs. 6.8 months; P=0.021) than those with increased CTC counts. A low dNLR (≤2.0) at C1 and decreased CTC counts were independent factors for predicting survival. CONCLUSIONS: A comprehensive analysis of CTC, PBC, and cfDNA levels at baseline and during treatment demonstrated that they might serve as biomarkers for predicting survival benefit. This finding could aid in the risk stratification of patients with advanced NSCLC undergoing immune checkpoint inhibitor treatment.
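For reference, the derived ratios follow standard definitions computable directly from a complete blood count; the counts below are made-up values, and note that dNLR uses total white cells minus neutrophils in the denominator:

```python
def blood_ratios(anc: float, alc: float, wbc: float, plt: float) -> dict:
    """Standard derived blood-cell ratios (all counts in the same
    units, e.g. cells/uL): NLR = ANC/ALC, dNLR = ANC/(WBC - ANC),
    PLR = platelets/ALC."""
    return {
        "NLR": anc / alc,
        "dNLR": anc / (wbc - anc),
        "PLR": plt / alc,
    }

# Made-up counts; the resulting dNLR of 1.5 falls under the study's
# low-dNLR cutoff of 2.0.
print(blood_ratios(anc=4200, alc=1500, wbc=7000, plt=250000))
```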
ABSTRACT
BACKGROUND: The Cox proportional hazards model is commonly used to predict the hazard ratio, that is, the risk or probability of occurrence of an event of interest. However, the Cox model cannot directly generate an individual survival time; to do so, survival analysis under the Cox model converts the hazard ratio to a survival time through a distribution such as the exponential, Weibull, Gompertz, or log-normal distribution. In other words, to generate the survival time, the Cox model has to assume a specific distribution over time. RESULTS: This study presents a method to predict the survival time by integrating a hazard network and a distribution function network. The Cox proportional hazards network from DeepSurv is adapted for predicting the hazard ratio, and a distribution function network is applied to generate the survival time. To evaluate the performance of the proposed method, we propose a new evaluation metric that calculates the intersection over union between the predicted curve and the ground truth. To further understand significant prognostic factors, we use the 1D gradient-weighted class activation mapping method to render the network activations as a heat-map visualization over the input data. The performance of the proposed method was experimentally verified and compared with existing methods. CONCLUSIONS: Our results confirm that the combination of the two networks, the Cox proportional hazards network and the distribution function network, can effectively generate accurate survival times.
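The proposed curve-overlap metric is not fully specified in the abstract; one plausible reading, sketched below, computes intersection over union pointwise between the predicted and ground-truth survival curves sampled on a shared time grid:

```python
import numpy as np

def curve_iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Illustrative intersection-over-union between two survival curves
    on a common time grid: the pointwise minimum acts as the
    intersection and the pointwise maximum as the union."""
    inter = np.minimum(pred, truth).sum()
    union = np.maximum(pred, truth).sum()
    return float(inter / union)

t = np.linspace(0.0, 5.0, 100)
print(curve_iou(np.exp(-0.5 * t), np.exp(-0.7 * t)))  # overlap in [0, 1]
```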
Subjects
Research Design; Probability; Proportional Hazards Models; Survival Analysis
ABSTRACT
Emotion recognition plays an important role in human-computer interaction. Recent studies have focused on video emotion recognition in the wild and have run into difficulties related to occlusion, illumination, complex behavior over time, and auditory cues. State-of-the-art methods use multiple modalities, such as frame-level, spatiotemporal, and audio approaches; however, they have difficulty exploiting long-term dependencies in temporal information, capturing contextual information, and integrating multi-modal information. In this paper, we introduce a flexible multi-modal system for video-based emotion recognition in the wild. Our system tracks and votes on the significant faces corresponding to persons of interest in a video to classify seven basic emotions. The key contribution of this study is the use of face feature extraction with context-aware and statistical information for emotion recognition. We also build two model architectures to effectively exploit long-term dependencies in temporal information: a temporal-pyramid model and a spatiotemporal model with a "Conv2D+LSTM+3DCNN+Classify" architecture. Finally, we propose a best-selection ensemble, which selects the best combination of the spatiotemporal and temporal-pyramid models to maximize the accuracy of classifying the seven basic emotions. In our experiments, we benchmark the system on the AFEW dataset and achieve high accuracy.
Subjects
Awareness; Emotions; Humans; Photic Stimulation; Physical Therapy Modalities
ABSTRACT
Tumor classification and segmentation problems have attracted interest in recent years. In contrast to the abundance of studies examining brain, lung, and liver cancers, few studies have used deep learning to classify and segment knee bone tumors. In this study, our objective is to assist physicians in radiographic interpretation by detecting knee bone regions and classifying them as normal, benign-tumor, or malignant-tumor regions. We propose the Seg-Unet model with global and patch-based approaches to deal with the small size, varied appearance, and rarity of bone lesions. Our model contains classification, tumor-segmentation, and high-risk-region-segmentation branches that learn mutual benefits between the global context of the whole image and the local texture at every pixel. The patch-based model improves performance in malignant-tumor detection. We built a knee bone tumor dataset with the support of the physicians of Chonnam National University Hospital (CNUH). Experiments on the dataset demonstrate that our method outperforms other methods, with an accuracy of 99.05% for classification and an average mean IoU of 84.84% for segmentation. These results constitute a meaningful step toward helping physicians detect knee bone tumors.
ABSTRACT
An electroencephalogram (EEG) is the most extensively used physiological signal in emotion recognition from biometric data. However, EEG data are difficult to analyze because of their non-stationary character, in which statistical properties vary over time, and their spatial-temporal correlations. Therefore, new methods that can clearly distinguish emotional states in EEG data are required. In this paper, we propose a new emotion recognition method, named AsEmo, which extracts effective features that boost classification performance on various emotional states from multi-class EEG data. AsEmo Automatically determines the number of spatial filters needed to extract significant features using the explained variance ratio (EVR) and employs a Subject-independent method for real-time processing of Emotion EEG data. The advantages of this method are as follows: (a) it automatically determines the spatial-filter coefficients that distinguish emotional states and extracts the best features; (b) it is robust for real-time analysis of new data, using a subject-independent technique that considers sets of subjects rather than a specific subject; (c) it can be easily applied to both binary-class and multi-class data. Experimental results on real-world EEG emotion recognition tasks demonstrate that AsEmo outperforms other state-of-the-art methods, with a 2-8% improvement in classification accuracy.
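The EVR-based selection can be sketched as a cumulative-variance cutoff over the spatial-filter eigenvalues; the 0.9 threshold and the eigenvalues below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def n_filters_by_evr(eigvals: np.ndarray, threshold: float = 0.9) -> int:
    """Pick the smallest number of spatial filters whose cumulative
    explained-variance ratio (EVR) reaches the threshold."""
    evr = np.sort(eigvals)[::-1] / eigvals.sum()
    return int(np.searchsorted(np.cumsum(evr), threshold) + 1)

eig = np.array([5.0, 3.0, 1.0, 0.5, 0.3, 0.2])   # stand-in filter variances
print(n_filters_by_evr(eig))   # cumulative EVR 0.5, 0.8, 0.9 -> 3 filters
```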
Subjects
Biometry; Electroencephalography; Emotions; Humans; Research Design
ABSTRACT
The early detection and rapid quantification of acute ischemic lesions play pivotal roles in stroke management. We developed a deep learning algorithm for the automatic binary classification of the Alberta Stroke Program Early Computed Tomographic Score (ASPECTS) using diffusion-weighted imaging (DWI) in acute stroke patients. Three hundred and ninety DWI datasets with acute anterior-circulation stroke were included. A classifier utilizing a recurrent residual convolutional neural network (RRCNN) was developed to discriminate between the low (1-6) and high (7-10) DWI-ASPECTS groups. The model's performance was compared with that of a pre-trained VGG16, Inception V3, and a 3D convolutional neural network (3DCNN). The proposed RRCNN model demonstrated higher performance than the pre-trained models and the 3DCNN, with an accuracy of 87.3%, AUC of 0.941, and F1-score of 0.888 for classification between the low and high DWI-ASPECTS groups. These results suggest that the deep learning algorithm developed in this study can provide a rapid assessment of DWI-ASPECTS and may serve as an ancillary tool to assist physicians in making urgent clinical decisions.
ABSTRACT
A distance map captured using a time-of-flight (ToF) depth sensor has fundamental problems, such as ambiguous depth information on shiny or dark surfaces, optical noise, and mismatched boundaries. Severe depth errors occur on shiny and dark surfaces owing to excess reflection and excess absorption of light, respectively. Dealing with this problem has been a challenge because of the inherent hardware limitations of ToF, which measures distance using the number of reflected photons. This study proposes a distance-error correction method that uses three ToF sensors set to different integration times to address the ambiguity in depth information. First, the three ToF depth sensors are installed horizontally and capture distance maps at their different integration times. Given the amplitude maps, error regions are estimated based on the amount of light, and the estimated error regions are then refined by exploiting accurate depth information from the neighboring depth sensors that use different integration times. Moreover, we propose a new optical-noise-reduction filter that accounts for depth distributions biased toward one side. Experimental results verify that the proposed method overcomes the drawbacks of ToF cameras and provides enhanced distance maps.
ABSTRACT
Epilepsy forecasting has been extensively studied using high-order time series obtained from scalp-recorded electroencephalography (EEG). An accurate seizure prediction system would not only significantly improve patients' quality of life but also facilitate new therapeutic strategies for managing epilepsy. This paper therefore proposes an improved Kalman filter (KF) algorithm that mines seizure forecasts from neural activity by modeling three properties of high-order EEG time series: noise, temporal smoothness, and tensor structure. The proposed high-order Kalman filter (HOKF) extends the standard Kalman filter, for which higher-order modeling is limited. The efficient dynamics of the HOKF system preserve the tensor structure of the observations and latent states. As such, the proposed method offers two main advantages: (i) effectiveness, in that HOKF yields hidden variables that capture the major evolving trends suitable for predicting neural activity, even in the presence of missing values; and (ii) scalability, in that the wall-clock time of HOKF is linear with respect to the number of time slices of the sequence. The HOKF algorithm is examined in terms of its effectiveness and scalability through forecasting and scalability experiments with a real epilepsy EEG dataset. The simulation results demonstrate the superiority of the proposed method over the original Kalman filter and other existing methods.
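For orientation, the standard vector-state Kalman filter that HOKF generalizes performs the familiar predict/update cycle sketched below; the tensor-structured extension replaces these matrix operations with multilinear ones, and all values here are toy inputs:

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of the standard (vector) Kalman filter.
    x: state estimate, P: state covariance, z: new observation,
    F: transition model, H: observation model, Q/R: process/observation
    noise covariances."""
    # Predict the next state and its covariance.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the innovation z - H x_pred.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

x, P = np.zeros(2), np.eye(2)
F = H = np.eye(2)
Q, R = 0.01 * np.eye(2), 0.1 * np.eye(2)
print(kalman_step(x, P, np.array([0.5, -0.2]), F, H, Q, R)[0])
```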