ABSTRACT
Lithium, a rare metal of strategic importance, has garnered heightened global attention. This investigation examines the laboratory visible-near infrared and short-wavelength infrared (VNIR-SWIR, 350-2500 nm) reflectance spectral properties of lithium-rich rocks and stream sediments, aiming to elucidate their quantitative relationship with lithium concentration and thereby open new avenues and furnish technical solutions for probing sedimentary lithium reserves. Conducted in the Tuanjie Peak region of Western Kunlun, Xinjiang, China, this study analyzed 614 stream sediment samples and 222 rock specimens. Initial steps included laboratory VNIR-SWIR spectral reflectance measurements and lithium quantification. After preprocessing the spectral data with Savitzky-Golay (SG) smoothing and continuum removal (CR), the absorption positions (Pos2210nm, Pos1910nm) and depths (Depth2210, Depth1910) in the rock spectra, as well as the Illite Spectral Maturity (ISM) of the rock samples, were extracted. Wavelengths indicative of lithium content were identified with both the Successive Projections Algorithm (SPA) and the genetic algorithm (GA). Integrating the lithium-sensitive wavelengths identified by these feature selection methods, a quantitative regression model for lithium content in rocks and stream sediments was developed using partial least squares regression (PLSR), support vector regression (SVR), and a convolutional neural network (CNN). Spectral analysis indicated that lithium is predominantly hosted in montmorillonite and illite, with its content positively correlating with the spectral maturity of illite and closely related to the Al-OH absorption depth (Depth2210) and clay content. SPA was more effective than GA at extracting lithium-sensitive bands.
The optimal regression model for quantitative prediction of lithium content in rock samples was SG-SPA-CNN, with a prediction correlation coefficient (Rp) of 0.924 and a root-mean-square error of prediction (RMSEP) of 0.112. The optimal model for stream sediments was likewise SG-SPA-CNN, with an Rp and RMSEP of 0.881 and 0.296, respectively. The higher prediction accuracy for lithium content in rocks compared to sediments indicates that rocks are the more suitable medium for predicting lithium content. Compared to the PLSR and SVR models, the CNN model performed better for both sample types. Despite its limitations, this study highlights the effectiveness of hyperspectral technology in exploring the potential of clay-type lithium resources in the Tuanjie Peak area, offering new perspectives and approaches for further exploration.
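The continuum-removal step and the absorption parameters (e.g. Depth2210, Pos2210nm) described above can be sketched in a few lines. This is an illustrative stand-in, not the authors' code: the synthetic Gaussian "absorption" feature and the shoulder wavelengths are assumptions, and the continuum is simplified to a straight line between the two shoulders.

```python
import numpy as np

# Synthetic reflectance spectrum with one Al-OH-like absorption near 2210 nm
wl = np.arange(2100, 2321, 1.0)                    # wavelengths, nm
reflectance = 0.6 - 0.15 * np.exp(-((wl - 2210) ** 2) / (2 * 20 ** 2))

def absorption_params(wl, refl, left, right):
    """Linear continuum between two shoulder wavelengths, then depth/position."""
    m = (wl >= left) & (wl <= right)
    w, r = wl[m], refl[m]
    # straight-line continuum joining the two shoulders
    cont = np.interp(w, [w[0], w[-1]], [r[0], r[-1]])
    cr = r / cont                                  # continuum-removed spectrum
    i = np.argmin(cr)
    return w[i], 1.0 - cr[i]                       # position (nm), depth

pos, depth = absorption_params(wl, reflectance, 2100, 2320)
```

In practice the continuum is usually an upper convex hull over the full spectrum, and the input would first be SG-smoothed; both refinements are omitted here for brevity.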
ABSTRACT
Introduction: This study aims to investigate the potential of machine learning in predicting mental health conditions among college students by analyzing the existing literature on mental health diagnoses using various machine learning algorithms. Methods: The research employed a systematic literature review methodology to investigate the application of deep learning techniques in predicting mental health diagnoses among students from 2011 to 2024. The search strategy involved key terms such as "deep learning," "mental health," and related terms, applied to reputable repositories such as IEEE Xplore, ScienceDirect, SpringerLink, PLOS, and Elsevier. Papers published between January 2011 and May 2024 that specifically focused on deep learning models for mental health diagnoses were considered. The selection process adhered to PRISMA guidelines and resulted in 30 relevant studies. Results: The study highlights Convolutional Neural Networks (CNN), Random Forest (RF), Support Vector Machine (SVM), Deep Neural Networks, and Extreme Learning Machine (ELM) as prominent models for predicting mental health conditions. Among these, CNN demonstrated exceptional accuracy compared to the other models in diagnosing bipolar disorder. However, challenges persist, including the need for more extensive and diverse datasets, consideration of heterogeneity in mental health conditions, and inclusion of longitudinal data to capture temporal dynamics. Conclusion: This study offers valuable insights into the potential and challenges of machine learning in predicting mental health conditions among college students. While deep learning models like CNN show promise, addressing data limitations and incorporating temporal dynamics are crucial for further advancements.
ABSTRACT
This article presents an image dataset of palm leaf diseases to aid the early identification and classification of date palm infections. The dataset contains images of 8 main types of disorders affecting date palm leaves, three of which are physiological, four fungal, and one caused by pests. Specifically, the collected samples exhibit symptoms and signs of potassium deficiency, manganese deficiency, magnesium deficiency, black scorch, leaf spots, fusarium wilt, rachis blight, and Parlatoria blanchardi infestation. Moreover, the dataset includes a baseline of healthy palm leaves. In total, 608 raw images were captured over a period of three months, coinciding with the autumn and spring seasons, from 10 real date farms in the Madinah region of Saudi Arabia. The images were captured using smartphones and an SLR camera, focusing mainly on infected leaves and leaflets. Date palm fruits, trunks, and roots are beyond the focus of this dataset. The infected leaf images were filtered, cropped, augmented, and categorized into their disease classes. The resulting processed dataset comprises 3089 images. Our proposed dataset can be used to train deep learning models for classifying infected date palm leaves, thus enabling the early detection and prevention of palm tree diseases.
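The augmentation step that expands the 608 raw images into 3089 processed ones is not specified in detail; a minimal sketch of the usual geometric augmentations (flips and right-angle rotations, roughly a 5x expansion) might look like this. The factors and transforms are assumptions for illustration only.

```python
import numpy as np

def augment(img: np.ndarray) -> list:
    """Return the original image plus four simple geometric variants."""
    return [
        img,
        np.fliplr(img),        # horizontal flip
        np.flipud(img),        # vertical flip
        np.rot90(img, k=1),    # 90-degree rotation
        np.rot90(img, k=3),    # 270-degree rotation
    ]

sample = np.random.rand(64, 64, 3)   # stand-in for a cropped leaf image
variants = augment(sample)
```

Real pipelines typically add photometric jitter (brightness, contrast) as well, since field images vary strongly in lighting.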
ABSTRACT
Water leakage within water distribution networks (WDNs) presents significant challenges, encompassing infrastructure damage, economic losses, and public health risks. Traditional methods for leak localization based on acoustic signals encounter inherent limitations due to environmental noise and signal distortions. In response to this crucial issue, this study introduces an innovative approach that utilizes deep learning-based techniques to estimate time delay for leak localization. The research findings reveal that while the Res1D-CNN model demonstrates inferior performance compared to the GCC-SCOT and BCC under high signal-to-noise ratio (SNR) conditions, it exhibits robust capabilities and higher accuracy in low SNR scenarios. The proposed method's efficacy was empirically validated through field measurements. This advancement in acoustic leak localization holds the potential to significantly improve fault diagnosis and maintenance systems, thereby enabling efficient management of WDNs.
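The GCC-SCOT baseline mentioned above estimates the time delay between two acoustic sensors from their cross-power spectrum with SCOT (smoothed coherence transform) weighting. A minimal sketch on synthetic signals follows; the signal length and delay are made-up values.

```python
import numpy as np

def gcc_scot_delay(x, y, eps=1e-12):
    """Estimate the lag (in samples) of y relative to x via GCC with SCOT weighting."""
    n = len(x)
    X, Y = np.fft.fft(x), np.fft.fft(y)
    cross = np.conj(X) * Y
    weight = np.abs(X) * np.abs(Y) + eps        # SCOT weighting
    r = np.fft.ifft(cross / weight).real        # generalized cross-correlation
    lag = int(np.argmax(r))
    return lag - n if lag > n // 2 else lag     # wrap to a signed lag

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.roll(x, 37)                              # y is x delayed by 37 samples
delay = gcc_scot_delay(x, y)                    # -> 37
```

With the sensor spacing and the propagation speed of sound in the pipe, this lag converts directly into a leak position along the pipe segment.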
ABSTRACT
Hearing-impaired people use sign language to communicate with those without hearing disabilities. It is therefore difficult to communicate with hearing-impaired people without the help of a signer or knowledge of sign language. As a result, technologies that understand sign language are required to bridge the communication gap between those who have hearing impairments and those who do not. Ethiopian Amharic alphabet sign language (EAMASL) differs from other countries' sign languages because the Amharic language spoken in Ethiopia has a large number of complex alphabets. To date, only a few studies on EAMASL have been conducted in Ethiopia, and previous work covered only the basic and a few derived Amharic alphabet signs. To address this gap, this paper proposes machine learning techniques, namely a Support Vector Machine (SVM) with Convolutional Neural Network (CNN) features, Histogram of Oriented Gradients (HOG) features, and their hybrid, to recognize the remaining derived Amharic alphabet signs. Because CNN features are robust to rotation and translation of signs, and HOG works well on low-quality data under strong illumination variation with small amounts of training data, the two were combined for feature extraction. In addition to SVM, CNN (Softmax) was used as a classifier for the normalized hybrid features. The SVM model achieved accuracies of 89.02%, 95.42%, 97.40%, and 93.61% with CNN, HOG, normalized, and non-normalized hybrid feature vectors, respectively, using 10-fold cross-validation. With the normalized hybrid features, the other classifier, CNN (Softmax), produced 93.55% accuracy.
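The "normalized hybrid features" above amount to scaling each descriptor before concatenation so that neither the CNN activations nor the HOG histogram dominates the SVM's kernel distances. A minimal sketch, where the vector sizes are illustrative assumptions:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a feature vector to unit L2 norm."""
    return v / (np.linalg.norm(v) + eps)

cnn_feat = np.random.rand(512)      # stand-in for CNN activations
hog_feat = np.random.rand(1764)     # stand-in for a HOG descriptor
# normalize each descriptor separately, then concatenate
hybrid = np.concatenate([l2_normalize(cnn_feat), l2_normalize(hog_feat)])
```

The reported gap between normalized (97.40%) and non-normalized (93.61%) hybrid accuracy is consistent with this scale-balancing effect.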
ABSTRACT
Psychiatric disorders present diagnostic challenges due to individuals concealing their genuine emotions, and traditional methods relying on neurophysiological signals have limitations. Our study proposes an improved EEG-based diagnostic model employing Deep Learning (DL) techniques to address this. By experimenting with DL models on EEG data, we aimed to enhance psychiatric disorder diagnosis, offering promising implications for medical advancements. We utilized a dataset of 945 individuals, including 850 patients and 95 healthy subjects, focusing on six main and nine specific disorders. Quantitative EEG data were analyzed during resting states, featuring power spectral density (PSD) and functional connectivity (FC) across various frequency bands. Employing artificial neural networks (ANN), K nearest neighbors (KNN), Long short-term memory (LSTM), bidirectional Long short-term memory (Bi-LSTM), and a hybrid CNN-LSTM model, we performed binary classification. Remarkably, all proposed models outperformed previous approaches, with the ANN achieving 96.83 % accuracy for obsessive-compulsive disorder using entire band features. CNN-LSTM attained the same accuracy for adjustment disorder, while KNN and LSTM achieved 98.94 % accuracy for acute stress disorder using specific feature sets. Notably, KNN and Bi-LSTM models reached 97.88 % accuracy for predicting obsessive-compulsive disorder. These findings underscore the potential of EEG as a cost-effective and accessible diagnostic tool for psychiatric disorders, complementing traditional methods like MRI. Our study's advanced DL models show promise in enhancing psychiatric disorder detection and monitoring, with significant implications for clinical application, inspiring hope for improved patient care and outcomes. The potential of EEG as a diagnostic tool for psychiatric disorders is substantial, as it can lead to improved patient care and outcomes in the field of psychiatry.
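The PSD band features mentioned above reduce each EEG channel to a handful of band powers. A minimal, illustrative extraction via a plain periodogram is shown below; the 10 Hz synthetic "alpha" signal, sampling rate, and band edges are assumptions, not the study's settings.

```python
import numpy as np

fs = 250                                           # sampling rate, Hz
t = np.arange(0, 4, 1 / fs)
# synthetic single-channel EEG: a 10 Hz (alpha-band) rhythm plus noise
eeg = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.default_rng(1).standard_normal(t.size)

def band_power(signal, fs, lo, hi):
    """Periodogram power summed over the [lo, hi) Hz band."""
    freqs = np.fft.rfftfreq(signal.size, 1 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].sum()

alpha = band_power(eeg, fs, 8, 13)    # alpha band dominates for this signal
beta = band_power(eeg, fs, 13, 30)    # beta band
```

Per-band powers like these, computed per channel, form the feature vectors fed to the ANN/KNN/LSTM classifiers; Welch averaging would be used in practice for lower-variance estimates.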
ABSTRACT
In public spaces, threats to societal security are a major concern, and emerging technologies offer potential countermeasures. The proposed intelligent person identification system monitors and identifies individuals in public spaces using gait, face, and iris recognition. The system employs a multimodal approach for secure identification and utilises deep convolutional neural networks (DCNNs) that have been pretrained to predict individuals. For increased accuracy, the proposed system is implemented on a cloud server and integrated with citizen identification systems such as Aadhar/SSN. The performance of the system is determined by the rate of accuracy achieved when identifying individuals in a public space. The proposed multimodal secure identification system achieves a 94% accuracy rate, which is higher than that of existing public space person identification systems. Integration with citizen identification systems improves precision and provides immediate life-saving assistance to those in need. Utilising secure deep learning techniques for precise person identification, the proposed system offers a promising solution to security threats in public spaces. This research is necessary to investigate the efficacy and potential applications of the proposed system, including accident identification, theft identification, and intruder identification in public spaces.
Subjects
Deep Learning; Humans; Biometric Identification/methods; Neural Networks, Computer; Security Measures
ABSTRACT
Characterization of breast parenchyma in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a challenging task owing to the complexity of underlying tissue structures. Existing quantitative approaches, like radiomics and deep learning models, lack explicit quantification of intricate and subtle parenchymal structures, including fibroglandular tissue. To address this, we propose a novel topological approach that explicitly extracts multi-scale topological structures to better approximate breast parenchymal structures, and then incorporates these structures into a deep-learning-based prediction model via an attention mechanism. Our topology-informed deep learning model, TopoTxR, leverages topology to provide enhanced insights into tissues critical for disease pathophysiology and treatment response. We empirically validate TopoTxR using the VICTRE phantom breast dataset, showing that the topological structures extracted by our model effectively approximate the breast parenchymal structures. We further demonstrate TopoTxR's efficacy in predicting response to neoadjuvant chemotherapy. Our qualitative and quantitative analyses suggest differential topological behavior of breast tissue in treatment-naïve imaging between patients who respond favorably to therapy, achieving pathological complete response (pCR), and those who do not. In a comparative analysis with several baselines on the publicly available I-SPY 1 dataset (N = 161, including 47 patients with pCR and 114 without) and the Rutgers proprietary dataset (N = 120, with 69 patients achieving pCR and 51 not), TopoTxR demonstrates a notable improvement, achieving a 2.6% increase in accuracy and a 4.6% enhancement in AUC compared to the state-of-the-art method.
ABSTRACT
Gynecological cancers, especially ovarian cancer, remain a critical public health issue, particularly in regions like India, where challenges related to cancer awareness, variable pathology, and limited access to screening facilities often lead to diagnosis at advanced stages and poorer patient outcomes. The goal of this study is to enhance the accuracy of classifying ovarian tumors, with a focus on distinguishing between malignant and early-stage cases, by applying advanced deep learning methods. In our approach, we utilized three pre-trained deep learning models (Xception, ResNet50V2, and ResNet50V2FPN) to classify ovarian tumors using publicly available Computed Tomography (CT) scan data. To further improve performance, we developed a novel CT Sequence Selection Algorithm, which optimizes the use of CT images for more precise classification of ovarian tumors. The models were trained and evaluated on selected TIFF images. Comparative evaluation of the ResNet50V2FPN model, with and without the CT Sequence Selection Algorithm, demonstrates the superiority of the proposed algorithm over existing state-of-the-art methods. This research presents a promising approach for improving the early detection and management of gynecological cancers, with potential benefits for patient outcomes, especially in areas with limited healthcare resources.
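The CT Sequence Selection Algorithm itself is not described in the abstract. As a purely hypothetical stand-in for the idea of selecting informative slices from a CT sequence, the sketch below keeps the k slices with the highest intensity variance, a common heuristic for discarding near-empty slices before classification; it is not the authors' method.

```python
import numpy as np

def select_slices(volume: np.ndarray, k: int) -> np.ndarray:
    """volume: (num_slices, H, W); keep the k highest-variance slices, in scan order."""
    scores = volume.reshape(volume.shape[0], -1).var(axis=1)
    keep = np.sort(np.argsort(scores)[-k:])     # top-k indices, re-sorted to scan order
    return volume[keep]

vol = np.random.rand(40, 128, 128)              # stand-in CT volume
selected = select_slices(vol, k=16)
```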
Subjects
Algorithms; Deep Learning; Ovarian Neoplasms; Tomography, X-Ray Computed; Humans; Female; Ovarian Neoplasms/diagnostic imaging; Ovarian Neoplasms/classification; Ovarian Neoplasms/pathology; Tomography, X-Ray Computed/methods
ABSTRACT
In Thailand, two snail-eating turtle species in the genus Malayemys (M. subtrijuga and M. macrocephala) are protected animals whose smuggling and trade are illegal. Recently, a new species, M. khoratensis, has been reported, but it has not yet been designated a protected species. To enforce the law, species identification of Malayemys is crucial; however, it is challenging and requires expertise. A simple tool, such as image analysis, to differentiate these three snail-eating species would therefore be highly useful. This study proposes a novel ensemble multiview image processing approach for the automated classification of the three turtle species in the genus Malayemys. The original YOLOv8 architecture was improved by utilizing a convolutional neural network (CNN) to overcome the limitations of traditional identification methods. This model captures unique morphological features by analyzing images of Malayemys species from various angles, addressing challenges such as occlusion and appearance variations. The ensemble multiview strategy significantly increases YOLOv8 classification accuracy on a comprehensive dataset, achieving an average mean average precision (mAP) of 98% for the genus Malayemys, compared with nonensemble multiview and single-view strategies. The species identification accuracy of the proposed models was validated by comparing genetic methods based on mitochondrial DNA with morphological characteristics. Even though the morphological characteristics of these three species are ambiguous, their mitochondrial DNA sequences are quite distinct; this alternative tool should therefore be used to increase confidence in field identification. In summary, this study not only marks a significant advancement in computational biology but also supports wildlife and turtle conservation efforts by enabling rapid, accurate species identification.
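The ensemble multiview strategy above can be reduced to a simple idea: average the per-view class probabilities for images of the same turtle taken from several angles, then take the argmax. A minimal sketch with made-up probabilities:

```python
import numpy as np

def ensemble_multiview(view_probs: np.ndarray) -> int:
    """view_probs: (num_views, num_classes) softmax outputs -> predicted class index."""
    return int(view_probs.mean(axis=0).argmax())

# three views, three species (e.g. M. subtrijuga, M. macrocephala, M. khoratensis)
probs = np.array([
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.7, 0.1, 0.2],
])
species = ensemble_multiview(probs)   # class 0 wins after averaging
```

Averaging across views suppresses single-view errors caused by occlusion or an uninformative angle, which matches the reported gain over the single-view strategy.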
Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Turtles; Animals; Turtles/anatomy & histology; Image Processing, Computer-Assisted/methods; Thailand
ABSTRACT
Timely diagnosis of brain tumors using MRI and its potential impact on patient survival are critical issues addressed in this study. Traditional deep learning (DL) models often lack transparency, leading to skepticism among medical experts owing to their "black box" nature. This study addresses this gap by presenting an innovative approach for brain tumor detection. It utilizes a customized Convolutional Neural Network (CNN) model empowered by three advanced explainable artificial intelligence (XAI) techniques: Shapley Additive Explanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), and Gradient-weighted Class Activation Mapping (Grad-CAM). The study utilized the BR35H dataset, which includes 3060 brain MRI images encompassing both tumorous and non-tumorous cases. The proposed model achieved a remarkable training accuracy of 100 % and validation accuracy of 98.67 %. Precision, recall, and F1 score metrics demonstrated exceptional performance at 98.50 %, confirming the accuracy of the model in tumor detection. Detailed result analysis, including a confusion matrix, comparison with existing models, and generalizability tests on other datasets, establishes the superiority of the proposed approach and sets a new benchmark for accuracy. By integrating a customized CNN model with XAI techniques, this research enhances trust in AI-driven medical diagnostics and offers a promising pathway for early tumor detection and potentially life-saving interventions.
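Of the XAI techniques named above, Grad-CAM is simple enough to sketch framework-agnostically: weight each channel of the last convolutional layer's activations by its average gradient with respect to the class score, sum, and apply ReLU. The arrays below are random stand-ins, not real network tensors.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """activations, gradients: (C, H, W) -> heatmap (H, W) scaled to [0, 1]."""
    weights = gradients.mean(axis=(1, 2))                 # global-average-pooled grads
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    return cam / cam.max() if cam.max() > 0 else cam      # normalize for display

acts = np.random.rand(64, 14, 14)    # stand-in feature maps
grads = np.random.rand(64, 14, 14)   # stand-in d(score)/d(activations)
heatmap = grad_cam(acts, grads)
```

In a real pipeline the heatmap is upsampled to the MRI's resolution and overlaid on the slice, letting a clinician check that the model attends to the tumor region rather than to artifacts.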
ABSTRACT
Introduction: Protein O-GlcNAcylation is a dynamic post-translational modification involved in major cellular processes and associated with many human diseases. Bioinformatic prediction of O-GlcNAc sites before experimental validation is a challenging task in O-GlcNAc research. Recent advancements in deep learning algorithms and the availability of O-GlcNAc proteomics data present an opportunity to improve O-GlcNAc site prediction. Objectives: This study aims to develop a deep learning-based tool to improve O-GlcNAcylation site prediction. Methods: We construct an annotated, unbalanced O-GlcNAcylation dataset and propose a new deep learning framework, DeepO-GlcNAc, combining Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) layers with an attention mechanism. Results: The ablation study confirms that the additional model components in DeepO-GlcNAc, such as the attention mechanism and LSTM, contribute positively to prediction performance. Our model demonstrates strong robustness across five cross-species datasets, excluding humans. We also compare our model with three external predictors on an independent dataset. Our results demonstrate that DeepO-GlcNAc outperforms the external predictors, achieving an accuracy of 92%, an average precision of 72%, an MCC of 0.60, and an AUC of 92% in ROC analysis. Moreover, we have implemented DeepO-GlcNAc as a web server to facilitate further investigation and usage by the scientific community. Conclusion: Our work demonstrates the feasibility of utilizing deep learning for O-GlcNAc site prediction and provides a novel tool for O-GlcNAc investigation.
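Site predictors like the one described typically encode a fixed-length peptide window centred on a candidate serine/threonine as a one-hot matrix before feeding it to the CNN/LSTM layers. The sketch below is illustrative; the window length and the handling of unknown residues are assumptions, and the peptide is hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues

def one_hot_window(peptide: str) -> np.ndarray:
    """peptide: string of length L -> (L, 20) one-hot matrix."""
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    mat = np.zeros((len(peptide), 20))
    for pos, aa in enumerate(peptide):
        if aa in idx:                   # unknown residues stay all-zero
            mat[pos, idx[aa]] = 1.0
    return mat

window = one_hot_window("PKVSTAPLSGT")  # hypothetical 11-residue window, S/T at centre
```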
ABSTRACT
In light of the ongoing battle against COVID-19, while the pandemic may eventually subside, sporadic cases may still emerge, underscoring the need for accurate detection from radiological images. However, the limited explainability of current deep learning models restricts clinician acceptance. To address this issue, our research integrates multiple CNN models with explainable AI techniques, ensuring model interpretability before ensemble construction. Our approach enhances both accuracy and interpretability by evaluating advanced CNN models on the largest publicly available X-ray dataset, COVIDx CXR-3, which includes 29,986 images, and the CT scan dataset for SARS-CoV-2 from Kaggle, which includes a total of 2,482 images. We also employed additional public datasets for cross-dataset evaluation, ensuring a thorough assessment of model performance across various imaging conditions. By leveraging methods including LIME, SHAP, Grad-CAM, and Grad-CAM++, we provide transparent insights into model decisions. Our ensemble model, which includes DenseNet169, ResNet50, and VGG16, demonstrates strong performance. For the X-ray image dataset, sensitivity, specificity, accuracy, F1-score, and AUC are recorded at 99.00%, 99.00%, 99.00%, 0.99, and 0.99, respectively. For the CT image dataset, these metrics are 96.18%, 96.18%, 96.18%, 0.9618, and 0.96, respectively. Our methodology bridges the gap between precision and interpretability in clinical settings by combining model diversity with explainability, promising enhanced disease diagnosis and greater clinician acceptance.
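The metrics reported above all derive from the binary confusion matrix; for reference, a small helper that computes sensitivity, specificity, accuracy, and F1 from the four counts. The counts below are made up for illustration, chosen so each metric comes out to 0.99 as in the X-ray results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # a.k.a. recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, accuracy, f1

sens, spec, acc, f1 = binary_metrics(tp=99, fp=1, tn=99, fn=1)
```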
Subjects
COVID-19; Deep Learning; SARS-CoV-2; Tomography, X-Ray Computed; COVID-19/diagnostic imaging; COVID-19/diagnosis; Humans; Tomography, X-Ray Computed/methods; SARS-CoV-2/isolation & purification; Neural Networks, Computer
ABSTRACT
Cell nuclei interpretation is crucial in pathological diagnostics, especially in tumor specimens. A critical step in computational pathology is to detect and analyze individual nuclear properties using segmentation algorithms. Conventionally, a semantic segmentation network is used, where individual nuclear properties are derived after post-processing a segmentation mask. In this study, we focus on showing that an object-detection-based instance segmentation network, the Mask R-CNN, after integrating it with a Feature Pyramid Network (FPN), gives mature and reliable results for nuclei detection without the need for additional post-processing. The results were analyzed using the Kumar dataset, a public dataset with over 20,000 nuclei annotations from various organs. The Dice score of the baseline Mask R-CNN improved from 76% to 83% after integration with an FPN. This was comparable with the 82.6% Dice score achieved by modern semantic-segmentation-based networks. Thus, evidence is provided that an end-to-end trainable detection-based instance segmentation algorithm with minimal post-processing steps can reliably be used for the detection and analysis of individual nuclear properties. This represents a relevant task for research and diagnosis in digital pathology, which can improve the automated analysis of histopathological images.
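For reference, the Dice score quoted above is the standard overlap measure for binary segmentation masks, 2|A∩B| / (|A|+|B|). A minimal computation on toy masks:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps=1e-8) -> float:
    """Dice coefficient for boolean masks: 2*|intersection| / (|pred| + |gt|)."""
    inter = np.logical_and(pred, gt).sum()
    return float(2 * inter / (pred.sum() + gt.sum() + eps))

pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True   # predicted nucleus, 16 px
gt = np.zeros((8, 8), bool); gt[3:7, 3:7] = True       # ground truth, offset by 1 px
score = dice(pred, gt)                                  # 2*9 / (16+16) = 0.5625
```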
ABSTRACT
Introduction: Early diagnosis of cervical cancer at the precancerous stage is critical for effective treatment and improved patient outcomes. Objective: This study aims to explore the use of SWIN Transformer and Convolutional Neural Network (CNN) hybrid models combined with transfer learning to classify precancerous colposcopy images. Methods: Out of 913 images from 200 cases obtained from the Colposcopy Image Bank of the International Agency for Research on Cancer, 898 met quality standards and were classified as normal, precancerous, or cancerous based on colposcopy and histopathological findings. The cases corresponding to the 360 precancerous images, along with an equal number of normal cases, were divided into a 70/30 train-test split. The SWIN Transformer and CNN hybrid model combines the advantages of local feature extraction by CNNs with the global context modeling by SWIN Transformers, resulting in superior classification performance and a more automated process. The hybrid model approach involves enhancing image quality through preprocessing, extracting local features with CNNs, capturing the global context with the SWIN Transformer, integrating these features for classification, and refining the training process by tuning hyperparameters. Results: The trained model achieved the following classification performances on fivefold cross-validation data: a 94% Area Under the Curve (AUC), an 88% F1 score, and 87% accuracy. On two completely independent test sets, which were never seen by the model during training, the model achieved an 80% AUC, a 75% F1 score, and 75% accuracy on the first test set (precancerous vs. normal) and an 82% AUC, a 78% F1 score, and 75% accuracy on the second test set (cancer vs. normal). 
Conclusions: These high-performance metrics demonstrate the model's effectiveness in distinguishing precancerous from normal colposcopy images, even with modest datasets, limited data augmentation, and the smaller effect size of precancerous images compared to malignant lesions. The findings suggest that these techniques can significantly aid in the early detection of cervical cancer at the precancerous stage.
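The AUC reported above has a direct probabilistic reading: it is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The sketch below computes it from raw scores via pairwise comparison (the scores and labels are toy values); this is equivalent to the rank-based Mann-Whitney formulation.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise positive-vs-negative comparisons."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()   # ties count as half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

value = auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1])   # 2 of 3 pos/neg pairs ordered correctly
```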
ABSTRACT
In recent years, deep learning-based approaches, particularly those leveraging the Transformer architecture, have garnered widespread attention for network traffic anomaly detection. However, when dealing with noisy data sets, directly inputting network traffic sequences into Transformer networks often significantly degrades detection performance due to interference and noise across dimensions. In this paper, we propose a novel multi-channel network traffic anomaly detection model, MTC-Net, which reduces computational complexity and enhances the model's ability to capture long-distance dependencies. This is achieved by decomposing network traffic sequences into multiple unidimensional time sequences and introducing a patch-based strategy that enables each sub-sequence to retain local semantic information. A backbone network combining Transformer and CNN is employed to capture complex patterns, with information from all channels fused at the final classification head to model and detect complex network traffic patterns. The experimental results demonstrate that MTC-Net outperforms existing state-of-the-art methods in several evaluation metrics, including accuracy, precision, recall, and F1 score, on four publicly available data sets: KDD Cup 99, NSL-KDD, UNSW-NB15, and CIC-IDS2017.
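The channel-independent, patch-based decomposition described above can be sketched as follows: split a multivariate traffic sequence into per-feature univariate series, then cut each into fixed-length, possibly overlapping patches. All sizes here are illustrative assumptions.

```python
import numpy as np

def to_patches(series: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """series: (T, C) -> patches: (C, num_patches, patch_len), one channel per feature."""
    T, C = series.shape
    starts = range(0, T - patch_len + 1, stride)
    return np.stack([[series[s:s + patch_len, c] for s in starts] for c in range(C)])

x = np.random.rand(96, 8)                     # 96 time steps, 8 traffic features
patches = to_patches(x, patch_len=16, stride=8)   # (8 channels, 11 patches, 16 steps)
```

Each patch then plays the role of a token for the Transformer branch, so attention cost scales with the number of patches rather than the raw sequence length.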
ABSTRACT
BACKGROUND: The global healthcare system faces challenges in diagnosing and managing lung and colon cancers, which are significant health burdens. Traditional diagnostic methods are inefficient and prone to errors, while data privacy and security concerns persist. OBJECTIVE: This study aims to develop a secure and transparent framework for remote consultation and classification of lung and colon cancer, leveraging blockchain technology and Microsoft Azure cloud services. Dataset and Features: The framework utilizes the LC25000 dataset, containing 25,000 histopathological images, for training and evaluating advanced machine learning models. Key features include secure data upload, anonymization, encryption, and controlled access via blockchain and Azure services. METHODS: The proposed framework integrates Microsoft Azure's cloud services with a permissioned blockchain network. Patients upload CT scans through a mobile app, which are then preprocessed, anonymized, and stored securely in Azure Blob Storage. Blockchain smart contracts manage data access, ensuring only authorized specialists can retrieve and analyze the scans. Azure Machine Learning is used to train and deploy state-of-the-art machine learning models for cancer classification. Evaluation Metrics: The framework's performance is evaluated using metrics such as accuracy, precision, recall, and F1-score, demonstrating the effectiveness of the integrated approach in enhancing diagnostic accuracy and data security. RESULTS: The proposed framework achieves an impressive accuracy of 100% for lung and colon cancer classification using DenseNet, ResNet50, and MobileNet models with different split ratios (70-30, 80-20, 90-10). The F1-score and k-fold cross-validation accuracy (5-fold and 10-fold) also demonstrate exceptional performance, with values exceeding 99.9%. 
Real-time notifications and secure remote consultations enhance the efficiency and transparency of the diagnostic process, contributing to better patient outcomes and streamlined cancer care management.
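The abstract does not detail the blockchain layer; as a purely hypothetical illustration of the underlying idea, the sketch below builds a tamper-evident hash chain of data-access events, so any retroactive edit invalidates every later entry. The event fields and actor names are invented for the example.

```python
import hashlib
import json

def add_block(chain, event: dict) -> None:
    """Append an event, chaining its hash to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(event, sort_keys=True) + prev_hash
    chain.append({"event": event, "prev": prev_hash,
                  "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify(chain) -> bool:
    """Recompute every hash; any tampering breaks the chain from that point on."""
    prev = "0" * 64
    for block in chain:
        payload = json.dumps(block["event"], sort_keys=True) + prev
        if block["prev"] != prev or block["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev = block["hash"]
    return True

log = []
add_block(log, {"actor": "specialist_17", "action": "read", "scan": "ct_0042"})
add_block(log, {"actor": "specialist_17", "action": "classify", "scan": "ct_0042"})
ok_before = verify(log)
log[0]["event"]["actor"] = "intruder"      # tamper with history
ok_after = verify(log)
```

A permissioned blockchain adds consensus and smart-contract access control on top of this basic integrity property.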
Subjects
Colonic Neoplasms; Computer Security; Lung Neoplasms; Humans; Lung Neoplasms/diagnostic imaging; Lung Neoplasms/classification; Lung Neoplasms/pathology; Colonic Neoplasms/diagnostic imaging; Colonic Neoplasms/classification; Colonic Neoplasms/pathology; Blockchain; Machine Learning; Cloud Computing
ABSTRACT
In this paper, we propose a lightweight lithography machine learning-based hotspot detection model that integrates the Squeeze-and-Excitation (SE) attention mechanism and the Efficient Channel Attention (ECA) mechanism. These mechanisms can adaptively adjust channel weights, significantly enhancing the model's ability to extract relevant features of hotspots and non-hotspots through cross-channel interaction without dimensionality reduction. Our model extracts feature vectors through seven convolutional layers and four pooling layers, followed by three fully connected layers that map to the output, thereby simplifying the CNN network structure. Experimental results on our collected layout dataset and the ICCAD 2012 layout dataset demonstrate that our model is more lightweight. By evaluating overall accuracy, recall, and runtime, the comprehensive performance of our model is shown to exceed that of ConvNeXt, Swin Transformer, and ResNet-50.
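The SE recalibration mentioned above can be sketched framework-agnostically: global-average-pool each channel ("squeeze"), pass the result through a small bottleneck with a sigmoid gate ("excitation"), and rescale the feature map channel-wise. The weights below are random stand-ins, not trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """x: (C, H, W); w1: (C, C//r); w2: (C//r, C) -> channel-recalibrated (C, H, W)."""
    squeeze = x.mean(axis=(1, 2))                        # (C,) global average pool
    excite = sigmoid(np.maximum(squeeze @ w1, 0) @ w2)   # bottleneck: ReLU then sigmoid gate
    return x * excite[:, None, None]                     # channel-wise rescaling

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 8, 8))                      # stand-in feature maps, C=32, r=4
out = se_block(x, rng.standard_normal((32, 8)), rng.standard_normal((8, 32)))
```

ECA replaces the two fully connected layers with a 1-D convolution over the pooled channel vector, avoiding the bottleneck's dimensionality reduction; the gating step is otherwise analogous.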
ABSTRACT
The automatic video recognition of depression is becoming increasingly important in clinical applications. However, traditional depression recognition models still face challenges in practical applications, such as high computational costs, the poor effectiveness of facial movement features, and spatial feature degradation due to model stitching. To overcome these challenges, this work proposes a lightweight Time-Context Enhanced Depression Detection Network (TCEDN). We first use attention-weighted blocks to aggregate and enhance video frame-level features, easing the model's computational workload. Next, by integrating the temporal and spatial changes of raw video features and facial movement features with self-learned weights, we enhance the precision of depression detection. Finally, a fusion network of a 3-Dimensional Convolutional Neural Network (3D-CNN) and a Convolutional Long Short-Term Memory Network (ConvLSTM) is constructed to minimize spatial feature loss by avoiding feature flattening and to achieve depression score prediction. Tests on the AVEC2013 and AVEC2014 datasets reveal that our approach yields results on par with state-of-the-art techniques for detecting depression using video analysis. Additionally, our method has significantly lower computational complexity than mainstream methods.
ABSTRACT
Rapid and reliable detection of human survivors trapped under debris is crucial for effective post-earthquake search and rescue (SAR) operations. This paper presents a novel approach to survivor detection using a snake robot equipped with deep learning (DL) based object identification algorithms. We evaluated the performance of three main algorithms: Faster R-CNN, Single Shot MultiBox Detector (SSD), and You Only Look Once (YOLO). While these algorithms were initially trained on the PASCAL VOC 2012 dataset for human identification, we address the lack of a dedicated dataset for trapped individuals by compiling a new dataset of 200 images that specifically depicts this scenario, featuring cluttered environments and occlusion. Our evaluation takes into account detection accuracy, confidence interval, and running time. The results demonstrate that the YOLOv10 algorithm achieves 98.4% mAP@0.5 and 98.5% accuracy with an inference time of 15 ms. We validate the performance of these algorithms using images of human survivors trapped under debris and subjected to various occlusions.
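For reference, the mAP@0.5 figure above counts a detection as correct when its box overlaps the ground truth with an intersection-over-union (IoU) of at least 0.5. The IoU computation itself, on made-up boxes in (x1, y1, x2, y2) format:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # zero if boxes don't overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))   # 25 / 175, below the 0.5 threshold
```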