Results 1 - 20 of 778
1.
PeerJ Comput Sci ; 10: e2080, 2024.
Article in English | MEDLINE | ID: mdl-38983194

ABSTRACT

Poultry farming is an indispensable part of global agriculture, playing a crucial role in food safety and economic development. Managing and preventing diseases is a vital task in the poultry industry, where semantic segmentation technology can significantly enhance the efficiency of traditional manual monitoring methods. Traditional semantic segmentation has achieved excellent results on extensively manually annotated datasets, facilitating real-time monitoring of poultry. Nonetheless, such models encounter limitations when exposed to new environments, diverse breeding varieties, or varying growth stages within the same species, necessitating extensive data retraining. Overreliance on large datasets results in higher costs for manual annotation and in deployment delays, hindering practical applicability. To address this issue, our study introduces HSDNet, an innovative semantic segmentation model based on few-shot learning, for monitoring poultry farms. HSDNet adapts to new settings or species from a single image input while maintaining substantial accuracy. In the specific context of poultry breeding, characterized by small congregating animals and the inherent complexities of agricultural environments, non-smooth losses arise that can compromise accuracy; HSDNet incorporates a Sharpness-Aware Minimization (SAM) strategy to counteract these challenges. Furthermore, by accounting for the effect of imbalanced loss on convergence, HSDNet mitigates the overfitting induced by few-shot learning. Empirical findings underscore HSDNet's proficiency in poultry breeding settings: it achieves 72.89% semantic segmentation accuracy on single images, surpassing the state-of-the-art result of 68.85%.
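The Sharpness-Aware Minimization strategy mentioned above can be sketched in a few lines. The following is a generic SAM update applied to a toy quadratic loss; the names `sam_step`, `lr`, and `rho` are illustrative, not HSDNet's actual hyperparameters:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step: perturb the weights toward
    the locally worst-case point, then descend using the gradient
    evaluated there, which favors flat minima."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent to the sharp point
    return w - lr * grad_fn(w + eps)             # descend from the original w

# Toy quadratic loss L(w) = ||w||^2 with gradient 2w.
grad = lambda w: 2.0 * w
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, grad)
# the weights shrink toward the (flat) minimum at the origin
```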

2.
PeerJ Comput Sci ; 10: e2146, 2024.
Article in English | MEDLINE | ID: mdl-38983210

ABSTRACT

In recent years, the growing importance of accurate semantic segmentation in ultrasound images has led to numerous advances in deep learning-based techniques. In this article, we introduce a novel hybrid network that synergistically combines convolutional neural networks (CNN) and Vision Transformers (ViT) for ultrasound image semantic segmentation. Our primary contribution is the incorporation of multi-scale CNN in both the encoder and decoder stages, enhancing feature learning capabilities across multiple scales. Further, the bottleneck of the network leverages the ViT to capture long-range, high-dimensional spatial dependencies, a critical factor often overlooked in conventional CNN-based approaches. We conducted extensive experiments using a public benchmark ultrasound nerve segmentation dataset. Our proposed method was benchmarked against 17 existing baseline methods, and the results underscored its superiority: it outperformed all competing methods, including a 4.6% Dice improvement over TransUNet, a 13.0% Dice improvement over Attention UNet, and a 10.5% precision improvement over UNet. This research offers significant potential for real-world applications in medical imaging, demonstrating the power of blending CNN and ViT in a unified framework.
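The Dice score used in the comparisons above is straightforward to compute. A minimal sketch on binary masks (the function name and toy masks are illustrative, not the benchmark data):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
# intersection = 2, |a| = 3, |b| = 3, so Dice = 4/6 ≈ 0.667
```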

3.
Neural Netw ; 179: 106505, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-39002205

ABSTRACT

Unsupervised domain adaptation (UDA) aims to transfer knowledge from previous, related labeled datasets (sources) to a new unlabeled dataset (target). Despite impressive performance, existing approaches have largely focused on image-based UDA; video-based UDA has been relatively understudied, owing to the difficulty of adapting diverse modal video features and modeling temporal associations efficiently. Existing studies use optical flow to capture motion cues between consecutive in-domain frames, but this is limited by heavy compute requirements, and modeling flow patterns across diverse domains is equally challenging. In this work, we propose an adversarial domain adaptation approach for video semantic segmentation that aims to align temporally associated pixels in successive source and target domain frames without relying on optical flow. Specifically, we introduce a Perceptual Consistency Matching (PCM) strategy that leverages perceptual similarity to identify pixels with high correlation across consecutive frames, and infers that such pixels should correspond to the same class. We can therefore enhance prediction accuracy for video-UDA by enforcing consistency not only between in-domain frames but also across domains, using PCM objectives during model training. Extensive experiments on public datasets show the benefit of our approach over existing state-of-the-art UDA methods. Our approach not only addresses a crucial task in video domain adaptation but also offers notable performance improvements with faster inference times.
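As a loose sketch of the matching idea behind PCM (not the paper's implementation; the threshold `tau` and the feature shapes are assumptions), pixels whose per-pixel feature vectors are highly similar across consecutive frames can be flagged with a cosine-similarity threshold, and consistency objectives applied only there:

```python
import numpy as np

def pcm_mask(feat_t, feat_t1, tau=0.9):
    """Flag pixels whose per-pixel feature vectors in consecutive frames
    have cosine similarity above tau; such pixels are assumed to share a
    class, so consistency losses are enforced only on them."""
    def unit(f):  # f: (C, H, W) -> channel-normalized features
        return f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-12)
    sim = (unit(feat_t) * unit(feat_t1)).sum(axis=0)  # (H, W) cosine map
    return sim > tau

rng = np.random.default_rng(0)
f0 = rng.random((8, 4, 4))
mask = pcm_mask(f0, f0)  # identical frames: every pixel matches itself
```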

4.
Trop Anim Health Prod ; 56(6): 192, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38954103

ABSTRACT

Accurate breed identification in dairy cattle is essential for optimizing herd management and improving genetic standards. A smart method for correctly identifying phenotypically similar breeds can empower farmers to enhance herd productivity. A convolutional neural network (CNN) based model was developed for the identification of Sahiwal and Red Sindhi cows. To increase the classification accuracy, the cows' pixels were first segmented from the background using a CNN model. From this segmented image, a masked image was produced by retaining the cows' pixels from the original image while eliminating the background. To further improve the classification accuracy, models were trained on four different images of each cow: front view, side view, grayscale front view, and grayscale side view. The masked images of these views were fed to a multi-input CNN model that predicts the class of the input images. The segmentation model achieved intersection-over-union (IoU) and F1-score values of 81.75% and 85.26%, respectively, with an inference time of 296 ms. For the classification task, multiple variants of MobileNet and EfficientNet models were used as the backbone along with pre-trained weights. The MobileNet model achieved 80.0% accuracy for both breeds, while MobileNetV2 and MobileNetV3 reached 82.0% accuracy. CNN models with EfficientNet backbones outperformed MobileNet models, with accuracy ranging from 84.0% to 86.0%. The F1-scores for these models were above 83.0%, indicating effective breed classification with few false positives and negatives. Thus, the present study demonstrates that deep learning models can be used effectively to identify phenotypically similar cattle breeds. By accurately identifying zebu breeds, this study will reduce farmers' dependence on experts.
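The masked-image preprocessing described above (retain cow pixels, eliminate the background) amounts to an element-wise product of the image with the segmentation mask. A minimal sketch with illustrative toy arrays:

```python
import numpy as np

def apply_mask(image, seg_mask):
    """Keep foreground pixels and zero out the background by multiplying
    the image with a broadcast boolean segmentation mask."""
    # image: (H, W, 3) uint8, seg_mask: (H, W) bool foreground mask
    return image * seg_mask[..., None].astype(image.dtype)

img = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[True, False], [False, True]])
masked = apply_mask(img, mask)
# foreground pixels keep value 200, background pixels become 0
```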


Subjects
Deep Learning , Phenotype , Animals , Cattle , Breeding , Neural Networks, Computer , Female , Dairying/methods
5.
Sensors (Basel) ; 24(13)2024 Jun 21.
Article in English | MEDLINE | ID: mdl-39000825

ABSTRACT

Intelligent Traditional Chinese Medicine can provide people with a convenient way to participate in daily health care, and its ease of acceptance is a major advantage in promoting health management. In Traditional Chinese Medicine, tongue imaging is an important step in the examination process, and the segmentation and processing of the tongue image directly affect the results of intelligent diagnosis. As intelligent Traditional Chinese Medicine continues to develop, remote diagnosis and patient participation will play important roles, and smartphone sensor cameras can provide irreplaceable data collection capabilities for enhancing this interaction. However, images captured this way differ in size and quality owing to differences in shooting equipment, the professionalism of the photographer, and the subject's cooperation. Most current tongue image segmentation algorithms are based on data collected by professional tongue diagnosis instruments in standard environments and cannot deliver the same segmentation quality in complex environments. We therefore propose a segmentation algorithm for tongue images collected in complex multi-device and multi-user environments. In the encoder, we use convolutional attention and extend state space models to the 2D setting. In the decoder, cross-layer connection fusion is used to fuse shallow texture and deep semantic features. In segmentation experiments on tongue image datasets collected by patients and doctors in real-world settings, our algorithm significantly improves segmentation performance and accuracy.


Subjects
Algorithms , Image Processing, Computer-Assisted , Medicine, Chinese Traditional , Tongue , Tongue/diagnostic imaging , Humans , Medicine, Chinese Traditional/methods , Image Processing, Computer-Assisted/methods , Smartphone
6.
Sci Rep ; 14(1): 16389, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39013980

ABSTRACT

Fluorescence polarization (Fpol) imaging of methylene blue (MB) is a promising quantitative approach to thyroid cancer detection. Clinical translation of MB Fpol technology requires reducing the data analysis time, which can be achieved via deep learning-based automated cell segmentation with a 2D U-Net convolutional neural network. The model was trained and tested using images of pathologically diverse human thyroid cells and evaluated by comparing the number of cells selected, segmented areas, and Fpol values obtained using automated (AU) and manual (MA) data processing methods. Overall, the model segmented 15.8% more cells than the human operator. Differences in AU and MA segmented cell areas ranged from -55.2% to +31.0%, whereas differences in Fpol values ranged from -20.7% to +10.7%. No statistically significant differences between AU- and MA-derived Fpol data were observed. The largest differences in Fpol values correlated with the greatest discrepancies between AU and MA segmented cell areas. Automated processing took 10 s, versus the one hour required for MA data processing. Implementation of automated cell analysis makes quantitative fluorescence polarization-based diagnosis clinically feasible.
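For context, fluorescence polarization is conventionally defined as (I∥ − I⊥) / (I∥ + I⊥). A minimal sketch with illustrative intensity values (instrument details such as the G-factor correction are omitted, and the numbers are not from this study):

```python
def fpol(i_par, i_perp):
    """Standard fluorescence polarization from parallel and perpendicular
    emission intensities: (I_par - I_perp) / (I_par + I_perp)."""
    return (i_par - i_perp) / (i_par + i_perp)

# Per-cell Fpol from segmented mean intensities (illustrative values):
p = fpol(120.0, 80.0)  # 40 / 200 = 0.2
```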


Subjects
Deep Learning , Thyroid Neoplasms , Humans , Thyroid Neoplasms/pathology , Thyroid Neoplasms/diagnostic imaging , Thyroid Neoplasms/diagnosis , Methylene Blue , Fluorescence Polarization/methods , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Thyroid Gland/pathology , Thyroid Gland/diagnostic imaging , Cytology
7.
Comput Methods Programs Biomed ; 254: 108317, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38996804

ABSTRACT

BACKGROUND AND OBJECTIVE: Preterm delivery is an important factor in the disease burden of newborns and infants worldwide. Electrohysterography (EHG) has become a promising technique for predicting this condition, thanks to its high degree of sensitivity. Despite the technological progress made in predicting preterm labor, its use in clinical practice is still limited; one of the main barriers is the lack of tools for automatic signal processing without expert supervision, i.e. automatic screening of motion and respiratory artifacts in EHG records. Our main objective was thus to design and validate an automatic system for segmenting and screening the physiological segments of uterine origin in EHG records, enabling robust characterization of uterine myoelectric activity and prediction of preterm labor, and helping to promote the transferability of the EHG technique to clinical practice. METHODS: We combined 300 EHG recordings from the TPEHG DS database and 69 EHG recordings from our own database (Ci2B-La Fe) of women with singleton gestations. This dataset was used to train and evaluate U-Net, U-Net++, and U-Net 3+ for semantic segmentation of the physiological and artifacted segments of EHG signals. The models' predictions were then fine-tuned by post-processing. RESULTS: U-Net 3+ outperformed the other models, achieving an area under the ROC curve of 91.4% and an average precision of 96.4% in detecting physiological activity. Thresholds from 0.6 to 0.8 achieved precision from 93.7% to 97.4% and specificity from 81.7% to 94.5%, detecting high-quality physiological segments while maintaining a trade-off between recall and specificity. Post-processing improved the model's adaptability by fine-tuning both the physiological and corrupted segments, ensuring accurate artifact detection while maintaining physiological segment integrity in EHG signals.
CONCLUSIONS: As automatic segmentation proved to be as effective as double-blind manual segmentation in predicting preterm labor, this automatic segmentation tool fills a crucial gap in the existing preterm delivery prediction workflow by eliminating the need for double-blind segmentation by experts, and facilitates the practical clinical use of EHG. This work potentially contributes to the early detection of women in authentic preterm labor and will allow clinicians to design individual patient strategies for maternal health surveillance and predict adverse pregnancy outcomes.
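The threshold sweep reported in the RESULTS can be reproduced schematically: given per-segment probabilities and labels, precision and specificity at a chosen threshold follow directly from the confusion matrix. The toy values below are illustrative, not from the TPEHG DS or Ci2B-La Fe data:

```python
import numpy as np

def precision_specificity(prob, label, thr):
    """Precision and specificity of thresholded predictions
    (1 = physiological segment, 0 = artifacted segment)."""
    pred = prob >= thr
    tp = np.sum(pred & (label == 1))   # physiological, correctly kept
    fp = np.sum(pred & (label == 0))   # artifact wrongly kept
    tn = np.sum(~pred & (label == 0))  # artifact correctly rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return precision, specificity

prob = np.array([0.9, 0.7, 0.4, 0.2])
label = np.array([1, 1, 0, 0])
# threshold 0.6: TP=2, FP=0, TN=2, so precision 1.0 and specificity 1.0
```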

8.
Sensors (Basel) ; 24(13)2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39001142

ABSTRACT

The semantic segmentation of the 3D operating environment is key to intelligent mining shovels' autonomous digging and loading operations. However, the operating environment of intelligent mining shovels is complex: scene targets are varied and sample numbers are uneven. This lowers 3D semantic segmentation accuracy and thereby reduces the autonomous operation accuracy of intelligent mining shovels. To solve these issues, this paper proposes a 3D point cloud semantic segmentation network based on memory enhancement and lightweight attention mechanisms. The model addresses three problems: the uneven number of sampled scene targets, insufficient extraction of key features (which lowers semantic segmentation accuracy), and insufficient model lightweighting (which limits deployability). Firstly, we investigate a memory enhancement learning mechanism, establishing a memory module for the key semantic features of the targets. This counteracts the forgetting of non-dominant target point cloud features caused by the unbalanced number of samples and enhances semantic segmentation accuracy. Subsequently, a channel attention mechanism is studied: an attention module based on the statistical characteristics of each channel is established, and the expression of key features is improved by adjusting feature weights, further improving segmentation accuracy. Finally, a lightweight mechanism is studied, adopting depthwise separable convolution in place of conventional convolution to reduce the number of model parameters. Experiments demonstrate that the proposed method improves the accuracy of semantic segmentation in 3D scenes while reducing model complexity. Semantic segmentation accuracy is improved by 7.15% on average compared with the experimental control methods, which contributes to the autonomous operation accuracy and safety of intelligent mining shovels.
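The parameter saving from replacing conventional convolution with depthwise separable convolution can be checked by counting weights. A sketch for an assumed 64-to-128-channel 3×3 layer (biases omitted; the channel counts are illustrative, not this network's):

```python
def conv_params(c_in, c_out, k):
    """Weight counts for a standard k x k convolution versus its
    depthwise-separable factorization (depthwise k x k + pointwise 1 x 1)."""
    standard = c_in * c_out * k * k
    separable = c_in * k * k + c_in * c_out
    return standard, separable

std, sep = conv_params(64, 128, 3)
# 64*128*9 = 73728 vs 64*9 + 64*128 = 8768: roughly 8.4x fewer parameters
```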

9.
Sensors (Basel) ; 24(11)2024 May 24.
Article in English | MEDLINE | ID: mdl-38894177

ABSTRACT

Visual simultaneous localization and mapping (VSLAM) enhances the navigation of autonomous agents in unfamiliar environments by progressively constructing maps and estimating poses. However, conventional VSLAM pipelines often exhibited degraded performance in dynamic environments featuring mobile objects. Recent research in deep learning led to notable progress in semantic segmentation, which involves assigning semantic labels to image pixels. The integration of semantic segmentation into VSLAM can effectively differentiate between static and dynamic elements in intricate scenes. This paper provided a comprehensive comparative review on leveraging semantic segmentation to improve major components of VSLAM, including visual odometry, loop closure detection, and environmental mapping. Key principles and methods for both traditional VSLAM and deep semantic segmentation were introduced. This paper presented an overview and comparative analysis of the technical implementations of semantic integration across various modules of the VSLAM pipeline. Furthermore, it examined the features and potential use cases associated with the fusion of VSLAM and semantics. It was found that the existing VSLAM model continued to face challenges related to computational complexity. Promising future research directions were identified, including efficient model design, multimodal fusion, online adaptation, dynamic scene reconstruction, and end-to-end joint optimization. This review shed light on the emerging paradigm of semantic VSLAM and how deep learning-enabled semantic reasoning could unlock new capabilities for autonomous intelligent systems to operate reliably in the real world.

10.
Sensors (Basel) ; 24(11)2024 May 26.
Article in English | MEDLINE | ID: mdl-38894210

ABSTRACT

In hazardous environments like mining sites, mobile inspection robots play a crucial role in condition monitoring (CM) tasks, particularly by collecting various kinds of data, such as images. However, the sheer volume of collected image samples and existing noise pose challenges in processing and visualizing thermal anomalies. Recognizing these challenges, our study addresses the limitations of industrial big data analytics for mobile robot-generated image data. We present a novel, fully integrated approach involving a dimension reduction procedure. This includes a semantic segmentation technique utilizing the pre-trained VGG16 CNN architecture for feature selection, followed by random forest (RF) and extreme gradient boosting (XGBoost) classifiers for the prediction of the pixel class labels. We also explore unsupervised learning using the PCA-K-means method for dimension reduction and classification of unlabeled thermal defects based on anomaly severity. Our comprehensive methodology aims to efficiently handle image-based CM tasks in hazardous environments. To validate its practicality, we applied our approach in a real-world scenario, and the results confirm its robust performance in processing and visualizing thermal data collected by mobile inspection robots. This affirms the effectiveness of our methodology in enhancing the overall performance of CM processes.

11.
Sensors (Basel) ; 24(11)2024 May 26.
Article in English | MEDLINE | ID: mdl-38894212

ABSTRACT

Advancements in imaging, computer vision, and automation have revolutionized various fields, including field-based high-throughput plant phenotyping (FHTPP). This integration allows for the rapid and accurate measurement of plant traits. Deep Convolutional Neural Networks (DCNNs) have emerged as a powerful tool in FHTPP, particularly in crop segmentation (identifying crops from the background), which is crucial for trait analysis. However, the effectiveness of DCNNs often hinges on the availability of large, labeled datasets, which poses a challenge due to the high cost of labeling. In this study, a deep learning with bagging approach is introduced to enhance crop segmentation using high-resolution RGB images, tested on the NU-Spidercam dataset from maize plots. The proposed method outperforms traditional machine learning and deep learning models in prediction accuracy and speed. Remarkably, it achieves up to 40% higher Intersection-over-Union (IoU) than the threshold method and 11% over conventional machine learning, with significantly faster prediction times and manageable training duration. Crucially, it demonstrates that even small labeled datasets can yield high accuracy in semantic segmentation. This approach not only proves effective for FHTPP but also suggests potential for broader application in remote sensing, offering a scalable solution to semantic segmentation challenges. This paper is accompanied by publicly available source code.
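The Intersection-over-Union metric cited above can be sketched for binary crop masks (toy arrays, not the NU-Spidercam data):

```python
import numpy as np

def iou(pred, target):
    """Intersection-over-Union between two binary masks: |A∩B| / |A∪B|."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [1, 0]])
# intersection 1, union 3, so IoU = 1/3
```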


Subjects
Crops, Agricultural , Deep Learning , Image Processing, Computer-Assisted , Neural Networks, Computer , Phenotype , Zea mays , Image Processing, Computer-Assisted/methods , Semantics
12.
Sensors (Basel) ; 24(11)2024 Jun 02.
Article in English | MEDLINE | ID: mdl-38894374

ABSTRACT

Visual Simultaneous Localization and Mapping (V-SLAM) plays a crucial role in the development of intelligent robotics and autonomous navigation systems. However, it still faces significant challenges in handling highly dynamic environments. The prevalent method currently used for dynamic object recognition in the environment is deep learning. However, models such as YOLOv5 and Mask R-CNN require significant computational resources, which limits their potential in real-time applications due to hardware and time constraints. To overcome this limitation, this paper proposes ADM-SLAM, a visual SLAM system designed for dynamic environments that builds upon ORB-SLAM2. The system integrates efficient adaptive feature point homogenization extraction, lightweight deep learning semantic segmentation based on an improved DeepLabv3, and multi-view geometric segmentation. It optimizes keyframe extraction, segments potential dynamic objects using contextual information with the semantic segmentation network, and detects the motion states of dynamic objects using multi-view geometric methods, thereby eliminating dynamic interference points. The results indicate that ADM-SLAM outperforms ORB-SLAM2 in dynamic environments, especially in high-dynamic scenes, where it achieves up to a 97% reduction in Absolute Trajectory Error (ATE). In various highly dynamic test sequences, ADM-SLAM outperforms DS-SLAM and DynaSLAM in terms of real-time performance and accuracy, proving its excellent adaptability.
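Absolute Trajectory Error, the metric behind the 97% reduction above, is commonly reported as the RMSE of per-frame position errors. A minimal sketch that omits the usual trajectory-alignment step (the toy trajectories and the 0.1 m offset are illustrative):

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute Trajectory Error (RMSE form): root-mean-square of the
    per-frame Euclidean distance between estimated and ground-truth
    positions. The SE(3)/Sim(3) alignment step is omitted for brevity."""
    d = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(d ** 2)))

gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
est = np.array([[0.0, 0.1], [1.0, 0.1], [2.0, 0.1]])
# constant 0.1 m offset, so ATE RMSE = 0.1
```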

13.
Sensors (Basel) ; 24(11)2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38894421

ABSTRACT

Steel structures are susceptible to corrosion due to their exposure to the environment. Currently used non-destructive techniques require inspector involvement, and inaccessibility of the defective part may leave corrosion unnoticed, allowing it to propagate and cause catastrophic structural failure over time. Autonomous corrosion detection is essential for mitigating these problems. This study investigated which type of encoder-decoder neural network and which training strategy work best to automate the segmentation of corroded pixels in visual images. Models using pre-trained DenseNet121 and EfficientNetB7 backbones yielded 96.78% and 98.5% average pixel-level accuracy, respectively. The deeper EfficientNetB7 performed the worst, with only 33% true-positive values, 58% less than ResNet34 and the original UNet. ResNet34 successfully classified the corroded pixels, with 2.98% false positives, whereas the original UNet predicted 8.24% of the non-corroded pixels as corroded when tested on a set of images excluded from the training dataset. Deep networks were found to be better suited to transfer learning than to full training, and the small dataset could be one reason for the performance degradation. Both the fully trained conventional UNet and the ResNet34 model were tested on external images of different steel structures with different colors and types of corrosion, with the ResNet34 backbone outperforming the conventional UNet.

14.
Front Plant Sci ; 15: 1335037, 2024.
Article in English | MEDLINE | ID: mdl-38895615

ABSTRACT

Canopy temperature (CT) is often interpreted as representing leaf activity traits such as photosynthetic rates, gas exchange rates, or stomatal conductance. This interpretation is based on the observation that leaf activity traits correlate with transpiration which affects leaf temperature. Accordingly, CT measurements may provide a basis for high throughput assessments of the productivity of wheat canopies during early grain filling, which would allow distinguishing functional from dysfunctional stay-green. However, whereas the usefulness of CT as a fast surrogate measure of sustained vigor under soil drying is well established, its potential to quantify leaf activity traits under high-yielding conditions is less clear. To better understand sensitivity limits of CT measurements under high yielding conditions, we generated within-genotype variability in stay-green functionality by means of differential short-term pre-anthesis canopy shading that modified the sink:source balance. We quantified the effects of these modifications on stay-green properties through a combination of gold standard physiological measurements of leaf activity and newly developed methods for organ-level senescence monitoring based on timeseries of high-resolution imagery and deep-learning-based semantic image segmentation. In parallel, we monitored CT by means of a pole-mounted thermal camera that delivered continuous, ultra-high temporal resolution CT data. Our results show that differences in stay-green functionality translate into measurable differences in CT in the absence of major confounding factors. Differences amounted to approximately 0.8°C and 1.5°C for a very high-yielding source-limited genotype, and a medium-yielding sink-limited genotype, respectively. 
The gradual nature of the effects of shading on CT during the stay-green phase underscores the importance of a high measurement frequency and a time-integrated analysis of CT, whilst modest effect sizes confirm the importance of restricting screenings to a limited range of morphological and phenological diversity.

15.
BMC Med Imaging ; 24(1): 154, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902660

ABSTRACT

BACKGROUND: Acute pancreatitis is one of the most common diseases requiring emergency surgery. Rapid and accurate recognition of acute pancreatitis can help improve clinical outcomes. This study aimed to develop a deep learning-powered diagnostic model for acute pancreatitis. MATERIALS AND METHODS: In this investigation, we enrolled a cohort of 190 patients with acute pancreatitis who were admitted to Sichuan Provincial People's Hospital between January 2020 and December 2021. Abdominal computed tomography (CT) scans were obtained from both patients with acute pancreatitis and healthy individuals. Our model was constructed using two modules: (1) the acute pancreatitis classifier module; (2) the pancreatitis lesion segmentation module. Each model's performance was assessed based on precision, recall rate, F1-score, Area Under the Curve (AUC), loss rate, frequency-weighted accuracy (fwavacc), and Mean Intersection over Union (MIOU). RESULTS: Upon admission, significant variations were observed between patients with mild and severe acute pancreatitis in inflammatory indexes, liver, and kidney function indicators, as well as coagulation parameters. The acute pancreatitis classifier module exhibited commendable diagnostic efficacy, showing an impressive AUC of 0.993 (95%CI: 0.978-0.999) in the test set (comprising healthy examination patients vs. those with acute pancreatitis, P < 0.001) and an AUC of 0.850 (95%CI: 0.790-0.898) in the external validation set (healthy examination patients vs. patients with acute pancreatitis, P < 0.001). Furthermore, the acute pancreatitis lesion segmentation module demonstrated exceptional performance in the validation set. For pancreas segmentation, peripancreatic inflammatory exudation, peripancreatic effusion, and peripancreatic abscess necrosis, the MIOU values were 86.02 (84.52, 87.20), 61.81 (56.25, 64.83), 57.73 (49.90, 68.23), and 66.36 (55.08, 72.12), respectively. 
These findings underscore the robustness and reliability of the developed models in accurately characterizing and assessing acute pancreatitis. CONCLUSION: The diagnostic model for acute pancreatitis, driven by deep learning, exhibits excellent efficacy in accurately evaluating the severity of the condition. TRIAL REGISTRATION: This is a retrospective study.


Subjects
Deep Learning , Pancreatitis , Tomography, X-Ray Computed , Humans , Pancreatitis/diagnostic imaging , Male , Female , Tomography, X-Ray Computed/methods , Middle Aged , Adult , Acute Disease , Aged , Retrospective Studies
16.
J Imaging ; 10(6)2024 May 21.
Article in English | MEDLINE | ID: mdl-38921602

ABSTRACT

A fundamental task in computer vision is the differentiation and identification of objects or entities in a visual scene using semantic segmentation methods. Transformer networks have surpassed traditional convolutional neural network (CNN) architectures in segmentation performance. However, the continuous pursuit of optimal performance on the popular evaluation metrics has led to very large architectures that require a significant amount of computational power to operate, making them prohibitive for real-time applications, including autonomous driving. In this paper, we propose a model that leverages a visual transformer encoder with a parallel twin decoder, consisting of a visual transformer decoder and a CNN decoder with multi-resolution connections working in parallel. The two decoders are merged with the aid of two trainable CNN blocks: the fuser, which combines the information from the two decoders, and the scaler, which scales the contribution of each decoder. The proposed model achieves state-of-the-art performance on the Cityscapes and ADE20K datasets while maintaining a low-complexity network that can be used in real-time applications.

17.
Biomedicines ; 12(6)2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38927516

ABSTRACT

This article addresses the semantic segmentation of laparoscopic surgery images, placing special emphasis on the segmentation of structures with a smaller number of observations. As a result of this study, adjustment parameters are proposed for deep neural network architectures, enabling robust segmentation of all structures in the surgical scene. The U-Net architecture with five encoder-decoders (U-Net5ed), SegNet-VGG19, and DeepLabv3+ employing different backbones are implemented. Three main experiments are conducted, working with the Rectified Linear Unit (ReLU), Gaussian Error Linear Unit (GELU), and Swish activation functions. The applied loss functions include Cross Entropy (CE), Focal Loss (FL), Tversky Loss (TL), Dice Loss (DiL), Cross Entropy Dice Loss (CEDL), and Cross Entropy Tversky Loss (CETL). The performance of Stochastic Gradient Descent with momentum (SGDM) and Adaptive Moment Estimation (Adam) optimizers is compared. It is qualitatively and quantitatively confirmed that the DeepLabv3+ and U-Net5ed architectures yield the best results. The DeepLabv3+ architecture with the ResNet-50 backbone, Swish activation function, and CETL loss function reports a Mean Accuracy (MAcc) of 0.976 and a Mean Intersection over Union (MIoU) of 0.977. For structures with a smaller number of observations, such as the hepatic vein, cystic duct, liver ligament, and blood, the results obtained are very competitive and promising compared with the consulted literature. The proposed parameters were validated in the YOLOv9 architecture, which showed improved semantic segmentation compared with the results obtained with the original architecture.
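The Tversky loss (TL) listed above generalizes the Dice loss by weighting false positives and false negatives separately, which helps with rarely observed structures. A sketch on soft binary masks (the alpha and beta values are illustrative, not the paper's settings):

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-7):
    """Tversky loss: 1 minus the Tversky index TP / (TP + alpha*FP + beta*FN).
    With beta > alpha, false negatives are penalized more, improving recall
    on small or infrequent structures."""
    tp = np.sum(pred * target)
    fp = np.sum(pred * (1.0 - target))
    fn = np.sum((1.0 - pred) * target)
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

y = np.array([1.0, 0.0, 1.0, 0.0])
loss_perfect = tversky_loss(y, y)         # perfect prediction: loss near 0
loss_miss = tversky_loss(np.zeros(4), y)  # all foreground missed: loss near 1
```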

18.
Neural Netw ; 178: 106475, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38941738

ABSTRACT

Spiking neural networks (SNNs) have attracted attention due to their biological plausibility and their potential for low-energy applications on neuromorphic hardware. Two mainstream approaches are commonly used to obtain SNNs: ANN-to-SNN conversion methods and directly-trained-SNN methods. However, the former achieve excellent performance at the cost of a large number of time steps (i.e., latency), while the latter exhibit lower latency but suffer from suboptimal performance. To tackle this performance-latency trade-off, we propose Self-Architectural Knowledge Distillation (SAKD), an intuitive and effective Knowledge Distillation (KD) method for SNNs. SAKD adopts a bilevel teacher-student training strategy: level 1 directly transfers pre-trained ANN weights of the same architecture to the SNN, and level 2 encourages the SNN to mimic the ANN's behavior in terms of both final responses and intermediate features. Learning with informative supervision signals from labels and ANNs, SAKD achieves new state-of-the-art (SOTA) performance with few time steps on widely used classification benchmark datasets. On ImageNet-1K, with only 4 time steps, our Spiking-ResNet34 model attains a Top-1 accuracy of 70.04%, outperforming previous SOTA methods with the same architecture. Notably, our SEW-ResNet152 model reaches a Top-1 accuracy of 77.30% on ImageNet-1K, setting a new SOTA benchmark for SNNs. Furthermore, we apply SAKD to various dense prediction downstream tasks, such as object detection and semantic segmentation, demonstrating strong generalization ability and superior performance. In conclusion, the proposed SAKD framework presents a promising approach for achieving both high performance and low latency in SNNs, potentially paving the way for future advancements in the field.
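The level-2 mimicry objective can be illustrated with a minimal sketch: a temperature-softened KL term for the final responses plus a mean-squared-error term for intermediate features. This is a generic KD formulation in plain Python, not the paper's exact loss; the temperature value and function names are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def response_mimic_loss(student_logits, teacher_logits, temperature=4.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 as is conventional in knowledge distillation
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return kl * temperature ** 2

def feature_mimic_loss(student_feats, teacher_feats):
    # Mean squared error between intermediate feature activations
    return sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / len(student_feats)
```

In a SAKD-style setup both terms would be added to the ordinary label loss, so the SNN student is supervised jointly by the ground truth and by the same-architecture ANN teacher.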

19.
Sensors (Basel) ; 24(12)2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38931509

ABSTRACT

Oil spills are a major threat to marine and coastal environments. Their unique radar backscatter intensity can be captured by synthetic aperture radar (SAR), resulting in dark regions in the images. However, many marine phenomena can lead to erroneous detections of oil spills. In addition, SAR images of the ocean include multiple targets, such as sea surface, land, ships, and oil spills and their look-alikes. Training a multi-category classifier therefore encounters significant challenges due to the inherent class imbalance, and addressing this issue requires extracting target features more effectively. In this study, a lightweight U-Net-based model, Full-Scale Aggregated MobileUNet (FA-MobileUNet), is proposed to improve oil spill detection performance in SAR images. First, a lightweight MobileNetv3 model is used as the backbone of the U-Net encoder for feature extraction. Next, atrous spatial pyramid pooling (ASPP) and a convolutional block attention module (CBAM) are used to improve the network's capacity to extract multi-scale features and to increase computation speed. Finally, full-scale features from the encoder are aggregated to enhance the network's ability to extract features. The modified network enhances the extraction and integration of features at different scales, improving the accuracy of detecting diverse marine targets. Experimental results showed that the mean intersection over union (mIoU) of the proposed model exceeded 80% for the detection of five types of marine targets: sea surface, land, ships, and oil spills and their look-alikes. In addition, the IoU of the proposed model reached 75.85% and 72.67% for oil spill and look-alike detection, which was 18.94% and 25.55% higher than that of the original U-Net model, respectively. Compared with other segmentation models, the proposed network can more accurately classify the dark regions in SAR images into oil spills and their look-alikes. Furthermore, the detection performance and computational efficiency of the proposed model were validated against other semantic segmentation models.
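The mean intersection over union (mIoU) metric reported above is computed per class and then averaged; a minimal sketch over flattened label maps, in plain Python (the class-ID assignment and the NaN handling for absent classes are illustrative choices):

```python
def iou(pred, target, cls):
    # Intersection over union for a single class over flattened label maps
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else float('nan')

def mean_iou(pred, target, classes):
    scores = [iou(pred, target, c) for c in classes]
    scores = [s for s in scores if s == s]  # drop NaN (class absent from both maps)
    return sum(scores) / len(scores)

# e.g. five class IDs: 0 sea surface, 1 land, 2 ship, 3 oil spill, 4 look-alike
```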

20.
Ultrasound Med Biol ; 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38834493

ABSTRACT

OBJECTIVE: Echocardiographic videos are commonly used for automatic semantic segmentation of the endocardium, which is crucial for evaluating cardiac function and assisting doctors in making accurate diagnoses of heart disease. However, this task faces two distinct challenges: edge blurring, caused by the presence of speckle noise or excessive de-noising, and the lack of an effective approach for fusing multilevel features to obtain an accurate endocardium. METHODS: In this study, a deep learning model based on multilevel edge perception and calibration fusion is proposed to improve segmentation performance. First, a multilevel edge perception module comprehensively extracts edge features through both a detail branch and a semantic branch to alleviate the adverse impact of noise. Second, a calibration fusion module calibrates and integrates various features, including semantic and detailed information, to maximize segmentation performance. Furthermore, the features obtained from the calibration fusion module are stored in a memory architecture to achieve semi-supervised segmentation using both labeled and unlabeled data. RESULTS: Our method is evaluated on two public echocardiography video datasets, achieving average Dice coefficients of 93.05% and 93.93%, respectively. Additionally, we validated our method on a local hospital clinical dataset, achieving a Pearson correlation of 0.765 for predicting left ventricular ejection fraction. CONCLUSION: The proposed semi-supervised model effectively addresses the challenges encountered in echocardiography, improving the segmentation accuracy of the ventricles. This indicates that the proposed model can assist cardiologists in obtaining accurate and effective research and diagnostic results.
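The Dice coefficient used for evaluation measures the overlap between predicted and ground-truth masks; a minimal sketch for binary masks in plain Python (the smoothing epsilon is an illustrative choice):

```python
def dice_coefficient(pred, target, eps=1e-7):
    # Dice = 2|A ∩ B| / (|A| + |B|) for 0/1 masks given as flat sequences
    inter = sum(p * t for p, t in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)
```

A Dice coefficient of 93.05% thus means that, on average, the predicted endocardium masks overlap the reference annotations almost completely.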
