Results 1-20 of 156
1.
BMC Bioinformatics ; 25(1): 269, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39164632

ABSTRACT

BACKGROUND: Fluorescence microscopy (FM) is an important and widely adopted biological imaging technique. Segmentation is often the first step in quantitative analysis of FM images. Deep neural networks (DNNs) have become the state-of-the-art tools for image segmentation. However, their performance on natural images may collapse under certain image corruptions or adversarial attacks. This poses real risks to their deployment in real-world applications. Although the robustness of DNN models in segmenting natural images has been studied extensively, their robustness in segmenting FM images remains poorly understood. RESULTS: To address this deficiency, we have developed an assay that benchmarks the robustness of DNN segmentation models using datasets of realistic synthetic 2D FM images with precisely controlled corruptions or adversarial attacks. Using this assay, we have benchmarked the robustness of ten representative models, such as DeepLab and Vision Transformer. We find that models with good robustness on natural images may perform poorly on FM images. We also find new robustness properties of DNN models and new connections between their corruption robustness and adversarial robustness. To further assess the robustness of the selected models, we have also benchmarked them on real microscopy images of different modalities without using simulated degradation. The results are consistent with those obtained on the realistic synthetic images, confirming the fidelity and reliability of our image synthesis method as well as the effectiveness of our assay. CONCLUSIONS: Based on comprehensive benchmarking experiments, we have found distinct robustness properties of deep neural networks in semantic segmentation of FM images. Based on these findings, we have made specific recommendations on the selection and design of robust models for FM image segmentation.
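The corruption-robustness assay described in this abstract can be illustrated in miniature. The sketch below is not the authors' benchmark: the synthetic image, the thresholding "segmenter" standing in for a trained DNN, and the noise severities are all illustrative assumptions; it only shows the measure-Dice-under-increasing-corruption loop.

```python
import numpy as np

def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 2.0 * inter / total if total > 0 else 1.0

def threshold_segmenter(image, thresh=0.5):
    """Stand-in for a trained DNN: segments bright foreground by thresholding."""
    return image > thresh

def benchmark_corruption_robustness(image, target, severities=(0.0, 0.1, 0.2, 0.4)):
    """Measure how segmentation quality degrades as Gaussian noise severity grows."""
    rng = np.random.default_rng(0)
    scores = []
    for sigma in severities:
        corrupted = image + rng.normal(0.0, sigma, image.shape)
        scores.append(dice_score(threshold_segmenter(corrupted), target))
    return scores

# Synthetic FM-like image: a bright circular "cell" on a dark background.
yy, xx = np.mgrid[:64, :64]
target = (xx - 32) ** 2 + (yy - 32) ** 2 < 15 ** 2
image = np.where(target, 0.9, 0.1).astype(float)
scores = benchmark_corruption_robustness(image, target)
```

With zero corruption the toy segmenter is perfect (Dice = 1.0); as severity increases the Dice score falls, which is exactly the degradation curve such an assay records per model and per corruption type.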


Subjects
Benchmarking; Image Processing, Computer-Assisted; Microscopy, Fluorescence; Neural Networks, Computer; Microscopy, Fluorescence/methods; Benchmarking/methods; Image Processing, Computer-Assisted/methods; Semantics; Deep Learning; Algorithms; Humans
2.
J Exp Bot ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38716775

ABSTRACT

Plant physiology and metabolism rely on the function of stomata, structures on the surface of above-ground organs that facilitate the exchange of gases with the atmosphere. The morphology of the guard cells and the corresponding pore that make up each stoma, as well as the stomatal density (number per unit area), is critical in determining overall gas exchange capacity. These characteristics can be quantified visually from images captured using microscopes, traditionally relying on time-consuming manual analysis. However, deep learning (DL) models provide a promising route to increasing the throughput and accuracy of plant phenotyping tasks, including stomatal analysis. Here we review the published literature on the application of DL for stomatal analysis. We discuss the variation in the pipelines used, from data acquisition and pre-processing through DL architecture to output evaluation and post-processing. We introduce the most common network structures, the plant species that have been studied, and the measurements that have been performed. Through this review, we hope to promote the use of DL methods for plant phenotyping tasks and highlight future requirements to optimise uptake, predominantly focusing on the sharing of datasets and the generalisation of models, as well as the caveats associated with using image data to infer physiological function.

3.
BMC Cancer ; 24(1): 315, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38454349

ABSTRACT

PURPOSE: Rectal tumor segmentation on post-neoadjuvant chemoradiotherapy (nCRT) magnetic resonance imaging (MRI) has great significance for tumor measurement, radiomics analysis, treatment planning, and operative strategy. In this study, we developed and evaluated the segmentation potential of convolutional neural networks exclusively on post-chemoradiation T2-weighted MRI, with the aim of reducing the detection workload for radiologists and clinicians. METHODS: A total of 372 consecutive patients with locally advanced rectal cancer (LARC) were retrospectively enrolled from October 2015 to December 2017. The standard-of-care neoadjuvant process included 22-fraction intensity-modulated radiation therapy and oral capecitabine. Further, 243 patients (3061 slices) were grouped into training and validation datasets with a random 80:20 split, and 41 patients (408 slices) were used as the test dataset. A symmetric eight-layer deep network was developed using the nnU-Net framework, which outputs a segmentation result of the same size as the input. The trained deep learning (DL) network was examined using fivefold cross-validation and tumor lesions with different tumor regression grades (TRGs). RESULTS: At the testing stage, the Dice similarity coefficient (DSC), 95% Hausdorff distance (HD95), and mean surface distance (MSD) were applied to quantitatively evaluate generalization performance. Considering the test dataset (41 patients, 408 slices), the average DSC, HD95, and MSD were 0.700 (95% CI: 0.680-0.720), 17.73 mm (95% CI: 16.08-19.39), and 3.11 mm (95% CI: 2.67-3.56), respectively. Eighty-two percent of the MSD values were less than 5 mm, and fifty-five percent were less than 2 mm (median 1.62 mm, minimum 0.07 mm). CONCLUSIONS: The experimental results indicated that the constructed pipeline could achieve relatively high accuracy. Future work will focus on assessing performance with multicentre external validation.
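The evaluation metrics reported above are standard. Below is a minimal, brute-force sketch of the Dice similarity coefficient and a 95th-percentile Hausdorff distance on small 2D binary masks; the toy masks and the 4-neighbourhood boundary extraction are illustrative, and practical pipelines usually rely on library implementations.

```python
import numpy as np

def dice(pred, target):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * inter / denom if denom else 1.0

def boundary_points(mask):
    """Pixels of the mask that touch the background (4-neighbourhood erosion)."""
    padded = np.pad(mask, 1)
    eroded = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
              & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~eroded)

def hd95(pred, target):
    """95th-percentile symmetric Hausdorff distance between mask boundaries."""
    p, t = boundary_points(pred), boundary_points(target)
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    return max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95))

# Two overlapping 16x16 squares offset by 2 pixels.
a = np.zeros((32, 32), bool); a[8:24, 8:24] = True
b = np.zeros((32, 32), bool); b[10:26, 10:26] = True
```

For these squares the Dice score is 2·196/512 ≈ 0.77, and the HD95 sits near the 2-pixel offset; HD95 is preferred over the plain Hausdorff distance because it discounts isolated outlier boundary pixels.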


Subjects
Deep Learning; Rectal Neoplasms; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Neoadjuvant Therapy; Rectal Neoplasms/diagnostic imaging; Rectal Neoplasms/therapy; Rectal Neoplasms/pathology; Retrospective Studies; Semantics
4.
Eur Radiol ; 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39177855

ABSTRACT

OBJECTIVES: To develop an automatic segmentation model for solid renal tumors on contrast-enhanced CTs and to visualize segmentation with associated confidence to promote clinical applicability. MATERIALS AND METHODS: The training dataset included solid renal tumor patients from two tertiary centers undergoing surgical resection and receiving CT in the corticomedullary or nephrogenic contrast media (CM) phase. Manual tumor segmentation was performed on all axial CT slices, serving as the reference standard for automatic segmentations. Independent testing was performed on the publicly available KiTS 2019 dataset. Ensembles of neural networks (ENN, DeepLabV3) were used for automatic renal tumor segmentation, and their performance was quantified with the DICE score. ENN average foreground entropy measured segmentation confidence (binary: successful segmentation with DICE score > 0.8 versus inadequate segmentation ≤ 0.8). RESULTS: N = 639 and n = 210 patients were included in the training and independent test datasets, respectively. Datasets were comparable regarding age and sex (p > 0.05), while renal tumors in the training dataset were larger and more frequently benign (p < 0.01). In the internal test dataset, the ENN model yielded a median DICE score = 0.84 (IQR: 0.62-0.97, corticomedullary) and 0.86 (IQR: 0.77-0.96, nephrogenic CM phase), and the segmentation confidence an AUC = 0.89 (sensitivity = 0.86; specificity = 0.77). In the independent test dataset, the ENN model achieved a median DICE score = 0.84 (IQR: 0.71-0.97, corticomedullary CM phase), and the segmentation confidence an accuracy = 0.84 (sensitivity = 0.86 and specificity = 0.81). ENN segmentations were visualized with color-coded voxelwise tumor probabilities and thresholds superimposed on clinical CT images. CONCLUSIONS: ENN-based renal tumor segmentation performs robustly on external test data and might aid in renal tumor classification and treatment planning.
CLINICAL RELEVANCE STATEMENT: Ensembles of neural networks (ENN) models could automatically segment renal tumors on routine CTs, enabling and standardizing downstream image analyses and treatment planning. Providing confidence measures and segmentation overlays on images can lower the threshold for clinical ENN implementation. KEY POINTS: Ensembles of neural networks (ENN) segmentation is visualized by color-coded voxelwise tumor probabilities and thresholds. ENN provided a high segmentation accuracy in internal testing and in an independent external test dataset. ENN models provide measures of segmentation confidence which can robustly discriminate between successful and inadequate segmentations.
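One common way to realise an ensemble "foreground entropy" confidence measure, sketched below as an illustrative reading rather than the authors' exact definition, is to average the binary entropy of the ensemble-mean probability over the voxels the ensemble calls foreground: agreeing ensemble members yield low entropy (high confidence), disagreeing members yield high entropy.

```python
import numpy as np

def ensemble_foreground_entropy(prob_maps, fg_thresh=0.5, eps=1e-8):
    """Average binary entropy (nats) of the ensemble-mean foreground
    probability over predicted-foreground voxels; lower = more confident."""
    p = np.mean(prob_maps, axis=0)          # ensemble-mean probability map
    fg = p > fg_thresh
    if not fg.any():
        return 0.0
    q = np.clip(p[fg], eps, 1 - eps)
    return float(np.mean(-q * np.log(q) - (1 - q) * np.log(1 - q)))

# Agreeing ensemble: confidently foreground everywhere.
agree = np.stack([np.full((8, 8), 0.95)] * 3)
# Disagreeing ensemble: members split on how likely foreground is.
disagree = np.stack([np.full((8, 8), 0.95),
                     np.full((8, 8), 0.55),
                     np.full((8, 8), 0.60)])
```

Thresholding such an entropy value is then enough to flag likely-inadequate segmentations (e.g., those with DICE ≤ 0.8) for human review.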

5.
Surg Endosc ; 38(2): 1088-1095, 2024 02.
Article in English | MEDLINE | ID: mdl-38216749

ABSTRACT

BACKGROUND: The precise recognition of liver vessels during liver parenchymal dissection is a crucial technique for laparoscopic liver resection (LLR). This retrospective feasibility study aimed to develop artificial intelligence (AI) models to recognize liver vessels in LLR and to evaluate their accuracy and real-time performance. METHODS: Images from LLR videos were extracted, and the hepatic veins and Glissonean pedicles were labeled separately. Two AI models were developed to recognize liver vessels: the "2-class model", which recognized both hepatic veins and Glissonean pedicles as equivalent vessels and distinguished them from the background class, and the "3-class model", which recognized them all separately. The Feature Pyramid Network was used as the neural network architecture for both models in their semantic segmentation tasks. The models were evaluated using fivefold cross-validation tests, and the Dice coefficient (DC) was used as the evaluation metric. Ten gastroenterological surgeons also evaluated the models qualitatively using a rubric. RESULTS: In total, 2421 frames from 48 video clips were extracted. The mean DC value of the 2-class model was 0.789, with a processing speed of 0.094 s. The mean DC values for the hepatic vein and the Glissonean pedicle in the 3-class model were 0.631 and 0.482, respectively. The average processing time for the 3-class model was 0.097 s. Qualitative evaluation by surgeons revealed that false-negative and false-positive ratings in the 2-class model averaged 4.40 and 3.46, respectively, on a five-point scale, while the false-negative, false-positive, and vessel differentiation ratings in the 3-class model averaged 4.36, 3.44, and 3.28, respectively, on a five-point scale. CONCLUSION: We successfully developed deep-learning models that recognize liver vessels in LLR with high accuracy and sufficient processing speed. These findings suggest the potential of a new real-time automated navigation system for LLR.


Subjects
Artificial Intelligence; Laparoscopy; Humans; Retrospective Studies; Liver/diagnostic imaging; Liver/surgery; Liver/blood supply; Hepatectomy/methods; Laparoscopy/methods
6.
BMC Med Imaging ; 24(1): 154, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902660

ABSTRACT

BACKGROUND: Acute pancreatitis is one of the most common diseases requiring emergency surgery. Rapid and accurate recognition of acute pancreatitis can help improve clinical outcomes. This study aimed to develop a deep learning-powered diagnostic model for acute pancreatitis. MATERIALS AND METHODS: In this investigation, we enrolled a cohort of 190 patients with acute pancreatitis who were admitted to Sichuan Provincial People's Hospital between January 2020 and December 2021. Abdominal computed tomography (CT) scans were obtained from both patients with acute pancreatitis and healthy individuals. Our model was constructed using two modules: (1) the acute pancreatitis classifier module; (2) the pancreatitis lesion segmentation module. Each model's performance was assessed based on precision, recall rate, F1-score, Area Under the Curve (AUC), loss rate, frequency-weighted accuracy (fwavacc), and Mean Intersection over Union (MIoU). RESULTS: Upon admission, significant variations were observed between patients with mild and severe acute pancreatitis in inflammatory indexes, liver and kidney function indicators, and coagulation parameters. The acute pancreatitis classifier module exhibited commendable diagnostic efficacy, showing an impressive AUC of 0.993 (95%CI: 0.978-0.999) in the test set (healthy individuals vs. patients with acute pancreatitis, P < 0.001) and an AUC of 0.850 (95%CI: 0.790-0.898) in the external validation set (healthy individuals vs. patients with acute pancreatitis, P < 0.001). Furthermore, the acute pancreatitis lesion segmentation module demonstrated exceptional performance in the validation set. For pancreas segmentation, peripancreatic inflammatory exudation, peripancreatic effusion, and peripancreatic abscess necrosis, the MIoU values were 86.02 (84.52, 87.20), 61.81 (56.25, 64.83), 57.73 (49.90, 68.23), and 66.36 (55.08, 72.12), respectively.
These findings underscore the robustness and reliability of the developed models in accurately characterizing and assessing acute pancreatitis. CONCLUSION: The diagnostic model for acute pancreatitis, driven by deep learning, exhibits excellent efficacy in accurately evaluating the severity of the condition. TRIAL REGISTRATION: This is a retrospective study.
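The MIoU values quoted above follow the usual mean-Intersection-over-Union definition: per-class IoU averaged over the classes present. A minimal sketch with toy label maps (the arrays are illustrative):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy 3x3 label maps with three classes (0 = background).
pred = np.array([[0, 0, 1], [0, 2, 2], [1, 1, 2]])
target = np.array([[0, 0, 1], [0, 2, 1], [1, 1, 2]])
```

Here class 0 matches exactly (IoU 1.0), class 1 has IoU 3/4, and class 2 has IoU 2/3, giving an mIoU of about 0.81.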


Subjects
Deep Learning; Pancreatitis; Tomography, X-Ray Computed; Humans; Pancreatitis/diagnostic imaging; Male; Female; Tomography, X-Ray Computed/methods; Middle Aged; Adult; Acute Disease; Aged; Retrospective Studies
7.
BMC Med Imaging ; 24(1): 95, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654162

ABSTRACT

OBJECTIVE: In radiation therapy, cancerous region segmentation in magnetic resonance images (MRI) is a critical step. For rectal cancer, the automatic segmentation of rectal tumors from an MRI is a great challenge. There are two main shortcomings in existing deep learning-based methods that lead to incorrect segmentation: (1) there are many organs surrounding the rectum, and the shape of some organs is similar to that of rectal tumors; (2) high-level features extracted by conventional neural networks often do not contain enough high-resolution information. Therefore, an improved U-Net segmentation network based on attention mechanisms is proposed to replace the traditional U-Net network. METHODS: The overall framework of the proposed method is based on traditional U-Net. A ResNeSt module was added to extract the overall features, and a shape module was added after the encoder layer. We then combined the outputs of the shape module and the decoder to obtain the results. Moreover, the model used different types of attention mechanisms, so that the network learned information to improve segmentation accuracy. RESULTS: We validated the effectiveness of the proposed method using 3773 2D MRI datasets from 304 patients. The results showed that the proposed method achieved 0.987, 0.946, 0.897, and 0.899 for Dice, MPA, MIoU, and FWIoU, respectively; these values are significantly better than those of other existing methods. CONCLUSION: By saving time, the proposed method can help radiologists segment rectal tumors effectively and enable them to focus on patients whose cancerous regions are difficult for the network to segment. SIGNIFICANCE: The proposed method can help doctors segment rectal tumors, thereby ensuring good diagnostic quality and accuracy.


Subjects
Deep Learning; Magnetic Resonance Imaging; Rectal Neoplasms; Rectal Neoplasms/diagnostic imaging; Rectal Neoplasms/pathology; Humans; Magnetic Resonance Imaging/methods; Neural Networks, Computer; Image Interpretation, Computer-Assisted/methods; Male
8.
Neurosurg Rev ; 47(1): 200, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38722409

ABSTRACT

Appropriate needle manipulation to avoid abrupt deformation of fragile vessels is a critical determinant of the success of microvascular anastomosis. However, no study has yet evaluated the area changes of surgical objects using surgical videos. The present study therefore aimed to develop a deep learning-based semantic segmentation algorithm to assess the area change of vessels during microvascular anastomosis for objective surgical skill assessment with regard to "respect for tissue." The semantic segmentation algorithm was trained based on a ResNet-50 network using microvascular end-to-side anastomosis training videos with artificial blood vessels. Using the created model, video parameters during a single stitch completion task, including the coefficient of variation of vessel area (CV-VA), relative change in vessel area per unit time (ΔVA), and the number of tissue deformation errors (TDE), as defined by a ΔVA threshold, were compared between expert and novice surgeons. A high validation accuracy (99.1%) and Intersection over Union (0.93) were obtained for the auto-segmentation model. During the single-stitch task, the expert surgeons displayed lower values of CV-VA (p < 0.05) and ΔVA (p < 0.05). Additionally, experts committed significantly fewer TDEs than novices (p < 0.05) and completed the task in a shorter time (p < 0.01). Receiver operating characteristic curve analyses indicated relatively strong discriminative capabilities for each video parameter and task completion time, while the combined use of the task completion time and video parameters demonstrated complete discriminative power between experts and novices. In conclusion, the assessment of changes in vessel area during microvascular anastomosis using a deep learning-based semantic segmentation algorithm is presented as a novel concept for evaluating microsurgical performance. This will be useful in future computer-aided devices to enhance surgical education and patient safety.
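The video parameters CV-VA, ΔVA, and TDE can be computed directly from per-frame vessel areas once the segmentation model has produced masks. The sketch below uses illustrative masks and an assumed TDE threshold of 0.15 (the paper defines its own threshold), so it shows only the shape of the computation:

```python
import numpy as np

def vessel_area_metrics(masks, dt=1.0, tde_thresh=0.15):
    """Coefficient of variation of vessel area (CV-VA), relative area change
    per unit time (a stand-in for ΔVA), and the count of tissue deformation
    errors (TDE): frames whose relative change exceeds the threshold."""
    areas = np.array([m.sum() for m in masks], dtype=float)
    cv_va = float(areas.std() / areas.mean())
    delta_va = np.abs(np.diff(areas)) / areas[:-1] / dt
    tde = int((delta_va > tde_thresh).sum())
    return cv_va, delta_va, tde

m_full = np.ones((10, 10), bool)
m_squeezed = np.zeros((10, 10), bool)
m_squeezed[:5] = True                       # vessel area halved by compression
cv_steady, _, tde_steady = vessel_area_metrics([m_full, m_full, m_full])
cv_rough, _, tde_rough = vessel_area_metrics([m_full, m_squeezed, m_full])
```

Steady handling yields CV-VA of zero and no deformation errors, while the abrupt squeeze-and-release registers a nonzero CV-VA and two over-threshold area changes, mirroring how the paper separates expert from novice handling.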


Assuntos
Algoritmos , Anastomose Cirúrgica , Aprendizado Profundo , Humanos , Anastomose Cirúrgica/métodos , Projetos Piloto , Microcirurgia/métodos , Microcirurgia/educação , Agulhas , Competência Clínica , Semântica , Procedimentos Cirúrgicos Vasculares/métodos , Procedimentos Cirúrgicos Vasculares/educação
9.
J Appl Clin Med Phys ; 25(7): e14378, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38729652

ABSTRACT

BACKGROUND: The diagnosis of lumbar spinal stenosis (LSS) can be challenging because radicular pain is not often present in the culprit-level localization. Accurate segmentation and quantitative analysis of the lumbar dura on radiographic images are key to the accurate differential diagnosis of LSS. The aim of this study is to develop an automatic dura-contouring tool for radiographic quantification on computed tomography myelogram (CTM) for patients with LSS. METHODS: A total of 518 CTM cases with or without lumbar stenosis were included in this study. A deep learning (DL) segmentation algorithm 3-dimensional (3D) U-Net was deployed. A total of 210 labeled cases were used to develop the dura-contouring tool, with the ratio of the training, independent testing, and external validation datasets being 150:30:30. The Dice score (DCS) was the primary measure to evaluate the segmentation performance of the 3D U-Net, which was subsequently developed as the dura-contouring tool to segment another unlabeled 308 CTM cases with LSS. Automatic masks of 446 slices on the stenotic levels were then meticulously reviewed and revised by human experts, and the cross-sectional area (CSA) of the dura was compared. RESULTS: The mean DCS of the 3D U-Net were 0.905 ± 0.080, 0.933 ± 0.018, and 0.928 ± 0.034 in the five-fold cross-validation, the independent testing, and the external validation datasets, respectively. The segmentation performance of the dura-contouring tool was also comparable to that of the second observer (the human expert). With the dura-contouring tool, only 59.0% (263/446) of the automatic masks of the stenotic slices needed to be revised. In the revised cases, there were no significant differences in the dura CSA between automatic masks and corresponding revised masks (p = 0.652). Additionally, a strong correlation of dura CSA was found between the automatic masks and corresponding revised masks (r = 0.805). 
CONCLUSIONS: A dura-contouring tool was developed that could automatically segment the dural sac on CTM, and it demonstrated high accuracy and generalization ability. Additionally, the dura-contouring tool has the potential to be applied in patients with LSS because it facilitates the quantification of the dural CSA on stenotic slices.
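The dural CSA itself is a simple quantity once a mask exists: pixel count times in-plane pixel area. A sketch with assumed pixel spacing (the values are illustrative; real spacing comes from the CT acquisition header):

```python
import numpy as np

def dural_csa_mm2(mask, pixel_spacing_mm=(0.5, 0.5)):
    """Cross-sectional area of a binary dura mask in mm^2, given the
    in-plane pixel spacing (row, column) of the CTM slice."""
    return float(mask.sum() * pixel_spacing_mm[0] * pixel_spacing_mm[1])

# Toy mask: a 6 x 4 pixel dural sac cross-section on a 10 x 10 slice.
mask = np.zeros((10, 10), bool)
mask[2:8, 3:7] = True
area = dural_csa_mm2(mask)
```

With 0.5 mm isotropic in-plane spacing, the 24-pixel mask corresponds to 6.0 mm²; comparing this value across slices is what localizes the stenotic level.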


Assuntos
Aprendizado Profundo , Dura-Máter , Vértebras Lombares , Mielografia , Estenose Espinal , Tomografia Computadorizada por Raios X , Humanos , Estenose Espinal/diagnóstico por imagem , Tomografia Computadorizada por Raios X/métodos , Dura-Máter/diagnóstico por imagem , Dura-Máter/patologia , Vértebras Lombares/diagnóstico por imagem , Mielografia/métodos , Masculino , Feminino , Idoso , Pessoa de Meia-Idade , Algoritmos , Processamento de Imagem Assistida por Computador/métodos , Adulto , Estudos Retrospectivos
10.
Sensors (Basel) ; 24(9)2024 May 04.
Article in English | MEDLINE | ID: mdl-38733032

ABSTRACT

Performing a minimally invasive surgery comes with a significant advantage for rehabilitating the patient after the operation. However, it also creates difficulties, mainly for the surgeon or expert who performs the surgical intervention, since only visual information is available and tactile senses cannot be used during keyhole surgeries. This is especially true of laparoscopic hysterectomy, where some organs are difficult to distinguish from visual information alone, making laparoscope-based hysterectomy challenging. In this paper, we propose a solution based on semantic segmentation, which can create pixel-accurate predictions of surgical images and differentiate the uterine arteries, ureters, and nerves. We trained three binary semantic segmentation models based on the U-Net architecture with the EfficientNet-b3 encoder; then, we developed two ensemble techniques that enhanced the segmentation performance. Our pixel-wise ensemble examines the segmentation map of the binary networks at the lowest level of pixels. The other algorithm developed is a region-based ensemble technique that takes this examination to a higher level and makes the ensemble based on every connected component detected by the binary segmentation networks. We also introduced and trained a classic multi-class semantic segmentation model as a reference and compared it to the ensemble-based approaches. We used 586 manually annotated images from 38 surgical videos for this research and have published this dataset.
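The pixel-wise ensemble described above can be sketched as follows: per-structure binary probability maps are fused by taking the highest-scoring structure at each pixel, falling back to background when no binary network is confident. The class names, maps, and the 0.5 threshold are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

# Illustrative label names; index 0 is background.
CLASSES = ["background", "uterine_artery", "ureter", "nerve"]

def pixelwise_ensemble(binary_probs, thresh=0.5):
    """Fuse per-structure binary probability maps into one label map: each
    pixel gets the highest-scoring structure (indices start at 1), or
    background (0) when no binary network clears the threshold."""
    stacked = np.stack(binary_probs)        # (num_structures, H, W)
    best = stacked.argmax(axis=0) + 1
    confident = stacked.max(axis=0) > thresh
    return np.where(confident, best, 0)

# Toy 2x2 outputs of three binary networks.
artery = np.array([[0.9, 0.2], [0.1, 0.10]])
ureter = np.array([[0.3, 0.8], [0.2, 0.10]])
nerve = np.array([[0.1, 0.1], [0.1, 0.95]])
labels = pixelwise_ensemble([artery, ureter, nerve])
```

The region-based variant in the paper applies the same competition per connected component rather than per pixel, which suppresses isolated misclassified pixels.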


Assuntos
Algoritmos , Laparoscopia , Redes Neurais de Computação , Ureter , Artéria Uterina , Humanos , Laparoscopia/métodos , Feminino , Ureter/diagnóstico por imagem , Ureter/cirurgia , Artéria Uterina/cirurgia , Artéria Uterina/diagnóstico por imagem , Processamento de Imagem Assistida por Computador/métodos , Semântica , Histerectomia/métodos
11.
Sensors (Basel) ; 24(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38276388

ABSTRACT

Visual perception is a crucial component of autonomous driving systems. Traditional approaches for autonomous driving visual perception often rely on single-modal methods, and semantic segmentation tasks are accomplished by inputting RGB images. However, for semantic segmentation tasks in autonomous driving visual perception, a more effective strategy involves leveraging multiple modalities, because different sensors of the autonomous driving system bring diverse information, and the complementary features among different modalities enhance the robustness of the semantic segmentation model. Contrary to the intuitive belief that more modalities lead to better accuracy, our research reveals that adding modalities to traditional semantic segmentation models can sometimes decrease precision. Inspired by the residual thinking concept, we propose a multimodal visual perception model that is capable of maintaining or even improving accuracy with the addition of any modality. Our approach is straightforward, using RGB as the main branch and employing the same feature extraction backbone for the other modal branches. The modals score module (MSM) evaluates channel and spatial scores of all modality features, measuring their importance for overall semantic segmentation. Subsequently, the modal branches provide additional features to the RGB main branch through the features complementary module (FCM). Leveraging the residual thinking concept further enhances the feature extraction capabilities of all the branches. Through extensive experiments, we derived several conclusions. The integration of certain modalities into traditional semantic segmentation models tends to result in a decline in segmentation accuracy. In contrast, our proposed simple and scalable multimodal model demonstrates the ability to maintain segmentation precision when accommodating any additional modality. Moreover, our approach surpasses some state-of-the-art multimodal semantic segmentation models.
Additionally, we conducted ablation experiments on the proposed model, confirming that the application of the proposed MSM, FCM, and the incorporation of residual thinking contribute significantly to the enhancement of the model.

12.
Sensors (Basel) ; 24(14)2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39065919

ABSTRACT

Super-resolution semantic segmentation (SRSS) is a technique that aims to obtain high-resolution semantic segmentation results based on resolution-reduced input images. SRSS can significantly reduce computational cost and enable efficient, high-resolution semantic segmentation on mobile devices with limited resources. Some of the existing methods require modifications of the original semantic segmentation network structure or add additional and complicated processing modules, which limits the flexibility of actual deployment. Furthermore, the lack of detailed information in the low-resolution input image renders existing methods susceptible to misdetection at the semantic edges. To address the above problems, we propose a simple but effective framework called multi-resolution learning and semantic edge enhancement-based super-resolution semantic segmentation (MS-SRSS) which can be applied to any existing encoder-decoder based semantic segmentation network. Specifically, a multi-resolution learning mechanism (MRL) is proposed that enables the feature encoder of the semantic segmentation network to improve its feature extraction ability. Furthermore, we introduce a semantic edge enhancement loss (SEE) to alleviate the false detection at the semantic edges. We conduct extensive experiments on the three challenging benchmarks, Cityscapes, Pascal Context, and Pascal VOC 2012, to verify the effectiveness of our proposed MS-SRSS method. The experimental results show that, compared with the existing methods, our method can obtain the new state-of-the-art semantic segmentation performance.

13.
Sensors (Basel) ; 24(6)2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38544142

ABSTRACT

Recent advancements in image segmentation have been notably driven by Vision Transformers. These transformer-based models offer one versatile network structure capable of handling a variety of segmentation tasks. Despite their effectiveness, the pursuit of enhanced capabilities often leads to more intricate architectures and greater computational demands. OneFormer has responded to these challenges by introducing a query-text contrastive learning strategy active during training only. However, this approach has not completely addressed the inefficiency issues in text generation and the contrastive loss computation. To solve these problems, we introduce Efficient Query Optimizer (EQO), an approach that efficiently utilizes multi-modal data to refine query optimization in image segmentation. Our strategy significantly reduces the complexity of parameters and computations by distilling inter-class and inter-task information from an image into a single template sentence. Furthermore, we propose a novel attention-based contrastive loss. It is designed to facilitate a one-to-many matching mechanism in the loss computation, which helps object queries learn more robust representations. Beyond merely reducing complexity, our model demonstrates superior performance compared to OneFormer across all three segmentation tasks using the Swin-T backbone. Our evaluations on the ADE20K dataset reveal that our model outperforms OneFormer in multiple metrics: by 0.2% in mean Intersection over Union (mIoU), 0.6% in Average Precision (AP), and 0.8% in Panoptic Quality (PQ). These results highlight the efficacy of our model in advancing the field of image segmentation.

14.
Sensors (Basel) ; 24(8)2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38676061

ABSTRACT

The real-time monitoring and fault diagnosis of modern machinery and equipment impose higher demands on equipment maintenance. Extracting the morphological characteristics of wear debris in lubricating oil has emerged as a critical approach for the real-time monitoring of wear and holds significant importance in the field. The online visual ferrograph (OLVF) technique serves as a representative method in this study. Various semantic segmentation approaches, such as DeepLabV3+, PSPNet, Segformer, Unet, and other models, are employed to process oil wear-particle images in comparative experiments. To accurately segment the minute wear debris in oil abrasive images and mitigate the influence of reflections and bubbles, we propose a multi-level feature reused Unet (MFR Unet) that enhances the residual link strategy of Unet for improved identification of tiny wear debris in ferrograms, leading to superior segmentation results.

15.
Sensors (Basel) ; 24(8)2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38676090

ABSTRACT

Leveraging data from various modalities to enhance multimodal segmentation tasks is a well-regarded approach. Recently, efforts have been made to incorporate an array of modalities, including depth and thermal imaging. Nevertheless, the effective amalgamation of cross-modal interactions remains a challenge, given the unique traits each modality presents. In our current research, we introduce the semantic guidance fusion network (SGFN), which is an innovative cross-modal fusion network adept at integrating a diverse set of modalities. Particularly, the SGFN features a semantic guidance module (SGM) engineered to boost bi-modal feature extraction. It encompasses a learnable semantic guidance convolution (SGC) designed to merge intensity and gradient data from disparate modalities. Comprehensive experiments carried out on the NYU Depth V2, SUN-RGBD, Cityscapes, MFNet, and ZJU datasets underscore both the superior performance and generalization ability of the SGFN compared to the current leading models. Moreover, when tested on the DELIVER dataset, the efficiency of our bi-modal SGFN displayed a mIoU that is comparable to the hitherto leading model, CMNEXT.

16.
Sensors (Basel) ; 24(7)2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38610313

ABSTRACT

Simultaneous localisation and mapping (SLAM) is crucial in mobile robotics. Most visual SLAM systems assume that the environment is static. However, in real life, there are many dynamic objects, which affect the accuracy and robustness of these systems. To improve the performance of visual SLAM systems, this study proposes a dynamic visual SLAM (SEG-SLAM) system based on the oriented FAST and rotated BRIEF (ORB)-SLAM3 framework and the you only look once (YOLO)v5 deep-learning method. First, based on the ORB-SLAM3 framework, the YOLOv5 deep-learning method is used to construct a fusion module for target detection and semantic segmentation. This module can effectively identify and extract prior information for obviously and potentially dynamic objects. Second, differentiated dynamic feature point rejection strategies are developed for different dynamic objects using the prior information, depth information, and the epipolar geometry method. Thus, the localisation and mapping accuracy of the SEG-SLAM system is improved. Finally, the rejection results are fused with the depth information, and a static dense 3D map without dynamic objects is constructed using the Point Cloud Library. The SEG-SLAM system is evaluated using public TUM datasets and real-world scenarios. The proposed method is more accurate and robust than current dynamic visual SLAM algorithms.
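The simplest form of dynamic feature point rejection, keeping only keypoints that fall outside detected dynamic-object masks, can be sketched as below. The paper additionally uses depth and epipolar-geometry cues for potentially dynamic objects; this sketch covers only the mask-based step, with illustrative data:

```python
import numpy as np

def reject_dynamic_points(keypoints, dynamic_mask):
    """Keep only feature points outside dynamic-object regions.
    keypoints: (N, 2) int array of (row, col); dynamic_mask: bool (H, W),
    True where a detector (e.g., YOLOv5 + segmentation) flags a dynamic object."""
    rows, cols = keypoints[:, 0], keypoints[:, 1]
    keep = ~dynamic_mask[rows, cols]
    return keypoints[keep]

mask = np.zeros((4, 4), bool)
mask[0:2, 0:2] = True                        # a detected moving object, top-left
pts = np.array([[0, 0], [1, 1], [3, 3], [2, 0]])
static_pts = reject_dynamic_points(pts, mask)
```

Points landing on the flagged region are dropped before pose estimation, so moving objects no longer corrupt the camera trajectory.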

17.
Sensors (Basel) ; 24(7)2024 Mar 31.
Article in English | MEDLINE | ID: mdl-38610455

ABSTRACT

To guide orchard management robots in orchard production tasks such as autonomous navigation and precision spraying, this research proposes a deep-learning network called the dynamic fusion segmentation network (DFSNet). The network contains a local feature aggregation (LFA) layer and a dynamic fusion segmentation architecture. The LFA layer uses positional encoders for the initial transforming embedding and progressively aggregates local patterns via a multi-stage hierarchy. The fusion segmentation module (Fus-Seg) formats point tags by learning a multi-embedding space, and the generated tags can further mine the point cloud features. In experiments, DFSNet demonstrated strong segmentation results on the orchard-field dataset, achieving an accuracy of 89.43% and an mIoU of 74.05%. DFSNet outperforms other semantic segmentation networks, such as PointNet, PointNet++, D-PointNet++, DGCNN, and Point-NN, with improved accuracies of 11.73%, 3.76%, 2.36%, and 2.74%, respectively, and improved mIoUs of 28.19%, 9.89%, 6.33%, 9.89%, and 24.69%, respectively, on the all-scale dataset (simple-scale dataset + complex-scale dataset). The proposed DFSNet can capture more information from orchard scene point clouds and provide more accurate point cloud segmentation results, which are beneficial to the management of orchards.
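The accuracy and mIoU figures quoted across these abstracts follow the standard semantic segmentation metric; a minimal generic sketch of mean intersection-over-union over label maps (not tied to any of the networks above):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union for semantic segmentation labels.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```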

18.
Sensors (Basel) ; 24(4)2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38400482

ABSTRACT

The common channel attention mechanism maps feature statistics to feature weights. However, the effectiveness of this mechanism may not be assured in remote sensing images due to statistical differences across multiple bands. This paper proposes a novel channel attention mechanism based on feature information, called the feature information entropy attention mechanism (FEM). The FEM constructs a relationship between features based on feature information entropy and then maps this relationship to their importance. The Vaihingen and OpenEarthMap datasets are selected for experiments. The proposed method is compared with the squeeze-and-excitation mechanism (SEM), the convolutional block attention mechanism (CBAM), and the frequency channel attention mechanism (FCA). Compared with these three channel attention mechanisms, the mIoU of the FEM is improved by 0.90%, 1.10%, and 0.40% on the Vaihingen dataset and by 2.30%, 2.20%, and 2.10% on the OpenEarthMap dataset, respectively. The proposed channel attention mechanism shows better performance in remote sensing land use classification.
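The abstract leaves the entropy-to-weight mapping unspecified; the following toy sketch illustrates the general idea only (histogram-based channel entropy normalized into attention weights via a softmax; the bin count, normalization, and names are assumptions, not the FEM's exact formulation):

```python
import numpy as np

def entropy_channel_weights(features, bins=16):
    """Toy entropy-based channel attention: estimate each channel's
    entropy from a histogram of its activations, then softmax the
    entropies into channel weights."""
    c = features.shape[0]
    entropies = np.empty(c)
    for i in range(c):
        hist, _ = np.histogram(features[i], bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]  # drop empty bins so log is defined
        entropies[i] = -(p * np.log(p)).sum()
    w = np.exp(entropies)  # softmax over channels
    return w / w.sum()
```

Under this toy scheme, a near-constant channel (low entropy, little information) receives a smaller weight than a high-variance one.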

19.
Sensors (Basel) ; 24(11)2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38894421

ABSTRACT

Steel structures are susceptible to corrosion due to their exposure to the environment. Currently used non-destructive techniques require inspector involvement. Inaccessibility of the defective part may lead to unnoticed corrosion, allowing the corrosion to propagate and cause catastrophic structural failure over time. Autonomous corrosion detection is essential for mitigating these problems. This study investigated which type of encoder-decoder neural network and which training strategy work best to automate the segmentation of corroded pixels in visual images. Models using pre-trained DenseNet121 and EfficientNetB7 backbones yielded 96.78% and 98.5% average pixel-level accuracy, respectively. The deeper EfficientNetB7 performed the worst, with only 33% true-positive values, which was 58% less than ResNet34 and the original UNet. ResNet34 successfully classified the corroded pixels, with 2.98% false positives, whereas the original UNet predicted 8.24% of the non-corroded pixels as corroded when tested on a specific set of images not included in the investigated training dataset. Deep networks were found to be better for transfer learning than for full training, and a smaller dataset could be one of the reasons for performance degradation. Both the fully trained conventional UNet and the ResNet34 models were tested on external images of different steel structures with different colors and types of corrosion, with the ResNet34 backbone outperforming the conventional UNet.
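The pixel-level accuracy and false-positive figures quoted above are standard binary-mask metrics; a generic sketch of how they are computed (names are illustrative, not from the paper):

```python
import numpy as np

def corrosion_pixel_metrics(pred, gt):
    """Pixel-level accuracy and false-positive rate for a binary
    corroded/non-corroded segmentation mask."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    acc = (pred == gt).mean()
    fp = np.logical_and(pred, ~gt).sum()
    # share of truly clean pixels that were marked corroded
    fpr = fp / max((~gt).sum(), 1)
    return acc, fpr
```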

20.
Sensors (Basel) ; 24(14)2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39065981

ABSTRACT

The joint action of human activities and environmental changes contributes to the frequent occurrence of landslides, causing major hazards. The Interferometric Synthetic Aperture Radar (InSAR) technique enables the detailed detection of surface deformation, facilitating early landslide detection. The growing availability of SAR data and the development of artificial intelligence have spurred the integration of deep learning methods with InSAR for intelligent geological identification. However, existing studies using deep learning methods to detect landslides in InSAR deformation often rely on single InSAR data, which leads to the presence of other types of geological hazards in the identification results and limits the accuracy of landslide identification. Landslides are affected by many factors, especially topographic features. To enhance the accuracy of landslide identification, this study improves an existing geological hazard detection model and proposes a multi-source data fusion network termed MSFD-Net. MSFD-Net employs a pseudo-Siamese network without weight sharing, enabling the extraction of texture features from the wrapped deformation data and topographic features from topographic data, which are then fused in higher-level feature layers. We conducted comparative experiments on different networks as well as ablation experiments, and the results show that the proposed method achieved the best performance. We applied our method to the middle and upper reaches of the Yellow River in eastern Qinghai Province, China, obtained deformation rates in the region using Sentinel-1 SAR data from 2018 to 2020, and ultimately identified 254 landslides. Quantitative evaluations reveal that most detected landslides in the study area occurred at elevations of 2500-3700 m with slope angles of 10-30°. The proposed landslide detection algorithm holds significant promise for quickly and accurately detecting wide-area landslides, facilitating timely preventive and control measures.
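A pseudo-Siamese network differs from a true Siamese one in that the two branches share an architecture but not weights, which suits inputs of different natures (here, deformation versus topography). A minimal sketch of that structural idea, where single linear layers stand in for the paper's encoders and all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w):
    """One encoder branch: a single linear layer with ReLU."""
    return np.maximum(x @ w, 0.0)

# Pseudo-Siamese: identical architecture, separate (non-shared) weights,
# one branch per input source, fused by concatenation at a higher layer.
w_deform = rng.normal(size=(8, 4))  # weights for wrapped-deformation features
w_topo = rng.normal(size=(8, 4))    # separate weights for topographic features

def fuse(deform_feat, topo_feat):
    return np.concatenate(
        [branch(deform_feat, w_deform), branch(topo_feat, w_topo)], axis=-1
    )
```

A true Siamese network would reuse one weight matrix for both branches; here each modality gets its own.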
