Results 1 - 20 of 796
1.
J Exp Bot ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38716775

ABSTRACT

Plant physiology and metabolism rely on the function of stomata, structures on the surface of above-ground organs that facilitate the exchange of gases with the atmosphere. The morphology of the guard cells and corresponding pore that make up each stoma, as well as the stomatal density (number per unit area), are critical in determining overall gas exchange capacity. These characteristics can be quantified visually from images captured using microscopes, traditionally relying on time-consuming manual analysis. However, deep learning (DL) models provide a promising route to increase the throughput and accuracy of plant phenotyping tasks, including stomatal analysis. Here we review the published literature on the application of DL to stomatal analysis. We discuss the variation in the pipelines used, from data acquisition, pre-processing, DL architecture, and output evaluation to post-processing. We introduce the most common network structures, the plant species that have been studied, and the measurements that have been performed. Through this review, we hope to promote the use of DL methods for plant phenotyping tasks and to highlight future requirements for optimising uptake, focusing predominantly on the sharing of datasets and the generalisation of models, as well as the caveats associated with using image data to infer physiological function.
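As an illustration of the simplest measurement discussed in this review, stomatal density can be derived directly from a binary segmentation mask by counting connected components. This is a minimal sketch, not taken from any reviewed pipeline; the mask source and pixel size are assumptions.

```python
import numpy as np
from scipy import ndimage

def stomatal_density(mask: np.ndarray, um_per_pixel: float) -> float:
    """Count stomata in a binary segmentation mask and return density per mm^2.

    mask: 2D boolean array where True marks stomatal pixels (assumed input).
    um_per_pixel: physical pixel size of the microscope image (assumed known).
    """
    _, n_stomata = ndimage.label(mask)                   # connected components
    area_mm2 = mask.size * (um_per_pixel / 1000.0) ** 2  # imaged area in mm^2
    return n_stomata / area_mm2
```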

2.
BMC Cancer ; 24(1): 315, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38454349

ABSTRACT

PURPOSE: Rectal tumor segmentation on post neoadjuvant chemoradiotherapy (nCRT) magnetic resonance imaging (MRI) has great significance for tumor measurement, radiomics analysis, treatment planning, and operative strategy. In this study, we developed and evaluated a convolutional neural network for segmentation exclusively on post-chemoradiation T2-weighted MRI, with the aim of reducing the detection workload for radiologists and clinicians. METHODS: A total of 372 consecutive patients with locally advanced rectal cancer (LARC) were retrospectively enrolled from October 2015 to December 2017. The standard-of-care neoadjuvant process included 22-fraction intensity-modulated radiation therapy and oral capecitabine. Further, 243 patients (3061 slices) were grouped into training and validation datasets with a random 80:20 split, and 41 patients (408 slices) were used as the test dataset. A symmetric eight-layer deep network was developed using the nnU-Net framework, which outputs a segmentation map of the same size as the input. The trained deep learning (DL) network was examined using fivefold cross-validation and tumor lesions with different tumor regression grades (TRGs). RESULTS: At the testing stage, the Dice similarity coefficient (DSC), 95% Hausdorff distance (HD95), and mean surface distance (MSD) were applied to quantitatively evaluate generalization performance. On the test dataset (41 patients, 408 slices), the average DSC, HD95, and MSD were 0.700 (95% CI: 0.680-0.720), 17.73 mm (95% CI: 16.08-19.39), and 3.11 mm (95% CI: 2.67-3.56), respectively. Eighty-two percent of the MSD values were less than 5 mm, and fifty-five percent were less than 2 mm (median 1.62 mm, minimum 0.07 mm). CONCLUSIONS: The experimental results indicated that the constructed pipeline could achieve relatively high accuracy. Future work will focus on assessing performance with multicentre external validation.
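The three reported metrics are standard and can be reproduced from binary masks. The sketch below assumes their conventional definitions (symmetric surface distances in physical units); it is not the authors' code.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree

def _surface(mask, spacing):
    # boundary pixels = mask minus its erosion, scaled to physical coordinates
    border = mask & ~ndimage.binary_erosion(mask)
    return np.argwhere(border) * np.asarray(spacing)

def dice(pred, gt):
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())

def hd95_and_msd(pred, gt, spacing=(1.0, 1.0)):
    p, g = _surface(pred, spacing), _surface(gt, spacing)
    d_pg = cKDTree(g).query(p)[0]   # each pred-surface point to nearest gt point
    d_gp = cKDTree(p).query(g)[0]   # and vice versa (symmetric)
    d = np.concatenate([d_pg, d_gp])
    return np.percentile(d, 95), d.mean()   # HD95, MSD
```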


Subject(s)
Deep Learning, Rectal Neoplasms, Humans, Image Processing, Computer-Assisted/methods, Magnetic Resonance Imaging/methods, Neoadjuvant Therapy, Rectal Neoplasms/diagnostic imaging, Rectal Neoplasms/therapy, Rectal Neoplasms/pathology, Retrospective Studies, Semantics
3.
Surg Endosc ; 38(1): 171-178, 2024 01.
Article in English | MEDLINE | ID: mdl-37950028

ABSTRACT

BACKGROUND: In laparoscopic right hemicolectomy (RHC) for right-sided colon cancer, accurate recognition of the vascular anatomy is required for appropriate lymph node harvesting and safe operative procedures. We aimed to develop a deep learning model that enables the automatic recognition and visualization of major blood vessels in laparoscopic RHC. MATERIALS AND METHODS: This was a single-institution retrospective feasibility study. Semantic segmentation of three vessel areas, the superior mesenteric vein (SMV), ileocolic artery (ICA), and ileocolic vein (ICV), was performed using the developed deep learning model. The Dice coefficient, recall, and precision were used as evaluation metrics to quantify model performance after fivefold cross-validation. The model was further qualitatively appraised by 13 surgeons, based on a grading rubric, to assess its potential for clinical application. RESULTS: In total, 2624 images were extracted from 104 videos of laparoscopic colectomy for right-sided colon cancer, and the pixels corresponding to the SMV, ICA, and ICV were manually annotated and used as training data. SMV recognition was the most accurate, with all three evaluation metrics above 0.75, whereas the recognition accuracy of the ICA and ICV ranged from 0.53 to 0.57 across the three metrics. Additionally, all 13 surgeons gave acceptable ratings for the possibility of clinical application in the rubric-based evaluation. CONCLUSION: We developed a DL-based vessel segmentation model capable of feasible identification and visualization of major blood vessels during RHC. This model may help surgeons achieve reliable vessel identification and navigation.
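The per-vessel Dice, recall, and precision values can all be derived from pixel-level confusion counts. A sketch under the assumption that predictions and ground truth are integer label maps; the class-id mapping is illustrative, not from the paper.

```python
import numpy as np

def vessel_metrics(pred, gt, class_ids=None):
    """Per-class Dice, recall, and precision from label maps."""
    class_ids = class_ids or {"SMV": 1, "ICA": 2, "ICV": 3}  # assumed mapping
    results = {}
    for name, cid in class_ids.items():
        p, g = pred == cid, gt == cid
        tp = np.logical_and(p, g).sum()
        fp = np.logical_and(p, ~g).sum()
        fn = np.logical_and(~p, g).sum()
        results[name] = {
            "dice": 2 * tp / (2 * tp + fp + fn),
            "recall": tp / (tp + fn),
            "precision": tp / (tp + fp),
        }
    return results
```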


Subject(s)
Colonic Neoplasms, Deep Learning, Laparoscopy, Humans, Colonic Neoplasms/diagnostic imaging, Colonic Neoplasms/surgery, Colonic Neoplasms/blood supply, Retrospective Studies, Laparoscopy/methods, Colectomy/methods
4.
Surg Endosc ; 38(2): 1088-1095, 2024 02.
Article in English | MEDLINE | ID: mdl-38216749

ABSTRACT

BACKGROUND: Precise recognition of liver vessels during liver parenchymal dissection is a crucial technique in laparoscopic liver resection (LLR). This retrospective feasibility study aimed to develop artificial intelligence (AI) models to recognize liver vessels in LLR and to evaluate their accuracy and real-time performance. METHODS: Images were extracted from LLR videos, and the hepatic veins and Glissonean pedicles were labeled separately. Two AI models were developed: the "2-class model", which recognized both hepatic veins and Glissonean pedicles as equivalent vessels and distinguished them from the background class, and the "3-class model", which recognized the two vessel types and the background separately. The Feature Pyramid Network was used as the neural network architecture for both models in their semantic segmentation tasks. The models were evaluated using fivefold cross-validation, with the Dice coefficient (DC) as the evaluation metric. Ten gastroenterological surgeons also evaluated the models qualitatively using a rubric. RESULTS: In total, 2421 frames from 48 video clips were extracted. The mean DC of the 2-class model was 0.789, with a processing time of 0.094 s. The mean DC values for the hepatic vein and the Glissonean pedicle in the 3-class model were 0.631 and 0.482, respectively, with an average processing time of 0.097 s. In the surgeons' qualitative evaluation, false-negative and false-positive ratings for the 2-class model averaged 4.40 and 3.46 on a five-point scale, while the false-negative, false-positive, and vessel differentiation ratings for the 3-class model averaged 4.36, 3.44, and 3.28, respectively. CONCLUSION: We successfully developed deep-learning models that recognize liver vessels in LLR with high accuracy and sufficient processing speed. These findings suggest the potential of a new real-time automated navigation system for LLR.
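The abstract names the Feature Pyramid Network as the architecture for both models. One way the two variants might be instantiated is with the segmentation_models_pytorch library; the library choice and encoder are assumptions, as the paper does not state its implementation.

```python
import segmentation_models_pytorch as smp

# "2-class model": merged vessels (hepatic vein + Glissonean pedicle) vs background
model_2c = smp.FPN(encoder_name="resnet50", encoder_weights="imagenet",
                   in_channels=3, classes=2)

# "3-class model": background, hepatic vein, and Glissonean pedicle separately
model_3c = smp.FPN(encoder_name="resnet50", encoder_weights="imagenet",
                   in_channels=3, classes=3)
```

Both models would then be trained on the annotated frames with a standard segmentation loss; the per-class Dice values reported above come from comparing the argmax of each model's output with the manual annotations.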


Subject(s)
Artificial Intelligence, Laparoscopy, Humans, Retrospective Studies, Liver/diagnostic imaging, Liver/surgery, Liver/blood supply, Hepatectomy/methods, Laparoscopy/methods
5.
BMC Med Imaging ; 24(1): 95, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654162

ABSTRACT

OBJECTIVE: In radiation therapy, segmentation of cancerous regions in magnetic resonance images (MRI) is a critical step. For rectal cancer, automatic segmentation of rectal tumors from MRI is a great challenge. Two main shortcomings of existing deep learning-based methods lead to incorrect segmentation: (1) many organs surround the rectum, and some have shapes similar to that of rectal tumors; (2) the high-level features extracted by conventional neural networks often do not contain enough high-resolution information. Therefore, an improved U-Net segmentation network based on attention mechanisms is proposed to replace the traditional U-Net. METHODS: The overall framework of the proposed method is based on the traditional U-Net. A ResNeSt module was added to extract overall features, and a shape module was added after the encoder layer. We then combined the outputs of the shape module and the decoder to obtain the results. Moreover, the model used different types of attention mechanisms so that the network learned information to improve segmentation accuracy. RESULTS: We validated the effectiveness of the proposed method using 3773 2D MRI images from 304 patients. The proposed method achieved 0.987, 0.946, 0.897, and 0.899 for Dice, mean pixel accuracy (MPA), mean intersection over union (MIoU), and frequency-weighted IoU (FWIoU), respectively; these values are significantly better than those of other existing methods. CONCLUSION: By saving time, the proposed method can help radiologists segment rectal tumors effectively and lets them focus on patients whose cancerous regions are difficult for the network to segment. SIGNIFICANCE: The proposed method can help doctors segment rectal tumors, thereby ensuring good diagnostic quality and accuracy.
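The paper does not spell out its attention mechanisms in the abstract; the widely used additive attention gate on U-Net skip connections is one plausible form, sketched here in PyTorch purely for illustration.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate for a U-Net skip connection (illustrative only).
    Assumes the gating signal has been upsampled to the skip's resolution."""
    def __init__(self, gate_ch, skip_ch, inter_ch):
        super().__init__()
        self.wg = nn.Conv2d(gate_ch, inter_ch, 1)   # gating signal from decoder
        self.ws = nn.Conv2d(skip_ch, inter_ch, 1)   # skip features from encoder
        self.psi = nn.Sequential(nn.ReLU(inplace=True),
                                 nn.Conv2d(inter_ch, 1, 1), nn.Sigmoid())

    def forward(self, gate, skip):
        att = self.psi(self.wg(gate) + self.ws(skip))  # (B, 1, H, W) in [0, 1]
        return skip * att                              # suppress irrelevant regions
```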


Subject(s)
Deep Learning, Magnetic Resonance Imaging, Rectal Neoplasms, Rectal Neoplasms/diagnostic imaging, Rectal Neoplasms/pathology, Humans, Magnetic Resonance Imaging/methods, Neural Networks, Computer, Image Interpretation, Computer-Assisted/methods, Male
6.
BMC Med Imaging ; 24(1): 154, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902660

ABSTRACT

BACKGROUND: Acute pancreatitis is one of the most common diseases requiring emergency surgery. Rapid and accurate recognition of acute pancreatitis can help improve clinical outcomes. This study aimed to develop a deep learning-powered diagnostic model for acute pancreatitis. MATERIALS AND METHODS: We enrolled a cohort of 190 patients with acute pancreatitis admitted to Sichuan Provincial People's Hospital between January 2020 and December 2021. Abdominal computed tomography (CT) scans were obtained from both patients with acute pancreatitis and healthy individuals. The model was constructed from two modules: (1) an acute pancreatitis classifier module and (2) a pancreatitis lesion segmentation module. Each module's performance was assessed based on precision, recall, F1-score, area under the curve (AUC), loss rate, frequency-weighted accuracy (fwavacc), and mean intersection over union (MIoU). RESULTS: Upon admission, significant differences were observed between patients with mild and severe acute pancreatitis in inflammatory indexes, liver and kidney function indicators, and coagulation parameters. The acute pancreatitis classifier module exhibited commendable diagnostic efficacy, with an AUC of 0.993 (95% CI: 0.978-0.999) in the test set (healthy individuals vs. patients with acute pancreatitis, P < 0.001) and an AUC of 0.850 (95% CI: 0.790-0.898) in the external validation set (healthy individuals vs. patients with acute pancreatitis, P < 0.001). Furthermore, the lesion segmentation module demonstrated exceptional performance in the validation set: for segmentation of the pancreas, peripancreatic inflammatory exudation, peripancreatic effusion, and peripancreatic abscess necrosis, the MIoU values were 86.02 (84.52, 87.20), 61.81 (56.25, 64.83), 57.73 (49.90, 68.23), and 66.36 (55.08, 72.12), respectively. These findings underscore the robustness and reliability of the developed models in accurately characterizing and assessing acute pancreatitis. CONCLUSION: The deep learning-driven diagnostic model for acute pancreatitis exhibits excellent efficacy in accurately evaluating the severity of the condition. TRIAL REGISTRATION: This is a retrospective study.
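The reported AUCs carry 95% confidence intervals. Bootstrapping is one plausible way such intervals are obtained (the authors' exact method is not stated); a hedged sketch using scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(y_true, y_score, n_boot=2000, seed=0):
    """AUC with a bootstrap 95% CI. Assumes binary labels and continuous scores."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    auc = roc_auc_score(y_true, y_score)
    boots, n = [], len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)             # resample with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue                            # resample must contain both classes
        boots.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return auc, (lo, hi)
```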


Subject(s)
Deep Learning, Pancreatitis, Tomography, X-Ray Computed, Humans, Pancreatitis/diagnostic imaging, Male, Female, Tomography, X-Ray Computed/methods, Middle Aged, Adult, Acute Disease, Aged, Retrospective Studies
7.
Neurosurg Rev ; 47(1): 200, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38722409

ABSTRACT

Appropriate needle manipulation to avoid abrupt deformation of fragile vessels is a critical determinant of success in microvascular anastomosis. However, no study has yet evaluated area changes in surgical objects using surgical videos. The present study therefore aimed to develop a deep learning-based semantic segmentation algorithm to assess vessel area changes during microvascular anastomosis, for objective assessment of surgical skill with regard to "respect for tissue." The semantic segmentation algorithm was trained on a ResNet-50 network using videos of microvascular end-to-side anastomosis training with artificial blood vessels. Using the resulting model, video parameters during a single stitch completion task, including the coefficient of variation of vessel area (CV-VA), the relative change in vessel area per unit time (ΔVA), and the number of tissue deformation errors (TDE), defined by a ΔVA threshold, were compared between expert and novice surgeons. A high validation accuracy (99.1%) and intersection over union (0.93) were obtained for the auto-segmentation model. During the single-stitch task, expert surgeons displayed lower values of CV-VA (p < 0.05) and ΔVA (p < 0.05). Additionally, experts committed significantly fewer TDEs than novices (p < 0.05) and completed the task in a shorter time (p < 0.01). Receiver operating characteristic curve analyses indicated relatively strong discriminative capability for each video parameter and for task completion time, while the combined use of task completion time and the video parameters demonstrated complete discrimination between experts and novices. In conclusion, assessing vessel area changes during microvascular anastomosis with a deep learning-based semantic segmentation algorithm is a novel concept for evaluating microsurgical performance. It should prove useful in future computer-aided devices for enhancing surgical education and patient safety.
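The three video parameters follow directly from a per-frame vessel area series produced by the segmentation model. A minimal sketch under the assumption that ΔVA is normalised by the previous frame's area and scaled to per-second units; the exact definitions and threshold are assumptions.

```python
import numpy as np

def skill_parameters(vessel_areas, fps, tde_threshold):
    """vessel_areas: per-frame vessel pixel areas from the segmentation model.
    Returns CV-VA, mean ΔVA (relative area change per second), and TDE count."""
    a = np.asarray(vessel_areas, dtype=float)
    cv_va = a.std() / a.mean()                 # coefficient of variation of area
    dva = np.abs(np.diff(a)) / a[:-1] * fps    # relative change per unit time
    tde = int((dva > tde_threshold).sum())     # frames counted as deformation errors
    return cv_va, dva.mean(), tde
```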


Subject(s)
Algorithms, Anastomosis, Surgical, Deep Learning, Humans, Anastomosis, Surgical/methods, Pilot Projects, Microsurgery/methods, Microsurgery/education, Needles, Clinical Competence, Semantics, Vascular Surgical Procedures/methods, Vascular Surgical Procedures/education
8.
J Appl Clin Med Phys ; 25(7): e14378, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38729652

ABSTRACT

BACKGROUND: The diagnosis of lumbar spinal stenosis (LSS) can be challenging because radicular pain is often absent, complicating culprit-level localization. Accurate segmentation and quantitative analysis of the lumbar dura on radiographic images are key to the accurate differential diagnosis of LSS. The aim of this study was to develop an automatic dura-contouring tool for radiographic quantification on computed tomography myelogram (CTM) for patients with LSS. METHODS: A total of 518 CTM cases with or without lumbar stenosis were included. A 3-dimensional (3D) U-Net deep learning (DL) segmentation algorithm was deployed. A total of 210 labeled cases were used to develop the dura-contouring tool, with the training, independent testing, and external validation datasets split 150:30:30. The Dice score (DCS) was the primary measure of the segmentation performance of the 3D U-Net, which was subsequently developed into the dura-contouring tool and used to segment a further 308 unlabeled CTM cases with LSS. Automatic masks of 446 slices at the stenotic levels were then meticulously reviewed and revised by human experts, and the cross-sectional area (CSA) of the dura was compared. RESULTS: The mean DCS of the 3D U-Net was 0.905 ± 0.080, 0.933 ± 0.018, and 0.928 ± 0.034 in the fivefold cross-validation, independent testing, and external validation datasets, respectively. The segmentation performance of the dura-contouring tool was also comparable to that of the second observer (a human expert). With the dura-contouring tool, only 59.0% (263/446) of the automatic masks of the stenotic slices needed revision. In the revised cases, there were no significant differences in dural CSA between the automatic masks and the corresponding revised masks (p = 0.652). Additionally, a strong correlation in dural CSA was found between the automatic and revised masks (r = 0.805). CONCLUSIONS: A dura-contouring tool was developed that automatically segments the dural sac on CTM with high accuracy and generalization ability. The tool has potential for application in patients with LSS because it facilitates quantification of the dural CSA on stenotic slices.
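The CSA comparison reduces to simple pixel counting in physical units plus a correlation test. A minimal sketch, assuming axial slices with known in-plane pixel spacing; the variable names are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import pearsonr

def dural_csa(mask, pixel_spacing_mm):
    """Cross-sectional area (mm^2) of a binary dura mask on one axial slice."""
    return mask.sum() * pixel_spacing_mm[0] * pixel_spacing_mm[1]

# Comparing automatic vs. expert-revised masks slice by slice, as in the study:
# auto_csa    = [dural_csa(m, sp) for m, sp in zip(auto_masks, spacings)]
# revised_csa = [dural_csa(m, sp) for m, sp in zip(revised_masks, spacings)]
# r, p = pearsonr(auto_csa, revised_csa)   # strength of agreement (cf. r = 0.805)
```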


Subject(s)
Deep Learning, Dura Mater, Lumbar Vertebrae, Myelography, Spinal Stenosis, Tomography, X-Ray Computed, Humans, Spinal Stenosis/diagnostic imaging, Tomography, X-Ray Computed/methods, Dura Mater/diagnostic imaging, Dura Mater/pathology, Lumbar Vertebrae/diagnostic imaging, Myelography/methods, Male, Female, Aged, Middle Aged, Algorithms, Image Processing, Computer-Assisted/methods, Adult, Retrospective Studies
9.
Sensors (Basel) ; 24(9)2024 May 04.
Article in English | MEDLINE | ID: mdl-38733032

ABSTRACT

Minimally invasive surgery offers a significant advantage for patient rehabilitation after the operation. However, it also causes difficulties, mainly for the surgeon performing the intervention, since only visual information is available and tactile senses cannot be used during keyhole surgeries. This makes laparoscopic hysterectomy particularly challenging, since some organs are difficult to distinguish based on visual information alone. In this paper, we propose a solution based on semantic segmentation, which can create pixel-accurate predictions on surgical images and differentiate the uterine arteries, ureters, and nerves. We trained three binary semantic segmentation models based on the U-Net architecture with the EfficientNet-b3 encoder and then developed two ensemble techniques that enhanced the segmentation performance. Our pixel-wise ensemble combines the segmentation maps of the binary networks at the level of individual pixels. The second algorithm is a region-based ensemble technique that takes this one level higher, forming the ensemble from every connected component detected by the binary segmentation networks. We also introduced and trained a classic multi-class semantic segmentation model as a reference and compared it with the ensemble-based approaches. We used 586 manually annotated images from 38 surgical videos for this research and have published this dataset.
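The two ensemble strategies can be stated compactly. A sketch under the assumption that each binary network outputs a sigmoid probability map; the tie-breaking rules are illustrative, not the authors' exact logic.

```python
import numpy as np
from scipy import ndimage

def pixelwise_ensemble(prob_maps, thr=0.5):
    """prob_maps: (C, H, W) sigmoid outputs of the C binary networks.
    Assigns each pixel its most confident structure, or background if none fires."""
    label = prob_maps.argmax(axis=0) + 1           # best structure per pixel
    label[prob_maps.max(axis=0) < thr] = 0         # 0 = background
    return label

def region_ensemble(prob_maps, thr=0.5):
    """Resolves overlaps per connected component instead of per pixel."""
    C, H, W = prob_maps.shape
    out = np.zeros((H, W), dtype=np.int32)
    best = np.zeros((H, W))                        # best component score per pixel
    for c in range(C):
        comps, n = ndimage.label(prob_maps[c] > thr)
        for k in range(1, n + 1):
            region = comps == k
            score = prob_maps[c][region].mean()    # confidence of whole component
            win = region & (score > best)
            out[win], best[win] = c + 1, score
    return out
```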


Subject(s)
Algorithms, Laparoscopy, Neural Networks, Computer, Ureter, Uterine Artery, Humans, Laparoscopy/methods, Female, Ureter/diagnostic imaging, Ureter/surgery, Uterine Artery/surgery, Uterine Artery/diagnostic imaging, Image Processing, Computer-Assisted/methods, Semantics, Hysterectomy/methods
10.
Sensors (Basel) ; 24(14)2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39065919

ABSTRACT

Super-resolution semantic segmentation (SRSS) aims to obtain high-resolution semantic segmentation results from resolution-reduced input images. SRSS can significantly reduce computational cost and enable efficient, high-resolution semantic segmentation on mobile devices with limited resources. Some existing methods require modifications to the original semantic segmentation network structure or add complicated processing modules, which limits deployment flexibility. Furthermore, the lack of detailed information in the low-resolution input makes existing methods susceptible to misdetection at semantic edges. To address these problems, we propose a simple but effective framework, multi-resolution learning and semantic edge enhancement-based super-resolution semantic segmentation (MS-SRSS), which can be applied to any existing encoder-decoder based semantic segmentation network. Specifically, a multi-resolution learning mechanism (MRL) enables the feature encoder of the segmentation network to improve its feature extraction ability, and a semantic edge enhancement loss (SEE) alleviates false detection at semantic edges. We conduct extensive experiments on three challenging benchmarks, Cityscapes, Pascal Context, and Pascal VOC 2012, to verify the effectiveness of MS-SRSS. The results show that, compared with existing methods, our method achieves new state-of-the-art semantic segmentation performance.
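The semantic edge enhancement loss is described only at a high level. One plausible realisation is to up-weight the cross-entropy at pixels whose label neighbourhood contains more than one class; the kernel size and edge gain below are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def edge_map(labels, k=3):
    """1 where a pixel's kxk label neighbourhood is not uniform (a semantic edge)."""
    l = labels.unsqueeze(1).float()                       # (B, 1, H, W)
    dil = F.max_pool2d(l, k, stride=1, padding=k // 2)    # local max of labels
    ero = -F.max_pool2d(-l, k, stride=1, padding=k // 2)  # local min of labels
    return (dil != ero).squeeze(1).float()

def see_loss(logits, labels, edge_gain=4.0):
    ce = F.cross_entropy(logits, labels, reduction="none")  # (B, H, W)
    w = 1.0 + edge_gain * edge_map(labels)                  # emphasise edge pixels
    return (w * ce).mean()
```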

11.
Sensors (Basel) ; 24(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38276388

ABSTRACT

Visual perception is a crucial component of autonomous driving systems. Traditional approaches often rely on single-modal methods, accomplishing semantic segmentation from RGB images alone. However, a more effective strategy leverages multiple modalities, because the different sensors of an autonomous driving system provide diverse information, and complementary features among modalities enhance the robustness of the segmentation model. Contrary to the intuitive belief that more modalities lead to better accuracy, our research reveals that adding modalities to traditional semantic segmentation models can sometimes decrease precision. Inspired by the residual thinking concept, we propose a multimodal visual perception model capable of maintaining or even improving accuracy with the addition of any modality. Our approach is straightforward: RGB serves as the main branch, and the other modal branches employ the same feature extraction backbone. The modals score module (MSM) evaluates channel and spatial scores for all modality features, measuring their importance for overall semantic segmentation. The modal branches then provide additional features to the RGB main branch through the features complementary module (FCM). Leveraging the residual thinking concept further enhances the feature extraction capabilities of all branches. Through extensive experiments, we drew several conclusions. Integrating certain modalities into traditional semantic segmentation models tends to reduce segmentation accuracy. In contrast, our simple and scalable multimodal model maintains segmentation precision when accommodating any additional modality, and it surpasses some state-of-the-art multimodal semantic segmentation models. Ablation experiments confirmed that the proposed MSM and FCM, together with the incorporation of residual thinking, contribute significantly to the model's performance.
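From the description, the MSM scores each auxiliary modality with channel and spatial attention, and the FCM adds the scored features residually to the RGB branch. A minimal PyTorch sketch under those assumptions; the actual module designs are not given in the abstract.

```python
import torch
import torch.nn as nn

class ModalScore(nn.Module):
    """Channel score (squeeze-excite style) times a spatial score for one modality."""
    def __init__(self, ch):
        super().__init__()
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        return x * self.channel(x) * self.spatial(x)

class FeatureComplement(nn.Module):
    """Residually adds scored modal features to the RGB main branch."""
    def __init__(self, ch, n_modals):
        super().__init__()
        self.scores = nn.ModuleList(ModalScore(ch) for _ in range(n_modals))

    def forward(self, rgb_feat, modal_feats):
        out = rgb_feat
        for score, feat in zip(self.scores, modal_feats):
            out = out + score(feat)   # RGB branch kept as the residual identity
        return out
```

Keeping RGB as the identity path is consistent with the paper's claim that adding a modality should never degrade the RGB-only baseline: if a modality is unhelpful, its learned scores can shrink its contribution toward zero.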

12.
Sensors (Basel) ; 24(3)2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38339532

ABSTRACT

Visual localization refers to the process of determining an observer's pose by analyzing the spatial relationships between a query image and a pre-existing set of images. In this procedure, matched visual features between images are identified and used for pose estimation; consequently, estimation accuracy heavily depends on the precision of feature matching. Incorrect feature matches, such as those between different objects or between different points on one object, should thus be avoided. In this paper, our initial evaluation gauged the reliability of each object class in image datasets with respect to pose estimation accuracy. This assessment revealed the building class to be reliable, while humans were unreliable across diverse locations. A subsequent study examined the degradation of pose estimation accuracy when the proportion of the unreliable class (humans) was artificially increased. The findings revealed a noteworthy decline beginning when the average proportion of humans in the images exceeded 20%. We discuss the results and their implications for dataset construction for visual localization.

13.
Sensors (Basel) ; 24(3)2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38339624

ABSTRACT

Robust visual place recognition (VPR) enables mobile robots to identify previously visited locations. For this purpose, the extracted visual information and the place matching method play significant roles. In this paper, we critically review existing VPR methods and group them into three major categories based on the visual information used: handcrafted features, deep features, and semantics. Focusing on the benefits of convolutional neural networks (CNNs) and semantics, and on the limitations of existing research, we propose a robust appearance-based place recognition method, termed SVS-VPR, implemented as a hierarchical model with two major components: global scene-based and local feature-based matching. The global scene semantics are extracted and compared with previously visited images to filter the match candidates, reducing the search space and computational cost. The local feature-based matching involves extracting robust local features from a CNN, possessing properties invariant to environmental conditions, and a place matching method utilizing semantic, visual, and spatial information. SVS-VPR is evaluated on publicly available benchmark datasets using true positive detection rate, recall at 100% precision, and area under the curve. Experimental findings demonstrate that SVS-VPR surpasses several state-of-the-art deep learning-based methods, boosting robustness against significant changes in viewpoint and appearance while maintaining efficient matching time performance.

14.
Sensors (Basel) ; 24(3)2024 Feb 04.
Article in English | MEDLINE | ID: mdl-38339722

ABSTRACT

Cracks inside urban underground comprehensive pipe galleries are small and their characteristics are not obvious. Due to low lighting and large shadow areas, the contrast between cracks and background in an image is low. Most current semantic segmentation methods focus on overall segmentation and have a large perceptual range, making it difficult to attend to the detailed features of local edges and obtain accurate segmentation results for pipe gallery cracks. A Global Attention Segmentation Network (GA-SegNet) is therefore proposed in this paper. The GA-SegNet performs semantic segmentation by incorporating global attention mechanisms. To classify pixels precisely, a residual separable convolution attention model is employed in the encoder to extract features at multiple scales, and a global attention upsample model (GAM) is utilized in the decoder to strengthen the connection between shallow-level features and deep abstract features, increasing the network's attention to small cracks. A balanced loss function increases the contribution of crack pixels while reducing the focus on background pixels in the overall loss, with the aim of improving crack segmentation accuracy. Comparative experiments with other classic models show that the proposed GA-SegNet model delivers better segmentation performance across multiple evaluation indicators, with advantages in both segmentation accuracy and efficiency.
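The balanced loss is described only as increasing the contribution of crack pixels. One standard way to realise this for binary crack segmentation is a positively weighted BCE, sketched here; the batch-adaptive weighting scheme is an assumption, not the paper's formula.

```python
import torch
import torch.nn.functional as F

def balanced_crack_loss(logits, target, max_pos_weight=50.0):
    """BCE that up-weights the rare crack (positive) pixels by the
    background-to-crack ratio observed in the batch. target: float in {0, 1}."""
    pos = target.sum()
    neg = target.numel() - pos
    w = torch.clamp(neg / pos.clamp(min=1.0), max=max_pos_weight)
    return F.binary_cross_entropy_with_logits(logits, target, pos_weight=w)
```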

15.
Sensors (Basel) ; 24(6)2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38544142

ABSTRACT

Recent advancements in image segmentation have been notably driven by Vision Transformers. These transformer-based models offer one versatile network structure capable of handling a variety of segmentation tasks. Despite their effectiveness, the pursuit of enhanced capabilities often leads to more intricate architectures and greater computational demands. OneFormer has responded to these challenges by introducing a query-text contrastive learning strategy active during training only. However, this approach has not completely addressed the inefficiency issues in text generation and the contrastive loss computation. To solve these problems, we introduce Efficient Query Optimizer (EQO), an approach that efficiently utilizes multi-modal data to refine query optimization in image segmentation. Our strategy significantly reduces the complexity of parameters and computations by distilling inter-class and inter-task information from an image into a single template sentence. Furthermore, we propose a novel attention-based contrastive loss. It is designed to facilitate a one-to-many matching mechanism in the loss computation, which helps object queries learn more robust representations. Beyond merely reducing complexity, our model demonstrates superior performance compared to OneFormer across all three segmentation tasks using the Swin-T backbone. Our evaluations on the ADE20K dataset reveal that our model outperforms OneFormer in multiple metrics: by 0.2% in mean Intersection over Union (mIoU), 0.6% in Average Precision (AP), and 0.8% in Panoptic Quality (PQ). These results highlight the efficacy of our model in advancing the field of image segmentation.

16.
Sensors (Basel) ; 24(4)2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38400482

ABSTRACT

The common channel attention mechanism maps feature statistics to feature weights. However, its effectiveness may not be assured in remote sensing images due to statistical differences across multiple bands. This paper proposes a novel channel attention mechanism based on feature information, called the feature information entropy attention mechanism (FEM). The FEM constructs a relationship between features based on feature information entropy and then maps this relationship to their importance. The Vaihingen and OpenEarthMap datasets were selected for experiments. The proposed method was compared with the squeeze-and-excitation mechanism (SEM), the convolutional block attention mechanism (CBAM), and the frequency channel attention mechanism (FCA). Compared with these three channel attention mechanisms, the FEM improves mIoU by 0.90%, 1.10%, and 0.40% on the Vaihingen dataset, and by 2.30%, 2.20%, and 2.10% on the OpenEarthMap dataset, respectively. The proposed channel attention mechanism thus shows better performance for remote sensing land use classification.
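The FEM maps per-channel feature information entropy to channel importance. The sketch below estimates each channel's spatial entropy and learns the mapping with a small MLP; this is a guess at one reasonable formulation, not the paper's exact definition.

```python
import torch
import torch.nn as nn

class EntropyChannelAttention(nn.Module):
    """Hypothetical FEM-style gate: per-channel spatial entropy -> channel weights."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, _, _ = x.shape
        p = x.flatten(2).softmax(dim=-1)        # spatial distribution per channel
        ent = -(p * (p + 1e-8).log()).sum(-1)   # (B, C) information entropy
        w = self.fc(ent)                        # map entropy to channel importance
        return x * w.view(b, c, 1, 1)
```

Unlike squeeze-and-excitation, which summarises each channel by its mean activation, this gate summarises each channel by how spread out its activations are, which is less sensitive to per-band intensity offsets across spectral bands.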

17.
Sensors (Basel) ; 24(7)2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38610313

ABSTRACT

Simultaneous localisation and mapping (SLAM) is crucial in mobile robotics. Most visual SLAM systems assume that the environment is static; however, real environments contain many dynamic objects, which degrade the accuracy and robustness of these systems. To improve the performance of visual SLAM, this study proposes a dynamic visual SLAM system (SEG-SLAM) based on the oriented FAST and rotated BRIEF (ORB)-SLAM3 framework and the you only look once (YOLO)v5 deep-learning method. First, building on the ORB-SLAM3 framework, the YOLOv5 method is used to construct a fusion module for object detection and semantic segmentation. This module effectively identifies and extracts prior information for obviously and potentially dynamic objects. Second, differentiated dynamic feature point rejection strategies are developed for different dynamic objects using the prior information, depth information, and the epipolar geometry method, improving the localisation and mapping accuracy of the SEG-SLAM system. Finally, the rejection results are fused with depth information, and a static dense 3D map without dynamic objects is constructed using the Point Cloud Library. The SEG-SLAM system is evaluated using public TUM datasets and real-world scenarios, and the proposed method proves more accurate and robust than current dynamic visual SLAM algorithms.
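The differentiated rejection strategies are not detailed in the abstract. A common baseline is to drop matched keypoints that either fall inside a detected dynamic-object mask or violate the epipolar constraint; a sketch with OpenCV follows, where the threshold value and mask source are assumptions.

```python
import cv2
import numpy as np

def keep_static_matches(pts_prev, pts_curr, dynamic_mask, epi_thr_px=1.0):
    """pts_prev / pts_curr: (N, 2) float32 matched keypoints between two frames.
    dynamic_mask: (H, W) binary mask of detected dynamic objects (e.g., people)."""
    F, _ = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC, 1.0, 0.999)
    # epipolar line in the current frame for each previous-frame point
    lines = cv2.computeCorrespondEpilines(pts_prev.reshape(-1, 1, 2), 1, F)
    a, b, c = lines.reshape(-1, 3).T
    x, y = pts_curr[:, 0], pts_curr[:, 1]
    epi_dist = np.abs(a * x + b * y + c) / np.sqrt(a ** 2 + b ** 2)
    on_dynamic = dynamic_mask[y.astype(int), x.astype(int)] > 0
    return (epi_dist < epi_thr_px) & ~on_dynamic   # boolean keep-flags per match
```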

18.
Sensors (Basel) ; 24(7)2024 Mar 31.
Article in English | MEDLINE | ID: mdl-38610455

ABSTRACT

To guide orchard management robots in orchard production tasks such as autonomous navigation and precision spraying, this research proposes a deep-learning network called the dynamic fusion segmentation network (DFSNet). The network contains a local feature aggregation (LFA) layer and a dynamic fusion segmentation architecture. The LFA layer uses positional encoders for the initial transforming embedding and progressively aggregates local patterns via a multi-stage hierarchy. The fusion segmentation module (Fus-Seg) can format point tags by learning a multi-embedding space, and the generated tags can further mine the point cloud features. In experiments, the DFSNet achieved significant segmentation results on the orchard field dataset, with an accuracy of 89.43% and an mIoU of 74.05%. On the all-scale dataset (simple-scale dataset + complex-scale dataset), DFSNet outperforms other semantic segmentation networks, such as PointNet, PointNet++, D-PointNet++, DGCNN, and Point-NN, with improvements in accuracy of 11.73%, 3.76%, 2.36%, and 2.74%, and improvements in mIoU of 28.19%, 9.89%, 6.33%, 9.89%, and 24.69%, respectively. The proposed DFSNet can capture more information from orchard scene point clouds and provide more accurate point cloud segmentation results, which benefits orchard management.

19.
Sensors (Basel) ; 24(8)2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38676061

ABSTRACT

Real-time monitoring and fault diagnosis of modern machinery and equipment impose higher demands on equipment maintenance. Extracting the morphological characteristics of wear debris in lubricating oil has emerged as a critical approach for real-time wear monitoring, and it holds significant importance in the field. The online visual ferrograph (OLVF) technique serves as the representative method in this study. Various semantic segmentation models, such as DeepLabV3+, PSPNet, Segformer, and Unet, are employed to process oil wear particle images in comparative experiments. To accurately segment the minute wear debris in these images and mitigate the influence of reflections and bubbles, we propose a multi-level feature reused Unet (MFR Unet) that enhances the residual link strategy of Unet for improved identification of tiny wear debris in ferrograms, leading to superior segmentation results.

20.
Sensors (Basel) ; 24(8)2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38676090

ABSTRACT

Leveraging data from various modalities to enhance multimodal segmentation tasks is a well-regarded approach, and recent efforts have incorporated an array of modalities, including depth and thermal imaging. Nevertheless, the effective amalgamation of cross-modal interactions remains a challenge, given the unique traits each modality presents. In this work, we introduce the semantic guidance fusion network (SGFN), an innovative cross-modal fusion network adept at integrating a diverse set of modalities. In particular, the SGFN features a semantic guidance module (SGM) engineered to boost bi-modal feature extraction, encompassing a learnable semantic guidance convolution (SGC) designed to merge intensity and gradient data from disparate modalities. Comprehensive experiments on the NYU Depth V2, SUN-RGBD, Cityscapes, MFNet, and ZJU datasets underscore both the superior performance and the generalization ability of the SGFN compared to current leading models. Moreover, on the DELIVER dataset, our bi-modal SGFN achieved a mIoU comparable to that of the previously leading model, CMNEXT.
