Results 1-20 of 153
1.
Comput Biol Med; 181: 109038, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39178804

ABSTRACT

Obtaining accurate distance or depth information in endoscopy is crucial for the effective utilization of navigation systems. However, due to space constraints, incorporating depth cameras into endoscopic systems is often impractical. Our goal is to estimate depth images directly from endoscopic images using deep learning. This study presents a three-step methodology for training a depth-estimation network model. Initially, simulated endoscopy images and corresponding depth maps are generated using Unity, based on a colon surface model obtained from segmented computed tomography colonography data. Subsequently, a cycle generative adversarial network model is employed to enhance the realism of the simulated endoscopy images. Finally, a deep learning model is trained using the synthesized endoscopy images and depth maps to estimate depths accurately. The performance of the proposed approach is evaluated and compared against prior studies utilizing unsupervised training methods. The results demonstrate the superior precision of the proposed technique in estimating depth images in endoscopy. The proposed depth estimation method holds promise for advancing the field by enabling enhanced navigation and improved lesion marking, ultimately leading to better clinical outcomes.
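As a rough illustration of the third step above, the sketch below trains a small network on (CycleGAN-refined image, Unity-rendered depth) pairs with a pixel-wise L1 loss. The toy tensors, architecture, and loss choice are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical sketch of supervised depth training on synthetic pairs.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the CycleGAN-refined images and Unity-rendered depths.
images = torch.rand(64, 3, 128, 128)   # synthetic endoscopy frames
depths = torch.rand(64, 1, 128, 128)   # matching ground-truth depth maps
loader = DataLoader(TensorDataset(images, depths), batch_size=8, shuffle=True)

# A minimal encoder-decoder stand-in; the paper's architecture is not specified here.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for img, gt in loader:
    pred = model(img)
    loss = nn.functional.l1_loss(pred, gt)  # pixel-wise supervised depth loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```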

2.
Sensors (Basel); 24(14), 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39065938

ABSTRACT

In recent years, there has been extensive research and application of unsupervised monocular depth estimation methods for intelligent vehicles. However, a major limitation of most existing approaches is their inability to predict absolute depth values in physical units, as they generally suffer from the scale problem. Furthermore, most research efforts have focused on ground vehicles, neglecting the potential application of these methods to unmanned aerial vehicles (UAVs). To address these gaps, this paper proposes a novel absolute depth estimation method specifically designed for flight scenes using a monocular vision sensor, in which a geometry-based scale recovery algorithm serves as a post-processing stage for scale-consistent relative depth estimates. By exploiting feature correspondences between successive images and the pose data provided by onboard navigation sensors, the scale factor between the relative and absolute scales is calculated according to a multi-view geometry model, and absolute depth maps are then generated by pixel-wise multiplication of the relative depth maps with the scale factor. As a result, unsupervised monocular depth estimation is extended from relative depth estimation in semi-structured scenes to absolute depth estimation in unstructured scenes. Experiments on the publicly available Mid-Air dataset and customized data demonstrate the effectiveness of our method in different cases and settings, as well as its robustness to navigation sensor noise. The proposed method only requires UAVs to be equipped with a monocular camera and common navigation sensors, and the resulting absolute depth information can be directly used for downstream tasks, which is significant for a class of vehicle rarely explored in previous depth estimation studies.
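A minimal sketch of the scale-recovery idea described above, assuming sparse absolute depths have already been triangulated from feature correspondences using the navigation-sensor poses; the median-ratio estimator here is an assumption, since the paper derives the factor from a multi-view geometry model.

```python
import numpy as np

def recover_absolute_depth(rel_depth, sparse_abs, sparse_uv):
    """rel_depth: HxW scale-consistent relative depth map.
    sparse_abs: (N,) absolute depths of triangulated feature points.
    sparse_uv: (N,2) integer pixel coordinates (x, y) of those points."""
    rel_at_pts = rel_depth[sparse_uv[:, 1], sparse_uv[:, 0]]
    scale = np.median(sparse_abs / np.maximum(rel_at_pts, 1e-6))
    return rel_depth * scale  # pixel-wise multiplication by the scale factor

# Toy usage: pretend the true scale factor is 3.2 and recover it.
rel = np.random.rand(480, 640) + 0.5
uv = np.stack([np.random.randint(0, 640, 50), np.random.randint(0, 480, 50)], axis=1)
abs_d = rel[uv[:, 1], uv[:, 0]] * 3.2
abs_map = recover_absolute_depth(rel, abs_d, uv)  # scaled by ~3.2 map-wide
```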

3.
Sensors (Basel); 24(14), 2024 Jul 21.
Article in English | MEDLINE | ID: mdl-39066131

ABSTRACT

This work presents TTFDNet, a transformer-based network with transfer learning for end-to-end depth estimation from single-frame fringe patterns in fringe projection profilometry. TTFDNet features a precise contour and coarse depth (PCCD) pre-processor, a global multi-dimensional fusion (GMDF) module, and a progressive depth extractor (PDE). It utilizes transfer learning through fringe structure consistency evaluation (FSCE) to leverage the transformer's benefits even on a small dataset. Tested on 208 scenes, the model achieved a mean absolute error (MAE) of 0.00372 mm, outperforming UNet (0.03458 mm), PDE (0.01063 mm), and PCTNet (0.00518 mm). It demonstrated precise measurement capabilities, with deviations of ~90 µm for a 25.4 mm radius ball and ~6 µm for a 20 mm thick metal part. Additionally, TTFDNet showed excellent generalization and robustness in dynamic reconstruction and under varied imaging conditions, making it well suited to practical applications in manufacturing, automation, and computer vision.

4.
Sensors (Basel); 24(13), 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-39000869

ABSTRACT

Self-supervised monocular depth estimation can exhibit excellent performance in static environments due to the multi-view consistency assumption during the training process. However, it is hard to maintain depth consistency in dynamic scenes given the occlusions caused by moving objects. For this reason, we propose a method of self-supervised self-distillation for monocular depth estimation (SS-MDE) in dynamic scenes, where a deep network with a multi-scale decoder and a lightweight pose network are designed to predict depth in a self-supervised manner via the disparity, motion information, and the association between two adjacent frames in the image sequence. Meanwhile, in order to improve the depth estimation accuracy of static areas, the pseudo-depth images generated by the LeReS network are used to provide pseudo-supervision information, enhancing the effect of depth refinement in static areas. Furthermore, a forgetting factor is leveraged to alleviate the dependency on the pseudo-supervision. In addition, a teacher model is introduced to generate depth prior information, and a multi-view mask filter module is designed to implement feature extraction and noise filtering. This enables the student model to better learn the deep structure of dynamic scenes, enhancing the generalization and robustness of the entire model in a self-distillation manner. Finally, on four public datasets, the proposed SS-MDE method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy (δ1) of 89% with an error (AbsRel) of 0.102 on NYU-Depth V2 and an accuracy (δ1) of 87% with an error (AbsRel) of 0.111 on KITTI.
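The forgetting factor might be realized as a decaying weight on the pseudo-supervision term, as in the hedged sketch below; the exponential schedule and decay rate are assumptions, not the paper's stated choice.

```python
import torch

def total_loss(self_sup_loss, pseudo_loss, step, decay=1e-4):
    """Combine self-supervised and pseudo-supervised terms with a
    forgetting factor that decays over training steps (assumed schedule)."""
    alpha = torch.exp(torch.tensor(-decay * step))  # forgetting factor in (0, 1]
    return self_sup_loss + alpha * pseudo_loss

# At step 0 the pseudo supervision has full weight; by step 50k it is ~0.007x.
print(total_loss(torch.tensor(1.0), torch.tensor(0.5), step=0))
print(total_loss(torch.tensor(1.0), torch.tensor(0.5), step=50_000))
```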

5.
Sensors (Basel); 24(13), 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-39000900

ABSTRACT

In recent years, the technological landscape has been transformed by the widespread integration of drones across diverse sectors. Essential to the drone manufacturing process is comprehensive testing, typically conducted in controlled laboratory settings to uphold safety and privacy standards. However, a major challenge arises from the inherent limitations of GPS signals in indoor environments, which threaten the accuracy of drone positioning. This limitation not only jeopardizes testing validity but also introduces instability and inaccuracies, compromising the assessment of drone performance. Given the pivotal role of precise GPS-derived data in drone autopilots, addressing this indoor GPS constraint is imperative to ensure the reliability and resilience of unmanned aerial vehicles (UAVs). This paper delves into the implementation of an Indoor Positioning System (IPS) leveraging computer vision. The proposed system detects and localizes UAVs within indoor environments through an enhanced vision-based triangulation approach. A comparative analysis with alternative positioning methodologies is undertaken to ascertain the efficacy of the proposed system. The results showcase the efficiency and precision of the designed system in detecting and localizing various types of UAVs, underscoring its potential to advance the field of indoor drone navigation and testing.
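The geometric core of such a vision-based positioning system is two-view triangulation: two calibrated cameras observing the same UAV yield one 3D position. The sketch below uses the standard direct linear transform (DLT) with toy camera matrices; the paper's enhanced triangulation approach is not reproduced here.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """P1, P2: 3x4 projection matrices; uv1, uv2: pixel observations."""
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)          # least-squares null space of A
    X = vt[-1]
    return X[:3] / X[3]                   # homogeneous -> Euclidean 3D point

# Toy setup: two cameras with a 1 m baseline observing a point 5 m away.
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])
X_true = np.array([0.3, -0.2, 5.0, 1.0])
uv = lambda P: (P @ X_true)[:2] / (P @ X_true)[2]
print(triangulate(P1, P2, uv(P1), uv(P2)))  # ~ [0.3, -0.2, 5.0]
```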

6.
Sensors (Basel); 24(13), 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-39000982

ABSTRACT

Accurate 3D image recognition, critical for autonomous driving safety, is shifting from the LIDAR-based point cloud to camera-based depth estimation technologies driven by cost considerations and the point cloud's limitations in detecting distant small objects. This research aims to enhance MDE (Monocular Depth Estimation) using a single camera, offering extreme cost-effectiveness in acquiring 3D environmental data. In particular, this paper focuses on novel data augmentation methods designed to enhance the accuracy of MDE. Our research addresses the challenge of limited MDE data quantities by proposing the use of synthetic-based augmentation techniques: Mask, Mask-Scale, and CutFlip. The implementation of these synthetic-based data augmentation strategies has demonstrably enhanced the accuracy of MDE models by 4.0% compared to the original dataset. Furthermore, this study introduces the RMS (Real-time Monocular Depth Estimation configuration considering Resolution, Efficiency, and Latency) algorithm, designed for the optimization of neural networks to augment the performance of contemporary monocular depth estimation technologies through a three-step process. Initially, it selects a model based on minimum latency and REL criteria, followed by refining the model's accuracy using various data augmentation techniques and loss functions. Finally, the refined model is compressed using quantization and pruning techniques to minimize its size for efficient on-device real-time applications. Experimental results from implementing the RMS algorithm indicated that, within the required latency and size constraints, the IEBins model exhibited the most accurate REL (absolute RELative error) performance, achieving a 0.0480 REL. Furthermore, the data augmentation combination of the original dataset with Flip, Mask, and CutFlip, alongside the SigLoss loss function, displayed the best REL performance, with a score of 0.0461. The network compression technique using FP16 was analyzed as the most effective, reducing the model size by 83.4% compared to the original while maintaining the least impact on REL performance and latency. Finally, the performance of the RMS algorithm was validated on the on-device autonomous driving platform, NVIDIA Jetson AGX Orin, through which optimal deployment strategies were derived for various applications and scenarios requiring autonomous driving technologies.
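As a small illustration of the FP16 compression step, the sketch below casts a model's weights to half precision, which halves per-parameter storage; the paper's reported 83.4% reduction presumably also reflects pruning and other optimizations, so this is an assumption-laden simplification.

```python
import torch
import torch.nn as nn

# A toy depth-style model standing in for the optimized network.
model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Conv2d(64, 1, 3))

def param_bytes(m):
    return sum(p.numel() * p.element_size() for p in m.parameters())

fp32_size = param_bytes(model)
model_fp16 = model.half()                  # cast weights to float16 for deployment
print(fp32_size, param_bytes(model_fp16))  # fp16 holds half the fp32 footprint
```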

7.
Sensors (Basel); 24(13), 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-39001132

ABSTRACT

Acquiring underwater depth maps is essential as they provide indispensable three-dimensional spatial information for visualizing the underwater environment. These depth maps serve various purposes, including underwater navigation, environmental monitoring, and resource exploration. While most current depth estimation methods work well in ideal underwater environments with homogeneous illumination, few consider the risks caused by irregular illumination, which is common in practical underwater environments. On the one hand, underwater environments with low-light conditions reduce image contrast, which challenges depth estimation models in accurately differentiating among objects. On the other hand, overexposure caused by reflection or artificial illumination can degrade the textures of underwater objects, which are crucial to geometric constraints between frames. To address these issues, we propose an underwater self-supervised monocular depth estimation network integrating image enhancement and auxiliary depth information. We use a Monte Carlo image enhancement module (MC-IEM) to tackle the inherent uncertainty in low-light underwater images through probabilistic estimation. When pixel values are enhanced, objects become easier to distinguish, allowing more precise acquisition of distance information and thus more accurate depth estimation. Next, we extract additional geometric features through transfer learning, infusing prior knowledge from a supervised large-scale model into the self-supervised depth estimation network to refine the loss functions and the depth network to address the overexposure issue. Experiments on two public datasets show superior performance compared to existing approaches to underwater depth estimation.

8.
Article in English | MEDLINE | ID: mdl-38834903

ABSTRACT

PURPOSE: This work presents a novel platform for stereo reconstruction in anterior segment ophthalmic surgery to enable enhanced scene understanding, especially depth perception, for advanced computer-assisted eye surgery by effectively addressing the lack of texture and corneal distortion artifacts in the surgical scene. METHODS: The proposed platform uses a two-step approach: generating a sparse 3D point cloud from microscopic images, then deriving a dense 3D representation by fitting surfaces onto the point cloud while considering geometric priors of the eye anatomy. We incorporate a pre-processing step that rectifies distortion artifacts induced by the cornea's high refractive power by aligning a 3D phenotypical cornea geometry model to the images and computing a distortion map using ray tracing. RESULTS: The accuracy of 3D reconstruction is evaluated on stereo microscopic images of ex vivo porcine eyes, rigid phantom eyes, and synthetic photo-realistic images. The results demonstrate the potential of the proposed platform to enhance scene understanding via an accurate 3D representation of the eye and enable the estimation of instrument-to-layer distances in porcine eyes with a mean average error of 190 µm, comparable to the scale of surgeons' hand tremor. CONCLUSION: This work marks a significant advancement in stereo reconstruction for ophthalmic surgery by addressing corneal distortions, a previously often overlooked aspect in such surgical scenarios. This could improve surgical outcomes by allowing for intra-operative computer assistance, e.g., in the form of virtual distance sensors.
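A minimal sketch of the ray-tracing ingredient behind the distortion map: refracting a viewing ray at the corneal surface using the vector form of Snell's law. The refractive indices and surface normal below are toy values; the paper first fits a phenotypical 3D cornea model before tracing.

```python
import numpy as np

def refract(incident, normal, n1, n2):
    """incident, normal: unit vectors; refract from medium n1 into n2."""
    eta = n1 / n2
    cos_i = -np.dot(normal, incident)
    sin2_t = eta**2 * (1.0 - cos_i**2)
    if sin2_t > 1.0:
        return None  # total internal reflection, no transmitted ray
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * incident + (eta * cos_i - cos_t) * normal

ray = np.array([np.sin(0.3), 0.0, np.cos(0.3)])  # oblique incoming ray
n = np.array([0.0, 0.0, -1.0])                   # outward surface normal
print(refract(ray, n, 1.0, 1.376))               # air -> cornea, bent toward normal
```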

9.
Sensors (Basel); 24(11), 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38894371

ABSTRACT

The rich spatial and angular information in light field images enables accurate depth estimation, which is a crucial aspect of environmental perception. However, the abundance of light field information also leads to high computational costs and memory pressure. Typically, selectively pruning some light field information can significantly improve computational efficiency, but at the expense of depth estimation accuracy in the pruned model, especially in low-texture regions and occluded areas where angular diversity is reduced. In this study, we propose a lightweight disparity estimation model that balances speed and accuracy and enhances depth estimation accuracy in textureless regions. We combine cost matching methods based on absolute difference and correlation to construct cost volumes, improving both accuracy and robustness. Additionally, we developed a multi-scale disparity cost fusion architecture, employing 3D convolutions and a UNet-like structure to handle matching costs at different depth scales. This method effectively integrates information across scales, utilizing the UNet structure for efficient fusion and completion of cost volumes, thus yielding more precise depth maps. Extensive testing shows that our method achieves computational efficiency on par with the most efficient existing methods, yet with double the accuracy. Moreover, our approach achieves accuracy comparable to the current highest-accuracy methods, but with an order-of-magnitude improvement in computational performance.
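A hedged sketch of the combined cost-volume construction: for each candidate disparity, an absolute-difference cost and a correlation-style cost are blended. The equal-weight blend, the single view pair, and the pointwise product standing in for windowed correlation are all simplifying assumptions.

```python
import numpy as np

def cost_volume(center, side, max_disp):
    """center, side: HxW grayscale views; returns (max_disp, H, W) costs."""
    h, w = center.shape
    vol = np.zeros((max_disp, h, w), dtype=np.float32)
    for d in range(max_disp):
        shifted = np.roll(side, d, axis=1)   # horizontal disparity shift
        ad = np.abs(center - shifted)        # absolute-difference cost
        corr = -center * shifted             # negated pointwise correlation (lower = better)
        vol[d] = 0.5 * ad + 0.5 * corr       # assumed equal-weight blend
    return vol

# Winner-take-all disparity from the fused volume.
disp = cost_volume(np.random.rand(64, 64), np.random.rand(64, 64), 16).argmin(axis=0)
```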

10.
Comput Biol Med; 175: 108546, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38704902

ABSTRACT

Three-dimensional reconstruction of images acquired through endoscopes is playing a vital role in an increasing number of medical applications. Endoscopes used in the clinic are commonly classified as monocular or binocular. We review the classification of depth estimation methods according to the type of endoscope. Fundamentally, depth estimation relies on feature matching between images and multi-view geometry theory. However, these traditional techniques face many problems in the endoscopic environment. With the continuing development of deep learning, a growing number of works use learning-based methods to address challenges such as inconsistent illumination and texture sparsity. We have reviewed over 170 papers published in the 10 years from 2013 to 2023. The commonly used public datasets and performance metrics are summarized. We also give a taxonomy of methods and analyze the advantages and drawbacks of the algorithms. Summary tables and a results atlas are provided to facilitate the comparison of the qualitative and quantitative performance of the different methods in each category. In addition, we summarize commonly used scene representation methods in endoscopy and speculate on the prospects of depth estimation research in medical applications. We also compare the robustness, processing time, and scene representation of the methods to help doctors and researchers select appropriate methods for their surgical applications.


Subjects
Endoscopy; Imaging, Three-Dimensional; Humans; Imaging, Three-Dimensional/methods; Endoscopy/methods; Algorithms; Deep Learning
11.
Int J Comput Assist Radiol Surg; 19(7): 1359-1366, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38753135

ABSTRACT

PURPOSE: Preoperative imaging plays a pivotal role in sinus surgery, where CTs offer patient-specific insights into complex anatomy, enabling real-time intraoperative navigation to complement endoscopic imaging. However, surgery elicits anatomical changes not represented in the preoperative model, generating an increasingly inaccurate basis for navigation as surgery progresses. METHODS: We propose a first vision-based approach to update the preoperative 3D anatomical model leveraging intraoperative endoscopic video for navigated sinus surgery where relative camera poses are known. We rely on comparisons of intraoperative monocular depth estimates and preoperative depth renders to identify modified regions. The new depths are integrated in these regions through volumetric fusion in a truncated signed distance function representation to generate an intraoperative 3D model that reflects tissue manipulation. RESULTS: We quantitatively evaluate our approach by sequentially updating models for a five-step surgical progression in an ex vivo specimen. We compute the error between correspondences from the updated model and ground-truth intraoperative CT in the region of anatomical modification. The resulting models show a decrease in error during surgical progression, as opposed to an increase when no update is employed. CONCLUSION: Our findings suggest that preoperative 3D anatomical models can be updated using intraoperative endoscopy video in navigated sinus surgery. Future work will investigate improvements to monocular depth estimation as well as removing the need for external navigation systems. The resulting ability to continuously update the patient model may provide surgeons with a more precise understanding of the current anatomical state and paves the way toward a digital twin paradigm for sinus surgery.


Subjects
Endoscopy; Imaging, Three-Dimensional; Models, Anatomic; Surgery, Computer-Assisted; Tomography, X-Ray Computed; Imaging, Three-Dimensional/methods; Humans; Endoscopy/methods; Tomography, X-Ray Computed/methods; Surgery, Computer-Assisted/methods; Paranasal Sinuses/surgery; Paranasal Sinuses/diagnostic imaging
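The volumetric-fusion step described in this record's METHODS centers on the standard truncated signed distance function (TSDF) update, a per-voxel weighted running average; the sketch below shows that core update with simplified voxel bookkeeping and truncation.

```python
import numpy as np

def tsdf_update(tsdf, weight, sdf_obs, trunc=0.05, w_obs=1.0):
    """tsdf, weight: voxel grids; sdf_obs: new signed distances per voxel."""
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)         # truncate and normalize
    new_w = weight + w_obs
    new_tsdf = (tsdf * weight + d * w_obs) / new_w  # running weighted mean
    return new_tsdf, new_w

# Toy usage: integrate one noisy depth observation into an empty grid.
grid = np.zeros((32, 32, 32))
w = np.zeros_like(grid)
grid, w = tsdf_update(grid, w, np.random.uniform(-0.1, 0.1, grid.shape))
```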
12.
Comput Med Imaging Graph; 115: 102390, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38714018

ABSTRACT

Colonoscopy is the procedure of choice to diagnose, screen for, and treat cancers of the colon and rectum, from early detection of small precancerous lesions (polyps) to confirmation of malignant masses. However, the high variability of the organ's appearance and the complex shapes of both the colon wall and the structures of interest make this exploration difficult. Learned visuospatial and perceptual abilities mitigate these technical limitations in clinical practice through proper estimation of intestinal depth. This work introduces a novel methodology to estimate colon depth maps in single frames from monocular colonoscopy videos. The generated depth map is inferred from the shading variation of the colon wall with respect to the light source, as learned from a realistic synthetic database. Briefly, a classic convolutional neural network architecture is trained from scratch to estimate the depth map, improving sharp depth estimation in haustral folds and polyps via a custom loss function that minimizes the estimation error at edges and curvatures. The network was trained on a custom synthetic colonoscopy database constructed and released with this work, composed of 248,400 frames (47 videos) with pixel-level depth annotations. This collection comprises 5 subsets of videos with progressively higher levels of visual complexity. Evaluation of the depth estimation on the synthetic database reached a threshold accuracy of 95.65% and a mean RMSE of 0.451 cm, while a qualitative assessment with a real database showed consistent depth estimation, visually evaluated by the expert gastroenterologist coauthoring this paper. Finally, the method achieved competitive performance with respect to another state-of-the-art method using a public synthetic database, and comparable results on a set of images against five other state-of-the-art methods. Additionally, three-dimensional reconstructions demonstrated useful approximations of the gastrointestinal tract geometry. Code for reproducing the reported results and the dataset are available at https://github.com/Cimalab-unal/ColonDepthEstimation.
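The custom loss that minimizes estimation error at edges and curvatures could take a form like the sketch below: an L1 depth term plus an image-gradient term that sharpens depth at haustral folds and polyp boundaries. The gradient formulation and weighting are assumptions; the paper's exact curvature term is not reproduced.

```python
import torch

def edge_aware_depth_loss(pred, gt, lam=0.5):
    """pred, gt: (B,1,H,W) depth maps; lam weights the edge term (assumed)."""
    l1 = (pred - gt).abs().mean()
    dx_p, dx_g = pred.diff(dim=-1), gt.diff(dim=-1)  # horizontal gradients
    dy_p, dy_g = pred.diff(dim=-2), gt.diff(dim=-2)  # vertical gradients
    grad = (dx_p - dx_g).abs().mean() + (dy_p - dy_g).abs().mean()
    return l1 + lam * grad

loss = edge_aware_depth_loss(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64))
```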


Subjects
Colon; Colonoscopy; Databases, Factual; Humans; Colonoscopy/methods; Colon/diagnostic imaging; Neural Networks, Computer; Colonic Polyps/diagnostic imaging; Image Processing, Computer-Assisted/methods
13.
Sensors (Basel); 24(8), 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38676016

ABSTRACT

With the widespread adoption of modern RGB cameras, an abundance of RGB images is available everywhere. Therefore, multi-view stereo (MVS) 3D reconstruction has been extensively applied across various fields because of its cost-effectiveness and accessibility, which involves multi-view depth estimation and stereo matching algorithms. However, MVS tasks face noise challenges because of natural multiplicative noise and negative gain in algorithms, which reduce the quality and accuracy of the generated models and depth maps. Traditional MVS methods often struggle with noise, relying on assumptions that do not always hold true under real-world conditions, while deep learning-based MVS approaches tend to suffer from high noise sensitivity. To overcome these challenges, we introduce LNMVSNet, a deep learning network designed to enhance local feature attention and fuse features across different scales, aiming for low-noise, high-precision MVS 3D reconstruction. Through extensive evaluation of multiple benchmark datasets, LNMVSNet has demonstrated its superior performance, showcasing its ability to improve reconstruction accuracy and completeness, especially in the recovery of fine details and clear feature delineation. This advancement brings hope for the widespread application of MVS, ranging from precise industrial part inspection to the creation of immersive virtual environments.

14.
Sensors (Basel); 24(6), 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38544126

ABSTRACT

Radar data can provide additional depth information for monocular depth estimation. It provides a cost-effective solution and is robust in various weather conditions, particularly when compared with lidar. Given the sparse and limited vertical field of view of radar signals, existing methods employ either a vertical extension of radar points or the training of a preprocessing neural network to extend sparse radar points under lidar supervision. In this work, we present a novel radar expansion technique inspired by the joint bilateral filter, tailored for radar-guided monocular depth estimation. Our approach is motivated by the synergy of spatial and range kernels within the joint bilateral filter. Unlike traditional methods that assign a weighted average of nearby pixels to the current pixel, we expand sparse radar points by calculating a confidence score based on the values of spatial and range kernels. Additionally, we propose the use of a range-aware window size for radar expansion instead of a fixed window size in the image plane. Our proposed method effectively increases the number of radar points from an average of 39 points in a raw radar frame to an average of 100 K points. Notably, the expanded radar exhibits fewer intrinsic errors when compared with raw radar and previous methodologies. To validate our approach, we assess our proposed depth estimation model on the nuScenes dataset. Comparative evaluations with existing radar-guided depth estimation models demonstrate its state-of-the-art performance.
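A rough sketch of the expansion idea: a sparse radar point spreads its depth to nearby pixels with a confidence score that multiplies a spatial kernel (pixel distance) with a range kernel (guide-image difference), echoing the joint bilateral filter. The fixed window below simplifies the paper's range-aware window size, and using image intensity as the range signal is an assumption.

```python
import numpy as np

def expand_radar_point(guide, u, v, depth, win=15, sig_s=5.0, sig_r=0.1):
    """guide: HxW grayscale image; (u, v): radar pixel; depth: its range."""
    h, w = guide.shape
    depth_map = np.zeros((h, w))
    conf_map = np.zeros((h, w))
    r = win // 2
    for y in range(max(0, v - r), min(h, v + r + 1)):
        for x in range(max(0, u - r), min(w, u + r + 1)):
            spatial = np.exp(-((x - u) ** 2 + (y - v) ** 2) / (2 * sig_s**2))
            rng = np.exp(-((guide[y, x] - guide[v, u]) ** 2) / (2 * sig_r**2))
            conf = spatial * rng
            if conf > conf_map[y, x]:        # keep the most confident source
                depth_map[y, x], conf_map[y, x] = depth, conf
    return depth_map, conf_map

dm, cm = expand_radar_point(np.random.rand(64, 64), 32, 32, depth=12.3)
```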

15.
Sensors (Basel); 24(6), 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38544229

ABSTRACT

This study addresses the ongoing challenge for learning-based methods to achieve accurate object detection in foggy conditions. In response to the scarcity of foggy traffic image datasets, we propose a foggy weather simulation algorithm based on monocular depth estimation. The algorithm involves a multi-step process: a self-supervised monocular depth estimation network generates a relative depth map and then applies dense geometric constraints for scale recovery to derive an absolute depth map. Subsequently, the visibility of the simulated image is specified to generate a transmittance map. The dark channel map is then used to distinguish sky regions and estimate atmospheric light values. Finally, the atmospheric scattering model is used to generate fog simulation images under the specified visibility conditions. Experimental results show that more than 90% of the fog images have AuthESI values of less than 2, indicating that their natural scene statistics (NSS) characteristics are very close to those of natural fog. The proposed fog simulation method can convert clear images captured in natural environments into realistic foggy images, providing a solution to the lack of foggy image datasets and incomplete visibility data.
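The final synthesis step uses the standard atmospheric scattering model, I = J·t + A·(1 − t), with transmittance t = exp(−β·d) computed from the recovered absolute depth map. The sketch below applies it with toy values for the atmospheric light and the β implied by a chosen visibility.

```python
import numpy as np

def add_fog(clear, depth, atmospheric_light=0.9, visibility=50.0):
    """clear: HxWx3 image in [0,1]; depth: HxW absolute depth in meters."""
    beta = 3.912 / visibility            # Koschmieder relation (2% contrast)
    t = np.exp(-beta * depth)[..., None]  # per-pixel transmittance map
    return clear * t + atmospheric_light * (1.0 - t)

foggy = add_fog(np.random.rand(480, 640, 3), np.full((480, 640), 30.0))
```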

16.
Sensors (Basel); 24(5), 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38475237

ABSTRACT

Fringe projection profilometry (FPP) is widely used for high-accuracy 3D imaging. However, employing multiple sets of fringe patterns ensures 3D reconstruction accuracy while inevitably constraining measurement speed. Conventional dual-frequency FPP reduces the number of fringe patterns per reconstruction to six or fewer, but the highest period-number of the fringe patterns is generally limited by phase errors. Deep learning makes depth estimation from fringe images possible. Inspired by unsupervised monocular depth estimation, this paper proposes a novel, weakly supervised method of depth estimation for single-camera FPP. The trained network can estimate depth from three frames of 64-period fringe images. The proposed method improves fringe pattern efficiency by at least 50% compared to conventional FPP. The experimental results show that the method achieves competitive accuracy compared to the supervised method and is significantly superior to conventional dual-frequency methods.

17.
Sci Rep; 14(1): 5868, 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38467677

ABSTRACT

Monocular depth estimation has a wide range of applications in the field of autostereoscopic displays, but accuracy and robustness in complex scenes remain a challenge. In this paper, we propose AMENet, a depth estimation network for autostereoscopic displays that aims to improve the accuracy of monocular depth estimation by fusing a Vision Transformer (ViT) and a Convolutional Neural Network (CNN). Our approach feeds the input image as a sequence of visual features into the ViT module and utilizes its global perception capability to extract high-level semantic features of the image. The relationship between the losses is quantified by adding a weight correction module to improve the robustness of the model. Experimental evaluation on several public datasets shows that AMENet exhibits higher accuracy and robustness than existing methods in different scenarios and complex conditions. In addition, a detailed experimental analysis was conducted to verify the effectiveness and stability of our method. The accuracy improvement on the KITTI dataset compared to the baseline method is 4.4%. In summary, AMENet is a promising depth estimation method with sufficiently high robustness and accuracy for monocular depth estimation tasks.

18.
Sci Rep; 14(1): 7037, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38528098

ABSTRACT

Stereoscopic display technology plays a significant role in industries such as film, television, and autonomous driving. The accuracy of depth estimation is crucial for achieving high-quality and realistic stereoscopic display effects. To address the inherent challenges of applying Transformers to depth estimation, the Stereoscopic Pyramid Transformer-Depth (SPT-Depth) is introduced. This method utilizes stepwise downsampling to acquire both shallow and deep semantic information, which are subsequently fused. The training process is divided into fine and coarse convergence stages, employing distinct training strategies and hyperparameters and resulting in a substantial reduction in both training and validation losses. In the training strategy, a shift- and scale-invariant mean square error function is employed to compensate for the Transformers' lack of translational invariance. Additionally, an edge-smoothing function is applied to reduce noise in the depth map, enhancing the model's robustness. SPT-Depth achieves a global receptive field while effectively reducing time complexity. In comparison with the baseline method on the New York University Depth V2 (NYU Depth V2) dataset, there is a 10% reduction in Absolute Relative Error (Abs Rel) and a 36% decrease in Root Mean Square Error (RMSE). When compared with state-of-the-art methods, there is a 17% reduction in RMSE.
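A shift- and scale-invariant MSE can be written as a closed-form least-squares alignment of the prediction to the target followed by ordinary MSE, as in the sketch below; this mirrors the common formulation, though the paper's exact variant may differ.

```python
import torch

def ssi_mse(pred, gt):
    """pred, gt: (N,) flattened depth values."""
    p = torch.stack([pred, torch.ones_like(pred)], dim=1)  # columns [pred, 1]
    # Solve min_{s, t} ||s * pred + t - gt||^2 in closed form.
    sol = torch.linalg.lstsq(p, gt.unsqueeze(1)).solution.squeeze(1)
    aligned = sol[0] * pred + sol[1]
    return ((aligned - gt) ** 2).mean()

# Invariant to an affine remapping of the target's scale and shift.
print(ssi_mse(torch.rand(1000), torch.rand(1000) * 5 + 2))
```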

19.
Int J Comput Assist Radiol Surg; 19(6): 1013-1020, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38459402

ABSTRACT

PURPOSE: Depth estimation in robotic surgery is vital for 3D reconstruction, surgical navigation, and augmented reality visualization. Although foundation models exhibit outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works have observed their limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of the foundation model for surgical depth estimation. METHODS: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt to surgery-specific domain knowledge instead of conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and optimize only the LoRA layers and depth decoder to integrate features from the surgical scene. RESULTS: Our model is extensively validated on the MICCAI challenge dataset SCARED, which was collected from da Vinci Xi endoscopic surgery. We empirically show that Surgical-DINO significantly outperforms all the state-of-the-art models in endoscopic depth estimation tasks. The ablation studies show clear evidence of the effect of our LoRA layers and adaptation. CONCLUSION: Surgical-DINO sheds light on the successful adaptation of foundation models to the surgical domain for depth estimation. There is clear evidence in the results that zero-shot prediction on pre-trained weights from computer vision datasets or naive fine-tuning is not sufficient to use the foundation model in the surgical domain directly.


Subjects
Imaging, Three-Dimensional; Humans; Imaging, Three-Dimensional/methods; Robotic Surgical Procedures/methods; Endoscopy/methods; Surgery, Computer-Assisted/methods; Depth Perception/physiology
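The LoRA mechanism Surgical-DINO attaches to the frozen encoder can be sketched as a linear layer whose pretrained weight is frozen while only low-rank factors are trained; the rank and scaling below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus trainable low-rank update: W x + scale * B A x.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))     # e.g., a ViT attention projection
out = layer(torch.randn(4, 197, 768))       # only A and B receive gradients
```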
20.
Sensors (Basel); 24(3), 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38339654

ABSTRACT

In the context of recent technological advancements driven by distributed work and open-source resources, computer vision stands out as an innovative force, transforming how machines interact with and comprehend the visual world around us. This work conceives, designs, implements, and operates a computer vision and artificial intelligence method for object detection with integrated depth estimation. With applications ranging from autonomous fruit-harvesting systems to phenotyping tasks, the proposed Depth Object Detector (DOD) is trained and evaluated using the Microsoft Common Objects in Context dataset and the MinneApple dataset for object and fruit detection, respectively. The DOD is benchmarked against current state-of-the-art models. The results demonstrate the proposed method's efficiency for operation on embedded systems, with a favorable balance between accuracy and speed, making it well suited for real-time applications on edge devices in the context of the Internet of things.
