ABSTRACT
Robot-assisted minimally invasive surgery is widely employed in complex operations. In procedures performed within narrow spaces, however, multiple surgical instruments can occupy a large portion of the visual field, impairing the surgeon's judgment of the shape and position of the lesion and of the course of adjacent vessels/lacunae. In this paper, a surgical scene reconstruction method is proposed that tracks and removes the surgical instruments and dynamically predicts the regions they obscure. For instrument tracking and segmentation, the image sequences are processed by a modified U-Net architecture composed of a pre-trained ResNet101 encoder and a redesigned decoder. The segmentation boundaries of the instrument shafts are then extended using image filtering and a real-time index-mask algorithm to localize the obscured elements precisely. For the soft tissue hidden behind the instruments, a deformation prediction algorithm based on a dense optical flow gravitational field and entropy increase is proposed, which achieves local dynamic visualization of the surgical scene when combined with image morphological operations. Finally, preliminary experiments and a pre-clinical evaluation are presented to demonstrate the performance of the proposed method. The results show that the method can provide the surgeon with a clean and comprehensive surgical scene, reconstruct the course of important vessels/lacunae, and help avoid inadvertent injuries.
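To make the optical-flow step concrete, the sketch below estimates dense flow between consecutive frames and fills the region occluded by the instrument from the visible tissue around it. This is not the authors' code: the abstract's gravitational-field/entropy formulation is not specified, so the fill here is a crude mean-flow stand-in, and all function and variable names (`fill_occluded_flow`, `instrument_mask`) are hypothetical; only the Farneback flow and the morphological dilation mirror steps the abstract actually names.

```python
import cv2
import numpy as np

def fill_occluded_flow(prev_gray, curr_gray, instrument_mask):
    """Estimate dense optical flow and replace the (invalid) flow under the
    instrument with the mean flow of a ring of visible tissue around it.
    Hypothetical stand-in for the paper's gravitational-field/entropy model."""
    # Farneback dense optical flow: H x W x 2 displacement field.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # Dilate the segmentation mask, mirroring the abstract's
    # boundary-extension step on the instrument shafts.
    mask = (instrument_mask > 0).astype(np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    occluded = cv2.dilate(mask, kernel)

    # Ring of visible tissue immediately surrounding the occluded region.
    ring = cv2.dilate(occluded, kernel) - occluded

    # Crude fill: propagate the ring's mean motion into the hole. The
    # paper's dense-optical-flow "gravitational field" would presumably
    # weight each surrounding pixel's pull instead of averaging uniformly.
    filled = flow.copy()
    if ring.any():
        filled[occluded > 0] = flow[ring > 0].mean(axis=0)
    return filled
```

In the described pipeline, a field like `filled` would then warp the last unoccluded appearance of the tissue to render the instrument-free view frame by frame.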
Subjects
Laparoscopy, Robotic Surgical Procedures, Robotics, Surgeons, Humans, Visual Fields

ABSTRACT
PURPOSE: Automatic image segmentation of surgical instruments is a fundamental task in robot-assisted minimally invasive surgery, as it greatly improves surgeons' context awareness during the operation. A novel method based on Mask R-CNN is proposed in this paper to realize accurate instance segmentation of surgical instruments. METHODS: A novel feature extraction backbone is built that extracts local features through a convolutional neural network branch and global representations through a Swin-Transformer branch. Moreover, skip fusions are applied in the backbone to merge the two kinds of features and improve the generalization ability of the network. RESULTS: The proposed method is evaluated on the dataset of the MICCAI 2017 EndoVis Challenge across three segmentation tasks and shows state-of-the-art performance, with an mIoU of 0.5873 in type segmentation and 0.7408 in part segmentation. Furthermore, ablation studies show that the proposed backbone contributes at least a 17% improvement in mIoU. CONCLUSION: The promising results demonstrate that our method can effectively extract global representations as well as local features in the segmentation of surgical instruments and improve segmentation accuracy. With the proposed backbone, the network segments the contours of surgical instruments' end tips more precisely. The method can provide more accurate data for the localization and pose estimation of surgical instruments, contributing further to the automation of robot-assisted minimally invasive surgery.
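As a minimal PyTorch sketch of the dual-branch idea (not the paper's implementation: the fusion scheme, stage layout, and the class name `DualBranchBlock` are all assumptions, and a generic transformer layer stands in for a true Swin stage), one backbone stage with a skip fusion might look like this:

```python
import torch
import torch.nn as nn

class DualBranchBlock(nn.Module):
    """One stage of a hypothetical CNN + transformer backbone with a skip
    fusion, loosely following the abstract; fusion details are assumed."""
    def __init__(self, channels, heads=4):
        super().__init__()
        # Local-feature branch: plain convolutional block.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True))
        # Global-representation branch: transformer encoder layer
        # (stand-in for a Swin stage; windowed attention omitted).
        self.attn = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True)

    def forward(self, x):
        local = self.conv(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # B x HW x C
        globl = self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)
        # Skip fusion: merge both branches plus the input residual.
        return local + globl + x

# Example: one fused stage, shape-preserving.
feats = DualBranchBlock(channels=64)(torch.randn(2, 64, 32, 32))
```

In a full system, several such stages with downsampling between them would be stacked, and their multi-scale outputs fed to Mask R-CNN's feature pyramid and detection/mask heads.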