The effects of different levels of realism on the training of CNNs with only synthetic images for the semantic segmentation of robotic instruments in a head phantom.
Heredia Perez, Saul Alexis; Marques Marinho, Murilo; Harada, Kanako; Mitsuishi, Mamoru.
Affiliation
  • Heredia Perez SA; Department of Mechanical Engineering, The University of Tokyo, Tokyo, Japan.
  • Marques Marinho M; Department of Mechanical Engineering, The University of Tokyo, Tokyo, Japan. murilo@nml.t.u-tokyo.ac.jp.
  • Harada K; Department of Mechanical Engineering, The University of Tokyo, Tokyo, Japan.
  • Mitsuishi M; Department of Mechanical Engineering, The University of Tokyo, Tokyo, Japan.
Int J Comput Assist Radiol Surg ; 15(8): 1257-1265, 2020 Aug.
Article en En | MEDLINE | ID: mdl-32445129
PURPOSE: The manual generation of training data for the semantic segmentation of medical images using deep neural networks is time-consuming and error-prone. In this paper, we investigate the effect of different levels of realism on the training of deep neural networks for the semantic segmentation of robotic instruments. An interactive virtual-reality environment was developed to generate synthetic images for robot-aided endoscopic surgery. In contrast with earlier works, we use physically based rendering for increased realism.

METHODS: Using a virtual-reality simulator that replicates our robotic setup, three synthetic image databases with an increasing level of realism were generated: flat, basic, and realistic (using physically based rendering). Each of these databases was used to train 20 instances of a UNet-based semantic-segmentation deep-learning model. The networks trained with only synthetic images were evaluated on the segmentation of 160 endoscopic images of a phantom. The networks were compared using the Dwass-Steel-Critchlow-Fligner nonparametric test.

RESULTS: Our results show that increasing levels of realism increased the mean intersection-over-union (mIoU) of the networks on endoscopic images of a phantom ([Formula: see text]). The median mIoU values were 0.235 for the flat dataset, 0.458 for the basic dataset, and 0.729 for the realistic dataset. All networks trained with synthetic images outperformed naive classifiers. Moreover, in an ablation study, we show that the mIoU of physically based rendering is superior to that of texture mapping ([Formula: see text]) of the instrument (0.606), the background (0.685), and the background and instrument combined (0.672).

CONCLUSIONS: Using physically based rendering to generate synthetic images is an effective approach to improve the training of neural networks for the semantic segmentation of surgical instruments in endoscopic images. Our results show that this strategy can be an essential step toward the broad applicability of deep neural networks in semantic segmentation tasks and help bridge the domain gap in machine learning.
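The evaluation metric reported above, mean intersection-over-union (mIoU), can be sketched as follows. This is a minimal, generic implementation for integer label maps, not the authors' code; classes absent from both prediction and ground truth are skipped so they do not distort the mean:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes.

    `pred` and `target` are integer label maps of the same shape.
    Classes absent from both maps are skipped rather than penalized.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class c appears in neither map
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy 2x2 example with two classes (0 = background, 1 = instrument):
# class 0 has IoU 1/2, class 1 has IoU 2/3, so mIoU = (1/2 + 2/3) / 2.
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
```

In the paper, this score would be computed per endoscopic test image and aggregated over the 160 phantom images to compare the flat, basic, and realistic training sets.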

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Neural Networks (Computer) / Robotic Surgical Procedures / Machine Learning / Simulation Training Limits: Humans Language: En Journal: Int J Comput Assist Radiol Surg Journal subject: Radiology Year: 2020 Document type: Article Country of affiliation: Japan Country of publication: Germany