1.
IEEE Trans Med Imaging ; PP. 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38990752

ABSTRACT

Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods achieve accurate instrument segmentation, they generate segmentation masks for all instruments simultaneously and therefore cannot specify a target object or support an interactive experience. This paper focuses on a novel and essential task in robotic surgery, i.e., Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment the target surgical instrument in each video frame, referred to by a given language expression. This interactive capability offers enhanced user engagement and customized experiences, greatly benefiting the development of the next generation of surgical education systems. To this end, we construct two surgical video datasets to promote RSVIS research. We then devise a novel Video-Instrument Synergistic Network (VIS-Net) that learns both video-level and instrument-level knowledge to boost performance, whereas previous work utilized only video-level information. Meanwhile, we design a Graph-based Relation-aware Module (GRM) to model the correlation between multi-modal information (i.e., textual description and video frame) and facilitate the extraction of instrument-level information. Extensive experimental results on the two RSVIS datasets demonstrate that VIS-Net significantly outperforms existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (Git).
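
As a rough illustration of relating a language expression to visual features, the sketch below implements plain cross-modal attention in PyTorch, with a pooled text embedding querying flattened frame features. The module name, dimensions, and shapes are assumptions for the example; the paper's GRM models richer graph-structured relations than this.

```python
import torch
import torch.nn as nn

class TextToFrameAttention(nn.Module):
    """Minimal cross-modal attention: a text query attends over frame features.

    Illustrative only -- the Graph-based Relation-aware Module (GRM) in the
    paper models graph-structured relations; names and dims are assumptions.
    """
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_emb, frame_feats):
        # text_emb: (B, 1, D) pooled expression embedding
        # frame_feats: (B, H*W, D) flattened visual features of one frame
        fused, weights = self.attn(text_emb, frame_feats, frame_feats)
        return fused, weights  # fused: (B, 1, D), weights: (B, 1, H*W)

x = torch.randn(2, 14 * 14, 256)   # dummy frame features
q = torch.randn(2, 1, 256)         # dummy sentence embedding
out, w = TextToFrameAttention()(q, x)
print(out.shape, w.shape)          # torch.Size([2, 1, 256]) torch.Size([2, 1, 196])
```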

2.
Med Image Anal ; 96: 103195, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38815359

ABSTRACT

Colorectal cancer is one of the most common cancers in the world. While colonoscopy is an effective screening technique, navigating an endoscope through the colon to detect polyps is challenging. A 3D map of the observed surfaces could enhance the identification of unscreened colon tissue and serve as a training platform. However, reconstructing the colon from video footage remains difficult. Learning-based approaches hold promise as robust alternatives, but they require extensive datasets. To establish a benchmark dataset, the 2022 EndoVis sub-challenge SimCol3D aimed to facilitate data-driven depth and pose prediction during colonoscopy. The challenge was hosted as part of MICCAI 2022 in Singapore. Six teams from around the world, representing both academia and industry, participated in the three sub-challenges: synthetic depth prediction, synthetic pose prediction, and real pose prediction. This paper describes the challenge, the submitted methods, and their results. We show that depth prediction from synthetic colonoscopy images is robustly solvable, while pose estimation remains an open research question.
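
For context on how such depth predictions are typically scored, here is a minimal sketch of two standard monocular-depth error metrics; the challenge's official evaluation protocol may differ, and the validity-masking threshold is an assumption.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Absolute relative error and RMSE -- two standard depth metrics.

    A generic evaluation sketch; the SimCol3D challenge defines its own
    official protocol, which may aggregate or mask differently.
    """
    mask = gt > eps              # ignore invalid (zero-depth) pixels
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    return abs_rel, rmse

pred = np.random.uniform(0.1, 1.0, (256, 256))   # dummy predicted depth map
gt = np.random.uniform(0.1, 1.0, (256, 256))     # dummy ground-truth depth map
print(depth_metrics(pred, gt))
```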


Assuntos
Colonoscopia , Imageamento Tridimensional , Humanos , Imageamento Tridimensional/métodos , Neoplasias Colorretais/diagnóstico por imagem , Pólipos do Colo/diagnóstico por imagem
3.
Med Image Anal ; 91: 102985, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37844472

ABSTRACT

This paper introduces the "SurgT: Surgical Tracking" challenge, which was organized in conjunction with the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2022). The challenge had two purposes: (1) to establish the first standardized benchmark for the research community to assess soft-tissue trackers; and (2) to encourage the development of unsupervised deep learning methods, given the lack of annotated data in surgery. A dataset of 157 stereo endoscopic videos from 20 clinical cases, along with stereo camera calibration parameters, was provided. Participants were tasked with developing algorithms to track the movement of soft tissue, represented by bounding boxes, in stereo endoscopic videos. At the end of the challenge, the developed methods were assessed on a previously hidden test subset, using benchmarking metrics purpose-built for this challenge to verify the efficacy of unsupervised deep learning algorithms in tracking soft tissue. The metric used for ranking the methods was the Expected Average Overlap (EAO) score, which measures the average overlap between a tracker's bounding boxes and the ground-truth bounding boxes. The challenge was won by the deep learning submission from ICVS-2Ai, with the highest EAO score of 0.617. This method employs ARFlow to estimate unsupervised dense optical flow from cropped images, using photometric and regularization losses. The runner-up, Jmees, with an EAO of 0.583, used deep learning for surgical tool segmentation on top of a non-deep-learning baseline, CSRT; CSRT by itself scored a similar EAO of 0.563. The results from this challenge show that non-deep-learning methods are currently still competitive. The dataset and benchmarking tool created for this challenge have been made publicly available at https://surgt.grand-challenge.org/. This challenge is expected to contribute to the development of autonomous robotic surgery and other digital surgical technologies.
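
The core quantity behind the EAO ranking is the per-frame overlap (IoU) between predicted and ground-truth boxes, averaged over a sequence. A minimal sketch follows; the official EAO protocol additionally averages over sequence lengths and handles tracker resets, which are omitted here.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_overlap(pred_boxes, gt_boxes):
    """Mean per-frame IoU over a sequence -- the quantity EAO builds on.
    The full EAO (see the VOT/SurgT protocols) also averages over
    sequence lengths and accounts for tracker failures."""
    return float(np.mean([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]))

print(average_overlap([(10, 10, 50, 50)], [(12, 12, 48, 52)]))
```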


Subjects
Robotic Surgical Procedures ; Humans ; Benchmarking ; Algorithms ; Endoscopy ; Image Processing, Computer-Assisted/methods
4.
Nat Commun ; 14(1): 6676, 2023 Oct 21.
Article in English | MEDLINE | ID: mdl-37865629

ABSTRACT

Recent advances in artificial intelligence have reached human-level performance on a range of tasks; however, AI-enabled cognitive assistance for therapeutic procedures has not been fully explored or pre-clinically validated. Here we propose AI-Endo, an intelligent surgical workflow recognition suite for endoscopic submucosal dissection (ESD). AI-Endo is trained on high-quality ESD cases from an expert endoscopist, spanning a decade and comprising 201,026 labeled frames. The learned model demonstrates outstanding performance on validation data, including cases from relatively junior endoscopists with varying skill levels, procedures conducted with different endoscopy systems and therapeutic techniques, and cohorts from multiple international centers. Furthermore, we integrate AI-Endo with the Olympus endoscopic system and validate the AI-enabled cognitive assistance system in animal studies during live ESD training sessions. Dedicated analysis of the surgical phase recognition results is summarized in an automatically generated report for skill assessment.


Subjects
Endometriosis ; Endoscopic Mucosal Resection ; Animals ; Female ; Humans ; Endoscopic Mucosal Resection/education ; Endoscopic Mucosal Resection/methods ; Artificial Intelligence ; Workflow ; Endoscopy ; Learning
5.
Med Image Anal ; 86: 102770, 2023 May.
Article in English | MEDLINE | ID: mdl-36889206

ABSTRACT

PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of an operation through context-sensitive warnings and semi-autonomous robotic assistance, or improve the training of surgeons via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open single-center video dataset. In this work, we investigated the generalizability of phase recognition algorithms in a multicenter setting, including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset of 33 laparoscopic cholecystectomy videos from three surgical centers, with a total operation time of 22 h, was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 international Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis, in which 12 research teams trained and submitted their machine learning algorithms for recognition of phase, action, instrument, and/or skill assessment. RESULTS: F1-scores ranged from 23.9% to 67.7% for phase recognition (n = 9 teams) and from 38.5% to 63.8% for instrument presence detection (n = 8 teams), but only from 21.8% to 23.3% for action recognition (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies to support the surgical team, but there is still room for improvement, as shown by our comparison of machine learning algorithms. This novel HeiChole benchmark can be used for comparable evaluation and validation of future work. In future studies, it is of utmost importance to create more open, high-quality datasets in order to allow the development of artificial intelligence and cognitive robotics in surgery.
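
For reference, a frame-wise F1-score for one surgical phase can be computed as below; this is a generic sketch, and the challenge's official aggregation across phases and videos may differ.

```python
import numpy as np

def phase_f1(pred, gt, phase):
    """Frame-wise F1 for one phase label over a video's frame sequence.
    Generic sketch only; the HeiChole evaluation may aggregate differently."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    tp = np.sum((pred == phase) & (gt == phase))
    fp = np.sum((pred == phase) & (gt != phase))
    fn = np.sum((pred != phase) & (gt == phase))
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    return 2 * precision * recall / (precision + recall + 1e-9)

print(phase_f1([0, 1, 1, 2], [0, 1, 2, 2], phase=1))  # ~0.667
```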


Subjects
Artificial Intelligence ; Benchmarking ; Humans ; Workflow ; Algorithms ; Machine Learning
6.
Int J Comput Assist Radiol Surg ; 17(12): 2193-2202, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36129573

ABSTRACT

PURPOSE: Real-time surgical workflow analysis is a key component of computer-assisted intervention systems for improving cognitive assistance. Most existing methods rely solely on conventional temporal models and encode features with a successive spatial-temporal arrangement, so the supportive benefits of intermediate features are partially lost in both the visual and temporal dimensions. In this paper, we rethink feature encoding to attend to and preserve the critical information needed for accurate workflow recognition and anticipation. METHODS: We introduce Transformers into surgical workflow analysis to reconsider the complementary effects of spatial and temporal representations. We propose a hybrid embedding aggregation Transformer, named Trans-SVNet, in which the designed spatial and temporal embeddings interact effectively by employing the spatial embedding to query the temporal embedding sequence. The network is jointly optimized with loss objectives from both analysis tasks to leverage their high correlation. RESULTS: We extensively evaluate our method on three large surgical video datasets. Our method consistently outperforms the state of the art on the workflow recognition task across all three datasets. When jointly trained with anticipation, recognition results improve substantially, and our approach also achieves promising anticipation performance. Our model reaches a real-time inference speed of 0.0134 seconds per frame. CONCLUSION: Experimental results demonstrate the efficacy of our hybrid embedding integration in rediscovering crucial cues from complementary spatial-temporal embeddings. The improvement from multi-task learning indicates that the anticipation task brings additional knowledge to the recognition task. The effectiveness and efficiency of our method also indicate its potential for use in the operating room.
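
The central interaction, a spatial embedding querying a temporal embedding sequence, can be illustrated with plain scaled dot-product attention. The single-head form, shapes, and sequence length below are assumptions for the example, not the Trans-SVNet implementation.

```python
import torch
import torch.nn.functional as F

def spatial_query_temporal(spatial_emb, temporal_seq):
    """Scaled dot-product attention where the current frame's spatial
    embedding queries a sequence of temporal embeddings, loosely in the
    spirit of hybrid embedding aggregation (shapes are assumptions).

    spatial_emb:  (B, D)    embedding of the current frame
    temporal_seq: (B, T, D) embeddings of the preceding T frames
    """
    q = spatial_emb.unsqueeze(1)                        # (B, 1, D)
    scores = q @ temporal_seq.transpose(1, 2)           # (B, 1, T)
    attn = F.softmax(scores / temporal_seq.size(-1) ** 0.5, dim=-1)
    return (attn @ temporal_seq).squeeze(1)             # (B, D) aggregated feature

out = spatial_query_temporal(torch.randn(4, 128), torch.randn(4, 10, 128))
print(out.shape)  # torch.Size([4, 128])
```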


Subjects
Operating Rooms ; Humans ; Workflow
7.
IEEE Trans Med Imaging ; 41(3): 621-632, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34633927

ABSTRACT

Multimodal learning usually requires a complete set of modalities during inference to maintain performance. Although training data can be well prepared with multiple high-quality modalities, in many clinical settings only one modality can be acquired, and important clinical evaluations must be made based on this limited single-modality information. In this work, we propose a privileged knowledge learning framework with a 'Teacher-Student' architecture, in which the complete multimodal knowledge available only in the training data (called privileged information) is transferred from a multimodal teacher network to a unimodal student network via both a pixel-level and an image-level distillation scheme. Specifically, for the pixel-level distillation, we introduce a regularized knowledge distillation loss that encourages the student to mimic the teacher's softened outputs in a pixel-wise manner and incorporates a regularization factor to reduce the effect of incorrect predictions from the teacher. For the image-level distillation, we propose a contrastive knowledge distillation loss that encodes image-level structured information to enrich the knowledge encoding in combination with the pixel-level distillation. We extensively evaluate our method on two different multi-class segmentation tasks, i.e., cardiac substructure segmentation and brain tumor segmentation. Experimental results on both tasks demonstrate that our privileged knowledge learning is effective in improving unimodal segmentation and outperforms previous methods.
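
The pixel-level idea, mimicking temperature-softened teacher outputs, reduces to a textbook distillation loss. The sketch below shows only that baseline form and omits the paper's regularization factor; the temperature and tensor shapes are arbitrary choices for the example.

```python
import torch
import torch.nn.functional as F

def pixel_kd_loss(student_logits, teacher_logits, T=2.0):
    """Pixel-wise knowledge distillation with temperature-softened teacher
    outputs -- the standard KD form, not the paper's exact regularized loss.

    logits: (B, C, H, W) for C segmentation classes.
    """
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    # KL divergence between softened distributions, scaled by T^2
    # as in Hinton et al.'s original distillation formulation.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

s = torch.randn(2, 4, 64, 64)  # dummy student logits
t = torch.randn(2, 4, 64, 64)  # dummy teacher logits
print(pixel_kd_loss(s, t).item())
```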


Subjects
Heart ; Neural Networks, Computer ; Humans
8.
Med Image Anal ; 75: 102291, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34753019

ABSTRACT

We propose a novel shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection (ESD) surgery. This task is of great clinical significance but extremely challenging due to bleeding, lighting reflections, and motion blur in the complicated surgical environment. Compared with existing solutions, which either neglect geometric relationships among target objects or capture those relationships with complicated aggregation schemes, the proposed network achieves satisfactory accuracy while maintaining real-time performance by taking full advantage of the spatial relations among landmarks. We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent prior knowledge of the spatial relations among landmarks without any extra manual annotation effort. We then develop two complementary regularization schemes to progressively incorporate this prior knowledge into the training process. While one scheme introduces pixel-level regularization via multi-task learning, the other integrates global-level regularization by harnessing a newly designed grouped consistency evaluator, which adds relation constraints to the proposed network in an adversarial manner. Both schemes benefit training and can be readily removed at inference time to achieve real-time detection. We establish a large in-house dataset of ESD surgery for esophageal cancer to validate the effectiveness of the proposed method. Extensive experimental results demonstrate that our approach outperforms state-of-the-art methods in both accuracy and efficiency, achieving better detection results at faster speed. Promising results on two downstream applications further corroborate the great potential of our method in ESD clinical practice.
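
Relation keypoint heatmaps are built on the standard Gaussian-heatmap machinery used for landmark targets. The sketch below renders a Gaussian at the midpoint of a landmark pair as one plausible way to encode a pairwise spatial relation; the paper's actual generation algorithm is not reproduced here, and all coordinates and the sigma are assumptions.

```python
import numpy as np

def gaussian_heatmap(h, w, center, sigma=3.0):
    """Render a 2D Gaussian heatmap of size (h, w) centered at (cy, cx)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# Encode the relation between two landmarks by a Gaussian at their midpoint
# (an illustrative choice, not necessarily the paper's construction).
a, b = (20, 30), (40, 70)                      # two landmark positions (y, x)
mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
relation_map = gaussian_heatmap(64, 96, mid)
print(relation_map.shape, float(relation_map.max()))  # (64, 96) ~1.0
```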


Subjects
Endoscopic Mucosal Resection ; Algorithms ; Humans
9.
Med Image Anal ; 75: 102296, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34781159

ABSTRACT

In this paper, we propose a novel method of Unsupervised Disentanglement of Scene and Motion (UDSM) representations for minimally invasive surgery video retrieval within large databases, which has the potential to advance intelligent and efficient surgical teaching systems. To extract more discriminative video representations, two dedicated encoders, trained with a triplet ranking loss and an adversarial learning mechanism, capture the spatial and temporal information respectively, achieving disentangled per-frame features with promising interpretability. In addition, long-range temporal dependencies are captured at the video level via a temporal aggregation module, and a set of compact binary codes carrying representative features is then produced to enable fast retrieval. The entire framework is trained in an unsupervised scheme, i.e., it learns purely from raw surgical videos without any annotation. We construct two large-scale minimally invasive surgery video datasets, based on the public Cholec80 dataset and our in-house dataset of laparoscopic hysterectomy, to drive the learning process and validate the effectiveness of our proposed method qualitatively and quantitatively on the surgical video retrieval task. Extensive experiments show that our approach significantly outperforms state-of-the-art video retrieval methods on both datasets, revealing a promising future for injecting intelligence into the next generation of surgical teaching systems.
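
Two of the building blocks named above have compact textbook forms: the triplet ranking loss and binarization into compact codes for fast Hamming-distance retrieval. The sketch below shows those generic forms; the margin, dimensions, and sign-threshold binarization rule are assumptions, not the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def triplet_ranking_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet ranking loss: pull the positive closer to the
    anchor than the negative by at least `margin` (a generic sketch)."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

def to_binary_codes(features):
    """Sign-threshold real-valued features into compact binary codes
    (one common binarization choice for retrieval by Hamming distance)."""
    return (features > 0).to(torch.uint8)

a, p, n = (torch.randn(8, 64) for _ in range(3))  # dummy embedding batches
print(triplet_ranking_loss(a, p, n).item(), to_binary_codes(a).shape)
```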


Subjects
Minimally Invasive Surgical Procedures ; Databases, Factual ; Humans ; Motion
10.
Med Image Anal ; 74: 102240, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34614476

ABSTRACT

The scarcity of annotated surgical data in robot-assisted surgery (RAS) has motivated prior work to borrow related domain knowledge and achieve promising segmentation results in surgical images by adaptation. For dense instrument tracking in a robotic surgical video, collecting one initial scene to specify the target instruments (or parts of tools) is desirable and feasible during preoperative preparation. In this paper, we study the challenging problem of one-shot instrument segmentation for robotic surgical videos, in which only the first frame's mask of each video is provided at test time, so that a pre-trained model (learned from easily accessible sources) can adapt to the target instruments. Straightforward methods transfer the domain knowledge by fine-tuning the model on each given mask, but such one-shot optimization takes hundreds of iterations, making the test-time runtime infeasible. We present anchor-guided online meta adaptation (AOMA) for this problem. We achieve fast one-shot test-time optimization by meta-learning a good model initialization and learning rates from source videos, avoiding laborious hand-crafted fine-tuning. These two trainable components are optimized in a video-specific task space with a matching-aware loss. Furthermore, we design an anchor-guided online adaptation to tackle the performance drop throughout a robotic surgical sequence: the model is continuously adapted on motion-insensitive pseudo-masks supported by anchor matching. AOMA achieves state-of-the-art results in two practical scenarios: (1) general videos to surgical videos, and (2) public surgical videos to in-house surgical videos, while substantially reducing the test runtime.
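
To make the test-time side of this concrete, here is a toy adaptation loop that fine-tunes on the first annotated frame using per-parameter learning rates. Everything here is an illustrative stand-in: the meta-training that would produce the initialization and the `lrs` step sizes is not shown, and the tiny model, loss, and step count are assumptions, not AOMA.

```python
import torch

def one_shot_adapt(model, lrs, first_frame, first_mask, steps=5):
    """Fast test-time adaptation on the first annotated frame using
    per-parameter learning rates -- a toy stand-in for meta-learned
    initialization and learning rates (meta-training not shown).
    `lrs` maps parameter names to step sizes, assumed meta-learned."""
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(steps):
        loss = loss_fn(model(first_frame), first_mask)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for name, p in model.named_parameters():
                p -= lrs[name] * p.grad   # per-parameter learned step size

model = torch.nn.Conv2d(3, 1, 3, padding=1)          # dummy segmentation head
lrs = {n: 0.01 for n, _ in model.named_parameters()}  # dummy learned rates
frame = torch.randn(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
one_shot_adapt(model, lrs, frame, mask)
```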


Subjects
Robotic Surgical Procedures ; Humans ; Learning ; Motion ; Surgical Instruments
11.
Med Image Anal ; 73: 102158, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34325149

ABSTRACT

Surgical workflow recognition is a fundamental task in computer-assisted surgery and a key component of various applications in operating rooms. Existing deep learning models have achieved promising results for surgical workflow recognition, but they rely heavily on a large amount of annotated videos, and obtaining annotations is time-consuming and requires the domain knowledge of surgeons. In this paper, we propose a novel two-stage semi-supervised learning method for label-efficient surgical workflow recognition, named SurgSSL. SurgSSL progressively leverages the inherent knowledge held in the unlabeled data to an increasing extent: from implicit unlabeled data excavation via motion knowledge, to explicit unlabeled data excavation via pre-knowledge pseudo-labeling. Specifically, we first propose a novel intra-sequence Visual and Temporal Dynamic Consistency (VTDC) scheme for implicit excavation. It enforces prediction consistency of the same data under perturbations in both the spatial and temporal spaces, encouraging the model to capture rich motion knowledge. We further perform explicit excavation by optimizing the model towards our pre-knowledge pseudo-labels, which are naturally generated by the VTDC-regularized model with prior knowledge of the unlabeled data encoded, and which demonstrate superior reliability for model supervision compared with labels generated by existing methods. We extensively evaluate our method on two public surgical datasets, Cholec80 and the M2CAI challenge dataset. Our method surpasses state-of-the-art semi-supervised methods by a large margin, e.g., improving accuracy by 10.5% under the most annotation-scarce regime of the M2CAI dataset. Using only 50% of the labeled videos on Cholec80, our approach achieves performance competitive with full-data training.
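
The VTDC idea, agreement between predictions on a clip and on a perturbed view of it, can be sketched with a simple KL consistency term. The horizontal-flip perturbation, the dummy model, and all shapes below are illustrative assumptions; the paper's perturbations and loss form are more elaborate.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, clip):
    """KL consistency between predictions on a clip and a perturbed view,
    in the spirit of VTDC (flipping frames is just an illustrative choice).

    clip: (B, T, C, H, W) unlabeled video clip.
    """
    with torch.no_grad():
        p_clean = F.softmax(model(clip), dim=-1)          # stop-gradient target
    p_pert = F.log_softmax(model(torch.flip(clip, dims=[-1])), dim=-1)
    return F.kl_div(p_pert, p_clean, reduction="batchmean")

class DummyPhaseModel(torch.nn.Module):
    """Stand-in model: mean-pools a clip over time and classifies 7 phases."""
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(3 * 32 * 32, 7)
    def forward(self, clip):                  # (B, T, C, H, W) -> (B, 7)
        return self.fc(clip.mean(dim=1).flatten(1))

model = DummyPhaseModel()
clip = torch.randn(2, 8, 3, 32, 32)
print(consistency_loss(model, clip).item())
```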


Subjects
Neural Networks, Computer ; Surgery, Computer-Assisted ; Reproducibility of Results ; Supervised Machine Learning ; Workflow
12.
Int J Comput Assist Radiol Surg ; 16(9): 1607-1614, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34173182

ABSTRACT

PURPOSE: Automatic segmentation of surgical instruments in robot-assisted minimally invasive surgery plays a fundamental role in improving context awareness. In this work, we present an instance segmentation model based on a refined Mask R-CNN for accurately segmenting instruments and identifying their types. METHODS: We re-formulate instrument segmentation as an instance segmentation task and optimize Mask R-CNN with anchor optimization and an improved Region Proposal Network for instrument segmentation. Moreover, we perform cross-dataset evaluation with different sampling strategies. RESULTS: We evaluate our model on a public dataset from the MICCAI 2017 Endoscopic Vision Challenge with two segmentation tasks, achieving new state-of-the-art performance on both. In addition, cross-dataset training improved performance on both segmentation tasks compared with training on the public dataset alone. CONCLUSION: The results demonstrate the effectiveness of the proposed instance segmentation network for surgical instrument segmentation. The cross-dataset evaluation shows that our instance segmentation model exhibits a degree of cross-dataset generalization capability, and that cross-dataset training can significantly improve segmentation performance. Our empirical study also provides guidance on how surgeons might allocate annotation effort when labelling a new dataset in practice.
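
As one concrete way to customize Mask R-CNN anchors, the sketch below uses torchvision's model builder with a hand-specified AnchorGenerator. The anchor sizes, aspect ratios, and class count are placeholders, not the paper's refined settings, and the paper's RPN improvements are not reproduced.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.models.detection.rpn import AnchorGenerator

# One anchor size per FPN level, three aspect ratios each -- the concrete
# values here are illustrative guesses, not the paper's optimized anchors.
anchors = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 5,
)
model = maskrcnn_resnet50_fpn(
    weights=None, weights_backbone=None,   # no pretrained downloads
    num_classes=8,                         # placeholder instrument-type count
    rpn_anchor_generator=anchors,
)
model.eval()
with torch.no_grad():
    out = model([torch.rand(3, 480, 640)])   # dummy endoscopic frame
print(out[0].keys())  # dict_keys(['boxes', 'labels', 'scores', 'masks'])
```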


Subjects
Robotic Surgical Procedures ; Endoscopy ; Humans ; Image Processing, Computer-Assisted ; Minimally Invasive Surgical Procedures ; Surgical Instruments
13.
Med Image Anal ; 70: 101920, 2021 May.
Article in English | MEDLINE | ID: mdl-33676097

ABSTRACT

Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer- and robot-assisted interventions. While numerous methods for detecting, segmenting, and tracking medical instruments based on endoscopic video images have been proposed in the literature, key limitations remain to be addressed. The first is robustness, that is, the reliable performance of state-of-the-art methods on challenging images (e.g., in the presence of blood, smoke, or motion artifacts). The second is generalization: algorithms trained for a specific intervention in a specific hospital should generalize to other interventions or institutions. In an effort to promote solutions to these limitations, we organized the Robust Medical Instrument Segmentation (ROBUST-MIS) challenge as an international benchmarking competition with a specific focus on the robustness and generalization capabilities of algorithms. For the first time in the field of endoscopic image processing, our challenge included a task on binary segmentation and also addressed multi-instance detection and segmentation. The challenge was based on a surgical dataset comprising 10,040 annotated images acquired from 30 surgical procedures across three different types of surgery. The validation of the competing methods for the three tasks (binary segmentation, multi-instance detection, and multi-instance segmentation) was performed in three stages with an increasing domain gap between the training and test data. The results confirm the initial hypothesis that algorithm performance degrades with an increasing domain gap. While the average detection and segmentation quality of the best-performing algorithms is high, future research should concentrate on the detection and segmentation of small, crossing, moving, and transparent instruments and instrument parts.


Subjects
Image Processing, Computer-Assisted ; Laparoscopy ; Algorithms ; Artifacts
14.
Int J Comput Assist Radiol Surg ; 15(9): 1573-1584, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32588246

ABSTRACT

PURPOSE: Automatic surgical workflow recognition in video is a fundamental yet challenging problem for developing computer-assisted and robot-assisted surgery. Existing deep learning approaches have achieved remarkable performance on surgical video analysis; however, they rely heavily on large-scale labelled datasets. Unfortunately, annotations are rarely available in abundance, because they require the domain knowledge of surgeons, and even for experts, producing a sufficient number of annotations is tedious and time-consuming. METHODS: In this paper, we propose a novel active learning method for cost-effective surgical video analysis. Specifically, we propose a non-local recurrent convolutional network that introduces a non-local block to capture the long-range temporal dependency (LRTD) among continuous frames. We then formulate an intra-clip dependency score to represent the overall dependency within a clip. By ranking these scores across clips in the unlabelled data pool, we select the clips with weak dependencies for annotation, as these are the most informative for network training. RESULTS: We validate our approach on a large surgical video dataset (Cholec80) through the surgical workflow recognition task. Using our LRTD-based selection strategy, we outperform other state-of-the-art active learning methods that only consider neighbor-frame information. Using at most 50% of the samples, our approach exceeds the performance of full-data training. CONCLUSION: By modeling intra-clip dependency, our LRTD-based strategy shows a stronger capability to select informative video clips for annotation than other active learning methods, as evaluated on a popular public surgical dataset. The results also show the promising potential of our framework for reducing annotation workload in clinical practice.
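
One simple proxy for an intra-clip dependency score is the mean pairwise similarity of frame features within a clip, with the weakly dependent clips selected for annotation first. The paper derives its score from non-local block responses, so the cosine-similarity version below is an assumption for illustration only.

```python
import torch
import torch.nn.functional as F

def intra_clip_dependency(frame_feats):
    """Mean pairwise cosine similarity of a clip's frame features -- a
    simple proxy for an intra-clip dependency score (the paper's score
    comes from non-local block attention, not shown here).

    frame_feats: (T, D) one feature vector per frame.
    """
    f = F.normalize(frame_feats, dim=1)
    sim = f @ f.t()                                   # (T, T) cosine similarities
    t = sim.size(0)
    off_diag = sim[~torch.eye(t, dtype=torch.bool)]   # drop self-similarity
    return off_diag.mean().item()

clips = {name: torch.randn(16, 256) for name in ["clip_a", "clip_b", "clip_c"]}
scores = {name: intra_clip_dependency(f) for name, f in clips.items()}
# Annotate the clips with the weakest internal dependency first.
print(sorted(scores, key=scores.get))
```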


Subjects
Pattern Recognition, Automated ; Problem-Based Learning ; Robotic Surgical Procedures ; Surgery, Computer-Assisted/methods ; Workflow ; Algorithms ; Computer Simulation ; Humans ; Learning ; Models, Statistical ; Neural Networks, Computer ; Reproducibility of Results ; Surgeons ; Surgery, Computer-Assisted/instrumentation ; Video Recording
15.
Radiology ; 291(3): 677-686, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30912722

ABSTRACT

Background Nasopharyngeal carcinoma (NPC) may be cured with radiation therapy. Tumor proximity to critical structures demands accuracy in tumor delineation to avoid toxicities from radiation therapy; however, tumor target contouring for head and neck radiation therapy is labor intensive and highly variable among radiation oncologists. Purpose To construct and validate an artificial intelligence (AI) contouring tool to automate primary gross tumor volume (GTV) contouring in patients with NPC. Materials and Methods In this retrospective study, MRI data sets covering the nasopharynx from 1021 patients (median age, 47 years; 751 male, 270 female) with NPC were collected between September 2016 and September 2017 and divided into training, validation, and testing cohorts of 715, 103, and 203 patients, respectively. GTV contours were delineated for all 1021 patients and were defined by consensus of two experts. A three-dimensional convolutional neural network was applied to the 818 training and validation MRI data sets to construct the AI tool, which was tested on 203 independent MRI data sets. Next, the AI tool was compared against eight qualified radiation oncologists in a multicenter evaluation using a random sample of 20 test MRI examinations. The Wilcoxon matched-pairs signed rank test was used to compare the difference in Dice similarity coefficient (DSC) before versus after AI assistance. Results The AI-generated contours demonstrated a high level of accuracy compared with ground truth contours at testing in 203 patients (DSC, 0.79; 2.0-mm difference in average surface distance). In the multicenter evaluation, AI assistance improved contouring accuracy (five of eight oncologists had a higher median DSC after AI assistance; average median DSC, 0.74 vs 0.78; P < .001), reduced intra- and interobserver variation (by 36.4% and 54.5%, respectively), and reduced contouring time (by 39.4%). Conclusion The AI contouring tool improved primary gross tumor contouring accuracy for nasopharyngeal carcinoma, which could have a positive impact on tumor control and patient survival. © RSNA, 2019. Online supplemental material is available for this article. See also the editorial by Chang in this issue.
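
The study's headline metric, the Dice similarity coefficient (DSC), has a standard definition for binary masks, sketched below; the toy volumes are arbitrary examples.

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-9):
    """Dice similarity coefficient between two binary masks:
    2 * |A intersect B| / (|A| + |B|). This is the standard definition."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

a = np.zeros((64, 64, 64), dtype=np.uint8); a[16:48, 16:48, 16:48] = 1
b = np.zeros_like(a); b[20:48, 16:48, 16:48] = 1
print(round(dice_coefficient(a, b), 3))  # ~0.933 for these toy volumes
```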


Subjects
Deep Learning ; Image Interpretation, Computer-Assisted/methods ; Magnetic Resonance Imaging/methods ; Nasopharyngeal Carcinoma/diagnostic imaging ; Nasopharyngeal Neoplasms/diagnostic imaging ; Adolescent ; Adult ; Algorithms ; Female ; Humans ; Male ; Middle Aged ; Nasopharynx/diagnostic imaging ; Retrospective Studies ; Young Adult
16.
IEEE Trans Med Imaging ; 37(5): 1114-1126, 2018 May.
Article in English | MEDLINE | ID: mdl-29727275

ABSTRACT

We propose SV-RCNet, a novel recurrent convolutional network for automatic online workflow recognition from surgical videos, a key component in developing context-aware computer-assisted intervention systems. Unlike previous methods, which harness visual and temporal information separately, SV-RCNet seamlessly integrates a convolutional neural network (CNN) and a recurrent neural network (RNN) into a recurrent convolutional architecture that takes full advantage of the complementary visual and temporal features learned from surgical videos. We train SV-RCNet in an end-to-end manner so that the visual representations and sequential dynamics are jointly optimized during learning. To produce more discriminative spatio-temporal features, we exploit a deep residual network (ResNet) to extract visual features and a long short-term memory (LSTM) network to model temporal dependencies, and integrate both into SV-RCNet. Moreover, based on the phase-transition-sensitive predictions from SV-RCNet, we propose a simple yet effective inference scheme, prior knowledge inference (PKI), which leverages the natural characteristics of surgical video. This strategy further improves the consistency of the results and largely boosts recognition performance. Extensive experiments on the MICCAI 2016 Modeling and Monitoring of Computer Assisted Interventions Workflow Challenge dataset and the Cholec80 dataset validate SV-RCNet. Our approach not only achieves superior performance on these two datasets but also outperforms state-of-the-art methods by a significant margin.
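
The ResNet-plus-LSTM pairing described above maps onto a compact structural sketch: per-frame CNN features fed to an LSTM that emits per-frame phase logits. The backbone depth, hidden size, and phase count below are placeholders, and the paper's end-to-end training recipe and PKI inference scheme are not shown.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class RecurrentConvNet(nn.Module):
    """A small CNN+LSTM for frame-sequence phase recognition, echoing the
    SV-RCNet design (the paper uses a deeper ResNet and its own training
    recipe; this is a structural sketch only)."""
    def __init__(self, num_phases=7, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_phases)

    def forward(self, clips):                              # (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)   # (B*T, 512)
        seq, _ = self.lstm(feats.view(b, t, -1))           # (B, T, hidden)
        return self.head(seq)                              # per-frame phase logits

model = RecurrentConvNet()
print(model(torch.randn(2, 4, 3, 224, 224)).shape)  # torch.Size([2, 4, 7])
```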


Subjects
Image Processing, Computer-Assisted/methods ; Neural Networks, Computer ; Software ; Surgery, Video-Assisted/classification ; Algorithms ; Databases, Factual ; Humans ; Workflow