Results 1 - 20 of 73
1.
Br J Surg ; 111(6)2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38916133

ABSTRACT

Surgical technique is essential to ensure safe minimally invasive adrenalectomy. Owing to the relative rarity of adrenal surgery, it is challenging to ensure adequate exposure during surgical training. Surgical video analysis supports self-assessment and expert assessment, and could be a target for automation. The ontology developed in this work was validated by a European expert consensus and is applicable across the surgical techniques encountered in all participating centres, with an exemplary demonstration on bi-centric recordings. Standardization of adrenalectomy video analysis may foster surgical training and enable the training of machine learning models for automated safety alerts.


Subject(s)
Adrenalectomy , Delphi Technique , Laparoscopy , Machine Learning , Humans , Adrenalectomy/education , Adrenalectomy/methods , Laparoscopy/education , Laparoscopy/methods , Pilot Projects , Video Recording
2.
Article in English | MEDLINE | ID: mdl-38761319

ABSTRACT

PURPOSE: Most studies on surgical activity recognition utilizing artificial intelligence (AI) have focused mainly on recognizing one type of activity from small, mono-centric surgical video datasets. It therefore remains unclear whether those models would generalize to other centers. METHODS: In this work, we introduce a large multi-centric multi-activity dataset consisting of 140 surgical videos (MultiBypass140) of laparoscopic Roux-en-Y gastric bypass (LRYGB) surgeries performed at two medical centers: the University Hospital of Strasbourg, France (StrasBypass70) and Inselspital, Bern University Hospital, Switzerland (BernBypass70). The dataset has been fully annotated with phases and steps by two board-certified surgeons. Furthermore, we assess the generalizability of and benchmark different deep learning models for the task of phase and step recognition in 7 experimental studies: (1) training and evaluation on BernBypass70; (2) training and evaluation on StrasBypass70; (3) training and evaluation on the joint MultiBypass140 dataset; (4) training on BernBypass70, evaluation on StrasBypass70; (5) training on StrasBypass70, evaluation on BernBypass70; and training on MultiBypass140 with (6) evaluation on BernBypass70 and (7) evaluation on StrasBypass70. RESULTS: Model performance is markedly influenced by the training data. The worst results were obtained in experiments (4) and (5), confirming the limited generalization capabilities of models trained on mono-centric data. The use of multi-centric training data, experiments (6) and (7), improves the generalization capabilities of the models, bringing them beyond the level of independent mono-centric training and validation (experiments (1) and (2)). CONCLUSION: MultiBypass140 shows considerable variation in surgical technique and workflow of LRYGB procedures between centers. Accordingly, the generalization experiments demonstrate a marked difference in model performance. These results highlight the importance of multi-centric datasets for AI model generalization to account for variance in surgical technique and workflows. The dataset and code are publicly available at https://github.com/CAMMA-public/MultiBypass140.
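For readers who want to reproduce this kind of cross-center benchmark, the seven train/evaluate configurations can be expressed compactly as an experiment table. The sketch below is illustrative only and is not the authors' published code; train_model and evaluate are hypothetical placeholder callables.

```python
# Illustrative sketch of the seven train/evaluate configurations described above.
# `train_model` and `evaluate` are hypothetical placeholders, not the authors' API.

EXPERIMENTS = [
    # (experiment id, training set(s), evaluation set)
    (1, ["BernBypass70"], "BernBypass70"),
    (2, ["StrasBypass70"], "StrasBypass70"),
    (3, ["BernBypass70", "StrasBypass70"], "MultiBypass140"),
    (4, ["BernBypass70"], "StrasBypass70"),
    (5, ["StrasBypass70"], "BernBypass70"),
    (6, ["BernBypass70", "StrasBypass70"], "BernBypass70"),
    (7, ["BernBypass70", "StrasBypass70"], "StrasBypass70"),
]

def run_all(train_model, evaluate):
    """Run every configuration and collect phase/step recognition scores."""
    results = {}
    for exp_id, train_sets, eval_set in EXPERIMENTS:
        model = train_model(train_sets)           # hypothetical training call
        results[exp_id] = evaluate(model, eval_set)
    return results
```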

3.
Cir Esp (Engl Ed) ; 102 Suppl 1: S66-S71, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38704146

ABSTRACT

Artificial intelligence (AI) will power many of the tools in the armamentarium of digital surgeons. AI methods and surgical proof-of-concept studies are flourishing, but we have yet to witness clinical translation and value. Here we exemplify the potential of AI in the care pathway of colorectal cancer patients and discuss clinical, technical, and governance considerations of major importance for the safe translation of surgical AI for the benefit of our patients and practices.


Subject(s)
Artificial Intelligence , Colorectal Neoplasms , Humans , Colorectal Neoplasms/surgery
4.
Int J Comput Assist Radiol Surg ; 19(7): 1409-1417, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38780829

ABSTRACT

PURPOSE: The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with natural language capabilities is emerging as a necessity. Our work aims to advance visual question answering (VQA) in the surgical context with scene graph knowledge, addressing two main challenges in current surgical VQA systems: removing question-condition bias in the surgical VQA dataset and incorporating scene-aware reasoning in the surgical VQA model design. METHODS: First, we propose a surgical scene graph-based dataset, SSG-VQA, generated by employing segmentation and detection models on publicly available datasets. We build surgical scene graphs using spatial and action information of instruments and anatomies. These graphs are fed into a question engine, generating diverse QA pairs. We then propose SSG-VQA-Net, a novel surgical VQA model incorporating a lightweight Scene-embedded Interaction Module, which integrates geometric scene knowledge into the VQA model design by employing cross-attention between the textual and the scene features. RESULTS: Our comprehensive analysis shows that the SSG-VQA dataset provides a more complex, diverse, geometrically grounded, unbiased, and surgical action-oriented dataset compared with existing surgical VQA datasets, and that SSG-VQA-Net outperforms existing methods across different question types and complexities. We highlight that the primary limitation of current surgical VQA systems is the lack of scene knowledge needed to answer complex queries. CONCLUSION: We present a novel surgical VQA dataset and model and show that results can be significantly improved by incorporating geometric scene features into the VQA model design. We point out that the bottleneck of current surgical visual question-answering models lies in learning the encoded representation rather than decoding the sequence. Our SSG-VQA dataset provides a diagnostic benchmark to test the scene understanding and reasoning capabilities of the model. The source code and the dataset will be made publicly available at: https://github.com/CAMMA-public/SSG-VQA.
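The cross-attention between textual and scene features described above can be illustrated with a minimal PyTorch sketch. This is not the SSG-VQA-Net implementation; the embedding dimension, head count, and module name are assumptions.

```python
# Minimal sketch: question tokens attend over scene-graph node features.
# Dimensions and names are assumptions, not the SSG-VQA-Net code.
import torch
import torch.nn as nn

class SceneTextCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens: torch.Tensor, scene_nodes: torch.Tensor) -> torch.Tensor:
        # text_tokens:  (batch, n_tokens, dim) embedded question words
        # scene_nodes:  (batch, n_nodes,  dim) embedded scene-graph objects/relations
        attended, _ = self.attn(query=text_tokens, key=scene_nodes, value=scene_nodes)
        return self.norm(text_tokens + attended)   # residual fusion of scene knowledge

# Toy usage with random features.
fused = SceneTextCrossAttention()(torch.randn(2, 12, 256), torch.randn(2, 8, 256))
```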


Subject(s)
Operating Rooms , Humans , Surgery, Computer-Assisted/methods , Natural Language Processing , Video Recording
5.
Int J Comput Assist Radiol Surg ; 19(6): 1093-1101, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38573565

ABSTRACT

PURPOSE: In medical research, deep learning models rely on high-quality annotated data, a process that is often laborious and time-consuming. This is particularly true for detection tasks where bounding box annotations are required: the need to adjust two corners makes the process inherently frame-by-frame. Given the scarcity of experts' time, efficient annotation methods suitable for clinicians are needed. METHODS: We propose an on-the-fly method for live video annotation to enhance annotation efficiency. In this approach, a continuous single-point annotation is maintained by keeping the cursor on the object in a live video, mitigating the need for the tedious pausing and repetitive navigation inherent in traditional annotation methods. This novel annotation paradigm inherits the point annotation's ability to generate pseudo-labels using a point-to-box teacher model. We empirically evaluate this approach by developing a dataset and comparing on-the-fly annotation time against the traditional annotation method. RESULTS: Using our method, annotation was 3.2× faster than with the traditional annotation technique, and we achieved a mean improvement of 6.51 ± 0.98 AP@50 over the conventional method at equivalent annotation budgets on the developed dataset. CONCLUSION: Without bells and whistles, our approach offers a significant speed-up in annotation tasks. It can be easily implemented on any annotation platform to accelerate the integration of deep learning in video-based medical research.
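A rough sketch of the on-the-fly paradigm, assuming an OpenCV-based viewer: the cursor position is logged for every displayed frame while the video plays, and the recorded points would later be converted into pseudo bounding boxes by a point-to-box teacher model (not shown). The viewer loop and variable names are illustrative, not the authors' tool.

```python
# Conceptual sketch of on-the-fly single-point annotation: while the video plays,
# the current cursor position is logged per frame instead of pausing to draw boxes.
import cv2

points = {}  # frame index -> (x, y) cursor position kept on the object

def on_mouse(event, x, y, flags, frame_idx_holder):
    if event == cv2.EVENT_MOUSEMOVE:
        points[frame_idx_holder[0]] = (x, y)

def annotate(video_path: str):
    cap = cv2.VideoCapture(video_path)
    frame_idx_holder = [0]
    cv2.namedWindow("annotate")
    cv2.setMouseCallback("annotate", on_mouse, frame_idx_holder)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("annotate", frame)
        if cv2.waitKey(30) & 0xFF == ord("q"):   # roughly real-time playback
            break
        frame_idx_holder[0] += 1
    cap.release()
    cv2.destroyAllWindows()
    # The recorded points would then feed a point-to-box teacher model
    # to generate pseudo bounding-box labels (not shown here).
    return points
```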


Subject(s)
Deep Learning , Video Recording , Video Recording/methods , Humans , Data Curation/methods
6.
Int J Comput Assist Radiol Surg ; 19(6): 1243-1250, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38678488

ABSTRACT

PURPOSE: Advances in deep learning have resulted in effective models for surgical video analysis; however, these models often fail to generalize across medical centers due to domain shift caused by variations in surgical workflow, camera setups, and patient demographics. Recently, object-centric learning has emerged as a promising approach for improved surgical scene understanding, capturing and disentangling visual and semantic properties of surgical tools and anatomy to improve downstream task performance. In this work, we conduct a multicentric performance benchmark of object-centric approaches, focusing on critical view of safety assessment in laparoscopic cholecystectomy, then propose an improved approach for unseen domain generalization. METHODS: We evaluate four object-centric approaches for domain generalization, establishing baseline performance. Next, leveraging the disentangled nature of object-centric representations, we dissect one of these methods through a series of ablations (e.g., ignoring either visual or semantic features for downstream classification). Finally, based on the results of these ablations, we develop an optimized method specifically tailored for domain generalization, LG-DG, that includes a novel disentanglement loss function. RESULTS: Our optimized approach, LG-DG, achieves an improvement of 9.28% over the best baseline approach. More broadly, we show that object-centric approaches are highly effective for domain generalization thanks to their modular approach to representation learning. CONCLUSION: We investigate the use of object-centric methods for unseen domain generalization, identify method-agnostic factors critical for performance, and present an optimized approach that substantially outperforms existing methods.
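The paper's disentanglement loss is not reproduced here, but the general idea of discouraging the visual and semantic components of an object-centric representation from encoding the same information can be sketched with a generic cross-correlation penalty, shown below under that assumption.

```python
# Hedged illustration only: one generic way to penalise overlap between the
# "visual" and "semantic" parts of an object-centric embedding within a batch.
import torch

def cross_correlation_penalty(visual: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
    # visual, semantic: (batch, dim) features describing the same objects
    v = (visual - visual.mean(0)) / (visual.std(0) + 1e-6)
    s = (semantic - semantic.mean(0)) / (semantic.std(0) + 1e-6)
    corr = (v.T @ s) / v.shape[0]     # (dim, dim) cross-correlation matrix
    return (corr ** 2).mean()         # push cross-correlations towards zero

loss = cross_correlation_penalty(torch.randn(16, 64), torch.randn(16, 64))
```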


Subject(s)
Cholecystectomy, Laparoscopic , Humans , Cholecystectomy, Laparoscopic/methods , Video Recording , Deep Learning
7.
Br J Surg ; 111(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-37935636

ABSTRACT

The growing availability of surgical digital data and developments in analytics such as artificial intelligence (AI) are being harnessed to improve surgical care. However, technical and cultural barriers to real-time intraoperative AI assistance exist. This early-stage clinical evaluation shows the technical feasibility of concurrently deploying several AI models in operating rooms for real-time assistance during procedures. In addition, potentially relevant clinical applications of these AI models are explored with a multidisciplinary cohort of key stakeholders.


Subject(s)
Cholecystectomy, Laparoscopic , Humans , Artificial Intelligence
8.
IEEE Trans Med Imaging ; 43(3): 1247-1258, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37971921

ABSTRACT

Assessing the critical view of safety (CVS) in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. Prior works have approached this task by including semantic segmentation as an intermediate step, using predicted segmentation masks to then predict the CVS. While these methods are effective, they rely on extremely expensive ground-truth segmentation annotations and tend to fail when the predicted segmentation is incorrect, limiting generalization. In this work, we propose a method for CVS prediction wherein we first represent a surgical image using a disentangled latent scene graph, then process this representation using a graph neural network. Our graph representations explicitly encode semantic information - object location, class information, geometric relations - to improve anatomy-driven reasoning, as well as visual features to retain differentiability and thereby provide robustness to semantic errors. Finally, to address annotation cost, we propose to train our method using only bounding box annotations, incorporating an auxiliary image reconstruction objective to learn fine-grained object boundaries. We show that our method not only outperforms several baseline methods when trained with bounding box annotations, but also scales effectively when trained with segmentation masks, maintaining state-of-the-art performance.
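As a loose illustration of processing such a scene-graph representation with a graph neural network, the sketch below applies one simple message-passing layer over per-structure node features and an adjacency matrix of relation weights. The actual model, graph construction, and feature encoding are more elaborate and are not reproduced here.

```python
# Toy message-passing step over a latent scene graph; not the paper's architecture.
import torch
import torch.nn as nn

class SimpleGraphLayer(nn.Module):
    """Mix each node with its relation-weighted neighbours, then update."""
    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (n_nodes, dim) per-structure features (class, box geometry, visual)
        # adj:        (n_nodes, n_nodes) relation weights between structures
        neighbours = adj @ node_feats
        return torch.relu(self.update(torch.cat([node_feats, neighbours], dim=-1)))

# Toy usage: 4 detected structures with 32-dim features and a dense relation matrix.
out = SimpleGraphLayer(32)(torch.randn(4, 32), torch.softmax(torch.randn(4, 4), dim=-1))
```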


Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Semantics
9.
Surg Endosc ; 37(10): 7412-7424, 2023 10.
Article in English | MEDLINE | ID: mdl-37584774

ABSTRACT

BACKGROUND: Technical skill assessment in surgery relies on expert opinion. Therefore, it is time-consuming, costly, and often lacks objectivity. Analysis of intraoperative data by artificial intelligence (AI) has the potential for automated technical skill assessment. The aim of this systematic review was to analyze the performance, external validity, and generalizability of AI models for technical skill assessment in minimally invasive surgery. METHODS: A systematic search of Medline, Embase, Web of Science, and IEEE Xplore was performed to identify original articles reporting the use of AI in the assessment of technical skill in minimally invasive surgery. Risk of bias (RoB) and quality of the included studies were analyzed according to the Quality Assessment of Diagnostic Accuracy Studies criteria and the modified Joanna Briggs Institute checklists, respectively. Findings were reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement. RESULTS: In total, 1958 articles were identified; 50 articles met the eligibility criteria and were analyzed. Motion data extracted from surgical videos (n = 25) or kinematic data from robotic systems or sensors (n = 22) were the most frequent input data for AI. Most studies used deep learning (n = 34) and predicted technical skills using an ordinal assessment scale (n = 36) with good accuracy in simulated settings. However, all proposed models were in the development stage, only 4 studies were externally validated, and 8 showed a low RoB. CONCLUSION: AI showed good performance in technical skill assessment in minimally invasive surgery. However, models often lacked external validity and generalizability. Therefore, models should be benchmarked using predefined performance metrics and tested in clinical implementation studies.


Subject(s)
Artificial Intelligence , Minimally Invasive Surgical Procedures , Humans , Academies and Institutes , Benchmarking , Checklist
10.
Med Image Anal ; 89: 102888, 2023 10.
Article in English | MEDLINE | ID: mdl-37451133

ABSTRACT

Formalizing surgical activities as triplets of the instruments used, actions performed, and target anatomies is becoming a gold-standard approach to surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction, which can be used to develop better artificial intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly supervised bounding box localization of all visible surgical instruments (or tools), as the key actors, and the modeling of each tool's activity in the form of a triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges, their significance, and useful insights for future research directions and applications in surgery.


Subject(s)
Artificial Intelligence , Surgery, Computer-Assisted , Humans , Endoscopy , Algorithms , Surgery, Computer-Assisted/methods , Surgical Instruments
11.
Phys Med Biol ; 68(16)2023 07 31.
Article in English | MEDLINE | ID: mdl-37433326

ABSTRACT

Objective. Patient dose estimation in x-ray-guided interventions is essential to prevent radiation-induced biological side effects. Current dose monitoring systems estimate the skin dose based on dose metrics such as the reference air kerma. However, these approximations do not take into account the exact patient morphology and organ composition. Furthermore, accurate organ dose estimation has not been proposed for these procedures. Monte Carlo simulation can accurately estimate the dose by recreating the irradiation process generated during x-ray imaging, but at a high computation time, limiting intra-operative application. This work presents a fast deep convolutional neural network trained with Monte Carlo simulations for patient dose estimation during x-ray-guided interventions. Approach. We introduce a modified 3D U-Net that takes a patient's CT scan and the numerical values of the imaging settings as input to produce a Monte Carlo dose map. To create a dataset of dose maps, we simulated the x-ray irradiation process for the abdominal region using a publicly available dataset of 82 patient CT scans. The simulation involved varying the angulation, position, and tube voltage of the x-ray source for each scan. We additionally conducted a clinical study during endovascular abdominal aortic repairs to validate the reliability of our Monte Carlo simulation dose maps; dose measurements were taken at four specific anatomical points on the skin and compared with the corresponding simulated doses. The proposed network was trained using a 4-fold cross-validation approach with 65 patients, and its performance was evaluated on the remaining 17 patients during testing. Main results. The clinical validation demonstrated an average error of 5.1% at the anatomical points. The network yielded test errors of 11.5 ± 4.6% and 6.2 ± 1.5% for peak and average skin doses, respectively. Furthermore, the mean errors for the abdominal region and pancreas doses were 5.0 ± 1.4% and 13.1 ± 2.7%, respectively. Significance. Our network can accurately predict a personalized 3D dose map taking the current imaging settings into account. A short computation time was achieved, making our approach a potential solution for commercial dose monitoring and reporting systems.
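One plausible way to combine a CT volume with scalar imaging settings as network input, as described above, is to broadcast each scalar to a constant volume and concatenate it as an extra channel. The sketch below illustrates only this input construction; the actual architecture in the paper may condition on the settings differently.

```python
# Sketch of feeding scalar imaging settings (e.g. tube voltage, source angulation)
# into a 3D network together with the CT volume, by broadcasting each scalar to a
# constant volume and concatenating it as an extra input channel.
import torch

def build_network_input(ct_volume: torch.Tensor, settings: torch.Tensor) -> torch.Tensor:
    # ct_volume: (batch, 1, D, H, W); settings: (batch, n_settings)
    b, _, d, h, w = ct_volume.shape
    setting_channels = settings.view(b, -1, 1, 1, 1).expand(b, settings.shape[1], d, h, w)
    return torch.cat([ct_volume, setting_channels], dim=1)  # (batch, 1+n_settings, D, H, W)

# Toy usage with two synthetic patients and three settings each.
x = build_network_input(torch.randn(2, 1, 64, 64, 64),
                        torch.tensor([[80.0, 15.0, -5.0], [100.0, 0.0, 10.0]]))
```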


Subject(s)
Deep Learning , Humans , Radiation Dosage , X-Rays , Reproducibility of Results , Phantoms, Imaging , Monte Carlo Method
12.
Surg Endosc ; 37(11): 8690-8707, 2023 11.
Article in English | MEDLINE | ID: mdl-37516693

ABSTRACT

BACKGROUND: Surgery generates a vast amount of data from each procedure. Video data in particular provides significant value for surgical research, clinical outcome assessment, quality control, and education. The data lifecycle is influenced by various factors, including data structure, acquisition, storage, and sharing; data use and exploration; and finally data governance, which encompasses all ethical and legal regulations associated with the data. There is a universal need among stakeholders in surgical data science to establish standardized frameworks that address all aspects of this lifecycle to ensure data quality and purpose. METHODS: Working groups were formed among 48 representatives from academia and industry, including clinicians, computer scientists, and industry representatives. These working groups focused on: Data Use, Data Structure, Data Exploration, and Data Governance. After working group and panel discussions, a modified Delphi process was conducted. RESULTS: The resulting Delphi consensus provides conceptualized and structured recommendations for each domain related to surgical video data. We identified the key stakeholders within the data lifecycle and formulated comprehensive, easily understandable, and widely applicable guidelines for data utilization. Standardization of data structure should encompass format and quality, data sources, documentation, and metadata, and account for biases within the data. To foster scientific data exploration, datasets should reflect diversity and remain adaptable to future applications. Data governance must be transparent to all stakeholders, addressing the legal and ethical considerations surrounding the data. CONCLUSION: This consensus presents essential recommendations for the generation of standardized and diverse surgical video databanks, accounting for the multiple stakeholders involved in data generation and use throughout its lifecycle. Following the SAGES annotation framework, we lay the foundation for standardization of data use, structure, and exploration. A detailed exploration of requirements for adequate data governance will follow.


Subject(s)
Artificial Intelligence , Quality Improvement , Humans , Consensus , Data Collection
13.
Med Image Anal ; 88: 102866, 2023 08.
Article in English | MEDLINE | ID: mdl-37356320

ABSTRACT

Searching through large volumes of medical data to retrieve relevant information is a challenging yet crucial task for clinical care. However, the most common and primitive approach to retrieval, based on text keywords, is severely limited when dealing with complex media formats. Content-based retrieval offers a way to overcome this limitation by using rich media as the query itself. Surgical video-to-video retrieval in particular is a new and largely unexplored research problem with high clinical value, especially in the real-time case: using real-time video hashing, search can be performed directly inside the operating room. Indeed, the process of hashing converts large data entries into compact binary arrays, or hashes, enabling large-scale search operations at a very fast rate. However, due to fluctuations over the course of a video, not all bits in a given hash are equally reliable. In this work, we propose a method capable of mitigating this uncertainty while maintaining a light computational footprint. We present superior retrieval results (3%-4% top-10 mean average precision) on a multi-task evaluation protocol for surgery covering cholecystectomy phases, bypass phases, and, from an entirely new dataset introduced here, surgical events across six different surgery types. Success on this multi-task benchmark shows the generalizability of our approach for surgical video retrieval.
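The idea of down-weighting unreliable hash bits during search can be illustrated with a reliability-weighted Hamming distance. This is a generic sketch under that assumption, not the paper's exact formulation, and the reliability weights here are random placeholders.

```python
# Illustration of down-weighting unreliable bits when comparing binary video hashes.
import numpy as np

def weighted_hamming(query: np.ndarray, candidate: np.ndarray, reliability: np.ndarray) -> float:
    # query, candidate: binary hash vectors in {0, 1}; reliability: per-bit weights in [0, 1]
    mismatches = (query != candidate).astype(float)
    return float((mismatches * reliability).sum() / (reliability.sum() + 1e-9))

rng = np.random.default_rng(0)
q, c = rng.integers(0, 2, 64), rng.integers(0, 2, 64)
w = rng.random(64)   # e.g. bit stability estimated over the course of the video
print(weighted_hamming(q, c, w))
```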


Subject(s)
Algorithms , Laparoscopy , Humans , Cholecystectomy , Uncertainty
14.
Sci Rep ; 13(1): 9235, 2023 06 07.
Article in English | MEDLINE | ID: mdl-37286660

ABSTRACT

Surgical video analysis facilitates education and research. However, video recordings of endoscopic surgeries can contain privacy-sensitive information, especially if the endoscopic camera is moved out of the patient's body and out-of-body scenes are recorded. Identification of out-of-body scenes in endoscopic videos is therefore of major importance to preserve the privacy of patients and operating room staff. This study developed and validated a deep learning model for the identification of out-of-body images in endoscopic videos. The model was trained and evaluated on an internal dataset of 12 different types of laparoscopic and robotic surgeries and was externally validated on two independent multicentric test datasets of laparoscopic gastric bypass and cholecystectomy surgeries. Model performance was evaluated against human ground-truth annotations using the area under the receiver operating characteristic curve (ROC AUC). The internal dataset, consisting of 356,267 images from 48 videos, and the two multicentric test datasets, consisting of 54,385 and 58,349 images from 10 and 20 videos, respectively, were annotated. The model identified out-of-body images with 99.97% ROC AUC on the internal test dataset. The mean ± standard deviation ROC AUC was 99.94 ± 0.07% on the multicentric gastric bypass dataset and 99.71 ± 0.40% on the multicentric cholecystectomy dataset. The model can reliably identify out-of-body images in endoscopic videos and is publicly shared, facilitating privacy preservation in surgical video analysis.
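The reported evaluation reduces to scoring frame-level out-of-body probabilities against binary ground-truth labels with the ROC AUC, which can be sketched with scikit-learn as below; the data here are synthetic placeholders.

```python
# Minimal sketch of ROC AUC evaluation for frame-level out-of-body classification.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=1000)                          # 1 = out-of-body frame
scores = np.clip(labels + rng.normal(0, 0.3, size=1000), 0, 1)  # model probabilities
print(f"ROC AUC: {roc_auc_score(labels, scores):.4f}")
```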


Subject(s)
Deep Learning , Laparoscopy , Humans , Privacy , Video Recording , Cholecystectomy
15.
Med Image Anal ; 88: 102844, 2023 08.
Article in English | MEDLINE | ID: mdl-37270898

ABSTRACT

The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-supervised learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, allowing useful representations to be learned from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and largely unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding: phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL - up to 7.4% on phase recognition and 20% on tool presence detection - as well as gains of up to 14% over state-of-the-art semi-supervised phase recognition approaches. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties. The code is available at https://github.com/CAMMA-public/SelfSupSurg.


Subject(s)
Computers , Neural Networks, Computer , Humans , Supervised Machine Learning
16.
Int J Comput Assist Radiol Surg ; 18(6): 1053-1059, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37097518

ABSTRACT

PURPOSE: One of the recent advances in surgical AI is the recognition of surgical activities as triplets of ⟨instrument, verb, target⟩. Albeit providing detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting the temporal cues from earlier frames would improve the recognition of surgical action triplets from videos. METHODS: In this paper, we propose Rendezvous in Time (RiT), a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing more on the verbs, our RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition. RESULTS: We validate our proposal on the challenging surgical triplet dataset, CholecT45, demonstrating improved recognition of the verb and triplet, along with other interactions involving the verb such as ⟨instrument, verb⟩. Qualitative results show that RiT produces smoother predictions for most triplet instances than the state of the art. CONCLUSION: We present a novel attention-based approach that leverages the temporal fusion of video frames to model the evolution of surgical actions and exploit their benefits for surgical triplet recognition.

17.
IEEE Trans Med Imaging ; 42(9): 2592-2602, 2023 09.
Article in English | MEDLINE | ID: mdl-37030859

ABSTRACT

Automatic recognition of fine-grained surgical activities, called steps, is a challenging but crucial task for intelligent intra-operative computer assistance. The development of current vision-based activity recognition methods relies heavily on a high volume of manually annotated data. This data is difficult and time-consuming to generate and requires domain-specific knowledge. In this work, we propose to use coarser and easier-to-annotate activity labels, namely phases, as weak supervision to learn step recognition with fewer step annotated videos. We introduce a step-phase dependency loss to exploit the weak supervision signal. We then employ a Single-Stage Temporal Convolutional Network (SS-TCN) with a ResNet-50 backbone, trained in an end-to-end fashion from weakly annotated videos, for temporal activity segmentation and recognition. We extensively evaluate and show the effectiveness of the proposed method on a large video dataset consisting of 40 laparoscopic gastric bypass procedures and the public benchmark CATARACTS containing 50 cataract surgeries.
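One way to read the step-phase dependency idea is to map each fine-grained step to its parent phase and supervise the marginalized phase posterior with the cheaper phase label. The sketch below illustrates that reading; the paper's actual loss formulation may differ.

```python
# Hedged sketch: use phase labels as weak supervision for step recognition by
# marginalising step probabilities into phase probabilities via a step-to-phase map.
import torch
import torch.nn.functional as F

def phase_consistency_loss(step_logits, phase_labels, step_to_phase, n_phases):
    # step_logits: (batch, n_steps); phase_labels: (batch,)
    # step_to_phase: LongTensor (n_steps,) giving each step's parent phase index
    step_probs = step_logits.softmax(dim=-1)
    phase_probs = torch.zeros(step_logits.shape[0], n_phases, device=step_logits.device)
    phase_probs.index_add_(1, step_to_phase, step_probs)   # marginalise steps into phases
    return F.nll_loss(torch.log(phase_probs + 1e-8), phase_labels)

# Toy usage: 10 steps grouped into 3 phases.
step_to_phase = torch.tensor([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
loss = phase_consistency_loss(torch.randn(4, 10), torch.tensor([0, 1, 2, 1]), step_to_phase, n_phases=3)
```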


Subject(s)
Neural Networks, Computer , Surgery, Computer-Assisted
18.
Int J Comput Assist Radiol Surg ; 18(9): 1665-1672, 2023 Sep.
Article in English | MEDLINE | ID: mdl-36944845

ABSTRACT

PURPOSE: Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve the generalization of these methods. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges as they are composed of multiple, interconnected, and long-duration activities. METHODS: This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos, that treats each video as an assembly of temporal segments and applies consistent but random transformations to each segment. The proposed augmentation method is used to train an end-to-end spatiotemporal model consisting of a CNN (ResNet50) followed by a TCN. RESULTS: The effectiveness of the proposed method is demonstrated on two surgical video datasets, namely Bypass40 and CATARACTS, and two tasks, surgical phase and step recognition. TRandAugment adds a performance boost of 1-6% over previous state-of-the-art methods, which use manually designed augmentations. CONCLUSION: This work presents a simplified and automated augmentation method for long surgical videos. The proposed method has been validated on different datasets and tasks, indicating the importance of devising temporal augmentation methods for long surgical videos.
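The core mechanism, applying one consistently sampled random transform per temporal segment, can be sketched in a few lines; the transform pool, segment count, and function name below are placeholders rather than the TRandAugment implementation.

```python
# Conceptual sketch of segment-wise consistent augmentation for long videos:
# split the frame sequence into temporal segments and apply the *same* randomly
# sampled transform to every frame within a segment.
import random

def segmentwise_augment(frames, transforms, n_segments=4):
    # frames: list of frames; transforms: list of callables frame -> frame
    segment_len = max(1, len(frames) // n_segments)
    augmented = []
    for start in range(0, len(frames), segment_len):
        t = random.choice(transforms)             # one random transform per segment
        augmented.extend(t(f) for f in frames[start:start + segment_len])
    return augmented

# Example with trivial "frames" (integers) and toy transforms:
print(segmentwise_augment(list(range(8)), [lambda x: x, lambda x: -x, lambda x: x * 10]))
```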


Subject(s)
Cataract Extraction , Neural Networks, Computer , Humans , Algorithms , Cataract Extraction/methods
19.
Med Image Anal ; 86: 102770, 2023 05.
Article in English | MEDLINE | ID: mdl-36889206

ABSTRACT

PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance, or improve training of surgeons via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open-data, single-center video dataset. In this work we investigated the generalizability of phase recognition algorithms in a multicenter setting, including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 h was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 international Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 research teams trained and submitted their machine learning algorithms for recognition of phase, action, instrument, and/or skill assessment. RESULTS: F1-scores between 23.9% and 67.7% were achieved for phase recognition (n = 9 teams) and between 38.5% and 63.8% for instrument presence detection (n = 8 teams), but only between 21.8% and 23.3% for action recognition (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies to support the surgical team, but there is still room for improvement, as shown by our comparison of machine learning algorithms. This novel HeiChole benchmark can be used for comparable evaluation and validation of future work. In future studies, it is of utmost importance to create more open, high-quality datasets in order to allow the development of artificial intelligence and cognitive robotics in surgery.


Subject(s)
Artificial Intelligence , Benchmarking , Humans , Workflow , Algorithms , Machine Learning
20.
Biomedicines ; 11(1)2023 Jan 12.
Article in English | MEDLINE | ID: mdl-36672702

ABSTRACT

The aim of this work was to compare the classification of cardiac MR images of AL versus ATTR amyloidosis by neural networks and by experienced human readers. Cine-MR images and late gadolinium enhancement (LGE) images of 120 patients were studied (70 AL and 50 ATTR). A VGG16 convolutional neural network (CNN) was trained with a 5-fold cross-validation process, taking care to strictly assign all images of a given patient to either the training group or the test group. The analysis was performed at the patient level by averaging the predictions obtained for each image. The classification accuracy obtained between AL and ATTR amyloidosis was 0.750 for cine-CNN, 0.611 for gado-CNN, and between 0.617 and 0.675 for human readers. The corresponding AUC of the ROC curve was 0.839 for cine-CNN, 0.679 for gado-CNN (p < 0.004 vs. cine) and 0.714 for the best human reader (p < 0.007 vs. cine). Logistic regression with cine-CNN and gado-CNN, as well as analyses focused on specific orientation planes, did not change the overall results. We conclude that cine-CNN provides significantly better discrimination between AL and ATTR amyloidosis than gado-CNN or human readers, but with lower performance than reported in studies where visual diagnosis is easy, and is currently suboptimal for clinical practice.
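The two safeguards described, patient-level splitting and patient-level averaging of image predictions, can be sketched with scikit-learn's GroupKFold on synthetic data, as below; the variable names and data are placeholders.

```python
# Sketch of (1) grouping folds by patient so a patient's images never appear in both
# training and test splits, and (2) averaging per-image predictions per patient.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_images, n_patients = 600, 120
patient_ids = rng.integers(0, n_patients, size=n_images)   # image -> patient
image_predictions = rng.random(n_images)                    # per-image model outputs

for fold, (train_idx, test_idx) in enumerate(
        GroupKFold(n_splits=5).split(np.zeros(n_images), groups=patient_ids)):
    # Average the image-level predictions of each test patient.
    test_patients = np.unique(patient_ids[test_idx])
    patient_scores = {p: image_predictions[test_idx][patient_ids[test_idx] == p].mean()
                      for p in test_patients}
    print(f"fold {fold}: {len(test_patients)} test patients")
```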
