Results 1 - 20 of 73
1.
Surg Endosc ; 37(10): 7412-7424, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37584774

ABSTRACT

BACKGROUND: Technical skill assessment in surgery relies on expert opinion. Therefore, it is time-consuming, costly, and often lacks objectivity. Analysis of intraoperative data by artificial intelligence (AI) has the potential for automated technical skill assessment. The aim of this systematic review was to analyze the performance, external validity, and generalizability of AI models for technical skill assessment in minimally invasive surgery. METHODS: A systematic search of Medline, Embase, Web of Science, and IEEE Xplore was performed to identify original articles reporting the use of AI in the assessment of technical skill in minimally invasive surgery. Risk of bias (RoB) and quality of the included studies were analyzed according to Quality Assessment of Diagnostic Accuracy Studies criteria and the modified Joanna Briggs Institute checklists, respectively. Findings were reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement. RESULTS: In total, 1958 articles were identified; 50 met the eligibility criteria and were analyzed. Motion data extracted from surgical videos (n = 25) or kinematic data from robotic systems or sensors (n = 22) were the most frequent input data for AI. Most studies used deep learning (n = 34) and predicted technical skills using an ordinal assessment scale (n = 36) with good accuracies in simulated settings. However, all proposed models were at the development stage, only 4 studies were externally validated, and 8 showed a low RoB. CONCLUSION: AI showed good performance in technical skill assessment in minimally invasive surgery. However, models often lacked external validity and generalizability. Therefore, models should be benchmarked using predefined performance metrics and tested in clinical implementation studies.


Subjects
Artificial Intelligence, Minimally Invasive Surgical Procedures, Humans, Academies and Institutes, Benchmarking, Checklist
2.
Surg Endosc ; 37(3): 2070-2077, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36289088

ABSTRACT

BACKGROUND: Phase and step annotation in surgical videos is a prerequisite for surgical scene understanding and for downstream tasks like intraoperative feedback or assistance. However, most ontologies are applied on small monocentric datasets and lack external validation. To overcome these limitations, an ontology for phases and steps of laparoscopic Roux-en-Y gastric bypass (LRYGB) is proposed and validated on a multicentric dataset in terms of inter- and intra-rater reliability (inter-/intra-RR). METHODS: The proposed LRYGB ontology consists of 12 phase and 46 step definitions that are hierarchically structured. Two board-certified surgeons (raters) with > 10 years of clinical experience applied the proposed ontology on two datasets: (1) StraBypass40 consists of 40 LRYGB videos from Nouvel Hôpital Civil, Strasbourg, France, and (2) BernBypass70 consists of 70 LRYGB videos from Inselspital, Bern University Hospital, Bern, Switzerland. To assess inter-RR, the two raters' annotations of ten randomly chosen videos from each of StraBypass40 and BernBypass70 were compared. To assess intra-RR, ten randomly chosen videos were annotated twice by the same rater and the annotations were compared. Inter-RR was calculated using Cohen's kappa. Additionally, accuracy, precision, recall, F1-score, and application-dependent metrics were computed for both inter- and intra-RR. RESULTS: The mean ± SD video duration was 108 ± 33 min and 75 ± 21 min in StraBypass40 and BernBypass70, respectively. The proposed ontology shows an inter-RR of 96.8 ± 2.7% for phases and 85.4 ± 6.0% for steps on StraBypass40 and 94.9 ± 5.8% for phases and 76.1 ± 13.9% for steps on BernBypass70. The overall Cohen's kappa of inter-RR was 95.9 ± 4.3% for phases and 80.8 ± 10.0% for steps. Intra-RR showed an accuracy of 98.4 ± 1.1% for phases and 88.1 ± 8.1% for steps. CONCLUSION: The proposed ontology shows excellent inter- and intra-RR and should therefore be implemented routinely in phase and step annotation of LRYGB.


Subjects
Gastric Bypass, Laparoscopy, Morbid Obesity, Humans, Morbid Obesity/surgery, Reproducibility of Results, Treatment Outcome, Postoperative Complications/surgery
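
To make the reliability computation in the record above concrete, here is a minimal sketch of frame-level inter-rater agreement using Cohen's kappa; the phase labels and one-label-per-frame sampling are illustrative assumptions, not the study's actual protocol:

```python
# Sketch: frame-level inter-rater agreement for phase annotations.
# Labels and the frame sampling are illustrative assumptions.
from sklearn.metrics import accuracy_score, cohen_kappa_score

# One phase label per sampled frame from each rater.
rater_a = ["preparation", "dissection", "dissection", "clipping", "clipping"]
rater_b = ["preparation", "dissection", "clipping", "clipping", "clipping"]

kappa = cohen_kappa_score(rater_a, rater_b)  # chance-corrected agreement
acc = accuracy_score(rater_a, rater_b)       # raw frame-level agreement
print(f"Cohen's kappa: {kappa:.2f}, accuracy: {acc:.2%}")
```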
3.
Surg Endosc ; 37(11): 8690-8707, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37516693

ABSTRACT

BACKGROUND: Surgery generates a vast amount of data from each procedure. Video data in particular provide significant value for surgical research, clinical outcome assessment, quality control, and education. The data lifecycle is influenced by various factors, including data structure; acquisition, storage, and sharing; use and exploration; and data governance, which encompasses all ethical and legal regulations associated with the data. There is a universal need among stakeholders in surgical data science to establish standardized frameworks that address all aspects of this lifecycle to ensure data quality and purpose. METHODS: Working groups were formed among 48 representatives from academia and industry, including clinicians, computer scientists, and industry representatives. The working groups focused on: Data Use, Data Structure, Data Exploration, and Data Governance. After working group and panel discussions, a modified Delphi process was conducted. RESULTS: The resulting Delphi consensus provides conceptualized and structured recommendations for each domain related to surgical video data. We identified the key stakeholders within the data lifecycle and formulated comprehensive, easily understandable, and widely applicable guidelines for data utilization. Standardization of data structure should encompass format and quality, data sources, documentation, metadata, and account for biases within the data. To foster scientific data exploration, datasets should reflect diversity and remain adaptable to future applications. Data governance must be transparent to all stakeholders, addressing legal and ethical considerations surrounding the data. CONCLUSION: This consensus presents essential recommendations around the generation of standardized and diverse surgical video databanks, accounting for the multiple stakeholders involved in data generation and use throughout its lifecycle. Following the SAGES annotation framework, we lay the foundation for standardization of data use, structure, and exploration. A detailed exploration of requirements for adequate data governance will follow.


Subjects
Artificial Intelligence, Quality Improvement, Humans, Consensus, Data Collection
4.
Ann Surg ; 275(5): 955-961, 2022 May 01.
Article in English | MEDLINE | ID: mdl-33201104

ABSTRACT

OBJECTIVE: To develop a deep learning model to automatically segment hepatocystic anatomy and assess the criteria defining the critical view of safety (CVS) in laparoscopic cholecystectomy (LC). BACKGROUND: Poor implementation and subjective interpretation of CVS contribute to the stable rates of bile duct injuries in LC. As CVS is assessed visually, this task can be automated by using computer vision, an area of artificial intelligence aimed at interpreting images. METHODS: Still images from LC videos were annotated with CVS criteria and hepatocystic anatomy segmentation. A deep neural network comprising a segmentation model to highlight hepatocystic anatomy and a classification model to predict CVS criteria achievement was trained and tested using 5-fold cross-validation. Intersection over union, average precision, and balanced accuracy were computed to evaluate the model performance versus the annotated ground truth. RESULTS: A total of 2854 images from 201 LC videos were annotated and 402 images were further segmented. Mean intersection over union for segmentation was 66.6%. The model assessed the achievement of CVS criteria with a mean average precision and balanced accuracy of 71.9% and 71.4%, respectively. CONCLUSIONS: Deep learning algorithms can be trained to reliably segment hepatocystic anatomy and assess CVS criteria in still laparoscopic images. Surgical-technical partnerships should be encouraged to develop and evaluate deep learning models to improve surgical safety.


Subjects
Bile Duct Diseases, Laparoscopic Cholecystectomy, Deep Learning, Artificial Intelligence, Laparoscopic Cholecystectomy/methods, Humans, Video Recording
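
The two headline metrics in the record above, intersection over union for segmentation and balanced accuracy for CVS criteria classification, follow standard formulas; a small sketch with toy arrays (not study data):

```python
# Sketch: the evaluation metrics named in the abstract, on toy data.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union between two binary segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 1.0

pred = np.array([[1, 1], [0, 1]], dtype=bool)
gt = np.array([[1, 0], [0, 1]], dtype=bool)
print(mask_iou(pred, gt))  # 2 overlapping pixels / 3 in the union

# Balanced accuracy for a per-criterion CVS classifier (toy labels).
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0]
print(balanced_accuracy_score(y_true, y_pred))
```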
5.
Surg Endosc ; 36(11): 8379-8386, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35171336

ABSTRACT

BACKGROUND: A computer vision (CV) platform named EndoDigest was recently developed to facilitate the use of surgical videos. Specifically, EndoDigest automatically provides short video clips to effectively document the critical view of safety (CVS) in laparoscopic cholecystectomy (LC). The aim of the present study is to validate EndoDigest on a multicentric dataset of LC videos. METHODS: LC videos from 4 centers were manually annotated with the time of the cystic duct division and an assessment of CVS criteria. Incomplete recordings, bailout procedures, and procedures with an intraoperative cholangiogram were excluded. EndoDigest leveraged predictions of deep learning models for workflow analysis in a rule-based inference system designed to estimate the time of the cystic duct division. Performance was assessed by computing the error in estimating the manually annotated time of the cystic duct division. To provide concise video documentation of CVS, EndoDigest extracted video clips showing the 2 min preceding and the 30 s following the predicted cystic duct division. The relevance of the documentation was evaluated by assessing CVS in the automatically extracted 2.5-min-long video clips. RESULTS: 144 of the 174 LC videos from 4 centers were analyzed. EndoDigest located the time of the cystic duct division with a mean error of 124.0 ± 270.6 s despite the use of fluorescent cholangiography in 27 procedures and great variations in surgical workflows across centers. The surgical evaluation found that 108 (75.0%) of the automatically extracted short video clips documented CVS effectively. CONCLUSIONS: EndoDigest was robust enough to reliably locate the time of the cystic duct division and to provide efficient video documentation of CVS despite the highly variable workflows. Training specifically on data from each center could improve results; however, this multicentric validation shows the potential for clinical translation of this surgical data science tool to efficiently document surgical safety.


Subjects
Laparoscopic Cholecystectomy, Humans, Laparoscopic Cholecystectomy/methods, Video Recording, Cholangiography, Documentation, Computers
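
The clip extraction step described in the record above (2 min before to 30 s after the predicted cystic duct division) amounts to trimming around a predicted timestamp; the ffmpeg invocation and file names below are assumptions for illustration, not EndoDigest's implementation:

```python
# Sketch: trim the 2 min before and 30 s after a predicted event time.
# The ffmpeg call and file names are illustrative assumptions.
import subprocess

def extract_cvs_clip(video: str, division_time_s: float, out: str) -> None:
    start = max(0.0, division_time_s - 120.0)  # 2 min before the event
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-i", video,
         "-t", "150", "-c", "copy", out],  # 150 s = 2.5-min clip
        check=True,
    )

extract_cvs_clip("lc_case_042.mp4", division_time_s=1830.0, out="cvs_042.mp4")
```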
6.
Ann Surg ; 274(1): e93-e95, 2021 Jul 01.
Article in English | MEDLINE | ID: mdl-33417329

ABSTRACT

OBJECTIVE: The aim of this study was to develop a computer vision platform to automatically locate critical events in surgical videos and provide short video clips documenting the critical view of safety (CVS) in laparoscopic cholecystectomy (LC). BACKGROUND: Intraoperative events are typically documented through operator-dictated reports that do not always translate the operative reality. Surgical videos provide complete information on surgical procedures, but the burden associated with storing and manually analyzing full-length videos has so far limited their effective use. METHODS: A computer vision platform named EndoDigest was developed and used to analyze LC videos. The mean absolute error (MAE) of the platform in automatically locating the manually annotated time of the cystic duct division in full-length videos was assessed. The relevance of the automatically extracted short video clips was evaluated by calculating the percentage of video clips in which the CVS was assessable by surgeons. RESULTS: A total of 155 LC videos were analyzed: 55 of these videos were used to develop EndoDigest, whereas the remaining 100 were used to test it. The time of the cystic duct division was automatically located with a MAE of 62.8 ± 130.4 seconds (1.95% of full-length video duration). CVS was assessable in 91% of the 2.5-minute-long video clips automatically extracted from the considered test procedures. CONCLUSIONS: Deep learning models for workflow analysis can be used to reliably locate critical events in surgical videos and document CVS in LC. Further studies are needed to assess the clinical impact of surgical data science solutions for safer laparoscopic cholecystectomy.


Subjects
Laparoscopic Cholecystectomy/standards, Documentation/methods, Computer-Assisted Image Processing/methods, Patient Safety/standards, Quality Improvement, Video Recording, Algorithms, Clinical Competence, Deep Learning, Humans, Workflow
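
The mean absolute error reported in the record above compares predicted and manually annotated event times; a minimal sketch with toy timestamps:

```python
# Sketch: mean absolute error between predicted and annotated event
# times, as reported above. Timestamps are toy values in seconds.
import numpy as np

predicted = np.array([1830.0, 2410.0, 1675.0])  # model estimates (s)
annotated = np.array([1795.0, 2500.0, 1660.0])  # ground-truth times (s)

errors = np.abs(predicted - annotated)
print(f"MAE: {errors.mean():.1f} ± {errors.std():.1f} s")
```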
7.
J Surg Oncol ; 124(2): 221-230, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34245578

ABSTRACT

Surgical data science (SDS) aims to improve the quality of interventional healthcare and its value through the capture, organization, analysis, and modeling of procedural data. As data capture has increased and artificial intelligence (AI) has advanced, SDS can help to unlock augmented and automated coaching, feedback, assessment, and decision support in surgery. We review major concepts in SDS and AI as applied to surgical education and surgical oncology.


Subjects
Artificial Intelligence, Data Science, Graduate Medical Education/methods, Surgical Oncology/education, Clinical Competence, Clinical Decision Support Systems, Europe, Humans, North America, Operative Surgical Procedures/education, Operative Surgical Procedures/methods
8.
Surg Endosc ; 35(9): 4918-4929, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34231065

ABSTRACT

BACKGROUND: The growing interest in analysis of surgical video through machine learning has led to increased research efforts; however, common methods of annotating video data are lacking. There is a need to establish recommendations on the annotation of surgical video data to enable assessment of algorithms and multi-institutional collaboration. METHODS: Four working groups were formed from a pool of participants that included clinicians, engineers, and data scientists. The working groups were focused on four themes: (1) temporal models, (2) actions and tasks, (3) tissue characteristics and general anatomy, and (4) software and data structure. A modified Delphi process was utilized to create a consensus survey based on suggested recommendations from each of the working groups. RESULTS: After three Delphi rounds, consensus was reached on recommendations for annotation within each of these domains. A hierarchy for annotation of temporal events in surgery was established. CONCLUSIONS: While additional work remains to achieve accepted standards for video annotation in surgery, the consensus recommendations on a general framework for annotation presented here lay the foundation for standardization. This type of framework is critical to enabling diverse datasets, performance benchmarks, and collaboration.


Subjects
Machine Learning, Consensus, Delphi Technique, Humans, Surveys and Questionnaires
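
One way to encode the hierarchy of temporal events mentioned in the conclusion above (e.g., phases containing steps) is a nested record per event; the field names below are assumptions, not the published consensus schema:

```python
# Sketch: a nested record for hierarchical temporal annotation.
# Field names are assumptions, not the consensus recommendations.
from dataclasses import dataclass, field

@dataclass
class TemporalEvent:
    label: str
    start_s: float  # onset, seconds from video start
    end_s: float    # offset, seconds from video start
    children: list["TemporalEvent"] = field(default_factory=list)

phase = TemporalEvent("calot_triangle_dissection", 300.0, 900.0, children=[
    TemporalEvent("peritoneum_incision", 300.0, 420.0),
    TemporalEvent("cystic_duct_exposure", 420.0, 900.0),
])
```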
9.
Proc IEEE Inst Electr Electron Eng ; 108(1): 198-214, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31920208

ABSTRACT

Data-driven computational approaches have evolved to enable extraction of information from medical images with a reliability, accuracy and speed which is already transforming their interpretation and exploitation in clinical practice. While similar benefits are longed for in the field of interventional imaging, this ambition is challenged by a much higher heterogeneity. Clinical workflows within interventional suites and operating theatres are extremely complex and typically rely on poorly integrated intra-operative devices, sensors, and support infrastructures. Taking stock of some of the most exciting developments in machine learning and artificial intelligence for computer assisted interventions, we highlight the crucial need to take context and human factors into account in order to address these challenges. Contextual artificial intelligence for computer assisted intervention, or CAI4CAI, arises as an emerging opportunity feeding into the broader field of surgical data science. Central challenges being addressed in CAI4CAI include how to integrate the ensemble of prior knowledge and instantaneous sensory information from experts, sensors and actuators; how to create and communicate a faithful and actionable shared representation of the surgery among a mixed human-AI actor team; how to design interventional systems and associated cognitive shared control schemes for online uncertainty-aware collaborative decision making ultimately producing more precise and reliable interventions.

10.
Surg Endosc ; 34(6): 2709-2714, 2020 Jun.
Article in English | MEDLINE | ID: mdl-31583466

ABSTRACT

BACKGROUND: In laparoscopic cholecystectomy (LC), achievement of the Critical View of Safety (CVS) is commonly advocated to prevent bile duct injuries (BDI). However, BDI rates remain stable, probably due to inconsistent application or a poor understanding of CVS, as well as unreliable reporting. Objective video reporting could serve for quality auditing and help generate consistent datasets for deep learning models aimed at intraoperative assistance. In this study, we develop and test a method to report CVS using videos. METHODS: Videos of LCs performed at our institution were retrieved, and the video segments starting 60 s prior to the division of cystic structures were edited. Two independent reviewers assessed CVS using an adaptation of the doublet view 6-point scale and a novel binary method in which each criterion is considered either achieved or not. Feasibility to assess CVS in the edited video clips and inter-rater agreements were evaluated. RESULTS: CVS was attempted in 78 out of the 100 LC videos retrieved. CVS was assessable in 100% of the 60-s video clips. After mediation, CVS was achieved in 32/78 (41.03%). Kappa scores of inter-rater agreements using the doublet view versus the binary assessment were as follows: 0.54 versus 0.75 for CVS achievement, 0.45 versus 0.62 for the dissection of the hepatocystic triangle, 0.36 versus 0.77 for the exposure of the lower part of the cystic plate, and 0.48 versus 0.79 for the 2 structures connected to the gallbladder. CONCLUSIONS: The present study is the first to formalize a reproducible method for objective video reporting of CVS in LC. Minute-long video clips provide information on CVS, and binary assessment yields a higher inter-rater agreement than previously used methods. These results offer an easy-to-implement strategy for objective video reporting of CVS, which could be used for quality auditing, scientific communication, and the development of deep learning models for intraoperative guidance.


Subjects
Artificial Intelligence/standards, Laparoscopic Cholecystectomy/methods, Video Recording/methods, Female, Humans, Male
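
The binary method described in the record above marks each CVS criterion as achieved or not, with overall CVS achievement requiring all three; a minimal sketch, with criterion names paraphrasing the abstract:

```python
# Sketch: binary assessment of the three CVS criteria; CVS is
# achieved only if every criterion is. Names paraphrase the abstract.
def cvs_achieved(hepatocystic_triangle_dissected: bool,
                 lower_cystic_plate_exposed: bool,
                 two_structures_to_gallbladder: bool) -> bool:
    return (hepatocystic_triangle_dissected
            and lower_cystic_plate_exposed
            and two_structures_to_gallbladder)

print(cvs_achieved(True, True, False))  # False: CVS not achieved
```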
11.
Br J Surg ; 111(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-37935636

ABSTRACT

The growing availability of surgical digital data and developments in analytics such as artificial intelligence (AI) are being harnessed to improve surgical care. However, technical and cultural barriers to real-time intraoperative AI assistance exist. This early-stage clinical evaluation shows the technical feasibility of concurrently deploying several AIs in operating rooms for real-time assistance during procedures. In addition, potentially relevant clinical applications of these AI models are explored with a multidisciplinary cohort of key stakeholders.


Subjects
Laparoscopic Cholecystectomy, Humans, Artificial Intelligence
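
Concurrent deployment of several models on one video stream, as evaluated in the record above, is commonly handled with one worker per model fed from per-model queues; this is a generic sketch with placeholder models, not the study's deployment code:

```python
# Sketch: several placeholder "models" consuming the same frame
# stream concurrently, one worker thread per model.
import queue
import threading

def run_model(name, predict, frames):
    while (frame := frames.get()) is not None:  # None stops the worker
        print(name, predict(frame))

models = {"phase": lambda f: "dissection", "cvs": lambda f: [0, 1, 0]}
queues = {name: queue.Queue(maxsize=1) for name in models}
workers = [threading.Thread(target=run_model, args=(n, m, queues[n]))
           for n, m in models.items()]
for w in workers:
    w.start()

for frame in ["frame0", "frame1"]:  # stand-in for live video frames
    for q in queues.values():
        q.put(frame)
for q in queues.values():
    q.put(None)
for w in workers:
    w.join()
```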
12.
Br J Surg ; 111(6)2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38916133

ABSTRACT

Surgical technique is essential to ensure safe minimally invasive adrenalectomy. Due to the relative rarity of adrenal surgery, it is challenging to ensure adequate exposure in surgical training. Surgical video analysis supports self-assessment and expert assessment, and could be a target for automation. The developed ontology was validated by a European expert consensus and is applicable across the surgical techniques encountered in all participating centres, with an exemplary demonstration in bi-centric recordings. Standardization of adrenalectomy video analysis may foster surgical training and enable machine learning training for automated safety alerts.


Subjects
Adrenalectomy, Delphi Technique, Laparoscopy, Machine Learning, Humans, Adrenalectomy/education, Adrenalectomy/methods, Laparoscopy/education, Laparoscopy/methods, Pilot Projects, Video Recording
13.
Minim Invasive Ther Allied Technol ; 28(2): 82-90, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30849261

ABSTRACT

Recent years have seen tremendous progress in artificial intelligence (AI), such as with the automatic and real-time recognition of objects and activities in videos in the field of computer vision. Due to its increasing digitalization, the operating room (OR) promises to directly benefit from this progress in the form of new assistance tools that can enhance the abilities and performance of surgical teams. Key for such tools is the recognition of the surgical workflow, because efficient assistance by an AI system requires this system to be aware of the surgical context, namely of all activities taking place inside the operating room. We present here how several recent techniques relying on machine and deep learning can be used to analyze the activities taking place during surgery, using videos captured from either endoscopic or ceiling-mounted cameras. We also present two potential clinical applications that we are developing at the University of Strasbourg with our clinical partners.


Subjects
Operative Surgical Procedures, Task Performance and Analysis, Workflow, Algorithms, Artificial Intelligence, Deep Learning, Humans, Inventions, Machine Learning
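
A common workflow-recognition pattern in the literature this review covers is a per-frame CNN feature extractor followed by a temporal model; a minimal sketch, with the backbone, dimensions, and 7-phase output chosen for illustration:

```python
# Sketch: per-frame CNN features pooled over time by an LSTM head.
# Backbone choice, sizes, and the 7-phase output are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PhaseRecognizer(nn.Module):
    def __init__(self, n_phases: int = 7):
        super().__init__()
        self.backbone = resnet18(weights=None)
        self.backbone.fc = nn.Identity()  # yields 512-d frame features
        self.temporal = nn.LSTM(512, 256, batch_first=True)
        self.head = nn.Linear(256, n_phases)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = clips.shape
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, 512)
        out, _ = self.temporal(feats)
        return self.head(out)  # per-frame phase logits

logits = PhaseRecognizer()(torch.randn(1, 8, 3, 224, 224))  # (1, 8, 7)
```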
14.
Bioelectromagnetics ; 39(7): 503-515, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30307039

ABSTRACT

This paper tackles the problem of estimating exposure to static magnetic field (SMF) in magnetic resonance imaging (MRI) sites using a non-invasive approach. The proposed approach relies on a vision-based system to detect people's body parts and on a mathematical model to compute SMF exposure. A multi-view camera system was used to capture the MRI room, and a vision-based system was applied to detect body parts. The detected localization was then fed into a mathematical model to compute SMF exposure. In this study, we focused on exposure at the neck for two main reasons. First, according to regulations, the limit of exposure at the head and trunk for MR workers is higher than that for the general public. Second, it was easier to attach a dosimeter at the neck to perform measurements, which allowed a quantitative evaluation of our approach. This approach was applied to two scenarios simulating the daily movements of medical workers, for which dosimeter measurements were also recorded. The results indicated that the proposed approach predicted occupational SMF exposure with reasonable accuracy compared with the dosimeter measurements. The proposed approach is a simple and safe working procedure to estimate the exposure of MR workers at different parts of the body without requiring any markers to be worn. It can be applied to reduce occupational SMF exposure without changes in workers' performance. For that reason, the proposed non-invasive method can be used as a simple safety tool to estimate occupational SMF exposure in MR sites.


Subjects
Magnetic Fields, Magnetic Resonance Imaging/instrumentation, Occupational Exposure/analysis, Posture, Algorithms, Humans, Movement
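
The final step of the pipeline above maps detected body-part positions to field values and accumulates exposure over time; the sketch below uses a crude inverse-cube field fall-off as a loudly simplified stand-in for the paper's mathematical field model, with toy geometry:

```python
# Sketch: exposure accumulation from detected neck positions.
# The inverse-cube fall-off is a loud simplification, not the
# paper's field model; positions and geometry are toy values.
import numpy as np

def field_at(pos, magnet, b0=3.0):
    """Crude |B| estimate decaying with the cubed distance (tesla)."""
    r = np.linalg.norm(pos - magnet)
    return b0 / max(r, 1.0) ** 3

magnet = np.array([0.0, 0.0, 1.0])                         # bore center (m)
neck_track = np.array([[3.0, 0.0, 1.5], [2.0, 0.5, 1.5]])  # detections (m)
dt = 1.0                                                   # sampling (s)

exposure = sum(field_at(p, magnet) * dt for p in neck_track)
print(f"time-integrated exposure: {exposure:.3f} T*s")
```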
16.
Int J Comput Assist Radiol Surg ; 19(6): 1093-1101, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38573565

ABSTRACT

PURPOSE: In medical research, deep learning models rely on high-quality annotated data, and annotation is often a laborious and time-consuming process. This is particularly true for detection tasks where bounding box annotations are required. The need to adjust two corners makes the process inherently frame-by-frame. Given the scarcity of experts' time, efficient annotation methods suitable for clinicians are needed. METHODS: We propose an on-the-fly method for live video annotation to enhance annotation efficiency. In this approach, a continuous single-point annotation is maintained by keeping the cursor on the object in a live video, mitigating the need for the tedious pausing and repetitive navigation inherent in traditional annotation methods. This novel annotation paradigm inherits the point annotation's ability to generate pseudo-labels using a point-to-box teacher model. We empirically evaluate this approach by developing a dataset and comparing on-the-fly annotation time against the traditional annotation method. RESULTS: Using our method, annotation was 3.2× faster than with the traditional technique. We achieved a mean improvement of 6.51 ± 0.98 AP@50 over the conventional method at equivalent annotation budgets on the developed dataset. CONCLUSION: Without bells and whistles, our approach offers a significant speed-up in annotation tasks. It can be easily implemented on any annotation platform to accelerate the integration of deep learning in video-based medical research.


Subjects
Deep Learning, Video Recording, Video Recording/methods, Humans, Data Curation/methods
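
The core of on-the-fly annotation as described above is sampling the cursor position once per frame while the video plays, yielding a continuous single-point annotation without pausing; a minimal sketch using OpenCV's mouse callback, with window and file names as assumptions:

```python
# Sketch: one point label per frame from the live cursor position.
# OpenCV's callback API is real; window/file names are assumptions.
import cv2

points = []     # (frame_index, x, y) single-point annotations
cursor = [0, 0]

def on_mouse(event, x, y, flags, param):
    cursor[0], cursor[1] = x, y  # track the cursor continuously

cap = cv2.VideoCapture("surgery.mp4")
cv2.namedWindow("annotate")
cv2.setMouseCallback("annotate", on_mouse)

idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    points.append((idx, cursor[0], cursor[1]))  # no pausing needed
    cv2.imshow("annotate", frame)
    if cv2.waitKey(33) == 27:  # ~30 fps playback; Esc to stop
        break
    idx += 1
cap.release()
```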
17.
Int J Comput Assist Radiol Surg ; 19(6): 1243-1250, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38678488

ABSTRACT

PURPOSE: Advances in deep learning have resulted in effective models for surgical video analysis; however, these models often fail to generalize across medical centers due to domain shift caused by variations in surgical workflow, camera setups, and patient demographics. Recently, object-centric learning has emerged as a promising approach for improved surgical scene understanding, capturing and disentangling visual and semantic properties of surgical tools and anatomy to improve downstream task performance. In this work, we conduct a multicentric performance benchmark of object-centric approaches, focusing on critical view of safety assessment in laparoscopic cholecystectomy, then propose an improved approach for unseen domain generalization. METHODS: We evaluate four object-centric approaches for domain generalization, establishing baseline performance. Next, leveraging the disentangled nature of object-centric representations, we dissect one of these methods through a series of ablations (e.g., ignoring either visual or semantic features for downstream classification). Finally, based on the results of these ablations, we develop an optimized method specifically tailored for domain generalization, LG-DG, that includes a novel disentanglement loss function. RESULTS: Our optimized approach, LG-DG, achieves an improvement of 9.28% over the best baseline approach. More broadly, we show that object-centric approaches are highly effective for domain generalization thanks to their modular approach to representation learning. CONCLUSION: We investigate the use of object-centric methods for unseen domain generalization, identify method-agnostic factors critical for performance, and present an optimized approach that substantially outperforms existing methods.


Subjects
Laparoscopic Cholecystectomy, Humans, Laparoscopic Cholecystectomy/methods, Video Recording, Deep Learning
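
The paper's LG-DG loss is not reproduced here, but a generic disentanglement penalty conveys the idea behind the record above: discourage correlation between the visual and semantic parts of an object-centric embedding. A sketch under that assumption:

```python
# Sketch: penalize cross-covariance between the visual and semantic
# halves of object embeddings. NOT the paper's LG-DG loss.
import torch

def disentanglement_loss(visual: torch.Tensor,
                         semantic: torch.Tensor) -> torch.Tensor:
    """Frobenius norm of the cross-covariance between the two parts."""
    v = visual - visual.mean(dim=0)
    s = semantic - semantic.mean(dim=0)
    cross_cov = v.T @ s / (v.shape[0] - 1)
    return (cross_cov ** 2).sum()

loss = disentanglement_loss(torch.randn(32, 64), torch.randn(32, 64))
```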
18.
IEEE Trans Med Imaging ; 43(3): 1247-1258, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37971921

ABSTRACT

Assessing the critical view of safety (CVS) in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. Prior works have approached this task by including semantic segmentation as an intermediate step, using predicted segmentation masks to then predict the CVS. While these methods are effective, they rely on extremely expensive ground-truth segmentation annotations and tend to fail when the predicted segmentation is incorrect, limiting generalization. In this work, we propose a method for CVS prediction wherein we first represent a surgical image using a disentangled latent scene graph, then process this representation using a graph neural network. Our graph representations explicitly encode semantic information (object location, class information, geometric relations) to improve anatomy-driven reasoning, as well as visual features to retain differentiability and thereby provide robustness to semantic errors. Finally, to address annotation cost, we propose to train our method using only bounding box annotations, incorporating an auxiliary image reconstruction objective to learn fine-grained object boundaries. We show that our method not only outperforms several baseline methods when trained with bounding box annotations, but also scales effectively when trained with segmentation masks, maintaining state-of-the-art performance.


Subjects
Computer-Assisted Image Processing, Neural Networks (Computer), Semantics
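
A scene-graph representation like the one described above can be built directly from detected bounding boxes, with box geometry as node features and pairwise geometric relations as edge attributes; the class names and relation rule below are illustrative, not the paper's construction:

```python
# Sketch: graph over detected structures; box geometry as node
# features, center offsets as edge attributes. Classes and the
# relation rule are illustrative, not the paper's construction.
import itertools
import torch

# (class_id, x1, y1, x2, y2) per detection, coordinates normalized.
boxes = torch.tensor([[0, 0.2, 0.3, 0.5, 0.7],   # e.g., gallbladder
                      [1, 0.4, 0.5, 0.6, 0.8]])  # e.g., cystic duct

nodes = boxes[:, 1:]  # geometric node features
edges, rels = [], []
for i, j in itertools.permutations(range(len(boxes)), 2):
    ci = (nodes[i, :2] + nodes[i, 2:]) / 2  # box centers
    cj = (nodes[j, :2] + nodes[j, 2:]) / 2
    edges.append((i, j))
    rels.append(cj - ci)  # toy geometric relation

edge_index = torch.tensor(edges).T  # shape (2, E), ready for a GNN
edge_attr = torch.stack(rels)
```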
19.
Cir Esp (Engl Ed) ; 102 Suppl 1: S66-S71, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38704146

ABSTRACT

Artificial intelligence (AI) will power many of the tools in the armamentarium of digital surgeons. AI methods and surgical proofs of concept flourish, but we have yet to witness clinical translation and value. Here we exemplify the potential of AI in the care pathway of colorectal cancer patients and discuss clinical, technical, and governance considerations of major importance for the safe translation of surgical AI for the benefit of our patients and practices.


Subjects
Artificial Intelligence, Colorectal Neoplasms, Humans, Colorectal Neoplasms/surgery
20.
Int J Comput Assist Radiol Surg ; 19(7): 1409-1417, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38780829

ABSTRACT

PURPOSE: The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with natural language capabilities is emerging as a necessity. Our work aims to advance visual question answering (VQA) in the surgical context with scene graph knowledge, addressing two main challenges in the current surgical VQA systems: removing question-condition bias in the surgical VQA dataset and incorporating scene-aware reasoning in the surgical VQA model design. METHODS: First, we propose a surgical scene graph-based dataset, SSG-VQA, generated by employing segmentation and detection models on publicly available datasets. We build surgical scene graphs using spatial and action information of instruments and anatomies. These graphs are fed into a question engine, generating diverse QA pairs. We then propose SSG-VQA-Net, a novel surgical VQA model incorporating a lightweight Scene-embedded Interaction Module, which integrates geometric scene knowledge in the VQA model design by employing cross-attention between the textual and the scene features. RESULTS: Our comprehensive analysis shows that our SSG-VQA dataset provides a more complex, diverse, geometrically grounded, unbiased and surgical action-oriented dataset compared to existing surgical VQA datasets and SSG-VQA-Net outperforms existing methods across different question types and complexities. We highlight that the primary limitation in the current surgical VQA systems is the lack of scene knowledge to answer complex queries. CONCLUSION: We present a novel surgical VQA dataset and model and show that results can be significantly improved by incorporating geometric scene features in the VQA model design. We point out that the bottleneck of the current surgical visual question-answer model lies in learning the encoded representation rather than decoding the sequence. Our SSG-VQA dataset provides a diagnostic benchmark to test the scene understanding and reasoning capabilities of the model. The source code and the dataset will be made publicly available at: https://github.com/CAMMA-public/SSG-VQA.


Subjects
Operating Rooms, Humans, Computer-Assisted Surgery/methods, Natural Language Processing, Video Recording
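
The cross-attention between textual and scene features mentioned above lets question tokens attend to scene-graph node embeddings; a minimal sketch with illustrative dimensions, not SSG-VQA-Net's actual module:

```python
# Sketch: question tokens attending to scene features via
# cross-attention. Dimensions are illustrative.
import torch
import torch.nn as nn

d = 256
attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)

question = torch.randn(1, 12, d)  # 12 question-token embeddings
scene = torch.randn(1, 20, d)     # 20 scene-graph node embeddings

fused, _ = attn(query=question, key=scene, value=scene)
print(fused.shape)  # (1, 12, 256): scene-conditioned question tokens
```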