ABSTRACT
OBJECTIVES: We aimed to study classical, publicly available three-dimensional convolutional neural networks (3D-CNNs) using a combination of several cine-MR orientation planes for the estimation of left ventricular ejection fraction (LVEF) without contour tracing. METHODS: Cine-MR examinations carried out on 1082 patients from our institution were analysed by comparing the LVEF provided by the CVI42 software (V5.9.3) with the estimates produced by different 3D-CNN models fed with various combinations of long- and short-axis orientation planes. RESULTS: The 3D-ResNet18 architecture proved the most favourable, and the results gradually and significantly improved as additional long-axis and short-axis planes were combined. Simply pasting multiple orientation views into composite frames increased performance. Optimal results were obtained by pasting two long-axis views and six short-axis views. The best configuration provided an R² of 0.83, a mean absolute error (MAE) of 4.97, and a root mean square error (RMSE) of 6.29; the area under the ROC curve (AUC) was 0.99 for the classification of LVEF < 40% and 0.97 for the classification of LVEF > 60%. Internal validation performed on 149 additional patients after model training gave very similar results (MAE 4.98). External validation carried out on 62 patients from another institution showed an MAE of 6.59. These results are among the most promising obtained to date using CNNs with cardiac magnetic resonance. CONCLUSION: (1) Traditional 3D-CNNs fed with a combination of multiple orientation planes can estimate LVEF from cine-MRI data without segmenting ventricular contours, with a reliability similar to that of traditional methods. (2) Performance improves significantly as the number of orientation planes increases, providing a more complete view of the left ventricle.
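The abstract gives no implementation details, but the core idea, pasting two long-axis and six short-axis views into one composite frame per time point and regressing LVEF with an off-the-shelf 3D-CNN, can be sketched as follows. The mosaic layout, tensor shapes, and the use of torchvision's r3d_18 are assumptions for illustration, not the authors' code:

```python
# Hypothetical sketch: composite-frame LVEF regression with a 3D-ResNet18.
# Each cine sequence is assumed given as eight (T, H, W) tensors:
# 2 long-axis (LAX) plus 6 short-axis (SAX) orientation planes.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

def make_composite(planes):
    # Paste the 8 views into a 2 x 4 mosaic at every time point.
    rows = [torch.cat(planes[i:i + 4], dim=-1) for i in (0, 4)]
    mosaic = torch.cat(rows, dim=-2)               # (T, 2H, 4W)
    return mosaic.unsqueeze(0).repeat(3, 1, 1, 1)  # (3, T, 2H, 4W) fake RGB

class LVEFRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = r3d_18(weights="KINETICS400_V1")  # pretrained 3D-CNN
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

    def forward(self, x):                    # x: (B, 3, T, H', W')
        return self.backbone(x).squeeze(-1)  # predicted LVEF, one value per clip

model = LVEFRegressor()
clip = make_composite([torch.rand(16, 64, 64) for _ in range(8)]).unsqueeze(0)
print(model(clip).shape)  # torch.Size([1])
```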
ABSTRACT
Assessing the critical view of safety (CVS) in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. Prior works have approached this task by including semantic segmentation as an intermediate step, using the predicted segmentation masks to then predict the CVS. While these methods are effective, they rely on extremely expensive ground-truth segmentation annotations and tend to fail when the predicted segmentation is incorrect, limiting generalization. In this work, we propose a method for CVS prediction wherein we first represent a surgical image using a disentangled latent scene graph, then process this representation using a graph neural network. Our graph representations explicitly encode semantic information (object location, class information, geometric relations) to improve anatomy-driven reasoning, as well as visual features to retain differentiability and thereby provide robustness to semantic errors. Finally, to address annotation cost, we propose to train our method using only bounding box annotations, incorporating an auxiliary image reconstruction objective to learn fine-grained object boundaries. We show that our method not only outperforms several baseline methods when trained with bounding box annotations, but also scales effectively when trained with segmentation masks, maintaining state-of-the-art performance.
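As a rough, hypothetical illustration of such a design (not the authors' architecture), each node below concatenates box geometry, a class one-hot, and a pooled visual feature, and one hand-rolled message-passing layer mixes information between objects before predicting the three CVS criteria; all feature sizes are assumed:

```python
# Hypothetical sketch of graph-based CVS prediction: nodes are detected
# structures, node features are [box geometry | class one-hot | visual feature].
import torch
import torch.nn as nn

class MessagePassing(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)  # message computed from node pairs
        self.upd = nn.GRUCell(dim, dim)     # node state update

    def forward(self, h, adj):              # h: (N, D), adj: (N, N) in {0, 1}
        pair = torch.cat([h.unsqueeze(1).expand(-1, h.size(0), -1),
                          h.unsqueeze(0).expand(h.size(0), -1, -1)], dim=-1)
        messages = (adj.unsqueeze(-1) * self.msg(pair)).sum(dim=1)  # (N, D)
        return self.upd(messages, h)

class CVSGraphNet(nn.Module):
    def __init__(self, n_classes=6, vis_dim=256, dim=128):
        super().__init__()
        self.embed = nn.Linear(4 + n_classes + vis_dim, dim)
        self.gnn = MessagePassing(dim)
        self.head = nn.Linear(dim, 3)       # the 3 binary CVS criteria

    def forward(self, boxes, classes, visual, adj):
        h = self.embed(torch.cat([boxes, classes, visual], dim=-1))
        h = self.gnn(h, adj)
        return torch.sigmoid(self.head(h.mean(dim=0)))  # image-level scores

net = CVSGraphNet()
N = 5  # five detected anatomical structures
print(net(torch.rand(N, 4), torch.eye(6)[:N], torch.rand(N, 256),
          torch.ones(N, N)))  # three criterion probabilities
```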
Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Semantics
ABSTRACT
The growing availability of surgical digital data and developments in analytics such as artificial intelligence (AI) are being harnessed to improve surgical care. However, technical and cultural barriers to real-time intraoperative AI assistance remain. This early-stage clinical evaluation demonstrates the technical feasibility of concurrently deploying several AI models in operating rooms for real-time assistance during procedures. In addition, potentially relevant clinical applications of these AI models are explored with a multidisciplinary cohort of key stakeholders.
Subjects
Cholecystectomy, Laparoscopic; Humans; Artificial Intelligence
ABSTRACT
Formalizing surgical activities as triplets of the instruments used, actions performed, and target anatomies is becoming a gold-standard approach to surgical activity modeling. The benefit is that this formalization yields a more detailed understanding of tool-tissue interaction, which can be used to develop better artificial intelligence assistance for image-guided surgery. Earlier efforts, including the CholecTriplet challenge introduced in 2021, have put together techniques aimed at recognizing these triplets from surgical footage. Also estimating the spatial locations of the triplets would offer more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actor, and the modeling of each tool's activity in the form of a triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides a thorough methodological comparison of the methods, an in-depth analysis of the results across multiple metrics and across visual and procedural challenges, a discussion of their significance, and useful insights for future research directions and applications in surgery.
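None of the challenge methods is reproduced here, but a common weakly-supervised localization strategy, deriving a bounding box from a class activation map (CAM) trained with image-level labels only, can be sketched as follows; the threshold and toy map are arbitrary assumptions:

```python
# Hypothetical sketch: turn a class activation map into a bounding box
# without any box-level supervision, as in weakly-supervised tool localization.
import numpy as np

def cam_to_box(cam, threshold=0.5):
    """cam: (h, w) activation map for one tool class, non-negative values."""
    mask = cam >= threshold * cam.max()
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # class treated as absent from the frame
    h, w = cam.shape  # tightest box around the activated region, normalized
    return (xs.min() / w, ys.min() / h, (xs.max() + 1) / w, (ys.max() + 1) / h)

# Toy example: a blob of activation in the lower-right of a 7 x 7 map.
cam = np.zeros((7, 7))
cam[4:6, 4:7] = 1.0
print(cam_to_box(cam))  # (x1, y1, x2, y2) in [0, 1]
```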
Subjects
Artificial Intelligence; Surgery, Computer-Assisted; Humans; Endoscopy; Algorithms; Surgery, Computer-Assisted/methods; Surgical Instruments
ABSTRACT
Surgical video analysis facilitates education and research. However, video recordings of endoscopic surgeries can contain privacy-sensitive information, especially if the endoscopic camera is moved out of the patient's body and out-of-body scenes are recorded. The identification of out-of-body scenes in endoscopic videos is therefore of major importance for preserving the privacy of patients and operating room staff. This study developed and validated a deep learning model for the identification of out-of-body images in endoscopic videos. The model was trained and evaluated on an internal dataset covering 12 different types of laparoscopic and robotic surgeries and was externally validated on two independent multicentric test datasets of laparoscopic gastric bypass and cholecystectomy surgeries. Model performance was evaluated against human ground-truth annotations by measuring the area under the receiver operating characteristic curve (ROC AUC). The internal dataset, consisting of 356,267 images from 48 videos, and the two multicentric test datasets, consisting of 54,385 and 58,349 images from 10 and 20 videos, respectively, were annotated. The model identified out-of-body images with a 99.97% ROC AUC on the internal test dataset. The mean ± standard deviation ROC AUC was 99.94 ± 0.07% on the multicentric gastric bypass dataset and 99.71 ± 0.40% on the multicentric cholecystectomy dataset. The model can reliably identify out-of-body images in endoscopic videos and is publicly shared, facilitating privacy preservation in surgical video analysis.
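The trained model itself is publicly shared by the authors; purely to illustrate the evaluation protocol, frame-level ROC AUC against ground-truth annotations can be computed as below, with the classifier replaced by a toy brightness heuristic:

```python
# Hypothetical evaluation sketch: frame-level out-of-body detection scored
# against human ground-truth annotations with ROC AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(model, frames, labels):
    """frames: (N, H, W, 3) uint8 images; labels: (N,), 1 = out-of-body."""
    scores = np.asarray([model(f) for f in frames])  # per-frame probabilities
    return roc_auc_score(labels, scores)

# Toy stand-in for a trained classifier: mean brightness as the score
# (out-of-body frames are often brighter than in-body endoscopic views).
toy_model = lambda f: f.mean() / 255.0
frames = np.concatenate([np.full((5, 8, 8, 3), 40, np.uint8),    # in-body
                         np.full((5, 8, 8, 3), 200, np.uint8)])  # out-of-body
labels = np.array([0] * 5 + [1] * 5)
print(evaluate(toy_model, frames, labels))  # 1.0 on this toy data
```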
Subjects
Deep Learning; Laparoscopy; Humans; Privacy; Video Recording; Cholecystectomy
ABSTRACT
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out the fine-grained interaction details of the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as <instrument, verb, target> triplets delivers much more comprehensive detail about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and the assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from the competing teams are presented for recognizing surgical action triplets directly from surgical videos, achieving mean average precision (mAP) scores ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them and an in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and it also highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
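As an illustration of the challenge metric only, mean average precision over triplet classes can be computed with scikit-learn as sketched below; the 100-class triplet vocabulary and the synthetic label statistics are assumptions:

```python
# Hypothetical sketch of the challenge metric: mean average precision over
# multi-label triplet predictions, one column per <instrument, verb, target>
# class (a 100-class vocabulary is assumed here).
import numpy as np
from sklearn.metrics import average_precision_score

def triplet_map(y_true, y_score):
    """y_true: (N, C) binary frame labels; y_score: (N, C) predicted scores."""
    aps = [average_precision_score(y_true[:, c], y_score[:, c])
           for c in range(y_true.shape[1])
           if y_true[:, c].any()]  # AP is undefined for absent classes
    return float(np.mean(aps))

rng = np.random.default_rng(0)
y_true = (rng.random((200, 100)) < 0.05).astype(int)   # sparse triplet labels
y_score = 0.7 * y_true + 0.3 * rng.random((200, 100))  # imperfect predictions
print(f"mAP = {triplet_map(y_true, y_score):.3f}")
```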
Subjects
Benchmarking; Laparoscopy; Humans; Algorithms; Operating Rooms; Workflow; Deep Learning
ABSTRACT
PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase operative safety through context-sensitive warnings and semi-autonomous robotic assistance, or improve the training of surgeons via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open single-center video dataset. In this work we investigated the generalizability of phase recognition algorithms in a multicenter setting, including the more difficult recognition tasks of surgical action and surgical skill. METHODS: To achieve this goal, a dataset of 33 laparoscopic cholecystectomy videos from three surgical centers, with a total operation time of 22 h, was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the surgical workflow and skill analysis sub-challenge of the 2019 international Endoscopic Vision challenge. Here, 12 research teams trained and submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. RESULTS: F1-scores between 23.9% and 67.7% were achieved for phase recognition (n = 9 teams) and between 38.5% and 63.8% for instrument presence detection (n = 8 teams), but for action recognition only between 21.8% and 23.3% (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies for supporting the surgical team, but as our comparison of machine learning algorithms shows, there is still room for improvement. This novel HeiChole benchmark can be used for the comparable evaluation and validation of future work. In future studies, it is of utmost importance to create more open, high-quality datasets to enable the development of artificial intelligence and cognitive robotics in surgery.
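For orientation only, a frame-wise macro F1 evaluation for the seven-phase recognition task might look like the following minimal sketch; the integer label encoding and the synthetic predictions are assumptions:

```python
# Hypothetical sketch: frame-wise macro F1 for seven-phase recognition.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
true_phases = rng.integers(0, 7, size=5000)     # one phase label per frame
pred_phases = np.where(rng.random(5000) < 0.6,  # ~60% of frames kept correct
                       true_phases, rng.integers(0, 7, size=5000))

# Macro averaging weights all seven phases equally, including short ones.
print(f"macro F1 = {f1_score(true_phases, pred_phases, average='macro'):.3f}")
```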
Subjects
Artificial Intelligence; Benchmarking; Humans; Workflow; Algorithms; Machine Learning
ABSTRACT
The aim of this work was to compare the classification of cardiac MR images of AL versus ATTR amyloidosis by neural networks and by experienced human readers. Cine-MR images and late gadolinium enhancement (LGE) images of 120 patients were studied (70 AL and 50 ATTR). A VGG16 convolutional neural network (CNN) was trained with a 5-fold cross-validation process, taking care to strictly assign all images of a given patient to either the training group or the test group. The analysis was performed at the patient level by averaging the predictions obtained for each image. The classification accuracy obtained between AL and ATTR amyloidosis was 0.750 for the cine-CNN, 0.611 for the Gado-CNN and between 0.617 and 0.675 for the human readers. The corresponding AUC of the ROC curve was 0.839 for the cine-CNN, 0.679 for the Gado-CNN (p < 0.004 vs. cine) and 0.714 for the best human reader (p < 0.007 vs. cine). Logistic regression combining the cine-CNN and Gado-CNN, as well as analyses focused on specific orientation planes, did not change the overall results. We conclude that the cine-CNN discriminates between AL and ATTR amyloidosis significantly better than the Gado-CNN or human readers, but with lower performance than reported in studies where the visual diagnosis is easy, and it is currently suboptimal for clinical practice.
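The strict patient-wise split described above corresponds to grouped cross-validation; below is a minimal sketch with scikit-learn's GroupKFold plus patient-level averaging of per-image predictions, with all counts and shapes assumed:

```python
# Hypothetical sketch: 5-fold cross-validation with a strict patient-wise
# split, then patient-level prediction by averaging per-image scores.
import numpy as np
from sklearn.model_selection import GroupKFold

images = np.arange(600)                    # stand-in for 600 MR images
patient_of = np.repeat(np.arange(120), 5)  # 120 patients, 5 images each

for train_idx, test_idx in GroupKFold(n_splits=5).split(images,
                                                        groups=patient_of):
    # No patient's images appear on both sides of the split.
    assert not set(patient_of[train_idx]) & set(patient_of[test_idx])

def patient_level(scores, patients):
    """Average per-image AL-vs-ATTR scores within each patient."""
    return {p: scores[patients == p].mean() for p in np.unique(patients)}

scores = np.random.default_rng(2).random(600)  # stand-in for CNN outputs
print(list(patient_level(scores, patient_of).items())[:2])
```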
Subjects
Cholecystectomy, Laparoscopic; Clinical Competence; Computers; Humans; Video Recording
ABSTRACT
BACKGROUND: A computer vision (CV) platform named EndoDigest was recently developed to facilitate the use of surgical videos. Specifically, EndoDigest automatically provides short video clips that effectively document the critical view of safety (CVS) in laparoscopic cholecystectomy (LC). The aim of the present study was to validate EndoDigest on a multicentric dataset of LC videos. METHODS: LC videos from 4 centers were manually annotated with the time of the cystic duct division and an assessment of the CVS criteria. Incomplete recordings, bailout procedures and procedures with an intraoperative cholangiogram were excluded. EndoDigest leveraged the predictions of deep learning models for workflow analysis in a rule-based inference system designed to estimate the time of the cystic duct division. Performance was assessed by computing the error in estimating the manually annotated time of the cystic duct division. To provide concise video documentation of the CVS, EndoDigest extracted video clips showing the 2 min preceding and the 30 s following the predicted cystic duct division. The relevance of the documentation was evaluated by assessing the CVS in the automatically extracted 2.5-min-long video clips. RESULTS: 144 of the 174 LC videos from the 4 centers were analyzed. EndoDigest located the time of the cystic duct division with a mean error of 124.0 ± 270.6 s, despite the use of fluorescent cholangiography in 27 procedures and great variations in surgical workflows across centers. The surgical evaluation found that 108 (75.0%) of the automatically extracted short video clips documented the CVS effectively. CONCLUSIONS: EndoDigest was robust enough to reliably locate the time of the cystic duct division and to efficiently provide video documentation of the CVS despite highly variable workflows. Training specifically on data from each center could improve the results; nonetheless, this multicentric validation shows the potential for clinical translation of this surgical data science tool to efficiently document surgical safety.
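Once the cystic duct division time has been predicted, cutting the 2.5-min documentation clip (2 min before, 30 s after) is mechanical; here is a hypothetical sketch using the ffmpeg command line, with file names as placeholders:

```python
# Hypothetical sketch: cut a CVS documentation clip spanning the 2 minutes
# before and the 30 seconds after the predicted cystic duct division.
import subprocess

def extract_cvs_clip(video_path, division_time_s, out_path="cvs_clip.mp4"):
    start = max(0.0, division_time_s - 120.0)    # 2 min before the event
    duration = (division_time_s - start) + 30.0  # ... up to 30 s after it
    subprocess.run([
        "ffmpeg", "-ss", f"{start:.2f}", "-i", video_path,
        "-t", f"{duration:.2f}", "-c", "copy", out_path,
    ], check=True)

# e.g. division predicted 1900 s (31 min 40 s) into the recording:
# extract_cvs_clip("lc_case_042.mp4", 1900.0)
```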
Subjects
Cholecystectomy, Laparoscopic; Humans; Cholecystectomy, Laparoscopic/methods; Video Recording; Cholangiography; Documentation; Computers
ABSTRACT
OBJECTIVE: To develop a deep learning model to automatically segment hepatocystic anatomy and assess the criteria defining the critical view of safety (CVS) in laparoscopic cholecystectomy (LC). BACKGROUND: Poor implementation and subjective interpretation of the CVS contribute to the persistently stable rates of bile duct injuries in LC. As the CVS is assessed visually, this task can be automated using computer vision, an area of artificial intelligence aimed at interpreting images. METHODS: Still images from LC videos were annotated with CVS criteria and hepatocystic anatomy segmentation. A deep neural network comprising a segmentation model to highlight the hepatocystic anatomy and a classification model to predict CVS criteria achievement was trained and tested using 5-fold cross-validation. Intersection over union, average precision, and balanced accuracy were computed to evaluate the model performance against the annotated ground truth. RESULTS: A total of 2854 images from 201 LC videos were annotated, and 402 images were further segmented. The mean intersection over union for segmentation was 66.6%. The model assessed the achievement of the CVS criteria with a mean average precision of 71.9% and a balanced accuracy of 71.4%. CONCLUSIONS: Deep learning algorithms can be trained to reliably segment hepatocystic anatomy and assess CVS criteria in still laparoscopic images. Surgical-technical partnerships should be encouraged to develop and evaluate deep learning models that improve surgical safety.
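For reference, the two headline metrics, mask intersection over union for segmentation and balanced accuracy for CVS criteria classification, can be sketched as follows; the toy masks and labels are assumptions:

```python
# Hypothetical sketch of the two evaluation metrics: per-class mask IoU for
# hepatocystic anatomy segmentation, balanced accuracy for CVS criteria.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def mask_iou(pred, target):
    """pred, target: boolean (H, W) masks for one anatomical class."""
    union = np.logical_or(pred, target).sum()
    inter = np.logical_and(pred, target).sum()
    return inter / union if union else 1.0  # both empty counts as perfect

pred = np.zeros((64, 64), bool)
pred[10:40, 10:40] = True
target = np.zeros((64, 64), bool)
target[15:45, 15:45] = True
print(f"IoU = {mask_iou(pred, target):.3f}")

# A CVS criterion is a binary label per image; balanced accuracy averages
# sensitivity and specificity, which matters when criteria are imbalanced.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(f"balanced accuracy = {balanced_accuracy_score(y_true, y_pred):.3f}")
```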
Subjects
Bile Duct Diseases; Cholecystectomy, Laparoscopic; Deep Learning; Artificial Intelligence; Cholecystectomy, Laparoscopic/methods; Humans; Video Recording
ABSTRACT
The automatic classification of various types of cardiomyopathies is desirable but has never been performed using a convolutional neural network (CNN). The purpose of this study was to evaluate currently available CNN models for the classification of cine magnetic resonance (cine-MR) images of cardiomyopathies. METHOD: Diastolic and systolic frames of 1200 cine-MR sequences from three categories of subjects (395 normal, 411 hypertrophic cardiomyopathy, and 394 dilated cardiomyopathy) were selected, preprocessed, and labeled. Pretrained, fine-tuned deep learning models (VGG) were used for image classification (sixfold cross-validation and double-split testing with hold-out data). The gradient-weighted class activation mapping algorithm (Grad-CAM) was applied to reveal the salient pixel areas driving the classification. RESULTS: The cross-validation accuracy of the diastolic-systolic dual-input concatenated VGG model was 0.982 ± 0.009. Summed confusion matrices showed that, for the 1200 inputs, the VGG model made 22 errors. The classification of a 227-input validation group, carried out by an experienced radiologist and a cardiologist, led to a similar number of discrepancies. The image preparation process yielded a 5% accuracy improvement compared with unprepared images. Grad-CAM heat maps showed that most misclassifications occurred when extracardiac regions caught the attention of the network. CONCLUSIONS: CNNs are very well suited to the classification of cardiomyopathies and are 98% accurate, regardless of the imaging plane, when both diastolic and systolic frames are incorporated. Misclassification is in the same range as the inter-observer discrepancies of experienced human readers.
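The dual-input design, one VGG branch for the diastolic frame and one for the systolic frame with concatenated features feeding a shared head, could be sketched roughly as below; whether the branches share weights, the feature sizes, and the head layout are all assumptions:

```python
# Hypothetical sketch: diastolic-systolic dual-input VGG classifier for
# normal vs. hypertrophic vs. dilated cardiomyopathy (3 classes).
import torch
import torch.nn as nn
from torchvision.models import vgg16

class DualInputVGG(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.dia = vgg16(weights="IMAGENET1K_V1").features  # diastolic branch
        self.sys = vgg16(weights="IMAGENET1K_V1").features  # systolic branch
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(
            nn.Linear(2 * 512, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, n_classes),
        )

    def forward(self, diastole, systole):  # each (B, 3, 224, 224)
        f1 = self.pool(self.dia(diastole)).flatten(1)  # (B, 512)
        f2 = self.pool(self.sys(systole)).flatten(1)   # (B, 512)
        return self.head(torch.cat([f1, f2], dim=1))   # (B, 3) logits

model = DualInputVGG()
logits = model(torch.rand(2, 3, 224, 224), torch.rand(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 3])
```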
ABSTRACT
OBJECTIVE: The aim of this study was to develop a computer vision platform to automatically locate critical events in surgical videos and provide short video clips documenting the critical view of safety (CVS) in laparoscopic cholecystectomy (LC). BACKGROUND: Intraoperative events are typically documented through operator-dictated reports that do not always reflect the operative reality. Surgical videos provide complete information on surgical procedures, but the burden associated with storing and manually analyzing full-length videos has so far limited their effective use. METHODS: A computer vision platform named EndoDigest was developed and used to analyze LC videos. The mean absolute error (MAE) of the platform in automatically locating the manually annotated time of the cystic duct division in full-length videos was assessed. The relevance of the automatically extracted short video clips was evaluated by calculating the percentage of video clips in which the CVS was assessable by surgeons. RESULTS: A total of 155 LC videos were analyzed: 55 of these videos were used to develop EndoDigest, and the remaining 100 were used to test it. The time of the cystic duct division was automatically located with an MAE of 62.8 ± 130.4 seconds (1.95% of the full-length video duration). The CVS was assessable in 91% of the 2.5-minute-long video clips automatically extracted from the test procedures. CONCLUSIONS: Deep learning models for workflow analysis can be used to reliably locate critical events in surgical videos and to document the CVS in LC. Further studies are needed to assess the clinical impact of surgical data science solutions for safer laparoscopic cholecystectomy.
Subjects
Cholecystectomy, Laparoscopic/standards; Documentation/methods; Image Processing, Computer-Assisted/methods; Patient Safety/standards; Quality Improvement; Video Recording; Algorithms; Clinical Competence; Deep Learning; Humans; Workflow
ABSTRACT
BACKGROUND: Diagnosing cardiac amyloidosis (CA) from cine-CMR (cardiac magnetic resonance) alone is not reliable. In this study, we tested whether a convolutional neural network (CNN) could outperform the visual diagnosis of experienced operators. METHOD: 119 patients with cardiac amyloidosis and 122 patients with left ventricular hypertrophy (LVH) of other origins were retrospectively selected. Diastolic and systolic cine-CMR images were preprocessed and labeled. A dual-input visual geometry group (VGG) model was used for binary image classification. All images belonging to the same patient were assigned to the same set. Accuracy and area under the curve (AUC) were calculated per frame and per patient on a 40% held-out test set. The results were compared with a visual analysis performed by three experienced operators. RESULTS: Frame-based comparisons between the humans and the CNN gave an accuracy of 0.605 vs. 0.746 (p < 0.0008) and an AUC of 0.630 vs. 0.824 (p < 0.0001). Patient-based comparisons gave an accuracy of 0.660 vs. 0.825 (p < 0.008) and an AUC of 0.727 vs. 0.895 (p < 0.002). CONCLUSION: Based on cine-CMR images alone, a CNN is able to discriminate cardiac amyloidosis from LVH of other origins better than experienced human operators (an absolute improvement of 15 to 20 points in accuracy and AUC), demonstrating a unique capability to identify what the eyes cannot see through classical radiological analysis.
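Purely to illustrate the frame-level versus patient-level evaluation, per-patient scores can be obtained by averaging frame probabilities before computing the AUC; the synthetic data below stands in for real CNN outputs:

```python
# Hypothetical sketch: frame-level vs. patient-level AUC for amyloidosis (1)
# vs. other-origin LVH (0), aggregating frame probabilities per patient.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n_patients, frames_per_patient = 40, 10
patient_label = rng.integers(0, 2, n_patients)  # one diagnosis per patient
patients = np.repeat(np.arange(n_patients), frames_per_patient)
frame_label = patient_label[patients]           # frames inherit the diagnosis
frame_prob = 0.3 * frame_label + 0.7 * rng.random(len(patients))

frame_auc = roc_auc_score(frame_label, frame_prob)
patient_prob = np.array([frame_prob[patients == p].mean()
                         for p in range(n_patients)])
patient_auc = roc_auc_score(patient_label, patient_prob)
# Averaging frames typically cancels noise, so the patient AUC is higher.
print(f"frame AUC = {frame_auc:.3f}, patient AUC = {patient_auc:.3f}")
```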