ABSTRACT
OBJECTIVE: While the utilization of machine learning (ML) for data analysis typically requires significant technical expertise, novel platforms can deploy ML methods without requiring the user to have any coding experience (termed AutoML). The potential for these methods to be applied to neurosurgical video and surgical data science is unknown. METHODS: AutoML, a code-free ML (CFML) system, was used to identify surgical instruments contained within each frame of endoscopic, endonasal intraoperative video obtained from a previously validated internal carotid injury training exercise performed on a high-fidelity cadaver model. Instrument-detection performance using CFML was compared with that of two state-of-the-art ML models built using the Python coding language on the same intraoperative video data set. RESULTS: The CFML system successfully ingested surgical video without the use of any code. A total of 31,443 images were used to develop this model: 27,223 images were uploaded for training, 2,292 images for validation, and 1,928 images for testing. The mean average precision on the test set across all instruments was 0.708. The CFML model outperformed two standard object detection networks, RetinaNet and YOLOv3, which had mean average precisions of 0.669 and 0.527, respectively, on the same data set. Significant advantages of the CFML system included ease of use, relatively low cost, display of true/false positives and negatives in a user-friendly interface, and easy deployment of trained models for further analysis. Significant drawbacks of the CFML model included the inability to view the structure of the trained model, the inability to update the model with new examples once trained, and the inability to perform robust downstream analysis of model performance and error modes. CONCLUSIONS: This first report describes the baseline performance of CFML in an object detection task using a publicly available surgical video data set as a test bed. CFML exceeded the performance of standard, code-based object detection networks. This finding is encouraging for surgeon-scientists seeking to perform object detection tasks to answer clinical questions, perform quality improvement, and develop novel research ideas. The limited interpretability and customization of CFML models remain ongoing challenges. With the further development of code-free platforms, CFML will become increasingly important across biomedical research. Using CFML, surgeons without significant coding experience can perform exploratory ML analyses rapidly and efficiently.
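All three detectors above are scored by mean average precision (mAP), the standard object-detection metric. As a rough illustration of how such a score is computed from detector output, here is a minimal Python sketch; the box format, greedy matching scheme, and 0.5 IoU threshold are assumptions made for illustration, not the CFML, RetinaNet, or YOLOv3 evaluation code.

```python
# Minimal sketch: per-class average precision (AP) from detections.
# Assumptions: boxes are [x1, y1, x2, y2]; preds is a list of
# (frame_id, confidence, box); gts maps frame_id -> list of boxes.
from collections import defaultdict

import numpy as np


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """VOC-style all-points AP for one instrument class."""
    n_gt = sum(len(v) for v in gts.values())
    if not preds or n_gt == 0:
        return 0.0
    claimed = defaultdict(set)  # ground-truth boxes already matched
    hits = []
    for frame, _conf, box in sorted(preds, key=lambda p: -p[1]):
        cands = [(iou(box, g), j) for j, g in enumerate(gts.get(frame, []))
                 if j not in claimed[frame]]
        best_iou, best_j = max(cands, default=(0.0, None))
        if best_j is not None and best_iou >= iou_thr:
            claimed[frame].add(best_j)
            hits.append(1.0)
        else:
            hits.append(0.0)  # unmatched detection: false positive
    hits = np.array(hits)
    recall = np.cumsum(hits) / n_gt
    precision = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    precision = np.maximum.accumulate(precision[::-1])[::-1]  # monotone
    # Step-wise area under the precision-recall curve.
    return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))

# mAP is then the mean of average_precision over all instrument classes.
```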
Subjects
Benchmarking, Surgeons, Algorithms, Feasibility Studies, Humans, Machine Learning
ABSTRACT
OBJECTIVE: Virtual reality (VR) and augmented reality (AR) systems are increasingly available to neurosurgeons. These systems may provide opportunities for technical rehearsal and assessment of surgeon performance. The assessment of neurosurgeon skill in VR and AR environments and the validity of VR and AR feedback have not been systematically reviewed. METHODS: A systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines was conducted through MEDLINE and PubMed. Studies published in English between January 1990 and February 2021 describing the use of VR or AR to quantify the surgical technical performance of neurosurgeons without the use of human raters were included. The types and categories of automated performance metrics (APMs) from each of these studies were recorded. RESULTS: Thirty-three VR studies were included in the review; no AR studies met the inclusion criteria. VR APMs were categorized as distance to target, force, kinematics, time, blood loss, or volume of resection. Distance and time were the most well-studied APM domains, although all domains were effective at differentiating surgeon experience levels. Distance was successfully used to track improvement with practice. Examining volume of resection demonstrated that attending surgeons removed less simulated tumor but preserved more normal tissue than trainees. More recently, APMs have been used in machine learning algorithms to predict level of training with a high degree of accuracy. Key limitations of enhanced-reality systems include the limited use of AR for automated surgical assessment and the lack of external and longitudinal validation of VR systems. CONCLUSIONS: VR has been used to assess surgeon performance across a wide spectrum of domains. The VR environment can be used to quantify surgeon performance, assess surgeon proficiency, and track training progression. AR systems have not yet been used to provide metrics for surgeon performance assessment despite their potential for intraoperative integration. VR-based APMs may be especially useful for metrics that are difficult to assess intraoperatively, including blood loss and extent of resection.
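To make the APM domains concrete, here is a minimal Python sketch of the kind of time and kinematic metrics a VR simulator could derive from tracked tool-tip positions; the sampling rate, units, and exact metric definitions are illustrative assumptions, not any reviewed system's implementation.

```python
# Minimal sketch: time and kinematic APMs from tracked tool-tip motion.
# Assumption: positions is an (n, 3) array of coordinates in mm sampled at hz.
import numpy as np


def kinematic_apms(positions, hz=100.0):
    steps = np.diff(positions, axis=0)     # displacement between samples
    dists = np.linalg.norm(steps, axis=1)  # mm moved per sample interval
    speed = dists * hz                     # instantaneous speed, mm/s
    return {
        "task_time_s": (len(positions) - 1) / hz,
        "path_length_mm": float(dists.sum()),
        "mean_speed_mm_s": float(speed.mean()),
        "peak_speed_mm_s": float(speed.max()),
    }

# Example: 5 seconds of simulated motion at 100 Hz.
print(kinematic_apms(np.cumsum(np.random.randn(500, 3) * 0.1, axis=0)))
```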
Subjects
Augmented Reality, Neurosurgery, Virtual Reality, Humans, Neurosurgical Procedures, User-Computer Interface
ABSTRACT
BACKGROUND: Intraoperative tool movement data have been demonstrated to be clinically useful in quantifying surgical performance. However, collecting this information from intraoperative video requires laborious hand annotation. The ability to automatically annotate tools in surgical video would advance surgical data science by eliminating a time-intensive step in research. OBJECTIVE: To identify whether machine learning (ML) can automatically identify surgical instruments contained within neurosurgical video. METHODS: An ML model that automatically identifies surgical instruments in frame was developed and trained on multiple publicly available surgical video data sets with instrument location annotations. A total of 39,693 frames from 4 data sets were used (endoscopic endonasal surgery [EEA; 30,015 frames], cataract surgery [4,670], laparoscopic cholecystectomy [2,532], and microscope-assisted brain/spine tumor removal [2,476]). A second model trained only on EEA video was also developed. Intraoperative EEA videos from YouTube were used as test data (3 videos, 1,239 frames). RESULTS: The YouTube data set contained 2,169 total instruments. Mean average precision (mAP) for instrument detection on the YouTube data set was 0.74. The mAP for each individual video was 0.65, 0.74, and 0.89. The second model, trained only on EEA video, also had an overall mAP of 0.74 (0.62, 0.84, and 0.88 for the individual videos). Development costs were $130 for manual video annotation and under $100 for computation. CONCLUSION: Surgical instruments contained within endoscopic endonasal intraoperative video can be detected using a fully automated ML model. The addition of disparate surgical data sets did not improve model performance, although these data sets may improve the generalizability of the model in other use cases.
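Training the multidataset model above requires merging annotations from four differently formatted sources into one training set. The sketch below shows one plausible way to pool them in Python; the COCO-style JSON layout and the collapse of every source-specific tool class onto a single "instrument" label are assumptions, not the authors' pipeline.

```python
# Minimal sketch: pool COCO-style annotation files from disparate surgical
# data sets into one training manifest with a unified label space.
import json


def pool_datasets(paths, out_path="pooled_train.json"):
    pooled = {"images": [], "annotations": [],
              "categories": [{"id": 1, "name": "instrument"}]}
    img_id = ann_id = 0
    for path in paths:
        with open(path) as f:
            data = json.load(f)
        remap = {}  # source image ids -> new globally unique ids
        for img in data["images"]:
            img_id += 1
            remap[img["id"]] = img_id
            pooled["images"].append({**img, "id": img_id})
        for ann in data["annotations"]:
            ann_id += 1
            # Collapse all source-specific tool classes onto one label so the
            # EEA, cataract, laparoscopy, and tumor sets share a category space.
            pooled["annotations"].append({**ann, "id": ann_id,
                                          "image_id": remap[ann["image_id"]],
                                          "category_id": 1})
    with open(out_path, "w") as f:
        json.dump(pooled, f)

# Hypothetical file names for the four sources:
# pool_datasets(["eea.json", "cataract.json", "chole.json", "tumor.json"])
```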
Subjects
Machine Learning, Surgical Instruments, Humans, Video Recording
ABSTRACT
BACKGROUND: Deep neural networks (DNNs) have not previously been shown to detect blood loss (BL) or predict surgeon performance from video. OBJECTIVE: To train a DNN using video from cadaveric training exercises of surgeons controlling simulated internal carotid hemorrhage to predict clinically relevant outcomes. METHODS: Video was input as a series of images; deep learning networks were developed that predicted BL and task success from images alone (automated model) and from images plus human-labeled instrument annotations (semiautomated model). These models were compared against 2 reference models: one that used the average BL across all trials as its prediction (control 1) and one that used a linear regression with time to hemostasis (a metric with a known association with BL) as input (control 2). The root-mean-square error (RMSE) and correlation coefficients were used to compare the models; lower RMSE indicates superior performance. RESULTS: One hundred forty-three trials were used (123 for training and 20 for testing). The deep learning models outperformed the controls (control 1: RMSE 489 mL; control 2: RMSE 431 mL, R² = 0.35) at BL prediction. The automated model predicted BL with an RMSE of 358 mL (R² = 0.4) and correctly classified outcome in 85% of trials. The RMSE and classification performance of the semiautomated model improved to 260 mL and 90%, respectively. CONCLUSION: BL and task outcome classification are important components of an automated assessment of surgical performance. DNNs can predict BL and the outcome of hemorrhage control from video alone; their performance improves with surgical instrument presence data. The generalizability of DNNs trained on hemorrhage control tasks should be investigated.
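The model comparison above reduces to computing RMSE and R² on held-out trials. A minimal Python sketch follows; the blood-loss values and the training mean used by control 1 are toy numbers for illustration, not the study's data.

```python
# Minimal sketch: RMSE and R^2 used to rank the BL-prediction models.
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = np.array([250.0, 900.0, 400.0, 1200.0, 600.0])  # measured BL, mL (toy)
control_1 = np.full_like(y_true, 650.0)  # control 1: predict the training mean
dnn_pred = np.array([300.0, 850.0, 500.0, 1100.0, 550.0])  # DNN output (toy)

print("control 1 RMSE:", rmse(y_true, control_1))
print("DNN RMSE:", rmse(y_true, dnn_pred), "R^2:", r_squared(y_true, dnn_pred))
```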
Subjects
Neural Networks, Computer, Surgeons, Carotid Arteries, Hemorrhage, Humans, Linear Models
ABSTRACT
Importance: Surgical data scientists lack video data sets that depict adverse events, which may affect model generalizability and introduce bias. Hemorrhage may be particularly challenging for computer vision-based models because blood obscures the scene. Objective: To assess the utility of the Simulated Outcomes Following Carotid Artery Laceration (SOCAL) data set (a publicly available surgical video data set of hemorrhage complication management with instrument annotations and task outcomes) to provide benchmarks for surgical data science techniques, including computer vision instrument detection, instrument use metrics and outcome associations, and validation of a SOCAL-trained neural network using real operative video. Design, Setting, and Participants: For this quality improvement study, a total of 75 surgeons with 1 to 30 years' experience (mean, 7 years) were filmed from January 1, 2017, to December 31, 2020, managing catastrophic surgical hemorrhage in a high-fidelity cadaveric training exercise at nationwide training courses. Videos were annotated from January 1 to June 30, 2021. Interventions: Surgeons received expert coaching between 2 trials. Main Outcomes and Measures: Hemostasis within 5 minutes (task success, dichotomous), time to hemostasis (in seconds), and blood loss (in milliliters) were recorded. Deep neural networks (DNNs) were trained to detect surgical instruments in view. Model performance was measured using mean average precision (mAP), sensitivity, and positive predictive value. Results: SOCAL contains 31,443 frames with 65,071 surgical instrument annotations from 147 trials with associated surgeon demographic characteristics, time to hemostasis, and recorded blood loss for each trial. Computer vision-based instrument detection methods using DNNs trained on SOCAL achieved a mAP of 0.67 overall and 0.91 for the most common surgical instrument (suction). Hemorrhage control challenges standard object detectors: detection of some surgical instruments remained poor (mAP, 0.25). On real intraoperative video, the model achieved a sensitivity of 0.77 and a positive predictive value of 0.96. Instrument use metrics derived from the SOCAL video were significantly associated with performance (blood loss). Conclusions and Relevance: Hemorrhage control is a high-stakes adverse event that poses unique challenges for video analysis, yet no data sets of hemorrhage control previously existed. SOCAL, the first data set to depict hemorrhage control, allows the benchmarking of data science applications, including object detection, performance metric development, and identification of metrics associated with outcomes. In the future, SOCAL may be used to build and validate surgical data science models.
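The validation metrics reported on real operative video reduce to counts of matched and unmatched detections; the sketch below shows the arithmetic in Python, with counts chosen only to illustrate (they are not the study's data).

```python
# Minimal sketch: sensitivity and positive predictive value (PPV) from
# detection counts. The counts below are illustrative, not the study's.
def sensitivity(tp, fn):
    """Fraction of ground-truth instruments the detector found."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Fraction of detections that matched a real instrument."""
    return tp / (tp + fp)


tp, fp, fn = 770, 32, 230  # assumed true/false positives and false negatives
print(f"sensitivity = {sensitivity(tp, fn):.2f}, ppv = {ppv(tp, fp):.2f}")
```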
Subjects
Lacerations, Surgeons, Carotid Arteries, Humans, Lacerations/surgery, Machine Learning, Neural Networks, Computer
ABSTRACT
OBJECTIVE: Experts can assess surgeon skill using surgical video, but a limited number of expert surgeons are available. Automated performance metrics (APMs) are a promising alternative but have not been created from operative videos in neurosurgery to date. The authors aimed to evaluate whether video-based APMs can predict task success and blood loss during endonasal endoscopic surgery in a validated cadaveric simulator of vascular injury of the internal carotid artery. METHODS: Videos of cadaveric simulation trials by 73 neurosurgeons and otorhinolaryngologists were analyzed and manually annotated with bounding boxes to identify the surgical instruments in the frame. APMs were defined in five domains (instrument usage, time-to-phase, instrument disappearance, instrument movement, and instrument interactions) on the basis of expert analysis and task-specific surgical progressions. Bounding-box data on instrument position were then used to generate APMs for each trial. Multivariate linear regression was used to test for associations between APMs and blood loss and task success (hemorrhage control in less than 5 minutes). The APMs of 93 successful trials were compared with those of 49 unsuccessful trials. RESULTS: In total, 29,151 frames of surgical video were annotated. Successful simulation trials had superior APMs in each domain, including proportionately more time spent with the key instruments in view (p < 0.001) and less time without hemorrhage control (p = 0.002). APMs in all domains improved in subsequent trials after the participants received personalized expert instruction. Attending surgeons had superior instrument usage, time-to-phase, and instrument disappearance metrics compared with resident surgeons (p < 0.01). APMs predicted surgeon performance better than surgeon training level or prior experience did. A regression model that included APMs predicted blood loss with an R² value of 0.87 (p < 0.001). CONCLUSIONS: Video-based APMs were better predictors of simulation trial success and blood loss than surgeon characteristics such as case volume and attending status. Surgeon educators can use APMs to assess competency, quantify performance, and provide actionable, structured feedback in order to improve patient outcomes. Validation of APMs provides a benchmark for further development of fully automated video assessment pipelines that utilize machine learning and computer vision.
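As an illustration of the final step above (turning bounding-box annotations into per-trial APM features and regressing blood loss on them), here is a minimal Python sketch; the feature definitions, toy values, and scikit-learn model are assumptions, not the authors' code.

```python
# Minimal sketch: derive per-trial APM features from bounding-box annotations
# and fit a linear regression of blood loss on them.
import numpy as np
from sklearn.linear_model import LinearRegression


def trial_apms(frames, key_instrument="suction"):
    """frames: list of dicts mapping instrument name -> bounding box per frame."""
    n = len(frames)
    return [
        sum(key_instrument in f for f in frames) / n,  # instrument-usage domain
        float(np.mean([len(f) for f in frames])),      # instruments per frame
    ]


# Toy data: one APM feature row per trial and its measured blood loss in mL.
X = np.array([[0.9, 1.8], [0.4, 1.1], [0.7, 1.5], [0.3, 0.9]])
y = np.array([300.0, 1100.0, 500.0, 1400.0])
model = LinearRegression().fit(X, y)
print("R^2 on these trials:", model.score(X, y))
```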