Search | VHL Regional Portal

1.

Correcting for Rater Effects in Operating Room Surgical Skills Assessment.

Chou, Ryan; Naz, Hajira; Boahene, Kofi D O; Maxwell, Jessica H; Wanamaker, John R; Byrne, Patrick J; Papel, Ira D; Kontis, Theda C; Hager, Gregory D; Ishii, Lisa E; Malekzadeh, Sonya; Vedula, S Swaroop; Ishii, Masaru.

Laryngoscope ; 2024 Mar 12.

Article in English | MEDLINE | ID: mdl-38470307

ABSTRACT

OBJECTIVE: To estimate and adjust for rater effects in operating room surgical skills assessment performed using a structured rating scale for nasal septoplasty. METHODS: We analyzed survey responses from attending surgeons (raters) who supervised residents and fellows (trainees) performing nasal septoplasty in a prospective cohort study. We fit a structural equation model with the rubric item scores regressed on a latent component of skill and then fit a second model including the rating surgeon as a random effect to model a rater-effects-adjusted latent surgical skill. We validated this model against conventional measures including the level of expertise and post-graduation year (PGY) commensurate with the trainee's performance, the actual PGY of the trainee, and whether the surgical goals were achieved. RESULTS: Our dataset included 188 assessments by 7 raters and 41 trainees. The model with one latent construct for surgical skill and the rater as a random effect was the best. Rubric scores depended on how severe or lenient the rater was, sometimes almost as much as they depended on trainee skill. Rater-adjusted latent skill scores increased with attending-estimated skill levels and PGY of trainees, increased with the actual PGY, and appeared constant over different levels of achievement of surgical goals. CONCLUSION: Our work provides a method to obtain rater effect adjusted surgical skill assessments in the operating room using structured rating scales. Our method allows for the creation of standardized (i.e., rater-effects-adjusted) quantitative surgical skill benchmarks using national-level databases on trainee assessments. LEVEL OF EVIDENCE: N/A Laryngoscope, 2024.

2.

Opportunities for Improving Glaucoma Clinical Trials via Deep Learning-Based Identification of Patients with Low Visual Field Variability.

Wang, Ruolin; Bradley, Chris; Herbert, Patrick; Hou, Kaihua; Hager, Gregory D; Breininger, Katharina; Unberath, Mathias; Ramulu, Pradeep; Yohannan, Jithin.

Ophthalmol Glaucoma ; 2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38296108

ABSTRACT

PURPOSE: Develop and evaluate the performance of a deep learning model (DLM) that forecasts eyes with low future visual field (VF) variability, and study the impact of using this DLM on sample size requirements for neuroprotective trials. DESIGN: Retrospective cohort and simulation study. METHODS: We included 1 eye per patient with baseline reliable VFs, OCT, clinical measures (demographics, intraocular pressure, and visual acuity), and 5 subsequent reliable VFs to forecast VF variability using DLMs and perform sample size estimates. We estimated sample size for 3 groups of eyes: all eyes (AE), low variability eyes (LVE: the subset of AE with a standard deviation of mean deviation [MD] slope residuals in the bottom 25th percentile), and DLM-predicted low variability eyes (DLPE: the subset of AE predicted to be low variability by the DLM). Deep learning models using only baseline VF/OCT/clinical data as input (DLM1), or also using a second VF (DLM2) were constructed to predict low VF variability (DLPE1 and DLPE2, respectively). Data were split 60/10/30 into train/val/test. Clinical trial simulations were performed only on the test set. We estimated the sample size necessary to detect treatment effects of 20% to 50% in MD slope with 80% power. Power was defined as the percentage of simulated clinical trials where the MD slope was significantly worse from the control. Clinical trials were simulated with visits every 3 months with a total of 10 visits. RESULTS: A total of 2817 eyes were included in the analysis. Deep learning models 1 and 2 achieved an area under the receiver operating characteristic curve of 0.73 (95% confidence interval [CI]: 0.68, 0.76) and 0.82 (95% CI: 0.78, 0.85) in forecasting low VF variability. When compared with including AE, using DLPE1 and DLPE2 reduced sample size to achieve 80% power by 30% and 38% for 30% treatment effect, and 31% and 38% for 50% treatment effect. CONCLUSIONS: Deep learning models can forecast eyes with low VF variability using data from a single baseline clinical visit. This can reduce sample size requirements, and potentially reduce the burden of future glaucoma clinical trials. FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

3.

Curricula for teaching end-users to kinesthetically program collaborative robots.

Ajaykumar, Gopika; Hager, Gregory D; Huang, Chien-Ming.

PLoS One ; 18(12): e0294786, 2023.

Article in English | MEDLINE | ID: mdl-38039277

ABSTRACT

Non-expert users can now program robots using various end-user robot programming methods, which have widened the use of robots and lowered barriers preventing robot use by laypeople. Kinesthetic teaching is a common form of end-user robot programming, allowing users to forgo writing code by physically guiding the robot to demonstrate behaviors. Although it can be more accessible than writing code, kinesthetic teaching is difficult in practice because of users' unfamiliarity with kinematics or limitations of robots and programming interfaces. Developing good kinesthetic demonstrations requires physical and cognitive skills, such as the ability to plan effective grasps for different task objects and constraints, to overcome programming difficulties. How to help users learn these skills remains a largely unexplored question, with users conventionally learning through self-guided practice. Our study compares how self-guided practice compares with curriculum-based training in building users' programming proficiency. While we found no significant differences between study participants who learned through practice compared to participants who learned through our curriculum, our study reveals insights into factors contributing to end-user robot programmers' confidence and success during programming and how learning interventions may contribute to such factors. Our work paves the way for further research on how to best structure training interventions for end-user robot programmers.

Subject(s)

Robotics , Humans , Robotics/methods , Learning , Curriculum , Physical Examination , Biomechanical Phenomena

4.

Large-scale pancreatic cancer detection via non-contrast CT and deep learning.

Cao, Kai; Xia, Yingda; Yao, Jiawen; Han, Xu; Lambert, Lukas; Zhang, Tingting; Tang, Wei; Jin, Gang; Jiang, Hui; Fang, Xu; Nogues, Isabella; Li, Xuezhou; Guo, Wenchao; Wang, Yu; Fang, Wei; Qiu, Mingyan; Hou, Yang; Kovarnik, Tomas; Vocka, Michal; Lu, Yimei; Chen, Yingli; Chen, Xin; Liu, Zaiyi; Zhou, Jian; Xie, Chuanmiao; Zhang, Rong; Lu, Hong; Hager, Gregory D; Yuille, Alan L; Lu, Le; Shao, Chengwei; Shi, Yu; Zhang, Qi; Liang, Tingbo; Zhang, Ling; Lu, Jianping.

Nat Med ; 29(12): 3033-3043, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37985692

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC), the most deadly solid malignancy, is typically detected late and at an inoperable stage. Early or incidental detection is associated with prolonged survival, but screening asymptomatic individuals for PDAC using a single test remains unfeasible due to the low prevalence and potential harms of false positives. Non-contrast computed tomography (CT), routinely performed for clinical indications, offers the potential for large-scale screening, however, identification of PDAC using non-contrast CT has long been considered impossible. Here, we develop a deep learning approach, pancreatic cancer detection with artificial intelligence (PANDA), that can detect and classify pancreatic lesions with high accuracy via non-contrast CT. PANDA is trained on a dataset of 3,208 patients from a single center. PANDA achieves an area under the receiver operating characteristic curve (AUC) of 0.986-0.996 for lesion detection in a multicenter validation involving 6,239 patients across 10 centers, outperforms the mean radiologist performance by 34.1% in sensitivity and 6.3% in specificity for PDAC identification, and achieves a sensitivity of 92.9% and specificity of 99.9% for lesion detection in a real-world multi-scenario validation consisting of 20,530 consecutive patients. Notably, PANDA utilized with non-contrast CT shows non-inferiority to radiology reports (using contrast-enhanced CT) in the differentiation of common pancreatic lesion subtypes. PANDA could potentially serve as a new tool for large-scale pancreatic cancer screening.

Subject(s)

Carcinoma, Pancreatic Ductal , Deep Learning , Pancreatic Neoplasms , Humans , Artificial Intelligence , Pancreatic Neoplasms/diagnostic imaging , Pancreatic Neoplasms/pathology , Tomography, X-Ray Computed , Pancreas/diagnostic imaging , Pancreas/pathology , Carcinoma, Pancreatic Ductal/diagnostic imaging , Carcinoma, Pancreatic Ductal/pathology , Retrospective Studies

5.

Effect of pre-operative warm-up on trainee intraoperative performance during robot-assisted hysterectomy: a randomized controlled trial.

Chen, Chi Chiung Grace; Malpani, Anand; Waldram, Madeleine M; Romanczyk, Caitlin; Tanner, Edward J; Fader, Amanda N; Scheib, Stacey A; Hager, Gregory D; Vedula, S Swaroop.

Int Urogynecol J ; 34(11): 2751-2758, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37449987

ABSTRACT

INTRODUCTION AND HYPOTHESIS: The objective was to study the effect of immediate pre-operative warm-up using virtual reality simulation on intraoperative robot-assisted laparoscopic hysterectomy (RALH) performance by gynecology trainees (residents and fellows). METHODS: We randomized the first, non-emergent RALH of the day that involved trainees warming up or not warming up. For cases assigned to warm-up, trainees performed a set of exercises on the da Vinci Skills Simulator immediately before the procedure. The supervising attending surgeon, who was not informed whether or not the trainee was assigned to warm-up, assessed the trainee's performance using the Objective Structured Assessment for Technical Skill (OSATS) and the Global Evaluative Assessment of Robotic Skills (GEARS) immediately after each surgery. RESULTS: We randomized 66 cases and analyzed 58 cases (30 warm-up, 28 no warm-up), which involved 21 trainees. Attending surgeons rated trainees similarly irrespective of warm-up randomization with mean (SD) OSATS composite scores of 22.6 (4.3; warm-up) vs 21.8 (3.4; no warm-up) and mean GEARS composite scores of 19.2 (3.8; warm-up) vs 18.8 (3.1; no warm-up). The difference in composite scores between warm-up and no warm-up was 0.34 (95% CI: -1.44, 2.13), and 0.34 (95% CI: -1.22, 1.90) for OSATS and GEARS respectively. Also, we did not observe any significant differences in each of the component/subscale scores within OSATS and GEARS between cases assigned to warm-up and no warm-up. CONCLUSION: Performing a brief virtual reality-based warm-up before RALH did not significantly improve the intraoperative performance of the trainees.

Subject(s)

Laparoscopy , Robotic Surgical Procedures , Robotics , Female , Humans , Computer Simulation , Hysterectomy , Clinical Competence

6.

Eye Tracking and Motion Data Predict Endoscopic Sinus Surgery Skill.

Berges, Alexandra J; Vedula, S Swaroop; Chara, Alejandro; Hager, Gregory D; Ishii, Masaru; Malpani, Anand.

Laryngoscope ; 133(3): 500-505, 2023 03.

Article in English | MEDLINE | ID: mdl-35357011

ABSTRACT

OBJECTIVE: Endoscopic surgery has a considerable learning curve due to dissociation of the visual-motor axes, coupled with decreased tactile feedback and mobility. In particular, endoscopic sinus surgery (ESS) lacks objective skill assessment metrics to provide specific feedback to trainees. This study aims to identify summary metrics from eye tracking, endoscope motion, and tool motion to objectively assess surgeons' ESS skill. METHODS: In this cross-sectional study, expert and novice surgeons performed ESS tasks of inserting an endoscope and tool into a cadaveric nose, touching an anatomical landmark, and withdrawing the endoscope and tool out of the nose. Tool and endoscope motion were collected using an electromagnetic tracker, and eye gaze was tracked using an infrared camera. Three expert surgeons provided binary assessments of low/high skill. 20 summary statistics were calculated for eye, tool, and endoscope motion and used in logistic regression models to predict surgical skill. RESULTS: 14 metrics (10 eye gaze, 2 tool motion, and 2 endoscope motion) were significantly different between surgeons with low and high skill. Models to predict skill for 6/9 ESS tasks had an AUC >0.95. A combined model of all tasks (AUC 0.95, PPV 0.93, NPV 0.89) included metrics from eye tracking data and endoscope motion, indicating that these metrics are transferable across tasks. CONCLUSIONS: Eye gaze, endoscope, and tool motion data can provide an objective and accurate measurement of ESS surgical performance. Incorporation of these algorithmic techniques intraoperatively could allow for automated skill assessment for trainees learning endoscopic surgery. LEVEL OF EVIDENCE: N/A Laryngoscope, 133:500-505, 2023.

Subject(s)

Eye-Tracking Technology , Surgeons , Humans , Cross-Sectional Studies , Endoscopy , Endoscopes , Clinical Competence

7.

A Delphi consensus statement for digital surgery.

Lam, Kyle; Abràmoff, Michael D; Balibrea, José M; Bishop, Steven M; Brady, Richard R; Callcut, Rachael A; Chand, Manish; Collins, Justin W; Diener, Markus K; Eisenmann, Matthias; Fermont, Kelly; Neto, Manoel Galvao; Hager, Gregory D; Hinchliffe, Robert J; Horgan, Alan; Jannin, Pierre; Langerman, Alexander; Logishetty, Kartik; Mahadik, Amit; Maier-Hein, Lena; Antona, Esteban Martín; Mascagni, Pietro; Mathew, Ryan K; Müller-Stich, Beat P; Neumuth, Thomas; Nickel, Felix; Park, Adrian; Pellino, Gianluca; Rudzicz, Frank; Shah, Sam; Slack, Mark; Smith, Myles J; Soomro, Naeem; Speidel, Stefanie; Stoyanov, Danail; Tilney, Henry S; Wagner, Martin; Darzi, Ara; Kinross, James M; Purkayastha, Sanjay.

NPJ Digit Med ; 5(1): 100, 2022 Jul 19.

Article in English | MEDLINE | ID: mdl-35854145

ABSTRACT

The use of digital technology is increasing rapidly across surgical specialities, yet there is no consensus for the term 'digital surgery'. This is critical as digital health technologies present technical, governance, and legal challenges which are unique to the surgeon and surgical patient. We aim to define the term digital surgery and the ethical issues surrounding its clinical application, and to identify barriers and research goals for future practice. 38 international experts, across the fields of surgery, AI, industry, law, ethics and policy, participated in a four-round Delphi exercise. Issues were generated by an expert panel and public panel through a scoping questionnaire around key themes identified from the literature and voted upon in two subsequent questionnaire rounds. Consensus was defined if >70% of the panel deemed the statement important and <30% unimportant. A final online meeting was held to discuss consensus statements. The definition of digital surgery as the use of technology for the enhancement of preoperative planning, surgical performance, therapeutic support, or training, to improve outcomes and reduce harm achieved 100% consensus agreement. We highlight key ethical issues concerning data, privacy, confidentiality and public trust, consent, law, litigation and liability, and commercial partnerships within digital surgery and identify barriers and research goals for future practice. Developers and users of digital surgery must not only have an awareness of the ethical issues surrounding digital applications in healthcare, but also the ethical considerations unique to digital surgery. Future research into these issues must involve all digital surgery stakeholders including patients.

8.

Pre-Clinical Development of Robot-Assisted Ventriculoscopy for 3D Image Reconstruction and Guidance of Deep Brain Neurosurgery.

Vagdargi, Prasad; Uneri, Ali; Jones, Craig K; Wu, Pengwei; Han, Runze; Luciano, Mark G; Anderson, William S; Helm, Patrick A; Hager, Gregory D; Siewerdsen, Jeffrey H.

IEEE Trans Med Robot Bionics ; 4(1): 28-37, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35368731

ABSTRACT

Conventional neuro-navigation can be challenged in targeting deep brain structures via transventricular neuroendoscopy due to unresolved geometric error following soft-tissue deformation. Current robot-assisted endoscopy techniques are fairly limited, primarily serving to planned trajectories and provide a stable scope holder. We report the implementation of a robot-assisted ventriculoscopy (RAV) system for 3D reconstruction, registration, and augmentation of the neuroendoscopic scene with intraoperative imaging, enabling guidance even in the presence of tissue deformation and providing visualization of structures beyond the endoscopic field-of-view. Phantom studies were performed to quantitatively evaluate image sampling requirements, registration accuracy, and computational runtime for two reconstruction methods and a variety of clinically relevant ventriculoscope trajectories. A median target registration error of 1.2 mm was achieved with an update rate of 2.34 frames per second, validating the RAV concept and motivating translation to future clinical studies.

9.

Do Attending and Trainee Surgeons Agree on What Happens in the Operating Room During Septoplasty?

Ryan, John F; Malpani, Anand; Naz, Hajira; Boahene, Kofi D O; Papel, Ira D; Kontis, Theda C; Maxwell, Jessica H; Creighton, Francis X; Byrne, Patrick J; Wanamaker, John R; Hager, Gregory D; Vedula, S Swaroop; Malekzadeh, Sonya; Ishii, Lisa E; Ishii, Masaru.

Facial Plast Surg Aesthet Med ; 24(6): 472-477, 2022.

Article in English | MEDLINE | ID: mdl-35255228

ABSTRACT

Background: Surgeons must select cases whose complexity aligns with their skill set. Objectives: To determine how accurately trainees report involvement in procedures, judge case complexity, and assess their own skills. Methods: We recruited attendings and trainees from two otolaryngology departments. After performing septoplasty, they completed identical surveys regarding case complexity, achievement of goals, who performed which steps, and trainee skill using the septoplasty global assessment tool (SGAT) and visual analog scale (VAS). Agreement regarding which steps were performed by the trainee was assessed with Cohen's kappa coefficients (κ). Correlations between trainee and attending responses were measured with Spearman's correlation coefficients (rho). Results: Seven attendings and 42 trainees completed 181 paired surveys. Trainees and attendings sometimes disagreed about which steps were performed by trainees (range of κ = 0.743-0.846). Correlation between attending and trainee responses was low for VAS skill ratings (range of rho = 0.12-0.34), SGAT questions (range of rho = 0.03-0.53), and evaluation of case complexity (range of rho = 0.24-0.48). Conclusion: Trainees sometimes disagree with attendings about which septoplasty steps they perform and are limited in their ability to judge complexity, goals, and their skill.

Subject(s)

Otolaryngology , Rhinoplasty , Surgeons , Humans , Operating Rooms , Clinical Competence

10.

Characterizing the Details of Spatial Construction: Cognitive Constraints and Variability.

Shelton, Amy Lynne; Davis, E Emory; Cortesa, Cathryn S; Jones, Jonathan D; Hager, Gregory D; Khudanpur, Sanjeev; Landau, Barbara.

Cogn Sci ; 46(1): e13081, 2022 01.

Article in English | MEDLINE | ID: mdl-35066920

ABSTRACT

Spatial construction-the activity of creating novel spatial arrangements or copying existing ones-is a hallmark of human spatial cognition. Spatial construction abilities predict math and other academic outcomes and are regularly used in IQ testing, but we know little about the cognitive processes that underlie them. In part, this lack of understanding is due to both the complex nature of construction tasks and the tendency to limit measurement to the overall accuracy of the end goal. Using an automated recording and coding system, we examined in detail adults' performance on a block copying task, specifying their step-by-step actions, culminating in all steps in the full construction of the build-path. The results revealed the consistent use of a structured plan that unfolded in an organized way, layer by layer (bottom to top). We also observed that complete layers served as convergence points, where the most agreement among participants occurred, whereas the specific steps taken to achieve each of those layers diverged, or varied, both across and even within individuals. This pattern of convergence and divergence suggests that the layers themselves were serving as the common subgoals across both inter and intraindividual builds of the same model, reflecting cognitive "chunking." This structured use of layers as subgoals was functionally related to better performance among builders. Our findings offer a foundation for further exploration that may yield insights into the development and training of block-construction as well as other complex cognitive-motor skills. In addition, this work offers proof-of-concept for systematic investigation into a wide range of complex action-based cognitive tasks.

Subject(s)

Cognition , Memory , Adult , Humans , Intelligence Tests

11.

SAGE: SLAM with Appearance and Geometry Prior for Endoscopy.

Liu, Xingtong; Li, Zhaoshuo; Ishii, Masaru; Hager, Gregory D; Taylor, Russell H; Unberath, Mathias.

IEEE Int Conf Robot Autom ; 2022: 5587-5593, 2022 May.

Article in English | MEDLINE | ID: mdl-36937551

ABSTRACT

In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping system by combining the learning-based appearance and optimizable geometry priors and factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system is shown to robustly handle the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllppl920/SAGE-SLAM.git.

12.

Control of Magnetic Surgical Robots With Model-Based Simulators and Reinforcement Learning.

Barnoy, Yotam; Erin, Onder; Raval, Suraj; Pryor, Will; Mair, Lamar O; Weinberg, Irving N; Diaz-Mercado, Yancy; Krieger, Axel; Hager, Gregory D.

IEEE Trans Med Robot Bionics ; 4(4): 945-956, 2022 Nov.

Article in English | MEDLINE | ID: mdl-37600471

ABSTRACT

Magnetically manipulated medical robots are a promising alternative to current robotic platforms, allowing for miniaturization and tetherless actuation. Controlling such systems autonomously may enable safe, accurate operation. However, classical control methods require rigorous models of magnetic fields, robot dynamics, and robot environments, which can be difficult to generate. Model-free reinforcement learning (RL) offers an alternative that can bypass these requirements. We apply RL to a robotic magnetic needle manipulation system. Reinforcement learning algorithms often require long runtimes, making them impractical for many surgical robotics applications, most of which require careful, constant monitoring. Our approach first constructs a model-based simulation (MBS) on guided real-world exploration, learning the dynamics of the environment. After intensive MBS environment training, we transfer the learned behavior from the MBS environment to the real-world. Our MBS method applies RL roughly 200 times faster than doing so in the real world, and achieves a 6 mm root-mean-square (RMS) error for a square reference trajectory. In comparison, pure simulation-based approaches fail to transfer, producing a 31 mm RMS error. These results demonstrate that MBS environments are a good solution for domains where running model-free RL is impractical, especially if an accurate simulation is not available.

13.

Surgical data science - from concepts toward clinical translation.

Maier-Hein, Lena; Eisenmann, Matthias; Sarikaya, Duygu; März, Keno; Collins, Toby; Malpani, Anand; Fallert, Johannes; Feussner, Hubertus; Giannarou, Stamatia; Mascagni, Pietro; Nakawala, Hirenkumar; Park, Adrian; Pugh, Carla; Stoyanov, Danail; Vedula, Swaroop S; Cleary, Kevin; Fichtinger, Gabor; Forestier, Germain; Gibaud, Bernard; Grantcharov, Teodor; Hashizume, Makoto; Heckmann-Nötzel, Doreen; Kenngott, Hannes G; Kikinis, Ron; Mündermann, Lars; Navab, Nassir; Onogur, Sinan; Roß, Tobias; Sznitman, Raphael; Taylor, Russell H; Tizabi, Minu D; Wagner, Martin; Hager, Gregory D; Neumuth, Thomas; Padoy, Nicolas; Collins, Justin; Gockel, Ines; Goedeke, Jan; Hashimoto, Daniel A; Joyeux, Luc; Lam, Kyle; Leff, Daniel R; Madani, Amin; Marcus, Hani J; Meireles, Ozanan; Seitel, Alexander; Teber, Dogu; Ückert, Frank; Müller-Stich, Beat P; Jannin, Pierre.

Med Image Anal ; 76: 102306, 2022 02.

Article in English | MEDLINE | ID: mdl-34879287

ABSTRACT

Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reasons and provide a roadmap for future advances in the field. Based on an international workshop involving leading researchers in the field of SDS, we review current practice, key achievements and initiatives as well as available standards and tools for a number of topics relevant to the field, namely (1) infrastructure for data acquisition, storage and access in the presence of regulatory constraints, (2) data annotation and sharing and (3) data analytics. We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.

Subject(s)

Data Science , Machine Learning , Humans

14.

Radiology "forensics": determination of age and sex from chest radiographs using deep learning.

Yi, Paul H; Wei, Jinchi; Kim, Tae Kyung; Shin, Jiwon; Sair, Haris I; Hui, Ferdinand K; Hager, Gregory D; Lin, Cheng Ting.

Emerg Radiol ; 28(5): 949-954, 2021 Oct.

Article in English | MEDLINE | ID: mdl-34089126

ABSTRACT

PURPOSE: To develop and test the performance of deep convolutional neural networks (DCNNs) for automated classification of age and sex on chest radiographs (CXR). METHODS: We obtained 112,120 frontal CXRs from the NIH ChestX-ray14 database performed in 48,780 females (44%) and 63,340 males (56%) ranging from 1 to 95 years old. The dataset was split into training (70%), validation (10%), and test (20%) datasets, and used to fine-tune ResNet-18 DCNNs pretrained on ImageNet for (1) determination of sex (using entire dataset and only pediatric CXRs); (2) determination of age < 18 years old or ≥ 18 years old (using entire dataset); and (3) determination of age < 11 years old or 11-18 years old (using only pediatric CXRs). External testing was performed on 662 CXRs from China. Area under the receiver operating characteristic curve (AUC) was used to evaluate DCNN test performance. RESULTS: DCNNs trained to determine sex on the entire dataset and pediatric CXRs only had AUCs of 1.0 and 0.91, respectively (p < 0.0001). DCNNs trained to determine age < or ≥ 18 years old and < 11 vs. 11-18 years old had AUCs of 0.99 and 0.96 (p < 0.0001), respectively. External testing showed AUC of 0.98 for sex (p = 0.01) and 0.91 for determining age < or ≥ 18 years old (p < 0.001). CONCLUSION: DCNNs can accurately predict sex from CXRs and distinguish between adult and pediatric patients in both American and Chinese populations. The ability to glean demographic information from CXRs may aid forensic investigations, as well as help identify novel anatomic landmarks for sex and age.

Subject(s)

Deep Learning , Radiology , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Female , Humans , Infant , Male , Middle Aged , Neural Networks, Computer , Radiography , Radiography, Thoracic , Young Adult

15.

DeepCAT: Deep Computer-Aided Triage of Screening Mammography.

Yi, Paul H; Singh, Dhananjay; Harvey, Susan C; Hager, Gregory D; Mullen, Lisa A.

J Digit Imaging ; 34(1): 27-35, 2021 02.

Article in English | MEDLINE | ID: mdl-33432446

ABSTRACT

Although much deep learning research has focused on mammographic detection of breast cancer, relatively little attention has been paid to mammography triage for radiologist review. The purpose of this study was to develop and test DeepCAT, a deep learning system for mammography triage based on suspicion of cancer. Specifically, we evaluate DeepCAT's ability to provide two augmentations to radiologists: (1) discarding images unlikely to have cancer from radiologist review and (2) prioritization of images likely to contain cancer. We used 1878 2D-mammographic images (CC & MLO) from the Digital Database for Screening Mammography to develop DeepCAT, a deep learning triage system composed of 2 components: (1) mammogram classifier cascade and (2) mass detector, which are combined to generate an overall priority score. This priority score is used to order images for radiologist review. Of 595 testing images, DeepCAT recommended low priority for 315 images (53%), of which none contained a malignant mass. In evaluation of prioritizing images according to likelihood of containing cancer, DeepCAT's study ordering required an average of 26 adjacent swaps to obtain perfect review order. Our results suggest that DeepCAT could substantially increase efficiency for breast imagers and effectively triage review of mammograms with malignant masses.

Subject(s)

Breast Neoplasms , Mammography , Breast Neoplasms/diagnostic imaging , Computers , Early Detection of Cancer , Female , Humans , Triage

16.

Localization and Control of Magnetic Suture Needles in Cluttered Surgical Site with Blood and Tissue.

Pryor, Will; Barnoy, Yotam; Raval, Suraj; Liu, Xiaolong; Mair, Lamar; Lerner, Daniel; Erin, Onder; Hager, Gregory D; Diaz-Mercado, Yancy; Krieger, Axel.

Rep U S ; 2021: 524-531, 2021.

Article in English | MEDLINE | ID: mdl-35223133

ABSTRACT

Real-time visual localization of needles is necessary for various surgical applications, including surgical automation and visual feedback. In this study we investigate localization and autonomous robotic control of needles in the context of our magneto-suturing system. Our system holds the potential for surgical manipulation with the benefit of minimal invasiveness and reduced patient side effects. However, the nonlinear magnetic fields produce unintuitive forces and demand delicate position-based control that exceeds the capabilities of direct human manipulation. This makes automatic needle localization a necessity. Our localization method combines neural network-based segmentation and classical techniques, and we are able to consistently locate our needle with 0.73 mm RMS error in clean environments and 2.72 mm RMS error in challenging environments with blood and occlusion. The average localization RMS error is 2.16 mm for all environments we used in the experiments. We combine this localization method with our closed-loop feedback control system to demonstrate the further applicability of localization to autonomous control. Our needle is able to follow a running suture path in (1) no blood, no tissue; (2) heavy blood, no tissue; (3) no blood, with tissue; and (4) heavy blood, with tissue environments. The tip position tracking error ranges from 2.6 mm to 3.7 mm RMS, opening the door towards autonomous suturing tasks.

17.

Deep Learning Detection of Sea Fan Neovascularization From Ultra-Widefield Color Fundus Photographs of Patients With Sickle Cell Hemoglobinopathy.

Cai, Sophie; Parker, Felix; Urias, Muller G; Goldberg, Morton F; Hager, Gregory D; Scott, Adrienne W.

JAMA Ophthalmol ; 139(2): 206-213, 2021 02 01.

Article in English | MEDLINE | ID: mdl-33377944

ABSTRACT

Importance: Adherence to screening for vision-threatening proliferative sickle cell retinopathy is limited among patients with sickle cell hemoglobinopathy despite guidelines recommending dilated fundus examinations beginning in childhood. An automated algorithm for detecting sea fan neovascularization from ultra-widefield color fundus photographs could expand access to rapid retinal evaluations to identify patients at risk of vision loss from proliferative sickle cell retinopathy. Objective: To develop a deep learning system for detecting sea fan neovascularization from ultra-widefield color fundus photographs from patients with sickle cell hemoglobinopathy. Design, Setting, and Participants: In a cross-sectional study conducted at a single-institution, tertiary academic referral center, deidentified, retrospectively collected, ultra-widefield color fundus photographs from 190 adults with sickle cell hemoglobinopathy were independently graded by 2 masked retinal specialists for presence or absence of sea fan neovascularization. A third masked retinal specialist regraded images with discordant or indeterminate grades. Consensus retinal specialist reference standard grades were used to train a convolutional neural network to classify images for presence or absence of sea fan neovascularization. Participants included nondiabetic adults with sickle cell hemoglobinopathy receiving care from a Wilmer Eye Institute retinal specialist; the patients had received no previous laser or surgical treatment for sickle cell retinopathy and underwent imaging with ultra-widefield color fundus photographs between January 1, 2012, and January 30, 2019. Interventions: Deidentified ultra-widefield color fundus photographs were retrospectively collected. Main Outcomes and Measures: Sensitivity, specificity, and area under the receiver operating characteristic curve of the convolutional neural network for sea fan detection. Results: A total of 1182 images from 190 patients were included. Of the 190 patients, 101 were women (53.2%), and the mean (SD) age at baseline was 36.2 (12.3) years; 119 patients (62.6%) had hemoglobin SS disease and 46 (24.2%) had hemoglobin SC disease. One hundred seventy-nine patients (94.2%) were of Black or African descent. Images with sea fan neovascularization were obtained in 57 patients (30.0%). The convolutional neural network had an area under the curve of 0.988 (95% CI, 0.969-0.999), with sensitivity of 97.4% (95% CI, 86.5%-99.9%) and specificity of 97.0% (95% CI, 93.5%-98.9%) for detecting sea fan neovascularization from ultra-widefield color fundus photographs. Conclusions and Relevance: This study reports an automated system with high sensitivity and specificity for detecting sea fan neovascularization from ultra-widefield color fundus photographs from patients with sickle cell hemoglobinopathy, with potential applications for improving screening for vision-threatening proliferative sickle cell retinopathy.

Subject(s)

Anemia, Sickle Cell/complications , Deep Learning , Fluorescein Angiography , Image Interpretation, Computer-Assisted , Photography , Retinal Neovascularization/diagnostic imaging , Retinal Vessels/diagnostic imaging , Adult , Anemia, Sickle Cell/diagnosis , Cross-Sectional Studies , Female , Humans , Male , Middle Aged , Observer Variation , Pattern Recognition, Automated , Predictive Value of Tests , Reproducibility of Results , Retinal Neovascularization/etiology , Retrospective Studies , Young Adult

18.

Impact of data on generalization of AI for surgical intelligence applications.

Bar, Omri; Neimark, Daniel; Zohar, Maya; Hager, Gregory D; Girshick, Ross; Fried, Gerald M; Wolf, Tamir; Asselmann, Dotan.

Sci Rep ; 10(1): 22208, 2020 12 17.

Article in English | MEDLINE | ID: mdl-33335191

ABSTRACT

AI is becoming ubiquitous, revolutionizing many aspects of our lives. In surgery, it is still a promise. AI has the potential to improve surgeon performance and impact patient care, from post-operative debrief to real-time decision support. But, how much data is needed by an AI-based system to learn surgical context with high fidelity? To answer this question, we leveraged a large-scale, diverse, cholecystectomy video dataset. We assessed surgical workflow recognition and report a deep learning system, that not only detects surgical phases, but does so with high accuracy and is able to generalize to new settings and unseen medical centers. Our findings provide a solid foundation for translating AI applications from research to practice, ushering in a new era of surgical intelligence.

19.

Refining dataset curation methods for deep learning-based automated tuberculosis screening.

Kim, Tae Kyung; Yi, Paul H; Hager, Gregory D; Lin, Cheng Ting.

J Thorac Dis ; 12(9): 5078-5085, 2020 Sep.

Article in English | MEDLINE | ID: mdl-33145084

ABSTRACT

BACKGROUND: The study objective was to determine whether unlabeled datasets can be used to further train and improve the accuracy of a deep learning system (DLS) for the detection of tuberculosis (TB) on chest radiographs (CXRs) using a two-stage semi-supervised approach. METHODS: A total of 111,622 CXRs from the National Institute of Health ChestX-ray14 database were collected. A cardiothoracic radiologist reviewed a subset of 11,000 CXRs and dichotomously labeled each for the presence or absence of potential TB findings; these interpretations were used to train a deep convolutional neural network (DCNN) to identify CXRs with possible TB (Phase I). The best performing algorithm was then used to label the remaining database consisting of 100,622 radiographs; subsequently, these newly-labeled images were used to train a second DCNN (phase II). The best-performing algorithm from phase II (TBNet) was then tested against CXRs obtained from 3 separate sites (2 from the USA, 1 from China) with clinically confirmed cases of TB. Receiver operating characteristic (ROC) curves were generated with area under the curve (AUC) calculated. RESULTS: The phase I algorithm trained using 11,000 expert-labelled radiographs achieved an AUC of 0.88. The phase II algorithm trained on images labeled by the phase I algorithm achieved an AUC of 0.91 testing against a TB dataset obtained from Shenzhen, China and Montgomery County, USA. The algorithm generalized well to radiographs obtained from a tertiary care hospital, achieving an AUC of 0.87; TBNet's sensitivity, specificity, positive predictive value, and negative predictive value were 85%, 76%, 0.64, and 0.9, respectively. When TBNet was used to arbitrate discrepancies between 2 radiologists, the overall sensitivity reached 94% and negative predictive value reached 0.96, demonstrating a synergistic effect between the algorithm's output and radiologists' interpretations. CONCLUSIONS: Using semi-supervised learning, we trained a deep learning algorithm that detected TB at a high accuracy and demonstrated value as a CAD tool by identifying relevant CXR findings, especially in cases that were misinterpreted by radiologists. When dataset labels are noisy or absent, the described methods can significantly reduce the required amount of curated data to build clinically-relevant deep learning models, which will play an important role in the era of precision medicine.

20.

Deep hiearchical multi-label classification applied to chest X-ray abnormality taxonomies.

Chen, Haomin; Miao, Shun; Xu, Daguang; Hager, Gregory D; Harrison, Adam P.

Med Image Anal ; 66: 101811, 2020 12.

Article in English | MEDLINE | ID: mdl-32937229

ABSTRACT

Chest X-rays (CXRs) are a crucial and extraordinarily common diagnostic tool, leading to heavy research for computer-aided diagnosis (CAD) solutions. However, both high classification accuracy and meaningful model predictions that respect and incorporate clinical taxonomies are crucial for CAD usability. To this end, we present a deep hierarchical multi-label classification (HMLC) approach for CXR CAD. Different than other hierarchical systems, we show that first training the network to model conditional probability directly and then refining it with unconditional probabilities is key in boosting performance. In addition, we also formulate a numerically stable cross-entropy loss function for unconditional probabilities that provides concrete performance improvements. Finally, we demonstrate that HMLC can be an effective means to manage missing or incomplete labels. To the best of our knowledge, we are the first to apply HMLC to medical imaging CAD. We extensively evaluate our approach on detecting abnormality labels from the CXR arm of the Prostate, Lung, Colorectal and Ovarian (PLCO) dataset, which comprises over 198,000 manually annotated CXRs. When using complete labels, we report a mean area under the curve (AUC) of 0.887, the highest yet reported for this dataset. These results are supported by ancillary experiments on the PadChest dataset, where we also report significant improvements, 1.2% and 4.1% in AUC and average precision, respectively over strong "flat" classifiers. Finally, we demonstrate that our HMLC approach can much better handle incompletely labelled data. These performance improvements, combined with the inherent usefulness of taxonomic predictions, indicate that our approach represents a useful step forward for CXR CAD.

Subject(s)

Lung , Tomography, X-Ray Computed , Diagnosis, Computer-Assisted , Humans , Lung/diagnostic imaging , Male , Radiography , X-Rays

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL