ABSTRACT
The last five years marked a surge in interest in and use of smart robots, which operate in dynamic and unstructured environments and might interact with humans. We posit that well-validated computer simulation can provide a virtual proving ground that in many cases is instrumental in understanding (safely, faster, at lower cost, and more thoroughly) how the robots of the future should be designed and controlled for safe operation and improved performance. Against this backdrop, we discuss how simulation can help in robotics, barriers that currently prevent its broad adoption, and potential steps that can eliminate some of these barriers. The points and recommendations made concern the following simulation-in-robotics aspects: simulation of the dynamics of the robot; simulation of the virtual world; simulation of the sensing of this virtual world; simulation of the interaction between the human and the robot; and, in less depth, simulation of the communication between robots. This Perspectives contribution summarizes the points of view that coalesced during a 2018 National Science Foundation/Department of Defense/National Institute for Standards and Technology workshop dedicated to the topic at hand. The meeting brought together participants from a range of organizations, disciplines, and application fields, with expertise at the intersection of robotics, machine learning, and physics-based simulation.
ABSTRACT
INTRODUCTION AND HYPOTHESIS: The objective was to study the effect of immediate pre-operative warm-up using virtual reality simulation on intraoperative robot-assisted laparoscopic hysterectomy (RALH) performance by gynecology trainees (residents and fellows). METHODS: We randomized the first non-emergent RALH of the day involving trainees to either warm-up or no warm-up. For cases assigned to warm-up, trainees performed a set of exercises on the da Vinci Skills Simulator immediately before the procedure. The supervising attending surgeon, who was not informed whether the trainee had been assigned to warm-up, assessed the trainee's performance using the Objective Structured Assessment of Technical Skill (OSATS) and the Global Evaluative Assessment of Robotic Skills (GEARS) immediately after each surgery. RESULTS: We randomized 66 cases and analyzed 58 cases (30 warm-up, 28 no warm-up), which involved 21 trainees. Attending surgeons rated trainees similarly irrespective of warm-up randomization, with mean (SD) OSATS composite scores of 22.6 (4.3; warm-up) vs 21.8 (3.4; no warm-up) and mean GEARS composite scores of 19.2 (3.8; warm-up) vs 18.8 (3.1; no warm-up). The difference in composite scores between warm-up and no warm-up was 0.34 (95% CI: -1.44, 2.13) and 0.34 (95% CI: -1.22, 1.90) for OSATS and GEARS, respectively. Also, we did not observe any significant differences in any of the component/subscale scores within OSATS and GEARS between cases assigned to warm-up and no warm-up. CONCLUSION: Performing a brief virtual reality-based warm-up before RALH did not significantly improve the intraoperative performance of the trainees.
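As a rough illustration of how a confidence interval for a between-arm difference of means can be computed from summary statistics like those reported above, the sketch below applies a Welch-style interval with a normal approximation. This is not the trial's covariate-adjusted analysis (which yielded the reported 0.34 difference), so it will not reproduce those exact numbers:

```python
import math

def welch_ci_95(m1, s1, n1, m2, s2, n2):
    """95% CI for the difference of two independent means.
    Uses the z-quantile 1.96; a t-quantile would be slightly wider."""
    diff = m1 - m2
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - 1.96 * se, diff + 1.96 * se

# Unadjusted OSATS summary statistics from the text: mean (SD), group size.
lo, hi = welch_ci_95(22.6, 4.3, 30, 21.8, 3.4, 28)
print(f"unadjusted difference CI: ({lo:.2f}, {hi:.2f})")
```

An interval straddling zero, as here, is consistent with the trial's null finding.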
Subjects
Laparoscopy; Robotic Surgical Procedures; Robotics; Female; Humans; Computer Simulation; Hysterectomy; Clinical Competence
ABSTRACT
BACKGROUND: The growing interest in analysis of surgical video through machine learning has led to increased research efforts; however, common methods of annotating video data are lacking. There is a need to establish recommendations on the annotation of surgical video data to enable assessment of algorithms and multi-institutional collaboration. METHODS: Four working groups were formed from a pool of participants that included clinicians, engineers, and data scientists. The working groups were focused on four themes: (1) temporal models, (2) actions and tasks, (3) tissue characteristics and general anatomy, and (4) software and data structure. A modified Delphi process was utilized to create a consensus survey based on suggested recommendations from each of the working groups. RESULTS: After three Delphi rounds, consensus was reached on recommendations for annotation within each of these domains. A hierarchy for annotation of temporal events in surgery was established. CONCLUSIONS: While additional work remains to achieve accepted standards for video annotation in surgery, the consensus recommendations on a general framework for annotation presented here lay the foundation for standardization. This type of framework is critical to enabling diverse datasets, performance benchmarks, and collaboration.
Subjects
Machine Learning; Consensus; Delphi Technique; Humans; Surveys and Questionnaires
ABSTRACT
PURPOSE: To develop and test the performance of deep convolutional neural networks (DCNNs) for automated classification of age and sex on chest radiographs (CXR). METHODS: We obtained 112,120 frontal CXRs from the NIH ChestX-ray14 database performed in 48,780 females (44%) and 63,340 males (56%) ranging from 1 to 95 years old. The dataset was split into training (70%), validation (10%), and test (20%) datasets, and used to fine-tune ResNet-18 DCNNs pretrained on ImageNet for (1) determination of sex (using entire dataset and only pediatric CXRs); (2) determination of age < 18 years old or ≥ 18 years old (using entire dataset); and (3) determination of age < 11 years old or 11-18 years old (using only pediatric CXRs). External testing was performed on 662 CXRs from China. Area under the receiver operating characteristic curve (AUC) was used to evaluate DCNN test performance. RESULTS: DCNNs trained to determine sex on the entire dataset and pediatric CXRs only had AUCs of 1.0 and 0.91, respectively (p < 0.0001). DCNNs trained to determine age < or ≥ 18 years old and < 11 vs. 11-18 years old had AUCs of 0.99 and 0.96 (p < 0.0001), respectively. External testing showed AUC of 0.98 for sex (p = 0.01) and 0.91 for determining age < or ≥ 18 years old (p < 0.001). CONCLUSION: DCNNs can accurately predict sex from CXRs and distinguish between adult and pediatric patients in both American and Chinese populations. The ability to glean demographic information from CXRs may aid forensic investigations, as well as help identify novel anatomic landmarks for sex and age.
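The dataset preparation described above (a 70%/10%/20% train/validation/test split and binarized age targets) can be sketched as follows. This is a minimal, hypothetical stand-in for the study's pipeline; the fine-tuned ResNet-18 itself is omitted:

```python
import numpy as np

def split_70_10_20(n, seed=0):
    """Shuffle indices and split them 70/10/20 into train/val/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(0.7 * n)
    n_val = int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Hypothetical patient ages; the adult-vs-pediatric target is age >= 18.
ages = np.array([3, 10, 17, 18, 25, 64, 80, 95])
labels = (ages >= 18).astype(int)
train_idx, val_idx, test_idx = split_70_10_20(len(ages))
```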
Assuntos
Aprendizado Profundo , Radiologia , Adolescente , Adulto , Idoso , Idoso de 80 Anos ou mais , Criança , Pré-Escolar , Feminino , Humanos , Lactente , Masculino , Pessoa de Meia-Idade , Redes Neurais de Computação , Radiografia , Radiografia Torácica , Adulto JovemRESUMO
Although much deep learning research has focused on mammographic detection of breast cancer, relatively little attention has been paid to mammography triage for radiologist review. The purpose of this study was to develop and test DeepCAT, a deep learning system for mammography triage based on suspicion of cancer. Specifically, we evaluate DeepCAT's ability to provide two augmentations to radiologists: (1) discarding images unlikely to have cancer from radiologist review and (2) prioritization of images likely to contain cancer. We used 1878 2D-mammographic images (CC & MLO) from the Digital Database for Screening Mammography to develop DeepCAT, a deep learning triage system composed of 2 components: (1) mammogram classifier cascade and (2) mass detector, which are combined to generate an overall priority score. This priority score is used to order images for radiologist review. Of 595 testing images, DeepCAT recommended low priority for 315 images (53%), of which none contained a malignant mass. In evaluation of prioritizing images according to likelihood of containing cancer, DeepCAT's study ordering required an average of 26 adjacent swaps to obtain perfect review order. Our results suggest that DeepCAT could substantially increase efficiency for breast imagers and effectively triage review of mammograms with malignant masses.
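The "adjacent swaps" metric above is the bubble-sort distance: the minimum number of adjacent swaps needed to turn the system's ordering into the ideal highest-priority-first ordering, which equals the number of out-of-order pairs (inversions). A minimal sketch:

```python
def adjacent_swaps_to_sort(priorities):
    """Minimum adjacent swaps to reach descending (highest priority first)
    order; equals the number of inversions in the list."""
    swaps = 0
    for i in range(len(priorities)):
        for j in range(i + 1, len(priorities)):
            if priorities[i] < priorities[j]:  # a later image outranks an earlier one
                swaps += 1
    return swaps
```

A perfectly ordered review list scores 0; the study's reported average of 26 swaps is this quantity averaged over test orderings.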
Subjects
Breast Neoplasms; Mammography; Breast Neoplasms/diagnostic imaging; Computers; Early Detection of Cancer; Female; Humans; Triage
ABSTRACT
BACKGROUND: Deep learning (DL) has demonstrated human expert levels of performance for medical image classification in a wide array of medical fields, including ophthalmology. In this article, we present the results of our DL system designed to determine optic disc laterality, right eye vs left eye, in the presence of both normal and abnormal optic discs. METHODS: Using transfer learning, we modified the ResNet-152 deep convolutional neural network (DCNN), pretrained on ImageNet, to determine the optic disc laterality. After a 5-fold cross-validation, we generated receiver operating characteristic curves and corresponding area under the curve (AUC) values to evaluate performance. The data set consisted of 576 color fundus photographs (51% right and 49% left). Both 30° photographs centered on the optic disc (63%) and photographs with varying degree of optic disc centration and/or wider field of view (37%) were included. Both normal (27%) and abnormal (73%) optic discs were included. Various neuro-ophthalmological diseases were represented, such as, but not limited to, atrophy, anterior ischemic optic neuropathy, hypoplasia, and papilledema. RESULTS: Using 5-fold cross-validation (70% training; 10% validation; 20% testing), our DCNN for classifying right vs left optic disc achieved an average AUC of 0.999 (±0.002) with optimal threshold values, yielding an average accuracy of 98.78% (±1.52%), sensitivity of 98.60% (±1.72%), and specificity of 98.97% (±1.38%). When tested against a separate data set for external validation, our 5-fold cross-validation model achieved the following average performance: AUC 0.996 (±0.005), accuracy 97.2% (±2.0%), sensitivity 96.4% (±4.3%), and specificity 98.0% (±2.2%). CONCLUSIONS: Small data sets can be used to develop high-performing DL systems for semantic labeling of neuro-ophthalmology images, specifically in distinguishing between right and left optic discs, even in the presence of neuro-ophthalmological pathologies. 
Although this may seem like an elementary task, this study demonstrates the power of transfer learning and provides an example of a DCNN that can help curate large medical image databases for machine-learning purposes and facilitate ophthalmologist workflow by automatically labeling images according to laterality.
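The per-fold AUC values aggregated above can be computed without any library via the rank-sum (Mann-Whitney) identity: AUC is the probability that a randomly chosen positive example scores above a randomly chosen negative one. A sketch, with hypothetical scores standing in for DCNN outputs:

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney identity; ties count one half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: scores for right (1) vs left (0) optic disc photographs.
auc = roc_auc(np.array([0.9, 0.8, 0.2, 0.1]), np.array([1, 1, 0, 0]))
```

In k-fold cross-validation, this quantity is computed on each held-out fold and then averaged, yielding the mean and standard deviation reported above.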
Subjects
Algorithms; Deep Learning; Diagnostic Techniques, Ophthalmological; Machine Learning; Neurology; Ophthalmology; Optic Disk/diagnostic imaging; Optic Nerve Diseases/diagnosis; Humans; ROC Curve
ABSTRACT
OBJECTIVE: To develop and evaluate the performance of deep convolutional neural networks (DCNN) to detect and identify specific total shoulder arthroplasty (TSA) models. MATERIALS AND METHODS: We included 482 radiography studies obtained from publicly available image repositories with native shoulders, reverse TSA (RTSA) implants, and five different TSA models. We trained separate ResNet DCNN-based binary classifiers to (1) detect the presence of shoulder arthroplasty implants, (2) differentiate between TSA and RTSA, and (3) differentiate between the five TSA models, using an individual binary classifier for each model. Datasets were divided into training, validation, and test datasets. Training and validation datasets were 20-fold augmented. Test performances were assessed with area under the receiver-operating characteristic curve (AUC-ROC) analyses. Class activation mapping was used to identify distinguishing imaging features used for DCNN classification decisions. RESULTS: The DCNN for the detection of the presence of shoulder arthroplasty implants achieved an AUC-ROC of 1.0, whereas the AUC-ROC for differentiation between TSA and RTSA was 0.97. Class activation map analysis demonstrated the emphasis on the characteristic arthroplasty components in decision-making. DCNNs trained to distinguish between the five TSA models achieved AUC-ROCs ranging from 0.86 for Stryker Solar to 1.0 for Zimmer Bigliani-Flatow, with class activation map analysis demonstrating an emphasis on unique implant design features. CONCLUSION: DCNNs can accurately detect the presence of shoulder arthroplasty implants, distinguish between TSA and RTSA, and classify five specific TSA models with high accuracy. The proof of concept of these DCNNs may set the foundation for an automated arthroplasty atlas for rapid and comprehensive model identification.
Subjects
Arthroplasty, Replacement, Shoulder; Deep Learning; Humans; Neural Networks, Computer; ROC Curve; Radiography
ABSTRACT
BACKGROUND: An automated method for identifying the anatomical region of an image independent of metadata labels could improve radiologist workflow (e.g., automated hanging protocols) and help facilitate the automated curation of large medical imaging data sets for machine learning purposes. Deep learning is a potential tool for this purpose. OBJECTIVE: To develop and test the performance of deep convolutional neural networks (DCNN) for the automated classification of pediatric musculoskeletal radiographs by anatomical area. MATERIALS AND METHODS: We utilized a database of 250 pediatric bone radiographs (50 each of the shoulder, elbow, hand, pelvis and knee) to train 5 DCNNs, one to detect each anatomical region amongst the others, based on ResNet-18 pretrained on ImageNet (transfer learning). For each DCNN, the radiographs were randomly split into training (64%), validation (12%) and test (24%) data sets. The training and validation data sets were augmented 30 times using standard preprocessing methods. We also tested our DCNNs on a separate test set of 100 radiographs from a single institution. Receiver operating characteristic (ROC) curves with area under the curve (AUC) were used to evaluate DCNN performances. RESULTS: All five DCNNs trained to classify the radiographs by anatomical region achieved an ROC AUC of 1 on both test sets. Classification of the test radiographs occurred at a rate of 33 radiographs per second. CONCLUSION: DCNNs trained on a small set of images with 30-fold augmentation through standard preprocessing techniques are able to automatically classify pediatric musculoskeletal radiographs into anatomical region with near-perfect to perfect accuracy at superhuman speeds. This concept may apply to other body parts and radiographic views with the potential to create an all-encompassing semantic-labeling DCNN.
Subjects
Deep Learning; Musculoskeletal Diseases/diagnostic imaging; Neural Networks, Computer; Radiography/methods; Adolescent; Area Under Curve; Automation; Child; Child, Preschool; Clinical Competence; Databases, Factual; Female; Humans; Machine Learning; Male; Musculoskeletal Diseases/classification; ROC Curve; Radiologists/statistics & numerical data; Retrospective Studies; Semantics; Workflow
ABSTRACT
Machine learning has several potential uses in medical imaging for semantic labeling of images to improve radiologist workflow and to triage studies for review. The purpose of this study was to (1) develop deep convolutional neural networks (DCNNs) for automated classification of 2D mammography views, determination of breast laterality, and assessment of breast tissue density; and (2) compare the performance of DCNNs on these tasks of varying complexity to each other. We obtained 3034 2D-mammographic images from the Digital Database for Screening Mammography, annotated with mammographic view, image laterality, and breast tissue density. These images were used to train a DCNN to classify images for these three tasks. The DCNN trained to classify mammographic view achieved receiver-operating-characteristic (ROC) area under the curve (AUC) of 1. The DCNN trained to classify breast image laterality initially misclassified right and left breasts (AUC 0.75); however, after discontinuing horizontal flips during data augmentation, AUC improved to 0.93 (p < 0.0001). Breast density classification proved more difficult, with the DCNN achieving 68% accuracy. Automated semantic labeling of 2D mammography is feasible using DCNNs and can be performed with small datasets. However, automated classification of differences in breast density is more difficult, likely requiring larger datasets.
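The laterality failure described above is a classic augmentation pitfall: a horizontal flip turns a left-breast image into one that looks like a right breast while its training label still says left. A toy numpy illustration, where a bright left-most column stands in for left-sided anatomy (hypothetical marker, not real image content):

```python
import numpy as np

# Toy single-channel "mammogram" whose left-most column encodes laterality.
img = np.zeros((4, 4))
img[:, 0] = 1.0
label = "L"

flipped = np.fliplr(img)  # horizontal-flip augmentation
# The marker is now in the right-most column: the pixels look like an "R"
# image, but the label is still "L". This label corruption is why horizontal
# flips were dropped from the augmentation pipeline.
marker_now_right = flipped[:, -1].sum() == 4.0
```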
Subjects
Breast Neoplasms/diagnostic imaging; Deep Learning; Mammography/methods; Radiographic Image Interpretation, Computer-Assisted/methods; Semantics; Breast/diagnostic imaging; Female; Humans; Machine Learning
ABSTRACT
Ensuring correct radiograph view labeling is important for machine learning algorithm development and quality control of studies obtained from multiple facilities. The purpose of this study was to develop and test the performance of a deep convolutional neural network (DCNN) for the automated classification of frontal chest radiographs (CXRs) into anteroposterior (AP) or posteroanterior (PA) views. We obtained 112,120 CXRs from the NIH ChestX-ray14 database, a publicly available database of CXRs from adult (106,179 (95%)) and pediatric (5941 (5%)) patients, consisting of 44,810 (40%) AP and 67,310 (60%) PA views. CXRs were used to train, validate, and test the ResNet-18 DCNN for classification of radiographs into anteroposterior and posteroanterior views. A second DCNN was developed in the same manner using only the pediatric CXRs (2885 (49%) AP and 3056 (51%) PA). Receiver operating characteristic (ROC) curves with area under the curve (AUC) and standard diagnostic measures were used to evaluate each DCNN's performance on the test dataset. The DCNNs trained on the entire CXR dataset and pediatric CXR dataset had AUCs of 1.0 and 0.997, respectively, and accuracy of 99.6% and 98%, respectively, for distinguishing between AP and PA CXR. Sensitivity and specificity were 99.6% and 99.5%, respectively, for the DCNN trained on the entire dataset, and 98% for both sensitivity and specificity for the DCNN trained on the pediatric dataset. The observed difference in performance between the two algorithms was not statistically significant (p = 0.17). Our DCNNs have high accuracy for classifying AP/PA orientation of frontal CXRs, with only slight reduction in performance when the training dataset was reduced by 95%. Rapid classification of CXRs by the DCNN can facilitate annotation of large image datasets for machine learning and quality assurance purposes.
Subjects
Deep Learning; Radiographic Image Interpretation, Computer-Assisted/methods; Radiography, Thoracic/methods; Adult; Child; Databases, Factual; Humans; Reproducibility of Results; Retrospective Studies; Sensitivity and Specificity
ABSTRACT
Training skillful and competent surgeons is critical to ensure high quality of care and to minimize disparities in access to effective care. Traditional models to train surgeons are being challenged by rapid advances in technology, an intensified patient-safety culture, and a need for value-driven health systems. Simultaneously, technological developments are enabling capture and analysis of large amounts of complex surgical data. These developments are motivating a "surgical data science" approach to objective computer-aided technical skill evaluation (OCASE-T) for scalable, accurate assessment; individualized feedback; and automated coaching. We define the problem space for OCASE-T and summarize 45 publications representing recent research in this domain. We find that most studies on OCASE-T are simulation based; very few are in the operating room. The algorithms and validation methodologies used for OCASE-T are highly varied; there is no uniform consensus. Future research should emphasize competency assessment in the operating room, validation against patient outcomes, and effectiveness for surgical training.
Subjects
Algorithms; Clinical Competence; Operating Rooms/organization & administration; Surgeons/classification; Work Performance/classification
ABSTRACT
PURPOSE: To develop a computer-based image segmentation method for standardizing the quantification of geographic atrophy (GA). METHODS: The authors present an automated image segmentation method based on the fuzzy c-means clustering algorithm for the detection of GA lesions. The method is evaluated by comparing computerized segmentation against outlines of GA drawn by an expert grader for a longitudinal series of fundus autofluorescence images with paired 30° color fundus photographs for 10 patients. RESULTS: The automated segmentation method showed excellent agreement with an expert grader for fundus autofluorescence images, achieving a performance level of 94 ± 5% sensitivity and 98 ± 2% specificity on a per-pixel basis for the detection of GA area, but performed less well on color fundus photographs with a sensitivity of 47 ± 26% and specificity of 98 ± 2%. The segmentation algorithm identified 75 ± 16% of the GA border correctly in fundus autofluorescence images compared with just 42 ± 25% for color fundus photographs. CONCLUSION: The results of this study demonstrate a promising computerized segmentation method that may enhance the reproducibility of GA measurement and provide an objective strategy to assist an expert in the grading of images.
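The fuzzy c-means step above can be sketched in a few lines for 1-D pixel intensities using the standard alternating updates (a generic implementation, not the authors' code): each pixel receives a soft membership in each cluster, and thresholding the lesion-cluster membership yields a segmentation.

```python
import numpy as np

def fuzzy_c_means(x, c=2, m=2.0, iters=50, seed=0):
    """Minimal 1-D fuzzy c-means: returns cluster centers and memberships."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)          # memberships sum to 1 per pixel
    for _ in range(iters):
        um = u ** m
        # Weighted cluster centers, then membership update from distances.
        centers = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u
```

On fundus autofluorescence intensities, thresholding the bright-cluster membership at 0.5 would give a candidate GA mask that can then be compared per-pixel against expert outlines for sensitivity and specificity.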
Subjects
Diagnostic Techniques, Ophthalmological; Geographic Atrophy/diagnosis; Image Processing, Computer-Assisted/methods; Aged; Aged, 80 and over; Algorithms; Disease Progression; False Positive Reactions; Female; Fovea Centralis/pathology; Humans; Male; Middle Aged; Optic Disk/pathology; Predictive Value of Tests; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
PURPOSE: Develop and evaluate the performance of a deep learning model (DLM) that forecasts eyes with low future visual field (VF) variability, and study the impact of using this DLM on sample size requirements for neuroprotective trials. DESIGN: Retrospective cohort and simulation study. METHODS: We included 1 eye per patient with baseline reliable VFs, OCT, clinical measures (demographics, intraocular pressure, and visual acuity), and 5 subsequent reliable VFs to forecast VF variability using DLMs and perform sample size estimates. We estimated sample size for 3 groups of eyes: all eyes (AE), low variability eyes (LVE: the subset of AE with a standard deviation of mean deviation [MD] slope residuals in the bottom 25th percentile), and DLM-predicted low variability eyes (DLPE: the subset of AE predicted to be low variability by the DLM). Deep learning models using only baseline VF/OCT/clinical data as input (DLM1), or also using a second VF (DLM2) were constructed to predict low VF variability (DLPE1 and DLPE2, respectively). Data were split 60/10/30 into train/val/test. Clinical trial simulations were performed only on the test set. We estimated the sample size necessary to detect treatment effects of 20% to 50% in MD slope with 80% power. Power was defined as the percentage of simulated clinical trials where the MD slope was significantly worse from the control. Clinical trials were simulated with visits every 3 months with a total of 10 visits. RESULTS: A total of 2817 eyes were included in the analysis. Deep learning models 1 and 2 achieved an area under the receiver operating characteristic curve of 0.73 (95% confidence interval [CI]: 0.68, 0.76) and 0.82 (95% CI: 0.78, 0.85) in forecasting low VF variability. When compared with including AE, using DLPE1 and DLPE2 reduced sample size to achieve 80% power by 30% and 38% for 30% treatment effect, and 31% and 38% for 50% treatment effect. 
CONCLUSIONS: Deep learning models can forecast eyes with low VF variability using data from a single baseline clinical visit. This can reduce sample size requirements, and potentially reduce the burden of future glaucoma clinical trials. FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
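The trial simulation above can be sketched as follows: draw per-eye MD slopes for a control and a treated arm, test for a difference, and estimate power as the fraction of simulated trials reaching significance. The multiplicative effect model and the one-sided z-test here are simplifying assumptions, not the paper's exact procedure; the sketch does show why lower slope variability raises power and therefore shrinks the required sample size:

```python
import numpy as np

def simulated_power(n_per_arm, slope_sd, effect=0.3, mu_control=-1.0,
                    n_trials=2000, seed=0):
    """Fraction of simulated trials in which the treated arm's mean MD slope
    is significantly less negative than control (one-sided z-test)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        control = rng.normal(mu_control, slope_sd, n_per_arm)
        treated = rng.normal(mu_control * (1 - effect), slope_sd, n_per_arm)
        se = np.sqrt(control.var(ddof=1) / n_per_arm
                     + treated.var(ddof=1) / n_per_arm)
        hits += (treated.mean() - control.mean()) / se > 1.645
    return hits / n_trials
```

Restricting enrollment to low-variability eyes corresponds to calling this with a smaller `slope_sd`, which raises power at fixed n, or equivalently lets a smaller n reach 80% power.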
Subjects
Deep Learning; Intraocular Pressure; Visual Fields; Humans; Visual Fields/physiology; Retrospective Studies; Intraocular Pressure/physiology; Female; Male; Clinical Trials as Topic; Glaucoma/physiopathology; Glaucoma/diagnosis; Visual Acuity/physiology; Aged; Visual Field Tests/methods; Middle Aged; Tomography, Optical Coherence/methods
ABSTRACT
PURPOSE: Monocular SLAM algorithms are the key enabling technology for image-based surgical navigation systems for endoscopic procedures. Due to the visual feature scarcity and unique lighting conditions encountered in endoscopy, classical SLAM approaches perform inconsistently. Many of the recent approaches to endoscopic SLAM rely on deep learning models. They show promising results when optimized on individual domains such as arthroscopy, sinus endoscopy, colonoscopy or laparoscopy, but are limited by an inability to generalize to different domains without retraining. METHODS: To address this generality issue, we propose OneSLAM, a monocular SLAM algorithm for surgical endoscopy that works out of the box for several endoscopic domains, including sinus endoscopy, colonoscopy, arthroscopy and laparoscopy. Our pipeline builds upon robust tracking-any-point (TAP) foundation models to reliably track sparse correspondences across multiple frames and runs local bundle adjustment to jointly optimize camera poses and a sparse 3D reconstruction of the anatomy. RESULTS: We compare the performance of our method against three strong baselines previously proposed for monocular SLAM in endoscopy and general scenes. OneSLAM presents better or comparable performance over existing approaches targeted to that specific data in all four tested domains, generalizing across domains without the need for retraining. CONCLUSION: OneSLAM benefits from the convincing performance of TAP foundation models but generalizes to endoscopic sequences of different anatomies, all while demonstrating better or comparable performance over domain-specific SLAM approaches. Future research on global loop closure will investigate how to reliably detect loops in endoscopic scenes to reduce accumulated drift and enhance long-term navigation capabilities.
Subjects
Algorithms; Endoscopy; Humans; Endoscopy/methods; Imaging, Three-Dimensional/methods; Surgery, Computer-Assisted/methods; Image Processing, Computer-Assisted/methods
ABSTRACT
OBJECTIVE: To estimate and adjust for rater effects in operating room surgical skills assessment performed using a structured rating scale for nasal septoplasty. METHODS: We analyzed survey responses from attending surgeons (raters) who supervised residents and fellows (trainees) performing nasal septoplasty in a prospective cohort study. We fit a structural equation model with the rubric item scores regressed on a latent component of skill and then fit a second model including the rating surgeon as a random effect to model a rater-effects-adjusted latent surgical skill. We validated this model against conventional measures including the level of expertise and post-graduation year (PGY) commensurate with the trainee's performance, the actual PGY of the trainee, and whether the surgical goals were achieved. RESULTS: Our dataset included 188 assessments by 7 raters and 41 trainees. The model with one latent construct for surgical skill and the rater as a random effect provided the best fit. Rubric scores depended on how severe or lenient the rater was, sometimes almost as much as they depended on trainee skill. Rater-adjusted latent skill scores increased with attending-estimated skill levels and PGY of trainees, increased with the actual PGY, and appeared constant over different levels of achievement of surgical goals. CONCLUSION: Our work provides a method to obtain rater-effects-adjusted surgical skill assessments in the operating room using structured rating scales. Our method allows for the creation of standardized (i.e., rater-effects-adjusted) quantitative surgical skill benchmarks using national-level databases on trainee assessments. LEVEL OF EVIDENCE: N/A Laryngoscope, 134:3548-3554, 2024.
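The rater-adjustment idea can be illustrated with a deliberately simplified numeric stand-in for the random-effects model described above: treat each score as trainee skill plus rater leniency, estimate both by alternating averages, and pin mean leniency at zero for identifiability. All data below are synthetic:

```python
import numpy as np

def adjust_for_raters(scores, raters, trainees, iters=20):
    """Alternating estimation for the additive model
    score ~ skill[trainee] + leniency[rater] (illustrative, not the paper's
    structural equation model)."""
    skill = np.zeros(trainees.max() + 1)
    leniency = np.zeros(raters.max() + 1)
    for _ in range(iters):
        for t in range(len(skill)):
            mask = trainees == t
            skill[t] = (scores[mask] - leniency[raters[mask]]).mean()
        for r in range(len(leniency)):
            mask = raters == r
            leniency[r] = (scores[mask] - skill[trainees[mask]]).mean()
        leniency -= leniency.mean()  # identifiability: mean leniency is zero
    return skill, leniency
```

With a severe rater (negative leniency) and a lenient one (positive leniency), the adjusted skill estimates recover the underlying trainee differences that raw rubric scores blur.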
Subjects
Clinical Competence; Internship and Residency; Operating Rooms; Humans; Operating Rooms/standards; Prospective Studies; Nasal Septum/surgery; Rhinoplasty/education; Rhinoplasty/standards; Surgeons/education; Surgeons/standards; Surgeons/statistics & numerical data; Surveys and Questionnaires; Female; Male
ABSTRACT
PURPOSE: Preoperative imaging plays a pivotal role in sinus surgery where CTs offer patient-specific insights of complex anatomy, enabling real-time intraoperative navigation to complement endoscopy imaging. However, surgery elicits anatomical changes not represented in the preoperative model, generating an inaccurate basis for navigation during surgery progression. METHODS: We propose a first vision-based approach to update the preoperative 3D anatomical model leveraging intraoperative endoscopic video for navigated sinus surgery where relative camera poses are known. We rely on comparisons of intraoperative monocular depth estimates and preoperative depth renders to identify modified regions. The new depths are integrated in these regions through volumetric fusion in a truncated signed distance function representation to generate an intraoperative 3D model that reflects tissue manipulation. RESULTS: We quantitatively evaluate our approach by sequentially updating models for a five-step surgical progression in an ex vivo specimen. We compute the error between correspondences from the updated model and ground-truth intraoperative CT in the region of anatomical modification. The resulting models show a decrease in error during surgical progression, as opposed to increasing when no update is employed. CONCLUSION: Our findings suggest that preoperative 3D anatomical models can be updated using intraoperative endoscopy video in navigated sinus surgery. Future work will investigate improvements to monocular depth estimation as well as removing the need for external navigation systems. The resulting ability to continuously update the patient model may provide surgeons with a more precise understanding of the current anatomical state and paves the way toward a digital twin paradigm for sinus surgery.
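The volumetric fusion step can be sketched along a single camera ray: each voxel stores a truncated signed distance and a weight, and each new depth observation is blended in by weighted averaging. This is a generic TSDF update under assumed parameters, not the authors' pipeline:

```python
import numpy as np

def fuse_depth(tsdf, weight, voxel_z, surface_z, trunc=0.05):
    """Integrate one depth observation into a 1-D TSDF along a camera ray.
    tsdf/weight: per-voxel state arrays; voxel_z: voxel depths along the ray;
    surface_z: observed surface depth for this ray."""
    sdf = np.clip(surface_z - voxel_z, -trunc, trunc)  # + in front, - behind
    near = np.abs(surface_z - voxel_z) < 2 * trunc     # update only near surface
    w_new = weight + near
    tsdf_new = np.where(near, (tsdf * weight + sdf) / np.maximum(w_new, 1), tsdf)
    return tsdf_new, w_new
```

When tissue is removed, later observations report a deeper surface, and repeated fusion shifts the TSDF zero crossing (the reconstructed surface) toward the new depth, which is how the intraoperative model comes to reflect the manipulation.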
Subjects
Endoscopy; Imaging, Three-Dimensional; Models, Anatomic; Surgery, Computer-Assisted; Tomography, X-Ray Computed; Imaging, Three-Dimensional/methods; Humans; Endoscopy/methods; Tomography, X-Ray Computed/methods; Surgery, Computer-Assisted/methods; Paranasal Sinuses/surgery; Paranasal Sinuses/diagnostic imaging
ABSTRACT
Performing micromanipulation and delicate operations in submillimeter workspaces is difficult because of destabilizing tremor and imprecise targeting. Accurate micromanipulation is especially important for microsurgical procedures, such as vitreoretinal surgery, to maximize successful outcomes and minimize collateral damage. Robotic aid combined with filtering techniques that suppress tremor frequency bands increases performance; however, if knowledge of the operator's goals is available, virtual fixtures have been shown to further improve performance. In this paper, we derive a virtual fixture framework for active handheld micromanipulators that is based on high-bandwidth position measurements rather than forces applied to a robot handle. For applicability in surgical environments, the fixtures are generated in real-time from microscope video during the procedure. Additionally, we develop motion scaling behavior around virtual fixtures as a simple and direct extension to the proposed framework. We demonstrate that virtual fixtures significantly outperform tremor cancellation algorithms on a set of synthetic tracing tasks (p < 0.05). In more medically relevant experiments of vein tracing and membrane peeling in eye phantoms, virtual fixtures can significantly reduce both positioning error and forces applied to tissue (p < 0.05).
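The motion scaling behavior around a line virtual fixture can be sketched as decomposing each commanded tip displacement into along-fixture and off-fixture components and attenuating the latter. The gain value is a hypothetical parameter, and this is a simplification of the paper's framework, not its algorithm:

```python
import numpy as np

def fixture_scaled_motion(delta, fixture_dir, off_axis_gain=0.2):
    """Attenuate motion perpendicular to a line virtual fixture.
    delta: commanded tip displacement; fixture_dir: fixture direction."""
    delta = np.asarray(delta, float)
    d = np.asarray(fixture_dir, float)
    d /= np.linalg.norm(d)
    along = np.dot(delta, d) * d   # preserved, task-aligned component
    across = delta - along         # tremor-prone, off-fixture component
    return along + off_axis_gain * across
```

Along the fixture (e.g., a vein being traced), motion passes through at full scale, while off-axis deviations, including tremor, are shrunk by the gain.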
ABSTRACT
Non-expert users can now program robots using various end-user robot programming methods, which have widened the use of robots and lowered the barriers that prevent laypeople from using them. Kinesthetic teaching is a common form of end-user robot programming that lets users forgo writing code by physically guiding the robot to demonstrate behaviors. Although it can be more accessible than writing code, kinesthetic teaching is difficult in practice because of users' unfamiliarity with kinematics or with the limitations of robots and programming interfaces. Developing good kinesthetic demonstrations requires physical and cognitive skills, such as the ability to plan effective grasps for different task objects and constraints, to overcome programming difficulties. How to help users learn these skills remains a largely unexplored question, with users conventionally learning through self-guided practice. Our study compares self-guided practice with curriculum-based training in building users' programming proficiency. While we found no significant differences between participants who learned through practice and those who learned through our curriculum, our study reveals insights into the factors contributing to end-user robot programmers' confidence and success during programming, and into how learning interventions may contribute to these factors. Our work paves the way for further research on how best to structure training interventions for end-user robot programmers.
Subjects
Robotics, Humans, Robotics/methods, Learning, Curriculum, Physical Examination, Biomechanical Phenomena
ABSTRACT
OBJECTIVE: Endoscopic surgery has a considerable learning curve due to dissociation of the visual-motor axes, coupled with decreased tactile feedback and mobility. In particular, endoscopic sinus surgery (ESS) lacks objective skill assessment metrics that provide specific feedback to trainees. This study aims to identify summary metrics from eye tracking, endoscope motion, and tool motion to objectively assess surgeons' ESS skill. METHODS: In this cross-sectional study, expert and novice surgeons performed ESS tasks of inserting an endoscope and tool into a cadaveric nose, touching an anatomical landmark, and withdrawing the endoscope and tool out of the nose. Tool and endoscope motion were collected using an electromagnetic tracker, and eye gaze was tracked using an infrared camera. Three expert surgeons provided binary assessments of low/high skill. Twenty summary statistics were calculated for eye, tool, and endoscope motion and used in logistic regression models to predict surgical skill. RESULTS: Fourteen metrics (10 eye gaze, 2 tool motion, and 2 endoscope motion) were significantly different between surgeons with low and high skill. Models predicting skill for 6 of the 9 ESS tasks had an AUC > 0.95. A combined model across all tasks (AUC 0.95, PPV 0.93, NPV 0.89) included metrics from eye tracking data and endoscope motion, indicating that these metrics transfer across tasks. CONCLUSIONS: Eye gaze, endoscope, and tool motion data can provide an objective and accurate measurement of ESS surgical performance. Incorporating these algorithmic techniques intraoperatively could allow automated skill assessment for trainees learning endoscopic surgery. LEVEL OF EVIDENCE: N/A. Laryngoscope, 133:500-505, 2023.
Subjects
Eye-Tracking Technology, Surgeons, Humans, Cross-Sectional Studies, Endoscopy, Endoscopes, Clinical Competence
ABSTRACT
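The modeling step described in the preceding abstract, summarizing each motion or gaze recording into scalar metrics and feeding them to a logistic regression classifier, can be sketched as follows. The four toy metrics and the plain gradient-descent fit are illustrative stand-ins; the study's actual 20 statistics and fitting procedure are not reproduced here.

```python
import numpy as np

def summary_stats(signal):
    """Reduce a 1-D motion/gaze trace to scalar metrics (toy subset of four)."""
    v = np.diff(signal)  # frame-to-frame velocity
    return np.array([signal.mean(), signal.std(), np.abs(v).mean(), np.abs(v).max()])

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Logistic regression via batch gradient descent (no external deps)."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))      # predicted P(high skill)
        w -= lr * Xb.T @ (p - y) / len(y)      # gradient of the log-loss
    return w

def predict(w, X):
    """Probability of the high-skill class for new metric vectors."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))
```

In practice one would standardize the metrics, select features (e.g. the 14 discriminative ones), and evaluate with cross-validated AUC rather than training-set fit.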
PURPOSE: Recent advances in computer vision and machine learning have resulted in endoscopic video-based solutions for dense reconstruction of the anatomy. To effectively use these systems in surgical navigation, a reliable image-based technique is required to constantly track the endoscopic camera's position within the anatomy, despite frequent removal and re-insertion. In this work, we investigate the use of recent learning-based keypoint descriptors for six degree-of-freedom camera pose estimation in intraoperative endoscopic sequences and under changes in anatomy due to surgical resection. METHODS: Our method employs a dense structure-from-motion (SfM) reconstruction of the preoperative anatomy, obtained with a state-of-the-art patient-specific learning-based descriptor. During the reconstruction step, each estimated 3D point is associated with a descriptor. This information is employed in the intraoperative sequences to establish 2D-3D correspondences for Perspective-n-Point (PnP) camera pose estimation. We evaluate this method on six intraoperative sequences that include anatomical modifications, obtained from two cadaveric subjects. RESULTS: This approach yielded translation and rotation errors of 3.9 mm and 0.2 radians, respectively, with 21.86% of cameras localized, averaged over the six sequences. Compared with an additional learning-based descriptor (HardNet++), the selected descriptor achieves a better percentage of localized cameras with similar pose estimation performance. We further discuss potential error causes and limitations of the proposed approach. CONCLUSION: Patient-specific learning-based descriptors can relocalize images that are well distributed across the inspected anatomy, even where the anatomy is modified. However, camera relocalization in endoscopic sequences remains a persistently challenging problem, and future research is necessary to increase the robustness and accuracy of this technique.
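The 2D-3D correspondence step described above can be illustrated with a descriptor nearest-neighbor match under Lowe's ratio test; the surviving pairs would then be passed to a RANSAC-based PnP solver (e.g. OpenCV's `solvePnPRansac`) to recover the camera pose. This is a generic sketch with hypothetical names, not the paper's implementation:

```python
import numpy as np

def match_2d3d(query_desc, map_desc, map_pts3d, ratio=0.8):
    """Match intraoperative keypoint descriptors against the SfM map.

    query_desc: (Q, D) descriptors from the current frame;
    map_desc: (M, D) descriptors attached to the reconstructed 3D points;
    map_pts3d: (M, 3) the points themselves. Returns the indices of the
    query keypoints passing Lowe's ratio test and their matched 3D points.
    """
    # Pairwise Euclidean distances between query and map descriptors.
    d = np.linalg.norm(query_desc[:, None, :] - map_desc[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(d))
    # Keep a match only if it is clearly better than the runner-up.
    keep = d[rows, best] < ratio * d[rows, second]
    return np.nonzero(keep)[0], map_pts3d[best[keep]]
```

The ratio test discards ambiguous matches, which matters here because resected anatomy can make several map regions look alike to the descriptor.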