ABSTRACT
PURPOSE: Informative image selection in laryngoscopy has the potential to improve automatic data extraction, whether used alone for selective data storage and a faster review process, or in combination with other artificial intelligence (AI) detection or diagnosis models. This paper aims to demonstrate the feasibility of AI-based automatic selection of informative laryngoscopy frames, capable of running in real time to provide visual feedback that guides the otolaryngologist during the examination. METHODS: Several deep learning models were trained and tested on an internal dataset (n = 5147 images) and then tested on an external test set (n = 646 images) composed of both white light and narrow band images. Four videos were used to assess the real-time performance of the best-performing model. RESULTS: ResNet-50, pre-trained with the pretext strategy, reached precision = 95% vs. 97%, recall = 97% vs. 89%, and F1-score = 96% vs. 93% on the internal and external test sets, respectively (p = 0.062). The four testing videos are provided in the supplemental materials. CONCLUSION: The deep learning model demonstrated excellent performance in identifying diagnostically relevant frames within laryngoscopic videos. With its solid accuracy and real-time capabilities, the system is promising for deployment in a clinical setting, either autonomously for objective quality control or in conjunction with other algorithms within a comprehensive AI toolset aimed at enhancing tumor detection and diagnosis.
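A minimal sketch of the frame-classification setup described above, assuming PyTorch/torchvision; the pretext-task pre-training is not reproduced here (ImageNet weights stand in for it), and all names are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Start from ImageNet weights; the paper's pretext-task pre-training is a
# further self-supervised step not reproduced in this sketch.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)  # informative vs. uninformative
model = model.to(device)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def is_informative(frame_pil) -> bool:
    """Classify a single laryngoscopy frame (PIL image)."""
    model.eval()
    x = preprocess(frame_pil).unsqueeze(0).to(device)
    return model(x).argmax(dim=1).item() == 1
```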
Subjects
Deep Learning, Laryngoscopy, Humans, Laryngoscopy/methods, Video Recording, Feasibility Studies, Laryngeal Diseases/diagnosis, Laryngeal Diseases/diagnostic imaging
ABSTRACT
Recent advances in medical imaging have highlighted the critical importance of developing algorithms for individual vertebra segmentation on computed tomography (CT) scans. Essential for diagnostic accuracy and treatment planning in orthopaedics, neurosurgery and oncology, these algorithms face challenges in clinical implementation, including integration into healthcare systems. Consequently, our focus lies in exploring the application of knowledge distillation (KD) methods to train shallower networks capable of efficiently segmenting vertebrae in CT scans. This approach aims to reduce segmentation time, enhance suitability for emergency cases, and optimize computational and memory resource efficiency. Building upon prior research in the field, a two-step segmentation approach was employed. First, the spine's location was determined by predicting a heatmap indicating the probability of each voxel belonging to the spine. Subsequently, vertebrae were segmented iteratively from the top to the bottom of the CT volume over the located spine, using a memory instance to record the already segmented vertebrae. KD was implemented by training a teacher network with performance similar to that found in the literature and distilling this knowledge to a shallower network (student). Two KD methods were applied: (1) using the soft outputs of both networks and (2) matching logits. Two publicly available datasets, comprising 319 CT scans from 300 patients and a total of 611 cervical, 2387 thoracic, and 1507 lumbar vertebrae, were used. To ensure dataset balance and robustness, effective data augmentation methods were applied, including cleaning the memory instance to replicate the first vertebra segmentation. The teacher network achieved an average Dice similarity coefficient (DSC) of 88.22% and a Hausdorff distance (HD) of 7.71 mm, showing performance similar to other approaches in the literature. Through knowledge distillation from the teacher network, the student network's performance improved, with the average DSC increasing from 75.78% to 84.70% and the HD decreasing from 15.17 mm to 8.08 mm. Compared to other methods, our teacher network exhibited up to 99.09% fewer parameters, 90.02% faster inference time, 88.46% shorter total segmentation time, and an 89.36% lower associated carbon (CO2) emission rate. Our student network, in turn, featured 75.00% fewer parameters than our teacher, resulting in a 36.15% reduction in inference time, a 33.33% decrease in total segmentation time, and a 42.96% reduction in CO2 emissions. This study marks the first exploration of applying KD to the problem of individual vertebra segmentation in CT, demonstrating the feasibility of achieving performance comparable to existing methods using smaller neural networks.
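The two distillation objectives mentioned above can be sketched as follows, assuming PyTorch; the networks themselves and the supervised segmentation loss are placeholders, and the loss weights are illustrative:

```python
import torch
import torch.nn.functional as F

def kd_soft_loss(student_logits, teacher_logits, T=4.0):
    """(1) KL divergence between temperature-softened output distributions."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

def kd_logit_matching_loss(student_logits, teacher_logits):
    """(2) Directly match raw logits with mean squared error."""
    return F.mse_loss(student_logits, teacher_logits)

def total_loss(student_logits, teacher_logits, target, alpha=0.5):
    # Supervised term (cross-entropy against ground-truth voxel labels),
    # blended with the distillation term.
    ce = F.cross_entropy(student_logits, target)
    return alpha * ce + (1 - alpha) * kd_soft_loss(student_logits, teacher_logits)
```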
Subjects
Carbon Dioxide, Tomography, X-Ray Computed, Humans, Tomography, X-Ray Computed/methods, Neural Networks, Computer, Algorithms, Lumbar Vertebrae
ABSTRACT
OBJECTIVE: To investigate the potential of deep learning for automatically delineating (segmenting) the superficial extent of laryngeal cancer on endoscopic images and videos. METHODS: A retrospective study was conducted, extracting and annotating white light (WL) and Narrow-Band Imaging (NBI) frames to train a segmentation model (SegMENT-Plus). Two external datasets were used for validation. The model's performance was compared with that of two otolaryngology residents. In addition, the model was tested on real intraoperative laryngoscopy videos. RESULTS: A total of 3933 images of laryngeal cancer from 557 patients were used. The model achieved the following median values (interquartile range): Dice Similarity Coefficient (DSC) = 0.83 (0.70-0.90), Intersection over Union (IoU) = 0.83 (0.73-0.90), Accuracy = 0.97 (0.95-0.99), Inference Speed = 25.6 (25.1-26.1) frames per second. The external testing cohorts comprised 156 and 200 images. SegMENT-Plus performed similarly on all three datasets for DSC (p = 0.05) and IoU (p = 0.07). No significant differences were noticed when separately analyzing WL and NBI test images on DSC (p = 0.06) and IoU (p = 0.78), or when comparing the model with the two residents on DSC (p = 0.06) and IoU (Senior vs. SegMENT-Plus, p = 0.13; Junior vs. SegMENT-Plus, p = 1.00). CONCLUSION: SegMENT-Plus can accurately delineate laryngeal cancer boundaries in endoscopic images, with performance equal to that of two otolaryngology residents. The results on the two external datasets demonstrate excellent generalization capabilities. The computation speed of the model allowed its application to videolaryngoscopies, simulating real-time use. Clinical trials are needed to evaluate the role of this technology in surgical practice and in improving resection margins. LEVEL OF EVIDENCE: III Laryngoscope, 134:2826-2834, 2024.
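For reference, the two overlap metrics reported above (DSC and IoU) can be computed for binary masks as in this minimal NumPy sketch; function names are illustrative:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """DSC = 2|A∩B| / (|A|+|B|) for binary 0/1 masks."""
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """IoU = |A∩B| / |A∪B| for binary 0/1 masks."""
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (intersection + eps) / (union + eps)
```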
Subjects
Deep Learning, Laryngeal Neoplasms, Laryngoscopy, Narrow Band Imaging, Humans, Laryngoscopy/methods, Narrow Band Imaging/methods, Laryngeal Neoplasms/diagnostic imaging, Laryngeal Neoplasms/surgery, Laryngeal Neoplasms/pathology, Retrospective Studies, Video Recording, Male, Female, Middle Aged, Light, Aged
ABSTRACT
PURPOSE: In twin-to-twin transfusion syndrome (TTTS), abnormal vascular anastomoses in the monochorionic placenta can produce uneven blood flow between the two fetuses. In current practice, TTTS is treated surgically by closing the abnormal anastomoses using laser ablation. This surgery is minimally invasive and relies on fetoscopy. The limited field of view makes anastomosis identification a challenging task for the surgeon. METHODS: To tackle this challenge, we propose a learning-based framework for in vivo fetoscopy frame registration for field-of-view expansion. The novelty of this framework lies in a learning-based keypoint proposal network and an encoding strategy to filter (i) irrelevant keypoints, based on fetoscopic semantic image segmentation, and (ii) inconsistent homographies. RESULTS: We validated our framework on a dataset of six intraoperative sequences from six TTTS surgeries in six different women against the most recent state-of-the-art algorithm, which relies on the segmentation of placental vessels. CONCLUSION: The proposed framework achieves higher performance than the state of the art, paving the way for robust mosaicking to provide surgeons with context awareness during TTTS surgery.
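A simplified sketch of homography-based frame registration with outlier filtering, assuming OpenCV; ORB stands in for the paper's learned keypoint proposal network, and the segmentation-based keypoint filter is omitted:

```python
import cv2
import numpy as np

def register_frames(frame_a, frame_b):
    """Estimate the homography mapping frame_a onto frame_b (grayscale images)."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects inconsistent correspondences, analogous in spirit to the
    # paper's filtering of irrelevant keypoints and inconsistent homographies.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H, inlier_mask
```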
Subjects
Fetofetal Transfusion, Laser Therapy, Pregnancy, Female, Humans, Fetoscopy/methods, Fetofetal Transfusion/diagnostic imaging, Fetofetal Transfusion/surgery, Placenta/surgery, Placenta/blood supply, Laser Therapy/methods, Algorithms
ABSTRACT
Fetoscopic laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS). The procedure involves photocoagulating pathological anastomoses to restore a physiological blood exchange between the twins. The procedure is particularly challenging for the surgeon due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility caused by amniotic fluid turbidity, and variability in illumination. These challenges may lead to increased surgery time and incomplete ablation of pathological anastomoses, resulting in persistent TTTS. Computer-assisted intervention (CAI) can provide TTTS surgeons with decision support and context awareness by identifying key structures in the scene and expanding the fetoscopic field of view through video mosaicking. Research in this domain has been hampered by the lack of high-quality data to design, develop and test CAI algorithms. Through the Fetoscopic Placental Vessel Segmentation and Registration (FetReg2021) challenge, organized as part of the MICCAI2021 Endoscopic Vision (EndoVis) challenge, we released the first large-scale multi-center TTTS dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms, with a focus on creating drift-free mosaics from long-duration fetoscopy videos. For this challenge, we released a dataset of 2060 images, pixel-annotated for vessel, tool, fetus and background classes, from 18 in-vivo TTTS fetoscopy procedures, together with 18 short video clips of an average length of 411 frames, for developing placental scene segmentation and frame registration techniques for mosaicking. Seven teams participated in this challenge, and their model performance was assessed on an unseen test dataset of 658 pixel-annotated images from 6 fetoscopic procedures and 6 short clips. For the segmentation task, the baseline was the overall top performer (aggregated mIoU of 0.6763) and was the best on the vessel class (mIoU of 0.5817), while team RREB was the best on the tool (mIoU of 0.6335) and fetus (mIoU of 0.5178) classes. For the registration task, the baseline performed better overall than team SANO, with an overall mean 5-frame SSIM of 0.9348. Qualitatively, it was observed that team SANO performed better in planar scenarios, while the baseline was better in non-planar scenarios. The detailed analysis showed that no single team outperformed the others on all 6 test fetoscopic videos. The challenge provided an opportunity to create generalized solutions for fetoscopic scene understanding and mosaicking. In this paper, we present the findings of the FetReg2021 challenge, alongside a detailed literature review of CAI in TTTS fetoscopy. Through this challenge, its analysis and the release of multi-center fetoscopic data, we provide a benchmark for future research in this field.
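A rough sketch of a 5-frame SSIM registration metric in the spirit of the one reported above, assuming OpenCV and scikit-image; the exact challenge protocol may differ, and border effects from warping are ignored here:

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim

def five_frame_ssim(frames, homographies):
    """frames: list of grayscale uint8 images; homographies[i]: H mapping frame i -> i+1."""
    scores = []
    for i in range(len(frames) - 5):
        H = np.eye(3)
        for j in range(i, i + 5):          # chain the pairwise homographies
            H = homographies[j] @ H
        h, w = frames[i].shape
        warped = cv2.warpPerspective(frames[i], H, (w, h))
        # Compare frame i, warped 5 steps ahead, against the actual frame i+5.
        scores.append(ssim(warped, frames[i + 5]))
    return float(np.mean(scores))
```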
Subjects
Fetofetal Transfusion, Placenta, Female, Humans, Pregnancy, Algorithms, Fetofetal Transfusion/diagnostic imaging, Fetofetal Transfusion/surgery, Fetofetal Transfusion/pathology, Fetoscopy/methods, Fetus, Placenta/diagnostic imaging
ABSTRACT
Vocal fold motility evaluation is paramount both in the assessment of functional deficits and in the accurate staging of neoplastic disease of the glottis. Diagnostic endoscopy, and in particular videoendoscopy, is currently the method through which motility is estimated. The clinical diagnosis, however, relies on the examination of the videoendoscopic frames, which is a subjective and clinician-dependent task. Hence, a more rigorous, objective, reliable, and repeatable method is needed. To support clinicians, this paper proposes a machine learning (ML) approach for vocal cord motility classification. From the endoscopic videos of 186 patients, covering both preserved vocal cord motility and fixation, a dataset of 558 images relative to the two classes was extracted. Subsequently, a number of features were retrieved from the images and used to train and test four well-established ML classifiers. On the test set, the best performance was achieved using XGBoost, with precision = 0.82, recall = 0.82, F1 score = 0.82, and accuracy = 0.82. After comparing the most relevant ML models, we believe that this approach could provide precise and reliable support to clinical evaluation. Clinical Relevance: This research represents an important advancement in the state of the art of computer-assisted otolaryngology, towards developing an effective tool for motility assessment in clinical practice.
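A minimal sketch of the best-performing classifier above (XGBoost) on a tabular feature matrix, assuming the xgboost and scikit-learn packages; the image-derived features and labels are replaced by random stand-ins:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
from xgboost import XGBClassifier

# Illustrative stand-ins for the 558 image-derived feature vectors and labels.
rng = np.random.default_rng(0)
features = rng.normal(size=(558, 24))
labels = rng.integers(0, 2, 558)  # 1 = vocal cord fixation, 0 = preserved motility

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=42)

clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("precision:", precision_score(y_test, y_pred))
print("recall:   ", recall_score(y_test, y_pred))
print("F1:       ", f1_score(y_test, y_pred))
print("accuracy: ", accuracy_score(y_test, y_pred))
```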
Subjects
Endoscopy, Vocal Cords, Humans, Vocal Cords/diagnostic imaging, Glottis, Videotape Recording, Machine Learning
ABSTRACT
Amyloidosis refers to a range of medical conditions in which misfolded proteins accumulate in various organs and tissues, forming insoluble fibrils. Cardiac amyloidosis is frequently linked to the buildup of misfolded transthyretin (TTR) or immunoglobulin light chains (AL). Delayed diagnosis, due to lack of disease awareness, results in a poor prognosis, especially in patients with AL amyloidosis. Early identification is therefore a key factor in improving patient outcomes. This study investigates the use of supervised machine-learning algorithms to support clinicians in distinguishing amyloidosis patients from control subjects. This work also aims to foster model interpretability by reporting the most important risk factors in predicting the presence of cardiac amyloidosis. We analyzed electronic health records (EHRs) of 418 participants acquired over a 12-year window as part of a case-control study conducted in the clinical practice of Fondazione Toscana Gabriele Monasterio (Italy). This work paves the way for the creation of digital health solutions that can aid in amyloidosis screening. The effective handling, analysis, and interpretation of such data can have a transformative effect on modern healthcare, offering new opportunities for improved patient care.
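One common way to surface the most important risk factors from tabular EHR data is permutation importance; a sketch assuming scikit-learn, with purely illustrative column names and a stand-in model (the study's actual features and algorithm are not specified here):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative stand-in for the EHR case-control table (418 subjects).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(418, 4)),
                 columns=["age", "nt_probnp", "ivs_thickness", "qrs_voltage"])
y = rng.integers(0, 2, 418)  # 1 = cardiac amyloidosis, 0 = control

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature on held-out data and measure the drop in score:
# larger drops indicate more influential risk factors.
imp = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
for name, score in sorted(zip(X.columns, imp.importances_mean),
                          key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```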
Subjects
Amyloidosis, Cardiomyopathies, Humans, Case-Control Studies, Electronic Health Records, Cardiomyopathies/diagnosis, Amyloidosis/diagnosis, Amyloidosis/metabolism, Machine Learning, Electronics
ABSTRACT
PURPOSE: Fetoscopic laser photocoagulation of placental anastomoses is the most effective treatment for twin-to-twin transfusion syndrome (TTTS). A robust mosaic of placenta and its vascular network could support surgeons' exploration of the placenta by enlarging the fetoscope field-of-view. In this work, we propose a learning-based framework for field-of-view expansion from intra-operative video frames. METHODS: While current state of the art for fetoscopic mosaicking builds upon the registration of anatomical landmarks which may not always be visible, our framework relies on learning-based features and keypoints, as well as robust transformer-based image-feature matching, without requiring any anatomical priors. We further address the problem of occlusion recovery and frame relocalization, relying on the computed features and their descriptors. RESULTS: Experiments were conducted on 10 in-vivo TTTS videos from two different fetal surgery centers. The proposed framework was compared with several state-of-the-art approaches, achieving higher [Formula: see text] on 7 out of 10 videos and a success rate of [Formula: see text] in occlusion recovery. CONCLUSION: This work introduces a learning-based framework for placental mosaicking with occlusion recovery from intra-operative videos using a keypoint-based strategy and features. The proposed framework can compute the placental panorama and recover even in case of camera tracking loss where other methods fail. The results suggest that the proposed framework has large potential to pave the way to creating a surgical navigation system for TTTS by providing robust field-of-view expansion.
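A sketch of transformer-based feature matching between two fetoscopic frames, assuming kornia's LoFTR implementation as a stand-in for the paper's matcher; occlusion recovery and relocalization are omitted, and the confidence threshold is illustrative:

```python
import torch
import kornia.feature as KF

# Pretrained transformer-based matcher (kornia's LoFTR weights).
matcher = KF.LoFTR(pretrained="outdoor").eval()

@torch.no_grad()
def match_frames(img0: torch.Tensor, img1: torch.Tensor):
    """img0, img1: grayscale tensors of shape (1, 1, H, W) with values in [0, 1]."""
    out = matcher({"image0": img0, "image1": img1})
    # Keep only confident correspondences before homography estimation.
    keep = out["confidence"] > 0.8
    return out["keypoints0"][keep], out["keypoints1"][keep]
```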
Subjects
Fetofetal Transfusion, Fetoscopy, Female, Humans, Pregnancy, Fetofetal Transfusion/surgery, Fetoscopy/methods, Light Coagulation, Placenta/surgery
ABSTRACT
Objective: To achieve instance segmentation of upper aerodigestive tract (UADT) neoplasms using a deep learning (DL) algorithm, and to identify differences in its diagnostic performance in three different sites: larynx/hypopharynx, oral cavity and oropharynx. Methods: A total of 1034 endoscopic images from 323 patients were examined under narrow band imaging (NBI). The Mask R-CNN algorithm was used for the analysis. The dataset split was: 935 training, 48 validation and 51 testing images. Dice Similarity Coefficient (Dsc) was the main outcome measure. Results: Instance segmentation was effective in 76.5% of images. The mean Dsc was 0.90 ± 0.05. The algorithm correctly predicted 77.8%, 86.7% and 55.5% of lesions in the larynx/hypopharynx, oral cavity, and oropharynx, respectively. The mean Dsc was 0.90 ± 0.05 for the larynx/hypopharynx, 0.60 ± 0.26 for the oral cavity, and 0.81 ± 0.30 for the oropharynx. The analysis showed inferior diagnostic results in the oral cavity compared with the larynx/hypopharynx (p < 0.001). Conclusions: The study confirms the feasibility of instance segmentation of UADT using DL algorithms and shows inferior diagnostic results in the oral cavity compared with other anatomic areas.
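A minimal sketch of adapting a Mask R-CNN for a two-class (background + lesion) instance segmentation task, assuming torchvision; this is the standard fine-tuning recipe, not the paper's exact configuration:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + UADT lesion
model = maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box and mask heads for the new number of classes.
in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, num_classes)

# Inference on a dummy image: returns per-instance boxes, labels, scores, masks.
model.eval()
with torch.no_grad():
    prediction = model([torch.rand(3, 512, 512)])[0]
```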
Subjects
Larynx, Neoplasms, Humans, Mouth, Hypopharynx, Algorithms
ABSTRACT
OBJECTIVE: Endoscopic and laryngoscopic examination is paramount for the evaluation of laryngeal, oropharyngeal, nasopharyngeal, nasal, and oral cavity benign lesions and cancers. Nevertheless, upper aerodigestive tract (UADT) endoscopy is intrinsically operator-dependent and lacks objective quality standards. At present, there is increased interest in artificial intelligence (AI) applications in this area to support physicians during the examination, thus enhancing diagnostic performance. The relative novelty of this research field poses a challenge for both reviewers and readers, as clinicians often lack a specific technical background. DATA SOURCES: Four bibliographic databases were searched: PubMed, EMBASE, Cochrane, and Google Scholar. REVIEW METHODS: A structured review of the current literature (up to September 2022) was performed. Search terms related to topics of AI, machine learning (ML), and deep learning (DL) in UADT endoscopy and laryngoscopy were identified and queried by 3 independent reviewers. Citations of selected studies were also evaluated to ensure comprehensiveness. CONCLUSIONS: Forty-one studies were included in the review. AI and computer vision techniques were used to achieve 3 fundamental tasks in this field: classification, detection, and segmentation. All papers were summarized and reviewed. IMPLICATIONS FOR PRACTICE: This article comprehensively reviews the latest developments in the application of ML and DL in UADT endoscopy and laryngoscopy, as well as their future clinical implications. The technical basis of AI is also explained, providing guidance for nonexpert readers to allow critical appraisal of the evaluation metrics and the most relevant quality requirements.
Subjects
Artificial Intelligence, Physicians, Humans, Endoscopy, Laryngoscopy, Machine Learning
ABSTRACT
Challenges have become the state-of-the-art approach to benchmarking image analysis algorithms in a comparative manner. While validation on identical data sets was a great step forward, results analysis is often restricted to pure ranking tables, leaving relevant questions unanswered. Specifically, little effort has been put into systematically investigating what characterizes images on which state-of-the-art algorithms fail. To address this gap in the literature, we (1) present a statistical framework for learning from challenges and (2) instantiate it for the specific task of instrument instance segmentation in laparoscopic videos. Our framework relies on semantic meta data annotation of images, which serves as the foundation for a general linear mixed model (GLMM) analysis. Based on 51,542 meta data annotations performed on 2,728 images, we applied our approach to the results of the Robust Medical Instrument Segmentation (ROBUST-MIS) challenge 2019 and revealed underexposure, motion and occlusion of instruments, as well as the presence of smoke or other objects in the background, as major sources of algorithm failure. Our subsequent method development, tailored to the specific remaining issues, yielded a deep learning model with state-of-the-art overall performance and specific strengths in the processing of images on which previous methods tended to fail. Due to the objectivity and generic applicability of our approach, it could become a valuable tool for validation in the field of medical image analysis and beyond.
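A sketch of a mixed-model analysis in the spirit described above, assuming statsmodels; the CSV file, formula terms, and grouping variable are illustrative, with a random intercept per acquisition (e.g., procedure or video):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical table: one row per image, with a performance score (e.g., dice),
# binary meta-data flags, and the video each frame came from.
df = pd.read_csv("challenge_results_with_metadata.csv")

# Linear mixed model: fixed effects for the image characteristics,
# random intercept per video to account for correlated frames.
model = smf.mixedlm(
    "dice ~ underexposure + motion_blur + occlusion + smoke",
    data=df,
    groups=df["video_id"],
)
result = model.fit()
print(result.summary())  # fixed-effect estimates flag the main failure sources
```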
Subjects
Algorithms, Laparoscopy, Humans, Image Processing, Computer-Assisted/methods
ABSTRACT
Artificial intelligence is increasingly seen as a useful tool in medicine. Specifically, these technologies aim to extract insights from complex datasets that cannot easily be analyzed by conventional statistical methods. While promising results have been obtained for various -omics datasets, radiological images, and histopathologic slides, the analysis of videoendoscopic frames still represents a major challenge. In this context, videomics represents a burgeoning field wherein several methods of computer vision are systematically used to organize unstructured data from frames obtained during diagnostic videoendoscopy. Recent studies have focused on five broad tasks with increasing complexity: quality assessment of endoscopic images, classification of pathologic and nonpathologic frames, detection of lesions inside frames, segmentation of pathologic lesions, and in-depth characterization of neoplastic lesions. Herein, we present a broad overview of the field, with a focus on conceptual key points and future perspectives.
ABSTRACT
Introduction: Narrow Band Imaging (NBI) is an endoscopic visualization technique useful for upper aero-digestive tract (UADT) cancer detection and margin evaluation. However, NBI analysis is strongly operator-dependent and requires high expertise, thus limiting its wider implementation. Recently, artificial intelligence (AI) has demonstrated potential for applications in UADT videoendoscopy. Among AI methods, deep learning (DL) algorithms, and especially convolutional neural networks (CNNs), are particularly suitable for delineating cancers on videoendoscopy. This study aimed to develop a CNN for automatic semantic segmentation of UADT cancer on endoscopic images. Materials and Methods: A dataset of white light and NBI videoframes of laryngeal squamous cell carcinoma (LSCC) was collected and manually annotated. A novel DL segmentation model (SegMENT) was designed. SegMENT relies on the DeepLabV3+ CNN architecture, modified to use Xception as a backbone and to incorporate ensemble features from other CNNs. The performance of SegMENT was compared to state-of-the-art CNNs (UNet, ResUNet, and DeepLabv3). SegMENT was then validated on two external datasets of NBI images of oropharyngeal (OPSCC) and oral cavity (OCSCC) squamous cell carcinoma obtained from a previously published study. The impact of in-domain transfer learning through an ensemble technique was evaluated on the external datasets. Results: 219 LSCC patients were retrospectively included in the study. A total of 683 videoframes composed the LSCC dataset, while the external validation cohorts of OPSCC and OCSCC contained 116 and 102 images, respectively. On the LSCC dataset, SegMENT outperformed the other DL models, obtaining the following median values: 0.68 intersection over union (IoU), 0.81 dice similarity coefficient (DSC), 0.95 recall, 0.78 precision, and 0.97 accuracy. On the OCSCC and OPSCC datasets, results were superior to previously published data: the median performance metrics improved, respectively, by DSC = 10.3% and 11.9%, recall = 15.0% and 5.1%, precision = 17.0% and 14.7%, and accuracy = 4.1% and 10.3%. Conclusion: SegMENT achieved promising performance, showing that automatic tumor segmentation in endoscopic images is feasible even within the highly heterogeneous and complex UADT environment. SegMENT outperformed the previously published results on the external validation cohorts. The model demonstrated potential for improved detection of early tumors, more precise biopsies, and better selection of resection margins.
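A sketch of a DeepLabV3+ network with an Xception encoder, assuming the segmentation_models_pytorch package as a stand-in for the customized SegMENT architecture (the ensemble-feature modification is omitted):

```python
import torch
import segmentation_models_pytorch as smp

model = smp.DeepLabV3Plus(
    encoder_name="xception",        # Xception backbone, as in SegMENT
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,                      # binary mask: tumor vs. background
)

# Forward pass on a dummy frame; apply a sigmoid for per-pixel probabilities.
x = torch.rand(1, 3, 512, 512)
with torch.no_grad():
    mask_logits = model(x)          # shape (1, 1, 512, 512)
```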
ABSTRACT
PURPOSE: Complications related to vascular damage such as intra-operative bleeding may be avoided during neurosurgical procedures such as petroclival meningioma surgery. To address this and improve the patient's safety, we designed a real-time blood vessel avoidance strategy that enables operation on deformable tissue during petroclival meningioma surgery using Micron, a handheld surgical robotic tool. METHODS: We integrated real-time intra-operative blood vessel segmentation of brain vasculature using deep learning, with a 3D reconstruction algorithm to obtain the vessel point cloud in real time. We then implemented a virtual-fixture-based strategy that prevented Micron's tooltip from entering a forbidden region around the vessel, thus avoiding damage to it. RESULTS: We achieved a median Dice similarity coefficient of 0.97, 0.86, 0.87 and 0.77 on datasets of phantom blood vessels, petrosal vein, internal carotid artery and superficial vessels, respectively. We conducted trials with deformable clay vessel phantoms, keeping the forbidden region 400 µm outside and 400 µm inside the vessel. Micron's tip entered the forbidden region with a median penetration of just 8.84 µm and 9.63 µm, compared to 148.74 µm and 117.17 µm without our strategy, for the former and latter trials, respectively. CONCLUSION: Real-time control of Micron was achieved at 33.3 fps. We achieved improvements in real-time segmentation of brain vasculature from intra-operative images and showed that our approach works even on non-stationary vessel phantoms. The results suggest that by enabling precise, real-time control, we are one step closer to using Micron in real neurosurgical procedures.
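The virtual-fixture idea can be sketched as a simple geometric constraint against the vessel point cloud; a NumPy illustration with assumed thresholds and a brute-force nearest-point query, not the actual Micron control loop:

```python
import numpy as np

FORBIDDEN_MARGIN_MM = 0.4  # 400 µm margin around the vessel surface (illustrative)

def constrain_tip(tip: np.ndarray, vessel_points: np.ndarray) -> np.ndarray:
    """Project a commanded tooltip position (3,) out of the forbidden region
    defined around a vessel point cloud (N, 3)."""
    d = np.linalg.norm(vessel_points - tip, axis=1)
    i = int(np.argmin(d))
    if d[i] >= FORBIDDEN_MARGIN_MM:
        return tip                       # outside the forbidden region: allow
    # Push the tip back along the line from the closest vessel point.
    direction = (tip - vessel_points[i]) / (d[i] + 1e-9)
    return vessel_points[i] + direction * FORBIDDEN_MARGIN_MM
```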
Subjects
Meningeal Neoplasms, Meningioma, Algorithms, Humans, Meningeal Neoplasms/diagnostic imaging, Meningeal Neoplasms/surgery, Meningioma/diagnostic imaging, Meningioma/surgery, Neurosurgical Procedures, Phantoms, Imaging
ABSTRACT
OBJECTIVES: To assess a new application of artificial intelligence for real-time detection of laryngeal squamous cell carcinoma (LSCC) in both white light (WL) and narrow-band imaging (NBI) videolaryngoscopies, based on the You-Only-Look-Once (YOLO) deep learning convolutional neural network (CNN). STUDY DESIGN: Experimental study with retrospective data. METHODS: Recorded videos of LSCC were retrospectively collected from in-office transnasal videoendoscopies and intraoperative rigid endoscopies. LSCC videoframes were extracted for training, validation, and testing of various YOLO models. Different techniques were used to enhance the image analysis: contrast limited adaptive histogram equalization, data augmentation techniques, and test time augmentation (TTA). The best-performing model was used to assess the automatic detection of LSCC in six videolaryngoscopies. RESULTS: Two hundred and nineteen patients were retrospectively enrolled. A total of 624 LSCC videoframes were extracted. The YOLO models were trained after random distribution of images into a training set (82.6%), validation set (8.2%), and testing set (9.2%). Among the various models, the ensemble algorithm (YOLOv5s with YOLOv5m-TTA) achieved the best LSCC detection results, with performance metrics on par with the results reported by other state-of-the-art detection models: 0.66 Precision (positive predictive value), 0.62 Recall (sensitivity), and 0.63 mean Average Precision at 0.5 intersection over union. Tests on the six videolaryngoscopies demonstrated an average computation time per videoframe of 0.026 seconds. Three demonstration videos are provided. CONCLUSION: This study identified a suitable CNN model for LSCC detection in WL and NBI videolaryngoscopies. Detection performance is highly promising. The limited complexity and quick computation times for LSCC detection make this model ideal for real-time processing. LEVEL OF EVIDENCE: 3 Laryngoscope, 132:1798-1806, 2022.
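A sketch of single-frame inference with a YOLOv5 model loaded through torch.hub, assuming the ultralytics/yolov5 repository; the weights path is hypothetical, and test-time augmentation is enabled via augment=True:

```python
import torch

# Load a custom-trained YOLOv5 model; the weights file name is illustrative.
model = torch.hub.load("ultralytics/yolov5", "custom", path="lscc_yolov5s.pt")
model.conf = 0.25  # confidence threshold for reported detections

results = model("videoframe.jpg", augment=True)  # TTA at inference time
detections = results.pandas().xyxy[0]            # xmin, ymin, xmax, ymax, conf, class
print(detections)
```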
Subjects
Deep Learning, Laryngeal Neoplasms, Laryngoscopes, Artificial Intelligence, Humans, Laryngeal Neoplasms/diagnostic imaging, Laryngoscopy, Narrow Band Imaging/methods, Retrospective Studies
ABSTRACT
Laser microsurgery is the current gold standard surgical technique for the treatment of selected diseases in delicate organs such as the larynx. However, the operations require considerable surgical expertise and dexterity, and face significant limitations imposed by available technology, such as the requirement for direct line of sight to the surgical field, restricted access, and direct manual control of the surgical instruments. To change this status quo, the European project µRALP pioneered research towards a complete redesign of current laser microsurgery systems, focusing on the development of robotic micro-technologies to enable endoscopic operations. This has fostered awareness and interest in this field, which presents a unique set of needs, requirements and constraints, leading to research and technological developments beyond µRALP and its research consortium. This paper reviews the achievements and key contributions of such research, providing an overview of the current state of the art in robot-assisted endoscopic laser microsurgery. The primary target application considered is phonomicrosurgery, which is a representative use case involving highly challenging microsurgical techniques for the treatment of glottic diseases. The paper starts by presenting the motivations and rationale for endoscopic laser microsurgery, which leads to the introduction of robotics as an enabling technology for improved surgical field accessibility, visualization and management. Then, research goals, achievements, and the current state of different technologies that can build up to an effective robotic system for endoscopic laser microsurgery are presented. This includes research in micro-robotic laser steering, flexible robotic endoscopes, augmented imaging, assistive surgeon-robot interfaces, and cognitive surgical systems. Innovations in each of these areas are shown to provide sizable progress towards more precise, safer and higher quality endoscopic laser microsurgeries. Yet, the major impact is expected from the full integration of such individual contributions into a complete clinical surgical robotic system, as illustrated at the end of this paper with a description of preliminary cadaver trials conducted with the integrated µRALP system. Overall, the contribution of this paper lies in outlining the current state of the art and open challenges in the area of robot-assisted endoscopic laser microsurgery, which has important clinical applications even beyond laryngology.
ABSTRACT
Isocitrate dehydrogenase (IDH) mutational status is pivotal in the management of gliomas. Patients with IDH-mutated (IDH-MUT) tumors have a better prognosis and benefit more from extended surgical resection than those with IDH wild-type (IDH-WT) tumors. Raman spectroscopy (RS) is a minimally invasive optical technique with great potential for intraoperative diagnosis. We evaluated the ability of RS to characterize IDH mutational status on unprocessed glioma biopsies. We extracted 2073 Raman spectra from thirty-eight unprocessed samples. The classification performance was assessed using eXtreme Gradient Boosted trees (XGB) and a Support Vector Machine with Radial Basis Function kernel (RBF-SVM). The measured Raman spectra displayed differences between IDH-MUT and IDH-WT tumor tissue. From the 103 Raman shifts screened as input features, the cross-validation loop identified 52 shifts with the highest performance in distinguishing the two groups. Raman analysis showed differences in the spectral features of lipids, collagen, DNA and cholesterol/phospholipids. We were able to distinguish between IDH-MUT and IDH-WT tumors with an accuracy and precision of 87%. RS is a valuable and accurate tool for characterizing the IDH mutational status of unprocessed glioma samples. This study advances RS knowledge towards future personalized surgical strategies or in situ targeted therapies for glioma tumors.
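A sketch of the spectral classification setup, assuming scikit-learn: an RBF-kernel SVM with univariate selection of 52 Raman shifts inside the cross-validation pipeline; the spectra and labels are random stand-ins, and the paper's exact selection loop may differ:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative stand-ins: 2073 spectra x 103 screened Raman shifts.
rng = np.random.default_rng(0)
X = rng.normal(size=(2073, 103))
y = rng.integers(0, 2, 2073)  # 1 = IDH-MUT, 0 = IDH-WT

# Selecting the 52 shifts inside the pipeline keeps the selection within each
# cross-validation fold, avoiding information leakage into the test folds.
clf = make_pipeline(StandardScaler(),
                    SelectKBest(f_classif, k=52),
                    SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(clf, X, y,
                         cv=StratifiedKFold(5, shuffle=True, random_state=0))
print("mean CV accuracy:", scores.mean())
```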
ABSTRACT
BACKGROUND AND OBJECTIVES: Fetal head-circumference (HC) measurement from ultrasound (US) images provides useful hints for assessing fetal growth. Such measurement is performed manually in actual clinical practice, posing issues of intra- and inter-clinician variability. This work presents a fully automatic, deep-learning-based approach to HC delineation, which we named Mask-R²CNN. It advances our previous work in the field and performs HC distance-field regression in an end-to-end fashion, without requiring a priori HC localization or any postprocessing for outlier removal. METHODS: Mask-R²CNN follows the Mask-RCNN architecture, with a backbone inspired by feature-pyramid networks, a region-proposal network and the ROI align. The Mask-RCNN segmentation head is modified here to regress the HC distance field. RESULTS: Mask-R²CNN was tested on the HC18 Challenge dataset, which consists of 999 training and 335 testing images. With a comprehensive ablation study, we showed that Mask-R²CNN achieved a mean absolute difference of 1.95 mm (standard deviation [Formula: see text] mm), outperforming other approaches in the literature. CONCLUSIONS: With this work, we proposed an end-to-end model for HC distance-field regression. With our experimental results, we showed that Mask-R²CNN may be an effective support for clinicians in assessing fetal growth.
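The distance-field regression target described above can be illustrated with a distance transform of the HC contour mask; a sketch assuming SciPy, with an arbitrary clipping/normalization (the paper's exact target definition may differ):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def hc_distance_field(hc_mask: np.ndarray) -> np.ndarray:
    """hc_mask: binary image where 1 marks the HC ellipse contour.
    Returns, for each pixel, the distance to the nearest contour pixel."""
    dist = distance_transform_edt(hc_mask == 0)
    # Clip and normalize so the network regresses a bounded field;
    # the 50-pixel cap is an illustrative choice.
    return np.clip(dist, 0, 50) / 50.0
```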
Subjects
Head, Image Processing, Computer-Assisted, Humans, Head/diagnostic imaging, Ultrasonography
ABSTRACT
INTRODUCTION: Fully convolutional neural networks (FCNN) applied to video analysis are of particular interest in the field of head and neck oncology, given that endoscopic examination is a crucial step in the diagnosis, staging, and follow-up of patients affected by upper aero-digestive tract cancers. The aim of this study was to test FCNN-based methods for semantic segmentation of squamous cell carcinoma (SCC) of the oral cavity (OC) and oropharynx (OP). MATERIALS AND METHODS: Two datasets were retrieved from the institutional registry of a tertiary academic hospital, analyzing 34 and 45 NBI endoscopic videos of OC and OP lesions, respectively. The OC dataset was composed of 110 frames, while 116 frames composed the OP dataset. Three FCNNs (U-Net, U-Net 3, and ResNet) were investigated to segment the neoplastic images. FCNN performance was evaluated for each tested network and compared to the gold standard, represented by manual annotation performed by expert clinicians. RESULTS: For FCNN-based segmentation of the OC dataset, the best results in terms of Dice Similarity Coefficient (Dsc) were achieved by ResNet with 5(×2) blocks and 16 filters, with a median value of 0.6559. For FCNN-based segmentation of the OP dataset, the best results in terms of Dsc were achieved by ResNet with 4(×2) blocks and 16 filters, with a median value of 0.7603. All tested FCNNs presented very high variance, leading to very low minima for all evaluated metrics. CONCLUSIONS: FCNNs have promising potential in the analysis and segmentation of OC and OP video-endoscopic images. All tested FCNN architectures demonstrated satisfying outcomes in terms of diagnostic accuracy. The inference times of the networks were particularly short, ranging between 14 and 115 ms, thus showing the possibility for real-time application.
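A sketch of how per-frame inference times such as those quoted above (14-115 ms) can be measured, assuming PyTorch; the model is a placeholder, and GPU timing requires explicit synchronization for honest numbers:

```python
import time
import torch

def mean_inference_ms(model, input_shape=(1, 3, 512, 512), runs=100):
    """Average single-frame forward-pass time in milliseconds."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.rand(*input_shape, device=device)
    with torch.no_grad():
        for _ in range(10):            # warm-up iterations (kernel compilation, caches)
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()   # wait for queued GPU work before timing
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1000.0
```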