ABSTRACT
This article discusses the role of computer vision in otolaryngology, particularly in endoscopy and surgery. It covers recent applications of artificial intelligence (AI) to nonradiologic imaging within otolaryngology, noting benefits such as improved diagnostic accuracy and optimized therapeutic outcomes, as well as challenges, chiefly the need for better data curation and standardized research methodologies to advance clinical applications. Technical aspects are also covered, tracing the progression from manual feature extraction to more complex AI models, including convolutional neural networks and vision transformers, and their potential application in clinical settings.
Subjects
Artificial Intelligence; Otolaryngology; Humans; Otolaryngology/methods; Endoscopy/methods; Video Recording; Neural Networks, Computer
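To make the progression described in the review above concrete, the sketch below contrasts a hand-crafted feature (a colour histogram) with embeddings from a pretrained CNN and a vision transformer. This is a minimal illustration, not drawn from the article itself; it assumes torchvision's pretrained ResNet-50 and ViT-B/16 weights and an arbitrary endoscopic frame on disk.

```python
# Minimal sketch: three generations of image features for one endoscopic frame.
# Assumes torchvision >= 0.13 (pretrained weight enums); "frame.png" is a placeholder.
import torch
from torchvision import models, transforms
from PIL import Image

to_tensor = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
img = to_tensor(Image.open("frame.png").convert("RGB"))

# (1) Hand-crafted feature: a per-channel colour histogram, the "manual" baseline.
hist = torch.cat([torch.histc(img[c], bins=32, min=0, max=1) for c in range(3)])

# ImageNet normalization for the pretrained networks.
x = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])(img).unsqueeze(0)

# (2) CNN embedding: ResNet-50 with the classification head removed.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()
resnet.eval()
with torch.no_grad():
    cnn_feat = resnet(x)   # shape (1, 2048)

# (3) Vision-transformer embedding: ViT-B/16, also headless.
vit = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
vit.heads = torch.nn.Identity()
vit.eval()
with torch.no_grad():
    vit_feat = vit(x)      # shape (1, 768)
```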
ABSTRACT
OBJECTIVES: To evaluate the performance of vision transformer-derived image embeddings for distinguishing between normal and neoplastic tissues in the oropharynx, and to investigate the potential of computer vision (CV) foundation models in medical imaging. METHODS: Computational study using endoscopic frames, focused on applying a self-supervised vision transformer (DINOv2) to tissue classification. High-definition endoscopic images were used to extract image patches, which were normalized and processed with the DINOv2 model to obtain embeddings. These embeddings served as input to a standard support vector machine (SVM) that classified tissues as neoplastic or normal. The model's discriminative performance was validated using an 80-20 train-validation split. RESULTS: From 38 narrow-band imaging (NBI) endoscopic videos, 327 image patches were analyzed. Classification in the validation cohort showed high accuracy (92%) and precision (89%), with perfect recall (100%) and an F1-score of 94%. The receiver operating characteristic (ROC) curve yielded an area under the curve (AUC) of 0.96. CONCLUSION: Large vision model-derived embeddings effectively differentiated between neoplastic and normal oropharyngeal tissues. This study supports the feasibility of employing CV foundation models such as DINOv2 in the endoscopic evaluation of mucosal lesions, potentially augmenting diagnostic precision in otorhinolaryngology. LEVEL OF EVIDENCE: 4 Laryngoscope, 134:4535-4541, 2024.
Subjects
Oropharyngeal Neoplasms; Humans; Oropharyngeal Neoplasms/pathology; Endoscopy/methods; Support Vector Machine; Proof of Concept Study; ROC Curve; Image Interpretation, Computer-Assisted/methods; Oropharynx/pathology; Narrow Band Imaging/methods
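A minimal sketch of the pipeline this abstract describes: DINOv2 embeddings fed to an SVM with an 80-20 split. It assumes the publicly released DINOv2 ViT-B/14 checkpoint loaded via torch.hub; the patch arrays and labels are random placeholders standing in for the 327 annotated NBI patches, and the paper's preprocessing and SVM settings may differ.

```python
# Hedged sketch: DINOv2 embeddings + SVM for neoplastic vs. normal patches.
# Assumes facebookresearch/dinov2 is reachable via torch.hub; data are placeholders.
import torch
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
dinov2.eval()

def embed(batch: torch.Tensor) -> np.ndarray:
    """(N, 3, 224, 224) ImageNet-normalized patches -> (N, 768) embeddings."""
    with torch.no_grad():
        return dinov2(batch).cpu().numpy()

# Placeholders for the 327 image patches and their binary labels.
patches = torch.randn(327, 3, 224, 224)      # would be real normalized patches
labels = np.random.randint(0, 2, size=327)   # 1 = neoplastic, 0 = normal

X = embed(patches)
X_train, X_val, y_train, y_val = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0)

clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
pred = clf.predict(X_val)
prob = clf.predict_proba(X_val)[:, 1]
print("accuracy:", accuracy_score(y_val, pred))
print("precision:", precision_score(y_val, pred))
print("recall:", recall_score(y_val, pred))
print("F1:", f1_score(y_val, pred))
print("ROC AUC:", roc_auc_score(y_val, prob))
```

Note that the frozen foundation model does all the representation work here; only the lightweight SVM head is fit to the labelled patches, which is what makes the approach feasible with a few hundred examples.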
ABSTRACT
Colorectal cancer is one of the most common cancers in the world. While colonoscopy is an effective screening technique, navigating an endoscope through the colon to detect polyps is challenging. A 3D map of the observed surfaces could enhance the identification of unscreened colon tissue and serve as a training platform. However, reconstructing the colon from video footage remains difficult. Learning-based approaches hold promise as robust alternatives but require extensive datasets. To establish a benchmark, the 2022 EndoVis sub-challenge SimCol3D aimed to facilitate data-driven depth and pose prediction during colonoscopy. The challenge was hosted as part of MICCAI 2022 in Singapore. Six teams from around the world, representing academia and industry, participated in the three sub-challenges: synthetic depth prediction, synthetic pose prediction, and real pose prediction. This paper describes the challenge, the submitted methods, and their results. We show that depth prediction from synthetic colonoscopy images is robustly solvable, while pose estimation remains an open research question.
Subjects
Colonoscopy; Imaging, Three-Dimensional; Humans; Imaging, Three-Dimensional/methods; Colorectal Neoplasms/diagnostic imaging; Colonic Polyps/diagnostic imaging
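For readers unfamiliar with the task setup, below is a minimal sketch of supervised depth prediction on synthetic colonoscopy frames, the sub-challenge the paper reports as robustly solvable. The tiny encoder-decoder and plain L1 loss are illustrative assumptions, not any participant's method, and the SimCol3D data loading is reduced to random tensors.

```python
# Illustrative sketch: supervised monocular depth regression on synthetic frames.
# The network and loss are stand-ins, not a SimCol3D submission.
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """A deliberately small encoder-decoder mapping RGB (3ch) to depth (1ch)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
            nn.Softplus(),  # keep predicted depth non-negative
        )
    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyDepthNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Placeholder batch standing in for synthetic frames with ground-truth depth.
rgb = torch.rand(4, 3, 256, 256)
gt_depth = torch.rand(4, 1, 256, 256)

pred = model(rgb)
loss = nn.functional.l1_loss(pred, gt_depth)
opt.zero_grad(); loss.backward(); opt.step()
print(f"L1 depth loss: {loss.item():.4f}")
```

Pose prediction, by contrast, requires regressing the 6-DoF camera motion between frames, where error accumulates over a trajectory; this is part of why the paper reports it as still open.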
ABSTRACT
Intraoperative frozen section analysis can be used to improve the accuracy of tumour margin estimation during cancer resection surgery through rapid processing and pathological assessment of excised tissue. Its applicability is limited in some cases by the additional risks associated with prolonged surgery, largely from the time-consuming staining procedure. Our work uses a measurable property of bulk tissue to bypass the staining process: as tumour cells proliferate, they influence the surrounding extracellular matrix, and the resulting change in elastic modulus provides a signature of the underlying pathology. In this work we accurately localise atomic force microscopy measurements of human liver tissue samples and train a generative adversarial network to infer elastic modulus from low-resolution images of unstained tissue sections. Pathology is predicted through unsupervised clustering of parameters characterising the distributions of inferred values, achieving 89% accuracy for all samples based on the nominal assessment (n = 28), and 95% for samples that were validated by two independent pathologists through post hoc staining (n = 20). Our results demonstrate that this technique could increase the feasibility of intraoperative frozen section analysis during resection surgery and improve patient outcomes.
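The final classification step, clustering distribution parameters of the inferred elastic-modulus maps, can be sketched as follows. The choice of summary statistics (median, interquartile range, skewness) and of k-means with two clusters is an assumption for illustration; the paper's actual parameterisation may differ, and the modulus maps below are random stand-ins for the GAN's output.

```python
# Sketch: unsupervised clustering of per-sample elastic-modulus distribution parameters.
# Summary statistics and k=2 clustering are illustrative assumptions.
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def distribution_params(modulus_map: np.ndarray) -> np.ndarray:
    """Reduce one inferred elastic-modulus map (Pa) to a small parameter vector."""
    v = modulus_map.ravel()
    iqr = np.percentile(v, 75) - np.percentile(v, 25)
    return np.array([np.median(v), iqr, skew(v)])

# Placeholder: 28 samples' inferred modulus maps (the GAN output in the paper).
rng = np.random.default_rng(0)
maps = [rng.lognormal(mean=rng.uniform(6, 8), sigma=0.5, size=(64, 64))
        for _ in range(28)]

X = StandardScaler().fit_transform(np.stack([distribution_params(m) for m in maps]))
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)  # cluster labels would then be matched against pathology assessments
```

Because the clustering is unsupervised, no pathological labels are needed to fit it; labels enter only afterwards, when clusters are compared with the nominal or stain-validated assessments to report accuracy.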
ABSTRACT
PURPOSE: Colorectal cancer is the third most common cancer worldwide, and early therapeutic treatment of precancerous tissue during colonoscopy is crucial for better prognosis and can be curative. Navigation within the colon and comprehensive inspection of the endoluminal tissue are key to successful colonoscopy but can vary with the skill and experience of the endoscopist. Computer-assisted interventions in colonoscopy can provide better support tools for mapping the colon to ensure complete examination and for automatically detecting abnormal tissue regions. METHODS: We train the conditional generative adversarial network pix2pix to transform monocular endoscopic images to depth, which can serve as a building block in a navigational pipeline or be used to measure the size of polyps during colonoscopy. To overcome the lack of labelled training data in endoscopy, we propose to use simulation environments and to additionally train the generator and discriminator of the model on unlabelled real video frames in order to adapt to real colonoscopy environments. RESULTS: We report promising results on synthetic, phantom and real datasets and show that generative models outperform discriminative models when predicting depth from colonoscopy images, in terms of both accuracy and robustness to changes in domain. CONCLUSIONS: By training the discriminator and generator of the model on real images, we show that our model performs implicit domain adaptation, a key step towards bridging the gap between synthetic and real data. Importantly, we demonstrate the feasibility of training a single model to predict depth from both synthetic and real images without the need for explicit, unsupervised transformer networks mapping between the domains of synthetic and real data.
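As a rough sketch of the training objective described in METHODS, below is one pix2pix-style update combining a conditional adversarial loss with an L1 reconstruction term. The tiny generator and discriminator and the loss weighting (lambda = 100, the pix2pix default) are illustrative assumptions; the paper's architectures follow the full pix2pix design, and the additional pass over unlabelled real frames that drives the implicit domain adaptation is omitted here.

```python
# Sketch of one pix2pix-style training step for image-to-depth translation.
# Networks are reduced to a few layers; only the objective structure matters here.
import torch
import torch.nn as nn

G = nn.Sequential(  # generator: RGB endoscopic frame -> depth map
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),
)
D = nn.Sequential(  # conditional (PatchGAN-like) discriminator on (image, depth) pairs
    nn.Conv2d(4, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),  # patch-wise real/fake logits
)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

img = torch.rand(2, 3, 128, 128)     # placeholder synthetic frames
depth = torch.rand(2, 1, 128, 128)   # placeholder ground-truth depth

# Discriminator step: real (img, depth) vs. fake (img, G(img)) pairs.
fake = G(img)
d_real = D(torch.cat([img, depth], dim=1))
d_fake = D(torch.cat([img, fake.detach()], dim=1))
loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# Generator step: fool D and stay close to ground truth (L1), as in pix2pix.
d_fake = D(torch.cat([img, fake], dim=1))
loss_G = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * nn.functional.l1_loss(fake, depth)
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```

Conditioning the discriminator on the input image (the 4-channel concatenation) is what makes the adversarial signal judge image-depth pairs rather than depth maps alone, which is the key difference from an unconditional GAN.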