Classification of Mobile-Based Oral Cancer Images Using the Vision Transformer and the Swin Transformer.

Song, Bofan; Kc, Dharma Raj; Yang, Rubin Yuchan; Li, Shaobai; Zhang, Chicheng; Liang, Rongguang.

Cancers (Basel); 16(5), 2024 Feb 29.

Article in English | MEDLINE | ID: mdl-38473348

ABSTRACT

Oral cancer, a pervasive and rapidly growing malignant disease, poses a significant global health concern. Early and accurate diagnosis is pivotal for improving patient outcomes. Automatic diagnosis methods based on artificial intelligence have shown promising results in the oral cancer field, but their accuracy still needs to improve for realistic diagnostic scenarios. Vision Transformers (ViT) have recently outperformed CNN models in many computer vision benchmark tasks. This study explores the effectiveness of the Vision Transformer and the Swin Transformer, two cutting-edge variants of the transformer architecture, for mobile-based oral cancer image classification. The pre-trained Swin Transformer model achieved 88.7% accuracy in the binary classification task, outperforming the ViT model by 2.3%, while the conventional convolutional network models VGG19 and ResNet50 achieved 85.2% and 84.5% accuracy, respectively. Our experiments demonstrate that these transformer-based architectures outperform traditional convolutional neural networks in oral cancer image classification, and underscore the potential of the ViT and the Swin Transformer in advancing the state of the art in oral cancer image analysis.
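The accuracy figures above can be put side by side with a few lines of arithmetic. The sketch below is illustrative only: the abstract does not state the ViT accuracy directly, so the value is derived under the assumption that "by 2.3%" means an absolute gap in percentage points. The variable names are hypothetical.

```python
# Accuracies reported in the abstract, in percent.
reported = {"Swin Transformer": 88.7, "VGG19": 85.2, "ResNet50": 84.5}

# Assumption: the 2.3% margin over ViT is an absolute gap in percentage
# points, not a relative improvement. The abstract does not say which.
vit_gap_points = 2.3
reported["ViT (implied)"] = round(reported["Swin Transformer"] - vit_gap_points, 1)

# Rank the models from most to least accurate and show each gap to Swin.
ranking = sorted(reported.items(), key=lambda kv: kv[1], reverse=True)
for name, acc in ranking:
    gap = round(reported["Swin Transformer"] - acc, 1)
    print(f"{name:18s} {acc:5.1f}%  (gap to Swin: {gap:.1f} points)")
```

Under that assumption both transformer models rank above both convolutional baselines, which matches the conclusion stated in the abstract.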

Interpretable and Reliable Oral Cancer Classifier with Attention Mechanism and Expert Knowledge Embedding via Attention Map.

Song, Bofan; Zhang, Chicheng; Sunny, Sumsum; Kc, Dharma Raj; Li, Shaobai; Gurushanth, Keerthi; Mendonca, Pramila; Mukhia, Nirza; Patrick, Sanjana; Gurudath, Shubha; Raghavan, Subhashini; Tsusennaro, Imchen; Leivon, Shirley T; Kolur, Trupti; Shetty, Vivek; Bushan, Vidya; Ramesh, Rohan; Pillai, Vijay; Wilder-Smith, Petra; Suresh, Amritha; Kuriakose, Moni Abraham; Birur, Praveen; Liang, Rongguang.

Cancers (Basel); 15(5), 2023 Feb 23.

Article in English | MEDLINE | ID: mdl-36900210

ABSTRACT

Convolutional neural networks have demonstrated excellent performance in oral cancer detection and classification. However, the end-to-end learning strategy makes CNNs hard to interpret, and it can be challenging to fully understand the decision-making procedure. Reliability is also a significant challenge for CNN-based approaches. In this study, we proposed a neural network called the attention branch network (ABN), which combines visual explanation and attention mechanisms to improve recognition performance and interpret the decision-making simultaneously. We also embedded expert knowledge into the network by having human experts manually edit the attention maps for the attention mechanism. Our experiments have shown that ABN performs better than the original baseline network. By introducing Squeeze-and-Excitation (SE) blocks to the network, the cross-validation accuracy increased further. Furthermore, we observed that some previously misclassified cases were correctly recognized after the attention maps were manually edited. The cross-validation accuracy increased from 0.846 to 0.875 with the ABN (ResNet18 as baseline), 0.877 with SE-ABN, and 0.903 after embedding expert knowledge. The proposed method provides an accurate, interpretable, and reliable oral cancer computer-aided diagnosis system through visual explanation, attention mechanisms, and expert knowledge embedding.
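For readers unfamiliar with Squeeze-and-Excitation blocks, the following is a minimal pure-Python sketch of the general SE mechanism: per-channel global average pooling, a small two-layer gating network, and channel-wise rescaling. It is not the authors' implementation; the abstract does not describe how SE blocks are wired into the ABN. The function name `se_block`, the toy weights, and the omission of biases are all assumptions made for illustration.

```python
import math

def se_block(feature_map, w1, w2):
    """Apply a Squeeze-and-Excitation gate to a feature map.

    feature_map: list of C channels, each an H x W list of lists.
    w1: (C // r) x C weight matrix of the bottleneck layer (hypothetical).
    w2: C x (C // r) weight matrix of the expansion layer (hypothetical).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C values and squash with a sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates

# Toy input: two 2x2 channels and a reduction from 2 channels to 1 unit.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[0.5, 0.5]]
w2 = [[1.0], [-1.0]]
scaled, gates = se_block(features, w1, w2)
print(gates)
```

Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another; in a trained network the weights are learned so that informative channels keep more of their signal.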
