ABSTRACT
Recently, transformer-based architectures have been shown to outperform classic convolutional architectures and have rapidly been established as state-of-the-art models for many medical vision tasks. Their superior performance can be explained by their ability to capture long-range dependencies through their multi-head self-attention mechanism. However, they tend to overfit on small- or even medium-sized datasets because of their weak inductive bias. As a result, they require massive labeled datasets, which are expensive to obtain, especially in the medical domain. This motivated us to explore unsupervised semantic feature learning without any form of annotation. In this work, we aimed to learn semantic features in a self-supervised manner by training transformer-based models to segment the numerical signals of geometric shapes inserted on original computed tomography (CT) images. Moreover, we developed a Convolutional Pyramid vision Transformer (CPT) that leverages multi-kernel convolutional patch embedding and local spatial reduction in each of its layers to generate multi-scale features, capture local information, and reduce computational cost. Using these approaches, we noticeably outperformed state-of-the-art deep learning-based segmentation and classification models on a liver cancer CT dataset of 5,237 patients, a pancreatic cancer CT dataset of 6,063 patients, and a breast cancer MRI dataset of 127 patients.
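The pretext task described above, segmenting geometric shapes inserted into CT images, can be sketched roughly as follows. The shape type (circles), the intensity range, and the function name are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def insert_pseudo_shapes(image, n_shapes=3, radius_range=(5, 15),
                         intensity_range=(0.2, 0.8), rng=None):
    """Insert random circles into a 2-D image and return (augmented, mask).

    Hypothetical sketch of the pseudo-shape pretext task: a model is later
    trained to segment the inserted shapes, so no manual labels are needed.
    """
    rng = np.random.default_rng(rng)
    augmented = image.astype(np.float32).copy()
    mask = np.zeros(image.shape, dtype=np.uint8)
    h, w = image.shape
    yy, xx = np.mgrid[:h, :w]
    for _ in range(n_shapes):
        r = int(rng.integers(*radius_range))
        cy = int(rng.integers(r, h - r))
        cx = int(rng.integers(r, w - r))
        circle = (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
        augmented[circle] = rng.uniform(*intensity_range)  # numeric signal
        mask[circle] = 1                                   # segmentation target
    return augmented, mask
```

The returned `(augmented, mask)` pair serves as a free (input, label) training example for a segmentation network.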
Subjects
Breast Neoplasms , Liver Neoplasms , Pancreatic Neoplasms , Humans , Female , Breast Neoplasms/diagnostic imaging , Electric Power Supplies , Semantics , Image Processing, Computer-Assisted
ABSTRACT
The aim of this study was to develop a novel deep learning (DL) model that does not require large annotated training datasets for detecting pancreatic cancer (PC) using computed tomography (CT) images. This retrospective diagnostic study was conducted using CT images collected between 2004 and 2019 from 4287 patients diagnosed with PC. We proposed a self-supervised learning algorithm, pseudo-lesion segmentation (PS), for PC classification; models were trained with and without PS and validated on randomly divided training and validation sets. We further performed cross-racial external validation using open-access CT images from 361 patients. For internal validation, the accuracy and sensitivity for PC classification were 94.3% (92.8-95.4%) and 92.5% (90.0-94.4%), and 95.7% (94.5-96.7%) and 99.3% (98.4-99.7%), for the convolutional neural network (CNN) and transformer-based DL models (both with PS), respectively. Implementing PS on a small-sized training dataset (a randomly sampled 10%) increased accuracy by 20.5% and sensitivity by 37.0%. For external validation, the accuracy and sensitivity were 82.5% (78.3-86.1%) and 81.7% (77.3-85.4%), and 87.8% (84.0-90.8%) and 86.5% (82.3-89.8%), for the CNN and transformer-based DL models (both with PS), respectively. PS self-supervised learning can increase the performance, reliability, and robustness of DL-based PC classification on unseen, and even small, datasets. The proposed DL model is potentially useful for PC diagnosis.
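The abstract reports each metric with a 95% confidence interval. The exact interval method used in the paper is not stated; as an illustration, a Wilson score interval for a binomial proportion such as accuracy or sensitivity can be computed like this:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion.

    Assumption: the paper's CIs are binomial-proportion intervals; Wilson is
    one common choice, shown here for illustration only.
    """
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half
```

For example, 943 correct out of 1000 yields an interval close to the reported 94.3% (92.8-95.4%) style of figure.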
ABSTRACT
Several state-of-the-art object detectors have demonstrated outstanding performance by optimizing feature representation through modification of the backbone architecture and exploitation of a feature pyramid. To determine the effectiveness of this approach, we explore the modification of object detectors' backbone and feature pyramid by utilizing Neural Architecture Search (NAS) and Capsule Network. We introduce two modules, namely, the NAS-gate convolutional module and the Capsule Attention module. The NAS-gate convolutional module optimizes standard convolution in a backbone network based on differentiable architecture search in cooperation with multiple convolution conditions to overcome object scale variation problems. The Capsule Attention module exploits the strong spatial relationship encoding ability of the capsule network to generate a spatial attention mask, which emphasizes important features and suppresses unnecessary features in the feature pyramid, in order to optimize the feature representation and localization capability of the detectors. Experimental results indicate that the NAS-gate convolutional module can alleviate the object scale variation problem and the Capsule Attention network can help to avoid inaccurate localization. We then introduce NASGC-CapANet, which incorporates both modules. Results of comparisons against state-of-the-art object detectors on the MS COCO val-2017 dataset demonstrate that NASGC-CapANet-based Faster R-CNN significantly outperforms the baseline Faster R-CNN with a ResNet-50 backbone and a ResNet-101 backbone by mAPs of 2.7% and 2.0%, respectively. Furthermore, the NASGC-CapANet-based Cascade R-CNN achieves a box mAP of 43.8% on the MS COCO test-dev dataset.
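As a rough illustration of how a spatial attention mask reweights a feature-pyramid level, the sketch below uses a simple pooled-sigmoid mask in place of the paper's capsule-network routing, which is not reproduced here; the function name and gating signal are assumptions:

```python
import numpy as np

def spatial_attention(features):
    """Reweight a (C, H, W) feature map with a single-channel spatial mask.

    Simplified stand-in for the Capsule Attention module: channel-pooled
    statistics are squashed to (0, 1) and broadcast over all channels, so
    salient locations are emphasised and others suppressed.
    """
    avg = features.mean(axis=0)            # channel-wise average pooling
    mx = features.max(axis=0)              # channel-wise max pooling
    logits = avg + mx                      # placeholder for capsule routing
    mask = 1.0 / (1.0 + np.exp(-logits))   # sigmoid mask in (0, 1)
    return features * mask                 # broadcast over channels
```

Because the mask lies strictly in (0, 1), every feature value is attenuated in proportion to how salient its spatial location is judged to be.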
Subjects
Neural Networks, Computer , Registries , Plant Extracts
ABSTRACT
Deep convolutional networks have been developed to detect prohibited items for automated inspection of X-ray screening systems in the transport security system. To our knowledge, the existing frameworks were developed to recognize threats using only baggage security X-ray scans. Therefore, the detection accuracy in other domains of security X-ray scans, such as cargo X-ray scans, cannot be ensured. We propose an object detection method for efficiently detecting contraband items in both cargo and baggage X-ray security scans. The proposed network, MFA-net, consists of three plug-and-play modules: the multiscale dilated convolutional module, the fusion feature pyramid network, and the auxiliary point detection head. First, the multiscale dilated convolutional module converts the standard convolution of the detector backbone to a conditional convolution by aggregating the features from multiple dilated convolutions using dynamic feature selection to overcome the object-scale variation issue. Second, the fusion feature pyramid network combines the proposed attention and fusion modules to enhance multiscale object recognition and alleviate the object occlusion problem. Third, the auxiliary point detection head predicts new keypoints of the bounding box to improve localization without requiring further ground-truth information. We tested the performance of MFA-net on two large-scale X-ray security image datasets from different domains: the Security Inspection X-ray (SIXray) dataset in the baggage domain and our dataset, named CargoX, in the cargo domain. MFA-net outperformed state-of-the-art object detectors in both domains. Thus, adopting the proposed modules can further increase the detection capability of current object detectors on X-ray security images.
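A minimal sketch of the multiscale dilated convolution idea: the same kernel is applied at several dilation rates, so each branch sees a different receptive field, and a softmax gate mixes the branches. The gating signal (mean branch response) and all names are simplifying assumptions; the paper's dynamic feature selection is learned and more elaborate:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Naive 'same'-padded 2-D convolution with a dilation rate (odd kernel)."""
    kh, kw = kernel.shape
    pad_h, pad_w = dilation * (kh // 2), dilation * (kw // 2)
    xp = np.pad(x, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros_like(x, dtype=np.float32)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * xp[i * dilation : i * dilation + x.shape[0],
                                     j * dilation : j * dilation + x.shape[1]]
    return out

def multiscale_dilated_module(x, kernel, dilations=(1, 2, 4)):
    """Mix several dilation branches with a softmax gate (illustrative)."""
    branches = [dilated_conv2d(x, kernel, d) for d in dilations]
    scores = np.array([b.mean() for b in branches])  # assumed gating signal
    gates = np.exp(scores - scores.max())
    gates /= gates.sum()                             # softmax over branches
    return sum(g * b for g, b in zip(gates, branches))
```

In the actual detector the gate weights would be produced by a learned selection network rather than by the branch means used here.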