ABSTRACT
At present, adversarial attacks are designed in a task-specific fashion. However, for downstream computer vision tasks such as image captioning and image segmentation, current deep-learning systems use an image classifier such as VGG16, ResNet50, or Inception-v3 as a feature extractor. Keeping this in mind, we propose Mimic and Fool (MaF), a task-agnostic adversarial attack. Given a feature extractor, the proposed attack finds an adversarial image that mimics the feature representation of the original image. This ensures that the two images produce the same (or similar) output regardless of the downstream task. We randomly select 1000 MSCOCO validation images for experimentation. We perform experiments on two image captioning models, Show and Tell and Show Attend and Tell, and one visual question answering (VQA) model, namely the end-to-end neural module network (N2NMN). The proposed attack achieves success rates of 74.0%, 81.0%, and 87.1% for Show and Tell, Show Attend and Tell, and N2NMN, respectively. We also propose a slight modification to our attack to generate natural-looking adversarial images. In addition, we show the applicability of the proposed attack to invertible architectures. Since MaF only requires information about the feature extractor of the model, it can be considered a gray-box attack.
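To make the feature-mimicking idea concrete, the sketch below shows one way the core optimisation could look. It is a minimal illustration under assumptions of ours, not the authors' exact setup: a pretrained VGG16 convolutional stack as the feature extractor, an MSE feature-matching loss, a random-noise starting image, and Adam with illustrative step counts.

```python
# Minimal sketch of a feature-mimicking attack (illustrative assumptions:
# VGG16 features, MSE loss, noise initialisation, Adam optimiser).
import torch
import torchvision.models as models

# A pretrained classifier's convolutional stack serves as the feature extractor.
extractor = models.vgg16(weights="IMAGENET1K_V1").features.eval()
for p in extractor.parameters():
    p.requires_grad_(False)

def mimic_and_fool(original, steps=1000, lr=0.01):
    """Find an image whose extracted features mimic those of `original`."""
    target_feat = extractor(original)                    # features to mimic
    adv = torch.rand_like(original, requires_grad=True)  # start from noise
    opt = torch.optim.Adam([adv], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(extractor(adv), target_feat)
        loss.backward()
        opt.step()
        adv.data.clamp_(0.0, 1.0)                        # keep a valid image
    return adv.detach()

# Usage: original = torch.rand(1, 3, 224, 224); adv = mimic_and_fool(original)
```

Since the objective constrains only the feature output and not the pixels, any downstream model built on the same extractor would receive (near-)identical features for the two images, which is what makes the attack task-agnostic.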
ABSTRACT
Munro's Microabscess (MM) is the diagnostic hallmark of psoriasis. Detecting neutrophils in the Stratum Corneum (SC) of the skin epidermis is an integral part of MM detection in skin biopsy. The microscopic inspection of skin biopsy is a tedious task, and staining variations in skin histopathology often hinder a human expert's ability to differentiate neutrophils from skin keratinocytes. Motivated by this, we propose a computational framework that can assist human experts and reduce potential errors in diagnosis. The framework first segments the SC layer; multiple patches are then sampled from the segmented regions and classified to detect neutrophils. Both UNet and CapsNet are evaluated for the segmentation and classification tasks. Experiments show that of the two choices, CapsNet, owing to its better hierarchical object representation and localisation ability, is the stronger candidate for both tasks; hence we name our framework MICaps. The training algorithm explores minimisation of both Dice Loss and Focal Loss and makes a comparative study between the two. The proposed framework is validated on our in-house dataset of 290 skin biopsy images under two experimental protocols. In the first, 3-fold cross-validation is performed to compare our results directly with the state of the art. In the second, the performance of the system on a held-out data set is reported. The experimental results show that MICaps improves state-of-the-art diagnosis performance by up to 3.27% and reduces the number of model parameters by 50%.
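For reference, below is a minimal sketch of the two training losses the framework compares, in their standard binary formulations. The smoothing constant in the Dice loss and the focal parameters gamma and alpha are common defaults assumed here for illustration, not necessarily the values used in MICaps.

```python
# Standard binary Dice loss and focal loss (parameter values are common
# defaults assumed for illustration, not the MICaps training settings).
import torch
import torch.nn.functional as F

def dice_loss(logits, target, smooth=1.0):
    """Soft Dice loss: one minus the Dice overlap between prediction and mask."""
    prob = torch.sigmoid(logits).flatten(1)
    target = target.flatten(1)
    inter = (prob * target).sum(dim=1)
    dice = (2 * inter + smooth) / (prob.sum(dim=1) + target.sum(dim=1) + smooth)
    return 1.0 - dice.mean()

def focal_loss(logits, target, gamma=2.0, alpha=0.25):
    """Focal loss (Lin et al.): down-weights easy examples via (1 - p_t)^gamma."""
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = torch.exp(-bce)                                # prob. of the true class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```

The Dice loss directly optimises region overlap, which suits segmentation of the SC layer, while the focal loss emphasises hard, easily confused examples, which suits class-imbalanced patch classification; comparing the two is therefore a natural study for this pipeline.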