ABSTRACT
BACKGROUND: Artificial intelligence (AI) techniques show promise for the early diagnosis of skin diseases. However, their success depends on access to large-scale annotated data, and until now obtaining such data has required very high personnel and financial resources. OBJECTIVES: The aim of this study was to overcome the obstacle posed by the scarcity of labelled data. METHODS: To simulate a label shortage, we discarded a proportion of the labels in the training set, so that the training set consisted of both labelled and unlabelled images. We then used a self-supervised learning technique to pretrain the AI model on the unlabelled images and fine-tuned the pretrained model on the labelled images. RESULTS: When the images in the training dataset were fully labelled, the self-supervised pretrained model achieved an accuracy of 95.7%, a precision of 91.7% and a sensitivity of 90.7%. When only 10% of the data were labelled, the model still achieved an accuracy of 87.7%, a precision of 81.7% and a sensitivity of 68.6%. In addition, we empirically verified that the AI model and dermatologists are consistent in visually inspecting the skin images. CONCLUSIONS: The experimental results demonstrate the great potential of self-supervised learning for alleviating the scarcity of annotated data.
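The label-shortage simulation and the three reported metrics can be illustrated with a minimal Python sketch. This is not the study's code. The function names `mask_labels` and `binary_metrics`, the fixed random seed, and the binary (single positive class) formulation are all assumptions made for illustration, and the study's actual sampling scheme and class setup may differ.

```python
import random


def mask_labels(labels, labelled_fraction, seed=0):
    """Keep a random fraction of labels; replace the rest with None.

    Hypothetical helper: images whose label becomes None would be used
    only for self-supervised pretraining, the others for fine-tuning.
    """
    if not 0.0 <= labelled_fraction <= 1.0:
        raise ValueError("labelled_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_keep = round(len(labels) * labelled_fraction)
    keep = set(rng.sample(range(len(labels)), n_keep))
    return [y if i in keep else None for i, y in enumerate(labels)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and sensitivity for one positive class (assumed binary)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
    }


# Simulate the 10%-labelled scenario on 100 toy labels.
toy_labels = [0, 1] * 50
masked = mask_labels(toy_labels, labelled_fraction=0.1)
n_labelled = sum(1 for y in masked if y is not None)
```

In this sketch sensitivity is the fraction of true positives that are detected, which is why it can fall faster than accuracy when few labels are available.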