Diffeomorphic transforms for data augmentation of highly variable shape and texture objects.
Comput Methods Programs Biomed
; 219: 106775, 2022 Jun.
Article
em En
| MEDLINE
| ID: mdl-35397412
BACKGROUND AND OBJECTIVE: Training a deep convolutional neural network (CNN) for automatic image classification requires a large database with images of labeled samples. However, in some applications such as biology and medicine only a few experts can correctly categorize each sample. Experts are able to identify small changes in shape and texture which go unnoticed by untrained people, as well as distinguish between objects in the same class that present drastically different shapes and textures. This means that currently available databases are too small and not suitable to train deep learning models from scratch. To deal with this problem, data augmentation techniques are commonly used to increase the dataset size. However, typical data augmentation methods introduce artifacts or apply distortions to the original image, which instead of creating new realistic samples, obtain basic spatial variations of the original ones. METHODS: We propose a novel data augmentation procedure which generates new realistic samples, by combining two samples that belong to the same class. Although the idea behind the method described in this paper is to mimic the variations that diatoms experience in different stages of their life cycle, it has also been demonstrated in glomeruli and pollen identification problems. This new data augmentation procedure is based on morphing and image registration methods that perform diffeomorphic transformations. RESULTS: The proposed technique achieves an increase in accuracy over existing techniques of 0.47%, 1.47%, and 0.23% for diatom, glomeruli and pollen problems respectively. CONCLUSIONS: For the Diatom dataset, the method is able to simulate the shape changes in different diatom life cycle stages, and thus, images generated resemble newly acquired samples with intermediate shapes. In fact, the other methods compared obtained worse results than those which were not using data augmentation. For the Glomeruli dataset, the method is able to add new samples with different shapes and degrees of sclerosis (through different textures). This is the case where our proposed DA method is more beneficial, when objects highly differ in both shape and texture. Finally, for the Pollen dataset, since there are only small variations between samples in a few classes and this dataset has other features such as noise which are likely to benefit other existing DA techniques, the method still shows an improvement of the results.
Palavras-chave
Texto completo:
1
Base de dados:
MEDLINE
Assunto principal:
Redes Neurais de Computação
/
Gerenciamento de Dados
Idioma:
En
Ano de publicação:
2022
Tipo de documento:
Article