Diffeomorphic transforms for data augmentation of highly variable shape and texture objects.

Vallez, Noelia; Bueno, Gloria; Deniz, Oscar; Blanco, Saul

Affiliation

Vallez N; VISILAB, University of Castilla-La Mancha, E.T.S. Ingenieria Industrial, Avda. Camilo Jose Cela s/n, Ciudad Real 13071, Spain. Electronic address: noelia.vallez@uclm.es.
Bueno G; VISILAB, University of Castilla-La Mancha, E.T.S. Ingenieria Industrial, Avda. Camilo Jose Cela s/n, Ciudad Real 13071, Spain.
Deniz O; VISILAB, University of Castilla-La Mancha, E.T.S. Ingenieria Industrial, Avda. Camilo Jose Cela s/n, Ciudad Real 13071, Spain.
Blanco S; Institute of the Environment, University of Leon, Leon E-24071, Spain.

Comput Methods Programs Biomed ; 219: 106775, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35397412

ABSTRACT

BACKGROUND AND OBJECTIVE: Training a deep convolutional neural network (CNN) for automatic image classification requires a large database of labeled sample images. However, in some applications, such as biology and medicine, only a few experts can correctly categorize each sample. Experts are able to identify small changes in shape and texture that go unnoticed by untrained people, and to recognize objects of the same class that present drastically different shapes and textures. As a result, currently available databases are too small to train deep learning models from scratch. To deal with this problem, data augmentation techniques are commonly used to increase the dataset size. However, typical data augmentation methods introduce artifacts or apply distortions to the original image, so instead of creating new realistic samples they produce only basic spatial variations of the original ones.

METHODS: We propose a novel data augmentation procedure that generates new realistic samples by combining two samples that belong to the same class. Although the idea behind the method is to mimic the variations that diatoms experience in different stages of their life cycle, it has also been demonstrated in glomeruli and pollen identification problems. The procedure is based on morphing and image registration methods that perform diffeomorphic transformations.

RESULTS: The proposed technique achieves an increase in accuracy over existing techniques of 0.47%, 1.47%, and 0.23% for the diatom, glomeruli and pollen problems, respectively.

CONCLUSIONS: For the Diatom dataset, the method is able to simulate the shape changes in different diatom life cycle stages, and thus the generated images resemble newly acquired samples with intermediate shapes. In fact, the other augmentation methods compared performed worse than training without data augmentation. For the Glomeruli dataset, the method is able to add new samples with different shapes and degrees of sclerosis (through different textures). This is the case where the proposed data augmentation (DA) method is most beneficial: objects differ greatly in both shape and texture. Finally, for the Pollen dataset, although there are only small variations between samples in a few classes, and the dataset has other features, such as noise, that are likely to benefit other existing DA techniques, the method still improves the results.
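The METHODS idea of generating an intermediate sample from two same-class images can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which registration algorithm is used, so the sketch assumes the diffeomorphism is parameterized by a stationary velocity field and integrated by scaling and squaring, one common choice. The velocity field here is hand-made rather than estimated by registration, and all names (`bilinear`, `exp_field`, `warp`, `morph`, `A`, `B`, `VY`, `VX`) are hypothetical.

```python
import math

def bilinear(img, y, x):
    """Sample a list-of-lists image at real-valued (y, x), clamped to the border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def exp_field(vy, vx, scale=1.0, steps=6):
    """Scaling and squaring: turn the velocity field scale*v into a displacement field."""
    h, w = len(vy), len(vy[0])
    f = scale / 2 ** steps
    dy = [[f * v for v in row] for row in vy]
    dx = [[f * v for v in row] for row in vx]
    for _ in range(steps):
        # Compose the map with itself: d(p) <- d(p) + d(p + d(p)).
        pts = [[(i + dy[i][j], j + dx[i][j]) for j in range(w)] for i in range(h)]
        dy, dx = (
            [[dy[i][j] + bilinear(dy, *pts[i][j]) for j in range(w)] for i in range(h)],
            [[dx[i][j] + bilinear(dx, *pts[i][j]) for j in range(w)] for i in range(h)],
        )
    return dy, dx

def warp(img, dy, dx):
    """Pull-back warp: output(p) = img(p + d(p))."""
    h, w = len(img), len(img[0])
    return [[bilinear(img, i + dy[i][j], j + dx[i][j]) for j in range(w)] for i in range(h)]

def morph(a, b, vy, vx, t):
    """Intermediate sample at t in [0, 1], assuming warp(a, exp(v)) approximates b."""
    wa = warp(a, *exp_field(vy, vx, scale=t))            # a moved part of the way to b
    wb = warp(b, *exp_field(vy, vx, scale=-(1.0 - t)))   # b moved part of the way back
    return [[(1 - t) * pa + t * pb for pa, pb in zip(ra, rb)] for ra, rb in zip(wa, wb)]

def area(img):
    """Sum of pixel intensities, a rough size measure for a binary shape."""
    return sum(map(sum, img))

# Toy data (assumption): A is a disk, v is a smooth radial field, and B is
# synthesized from A so that v plays the role of an exact registration result.
N, C, K, SIGMA = 33, 16.0, 0.5, 8.0

def window(i, j):
    return K * math.exp(-((i - C) ** 2 + (j - C) ** 2) / (2 * SIGMA ** 2))

A = [[1.0 if math.hypot(i - C, j - C) <= 8 else 0.0 for j in range(N)] for i in range(N)]
VY = [[(i - C) * window(i, j) for j in range(N)] for i in range(N)]
VX = [[(j - C) * window(i, j) for j in range(N)] for i in range(N)]
B = warp(A, *exp_field(VY, VX))
MID = morph(A, B, VY, VX, 0.5)

print(area(A), area(MID), area(B))
```

In a real pipeline the velocity field would come from registering one same-class image to another, and sampling several values of `t` would yield several intermediate shapes per image pair. Blending the two warped images, rather than warping only one, lets texture as well as shape vary between the endpoints.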

Subjects

Data Management; Neural Networks, Computer; Databases, Factual; Humans

Keywords

Algae classification; Data augmentation; Diffeomorphism transforms; Glomeruli classification; Pollen classification; Taxon life cycle

Full text: 1 Database: MEDLINE Main subject: Neural Networks, Computer / Data Management Language: En Publication year: 2022 Document type: Article
