Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture.

Klein, Jonathan; Waller, Rebekah; Pirk, Sören; Palubicki, Wojtek; Tester, Mark; Michels, Dominik L

Affiliation

Klein J; Computational Sciences Group, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Waller R; Center for Desert Agriculture, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Pirk S; Institute of Computer Science, Christian-Albrechts-University, Kiel, Germany.
Palubicki W; Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznan, Poland.
Tester M; Center for Desert Agriculture, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Michels DL; Computational Sciences Group, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Front Plant Sci; 15: 1360113, 2024.

Article in English | MEDLINE | ID: mdl-39351023

ABSTRACT

The rise of artificial intelligence (AI), and in particular of modern machine learning (ML) algorithms, during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, these algorithms have one main drawback: the need for sufficient and diverse training data. The collection and annotation of real datasets are the main cost drivers of ML development, and while promising results have been shown with synthetically generated training data, its generation is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum). A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we introduce a structured, quantitative approach. Our evaluation shows superior generalization compared to using non-task-specific real training data, and higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.

Keywords

artificial intelligence; data generation and annotation; disease detection; greenhouse farming; machine learning; synthetic data; tomato plants

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Front Plant Sci Year: 2024 Document type: Article Affiliation country: Saudi Arabia
