High-Accuracy Tomato Leaf Disease Image-Text Retrieval Method Utilizing LAFANet.

Xu, Jiaxin; Zhou, Hongliang; Hu, Yufan; Xue, Yongfei; Zhou, Guoxiong; Li, Liujun; Dai, Weisi; Li, Jinyang

Affiliation

Xu J; College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China.
Zhou H; College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China.
Hu Y; College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China.
Xue Y; College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China.
Zhou G; College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China.
Li L; Department of Soil and Water Systems, University of Idaho, Moscow, ID 83844, USA.
Dai W; College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China.
Li J; College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China.

Plants (Basel); 13(9), 2024 Apr 23.

Article in English | MEDLINE | ID: mdl-38732391

ABSTRACT

Tomato leaf disease control is an urgent priority in smart agriculture. This paper proposes LAFANet, an image-text retrieval method that integrates image and text information for joint analysis of multimodal data, giving agricultural practitioners more comprehensive and in-depth diagnostic evidence to help ensure tomato quality and yield. First, we build a Tomato Leaf Disease Image-Text Retrieval Dataset (TLDITRD) from images and text descriptions of six common tomato leaf diseases, introducing image-text retrieval to the field of tomato leaf disease retrieval. Next, we use ViT and BERT models to extract detailed image features and textual feature sequences that incorporate contextual information from the image-text pairs. To reduce image-text retrieval errors caused by complex backgrounds, we propose Learnable Fusion Attention (LFA), which strengthens the fusion of textual and image features and thereby extracts rich semantic information from both modalities. To further exploit semantic connections across modalities, we propose a False Negative Elimination-Adversarial Negative Selection (FNE-ANS) approach. This method identifies adversarial negative instances that specifically target false negatives within the triplet loss function, thereby imposing constraints on the model. To improve the model's generalization and accuracy, we propose Adversarial Regularization (AR), which adds adversarial perturbations during training to make the model more robust to slight variations in the input data. Experimental results show that LAFANet outperformed existing state-of-the-art models on the TLDITRD dataset, with top1, top5, and top10 reaching 83.3% and 90.0%, and top1, top5, and top10 reaching 80.3%, 93.7%, and 96.3%.
LAFANet provides new technical support and algorithmic insights for retrieving tomato leaf disease information through image-text correlation.
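The abstract does not specify how FNE-ANS or the top1/top5/top10 scores are computed. As a rough illustration only, the sketch below shows a generic bidirectional hinge triplet ranking loss with hardest-negative selection (in the style of VSE++), where candidates sharing the anchor's disease label are skipped as likely false negatives, plus a Recall@K function of the kind commonly behind top-k retrieval figures. The label-based masking rule, the function names (`masked_triplet_loss`, `recall_at_k`), the margin of 0.2, and the toy embeddings and labels are all hypothetical assumptions, not the authors' method.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similarity_matrix(images: Sequence[Sequence[float]],
                      texts: Sequence[Sequence[float]]) -> Matrix:
    """sim[i][j] = cosine similarity between image i and text j."""
    return [[cosine(img, txt) for txt in texts] for img in images]


def masked_triplet_loss(sim: Matrix, labels: Sequence[str],
                        margin: float = 0.2) -> float:
    """Bidirectional hinge triplet loss with hardest-negative selection.

    Pair (i, i) is the positive. Hypothetical false-negative rule: any
    candidate j with labels[j] == labels[i] (including j == i) is skipped
    when picking the hardest negative, since it may show the same disease.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Image i as query: similarities to candidate negative texts.
        neg_texts = [sim[i][j] for j in range(n) if labels[j] != labels[i]]
        # Text i as query: similarities to candidate negative images.
        neg_images = [sim[j][i] for j in range(n) if labels[j] != labels[i]]
        if neg_texts:
            total += max(0.0, margin - pos + max(neg_texts))
        if neg_images:
            total += max(0.0, margin - pos + max(neg_images))
    return total / n


def recall_at_k(sim: Matrix, k: int) -> float:
    """Fraction of query rows whose true match (column i) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)


if __name__ == "__main__":
    # Toy 2-D embeddings and hypothetical disease labels.
    images = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    texts = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
    labels = ["early_blight", "leaf_mold", "early_blight"]
    sim = similarity_matrix(images, texts)
    t2i = [list(col) for col in zip(*sim)]  # text-to-image view
    print("loss:", round(masked_triplet_loss(sim, labels), 4))
    for k in (1, 5, 10):
        print(f"R@{k} i2t={recall_at_k(sim, k):.2f} t2i={recall_at_k(t2i, k):.2f}")
```

With only three toy items, R@5 and R@10 are trivially 1.0; on a real test set, the same function would be applied to the full query-by-gallery similarity matrix in each retrieval direction.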

Keywords

AR; FNE-ANS; LAFANet; LFA; TLDITRD; cross-modal; image-text retrieval

Full text: 1 | Database: MEDLINE | Language: English | Journal: Plants (Basel) | Publication year: 2024 | Document type: Article | Country of affiliation: China