Search | BVS - Ministry of Health

ViT-MVT: A Unified Vision Transformer Network for Multiple Vision Tasks.

Xie, Tao; Dai, Kun; Jiang, Zhiqiang; Li, Ruifeng; Mao, Shouren; Wang, Ke; Zhao, Lijun.

IEEE Trans Neural Netw Learn Syst; PP, 2023 Dec 21.

Article in English | MEDLINE | ID: mdl-38127606

ABSTRACT

In this work, we seek to learn multiple mainstream vision tasks concurrently using a unified network, which is storage-efficient because numerous networks with task-shared parameters can be implanted into a single consolidated network. Our framework, vision transformer (ViT)-MVT, built on a plain and nonhierarchical ViT, incorporates numerous visual tasks into a modest supernet and optimizes them jointly across various dataset domains. For the design of ViT-MVT, we augment the ViT with a multihead self-attention (MHSE) to offer complementary cues in the channel and spatial dimensions, as well as a local perception unit (LPU) and a locality feed-forward network (locality FFN) for information exchange in the local region, thus endowing ViT-MVT with the ability to effectively optimize multiple tasks. In addition, we construct a search space comprising potential architectures with a broad spectrum of model sizes to offer various optimal candidates for diverse tasks. We then design a layer-adaptive sharing technique that automatically determines whether each layer of the transformer block is shared across all tasks, enabling ViT-MVT to obtain task-shared parameters that reduce storage and task-specific parameters that learn task-related features, thereby boosting performance. Finally, we introduce a joint-task evolutionary search algorithm to discover an optimal backbone for all tasks under a total model size constraint, which challenges the conventional wisdom that visual tasks are typically supplied with backbone networks developed for image classification. Extensive experiments reveal that ViT-MVT delivers superior performance on multiple visual tasks compared with state-of-the-art methods while requiring considerably lower total storage cost. We further demonstrate that, once trained, ViT-MVT is capable of incremental learning when generalized to new tasks while retaining identical performance on the trained tasks.
The code is available at https://github.com/XT-1997/vitmvt.
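
The storage argument behind layer-adaptive sharing can be illustrated with a small accounting sketch. This is not the authors' implementation: the function name `total_params`, the per-layer parameter counts, and the sharing masks are hypothetical values chosen for illustration. The sketch assumes a shared layer is stored once, while an unshared layer is stored once per task.

```python
from typing import Sequence


def total_params(layer_params: Sequence[int],
                 shared: Sequence[bool],
                 num_tasks: int) -> int:
    """Count stored parameters for a multi-task network.

    Assumption (illustrative only): a shared layer is stored once for all
    tasks; a task-specific layer is stored once per task.
    """
    if len(layer_params) != len(shared):
        raise ValueError("layer_params and shared must have the same length")
    total = 0
    for params, is_shared in zip(layer_params, shared):
        total += params if is_shared else params * num_tasks
    return total


# Hypothetical three-layer network serving three tasks.
layer_params = [100, 200, 300]
all_shared = total_params(layer_params, [True, True, True], num_tasks=3)
none_shared = total_params(layer_params, [False, False, False], num_tasks=3)
mixed = total_params(layer_params, [True, False, True], num_tasks=3)

print(all_shared, mixed, none_shared)
```

Under these assumptions, a search procedure subject to a total model size constraint would only accept sharing masks whose `total_params` value stays within the budget.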

An Instance Segmentation-Based Method to Obtain the Leaf Age and Plant Centre of Weeds in Complex Field Environments.

Quan, Longzhe; Wu, Bing; Mao, Shouren; Yang, Chunjie; Li, Hengda.

Sensors (Basel); 21(10), 2021 May 13.

Article in English | MEDLINE | ID: mdl-34068108

ABSTRACT

Leaf age and plant centre are important phenotypic traits of weeds, and identifying them accurately plays an important role in understanding the morphological structure of weeds, guiding precise targeted spraying, and reducing herbicide use. In this work, a weed segmentation method based on BlendMask is proposed to obtain the phenotypic information of weeds under complex field conditions. Images were collected from different angles (front, side, and top views) of three kinds of weeds (Solanum nigrum, barnyard grass (Echinochloa crus-galli), and Abutilon theophrasti Medicus) in a maize field. Two datasets (with and without data enhancement) and two backbone networks (ResNet50 and ResNet101) were compared to improve model performance. Finally, seven evaluation indicators were used to evaluate the segmentation results of the model under different angles. The results indicated that data enhancement and ResNet101 as the backbone network could enhance the model performance. The F1 value of the plant centre is 0.9330, and the recognition accuracy of leaf age can reach 0.957. The mIOU value of the top view is 0.642. Therefore, deep learning methods can effectively identify weed leaf age and plant centre, which is of great significance for variable spraying.
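
For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of F1 and intersection over union (IoU) on toy data. It is not the evaluation code used in the study: the function names, the flat binary-mask representation, and the example counts are assumptions for illustration, and returning 1.0 for two empty masks is one common convention among several.

```python
from typing import Sequence, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union of two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Mean IoU over (prediction, target) pairs, e.g. one pair per class."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)


# Toy example with hypothetical counts and masks.
example_f1 = f1_score(tp=8, fp=2, fn=2)
example_iou = iou([1, 1, 0, 0], [1, 0, 1, 0])
example_miou = mean_iou([([1, 1, 0, 0], [1, 0, 1, 0]), ([1, 1], [1, 1])])

print(example_f1, example_iou, example_miou)
```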

Subjects

Herbicides, Plant Weeds, Plant Leaves, Zea mays
