Results 1 - 4 of 4

1.
IEEE Trans Pattern Anal Mach Intell ; 46(10): 6525-6541, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38502633

ABSTRACT

Transformers have shown remarkable performance; however, their architecture design is a time-consuming process that demands expertise and trial and error. It is therefore worthwhile to investigate efficient methods for automatically searching for high-performance Transformers via Transformer Architecture Search (TAS). To improve search efficiency, training-free proxy-based methods have been widely adopted in Neural Architecture Search (NAS). However, these proxies have been found to generalize poorly to Transformer search spaces, as confirmed by several studies and our own experiments. This paper presents an effective scheme for TAS called TRansformer Architecture search with ZerO-cost pRoxy guided evolution (T-Razor) that achieves exceptional efficiency. First, through theoretical analysis, we discover that the synaptic diversity of multi-head self-attention (MSA) and the synaptic saliency of the multi-layer perceptron (MLP) are correlated with the performance of the corresponding Transformers. These properties motivate us to introduce the ranks of synaptic diversity and synaptic saliency, denoted DSS++, for evaluating and ranking Transformers. DSS++ incorporates correlation information among sampled Transformers to provide unified scores for both synaptic diversity and synaptic saliency. We then propose a block-wise evolutionary search guided by DSS++ to find optimal Transformers; DSS++ determines the positions for mutation and crossover, enhancing the exploration ability. Experimental results demonstrate that T-Razor performs competitively against state-of-the-art manually and automatically designed Transformer architectures across four popular Transformer search spaces. Notably, T-Razor improves search efficiency across different Transformer search spaces, e.g., reducing the required GPU days from more than 24 to less than 0.4, and outperforms existing zero-cost approaches. We also apply T-Razor to the BERT search space and find that the searched Transformers achieve competitive GLUE results on several Natural Language Processing (NLP) datasets. This work provides insights into training-free TAS, revealing the usefulness of evaluating Transformers based on the properties of their different blocks.
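
As a rough illustration (a minimal sketch, not the authors' released code), the following PyTorch snippet scores a candidate Transformer with a single forward/backward pass: a nuclear-norm "synaptic diversity" proxy over attention weight matrices, an absolute-sum "synaptic saliency" proxy over MLP weights, and a simple rank aggregation in the spirit of DSS++. The parameter-name filters ("attn", "mlp") and the function names are assumptions made for illustration.

    import torch
    import torch.nn as nn

    def zero_cost_scores(model: nn.Module, sample: torch.Tensor) -> tuple[float, float]:
        # One forward/backward pass; only gradients are needed, no training.
        model.zero_grad()
        model(sample).sum().backward()  # dummy loss, gradients are all we use
        diversity, saliency = 0.0, 0.0
        for name, p in model.named_parameters():
            if p.grad is None or p.dim() != 2:
                continue
            score = (p * p.grad).detach()
            if "attn" in name:   # assumed naming for MSA projections
                # Nuclear norm rewards high-rank (diverse) attention weights.
                diversity += torch.linalg.matrix_norm(score, ord="nuc").item()
            elif "mlp" in name:  # assumed naming for MLP blocks
                saliency += score.abs().sum().item()
        return diversity, saliency

    def rank_candidates(scored: list[tuple[float, float]]) -> list[int]:
        # Combine per-proxy ranks (higher score -> better rank), DSS-style.
        def ranks(vals):
            order = sorted(range(len(vals)), key=lambda i: -vals[i])
            out = [0] * len(vals)
            for r, i in enumerate(order):
                out[i] = r
            return out
        d = ranks([x[0] for x in scored])
        s = ranks([x[1] for x in scored])
        return sorted(range(len(scored)), key=lambda i: d[i] + s[i])

In an evolutionary loop, such combined ranks could select the parents and, as the paper describes, direct mutation and crossover toward the weakest blocks.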

2.
IEEE Trans Pattern Anal Mach Intell ; 46(9): 6199-6215, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38502629

ABSTRACT

PSNR-oriented models are a critical class of super-resolution models with applications across various fields. However, these models tend to generate over-smoothed images, a problem that has previously been analyzed from the perspective of models or loss functions, without taking the properties of the data into account. In this paper, we identify a novel phenomenon that we term the center-oriented optimization (COO) problem, in which a model's output converges toward the center point of similar high-resolution images rather than toward the ground truth. We demonstrate that the severity of this problem is related to the uncertainty of the data, which we quantify using entropy. We prove that as the entropy of high-resolution images increases, their center point moves further away from the clean image distribution, and the model generates over-smoothed images. Perceptual-driven approaches such as perceptual loss, model-structure optimization, and GAN-based methods can be viewed as implicitly optimizing the COO problem. We propose an explicit solution to the COO problem, called Detail Enhanced Contrastive Loss (DECLoss). DECLoss exploits the clustering property of contrastive learning to directly reduce the variance of the potential high-resolution distribution and thereby decrease the entropy. We evaluate DECLoss on multiple super-resolution benchmarks and demonstrate that it improves the perceptual quality of PSNR-oriented models. Moreover, when applied to GAN-based methods such as RaGAN, DECLoss helps achieve state-of-the-art performance, e.g., 0.093 LPIPS with 24.51 PSNR on 4× downsampled Urban100, validating the effectiveness and generalization of our approach.
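
To make the clustering mechanism concrete, here is a minimal sketch (assumed PyTorch; simplified relative to the paper's DECLoss) of an InfoNCE-style contrastive loss that pulls a super-resolved patch embedding toward its ground-truth patch and away from patches of other high-resolution images, one way to shrink the variance of the predicted high-resolution distribution. The tensor shapes and temperature value are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def contrastive_detail_loss(sr_feat, hr_feat, neg_feats, tau=0.1):
        # sr_feat:   (B, D) embeddings of super-resolved patches
        # hr_feat:   (B, D) embeddings of the matching ground-truth patches
        # neg_feats: (B, K, D) embeddings of K negative HR patches per sample
        sr = F.normalize(sr_feat, dim=-1)
        pos = F.normalize(hr_feat, dim=-1)
        neg = F.normalize(neg_feats, dim=-1)
        # Positive pair similarity and negative similarities, scaled by tau.
        pos_logit = (sr * pos).sum(-1, keepdim=True) / tau     # (B, 1)
        neg_logit = torch.einsum("bd,bkd->bk", sr, neg) / tau  # (B, K)
        logits = torch.cat([pos_logit, neg_logit], dim=1)
        # The positive always sits at index 0, so the target label is 0.
        labels = torch.zeros(sr.size(0), dtype=torch.long, device=sr.device)
        return F.cross_entropy(logits, labels)

Minimizing this term concentrates SR outputs around their own ground truth rather than the center of all plausible high-resolution images, which is the intuition behind attacking the COO problem directly.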

3.
IEEE Trans Pattern Anal Mach Intell ; 43(9): 2936-2952, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33710952

ABSTRACT

Neural architecture search (NAS) has achieved unprecedented performance in various computer vision tasks. However, most existing NAS methods are deficient in search efficiency and model generalizability. In this paper, we propose a novel NAS framework, termed MIGO-NAS, which aims to guarantee efficiency and generalizability in arbitrary search spaces. On the one hand, we formulate the search space as a multivariate probabilistic distribution, which is then optimized by a novel multivariate information-geometric optimization (MIGO). By approximating the distribution with a sampling, training, and testing pipeline, MIGO guarantees memory efficiency, training efficiency, and search flexibility. Moreover, MIGO is the first method to decrease the estimation error of the natural gradient in a multivariate distribution. On the other hand, for a given set of constraints, the neural architectures are generated by a novel dynamic programming network generation (DPNG), which significantly reduces the training cost under various hardware environments. Experiments validate the advantages of our approach over existing methods in both accuracy and efficiency, i.e., a 2.39% test error on the CIFAR-10 benchmark and 21.7% on the ImageNet benchmark, with only 1.5 GPU hours and 96 GPU hours of search, respectively. Moreover, the searched architectures generalize well to computer vision tasks including object detection and semantic segmentation, i.e., 25× FLOPs compression with a 6.4 mAP gain on the Pascal VOC dataset, and 29.9× FLOPs compression with only a 1.41 percent performance drop on the Cityscapes dataset. The code is publicly available.
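
As a toy illustration of distribution-based search (not MIGO itself), the sketch below models each architectural decision as a categorical distribution, samples architectures, and applies a natural-gradient-style REINFORCE update in the logit (natural-parameter) coordinates. The search space and the objective are placeholders for the real sample-train-test pipeline.

    import numpy as np

    rng = np.random.default_rng(0)
    N_EDGES, N_OPS = 8, 5                  # toy space: 5 candidate ops per edge
    logits = np.zeros((N_EDGES, N_OPS))    # natural parameters of categoricals

    def probs():
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def evaluate(arch):
        # Stand-in for "sample, train briefly, test"; higher is better.
        return -np.abs(arch - 2).sum()     # toy objective favoring op index 2

    for step in range(200):
        p = probs()
        samples = [np.array([rng.choice(N_OPS, p=p[e]) for e in range(N_EDGES)])
                   for _ in range(16)]
        rewards = np.array([evaluate(a) for a in samples], dtype=float)
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # baseline
        grad = np.zeros_like(logits)
        for a, r in zip(samples, rewards):
            onehot = np.eye(N_OPS)[a]      # (N_EDGES, N_OPS)
            grad += r * (onehot - p)       # natural-gradient form in e-coordinates
        logits += 0.1 * grad / len(samples)

    print(probs().argmax(axis=1))          # most likely architecture

MIGO's contribution, per the abstract, is reducing the estimation error of exactly this kind of natural-gradient step for a multivariate distribution; the update rule above is only the standard baseline form.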

4.
IEEE Trans Pattern Anal Mach Intell ; 43(9): 3091-3107, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33780333

ABSTRACT

Automated machine learning (AutoML) has achieved remarkable progress on various tasks, thanks to its minimal reliance on manual feature and model design. However, most existing AutoML pipelines address only parts of the full machine learning pipeline, e.g., neural architecture search or optimizer selection. This leaves potentially important components such as data cleaning and model ensembling out of the optimization, and still results in considerable human involvement and suboptimal performance. The main challenges lie in the huge search space assembling all possibilities over all components, as well as in generalizing across different task types such as image, text, and tabular data. In this paper, we present a first-of-its-kind fully automated ML pipeline that comprehensively automates data preprocessing, feature engineering, model generation/selection/training, and ensembling for an arbitrary dataset and evaluation metric. Our innovation lies in the comprehensive scope of the learning pipeline, with a novel "life-long" knowledge-anchor design that fundamentally accelerates the search over the full search space. These knowledge anchors record detailed information about pipelines and integrate it with an evolutionary algorithm for joint optimization across components. Experiments demonstrate that the resulting pipeline achieves state-of-the-art performance on multiple datasets and modalities. In particular, the proposed framework was extensively evaluated in the NeurIPS 2019 AutoDL challenge and was the sole champion, winning by a significant margin over other approaches on all of the image, video, speech, text, and tabular tracks.
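
A toy sketch of the joint-search idea follows (component names and the scoring stub are illustrative, not the paper's system): an evolutionary loop mutates one pipeline component at a time, while an anchor-style cache records every evaluated pipeline so none is retrained.

    import random

    SPACE = {
        "cleaner":  ["none", "impute_mean", "impute_median"],
        "features": ["raw", "pca", "select_kbest"],
        "model":    ["logreg", "random_forest", "gbdt"],
        "ensemble": ["single", "bagging", "stacking"],
    }

    anchors: dict[tuple, float] = {}       # pipeline -> cached validation score

    def evaluate(pipe: dict) -> float:
        key = tuple(sorted(pipe.items()))
        if key not in anchors:             # anchor avoids re-evaluating pipelines
            anchors[key] = random.random() # stand-in for train/validate
        return anchors[key]

    def mutate(pipe: dict) -> dict:
        # All components sit in one joint search space; mutate any of them.
        child = dict(pipe)
        comp = random.choice(list(SPACE))
        child[comp] = random.choice(SPACE[comp])
        return child

    random.seed(0)
    pop = [{k: random.choice(v) for k, v in SPACE.items()} for _ in range(8)]
    for gen in range(20):
        pop.sort(key=evaluate, reverse=True)
        pop = pop[:4] + [mutate(random.choice(pop[:4])) for _ in range(4)]
    print(max(pop, key=evaluate))

The cache is the load-bearing piece: because data cleaning, feature engineering, model choice, and ensembling live in one genome, past evaluations keep paying off across the whole space, which is the role the paper's knowledge anchors play at much larger scale.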
