Results 1 - 8 of 8

1.
Article in English | MEDLINE | ID: mdl-38669166

ABSTRACT

The conventional approach to image recognition has been based on raster graphics, which can suffer from aliasing and information loss when scaled up or down. In this paper, we propose a novel approach that leverages the benefits of vector graphics for object localization and classification. Our method, called YOLaT (You Only Look at Text), takes the textual document of vector graphics as input, rather than rendering it into pixels. YOLaT builds multi-graphs to model the structural and spatial information in vector graphics and uses a dual-stream graph neural network (GNN) to detect objects from the graph. However, for real-world vector graphics, YOLaT models only a flat GNN with vertices as nodes, ignoring the higher-level information of vector data. We therefore propose YOLaT++, which learns multi-level abstraction features from a new perspective: from primitive shapes down to curves and points. Furthermore, because few public datasets focus on vector graphics, data-driven learning cannot exert its full power on this format. We provide a large-scale and challenging dataset for chart-based vector graphics detection and chart understanding, termed VG-DCU, with vector graphics, raster graphics, annotations, and the raw data used to create these vector charts. Experiments show that the YOLaT series outperforms both vector-graphics-based and raster-graphics-based object detection methods on both subsets of VG-DCU in terms of accuracy and efficiency, showcasing the potential of vector graphics for image recognition tasks. Our code, models, and the VG-DCU dataset are available at: https://github.com/microsoft/YOLaT-VectorGraphicsRecognition.
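As a rough illustration of the vertices-as-nodes modeling that YOLaT starts from, here is a minimal Python sketch (our own illustration, not the authors' code) that parses SVG <line> primitives straight from the textual document and assembles a graph in which shared endpoints merge into single nodes:

import xml.etree.ElementTree as ET

def svg_lines_to_graph(svg_text):
    """Parse <line> elements into (nodes, edges); endpoints shared by
    several strokes are merged into a single node."""
    root = ET.fromstring(svg_text)
    nodes, edges, index = [], [], {}
    for el in root.iter():
        if el.tag.endswith("line"):
            p1 = (float(el.get("x1")), float(el.get("y1")))
            p2 = (float(el.get("x2")), float(el.get("y2")))
            for p in (p1, p2):
                if p not in index:
                    index[p] = len(nodes)
                    nodes.append(p)
            edges.append((index[p1], index[p2]))
    return nodes, edges

svg = ('<svg xmlns="http://www.w3.org/2000/svg">'
       '<line x1="0" y1="0" x2="10" y2="0"/>'
       '<line x1="10" y1="0" x2="10" y2="10"/></svg>')
print(svg_lines_to_graph(svg))   # 3 merged nodes, 2 edges

A real pipeline would attach stroke attributes to the edges before feeding the graph to the dual-stream GNN; YOLaT++ additionally groups such primitives into higher-level shapes.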

2.
Nat Commun ; 14(1): 3315, 2023 06 07.
Article in English | MEDLINE | ID: mdl-37286541

ABSTRACT

Eye tracking provides valuable insight for analyzing visual attention and the underlying thinking process through the observation of eye movements. Here, a transparent, flexible, and ultra-persistent electrostatic sensing interface is proposed for realizing an active eye tracking (AET) system based on the electrostatic induction effect. Through a triple-layer structure combining a dielectric bilayer with a rough-surface silver nanowire (Ag NW) electrode layer, the inherent capacitance and interfacial trapping density of the electrostatic interface are strongly enhanced, contributing to an unprecedented charge storage capability. The electrostatic charge density of the interface reaches 1671.10 µC·m⁻² with a charge-keeping rate of 96.91% after 1000 non-contact operation cycles, ultimately enabling oculogyric detection with an angular resolution of 5°. The AET system thus enables real-time decoding of eye movements for customer preference recording and eye-controlled human-computer interaction, supporting its potential in commercial applications, virtual reality, human-computer interaction, and medical monitoring.


Subjects
Eye Movements; Eye-Tracking Technology; Humans; Static Electricity; Electrodes
3.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3634-3646, 2022 07.
Article in English | MEDLINE | ID: mdl-33497330

ABSTRACT

Neural architecture search (NAS) has attracted a great deal of attention and has been shown to bring tangible benefits to a large number of applications in the past few years. Architecture topology and architecture size have been regarded as two of the most important aspects of deep learning model performance, and the community has produced many search algorithms for both aspects of neural architectures. However, the performance gains of these search algorithms are achieved under different search spaces and training setups. This makes the overall performance of the algorithms incomparable and the improvement from a sub-module of a search algorithm unclear. In this paper, we propose NATS-Bench, a unified benchmark for searching over both topology and size, applicable to (almost) any up-to-date NAS algorithm. NATS-Bench includes a search space of 15,625 neural cell candidates for architecture topology and 32,768 for architecture size on three datasets. We analyze the validity of our benchmark in terms of various criteria and the performance comparison of all candidates in the search space. We also show the versatility of NATS-Bench by benchmarking 13 recent state-of-the-art NAS algorithms on it. All logs and diagnostic information, obtained with the same training setup for each candidate, are provided. This allows a much larger community of researchers to focus on developing better NAS algorithms in a more comparable and computationally efficient environment. All code is publicly available at: https://xuanyidong.com/assets/projects/NATS-Bench.
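The practical appeal of a tabular benchmark is that a search algorithm queries precomputed training results instead of training anything itself, making competing algorithms directly comparable. A hypothetical sketch of that workflow (stand-in random data, not the real NATS-Bench API):

import random

random.seed(0)
# Stand-in for NATS-Bench's 15,625 topology candidates: index -> accuracy.
benchmark = {i: random.uniform(0.80, 0.95) for i in range(15625)}

def random_search(num_queries=100):
    """Baseline NAS: sample architectures, keep the best benchmark score.
    Each "evaluation" is a table lookup, not a training run."""
    best_index, best_acc = None, 0.0
    for _ in range(num_queries):
        i = random.randrange(len(benchmark))
        if benchmark[i] > best_acc:
            best_index, best_acc = i, benchmark[i]
    return best_index, best_acc

print(random_search())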


Subjects
Algorithms; Neural Networks, Computer; Benchmarking; Neurons; Plant Extracts
4.
IEEE Trans Pattern Anal Mach Intell ; 43(10): 3681-3694, 2021 10.
Article in English | MEDLINE | ID: mdl-32248096

ABSTRACT

We present supervision by registration and triangulation (SRT), an unsupervised approach that utilizes unlabeled multi-view video to improve the accuracy and precision of landmark detectors. The ability to use unlabeled data lets our detectors learn from the massive amounts of freely available unlabeled data rather than being limited by the quality and quantity of manual human annotations. Two key observations make unlabeled data usable: (I) detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow; (II) detections of the same landmark in multiple synchronized and geometrically calibrated views should correspond to a single 3D point, i.e., multi-view consistency. Registration and multi-view consistency are sources of supervision that require no manual labeling, so they can be leveraged to augment existing training data during detector training. End-to-end training is made possible by differentiable registration and 3D triangulation modules. Experiments on 11 datasets, together with a newly proposed metric for measuring precision, demonstrate accuracy and precision improvements in landmark detection on both images and video.
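Both supervision signals reduce to simple residuals. The NumPy sketch below is our own illustration of the two losses under stated assumptions (a flow lookup callable and known 3x4 camera projection matrices), not the authors' implementation:

import numpy as np

def registration_loss(det_t, det_t1, flow_at):
    """The frame-t detection, advected by optical flow, should land on
    the frame-(t+1) detection. flow_at maps a 2D point to its flow."""
    warped = np.asarray(det_t) + flow_at(det_t)
    return float(np.sum((warped - np.asarray(det_t1)) ** 2))

def triangulation_loss(points_2d, projections, point_3d):
    """Reprojections of one triangulated 3D point should match the 2D
    detections in every calibrated view. projections: 3x4 matrices."""
    x_h = np.append(point_3d, 1.0)            # homogeneous coordinates
    loss = 0.0
    for p2d, P in zip(points_2d, projections):
        proj = P @ x_h
        proj = proj[:2] / proj[2]             # perspective divide
        loss += float(np.sum((proj - np.asarray(p2d)) ** 2))
    return loss

Because both residuals are differentiable in the detector's outputs, they can be backpropagated alongside the supervised loss, which is what makes the end-to-end training described above possible.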

5.
IEEE Trans Cybern ; 50(8): 3594-3604, 2020 Aug.
Article in English | MEDLINE | ID: mdl-31478883

ABSTRACT

Deeper and wider convolutional neural networks (CNNs) achieve superior performance but incur expensive computation costs, so accelerating such overparameterized networks has received increasing attention. A typical pruning algorithm is a three-stage pipeline: training, pruning, and retraining. Prevailing approaches fix the pruned filters to zero during retraining and thus significantly reduce the optimization space. Moreover, they prune a large number of filters all at once, which causes unrecoverable information loss. To solve these problems, we propose an asymptotic soft filter pruning (ASFP) method to accelerate the inference of deep neural networks. First, we keep updating the pruned filters during the retraining stage. As a result, the optimization space of the pruned model is not reduced but remains the same as that of the original model, so the model retains enough capacity to learn from the training data. Second, we prune the network asymptotically: a few filters at first, then progressively more as training proceeds. With asymptotic pruning, the information in the training set is gradually concentrated in the remaining filters, keeping the subsequent training and pruning process stable. Experiments show the effectiveness of ASFP on image classification benchmarks. Notably, on ILSVRC-2012, ASFP reduces more than 40% of the FLOPs of ResNet-50 with only 0.14% top-5 accuracy degradation, an 8% improvement over soft filter pruning.
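The two ingredients, soft pruning and an asymptotic schedule, are easy to state in code. A hypothetical PyTorch sketch of one pruning step (our reading of the abstract, not the authors' released code; the cubic schedule is an assumption made for illustration):

import torch

def soft_prune_conv(weight, ratio):
    """Zero the fraction `ratio` of output filters with the smallest L2
    norms; the zeroed filters stay in the model and can still receive
    gradient updates in the next retraining epoch ("soft" pruning).
    weight: (out_channels, in_channels, k, k) conv weight tensor."""
    norms = weight.view(weight.size(0), -1).norm(dim=1)
    num_pruned = int(weight.size(0) * ratio)
    if num_pruned > 0:
        _, idx = torch.topk(norms, num_pruned, largest=False)
        with torch.no_grad():
            weight[idx] = 0.0

def asymptotic_ratio(epoch, total_epochs, target=0.4):
    """Prune few filters early, approaching `target` as training ends
    (cubic ramp chosen here purely for illustration)."""
    return target * (1.0 - (1.0 - epoch / total_epochs) ** 3)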


Subjects
Algorithms; Neural Networks, Computer; Image Processing, Computer-Assisted
6.
IEEE Trans Pattern Anal Mach Intell ; 41(7): 1641-1654, 2019 07.
Article in English | MEDLINE | ID: mdl-29994192

ABSTRACT

In this paper, we study object detection using a large pool of unlabeled images and only a few labeled images per category, a setting we call "few-example object detection". The key challenge is to generate as many trustworthy training samples as possible from the pool. Using the few training examples as seeds, our method iterates between model training and high-confidence sample selection. In training, easy samples are generated first, and the poorly initialized model improves on them. As the model becomes more discriminative, challenging but reliable samples are selected, followed by another round of model improvement. To further improve the precision and recall of the generated training samples, we embed multiple detection models in our framework, which proves to outperform both the single-model baseline and the model ensemble method. Experiments on PASCAL VOC'07, MS COCO'14, and ILSVRC'13 indicate that, using as few as three or four selected samples per category, our method produces very competitive results compared with state-of-the-art weakly supervised approaches that use a large number of image-level labels.
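The loop alternates between fitting on trusted samples and mining new ones, admitting harder examples as the detector matures. A schematic Python sketch (the callables, round count, and thresholds are illustrative assumptions, not values from the paper):

def few_example_training(seed_labeled, unlabeled, train, predict, rounds=4):
    """train(samples) -> model; predict(model, x) -> (label, confidence)."""
    trusted = list(seed_labeled)
    threshold = 0.95                     # begin with only the easiest mines
    for _ in range(rounds):
        model = train(trusted)
        mined = []
        for x in unlabeled:
            label, score = predict(model, x)
            if score >= threshold:       # keep only reliable pseudo labels
                mined.append((x, label))
        trusted = list(seed_labeled) + mined
        threshold -= 0.05                # admit harder samples next round
    return train(trusted)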

7.
IEEE Trans Image Process ; 28(1): 518-528, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30176590

ABSTRACT

In many real-world applications, data can be represented in multiple ways, with multi-view features describing different characteristics of the data. Prediction performance can be significantly improved by exploiting these features together. Late fusion, which combines the predictions of multiple features, is a commonly used approach to making the final decision for a test instance. However, different features frequently dispute each other's predictions on the same data, leading to performance degeneration. In this paper, we propose an efficient and effective matrix factorization-based approach to fusing predictions from multiple sources. The approach places a hard constraint on the matrix rank to preserve the consistency of predictions across features, so we name it Hard-rank Constraint Matrix Factorization-based fusion (HCMF). HCMF avoids the performance degeneration caused by controversy among multiple features. Extensive experiments demonstrate the efficacy of HCMF for outlier detection, where it outperforms state-of-the-art late fusion algorithms on many datasets.
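One way to picture the hard rank constraint: stack each view's scores into a matrix and replace it with its nearest low-rank approximation, which forces the views toward a shared prediction. A NumPy sketch of that intuition via truncated SVD (our illustration of the idea; the paper's actual factorization solver will differ):

import numpy as np

def hard_rank_fusion(predictions, rank=1):
    """predictions: (num_views, num_instances) score matrix.
    Returns fused per-instance scores after a hard rank constraint."""
    u, s, vt = np.linalg.svd(predictions, full_matrices=False)
    s[rank:] = 0.0                       # hard rank constraint
    consistent = (u * s) @ vt            # nearest rank-`rank` approximation
    return consistent.mean(axis=0)       # fuse the now-consistent views

views = np.array([[0.9, 0.2, 0.7],      # view 1 scores for 3 instances
                  [0.8, 0.1, 0.9],      # view 2 (roughly agrees)
                  [0.1, 0.9, 0.2]])     # view 3 (disputes the others)
print(hard_rank_fusion(views))

The low-rank projection dampens the dissenting view instead of letting it veto the consensus, which is the failure mode of naive score averaging that the abstract describes.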

8.
Article in English | MEDLINE | ID: mdl-30629502

ABSTRACT

In this paper, we focus on the one-example person re-identification (re-ID) task, where each identity has only one labeled example alongside many unlabeled examples. We propose a progressive framework that gradually exploits the unlabeled data for person re-ID. In this framework, we iteratively (1) update the convolutional neural network (CNN) model and (2) estimate pseudo labels for the unlabeled data. We split the training data into three parts: labeled data, pseudo-labeled data, and index-labeled data. Initially, the re-ID model is trained on the labeled data. For subsequent model training, we update the CNN model by jointly training on the three data parts. The proposed joint training method can optimize the model with both the data carrying labels (or pseudo labels) and the data without any reliable labels. For the label estimation step, instead of a static sampling strategy, we propose a progressive sampling strategy that increases the number of selected pseudo-labeled candidates step by step. We select a few candidates with the most reliable pseudo labels from the unlabeled examples as the pseudo-labeled data and keep the rest as index-labeled data by assigning them their data indexes. During the iterations, index-labeled data are dynamically transferred to the pseudo-labeled data. Notably, the rank-1 accuracy of our method outperforms the state of the art by 21.6 points (absolute, i.e., 62.8% vs. 41.2%) on MARS and by 16.6 points on DukeMTMC-VideoReID. Extended to the few-example setting, our approach with only 20% of the data labeled surprisingly achieves performance comparable to the supervised state-of-the-art method with 100% labeled data.
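The progressive sampling step can be sketched in a few lines: each iteration promotes a growing, confidence-ranked slice of the unlabeled pool from index-labeled to pseudo-labeled status. The function and variable names below, and the linear growth of the selected share, are illustrative assumptions, not the authors' code:

def progressive_split(candidates, confidences, iteration, total_iters):
    """Return (pseudo_labeled, index_labeled) for this iteration; the
    pseudo-labeled share grows with the iteration count."""
    num_selected = int(len(candidates) * (iteration + 1) / total_iters)
    order = sorted(range(len(candidates)),
                   key=lambda i: confidences[i], reverse=True)
    chosen = set(order[:num_selected])
    pseudo = [candidates[i] for i in order[:num_selected]]
    index_only = [candidates[i] for i in range(len(candidates))
                  if i not in chosen]
    return pseudo, index_only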
