Results 1 - 7 of 7
1.
Neural Netw ; 180: 106572, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39173200

ABSTRACT

Person Re-identification (Re-ID) aims to match person images across non-overlapping cameras. The existing approaches formulate this task as fine-grained representation learning with deep neural networks, which involves extracting image features using a deep convolutional network, followed by mapping the features into a discriminative space through another smaller network, in order to make full use of all possible cues. However, recent Re-ID methods that strive to capture every cue and make the space more discriminative have resulted in longer features, ranging from 1024 to 14336, leading to higher time (distance computation) and space (feature storage) complexities. There are two potential solutions: reduction-after-training methods (such as Principal Component Analysis and Linear Discriminant Analysis) and reduction-during-training methods (such as 1 × 1 Convolution). The former utilizes a statistical approach aiming for a global optimum but lacking end-to-end optimization of large data and deep neural networks. The latter lacks theoretical guarantees and may be vulnerable to training noise such as dataset noise or initialization seed. To address these limitations, we propose a method called Euclidean-Distance-Preserving Feature Reduction (EDPFR) that combines the strengths of both reduction-after-training and reduction-during-training methods. EDPFR first formulates the feature reduction process as a matrix decomposition and derives a condition to preserve the Euclidean distance between features, thus ensuring accuracy in theory. Furthermore, the method integrates the matrix decomposition process into a deep neural network to enable end-to-end optimization and batch training, while maintaining the theoretical guarantee. The result of the EDPFR is a reduction of the feature dimensions from fa and fb to fa' and fb', while preserving their Euclidean distance, i.e., L2(fa, fb) = L2(fa', fb').
In addition to its Euclidean-Distance-Preserving capability, EDPFR also features a novel feature-level distillation loss. One of the main challenges in knowledge distillation is dimension mismatch. Previous distillation losses usually project the mismatched features to matched class-level, spatial-level, or similarity-level spaces, which can result in a loss of information and decrease the flexibility and efficiency of distillation. Our proposed feature-level distillation leverages the benefits of the Euclidean-Distance-Preserving property and performs distillation directly in the feature space, resulting in a more flexible and efficient approach. Extensive experiments on three Re-ID datasets, Market-1501, DukeMTMC-reID and MSMT, demonstrate the effectiveness of our proposed Euclidean-Distance-Preserving Feature Reduction.
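
The distance-preserving property stated in the abstract, L2(fa, fb) = L2(fa', fb'), can be illustrated with a small NumPy sketch. This is not the EDPFR algorithm, whose decomposition and training procedure the abstract does not spell out. It only shows the underlying linear-algebra fact under an assumed setup: when long features lie in a low-dimensional subspace, projecting them onto an orthonormal basis of that subspace (obtained here by SVD) leaves all pairwise Euclidean distances unchanged. The sizes 512 and 32 and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: 100 features of length 512 that lie in a 32-dim subspace.
n, d_long, d_short = 100, 512, 32
latent = rng.normal(size=(n, d_short))
basis = rng.normal(size=(d_short, d_long))
features = latent @ basis  # shape (100, 512), rank 32

# Thin SVD: the first 32 rows of vt form an orthonormal basis of the row space.
_, _, vt = np.linalg.svd(features, full_matrices=False)
projection = vt[:d_short].T  # shape (512, 32), orthonormal columns

reduced = features @ projection  # shape (100, 32)

def pairwise_l2(x):
    """Euclidean distance between every pair of rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Largest change in any pairwise distance caused by the reduction.
max_gap = np.abs(pairwise_l2(features) - pairwise_l2(reduced)).max()
print(reduced.shape, max_gap)
```

In this toy case the gap is at the level of floating-point rounding. Real Re-ID features are only approximately low-rank, so a plain SVD projection would preserve distances approximately rather than exactly.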

2.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 3013-3030, 2024 May.
Article in English | MEDLINE | ID: mdl-38090825

ABSTRACT

Fast person re-identification (ReID) aims to search person images quickly and accurately. The main idea of recent fast ReID methods is the hashing algorithm, which learns compact binary codes and performs fast Hamming distance computation and counting sort. However, a very long code is needed for high accuracy (e.g., 2048), which compromises search speed. In this work, we introduce a new solution for fast ReID by formulating a novel Coarse-to-Fine (CtF) hashing code search strategy, which complementarily uses short and long codes, achieving both faster speed and better accuracy. It uses shorter codes to coarsely rank broad matching similarities and longer codes to refine only a few top candidates for more accurate instance ReID. Specifically, we design an All-in-One (AiO) module together with a Distance Threshold Optimization (DTO) algorithm. In AiO, we simultaneously learn and enhance multiple codes of different lengths in a single model. It learns multiple codes in a pyramid structure and encourages shorter codes to mimic longer codes by self-distillation. DTO solves a complex threshold search problem by a simple optimization process, and the balance between accuracy and speed is easily controlled by a single parameter. It formulates the optimization target as an Fβ score that can be optimised by Gaussian cumulative distribution functions. Besides, we find that even a short code (e.g., 32) still takes a long time under a large-scale gallery due to the O(n) time complexity. To solve the problem, we propose a gallery-size-free latent-attributes-based One-Shot-Filter (OSF) strategy, which always has O(1) time complexity, to quickly filter out the majority of easy negative gallery images. Specifically, we design a Latent-Attribute-Learning (LAL) module supervised by a Single-Direction-Metric (SDM) Loss. LAL is derived from principal component analysis (PCA), which keeps the largest variance using the shortest feature vector, while enabling batch and end-to-end learning.
Every logit of a feature vector represents a meaningful attribute. SDM is carefully designed for fine-grained attribute supervision, outperforming common metrics such as Euclidean and Cosine metrics. Experimental results on 2 datasets show that CtF+OSF is not only 2% more accurate but also 5× faster than contemporary hashing ReID methods. Compared with non-hashing ReID methods, CtF is 50× faster with comparable accuracy. OSF further speeds up CtF by another 2×, up to 10× in total, with almost no accuracy drop.
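
The coarse-to-fine idea described above (rank the whole gallery with short codes, then re-rank only a few top candidates with long codes) can be sketched as follows. This is a minimal illustration, not the authors' CtF, AiO, or DTO implementation: the function names, the fixed `top_k` cutoff (the abstract instead describes an optimized distance threshold), and the use of a long code's 32-bit prefix as its short code are all assumptions made for the demo.

```python
import numpy as np

def hamming(query, gallery):
    """Hamming distance between one binary code and each row of a code matrix."""
    return np.count_nonzero(query != gallery, axis=1)

def coarse_to_fine_search(q_short, q_long, g_short, g_long, top_k):
    # Coarse stage: rank the whole gallery using the cheap short codes.
    coarse = hamming(q_short, g_short)
    candidates = np.argsort(coarse, kind="stable")[:top_k]
    # Fine stage: re-rank only the surviving candidates using the long codes.
    fine = hamming(q_long, g_long[candidates])
    return candidates[np.argsort(fine, kind="stable")]

rng = np.random.default_rng(0)
g_long = rng.integers(0, 2, size=(1000, 2048), dtype=np.uint8)
g_short = g_long[:, :32]  # demo assumption: short code = prefix of the long code

# Build a query as a noisy copy of gallery item 123 (every 100th bit flipped).
target = 123
q_long = g_long[target].copy()
q_long[::100] ^= 1
q_short = q_long[:32]

ranking = coarse_to_fine_search(q_short, q_long, g_short, g_long, top_k=50)
print(ranking[:5])
```

Only 50 of the 1000 gallery items are compared with the 2048-bit codes, which is where the speed-up of such a two-stage search comes from.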

3.
IEEE Trans Med Imaging ; 41(8): 1925-1937, 2022 08.
Article in English | MEDLINE | ID: mdl-35148262

ABSTRACT

Magnetic Resonance Imaging (MRI) has been proven to be an efficient way to diagnose Alzheimer's disease (AD). Recent dramatic progress in deep learning has greatly advanced MRI analysis based on data-driven CNN methods that use large-scale longitudinal MRI datasets. However, most of the existing MRI datasets are fragmented because volunteers drop out unexpectedly. To tackle this problem, we propose a novel Temporal Recurrent Generative Adversarial Network (TR-GAN) to complete missing sessions of MRI datasets. Unlike existing GAN-based methods, which either fail to generate future sessions or only generate fixed-length sessions, TR-GAN takes all past sessions to recurrently and smoothly generate future ones of variable length. Specifically, TR-GAN adopts a recurrent connection to deal with variable input sequence lengths and flexibly generate a variable number of future sessions. Besides, we also design a multiple scale & location (MSL) module and a SWAP module to encourage the model to better focus on detailed information, which helps to generate high-quality MRI data. Compared with other popular GAN architectures, TR-GAN achieved the best performance in all evaluation metrics on two datasets. After expanding the whole MRI dataset, the balanced accuracy of AD vs. cognitively normal (CN) vs. mild cognitive impairment (MCI) and stable MCI vs. progressive MCI classification can be increased by 3.61% and 4.00%, respectively.


Subjects
Alzheimer Disease , Cognitive Dysfunction , Alzheimer Disease/diagnostic imaging , Cognitive Dysfunction/diagnostic imaging , Humans , Magnetic Resonance Imaging
4.
Med Image Anal ; 76: 102310, 2022 02.
Article in English | MEDLINE | ID: mdl-34954623

ABSTRACT

Surgical instrument segmentation plays a promising role in robot-assisted surgery. However, illumination issues often appear in surgical scenes, altering the color and texture of surgical instruments. Changes in visual features make surgical instrument segmentation difficult. To address illumination issues, SurgiNet is proposed to learn pyramid attention features. The double attention module is designed to capture the semantic dependencies between locations and channels. Based on semantic dependencies, the semantic features in the disturbed area can be inferred for addressing illumination issues. Pyramid attention is aggregated to capture multi-scale features and make predictions more accurate. To perform model compression, class-wise self-distillation is proposed to enhance the representation learning of the network, which performs feature distillation within the class to eliminate interference from other classes. Top-down and multi-stage knowledge distillation is designed to distill class probability maps. By inter-layer supervision, high-level probability maps are applied to calibrate the probability distribution of low-level probability maps. Since class-wise distillation enhances the self-learning of the network, the network can achieve excellent performance with a lightweight backbone. The proposed network achieves state-of-the-art performance of 89.14% mIoU on CataIS with only 1.66 GFLOPs and 2.05 M parameters. It also takes first place on EndoVis 2017 with 66.30% mIoU.
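
For readers unfamiliar with the mIoU figure quoted above, the following is a minimal sketch of how mean intersection-over-union is commonly computed from a predicted and a ground-truth label map. It is not the evaluation protocol of CataIS or EndoVis 2017, which the abstract does not describe; benchmarks differ in whether they average per image or per dataset and in how they treat absent classes. The function name and the choice to skip classes missing from both maps are assumptions.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean of per-class intersection-over-union between two label maps."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # assumption: skip classes absent from both maps
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x4 label maps with two classes (0 = background, 1 = instrument).
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pred = np.array([[0, 0, 0, 1],
                 [0, 0, 1, 1]])

score = mean_iou(pred, target, num_classes=2)
print(score)  # class 0: 4/5, class 1: 3/4, mean 0.775
```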


Subjects
Image Processing, Computer-Assisted , Humans , Attention , Semantics , Surgical Instruments
5.
IEEE Trans Neural Netw Learn Syst ; 33(8): 4110-4124, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33684043

ABSTRACT

Hashing is a popular search algorithm for its compact binary representation and efficient Hamming distance calculation. Benefiting from advances in deep learning, deep hashing methods have achieved promising performance. However, those methods usually learn with expensive labeled data but fail to utilize unlabeled data. Furthermore, the traditional pairwise loss used by those methods cannot explicitly force similar/dissimilar pairs to small/large distances. Both weaknesses limit existing methods' performance. To solve the first problem, we propose a novel semi-supervised deep hashing model named adversarial binary mutual learning (ABML). Specifically, our ABML consists of a generative model GH and a discriminative model DH, where DH learns labeled data in a supervised way and GH learns unlabeled data by synthesizing real images. We adopt an adversarial learning (AL) strategy to transfer the knowledge of unlabeled data to DH by making GH and DH mutually learn from each other. To solve the second problem, we propose a novel Weibull cross-entropy loss (WCE) by using the Weibull distribution, which can distinguish tiny differences of distances and explicitly force similar/dissimilar distances to be as small/large as possible. Thus, the learned features are more discriminative. Finally, by incorporating ABML with WCE loss, our model can acquire more semantic and discriminative features. Extensive experiments on four common data sets (CIFAR-10, large database of handwritten digits (MNIST), ImageNet-10, and NUS-WIDE) and a large-scale data set ImageNet demonstrate that our approach successfully overcomes the two difficulties above and significantly outperforms state-of-the-art hashing methods.
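
The "compact binary representation and efficient Hamming distance calculation" mentioned in the first sentence can be made concrete with a short sketch. It is a generic illustration and not part of ABML or the WCE loss: codes are packed eight bits per byte, and the distance is the number of set bits in the XOR of two packed codes. The helper names and the byte-level lookup table are illustrative choices.

```python
import numpy as np

def pack_codes(bits):
    """Pack an (n, n_bits) array of 0/1 values into bytes, 8 bits per byte."""
    return np.packbits(bits.astype(np.uint8), axis=1)

# Lookup table: number of set bits for every possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_packed(query, gallery):
    """Hamming distances between one packed code and each packed gallery row."""
    return POPCOUNT[np.bitwise_xor(query, gallery)].sum(axis=1)

# Three 16-bit codes: the second differs from the first in its last 8 bits,
# the third differs from the first in its first 8 bits.
bits = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
])

packed = pack_codes(bits)        # shape (3, 2): 16 bits stored in 2 bytes
distances = hamming_packed(packed[0], packed)
print(distances.tolist())        # [0, 8, 8]
```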

6.
Article in English | MEDLINE | ID: mdl-34851833

ABSTRACT

Recently, unsupervised cross-dataset person reidentification (Re-ID), which aims to transfer knowledge of a labeled source domain to an unlabeled target domain, has attracted more and more attention. There are two common frameworks: one is pixel-alignment, which transfers low-level knowledge, and the other is feature-alignment, which transfers high-level knowledge. In this article, we propose a novel recurrent autoencoder (RAE) framework to unify these two kinds of methods and inherit their merits. Specifically, the proposed RAE includes three modules, i.e., a feature-transfer (FT) module, a pixel-transfer (PT) module, and a fusion module. The FT module utilizes an encoder to map source and target images to a shared feature space. In this space, the features are identity-discriminative and the gap between source and target features is reduced. The PT module uses a decoder to reconstruct the original images from their features. Here, we hope that the images reconstructed from target features are in the source style. Thus, the low-level knowledge can be propagated to the target domain. After transferring both high- and low-level knowledge with the two proposed modules above, we design another bilinear pooling layer to fuse both kinds of knowledge. Extensive experiments on Market-1501, DukeMTMC-ReID, and MSMT17 datasets show that our method significantly outperforms both pixel-alignment and feature-alignment Re-ID methods and achieves new state-of-the-art results.

7.
Neural Netw ; 128: 294-304, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32470795

ABSTRACT

RGB-Infrared (IR) person re-identification is very challenging due to the large cross-modality variations between RGB and IR images. Since there are no correspondence labels between pairs of RGB and IR images, most methods try to alleviate the variations with set-level alignment by reducing the marginal distribution divergence between the entire RGB and IR sets. However, this set-level alignment strategy may lead to misalignment of some instances, which limits the performance of RGB-IR Re-ID. Different from existing methods, in this paper, we propose to generate cross-modality paired-images and perform both global set-level and fine-grained instance-level alignments. Our proposed method enjoys several merits. First, our method can perform set-level alignment by disentangling modality-specific and modality-invariant features. Compared with conventional methods, ours can explicitly remove the modality-specific features, so the modality variation can be better reduced. Second, given cross-modality unpaired-images of a person, our method can generate cross-modality paired images from exchanged features. With them, we can directly perform instance-level alignment by minimizing the distances of every pair of images. Third, our method learns a latent manifold space. In this space, we can randomly sample and generate many images of unseen classes. When trained with those images, the learned identity feature space is smoother and generalizes better at test time. Finally, extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favorably against state-of-the-art methods.


Subjects
Biometric Identification/methods , Machine Learning , Infrared Rays