Results 1 - 20 of 32
1.
Article in English | MEDLINE | ID: mdl-38833395

ABSTRACT

Hashing has received significant interest in large-scale data retrieval due to its outstanding computational efficiency. Of late, numerous deep hashing approaches have emerged, obtaining impressive performance. However, these approaches can carry ethical risks during image retrieval. To address this, we are the first to study the problem of group fairness within learning to hash and introduce a novel method termed Fairness-aware Hashing with Mixture of Experts (FATE). Specifically, FATE leverages the mixture-of-experts framework as the hashing network, where each expert contributes knowledge from an individual viewpoint, followed by aggregation using the gating mechanism. This strongly enhances the model capability, facilitating the generation of both discriminative and unbiased binary descriptors. We also incorporate fairness-aware contrastive learning, combining sensitive labels with feature similarities to ensure unbiased hash code learning. Furthermore, an adversarial learning objective conditioned on both deep features and hash codes is employed to further eliminate group biases. Extensive experiments on several benchmark datasets validate the superiority of the proposed FATE compared with various state-of-the-art approaches.
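
The gated expert aggregation described in this abstract can be sketched in a few lines. This is a toy illustration only: the linear experts, dimensions, and sign-based binarization are our assumptions, not FATE's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_hash(x, expert_ws, gate_w):
    # Each expert maps features to hash logits from its own "viewpoint";
    # a softmax gate mixes them; sign() yields binary descriptors.
    gate = np.exp(x @ gate_w)
    gate /= gate.sum(axis=1, keepdims=True)                # (B, E) gate weights
    logits = np.stack([x @ w for w in expert_ws], axis=1)  # (B, E, bits)
    mixed = (gate[..., None] * logits).sum(axis=1)         # gated aggregation
    return np.sign(mixed)

x = rng.standard_normal((8, 16))                        # a batch of deep features
experts = [rng.standard_normal((16, 32)) for _ in range(4)]
codes = moe_hash(x, experts, rng.standard_normal((16, 4)))  # 8 x 32-bit codes
```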

2.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 5157-5173, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38319771

ABSTRACT

In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are employed to evolve the level-set curves implicitly, and a local consistency module based on a pixel affinity kernel is used to mine the local context and spatial relations. Two types of single-stage frameworks, i.e., CNN-based and transformer-based frameworks, are developed to empower the level-set evolution for box-supervised instance segmentation, and each framework consists of three essential components: instance-aware decoder, box-level matching assignment and level-set evolution. By minimizing the level-set energy function, the mask map of each instance can be iteratively optimized within its bounding box annotation. The experimental results on five challenging testbeds, covering general scenes, remote sensing, medical and scene text images, demonstrate the outstanding performance of our proposed Box2Mask approach for box-supervised instance segmentation. In particular, with the Swin-Transformer large backbone, our Box2Mask obtains 42.4% mask AP on COCO, which is on par with the recently developed fully mask-supervised methods.

3.
IEEE Trans Pattern Anal Mach Intell ; 46(4): 2333-2347, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37943653

ABSTRACT

This paper delves into the problem of correlated time-series forecasting in practical applications, an area of growing interest in a multitude of fields such as stock price prediction and traffic demand analysis. Current methodologies primarily represent data using conventional graph structures, yet these fail to capture intricate structures with non-pairwise relationships. To address this challenge, we adopt dynamic hypergraphs in this study to better illustrate complex interactions, and introduce a novel hypergraph neural network model named CHNN for correlated time series forecasting. In more detail, CHNN leverages both semantic and topological similarities via an interaction model and hypergraph diffusion process, thereby constructing comprehensive collaborative correlation scores that effectively guide spatial message propagation. In addition, it incorporates short-term temporal information to generate efficient spatio-temporal feature maps. Lastly, a long-term temporal module is proposed to generate future predictions utilizing both temporal attention and a gated recurrent network. Comprehensive experiments conducted on four real-world datasets, i.e., Tiingo, Stocktwits, NYC-Taxi, and Social Network demonstrate that the proposed CHNN markedly outperforms a range of benchmark methods.
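
One step of hypergraph message passing, in a standard degree-normalized formulation (not necessarily CHNN's exact diffusion operator), might look like this sketch, where H is a node-by-hyperedge incidence matrix:

```python
import numpy as np

def hypergraph_propagate(X, H):
    # Node -> hyperedge: average features of the nodes in each hyperedge;
    # hyperedge -> node: average the messages of hyperedges containing each node.
    De = H.sum(axis=0)                    # hyperedge degrees
    Dv = H.sum(axis=1)                    # node degrees
    edge_msg = (H.T @ X) / De[:, None]
    return (H @ edge_msg) / Dv[:, None]

H = np.array([[1.0, 0.0],                 # 3 nodes, 2 hyperedges
              [1.0, 1.0],
              [0.0, 1.0]])
X = np.array([[1.0], [2.0], [3.0]])
out = hypergraph_propagate(X, H)          # smoothed node features
```

Node 1 sits in both hyperedges, so its output averages the two hyperedge messages; this is how non-pairwise relations enter the propagation.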

4.
IEEE Trans Image Process ; 32: 5909-5920, 2023.
Article in English | MEDLINE | ID: mdl-37883290

ABSTRACT

The optical flow guidance strategy is ideal for obtaining motion information of objects in the video. It is widely utilized in video segmentation tasks. However, existing optical flow-based methods have a significant dependency on optical flow, which results in poor performance when the optical flow estimation fails for a particular scene. The temporal consistency provided by the optical flow could be effectively supplemented by modeling in a structural form. This paper proposes a new hierarchical graph neural network (GNN) architecture, dubbed hierarchical graph pattern understanding (HGPU), for zero-shot video object segmentation (ZS-VOS). Inspired by the strong ability of GNNs in capturing structural relations, HGPU innovatively leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames. Specifically, a hierarchical graph pattern encoder with message aggregation is introduced to acquire different levels of motion and appearance features in a sequential manner. Furthermore, a decoder is designed for hierarchically parsing and understanding the transformed multi-modal contexts to achieve more accurate and robust results. HGPU achieves state-of-the-art performance on four publicly available benchmarks (DAVIS-16, YouTube-Objects, Long-Videos and DAVIS-17). Code and pre-trained model can be found at https://github.com/NUST-Machine-Intelligence-Laboratory/HGPU.

5.
IEEE Trans Image Process ; 32: 1285-1299, 2023.
Article in English | MEDLINE | ID: mdl-37027745

ABSTRACT

This paper studies the problem of unsupervised domain adaptive hashing, which is less explored but emerging for efficient image retrieval, particularly for cross-domain retrieval. This problem is typically tackled by learning hashing networks with pseudo-labeling and domain alignment techniques. Nevertheless, these approaches usually suffer from overconfident and biased pseudo-labels and inefficient domain alignment without sufficiently exploring semantics, thus failing to achieve satisfactory retrieval performance. To tackle this issue, we present PEACE, a principled framework which holistically explores semantic information in both source and target data and extensively incorporates it for effective domain alignment. For comprehensive semantic learning, PEACE leverages label embeddings to guide the optimization of hash codes for source data. More importantly, to mitigate the effects of noisy pseudo-labels, we propose a novel method to holistically measure the uncertainty of pseudo-labels for unlabeled target data and progressively minimize it through alternating optimization under the guidance of the domain discrepancy. Additionally, PEACE effectively removes domain discrepancy in the Hamming space from two views. In particular, it not only introduces composite adversarial learning to implicitly explore semantic information embedded in hash codes, but also aligns cluster semantic centroids across domains to explicitly exploit label information. Experimental results on several popular domain adaptive retrieval benchmarks demonstrate the superiority of our proposed PEACE compared with various state-of-the-art methods on both single-domain and cross-domain retrieval tasks. Our source codes are available at https://github.com/WillDreamer/PEACE.
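
The abstract does not specify PEACE's uncertainty measure; one common, minimal choice is normalized prediction entropy, sketched here purely as an illustration of how pseudo-label uncertainty can be scored:

```python
import numpy as np

def pseudo_label_uncertainty(probs):
    # Normalized entropy in [0, 1]: 0 = fully confident, 1 = uniform.
    eps = 1e-12
    ent = -(probs * np.log(probs + eps)).sum(axis=1)
    return ent / np.log(probs.shape[1])

p = np.array([[0.98, 0.01, 0.01],   # confident pseudo-label
              [1/3, 1/3, 1/3]])     # maximally uncertain
u = pseudo_label_uncertainty(p)
```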

6.
Genomics Proteomics Bioinformatics ; 21(2): 259-266, 2023 04.
Article in English | MEDLINE | ID: mdl-36209954

ABSTRACT

In recent years, neoantigens have been recognized as ideal targets for tumor immunotherapy. With the development of neoantigen-based tumor immunotherapy, comprehensive neoantigen databases are urgently needed to meet the growing demand for clinical studies. We previously built the tumor-specific neoantigen database (TSNAdb), which has attracted much attention. In this study, we provide TSNAdb v2.0, an updated version of the TSNAdb. TSNAdb v2.0 offers several new features, including (1) adopting more stringent criteria for neoantigen identification, (2) providing predicted neoantigens derived from three types of somatic mutations, and (3) collecting experimentally validated neoantigens and dividing them according to the experimental level. TSNAdb v2.0 is freely available at https://pgx.zju.edu.cn/tsnadb/.


Subject(s)
Antigens, Neoplasm; Neoplasms; Humans; Antigens, Neoplasm/genetics; Neoplasms/genetics; Neoplasms/therapy; Databases, Factual; Immunotherapy; Mutation
7.
IEEE Trans Image Process ; 31: 6548-6561, 2022.
Article in English | MEDLINE | ID: mdl-36240040

ABSTRACT

Recently, unsupervised person re-identification (Re-ID) has received increasing research attention due to its potential for label-free applications. A promising way to address unsupervised Re-ID is clustering-based, which generates pseudo labels by clustering and uses the pseudo labels to train a Re-ID model iteratively. However, most clustering-based methods take each cluster as a pseudo identity class, neglecting the intra-cluster variance mainly caused by the change of cameras. To address this issue, we propose to split each single cluster into multiple proxies according to camera views. The camera-aware proxies explicitly capture local structures within clusters, by which the intra-ID variance and inter-ID similarity can be better tackled. Assisted with the camera-aware proxies, we design two proxy-level contrastive learning losses that are, respectively, based on offline and online association results. The offline association directly associates proxies according to the clustering and splitting results, while the online strategy dynamically associates proxies in terms of up-to-date features to reduce the noise caused by the delayed update of pseudo labels. The combination of two losses enables us to train a desirable Re-ID model. Extensive experiments on three person Re-ID datasets and one vehicle Re-ID dataset show that our proposed approach demonstrates competitive performance with state-of-the-art methods. Code will be available at: https://github.com/Terminator8758/O2CAP.
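
The camera-aware splitting step has a simple core: every (cluster, camera) pair becomes one proxy. A minimal sketch of that bookkeeping (function and variable names are ours, not the paper's):

```python
from collections import defaultdict

def split_clusters_by_camera(cluster_ids, camera_ids):
    # Every (cluster, camera) pair becomes one proxy, exposing the
    # intra-cluster variance caused by camera changes.
    proxy_of = {}
    members = defaultdict(list)
    for idx, key in enumerate(zip(cluster_ids, camera_ids)):
        if key not in proxy_of:
            proxy_of[key] = len(proxy_of)   # assign a new proxy label
        members[proxy_of[key]].append(idx)
    return proxy_of, dict(members)

# cluster 0 is seen by cameras 1 and 2, cluster 1 by cameras 1 and 3
proxy_of, members = split_clusters_by_camera([0, 0, 0, 1, 1], [1, 1, 2, 1, 3])
# -> 4 proxies instead of 2 clusters
```

The proxy-level contrastive losses then pull samples toward their own proxy and away from others; that part is not reproduced here.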


Subject(s)
Biometric Identification; Humans; Biometric Identification/methods; Cluster Analysis
8.
Article in English | MEDLINE | ID: mdl-35675236

ABSTRACT

This article studies self-supervised graph representation learning, which is critical to various tasks such as protein property prediction. Existing methods typically aggregate representations of each individual node as graph representations, but fail to comprehensively explore local substructures (i.e., motifs and subgraphs), which also play important roles in many graph mining tasks. In this article, we propose a self-supervised graph representation learning framework named Cluster-Enhanced Contrast (CLEAR) that models the structural semantics of a graph at graph-level and substructure-level granularities, i.e., global semantics and local semantics, respectively. Specifically, we use graph-level augmentation strategies followed by a graph neural network-based encoder to explore global semantics. As for local semantics, we first use graph clustering techniques to partition each whole graph into several subgraphs while preserving as much semantic information as possible. We further employ a self-attention interaction module to aggregate the semantics of all subgraphs into a local-view graph representation. Moreover, we integrate both global semantics and local semantics into a multiview graph contrastive learning framework, enhancing the semantic-discriminative ability of graph representations. Extensive experiments on various real-world benchmarks demonstrate the efficacy of the proposed CLEAR over current graph self-supervised representation learning approaches on both graph classification and transfer learning tasks.

9.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35696650

ABSTRACT

Graph neural networks (GNNs) are among the most promising deep learning models for revolutionizing non-Euclidean data analysis. However, their full potential is severely curtailed by poorly represented molecular graphs and features. Here, we propose a multiphysical graph neural network (MP-GNN) model based on the developed multiphysical molecular graph representation and featurization. All kinds of molecular interactions, between different atom types and at different scales, are systematically represented by a series of scale-specific and element-specific graphs with distance-related node features. From these graphs, graph convolution network (GCN) models are constructed with specially designed weight-sharing architectures. Base learners are constructed from GCN models for different elements at different scales, and further consolidated using both one-scale and multi-scale ensemble learning schemes. Our MP-GNN has two distinct properties. First, it incorporates multiscale interactions using more than one molecular graph. Atomic interactions at various scales are not modeled by one specific graph (as in traditional GNNs); instead, they are represented by a series of graphs at different scales. Second, it is free from the complicated feature generation process of conventional GNN methods. In our MP-GNN, various atom interactions are embedded into element-specific graph representations with only distance-related node features. A unique GNN architecture is designed to incorporate all the information into a consolidated model. Our MP-GNN has been extensively validated on the widely used benchmark test datasets from PDBbind, including PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016. To the best of our knowledge, our model outperforms all existing models. Further, our MP-GNN is used in coronavirus disease 2019 drug design.
Based on a dataset with 185 complexes of inhibitors for severe acute respiratory syndrome coronavirus (SARS-CoV/SARS-CoV-2), we evaluate their binding affinities using our MP-GNN and find that it achieves high accuracy. This demonstrates the great potential of our MP-GNN for the screening of potential drugs for SARS-CoV-2. Availability: the MP-GNN model can be found at https://github.com/Alibaba-DAMO-DrugAI/MGNN. Additional data or code will be available upon reasonable request.


Subject(s)
COVID-19 Drug Treatment; Data Analysis; Drug Design; Humans; Neural Networks, Computer; SARS-CoV-2
10.
Sci Rep ; 12(1): 8725, 2022 05 30.
Article in English | MEDLINE | ID: mdl-35637238

ABSTRACT

Genome variant calling is a challenging yet critical task for downstream studies. Existing methods rely almost exclusively on high-depth DNA sequencing data, and their performance drops substantially on low-depth data. Using public human Oxford Nanopore (ONT) data from the Genome in a Bottle (GIAB) Consortium, we trained a generative adversarial network for low-depth variant calling. Our method, denoted LDV-Caller, can project high-depth sequencing information from low-depth data. It achieves a 94.25% F1 score on low-depth data, while the state-of-the-art method achieves 94.49% on data with twice the depth. As a result, the cost of genome-wide sequencing examinations can be reduced substantially. In addition, we validated the trained LDV-Caller model on 157 public severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) samples. The mean sequencing depth of these samples is 2982. LDV-Caller yields a 92.77% F1 score using only 22x sequencing depth, which demonstrates that our method has the potential to analyze different species with only low-depth sequencing data.


Subject(s)
COVID-19; Polymorphism, Single Nucleotide; COVID-19/genetics; Genome, Human; Humans; SARS-CoV-2/genetics; Sequence Analysis, DNA/methods
11.
IEEE Trans Image Process ; 31: 1789-1804, 2022.
Article in English | MEDLINE | ID: mdl-35100116

ABSTRACT

Video Summarization (VS) has become one of the most effective solutions for quickly understanding a large volume of video data. Dictionary selection with self-representation and sparse regularization has demonstrated its promise for VS by formulating the VS problem as a sparse selection task on video frames. However, existing dictionary selection models are generally designed only for data reconstruction, which results in the neglect of the inherent structured information among video frames. In addition, the sparsity commonly constrained by the L2,1 norm is not strong enough, which causes redundancy among keyframes, i.e., similar keyframes are selected. Therefore, to address these two issues, in this paper we propose a general framework called graph convolutional dictionary selection with the L2,p (0 < p ≤ 1) norm (GCDS2,p) for both keyframe selection and skimming-based summarization. Firstly, we incorporate graph embedding into dictionary selection to generate the graph embedding dictionary, which takes the structured information depicted in videos into account. Secondly, we propose to use the L2,p norm to constrain row sparsity, in which p can be flexibly set for the two forms of video summarization: for keyframe selection, 0 < p < 1 can be utilized to select diverse and representative keyframes, while for skimming, p = 1 can be utilized to select key shots. In addition, an efficient iterative algorithm is devised to optimize the proposed model, and its convergence is theoretically proved. Experimental results on both keyframe selection and skimming-based summarization across four benchmark datasets demonstrate the effectiveness and superiority of the proposed method.
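
The L2,p row-sparsity regularizer mentioned above is straightforward to compute; this sketch shows the p = 1 case reducing to the familiar L2,1 norm:

```python
import numpy as np

def l2p_norm(W, p):
    # Sum of p-th powers of row-wise L2 norms; smaller p drives
    # more rows of the selection matrix W exactly to zero.
    row_norms = np.sqrt((W ** 2).sum(axis=1))
    return (row_norms ** p).sum()

W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
l2p_norm(W, 1.0)   # rows have norms 5, 0, 1 -> 6.0 (the L2,1 norm)
```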

12.
IEEE Trans Image Process ; 31: 1340-1348, 2022.
Article in English | MEDLINE | ID: mdl-35025744

ABSTRACT

Model fine-tuning is a widely used transfer learning approach in person re-identification (ReID) applications, which fine-tunes a pre-trained feature extraction model on the target scenario instead of training a model from scratch. It is challenging due to the significant variations inside the target scenario, e.g., different camera viewpoints, illumination changes, and occlusion. These variations result in a gap between the distribution of each mini-batch and the distribution of the whole dataset when using mini-batch training. In this paper, we study model fine-tuning from the perspective of the aggregation and utilization of the dataset's global information when using mini-batch training. Specifically, we introduce a novel network structure called the Batch-related Convolutional Cell (BConv-Cell), which progressively collects the dataset's global information into a latent state and uses it to rectify the extracted feature. Based on BConv-Cells, we further propose the Progressive Transfer Learning (PTL) method to facilitate the model fine-tuning process by jointly optimizing BConv-Cells and the pre-trained ReID model. Empirical experiments show that our proposal can greatly improve the ReID model's performance on the MSMT17, Market-1501, CUHK03, and DukeMTMC-reID datasets. Moreover, we extend our proposal to the general image classification task. Experiments on several image classification benchmark datasets demonstrate that our proposal can significantly improve baseline models' performance. The code has been released at https://github.com/ZJULearning/PTL.


Subject(s)
Machine Learning; Humans
13.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34958660

ABSTRACT

Artificial intelligence (AI)-based drug design has great promise to fundamentally change the landscape of the pharmaceutical industry. Even though there has been great progress with handcrafted feature-based machine learning models, 3D convolutional neural networks (CNNs) and graph neural networks, effective and efficient representations that characterize the structural, physical, chemical and biological properties of molecular structures and interactions remain a great challenge. Here, we propose an equal-sized molecular 2D image representation, known as the molecular persistent spectral image (Mol-PSI), and combine it with a CNN model for AI-based drug design. Mol-PSI provides a unique one-to-one image representation for molecular structures and interactions. In general, deep models achieve better performance with systematically organized representations in image format. A well-designed parallel CNN architecture for adapting Mol-PSIs is developed for protein-ligand binding affinity prediction. To the best of our knowledge, our results for the three most commonly used databases, PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016, are better than those of all traditional machine learning models. Our Mol-PSI model provides a powerful molecular representation that can be widely used in AI-based drug design and molecular data analysis.


Subject(s)
Drug Design; Machine Learning; Protein Binding; Artificial Intelligence; Ligands; Models, Molecular; Models, Theoretical; Molecular Structure; Neural Networks, Computer; Protein Binding/drug effects
14.
IEEE Trans Image Process ; 30: 5933-5943, 2021.
Article in English | MEDLINE | ID: mdl-34166192

ABSTRACT

Video moment localization, as an important branch of video content analysis, has attracted extensive attention in recent years. However, it is still in its infancy due to the following challenges: cross-modal semantic alignment and localization efficiency. To address these impediments, we present a cross-modal semantic alignment network. To be specific, we first design a video encoder to generate moment candidates, learn their representations, as well as model their semantic relevance. Meanwhile, we design a query encoder for diverse query intention understanding. Thereafter, we introduce a multi-granularity interaction module to deeply explore the semantic correlation between multi-modalities. Thereby, we can effectively complete target moment localization via sufficient cross-modal semantic understanding. Moreover, we introduce a semantic pruning strategy to reduce cross-modal retrieval overhead, improving localization efficiency. Experimental results on two benchmark datasets have justified the superiority of our model over several state-of-the-art competitors.

15.
IEEE Trans Image Process ; 30: 6130-6141, 2021.
Article in English | MEDLINE | ID: mdl-34185644

ABSTRACT

In recent years, supervised hashing has been validated to greatly boost the performance of image retrieval. However, its label-hungry property requires massive label collection, making it intractable in practical scenarios. To liberate the model training procedure from laborious manual annotations, some unsupervised methods have been proposed. However, the following two factors make unsupervised algorithms inferior to their supervised counterparts: (1) Without manually defined labels, it is difficult to capture the semantic information across data, which is of crucial importance for guiding robust binary code learning. (2) The widely adopted relaxation of binary constraints results in quantization error accumulation in the optimization procedure. To address the above-mentioned problems, in this paper we propose a novel Unsupervised Discrete Hashing method (UDH). Specifically, to capture the semantic information, we propose a balanced graph-based semantic loss which explores the affinity priors in the original feature space. Then, we propose a novel self-supervised loss, termed the orthogonal consistent loss, which leverages the instance-level semantic loss and imposes independence among codes. Moreover, by integrating discrete optimization into the proposed unsupervised framework, the binary constraints are consistently preserved, alleviating the influence of quantization errors. Extensive experiments demonstrate that UDH outperforms state-of-the-art unsupervised methods for image retrieval.

16.
IEEE Trans Image Process ; 30: 2422-2435, 2021.
Article in English | MEDLINE | ID: mdl-33493117

ABSTRACT

Human pose transfer (HPT) is an emerging research topic with huge potential in fashion design, media production, online advertising and virtual reality. For these applications, the visual realism of fine-grained appearance details is crucial for production quality and user engagement. However, existing HPT methods often suffer from three fundamental issues: detail deficiency, content ambiguity and style inconsistency, which severely degrade the visual quality and realism of generated images. Aiming towards real-world applications, we develop a more challenging yet practical HPT setting, termed Fine-grained Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail replenishment. Concretely, we analyze the potential design flaws of existing methods via an illustrative example, and establish the core FHPT methodology by combining the ideas of content synthesis and feature transfer in a mutually guided fashion. Thereafter, we substantiate the proposed methodology with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine model training scheme. Moreover, we build up a complete suite of fine-grained evaluation protocols to address the challenges of FHPT in a comprehensive manner, including semantic analysis, structural detection and perceptual quality assessment. Extensive experiments on the DeepFashion benchmark dataset have verified the power of the proposed approach against state-of-the-art works, with a 12%-14% gain in top-10 retrieval recall, 5% higher joint localization accuracy, and a nearly 40% gain in face identity preservation. Our codes, models and evaluation tools will be released at https://github.com/Lotayou/RATE.


Subject(s)
Image Processing, Computer-Assisted/methods; Machine Learning; Posture/physiology; Algorithms; Female; Humans; Male
17.
IEEE Trans Vis Comput Graph ; 27(8): 3438-3450, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32070959

ABSTRACT

There is an increasing demand for interior design and decorating. The main challenges are where to put objects and how to place them plausibly in the given domain. In this article, we propose an automatic method for decorating the planes in a given image, which we call Decoration In (DecorIn for short). Given an image, we first extract planes as decorating candidates according to the estimated geometric features. Then we parameterize the planes with an orthogonal and semantically consistent grid. Finally, we compute the position for the decoration, i.e., a decoration box, on the plane using an example-based decorating method which can describe the partial image and compute the similarity between partial scenes. We have conducted comprehensive evaluations and demonstrated our method on a number of applications. Our method is more efficient, in both time and cost, than generating a layout from scratch.

18.
Article in English | MEDLINE | ID: mdl-32149635

ABSTRACT

The re-identification (ReID) task has received increasing attention in recent years and its performance has gained significant improvement. This progress mainly comes from searching for new network structures to learn person representations. Most of these networks are trained using the classic stochastic gradient descent optimizer. However, limited efforts have been made to explore the potential performance of existing ReID networks directly through better training schemes, which leaves a large space for ReID research. In this paper, we propose a Self-Inspirited Feature Learning (SIF) method to enhance the performance of given ReID networks from the viewpoint of optimization. We design a simple adversarial learning scheme to encourage a network to learn more discriminative person representations. In our method, an auxiliary branch is added to the network only in the training stage, while the structure of the original network stays unchanged during the testing stage. In summary, SIF has three advantages: (1) it is designed under a general setting; (2) it is compatible with many existing feature learning networks on the ReID task; (3) it is easy to implement and has steady performance. We evaluate the performance of SIF on three public ReID datasets: Market1501, DukeMTMC-reID, and CUHK03 (both labeled and detected). The results demonstrate significant performance improvement brought by SIF. We also apply SIF to obtain state-of-the-art results on all three datasets. Specifically, mAP / Rank-1 accuracy are: 87.6% / 95.2% (without re-ranking) on Market1501, 79.4% / 89.8% on DukeMTMC-reID, 77.0% / 79.5% on CUHK03 (labeled) and 73.9% / 76.6% on CUHK03 (detected), respectively. The code of SIF will be available soon.

19.
Article in English | MEDLINE | ID: mdl-32191885

ABSTRACT

In recent years, hashing methods have been proved to be effective and efficient for large-scale Web media search. However, existing general hashing methods have limited discriminative power for describing fine-grained objects that share similar overall appearance but have subtle differences. To solve this problem, we introduce, for the first time, the attention mechanism to the learning of fine-grained hashing codes. Specifically, we propose a novel deep hashing model, named deep saliency hashing (DSaH), which automatically mines salient regions and learns semantic-preserving hashing codes simultaneously. DSaH is a two-step end-to-end model consisting of an attention network and a hashing network. Our loss function contains three basic components: the semantic loss, the saliency loss, and the quantization loss. As the core of DSaH, the saliency loss guides the attention network to mine discriminative regions from pairs of images. We conduct extensive experiments on both fine-grained and general retrieval datasets for performance evaluation. Experimental results on fine-grained datasets, including Oxford Flowers, Stanford Dogs, and CUB Birds, demonstrate that our DSaH performs best for the fine-grained retrieval task and beats the strongest competitor (DTQ) by approximately 10% on both Stanford Dogs and CUB Birds. DSaH is also comparable to several state-of-the-art hashing methods on CIFAR-10 and NUS-WIDE.
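
Of the three loss components, only the quantization loss has a near-universal form in deep hashing; the semantic and saliency terms are specific to DSaH and not reproduced here. A sketch of the quantization term:

```python
import numpy as np

def quantization_loss(h):
    # Penalize the gap between relaxed (tanh-like) continuous codes
    # and their binary targets sign(h).
    return ((h - np.sign(h)) ** 2).mean()

h = np.array([0.9, -0.8, 0.2, -1.0])
quantization_loss(h)   # mean of [0.01, 0.04, 0.64, 0.0] = 0.1725
```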

20.
IEEE Trans Image Process ; 28(12): 6077-6090, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31217115

ABSTRACT

Detecting objects in surveillance videos is an important problem due to its wide applications in traffic control and public security. Existing methods tend to suffer performance degradation because of false positives or misalignment problems. We propose a novel framework, namely the Foreground Gating and Background Refining Network (FG-BR Net), for surveillance object detection (SOD). To reduce false positives in background regions, which is a critical problem in SOD, we introduce a new module that first subtracts the background of a video sequence and then generates high-quality region proposals. Unlike previous background subtraction methods that may wrongly remove static foreground objects in a frame, a feedback connection from detection results to the background subtraction process is proposed in our model to distill both static and moving objects in surveillance videos. Furthermore, we introduce another module, namely the background refining stage, to refine the detection results with more accurate localizations. Pairwise non-local operations are adopted to cope with the misalignments between the features of original and background frames. Extensive experiments on real-world traffic surveillance benchmarks demonstrate the competitive performance of the proposed FG-BR Net. In particular, FG-BR Net ranks first among all methods on the hard and sunny subsets of the UA-DETRAC detection dataset, without any bells and whistles.
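
The background-subtraction gating described above rests on a simple primitive: threshold the per-pixel difference between a frame and a background estimate. A toy sketch (the threshold value and the feedback connection from detections are beyond this illustration):

```python
import numpy as np

def foreground_mask(frame, background, thresh=25):
    # Cast to signed ints so uint8 subtraction cannot wrap around,
    # then keep pixels that differ enough from the background.
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return diff > thresh

background = np.zeros((4, 4), dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200                      # a small "moving object"
mask = foreground_mask(frame, background)  # True on the 2x2 object
```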
