1.
Article in English | MEDLINE | ID: mdl-39283787

ABSTRACT

Weakly supervised group activity recognition (WSGAR) aims to identify the overall behavior of multiple persons without any fine-grained supervision (including individual positions and action labels). Traditional methods usually adopt a person-to-whole pipeline: detect persons with off-the-shelf detectors, obtain person-level features, and integrate them into group-level features for training the classifier. However, these methods are inflexible because of their heavy reliance on detector quality. To get rid of the detector, recent works learn several prototype tokens directly from noisy grid features with learnable weights, which treats all local visual information equally and brings in redundant and ambiguous information to some extent. To this end, we propose a novel coarse-fine nested network (CFNN) that coarsely localizes the key visual patches of an activity and then finely learns the local features as well as the global features. Specifically, we design a nested interactor (NI) to progressively model the spatiotemporal interactions of a learnable global token. Guided by the spatial-interaction cue in NI, we localize several key visual patches via a new coarse-grained spatial localizer (CSL). Finally, we encode these localized visual patches with the help of global spatiotemporal dependency via a new fine-grained spatiotemporal selector (FSS). Extensive experiments on the Volleyball and NBA datasets demonstrate the effectiveness of the proposed CFNN compared with existing competitive methods. Code is available at: https://github.com/gexiaojingshelby/CFNN.
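The coarse localization step can be read as scoring grid patches against a learnable global token and keeping only the highest-scoring ones. The sketch below illustrates that general idea only; the tensor shapes, the top-k rule, and the function name are assumptions for illustration, not the released CFNN code.

```python
import torch
import torch.nn.functional as F

def coarse_select_patches(patch_tokens, global_token, k=8):
    # patch_tokens: (N, D) grid features, global_token: (D,) learnable token.
    # Score each patch by its (scaled) similarity to the global token, then keep top-k.
    scores = patch_tokens @ global_token / patch_tokens.shape[-1] ** 0.5   # (N,)
    weights = F.softmax(scores, dim=0)
    topk = torch.topk(weights, k)
    return patch_tokens[topk.indices], topk.values

# Toy usage: a 7x7 grid of 256-d patch features, keep the 8 most activity-relevant patches.
patches, g_token = torch.randn(49, 256), torch.randn(256)
selected, attn = coarse_select_patches(patches, g_token)
```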

2.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 5131-5148, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38300783

ABSTRACT

One fundamental problem in deep learning is understanding the excellent performance of deep Neural Networks (NNs) in practice. An explanation for the superiority of NNs is that they can realize a large family of complicated functions, i.e., they have powerful expressivity. The expressivity of a Neural Network with Piecewise Linear activations (PLNN) can be quantified by the maximal number of linear regions it can separate its input space into. In this paper, we provide several mathematical results needed for studying the linear regions of Convolutional Neural Networks with Piecewise Linear activations (PLCNNs), and use them to derive the maximal and average numbers of linear regions for one-layer PLCNNs. Furthermore, we obtain upper and lower bounds for the number of linear regions of multi-layer PLCNNs. Our results suggest that deeper PLCNNs have more powerful expressivity than shallow PLCNNs, while PLCNNs have more expressivity than fully-connected PLNNs per parameter, in terms of the number of linear regions.
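For context, the one-hidden-layer fully-connected case already has a classical closed form: with $n$ piecewise-linear (e.g., ReLU) units acting on a $d$-dimensional input, the maximal number of linear regions equals the maximal number of cells into which $n$ hyperplanes can partition $\mathbb{R}^d$. This is standard hyperplane-arrangement background, not the paper's convolutional bound:

$$ R(n, d) \;=\; \sum_{j=0}^{d} \binom{n}{j}. $$

For example, $n = 4$ units on a $d = 2$ dimensional input give at most $1 + 4 + 6 = 11$ linear regions.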

3.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 12844-12861, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37015683

ABSTRACT

Zero-shot learning (ZSL) tackles the novel-class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is typically represented by attribute descriptions shared between different classes, which act as strong priors for localizing object attributes as discriminative region features, enabling substantial visual-semantic interaction for advancing ZSL. Existing attention-based models, relying solely on unidirectional attention, learn only inferior region features from a single image and ignore the transferable and discriminative attribute localization of visual features that represents the key semantic knowledge for effective knowledge transfer in ZSL. In this paper, we propose a cross attribute-guided Transformer network, termed TransZero++, to refine visual features and learn accurate attribute localization for key semantic knowledge representations in ZSL. Specifically, TransZero++ employs an attribute→visual Transformer sub-net (AVT) and a visual→attribute Transformer sub-net (VAT) to learn attribute-based visual features and visual-based attribute features, respectively. By further introducing feature-level and prediction-level semantic collaborative losses, the two attribute-guided transformers teach each other to learn semantic-augmented visual embeddings for key semantic knowledge representations via semantic collaborative learning. Finally, the semantic-augmented visual embeddings learned by AVT and VAT are fused to conduct desirable visual-semantic interaction cooperating with class semantic vectors for ZSL classification. Extensive experiments show that TransZero++ achieves new state-of-the-art results on three golden ZSL benchmarks and on the large-scale ImageNet dataset. The project website is available at: https://shiming-chen.github.io/TransZero-pp/TransZero-pp.html.
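The attribute→visual direction can be read as a standard cross-attention in which attribute embeddings act as queries over the grid of visual features. The following is a minimal sketch of that mechanism under assumed dimensions; it is not the released TransZero++ code.

```python
import torch
import torch.nn as nn

class AttributeVisualAttention(nn.Module):
    # Minimal attribute -> visual cross-attention: attributes query visual grid features.
    def __init__(self, attr_dim, vis_dim, hid_dim=256):
        super().__init__()
        self.q = nn.Linear(attr_dim, hid_dim)
        self.k = nn.Linear(vis_dim, hid_dim)
        self.v = nn.Linear(vis_dim, hid_dim)

    def forward(self, attrs, vis):            # attrs: (A, attr_dim), vis: (R, vis_dim)
        q, k, v = self.q(attrs), self.k(vis), self.v(vis)
        attn = torch.softmax(q @ k.t() / q.shape[-1] ** 0.5, dim=-1)   # (A, R)
        return attn @ v                        # attribute-based visual features, (A, hid_dim)

# Toy usage: 85 class attributes querying a 7x7 grid of 2048-d region features.
layer = AttributeVisualAttention(attr_dim=300, vis_dim=2048)
out = layer(torch.randn(85, 300), torch.randn(49, 2048))
```

The visual→attribute sub-net would mirror this with the roles of queries and keys/values swapped.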

4.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9575-9582, 2023 Nov.
Article in English | MEDLINE | ID: mdl-36269927

ABSTRACT

Generative (generalized) zero-shot learning [(G)ZSL] models aim to synthesize unseen-class features by using only seen-class feature and attribute pairs as training data. However, the generated fake unseen features tend to be dominated by the seen-class features and are thus classified as seen classes, which can lead to inferior performance under zero-shot learning (ZSL) and unbalanced results under generalized ZSL (GZSL). To address this challenge, we tailor a novel balanced semantic embedding generative network (BSeGN), which incorporates balanced semantic embedding learning into generative learning scenarios in pursuit of unbiased GZSL. Specifically, we first design a feature-to-semantic embedding module (FEM) to distinguish real seen and fake unseen features collaboratively with the generator in an online manner. We introduce bidirectional contrastive and balance losses for FEM learning, which guarantee a balanced prediction for interdomain features. In turn, the updated FEM boosts the learning of the generator. Next, we propose a multilevel feature integration module (mFIM) from the cycle-consistency branch of BSeGN, which mitigates the domain bias through feature enhancement. To the best of our knowledge, this is the first work to explore embedding and generative learning jointly within the field of ZSL. Extensive evaluations on four benchmarks demonstrate the superiority of BSeGN over its state-of-the-art counterparts.

5.
Article in English | MEDLINE | ID: mdl-36107890

ABSTRACT

Recently, deep metric learning (DML) has achieved great success. Some existing DML methods propose adaptive sample-mining strategies, which learn to weight the samples and achieve appealing performance. However, these methods draw on only a small memory (e.g., one training batch), limiting their efficacy. In this work, we introduce a data-driven method, a meta-mining strategy with semiglobal information (MMSI), which applies meta-learning to learn to weight samples throughout the whole training, yielding an adaptive mining strategy. To introduce richer information than a single training batch, we take advantage of the validation set of meta-learning by implicitly adding additional validation-sample information to training. Furthermore, motivated by the latest self-supervised learning, we introduce a dictionary (memory) that maintains very large and diverse information. Together with the validation set, this dictionary presents much richer information to the training, leading to promising performance. In addition, we propose a new theoretical framework that formulates pairwise and tripletwise metric-learning loss functions in a unified way. This framework brings new insights to the community and allows us to generalize MMSI to many existing DML methods. We conduct extensive experiments on three public datasets: CUB200-2011, Cars-196, and Stanford Online Products (SOP). Results show that our method achieves state-of-the-art or highly competitive performance. Our source code is available at https://github.com/NUST-Machine-Intelligence-Laboratory/MMSI.
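The dictionary (memory) idea—keeping a large queue of past embeddings so each batch sees far more pairs than it contains—can be sketched as follows. The queue size, update rule, and contrastive loss here are illustrative assumptions, not the MMSI implementation or its meta-learned weighting.

```python
import torch
import torch.nn.functional as F

class EmbeddingMemory:
    # FIFO dictionary of past embeddings and their labels.
    def __init__(self, dim, size=4096):
        self.feats = torch.zeros(0, dim)
        self.labels = torch.zeros(0, dtype=torch.long)
        self.size = size

    def enqueue(self, feats, labels):
        self.feats = torch.cat([self.feats, feats.detach()])[-self.size:]
        self.labels = torch.cat([self.labels, labels])[-self.size:]

def contrastive_with_memory(batch_feats, batch_labels, memory, margin=0.5):
    # Pairwise loss between the current batch and everything stored in the memory.
    if memory.feats.shape[0] == 0:
        return torch.tensor(0.0)
    sims = F.normalize(batch_feats, dim=1) @ F.normalize(memory.feats, dim=1).t()
    pos = batch_labels.unsqueeze(1) == memory.labels.unsqueeze(0)
    loss_pos = (1.0 - sims)[pos].mean() if pos.any() else 0.0
    loss_neg = F.relu(sims - margin)[~pos].mean() if (~pos).any() else 0.0
    return loss_pos + loss_neg
```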

6.
IEEE Trans Neural Netw Learn Syst ; 33(7): 2903-2915, 2022 Jul.
Article in English | MEDLINE | ID: mdl-33493121

ABSTRACT

Generative adversarial networks (GANs) for (generalized) zero-shot learning (ZSL) aim to generate unseen image features conditioned on unseen class embeddings, each of which corresponds to one unique category. Most existing works on GANs for ZSL generate features by merely feeding seen image feature/class embedding pairs (combined with random Gaussian noise) into the generator/discriminator for a two-player minimax game. However, the structure consistency of the distributions of real/fake image features is seldom considered, which may shift the generated features away from their real distribution to some extent. In this paper, to align the weights of the generator for better structure consistency between real/fake features, we propose a novel multigraph adaptive GAN (MGA-GAN). Specifically, a Wasserstein GAN equipped with a classification loss is trained to generate discriminative features with structure consistency. MGA-GAN leverages the multigraph similarity structures between sliced seen real/fake feature samples to assist in updating the generator weights in the local feature manifold. Moreover, correlation graphs over the whole set of real/fake features are adopted to guarantee structure correlation in the global feature manifold. Extensive evaluations on four benchmarks clearly demonstrate the superiority of MGA-GAN over its state-of-the-art counterparts.
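The backbone here—a Wasserstein critic with an auxiliary classification loss on the synthesized features—can be sketched as below. Layer sizes, the conditioning scheme, and the omission of the gradient penalty and the multigraph alignment are all simplifying assumptions; this is not the MGA-GAN code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ATTR_DIM, NOISE_DIM, FEAT_DIM, N_SEEN = 312, 128, 2048, 150   # assumed sizes

G = nn.Sequential(nn.Linear(ATTR_DIM + NOISE_DIM, 1024), nn.ReLU(), nn.Linear(1024, FEAT_DIM))
D = nn.Sequential(nn.Linear(FEAT_DIM + ATTR_DIM, 1024), nn.ReLU(), nn.Linear(1024, 1))  # critic
clf = nn.Linear(FEAT_DIM, N_SEEN)                               # classifier on features

def generator_loss(attr, labels):
    # Synthesize features conditioned on class embeddings plus Gaussian noise,
    # then combine the Wasserstein generator loss with a classification loss.
    z = torch.randn(attr.shape[0], NOISE_DIM)
    fake = G(torch.cat([attr, z], dim=1))
    wgan_loss = -D(torch.cat([fake, attr], dim=1)).mean()
    cls_loss = F.cross_entropy(clf(fake), labels)
    return wgan_loss + cls_loss

def critic_loss(real_feat, fake_feat, attr):
    # Wasserstein critic: push real scores up, fake scores down.
    return -(D(torch.cat([real_feat, attr], dim=1)).mean()
             - D(torch.cat([fake_feat, attr], dim=1)).mean())
```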

7.
IEEE Trans Image Process ; 30: 4316-4329, 2021.
Article in English | MEDLINE | ID: mdl-33835918

ABSTRACT

Transductive zero-shot learning (TZSL) extends conventional ZSL by leveraging (unlabeled) unseen images for model training. A typical method for ZSL involves learning embedding weights from the feature space to the semantic space. However, the learned weights in most existing methods are dominated by seen images and thus cannot adapt well to unseen images. In this paper, to align the (embedding) weights for better knowledge transfer between seen/unseen classes, we propose the virtual mainstay alignment network (VMAN), which is tailored for the transductive ZSL task. Specifically, VMAN is cast as a tied encoder-decoder net, so only one set of linear mapping weights needs to be learned. To explicitly learn these weights, for the first time in ZSL, we propose to generate virtual mainstay (VM) samples for each seen class, which serve as new training data and can prevent the weights from being shifted toward seen images, to some extent. Moreover, a weighted reconstruction scheme is proposed and incorporated into the model training phase, in both the semantic and feature spaces. In this way, the manifold relationships of the VM samples are well preserved. To further align the weights to adapt to more unseen images, a novel instance-category matching regularization is proposed for model re-training. VMAN is thus modeled as a nested minimization problem and is solved by a Taylor approximate optimization paradigm. In comprehensive evaluations on four benchmark datasets, VMAN achieves superior performance under the (generalized) TZSL setting.
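A tied encoder-decoder with a single linear mapping admits a well-known closed-form solution via a Sylvester equation (as in the classic semantic-autoencoder formulation). The sketch below shows only that baseline step; VMAN's virtual-mainstay generation, weighted reconstruction, and re-training are not reproduced.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def tied_linear_map(X, S, lam=1.0):
    # Solve min_W ||X - W.T @ S||^2 + lam * ||W @ X - S||^2 in closed form.
    # X: (d, n) visual features, S: (k, n) class semantic vectors; returns W: (k, d).
    A = S @ S.T                       # (k, k)
    B = lam * (X @ X.T)               # (d, d)
    Q = (1.0 + lam) * (S @ X.T)       # (k, d)
    return solve_sylvester(A, B, Q)   # solves A @ W + W @ B = Q

X = np.random.randn(2048, 500)        # toy features
S = np.random.randn(85, 500)          # toy semantic vectors
W = tied_linear_map(X, S)             # encoder: W @ x, decoder: W.T @ s (tied weights)
```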

8.
Neural Netw ; 123: 94-107, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31837517

ABSTRACT

Medical prediction is typically determined collectively from bioimages collected from different sources or from clinical characterizations described by multiple physiological features. Learning intrinsic structures from multiple heterogeneous features is therefore significant but challenging in multi-view disease understanding. Different from existing methods that deal with each single view separately, this paper proposes a discriminative Margin-Sensitive Autoencoder (MSAE) framework for automated Alzheimer's disease (AD) diagnosis and accurate protein fold recognition. Generally, our MSAE aims to collaboratively explore the complementary properties of multi-view bioimage features in a semantic-sensitive encoder-decoder paradigm, where the discriminative semantic space is explicitly constructed in a margin-scalable regression model. Specifically, we develop a semantic-sensitive autoencoder, where an encoder projects multi-view visual features into a common semantic-aware latent space, and a decoder is exerted as an additional constraint to reconstruct the respective visual features. In particular, the importance of different views is adaptively weighted by a self-adjusting learning scheme, such that their underlying correlations and complementary characteristics across multiple views are simultaneously preserved in the latent common representations. Moreover, a flexible semantic space is formulated by a margin-scalable support vector machine to improve the discriminability of the learning model. Importantly, the correntropy-induced metric is exploited as a robust regularization measurement to better control outliers for effective classification. A half-quadratic minimization and alternating learning strategy is devised to optimize the resulting framework such that each subproblem has a closed-form solution in each iterative minimization phase. Extensive experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets show that our MSAE achieves superior performance for both binary and multi-class classification in AD diagnosis, and evaluations on protein folds demonstrate that our method achieves very encouraging performance on protein structure recognition, outperforming state-of-the-art methods.
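The correntropy-induced metric used as the robust measurement has a standard definition; written for a scalar residual $e$ with Gaussian kernel bandwidth $\sigma$ (a textbook form, not a detail specific to MSAE), it is

$$ g_\sigma(e) = \exp\!\left(-\frac{e^2}{2\sigma^2}\right), \qquad \mathrm{CIM}(e) = \sqrt{g_\sigma(0) - g_\sigma(e)} = \sqrt{1 - \exp\!\left(-\frac{e^2}{2\sigma^2}\right)}. $$

Because the metric saturates for large residuals, each outlier contributes a bounded penalty, which is why it serves as a robust alternative to the squared loss.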


Subjects
Alzheimer Disease/diagnostic imaging , Neuroimaging/methods , Pattern Recognition, Automated/methods , Support Vector Machine , Databases, Factual , Discrimination Learning/physiology , Humans
9.
IEEE Trans Neural Netw Learn Syst ; 31(10): 4290-4302, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31870993

ABSTRACT

Feature representation learning, an emerging topic in recent years, has achieved great progress. Powerful learned features can lead to excellent classification accuracy. In this article, a selective and robust feature representation framework with a supervised constraint (SRSC) is presented. SRSC seeks a selective, robust, and discriminative subspace by transforming the original feature space into the category space. In particular, we add a selective constraint to the transformation matrix (or classifier parameter) that can select discriminative dimensions of the input samples. Moreover, a supervised regularization is tailored to further enhance the discriminability of the subspace. To relax the hard zero-one label matrix in the category space, an additional error term is incorporated into the framework, which leads to a more robust transformation matrix. SRSC is formulated as a constrained least-squares learning (feature transforming) problem and is solved with an inexact augmented Lagrange multiplier (ALM) method. Extensive experiments on several benchmark data sets demonstrate the effectiveness and superiority of the proposed method, which achieves better performance than the compared counterpart methods.

10.
IEEE Trans Image Process ; 28(10): 4803-4818, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31071030

ABSTRACT

Compact hash code learning has been widely applied to fast similarity search owing to its significantly reduced storage and highly efficient query speed. However, it is still challenging to learn discriminative binary codes that fully preserve the pairwise similarities embedded in the high-dimensional real-valued features, so that promising performance can be guaranteed. To overcome this difficulty, in this paper, we propose a novel scalable supervised asymmetric hashing (SSAH) method, which approximates the full pairwise similarity matrix by the maximum asymmetric inner product of two different non-binary embeddings. In particular, to comprehensively explore the semantic information of the data, the supervised label information and the refined latent feature embedding are simultaneously considered to construct a high-quality hashing function and boost the discriminability of the learned binary codes. Specifically, SSAH learns two distinctive hashing functions by jointly minimizing the regression loss on the semantic label alignment and the encoding loss on the refined latent features. More importantly, instead of using only part of the similarity correlations of the data, the full pairwise similarity matrix is directly utilized to avoid information loss and performance degeneration, and the cumbersome computation over the n × n matrix is handled efficiently during the optimization phase. Furthermore, an efficient alternating optimization scheme with guaranteed convergence is designed to address the resulting discrete optimization problem. Encouraging experimental results on diverse benchmark datasets demonstrate the superiority of the proposed SSAH method in comparison with many recently proposed hashing algorithms.
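The asymmetric inner-product idea—approximating pairwise similarity with the product of one binary and one real-valued embedding—can be sketched as follows. The code length, random projections, and query/database split are illustrative assumptions, not the SSAH optimization itself.

```python
import numpy as np

r, d = 64, 512                               # assumed code length and feature dimension
n_db, n_q = 10000, 100

# Two different embeddings of the same feature space (learned jointly in asymmetric
# hashing; random here just to make the sketch runnable).
W_db, W_q = np.random.randn(d, r), np.random.randn(d, r)
X_db, X_q = np.random.randn(n_db, d), np.random.randn(n_q, d)

B = np.sign(X_db @ W_db)                     # database side: binary codes in {-1, +1}
U = X_q @ W_q                                # query side: kept real-valued

S_approx = (U @ B.T) / r                     # asymmetric inner product ~ pairwise similarity
ranking = np.argsort(-S_approx, axis=1)      # retrieval: rank database items per query
```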

11.
Acta Biotheor ; 66(3): 251-253, 2018 09.
Article in English | MEDLINE | ID: mdl-29872936

ABSTRACT

In the original publication of the article, the y-axis labels in Figs. 1a and 2a are incorrect. The corrected Figs. 1a and 2a are provided here.

12.
Acta Biotheor ; 66(2): 113-133, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29675730

ABSTRACT

In this paper, we propose two four-base related 2D curves of DNA primary sequences (termed F-B curves) and their corresponding single-base related 2D curves (termed A-related, G-related, T-related, and C-related curves). The construction of these graphical curves is based on assigning each individual base to one of four different sinusoidal (or tangent) functions; by connecting all the points on these four sinusoidal (tangent) functions, we obtain the F-B curves; similarly, by connecting the points on each of the four sinusoidal (tangent) functions, we obtain the single-base related 2D curves. The proposed 2D curves are all strictly non-degenerate. Then, an 8-component characteristic vector is constructed to compare similarity among DNA sequences from different species, based on the normalized geometrical centers of the proposed curves. As examples, we examine similarity among the coding sequences of the first exon of the beta-globin gene from eleven species, similarity of cDNA sequences of the beta-globin gene from eight species, and similarity of the whole mitochondrial genomes of 18 eutherian mammals. The experimental results demonstrate the effectiveness of the proposed method.
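The construction can be sketched in a few lines: each base contributes a point on its own sinusoidal curve, and the normalized geometric centers of the four curves form the characteristic vector. The particular phase offsets and the normalization below are assumptions for illustration, not the exact assignment used in the paper.

```python
import numpy as np

def base_curves(seq):
    # Map a DNA sequence to points on four base-specific sine curves: position i
    # contributes a point only to the curve of the base seq[i].
    phases = {'A': 0.0, 'G': 0.5, 'T': 1.0, 'C': 1.5}      # assumed phase offsets
    pts = {b: [] for b in 'AGTC'}
    for i, b in enumerate(seq.upper()):
        if b in phases:
            pts[b].append((i + 1, np.sin(i + 1 + phases[b] * np.pi)))
    return {b: np.array(v) for b, v in pts.items() if v}

def characteristic_vector(seq):
    # 8-component vector: normalized geometric center (x, y) of each base curve.
    curves, n, vec = base_curves(seq), len(seq), []
    for b in 'AGTC':
        p = curves.get(b)
        cx, cy = p.mean(axis=0) if p is not None else (0.0, 0.0)
        vec += [cx / n, cy]
    return np.array(vec)

v1 = characteristic_vector("ATGGTGCACCTGACTCCTGAG")
v2 = characteristic_vector("ATGGTGCATCTGACTCCTGAG")
print(np.linalg.norm(v1 - v2))               # smaller distance = more similar sequences
```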


Subjects
Algorithms , Computer Graphics , DNA/chemistry , DNA/genetics , Numerical Analysis, Computer-Assisted , Sequence Analysis, DNA/methods , beta-Globins/genetics , Animals , Base Sequence , Computer Simulation , Genome, Mitochondrial , Humans , Phylogeny , Species Specificity
13.
Entropy (Basel) ; 20(10)2018 Sep 23.
Article in English | MEDLINE | ID: mdl-33265817

ABSTRACT

Aiming to perform image segmentation precisely and efficiently, we explore new ways to encode images and achieve optimal thresholding on a quantum state space. First, the state vector and density matrix are adopted to represent pixel intensities and their probability distribution, respectively. Then, a method based on global quantum entropy maximization (GQEM) is proposed, which has an objective function equivalent to Otsu's but gives a more explicit physical interpretation of image thresholding in the language of quantum mechanics. To reduce the time spent searching for optimal thresholds, the method of quantum lossy-encoding-based entropy maximization (QLEEM) is presented, in which the eigenvalues of the density matrices give direct clues for thresholding, so the optimal-search process can be avoided. Meanwhile, the QLEEM algorithm achieves two additional effects: (1) the upper bound of the thresholding level can be implicitly determined according to the eigenvalues; and (2) the proposed approaches ensure that the local information in images is retained as much as possible while the inter-class separability is maximized in the segmented images. Both effects contribute to the structural characteristics of images, which the human visual system is highly adapted to extract. Experimental results show that the proposed methods achieve competitive thresholding quality and the fastest computation speed compared with state-of-the-art methods.
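The abstract notes that the GQEM objective is equivalent to Otsu's criterion, so the classical between-class-variance threshold gives a concrete reference point for what is being optimized. The numpy sketch below implements only that classical criterion; the quantum state-vector/density-matrix encoding and the lossy QLEEM variant are not reproduced.

```python
import numpy as np

def otsu_threshold(image, levels=256):
    # Classical Otsu: choose the threshold maximizing between-class variance of the histogram.
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    best_t, best_score = 0, -1.0
    for t in range(1, levels):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, levels) * p[t:]).sum() / w1
        score = w0 * w1 * (mu0 - mu1) ** 2            # between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t

img = np.random.randint(0, 256, size=(128, 128))
mask = img >= otsu_threshold(img)                     # binary segmentation
```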

14.
IEEE Trans Image Process ; 26(3): 1466-1481, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28092552

ABSTRACT

In this paper, we aim to learn compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most existing linear regression methods exploit the conventional zero-one matrix as the regression target, which greatly limits the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space because of its weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework and develop two robust linear regression models with the following characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations for final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available datasets demonstrate that the proposed framework outperforms state-of-the-art methods. The MATLAB code of our methods is available at http://www.yongxu.org/lunwen.html.
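The first ingredient—relaxing the strict binary targets into a more feasible variable matrix and then solving a regularized regression in closed form—can be sketched with an ε-dragging-style update. The plain ridge penalty below stands in for ENLR's elastic-net regularization of singular values, so this is an assumption-laden illustration of the target-relaxation idea only.

```python
import numpy as np

def relaxed_ridge(X, y, n_classes, lam=1.0, iters=10):
    # X: (d, n) features, y: (n,) integer labels.  Alternate between a closed-form
    # ridge step toward relaxed targets and a margin-enlarging target update.
    d, n = X.shape
    T = -np.ones((n_classes, n))
    T[y, np.arange(n)] = 1.0                                    # strict +1/-1 targets
    E = np.zeros_like(T)                                        # nonnegative relaxation
    W = np.zeros((n_classes, d))
    for _ in range(iters):
        R = T + np.sign(T) * E                                  # push margins outward only
        W = R @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))  # closed-form ridge step
        E = np.maximum(np.sign(T) * (W @ X - T), 0.0)           # update the relaxation
    return W

X = np.random.randn(64, 200)
y = np.random.randint(0, 5, size=200)
W = relaxed_ridge(X, y, n_classes=5)
pred = (W @ X).argmax(axis=0)                                   # classify transformed features
```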
