Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 18 de 18
Filter
Add more filters











Publication year range
1.
Article in English | MEDLINE | ID: mdl-39283787

ABSTRACT

Weakly supervised group activity recognition (WSGAR) aims at identifying the overall behavior of multiple persons without any fine-grained supervision information (including individual position and action label). Traditional methods usually adopt a person-to-whole way: detect persons via off-the-shelf detectors, obtain person-level features, and integrate into the group-level features for training the classifier. However, these methods are unflexible due to serious reliance on the quality of detectors. To get rid of the detector, recent works learn several prototype tokens from noisy grid features with learnable weights directly, which treat all the local visual information equally and bring in redundant and ambiguous information to some extent. To this end, we propose a novel coarse-fine nested network (CFNN) to coarsely localize the key visual patches of activity and further finely learn the local features, as well as the global features. Specifically, we design a nested interactor (NI) to progressively model the spatiotemporal interactions of the learnable global token. According to the cue of spatial interaction in NI, we localize several key visual patches via a new coarse-grained spatial localizer (CSL). Then, we finally encode these localized visual patches with the help of global spatiotemporal dependency via a new fine-grained spatiotemporal selector (FSS). Extensive experiments on Volleyball and NBA datasets demonstrate the effectiveness of the proposed CFNN compared with the existing competitive methods. Code is available at: https://github.com/gexiaojingshelby/CFNN.

2.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 5131-5148, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38300783

ABSTRACT

One fundamental problem in deep learning is understanding the excellent performance of deep Neural Networks (NNs) in practice. An explanation for the superiority of NNs is that they can realize a large family of complicated functions, i.e., they have powerful expressivity. The expressivity of a Neural Network with Piecewise Linear activations (PLNN) can be quantified by the maximal number of linear regions it can separate its input space into. In this paper, we provide several mathematical results needed for studying the linear regions of Convolutional Neural Networks with Piecewise Linear activations (PLCNNs), and use them to derive the maximal and average numbers of linear regions for one-layer PLCNNs. Furthermore, we obtain upper and lower bounds for the number of linear regions of multi-layer PLCNNs. Our results suggest that deeper PLCNNs have more powerful expressivity than shallow PLCNNs, while PLCNNs have more expressivity than fully-connected PLNNs per parameter, in terms of the number of linear regions.

3.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 12844-12861, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37015683

ABSTRACT

Zero-shot learning (ZSL) tackles the novel class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is typically represented by attribute descriptions shared between different classes, which act as strong priors for localizing object attributes that represent discriminative region features, enabling significant and sufficient visual-semantic interaction for advancing ZSL. Existing attention-based models have struggled to learn inferior region features in a single image by solely using unidirectional attention, which ignore the transferable and discriminative attribute localization of visual features for representing the key semantic knowledge for effective knowledge transfer in ZSL. In this paper, we propose a cross attribute-guided Transformer network, termed TransZero++, to refine visual features and learn accurate attribute localization for key semantic knowledge representations in ZSL. Specifically, TransZero++ employs an attribute → visual Transformer sub-net (AVT) and a visual → attribute Transformer sub-net (VAT) to learn attribute-based visual features and visual-based attribute features, respectively. By further introducing feature-level and prediction-level semantical collaborative losses, the two attribute-guided transformers teach each other to learn semantic-augmented visual embeddings for key semantic knowledge representations via semantical collaborative learning. Finally, the semantic-augmented visual embeddings learned by AVT and VAT are fused to conduct desirable visual-semantic interaction cooperated with class semantic vectors for ZSL classification. Extensive experiments show that TransZero++ achieves the new state-of-the-art results on three golden ZSL benchmarks and on the large-scale ImageNet dataset. The project website is available at: https://shiming-chen.github.io/TransZero-pp/TransZero-pp.html.

4.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9575-9582, 2023 Nov.
Article in English | MEDLINE | ID: mdl-36269927

ABSTRACT

Generative (generalized) zero-shot learning [(G)ZSL] models aim to synthesize unseen class features by using only seen class feature and attribute pairs as training data. However, the generated fake unseen features tend to be dominated by the seen class features and thus classified as seen classes, which can lead to inferior performances under zero-shot learning (ZSL), and unbalanced results under generalized ZSL (GZSL). To address this challenge, we tailor a novel balanced semantic embedding generative network (BSeGN), which incorporates balanced semantic embedding learning into generative learning scenarios in the pursuit of unbiased GZSL. Specifically, we first design a feature-to-semantic embedding module (FEM) to distinguish real seen and fake unseen features collaboratively with the generator in an online manner. We introduce the bidirectional contrastive and balance losses for the FEM learning, which can guarantee a balanced prediction for the interdomain features. In turn, the updated FEM can boost the learning of the generator. Next, we propose a multilevel feature integration module (mFIM) from the cycle-consistency branch of BSeGN, which can mitigate the domain bias through feature enhancement. To the best of our knowledge, this is the first work to explore embedding and generative learning jointly within the field of ZSL. Extensive evaluations on four benchmarks demonstrate the superiority of BSeGN over its state-of-the-art counterparts.

5.
Article in English | MEDLINE | ID: mdl-36107890

ABSTRACT

Recently, deep metric learning (DML) has achieved great success. Some existing DML methods propose adaptive sample mining strategies, which learn to weight the samples, leading to interesting performance. However, these methods suffer from a small memory (e.g., one training batch), limiting their efficacy. In this work, we introduce a data-driven method, meta-mining strategy with semiglobal information (MMSI), to apply meta-learning to learn to weight samples during the whole training, leading to an adaptive mining strategy. To introduce richer information than one training batch only, we elaborately take advantage of the validation set of meta-learning by implicitly adding additional validation sample information to training. Furthermore, motivated by the latest self-supervised learning, we introduce a dictionary (memory) that maintains very large and diverse information. Together with the validation set, this dictionary presents much richer information to the training, leading to promising performance. In addition, we propose a new theoretical framework that can formulate pairwise and tripletwise metric learning loss functions in a unified framework. This framework brings new insights to society and facilitates us to generalize our MMSI to many existing DML methods. We conduct extensive experiments on three public datasets, CUB200-2011, Cars-196, and Stanford Online Products (SOP). Results show that our method can achieve the state of the art or very competitive performance. Our source codes have been made available at https://github.com/NUST-Machine-Intelligence-Laboratory/MMSI.

6.
Article in English | MEDLINE | ID: mdl-35507624

ABSTRACT

Zero-shot learning (ZSL) tackles the unseen class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Typically, to guarantee desirable knowledge transfer, a direct embedding is adopted for associating the visual and semantic domains in ZSL. However, most existing ZSL methods focus on learning the embedding from implicit global features or image regions to the semantic space. Thus, they fail to: 1) exploit the appearance relationship priors between various local regions in a single image, which corresponds to the semantic information and 2) learn cooperative global and local features jointly for discriminative feature representations. In this article, we propose the novel graph navigated dual attention network (GNDAN) for ZSL to address these drawbacks. GNDAN employs a region-guided attention network (RAN) and a region-guided graph attention network (RGAT) to jointly learn a discriminative local embedding and incorporate global context for exploiting explicit global embeddings under the guidance of a graph. Specifically, RAN uses soft spatial attention to discover discriminative regions for generating local embeddings. Meanwhile, RGAT employs an attribute-based attention to obtain attribute-based region features, where each attribute focuses on the most relevant image regions. Motivated by the graph neural network (GNN), which is beneficial for structural relationship representations, RGAT further leverages a graph attention network to exploit the relationships between the attribute-based region features for explicit global embedding representations. Based on the self-calibration mechanism, the joint visual embedding learned is matched with the semantic embedding to form the final prediction. Extensive experiments on three benchmark datasets demonstrate that the proposed GNDAN achieves superior performances to the state-of-the-art methods. Our code and trained models are available at https://github.com/shiming-chen/GNDAN.

7.
IEEE J Biomed Health Inform ; 26(11): 5310-5320, 2022 11.
Article in English | MEDLINE | ID: mdl-34478389

ABSTRACT

Accurate medical image segmentation of brain tumors is necessary for the diagnosing, monitoring, and treating disease. In recent years, with the gradual emergence of multi-sequence magnetic resonance imaging (MRI), multi-modal MRI diagnosis has played an increasingly important role in the early diagnosis of brain tumors by providing complementary information for a given lesion. Different MRI modalities vary significantly in context, as well as in coarse and fine information. As the manual identification of brain tumors is very complicated, it usually requires the lengthy consultation of multiple experts. The automatic segmentation of brain tumors from MRI images can thus greatly reduce the workload of doctors and buy more time for treating patients. In this paper, we propose a multi-modal brain tumor segmentation framework that adopts the hybrid fusion of modality-specific features using a self-supervised learning strategy. The algorithm is based on a fully convolutional neural network. Firstly, we propose a multi-input architecture that learns independent features from multi-modal data, and can be adapted to different numbers of multi-modal inputs. Compared with single-modal multi-channel networks, our model provides a better feature extractor for segmentation tasks, which learns cross-modal information from multi-modal data. Secondly, we propose a new feature fusion scheme, named hybrid attentional fusion. This scheme enables the network to learn the hybrid representation of multiple features and capture the correlation information between them through an attention mechanism. Unlike popular methods, such as feature map concatenation, this scheme focuses on the complementarity between multi-modal data, which can significantly improve the segmentation results of specific regions. Thirdly, we propose a self-supervised learning strategy for brain tumor segmentation tasks. Our experimental results demonstrate the effectiveness of the proposed model against other state-of-the-art multi-modal medical segmentation methods.


Subject(s)
Brain Neoplasms , Neural Networks, Computer , Humans , Brain Neoplasms/diagnostic imaging , Magnetic Resonance Imaging/methods , Algorithms , Image Processing, Computer-Assisted/methods
8.
IEEE Trans Neural Netw Learn Syst ; 33(7): 2903-2915, 2022 Jul.
Article in English | MEDLINE | ID: mdl-33493121

ABSTRACT

Generative adversarial networks (GANs) for (generalized) zero-shot learning (ZSL) aim to generate unseen image features when conditioned on unseen class embeddings, each of which corresponds to one unique category. Most existing works on GANs for ZSL generate features by merely feeding the seen image feature/class embedding (combined with random Gaussian noise) pairs into the generator/discriminator for a two-player minimax game. However, the structure consistency of the distributions among the real/fake image features, which may shift the generated features away from their real distribution to some extent, is seldom considered. In this paper, to align the weights of the generator for better structure consistency between real/fake features, we propose a novel multigraph adaptive GAN (MGA-GAN). Specifically, a Wasserstein GAN equipped with a classification loss is trained to generate discriminative features with structure consistency. MGA-GAN leverages the multigraph similarity structures between sliced seen real/fake feature samples to assist in updating the generator weights in the local feature manifold. Moreover, correlation graphs for the whole real/fake features are adopted to guarantee structure correlation in the global feature manifold. Extensive evaluations on four benchmarks demonstrate well the superiority of MGA-GAN over its state-of-the-art counterparts.

9.
IEEE Trans Image Process ; 30: 4316-4329, 2021.
Article in English | MEDLINE | ID: mdl-33835918

ABSTRACT

Transductive zero-shot learning (TZSL) extends conventional ZSL by leveraging (unlabeled) unseen images for model training. A typical method for ZSL involves learning embedding weights from the feature space to the semantic space. However, the learned weights in most existing methods are dominated by seen images, and can thus not be adapted to unseen images very well. In this paper, to align the (embedding) weights for better knowledge transfer between seen/unseen classes, we propose the virtual mainstay alignment network (VMAN), which is tailored for the transductive ZSL task. Specifically, VMAN is casted as a tied encoder-decoder net, thus only one linear mapping weights need to be learned. To explicitly learn the weights in VMAN, for the first time in ZSL, we propose to generate virtual mainstay (VM) samples for each seen class, which serve as new training data and can prevent the weights from being shifted to seen images, to some extent. Moreover, a weighted reconstruction scheme is proposed and incorporated into the model training phase, in both the semantic/feature spaces. In this way, the manifold relationships of the VM samples are well preserved. To further align the weights to adapt to more unseen images, a novel instance-category matching regularization is proposed for model re-training. VMAN is thus modeled as a nested minimization problem and is solved by a Taylor approximate optimization paradigm. In comprehensive evaluations on four benchmark datasets, VMAN achieves superior performances under the (Generalized) TZSL setting.

10.
IEEE Trans Neural Netw Learn Syst ; 31(7): 2348-2360, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32012029

ABSTRACT

Studies present that dividing categories into subcategories contributes to better image classification. Existing image subcategorization works relying on expert knowledge and labeled images are both time-consuming and labor-intensive. In this article, we propose to select and subsequently classify images into categories and subcategories. Specifically, we first obtain a list of candidate subcategory labels from untagged corpora. Then, we purify these subcategory labels through calculating the relevance to the target category. To suppress the search error and noisy subcategory label-induced outlier images, we formulate outlier images removing and the optimal classification models learning as a unified problem to jointly learn multiple classifiers, where the classifier for a category is obtained by combining multiple subcategory classifiers. Compared with the existing subcategorization works, our approach eliminates the dependence on expert knowledge and labeled images. Extensive experiments on image categorization and subcategorization demonstrate the superiority of our proposed approach.

11.
Neural Netw ; 123: 94-107, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31837517

ABSTRACT

Medical prediction is always collectively determined based on bioimages collected from different sources or various clinical characterizations described from multiple physiological features. Notably, learning intrinsic structures from multiple heterogeneous features is significant but challenging in multi-view disease understanding. Different from existing methods that separately deal with each single view, this paper proposes a discriminative Margin-Sensitive Autoencoder (MSAE) framework for automated Alzheimer's disease (AD) diagnosis and accurate protein fold recognition. Generally, our MSAE aims to collaboratively explore the complementary properties of multi-view bioimage features in a semantic-sensitive encoder-decoder paradigm, where the discriminative semantic space is explicitly constructed in a margin-scalable regression model. Specifically, we develop a semantic-sensitive autoencoder, where an encoder projects multi-view visual features into the common semantic-aware latent space, and a decoder is exerted as an additional constraint to reconstruct the respective visual features. In particular, the importance of different views is adaptively weighted by self-adjusting learning scheme, such that their underlying correlations and complementary characteristics across multiple views are simultaneously preserved into the latent common representations. Moreover, a flexible semantic space is formulated by a margin-scalable support vector machine to improve the discriminability of the learning model. Importantly, correntropy induced metric is exploited as a robust regularization measurement to better control outliers for effective classification. A half-quadratic minimization and alternating learning strategy are devised to optimize the resulting framework such that each subproblem exists a closed-form solution in each iterative minimization phase. Extensive experimental results performed on the Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets show that our MSAE can achieve superior performances for both binary and multi-class classification in AD diagnosis, and evaluations on protein folds demonstrate that our method can achieve very encouraging performance on protein structure recognition, outperforming the state-of-the-art methods.


Subject(s)
Alzheimer Disease/diagnostic imaging , Neuroimaging/methods , Pattern Recognition, Automated/methods , Support Vector Machine , Databases, Factual , Discrimination Learning/physiology , Humans
12.
IEEE Trans Neural Netw Learn Syst ; 31(10): 4290-4302, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31870993

ABSTRACT

Feature representation learning, an emerging topic in recent years, has achieved great progress. Powerful learned features can lead to excellent classification accuracy. In this article, a selective and robust feature representation framework with a supervised constraint (SRSC) is presented. SRSC seeks a selective, robust, and discriminative subspace by transforming the original feature space into the category space. Particularly, we add a selective constraint to the transformation matrix (or classifier parameter) that can select discriminative dimensions of the input samples. Moreover, a supervised regularization is tailored to further enhance the discriminability of the subspace. To relax the hard zero-one label matrix in the category space, an additional error term is also incorporated into the framework, which can lead to a more robust transformation matrix. SRSC is formulated as a constrained least square learning (feature transforming) problem. For the SRSC problem, an inexact augmented Lagrange multiplier method (ALM) is utilized to solve it. Extensive experiments on several benchmark data sets adequately demonstrate the effectiveness and superiority of the proposed method. The proposed SRSC approach has achieved better performances than the compared counterpart methods.

13.
IEEE Trans Image Process ; 28(10): 4803-4818, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31071030

ABSTRACT

Compact hash code learning has been widely applied to fast similarity search owing to its significantly reduced storage and highly efficient query speed. However, it is still a challenging task to learn discriminative binary codes for perfectly preserving the full pairwise similarities embedded in the high-dimensional real-valued features, such that the promising performance can be guaranteed. To overcome this difficulty, in this paper, we propose a novel scalable supervised asymmetric hashing (SSAH) method, which can skillfully approximate the full-pairwise similarity matrix based on maximum asymmetric inner product of two different non-binary embeddings. In particular, to comprehensively explore the semantic information of data, the supervised label information and the refined latent feature embedding are simultaneously considered to construct the high-quality hashing function and boost the discriminant of the learned binary codes. Specifically, SSAH learns two distinctive hashing functions in conjunction of minimizing the regression loss on the semantic label alignment and the encoding loss on the refined latent features. More importantly, instead of using only part of similarity correlations of data, the full-pairwise similarity matrix is directly utilized to avoid information loss and performance degeneration, and its cumbersome computation complexity on n ×n matrix can be dexterously manipulated during the optimization phase. Furthermore, an efficient alternating optimization scheme with guaranteed convergence is designed to address the resulting discrete optimization problem. The encouraging experimental results on diverse benchmark datasets demonstrate the superiority of the proposed SSAH method in comparison with many recently proposed hashing algorithms.

14.
Acta Biotheor ; 66(3): 251-253, 2018 09.
Article in English | MEDLINE | ID: mdl-29872936

ABSTRACT

In the original publication of the article, the y axis labels present in Figs. 1a and 2a are incorrect. The correct Figs. 1a and 2a are provided here.

15.
Acta Biotheor ; 66(2): 113-133, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29675730

ABSTRACT

In this paper, we propose two four-base related 2D curves of DNA primary sequences (termed as F-B curves) and their corresponding single-base related 2D curves (termed as A-related, G-related, T-related and C-related curves). The constructions of these graphical curves are based on the assignments of individual base to four different sinusoidal (or tangent) functions; then by connecting all these points on these four sinusoidal (tangent) functions, we can get the F-B curves; similarly, by connecting the points on each of the four sinusoidal (tangent) functions, we get the single-base related 2D curves. The proposed 2D curves are all strictly non degenerate. Then, a 8-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on a normalized geometrical centers of the proposed curves. As examples, we examine similarity among the coding sequences of the first exon of beta-globin gene from eleven species, similarity of cDNA sequences of beta-globin gene from eight species, and similarity of the whole mitochondrial genomes of 18 eutherian mammals. The experimental results well demonstrate the effectiveness of the proposed method.


Subject(s)
Algorithms , Computer Graphics , DNA/chemistry , DNA/genetics , Numerical Analysis, Computer-Assisted , Sequence Analysis, DNA/methods , beta-Globins/genetics , Animals , Base Sequence , Computer Simulation , Genome, Mitochondrial , Humans , Phylogeny , Species Specificity
16.
Entropy (Basel) ; 20(10)2018 Sep 23.
Article in English | MEDLINE | ID: mdl-33265817

ABSTRACT

Aiming to implement image segmentation precisely and efficiently, we exploit new ways to encode images and achieve the optimal thresholding on quantum state space. Firstly, the state vector and density matrix are adopted for the representation of pixel intensities and their probability distribution, respectively. Then, the method based on global quantum entropy maximization (GQEM) is proposed, which has an equivalent object function to Otsu's, but gives a more explicit physical interpretation of image thresholding in the language of quantum mechanics. To reduce the time consumption for searching for optimal thresholds, the method of quantum lossy-encoding-based entropy maximization (QLEEM) is presented, in which the eigenvalues of density matrices can give direct clues for thresholding, and then, the process of optimal searching can be avoided. Meanwhile, the QLEEM algorithm achieves two additional effects: (1) the upper bound of the thresholding level can be implicitly determined according to the eigenvalues; and (2) the proposed approaches ensure that the local information in images is retained as much as possible, and simultaneously, the inter-class separability is maximized in the segmented images. Both of them contribute to the structural characteristics of images, which the human visual system is highly adapted to extract. Experimental results show that the proposed methods are able to achieve a competitive quality of thresholding and the fastest computation speed compared with the state-of-the-art methods.

17.
IEEE Trans Image Process ; 26(3): 1466-1481, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28092552

ABSTRACT

In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminate representations to make final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets well demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods can be available at http://www.yongxu.org/lunwen.html.

18.
J Theor Biol ; 269(1): 123-30, 2011 Jan 21.
Article in English | MEDLINE | ID: mdl-20969878

ABSTRACT

In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and there is no loss of information when transferring a DNA sequence to its mathematical representation and (ii) the coordinates of every node on these 3D curves have clear biological implication. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the geometrical centers of the 3D curves. As examples, we examine similarity among the coding sequences of the first exon of beta-globin gene from eleven species and validate similarity of cDNA sequences of beta-globin gene from eight species.


Subject(s)
Base Composition/genetics , DNA/chemistry , DNA/genetics , Models, Molecular , Animals , Base Sequence , DNA, Complementary/genetics , Exons/genetics , Humans , Molecular Sequence Data , Open Reading Frames/genetics , Sequence Homology, Nucleic Acid , Species Specificity , beta-Globins/genetics
SELECTION OF CITATIONS
SEARCH DETAIL