Results 1 - 20 of 37
1.
Comput Med Imaging Graph ; 108: 102249, 2023 09.
Article in English | MEDLINE | ID: mdl-37290374

ABSTRACT

Magnetic resonance (MR) and computed tomography (CT) images are two typical types of medical images that provide mutually complementary information for accurate clinical diagnosis and treatment. However, obtaining both images may be limited by considerations such as cost, radiation dose, and missing modalities. Recently, medical image synthesis has attracted growing research interest as a way to cope with this limitation. In this paper, we propose a bidirectional learning model, denoted dual contrast cycleGAN (DC-cycleGAN), to synthesize medical images from unpaired data. Specifically, a dual contrast loss is introduced into the discriminators to indirectly build constraints between real source and synthetic images: samples from the source domain serve as negative samples, enforcing that synthetic images fall far away from the source domain. In addition, cross-entropy and the structural similarity index (SSIM) are integrated into DC-cycleGAN so that both the luminance and the structure of samples are considered when synthesizing images. The experimental results indicate that DC-cycleGAN produces promising results compared with other cycleGAN-based medical image synthesis methods such as cycleGAN, RegGAN, DualGAN, and NiceGAN. Code is available at https://github.com/JiayuanWang-JW/DC-cycleGAN.
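As an illustration of the dual contrast idea described above, here is a minimal sketch (our own, not the released DC-cycleGAN code): the discriminator treats real target-domain samples as positives while both synthesized images and real source-domain samples act as negatives, pushing synthetic images away from the source domain. The score arrays and weighting are invented for illustration.

```python
import numpy as np

def dual_contrast_loss(d_real_target, d_fake, d_real_source):
    """Sketch of a dual-contrast discriminator objective.
    Inputs are discriminator scores in (0, 1)."""
    eps = 1e-8
    pos = -np.mean(np.log(d_real_target + eps))        # attract real target samples
    neg = -np.mean(np.log(1.0 - d_fake + eps))         # repel synthetic samples
    src = -np.mean(np.log(1.0 - d_real_source + eps))  # repel real SOURCE samples (the "dual contrast")
    return pos + neg + src

loss = dual_contrast_loss(np.array([0.9, 0.8]),   # confident on real target
                          np.array([0.2, 0.1]),   # low score on fakes
                          np.array([0.15, 0.05])) # low score on real source
```

A confident discriminator yields a lower loss than an uncertain one, exactly as with the standard GAN objective, but with the extra source-domain negative term.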


Subject(s)
Deep Learning; Tomography, X-Ray Computed; Tomography, X-Ray Computed/methods; Image Processing, Computer-Assisted/methods; Computers; Magnetic Resonance Spectroscopy
2.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4051-4070, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35849673

ABSTRACT

Generalized zero-shot learning (GZSL) aims to train a model to classify data samples when some output classes are unknown during supervised learning. To address this challenging task, GZSL leverages semantic information of both the seen (source) and unseen (target) classes to bridge the gap between them. Since its introduction, many GZSL models have been formulated. In this review paper, we present a comprehensive review of GZSL. First, we provide an overview of GZSL, including its problems and challenges. Then, we introduce a hierarchical categorization of GZSL methods and discuss the representative methods in each category. In addition, we discuss the available benchmark data sets and applications of GZSL, along with the research gaps and directions for future investigation.
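A hypothetical sketch of the basic decision rule that GZSL methods build on: a sample's projected semantic embedding is scored against the attribute vectors of both seen and unseen classes, and the closest class wins. The class names and attribute vectors here are invented for illustration.

```python
import numpy as np

# Toy attribute (semantic) vectors: [striped, four-legged, flies]
attrs = {
    "horse": np.array([0.0, 1.0, 0.0]),   # seen class
    "zebra": np.array([1.0, 1.0, 0.0]),   # seen class
    "eagle": np.array([0.0, 0.0, 1.0]),   # unseen at training time
}

def classify(sem_embedding):
    # nearest class in attribute space, searched over seen AND unseen classes
    return min(attrs, key=lambda c: np.linalg.norm(sem_embedding - attrs[c]))
```

Because unseen classes still have attribute vectors, the rule can label samples of classes never observed during supervised training.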

3.
IEEE Trans Cybern ; 53(11): 6923-6936, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35687637

ABSTRACT

Semisupervised classification with a few labeled training samples is a challenging task in the area of data mining. Moore-Penrose inverse (MPI)-based manifold regularization (MR) is a widely used technique in tackling semisupervised classification. However, most of the existing MPI-based MR algorithms can only generate loosely connected feature encoding, which is generally less effective in data representation and feature learning. To alleviate this deficiency, we introduce a new semisupervised multilayer subnet neural network called SS-MSNN. The key contributions of this article are as follows: 1) a novel MPI-based MR model using the subnetwork structure is introduced. The subnet model is utilized to enrich the latent space representations iteratively; 2) a one-step training process to learn the discriminative encoding is proposed. The proposed SS-MSNN learns parameters by directly optimizing the entire network, accepting input from one end, and producing output at the other end; and 3) a new semisupervised dataset called HFSWR-RDE is built for this research. Experimental results on multiple domains show that the SS-MSNN achieves promising performance over the other semisupervised learning algorithms, demonstrating fast inference speed and better generalization ability.
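A minimal sketch of the closed-form learning step that MPI-based methods like the one above rely on (our illustration, not the SS-MSNN architecture itself): a layer's output weights are solved with a regularized Moore-Penrose pseudoinverse instead of gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((100, 32))   # hidden-layer activations (n x d)
T = rng.standard_normal((100, 5))    # targets (n x classes)
lam = 1e-2                           # ridge term for numerical stability
# Regularized pseudoinverse solution:  beta = (H^T H + lam I)^-1 H^T T
beta = np.linalg.solve(H.T @ H + lam * np.eye(32), H.T @ T)
pred = H @ beta
```

This one-shot solve is what makes such networks fast to train compared with iterative backpropagation.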

4.
IEEE Trans Image Process ; 32: 13-28, 2023.
Article in English | MEDLINE | ID: mdl-36459602

ABSTRACT

Human action recognition (HAR) is one of the most important tasks in video analysis. Since video clips distributed on networks are usually untrimmed, a given untrimmed video must be accurately segmented into a set of action segments for HAR. As an unsupervised temporal segmentation technique, subspace clustering learns codes from each video to construct an affinity graph, and then cuts the affinity graph to cluster the video into a set of action segments. However, most existing subspace clustering schemes not only ignore the sequential information of frames during code learning but also ignore the negative effects of noise when cutting the affinity graph, which leads to inferior performance. To address these issues, we propose a sequential order-aware coding-based robust subspace clustering (SOAC-RSC) scheme for HAR. By feeding the motion features of video frames into multi-layer neural networks, two expressive code matrices are learned in a sequential order-aware manner from unconstrained and constrained videos, respectively, to construct the corresponding affinity graphs. Then, taking the existence of noise into account, a simple yet robust cutting algorithm is proposed to cut the constructed affinity graphs and accurately obtain the action segments for HAR. Extensive experiments demonstrate that the proposed SOAC-RSC scheme achieves state-of-the-art performance on the Keck Gesture and Weizmann datasets, and provides competitive performance on six other public datasets, such as UCF101 and URADL, compared to recent related approaches.
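A generic sketch of the subspace-clustering recipe the abstract follows (codes, then affinity graph, then graph cut). The cut here is a plain spectral bisection on toy codes, not the paper's robust cutting algorithm; the data and kernel are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
codes = np.vstack([rng.normal(0.0, 0.1, (10, 4)),      # frames of segment 1
                   rng.normal(2.0, 0.1, (10, 4))])     # frames of segment 2
sq = np.sum((codes[:, None] - codes[None]) ** 2, axis=-1)
A = np.exp(-sq)                                        # affinity graph
L = np.diag(A.sum(axis=1)) - A                         # graph Laplacian
_, v = np.linalg.eigh(L)
fiedler = v[:, 1]                                      # 2nd-smallest eigenvector
labels = (fiedler > fiedler.mean()).astype(int)        # spectral bisection
```

The sign pattern of the Fiedler vector recovers the two segments when the affinity graph has a clear two-block structure.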

5.
IEEE Trans Cybern ; 53(10): 6303-6316, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35486564

ABSTRACT

Multilayer one-class classification (OCC) frameworks have gained great traction in research on anomaly and outlier detection. However, most multilayer OCC algorithms suffer from loosely connected feature coding, limiting the ability of the generated latent space to produce a highly discriminative representation between object classes. To alleviate this deficiency, two novel OCC frameworks are proposed in this article, namely: 1) an OCC structure using the subnetwork neural network (OC-SNN) and 2) a maximum correntropy-based OC-SNN (MCOC-SNN). The novelties of this article are as follows: 1) the subnetwork is used to build the discriminative latent space; 2) the proposed models are one-step learning networks, rather than stacking feature learning blocks and a final classification layer to recognize the input pattern; 3) unlike existing works that use the mean square error (MSE) to learn low-dimensional features, the MCOC-SNN uses the maximum correntropy criterion (MCC) for discriminative feature encoding; and 4) a brand-new OCC dataset, called CO-Mask, is built for this research. Experimental results on the visual classification domain, with the number of training samples varying from 6131 to 513,061, demonstrate that the proposed OC-SNN and MCOC-SNN achieve superior performance compared to existing multilayer OCC models. For reproducibility, the source codes are available at https://github.com/W1AE/OCC.
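To make the MSE-versus-MCC contrast above concrete, here is a minimal sketch of the maximum correntropy criterion as a loss (our illustration; the kernel width sigma is an assumed hyperparameter): a Gaussian kernel on the error, so large outlier errors saturate instead of dominating the objective.

```python
import numpy as np

def correntropy_loss(err, sigma=1.0):
    """MCC-style loss: 1 - Gaussian kernel of the error.
    Unlike MSE, gross outliers saturate near 1 instead of exploding."""
    return np.mean(1.0 - np.exp(-err ** 2 / (2 * sigma ** 2)))

small = correntropy_loss(np.array([0.1, -0.2]))
outlier = correntropy_loss(np.array([0.1, 100.0]))     # one gross outlier
mse_outlier = np.mean(np.array([0.1, 100.0]) ** 2)     # MSE on the same errors
```

The outlier drives the MSE to thousands while the correntropy loss stays bounded below 1, which is the robustness property the abstract invokes.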

6.
IEEE Trans Cybern ; 53(4): 2151-2163, 2023 Apr.
Article in English | MEDLINE | ID: mdl-34546939

ABSTRACT

Pattern recognition in real-world scenarios is made significantly challenging by the variability of visual statistics. Consequently, most existing algorithms, which rely on the assumption that training and test data are independent and identically distributed, suffer from poor generalization when inferring on unseen test datasets. Although numerous approaches, including domain discriminators and domain-invariant feature learning, have been proposed to alleviate this problem, their purely data-driven nature and the lack of interpretation of their underlying principles leave researchers and developers without clear guidance. This dilemma prompts us to rethink the essence of network generalization. The observation that visual patterns are no longer discriminative after style transfer inspires us to carefully consider the respective roles of style features and content features. Is the style information related to the domain bias? How can content and style features be effectively disentangled across domains? In this article, we first investigate the effect of feature normalization on domain adaptation. Based on this, we propose a novel normalization module, called disentangling batch instance normalization (D-BIN), that adaptively leverages the information propagated through each channel and batch of features. In this module, we explicitly explore the disentanglement of domain-specific and domain-invariant features. We leverage contrastive learning to encourage images with the same semantics from different domains to have similar content representations while having dissimilar style representations. Furthermore, we construct both self-form and dual-form regularizers that preserve the mutual information (MI) between the feature representations of the normalization layer, in order to compensate for the loss of discriminative information and effectively match the distributions across domains. D-BIN and the constrained term can simply be plugged into state-of-the-art (SOTA) networks to improve their performance. Finally, experiments on domain adaptation and generalization, conducted on different datasets, have proven their effectiveness.
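For orientation, here is a sketch of generic batch-instance normalization, the family D-BIN belongs to (our simplified illustration with a fixed gate rho, not the paper's adaptive, disentangling module): batch-normalized features preserve content statistics while instance-normalized features remove per-sample style, and a gate blends the two.

```python
import numpy as np

def batch_instance_norm(x, rho=0.5, eps=1e-5):
    """Blend batch-normalized and instance-normalized features.
    x has shape (N, C, H, W); rho gates BN (1.0) vs IN (0.0)."""
    mu_b = x.mean(axis=(0, 2, 3), keepdims=True)   # batch statistics, per channel
    var_b = x.var(axis=(0, 2, 3), keepdims=True)
    x_bn = (x - mu_b) / np.sqrt(var_b + eps)
    mu_i = x.mean(axis=(2, 3), keepdims=True)      # per-sample, per-channel statistics
    var_i = x.var(axis=(2, 3), keepdims=True)
    x_in = (x - mu_i) / np.sqrt(var_i + eps)
    return rho * x_bn + (1 - rho) * x_in

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 3, 8, 8)) * 2 + 1
y = batch_instance_norm(x)
```

In the learned version, rho is a trainable per-channel parameter, which is what lets the network decide channel by channel how much style to strip.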

7.
Article in English | MEDLINE | ID: mdl-36215378

ABSTRACT

In this article, we propose a new cross locality relation network (CLRNet) to generate high-quality crowd density maps for crowd counting in videos. Specifically, a cross locality relation module (CLRM) is proposed to enhance feature representations by modeling local dependencies of pixels between adjacent frames with an adapted local self-attention mechanism. First, unlike existing methods that measure similarity between pixels by dot product, a new adaptive cosine similarity is proposed to measure the relationship between two positions. Second, traditional self-attention modules usually integrate the reconstructed features with the same weights for all positions. However, crowd movement and background changes in a video sequence are uneven in real-life applications, so it is inappropriate to treat all positions in the reconstructed features equally. To address this issue, a scene consistency attention map (SCAM) is developed to make CLRM pay more attention to positions with strong correlations in adjacent frames. Furthermore, CLRM is incorporated into the network in a coarse-to-fine way to further enhance the representational capability of features. Experimental results demonstrate the effectiveness of the proposed CLRNet in comparison to state-of-the-art methods on four public video datasets. The codes are available at: https://github.com/Amelie01/CLRNet.

8.
Article in English | MEDLINE | ID: mdl-36279331

ABSTRACT

Most multilayer Moore-Penrose inverse (MPI)-based neural networks, such as the deep random vector functional link (RVFL) network, are structured in two separate stages: unsupervised feature encoding and supervised pattern classification. Once the unsupervised learning is finished, the latent encoding is fixed without supervised fine-tuning. However, in complex tasks such as handling the ImageNet dataset, there are often many more clues that could be directly encoded, while unsupervised learning, by definition, cannot know exactly what is useful for a given task. This raises the need to retrain the latent space representations in the supervised pattern classification stage, to capture clues that unsupervised learning has not yet learned. In particular, the residual error in the output layer is pulled back to each hidden layer, and the parameters of the hidden layers are recalculated with the MPI for more robust representations. In this article, a recomputation-based multilayer network using the Moore-Penrose inverse (RML-MP) is developed. A sparse RML-MP (SRML-MP) model is then proposed to boost the performance of RML-MP. Experimental results with varying training samples (from 3k to 1.8 million) show that the proposed models provide higher Top-1 testing accuracy than most representation learning algorithms. For reproducibility, the source codes are available at https://github.com/W1AE/Retraining.
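A rough sketch of the residual pull-back idea on a toy two-layer network (our heuristic reconstruction, not the RML-MP algorithm; the pull-back rule via pinv(W2) and the arctanh inversion are assumptions for illustration): the output residual is mapped back to a hidden-layer target, and the hidden weights are recomputed in closed form with np.linalg.pinv.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 20))
T = rng.standard_normal((200, 3))
W1 = rng.standard_normal((20, 16))        # fixed "unsupervised" encoding
H = np.tanh(X @ W1)
W2 = np.linalg.pinv(H) @ T                # output weights via MP inverse
resid0 = np.linalg.norm(T - H @ W2)
# Pull the residual back to a hidden-layer target, then recompute W1 and W2.
H_target = H + (T - H @ W2) @ np.linalg.pinv(W2)
W1 = np.linalg.pinv(X) @ np.arctanh(np.clip(H_target, -0.999, 0.999))
H = np.tanh(X @ W1)
W2 = np.linalg.pinv(H) @ T
resid1 = np.linalg.norm(T - H @ W2)
```

Each step is a closed-form least-squares solve, so the retraining remains non-iterative in spirit.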

9.
IEEE Trans Cybern ; 52(5): 3097-3110, 2022 May.
Article in English | MEDLINE | ID: mdl-33027022

ABSTRACT

Accidents caused by reduced vigilance are on the rise. In the future, highly accurate vigilance estimation will play a significant role in public transportation safety. We propose a multimodal regression network consisting of multichannel deep autoencoders with subnetwork neurons (MCDAEsn). After defining two thresholds of 0.35 and 0.70 on the percentage of eye closure, the output values fall into the continuous ranges 0-0.35, 0.36-0.70, and 0.71-1, representing the awake, tired, and drowsy states, respectively. To verify the efficiency of our strategy, we first applied the proposed approach to a single modality. Then, for the multimodal case, given the complementary information between forehead electrooculography and electroencephalography features, we found that the performance of the proposed approach using feature fusion improved significantly, demonstrating the effectiveness and efficiency of our method.
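The threshold scheme above maps directly to a small decision function; this sketch just restates the two cut points from the abstract:

```python
def vigilance_state(perclos):
    """Map a percentage-of-eye-closure score in [0, 1] to the three
    states used in the abstract: awake (<= 0.35), tired (<= 0.70),
    drowsy (otherwise)."""
    if perclos <= 0.35:
        return "awake"
    if perclos <= 0.70:
        return "tired"
    return "drowsy"
```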


Subject(s)
Deep Learning; Wakefulness; Electroencephalography; Electrooculography/methods
10.
Mach Vis Appl ; 32(2): 45, 2021.
Article in English | MEDLINE | ID: mdl-33623184

ABSTRACT

Salient object detection is a hot topic in current computer vision. The emergence of the convolutional neural network (CNN) has greatly improved existing detection methods. In this paper, we present 3MNet, a CNN-based network that makes the most of the various features of the image and uses the contour detection task of the salient object to explicitly model features across multiple levels, multiple tasks, and multiple channels, producing a final saliency map that fuses these features. Specifically, we first use the contour detection task for auxiliary detection and then use a multi-layer network structure to extract multi-scale image information. Finally, we introduce a dedicated module into the network to model the channel information of the image. Our network produces good results on five widely used datasets. In addition, we conducted a series of ablation experiments to verify the effectiveness of individual components of the network.

11.
IEEE Trans Neural Netw Learn Syst ; 32(8): 3770-3776, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32822309

ABSTRACT

Autoencoding is a vital branch of representation learning in deep neural networks (DNNs). The extreme learning machine-based autoencoder (ELM-AE) has recently been developed and has gained popularity for its fast learning speed and ease of implementation. However, the ELM-AE uses random hidden node parameters without tuning, which may generate meaningless encoded features. In this brief, we first propose a within-class scatter information constraint-based AE (WSI-AE) that minimizes both the reconstruction error and the within-class scatter of the encoded features. We then build stacked WSI-AEs into a one-class classification (OCC) algorithm based on the hierarchical regularized least-squares method. The effectiveness of our approach was experimentally demonstrated in comparisons with several state-of-the-art AEs and OCC algorithms. The evaluations were performed on several benchmark data sets.
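The WSI-AE objective above combines two terms that are easy to write down; this is a minimal sketch (our illustration; the balance weight alpha and the exact normalization are assumptions): reconstruction error plus the within-class scatter of the encoded features.

```python
import numpy as np

def wsi_objective(x, x_hat, z, labels, alpha=0.1):
    """Reconstruction MSE plus within-class scatter of encodings z
    (sum of squared distances to each class mean)."""
    rec = np.mean((x - x_hat) ** 2)
    scatter = 0.0
    for c in np.unique(labels):
        zc = z[labels == c]
        scatter += np.sum((zc - zc.mean(axis=0)) ** 2)
    return rec + alpha * scatter / len(z)

rng = np.random.default_rng(5)
x = rng.standard_normal((6, 4))
x_hat = x + 0.01                                   # near-perfect reconstruction
tight = np.vstack([np.zeros((3, 2)), np.ones((3, 2))])  # compact per-class codes
loose = rng.standard_normal((6, 2)) * 3                 # scattered codes
labels = np.array([0, 0, 0, 1, 1, 1])
```

Encodings that cluster tightly within each class score strictly lower than scattered ones at equal reconstruction quality, which is the constraint's intent.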

12.
IEEE Trans Neural Netw Learn Syst ; 32(11): 5008-5021, 2021 Nov.
Article in English | MEDLINE | ID: mdl-33021948

ABSTRACT

Fully connected representation learning (FCRL) is one of the widely used network structures in multimodel image classification frameworks. However, most FCRL-based structures, for instance the stacked autoencoder, encode features and compute the final prediction with separate building blocks, resulting in loosely connected feature representations. This article achieves a robust representation by considering the low-dimensional features and the classifier model simultaneously. To this end, a new hierarchical subnetwork-based neural network (HSNN) is proposed. The novelties of this framework are as follows: 1) it is an iterative learning process, instead of stacking separate blocks to obtain the discriminative encoding and the final classification results; in this sense, globally optimal features are generated; and 2) it applies a Moore-Penrose (MP) inverse-based batch-by-batch learning strategy to handle large-scale data sets, so that large datasets, such as Places365 with 1.8 million images, can be processed effectively. Experimental results on multiple domains, with the number of training samples varying from ∼1K to ∼2M, show that the proposed feature reinforcement framework achieves better generalization performance than most state-of-the-art FCRL methods.

13.
IEEE Trans Cybern ; 51(10): 5105-5115, 2021 Oct.
Article in English | MEDLINE | ID: mdl-31478888

ABSTRACT

Recently, the correlation filter (CF) has attracted significant attention in visual tracking for its high efficiency in most state-of-the-art algorithms. However, such trackers easily fail when facing distractions caused by background clutter, occlusion, and other challenging situations, which commonly arise in real-world visual object tracking. Maintaining tracking under these circumstances is a bottleneck in the field. To improve tracking performance under complex interference, this article introduces a combination of least absolute shrinkage and selection operator (LASSO) regression and contextual information into the learning stage of the CF framework so as to ignore these distractions. Moreover, an elastic net regression is proposed to regroup the features, and an adaptive scale method is implemented to deal with scale changes during tracking. Theoretical analysis and exhaustive experiments show that the proposed peak strength context-aware (PSCA) CF significantly improves the kernelized CF (KCF) and achieves better performance than other state-of-the-art trackers.
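For context, here is a sketch of the closed-form ridge step underlying KCF-style correlation filters (a 1-D, linear-kernel toy; not the paper's PSCA formulation): the filter is solved per frequency with the FFT, and applying it to the training signal should peak at the target location.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(64)                               # 1-D template signal
y = np.exp(-0.5 * ((np.arange(64) - 32) / 2.0) ** 2)      # desired Gaussian response, peak at 32
lam = 1e-2                                                # ridge regularizer
X, Y = np.fft.fft(x), np.fft.fft(y)
F = np.conj(X) * Y / (np.conj(X) * X + lam)               # per-frequency ridge solution
response = np.real(np.fft.ifft(F * np.fft.fft(x)))        # correlate filter with signal
```

The diagonalization by the FFT is what gives correlation-filter trackers their speed: training and detection are element-wise operations in the frequency domain.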

14.
IEEE Trans Med Imaging ; 40(3): 1032-1041, 2021 03.
Article in English | MEDLINE | ID: mdl-33326377

ABSTRACT

Anomaly detection refers to the identification of cases that do not conform to the expected pattern, and it plays a key role in diverse research areas and application domains. Most existing methods can be categorized as anomaly object detection-based or reconstruction error-based techniques. However, due to the difficulty of characterizing the full diversity of real-world outliers and the inaccessibility of the inference process, neither family alone has achieved groundbreaking progress. To address these shortcomings, and motivated by memory-based decision-making and the visual attention mechanism, which acts as a filter to select environmental information in the human visual perceptual system, in this paper we propose a Multi-scale Attention Memory with hash addressing Autoencoder network (MAMA Net) for anomaly detection. First, to overcome the problems resulting from the restricted stationary receptive field of the convolution operator, we introduce a multi-scale global spatial attention block that can be straightforwardly plugged into any network as a sampling, upsampling, or downsampling function. Owing to its efficient feature representation ability, networks can achieve competitive results with only a few such blocks. Second, it is observed that a traditional autoencoder can only learn an ambiguous model that also reconstructs anomalies "well" due to the lack of constraints during training and inference. To mitigate this, we design a hash addressing memory module that forces abnormalities to produce higher reconstruction errors for classification. In addition, we couple the mean square error (MSE) with a Wasserstein loss to improve the encoded data distribution. Experiments on various datasets, including two different COVID-19 datasets and one brain MRI (RIDER) dataset, prove the robustness and excellent generalization of the proposed MAMA Net.
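A toy sketch of the memory-addressing idea behind such modules (our illustration with cosine addressing over identity prototypes; the hash addressing itself and the temperature tau are not reproduced here): a query latent is re-expressed as a soft combination of stored "normal" prototypes, so anomalous latents reconstruct poorly and yield larger errors.

```python
import numpy as np

def memory_read(z, memory, tau=5.0):
    """Soft-address a memory of prototypes with cosine similarity.
    memory: (slots, d); z: (d,). Returns the reconstructed latent."""
    sim = memory @ z / (np.linalg.norm(memory, axis=1) * np.linalg.norm(z) + 1e-8)
    w = np.exp(tau * sim)
    w /= w.sum()                 # attention weights over memory slots
    return w @ memory            # reconstruction from stored prototypes

memory = np.eye(4)                         # 4 toy "normal" prototypes
normal = np.array([1.0, 0.0, 0.0, 0.0])    # matches a prototype
anomaly = np.array([0.5, -0.5, 0.5, -0.5]) # matches nothing stored
e_norm = np.linalg.norm(normal - memory_read(normal, memory))
e_anom = np.linalg.norm(anomaly - memory_read(anomaly, memory))
```

The reconstruction-error gap between normal and anomalous queries is exactly the classification signal the abstract describes.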


Subject(s)
Image Interpretation, Computer-Assisted/methods; Neural Networks, Computer; Algorithms; Brain/diagnostic imaging; COVID-19/diagnostic imaging; Humans; Lung/diagnostic imaging; Magnetic Resonance Imaging; SARS-CoV-2; Tomography, X-Ray Computed
15.
ISA Trans ; 101: 160-169, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32111406

ABSTRACT

The Student's t distribution is a useful tool for modeling the heavy-tailed noise that appears in many practical systems. Although a t-distribution-based filter has been derived, its information filter form has not been presented, and data fusion algorithms for dynamic systems disturbed by heavy-tailed noise have rarely been considered. In this paper, based on the multivariate t distribution and variational Bayesian estimation, the information filter, the centralized batch fusion, the distributed fusion, and the suboptimal distributed fusion algorithms are derived, respectively. The centralized fusion is given in two forms: one from the t-distribution-based filter and one from the proposed t-distribution-based information filter. The distributed fusion is deduced using the newly derived information filter and is shown to be equivalent to the centralized batch fusion. The suboptimal distributed fusion is obtained by a parameter approximation of the derived distributed fusion to decrease the computational complexity. The presented algorithms are shown to generalize the classical Kalman filter-based traditional algorithms. Theoretical analysis and an exhaustive experimental study on a target tracking example show that the proposed algorithms are feasible and effective.

16.
IEEE Trans Pattern Anal Mach Intell ; 42(11): 2912-2925, 2020 11.
Article in English | MEDLINE | ID: mdl-31107643

ABSTRACT

Gradient descent optimization has become a paradigm for training deep convolutional neural networks (DCNN). However, utilizing other learning strategies in the training process of a DCNN has rarely been explored by the deep learning (DL) community. This motivates the introduction of a non-iterative learning strategy to retrain neurons at the top dense or fully connected (FC) layers of a DCNN, resulting in higher performance. The proposed method exploits the Moore-Penrose inverse to pull back the current residual error to each FC layer, generating well-generalized features. Further, the weights of each FC layer are recomputed according to the Moore-Penrose inverse. We evaluate the proposed approach on six of the most widely accepted object recognition benchmark datasets: Scene-15, CIFAR-10, CIFAR-100, SUN-397, Places365, and ImageNet. The experimental results show that the proposed method improves upon 30 state-of-the-art methods. Interestingly, they also indicate that any DCNN combined with the proposed method can achieve better performance than the same network with its original backpropagation (BP)-based training.


Subject(s)
Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Algorithms; Databases, Factual; Deep Learning
17.
IEEE Trans Cybern ; 50(10): 4268-4280, 2020 Oct.
Article in English | MEDLINE | ID: mdl-30869636

ABSTRACT

Recommender systems are currently widely utilized in e-commerce for product recommendations and within content delivery platforms. Previous studies usually use independent features to represent item content; as a result, the relationships hidden among the content features are overlooked. In fact, the reason an item attracts a user may be attributable to only a small set of features, and these features are often semantically coupled. In this paper, we present an optimization model for extracting the relationships hidden in content features by considering user preferences. The learned feature relationship matrix is then applied to cold-start and content-based recommendations. It can also easily be employed for the visualization of feature relation graphs. Our proposed method was examined on three public datasets: 1) hetrec-movielens-2k-v2; 2) book-crossing; and 3) Netflix. The experimental results demonstrate the effectiveness of our method in comparison to state-of-the-art recommendation methods.

18.
IEEE Trans Neural Netw Learn Syst ; 30(11): 3313-3325, 2019 11.
Article in English | MEDLINE | ID: mdl-30703046

ABSTRACT

In this paper, we posit that the mixed selectivity of neurons in the top layer encodes distributed information produced by other neurons, offering a significant computational advantage in recognition accuracy. Accordingly, this paper proposes a hierarchical network framework in which features learned from hundreds of midlayers are combined. First, a subnetwork neuron, which may itself be constructed from other nodes, functions as a subspace feature extractor. The top layer of the hierarchical network needs the subspace features produced by the subnetwork neurons to discard irrelevant factors and, at the same time, to recast the subspace features into a mapping space so that the hierarchical network can generate more reliable cognition. Second, this paper shows that with a noniterative learning strategy, the proposed method has a wider and shallower structure, which plays a significant role in improving generalization performance. Hence, compared with other state-of-the-art methods, multiple channel features combined with the proposed method provide comparable or even better performance while dramatically boosting learning speed. Our experimental results show that our platform provides much better generalization performance than 55 other state-of-the-art methods.


Subject(s)
Algorithms; Deep Learning; Neural Networks, Computer; Pattern Recognition, Visual; Databases, Factual; Humans
19.
IEEE Trans Neural Netw Learn Syst ; 29(11): 5304-5318, 2018 11.
Article in English | MEDLINE | ID: mdl-29994643

ABSTRACT

The tree structure is one of the most powerful structures for data organization. An efficient learning framework for transforming tree-structured data into vectorial representations is presented. First, in attempting to uncover the global discriminative information of child nodes hidden at the same level of all of the trees, a clustering technique can be adopted for allocating children into different clusters, which are used to formulate the components of a vector. Moreover, a locality-sensitive reconstruction method is introduced to model a process, in which each parent node is assumed to be reconstructed by its children. The resulting reconstruction coefficients are reversely transformed into complementary coefficients, which are utilized for locally weighting the components of the vector. A new vector is formulated by concatenating the original parent node vector and the learned vector from its children. This new vector for each parent node is inputted into the learning process of formulating vectorial representation at the upper level of the tree. This recursive process concludes when a vectorial representation is achieved for the entire tree. Our method is examined in two applications: book author recommendations and content-based image retrieval. Extensive experimental results demonstrate the effectiveness of the proposed method for transforming tree-structured data into vectors.

20.
IEEE Trans Med Imaging ; 35(6): 1381-94, 2016 06.
Article in English | MEDLINE | ID: mdl-26841389

ABSTRACT

The finite Gaussian mixture model with kernel correlation is a flexible tool that has recently received attention for point set registration. While many point set registration algorithms have been presented in the literature, an important issue arising from these studies concerns the mapping of data with nonlinear relationships and the ability to select a suitable kernel. Kernel selection is crucial for effective point set registration, and we focus here on multiple-kernel point set registration. We make several contributions in this paper. First, each observation is modeled using the Student's t-distribution, which is heavy-tailed and more robust than the Gaussian distribution. Second, by automatically adjusting the kernel weights, the proposed method allows us to prune ineffective kernels: after parameter learning, the kernel saliencies of the irrelevant kernels go to zero. This makes the choice of kernels less crucial and makes it easy to include other kinds of kernels. Finally, we show empirically that our model outperforms state-of-the-art methods recently proposed in the literature.


Subject(s)
Algorithms; Image Processing, Computer-Assisted/methods; Machine Learning; Models, Theoretical; Animals; Databases, Factual; Four-Dimensional Computed Tomography; Humans; Normal Distribution; Pulmonary Disease, Chronic Obstructive/genetics; Radiography, Thoracic