Results 1 - 20 of 21
1.
IEEE Open J Eng Med Biol ; 5: 345-352, 2024.
Article in English | MEDLINE | ID: mdl-38899018

ABSTRACT

Goal: Auscultation for neonates is a simple and non-invasive method of diagnosing cardiovascular and respiratory disease. However, obtaining high-quality chest sounds containing only heart or lung sounds is non-trivial. Hence, this study introduces a new deep-learning model named NeoSSNet and compares its performance in neonatal chest sound separation against previous methods. Methods: We propose a mask-based architecture similar to Conv-TasNet. The encoder and decoder consist of 1D convolution and 1D transposed convolution, while the mask generator consists of a convolution and transformer architecture. The input chest sounds are first encoded as a sequence of tokens using 1D convolution. The tokens are then passed to the mask generator to generate two masks, one for heart sounds and one for lung sounds. Each mask is then applied to the input token sequence. Lastly, the tokens are converted back to waveforms using 1D transposed convolution. Results: Our proposed model showed superior results compared to the previous methods based on objective distortion measures, with improvements ranging from 2.01 dB to 5.06 dB. The proposed model is also significantly faster than the previous methods, running at least 17 times faster. Conclusions: The proposed model could be a suitable preprocessing step for any health monitoring system where only the heart sound or lung sound is desired.
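The mask-and-apply step described in the Methods can be sketched as follows; the function name, shapes, and random inputs are illustrative assumptions, not the authors' code (which uses a learned 1D-convolution encoder/decoder and a transformer mask generator):

```python
import numpy as np

def separate(tokens, heart_logits, lung_logits):
    """Apply two softmax-normalized masks (heart, lung) to an encoded
    token sequence, as in mask-based separators such as Conv-TasNet.

    tokens: (T, C) encoded chest-sound representation
    *_logits: (T, C) raw mask scores for each source (hypothetical names)
    """
    logits = np.stack([heart_logits, lung_logits])       # (2, T, C)
    logits -= logits.max(axis=0, keepdims=True)          # numerical stability
    masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    return masks[0] * tokens, masks[1] * tokens          # heart, lung tokens

rng = np.random.default_rng(0)
tok = rng.standard_normal((100, 64))
heart, lung = separate(tok, rng.standard_normal((100, 64)),
                       rng.standard_normal((100, 64)))
# Because the two masks sum to one, the separated streams sum to the input.
assert np.allclose(heart + lung, tok)
```

Softmax normalization across the two sources guarantees the masks partition the encoded signal, so the heart and lung streams always sum back to the input tokens.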

2.
Comput Biol Med ; 168: 107775, 2024 01.
Article in English | MEDLINE | ID: mdl-38061154

ABSTRACT

Deep learning MRI reconstruction methods are often based on convolutional neural network (CNN) models; however, they are limited in capturing global correlations among image features due to the intrinsic locality of the convolution operation. Conversely, recent vision transformer (ViT) models are capable of capturing global correlations by applying self-attention operations on image patches. Nevertheless, the existing transformer models for MRI reconstruction rarely leverage the physics of MRI. In this paper, we propose a novel physics-based transformer model, the Multi-branch Cascaded Swin Transformers (McSTRA), for robust MRI reconstruction. McSTRA combines several interconnected MRI physics-related concepts with Swin transformers: it exploits global MRI features via the shifted-window self-attention mechanism; it extracts MRI features belonging to different spectral components via a multi-branch setup; and it iterates between intermediate de-aliasing and data consistency via a cascaded network with intermediate loss computations. Furthermore, we propose a point spread function-guided positional embedding generation mechanism for the Swin transformers, which exploits the spread of the aliasing artifacts for effective reconstruction. With the combination of all these components, McSTRA outperforms state-of-the-art methods while demonstrating robustness in adversarial conditions such as higher accelerations, noisy data, different undersampling protocols, out-of-distribution data, and abnormalities in anatomy.


Subjects
Acceleration, Artifacts, Magnetic Resonance Imaging, Neural Networks (Computer)
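The data-consistency step that McSTRA iterates with de-aliasing can be illustrated with a minimal hard data-consistency update; the function and sampling setup below are assumptions for illustration, not the paper's cascaded network:

```python
import numpy as np

def data_consistency(image, measured_kspace, sampling_mask):
    """Replace the current estimate's k-space values with the measured
    ones wherever data were actually acquired (hard data consistency)."""
    k = np.fft.fft2(image)
    k = np.where(sampling_mask, measured_kspace, k)
    return np.fft.ifft2(k)

rng = np.random.default_rng(1)
truth = rng.standard_normal((32, 32))
mask = rng.random((32, 32)) < 0.3                 # 30% of k-space acquired
measured = np.fft.fft2(truth) * mask
est = data_consistency(rng.standard_normal((32, 32)), measured, mask)
# Sampled k-space locations now match the measurements exactly.
assert np.allclose(np.fft.fft2(est)[mask], measured[mask])
```

In a cascaded reconstruction network, a learned de-aliasing block and a step like this alternate, so intermediate estimates never drift away from the acquired data.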
3.
J Imaging ; 9(12)2023 Nov 23.
Article in English | MEDLINE | ID: mdl-38132677

ABSTRACT

Lifelong learning portrays learning gradually in nonstationary environments and emulates the process of human learning, which is efficient, robust, and able to learn new concepts incrementally from sequential experience. To equip neural networks with such a capability, one needs to overcome the problem of catastrophic forgetting, the phenomenon of forgetting past knowledge while learning new concepts. In this work, we propose a novel knowledge distillation algorithm that makes use of contrastive learning to help a neural network preserve its past knowledge while learning from a series of tasks. Our proposed generalized contrastive distillation strategy tackles catastrophic forgetting of old knowledge, minimizes semantic drift by maintaining a similar embedding space, and ensures compactness in the feature distribution to accommodate novel tasks in the current model. Our comprehensive study shows that our method achieves improved performance in challenging supervised class-incremental, task-incremental, and domain-incremental learning scenarios.
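A minimal sketch of a distillation penalty of the kind described, assuming cosine similarity between frozen-teacher and student embeddings; the paper's full contrastive distillation strategy (with negative pairs) is richer than this:

```python
import numpy as np

def distillation_loss(student_feats, teacher_feats):
    """Penalize semantic drift by keeping the student's embeddings close
    (in cosine similarity) to the frozen teacher's embeddings."""
    s = student_feats / np.linalg.norm(student_feats, axis=1, keepdims=True)
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

rng = np.random.default_rng(2)
old = rng.standard_normal((16, 128))
assert distillation_loss(old, old) < 1e-12   # no drift, no penalty
assert distillation_loss(-old, old) > 1.99   # opposite embeddings, max penalty
```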

4.
Article in English | MEDLINE | ID: mdl-37682650

ABSTRACT

Learning a latent embedding to understand the underlying nature of a data distribution is often formulated in Euclidean spaces with zero curvature. However, the success of geometry constraints posed in the embedding space indicates that curved spaces might encode more structural information, leading to better discriminative power and hence richer representations. In this work, we investigate the benefits of curved spaces for analyzing anomalous, open-set, or out-of-distribution (OOD) objects in data. This is achieved by considering embeddings via three geometry constraints, namely, spherical geometry (with positive curvature), hyperbolic geometry (with negative curvature), or mixed geometry (with both positive and negative curvatures). The three geometric constraints can be chosen interchangeably in a unified design, given the task at hand. Tailored to the embeddings in the curved space, we also formulate functions to compute the anomaly score. Two types of geometric modules (i.e., geometric-in-one (GiO) and geometric-in-two (GiT) models) are proposed to plug into the original Euclidean classifier, and anomaly scores are computed from the curved embeddings. We evaluate the resulting designs under a diverse set of visual recognition scenarios, including image detection (multiclass OOD detection and one-class anomaly detection) and segmentation (multiclass anomaly segmentation and one-class anomaly segmentation). The empirical results show the effectiveness of our proposal through consistent improvement over various scenarios. The code is made available at https://github.com/JHome1/GiO-GiT.
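For the hyperbolic (negative-curvature) case, an anomaly score can be sketched as the geodesic distance to the nearest class prototype in the Poincaré ball; `anomaly_score` and the prototypes are illustrative assumptions, not the paper's exact scoring functions:

```python
import numpy as np

def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance in the Poincare ball model of hyperbolic space
    (negative curvature); inputs must have norm < 1."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq / (denom + eps))

def anomaly_score(embedding, prototypes):
    """Score a sample by its hyperbolic distance to the nearest class
    prototype; a larger distance suggests out-of-distribution."""
    return min(poincare_distance(embedding, p) for p in prototypes)

protos = [np.array([0.1, 0.0]), np.array([0.0, -0.2])]
assert anomaly_score(np.array([0.1, 0.0]), protos) == 0.0
# Points near the ball's boundary are far (in geodesic distance) from
# prototypes near the origin, so they score as more anomalous.
assert anomaly_score(np.array([0.9, 0.0]), protos) > anomaly_score(np.array([0.2, 0.0]), protos)
```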

5.
Neural Netw ; 167: 65-79, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37625243

ABSTRACT

An ultimate objective in continual learning is to preserve knowledge learned in preceding tasks while learning new tasks. To mitigate forgetting of prior knowledge, we propose a novel knowledge distillation technique that takes into account the manifold structure of the latent/output space of a neural network in learning novel tasks. To achieve this, we propose to approximate the data manifold up to first order, hence benefiting from linear subspaces to model the structure and maintain the knowledge of a neural network while learning novel concepts. We demonstrate that modeling with subspaces provides several intriguing properties, including robustness to noise, and is therefore effective for mitigating catastrophic forgetting in continual learning. We also discuss and show how our proposed method can be adopted to address both classification and segmentation problems. Empirically, we observe that our proposed method outperforms various continual learning methods on several challenging datasets, including Pascal VOC and Tiny-ImageNet. Furthermore, we show how the proposed method can be seamlessly combined with existing learning approaches to improve their performance. The code for this article is available at https://github.com/csiro-robotics/SDCL.


Subjects
Knowledge, Learning, Neural Networks (Computer)
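The first-order (subspace) approximation of the feature manifold can be sketched as follows; modeling with top-k left singular subspaces and comparing them through projection matrices is a plausible reading for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def subspace(feats, k):
    """Top-k left singular subspace of a (dim, n) feature matrix: a
    first-order (linear) approximation of the feature manifold."""
    u, _, _ = np.linalg.svd(feats, full_matrices=False)
    return u[:, :k]

def subspace_distillation_loss(old_feats, new_feats, k=3):
    """Compare subspaces via their projection matrices, which are
    invariant to the choice of basis within each subspace."""
    p_old = subspace(old_feats, k) @ subspace(old_feats, k).T
    p_new = subspace(new_feats, k) @ subspace(new_feats, k).T
    return float(np.linalg.norm(p_old - p_new) ** 2)

rng = np.random.default_rng(3)
f = rng.standard_normal((32, 100))
assert subspace_distillation_loss(f, f) < 1e-12
# Rotating samples within the span leaves the subspace, hence the loss, unchanged.
q, _ = np.linalg.qr(rng.standard_normal((100, 100)))
assert subspace_distillation_loss(f, f @ q) < 1e-12
```

The basis invariance shown in the last assertion is one of the robustness properties that make subspaces attractive for preserving knowledge across tasks.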
6.
IEEE Trans Neural Netw Learn Syst ; 34(11): 8630-8641, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35259116

ABSTRACT

Domain adaptation is concerned with the problem of generalizing a classification model to a target domain with little or no labeled data, by leveraging the abundant labeled data from a related source domain. The source and target domains possess different joint probability distributions, making it challenging for model generalization. In this article, we introduce domain neural adaptation (DNA): an approach that exploits a nonlinear deep neural network to 1) match the source and target joint distributions in the network activation space and 2) learn the classifier in an end-to-end manner. Specifically, we employ the relative chi-square divergence to compare the two joint distributions, and show that the divergence can be estimated via seeking the maximal value of a quadratic functional over the reproducing kernel Hilbert space. The analytic solution to this maximization problem enables us to explicitly express the divergence estimate as a function of the neural network mapping. We optimize the network parameters to minimize the estimated joint distribution divergence and the classification loss, yielding a classification model that generalizes well to the target domain. Empirical results on several visual datasets demonstrate that our solution is statistically better than its competitors.

7.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 1545-1562, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35380955

ABSTRACT

Meta-learning methods are shown to be effective in quickly adapting a model to novel tasks. Most existing meta-learning methods represent data and carry out fast adaptation in Euclidean space. In fact, data in real-world applications usually reside on complex and diverse Riemannian manifolds. In this paper, we propose a curvature-adaptive meta-learning method that achieves fast adaptation to manifold data by producing suitable curvature. Specifically, we represent data in the product manifold of multiple constant-curvature spaces and build a product manifold neural network as the base-learner. In this way, our method is capable of encoding complex manifold data into discriminative and generic representations. Then, we introduce curvature generation and curvature updating schemes, through which suitable product manifolds for various forms of data manifolds are constructed via a few optimization steps. The curvature generation scheme identifies task-specific curvature initialization, leading to a shorter optimization trajectory. The curvature updating scheme automatically produces an appropriate learning rate and search direction for curvature, yielding a faster and more adaptive optimization paradigm compared to hand-designed optimization schemes. We evaluate our method on a broad set of problems including few-shot classification, few-shot regression, and reinforcement learning tasks. Experimental results show that our method achieves substantial improvements over meta-learning methods that ignore the geometry of the underlying space.

8.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 5935-5952, 2023 May.
Article in English | MEDLINE | ID: mdl-36260581

ABSTRACT

Many learning tasks are modeled as optimization problems with nonlinear constraints, such as principal component analysis and fitting a Gaussian mixture model. A popular way to solve such problems is resorting to Riemannian optimization algorithms, which yet heavily rely on both human involvement and expert knowledge about Riemannian manifolds. In this paper, we propose a Riemannian meta-optimization method to automatically learn a Riemannian optimizer. We parameterize the Riemannian optimizer by a novel recurrent network and utilize Riemannian operations to ensure that our method is faithful to the geometry of manifolds. The proposed method explores the distribution of the underlying data by minimizing the objective of updated parameters, and hence is capable of learning task-specific optimizations. We introduce a Riemannian implicit differentiation training scheme to achieve efficient training in terms of numerical stability and computational cost. Unlike conventional meta-optimization training schemes that need to differentiate through the whole optimization trajectory, our training scheme is only related to the final two optimization steps. In this way, our training scheme avoids the exploding gradient problem, and significantly reduces the computational load and memory footprint. We discuss experimental results across various constrained problems, including principal component analysis on Grassmann manifolds, face recognition, person re-identification, and texture image classification on Stiefel manifolds, clustering and similarity learning on symmetric positive definite manifolds, and few-shot learning on hyperbolic manifolds.
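A generic, hand-designed Riemannian gradient step of the kind this work aims to replace with a learned optimizer can be sketched on the Stiefel manifold; the tangent-space projection and QR retraction below are standard operations, while the paper's contribution is learning the update rule itself:

```python
import numpy as np

def stiefel_step(x, egrad, lr):
    """One Riemannian gradient step on the Stiefel manifold of matrices
    with orthonormal columns: project the Euclidean gradient onto the
    tangent space, then retract back to the manifold via QR."""
    sym = (x.T @ egrad + egrad.T @ x) / 2
    rgrad = egrad - x @ sym                  # tangent-space projection
    q, r = np.linalg.qr(x - lr * rgrad)      # QR retraction
    return q * np.sign(np.diag(r))           # fix column signs

rng = np.random.default_rng(4)
x0, _ = np.linalg.qr(rng.standard_normal((10, 3)))
x1 = stiefel_step(x0, rng.standard_normal((10, 3)), lr=0.1)
# The iterate stays on the manifold: columns remain orthonormal.
assert np.allclose(x1.T @ x1, np.eye(3))
```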

9.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 4626-4641, 2022 09.
Article in English | MEDLINE | ID: mdl-33856981

ABSTRACT

This paper generalizes the Attention in Attention (AiA) mechanism proposed in P. Fang et al., 2019, by employing explicit mapping in reproducing kernel Hilbert spaces to generate attention values of the input feature map. The AiA mechanism models the capacity of building inter-dependencies among the local and global features by the interaction of inner and outer attention modules. Besides a vanilla AiA module, termed linear attention with AiA, two non-linear counterparts, namely, second-order polynomial attention and Gaussian attention, are also proposed to explicitly utilize the non-linear properties of the input features, via the second-order polynomial kernel and Gaussian kernel approximation. The deep convolutional neural network, equipped with the proposed AiA blocks, is referred to as the Attention in Attention Network (AiA-Net). The AiA-Net learns to extract a discriminative pedestrian representation, which combines complementary person appearance and corresponding part features. Extensive ablation studies verify the effectiveness of the AiA mechanism and the use of non-linear features hidden in the feature map for attention design. Furthermore, our approach outperforms the current state of the art by a considerable margin across a number of benchmarks. In addition, state-of-the-art performance is also achieved in the video person retrieval task with the assistance of the proposed AiA blocks.


Subjects
Algorithms, Pedestrians, Humans, Neural Networks (Computer)
10.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5088-5102, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33856984

ABSTRACT

Representations in the form of Symmetric Positive Definite (SPD) matrices have been popularized in a variety of visual learning applications due to their demonstrated ability to capture rich second-order statistics of visual data. There exist several similarity measures for comparing SPD matrices with documented benefits. However, selecting an appropriate measure for a given problem remains a challenge and in most cases, is the result of a trial-and-error process. In this paper, we propose to learn similarity measures in a data-driven manner. To this end, we capitalize on the αβ-log-det divergence, which is a meta-divergence parametrized by scalars α and β, subsuming a wide family of popular information divergences on SPD matrices for distinct and discrete values of these parameters. Our key idea is to cast these parameters in a continuum and learn them from data. We systematically extend this idea to learn vector-valued parameters, thereby increasing the expressiveness of the underlying non-linear measure. We conjoin the divergence learning problem with several standard tasks in machine learning, including supervised discriminative dictionary learning and unsupervised SPD matrix clustering. We present Riemannian gradient descent schemes for optimizing our formulations efficiently, and show the usefulness of our method on eight standard computer vision tasks.
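The αβ-log-det divergence that this work parametrizes and learns can be computed from the generalized eigenvalues of the matrix pair; this sketch assumes scalar α, β > 0 (the paper also learns vector-valued parameters):

```python
import numpy as np

def ab_logdet_divergence(p, q, alpha=0.5, beta=0.5):
    """Alpha-beta log-det divergence between SPD matrices (Cichocki et
    al.), written for alpha, beta > 0, via the eigenvalues of q^-1 p
    (which equal those of p q^-1)."""
    lam = np.linalg.eigvals(np.linalg.solve(q, p)).real
    terms = np.log((alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta))
    return float(terms.sum() / (alpha * beta))

rng = np.random.default_rng(5)
a = rng.standard_normal((4, 4)); spd1 = a @ a.T + 4 * np.eye(4)
b = rng.standard_normal((4, 4)); spd2 = b @ b.T + 4 * np.eye(4)
assert abs(ab_logdet_divergence(spd1, spd1)) < 1e-9   # zero at identity
assert ab_logdet_divergence(spd1, spd2) > 0.0          # positive otherwise
```

Sweeping (α, β) recovers well-known special cases (e.g., values near α = β = 0.5 relate to the symmetric Stein divergence), which is what makes learning these parameters from data attractive.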

11.
IEEE Trans Pattern Anal Mach Intell ; 43(8): 2866-2873, 2021 08.
Article in English | MEDLINE | ID: mdl-33351750

ABSTRACT

The advances made in predicting visual saliency using deep neural networks come at the expense of collecting large-scale annotated data. However, pixel-wise annotation is labor-intensive and overwhelming. In this paper, we propose to learn saliency prediction from a single noisy labelling, which is easy to obtain (e.g., from imperfect human annotation or from unsupervised saliency prediction methods). With this goal, we address a natural question: Can we learn saliency prediction while identifying clean labels in a unified framework? To answer this question, we call on the theory of robust model fitting and formulate deep saliency prediction from a single noisy labelling as robust network learning and exploit model consistency across iterations to identify inliers and outliers (i.e., noisy labels). Extensive experiments on different benchmark datasets demonstrate the superiority of our proposed framework, which can learn comparable saliency prediction with state-of-the-art fully supervised saliency methods. Furthermore, we show that simply by treating ground truth annotations as noisy labelling, our framework achieves tangible improvements over state-of-the-art methods.

12.
IEEE Trans Neural Netw Learn Syst ; 32(12): 5708-5722, 2021 Dec.
Article in English | MEDLINE | ID: mdl-33055040

ABSTRACT

An intrinsic problem in domain adaptation is the joint distribution mismatch between the source and target domains. Therefore, it is crucial to match the two joint distributions such that the source domain knowledge can be properly transferred to the target domain. Unfortunately, in semi-supervised domain adaptation (SSDA) this problem still remains unsolved. In this article, we therefore present an asymmetric joint distribution matching (AJDM) approach, which seeks a couple of asymmetric matrices to linearly match the source and target joint distributions under the relative chi-square divergence. Specifically, we introduce a least square method to estimate the divergence, which is free from estimating the two joint distributions. Furthermore, we show that our AJDM approach can be generalized to a kernel version, enabling it to handle nonlinearity in the data. From the perspective of Riemannian geometry, learning the linear and nonlinear mappings are both formulated as optimization problems defined on the product of Riemannian manifolds. Numerical experiments on synthetic and real-world data sets demonstrate the effectiveness of the proposed approach and testify its superiority over existing SSDA techniques.

13.
Article in English | MEDLINE | ID: mdl-32755860

ABSTRACT

Domain adaptation addresses the learning problem where the training data are sampled from a source joint distribution (source domain), while the test data are sampled from a different target joint distribution (target domain). Because of this joint distribution mismatch, a discriminative classifier naively trained on the source domain often generalizes poorly to the target domain. In this paper, we therefore present a Joint Distribution Invariant Projections (JDIP) approach to solve this problem. The proposed approach exploits linear projections to directly match the source and target joint distributions under the L2-distance. Since the traditional kernel density estimators for distribution estimation tend to be less reliable as the dimensionality increases, we propose a least square method to estimate the L2-distance without the need to estimate the two joint distributions, leading to a quadratic problem with analytic solution. Furthermore, we introduce a kernel version of JDIP to account for inherent nonlinearity in the data. We show that the proposed learning problems can be naturally cast as optimization problems defined on the product of Riemannian manifolds. To be comprehensive, we also establish an error bound, theoretically explaining how our method works and contributes to reducing the target domain generalization error. Extensive empirical evidence demonstrates the benefits of our approach over state-of-the-art domain adaptation methods on several visual data sets.

14.
IEEE Trans Neural Netw Learn Syst ; 31(9): 3230-3244, 2020 Sep.
Article in English | MEDLINE | ID: mdl-31567102

ABSTRACT

The symmetric positive definite (SPD) matrices, forming a Riemannian manifold, are commonly used as visual representations. The non-Euclidean geometry of the manifold often makes developing learning algorithms (e.g., classifiers) difficult and complicated. The concept of similarity-based learning has been shown to be effective to address various problems on SPD manifolds. This is mainly because the similarity-based algorithms are agnostic to the geometry and purely work based on the notion of similarities/distances. However, existing similarity-based models on SPD manifolds opt for holistic representations, ignoring characteristics of information captured by SPD matrices. To circumvent this limitation, we propose a novel SPD distance measure for the similarity-based algorithm. Specifically, we introduce the concept of point-to-set transformation, which enables us to learn multiple lower dimensional and discriminative SPD manifolds from a higher dimensional one. For lower dimensional SPD manifolds obtained by the point-to-set transformation, we propose a tailored set-to-set distance measure by making use of the family of alpha-beta divergences. We further propose to learn the point-to-set transformation and the set-to-set distance measure jointly, yielding a powerful similarity-based algorithm on SPD manifolds. Our thorough evaluations on several visual recognition tasks (e.g., action classification and face recognition) suggest that our algorithm comfortably outperforms various state-of-the-art algorithms.

15.
IEEE Trans Image Process ; 28(4): 1773-1782, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30369442

ABSTRACT

In this paper, we propose the novel principal backpropagation networks (PBNets) to revisit the backpropagation algorithms commonly used in training two-stream networks for video action recognition. We contend that existing approaches always take all the frames/snippets for backpropagation, which is not optimal for video recognition since the desired actions only occur in a short period within a video. To remedy these drawbacks, we design a watch-and-choose mechanism. In particular, the watching stage exploits a dense snippet-wise temporal pooling strategy to discover the global characteristic of each input video, while the choosing phase only backpropagates a small number of representative snippets that are selected with two novel strategies, i.e., the Max-rule and the KL-rule. We prove that with the proposed selection strategies, performing backpropagation on the selected subset is capable of decreasing the loss over the whole set of snippets as well. The proposed PBNets are evaluated on two standard video action recognition benchmarks, UCF101 and HMDB51, where they surpass the state of the art consistently while requiring less memory and computation to achieve high performance.
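A Max-rule-style selection can be sketched as keeping the snippets whose maximum class score is largest; this is an illustrative reading of the rule, and the exact Max-rule and KL-rule criteria are detailed in the paper:

```python
import numpy as np

def max_rule_select(snippet_scores, k):
    """Pick the k snippets with the largest maximum class score; only
    these would be backpropagated in the choosing phase."""
    strength = snippet_scores.max(axis=1)          # (num_snippets,)
    return np.sort(np.argsort(strength)[-k:])

scores = np.array([[0.1, 0.2],    # snippet 0
                   [0.9, 0.1],    # snippet 1: strong evidence
                   [0.3, 0.8],    # snippet 2: strong evidence
                   [0.2, 0.2]])   # snippet 3
assert list(max_rule_select(scores, 2)) == [1, 2]
```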

16.
IEEE Trans Neural Netw Learn Syst ; 29(9): 4339-4346, 2018 09.
Article in English | MEDLINE | ID: mdl-29990173

ABSTRACT

Despite its attractive properties, the performance of the recently introduced Keep It Simple and Straightforward MEtric learning (KISSME) method is greatly dependent on principal component analysis as a preprocessing step. This dependence can lead to difficulties, e.g., when the dimensionality is not meticulously set. To address this issue, we devise a unified formulation for joint dimensionality reduction and metric learning based on the KISSME algorithm. Our joint formulation is expressed as an optimization problem on the Grassmann manifold, and hence enjoys the properties of Riemannian optimization techniques. Following the success of deep learning in recent years, we also devise end-to-end learning of a generic deep network for metric learning using our derivation.
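The KISSME base rule that this work builds on can be sketched as follows; the PSD clipping step is a common practical addition, and the paper's actual contribution, the joint dimensionality reduction formulated on the Grassmann manifold, is not shown:

```python
import numpy as np

def kissme_metric(diff_similar, diff_dissimilar):
    """KISSME Mahalanobis-like matrix M = inv(cov_S) - inv(cov_D),
    learned from pairwise difference vectors of similar (S) and
    dissimilar (D) pairs, clipped to be positive semi-definite."""
    m = (np.linalg.inv(np.cov(diff_similar, rowvar=False))
         - np.linalg.inv(np.cov(diff_dissimilar, rowvar=False)))
    w, v = np.linalg.eigh((m + m.T) / 2)
    return v @ np.diag(np.clip(w, 0, None)) @ v.T   # PSD projection

rng = np.random.default_rng(6)
d_sim = 0.3 * rng.standard_normal((500, 5))     # similar pairs differ little
d_dis = 2.0 * rng.standard_normal((500, 5))     # dissimilar pairs differ a lot
m = kissme_metric(d_sim, d_dis)
assert np.allclose(m, m.T)                       # symmetric
assert np.linalg.eigvalsh(m).min() >= 0          # valid (PSD) metric
```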

17.
IEEE Trans Neural Netw Learn Syst ; 29(11): 5701-5712, 2018 11.
Article in English | MEDLINE | ID: mdl-29994290

ABSTRACT

Core to many learning pipelines is visual recognition such as image and video classification. In such applications, having a compact yet rich and informative representation plays a pivotal role. An underlying assumption in traditional coding schemes [e.g., sparse coding (SC)] is that the data geometrically comply with the Euclidean space. In other words, the data are presented to the algorithm in vector form and Euclidean axioms are fulfilled. This is of course restrictive in machine learning, computer vision, and signal processing, as shown by a large number of recent studies. This paper takes a further step and provides a comprehensive mathematical framework to perform coding in curved and non-Euclidean spaces, i.e., Riemannian manifolds. To this end, we start with the simplest form of coding, namely, bag of words. Then, inspired by the success of the vector of locally aggregated descriptors in addressing computer vision problems, we introduce its Riemannian extensions. Finally, we study the Riemannian form of SC, locality-constrained linear coding, and collaborative coding. Through rigorous tests, we demonstrate the superior performance of our Riemannian coding schemes against the state-of-the-art methods on several visual classification tasks, including head pose classification, video-based face recognition, and dynamic scene recognition.

18.
IEEE Trans Pattern Anal Mach Intell ; 40(1): 48-62, 2018 01.
Article in English | MEDLINE | ID: mdl-28103548

ABSTRACT

Representing images and videos with Symmetric Positive Definite (SPD) matrices, and considering the Riemannian geometry of the resulting space, has been shown to yield high discriminative power in many visual recognition tasks. Unfortunately, computation on the Riemannian manifold of SPD matrices, especially high-dimensional ones, comes at a high cost that limits the applicability of existing techniques. In this paper, we introduce algorithms able to handle high-dimensional SPD matrices by constructing a lower-dimensional SPD manifold. To this end, we propose to model the mapping from the high-dimensional SPD manifold to the low-dimensional one with an orthonormal projection. This lets us formulate dimensionality reduction as the problem of finding a projection that yields a low-dimensional manifold either with maximum discriminative power in the supervised scenario, or with maximum variance of the data in the unsupervised one. We show that learning can be expressed as an optimization problem on a Grassmann manifold and discuss fast solutions for special cases. Our evaluation on several classification tasks evidences that our approach leads to a significant accuracy gain over state-of-the-art methods.
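The key property of the orthonormal projection described above is that it maps a high-dimensional SPD matrix to a smaller matrix that is still SPD; here `w` is a random orthonormal matrix purely for illustration, whereas the paper learns it on a Grassmann manifold:

```python
import numpy as np

def reduce_spd(x, w):
    """Map a high-dimensional SPD matrix x to a lower-dimensional SPD
    matrix via an orthonormal projection w (w.T @ w = I): w.T @ x @ w."""
    return w.T @ x @ w

rng = np.random.default_rng(7)
a = rng.standard_normal((20, 20))
x_high = a @ a.T + 1e-3 * np.eye(20)               # SPD, 20 x 20
w, _ = np.linalg.qr(rng.standard_normal((20, 5)))  # orthonormal 20 x 5
x_low = reduce_spd(x_high, w)
assert x_low.shape == (5, 5)
assert np.allclose(x_low, x_low.T)                 # symmetric
assert np.linalg.eigvalsh(x_low).min() > 0         # still positive definite
```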

19.
IEEE Trans Neural Netw Learn Syst ; 27(6): 1294-306, 2016 06.
Article in English | MEDLINE | ID: mdl-25643414

ABSTRACT

This paper introduces sparse coding and dictionary learning for symmetric positive definite (SPD) matrices, which are often used in machine learning, computer vision, and related areas. Unlike traditional sparse coding schemes that work in vector spaces, in this paper, we discuss how SPD matrices can be described by sparse combinations of dictionary atoms, where the atoms are also SPD matrices. We propose to seek sparse coding by embedding the space of SPD matrices into Hilbert spaces through two types of Bregman matrix divergences. This not only leads to an efficient way of performing sparse coding but also to an online and iterative scheme for dictionary learning. We apply the proposed methods to several computer vision tasks where images are represented by region covariance matrices. Our proposed algorithms outperform state-of-the-art methods on a wide range of classification tasks, including face recognition, action recognition, material classification, and texture categorization.

20.
IEEE Trans Pattern Anal Mach Intell ; 37(12): 2464-77, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26539851

ABSTRACT

In this paper, we develop an approach to exploiting kernel methods with manifold-valued data. In many computer vision problems, the data can be naturally represented as points on a Riemannian manifold. Due to the non-Euclidean geometry of Riemannian manifolds, usual Euclidean computer vision and machine learning algorithms yield inferior results on such data. In this paper, we define Gaussian radial basis function (RBF)-based positive definite kernels on manifolds that permit us to embed a given manifold with a corresponding metric in a high dimensional reproducing kernel Hilbert space. These kernels make it possible to utilize algorithms developed for linear spaces on nonlinear manifold-valued data. Since the Gaussian RBF defined with any given metric is not always positive definite, we present a unified framework for analyzing the positive definiteness of the Gaussian RBF on a generic metric space. We then use the proposed framework to identify positive definite kernels on two specific manifolds commonly encountered in computer vision: the Riemannian manifold of symmetric positive definite matrices and the Grassmann manifold, i.e., the Riemannian manifold of linear subspaces of a Euclidean space. We show that many popular algorithms designed for Euclidean spaces, such as support vector machines, discriminant analysis and principal component analysis can be generalized to Riemannian manifolds with the help of such positive definite Gaussian kernels.
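One of the positive definite kernels identified in this line of work, the Gaussian RBF built on the log-Euclidean metric for SPD matrices, can be sketched directly:

```python
import numpy as np

def spd_logm(x):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(x)
    return v @ np.diag(np.log(w)) @ v.T

def log_euclidean_gaussian_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel on the SPD manifold using the log-Euclidean
    metric d(X, Y) = ||logm(X) - logm(Y)||_F, which yields a positive
    definite kernel for all sigma."""
    d2 = np.linalg.norm(spd_logm(x) - spd_logm(y)) ** 2
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(8)
a = rng.standard_normal((4, 4)); x = a @ a.T + np.eye(4)
b = rng.standard_normal((4, 4)); y = b @ b.T + np.eye(4)
assert np.isclose(log_euclidean_gaussian_kernel(x, x), 1.0)
k = log_euclidean_gaussian_kernel(x, y)
assert 0 < k < 1 and np.isclose(k, log_euclidean_gaussian_kernel(y, x))
```

With such a kernel in hand, any kernel machine designed for vector data (SVMs, kernel PCA, kernel discriminant analysis) can be applied to SPD-valued data unchanged.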
