Results 1 - 12 of 12
1.
Sensors (Basel) ; 23(11)2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37300065

ABSTRACT

Image super-resolution (SR) methods are usually trained on low-resolution images synthesized with a predefined degradation model. Existing SR methods inevitably perform poorly when the true degradation does not follow the predefined model, especially on real-world images. To tackle this robustness issue, we propose a cascaded degradation-aware blind super-resolution network (CDASRN), which not only eliminates the influence of noise on blur kernel estimation but also estimates spatially varying blur kernels. With the addition of contrastive learning, CDASRN can further distinguish the differences between local blur kernels, greatly improving its practicality. Experiments in various settings show that CDASRN outperforms state-of-the-art methods on both heavily degraded synthetic datasets and real-world datasets.
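The abstract does not specify CDASRN's contrastive objective; a common choice for separating local blur-kernel embeddings is an InfoNCE-style loss, sketched below with NumPy. The function name, embedding size, and temperature are illustrative assumptions, not details from the paper.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE: pull the anchor toward its positive kernel embedding
    and push it away from the negative embeddings."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Positive similarity sits at index 0 of the logit vector.
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

rng = np.random.default_rng(0)
a = rng.normal(size=8)
# Identical positive vs. an adversarial positive: the loss should be
# much smaller in the first case.
loss_easy = info_nce(a, a, [rng.normal(size=8) for _ in range(4)])
loss_hard = info_nce(a, -a, [a.copy() for _ in range(4)])
```

A lower loss for well-separated kernel embeddings is what drives the network to distinguish local blur kernels.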

2.
IEEE Trans Image Process ; 33: 1838-1852, 2024.
Article in English | MEDLINE | ID: mdl-38451755

ABSTRACT

Weakly supervised point cloud semantic segmentation methods, which require 1% or fewer labels while aiming to match the performance of fully supervised approaches, have recently attracted extensive research attention. A typical solution in this framework is to use self-training or pseudo-labeling to mine supervision from the point cloud itself while ignoring the critical information available from images. In fact, cameras widely coexist with LiDAR sensors, and this complementary information is highly valuable for 3D applications. In this paper, we propose a novel cross-modality weakly supervised method for 3D segmentation that incorporates complementary information from unlabeled images. We design a dual-branch network equipped with an active labeling strategy to maximize the value of the tiny fraction of labels and to directly realize 2D-to-3D knowledge transfer. Afterward, we establish a cross-modal self-training framework that iterates between parameter updating and pseudo-label estimation. In the training phase, we propose cross-modal association learning to mine complementary supervision from images by reinforcing the cycle consistency between 3D points and 2D superpixels. In the pseudo-label estimation phase, a pseudo-label self-rectification mechanism filters noisy labels, providing more accurate labels for the networks to be fully trained. Extensive experimental results demonstrate that our method even outperforms state-of-the-art fully supervised competitors with less than 1% actively selected annotations.
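As a minimal sketch of the 2D-to-3D transfer idea, the snippet below assigns each projected 3D point the label of the 2D superpixel it falls into. The label map, point coordinates, and the assumption that a LiDAR-to-camera projection has already been applied are hypothetical stand-ins, not the paper's pipeline.

```python
import numpy as np

# Hypothetical 4x4 superpixel label map (each number is a superpixel id).
superpixel_labels = np.array([[0, 0, 1, 1],
                              [0, 0, 1, 1],
                              [2, 2, 3, 3],
                              [2, 2, 3, 3]])

def transfer_labels(pixel_uv, label_map):
    """Give each projected 3D point the label of the 2D superpixel it
    falls into; points projecting outside the image get -1 (ignored)."""
    h, w = label_map.shape
    out = np.full(len(pixel_uv), -1, dtype=int)
    for i, (u, v) in enumerate(pixel_uv):
        if 0 <= v < h and 0 <= u < w:
            out[i] = label_map[v, u]
    return out

# (u, v) pixel coordinates of four 3D points after projection.
points_uv = [(0, 0), (3, 3), (2, 0), (9, 9)]
labels = transfer_labels(points_uv, superpixel_labels)
```

Enforcing cycle consistency between such point-superpixel associations in both directions is what the cross-modal association learning builds on.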

3.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 4551-4566, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38133979

ABSTRACT

The Information Bottleneck (IB) provides an information-theoretic principle for multi-view learning by revealing the various components contained in each viewpoint. This highlights the necessity of capturing their distinct roles to achieve view-invariant and predictive representations, which remains under-explored due to the technical intractability of modeling and organizing innumerable mutual information (MI) terms. Recent studies show that sufficiency and consistency play such key roles in multi-view representation learning and can be preserved via a variational distillation framework. However, when generalized to arbitrary viewpoints, this strategy fails as the consistency-related mutual information terms become intractable. This paper presents Multi-View Variational Distillation (MV2D), tackling the above limitations for generalized multi-view learning. Uniquely, MV2D can recognize useful consistent information and prioritize diverse components by their generalization ability. This guides an analytical and scalable solution to achieving both sufficiency and consistency. Additionally, by rigorously reformulating the IB objective, MV2D tackles the difficulties in MI optimization and fully realizes the theoretical advantages of the information bottleneck principle. We extensively evaluate our model on diverse tasks to verify its effectiveness, where the considerable gains provide key insights into achieving generalized multi-view representations under a rigorous information-theoretic principle.
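For reference, the classical IB objective that this line of work reformulates can be written as follows, where Z is the learned representation, X the input view, Y the prediction target, and β trades compression against prediction; the paper's exact multi-view objective is not given in the abstract.

```latex
\max_{p(z \mid x)} \; I(Z;Y) \;-\; \beta \, I(Z;X)
```

Maximizing I(Z;Y) keeps Z predictive, while penalizing I(Z;X) squeezes out view-specific nuisance information.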

4.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4826-4842, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35914039

ABSTRACT

Deep learning has made unprecedented progress in image restoration (IR), where the residual block (RB) is widely used and contributes significantly to the strong performance. However, massively stacked RBs incur burdensome memory and computation costs. To tackle this issue, we aim to design an economical structure for adaptively connecting pair-wise RBs, thereby enhancing the model representation. Inspired by the topological structure of the lattice filter in signal processing theory, we propose the lattice block (LB), in which coupled butterfly-style topological structures are used to bridge pair-wise RBs. Specifically, each candidate structure of the LB relies on combination coefficients learned through adaptive channel reweighting. As a basic mapping block, the LB can be plugged into various IR models, such as image super-resolution, image denoising, and image deraining. It enables the construction of lightweight IR models with roughly half the parameters while retaining reconstruction accuracy comparable to that of RBs. Moreover, a novel contrastive loss is exploited as a regularization constraint, which further enhances the model representation without increasing inference expense. Experiments on several IR tasks illustrate that our method achieves more favorable performance than other state-of-the-art models with lower storage and computation.
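A toy sketch of the butterfly-style bridging idea: two residual branches are cross-combined through learned coefficients. Scalar coefficients stand in for the adaptive per-channel reweighting, and the stand-in residual block is trivial; none of these details come from the paper.

```python
import numpy as np

def residual_block(x, w):
    # Toy residual block: a weighted nonlinearity plus a skip connection.
    return x + np.tanh(w * x)

def lattice_block(x, a1, b1, a2, b2):
    """Butterfly combination of two residual branches: each branch feeds
    the other, weighted by learned combination coefficients."""
    p = x + a1 * residual_block(x, 0.5)
    q = x + b1 * residual_block(x, -0.5)
    upper = p + a2 * residual_block(q, 0.5)
    lower = q + b2 * residual_block(p, -0.5)
    return 0.5 * (upper + lower)

x = np.linspace(-1.0, 1.0, 5)
y = lattice_block(x, 0.3, 0.3, 0.3, 0.3)
```

With all coefficients at zero, the block reduces to the identity mapping, which makes the combination coefficients a learnable interpolation between "pass through" and "fully bridged" topologies.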

5.
Article in English | MEDLINE | ID: mdl-37874733

ABSTRACT

Recently, with the development of intelligent manufacturing, the demand for surface defect inspection has been increasing. Deep learning has achieved promising results in defect inspection; however, due to the rarity of defect data and the difficulty of pixelwise annotation, existing fully supervised defect inspection methods are hard to apply in practice. To solve the problem of defect segmentation with few labeled data, we propose a simple and efficient method for semisupervised defect segmentation (SSDS), named perturbed progressive learning (PPL). On the one hand, PPL decouples the predictions of the student and teacher networks and alleviates overfitting on noisy pseudo-labels. On the other hand, PPL encourages consistency across various perturbations in a broader stagewise scope, alleviating the drift caused by noisy pseudo-labels. Specifically, PPL contains two training stages. In the first stage, the teacher network assigns pseudo-labels to the unlabeled data, which are divided into easy and hard groups. The labeled data and the perturbed unlabeled data in the easy group are used together to train a better-performing student network. In the second stage, the unlabeled data in the hard group are predicted by the obtained student network, enlarging the set of refined pseudo-labeled data. All the pseudo-labeled and labeled data, with their perturbations, are used to retrain the student network, progressively improving the defect feature representation. We build a mobile screen defect dataset (MSDD-3) with three classes of defects. PPL is evaluated on MSDD-3 as well as other public datasets. Extensive experimental results demonstrate that PPL significantly surpasses state-of-the-art methods across all evaluation partition protocols.
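The stage-one easy/hard split can be illustrated by thresholding teacher confidence: high-confidence predictions become easy-group pseudo-labels, the rest wait for stage two. The threshold value and array shapes below are hypothetical, not taken from the paper.

```python
import numpy as np

def split_pseudo_labels(probs, tau=0.8):
    """Split teacher predictions into easy (confidence >= tau) and hard
    groups; tau is a hypothetical threshold, not a value from the paper."""
    conf = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    easy = conf >= tau
    return labels, easy

# Toy teacher softmax outputs for four pixels (background vs. defect).
teacher_probs = np.array([[0.95, 0.05],
                          [0.55, 0.45],
                          [0.10, 0.90],
                          [0.60, 0.40]])
labels, easy_mask = split_pseudo_labels(teacher_probs)
```

Only the easy-group labels join the first training round; the hard group is re-predicted by the improved student in stage two.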

6.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15328-15344, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37751346

ABSTRACT

Hidden features in neural networks usually fail to learn informative representations for 3D segmentation, as supervision is given only on the output prediction; this can be addressed by omni-scale supervision on intermediate layers. In this paper, we bring the first omni-scale supervision method to 3D segmentation via the proposed gradual Receptive Field Component Reasoning (RFCR), where target Receptive Field Component Codes (RFCCs) are designed to record the categories within the receptive fields of hidden units in the encoder. The target RFCCs then supervise the decoder to gradually infer the RFCCs in a coarse-to-fine category reasoning manner and finally obtain the semantic labels. To obtain more supervision, we also propose an RFCR-NL model with complementary negative codes (i.e., negative RFCCs, NRFCCs) trained with negative learning. Because many hidden features are inactive, with tiny magnitudes that contribute little to RFCC prediction, we propose feature densification with a centrifugal potential to obtain more unambiguous features, which is in effect equivalent to entropy regularization over the features. More active features unleash the potential of the omni-supervision method. We embed our method into three prevailing backbones, which are significantly improved on three datasets in both fully and weakly supervised segmentation tasks and achieve competitive performance.
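A target RFCC can be read as a multi-hot code over the categories present inside a hidden unit's receptive field. A minimal sketch, where the point labels, receptive-field membership, and class count are illustrative assumptions:

```python
import numpy as np

def rfcc(point_labels, receptive_field, num_classes):
    """Target Receptive Field Component Code: a multi-hot vector marking
    every category among the points a hidden unit can see."""
    code = np.zeros(num_classes, dtype=int)
    for idx in receptive_field:
        code[point_labels[idx]] = 1
    return code

# Six labeled points; one encoder unit sees points 1, 2, and 4.
labels = np.array([0, 0, 2, 1, 2, 2])
code = rfcc(labels, receptive_field=[1, 2, 4], num_classes=3)
```

Coarse layers with wide receptive fields produce dense codes, while fine layers approach one-hot semantic labels, which is what makes the coarse-to-fine reasoning possible.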

7.
Article in English | MEDLINE | ID: mdl-36074884

ABSTRACT

Multiview clustering via binary representation has attracted intensive attention due to its effectiveness in handling large-scale multiview data. However, such clustering approaches usually ignore a very important potential high-order correlation in discrete representation learning. In this article, we propose a novel all-in collaborative multiview binary representation for clustering (AC-MVBC) framework, in which the multiview collaborative binary representation and the clustering structure are learned jointly. Specifically, using a new type of tensor low-rank constraint, high-order collaborations, i.e., cross-view and inner-view collaborations, can be effectively captured in our model. Moreover, by incorporating the Bregman discrepancy, projective consistency among different views can be guaranteed, yielding a more powerful binary representation. An efficient optimization algorithm is also proposed to solve the objective function, with empirically fast convergence. Experimental results on several challenging datasets demonstrate that the proposed method achieves highly competitive performance compared with state-of-the-art multiview clustering (MVC) methods while maintaining low computational and memory requirements.

8.
IEEE Trans Neural Netw Learn Syst ; 32(2): 868-881, 2021 Feb.
Article in English | MEDLINE | ID: mdl-32287010

ABSTRACT

In this article, we propose a multiview self-representation model for nonlinear subspace clustering. By assuming that heterogeneous features lie within the union of multiple linear subspaces, recent multiview subspace learning methods aim to capture complementary and consensus information across views to boost performance. In real-world applications, however, data features usually reside in multiple nonlinear subspaces, leading to undesirable results. To this end, we propose a kernelized version of tensor-based multiview subspace clustering, referred to as Kt-SVD-MSC, to jointly learn self-representation coefficients in mapped high-dimensional spaces and the correlations among multiple views in a unified tensor space. In each view-specific feature space, a kernel-induced mapping is introduced to ensure the separability of the self-representation coefficients. In the unified tensor space, a new kind of tensor low-rank regularizer is imposed on the rotated self-representation coefficient tensor to preserve global consistency across different views. We also derive an algorithm to efficiently solve the optimization problem, with all subproblems admitting closed-form solutions. Furthermore, by incorporating nonnegativity and sparsity constraints, the proposed method can easily be extended to several useful variants. The proposed method is evaluated on eight challenging datasets, where a significant advance over state-of-the-art multiview clustering is achieved.
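Tensor low-rank regularizers in this line of work are commonly based on the t-SVD: take the FFT along the third mode and sum the singular values of the frontal slices in the spectral domain. A NumPy sketch under that assumption (the abstract does not pin down the exact regularizer):

```python
import numpy as np

def tensor_nuclear_norm(T):
    """t-SVD tensor nuclear norm: FFT along the third mode, then average
    the nuclear norms of the frontal slices in the spectral domain."""
    F = np.fft.fft(T, axis=2)
    total = sum(np.linalg.svd(F[:, :, k], compute_uv=False).sum()
                for k in range(F.shape[2]))
    return total / F.shape[2]

# Three identical rank-1 frontal slices: after the FFT, only the first
# spectral slice is nonzero, so the norm equals that of a single slice.
slice_ = np.outer([1.0, 2.0], [3.0, 4.0])
T = np.stack([slice_, slice_, slice_], axis=2)
tnn = tensor_nuclear_norm(T)
```

Penalizing this norm on the rotated coefficient tensor encourages the per-view self-representations to agree, which is the global-consistency effect the abstract describes.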

9.
Med Phys ; 47(7): 2970-2985, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32160321

ABSTRACT

PURPOSE: Image-based breast lesion detection is a powerful clinical diagnosis technology. In recent years, deep learning architectures have achieved considerable success in medical image analysis; however, they typically require large-scale samples. In mammography images, breast lesions are inconspicuous, multiscale, and have blurred edges, and few well-labeled images exist. Because of these factors, the detection accuracy of conventional deep learning methods is low. Therefore, we attempted to improve the accuracy of mammary lesion detection by introducing transfer learning (TL) into a deep learning framework for the few-shot learning task and thus provide a method that will further assist physicians in detecting breast lesions. METHODS: In this paper, we propose a method called "few-shot learning with deformable convolution for multiscale lesion detection in mammography," named FDMNet. Deformable convolution is introduced to enhance the network's ability to detect lesions, and the sensitivity of the multiscale feature space is reinforced by a feature pyramid method. Furthermore, by introducing location information in the predictor, the sensitivity of the model to lesion location is also enhanced. Through the applied TL technique, the proposed method mines potentially common feature knowledge in the source domain and transfers it into the target domain to improve the accuracy of breast lesion detection in the few-shot learning task. RESULTS: On the publicly available screening mammography datasets CBIS-DDSM and Mini-MIAS, the proposed method performs better than five widely used detection methods. On the CBIS-DDSM dataset, the comprehensive score, sensitivity, precision, and mean dice similarity coefficient are 0.911, 0.949, 0.873, and 0.913, respectively; on the Mini-MIAS dataset, these values are 0.931, 0.966, 0.882, and 0.941, respectively.
CONCLUSIONS: To achieve the few-shot learning required for medical image analysis, the proposed method uses TL to perform feature knowledge transfer and includes deformable convolution in a feature pyramid structure, which enhances the network's ability to learn lesions. The results of comparative numerical experiments show that the proposed method outperforms several state-of-the-art methods.


Subject(s)
Breast Neoplasms , Mammography , Breast/diagnostic imaging , Breast Neoplasms/diagnostic imaging , Early Detection of Cancer , Humans
10.
IEEE Trans Cybern ; 50(2): 572-586, 2020 Feb.
Article in English | MEDLINE | ID: mdl-30281508

ABSTRACT

In this paper, we address the multiview nonlinear subspace representation problem. Traditional multiview subspace learning methods assume that the heterogeneous features of the data lie within the union of multiple linear subspaces. In many real-world applications, however, data features actually reside in multiple nonlinear subspaces, resulting in unsatisfactory clustering performance. To overcome this, we propose a hyper-Laplacian regularized multilinear multiview self-representation model, referred to as HLR-M2VS, to jointly learn the correlations among multiple views and the local geometrical structure in a unified tensor space and in view-specific self-representation feature spaces, respectively. In the unified tensor space, a well-founded tensor low-rank regularization is imposed on the self-representation coefficient tensor to ensure global consensus among different views. In each view-specific feature space, hypergraph-induced hyper-Laplacian regularization is utilized to preserve the local geometrical structure embedded in the high-dimensional ambient space. An efficient algorithm is then derived to solve the optimization problem of the established model with a theoretical convergence guarantee. Furthermore, the proposed model can be extended to semisupervised classification without introducing any additional parameters. Extensive experiments are conducted on many challenging datasets, where a clear advance over state-of-the-art multiview clustering and multiview semisupervised classification approaches is achieved.

11.
Article in English | MEDLINE | ID: mdl-32224457

ABSTRACT

Existing enhancement methods are empirically expected to help high-level computer vision tasks; however, this is observed to not always be the case in practice. We focus on object and face detection under poor visibility caused by bad weather (haze, rain) and low-light conditions. To provide a more thorough examination and fair comparison, we introduce three benchmark sets collected in real-world hazy, rainy, and low-light conditions, respectively, with annotated objects/faces. We launched the UG2+ Challenge Track 2 competition at IEEE CVPR 2019, aiming to evoke a comprehensive discussion and exploration of whether and how low-level vision techniques can benefit high-level automatic visual recognition in various scenarios. To the best of our knowledge, this is the first and currently largest effort of its kind. Baseline results obtained by cascading existing enhancement and detection models are reported, indicating the highly challenging nature of our new data as well as the large room for further technical innovation. Thanks to large participation from the research community, we are able to analyze representative team solutions, striving to better identify the strengths and limitations of existing mindsets as well as future directions.

12.
IEEE Trans Cybern ; 44(4): 539-53, 2014 Apr.
Article in English | MEDLINE | ID: mdl-23757567

ABSTRACT

We propose a robust tracking algorithm based on local sparse coding with discriminative dictionary learning and a new keypoint matching scheme. The algorithm consists of two parts: local sparse coding with an online-updated discriminative dictionary for tracking (the SOD part), and keypoint matching refinement for enhancing tracking performance (the KP part). In the SOD part, local image patches of the target object and background are represented by their sparse codes over an over-complete discriminative dictionary. Such a discriminative dictionary, which encodes information from both the foreground and the background, provides more discriminative power. Furthermore, to adapt the dictionary to the variation of the foreground and background during tracking, an online learning method is employed to update it. The KP part utilizes a refined keypoint matching scheme to improve the performance of the SOD part. With the help of sparse representation and the online-updated discriminative dictionary, the KP part is more robust than traditional methods in rejecting incorrect matches and eliminating outliers. The proposed method is embedded into a Bayesian inference framework for visual tracking. Experimental results on several challenging video sequences demonstrate the effectiveness and robustness of our approach.
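The abstract does not name the sparse coding solver; a standard way to compute a patch's sparse code over a dictionary is ISTA for the lasso, sketched here with a toy identity dictionary (the dictionary, penalty weight, and step count are illustrative assumptions):

```python
import numpy as np

def ista(D, x, lam=0.1, steps=200):
    """Sparse coding of patch x over dictionary D via ISTA: a gradient
    step on the reconstruction term, then soft-thresholding for L1."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        g = D.T @ (D @ a - x)            # gradient of 0.5*||Da - x||^2
        a = a - g / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)
    return a

# Toy case: the patch equals the first atom, so the recovered code
# should be sparse and concentrated on atom 0 (shrunk by the penalty).
D = np.eye(4)
x = np.array([1.0, 0.0, 0.0, 0.0])
code = ista(D, x)
```

In the tracker, such codes over a foreground/background dictionary are what give the representation its discriminative power.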


Subject(s)
Artificial Intelligence , Image Processing, Computer-Assisted/methods , Bayes Theorem , Support Vector Machine