Results 1 - 20 of 43
1.
IEEE Signal Process Mag ; 40(2): 89-100, 2023 Mar.
Article in English | MEDLINE | ID: mdl-38404742

ABSTRACT

Since 2016, deep learning (DL) has advanced tomographic imaging with remarkable successes, especially in low-dose computed tomography (LDCT) imaging. Despite being driven by big data, LDCT denoising and pure end-to-end reconstruction networks often suffer from their black-box nature and from major issues such as instabilities, which are a major barrier to applying deep learning methods in low-dose CT applications. An emerging trend is to integrate imaging physics and models into deep networks, enabling a hybridization of physics/model-based and data-driven elements. In this paper, we systematically review physics/model-based data-driven methods for LDCT, summarize the loss functions and training strategies, evaluate the performance of different methods, and discuss relevant issues and future directions.

2.
Opt Express ; 25(14): 15956-15966, 2017 Jul 10.
Article in English | MEDLINE | ID: mdl-28789106

ABSTRACT

Scatterometry has been widely applied in microelectronic manufacturing process monitoring. As a key part of scatterometry, the inverse problem uses the scattering signature to determine the shape of the profile structure. The most common solutions to the inverse problem are model-based methods such as library search, the Levenberg-Marquardt algorithm, and artificial neural networks (ANNs). However, they all require a pre-defined geometric model to extract the 3D profile of the structure. When facing complex structures in manufacturing process monitoring, model-based methods take a long time and may fail to build a valid geometric model. Without the assumption of a geometric model, model-free methods have been developed to find a mapping between a profile parameter, named label Y, and the corresponding spectral signature X. These methods need large amounts of labeled data obtained from transmission electron microscopy (TEM) or cross-sectional scanning electron microscopy (XSEM), which is time-consuming and expensive, increasing production costs. To address these issues, this paper develops a novel model-free method, called maximum contributed component regression (MCCR). It utilizes canonical correlation analysis (CCA) to estimate the maximum contributed components from the pairwise relationships of inexpensive unlabeled data together with a few expensive labeled samples. In MCCR, the maximum contributed components are used to guide the solution of the inverse problem based on conventional regression methods. Experimental results on both synthetic and real-world semiconductor datasets demonstrate the effectiveness of the proposed method given a small amount of labeled data.

3.
IEEE Trans Med Imaging ; 43(2): 745-759, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37773896

ABSTRACT

Low-dose computed tomography (CT) images suffer from noise and artifacts due to photon starvation and electronic noise. Recently, some works have attempted to use diffusion models to address the over-smoothness and training instability encountered by previous deep-learning-based denoising models. However, diffusion models suffer from long inference times due to the large number of sampling steps involved. Very recently, the cold diffusion model generalized classical diffusion models with greater flexibility. Inspired by cold diffusion, this paper presents a novel COntextual eRror-modulated gEneralized Diffusion model for low-dose CT (LDCT) denoising, termed CoreDiff. First, CoreDiff utilizes LDCT images to replace the random Gaussian noise and employs a novel mean-preserving degradation operator to mimic the physical process of CT degradation; with the informative LDCT images as the starting point of the sampling process, the number of sampling steps is significantly reduced. Second, to alleviate the error accumulation caused by the imperfect restoration operator in the sampling process, we propose a novel ContextuaL Error-modulAted Restoration Network (CLEAR-Net), which can leverage contextual information to constrain the sampling process against structural distortion and modulate time-step embedding features for better alignment with the input at the next time step. Third, to rapidly generalize the trained model to a new, unseen dose level with as few resources as possible, we devise a one-shot learning framework that makes CoreDiff generalize faster and better using only a single LDCT image (un)paired with a normal-dose CT (NDCT) image. Extensive experimental results on four datasets demonstrate that CoreDiff outperforms competing methods in denoising and generalization performance, with a clinically acceptable inference time. Source code is made available at https://github.com/qgao21/CoreDiff.
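As a toy illustration of the mean-preserving degradation idea, a convex interpolation from the clean image toward the noisy one (the interpolation form and all names here are a sketch, not CoreDiff's actual implementation):

```python
import numpy as np

def degrade(x_ndct, x_ldct, t, T):
    """Mean-preserving degradation: convex interpolation from the clean
    NDCT image (t = 0) toward the noisy LDCT image (t = T).  Because the
    two weights sum to 1, the mean intensity is preserved whenever the
    two images share the same mean."""
    alpha = t / T
    return (1.0 - alpha) * x_ndct + alpha * x_ldct

rng = np.random.default_rng(0)
x0 = rng.normal(100.0, 1.0, (8, 8))        # stand-in for an NDCT image
xL = x0 + rng.normal(0.0, 10.0, (8, 8))    # stand-in for a noisy LDCT image
xL += x0.mean() - xL.mean()                # equalize means for the check
x_mid = degrade(x0, xL, t=5, T=10)
```

With both endpoints sharing the same mean, every intermediate image keeps that mean, which is the property such an operator is named for.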


Subject(s)
Software; Tomography, X-Ray Computed; Signal-To-Noise Ratio; Tomography, X-Ray Computed/methods; Artifacts; Diffusion; Image Processing, Computer-Assisted/methods; Algorithms
4.
IEEE Trans Med Imaging ; PP, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38976467

ABSTRACT

Medical image segmentation has been significantly advanced by the rapid development of deep learning (DL) techniques. Existing DL-based segmentation models are typically discriminative; i.e., they aim to learn a mapping from the input image to segmentation masks. However, these discriminative methods neglect the underlying data distribution and intrinsic class characteristics, and thus suffer from an unstable feature space. In this work, we propose to complement discriminative segmentation methods with knowledge of the underlying data distribution from generative models. To that end, we propose a novel hybrid diffusion framework for medical image segmentation, termed HiDiff, which can synergize the strengths of existing discriminative segmentation models and new generative diffusion models. HiDiff comprises two key components: a discriminative segmentor and a diffusion refiner. First, we can utilize any conventionally trained segmentation model as the discriminative segmentor, which provides a segmentation-mask prior for the diffusion refiner. Second, we propose a novel binary Bernoulli diffusion model (BBDM) as the diffusion refiner, which can effectively, efficiently, and interactively refine the segmentation mask by modeling the underlying data distribution. Third, we train the segmentor and BBDM in an alternate-collaborative manner so that they mutually boost each other. Extensive experimental results on abdominal organ, brain tumor, polyp, and retinal vessel segmentation datasets, covering four widely used modalities, demonstrate the superior performance of HiDiff over existing medical segmentation algorithms, including state-of-the-art transformer- and diffusion-based ones. In addition, HiDiff excels at segmenting small objects and generalizing to new datasets. Source code is made available at https://github.com/takimailto/HiDiff.
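A minimal sketch of Bernoulli-style forward "noising" of a binary mask, the kind of corruption process a binary Bernoulli diffusion model learns to reverse; the linear flip schedule and all names are assumptions, not the paper's BBDM:

```python
import numpy as np

def bernoulli_forward(mask, t, T, beta_max=0.5, rng=None):
    """Forward corruption for a binary segmentation mask: each pixel is
    flipped independently with probability beta_t, growing linearly from
    0 (t = 0) to beta_max (t = T).  At beta = 0.5 the mask is pure noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    beta_t = beta_max * t / T
    flips = rng.random(mask.shape) < beta_t
    return np.where(flips, 1 - mask, mask)

mask = np.zeros((16, 16), dtype=int)
mask[4:12, 4:12] = 1                      # a square toy "organ"
noisy = bernoulli_forward(mask, t=5, T=10)
```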

5.
IEEE Trans Med Imaging ; 43(5): 1880-1894, 2024 May.
Article in English | MEDLINE | ID: mdl-38194396

ABSTRACT

This paper studies 3D low-dose computed tomography (CT) imaging. Although various deep learning methods have been developed in this context, they typically focus on 2D images and perform denoising (for low dose) and deblurring (for super-resolution) separately. To date, little work has been done on simultaneous in-plane denoising and through-plane deblurring, which is important for obtaining high-quality 3D CT images with lower radiation and faster imaging speed. For this task, a straightforward method is to directly train an end-to-end 3D network; however, this demands much more training data and expensive computational costs. Here, we propose to link in-plane and through-plane transformers for simultaneous in-plane denoising and through-plane deblurring, termed LIT-Former, which can efficiently synergize the in-plane and through-plane sub-tasks for 3D CT imaging and enjoys the advantages of both convolution and transformer networks. LIT-Former has two novel designs: efficient multi-head self-attention modules (eMSM) and efficient convolutional feed-forward networks (eCFN). First, eMSM integrates in-plane 2D self-attention and through-plane 1D self-attention to efficiently capture the global interactions of 3D self-attention, the core unit of transformer networks. Second, eCFN integrates 2D convolution and 1D convolution to extract the local information of 3D convolution in the same fashion. As a result, the proposed LIT-Former synergizes these two sub-tasks, significantly reducing computational complexity compared to 3D counterparts and enabling rapid convergence. Extensive experimental results on simulated and clinical datasets demonstrate superior performance over state-of-the-art models. The source code is made available at https://github.com/hao1635/LIT-Former.
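The eMSM idea of replacing one 3D self-attention with a 2D in-plane pass plus a 1D through-plane pass can be sketched as follows; this is a bare numpy illustration (single head, no learned projections), not the LIT-Former code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(x):
    """Plain single-head self-attention over a (tokens, channels) array;
    queries = keys = values = x, with no learned projections."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores) @ x

def factorized_3d_attention(vol):
    """vol: (D, H, W, C).  Approximate full 3D self-attention by
    (1) 2D in-plane attention over the H*W tokens of each slice, then
    (2) 1D through-plane attention over the D tokens of each column.
    Score-matrix cost drops from O((D*H*W)^2) to O(D*(H*W)^2 + H*W*D^2)."""
    D, H, W, C = vol.shape
    out = np.stack([attend(vol[d].reshape(H * W, C)).reshape(H, W, C)
                    for d in range(D)])                       # in-plane
    out = out.reshape(D, H * W, C)
    out = np.stack([attend(out[:, i, :]) for i in range(H * W)], axis=1)
    return out.reshape(D, H, W, C)                            # through-plane

vol = np.random.default_rng(1).normal(size=(4, 5, 5, 3))
y = factorized_3d_attention(vol)
```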


Subject(s)
Algorithms; Imaging, Three-Dimensional; Tomography, X-Ray Computed; Tomography, X-Ray Computed/methods; Humans; Imaging, Three-Dimensional/methods; Deep Learning; Phantoms, Imaging
6.
Article in English | MEDLINE | ID: mdl-38261490

ABSTRACT

Mild cognitive impairment (MCI) carries a high risk of progression to Alzheimer's disease (AD). Existing works that identify progressive MCI (pMCI) typically require MCI subtype labels, pMCI vs. stable MCI (sMCI), determined by whether or not an MCI patient progresses to AD after a long follow-up. However, prospectively acquiring MCI subtype data is time-consuming and resource-intensive; the resultant small datasets can lead to severe overfitting and difficulty in extracting discriminative information. Inspired by the observation that various longitudinal biomarkers and cognitive measurements follow an ordinal pathway in AD progression, we propose a novel Hybrid-granularity Ordinal PrototypE learning (HOPE) method to characterize ordinal AD progression for MCI progression prediction. First, HOPE learns an ordinal metric space that enables progression prediction by prototype comparison. Second, HOPE leverages a novel hybrid-granularity ordinal loss to learn the ordinal nature of AD by effectively integrating instance-to-instance ordinality, instance-to-class compactness, and class-to-class separation. Third, to make the prototype learning more stable, HOPE employs an exponential moving average strategy to dynamically learn the global prototypes of NC (normal control) and AD. Experimental results on the internal ADNI and external NACC datasets demonstrate the superiority of the proposed HOPE over existing state-of-the-art methods, as well as its interpretability. Source code is made available at https://github.com/thibault-wch/HOPE-for-mild-cognitive-impairment.
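The exponential-moving-average prototype update mentioned above is straightforward; a sketch with hypothetical names and momentum value:

```python
import numpy as np

def ema_update(prototype, batch_features, momentum=0.99):
    """Exponential-moving-average update of a class prototype from the
    mean feature vector of the current batch."""
    batch_mean = batch_features.mean(axis=0)
    return momentum * prototype + (1.0 - momentum) * batch_mean

rng = np.random.default_rng(0)
proto = np.zeros(8)                          # prototype starts at the origin
for _ in range(500):                         # features drawn around a fixed center
    feats = rng.normal(loc=1.0, scale=0.1, size=(32, 8))
    proto = ema_update(proto, feats)
```

Over many updates the prototype drifts smoothly toward the running class mean, which is what makes the learning "more stable" than recomputing prototypes per batch.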

7.
IEEE Trans Med Imaging ; 43(5): 1866-1879, 2024 May.
Article in English | MEDLINE | ID: mdl-38194399

ABSTRACT

Metal implants and other high-density objects in patients introduce severe streaking artifacts in CT images, compromising image quality and diagnostic performance. Although various methods have been developed for CT metal artifact reduction (MAR) over the past decades, including the latest dual-domain deep networks, the remaining metal artifacts are still clinically challenging in many cases. Here we extend the state-of-the-art dual-domain deep network approach into a quad-domain counterpart, so that all the features in the sinogram, image, and their corresponding Fourier domains are synergized to eliminate metal artifacts optimally without compromising structural subtleties. Our proposed quad-domain network for MAR, referred to as Quad-Net, incurs little additional computational cost since the Fourier transform is highly efficient, and works across the four receptive fields to learn both global and local features as well as their relations. Specifically, we first design a Sinogram-Fourier Restoration Network (SFR-Net) in the sinogram domain and its Fourier space to faithfully inpaint metal-corrupted traces. Then, we couple SFR-Net with an Image-Fourier Refinement Network (IFR-Net), which takes both an image and its Fourier spectrum to improve the CT image reconstructed from the SFR-Net output using cross-domain contextual information. Quad-Net is trained on clinical datasets to minimize a composite loss function. Quad-Net does not require precise metal masks, which is of great importance in clinical practice. Our experimental results demonstrate the superiority of Quad-Net over state-of-the-art MAR methods quantitatively, visually, and statistically. The Quad-Net code is publicly available at https://github.com/longzilicart/Quad-Net.
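A toy example of why Fourier-domain processing is cheap and inherently global, using a fixed low-pass disk as a stand-in for learned spectral weights (not Quad-Net's actual layers):

```python
import numpy as np

def fourier_filter(img, radius=8):
    """Illustrative global operation in the Fourier domain: keep only the
    low-frequency disk of the 2D spectrum.  A single multiplication in
    frequency space touches every pixel of the image at once."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    keep = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * keep)))

img = np.random.default_rng(2).normal(size=(32, 32))
smooth = fourier_filter(img)
```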


Subject(s)
Artifacts; Metals; Tomography, X-Ray Computed; Humans; Tomography, X-Ray Computed/methods; Metals/chemistry; Fourier Analysis; Algorithms; Deep Learning; Prostheses and Implants; Image Processing, Computer-Assisted/methods; Phantoms, Imaging
8.
Med Image Anal ; 91: 103032, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37995628

ABSTRACT

Alzheimer's disease (AD) is one of the most common neurodegenerative disorders, characterized by the irreversible progression of cognitive impairment. Identifying AD as early as possible is critical for intervention with potential preventive measures. Among the various neuroimaging modalities used to diagnose AD, functional positron emission tomography (PET) has higher sensitivity than structural magnetic resonance imaging (MRI), but it is also costlier and often not available in many hospitals. Leveraging massive unpaired, unlabeled PET data to improve AD diagnosis from MRI therefore becomes important. To address this challenge, this paper proposes a novel joint learning framework for unsupervised cross-modal synthesis and AD diagnosis that mines underlying shared modality information, improving AD diagnosis from MRI while synthesizing more discriminative PET images. We mine the underlying shared modality information in two ways: diversifying modality information through the cross-modal synthesis network and locating critical diagnosis-related patterns through the AD diagnosis network. First, to diversify the modality information, we propose a novel unsupervised cross-modal synthesis network, which implements the inter-conversion between 3D PET and MRI in a single model modulated by the AdaIN module. Second, to locate shared critical diagnosis-related patterns, we propose an interpretable diagnosis network based on fully 2D convolutions, which takes either the 3D synthesized PET or the original MRI as input. Extensive experimental results on the ADNI dataset show that our framework can synthesize more realistic images, outperform state-of-the-art AD diagnosis methods, and generalize better on the external AIBL and NACC datasets.
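The AdaIN operation used to modulate the synthesis network follows the standard adaptive-instance-normalization formula; a numpy sketch (the array layout and names are illustrative, not the paper's code):

```python
import numpy as np

def adain(content, style, eps=1e-6):
    """Adaptive instance normalization: re-scale the content features so
    their channel-wise mean/std match those of the style features.
    content, style: (C, N) arrays (channels x spatial positions)."""
    c_mu = content.mean(axis=1, keepdims=True)
    c_std = content.std(axis=1, keepdims=True)
    s_mu = style.mean(axis=1, keepdims=True)
    s_std = style.std(axis=1, keepdims=True)
    return s_std * (content - c_mu) / (c_std + eps) + s_mu

rng = np.random.default_rng(3)
content = rng.normal(0.0, 1.0, (4, 1000))   # source-modality features
style = rng.normal(5.0, 2.0, (4, 1000))     # target-modality statistics
out = adain(content, style)
```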


Subject(s)
Alzheimer Disease; Cognitive Dysfunction; Humans; Alzheimer Disease/pathology; Neuroimaging/methods; Positron-Emission Tomography/methods; Magnetic Resonance Imaging/methods; Learning; Cognitive Dysfunction/diagnostic imaging
9.
Med Phys ; 51(2): 1277-1288, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37486288

ABSTRACT

BACKGROUND: Accurate measurement of bladder volume is necessary to maintain the consistency of the patient's anatomy in radiation therapy for pelvic tumors. Because of the diversity of bladder shapes, traditional methods for bladder volume measurement from 2D ultrasound have been found to produce inaccurate results. PURPOSE: To improve the accuracy of bladder volume measurement from 2D ultrasound images for patients with pelvic tumors. METHODS: Bladder ultrasound images from 130 patients with pelvic cancer were collected retrospectively. All data were split into a training set (80 patients), a validation set (20 patients), and a test set (30 patients). A total of 12 transabdominal ultrasound images per patient were captured by automatically rotating the ultrasonic probe with an angle step of 15°. An incomplete 3D ultrasound volume was synthesized by arranging these 2D ultrasound images in 3D space according to the acquisition angles. With this as input, a weakly supervised learning-based 3D bladder reconstruction neural network model was built to predict the complete 3D bladder. The key point is that we designed a novel loss function, including a supervised loss for bladder segmentation in the ultrasound images at known angles and a compactness loss for the 3D bladder. Bladder volume was calculated by counting the number of voxels belonging to the 3D bladder. The Dice similarity coefficient (DSC) was used to evaluate the accuracy of bladder segmentation, and the relative standard deviation (RSD) was used to evaluate the accuracy of the calculated bladder volume, with computed tomography (CT) images as the gold standard. RESULTS: The results showed that the mean DSC was up to 0.94 and the mean absolute RSD was reduced to 6.3% when using 12 ultrasound images per patient. Further, the mean DSC was still up to 0.90 and the mean absolute RSD was reduced to 9.0% even when only two ultrasound images were used (i.e., an angle step of 90°). Compared with the commercial algorithm in bladder scanners, which has a mean absolute RSD of 13.6%, our proposed method showed a considerable improvement. CONCLUSIONS: The proposed weakly supervised learning-based 3D bladder reconstruction method can greatly improve the accuracy of bladder volume measurement. It has great potential to be used in bladder volume measurement devices in the future.
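The two evaluation quantities above, voxel-count volume and the Dice similarity coefficient, can be computed directly from a binary mask; a sketch with hypothetical names:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary volumes:
    2 * |A intersect B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def volume_ml(binary_volume, voxel_mm3):
    """Volume from a 3D binary mask: voxel count times voxel size."""
    return binary_volume.sum() * voxel_mm3 / 1000.0   # mm^3 -> mL

vol = np.zeros((20, 20, 20), dtype=int)
vol[5:15, 5:15, 5:15] = 1            # a 1000-voxel toy bladder
```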


Subject(s)
Pelvic Neoplasms; Urinary Bladder; Humans; Urinary Bladder/diagnostic imaging; Image Processing, Computer-Assisted/methods; Retrospective Studies; Supervised Machine Learning
10.
iScience ; 27(1): 108608, 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38174317

ABSTRACT

Magnetic resonance imaging (MRI) is a widely used imaging modality in clinics for medical disease diagnosis, staging, and follow-up. Deep learning has been extensively used to accelerate k-space data acquisition, enhance MR image reconstruction, and automate tissue segmentation. However, these three tasks are usually treated as independent tasks and optimized for evaluation by radiologists, thus ignoring the strong dependencies among them; this may be suboptimal for downstream intelligent processing. Here, we present a novel paradigm, full-stack learning (FSL), which can simultaneously solve these three tasks by considering the overall imaging process and leverage the strong dependence among them to further improve each task, significantly boosting the efficiency and efficacy of practical MRI workflows. Experimental results obtained on multiple open MR datasets validate the superiority of FSL over existing state-of-the-art methods on each task. FSL has great potential to optimize the practical workflow of MRI for medical diagnosis and radiotherapy.

11.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7917-7932, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36306297

ABSTRACT

To minimize the impact of age variation on face recognition, age-invariant face recognition (AIFR) extracts identity-related discriminative features by minimizing the correlation between identity- and age-related features, while face age synthesis (FAS) eliminates age variation by converting the faces in different age groups to the same group. However, AIFR lacks visual results for model interpretation, and FAS compromises downstream recognition due to artifacts. Therefore, we propose a unified multi-task framework to jointly handle these two tasks, termed MTLFace, which can learn the age-invariant identity-related representation for face recognition while achieving pleasing face synthesis for model interpretation. Specifically, we propose an attention-based feature decomposition to decompose the mixed face features into two uncorrelated components, identity- and age-related features, in a spatially constrained way. Unlike the conventional one-hot encoding that achieves group-level FAS, we propose a novel identity conditional module to achieve identity-level FAS, which can improve the age smoothness of synthesized faces through a weight-sharing strategy. Benefiting from the proposed multi-task framework, we then leverage those high-quality synthesized faces from FAS to further boost AIFR via a novel selective fine-tuning strategy. Furthermore, to advance both AIFR and FAS, we collect and release a large cross-age face dataset with age and gender annotations, and a new benchmark specifically designed for tracing long-missing children. Extensive experimental results on five benchmark cross-age datasets demonstrate that MTLFace yields superior performance over state-of-the-art methods for both AIFR and FAS. We further validate MTLFace on two popular general face recognition datasets, obtaining competitive performance on face recognition in the wild. The source code and datasets are available at http://hzzone.github.io/MTLFace.


Subject(s)
Facial Recognition; Child; Humans; Algorithms; Benchmarking; Face; Image Processing, Computer-Assisted/methods
12.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7509-7524, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36269906

ABSTRACT

Existing deep clustering methods rely on either contrastive or non-contrastive representation learning for the downstream clustering task. Contrastive methods, thanks to negative pairs, learn uniform representations for clustering; however, the negative pairs may inevitably lead to the class collision issue and consequently compromise clustering performance. Non-contrastive methods, on the other hand, avoid the class collision issue, but the resulting non-uniform representations may cause clustering to collapse. To enjoy the strengths of both worlds, this paper presents a novel end-to-end deep clustering method with prototype scattering and positive sampling, termed ProPos. Specifically, we first maximize the distance between prototypical representations via a prototype scattering loss, which improves the uniformity of representations. Second, we align one augmented view of an instance with the sampled neighbors of another view, assumed to be true positive pairs in the embedding space, to improve within-cluster compactness, termed positive sampling alignment. The strengths of ProPos are an avoided class collision issue, uniform representations, well-separated clusters, and within-cluster compactness. By optimizing ProPos in an end-to-end expectation-maximization framework, extensive experimental results demonstrate that ProPos achieves competitive performance on moderate-scale clustering benchmark datasets and establishes new state-of-the-art performance on large-scale datasets. Source code is available at https://github.com/Hzzone/ProPos.
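One plausible way to write a prototype scattering loss, lower when normalized prototypes are farther apart, is a contrastive log-sum-exp over pairwise similarities; the exact form in the paper may differ:

```python
import numpy as np

def prototype_scattering_loss(prototypes, tau=0.5):
    """Contrastive-style loss that decreases as the L2-normalized
    prototypes spread out on the unit sphere (one plausible instantiation
    of 'maximize the distance between prototypical representations')."""
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = P @ P.T / tau
    np.fill_diagonal(sim, -np.inf)           # exclude self-similarity
    m = sim.max(axis=1, keepdims=True)       # stable log-sum-exp
    lse = m.ravel() + np.log(np.exp(sim - m).sum(axis=1))
    return lse.mean()

tight = np.array([[1.0, 0.01], [1.0, -0.01], [1.0, 0.0]])       # nearly identical
spread = np.array([[1.0, 0.0], [-0.5, 0.866], [-0.5, -0.866]])  # 120 degrees apart
```

Minimizing this loss pushes prototypes apart, which is what improves the uniformity of the representation space.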

13.
Comput Biol Med ; 156: 106717, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36878125

ABSTRACT

There is considerable interest in automatic stroke lesion segmentation on magnetic resonance (MR) images in the medical imaging field, as stroke is an important cerebrovascular disease. Although deep learning-based models have been proposed for this task, generalizing these models to unseen sites is difficult due not only to the large inter-site discrepancy among different scanners, imaging protocols, and populations, but also to the variations in stroke lesion shape, size, and location. To tackle this issue, we introduce a self-adaptive normalization network, termed SAN-Net, to achieve adaptive generalization to unseen sites for stroke lesion segmentation. Motivated by traditional z-score normalization and dynamic networks, we devise a masked adaptive instance normalization (MAIN) to minimize inter-site discrepancies, which standardizes input MR images from different sites into a site-unrelated style by dynamically learning affine parameters from the input; i.e., MAIN can affinely transform the intensity values. Then, we leverage a gradient reversal layer to force the U-net encoder to learn site-invariant representations with a site classifier, which further improves the model's generalization in conjunction with MAIN. Finally, inspired by the "pseudo-symmetry" of the human brain, we introduce a simple yet effective data augmentation technique, termed symmetry-inspired data augmentation (SIDA), that can be embedded within SAN-Net to double the sample size while halving memory consumption. Experimental results on the benchmark Anatomical Tracings of Lesions After Stroke (ATLAS) v1.2 dataset, which includes MR images from 9 different sites, demonstrate that under the "leave-one-site-out" setting, the proposed SAN-Net outperforms recently published methods in terms of quantitative metrics and qualitative comparisons.
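The SIDA idea, doubling the sample count with left-right mirrored copies, can be sketched in a few lines (names are illustrative; the paper applies this inside the training pipeline rather than as a standalone function):

```python
import numpy as np

def symmetry_augment(image, mask):
    """Symmetry-inspired augmentation: exploit the pseudo-symmetry of the
    brain by pairing each image and lesion mask with its left-right
    mirrored copy, doubling the effective sample count."""
    images = np.stack([image, image[:, ::-1]])   # original + mirrored
    masks = np.stack([mask, mask[:, ::-1]])      # mirror the labels identically
    return images, masks

img = np.arange(12.0).reshape(3, 4)
msk = (img > 5).astype(int)
aug_imgs, aug_msks = symmetry_augment(img, msk)
```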


Subject(s)
Neural Networks, Computer; Stroke; Humans; Magnetic Resonance Imaging/methods; Brain; Image Processing, Computer-Assisted/methods
14.
IEEE Trans Med Imaging ; 42(3): 850-863, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36327187

ABSTRACT

Lowering the radiation dose in computed tomography (CT) can greatly reduce the potential risk to public health. However, the reconstructed images from dose-reduced CT or low-dose CT (LDCT) suffer from severe noise which compromises the subsequent diagnosis and analysis. Recently, convolutional neural networks have achieved promising results in removing noise from LDCT images. The network architectures that are used are either handcrafted or built on top of conventional networks such as ResNet and U-Net. Recent advances in neural network architecture search (NAS) have shown that the network architecture has a dramatic effect on the model performance. This indicates that current network architectures for LDCT may be suboptimal. Therefore, in this paper, we make the first attempt to apply NAS to LDCT and propose a multi-scale and multi-level memory-efficient NAS for LDCT denoising, termed M3NAS. On the one hand, the proposed M3NAS fuses features extracted by different scale cells to capture multi-scale image structural details. On the other hand, the proposed M3NAS can search a hybrid cell- and network-level structure for better performance. In addition, M3NAS can effectively reduce the number of model parameters and increase the speed of inference. Extensive experimental results on two different datasets demonstrate that the proposed M3NAS can achieve better performance and fewer parameters than several state-of-the-art methods. In addition, we also validate the effectiveness of the multi-scale and multi-level architecture for LDCT denoising, and present further analysis for different configurations of super-net.


Subject(s)
Neural Networks, Computer; Tomography, X-Ray Computed; Signal-To-Noise Ratio; Tomography, X-Ray Computed/methods
15.
Artif Intell Med ; 142: 102555, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37316093

ABSTRACT

Digital mammography is currently the most common imaging tool for breast cancer screening. Although the benefits of using digital mammography for cancer screening outweigh the risks associated with x-ray exposure, the radiation dose must be kept as low as possible while maintaining the diagnostic utility of the generated images, thus minimizing patient risk. Many studies have investigated the feasibility of dose reduction by restoring low-dose images using deep neural networks. In these cases, choosing the appropriate training database and loss function is crucial and impacts the quality of the results. In this work, we used a standard residual network (ResNet) to restore low-dose digital mammography images and evaluated the performance of several loss functions. For training purposes, we extracted 256,000 image patches from a dataset of 400 images of retrospective clinical mammography exams, where dose reduction factors of 75% and 50% were simulated to generate low- and standard-dose pairs. We validated the network in a real scenario by using a physical anthropomorphic breast phantom to acquire real low-dose and standard full-dose images in a commercially available mammography system, which were then processed through our trained model. We benchmarked our results against an analytical restoration model for low-dose digital mammography. Objective assessment was performed through the signal-to-noise ratio (SNR) and the mean normalized squared error (MNSE), decomposed into residual noise and bias. Statistical tests revealed that the use of the perceptual loss (PL4) resulted in statistically significant differences when compared to all other loss functions. Additionally, images restored using PL4 achieved the residual noise closest to the standard dose. On the other hand, perceptual loss PL3, the structural similarity index (SSIM), and one of the adversarial losses achieved the lowest bias for both dose reduction factors.
The source code of our deep neural network is available at https://github.com/WANG-AXIS/LdDMDenoising.
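The decomposition of the squared error into bias and residual noise used for evaluation follows the standard identity E[(xhat - x)^2] = (E[xhat] - x)^2 + Var(xhat), estimated over repeated restored realizations; a sketch (the paper's exact normalization convention may differ):

```python
import numpy as np

def mnse_decomposition(restorations, ground_truth):
    """Decompose the mean normalized squared error of several restored
    realizations into bias^2 and residual noise, each normalized by the
    ground-truth energy (one common convention)."""
    mean_rest = restorations.mean(axis=0)
    bias2 = ((mean_rest - ground_truth) ** 2).mean()          # systematic part
    resid_noise = restorations.var(axis=0).mean()             # random part
    total = ((restorations - ground_truth) ** 2).mean()
    norm = (ground_truth ** 2).mean()
    return total / norm, bias2 / norm, resid_noise / norm

rng = np.random.default_rng(4)
gt = rng.uniform(50, 100, (64, 64))
# 20 noisy "restorations": a systematic offset (bias) plus random noise
rest = gt + 2.0 + rng.normal(0.0, 3.0, (20, 64, 64))
total, bias2, noise = mnse_decomposition(rest, gt)
```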


Subject(s)
Breast; Mammography; Humans; Retrospective Studies; Databases, Factual; Neural Networks, Computer
16.
Article in English | MEDLINE | ID: mdl-38090870

ABSTRACT

Most conventional crowd counting methods utilize a fully supervised learning framework to establish a mapping between scene images and crowd density maps. They usually rely on a large quantity of costly and time-intensive pixel-level annotations for training supervision. One way to mitigate the intensive labeling effort and improve counting accuracy is to leverage large amounts of unlabeled images. This is made possible by the inherent self-structural information and rank consistency within a single image, which offer additional qualitative relation supervision during training. Contrary to earlier methods that utilized rank relations at the original image level, we explore such rank-consistency relations within the latent feature spaces. This approach enables the incorporation of numerous pyramid partial orders, strengthening the model's representation capability. A notable advantage is that it can also increase the utilization ratio of unlabeled samples. Specifically, we propose a Deep Rank-consistEnt pyrAmid Model (DREAM), which makes full use of rank consistency across coarse-to-fine pyramid features in latent spaces for enhanced crowd counting with massive unlabeled images. In addition, we have collected a new unlabeled crowd counting dataset, FUDAN-UCC, comprising 4,000 images for training purposes. Extensive experiments on four benchmark datasets, namely UCF-QNRF, ShanghaiTech PartA and PartB, and UCF-CC-50, show the effectiveness of our method compared with previous semi-supervised methods. The codes are available at https://github.com/bridgeqiqi/DREAM.
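The basic rank-consistency constraint, that a crop contained in an image cannot hold more people than the whole image, reduces to a hinge penalty; a minimal sketch on raw counts (the model applies such partial orders to pyramid features in latent space, not to counts directly):

```python
def rank_consistency_loss(count_full, count_crop, margin=0.0):
    """Hinge penalty encoding the partial order that a crop contained in
    an image cannot contain more people than the full image: the loss is
    positive only when the predicted crop count exceeds the full count
    (plus an optional margin)."""
    return max(0.0, count_crop - count_full + margin)
```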

17.
IEEE Trans Image Process ; 31: 7264-7278, 2022.
Article in English | MEDLINE | ID: mdl-36378790

ABSTRACT

The similarity among samples and the discrepancy among clusters are two crucial aspects of image clustering. However, current deep clustering methods suffer from inaccurate estimation of either feature similarity or semantic discrepancy. In this paper, we present a Semantic Pseudo-labeling-based Image ClustEring (SPICE) framework, which divides the clustering network into a feature model for measuring instance-level similarity and a clustering head for identifying cluster-level discrepancy. We design two semantics-aware pseudo-labeling algorithms, prototype pseudo-labeling and reliable pseudo-labeling, which enable accurate and reliable self-supervision over clustering. Without using any ground-truth label, we optimize the clustering network in three stages: 1) train the feature model through contrastive learning to measure instance similarity; 2) train the clustering head with the prototype pseudo-labeling algorithm to identify cluster semantics; and 3) jointly train the feature model and clustering head with the reliable pseudo-labeling algorithm to improve clustering performance. Extensive experimental results demonstrate that SPICE achieves significant improvements (~10%) over existing methods and establishes new state-of-the-art clustering results on six balanced benchmark datasets in terms of three popular metrics. Importantly, SPICE significantly reduces the gap between unsupervised and fully supervised classification; e.g., there is only a 2% (91.8% vs. 93.8%) accuracy difference on CIFAR-10. Our code is made publicly available at https://github.com/niuchuangnn/SPICE.
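A simplified sketch of prototype pseudo-labeling with a reliability filter: assign each sample the label of its nearest prototype and keep only the most confident fraction. The names and the keep-ratio rule are assumptions, not SPICE's exact algorithm:

```python
import numpy as np

def prototype_pseudo_labels(features, prototypes, keep_ratio=0.5):
    """Label each sample by its nearest prototype under cosine similarity
    and return the indices of the most confident fraction as 'reliable'
    pseudo-labels for self-supervision."""
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = F @ P.T                         # (n_samples, n_prototypes)
    labels = sim.argmax(axis=1)
    conf = sim.max(axis=1)
    k = int(len(features) * keep_ratio)
    reliable = np.argsort(-conf)[:k]      # top-confidence sample indices
    return labels, reliable

rng = np.random.default_rng(5)
protos = np.array([[1.0, 0.0], [0.0, 1.0]])
feats = np.vstack([rng.normal([1, 0], 0.1, (10, 2)),
                   rng.normal([0, 1], 0.1, (10, 2))])
labels, reliable = prototype_pseudo_labels(feats, protos)
```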

18.
IEEE Trans Radiat Plasma Med Sci ; 6(6): 656-666, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35865007

ABSTRACT

Deep neural network-based methods have achieved promising results for CT metal artifact reduction (MAR), most of which use many synthesized paired images for supervised learning. As synthesized metal artifacts in CT images may not accurately reflect their clinical counterparts, an artifact disentanglement network (ADN) was proposed that works directly with unpaired clinical images, producing promising results on clinical datasets. However, because the discriminator can only judge whether large regions semantically look artifact-free or artifact-affected, it is difficult for ADN to recover small structural details of artifact-affected CT images from adversarial losses alone, without sufficient constraints. To overcome the ill-posedness of this problem, here we propose a low-dimensional manifold (LDM) constrained disentanglement network (DN), leveraging the characteristic that the patch manifold of CT images is generally low-dimensional. Specifically, we design an LDM-DN learning algorithm to empower the disentanglement network by optimizing the synergistic loss functions used in ADN while constraining the recovered images to lie on a low-dimensional patch manifold. Moreover, an efficient hybrid optimization scheme that learns from both paired and unpaired data is proposed to further improve the MAR performance on clinical datasets. Extensive experiments demonstrate that the proposed LDM-DN approach consistently improves MAR performance in paired and/or unpaired learning settings, outperforming competing methods on synthesized and clinical datasets.

19.
Patterns (N Y) ; 3(5): 100475, 2022 May 13.
Article in English | MEDLINE | ID: mdl-35607615

ABSTRACT

Due to a lack of kernel awareness, some popular deep image reconstruction networks are unstable. To address this problem, here we introduce the bounded relative error norm (BREN) property, which is a special case of Lipschitz continuity. Then, we perform a convergence study consisting of two parts: (1) a heuristic analysis of the convergence of the analytic compressed iterative deep (ACID) scheme (with the simplification that the CS module achieves perfect sparsification), and (2) a mathematically denser analysis (with two approximations: [1] A^T is viewed as an inverse A^-1 from the perspective of an iterative reconstruction procedure, and [2] a pseudo-inverse is used for the total variation operator H). Also, we present adversarial attack algorithms to perturb the selected reconstruction networks individually and, more importantly, to attack the ACID workflow as a whole. Finally, we show the numerical convergence of the ACID iteration in terms of the Lipschitz constant and the local stability against noise.
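An adversarial attack on a reconstruction map can be sketched without gradients at all: search random input perturbations of bounded norm and keep the one that most changes the output. This gradient-free random search is our illustrative stand-in, much weaker than the attack algorithms the paper develops, but it shows the instability being probed:

```python
import numpy as np

def worst_case_perturbation(recon, data, epsilon=0.05, trials=200, seed=0):
    """Toy adversarial search: among random perturbations with
    ||delta|| = epsilon * ||data||, keep the one that most changes the
    reconstruction (a gradient-free stand-in for a real attack)."""
    rng = np.random.default_rng(seed)
    base = recon(data)
    best, worst_err = np.zeros_like(data), 0.0
    for _ in range(trials):
        d = rng.normal(size=data.shape)
        d *= epsilon * np.linalg.norm(data) / np.linalg.norm(d)
        err = np.linalg.norm(recon(data + d) - base)
        if err > worst_err:
            worst_err, best = err, d
    return best, worst_err

# An intentionally unstable "reconstructor": amplifies high frequencies,
# mimicking a map that lacks kernel awareness.
unstable = lambda y: np.fft.irfft(
    np.fft.rfft(y) * np.arange(1, len(y) // 2 + 2) ** 2, n=len(y))
y = np.sin(np.linspace(0, 2 * np.pi, 64))
delta, err = worst_case_perturbation(unstable, y)
print(err)  # large output change despite a 5% input perturbation
```

A BREN-style stability property would bound `err` relative to the perturbation size; the high-frequency amplifier above has no such bound, which is exactly the failure mode a kernel-aware scheme is meant to rule out.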

20.
Patterns (N Y) ; 3(5): 100474, 2022 May 13.
Article in English | MEDLINE | ID: mdl-35607623

ABSTRACT

A recent PNAS paper reveals that several popular deep reconstruction networks are unstable. Specifically, three kinds of instabilities were reported: (1) strong image artifacts from tiny perturbations, (2) small features missed in a deeply reconstructed image, and (3) decreased imaging performance with increased input data. Here, we propose an analytic compressed iterative deep (ACID) framework to address this challenge. ACID synergizes a deep network trained on big data, kernel awareness from compressed sensing (CS)-inspired processing, and iterative refinement to minimize the data residual relative to real measurements. Our study demonstrates that the ACID reconstruction is accurate and stable, and it sheds light on the convergence mechanism of the ACID iteration under a bounded relative error norm assumption. ACID not only stabilizes an unstable deep reconstruction network but is also resilient against adversarial attacks on the whole ACID workflow, outperforming classic sparsity-regularized reconstruction and eliminating the three kinds of instabilities.
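The interplay of the three ACID ingredients can be sketched as a toy iteration: a learned refinement (here a trivial stub standing in for the trained network), a sparsifying projection standing in for the CS module, and a gradient step on the data residual ||Ax - y||^2 enforcing measurement consistency. All operators, step sizes, and the identity "network" below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def soft_threshold(v, t):
    # sparsifying step standing in for the compressed-sensing (CS) module
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def acid_like_iteration(A, y, net, iters=500, step=0.5, t=1e-4):
    """Toy loop in the spirit of ACID: alternate a learned refinement
    (`net`, a stand-in here), a sparsifying projection, and a gradient
    step on ||Ax - y||^2 to keep the estimate consistent with the data."""
    x = A.T @ y
    for _ in range(iters):
        x = net(x)                         # data-driven prior (stand-in)
        x = soft_threshold(x, t)           # kernel-aware sparsification
        x = x - step * A.T @ (A @ x - y)   # minimize the data residual
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 20)) / np.sqrt(40)      # toy forward operator
x_true = np.zeros(20)
x_true[[2, 7, 15]] = [1.5, -2.0, 0.8]            # sparse ground truth
y = A @ x_true                                   # noiseless measurements
x_hat = acid_like_iteration(A, y, net=lambda v: v)  # identity "network"
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # small
```

The data-residual step is the stabilizing ingredient: even if the learned prior drifts, each iterate is pulled back toward solutions consistent with the actual measurements, which is the intuition behind ACID taming an unstable network.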
