ABSTRACT
Integrating complementary information from multiple magnetic resonance imaging (MRI) modalities is often necessary to make accurate and reliable diagnostic decisions. However, because these modalities are acquired at different speeds, gathering the information can be time-consuming and require significant effort. Reference-based MRI reconstruction aims to accelerate slower, under-sampled imaging modalities, such as the T2 modality, by exploiting redundant information from faster, fully sampled modalities, such as the T1 modality. Unfortunately, spatial misalignment between modalities often degrades the final results. To address this issue, we propose FEFA, which consists of cascaded FEFA blocks. Each FEFA block first aligns and fuses the two modalities at the feature level. The combined features are then filtered in the frequency domain to enhance the important features while suppressing the less essential ones, thereby ensuring accurate reconstruction. Furthermore, we emphasize the advantages of combining the reconstruction results from multiple cascaded blocks, which also helps stabilize training. Compared with existing registration-then-reconstruction and cross-attention-based approaches, our method is end-to-end trainable without requiring additional supervision, extensive parameters, or heavy computation. Experiments on the public fastMRI and IXI datasets and an in-house dataset demonstrate that our approach is effective across various under-sampling patterns and ratios. Our code will be available at: https://github.com/chenxm12394/FEFA.
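As a rough illustration of the frequency-domain filtering step described above, the sketch below applies a learnable per-frequency mask to a fused feature map so that informative frequencies can be amplified and less essential ones suppressed. The module name, mask parameterization, and tensor shapes are illustrative assumptions, not the FEFA implementation.

```python
# Illustrative sketch (not the authors' code): filter fused features in the
# frequency domain with a learnable per-frequency mask, as the abstract describes.
import torch
import torch.nn as nn

class FrequencyFilter(nn.Module):
    def __init__(self, channels, height, width):
        super().__init__()
        # Learnable mask over the half-spectrum produced by rfft2 (assumption).
        self.mask = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

    def forward(self, fused):                        # fused: (B, C, H, W)
        spec = torch.fft.rfft2(fused, norm="ortho")  # complex half-spectrum
        spec = spec * self.mask                      # amplify/suppress frequencies
        return torch.fft.irfft2(spec, s=fused.shape[-2:], norm="ortho")

x = torch.randn(1, 64, 32, 32)               # fused T1/T2 features (toy shapes)
print(FrequencyFilter(64, 32, 32)(x).shape)  # torch.Size([1, 64, 32, 32])
```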
ABSTRACT
Transformer-based image denoising methods have shown remarkable potential but suffer from high computational cost and a large memory footprint because of the linear operations used to capture long-range dependencies. In this work, we aim to develop a more resource-efficient Transformer-based image denoising method that maintains high performance. To this end, we propose an Efficient Wavelet Transformer (EWT), which incorporates a Frequency-domain Conversion Pipeline (FCP) to reduce image resolution without losing critical features, and a Multi-level Feature Aggregation Module (MFAM) with a Dual-stream Feature Extraction Block (DFEB) to harness hierarchical features effectively. EWT is over 80% faster and uses more than 60% less GPU memory than the original Transformer, while delivering denoising performance on par with state-of-the-art methods. Extensive experiments show that EWT significantly improves the efficiency of Transformer-based image denoising, providing a better balance between performance and resource consumption.
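The resolution-reduction idea can be made concrete with a plain discrete wavelet transform: one Haar DWT level turns an image into four half-resolution sub-bands that are cheaper to process and can be inverted without loss. This is a generic sketch of the principle under our own assumptions, not the FCP itself.

```python
# Sketch of wavelet-based resolution reduction (the general idea behind an
# FCP-style pipeline; details here are assumptions, not the EWT code).
import numpy as np
import pywt

img = np.random.rand(256, 256).astype(np.float32)

# One Haar DWT level: four half-resolution sub-bands instead of one full image.
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
subbands = np.stack([cA, cH, cV, cD])   # (4, 128, 128): cheaper to attend over

# ... a Transformer would process the stacked sub-bands here ...

# The inverse transform restores full resolution without losing the detail bands.
restored = pywt.idwt2((cA, (cH, cV, cD)), "haar")
print(subbands.shape, np.allclose(restored, img, atol=1e-5))
```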
Subject(s)
Algorithms; Image Processing, Computer-Assisted; Wavelet Analysis; Image Processing, Computer-Assisted/methods; Signal-To-Noise Ratio; Humans
ABSTRACT
Ultrasound image segmentation is challenging because of the complexity of lesion types, fuzzy boundaries, low contrast, and the presence of noise and artifacts. To address these issues, we propose an end-to-end multi-scale feature extraction and fusion network (MEF-UNet) for the automatic segmentation of ultrasound images. Specifically, we first design a selective feature extraction encoder, comprising a detail extraction stage and a structure extraction stage, to precisely capture the edge details and overall shape features of lesions. To enhance the representation of contextual information, we develop a context information storage module in the skip connections that integrates information from feature maps of adjacent layers. In addition, we design a multi-scale feature fusion module in the decoder to merge feature maps of different scales. Experimental results indicate that MEF-UNet significantly improves segmentation results in both quantitative analysis and visual quality.
Subject(s)
Algorithms; Artifacts; Ultrasonography; Image Processing, Computer-Assisted
ABSTRACT
Regularization-based methods are commonly used for image registration, but fixed regularizers have limited ability to capture details and to describe the dynamic registration process. To address this issue, we propose a time multiscale framework for nonlinear image registration. Our approach replaces the fixed regularizer with a monotonically decreasing sequence and iteratively uses the residual of the previous step as the input for registration. Specifically, we first introduce a dynamically varying regularization strategy that updates the regularizer at each iteration and incorporates it into a multiscale framework. This guarantees an overall smooth deformation field in the initial stage of registration and fine-tunes local details as the images become more similar. We then establish a convergence analysis under certain conditions on the regularizers and parameters. Furthermore, we introduce a TV-like regularizer to demonstrate the efficiency of our method. Finally, we compare the proposed multiscale algorithm with existing methods on both synthetic images and pulmonary computed tomography (CT) images. The experimental results validate that our algorithm outperforms the compared methods, especially in preserving details when registering images with sharp structures.
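A minimal sketch of the decreasing-regularizer idea, under several assumptions of our own (an SSD matching term, a TV-like smoothness penalty, and Adam as the inner solver), re-solves a toy registration problem with a smaller regularization weight at each outer step:

```python
# Toy sketch of the decreasing-regularizer strategy: the energy, optimizer and
# TV penalty are illustrative assumptions, not the paper's exact model.
import torch
import torch.nn.functional as F

def warp(img, disp):
    # img: (1, 1, H, W); disp: (1, 2, H, W) displacement in pixels
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([(xs + disp[:, 0]) / (W - 1) * 2 - 1,
                        (ys + disp[:, 1]) / (H - 1) * 2 - 1], dim=-1)
    return F.grid_sample(img, grid, align_corners=True)

def tv(disp):  # smoothness (total-variation-like) penalty on the field
    return (disp[..., 1:, :] - disp[..., :-1, :]).abs().mean() + \
           (disp[..., :, 1:] - disp[..., :, :-1]).abs().mean()

fixed, moving = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
disp = torch.zeros(1, 2, 64, 64, requires_grad=True)
opt = torch.optim.Adam([disp], lr=0.1)

for lam in [1.0, 0.3, 0.1, 0.03]:            # monotone decreasing regularizers
    for _ in range(50):                      # inner registration iterations
        loss = F.mse_loss(warp(moving, disp), fixed) + lam * tv(disp)
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"lambda={lam}: loss={loss.item():.4f}")
```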
Subject(s)
Image Processing, Computer-Assisted; Tomography, X-Ray Computed; Image Processing, Computer-Assisted/methods; Tomography, X-Ray Computed/methods; Algorithms
ABSTRACT
Survival prediction based on histopathological whole slide images (WSIs) is of great significance for risk-benefit assessment and clinical decision-making. However, the complex microenvironments and heterogeneous tissue structures in WSIs make it difficult to learn informative prognosis-related representations. Additionally, previous studies mainly model mono-scale WSIs, which ignores the useful subtle differences present in multi-zoom WSIs. To this end, we propose a deep multi-dictionary learning framework for cancer survival prediction with multi-zoom histopathological WSIs. The framework recognizes and learns discriminative clusters (i.e., microenvironments) based on multi-scale deep representations for survival analysis. Specifically, we learn multi-scale features from multi-zoom tiles of WSIs via a stacked deep autoencoder network and then group different microenvironments with a clustering algorithm. Based on the multi-scale deep features of the clusters, a multi-dictionary learning method with a post-pruning strategy is devised to learn discriminative representations from selected prognosis-related clusters in a task-driven manner. Finally, a survival model (i.e., EN-Cox) is constructed to estimate the risk index of an individual patient. The proposed model is evaluated on three datasets derived from The Cancer Genome Atlas (TCGA), and the experimental results demonstrate that it outperforms several state-of-the-art survival analysis approaches.
Subject(s)
Algorithms; Neoplasms; Humans; Neoplasms/genetics; Tumor Microenvironment
ABSTRACT
With the widespread application of digital orthodontics in the diagnosis and treatment of oral diseases, more and more researchers are focusing on accurate tooth segmentation from intraoral scan data, as the accuracy of the segmentation directly affects the dentist's follow-up diagnosis. Although current research on tooth segmentation has achieved promising results, the 3D intraoral scan datasets used are almost all indirect scans of plaster models and contain only limited samples of abnormal teeth, making them difficult to apply to clinical orthodontic scenarios. A key issue is the lack of a unified, standardized dataset for analyzing and validating the effectiveness of tooth segmentation. In this work, we focus on deformed-teeth segmentation and provide a fine-grained tooth segmentation dataset (3D-IOSSeg). The dataset consists of 3D intraoral scan data from more than 200 patients, with each sample labeled at the level of individual mesh units. Meanwhile, 3D-IOSSeg meticulously classifies every tooth in the upper and lower jaws. In addition, we propose a fast graph convolutional network for 3D tooth segmentation named Fast-TGCN, in which the relationship between adjacent mesh cells is established directly by the naive adjacency matrix to better extract the local geometric features of the teeth. Extensive experiments show that Fast-TGCN can quickly and accurately segment teeth with complex structures and outperforms other methods on various evaluation metrics. Moreover, we present the results of multiple classical tooth segmentation methods on this dataset, providing a comprehensive analysis of the field. All code and data will be available at https://github.com/MIVRC/Fast-TGCN.
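The adjacency-driven convolution mentioned above follows the general pattern sketched below: features of neighboring mesh cells are averaged through a self-loop-augmented adjacency matrix and then linearly projected. The toy graph and feature sizes are assumptions for illustration, not Fast-TGCN itself.

```python
# Minimal sketch of a graph convolution driven by a naive mesh adjacency
# matrix (the general operation Fast-TGCN builds on; shapes are toy assumptions).
import torch

def naive_graph_conv(X, A, W):
    # X: (N, F) per-cell features; A: (N, N) 0/1 adjacency; W: (F, F_out)
    A_hat = A + torch.eye(A.shape[0])        # keep each cell's own feature
    deg = A_hat.sum(dim=1, keepdim=True)     # row-normalize by neighbor count
    return torch.relu((A_hat / deg) @ X @ W)

N, F_in, F_out = 6, 8, 16                    # 6 mesh cells (toy)
X = torch.randn(N, F_in)
A = torch.zeros(N, N); A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.0
W = torch.randn(F_in, F_out)
print(naive_graph_conv(X, A, W).shape)       # torch.Size([6, 16])
```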
Subject(s)
Imaging, Three-Dimensional; Tooth; Humans; Imaging, Three-Dimensional/methods; Tooth/diagnostic imaging; Models, Dental
ABSTRACT
Diffusion-weighted imaging (DWI) has been extensively explored for guiding the clinical management of patients with breast cancer. However, due to the limited resolution, accurately characterizing tumors using DWI and the corresponding apparent diffusion coefficient (ADC) remains challenging. In this paper, we address the super-resolution (SR) of ADC images and evaluate the clinical utility of SR-ADC images through radiomics analysis. To this end, we propose a novel double transformer-based network (DTformer) to enhance the resolution of ADC images. More specifically, we propose a symmetric U-shaped encoder-decoder network with two different types of transformer blocks, named UTNet, to extract deep features for super-resolution. The basic backbone of UTNet is composed of a locally-enhanced Swin transformer block (LeSwin-T) and a convolutional transformer block (Conv-T), which capture long-range dependencies and local spatial information, respectively. Additionally, we introduce a residual upsampling network (RUpNet) to expand image resolution by leveraging initial residual information from the original low-resolution (LR) images. Extensive experiments show that DTformer achieves superior SR performance. Moreover, radiomics analysis reveals that improving the resolution of ADC images is beneficial for predicting tumor characteristics such as histological grade and human epidermal growth factor receptor 2 (HER2) status.
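One common way to leverage residual information from the LR input, in the spirit of what RUpNet is described to do, is to learn only the detail missing from a fixed interpolation. The sketch below is such a generic residual upsampler under assumed layer sizes, not the paper's architecture.

```python
# Sketch of a residual upsampling head: learn only the detail missing from a
# cheap interpolation of the LR input (an assumption about RUpNet's spirit).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualUpsampler(nn.Module):
    def __init__(self, channels=64, scale=2):
        super().__init__()
        self.scale = scale
        self.head = nn.Conv2d(1, channels, 3, padding=1)
        self.up = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 1, 3, padding=1))

    def forward(self, lr):                   # lr: (B, 1, h, w) ADC map
        base = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                             align_corners=False)
        return base + self.up(self.head(lr))  # interpolation + learned residual

print(ResidualUpsampler()(torch.randn(1, 1, 32, 32)).shape)  # (1, 1, 64, 64)
```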
Subject(s)
Breast Neoplasms; Humans; Female; Breast Neoplasms/diagnostic imaging; Diffusion Magnetic Resonance Imaging; Electric Power Supplies; Radiomics; Image Processing, Computer-Assisted
ABSTRACT
The score-based generative model (SGM) can generate high-quality samples and has been successfully adopted for magnetic resonance imaging (MRI) reconstruction. However, recent SGMs may take thousands of steps to generate a high-quality image, and they neglect the redundancy in k-space. To overcome these two drawbacks, we propose a fast and reliable SGM (FRSGM). First, we propose deep ensemble denoisers (DEDs), consisting of an SGM and a deep denoiser, to solve the proximal problem of the implicit regularization term. Second, we propose a spatially adaptive self-consistency (SASC) term as the regularization term for the k-space data. We use the alternating direction method of multipliers (ADMM) to solve the resulting compressed sensing (CS)-MRI model, which incorporates the image prior term and the SASC term and is significantly faster than related SGM-based works. Moreover, we prove that the iteration sequence of the proposed algorithm has a unique fixed point. In addition, the DED and the SASC term significantly improve the generalization ability of the algorithm. These properties, including the fixed-point convergence guarantee, the exploitation of k-space, and the strong generalization ability, make our algorithm reliable.
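How k-space redundancy is typically enforced can be shown with the standard hard data-consistency step used throughout CS-MRI; note that this generic step is not the paper's spatially adaptive SASC term.

```python
# Generic hard data-consistency step used throughout CS-MRI (shown to
# illustrate how sampled k-space data are enforced; not the paper's SASC term).
import numpy as np

def data_consistency(x, y, mask):
    """x: current image estimate; y: measured k-space; mask: sampled locations."""
    k = np.fft.fft2(x)
    k = mask * y + (1 - mask) * k          # trust measurements where sampled
    return np.fft.ifft2(k)

rng = np.random.default_rng(0)
gt = rng.random((64, 64))
mask = (rng.random((64, 64)) < 0.3).astype(float)  # 30% random sampling (toy)
y = mask * np.fft.fft2(gt)
x = np.abs(np.fft.ifft2(y))                        # zero-filled starting point
x = np.abs(data_consistency(x, y, mask))           # one ADMM-style DC update
print(np.abs(x - gt).mean())
```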
ABSTRACT
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) provides information on tumor morphology and physiology for breast cancer diagnosis and treatment. However, it requires contrast agent injection and longer acquisition times than other parametric images, such as T2-weighted imaging (T2WI). Current image synthesis methods attempt to map image data from one domain to another, but it is challenging or even infeasible to map images of one sequence to images of multiple sequences. Here, we propose a cross-parametric generative adversarial network (GAN)-based feature synthesis (CPGANFS) approach to generate discriminative DCE-MRI features from T2WI, with applications in breast cancer diagnosis. The approach decodes the T2W images into latent cross-parameter features to reconstruct the DCE-MRI and T2WI features by balancing the information shared between the two. A Wasserstein GAN with a gradient penalty is employed to differentiate the T2WI-generated features from ground-truth features extracted from DCE-MRI. The synthesized DCE-MRI feature-based model achieved significantly (p = 0.036) higher prediction performance (AUC = 0.866) in breast cancer diagnosis than the T2WI-based model (AUC = 0.815). Visualization shows that CPGANFS enhances predictive power by elevating attention to the lesion and the surrounding parenchyma, driven by the interparametric information learned from T2WI and DCE-MRI. CPGANFS thus provides a framework for cross-parametric MR image feature generation from a single-sequence image, guided by an information-rich, time-series image with kinetic information. Extensive experimental results demonstrate its effectiveness, high interpretability, and improved performance in breast cancer diagnosis.
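The gradient penalty mentioned above has a standard form, sketched below for feature vectors; the critic and feature dimensions are toy assumptions, not those of CPGANFS.

```python
# Standard WGAN gradient penalty (the generic form; dimensions are toy
# assumptions, not those of CPGANFS).
import torch

def gradient_penalty(critic, real_feat, fake_feat):
    eps = torch.rand(real_feat.size(0), 1)           # per-sample mixing weight
    mix = (eps * real_feat + (1 - eps) * fake_feat).requires_grad_(True)
    score = critic(mix).sum()
    grad, = torch.autograd.grad(score, mix, create_graph=True)
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()   # push gradient norm to 1

critic = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.LeakyReLU(),
                             torch.nn.Linear(64, 1))
real, fake = torch.randn(8, 128), torch.randn(8, 128)
print(gradient_penalty(critic, real, fake).item())
```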
Subject(s)
Breast Neoplasms; Magnetic Resonance Imaging; Humans; Female; Magnetic Resonance Imaging/methods; Breast/pathology; Breast Neoplasms/pathology; Contrast Media
ABSTRACT
Since magnetic resonance imaging (MRI) requires a long acquisition time, various methods have been proposed to reduce it, but they ignore frequency information and non-local similarity and thus fail to reconstruct images with clear structure. In this article, we propose Frequency Learning via Multi-scale Fourier Transformer for MRI Reconstruction (FMTNet), which focuses on restoring low-frequency and high-frequency information. Specifically, FMTNet is composed of a high-frequency learning branch (HFLB) and a low-frequency learning branch (LFLB). We propose a Multi-scale Fourier Transformer (MFT) as the basic module to learn non-local information. Unlike a normal Transformer, MFT adopts Fourier convolution in place of self-attention to learn global information efficiently. Moreover, we introduce a multi-scale learning and cross-scale linear fusion strategy in MFT to exchange information between features of different scales and strengthen the feature representation. Compared with a normal Transformer, the proposed MFT occupies fewer computing resources. Based on MFT, we design a Residual Multi-scale Fourier Transformer module as the main component of the HFLB and LFLB. Experiments under different acceleration rates and sampling patterns on different datasets show that our method is superior to previous state-of-the-art methods.
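Replacing self-attention with a Fourier-domain operation can be sketched in the fast-Fourier-convolution style below: a 1x1 convolution over the real and imaginary parts of the spectrum mixes information globally at O(N log N) cost. This is our own illustrative variant, not the actual MFT design.

```python
# Sketch of "Fourier convolution instead of self-attention": a pointwise
# convolution in the spectral domain mixes information globally
# (an FFC-style sketch; FMTNet's actual MFT design is not reproduced here).
import torch
import torch.nn as nn

class FourierConv(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Operates on real+imag parts stacked along the channel axis.
        self.mix = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x):                             # x: (B, C, H, W)
        spec = torch.fft.rfft2(x, norm="ortho")
        z = torch.cat([spec.real, spec.imag], dim=1)  # (B, 2C, H, W//2+1)
        z = self.mix(z)
        re, im = z.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(re, im), s=x.shape[-2:],
                                norm="ortho")

print(FourierConv(32)(torch.randn(2, 32, 24, 24)).shape)
```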
Subject(s)
Electric Power Supplies; Magnetic Resonance Imaging; Humans
ABSTRACT
Assessments of multiple clinical indicators based on radiomic analysis of magnetic resonance imaging (MRI) benefit the diagnosis, prognosis, and treatment of breast cancer patients. Many machine learning methods have been designed to jointly predict multiple indicators for more accurate assessment, but they use the original clinical labels directly without considering the noisy and redundant information among them. To this end, we propose a multilabel learning method based on label space dimensionality reduction (LSDR) that learns common and task-specific features via graph regularized nonnegative matrix factorization (CTFGNMF) for the joint prediction of multiple indicators in breast cancer. A nonnegative matrix factorization (NMF) maps the original clinical labels to a low-dimensional latent space. The latent labels are employed to exploit task correlations through a least squares loss function with [Formula: see text]-norm regularization that identifies common features, which helps improve the generalization performance of correlated tasks. Furthermore, task-specific features are retained by a multitask regression formulation to increase the discriminative power for different tasks. Common and task-specific features are incorporated by dynamic graph Laplacian regularization into a unified model to learn complementary features. A multilabel classifier is then built to predict multiple clinical indicators, including human epidermal growth factor receptor 2 (HER2), Ki-67, and histological grade. Experimental results show that CTFGNMF achieves AUCs of 0.823, 0.691, and 0.776 on the three indicator predictions, outperforming counterparts that consider only task-independent features or only common features. This indicates that CTFGNMF is a promising approach for multiple classification tasks in breast cancer.
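The label-space dimensionality reduction step can be illustrated with a generic NMF of the (patients x indicators) label matrix; the matrix sizes and hyperparameters below are toy assumptions.

```python
# Sketch of the label-space dimensionality reduction step: factor the
# (patients x indicators) label matrix into nonnegative latent labels
# (a generic NMF illustration; sizes are toy assumptions).
import numpy as np
from sklearn.decomposition import NMF

Y = np.random.randint(0, 2, size=(100, 3)).astype(float)  # HER2, Ki-67, grade
nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
L = nmf.fit_transform(Y)   # (100, 2) latent labels used to train the model
B = nmf.components_        # (2, 3) maps latent labels back to indicators
print(L.shape, B.shape, np.linalg.norm(Y - L @ B))
```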
ABSTRACT
The Retinex model is one of the most representative and effective methods for low-light image enhancement. However, it does not explicitly tackle noise and can show unsatisfactory enhancement results. In recent years, deep learning models have been widely used in low-light image enhancement owing to their excellent performance, but they have two limitations. First, deep learning achieves desirable performance only when a large number of labeled data are available, and it is not easy to curate massive low-/normal-light paired data. Second, deep learning models are notoriously black boxes: it is difficult to explain their inner working mechanisms and understand their behavior. In this article, using a sequential Retinex decomposition strategy, we design a plug-and-play framework based on Retinex theory for simultaneous image enhancement and noise removal. We further integrate a convolutional neural network (CNN)-based denoiser into the framework to generate the reflectance component. The final image is enhanced by recombining the illumination and reflectance with gamma correction. The proposed plug-and-play framework facilitates both post hoc and ad hoc interpretability. Extensive experiments on different datasets demonstrate that our framework outperforms state-of-the-art methods in both image enhancement and denoising.
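The final recombination step can be sketched as follows, where a crude max-RGB estimate stands in for the illumination that the plug-and-play iterations would actually produce; the gamma value and the illumination estimate are assumptions for illustration.

```python
# Sketch of the final recombination: enhanced = reflectance * gamma-corrected
# illumination (the decomposition itself, done by the plug-and-play
# iterations in the paper, is replaced here by a crude max-RGB proxy).
import numpy as np

def enhance(img, gamma=2.2, eps=1e-4):
    # img in [0, 1], shape (H, W, 3)
    illum = img.max(axis=2, keepdims=True)      # crude illumination proxy
    reflect = img / (illum + eps)               # Retinex: I = R * L
    return np.clip(reflect * illum ** (1.0 / gamma), 0, 1)

low_light = np.random.rand(64, 64, 3) * 0.2    # toy dark image
print(enhance(low_light).mean() > low_light.mean())  # brighter on average
```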
ABSTRACT
Purpose: During neoadjuvant chemotherapy (NACT), breast tumor morphological and vascular characteristics usually change. This study aimed to evaluate the tumor shrinkage pattern and response to NACT by preoperative multiparametric magnetic resonance imaging (MRI), including dynamic contrast-enhanced MRI (DCE-MRI), diffusion-weighted imaging (DWI), and T2-weighted imaging (T2WI).
Method: In this retrospective analysis, female patients with unilateral unifocal primary breast cancer were included for predicting tumor pathologic/clinical response to NACT (n=216; development set, n=151 and validation set, n=65) and for discriminating the tumor concentric shrinkage (CS) pattern from the others (n=193; development set, n=135 and validation set, n=58). Radiomic features (n=102), comprising first-order statistical, morphological, and textural features, were calculated on tumors from the multiparametric MRI. Single- and multiparametric image-based features were assessed separately and further combined to feed a random forest-based predictive model. The model was trained on the development set and assessed on the validation set using the area under the curve (AUC). Molecular subtype information and radiomic features were fused to enhance the predictive performance.
Results: The DCE-MRI-based model showed higher performance (AUCs of 0.919, 0.830, and 0.825 for tumor pathologic response, clinical response, and tumor shrinkage pattern, respectively) than either the T2WI- or the ADC-based model. A further increase in prediction performance was achieved by fusing multiparametric MRI radiomic features.
Conclusions: These results demonstrate that multiparametric MRI features and their information fusion could be of important clinical value for the preoperative prediction of treatment response and shrinkage pattern.
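A minimal sketch of the evaluation pipeline described in the Method section (radiomic feature matrix, random forest, AUC) might look like the following; the feature values are synthetic and the model untuned, so this is only a shape-compatible stand-in for the study's pipeline.

```python
# Minimal sketch of the described pipeline: features -> random forest -> AUC
# (synthetic data; not the study's data or tuned model).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_dev, y_dev = rng.random((151, 102)), rng.integers(0, 2, 151)  # 102 features
X_val, y_val = rng.random((65, 102)), rng.integers(0, 2, 65)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_dev, y_dev)
print("AUC:", roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))
```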
ABSTRACT
The long acquisition time has limited the accessibility of magnetic resonance imaging (MRI) because it leads to patient discomfort and motion artifacts. Among the MRI techniques proposed to reduce acquisition time, compressed sensing MRI (CS-MRI) enables fast acquisition without compromising SNR and resolution. However, existing CS-MRI methods suffer from aliasing artifacts, which produce noise-like textures and missing fine details, leading to unsatisfactory reconstruction. To tackle this challenge, we propose a hierarchical perception adversarial learning framework (HP-ALF). HP-ALF perceives image information through a hierarchical mechanism: image-level perception and patch-level perception. The former reduces the visual perception difference over the entire image and thereby removes aliasing artifacts; the latter reduces this difference within regions of the image and thereby recovers fine details. Specifically, HP-ALF achieves the hierarchical mechanism with multilevel perspective discrimination, which provides information from two perspectives (overall and regional) for adversarial learning. It also utilizes a global and local coherent discriminator to provide structure information to the generator during training. In addition, HP-ALF contains a context-aware learning block to effectively exploit the slice information between individual images for better reconstruction. Experiments on three datasets demonstrate the effectiveness of HP-ALF and its superiority over comparative methods.
Subject(s)
Deep Learning; Humans; Magnetic Resonance Imaging/methods; Artifacts; Visual Perception; Image Processing, Computer-Assisted/methods
ABSTRACT
Recently, face super-resolution methods steered by deep convolutional neural networks (CNNs) have achieved great progress in restoring degraded facial details through joint training with facial priors. However, these methods have obvious limitations. On the one hand, multi-task joint learning requires additional annotation of the dataset, and the introduced prior network significantly increases the computational cost of the model. On the other hand, the limited receptive field of CNNs reduces the fidelity and naturalness of the reconstructed facial images, yielding suboptimal results. In this work, we propose an efficient CNN-Transformer Cooperation Network (CTCNet) for face super-resolution, which uses a multi-scale connected encoder-decoder architecture as its backbone. Specifically, we first devise a novel Local-Global Feature Cooperation Module (LGCM), composed of a Facial Structure Attention Unit (FSAU) and a Transformer block, to promote the consistency of local facial detail and global facial structure restoration. Then, we design an efficient Feature Refinement Module (FRM) to enhance the encoded features. Finally, to further improve the restoration of fine facial details, we present a Multi-scale Feature Fusion Unit (MFFU) to adaptively fuse features from different stages of the encoder. Extensive evaluations on various datasets show that the proposed CTCNet significantly outperforms other state-of-the-art methods. Source code will be available at https://github.com/IVIPLab/CTCNet.
Subject(s)
Learning; Neural Networks, Computer; Software; Image Processing, Computer-Assisted
ABSTRACT
Scene recovery is a fundamental imaging task with several practical applications, including video surveillance and autonomous vehicles. In this article, we provide a new real-time scene recovery framework to restore images degraded under different weather/imaging conditions, such as underwater, sand dust, and haze. A degraded image can be seen as the superimposition of a clear image with a color veil determined by the imaging environment (underwater, sand, haze, etc.). Mathematically, we introduce a rank-one matrix to characterize this phenomenon, i.e., the rank-one prior (ROP). Using this prior, a direct method with O(N) complexity is derived for real-time recovery. For general cases, we develop ROP+ to further improve recovery performance. Comprehensive scene recovery experiments illustrate that our method outperforms several state-of-the-art imaging methods in efficiency and robustness.
ABSTRACT
Discovering hidden patterns in imbalanced data is a critical issue in many real-world applications. Existing classification methods usually suffer from limited data, especially for minority classes, resulting in unstable predictions and low performance. In this paper, a deep generative classifier is proposed to mitigate this issue via both model perturbation and data perturbation. Specifically, the classifier is derived from a deep latent variable model involving two variables. One variable captures the essential information of the original data, denoted as latent codes, which are represented by a probability distribution rather than a single fixed value. The learned distribution enforces model uncertainty and implements model perturbation, leading to stable predictions. The other variable is a prior on the latent codes that restricts them to lie on components of a Gaussian mixture model. As a confounder affecting the generative processes of the data (features/labels), the latent variables are expected to capture the discriminative latent distribution and implement data perturbation. Extensive experiments have been conducted on widely used real imbalanced image datasets. The results demonstrate the superiority of our model over popular imbalanced classification baselines.
ABSTRACT
Semantic segmentation has achieved great progress by effectively fusing contextual information. In this article, we propose an end-to-end semantic attention boosting (SAB) framework that adaptively fuses contextual information iteratively across layers with semantic regularization. Specifically, we first propose a pixelwise semantic attention (SAP) block, with a semantic metric representing the pixelwise category relationship, to aggregate non-local contextual information. We also reduce the computational complexity of the SAP block from O(n²) to O(n) for images of size n. Second, we present a categorywise semantic attention (SAC) block that adaptively balances non-local contextual dependencies and local consistency with a categorywise weight, overcoming the contextual confusion caused by feature imbalance within each category. Furthermore, we propose the SAB module, which refines the segmentation with the SAC and SAP blocks. By applying the SAB module iteratively across layers, our model shrinks the semantic gap and enhances structure reasoning by fully utilizing the coarse segmentation information. Extensive quantitative evaluations demonstrate that our method significantly improves segmentation results and achieves superior performance on the PASCAL VOC 2012, Cityscapes, PASCAL Context, and ADE20K datasets.
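The O(n²)-to-O(n) reduction can be illustrated with the generic associativity trick used by many linear-attention variants: computing K^T V (a d x d matrix) before multiplying by Q avoids materializing the n x n affinity map. SAP's exact formulation differs; this only shows the complexity argument.

```python
# Generic associativity trick behind many O(n) attention variants (shown only
# to illustrate the complexity reduction; not SAP's exact formulation).
import torch

n, d = 4096, 64
Q, K, V = (torch.softmax(torch.randn(n, d), dim=-1) for _ in range(3))

quadratic = (Q @ K.T) @ V   # O(n^2 d): materializes an n x n matrix
linear = Q @ (K.T @ V)      # O(n d^2): same result by associativity

print(torch.allclose(quadratic, linear, atol=1e-4))
```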
ABSTRACT
Junctions play an important role in biomedical research, including retinal biometric identification, retinal image registration, eye-related disease diagnosis, and neuron reconstruction. However, junction detection in original biomedical images is extremely challenging: retinal images, for example, contain many tiny blood vessels with complicated structures and low contrast. In this paper, we propose an O-shape network architecture with attention modules (Attention O-Net), comprising a Junction Detection Branch (JDB) and a Local Enhancement Branch (LEB), to detect junctions in biomedical images without segmentation. The JDB estimates a heatmap indicating the probability of a junction at each position and then selects the positions with the locally highest values as junctions. Since detection is difficult when images contain weak filament signals, the LEB enhances the thin-branch foreground and makes the network attend to low-contrast regions, which helps alleviate the foreground imbalance between thin and thick branches and detect junctions on thin branches. Furthermore, attention modules feed the feature maps of the LEB into the JDB, establishing a complementary relationship and further integrating local features and contextual information between the two branches. The proposed method achieves the highest average F1-scores of 0.82, 0.73, and 0.94 on two retinal datasets and one neuron dataset, respectively. The experimental results confirm that Attention O-Net outperforms other state-of-the-art detection methods and is helpful for retinal biometric identification.
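Turning a junction heatmap into detections, as the JDB is described to do, typically amounts to keeping pixels that are both above a score threshold and maximal in their neighborhood; the threshold and window size below are assumptions.

```python
# Sketch of heatmap-to-junction decoding: keep pixels that are above a score
# threshold and the maximum in their neighborhood (standard non-maximum
# suppression; threshold and window are assumptions).
import numpy as np
from scipy.ndimage import maximum_filter

def pick_junctions(heatmap, thresh=0.5, window=5):
    local_max = maximum_filter(heatmap, size=window) == heatmap
    return np.argwhere(local_max & (heatmap > thresh))  # (row, col) junctions

heat = np.zeros((64, 64)); heat[10, 20] = 0.9; heat[40, 40] = 0.8
print(pick_junctions(heat))   # [[10 20] [40 40]]
```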
Subject(s)
Algorithms; Image Processing, Computer-Assisted; Humans; Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Retina/diagnostic imaging
ABSTRACT
Blind image deblurring is a conundrum because there are infinitely many pairs of latent image and blur kernel. Obtaining a stable and reasonable deblurred image therefore requires proper prior knowledge of both the latent image and the blur kernel. Unlike recent works built on statistical observations of the difference between blurred and clean images, our method is built on a surface-aware strategy arising from intrinsic geometrical considerations. This approach facilitates blur kernel estimation because sharp edges are preserved in the intermediate latent image. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on deblurring text and natural images. Moreover, it achieves attractive results in challenging cases, such as low-illumination images with large saturated regions and impulse noise. A direct extension of our method to the non-uniform deblurring problem also validates the effectiveness of the surface-aware prior.