Results 1 - 20 of 86
1.
Sensors (Basel) ; 24(14)2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39066083

ABSTRACT

Infrared images hold significant value in applications such as remote sensing and fire safety. However, infrared detectors often suffer from high hardware costs, which limits their widespread use. Advancements in deep learning have spurred innovative approaches to image super-resolution (SR), but comparatively few efforts have been dedicated to infrared images. To address this, we design the Residual Swin Transformer and Average Pooling Block (RSTAB) and propose SwinAIR, which effectively extracts and fuses the diverse frequency features in infrared images and achieves superior SR reconstruction performance. By further integrating SwinAIR with U-Net, we propose SwinAIR-GAN for real infrared image SR reconstruction. SwinAIR-GAN extends the degradation space to better simulate the degradation process of real infrared images. Additionally, it incorporates spectral normalization, dropout, and an artifact discrimination loss to reduce potential image artifacts. Qualitative and quantitative evaluations on various datasets confirm the effectiveness of the proposed method in reconstructing realistic textures and details in infrared images.

2.
Sensors (Basel) ; 24(16)2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39204794

ABSTRACT

In this paper, we propose a Local Global Union Network (LGUN), which combines the strengths of Transformers and Convolutional Networks to develop a lightweight, high-performance network for Single Image Super-Resolution (SISR). Specifically, we exploit the advantages of Transformers, namely input-adaptive weighting and global context interaction, and those of Convolutional Networks, namely spatial inductive biases and local connectivity. In the shallow layers, local spatial information is encoded by Multi-order Local Hierarchical Attention (MLHA). In the deeper layers, we utilize Dynamic Global Sparse Attention (DGSA), which builds on a Multi-stage Token Selection (MTS) strategy to model global context dependencies. Extensive experiments on both natural and satellite datasets, acquired with optical and satellite sensors, respectively, demonstrate that LGUN outperforms existing methods.
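The contrast drawn above, input-adaptive weighting from attention versus fixed weighting from convolution, can be made concrete in a few lines. Below is a minimal pure-Python scaled dot-product attention over toy vectors; it is a generic illustration of input-adaptive weighting, not LGUN's actual MLHA or DGSA modules.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention: the weights depend on the input itself."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    w = softmax(scores)  # input-adaptive weights, unlike a fixed conv kernel
    dim = len(values[0])
    return [sum(w[i] * values[i][j] for i in range(len(values))) for j in range(dim)]

# The query matches the first key, so the output leans toward the first value:
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
vals = [[10.0, 0.0], [0.0, 10.0]]
out = attention(q, keys, vals)
```

A convolution would apply the same learned kernel to every input; here, changing `q` changes the mixing weights themselves.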

3.
Sensors (Basel) ; 24(13)2024 Jun 21.
Article in English | MEDLINE | ID: mdl-39000811

ABSTRACT

3D digital-image correlation (3D-DIC) is a non-contact optical technique for full-field shape, displacement, and deformation measurement. Given the high hardware costs associated with 3D-DIC experiments, high-fidelity 3D-DIC simulation holds significant value. However, existing research on 3D-DIC simulation has mainly been carried out through the generation of random speckle images. This study proposes a complete 3D-DIC simulation method covering both optical and mechanical simulation and integrating 3D-DIC, virtual stereo vision, and image super-resolution reconstruction. Virtual stereo vision can reduce hardware costs and eliminate camera-synchronization errors. Image super-resolution reconstruction can compensate for the loss of precision caused by reduced image resolution. An array of software tools, including ANSYS SPEOS 2024R1, ZEMAX 2024R1, MECHANICAL 2024R1, and MULTIDIC v1.1.0, is used to implement the simulation. Measurement systems based on stereo vision and virtual stereo vision were built and tested for use in 3D-DIC. The simulation results show that when the synchronization error of the basic stereo-vision system (BSS) is within 10^-3 time steps, the reconstruction error is within 0.005 mm, and the accuracy of the virtual stereo-vision system lies between that of the BSS with synchronization errors of 10^-7 and 10^-6 time steps. In addition, after image super-resolution reconstruction is applied, the reconstruction error is reduced to within 0.002 mm. The proposed simulation method offers a novel research path for existing researchers in the field while also allowing researchers without access to costly hardware to participate in related research.
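As a rough illustration of why camera-synchronization error matters in a stereo (or virtual stereo) setup like the one above, the pinhole stereo relation Z = f·B/d converts a small disparity error, such as one induced by a timing offset between cameras viewing a moving speckle pattern, into a depth (reconstruction) error. All numbers below are illustrative assumptions, not values from the study.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Pinhole stereo model: depth Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px

# A synchronization error between the two cameras shifts one view of a
# moving target, which shows up as a disparity error and hence a depth error:
f, B = 2000.0, 100.0        # focal length [px], baseline [mm] (assumed)
d_true, d_err = 50.0, 0.01  # true disparity and sync-induced error [px] (assumed)
z_true = depth_from_disparity(f, B, d_true)          # 4000.0 mm
z_meas = depth_from_disparity(f, B, d_true + d_err)
depth_error = abs(z_true - z_meas)                   # sub-millimetre error
```

This is only the geometric core; the study's full pipeline adds optical and mechanical simulation on top of it.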

4.
Sensors (Basel) ; 24(13)2024 Jun 26.
Article in English | MEDLINE | ID: mdl-39000923

ABSTRACT

Detail preservation is a major challenge for single image super-resolution (SISR). Many deep learning-based SISR methods focus on lightweight network design, but these may fall short in real-world scenarios where performance is prioritized over network size. To address these problems, we propose a novel plug-and-play attention module, rich elastic mixed attention (REMA), for SISR. REMA comprises the rich spatial attention module (RSAM) and the rich channel attention module (RCAM), both built on Rich Structure. Based on the results of our research on the module's structure, size, performance, and compatibility, Rich Structure is proposed to enhance REMA's adaptability to varying input complexities and task requirements. RSAM learns the mutual dependencies of multiple LR-HR pairs and multi-scale features, while RCAM accentuates key features through interactive learning, effectively addressing detail loss. Extensive experiments demonstrate that REMA significantly improves performance and compatibility in SR networks compared to other attention modules. The REMA-based SR network (REMA-SRNet) outperforms comparative algorithms in both visual effects and objective evaluation quality. Additionally, we find that module compatibility correlates with cardinality and in-branch feature bandwidth, and that networks with high effective parameter counts exhibit enhanced robustness across various datasets and scale factors in SISR.

5.
Sensors (Basel) ; 24(13)2024 Jun 30.
Article in English | MEDLINE | ID: mdl-39001038

ABSTRACT

The accurate detection of electrical equipment states and faults is crucial for the reliable operation of such equipment and for maintaining the health of the overall power system. The state of power equipment can be effectively monitored through deep learning-based visual inspection methods, which provide essential information for diagnosing and predicting equipment failures. However, there are significant challenges: on the one hand, electrical equipment typically operates in complex environments, thus resulting in captured images that contain environmental noise, which significantly reduces the accuracy of state recognition based on visual perception. This, in turn, affects the comprehensiveness of the power system's situational awareness. On the other hand, visual perception is limited to obtaining the appearance characteristics of the equipment. The lack of logical reasoning makes it difficult for purely visual analysis to conduct a deeper analysis and diagnosis of the complex equipment state. Therefore, to address these two issues, we first designed an image super-resolution reconstruction method based on the Generative Adversarial Network (GAN) to filter environmental noise. Then, the pixel information is analyzed using a deep learning-based method to obtain the spatial feature of the equipment. Finally, by constructing the logic diagram for electrical equipment clusters, we propose an interpretable fault diagnosis method that integrates the spatial features and temporal states of the electrical equipment. To verify the effectiveness of the proposed algorithm, extensive experiments are conducted on six datasets. The results demonstrate that the proposed method can achieve high accuracy in diagnosing electrical equipment faults.

6.
Sensors (Basel) ; 24(11)2024 May 31.
Article in English | MEDLINE | ID: mdl-38894350

ABSTRACT

With the development of deep learning, Super-Resolution (SR) reconstruction of microscopic images has improved significantly. However, the scarcity of microscopic images for training, the underutilization of hierarchical features in the original Low-Resolution (LR) images, and the high-frequency noise unrelated to image structure generated during reconstruction remain challenges in the Single Image Super-Resolution (SISR) field. To address these issues, we first collected sufficient microscopic images through Motic, a company engaged in the design and production of optical and digital microscopes, to establish a dataset. Secondly, we propose a Residual Dense Attention Generative Adversarial Network (RDAGAN). The network comprises a generator, an image discriminator, and a feature discriminator. The generator includes a Residual Dense Block (RDB) and a Convolutional Block Attention Module (CBAM), focusing on extracting the hierarchical features of the original LR image. The added feature discriminator enables the network to generate high-frequency features pertinent to the image's structure. Finally, we conducted an experimental analysis comparing our model with six classic models. Compared with the best of these, our model improved PSNR and SSIM by about 1.5 dB and 0.2, respectively.

7.
Sensors (Basel) ; 24(3)2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38339649

ABSTRACT

Terahertz (THz) waves are electromagnetic waves in the 0.1 to 10 THz frequency range, and THz imaging is utilized in a range of applications, including security inspections, biomedical fields, and the non-destructive examination of materials. However, THz images have low resolution due to the long wavelength of THz waves, so improving their resolution is an active research topic. We propose a novel network architecture called J-Net, an improved version of U-Net, for THz image super-resolution. It employs simple baseline blocks that efficiently extract low-resolution (LR) image features and learn the mapping from LR to high-resolution (HR) images. All training was conducted on the DIV2K+Flickr2K dataset, with the peak signal-to-noise ratio (PSNR) used for quantitative comparison. J-Net achieved a PSNR of 32.52 dB, surpassing other THz image super-resolution techniques by more than 1 dB, and it also delivers better PSNR and visual quality than other methods on real THz images.
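PSNR, the metric quoted above and throughout this listing, is defined as 10·log10(MAX²/MSE). A minimal pure-Python sketch for flat 8-bit image arrays (real evaluations typically rely on library implementations):

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    assert len(img_a) == len(img_b), "images must have the same size"
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# A uniform error of 2 intensity levels out of 255 gives roughly 42 dB:
ref = [100] * 64
est = [102] * 64
value = psnr(ref, est)
```

Higher is better; a gain of 1 dB, as reported for J-Net, corresponds to a noticeably lower mean squared error.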

8.
Sensors (Basel) ; 24(15)2024 Jul 27.
Article in English | MEDLINE | ID: mdl-39123937

ABSTRACT

In the field of endoscopic imaging, challenges such as low resolution, complex textures, and blurred edges often degrade the quality of 3D reconstructed models. To address these issues, this study introduces an innovative endoscopic image super-resolution and 3D reconstruction technique named Omni-Directional Focus and Scale Resolution (OmDF-SR). This method integrates an Omnidirectional Self-Attention (OSA) mechanism, an Omnidirectional Scale Aggregation Group (OSAG), a Dual-stream Adaptive Focus Mechanism (DAFM), and a Dynamic Edge Adjustment Framework (DEAF) to enhance the accuracy and efficiency of super-resolution processing. Additionally, it employs Structure from Motion (SfM) and Multi-View Stereo (MVS) technologies to achieve high-precision medical 3D models. Experimental results indicate significant improvements in image processing with a PSNR of 38.2902 dB and an SSIM of 0.9746 at a magnification factor of ×2, and a PSNR of 32.1723 dB and an SSIM of 0.9489 at ×4. Furthermore, the method excels in reconstructing detailed 3D models, enhancing point cloud density, mesh quality, and texture mapping richness, thus providing substantial support for clinical diagnosis and surgical planning.

9.
Hum Brain Mapp ; 44(9): 3781-3794, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37186095

ABSTRACT

The pedunculopontine nucleus (PPN) is a small brainstem structure that has attracted attention as a potentially effective deep brain stimulation (DBS) target for the treatment of Parkinson's disease (PD). However, the in vivo location of the PPN remains poorly described, and the nucleus is barely visible on conventional structural magnetic resonance (MR) images due to a lack of spatial resolution and tissue contrast. This study aims to delineate the PPN on a high-resolution (HR) atlas and investigate the visibility of the PPN in individual quantitative susceptibility mapping (QSM) images. We combine a recently constructed Montreal Neurological Institute (MNI) space unbiased QSM atlas (MuSus-100) with an implicit representation-based self-supervised image super-resolution (SR) technique to obtain an atlas with improved spatial resolution. Then, guided by a myelin staining histology human brain atlas, we localize and delineate the PPN on the improved-resolution atlas. Furthermore, we examine the feasibility of directly identifying the approximate PPN location on individual 3.0-T QSM MR images. The proposed SR network produces atlas images with a four-fold improvement in spatial resolution (from 1 mm to 0.25 mm isotropic) without requiring a training dataset. The SR process also reduces artifacts and preserves image contrast for delineating small deep brain nuclei such as the PPN. Using the myelin staining histological atlas as guidance, we first identify and annotate the location of the PPN on the T1-weighted (T1w)-QSM hybrid MR atlas with improved resolution in the MNI space. We then validate that the optimal targeting site for PPN-DBS lies in the middle-to-caudal part of the PPN on our atlas. Furthermore, we confirm that the PPN region can be identified in individual QSM images of 10 patients with PD and 10 healthy young adults.
The contrast ratios of the PPN to its adjacent structure, the medial lemniscus, on images of different modalities indicate that QSM substantially improves the visibility of the PPN in both the atlas and individual images. Our findings indicate that the proposed SR network is an efficient tool for identifying small brain nuclei. HR QSM is promising for improving the visibility of the PPN, and the PPN can be identified directly on individual QSM images acquired at 3.0-T MR scanners, facilitating direct targeting of the PPN for DBS surgery.
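The contrast ratio used above to quantify visibility can be illustrated with a simple Michelson-style definition over mean region-of-interest intensities. Both this particular formula and the intensity values are assumptions for illustration; the study may define the ratio differently.

```python
def contrast_ratio(mean_roi, mean_adjacent):
    """Michelson-style contrast: |S_roi - S_adj| / (S_roi + S_adj), in [0, 1]."""
    return abs(mean_roi - mean_adjacent) / (mean_roi + mean_adjacent)

# Hypothetical mean signal values for PPN vs. the adjacent medial lemniscus
# on two modalities; a higher ratio means the nucleus is easier to see:
cr_t1w = contrast_ratio(120.0, 110.0)  # similar intensities -> low contrast
cr_qsm = contrast_ratio(80.0, 40.0)    # distinct intensities -> high contrast
```

Under this definition, the QSM-style example yields roughly an eight-fold higher contrast ratio than the T1w-style example.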


Subject(s)
Deep Brain Stimulation , Pedunculopontine Tegmental Nucleus , Young Adult , Humans , Magnetic Resonance Imaging/methods , Pedunculopontine Tegmental Nucleus/diagnostic imaging , Brain/diagnostic imaging , Brain Mapping/methods , Deep Brain Stimulation/methods
10.
Prev Med ; 173: 107590, 2023 08.
Article in English | MEDLINE | ID: mdl-37364796

ABSTRACT

As societal pressures grow, the mental health problems of college students are becoming increasingly prominent, posing challenges for their education and management. Universities should not only develop students' theoretical and professional knowledge and practical skills but also attend to their mental health and implement psychological education effectively. It is therefore necessary to develop a simple and effective student psychological evaluation system. Online ideological and political work, a new form of such education in the era of big data, has room for further development; universities should make full use of online education to strengthen their ability to address students' mental health problems. To this end, this work designs and implements a system that combines image super-resolution recognition with artificial intelligence. Built on a B/S (browser/server) architecture using .NET and web server technology, it allows students to connect from a variety of terminals. In addition, an image super-resolution recognition algorithm is proposed that uses clustered convolution to improve the residual blocks, extracts features at a larger scale to improve modeling ability, and reduces the number of parameters to improve computational efficiency, enabling mental health educators and managers to work more effectively. This article applies image super-resolution recognition together with artificial intelligence to psychological education in universities, thereby supporting the development of applications for addressing students' mental health problems.


Subject(s)
Artificial Intelligence , Students , Humans , Educational Status , Health Education , Algorithms , Universities
11.
Sensors (Basel) ; 23(23)2023 Dec 02.
Article in English | MEDLINE | ID: mdl-38067950

ABSTRACT

Traditional Convolutional Neural Network (ConvNet, CNN)-based image super-resolution (SR) methods have lower computation costs, making them better suited to real-world scenarios, but they suffer from lower performance. Vision Transformer (ViT)-based SR methods, by contrast, have recently achieved impressive performance but often incur high computation costs and model storage overhead, making it hard to meet the requirements of practical applications, where an SR model should reconstruct images with high quality and fast inference. To handle this issue, we propose a novel CNN-based Efficient Residual ConvNet enhanced with structural Re-parameterization (RepECN) for a better trade-off between performance and efficiency. A stage-to-block hierarchical architecture design paradigm inspired by ViT is utilized to keep state-of-the-art performance, while efficiency is ensured by abandoning the time-consuming Multi-Head Self-Attention (MHSA) and by re-designing the block-level modules based on CNNs. Specifically, RepECN consists of three structural modules: a shallow feature extraction module, a deep feature extraction module, and an image reconstruction module. The deep feature extraction module comprises multiple ConvNet Stages (CNS), each containing six Re-Parameterization ConvNet Blocks (RepCNB), a head layer, and a residual connection. The RepCNB utilizes large-kernel convolutions rather than MHSA to enhance the capability of learning long-range dependence. In the image reconstruction module, an upsampling module consisting of nearest-neighbor interpolation and pixel attention is deployed to reduce parameters and maintain reconstruction performance, while bicubic interpolation on another branch allows the backbone network to focus on learning high-frequency information.
The extensive experimental results on multiple public benchmarks show that our RepECN can achieve 2.5∼5× faster inference than the state-of-the-art ViT-based SR model with better or competitive super-resolving performance, indicating that our RepECN can reconstruct high-quality images with fast inference.
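The structural re-parameterization idea behind blocks like RepCNB can be shown in miniature: a training-time sum of a 3x3 conv, a 1x1 conv, and an identity branch folds into a single 3x3 kernel at inference, so the deployed model pays for only one convolution. The single-channel sketch below illustrates the general technique (in the style of RepVGG-like blocks); it is not RepECN's exact block design.

```python
def merge_branches(k3, w1):
    """Fold a 3x3 kernel, a 1x1 weight, and an identity branch into one 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += w1   # a 1x1 conv is a weight at the kernel center
    merged[1][1] += 1.0  # the identity branch is a center weight of 1
    return merged

def conv_at(patch, kernel):
    """Apply a 3x3 kernel to a 3x3 patch (one output value, no padding)."""
    return sum(patch[i][j] * kernel[i][j] for i in range(3) for j in range(3))

k3 = [[0.1, 0.2, 0.1],
      [0.2, 0.5, 0.2],
      [0.1, 0.2, 0.1]]
merged = merge_branches(k3, 0.3)  # center weight becomes 0.5 + 0.3 + 1.0

patch = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
# The single merged kernel reproduces the sum of the three branches exactly:
branch_sum = conv_at(patch, k3) + 0.3 * patch[1][1] + patch[1][1]
assert abs(conv_at(patch, merged) - branch_sum) < 1e-9
```

Because the fold is exact, training-time capacity (multiple branches) is traded for inference-time simplicity with no change in the computed function.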

12.
Sensors (Basel) ; 23(11)2023 May 24.
Article in English | MEDLINE | ID: mdl-37299757

ABSTRACT

The quality of videos varies due to the different capabilities of sensors. Video super-resolution (VSR) is a technology that improves the quality of captured video. However, the development of a VSR model is very costly. In this paper, we present a novel approach for adapting single-image super-resolution (SISR) models to the VSR task. To achieve this, we first summarize a common architecture of SISR models and perform a formal analysis of adaptation. Then, we propose an adaptation method that incorporates a plug-and-play temporal feature extraction module into existing SISR models. The proposed temporal feature extraction module consists of three submodules: offset estimation, spatial aggregation, and temporal aggregation. In the spatial aggregation submodule, the features obtained from the SISR model are aligned to the center frame based on the offset estimation results. The aligned features are fused in the temporal aggregation submodule. Finally, the fused temporal feature is fed to the SISR model for reconstruction. To evaluate the effectiveness of our method, we adapt five representative SISR models and evaluate these models on two popular benchmarks. The experiment results show the proposed method is effective on different SISR models. In particular, on the Vid4 benchmark, the VSR-adapted models achieve at least 1.26 dB and 0.067 improvement over the original SISR models in terms of PSNR and SSIM metrics, respectively. Additionally, these VSR-adapted models achieve better performance than the state-of-the-art VSR models.
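The offset-estimation and temporal-aggregation steps described above can be caricatured in 1D: estimate each neighboring frame's shift relative to the center frame, align it, and average. This toy uses known integer offsets and edge replication in place of the paper's learned offset estimation, spatial aggregation, and temporal aggregation modules.

```python
def align(frame, offset):
    """Shift a 1D frame by an integer offset, replicating edge samples."""
    n = len(frame)
    return [frame[min(max(i + offset, 0), n - 1)] for i in range(n)]

def aggregate(center, neighbors_with_offsets):
    """Average the center frame with its offset-aligned neighbors."""
    frames = [center] + [align(f, o) for f, o in neighbors_with_offsets]
    return [sum(vals) / len(frames) for vals in zip(*frames)]

center = [0.0, 1.0, 2.0, 3.0]
prev = [1.0, 2.0, 3.0, 3.0]  # the center frame shifted left (edge-replicated)
# Shifting prev back by -1 re-aligns it with center before fusing:
fused = aggregate(center, [(prev, -1)])
```

In a real VSR adapter the offsets are sub-pixel and learned, and the fusion is a learned module rather than a mean, but the align-then-fuse structure is the same.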


Subject(s)
Acclimatization , Benchmarking , Technology
13.
Sensors (Basel) ; 23(11)2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37300065

ABSTRACT

Image super-resolution (SR) usually synthesizes degraded low-resolution images with a predefined degradation model for training. Existing SR methods inevitably perform poorly when the true degradation does not follow the predefined degradation, especially in the case of the real world. To tackle this robustness issue, we propose a cascaded degradation-aware blind super-resolution network (CDASRN), which not only eliminates the influence of noise on blur kernel estimation but also can estimate the spatially varying blur kernel. With the addition of contrastive learning, our CDASRN can further distinguish the differences between local blur kernels, greatly improving its practicality. Experiments in various settings show that CDASRN outperforms state-of-the-art methods on both heavily degraded synthetic datasets and real-world datasets.

14.
Sensors (Basel) ; 23(4)2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36850618

ABSTRACT

Because Light Field (LF) imaging is used in many applications, numerous deep learning algorithms have been proposed to overcome its inherent trade-off: the sensor's limited resolution must be shared between angular and spatial resolution. To mitigate this problem, a method should fully model the non-local properties of the 4D LF data. Therefore, this paper proposes a different approach to increase spatial and angular information interaction for LF image super-resolution (SR). We achieve this by processing the LF Sub-Aperture Images (SAIs) independently to extract spatial information and the LF Macro-Pixel Image (MPI) to extract angular information. The MPI, or lenslet LF image, is characterized by its ability to integrate complementary information between different viewpoints (SAIs). In particular, we extract initial features and then process the MPI and SAIs alternately to incorporate angular and spatial information. Finally, the interacted features are added to the initial features to reconstruct the final output. We trained the proposed network to minimize the sum of absolute errors between the reconstructed output and the high-resolution (HR) ground-truth images. Experimental results demonstrate the high performance of our proposed method over the state-of-the-art methods on LFSR for small-baseline LF images.
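The MPI (lenslet) and SAI representations discussed above are two orderings of the same 4D light field: the MPI interleaves views pixel by pixel, while an SAI groups each view's pixels together. A minimal sketch of the index mapping (toy sizes, with tuples standing in for pixel values so the regrouping is easy to verify):

```python
def mpi_to_sais(mpi, A):
    """Split a macro-pixel image into an A x A grid of sub-aperture images.

    MPI pixel (y*A + v, x*A + u) belongs to view (u, v) at spatial (x, y).
    """
    H, W = len(mpi) // A, len(mpi[0]) // A
    return [[[[mpi[y * A + v][x * A + u] for x in range(W)]
              for y in range(H)]
             for u in range(A)]
            for v in range(A)]

# A 2x2 angular by 2x2 spatial toy light field; each entry records
# its own (v, u, y, x) coordinates:
A, H, W = 2, 2, 2
mpi = [[(v, u, y, x) for x in range(W) for u in range(A)]
       for y in range(H) for v in range(A)]
sais = mpi_to_sais(mpi, A)
```

Processing `sais` convolves within one view (spatial information), while convolving `mpi` directly mixes neighboring views (angular information), which is why the two branches are complementary.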

15.
Sensors (Basel) ; 23(8)2023 Apr 12.
Article in English | MEDLINE | ID: mdl-37112247

ABSTRACT

Super-resolution (SR) images based on deep networks have achieved great success in recent years, but the large number of parameters such networks require is not conducive to use on equipment with limited capabilities in real life. Therefore, we propose a lightweight feature distillation and enhancement network (FDENet). Specifically, we propose a feature distillation and enhancement block (FDEB), which contains two parts: a feature-distillation part and a feature-enhancement part. Firstly, the feature-distillation part uses a stepwise distillation operation to extract layered features, where the proposed stepwise fusion mechanism (SFM) fuses the retained features after stepwise distillation to promote information flow and the shallow pixel attention block (SRAB) extracts information. Secondly, the feature-enhancement part enhances the extracted features. It is composed of carefully designed bilateral bands: the upper sideband enhances the features, and the lower sideband extracts the complex background information of remote sensing images. Finally, we fuse the features of the upper and lower sidebands to enhance their expressive ability. Extensive experiments show that the proposed FDENet both uses fewer parameters and performs better than most existing advanced models.

16.
Sensors (Basel) ; 23(8)2023 Apr 13.
Article in English | MEDLINE | ID: mdl-37112303

ABSTRACT

Deployment of deep convolutional neural networks (CNNs) in single image super-resolution (SISR) for edge computing devices is mainly hampered by the huge computational cost. In this work, we propose a lightweight image super-resolution (SR) network based on a reparameterizable multibranch bottleneck module (RMBM). In the training phase, RMBM efficiently extracts high-frequency information by utilizing multibranch structures, including the bottleneck residual block (BRB), inverted bottleneck residual block (IBRB), and expand-squeeze convolution block (ESB). In the inference phase, the multibranch structures can be combined into a single 3 × 3 convolution to reduce the number of parameters without incurring any additional computational cost. Furthermore, a novel peak-structure-edge (PSE) loss is proposed to resolve the problem of oversmoothed reconstructed images while significantly improving image structure similarity. Finally, we optimize and deploy the algorithm on edge devices equipped with the rockchip neural processor unit (RKNPU) to achieve real-time SR reconstruction. Extensive experiments on natural image datasets and remote sensing image datasets show that our network outperforms advanced lightweight SR networks in terms of both objective evaluation metrics and subjective visual quality. The reconstruction results demonstrate that the proposed network can achieve higher SR performance with a 98.1 K model size, which can be effectively deployed to edge computing devices.

17.
Sensors (Basel) ; 23(7)2023 Mar 29.
Article in English | MEDLINE | ID: mdl-37050632

ABSTRACT

Remote sensing images often have limited resolution, which can hinder their effectiveness in various applications. Super-resolution techniques can enhance the resolution of remote sensing images, and arbitrary-resolution super-resolution techniques provide additional flexibility in choosing appropriate image resolutions for different tasks. However, for subsequent processing, such as detection and classification, the resolution of the input image may vary greatly between methods. In this paper, we propose a method for continuous remote sensing image super-resolution using feature-enhanced implicit neural representation (SR-FEINR). Continuous remote sensing image super-resolution means users can scale a low-resolution image into an image with arbitrary resolution. Our algorithm is composed of three main components: a low-resolution image feature extraction module, a positional encoding module, and a feature-enhanced multi-layer perceptron module. We are the first to apply implicit neural representation to a continuous remote sensing image super-resolution task. Through extensive experiments on two popular remote sensing image datasets, we have shown that our SR-FEINR outperforms the state-of-the-art algorithms in terms of accuracy. Our algorithm showed an average improvement of 0.05 dB over the existing method at the ×30 scale factor across three datasets.
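The key property of an implicit representation for arbitrary-scale SR is that the image can be queried at any continuous coordinate. As a stand-in for the learned feature-enhanced MLP described above, the sketch below queries a plain bilinear interpolant at arbitrary coordinates, showing the same coordinate-based sampling interface, including non-integer scale factors.

```python
def query(img, y, x):
    """Bilinearly interpolate a 2D list `img` at continuous coordinates (y, x)."""
    H, W = len(img), len(img[0])
    y0, x0 = min(int(y), H - 2), min(int(x), W - 2)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * img[y0][x0] +
            (1 - dy) * dx * img[y0][x0 + 1] +
            dy * (1 - dx) * img[y0 + 1][x0] +
            dy * dx * img[y0 + 1][x0 + 1])

def upscale(img, scale):
    """Resample the image at an arbitrary (even non-integer) scale factor."""
    H, W = len(img), len(img[0])
    oh, ow = int(H * scale), int(W * scale)
    return [[query(img, y / scale, x / scale) for x in range(ow)]
            for y in range(oh)]

lr = [[0.0, 1.0],
      [2.0, 3.0]]
hr = upscale(lr, 2.5)  # 2x2 -> 5x5: one decoder, any output grid
```

An implicit SR network replaces the fixed bilinear weights with an MLP over learned features, but the interface, a continuous (y, x) in and a pixel value out, is exactly what makes scale factors like ×30 possible.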

18.
Sensors (Basel) ; 23(19)2023 Oct 01.
Article in English | MEDLINE | ID: mdl-37837043

ABSTRACT

Advanced deep learning-based Single Image Super-Resolution (SISR) techniques are designed to restore high-frequency image details and enhance imaging resolution through the use of rapid and lightweight network architectures. Existing SISR methodologies face the challenge of striking a balance between performance and computational costs, which hinders the practical application of SISR methods. In response to this challenge, the present study introduces a lightweight network known as the Spatial and Channel Aggregation Network (SCAN), designed to excel in image super-resolution (SR) tasks. SCAN is the first SISR method to employ large-kernel convolutions combined with feature reduction operations. This design enables the network to focus more on challenging intermediate-level information extraction, leading to improved performance and efficiency of the network. Additionally, an innovative 9 × 9 large kernel convolution was introduced to further expand the receptive field. The proposed SCAN method outperforms state-of-the-art lightweight SISR methods on benchmark datasets with a 0.13 dB improvement in peak signal-to-noise ratio (PSNR) and a 0.0013 increase in structural similarity (SSIM). Moreover, on remote sensing datasets, SCAN achieves a 0.4 dB improvement in PSNR and a 0.0033 increase in SSIM.

19.
Sensors (Basel) ; 23(5)2023 Feb 22.
Article in English | MEDLINE | ID: mdl-36904643

ABSTRACT

As small commodities often have few distinguishing features and are easily occluded by hands, overall detection accuracy is low, and small commodity detection remains a great challenge. Therefore, in this study, a new algorithm for detection under occlusion is proposed. Firstly, a super-resolution algorithm with an outline feature extraction module processes the input video frames to restore high-frequency details, such as the contours and textures of the commodities. Next, residual dense networks are used for feature extraction, with an attention mechanism guiding the network to extract commodity feature information. As small commodity features are easily ignored by the network, a new local adaptive feature enhancement module is designed to enhance the regional commodity features in the shallow feature map and strengthen the expression of small commodity feature information. Finally, a small commodity detection box is generated through the regional regression network to complete the detection task. Compared to RetinaNet, the F1-score improved by 2.6% and the mean average precision by 2.45%. The experimental results reveal that the proposed method effectively enhances the salient features of small commodities and further improves detection accuracy.

20.
Sensors (Basel) ; 23(5)2023 Feb 21.
Article in English | MEDLINE | ID: mdl-36904589

ABSTRACT

The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a while, Convolutional Neural Networks (CNN) predominated in most computer vision tasks. Now, both CNN and ViT are efficient approaches that demonstrate powerful capabilities to restore a better version of an image given in a low-quality format. In this study, the efficiency of ViT in image restoration is studied extensively. The ViT architectures are classified for every task of image restoration. Seven image restoration tasks are considered: Image Super-Resolution, Image Denoising, General Image Enhancement, JPEG Compression Artifact Reduction, Image Deblurring, Removing Adverse Weather Conditions, and Image Dehazing. The outcomes, the advantages, the limitations, and the possible areas for future research are detailed. Overall, it is noted that incorporating ViT in the new architectures for image restoration is becoming a rule. This is due to some advantages compared to CNN, such as better efficiency, especially when more data are fed to the network, robustness in feature extraction, and a better feature learning approach that sees better the variances and characteristics of the input. Nevertheless, some drawbacks exist, such as the need for more data to show the benefits of ViT over CNN, the increased computational cost due to the complexity of the self-attention block, a more challenging training process, and the lack of interpretability. These drawbacks represent the future research direction that should be targeted to increase the efficiency of ViT in the image restoration domain.
