1.
J Imaging ; 10(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38535144

ABSTRACT

Chest X-ray (CXR) imaging plays a pivotal role in diagnosing pulmonary diseases, which account for a significant portion of global mortality, as recognized by the World Health Organization (WHO). Medical practitioners routinely depend on CXR images to identify anomalies and make critical clinical decisions. Deep learning techniques have brought dramatic improvements to super-resolution (SR). However, low-resolution inputs and their features contain abundant low-frequency information that many SR methods handle poorly, and X-ray image super-resolution is a typical such case. In this paper, we introduce an advanced deep-learning-based SR approach that incorporates the residual-in-residual (RIR) structure to augment the diagnostic potential of CXR imaging. Specifically, we propose a light network consisting of residual groups built from residual blocks, with multiple skip connections that let the abundant low-frequency information bypass the network efficiently, allowing the main network to concentrate on learning high-frequency information. In addition, we adopt dense feature fusion within residual groups and design highly parallel residual blocks for better feature extraction. The proposed method outperforms existing state-of-the-art (SOTA) SR methods, delivering higher accuracy and notable visual improvements, as evidenced by our results.
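A minimal sketch of the residual-in-residual idea described above, assuming PyTorch; the module names, layer counts, channel widths, and upscale factor are illustrative assumptions rather than the authors' exact configuration:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs wrapped in a local skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # local skip passes low-frequency content through

class ResidualGroup(nn.Module):
    """A stack of residual blocks wrapped in a group-level skip."""
    def __init__(self, channels, n_blocks):
        super().__init__()
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])

    def forward(self, x):
        return x + self.blocks(x)  # group skip: residual-in-residual

class RIRNet(nn.Module):
    """Residual-in-residual trunk with a long skip and pixel-shuffle upsampler."""
    def __init__(self, channels=64, n_groups=5, n_blocks=4, scale=2):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)   # grayscale CXR input
        self.groups = nn.Sequential(*[ResidualGroup(channels, n_blocks) for _ in range(n_groups)])
        self.tail = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, x):
        feat = self.head(x)
        return self.tail(feat + self.groups(feat))  # long skip carries low frequencies
```

The long skip in `forward` plays the role the abstract describes: low-frequency content flows around the trunk, so the residual groups can specialize in high-frequency detail.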

2.
Sensors (Basel) ; 24(2)2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38257410

ABSTRACT

Detecting violent behavior in videos to ensure public safety and security poses a significant challenge. Precisely identifying and categorizing instances of violence in real-life closed-circuit television footage, which varies in specification and location, requires a comprehensive understanding of the sequential information embedded in these videos. This study introduces a model that adeptly grasps the spatiotemporal context of videos across diverse settings and specifications of violent scenarios. We propose a method to accurately capture spatiotemporal features linked to violent behaviors using optical flow and RGB data. The approach leverages a Conv3D-based ResNet-3D model as the foundational network, capable of handling high-dimensional video data. The efficiency and accuracy of violence detection are enhanced by integrating an attention mechanism that assigns greater weight to the most crucial frames within the RGB and optical-flow sequences during instances of violence. Our model was evaluated on the UBI-Fight, Hockey, Crowd, and Movie-Fights datasets; the proposed method outperformed existing state-of-the-art techniques, achieving area-under-the-curve scores of 95.4, 98.1, 94.5, and 100.0 on the respective datasets. Beyond its potential application in real-time surveillance systems, this research promises to contribute to a broader spectrum of work in video analysis and understanding.
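As a rough illustration of the two-stream design sketched in this abstract, the PyTorch fragment below fuses RGB and optical-flow clip features and pools them with a learned temporal attention that up-weights the most informative frames. The single-convolution "backbones", dimensions, and names are placeholders, not the paper's ResNet-3D:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Scores each frame-level feature so the crucial frames receive more weight."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):              # feats: (batch, time, dim)
        w = torch.softmax(self.score(feats), dim=1)
        return (w * feats).sum(dim=1)      # attention-weighted temporal pooling

class TwoStreamViolenceDetector(nn.Module):
    """Illustrative two-stream fusion of RGB and optical-flow clip features."""
    def __init__(self, dim=512):
        super().__init__()
        # Stand-ins for the Conv3D/ResNet-3D backbones described in the abstract.
        self.rgb_backbone = nn.Conv3d(3, dim, kernel_size=(3, 7, 7), stride=(1, 4, 4), padding=(1, 3, 3))
        self.flow_backbone = nn.Conv3d(2, dim, kernel_size=(3, 7, 7), stride=(1, 4, 4), padding=(1, 3, 3))
        self.att = TemporalAttention(dim * 2)
        self.head = nn.Linear(dim * 2, 1)  # violence / non-violence logit

    def forward(self, rgb, flow):          # rgb: (B,3,T,H,W), flow: (B,2,T,H,W)
        r = self.rgb_backbone(rgb).mean(dim=(3, 4)).transpose(1, 2)   # (B,T,dim)
        f = self.flow_backbone(flow).mean(dim=(3, 4)).transpose(1, 2)
        fused = torch.cat([r, f], dim=-1)
        return self.head(self.att(fused))
```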


Subjects
Optical Flow; Violence; Computer Systems
3.
Sensors (Basel) ; 23(23)2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38067989

ABSTRACT

With the recent rise in violent crime, the real-time situation-analysis capabilities of prevalent closed-circuit television have been employed to deter and resolve criminal activities. Anomaly detection can identify abnormal instances, such as violence, within the patterns of a given dataset; however, it faces the challenge that data for abnormal situations are far scarcer than data for normal situations. Herein, using datasets such as UBI-Fights, RWF-2000, and UCSD Ped1 and Ped2, anomaly detection was approached as a binary classification problem. Annotated frames extracted from each video were reconstructed into a limited number of grid images of 3×3, 4×3, 4×4, and 5×3 layouts using the method proposed in this paper, forming an input structure similar to a light field and to the patch layout of a vision transformer. The model was constructed by applying a convolutional block attention module, comprising channel and spatial attention modules, to three-dimensional-convolution residual neural networks with depths of 10, 18, 34, and 50. The proposed model outperformed existing models in detecting abnormal behavior, such as violent acts, in videos. For instance, on the undersampled UBI-Fights dataset, our network achieved an accuracy of 0.9933, a loss value of 0.0010, an area under the curve of 0.9973, and an equal error rate of 0.0027. These results may contribute significantly to solving real-world issues such as detecting violent behavior with computer-vision-based artificial intelligence systems and real-time video monitoring.
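The grid-style input construction described above can be sketched as follows; the uniform temporal sampling and grid sizes here are illustrative assumptions (NumPy), not the paper's exact procedure:

```python
import numpy as np

def frames_to_grid(frames, rows=3, cols=3):
    """Tile sampled video frames into one rows x cols mosaic image,
    similar to a light-field / ViT-patch layout (illustrative sketch)."""
    t, h, w, c = frames.shape
    idx = np.linspace(0, t - 1, rows * cols).astype(int)  # uniform temporal sampling
    picked = frames[idx].reshape(rows, cols, h, w, c)
    # (rows, cols, h, w, c) -> (rows*h, cols*w, c)
    return picked.transpose(0, 2, 1, 3, 4).reshape(rows * h, cols * w, c)

# Example: 64 RGB frames of 112x112 become one 336x336 grid image.
clip = np.random.rand(64, 112, 112, 3).astype(np.float32)
grid = frames_to_grid(clip)   # shape (336, 336, 3)
```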

4.
Sensors (Basel) ; 23(16)2023 Aug 10.
Article in English | MEDLINE | ID: mdl-37631631

ABSTRACT

Deep-learning-based image inpainting methods have made remarkable advances, particularly in object-removal tasks. The removal of face masks has gained significant attention, especially in the wake of the COVID-19 pandemic, and while numerous methods have successfully addressed the removal of small objects, removing large and complex masks from faces remains challenging. This paper presents a novel two-stage network for unmasking faces that accounts for the intricate facial features typically concealed by masks, such as noses, mouths, and chins. The scarcity of paired datasets comprising masked and unmasked face images poses an additional challenge. In the first stage of our proposed model, we employ an autoencoder-based network for binary segmentation of the face mask. In the second stage, we introduce a generative adversarial network (GAN)-based network enhanced with attention and Masked-Unmasked Region Fusion (MURF) mechanisms to focus on the masked region. Our network generates realistic and accurate unmasked faces that resemble the originals. We train our model on paired unmasked and masked face images sourced from CelebA, a large public dataset, and evaluate its performance on multi-scale masked faces. The experimental results show that the proposed method surpasses current state-of-the-art techniques in both qualitative and quantitative metrics. It achieves a Peak Signal-to-Noise Ratio (PSNR) improvement of 4.18 dB over the second-best method, reaching a PSNR of 30.96, and a 1% increase in the Structural Similarity Index Measure (SSIM), achieving a value of 0.95.
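A heavily simplified sketch of the two-stage idea, assuming PyTorch: stage one predicts a binary mask-segmentation map, and stage two inpaints the masked region and fuses it with the untouched pixels. The fusion line is a crude stand-in for the paper's attention and MURF mechanisms, and all layer choices are hypothetical:

```python
import torch
import torch.nn as nn

class MaskSegmenter(nn.Module):
    """Stage 1: autoencoder-style binary segmentation of the face mask."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU())
        self.decode = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                                    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

    def forward(self, img):
        return torch.sigmoid(self.decode(self.encode(img)))  # per-pixel mask probability

class UnmaskGenerator(nn.Module):
    """Stage 2 (sketch): inpaint the masked region, then fuse with unmasked pixels."""
    def __init__(self):
        super().__init__()
        self.inpaint = nn.Sequential(nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

    def forward(self, img, mask):          # img in [-1, 1], mask in [0, 1]
        filled = self.inpaint(torch.cat([img, mask], dim=1))
        # Masked/unmasked region fusion: generated pixels only inside the mask.
        return mask * filled + (1 - mask) * img
```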


Subjects
COVID-19; Masks; Humans; Pandemics; Personal Protective Equipment; Benchmarking
5.
Sensors (Basel) ; 23(4)2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36850618

ABSTRACT

Because light field (LF) imaging is widely used in many applications, numerous deep learning algorithms have been proposed to overcome its inherent trade-off: the sensor's limited resolution constrains both angular and spatial resolution. Mitigating this problem requires a method that can fully model the non-local properties of the 4D LF data. This paper therefore proposes a different approach that increases the interaction between spatial and angular information for LF image super-resolution (SR). We achieve this by processing the LF sub-aperture images (SAIs) independently to extract spatial information and the LF macro-pixel image (MPI) to extract angular information. The MPI, or lenslet LF image, is characterized by its ability to integrate complementary information across different viewpoints (SAIs). In particular, we extract initial features and then process the MPI and SAIs alternately to incorporate angular and spatial information. Finally, the interacted features are added to the initial features to reconstruct the final output. We trained the proposed network to minimize the sum of absolute errors between low-resolution (LR) inputs and high-resolution (HR) output images. Experimental results demonstrate the high performance of our proposed method over state-of-the-art LFSR methods on small-baseline LF images.
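The SAI/MPI duality the method exploits is a deterministic rearrangement of the 4D light field. A small NumPy sketch (grayscale views; the 5×5×64×64 sizes are illustrative):

```python
import numpy as np

def sai_to_mpi(lf):
    """Rearrange a light field from sub-aperture images (SAI) to a
    macro-pixel image (MPI / lenslet) layout. lf: (U, V, H, W) grayscale views."""
    u, v, h, w = lf.shape
    # (U,V,H,W) -> (H,U,W,V) -> (H*U, W*V): each macro-pixel holds all views of one pixel
    return lf.transpose(2, 0, 3, 1).reshape(h * u, w * v)

def mpi_to_sai(mpi, u, v):
    """Inverse rearrangement back to the sub-aperture layout."""
    h, w = mpi.shape[0] // u, mpi.shape[1] // v
    return mpi.reshape(h, u, w, v).transpose(1, 3, 0, 2)

lf = np.random.rand(5, 5, 64, 64)           # 5x5 angular views of 64x64 pixels
mpi = sai_to_mpi(lf)                         # (320, 320) lenslet-style image
assert np.allclose(mpi_to_sai(mpi, 5, 5), lf)
```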

6.
Sensors (Basel) ; 22(21)2022 Oct 29.
Article in English | MEDLINE | ID: mdl-36366007

ABSTRACT

In this paper, we revisit paired image-to-image translation with the conditional generative adversarial network known as "Pix2Pix" and propose efficient optimizations of the architecture and training method to maximize its performance and boost the realism of the generated images. We propose a generative adversarial network-based technique that creates new artificial indoor scenes from a user-defined semantic segmentation map, which defines the location, shape, and category of each object in the scene, in the same manner as Pix2Pix. We train different residual-connection-based generator and discriminator architectures on the NYU Depth-v2 dataset and a selected indoor subset of the ADE20K dataset, showing that the proposed models have fewer parameters and lower computational complexity and generate better-quality images than state-of-the-art methods that follow the same approach to generating realistic indoor images. We also show that using extra specific labels and more training samples improves the quality of the generated images; moreover, compared to Pix2Pix, the proposed residual-connection-based models learn better from small datasets (i.e., NYU Depth-v2) and improve the realism of the generated images when trained on larger datasets (i.e., the ADE20K indoor subset). The proposed method achieves an LPIPS value of 0.505 and an FID value of 81.067, generating better-quality images than Pix2Pix and other recent paired image-to-image translation methods and outperforming them in terms of LPIPS and FID.
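A minimal sketch of a residual-connection generator for label-map-to-image translation in the Pix2Pix setting, assuming PyTorch; block counts and widths are illustrative, and the adversarial discriminator and losses are omitted:

```python
import torch
import torch.nn as nn

class ResidualGenerator(nn.Module):
    """Sketch of a residual-connection generator: semantic map in, RGB scene out."""
    def __init__(self, label_channels, channels=64, n_res=4):
        super().__init__()
        self.entry = nn.Sequential(nn.Conv2d(label_channels, channels, 7, padding=3), nn.ReLU())
        self.res = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(channels, channels, 3, padding=1))
            for _ in range(n_res)
        ])
        self.exit = nn.Sequential(nn.Conv2d(channels, 3, 7, padding=3), nn.Tanh())

    def forward(self, seg_map):            # one-hot semantic map, (B, L, H, W)
        x = self.entry(seg_map)
        for block in self.res:
            x = x + block(x)               # residual connections ease optimization
        return self.exit(x)                # RGB indoor scene in [-1, 1]
```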

7.
Sensors (Basel) ; 22(14)2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35890934

ABSTRACT

Dense multi-view image reconstruction has long been an active research area, and interest has recently increased. Multi-view images can solve many problems and enhance the efficiency of many applications. This paper presents a solution specifically for reconstructing high-density light field (LF) images captured by Lytro Illum cameras, addressing the inherent discrepancy between angular and spatial resolution that results from limited sensor resolution. We introduce the residual channel attention light field (RCA-LF) structure to solve different LF reconstruction tasks. In our approach, view images are grouped into one stack in which epipolar information is available. We use 2D convolution layers to process and extract features from the stacked view images. Our method adopts the channel attention mechanism to learn the relations between different views and to give higher weight to the most important features, restoring more texture details. Experimental results indicate that the proposed model outperforms earlier state-of-the-art methods in both visual and numerical evaluation.
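Channel attention of the kind referred to above is typically a squeeze-and-excitation style gate over feature channels; a PyTorch sketch under that assumption (widths illustrative, not the RCA-LF specification):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: weights each channel
    (here, features from stacked light-field views) by learned importance."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W) stacked-view features
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                           # re-weighted features

class RCABlock(nn.Module):
    """Residual block with channel attention, as in RCA-style networks."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), ChannelAttention(channels),
        )

    def forward(self, x):
        return x + self.body(x)
```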

8.
Sensors (Basel) ; 22(10)2022 May 12.
Article in English | MEDLINE | ID: mdl-35632108

ABSTRACT

This paper proposes a learnable line-encoding technique for the bounding boxes commonly used in object detection. A bounding box is encoded using two main points, the top-left and bottom-right corners; a lightweight convolutional neural network (CNN) then learns the lines and proposes high-resolution line masks for each class category using a pixel-shuffle operation. Post-processing is applied to the predicted line masks to filter them and estimate clean lines based on a progressive probabilistic Hough transform. The proposed method was trained and evaluated on two common object detection benchmarks: Pascal VOC2007 and MS-COCO2017. It attains high mean average precision (mAP) values (78.8% for VOC2007 and 48.1% for COCO2017) while processing each frame in a few milliseconds (37 ms for PASCAL VOC and 47 ms for COCO). The strength of the proposed method lies in its simplicity and ease of implementation, unlike recent state-of-the-art object detection methods, which involve complex processing pipelines.
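As a rough sketch of the post-processing stage, the snippet below runs OpenCV's progressive probabilistic Hough transform on a predicted line mask and recovers a box from the detected line endpoints; the thresholds and the endpoint-to-box step are assumptions for illustration, not the paper's exact pipeline:

```python
import cv2
import numpy as np

def lines_to_box(line_mask, threshold=50):
    """Recover a bounding box from a predicted line mask via the
    progressive probabilistic Hough transform (illustrative sketch)."""
    mask = (line_mask > 0.5).astype(np.uint8) * 255
    lines = cv2.HoughLinesP(mask, rho=1, theta=np.pi / 180,
                            threshold=threshold, minLineLength=10, maxLineGap=5)
    if lines is None:
        return None
    pts = lines.reshape(-1, 2)            # all (x, y) line endpoints
    x1, y1 = pts.min(axis=0)              # top-left corner
    x2, y2 = pts.max(axis=0)              # bottom-right corner
    return int(x1), int(y1), int(x2), int(y2)
```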


Subjects
Masks; Neural Networks, Computer
9.
Sensors (Basel) ; 22(10)2022 May 19.
Article in English | MEDLINE | ID: mdl-35632271

ABSTRACT

Recent computer vision research has highlighted the effectiveness of vision transformers (ViTs) in many tasks: unlike convolutions, which process an image locally, ViTs can efficiently understand and process an image globally. ViTs outperform convolutional neural networks in accuracy on many computer vision tasks, but their speed remains an issue because of the heavy use of transformer layers, which include many fully connected layers. We therefore propose a real-time ViT-based monocular depth estimation method (depth estimation from a single RGB image) with encoder-decoder architectures for indoor and outdoor scenes. The main architecture consists of a vision transformer encoder and a convolutional neural network decoder. We started by training the base vision transformer (ViT-b16) with 12 transformer layers; we then reduced the transformer layers to six (ViT-s16, the small ViT) and four (ViT-t16, the tiny ViT) to obtain real-time processing. We also tried four different configurations of the CNN decoder network. The proposed architectures learn the depth estimation task efficiently and, by taking advantage of the multi-head self-attention module, produce more accurate depth predictions than fully convolutional methods. We train the proposed encoder-decoder architectures end-to-end on the challenging NYU-DepthV2 and CITYSCAPES benchmarks and evaluate the trained models on their validation and test sets, showing that they outperform many state-of-the-art depth estimation methods while running in real time (∼20 fps). We also present a fast 3D reconstruction experiment (∼17 fps) based on the depth estimated by our method, a real-world application of the approach.
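A compact sketch of the encoder-decoder split described above, assuming PyTorch: a patch embedding plus transformer encoder (layer count set to 4, mirroring the tiny variant) followed by a small convolutional decoder. All dimensions and the decoder design are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ViTDepth(nn.Module):
    """Sketch of a ViT-encoder / CNN-decoder monocular depth network.
    Layer counts mirror the tiny (4) / small (6) / base (12) variants."""
    def __init__(self, img=224, patch=16, dim=384, layers=4, heads=6):
        super().__init__()
        n = (img // patch) ** 2
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patchify
        self.pos = nn.Parameter(torch.zeros(1, n, dim))
        enc = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.decoder = nn.Sequential(                                    # CNN decoder
            nn.Conv2d(dim, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1), nn.ReLU(inplace=True),       # depth >= 0
        )

    def forward(self, x):                  # x: (B, 3, 224, 224)
        b = x.size(0)
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos    # (B, N, dim)
        feat = self.encoder(tokens)
        side = int(feat.size(1) ** 0.5)
        grid = feat.transpose(1, 2).reshape(b, -1, side, side)          # back to 2D
        return self.decoder(grid)          # (B, 1, 224, 224) depth map
```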


Subjects
Neural Networks, Computer
10.
Sensors (Basel) ; 22(9)2022 May 06.
Article in English | MEDLINE | ID: mdl-35591229

ABSTRACT

Light field (LF) technology has attracted great interest due to its use in many applications, especially since the introduction of the consumer LF camera, which facilitated the acquisition of dense LF images. Obtaining densely sampled LF images is nevertheless costly due to the trade-off between spatial and angular resolution. Accordingly, this work proposes a learning-based solution to this challenging problem: reconstructing dense, high-quality LF images. Instead of training our model on several separate images of the same scene, we use raw LF images (lenslet images). The raw LF format encodes several views of the same scene into one image, which helps the network understand and model the relationship between different views and thus produce higher-quality images. We divide our model into two successive modules, LF reconstruction (LFR) and LF augmentation (LFA), each implemented as a convolutional neural network (CNN)-based residual network. We trained the network to reduce the absolute error between the novel and reference views. Experimental findings on real-world datasets show that our method performs excellently and surpasses state-of-the-art approaches.
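A toy end-to-end training step for the two-module pipeline under the stated absolute-error objective, assuming PyTorch; the two tiny CNNs stand in for the LFR and LFA residual networks, and all shapes are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the LFR and LFA residual modules.
lfr = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 1, 3, padding=1))
lfa = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 1, 3, padding=1))
model = nn.Sequential(lfr, lfa)           # two successive modules, trained jointly

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
l1 = nn.L1Loss()                          # absolute-error objective from the abstract

lenslet = torch.rand(4, 1, 256, 256)      # raw (lenslet) LF inputs, shapes hypothetical
reference = torch.rand(4, 1, 256, 256)    # densely sampled reference views

optimizer.zero_grad()
loss = l1(model(lenslet), reference)      # error between novel and reference views
loss.backward()
optimizer.step()
```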


Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Image Processing, Computer-Assisted/methods
11.
Sensors (Basel) ; 22(5)2022 Mar 01.
Article in English | MEDLINE | ID: mdl-35271061

ABSTRACT

Most recent high-resolution depth estimation algorithms are so computationally expensive that they cannot run in real time, and the common workaround is to feed them a low-resolution input image to reduce computational complexity. We propose a different approach: an efficient, real-time convolutional neural network-based depth estimation algorithm that takes a single high-resolution image as input. The proposed method efficiently constructs a high-resolution depth map using a small encoding architecture and eliminates the decoder typically used in encoder-decoder depth estimation architectures. It adopts a modified MobileNetV2 architecture, a lightweight network, and estimates depth through depth-to-space image construction, a technique generally employed in image super-resolution. As a result, it achieves fast frame processing and can predict high-accuracy depth in real time. We train and test our method on the challenging KITTI, Cityscapes, and NYUV2 depth datasets. The proposed method achieves low relative absolute error (0.028 for KITTI, 0.167 for CITYSCAPES, and 0.069 for NYUV2) while reaching 48 frames per second on a GPU and 20 frames per second on a CPU for high-resolution test images. Compared with state-of-the-art depth estimation methods, our method performs better while remaining less complex and running in real time.
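The decoder-free, depth-to-space construction can be sketched directly with PyTorch's `PixelShuffle`; the channel count and upscale factor below are illustrative, not the paper's exact head:

```python
import torch
import torch.nn as nn

class DTSDepthHead(nn.Module):
    """Decoder-free depth head: a 1x1 conv predicts r*r depth values per
    low-resolution cell, and depth-to-space (PixelShuffle) rearranges them
    into a full-resolution map (sketch of the depth-to-space idea)."""
    def __init__(self, in_channels, upscale=8):
        super().__init__()
        self.project = nn.Conv2d(in_channels, upscale ** 2, 1)
        self.dts = nn.PixelShuffle(upscale)

    def forward(self, feat):                 # feat: (B, C, H/8, W/8) encoder features
        return self.dts(self.project(feat))  # (B, 1, H, W) depth map

# Example with a MobileNetV2-like feature map (channel count illustrative).
head = DTSDepthHead(in_channels=320, upscale=8)
depth = head(torch.rand(1, 320, 28, 28))     # -> (1, 1, 224, 224)
```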

12.
Sensors (Basel) ; 22(5)2022 Mar 02.
Article in English | MEDLINE | ID: mdl-35271103

ABSTRACT

Although light field (LF) technology attracts attention due to its large number of applications, especially with the introduction of consumer LF cameras and their increasingly frequent use, reconstructing densely sampled LF images remains a great challenge for the use and development of LF technology. This paper proposes a learning-based method to reconstruct densely sampled LF images from a sparse set of input views. We trained our model on raw LF images rather than on multiple separate images of the same scene: a raw LF image represents the two-dimensional array of views captured in a single exposure, which enables the network to understand and model the relationships between views of the same scene well and thus restore more texture details with better quality. Using raw images transforms the task from image reconstruction into image-to-image translation. We exploited the small baseline of LF images to initialize each view to be reconstructed from its nearest input view. Our network was trained end-to-end to minimize the sum of absolute errors between the reconstructed and ground-truth images. Experimental results on three challenging real-world datasets demonstrate the high performance of our proposed method and its advantage over state-of-the-art methods.
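The nearest-view initialization mentioned above can be illustrated in a few lines of NumPy; the angular-distance metric and grid layout are assumptions for the sketch:

```python
import numpy as np

def init_from_nearest(sparse_views, coords, target_coords):
    """Initialize each view to be synthesized with its nearest captured view,
    exploiting the small baseline of consumer LF cameras (illustrative)."""
    out = []
    for (u, v) in target_coords:
        d = [abs(u - cu) + abs(v - cv) for (cu, cv) in coords]   # angular distance
        out.append(sparse_views[int(np.argmin(d))])
    return np.stack(out)                   # coarse estimate, refined by the network

views = np.random.rand(4, 64, 64, 3)                    # 2x2 corner views
corner = [(0, 0), (0, 6), (6, 0), (6, 6)]
targets = [(u, v) for u in range(7) for v in range(7)]  # full 7x7 grid
coarse = init_from_nearest(views, corner, targets)      # (49, 64, 64, 3)
```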


Subjects
Image Processing, Computer-Assisted; Image Processing, Computer-Assisted/methods
13.
Sensors (Basel) ; 22(1)2022 Jan 03.
Article in English | MEDLINE | ID: mdl-35009879

ABSTRACT

We propose Depth-to-Space Net (DTS-Net), an effective technique for semantic segmentation using the efficient sub-pixel convolutional neural network. The technique is inspired by depth-to-space (DTS) image reconstruction, originally used for image and video super-resolution, combined with a mask-enhancement filtration technique based on multi-label classification, namely Nearest Label Filtration. We employ depth-wise separable convolution-based architectures and propose both a deep network, DTS-Net, and a lightweight network, DTS-Net-Lite, for real-time semantic segmentation; these networks use Xception and MobileNetV2 as their respective feature extractors. In addition, we explore the joint semantic segmentation and depth estimation task and demonstrate that the proposed technique can perform both tasks simultaneously and efficiently, outperforming state-of-the-art (SOTA) methods. We train and evaluate the proposed method on the PASCAL VOC2012, NYUV2, and CITYSCAPES benchmarks, obtaining high mean intersection over union (mIoU) and mean pixel accuracy (Pix.acc.) values with simple, lightweight convolutional architectures. Notably, the proposed method outperforms SOTA methods that depend on encoder-decoder architectures, although our implementation and computations are far simpler.
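A sketch of a sub-pixel (depth-to-space) segmentation head in PyTorch: each low-resolution cell predicts logits for an r×r block of output pixels, which `PixelShuffle` rearranges to full resolution. Channel counts are illustrative, and the Nearest Label Filtration step is omitted:

```python
import torch
import torch.nn as nn

class DTSSegHead(nn.Module):
    """Sub-pixel segmentation head: predict r*r * n_classes logits per cell,
    then depth-to-space them into full-resolution class maps (sketch)."""
    def __init__(self, in_channels, n_classes, upscale=8):
        super().__init__()
        self.project = nn.Conv2d(in_channels, n_classes * upscale ** 2, 1)
        self.dts = nn.PixelShuffle(upscale)

    def forward(self, feat):                       # (B, C, H/r, W/r)
        logits = self.dts(self.project(feat))      # (B, n_classes, H, W)
        return logits.argmax(dim=1)                # per-pixel class labels

head = DTSSegHead(in_channels=1280, n_classes=21, upscale=8)   # e.g. VOC's 21 classes
labels = head(torch.rand(1, 1280, 28, 28))                     # -> (1, 224, 224)
```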

14.
Int J Stem Cells ; 11(2): 187-195, 2018 Nov 30.
Article in English | MEDLINE | ID: mdl-30343551

ABSTRACT

Improved approaches for promoting umbilical cord blood (CB) hematopoietic stem cell (HSC) homing are clinically important to enhance engraftment of CB-HSCs. Clinical transplantation of CB-HSCs is used to treat a wide range of disorders. However, an improved understanding of HSC chemotaxis is needed for facilitation of the engraftment process. We found that ectopic overexpression of miR-9 and antisense-miR-9 respectively down- and up-regulated C-X-C chemokine receptor type 4 (CXCR4) expression in CB-CD34+ cells as well as in 293T and TF-1 cell lines. Since CXCR4 is a specific receptor for the stromal cell derived factor-1 (SDF-1) chemotactic factor, we investigated whether sense miR-9 and antisense miR-9 influenced CXCR4-mediated chemotactic mobility of primary CB CD34+ cells and TF-1 cells. Ectopic overexpression of sense miR-9 and antisense miR-9 respectively down- and up-regulated SDF-1-mediated chemotactic cell mobility. To our knowledge, this study is the first to report that miR-9 may play a role in regulating CXCR4 expression and SDF-1-mediated chemotactic activity of CB CD34+ cells.

15.
Appl Opt ; 49(30): 5728-35, 2010 Oct 20.
Article in English | MEDLINE | ID: mdl-20962936

ABSTRACT

In this paper, we propose an efficient compression method for integral images based on the three-dimensional discrete cosine transform (3D-DCT). Although existing 3D-DCT-based techniques are efficient, they are not tailored to the characteristics of integral images: they apply a fixed-size block construction and a fixed scanning order when placing 2D blocks to build a 3D block. We therefore propose a variable-size block construction and a scanning method adapted to the characteristics of integral images, realized through adaptive 3D block modes. Experimental results show that the proposed method significantly improves coding efficiency. The improvement is particularly large at high bit rates, where the overhead bits for signaling the 3D block modes account for a smaller share of the total bits.
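A toy NumPy/SciPy illustration of 3D-DCT block compression (transform a stack of 2D blocks, keep only the largest coefficients, invert); the block size and coefficient-selection rule are assumptions for the sketch, not the paper's adaptive block modes:

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block_3d, keep=0.1):
    """Toy 3D-DCT compression of a stack of 2D blocks from an integral image:
    transform, zero out the smallest coefficients, inverse-transform (sketch)."""
    coeffs = dctn(block_3d, norm="ortho")
    thresh = np.quantile(np.abs(coeffs), 1 - keep)   # keep top `keep` fraction
    coeffs[np.abs(coeffs) < thresh] = 0.0
    return idctn(coeffs, norm="ortho")

# Example: an 8x8x8 block built from eight 8x8 elemental-image blocks.
block = np.random.rand(8, 8, 8)
recon = compress_block(block, keep=0.1)
print(np.mean((block - recon) ** 2))    # reconstruction MSE
```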
