1.
Article in English | MEDLINE | ID: mdl-38190685

ABSTRACT

Video motion magnification is the task of making subtle motions visible. Subtle motion often occurs while remaining invisible to the naked eye, e.g., slight deformations in an athlete's muscles, small vibrations in objects, micro-expressions, and chest movement while breathing. Magnifying such small motions has enabled applications such as posture deformity detection, micro-expression recognition, and the study of structural properties. State-of-the-art (SOTA) methods have fixed computational complexity, which makes them less suitable for applications with differing time constraints, e.g., real-time respiratory rate measurement and micro-expression classification. To solve this problem, we propose a knowledge distillation-based latency-aware differentiable architecture search (KL-DNAS) method for video motion magnification. To reduce memory requirements and improve denoising characteristics, we use a teacher network to search the network by parts using knowledge distillation (KD). Furthermore, the search covers different receptive fields and multi-feature connections for individual layers. A novel latency loss is also proposed to jointly optimize the target latency constraint and output quality. The resulting model is 2.8× smaller than the SOTA method and delivers better motion magnification with fewer distortions. https://github.com/jasdeep-singh-007/KL-DNAS.
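The abstract does not give the exact form of the latency loss; a minimal sketch of one plausible formulation, assuming a hinge-style penalty that is zero while the candidate architecture meets its latency target, is:

```python
def latency_aware_loss(quality_loss, latency_ms, target_ms, weight=0.1):
    """Combine an output-quality loss with a latency penalty.

    The penalty is zero while the measured latency meets the target and
    grows linearly once it exceeds the target, so architecture search can
    trade quality against a hard latency budget. The hinge form and the
    `weight` hyperparameter are illustrative assumptions, not the paper's
    published loss.
    """
    latency_penalty = max(0.0, latency_ms - target_ms)
    return quality_loss + weight * latency_penalty
```

A candidate that runs at 50 ms against a 100 ms target is scored on quality alone; one that runs at 120 ms pays an extra `0.1 × 20` on top of its quality loss.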

2.
Article in English | MEDLINE | ID: mdl-37703169

ABSTRACT

With the advancement of image editing applications, image inpainting is gaining attention for its ability to recover corrupted images efficiently. Existing methods for image inpainting use either two-stage coarse-to-fine architectures or single-stage architectures with a deeper network. Shallow network architectures lack quality in their results, while the methods with remarkable inpainting quality have high complexity in terms of parameter count or average run time. Despite improvements in inpainting quality, these methods still fail to correlate local and global information. In this work, we propose a single-stage multi-resolution generator architecture for image inpainting with moderate complexity and superior outcomes. A multi-kernel non-local (MKNL) attention block is proposed to merge the feature maps from all resolutions. Further, a feature projection block is proposed to project MKNL features to the respective decoder for effective image reconstruction. A valid feature fusion block is also proposed to merge encoder skip-connection features in the valid region with the respective decoder features in the hole region, ensuring that no redundant features are merged during image reconstruction. The effectiveness of the proposed architecture is verified on the CelebA-HQ [1], [2] and Places2 [3] datasets corrupted with the publicly available NVIDIA mask dataset [4]. A detailed ablation study, extensive result analysis, and an object-removal application demonstrate the robustness of the proposed method over existing state-of-the-art methods for image inpainting.
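The valid feature fusion idea, taking encoder features where pixels are known and decoder features where they are missing, can be sketched as a simple mask-gated merge (the paper's actual block may involve learned weighting; this is only the core masking logic):

```python
import numpy as np

def valid_feature_fusion(encoder_feat, decoder_feat, mask):
    """Merge encoder skip features in the valid (known) region with
    decoder features in the hole region.

    `mask` is binary: 1 marks a valid pixel, 0 marks a hole. Each output
    location therefore comes from exactly one source, so no redundant
    feature mixing happens at reconstruction time.
    """
    return mask * encoder_feat + (1.0 - mask) * decoder_feat
```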

3.
IEEE Trans Image Process ; 31: 6577-6590, 2022.
Article in English | MEDLINE | ID: mdl-36251900

ABSTRACT

Image inpainting is one of the most important and widely used approaches for synthesizing an input image at its missing regions, with applications such as undesired object removal and virtual garment shopping. Image inpainting methods may use knowledge of hole locations to regenerate image content effectively. Existing methods achieve impressive results with coarse-to-fine architectures or with guided information such as edges and structures. However, coarse-to-fine architectures require substantial resources, leading to high computational cost, while methods using edge or structural information depend on the availability of models that generate the guiding information. In this context, we propose a computationally efficient, lightweight network for image inpainting with very few parameters (0.97 M) and without any guided information. The proposed architecture consists of a multi-encoder-level feature fusion module, a pseudo decoder, and a regeneration decoder. The encoder multi-level feature fusion module extracts relevant information from each encoder level to merge structural and textural information from various receptive fields. This information is then processed by the pseudo decoder, followed by a space-depth correlation module, to assist the regeneration decoder in the inpainting task. Experiments are performed with different types of masks and compared with state-of-the-art methods on three benchmark datasets, i.e., Paris Street View (PARIS_SV), Places2, and CelebA-HQ. In addition, the proposed network is tested on high-resolution images (1024×1024 and 2048×2048) and compared with existing methods. The extensive comparison with state-of-the-art methods, computational complexity analysis, and ablation study prove the effectiveness of the proposed framework for image inpainting.

4.
IEEE Trans Image Process ; 30: 7889-7902, 2021.
Article in English | MEDLINE | ID: mdl-34478367

ABSTRACT

Moving object segmentation (MOS) in videos has received considerable attention because of its broad security-related applications, such as robotics, outdoor video surveillance, and self-driving cars. Prevailing algorithms depend heavily on additional modules trained for other applications or on complicated training procedures, or they neglect the inter-frame spatio-temporal structural dependencies. To address these issues, a simple, robust, and effective unified recurrent edge aggregation approach is proposed for MOS, requiring neither additional trained modules nor fine-tuning on test video frame(s). A recurrent edge aggregation module (REAM) is proposed to extract effective foreground-relevant features capturing spatio-temporal structural dependencies, with encoder and respective decoder features connected recurrently from the previous frame. These REAM features are then connected to a decoder through skip connections for comprehensive learning, termed temporal information propagation. Further, a motion refinement block with multi-scale dense residuals is proposed to combine features from the optical-flow encoder stream and the last REAM module for holistic feature learning. Finally, these holistic features and the REAM features are given to the decoder block for segmentation, with the previous frame's output at the respective scales used to guide the decoder. Different configurations of training-testing techniques are examined to evaluate the performance of the proposed method. Since outdoor videos often suffer from constrained visibility due to environmental conditions and small airborne particles that scatter light in the atmosphere, comprehensive result analysis is conducted on six benchmark video datasets with different surveillance environments. We demonstrate that the proposed method outperforms state-of-the-art methods for MOS without any pre-trained module, fine-tuning on test video frame(s), or complicated training.

5.
Article in English | MEDLINE | ID: mdl-31545721

ABSTRACT

Unlike prevalent facial expressions, micro-expressions involve subtle, involuntary muscle movements that are short-lived in nature. These minute muscle movements reflect a person's true emotions. Because of their short duration and low intensity, micro-expressions are very difficult to perceive and interpret correctly. In this paper, we propose a dynamic representation of micro-expressions that preserves the facial movement information of a video in a single frame. We also propose a Lateral Accretive Hybrid Network (LEARNet) to capture micro-level features of an expression in the facial region. LEARNet refines salient expression features in an accretive manner by incorporating accretion layers (AL) in the network. The response of an AL holds the hybrid feature maps generated by prior laterally connected convolution layers. Moreover, the LEARNet architecture incorporates a cross-decoupled relationship between convolution layers, which helps preserve the tiny but influential facial muscle change information. The visual responses of the proposed LEARNet demonstrate the effectiveness of the system in preserving both high- and micro-level edge features of facial expressions. The effectiveness of LEARNet is evaluated on four benchmark datasets: CASME-I, CASME-II, CAS(ME)^2, and SMIC. The experimental results show significant improvements of 4.03%, 1.90%, 1.79%, and 2.82% over ResNet on the CASME-I, CASME-II, CAS(ME)^2, and SMIC datasets, respectively.

6.
Article in English | MEDLINE | ID: mdl-31425076

ABSTRACT

Haze removal from a single image is a challenging task. Accurate estimation of the scene transmission map (TrMap) is the key to reconstructing the haze-free scene. In this paper, we propose a convolutional neural network-based architecture to estimate the TrMap of a hazy scene. The proposed network takes the hazy image as input, extracts haze-relevant features through the proposed RNet and YNet in the RGB and YCbCr color spaces respectively, and generates two TrMaps. Further, we propose a novel TrMap fusion network (FNet) to integrate the two TrMaps and estimate a robust TrMap for the hazy scene. To analyze the robustness of FNet, we tested it on combinations of TrMaps obtained from existing state-of-the-art methods. Performance evaluation of the proposed approach is carried out using the structural similarity index, mean square error, and peak signal-to-noise ratio. We conduct experiments on five datasets, namely D-HAZY [ancuti2016d], ImageNet [deng2009imagenet], Indoor SOTS [li2017reside], HazeRD [zhang2017hazerd], and a set of real-world hazy images. Performance analysis shows that the proposed approach outperforms existing state-of-the-art methods for single image dehazing. Further, we extend our work to a high-level vision task, object detection in hazy scenes, and observe a significant improvement in detection accuracy when using the proposed approach.
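Once a TrMap is estimated, single-image dehazing methods commonly invert the standard atmospheric scattering model I = J·t + A·(1 − t) to recover the scene; a minimal sketch of that inversion step (assuming this standard model and a lower clamp on transmission, details the abstract does not spell out) is:

```python
import numpy as np

def recover_scene(hazy, trmap, airlight, t_min=0.1):
    """Invert the atmospheric scattering model I = J*t + A*(1 - t)
    to recover the haze-free scene J from the hazy image I, the
    estimated transmission map t (TrMap), and the airlight A.

    Transmission is clamped from below to avoid division blow-up
    in densely hazy regions where t approaches zero.
    """
    t = np.maximum(trmap, t_min)
    return (hazy - airlight) / t + airlight
```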

8.
IEEE J Biomed Health Inform ; 18(3): 929-38, 2014 May.
Article in English | MEDLINE | ID: mdl-24235315

ABSTRACT

In this paper, a new image indexing and retrieval algorithm using local mesh patterns is proposed for biomedical image retrieval. The standard local binary pattern encodes the relationship between a referenced pixel and its surrounding neighbors, whereas the proposed method encodes the relationships among the surrounding neighbors themselves for a given referenced pixel. The possible relationships among the surrounding neighbors depend on the number of neighbors, P. In addition, the effectiveness of our algorithm is confirmed by combining it with the Gabor transform. To prove the effectiveness of our algorithm, three experiments were carried out on three different biomedical image databases: two for computed tomography (CT) and one for magnetic resonance (MR) image retrieval. The databases considered are the OASIS-MRI database, the NEMA-CT database, and the VIA/I-ELCAP database, which includes region-of-interest CT images. The results show a significant improvement in terms of evaluation measures as compared to LBP, LBP with the Gabor transform, and other spatial- and transform-domain methods.
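The center-versus-neighbor encoding of standard LBP, and the neighbor-versus-neighbor idea behind mesh patterns, can be contrasted on a 3×3 patch. The `step` pairing below is a simplified illustration of comparing neighbors with each other, not the paper's exact local mesh pattern rule:

```python
import numpy as np

def lbp_code(patch):
    """Standard LBP on a 3x3 patch: threshold the 8 ring neighbors
    against the center pixel and pack the sign bits into one byte."""
    center = patch[1, 1]
    # clockwise ring order starting at the top-left pixel
    neighbors = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    bits = (neighbors >= center).astype(int)
    return int(sum(b << i for i, b in enumerate(bits)))

def mesh_code(patch, step=2):
    """Mesh-style code: compare each ring neighbor with the neighbor
    `step` positions further around the ring, ignoring the center.
    A hypothetical pairing chosen for illustration only."""
    neighbors = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    bits = (np.roll(neighbors, -step) >= neighbors).astype(int)
    return int(sum(b << i for i, b in enumerate(bits)))
```

On a flat patch both codes saturate to 255 (all comparisons are `>=` ties); a bright center drives the LBP code to 0 while leaving the mesh code unchanged, which is the point of encoding neighbor-to-neighbor structure.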


Subject(s)
Algorithms , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Tomography, X-Ray Computed/methods , Adolescent , Adult , Aged , Aged, 80 and over , Databases, Factual , Humans , Middle Aged , Young Adult
9.
IEEE Trans Image Process ; 21(5): 2874-86, 2012 May.
Article in English | MEDLINE | ID: mdl-22514130

ABSTRACT

In this paper, we propose a novel image indexing and retrieval algorithm using local tetra patterns (LTrPs) for content-based image retrieval (CBIR). The standard local binary pattern (LBP) and local ternary pattern (LTP) encode the relationship between a referenced pixel and its surrounding neighbors by computing the gray-level difference. The proposed method encodes the relationship between the referenced pixel and its neighbors based on directions calculated using first-order derivatives in the vertical and horizontal directions. In addition, we propose a generic strategy to compute the nth-order LTrP using (n - 1)th-order horizontal and vertical derivatives for efficient CBIR, and analyze the effectiveness of our algorithm by combining it with the Gabor transform. The performance of the proposed method is compared with the LBP, local derivative patterns, and the LTP using benchmark image databases, viz., the Corel 1000 database (DB1), the Brodatz texture database (DB2), and the MIT VisTex database (DB3). Performance analysis shows that the proposed method improves the retrieval result from 70.34%/44.9% to 75.9%/48.7% in terms of average precision/average recall on database DB1, and from 79.97% to 85.30% and from 82.23% to 90.02% in terms of average retrieval rate on databases DB2 and DB3, respectively, compared with the standard LBP.
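The direction computation underlying tetra patterns, combining the signs of the horizontal and vertical first-order derivatives into one of four quadrant directions, can be sketched as follows (the specific quadrant numbering here is an assumption for illustration):

```python
import numpy as np

def pixel_direction(img, y, x):
    """First-order derivative direction at (y, x): the signs of the
    horizontal and vertical gray-level differences select one of four
    quadrant directions, the four-valued code that local tetra
    patterns are built from."""
    dh = float(img[y, x + 1]) - float(img[y, x])  # horizontal derivative
    dv = float(img[y + 1, x]) - float(img[y, x])  # vertical derivative
    if dh >= 0 and dv >= 0:
        return 1
    if dh < 0 and dv >= 0:
        return 2
    if dh < 0 and dv < 0:
        return 3
    return 4
```

A tetra pattern then compares this four-valued direction at the referenced pixel against the directions of its neighbors, rather than raw gray levels.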


Subject(s)
Algorithms , Documentation/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
10.
J Med Syst ; 36(5): 2865-79, 2012 Oct.
Article in English | MEDLINE | ID: mdl-21822675

ABSTRACT

A new algorithm for medical image retrieval is presented in this paper. An 8-bit grayscale image is divided into eight binary bit-planes, and then a binary wavelet transform (BWT), which is similar to the lifting scheme in the real wavelet transform (RWT), is performed on each bit-plane to extract multi-resolution binary images. Local binary pattern (LBP) features are extracted from the resultant BWT sub-bands. Three experiments were carried out to prove the effectiveness of the proposed algorithm: two for medical image retrieval and one for face retrieval. The databases considered are the OASIS magnetic resonance imaging (MRI) database, the NEMA computed tomography (CT) database, and the PolyU-NIRFD face database. The results show a significant improvement in terms of evaluation measures as compared to LBP and LBP with the Gabor transform.
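The first step of the pipeline, splitting an 8-bit image into its eight binary bit-planes before the per-plane BWT, is a simple bit-shift operation; a minimal sketch:

```python
import numpy as np

def bit_planes(gray):
    """Split an 8-bit grayscale image into its eight binary bit-planes
    (plane 0 = least significant bit, plane 7 = most significant),
    the decomposition applied before the per-plane binary wavelet
    transform."""
    return [(gray >> k) & 1 for k in range(8)]
```

For example, the pixel value 170 (binary 10101010) contributes a 0 to plane 0 and a 1 to plane 7, while 85 (binary 01010101) does the opposite.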


Subject(s)
Information Storage and Retrieval/methods , Wavelet Analysis , Algorithms , Diagnostic Imaging , Image Enhancement , Magnetic Resonance Imaging , Pattern Recognition, Automated , Tomography, X-Ray Computed