ABSTRACT
Wavefront aberration describes the deviation of a wavefront in an imaging system from a desired perfect shape, such as a plane or a sphere. It may be caused by factors such as imperfections in optical components, atmospheric turbulence, and the physical properties of the imaging subject and medium. Measuring the wavefront aberration of an imaging system is a crucial part of modern optics and optical engineering, with applications in adaptive optics, optical testing, microscopy, laser system design, and ophthalmology. While there are dedicated wavefront sensors that aim to measure the phase of light, they often exhibit drawbacks such as higher cost and lower spatial resolution compared to regular intensity measurement. In this paper, we introduce a lightweight and practical learning-based method, named LWNet, to recover the wavefront aberration of an imaging system from a single intensity measurement. Specifically, LWNet takes a measured point spread function (PSF) as input and recovers the wavefront aberration with a two-stage network. The first-stage network estimates an initial wavefront aberration via supervised learning, and the second-stage network further optimizes it via self-supervised learning, enforcing the statistical priors and physical constraints of wavefront aberrations through Zernike decomposition. For supervised learning, we created a synthetic PSF-wavefront aberration dataset by ray tracing 88 lenses. Experimental results show that even though it is trained with simulated data, LWNet works well for wavefront aberration estimation of real imaging systems and consistently outperforms prior learning-based methods.
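The physical constraint the self-supervised stage relies on is the standard scalar-diffraction relation between a Zernike-parameterized wavefront and the PSF. The numpy sketch below shows that forward model; the function names and sampling convention are illustrative, not LWNet's implementation.

```python
import numpy as np

def psf_from_wavefront(zernike_basis, coeffs, pupil_mask):
    """Standard Fourier-optics forward model: Zernike coefficients -> PSF.

    zernike_basis : (K, N, N) Zernike polynomials sampled on the pupil grid
    coeffs        : (K,) Zernike coefficients, assumed to be in waves
    pupil_mask    : (N, N) binary aperture mask
    """
    # Wavefront aberration as a weighted sum of Zernike modes.
    wavefront = np.tensordot(coeffs, zernike_basis, axes=1)
    # Generalized pupil function; phase error is 2*pi per wave of aberration.
    pupil = pupil_mask * np.exp(1j * 2 * np.pi * wavefront)
    # The incoherent PSF is the squared magnitude of the pupil's Fourier transform.
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()  # normalize to unit energy
```

Because every step is differentiable, a network's estimated Zernike coefficients can be checked against the measured PSF through this model, which is the kind of physics-based consistency the second stage enforces.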
ABSTRACT
We show that pre-trained Generative Adversarial Networks (GANs) such as StyleGAN and BigGAN can be used as a latent bank to improve the performance of image super-resolution. While most existing perceptual-oriented approaches attempt to generate realistic outputs through learning with an adversarial loss, our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging the rich and diverse priors encapsulated in a pre-trained GAN. Unlike prevalent GAN inversion methods that require expensive image-specific optimization at runtime, our approach needs only a single forward pass for restoration. GLEAN can be easily incorporated into a simple encoder-bank-decoder architecture with multi-resolution skip connections. Employing priors from different generative models allows GLEAN to be applied to diverse categories (e.g., human faces, cats, buildings, and cars). We further present a lightweight version of GLEAN, named LightGLEAN, which retains only the critical components of GLEAN. Notably, LightGLEAN uses only 21% of the parameters and 35% of the FLOPs while achieving comparable image quality. We extend our method to different tasks, including image colorization and blind image restoration, and extensive experiments show that our proposed models perform favorably against existing methods. Codes and models are available at https://github.com/open-mmlab/mmediting.
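As a structural illustration of the encoder-bank-decoder idea, the PyTorch sketch below wires a frozen pre-trained generator between a small encoder and decoder. The module sizes, the latent interface, the single skip connection, and the omitted upsampling stages are all simplifications of ours, not the released GLEAN architecture.

```python
import torch
import torch.nn as nn

class EncoderBankDecoder(nn.Module):
    """Structural sketch of an encoder-(latent bank)-decoder restorer.

    `pretrained_gan` stands in for a frozen generator (e.g., StyleGAN);
    here it is assumed to map a latent code to a (B, feat_ch, h, w)
    feature map. A real generator would instead expose intermediate
    features at multiple resolutions.
    """
    def __init__(self, pretrained_gan, feat_ch=64, latent_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(   # LR image -> conditioning features
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.to_latent = nn.Linear(feat_ch, latent_dim)  # pooled code -> GAN latent
        self.bank = pretrained_gan                       # frozen generative prior
        for p in self.bank.parameters():
            p.requires_grad_(False)
        self.decoder = nn.Sequential(   # fuse bank features + skips -> output image
            nn.Conv2d(feat_ch * 2, feat_ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(feat_ch, 3, 3, padding=1),
        )

    def forward(self, lr):
        skips = self.encoder(lr)                       # conditioning features
        code = self.to_latent(skips.mean(dim=(2, 3)))  # one forward pass,
        bank_feats = self.bank(code)                   # no per-image optimization
        bank_feats = nn.functional.interpolate(bank_feats, size=skips.shape[2:])
        return self.decoder(torch.cat([skips, bank_feats], dim=1))
```

The key design point survives the simplification: the generator is used as a fixed bank of priors queried in a feed-forward way, rather than inverted per image.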
ABSTRACT
Modern machine learning has enhanced image quality for consumer and mobile photography through low-light denoising, high dynamic range (HDR) imaging, and improved demosaicing, among other applications. While most of these advances have been made for conventional lens-based cameras, there is an emerging body of research on improved photography with lensless cameras that use thin optics such as amplitude or phase masks, diffraction gratings, or diffusion layers. These lensless cameras are suited to size- and cost-constrained applications, such as tiny robotics and microscopy, that prohibit the use of a large lens. However, the earliest and simplest camera design, the camera obscura or pinhole camera, has been relatively overlooked in machine learning pipelines, with minimal research on enhancing pinhole camera images for everyday photography. In this paper, we develop an image restoration pipeline for the pinhole system that enhances pinhole image quality through joint denoising and deblurring. Our pipeline integrates optics-based filtering and reblur losses for reconstructing high-resolution still images (2600 × 1952), as well as temporal consistency for video reconstruction, enabling practical frame rates (30 FPS) for high-resolution video (1920 × 1080). We demonstrate high 2D image quality on real pinhole images that is on par with, or slightly better than, other lensless cameras. This work opens up the potential for pinhole cameras to be used for photography in size-limited devices such as smartphones in the future.
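One ingredient of such a pipeline, the reblur loss, can be sketched briefly: the restored frame, re-blurred by the measured pinhole PSF, should reproduce the observation. The PyTorch sketch below is one plausible form of that consistency term; the paper's exact loss terms and weights may differ.

```python
import torch
import torch.nn.functional as F

def reblur_loss(restored, observed, psf):
    """Reblur consistency between a restored image and its observation.

    restored, observed : (B, C, H, W) tensors
    psf                : (1, 1, k, k) measured point spread function (k odd)
    """
    kernel = psf / psf.sum()                          # normalize the blur kernel
    kernel = kernel.expand(restored.shape[1], 1, *psf.shape[-2:])
    reblurred = F.conv2d(restored, kernel,
                         padding=psf.shape[-1] // 2,
                         groups=restored.shape[1])    # depthwise per-channel blur
    return F.l1_loss(reblurred, observed)
```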
ABSTRACT
While humans can effortlessly transform complex visual scenes into simple words, and vice versa, by leveraging their high-level understanding of the content, conventional and more recent learned image compression codecs do not seem to utilize the semantic meaning of visual content to its full potential. Moreover, they focus mostly on rate-distortion, tend to underperform in perceptual quality, especially in the low-bitrate regime, and often disregard the performance of downstream computer vision algorithms, a fast-growing consumer group of compressed images alongside human viewers. In this paper, we (1) present a generic framework that enables any image codec to leverage high-level semantics and (2) study the joint optimization of perceptual quality and distortion. Our idea is that, given any codec, we utilize high-level semantics to augment the low-level visual features it extracts, producing essentially a new, semantic-aware codec. We propose a three-phase training scheme that teaches semantic-aware codecs to leverage the power of semantics to jointly optimize rate-perception-distortion (R-PD) performance. As an additional benefit, semantic-aware codecs also boost the performance of downstream computer vision algorithms. To validate our claims, we perform extensive empirical evaluations and provide both quantitative and qualitative results.
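A minimal sketch of the two ingredients follows, assuming placeholder networks: low-level codec features augmented with a high-level semantic map, and a joint rate-perception-distortion objective. The fusion by concatenation and the trade-off weights are illustrative choices, not the paper's settings.

```python
import torch

def semantic_aware_encode(codec_encoder, semantic_net, image):
    """Augment a base codec's low-level features with high-level semantics.

    `codec_encoder` and `semantic_net` are placeholders for any codec's
    analysis transform and a semantic feature extractor (e.g., a
    segmentation backbone).
    """
    low_level = codec_encoder(image)                 # codec's visual features
    semantics = semantic_net(image)                  # high-level semantic map
    semantics = torch.nn.functional.interpolate(
        semantics, size=low_level.shape[2:],
        mode='bilinear', align_corners=False)        # match spatial size
    return torch.cat([low_level, semantics], dim=1)  # semantic-aware latent

def rpd_loss(rate, distortion, perception, lambda_d=1.0, lambda_p=0.1):
    """Joint rate-perception-distortion objective: L = R + ld*D + lp*P.
    `distortion` is a pixel-wise term (e.g., MSE) and `perception` an
    adversarial or perceptual term; the weights are illustrative."""
    return rate + lambda_d * distortion + lambda_p * perception
```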
Subjects
Data Compression, Algorithms, Humans, Image Enhancement/methods, Perception, Semantics
ABSTRACT
Low-light image enhancement (LLIE) aims at improving the perception or interpretability of an image captured in an environment with poor illumination. Recent advances in this area are dominated by deep learning-based solutions employing a variety of learning strategies, network structures, loss functions, and training data. In this paper, we provide a comprehensive survey covering aspects ranging from algorithm taxonomy to unsolved open issues. To examine the generalization of existing methods, we propose a low-light image and video dataset in which the images and videos are captured by different mobile phone cameras under diverse illumination conditions. In addition, for the first time, we provide a unified online platform that covers many popular LLIE methods, whose results can be produced through a user-friendly web interface. Beyond qualitative and quantitative evaluation of existing methods on publicly available datasets and our proposed one, we also validate their performance on face detection in the dark. This survey, together with the proposed dataset and online platform, can serve as a reference for future study and promote the development of this research field. The proposed platform and dataset, as well as the collected methods, datasets, and evaluation metrics, are publicly available and will be regularly updated. Project page: https://www.mmlab-ntu.com/project/lliv_survey/index.html.
ABSTRACT
Chimeric antigen receptor T (CAR T) cell therapy is a new pillar of cancer therapeutics and has been used successfully to treat cancers including acute lymphoblastic leukemia and some solid cancers. Following immune attack, many tumors upregulate inhibitory ligands that bind to inhibitory receptors on T cells. For example, the interaction between programmed cell death protein 1 (PD-1) on activated T cells and its ligand PD-L1 on a target tumor limits the efficacy of CAR T cell therapy against poorly responding tumors. Here, we use mesothelin (MSLN)-expressing human ovarian cancer cells (SKOV3) and human colon cancer cells (HCT116) to investigate whether PD-1-mediated T cell exhaustion affects the anti-tumor activity of MSLN-targeted CAR T cells. We utilized a cell-intrinsic PD-1-targeting shRNA overexpression strategy, resulting in significant PD-1 silencing in CAR T cells. The reduced PD-1 expression on the T cell surface strongly augmented CAR T cell cytokine production and cytotoxicity towards PD-L1-expressing cancer cells in vitro. This study indicates the enhanced anti-tumor efficacy of PD-1-silenced MSLN-targeted CAR T cells against several cancers and suggests the potential of silencing other immune checkpoint genes to enhance CAR T cell therapies against human tumors.
Subjects
GPI-Linked Proteins/antagonists & inhibitors, Immunotherapy, Adoptive/methods, Neoplasms/therapy, Programmed Cell Death 1 Receptor/genetics, Receptors, Chimeric Antigen/metabolism, B7-H1 Antigen/metabolism, Cell Line, Tumor, Cells, Cultured, Cytokines/metabolism, GPI-Linked Proteins/metabolism, HEK293 Cells, Humans, Lymphocyte Activation, Mesothelin, Neoplasms/immunology, Primary Cell Culture, Programmed Cell Death 1 Receptor/metabolism, RNA Interference, RNA, Small Interfering/metabolism, Receptors, Chimeric Antigen/genetics, Receptors, Chimeric Antigen/immunology, T-Lymphocytes/immunology, T-Lymphocytes/metabolism, T-Lymphocytes/transplantation
ABSTRACT
Cameras face a fundamental trade-off between spatial and temporal resolution. Digital still cameras can capture images with high spatial resolution, but most high-speed video cameras have relatively low spatial resolution. It is hard to overcome this trade-off without incurring a significant increase in hardware cost. In this paper, we propose techniques for sampling, representing, and reconstructing the space-time volume to overcome this trade-off. Our approach has two important distinctions compared to previous works: 1) we achieve sparse representation of videos by learning an overcomplete dictionary on video patches, and 2) we adhere to practical hardware constraints on sampling schemes imposed by the architectures of current image sensors, which means our sampling function can be implemented on CMOS image sensors with modified control units in the future. We evaluate the components of our approach, the sampling function and the sparse representation, by comparing them to several existing approaches. We also implement a prototype imaging system with pixel-wise coded exposure control using a liquid-crystal-on-silicon device. System characteristics such as field of view and modulation transfer function are evaluated for our imaging system. Both simulations and experiments on a wide range of scenes show that our method can effectively reconstruct a video from a single coded image while maintaining high spatial resolution.
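The reconstruction step can be sketched as sparse coding over the learned dictionary, restricted to the samples the coded exposure actually captures. Below is one plausible instantiation using orthogonal matching pursuit; the variable names and sparsity level are assumptions, not the paper's solver.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def reconstruct_patch(y, S, D, n_nonzero=8):
    """Recover a space-time video patch from one coded-exposure image patch.

    y : (m,) coded measurement (captured pixel values for the patch)
    S : (m, n) per-pixel exposure sampling matrix (binary, sensor-feasible)
    D : (n, k) overcomplete dictionary learned on video patches
    Solves a sparse code alpha with S @ D @ alpha ~ y, then returns
    x = D @ alpha as the reconstructed space-time volume patch.
    """
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False)
    omp.fit(S @ D, y)            # sparse code of the coded measurement
    alpha = omp.coef_
    return D @ alpha             # reconstructed space-time patch
```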
Subjects
Algorithms, Data Compression/methods, Image Enhancement/methods, Photography/methods, Signal Processing, Computer-Assisted, Video Recording/methods, Image Enhancement/instrumentation, Photography/instrumentation, Reproducibility of Results, Sample Size, Sensitivity and Specificity, Spatio-Temporal Analysis, Video Recording/instrumentation
ABSTRACT
Classifying raw, unpainted materials--metal, plastic, ceramic, fabric, and so on--is an important yet challenging task for computer vision. Previous works measure subsets of the surface spectral reflectance as features for classification. However, acquiring the full spectral reflectance is time-consuming and error-prone. In this paper, we propose to use coded illumination to directly measure discriminative features for material classification. Optimal illumination patterns--which we call "discriminative illumination"--are learned from training samples such that, after projection onto them, the spectral reflectances of different materials are maximally separated. This projection is realized automatically by the integration of incident light during surface reflection. While a single discriminative illumination is capable of linear two-class classification, we show that multiple discriminative illuminations can be used for nonlinear and multiclass classification. We also show theoretically that the proposed method has a higher signal-to-noise ratio than previous methods due to light multiplexing. Finally, we construct an LED-based multispectral dome and use the discriminative illumination method to classify a variety of raw materials, including metals (aluminum, alloy, steel, stainless steel, brass, and copper), plastic, ceramic, fabric, and wood. Experimental results demonstrate its effectiveness.
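The core idea, a classifier weight vector realized optically as illumination, can be sketched as follows. A linear SVM stands in for whatever discriminative training the paper uses, and splitting the weights into two nonnegative patterns reflects the physical constraint that light cannot be negative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def learn_discriminative_illumination(spectra, labels):
    """Learn a two-class 'discriminative illumination' from reflectances.

    spectra : (n_samples, n_bands) training surface reflectance spectra
    labels  : (n_samples,) binary material labels
    The optical measurement under illumination w is the projection w.r,
    so the classifier weights themselves become illumination patterns.
    """
    clf = LinearSVC().fit(spectra, labels)
    w = clf.coef_.ravel()
    w_pos, w_neg = np.maximum(w, 0), np.maximum(-w, 0)  # two physical patterns
    return w_pos, w_neg, clf.intercept_[0]

def classify(measure, w_pos, w_neg, bias):
    """`measure(pattern)` is the camera response under that illumination;
    the difference of the two captures gives the signed projection."""
    score = measure(w_pos) - measure(w_neg) + bias
    return score > 0
```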
ABSTRACT
We propose a new method, named compressive structured light, for recovering inhomogeneous participating media. Whereas conventional structured light methods emit coded light patterns onto the surface of an opaque object to establish correspondences for triangulation, compressive structured light projects patterns into a volume of participating medium to produce images that are integral measurements of the volume density along the line of sight. For a typical participating medium encountered in the real world, the integral nature of the acquired images enables the use of compressive sensing techniques that can recover the entire volume density from only a few measurements. This makes the acquisition process more efficient and enables the reconstruction of dynamic volumetric phenomena. Moreover, our method projects multiplexed coded illumination, which has the added advantage of increasing the signal-to-noise ratio of the acquisition. Finally, we propose an iterative algorithm to correct for the attenuation of the participating medium during the reconstruction process. We show the effectiveness of our method with simulations as well as experiments on the volumetric recovery of multiple translucent layers, 3D point clouds etched in glass, and the dynamic process of milk drops dissolving in water.
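Per pixel, the measurements are coded integrals of the density along the line of sight, so recovery reduces to an l1-regularized inverse problem. The ISTA sketch below is one generic solver for that step; it omits the paper's attenuation correction, and the parameter values are illustrative.

```python
import numpy as np

def ista_recover(y, A, lam=0.01, n_iter=200):
    """Recover density x along one line of sight from coded integrals y = A x.

    y : (m,) image intensities at a pixel under m coded light patterns
    A : (m, n) pattern values sampled along that pixel's line of sight
    Solves min_x 0.5*||A x - y||^2 + lam*||x||_1 with nonnegative x.
    """
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)        # gradient of the data term
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)  # soft threshold
        x = np.maximum(x, 0)            # volume density is nonnegative
    return x
```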
ABSTRACT
Fingerprint analysis is typically based on the location and pattern of singular points detected in the images. These singular points (cores and deltas) not only represent the characteristics of local ridge patterns but also determine the topological structure (i.e., fingerprint type) and largely influence the orientation field. In this paper, we propose a novel algorithm for singular point detection. After an initial detection using the conventional Poincaré Index method, a so-called DORIC feature is used to remove spurious singular points. Then, the optimal combination of singular points is selected to minimize the difference between the original orientation field and the model-based orientation field reconstructed from the singular points. A core-delta relation is used as a global constraint for the final selection of singular points. Experimental results show that our algorithm is accurate and robust, giving better results than competing approaches. The proposed detection algorithm can also be applied to more general 2D oriented patterns, such as fluid flow motion.
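The initial detection step, the Poincaré index, sums wrapped orientation differences around a closed path: a sum near +π flags a core and near -π a delta. A minimal numpy sketch using an 8-neighborhood path follows; the block size and the DORIC pruning are not reproduced.

```python
import numpy as np

def poincare_index(theta, i, j):
    """Poincaré index of an orientation field at location (i, j).

    theta : (H, W) ridge orientations in [0, pi)
    Returns ~ +pi for a core (index +1/2), ~ -pi for a delta (index -1/2),
    and ~ 0 elsewhere.
    """
    # Closed counterclockwise path around (i, j).
    path = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
    total = 0.0
    for (da, db), (dc, dd) in zip(path[:-1], path[1:]):
        d = theta[i + dc, j + dd] - theta[i + da, j + db]
        # Wrap into (-pi/2, pi/2]: orientations are defined modulo pi.
        if d > np.pi / 2:
            d -= np.pi
        elif d <= -np.pi / 2:
            d += np.pi
        total += d
    return total
```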
Subjects
Algorithms, Artificial Intelligence, Dermatoglyphics/classification, Image Interpretation, Computer-Assisted/methods, Pattern Recognition, Automated/methods, Skin/anatomy & histology, Subtraction Technique, Humans, Image Enhancement/methods, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
As an important feature, the orientation field describes the global structure of fingerprints. It provides robust discriminatory information beyond the traditional, widely used minutiae points. However, few works explicitly incorporate this information into the fingerprint matching stage, partly due to the difficulty of saving the orientation field in the feature template. In this paper, we propose a novel representation for fingerprints that includes both minutiae and a model-based orientation field. Fingerprint matching is then done by combining the decisions of matchers based on the global structure (orientation field) and the local cue (minutiae). We have conducted a set of experiments on large-scale databases and made thorough comparisons with the state of the art. Extensive experimental results show that combining this local and global discriminative information can largely improve performance. The proposed system is more robust and accurate than conventional minutiae-based methods, and also better than previous works that incorporate orientation information only implicitly. In this system, the feature template takes less than 420 bytes, and the feature extraction and matching procedures can be completed in about 0.30 s. We also show that the global orientation field is beneficial to the alignment of fingerprints that are incomplete or of low quality.
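Decision-level combination of the two matchers can be sketched as a weighted score fusion, with one plausible orientation-field similarity serving as the global cue. Both the similarity formula and the fusion weight are illustrative, not the paper's tuned choices.

```python
import numpy as np

def orientation_similarity(theta_a, theta_b, mask):
    """Similarity of two aligned orientation fields (angles modulo pi),
    averaged over the valid overlap `mask`; returns a value in [0, 1]."""
    d = np.abs(theta_a - theta_b)
    d = np.minimum(d, np.pi - d)        # angular distance modulo pi
    return 1.0 - d[mask].mean() / (np.pi / 2)

def fused_match_score(s_minutiae, s_orientation, w=0.5):
    """Fuse the local (minutiae) and global (orientation field) matcher
    scores; the weight is illustrative."""
    return w * s_minutiae + (1.0 - w) * s_orientation
```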
Subjects
Biometry/methods, Dermatoglyphics, Fingers/anatomy & histology, Image Interpretation, Computer-Assisted/methods, Information Storage and Retrieval/methods, Pattern Recognition, Automated/methods, Subtraction Technique, Algorithms, Artificial Intelligence, Cues, Humans, Image Enhancement/methods, Models, Biological, Reproducibility of Results, Sensitivity and Specificity, Signal Processing, Computer-Assisted
ABSTRACT
Content-based image retrieval (CBIR) has become increasingly important over the last decade, and the gap between high-level semantic concepts and low-level visual features hinders further performance improvement. The problem of online feature selection is critical to truly bridging this gap. In this paper, we investigate online feature selection in the relevance feedback learning process to improve the retrieval performance of region-based image retrieval systems. Our contributions are mainly in three areas. 1) A novel feature selection criterion is proposed, based on the psychological similarity between the positive and negative training sets. 2) An effective online feature selection algorithm is implemented in a boosting manner to select the most representative features for the current query concept and to combine classifiers constructed over the selected features to retrieve images. 3) To apply the proposed feature selection method in region-based image retrieval systems, we propose a novel region-based representation that describes images in a uniform feature space with real-valued fuzzy features. Our system is suitable for online relevance feedback learning in CBIR by meeting three requirements: learning from a small training set, handling the intrinsic asymmetry of training samples, and responding quickly. Extensive experiments, including comparisons with many state-of-the-art methods, show the effectiveness of our algorithm in improving retrieval performance and saving processing time.
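The boosting-style selection loop can be sketched with threshold stumps as weak learners: each round picks the single most discriminative feature under the current sample weights, then reweights. This is generic AdaBoost-based selection, not the paper's psychological-similarity criterion.

```python
import numpy as np

def boosted_feature_selection(X, y, n_rounds=10):
    """Select features and build a classifier committee in a boosting manner.

    X : (n_samples, n_features) region-based feature vectors
    y : (n_samples,) relevance-feedback labels in {-1, +1}
    Returns a list of weak classifiers (feature, threshold, polarity, alpha).
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # sample weights
    committee = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                  # try each feature as a stump
            for t in np.unique(X[:, j]):
                for s in (+1, -1):
                    pred = s * np.sign(X[:, j] - t + 1e-12)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = np.clip(err, 1e-9, 1 - 1e-9)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.sign(X[:, j] - t + 1e-12)
        w *= np.exp(-alpha * y * pred)      # upweight mistakes
        w /= w.sum()
        committee.append((j, t, s, alpha))
    return committee
```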
Subjects
Algorithms, Artificial Intelligence, Image Enhancement/methods, Image Interpretation, Computer-Assisted/methods, Information Storage and Retrieval/methods, Pattern Recognition, Automated/methods, Subtraction Technique, Online Systems
ABSTRACT
As a global feature of fingerprints, the orientation field is very important for automatic fingerprint recognition. Many algorithms have been proposed for orientation field estimation, but their results are unsatisfactory, especially for poor-quality fingerprint images. In this paper, a model-based method for computing the orientation field is proposed. First, a combination model is established to represent the orientation field by considering its smoothness everywhere except at several singular points: a polynomial model describes the orientation field globally, and a point-charge model improves the accuracy locally at each singular point. After a coarse field is computed using a gradient-based algorithm, a refined result is obtained by fitting the model through weighted approximation. Owing to the global approximation, this model-based orientation field estimation algorithm performs robustly on different fingerprint images. A further experiment shows that the performance of a whole fingerprint recognition system can be improved by applying this algorithm in place of previous orientation estimation methods.
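The global polynomial component can be sketched by fitting the doubled-angle representation (cos 2θ, sin 2θ) with bivariate polynomials via least squares; doubling removes the 0/π discontinuity of orientations. The point-charge correction at singularities is omitted here, and the polynomial order is an assumption.

```python
import numpy as np

def fit_polynomial_orientation(theta, coords, order=4):
    """Fit a global polynomial model to a coarse orientation field.

    theta  : (n,) coarse orientations at sample points, in [0, pi)
    coords : (n, 2) normalized (x, y) positions of those points
    Each doubled-angle component is approximated by a bivariate polynomial
    of total degree <= order via least squares.
    """
    x, y = coords[:, 0], coords[:, 1]
    # Design matrix of monomials x^i * y^j with i + j <= order.
    cols = [x**i * y**j for i in range(order + 1)
            for j in range(order + 1 - i)]
    V = np.stack(cols, axis=1)
    c_cos, *_ = np.linalg.lstsq(V, np.cos(2 * theta), rcond=None)
    c_sin, *_ = np.linalg.lstsq(V, np.sin(2 * theta), rcond=None)

    def predict(px, py):
        # Evaluate the fitted model and halve the angle back to [0, pi).
        m = np.stack([px**i * py**j for i in range(order + 1)
                      for j in range(order + 1 - i)], axis=-1)
        return 0.5 * np.arctan2(m @ c_sin, m @ c_cos) % np.pi
    return predict
```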