Results 1 - 20 of 48
1.
Sensors (Basel) ; 15(2): 4326-52, 2015 Feb 12.
Article in English | MEDLINE | ID: mdl-25686317

ABSTRACT

We propose a novel biometric recognition method that identifies the inner knuckle print (IKP). It is robust enough to confront uncontrolled lighting conditions, pose variations, and low imaging quality. Such robustness is crucial for its application on portable devices equipped with consumer-level cameras. We achieve this robustness in two ways. First, we propose a novel feature extraction scheme that highlights the salient structure and suppresses incorrect and/or unwanted features. The extracted IKP features retain simple geometry and morphology and reduce the interference of illumination. Second, to counteract the deformation induced by different hand orientations, we propose a novel structure-context descriptor based on local statistics. To the best of our knowledge, we are the first to simultaneously consider illumination invariance and deformation tolerance for appearance-based low-resolution hand biometrics. Previous works operated in more restrictive settings, making strong assumptions about either the illumination conditions or the hand orientation. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in terms of recognition accuracy, especially under uncontrolled lighting conditions and flexible hand orientations.

2.
Article in English | MEDLINE | ID: mdl-38354074

ABSTRACT

Creating a vivid video from an event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it may be insufficient for precise control. In this paper, we explore customized video generation by utilizing text as a context description and motion structure (e.g., frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted for video generation through the introduction of temporal modules. This two-stage learning scheme not only reduces the computing resources required, but also improves performance by transferring the rich concepts available in image datasets into video generation. Moreover, we use a simple yet effective causal attention mask strategy to enable longer video synthesis, which effectively mitigates potential quality degradation. Experimental results show the superiority of our method over existing baselines, particularly in terms of temporal coherence and fidelity to users' guidance. In addition, our model enables several intriguing applications that demonstrate its potential for practical usage. The code, model weights, and videos are publicly available at our project page: https://doubiiu.github.io/projects/Make-Your-Video/.

3.
IEEE Trans Image Process ; 32: 4259-4274, 2023.
Article in English | MEDLINE | ID: mdl-37486835

ABSTRACT

Conventional social media platforms usually downscale high-resolution (HR) images to restrict their resolution to a specific size for saving transmission/storage cost, which makes those visual details inaccessible to other users. To bypass this obstacle, recent invertible image downscaling methods jointly model the downscaling/upscaling problems and achieve impressive performance. However, they only consider fixed integer scale factors and may be inapplicable to the generic downscaling tasks posed by the resolution restrictions of social media platforms. In this paper, we propose an effective and universal Scale-Arbitrary Invertible Image Downscaling Network (AIDN) to downscale HR images with arbitrary scale factors in an invertible manner. In particular, the HR information is embedded in the downscaled low-resolution (LR) counterparts in a nearly imperceptible form, such that our AIDN can restore the original HR images solely from the LR images. The key to supporting arbitrary scale factors is our proposed Conditional Resampling Module (CRM), which conditions the downscaling/upscaling kernels and sampling locations on both the scale factors and the image content. Extensive experimental results demonstrate that our AIDN achieves top performance for invertible downscaling with both arbitrary integer and non-integer scale factors. Also, both quantitative and qualitative evaluations show that our AIDN is robust to lossy image compression. The source code and trained models are publicly available at https://github.com/Doubiiu/AIDN.

4.
Article in English | MEDLINE | ID: mdl-37220038

ABSTRACT

Traditional halftoning usually drops colors when dithering images with binary dots, which makes it difficult to recover the original color information. We propose a novel halftoning technique that converts a color image into a binary halftone that is fully restorable to its original version. Our base technique consists of two convolutional neural networks (CNNs) to produce the reversible halftone patterns, and a noise incentive block (NIB) to mitigate the flatness degradation issue of CNNs. Furthermore, to resolve the conflict between blue-noise quality and restoration accuracy in the base method, we propose a predictor-embedded approach to offload predictable information from the network, which in our case is the luminance information resembling the halftone pattern. This approach gives the network more flexibility to produce halftones with better blue-noise quality without compromising the restoration quality. We conduct detailed studies on the multiple-stage training method and the loss weightings. We compare our predictor-embedded method with the base method with respect to spectrum analysis of the halftones, halftone accuracy, restoration accuracy, and data embedding. Our entropy evaluation shows that our halftone contains less encoded information than that of the base method. The experiments show that our predictor-embedded method gains more flexibility to improve the blue-noise quality of halftones and maintains a comparable restoration quality with a higher tolerance for disturbances.

5.
IEEE Trans Vis Comput Graph ; 29(7): 3226-3237, 2023 Jul.
Article in English | MEDLINE | ID: mdl-35239483

ABSTRACT

This work presents an innovative method for point set self-embedding, which encodes the structural information of a dense point set into its sparser version in a visual but imperceptible form. The self-embedded point set can function as an ordinary downsampled one and be visualized efficiently on mobile devices. In particular, we can leverage the self-embedded information to fully restore the original point set for detailed analysis on remote servers. This task is challenging, since both the self-embedded point set and the restored point set should resemble the original one. To achieve a learnable self-embedding scheme, we design a novel framework with two jointly trained networks: one to encode the input point set into its self-embedded sparse point set, and the other to leverage the embedded information to invert the original point set back. Further, we develop a pair of up-shuffle and down-shuffle units in the two networks, and formulate loss terms to encourage shape similarity and a desirable point distribution in the results. Extensive qualitative and quantitative results demonstrate the effectiveness of our method on both synthetic and real-scanned datasets. The source code and trained models will be publicly available at https://github.com/liruihui/Self-Embedding.

6.
Article in English | MEDLINE | ID: mdl-36350869

ABSTRACT

Light fields are 4D scene representations that are typically structured as arrays of views, or as several directional samples per pixel in a single view. However, this highly correlated structure is not very efficient to transmit and manipulate, especially for editing. To tackle this issue, we propose a novel representation learning framework that can encode the light field into a single meta-view that is both compact and editable. Specifically, the meta-view is composed of three visual channels and a complementary meta channel embedded with geometric and residual appearance information. The visual channels can be edited using existing 2D image editing tools before reconstructing the whole edited light field. To facilitate edit propagation against occlusion, we design a special editing-aware decoding network that consistently propagates the visual edits to the whole light field upon reconstruction. Extensive experiments show that our proposed method achieves competitive representation accuracy while enabling consistent edit propagation.

7.
Neuroimage ; 54 Suppl 1: S180-8, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20382235

ABSTRACT

The vestibular system is the sensory organ responsible for perceiving head rotational movements and maintaining the postural balance of the human body. The objectives of this study are to propose an innovative computational technique capable of automatically segmenting the vestibular system and to analyze its geometrical features from high-resolution T2-weighted MR images. In this study, the proposed technique was used to test the hypothesis that the morphoanatomy of the vestibular system in adolescent idiopathic scoliosis (AIS) patients differs from that of healthy control subjects. The findings could contribute significantly to the understanding of the etiopathogenesis of AIS. The segmentation pipeline consisted of extraction of the region of interest, image pre-processing, K-means clustering, and surface smoothing. The geometry of this high-genus labyrinth structure was analyzed by automatic partition into genus-0 units and approximation using the best-fit circle and plane for each unit. The metrics of the best-fit planes and circles were taken as shape measures. The proposed technique was applied to a cohort of 20 right-thoracic AIS patients (mean age 14.7 years) and 20 age-matched healthy girls. The intermediate results were validated by subjective scoring. The results showed that the distance between the centers of the lateral and superior canals and the angle with vertex at the center of the posterior canal were significantly smaller in AIS patients than in healthy controls in the left-side vestibular system (p=0.0264 and p=0.0200, respectively), but not in the right-side counterparts. The detected morphoanatomical changes are likely to be associated with the subclinical postural, vestibular, and proprioceptive dysfunctions frequently reported in AIS. This study has demonstrated that the proposed method can be applied in MRI-based morphoanatomy studies of the vestibular system in clinical settings.


Subjects
Image Interpretation, Computer-Assisted/methods; Magnetic Resonance Imaging; Scoliosis/pathology; Vestibule, Labyrinth/pathology; Adolescent; Female; Humans
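The segmentation pipeline above applies K-means clustering to MR voxel intensities. As a rough illustration only (this is not the authors' code; the function name, the 1-D simplification, and all parameters are our own), a minimal pure-Python K-means might look like:

```python
import random

def kmeans_1d(values, k, iters=50, seed=0):
    """Cluster scalar values (e.g., voxel intensities) into k groups.
    Illustrative 1-D K-means; a real pipeline clusters a full volume."""
    rng = random.Random(seed)
    centers = sorted(rng.sample(values, k))
    for _ in range(iters):
        # Assign each value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            i = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[i].append(v)
        # Recompute centers as cluster means; keep empty clusters fixed.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return centers, labels
```

In a segmentation setting, the cluster whose center matches the bright fluid signal of the labyrinth would be kept as the foreground mask before surface smoothing.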
8.
IEEE Trans Pattern Anal Mach Intell ; 43(12): 4491-4504, 2021 Dec.
Article in English | MEDLINE | ID: mdl-32750783

ABSTRACT

Unlike with images, finding the desired content in a large pool of videos is not easy due to the time cost of loading and watching them. Most video streaming and sharing services provide a video preview function for a better browsing experience. In this paper, we aim to generate a video preview from a single image. To this end, we propose two cascaded networks, the motion embedding network and the motion expansion network. The motion embedding network embeds the spatio-temporal information into an embedded image, called a video snapshot. Conversely, the motion expansion network recovers the video from the input video snapshot. To preserve the invertibility of motion embedding and expansion during training, we design four tailor-made losses and a motion attention module that makes the network focus on the temporal information. To enhance the viewing experience, our expansion network includes an interpolation module that produces a longer video preview with smooth transitions. Extensive experiments demonstrate that our method can successfully embed the spatio-temporal information of a video into one "live" image, which can be converted back into a video preview. Quantitative and qualitative evaluations on a large number of videos confirm the effectiveness of the proposed method. In particular, PSNR and SSIM statistics over a large number of videos show that the method is general and can generate a high-quality video from a single image.

9.
IEEE Trans Vis Comput Graph ; 27(1): 178-189, 2021 Jan.
Article in English | MEDLINE | ID: mdl-31352345

ABSTRACT

Deep learning has recently been demonstrated as an effective tool for raster-based sketch simplification. Nevertheless, it remains challenging to simplify extremely rough sketches. We found that a simplification network trained with a simple loss, such as a pixel loss or a discriminator loss, may fail to retain the semantically meaningful details when simplifying a very sketchy and complicated drawing. In this paper, we show that, with a well-designed multi-layer perceptual loss, we are able to obtain aesthetic and neat simplification results that preserve semantically important global structures as well as fine details, without blurriness or excessive emphasis on local structures. To do so, we design a multi-layer discriminator by fusing all VGG feature layers to differentiate sketches from clean lines. The weights used in layer fusing are automatically learned via an intelligent adjustment mechanism. Furthermore, we evaluate our method against state-of-the-art methods through multiple experiments, including visual comparisons and an intensive user study.

10.
IEEE Trans Vis Comput Graph ; 16(2): 287-97, 2010.
Article in English | MEDLINE | ID: mdl-20075488

ABSTRACT

We propose a novel reaction-diffusion (RD) simulator to evolve image-resembling mazes. The evolved mazes faithfully preserve the salient interior structures of the source images. Since it is difficult to control the generation of desired patterns with traditional reaction-diffusion, we develop our RD simulator on a different computational platform, cellular neural networks. Based on the proposed simulator, we can generate mazes that exhibit both regular and organic appearance, with uniform and/or spatially varying passage spacing. Our simulator also provides high controllability of maze appearance. Users can directly and intuitively "paint" to modify the appearance of mazes in a spatially varying manner via a set of brushes. In addition, the evolutionary nature of our method naturally generates mazes without any obvious seams, even when the input image is a composite of multiple sources. The final maze is obtained by determining a solution path that follows the user-specified guiding curve. We validate our method by evolving several interesting mazes from different source images.


Subjects
Algorithms; Computer Graphics; Image Interpretation, Computer-Assisted/methods; Models, Theoretical; Computer Simulation
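The abstract notes that classic reaction-diffusion is hard to steer toward desired patterns, which motivates the authors' move to cellular neural networks. For readers unfamiliar with the baseline, here is a hedged sketch of one explicit Euler step of the classic Gray-Scott reaction-diffusion model (not the authors' CNN-based simulator; the parameter values are generic textbook choices):

```python
def rd_step(u, v, du=0.16, dv=0.08, f=0.035, k=0.065, dt=1.0):
    """One explicit Euler step of Gray-Scott reaction-diffusion
    on a 2-D grid with wrap-around (toroidal) boundaries."""
    h, w = len(u), len(u[0])

    def lap(g, y, x):
        # 4-neighbor discrete Laplacian with periodic boundaries.
        return (g[(y - 1) % h][x] + g[(y + 1) % h][x] +
                g[y][(x - 1) % w] + g[y][(x + 1) % w] - 4 * g[y][x])

    nu = [[0.0] * w for _ in range(h)]
    nv = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            uvv = u[y][x] * v[y][x] ** 2  # reaction term u*v^2
            nu[y][x] = u[y][x] + dt * (du * lap(u, y, x) - uvv + f * (1 - u[y][x]))
            nv[y][x] = v[y][x] + dt * (dv * lap(v, y, x) + uvv - (f + k) * v[y][x])
    return nu, nv
```

Iterating such a step from a seeded initial state produces the labyrinthine stripe patterns that maze-generation work builds on; the paper's contribution is making those patterns controllable and image-resembling.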
11.
IEEE Trans Vis Comput Graph ; 16(1): 43-56, 2010.
Article in English | MEDLINE | ID: mdl-19910660

ABSTRACT

This paper proposes a novel multiscale spherical radial basis function (MSRBF) representation for all-frequency lighting. It supports the illumination of distant environments as well as the local illumination commonly used in practical applications such as games. The key is to define a multiscale and hierarchical structure of spherical radial basis functions (SRBFs) with basis functions uniformly distributed over the sphere. The basis functions are divided into multiple levels according to their coverage (widths); within the same level, all SRBFs share the same width. Larger-width SRBFs are responsible for the lower-frequency lighting, while the smaller-width ones are responsible for the higher-frequency lighting. Hence, our approach can achieve true all-frequency lighting, which is not achievable with the single-scale SRBF approach. Moreover, the MSRBF approach is scalable, as coarser rendering quality can be achieved without re-estimating the coefficients from the raw data. With the homogeneous form of basis functions, rendering is highly efficient. The practicability of the proposed method is demonstrated with real-time rendering and effective compression for tractable storage.


Subjects
Algorithms; Computer Graphics; Imaging, Three-Dimensional/methods; Lighting/methods; Models, Theoretical; User-Computer Interface; Computer Simulation
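As a concrete illustration of the representation (a sketch under our own naming, not code from the paper), a Gaussian-style SRBF depends only on the angle between the query direction and the basis center, with a width parameter, and a multiscale evaluation simply sums the per-level contributions:

```python
import math

def srbf(v, center, lam):
    """Gaussian spherical radial basis function: its value depends only on
    the angle between unit vectors v and center; lam controls the width
    (large lam = narrow lobe = high-frequency detail)."""
    dot = sum(a * b for a, b in zip(v, center))
    return math.exp(lam * (dot - 1.0))

def eval_lighting(v, levels):
    """Sum contributions from every level; each level holds
    (coefficient, center, lambda) triples sharing one width."""
    return sum(c * srbf(v, ctr, lam)
               for level in levels
               for c, ctr, lam in level)
```

In this sketch, coarser rendering quality corresponds to simply omitting the finer (larger-lambda) levels from the sum, leaving the remaining coefficients untouched.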
12.
IEEE Trans Pattern Anal Mach Intell ; 31(6): 974-88, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19372604

ABSTRACT

This paper presents a novel method for recovering consistent depth maps from a video sequence. We propose a bundle optimization framework to address the major difficulties in stereo reconstruction, such as dealing with image noise, occlusions, and outliers. Different from the typical multi-view stereo methods, our approach not only imposes the photo-consistency constraint, but also explicitly associates the geometric coherence with multiple frames in a statistical way. It thus can naturally maintain the temporal coherence of the recovered dense depth maps without over-smoothing. To make the inference tractable, we introduce an iterative optimization scheme by first initializing the disparity maps using a segmentation prior and then refining the disparities by means of bundle optimization. Instead of defining the visibility parameters, our method implicitly models the reconstruction noise as well as the probabilistic visibility. After bundle optimization, we introduce an efficient space-time fusion algorithm to further reduce the reconstruction noise. Our automatic depth recovery is evaluated using a variety of challenging video examples.


Subjects
Algorithms; Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Photography/methods; Video Recording/methods; Image Enhancement/methods; Reproducibility of Results; Sensitivity and Specificity
13.
IEEE Trans Vis Comput Graph ; 15(5): 828-40, 2009.
Article in English | MEDLINE | ID: mdl-19590108

ABSTRACT

Compared to still image editing, content-based video editing faces the additional challenges of maintaining the spatiotemporal consistency with respect to geometry. This brings up difficulties of seamlessly modifying video content, for instance, inserting or removing an object. In this paper, we present a new video editing system for creating spatiotemporally consistent and visually appealing refilming effects. Unlike the typical filming practice, our system requires no labor-intensive construction of 3D models/surfaces mimicking the real scene. Instead, it is based on an unsupervised inference of view-dependent depth maps for all video frames. We provide interactive tools requiring only a small amount of user input to perform elementary video content editing, such as separating video layers, completing background scene, and extracting moving objects. These tools can be utilized to produce a variety of visual effects in our system, including but not limited to video composition, "predator" effect, bullet-time, depth-of-field, and fog synthesis. Some of the effects can be achieved in real time.

14.
IEEE Trans Vis Comput Graph ; 24(5): 1705-1716, 2018 05.
Article in English | MEDLINE | ID: mdl-28436877

ABSTRACT

Most graphics hardware features memory to store textures and vertex data for rendering. However, because of the irreversible trend of increasing scene complexity, rendering a scene can easily reach the limit of memory resources. Thus, vertex data are preferably compressed, with the requirement that they can be decompressed during rendering. In this paper, we present a novel method that exploits existing hardware texture compression circuits to facilitate the decompression of vertex data in graphics processing units (GPUs). This built-in hardware allows real-time, random-order decoding of data. However, vertex data must be packed into textures, and careless packing arrangements can easily disrupt data coherence. Hence, we propose an optimization approach that finds the vertex data permutation minimizing the compression error. All of this results in fast and high-quality vertex data decompression for real-time rendering. To further improve the visual quality, we introduce vertex clustering to reduce the dynamic range of data during quantization. Our experiments demonstrate the effectiveness of our method on various vertex data of 3D models during rendering, with the advantages of a minimized memory footprint and a high frame rate.

15.
IEEE Trans Vis Comput Graph ; 24(7): 2103-2117, 2018 07.
Article in English | MEDLINE | ID: mdl-28534776

ABSTRACT

Shading is a tedious process for artists involved in 2D cartoon and manga production, given the volume of content that artists have to prepare regularly over tight schedules. While shading production can be automated when geometry is available, it is impractical for artists to model the geometry for every single drawing. In this work, we aim to automate shading generation by analyzing the local shapes, connections, and spatial arrangement of wrinkle strokes in a clean line drawing. This way, artists can focus more on the design rather than the tedious manual editing work, and can experiment with different shading effects under different conditions. To achieve this, we make three key technical contributions. First, we model five perceptual cues by exploring relevant psychological principles to estimate the local depth profile around strokes. Second, we formulate stroke interpretation as a global optimization model that simultaneously balances the different interpretations suggested by the perceptual cues and minimizes the interpretation discrepancy. Lastly, we develop a wrinkle-aware inflation method to generate a height field for the surface to support the shading region computation. In particular, we enable the generation of two commonly used shading styles: 3D-like soft shading and manga-style flat shading.

16.
IEEE Trans Vis Comput Graph ; 13(4): 720-31, 2007.
Article in English | MEDLINE | ID: mdl-17495332

ABSTRACT

This paper proposes a novel six-face spherical map, the isocube, that fully utilizes the cubemap hardware built into most GPUs. Unlike the cubemap, the proposed isocube uniformly samples the unit sphere (uniformly distributed), and all samples span the same solid angle (equally important). Its mapping computation incurs only a small overhead. By feeding the cubemap hardware with the six-face isocube map, the isocube can exploit all built-in texturing operators tailored for the cubemap and achieve a very high frame rate. In addition, we develop an anisotropic filtering technique that compensates for aliasing artifacts due to texture magnification. This technique extends the existing hardware anisotropic filtering and can be applied not only to the proposed isocube, but also to other texture mapping applications.


Subjects
Algorithms; Computer Graphics; Image Enhancement/methods; Image Interpretation, Computer-Assisted/instrumentation; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Signal Processing, Computer-Assisted; Equipment Design; Image Enhancement/instrumentation; Imaging, Three-Dimensional/instrumentation; Numerical Analysis, Computer-Assisted
17.
IEEE Trans Vis Comput Graph ; 13(4): 686-96, 2007.
Article in English | MEDLINE | ID: mdl-17495329

ABSTRACT

This paper presents an automatic and robust approach to synthesizing stereoscopic videos from ordinary monocular videos acquired by commodity video cameras. Instead of recovering the depth map, the proposed method synthesizes the binocular parallax in the stereoscopic video directly from the motion parallax in the monocular video. The synthesis is formulated as an optimization problem via a cost function combining the stereoscopic effect, similarity, and smoothness constraints. The optimization selects the most suitable frames in the input video for generating the stereoscopic video frames. With the optimized selection, convincing and smooth stereoscopic video can be synthesized even by simple constant-depth warping. No user interaction is required. We demonstrate visually plausible results obtained from input clips acquired with an ordinary handheld video camera.


Subjects
Algorithms; Computer Graphics; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Photogrammetry/methods; Video Recording/methods; Numerical Analysis, Computer-Assisted
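The frame-selection step above can be read as a shortest-path problem over candidate frames. As a hedged sketch (the cost structure and the Viterbi-style dynamic program here are our own simplification, not the paper's exact formulation), selecting one companion frame per output frame under a unary cost plus a pairwise smoothness term looks like:

```python
def select_frames(cost, smooth):
    """Pick, for each output frame i, a companion frame j minimizing
    sum of unary cost[i][j] plus pairwise smooth(j_prev, j), via DP."""
    n, m = len(cost), len(cost[0])
    best = list(cost[0])               # best total cost ending at choice j
    back = [[0] * m for _ in range(n)]  # backpointers for path recovery
    for i in range(1, n):
        cur = []
        for j in range(m):
            p = min(range(m), key=lambda q: best[q] + smooth(q, j))
            back[i][j] = p
            cur.append(best[p] + smooth(p, j) + cost[i][j])
        best = cur
    # Trace back the optimal sequence of choices.
    j = min(range(m), key=lambda q: best[q])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]
```

With a strong smoothness term the selection prefers to keep the same companion frame, while a weak one lets it track the per-frame cost minimum, which mirrors the stereoscopic-effect vs. smoothness trade-off described in the abstract.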
18.
IEEE Trans Vis Comput Graph ; 13(5): 953-65, 2007.
Article in English | MEDLINE | ID: mdl-17622679

ABSTRACT

This paper presents a modular framework to efficiently apply bidirectional texture functions (BTFs) onto object surfaces. The basic building blocks are BTF tiles. By constructing one set of BTF tiles, a wide variety of objects can be textured seamlessly without re-synthesizing the BTF. The proposed framework nicely decouples the surface appearance from the geometry. With this appearance-geometry decoupling, one can build a library of BTF tile sets to instantaneously dress and render various objects under variable lighting and viewing conditions. The core of our framework is a novel method for synthesizing seamless high-dimensional BTF tiles, which are difficult for existing synthesis techniques to produce. Its key is to shorten the cutting paths and broaden the choice of samples so as to increase the chance of synthesizing seamless BTF tiles. To tackle the enormous data volume, the tile synthesis process is performed in the compressed domain. This not only allows the handling of large BTF data during synthesis, but also facilitates compact storage of the BTF in GPU memory during rendering.


Subjects
Algorithms; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Computer Graphics; Information Storage and Retrieval/methods; Numerical Analysis, Computer-Assisted; Reproducibility of Results; Sensitivity and Specificity; Signal Processing, Computer-Assisted
19.
Stud Health Technol Inform ; 125: 500-2, 2007.
Article in English | MEDLINE | ID: mdl-17377336

ABSTRACT

Marching cubes (MC) has long been employed as a standard indirect volume rendering approach to extract isosurfaces from 3D volumetric data. This paper presents a GPU-friendly MC implementation. Besides the cell indexing, we propose to calculate vertex and normal interpolations by precomputing the expensive equations and looking up these values at runtime. On a commodity GPU, our implementation can rapidly extract isosurfaces from a high-resolution volume and render the result. With the proposed parallel marching cubes algorithm, we can naturally generate layer-structured triangles, which facilitate the visualization of multiple translucent isosurface layers without performing computationally expensive sorting. The algorithm extracts and draws triangles layer by layer, from back to front.


Subjects
Computer Simulation; Imaging, Three-Dimensional/methods; User-Computer Interface; Hong Kong; Humans; Software
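The vertex interpolation that the abstract says is precomputed into lookup values is, per cell edge, a simple linear blend between the two corner samples that straddle the isovalue. A minimal sketch (our own helper, not the paper's GPU code):

```python
def interp_vertex(p1, p2, v1, v2, iso):
    """Linearly interpolate the isosurface crossing on a cell edge:
    p1, p2 are corner positions; v1, v2 their scalar field values.
    Returns the point where the field equals iso along the edge."""
    if abs(v2 - v1) < 1e-12:
        t = 0.5  # degenerate edge: both corners at the isovalue
    else:
        t = (iso - v1) / (v2 - v1)
    return tuple(a + t * (b - a) for a, b in zip(p1, p2))
```

In standard marching cubes this blend runs for every active edge of every cell; precomputing it and fetching the result via texture lookups is what makes the per-fragment work on the GPU cheap.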
20.
IEEE Trans Vis Comput Graph ; 23(8): 1910-1923, 2017 08.
Article in English | MEDLINE | ID: mdl-27323365

ABSTRACT

While ASCII art is a popular art form worldwide, automatically generating structure-based ASCII art from natural photographs remains challenging. The major challenge lies in extracting the perception-sensitive structure from natural photographs so that a more concise ASCII art reproduction can be produced based on that structure. However, due to the excessive amount of texture in natural photos, extracting perception-sensitive structure is not easy, especially when the structure is weak and lies within a textured region. Besides, to fit different target text resolutions, the amount of extracted structure should also be controllable. To tackle these challenges, we introduce a visual perception mechanism of non-classical receptive field modulation (non-CRF modulation) from physiological findings to this ASCII art application, and propose a new model of non-CRF modulation that can better separate weak structure from crowded texture and better control the scale of texture suppression. Thanks to our non-CRF model, more sensible ASCII art reproductions can be obtained. In addition, to produce more visually appealing ASCII art, we propose a novel optimization scheme to obtain the optimal placement of proportional-font characters. We apply our method to a rich variety of images, and visually appealing ASCII art is obtained in all cases.
