Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 57
Filtrar
1.
IEEE Trans Image Process ; 33: 5715-5726, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-39356595

RESUMEN

Instance shadow detection, crucial for applications such as photo editing and light direction estimation, has undergone significant advancements in predicting shadow instances, object instances, and their associations. The extension of this task to videos presents challenges in annotating diverse video data and addressing complexities arising from occlusion and temporary disappearances within associations. In response to these challenges, we introduce ViShadow, a semi-supervised video instance shadow detection framework that leverages both labeled image data and unlabeled video data for training. ViShadow features a two-stage training pipeline: the first stage, utilizing labeled image data, identifies shadow and object instances through contrastive learning for cross-frame pairing. The second stage employs unlabeled videos, incorporating an associated cycle consistency loss to enhance tracking ability. A retrieval mechanism is introduced to manage temporary disappearances, ensuring tracking continuity. The SOBA-VID dataset, comprising unlabeled training videos and labeled testing videos, along with the SOAP-VID metric, is introduced for the quantitative evaluation of VISD solutions. The effectiveness of ViShadow is further demonstrated through various video-level applications such as video inpainting, instance cloning, shadow editing, and text-instructed shadow-object manipulation.

2.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14385-14403, 2023 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-37782580

RESUMEN

This paper presents a new text-guided 3D shape generation approach DreamStone that uses images as a stepping stone to bridge the gap between the text and shape modalities for generating 3D shapes without requiring paired text and 3D data. The core of our approach is a two-stage feature-space alignment strategy that leverages a pre-trained single-view reconstruction (SVR) model to map CLIP features to shapes: to begin with, map the CLIP image feature to the detail-rich 3D shape space of the SVR model, then map the CLIP text feature to the 3D shape space through encouraging the CLIP-consistency between the rendered images and the input text. Besides, to extend beyond the generative capability of the SVR model, we design the text-guided 3D shape stylization module that can enhance the output shapes with novel structures and textures. Further, we exploit pre-trained text-to-image diffusion models to enhance the generative diversity, fidelity, and stylization capability. Our approach is generic, flexible, and scalable. It can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity. Extensive experimental results demonstrate that our approach outperforms the state-of-the-art methods in terms of generative quality and consistency with the input text.

3.
Med Image Anal ; 84: 102680, 2023 02.
Artículo en Inglés | MEDLINE | ID: mdl-36481607

RESUMEN

In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with varied sizes and appearances with various lesion-to-background levels (hyper-/hypo-dense), created in collaboration with seven hospitals and research institutions. Seventy-five submitted liver and liver tumor segmentation algorithms were trained on a set of 131 computed tomography (CT) volumes and were tested on 70 unseen test images acquired from different patients. We found that not a single algorithm performed best for both liver and liver tumors in the three events. The best liver segmentation algorithm achieved a Dice score of 0.963, whereas, for tumor segmentation, the best algorithms achieved Dices scores of 0.674 (ISBI 2017), 0.702 (MICCAI 2017), and 0.739 (MICCAI 2018). Retrospectively, we performed additional analysis on liver tumor detection and revealed that not all top-performing segmentation algorithms worked well for tumor detection. The best liver tumor detection method achieved a lesion-wise recall of 0.458 (ISBI 2017), 0.515 (MICCAI 2017), and 0.554 (MICCAI 2018), indicating the need for further research. LiTS remains an active benchmark and resource for research, e.g., contributing the liver-related segmentation tasks in http://medicaldecathlon.com/. In addition, both data and online evaluation are accessible via https://competitions.codalab.org/competitions/17094.


Asunto(s)
Benchmarking , Neoplasias Hepáticas , Humanos , Estudios Retrospectivos , Neoplasias Hepáticas/diagnóstico por imagen , Neoplasias Hepáticas/patología , Hígado/diagnóstico por imagen , Hígado/patología , Algoritmos , Procesamiento de Imagen Asistido por Computador/métodos
4.
IEEE Trans Vis Comput Graph ; 29(3): 1610-1624, 2023 Mar.
Artículo en Inglés | MEDLINE | ID: mdl-34752396

RESUMEN

We present a novel two-stage approach for automated floorplan design in residential buildings with a given exterior wall boundary. Our approach has the unique advantage of being human-centric, that is, the generated floorplans can be geometrically plausible, as well as topologically reasonable to enhance resident interaction with the environment. From the input boundary, we first synthesize a human-activity map that reflects both the spatial configuration and human-environment interaction in an architectural space. We propose to produce the human-activity map either automatically by a pre-trained generative adversarial network (GAN) model, or semi-automatically by synthesizing it with user manipulation of the furniture. Second, we feed the human-activity map into our deep framework ActFloor-GAN to guide a pixel-wise prediction of room types. We adopt a re-formulated cycle-consistency constraint in ActFloor-GAN to maximize the overall prediction performance, so that we can produce high-quality room layouts that are readily convertible to vectorized floorplans. Experimental results show several benefits of our approach. First, a quantitative comparison with prior methods shows superior performance of leveraging the human-activity map in predicting piecewise room types. Second, a subjective evaluation by architects shows that our results have compelling quality as professionally-designed floorplans and much better than those generated by existing methods in terms of the room layout topology. Last, our approach allows manipulating the furniture placement, considers the human activities in the environment, and enables the incorporation of user-design preferences.

5.
IEEE Trans Vis Comput Graph ; 29(7): 3226-3237, 2023 Jul.
Artículo en Inglés | MEDLINE | ID: mdl-35239483

RESUMEN

This work presents an innovative method for point set self-embedding, that encodes the structural information of a dense point set into its sparser version in a visual but imperceptible form. The self-embedded point set can function as the ordinary downsampled one and be visualized efficiently on mobile devices. Particularly, we can leverage the self-embedded information to fully restore the original point set for detailed analysis on remote servers. This task is challenging, since both the self-embedded point set and the restored point set should resemble the original one. To achieve a learnable self-embedding scheme, we design a novel framework with two jointly-trained networks: one to encode the input point set into its self-embedded sparse point set and the other to leverage the embedded information for inverting the original point set back. Further, we develop a pair of up-shuffle and down-shuffle units in the two networks, and formulate loss terms to encourage the shape similarity and point distribution in the results. Extensive qualitative and quantitative results demonstrate the effectiveness of our method on both synthetic and real-scanned datasets. The source code and trained models will be publicly available at https://github.com/liruihui/Self-Embedding.

6.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3259-3273, 2023 Mar.
Artículo en Inglés | MEDLINE | ID: mdl-35737621

RESUMEN

This article formulates a new problem, instance shadow detection, which aims to detect shadow instance and the associated object instance that cast each shadow in the input image. To approach this task, we first compile a new dataset with the masks for shadow instances, object instances, and shadow-object associations. We then design an evaluation metric for quantitative evaluation of the performance of instance shadow detection. Further, we design a single-stage detector to perform instance shadow detection in an end-to-end manner, where the bidirectional relation learning module and the deformable maskIoU head are proposed in the detector to directly learn the relation between shadow instances and object instances and to improve the accuracy of the predicted masks. Finally, we quantitatively and qualitatively evaluate our method on the benchmark dataset of instance shadow detection and show the applicability of our method on light direction estimation and photo editing.

7.
Artículo en Inglés | MEDLINE | ID: mdl-37015422

RESUMEN

Kinesthetic feedback, the feeling of restriction or resistance when hands contact objects, is essential for natural freehand interaction in VR. However, inducing kinesthetic feedback using mechanical hardware can be cumbersome and hard to control in commodity VR systems. We propose the kine-appendage concept to compensate for the loss of kinesthetic feedback in virtual environments, i.e., a virtual appendage is added to the user's avatar hand; when the appendage contacts a virtual object, it exhibits transformations (rotation and deformation); when it disengages from the contact, it recovers its original appearance. A proof-of-concept kine-appendage technique, BrittleStylus, was designed to enhance isomorphic typing. Our empirical evaluations demonstrated that (i) BrittleStylus significantly reduced the uncorrected error rate of naive isomorphic typing from 6.53% to 1.92% without compromising the typing speed; (ii) BrittleStylus could induce the sense of kinesthetic feedback, the degree of which was parity with that induced by pseudo-haptic (+ visual cue) methods; and (iii) participants preferred BrittleStylus over pseudo-haptic (+ visual cue) methods because of not only good performance but also fluent hand movements.

8.
IEEE Trans Vis Comput Graph ; 28(1): 890-900, 2022 Jan.
Artículo en Inglés | MEDLINE | ID: mdl-34587082

RESUMEN

Time-series data-usually presented in the form of lines-plays an important role in many domains such as finance, meteorology, health, and urban informatics. Yet, little has been done to support interactive exploration of large-scale time-series data, which requires a clutter-free visual representation with low-latency interactions. In this paper, we contribute a novel line-segment-based KD-tree method to enable interactive analysis of many time series. Our method enables not only fast queries over time series in selected regions of interest but also a line splatting method for efficient computation of the density field and selection of representative lines. Further, we develop KD-Box, an interactive system that provides rich interactions, e.g., timebox, attribute filtering, and coordinated multiple views. We demonstrate the effectiveness of KD-Box in supporting efficient line query and density field computation through a quantitative comparison and show its usefulness for interactive visual analysis on several real-world datasets.

9.
IEEE Trans Vis Comput Graph ; 28(1): 593-603, 2022 Jan.
Artículo en Inglés | MEDLINE | ID: mdl-34587089

RESUMEN

We present a pyramid-based scatterplot sampling technique to avoid overplotting and enable progressive and streaming visualization of large data. Our technique is based on a multiresolution pyramid-based decomposition of the underlying density map and makes use of the density values in the pyramid to guide the sampling at each scale for preserving the relative data densities and outliers. We show that our technique is competitive in quality with state-of-the-art methods and runs faster by about an order of magnitude. Also, we have adapted it to deliver progressive and streaming data visualization by processing the data in chunks and updating the scatterplot areas with visible changes in the density map. A quantitative evaluation shows that our approach generates stable and faithful progressive samples that are comparable to the state-of-the-art method in preserving relative densities and superior to it in keeping outliers and stability when switching frames. We present two case studies that demonstrate the effectiveness of our approach for exploring large data.

10.
IEEE Trans Vis Comput Graph ; 28(12): 4048-4060, 2022 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-33819157

RESUMEN

This article presents a new approach based on deep learning to automatically extract colormaps from visualizations. After summarizing colors in an input visualization image as a Lab color histogram, we pass the histogram to a pre-trained deep neural network, which learns to predict the colormap that produces the visualization. To train the network, we create a new dataset of  âˆ¼ 64K visualizations that cover a wide variety of data distributions, chart types, and colormaps. The network adopts an atrous spatial pyramid pooling module to capture color features at multiple scales in the input color histograms. We then classify the predicted colormap as discrete or continuous, and refine the predicted colormap based on its color histogram. Quantitative comparisons to existing methods show the superior performance of our approach on both synthetic and real-world visualizations. We further demonstrate the utility of our method with two use cases, i.e., color transfer and color remapping.

11.
IEEE Trans Vis Comput Graph ; 28(12): 4503-4514, 2022 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-34170827

RESUMEN

Recently, many deep neural networks were designed to process 3D point clouds, but a common drawback is that rotation invariance is not ensured, leading to poor generalization to arbitrary orientations. In this article, we introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs. Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure. To alleviate inevitable global information loss caused by the rotation-invariant representations, we further introduce a region relation convolution to encode local and non-local information. We evaluate our method on multiple point cloud analysis tasks, including (i) shape classification, (ii) part segmentation, and (iii) shape retrieval. Extensive experimental results show that our method achieves consistent, and also the best performance, on inputs at arbitrary orientations, compared with all the state-of-the-art methods.

12.
IEEE Trans Image Process ; 30: 6686-6699, 2021.
Artículo en Inglés | MEDLINE | ID: mdl-34310282

RESUMEN

The ability to recognize the position and order of the floor-level lines that divide adjacent building floors can benefit many applications, for example, urban augmented reality (AR). This work tackles the problem of locating floor-level lines in street-view images, using a supervised deep learning approach. Unfortunately, very little data is available for training such a network - current street-view datasets contain either semantic annotations that lack geometric attributes, or rectified facades without perspective priors. To address this issue, we first compile a new dataset and develop a new data augmentation scheme to synthesize training samples by harassing (i) the rich semantics of existing rectified facades and (ii) perspective priors of buildings in diverse street views. Next, we design FloorLevel-Net, a multi-task learning network that associates explicit features of building facades and implicit floor-level lines, along with a height-attention mechanism to help enforce a vertical ordering of floor-level lines. The generated segmentations are then passed to a second-stage geometry post-processing to exploit self-constrained geometric priors for plausible and consistent reconstruction of floor-level lines. Quantitative and qualitative evaluations conducted on assorted facades in existing datasets and street views from Google demonstrate the effectiveness of our approach. Also, we present context-aware image overlay results and show the potentials of our approach in enriching AR-related applications. Project website: https://wumengyangok.github.io/Project/FloorLevelNet.

13.
IEEE Trans Image Process ; 30: 1759-1770, 2021.
Artículo en Inglés | MEDLINE | ID: mdl-33417548

RESUMEN

Rain is a common weather phenomenon that affects environmental monitoring and surveillance systems. According to an established rain model (Garg and Nayar, 2007), the scene visibility in the rain varies with the depth from the camera, where objects faraway are visually blocked more by the fog than by the rain streaks. However, existing datasets and methods for rain removal ignore these physical properties, thus limiting the rain removal efficiency on real photos. In this work, we analyze the visual effects of rain subject to scene depth and formulate a rain imaging model that collectively considers rain streaks and fog. Also, we prepare a dataset called RainCityscapes on real outdoor photos. Furthermore, we design a novel real-time end-to-end deep neural network, for which we train to learn the depth-guided non-local features and to regress a residual map to produce a rain-free output image. We performed various experiments to visually and quantitatively compare our method with several state-of-the-art methods to show its superiority over others.

14.
IEEE Trans Image Process ; 30: 1925-1934, 2021.
Artículo en Inglés | MEDLINE | ID: mdl-33428570

RESUMEN

Shadow detection in general photos is a nontrivial problem, due to the complexity of the real world. Though recent shadow detectors have already achieved remarkable performance on various benchmark data, their performance is still limited for general real-world situations. In this work, we collected shadow images for multiple scenarios and compiled a new dataset of 10,500 shadow images, each with labeled ground-truth mask, for supporting shadow detection in the complex world. Our dataset covers a rich variety of scene categories, with diverse shadow sizes, locations, contrasts, and types. Further, we comprehensively analyze the complexity of the dataset, present a fast shadow detection network with a detail enhancement module to harvest shadow details, and demonstrate the effectiveness of our method to detect shadows in general situations.

15.
IEEE Trans Vis Comput Graph ; 27(10): 4060-4072, 2021 Oct.
Artículo en Inglés | MEDLINE | ID: mdl-32746260

RESUMEN

This article presents a deep normal filtering network, called DNF-Net, for mesh denoising. To better capture local geometry, our network processes the mesh in terms of local patches extracted from the mesh. Overall, DNF-Net is an end-to-end network that takes patches of facet normals as inputs and directly outputs the corresponding denoised facet normals of the patches. In this way, we can reconstruct the geometry from the denoised normals with feature preservation. Besides the overall network architecture, our contributions include a novel multi-scale feature embedding unit, a residual learning strategy to remove noise, and a deeply-supervised joint loss function. Compared with the recent data-driven works on mesh denoising, DNF-Net does not require manual input to extract features and better utilizes the training data to enhance its denoising performance. Finally, we present comprehensive experiments to evaluate our method and demonstrate its superiority over the state of the art on both synthetic and real-scanned meshes.

16.
IEEE Trans Neural Netw Learn Syst ; 32(2): 523-534, 2021 02.
Artículo en Inglés | MEDLINE | ID: mdl-32479407

RESUMEN

A common shortfall of supervised deep learning for medical imaging is the lack of labeled data, which is often expensive and time consuming to collect. This article presents a new semisupervised method for medical image segmentation, where the network is optimized by a weighted combination of a common supervised loss only for the labeled inputs and a regularization loss for both the labeled and unlabeled data. To utilize the unlabeled data, our method encourages consistent predictions of the network-in-training for the same input under different perturbations. With the semisupervised segmentation tasks, we introduce a transformation-consistent strategy in the self-ensembling model to enhance the regularization effect for pixel-level predictions. To further improve the regularization effects, we extend the transformation in a more generalized form including scaling and optimize the consistency loss with a teacher model, which is an averaging of the student model weights. We extensively validated the proposed semisupervised method on three typical yet challenging medical image segmentation tasks: 1) skin lesion segmentation from dermoscopy images in the International Skin Imaging Collaboration (ISIC) 2017 data set; 2) optic disk (OD) segmentation from fundus images in the Retinal Fundus Glaucoma Challenge (REFUGE) data set; and 3) liver segmentation from volumetric CT scans in the Liver Tumor Segmentation Challenge (LiTS) data set. Compared with state-of-the-art, our method shows superior performance on the challenging 2-D/3-D medical images, demonstrating the effectiveness of our semisupervised method for medical image segmentation.


Asunto(s)
Diagnóstico por Imagen/métodos , Procesamiento de Imagen Asistido por Computador/métodos , Aprendizaje Automático Supervisado , Algoritmos , Bases de Datos Factuales , Aprendizaje Profundo , Dermatología/métodos , Fondo de Ojo , Glaucoma/diagnóstico por imagen , Humanos , Imagenología Tridimensional , Hígado/diagnóstico por imagen , Redes Neurales de la Computación , Disco Óptico/diagnóstico por imagen , Reproducibilidad de los Resultados , Enfermedades de la Piel/diagnóstico por imagen , Tomografía Computarizada por Rayos X
17.
IEEE Trans Med Imaging ; 39(12): 4237-4248, 2020 12.
Artículo en Inglés | MEDLINE | ID: mdl-32776876

RESUMEN

Deep convolutional neural networks have significantly boosted the performance of fundus image segmentation when test datasets have the same distribution as the training datasets. However, in clinical practice, medical images often exhibit variations in appearance for various reasons, e.g., different scanner vendors and image quality. These distribution discrepancies could lead the deep networks to over-fit on the training datasets and lack generalization ability on the unseen test datasets. To alleviate this issue, we present a novel Domain-oriented Feature Embedding (DoFE) framework to improve the generalization ability of CNNs on unseen target domains by exploring the knowledge from multiple source domains. Our DoFE framework dynamically enriches the image features with additional domain prior knowledge learned from multi-source domains to make the semantic features more discriminative. Specifically, we introduce a Domain Knowledge Pool to learn and memorize the prior information extracted from multi-source domains. Then the original image features are augmented with domain-oriented aggregated features, which are induced from the knowledge pool based on the similarity between the input image and multi-source domain images. We further design a novel domain code prediction branch to infer this similarity and employ an attention-guided mechanism to dynamically combine the aggregated features with the semantic features. We comprehensively evaluate our DoFE framework on two fundus image segmentation tasks, including the optic cup and disc segmentation and vessel segmentation. Our DoFE framework generates satisfying segmentation results on unseen datasets and surpasses other domain generalization and network regularization methods.


Asunto(s)
Disco Óptico , Fondo de Ojo , Redes Neurales de la Computación , Disco Óptico/diagnóstico por imagen
18.
Artículo en Inglés | MEDLINE | ID: mdl-32305920

RESUMEN

This paper presents a new approach to recognizing vanishing-point-constrained building planes from a single image of street view. We first design a novel convolutional neural network (CNN) architecture that generates geometric segmentation of per-pixel orientations from a single street-view image. The network combines two-stream features of general visual cues and surface normals in gated convolution layers, and employs a deeply supervised loss that encapsulates multi-scale convolutional features. Our experiments on a new benchmark with fine-grained plane segmentations of real-world street views show that our network outperforms state-of-the-arts methods of both semantic and geometric segmentation. The pixel-wise segmentation exhibits coarse boundaries and discontinuities. We then propose to rectify the pixel-wise segmentation into perspectively-projected quads based on spatial proximity between the segmentation masks and exterior line segments detected through an image processing. We demonstrate how the results can be utilized to perspectively overlay images and icons on building planes in input photos, and provide visual cues for various applications.

19.
IEEE Trans Med Imaging ; 39(9): 2806-2817, 2020 09.
Artículo en Inglés | MEDLINE | ID: mdl-32091996

RESUMEN

Sub-cortical brain structure segmentation is of great importance for diagnosing neuropsychiatric disorders. However, developing an automatic approach to segmenting sub-cortical brain structures remains very challenging due to the ambiguous boundaries, complex anatomical structures, and large variance of shapes. This paper presents a novel deep network architecture, namely Ψ -Net, for sub-cortical brain structure segmentation, aiming at selectively aggregating features and boosting the information propagation in a deep convolutional neural network (CNN). To achieve this, we first formulate a densely convolutional LSTM module (DC-LSTM) to selectively aggregate the convolutional features with the same spatial resolution at the same stage of a CNN. This helps to promote the discriminativeness of features at each CNN stage. Second, we stack multiple DC-LSTMs from the deepest stage to the shallowest stage to progressively enrich low-level feature maps with high-level context. We employ two benchmark datasets on sub-cortical brain structure segmentation, and perform various experiments to evaluate the proposed Ψ -Net. The experimental results show that our network performs favorably against the state-of-the-art methods on both benchmark datasets.


Asunto(s)
Procesamiento de Imagen Asistido por Computador , Redes Neurales de la Computación , Encéfalo/diagnóstico por imagen
20.
IEEE Trans Vis Comput Graph ; 26(1): 729-738, 2020 01.
Artículo en Inglés | MEDLINE | ID: mdl-31442987

RESUMEN

We present a non-uniform recursive sampling technique for multi-class scatterplots, with the specific goal of faithfully presenting relative data and class densities, while preserving major outliers in the plots. Our technique is based on a customized binary kd-tree, in which leaf nodes are created by recursively subdividing the underlying multi-class density map. By backtracking, we merge leaf nodes until they encompass points of all classes for our subsequently applied outlier-aware multi-class sampling strategy. A quantitative evaluation shows that our approach can better preserve outliers and at the same time relative densities in multi-class scatterplots compared to the previous approaches, several case studies demonstrate the effectiveness of our approach in exploring complex and real world data.

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA
...