ABSTRACT
Mental representations of familiar categories are composed of visual and semantic information. Disentangling the contributions of visual and semantic information in humans is challenging because they are intermixed in mental representations. Deep neural networks that are trained either on images or on text or by pairing images and text enable us now to disentangle human mental representations into their visual, visual-semantic and semantic components. Here we used these deep neural networks to uncover the content of human mental representations of familiar faces and objects when they are viewed or recalled from memory. The results show a larger visual than semantic contribution when images are viewed and a reversed pattern when they are recalled. We further reveal a previously unknown unique contribution of an integrated visual-semantic representation in both perception and memory. We propose a new framework in which visual and semantic information contribute independently and interactively to mental representations in perception and memory.
Subjects
Mental Recall; Neural Networks, Computer; Semantics; Visual Perception; Humans; Female; Male; Mental Recall/physiology; Visual Perception/physiology; Adult; Young Adult; Recognition, Psychology/physiology; Facial Recognition/physiology; Memory/physiology
ABSTRACT
Visually encoding quantitative information associated with graph links is an important problem in graph visualization. A conventional approach is to vary the thickness of lines to encode the strength of connections in node-link diagrams. In this paper, we present Sticky Links, a novel visual encoding method that draws graph links with stickiness. Using the metaphor of links held together by glue, sticky links represent connection strength using spiky shapes, ranging from two broken spikes for weak connections to connected lines for strong connections. We conducted a controlled user study to compare the efficiency and aesthetic appeal of stickiness with conventional thickness encoding. Our results show that stickiness enables more effective and expressive quantitative encoding while maintaining the perception of node connectivity. Participants also found sticky links to be more aesthetically pleasing and less visually cluttered than conventional thickness encoding. Overall, our findings suggest that sticky links offer a promising alternative to conventional methods for encoding quantitative information in graphs.
ABSTRACT
Synthesizing human motion with a global structure, such as a choreography, is a challenging task. Existing methods tend to concentrate on local smooth pose transitions and neglect the global context or the theme of the motion. In this work, we present a music-driven motion synthesis framework that generates long-term sequences of human motions that are synchronized with the input beats and jointly form a global structure that respects a specific dance genre. In addition, our framework enables generation of diverse motions that are controlled by the content of the music, and not only by the beat. Our music-driven dance synthesis framework is a hierarchical system that consists of three levels: pose, motif, and choreography. The pose level consists of an LSTM component that generates temporally coherent sequences of poses. The motif level guides sets of consecutive poses to form a movement that belongs to a specific distribution using a novel motion perceptual loss. Finally, the choreography level selects the order of the performed movements and drives the system to follow the global structure of a dance genre. Our results demonstrate the effectiveness of our music-driven framework in generating natural and consistent movements for various dance types, with control over the content of the synthesized motions and respect for the overall structure of the dance.
Subjects
Dancing; Music; Humans; Auditory Perception; Computer Graphics; Movement
ABSTRACT
Rigid registration of partial observations is a fundamental problem in various applied fields. In computer graphics, special attention has been given to the registration between two partial point clouds generated by scanning devices. State-of-the-art registration techniques still struggle when the overlap region between the two point clouds is small, and completely fail if there is no overlap between the scan pairs. In this article, we present a learning-based technique that alleviates this problem and allows registration between point clouds that are presented in arbitrary poses and have little or even no overlap, a setting that has been referred to as tele-registration. Our technique is based on a novel neural network design that learns a prior of a class of shapes and can complete a partial shape. The key idea is to combine the registration and completion tasks so that they reinforce each other. In particular, we simultaneously train the registration network and the completion network using two coupled flows, one that registers and then completes, and one that completes and then registers, and encourage the two flows to produce a consistent result. We show that, compared with each separate flow, this two-flow training leads to robust and reliable tele-registration, and hence to a better point cloud prediction that completes the registered scans. It is also worth mentioning that each of the components in our neural network outperforms state-of-the-art methods in both completion and registration. We further analyze our network with several ablation studies and demonstrate its performance on a large number of partial point clouds, both synthetic and real-world, that have only small or no overlap.
ABSTRACT
Static visual attributes such as color and shape are used with great success in visual charts designed to be displayed in static, hard-copy form. However, digital displays have now become ubiquitous for visualizing data of any form, lifting the confines of static presentations. In this article, we propose incorporating data-driven animations to bring static charts to life, with the purpose of encoding and emphasizing certain attributes of the data. We lay out a design space for data-driven animated effects and experiment with three versatile effects: marching ants, geometry deformation, and gradual appearance. For each, we provide practical details regarding its mode of operation and extent of interaction with existing visual encodings. We examine the impact and effectiveness of our enhancements through an empirical user study that assesses preference and gauges the influence of animated effects on human perception in terms of speed and accuracy of visual understanding.
ABSTRACT
Recently, many deep neural networks have been designed to process 3D point clouds, but a common drawback is that rotation invariance is not ensured, leading to poor generalization to arbitrary orientations. In this article, we introduce a new low-level, purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs. We also present a network architecture that embeds these representations into features, encoding local relations between points and their neighbors as well as the global shape structure. To alleviate the inevitable global information loss caused by the rotation-invariant representations, we further introduce a region relation convolution to encode local and non-local information. We evaluate our method on multiple point cloud analysis tasks, including (i) shape classification, (ii) part segmentation, and (iii) shape retrieval. Extensive experimental results show that our method achieves consistent performance on inputs at arbitrary orientations, and the best performance among all compared state-of-the-art methods.
ABSTRACT
Convolutional layers are the core building blocks of Convolutional Neural Networks (CNNs). In this paper, we propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel. The composition of the two convolutions constitutes an over-parameterization, since it adds learnable parameters, while the resulting linear operation can be expressed by a single convolution layer. We refer to this depthwise over-parameterized convolutional layer as DO-Conv, which is a novel way of over-parameterization. We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs on many classical vision tasks, such as image classification, detection, and segmentation. Moreover, in the inference phase, the depthwise convolution is folded into the conventional convolution, reducing the computation to be exactly equivalent to that of a convolutional layer without over-parameterization. As DO-Conv introduces performance gains without incurring any increase in computational complexity for inference, we advocate it as an alternative to the conventional convolutional layer. We open-sourced an implementation of DO-Conv in TensorFlow, PyTorch and GluonCV at https://github.com/yangyanli/DO-Conv.
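The folding claim above rests on the fact that two linear maps applied in sequence compose into one. The following is a minimal NumPy sketch of that idea on a single receptive-field patch; it is not taken from the released DO-Conv code. The tensor layouts, the names `fold_kernels`, `apply_two_step` and `apply_folded`, and the sizes `K`, `D_MUL`, `C_IN`, `C_OUT` are all assumptions made for illustration.

```python
import numpy as np

def fold_kernels(depthwise, conventional):
    """Compose a depthwise kernel with a conventional kernel into one kernel.

    Assumed (hypothetical) layouts:
      depthwise:    (K, D_MUL, C_IN), K = kernel_height * kernel_width
      conventional: (C_OUT, D_MUL, C_IN)
    Returns a kernel of shape (C_OUT, K, C_IN), the same shape as a plain
    convolution kernel over a K-element spatial window.
    """
    return np.einsum("kdc,odc->okc", depthwise, conventional)

def apply_two_step(patch, depthwise, conventional):
    """Apply the depthwise map per input channel, then the conventional map."""
    # patch: (K, C_IN) -> intermediate: (D_MUL, C_IN)
    intermediate = np.einsum("kdc,kc->dc", depthwise, patch)
    # intermediate: (D_MUL, C_IN) -> output: (C_OUT,)
    return np.einsum("odc,dc->o", conventional, intermediate)

def apply_folded(patch, folded_kernel):
    """Apply the single folded kernel directly to the patch."""
    return np.einsum("okc,kc->o", folded_kernel, patch)

rng = np.random.default_rng(0)
K, D_MUL, C_IN, C_OUT = 9, 12, 4, 5  # illustrative sizes only

depthwise = rng.standard_normal((K, D_MUL, C_IN))
conventional = rng.standard_normal((C_OUT, D_MUL, C_IN))
patch = rng.standard_normal((K, C_IN))

folded_kernel = fold_kernels(depthwise, conventional)
out_two_step = apply_two_step(patch, depthwise, conventional)
out_folded = apply_folded(patch, folded_kernel)

# The two paths agree up to floating-point rounding.
print(np.allclose(out_two_step, out_folded))
```

In this sketch the two-kernel form holds `K*D_MUL*C_IN + C_OUT*D_MUL*C_IN` trainable values, while the folded kernel used at inference has the same shape as an ordinary convolution kernel, which is why no extra inference cost is incurred.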
ABSTRACT
In this paper, we present a novel non-parametric clustering technique. Our technique is based on the notion that each latent cluster comprises layers that surround its core, where the external layers, or border points, implicitly separate the clusters. Unlike previous techniques, such as DBSCAN, where the cores of the clusters are defined directly by their densities, here the latent cores are revealed by a progressive peeling of the border points. Analyzing the density of the local neighborhoods allows identifying the border points and associating them with points of inner layers. We show that the peeling process adapts to the local densities and characteristics to successfully separate adjacent clusters (of possibly different densities). We extensively tested our technique on large sets of labeled data, including high-dimensional datasets of deep features that were trained by a convolutional neural network. We show that our technique is competitive with other state-of-the-art non-parametric methods using a fixed set of parameters throughout the experiments.
ABSTRACT
Visualizing high-dimensional data on a 2D canvas is generally challenging. It becomes significantly more difficult when multiple time-steps are to be presented, as the visual clutter quickly increases. Moreover, the challenge to perceive the significant temporal evolution is even greater. In this paper, we present a method to plot temporal high-dimensional data in a static scatterplot; it uses the established PCA technique to project data from multiple time-steps. The key idea is to extend each individual displacement prior to applying PCA, so as to skew the projection process, and to set a projection plane that balances the directions of temporal change and spatial variance. We present numerous examples and various visual cues to highlight the data trajectories, and demonstrate the effectiveness of the method for visualizing temporal data.
ABSTRACT
This work proposes Winglets, an enhancement to the classic scatterplot to better perceptually pronounce multiple classes by improving the perception of association and uncertainty of points to their related cluster. Designed as a pair of dual-sided strokes belonging to a data point, Winglets leverage the Gestalt principle of Closure to shape the perception of the form of the clusters, rather than use an explicit divisive encoding. Through a subtle design of two dominant attributes, length and orientation, Winglets enable viewers to perform a mental completion of the clusters. A controlled user study was conducted to examine the efficiency of Winglets in perceiving the cluster association and the uncertainty of certain points. The results show Winglets form a more prominent association of points into clusters and improve the perception of associating uncertainty.
ABSTRACT
Multi-dimensional scaling (MDS) plays a central role in data exploration, dimensionality reduction and visualization. State-of-the-art MDS algorithms are not robust to outliers, yielding significant errors in the embedding even when only a handful of outliers are present. In this paper, we introduce a technique to detect and filter outliers based on geometric reasoning. We test the validity of triangles formed by three points, and mark a triangle as broken if its triangle inequality does not hold. The premise of our work is that, unlike inliers, outlier distances tend to break many triangles. Our method is tested and its performance is evaluated on various datasets and distributions of outliers. We demonstrate that for a reasonable number of outliers, e.g., under 20 percent, our method is effective and leads to a high embedding quality.
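The broken-triangle test described above can be sketched in a few lines. The snippet below only counts, for every pairwise distance, how many broken triangles it takes part in; how the paper turns such counts into a filtering decision is not stated in the abstract, so no threshold is applied here. The function name `broken_triangle_counts`, the tolerance `tol`, and the toy data are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def broken_triangle_counts(dist, tol=1e-9):
    """Count, per pairwise distance, the broken triangles it participates in.

    dist is a symmetric (n, n) matrix of pairwise distances. A triangle
    (i, j, k) is marked broken when its longest side exceeds the sum of
    the other two sides (plus a small tolerance).
    """
    n = dist.shape[0]
    counts = np.zeros((n, n), dtype=int)
    for i, j, k in combinations(range(n), 3):
        a, b, c = dist[i, j], dist[j, k], dist[i, k]
        longest = max(a, b, c)
        if longest > (a + b + c - longest) + tol:
            # Every edge of a broken triangle gets one vote.
            for u, v in ((i, j), (j, k), (i, k)):
                counts[u, v] += 1
                counts[v, u] += 1
    return counts

# Toy data: four collinear points, with one corrupted distance.
positions = np.array([0.0, 1.0, 2.0, 3.0])
dist = np.abs(positions[:, None] - positions[None, :])
dist[0, 3] = dist[3, 0] = 10.0  # hypothetical outlier distance

counts = broken_triangle_counts(dist)
print(counts)
```

In this toy case the corrupted distance between points 0 and 3 participates in more broken triangles than any other distance, which matches the premise that outlier distances tend to break many triangles.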
ABSTRACT
Recently, there has been increasing interest in leveraging the capabilities of neural networks to analyze data. In particular, new clustering methods that employ deep embeddings have been presented. In this paper, we depart from centroid-based models and suggest a new framework, called Clustering-driven deep embedding with PAirwise Constraints (CPAC), for nonparametric clustering using a neural network. We present a clustering-driven embedding based on a Siamese network that encourages pairs of data points to output similar representations in the latent space. Our pair-based model allows augmenting the information with labeled pairs to constitute a semi-supervised framework. Our approach is based on analyzing the losses associated with each pair to refine the set of constraints. We show that clustering performance increases when using this scheme, even with a limited number of user queries. We demonstrate how our architecture is adapted for various types of data and present the first deep framework to cluster three-dimensional (3-D) shapes.
ABSTRACT
We introduce a data-driven method to generate a large number of plausible, closely interacting 3D human pose-pairs for a given motion category, e.g., wrestling or salsa dance. Because close interactions are difficult to acquire with 3D sensors, our approach utilizes abundant existing video data, which cover many human activities. Instead of treating the data generation problem as one of reconstruction, either through 3D acquisition or direct 2D-to-3D data lifting from video annotations, we present a solution based on Markov Chain Monte Carlo (MCMC) sampling. Given a motion category and a set of video frames depicting the motion with the 2D pose-pair in each frame annotated, we start the sampling with one or a few seed 3D pose-pairs, which are manually created based on the target motion category. The initial set is then augmented by MCMC sampling around the seeds, via the Metropolis-Hastings algorithm and guided by a probability density function (PDF) that is defined by two terms to bias the sampling towards 3D pose-pairs that are physically valid and plausible for the motion category. With a focus on efficient sampling over the space of close interactions, rather than pose spaces, we develop a novel representation called interaction coordinates (IC) to encode both poses and their interactions in an integrated manner. Plausibility of a 3D pose-pair is then defined based on the IC and with respect to the annotated 2D pose-pairs from video. We show that our sampling-based approach is able to efficiently synthesize a large volume of plausible, closely interacting 3D pose-pairs that provide good coverage of the input 2D pose-pairs.
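For readers unfamiliar with the sampler named above, here is a generic random-walk Metropolis-Hastings sketch that starts from a seed and is guided by a two-term density. It does not reproduce the paper's interaction coordinates or its actual PDF: the 2D state, the box-shaped validity term, the Gaussian plausibility term, and every name below (`metropolis_hastings`, `toy_log_density`, `LIMIT`, `step`) are stand-ins assumed purely for illustration.

```python
import numpy as np

LIMIT = 3.0  # hypothetical bound used by the toy validity term

def toy_log_density(x):
    """Toy two-term log density: a validity term plus a plausibility term."""
    # Validity (stand-in): states outside a box have zero probability.
    if np.any(np.abs(x) > LIMIT):
        return -np.inf
    # Plausibility (stand-in): Gaussian preference for states near (1, 1).
    return -0.5 * np.sum((x - 1.0) ** 2)

def metropolis_hastings(log_density, seed_point, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng() if rng is None else rng
    current = np.asarray(seed_point, dtype=float)
    current_logp = log_density(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + step * rng.standard_normal(current.shape)
        proposal_logp = log_density(proposal)
        # Symmetric proposal, so the acceptance ratio is p(proposal) / p(current).
        if np.log(rng.random()) < proposal_logp - current_logp:
            current, current_logp = proposal, proposal_logp
        samples.append(current.copy())
    return np.array(samples)

samples = metropolis_hastings(
    toy_log_density,
    seed_point=[0.0, 0.0],  # stand-in for a manually created seed
    n_samples=2000,
    rng=np.random.default_rng(0),
)
print(samples.shape)
```

Because invalid proposals have zero density, they are never accepted, so every retained sample satisfies the validity term; the plausibility term then shapes where the chain spends its time.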
ABSTRACT
Recent psychological studies have strongly suggested that humans share common visual preferences for facial attractiveness. Here, we present a learning model that automatically extracts measurements of facial features from raw images and obtains human-level performance in predicting facial attractiveness ratings. The machine's ratings are highly correlated with mean human ratings, markedly improving on recent machine learning studies of this task. Simulated psychophysical experiments with virtually manipulated images reveal preferences in the machine's judgments that are remarkably similar to those of humans. Thus, a model trained explicitly to capture a specific operational performance criterion implicitly captures basic human psychophysical characteristics.
Subjects
Artificial Intelligence; Beauty; Face; Pattern Recognition, Visual; Algorithms; Face/anatomy & histology; Female; Humans; Image Processing, Computer-Assisted/methods; Judgment; Photography; Psychophysics; Reproducibility of Results
ABSTRACT
We present a structure-aware technique to consolidate noisy data, which we use as a pre-process for standard clustering and dimensionality reduction. Our technique is related to mean shift, but instead of seeking density modes, it reveals and consolidates continuous high density structures such as curves and surface sheets in the underlying data while ignoring noise and outliers. We provide a theoretical analysis under a Gaussian noise model, and show that our approach significantly improves the performance of many non-linear dimensionality reduction and clustering algorithms in challenging scenarios.
ABSTRACT
A 3D shape signature is a compact representation for some essence of a shape. Shape signatures are commonly utilized as a fast indexing mechanism for shape retrieval. Effective shape signatures capture some global geometric properties which are scale, translation, and rotation invariant. In this paper, we introduce an effective shape signature which is also pose-oblivious. This means that the signature is also insensitive to transformations which change the pose of a 3D shape such as skeletal articulations. Although some topology-based matching methods can be considered pose-oblivious as well, our new signature retains the simplicity and speed of signature indexing. Moreover, contrary to topology-based methods, the new signature is also insensitive to the topology change of the shape, allowing us to match similar shapes with different genus. Our shape signature is a 2D histogram which is a combination of the distribution of two scalar functions defined on the boundary surface of the 3D shape. The first is a definition of a novel function called the local-diameter function. This function measures the diameter of the 3D shape in the neighborhood of each vertex. The histogram of this function is an informative measure of the shape which is insensitive to pose changes. The second is the centricity function that measures the average geodesic distance from one vertex to all other vertices on the mesh. We evaluate and compare a number of methods for measuring the similarity between two signatures, and demonstrate the effectiveness of our pose-oblivious shape signature within a 3D search engine application for different databases containing hundreds of models.
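The structure of the signature described above, a 2D histogram over two per-vertex scalar functions, can be sketched as follows. This is a simplified illustration rather than the paper's procedure: geodesic distance is approximated by shortest paths along mesh edges, the local-diameter values are supplied as a hypothetical input array (computing them requires the mesh geometry), and the names `centricity`, `shape_signature` and the bin count are assumptions.

```python
import numpy as np

def centricity(n_vertices, edges):
    """Average shortest-path distance from each vertex to all other vertices.

    edges is a list of (u, v, length) tuples. Shortest paths along edges are
    used here as a stand-in for geodesic distance on the surface.
    """
    dist = np.full((n_vertices, n_vertices), np.inf)
    np.fill_diagonal(dist, 0.0)
    for u, v, length in edges:
        dist[u, v] = dist[v, u] = min(dist[u, v], length)
    # Floyd-Warshall: allow paths through each intermediate vertex k in turn.
    for k in range(n_vertices):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist.sum(axis=1) / (n_vertices - 1)

def shape_signature(local_diameter, centricity_values, bins=4):
    """Normalized 2D histogram over (local diameter, centricity) per vertex."""
    hist, _, _ = np.histogram2d(local_diameter, centricity_values, bins=bins)
    return hist / hist.sum()

# Toy connectivity: four vertices in a chain with unit-length edges.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
cent = centricity(4, edges)

# Hypothetical per-vertex local-diameter values, for illustration only.
local_diameter = np.array([0.2, 0.5, 0.5, 0.2])

signature = shape_signature(local_diameter, cent)
print(cent)
print(signature)
```

Two signatures built this way can then be compared with any histogram distance; the abstract notes that several similarity measures were evaluated, without naming them.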
Subjects
Database Management Systems; Databases, Factual; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Algorithms; Artificial Intelligence; Image Enhancement/methods; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
We introduce a new class of shape approximation techniques for irregular triangular meshes. Our method approximates the geometry of the mesh using a linear combination of a small number of basis vectors. The basis vectors are functions of the mesh connectivity and of the mesh indices of a number of anchor vertices. There is a fundamental difference between the bases generated by our method and those generated by geometry-oblivious methods, such as Laplacian-based spectral methods. In the latter methods, the basis vectors are functions of the connectivity alone. The basis vectors of our method, in contrast, are geometry-aware since they depend on both the connectivity and on a binary tagging of vertices that are "geometrically important" in the given mesh (e.g., extrema). We show that, by defining the basis vectors to be the solutions of certain least-squares problems, the reconstruction problem reduces to solving a single sparse linear least-squares problem. We also show that this problem can be solved quickly using a state-of-the-art sparse-matrix factorization algorithm. We show how to select the anchor vertices to define a compact effective basis from which an approximated shape can be reconstructed. Furthermore, we develop an incremental update of the factorization of the least-squares system. This allows a progressive scheme where an initial approximation is incrementally refined by a stream of anchor points. We show that the incremental update and solving the factored system are fast enough to allow an online refinement of the mesh geometry.
Subjects
Algorithms; Computer Graphics; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Information Storage and Retrieval/methods; Models, Statistical; Pattern Recognition, Automated/methods; Computer Simulation; Numerical Analysis, Computer-Assisted; User-Computer Interface
ABSTRACT
A new system provides a virtual experience akin to trying on clothing. It clones the user's photographic image into a catalog of images of models wearing the desired garments. Simple offline training extracts the user's head. Segmentation accurately separates the face, hair, and background, employing both a three-kernel statistical model and graph cuts. The system adjusts the resulting image's skin color according to a statistical model and relights the head via spherical harmonics. Finally, using a parametric model, the system warps the clone's body dimensions to fit the user's dimensions. This creates high-quality compositions of the user's image and the given garment.
ABSTRACT
The use of image guidance in medical applications is constantly growing because of its tremendous impact on the future of health care. Although image-based tissue tracking has been thoroughly explored in the academic literature for years, it has not yet matured to become widely accepted by clinicians. Undetected tissue movements in image-based clinical procedures may cause safety and efficacy difficulties. We introduce an image-based approach for detecting tissue movements during clinical procedures. Our method has been validated in more than 600 true clinical cases. The results show that our algorithm agrees with an expert analysis in 98% of the cases, showing zero events of false alarms and zero events of undetected motion. The results show that the approach provides a clinically ready motion detection algorithm. These robust results are achieved by introducing the concept of weighted directional descriptors (WDDs). The technique analyzes the directivity and confidence level of each anatomical feature and uses it to weight local inputs resulting in a robust motion vector. The robustness is further increased by a novel preprocess that screens out features that may be misleading or are repeated in the adjacent search zone. The technique meets the requirements, as defined by our clinicians, and is now integrated in true medical systems. In particular, our approach has been uniquely developed and integrated into a clinical product. ExAblate is the first Food and Drug Administration (FDA)-approved magnetic resonance (MR)-guided noninvasive surgical device using focused ultrasound therapy. It is used in commercial clinics and in leading medical academic research institutions, attesting to the success of our method and its practical clinical value.