ABSTRACT
SUMMARY: Current synteny visualization tools either focus on small regions of sequence and do not illustrate genome-wide trends, or are complicated to use and create visualizations that are difficult to interpret. To address this challenge, the Comparative Genomics Platform (CoGe) has developed two web-based tools to visualize synteny across whole genomes. SynMap2 and SynMap3D allow researchers to explore whole-genome synteny patterns (across two or three genomes, respectively) in responsive, web-based visualization and virtual reality environments. Both tools have access to the extensive CoGe genome database (containing over 30,000 genomes) as well as the option for users to upload their own data. By leveraging modern web technologies, no installation is required, making the tools widely accessible and easy to use. AVAILABILITY AND IMPLEMENTATION: Both tools are open source (MIT license) and freely available for use online through CoGe (https://genomevolution.org). SynMap2 and SynMap3D can be accessed at http://genomevolution.org/coge/SynMap.pl and http://genomevolution.org/coge/SynMap3D.pl, respectively. Source code is available at https://github.com/LyonsLab/coge. CONTACT: ericlyons@email.arizona.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genomics/methods, Software, Synteny, Web Browser, Whole Genome Sequencing, Genome
ABSTRACT
Projection techniques are often used to visualize high-dimensional data, allowing users to better understand the overall structure of multi-dimensional spaces on a 2D screen. Although many such methods exist, comparatively little work has been done on generalizable methods of inverse projection: the process of mapping the projected points, or more generally the projection space, back to the original high-dimensional space. In this article we present NNInv, a deep learning technique with the ability to approximate the inverse of any projection or mapping. NNInv learns to reconstruct high-dimensional data from any arbitrary point on a 2D projection space, giving users the ability to interact with the learned high-dimensional representation in a visual analytics system. We provide an analysis of the parameter space of NNInv and offer guidance in selecting these parameters. We extend validation of the effectiveness of NNInv through a series of quantitative and qualitative analyses. We then demonstrate the method's utility by applying it to three visualization tasks: interactive instance interpolation, classifier agreement, and gradient visualization.
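A minimal sketch of the core idea, assuming a scikit-learn style workflow (the paper's own architecture, loss, and hyperparameters may differ): fit a neural regressor that maps 2D projected coordinates back to the original high-dimensional vectors, then query it at arbitrary points of the projection plane.

```python
# Minimal sketch of learning an inverse projection (not the authors' exact
# architecture): fit a neural regressor from 2D projected points back to the
# original high-dimensional samples, then evaluate it at arbitrary 2D points.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

X = load_digits().data.astype(float)          # high-dimensional data (n x 64)
P = PCA(n_components=2).fit_transform(X)      # any 2D projection works here

inv = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=500, random_state=0)
inv.fit(P, X)                                 # learn: projection space -> data space

# Reconstruct high-dimensional points for an arbitrary grid over the projection,
# e.g. to color the plane by a classifier's prediction on the reconstructions.
gx, gy = np.meshgrid(np.linspace(P[:, 0].min(), P[:, 0].max(), 50),
                     np.linspace(P[:, 1].min(), P[:, 1].max(), 50))
grid_2d = np.column_stack([gx.ravel(), gy.ravel()])
reconstructions = inv.predict(grid_2d)        # shape: (2500, 64)
print(reconstructions.shape)
```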
ABSTRACT
Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high performance computing contexts, where even minor performance tweaks can translate into large savings in computational resource use. To aid performance analysis, developers may collect an execution trace, a chronological log of program activity during execution. As traces represent the full history, developers can discover a wide array of possibly previously unknown performance issues, making them an important artifact for exploratory performance analysis. However, interactive trace visualization is difficult due to issues of data size and complexity of meaning. Traces represent nanosecond-level events across many parallel processes, meaning the collected data is often large and difficult to explore. The rise of asynchronous task parallel programming paradigms further complicates the relation between events and their probable causes. To address these challenges, we conduct a continuing design study in collaboration with high performance computing researchers. We develop diverse and hierarchical ways to navigate and represent execution trace data in support of their trace analysis tasks. Through an iterative design process, we developed Traveler, an integrated visualization platform for task parallel traces. Traveler provides multiple linked interfaces to help navigate trace data from multiple contexts. We evaluate the utility of Traveler through feedback from users and a case study, finding that integrating multiple modes of navigation in our design supported performance analysis tasks and led to the discovery of previously unknown behavior in a distributed array library.
ABSTRACT
The interpretation of deep neural networks (DNNs) has become a key topic as more and more people apply them to solve various problems and make critical decisions. Concept-based explanations have recently become a popular approach for post-hoc interpretation of DNNs. However, identifying human-understandable visual concepts that affect model decisions is a challenging task that is not easily addressed with automatic approaches. We present a novel human-in-the-loop approach to generate user-defined concepts for model interpretation and diagnostics. Central to our proposal is the use of active learning, where human knowledge and feedback are combined to train a concept extractor with very little human labeling effort. We integrate this process into an interactive system, ConceptExtract. Through two case studies, we show how our approach helps analyze model behavior, extract human-friendly concepts for different machine learning tasks and datasets, and use these concepts to understand predictions, compare model performance, and make suggestions for model refinement. Quantitative experiments show that our active learning approach can accurately extract meaningful visual concepts. More importantly, by identifying visual concepts that negatively affect model performance, we develop a corresponding data augmentation strategy that consistently improves model performance.
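A minimal sketch of the active-learning loop the abstract describes, assuming uncertainty (least-confident) sampling and a generic scikit-learn classifier standing in for the concept extractor; the "oracle" labels below are a placeholder for the human in the loop, not part of ConceptExtract's actual pipeline.

```python
# Hypothetical sketch of an uncertainty-sampling active learning loop; the
# ground-truth labels stand in for the human providing concept feedback.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y_true = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = list(range(10))                      # start with a few labeled examples
unlabeled = [i for i in range(len(X)) if i not in labeled]

clf = LogisticRegression(max_iter=1000)
for _ in range(20):                            # 20 rounds of "human" feedback
    clf.fit(X[labeled], y_true[labeled])
    proba = clf.predict_proba(X[unlabeled])
    uncertainty = 1.0 - proba.max(axis=1)      # least-confident sampling
    pick = unlabeled[int(np.argmax(uncertainty))]
    labeled.append(pick)                       # the oracle labels the picked example
    unlabeled.remove(pick)

print("accuracy with", len(labeled), "labels:", clf.score(X, y_true))
```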
Subject(s)
Deep Learning, Computer Graphics, Humans, Machine Learning, Neural Networks (Computer)
ABSTRACT
Progressive visualization is fast becoming an important technique in the visualization community for helping users interact with large amounts of data. With progressive visualization, users can examine intermediate results of complex or long-running computations, without waiting for the computation to complete. While this has been shown to be beneficial to users, recent research has identified potential risks. For example, users may misjudge the uncertainty in the intermediate results and draw incorrect conclusions or see patterns that are not present in the final results. In this article, we conduct a comprehensive set of studies to quantify the advantages and limitations of progressive visualization. Based on a recent report by Micallef et al., we examine four types of cognitive biases that can occur with progressive visualization: uncertainty bias, illusion bias, control bias, and anchoring bias. The results of the studies suggest a cautious but promising use of progressive visualization: while there can be significant savings in task completion time, accuracy can be negatively affected in certain conditions. These findings confirm earlier reports of the benefits and drawbacks of progressive visualization and indicate that continued research into mitigating the effects of cognitive biases is necessary.
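A generic sketch of the kind of progressive computation the abstract refers to, not the stimulus used in the studies: an aggregate is updated chunk by chunk so intermediate estimates (with a rough uncertainty band) can be shown before the full pass finishes.

```python
# Generic sketch of progressive aggregation: stream the data in chunks and
# report a running mean plus a rough 95% confidence interval after each chunk,
# so a chart could be updated before the computation completes.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=3.0, size=1_000_000)

n, total, total_sq = 0, 0.0, 0.0
for chunk in np.array_split(data, 50):
    n += chunk.size
    total += chunk.sum()
    total_sq += np.square(chunk).sum()
    mean = total / n
    var = max(total_sq / n - mean**2, 0.0)
    ci = 1.96 * np.sqrt(var / n)               # uncertainty of the running estimate
    print(f"after {n:>8d} rows: mean = {mean:.3f} +/- {ci:.3f}")
```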
Subject(s)
Cognition, Computer Graphics, Bias, Uncertainty
ABSTRACT
In the last two decades, interactive visualization and analysis have become a central tool in data-driven decision making. Concurrently with the contributions in data visualization, research in data management has produced technology that directly benefits interactive analysis. Here, we contribute a systematic review of 30 years of work in this adjacent field, and highlight techniques and principles we believe to be underappreciated in visualization work. We structure our review along two axes. First, we use task taxonomies from the visualization literature to structure the space of interactions in typical systems. Second, we create a categorization of data management work that strikes a balance between specificity and generality. Concretely, we contribute a characterization of 131 research papers along these two axes. We find that six notions in data management venues fit interactive visualization systems well: materialized views, approximate query processing, user modeling and query prediction, multi-query optimization, lineage techniques, and indexing techniques. In addition, we find a preponderance of work in materialized views and approximate query processing, most targeting a limited subset of the interaction tasks in the taxonomy we used. This suggests natural avenues of future research both in visualization and data management. Our categorization both changes how we, as visualization researchers, design and build our systems, and highlights where future work is necessary.
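To make one of the data-management notions named above concrete, here is a toy illustration of approximate query processing, assuming the simplest possible estimator (a uniform sample with a normal-approximation error bound); production systems use far more sophisticated sampling and sketching schemes.

```python
# Toy illustration of approximate query processing: answer an aggregate query
# from a uniform sample and attach an error bound, instead of scanning all rows.
import numpy as np

rng = np.random.default_rng(1)
sales = rng.exponential(scale=100.0, size=10_000_000)   # full table (one column)

sample = rng.choice(sales, size=50_000, replace=False)  # small uniform sample
estimate = sample.mean() * sales.size                   # estimated SUM(sales)
stderr = sample.std(ddof=1) / np.sqrt(sample.size) * sales.size

print(f"exact SUM  = {sales.sum():.0f}")
print(f"approx SUM = {estimate:.0f} +/- {1.96 * stderr:.0f} (95% CI)")
```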
ABSTRACT
Graphs are commonly used to encode relationships among entities, yet their abstractness makes them difficult to analyze. Node-link diagrams are popular for drawing graphs, and force-directed layouts provide a flexible method for node arrangements that use local relationships in an attempt to reveal the global shape of the graph. However, clutter and overlap of unrelated structures can lead to confusing graph visualizations. This paper leverages the persistent homology features of an undirected graph as derived information for interactive manipulation of force-directed layouts. We first discuss how to efficiently extract 0-dimensional persistent homology features from both weighted and unweighted undirected graphs. We then introduce the interactive persistence barcode used to manipulate the force-directed graph layout. In particular, the user adds and removes contracting and repulsing forces generated by the persistent homology features, eventually selecting the set of persistent homology features that most improve the layout. Finally, we demonstrate the utility of our approach across a variety of synthetic and real datasets.
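A minimal sketch of how 0-dimensional persistent homology can be extracted from a weighted, undirected graph: the barcode of connected components under the edge-weight filtration, computed with a union-find structure and the elder rule. The paper's implementation details (and its handling of unweighted graphs) may differ; vertices are assumed born at filtration value 0 here.

```python
# Sketch of extracting 0-dimensional persistence pairs (the connected-component
# barcode) from a weighted undirected graph via union-find over sorted edges.
def barcode_0d(n_vertices, weighted_edges):
    parent = list(range(n_vertices))
    birth = [0.0] * n_vertices                 # birth time of each component root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    bars = []
    for w, u, v in sorted(weighted_edges):     # filtration by edge weight
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                           # edge creates a cycle, not a merge
        # elder rule: the younger component dies when the two merge
        young, old = (ru, rv) if birth[ru] >= birth[rv] else (rv, ru)
        bars.append((birth[young], w))         # finite bar (birth, death)
        parent[young] = old
    # surviving components give essential (infinite) bars
    roots = {find(i) for i in range(n_vertices)}
    bars.extend((birth[r], float("inf")) for r in roots)
    return bars

edges = [(0.3, 0, 1), (0.5, 1, 2), (0.9, 0, 2), (1.2, 3, 4)]
print(barcode_0d(5, edges))
```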
ABSTRACT
We investigate the influence of bandwidth selection on the reconstruction quality of point-based surfaces. While the problem has received relatively little attention in the literature, we show that appropriate selection plays a significant role in the quality of reconstructed surfaces. We show how to compute optimal bandwidths for one class of moving least-squares (MLS) surfaces by formulating the polynomial fitting step as a kernel regression problem, for both noiseless and noisy data. In the context of Levin's projection, we also discuss the implications of the two-step projection for bandwidth selection. We show experimental comparisons of our method, which outperforms previously proposed heuristically chosen functions and weights. We also show the influence of bandwidth on the reconstruction quality of different formulations of point-based surfaces. We provide, to the best of our knowledge, the first quantitative comparisons between different MLS surface formulations and their optimal bandwidths. Using these experiments, we investigate the choice of effective bandwidths for these alternative formulations. We conclude with a discussion of how to effectively compare the different MLS formulations in the literature.
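A small sketch of the bandwidth-dependent, kernel-weighted polynomial fit at the heart of the discussion, reduced to one dimension for clarity (the paper addresses the surface case); the Gaussian kernel, polynomial degree, and bandwidth values here are illustrative assumptions, not the paper's optimal choices.

```python
# 1D illustration of a kernel-weighted (local) polynomial fit: the bandwidth h
# of the Gaussian weight controls how local the fit is, which is the quantity
# whose selection matters for reconstruction quality.
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-1, 1, 200))
y = np.sin(3 * x) + rng.normal(0, 0.1, x.size)      # noisy samples of a curve

def local_poly_eval(x0, x, y, h, degree=2):
    """Evaluate a degree-`degree` weighted least-squares fit centered at x0."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # Gaussian kernel, bandwidth h
    coeffs = np.polyfit(x - x0, y, deg=degree, w=np.sqrt(w))
    return np.polyval(coeffs, 0.0)                  # fitted value at the center

for h in (0.02, 0.1, 0.5):                          # too small / moderate / too large
    fit = np.array([local_poly_eval(x0, x, y, h) for x0 in x])
    rmse = np.sqrt(np.mean((fit - np.sin(3 * x)) ** 2))
    print(f"bandwidth h={h:<4}: RMSE vs. true curve = {rmse:.4f}")
```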
ABSTRACT
Visual representations of isosurfaces are ubiquitous in the scientific and engineering literature. In this paper, we present techniques to assess the behavior of isosurface extraction codes. Where applicable, these techniques allow us to distinguish whether anomalies in isosurface features can be attributed to the underlying physical process or to artifacts from the extraction process. Such scientific scrutiny is at the heart of verifiable visualization--subjecting visualization algorithms to the same verification process that is used in other components of the scientific pipeline. More concretely, we derive formulas for the expected order of accuracy (or convergence rate) of several isosurface features, and compare them to experimentally observed results in the selected codes. This technique is practical: in two cases, it exposed actual problems in implementations. We provide the reader with the range of responses they can expect to encounter with isosurface techniques, both under "normal operating conditions" and also under adverse conditions. Armed with this information--the results of the verification process--practitioners can judiciously select the isosurface extraction technique appropriate for their problem of interest, and have confidence in its behavior.
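A small sketch of the basic verification calculation the abstract refers to: comparing an observed convergence rate, computed from errors measured at successively refined grids, against the expected order of accuracy. The error values below are hypothetical, not measurements from the codes studied.

```python
# Generic sketch of computing an observed order of accuracy: if the error of a
# feature behaves like E(h) ~ C * h^p, then refining the grid spacing from h1 to
# h2 gives p ~ log(E1/E2) / log(h1/h2), which is compared to the expected order.
import math

def observed_order(e1, e2, h1, h2):
    return math.log(e1 / e2) / math.log(h1 / h2)

# Hypothetical errors measured for some isosurface feature at three grid spacings.
spacings = [0.1, 0.05, 0.025]
errors = [4.1e-3, 1.0e-3, 2.6e-4]

for (h1, e1), (h2, e2) in zip(zip(spacings, errors), zip(spacings[1:], errors[1:])):
    p = observed_order(e1, e2, h1, h2)
    print(f"h: {h1} -> {h2}: observed order ~ {p:.2f} (expected 2 for a 2nd-order code)")
```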
ABSTRACT
Marching Cubes is a popular choice for isosurface extraction from regular grids due to its simplicity, robustness, and efficiency. One of the key shortcomings of this approach is the quality of the resulting meshes, which tend to have many poorly shaped and degenerate triangles. This issue is often addressed through post-processing operations such as smoothing. As we demonstrate in experiments with several datasets, while these improve the mesh, they do not remove all degeneracies, and they incur an increased and unbounded error between the resulting mesh and the original isosurface. Rather than modifying the resulting mesh, we propose a method to modify the grid on which Marching Cubes operates. This modification greatly increases the quality of the extracted mesh. In our experiments, our method did not create a single degenerate triangle, unlike every other method we experimented with. Our method incurs minimal computational overhead, requiring at most twice the execution time of the original Marching Cubes algorithm in our experiments. Most importantly, it can be readily integrated into existing Marching Cubes implementations, and it is orthogonal to many Marching Cubes enhancements (particularly performance enhancements such as out-of-core processing and acceleration structures).
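The sketch below shows one simple strategy in the same spirit, not necessarily the transformation the paper proposes: nudge grid scalar values that lie within a small epsilon of the isovalue before extraction, so Marching Cubes never places triangle vertices exactly on grid points and never produces zero-length edges there.

```python
# Hedged sketch of a grid-modification idea in the spirit of the paper (not its
# exact method): push scalar values that are nearly equal to the isovalue away
# by epsilon before running Marching Cubes, which prevents triangle vertices
# from collapsing onto grid points and producing degenerate triangles.
import numpy as np

def nudge_grid(values, isovalue, eps=1e-6):
    out = values.astype(float).copy()
    close = np.abs(out - isovalue) < eps
    # move near-isovalue samples to just above the isovalue (sign choice is arbitrary)
    out[close] = isovalue + eps
    return out

rng = np.random.default_rng(3)
grid = rng.normal(size=(64, 64, 64))
grid[10, 10, 10] = 0.0                      # a sample exactly on the isovalue
safe_grid = nudge_grid(grid, isovalue=0.0)
print(np.abs(safe_grid - 0.0).min())        # no value is closer to 0 than eps now
# safe_grid would then be passed to any off-the-shelf Marching Cubes implementation.
```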
Subject(s)
Algorithms, Computer Graphics, Computer-Assisted Image Interpretation/methods, Theoretical Models, User-Computer Interface, Computer Simulation
ABSTRACT
Recent results have shown a link between geometric properties of isosurfaces and statistical properties of the underlying sampled data. However, this characterization has two defects: not all of the properties described converge to the same solution, and the statistics computed are not always invariant under isosurface-preserving transformations. We apply Federer's Coarea Formula from geometric measure theory to explain these discrepancies. We describe an improved substitute for histograms based on weighting with the inverse gradient magnitude, develop a statistical model that is invariant under isosurface-preserving transformations, and argue that this provides a consistent method for algorithm evaluation across multiple datasets based on histogram equalization. We use our corrected formulation to reevaluate recent results on average isosurface complexity, and show evidence that noise is one cause of the discrepancy between the expected figure and the observed one.
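A small numpy sketch of the corrected statistic the abstract describes: weighting each sample's histogram contribution by the inverse gradient magnitude so the result behaves like an isosurface-area distribution rather than a raw sample-count histogram. The scalar field below is an arbitrary synthetic example, not one of the datasets analyzed.

```python
# Sketch of an inverse-gradient-magnitude-weighted histogram for a sampled
# scalar field, contrasted with the ordinary count histogram.
import numpy as np

x, y, z = np.meshgrid(*[np.linspace(-1, 1, 128)] * 3, indexing="ij")
f = x**2 + y**2 + z**2                       # synthetic scalar field

gx, gy, gz = np.gradient(f)
gmag = np.sqrt(gx**2 + gy**2 + gz**2)
weights = 1.0 / np.maximum(gmag, 1e-8)       # guard against zero gradients

bins = np.linspace(f.min(), f.max(), 64)
count_hist, _ = np.histogram(f, bins=bins)
area_hist, _ = np.histogram(f, bins=bins, weights=weights)

print("raw-count histogram peak bin:    ", int(np.argmax(count_hist)))
print("gradient-weighted histogram peak:", int(np.argmax(area_hist)))
```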
ABSTRACT
Building visualization and analysis pipelines is a large hurdle in the adoption of visualization and workflow systems by domain scientists. In this paper, we propose techniques to help users construct pipelines by consensus--automatically suggesting completions based on a database of previously created pipelines. In particular, we compute correspondences between existing pipeline subgraphs from the database, and use these to predict sets of likely pipeline additions to a given partial pipeline. By presenting these predictions in a carefully designed interface, users can create visualizations and other data products more efficiently because they can augment their normal work patterns with the suggested completions. We present an implementation of our technique in a publicly-available, open-source scientific workflow system and demonstrate efficiency gains in real-world situations.
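A heavily simplified sketch of the kind of suggestion described above: given a database of previously created pipelines (reduced here to linear module sequences with hypothetical module names), rank candidate next modules by how often they follow the current partial pipeline's last module. The actual system matches pipeline subgraphs rather than linear sequences.

```python
# Heavily simplified sketch of completion-by-consensus: suggest likely next
# modules for a partial pipeline by counting successors in a database of
# previously built pipelines. Module names are hypothetical placeholders.
from collections import Counter, defaultdict

pipeline_db = [
    ["ReadFile", "Contour", "Mapper", "Render"],
    ["ReadFile", "Contour", "Smooth", "Mapper", "Render"],
    ["ReadFile", "Slice", "Mapper", "Render"],
    ["ReadFile", "Contour", "Mapper", "Render"],
]

successors = defaultdict(Counter)
for pipe in pipeline_db:
    for a, b in zip(pipe, pipe[1:]):
        successors[a][b] += 1

def suggest(partial_pipeline, k=3):
    last = partial_pipeline[-1]
    return [m for m, _ in successors[last].most_common(k)]

print(suggest(["ReadFile", "Contour"]))      # e.g. ['Mapper', 'Smooth']
```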
ABSTRACT
Marching Cubes is the most popular isosurface extraction algorithm due to its simplicity, efficiency and robustness. It has been widely studied, improved, and extended. While much early work was concerned with efficiency and correctness issues, lately there has been a push to improve the quality of Marching Cubes meshes so that they can be used in computational codes. In this work we present a new classification of MC cases that we call Edge Groups, which helps elucidate the issues that impact the triangle quality of the meshes that the method generates. This formulation allows a more systematic way to bound the triangle quality, and is general enough to extend to other polyhedral cell shapes used in other polygonization algorithms. Using this analysis, we also discuss ways to improve the quality of the resulting triangle mesh, including some that require only minor modifications of the original algorithm.
ABSTRACT
Non-linear dimensionality reduction (NDR) methods such as LLE and t-SNE are popular with visualization researchers and experienced data analysts, but present serious problems of interpretation. In this paper, we present DimReader, a technique that recovers readable axes from such methods. DimReader is based on analyzing infinitesimal perturbations of the dataset with respect to variables of interest. The perturbations define exactly how we want to change each point in the original dataset, and we measure the effect that these changes have on the projection. The recovered axes are in direct analogy with the axis lines (grid lines) of traditional scatterplots. We also present methods for discovering the perturbations of the input data that change the projection the most. The calculation of the perturbations is efficient and easily integrated into programs written in modern programming languages. We present results of DimReader on a variety of NDR methods and datasets, both synthetic and real-life, and show how it can be used to compare different NDR methods. Finally, we discuss limitations of our proposal and situations where further research is needed.
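A minimal sketch of the perturbation idea using finite differences on a parametric projection (PCA); the paper itself computes these effects with automatic differentiation and also handles non-parametric NDR methods, so this is only an approximation of the concept. In a real tool, an arrow would be drawn at each projected point along the returned displacement direction.

```python
# Finite-difference sketch of the core question: if we nudge one input variable
# by an infinitesimal amount, how does every projected point move? (PCA is used
# here because it gives a deterministic, parametric projection to probe.)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
proj = PCA(n_components=2).fit(X)

def perturbation_field(X, proj, dim, eps=1e-4):
    """Displacement of each projected point when input dimension `dim` grows by eps."""
    Xp = X.copy()
    Xp[:, dim] += eps
    return (proj.transform(Xp) - proj.transform(X)) / eps

field = perturbation_field(X, proj, dim=2)   # sensitivity to the third variable
print(field[:3])                             # per-point 2D displacement directions
```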
ABSTRACT
Animated transitions can be effective in explaining and exploring a small number of visualizations where there are drastic changes in the scene over a short interval of time. This is especially true if data elements cannot be visually distinguished by other means. Current research in animated transitions has mainly focused on linear transitions (all elements follow straight-line paths) or on enhancing coordinated motion through bundling of linear trajectories. In this paper, we introduce a technique for animated transition design that builds smooth, non-linear transitions for clustered data with either minimal or no user involvement. The technique is flexible and simple to implement, and it has the additional advantage that it explicitly enhances coordinated motion and can avoid crowding, both of which are important factors in supporting object tracking in a scene. We investigate its usability, provide preliminary evidence for the effectiveness of the technique through metric evaluations and a user study, and discuss limitations and future directions.
ABSTRACT
Famous examples such as Anscombe's Quartet highlight that one of the core benefits of visualizations is allowing people to discover visual patterns that might otherwise be hidden by summary statistics. This visual inspection is particularly important in exploratory data analysis, where analysts can use visualizations such as histograms and dot plots to identify data quality issues. Yet, these visualizations are driven by parameters such as histogram bin size or mark opacity that have a great deal of impact on the final visual appearance of the chart, but are rarely optimized to make important features visible. In this paper, we show that data flaws have varying impact on the visual features of visualizations, and that the adversarial or merely uncritical setting of design parameters of visualizations can obscure the visual signatures of these flaws. Drawing on the framework of Algebraic Visualization Design, we present the results of a crowdsourced study showing that common visualization types can appear to reasonably summarize distributional data while hiding large and important flaws such as missing data and extraneous modes. We make use of these results to propose additional best practices for visualizations of distributions for data quality tasks.
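A small sketch of the kind of effect described above, using an arbitrary synthetic example rather than the study's stimuli: the same sample with a block of missing values looks unremarkable under a coarse histogram binning but shows an obvious gap under a finer one.

```python
# Toy demonstration of how a design parameter (histogram bin count) can hide a
# data flaw: a range of values is missing, but coarse bins smooth over the gap.
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(0, 1, 20_000)
flawed = data[(data < 0.4) | (data > 0.8)]           # simulate missing data in [0.4, 0.8]

for bins in (8, 64):
    counts, edges = np.histogram(flawed, bins=bins, range=(-4, 4))
    empty = int(np.sum(counts[(edges[:-1] > -2) & (edges[1:] < 2)] == 0))
    print(f"{bins:>3} bins: {empty} empty bins inside [-2, 2]")
# With 8 bins the gap is invisible (no empty bins); with 64 bins it shows up.
```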
ABSTRACT
We describe bivariate cartograms, a technique specifically designed to allow for the simultaneous comparison of two geo-statistical variables. Traditional cartograms are designed to show only a single statistical variable, but in practice it is often useful to show two variables (e.g., the total sales for two competing companies) simultaneously. We illustrate bivariate cartograms using Dorling-style cartograms, yet the technique is simple and generalizable to other cartogram types, such as contiguous, rectangular, and non-contiguous cartograms. An interactive feature makes it possible to switch between bivariate cartograms and traditional (monovariate) cartograms. Bivariate cartograms make it easier than previous approaches to find geographic patterns and outliers in a pre-attentive way, as shown in Fig. 2. They are most effective for showing two variables from the same domain (e.g., population in two different years, sales for two different companies), although they can also be used for variables from different domains (e.g., population and income). We also describe a small-scale evaluation of the proposed techniques, which indicates that bivariate cartograms are especially effective for finding geo-statistical patterns, trends, and outliers.
ABSTRACT
While there have been advances in visualization systems, particularly in multi-view visualizations and visual exploration, the process of building visualizations remains a major bottleneck in data exploration. We show that provenance metadata collected during the creation of pipelines can be reused to suggest similar content in related visualizations and guide semi-automated changes. We introduce the idea of query-by-example in the context of an ensemble of visualizations, and the use of analogies as first-class operations in a system to guide scalable interactions. We describe an implementation of these techniques in VisTrails, a publicly-available, open-source system.
ABSTRACT
Recently proposed techniques have finally made it possible for analysts to interactively explore very large datasets in real time. However powerful, the class of analyses these systems enable is somewhat limited: specifically, one can only quickly obtain plots such as histograms and heatmaps. In this paper, we contribute Gaussian Cubes, which significantly improves on state-of-the-art systems by providing interactive modeling capabilities, including but not limited to linear least squares and principal component analysis (PCA). The fundamental insight in Gaussian Cubes is that instead of precomputing counts of many data subsets (as state-of-the-art systems do), Gaussian Cubes precomputes the best multivariate Gaussian for the respective data subsets. As an example, Gaussian Cubes can fit hundreds of models over millions of data points in well under a second, enabling novel types of visual exploration of such large datasets. We present three case studies that highlight the visualization and analysis capabilities of Gaussian Cubes, using earthquake safety simulations, astronomical catalogs, and transportation statistics. The dataset sizes range around one hundred million elements, with 5 to 10 dimensions. We present extensive performance results, a discussion of the limitations of Gaussian Cubes, and future research directions.
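A small sketch of the core idea as stated in the abstract: for each data subset (bin), store mergeable Gaussian sufficient statistics (count, sum, and sum of outer products) instead of only a count, so that means and covariances, and from them least-squares fits or PCA, can be recovered for any union of bins. The flat binning below is an arbitrary stand-in for the cube's actual dimensions and index structure.

```python
# Sketch of the Gaussian Cubes insight: keep per-bin Gaussian sufficient
# statistics (n, sum, sum of outer products). They merge by addition, and the
# merged statistics yield the mean/covariance needed for PCA or least squares.
import numpy as np

def bin_stats(points):
    return points.shape[0], points.sum(axis=0), points.T @ points

def merge(a, b):
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def mean_cov(stats):
    n, s, ss = stats
    mu = s / n
    cov = ss / n - np.outer(mu, mu)
    return mu, cov

rng = np.random.default_rng(5)
data = rng.normal(size=(100_000, 3))
bins = np.array_split(data, 16)                      # stand-in for cube bins

precomputed = [bin_stats(b) for b in bins]           # done once, offline
query = precomputed[0]
for s in precomputed[1:5]:                           # user brushes bins 0..4
    query = merge(query, s)

mu, cov = mean_cov(query)
eigvals, eigvecs = np.linalg.eigh(cov)               # PCA of the selected subset
print(mu, eigvals)
```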
ABSTRACT
We propose Hashedcubes, a data structure that enables real-time visual exploration of large datasets and improves on the state of the art by virtue of its low memory requirements, low query latencies, and implementation simplicity. In some instances, Hashedcubes requires two orders of magnitude less space than recent data cube visualization proposals. In this paper, we describe the algorithms to build and query Hashedcubes, and how it can drive well-known interactive visualizations such as binned scatterplots, linked histograms, and heatmaps. We report memory usage, build time, and query latencies for a variety of synthetic and real-world datasets, and find that although Hashedcubes sometimes offers slightly slower query times than the state of the art, the typical query is answered fast enough to easily sustain interaction. In datasets with hundreds of millions of elements, only about 2% of the queries take longer than 40 ms. Finally, we discuss the limitations of the data structure, potential space-time tradeoffs, and future research directions.
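A much-simplified sketch in the flavor of the data structure described above: records are sorted once by a hierarchy of keys and each key value is represented by a pivot (a contiguous index range), so per-group queries become cheap slices of one sorted array. The actual Hashedcubes design handles spatial, categorical, and temporal dimensions with nested pivots; the two attributes below are arbitrary examples.

```python
# Much-simplified sketch: sort the records once by (category, hour) and
# remember, for each category, the contiguous index range ("pivot") it
# occupies; a per-category histogram over hours then only touches one slice.
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
category = rng.integers(0, 5, size=n)                # e.g. device type
hour = rng.integers(0, 24, size=n)                   # e.g. hour of day

order = np.lexsort((hour, category))                 # sort by category, then hour
category, hour = category[order], hour[order]

# pivots: start/end index of each category's contiguous block
starts = np.searchsorted(category, np.arange(5), side="left")
ends = np.searchsorted(category, np.arange(5), side="right")

def hour_histogram(cat):
    lo, hi = starts[cat], ends[cat]
    return np.bincount(hour[lo:hi], minlength=24)    # scans only that slice

print(hour_histogram(2)[:6])
```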