ABSTRACT
The risk perception attitude (RPA) framework was tested as a message tailoring strategy to encourage diabetes screening. Participants (N = 602) were first categorized into one of four RPA groups based on their diabetes risk and efficacy perceptions and then randomly assigned to receive a message that matched their RPA, mismatched their RPA, or a control message. Participants receiving a matched message reported greater intentions to engage in self-protective behavior than participants who received a mismatched message or the control message. The results also showed differences in attitudes and behavioral intentions across the four RPA groups. Participants in the responsive group had more positive attitudes toward diabetes screening than the other three groups, whereas participants in the indifferent group reported the weakest intentions to engage in self-protective behavior.
Subject(s)
Communication , Diabetes Mellitus/diagnosis , Health Knowledge, Attitudes, Practice , Health Promotion , Mass Screening , Risk , Adult , Female , Humans , Male , Middle Aged , United States
Subject(s)
Cooperative Behavior , Data Science , Interdisciplinary Communication , Interprofessional Relations , Computational Biology , Data Science/ethics , Data Science/organization & administration , Data Science/trends , Humans , Intersectoral Collaboration
ABSTRACT
Analysts often have to work with and make sense of large complex networks. One possible solution is to make visualisations interactive, providing users with a way to control visual clutter. Although several interactive methods have been proposed, there may be situations where some of them are too specific to be directly applicable. We have therefore identified several underlying low-level visual transformations, steered by group structures in the networks, and investigated their individual effects on user performance. This may both facilitate the development of further methods and support the generation of new hypotheses. We conducted an exploratory online experiment with 300 participants, involving five tasks, one control condition, and five group-based visual transformations: de-emphasising groups by opacity, position or size, aggregating groups, and hiding groups. The results for the three tasks that were specifically referring to groups show a high usage of the visual transformations by participants and several positive effects of the latter on accuracy, completion time, and mental effort spent. On the other hand, the two tasks that were not directly referring to groups show a lower usage of the visual transformations and the results regarding effects are rather mixed. Supplemental materials are available on DaRUS at https://doi.org/10.18419/darus-3706.
ABSTRACT
Large tree structures are ubiquitous and real-world relational datasets often have information associated with nodes (e.g., labels or other attributes) and edges (e.g., weights or distances) that need to be communicated to the viewers. Yet, scalable, easy to read tree layouts are difficult to achieve. We consider tree layouts to be readable if they meet some basic requirements: node labels should not overlap, edges should not cross, edge lengths should be preserved, and the output should be compact. There are many algorithms for drawing trees, although very few take node labels or edge lengths into account, and none optimizes all requirements above. With this in mind, we propose a new scalable method for readable tree layouts. The algorithm guarantees that the layout has no edge crossings and no label overlaps, and optimizes one of the remaining aspects: desired edge lengths and compactness. We evaluate the performance of the new algorithm by comparison with related earlier approaches using several real-world datasets, ranging from a few thousand nodes to hundreds of thousands of nodes. Tree layout algorithms can be used to visualize large general graphs, by extracting a hierarchy of progressively larger trees. We illustrate this functionality by presenting several map-like visualizations generated by the new tree layout algorithm.
ABSTRACT
Relational information between different types of entities is often modelled by a multilayer network (MLN) - a network with subnetworks represented by layers. The layers of an MLN can be arranged in different ways in a visual representation; however, the impact of the arrangement on the readability of the network is an open question. Therefore, we studied this impact for several commonly occurring tasks related to MLN analysis. Additionally, layer arrangements with a dimensionality beyond 2D, which are common in this scenario, motivate the use of stereoscopic displays. We ran a human subject study utilising a Virtual Reality headset to evaluate 2D, 2.5D, and 3D layer arrangements. The study employs six analysis tasks that cover the spectrum of an MLN task taxonomy, from path finding and pattern identification to comparisons between and across layers. We found no clear overall winner. However, we explore the task-to-arrangement space and derive empirically based recommendations on the effective use of 2D, 2.5D, and 3D layer arrangements for MLNs.
ABSTRACT
BACKGROUND: Increasing an individual's awareness and understanding of their dietary habits and reasons for eating may help facilitate positive dietary changes. Mobile technologies allow individuals to record diet-related behavior in real time from any location; however, the most popular software applications lack empirical evidence supporting their efficacy as health promotion tools. OBJECTIVE: The purpose of this study was to test the feasibility and acceptability of a popular social media software application (Twitter) to capture young adults' dietary behavior and reasons for eating. A secondary aim was to visualize data from Twitter using a novel analytic tool designed to help identify relationships among dietary behaviors, reasons for eating, and contextual factors. METHODS: Participants were trained to record all food and beverages consumed over 3 consecutive days (2 weekdays and 1 weekend day) using their mobile device's native Twitter application. A list of 24 hashtags (#) representing food groups and reasons for eating was provided to participants to guide reporting (eg, #protein, #mood). Participants were encouraged to annotate hashtags with contextual information using photos, text, and links. User experience was assessed through a combination of email reports of technical challenges and a 9-item exit survey. Participant data were captured from the public Twitter stream, and frequency of hashtag occurrence and co-occurrence were determined. Contextual data were further parsed and qualitatively analyzed. A frequency matrix was constructed to identify food and behavior hashtags that co-occurred. These relationships were visualized using GMap algorithmic mapping software. RESULTS: A total of 50 adults completed the study. In all, 773 tweets including 2862 hashtags (1756 foods and 1106 reasons for eating) were reported. Frequently reported food groups were #grains (n=365 tweets), #dairy (n=221), and #protein (n=307). The most frequently cited reasons for eating were #social (activity) (n=122), #taste (n=146), and #convenience (n=173). Participants used a combination of study-provided hashtags and their own hashtags to describe behavior. Most rated Twitter as easy to use for the purpose of reporting diet-related behavior. "Maps" of hashtag occurrences and co-occurrences were developed that suggested time-varying diet and behavior patterns. CONCLUSIONS: Twitter combined with an analytical software tool provides a method for capturing real-time food consumption and diet-related behavior. Data visualization may provide a method to identify relationships between dietary and behavioral factors. These findings will inform the design of a study exploring the use of social media and data visualization to identify relationships between food consumption, reasons for engaging in specific food-related behaviors, relevant contextual factors, and weight and health statuses in diverse populations.
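The hashtag frequency and co-occurrence counts described in the METHODS could be computed along the following lines. This is only a minimal sketch: the function name `cooccurrence` and the three example tweets are hypothetical, and the study's actual parsing pipeline is not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tweets):
    """Count hashtag occurrences and pairwise co-occurrences within each tweet.

    Assumption: hashtags are whitespace-separated tokens starting with '#'.
    Trailing punctuation is not stripped in this sketch.
    """
    freq = Counter()
    pairs = Counter()
    for text in tweets:
        # De-duplicate and sort so each unordered pair has one canonical key.
        tags = sorted({w.lower() for w in text.split()
                       if w.startswith("#") and len(w) > 1})
        freq.update(tags)
        pairs.update(combinations(tags, 2))
    return freq, pairs


# Hypothetical example tweets, not data from the study.
tweets = [
    "Lunch with friends #protein #grains #social",
    "Quick snack #grains #convenience",
    "Dinner out #protein #social #taste",
]
freq, pairs = cooccurrence(tweets)
print(freq.most_common(3))
print(pairs[("#protein", "#social")])
```

The `pairs` counter plays the role of a sparse frequency matrix: each key is an unordered hashtag pair, and its value is the number of tweets in which both hashtags appear.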
Subject(s)
Diet , Feeding Behavior , Internet , Adult , Feasibility Studies , Humans , Social Support
ABSTRACT
Bipartite graphs model the relationships between two disjoint sets of entities in several applications and are naturally drawn as 2-layer graph drawings. In such drawings, the two sets of entities (vertices) are placed on two parallel lines (layers), and their relationships (edges) are represented by segments connecting vertices. Methods for constructing 2-layer drawings often try to minimize the number of edge crossings. We use vertex splitting to reduce the number of crossings, by replacing selected vertices on one layer by two (or more) copies and suitably distributing their incident edges among these copies. We study several optimization problems related to vertex splitting, either minimizing the number of crossings or removing all crossings with fewest splits. While we prove that some variants are NP-complete, we obtain polynomial-time algorithms for others. We run our algorithms on a benchmark set of bipartite graphs representing the relationships between human anatomical structures and cell types.
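To make the effect of a vertex split concrete, the sketch below counts crossings in a 2-layer drawing by brute force and compares the count before and after a single split chosen by hand. It is an illustration only, not one of the algorithms from the paper; the function name `count_crossings` and the toy graph are assumptions.

```python
from itertools import combinations


def count_crossings(edges, top_order, bottom_order):
    """Count edge crossings in a 2-layer drawing by checking every edge pair.

    Two edges (a, x) and (b, y) cross exactly when their endpoints appear
    in opposite relative order on the two layers.
    """
    top = {v: i for i, v in enumerate(top_order)}
    bot = {v: i for i, v in enumerate(bottom_order)}
    return sum(
        1
        for (a, x), (b, y) in combinations(edges, 2)
        if (top[a] - top[b]) * (bot[x] - bot[y]) < 0
    )


# Toy instance (hypothetical): vertex "a" has one edge that reaches far right.
edges = [("a", "x"), ("b", "y"), ("c", "z"), ("a", "z")]
before = count_crossings(edges, ["a", "b", "c"], ["x", "y", "z"])

# Split "a" into copies "a1" and "a2" and distribute its two edges among them.
split_edges = [("a1", "x"), ("b", "y"), ("c", "z"), ("a2", "z")]
after = count_crossings(split_edges, ["a1", "b", "c", "a2"], ["x", "y", "z"])

print(before, after)
```

The pairwise check is quadratic in the number of edges, which is adequate for small examples. Choosing which vertices to split and where to place the copies is the optimization problem the abstract refers to, and it is not attempted here.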
ABSTRACT
Readability criteria, such as distance or neighborhood preservation, are often used to optimize node-link representations of graphs to enable the comprehension of the underlying data. With few exceptions, graph drawing algorithms typically optimize one such criterion, usually at the expense of others. We propose a layout approach, Multicriteria Scalable Graph Drawing via Stochastic Gradient Descent, (SGD)², that can handle multiple readability criteria. (SGD)² can optimize any criterion that can be described by a differentiable function. Our approach is flexible and can be used to optimize several criteria that have already been considered earlier (e.g., obtaining ideal edge lengths, stress, neighborhood preservation) as well as other criteria which have not yet been explicitly optimized in such fashion (e.g., node resolution, angular resolution, aspect ratio). The approach is scalable and can handle large graphs. A variation of the underlying approach can also be used to optimize many desirable properties in planar graphs, while maintaining planarity. Finally, we provide quantitative and qualitative evidence of the effectiveness of (SGD)²: we analyze the interactions between criteria, measure the quality of layouts generated from (SGD)² as well as the runtime behavior, and analyze the impact of sample sizes. The source code is available on GitHub and we also provide an interactive demo for small graphs.
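As a rough illustration of optimizing a differentiable readability criterion with stochastic gradient descent, the sketch below minimizes a single criterion, stress, by sampling one node pair per step. It is not the (SGD)² implementation: the weighting (inverse squared target distance), the learning rate, the step count, and all function names are assumptions made for the example.

```python
import math
import random


def stress(pos, dist):
    """Sum over node pairs of ((layout distance - target distance) / target)^2."""
    total = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            d = dist[i][j]
            total += (math.dist(pos[i], pos[j]) - d) ** 2 / d ** 2
    return total


def sgd_layout(dist, steps=5000, lr=0.05, seed=0):
    """Minimize stress by gradient steps on one randomly sampled pair at a time."""
    rng = random.Random(seed)
    n = len(dist)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    initial = stress(pos, dist)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        d = dist[i][j]
        dx = pos[i][0] - pos[j][0]
        dy = pos[i][1] - pos[j][1]
        cur = math.hypot(dx, dy) or 1e-9  # avoid division by zero
        # Gradient of ((cur - d) / d)^2 with respect to pos[i] is g * (dx, dy);
        # the gradient with respect to pos[j] is its negation.
        g = 2.0 * (cur - d) / (d ** 2 * cur)
        pos[i][0] -= lr * g * dx
        pos[i][1] -= lr * g * dy
        pos[j][0] += lr * g * dx
        pos[j][1] += lr * g * dy
    return pos, initial, stress(pos, dist)


# Target distances: graph-theoretic distances on a path with 4 nodes.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
pos, before, after = sgd_layout(dist)
print(before, after)
```

Other differentiable criteria could in principle be handled the same way by swapping in their gradients or by combining several terms in a weighted sum; how the paper weights and samples multiple criteria is not specified in the abstract.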
ABSTRACT
Cartograms are popular for visualizing numerical data for administrative regions in thematic maps. When there are multiple data values per region (over time or from different datasets) shown as animated or juxtaposed cartograms, preserving the viewer's mental map in terms of stability between multiple cartograms is another important criterion alongside traditional cartogram criteria such as maintaining adjacencies. We present a method to compute stable Demers cartograms, where each region is shown as a square scaled proportionally to the given numerical data and similar data yield similar cartograms. We enforce orthogonal separation constraints using linear programming, and measure quality in terms of keeping adjacent regions close (cartogram quality) and using similar positions for a region between the different data values (stability). Our method guarantees the ability to connect most lost adjacencies with minimal-length planar orthogonal polylines. Experiments show that our method yields good quality and stability on multiple quality criteria.
ABSTRACT
Set systems are used to model data that naturally arises in many contexts: social networks have communities, musicians have genres, and patients have symptoms. Visualizations that accurately reflect the information in the underlying set system make it possible to identify the set elements, the sets themselves, and the relationships between the sets. In static contexts, such as print media or infographics, it is necessary to capture this information without the help of interactions. With this in mind, we consider three different systems for medium-sized set data, LineSets, EulerView, and MetroSets, and report the results of a controlled human-subjects experiment comparing their effectiveness. Specifically, we evaluate the performance, in terms of time and error, on tasks that cover the spectrum of static set-based tasks. We also collect and analyze qualitative data about the three different visualization systems. Our results include statistically significant differences, suggesting that MetroSets performs and scales better.
ABSTRACT
We propose MetroSets, a new, flexible online tool for visualizing set systems using the metro map metaphor. We model a given set system as a hypergraph H=(V, S), consisting of a set V of vertices and a set S, which contains subsets of V called hyperedges. Our system then computes a metro map representation of H, where each hyperedge E in S corresponds to a metro line and each vertex corresponds to a metro station. Vertices that appear in two or more hyperedges are drawn as interchanges in the metro map, connecting the different sets. MetroSets is based on a modular 4-step pipeline which constructs and optimizes a path-based hypergraph support, which is then drawn and schematized using metro map layout algorithms. We propose and implement multiple algorithms for each step of the MetroSets pipeline and provide a functional prototype with easy-to-use preset configurations. Furthermore, using several real-world datasets, we perform an extensive quantitative evaluation of the impact of different pipeline stages on desirable properties of the generated maps, such as octolinearity, monotonicity, and edge uniformity.
ABSTRACT
We describe MPSE: a Multi-Perspective Simultaneous Embedding method for visualizing high-dimensional data, based on multiple pairwise distances between the data points. Specifically, MPSE computes positions for the points in 3D and provides different views into the data by means of 2D projections (planes) that preserve each of the given distance matrices. We consider two versions of the problem: fixed projections and variable projections. MPSE with fixed projections takes as input a set of pairwise distance matrices defined on the data points, along with the same number of projections, and embeds the points in 3D so that the pairwise distances are preserved in the given projections. MPSE with variable projections takes as input a set of pairwise distance matrices and embeds the points in 3D while also computing the appropriate projections that preserve the pairwise distances. The proposed approach can be useful in multiple scenarios: from creating simultaneous embeddings of multiple graphs on the same set of vertices, to reconstructing a 3D object from multiple 2D snapshots, to analyzing data from multiple points of view. We provide a functional prototype of MPSE that is based on an adaptive and stochastic generalization of multi-dimensional scaling to multiple distances and multiple variable projections. We provide an extensive quantitative evaluation with datasets of different sizes and different numbers of projections, as well as several examples that illustrate the quality of the resulting solutions.
ABSTRACT
Data analysts commonly utilize statistics to summarize large datasets. While it is often sufficient to explore only the summary statistics of a dataset (e.g., min/mean/max), Anscombe's Quartet demonstrates how such statistics can be misleading. We consider a similar problem in the context of graph mining. To study the relationships between different graph properties, we examine low-order non-isomorphic graphs and provide a simple visual analytics system to explore correlations across multiple graph properties. However, for larger graphs, studying the entire space quickly becomes intractable. We use different random graph generation methods to further look into the distribution of graph properties for higher order graphs and investigate the impact of various sampling methodologies. We also describe a method for generating many graphs that are identical over a number of graph properties and statistics yet are clearly different and identifiably distinct.
ABSTRACT
Dynamic graph drawing algorithms take as input a series of timeslices that standard, force-directed algorithms can exploit to compute a layout. However, often dynamic graphs are expressed as a series of events where the nodes and edges have real coordinates along the time dimension that are not confined to discrete timeslices. Current techniques for dynamic graph drawing impose a set of timeslices on this event-based data in order to draw the dynamic graph, but it is unclear how many timeslices should be selected: too many timeslices slow the computation of the layout, while too few timeslices obscure important temporal features, such as causality. To address these limitations, we introduce a novel model for drawing event-based dynamic graphs and the first dynamic graph drawing algorithm, DynNoSlice, that is capable of drawing dynamic graphs in this model. DynNoSlice is an offline, force-directed algorithm that draws event-based, dynamic graphs in the space-time cube (2D+time). We also present a method to extract representative small multiples from the space-time cube. To demonstrate the advantages of our approach, DynNoSlice is compared with state-of-the-art timeslicing methods using a metrics-based experiment. Finally, we present case studies of event-based dynamic data visualised with the new model and algorithm.
ABSTRACT
Visualizing network data is applicable in domains such as biology, engineering, and social sciences. We report the results of a study comparing the effectiveness of the two primary techniques for showing network data: node-link diagrams and adjacency matrices. Specifically, an evaluation with a large number of online participants revealed statistically significant differences between the two visualizations. Our work adds to existing research in several ways. First, we explore a broad spectrum of network tasks, many of which had not been previously evaluated. Second, our study uses two large datasets, typical of many real-life networks not explored by previous studies. Third, we leverage crowdsourcing to evaluate many tasks with many participants. This paper is an expanded journal version of a Graph Drawing (GD'17) conference paper. We evaluated a second dataset, added a qualitative feedback section, and expanded the procedure, results, discussion, and limitations sections.
Subject(s)
Computer Graphics , Data Visualization , Adult , Aged , Crowdsourcing , Female , Humans , Male , Middle Aged , Task Performance and Analysis , Young Adult
ABSTRACT
Cartograms are maps in which areas of geographic regions, such as countries and states, appear in proportion to some variable of interest, such as population or income. Cartograms are popular visualizations for geo-referenced data that have been used for over a century to illustrate patterns and trends in the world around us. Despite the popularity of cartograms, and the large number of cartogram types, there are few studies evaluating the effectiveness of cartograms in conveying information. Based on a recent task taxonomy for cartograms, we evaluate four major types of cartograms: contiguous, non-contiguous, rectangular, and Dorling cartograms. We first evaluate the effectiveness of these cartogram types by quantitative performance analysis (time and error). Second, we collect qualitative data with an attitude study and by analyzing subjective preferences. Third, we compare the quantitative and qualitative results with the results of a metrics-based cartogram evaluation. Fourth, we analyze the results of our study in the context of cartography, geography, visual perception, and demography. Finally, we consider implications for design and possible improvements.
ABSTRACT
We describe bivariate cartograms, a technique specifically designed to allow for the simultaneous comparison of two geo-statistical variables. Traditional cartograms are designed to show only a single statistical variable, but in practice, it is often useful to show two variables (e.g., the total sales for two competing companies) simultaneously. We illustrate bivariate cartograms using Dorling-style cartograms, yet the technique is simple and generalizable to other cartogram types, such as contiguous cartograms, rectangular cartograms, and non-contiguous cartograms. An interactive feature makes it possible to switch between bivariate cartograms and the traditional (monovariate) cartograms. Bivariate cartograms make it easy to find more geographic patterns and outliers in a pre-attentive way than previous approaches, as shown in Fig. 2. They are most effective for showing two variables from the same domain (e.g., population in two different years, sales for two different companies), although they can also be used for variables from different domains (e.g., population and income). We also describe a small-scale evaluation of the proposed techniques that indicates bivariate cartograms are especially effective for finding geo-statistical patterns, trends and outliers.
ABSTRACT
BACKGROUND: Software designed to accurately estimate food calories from still images could help users and health professionals identify dietary patterns and food choices associated with health and health risks more effectively. However, calorie estimation from images is difficult, and no publicly available software can do so accurately while minimizing the burden associated with data collection and analysis. OBJECTIVE: The aim of this study was to determine the accuracy of crowdsourced annotations of calorie content in food images and to identify and quantify sources of bias and noise as a function of respondent characteristics and food qualities (eg, energy density). METHODS: We invited adult social media users to provide calorie estimates for 20 food images (for which ground truth calorie data were known) using a custom-built webpage that administers an online quiz. The images were selected to provide a range of food types and energy density. Participants optionally provided age range, gender, and their height and weight. In addition, 5 nutrition experts provided annotations for the same data to form a basis of comparison. We examined estimated accuracy on the basis of expertise, demographic data, and food qualities using linear mixed-effects models with participant and image index as random variables. We also analyzed the advantage of aggregating nonexpert estimates. RESULTS: A total of 2028 respondents agreed to participate in the study (males: 770/2028, 37.97%, mean body mass index: 27.5 kg/m²). Average accuracy was 5 out of 20 correct guesses, where "correct" was defined as a number within 20% of the ground truth. Even a small crowd of 10 individuals achieved an accuracy of 7, exceeding the average individual and expert annotator's accuracy of 5. Women were more accurate than men (P<.001), and younger people were more accurate than older people (P<.001). The calorie content of energy-dense foods was overestimated (P=.02). Participants performed worse when images contained reference objects, such as credit cards, for scale (P=.01). CONCLUSIONS: Our findings provide new information about how calories are estimated from food images, which can inform the design of related software and analyses.
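The accuracy measure (an estimate counts as correct when it lies within 20% of the ground truth) and the benefit of pooling nonexpert estimates can be sketched as follows. The abstract does not say how crowd estimates were aggregated, so the per-image median used here is an assumption, as are the function name `n_correct` and all numbers in the example.

```python
from statistics import median


def n_correct(estimates, truth, tol=0.20):
    """Number of estimates within tol (as a fraction) of the ground truth."""
    return sum(abs(e - t) <= tol * t for e, t in zip(estimates, truth))


# Hypothetical ground-truth calories for three images.
truth = [100, 250, 600]

# Hypothetical estimates: one row per respondent, one column per image.
crowd = [
    [60, 290, 900],
    [150, 240, 400],
    [95, 150, 650],
]

# Assumed aggregation rule: per-image median across respondents.
aggregated = [median(col) for col in zip(*crowd)]

individual_scores = [n_correct(person, truth) for person in crowd]
crowd_score = n_correct(aggregated, truth)
print(individual_scores, crowd_score)
```

In this toy data each respondent errs on different images, so the median cancels the individual errors. Whether real crowds behave this way depends on how correlated their biases are, which is one of the questions the study examines.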
ABSTRACT
OVERVIEW: Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. CLUSTER QUALITY METRICS: We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. NETWORK CLUSTERING ALGORITHMS: Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
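For readers unfamiliar with the three stand-alone metrics, the sketch below computes coverage, modularity, and per-cluster conductance for an undirected, unweighted graph using their common textbook definitions. The exact variants used in the study may differ; the function name `cluster_metrics` and the example graph are assumptions.

```python
def cluster_metrics(edges, membership):
    """Return (coverage, modularity, conductance per cluster).

    coverage    = intra-cluster edges / all edges
    modularity  = sum over clusters of m_c / m - (vol_c / 2m)^2
    conductance = cut_c / min(vol_c, 2m - vol_c)
    where m_c is the number of intra-cluster edges, vol_c is the sum of
    degrees in cluster c, and cut_c is the number of edges leaving c.
    """
    m = len(edges)
    intra, volume, cut = {}, {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        volume[cu] = volume.get(cu, 0) + 1
        volume[cv] = volume.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
        else:
            cut[cu] = cut.get(cu, 0) + 1
            cut[cv] = cut.get(cv, 0) + 1
    coverage = sum(intra.values()) / m
    modularity = sum(
        intra.get(c, 0) / m - (vol / (2 * m)) ** 2 for c, vol in volume.items()
    )
    conductance = {}
    for c, vol in volume.items():
        denom = min(vol, 2 * m - vol)
        # A cluster that contains every edge endpoint has no complement; report 0.
        conductance[c] = cut.get(c, 0) / denom if denom else 0.0
    return coverage, modularity, conductance


# Hypothetical example: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
coverage, modularity, conductance = cluster_metrics(edges, membership)
print(coverage, modularity, conductance)
```

Note that higher is better for coverage and modularity, while lower is better for conductance, which is worth keeping in mind when comparing the metrics side by side.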
Subject(s)
Algorithms , Cluster Analysis , Computational Biology/methods
ABSTRACT
We present a conceptually simple approach to generalizing force-directed methods for graph layout from Euclidean geometry to Riemannian geometries. Unlike previous work on non-Euclidean force-directed methods, ours is not limited to special classes of graphs, but can be applied to arbitrary graphs. The method relies on extending the Euclidean notions of distance, angle, and force-interactions to smooth non-Euclidean geometries via projections to and from appropriately chosen tangent spaces. In particular, we formally describe the calculations needed to extend such algorithms to hyperbolic and spherical geometries. We also study the theoretical and practical considerations that arise when working with non-Euclidean geometries.
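The idea of projecting to and from a tangent space can be illustrated for the unit sphere with the standard logarithmic and exponential maps. This is a generic sketch of those two maps, not the paper's formulation; the function names are assumptions, and the hyperbolic case is not covered.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_map(p, q):
    """Map point q on the unit sphere into the tangent plane at p.

    The result is a tangent vector at p that points toward q and whose
    length equals the geodesic (great-circle) distance from p to q.
    Returns the zero vector when q coincides with p; the antipodal case,
    where the map is undefined, also falls into this branch.
    """
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    u = [qi - c * pi for pi, qi in zip(p, q)]  # component of q orthogonal to p
    norm = math.sqrt(dot(u, u))
    if norm < 1e-12:
        return [0.0] * len(p)
    return [theta * ui / norm for ui in u]


def exp_map(p, v):
    """Map a tangent vector v at p back onto the unit sphere."""
    norm = math.sqrt(dot(v, v))
    if norm < 1e-12:
        return list(p)
    return [math.cos(norm) * pi + math.sin(norm) * vi / norm
            for pi, vi in zip(p, v)]


p = [0.0, 0.0, 1.0]  # north pole
q = [1.0, 0.0, 0.0]  # a point on the equator
v = log_map(p, q)
back = exp_map(p, v)
print(v, back)
```

Under these assumptions, one iteration of a force-directed method would map a node's neighbors into its tangent plane with `log_map`, compute and sum the forces there as in the Euclidean case, and then move the node along the resulting vector with `exp_map`.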