Results 1 - 20 of 31
1.
PLoS Comput Biol ; 15(9): e1007244, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31557157

ABSTRACT

Biological network figures are ubiquitous in the biology and medical literature. On the one hand, a good network figure can quickly provide information about the nature and degree of interactions between items and enable inferences about the reason for those interactions. On the other hand, good network figures are difficult to create. In this paper, we outline 10 simple rules for creating biological network figures for communication, from choosing layouts, to applying color or other channels to show attributes, to the use of layering and separation. These rules are accompanied by illustrative examples. We also provide a concise set of references and additional resources for each rule.


Subjects
Computational Biology/methods , Computer Graphics , Attention , Color , Humans , Protein Interaction Maps/physiology , Signal Transduction/physiology , Visual Perception
3.
Bioinformatics ; 33(18): 2938-2940, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-28645171

ABSTRACT

MOTIVATION: Venn and Euler diagrams are a popular yet inadequate solution for quantitative visualization of set intersections. A scalable alternative to Venn and Euler diagrams for visualizing intersecting sets and their properties is needed. RESULTS: We developed UpSetR, an open source R package that employs a scalable matrix-based visualization to show intersections of sets, their size, and other properties. AVAILABILITY AND IMPLEMENTATION: UpSetR is available at https://github.com/hms-dbmi/UpSetR/ and released under the MIT License. A Shiny app is available at https://gehlenborglab.shinyapps.io/upsetr/ . CONTACT: nils@hms.harvard.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Software , Genotyping Techniques/methods , DNA Sequence Analysis/methods
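
The matrix-based idea in the UpSetR abstract above can be illustrated with a minimal sketch. This is not UpSetR code (UpSetR is an R package) and does not use its API; it is a plain Python illustration that assumes the quantity of interest is the exclusive intersection, i.e. the elements that belong to exactly one combination of sets. The function names and the toy sets A, B, and C are hypothetical.

```python
from collections import Counter


def exclusive_intersections(sets):
    """Count elements by the exact combination of sets that contain them."""
    names = sorted(sets)
    universe = set().union(*sets.values())
    counts = Counter()
    for item in universe:
        # The combination is the tuple of all set names this item belongs to.
        combo = tuple(name for name in names if item in sets[name])
        counts[combo] += 1
    return counts


def matrix_rows(sets):
    """Return (membership pattern, size) rows, largest intersections first."""
    names = sorted(sets)
    counts = exclusive_intersections(sets)
    # Ties in size are broken by the combination itself for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (" ".join("x" if name in combo else "." for name in names), size)
        for combo, size in ordered
    ]


# Hypothetical toy data, not taken from the paper.
example = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 5, 6, 7}}

if __name__ == "__main__":
    print(" ".join(sorted(example)), " size")
    for row, size in matrix_rows(example):
        print(row, f"{size:5d}")
```

Each row marks with "x" the sets that take part in one combination and gives that combination's size. Unlike a Venn diagram, the table grows by one row per non-empty combination, so it stays readable as the number of sets increases.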
4.
BMC Bioinformatics ; 18(1): 406, 2017 Sep 12.
Article in English | MEDLINE | ID: mdl-28899361

ABSTRACT

BACKGROUND: With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. RESULTS: In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results, and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in the context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. CONCLUSIONS: Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.


Subjects
Algorithms , User-Computer Interface , Cluster Analysis , Genotype , Humans , Internet , Neoplasms/classification , Neoplasms/genetics , Neoplasms/pathology , Phenotype
5.
Article in English | MEDLINE | ID: mdl-39312426

ABSTRACT

Exploratory data science is an iterative process of obtaining, cleaning, profiling, analyzing, and interpreting data. This cyclical way of working creates challenges within the linear structure of computational notebooks, leading to issues with code quality, recall, and reproducibility. To remedy this, we present Loops, a set of visual support techniques for iterative and exploratory data analysis in computational notebooks. Loops leverages provenance information to visualize the impact of changes made within a notebook. In visualizations of the notebook provenance, we trace the evolution of the notebook over time and highlight differences between versions. Loops visualizes the provenance of code, markdown, tables, visualizations, and images and their respective differences. Analysts can explore these differences in detail in a separate view. Loops not only makes the analysis process transparent but also supports analysts in their data science work by showing the effects of changes and facilitating comparison of multiple versions. We demonstrate our approach's utility and potential impact in two use cases and feedback from notebook users from various backgrounds. This paper and all supplemental materials are available at https://osf.io/79eyn.

6.
Article in English | MEDLINE | ID: mdl-39255114

ABSTRACT

How do cancer cells grow, divide, proliferate, and die? How do drugs influence these processes? These are difficult questions that we can attempt to answer with a combination of time-series microscopy experiments, classification algorithms, and data visualization. However, collecting this type of data and applying algorithms to segment and track cells and construct lineages of proliferation is error-prone, and identifying the errors can be challenging since it often requires cross-checking multiple data types. Similarly, analyzing and communicating the results necessitates synthesizing different data types into a single narrative. State-of-the-art visualization methods for such data use independent line charts, tree diagrams, and images in separate views. However, this spatial separation requires the viewer of these charts to combine the relevant pieces of data in memory. To simplify this challenging task, we describe design principles for weaving cell images, time-series data, and tree data into a cohesive visualization. Our design principles are based on choosing a primary data type that drives the layout and integrates the other data types into that layout. We then introduce Aardvark, a system that uses these principles to implement novel visualization techniques. Based on Aardvark, we demonstrate the utility of each of these approaches for discovery, communication, and data debugging in a series of case studies.

7.
BMC Bioinformatics ; 14 Suppl 19: S3, 2013.
Article in English | MEDLINE | ID: mdl-24564375

ABSTRACT

Jointly analyzing biological pathway maps and experimental data is critical for understanding how biological processes work in different conditions and why different samples exhibit certain characteristics. This joint analysis, however, poses a significant challenge for visualization. Current techniques are either well suited to visualize large amounts of pathway node attributes, or to represent the topology of the pathway well, but do not accomplish both at the same time. To address this, we introduce enRoute, a technique that enables analysts to specify a path of interest in a pathway, extract this path into a separate, linked view, and show detailed experimental data associated with the nodes of this extracted path right next to it. This juxtaposition of the extracted path and the experimental data allows analysts to simultaneously investigate large amounts of potentially heterogeneous data, thereby solving the problem of joint analysis of topology and node attributes. As this approach does not modify the layout of pathway maps, it is compatible with arbitrary graph layouts, including those of hand-crafted, image-based pathway maps. We demonstrate the technique in the context of pathways from the KEGG and the WikiPathways databases. We apply experimental data from two public databases, the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA), both of which contain a wide variety of genomic datasets for a large number of samples. In addition, we make use of a smaller dataset of hepatocellular carcinoma and common xenograft models. To verify the utility of enRoute, domain experts conducted two case studies in which they explored data from the CCLE and the hepatocellular carcinoma datasets in the context of relevant pathways.


Subjects
Computational Biology/methods , Computer Graphics , Genomics/methods , Genetic Databases , Humans , Metabolic Networks and Pathways , Neoplasms/genetics
8.
Nat Methods ; 12(4): 281, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25825831
9.
IEEE Trans Vis Comput Graph ; 29(1): 504-514, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36155455

ABSTRACT

The trouble with data is that it frequently provides only an imperfect representation of a phenomenon of interest. Experts who are familiar with their datasets will often make implicit, mental corrections when analyzing a dataset, or will be cautious not to be overly confident about their findings if caveats are present. However, personal knowledge about the caveats of a dataset is typically not incorporated in a structured way, which is problematic if others who lack that knowledge interpret the data. In this work, we define such analysts' knowledge about datasets as data hunches. We differentiate data hunches from uncertainty and discuss types of hunches. We then explore ways of recording data hunches, and, based on a prototypical design, develop recommendations for designing visualizations that support data hunches. We conclude by discussing various challenges associated with data hunches, including the potential for harm and challenges for trust and privacy. We envision that data hunches will empower analysts to externalize their knowledge, facilitate collaboration and communication, and support the ability to learn from others' data hunches.

11.
IEEE Trans Vis Comput Graph ; 28(1): 248-258, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34587022

ABSTRACT

Which drug is most promising for a cancer patient? A new microscopy-based approach for measuring the mass of individual cancer cells treated with different drugs promises to answer this question in only a few hours. However, the analysis pipeline for extracting data from these images is still far from complete automation: human intervention is necessary for quality control for preprocessing steps such as segmentation, adjusting filters, removing noise, and analyzing the result. To address this workflow, we developed Loon, a visualization tool for analyzing drug screening data based on quantitative phase microscopy imaging. Loon visualizes both derived data such as growth rates and imaging data. Since the images are collected automatically at a large scale, manual inspection of images and segmentations is infeasible. However, reviewing representative samples of cells is essential, both for quality control and for data analysis. We introduce a new approach for choosing and visualizing representative exemplar cells that retain a close connection to the low-level data. By tightly integrating the derived data visualization capabilities with the novel exemplar visualization and providing selection and filtering capabilities, Loon is well suited for making decisions about which drugs are suitable for a specific patient.


Subjects
Computer Graphics , Microscopy , Automation , Humans , Computer-Assisted Image Processing
12.
IEEE Trans Big Data ; 7(3): 524-534, 2021 Jul.
Article in English | MEDLINE | ID: mdl-35693692

ABSTRACT

The size of large, geo-located datasets has reached scales where visualization of all data points is inefficient. Random sampling is a method to reduce the size of a dataset, yet it can introduce unwanted errors. We describe a method for subsampling of spatial data suitable for creating kernel density estimates from very large data and demonstrate that it results in less error than random sampling. We also introduce a method to ensure that thresholding of low values based on sampled data does not omit any regions above the desired threshold. We demonstrate the effectiveness of our approach using both artificial and real-world large geospatial datasets.

13.
IEEE Trans Vis Comput Graph ; 27(2): 1106-1116, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33048719

ABSTRACT

Design study is an established approach to conducting problem-driven visualization research. The academic visualization community has produced a large body of work for reporting on design studies, informed by a handful of theoretical frameworks, and applied to a broad range of application areas. The result is an abundance of reported insights into visualization design, with an emphasis on novel visualization techniques and systems as the primary contribution of these studies. In recent work, we proposed a new, interpretivist perspective on design study and six companion criteria for rigor that highlight the opportunities for researchers to contribute knowledge that extends beyond visualization idioms and software. In this work, we conducted a year-long collaboration with evolutionary biologists to develop an interactive tool for visual exploration of multivariate datasets and phylogenetic trees. During this design study, we experimented with methods to support three of the rigor criteria: ABUNDANT, REFLEXIVE, and TRANSPARENT. As a result, we contribute two novel visualization techniques for the analysis of multivariate phylogenetic datasets, three methodological recommendations for conducting design studies drawn from reflections over our process of experimentation, and two writing devices for reporting interpretivist design study. We offer this work as an example for implementing the rigor criteria to produce a diverse range of knowledge contributions.


Subjects
Computer Graphics , Software , Phylogeny , Research Design
14.
Bioinformatics ; 25(20): 2760-1, 2009 Oct 15.
Article in English | MEDLINE | ID: mdl-19620095

ABSTRACT

Understanding the relationships between pathways and the altered expression of their components in disease conditions can be addressed in a visual data analysis process. Caleydo uses novel visualization techniques to support life science experts in their analysis of gene expression data in the context of pathways and functions of individual genes. Pathways and gene expression visualizations are placed in a 3D scene where selected entities (i.e. genes) are visually connected. This allows Caleydo to seamlessly integrate interactive gene expression visualization with cross-database pathway exploration. AVAILABILITY: The Caleydo visualization framework is freely available at www.caleydo.org for non-commercial use. It runs on Windows and Linux and requires a 3D capable graphics card.


Subjects
Computational Biology/methods , Gene Expression , Software , Genetic Databases , Gene Expression Profiling/methods , User-Computer Interface
15.
IEEE Trans Vis Comput Graph ; 16(6): 1027-35, 2010.
Article in English | MEDLINE | ID: mdl-20975140

ABSTRACT

When analyzing multidimensional, quantitative data, the comparison of two or more groups of dimensions is a common task. Typical sources of such data are experiments in biology, physics or engineering, which are conducted in different configurations and use replicates to ensure statistically significant results. One common way to analyze this data is to filter it using statistical methods and then run clustering algorithms to group similar values. The clustering results can be visualized using heat maps, which show differences between groups as changes in color. However, in cases where groups of dimensions have an a priori meaning, it is not desirable to cluster all dimensions combined, since a clustering algorithm can fragment continuous blocks of records. Furthermore, identifying relevant elements in heat maps becomes more difficult as the number of dimensions increases. To aid in such situations, we have developed Matchmaker, a visualization technique that allows researchers to arbitrarily arrange and compare multiple groups of dimensions at the same time. We create separate groups of dimensions which can be clustered individually, and place them in an arrangement of heat maps reminiscent of parallel coordinates. To identify relations, we render bundled curves and ribbons between related records in different groups. We then allow interactive drill-downs using enlarged detail views of the data, which enable in-depth comparisons of clusters between groups. To reduce visual clutter, we minimize crossings between the views. This paper concludes with two case studies. The first demonstrates the value of our technique for the comparison of clustering algorithms. In the second, biologists use our system to investigate why certain strains of mice develop liver disease while others remain healthy, informally showing the efficacy of our system when analyzing multidimensional data containing distinct groups of dimensions.

16.
Gigascience ; 9(1), 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31972021

ABSTRACT

BACKGROUND: Metabolic networks represent all chemical reactions that occur between molecular metabolites in an organism's cells. They offer biological context in which to integrate, analyze, and interpret omic measurements, but their large scale and extensive connectivity present unique challenges. While it is practical to simplify these networks by placing constraints on compartments and hubs, it is unclear how these simplifications alter the structure of metabolic networks and the interpretation of metabolomic experiments. RESULTS: We curated and adapted the latest systemic model of human metabolism and developed customizable tools to define metabolic networks with and without compartmentalization in subcellular organelles and with or without inclusion of prolific metabolite hubs. Compartmentalization made networks larger, less dense, and more modular, whereas hubs made networks larger, more dense, and less modular. When present, these hubs also dominated shortest paths in the network, yet their exclusion exposed the subtler prominence of other metabolites that are typically more relevant to metabolomic experiments. We applied the non-compartmental network without metabolite hubs in a retrospective, exploratory analysis of metabolomic measurements from 5 studies on human tissues. Network clusters identified individual reactions that might experience differential regulation between experimental conditions, several of which were not apparent in the original publications. CONCLUSIONS: Exclusion of specific metabolite hubs exposes modularity in both compartmental and non-compartmental metabolic networks, improving detection of relevant clusters in omic measurements. Better computational detection of metabolic network clusters in large data sets has potential to identify differential regulation of individual genes, transcripts, and proteins.


Subjects
Computational Biology , Energy Metabolism , Metabolic Networks and Pathways , Metabolomics , Biological Models , Computational Biology/methods , Humans , Metabolomics/methods , Software , User-Computer Interface , Web Browser
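
The effect of hub exclusion described in the metabolic-network abstract above can be illustrated on a toy graph. This is a minimal sketch and not the authors' tooling: it assumes a simple undirected metabolite graph, treats any node whose degree exceeds a chosen cutoff as a hub, and uses the standard density 2m / (n(n - 1)) computed over nodes that still have at least one edge. The metabolite names, the cutoff, and the function names are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def remove_hubs(edges, max_degree):
    """Drop every node whose degree exceeds max_degree, plus its edges."""
    deg = degrees(edges)
    hubs = {node for node, d in deg.items() if d > max_degree}
    kept = [(u, v) for u, v in edges if u not in hubs and v not in hubs]
    return kept, hubs


def density(edges):
    """Density 2m / (n(n - 1)) over the nodes that appear in some edge."""
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))


# Hypothetical toy network: ATP is linked to every other metabolite.
full = [
    ("glucose", "g6p"),
    ("pyruvate", "lactate"),
    ("ATP", "glucose"),
    ("ATP", "g6p"),
    ("ATP", "pyruvate"),
    ("ATP", "lactate"),
]

pruned, hubs = remove_hubs(full, max_degree=3)

if __name__ == "__main__":
    print("hubs removed:", sorted(hubs))
    print("density with hubs:   ", round(density(full), 3))
    print("density without hubs:", round(density(pruned), 3))
```

In this toy case, removing the single hub leaves two separate metabolite pairs and a lower density, which matches the direction reported in the abstract. The real networks and hub definitions in the paper may differ from this simplification.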
17.
IEEE Trans Vis Comput Graph ; 25(3): 1543-1558, 2019 Mar.
Article in English | MEDLINE | ID: mdl-29993603

ABSTRACT

The majority of diseases that are a significant challenge for public and individual health are caused by a combination of hereditary and environmental factors. In this paper, we introduce Lineage, a novel visual analysis tool designed to support domain experts who study such multifactorial diseases in the context of genealogies. Incorporating familial relationships between cases with other data can provide insights into shared genomic variants and shared environmental exposures that may be implicated in such diseases. We introduce a data and task abstraction, and argue that the problem of analyzing such diseases based on genealogical, clinical, and genetic data can be mapped to a multivariate graph visualization problem. The main contribution of our design study is a novel visual representation for tree-like, multivariate graphs, which we apply to genealogies and clinical data about the individuals in these families. We introduce data-driven aggregation methods to scale to multiple families. By designing the genealogy graph layout to align with a tabular view, we are able to incorporate extensive, multivariate attributes in the analysis of the genealogy without cluttering the graph. We validate our designs by conducting case studies with our domain collaborators.


Subjects
Computer Graphics , Disease/genetics , Genomics/methods , Pedigree , Algorithms , Genetic Databases , Female , Genealogy and Heraldry , Humans , Male
18.
Appl Clin Inform ; 10(2): 278-285, 2019 Mar.
Article in English | MEDLINE | ID: mdl-31018234

ABSTRACT

OBJECTIVE: Visual cohort analysis utilizing electronic health record data has become an important tool in clinical assessment of patient outcomes. In this article, we introduce Composer, a visual analysis tool for orthopedic surgeons to compare changes in physical functions of a patient cohort following various spinal procedures. The goal of our project is to help researchers analyze outcomes of procedures and facilitate informed decision-making about treatment options between patient and clinician. METHODS: In collaboration with orthopedic surgeons and researchers, we defined domain-specific user requirements to inform the design. We developed the tool in an iterative process with our collaborators to develop and refine functionality. With Composer, analysts can dynamically define a patient cohort using demographic information, clinical parameters, and events in patient medical histories and then analyze patient-reported outcome scores for the cohort over time, as well as compare it to other cohorts. Using Composer's current iteration, we provide a usage scenario for use of the tool in a clinical setting. CONCLUSION: We have developed a prototype cohort analysis tool to help clinicians assess patient treatment options by analyzing prior cases with similar characteristics. Although Composer was designed using patient data specific to orthopedic research, we believe the tool is generalizable to other healthcare domains. A long-term goal for Composer is to develop the application into a shared decision-making tool that allows translation of comparison and analysis from a clinician-facing interface into visual representations to communicate treatment options to patients.


Subjects
Cohort Studies , Electronic Health Records , User-Computer Interface , Humans , Treatment Outcome
19.
Article in English | MEDLINE | ID: mdl-30188828

ABSTRACT

Analyzing large, multivariate graphs is an important problem in many domains, yet such graphs are challenging to visualize. In this paper, we introduce a novel, scalable, tree+table multivariate graph visualization technique, which makes many tasks related to multivariate graph analysis easier to achieve. The core principle we follow is to selectively query for nodes or subgraphs of interest and visualize these subgraphs as a spanning tree of the graph. The tree is laid out linearly, which enables us to juxtapose the nodes with a table visualization where diverse attributes can be shown. We also use this table as an adjacency matrix, so that the resulting technique is a hybrid node-link/adjacency matrix technique. We implement this concept in Juniper and complement it with a set of interaction techniques that enable analysts to dynamically grow, restructure, and aggregate the tree, as well as change the layout or show paths between nodes. We demonstrate the utility of our tool in usage scenarios for different multivariate networks: a bipartite network of scholars, papers, and citation metrics and a multitype network of story characters, places, books, etc.
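
The tree+table principle in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not Juniper's implementation: it assumes the queried subgraph is given as an adjacency dictionary, builds the spanning tree by breadth-first search from a chosen root, and lays the tree out linearly in depth-first order so that each node occupies one row that a table could sit beside. The scholar and paper names and the function names are hypothetical.

```python
from collections import deque


def bfs_spanning_tree(adjacency, root):
    """Return a parent map describing a BFS spanning tree rooted at root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adjacency[node]):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent


def linear_layout(parent, root):
    """List (node, depth) rows in depth-first order, one row per node."""
    children = {}
    for node, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(node)
    rows = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        rows.append((node, depth))
        # Push in reverse so children are visited in alphabetical order.
        for child in sorted(children.get(node, []), reverse=True):
            stack.append((child, depth + 1))
    return rows


# Hypothetical bipartite subgraph of scholars and papers.
adjacency = {
    "Ana": {"Paper1", "Paper2"},
    "Ben": {"Paper1", "Paper2"},
    "Paper1": {"Ana", "Ben"},
    "Paper2": {"Ana", "Ben"},
}

rows = linear_layout(bfs_spanning_tree(adjacency, "Ana"), "Ana")

if __name__ == "__main__":
    for node, depth in rows:
        print("  " * depth + node)
```

The edge between Ben and Paper2 is not part of the spanning tree. In the technique the abstract describes, edges like this are the ones the adjacency-matrix columns of the table can show, while the remaining table columns hold node attributes.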

20.
IEEE Trans Vis Comput Graph ; 22(1): 399-408, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26529712

ABSTRACT

Alternative splicing is a process by which the same DNA sequence is used to assemble different proteins, called protein isoforms. Alternative splicing works by selectively omitting some of the coding regions (exons) typically associated with a gene. Detection of alternative splicing is difficult and uses a combination of advanced data acquisition methods and statistical inference. Knowledge about the abundance of isoforms is important for understanding both normal processes and diseases and to eventually improve treatment through targeted therapies. The data, however, is complex and current visualizations for isoforms are neither perceptually efficient nor scalable. To remedy this, we developed Vials, a novel visual analysis tool that enables analysts to explore the various datasets that scientists use to make judgments about isoforms: the abundance of reads associated with the coding regions of the gene, evidence for junctions, i.e., edges connecting the coding regions, and predictions of isoform frequencies. Vials is scalable as it allows for the simultaneous analysis of many samples in multiple groups. Our tool thus enables experts to (a) identify patterns of isoform abundance in groups of samples and (b) evaluate the quality of the data. We demonstrate the value of our tool in case studies using publicly available datasets.


Subjects
Alternative Splicing/genetics , Computer Graphics , Genomics/methods , Genetic Models