Results 1 - 20 of 83
1.
Cell ; 153(4): 919-29, 2013 May 09.
Article in English | MEDLINE | ID: mdl-23663786

ABSTRACT

Identification of somatic rearrangements in cancer genomes has accelerated through analysis of high-throughput sequencing data. However, characterization of complex structural alterations and their underlying mechanisms remains inadequate. Here, applying an algorithm to predict structural variations from short reads, we report a comprehensive catalog of somatic structural variations and the mechanisms generating them, using high-coverage whole-genome sequencing data from 140 patients across ten tumor types. We characterize the relative contributions of different types of rearrangements and their mutational mechanisms, find that ~20% of the somatic deletions are complex deletions formed by replication errors, and describe the differences between the mutational mechanisms in somatic and germline alterations. Importantly, we provide detailed reconstructions of the events responsible for loss of CDKN2A/B and gain of EGFR in glioblastoma, revealing that these alterations can result from multiple mechanisms even in a single genome and that both DNA double-strand breaks and replication errors drive somatic rearrangements.


Subject(s)
Algorithms , Genome, Human , Mutation , Neoplasms/genetics , Chromosome Aberrations , Genome-Wide Association Study , Glioblastoma/genetics , Humans , Neoplasms/pathology
2.
Nat Methods ; 20(8): 1174-1178, 2023 08.
Article in English | MEDLINE | ID: mdl-37468619

ABSTRACT

Multiplexed antibody-based imaging enables the detailed characterization of molecular and cellular organization in tissues. Advances in the field now allow high-parameter data collection (>60 targets); however, considerable expertise and capital are needed to construct the antibody panels employed by these methods. Organ mapping antibody panels are community-validated resources that save time and money, increase reproducibility, accelerate discovery and support the construction of a Human Reference Atlas.


Subject(s)
Antibodies , Community Resources , Humans , Reproducibility of Results , Diagnostic Imaging
3.
Nucleic Acids Res ; 52(D1): D61-D66, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971305

ABSTRACT

The Cistrome Data Browser is a resource of ChIP-seq, ATAC-seq and DNase-seq data from humans and mice. It provides maps of the genome-wide locations of transcription factors, cofactors, chromatin remodelers, histone post-translational modifications and regions of chromatin accessible to endonuclease activity. Cistrome DB v3.0 contains approximately 45 000 human and 44 000 mouse samples with about 32 000 newly collected datasets compared to the previous release. The Cistrome DB v3.0 user interface is implemented as a single page application that unifies menu driven and data driven search functions and provides an embedded genome browser, which allows users to find and visualize data more effectively. Users can find informative chromatin profiles through keyword, menu, and data-driven search tools. Browser search functions can predict the regulators of query genes as well as the cell type and factor dependent functionality of potential cis-regulatory elements. Cistrome DB v3.0 expands the display of quality control statistics, incorporates sequence logos into motif enrichment displays and includes more expansive sample metadata. Cistrome DB v3.0 is available at http://db3.cistrome.org/browser.


Subject(s)
Chromatin , Databases, Protein , Genomics , Software , Animals , Humans , Mice , Chromatin/genetics , Histones/genetics , Histones/metabolism , Sequence Analysis, DNA , Transcription Factors/genetics , Transcription Factors/metabolism , Data Visualization , Internet , Genomics/methods
4.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36688709

ABSTRACT

SUMMARY: Gos is a declarative Python library designed to create interactive multiscale visualizations of genomics and epigenomics data. It provides a consistent and simple interface to the flexible Gosling visualization grammar. Gos hides technical complexities involved with configuring web-based genome browsers and integrates seamlessly within computational notebooks environments to enable new interactive analysis workflows. AVAILABILITY AND IMPLEMENTATION: Gos is released under the MIT License and available on the Python Package Index (PyPI). The source code is publicly available on GitHub (https://github.com/gosling-lang/gos), and documentation with examples can be found at https://gosling-lang.github.io/gos.


Subject(s)
Computational Biology , Geese , Animals , Genomics , Genome , Gene Library , Software
5.
Bioinformatics ; 39(2), 2023 02 03.
Article in English | MEDLINE | ID: mdl-36688700

ABSTRACT

SUMMARY: The regulation of genes by cis-regulatory elements (CREs) is complex and differs between cell types. Visual analysis of large collections of chromatin profiles across diverse cell types, integrated with computational methods, can reveal meaningful biological insights. We developed Cistrome Explorer, a web-based interactive visual analytics tool for exploring thousands of chromatin profiles in diverse cell types. Integrated with the Cistrome Data Browser database which contains thousands of ChIP-seq, DNase-seq and ATAC-seq samples, Cistrome Explorer enables the discovery of patterns of CREs across cell types and the identification of transcription factor binding underlying these patterns. AVAILABILITY AND IMPLEMENTATION: Cistrome Explorer and its source code are available at http://cisvis.gehlenborglab.org/ and released under the MIT License. Documentation can be accessed via http://cisvis.gehlenborglab.org/docs/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromatin , Epigenomics , Sequence Analysis, DNA , Chromatin Immunoprecipitation Sequencing , Software , Databases, Genetic
6.
Bioinformatics ; 37(Suppl_1): i59-i66, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252935

ABSTRACT

MOTIVATION: Molecular profiling of patient tumors and liquid biopsies over time with next-generation sequencing technologies and new immuno-profile assays are becoming part of standard research and clinical practice. With the wealth of new longitudinal data, there is a critical need for visualizations for cancer researchers to explore and interpret temporal patterns not just in a single patient but across cohorts. RESULTS: To address this need we developed OncoThreads, a tool for the visualization of longitudinal clinical and cancer genomics and other molecular data in patient cohorts. The tool visualizes patient cohorts as temporal heatmaps and Sankey diagrams that support the interactive exploration and ranking of a wide range of clinical and molecular features. This allows analysts to discover temporal patterns in longitudinal data, such as the impact of mutations on response to a treatment, for example, emergence of resistant clones. We demonstrate the functionality of OncoThreads using a cohort of 23 glioma patients sampled at 2-4 timepoints. AVAILABILITY AND IMPLEMENTATION: Freely available at http://oncothreads.gehlenborglab.org. Implemented in JavaScript using the cBioPortal web API as a backend. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Biochemical Phenomena , Neoplasms , Genomics , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Software
9.
J Med Internet Res ; 23(10): e31400, 2021 10 11.
Article in English | MEDLINE | ID: mdl-34533459

ABSTRACT

BACKGROUND: Many countries have experienced 2 predominant waves of COVID-19-related hospitalizations. Comparing the clinical trajectories of patients hospitalized in separate waves of the pandemic enables further understanding of the evolving epidemiology, pathophysiology, and health care dynamics of the COVID-19 pandemic. OBJECTIVE: In this retrospective cohort study, we analyzed electronic health record (EHR) data from patients with SARS-CoV-2 infections hospitalized in participating health care systems representing 315 hospitals across 6 countries. We compared hospitalization rates, severe COVID-19 risk, and mean laboratory values between patients hospitalized during the first and second waves of the pandemic. METHODS: Using a federated approach, each participating health care system extracted patient-level clinical data on their first and second wave cohorts and submitted aggregated data to the central site. Data quality control steps were adopted at the central site to correct for implausible values and harmonize units. Statistical analyses were performed by computing individual health care system effect sizes and synthesizing these using random effect meta-analyses to account for heterogeneity. We focused the laboratory analysis on C-reactive protein (CRP), ferritin, fibrinogen, procalcitonin, D-dimer, and creatinine based on their reported associations with severe COVID-19. RESULTS: Data were available for 79,613 patients, of which 32,467 were hospitalized in the first wave and 47,146 in the second wave. The prevalence of male patients and patients aged 50 to 69 years decreased significantly between the first and second waves. Patients hospitalized in the second wave had a 9.9% reduction in the risk of severe COVID-19 compared to patients hospitalized in the first wave (95% CI 8.5%-11.3%). Demographic subgroup analyses indicated that patients aged 26 to 49 years and 50 to 69 years; male and female patients; and black patients had significantly lower risk for severe disease in the second wave than in the first wave. At admission, the mean values of CRP were significantly lower in the second wave than in the first wave. On the seventh hospital day, the mean values of CRP, ferritin, fibrinogen, and procalcitonin were significantly lower in the second wave than in the first wave. In general, countries exhibited variable changes in laboratory testing rates from the first to the second wave. At admission, there was a significantly higher testing rate for D-dimer in France, Germany, and Spain. CONCLUSIONS: Patients hospitalized in the second wave were at significantly lower risk for severe COVID-19. This corresponded to mean laboratory values in the second wave that were more likely to be in typical physiological ranges on the seventh hospital day compared to the first wave. Our federated approach demonstrated the feasibility and power of harmonizing heterogeneous EHR data from multiple international health care systems to rapidly conduct large-scale studies to characterize how COVID-19 clinical trajectories evolve.


Subject(s)
COVID-19 , Pandemics , Adult , Aged , Female , Hospitalization , Hospitals , Humans , Male , Middle Aged , Retrospective Studies , SARS-CoV-2
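
Note on entry 9: the abstract says that per-site effect sizes were synthesized with random-effects meta-analyses, but it does not name the estimator. The sketch below assumes the widely used DerSimonian-Laird estimator; the function name and its inputs (one effect size and one sampling variance per health care system) are hypothetical and chosen only for illustration, not taken from the study's code.

```python
from math import sqrt


def dersimonian_laird(effects, variances):
    """Pool per-site effect sizes with a random-effects model.

    Assumption: DerSimonian-Laird estimate of the between-site
    variance (tau^2); the source abstract does not specify this.
    Returns (pooled_effect, standard_error, tau_squared).
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total_w - sum(w * w for w in weights) / total_w

    # Between-site variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each site's variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    total_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / total_re
    return pooled, sqrt(1.0 / total_re), tau2
```

Because each site contributes only an effect size and its variance, a calculation of this kind fits the federated setup described in the abstract, where sites submit aggregated rather than patient-level data.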
10.
J Med Internet Res ; 23(3): e22219, 2021 03 02.
Article in English | MEDLINE | ID: mdl-33600347

ABSTRACT

Coincident with the tsunami of COVID-19-related publications, there has been a surge of studies using real-world data, including those obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns regarding the soundness and quality of the studies and the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify strengths and flaws in EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack the framework to fully critically appraise these studies. In addition, conventional statistical analyses cannot overcome the need for an understanding of the opportunities and limitations of EHR-derived studies. We distill here from the broader informatics literature six key considerations that are crucial for appraising studies utilizing EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified, textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and the multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders as to the recommended best practices in reviewing manuscripts, grants, and other outputs from EHR-data derived studies, and thereby promote and foster rigor, quality, and reliability of this rapidly growing field.


Subject(s)
COVID-19/epidemiology , Data Collection/methods , Electronic Health Records , Data Collection/standards , Humans , Peer Review, Research/standards , Publishing/standards , Reproducibility of Results , SARS-CoV-2/isolation & purification
13.
Nature ; 512(7515): 449-52, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164756

ABSTRACT

Genome function is dynamically regulated in part by chromatin, which consists of the histones, non-histone proteins and RNA molecules that package DNA. Studies in Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular mechanisms of genome function in humans, and have revealed conservation of chromatin components and mechanisms. Nevertheless, the three organisms have markedly different genome sizes, chromosome architecture and gene organization. On human and fly chromosomes, for example, pericentric heterochromatin flanks single centromeres, whereas worm chromosomes have dispersed heterochromatin-like regions enriched in the distal chromosomal 'arms', and centromeres distributed along their lengths. To systematically investigate chromatin organization and associated gene regulation across species, we generated and analysed a large collection of genome-wide chromatin data sets from cell lines and developmental stages in worm, fly and human. Here we present over 800 new data sets from our ENCODE and modENCODE consortia, bringing the total to over 1,400. Comparison of combinatorial patterns of histone modifications, nuclear lamina-associated domains, organization of large-scale topological domains, chromatin environment at promoters and enhancers, nucleosome positioning, and DNA replication patterns reveals many conserved features of chromatin organization among the three organisms. We also find notable differences in the composition and locations of repressive chromatin. These data sets and analyses provide a rich resource for comparative and species-specific investigations of chromatin composition, organization and function.


Subject(s)
Caenorhabditis elegans/cytology , Caenorhabditis elegans/genetics , Chromatin/genetics , Chromatin/metabolism , Drosophila melanogaster/cytology , Drosophila melanogaster/genetics , Animals , Cell Line , Centromere/genetics , Centromere/metabolism , Chromatin/chemistry , Chromatin Assembly and Disassembly/genetics , DNA Replication/genetics , Enhancer Elements, Genetic/genetics , Epigenesis, Genetic , Heterochromatin/chemistry , Heterochromatin/genetics , Heterochromatin/metabolism , Histones/chemistry , Histones/metabolism , Humans , Molecular Sequence Annotation , Nuclear Lamina/metabolism , Nucleosomes/chemistry , Nucleosomes/genetics , Nucleosomes/metabolism , Promoter Regions, Genetic/genetics , Species Specificity
14.
Bioinformatics ; 34(7): 1200-1207, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29186292

ABSTRACT

Motivation: The ever-increasing number of biomedical datasets provides tremendous opportunities for re-use but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating datasets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find datasets of interest. Results: We developed SATORI, an integrative search and visual exploration interface for the exploration of biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. Availability and implementation: SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform. Contact: nils@hms.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Biological Ontologies , Computational Biology/methods , Metadata , Software , Animals , Humans , Internet , Semantics
15.
Bioinformatics ; 33(18): 2938-2940, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-28645171

ABSTRACT

MOTIVATION: Venn and Euler diagrams are a popular yet inadequate solution for quantitative visualization of set intersections. A scalable alternative to Venn and Euler diagrams for visualizing intersecting sets and their properties is needed. RESULTS: We developed UpSetR, an open source R package that employs a scalable matrix-based visualization to show intersections of sets, their size, and other properties. AVAILABILITY AND IMPLEMENTATION: UpSetR is available at https://github.com/hms-dbmi/UpSetR/ and released under the MIT License. A Shiny app is available at https://gehlenborglab.shinyapps.io/upsetr/. CONTACT: nils@hms.harvard.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Software , Genotyping Techniques/methods , Sequence Analysis, DNA/methods
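
Note on entry 15: a matrix-based view of set intersections needs, for each combination of sets, the number of elements that belong to exactly that combination. The sketch below illustrates that counting step in Python. It is not UpSetR's R implementation, and the function name and input shape (a dict mapping set names to sets) are hypothetical.

```python
def exclusive_intersections(named_sets):
    """Count elements by the exact combination of sets they belong to.

    named_sets maps a set name to a Python set of elements.
    Returns a dict mapping frozenset(set names) to an element count.
    """
    # Record, for every element, the names of all sets that contain it.
    membership = {}
    for name, items in named_sets.items():
        for item in items:
            membership.setdefault(item, set()).add(name)

    # Each distinct membership pattern is one column of the matrix view.
    counts = {}
    for names in membership.values():
        key = frozenset(names)
        counts[key] = counts.get(key, 0) + 1
    return counts


example = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
sizes = exclusive_intersections(example)
```

Sorting the resulting dictionary by count gives the bar ordering typically shown above the membership matrix; the work grows with the number of elements rather than with the number of possible set combinations.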
16.
BMC Bioinformatics ; 18(1): 406, 2017 Sep 12.
Article in English | MEDLINE | ID: mdl-28899361

ABSTRACT

BACKGROUND: With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. RESULTS: In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. CONCLUSIONS: Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.


Subject(s)
Algorithms , User-Computer Interface , Cluster Analysis , Genotype , Humans , Internet , Neoplasms/classification , Neoplasms/genetics , Neoplasms/pathology , Phenotype
17.
Proc Natl Acad Sci U S A ; 111(43): 15544-9, 2014 Oct 28.
Article in English | MEDLINE | ID: mdl-25313082

ABSTRACT

Previous studies have established that a subset of head and neck tumors contains human papillomavirus (HPV) sequences and that HPV-driven head and neck cancers display distinct biological and clinical features. HPV is known to drive cancer by the actions of the E6 and E7 oncoproteins, but the molecular architecture of HPV infection and its interaction with the host genome in head and neck cancers have not been comprehensively described. We profiled a cohort of 279 head and neck cancers with next generation RNA and DNA sequencing and show that 35 (12.5%) tumors displayed evidence of high-risk HPV types 16, 33, or 35. Twenty-five cases had integration of the viral genome into one or more locations in the human genome with statistical enrichment for genic regions. Integrations had a marked impact on the human genome and were associated with alterations in DNA copy number, mRNA transcript abundance and splicing, and both inter- and intrachromosomal rearrangements. Many of these events involved genes with documented roles in cancer. Cancers with integrated vs. nonintegrated HPV displayed different patterns of DNA methylation and both human and viral gene expressions. Together, these data provide insight into the mechanisms by which HPV interacts with the human genome beyond expression of viral oncoproteins and suggest that specific integration events are an integral component of viral oncogenesis.


Subject(s)
Genome, Human/genetics , Head and Neck Neoplasms/genetics , Head and Neck Neoplasms/virology , Host-Pathogen Interactions/genetics , Papillomaviridae/physiology , Base Sequence , DNA Methylation/genetics , Gene Expression Regulation, Neoplastic , Genes, Neoplasm , Humans , Molecular Sequence Data , Virus Integration/genetics
18.
Bioinformatics ; 29(8): 1089-91, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23419376

ABSTRACT

SUMMARY: We have developed Nozzle, an R package that provides an Application Programming Interface to generate HTML reports with dynamic user interface elements. Nozzle was designed to facilitate summarization and rapid browsing of complex results in data analysis pipelines where multiple analyses are performed frequently on big datasets. The package can be applied to any project where user-friendly reports need to be created. AVAILABILITY: The R package is available on CRAN at http://cran.r-project.org/package=Nozzle.R1. Examples and additional materials are available at http://gdac.broadinstitute.org/nozzle. The source code is also available at http://www.github.com/parklab/Nozzle. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Computational Biology/methods , Genomics , Humans , Neoplasms/genetics , Programming Languages , User-Computer Interface , Workflow
19.
Article in English | MEDLINE | ID: mdl-39255153

ABSTRACT

Projecting high-dimensional vectors into two dimensions for visualization, known as embedding visualization, facilitates perceptual reasoning and interpretation. Comparing multiple embedding visualizations drives decision-making in many domains, but traditional comparison methods are limited by a reliance on direct point correspondences. This requirement precludes comparisons without point correspondences, such as two different datasets of annotated images, and fails to capture meaningful higher-level relationships among point groups. To address these shortcomings, we propose a general framework for comparing embedding visualizations based on shared class labels rather than individual points. Our approach partitions points into regions corresponding to three key class concepts (confusion, neighborhood, and relative size) to characterize intra- and inter-class relationships. Informed by a preliminary user study, we implemented our framework using perceptual neighborhood graphs to define these regions and introduced metrics to quantify each concept. We demonstrate the generality of our framework with usage scenarios from machine learning and single-cell biology, highlighting our metrics' ability to draw insightful comparisons across label hierarchies. To assess the effectiveness of our approach, we conducted an evaluation study with five machine learning researchers and six single-cell biologists using an interactive and scalable prototype built with Python, JavaScript, and Rust. Our metrics enable more structured comparisons through visual guidance and increased participants' confidence in their findings.

20.
Article in English | MEDLINE | ID: mdl-39288066

ABSTRACT

Genomics experts rely on visualization to extract and share insights from complex and large-scale datasets. Beyond off-the-shelf tools for data exploration, there is an increasing need for platforms that aid experts in authoring customized visualizations for both exploration and communication of insights. A variety of interactive techniques have been proposed for authoring data visualizations, such as template editing, shelf configuration, natural language input, and code editors. However, it remains unclear how genomics experts create visualizations and which techniques best support their visualization tasks and needs. To address this gap, we conducted two user studies with genomics researchers: (1) semi-structured interviews (n = 20) to identify the tasks, user contexts, and current visualization authoring techniques and (2) an exploratory study (n = 13) using visual probes to elicit users' intents and desired techniques when creating visualizations. Our contributions include (1) a characterization of how visualization authoring is currently utilized in genomics visualization, identifying limitations and benefits in light of common criteria for authoring tools, and (2) generalizable design implications for genomics visualization authoring tools based on our findings on task- and user-specific usefulness of authoring techniques. All supplemental materials are available at https://osf.io/bdj4v/.
