ABSTRACT
Spatially resolved omics technologies are transforming our understanding of biological tissues. However, the handling of uni- and multimodal spatial omics datasets remains a challenge owing to large data volumes, heterogeneity of data types and the lack of flexible, spatially aware data structures. Here we introduce SpatialData, a framework that establishes a unified and extensible multiplatform file format, lazy representation of larger-than-memory data, transformations and alignment to common coordinate systems. SpatialData facilitates spatial annotations and cross-modal aggregation and analysis, the utility of which is illustrated in the context of multiple vignettes, including integrative analysis on a multimodal Xenium and Visium breast cancer study.
ABSTRACT
Spatial omics data are advancing the study of tissue organization and cellular communication at an unprecedented scale. Flexible tools are required to store, integrate and visualize the large diversity of spatial omics data. Here, we present Squidpy, a Python framework that brings together tools from omics and image analysis to enable scalable description of spatial molecular data, such as transcriptome or multivariate proteins. Squidpy provides efficient infrastructure and numerous analysis methods for storing, manipulating and interactively visualizing spatial omics data. Squidpy is extensible and can be interfaced with a variety of already existing libraries for the scalable analysis of spatial omics data.
Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Proteomics/methods , Software , Animals , Data Visualization , Databases, Factual , Humans , Image Processing, Computer-Assisted , Mice , Programming Languages , Workflow
ABSTRACT
SUMMARY: Spatial omics technologies are increasingly leveraged to characterize how disease disrupts tissue organization and cellular niches. While multiple methods to analyze spatial variation within a sample have been published, statistical and computational approaches to compare cell spatial organization across samples or conditions are mostly lacking. We present GraphCompass, a comprehensive set of omics-adapted graph analysis methods to quantitatively evaluate and compare the spatial arrangement of cells in samples representing diverse biological conditions. GraphCompass builds upon the Squidpy spatial omics toolbox and encompasses various statistical approaches to perform cross-condition analyses at the level of individual cell types, niches, and samples. Additionally, GraphCompass provides custom visualization functions that enable effective communication of results. We demonstrate how GraphCompass can be used to address key biological questions, such as how cellular organization and tissue architecture differ across various disease states and which spatial patterns correlate with a given pathological condition. GraphCompass can be applied to various popular omics techniques, including, but not limited to, spatial proteomics (e.g. MIBI-TOF), spot-based transcriptomics (e.g. 10× Genomics Visium), and single-cell resolved transcriptomics (e.g. Stereo-seq). In this work, we showcase the capabilities of GraphCompass through its application to three different studies that may also serve as benchmark datasets for further method development. With its easy-to-use implementation, extensive documentation, and comprehensive tutorials, GraphCompass is accessible to biologists with varying levels of computational expertise. By facilitating comparative analyses of cell spatial organization, GraphCompass promises to be a valuable asset in advancing our understanding of tissue function in health and disease.
Subject(s)
Software , Humans , Proteomics/methods , Computational Biology/methods , Genomics/methods , Animals , Transcriptome , Single-Cell Analysis/methods
ABSTRACT
Enhancers play a vital role in gene regulation and are critical in mediating the impact of noncoding genetic variants associated with complex traits. Enhancer activity is a cell-type-specific process regulated by transcription factors (TFs), epigenetic mechanisms and genetic variants. Despite the strong mechanistic link between TFs and enhancers, we currently lack a framework for jointly analysing them in cell-type-specific gene regulatory networks (GRN). Equally important, we lack an unbiased way of assessing the biological significance of inferred GRNs since no complete ground truth exists. To address these gaps, we present GRaNIE (Gene Regulatory Network Inference including Enhancers) and GRaNPA (Gene Regulatory Network Performance Analysis). GRaNIE (https://git.embl.de/grp-zaugg/GRaNIE) builds enhancer-mediated GRNs based on covariation of chromatin accessibility and RNA-seq across samples (e.g. individuals), while GRaNPA (https://git.embl.de/grp-zaugg/GRaNPA) assesses the performance of GRNs for predicting cell-type-specific differential expression. We demonstrate their power by investigating gene regulatory mechanisms underlying the response of macrophages to infection, cancer and common genetic traits including autoimmune diseases. Finally, our methods identify the TF PURA as a putative regulator of pro-inflammatory macrophage polarisation.
Subject(s)
Gene Regulatory Networks , Neoplasms , Humans , Gene Expression Regulation , Transcription Factors/genetics , Transcription Factors/metabolism , Chromatin , Neoplasms/genetics , Enhancer Elements, Genetic/genetics
ABSTRACT
A growing community is constructing a next-generation file format (NGFF) for bioimaging to overcome problems of scalability and heterogeneity. Organized by the Open Microscopy Environment (OME), individuals and institutes across diverse modalities facing these problems have designed a format specification process (OME-NGFF) to address these needs. This paper brings together a wide range of those community members to describe the cloud-optimized format itself, OME-Zarr, along with tools and data resources available today to increase FAIR access and remove barriers in the scientific process. The current momentum offers an opportunity to unify a key component of the bioimaging domain: the file format that underlies so many personal, institutional, and global data management and analysis tasks.
Subject(s)
Microscopy , Software , Humans , Community Support
ABSTRACT
With the growing number of single-cell analysis tools, benchmarks are increasingly important to guide analysis and method development. However, a lack of standardisation and extensibility in current benchmarks limits their usability, longevity, and relevance to the community. We present Open Problems, a living, extensible, community-guided benchmarking platform including 10 current single-cell tasks that we envision will raise standards for the selection, evaluation, and development of methods in single-cell analysis.
ABSTRACT
Gastruloids are 3D structures generated from pluripotent stem cells recapitulating fundamental principles of embryonic pattern formation. Using single-cell genomic analysis, we provide a resource mapping cell states and types during gastruloid development and compare them with the in vivo embryo. We developed a high-throughput handling and imaging pipeline to spatially monitor symmetry breaking during gastruloid development and report an early spatial variability in pluripotency determining a binary response to Wnt activation. Although cells in the gastruloid core revert to pluripotency, peripheral cells become primitive streak-like. These two populations subsequently break radial symmetry and initiate axial elongation. By performing a compound screen, perturbing thousands of gastruloids, we derive a phenotypic landscape and infer networks of genetic interactions. Finally, using a dual Wnt modulation, we improve the formation of anterior structures in the existing gastruloid model. This work provides a resource to understand how gastruloids develop and generate complex patterns in vitro.
Subject(s)
Embryo, Mammalian , Pluripotent Stem Cells , Mice , Animals , Embryo, Mammalian/metabolism , Primitive Streak/metabolism , Embryonic Development
ABSTRACT
Methods for profiling RNA and protein expression in a spatially resolved manner are rapidly evolving, making it possible to comprehensively characterize cells and tissues in health and disease. To maximize the biological insights obtained using these techniques, it is critical to both clearly articulate the key biological questions in spatial analysis of tissues and develop the requisite computational tools to address them. Developers of analytical tools need to decide on the intrinsic molecular features of each cell that need to be considered, and how cell shape and morphological features are incorporated into the analysis. Also, optimal ways to compare different tissue samples at various length scales are still being sought. Grouping these biological problems and related computational algorithms into classes across length scales, thus characterizing common issues that need to be addressed, will facilitate further progress in spatial transcriptomics and proteomics.
Subject(s)
Proteomics , Transcriptome , Algorithms , Computational Biology/methods , Spatial Analysis , Transcriptome/genetics
ABSTRACT
Latent factor modeling applied to single-cell RNA sequencing (scRNA-seq) data is a useful approach to discover gene signatures. However, it is often unclear which methods are best suited for specific tasks and how latent factors should be interpreted. Here, we compare four state-of-the-art methods and propose an approach to assign derived latent factors to pathway activities and specific cell subsets. By applying this framework to scRNA-seq datasets from biopsies of patients with rheumatoid arthritis and systemic lupus erythematosus, we discover disease-relevant gene signatures in specific cellular subsets. In rheumatoid arthritis, we identify an inflammatory OSMR signaling signature active in a subset of synovial fibroblasts and an efferocytic signature in a subset of synovial monocytes. Overall, we provide insights into latent factor models for the analysis of scRNA-seq data, develop a framework to identify cell subtypes in a phenotype-driven way, and use it to identify novel pathways dysregulated in rheumatoid arthritis.
ABSTRACT
Transcription factors (TFs) regulate many cellular processes and can therefore serve as readouts of the signaling and regulatory state. Yet for many TFs, the mode of action (repressing or activating transcription of target genes) is unclear. Here, we present diffTF (https://git.embl.de/grp-zaugg/diffTF) to calculate differential TF activity (basic mode) and classify TFs into putative transcriptional activators or repressors (classification mode). In basic mode, it combines genome-wide chromatin accessibility/activity with putative TF binding sites that, in classification mode, are integrated with RNA-seq. We apply diffTF to compare (1) mutated and unmutated chronic lymphocytic leukemia patients and (2) two hematopoietic progenitor cell types. In both datasets, diffTF recovers most known biology and finds many previously unreported TFs. It classifies almost 40% of TFs based on their mode of action, which we validate experimentally. Overall, we demonstrate that diffTF recovers known biology, identifies less well-characterized TFs, and classifies TFs into transcriptional activators or repressors.