ABSTRACT
The main goals and challenges for the life science communities in the Open Science framework are to increase reuse and sustainability of data resources, software tools, and workflows, especially in large-scale data-driven research and computational analyses. Here, we present key findings, procedures, effective measures and recommendations for generating and establishing sustainable life science resources based on the collaborative, cross-disciplinary work done within the EOSC-Life (European Open Science Cloud for Life Sciences) consortium. Bringing together 13 European life science research infrastructures, EOSC-Life has laid the foundation for an open, digital space to support biological and medical research. Using lessons learned from 27 selected projects, we describe the organisational, technical, financial and legal/ethical challenges that represent the main barriers to sustainability in the life sciences. We show how EOSC-Life provides a model for sustainable data management according to FAIR (findability, accessibility, interoperability, and reusability) principles, including solutions for sensitive and industry-related resources, by means of cross-disciplinary training and sharing of best practices. Finally, we illustrate how data harmonisation and collaborative work facilitate interoperability of tools, data and solutions, and lead to a better understanding of concepts, semantics and functionalities in the life sciences.
Subject(s)
Biological Science Disciplines , Biomedical Research , Software , Workflow
ABSTRACT
The rapid pace of innovation in biological imaging and the diversity of its applications have prevented the establishment of a community-agreed standardized data format. We propose that complementing established open formats such as OME-TIFF and HDF5 with a next-generation file format such as Zarr will satisfy the majority of use cases in bioimaging. Critically, a common metadata format used in all these vessels can deliver truly findable, accessible, interoperable and reusable bioimaging data.
Subject(s)
Computational Biology/instrumentation , Computational Biology/standards , Metadata , Microscopy/instrumentation , Microscopy/standards , Software , Benchmarking , Computational Biology/methods , Data Compression , Databases, Factual , Information Storage and Retrieval , Internet , Microscopy/methods , Programming Languages , SARS-CoV-2
ABSTRACT
A growing community is constructing a next-generation file format (NGFF) for bioimaging to overcome problems of scalability and heterogeneity. Organized by the Open Microscopy Environment (OME), individuals and institutes across diverse modalities facing these problems have designed a format specification process (OME-NGFF) to address these needs. This paper brings together a wide range of those community members to describe the cloud-optimized format itself, OME-Zarr, along with tools and data resources available today to increase FAIR access and remove barriers in the scientific process. The current momentum offers an opportunity to unify a key component of the bioimaging domain: the file format that underlies so many personal, institutional, and global data management and analysis tasks.
Subject(s)
Microscopy , Software , Humans , Community Support
ABSTRACT
High content screening (HCS) experiments create a classic data management challenge: multiple large sets of heterogeneous structured and unstructured data that must be integrated and linked to produce a set of "final" results. These different data include images, reagents, protocols, analytic output, and phenotypes, all of which must be stored, linked and made accessible for users, scientists, collaborators and, where appropriate, the wider community. The OME Consortium has built several open source tools for managing, linking and sharing these different types of data. The OME Data Model is a metadata specification that supports the image data and metadata recorded in HCS experiments. Bio-Formats is a Java library that reads recorded image data and metadata and includes support for several HCS screening systems. OMERO is an enterprise data management application that integrates image data with experimental and analytic metadata and makes them accessible for visualization, mining, sharing and downstream analysis. We discuss how Bio-Formats and OMERO handle these different data types, and how they can be used to integrate, link and share HCS experiments in facilities and public data repositories. OME specifications and software are open source and are available at https://www.openmicroscopy.org.
Subject(s)
Computational Biology/statistics & numerical data , Data Mining/statistics & numerical data , High-Throughput Screening Assays/statistics & numerical data , Information Storage and Retrieval/statistics & numerical data , Software , Computational Biology/methods , Datasets as Topic , High-Throughput Screening Assays/methods , Humans , Information Dissemination , Information Storage and Retrieval/methods , Internet
ABSTRACT
Imaging data are used in the life and biomedical sciences to measure the molecular and structural composition and dynamics of cells, tissues, and organisms. Datasets range in size from megabytes to terabytes and usually contain a combination of binary pixel data and metadata that describe the acquisition process and any derived results. The OMERO image data management platform allows users to securely share image datasets according to specific permission levels: data can be held privately, shared with a set of colleagues, or made available via a public URL. Users control access by assigning data to specific Groups with defined membership and access rights. OMERO's Permission system supports simple data sharing in a lab, collaborative data analysis, and even teaching environments. OMERO software is open source and released by the OME Consortium at www.openmicroscopy.org.
Subject(s)
Information Dissemination , Molecular Imaging , Software , Animals , Internet , Publishing
ABSTRACT
Data-intensive research depends on tools that manage multidimensional, heterogeneous datasets. We built OME Remote Objects (OMERO), a software platform that enables access to and use of a wide range of biological data. OMERO uses a server-based middleware application to provide a unified interface for images, matrices and tables. OMERO's design and flexibility have enabled its use for light-microscopy, high-content-screening, electron-microscopy and even non-image-genotype data. OMERO is open-source software, available at http://openmicroscopy.org/.
Subject(s)
Database Management Systems , Databases, Factual , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Models, Biological , Software , User-Computer Interface , Animals , Biology/methods , Computer Simulation , Humans
ABSTRACT
Faced with the need to support a growing number of whole slide imaging (WSI) file formats, our team has extended a long-standing community file format (OME-TIFF) for use in digital pathology. The format makes use of the core TIFF specification to store multi-resolution (or "pyramidal") representations of a single slide in a flexible, performant manner. Here we describe the structure of this format, its performance characteristics, and open-source library support for reading and writing pyramidal OME-TIFFs.
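To make the idea of a multi-resolution ("pyramidal") representation concrete, the following is a minimal sketch of how the dimensions of successive resolution levels could be derived from a full-resolution slide. It is illustrative only and is not taken from the paper or from any OME library: the function name pyramid_levels, the downsampling factor of 2, the floor rounding, and the 256-pixel stopping size are all assumptions chosen for the example.

```python
def pyramid_levels(width, height, scale=2, min_size=256):
    """Return (width, height) for each resolution level, full resolution first.

    Assumptions (hypothetical, for illustration only): each level is the
    previous one divided by `scale` with floor rounding, and levels stop
    once the longer side is no larger than `min_size` pixels.
    """
    levels = [(width, height)]
    while max(width, height) > min_size:
        # Downsample both axes by the same factor, never going below 1 pixel.
        width = max(1, width // scale)
        height = max(1, height // scale)
        levels.append((width, height))
    return levels


if __name__ == "__main__":
    # Example: a hypothetical 80000 x 60000 pixel whole slide image.
    for level, (w, h) in enumerate(pyramid_levels(80000, 60000)):
        print(f"level {level}: {w} x {h}")
```

A viewer working with such a pyramid would typically read the smallest level that still covers the requested zoom, rather than decoding the full-resolution plane.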
ABSTRACT
Data sharing is important in the biological sciences to prevent duplication of effort, to promote scientific integrity, and to facilitate and disseminate scientific discovery. Sharing requires centralized repositories, and submission to and utility of these resources require common data formats. This is particularly challenging for multidimensional microscopy image data, which are acquired from a variety of platforms with a myriad of proprietary file formats (PFFs). In this paper, we describe an open standard format that we have developed for microscopy image data. We call on the community to use open image data standards and to insist that all imaging platforms support these file formats. This will build the foundation for an open image data repository.
Subject(s)
Databases, Factual/standards , Information Storage and Retrieval/standards , Microscopy/methods , Computational Biology/methods , Databases, Factual/trends , Image Processing, Computer-Assisted/methods , Image Processing, Computer-Assisted/standards , Information Storage and Retrieval/methods , Information Storage and Retrieval/trends , Internet , Software , User-Computer Interface
ABSTRACT
The explosion in quantitative imaging has driven the need to develop tools for storing, managing, analyzing, and viewing large sets of data. In this chapter, we discuss tools we have built for storing large data sets for the lifetime of a typical research project. As part of the Open Microscopy Environment (OME) Consortium, we have built a series of open-source tools that support the manipulation and visualization of large sets of complex image data. Images from a number of proprietary file formats can be imported and then accessed from a single server running in a laboratory or imaging facility. We discuss the capabilities of the OME Server, a Perl-based data management system that is designed for large-scale analysis of image data using a web browser-based user interface. In addition, we have recently released a lighter weight Java-based OME Remote Objects Server that supports remote applications for managing and viewing image data. Together these systems provide a suite of tools for large-scale quantitative imaging that is now commonly used throughout cell and developmental biology.
Subject(s)
Database Management Systems , Information Storage and Retrieval/methods , Microscopy , Software , User-Computer Interface
ABSTRACT
The Open Microscopy Environment (OME) defines a data model and a software implementation to serve as an informatics framework for imaging in biological microscopy experiments, including representation of acquisition parameters, annotations and image analysis results. OME is designed to support high-content cell-based screening as well as traditional image analysis applications. The OME Data Model, expressed in Extensible Markup Language (XML) and realized in a traditional database, is both extensible and self-describing, allowing it to meet emerging imaging and analysis needs.