ABSTRACT
The rapid pace of innovation in biological imaging and the diversity of its applications have prevented the establishment of a community-agreed standardized data format. We propose that complementing established open formats such as OME-TIFF and HDF5 with a next-generation file format such as Zarr will satisfy the majority of use cases in bioimaging. Critically, a common metadata format used in all these vessels can deliver truly findable, accessible, interoperable and reusable bioimaging data.
Subjects
Computational Biology/instrumentation, Computational Biology/standards, Metadata, Microscopy/instrumentation, Microscopy/standards, Software, Benchmarking, Computational Biology/methods, Data Compression, Factual Databases, Information Storage and Retrieval, Internet, Microscopy/methods, Programming Languages, SARS-CoV-2
ABSTRACT
A growing community is constructing a next-generation file format (NGFF) for bioimaging to overcome problems of scalability and heterogeneity. Organized by the Open Microscopy Environment (OME), individuals and institutes across diverse modalities facing these problems have designed a format specification process (OME-NGFF) to address these needs. This paper brings together a wide range of those community members to describe the cloud-optimized format itself, OME-Zarr, along with tools and data resources available today to increase FAIR access and remove barriers in the scientific process. The current momentum offers an opportunity to unify a key component of the bioimaging domain: the file format that underlies so many personal, institutional, and global data management and analysis tasks.
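As background to the "cloud-optimized" claim: Zarr stores an array as independently addressable chunks, so a client can fetch only the chunks that overlap the region it needs. The following is a minimal sketch of that lookup, not code from the paper or from any Zarr library. The function name `chunks_for_region` and the chunk shape in the usage note are hypothetical.

```python
from itertools import product


def chunks_for_region(region, chunk_shape):
    """Return the grid indices of all chunks that overlap a region.

    region      -- one (start, stop) pair per axis, stop exclusive
    chunk_shape -- chunk size per axis (hypothetical values, chosen by the writer)
    """
    per_axis = []
    for (start, stop), size in zip(region, chunk_shape):
        if stop <= start:
            return []  # empty region touches no chunks
        first = start // size        # chunk holding the first requested pixel
        last = (stop - 1) // size    # chunk holding the last requested pixel
        per_axis.append(range(first, last + 1))
    # Cartesian product of the per-axis chunk ranges
    return list(product(*per_axis))
```

For example, with 64 x 256 chunks, a request for rows 0 to 9 and columns 100 to 299 touches only two chunks, `(0, 0)` and `(0, 1)`, regardless of how large the full image is.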
Subjects
Microscopy, Software, Humans, Community Support
ABSTRACT
High content screening (HCS) experiments create a classic data management challenge: multiple large sets of heterogeneous structured and unstructured data that must be integrated and linked to produce a set of "final" results. These data include images, reagents, protocols, analytic output, and phenotypes, all of which must be stored, linked, and made accessible to users, scientists, collaborators and, where appropriate, the wider community. The OME Consortium has built several open source tools for managing, linking and sharing these different types of data. The OME Data Model is a metadata specification that supports the image data and metadata recorded in HCS experiments. Bio-Formats is a Java library that reads recorded image data and metadata and includes support for several HCS systems. OMERO is an enterprise data management application that integrates image data with experimental and analytic metadata and makes them accessible for visualization, mining, sharing and downstream analysis. We discuss how Bio-Formats and OMERO handle these different data types, and how they can be used to integrate, link and share HCS experiments in facilities and public data repositories. OME specifications and software are open source and are available at https://www.openmicroscopy.org.
Subjects
Computational Biology/statistics & numerical data, Data Mining/statistics & numerical data, High-Throughput Screening Assays/statistics & numerical data, Information Storage and Retrieval/statistics & numerical data, Software, Computational Biology/methods, Datasets as Topic, High-Throughput Screening Assays/methods, Humans, Information Dissemination, Information Storage and Retrieval/methods, Internet
ABSTRACT
Imaging data are used in the life and biomedical sciences to measure the molecular and structural composition and dynamics of cells, tissues, and organisms. Datasets range in size from megabytes to terabytes and usually contain a combination of binary pixel data and metadata that describe the acquisition process and any derived results. The OMERO image data management platform allows users to securely share image datasets according to specific permission levels: data can be held privately, shared with a set of colleagues, or made available via a public URL. Users control access by assigning data to specific Groups with defined membership and access rights. OMERO's permission system supports simple data sharing in a lab, collaborative data analysis, and even teaching environments. OMERO software is open source and released by the OME Consortium at www.openmicroscopy.org.
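The group-based access rule described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions and does not use or reproduce the OMERO API: the `Level`, `Group` and `Dataset` classes, the `can_read` function and the three levels are hypothetical names that mirror only the three sharing modes named in the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    """Hypothetical sharing levels taken from the abstract's three cases."""
    PRIVATE = "private"   # held privately by the owner
    GROUP = "group"       # shared with a set of colleagues
    PUBLIC = "public"     # available to anyone via a public URL


@dataclass
class Group:
    name: str
    level: Level
    members: set = field(default_factory=set)


@dataclass
class Dataset:
    owner: str
    group: Group


def can_read(user, dataset):
    """Decide read access from the dataset's group; user may be None (anonymous)."""
    level = dataset.group.level
    if level is Level.PUBLIC:
        return True
    if user == dataset.owner:
        return True
    if level is Level.GROUP:
        return user in dataset.group.members
    return False
```

In this sketch, access follows from the group a dataset is assigned to, not from per-file settings, so moving a dataset to a different group changes who can read it.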
Subjects
Information Dissemination, Molecular Imaging, Software, Animals, Internet, Publishing
ABSTRACT
TOPALi v2 simplifies and automates the use of several methods for the evolutionary analysis of multiple sequence alignments. Jobs are submitted from a Java graphical user interface as TOPALi web services to run either remotely on high-performance computing clusters or locally (with multiple cores supported). Methods available include model selection and phylogenetic tree estimation using Bayesian inference and maximum likelihood (ML) approaches, in addition to recombination detection methods. The optimal substitution model can be selected for protein or nucleic acid (standard, or protein-coding using a codon position model) data using accurate statistical criteria derived from ML co-estimation of the tree and the substitution model. Phylogenetic software available includes PhyML, RAxML and MrBayes. AVAILABILITY: Freely downloadable from http://www.topali.org for Windows, Mac OS X, Linux and Solaris.
Subjects
Computer Graphics, Computers, Molecular Evolution, Sequence Alignment/instrumentation, User-Computer Interface, Codon/genetics, Internet, Phylogeny
ABSTRACT
Faced with the need to support a growing number of whole slide imaging (WSI) file formats, our team has extended a long-standing community file format (OME-TIFF) for use in digital pathology. The format makes use of the core TIFF specification to store multi-resolution (or "pyramidal") representations of a single slide in a flexible, performant manner. Here we describe the structure of this format, its performance characteristics, and open-source library support for reading and writing pyramidal OME-TIFFs.
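To make the "pyramidal" idea concrete, the following minimal sketch lists the pixel dimensions of each resolution level for one slide. It is an illustration only, not the authors' implementation: the function name `pyramid_levels`, the downsampling factor of 2, the 256-pixel stopping threshold and the round-up rule are all assumptions, and real writers may choose differently.

```python
def pyramid_levels(width, height, min_size=256, scale=2):
    """List (width, height) for each level, from full resolution downward.

    Levels are added until the larger side fits within min_size.
    scale and min_size are assumed values, not taken from the format.
    """
    if scale < 2:
        raise ValueError("scale must be an integer >= 2")
    levels = [(width, height)]
    while max(levels[-1]) > min_size:
        w, h = levels[-1]
        # Ceiling division, so a level never collapses to zero pixels
        levels.append((-(-w // scale), -(-h // scale)))
    return levels
```

A viewer would typically read the smallest level for an overview and switch to a larger level only for the region currently on screen, which is why storing all levels in one file helps performance.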
ABSTRACT
Similarity Matrix of Proteins (SIMAP) (http://mips.gsf.de/simap) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system, which is based on FASTA heuristics and the Smith-Waterman algorithm. Our ProtInfo system allows querying by protein sequences covered by the SIMAP dataset as well as by fragments of these sequences, highly similar sequences and title words. Each sequence in the database is supplemented with pre-calculated features generated by detailed sequence analyses. By providing WWW interfaces as well as web-services, we offer the SIMAP resource as an efficient and comprehensive tool for sequence similarity searches.
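The Smith-Waterman algorithm mentioned above finds the best-scoring local alignment between two sequences by dynamic programming. The following is a minimal sketch that returns only the score. It is not SIMAP's implementation: the function name and the simple match, mismatch and linear gap scores are assumptions chosen for illustration, whereas protein searches normally use a substitution matrix and other gap models.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not SIMAP's parameters.
    Only two rows of the dynamic programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]  # column 0 is always 0 in local alignment
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            score = max(0, diag, up, left)  # 0 restarts the alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Flooring every cell at zero is what makes the alignment local: a poorly matching prefix cannot drag down the score of a well-matching segment that follows it.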