ABSTRACT
The EU-ToxRisk project (2016-2021) was a large European project working to shift toxicological testing away from animal tests and towards toxicological assessment based on a comprehensive mechanistic understanding of the cause-consequence relationships behind the adverse effects of chemicals. More than 40 partners from scientific institutions, industry and regulators coordinated their work towards this goal in a six-year programme. The breadth and variety of the data and knowledge generated presented a challenging data management landscape. Here, we describe our approach to data management as developed under EU-ToxRisk. The main building blocks of the data infrastructure are: 1) an easy-to-use, extensible data and metadata format; 2) a flexible system with protocols for data capture and sharing across the entire consortium; 3) a methods database for describing and reviewing data generation and processing protocols; 4) data archiving using a sustainable resource; 5) data transformation from the archive to the system that provides granular access; 6) an Application Programming Interface (API) for access to individual data points; 7) data exploration and analysis modules, based on a "web notebook" approach to executable data processing documentation; and 8) a knowledge portal that ties together all of the above and provides a collaboration space for information exchange across the consortium. This knowledge infrastructure is being extended and refined to support follow-up projects (RISK-HUNT3R, ASPIS cluster, European Open Science Cloud (2021-2026)).
Subject(s)
Factual Databases, Toxicology, Risk Assessment/methods, Humans, Toxicology/methods, Animals, Data Management
ABSTRACT
The main goals and challenges for the life science communities in the Open Science framework are to increase the reuse and sustainability of data resources, software tools and workflows, especially in large-scale data-driven research and computational analyses. Here, we present key findings, procedures, effective measures and recommendations for generating and establishing sustainable life science resources, based on the collaborative, cross-disciplinary work done within the EOSC-Life (European Open Science Cloud for Life Sciences) consortium. By bringing together 13 European life science research infrastructures, the consortium has laid the foundation for an open, digital space to support biological and medical research. Using lessons learned from 27 selected projects, we describe the organisational, technical, financial and legal/ethical challenges that represent the main barriers to sustainability in the life sciences. We show how EOSC-Life, through cross-disciplinary training and the sharing of best practices, provides a model for sustainable data management according to the FAIR (findability, accessibility, interoperability and reusability) principles, including solutions for sensitive and industry-related resources. Finally, we illustrate how data harmonisation and collaborative work facilitate the interoperability of tools, data and solutions, and lead to a better understanding of concepts, semantics and functionalities in the life sciences.
Subject(s)
Biological Science Disciplines, Biomedical Research, Software, Workflow
ABSTRACT
Biological imaging is one of the primary tools by which we understand living systems across scales, from atoms to organisms. Rapid advances in imaging technology have increased both the spatial and temporal resolution at which we examine those systems and have enabled the visualisation of larger tissue volumes. These advances have huge potential, but they also generate ever-increasing amounts of imaging data that must be stored and analysed. Public image repositories provide a critical scientific service through open data provision: they support the reproducibility of scientific results, give access to reference imaging datasets, and enable the reuse of data for new scientific discoveries and for accelerating the development of image analysis methods. The scale and scope of imaging data present both challenges and opportunities for the open sharing of image data. In this article, we provide a perspective informed by decades of providing open data resources for biological information, suggesting areas to focus on and a path towards global interoperability.
Subject(s)
Computer-Assisted Image Processing, Reproducibility of Results
ABSTRACT
Organised data is easy to use, but rapid developments in the field of bioimaging, including improvements in instrumentation, detectors, software and experimental techniques, have resulted in an explosion in the volume of data being generated, making well-organised data an elusive goal. This guide offers a handful of recommendations for bioimage depositors, analysts, and microscope and software developers, whose implementation would contribute to better-organised data in preparation for archival. Based on our experience archiving large image datasets in EMPIAR, the BioImage Archive and BioStudies, we propose a number of strategies that we believe would improve the usability (clarity, orderliness, learnability, navigability, self-documentation, coherence and consistency of identifiers, accessibility, succinctness) of future data depositions, making them more useful to the bioimaging community (data authors and analysts, researchers, clinicians, funders, collaborators, industry partners, hardware/software producers, journals and archive developers, as well as interested but non-specialist users of bioimaging data). These recommendations may also find use in other data-intensive disciplines. To facilitate the analysis of data organisation, we present bandbox, a Python package that assesses users' data by flagging potential issues, such as redundant directories or invalid characters in file or folder names, that should be addressed before archival. We offer these recommendations as a starting point and hope to engender more substantial conversations across and between the various data-rich communities.
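The kind of pre-archival check described for bandbox can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not bandbox's actual API or rule set: the function name `find_issues`, the "safe" character set (letters, digits, `.`, `-`, `_`) and the heuristic that a folder holding exactly one subfolder and no files counts as "redundant" are all hypothetical choices made for this example.

```python
import os
import re

# Hypothetical naming rule for this sketch only: letters, digits, dot, hyphen
# and underscore are treated as safe; anything else (spaces, accented letters,
# symbols) is flagged as a potential issue.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def find_issues(root):
    """Walk `root` and return (kind, path) tuples for potential issues."""
    issues = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Flag file and folder names containing characters outside SAFE_NAME.
        for name in dirnames + filenames:
            if not SAFE_NAME.fullmatch(name):
                issues.append(("invalid_characters", os.path.join(dirpath, name)))
        # Illustrative heuristic: a folder that holds exactly one subfolder
        # and no files adds a nesting level without organising anything.
        if len(dirnames) == 1 and not filenames:
            issues.append(("redundant_directory", dirpath))
    return issues
```

Calling `find_issues("path/to/deposition")` returns a list of `(kind, path)` pairs that a depositor could review before archival; a real tool would likely make the naming rules and redundancy heuristics configurable rather than hard-coded.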
Subject(s)
Communication, Industry, Humans, Research Design, Research Personnel, Software
ABSTRACT
The data described here were generated within the EU/FP7 HeCaToS project (Hepatic and Cardiac Toxicity Systems modeling). The project aimed to develop an in silico prediction system to contribute to drug safety assessment in humans. For this purpose, multi-omics data on repeated-dose toxicity were obtained for 10 hepatotoxic and 10 cardiotoxic compounds. Most data were obtained from in vitro experiments in which 3D microtissues (either hepatic or cardiac) were exposed to a therapeutic dosing profile (physiologically relevant concentrations calculated through PBPK modeling) or a toxic dosing profile (IC20 after 7 days). Exposures lasted 14 days, and samples were obtained at 7 time points (therapeutic doses: 2, 8, 24, 72, 168, 240 and 336 h; toxic doses: 0, 2, 8, 24, 72, 168 and 240 h). Transcriptomics (RNA sequencing and microRNA sequencing), proteomics (LC-MS), epigenomics (MeDIP sequencing) and metabolomics (LC-MS and NMR) data were obtained from these samples. Furthermore, functional endpoints (ATP content, Caspase3/7 activity and O2 consumption) were measured in exposed microtissues. Additionally, multi-omics data from patient biopsies are available. These data are now being released to the scientific community through the BioStudies data repository ( https://www.ebi.ac.uk/biostudies/ ).
Subject(s)
Cardiotoxicity, Drug-Related Side Effects and Adverse Reactions, Humans, Epigenomics, Metabolomics, Proteomics, Transcriptome
ABSTRACT
Despite the huge impact of data resources in genomics and structural biology, until now there has been no central archive for biological image data from all imaging modalities. The BioImage Archive is a new data resource at the European Bioinformatics Institute (EMBL-EBI) designed to fill this gap. In its initial phase, the BioImage Archive accepts bioimaging data associated with publications, in any format and from any imaging modality from the molecular to the organism scale, excluding medical imaging. The BioImage Archive will ensure the reproducibility of published studies that derive results from image data and will reduce duplication of effort. Most importantly, the BioImage Archive will help scientists generate new insights by reusing existing data to answer new biological questions, and by providing training, testing and benchmarking data for the development of image analysis tools. The archive is available at https://www.ebi.ac.uk/bioimage-archive/.
Subject(s)
Archives, Internet Use, Microscopy, Factual Databases, Reproducibility of Results
ABSTRACT
The Human Cell Atlas (HCA) consortium aims to establish an atlas of all organs in the healthy human body at single-cell resolution, to increase our understanding of the basic biological processes that govern development, physiology and anatomy, and to accelerate the diagnosis and treatment of disease. The Lung Biological Network of the HCA aims to generate the Human Lung Cell Atlas as a reference for the cellular repertoire, molecular cell states and phenotypes, and cell-cell interactions that characterise normal homeostasis in healthy lung tissue. Such a reference atlas of the healthy human lung will facilitate mapping the changes in the cellular landscape in disease. The discovAIR project is one of six pilot actions for the HCA funded by the European Commission under the H2020 framework programme. discovAIR aims to establish the first draft of an integrated Human Lung Cell Atlas, combining single-cell transcriptional and epigenetic profiling with spatially resolving techniques on matched tissue samples, and covering a number of chronic and infectious lung diseases. The integrated Human Lung Cell Atlas will be available as a resource for the wider respiratory community, including basic and translational scientists, clinicians and the private sector, as well as for patients with lung disease and the interested lay public. We anticipate that the Human Lung Cell Atlas will be the foundation for a more detailed understanding of the pathogenesis of lung diseases, guiding the design of novel diagnostics and preventive or curative interventions.
Subject(s)
Lung Diseases, Lung, Humans, Proteomics, Thorax
Subject(s)
Computational Biology/methods, Computational Biology/standards, Diagnostic Imaging/methods, Diagnostic Imaging/standards, Metadata/standards, Animals, Artificial Intelligence, Computational Biology/instrumentation, Factual Databases, Diagnostic Imaging/instrumentation, Humans, Computer-Assisted Image Processing/methods, Information Storage and Retrieval/methods, Microscopy/methods, Proteomics/standards, Scientific Societies, Software, Raman Spectrum Analysis, User-Computer Interface
Subject(s)
Computational Biology/methods, Computer-Assisted Image Processing/methods, Metadata, Algorithms, Animals, Congresses as Topic, Cryoelectron Microscopy/methods, Data Mining/methods, Factual Databases, Diagnostic Imaging/methods, Humans, Microscopy, Software
ABSTRACT
This protocol illustrates the steps necessary to deposit correlated 3D cryo-imaging data from cryo-structured illumination microscopy and cryo-soft X-ray tomography in the BioStudies and EMPIAR databases of the European Bioinformatics Institute. There is currently a real need for a robust data deposition method that ensures unhindered access to, and independent validation of, correlative light and X-ray microscopy data, allowing their use in further comparative studies, educational activities and data mining. For complete details on the use and execution of this protocol, please refer to Kounatidis et al. (2020).
Subject(s)
Factual Databases, Three-Dimensional Imaging, X-Ray Tomography
ABSTRACT
ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI. It was established in 2002, initially as an archive for publication-related microarray data, and was later extended to accept sequencing-based data. Over the last decade, an increasing share of biological experiments has involved multiple technologies assaying different biological modalities, such as epigenetics and RNA and protein expression, and the BioStudies database (https://www.ebi.ac.uk/biostudies) was therefore established to deal with such multimodal data. Its central concept is a study, which is typically associated with a publication. BioStudies stores metadata describing the study, provides links to relevant databases such as the European Nucleotide Archive (ENA), and hosts the types of data for which no specialized databases exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In the future, all functional genomics data will be archived in BioStudies. The process will be seamless for users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides users through these changes.
Subject(s)
Genetic Databases, Genetic Epigenesis, Genomics/methods, High-Throughput Nucleotide Sequencing/statistics & numerical data, Oligonucleotide Array Sequence Analysis/statistics & numerical data, Animals, Cell Line, DNA Methylation, Gene Expression Profiling, Humans, Internet, Metadata, Organ Specificity, Plants/genetics, Single-Cell Analysis, Software
ABSTRACT
Uncovering cellular responses from heterogeneous genomic data is crucial for molecular medicine, in particular for drug safety. This can be achieved by integrating molecular activities into networks of interacting proteins. As a proof of concept, we challenge network modeling with time-resolved proteome, transcriptome and methylome measurements in iPSC-derived human 3D cardiac microtissues to elucidate the adverse mechanisms of anthracycline cardiotoxicity, measured with four different drugs (doxorubicin, epirubicin, idarubicin and daunorubicin). Dynamic molecular analysis at in vivo drug exposure levels reveals a network of 175 disease-associated proteins and identifies common in vitro modules of anthracycline cardiotoxicity related to mitochondrial and sarcomere function, as well as to remodeling of the extracellular matrix. These in vitro-identified modules are transferable and are evaluated using biopsies from cardiomyopathy patients. This study, to our knowledge the most comprehensive on anthracycline cardiotoxicity, demonstrates a reproducible workflow for molecular medicine and serves as a template for detecting adverse drug responses from complex omics data.
Subject(s)
Metabolome, Biological Models, Proteome, Transcriptome, Genetic Epigenesis, Gene Expression Profiling/methods, Gene Expression Regulation, Gene Regulatory Networks, Humans, Metabolomics/methods, Mitochondria/genetics, Mitochondria/metabolism, Proteomics/methods, Sarcomeres/genetics, Sarcomeres/metabolism, Signal Transduction
ABSTRACT
Hazard assessment based on new approach methods (NAM) requires the use of batteries of assays, where individual tests may be contributed by different laboratories. A unified strategy for such collaborative testing is presented. It details all procedures required to make test information usable for integrated hazard assessment, strategic project decisions and/or regulatory purposes. The EU-ToxRisk project developed a strategy to provide regulatorily valid data and exemplified it using a panel of > 20 assays (with > 50 individual endpoints), each applied to 19 well-known test compounds (e.g. rotenone, colchicine, mercury, paracetamol, rifampicin, paraquat, taxol). Examples of strategy implementation are provided for all aspects required to ensure data validity: (i) documentation of test methods in a publicly accessible database; (ii) deposition of standard operating procedures (SOP) in the European Union DB-ALM repository; (iii) test readiness scoring according to defined criteria; (iv) disclosure of the data processing pipeline; (v) linkage of uncertainty measures and metadata to the data; (vi) definition of test chemicals, their handling and their behavior in test media; and (vii) specification of the test purpose and overall evaluation plans. Moreover, data generation was exemplified by providing results from 25 reporter assays. A complete evaluation of the entire test battery will be described elsewhere. A major lesson from the retrospective analysis of this large testing project was the need for thorough definitions of the above strategy aspects, ideally in the form of a study pre-registration, to allow adequate interpretation of the data and to ensure overall scientific/toxicological validity.
Subject(s)
Documentation, Electronic Data Processing/legislation & jurisprudence, Government Regulation, Toxicity Tests, Toxicology/legislation & jurisprudence, Animals, Cultured Cells, Europe, Humans, Policy Making, Reproducibility of Results, Retrospective Studies, Risk Assessment, Terminology as Topic, Zebrafish/embryology
ABSTRACT
ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data from a variety of technologies assaying functional modalities of a genome, such as gene expression or promoter occupancy. The number of experiments based on sequencing technologies, in particular RNA-seq experiments, has been increasing over the last few years, and in the last 12 months submissions of sequencing data overtook those of microarray experiments. Additionally, there has been a significant increase in experiments that investigate single cells rather than bulk samples, known as single-cell RNA-seq. To accommodate these trends, we have substantially changed our submission tool Annotare which, along with raw and processed data, collects all the metadata necessary to interpret these experiments. Selected datasets are re-processed and loaded into our sister resource, the value-added Expression Atlas (and its component Single Cell Expression Atlas), which not only enables users to interpret the data easily but also serves as a test of data quality. With an increasing number of studies that combine different assay modalities (multi-omics experiments), a new, more general archival resource, the BioStudies Database, has been developed, which will eventually supersede ArrayExpress. Data submission will continue unchanged; all existing ArrayExpress data will be incorporated into BioStudies, and the existing accession numbers and application programming interfaces will be maintained.
Subject(s)
Oligonucleotide Array Sequence Analysis/methods, Single-Cell Analysis/methods, Software, Genetic Databases, RNA-Seq/methods
ABSTRACT
This paper was originally published under standard Nature America Inc. copyright. As of the date of this correction, the Resource is available online as an open-access paper with a CC-BY license. No other part of the paper has been changed.
ABSTRACT
BioStudies (www.ebi.ac.uk/biostudies) is a new public database that organizes data from biological studies. Typically, but not exclusively, a study is associated with a publication. BioStudies offers a simple way to describe the study structure and provides flexible data deposition tools and data access interfaces. The actual data can be stored in BioStudies, remotely, or both. BioStudies imports supplementary data from Europe PMC and is a resource for authors and publishers for packaging data during the manuscript preparation process. It can also support the data management needs of collaborative projects. The growth in multiomics experiments and other multi-faceted approaches to life sciences research means that studies produce a diversity of data outputs in multiple locations. BioStudies offers a way to ensure that all these data and the associated publication(s) can be found coherently in the longer term.
Subject(s)
Biological Science Disciplines, Factual Databases, Animals, Humans, Internet, Software
Subject(s)
Preclinical Drug Evaluation/methods, Drug-Related Side Effects and Adverse Reactions, Organ Culture Techniques/methods, Acute Toxicity Tests/methods, Animals, Drug Dose-Response Relationship, Heart/drug effects, Humans, Liver/drug effects, Pharmaceutical Preparations/administration & dosage, Pharmaceutical Preparations/blood, Pharmacokinetics
ABSTRACT
Access to primary research data is vital for the advancement of science. To extend the data types supported by community repositories, we built a prototype Image Data Resource (IDR) that collects and integrates imaging data acquired across many different imaging modalities. IDR links data from several imaging modalities, including high-content screening, super-resolution and time-lapse microscopy, and digital pathology, with public genetic or chemical databases and with cell and tissue phenotypes expressed using controlled ontologies. Using this integration, IDR facilitates the analysis of gene networks and reveals functional interactions that are inaccessible to individual studies. To enable re-analysis, we also established a computational resource based on Jupyter notebooks that allows remote access to the entire IDR. IDR is also an open-source platform that others can use to publish their own image data. Thus, IDR provides both a novel online resource and a software infrastructure that promotes and extends the publication and re-analysis of scientific image data.