RESUMO
Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely-adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects like Google Dataset Search. We alleviated this gap by creating a reusable metadata schema based on Schema.org and catalogued nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas interoperable with community standards, but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.
Assuntos
Doenças Transmissíveis , Conjuntos de Dados como Assunto , Metadados , Reprodutibilidade dos Testes , Conjuntos de Dados como Assunto/normas , HumanosRESUMO
We present the Canadian Open Neuroscience Platform (CONP) portal to answer the research community's need for flexible data sharing resources and provide advanced tools for search and processing infrastructure capacity. This portal differs from previous data sharing projects as it integrates datasets originating from a number of already existing platforms or databases through DataLad, a file level data integrity and access layer. The portal is also an entry point for searching and accessing a large number of standardized and containerized software and links to a computing infrastructure. It leverages community standards to help document and facilitate reuse of both datasets and tools, and already shows a growing community adoption giving access to more than 60 neuroscience datasets and over 70 tools. The CONP portal demonstrates the feasibility and offers a model of a distributed data and tool management system across 17 institutions throughout Canada.