OMD Curation Toolkit: a workflow for in-house curation of public omics datasets.
Piquer-Esteban, Samuel; Arnau, Vicente; Diaz, Wladimiro; Moya, Andrés.
Affiliation
  • Piquer-Esteban S; Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain. samuel.piquer@uv.es.
  • Arnau V; Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain. samuel.piquer@uv.es.
  • Diaz W; Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain.
  • Moya A; Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain.
BMC Bioinformatics ; 25(1): 184, 2024 May 09.
Article in En | MEDLINE | ID: mdl-38724907
ABSTRACT

BACKGROUND:

Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with, and especially curating, public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, they have limitations that often make further in-house curation and processing necessary.

RESULTS:

Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a Python 3 package designed to guide researchers through the curation of metadata and fastq files from public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from different sources.
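
To make the "control check" capability concrete, the following is a minimal sketch of one such check: verifying downloaded fastq files against the MD5 checksums listed in a tab-separated metadata table. It is not the OMD Curation Toolkit's own code or API. The function names (`parse_file_report`, `md5_of`, `check_fastq_files`) are hypothetical, and the table is assumed to follow the ENA file report layout, with `fastq_ftp` and `fastq_md5` columns in which multiple files per run are separated by `;`.

```python
import csv
import hashlib
import io
from pathlib import Path


def parse_file_report(tsv_text):
    """Map each fastq file name to its expected MD5 (assumed ENA-style TSV)."""
    expected = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Assumption: paired-end runs list several files separated by ';'
        urls = row["fastq_ftp"].split(";")
        md5s = row["fastq_md5"].split(";")
        for url, md5 in zip(urls, md5s):
            if url:
                # Keep only the file name at the end of the URL
                expected[url.rsplit("/", 1)[-1]] = md5
    return expected


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 of a file in chunks, so large fastq files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fastq_files(expected, directory):
    """Return a status per file name: 'ok', 'missing' or 'mismatch'."""
    status = {}
    for name, md5 in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            status[name] = "missing"
        elif md5_of(path) == md5:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status
```

A curation run could call `check_fastq_files(parse_file_report(report_text), "downloads/")` and re-download any file not reported as `ok` before moving on to treatment and integration.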

CONCLUSIONS:

Thus, the OMD Curation Toolkit offers valuable tools for the in-house curation needed before public omics data can be re-used. Owing to its workflow structure and capabilities, it is easy to use and can benefit investigators developing novel omics meta-analyses based on sequencing data.

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Software / Workflow / Data Curation | Language: En | Journal: BMC Bioinformatics | Journal subject: Medical Informatics | Year: 2024 | Document type: Article | Affiliation country: Spain | Country of publication: United Kingdom
