Search | Portal Regional de la BVS

FAIR data pipeline: provenance-driven data management for traceable scientific workflows.

Mitchell, Sonia Natalie; Lahiff, Andrew; Cummings, Nathan; Hollocombe, Jonathan; Boskamp, Bram; Field, Ryan; Reddyhoff, Dennis; Zarebski, Kristian; Wilson, Antony; Viola, Bruno; Burke, Martin; Archibald, Blair; Bessell, Paul; Blackwell, Richard; Boden, Lisa A; Brett, Alys; Brett, Sam; Dundas, Ruth; Enright, Jessica; Gonzalez-Beltran, Alejandra N; Harris, Claire; Hinder, Ian; David Hughes, Christopher; Knight, Martin; Mano, Vino; McMonagle, Ciaran; Mellor, Dominic; Mohr, Sibylle; Marion, Glenn; Matthews, Louise; McKendrick, Iain J; Mark Pooley, Christopher; Porphyre, Thibaud; Reeves, Aaron; Townsend, Edward; Turner, Robert; Walton, Jeremy; Reeve, Richard.

Philos Trans A Math Phys Eng Sci ; 380(2233): 20210300, 2022 Oct 03.

Article in English | MEDLINE | ID: mdl-35965468

ABSTRACT

Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of 'following the science' are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.

Subject(s)

COVID-19, Data Management, Humans, Pandemics, Software, Workflow

Ten simple rules for making a vocabulary FAIR.

Cox, Simon J D; Gonzalez-Beltran, Alejandra N; Magagna, Barbara; Marinescu, Maria-Cristina.

PLoS Comput Biol ; 17(6): e1009041, 2021 06.

Article in English | MEDLINE | ID: mdl-34133421

ABSTRACT

We present ten simple rules that support converting a legacy vocabulary (a list of terms available in a print-based glossary or in a table not accessible using web standards) into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a globally unique resolvable identifier for each term or concept. A standard representation of the concept should be returned when the individual web identifier is resolved, using SKOS or OWL serialised in an RDF-based representation for machine-interchange and in a web-page for human consumption. Guidelines for vocabulary and term metadata are provided, as well as development and maintenance considerations. The rules are arranged as a stepwise recipe for creating a FAIR vocabulary based on the legacy vocabulary. By following these rules you can achieve the outcome of converting a legacy vocabulary into a standalone FAIR vocabulary, which can be used for unambiguous data annotation. In turn, this increases data interoperability and enables data integration.

Subject(s)

Guidelines as Topic, Controlled Vocabulary, Internet, Machine Learning

Community standards for open cell migration data.

Gonzalez-Beltran, Alejandra N; Masuzzo, Paola; Ampe, Christophe; Bakker, Gert-Jan; Besson, Sébastien; Eibl, Robert H; Friedl, Peter; Gunzer, Matthias; Kittisopikul, Mark; Dévédec, Sylvia E Le; Leo, Simone; Moore, Josh; Paran, Yael; Prilusky, Jaime; Rocca-Serra, Philippe; Roudot, Philippe; Schuster, Marc; Sergeant, Gwendolien; Strömblad, Staffan; Swedlow, Jason R; van Erp, Merijn; Van Troys, Marleen; Zaritsky, Assaf; Sansone, Susanna-Assunta; Martens, Lennart.

Gigascience ; 9(5), 2020 05 01.

Article in English | MEDLINE | ID: mdl-32396199

ABSTRACT

Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited owing to the diversity of experimental protocols and non-standardized output formats. In addition, typically the datasets are not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR) will enable meta-analysis, data integration, and data mining. Standardized data formats and controlled vocabularies are essential for building a suitable infrastructure for that purpose but are not available in the cell migration domain. We here present standardization efforts by the Cell Migration Standardisation Organisation (CMSO), an open community-driven organization to facilitate the development of standards for cell migration data. This work will foster the development of improved algorithms and tools and enable secondary analysis of public datasets, ultimately unlocking new knowledge of the complex biological process of cell migration.

Subject(s)

Biomarkers, Cell Movement, Research/standards, Computational Biology/methods, Computational Biology/standards, Data Analysis, Factual Databases, Metadata

Semantic concept schema of the linear mixed model of experimental observations.

Cwiek-Kupczynska, Hanna; Filipiak, Katarzyna; Markiewicz, Augustyn; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra N; Sansone, Susanna-Assunta; Millet, Emilie J; van Eeuwijk, Fred; Lawrynowicz, Agnieszka; Krajewski, Pawel.

Sci Data ; 7(1): 70, 2020 02 27.

Article in English | MEDLINE | ID: mdl-32109232

ABSTRACT

In the information age, smart data modelling and data management can be carried out to address the wealth of data produced in scientific experiments. In this paper, we propose a semantic model for the statistical analysis of datasets by linear mixed models. We tie together disparate statistical concepts in an interdisciplinary context through the application of ontologies, in particular the Statistics Ontology (STATO), to produce FAIR data summaries. We hope to improve the general understanding of statistical modelling and thus contribute to a better description of the statistical conclusions from data analysis, allowing their efficient exploration and automated processing.

Interoperable and scalable data analysis with microservices: applications in metabolomics.

Emami Khoonsari, Payam; Moreno, Pablo; Bergmann, Sven; Burman, Joachim; Capuccini, Marco; Carone, Matteo; Cascante, Marta; de Atauri, Pedro; Foguet, Carles; Gonzalez-Beltran, Alejandra N; Hankemeier, Thomas; Haug, Kenneth; He, Sijin; Herman, Stephanie; Johnson, David; Kale, Namrata; Larsson, Anders; Neumann, Steffen; Peters, Kristian; Pireddu, Luca; Rocca-Serra, Philippe; Roger, Pierrick; Rueedi, Rico; Ruttkies, Christoph; Sadawi, Noureddin; Salek, Reza M; Sansone, Susanna-Assunta; Schober, Daniel; Selivanov, Vitaly; Thévenot, Etienne A; van Vliet, Michael; Zanetti, Gianluigi; Steinbeck, Christoph; Kultima, Kim; Spjuth, Ola.

Bioinformatics ; 35(19): 3752-3760, 2019 10 01.

Article in English | MEDLINE | ID: mdl-30851093

ABSTRACT

MOTIVATION: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator. RESULTS: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and developing scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry, one nuclear magnetic resonance spectroscopy and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability using integration of the major software suites resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science. AVAILABILITY AND IMPLEMENTATION: The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Data Analysis, Metabolomics, Computational Biology, Software, Workflow

Data discovery with DATS: exemplar adoptions and lessons learned.

Gonzalez-Beltran, Alejandra N; Campbell, John; Dunn, Patrick; Guijarro, Diana; Ionescu, Sanda; Kim, Hyeoneui; Lyle, Jared; Wiser, Jeffrey; Sansone, Susanna-Assunta; Rocca-Serra, Philippe.

J Am Med Inform Assoc ; 25(1): 13-16, 2018 01 01.

Article in English | MEDLINE | ID: mdl-29228196

ABSTRACT

The DAta Tag Suite (DATS) is a model supporting dataset description, indexing, and discovery. It is available as an annotated serialization with schema.org, a vocabulary used by major search engines, thus making the datasets discoverable on the web. DATS underlies DataMed, the National Institutes of Health Big Data to Knowledge Data Discovery Index prototype, which aims to provide a "PubMed for datasets." The experience gained while indexing a heterogeneous range of >60 repositories in DataMed helped in evaluating DATS's entities, attributes, and scope. In this work, 3 additional exemplary and diverse data sources were mapped to DATS by their representatives or experts, offering a deep scan of DATS fitness against a new set of existing data. The procedure, including feedback from users and implementers, resulted in DATS implementation guidelines and best practices, and identification of a path for evolving and optimizing the model. Finally, the work exposed additional needs when defining datasets for indexing, especially in the context of clinical and observational information.

Subject(s)

Abstracting and Indexing, Datasets as Topic, Allergy and Immunology, Delivery of Health Care, Humans, Information Storage and Retrieval, Search Engine, Social Sciences, Controlled Vocabulary

The health care and life sciences community profile for dataset descriptions.

Dumontier, Michel; Gray, Alasdair J G; Marshall, M Scott; Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A; Gonzalez-Beltran, Alejandra N; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J; Rietveld, Laurens; Wimalaratne, Sarala M; Yamaguchi, Atsuko.

PeerJ ; 4: e2331, 2016.

Article in English | MEDLINE | ID: mdl-27602295

ABSTRACT

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.

The MetaboLights repository: curation challenges in metabolomics.

Salek, Reza M; Haug, Kenneth; Conesa, Pablo; Hastings, Janna; Williams, Mark; Mahendraker, Tejasvi; Maguire, Eamonn; González-Beltrán, Alejandra N; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Steinbeck, Christoph.

Database (Oxford) ; 2013: bat029, 2013.

Article in English | MEDLINE | ID: mdl-23630246

ABSTRACT

MetaboLights is the first general-purpose open-access curated repository for metabolomic studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Increases in the number of depositions, number of samples per study and the file size of data submitted to MetaboLights present a challenge for the objective of ensuring high-quality and standardized data in the context of diverse metabolomic workflows and data representations. Here, we describe the MetaboLights curation pipeline, its challenges and its practical application in quality control of complex data depositions. Database URL: http://www.ebi.ac.uk/metabolights.

Subject(s)

Data Mining, Databases as Topic, Metabolomics, Animals, Data Collection, Humans, Metabolome, Metabolomics/standards, Research Design, Statistics as Topic

Guidelines for information about therapy experiments: a proposal on best practice for recording experimental data on cancer therapy.

González-Beltrán, Alejandra N; Yong, May Y; Dancey, Gairin; Begent, Richard.

BMC Res Notes ; 5: 10, 2012 Jan 06.

Article in English | MEDLINE | ID: mdl-22226027

ABSTRACT

BACKGROUND: Biology, biomedicine and healthcare have become data-driven enterprises, where scientists and clinicians need to generate, access, validate, interpret and integrate different kinds of experimental and patient-related data. Thus, recording and reporting of data in a systematic and unambiguous fashion is crucial to allow aggregation and re-use of data. This paper reviews the benefits of existing biomedical data standards and focuses on key elements to record experiments for therapy development. Specifically, we describe the experiments performed in molecular, cellular, animal and clinical models. We also provide an example set of elements for a therapy tested in a phase I clinical trial. FINDINGS: We introduce the Guidelines for Information About Therapy Experiments (GIATE), a minimum information checklist creating a consistent framework to transparently report the purpose, methods and results of the therapeutic experiments. A discussion on the scope, design and structure of the guidelines is presented, together with a description of the intended audience. We also present complementary resources such as a classification scheme, and two alternative ways of creating GIATE information: an electronic lab notebook and a simple spreadsheet-based format. Finally, we use GIATE to record the details of the phase I clinical trial of CHT-25 for patients with refractory lymphomas. The benefits of using GIATE for this experiment are discussed. CONCLUSIONS: While data standards are being developed to facilitate data sharing and integration in various aspects of experimental medicine, such as genomics and clinical data, no previous work focused on therapy development. We propose a checklist for therapy experiments and demonstrate its use in the 131Iodine labeled CHT-25 chimeric antibody cancer therapy. 
As future work, we will expand the set of GIATE tools to continue to encourage its use by cancer researchers, and we will engineer an ontology to annotate GIATE elements and facilitate unambiguous interpretation and data integration.
