Búsqueda | Portal Regional de la BVS

1.

Toward a common standard for data and specimen provenance in life sciences.

Wittner, Rudolf; Holub, Petr; Mascia, Cecilia; Frexia, Francesca; Müller, Heimo; Plass, Markus; Allocca, Clare; Betsou, Fay; Burdett, Tony; Cancio, Ibon; Chapman, Adriane; Chapman, Martin; Courtot, Mélanie; Curcin, Vasa; Eder, Johann; Elliot, Mark; Exter, Katrina; Goble, Carole; Golebiewski, Martin; Kisler, Bron; Kremer, Andreas; Leo, Simone; Lin-Gibson, Sheng; Marsano, Anna; Mattavelli, Marco; Moore, Josh; Nakae, Hiroki; Perseil, Isabelle; Salman, Ayat; Sluka, James; Soiland-Reyes, Stian; Strambio-De-Castillia, Caterina; Sussman, Michael; Swedlow, Jason R; Zatloukal, Kurt; Geiger, Jörg.

Learn Health Syst ; 8(1): e10365, 2024 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-38249839

RESUMEN

Open and practical exchange, dissemination, and reuse of specimens and data have become a fundamental requirement for life sciences research. The quality of the data obtained and thus the findings and knowledge derived is thus significantly influenced by the quality of the samples, the experimental methods, and the data analysis. Therefore, a comprehensive and precise documentation of the pre-analytical conditions, the analytical procedures, and the data processing are essential to be able to assess the validity of the research results. With the increasing importance of the exchange, reuse, and sharing of data and samples, procedures are required that enable cross-organizational documentation, traceability, and non-repudiation. At present, this information on the provenance of samples and data is mostly either sparse, incomplete, or incoherent. Since there is no uniform framework, this information is usually only provided within the organization and not interoperably. At the same time, the collection and sharing of biological and environmental specimens increasingly require definition and documentation of benefit sharing and compliance to regulatory requirements rather than consideration of pure scientific needs. In this publication, we present an ongoing standardization effort to provide trustworthy machine-actionable documentation of the data lineage and specimens. We would like to invite experts from the biotechnology and biomedical fields to further contribute to the standard.

2.

Federated causal inference based on real-world observational data sources: application to a SARS-CoV-2 vaccine effectiveness assessment.

Meurisse, Marjan; Estupiñán-Romero, Francisco; González-Galindo, Javier; Martínez-Lizaga, Natalia; Royo-Sierra, Santiago; Saldner, Simon; Dolanski-Aghamanoukjan, Lorenz; Degelsegger-Marquez, Alexander; Soiland-Reyes, Stian; Van Goethem, Nina; Bernal-Delgado, Enrique.

BMC Med Res Methodol ; 23(1): 248, 2023 10 23.

Artículo en Inglés | MEDLINE | ID: mdl-37872541

RESUMEN

INTRODUCTION: Causal inference helps researchers and policy-makers to evaluate public health interventions. When comparing interventions or public health programs by leveraging observational sensitive individual-level data from populations crossing jurisdictional borders, a federated approach (as opposed to a pooling data approach) can be used. Approaching causal inference by re-using routinely collected observational data across different regions in a federated manner, is challenging and guidance is currently lacking. With the aim of filling this gap and allowing a rapid response in the case of a next pandemic, a methodological framework to develop studies attempting causal inference using federated cross-national sensitive observational data, is described and showcased within the European BeYond-COVID project. METHODS: A framework for approaching federated causal inference by re-using routinely collected observational data across different regions, based on principles of legal, organizational, semantic and technical interoperability, is proposed. The framework includes step-by-step guidance, from defining a research question, to establishing a causal model, identifying and specifying data requirements in a common data model, generating synthetic data, and developing an interoperable and reproducible analytical pipeline for distributed deployment. The conceptual and instrumental phase of the framework was demonstrated and an analytical pipeline implementing federated causal inference was prototyped using open-source software in preparation for the assessment of real-world effectiveness of SARS-CoV-2 primary vaccination in preventing infection in populations spanning different countries, integrating a data quality assessment, imputation of missing values, matching of exposed to unexposed individuals based on confounders identified in the causal model and a survival analysis within the matched population. RESULTS: The conceptual and instrumental phase of the proposed methodological framework was successfully demonstrated within the BY-COVID project. Different Findable, Accessible, Interoperable and Reusable (FAIR) research objects were produced, such as a study protocol, a data management plan, a common data model, a synthetic dataset and an interoperable analytical pipeline. CONCLUSIONS: The framework provides a systematic approach to address federated cross-national policy-relevant causal research questions based on sensitive population, health and care data in a privacy-preserving and interoperable way. The methodology and derived research objects can be re-used and contribute to pandemic preparedness.

Asunto(s)

COVID-19 , Humanos , COVID-19/epidemiología , COVID-19/prevención & control , Vacunas contra la COVID-19 , SARS-CoV-2 , Eficacia de las Vacunas , Causalidad

3.

Ten simple rules for making a software tool workflow-ready.

Brack, Paul; Crowther, Peter; Soiland-Reyes, Stian; Owen, Stuart; Lowe, Douglas; Williams, Alan R; Groom, Quentin; Dillen, Mathias; Coppens, Frederik; Grüning, Björn; Eguinoa, Ignacio; Ewels, Philip; Goble, Carole.

PLoS Comput Biol ; 18(3): e1009823, 2022 03.

Artículo en Inglés | MEDLINE | ID: mdl-35324885

Asunto(s)

Biología Computacional , Programas Informáticos , Flujo de Trabajo

4.

Perspectives on automated composition of workflows in the life sciences.

Lamprecht, Anna-Lena; Palmblad, Magnus; Ison, Jon; Schwämmle, Veit; Al Manir, Mohammad Sadnan; Altintas, Ilkay; Baker, Christopher J O; Ben Hadj Amor, Ammar; Capella-Gutierrez, Salvador; Charonyktakis, Paulos; Crusoe, Michael R; Gil, Yolanda; Goble, Carole; Griffin, Timothy J; Groth, Paul; Ienasescu, Hans; Jagtap, Pratik; Kalas, Matús; Kasalica, Vedran; Khanteymoori, Alireza; Kuhn, Tobias; Mei, Hailiang; Ménager, Hervé; Möller, Steffen; Richardson, Robin A; Robert, Vincent; Soiland-Reyes, Stian; Stevens, Robert; Szaniszlo, Szoke; Verberne, Suzan; Verhoeven, Aswin; Wolstencroft, Katherine.

F1000Res ; 10: 897, 2021.

Artículo en Inglés | MEDLINE | ID: mdl-34804501

RESUMEN

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.

Asunto(s)

Disciplinas de las Ciencias Biológicas , Biología Computacional , Benchmarking , Programas Informáticos , Flujo de Trabajo

5.

Semantic micro-contributions with decentralized nanopublication services.

Kuhn, Tobias; Taelman, Ruben; Emonet, Vincent; Antonatos, Haris; Soiland-Reyes, Stian; Dumontier, Michel.

PeerJ Comput Sci ; 7: e387, 2021.

Artículo en Inglés | MEDLINE | ID: mdl-33817033

RESUMEN

While the publication of Linked Data has become increasingly common, the process tends to be a relatively complicated and heavy-weight one. Linked Data is typically published by centralized entities in the form of larger dataset releases, which has the downside that there is a central bottleneck in the form of the organization or individual responsible for the releases. Moreover, certain kinds of data entries, in particular those with subjective or original content, currently do not fit into any existing dataset and are therefore more difficult to publish. To address these problems, we present here an approach to use nanopublications and a decentralized network of services to allow users to directly publish small Linked Data statements through a simple and user-friendly interface, called Nanobench, powered by semantic templates that are themselves published as nanopublications. The published nanopublications are cryptographically verifiable and can be queried through a redundant and decentralized network of services, based on the grlc API generator and a new quad extension of Triple Pattern Fragments. We show here that these two kinds of services are complementary and together allow us to query nanopublications in a reliable and efficient manner. We also show that Nanobench makes it indeed very easy for users to publish Linked Data statements, even for those who have no prior experience in Linked Data publishing.

6.

Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv.

Khan, Farah Zaib; Soiland-Reyes, Stian; Sinnott, Richard O; Lonie, Andrew; Goble, Carole; Crusoe, Michael R.

Gigascience ; 8(11)2019 11 01.

Artículo en Inglés | MEDLINE | ID: mdl-31675414

RESUMEN

BACKGROUND: The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable automation, scaling, adaptation, and provenance support. However, there are still several challenges associated with the effective sharing, publication, and reproducibility of such workflows due to the incomplete capture of provenance and lack of interoperability between different technical (software) platforms. RESULTS: Based on best-practice recommendations identified from the literature on workflow design, sharing, and publishing, we define a hierarchical provenance framework to achieve uniformity in provenance and support comprehensive and fully re-executable workflows equipped with domain-specific information. To realize this framework, we present CWLProv, a standard-based format to represent any workflow-based computational analysis to produce workflow output artefacts that satisfy the various levels of provenance. We use open source community-driven standards, interoperable workflow definitions in Common Workflow Language (CWL), structured provenance representation using the W3C PROV model, and resource aggregation and sharing as workflow-centric research objects generated along with the final outputs of a given workflow enactment. We demonstrate the utility of this approach through a practical implementation of CWLProv and evaluation using real-life genomic workflows developed by independent groups. CONCLUSIONS: The underlying principles of the standards utilized by CWLProv enable semantically rich and executable research objects that capture computational workflows with retrospective provenance such that any platform supporting CWL will be able to understand the analysis, reuse the methods for partial reruns, or reproduce the analysis to validate the published findings.

Asunto(s)

Genómica , Modelos Teóricos , Flujo de Trabajo , Humanos , Programas Informáticos

7.

BioExcel Building Blocks, a software library for interoperable biomolecular simulation workflows.

Andrio, Pau; Hospital, Adam; Conejero, Javier; Jordá, Luis; Del Pino, Marc; Codo, Laia; Soiland-Reyes, Stian; Goble, Carole; Lezzi, Daniele; Badia, Rosa M; Orozco, Modesto; Gelpi, Josep Ll.

Sci Data ; 6(1): 169, 2019 09 10.

Artículo en Inglés | MEDLINE | ID: mdl-31506435

RESUMEN

In the recent years, the improvement of software and hardware performance has made biomolecular simulations a mature tool for the study of biological processes. Simulation length and the size and complexity of the analyzed systems make simulations both complementary and compatible with other bioinformatics disciplines. However, the characteristics of the software packages used for simulation have prevented the adoption of the technologies accepted in other bioinformatics fields like automated deployment systems, workflow orchestration, or the use of software containers. We present here a comprehensive exercise to bring biomolecular simulations to the "bioinformatics way of working". The exercise has led to the development of the BioExcel Building Blocks (BioBB) library. BioBB's are built as Python wrappers to provide an interoperable architecture. BioBB's have been integrated in a chain of usual software management tools to generate data ontologies, documentation, installation packages, software containers and ways of integration with workflow managers, that make them usable in most computational environments.

8.

Enabling precision medicine via standard communication of HTS provenance, analysis, and results.

Alterovitz, Gil; Dean, Dennis; Goble, Carole; Crusoe, Michael R; Soiland-Reyes, Stian; Bell, Amanda; Hayes, Anais; Suresh, Anita; Purkayastha, Anjan; King, Charles H; Taylor, Dan; Johanson, Elaine; Thompson, Elaine E; Donaldson, Eric; Morizono, Hiroki; Tsang, Hsinyi; Vora, Jeet K; Goecks, Jeremy; Yao, Jianchao; Almeida, Jonas S; Keeney, Jonathon; Addepalli, KanakaDurga; Krampis, Konstantinos; Smith, Krista M; Guo, Lydia; Walderhaug, Mark; Schito, Marco; Ezewudo, Matthew; Guimera, Nuria; Walsh, Paul; Kahsay, Robel; Gottipati, Srikanth; Rodwell, Timothy C; Bloom, Toby; Lai, Yuching; Simonyan, Vahan; Mazumder, Raja.

PLoS Biol ; 16(12): e3000099, 2018 12.

Artículo en Inglés | MEDLINE | ID: mdl-30596645

RESUMEN

A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the "Open-Stand.org principles for collaborative open standards development." With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.

Asunto(s)

Biología Computacional/métodos , Análisis de Secuencia de ADN/métodos , Animales , Comunicación , Biología Computacional/normas , Genoma , Genómica/métodos , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos , Medicina de Precisión/tendencias , Reproducibilidad de los Resultados , Análisis de Secuencia de ADN/normas , Programas Informáticos , Flujo de Trabajo

9.

Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data.

McMurry, Julie A; Juty, Nick; Blomberg, Niklas; Burdett, Tony; Conlin, Tom; Conte, Nathalie; Courtot, Mélanie; Deck, John; Dumontier, Michel; Fellows, Donal K; Gonzalez-Beltran, Alejandra; Gormanns, Philipp; Grethe, Jeffrey; Hastings, Janna; Hériché, Jean-Karim; Hermjakob, Henning; Ison, Jon C; Jimenez, Rafael C; Jupp, Simon; Kunze, John; Laibe, Camille; Le Novère, Nicolas; Malone, James; Martin, Maria Jesus; McEntyre, Johanna R; Morris, Chris; Muilu, Juha; Müller, Wolfgang; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Sariyar, Murat; Snoep, Jacky L; Soiland-Reyes, Stian; Stanford, Natalie J; Swainston, Neil; Washington, Nicole; Williams, Alan R; Wimalaratne, Sarala M; Winfree, Lilly M; Wolstencroft, Katherine; Goble, Carole; Mungall, Christopher J; Haendel, Melissa A; Parkinson, Helen.

PLoS Biol ; 15(6): e2001414, 2017 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-28662064

RESUMEN

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

Asunto(s)

Disciplinas de las Ciencias Biológicas/métodos , Biología Computacional/métodos , Minería de Datos/métodos , Diseño de Software , Programas Informáticos , Disciplinas de las Ciencias Biológicas/estadística & datos numéricos , Disciplinas de las Ciencias Biológicas/tendencias , Biología Computacional/tendencias , Minería de Datos/estadística & datos numéricos , Minería de Datos/tendencias , Bases de Datos Factuales/estadística & datos numéricos , Bases de Datos Factuales/tendencias , Predicción , Humanos , Internet

10.

FAIRDOMHub: a repository and collaboration environment for sharing systems biology research.

Wolstencroft, Katherine; Krebs, Olga; Snoep, Jacky L; Stanford, Natalie J; Bacall, Finn; Golebiewski, Martin; Kuzyakiv, Rostyk; Nguyen, Quyen; Owen, Stuart; Soiland-Reyes, Stian; Straszewski, Jakub; van Niekerk, David D; Williams, Alan R; Malmström, Lars; Rinn, Bernd; Müller, Wolfgang; Goble, Carole.

Nucleic Acids Res ; 45(D1): D404-D407, 2017 01 04.

Artículo en Inglés | MEDLINE | ID: mdl-27899646

RESUMEN

The FAIRDOMHub is a repository for publishing FAIR (Findable, Accessible, Interoperable and Reusable) Data, Operating procedures and Models (https://fairdomhub.org/) for the Systems Biology community. It is a web-accessible repository for storing and sharing systems biology research assets. It enables researchers to organize, share and publish data, models and protocols, interlink them in the context of the systems biology investigations that produced them, and to interrogate them via API interfaces. By using the FAIRDOMHub, researchers can achieve more effective exchange with geographically distributed collaborators during projects, ensure results are sustained and preserved and generate reproducible publications that adhere to the FAIR guiding principles of data stewardship.

Asunto(s)

Bases de Datos Factuales , Biología de Sistemas/métodos , Carbono/metabolismo , Curaduría de Datos , Difusión de la Información , Redes y Vías Metabólicas , Investigación

11.

COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project.

Bergmann, Frank T; Adams, Richard; Moodie, Stuart; Cooper, Jonathan; Glont, Mihai; Golebiewski, Martin; Hucka, Michael; Laibe, Camille; Miller, Andrew K; Nickerson, David P; Olivier, Brett G; Rodriguez, Nicolas; Sauro, Herbert M; Scharm, Martin; Soiland-Reyes, Stian; Waltemath, Dagmar; Yvon, Florent; Le Novère, Nicolas.

BMC Bioinformatics ; 15: 369, 2014 Dec 14.

Artículo en Inglés | MEDLINE | ID: mdl-25494900

RESUMEN

BACKGROUND: With the ever increasing use of computational models in the biosciences, the need to share models and reproduce the results of published studies efficiently and easily is becoming more important. To this end, various standards have been proposed that can be used to describe models, simulations, data or other essential information in a consistent fashion. These constitute various separate components required to reproduce a given published scientific result. RESULTS: We describe the Open Modeling EXchange format (OMEX). Together with the use of other standard formats from the Computational Modeling in Biology Network (COMBINE), OMEX is the basis of the COMBINE Archive, a single file that supports the exchange of all the information necessary for a modeling and simulation experiment in biology. An OMEX file is a ZIP container that includes a manifest file, listing the content of the archive, an optional metadata file adding information about the archive and its content, and the files describing the model. The content of a COMBINE Archive consists of files encoded in COMBINE standards whenever possible, but may include additional files defined by an Internet Media Type. Several tools that support the COMBINE Archive are available, either as independent libraries or embedded in modeling software. CONCLUSIONS: The COMBINE Archive facilitates the reproduction of modeling and simulation experiments in biology by embedding all the relevant information in one file. Having all the information stored and exchanged at once also helps in building activity logs and audit trails. We anticipate that the COMBINE Archive will become a significant help for modellers, as the domain moves to larger, more complex experiments such as multi-scale models of organs, digital organisms, and bioengineering.

Asunto(s)

Biología Computacional/métodos , Simulación por Computador , Bases de Datos de Ácidos Nucleicos , Programas Informáticos , Archivos , Humanos , Almacenamiento y Recuperación de la Información , Internet

12.

Community-driven development for computational biology at Sprints, Hackathons and Codefests.

Möller, Steffen; Afgan, Enis; Banck, Michael; Bonnal, Raoul J P; Booth, Timothy; Chilton, John; Cock, Peter J A; Gumbel, Markus; Harris, Nomi; Holland, Richard; Kalas, Matús; Kaján, László; Kibukawa, Eri; Powel, David R; Prins, Pjotr; Quinn, Jacqueline; Sallou, Olivier; Strozzi, Francesco; Seemann, Torsten; Sloggett, Clare; Soiland-Reyes, Stian; Spooner, William; Steinbiss, Sascha; Tille, Andreas; Travis, Anthony J; Guimera, Roman; Katayama, Toshiaki; Chapman, Brad A.

BMC Bioinformatics ; 15 Suppl 14: S7, 2014.

Artículo en Inglés | MEDLINE | ID: mdl-25472764

RESUMEN

BACKGROUND: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. RESULTS: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. CONCLUSIONS: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.

Asunto(s)

Biología Computacional , Conducta Cooperativa , Programas Informáticos , Comunicación , Internet

13.

Structuring research methods and data with the research object model: genomics workflows as a case study.

Hettne, Kristina M; Dharuri, Harish; Zhao, Jun; Wolstencroft, Katherine; Belhajjame, Khalid; Soiland-Reyes, Stian; Mina, Eleni; Thompson, Mark; Cruickshank, Don; Verdes-Montenegro, Lourdes; Garrido, Julian; de Roure, David; Corcho, Oscar; Klyne, Graham; van Schouwen, Reinout; 't Hoen, Peter A C; Bechhofer, Sean; Goble, Carole; Roos, Marco.

J Biomed Semantics ; 5(1): 41, 2014.

Artículo en Inglés | MEDLINE | ID: mdl-25276335

RESUMEN

BACKGROUND: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. RESULTS: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?", and "which particular conclusions were drawn from a particular workflow?". CONCLUSIONS: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment, allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. AVAILABILITY: The Research Object is available at http://www.myexperiment.org/packs/428 The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.

14.

PAV ontology: provenance, authoring and versioning.

Ciccarese, Paolo; Soiland-Reyes, Stian; Belhajjame, Khalid; Gray, Alasdair Jg; Goble, Carole; Clark, Tim.

J Biomed Semantics ; 4(1): 37, 2013 Nov 22.

Artículo en Inglés | MEDLINE | ID: mdl-24267948

RESUMEN

BACKGROUND: Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as Dublin Core Terms (DC Terms) and the W3C Provenance Ontology (PROV-O) are domain-independent and general-purpose and they allow and encourage for extensions to cover more specific needs. In particular, to track authoring and versioning information of web resources, PROV-O provides a basic methodology but not any specific classes and properties for identifying or distinguishing between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator. RESULTS: We present the Provenance, Authoring and Versioning ontology (PAV, namespace http://purl.org/pav/): a lightweight ontology for capturing "just enough" descriptions essential for tracking the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the W3C PROV-O ontology to support broader interoperability. METHOD: The initial design of the PAV ontology was driven by requirements from the AlzSWAN project with further requirements incorporated later from other projects detailed in this paper. The authors strived to keep PAV lightweight and compact by including only those terms that have demonstrated to be pragmatically useful in existing applications, and by recommending terms from existing ontologies when plausible. DISCUSSION: We analyze and compare PAV with related approaches, namely Provenance Vocabulary (PRV), DC Terms and BIBFRAME. We identify similarities and analyze differences between those vocabularies and PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms. We conclude the paper with general remarks on the applicability of PAV.

15.

The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud.

Wolstencroft, Katherine; Haines, Robert; Fellows, Donal; Williams, Alan; Withers, David; Owen, Stuart; Soiland-Reyes, Stian; Dunlop, Ian; Nenadic, Aleksandra; Fisher, Paul; Bhagat, Jiten; Belhajjame, Khalid; Bacall, Finn; Hardisty, Alex; Nieva de la Hidalga, Abraham; Balcazar Vargas, Maria P; Sufi, Shoaib; Goble, Carole.

Nucleic Acids Res ; 41(Web Server issue): W557-61, 2013 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-23640334

RESUMEN

The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.

Asunto(s)

Biología Computacional , Programas Informáticos , Minería de Datos , Perfilación de la Expresión Génica , Internet , Filogenia , Proteómica , Motor de Búsqueda , Flujo de Trabajo

16.

CaGrid Workflow Toolkit: a Taverna based workflow tool for cancer grid.

Tan, Wei; Madduri, Ravi; Nenadic, Alexandra; Soiland-Reyes, Stian; Sulakhe, Dinanath; Foster, Ian; Goble, Carole A.

BMC Bioinformatics ; 11: 542, 2010 Nov 02.

Artículo en Inglés | MEDLINE | ID: mdl-21044328

RESUMEN

BACKGROUND: In biological and medical domain, the use of web services made the data and computation functionality accessible in a unified manner, which helped automate the data pipeline that was previously performed manually. Workflow technology is widely used in the orchestration of multiple services to facilitate in-silico research. Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer research related resources and caGrid is its underlying service-based computation infrastructure. CaBIG requires that services are composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows. RESULTS: CaGrid selected Taverna as its workflow execution system of choice due to its integration with web service technology and support for a wide range of web services, plug-in architecture to cater for easy integration of third party extensions, etc. The caGrid Workflow Toolkit (or the toolkit for short), an extension to the Taverna workflow system, is designed and implemented to ease building and running caGrid workflows. It provides users with support for various phases in using workflows: service discovery, composition and orchestration, data access, and secure service invocation, which have been identified by the caGrid community as challenging in a multi-institutional and cross-discipline domain. CONCLUSIONS: By extending the Taverna Workbench, caGrid Workflow Toolkit provided a comprehensive solution to compose and coordinate services in caGrid, which would otherwise remain isolated and disconnected from each other. Using it users can access more than 140 services and are offered with a rich set of features including discovery of data and analytical services, query and transfer of data, security protections for service invocations, state management in service interactions, and sharing of workflows, experiences and best practices. The proposed solution is general enough to be applicable and reusable within other service-computing infrastructures that leverage similar technology stack.

Asunto(s)

Biología Computacional/métodos , Almacenamiento y Recuperación de la Información/métodos , Neoplasias/genética , Programas Informáticos , Sistemas de Administración de Bases de Datos , Internet , Neoplasias/clasificación , Neoplasias/diagnóstico

17.

Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B.

BMC Bioinformatics ; 9: 334, 2008 Aug 07.

Artículo en Inglés | MEDLINE | ID: mdl-18687127

RESUMEN

BACKGROUND: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. RESULTS: Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. CONCLUSION: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.

Asunto(s)

Interpretación Estadística de Datos , Perfilación de la Expresión Génica/estadística & datos numéricos , Análisis de Secuencia por Matrices de Oligonucleótidos/estadística & datos numéricos , Programas Informáticos , Bases de Datos Genéticas , Almacenamiento y Recuperación de la Información , Lenguajes de Programación

RESUMEN

RESUMEN

Asunto(s)

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA