Results 1 - 20 of 72
1.
EMBO J ; 42(23): e115008, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37964598

ABSTRACT

The main goals and challenges for the life science communities in the Open Science framework are to increase the reuse and sustainability of data resources, software tools, and workflows, especially in large-scale data-driven research and computational analyses. Here, we present key findings, procedures, effective measures and recommendations for generating and establishing sustainable life science resources based on the collaborative, cross-disciplinary work done within the EOSC-Life (European Open Science Cloud for Life Sciences) consortium. Bringing together 13 European life science research infrastructures, the consortium has laid the foundation for an open, digital space to support biological and medical research. Using lessons learned from 27 selected projects, we describe the organisational, technical, financial and legal/ethical challenges that represent the main barriers to sustainability in the life sciences. We show how EOSC-Life provides a model for sustainable data management according to FAIR (findability, accessibility, interoperability, and reusability) principles, including solutions for sensitive and industry-related resources, by means of cross-disciplinary training and the sharing of best practices. Finally, we illustrate how data harmonisation and collaborative work facilitate the interoperability of tools, data, and solutions, and lead to a better understanding of concepts, semantics and functionalities in the life sciences.


Subject(s)
Biological Science Disciplines , Biomedical Research , Software , Workflow
2.
Bioinformatics ; 37(12): 1781-1782, 2021 07 19.
Article in English | MEDLINE | ID: mdl-33031499

ABSTRACT

MOTIVATION: Since its launch in 2010, Identifiers.org has become an important tool for the annotation and cross-referencing of Life Science data. In 2016, we established the Compact Identifier (CID) scheme (prefix: accession) to generate globally unique identifiers for data resources using their locally assigned accession identifiers. Since then, we have developed and improved services to support the growing need to create, reference and resolve CIDs, in systems ranging from human readable text to cloud-based e-infrastructures, by providing high availability and low-latency cloud-based services, backed by a high-quality, manually curated resource. RESULTS: We describe a set of services that can be used to construct and resolve CIDs in Life Sciences and beyond. We have developed a new front end for accessing the Identifiers.org registry data and APIs to simplify integration of Identifiers.org CID services with third-party applications. We have also deployed the new Identifiers.org infrastructure in a commercial cloud environment, bringing our services closer to the data. AVAILABILITY AND IMPLEMENTATION: https://identifiers.org.
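To make the CID scheme concrete, here is a minimal sketch of resolving a compact identifier programmatically. The resolver endpoint and the response field names are assumptions based on the services the abstract describes; consult the live Identifiers.org API documentation before relying on them.

```python
# Resolve a Compact Identifier (prefix:accession) via the assumed
# Identifiers.org cloud resolver. Field names in the response are
# assumptions, not a confirmed schema.
import requests

def resolve_cid(compact_id: str) -> list[str]:
    """Return candidate resolved URLs for a CID such as 'GO:0006915'."""
    resp = requests.get(
        f"https://resolver.api.identifiers.org/{compact_id}", timeout=10
    )
    resp.raise_for_status()
    payload = resp.json()
    # Assumed layout: one entry per hosting resource, each carrying a
    # fully resolved access URL.
    return [r["compactIdentifierResolvedUrl"]
            for r in payload.get("payload", {}).get("resolvedResources", [])]

if __name__ == "__main__":
    for url in resolve_cid("GO:0006915"):
        print(url)
```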


Subject(s)
Biological Science Disciplines , Cloud Computing , Humans
3.
Bioinformatics ; 36(10): 3290-3291, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32044952

ABSTRACT

SUMMARY: Dispersed across the Internet is an abundance of disparate, disconnected training information, making it hard for researchers to find training opportunities that are relevant to them. To address this issue, we have developed a new platform-TeSS-which aggregates geographically distributed information and presents it in a central, feature-rich portal. Data are gathered automatically from content providers via bespoke scripts. These resources are cross-linked with related data and tools registries, and made available via a search interface, a data API and through widgets. AVAILABILITY AND IMPLEMENTATION: https://tess.elixir-europe.org.
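As an illustration of the data API the abstract mentions, the sketch below queries TeSS for training materials. The endpoint path, the `.json_api` extension, and the JSON:API-style response layout are assumptions; the actual API is documented at https://tess.elixir-europe.org.

```python
# Hedged sketch of a TeSS data API query; endpoint and response shape
# are assumptions based on the abstract's description of a data API.
import requests

def search_tess_materials(query: str) -> list[dict]:
    resp = requests.get(
        "https://tess.elixir-europe.org/materials.json_api",
        params={"q": query},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("data", [])

for material in search_tess_materials("metagenomics")[:5]:
    # 'attributes'/'title' follow JSON:API conventions; treat as assumed names.
    print(material.get("attributes", {}).get("title"))
```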


Subject(s)
Biological Science Disciplines , Software , Humans , Internet , Research Personnel
4.
PLoS Biol ; 16(12): e3000099, 2018 12.
Article in English | MEDLINE | ID: mdl-30596645

ABSTRACT

A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the "Open-Stand.org principles for collaborative open standards development." With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.
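To give a feel for the domains the abstract names, here is a skeleton BioCompute Object expressed as a Python dict. Only the domains listed above are sketched, and the field names inside each domain are placeholders rather than the normative BCO schema (see https://w3id.org/biocompute/1.3.0 for the specification; the verification kit is omitted here for brevity).

```python
# Illustrative BCO skeleton; keys inside each domain are placeholder
# names, not the normative schema.
minimal_bco = {
    "provenance_domain": {  # who created the analysis, when, and its versions
        "name": "Example variant-calling pipeline",
        "version": "1.0",
        "contributors": [{"name": "Jane Doe", "contribution": ["authoredBy"]}],
    },
    "usability_domain": [   # plain-language statement of scientific purpose
        "Identify SNVs in a clinical HTS sample for regulatory review."
    ],
    "execution_domain": {   # what is needed to re-run the computation
        "script": ["run_pipeline.sh"],
        "software_prerequisites": [{"name": "bwa", "version": "0.7.17"}],
    },
    "error_domain": {       # tolerated error ranges / empirical limits
        "empirical_error": {"false_positive_rate": 0.01},
    },
}
```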


Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Animals , Communication , Computational Biology/standards , Genome , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Precision Medicine/trends , Reproducibility of Results , Sequence Analysis, DNA/standards , Software , Workflow
5.
PLoS Biol ; 15(6): e2001414, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28662064

ABSTRACT

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
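One of the recurring lessons in this line of work is that identifier providers should publish a machine-readable pattern so integrators can validate references before resolving them. The sketch below shows that idea; the prefixes and regular expressions are simplified illustrations, not an authoritative registry.

```python
# Validate prefix:accession references against published patterns.
# Patterns here are simplified examples for illustration only.
import re

IDENTIFIER_PATTERNS = {
    "GO": re.compile(r"^\d{7}$"),   # local part of e.g. GO:0006915
    "pubmed": re.compile(r"^\d+$"),
}

def is_valid(curie: str) -> bool:
    prefix, _, accession = curie.partition(":")
    pattern = IDENTIFIER_PATTERNS.get(prefix)
    return bool(pattern and pattern.fullmatch(accession))

assert is_valid("GO:0006915")
assert not is_valid("GO:69x")
```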


Subject(s)
Biological Science Disciplines/methods , Computational Biology/methods , Data Mining/methods , Software Design , Software , Biological Science Disciplines/statistics & numerical data , Biological Science Disciplines/trends , Computational Biology/trends , Data Mining/statistics & numerical data , Data Mining/trends , Databases, Factual/statistics & numerical data , Databases, Factual/trends , Forecasting , Humans , Internet
6.
Nucleic Acids Res ; 45(D1): D404-D407, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899646

ABSTRACT

The FAIRDOMHub is a repository for publishing FAIR (Findable, Accessible, Interoperable and Reusable) Data, Operating procedures and Models (https://fairdomhub.org/) for the Systems Biology community. It is a web-accessible repository for storing and sharing systems biology research assets. It enables researchers to organize, share and publish data, models and protocols, interlink them in the context of the systems biology investigations that produced them, and interrogate them via APIs. By using the FAIRDOMHub, researchers can achieve more effective exchange with geographically distributed collaborators during projects, ensure results are sustained and preserved, and generate reproducible publications that adhere to the FAIR guiding principles of data stewardship.
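A hedged sketch of interrogating FAIRDOMHub programmatically, as the abstract suggests is possible. The `/investigations` route and the JSON:API content type are assumptions based on the underlying SEEK platform; check the FAIRDOMHub documentation for the actual API.

```python
# Assumed SEEK-style JSON:API request; route and media type are
# assumptions, not confirmed from this abstract.
import requests

resp = requests.get(
    "https://fairdomhub.org/investigations",
    headers={"Accept": "application/vnd.api+json"},
    timeout=10,
)
resp.raise_for_status()
for inv in resp.json().get("data", [])[:10]:
    print(inv["id"], inv.get("attributes", {}).get("title"))
```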


Subject(s)
Databases, Factual , Systems Biology/methods , Carbon/metabolism , Data Curation , Information Dissemination , Metabolic Networks and Pathways , Research
8.
Biochem Soc Trans ; 44(3): 675-7, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27284023

ABSTRACT

The Manchester Synthetic Biology Research Centre (SYNBIOCHEM) is a foundry for the biosynthesis and sustainable production of fine and speciality chemicals. The Centre's integrated technology platforms provide a unique capability to facilitate predictable engineering of microbial bio-factories for chemicals production. An overview of these capabilities is described.


Subject(s)
Metabolic Engineering , Synthetic Biology , United Kingdom , Universities
9.
BMC Ecol ; 16(1): 49, 2016 10 20.
Article in English | MEDLINE | ID: mdl-27765035

ABSTRACT

BACKGROUND: Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced analytical tools creates many challenges for the modern scientist. Across the wider biological sciences, presenting such capabilities on the Internet (as "Web services") and using scientific workflow systems to compose them for particular tasks is a practical way to carry out robust "in silico" science. However, use of this approach in biodiversity science and ecology has thus far been quite limited. RESULTS: BioVeL is a virtual laboratory for data analysis and modelling in biodiversity science and ecology, freely accessible via the Internet. BioVeL includes functions for accessing and analysing data through curated Web services; for performing complex in silico analysis through exposure of R programs, workflows, and batch processing functions; for on-line collaboration through sharing of workflows and workflow runs; for experiment documentation through reproducibility and repeatability; and for computational support via seamless connections to supporting computing infrastructures. We developed and improved more than 60 Web services with significant potential in many different kinds of data analysis and modelling tasks. We composed reusable workflows using these Web services, also incorporating R programs. Deploying these tools into an easy-to-use and accessible 'virtual laboratory', free via the Internet, we applied the workflows in several diverse case studies. We opened the virtual laboratory for public use and through a programme of external engagement we actively encouraged scientists and third-party application and tool developers to try out the services and contribute to the activity. CONCLUSIONS: Our work shows we can deliver an operational, scalable and flexible Internet-based virtual laboratory to meet new demands for data processing and analysis in biodiversity science and ecology. In particular, we have successfully integrated existing and popular tools and practices from different scientific disciplines to be used in biodiversity and ecological research.
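To illustrate the kind of Web-service composition the abstract describes, here is a tiny two-step "in silico" pipeline. Both service URLs are hypothetical placeholders; BioVeL's actual services were curated endpoints invoked from scientific workflow systems rather than hand-written scripts.

```python
# Hypothetical two-step pipeline: fetch species occurrences, then feed
# them to a niche-modelling service. Both URLs are placeholders.
import requests

def fetch_occurrences(species: str) -> list[dict]:
    # Step 1: retrieve occurrence records (hypothetical service).
    r = requests.get("https://example.org/occurrences",
                     params={"name": species}, timeout=30)
    r.raise_for_status()
    return r.json()

def run_niche_model(records: list[dict]) -> dict:
    # Step 2: pass the records to a modelling service (hypothetical).
    r = requests.post("https://example.org/enm/run",
                      json={"records": records}, timeout=300)
    r.raise_for_status()
    return r.json()

result = run_niche_model(fetch_occurrences("Puma concolor"))
print(result.get("model_quality"))
```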


Subject(s)
Biodiversity , Ecology/methods , Ecology/instrumentation , Internet , Models, Biological , Software , Workflow
10.
J Med Internet Res ; 18(1): e13, 2016 Jan 14.
Article in English | MEDLINE | ID: mdl-26769334

ABSTRACT

BACKGROUND: Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis, and in turn, the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these "experts." Such interfaces hark back to a time when searches needed to be accurate the first time, as there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research. OBJECTIVE: Given the cross-disciplinary nature of data science, no assumptions can be made regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with the search needs of the "Google generation" than with their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive. METHODS: Two user interfaces are evaluated for the same set of tasks in extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is "Google-like," enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has a standard multi-option user interface. RESULTS: Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect indicating that answers found through the Web search interface were more likely to be correct (F(1,19)=37.3, P<.001), with a main effect of task (F(3,57)=6.3, P<.001). Further, participants completed the tasks significantly faster using the Web search interface (F(1,19)=18.0, P<.001). There was also a main effect of task (F(2,38)=4.1, P=.025, Greenhouse-Geisser correction applied). Overall, participants were asked to rate learnability, ease of use, and satisfaction. Paired mean comparisons showed that the Web search interface received significantly higher ratings than the traditional search interface for learnability (P=.002, 95% CI [0.6-2.4]), ease of use (P<.001, 95% CI [1.2-3.2]), and satisfaction (P<.001, 95% CI [1.8-3.5]). The results show the superior cross-domain usability of Web search, which is consistent with its general familiarity and with enabling queries to be refined as the search proceeds, treating serendipity as part of the refinement. CONCLUSIONS: The results provide clear evidence that data science should adopt single-field natural language search interfaces for variable search, supporting in particular: query reformulation; data browsing; faceted search; surrogates; relevance feedback; and summarization, analytics, and visual presentation.


Subject(s)
Natural Language Processing , Search Engine/methods , User-Computer Interface , Datasets as Topic , Information Storage and Retrieval/methods , Internet
11.
Nucleic Acids Res ; 41(Web Server issue): W557-61, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23640334

ABSTRACT

The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.


Subject(s)
Computational Biology , Software , Data Mining , Gene Expression Profiling , Internet , Phylogeny , Proteomics , Search Engine , Workflow
12.
BMC Bioinformatics ; 15 Suppl 1: S12, 2014.
Article in English | MEDLINE | ID: mdl-24564760

ABSTRACT

BACKGROUND: Scientific workflow management systems are increasingly used to specify and manage bioinformatics experiments. Their programming model appeals to bioinformaticians, who can use them to easily specify complex data processing pipelines. Such a model is underpinned by a graph structure, where nodes represent bioinformatics tasks and links represent the dataflow. The complexity of such graph structures is increasing over time, with possible impacts on scientific workflow reuse. In this work, we propose effective methods for workflow design, with a focus on the Taverna model. We argue that one of the contributing factors for the difficulties in reuse is the presence of "anti-patterns", a term broadly used in program design to indicate the use of idiomatic forms that lead to over-complicated design. The main contribution of this work is a method for automatically detecting such anti-patterns, and replacing them with different patterns which result in a reduction in the workflow's overall structural complexity. Rewriting workflows in this way will be beneficial both in terms of user experience (easier design and maintenance), and in terms of operational efficiency (easier to manage, and sometimes to exploit the latent parallelism amongst the tasks). RESULTS: We have conducted a thorough study of the workflow structures available in Taverna, with the aim of finding workflow fragments whose structure could be made simpler without altering the workflow semantics. We provide four contributions. Firstly, we identify a set of anti-patterns that contribute to the structural workflow complexity. Secondly, we design a series of refactoring transformations to replace each anti-pattern by a new semantically equivalent pattern with less redundancy and simplified structure. Thirdly, we introduce a distilling algorithm that takes in a workflow and produces a distilled semantically equivalent workflow. Lastly, we provide an implementation of our refactoring approach that we evaluate on both the public Taverna workflows and on a private collection of workflows from the BioVeL project. CONCLUSION: We have designed and implemented an approach to improving workflow structure by way of semantics-preserving rewriting. Future work includes considering our refactoring approach during the phase of workflow design and proposing guidelines for designing distilled workflows.
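A toy illustration of the idea of semantics-preserving rewriting: removing "pass-through" nodes that merely forward their single input to their consumers, one simple analogue of the anti-patterns the paper targets. This is a sketch of the concept, not the authors' actual algorithm.

```python
# Splice pass-through nodes out of a dataflow graph without changing
# what data reaches which task.
def distill(edges: set[tuple[str, str]],
            tasks: dict[str, str]) -> set[tuple[str, str]]:
    """edges: (src, dst) dataflow links; tasks maps node -> task type."""
    for node, kind in tasks.items():
        if kind != "passthrough":
            continue
        ins = [s for s, d in edges if d == node]
        outs = [d for s, d in edges if s == node]
        if len(ins) == 1:  # forward the single input directly to consumers
            edges = {(s, d) for s, d in edges if node not in (s, d)}
            edges |= {(ins[0], d) for d in outs}
    return edges

edges = {("A", "copy1"), ("copy1", "B"), ("copy1", "C")}
print(distill(edges, {"A": "tool", "copy1": "passthrough",
                      "B": "tool", "C": "tool"}))
# -> {('A', 'B'), ('A', 'C')}
```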


Subject(s)
Algorithms , User-Computer Interface , Workflow , Computational Biology/methods
13.
PeerJ Comput Sci ; 10: e1781, 2024.
Article in English | MEDLINE | ID: mdl-38855229

ABSTRACT

FAIR Digital Object (FDO) is an emerging concept that is highlighted by the European Open Science Cloud (EOSC) as a potential candidate for building an ecosystem of machine-actionable research outputs. In this work we systematically evaluate FDO and its implementations as a global distributed object system, using five different conceptual frameworks that cover interoperability, middleware, the FAIR principles, EOSC requirements and the FDO guidelines themselves. We compare the FDO approach with established Linked Data practices and the existing Web architecture, and provide a brief history of the Semantic Web while discussing why these technologies may have been difficult to adopt for FDO purposes. We conclude with recommendations for both the Linked Data and FDO communities to further their adaptation and alignment.
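One established Linked Data practice the paper contrasts with FDO is HTTP content negotiation, where a single identifier serves both humans and machines. The sketch below requests RDF (Turtle) for a resource that would return HTML in a browser; the Wikidata URL is a well-known Linked Data endpoint used here purely for illustration.

```python
# Content negotiation: ask for Turtle instead of the default HTML page.
import requests

resp = requests.get(
    "https://www.wikidata.org/entity/Q2",   # 'Earth' in Wikidata
    headers={"Accept": "text/turtle"},
    timeout=10,
)
resp.raise_for_status()
print(resp.headers.get("Content-Type"))
print(resp.text[:200])  # RDF triples rather than an HTML page
```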

14.
Learn Health Syst ; 8(1): e10365, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38249839

ABSTRACT

Open and practical exchange, dissemination, and reuse of specimens and data have become a fundamental requirement for life sciences research. The quality of the data obtained, and thus of the findings and knowledge derived, is significantly influenced by the quality of the samples, the experimental methods, and the data analysis. Therefore, comprehensive and precise documentation of the pre-analytical conditions, the analytical procedures, and the data processing is essential to be able to assess the validity of the research results. With the increasing importance of the exchange, reuse, and sharing of data and samples, procedures are required that enable cross-organizational documentation, traceability, and non-repudiation. At present, this information on the provenance of samples and data is mostly either sparse, incomplete, or incoherent. Since there is no uniform framework, this information is usually only provided within the organization and not interoperably. At the same time, the collection and sharing of biological and environmental specimens increasingly require the definition and documentation of benefit sharing and compliance with regulatory requirements, rather than consideration of purely scientific needs. In this publication, we present an ongoing standardization effort to provide trustworthy, machine-actionable documentation of data lineage and specimens. We invite experts from the biotechnology and biomedical fields to contribute further to the standard.

15.
Nucleic Acids Res ; 39(Database issue): D7-10, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097465

ABSTRACT

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
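To suggest what a core-attribute record of this kind might look like, here is a sketch as a Python dict. The field names are our own paraphrase of typical checklist items, not the normative BioDBCore attribute list.

```python
# Illustrative core-attribute record for a biological database;
# field names are placeholders, not the BioDBCore standard itself.
biodbcore_record = {
    "resource_name": "ExampleDB",
    "url": "https://exampledb.example.org",
    "contact": "helpdesk@exampledb.example.org",
    "description": "Curated protein interaction data for model organisms.",
    "scope": ["proteins", "interactions"],
    "data_formats": ["TSV", "XML"],
    "standards_used": ["PSI-MI"],
    "license": "CC BY 4.0",
    "last_update": "2011-01-01",
}
```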


Subject(s)
Databases, Factual/standards , Information Dissemination
16.
Sci Data ; 10(1): 756, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37919302

ABSTRACT

Biological science produces "big data" in varied formats, which necessitates using computational tools to process, integrate, and analyse data. Researchers using computational biology tools range from those using computers for communication to those writing analysis code. We examine differences in how researchers conceptualise the same data, which we call "subjective data models". We interviewed 22 people with biological experience and varied levels of computational experience, and found that many had fluid subjective data models that changed depending on circumstance. Surprisingly, results did not cluster around participants' computational experience levels. People did not consistently map entities from abstract data models to the real-world entities in files, and certain data identifier formats were easier to infer meaning from than others. Real-world implications: 1) software engineers should design interfaces for task performance, emulating popular user interfaces, rather than targeting professional backgrounds; 2) when insufficient context is provided, people may guess what data means, whether or not those guesses are correct, emphasising the importance of contextual metadata in removing the need for erroneous guesswork.

17.
Drug Discov Today ; 28(4): 103510, 2023 04.
Article in English | MEDLINE | ID: mdl-36716952

ABSTRACT

The FAIR (findable, accessible, interoperable and reusable) principles are data management and stewardship guidelines aimed at increasing the effective use of scientific research data. Adherence to these principles in managing data assets in pharmaceutical research and development (R&D) offers pharmaceutical companies the potential to maximise the value of such assets, but the endeavour is costly and challenging. We describe the 'FAIR-Decide' framework, which aims to guide decision-making on the retrospective FAIRification of existing datasets by using business analysis techniques to estimate costs and expected benefits. This framework supports decision-making on FAIRification in the pharmaceutical R&D industry and can be integrated into a company's data management strategy.
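Since FAIR-Decide weighs FAIRification costs against expected benefits, a toy expected-value calculation can make the idea concrete. This is our illustration of the general principle, not the framework's actual method; all figures are invented placeholders.

```python
# Toy cost/benefit estimate for FAIRifying a dataset; numbers invented.
def expected_net_benefit(cost: float, annual_benefit: float,
                         p_reuse: float, years: int) -> float:
    """Expected value of FAIRifying a dataset over a time horizon."""
    return p_reuse * annual_benefit * years - cost

# A dataset costing 50k to FAIRify, with a 40% chance of reuse worth
# 30k/year over 5 years:
print(expected_net_benefit(cost=50_000, annual_benefit=30_000,
                           p_reuse=0.4, years=5))  # 10000.0
```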


Subject(s)
Drug Industry , Research , Retrospective Studies , Data Management , Pharmaceutical Preparations
18.
J Biomed Semantics ; 14(1): 6, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37264430

ABSTRACT

BACKGROUND: The Findable, Accessible, Interoperable and Reusable (FAIR) Principles explicitly require the use of FAIR vocabularies, but what precisely constitutes a FAIR vocabulary remains unclear. Being able to define FAIR vocabularies, identify their features, and provide assessment approaches against those features can guide the development of vocabularies. RESULTS: We differentiate data, data resources and vocabularies used for FAIR, examine the application of the FAIR Principles to vocabularies, align their requirements with the Open Biomedical Ontologies principles, and propose FAIR Vocabulary Features (FVFs). We also design assessment approaches for FAIR vocabularies by mapping the FVFs to existing FAIR assessment indicators. Finally, we demonstrate how they can be used for evaluating and improving vocabularies, using exemplary biomedical vocabularies. CONCLUSIONS: Our work proposes features of FAIR vocabularies and corresponding indicators for assessing the FAIR levels of different types of vocabularies, identifies use cases for vocabulary engineers, and guides the evolution of vocabularies.
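A small sketch of what feature-based assessment could look like in practice. The feature names and pass/fail checks below are invented placeholders; the paper maps its proposed features to existing FAIR assessment indicators rather than using this exact scheme.

```python
# Toy vocabulary assessment: score a vocabulary against a checklist of
# assumed FAIR-vocabulary features (placeholder names, not the paper's FVFs).
def fair_vocabulary_score(vocab: dict) -> float:
    checks = {
        "resolvable_term_iris": bool(vocab.get("term_iri_pattern")),
        "open_license": vocab.get("license") is not None,
        "versioned_releases": bool(vocab.get("versions")),
        "machine_readable_format": "owl" in vocab.get("formats", []),
    }
    return sum(checks.values()) / len(checks)

example = {"term_iri_pattern": "http://purl.obolibrary.org/obo/GO_{id}",
           "license": "CC BY 4.0",
           "versions": ["2023-01-01"],
           "formats": ["owl", "obo"]}
print(fair_vocabulary_score(example))  # 1.0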


Subject(s)
Biological Ontologies , Vocabulary, Controlled , Vocabulary
19.
Curr Protoc ; 3(2): e682, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36809564

ABSTRACT

Many trainers and organizations are passionate about sharing their training material. Sharing training material has several benefits, such as providing a record of recognition as an author, offering inspiration to other trainers, enabling researchers to discover training resources for their personal learning path, and improving the training resource landscape through data-driven gap analysis from the bioinformatics community. In this article, we present a series of protocols for using the ELIXIR online training registry Training eSupport System (TeSS). TeSS provides a one-stop shop for trainers and trainees to discover online information and content, including training materials, events, and interactive tutorials. For trainees, we provide protocols for registering and logging in and for searching and filtering content. For trainers and organizations, we also show how to manually or automatically register training events and materials. Following these protocols will contribute to promoting training events and add to a growing catalog of materials, concomitantly increasing the FAIRness of training materials and events. Training registries like TeSS use a scraping mechanism to aggregate training resources from many providers when they have been annotated using Bioschemas specifications. Finally, we describe how to enrich training resources to allow for more efficient sharing of structured metadata, such as prerequisites, target audience, and learning outcomes, using the Bioschemas specification. As more training events and materials are aggregated in TeSS, searching the registry for specific events and materials becomes crucial. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Searching for training events and materials in TeSS
Support Protocol: Integrating TeSS widgets on your website
Basic Protocol 2: Logging in to TeSS using an institutional account
Alternate Protocol: Creating and logging in to a TeSS account
Basic Protocol 3: Manual registration of training events in TeSS
Basic Protocol 4: Manual registration of training materials in TeSS
Basic Protocol 5: Registration of a content provider in TeSS
Basic Protocol 6: Automated harvesting of training events and materials in TeSS
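To show the kind of Bioschemas annotation that lets TeSS harvest a training page automatically, here is a sketch of schema.org JSON-LD markup, written as a Python dict for consistency with the other examples. In practice it would be embedded in the page as a <script type="application/ld+json"> block; the exact required properties are defined by the Bioschemas TrainingMaterial profile, so treat this selection as illustrative.

```python
# Illustrative schema.org/Bioschemas-style markup for a training page;
# property selection is an example, not the full Bioschemas profile.
import json

training_material = {
    "@context": "https://schema.org",
    "@type": "LearningResource",
    "name": "Introduction to sequence alignment",
    "description": "Hands-on tutorial covering pairwise and multiple alignment.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "audience": {"@type": "Audience", "audienceType": "PhD students"},
    "teaches": ["Run and interpret BLAST searches"],  # learning outcomes
    "keywords": "bioinformatics, alignment",
}

print(json.dumps(training_material, indent=2))
```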


Subject(s)
Computational Biology , Research Personnel , Humans
20.
Sci Data ; 10(1): 291, 2023 05 19.
Article in English | MEDLINE | ID: mdl-37208349

ABSTRACT

The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework, providing practical guidance to improve the FAIRness of both existing and future clinical and molecular datasets. We validated the framework in collaboration with several major public-private partnership projects, demonstrating and delivering improvements across all aspects of FAIR and across a variety of datasets and their contexts. We thereby established the reproducibility and far-reaching applicability of our approach to FAIRification tasks.


Subject(s)
COVID-19 , Datasets as Topic , Humans , Pandemics , Public-Private Sector Partnerships , Reproducibility of Results