Your browser doesn't support javascript.
loading
Montrer: 20 | 50 | 100
Résultats 1 - 19 de 19
Filtrer
1.
F1000Res ; 122023.
Article de Anglais | MEDLINE | ID: mdl-38882711

RÉSUMÉ

Biodiversity loss is now recognised as one of the major challenges for humankind to address over the next few decades. Unless major actions are taken, the sixth mass extinction will lead to catastrophic effects on the Earth's biosphere and human health and well-being. ELIXIR can help address the technical challenges of biodiversity science, through leveraging its suite of services and expertise to enable data management and analysis activities that enhance our understanding of life on Earth and facilitate biodiversity preservation and restoration. This white paper, prepared by the ELIXIR Biodiversity Community, summarises the current status and responses, and presents a set of plans, both technical and community-oriented, that should both enhance how ELIXIR Services are applied in the biodiversity field and how ELIXIR builds connections across the many other infrastructures active in this area. We discuss the areas of highest priority, how they can be implemented in cooperation with the ELIXIR Platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for a Biodiversity Community in ELIXIR and is an appeal to identify and involve new stakeholders.


Sujet(s)
Biodiversité , Humains , Conservation des ressources naturelles
2.
Nucleic Acids Res ; 50(W1): W108-W114, 2022 07 05.
Article de Anglais | MEDLINE | ID: mdl-35524558

RÉSUMÉ

Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find and use simulation tools, we developed BioSimulators (https://biosimulators.org), a central registry of the capabilities of simulation tools and consistent Python, command-line and containerized interfaces to each version of each tool. The foundation of BioSimulators is standards, such as CellML, SBML, SED-ML and the COMBINE archive format, and validation tools for simulation projects and simulation tools that ensure these standards are used consistently. To help modelers find tools for particular projects, we have also used the registry to develop recommendation services. We anticipate that BioSimulators will help modelers exchange, reproduce, and combine simulations.


Sujet(s)
Simulation numérique , Logiciel , Humains , Bioingénierie , Modèles biologiques , Enregistrements , Personnel de recherche
3.
F1000Res ; 10: 897, 2021.
Article de Anglais | MEDLINE | ID: mdl-34804501

RÉSUMÉ

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.


Sujet(s)
Disciplines des sciences biologiques , Biologie informatique , Référenciation , Logiciel , Flux de travaux
4.
F1000Res ; 10: 320, 2021.
Article de Anglais | MEDLINE | ID: mdl-34136134

RÉSUMÉ

Workflows are the keystone of bioimage analysis, and the NEUBIAS (Network of European BioImage AnalystS) community is trying to gather the actors of this field and organize the information around them.  One of its most recent outputs is the opening of the F1000Research NEUBIAS gateway, whose main objective is to offer a channel of publication for bioimage analysis workflows and associated resources. In this paper we want to express some personal opinions and recommendations related to finding, handling and developing bioimage analysis workflows.  The emergence of "big data" in bioimaging and resource-intensive analysis algorithms make local data storage and computing solutions a limiting factor. At the same time, the need for data sharing with collaborators and a general shift towards remote work, have created new challenges and avenues for the execution and sharing of bioimage analysis workflows. These challenges are to reproducibly run workflows in remote environments, in particular when their components come from different software packages, but also to document them and link their parameters and results by following the FAIR principles (Findable, Accessible, Interoperable, Reusable) to foster open and reproducible science. In this opinion paper, we focus on giving some directions to the reader to tackle these challenges and navigate through this complex ecosystem, in order to find and use workflows, and to compare workflows addressing the same problem. We also discuss tools to run workflows in the cloud and on High Performance Computing resources, and suggest ways to make these workflows FAIR.


Sujet(s)
Biologie informatique , Écosystème , Algorithmes , Mémorisation et recherche des informations , Flux de travaux
5.
Gigascience ; 10(1)2021 01 27.
Article de Anglais | MEDLINE | ID: mdl-33506265

RÉSUMÉ

BACKGROUND: Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description-and cataloguing-of bioinformatics resources. FINDINGS: Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability. CONCLUSIONS: biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.


Sujet(s)
Disciplines des sciences biologiques , Biologie informatique , Bases de données factuelles , Humains , Sémantique , Logiciel
6.
Brief Bioinform ; 21(5): 1697-1705, 2020 09 25.
Article de Anglais | MEDLINE | ID: mdl-31624831

RÉSUMÉ

The corpus of bioinformatics resources is huge and expanding rapidly, presenting life scientists with a growing challenge in selecting tools that fit the desired purpose. To address this, the European Infrastructure for Biological Information is supporting a systematic approach towards a comprehensive registry of tools and databases for all domains of bioinformatics, provided under a single portal (https://bio.tools). We describe here the practical means by which scientific communities, including individual developers and projects, through major service providers and research infrastructures, can describe their own bioinformatics resources and share these via bio.tools.


Sujet(s)
Participation communautaire , Biologie informatique/méthodes , Logiciel , Biologie informatique/normes , Systèmes de gestion de bases de données , Europe , Humains
7.
Genome Biol ; 20(1): 164, 2019 08 12.
Article de Anglais | MEDLINE | ID: mdl-31405382

RÉSUMÉ

Bioinformaticians and biologists rely increasingly upon workflows for the flexible utilization of the many life science tools that are needed to optimally convert data into knowledge. We outline a pan-European enterprise to provide a catalogue ( https://bio.tools ) of tools and databases that can be used in these workflows. bio.tools not only lists where to find resources, but also provides a wide variety of practical information.


Sujet(s)
Disciplines des sciences biologiques , Bases de données factuelles , Logiciel , Internet
8.
F1000Res ; 72018.
Article de Anglais | MEDLINE | ID: mdl-30271575

RÉSUMÉ

The Norwegian e-Infrastructure for Life Sciences (NeLS) has been developed by ELIXIR Norway to provide its users with a system enabling data storage, sharing, and analysis in a project-oriented fashion. The system is available through easy-to-use web interfaces, including the Galaxy workbench for data analysis and workflow execution. Users confident with a command-line interface and programming may also access it through Secure Shell (SSH) and application programming interfaces (APIs).  NeLS has been in production since 2015, with training and support provided by the help desk of ELIXIR Norway. Through collaboration with NorSeq, the national consortium for high-throughput sequencing, an integrated service is offered so that sequencing data generated in a research project is provided to the involved researchers through NeLS. Sensitive data, such as individual genomic sequencing data, are handled using the TSD (Services for Sensitive Data) platform provided by Sigma2 and the University of Oslo. NeLS integrates national e-infrastructure storage and computing resources, and is also integrated with the SEEK platform in order to store large data files produced by experiments described in SEEK.   In this article, we outline the architecture of NeLS and discuss possible directions for further development.


Sujet(s)
Disciplines des sciences biologiques , Systèmes de gestion de bases de données , Séquençage nucléotidique à haut débit , Humains , Diffusion de l'information/méthodes , Mémorisation et recherche des informations/méthodes , Norvège
9.
Gigascience ; 6(6): 1-4, 2017 06 01.
Article de Anglais | MEDLINE | ID: mdl-28402416

RÉSUMÉ

Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE .


Sujet(s)
Biologie informatique/méthodes , Automatisation , Systèmes informatiques , Internet , Reproductibilité des résultats , Logiciel , Interface utilisateur , Flux de travaux
10.
Nucleic Acids Res ; 44(D1): D38-47, 2016 Jan 04.
Article de Anglais | MEDLINE | ID: mdl-26538599

RÉSUMÉ

Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand.Here we present a community-driven curation effort, supported by ELIXIR-the European infrastructure for biological information-that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners.As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.


Sujet(s)
Biologie informatique , Enregistrements , Curation de données , Logiciel
11.
BMC Bioinformatics ; 15 Suppl 14: S7, 2014.
Article de Anglais | MEDLINE | ID: mdl-25472764

RÉSUMÉ

BACKGROUND: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. RESULTS: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. CONCLUSIONS: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.


Sujet(s)
Biologie informatique , Comportement coopératif , Logiciel , Communication , Internet
12.
BMC Bioinformatics ; 15: 85, 2014 Mar 26.
Article de Anglais | MEDLINE | ID: mdl-24669753

RÉSUMÉ

BACKGROUND: 20 years of improved technology and growing sequences now renders residue-residue contact constraints in large protein families through correlated mutations accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. On top, EVfold-mfDCA depends on proprietary software. RESULTS: Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library "libfreecontact", complete with command line tool "freecontact", as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability. CONCLUSIONS: FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).


Sujet(s)
Biologie informatique/méthodes , Protéines/composition chimique , Analyse de séquence de protéine/méthodes , Logiciel , Algorithmes , Conformation des protéines , Protéines/génétique
13.
J Biomed Semantics ; 5(1): 5, 2014 Feb 05.
Article de Anglais | MEDLINE | ID: mdl-24495517

RÉSUMÉ

The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.

14.
Nucleic Acids Res ; 41(Web Server issue): W133-41, 2013 Jul.
Article de Anglais | MEDLINE | ID: mdl-23632163

RÉSUMÉ

The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through the provision of several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens for a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.


Sujet(s)
Génomique/méthodes , Logiciel , Interprétation statistique de données , Génome , Internet
15.
Bioinformatics ; 29(10): 1325-32, 2013 May 15.
Article de Anglais | MEDLINE | ID: mdl-23479348

RÉSUMÉ

MOTIVATION: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required. RESULTS: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations. AVAILABILITY: The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl. CONTACT: jison@ebi.ac.uk.


Sujet(s)
Biologie informatique/méthodes , Logiciel , Algorithmes , Bases de données factuelles , Flux de travaux
16.
N Biotechnol ; 30(2): 109-13, 2013 Jan 25.
Article de Anglais | MEDLINE | ID: mdl-22687389

RÉSUMÉ

Management of data to produce scientific knowledge is a key challenge for biological research in the 21st century. Emerging high-throughput technologies allow life science researchers to produce big data at speeds and in amounts that were unthinkable just a few years ago. This places high demands on all aspects of the workflow: from data capture (including the experimental constraints of the experiment), analysis and preservation, to peer-reviewed publication of results. Failure to recognise the issues at each level can lead to serious conflicts and mistakes; research may then be compromised as a result of the publication of non-coherent protocols, or the misinterpretation of published data. In this report, we present the results from a workshop that was organised to create an ontological data-modelling framework for Laboratory Protocol Standards for the Molecular Methods Database (MolMeth). The workshop provided a set of short- and long-term goals for the MolMeth database, the most important being the decision to use the established EXACT description of biomedical ontologies as a starting point.


Sujet(s)
Congrès comme sujet , Bases de données comme sujet , Laboratoires , Biologie moléculaire/méthodes , Biologie moléculaire/normes , Internet , Laboratoires/normes
17.
BMC Bioinformatics ; 12: 494, 2011 Dec 30.
Article de Anglais | MEDLINE | ID: mdl-22208806

RÉSUMÉ

BACKGROUND: With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated. RESULTS: We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0. CONCLUSIONS: The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience.


Sujet(s)
Génomique/méthodes , Analyse de séquence d'ADN , Génome , Génome humain , Humains , Séquences d'acides nucléiques régulatrices , Logiciel
18.
Bioinformatics ; 26(18): i540-6, 2010 Sep 15.
Article de Anglais | MEDLINE | ID: mdl-20823319

RÉSUMÉ

MOTIVATION: The world-wide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of the resources offer programmatic web-service interface. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types. RESULTS: BioXSD has been developed as a candidate for standard, canonical exchange format for basic bioinformatics data. BioXSD is represented by a dedicated XML Schema and defines syntax for biological sequences, sequence annotations, alignments and references to resources. We have adapted a set of web services to use BioXSD as the input and output format, and implemented a test-case workflow. This demonstrates that the approach is feasible and provides smooth interoperability. Semantics for BioXSD is provided by annotation with the EDAM ontology. We discuss in a separate section how BioXSD relates to other initiatives and approaches, including existing standards and the Semantic Web. AVAILABILITY: The BioXSD 1.0 XML Schema is freely available at http://www.bioxsd.org/BioXSD-1.0.xsd under the Creative Commons BY-ND 3.0 license. The http://bioxsd.org web page offers documentation, examples of data in BioXSD format, example workflows with source codes in common programming languages, an updated list of compatible web services and tools and a repository of feature requests from the community.


Sujet(s)
Biologie informatique/méthodes , Mémorisation et recherche des informations , Internet , Langages de programmation , Séquence d'acides aminés , Mémorisation et recherche des informations/normes , Données de séquences moléculaires , Sémantique , Logiciel , Flux de travaux
19.
Nucleic Acids Res ; 38(Web Server issue): W683-8, 2010 Jul.
Article de Anglais | MEDLINE | ID: mdl-20462862

RÉSUMÉ

The EMBRACE (European Model for Bioinformatics Research and Community Education) web service collection is the culmination of a 5-year project that set out to investigate issues involved in developing and deploying web services for use in the life sciences. The project concluded that in order for web services to achieve widespread adoption, standards must be defined for the choice of web service technology, for semantically annotating both service function and the data exchanged, and a mechanism for discovering services must be provided. Building on this, the project developed: EDAM, an ontology for describing life science web services; BioXSD, a schema for exchanging data between services; and a centralized registry (http://www.embraceregistry.net) that collects together around 1000 services developed by the consortium partners. This article presents the current status of the collection and its associated recommendations and standards definitions.


Sujet(s)
Biologie informatique , Logiciel , Disciplines des sciences biologiques , Diffusion de l'information , Internet , Enregistrements , Intégration de systèmes
SÉLECTION CITATIONS
DÉTAIL DE RECHERCHE
...