Results 1 - 20 of 37
1.
PLoS Biol ; 15(6): e2001414, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28662064

ABSTRACT

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.


Subject(s)
Biological Science Disciplines/methods , Computational Biology/methods , Data Mining/methods , Software Design , Software , Biological Science Disciplines/statistics & numerical data , Biological Science Disciplines/trends , Computational Biology/trends , Data Mining/statistics & numerical data , Data Mining/trends , Databases, Factual/statistics & numerical data , Databases, Factual/trends , Forecasting , Humans , Internet
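The resolvability concern raised in the abstract above can be illustrated with a minimal Python sketch that splits a compact `prefix:accession` identifier and expands it into a web address. The `PREFIX_MAP` table, the `exampledb` prefix, the URL template and the function name are all illustrative assumptions rather than anything taken from the article; a production system would more likely consult a maintained resolver registry than a hard-coded table.

```python
# Hypothetical prefix table: the prefix and URL template are placeholders.
PREFIX_MAP = {
    "exampledb": "https://example.org/exampledb/{accession}",
}


def expand_compact_id(compact_id, prefix_map=PREFIX_MAP):
    """Expand 'prefix:accession' into a full URL using prefix_map."""
    # Split on the first colon only, so accessions may themselves contain colons.
    prefix, sep, accession = compact_id.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    try:
        # Prefixes are matched case-insensitively; accessions are left untouched.
        template = prefix_map[prefix.lower()]
    except KeyError:
        raise KeyError(f"unknown prefix: {prefix!r}") from None
    return template.format(accession=accession)


print(expand_compact_id("exampledb:A123"))
```

Keeping the prefix table separate from the accession means a provider can change its URL scheme without invalidating identifiers that were already cited.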
3.
Bioinformatics ; 33(16): 2580-2582, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28379341

ABSTRACT

MOTIVATION: BioContainers (biocontainers.pro) is an open-source and community-driven framework that provides platform-independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source container frameworks Docker and rkt, which allow software to be installed and executed in an isolated and controlled environment. It also provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). AVAILABILITY AND IMPLEMENTATION: The software is freely available at github.com/BioContainers/. CONTACT: yperez@ebi.ac.uk.


Subject(s)
Computational Biology/methods , Software , Genomics/methods , Metabolomics/methods , Proteomics/methods
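As a rough sketch of how a containerized tool might be invoked from a pipeline script, the Python below assembles a `docker run` command that mounts a host directory into the container. The image name `example/tool:1.0`, the tool's command-line arguments and the paths are hypothetical placeholders, not real BioContainers images; only the standard `docker run` flags `--rm`, `-v` and `-w` are assumed.

```python
import shlex


def docker_run_command(image, tool_args, host_dir, container_dir="/data"):
    """Build an argv list that runs a containerized tool on a mounted directory."""
    return [
        "docker", "run", "--rm",              # remove the container when it exits
        "-v", f"{host_dir}:{container_dir}",  # mount host data into the container
        "-w", container_dir,                  # run the tool inside the mounted directory
        image,
        *tool_args,
    ]


cmd = docker_run_command(
    image="example/tool:1.0",                      # hypothetical image name and version tag
    tool_args=["tool", "--input", "reads.fastq"],  # hypothetical tool command line
    host_dir="/home/user/project",
)
print(shlex.join(cmd))
```

Pinning the version in the image tag is what lets several versions of the same tool coexist on one machine. The resulting list can be passed to `subprocess.run(cmd, check=True)` on a host where Docker is installed.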
4.
Bioinformatics ; 31(1): 140-2, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25189782

ABSTRACT

SUMMARY: Rapid technological advances have led to an explosion of biomedical data in recent years. The pace of change has inspired new collaborative approaches for sharing materials and resources to help train life scientists both in the use of cutting-edge bioinformatics tools and databases and in how to analyse and interpret large datasets. A prototype platform for sharing such training resources was recently created by the Bioinformatics Training Network (BTN). Building on this work, we have created a centralized portal for sharing training materials and courses, including a catalogue of trainers and course organizers, and an announcement service for training events. For course organizers, the portal provides opportunities to promote their training events; for trainers, the portal offers an environment for sharing materials, for gaining visibility for their work and promoting their skills; for trainees, it offers a convenient one-stop shop for finding suitable training resources and identifying relevant training events and activities locally and worldwide. AVAILABILITY AND IMPLEMENTATION: http://mygoblet.org/training-portal.


Subject(s)
Computational Biology/education , Curriculum , Database Management Systems , Research Personnel/education , Teaching , Humans , Programming Languages , Software Design
5.
Brief Bioinform ; 14(5): 528-37, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23803301

ABSTRACT

The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.


Subject(s)
Biological Science Disciplines/education , Computational Biology/education , Curriculum , Data Mining , Database Management Systems , Programming Languages , Software Design , Teaching
6.
Circ Res ; 113(9): 1043-53, 2013 Oct 12.
Article in English | MEDLINE | ID: mdl-23965338

ABSTRACT

RATIONALE: Omics sciences enable a systems-level perspective in characterizing cardiovascular biology. Integration of diverse proteomics data via a computational strategy will catalyze the assembly of contextualized knowledge, foster discoveries through multidisciplinary investigations, and minimize unnecessary redundancy in research efforts. OBJECTIVE: The goal of this project is to develop a consolidated cardiac proteome knowledgebase with a novel bioinformatics pipeline and Web portals, thereby serving as a new resource to advance cardiovascular biology and medicine. METHODS AND RESULTS: We created the Cardiac Organellar Protein Atlas Knowledgebase (COPaKB; www.HeartProteome.org), a centralized platform of high-quality cardiac proteomic data, bioinformatics tools, and relevant cardiovascular phenotypes. Currently, COPaKB features 8 organellar modules, comprising 4203 LC-MS/MS experiments from human, mouse, Drosophila, and Caenorhabditis elegans, as well as expression images of 10,924 proteins in human myocardium. In addition, the Java-coded bioinformatics tools provided by COPaKB enable cardiovascular investigators in all disciplines to retrieve and analyze pertinent organellar protein properties of interest. CONCLUSIONS: COPaKB provides an innovative and interactive resource that connects research interests with the new biological discoveries in protein sciences. With an array of intuitive tools in this unified Web server, nonproteomics investigators can conveniently collaborate with proteomics specialists to dissect the molecular signatures of cardiovascular phenotypes.


Subject(s)
Databases, Protein , Knowledge Bases , Muscle Proteins/metabolism , Myocardium/metabolism , Proteomics/methods , Systems Biology , Systems Integration , Access to Information , Animals , Caenorhabditis elegans , Diffusion of Innovation , Drosophila , Humans , Mice , Software Design , Workflow
7.
Nucleic Acids Res ; 41(Web Server issue): W601-6, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23671334

ABSTRACT

The Proteomics Standard Initiative Common QUery InterfaCe (PSICQUIC) specification was created by the Human Proteome Organization Proteomics Standards Initiative (HUPO-PSI) to enable computational access to molecular-interaction data resources by means of a standard Web Service and query language. Currently providing >150 million binary interaction evidences from 28 servers globally, the PSICQUIC interface allows the concurrent search of multiple molecular-interaction information resources using a single query. Here, we present an extension of the PSICQUIC specification (version 1.3), which has been released to be compliant with the enhanced standards in molecular interactions. The new release also includes a new reference implementation of the PSICQUIC server available to the data providers. It offers augmented web service capabilities and improves the user experience. PSICQUIC has been running for almost 5 years, with a user base growing from only 4 data providers to 28 (April 2013) allowing access to 151 310 109 binary interactions. The power of this web service is shown in PSICQUIC View web application, an example of how to simultaneously query, browse and download results from the different PSICQUIC servers. This application is free and open to all users with no login requirement (http://www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml).


Subject(s)
Proteomics/standards , Software , Internet
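The idea of searching several interaction resources concurrently with a single query can be sketched in Python as a simple fan-out. This is not the PSICQUIC client API: the service names, URLs and the injected `fetch` callable are hypothetical stand-ins, and a real client would issue web-service requests where the stub below returns canned rows.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(services, query, fetch, max_workers=8):
    """Send one query to every service concurrently.

    services: mapping of service name -> base URL (hypothetical values here)
    fetch:    callable(base_url, query) -> list of result rows, supplied by the caller
    Returns (results, errors), each keyed by service name.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fetch, url, query)
                   for name, url in services.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One unavailable server should not sink the whole search.
                errors[name] = exc
    return results, errors


def fake_fetch(base_url, query):
    """Stand-in for a real web-service call, used only for demonstration."""
    if "down" in base_url:
        raise ConnectionError("service unavailable")
    return [f"{query}@{base_url}"]


services = {
    "alpha": "https://alpha.example.org",
    "beta": "https://down.example.org",
}
results, errors = query_all(services, "brca2", fake_fetch)
print(sorted(results), sorted(errors))
```

Collecting failures per service rather than raising on the first one reflects the federated setting, where any of the independent providers may be temporarily unreachable.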
8.
Brief Bioinform ; 13(3): 383-9, 2012 May.
Article in English | MEDLINE | ID: mdl-22110242

ABSTRACT

Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response to the development of 'high-throughput biology', the need for training in the field of bioinformatics, in particular, is seeing a resurgence: it has been defined as a key priority by many institutions and research programmes and is now an important component of many grant proposals. Nevertheless, when it comes to planning and preparing to meet such training needs, tension arises between the reward structures that predominate in the scientific community, which compel individuals to publish or perish, and the time that must be devoted to the design, delivery and maintenance of high-quality training materials. Conversely, there is much relevant teaching material and training expertise available worldwide that, were it properly organized, could be exploited by anyone who needs to provide training or needs to set up a new course. To do this, however, the materials would have to be centralized in a database and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs, and review it in relation to similar initiatives and collections.


Subject(s)
Computational Biology/education , Community Networks , Humans , Research Personnel/education
9.
Bioinformatics ; 29(8): 1103-4, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23435069

ABSTRACT

SUMMARY: BioJS is an open-source project whose main objective is the visualization of biological data in JavaScript. BioJS provides an easy-to-use consistent framework for bioinformatics application programmers. It follows a community-driven standard specification that includes a collection of components purposely designed to require a very simple configuration and installation. In addition to the programming framework, BioJS provides a centralized repository of components available for reutilization by the bioinformatics community. AVAILABILITY AND IMPLEMENTATION: http://code.google.com/p/biojs/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computer Graphics , Software , Programming Languages
10.
Bioinformatics ; 29(15): 1919-21, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23742982

ABSTRACT

SUMMARY: We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available. AVAILABILITY: http://iann.pro/iannviewer CONTACT: manuel.corpas@tgac.ac.uk.


Subject(s)
Biological Science Disciplines , Software , Anniversaries and Special Events , Congresses as Topic , Internet
11.
Nucleic Acids Res ; 40(Database issue): D841-6, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22121220

ABSTRACT

IntAct is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. Two levels of curation are now available within the database, with both IMEx-level annotation and less detailed MIMIx-compatible entries currently supported. As from September 2011, IntAct contains approximately 275,000 curated binary interaction evidences from over 5000 publications. The IntAct website has been improved to enhance the search process and in particular the graphical display of the results. New data download formats are also available, which will facilitate the inclusion of IntAct's data in the Semantic Web. IntAct is an active contributor to the IMEx consortium (http://www.imexconsortium.org). IntAct source code and data are freely available at http://www.ebi.ac.uk/intact.


Subject(s)
Databases, Protein , Protein Interaction Mapping , Computer Graphics , Genes , Internet , Molecular Sequence Annotation , Sequence Analysis, Protein , Software
12.
PLoS Comput Biol ; 8(12): e1002789, 2012.
Article in English | MEDLINE | ID: mdl-23300402

ABSTRACT

This article aims to introduce the nature of data integration to life scientists. Generally, the subject of data integration is not discussed outside the field of computational science and is covered in little detail, or neglected altogether, when teaching/training trainees. End users (hereby defined as wet-lab trainees, clinicians, lab researchers) will mostly interact with bioinformatics resources and tools through web interfaces that hide the data integration processes from the user. However, the lack of formal training or acquaintance with even simple database concepts and terminology often results in a real obstacle to the full comprehension of the resources and tools the end users wish to access. Understanding how data integration works is fundamental to empowering trainees to see the limitations as well as the possibilities when exploring, retrieving, and analysing biological data from databases. Here we introduce a game-based learning activity for training/teaching the topic of data integration that trainers/educators can adopt and adapt for their classroom. In particular, we provide an example using DAS (Distributed Annotation System) as a method for data integration.


Subject(s)
Biology/education , Education/methods , Game Theory , Internet , User-Computer Interface
13.
Bioinformatics ; 27(18): 2616-7, 2011 Sep 15.
Article in English | MEDLINE | ID: mdl-21798964

ABSTRACT

MOTIVATION: Dasty3 is a highly interactive and extensible Web-based framework. It provides a rich Application Programming Interface upon which it is possible to develop specialized clients capable of retrieving information from DAS sources as well as from data providers not using the DAS protocol. Dasty3 provides significant improvements on previous Web-based frameworks and is implemented using the 1.6 DAS specification. AVAILABILITY: Dasty3 is an open-source tool freely available at http://www.ebi.ac.uk/dasty/ under the terms of the GNU General Public License. Source and documentation can be found at http://code.google.com/p/dasty/. CONTACT: hhe@ebi.ac.uk.


Subject(s)
Databases, Genetic , Databases, Protein , Software , Computational Biology/methods , Internet , Registries
14.
BMC Bioinformatics ; 12: 143, 2011 May 10.
Article in English | MEDLINE | ID: mdl-21569281

ABSTRACT

BACKGROUND: Centralised resources such as GenBank and UniProt are perfect examples of the major international efforts that have been made to integrate and share biological information. However, additional data that adds value to these resources needs a simple and rapid route to public access. The Distributed Annotation System (DAS) provides an adequate environment to integrate genomic and proteomic information from multiple sources, making this information accessible to the community. DAS offers a way to distribute and access information but it does not provide domain experts with the mechanisms to participate in the curation process of the available biological entities and their annotations. RESULTS: We designed and developed a Collaborative Annotation System for proteins called DAS Writeback. DAS Writeback is a protocol extension of DAS that provides the functionality for adding, editing and deleting annotations. We implemented this new specification as extensions of both a DAS server and a DAS client. The architecture was designed with the involvement of the DAS community and it was improved after performing usability experiments emulating a real annotation task. CONCLUSIONS: We demonstrate that DAS Writeback is effective, usable and will provide the appropriate environment for the creation and evolution of community protein annotation.


Subject(s)
Databases, Genetic , Information Storage and Retrieval , Software , Computer Communication Networks , Molecular Sequence Annotation
15.
BMC Bioinformatics ; 12: 23, 2011 Jan 18.
Article in English | MEDLINE | ID: mdl-21244646

ABSTRACT

BACKGROUND: The Distributed Annotation System (DAS) has proven to be a successful way to publish and share biological data. Although there are more than 750 active registered servers from around 50 organizations, setting up a DAS server involves a fair amount of work, making it difficult for many research groups to share their biological annotations. Given the clear advantage that generalized sharing of relevant biological data offers the research community, it would be desirable to facilitate the sharing process. RESULTS: Here we present easyDAS, a web-based system enabling anyone to publish biological annotations with just a few clicks. The system, available at http://www.ebi.ac.uk/panda-srv/easydas, is capable of reading different standard data file formats, processing the data and creating a new publicly available DAS source in a completely automated way. The created sources are hosted on the EBI systems and can take advantage of their high storage capacity and network connection, freeing the data provider from any network management work. easyDAS is an open source project under the GNU LGPL license. CONCLUSIONS: easyDAS is an automated DAS source creation system which can help many researchers in sharing their biological data, potentially increasing the amount of relevant biological data available to the scientific community.


Subject(s)
Molecular Sequence Annotation , Software , Computer Communication Networks , Internet
17.
Bioinformatics ; 24(18): 2119-21, 2008 Sep 15.
Article in English | MEDLINE | ID: mdl-18694895

ABSTRACT

SUMMARY: Dasty2 is a highly interactive web client integrating protein sequence annotations from currently more than 40 sources, using the distributed annotation system (DAS). AVAILABILITY: Dasty2 is an open source tool freely available under the terms of the Apache License 2.0, publicly available at http://www.ebi.ac.uk/dasty/.


Subject(s)
Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Computational Biology/methods , Databases, Protein , User-Computer Interface
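Integrating positional annotations from several sources, as described in the abstract above, can be pictured with a small Python sketch that merges per-source feature lists for one protein into a single position-ordered view. The `Feature` record, the source names and the example coordinates are invented for illustration and do not reflect the DAS data model or Dasty2's implementation.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    start: int   # first residue covered (1-based, inclusive)
    end: int     # last residue covered (inclusive)
    label: str
    source: str  # which annotation source supplied the feature


def merge_annotations(per_source):
    """Flatten {source: [(start, end, label), ...]} into one position-sorted list."""
    merged = [
        Feature(start, end, label, source)
        for source, features in per_source.items()
        for start, end, label in features
    ]
    return sorted(merged, key=lambda f: (f.start, f.end, f.source))


def features_at(merged, position):
    """Return every feature, from any source, that covers the given residue."""
    return [f for f in merged if f.start <= position <= f.end]


# Invented example data for a single protein sequence.
per_source = {
    "source_a": [(10, 50, "domain"), (120, 130, "binding site")],
    "source_b": [(1, 20, "signal peptide")],
}
merged = merge_annotations(per_source)
print([f.label for f in features_at(merged, 15)])
```

Tagging each feature with its source keeps provenance visible after merging, which matters when sources disagree about the same region.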
18.
Bioinformatics ; 24(23): 2767-72, 2008 Dec 01.
Article in English | MEDLINE | ID: mdl-18936051

ABSTRACT

MOTIVATION: The advent of sequencing and structural genomics projects has provided a dramatic boost in the number of uncharacterized protein structures and sequences. Consequently, many computational tools have been developed to help elucidate protein function. However, such services are spread throughout the world, often with standalone web pages. Integration of these methods is needed and so far this has not been possible as there was no common vocabulary available that could be used as a standard language. RESULTS: The Protein Feature Ontology has been developed to provide a structured controlled vocabulary for features on a protein sequence or structure and comprises approximately 100 positional terms, now integrated into the Sequence Ontology (SO) and 40 non-positional terms which describe features relating to the whole-protein sequence. In addition, post-translational modifications are described by using a pre-existing ontology, the Protein Modification Ontology (MOD). This ontology is being used to integrate over 150 distinct annotations provided by the BioSapiens Network of Excellence, a consortium comprising 19 partner sites in Europe. AVAILABILITY: The Protein Feature Ontology can be browsed by accessing the ontology lookup service at the European Bioinformatics Institute (http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS).


Subject(s)
Computational Biology/methods , Proteins/chemistry , Software , Vocabulary, Controlled , Databases, Protein , Internet , Proteins/metabolism , Proteome/genetics
19.
Drug Discov Today ; 24(4): 933-938, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30690198

ABSTRACT

Biopharmaceutical industry R&D, and indeed other life sciences R&D such as biomedical, environmental, agricultural and food production, is becoming increasingly data-driven and can significantly improve its efficiency and effectiveness by implementing the FAIR (findable, accessible, interoperable, reusable) guiding principles for scientific data management and stewardship. By so doing, the plethora of new and powerful analytical tools such as artificial intelligence and machine learning will be able, automatically and at scale, to access the data from which they learn, and on which they thrive. FAIR is a fundamental enabler for digital transformation.


Subject(s)
Data Management , Drug Industry , Biological Products , Biomedical Research
20.
BMC Bioinformatics ; 9: 437, 2008 Oct 16.
Article in English | MEDLINE | ID: mdl-18925933

ABSTRACT

BACKGROUND: Ontologies such as the Gene Ontology can enable the construction of complex queries over biological information in a conceptual way; however, existing systems for doing this are too technical. Within the biological domain there is an increasing need for software that facilitates the flexible retrieval of information. OntoDas aims to fulfil this need by allowing the definition of queries by selecting valid ontology terms. RESULTS: OntoDas is a web-based tool that uses information visualisation techniques to provide an intuitive, interactive environment for constructing ontology-based queries against the Gene Ontology Database. Both a comprehensive use case and the interface itself were designed in a participatory manner by working with biologists to ensure that the interface matches the way biologists work. OntoDas was further tested with a separate group of biologists and refined based on their suggestions. CONCLUSION: OntoDas provides a visual and intuitive means for constructing complex queries against the Gene Ontology. It was designed with the participation of biologists and compares favourably with similar tools. It is available at http://ontodas.nbn.ac.za.


Subject(s)
Computational Biology/methods , Database Management Systems , Natural Language Processing , User-Computer Interface , Computer Graphics , Databases, Genetic/statistics & numerical data , Systems Integration , Terminology as Topic , Vocabulary, Controlled