Search | VHL Regional Portal

BioHackathon 2015: Semantics of data for life sciences and reproducible research.

Vos, Rutger A; Katayama, Toshiaki; Mishima, Hiroyuki; Kawano, Shin; Kawashima, Shuichi; Kim, Jin-Dong; Moriya, Yuki; Tokimatsu, Toshiaki; Yamaguchi, Atsuko; Yamamoto, Yasunori; Wu, Hongyan; Amstutz, Peter; Antezana, Erick; Aoki, Nobuyuki P; Arakawa, Kazuharu; Bolleman, Jerven T; Bolton, Evan; Bonnal, Raoul J P; Bono, Hidemasa; Burger, Kees; Chiba, Hirokazu; Cohen, Kevin B; Deutsch, Eric W; Fernández-Breis, Jesualdo T; Fu, Gang; Fujisawa, Takatomo; Fukushima, Atsushi; García, Alexander; Goto, Naohisa; Groza, Tudor; Hercus, Colin; Hoehndorf, Robert; Itaya, Kotone; Juty, Nick; Kawashima, Takeshi; Kim, Jee-Hyub; Kinjo, Akira R; Kotera, Masaaki; Kozaki, Kouji; Kumagai, Sadahiro; Kushida, Tatsuya; Lütteke, Thomas; Matsubara, Masaaki; Miyamoto, Joe; Mohsen, Attayeb; Mori, Hiroshi; Naito, Yuki; Nakazato, Takeru; Nguyen-Xuan, Jeremy; Nishida, Kozo.

F1000Res ; 9: 136, 2020.

Article in English | MEDLINE | ID: mdl-32308977

ABSTRACT

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.

Subject(s)

Biological Science Disciplines , Computational Biology , Semantic Web , Data Mining , Metadata , Reproducibility of Results

The health care and life sciences community profile for dataset descriptions.

Dumontier, Michel; Gray, Alasdair J G; Marshall, M Scott; Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A; Gonzalez-Beltran, Alejandra N; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J; Rietveld, Laurens; Wimalaratne, Sarala M; Yamaguchi, Atsuko.

PeerJ ; 4: e2331, 2016.

Article in English | MEDLINE | ID: mdl-27602295

ABSTRACT

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.

FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A.

J Biomed Semantics ; 7: 39, 2016 Jun 13.

Article in English | MEDLINE | ID: mdl-27296299

ABSTRACT

BACKGROUND: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. DESCRIPTION: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. CONCLUSIONS: Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.

Subject(s)

Biological Ontologies , Molecular Sequence Annotation/standards , Nucleotides/genetics , Nucleotides/metabolism , Proteins/chemistry , Proteins/metabolism , Semantics , Databases, Genetic , Databases, Protein , Fuzzy Logic , Humans , Reference Books

Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search.

Alocci, Davide; Mariethoz, Julien; Horlacher, Oliver; Bolleman, Jerven T; Campbell, Matthew P; Lisacek, Frederique.

PLoS One ; 10(12): e0144578, 2015.

Article in English | MEDLINE | ID: mdl-26656740

ABSTRACT

Resource description framework (RDF) and Property Graph databases are emerging technologies that are used for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded into a direct acyclic graph where each node represents a building block and each edge serves as a chemical linkage between two building blocks. In this context, Graph databases are possible software solutions for storing glycan structures and Graph query languages, such as SPARQL and Cypher, can be used to perform a substructure search. Glycan substructure searching is an important feature for querying structure and experimental glycan databases and retrieving biologically meaningful data. This applies for example to identifying a region of the glycan recognised by a glycan binding protein (GBP). In this study, 19,404 glycan structures were selected from GlycomeDB (www.glycome-db.org) and modelled for being stored into a RDF triple store and a Property Graph. We then performed two different sets of searches and compared the query response times and the results from both technologies to assess performance and accuracy. The two implementations produced the same results, but interestingly we noted a difference in the query response times. Qualitative measures such as portability were also used to define further criteria for choosing the technology adapted to solving glycan substructure search and other comparable issues.

Subject(s)

Databases, Factual , Information Storage and Retrieval , Polysaccharides/metabolism , Computational Biology , Molecular Structure , Software

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL