Pesquisa | Portal Regional da BVS

1.

A Survey of Bioinformatics Database and Software Usage through Mining the Literature.

Duck, Geraint; Nenadic, Goran; Filannino, Michele; Brass, Andy; Robertson, David L; Stevens, Robert.

PLoS One ; 11(6): e0157989, 2016.

Artigo em Inglês | MEDLINE | ID: mdl-27331905

RESUMO

Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differs between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being only mentioned once each. While these results highlight the dynamic and creative nature of bioinformatics research they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.

Assuntos

Biologia Computacional/métodos , Mineração de Dados , Bases de Dados Genéticas , PubMed , Software , Biologia , Análise por Conglomerados , Medicina , Modelos Teóricos , Publicações Periódicas como Assunto , Reprodutibilidade dos Testes , Fatores de Tempo

2.

Ambiguity and variability of database and software names in bioinformatics.

Duck, Geraint; Kovacevic, Aleksandar; Robertson, David L; Stevens, Robert; Nenadic, Goran.

J Biomed Semantics ; 6: 29, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-26131352

RESUMO

BACKGROUND: There are numerous options available to achieve various tasks in bioinformatics, but until recently, there were no tools that could systematically identify mentions of databases and tools within the literature. In this paper we explore the variability and ambiguity of database and software name mentions and compare dictionary and machine learning approaches to their identification. RESULTS: Through the development and analysis of a corpus of 60 full-text documents manually annotated at the mention level, we report high variability and ambiguity in database and software mentions. On a test set of 25 full-text documents, a baseline dictionary look-up achieved an F-score of 46 %, highlighting not only variability and ambiguity but also the extensive number of new resources introduced. A machine learning approach achieved an F-score of 63 % (with precision of 74 %) and 70 % (with precision of 83 %) for strict and lenient matching respectively. We characterise the issues with various mention types and propose potential ways of capturing additional database and software mentions in the literature. CONCLUSIONS: Our analyses show that identification of mentions of databases and tools is a challenging task that cannot be achieved by relying on current manually-curated resource repositories. Although machine learning shows improvement and promise (primarily in precision), more contextual information needs to be taken into account to achieve a good degree of accuracy.

3.

Extracting patterns of database and software usage from the bioinformatics literature.

Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L; Stevens, Robert.

Bioinformatics ; 30(17): i601-8, 2014 Sep 01.

Artigo em Inglês | MEDLINE | ID: mdl-25161253

RESUMO

MOTIVATION: As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools. RESULTS: We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice. AVAILABILITY AND IMPLEMENTATION: The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/.

Assuntos

Biologia Computacional , Bases de Dados Factuais/estatística & dados numéricos , Software , Mineração de Dados , Publicações Periódicas como Assunto , Filogenia

4.

The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery.

Dumontier, Michel; Baker, Christopher Jo; Baran, Joachim; Callahan, Alison; Chepelev, Leonid; Cruz-Toledo, José; Del Rio, Nicholas R; Duck, Geraint; Furlong, Laura I; Keath, Nichealla; Klassen, Dana; McCusker, Jamie P.; Queralt-Rosinach, Núria; Samwald, Matthias; Villanueva-Rosales, Natalia; Wilkinson, Mark D; Hoehndorf, Robert.

J Biomed Semantics ; 5(1): 14, 2014 03 06.

Artigo em Inglês | MEDLINE | ID: mdl-24602174

RESUMO

The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level comprised of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a creative commons by attribution license. See website for further information: http://sio.semanticscience.org.

5.

BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains.

Katayama, Toshiaki; Wilkinson, Mark D; Aoki-Kinoshita, Kiyoko F; Kawashima, Shuichi; Yamamoto, Yasunori; Yamaguchi, Atsuko; Okamoto, Shinobu; Kawano, Shin; Kim, Jin-Dong; Wang, Yue; Wu, Hongyan; Kano, Yoshinobu; Ono, Hiromasa; Bono, Hidemasa; Kocbek, Simon; Aerts, Jan; Akune, Yukie; Antezana, Erick; Arakawa, Kazuharu; Aranda, Bruno; Baran, Joachim; Bolleman, Jerven; Bonnal, Raoul Jp; Buttigieg, Pier Luigi; Campbell, Matthew P; Chen, Yi-An; Chiba, Hirokazu; Cock, Peter Ja; Cohen, K Bretonnel; Constantin, Alexandru; Duck, Geraint; Dumontier, Michel; Fujisawa, Takatomo; Fujiwara, Toyofumi; Goto, Naohisa; Hoehndorf, Robert; Igarashi, Yoshinobu; Itaya, Hidetoshi; Ito, Maori; Iwasaki, Wataru; Kalas, Matús; Katoda, Takeo; Kim, Taehong; Kokubu, Anna; Komiyama, Yusuke; Kotera, Masaaki; Laibe, Camille; Lapp, Hilmar; Lütteke, Thomas; Marshall, M Scott.

J Biomed Semantics ; 5(1): 5, 2014 Feb 05.

Artigo em Inglês | MEDLINE | ID: mdl-24495517

RESUMO

The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.

6.

bioNerDS: exploring bioinformatics' database and software use through literature mining.

Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L; Stevens, Robert.

BMC Bioinformatics ; 14: 194, 2013 Jun 15.

Artigo em Inglês | MEDLINE | ID: mdl-23768135

RESUMO

BACKGROUND: Biology-focused databases and software define bioinformatics and their use is central to computational biology. In such a complex and dynamic field, it is of interest to understand what resources are available, which are used, how much they are used, and for what they are used. While scholarly literature surveys can provide some insights, large-scale computer-based approaches to identify mentions of bioinformatics databases and software from primary literature would automate systematic cataloguing, facilitate the monitoring of usage, and provide the foundations for the recovery of computational methods for analysing biological data, with the long-term aim of identifying best/common practice in different areas of biology. RESULTS: We have developed bioNerDS, a named entity recogniser for the recovery of bioinformatics databases and software from primary literature. We identify such entities with an F-measure ranging from 63% to 91% at the mention level and 63-78% at the document level, depending on corpus. Not attaining a higher F-measure is mostly due to high ambiguity in resource naming, which is compounded by the on-going introduction of new resources. To demonstrate the software, we applied bioNerDS to full-text articles from BMC Bioinformatics and Genome Biology. General mention patterns reflect the remit of these journals, highlighting BMC Bioinformatics's emphasis on new tools and Genome Biology's greater emphasis on data analysis. The data also illustrates some shifts in resource usage: for example, the past decade has seen R and the Gene Ontology join BLAST and GenBank as the main components in bioinformatics processing. ABSTRACT: Conclusions We demonstrate the feasibility of automatically identifying resource names on a large-scale from the scientific literature and show that the generated data can be used for exploration of bioinformatics database and software usage. For example, our results help to investigate the rate of change in resource usage and corroborate the suspicion that a vast majority of resources are created, but rarely (if ever) used thereafter. bioNerDS is available at http://bionerds.sourceforge.net/.

Assuntos

Biologia Computacional/métodos , Mineração de Dados/métodos , Bases de Dados Factuais , Software , Genômica , Vocabulário Controlado

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA