Search | Secretaria de Estado da Saúde (State Health Department)

1.

Page, Roderic D M.

PeerJ ; 10: e13712, 2022.

Article in English | MEDLINE | ID: mdl-35821898

ABSTRACT

Biological taxonomy rests on a long tail of publications spanning nearly three centuries. Not only is this literature vital to resolving disputes about taxonomy and nomenclature, for many species it represents a key source (indeed, sometimes the only source) of information about that species. Unlike other disciplines such as biomedicine, the taxonomic community lacks a centralised, curated literature database (the "bibliography of life"). This article argues that Wikidata can be that database, as it has flexible and sophisticated models of bibliographic information and an active community of people and programs ("bots") adding, editing, and curating that information.

Subjects

Software , Humans , Databases, Factual

2.

Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library.

Page, Roderic D M.

BMC Bioinformatics ; 12: 187, 2011 May 23.

Article in English | MEDLINE | ID: mdl-21605356

ABSTRACT

BACKGROUND: The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. DESCRIPTION: A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article locating service is exposed as a standard OpenURL resolver on the BioStor web site http://biostor.org/openurl/. This resolver can be used on the web, or called by bibliographic tools that support OpenURL. CONCLUSIONS: BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/.
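The abstract names approximate string matching as one way of matching citation metadata to BHL metadata. A minimal sketch of that idea follows, assuming Python's standard-library `difflib.SequenceMatcher` as the similarity measure, a hypothetical list of BHL journal titles, and an illustrative threshold of 0.8; BioStor's own matching code, normalisation rules, and thresholds are not described here.

```python
import re
from difflib import SequenceMatcher


def normalise(title: str) -> str:
    """Lower-case, drop punctuation, and collapse whitespace before comparing titles."""
    title = title.lower()
    title = re.sub(r"[^\w\s]", " ", title)
    return " ".join(title.split())


def best_title_match(cited: str, candidates: list[str], threshold: float = 0.8):
    """Return (candidate, score) for the most similar title, or None if no score reaches threshold.

    The threshold is an illustrative assumption, not a value taken from BioStor.
    """
    target = normalise(cited)
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, target, normalise(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best is None or best_score < threshold:
        return None
    return best, best_score


# Hypothetical stand-ins for BHL title metadata.
bhl_titles = [
    "Proceedings of the Zoological Society of London",
    "Annals and Magazine of Natural History",
    "Journal of the Bombay Natural History Society",
]
print(best_title_match("Annals & Magazine of Natural History", bhl_titles))
```

A citation that abbreviates or re-punctuates a journal name still scores highly against the right title, while an unrelated string falls below the threshold and returns `None`.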

Subjects

Biology , Information Storage and Retrieval , Libraries, Digital , Publications , Archives , Biodiversity , Periodicals as Topic

3.

Biodiversity informatics: the challenge of linking data and the role of shared identifiers.

Page, Roderic D M.

Brief Bioinform ; 9(5): 345-54, 2008 Sep.

Article in English | MEDLINE | ID: mdl-18445641

ABSTRACT

A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that a lot of information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers [such as Digital Object Identifiers (DOIs) and Life Science Identifiers (LSIDs)], and the implementation of services that link those identifiers.
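The abstract notes that the link structure between records can be exploited with the PageRank algorithm. A minimal power-iteration sketch follows, assuming a small hypothetical graph in which records (sequences, a paper) link to the specimen they cite; the damping factor 0.85 is the conventional value from the original PageRank formulation, not necessarily the one used in the review.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank; links maps each node to the nodes it links to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical records linked through shared identifiers.
example_links = {
    "seq1": ["specimen1"],
    "seq2": ["specimen1"],
    "paper1": ["specimen1", "seq1"],
    "specimen1": [],
}
ranks = pagerank(example_links)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{node}\t{score:.3f}")
```

In this toy graph the specimen, which every other record points to, receives the highest score, which is the kind of ranking signal the abstract describes for search results.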

Subjects

Computational Biology/methods , Database Management Systems , Databases, Factual , Documentation/methods , Information Storage and Retrieval/methods , Natural Language Processing , Terminology as Topic

4.

People are essential to linking biodiversity data.

Groom, Quentin; Güntsch, Anton; Huybrechts, Pieter; Kearney, Nicole; Leachman, Siobhan; Nicolson, Nicky; Page, Roderic D M; Shorthouse, David P; Thessen, Anne E; Haston, Elspeth.

Database (Oxford) ; 2020, 2020 Nov 27.

Article in English | MEDLINE | ID: mdl-33439246

ABSTRACT

People are one of the best known and most stable entities in the biodiversity knowledge graph. The wealth of public information associated with people and the ability to identify them uniquely open up the possibility to make more use of these data in biodiversity science. Person data are almost always associated with entities such as specimens, molecular sequences, taxonomic names, observations, images, traits and publications. For example, the digitization and the aggregation of specimen data from museums and herbaria allow us to view a scientist's specimen collecting in conjunction with the whole corpus of their works. However, the metadata of these entities are also useful in validating data, integrating data across collections and institutional databases and can be the basis of future research into biodiversity and science. In addition, the ability to reliably credit collectors for their work has the potential to change the incentive structure to promote improved curation and maintenance of natural history collections.

Subjects

Biodiversity , Natural History , Databases, Factual , Humans , Museums

5.

bioGUID: resolving, discovering, and minting identifiers for biodiversity informatics.

Page, Roderic D M.

BMC Bioinformatics ; 10 Suppl 14: S5, 2009 Nov 10.

Article in English | MEDLINE | ID: mdl-19900301

ABSTRACT

BACKGROUND: Linking together the data of interest to biodiversity researchers (including specimen records, images, taxonomic names, and DNA sequences) requires services that can mint, resolve, and discover globally unique identifiers (including, but not limited to, DOIs, HTTP URIs, and LSIDs). RESULTS: bioGUID implements a range of services, the core ones being an OpenURL resolver for bibliographic resources, and a LSID resolver. The LSID resolver supports Linked Data-friendly resolution using HTTP 303 redirects and content negotiation. Additional services include journal ISSN look-up, author name matching, and a tool to monitor the status of biodiversity data providers. CONCLUSION: bioGUID is available at http://bioguid.info/. Source code is available from http://code.google.com/p/bioguid/.

Subjects

Biodiversity , Computational Biology , Databases, Factual , Humans , Internet

6.

Ozymandias: a biodiversity knowledge graph.

Page, Roderic D M.

PeerJ ; 7: e6739, 2019.

Article in English | MEDLINE | ID: mdl-30993051

ABSTRACT

Enormous quantities of biodiversity data are being made available online, but much of this data remains isolated in silos. One approach to breaking these silos is to map local, often database-specific identifiers to shared global identifiers. This mapping can then be used to construct a knowledge graph, where entities such as taxa, publications, people, places, specimens, sequences, and institutions are all part of a single, shared knowledge space. Motivated by the 2018 GBIF Ebbe Nielsen Challenge, I explore the feasibility of constructing a "biodiversity knowledge graph" for the Australian fauna. The data cleaning and reconciliation steps involved in constructing the knowledge graph are described in detail. Examples are given of its application to understanding changes in patterns of taxonomic publication over time. A web interface to the knowledge graph (called "Ozymandias") is available at https://ozymandias-demo.herokuapp.com.

7.

TBMap: a taxonomic perspective on the phylogenetic database TreeBASE.

Page, Roderic D M.

BMC Bioinformatics ; 8: 158, 2007 May 18.

Article in English | MEDLINE | ID: mdl-17511869

ABSTRACT

BACKGROUND: TreeBASE is currently the only available large-scale database of published organismal phylogenies. Its utility is hampered by a lack of taxonomic consistency, both within the database and with names of organisms in external genomic, specimen, and taxonomic databases. This lack of consistency limits the extent to which the phylogenetic knowledge in TreeBASE can be integrated with these other sources. DESCRIPTION: Taxonomic names in TreeBASE were mapped onto names in the external taxonomic databases IPNI, ITIS, NCBI, and uBio, and a graph G of these mappings was constructed. Additional edges representing taxonomic synonymies were added to G, then all components of G were extracted. These components correspond to "name clusters", and group together names in TreeBASE that are inferred to refer to the same taxon. The mapping to NCBI enables hierarchical queries to be performed, which can improve TreeBASE information retrieval by an order of magnitude. CONCLUSION: The TBMap database provides a mapping of the bulk of the names in TreeBASE to names in external taxonomic databases, and a clustering of those mappings into sets of names that can be regarded as equivalent. This mapping enables queries and visualisations that cannot otherwise be constructed. A simple query interface to the mapping and name clusters is available at http://linnaeus.zoology.gla.ac.uk/~rpage/tbmap.
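The abstract describes building a graph G from name mappings plus synonymy edges and then extracting its components as "name clusters". A minimal sketch of the component-extraction step follows, assuming a union-find (disjoint-set) structure as the component algorithm and entirely hypothetical placeholder names ("Aus bus", "Xus bus", "Cus dus") prefixed with their source database; TBMap's actual implementation is not described here.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest over string labels, with path compression."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def name_clusters(edges: list[tuple[str, str]]) -> list[list[str]]:
    """Group names into the connected components ("name clusters") of the edge graph."""
    uf = UnionFind()
    for a, b in edges:
        uf.union(a, b)
    groups: dict[str, set[str]] = defaultdict(set)
    for node in list(uf.parent):
        groups[uf.find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical placeholder names, not real taxa.
example_edges = [
    ("TreeBASE:Aus bus", "NCBI:Aus bus"),   # mapping edge
    ("TreeBASE:Aus bus", "ITIS:Aus bus"),   # mapping edge
    ("TreeBASE:Xus bus", "ITIS:Xus bus"),   # mapping edge
    ("ITIS:Xus bus", "ITIS:Aus bus"),       # synonymy edge within ITIS
    ("TreeBASE:Cus dus", "IPNI:Cus dus"),   # mapping edge
]
for cluster in name_clusters(example_edges):
    print(cluster)
```

Here the synonymy edge pulls the two TreeBASE names "Aus bus" and "Xus bus" into one cluster, which is how such a clustering can infer that different TreeBASE names refer to the same taxon.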

Subjects

Classification/methods , Database Management Systems , Databases, Genetic , Documentation/methods , Models, Genetic , Phylogeny , User-Computer Interface , Base Sequence , Chromosome Mapping , Molecular Sequence Data , Sequence Analysis, DNA

8.

The shape of human gene family phylogenies.

Cotton, James A; Page, Roderic D M.

BMC Evol Biol ; 6: 66, 2006 Aug 29.

Article in English | MEDLINE | ID: mdl-16939643

ABSTRACT

BACKGROUND: The shape of phylogenetic trees has been used to make inferences about the evolutionary process by comparing the shapes of actual phylogenies with those expected under simple models of the speciation process. Previous studies have focused on speciation events, but gene duplication is another lineage-splitting event, analogous to speciation, and gene loss or deletion is analogous to extinction. Measures of the shape of gene family phylogenies can thus be used to investigate the processes of gene duplication and loss. We make the first systematic attempt to use tree shape to study gene duplication using human gene phylogenies. RESULTS: We find that gene duplication has produced gene family trees significantly less balanced than expected from a simple model of the process, and less balanced than species phylogenies: the opposite of what might be expected under the 2R hypothesis. CONCLUSION: While other explanations are plausible, we suggest that the greater imbalance of gene family trees than species trees is due to the prevalence of tandem duplications over regional duplications during the evolution of the human genome.
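The abstract compares tree "balance" without naming a statistic. One widely used measure is Colless's index, the sum over internal nodes of the absolute difference between the number of leaves in the left and right subtrees; it is used below only as an illustration, since the statistics chosen in the paper are not stated here. The sketch assumes binary trees written as nested 2-tuples with hypothetical leaf labels.

```python
def colless(tree) -> tuple[int, int]:
    """Return (number_of_leaves, colless_index) for a binary tree of nested 2-tuples.

    Any non-tuple value is treated as a leaf (e.g. a gene name).
    """
    if not isinstance(tree, tuple):
        return 1, 0
    left, right = tree
    n_left, c_left = colless(left)
    n_right, c_right = colless(right)
    # Each internal node contributes |leaves on the left - leaves on the right|.
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


# Hypothetical five-leaf trees: a fully pectinate ("caterpillar") tree and a more balanced one.
caterpillar = (((("a", "b"), "c"), "d"), "e")
balanced = (("a", "b"), (("c", "d"), "e"))
print(colless(caterpillar), colless(balanced))
```

For n leaves the index reaches its maximum, (n - 1)(n - 2)/2, on a caterpillar tree, so higher values indicate less balanced trees of the kind the abstract reports for human gene families.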

Subjects

Phylogeny , Gene Deletion , Gene Duplication , Humans/genetics , Models, Genetic

9.

Surfacing the deep data of taxonomy.

Page, Roderic D M.

Zookeys ; (550): 247-60, 2016.

Article in English | MEDLINE | ID: mdl-26877663

ABSTRACT

Taxonomic databases are perpetuating approaches to citing literature that may have been appropriate before the Internet, often being little more than digitised 5 × 3 index cards. Typically the original taxonomic literature is either not cited, or is represented in the form of a (typically abbreviated) text string. Hence much of the "deep data" of taxonomy, such as the original descriptions, revisions, and nomenclatural actions are largely hidden from all but the most resourceful users. At the same time there are burgeoning efforts to digitise the scientific literature, and much of this newly available content has been assigned globally unique identifiers such as Digital Object Identifiers (DOIs), which are also the identifier of choice for most modern publications. This represents an opportunity for taxonomic databases to engage with digitisation efforts. Mapping the taxonomic literature on to globally unique identifiers can be time consuming, but need be done only once. Furthermore, if we reuse existing identifiers, rather than mint our own, we can start to build the links between the diverse data that are needed to support the kinds of inference which biodiversity informatics aspires to support. Until this practice becomes widespread, the taxonomic literature will remain balkanized, and much of the knowledge that it contains will linger in obscurity.

10.

DNA barcoding and taxonomy: dark taxa and dark texts.

Page, Roderic D M.

Philos Trans R Soc Lond B Biol Sci ; 371(1702), 2016 Sep 05.

Article in English | MEDLINE | ID: mdl-27481786

ABSTRACT

Both classical taxonomy and DNA barcoding are engaged in the task of digitizing the living world. Much of the taxonomic literature remains undigitized. The rise of open access publishing this century and the freeing of older literature from the shackles of copyright have greatly increased the online availability of taxonomic descriptions, but much of the literature of the mid- to late-twentieth century remains offline ('dark texts'). DNA barcoding is generating a wealth of computable data that in many ways are much easier to work with than classical taxonomic descriptions, but many of the sequences are not identified to species level. These 'dark taxa' hamper the classical method of integrating biodiversity data, using shared taxonomic names. Voucher specimens are a potential common currency of both the taxonomic literature and sequence databases, and could be used to help link names, literature and sequences. An obstacle to this approach is the lack of stable, resolvable specimen identifiers. The paper concludes with an appeal for a global 'digital dashboard' to assess the extent to which biodiversity data are available online. This article is part of the themed issue 'From DNA barcodes to biomes'.

Subjects

Classification/methods , DNA Barcoding, Taxonomic , Periodicals as Topic , Biodiversity , Specimen Handling

11.

A Taxonomic Search Engine: federating taxonomic databases using web services.

Page, Roderic D M.

BMC Bioinformatics ; 6: 48, 2005 Mar 09.

Article in English | MEDLINE | ID: mdl-15757517

ABSTRACT

BACKGROUND: The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. RESULTS: The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. CONCLUSION: The Taxonomic Search Engine is available at http://darwin.zoology.gla.ac.uk/~rpage/portal/ and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names.

Subjects

Computational Biology/methods , Databases, Factual , Classification , Computer Communication Networks , Database Management Systems , Databases as Topic , Databases, Genetic , Databases, Protein , Information Dissemination , Information Services , Information Storage and Retrieval , Information Systems , Internet , Medical Informatics , National Institutes of Health (U.S.) , National Library of Medicine (U.S.) , Sequence Analysis, Protein , Software , Software Design , Systems Integration , Unified Medical Language System , United States , User-Computer Interface

12.

An edit script for taxonomic classifications.

Page, Roderic D M; Valiente, Gabriel.

BMC Bioinformatics ; 6: 208, 2005 Aug 25.

Article in English | MEDLINE | ID: mdl-16122379

ABSTRACT

BACKGROUND: The NCBI taxonomy provides one of the most powerful ways to navigate sequence databases, but currently users are forced to formulate queries according to a single taxonomic classification. Given that there is no universal agreement on the classification of organisms, providing a single classification constrains the questions biologists can ask. However, maintaining multiple classifications is burdensome in the face of a constantly growing NCBI classification. RESULTS: In this paper, we present a solution to the problem of generating modifications of the NCBI taxonomy, based on the computation of an edit script that summarises the differences between two classification trees. Our algorithms find the shortest possible edit script based on the identification of all shared subtrees, and take only quasi-linear time in the size of the trees because classification trees have unique node labels. CONCLUSION: These algorithms have recently been implemented, and the software is freely available for download from http://darwin.zoology.gla.ac.uk/~rpage/forest/.
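The abstract rests the algorithms on identifying all shared subtrees and on the fact that classification trees have unique node labels. A minimal sketch of that identification step alone follows; it is not the authors' algorithm and does not produce an edit script. It assumes each classification is given as a hypothetical `children` dictionary (label to child labels) and uses a toy pair of classifications for illustration. Because labels are unique, a subtree rooted at a given label is identical in both trees exactly when that label has the same set of children in both and each child's subtree is identical, so each label needs to be checked only once.

```python
def node_set(children: dict[str, list[str]]) -> set[str]:
    """All labels appearing in a classification, as parents or as children."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)
    return nodes


def maximal_shared_subtrees(children1: dict[str, list[str]],
                            children2: dict[str, list[str]]) -> set[str]:
    """Labels rooting the largest subtrees that are identical in both classifications."""
    nodes1, nodes2 = node_set(children1), node_set(children2)
    memo: dict[str, bool] = {}

    def identical(label: str) -> bool:
        if label not in memo:
            if label not in nodes1 or label not in nodes2:
                memo[label] = False
            else:
                kids1 = set(children1.get(label, []))
                kids2 = set(children2.get(label, []))
                memo[label] = kids1 == kids2 and all(identical(k) for k in kids1)
        return memo[label]

    parent1 = {kid: p for p, kids in children1.items() for kid in kids}
    # A shared subtree is maximal if its parent's subtree is not itself shared.
    return {
        v for v in nodes1
        if identical(v) and not (v in parent1 and identical(parent1[v]))
    }


# Toy classifications: the second inserts an extra rank between Primates and its genera.
tree1 = {
    "Mammalia": ["Primates", "Rodentia"],
    "Primates": ["Homo", "Pan"],
    "Rodentia": ["Mus", "Rattus"],
}
tree2 = {
    "Mammalia": ["Primates", "Rodentia"],
    "Primates": ["Hominidae"],
    "Hominidae": ["Homo", "Pan"],
    "Rodentia": ["Mus", "Rattus"],
}
print(sorted(maximal_shared_subtrees(tree1, tree2)))
```

The unchanged Rodentia subtree is reported whole, while the restructured primate part contributes only its leaves; an edit script would then describe how to turn the remaining, unshared parts of one tree into the other.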

Subjects

Algorithms , Classification/methods , Computers, Molecular , Animals , Databases, Genetic , Databases, Nucleic Acid/classification , Humans , Phylogeny , Species Specificity

13.

Rates and patterns of gene duplication and loss in the human genome.

Cotton, James A; Page, Roderic D M.

Proc Biol Sci ; 272(1560): 277-83, 2005 Feb 07.

Article in English | MEDLINE | ID: mdl-15705552

ABSTRACT

Gene duplication has certainly played a major role in structuring vertebrate genomes, but the extent and nature of the duplication events involved remain controversial. A recent study identified two major episodes of gene duplication: one episode of putative genome duplication ca. 500 Myr ago and a more recent gene-family expansion attributed to segmental or tandem duplications. We confirm this pattern using methods not reliant on molecular clocks for individual gene families. However, analysis of a simple model of the birth-death process suggests that the apparent recent episode of duplication is an artefact of the birth-death process. We show that a constant-rate birth-death model is appropriate for gene duplication data, allowing us to estimate the rate of gene duplication and loss in the vertebrate genome over the last 200 Myr (0.00115 and 0.00740 Myr(-1) lineage(-1), respectively). Finally, we show that increasing rates of gene loss reduce the impact of a genome-wide duplication event on the distribution of gene duplications through time.

Subjects

Evolution, Molecular , Gene Deletion , Gene Duplication , Genome, Human , Models, Genetic , Genomics , Humans

14.

Community next steps for making globally unique identifiers work for biocollections data.

Guralnick, Robert P; Cellinese, Nico; Deck, John; Pyle, Richard L; Kunze, John; Penev, Lyubomir; Walls, Ramona; Hagedorn, Gregor; Agosti, Donat; Wieczorek, John; Catapano, Terry; Page, Roderic D M.

Zookeys ; (494): 133-54, 2015.

Article in English | MEDLINE | ID: mdl-25901117

ABSTRACT

Biodiversity data are being digitized and made available online at a rapidly increasing rate, but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has been neither coalescence towards a single identifier solution (as in some other domains) nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. In order to further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.

15.

Going nuclear: gene family evolution and vertebrate phylogeny reconciled.

Cotton, James A; Page, Roderic D M.

Proc Biol Sci ; 269(1500): 1555-61, 2002 Aug 07.

Article in English | MEDLINE | ID: mdl-12184825

ABSTRACT

Gene duplications have been common throughout vertebrate evolution, introducing paralogy and so complicating phylogenetic inference from nuclear genes. Reconciled trees are one method capable of dealing with paralogy, using the relationship between a gene phylogeny and the phylogeny of the organisms containing those genes to identify gene duplication events. This allows us to infer phylogenies from gene families containing both orthologous and paralogous copies. Vertebrate phylogeny is well understood from morphological and palaeontological data, but studies using mitochondrial sequence data have failed to reproduce this classical view. Reconciled tree analysis of a database of 118 vertebrate gene families supports a largely classical vertebrate phylogeny.

Subjects

Evolution, Molecular , Phylogeny , Vertebrates/genetics , Animals , Gene Duplication , Multigene Family/genetics , Species Specificity

16.

Component analysis: a valiant failure?

Page, Roderic D M.

Cladistics ; 6(2): 119-136, 1990 Jun.

Article in English | MEDLINE | ID: mdl-34933509

ABSTRACT

Recent criticisms of component analysis are based on misunderstandings of the relationship between component analysis, parsimony and consensus methods. These criticisms are rebutted, and the appropriateness of applying the Wagner parsimony criterion to the study of biogeography and co-speciation is questioned. An alternative parsimony method, previously applied to mapping gene cladograms onto organism cladograms, is developed.

17.

Reweaving the tapestry: a supertree of birds.

Davis, Katie E; Page, Roderic D M.

PLoS Curr ; 6, 2014 Jun 09.

Article in English | MEDLINE | ID: mdl-24944845

ABSTRACT

Our knowledge of the avian tree of life remains uncertain, particularly at deeper levels, owing to the rapid diversification early in the group's evolutionary history. Birds are the most abundant land vertebrates on the planet and have long been of great interest to systematists. They are also economically and ecologically important and, as a result, are intensively studied; yet despite their importance and interest to humans, around 13% of taxa are currently on the endangered species list, perhaps as a result of human activity. Despite all this, no comprehensive phylogeny that includes both extinct and extant species currently exists. Here we present a species-level supertree of Aves, constructed using the Matrix Representation with Parsimony method, that contains approximately two thirds of all species and is built from nearly 1000 source phylogenies with broad taxonomic coverage. The source data for the tree were collected and processed according to a strict protocol to ensure robust and accurate data handling. The resulting tree topology is largely consistent with molecular hypotheses of avian phylogeny. We identify areas that are in broad agreement with current views on avian systematics and also those that require further work. We also highlight the need for leaf-based support measures to enable the identification of rogue taxa in supertrees. This is a first attempt at a supertree of both extinct and extant birds. It is not intended to be used in an overhaul of avian systematics or as a basis for taxonomic reclassification, but it provides a strong foundation for further studies on macroevolution, conservation, biodiversity, comparative biology and character evolution; in particular, the inclusion of fossils will allow the study of bird evolution and diversification throughout deep time.

18.

BioNames: linking taxonomy, texts, and trees.

Page, Roderic D M.

PeerJ ; 1: e190, 2013.

Article in English | MEDLINE | ID: mdl-24244913

ABSTRACT

BioNames is a web database of taxonomic names for animals, linked to the primary literature and, wherever possible, to phylogenetic trees. It aims to provide a taxonomic "dashboard" where at a glance we can see a summary of the taxonomic and phylogenetic information we have for a given taxon and hence provide a quick answer to the basic question "what is this taxon?" BioNames combines classifications from the Global Biodiversity Information Facility (GBIF) and GenBank, images from the Encyclopedia of Life (EOL), animal names from the Index of Organism Names (ION), and bibliographic data from multiple sources including the Biodiversity Heritage Library (BHL) and CrossRef. The user interface includes display of full text articles, interactive timelines of taxonomic publications, and zoomable phylogenies. It is available at http://bionames.org.

19.

Space, time, form: viewing the Tree of Life.

Page, Roderic D M.

Trends Ecol Evol ; 27(2): 113-20, 2012 Feb.

Article in English | MEDLINE | ID: mdl-22209094

ABSTRACT

There are numerous ways to display a phylogenetic tree, which is reflected in the diversity of software tools available to phylogenetists. Displaying very large trees continues to be a challenge, made ever harder as increasing computing power enables researchers to construct ever-larger trees. At the same time, computing technology is enabling novel visualisations, ranging from geophylogenies embedded on digital globes to touch-screen interfaces that enable greater interaction with evolutionary trees. In this review, I survey recent developments in phylogenetic visualisation, highlighting successful (and less successful) approaches and sketching some future directions.

Subjects

Phylogeny , Classification/methods , Ecology/trends , Phylogeography/methods , Phylogeography/trends , Software

20.

Evolutionary informatics: unifying knowledge about the diversity of life.

Parr, Cynthia S; Guralnick, Robert; Cellinese, Nico; Page, Roderic D M.

Trends Ecol Evol ; 27(2): 94-103, 2012 Feb.

Article in English | MEDLINE | ID: mdl-22154516

ABSTRACT

The accelerating growth of data and knowledge in evolutionary biology is indisputable. Despite this rapid progress, information remains scattered, poorly documented and in formats that impede discovery and integration. A grand challenge is the creation of a linked system of all evolutionary data, information and knowledge organized around Darwin's ever-growing Tree of Life. Such a system, accommodating topological disagreement where necessary, would consolidate taxon names, phenotypic and geographical distributional data across clades, and serve as an integrated community resource. The field of evolutionary informatics, reviewed here for the first time, has matured into a robust discipline that is developing the conceptual, infrastructure and community frameworks for meeting this grand challenge.

Subjects

Biodiversity , Computational Biology/methods , Biological Evolution , Ecology/methods , Ecology/trends , Phylogeny
