Search | Virtual Health Library (Biblioteca Virtual em Saúde)

The taxonomic name resolution service: an online tool for automated standardization of plant names.

Boyle, Brad; Hopkins, Nicole; Lu, Zhenyuan; Raygoza Garay, Juan Antonio; Mozzherin, Dmitry; Rees, Tony; Matasci, Naim; Narro, Martha L; Piel, William H; McKay, Sheldon J; Lowry, Sonya; Freeland, Chris; Peet, Robert K; Enquist, Brian J.

BMC Bioinformatics ; 14: 16, 2013 Jan 16.

Article in English | MEDLINE | ID: mdl-23324024

ABSTRACT

BACKGROUND: The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this 'names problem' has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science. RESULTS: The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets. CONCLUSIONS: We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface. Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.
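
Illustrative note: the workflow the abstract describes (clean a submitted name, fuzzy-match it against a reference taxonomy, then convert a synonym to its accepted name) can be sketched roughly as follows. This is a minimal sketch and not the TNRS code or its API. The `REFERENCE` table, the `resolve` function and the `cutoff` value are hypothetical, the three entries are toy data chosen for illustration, and Python's standard `difflib` similarity ratio stands in for the fuzzy matcher the TNRS actually uses.

```python
from difflib import get_close_matches

# Hypothetical toy reference taxonomy: known name -> accepted name.
# The synonym entry is included purely for illustration.
REFERENCE = {
    "Quercus alba": "Quercus alba",
    "Acer saccharum": "Acer saccharum",
    "Acer saccharophorum": "Acer saccharum",
}


def resolve(name, reference=REFERENCE, cutoff=0.8):
    """Return the best fuzzy match and its accepted name, or None."""
    # Collapse repeated whitespace and normalise capitalisation.
    cleaned = " ".join(name.split())
    cleaned = cleaned[:1].upper() + cleaned[1:].lower()
    # Keep only the single closest reference name above the cutoff.
    hits = get_close_matches(cleaned, list(reference), n=1, cutoff=cutoff)
    if not hits:
        return None
    matched = hits[0]
    return {
        "submitted": name,
        "matched": matched,
        "accepted": reference[matched],
    }
```

Under these assumptions, `resolve("acer  sacharophorum")` matches the misspelled synonym to `Acer saccharophorum` and reports `Acer saccharum` as the accepted name, while a name with no close counterpart in the table returns `None`.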

Subjects

Plants/classification; Software; Algorithms; Classification/methods; Databases, Factual; Internet; Names; User-Computer Interface

A Standardised Vocabulary for Identifying Benthic Biota and Substrata from Underwater Imagery: The CATAMI Classification Scheme.

Althaus, Franziska; Hill, Nicole; Ferrari, Renata; Edwards, Luke; Przeslawski, Rachel; Schönberg, Christine H L; Stuart-Smith, Rick; Barrett, Neville; Edgar, Graham; Colquhoun, Jamie; Tran, Maggie; Jordan, Alan; Rees, Tony; Gowlett-Holmes, Karen.

PLoS One ; 10(10): e0141039, 2015.

Article in English | MEDLINE | ID: mdl-26509918

ABSTRACT

Imagery collected by still and video cameras is an increasingly important tool for minimal impact, repeatable observations in the marine environment. Data generated from imagery includes identification, annotation and quantification of biological subjects and environmental features within an image. To be long-lived and useful beyond their project-specific initial purpose, and to maximize their utility across studies and disciplines, marine imagery data should use a standardised vocabulary of defined terms. This would enable the compilation of regional, national and/or global data sets from multiple sources, contributing to broad-scale management studies and development of automated annotation algorithms. The classification scheme developed under the Collaborative and Automated Tools for Analysis of Marine Imagery (CATAMI) project provides such a vocabulary. The CATAMI classification scheme introduces Australian-wide acknowledged, standardised terminology for annotating benthic substrates and biota in marine imagery. It combines coarse-level taxonomy and morphology, and is a flexible, hierarchical classification that bridges the gap between habitat/biotope characterisation and taxonomy, acknowledging limitations when describing biological taxa through imagery. It is fully described, documented, and maintained through curated online databases, and can be applied across benthic image collection methods, annotation platforms and scoring methods. Following release in 2013, the CATAMI classification scheme was taken up by a wide variety of users, including government, academia and industry. This rapid acceptance highlights the scheme's utility and the potential to facilitate broad-scale multidisciplinary studies of marine ecosystems when applied globally. Here we present the CATAMI classification scheme, describe its conception and features, and discuss its utility and the opportunities as well as challenges arising from its use.

Subjects

Environmental Monitoring/methods; Invertebrates; Algorithms; Animals; Biota; Conservation of Natural Resources; Ecosystem

Taxamatch, an algorithm for near ('fuzzy') matching of scientific names in taxonomic databases.

Rees, Tony.

PLoS One ; 9(9): e107510, 2014.

Article in English | MEDLINE | ID: mdl-25247892

ABSTRACT

Misspellings of organism scientific names create barriers to optimal storage and organization of biological data, reconciliation of data stored under different spelling variants of the same name, and appropriate responses from user queries to taxonomic data systems. This study presents an analysis of the nature of the problem from first principles, reviews some available algorithmic approaches, and describes Taxamatch, an improved name matching solution for this information domain. Taxamatch employs a custom Modified Damerau-Levenshtein Distance algorithm in tandem with a phonetic algorithm, together with a rule-based approach incorporating a suite of heuristic filters, to produce improved levels of recall, precision and execution time over the existing dynamic programming algorithms n-grams (as bigrams and trigrams) and standard edit distance. Although entirely phonetic methods are faster than Taxamatch, they are inferior in the area of recall since many real-world errors are non-phonetic in nature. Excellent performance of Taxamatch (as recall, precision and execution time) is demonstrated against a reference database of over 465,000 genus names and 1.6 million species names, as well as against a range of error types as present at both genus and species levels in three sets of sample data for species and four for genera alone. An ancillary authority matching component is included which can be used both for misspelled names and for otherwise matching names where the associated cited authorities are not identical.
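
Illustrative note: the edit-distance component mentioned in the abstract can be pictured with the textbook restricted Damerau-Levenshtein distance (also called optimal string alignment), which counts insertions, deletions, substitutions and swaps of adjacent characters. The sketch below is that standard algorithm only. It is not Taxamatch's custom Modified Damerau-Levenshtein Distance, and it omits the phonetic step and the heuristic filters. The function names `osa_distance` and `near_matches` and the `max_distance` threshold are assumptions made for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def near_matches(query, names, max_distance=2):
    """Return (distance, name) pairs within max_distance, closest first."""
    q = query.lower()
    scored = [(osa_distance(q, n.lower()), n) for n in names]
    return sorted((dist, n) for dist, n in scored if dist <= max_distance)
```

For example, `near_matches("Quercsu", ["Quercus", "Pinus", "Picea"])` returns `[(1, "Quercus")]`, because swapping the last two letters is a single edit. A plain Levenshtein distance would score the same typo as 2.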

Subjects

Algorithms; Classification/methods; Databases, Factual

Marine biodiversity in the Australian region.

Butler, Alan J; Rees, Tony; Beesley, Pam; Bax, Nicholas J.

PLoS One ; 5(8): e11831, 2010 Aug 02.

Article in English | MEDLINE | ID: mdl-20689847

ABSTRACT

The entire Australian marine jurisdictional area, including offshore and sub-Antarctic islands, is considered in this paper. Most records, however, come from the Exclusive Economic Zone (EEZ) around the continent of Australia itself. The counts of species have been obtained from four primary databases (the Australian Faunal Directory, Codes for Australian Aquatic Biota, Online Zoological Collections of Australian Museums, and the Australian node of the Ocean Biogeographic Information System), but even these are an underestimate of described species. In addition, some partially completed databases for particular taxonomic groups, and specialized databases (for introduced and threatened species) have been used. Experts also provided estimates of the number of known species not yet in the major databases. For only some groups could we obtain an (expert opinion) estimate of undiscovered species. The databases provide patchy information about endemism, levels of threat, and introductions. We conclude that there are about 33,000 marine species (mainly animals) in the major databases, of which 130 are introduced, 58 listed as threatened and an unknown percentage endemic. An estimated 17,000 more named species are either known from the Australian EEZ but not in the present databases, or potentially occur there. It is crudely estimated that there may be as many as 250,000 species (known and yet to be discovered) in the Australian EEZ. For 17 higher taxa, there is sufficient detail for subdivision by Large Marine Domains, for comparison with other National and Regional Implementation Committees of the Census of Marine Life. Taxonomic expertise in Australia is unevenly distributed across taxa, and declining. Comments are given briefly on biodiversity management measures in Australia, including but not limited to marine protected areas.

Subjects

Biodiversity; Animals; Australia; Databases, Factual; Endangered Species/statistics & numerical data; Oceans and Seas

Final evaluation results for the Fast-Check HIV rapid test kits.

Rekart, Michael L; Quon, J A; Rees, Tony.

CMAJ ; 171(11): 1324, 2004 Nov 23.

Article in English | MEDLINE | ID: mdl-15557571

Subjects

AIDS Serodiagnosis/standards; Reagent Kits, Diagnostic/standards; Canada; Diagnostic Errors; False Negative Reactions; Humans; Sensitivity and Specificity

Problems with the fast-check HIV rapid test kits.

Rekart, Michael L; Krajden, Mel; Cook, Darrel; McNabb, Gail; Rees, Tony; Isaac-Renton, Judy; Harris, Marianne; Montaner, Julio S G.

CMAJ ; 167(2): 119, 2002 Jul 23.

Article in English | MEDLINE | ID: mdl-12160110

Subjects

AIDS Serodiagnosis/standards; Reagent Kits, Diagnostic/standards; Diagnostic Errors; False Negative Reactions; Humans; Time Factors
