Solr-Plant: efficient extraction of plant names from text.
Sharma, Vivekanand; Restrepo, Maria Isabel; Sarkar, Indra Neil.
Affiliation
  • Sharma V; Center for Biomedical Informatics, Brown University, Box G-R, Providence, RI, USA.
  • Restrepo MI; Center for Biomedical Informatics, Brown University, Box G-R, Providence, RI, USA.
  • Sarkar IN; Center for Biomedical Informatics, Brown University, Box G-R, Providence, RI, USA. neil_sarkar@brown.edu.
BMC Bioinformatics; 20(1): 263, 2019 May 22.
Article in English | MEDLINE | ID: mdl-31117932
ABSTRACT

BACKGROUND:

The retrieval of plant-related information is a challenging task due to variations in species name mentions as well as spelling or typographical errors across data sources. Scalable solutions are needed for identifying plant name mentions from text and resolving them to accepted taxonomic names.

RESULTS:

An Apache Solr-based fuzzy matching system enhanced with the Smith-Waterman alignment algorithm ("Solr-Plant") was developed for mapping and resolution to a plant name and synonym thesaurus. Evaluation of Solr-Plant suggests promising results in terms of both accuracy and processing efficiency on misspelled species names from two benchmark datasets: (1) SALVIAS and (2) National Center for Biotechnology Information (NCBI) Taxonomy. Additional evaluation using the S800 text corpus also reflects high precision and recall. The latest version of the source code is available at https://github.com/bcbi/SolrPlantAPI. A REST-compliant web interface and service for Solr-Plant is hosted at http://bcbi.brown.edu/solrplant.
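
To illustrate the alignment step named above, the following is a minimal sketch of Smith-Waterman local alignment scoring used to re-rank candidate names for a misspelled query. It is not taken from the Solr-Plant source code: the function names `smith_waterman_score` and `rank_candidates`, the scoring parameters (match, mismatch, gap), and the candidate list are all assumptions chosen for illustration. In the system described, the candidates would presumably come from a Solr fuzzy query against the thesaurus; here they are a hard-coded list.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not Solr-Plant's settings.
    Uses a linear gap penalty and keeps only two rows of the score matrix.
    """
    a, b = a.lower(), b.lower()
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_candidates(query, candidates):
    """Order candidate names by descending alignment score against query."""
    return sorted(
        candidates,
        key=lambda name: smith_waterman_score(query, name),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical candidates standing in for fuzzy-search results
    candidates = ["Quercus rubra", "Acer rubrum", "Quercus alba"]
    for name in rank_candidates("Quercus albba", candidates):
        print(name, smith_waterman_score("Quercus albba", name))
```

Because the score rewards long matching runs and penalizes gaps only mildly, a query with a single inserted or substituted character still aligns strongly with the intended name, which is the property that makes this kind of re-ranking useful for typographical errors.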

CONCLUSION:

Automated techniques are needed for efficient and accurate identification of knowledge linked with biological scientific names. Solr-Plant complements the current state of the art in terms of both efficiency and accuracy in identifying names restricted to the species level. The approach can be extended to identify broader groups of organisms at different taxonomic levels. The results reflect the potential utility of Solr-Plant as a data mining tool for extracting and correcting plant species names.

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Plants / Algorithms / Data Mining / Terminology as Topic | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2019 | Document type: Article | Country of affiliation: United States
