Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait.
Singh, Gurnoor; Papoutsoglou, Evangelia A; Keijts-Lalleman, Frederique; Vencheva, Bilyana; Rice, Mark; Visser, Richard G F; Bachem, Christian W B; Finkers, Richard.
Affiliation
  • Singh G; Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, The Netherlands.
  • Papoutsoglou EA; Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, The Netherlands.
  • Keijts-Lalleman F; IBM Netherlands, Amsterdam, The Netherlands.
  • Vencheva B; IBM Netherlands, Amsterdam, The Netherlands.
  • Rice M; IBM Netherlands, Amsterdam, The Netherlands.
  • Visser RGF; Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, The Netherlands.
  • Bachem CWB; Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, The Netherlands.
  • Finkers R; Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, The Netherlands. richard.finkers@wur.nl.
BMC Plant Biol ; 21(1): 198, 2021 Apr 24.
Article in English | MEDLINE | ID: mdl-33894758
ABSTRACT

BACKGROUND:

Scientific literature carries a wealth of information crucial for research, but only a fraction of it is present as structured information in databases and therefore can be analyzed using traditional data analysis tools. Natural language processing (NLP) is often and successfully employed to support humans by distilling relevant information from large corpora of free text and structuring it in a way that lends itself to further computational analyses. For this pilot, we developed a pipeline that uses NLP on biological literature to produce knowledge networks. We focused on the flesh color of potato, a well-studied trait with known associations, and we investigated whether these knowledge networks can assist us in formulating new hypotheses on the underlying biological processes.

RESULTS:

We trained an NLP model on a manually annotated corpus of 34 full-text potato articles to recognize relevant biological entities (genes, proteins, metabolites and traits) and the relationships between them in text. This model detected biological entities with a precision of 97.65% and a recall of 88.91% on the training set. We conducted a time series analysis on 4023 PubMed abstracts of plant genetics-based articles focusing on 4 major Solanaceous crops (tomato, potato, eggplant and capsicum), and found that the networks contained both previously known and contemporaneously unknown leads to subsequently discovered biological phenomena relating to flesh color. A novel time-based analysis of these networks indicates a connection between our trait and a candidate gene (zeaxanthin epoxidase) two years before that connection was stated explicitly in the literature.
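The idea behind such a time-based analysis can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes entities have already been extracted per abstract, treats co-mention in one abstract as an edge, and uses hypothetical toy records (`RECORDS`) and placeholder entity names (`trait`, `gene_x`) that do not come from the study. For each year it builds the cumulative network and reports the first year in which the two entities are connected by any path and the first year in which they are directly linked.

```python
from collections import defaultdict, deque

# Hypothetical toy records, NOT data from the study:
# (publication year, entities co-mentioned in one abstract).
RECORDS = [
    (2000, {"trait", "metabolite_a"}),
    (2001, {"metabolite_a", "metabolite_b"}),
    (2002, {"metabolite_b", "gene_x"}),
    (2004, {"trait", "gene_x"}),
]


def build_graph(records, up_to_year):
    """Cumulative co-mention network from all records up to a given year."""
    graph = defaultdict(set)
    for year, entities in records:
        if year > up_to_year:
            continue
        for a in entities:
            for b in entities:
                if a != b:
                    graph[a].add(b)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the edge count or None if unconnected."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def first_years(records, source, target):
    """Return (first year connected by any path, first year directly linked)."""
    first_connected = first_direct = None
    for year in sorted({y for y, _ in records}):
        dist = shortest_path_length(build_graph(records, year), source, target)
        if dist is None:
            continue
        if first_connected is None:
            first_connected = year
        if dist == 1:
            first_direct = year
            break
    return first_connected, first_direct


connected, direct = first_years(RECORDS, "trait", "gene_x")
print(f"connected via network: {connected}, stated directly: {direct}")
```

In this toy setting the gap between the two years is the lead time the network would have offered. A real analysis would also need entity normalization and some filtering of spurious co-mentions, which this sketch leaves out.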

CONCLUSIONS:

Our time-based analysis indicates that knowledge networks extracted from the literature show promise for knowledge discovery, data integration and hypothesis generation in scientific research.

Full text: 1 Databases: MEDLINE Main subject: Natural Language Processing / Solanum tuberosum / Plant Tubers / Data Mining Language: En Journal: BMC Plant Biol Journal subject: BOTANY Year: 2021 Document type: Article Affiliation country: Netherlands
