Avoiding background knowledge: literature based discovery from important information.
Preiss, Judita.
Affiliation
  • Preiss J; Information School, University of Sheffield, S1 4DP, Sheffield, UK. judita.preiss@sheffield.ac.uk.
BMC Bioinformatics ; 23(Suppl 9): 570, 2023 Mar 14.
Article in En | MEDLINE | ID: mdl-36918777
ABSTRACT

BACKGROUND:

Automatic literature based discovery attempts to uncover new knowledge by connecting existing facts: information extracted from existing publications in the form of $A \rightarrow B$ and $B \rightarrow C$ relations can be simply connected to deduce $A \rightarrow C$. However, using this approach, the quantity of proposed connections is often too vast to be useful. It can be reduced by using subject-(predicate)-object triples as the $A \rightarrow B$ relations, but too many proposed connections remain for manual verification.
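The connection step described above can be sketched as a join over relations that share a middle term. This is a minimal illustration, not the paper's implementation: the function name `propose_pairs`, the representation of relations as `(source, target)` tuples, and the toy data (modelled on the well-known fish oil and Raynaud's disease example from the literature based discovery literature) are all assumptions made for the example.

```python
from collections import defaultdict


def propose_pairs(relations):
    """Join (a, b) and (b, c) relations into candidate (a, c) pairs.

    Returns a dict mapping each candidate pair to the set of linking
    middle terms. Pairs already present as direct relations are skipped,
    since they are not hidden knowledge.
    """
    known = set(relations)

    # Index: source term -> set of target terms.
    successors = defaultdict(set)
    for a, b in relations:
        successors[a].add(b)

    candidates = defaultdict(set)
    for a, b in relations:
        for c in successors.get(b, ()):
            if c != a and (a, c) not in known:
                candidates[(a, c)].add(b)
    return dict(candidates)


# Hypothetical toy relations, for illustration only.
relations = [
    ("fish oil", "blood viscosity"),
    ("blood viscosity", "Raynaud's disease"),
    ("fish oil", "platelet aggregation"),
    ("platelet aggregation", "Raynaud's disease"),
]
pairs = propose_pairs(relations)
```

Even in this toy case a single candidate pair is supported by two linking terms; on a full corpus the number of such joins grows very quickly, which is the problem the abstract describes.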

RESULTS:

Based on the hypothesis that only a small number of the subject-predicate-object triples extracted from a publication represent the paper's novel contribution(s), we explore using BERT embeddings to identify these important triples, and then perform literature based discovery using only them. The method exploits the availability of full texts of publications in the CORD-19 dataset to build a training set, making use of the fact that a novel contribution is likely to be mentioned in both the abstract and the body of a paper; the resulting tool can nevertheless be applied to papers with only abstracts available. Candidate hidden knowledge pairs generated from unfiltered triples and those built from important triples only are compared using a variety of timeslicing gold standards.
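One way to picture the training-set construction is as a labelling rule over the triples extracted from each paper. The sketch below is only an assumption about how such labels might be derived: the abstract does not give the exact procedure, and the function name `label_triples`, the tuple representation, and the choice to treat triples found in only one section as negatives are hypothetical. The BERT-based classifier trained on such labels is not shown.

```python
def label_triples(abstract_triples, body_triples):
    """Label each triple of a paper as important (True) or not (False).

    Assumption for illustration: a triple counts as important when it
    occurs in both the abstract and the body of the same paper.
    """
    in_abstract = set(abstract_triples)
    in_body = set(body_triples)
    return {
        triple: (triple in in_abstract and triple in in_body)
        for triple in in_abstract | in_body
    }


# Hypothetical triples for a single paper.
abstract_triples = [("drug X", "INHIBITS", "protein Y")]
body_triples = [
    ("drug X", "INHIBITS", "protein Y"),
    ("protein Y", "PART_OF", "pathway Z"),
]
labels = label_triples(abstract_triples, body_triples)
```

Because the labels are needed only at training time, a classifier trained this way could afterwards be run on abstract-only papers, as the abstract notes.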

CONCLUSIONS:

The quantity of proposed knowledge pairs is reduced by a factor of [Formula see text], and we show that when the gold standard is designed to avoid rewarding background knowledge, the precision obtained increases by up to a factor of 10. We argue that the gold standard needs to be carefully considered, and release as-yet-undiscovered candidate knowledge pairs based on important triples alongside this work.
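A timeslicing evaluation of this kind can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's gold standards: the function name `timeslice_precision` and the rule that any pair already seen before the cut-off date is excluded from both the gold standard and the proposals are hypothetical choices made to show how background knowledge can be kept from being rewarded.

```python
def timeslice_precision(candidates, pairs_before, pairs_after):
    """Precision of candidate pairs against a time-sliced gold standard.

    candidates   -- pairs proposed from publications before the cut-off
    pairs_before -- pairs already observed before the cut-off
    pairs_after  -- pairs observed in publications after the cut-off

    Assumption: pairs known before the cut-off are removed from both the
    gold standard and the proposals, so they cannot count as hits.
    """
    before = set(pairs_before)
    gold = set(pairs_after) - before
    proposed = set(candidates) - before
    if not proposed:
        return 0.0
    return len(proposed & gold) / len(proposed)


# Hypothetical pairs, for illustration only.
candidates = [("a", "c"), ("a", "d"), ("b", "d"), ("a", "b")]
pairs_before = [("a", "b")]
pairs_after = [("a", "c"), ("a", "b")]
precision = timeslice_precision(candidates, pairs_before, pairs_after)
```

Here the pair `("a", "b")` appears after the cut-off but was already known before it, so it is not counted as a discovery; only `("a", "c")` is a hit among the three remaining proposals.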

Full text: 1 | Databases: MEDLINE | Main subject: Knowledge / Knowledge Discovery | Type of study: Prognostic_studies | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year of publication: 2023 | Document type: Article | Country of affiliation: United Kingdom
