Click-words: learning to predict document keywords from a user perspective.
Islamaj Dogan, Rezarta; Lu, Zhiyong.
Affiliation
  • Islamaj Dogan R; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Bioinformatics; 26(21): 2767-75, 2010 Nov 01.
Article in English | MEDLINE | ID: mdl-20810602
ABSTRACT
MOTIVATION:

Recognizing words that are key to a document is important for ranking relevant scientific documents. Traditionally, important words in a document are either nominated subjectively by authors and indexers or selected objectively by statistical measures. As an alternative, we propose to use the popularity of a document's words in user queries to identify click-words, a set of prominent words from the users' perspective. Although they often overlap, click-words differ significantly from other document keywords.

RESULTS:

We developed a machine learning approach to learn the unique characteristics of click-words. Each word was represented by a set of features that included different types of information, such as semantic type, part of speech tag, term frequency-inverse document frequency (TF-IDF) weight and location in the abstract. We identified the most important features and evaluated our model using 6 months of PubMed click-through logs. Our results suggest that, in addition to carrying high TF-IDF weight, click-words tend to be biomedical entities, to exist in article titles, and to occur repeatedly in article abstracts. Given the abstract and title of a document, we are able to accurately predict the words likely to appear in user queries that lead to document clicks.
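As a rough illustration of the per-word representation described above, the following Python sketch computes three of the named feature types: a TF-IDF weight, presence in the title, and the number of occurrences in the abstract. It is not the authors' implementation. The function names (`tokenize`, `word_features`), the tokenization rule, and the smoothed IDF formula are assumptions made for this example, and the semantic type and part-of-speech features are omitted because they would require external resources.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits.

    This tokenization rule is an assumption made for the sketch.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def word_features(title, abstract, corpus):
    """Return {word: feature dict} for one document (hypothetical helper).

    `corpus` is a list of token sets, used only to estimate how many
    documents contain each word (document frequency).
    """
    title_counts = Counter(tokenize(title))
    abstract_counts = Counter(tokenize(abstract))
    doc_counts = title_counts + abstract_counts
    n_docs = len(corpus)

    features = {}
    for word, tf in doc_counts.items():
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF; one common variant, chosen here as an assumption.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        features[word] = {
            "tfidf": tf * idf,
            "in_title": word in title_counts,
            "abstract_count": abstract_counts[word],
        }
    return features
```

Feature dictionaries of this kind could then be passed to any standard classifier, with labels derived from whether a word appeared in a query that led to a click on the document.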
Subject(s)

Full text: 1
Collection: 01-international
Database: MEDLINE
Main subject: Artificial Intelligence / Abstracting and Indexing
Type of study: Prognostic_studies / Risk_factors_studies
Language: En
Journal: Bioinformatics
Journal subject: MEDICAL INFORMATICS
Year: 2010
Document type: Article
Country of affiliation: United States