Improving the utility of MeSH® terms using the TopicalMeSH representation.
Yu, Zhiguo; Bernstam, Elmer; Cohen, Trevor; Wallace, Byron C; Johnson, Todd R.
Affiliations
  • Yu Z; School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Bernstam E; School of Biomedical Informatics and Department of Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Cohen T; School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Wallace BC; School of Information, University of Texas at Austin, Austin, TX, USA.
  • Johnson TR; School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA. Electronic address: Todd.R.Johnson@uth.tmc.edu.
J Biomed Inform; 61: 77-86, June 2016.
Article in English | MEDLINE | ID: mdl-27001195
OBJECTIVE: To evaluate whether vector representations that encode latent topic proportions capturing similarities to MeSH terms can improve performance on biomedical document retrieval and classification tasks, compared with using MeSH terms.

MATERIALS AND METHODS: We developed the TopicalMeSH representation, which exploits the 'correspondence' between topics generated using latent Dirichlet allocation (LDA) and MeSH terms to create new document representations that combine MeSH terms and latent topic vectors. We used 15 systematic drug review corpora to evaluate performance on information retrieval and classification tasks using this TopicalMeSH representation, compared with standard encodings that rely on (1) the original MeSH terms, (2) the text, or (3) their combination. For the document retrieval task, we compared the precision and recall achieved by ranking citations using the MeSH and TopicalMeSH representations, respectively. For the classification task, we considered three supervised machine learning approaches: Support Vector Machines (SVMs), logistic regression, and decision trees. We used each of these to classify documents as relevant or irrelevant, independently using MeSH, TopicalMeSH, Words (i.e., n-grams extracted from citation titles and abstracts, encoded via a bag-of-words representation), a combination of MeSH and Words, and a combination of TopicalMeSH and Words. We also used SVMs to compare the classification performance of tf-idf weighted MeSH terms, LDA Topics, a combination of Topics and MeSH, and TopicalMeSH with the classification performance of supervised LDA.

RESULTS: For the document retrieval task, the TopicalMeSH representation yielded higher precision than MeSH in 11 of 15 corpora while achieving the same recall. For the classification task, TopicalMeSH features yielded a higher F1 score in 14 of 15 corpora with SVMs, 12 of 15 corpora with logistic regression, and 12 of 15 corpora with decision trees. TopicalMeSH also had better document classification performance on 12 of 15 corpora when compared, using SVMs, with Topics, tf-idf weighted MeSH terms, and a combination of Topics and MeSH. Supervised LDA achieved the worst performance in most of the corpora.

CONCLUSION: The proposed TopicalMeSH representation (which combines MeSH terms with latent topics) consistently improved performance on document retrieval and classification tasks, compared with representations based on MeSH terms alone as well as several other standard approaches.
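
The evaluation measures named in the abstract (precision at a matched recall level for retrieval, F1 for classification, and per-corpus win counts) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy relevance labels below are hypothetical and chosen only for illustration, and the abstract does not state how the recall level was matched, so the cutoff rule used here is an assumption.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the 'relevant' class (coded 1; irrelevant is 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def precision_at_recall(ranked_relevance: Sequence[int], target_recall: float) -> float:
    """Precision at the shallowest rank cutoff whose recall reaches target_recall.

    ranked_relevance lists 1 (relevant) or 0 (irrelevant) in ranked order.
    Assumption: recall is matched by cutting each ranking at this depth.
    """
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return 0.0
    found = 0
    for rank, rel in enumerate(ranked_relevance, start=1):
        found += rel
        if found / total_relevant >= target_recall:
            return found / rank
    return 0.0


def count_wins(scores_a: Sequence[float], scores_b: Sequence[float]) -> int:
    """Number of corpora on which representation A scores strictly higher than B."""
    return sum(1 for a, b in zip(scores_a, scores_b) if a > b)


# Hypothetical toy data: two rankings of the same six citations.
ranking_a = [1, 0, 1, 1, 0, 0]  # all relevant citations found by rank 4
ranking_b = [1, 1, 0, 0, 1, 0]  # all relevant citations found by rank 5
p_a = precision_at_recall(ranking_a, 1.0)
p_b = precision_at_recall(ranking_b, 1.0)

# Hypothetical toy labels for one classifier on one corpus.
precision, recall, f1 = precision_recall_f1([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Under this reading, a statement such as "higher F1 score in 14 of 15 corpora" corresponds to `count_wins` applied to two lists of per-corpus F1 scores, one list per representation.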

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Information Storage and Retrieval / Medical Subject Headings / Support Vector Machine | Study type: Prognostic_studies | Limits: Humans | Language: English | Journal: J Biomed Inform | Journal subject: MEDICAL INFORMATICS | Year: 2016 | Document type: Article | Affiliation country: United States | Country of publication: United States