Supporting systematic reviews using LDA-based document representations.
Mo, Yuanhan; Kontonatsios, Georgios; Ananiadou, Sophia.
Affiliation
  • Mo Y; School of Computer Science, National Centre for Text Mining, The University of Manchester, Manchester, UK. maxmo2009@gmail.com.
  • Kontonatsios G; School of Computer Science, National Centre for Text Mining, The University of Manchester, Manchester, UK. georgios.kontonatsios@manchester.ac.uk.
  • Ananiadou S; School of Computer Science, National Centre for Text Mining, The University of Manchester, Manchester, UK. Sophia.Ananiadou@manchester.ac.uk.
Syst Rev; 4: 172, 2015 Nov 26.
Article in English | MEDLINE | ID: mdl-26612232
ABSTRACT

BACKGROUND:

Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that using machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of these machine learning methods exploit the same underlying principle, i.e. a study is modelled as a bag-of-words (BOW).

METHODS:

We explore the use of topic modelling methods to derive a more informative representation of studies. We apply latent Dirichlet allocation (LDA), an unsupervised topic modelling approach, to automatically identify topics in a collection of studies. We then represent each study as a distribution over LDA topics. Additionally, we enrich the LDA-derived topics with multi-word terms identified by an automatic term recognition (ATR) tool. For evaluation purposes, we carry out automatic identification of relevant studies using support vector machine (SVM)-based classifiers trained on our novel topic-based representation and, for comparison, on the BOW representation.
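
To make the topic-based representation concrete, the following is a minimal sketch of how a study can be turned into a distribution over LDA topics. It is not the authors' implementation: the abstract does not say which inference algorithm, hyperparameters or software were used, so the collapsed Gibbs sampler, the function name `lda_gibbs`, the values of `alpha` and `beta`, and the toy documents are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical helper: collapsed Gibbs sampling for LDA.

    docs is a list of token lists. Returns one topic distribution
    (a list of n_topics probabilities) per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start by assigning a random topic to every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # p(z = k) is proportional to
                # (n_dk + alpha) * (n_kv + beta) / (n_k + n_words * beta)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed document-topic proportions: the feature vector of each study.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return theta

# Toy, made-up "abstracts" (already tokenised); not data from the paper.
docs = [
    "trial patients treatment randomised trial outcome".split(),
    "patients treatment dose outcome trial".split(),
    "school pupils education policy intervention".split(),
    "education policy school teachers pupils".split(),
]
features = lda_gibbs(docs, n_topics=2)
for row in features:
    print([round(p, 3) for p in row])
```

Each row of `features` has one entry per topic and sums to 1, so its length is fixed by the number of topics rather than by the vocabulary size, as it would be for a BOW vector. In a screening setting these rows would then be passed to a classifier such as an SVM; that step is omitted here.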

RESULTS:

Our results show that the SVM classifier identifies a greater number of relevant studies when using the LDA representation than when using the BOW representation. These observations hold for two systematic reviews in the clinical domain and three reviews in the social science domain.

CONCLUSIONS:

A topic-based feature representation of documents outperforms the BOW representation when applied to the task of automatic citation screening. The proposed term-enriched topics are more informative and less ambiguous to systematic reviewers.
Subjects

Full text: 1 Database: MEDLINE Main subject: Review Literature as Topic / Models, Statistical / Biomedical Research / Data Mining / Support Vector Machine Study type: Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Publication year: 2015 Document type: Article
