Automated Phrase Mining from Massive Text Corpora.
Shang, Jingbo; Liu, Jialu; Jiang, Meng; Ren, Xiang; Voss, Clare R; Han, Jiawei.
Affiliation
  • Shang J; Department of Computer Science in University of Illinois at Urbana-Champaign, IL, USA.
  • Liu J; Google Research, NY, USA.
  • Jiang M; Department of Computer Science in University of Illinois at Urbana-Champaign, IL, USA.
  • Ren X; Department of Computer Science in University of Illinois at Urbana-Champaign, IL, USA.
  • Voss CR; US Army Research Lab.
  • Han J; Department of Computer Science in University of Illinois at Urbana-Champaign, IL, USA.
IEEE Trans Knowl Data Eng ; 30(10): 1825-1837, 2018 Oct.
Article in English | MEDLINE | ID: mdl-31105412
ABSTRACT
As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus and has various downstream applications including information extraction/retrieval, taxonomy construction, and topic modeling. Most existing methods rely on complex, trained linguistic analyzers, and thus are likely to perform unsatisfactorily on text corpora of new domains and genres without extra, expensive adaptation. None of the state-of-the-art models, even data-driven models, is fully automated, because they require human experts to design rules or label phrases. In this paper, we propose a novel framework for automated phrase mining, AutoPhrase, which supports any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger. Compared to the state-of-the-art methods, AutoPhrase has shown significant improvements in both effectiveness and efficiency on five real-world datasets across different domains and languages. In addition, AutoPhrase can be extended to model single-word quality phrases.
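
The abstract says AutoPhrase replaces human labeling with a general knowledge base such as Wikipedia. As a rough illustration of that idea only (not the authors' implementation), the sketch below assumes that phrase candidates are frequent multi-word n-grams and that a candidate matching a knowledge-base entry counts as a positive label, while all other candidates go into a noisy negative pool. The function names `frequent_ngrams` and `distant_labels`, the `max_n` and `min_support` thresholds, and the toy corpus and knowledge base are all hypothetical.

```python
from collections import Counter

# Hypothetical sketch: frequent n-gram candidates plus knowledge-base
# distant labels. This is NOT the AutoPhrase implementation.


def frequent_ngrams(docs, max_n=3, min_support=2):
    """Count contiguous n-grams with 2 <= n <= max_n over tokenized documents
    and keep those that occur at least min_support times."""
    counts = Counter()
    for tokens in docs:
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {gram for gram, c in counts.items() if c >= min_support}


def distant_labels(candidates, knowledge_base):
    """Split candidates into a positive pool (the phrase is a knowledge-base
    entry) and a noisy negative pool (everything else)."""
    positive, negative = set(), set()
    for gram in candidates:
        if " ".join(gram) in knowledge_base:
            positive.add(gram)
        else:
            negative.add(gram)
    return positive, negative


# Toy corpus and toy knowledge base (a stand-in for, e.g., Wikipedia titles).
docs = [
    "support vector machine is a classifier".split(),
    "a support vector machine learns a margin".split(),
    "the machine learns".split(),
]
kb = {"support vector machine", "machine learning"}

candidates = frequent_ngrams(docs)
positive, negative = distant_labels(candidates, kb)
print(sorted(positive))  # [('support', 'vector', 'machine')]
print(sorted(negative))  # [('machine', 'learns'), ('support', 'vector'), ('vector', 'machine')]
```

Unmatched candidates are kept as a separate pool rather than discarded, on the assumption that a knowledge base covers only part of the quality phrases in a corpus, so "not in the knowledge base" is at best a weak, noisy negative signal.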

Full text: 1 Collections: 01-internacional Database: MEDLINE Language: En Journal: IEEE Trans Knowl Data Eng Publication year: 2018 Document type: Article Country of affiliation: United States