Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years.
Wagner, Mathias; Vicinus, Benjamin; Muthra, Sherieda T; Richards, Tereza A; Linder, Roland; Frick, Vilma Oliveira; Groh, Andreas; Rubie, Claudia; Weichert, Frank.
Affiliations
  • Wagner M; Department of Pathology, University of Saarland, Homburg Saar Campus, Homburg Saar, Germany.
  • Vicinus B; Department of General, Visceral, Vascular and Pediatric Surgery, University of Saarland, Homburg Saar Campus, Homburg Saar, Germany; Institute of Virology, University of Saarland, Homburg Saar Campus, Homburg Saar, Germany.
  • Muthra ST; Lombardi Comprehensive Cancer Center, Georgetown University, 37th & O St NW, Washington, DC 20057, United States of America. Electronic address: stm36@georgetown.edu.
  • Richards TA; The Medical Library, University of the West Indies, Mona, Kingston, Jamaica.
  • Linder R; Institute of Medical Informatics, University of Luebeck, Luebeck, Germany.
  • Frick VO; Department of General, Visceral, Vascular and Pediatric Surgery, University of Saarland, Homburg Saar Campus, Homburg Saar, Germany.
  • Groh A; Department of Mathematics, University of Saarland, Saarbrücken Campus, Saarbrücken, Germany.
  • Rubie C; Department of General, Visceral, Vascular and Pediatric Surgery, University of Saarland, Homburg Saar Campus, Homburg Saar, Germany.
  • Weichert F; Department of Computer Science VII, Technical University of Dortmund, Dortmund, Germany.
Comput Biol Med; 73: 173-85, 2016 06 01.
Article in English | MEDLINE | ID: mdl-27208610
ABSTRACT

BACKGROUND:

The continuous growth of the medical sciences literature indicates the need for automated text analysis. Scientific writing, which is neither unitary, nor transcendent of social situation, nor defined by a timeless idea, is subject to constant change: it develops in response to evolving knowledge, aims at different goals, and embodies different assumptions about nature and communication. The objective of this study was to evaluate whether publication dates should be considered when performing text mining.

METHODS:

A search of PubMed for combined references to chemokine identifiers and particular cancer-related terms was conducted to detect changes over the past 36 years. Text analyses were performed using freeware available on the World Wide Web. TOEFL scores of the territories hosting the institutional affiliations, as well as various readability indices, were investigated. Further assessment was conducted using Principal Component Analysis. Laboratory examination was performed to evaluate the quality of attempts to extract content from the examined linguistic features.
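
The abstract does not describe how references to chemokines and cancer terms were counted, so the following is only a minimal sketch of one way such counts could be tallied per publication year at the abstract and sentence level. The term patterns, the naive sentence splitter, and the function names are all illustrative assumptions, not the authors' tooling or search terms.

```python
import re
from collections import defaultdict

# Hypothetical term patterns; the study's actual search terms are not
# listed in the abstract.
CHEMOKINE_PATTERN = re.compile(r"\b(?:CCL|CXCL|CCR|CXCR)\d+\b", re.IGNORECASE)
CANCER_PATTERN = re.compile(r"\b(?:cancer|tumou?r|carcinoma|neoplasm)s?\b", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def co_occurs(text):
    """True if the text mentions both a chemokine identifier and a cancer term."""
    return bool(CHEMOKINE_PATTERN.search(text)) and bool(CANCER_PATTERN.search(text))


def cooccurrence_by_year(records):
    """Tally joint mentions per year.

    records: iterable of (year, abstract_text) pairs.
    Returns a dict mapping year to counts of abstracts, sentences,
    and how many of each contain both kinds of term.
    """
    stats = defaultdict(
        lambda: {"abstracts": 0, "abstract_hits": 0, "sentences": 0, "sentence_hits": 0}
    )
    for year, text in records:
        row = stats[year]
        row["abstracts"] += 1
        row["abstract_hits"] += int(co_occurs(text))
        for sentence in split_sentences(text):
            row["sentences"] += 1
            row["sentence_hits"] += int(co_occurs(sentence))
    return dict(stats)
```

Dividing the hit counts by the totals for each year would give rates that can be compared across publication dates. A regex-based splitter will mis-split abbreviations such as "et al.", so a real analysis would likely need a more robust sentence tokenizer.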

RESULTS:

The PubMed search yielded a total of 14,420 abstracts (3,190,219 words). The range of findings in laboratory experimentation was consistent with the variability of the results described in the analyzed body of literature. Chemokine identifiers appeared together with cancer-related terms increasingly often at both the abstract and the sentence level, whereas the complexity of sentences remained fairly stable.

CONCLUSIONS:

The findings of the present study indicate that concurrent references to chemokines and cancer increased over time whereas text complexity remained stable.

Full text: 1 | Database: MEDLINE | Main subject: Internet / PubMed / Data Mining / Neoplasms | Type of study: Prognostic_studies / Systematic_reviews | Limits: Animals / Humans | Language: En | Journal: Comput Biol Med | Year of publication: 2016 | Document type: Article | Country of affiliation: Germany