Examining linguistic shifts between preprints and publications.
Nicholson, David N; Rubinetti, Vincent; Hu, Dongbo; Thielk, Marvin; Hunter, Lawrence E; Greene, Casey S.
Affiliation
  • Nicholson DN; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
  • Rubinetti V; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
  • Hu D; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America.
  • Thielk M; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
  • Hunter LE; Elsevier, Philadelphia, Pennsylvania, United States of America.
  • Greene CS; Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado, United States of America.
PLoS Biol; 20(2): e3001470, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35104289
ABSTRACT
Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies of preprints within bioRxiv have largely focused on article metadata and on how often these preprints are downloaded, cited, published, and discussed online. An element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare linguistic features of bioRxiv preprints with those of published biomedical text as a whole, since this comparison offers an opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and with mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to distinguish different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish papers linguistically similar to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify the journals and articles that are most linguistically similar to a bioRxiv or medRxiv preprint, and to observe where the preprint would be positioned within a published article landscape.
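The embedding-based journal matching described in the abstract can be illustrated with a small sketch. The abstract does not state how document embeddings were built from the word2vec model or which distance was used, so the code below assumes a simple average of word vectors and Euclidean distance to per-journal centroids. The toy vectors, journal names, and the function names `embed_document` and `rank_journals` are all hypothetical.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values). A real word2vec model
# trained on preprint text would have far more words and dimensions.
WORD_VECTORS = {
    "genome": [0.9, 0.1, 0.0],
    "sequencing": [0.8, 0.2, 0.1],
    "neuron": [0.1, 0.9, 0.0],
    "cortex": [0.0, 0.8, 0.2],
}


def embed_document(tokens, word_vectors):
    """Assumed approach: average the vectors of all in-vocabulary tokens."""
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        raise ValueError("document has no in-vocabulary tokens")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_journals(doc_vector, journal_centroids):
    """Return journal names ordered from nearest to farthest centroid."""
    return sorted(
        journal_centroids,
        key=lambda name: euclidean(doc_vector, journal_centroids[name]),
    )


# Hypothetical journal centroids, each built from a tiny stand-in corpus.
JOURNAL_CENTROIDS = {
    "Hypothetical Genomics Journal": embed_document(
        ["genome", "sequencing"], WORD_VECTORS
    ),
    "Hypothetical Neuroscience Journal": embed_document(
        ["neuron", "cortex"], WORD_VECTORS
    ),
}

# Out-of-vocabulary tokens are skipped when averaging.
preprint = embed_document(["genome", "sequencing", "unknownword"], WORD_VECTORS)
ranking = rank_journals(preprint, JOURNAL_CENTROIDS)
print(ranking)
```

Averaging keeps the sketch short, but it discards word order; the published work may weight or combine vectors differently.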
Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Peer Review, Research / Preprints as Topic / Language | Type of study: Prognostic_studies | Language: En | Publication year: 2022 | Document type: Article
