Machine learning models for abstract screening task - A systematic literature review application for health economics and outcome research.
Du, Jingcheng; Soysal, Ekin; Wang, Dong; He, Long; Lin, Bin; Wang, Jingqi; Manion, Frank J; Li, Yeran; Wu, Elise; Yao, Lixia.
Affiliation
  • Du J; Intelligent Medical Objects, Houston, TX, USA.
  • Soysal E; Intelligent Medical Objects, Houston, TX, USA.
  • Wang D; McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
  • He L; Merck & Co., Inc, Rahway, NJ, USA.
  • Lin B; Intelligent Medical Objects, Houston, TX, USA.
  • Wang J; Intelligent Medical Objects, Houston, TX, USA.
  • Manion FJ; Intelligent Medical Objects, Houston, TX, USA.
  • Li Y; Intelligent Medical Objects, Houston, TX, USA.
  • Wu E; Merck & Co., Inc, Rahway, NJ, USA.
  • Yao L; Merck & Co., Inc, Rahway, NJ, USA.
BMC Med Res Methodol ; 24(1): 108, 2024 May 09.
Article in En | MEDLINE | ID: mdl-38724903
ABSTRACT

OBJECTIVE:

Systematic literature reviews (SLRs) are critical for life-science research. However, the manual selection and retrieval of relevant publications can be a time-consuming process. This study aims to (1) develop two disease-specific annotated corpora, one for human papillomavirus (HPV) associated diseases and the other for pneumococcal-associated pediatric diseases (PAPD), and (2) optimize machine- and deep-learning models to facilitate automation of the SLR abstract screening.

METHODS:

This study constructed two disease-specific SLR screening corpora, one for HPV and one for PAPD, each containing citation metadata and the corresponding abstracts. Multiple combinations of machine- and deep-learning algorithms and features, such as keywords and MeSH terms, were evaluated using precision, recall, accuracy, and F1-score.

RESULTS AND CONCLUSIONS:

The HPV corpus contained 1697 entries, with 538 relevant and 1159 irrelevant articles. The PAPD corpus included 2865 entries, with 711 relevant and 2154 irrelevant articles. Adding features beyond title and abstract improved the performance (measured in accuracy) of machine learning models by 3% for the HPV corpus and 2% for the PAPD corpus. Transformer-based deep learning models consistently outperformed conventional machine learning algorithms, highlighting the strength of domain-specific pre-trained language models for SLR abstract screening. This study provides a foundation for the development of more intelligent SLR systems.
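
As an illustration of the four evaluation metrics named in the methods, the following minimal Python sketch computes them from confusion-matrix counts and applies them to a trivial baseline that labels every abstract as irrelevant. It is not the authors' code: the function names `screening_metrics` and `majority_baseline` are hypothetical, the baseline is an assumption added for illustration, and only the class counts come from the abstract. Undefined ratios (zero denominators) are reported as 0.0 by convention.

```python
def screening_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 for the 'relevant' class.

    tp/fp/fn/tn are confusion-matrix counts; ratios with a zero
    denominator are returned as 0.0 (a convention, not a standard).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Class counts reported in the abstract: (relevant, irrelevant).
CORPORA = {"HPV": (538, 1159), "PAPD": (711, 2154)}


def majority_baseline(relevant, irrelevant):
    """Hypothetical screener that marks every abstract as irrelevant."""
    return screening_metrics(tp=0, fp=0, fn=relevant, tn=irrelevant)


baselines = {name: majority_baseline(*counts)
             for name, counts in CORPORA.items()}

for name, scores in baselines.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Because both corpora are imbalanced, this baseline reaches an accuracy of about 0.68 on HPV and 0.75 on PAPD while retrieving no relevant article (recall 0), which is one reason to read accuracy alongside recall and F1-score.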

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Papillomavirus Infections / Machine Learning Limits: Humans Language: En Journal: BMC Med Res Methodol Journal subject: Medicine Year: 2024 Document type: Article Affiliation country: United States Country of publication: United Kingdom
