Text classification models for assessing the completeness of randomized controlled trial publications based on CONSORT reporting guidelines.

Jiang, Lan; Lan, Mengfei; Menke, Joe D; Vorland, Colby J; Kilicoglu, Halil

Affiliation

Jiang L; School of Information Sciences, University of Illinois Urbana-Champaign, 501 E Daniel Street, Champaign, IL, 61820, USA.
Lan M; School of Information Sciences, University of Illinois Urbana-Champaign, 501 E Daniel Street, Champaign, IL, 61820, USA.
Menke JD; School of Information Sciences, University of Illinois Urbana-Champaign, 501 E Daniel Street, Champaign, IL, 61820, USA.
Vorland CJ; School of Public Health, Indiana University, Bloomington, IN, USA.
Kilicoglu H; School of Information Sciences, University of Illinois Urbana-Champaign, 501 E Daniel Street, Champaign, IL, 61820, USA. halil@illinois.edu.

Sci Rep; 14(1): 21721, 2024 Sep 17.

Article in English | MEDLINE | ID: mdl-39289403

ABSTRACT

Complete and transparent reporting in randomized controlled trial (RCT) publications is essential for assessing their credibility. We aimed to develop text classification models for determining whether RCT publications report CONSORT checklist items. Using a corpus annotated with 37 fine-grained CONSORT items, we trained sentence classification models (PubMedBERT fine-tuning, BioGPT fine-tuning, and in-context learning with GPT-4) and compared their performance. We assessed the impact of data augmentation methods (Easy Data Augmentation (EDA), UMLS-EDA, text generation and rephrasing with GPT-4) on model performance. We also fine-tuned section-specific PubMedBERT models (e.g., Methods) to evaluate whether they could improve performance compared to the single full model. We performed 5-fold cross-validation and report precision, recall, F1 score, and area under the curve (AUC). The fine-tuned PubMedBERT model that uses the target sentence along with its surrounding sentences and section headers yielded the best overall performance (sentence-level 0.71 micro-F1, 0.67 macro-F1; article-level 0.90 micro-F1, 0.84 macro-F1). Data augmentation had a limited positive effect. BioGPT fine-tuning and GPT-4 in-context learning exhibited suboptimal results. The Methods-specific model improved recognition of methodology items, whereas the other section-specific models did not have a significant impact. Most CONSORT checklist items can be recognized reasonably well with the fine-tuned PubMedBERT model, but there is room for improvement. Improved models can underpin journal editorial workflows and CONSORT adherence checks.
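
The abstract reports micro- and macro-averaged F1 at both the sentence level and the article level. The following is a minimal sketch of how those two averages differ for multi-label data, and of one possible way to lift sentence labels to the article level. The union-based aggregation, the function names, and the toy labels ("3a", "7a") are assumptions made for illustration; the abstract does not state how the authors computed article-level scores.

```python
from collections import defaultdict


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred, labels):
    """gold, pred: lists of label sets, one set per unit (sentence or article)."""
    counts = {lab: [0, 0, 0] for lab in labels}  # [tp, fp, fn] per label
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in p and lab in g:
                counts[lab][0] += 1
            elif lab in p:
                counts[lab][1] += 1
            elif lab in g:
                counts[lab][2] += 1
    # Micro: pool counts over all labels, then compute a single F1.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    return micro, macro


def to_article_level(article_ids, sentence_labels):
    """ASSUMPTION: an item counts as reported in an article if any of its
    sentences carries that item (set union over the article's sentences)."""
    merged = defaultdict(set)
    for aid, labs in zip(article_ids, sentence_labels):
        merged[aid] |= labs
    return [merged[aid] for aid in sorted(merged)]


# Toy data: four sentences from two hypothetical articles, two illustrative items.
labels = ["3a", "7a"]
article_ids = ["A", "A", "B", "B"]
gold = [{"3a"}, {"7a"}, {"3a"}, set()]
pred = [{"3a"}, set(), {"7a"}, {"3a"}]

sent_micro, sent_macro = micro_macro_f1(gold, pred, labels)
art_micro, art_macro = micro_macro_f1(
    to_article_level(article_ids, gold),
    to_article_level(article_ids, pred),
    labels,
)
print(f"sentence-level micro={sent_micro:.2f} macro={sent_macro:.2f}")
print(f"article-level  micro={art_micro:.2f} macro={art_macro:.2f}")
```

Under this assumed aggregation, a label predicted on the wrong sentence of the right article still counts as correct at the article level, which is one plausible reason article-level scores can exceed sentence-level ones.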

Subjects

Checklist; Randomized Controlled Trials as Topic; Randomized Controlled Trials as Topic/standards; Humans; Guidelines as Topic

Keywords

CONSORT; Reporting guidelines; Reporting transparency; Sentence classification; Text mining


Full text: 1 Database: MEDLINE Main subject: Randomized Controlled Trials as Topic / Checklist Language: En Publication year: 2024 Document type: Article
