Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach.

Hu, Yan; Keloth, Vipina K; Raja, Kalpana; Chen, Yong; Xu, Hua

Affiliation

Hu Y; School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77054, United States.
Keloth VK; Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, 100 College St, New Haven, CT 06510, United States.
Raja K; Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, 100 College St, New Haven, CT 06510, United States.
Chen Y; Center for Health Analytics and Synthesis of Evidence (CHASE), Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States.
Xu H; Penn Medicine Center for Evidence-based Practice (CEP), University of Pennsylvania, 3600 Civic Center Blvd, Philadelphia, PA 19104, United States.

Bioinformatics ; 2023 Sep 05.

Article in En | MEDLINE | ID: mdl-37669123

ABSTRACT

MOTIVATION:

Automated extraction of participants, intervention, comparison/control, and outcome (PICO) elements from randomized controlled trial (RCT) abstracts is important for evidence synthesis. Previous studies have demonstrated the feasibility of applying natural language processing (NLP) to PICO extraction. However, performance remains suboptimal because of the complexity of PICO information in RCT abstracts and the challenges involved in annotating it.

RESULTS:

We propose a two-step NLP pipeline to extract PICO elements from RCT abstracts: (i) sentence classification using a prompt-based learning model and (ii) PICO extraction using a named entity recognition (NER) model. First, the sentences in abstracts were categorized into four sections, namely background, methods, results, and conclusions. Next, the NER model was applied to extract the PICO elements from the sentences within the title and methods sections, which include >96% of PICO information. We evaluated our proposed NLP pipeline on three datasets: the EBM-NLPmod dataset, a randomly selected and reannotated dataset of 500 RCT abstracts from the EBM-NLP corpus; a dataset of 150 COVID-19 RCT abstracts; and a dataset of 150 Alzheimer's disease (AD) RCT abstracts. The end-to-end evaluation shows that our proposed approach achieved an overall micro F1 score of 0.833 on the EBM-NLPmod dataset, 0.928 on the COVID-19 dataset, and 0.899 on the AD dataset when measured at the token level, and an overall micro F1 score of 0.712 on the EBM-NLPmod dataset, 0.850 on the COVID-19 dataset, and 0.805 on the AD dataset when measured at the entity level.
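The gap between the token-level and entity-level scores reported above can be illustrated with a minimal sketch. It assumes BIO-tagged sequences and exact-match entity scoring, which is one common convention and not necessarily the authors' evaluation code; the tag names (`PAR`, `INT`, `OUT`) and the example sentence tags are hypothetical.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, label); an assumed span representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (stray I- tags are ignored)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((start, i, label))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from pooled counts; pooling over all classes is what makes it 'micro'."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def token_micro_f1(gold: List[str], pred: List[str]) -> float:
    """Score each token on its class label, ignoring the B-/I- prefix."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_lab = g[2:] if g != "O" else None
        p_lab = p[2:] if p != "O" else None
        if p_lab is not None and p_lab == g_lab:
            tp += 1
        else:
            fp += p_lab is not None
            fn += g_lab is not None
    return f1(tp, fp, fn)


def entity_micro_f1(gold: List[str], pred: List[str]) -> float:
    """Count a predicted entity as correct only if boundaries and label both match."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    return f1(len(g & p), len(p - g), len(g - p))


# Hypothetical tags: the prediction truncates the participant span by one token.
gold = ["B-PAR", "I-PAR", "I-PAR", "O", "B-INT", "I-INT", "O", "B-OUT"]
pred = ["B-PAR", "I-PAR", "O", "O", "B-INT", "I-INT", "O", "B-OUT"]

print(round(token_micro_f1(gold, pred), 3))   # 0.909
print(round(entity_micro_f1(gold, pred), 3))  # 0.667
```

In this toy case a single missed token costs one token-level false negative, but it turns the whole participant entity into both a false positive and a false negative at the entity level, which is why entity-level scores tend to be lower for the same predictions.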

AVAILABILITY:

Our code and datasets are publicly available at https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

Full text: 1 Database: MEDLINE Study type: Clinical_trials / Prognostic_studies Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Publication year: 2023 Document type: Article Country of affiliation: United States
