Improving long COVID-related text classification: a novel end-to-end domain-adaptive paraphrasing framework.

Somayajula, Sai Ashish; Litake, Onkar; Liang, Youwei; Hosseini, Ramtin; Nemati, Shamim; Wilson, David O; Weinreb, Robert N; Malhotra, Atul; Xie, Pengtao

Affiliation

Somayajula SA; Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA.
Litake O; Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA.
Liang Y; Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA.
Hosseini R; Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA.
Nemati S; Division of Biomedical Informatics, University of California, La Jolla, San Diego, USA.
Wilson DO; Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, USA.
Weinreb RN; Hamilton Glaucoma Center, Shiley Eye Center and Department of Ophthalmology, University of California, La Jolla, San Diego, USA.
Malhotra A; UC San Diego Health, Department of Medicine, La Jolla, San Diego, USA.
Xie P; Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA. p1xie@eng.ucsd.edu.

Sci Rep; 14(1): 85, 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38168099

ABSTRACT

The emergence of long COVID during the ongoing COVID-19 pandemic has presented considerable challenges for healthcare professionals and researchers. The task of identifying relevant literature is particularly daunting due to the rapidly evolving scientific landscape, inconsistent definitions, and a lack of standardized nomenclature. This paper proposes a novel solution to this challenge by employing machine learning techniques to classify long COVID literature. However, the scarcity of annotated data for machine learning poses a significant obstacle. To overcome this, we introduce a strategy called medical paraphrasing, which diversifies the training data while maintaining the original content. Additionally, we propose a Data-Reweighting-Based Multi-Level Optimization Framework for Domain Adaptive Paraphrasing, supported by a Meta-Weight-Network (MWN). This innovative approach incorporates feedback from the downstream text classification model to influence the training of the paraphrasing model. During the training process, the framework assigns higher weights to the training examples that contribute more effectively to the downstream task of long COVID text classification. Our findings demonstrate that this method substantially improves the accuracy and efficiency of long COVID literature classification, offering a valuable tool for physicians and researchers navigating this complex and ever-evolving field.
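
The reweighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the Meta-Weight-Network's inputs or architecture, so the sketch assumes a tiny one-hidden-layer network that maps each training example's loss to a weight in (0, 1). The function names and the fixed parameter values are hypothetical; in the framework described above, such parameters would be learned from the downstream classifier's feedback rather than set by hand.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def meta_weight(loss, w1, b1, w2, b2):
    """Hypothetical weight network: per-example loss -> weight in (0, 1)."""
    # Hidden layer with ReLU activation.
    hidden = [max(0.0, w * loss + b) for w, b in zip(w1, b1)]
    # Scalar output squashed to (0, 1).
    return sigmoid(sum(h * w for h, w in zip(hidden, w2)) + b2)


def reweighted_loss(losses, params):
    """Average of per-example losses, each scaled by its assigned weight."""
    weights = [meta_weight(l, *params) for l in losses]
    total = sum(w * l for w, l in zip(weights, losses))
    return total / len(losses), weights


# Assumed, hand-picked parameters (w1, b1, w2, b2) for illustration only.
params = ([1.0, -1.0], [0.0, 1.0], [-2.0, 1.5], 0.0)
losses = [0.1, 0.5, 2.0]  # made-up per-example paraphrasing losses

loss, weights = reweighted_loss(losses, params)
print(loss, weights)
```

With these assumed parameters, examples with a larger loss receive a smaller weight, so they contribute less to the paraphrasing model's training objective. Which examples deserve higher weight in practice is exactly what the multi-level optimization is meant to learn.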

Subjects

COVID-19; Post-Acute COVID-19 Syndrome; Humans; Pandemics; Machine Learning; Health Personnel

Full text: 1 Database: MEDLINE Main subject: COVID-19 / Post-Acute COVID-19 Syndrome Study type: Prognostic_studies Limit: Humans Language: En Publication year: 2024 Document type: Article
