Toward Robust Self-Training Paradigm for Molecular Prediction Tasks.

Ma, Hehuan; Jiang, Feng; Rong, Yu; Guo, Yuzhi; Huang, Junzhou

Affiliation

Ma H; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA.
Jiang F; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA.
Rong Y; Tencent AI Lab, Shenzhen, China.
Guo Y; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA.
Huang J; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA.

J Comput Biol; 31(3): 213-228, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38531049

ABSTRACT

Molecular prediction tasks normally require a series of professional experiments to label the target molecule, so labeled data are limited. One of the semisupervised learning paradigms, known as self-training, utilizes both labeled and unlabeled data. Specifically, a teacher model is trained on labeled data and produces pseudo labels for unlabeled data. The labeled and pseudo-labeled data are then used jointly to train a student model. However, the pseudo labels generated by the teacher model are generally not sufficiently accurate. We therefore propose a robust self-training strategy that explores robust loss functions to handle such noisy labels under two paradigms, generic and adaptive. We have conducted experiments on three molecular biology prediction tasks with four backbone models to gradually evaluate the performance of the proposed robust self-training strategy. The results demonstrate that the proposed method enhances prediction performance across all tasks, notably in molecular regression tasks, where the average enhancement is 41.5%. Furthermore, the visualization analysis confirms the superiority of our method. Our proposed robust self-training is a simple yet effective strategy that efficiently improves molecular biology prediction performance. It tackles the shortage of labeled data in molecular biology by taking advantage of both labeled and unlabeled data. Moreover, it can be easily embedded into any prediction task, making it a universal approach for the bioinformatics community.
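The teacher, pseudo-label, and student steps described in the abstract can be illustrated with a minimal sketch for a regression task. It is not the authors' implementation: the abstract does not name the robust loss functions or the backbone models, so the Huber loss, the linear model, and the names `fit_linear`, `huber_grad`, and `self_train` are stand-in assumptions chosen for illustration. The generic and adaptive paradigms are not modeled here.

```python
import numpy as np


def huber_grad(residual, delta=1.0):
    """Gradient of the Huber loss with respect to the prediction.

    residual = prediction - target. Inside [-delta, delta] the gradient is
    the residual itself; outside it is capped at +/- delta, which limits
    the influence of badly wrong (noisy) labels.
    """
    return np.clip(residual, -delta, delta)


def fit_linear(X, y, loss="squared", delta=1.0, lr=0.1, steps=500):
    """Fit y ~ X @ w + b by plain gradient descent.

    loss="squared" uses the ordinary squared-error gradient;
    loss="huber" uses the capped gradient above (assumed robust loss).
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        residual = X @ w + b - y
        g = residual if loss == "squared" else huber_grad(residual, delta)
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b


def self_train(X_lab, y_lab, X_unlab, delta=1.0):
    """One round of self-training with a robust student loss (a sketch)."""
    # 1) Teacher: trained on the labeled data only.
    w_t, b_t = fit_linear(X_lab, y_lab, loss="squared")
    # 2) Pseudo labels: teacher predictions on the unlabeled data.
    pseudo = X_unlab @ w_t + b_t
    # 3) Student: trained jointly on labeled and pseudo-labeled data,
    #    with a robust loss so that noisy pseudo labels count for less.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    w_s, b_s = fit_linear(X_all, y_all, loss="huber", delta=delta)
    return (w_t, b_t), (w_s, b_s), pseudo


if __name__ == "__main__":
    # Synthetic data, purely illustrative (not from the paper).
    rng = np.random.default_rng(0)
    X_lab = rng.normal(size=(40, 2))
    y_lab = X_lab @ np.array([1.5, -2.0]) + 0.3 + 0.1 * rng.normal(size=40)
    X_unlab = rng.normal(size=(200, 2))
    teacher, student, pseudo = self_train(X_lab, y_lab, X_unlab)
    print("teacher weights:", teacher[0], "student weights:", student[0])
```

In this sketch the only difference between teacher and student training is the gradient: the student's per-sample gradient is capped at `delta`, so a pseudo label that is far from the student's prediction cannot dominate the update. In practice the linear model would be replaced by a molecular backbone and the loss by whichever robust loss suits the task.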

Subjects

Computational Biology; Molecular Biology; Humans; Supervised Machine Learning

Keywords

deep learning; molecular prediction tasks; semisupervised learning

Full text: 1 Database: MEDLINE Main subject: Computational Biology / Molecular Biology Limit: Humans Language: En Journal: J Comput Biol / J. comput. biol / Journal of computational biology Journal subject: MOLECULAR BIOLOGY / MEDICAL INFORMATICS Publication year: 2024 Document type: Article Affiliation country: United States
