TeaTFactor: a prediction tool for tea plant transcription factors based on BERT.
IEEE/ACM Trans Comput Biol Bioinform
; PP2024 Aug 16.
Article
en En
| MEDLINE
| ID: mdl-39150804
ABSTRACT
A transcription factor (TF) is a sequence-specific DNA-binding protein, which plays key roles in cell-fate decision by regulating gene expression. Predicting TFs is key for tea plant research community, as they regulate gene expression, influencing plant growth, development, and stress responses. It is a challenging task through wet lab experimental validation, due to their rarity, as well as the high cost and time requirements. As a result, computational methods are increasingly popular to be chosen. The pre-training strategy has been applied to many tasks in natural language processing (NLP) and has achieved impressive performance. In this paper, we present a novel recognition algorithm named TeaTFactor that utilizes pre-training for the model training of TFs prediction. The model is built upon the BERT architecture, initially pre-trained using protein data from UniProt. Subsequently, the model was fine-tuned using the collected TFs data of tea plants. We evaluated four different word segmentation methods and the existing state-of-the-art prediction tools. According to the comprehensive experimental results and a case study, our model is superior to existing models and achieves the goal of accurate identification. In addition, we have developed a web server at http//teatfactor.tlds.cc, which we believe will facilitate future studies on tea transcription factors and advance the field of crop synthetic biology.
Texto completo:
1
Colección:
01-internacional
Base de datos:
MEDLINE
Idioma:
En
Revista:
ACM Trans Comput Biol Bioinform
Asunto de la revista:
BIOLOGIA
/
INFORMATICA MEDICA
Año:
2024
Tipo del documento:
Article