Promoter prediction in Nannochloropsis based on densely connected convolutional neural networks.
Wei, Pi-Jing; Pang, Zhen-Zhen; Jiang, Lin-Jie; Tan, Da-Yu; Su, Yan-Sen; Zheng, Chun-Hou.
Affiliation
  • Wei PJ; Institutes of Physical Science and Information Technology, Anhui University, Hefei, China.
  • Pang ZZ; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China.
  • Jiang LJ; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China.
  • Tan DY; Institutes of Physical Science and Information Technology, Anhui University, Hefei, China.
  • Su YS; School of Artificial Intelligence, Anhui University, Hefei, China. Electronic address: suyansen@ahu.edu.cn.
  • Zheng CH; School of Artificial Intelligence, Anhui University, Hefei, China.
Methods; 204: 38-46, August 2022.
Article in English | MEDLINE | ID: mdl-35367367
ABSTRACT
A promoter is a key DNA element located near the transcription start site that regulates gene transcription by binding RNA polymerase. The identification of promoters is therefore an important research area in synthetic biology. Nannochloropsis is an important unicellular, industrial, oleaginous microalga. Some studies have identified promoters with specific functions in Nannochloropsis by biological methods, whereas few have used computational methods. Here, we propose a method called DNPPro (DenseNet-Predict-Promoter), based on densely connected convolutional neural networks, to predict promoters in Nannochloropsis. First, we collected promoter sequences from six Nannochloropsis strains and, for each strain, used CD-HIT to remove redundant sequences at an 80% similarity threshold, yielding a reliable positive dataset. Then, to construct a robust classifier, we used a within-group scrambling method to generate the negative dataset, which overcomes the limitation of randomly selecting non-promoter regions from the same genome as negative samples. Finally, we constructed a densely connected convolutional neural network that takes the one-hot encoding of the sequence as input. Compared with commonly used sequence processing methods, DNPPro can extract long sequence features to a greater extent. A cross-strain experiment on an independent dataset verifies the generalization ability of our method, and t-SNE visualization shows that it can effectively distinguish promoters from non-promoters.
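The two preprocessing ideas named in the abstract, one-hot encoding and scrambling-based negative samples, can be illustrated with a minimal sketch. This is not the DNPPro implementation: the function names, the `group_size` parameter, the all-zero row for unknown bases, and the reading of "within-group scrambling" as shuffling nucleotides inside fixed-length groups are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"  # column order of the one-hot rows


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Assumption: bases outside ACGT (for example N) become all-zero rows.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def scramble_within_groups(seq, group_size=10, seed=None):
    """Hypothetical negative-sample generator.

    Splits the sequence into consecutive groups of `group_size` bases and
    shuffles the bases inside each group, so local base composition is
    kept while the original motif order is disrupted.
    """
    rng = random.Random(seed)
    pieces = []
    for start in range(0, len(seq), group_size):
        group = list(seq[start:start + group_size])
        rng.shuffle(group)
        pieces.append("".join(group))
    return "".join(pieces)
```

Under these assumptions, a negative sample has the same length and per-group base composition as the promoter it was derived from, which is one way to avoid drawing negatives from arbitrary non-promoter regions of the genome.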

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Neural Networks, Computer / Synthetic Biology Type of study: Prognostic_studies / Risk_factors_studies Language: En Year of publication: 2022 Document type: Article