GraphPro: An interpretable graph neural network-based model for identifying promoters in multiple species.

Zhang, Qi; Wei, Yuxiao; Liu, Liwei

Affiliation

Zhang Q; College of Science, Dalian Jiaotong University, Dalian, 116028, China.
Wei Y; College of Software, Dalian Jiaotong University, Dalian, 116028, China.
Liu L; College of Science, Dalian Jiaotong University, Dalian, 116028, China. Electronic address: liutree80@163.com.

Comput Biol Med; 180: 108974, 2024 Sep.

Article in English | MEDLINE | ID: mdl-39096613

ABSTRACT

Promoters are DNA sequences that bind RNA polymerase to initiate transcription, and they regulate this process through interactions with transcription factors. Accurate identification of promoters is crucial for understanding the mechanisms of gene expression regulation and for developing therapeutic approaches to various diseases. However, experimental techniques for promoter identification are often expensive, time-consuming, and inefficient, which calls for accurate and efficient computational models for this task. Enhancing a model's ability to recognize promoters across multiple species and improving its interpretability remain significant challenges. In this study, we introduce GraphPro, a novel interpretable model based on graph neural networks, for multi-species promoter identification. First, we encode the sequences using the k-tuple nucleotide frequency pattern, dinucleotide physicochemical properties, and dna2vec. Next, we construct two feature extraction modules based on convolutional neural networks and graph neural networks. These modules are designed to extract specific motifs from the promoters, learn the dependencies between them, and capture the underlying structural features of the promoters, providing a more comprehensive representation. Finally, a fully connected neural network predicts whether the input sequence is a promoter. We conducted extensive experiments on promoter datasets from eight species, including human, mouse, and Escherichia coli. The experimental results show that GraphPro's average sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC) are 0.9123, 0.9482, 0.8840, and 0.7984, respectively. Compared with previous promoter identification methods, GraphPro not only achieves higher recognition accuracy across multiple species but also outperforms all previous methods in cross-species prediction.
Furthermore, we visualized GraphPro's decision process and analyzed the sequences that match the transcription factor binding motifs captured by the model, which validates its significant advantages in biological interpretability. The source code for GraphPro is available at https://github.com/liuliwei1980/GraphPro.
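
The k-tuple nucleotide frequency encoding named in the abstract can be illustrated with a minimal Python sketch. The sketch assumes a common formulation. For each k from 1 to a chosen maximum, every one of the 4^k possible k-mers over A, C, G, T is counted with a sliding window, and each count is divided by the number of valid windows. The abstract does not state GraphPro's exact k values or normalization. The function names and the default max_k=3 are therefore hypothetical.

```python
from itertools import product


def ktuple_frequencies(seq: str, k: int) -> list[float]:
    """Normalized frequency of each k-mer over ACGT, in lexicographic k-mer order.

    Hypothetical helper: GraphPro's actual k range and normalization are not
    given in the abstract; this follows a common k-mer frequency formulation.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    valid = 0
    # Slide a window of width k across the sequence.
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:  # windows containing N or other symbols are skipped
            counts[index[kmer]] += 1
            valid += 1
    return [c / valid if valid else 0.0 for c in counts]


def ktuple_feature_vector(seq: str, max_k: int = 3) -> list[float]:
    """Concatenate k-mer frequencies for k = 1..max_k (hypothetical default max_k=3)."""
    features: list[float] = []
    for k in range(1, max_k + 1):
        features.extend(ktuple_frequencies(seq, k))
    return features
```

With max_k=3 the vector has 4 + 16 + 64 = 84 entries. Windows that contain ambiguous bases such as N are skipped, so they do not count toward the normalizing total.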
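
The reported Sn, Sp, Acc, and MCC values follow the standard confusion-matrix definitions. The sketch below computes these metrics for binary labels. The label convention (1 = promoter, 0 = non-promoter) and the function name are assumptions. The abstract also does not say how the per-species values were averaged.

```python
import math


def promoter_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sn, Sp, Acc and MCC for binary labels (assumed: 1 = promoter, 0 = non-promoter)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: recall on promoters
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity: recall on non-promoters
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

For example, take three promoters and three non-promoters where one of each is misclassified. In that case Sn, Sp, and Acc all equal 2/3, and MCC equals 1/3.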

Subjects

Neural Networks, Computer; Promoter Regions, Genetic; Humans; Animals; Computational Biology/methods; Sequence Analysis, DNA/methods; Mice; Software

Keywords

Deep learning; Model interpretability; Promoter; Representation learning

Database: MEDLINE | Main subject: Promoter Regions, Genetic / Neural Networks, Computer | Language: English | Year of publication: 2024 | Document type: Article