Employing bimodal representations to predict DNA bendability within a self-supervised pre-trained framework.
Yang, Minghao; Zhang, Shichen; Zheng, Zhihang; Zhang, Pengfei; Liang, Yan; Tang, Shaojun.
Affiliation
  • Yang M; Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China.
  • Zhang S; Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China.
  • Zheng Z; Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China.
  • Zhang P; Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China.
  • Liang Y; School of Artificial Intelligence, South China Normal University, Foshan 528225, China.
  • Tang S; Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China.
Nucleic Acids Res; 52(6): e33, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38375921
ABSTRACT
The bendability of genomic DNA, which measures the DNA looping rate, is crucial for numerous biological processes. Recently, an advanced high-throughput technique known as 'loop-seq' has made it possible to measure the inherent cyclizability of DNA fragments. However, quantifying the bendability of large-scale DNA is costly, laborious, and time-consuming. To close the gap between rapidly evolving large language models and expanding genomic sequence information, and to elucidate the impact of DNA bendability on critical regulatory sequence motifs such as super-enhancers in the human genome, we introduce a computational model, named MIXBend, that predicts DNA bendability from both nucleotide sequences and physicochemical properties. In MIXBend, the pre-trained language model DNABERT and a convolutional neural network with an attention mechanism are used to construct sequence-based and physicochemical-based extractors that refine DNA sequence representations. These bimodal DNA representations are then fed to a k-mer sequence-physicochemistry matching module to minimize the semantic gap between the two modalities. Lastly, a self-attention fusion layer is employed to predict DNA bendability. Experimental results show that MIXBend outperforms other state-of-the-art methods. Additionally, MIXBend reveals both novel and known motifs in yeast. Moreover, MIXBend discovers significant bendability fluctuations within super-enhancer regions and transcription factor binding sites in the human genome.
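As a minimal illustration of the k-mer representation mentioned in the abstract, the sketch below splits a DNA sequence into overlapping k-mers, which is the general style of tokenization used by DNABERT-type models. The function name `kmer_tokens` and the choice of k = 3 are assumptions made for this example; the abstract does not state the k value or the tokenization code used by MIXBend.

```python
def kmer_tokens(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers with stride 1.

    Hypothetical helper for illustration only; it is not taken from
    the MIXBend code base.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence of length n yields n - k + 1 overlapping k-mers
    # (an empty list when the sequence is shorter than k).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    # k = 3 is an assumed value chosen to keep the output short.
    print(kmer_tokens("ATGCGTAC", k=3))
```

Each k-mer could then be paired with per-k-mer physicochemical descriptors to form the second modality; the specific descriptors are not given in the abstract and are therefore not shown here.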
Subjects

Full text: 1 | Database: MEDLINE | Main subject: DNA / Computational Biology | Limit: Humans | Language: English | Year of publication: 2024 | Document type: Article