Prediction of DNA i-motifs via machine learning.
Yang, Bibo; Guneri, Dilek; Yu, Haopeng; Wright, Elisé P; Chen, Wenqian; Waller, Zoë A E; Ding, Yiliang.
Affiliation
  • Yang B; Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK.
  • Guneri D; School of Pharmacy, University College London, London WC1N 1AX, UK.
  • Yu H; Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK.
  • Wright EP; Molecular Physiology School of Medicine, and Molecular Medicine Research Group, University of Western Sydney, Campbelltown, NSW 1797, Australia.
  • Chen W; School of Pharmacy, University College London, London WC1N 1AX, UK.
  • Waller ZAE; School of Pharmacy, University College London, London WC1N 1AX, UK.
  • Ding Y; Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK.
Nucleic Acids Res; 52(5): 2188-2197, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38364855
ABSTRACT
i-Motifs (iMs) are secondary structures formed in cytosine-rich DNA sequences and are implicated in multiple genomic functions. Although putative iM-forming sequences are widely distributed in the human genome, their folding status and strength vary dramatically. Most previous research on iMs has focused on assessing iM folding properties through biophysical experiments. However, there are no dedicated computational tools for predicting the folding status and strength of iM structures. Here, we introduce a machine learning pipeline, iM-Seeker, to predict both the folding status and the structural stability of DNA iMs. The programme iM-Seeker incorporates a Balanced Random Forest classifier, trained on genome-wide iMab antibody-based CUT&Tag sequencing data, to predict folding status, and an Extreme Gradient Boosting regressor, trained on both published biophysical data and our in-house biophysical experiments, to estimate folding strength. iM-Seeker predicts DNA iM folding status with a classification accuracy of 81% and estimates folding strength with a coefficient of determination (R²) of 0.642 on the test set. Model interpretation confirms that the nucleotide composition of the C-rich sequence significantly affects iM stability, with a positive correlation with sequences containing cytosine and thymine and a negative correlation with guanine and adenine.
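The abstract names three technical ingredients: nucleotide-composition information about the C-rich sequence, a Balanced Random Forest (in which each tree is grown on a bootstrap sample that is class-balanced by undersampling), and evaluation by accuracy and R². The stdlib-only sketch below illustrates those ideas under stated assumptions. It is not the authors' iM-Seeker code, and its function names (`composition_features`, `balanced_bootstrap`, `accuracy`, `r_squared`) are hypothetical. The real feature set is not described here and is likely richer than base fractions alone.

```python
"""Hypothetical sketch of ideas named in the iM-Seeker abstract (not the authors' code)."""
import random


def composition_features(seq: str) -> dict[str, float]:
    """Fraction of each nucleotide in a sequence.

    Assumption: plain A/C/G/T fractions stand in for the paper's
    nucleotide-composition features, which are not specified here.
    """
    seq = seq.upper()
    n = len(seq)
    return {base: seq.count(base) / n for base in "ACGT"}


def balanced_bootstrap(labels: list[int], rng: random.Random) -> list[int]:
    """Row indices for one tree of a Balanced Random Forest.

    Each class is sampled with replacement down to the size of the
    smallest class, so every tree sees a class-balanced training set.
    """
    by_class: dict[int, list[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample: list[int] = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n_min))
    return sample


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of folding-status predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Example C-rich sequence (telomeric-like repeat, chosen for illustration)
    print(composition_features("CCCTAACCCTAACCCTAACCC"))
    # 10 "folded" vs 3 "unfolded" labels: each tree gets 3 of each class
    print(balanced_bootstrap([1] * 10 + [0] * 3, random.Random(0)))
```

Balancing per tree, rather than reweighting the whole dataset once, is what distinguishes a Balanced Random Forest from an ordinary random forest. It is one common way to handle imbalanced labels such as folded vs. unfolded regions in genome-wide data. In practice a library implementation (for example, imbalanced-learn's `BalancedRandomForestClassifier`) would be used rather than hand-rolled sampling.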
Subject(s): DNA / Nucleotide Motifs / Machine Learning
Database: MEDLINE | Limits: Humans | Language: English | Journal: Nucleic Acids Res | Year: 2024 | Document type: Article

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: DNA / Nucleotide Motifs / Machine Learning Limits: Humans Language: En Journal: Nucleic Acids Res Year: 2024 Document type: Article Affiliation country:
...