Uncovering co-regulatory modules and gene regulatory networks in the heart through machine learning-based analysis of large-scale epigenomic data.
Vahab, Naima; Bonu, Tarun; Kuhlmann, Levin; Ramialison, Mirana; Tyagi, Sonika.
Affiliation
  • Vahab N; School of Computational Technologies, RMIT University, Melbourne VIC 3000, Australia; Department of Infectious Diseases, Alfred Hospital, Prahran VIC 3008, Australia.
  • Bonu T; Faculty of Information Technology, Monash University, Clayton VIC 3800, Australia.
  • Kuhlmann L; Faculty of Information Technology, Monash University, Clayton VIC 3800, Australia.
  • Ramialison M; Murdoch Children's Research Institute, Melbourne VIC 3000, Australia.
  • Tyagi S; School of Computational Technologies, RMIT University, Melbourne VIC 3000, Australia; Department of Infectious Diseases, Alfred Hospital, Prahran VIC 3008, Australia. Electronic address: sonika.tyagi@rmit.edu.au.
Comput Biol Med; 171: 108068, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38354497
ABSTRACT
The availability of large-scale epigenomic data from various cell types and conditions has yielded valuable insights for evaluating and learning features that predict the co-binding of transcription factors (TFs). However, prior attempts to develop models predicting motif co-occurrence lacked the scalability needed to analyze any motif combination globally or to make cross-species predictions. Moreover, mapping co-regulatory modules (CRMs) to gene regulatory networks (GRNs) is crucial for understanding underlying function. Currently, no comprehensive pipeline exists for large-scale, rapid, and accurate CRM and GRN identification. In this study, we analyzed and evaluated different TF binding characteristics that facilitate biologically significant co-binding, in order to identify all potential clusters of co-binding TFs. We curated the UniBind database, containing ChIP-Seq data from over 1983 samples and 232 TFs, and implemented two machine learning models to predict CRMs and the potential regulatory networks they operate on. The two models, a Convolutional Neural Network (CNN) and a Random Forest Classifier (RFC), both used to predict co-binding between TFs, were compared using precision-recall and Receiver Operating Characteristic (ROC) curves. The CNN outperformed the RFC (AUC 0.94 vs. 0.88) and achieved a higher F1 score (0.938 vs. 0.872). The CRMs generated by the clustering algorithm were validated against ChIP-Atlas and MCOT, revealing additional motifs forming CRMs. We predicted 200k CRMs for 50k+ human genes, validated against recent CRM prediction methods with 100% overlap. Further, we narrowed our focus to heart-related regulatory motifs, filtering the generated CRMs to report 1784 cardiac CRMs containing at least four cardiac TFs. The identified cardiac CRMs revealed potential novel regulators such as ARID3A and RXRB for SCAD, as well as known TFs such as PPARG for F11R.
Our findings highlight the importance of the NKX family of transcription factors in cardiac development and provide potential targets for further investigation in cardiac disease.
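The abstract compares the two classifiers by AUC and F1 score. As a reading aid, the sketch below shows how these two metrics are defined for a binary co-binding label. It is not the authors' pipeline: the labels, the scores, the 0.5 cutoff, and the names `model_a_scores` and `model_b_scores` are invented for illustration, and the AUC is computed with the plain pairwise-ranking definition rather than any particular library.

```python
def f1_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for binary labels (1 = co-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC AUC: probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold(scores, cutoff=0.5):
    """Turn predicted probabilities into hard 0/1 labels (cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]


# Toy data, invented for illustration only (not taken from the study).
y_true = [1, 1, 1, 0, 0, 0]
model_a_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
model_b_scores = [0.6, 0.3, 0.4, 0.5, 0.7, 0.1]

for name, scores in [("model A", model_a_scores), ("model B", model_b_scores)]:
    auc = roc_auc(y_true, scores)
    f1 = f1_score(y_true, threshold(scores))
    print(f"{name}: AUC={auc:.3f} F1={f1:.3f}")
```

AUC is threshold-free and summarizes ranking quality, while F1 depends on the chosen cutoff, which is one reason the two are typically reported together.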
Full text: 1 Database: MEDLINE Main subject: Gene Regulatory Networks / Epigenomics Study type: Prognostic_studies Language: En Journal: Comput Biol Med Year: 2024 Document type: Article
