A Comprehensive Evaluation of Generalizability of Deep Learning-Based Hi-C Resolution Improvement Methods.
Murtaza, Ghulam; Jain, Atishay; Hughes, Madeline; Wagner, Justin; Singh, Ritambhara.
Affiliation
  • Murtaza G; Department of Computer Science, Brown University, Providence, RI 02912, USA.
  • Jain A; Department of Computer Science, Brown University, Providence, RI 02912, USA.
  • Hughes M; Department of Computer Science, Brown University, Providence, RI 02912, USA.
  • Wagner J; Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.
  • Singh R; Department of Computer Science, Brown University, Providence, RI 02912, USA.
Genes (Basel); 15(1), 2023 Dec 29.
Article in English | MEDLINE | ID: mdl-38254945
ABSTRACT
Hi-C is a widely used technique to study the 3D organization of the genome. Due to its high sequencing cost, most of the generated datasets are of a coarse resolution, which makes it impractical to study finer chromatin features such as Topologically Associating Domains (TADs) and chromatin loops. Multiple deep learning-based methods have recently been proposed to increase the resolution of these datasets by imputing Hi-C reads (typically called upscaling). However, existing works evaluate these methods on either synthetically downsampled datasets or a small subset of experimentally generated sparse Hi-C datasets, making it hard to establish their generalizability in real-world use cases. We present our framework, Hi-CY, which compares existing Hi-C resolution upscaling methods on seven experimentally generated low-resolution Hi-C datasets, spanning various levels of read sparsity and originating from three cell lines, using a comprehensive set of evaluation metrics. Hi-CY also includes four downstream analysis tasks, such as TAD and chromatin loop recall, to provide a thorough report on the generalizability of these methods. We observe that existing deep learning methods fail to generalize to experimentally generated sparse Hi-C datasets, showing a performance reduction of up to 57%. As a potential solution, we find that retraining deep learning-based methods with experimentally generated Hi-C datasets improves performance by up to 31%. More importantly, Hi-CY shows that even with retraining, the existing deep learning-based methods struggle to recover biological features such as chromatin loops and TADs when provided with sparse Hi-C datasets. Our study, through the Hi-CY framework, highlights the need for rigorous evaluation in the future. We identify specific avenues for improvement in current deep learning-based Hi-C upscaling methods, including but not limited to using experimentally generated datasets for training.
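The abstract contrasts synthetically downsampled datasets with experimentally generated sparse ones but does not say how the synthetic downsampling is done. As an illustration only, the sketch below shows one common way to simulate a sparser Hi-C contact matrix: binomial thinning, where each read is kept independently with a fixed probability. The function names, the toy matrix, and the 1/16 fraction are all hypothetical choices for this example and are not taken from the paper or from Hi-CY.

```python
import numpy as np


def downsample_contacts(matrix, fraction, seed=0):
    """Binomially thin a symmetric integer contact matrix.

    Each read is kept independently with probability `fraction`.
    Assumption: this is a generic thinning scheme, not the procedure
    used by any specific published method.
    """
    rng = np.random.default_rng(seed)
    # Thin only the upper triangle (diagonal included) so that each
    # bin pair is sampled once and the result stays symmetric.
    upper = np.triu(matrix)
    thinned = rng.binomial(upper, fraction)
    # Mirror the strictly upper part back into the lower triangle.
    return thinned + np.triu(thinned, k=1).T


def toy_contact_matrix(n_bins=50, seed=1):
    """Build a hypothetical contact matrix whose counts decay with distance."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n_bins)
    distance = np.abs(idx[:, None] - idx[None, :])
    expected = 200.0 / (1.0 + distance)
    upper = np.triu(rng.poisson(expected))
    return upper + np.triu(upper, k=1).T


dense = toy_contact_matrix()
sparse = downsample_contacts(dense, fraction=1 / 16)
print("reads before:", int(np.triu(dense).sum()))
print("reads after: ", int(np.triu(sparse).sum()))
```

Uniform thinning of this kind only reduces read counts. Experimentally generated sparse datasets may differ from it in other ways as well, which is consistent with the generalization gap the abstract reports.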

Full text: 1 Databases: MEDLINE Main subject: Deep Learning Language: En Journal: Genes (Basel) Year: 2023 Document type: Article Affiliation country: United States
