A Comprehensive Evaluation of Generalizability of Deep Learning-Based Hi-C Resolution Improvement Methods.

Murtaza, Ghulam; Jain, Atishay; Hughes, Madeline; Wagner, Justin; Singh, Ritambhara

Affiliation

Murtaza G; Department of Computer Science, Brown University, Providence, RI 02912, USA.
Jain A; Department of Computer Science, Brown University, Providence, RI 02912, USA.
Hughes M; Department of Computer Science, Brown University, Providence, RI 02912, USA.
Wagner J; Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.
Singh R; Department of Computer Science, Brown University, Providence, RI 02912, USA.

Genes (Basel); 15(1), 2023 Dec 29.

Article in English | MEDLINE | ID: mdl-38254945

ABSTRACT

Hi-C is a widely used technique to study the 3D organization of the genome. Due to its high sequencing cost, most of the generated datasets are of a coarse resolution, which makes it impractical to study finer chromatin features such as Topologically Associating Domains (TADs) and chromatin loops. Multiple deep learning-based methods have recently been proposed to increase the resolution of these datasets by imputing Hi-C reads (typically called upscaling). However, existing works evaluate these methods either on synthetically downsampled datasets or on a small subset of experimentally generated sparse Hi-C datasets, making it hard to establish their generalizability in the real-world use case. We present our framework, Hi-CY, which compares existing Hi-C resolution upscaling methods on seven experimentally generated low-resolution Hi-C datasets, spanning various levels of read sparsity and originating from three cell lines, on a comprehensive set of evaluation metrics. Hi-CY also includes four downstream analysis tasks, such as TAD and chromatin loop recall, to provide a thorough report on the generalizability of these methods. We observe that existing deep learning methods fail to generalize to experimentally generated sparse Hi-C datasets, showing a performance reduction of up to 57%. As a potential solution, we find that retraining deep learning-based methods with experimentally generated Hi-C datasets improves performance by up to 31%. More importantly, Hi-CY shows that even with retraining, the existing deep learning-based methods struggle to recover biological features such as chromatin loops and TADs when provided with sparse Hi-C datasets. Our study, through the Hi-CY framework, highlights the need for rigorous evaluation in the future. We identify specific avenues for improvement in the current deep learning-based Hi-C upscaling methods, including but not limited to using experimentally generated datasets for training.
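
To make the contrast with "synthetically downsampled" data concrete, the following minimal sketch shows one common way such inputs are produced: binomial thinning of a dense contact matrix. The abstract does not state which downsampling procedure the evaluated methods used, so the technique, the function name `downsample_contacts`, the toy matrix, and the 1/16 keep fraction are all assumptions made for illustration.

```python
import numpy as np


def downsample_contacts(matrix, keep_fraction, seed=0):
    """Binomially thin a symmetric matrix of integer contact counts.

    Hypothetical helper: each read is kept independently with
    probability `keep_fraction`. This is an assumed procedure, not
    necessarily the one used by the methods in the study.
    """
    rng = np.random.default_rng(seed)
    # Thin only the upper triangle (including the diagonal) so the
    # result can be mirrored and stays symmetric.
    upper = np.triu(matrix).astype(np.int64)
    thinned = rng.binomial(upper, keep_fraction)
    return thinned + np.triu(thinned, k=1).T


# Toy symmetric "dense" contact matrix (illustrative values only).
toy_rng = np.random.default_rng(1)
counts = toy_rng.poisson(20, size=(6, 6))
dense = np.triu(counts) + np.triu(counts, k=1).T

sparse = downsample_contacts(dense, keep_fraction=1 / 16)
print(dense.sum(), sparse.sum())
```

A matrix thinned this way inherits the noise profile of the deep experiment it came from, which is one plausible reason models trained on it may not transfer to experimentally generated sparse datasets.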

Subject(s)

Deep Learning; Benchmarking; Cell Line; Chromatin/genetics

Keywords

Hi-C; chromosome conformation capture; generalizability; resolution improvement


Full text: 1 | Databases: MEDLINE | Main subject: Deep Learning | Language: English | Journal: Genes (Basel) | Year: 2023 | Document type: Article | Country of affiliation: United States
