Are dropout imputation methods for scRNA-seq effective for scATAC-seq data?

Liu, Yue; Zhang, Junfeng; Wang, Shulin; Zeng, Xiangxiang; Zhang, Wei

Affiliation

Liu Y; College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China.
Zhang J; College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China.
Wang S; College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China.
Zeng X; College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China.
Zhang W; College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, Hunan 410003, China.

Brief Bioinform. 2022 Jan 17; 23(1).

Article in English | MEDLINE | ID: mdl-34718405

ABSTRACT

The tremendous progress of single-cell sequencing technology has given researchers the opportunity to study cell development and differentiation at single-cell resolution. The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) was proposed for genome-wide analysis of chromatin accessibility. Owing to technical limitations and other factors, dropout events are pervasive in extremely sparse single-cell ATAC-seq (scATAC-seq) data, confounding downstream analyses such as clustering. Although considerable progress has been made in imputing scRNA-seq data, there is currently no method dedicated to inferring dropout events in scATAC-seq data. In this paper, we select several state-of-the-art scRNA-seq imputation methods from recent years (MAGIC, SAVER, scImpute, deepImpute, PRIME, bayNorm and knn-smoothing) to infer dropout peaks in scATAC-seq data, and systematically evaluate them through several downstream analyses. Specifically, we benchmarked these methods in terms of correlation with meta-cells, clustering, subpopulation distance analysis, imputation performance on corrupted datasets, identification of TF motifs and computation time. The experimental results indicated that imputation increased the correlation with the reference meta-cell in most cases, but the performance of the methods varied greatly across datasets and downstream analyses, so they should be used with caution. Overall, MAGIC outperformed the other methods most consistently across all assessments. Our source code is freely available at https://github.com/yueyueliu/scATAC-master.
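To make the "correlation with meta-cell" criterion more concrete, the following is a minimal sketch of one way such a score could be computed for a cell-by-peak matrix, before or after imputation. It assumes that cell-type labels are known and that a meta-cell is simply the mean peak profile of all cells sharing a label; the function name `meta_cell_correlation` and this meta-cell construction are illustrative assumptions, not the authors' implementation (see their repository for the actual procedure).

```python
import numpy as np


def meta_cell_correlation(counts, labels):
    """Pearson correlation of each cell's peak profile with its group's meta-cell.

    counts: (n_cells, n_peaks) array of raw or imputed accessibility values.
    labels: length-n_cells array of cell-type labels (assumed known).

    Assumption (hypothetical): the meta-cell of a group is the mean profile
    of all cells carrying that label. The paper may define it differently.
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    corrs = np.empty(counts.shape[0])
    for lab in np.unique(labels):
        idx = np.where(labels == lab)[0]
        # Average the member cells to form the reference meta-cell.
        meta = counts[idx].mean(axis=0)
        for i in idx:
            # Off-diagonal entry of the 2x2 correlation matrix is Pearson's r.
            corrs[i] = np.corrcoef(counts[i], meta)[0, 1]
    return corrs


# Toy binary peak matrix: two cell types, four peaks (made-up data).
toy = [[1, 0, 1, 0],
       [1, 1, 1, 0],
       [0, 1, 0, 1],
       [0, 1, 0, 1]]
print(meta_cell_correlation(toy, ["a", "a", "b", "b"]))
```

Comparing the mean of these scores on the raw matrix with the mean on an imputed matrix gives one simple way to ask whether imputation moved cells closer to their reference profiles. Note that a cell with a constant profile (for example, all zeros) yields an undefined correlation (`nan`), which a real pipeline would need to handle.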

Subject(s)

Single-Cell Analysis; Software; Cluster Analysis; Sequence Analysis, RNA; Exome Sequencing

Key words

dropout events; imputation method; single-cell ATAC-seq; single-cell RNA-seq