Scalable and unbiased sequence-informed embedding of single-cell ATAC-seq data with CellSpace.
Nat Methods; 21(6): 1014-1022, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38724693
ABSTRACT
Standard scATAC sequencing (scATAC-seq) analysis pipelines represent cells as sparse numeric vectors relative to an atlas of peaks or genomic tiles and consequently ignore genomic sequence information at accessible loci. Here we present CellSpace, an efficient and scalable sequence-informed embedding algorithm for scATAC-seq that learns a mapping of DNA k-mers and cells to the same space, to address this limitation. We show that CellSpace captures meaningful latent structure in scATAC-seq datasets, including cell subpopulations and developmental hierarchies, and can score transcription factor activities in single cells based on proximity to binding motifs embedded in the same space. Importantly, CellSpace implicitly mitigates batch effects arising from multiple samples, donors or assays, even when individual datasets are processed relative to different peak atlases. Thus, CellSpace provides a powerful tool for integrating and interpreting large-scale scATAC-seq compendia.
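The idea of scoring transcription factor activity by proximity in a shared embedding space can be illustrated with a toy sketch. This is not CellSpace's code or API: the helper names (`kmers`, `embed_sequence`, `cosine`), the hand-written two-dimensional vectors, the choice to embed a motif as the sum of its k-mer vectors, and the use of cosine similarity are all assumptions made for illustration. In the actual method the k-mer and cell embeddings are learned from the data.

```python
import math

def kmers(seq, k):
    """Return all overlapping k-mers of a DNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def embed_sequence(seq, kmer_vecs, k):
    """Embed a sequence as the sum of its known k-mer vectors (illustrative assumption)."""
    dim = len(next(iter(kmer_vecs.values())))
    total = [0.0] * dim
    for km in kmers(seq, k):
        vec = kmer_vecs.get(km)
        if vec is not None:  # skip k-mers that have no embedding
            total = [t + v for t, v in zip(total, vec)]
    return total

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

# Hypothetical embeddings: k-mers and cells live in the same 2-D space.
kmer_vecs = {"ACG": [1.0, 0.0], "CGT": [0.8, 0.2], "TTT": [0.0, 1.0]}
cell_vecs = {"cell_a": [1.0, 0.1], "cell_b": [0.1, 1.0]}

# Embed a hypothetical motif sequence, then score each cell by its proximity to it.
motif_vec = embed_sequence("ACGT", kmer_vecs, k=3)
scores = {cell: cosine(vec, motif_vec) for cell, vec in cell_vecs.items()}
```

In this toy setup `cell_a` lies close to the motif's k-mers and so receives a higher score than `cell_b`, mirroring the proximity-based scoring the abstract describes.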
Full text:
1
Database:
MEDLINE
Main subject:
Algorithms
/
Single-Cell Analysis
/
Chromatin Immunoprecipitation Sequencing
Limits:
Animals
/
Humans
Language:
En
Publication year:
2024
Document type:
Article