Deciphering cell types by integrating scATAC-seq data with genome sequences.
Zeng, Yuansong; Luo, Mai; Shangguan, Ningyuan; Shi, Peiyu; Feng, Junxi; Xu, Jin; Chen, Ken; Lu, Yutong; Yu, Weijiang; Yang, Yuedong.
Affiliation
  • Zeng Y; School of Big Data and Software Engineering, Chongqing University, Chongqing, China.
  • Luo M; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
  • Shangguan N; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
  • Shi P; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
  • Feng J; State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China.
  • Xu J; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
  • Chen K; State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China.
  • Lu Y; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
  • Yu W; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
  • Yang Y; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
Nat Comput Sci ; 4(4): 285-298, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38600256
ABSTRACT
The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) technology provides insight into gene regulation and epigenetic heterogeneity at single-cell resolution, but cell annotation from scATAC-seq remains challenging due to high dimensionality and extreme sparsity within the data. Existing cell annotation methods mostly focus on the cell peak matrix without fully utilizing the underlying genomic sequence. Here we propose a method, SANGO, for accurate single-cell annotation by integrating genome sequences around the accessibility peaks within scATAC data. The genome sequences of peaks are encoded into low-dimensional embeddings, and then iteratively used to reconstruct the peak statistics of cells through a fully connected network. The learned weights are considered as regulatory modes to represent cells, and utilized to align the query cells and the annotated cells in the reference data through a graph transformer network for cell annotations. SANGO was demonstrated to consistently outperform competing methods on 55 paired scATAC-seq datasets across samples, platforms and tissues. SANGO was also shown to be able to detect unknown tumor cells through attention edge weights learned by the graph transformer. Moreover, from the annotated cells, we found cell-type-specific peaks that provide functional insights/biological signals through expression enrichment analysis, cis-regulatory chromatin interaction analysis and motif enrichment analysis.
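The core idea in the abstract (peak sequences are embedded, then a dense layer reconstructs each cell's accessibility, and that layer's weights serve as the cell representation) can be illustrated with a minimal sketch. This is not the SANGO implementation: the abstract gives no architectural details, so the fixed random-projection encoder, the logistic loss, the plain gradient descent, and the names `encode_peaks` and `fit_cell_weights` are all assumptions made for illustration. The graph transformer stage is not covered.

```python
import numpy as np

BASES = "ACGT"

def encode_peaks(seqs, dim, rng):
    """Hypothetical stand-in for a learned sequence encoder.

    One-hot encodes equal-length DNA strings and applies a fixed random
    projection to obtain a (n_peaks, dim) embedding matrix.
    """
    length = len(seqs[0])
    onehot = np.zeros((len(seqs), length * 4))
    for i, seq in enumerate(seqs):
        for j, base in enumerate(seq):
            onehot[i, j * 4 + BASES.index(base)] = 1.0
    proj = rng.normal(size=(length * 4, dim)) / np.sqrt(length)
    return onehot @ proj

def fit_cell_weights(peak_emb, access, lr=1.0, steps=300):
    """Fit one dense layer that maps peak embeddings to per-cell accessibility.

    peak_emb: (n_peaks, dim) peak embeddings.
    access:   (n_peaks, n_cells) binary accessibility matrix.
    Returns W (dim, n_cells), b (n_cells,) and the loss history.
    Column j of W is the assumed representation of cell j.
    """
    n_peaks, dim = peak_emb.shape
    n_cells = access.shape[1]
    W = np.zeros((dim, n_cells))
    b = np.zeros(n_cells)
    losses = []
    for _ in range(steps):
        logits = peak_emb @ W + b
        prob = 1.0 / (1.0 + np.exp(-logits))
        p = np.clip(prob, 1e-9, 1.0 - 1e-9)
        # Binary cross-entropy, averaged over peaks and summed over cells.
        bce = -(access * np.log(p) + (1.0 - access) * np.log(1.0 - p))
        losses.append(bce.mean(axis=0).sum())
        # Gradient of that loss with respect to the logits.
        grad = (prob - access) / n_peaks
        W -= lr * (peak_emb.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b, losses
```

In such a setup, the columns of `W` could be compared between reference and query cells; how SANGO actually aligns them is described in the paper, not here.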

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Chromatin / Single-Cell Analysis | Limits: Humans | Language: En | Journal: Nat Comput Sci / Nat. comput. sci / Nature computational science | Year: 2024 | Document type: Article | Country of affiliation: China | Country of publication: United States of America