Anchor Clustering for million-scale immune repertoire sequencing data.

Chang, Haiyang; Ashlock, Daniel A; Graether, Steffen P; Keller, Stefan M

Affiliation

Chang H; Department of Mathematics and Statistics, University of Guelph, 50 Stone Rd E, Guelph, ON, N1G 2W1, Canada.
Ashlock DA; Department of Mathematics and Statistics, University of Guelph, 50 Stone Rd E, Guelph, ON, N1G 2W1, Canada.
Graether SP; Department of Molecular and Cellular Biology, University of Guelph, 50 Stone Rd E, Guelph, ON, N1G 2W1, Canada.
Keller SM; Department of Pathology, Microbiology and Immunology, School of Veterinary Medicine, University of California Davis, One Shields Avenue, Davis, CA, 95616, USA. smkeller@ucdavis.edu.

BMC Bioinformatics ; 25(1): 42, 2024 Jan 25.

Article in En | MEDLINE | ID: mdl-38273275

ABSTRACT

BACKGROUND:

The clustering of immune repertoire data is challenging because of the computational cost of a very large number of pairwise sequence comparisons. To overcome this limitation, we developed Anchor Clustering, an unsupervised clustering method designed to identify similar sequences among millions of antigen receptor gene sequences. First, a Point Packing algorithm is used to identify a set of maximally spaced anchor sequences. Then, the genetic distance from each remaining sequence to every anchor sequence is calculated, and these distances are assembled into one distance vector per sequence. Finally, the distance vectors are clustered using unsupervised clustering. This process is repeated iteratively until the resulting clusters are small enough for pairwise distance comparisons to be performed.
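The steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a greedy farthest-first selection stands in for the Point Packing algorithm, Levenshtein edit distance stands in for the genetic distance, and assignment to the nearest anchor stands in for the unsupervised clustering of distance vectors. All function names and the parameters `k` and `max_size` are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed here as the 'genetic distance')."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def pick_anchors(seqs: Sequence[str], k: int) -> List[int]:
    """Greedy farthest-first selection of up to k well-spaced anchors.

    Assumption: this is a simple stand-in for Point Packing, not the
    published algorithm.
    """
    anchors = [0]
    min_dist = [edit_distance(s, seqs[0]) for s in seqs]
    while len(anchors) < min(k, len(seqs)):
        nxt = max(range(len(seqs)), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0:  # every sequence coincides with an anchor
            break
        anchors.append(nxt)
        for i, s in enumerate(seqs):
            min_dist[i] = min(min_dist[i], edit_distance(s, seqs[nxt]))
    return anchors


def distance_vectors(seqs: Sequence[str], anchors: List[int]) -> List[Tuple[int, ...]]:
    """One vector per sequence: its distance to every anchor."""
    return [tuple(edit_distance(s, seqs[a]) for a in anchors) for s in seqs]


def group_by_nearest_anchor(vectors: List[Tuple[int, ...]]) -> List[List[int]]:
    """Assumed stand-in for unsupervised clustering of the distance vectors."""
    groups = {}
    for idx, vec in enumerate(vectors):
        label = min(range(len(vec)), key=lambda j: vec[j])
        groups.setdefault(label, []).append(idx)
    return list(groups.values())


def coarse_groups(seqs: Sequence[str], k: int = 2, max_size: int = 3) -> List[List[int]]:
    """Split repeatedly until each group is small enough for pairwise comparison."""
    pending = [list(range(len(seqs)))]
    done = []
    while pending:
        idxs = pending.pop()
        if len(idxs) <= max_size:
            done.append(idxs)
            continue
        sub = [seqs[i] for i in idxs]
        parts = group_by_nearest_anchor(distance_vectors(sub, pick_anchors(sub, k)))
        if len(parts) == 1:  # cannot be split further (e.g. identical sequences)
            done.append(idxs)
            continue
        pending.extend([idxs[i] for i in part] for part in parts)
    return done


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC", "GGCC"]
    print(sorted(coarse_groups(toy, k=2, max_size=3)))
```

The point of the design is that each sequence is compared only with the `k` anchors rather than with every other sequence, so the cost per round grows with the number of sequences times `k` instead of quadratically. Exhaustive pairwise comparison is deferred to the small groups returned at the end.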

RESULTS:

Our results demonstrate that Anchor Clustering is faster than existing pairwise comparison clustering methods while providing similar clustering quality. With its flexible, memory-saving strategy, Anchor Clustering is capable of clustering millions of antigen receptor gene sequences in just a few minutes.

CONCLUSIONS:

This method enables the meta-analysis of immune-repertoire data from different studies and could contribute to a more comprehensive understanding of the immune repertoire data space.

Subject(s)

Algorithms; Receptors, Antigen; Cluster Analysis

Key words

Clonal relationship; Immune repertoire; Lymphocyte antigen receptor; Unsupervised clustering

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Algorithms / Receptors, Antigen | Type of study: Systematic_reviews | Language: En | Journal: BMC Bioinformatics | Journal subject: Medical Informatics | Year: 2024 | Document type: Article | Affiliation country: Canada
