Optimized phylogenetic clustering of HIV-1 sequence data for public health applications.

Chato, Connor; Feng, Yi; Ruan, Yuhua; Xing, Hui; Herbeck, Joshua; Kalish, Marcia; Poon, Art F Y

Affiliation

Chato C; Department of Pathology and Laboratory Medicine, Western University, London, Canada.
Feng Y; Division of Virology and Immunology, National Center for AIDS/STD Control and Prevention (NCAIDS), Chinese Center for Disease Control and Prevention (China-CDC), Beijing, China.
Ruan Y; Division of Virology and Immunology, National Center for AIDS/STD Control and Prevention (NCAIDS), Chinese Center for Disease Control and Prevention (China-CDC), Beijing, China.
Xing H; Division of Virology and Immunology, National Center for AIDS/STD Control and Prevention (NCAIDS), Chinese Center for Disease Control and Prevention (China-CDC), Beijing, China.
Herbeck J; Department of Global Health, University of Washington, Seattle, Washington, United States of America.
Kalish M; Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America.
Poon AFY; Department of Pathology and Laboratory Medicine, Western University, London, Canada.

PLoS Comput Biol; 18(11): e1010745, 2022 Nov.

Article in En | MEDLINE | ID: mdl-36449514

ABSTRACT

Clusters of genetically similar infections suggest rapid transmission and may indicate priorities for public health action or reveal underlying epidemiological processes. However, clusters often require user-defined thresholds and are sensitive to non-epidemiological factors, such as non-random sampling. Consequently, the ideal threshold for public health applications varies substantially across settings. Here, we present a method that selects optimal thresholds for phylogenetic (subset tree) clustering based on population. We evaluated this method on HIV-1 pol datasets (n = 14,221 sequences) from four sites in the USA (Tennessee, Washington), Canada (Northern Alberta) and China (Beijing). Clusters were defined by tips descending from an ancestral node (with a minimum bootstrap support of 95%) through a series of branches, each with a length below a given threshold. Next, we used pplacer to graft new cases to the fixed tree by maximum likelihood. We evaluated the effect of varying branch-length thresholds on cluster growth as a count outcome by fitting two Poisson regression models: a null model that predicts growth from cluster size, and an alternative model that includes mean collection date as an additional covariate. The alternative model was favoured by AIC across most thresholds, with optimal (greatest difference in AIC) thresholds ranging from 0.007 to 0.013 across sites. The range of optimal thresholds was more variable when re-sampling 80% of the data by location (IQR 0.008-0.016, n = 100 replicates). Our results use prospective phylogenetic cluster growth and suggest that there is more variation in effective thresholds for public health than those typically used in clustering studies.
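
The cluster definition in the abstract (tips descending from a well-supported ancestral node through branches that are each shorter than a threshold) can be illustrated with a minimal Python sketch. This is one possible reading of that definition, not the authors' implementation: the nested-dictionary tree format, the function names `short_branch_tips` and `find_clusters`, and the toy tree `example_tree` are all assumptions made for illustration, and bootstrap support is written as a proportion (0.95 rather than 95%).

```python
def short_branch_tips(node, threshold):
    """Return names of tips reachable from `node` through branches that are
    each strictly shorter than `threshold` (hypothetical helper)."""
    tips = []
    for child in node.get("children", []):
        if child["length"] < threshold:
            if child.get("children"):
                tips.extend(short_branch_tips(child, threshold))
            else:
                tips.append(child["name"])
    return tips


def find_clusters(root, threshold, min_support=0.95):
    """Collect clusters of two or more tips, visiting ancestors before
    descendants so that the largest qualifying cluster is recorded first."""
    clusters = []
    assigned = set()
    stack = [root]
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if not children:
            continue  # tips cannot be ancestral nodes
        if node.get("support", 0.0) >= min_support:
            tips = [t for t in short_branch_tips(node, threshold)
                    if t not in assigned]
            if len(tips) >= 2:
                clusters.append(sorted(tips))
                assigned.update(tips)
        stack.extend(children)
    return clusters


# Toy tree (assumed data): branch lengths in expected substitutions per site.
example_tree = {
    "support": 0.0, "length": 0.0, "children": [
        {"support": 0.98, "length": 0.004, "children": [
            {"name": "t1", "length": 0.003},
            {"name": "t2", "length": 0.005},
            {"name": "t3", "length": 0.020},
        ]},
        {"support": 0.60, "length": 0.050, "children": [
            {"name": "t4", "length": 0.002},
            {"name": "t5", "length": 0.004},
        ]},
        {"name": "t6", "length": 0.030},
    ],
}

if __name__ == "__main__":
    for threshold in (0.01, 0.025):
        print(threshold, find_clusters(example_tree, threshold))
```

In this toy tree, raising the threshold from 0.01 to 0.025 pulls `t3` into the first cluster, while `t4` and `t5` stay unclustered at either threshold because their ancestral node has support below 0.95. Repeating such a sweep over thresholds, and scoring each resulting set of clusters by how well it predicts growth, is the kind of comparison the abstract describes.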

Subject(s)

HIV Infections; HIV-1; Humans; HIV-1/genetics; Phylogeny; Prospective Studies; Public Health; HIV Infections/epidemiology; Cluster Analysis

Full text: 1
Database: MEDLINE
Main subject: HIV Infections / HIV-1
Type of study: Prognostic studies
Limits: Humans
Language: En
Journal: PLoS Comput Biol
Journal subject: Biology / Medical Informatics
Year: 2022
Type: Article
Affiliation country: Canada