Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M.
Affiliation
  • Novitsky V: Harvard School of Public Health AIDS Initiative, Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, Massachusetts.
AIDS Res Hum Retroviruses; 31(5): 531-42, 2015 May.
Article in En | MEDLINE | ID: mdl-25560745
ABSTRACT
To improve the methodology of HIV cluster analysis, we examined parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty were compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations of the extent of HIV clustering with sequence length and with the number of variable and informative sites were evaluated. Near full-length genome sequences showed the highest extent of HIV clustering and the highest tree certainty. At a bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although this was still significantly lower than for near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than for those of 1,000 bp. We found a strong association between sequence length and the proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and that proportion. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of the viral sequences used and with the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could be considered a second choice.
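The abstract relates clustering to the number of variable and informative sites in an alignment. The sketch below shows one common way to count them. It assumes that "informative" means parsimony-informative (at least two character states, each present in at least two sequences) and that gaps, "N" and "?" are treated as missing data; both are assumptions, since the abstract does not say which definitions or software the authors used. The function name `site_counts` is hypothetical.

```python
from collections import Counter

# Assumption: these characters are treated as missing data, not as states.
MISSING = set("-?N")

def site_counts(alignment):
    """Return (variable, parsimony_informative) site counts for aligned sequences."""
    if not alignment:
        return 0, 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    variable = informative = 0
    for i in range(length):
        # Tally the observed states in column i, ignoring missing data.
        states = Counter(c for c in (seq[i].upper() for seq in alignment) if c not in MISSING)
        if len(states) >= 2:
            variable += 1
            # Parsimony-informative: at least two states, each seen in >= 2 sequences.
            if sum(1 for n in states.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative

# Toy alignment (hypothetical data): column 3 is variable but its "T" is a singleton.
example = ["ACGTA", "ACGTT", "AGGCA", "AGGCT", "ACTTA"]
print(site_counts(example))  # (4, 3)
```

In the toy alignment four columns vary, but only three carry a state shared by at least two sequences on each side of the split, which is what lets a site support a grouping in a tree.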
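The main outcome in the abstract is the proportion of sequences found in clusters at a bootstrap threshold (for example 0.80). A minimal sketch of that measure, under a simplified definition assumed here: a cluster is any non-root internal node whose bootstrap support meets the threshold, and a sequence counts as clustered if it descends from at least one such node. The authors may have applied further criteria that the abstract does not describe. The `Node` class, `proportion_clustered` function and the toy tree are hypothetical; in practice the supports would come from an ML tree with bootstrap replicates.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str = ""          # tip label; empty for internal nodes
    support: float = 0.0    # bootstrap support as a fraction (0 to 1)
    children: list = field(default_factory=list)

def tips(node):
    """Return the tip labels below (or at) this node."""
    if not node.children:
        return [node.name]
    return [t for child in node.children for t in tips(child)]

def proportion_clustered(root, threshold=0.80):
    """Fraction of tips that descend from a supported, non-root internal node."""
    clustered = set()

    def visit(node, is_root):
        if node.children and not is_root and node.support >= threshold:
            clustered.update(tips(node))
        for child in node.children:
            visit(child, False)

    visit(root, True)
    total = len(tips(root))
    return len(clustered) / total if total else 0.0

# Toy tree (hypothetical): ((A,B)0.95,(C,D)0.60,E);
tree = Node(children=[
    Node(support=0.95, children=[Node("A"), Node("B")]),
    Node(support=0.60, children=[Node("C"), Node("D")]),
    Node("E"),
])
print(proportion_clustered(tree, 0.80))  # 0.4
print(proportion_clustered(tree, 0.50))  # 0.8
```

Lowering the threshold admits weakly supported nodes, so the clustered proportion can only stay the same or grow; shorter or less variable sequences tend to yield lower bootstrap supports, which is consistent with the lower proportions the abstract reports for subgenomic regions.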
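The sliding window analysis in the abstract cuts the genome alignment into overlapping windows of fixed length. The abstract gives the window lengths (1,000 and 2,000 bp) and the resulting window counts, but not the step size or alignment length, so the values below are hypothetical. For an alignment of length L, window length w and step s (with L ≥ w), this scheme yields floor((L − w) / s) + 1 full-length windows. The function name `sliding_windows` is also hypothetical.

```python
def sliding_windows(alignment_length, window, step):
    """Return 0-based, half-open (start, end) coordinates of full-length windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # Windows that would run past the end of the alignment are not produced.
    return [(start, start + window)
            for start in range(0, alignment_length - window + 1, step)]

# Hypothetical 5,000-position alignment, step = 10% of the window length.
w1000 = sliding_windows(5000, 1000, 100)
w2000 = sliding_windows(5000, 2000, 200)
print(len(w1000), w1000[0], w1000[-1])  # 41 (0, 1000) (4000, 5000)
print(len(w2000))                       # 16
```

Each window would then be treated as a separate alignment for tree building and cluster counting, so that the extent of clustering can be compared along the genome and between window lengths.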
Subject(s)

Database: MEDLINE
Main subject: Cluster Analysis / HIV Infections / HIV / Sequence Analysis, DNA / Computational Biology
Limits: Humans
Language: En
Journal: AIDS Res Hum Retroviruses
Journal subject: Acquired Immunodeficiency Syndrome (AIDS)
Year: 2015
Type: Article