ABSTRACT
Mutation rates and fitness costs of deleterious mutations are difficult to measure in vivo but essential for a quantitative understanding of evolution. Using whole genome deep sequencing data from longitudinal samples during untreated HIV-1 infection, we estimated mutation rates and fitness costs in HIV-1 from the dynamics of genetic variation. At approximately neutral sites, mutations accumulate with a rate of 1.2 × 10⁻⁵ per site per day, in agreement with the rate measured in cell cultures. We estimated the rate from G to A to be the largest, followed by the other transitions C to T, T to C, and A to G, while transversions are less frequent. At other sites, mutations tend to reduce virus replication. We estimated the fitness cost of mutations at every site in the HIV-1 genome using a model of mutation-selection balance. About half of all non-synonymous mutations have large fitness costs (>10 percent), while most synonymous mutations have costs <1 percent. The cost of synonymous mutations is especially low in most of pol where we could not detect measurable costs for the majority of synonymous mutations. In contrast, we find high costs for synonymous mutations in important RNA structures and regulatory regions. The intra-patient fitness cost estimates are consistent across multiple patients, indicating that the deleterious part of the fitness landscape is universal and explains a large fraction of global HIV-1 group M diversity.
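The mutation-selection balance argument mentioned above can be illustrated with a minimal sketch. It assumes the standard deterministic model dx/dt = μ − s·x for the frequency x of a deleterious variant, which gives an equilibrium frequency of about μ/s. The function names and the example frequency are hypothetical, and the estimator actually used in the paper may differ.

```python
import math


def fitness_cost(mutation_rate, equilibrium_frequency):
    """Estimate the selection coefficient s from x_eq ~ mu / s.

    Assumption: the variant is at mutation-selection balance and
    s is much larger than the mutation rate.
    """
    if equilibrium_frequency <= 0:
        raise ValueError("equilibrium frequency must be positive")
    return mutation_rate / equilibrium_frequency


def expected_frequency(mutation_rate, cost, days):
    """Solution of dx/dt = mu - s*x with x(0) = 0.

    The frequency rises roughly linearly at first (rate mu) and
    saturates at mu / s after a time of order 1 / s.
    """
    return mutation_rate / cost * (1.0 - math.exp(-cost * days))


# Illustrative numbers: mu is the neutral accumulation rate quoted in the
# abstract (per site per day); the frequency 0.001 is made up.
mu = 1.2e-5
s = fitness_cost(mu, 0.001)
print(f"estimated cost per day: {s:.3g}")
print(f"frequency after 5 years: {expected_frequency(mu, s, 5 * 365):.3g}")
```

In this picture, sites with rare minor variants carry high costs, while sites where variants keep accumulating linearly over years are approximately neutral.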
ABSTRACT
Deep sequencing is a powerful and cost-effective tool to characterize the genetic diversity and evolution of virus populations. While modern sequencing instruments readily cover viral genomes many thousand-fold and very rare variants can in principle be detected, sequencing errors, amplification biases, and other artifacts can limit sensitivity and complicate data interpretation. For this reason, the number of studies using whole genome deep sequencing to characterize viral quasi-species in clinical samples is still limited. We have previously undertaken a large-scale whole genome deep sequencing study of HIV-1 populations. Here we discuss the challenges, error profiles, control experiments, and computational tests we developed to quantify the accuracy of variant frequency estimation.
Subjects
Genetic Variation , Genome, Viral , HIV Infections/virology , HIV-1/genetics , Polymerase Chain Reaction , Recombination, Genetic , Computational Biology/methods , Genotype , Humans , INDEL Mutation , Polymerase Chain Reaction/methods , Polymerase Chain Reaction/standards , Reproducibility of Results
ABSTRACT
HIV-1 infection cannot be cured because the virus persists as integrated proviral DNA in long-lived cells despite years of suppressive antiretroviral therapy (ART). In a previous paper (Zanini et al., 2015) we documented HIV-1 evolution in 10 untreated patients. Here we characterize establishment, turnover, and evolution of viral DNA reservoirs in the same patients after 3-18 years of suppressive ART. A median of 14% (range 0-42%) of the DNA sequences were defective due to G-to-A hypermutation. Remaining DNA sequences showed no evidence of evolution over years of suppressive ART. Most sequences from the DNA reservoirs were very similar to viruses actively replicating in plasma (RNA sequences) shortly before start of ART. The results do not support persistent HIV-1 replication as a mechanism to maintain the HIV-1 reservoir during suppressive therapy. Rather, the data indicate that DNA variants are turning over as long as patients are untreated and that suppressive ART halts this turnover.
Subjects
DNA, Viral/analysis , HIV Infections/virology , HIV-1/physiology , Virus Latency , Anti-Retroviral Agents/therapeutic use , DNA, Viral/genetics , Genotype , HIV Infections/drug therapy , HIV-1/classification , HIV-1/genetics , Humans
ABSTRACT
Many microbial populations rapidly adapt to changing environments with multiple variants competing for survival. To quantify such complex evolutionary dynamics in vivo, time-resolved and genome-wide data including rare variants are essential. We performed whole-genome deep sequencing of HIV-1 populations in 9 untreated patients, with 6-12 longitudinal samples per patient spanning 5-8 years of infection. The data can be accessed and explored via an interactive web application. We show that patterns of minor diversity are reproducible between patients and mirror global HIV-1 diversity, suggesting a universal landscape of fitness costs that control diversity. Reversions towards the ancestral HIV-1 sequence are observed throughout infection and account for almost one third of all sequence changes. Reversion rates depend strongly on conservation. Frequent recombination limits linkage disequilibrium to about 100 bp in most of the genome, but strong hitch-hiking due to short-range linkage limits diversity.
Subjects
Genetic Variation , HIV Infections/virology , HIV-1/classification , HIV-1/genetics , Metagenomics , Genetic Linkage , Genome, Viral , HIV-1/isolation & purification , High-Throughput Nucleotide Sequencing , Humans , Longitudinal Studies , Mutation , Recombination, Genetic , Sequence Analysis, DNA
ABSTRACT
Next-generation sequencing technologies, like ultra-deep pyrosequencing (UDPS), allow detailed investigation of complex populations, like RNA viruses, but their utility is limited by errors introduced during sample preparation and sequencing. By tagging each individual cDNA molecule with barcodes, referred to as Primer IDs, before PCR and sequencing, these errors could theoretically be removed. Here we evaluated the Primer ID methodology on 257,846 UDPS reads generated from an HIV-1 SG3Δenv plasmid clone and plasma samples from three HIV-infected patients. The Primer ID consisted of 11 randomized nucleotides, 4,194,304 combinations, in the primer for cDNA synthesis that introduced a unique sequence tag into each cDNA molecule. Consensus template sequences were constructed for reads with Primer IDs that were observed three or more times. Despite high numbers of input template molecules, the number of consensus template sequences was low. With 10,000 input molecules for the clone, as few as 97 consensus template sequences were obtained due to highly skewed frequency of resampling. Furthermore, the number of sequenced templates was overestimated due to PCR errors in the Primer IDs. Finally, some consensus template sequences were erroneous due to hotspots for UDPS errors. The Primer ID methodology has the potential to provide highly accurate deep sequencing. However, it is important to be aware that there are remaining challenges with the methodology. In particular, it is important to find ways to obtain a more even frequency of resampling of template molecules as well as to identify and remove artefactual consensus template sequences that have been generated by PCR errors in the Primer IDs.
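The consensus step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes reads are already aligned to equal length and paired with their Primer ID, it takes a simple per-position majority vote, and all names (`consensus_templates`, `MIN_READS`) are hypothetical. It also checks that 11 random nucleotides give the 4,194,304 combinations quoted in the abstract.

```python
from collections import Counter, defaultdict

PRIMER_ID_LENGTH = 11  # randomized nucleotides, as stated in the abstract
MIN_READS = 3          # a Primer ID must be observed at least three times


def n_primer_ids(length=PRIMER_ID_LENGTH):
    """Number of distinct Primer IDs: four bases at each position."""
    return 4 ** length


def consensus_templates(reads, min_reads=MIN_READS):
    """Build one consensus sequence per sufficiently resampled Primer ID.

    `reads` is an iterable of (primer_id, sequence) pairs. Assumption:
    sequences sharing a Primer ID are aligned and of equal length.
    Ties at a position are broken by the first base encountered.
    """
    groups = defaultdict(list)
    for primer_id, sequence in reads:
        groups[primer_id].append(sequence)

    consensus = {}
    for primer_id, sequences in groups.items():
        if len(sequences) < min_reads:
            continue  # too few resamplings to call a consensus
        consensus[primer_id] = "".join(
            Counter(column).most_common(1)[0][0]
            for column in zip(*sequences)
        )
    return consensus


example_reads = [
    ("AAAAAAAAAAA", "ACGT"),
    ("AAAAAAAAAAA", "ACGT"),
    ("AAAAAAAAAAA", "ACTT"),  # one read with an error at position 3
    ("CCCCCCCCCCC", "ACGT"),  # seen only once, so discarded
]
print(n_primer_ids())
print(consensus_templates(example_reads))
```

A sketch like this does nothing about the two problems the abstract highlights: PCR errors inside the Primer ID itself create spurious groups, and skewed resampling leaves most templates below the read threshold.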
Subjects
Sequence Analysis/methods , Base Sequence , DNA Primers , HIV-1/genetics , Molecular Sequence Data , Polymerase Chain Reaction , Sequence Analysis/standards , Sequence Homology, Nucleic Acid
ABSTRACT
BACKGROUND: Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to population diversity and primer localization in relatively conserved regions, in addition to recognized constraints typically considered in primer design. RESULTS: Design constraints include degenerate sites to maximize population coverage, matching of melting temperatures, optimizing de novo sequence length, finding optimal bio-barcodes to allow efficient downstream analyses, and minimizing risk of dimerization. To facilitate primer design addressing these and other constraints, we created a novel computer program (PrimerDesign) that automates this complex procedure. We show its powers and limitations and give examples of successful designs for the analysis of HIV-1 populations. CONCLUSIONS: PrimerDesign is useful for researchers who want to design DNA primers and probes for analyzing highly variable DNA populations. It can be used to design primers for PCR, RT-PCR, Sanger sequencing, next-generation sequencing, and other experimental protocols targeting highly variable DNA samples.
Subjects
Algorithms , DNA Primers/genetics , Polymerase Chain Reaction/methods , Sequence Analysis, DNA/methods , Software , HIV Infections/virology , HIV-1/genetics , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic
ABSTRACT
BACKGROUND: Ultra-deep pyrosequencing (UDPS) is used to identify rare sequence variants. The sequence depth is influenced by several factors including the error frequency of PCR and UDPS. This study investigated the characteristics and source of errors in raw and cleaned UDPS data. RESULTS: UDPS of a 167-nucleotide fragment of the HIV-1 SG3Δenv plasmid was performed on the Roche/454 platform. The plasmid was diluted to one copy, PCR amplified and subjected to bidirectional UDPS on three occasions. The dataset consisted of 47,693 UDPS reads. Raw UDPS data had an average error frequency of 0.30% per nucleotide site. Most errors were insertions and deletions in homopolymeric regions. We used a cleaning strategy that removed almost all indel errors, but had little effect on substitution errors, which reduced the error frequency to 0.056% per nucleotide. In cleaned data the error frequency was similar in homopolymeric and non-homopolymeric regions, but varied considerably across sites. These site-specific error frequencies were moderately, but still significantly, correlated between runs (r=0.15-0.65) and between forward and reverse sequencing directions within runs (r=0.33-0.65). Furthermore, transition errors were 48 times more common than transversion errors (0.052% vs. 0.001%; p<0.0001). Collectively the results indicate that a considerable proportion of the sequencing errors that remained after data cleaning were generated during the PCR that preceded UDPS. CONCLUSIONS: A majority of the sequencing errors that remained after data cleaning were introduced by PCR prior to sequencing, which means that they will be independent of platform used for next-generation sequencing. The transition vs. transversion error bias in cleaned UDPS data will influence the detection limits of rare mutations and sequence variants.
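The per-nucleotide error frequencies and the transition/transversion split reported above can be computed along these lines when the true template is known, as with a single-copy plasmid. This is only a sketch under stated assumptions: reads are aligned to the reference, contain no indels (as after cleaning), and use only A, C, G, and T. The function names and the toy reads are hypothetical and are not the study's data.

```python
PURINES = {"A", "G"}


def is_transition(ref_base, read_base):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return ref_base != read_base and (
        (ref_base in PURINES) == (read_base in PURINES)
    )


def substitution_error_rates(reference, reads):
    """Return (transition_rate, transversion_rate) per sequenced nucleotide.

    Assumption: every read is aligned to `reference` without indels.
    """
    transitions = transversions = total = 0
    for read in reads:
        for ref_base, read_base in zip(reference, read):
            total += 1
            if read_base == ref_base:
                continue
            if is_transition(ref_base, read_base):
                transitions += 1
            else:
                transversions += 1
    if total == 0:
        raise ValueError("no nucleotides to compare")
    return transitions / total, transversions / total


reference = "ACGT"
toy_reads = ["ACGT", "GCGT", "ACGA", "ATGT"]  # made-up example reads
ts_rate, tv_rate = substitution_error_rates(reference, toy_reads)
print(ts_rate, tv_rate)
```

Because transitions dominate the residual errors, a single detection threshold for all substitution types would be too strict for transversions or too lenient for transitions, which is the practical point of the abstract's conclusion.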
Subjects
High-Throughput Nucleotide Sequencing/standards , Polymerase Chain Reaction/standards , Sequence Analysis, DNA/standards , Artifacts , Base Sequence , HIV-1/genetics
ABSTRACT
In early infection HIV-1 generally uses the CCR5 coreceptor. During disease progression the coreceptor use switches to include CXCR4 in approximately 70% of infected individuals. The primary determinant for coreceptor use is located in the V3 loop of the viral envelope. Here, ultradeep pyrosequencing (UDPS) of the V3 loop was used to investigate if CXCR4-using (X4) virus may be present as a minority population during primary HIV infection (PHI). Three patients with HIV populations that switched coreceptor use, as determined by the MT-2 cell culture assay, were investigated. Longitudinally collected plasma samples (four to nine samples per patient) obtained from PHI until after coreceptor switch were analyzed by UDPS of the V3 loop. From each sample between 279 and 32,094 reads were generated based on template molecule availability. UDPS analysis showed that the X4 virus that emerged after switch was not present during PHI or prior to overt phenotypic switch. In addition, the phylogenetic analyses indicated that the X4 populations originated from R5 variants that had evolved after the previous R5-only sample was obtained. Finally, one to three major variants were found during PHI, supporting the idea that infection is established with one or just a few viral particles.
Subjects
HIV Envelope Protein gp120/genetics , HIV Infections/virology , HIV-1/classification , HIV-1/genetics , Peptide Fragments/genetics , Receptors, CCR5/genetics , Receptors, CXCR4/genetics , Base Sequence , High-Throughput Nucleotide Sequencing , Humans , Phylogeny , RNA, Viral/blood , RNA, Viral/genetics , Retrospective Studies , Sequence Analysis, RNA
ABSTRACT
BACKGROUND: HIV-1-infected patients can be superinfected with additional HIV-1 variants. Therapy failure can be the consequence of an infection with a resistant strain. METHODS: A patient was diagnosed with a recent HIV-1 infection in April 2005 and subsequently clinically monitored. HIV-1 evolution was studied by population sequencing of the first 984 bases of the pol gene as well as 454 ultra-deep pyrosequencing (UDPS) of parts of the pol and env genes. RESULTS: The patient was diagnosed with a wild-type HIV-1 strain, but experienced rapid virological failure after initiating a non-nucleoside reverse transcriptase inhibitor (NNRTI)-based treatment regimen 3 years later. Population sequencing and UDPS revealed the presence of a second HIV-1 strain with a Y188L NNRTI resistance mutation in a sample obtained shortly prior to initiation of therapy. Phylogenetic analyses showed that the two HIV-1 strains were genetically distinct, providing evidence for superinfection. CONCLUSIONS: The virological treatment failure in this patient was probably due to the superinfection with an NNRTI-resistant HIV-1 variant. Superinfection with drug-resistant strains can undermine HIV-1 treatment regimens selected on the basis of resistance testing at diagnosis. Patients, especially in high-risk groups, as well as their clinicians, should be aware of the risks and dangers of superinfections.