1.
EBioMedicine ; 102: 105040, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38485563

ABSTRACT

BACKGROUND: The principal barrier to an HIV cure is the presence of the latent viral reservoir (LVR), which has been understudied in African populations. From 2018 to 2019, Uganda instituted a nationwide rollout of ART consisting of Dolutegravir (DTG) with two NRTI, which replaced the previous regimen of one NNRTI and the same two NRTI. METHODS: Changes in the inducible replication-competent LVR (RC-LVR) of ART-suppressed Ugandans with HIV (n = 88) from 2015 to 2020 were examined using the quantitative viral outgrowth assay. Outgrowth viruses were examined for viral evolution. Changes in the RC-LVR were analyzed using three versions of a Bayesian model that estimated the decay rate over time as a single, linear rate (model A), or allowing for a change at time of DTG initiation (models B and C). FINDINGS: Model A estimated the slope of RC-LVR change as a non-significant positive increase, which was due to a temporary spike in the RC-LVR that occurred 0-12 months post-DTG initiation (p < 0.005). This was confirmed with models B and C; for instance, model B estimated a significant decay pre-DTG initiation with a half-life of 6.9 years, and an ∼1.7-fold increase in the size of the RC-LVR post-DTG initiation. There was no evidence of viral failure or consistent evolution in the cohort. INTERPRETATION: These data suggest that the change from NNRTI- to DTG-based ART is associated with a significant temporary increase in the circulating RC-LVR. FUNDING: Supported by the NIH (grant 1-UM1AI164565); Gilead HIV Cure Grants Program (90072171); Canadian Institutes of Health Research (PJT-155990); and Ontario Genomics-Canadian Statistical Sciences Institute.


Subject(s)
East African People , HIV Infections , HIV Integrase Inhibitors , HIV-1 , Humans , CD4-Positive T-Lymphocytes , HIV Infections/drug therapy , Bayes Theorem , Virus Latency , Anti-Retroviral Agents/therapeutic use , HIV Integrase Inhibitors/pharmacology , HIV Integrase Inhibitors/therapeutic use , Ontario , Viral Load
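The half-life reported in the abstract above can be related to an exponential decay rate with simple arithmetic. The sketch below is illustrative only and is not the authors' Bayesian model; the function names are hypothetical, and simple exponential decay is assumed.

```python
import math

def decay_rate_from_half_life(half_life_years):
    """Exponential decay rate (per year) implied by a given half-life."""
    return math.log(2) / half_life_years

def fraction_remaining(half_life_years, elapsed_years):
    """Fraction of the initial quantity left after elapsed_years."""
    return 0.5 ** (elapsed_years / half_life_years)

# Half-life of 6.9 years, the pre-DTG estimate quoted for model B.
rate = decay_rate_from_half_life(6.9)
print(f"decay rate: {rate:.3f} per year")
print(f"remaining after 3 years: {fraction_remaining(6.9, 3.0):.2f}")
```

Under this assumption, a longer half-life corresponds to a proportionally smaller decay rate.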
2.
Sci Rep ; 14(1): 3728, 2024 02 14.
Article in English | MEDLINE | ID: mdl-38355869

ABSTRACT

Wastewater surveillance of coronavirus disease 2019 (COVID-19) commonly applies reverse transcription-quantitative polymerase chain reaction (RT-qPCR) to quantify severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA concentrations in wastewater over time. In most applications worldwide, maximal sensitivity and specificity of RT-qPCR has been achieved, in part, by monitoring two or more genomic loci of SARS-CoV-2. In Ontario, Canada, the provincial Wastewater Surveillance Initiative reports the average copies of the CDC N1 and N2 loci normalized to the fecal biomarker pepper mild mottle virus. In November 2021, the emergence of the Omicron variant of concern, harboring a C28311T mutation within the CDC N1 probe region, challenged the accuracy of the consensus between the RT-qPCR measurements of the N1 and N2 loci of SARS-CoV-2. In this study, we developed and applied a novel real-time dual loci quality assurance and control framework based on the relative difference between the loci measurements to the City of Ottawa dataset to identify a loss of sensitivity of the N1 assay in the period from July 10, 2022 to January 31, 2023. Further analysis via sequencing and allele-specific RT-qPCR revealed a high proportion of mutations C28312T and A28330G during the study period, both in the City of Ottawa and across the province. It is hypothesized that nucleotide mutations in the probe region, especially A28330G, led to inefficient annealing, resulting in reduction in sensitivity and accuracy of the N1 assay. This study highlights the importance of implementing quality assurance and control criteria to continually evaluate, in near real-time, the accuracy of the signal produced in wastewater surveillance applications that rely on detection of pathogens whose genomes undergo high rates of mutation.


Subject(s)
Wastewater-Based Epidemiological Monitoring , Wastewater , Alleles , Mutation , Ontario/epidemiology , SARS-CoV-2/genetics , RNA, Viral/genetics
3.
Mol Biol Evol ; 40(8), 2023 08 03.
Article in English | MEDLINE | ID: mdl-37463439

ABSTRACT

Nef is an accessory protein unique to the primate HIV-1, HIV-2, and SIV lentiviruses. During infection, Nef functions by interacting with multiple host proteins within infected cells to evade the immune response and enhance virion infectivity. Notably, Nef can counter immune regulators such as CD4 and MHC-I, as well as the SERINC5 restriction factor in infected cells. In this study, we generated a posterior sample of time-scaled phylogenies relating SIV and HIV Nef sequences, followed by reconstruction of ancestral sequences at the root and internal nodes of the sampled trees up to the HIV-1 Group M ancestor. Upon expression of the ancestral primate lentivirus Nef protein within CD4+ HeLa cells, flow cytometry analysis revealed that the primate lentivirus Nef ancestor robustly downregulated cell-surface SERINC5, yet only partially downregulated CD4 from the cell surface. Further analysis revealed that the Nef-mediated CD4 downregulation ability evolved gradually, while Nef-mediated SERINC5 downregulation was recovered abruptly in the HIV-1/M ancestor. Overall, this study provides a framework to reconstruct ancestral viral proteins and enable the functional characterization of these proteins to delineate how functions could have changed throughout evolutionary history.


Subject(s)
Lentiviruses, Primate , Simian Immunodeficiency Virus , Humans , Animals , Lentiviruses, Primate/genetics , Lentiviruses, Primate/metabolism , Phylogeny , HeLa Cells , Simian Immunodeficiency Virus/metabolism , nef Gene Products, Human Immunodeficiency Virus/genetics , nef Gene Products, Human Immunodeficiency Virus/metabolism , Primates/genetics , Primates/metabolism , Membrane Proteins/genetics
4.
medRxiv ; 2023 May 16.
Article in English | MEDLINE | ID: mdl-37292785

ABSTRACT

The principal barrier to an HIV cure is the presence of a latent viral reservoir (LVR) made up primarily of latently infected resting CD4+ (rCD4) T-cells. Studies in the United States have shown that the LVR decays slowly (half-life=3.8 years), but this rate in African populations has been understudied. This study examined longitudinal changes in the inducible replication competent LVR (RC-LVR) of ART-suppressed Ugandans living with HIV (n=88) from 2015-2020 using the quantitative viral outgrowth assay, which measures infectious units per million (IUPM) rCD4 T-cells. In addition, outgrowth viruses were examined with site-directed next-generation sequencing to assess for possible ongoing viral evolution. During the study period (2018-19), Uganda instituted a nationwide rollout of first-line ART consisting of Dolutegravir (DTG) with two NRTI, which replaced the previous regimen that consisted of one NNRTI and the same two NRTI. Changes in the RC-LVR were analyzed using two versions of a novel Bayesian model that estimated the decay rate over time on ART as a single, linear rate (model A) or allowing for an inflection at time of DTG initiation (model B). Model A estimated the population-level slope of RC-LVR change as a non-significant positive increase. This positive slope was due to a temporary increase in the RC-LVR that occurred 0-12 months post-DTG initiation (p<0.0001). This was confirmed with model B, which estimated a significant decay pre-DTG initiation with a half-life of 7.7 years, but a significant positive slope post-DTG initiation leading to a transient estimated doubling-time of 8.1 years. There was no evidence of viral failure in the cohort, or consistent evolution in the outgrowth sequences associated with DTG initiation. These data suggest that either the initiation of DTG, or cessation of NNRTI use, is associated with a significant temporary increase in the circulating RC-LVR.

5.
Virus Evol ; 9(1): vead026, 2023.
Article in English | MEDLINE | ID: mdl-37187604

ABSTRACT

Defining clusters of epidemiologically related infections is a common problem in the surveillance of infectious disease. A popular method for generating clusters is pairwise distance clustering, which assigns pairs of sequences to the same cluster if their genetic distance falls below some threshold. The result is often represented as a network or graph of nodes. A connected component is a set of interconnected nodes in a graph that are not connected to any other node. The prevailing approach to pairwise clustering is to map clusters to the connected components of the graph on a one-to-one basis. We propose that this definition of clusters is unnecessarily rigid. For instance, the connected components can collapse into one cluster by the addition of a single sequence that bridges nodes in the respective components. Moreover, the distance thresholds typically used for viruses like HIV-1 tend to exclude a large proportion of new sequences, making it difficult to train models for predicting cluster growth. These issues may be resolved by revisiting how we define clusters from genetic distances. Community detection is a promising class of clustering methods from the field of network science. A community is a set of nodes that are more densely inter-connected relative to the number of their connections to external nodes. Thus, a connected component may be partitioned into two or more communities. Here we describe community detection methods in the context of genetic clustering for epidemiology, demonstrate how a popular method (Markov clustering) enables us to resolve variation in transmission rates within a giant connected component of HIV-1 sequences, and identify current challenges and directions for further work.
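The connected-component definition of clusters described above, and the way a single bridging sequence can merge two components, can be sketched as follows. This is a minimal illustration with hypothetical toy distances and function names; it is not the authors' code and does not implement community detection or Markov clustering.

```python
def threshold_edges(dist, threshold):
    """Pairs (i, j) whose genetic distance falls below the threshold."""
    n = len(dist)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] < threshold]

def connected_components(n, edges):
    """Group nodes 0..n-1 into connected components using union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy distances (hypothetical values): two tight pairs, far from each other.
far, near = 0.05, 0.01
dist = [[0, near, far, far],
        [near, 0, far, far],
        [far, far, 0, near],
        [far, far, near, 0]]
print(connected_components(4, threshold_edges(dist, 0.015)))
# [[0, 1], [2, 3]]

# Add one sequence (index 4) that is close to node 1 and to node 2.
bridge = [far, 0.012, 0.012, far]
dist5 = [row + [d] for row, d in zip(dist, bridge)] + [bridge + [0]]
print(connected_components(5, threshold_edges(dist5, 0.015)))
# [[0, 1, 2, 3, 4]]
```

The second call shows the rigidity the abstract points to: one new sequence collapses two clusters into one.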

6.
Virus Evol ; 9(1): vead009, 2023.
Article in English | MEDLINE | ID: mdl-36846827

ABSTRACT

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may provide a mechanism to increase the information content of compact genomes. The presence of overlapping reading frames (OvRFs) can skew estimates of selection based on the rates of non-synonymous and synonymous substitutions, since a substitution that is synonymous in one reading frame may be non-synonymous in another and vice versa. To understand the impact of OvRFs on molecular evolution, we implemented a versatile simulation model of nucleotide sequence evolution along a phylogeny with any distribution of open reading frames in linear or circular genomes. We use a custom data structure to track the substitution rates at every nucleotide site, which is determined by the stationary nucleotide frequencies, transition bias and the distribution of selection biases (dN/dS) in the respective reading frames. Our simulation model is implemented in the Python scripting language. All source code is released under the GNU General Public License version 3 and is available at https://github.com/PoonLab/HexSE.
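The point that one substitution can be synonymous in one reading frame and non-synonymous in an overlapping frame can be shown with a toy example. The sketch below uses the standard genetic code and a made-up six-nucleotide sequence; it is unrelated to the HexSE code base, and the `translate` helper is hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq, offset=0):
    """Translate every complete codon of seq starting at the given offset."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(offset, len(seq) - 2, 3))

wild_type = "CTGGCA"   # toy sequence (hypothetical)
mutant = "CTCGCA"      # G to C substitution at the third nucleotide

for frame in (0, 1):
    before, after = translate(wild_type, frame), translate(mutant, frame)
    kind = "synonymous" if before == after else "non-synonymous"
    print(f"frame +{frame}: {before} -> {after} ({kind})")
```

In frame +0 the change falls on a third codon position and leaves the leucine unchanged, while in frame +1 the same nucleotide is a second codon position and changes tryptophan to serine.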

7.
Curr Protoc ; 3(2): e666, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36809686

ABSTRACT

The comparative analysis of amino acid sequences is an important tool in molecular biology that often requires multiple sequence alignments. In comparisons between less closely related genomes, however, it becomes more difficult to accurately align protein-coding sequences, or even to identify homologous regions in different genomes. In this article, we describe an alignment-free method for the classification of homologous protein-coding regions from different genomes. This methodology was originally developed for comparing genomes within virus families, but may be adapted for other organisms. We quantify sequence homology from the overlap (intersection distance) of the k-mer (word) frequency distributions for different protein sequences. Next, we extract groups of homologous sequences from the resulting distance matrix using a combination of dimensionality reduction and hierarchical clustering methods. Finally, we demonstrate how to generate visualizations of the composition of clusters with respect to protein annotations, and by coloring protein-coding regions of genomes by cluster assignments. These provide a useful means to quickly assess the reliability of the clustering results based on the distribution of homologous genes among genomes. © 2023 Wiley Periodicals LLC. Basic Protocol 1: Data collection and processing Basic Protocol 2: Calculating k-mer distances Basic Protocol 3: Extracting clusters of homology Support Protocol: Genome plot based on clustering results.


Subject(s)
Algorithms , Reproducibility of Results , Sequence Alignment , Amino Acid Sequence , Cluster Analysis
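A distance based on the overlap of k-mer frequency distributions, as mentioned in the abstract above, might be sketched as follows. The exact formula used in the protocol is not given here, so this assumes one common form (one minus the summed minimum of the normalized frequencies); the function names and toy sequences are hypothetical.

```python
from collections import Counter

def kmer_freqs(seq, k):
    """Normalized frequencies of all overlapping k-mers in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def intersection_distance(seq1, seq2, k=2):
    """1 minus the overlap of two k-mer frequency distributions (assumed form)."""
    f1, f2 = kmer_freqs(seq1, k), kmer_freqs(seq2, k)
    overlap = sum(min(f1[w], f2[w]) for w in f1.keys() & f2.keys())
    return 1.0 - overlap

# Toy protein fragments (hypothetical).
print(intersection_distance("MKVLAAGIV", "MKVLSAGIV"))
print(intersection_distance("AAAA", "CCCC"))  # no shared k-mers
```

Because no alignment is needed, the same function applies to sequences of different lengths.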
8.
Virus Evol ; 9(1): veac120, 2023.
Article in English | MEDLINE | ID: mdl-36632480

ABSTRACT

The composition of the latent human immunodeficiency virus 1 (HIV-1) reservoir is shaped by when proviruses integrated into host genomes. These integration dates can be estimated by phylogenetic methods like root-to-tip (RTT) regression. However, RTT does not accommodate variation in the number of mutations over time, uncertainty in estimating the molecular clock, or the position of the root in the tree. To address these limitations, we implemented a Bayesian extension of RTT as an R package (bayroot), which enables the user to incorporate prior information about the time of infection and start of antiretroviral therapy. Taking an unrooted maximum likelihood tree as input, we use a Metropolis-Hastings algorithm to sample from the joint posterior distribution of three parameters (the rate of sequence evolution, i.e., molecular clock; the location of the root; and the time associated with the root). Next, we apply rejection sampling to this posterior sample of model parameters to simulate integration dates for HIV proviral sequences. To validate this method, we use the R package treeswithintrees (twt) to simulate time-scaled trees relating samples of actively and latently infected T cells from a single host. We find that bayroot yields significantly more accurate estimates of integration dates than conventional RTT under a range of model settings.
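For orientation, conventional root-to-tip (RTT) regression, the baseline that the abstract above compares against, can be sketched with ordinary least squares. This is not the bayroot method; the function names and toy values are hypothetical, and a strict molecular clock is assumed.

```python
def fit_rtt(sample_times, divergences):
    """Least-squares fit of divergence = slope * time + intercept."""
    n = len(sample_times)
    mean_t = sum(sample_times) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in sample_times)
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sample_times, divergences))
    slope = sxy / sxx                    # clock rate (substitutions/site/year)
    intercept = mean_d - slope * mean_t
    return slope, intercept

def estimate_date(divergence, slope, intercept):
    """Date at which the fitted line reaches the given divergence."""
    return (divergence - intercept) / slope

# Toy data (hypothetical): actively replicating virus with known sample dates.
times = [2000.0, 2002.0, 2004.0]
divs = [0.00, 0.02, 0.04]
slope, intercept = fit_rtt(times, divs)

# A proviral sequence with divergence 0.03 is dated by inverting the fit.
print(round(estimate_date(0.03, slope, intercept), 2))
```

The limitations listed in the abstract (rate variation, clock uncertainty, root position) are exactly what this single fitted line ignores.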

9.
PLoS Comput Biol ; 18(11): e1010745, 2022 11.
Article in English | MEDLINE | ID: mdl-36449514

ABSTRACT

Clusters of genetically similar infections suggest rapid transmission and may indicate priorities for public health action or reveal underlying epidemiological processes. However, clusters often require user-defined thresholds and are sensitive to non-epidemiological factors, such as non-random sampling. Consequently, the ideal threshold for public health applications varies substantially across settings. Here, we show a method which selects optimal thresholds for phylogenetic (subset tree) clustering based on population. We evaluated this method on HIV-1 pol datasets (n = 14,221 sequences) from four sites in USA (Tennessee, Washington), Canada (Northern Alberta) and China (Beijing). Clusters were defined by tips descending from an ancestral node (with a minimum bootstrap support of 95%) through a series of branches, each with a length below a given threshold. Next, we used pplacer to graft new cases to the fixed tree by maximum likelihood. We evaluated the effect of varying branch-length thresholds on cluster growth as a count outcome by fitting two Poisson regression models: a null model that predicts growth from cluster size, and an alternative model that includes mean collection date as an additional covariate. The alternative model was favoured by AIC across most thresholds, with optimal (greatest difference in AIC) thresholds ranging 0.007-0.013 across sites. The range of optimal thresholds was more variable when re-sampling 80% of the data by location (IQR 0.008 - 0.016, n = 100 replicates). Our results use prospective phylogenetic cluster growth and suggest that there is more variation in effective thresholds for public health than those typically used in clustering studies.


Subject(s)
HIV Infections , HIV-1 , Humans , HIV-1/genetics , Phylogeny , Prospective Studies , Public Health , HIV Infections/epidemiology , Cluster Analysis
11.
Elife ; 11, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35916373

ABSTRACT

Tracking the emergence and spread of SARS-CoV-2 lineages using phylogenetics has proven critical to inform the timing and stringency of COVID-19 public health interventions. We investigated the effectiveness of international travel restrictions at reducing SARS-CoV-2 importations and transmission in Canada in the first two waves of 2020 and early 2021. Maximum likelihood phylogenetic trees were used to infer viruses' geographic origins, enabling identification of 2263 (95% confidence interval: 2159-2366) introductions, including 680 (658-703) Canadian sublineages, which are international introductions resulting in sampled Canadian descendants, and 1582 (1501-1663) singletons, introductions with no sampled descendants. Of the sublineages seeded during the first wave, 49% (46-52%) originated from the USA and were primarily introduced into Quebec (39%) and Ontario (36%), while in the second wave, the USA was still the predominant source (43%), alongside a larger contribution from India (16%) and the UK (7%). Following implementation of restrictions on the entry of foreign nationals on 21 March 2020, importations declined from 58.5 (50.4-66.5) sublineages per week to 10.3-fold (8.3-15.0) lower within 4 weeks. Despite the drastic reduction in viral importations following travel restrictions, newly seeded sublineages in summer and fall 2020 contributed to the persistence of COVID-19 cases in the second wave, highlighting the importance of sustained interventions to reduce transmission. Importations rebounded further in November, bringing newly emergent variants of concern (VOCs). 
By the end of February 2021, there had been an estimated 30 (19-41) B.1.1.7 sublineages imported into Canada, which increasingly displaced previously circulating sublineages by the end of the second wave. Although viral importations are nearly inevitable when global prevalence is high, with fewer importations there are fewer opportunities for novel variants to spark outbreaks or outcompete previously circulating lineages.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Genomics/methods , Humans , Ontario , Phylogeny , SARS-CoV-2/genetics
12.
Proc Natl Acad Sci U S A ; 119(19): e2108815119, 2022 05 10.
Article in English | MEDLINE | ID: mdl-35500121

ABSTRACT

The prevailing abundance of full-length HIV type 1 (HIV-1) genome sequences provides an opportunity to revisit the standard model of HIV-1 group M (HIV-1/M) diversity that clusters genomes into largely nonrecombinant subtypes, which is not consistent with recent evidence of deep recombinant histories for simian immunodeficiency virus (SIV) and other HIV-1 groups. Here we develop an unsupervised nonparametric clustering approach, which does not rely on predefined nonrecombinant genomes, by adapting a community detection method developed for dynamic social network analysis. We show that this method (dynamic stochastic block model [DSBM]) attains a significantly lower mean error rate in detecting recombinant breakpoints in simulated data (quasibinomial generalized linear model (GLM), P < 8 × 10^-8), compared to other reference-free recombination detection programs (genetic algorithm for recombination detection [GARD], recombination detection program 4 [RDP4], and RDP5). When this method was applied to a representative sample of n = 525 actual HIV-1 genomes, we determined k = 29 as the optimal number of DSBM clusters and used change-point detection to estimate that at least 95% of these genomes are recombinant. Further, we identified both known and undocumented recombination hotspots in the HIV-1 genome and evidence of intersubtype recombination in HIV-1 subtype reference genomes. We propose that clusters generated by DSBM can provide an informative framework for HIV-1 classification.


Subject(s)
HIV-1 , HIV-1/genetics , Recombination, Genetic
13.
PLoS Pathog ; 18(2): e1010331, 2022 02.
Article in English | MEDLINE | ID: mdl-35202429

ABSTRACT

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (-0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.


Subject(s)
Genes, Overlapping , Genome, Viral , Genes, Overlapping/genetics , Genome, Viral/genetics , Open Reading Frames/genetics
14.
Environ Sci Technol Lett ; 9(7): 638-644, 2022 Jul 12.
Article in English | MEDLINE | ID: mdl-37552744

ABSTRACT

Wastewater surveillance has rapidly emerged as an early warning tool to track COVID-19. However, the early warning measurement of new SARS-CoV-2 variants of concern (VOCs) in wastewaters remains a major challenge. We herein report a rapid analytical strategy for quantitative measurement of VOCs, which couples nested polymerase chain reaction and liquid chromatography-mass spectrometry (nPCR-LC-MS). This method showed a greater selectivity than the current allele-specific quantitative PCR (AS-qPCR) for tracking new VOCs and allowed the detection of multiple signature mutations in a single measurement. By measuring the Omicron variant in wastewaters across nine Ontario wastewater treatment plants serving a population of over three million, the nPCR-LC-MS method demonstrated a better quantification accuracy than next-generation sequencing (NGS), particularly at the early stage of community spreading of Omicron. This work addresses a major challenge for current SARS-CoV-2 wastewater surveillance by rapidly and accurately measuring VOCs in wastewaters for early warning.

15.
PLoS One ; 16(12): e0259877, 2021.
Article in English | MEDLINE | ID: mdl-34941890

ABSTRACT

The shape of phylogenetic trees can be used to gain evolutionary insights. A tree's shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. All our shape summaries, as well as functions to select the most discriminating ones for two sets of trees, are freely available as an R package at http://github.com/Leonardini/treeCentrality.


Subject(s)
Computational Biology/methods , Decision Trees , Virus Diseases/virology , Viruses/classification , Algorithms , Data Interpretation, Statistical , Dengue/epidemiology , Dengue/virology , Dengue Virus/classification , HIV Infections/epidemiology , HIV Infections/virology , HIV-1/classification , Humans , Measles/epidemiology , Measles/virology , Measles virus/classification , Phylogeny , Software , Virus Diseases/epidemiology
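The conventional imbalance measures named in the abstract above (Sackin and Colless) can be computed in a single traversal of a rooted binary tree. The sketch below is illustrative and is not taken from the treeCentrality package; it assumes the tree is written as nested 2-tuples with strings as tips, and the function name is hypothetical.

```python
def tree_imbalance(node, depth=0):
    """Return (n_tips, sackin, colless) for a rooted binary tree.

    Sackin: sum of tip depths. Colless: sum over internal nodes of the
    absolute difference in tip counts between the two child subtrees.
    """
    if not isinstance(node, tuple):      # a tip
        return 1, depth, 0
    left, right = node
    n_l, s_l, c_l = tree_imbalance(left, depth + 1)
    n_r, s_r, c_r = tree_imbalance(right, depth + 1)
    return n_l + n_r, s_l + s_r, c_l + c_r + abs(n_l - n_r)

balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

print(tree_imbalance(balanced))      # (4, 8, 0)
print(tree_imbalance(caterpillar))   # (4, 9, 3)
```

Both statistics are larger for the caterpillar tree, which is the most asymmetric shape on four tips.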
16.
Viruses ; 13(9), 2021 08 30.
Article in English | MEDLINE | ID: mdl-34578305

ABSTRACT

Despite the effectiveness of direct-acting antiviral agents in treating hepatitis C virus (HCV), cases of treatment failure have been associated with the emergence of resistance-associated substitutions. To better guide clinical decision-making, we developed and validated a near-whole-genome HCV genotype-independent next-generation sequencing strategy. HCV genotype 1-6 samples from direct-acting antiviral agent treatment-naïve and -treated HCV-infected individuals were included. Viral RNA was extracted using a NucliSens easyMAG and amplified using nested reverse transcription-polymerase chain reaction. Libraries were prepared using Nextera XT and sequenced on the Illumina MiSeq sequencing platform. Data were processed by an in-house pipeline (MiCall). Nucleotide consensus sequences were aligned to reference strain sequences for resistance-associated substitution identification and compared to NS3, NS5a, and NS5b sequence data obtained from a validated in-house assay optimized for HCV genotype 1. Sequencing success rates (defined as achieving >100-fold read coverage) approaching 90% were observed for most genotypes in samples with a viral load >5 log10 IU/mL. This genotype-independent sequencing method resulted in >99.8% nucleotide concordance with the genotype 1-optimized method, and 100% agreement in genotype assignment with paired line probe assay-based genotypes. The assay demonstrated high intra-run repeatability and inter-run reproducibility at detecting substitutions above 2% prevalence. This study highlights the performance of a freely available laboratory and bioinformatic approach for reliable HCV genotyping and resistance-associated substitution detection regardless of genotype.


Subject(s)
Genotype , Hepacivirus/genetics , Hepatitis C/virology , RNA, Viral/genetics , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , Genotyping Techniques , Hepacivirus/classification , Hepatitis C/diagnosis , Humans , Reproducibility of Results , Sensitivity and Specificity , Viral Load
17.
Virus Evol ; 7(1): veaa106, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33614158

ABSTRACT

Many virus-encoded proteins have intrinsically disordered regions that lack a stable, folded three-dimensional structure. These disordered proteins often play important functional roles in virus replication, such as down-regulating host defense mechanisms. With the widespread availability of next-generation sequencing, the number of new virus genomes with predicted open reading frames is rapidly outpacing our capacity for directly characterizing protein structures through crystallography. Hence, computational methods for structural prediction play an important role. A large number of predictors focus on the problem of classifying residues into ordered and disordered regions, and these methods tend to be validated on a diverse training set of proteins from eukaryotes, prokaryotes, and viruses. In this study, we investigate whether some predictors outperform others in the context of virus proteins and compared our findings with data from non-viral proteins. We evaluate the prediction accuracy of 21 methods, many of which are only available as web applications, on a curated set of 126 proteins encoded by viruses. Furthermore, we apply a random forest classifier to these predictor outputs. Based on cross-validation experiments, this ensemble approach confers a substantial improvement in accuracy, e.g., a mean 36 per cent gain in Matthews correlation coefficient. Lastly, we apply the random forest predictor to severe acute respiratory syndrome coronavirus 2 ORF6, an accessory gene that encodes a short (61 AA) and moderately disordered protein that inhibits the host innate immune response. We show that disorder prediction methods perform differently for viral and non-viral proteins, and that an ensemble approach can yield more robust and accurate predictions.
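The Matthews correlation coefficient used above to summarize prediction accuracy is computed from the four cells of a binary confusion matrix. A minimal sketch follows; the function name and the toy counts are hypothetical and are not data from the study.

```python
import math

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Toy counts (hypothetical): residues called disordered vs. ordered.
print(round(matthews_cc(tp=40, tn=45, fp=5, fn=10), 3))
```

Values range from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes the statistic useful when ordered and disordered residues are unequally represented.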

18.
Virus Evol ; 7(1): veaa104, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33505711

ABSTRACT

Among people living with human immunodeficiency virus type 1 (HIV-1), the long-term persistence of a population of cells carrying transcriptionally silent integrated viral DNA (provirus) remains the primary barrier to developing an effective cure. Ongoing cell division via proliferation is generally considered to be the driving force behind the persistence of this latent HIV-1 reservoir. The contribution of this mechanism (clonal expansion) is supported by the observation that proviral sequences sampled from the reservoir are often identical. This outcome is quantified as the 'clonality' of the sample population, e.g. the fraction of provirus sequences observed more than once. However, clonality as a quantitative measure is inconsistently defined and its statistical properties are not well understood. In this Reflections article, we use mathematical and phylogenetic frameworks to formally examine the inherent problems of using clonality to characterize the dynamics and proviral composition of the reservoir. We describe how clonality is not adequate for this task due to the inherent complexity of how infected cells are 'labeled' by proviral sequences-the outcome of a sampling process from the evolutionary history of active viral replication before treatment-as well as variation in cell birth and death rates among lineages and over time. Lastly, we outline potential directions in statistical and phylogenetic research to address these issues.
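The example definition of clonality given above (the fraction of provirus sequences observed more than once) can be written down directly. The second function is one hypothetical alternative, included only to illustrate how different definitions give different values for the same sample; neither is presented as the authors' recommended measure.

```python
from collections import Counter

def clonality_by_sequence(seqs):
    """Fraction of sampled sequences whose exact sequence occurs more than once."""
    counts = Counter(seqs)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(seqs)

def clonality_by_distinct(seqs):
    """Fraction of distinct sequences seen more than once (hypothetical alternative)."""
    counts = Counter(seqs)
    return sum(1 for c in counts.values() if c > 1) / len(counts)

# Toy sample (hypothetical): one sequence sampled twice, two sampled once.
sample = ["ACGT", "ACGT", "ACGA", "TCGT"]
print(clonality_by_sequence(sample))   # 0.5
print(clonality_by_distinct(sample))
```

Both values also change with sampling depth, since deeper sampling makes repeat observations more likely.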

19.
Virus Evol ; 7(2): veab092, 2021 Dec.
Article in English | MEDLINE | ID: mdl-37124703

ABSTRACT

Phylogenetics has played a pivotal role in the genomic epidemiology of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), for example in tracking the emergence and global spread of variants and in scientific communication. However, the rapid accumulation of genomic data from around the world, with over two million genomes currently available in the Global Initiative on Sharing All Influenza Data (GISAID) database, is testing the limits of standard phylogenetic methods. Here, we describe a new approach to rapidly analyze and visualize large numbers of SARS-CoV-2 genomes. Using Python, genomes are filtered for problematic sites, incomplete coverage, and excessive divergence from a strict molecular clock. All differences from the reference genome, including indels, are extracted using minimap2 and compactly stored as a set of features for each genome. For each Pango lineage (https://cov-lineages.org), we collapse genomes with identical features into 'variants', generate 100 bootstrap samples of the feature set union to generate weights, and compute the symmetric differences between the weighted feature sets for every pair of variants. The resulting distance matrices are used to generate neighbor-joining trees in RapidNJ that are converted into a majority-rule consensus tree for each lineage. Branches with support values below 50 per cent or mean lengths below 0.5 differences are collapsed, and tip labels on affected branches are mapped to internal nodes as directly sampled ancestral variants. Currently, we process about 2 million genomes in approximately 9 h on 52 cores. The resulting trees are visualized using the JavaScript framework D3.js as 'beadplots', in which variants are represented by horizontal line segments, annotated with beads representing samples by collection date. Variants are linked by vertical edges to represent branches in the consensus tree. These visualizations are published at https://filogeneti.ca/CoVizu. All source code was released under an MIT license at https://github.com/PoonLab/covizu.
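
The distance step described in this abstract (collapsing genomes with identical features into variants, bootstrapping the feature set union to obtain weights, and summing weights over the symmetric difference of each pair of variants) can be illustrated with a minimal Python sketch. This is not the CoVizu source code. The function names, the tuple encoding of features, and the reading of "weights" as per-feature counts from resampling the union with replacement are all assumptions made for illustration. The neighbor-joining and consensus steps, which the abstract assigns to RapidNJ, are omitted.

```python
import random
from collections import Counter


def collapse_variants(genomes):
    """Group genome labels that share an identical feature set.

    `genomes` maps a label to a set of features. The feature encoding
    (type, position, value) used below is hypothetical.
    """
    variants = {}
    for label, features in genomes.items():
        variants.setdefault(frozenset(features), []).append(label)
    return variants


def bootstrap_weights(union, rng):
    """Resample the feature union with replacement (assumed scheme).

    Returns a Counter giving how many times each feature was drawn;
    features that were never drawn implicitly have weight 0.
    """
    return Counter(rng.choices(union, k=len(union)))


def weighted_symdiff(a, b, weights):
    """Sum the weights of features present in exactly one of a, b."""
    return sum(weights[f] for f in a ^ b)


def distance_matrix(variant_sets, weights):
    """Pairwise weighted symmetric differences between variants."""
    n = len(variant_sets)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = weighted_symdiff(variant_sets[i], variant_sets[j], weights)
            dist[i][j] = dist[j][i] = d
    return dist


# Toy input: three genomes, two of which are identical (invented data).
genomes = {
    "g1": {("~", 240, "T")},
    "g2": {("~", 240, "T")},
    "g3": {("~", 240, "T"), ("-", 21990, 3)},
}
variants = collapse_variants(genomes)
variant_sets = list(variants)

# Sort by repr so the resampling order is stable for a given seed.
union = sorted(set().union(*variant_sets), key=repr)
rng = random.Random(1)

# One distance matrix per bootstrap replicate (100 in the abstract).
matrices = [
    distance_matrix(variant_sets, bootstrap_weights(union, rng))
    for _ in range(100)
]
print(len(variants), "variants,", len(matrices), "bootstrap matrices")
```

Under these assumptions, each replicate matrix would be passed to a neighbor-joining implementation, and the resulting trees summarized as a majority-rule consensus tree.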

20.
J Antimicrob Chemother ; 75(12): 3525-3533, 2020 Dec 01.
Article in English | MEDLINE | ID: mdl-32853364

ABSTRACT

BACKGROUND: Increasing first-line treatment failures in low- and middle-income countries (LMICs) have led to increased use of integrase strand transfer inhibitors (INSTIs) such as dolutegravir. However, HIV-1 susceptibility to INSTIs in LMICs, especially with previous raltegravir exposure, is poorly understood due to infrequent reporting of INSTI failures and testing for INSTI drug resistance mutations (DRMs). METHODS: A total of 51 non-subtype B HIV-1 infected patients failing third-line (raltegravir-based) therapy in Uganda were initially selected for the study. DRMs were detected using Sanger and deep sequencing. HIV integrase genes of 13 patients were cloned and replication capacities (RCs) and phenotypic susceptibilities to dolutegravir, raltegravir and elvitegravir were determined with TZM-bl cells. Spearman's correlation coefficient was used to determine cross-resistance between INSTIs. RESULTS: INSTI DRMs were detected in 47% of patients. HIV integrase-recombinant virus carrying one primary INSTI DRM (N155H or Y143R/S) was susceptible to dolutegravir but highly resistant to raltegravir and elvitegravir (>50-fold change). Two patients, one with E138A/G140A/Q148R/G163R and one with E138K/G140A/S147G/Q148K, displayed the highest reported resistance to raltegravir, elvitegravir and even dolutegravir. The former multi-DRM virus had WT RC whereas the latter had lower RCs than WT. CONCLUSIONS: In HIV-1 subtype A- and D-infected patients failing raltegravir and harbouring INSTI DRMs, there is high-level resistance to elvitegravir and raltegravir. More routine monitoring of INSTI treatment may be advised in LMICs, considering that multiple INSTI DRMs may have accumulated during prolonged exposure to raltegravir during virological failure, leading to high-level INSTI resistance, including dolutegravir resistance.


Subject(s)
HIV Infections , HIV Integrase Inhibitors , HIV Integrase , HIV-1 , Drug Resistance, Viral , HIV Infections/drug therapy , HIV Integrase/genetics , HIV Integrase Inhibitors/pharmacology , HIV Integrase Inhibitors/therapeutic use , HIV-1/genetics , Heterocyclic Compounds, 3-Ring , Humans , Mutation , Oxazines , Piperazines/therapeutic use , Pyridones , Raltegravir Potassium/pharmacology , Raltegravir Potassium/therapeutic use , Uganda