Results 1 - 20 of 97
1.
Microb Genom ; 10(5), 2024 May.
Article in English | MEDLINE | ID: mdl-38785221

ABSTRACT

Wastewater-based surveillance (WBS) is an important epidemiological and public health tool for tracking pathogens across the scale of a building, neighbourhood, city, or region. WBS gained widespread adoption globally during the SARS-CoV-2 pandemic for estimating community infection levels by qPCR. Sequencing pathogen genes or genomes from wastewater adds information about pathogen genetic diversity, which can be used to identify viral lineages (including variants of concern) that are circulating in a local population. Capturing the genetic diversity by WBS sequencing is not trivial, as wastewater samples often contain a diverse mixture of viral lineages with real mutations and sequencing errors, which must be deconvoluted computationally from short sequencing reads. In this study we assess nine different computational tools that have recently been developed to address this challenge. We simulated 100 wastewater sequence samples consisting of SARS-CoV-2 BA.1, BA.2, and Delta lineages, in various mixtures, as well as a Delta-Omicron recombinant and a synthetic 'novel' lineage. Most tools performed well in identifying the true lineages present and estimating their relative abundances and were generally robust to variation in sequencing depth and read length. While many tools identified lineages present down to 1 % frequency, results were more reliable above a 5 % threshold. The presence of an unknown synthetic lineage, which represents an unclassified SARS-CoV-2 lineage, increases the error in relative abundance estimates of other lineages, but the magnitude of this effect was small for most tools. The tools also varied in how they labelled novel synthetic lineages and recombinants. While our simulated dataset represents just one of many possible use cases for these methods, we hope it helps users understand potential sources of error or bias in wastewater sequencing analysis and to appreciate the commonalities and differences across methods.


Subjects
COVID-19 , Viral Genome , SARS-CoV-2 , Wastewater , Wastewater/virology , SARS-CoV-2/genetics , SARS-CoV-2/classification , COVID-19/virology , COVID-19/epidemiology , Humans , Computational Biology/methods , Genomics/methods , Wastewater-Based Epidemiological Monitoring , Phylogeny
2.
J Infect Dis ; 2024 May 31.
Article in English | MEDLINE | ID: mdl-38819322

ABSTRACT

Timing of HIV-1 reservoir formation is important for informing HIV cure efforts. It is unclear how much of the variability seen in dating reservoir formation is due to sampling and gene-specific differences. We used a Bayesian extension of root to tip regression (bayroot) to re-estimate formation date distributions in participants from Swedish and South African cohorts, and assessed the impact of variable timing, frequency, and depth of sampling on these estimates. Significant shifts in formation date distributions were only observed with use of faster-evolving genes, while timing, frequency, and depth of sampling had minor or no significant effect on estimates.

3.
EBioMedicine ; 102: 105040, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38485563

ABSTRACT

BACKGROUND: The principal barrier to an HIV cure is the presence of the latent viral reservoir (LVR), which has been understudied in African populations. From 2018 to 2019, Uganda instituted a nationwide rollout of ART consisting of Dolutegravir (DTG) with two NRTI, which replaced the previous regimen of one NNRTI and the same two NRTI. METHODS: Changes in the inducible replication-competent LVR (RC-LVR) of ART-suppressed Ugandans with HIV (n = 88) from 2015 to 2020 were examined using the quantitative viral outgrowth assay. Outgrowth viruses were examined for viral evolution. Changes in the RC-LVR were analyzed using three versions of a Bayesian model that estimated the decay rate over time as a single, linear rate (model A), or allowing for a change at time of DTG initiation (models B and C). FINDINGS: Model A estimated the slope of RC-LVR change as a non-significant positive increase, which was due to a temporary spike in the RC-LVR that occurred 0-12 months post-DTG initiation (p < 0.005). This was confirmed with models B and C; for instance, model B estimated a significant decay pre-DTG initiation with a half-life of 6.9 years, and an ∼1.7-fold increase in the size of the RC-LVR post-DTG initiation. There was no evidence of viral failure or consistent evolution in the cohort. INTERPRETATION: These data suggest that the change from NNRTI- to DTG-based ART is associated with a significant temporary increase in the circulating RC-LVR. FUNDING: Supported by the NIH (grant 1-UM1AI164565); Gilead HIV Cure Grants Program (90072171); Canadian Institutes of Health Research (PJT-155990); and Ontario Genomics-Canadian Statistical Sciences Institute.
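
The half-life quoted in this abstract can be related to a decay rate with a short calculation. The sketch below assumes simple first-order (exponential) decay, N(t) = N0 * exp(-k * t); it is a generic illustration, not the authors' Bayesian model, and the function names are hypothetical.

```python
import math


def rate_from_half_life(half_life_years: float) -> float:
    """Decay rate k (per year) under the assumed model N(t) = N0 * exp(-k * t)."""
    if half_life_years <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_years


def half_life_from_rate(decay_rate_per_year: float) -> float:
    """Inverse of rate_from_half_life: half-life in years for a positive rate k."""
    if decay_rate_per_year <= 0:
        raise ValueError("decay rate must be positive")
    return math.log(2) / decay_rate_per_year


def fraction_remaining(half_life_years: float, years: float) -> float:
    """Fraction of the initial reservoir left after `years` of decay."""
    if half_life_years <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (years / half_life_years)
```

For example, `fraction_remaining(6.9, 6.9)` gives one half by construction, and `rate_from_half_life(6.9)` converts the reported pre-DTG half-life into a per-year rate under this assumed model.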


Subjects
East African People , HIV Infections , HIV Integrase Inhibitors , HIV-1 , Humans , CD4-Positive T-Lymphocytes , HIV Infections/drug therapy , Bayes Theorem , Virus Latency , Anti-Retroviral Agents/therapeutic use , HIV Integrase Inhibitors/pharmacology , HIV Integrase Inhibitors/therapeutic use , Ontario , Viral Load
4.
Sci Rep ; 14(1): 3728, 2024 02 14.
Article in English | MEDLINE | ID: mdl-38355869

ABSTRACT

Wastewater surveillance of coronavirus disease 2019 (COVID-19) commonly applies reverse transcription-quantitative polymerase chain reaction (RT-qPCR) to quantify severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA concentrations in wastewater over time. In most applications worldwide, maximal sensitivity and specificity of RT-qPCR has been achieved, in part, by monitoring two or more genomic loci of SARS-CoV-2. In Ontario, Canada, the provincial Wastewater Surveillance Initiative reports the average copies of the CDC N1 and N2 loci normalized to the fecal biomarker pepper mild mottle virus. In November 2021, the emergence of the Omicron variant of concern, harboring a C28311T mutation within the CDC N1 probe region, challenged the accuracy of the consensus between the RT-qPCR measurements of the N1 and N2 loci of SARS-CoV-2. In this study, we developed and applied a novel real-time dual loci quality assurance and control framework based on the relative difference between the loci measurements to the City of Ottawa dataset to identify a loss of sensitivity of the N1 assay in the period from July 10, 2022 to January 31, 2023. Further analysis via sequencing and allele-specific RT-qPCR revealed a high proportion of mutations C28312T and A28330G during the study period, both in the City of Ottawa and across the province. It is hypothesized that nucleotide mutations in the probe region, especially A28330G, led to inefficient annealing, resulting in reduction in sensitivity and accuracy of the N1 assay. This study highlights the importance of implementing quality assurance and control criteria to continually evaluate, in near real-time, the accuracy of the signal produced in wastewater surveillance applications that rely on detection of pathogens whose genomes undergo high rates of mutation.


Subjects
Wastewater-Based Epidemiological Monitoring , Wastewater , Alleles , Mutation , Ontario/epidemiology , SARS-CoV-2/genetics , Viral RNA/genetics
5.
Mol Biol Evol ; 40(8), 2023 08 03.
Article in English | MEDLINE | ID: mdl-37463439

ABSTRACT

Nef is an accessory protein unique to the primate HIV-1, HIV-2, and SIV lentiviruses. During infection, Nef functions by interacting with multiple host proteins within infected cells to evade the immune response and enhance virion infectivity. Notably, Nef can counter immune regulators such as CD4 and MHC-I, as well as the SERINC5 restriction factor in infected cells. In this study, we generated a posterior sample of time-scaled phylogenies relating SIV and HIV Nef sequences, followed by reconstruction of ancestral sequences at the root and internal nodes of the sampled trees up to the HIV-1 Group M ancestor. Upon expression of the ancestral primate lentivirus Nef protein within CD4+ HeLa cells, flow cytometry analysis revealed that the primate lentivirus Nef ancestor robustly downregulated cell-surface SERINC5, yet only partially downregulated CD4 from the cell surface. Further analysis revealed that the Nef-mediated CD4 downregulation ability evolved gradually, while Nef-mediated SERINC5 downregulation was recovered abruptly in the HIV-1/M ancestor. Overall, this study provides a framework to reconstruct ancestral viral proteins and enable the functional characterization of these proteins to delineate how functions could have changed throughout evolutionary history.


Subjects
Primate Lentiviruses , Simian Immunodeficiency Virus , Humans , Animals , Primate Lentiviruses/genetics , Primate Lentiviruses/metabolism , Phylogeny , HeLa Cells , Simian Immunodeficiency Virus/metabolism , Human Immunodeficiency Virus nef Gene Products/genetics , Human Immunodeficiency Virus nef Gene Products/metabolism , Primates/genetics , Primates/metabolism , Membrane Proteins/genetics
6.
medRxiv ; 2023 May 16.
Article in English | MEDLINE | ID: mdl-37292785

ABSTRACT

The principal barrier to an HIV cure is the presence of a latent viral reservoir (LVR) made up primarily of latently infected resting CD4+ (rCD4) T-cells. Studies in the United States have shown that the LVR decays slowly (half-life=3.8 years), but this rate in African populations has been understudied. This study examined longitudinal changes in the inducible replication competent LVR (RC-LVR) of ART-suppressed Ugandans living with HIV (n=88) from 2015-2020 using the quantitative viral outgrowth assay, which measures infectious units per million (IUPM) rCD4 T-cells. In addition, outgrowth viruses were examined with site-directed next-generation sequencing to assess for possible ongoing viral evolution. During the study period (2018-19), Uganda instituted a nationwide rollout of first-line ART consisting of Dolutegravir (DTG) with two NRTI, which replaced the previous regimen that consisted of one NNRTI and the same two NRTI. Changes in the RC-LVR were analyzed using two versions of a novel Bayesian model that estimated the decay rate over time on ART as a single, linear rate (model A) or allowing for an inflection at time of DTG initiation (model B). Model A estimated the population-level slope of RC-LVR change as a non-significant positive increase. This positive slope was due to a temporary increase in the RC-LVR that occurred 0-12 months post-DTG initiation (p<0.0001). This was confirmed with model B, which estimated a significant decay pre-DTG initiation with a half-life of 7.7 years, but a significant positive slope post-DTG initiation leading to a transient estimated doubling-time of 8.1 years. There was no evidence of viral failure in the cohort, or consistent evolution in the outgrowth sequences associated with DTG initiation. These data suggest that either the initiation of DTG, or cessation of NNRTI use, is associated with a significant temporary increase in the circulating RC-LVR.

7.
Virus Evol ; 9(1): vead026, 2023.
Article in English | MEDLINE | ID: mdl-37187604

ABSTRACT

Defining clusters of epidemiologically related infections is a common problem in the surveillance of infectious disease. A popular method for generating clusters is pairwise distance clustering, which assigns pairs of sequences to the same cluster if their genetic distance falls below some threshold. The result is often represented as a network or graph of nodes. A connected component is a set of interconnected nodes in a graph that are not connected to any other node. The prevailing approach to pairwise clustering is to map clusters to the connected components of the graph on a one-to-one basis. We propose that this definition of clusters is unnecessarily rigid. For instance, the connected components can collapse into one cluster by the addition of a single sequence that bridges nodes in the respective components. Moreover, the distance thresholds typically used for viruses like HIV-1 tend to exclude a large proportion of new sequences, making it difficult to train models for predicting cluster growth. These issues may be resolved by revisiting how we define clusters from genetic distances. Community detection is a promising class of clustering methods from the field of network science. A community is a set of nodes that are more densely inter-connected relative to the number of their connections to external nodes. Thus, a connected component may be partitioned into two or more communities. Here we describe community detection methods in the context of genetic clustering for epidemiology, demonstrate how a popular method (Markov clustering) enables us to resolve variation in transmission rates within a giant connected component of HIV-1 sequences, and identify current challenges and directions for further work.
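
The conventional definition described in this abstract, where clusters are the connected components of a thresholded distance graph, can be sketched as follows. This is a minimal union-find illustration; the function name and the input layout (a dictionary of pairwise distances keyed by sequence-label pairs) are assumptions made for the example, not part of the paper.

```python
def connected_components(nodes, distances, threshold):
    """Cluster sequences by pairwise distance.

    nodes: iterable of sequence labels.
    distances: dict mapping (label_a, label_b) -> genetic distance.
    Two sequences are linked when their distance falls below `threshold`;
    clusters are the connected components of the resulting graph.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the component root, halving paths as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), dist in distances.items():
        if dist < threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for n in parent:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())
```

With sequences A-B and C-D linked below the threshold, the function returns two clusters; adding a single sequence X that lies within the threshold of both B and C merges everything into one component, which is the rigidity the abstract points to.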

8.
Virus Evol ; 9(1): vead009, 2023.
Article in English | MEDLINE | ID: mdl-36846827

ABSTRACT

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may provide a mechanism to increase the information content of compact genomes. The presence of overlapping reading frames (OvRFs) can skew estimates of selection based on the rates of non-synonymous and synonymous substitutions, since a substitution that is synonymous in one reading frame may be non-synonymous in another and vice versa. To understand the impact of OvRFs on molecular evolution, we implemented a versatile simulation model of nucleotide sequence evolution along a phylogeny with any distribution of open reading frames in linear or circular genomes. We use a custom data structure to track the substitution rates at every nucleotide site, which are determined by the stationary nucleotide frequencies, transition bias, and the distribution of selection biases (dN/dS) in the respective reading frames. Our simulation model is implemented in the Python scripting language. All source code is released under the GNU General Public License version 3 and is available at https://github.com/PoonLab/HexSE.

9.
Curr Protoc ; 3(2): e666, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36809686

ABSTRACT

The comparative analysis of amino acid sequences is an important tool in molecular biology that often requires multiple sequence alignments. In comparisons between less closely related genomes, however, it becomes more difficult to accurately align protein-coding sequences, or even to identify homologous regions in different genomes. In this article, we describe an alignment-free method for the classification of homologous protein-coding regions from different genomes. This methodology was originally developed for comparing genomes within virus families, but may be adapted for other organisms. We quantify sequence homology from the overlap (intersection distance) of the k-mer (word) frequency distributions for different protein sequences. Next, we extract groups of homologous sequences from the resulting distance matrix using a combination of dimensionality reduction and hierarchical clustering methods. Finally, we demonstrate how to generate visualizations of the composition of clusters with respect to protein annotations, and by coloring protein-coding regions of genomes by cluster assignments. These provide a useful means to quickly assess the reliability of the clustering results based on the distribution of homologous genes among genomes. © 2023 Wiley Periodicals LLC.
Basic Protocol 1: Data collection and processing
Basic Protocol 2: Calculating k-mer distances
Basic Protocol 3: Extracting clusters of homology
Support Protocol: Genome plot based on clustering results
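
The idea of measuring homology from the overlap of k-mer frequency distributions can be sketched in a few lines. The exact formula used in the protocol is not given in this abstract, so the version below is an assumption: the distance is taken as one minus the sum, over shared k-mers, of the smaller of the two relative frequencies. Function names and the default k = 3 are likewise illustrative.

```python
from collections import Counter


def kmer_frequencies(seq: str, k: int = 3) -> dict:
    """Relative frequencies of all overlapping k-mers (words) in a sequence."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def intersection_distance(seq1: str, seq2: str, k: int = 3) -> float:
    """Assumed definition: 1 - sum of min(f1, f2) over k-mers present in both."""
    f1 = kmer_frequencies(seq1, k)
    f2 = kmer_frequencies(seq2, k)
    shared = sum(min(f1[w], f2[w]) for w in f1.keys() & f2.keys())
    return 1.0 - shared
```

Under this definition, two identical sequences have a distance of zero and two sequences with no k-mers in common have a distance of one; a matrix of such distances could then be passed to a hierarchical clustering routine.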


Subjects
Algorithms , Reproducibility of Results , Sequence Alignment , Amino Acid Sequence , Cluster Analysis
10.
Virus Evol ; 9(1): veac120, 2023.
Article in English | MEDLINE | ID: mdl-36632480

ABSTRACT

The composition of the latent human immunodeficiency virus 1 (HIV-1) reservoir is shaped by when proviruses integrated into host genomes. These integration dates can be estimated by phylogenetic methods like root-to-tip (RTT) regression. However, RTT does not accommodate variation in the number of mutations over time, uncertainty in estimating the molecular clock, or the position of the root in the tree. To address these limitations, we implemented a Bayesian extension of RTT as an R package (bayroot), which enables the user to incorporate prior information about the time of infection and start of antiretroviral therapy. Taking an unrooted maximum likelihood tree as input, we use a Metropolis-Hastings algorithm to sample from the joint posterior distribution of three parameters (the rate of sequence evolution, i.e., molecular clock; the location of the root; and the time associated with the root). Next, we apply rejection sampling to this posterior sample of model parameters to simulate integration dates for HIV proviral sequences. To validate this method, we use the R package treeswithintrees (twt) to simulate time-scaled trees relating samples of actively and latently infected T cells from a single host. We find that bayroot yields significantly more accurate estimates of integration dates than conventional RTT under a range of model settings.
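
For orientation, the conventional root-to-tip (RTT) regression that this abstract takes as its baseline can be sketched with ordinary least squares. This is not the bayroot method or its API; the function names are hypothetical, and the sketch assumes divergence from the root accumulates linearly with sampling date.

```python
def fit_root_to_tip(dates, divergences):
    """Ordinary least-squares fit: divergence = slope * date + intercept.

    dates: sampling dates (e.g. decimal years) of sequences from actively
    replicating virus; divergences: their root-to-tip distances.
    The slope is the estimated clock rate.
    """
    n = len(dates)
    if n < 2 or n != len(divergences):
        raise ValueError("need at least two paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_date(divergence, slope, intercept):
    """Invert the fitted line: the date at which a given divergence is expected.

    A divergence of 0 gives the root date; a proviral sequence's divergence
    gives a point estimate of its integration date.
    """
    if slope == 0:
        raise ValueError("zero clock rate: dates cannot be estimated")
    return (divergence - intercept) / slope
```

A point estimate of this kind carries no uncertainty about the clock rate or root position, which is the limitation the Bayesian extension is described as addressing.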

12.
PLoS Comput Biol ; 18(11): e1010745, 2022 11.
Article in English | MEDLINE | ID: mdl-36449514

ABSTRACT

Clusters of genetically similar infections suggest rapid transmission and may indicate priorities for public health action or reveal underlying epidemiological processes. However, clusters often require user-defined thresholds and are sensitive to non-epidemiological factors, such as non-random sampling. Consequently, the ideal threshold for public health applications varies substantially across settings. Here, we show a method which selects optimal thresholds for phylogenetic (subset tree) clustering based on population. We evaluated this method on HIV-1 pol datasets (n = 14,221 sequences) from four sites in the USA (Tennessee, Washington), Canada (Northern Alberta) and China (Beijing). Clusters were defined by tips descending from an ancestral node (with a minimum bootstrap support of 95%) through a series of branches, each with a length below a given threshold. Next, we used pplacer to graft new cases to the fixed tree by maximum likelihood. We evaluated the effect of varying branch-length thresholds on cluster growth as a count outcome by fitting two Poisson regression models: a null model that predicts growth from cluster size, and an alternative model that includes mean collection date as an additional covariate. The alternative model was favoured by AIC across most thresholds, with optimal (greatest difference in AIC) thresholds ranging from 0.007 to 0.013 across sites. The range of optimal thresholds was more variable when re-sampling 80% of the data by location (IQR 0.008-0.016, n = 100 replicates). Our results use prospective phylogenetic cluster growth and suggest that there is more variation in effective thresholds for public health than those typically used in clustering studies.


Subjects
HIV Infections , HIV-1 , Humans , HIV-1/genetics , Phylogeny , Prospective Studies , Public Health , HIV Infections/epidemiology , Cluster Analysis
13.
Elife ; 11, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35916373

ABSTRACT

Tracking the emergence and spread of SARS-CoV-2 lineages using phylogenetics has proven critical to inform the timing and stringency of COVID-19 public health interventions. We investigated the effectiveness of international travel restrictions at reducing SARS-CoV-2 importations and transmission in Canada in the first two waves of 2020 and early 2021. Maximum likelihood phylogenetic trees were used to infer viruses' geographic origins, enabling identification of 2263 (95% confidence interval: 2159-2366) introductions, including 680 (658-703) Canadian sublineages, which are international introductions resulting in sampled Canadian descendants, and 1582 (1501-1663) singletons, introductions with no sampled descendants. Of the sublineages seeded during the first wave, 49% (46-52%) originated from the USA and were primarily introduced into Quebec (39%) and Ontario (36%), while in the second wave, the USA was still the predominant source (43%), alongside a larger contribution from India (16%) and the UK (7%). Following implementation of restrictions on the entry of foreign nationals on 21 March 2020, importations declined from 58.5 (50.4-66.5) sublineages per week to 10.3-fold (8.3-15.0) lower within 4 weeks. Despite the drastic reduction in viral importations following travel restrictions, newly seeded sublineages in summer and fall 2020 contributed to the persistence of COVID-19 cases in the second wave, highlighting the importance of sustained interventions to reduce transmission. Importations rebounded further in November, bringing newly emergent variants of concern (VOCs). By the end of February 2021, there had been an estimated 30 (19-41) B.1.1.7 sublineages imported into Canada, which increasingly displaced previously circulating sublineages by the end of the second wave. Although viral importations are nearly inevitable when global prevalence is high, with fewer importations there are fewer opportunities for novel variants to spark outbreaks or outcompete previously circulating lineages.


Subjects
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Genomics/methods , Humans , Ontario , Phylogeny , SARS-CoV-2/genetics
14.
Proc Natl Acad Sci U S A ; 119(19): e2108815119, 2022 05 10.
Article in English | MEDLINE | ID: mdl-35500121

ABSTRACT

The prevailing abundance of full-length HIV type 1 (HIV-1) genome sequences provides an opportunity to revisit the standard model of HIV-1 group M (HIV-1/M) diversity that clusters genomes into largely nonrecombinant subtypes, which is not consistent with recent evidence of deep recombinant histories for simian immunodeficiency virus (SIV) and other HIV-1 groups. Here we develop an unsupervised nonparametric clustering approach, which does not rely on predefined nonrecombinant genomes, by adapting a community detection method developed for dynamic social network analysis. We show that this method (dynamic stochastic block model [DSBM]) attains a significantly lower mean error rate in detecting recombinant breakpoints in simulated data (quasibinomial generalized linear model [GLM], P < 8 × 10⁻⁸), compared to other reference-free recombination detection programs (genetic algorithm for recombination detection [GARD], recombination detection program 4 [RDP4], and RDP5). When this method was applied to a representative sample of n = 525 actual HIV-1 genomes, we determined k = 29 as the optimal number of DSBM clusters and used change-point detection to estimate that at least 95% of these genomes are recombinant. Further, we identified both known and undocumented recombination hotspots in the HIV-1 genome and evidence of intersubtype recombination in HIV-1 subtype reference genomes. We propose that clusters generated by DSBM can provide an informative framework for HIV-1 classification.


Subjects
HIV-1 , HIV-1/genetics , Genetic Recombination
15.
PLoS Pathog ; 18(2): e1010331, 2022 02.
Article in English | MEDLINE | ID: mdl-35202429

ABSTRACT

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (-0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
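
The two quantities measured per overlap in this abstract, length and frameshift, can be illustrated for the simplest case of two ORFs on the same strand. The sketch below assumes 0-based, half-open (start, end) nucleotide coordinates and defines the frameshift as the offset between start positions modulo 3; the function names are hypothetical, and the antisense conventions used in the paper (e.g. -0) are not reproduced here.

```python
def overlap_length(orf_a, orf_b):
    """Number of nucleotides shared by two ORFs given as (start, end) tuples.

    Coordinates are assumed to be 0-based and half-open, so (0, 300)
    covers positions 0..299. Returns 0 when the ORFs do not overlap.
    """
    start = max(orf_a[0], orf_b[0])
    end = min(orf_a[1], orf_b[1])
    return max(0, end - start)


def same_strand_frameshift(orf_a, orf_b):
    """Reading-frame offset (0, 1 or 2) of the downstream ORF relative to
    the upstream ORF, for two ORFs assumed to lie on the same strand."""
    upstream, downstream = sorted([orf_a, orf_b])
    return (downstream[0] - upstream[0]) % 3
```

For ORFs at (0, 300) and (200, 500), this gives a 100-nucleotide overlap with the downstream ORF shifted by +2 relative to the upstream one.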


Subjects
Gene Homology , Viral Genome , Gene Homology/genetics , Viral Genome/genetics , Open Reading Frames/genetics
16.
Environ Sci Technol Lett ; 9(7): 638-644, 2022 Jul 12.
Article in English | MEDLINE | ID: mdl-37552744

ABSTRACT

Wastewater surveillance has rapidly emerged as an early warning tool to track COVID-19. However, the early warning measurement of new SARS-CoV-2 variants of concern (VOCs) in wastewaters remains a major challenge. We herein report a rapid analytical strategy for quantitative measurement of VOCs, which couples nested polymerase chain reaction and liquid chromatography-mass spectrometry (nPCR-LC-MS). This method showed a greater selectivity than the current allele-specific quantitative PCR (AS-qPCR) for tracking new VOCs and allowed the detection of multiple signature mutations in a single measurement. By measuring the Omicron variant in wastewaters across nine Ontario wastewater treatment plants serving a population of over three million, the nPCR-LC-MS method demonstrated a better quantification accuracy than next-generation sequencing (NGS), particularly at the early stage of community spreading of Omicron. This work addresses a major challenge for current SARS-CoV-2 wastewater surveillance by rapidly and accurately measuring VOCs in wastewaters for early warning.

17.
PLoS One ; 16(12): e0259877, 2021.
Article in English | MEDLINE | ID: mdl-34941890

ABSTRACT

The shape of phylogenetic trees can be used to gain evolutionary insights. A tree's shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. All our shape summaries, as well as functions to select the most discriminating ones for two sets of trees, are freely available as an R package at http://github.com/Leonardini/treeCentrality.
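
The two conventional imbalance measures named in this abstract can be computed with a short recursion. The sketch below assumes a rooted binary tree written as nested 2-tuples with strings as tips; it uses the standard definitions (Sackin: sum of tip depths; Colless: sum over internal nodes of the absolute difference in tip counts between the two subtrees) and is unrelated to the treeCentrality package's own interface.

```python
def sackin_index(tree, depth=0):
    """Sum of the depths (number of edges from the root) of all tips."""
    if not isinstance(tree, tuple):
        return depth  # a tip contributes its own depth
    return sum(sackin_index(child, depth + 1) for child in tree)


def colless_index(tree):
    """Sum over internal nodes of |tips in left subtree - tips in right subtree|."""

    def walk(node):
        # Returns (Colless sum within this subtree, number of tips below it).
        if not isinstance(node, tuple):
            return 0, 1
        if len(node) != 2:
            raise ValueError("Colless index is defined here for binary trees only")
        (left_sum, left_tips), (right_sum, right_tips) = walk(node[0]), walk(node[1])
        return left_sum + right_sum + abs(left_tips - right_tips), left_tips + right_tips

    return walk(tree)[0]
```

A balanced four-tip tree such as `(("A", "B"), ("C", "D"))` scores lower on both indices than the fully asymmetric "caterpillar" tree `((("A", "B"), "C"), "D")`.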


Subjects
Computational Biology/methods , Decision Trees , Virus Diseases/virology , Viruses/classification , Algorithms , Statistical Data Interpretation , Dengue/epidemiology , Dengue/virology , Dengue Virus/classification , HIV Infections/epidemiology , HIV Infections/virology , HIV-1/classification , Humans , Measles/epidemiology , Measles/virology , Measles Virus/classification , Phylogeny , Software , Virus Diseases/epidemiology
18.
Viruses ; 13(9), 2021 08 30.
Article in English | MEDLINE | ID: mdl-34578305

ABSTRACT

Despite the effectiveness of direct-acting antiviral agents in treating hepatitis C virus (HCV), cases of treatment failure have been associated with the emergence of resistance-associated substitutions. To better guide clinical decision-making, we developed and validated a near-whole-genome HCV genotype-independent next-generation sequencing strategy. HCV genotype 1-6 samples from direct-acting antiviral agent treatment-naïve and -treated HCV-infected individuals were included. Viral RNA was extracted using a NucliSens easyMAG and amplified using nested reverse transcription-polymerase chain reaction. Libraries were prepared using Nextera XT and sequenced on the Illumina MiSeq sequencing platform. Data were processed by an in-house pipeline (MiCall). Nucleotide consensus sequences were aligned to reference strain sequences for resistance-associated substitution identification and compared to NS3, NS5a, and NS5b sequence data obtained from a validated in-house assay optimized for HCV genotype 1. Sequencing success rates (defined as achieving >100-fold read coverage) approaching 90% were observed for most genotypes in samples with a viral load >5 log10 IU/mL. This genotype-independent sequencing method resulted in >99.8% nucleotide concordance with the genotype 1-optimized method, and 100% agreement in genotype assignment with paired line probe assay-based genotypes. The assay demonstrated high intra-run repeatability and inter-run reproducibility at detecting substitutions above 2% prevalence. This study highlights the performance of a freely available laboratory and bioinformatic approach for reliable HCV genotyping and resistance-associated substitution detection regardless of genotype.


Subjects
Genotype , Hepacivirus/genetics , Hepatitis C/virology , Viral RNA/genetics , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , Genotyping Techniques , Hepacivirus/classification , Hepatitis C/diagnosis , Humans , Reproducibility of Results , Sensitivity and Specificity , Viral Load
19.
Virus Evol ; 7(1): veaa106, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33614158

ABSTRACT

Many virus-encoded proteins have intrinsically disordered regions that lack a stable, folded three-dimensional structure. These disordered proteins often play important functional roles in virus replication, such as down-regulating host defense mechanisms. With the widespread availability of next-generation sequencing, the number of new virus genomes with predicted open reading frames is rapidly outpacing our capacity for directly characterizing protein structures through crystallography. Hence, computational methods for structural prediction play an important role. A large number of predictors focus on the problem of classifying residues into ordered and disordered regions, and these methods tend to be validated on a diverse training set of proteins from eukaryotes, prokaryotes, and viruses. In this study, we investigate whether some predictors outperform others in the context of virus proteins and compare our findings with data from non-viral proteins. We evaluate the prediction accuracy of 21 methods, many of which are only available as web applications, on a curated set of 126 proteins encoded by viruses. Furthermore, we apply a random forest classifier to these predictor outputs. Based on cross-validation experiments, this ensemble approach confers a substantial improvement in accuracy, e.g., a mean 36 per cent gain in Matthews correlation coefficient. Lastly, we apply the random forest predictor to severe acute respiratory syndrome coronavirus 2 ORF6, an accessory gene that encodes a short (61 AA) and moderately disordered protein that inhibits the host innate immune response. We show that disorder prediction methods perform differently for viral and non-viral proteins, and that an ensemble approach can yield more robust and accurate predictions.
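
The accuracy measure cited in this abstract, the Matthews correlation coefficient (MCC), can be computed directly from per-residue labels. The sketch below uses the standard MCC formula; the function names and the encoding of labels as truthy (disordered) or falsy (ordered) values are assumptions for illustration, and returning 0.0 when the denominator vanishes is a common convention rather than something stated in the abstract.

```python
import math


def confusion_counts(truth, predicted):
    """Count (tp, tn, fp, fn) for paired binary labels (truthy = disordered)."""
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (every residue misclassified) to 1 (perfect agreement), and stays informative when ordered and disordered residues are unevenly represented.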

20.
Virus Evol ; 7(1): veaa104, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33505711

ABSTRACT

Among people living with human immunodeficiency virus type 1 (HIV-1), the long-term persistence of a population of cells carrying transcriptionally silent integrated viral DNA (provirus) remains the primary barrier to developing an effective cure. Ongoing cell division via proliferation is generally considered to be the driving force behind the persistence of this latent HIV-1 reservoir. The contribution of this mechanism (clonal expansion) is supported by the observation that proviral sequences sampled from the reservoir are often identical. This outcome is quantified as the 'clonality' of the sample population, e.g. the fraction of provirus sequences observed more than once. However, clonality as a quantitative measure is inconsistently defined and its statistical properties are not well understood. In this Reflections article, we use mathematical and phylogenetic frameworks to formally examine the inherent problems of using clonality to characterize the dynamics and proviral composition of the reservoir. We describe how clonality is not adequate for this task due to the inherent complexity of how infected cells are 'labeled' by proviral sequences-the outcome of a sampling process from the evolutionary history of active viral replication before treatment-as well as variation in cell birth and death rates among lineages and over time. Lastly, we outline potential directions in statistical and phylogenetic research to address these issues.
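
One of the definitions of clonality mentioned in this abstract, the fraction of provirus sequences observed more than once, can be written out as follows. The function name is hypothetical, and sequences are compared as exact strings, which is a simplifying assumption.

```python
from collections import Counter


def clonality(sequences):
    """Fraction of sampled proviral sequences that are observed more than once.

    sequences: list of sequence strings (or any hashable identifiers);
    two samples count as identical only if they compare equal.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    counts = Counter(sequences)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(sequences)
```

For a sample of four sequences in which one variant appears twice, the function returns 0.5. Because the value depends on how many sequences are sampled, it illustrates the abstract's point that the statistical properties of this measure are not straightforward.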
