Results 1 - 20 of 36
1.
Proc Natl Acad Sci U S A ; 120(24): e2220294120, 2023 06 13.
Article in English | MEDLINE | ID: mdl-37276424

ABSTRACT

A hepatitis C virus (HCV) vaccine is urgently needed. Vaccine development has been hindered by HCV's genetic diversity, particularly within the immunodominant hypervariable region 1 (HVR1). Here, we developed a strategy to elicit broadly neutralizing antibodies to HVR1, which had previously been considered infeasible. We first applied a unique information theory-based measure of genetic distance to evaluate phenotypic relatedness between HVR1 variants. These distances were used to model the structure of HVR1's sequence space, which was found to have five major clusters. Variants from each cluster were used to immunize mice individually, and as a pentavalent mixture. Sera obtained following immunization neutralized every variant in a diverse HCVpp panel (n = 10), including those resistant to monovalent immunization, and at higher mean titers (1/ID50 = 435) than a glycoprotein E2 (1/ID50 = 205) vaccine. This synergistic immune response offers a unique approach to overcoming antigenic variability and may be applicable to other highly mutable viruses.


Subject(s)
Hepacivirus , Hepatitis C , Animals , Mice , Viral Envelope Proteins/genetics , Immunization , Immunity , Hepatitis C Antibodies , Antibodies, Neutralizing
2.
J Comput Biol ; 30(4): 420-431, 2023 04.
Article in English | MEDLINE | ID: mdl-36602524

ABSTRACT

Application of genetic distances to measure phenotypic relatedness is a challenging task, reflecting the complex relationship between genotype and phenotype. Accurate assessment of proximity among sequences with different phenotypic traits depends on how strongly the chosen distance is associated with structural and functional properties. In this study, we present a new distance measure Mutual Information and Entropy H (MIH) for categorical data such as nucleotide or amino acid sequences. MIH applies an information matrix (IM), which is calculated from the data and captures heterogeneity of individual positions as measured by Shannon entropy and coordinated substitutions among positions as measured by mutual information. In general, MIH assigns low weights to differences occurring at high entropy positions or at dependent positions. MIH distance was compared with other common distances on two experimental and two simulated data sets. MIH showed the best ability to distinguish cross-immunoreactive sequence pairs from non-cross-immunoreactive pairs of variants of the hepatitis C virus hypervariable region 1 (26,883 pairwise comparisons), and Major Histocompatibility Complex (MHC) binding peptides (n = 181) from non-binding peptides (n = 129). Analysis of 74 simulated RNA secondary structures also showed that the ratio between MIH distance of sequences from the same RNA structure and MIH of sequences from different structures is three orders of magnitude greater than for Hamming distances. These findings indicate that lower MIH between two sequences is associated with greater probability of the sequences to belong to the same phenotype. Examination of rule-based phenotypes generated in silico showed that (1) MIH is strongly associated with phenotypic differences, (2) IM of sequences under selection is very different from IM generated under random scenarios, and (3) IM is robust to sampling. 
In conclusion, MIH strongly approximates structural/functional distances and should have important applications to a wide range of biological problems, including evolution, artificial selection of biological functions and structures, and measuring phenotypic similarity.
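The two quantities that the abstract says the information matrix is built from, per-position Shannon entropy and mutual information between positions, can be illustrated with a minimal sketch. This is not the MIH distance itself, whose formula is not given in the abstract; the toy alignment and all function names below are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column(seqs, i):
    """Return the symbols found at alignment position i."""
    return [s[i] for s in seqs]


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of a list of categorical symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between two aligned columns: H(X) + H(Y) - H(X, Y)."""
    joint = list(zip(xs, ys))
    return shannon_entropy(xs) + shannon_entropy(ys) - shannon_entropy(joint)


# Hypothetical toy alignment: positions 0 and 1 co-vary, position 2 is conserved,
# position 3 varies independently of position 0.
seqs = ["ACGT", "ACGA", "TGGT", "TGGA"]
entropies = [shannon_entropy(column(seqs, i)) for i in range(4)]
mi_01 = mutual_information(column(seqs, 0), column(seqs, 1))
mi_03 = mutual_information(column(seqs, 0), column(seqs, 3))
print(entropies, mi_01, mi_03)
```

In this toy data the conserved position has zero entropy, the co-varying pair has high mutual information, and the independent pair has none, which is the kind of signal the abstract describes as entering the weights.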


Subject(s)
Peptides , RNA , Amino Acid Sequence , Phenotype
3.
Pathogens ; 11(5)2022 Apr 28.
Article in English | MEDLINE | ID: mdl-35631041

ABSTRACT

The Plasmodium falciparum protein VAR2CSA allows infected erythrocytes to accumulate within the placenta, inducing pathology and poor birth outcomes. Multiple exposures to placental malaria (PM) induce partial immunity against VAR2CSA, making it a promising vaccine candidate. However, the extent to which VAR2CSA genetic diversity contributes to immune evasion and virulence remains poorly understood. The deep sequencing of the var2csa DBL3X domain in placental blood from forty-nine primigravid and multigravid women living in malaria-endemic western Kenya revealed numerous unique sequences within individuals in association with chronic PM but not gravidity. Additional analysis unveiled four distinct sequence types that were variably present in mixed proportions amongst the study population. An analysis of the abundance of each of these sequence types revealed that one was inversely related to infant gestational age, another was inversely related to placental parasitemia, and a third was associated with chronic PM. The categorization of women according to the type to which their dominant sequence belonged resulted in the segregation of types as a function of gravidity: two types predominated in multigravidae whereas the other two predominated in primigravidae. The univariate logistic regression analysis of sequence type dominance further revealed that gravidity, maternal age, placental parasitemia, and hemozoin burden (within maternal leukocytes), reported a lack of antimalarial drug use, and infant gestational age and birth weight influenced the odds of membership in one or more of these sequence predominance groups. Cumulatively, these results show that unique var2csa sequences differentially appear in women with different PM exposure histories and segregate to types independently associated with maternal factors, infection parameters, and birth outcomes. 
The association of some var2csa sequence types with indicators of pathogenesis should motivate vaccine efforts to further identify and target VAR2CSA epitopes associated with maternal morbidity and poor birth outcomes.

4.
BMC Bioinformatics ; 23(1): 62, 2022 Feb 08.
Article in English | MEDLINE | ID: mdl-35135469

ABSTRACT

BACKGROUND: Investigation of outbreaks to identify the primary case is crucial for the interruption and prevention of transmission of infectious diseases. These individuals may have a higher risk of participating in near future transmission events when compared to the other patients in the outbreak, so directing more transmission prevention resources towards these individuals is a priority. Although the genetic characterization of intra-host viral populations can aid the identification of transmission clusters, it is not trivial to determine the directionality of transmissions during outbreaks, owing to complexity of viral evolution. Here, we present a new computational framework, PYCIVO: primary case inference in viral outbreaks. This framework expands upon our earlier work in development of QUENTIN, which builds a probabilistic disease transmission tree based on simulation of evolution of intra-host hepatitis C virus (HCV) variants between cases involved in direct transmission during an outbreak. PYCIVO improves upon QUENTIN by also adding a custom heterogeneity index and identifying the scenario when the primary case may have not been sampled. RESULTS: These approaches were validated using a set of 105 sequence samples from 11 distinct HCV transmission clusters identified during outbreak investigations, in which the primary case was epidemiologically verified. Both models can detect the correct primary case in 9 out of 11 transmission clusters (81.8%). However, while QUENTIN issues erroneous predictions on the remaining 2 transmission clusters, PYCIVO issues a null output for these clusters, giving it an effective prediction accuracy of 100%. To further evaluate accuracy of the inference, we created 10 modified transmission clusters in which the primary case had been removed. In this scenario, PYCIVO was able to correctly identify that there was no primary case in 8/10 (80%) of these modified clusters. 
This model was validated with HCV; however, this approach may be applicable to other microbial pathogens. CONCLUSIONS: PYCIVO improves upon QUENTIN by also implementing a custom heterogeneity index which empowers PYCIVO to make the important 'No primary case' prediction. One or more samples, possibly including the primary case, may not have been sampled, and this designation is meant to account for these scenarios.


Subject(s)
Communicable Diseases , Hepatitis C , Computational Biology , Disease Outbreaks , Hepacivirus/genetics , Hepatitis C/epidemiology , Humans , Phylogeny
5.
BMC Bioinformatics ; 21(Suppl 18): 482, 2020 Dec 30.
Article in English | MEDLINE | ID: mdl-33375937

ABSTRACT

BACKGROUND: In molecular epidemiology, comparison of intra-host viral variants among infected persons is frequently used for tracing transmissions in human populations and detecting viral infection outbreaks. Application of Ultra-Deep Sequencing (UDS) immensely increases the sensitivity of transmission detection but brings considerable computational challenges when comparing all pairs of sequences. We developed a new population comparison method based on convex hulls in Hamming space. We applied this method to a large set of UDS samples obtained from unrelated cases infected with hepatitis C virus (HCV) and compared its performance with three previously published methods. RESULTS: The convex hull in Hamming space is a data structure that provides information on: (1) average Hamming distance within the set; (2) average Hamming distance between two sets; (3) closeness centrality of each sequence; and (4) lower and upper bound of all the pairwise distances among the members of two sets. This filtering strategy rapidly and correctly removes 96.2% of all pairwise HCV sample comparisons, outperforming all previous methods. The convex hull distance (CHD) algorithm showed variable performance depending on sequence heterogeneity of the studied populations in real and simulated datasets, suggesting the possibility of using clustering methods to improve the performance. To address this issue, we developed a new clustering algorithm, k-hulls, that reduces heterogeneity of the convex hull. This efficient algorithm is an extension of the k-means algorithm and can be used with any type of categorical data. It is 6.8-times more accurate than k-mode, a previously developed clustering algorithm for categorical data. 
CONCLUSIONS: CHD is a fast and efficient filtering strategy for massively reducing the computational burden of pairwise comparison among large samples of sequences, and thus, aiding the calculation of transmission links among infected individuals using threshold-based methods. In addition, the convex hull efficiently obtains important summary metrics for intra-host viral populations.
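One way to see why a compact per-set summary can replace all-pairs comparison is the following sketch. It is not the authors' convex hull data structure; it only assumes that each sample is summarized by per-position symbol frequencies, and shows that the average Hamming distance between two sets can then be obtained without enumerating sequence pairs. Function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_between_bruteforce(set_a, set_b):
    """Average Hamming distance over every pair (a, b), computed directly."""
    pairs = list(product(set_a, set_b))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def position_profile(seqs):
    """Per-position symbol frequencies for a set of aligned sequences."""
    n = len(seqs)
    return [{sym: c / n for sym, c in Counter(col).items()} for col in zip(*seqs)]


def mean_between_profiles(prof_a, prof_b):
    """Average between-set Hamming distance computed from the two profiles only.

    At each position, the probability that a random pair matches is
    sum_c p_A(c) * p_B(c); the expected mismatch is one minus that.
    """
    total = 0.0
    for freq_a, freq_b in zip(prof_a, prof_b):
        match = sum(p * freq_b.get(sym, 0.0) for sym, p in freq_a.items())
        total += 1.0 - match
    return total


# Hypothetical toy samples.
sample_a = ["ACGT", "ACGA"]
sample_b = ["TCGT", "ACCT", "ACGT"]
direct = mean_between_bruteforce(sample_a, sample_b)
summary = mean_between_profiles(position_profile(sample_a), position_profile(sample_b))
print(direct, summary)
```

The profile-based computation scales with sequence length and alphabet size rather than with the product of the two sample sizes, which is the kind of saving a summary structure offers for large UDS samples.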


Subject(s)
Algorithms , Genomics/methods , Cluster Analysis , Hepacivirus/genetics , Humans
6.
PLoS One ; 15(12): e0243622, 2020.
Article in English | MEDLINE | ID: mdl-33284864

ABSTRACT

Persons who inject drugs (PWID) are at increased risk for overdose death (ODD), infections with HIV, hepatitis B (HBV) and hepatitis C virus (HCV), and noninfectious health conditions. Spatiotemporal identification of PWID communities is essential for developing efficient and cost-effective public health interventions for reducing morbidity and mortality associated with injection-drug use (IDU). Reported ODDs are a strong indicator of the extent of IDU in different geographic regions. However, ODD quantification can take time, with delays in ODD reporting occurring due to a range of factors including death investigation and drug testing. This delayed ODD reporting may affect efficient early interventions for infectious diseases. We present a novel model, Dynamic Overdose Vulnerability Estimator (DOVE), for assessment and spatiotemporal mapping of ODDs in different U.S. jurisdictions. Using Google® Web-search volumes (i.e., the fraction of all searches that include certain words), we identified a strong association between the reported ODD rates and drug-related search terms for 2004-2017. A machine learning model (Extremely Random Forest) was developed to produce yearly ODD estimates at state and county levels, as well as monthly estimates at state level. Regarding the total number of ODDs per year, DOVE's error was only 3.52% (Median Absolute Error, MAE) in the United States for 2005-2017. DOVE estimated 66,463 ODDs out of the reported 70,237 (94.48%) during 2017. For that year, the MAE of the individual ODD rates was 4.43%, 7.34%, and 12.75% among yearly estimates for states, yearly estimates for counties, and monthly estimates for states, respectively. These results indicate suitability of the DOVE ODD estimates for dynamic IDU assessment in most states, which may alert for possible increased morbidity and mortality associated with IDU. ODD estimates produced by DOVE offer an opportunity for a spatiotemporal ODD mapping. 
Timely identification of potential mortality trends among PWID might assist in developing efficient ODD prevention and HBV, HCV, and HIV infection elimination programs by targeting public health interventions to the most vulnerable PWID communities.


Subject(s)
Drug Overdose/epidemiology , Internet , Machine Learning , Drug Overdose/mortality , Humans , Public Health , Risk Factors , Search Engine , Substance Abuse, Intravenous/epidemiology , Substance Abuse, Intravenous/mortality , United States/epidemiology
8.
BMC Med Genomics ; 12(Suppl 4): 74, 2019 06 06.
Article in English | MEDLINE | ID: mdl-31167647

ABSTRACT

BACKGROUND: Ultra-Deep Sequencing (UDS) enabled identification of specific changes in the human genome occurring in malignant tumors, with current approaches calling for the detection of specific mutations associated with certain cancers. However, such associations are frequently idiosyncratic and cannot be generalized for diagnostics. Mitochondrial DNA (mtDNA) has been shown to be functionally associated with several cancer types. Here, we study the association of intra-host mtDNA diversity with Hepatocellular Carcinoma (HCC). RESULTS: UDS mtDNA exome data from blood of patients with HCC (n = 293) and non-cancer controls (NC, n = 391) were used to: (i) measure the genetic heterogeneity of nucleotide sites from the entire population of intra-host mtDNA variants rather than to detect specific mutations, and (ii) apply machine learning algorithms to develop a classifier for HCC detection. Average total entropy of HCC mtDNA is 1.24-times lower than that of NC mtDNA (p = 2.84E-47). Among all polymorphic sites, 2.09% had a significantly different mean entropy between HCC and NC, with 0.32% of the HCC mtDNA sites having greater (p < 0.05) and 1.77% of the sites having lower mean entropy (p < 0.05) as compared to NC. The entropy profile of each sample was used to further explore the association between mtDNA heterogeneity and HCC by means of a Random Forest (RF) classifier. The RF-classifier separated 232 HCC and 232 NC patients with accuracy of up to 99.78% and average accuracy of 92.23% in the 10-fold cross-validation. The classifier accurately separated 93.08% of HCC (n = 61) and NC (n = 159) patients in a validation dataset that was not used for the RF parameter optimization. CONCLUSIONS: Polymorphic sites contributing most to the mtDNA association with HCC are scattered along the mitochondrial genome, affecting all mitochondrial genes. 
The findings suggest that application of heterogeneity profiles of intra-host mtDNA variants from blood may help overcome barriers associated with the complex association of specific mutations with cancer, enabling the development of accurate, rapid, inexpensive and minimally invasive diagnostic detection of cancer.


Subject(s)
Carcinoma, Hepatocellular/blood , Carcinoma, Hepatocellular/genetics , DNA, Mitochondrial/blood , Entropy , Liver Neoplasms/blood , Liver Neoplasms/genetics , Carcinoma, Hepatocellular/diagnosis , Carcinoma, Hepatocellular/pathology , Genomics , Humans , Liver Neoplasms/diagnosis , Liver Neoplasms/pathology , Neoplasm Grading
9.
BMC Bioinformatics ; 19(Suppl 11): 360, 2018 Oct 22.
Article in English | MEDLINE | ID: mdl-30343669

ABSTRACT

BACKGROUND: Many biological analysis tasks require extraction of families of genetically similar sequences from large datasets produced by Next-generation Sequencing (NGS). Such tasks include detection of viral transmissions by analysis of all genetically close pairs of sequences from viral datasets sampled from infected individuals, or the study of the evolution of viruses or immune repertoires by analysis of networks of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naïve algorithms to extract such sequence families are impractical in light of the massive size of modern NGS datasets. RESULTS: In this paper, we present a fast and scalable k-mer-based framework to perform such sequence similarity queries efficiently, which specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. It shows better filtering quality and time performance compared with other tools. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj CONCLUSION: The proposed tool allows for efficient detection of genetic relatedness between genomic samples produced by deep sequencing of heterogeneous populations. It should be especially useful for analysis of relatedness of genomes of viruses with unevenly distributed variable genomic regions, such as HIV and HCV. In the future, we envision that, besides applications in molecular epidemiology, the tool can also be adapted to immunosequencing and metagenomics data.


Subject(s)
Algorithms , Genetic Variation , Genome , Phylogeny , Base Sequence , Entropy , High-Throughput Nucleotide Sequencing , Humans , Metagenomics , Reproducibility of Results , Time Factors
10.
BMC Bioinformatics ; 19(Suppl 11): 358, 2018 Oct 22.
Article in English | MEDLINE | ID: mdl-30343674

ABSTRACT

BACKGROUND: Molecular surveillance and outbreak investigation are important for elimination of hepatitis C virus (HCV) infection in the United States. A web-based system, Global Hepatitis Outbreak and Surveillance Technology (GHOST), has been developed using Illumina MiSeq-based amplicon sequence data derived from the HCV E1/E2-junction genomic region to enable public health institutions to conduct cost-effective and accurate molecular surveillance, outbreak detection and strain characterization. However, as there are many factors that could impact input data quality to which the GHOST system is not completely immune, accuracy of epidemiological inferences generated by GHOST may be affected. Here, we analyze the data submitted to the GHOST system during its pilot phase to assess the nature of the data and to identify common quality concerns that can be detected and corrected automatically. RESULTS: The GHOST quality control filters were individually examined, and quality failure rates were measured for all samples, including negative controls. New filters were developed and introduced to detect primer dimers, loss of specimen-specific product, or short products. The genotyping tool was adjusted to improve the accuracy of subtype calls. The identification of "chordless" cycles in a transmission network from data generated with known laboratory-based quality concerns allowed for further improvement of transmission detection by GHOST in surveillance settings. Parameters derived to detect actionable common quality control anomalies were incorporated into the automatic quality control module that rejects data depending on the magnitude of a quality problem, and warns and guides users in performing correctional actions. The guiding responses generated by the system are tailored to the GHOST laboratory protocol. 
CONCLUSIONS: Several new quality control problems were identified in MiSeq data submitted to GHOST and used to improve protection of the system from erroneous data and users from erroneous inferences. The GHOST system was upgraded to include identification of causes of erroneous data and recommendation of corrective actions to laboratory users.


Subject(s)
Disease Outbreaks/prevention & control , Population Surveillance/methods , Automation , Genotyping Techniques , Hepacivirus/physiology , Hepatitis C/epidemiology , Hepatitis C/virology , Humans , Quality Control , Reference Standards , United States
11.
Infect Genet Evol ; 63: 204-215, 2018 09.
Article in English | MEDLINE | ID: mdl-29860098

ABSTRACT

Hepatitis C virus (HCV) infection is a global public health problem. The implementation of public health interventions (PHI) to control HCV infection could effectively interrupt HCV transmission. PHI targeting high-risk populations, e.g., people who inject drugs (PWID), are most efficient but there is a lack of tools for prioritizing individuals within a high-risk community. Here, we present Intelligent Network DisRuption Analysis (INDRA), a targeted strategy for efficient interruption of hepatitis C transmissions. Using a large HCV transmission network among PWID in Indiana as an example, we compare effectiveness of random and targeted strategies in reducing the rate of HCV transmission in two settings: (1) long-established and (2) rapidly spreading infections (outbreak). Identification of high centrality for the network nodes co-infected with HIV or > 1 HCV subtype indicates that the network structure properly represents the underlying contacts among PWID relevant to the transmission of these infections. Changes in the network's global efficiency (GE) were used as a measure of the PHI effects. In setting 1, simulation experiments showed that a 50% GE reduction can be achieved by removing 11.2 times fewer nodes using targeted vs random strategies. A greater effect of targeted strategies on GE was consistently observed when networks were simulated: (1) with a varying degree of errors in node sampling and link assignment, and (2) at different levels of transmission reduction at affected nodes. In simulations considering a 10% removal of infected nodes, targeted strategies were ~2.8 times more effective than random in reducing incidence. Peer-education intervention (PEI) was modeled as a probabilistic distribution of actionable knowledge of safe injection practices from the affected node to adjacent nodes in the network. Addition of PEI to the models resulted in a 2-3 times greater reduction in incidence than from direct PHI alone. 
In setting 2, however, random direct PHI were ~3.2 times more effective in reducing incidence at the simulated conditions. Nevertheless, addition of PEI resulted in a ~1.7-fold greater efficiency of targeted PHI. In conclusion, targeted PHI facilitated by INDRA outperforms random strategies in decreasing circulation of long-established infections. Network-based PEI may amplify effects of PHI on incidence reduction in both settings.
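The comparison of targeted and random node removal by global efficiency can be sketched on a toy graph. This is a minimal illustration and not INDRA: it assumes the standard definition of global efficiency (the mean of inverse shortest-path lengths over all node pairs), uses node degree as a simple stand-in for whatever centrality the authors used, and the star-shaped network is hypothetical.

```python
import random
from collections import deque


def global_efficiency(adj):
    """Mean of 1/d(u, v) over all ordered node pairs; unreachable pairs contribute 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for src in adj:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


def remove_nodes(adj, removed):
    """Return a copy of the graph without the given nodes and their links."""
    removed = set(removed)
    return {u: {v for v in nbrs if v not in removed}
            for u, nbrs in adj.items() if u not in removed}


def targeted_removal(adj, k):
    """Remove the k highest-degree nodes (assumption: degree as the targeting criterion)."""
    ranked = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    return remove_nodes(adj, ranked[:k])


def random_removal(adj, k, seed=0):
    """Remove k nodes chosen uniformly at random."""
    rng = random.Random(seed)
    return remove_nodes(adj, rng.sample(sorted(adj), k))


# Hypothetical toy network: one hub (node 0) linked to five peripheral nodes.
star = {0: set(range(1, 6)), **{i: {0} for i in range(1, 6)}}
ge_before = global_efficiency(star)
ge_targeted = global_efficiency(targeted_removal(star, 1))
ge_random = global_efficiency(random_removal(star, 1))
print(ge_before, ge_targeted, ge_random)
```

In this sketch, efficiency after removal is normalized over the remaining nodes; a study comparing strategies would need to fix that convention and average the random strategy over many seeds.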


Subject(s)
HIV Infections/prevention & control , Hepacivirus/genetics , Hepatitis C/prevention & control , Neural Networks, Computer , Substance Abuse, Intravenous/epidemiology , Universal Precautions/methods , Coinfection , Contact Tracing/statistics & numerical data , HIV/isolation & purification , HIV Infections/epidemiology , HIV Infections/transmission , HIV Infections/virology , Hepacivirus/classification , Hepacivirus/isolation & purification , Hepatitis C/epidemiology , Hepatitis C/transmission , Hepatitis C/virology , Humans , Incidence , Indiana/epidemiology , Substance Abuse, Intravenous/virology
12.
BMC Genomics ; 18(Suppl 10): 881, 2017 Dec 06.
Article in English | MEDLINE | ID: mdl-29244001

ABSTRACT

BACKGROUND: Intra-host hepatitis C virus (HCV) populations are genetically heterogeneous and organized in subpopulations. With the exception of blood transfusions, transmission of HCV occurs via a small number of genetic variants, the effect of which is frequently described as a bottleneck. Stochasticity of transmission associated with the bottleneck is usually used to explain genetic differences among HCV populations identified in the source and recipient cases, which may be further exacerbated by intra-host HCV evolution and differential biological capacity of HCV variants to successfully establish a population in a new host. RESULTS: Transmissibility was formulated as a property that can be measured from experimental Ultra-Deep Sequencing (UDS) data. The UDS data were obtained from one large hepatitis C outbreak involving an epidemiologically defined source and 18 recipient cases. k-Step networks of HCV variants were constructed and used to identify a potential association between transmissibility and network centrality of individual HCV variants from the source. An additional dataset obtained from nine other HCV outbreaks with known directionality of transmission was used for validation. Transmissibility was not found to be dependent on high frequency of variants in the source, supporting the earlier observations of transmission of minority variants. Among all tested measures of centrality, the highest correlation of transmissibility was found with Hamming centrality (r = 0.720; p = 1.57 E-71). Correlation between genetic distances and differences in transmissibility among HCV variants from the source was found to be 0.3276 (Mantel Test, p = 9.99 E-5), indicating association between genetic proximity and transmissibility. A strong correlation ranging from 0.565-0.947 was observed between Hamming centrality and transmissibility in 7 of the 9 additional transmission clusters (p < 0.05). CONCLUSIONS: Transmission is not an exclusively stochastic process. 
Transmissibility, as formally measured in this study, is associated with certain biological properties that also define location of variants in the genetic space occupied by the HCV strain from the source. The measure may also be applicable to other highly heterogeneous viruses. Besides improving accuracy of outbreak investigations, this finding helps with the understanding of molecular mechanisms contributing to establishment of chronic HCV infection.


Subject(s)
Genetic Variation , Hepacivirus/genetics , Hepacivirus/physiology , Disease Outbreaks , Evolution, Molecular , Genotype , Hepatitis C/epidemiology , Hepatitis C/transmission , High-Throughput Nucleotide Sequencing , Humans
13.
BMC Genomics ; 18(Suppl 10): 916, 2017 Dec 06.
Article in English | MEDLINE | ID: mdl-29244005

ABSTRACT

BACKGROUND: Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Effective HCV outbreak investigation requires comprehensive surveillance and robust case investigation. We previously developed and validated a methodology for the rapid and cost-effective identification of HCV transmission clusters. Global Hepatitis Outbreak and Surveillance Technology (GHOST) is a cloud-based system enabling users, regardless of computational expertise, to analyze and visualize transmission clusters in an independent, accurate and reproducible way. RESULTS: We present and explore performance of several GHOST implemented algorithms using next-generation sequencing data experimentally obtained from hypervariable region 1 of genetically related and unrelated HCV strains. GHOST processes data from an entire MiSeq run in approximately 3 h. A panel of seven specimens was used for preparation of six repeats of MiSeq libraries. Testing sequence data from these libraries by GHOST showed a consistent transmission linkage detection, testifying to high reproducibility of the system. Lack of linkage among genetically unrelated HCV strains and constant detection of genetic linkage between HCV strains from known transmission pairs and from follow-up specimens at different levels of MiSeq-read sampling indicate high specificity and sensitivity of GHOST in accurate detection of HCV transmission. CONCLUSIONS: GHOST enables automatic extraction of timely and relevant public health information suitable for guiding effective intervention measures. It is designed as a virtual diagnostic system intended for use in molecular surveillance and outbreak investigations rather than in research. 
The system produces accurate and reproducible information on HCV transmission clusters for all users, irrespective of their level of bioinformatics expertise. Improvement in molecular detection capacity will contribute to increasing the rate of transmission detection, thus providing opportunity for rapid, accurate and effective response to outbreaks of hepatitis C. Although GHOST was originally developed for hepatitis C surveillance, its modular structure is readily applicable to other infectious diseases. Worldwide availability of GHOST for the detection of HCV transmissions will foster deeper involvement of public health researchers and practitioners in hepatitis C outbreak investigation.


Subject(s)
Cloud Computing , Computational Biology/methods , Disease Outbreaks/statistics & numerical data , Epidemiological Monitoring , Hepatitis C/epidemiology , Internationality , Algorithms , Humans , Software , User-Computer Interface
14.
BMC Genomics ; 18(Suppl 4): 372, 2017 05 24.
Article in English | MEDLINE | ID: mdl-28589864

ABSTRACT

BACKGROUND: Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Molecular analysis has been frequently used in the study of HCV outbreaks and transmission chains, helping identify a cluster of sequences as linked by transmission if their genetic distances are below a previously defined threshold. However, HCV exists as a population of numerous variants in each infected individual and it has been observed that minority variants in the source are often the ones responsible for transmission, a situation that precludes the use of a single sequence per individual because many such transmissions would be missed. The use of Next-Generation Sequencing immensely increases the sensitivity of transmission detection but brings a considerable computational challenge because all sequences need to be compared among all pairs of samples. METHODS: We developed a three-step strategy that filters pairs of samples according to different criteria: (i) a k-mer Bloom filter, (ii) a Levenshtein filter and (iii) a filter of identical sequences. We applied these three filters on a set of samples that cover the spectrum of genetic relationships among HCV cases, from being part of the same transmission cluster, to belonging to different subtypes. RESULTS: Our three-step filtering strategy rapidly removes 85.1% of all the pairwise sample comparisons and 91.0% of all pairwise sequence comparisons, accurately establishing which pairs of HCV samples are below the relatedness threshold. CONCLUSIONS: We present a fast and efficient three-step filtering strategy that removes most sequence comparisons and accurately establishes transmission links of any threshold-based method. 
This highly efficient workflow will allow a faster response and molecular detection capacity, improving the rate of detection of viral transmissions with molecular data.
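The general idea of cheap filters placed in front of an expensive distance calculation can be sketched as follows. This is a simplified analogue rather than the published pipeline: plain Python sets stand in for the k-mer Bloom filter, the order of the checks is chosen for readability, and the parameters `k`, `min_shared` and `max_dist` are hypothetical.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion from a
                            curr[j - 1] + 1,            # insertion into a
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def samples_related(sample_a, sample_b, max_dist, k=4, min_shared=1):
    """Return True if any sequence pair across the two samples is within max_dist edits."""
    # Cheapest check first: an identical sequence in both samples settles the question.
    if set(sample_a) & set(sample_b):
        return True
    kmers_b = [(b, kmers(b, k)) for b in sample_b]
    for a in sample_a:
        kmers_a = kmers(a, k)
        for b, kb in kmers_b:
            # k-mer prefilter: skip pairs sharing too few k-mers (heuristic cut-off).
            if len(kmers_a & kb) < min_shared:
                continue
            if levenshtein(a, b) <= max_dist:
                return True
    return False


# Hypothetical toy samples.
close = samples_related(["ACGTACGT"], ["ACGTACGA"], max_dist=1)
far = samples_related(["AAAAAAAA"], ["CCCCCCCC"], max_dist=1)
print(close, far)
```

Because `min_shared` here is a heuristic, the prefilter can in principle discard a pair that is actually within `max_dist`; a production filter would derive the cut-off from the distance threshold so that no true link is lost.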


Subject(s)
Hepacivirus/genetics , Hepacivirus/physiology , High-Throughput Nucleotide Sequencing , Algorithms , Statistics as Topic
15.
J Gen Virol ; 98(5): 1048-1057, 2017 May.
Article in English | MEDLINE | ID: mdl-28537543

ABSTRACT

Despite the significant public health problems associated with hepatitis B virus (HBV) in sub-Saharan Africa, many countries in this region do not have systematic HBV surveillance or genetic information on HBV circulating locally. Here, we report on the genetic characterization of 772 HBV strains from Tanzania. Phylogenetic analysis of the S-gene sequences showed prevalence of HBV genotype A (HBV/A, n=671, 86.9 %), followed by genotypes D (HBV/D, n=95, 12.3 %) and E (HBV/E, n=6, 0.8 %). All HBV/A sequences were further classified into subtype A1, while the HBV/D sequences were assigned to a new cluster. Among the Tanzanian sequences, 84 % of HBV/A1 and 94 % of HBV/D were unique. The Tanzanian and global HBV/A1 sequences were compared and were completely intermixed in the phylogenetic tree, with the Tanzanian sequences frequently generating long terminal branches, indicating a long history of HBV/A1 infections in the country. The time to the most recent common ancestor was estimated to be 188 years ago [95 % highest posterior density (HPD): 132 to 265 years] for HBV/A1 and 127 years ago (95 % HPD: 79 to 192 years) for HBV/D. The Bayesian skyline plot showed that the number of transmissions 'exploded' exponentially between 1960-1970 for HBV/A1 and 1970-1990 for HBV/D, with the effective population of HBV/A1 having expanded twice as much as that of HBV/D. The data suggest that Tanzania is at least a part of the geographic origin of the HBV/A1 subtype. A recent increase in the transmission rate and significant HBV genetic diversity should be taken into consideration when devising public health interventions to control HBV infections in Tanzania.

16.
J Infect Dis ; 213(6): 957-65, 2016 Mar 15.
Article in English | MEDLINE | ID: mdl-26582955

ABSTRACT

Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections are associated with unsafe injection practices, drug diversion, and other exposures to blood and are difficult to detect and investigate. Here, we developed and validated a simple approach for molecular detection of HCV transmissions in outbreak settings. We obtained sequences from the HCV hypervariable region 1 (HVR1), using end-point limiting-dilution (EPLD) technique, from 127 cases involved in 32 epidemiologically defined HCV outbreaks and 193 individuals with unrelated HCV strains. We compared several types of genetic distances and calculated a threshold, using minimal Hamming distances, that identifies transmission clusters in all tested outbreaks with 100% accuracy. The approach was also validated on sequences obtained using next-generation sequencing from HCV strains recovered from 239 individuals, and findings showed the same accuracy as that for EPLD. On average, the nucleotide diversity of the intrahost population was 6.2 times greater in the source case than in any incident case, allowing the correct detection of transmission direction in 8 outbreaks for which source cases were known. A simple and accurate distance-based approach developed here for detecting HCV transmissions streamlines molecular investigation of outbreaks, thus improving the public health capacity for rapid and effective control of hepatitis C.
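
The distance-based linkage test described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each case is represented by a list of aligned HVR1 sequences of equal length, and the function names, the use of a per-site fraction, and the `threshold` parameter are all hypothetical. The abstract does not state the threshold value, so it is left for the caller to supply.

```python
from itertools import product


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def min_hamming(sample_a: list[str], sample_b: list[str]) -> float:
    """Smallest Hamming distance between any variant of one case and any variant of the other."""
    return min(hamming_fraction(a, b) for a, b in product(sample_a, sample_b))


def is_linked(sample_a: list[str], sample_b: list[str], threshold: float) -> bool:
    """Assumed decision rule: two cases are linked if their minimal distance
    does not exceed the caller-supplied threshold (a hypothetical parameter)."""
    return min_hamming(sample_a, sample_b) <= threshold
```

Using the minimum over all cross-sample pairs means that a single shared or near-identical variant is enough to link two cases, even when the rest of their intrahost populations have diverged.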


Subject(s)
Disease Outbreaks , Genetic Linkage , Hepacivirus/genetics , Hepacivirus/isolation & purification , Hepatitis C/transmission , Hepatitis C/virology , Cluster Analysis , Genetic Variation , Genotype , Hepatitis C/epidemiology , Humans , Reproducibility of Results
17.
Cell Mol Gastroenterol Hepatol ; 2(5): 676-684, 2016 Sep.
Article in English | MEDLINE | ID: mdl-28174739

ABSTRACT

BACKGROUND & AIMS: The host genetic environment contributes significantly to the outcomes of hepatitis C virus (HCV) infection and therapy response, but little is known about any effects of HCV infection on the host beyond any changes related to adaptive immune responses. HCV persistence is associated strongly with mitochondrial dysfunction, with liver mitochondrial DNA (mtDNA) genetic diversity linked to disease progression. METHODS: We evaluated the genetic diversity of 2 mtDNA genomic regions (hypervariable segments 1 and 2) obtained from sera of 116 persons using next-generation sequencing. RESULTS: Results were as follows: (1) the average diversity among cases with seronegative acute HCV infection was 4.2 times higher than among uninfected controls; (2) the diversity level among cases with chronic HCV infection was 96.1 times higher than among uninfected controls; and (3) the diversity was 23.1 times higher among chronic than acute cases. In 2 patients who were followed up during combined interferon and ribavirin therapy, mtDNA nucleotide diversity decreased dramatically after the completion of therapy in both patients: by 100% in patient A after 54 days and by 70.51% in patient B after 76 days. CONCLUSIONS: HCV infection strongly affects mtDNA genetic diversity. A rapid decrease in mtDNA genetic diversity observed after therapy-induced HCV clearance suggests that the effect is reversible, emphasizing dynamic genetic relationships between HCV and mitochondria. The level of mtDNA nucleotide diversity can be used to discriminate recent from past infections, which should facilitate the detection of recent transmission events and thus help identify modes of transmission.
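
Nucleotide diversity, the quantity compared across groups in the abstract above, is commonly computed as the mean number of pairwise differences per site among the sequences in a sample. The sketch below assumes that definition, equal-length aligned sequences, and equal weighting of every sequence; the abstract does not specify the exact estimator used, and the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise per-site difference among aligned sequences (assumed definition)."""
    if len(seqs) < 2:
        return 0.0  # diversity is undefined for a single sequence; treated as zero here
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched positions over every unordered pair of sequences.
    total_diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total_diffs / (len(pairs) * length)
```

A fold difference between two groups, such as the ratios reported in the abstract, would then be the ratio of the two groups' average diversities, which is only defined when the reference group's diversity is nonzero.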

18.
PLoS One ; 10(12): e0145530, 2015.
Article in English | MEDLINE | ID: mdl-26683463

ABSTRACT

Globally, hepatitis C virus (HCV) infection is responsible for a large proportion of persons with liver disease, including cancer. The infection is highly prevalent in sub-Saharan Africa. West Africa was identified as a geographic origin of two HCV genotypes. However, little is known about the genetic composition of HCV populations in many countries of the region. Using conventional and next-generation sequencing (NGS), we identified and genetically characterized 65 HCV strains circulating among HCV-positive blood donors in Kumasi, Ghana. Phylogenetic analysis using consensus sequences derived from 3 genomic regions of the HCV genome, 5'-untranslated region, hypervariable region 1 (HVR1) and NS5B gene, consistently classified the HCV variants (n = 65) into genotype 1 (HCV-1, 15%) and genotype 2 (HCV-2, 85%). The Ghanaian and West African HCV-2 NS5B sequences were found completely intermixed in the phylogenetic tree, indicating a substantial genetic heterogeneity of HCV-2 in Ghana. Analysis of HVR1 sequences from intra-host HCV variants obtained by NGS showed that three donors were infected with >1 HCV strain, including infections with 2 genotypes. Two other donors shared an HCV strain, indicating HCV transmission between them. The HCV-2 strain sampled from one donor was replaced with another HCV-2 strain after only 2 months of observation, indicating rapid strain switching. Bayesian analysis estimated that the HCV-2 strains in Ghana have been expanding since the 16th century. The blood donors in Kumasi, Ghana, are infected with a very heterogeneous HCV population of HCV-1 and HCV-2, with HCV-2 being prevalent. The detection of three cases of co- or super-infections and transmission linkage between 2 cases suggests frequent opportunities for HCV exposure among the blood donors and is consistent with the reported high HCV prevalence. The conditions for effective HCV-2 transmission have existed for ~3-4 centuries, indicating a long epidemic history of HCV-2 in Ghana.


Subject(s)
Hepacivirus/genetics , Hepatitis C/virology , Adult , Epidemics , Evolution, Molecular , Genes, Viral , Genetic Variation , Genotype , Ghana/epidemiology , Hepatitis C/epidemiology , Hepatitis C/transmission , High-Throughput Nucleotide Sequencing , Humans , Male , Molecular Typing , Phylogeny , Sequence Analysis, DNA
20.
J Infect Dis ; 212(12): 1962-9, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26155829

ABSTRACT

BACKGROUND: Up to 30% of acute viral hepatitis has no known etiology. To determine the disease etiology in patients with acute hepatitis of unknown etiology (HUE), serum specimens were obtained from 38 patients residing in the United Kingdom and Vietnam and from 26 healthy US blood donors. All specimens tested negative for known viral infections causing hepatitis, using commercially available serological and nucleic acid assays. METHODS: Specimens were processed by sequence-independent complementary DNA amplification and next-generation sequencing (NGS). Sufficient material for individual NGS libraries was obtained from 12 HUE cases and 26 blood donors; the remaining HUE cases were sequenced as a pool. Read mapping was done by targeted and de novo assembly. RESULTS: Sequences from hepatitis B virus (HBV) were detected in 7 individuals with HUE (58.3%) and the pooled library, and hepatitis E virus (HEV) was detected in 2 individuals with HUE (16.7%) and the pooled library. Both HEV-positive cases were coinfected with HBV. HBV sequences belonged to genotypes A, D, or G, and HEV sequences belonged to genotype 3. No known hepatotropic viruses were detected in the tested normal human sera. CONCLUSIONS: NGS-based detection of HBV and HEV infections is more sensitive than using commercially available assays. HBV and HEV may be cryptically associated with HUE.


Subject(s)
Blood/virology , Diagnostic Tests, Routine/methods , Hepatitis B virus/isolation & purification , Hepatitis E virus/isolation & purification , Hepatitis, Viral, Human/diagnosis , Hepatitis, Viral, Human/etiology , Adult , Aged , Coinfection/virology , Female , Hepatitis B virus/genetics , Hepatitis E virus/genetics , Humans , Male , Middle Aged , Sensitivity and Specificity , Sequence Analysis, DNA , United Kingdom , United States , Vietnam , Young Adult