ABSTRACT
The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser, making SARS-CoV-2 the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.
Subject(s)
COVID-19/virology , Databases, Genetic , SARS-CoV-2/genetics , Web Browser , Coronaviridae/genetics , Genetic Variation , Genome, Viral , Humans , Molecular Sequence Annotation
ABSTRACT
Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception creating one of the world's most comprehensive genomic resources and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.
Subject(s)
Databases, Genetic , Genomics , Internet , Software , Animals , Computational Biology , Genome, Bacterial/genetics , Genome, Fungal/genetics , Genome, Plant/genetics , Plants/classification , Plants/genetics , Vertebrates/classification , Vertebrates/genetics
ABSTRACT
BACKGROUND: Next-generation sequencing (NGS) is gradually replacing Sanger sequencing (SS) as the primary method for HIV genotypic resistance testing. However, there are limited systematic data on comparability of these methods in a clinical setting for the presence of low-abundance drug resistance mutations (DRMs) and their dependency on the variant-calling thresholds. METHODS: To compare the HIV-DRMs detected by SS and NGS, we included participants enrolled in the Swiss HIV Cohort Study (SHCS) with SS and NGS sequences available with sample collection dates ≤7 days apart. We tested for the presence of HIV-DRMs and compared the agreement between SS and NGS at different variant-calling thresholds. RESULTS: We included 594 pairs of SS and NGS from 527 SHCS participants. Males accounted for 80.5% of the participants, 76.3% were ART naive at sample collection and 78.1% of the sequences were subtype B. Overall, we observed a good agreement (Cohen's kappa >0.80) for HIV-DRMs for variant-calling thresholds ≥5%. We observed an increase in low-abundance HIV-DRMs detected at lower thresholds [28/417 (6.7%) at 10%-25% to 293/812 (36.1%) at 1%-2% threshold]. However, such low-abundance HIV-DRMs were overrepresented in ART-naive participants and were in most cases not detected in previously sampled sequences suggesting high sequencing error for thresholds <3%. CONCLUSIONS: We found high concordance between SS and NGS but also a substantial number of low-abundance HIV-DRMs detected only by NGS at lower variant-calling thresholds. Our findings suggest that a substantial fraction of the low-abundance HIV-DRMs detected at thresholds <3% may represent sequencing errors and hence should not be overinterpreted in clinical practice.
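The agreement measure reported above can be illustrated with a minimal sketch. It assumes that each position is scored as a binary call (DRM present or absent) by both methods, and that an NGS call is made when the variant frequency reaches the chosen threshold. The function names and the example values are hypothetical and are not taken from the study.

```python
def call_at_threshold(frequencies, threshold):
    """Turn NGS variant frequencies into binary calls at a given threshold."""
    return [f >= threshold for f in frequencies]


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of binary calls."""
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of positions where both methods agree.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement from each method's marginal rate of positive calls.
    p_a = sum(calls_a) / n
    p_b = sum(calls_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both methods give the same constant call; kappa is undefined,
        # so report perfect agreement by convention (an assumption here).
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example: Sanger calls and NGS frequencies for 8 DRM positions.
sanger_calls = [True, True, False, False, True, False, False, False]
ngs_frequencies = [0.95, 0.40, 0.015, 0.0, 0.80, 0.0, 0.02, 0.0]

kappa_5pct = cohens_kappa(sanger_calls, call_at_threshold(ngs_frequencies, 0.05))
kappa_1pct = cohens_kappa(sanger_calls, call_at_threshold(ngs_frequencies, 0.01))
print(f"kappa at 5% threshold: {kappa_5pct:.3f}")
print(f"kappa at 1% threshold: {kappa_1pct:.3f}")
```

In this toy data, lowering the threshold adds two low-frequency calls seen only by NGS, which reduces kappa. This mirrors the qualitative pattern described in the abstract but says nothing about the study's actual values.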
Subject(s)
Anti-HIV Agents , HIV Infections , HIV Seropositivity , HIV-1 , Male , Humans , HIV Infections/drug therapy , Cohort Studies , Drug Resistance, Viral/genetics , Viral Load , HIV Seropositivity/drug therapy , Mutation , High-Throughput Nucleotide Sequencing/methods , Genotype , Anti-HIV Agents/therapeutic use
ABSTRACT
The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Genomics/methods , SARS-CoV-2/genetics , Vertebrates/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Humans , Internet , Molecular Sequence Annotation/methods , Pandemics , Vertebrates/classification
ABSTRACT
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), with data updates made available four times a year.
Subject(s)
Computational Biology/methods , Databases, Genetic , Epigenome , Molecular Sequence Annotation , Algorithms , Animals , Computer Graphics , Databases, Protein , Genetic Variation , Genome-Wide Association Study , Genomics , Histones/metabolism , Humans , Imaging, Three-Dimensional , Internet , Ligands , Search Engine , Software , Species Specificity , Transcriptome , User-Computer Interface , Web Browser
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.
Subject(s)
Computational Biology/methods , Databases, Genetic , Genetic Variation , Genome, Bacterial , Genome, Fungal , Genome, Plant , Algorithms , Animals , Caenorhabditis elegans/genetics , Genomics , Internet , Molecular Sequence Annotation , Phenotype , Plants/genetics , Reference Values , Software , User-Computer Interface
ABSTRACT
The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as through a RESTful web service, a Perl application programming interface and as data files for download.
Subject(s)
Databases, Genetic , Genome/genetics , Genomics , Vertebrates/genetics , Animals , Computational Biology/trends , Humans , Mice , Molecular Sequence Annotation , Software
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pbio.2001855.].
ABSTRACT
HIV-1 set-point viral load (the approximately stable value of viraemia in the first years of chronic infection) is a strong predictor of clinical outcome and is highly variable across infected individuals. To better understand HIV-1 pathogenesis and the evolution of the viral population, we must quantify the heritability of set-point viral load, which is the fraction of variation in this phenotype attributable to viral genetic variation. However, current estimates of heritability vary widely, from 6% to 59%. Here we used a dataset of 2,028 seroconverters infected between 1985 and 2013 from 5 European countries (Belgium, Switzerland, France, the Netherlands and the United Kingdom) and estimated the heritability of set-point viral load at 31% (CI 15%-43%). Specifically, heritability was measured using models of character evolution describing how viral load evolves on the phylogeny of whole-genome viral sequences. In contrast to previous studies, (i) we measured viral loads using standardized assays on a sample collected in a strict time window of 6 to 24 months after infection, from which the viral genome was also sequenced; (ii) we compared 2 models of character evolution, the classical "Brownian motion" model and another model ("Ornstein-Uhlenbeck") that includes stabilising selection on viral load; (iii) we controlled for covariates, including age and sex, which may inflate estimates of heritability; and (iv) we developed a goodness of fit test based on the correlation of viral loads in cherries of the phylogenetic tree, showing that both models of character evolution fit the data well. An overall heritability of 31% (CI 15%-43%) is consistent with other studies based on regression of viral load in donor-recipient pairs. Thus, about a third of variation in HIV-1 virulence is attributable to viral genetic variation.
Subject(s)
Genetic Variation , Genome, Viral , HIV Infections/microbiology , HIV Seropositivity/microbiology , HIV-1/genetics , Human Immunodeficiency Virus Proteins/genetics , Models, Genetic , Adult , Aged , Cohort Studies , Europe , Evolution, Molecular , Female , Genome-Wide Association Study , HIV Infections/blood , HIV Seropositivity/blood , HIV-1/growth & development , HIV-1/isolation & purification , HIV-1/pathogenicity , Human Immunodeficiency Virus Proteins/blood , Human Immunodeficiency Virus Proteins/metabolism , Humans , Male , Middle Aged , Phylogeny , Registries , Seroconversion , Viral Load , Virulence
ABSTRACT
The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
Subject(s)
Databases, Genetic , Datasets as Topic , Genome , Information Dissemination , Animals , Epigenomics , Genome, Human , Genome-Wide Association Study , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Vertebrates/genetics , Web Browser
ABSTRACT
A central feature of pathogen genomics is that different infectious particles (virions and bacterial cells) within an infected individual may be genetically distinct, with patterns of relatedness among infectious particles being the result of both within-host evolution and transmission from one host to the next. Here, we present a new software tool, phyloscanner, which analyses pathogen diversity from multiple infected hosts. phyloscanner provides unprecedented resolution into the transmission process, allowing inference of the direction of transmission from sequence data alone. Multiply infected individuals are also identified, as they harbor subpopulations of infectious particles that are not connected by within-host evolution, except where recombinant types emerge. Low-level contamination is flagged and removed. We illustrate phyloscanner on both viral and bacterial pathogens, namely HIV-1 sequenced on Illumina and Roche 454 platforms, HCV sequenced with the Oxford Nanopore MinION platform, and Streptococcus pneumoniae with sequences from multiple colonies per individual. phyloscanner is available from https://github.com/BDI-pathogens/phyloscanner.
ABSTRACT
The global-scale epidemiology and genome-wide evolutionary dynamics of influenza B remain poorly understood compared with influenza A viruses. We compiled a spatio-temporally comprehensive dataset of influenza B viruses, comprising over 2,500 genomes sampled worldwide between 1987 and 2015, including 382 newly-sequenced genomes that fill substantial gaps in previous molecular surveillance studies. Our contributed data increase the number of available influenza B virus genomes in Europe, Africa and Central Asia, improving the global context to study influenza B viruses. We reveal Yamagata-lineage diversity results from co-circulation of two antigenically-distinct groups that also segregate genetically across the entire genome, without evidence of intra-lineage reassortment. In contrast, Victoria-lineage diversity stems from geographic segregation of different genetic clades, with variability in the degree of geographic spread among clades. Differences between the lineages are reflected in their antigenic dynamics, as Yamagata-lineage viruses show alternating dominance between antigenic groups, while Victoria-lineage viruses show antigenic drift of a single lineage. Structural mapping of amino acid substitutions on trunk branches of influenza B gene phylogenies further supports these antigenic differences and highlights two potential mechanisms of adaptation for polymerase activity. Our study provides new insights into the epidemiological and molecular processes shaping influenza B virus evolution globally.
Subject(s)
Influenza B virus/genetics , Influenza, Human/epidemiology , Influenza, Human/virology , Amino Acid Substitution , Antigenic Variation , Antigens, Viral/genetics , Databases, Genetic , Evolution, Molecular , Genetic Variation , Genome, Viral , Global Health , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Humans , Influenza B virus/classification , Influenza B virus/immunology , Models, Molecular , Molecular Epidemiology , Phylogeny , RNA-Dependent RNA Polymerase/chemistry , RNA-Dependent RNA Polymerase/genetics , Reassortant Viruses/genetics , Viral Proteins/chemistry , Viral Proteins/genetics
ABSTRACT
BACKGROUND: The factors determining differential HIV disease outcome among individuals expressing protective HLA alleles such as HLA-B*27:05 and HLA-B*57:01 remain unknown. We here analyse two HIV-infected subjects expressing both HLA-B*27:05 and HLA-B*57:01. One subject maintained low-to-undetectable viral loads for more than a decade of follow-up. The other progressed to AIDS in < 3 years. RESULTS: The rapid progressor was the recipient within a known transmission pair, enabling virus sequences to be tracked from transmission. Progression was associated with a 12% Gag sequence change and 26% Nef sequence change at the amino acid level within 2 years. Although next generation sequencing from early timepoints indicated that multiple CD8+ cytotoxic T lymphocyte (CTL) escape mutants were being selected prior to superinfection, < 4% of the amino acid changes arising from superinfection could be ascribed to CTL escape. Analysis of an HLA-B*27:05/B*57:01 non-progressor, in contrast, demonstrated minimal virus sequence diversification (1.1% Gag amino acid sequence change over 10 years), and dominant HIV-specific CTL responses previously shown to be effective in control of viraemia were maintained. Clonal sequencing demonstrated that escape variants were generated within the non-progressor, but in many cases were not selected. In the rapid progressor, progression occurred despite substantial reductions in viral replicative capacity (VRC), and non-progression in the elite controller despite relatively high VRC. CONCLUSIONS: These data are consistent with previous studies demonstrating rapid progression in association with superinfection and that rapid disease progression can occur despite the relatively low VRC that is typically observed in the setting of multiple CTL escape mutants.
Subject(s)
Disease Progression , HIV Infections/virology , HIV-1/physiology , Superinfection/virology , Amino Acid Substitution , CD4 Lymphocyte Count , CD4-Positive T-Lymphocytes/immunology , Cluster Analysis , Epitopes, T-Lymphocyte/genetics , Genetic Variation , HIV Core Protein p24/genetics , HIV Infections/genetics , HIV Infections/immunology , HIV-1/classification , HIV-1/genetics , HIV-1/immunology , HLA-B Antigens/immunology , High-Throughput Nucleotide Sequencing/methods , Humans , Male , RNA, Viral/blood , RNA, Viral/genetics , Sequence Analysis, RNA , Superinfection/genetics , Superinfection/immunology , T-Lymphocytes, Cytotoxic/immunology , Viral Load , Virus Replication , gag Gene Products, Human Immunodeficiency Virus/genetics
ABSTRACT
MOTIVATION: An accurate genome assembly from short read sequencing data is critical for downstream analysis, for example allowing investigation of variants within a sequenced population. However, assembling sequencing data from virus samples, especially RNA viruses, into a genome sequence is challenging due to the combination of viral population diversity and extremely uneven read depth caused by amplification bias in the inevitable reverse transcription and polymerase chain reaction amplification process of current methods. RESULTS: We developed a new de novo assembler called IVA (Iterative Virus Assembler) designed specifically for read pairs sequenced at highly variable depth from RNA virus samples. We tested IVA on datasets from 140 sequenced samples from human immunodeficiency virus-1 or influenza-virus-infected people and demonstrated that IVA outperforms all other virus de novo assemblers. AVAILABILITY AND IMPLEMENTATION: The software runs under Linux, has the GPLv3 licence and is freely available from http://sanger-pathogens.github.io/iva
Subject(s)
Genome, Viral , HIV-1/genetics , Influenza A virus/genetics , Influenza B virus/genetics , RNA Viruses/genetics , Sequence Analysis, DNA/methods , Software , HIV Infections/genetics , HIV Infections/virology , HIV-1/isolation & purification , High-Throughput Nucleotide Sequencing/methods , Humans , Influenza A virus/isolation & purification , Influenza B virus/isolation & purification , Influenza, Human/genetics , Influenza, Human/virology , Polymerase Chain Reaction/methods
ABSTRACT
BACKGROUND: The precise immune responses mediated by HLA class I molecules such as HLA-B*27:05 and HLA-B*57:01 that protect against HIV disease progression remain unclear. We studied a CRF01_AE clade HIV infected donor-recipient transmission pair in which the recipient expressed both HLA-B*27:05 and HLA-B*57:01. RESULTS: Within 4.5 years of diagnosis, the recipient had progressed to meet criteria for antiretroviral therapy initiation. We employed ultra-deep sequencing of the full-length virus genome in both donor and recipient as an unbiased approach by which to identify specific viral mutations selected in association with progression. Using a heat map method to highlight differences in the viral sequences between donor and recipient, we demonstrated that the majority of the recipient's mutations outside of Env were within epitopes restricted by HLA-B*27:05 and HLA-B*57:01, including the well-studied Gag epitopes. The donor, who also expressed HLA alleles associated with disease protection, HLA-A*32:01/B*13:02/B*14:01, showed selection of mutations in parallel with disease progression within epitopes restricted by these protective alleles. CONCLUSIONS: These studies of full-length viral sequences in a transmission pair, both of whom expressed protective HLA alleles but nevertheless failed to control viremia, are consistent with previous reports pointing to the critical role of Gag-specific CD8+ T cell responses restricted by protective HLA molecules in maintaining immune control of HIV infection. The transmission of subtype CRF01_AE clade infection may have contributed to accelerated disease progression in this pair as a result of clade-specific sequence differences in immunodominant epitopes.
Subject(s)
Disease Progression , HIV Infections/immunology , HIV Infections/pathology , HLA-B Antigens/metabolism , HLA-B27 Antigen/metabolism , Adult , Epitopes/genetics , Epitopes/immunology , Family Characteristics , Female , Gene Expression , HIV/classification , HIV/genetics , HIV Infections/transmission , Humans , Male , Molecular Sequence Data , Mutation, Missense , Sequence Analysis, DNA , gag Gene Products, Human Immunodeficiency Virus/genetics , gag Gene Products, Human Immunodeficiency Virus/immunology
ABSTRACT
BACKGROUND: Since June, 2012, Middle East respiratory syndrome coronavirus (MERS-CoV) has, worldwide, caused 104 infections in people including 49 deaths, with 82 cases and 41 deaths reported from Saudi Arabia. In addition to confirming diagnosis, we generated the MERS-CoV genomic sequences obtained directly from patient samples to provide important information on MERS-CoV transmission, evolution, and origin. METHODS: Full genome deep sequencing was done on nucleic acid extracted directly from PCR-confirmed clinical samples. Viral genomes were obtained from 21 MERS cases of which 13 had 100%, four 85-95%, and four 30-50% genome coverage. Phylogenetic analysis of the 21 sequences, combined with nine published MERS-CoV genomes, was done. FINDINGS: Three distinct MERS-CoV genotypes were identified in Riyadh. Phylogeographic analyses suggest the MERS-CoV zoonotic reservoir is geographically disperse. Selection analysis of the MERS-CoV genomes reveals the expected accumulation of genetic diversity including changes in the S protein. The genetic diversity in the Al-Hasa cluster suggests that the hospital outbreak might have had more than one virus introduction. INTERPRETATION: We present the largest number of MERS-CoV genomes (21) described so far. MERS-CoV full genome sequences provide greater detail in tracking transmission. Multiple introductions of MERS-CoV are identified and suggest lower R0 values. Transmission within Saudi Arabia is consistent with either movement of an animal reservoir, animal products, or movement of infected people. Further definition of the exposures responsible for the sporadic introductions of MERS-CoV into human populations is urgently needed. FUNDING: Saudi Arabian Ministry of Health, Wellcome Trust, European Community, and National Institute of Health Research University College London Hospitals Biomedical Research Centre.
Subject(s)
Coronavirus Infections/genetics , Coronavirus/genetics , Disease Outbreaks , Evolution, Molecular , Genome, Viral , Respiratory Tract Infections/genetics , Base Sequence , Coronavirus Infections/epidemiology , Coronavirus Infections/transmission , Gene Amplification , Humans , Respiratory Tract Infections/epidemiology , Respiratory Tract Infections/transmission , Saudi Arabia/epidemiology , Syndrome
ABSTRACT
BACKGROUND: Dynamic changes in Human Immunodeficiency Virus 1 (HIV-1) sequence diversity and divergence are associated with immune control during primary infection and progression to AIDS. Consensus sequencing or single genome amplification sequencing of the HIV-1 envelope (env) gene, in particular the variable (V) regions, is used as a marker for HIV-1 genome diversity, but population diversity is only minimally, or semi-quantitatively, sampled using these methods. RESULTS: Here we use second generation deep sequencing to determine inter- and intra-patient sequence heterogeneity and to quantify minor variants in a cohort of individuals either receiving or not receiving antiretroviral treatment following seroconversion; the SPARTAC trial. We show, through a cross-sectional study of sequence diversity of the env V3 in 30 antiretroviral-naive patients during primary infection, that considerable population structure diversity exists, with some individuals exhibiting highly constrained plasma virus diversity. Diversity was independent of clinical markers (viral load, time from seroconversion, CD4 cell count) of infection. Serial sampling over 60 weeks of non-treated individuals that define three initially different diversity profiles showed that complex patterns of continuing HIV-1 sequence diversification and divergence could be readily detected. Evidence for minor sequence turnover, emergence of new variants and re-emergence of archived variants could be inferred from this analysis. Analysis of viral divergence over the same time period in patients who received short (12 weeks, ART12) or long course antiretroviral therapy (48 weeks, ART48) and a non-treated control group revealed that ART48 successfully suppressed viral divergence while ART12 did not have a significant effect. CONCLUSIONS: Deep sequencing is a sensitive and reliable method for investigating the diversity of the env V3 as an important component of HIV-1 genome diversity. Detailed insights into the complex early intra-patient dynamics of env V3 diversity and divergence were explored in antiretroviral-naive recent seroconverters. Long course antiretroviral therapy, initiated soon after seroconversion and administered for 48 weeks, restricts HIV-1 divergence significantly. The effect of ART12 and ART48 on clinical markers of HIV infection and progression is currently being investigated in the SPARTAC trial.
Subject(s)
Antiretroviral Therapy, Highly Active , Genes, env , Genetic Variation , HIV Antibodies/blood , HIV Envelope Protein gp120/genetics , HIV Infections/drug therapy , HIV-1/genetics , Peptide Fragments/genetics , Cohort Studies , Cross-Sectional Studies , HIV Infections/virology , HIV-1/drug effects , HIV-1/immunology , Humans , Mutation , Sequence Analysis, DNA/methods , Time Factors , Treatment Outcome
ABSTRACT
Virus gene sequencing and phylogenetics can be used to study the epidemiological dynamics of rapidly evolving viruses. With complete genome data, it becomes possible to identify and trace individual transmission chains of viruses such as influenza virus during the course of an epidemic. Here we sequenced 153 pandemic influenza H1N1/09 virus genomes from United Kingdom isolates from the first (127 isolates) and second (26 isolates) waves of the 2009 pandemic and used their sequences, dates of isolation, and geographical locations to infer the genetic epidemiology of the epidemic in the United Kingdom. We demonstrate that the epidemic in the United Kingdom was composed of many cocirculating lineages, among which at least 13 were exclusively or predominantly United Kingdom clusters. The estimated divergence times of two of the clusters predate the detection of pandemic H1N1/09 virus in the United Kingdom, suggesting that the pandemic H1N1/09 virus was already circulating in the United Kingdom before the first clinical case. Crucially, three clusters contain isolates from the second wave of infections in the United Kingdom, two of which represent chains of transmission that appear to have persisted within the United Kingdom between the first and second waves. This demonstrates that whole-genome analysis can track in fine detail the behavior of individual influenza virus lineages during the course of a single epidemic or pandemic.
Subject(s)
Evolution, Molecular , Genome, Viral , Influenza A Virus, H1N1 Subtype/classification , Influenza A Virus, H1N1 Subtype/genetics , Influenza, Human/virology , Adolescent , Adult , Child , Humans , Influenza A Virus, H1N1 Subtype/isolation & purification , Influenza, Human/epidemiology , Molecular Sequence Data , Pandemics , Phylogeny , United Kingdom , Young Adult
ABSTRACT
Whole HIV-1 genome sequences are pivotal for large-scale studies of inter- and intrahost evolution, including the acquisition of drug resistance mutations. The ability to rapidly and cost-effectively generate large numbers of HIV-1 genome sequences from different populations and geographical locations and determine the effect of minority genetic variants is, however, a limiting factor. Next-generation sequencing promises to bridge this gap but is hindered by the lack of methods for the enrichment of virus genomes across the phylogenetic breadth of HIV-1 and methods for the robust assembly of the virus genomes from short-read data. Here we report a method for the amplification, next-generation sequencing, and unbiased de novo assembly of HIV-1 genomes of groups M, N, and O, as well as recombinants, that does not require prior knowledge of the sequence or subtype. A sensitivity of at least 3,000 copies/ml was determined by using plasma virus samples of known copy numbers. We applied our novel method to compare the genome diversities of HIV-1 groups, subtypes, and genes. The highest level of diversity was found in the env, nef, vpr, tat, and rev genes and parts of the gag gene. Furthermore, we used our method to investigate mutations associated with HIV-1 drug resistance in clinical samples at the level of the complete genome. Drug resistance mutations were detected as both major variant and minor species. In conclusion, we demonstrate the feasibility of our method for large-scale HIV-1 genome sequencing. This will enable the phylogenetic and phylodynamic resolution of the ongoing pandemic and efficient monitoring of complex HIV-1 drug resistance genotypes.
Subject(s)
Genetic Variation , Genome, Viral , HIV Infections/virology , HIV-1/classification , HIV-1/genetics , High-Throughput Nucleotide Sequencing/methods , Nucleic Acid Amplification Techniques/methods , Drug Resistance, Viral , Evolution, Molecular , Genotype , HIV-1/isolation & purification , Humans , Mutation, Missense , Phylogeny , RNA, Viral/genetics , Recombination, Genetic
ABSTRACT
Influenza A surface proteins H (haemagglutinin) and N (neuraminidase) occur in sixteen and nine distinct genotypes, respectively. The need for a timely production of vaccinations in case of pandemics or seasonal epidemics requires rapid typing methods for the determination of these alleles. The aim of the present study was to develop and improve a rapid and economic assay for determining H and N subtypes of influenza A from patient samples. The assay is based on the hybridisation of labelled amplicons from H and N reverse transcriptase-PCRs using consensus primer pairs to subtype-specific probes on microtiterstripe-mounted DNA-microarrays. An algorithm for semi-automatic data interpretation of raw data and assignment to H and N subtypes was proposed. Altogether, 191 samples were genotyped. This included 134 patient and 44 reference samples as well as controls. Under routine conditions sensitivity and specificity proved to be comparable to conventional nested or real-time PCRs. At least 130 out of 147 array-positive samples were unambiguously assignable. This included all sixteen variants of H as well as all nine variants of N. Furthermore, eighty-two samples from the 2009/2010 "novel H1N1/swine flu" (SF)-outbreak were correctly identified.