ABSTRACT
The COVID-19 pandemic has been characterised by sequential variant-specific waves shaped by viral, individual human and population factors. SARS-CoV-2 variants are defined by their unique combinations of mutations and there has been a clear adaptation to more efficient human infection since the emergence of this new human coronavirus in late 2019. Here, we use machine learning models to identify shared signatures, i.e., common underlying mutational processes, and link these to the subset of mutations that define the variants of concern (VOCs). First, we examined the global SARS-CoV-2 genomes and associated metadata to determine how viral properties and public health measures have influenced the magnitude of waves, as measured by the number of infection cases, in different geographic locations using regression models. This analysis showed that, as expected, both public health measures and virus properties were associated with the waves of regional SARS-CoV-2 reported infection numbers and that this impact varies geographically. We attribute this to intrinsic differences such as vaccine coverage, testing and sequencing capacity and the effectiveness of government stringency. To assess underlying evolutionary change, we used non-negative matrix factorisation and observed three distinct mutational signatures, unique in their substitution patterns and exposures, from the SARS-CoV-2 genomes. Signatures 1, 2 and 3 were biased to C→T, T→C/A→G and G→T point mutations. We hypothesise assignments of these mutational signatures to the host antiviral molecules APOBEC, ADAR and ROS, respectively. We observe a shift during the pandemic in relative mutational signature activity from predominantly Signature 1 changes to an increasingly high proportion of changes consistent with Signature 2. This could represent changes in how the virus and the host immune response interact and indicates how SARS-CoV-2 may continue to generate variation in the future.
Linkage of the detected mutational signatures to the VOC-defining amino acid substitutions indicates the majority of SARS-CoV-2's evolutionary capacity is likely to be associated with the action of host antiviral molecules rather than virus replication errors.
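As an illustration of the non-negative matrix factorisation step mentioned in this abstract, the following is a minimal sketch only, not the authors' pipeline. It factorises a toy count matrix (rows are substitution classes, columns are hypothetical samples) into signatures W and exposures H using the standard Lee-Seung multiplicative updates. The matrix values, the function names and the choice of rank 3 are assumptions made for the example.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared reconstruction error ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))

def nmf(V, rank, iters=300, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ W H with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

# Toy data (hypothetical): substitution counts per sample.
CLASSES = ["C>T", "T>C", "A>G", "G>T"]
V = [
    [40, 30, 10, 5],   # C>T
    [5, 10, 30, 20],   # T>C
    [4, 9, 28, 22],    # A>G
    [6, 6, 6, 6],      # G>T
]
W, H = nmf(V, rank=3)
print("reconstruction error:", frobenius_error(V, W, H))
```

Each column of W can be read as a candidate signature (a weighting over substitution classes) and each row of H as that signature's exposure across samples. Real analyses would use far more substitution contexts and a model-selection step for the rank.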
Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Pandemics , Mutation , Antiviral Agents/pharmacology
ABSTRACT
We describe a cluster of COVID-19 breakthrough infections after vaccination in Kyamulibwa, Kalungu District, Uganda. All but 1 infection were from SARS-CoV-2 Omicron strain BA.5.2.1. We identified 6 distinct genotypes by genome sequencing. Infections were mild, suggesting vaccination is not protective against infection but may limit disease severity.
Subjects
COVID-19 , Humans , COVID-19/epidemiology , COVID-19/prevention & control , SARS-CoV-2/genetics , Uganda/epidemiology , Breakthrough Infections
ABSTRACT
The 2013-2016 West African epidemic caused by the Ebola virus was of unprecedented magnitude, duration and impact. Here we reconstruct the dispersal, proliferation and decline of Ebola virus throughout the region by analysing 1,610 Ebola virus genomes, which represent over 5% of the known cases. We test the association of geography, climate and demography with viral movement among administrative regions, inferring a classic 'gravity' model, with intense dispersal between larger and closer populations. Despite attenuation of international dispersal after border closures, cross-border transmission had already sown the seeds for an international epidemic, rendering these measures ineffective at curbing the epidemic. We address why the epidemic did not spread into neighbouring countries, showing that these countries were susceptible to substantial outbreaks but at lower risk of introductions. Finally, we reveal that this large epidemic was a heterogeneous and spatially dissociated collection of transmission clusters of varying size, duration and connectivity. These insights will help to inform interventions in future epidemics.
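The 'gravity' pattern described in this abstract (more intense dispersal between larger and closer populations) can be illustrated with the textbook gravity form. This is a sketch under stated assumptions: the function name, the scaling constant k and the exponents alpha, beta and gamma are hypothetical and are not the parameters inferred in the study, which tested predictors of viral movement in a phylogeographic framework.

```python
def gravity_flow(pop_origin, pop_dest, distance_km,
                 k=1.0, alpha=1.0, beta=1.0, gamma=1.0):
    """Textbook gravity model: flow grows with both population sizes
    and decays with the distance between the two locations."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return k * (pop_origin ** alpha) * (pop_dest ** beta) / (distance_km ** gamma)

# Hypothetical administrative regions: two large, close populations
# versus a small or distant counterpart.
near_large = gravity_flow(1_000_000, 800_000, 50)
far_large = gravity_flow(1_000_000, 800_000, 500)
near_small = gravity_flow(1_000_000, 20_000, 50)
print(near_large > far_large, near_large > near_small)
```

With all exponents set to 1, doubling either population doubles the expected flow and doubling the distance halves it.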
Subjects
Ebolavirus/genetics , Ebolavirus/physiology , Viral Genome/genetics , Ebola Virus Disease/transmission , Ebola Virus Disease/virology , Climate , Disease Outbreaks/statistics & numerical data , Ebolavirus/isolation & purification , Geography , Ebola Virus Disease/epidemiology , Humans , Internationality , Linear Models , Molecular Epidemiology , Phylogeny , Travel/legislation & jurisprudence , Travel/statistics & numerical data
ABSTRACT
SUMMARY: Here, we present an automated pipeline for Download Of NCBI Entries (DONE) and continuous updating of a local sequence database based on user-specified queries. The database can be created with either protein or nucleotide sequences containing all entries or complete genomes only. The pipeline can automatically clean the database by removing entries with matches to a database of user-specified sequence contaminants. The default contamination entries include sequences from the UniVec database of plasmids, marker genes and sequencing adapters from NCBI, an E. coli genome, rRNA sequences, vectors and satellite sequences. Furthermore, duplicates are removed and the database is automatically screened for sequences from green fluorescent protein, luciferase and antibiotic resistance genes that might be present in some GenBank viral entries and could lead to false positives in virus identification. For use of the database, we also present an option for dealing with possible human contamination. We show the applicability of DONE by downloading a virus database comprising 37 virus families. We observed an average increase of 16 776 new entries downloaded per month for the 37 families. In addition, we demonstrate the utility of a custom database compared to a standard reference database for classifying both simulated and real sequence data. AVAILABILITY AND IMPLEMENTATION: The DONE pipeline for downloading and cleaning is deposited in a publicly available repository (https://bitbucket.org/genomicepidemiology/done/src/master/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
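The cleaning steps described in this abstract (removing duplicates and entries that match contaminant sequences) can be sketched as follows. This is not the DONE implementation: the function names are hypothetical, and exact substring matching is used as a simple stand-in because the abstract does not state how matches to the contaminant database are computed.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def clean_database(records, contaminants):
    """Drop duplicate sequences and entries containing a contaminant.

    Assumption: a 'match' is an exact substring hit; a real pipeline
    would more likely use an alignment or k-mer based search.
    """
    seen, kept = set(), []
    for header, seq in records:
        if seq in seen:
            continue  # duplicate of an entry already kept
        if any(c in seq for c in contaminants):
            continue  # contains a contaminant sequence
        seen.add(seq)
        kept.append((header, seq))
    return kept

# Toy input (hypothetical entries and contaminant).
FASTA = """>virus_a
ACGTACGTAA
>virus_a_copy
ACGTACGTAA
>virus_b_with_vector
TTTGGGCCCAAATTT
>virus_c
GGGTTTAAAC
"""
CONTAMINANTS = ["GGGCCCAAA"]
cleaned = clean_database(parse_fasta(FASTA), CONTAMINANTS)
print([header for header, _ in cleaned])
```

In this toy run the duplicate entry and the entry carrying the vector fragment are removed, leaving the two clean, distinct sequences.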
Subjects
Genetic Databases , Nucleic Acid Databases , Genome , Humans , Proteins
ABSTRACT
As the coronavirus pandemic continues, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequence data are required to inform vaccine efforts. We provide SARS-CoV-2 sequence data from South Sudan and document the dominance of SARS-CoV-2 lineage B.1.525 (Eta variant) during the country's second wave of infection.
Subjects
COVID-19 , SARS-CoV-2 , Humans , Pandemics , South Sudan/epidemiology
ABSTRACT
Following the publication of this article [1], it was noted that due to a typesetting error the figure legends were paired incorrectly.
ABSTRACT
BACKGROUND: Human metapneumovirus (HMPV) is an important cause of acute respiratory illness in young children. Whole genome sequencing enables better identification of transmission events and outbreaks, which is not always possible with sub-genomic sequences. RESULTS: We report a 2-reaction amplicon-based next-generation sequencing method to determine the complete genome sequences of five HMPV strains, representing three subgroups (A2, B1 and B2), directly from clinical samples. In addition to reporting five novel HMPV genomes from Africa, we examined genetic diversity and sequence patterns of publicly available HMPV genomes. We found that the overall nucleotide sequence identity was 71.3% and 80% for HMPV groups A and B, respectively, that the diversity between HMPV groups was greater at the amino acid level for the SH and G surface protein genes, and that multiple subgroups co-circulated in various countries. Comparison of sequences between HMPV groups revealed variability in G protein length (219 to 241 amino acids) due to changes in the stop codon position. Genome-wide phylogenetic analysis showed congruence with the individual gene sequence sets except for the F and M2 genes. CONCLUSION: This is the first genomic characterization of HMPV genomes from African patients.
Subjects
Viral Genome/genetics , Metapneumovirus/genetics , Paramyxoviridae Infections/genetics , Whole Genome Sequencing , Amino Acid Sequence , Genotype , Humans , Kenya/epidemiology , Metapneumovirus/pathogenicity , Paramyxoviridae Infections/epidemiology , Paramyxoviridae Infections/virology , Phylogeny , Viral Proteins/genetics , Zambia/epidemiology
ABSTRACT
We established rapid local viral sequencing to document the genomic diversity of severe acute respiratory syndrome coronavirus 2 entering Uganda. Virus lineages closely followed the travel origins of infected persons. Our sequence data provide an important baseline for tracking any further transmission of the virus throughout the country and region.
Subjects
Betacoronavirus/genetics , Coronavirus Infections/epidemiology , Coronavirus Infections/virology , Pandemics/prevention & control , Viral Pneumonia/epidemiology , Viral Pneumonia/virology , Air Travel , COVID-19 , Coronavirus Infections/diagnosis , Coronavirus Infections/prevention & control , Genetic Variation , Genome , Health Policy , Humans , Mass Screening , Motor Vehicles , Phylogeography , Viral Pneumonia/diagnosis , Viral Pneumonia/prevention & control , Quarantine , SARS-CoV-2 , Uganda/epidemiology
ABSTRACT
We are rapidly approaching the point where we have sequenced millions of human genomes. There is a pressing need for new data structures to store raw sequencing data and efficient algorithms for population scale analysis. Current reference-based data formats do not fully exploit the redundancy in population sequencing nor take advantage of shared genetic variation. In recent years, the Burrows-Wheeler transform (BWT) and FM-index have been widely employed as a full-text searchable index for read alignment and de novo assembly. We introduce the concept of a population BWT and use it to store and index the sequencing reads of 2705 samples from the 1000 Genomes Project. A key feature is that, as more genomes are added, identical read sequences are increasingly observed, and compression becomes more efficient. We assess the support in the 1000 Genomes read data for every base position of two human reference assembly versions, identifying that 3.2 Mbp with population support was lost in the transition from GRCh37 with 13.7 Mbp added to GRCh38. We show that the vast majority of variant alleles can be uniquely described by overlapping 31-mers and show how rapid and accurate SNP and indel genotyping can be carried out across the genomes in the population BWT. We use the population BWT to carry out nonreference queries to search for the presence of all known viral genomes and discover human T-lymphotropic virus 1 integrations in six samples in a recognized epidemiological distribution.
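To make the BWT and FM-index idea in this abstract concrete, here is a minimal single-string sketch. It is not the population BWT data structure itself, which indexes the reads of many samples; it only shows the underlying transform and the backward search that counts occurrences of a query such as a k-mer. The function names and the toy sequence are assumptions made for the example, and the naive rotation sort is only suitable for short inputs.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy-sized input only)."""
    text += "$"  # sentinel, sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def build_fm_index(bwt_str):
    """Return (C, occ): C[c] counts characters smaller than c in the text,
    occ[c][i] counts occurrences of c in bwt_str[:i]."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ

def count_occurrences(pattern, C, occ, n):
    """Backward search: number of times pattern occurs in the indexed text."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo

TEXT = "ACGTACGTTACG"  # hypothetical toy sequence
B = bwt(TEXT)
C, occ = build_fm_index(B)
for query in ("ACG", "CGT", "TTA", "GGG"):
    print(query, count_occurrences(query, C, occ, len(B)))
```

The search cost depends on the query length rather than on the size of the indexed text, which is part of why this family of indexes suits queries such as testing whether a 31-mer is present across many samples.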
Subjects
Human Genome/genetics , Genomics , Sequence Alignment/methods , Whole Genome Sequencing/methods , Alleles , Data Compression , Genotype , Humans , INDEL Mutation/genetics , DNA Sequence Analysis , Software
ABSTRACT
Lemurs are highly endangered mammals inhabiting the forests of Madagascar. In this study, we performed virus discovery on serum samples collected from 84 wild lemurs and identified viral sequence fragments from 4 novel viruses within the family Flaviviridae, including members of the genera Hepacivirus and Pegivirus. The sifaka hepacivirus (SifHV, two genotypes) and pegivirus (SifPgV, two genotypes) were discovered in the diademed sifaka (Propithecus diadema), while other pegiviral fragments were detected in samples from the indri (Indri indri, IndPgV) and the weasel sportive lemur (Lepilemur mustelinus, LepPgV). Although data are preliminary, each viral species appeared host species-specific and frequent infection was detected (18 of 84 individuals were positive for at least one virus). The complete coding sequence and partial 5' and 3' untranslated regions (UTRs) were obtained for SifHV and its genomic organization was consistent with that of other hepaciviruses, with one unique polyprotein and highly structured UTRs. Phylogenetic analyses showed the SifHV belonged to a clade that includes several viral species identified in rodents from Asia and North America, while SifPgV and IndPgV were more closely related to pegiviral species A and C, that include viruses found in humans as well as New- and Old-World monkeys. Our results support the current proposed model of virus-host co-divergence with frequent occurrence of cross-species transmission for these genera and highlight how the discovery of more members of the Flaviviridae can help clarify the ecology and evolutionary history of these viruses. Furthermore, this knowledge is important for conservation and captive management of lemurs.
Subjects
Flaviviridae Infections/veterinary , Flaviviridae/isolation & purification , Lemur/virology , Primate Diseases/virology , Animals , Flaviviridae/classification , Flaviviridae/genetics , Flaviviridae/physiology , Flaviviridae Infections/virology , Genetic Variation , Madagascar , Phylogeny
ABSTRACT
In November 2018, yellow fever was diagnosed in a Dutch traveller returning from a bicycle tour in the Gambia-Senegal region. A complete genome sequence of yellow fever virus (YFV) from the case was generated and clustered phylogenetically with YFV from the Gambia and Senegal, ruling out importation into the Netherlands from recent outbreaks in Brazil or Angola. We emphasise the need for increased public awareness of YFV vaccination before travelling to endemic countries.
Subjects
Insect Vectors/virology , Travel , Yellow Fever/diagnosis , Yellow Fever Virus/genetics , Yellow Fever Virus/isolation & purification , Acute Kidney Injury/diagnosis , Acute Kidney Injury/etiology , Animals , Disease Outbreaks , Gambia , Humans , Insect Bites and Stings , Acute Liver Failure/diagnosis , Acute Liver Failure/etiology , Netherlands , Phylogeny , Polymerase Chain Reaction , Senegal , Whole Genome Sequencing , Yellow Fever/virology , Young Adult
ABSTRACT
Background: Human coronavirus NL63 (HCoV-NL63) is a globally endemic pathogen causing mild and severe respiratory tract infections with reinfections occurring repeatedly throughout a lifetime. Methods: Nasal samples were collected in coastal Kenya through community-based and hospital-based surveillance. HCoV-NL63 was detected with multiplex real-time reverse transcription PCR, and positive samples were targeted for nucleotide sequencing of the spike (S) protein. Additionally, paired samples from 25 individuals with evidence of repeat HCoV-NL63 infection were selected for whole-genome virus sequencing. Results: HCoV-NL63 was detected in 1.3% (75/5573) of child pneumonia admissions. Two HCoV-NL63 genotypes circulated in Kilifi between 2008 and 2014. Full genome sequences formed a monophyletic clade closely related to contemporary HCoV-NL63 from other global locations. An unexpected pattern of repeat infections was observed with some individuals showing higher viral titers during their second infection. Similar patterns for 2 other endemic coronaviruses, HCoV-229E and HCoV-OC43, were observed. Repeat infections by HCoV-NL63 were not accompanied by detectable genotype switching. Conclusions: In this coastal Kenya setting, HCoV-NL63 exhibited low prevalence in hospital pediatric pneumonia admissions. Clade persistence with low genetic diversity suggests limited immune selection, and absence of detectable clade switching in reinfections indicates initial exposure was insufficient to elicit a protective immune response.
Subjects
Coronavirus Infections/epidemiology , Human Coronavirus NL63/genetics , Adolescent , Adult , Biological Evolution , Child , Preschool Child , Coronavirus Infections/virology , Human Coronavirus OC43/genetics , Female , Hospitalization , Humans , Infant , Newborn Infant , Kenya/epidemiology , Male , Molecular Epidemiology , Phylogeny , Prevalence , Prospective Studies , Respiratory Tract Infections/epidemiology , Respiratory Tract Infections/virology , Young Adult
Subjects
COVID-19 , SARS-CoV-2 , Travel , COVID-19/epidemiology , Humans , Phylogeny , SARS-CoV-2/genetics , Travel/legislation & jurisprudence
ABSTRACT
Background: The genus Norovirus comprises large genetic diversity, and new GII.4 variants emerge every 2-3 years. It is unknown in which host these new variants originate. Here we study whether prolonged shedders within the immunocompromised population could be a reservoir for newly emerging strains. Methods: Sixty-five fecal samples from 16 immunocompromised patients were retrospectively selected. Isolated viral RNA was enriched by hybridization with a custom norovirus whole-genome RNA bait set and deep sequenced on the Illumina MiSeq platform. Results: Patients shed virus for an average of 352 days (range, 76-716 days). Phylogenetic analysis showed distinct GII.4 variants in 3 of 13 patients (23%). The viral mutation rates varied between patients but did not differ between immune status groups. All within-host GII.4 viral populations showed amino acid changes at blocking epitopes over time, and the majority of VP1 amino acid mutations were located at the capsid surface. Conclusions: This study found viruses in immunocompromised hosts that are genetically distinct from viruses circulating in the general population, and these patients therefore may contain a reservoir for newly emerging strains. Future studies need to determine whether these new strains pose a risk to other immunocompromised patients and the general population.
Subjects
Caliciviridae Infections/virology , Molecular Evolution , Viral Genome , Immunocompromised Host , Norovirus/classification , Norovirus/genetics , Adolescent , Adult , Aged , Child , Preschool Child , Chronic Disease , Disease Reservoirs/virology , Feces/virology , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Mutation Rate , Norovirus/isolation & purification , Phylogeny , Viral RNA/genetics , Viral RNA/isolation & purification , Retrospective Studies , Time Factors , Virus Shedding , Whole Genome Sequencing , Young Adult
ABSTRACT
We report on an Ebola virus disease (EVD) survivor who showed Ebola virus in seminal fluid 531 days after onset of disease. The persisting virus was sexually transmitted in February 2016, about 470 days after onset of symptoms, and caused a new cluster of EVD in Guinea and Liberia.
Subjects
Disease Outbreaks , Ebolavirus/genetics , Ebola Virus Disease , Semen/virology , Sexually Transmitted Viral Diseases , Ebolavirus/isolation & purification , Female , Guinea , Ebola Virus Disease/transmission , Ebola Virus Disease/virology , Humans , Male , Polymerase Chain Reaction , Viral RNA/analysis , Sexually Transmitted Viral Diseases/transmission , Sexually Transmitted Viral Diseases/virology , Survivors
ABSTRACT
BACKGROUND: In September 2012, the World Health Organization reported the first cases of pneumonia caused by the novel Middle East respiratory syndrome coronavirus (MERS-CoV). We describe a cluster of health care-acquired MERS-CoV infections. METHODS: Medical records were reviewed for clinical and demographic information and determination of potential contacts and exposures. Case patients and contacts were interviewed. The incubation period and serial interval (the time between the successive onset of symptoms in a chain of transmission) were estimated. Viral RNA was sequenced. RESULTS: Between April 1 and May 23, 2013, a total of 23 cases of MERS-CoV infection were reported in the eastern province of Saudi Arabia. Symptoms included fever in 20 patients (87%), cough in 20 (87%), shortness of breath in 11 (48%), and gastrointestinal symptoms in 8 (35%); 20 patients (87%) presented with abnormal chest radiographs. As of June 12, a total of 15 patients (65%) had died, 6 (26%) had recovered, and 2 (9%) remained hospitalized. The median incubation period was 5.2 days (95% confidence interval [CI], 1.9 to 14.7), and the serial interval was 7.6 days (95% CI, 2.5 to 23.1). A total of 21 of the 23 cases were acquired by person-to-person transmission in hemodialysis units, intensive care units, or in-patient units in three different health care facilities. Sequencing data from four isolates revealed a single monophyletic clade. Among 217 household contacts and more than 200 health care worker contacts whom we identified, MERS-CoV infection developed in 5 family members (3 with laboratory-confirmed cases) and in 2 health care workers (both with laboratory-confirmed cases). CONCLUSIONS: Person-to-person transmission of MERS-CoV can occur in health care settings and may be associated with considerable morbidity. Surveillance and infection-control measures are critical to a global public health response.
Subjects
Coronavirus Infections/transmission , Coronavirus/genetics , Cross Infection/transmission , Disease Outbreaks , Viral Pneumonia/epidemiology , Adult , Aged , Aged 80 and over , Base Sequence , Coronavirus/isolation & purification , Coronavirus Infections/epidemiology , Coronavirus Infections/virology , Cross Infection/epidemiology , Cross Infection/virology , Viral DNA/analysis , Infectious Disease Transmission , Female , Humans , Infectious Disease Incubation Period , Patient-to-Professional Infectious Disease Transmission , Intensive Care Units , Male , Middle Aged , Phylogeny , Viral Pneumonia/transmission , Viral Pneumonia/virology , Renal Dialysis , Saudi Arabia/epidemiology
ABSTRACT
Both the recognition of HIV-infected cells and the immunogenicity of candidate CTL vaccines depend on the presentation of a peptide epitope at the cell surface, which in turn depends on intracellular antigen processing. Differential antigen processing may be responsible for the differences in both the quality and the quantity of epitopes produced, influencing the immunodominance hierarchy of viral epitopes. Previously, we showed that the magnitude of the HIV-2 gag-specific T-cell response is inversely correlated with plasma viral load, particularly when responses are directed against an epitope, 165DRFYKSLRA173, within the highly conserved Major Homology Region of gag-p26. We also showed that the presence of three proline residues, at positions 119, 159 and 178 of gag-p26, was significantly correlated with low viral load. Since this proline motif was also associated with stronger gag-specific CTL responses, we investigated the impact of these prolines on proteasomal processing of the protective 165DRFYKSLRA173 epitope. Our data demonstrate that the 165DRFYKSLRA173 epitope is most efficiently processed from precursors that contain two flanking proline residues, found naturally in low viral-load patients. Superior antigen processing and enhanced presentation may account for the link between infection with HIV-2 encoding the "PPP-gag" sequence and both strong gag-specific CTL responses and lower viral load.
Subjects
T-Lymphocyte Epitopes/immunology , HIV Infections/immunology , HIV-2/immunology , Cellular Immunity , T-Lymphocytes/immunology , Human Immunodeficiency Virus gag Gene Products/immunology , Amino Acid Motifs , T-Lymphocyte Epitopes/genetics , Female , HIV Infections/genetics , HIV Infections/pathology , HIV-2/genetics , Humans , Male , T-Lymphocytes/pathology , Viral Load/immunology , Human Immunodeficiency Virus gag Gene Products/genetics
ABSTRACT
Media and World Health Organization (WHO) attention on Zika virus transmission at the 2016 Rio Olympic Games and the 2015 Ebola virus outbreak in West Africa diverted the attention of global public health authorities from other lethal infectious diseases with epidemic potential. Mass gatherings such as the annual Hajj pilgrimage hosted by the Kingdom of Saudi Arabia attract huge crowds from all continents, creating high-risk conditions for the rapid global spread of infectious diseases. The highly lethal Middle East respiratory syndrome coronavirus (MERS-CoV) remains on the WHO list of top emerging diseases likely to cause major epidemics. The 2015 MERS-CoV outbreak in South Korea, which was imported from the Middle East by a South Korean businessman and in which 184 MERS cases including 33 deaths occurred in 2 months, was a wake-up call for the global community to refocus attention on MERS-CoV and other emerging and re-emerging infectious diseases with epidemic potential. The international donor community and Middle Eastern countries should make resources available for, and make a serious commitment to, taking forward a "One Health" global network for proactive surveillance, rapid detection, and prevention of MERS-CoV and other epidemic infectious disease threats.
Subjects
Coronavirus Infections/epidemiology , Coronavirus Infections/prevention & control , Disease Outbreaks/prevention & control , Epidemics/prevention & control , Global Health , Middle East Respiratory Syndrome Coronavirus , Attention , Communicable Diseases/epidemiology , Communicable Diseases/transmission , Emerging Communicable Diseases/epidemiology , Emerging Communicable Diseases/prevention & control , Emerging Communicable Diseases/transmission , Coronavirus Infections/transmission , Humans , Middle East Respiratory Syndrome Coronavirus/isolation & purification , Saudi Arabia/epidemiology , Travel
ABSTRACT
Human respiratory syncytial virus (RSV) is associated with severe childhood respiratory infections. A clear description of local RSV molecular epidemiology, evolution, and transmission requires detailed sequence data and can inform new strategies for virus control and vaccine development. We have generated 27 complete or nearly complete genomes of RSV from hospitalized children attending a rural coastal district hospital in Kilifi, Kenya, over a 10-year period using a novel full-genome deep-sequencing process. Phylogenetic analysis of the new genomes demonstrated the existence and cocirculation of multiple genotypes in both RSV A and B groups in Kilifi. Comparison of local versus global strains demonstrated that most RSV A variants observed locally in Kilifi were also seen in other parts of the world, while the Kilifi RSV B genomes encoded a high degree of variation that was not observed in other parts of the world. The nucleotide substitution rates for the individual open reading frames (ORFs) were highest in the regions encoding the attachment (G) glycoprotein and the NS2 protein. The analysis of RSV full genomes, compared to subgenomic regions, provided more precise estimates of the RSV sequence changes and revealed important patterns of RSV genomic variation and global movement. The novel sequencing method and the new RSV genomic sequences reported here expand our knowledge base for large-scale RSV epidemiological and transmission studies. IMPORTANCE: The new RSV genomic sequences and the novel sequencing method reported here provide important data for understanding RSV transmission and vaccine development. Given the complex interplay between RSV A and RSV B infections, the existence of local RSV B evolution is an important factor in vaccine deployment.
Subjects
Molecular Evolution , Viral Genome , Respiratory Syncytial Virus Infections/virology , Human Respiratory Syncytial Virus/classification , Human Respiratory Syncytial Virus/genetics , Preschool Child , Cluster Analysis , Genotype , High-Throughput Nucleotide Sequencing , Rural Hospitals , Humans , Infant , Newborn Infant , Kenya , Molecular Sequence Data , Phylogeography , Human Respiratory Syncytial Virus/isolation & purification , DNA Sequence Analysis , Sequence Homology
ABSTRACT
Epstein-Barr virus (EBV) infects most of the world's population and is causally associated with several human cancers, but little is known about how EBV genetic variation might influence infection or EBV-associated disease. There are currently no published wild-type EBV genome sequences from a healthy individual and very few genomes from EBV-associated diseases. We have sequenced 71 geographically distinct EBV strains from cell lines, multiple types of primary tumor, and blood samples and the first EBV genome from the saliva of a healthy carrier. We show that the established genome map of EBV accurately represents all strains sequenced, but novel deletions are present in a few isolates. We have increased the number of type 2 EBV genomes sequenced from one to 12 and establish that the type 1/type 2 classification is a major feature of EBV genome variation, defined almost exclusively by variation of EBNA2 and EBNA3 genes, but geographic variation is also present. Single nucleotide polymorphism (SNP) density varies substantially across all known open reading frames and is highest in latency-associated genes. Some T-cell epitope sequences in EBNA3 genes show extensive variation across strains, and we identify codons under positive selection, both important considerations for the development of vaccines and T-cell therapy. We also provide new evidence for recombination between strains, which provides a further mechanism for the generation of diversity. Our results provide the first global view of EBV sequence variation and demonstrate an effective method for sequencing large numbers of genomes to further understand the genetics of EBV infection. IMPORTANCE: Most people in the world are infected by Epstein-Barr virus (EBV), and it causes several human diseases, which occur at very different rates in different parts of the world and are linked to host immune system variation.
Natural variation in EBV DNA sequence may be important for normal infection and for causing disease. Here we used rapid, cost-effective sequencing to determine 71 new EBV sequences from different sample types and locations worldwide. We showed geographic variation in EBV genomes and identified the most variable parts of the genome. We identified protein sequences that seem to have been selected by the host immune system and detected variability in known immune epitopes. This gives the first overview of EBV genome variation, important for designing vaccines and immune therapy for EBV, and provides techniques to investigate relationships between viral sequence variation and EBV-associated diseases.