ABSTRACT
We present evidence for multiple independent origins of recombinant SARS-CoV-2 viruses sampled from late 2020 and early 2021 in the United Kingdom. Their genomes carry single-nucleotide polymorphisms and deletions that are characteristic of the B.1.1.7 variant of concern but lack the full complement of lineage-defining mutations. Instead, the remainder of their genomes share contiguous genetic variation with non-B.1.1.7 viruses circulating in the same geographic area at the same time as the recombinants. In four instances, there was evidence for onward transmission of a recombinant-origin virus, including one transmission cluster of 45 sequenced cases over the course of 2 months. The inferred genomic locations of recombination breakpoints suggest that every community-transmitted recombinant virus inherited its spike region from a B.1.1.7 parental virus, consistent with a transmission advantage for B.1.1.7's set of mutations.
Subject(s)
COVID-19/epidemiology , COVID-19/transmission , Pandemics , Recombination, Genetic , SARS-CoV-2/genetics , Base Sequence/genetics , COVID-19/virology , Computational Biology/methods , Gene Frequency , Genome, Viral , Genotype , Humans , Mutation , Phylogeny , Polymorphism, Single Nucleotide , United Kingdom/epidemiology , Whole Genome Sequencing/methods
ABSTRACT
Global dispersal and increasing frequency of the SARS-CoV-2 spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis for positive selection of spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability of a large dataset, well represented by both spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant.
Subject(s)
Amino Acid Substitution , COVID-19/transmission , COVID-19/virology , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Spike Glycoprotein, Coronavirus/genetics , Aspartic Acid/analysis , Aspartic Acid/genetics , COVID-19/epidemiology , Genome, Viral , Glycine/analysis , Glycine/genetics , Humans , Mutation , SARS-CoV-2/growth & development , United Kingdom/epidemiology , Virulence , Whole Genome Sequencing
ABSTRACT
The novel coronavirus SARS-CoV-2 was first detected in the Pacific Northwest region of the United States in January 2020, with subsequent COVID-19 outbreaks detected in all 50 states by early March. To uncover the sources of SARS-CoV-2 introductions and patterns of spread within the United States, we sequenced nine viral genomes from early reported COVID-19 patients in Connecticut. Our phylogenetic analysis places the majority of these genomes with viruses sequenced from Washington state. By coupling our genomic data with domestic and international travel patterns, we show that early SARS-CoV-2 transmission in Connecticut was likely driven by domestic introductions. Moreover, the risk of domestic importation to Connecticut exceeded that of international importation by mid-March regardless of our estimated effects of federal travel restrictions. This study provides evidence of widespread sustained transmission of SARS-CoV-2 within the United States and highlights the critical need for local surveillance.
Subject(s)
Betacoronavirus/genetics , Coronavirus Infections/transmission , Pneumonia, Viral/transmission , Travel , Betacoronavirus/isolation & purification , COVID-19 , Connecticut/epidemiology , Coronavirus Infections/epidemiology , Coronavirus Infections/virology , Epidemiological Monitoring , Humans , Likelihood Functions , Pandemics , Phylogeny , Pneumonia, Viral/epidemiology , Pneumonia, Viral/virology , SARS-CoV-2 , Travel/legislation & jurisprudence , United States/epidemiology , Washington/epidemiology
ABSTRACT
The SARS-CoV-2 Delta (Pango lineage B.1.617.2) variant of concern spread globally, causing resurgences of COVID-19 worldwide [1,2]. The emergence of the Delta variant in the UK occurred on the background of a heterogeneous landscape of immunity and relaxation of non-pharmaceutical interventions. Here we analyse 52,992 SARS-CoV-2 genomes from England together with 93,649 genomes from the rest of the world to reconstruct the emergence of Delta and quantify its introduction to and regional dissemination across England in the context of changing travel and social restrictions. Using analysis of human movement, contact tracing and virus genomic data, we find that the geographic focus of the expansion of Delta shifted from India to a more global pattern in early May 2021. In England, Delta lineages were introduced more than 1,000 times and spread nationally as non-pharmaceutical interventions were relaxed. We find that hotel quarantine for travellers reduced onward transmission from importations; however, the transmission chains that later dominated the Delta wave in England were seeded before travel restrictions were introduced. Increasing inter-regional travel within England drove the nationwide dissemination of Delta, with some cities receiving more than 2,000 observable lineage introductions from elsewhere. Subsequently, increased levels of local population mixing, and not the number of importations, were associated with the faster relative spread of Delta. The invasion dynamics of Delta depended on spatial heterogeneity in contact patterns, and our findings will inform optimal spatial interventions to reduce the transmission of current and future variants of concern, such as Omicron (Pango lineage B.1.1.529).
Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/transmission , COVID-19/virology , Cities/epidemiology , Contact Tracing , England/epidemiology , Genome, Viral/genetics , Humans , Quarantine/legislation & jurisprudence , SARS-CoV-2/genetics , SARS-CoV-2/growth & development , SARS-CoV-2/isolation & purification , Travel/legislation & jurisprudence
ABSTRACT
The SARS-CoV-2 lineage B.1.1.7, designated variant of concern (VOC) 202012/01 by Public Health England [1], was first identified in the UK in late summer to early autumn 2020 [2]. Whole-genome SARS-CoV-2 sequence data collected from community-based diagnostic testing for COVID-19 show an extremely rapid expansion of the B.1.1.7 lineage during autumn 2020, suggesting that it has a selective advantage. Here we show that changes in VOC frequency inferred from genetic data correspond closely to changes inferred by S gene target failures (SGTF) in community-based diagnostic PCR testing. Analysis of trends in SGTF and non-SGTF case numbers in local areas across England shows that B.1.1.7 has higher transmissibility than non-VOC lineages, even if it has a different latent period or generation time. The SGTF data indicate a transient shift in the age composition of reported cases, with cases of B.1.1.7 including a larger share of under 20-year-olds than non-VOC cases. We estimated time-varying reproduction numbers for B.1.1.7 and co-circulating lineages using SGTF and genomic data. The best-supported models did not indicate a substantial difference in VOC transmissibility among different age groups, but all analyses agreed that B.1.1.7 has a substantial transmission advantage over other lineages, with a 50% to 100% higher reproduction number.
Subject(s)
COVID-19/transmission , COVID-19/virology , Phylogeny , SARS-CoV-2/classification , SARS-CoV-2/pathogenicity , Adolescent , Adult , Age Distribution , Aged , Aged, 80 and over , Basic Reproduction Number , COVID-19/diagnosis , COVID-19/epidemiology , Child , Child, Preschool , England/epidemiology , Evolution, Molecular , Genome, Viral/genetics , Humans , Infant , Infant, Newborn , Middle Aged , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , Spike Glycoprotein, Coronavirus/analysis , Spike Glycoprotein, Coronavirus/genetics , Time Factors , Young Adult
ABSTRACT
After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist [1,2]. Here we present a human genome assembly that surpasses the continuity of GRCh38 [2], along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome [3], we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
Subject(s)
Chromosomes, Human, X/genetics , Genome, Human/genetics , Telomere/genetics , Centromere/genetics , CpG Islands/genetics , DNA Methylation , DNA, Satellite/genetics , Female , Humans , Hydatidiform Mole/genetics , Male , Pregnancy , Reproducibility of Results , Testis/metabolism
ABSTRACT
The recent Ebola and Zika epidemics demonstrate the need for the continuous surveillance, rapid diagnosis and real-time tracking of emerging infectious diseases. Fast, affordable sequencing of pathogen genomes, now a staple of the public health microbiology laboratory in well-resourced settings, can affect each of these areas. Coupling genomic diagnostics and epidemiology to innovative digital disease detection platforms raises the possibility of an open, global, digital pathogen surveillance system. When informed by a One Health approach, in which human, animal and environmental health are considered together, such a genomics-based system has profound potential to improve public health in settings lacking robust laboratory capacity.
Subject(s)
Communicable Diseases, Emerging/epidemiology , Public Health Surveillance/methods , Animals , Communicable Diseases, Emerging/etiology , Communicable Diseases, Emerging/genetics , Computer Systems , Environmental Health , Epidemics , Genomics , Hemorrhagic Fever, Ebola/epidemiology , High-Throughput Nucleotide Sequencing , Humans , Metagenomics , Models, Biological , Molecular Epidemiology , Public Health
ABSTRACT
The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. Genome sequencing also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10⁻³ and 1.42 × 10⁻³ mutations per site per year. This is equivalent to 16-27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities. To address this problem, here we devise a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15-60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks.
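The conversion from a per-site substitution rate to a per-genome count in the abstract above can be checked with a short calculation. This is a minimal sketch: the genome length of 18,959 nucleotides is an assumption not stated in the abstract, and the result is read here as substitutions per genome per year.

```python
# Assumed EBOV genome length in nucleotides (not given in the abstract).
GENOME_LENGTH_NT = 18_959


def substitutions_per_genome_per_year(rate_per_site_per_year, genome_length=GENOME_LENGTH_NT):
    """Expected substitutions per genome per year for a given per-site rate."""
    return rate_per_site_per_year * genome_length


# Lower and upper bounds of the rate range quoted in the abstract.
low = substitutions_per_genome_per_year(0.87e-3)
high = substitutions_per_genome_per_year(1.42e-3)

print(f"{low:.1f} to {high:.1f} substitutions per genome per year")
```

Under that assumed length, the bounds round to the 16-27 range given in the abstract.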
Subject(s)
Ebolavirus/genetics , Epidemiological Monitoring , Genome, Viral/genetics , Hemorrhagic Fever, Ebola/epidemiology , Hemorrhagic Fever, Ebola/virology , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods , Aircraft , Disease Outbreaks/statistics & numerical data , Ebolavirus/classification , Ebolavirus/pathogenicity , Guinea/epidemiology , Humans , Mutagenesis/genetics , Mutation Rate , Time Factors
ABSTRACT
We have assembled de novo the Escherichia coli K-12 MG1655 chromosome in a single 4.6-Mb contig using only nanopore data. Our method has three stages: (i) overlaps are detected between reads and then corrected by a multiple-alignment process; (ii) corrected reads are assembled using the Celera Assembler; and (iii) the assembly is polished using a probabilistic model of the signal-level data. The assembly reconstructs gene order and has 99.5% nucleotide identity.
Subject(s)
Computational Biology/methods , Escherichia coli K12/genetics , Genome, Bacterial , Nanopores , Nanotechnology/methods , Sequence Analysis, DNA/methods , Algorithms , Contig Mapping/methods , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Software
ABSTRACT
A 9-month-old infant died from Ebola virus (EBOV) disease with unknown epidemiological link. While her parents did not report previous illness, laboratory investigations revealed persisting EBOV RNA in the mother's breast milk and the father's seminal fluid. Genomic analysis strongly suggests EBOV transmission to the child through breastfeeding.
Subject(s)
Ebolavirus/isolation & purification , Hemorrhagic Fever, Ebola/transmission , Infectious Disease Transmission, Vertical , Milk, Human/virology , Adult , Cluster Analysis , Female , Humans , Infant , Male , Phylogeny , RNA, Viral/genetics , RNA, Viral/isolation & purification , Semen/virology , Sequence Analysis, DNA , Sequence Homology , Young Adult
ABSTRACT
Shotgun sequencing enables the reconstruction of genomes from complex microbial communities, but because assembly does not reconstruct entire genomes, it is necessary to bin genome fragments. Here we present CONCOCT, a new algorithm that combines sequence composition and coverage across multiple samples, to automatically cluster contigs into genomes. We demonstrate high recall and precision on artificial as well as real human gut metagenome data sets.
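The idea of combining sequence composition with coverage across samples can be illustrated with a toy example. This is a minimal sketch and not CONCOCT's implementation: the dinucleotide profile, the per-sample coverage fractions, the Euclidean nearest-neighbour comparison, and all contig names, sequences and depths are assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import product

# All 16 dinucleotides, in a fixed order, define the composition axes.
KMERS = ["".join(p) for p in product("ACGT", repeat=2)]


def composition(seq):
    """Dinucleotide frequencies with a pseudocount of 1 per k-mer."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[k] for k in KMERS) + len(KMERS)
    return [(counts[k] + 1) / total for k in KMERS]


def coverage_fraction(depths):
    """Share of a contig's total coverage found in each sample."""
    total = sum(depths)
    return [d / total for d in depths]


def features(seq, depths):
    """Concatenate composition and coverage into one feature vector."""
    return composition(seq) + coverage_fraction(depths)


# Hypothetical contigs: (sequence, mean depth in sample 1 and sample 2).
contigs = {
    "a1": ("ATATATATTAATATAT", [10, 2]),
    "a2": ("TATATAATATTATATA", [12, 3]),
    "b1": ("GCGCGGCCGCGCGGCG", [1, 20]),
    "b2": ("CGCGGCGCCGCGCGGC", [2, 25]),
}
vectors = {name: features(seq, cov) for name, (seq, cov) in contigs.items()}


def nearest(name):
    """Name of the other contig closest to `name` in feature space."""
    others = (other for other in vectors if other != name)
    return min(others, key=lambda other: math.dist(vectors[name], vectors[other]))


for name in contigs:
    print(name, "->", nearest(name))
```

Contigs that share both composition and a coverage pattern across samples end up close together, which is the signal a binning method can then cluster on.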
Subject(s)
Contig Mapping/methods , Gastrointestinal Tract/microbiology , Genome, Bacterial/genetics , Metagenome/genetics , Metagenomics/methods , Microbiota/genetics , Sequence Analysis, DNA/methods , Software , Algorithms , Bifidobacterium/genetics , Escherichia coli K12/genetics , Feces/microbiology , Humans , Shiga-Toxigenic Escherichia coli/genetics
ABSTRACT
We report on an Ebola virus disease (EVD) survivor who showed Ebola virus in seminal fluid 531 days after onset of disease. The persisting virus was sexually transmitted in February 2016, about 470 days after onset of symptoms, and caused a new cluster of EVD in Guinea and Liberia.
Subject(s)
Disease Outbreaks , Ebolavirus/genetics , Hemorrhagic Fever, Ebola , Semen/virology , Sexually Transmitted Diseases, Viral , Ebolavirus/isolation & purification , Female , Guinea , Hemorrhagic Fever, Ebola/transmission , Hemorrhagic Fever, Ebola/virology , Humans , Male , Polymerase Chain Reaction , RNA, Viral/analysis , Sexually Transmitted Diseases, Viral/transmission , Sexually Transmitted Diseases, Viral/virology , Survivors
ABSTRACT
In October 2015, a new case of Ebola virus disease in Guinea was detected. Case investigation, serology, and whole-genome sequencing indicated possible transmission of the virus from an Ebola virus disease survivor to another person and then to the case-patient reported here. This transmission chain over 11 months suggests slow Ebola virus evolution.
Subject(s)
Disease Outbreaks , Ebolavirus , Hemorrhagic Fever, Ebola/epidemiology , Hemorrhagic Fever, Ebola/transmission , Child , Ebolavirus/classification , Ebolavirus/genetics , Female , Guinea/epidemiology , Hemorrhagic Fever, Ebola/history , Hemorrhagic Fever, Ebola/virology , History, 21st Century , Humans , Male , Phylogeny , Population Surveillance , Seroepidemiologic Studies
ABSTRACT
Laboratory-based evolution and whole-genome sequencing can link genotype and phenotype. We used evolution of acid resistance in exponential phase Escherichia coli to study resistance to a lethal stress. Iterative selection at pH 2.5 generated five populations that were resistant to low pH in early exponential phase. Genome sequencing revealed multiple mutations, but the only gene mutated in all strains was evgS, part of a two-component system that has already been implicated in acid resistance. All these mutations were in the cytoplasmic PAS domain of EvgS, and were shown to be solely responsible for the resistant phenotype, causing strong upregulation at neutral pH of genes normally induced by low pH. Resistance to pH 2.5 in these strains did not require the transporter GadC, or the sigma factor RpoS. We found that EvgS-dependent constitutive acid resistance to pH 2.5 was retained in the absence of the regulators GadE or YdeO, but was lost if the oxidoreductase YdeP was also absent. A deletion in the periplasmic domain of EvgS abolished the response to low pH, but not the activity of the constitutive mutants. On the basis of these results we propose a model for how EvgS may become activated by low pH.
Subject(s)
Acids/metabolism , Escherichia coli Proteins/genetics , Escherichia coli/enzymology , Evolution, Molecular , Protein Kinases/genetics , Amino Acid Sequence , Escherichia coli/chemistry , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/metabolism , Hydrogen-Ion Concentration , Models, Molecular , Molecular Sequence Data , Mutation , Protein Kinases/chemistry , Protein Kinases/metabolism , Protein Structure, Tertiary
ABSTRACT
MOTIVATION: Nanopore sequencing may be the next disruptive technology in genomics, owing to its ability to detect single DNA molecules without prior amplification, lack of reliance on expensive optical components, and the ability to sequence long fragments. The MinION™ from Oxford Nanopore Technologies (ONT) is the first nanopore sequencer to be commercialized and is now available to early-access users. The MinION™ is a USB-connected, portable nanopore sequencer that permits real-time analysis of streaming event data. Currently, the research community lacks a standardized toolkit for the analysis of nanopore datasets. RESULTS: We introduce poretools, a flexible toolkit for exploring datasets generated by nanopore sequencing devices from MinION™ for the purposes of quality control and downstream analysis. Poretools operates directly on the native FAST5 (an application of the HDF5 standard) file format produced by ONT and provides a wealth of format conversion utilities and data exploration and visualization tools. AVAILABILITY AND IMPLEMENTATION: Poretools is open-source software written in Python as both a suite of command line utilities and a Python application programming interface. Source code is freely available on GitHub at https://www.github.com/arq5x/poretools.
Subject(s)
Nanopores , Sequence Analysis, DNA/methods , Software , Sequence Analysis, DNA/standards
ABSTRACT
BACKGROUND: The study of microbial communities has been revolutionised in recent years by the widespread adoption of culture independent analytical techniques such as 16S rRNA gene sequencing and metagenomics. One potential confounder of these sequence-based approaches is the presence of contamination in DNA extraction kits and other laboratory reagents. RESULTS: In this study we demonstrate that contaminating DNA is ubiquitous in commonly used DNA extraction kits and other laboratory reagents, varies greatly in composition between different kits and kit batches, and that this contamination critically impacts results obtained from samples containing a low microbial biomass. Contamination impacts both PCR-based 16S rRNA gene surveys and shotgun metagenomics. We provide an extensive list of potential contaminating genera, and guidelines on how to mitigate the effects of contamination. CONCLUSIONS: These results suggest that caution should be advised when applying sequence-based techniques to the study of microbiota present in low biomass environments. Concurrent sequencing of negative control samples is strongly advised.
Subject(s)
DNA Contamination , Indicators and Reagents/analysis , Laboratories , Metagenomics , Microbiota , Salmonella/genetics , Polymerase Chain Reaction , RNA, Ribosomal, 16S/analysis , Sequence Analysis, DNA
ABSTRACT
An outbreak caused by Shiga-toxin-producing Escherichia coli O104:H4 occurred in Germany in May and June of 2011, with more than 3000 persons infected. Here, we report a cluster of cases associated with a single family and describe an open-source genomic analysis of an isolate from one member of the family. This analysis involved the use of rapid, bench-top DNA sequencing technology, open-source data release, and prompt crowd-sourced analyses. In less than a week, these studies revealed that the outbreak strain belonged to an enteroaggregative E. coli lineage that had acquired genes for Shiga toxin 2 and for antibiotic resistance.
Subject(s)
Escherichia coli Infections/microbiology , Genome, Bacterial , Hemolytic-Uremic Syndrome/microbiology , Shiga-Toxigenic Escherichia coli/genetics , Adolescent , Bacterial Typing Techniques , Child , Diarrhea/epidemiology , Diarrhea/microbiology , Feces/microbiology , Female , Germany , Hemolytic-Uremic Syndrome/epidemiology , Humans , Male , Molecular Sequence Data , Phylogeny , Plasmids/genetics , Polymerase Chain Reaction , Sequence Analysis, DNA , Shiga-Toxigenic Escherichia coli/classification , Shiga-Toxigenic Escherichia coli/isolation & purification
ABSTRACT
The risk to human health from mosquito-borne viruses such as dengue, chikungunya and yellow fever is increasing due to increased human expansion, deforestation and climate change. To anticipate and predict the spread and transmission of mosquito-borne viruses, a better understanding of the transmission cycle in mosquito populations is needed. We present a pathogen-agnostic combined sequencing protocol for identifying vectors, viral pathogens and their hosts or reservoirs using portable Oxford Nanopore sequencing. Using mosquitoes collected in São Paulo, Brazil, we extracted RNA for virus identification and DNA for blood meal and mosquito identification. Mosquitoes and blood meals were identified by comparing cytochrome c oxidase I (COI) sequences against a curated Barcode of Life Data System (BOLD). Viruses were identified using the SMART-9N protocol, which allows amplified DNA to be prepared with native barcoding for nanopore sequencing. Kraken 2 was employed to detect viral pathogens and Minimap2 and BOLD identified the contents of the blood meal. Due to the high similarity of some species, mosquito identification was conducted using BLAST after generation of consensus COI sequences using RACON polishing. This protocol can simultaneously uncover viral diversity, mosquito species and mosquito feeding habits. It also has the potential to increase understanding of mosquito genetic diversity and transmission dynamics of zoonotic mosquito-borne viruses.
Subject(s)
Arboviruses , Culicidae , Nanopore Sequencing , Animals , Humans , Culicidae/genetics , Arboviruses/genetics , Mosquito Vectors , Brazil , DNA
ABSTRACT
IMPORTANCE: Identification of the bacterium responsible for an outbreak can aid in disease management. However, traditional culture-based diagnosis can be difficult, particularly if no specific diagnostic test is available for an outbreak strain. OBJECTIVE: To explore the potential of metagenomics, which is the direct sequencing of DNA extracted from microbiologically complex samples, as an open-ended clinical discovery platform capable of identifying and characterizing bacterial strains from an outbreak without laboratory culture. DESIGN, SETTING, AND PATIENTS: In a retrospective investigation, 45 samples were selected from fecal specimens obtained from patients with diarrhea during the 2011 outbreak of Shiga-toxigenic Escherichia coli (STEC) O104:H4 in Germany. Samples were subjected to high-throughput sequencing (August-September 2012), followed by a 3-phase analysis (November 2012-February 2013). In phase 1, a de novo assembly approach was developed to obtain a draft genome of the outbreak strain. In phase 2, the depth of coverage of the outbreak strain genome was determined in each sample. In phase 3, sequences from each sample were compared with sequences from known bacteria to identify pathogens other than the outbreak strain. MAIN OUTCOMES AND MEASURES: The recovery of genome sequence data for the purposes of identification and characterization of the outbreak strain and other pathogens from fecal samples. RESULTS: During phase 1, a draft genome of the STEC outbreak strain was obtained. During phase 2, the outbreak strain genome was recovered from 10 samples at greater than 10-fold coverage and from 26 samples at greater than 1-fold coverage. Sequences from the Shiga-toxin genes were detected in 27 of 40 STEC-positive samples (67%). In phase 3, sequences from Clostridium difficile, Campylobacter jejuni, Campylobacter concisus, and Salmonella enterica were recovered. 
CONCLUSIONS AND RELEVANCE: These results suggest the potential of metagenomics as a culture-independent approach for the identification of bacterial pathogens during an outbreak of diarrheal disease. Challenges include improving diagnostic sensitivity, speeding up and simplifying workflows, and reducing costs.
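The phase 2 measure, depth of coverage per sample, can be sketched as total aligned bases divided by genome length, followed by the greater-than-10-fold and greater-than-1-fold thresholds used in the results. This is a minimal sketch under stated assumptions: the genome length, sample names and aligned-base totals are hypothetical and are not taken from the study.

```python
# Hypothetical outbreak-strain genome length in bases (illustrative only).
GENOME_LENGTH = 5_300_000

# Hypothetical totals of read bases aligned to the outbreak genome per sample.
aligned_bases = {
    "sample_1": 80_000_000,
    "sample_2": 9_000_000,
    "sample_3": 1_200_000,
}


def mean_depth(bases, genome_length=GENOME_LENGTH):
    """Mean fold coverage: aligned bases divided by genome length."""
    return bases / genome_length


depths = {name: mean_depth(bases) for name, bases in aligned_bases.items()}

# Samples passing each threshold; the >1-fold set includes the >10-fold set.
over_10x = [name for name, depth in depths.items() if depth > 10]
over_1x = [name for name, depth in depths.items() if depth > 1]

for name, depth in depths.items():
    print(f"{name}: {depth:.2f}-fold")
print("greater than 10-fold:", over_10x)
print("greater than 1-fold:", over_1x)
```

A real workflow would derive the aligned-base totals from read alignments; only the thresholding logic is shown here.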