ABSTRACT
We present evidence for multiple independent origins of recombinant SARS-CoV-2 viruses sampled from late 2020 and early 2021 in the United Kingdom. Their genomes carry single-nucleotide polymorphisms and deletions that are characteristic of the B.1.1.7 variant of concern but lack the full complement of lineage-defining mutations. Instead, the remainder of their genomes share contiguous genetic variation with non-B.1.1.7 viruses circulating in the same geographic area at the same time as the recombinants. In four instances, there was evidence for onward transmission of a recombinant-origin virus, including one transmission cluster of 45 sequenced cases over the course of 2 months. The inferred genomic locations of recombination breakpoints suggest that every community-transmitted recombinant virus inherited its spike region from a B.1.1.7 parental virus, consistent with a transmission advantage for B.1.1.7's set of mutations.
Subjects
COVID-19/epidemiology, COVID-19/transmission, Pandemics, Genetic Recombination, SARS-CoV-2/genetics, Base Sequence/genetics, COVID-19/virology, Computational Biology/methods, Gene Frequency, Viral Genome, Genotype, Humans, Mutation, Phylogeny, Single Nucleotide Polymorphism, United Kingdom/epidemiology, Whole Genome Sequencing/methods
ABSTRACT
Global dispersal and increasing frequency of the SARS-CoV-2 spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis for positive selection of spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability of a large dataset, well represented by both spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant.
Subjects
Amino Acid Substitution, COVID-19/transmission, COVID-19/virology, SARS-CoV-2/genetics, SARS-CoV-2/pathogenicity, Coronavirus Spike Glycoprotein/genetics, Aspartic Acid/analysis, Aspartic Acid/genetics, COVID-19/epidemiology, Viral Genome, Glycine/analysis, Glycine/genetics, Humans, Mutation, SARS-CoV-2/growth & development, United Kingdom/epidemiology, Virulence, Whole Genome Sequencing
ABSTRACT
MOTIVATION: Population-level genetic variation enables competitiveness and niche specialization in microbial communities. Despite the difficulty in culturing many microbes from an environment, we can still study these communities by isolating and sequencing DNA directly from an environment (metagenomics). Recovering the genomic sequences of all isoforms of a given gene across all organisms in a metagenomic sample would aid evolutionary and ecological insights into microbial ecosystems with potential benefits for medicine and biotechnology. A significant obstacle to this goal arises from the lack of a computationally tractable solution that can recover these sequences from sequenced read fragments. This poses a problem analogous to reconstructing the two sequences that make up the genome of a diploid organism (i.e. haplotypes) but for an unknown number of individuals and haplotypes. RESULTS: The problem of single individual haplotyping was first formalized by Lancia et al. in 2001. Now, nearly two decades later, we discuss the complexity of 'haplotyping' metagenomic samples, with a new formalization of Lancia et al.'s data structure that allows us to effectively extend the single individual haplotype problem to microbial communities. This work describes and formalizes the problem of recovering genes (and other genomic subsequences) from all individuals within a complex community sample, which we term the metagenomic individual haplotyping problem. We also provide software implementations for a pairwise single nucleotide variant (SNV) co-occurrence matrix and greedy graph traversal algorithm. AVAILABILITY AND IMPLEMENTATION: Our reference implementation of the described pairwise SNV matrix (Hansel) and greedy haplotype path traversal algorithm (Gretel) is open source, MIT licensed and freely available online at github.com/samstudio8/hansel and github.com/samstudio8/gretel, respectively.
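The two components named in this abstract, a pairwise SNV co-occurrence matrix and a greedy traversal over it, can be illustrated with a toy sketch. This is not the Hansel or Gretel API: the function names, the read representation (a dict from SNV site index to observed base), and the simplification of conditioning only on the immediately preceding site are all assumptions made for illustration.

```python
from collections import defaultdict


def build_cooccurrence(reads):
    """Count pairs of alleles observed together on the same read.

    Each read is a dict mapping an SNV site index to the base seen there
    (a hypothetical representation chosen for this sketch).
    Keys of the result are (site_i, base_i, site_j, base_j) with site_i < site_j.
    """
    counts = defaultdict(int)
    for read in reads:
        sites = sorted(read)
        for pos, i in enumerate(sites):
            for j in sites[pos + 1:]:
                counts[(i, read[i], j, read[j])] += 1
    return counts


def greedy_haplotype(counts, n_sites, alphabet="ACGT"):
    """Greedily pick one base per site, left to right (needs n_sites >= 2).

    Simplification: each choice depends only on the base chosen at the
    previous site, using raw co-occurrence counts as support.
    """
    def support(i, a, j, b):
        return counts.get((i, a, j, b), 0)

    # Seed with the base at site 0 that has the most evidence linking it to site 1.
    first = max(alphabet, key=lambda a: sum(support(0, a, 1, b) for b in alphabet))
    path = [first]
    for j in range(1, n_sites):
        prev = path[-1]
        path.append(max(alphabet, key=lambda b: support(j - 1, prev, j, b)))
    return "".join(path)
```

Given reads that mostly link A at site 0 to C at site 1, and C at site 1 to G at site 2, the traversal returns the best-supported path through those three sites. Recovering further haplotypes would require down-weighting the evidence already used, which this sketch does not attempt.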
ABSTRACT
We present Goldilocks: a Python package providing functionality for collecting summary statistics, identifying shifts in variation, discovering outlier regions and locating and extracting interesting regions from one or more arbitrary genomes for further analysis, for a user-provided definition of interesting. AVAILABILITY AND IMPLEMENTATION: Goldilocks is freely available open-source software distributed under the MIT licence. Source code is hosted publicly at https://github.com/SamStudio8/goldilocks and the package may also be installed using pip install goldilocks. Documentation can be found at https://goldilocks.readthedocs.org. CONTACT: msn@aber.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
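The idea of collecting a summary statistic across a genome and then locating the most "interesting" region can be sketched in plain Python. This does not use the Goldilocks package or reproduce its interface: the function names, the fixed window and step sizes, and the choice of G/C count as the statistic are assumptions for illustration only.

```python
def window_counts(sequence, window, step, targets="GC"):
    """Return (start, count) for every full window along the sequence.

    count is the number of characters in the window that appear in `targets`.
    """
    results = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window].upper()
        results.append((start, sum(chunk.count(t) for t in targets)))
    return results


def most_interesting(results):
    """Pick the window with the highest count; here 'interesting' means 'max'."""
    return max(results, key=lambda item: item[1])
```

For the sequence `AAGCGCAATT` with a window of 4 and a step of 2, the window starting at offset 2 (`GCGC`) has the highest G/C count. Swapping the key function in `most_interesting` is where a different user-provided definition of interesting would go.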
Subjects
Computational Biology/methods, Genomics/methods, Software
ABSTRACT
The scale of data produced during the SARS-CoV-2 pandemic has been unprecedented, with more than 13 million sequences shared publicly at the time of writing. This wealth of sequence data provides important context for interpreting local outbreaks. However, placing sequences of interest into national and international context is difficult given the size of the global dataset. Often outbreak investigations and genomic surveillance efforts require running similar analyses again and again on the latest dataset and producing reports. We developed civet (cluster investigation and virus epidemiology tool) to aid these routine analyses and facilitate virus outbreak investigation and surveillance. Civet can place sequences of interest in the local context of background diversity, resolving the query into different 'catchments' and presenting the phylogenetic results alongside metadata in an interactive, distributable report. Civet can be used on a fine scale for clinical outbreak investigation, for local surveillance and cluster discovery, and to routinely summarise the virus diversity circulating on a national level. Civet reports have helped researchers and public health bodies feed back genomic information in the appropriate context within a timeframe that is useful for public health.
ABSTRACT
BACKGROUND: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. RESULTS: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. CONCLUSIONS: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI's BioSample database.
Subjects
COVID-19, SARS-CoV-2, Genomics, Humans, Metadata, Public Health, Reproducibility of Results
ABSTRACT
In response to the ongoing SARS-CoV-2 pandemic in the UK, the COVID-19 Genomics UK (COG-UK) consortium was formed to rapidly sequence SARS-CoV-2 genomes as part of a national-scale genomic surveillance strategy. The network consists of universities, academic institutes, regional sequencing centres and the four UK Public Health Agencies. We describe the development and deployment of CLIMB-COVID, an encompassing digital infrastructure to address the challenge of collecting and integrating both genomic sequencing data and sample-associated metadata produced across the COG-UK network.
Subjects
Cloud Computing, Genomics/organization & administration, SARS-CoV-2/genetics, COVID-19/epidemiology, Epidemiological Monitoring, Viral Genome, Humans, DNA Sequence Analysis, United Kingdom, User-Computer Interface, Whole Genome Sequencing
ABSTRACT
Extensive global sampling and sequencing of the pandemic virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have enabled researchers to monitor its spread and to identify concerning new variants. Two important determinants of variant spread are how frequently they arise within individuals and how likely they are to be transmitted. To characterize within-host diversity and transmission, we deep-sequenced 1313 clinical samples from the United Kingdom. SARS-CoV-2 infections are characterized by low levels of within-host diversity when viral loads are high and by a narrow bottleneck at transmission. Most variants are either lost or occasionally fixed at the point of transmission, with minimal persistence of shared diversity, patterns that are readily observable on the phylogenetic tree. Our results suggest that transmission-enhancing and/or immune-escape SARS-CoV-2 variants are likely to arise infrequently but could spread rapidly if successfully transmitted.
Subjects
COVID-19/transmission, COVID-19/virology, Genetic Variation, SARS-CoV-2/genetics, COVID-19/immunology, Coinfection/virology, Coronavirus Infections/virology, Human Coronavirus OC43, Family Characteristics, Viral Genome, Humans, Immune Evasion, Mutation, Phylogeny, Viral RNA/genetics, RNA-Seq, SARS-CoV-2/pathogenicity, SARS-CoV-2/physiology, Genetic Selection, Coronavirus Spike Glycoprotein/genetics, United Kingdom, Viral Load
ABSTRACT
The United Kingdom's COVID-19 epidemic during early 2020 was one of the world's largest and was unusually well represented by virus genomic sampling. We determined the fine-scale genetic lineage structure of this epidemic through analysis of 50,887 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes, including 26,181 from the UK sampled throughout the country's first wave of infection. Using large-scale phylogenetic analyses combined with epidemiological and travel data, we quantified the size, spatiotemporal origins, and persistence of genetically distinct UK transmission lineages. Rapid fluctuations in virus importation rates resulted in >1000 lineages; those introduced prior to national lockdown tended to be larger and more dispersed. Lineage importation and regional lineage diversity declined after lockdown, whereas lineage elimination was size-dependent. We discuss the implications of our genetic perspective on transmission dynamics for COVID-19 epidemiology and control.
Subjects
COVID-19/epidemiology, COVID-19/virology, Viral Genome, SARS-CoV-2/genetics, COVID-19/prevention & control, COVID-19/transmission, Chain of Infection, Communicable Disease Control, Imported Communicable Diseases/epidemiology, Imported Communicable Diseases/virology, Epidemics, Humans, Phylogeny, Travel, United Kingdom/epidemiology
ABSTRACT
Brazil currently has one of the fastest-growing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemics in the world. Because of limited available data, assessments of the impact of nonpharmaceutical interventions (NPIs) on the spread of this virus remain challenging. Using a mobility-driven transmission model, we show that NPIs reduced the reproduction number from >3 to between 1 and 1.6 in São Paulo and Rio de Janeiro. Sequencing of 427 new genomes and analysis of a geographically representative genomic dataset identified >100 international virus introductions in Brazil. We estimate that most (76%) of the Brazilian strains fell in three clades that were introduced from Europe between 22 February and 11 March 2020. During the early epidemic phase, we found that SARS-CoV-2 spread mostly locally and within state borders. After this period, despite sharp decreases in air travel, we estimated multiple exportations from large urban centers that coincided with a 25% increase in average traveled distances in national flights. This study sheds new light on the epidemic transmission and evolutionary trajectories of SARS-CoV-2 lineages in Brazil and provides evidence that current interventions remain insufficient to keep virus transmission under control in this country.
Subjects
Betacoronavirus/genetics, Coronavirus Infections/epidemiology, Coronavirus Infections/transmission, Viral Pneumonia/epidemiology, Viral Pneumonia/transmission, Basic Reproduction Number, Bayes Theorem, Betacoronavirus/classification, Brazil/epidemiology, COVID-19, COVID-19 Testing, Cities/epidemiology, Clinical Laboratory Techniques, Coronavirus Infections/diagnosis, Coronavirus Infections/prevention & control, Coronavirus Infections/virology, Europe, Molecular Evolution, Viral Genome, Humans, Genetic Models, Statistical Models, Pandemics/prevention & control, Phylogeny, Phylogeography, Viral Pneumonia/prevention & control, Viral Pneumonia/virology, SARS-CoV-2, Spatio-Temporal Analysis, Travel, Urban Population
ABSTRACT
BACKGROUND: Long sequencing reads are information-rich, aiding de novo assembly and reference mapping, and consequently have great potential for the study of microbial communities. However, the best approaches for analysis of long-read metagenomic data are unknown. Additionally, rigorous evaluation of bioinformatics tools is hindered by a lack of long-read data from validated samples with known composition. FINDINGS: We sequenced 2 commercially available mock communities containing 10 microbial species (ZymoBIOMICS Microbial Community Standards) with Oxford Nanopore GridION and PromethION. Both communities and the 10 individual species isolates were also sequenced with Illumina technology. We generated 14 and 16 gigabase pairs from 2 GridION flowcells and 150 and 153 gigabase pairs from 2 PromethION flowcells for the evenly distributed and log-distributed communities, respectively. Read length N50 ranged between 5.3 and 5.4 kilobase pairs over the 4 sequencing runs. Basecalls and corresponding signal data are made available (4.2 TB in total). Alignment to Illumina-sequenced isolates demonstrated the expected microbial species at anticipated abundances, with the limit of detection for the lowest abundance species below 50 cells (GridION). De novo assembly of metagenomes recovered long contiguous sequences without the need for pre-processing techniques such as binning. CONCLUSIONS: We present ultra-deep, long-read nanopore datasets from a well-defined mock community. These datasets will be useful for those developing bioinformatics methods for long-read metagenomics and for the validation and comparison of current laboratory and software pipelines.
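For readers unfamiliar with the read length N50 statistic reported in this abstract, the following minimal sketch shows how it is conventionally computed: the length L such that reads of length L or longer contain at least half of all sequenced bases. The function name and the list-of-integers input are assumptions for illustration; the abstract does not say which tool the authors used to compute it.

```python
def n50(lengths):
    """Return the read length N50 for an iterable of read lengths.

    Reads are sorted longest first and accumulated until the running
    total reaches half of all bases; the length of the read that crosses
    that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no read lengths given")
```

With read lengths of 2, 3, 4, 5 and 6 bases (20 bases in total), the two longest reads together hold 11 bases, which passes the halfway mark of 10, so the N50 is 5.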