1.
Cell ; 181(5): 997-1003.e9, 2020 05 28.
Article in English | MEDLINE | ID: mdl-32359424

ABSTRACT

Coronavirus disease 2019 (COVID-19) is caused by SARS-CoV-2 infection and was first reported in central China in December 2019. Extensive molecular surveillance in Guangdong, China's most populous province, during early 2020 resulted in 1,388 reported RNA-positive cases from 1.6 million tests. In order to understand the molecular epidemiology and genetic diversity of SARS-CoV-2 in China, we generated 53 genomes from infected individuals in Guangdong using a combination of metagenomic sequencing and tiling amplicon approaches. Combined epidemiological and phylogenetic analyses indicate multiple independent introductions to Guangdong, although phylogenetic clustering is uncertain because of low virus genetic variation early in the pandemic. Our results illustrate how the timing, size, and duration of putative local transmission chains were constrained by national travel restrictions and by the province's large-scale intensive surveillance and intervention measures. Despite these successes, COVID-19 surveillance in Guangdong is still required, because the number of cases imported from other countries has increased.


Subject(s)
Betacoronavirus/genetics , Coronavirus Infections/epidemiology , Pneumonia, Viral/epidemiology , Bayes Theorem , COVID-19 , China/epidemiology , Coronavirus Infections/virology , Epidemiological Monitoring , Humans , Likelihood Functions , Pandemics , Pneumonia, Viral/virology , SARS-CoV-2 , Travel
2.
Nature ; 627(8002): 182-188, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38267579

ABSTRACT

The origins of treponemal diseases have long remained unknown, especially considering the sudden onset of the first syphilis epidemic in the late 15th century in Europe and its hypothesized arrival from the Americas with Columbus' expeditions [1,2]. Recently, ancient DNA evidence has revealed various treponemal infections circulating in early modern Europe and colonial-era Mexico [3-6]. However, there has been to our knowledge no genomic evidence of treponematosis recovered from either the Americas or the Old World that can be reliably dated to the time before the first trans-Atlantic contacts. Here, we present treponemal genomes from nearly 2,000-year-old human remains from Brazil. We reconstruct four ancient genomes of a prehistoric treponemal pathogen, most closely related to the bejel-causing agent Treponema pallidum endemicum. Contradicting the modern-day geographical niche of bejel in the arid regions of the world, the results call into question the previous palaeopathological characterization of treponeme subspecies and showcase their adaptive potential. A high-coverage genome is used to improve molecular clock date estimations, placing the divergence of modern T. pallidum subspecies firmly in pre-Columbian times. Overall, our study demonstrates the opportunities within archaeogenetics to uncover key events in pathogen evolution and emergence, paving the way to new hypotheses on the origin and spread of treponematoses.


Subject(s)
Evolution, Molecular , Genome, Bacterial , Treponema pallidum , Treponemal Infections , Humans , Brazil/epidemiology , Brazil/ethnology , Europe/epidemiology , Genome, Bacterial/genetics , History, 15th Century , History, Ancient , Syphilis/epidemiology , Syphilis/history , Syphilis/microbiology , Syphilis/transmission , Treponema pallidum/classification , Treponema pallidum/genetics , Treponema pallidum/isolation & purification , Treponemal Infections/epidemiology , Treponemal Infections/history , Treponemal Infections/microbiology , Treponemal Infections/transmission
3.
Nature ; 610(7930): 154-160, 2022 10.
Article in English | MEDLINE | ID: mdl-35952712

ABSTRACT

The SARS-CoV-2 Delta (Pango lineage B.1.617.2) variant of concern spread globally, causing resurgences of COVID-19 worldwide [1,2]. The emergence of the Delta variant in the UK occurred on the background of a heterogeneous landscape of immunity and relaxation of non-pharmaceutical interventions. Here we analyse 52,992 SARS-CoV-2 genomes from England together with 93,649 genomes from the rest of the world to reconstruct the emergence of Delta and quantify its introduction to and regional dissemination across England in the context of changing travel and social restrictions. Using analysis of human movement, contact tracing and virus genomic data, we find that the geographic focus of the expansion of Delta shifted from India to a more global pattern in early May 2021. In England, Delta lineages were introduced more than 1,000 times and spread nationally as non-pharmaceutical interventions were relaxed. We find that hotel quarantine for travellers reduced onward transmission from importations; however, the transmission chains that later dominated the Delta wave in England were seeded before travel restrictions were introduced. Increasing inter-regional travel within England drove the nationwide dissemination of Delta, with some cities receiving more than 2,000 observable lineage introductions from elsewhere. Subsequently, increased levels of local population mixing, and not the number of importations, were associated with the faster relative spread of Delta. The invasion dynamics of Delta depended on spatial heterogeneity in contact patterns, and our findings will inform optimal spatial interventions to reduce the transmission of current and future variants of concern, such as Omicron (Pango lineage B.1.1.529).


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/transmission , COVID-19/virology , Cities/epidemiology , Contact Tracing , England/epidemiology , Genome, Viral/genetics , Humans , Quarantine/legislation & jurisprudence , SARS-CoV-2/genetics , SARS-CoV-2/growth & development , SARS-CoV-2/isolation & purification , Travel/legislation & jurisprudence
4.
Nature ; 603(7902): 679-686, 2022 03.
Article in English | MEDLINE | ID: mdl-35042229

ABSTRACT

The SARS-CoV-2 epidemic in southern Africa has been characterized by three distinct waves. The first was associated with a mix of SARS-CoV-2 lineages, while the second and third waves were driven by the Beta (B.1.351) and Delta (B.1.617.2) variants, respectively [1-3]. In November 2021, genomic surveillance teams in South Africa and Botswana detected a new SARS-CoV-2 variant associated with a rapid resurgence of infections in Gauteng province, South Africa. Within three days of the first genome being uploaded, it was designated a variant of concern (Omicron, B.1.1.529) by the World Health Organization and, within three weeks, had been identified in 87 countries. The Omicron variant is exceptional for carrying over 30 mutations in the spike glycoprotein, which are predicted to influence antibody neutralization and spike function [4]. Here we describe the genomic profile and early transmission dynamics of Omicron, highlighting the rapid spread in regions with high levels of population immunity.


Subject(s)
COVID-19/epidemiology , COVID-19/virology , Immune Evasion , SARS-CoV-2/isolation & purification , Antibodies, Neutralizing/immunology , Botswana/epidemiology , COVID-19/immunology , COVID-19/transmission , Humans , Models, Molecular , Mutation , Phylogeny , Recombination, Genetic , SARS-CoV-2/classification , SARS-CoV-2/immunology , South Africa/epidemiology , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/immunology
5.
Mol Biol Evol ; 39(2), 2022 02 03.
Article in English | MEDLINE | ID: mdl-35038728

ABSTRACT

High-throughput sequencing enables rapid genome sequencing during infectious disease outbreaks and provides an opportunity to quantify the evolutionary dynamics of pathogens in near real-time. One difficulty of undertaking evolutionary analyses over short timescales is the dependency of the inferred evolutionary parameters on the timespan of observation. Crucially, there are an increasing number of molecular clock analyses using external evolutionary rate priors to infer evolutionary parameters. However, it is not clear which rate prior is appropriate for a given time window of observation due to the time-dependent nature of evolutionary rate estimates. Here, we characterize the molecular evolutionary dynamics of SARS-CoV-2 and 2009 pandemic H1N1 (pH1N1) influenza during the first 12 months of their respective pandemics. We use Bayesian phylogenetic methods to estimate the dates of emergence, evolutionary rates, and growth rates of SARS-CoV-2 and pH1N1 over time and investigate how varying sampling window and data set sizes affect the accuracy of parameter estimation. We further use a generalized McDonald-Kreitman test to estimate the number of segregating nonneutral sites over time. We find that the inferred evolutionary parameters for both pandemics are time dependent, and that the inferred rates of SARS-CoV-2 and pH1N1 decline by ∼50% and ∼100%, respectively, over the course of 1 year. After at least 4 months since the start of sequence sampling, inferred growth rates and emergence dates remain relatively stable and can be inferred reliably using a logistic growth coalescent model. We show that the time dependency of the mean substitution rate is due to elevated substitution rates at terminal branches which are 2-4 times higher than those of internal branches for both viruses. The elevated rate at terminal branches is strongly correlated with an increasing number of segregating nonneutral sites, demonstrating the role of purifying selection in generating the time dependency of evolutionary parameters during pandemics.


Subject(s)
COVID-19 , Influenza A Virus, H1N1 Subtype , Influenza, Human , Bayes Theorem , Humans , Influenza A Virus, H1N1 Subtype/genetics , Influenza, Human/epidemiology , Phylogeny , SARS-CoV-2
6.
PLoS Comput Biol ; 18(2): e1009805, 2022 02.
Article in English | MEDLINE | ID: mdl-35148311

ABSTRACT

Inferring the dynamics of pathogen transmission during an outbreak is an important problem in infectious disease epidemiology. In mathematical epidemiology, estimates are often informed by time series of confirmed cases, while in phylodynamics genetic sequences of the pathogen, sampled through time, are the primary data source. Each type of data provides different, and potentially complementary, insight. Recent studies have recognised that combining data sources can improve estimates of the transmission rate and the number of infected individuals. However, inference methods are typically highly specialised and field-specific and are either computationally prohibitive or require intensive simulation, limiting their real-time utility. We present a novel birth-death phylogenetic model and derive a tractable analytic approximation of its likelihood, the computational complexity of which is linear in the size of the dataset. This approach combines epidemiological and phylodynamic data to produce estimates of key parameters of transmission dynamics and the unobserved prevalence. Using simulated data, we show (a) that the approximation agrees well with existing methods, (b) validate the claim of linear complexity and (c) explore robustness to model misspecification. This approximation facilitates inference on large datasets, which is increasingly important as large genomic sequence datasets become commonplace.


Subject(s)
Disease Outbreaks , Genomics , Computer Simulation , Humans , Phylogeny , Probability
7.
Emerg Infect Dis ; 28(4): 751-758, 2022 04.
Article in English | MEDLINE | ID: mdl-35203112

ABSTRACT

Limited genomic sampling in many high-incidence countries has impeded studies of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic epidemiology. Consequently, critical questions remain about the generation and global distribution of virus genetic diversity. We investigated SARS-CoV-2 transmission dynamics in Gujarat, India, during the state's first epidemic wave to shed light on spread of the virus in one of the regions hardest hit by the pandemic. By integrating case data and 434 whole-genome sequences sampled across 20 districts, we reconstructed the epidemic dynamics and spatial spread of SARS-CoV-2 in Gujarat. Our findings indicate global and regional connectivity and population density were major drivers of the Gujarat outbreak. We detected >100 virus lineage introductions, most of which appear to be associated with international travel. Within Gujarat, virus dissemination occurred predominantly from densely populated regions to geographically proximate locations that had low population density, suggesting that urban centers contributed disproportionately to virus spread.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Genome, Viral , Genomics , Humans , India/epidemiology , Phylogeny , SARS-CoV-2/genetics
8.
PLoS Pathog ; 16(8): e1008699, 2020 08.
Article in English | MEDLINE | ID: mdl-32764827

ABSTRACT

São Paulo, a densely inhabited state in southeast Brazil that contains the fourth most populated city in the world, recently experienced its largest yellow fever virus (YFV) outbreak in decades. YFV does not normally circulate extensively in São Paulo, so most people were unvaccinated when the outbreak began. Surveillance in non-human primates (NHPs) is important for determining the magnitude and geographic extent of an epizootic, thereby helping to evaluate the risk of YFV spillover to humans. Data from infected NHPs can give more accurate insights into YFV spread than when using data from human cases alone. To contextualise human cases, identify epizootic foci and uncover the rate and direction of YFV spread in São Paulo, we generated and analysed virus genomic data and epizootic case data from NHPs in São Paulo. We report the occurrence of three spatiotemporally distinct phases of the outbreak in São Paulo prior to February 2018. We generated 51 new virus genomes from YFV-positive cases identified in 23 different municipalities in São Paulo, mostly sampled from NHPs between October 2016 and January 2018. Although we observe substantial heterogeneity in lineage dispersal velocities between phylogenetic branches, continuous phylogeographic analyses of generated YFV genomes suggest that YFV lineages spread in São Paulo at a mean rate of approximately 1 km per day during all phases of the outbreak. Viral lineages from the first epizootic phase in northern São Paulo subsequently dispersed towards the south of the state to cause the second and third epizootic phases there. This alters our understanding of how YFV was introduced into the densely populated south of São Paulo state. Our results shed light on the sylvatic transmission of YFV in highly fragmented forested regions in São Paulo state and highlight the importance of continued surveillance of zoonotic pathogens in sentinel species.


Subject(s)
Genome, Viral , Primate Diseases/virology , Yellow Fever/veterinary , Yellow Fever/virology , Yellow fever virus/genetics , Zoonoses/virology , Animals , Brazil/epidemiology , Disease Outbreaks , Genomics , Humans , Phylogeny , Phylogeography , Primate Diseases/epidemiology , Primate Diseases/transmission , Primates/virology , Yellow Fever/epidemiology , Yellow Fever/transmission , Yellow fever virus/classification , Yellow fever virus/isolation & purification , Zoonoses/epidemiology , Zoonoses/transmission
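
The abstract above reports a mean lineage dispersal rate of roughly 1 km per day, derived from continuous phylogeographic analyses. As a rough illustration of what a per-branch dispersal velocity means, the sketch below divides the great-circle distance between a branch's inferred start and end locations by the branch duration. This is not the authors' pipeline: the function names, the dictionary layout, and the toy coordinates are all assumptions made for illustration, and both the unweighted mean and the distance-over-time weighted summary are shown because the two can differ considerably when branch velocities are heterogeneous.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def mean_branch_velocity(branches):
    """Unweighted mean of per-branch velocities (km per day)."""
    velocities = [
        great_circle_km(*b["start"], *b["end"]) / b["days"] for b in branches
    ]
    return sum(velocities) / len(velocities)


def weighted_velocity(branches):
    """Total distance travelled divided by total branch time (km per day)."""
    total_km = sum(great_circle_km(*b["start"], *b["end"]) for b in branches)
    total_days = sum(b["days"] for b in branches)
    return total_km / total_days


# Hypothetical branches: (latitude, longitude) in degrees, duration in days.
branches = [
    {"start": (0.0, 0.0), "end": (0.0, 1.0), "days": 100.0},
    {"start": (0.0, 1.0), "end": (0.0, 3.0), "days": 50.0},
]
print(mean_branch_velocity(branches), weighted_velocity(branches))
```

In this toy input the short, fast branch pulls the unweighted mean above the weighted summary, which is one reason heterogeneity between branches matters when quoting a single rate.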
9.
Proc Natl Acad Sci U S A ; 116(35): 17231-17238, 2019 08 27.
Article in English | MEDLINE | ID: mdl-31405970

ABSTRACT

Archaeological evidence indicates that pig domestication had begun by ∼10,500 y before the present (BP) in the Near East, and mitochondrial DNA (mtDNA) suggests that pigs arrived in Europe alongside farmers ∼8,500 y BP. A few thousand years after the introduction of Near Eastern pigs into Europe, however, their characteristic mtDNA signature disappeared and was replaced by haplotypes associated with European wild boars. This turnover could be accounted for by substantial gene flow from local European wild boars, although it is also possible that European wild boars were domesticated independently without any genetic contribution from the Near East. To test these hypotheses, we obtained mtDNA sequences from 2,099 modern and ancient pig samples and 63 nuclear ancient genomes from Near Eastern and European pigs. Our analyses revealed that European domestic pigs dating from 7,100 to 6,000 y BP possessed both Near Eastern and European nuclear ancestry, while later pigs possessed no more than 4% Near Eastern ancestry, indicating that gene flow from European wild boars resulted in a near-complete disappearance of Near East ancestry. In addition, we demonstrate that a variant at a locus encoding black coat color likely originated in the Near East and persisted in European pigs. Altogether, our results indicate that while pigs were not independently domesticated in Europe, the vast majority of human-mediated selection over the past 5,000 y focused on the genomic fraction derived from the European wild boars, and not on the fraction that was selected by early Neolithic farmers over the first 2,500 y of the domestication process.


Subject(s)
DNA, Ancient , DNA, Mitochondrial/genetics , Domestication , Gene Flow , Phylogeny , Swine/genetics , Animals , Europe , History, Ancient , Middle East , Skin Pigmentation/genetics
10.
Mol Biol Evol ; 37(8): 2414-2429, 2020 08 01.
Article in English | MEDLINE | ID: mdl-32003829

ABSTRACT

Estimating past population dynamics from molecular sequences that have been sampled longitudinally through time is an important problem in infectious disease epidemiology, molecular ecology, and macroevolution. Popular solutions, such as the skyline and skygrid methods, infer past effective population sizes from the coalescent event times of phylogenies reconstructed from sampled sequences but assume that sequence sampling times are uninformative about population size changes. Recent work has started to question this assumption by exploring how sampling time information can aid coalescent inference. Here, we develop, investigate, and implement a new skyline method, termed the epoch sampling skyline plot (ESP), to jointly estimate the dynamics of population size and sampling rate through time. The ESP is inspired by real-world data collection practices and comprises a flexible model in which the sequence sampling rate is proportional to the population size within an epoch but can change discontinuously between epochs. We show that the ESP is accurate under several realistic sampling protocols and we prove analytically that it can at least double the best precision achievable by standard approaches. We generalize the ESP to incorporate phylogenetic uncertainty in a new Bayesian package (BESP) in BEAST2. We re-examine two well-studied empirical data sets from virus epidemiology and molecular evolution and find that the BESP improves upon previous coalescent estimators and generates new, biologically useful insights into the sampling protocols underpinning these data sets. Sequence sampling times provide a rich source of information for coalescent inference that will become increasingly important as sequence collection intensifies and becomes more formalized.


Subject(s)
Models, Genetic , Population Density , Population Dynamics , Animals , Bison/genetics , Humans , Influenza A virus/genetics
11.
Proc Natl Acad Sci U S A ; 115(16): 4200-4205, 2018 04 17.
Article in English | MEDLINE | ID: mdl-29610334

ABSTRACT

Bayesian phylogenetics aims at estimating phylogenetic trees together with evolutionary and population dynamic parameters based on genetic sequences. It has been noted that the clock rate, one of the evolutionary parameters, decreases with an increase in the sampling period of sequences. In particular, clock rates of epidemic outbreaks are often estimated to be higher compared with the long-term clock rate. Purifying selection has been suggested as a biological factor that contributes to this phenomenon, since it purges slightly deleterious mutations from a population over time. However, other factors such as methodological biases may also play a role and make a biological interpretation of results difficult. In this paper, we identify methodological biases originating from the choice of tree prior, that is, the model specifying epidemiological dynamics. With a simulation study we demonstrate that a misspecification of the tree prior can upwardly bias the inferred clock rate and that the interplay of the different models involved in the inference can be complex and nonintuitive. We also show that the choice of tree prior can influence the inference of clock rate on real-world Ebola virus (EBOV) datasets. While commonly used tree priors result in very high clock-rate estimates for sequences from the initial phase of the epidemic in Sierra Leone, tree priors allowing for population structure lead to estimates agreeing with the long-term rate for EBOV.


Subject(s)
Biological Evolution , Computer Simulation , Ebolavirus/genetics , Epidemics , Genetics, Population/methods , Models, Genetic , Mutation Rate , Phylogeny , Bayes Theorem , Bias , Calibration , Evolution, Molecular , Humans , Sierra Leone
12.
PLoS Comput Biol ; 15(4): e1006650, 2019 04.
Article in English | MEDLINE | ID: mdl-30958812

ABSTRACT

Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.


Subject(s)
Bayes Theorem , Biological Evolution , Phylogeny , Software , Animals , Computational Biology , Computer Simulation , Evolution, Molecular , Humans , Markov Chains , Models, Genetic , Monte Carlo Method
13.
J Infect Dis ; 220(2): 233-243, 2019 06 19.
Article in English | MEDLINE | ID: mdl-30805610

ABSTRACT

BACKGROUND: Estimation of temporal changes in human immunodeficiency virus (HIV) transmission patterns can help to elucidate the impact of preventive strategies and public health policies. METHODS: Portuguese HIV-1 subtype B and G pol genetic sequences were appended to global reference data sets to identify country-specific transmission clades. Bayesian birth-death models were used to estimate subtype-specific effective reproductive numbers (Re). Discrete trait analysis (DTA) was used to quantify mixing among transmission groups. RESULTS: We identified 5 subtype B Portuguese clades (26-79 sequences) and a large monophyletic subtype G Portuguese clade (236 sequences). We estimated that major shifts in HIV-1 transmission occurred around 1999 (95% Bayesian credible interval [BCI], 1998-2000) and 2000 (95% BCI, 1998-2001) for subtypes B and G, respectively. For subtype B, Re dropped from 1.91 (95% BCI, 1.73-2.09) to 0.62 (95% BCI, 0.52-0.72). For subtype G, Re decreased from 1.49 (95% BCI, 1.39-1.59) to 0.72 (95% BCI, 0.63-0.8). The DTA suggests that people who inject drugs (PWID) and heterosexuals were the source of most (>80%) virus lineage transitions for subtypes G and B, respectively. CONCLUSIONS: The estimated declines in Re coincide with the introduction of highly active antiretroviral therapy and the scale-up of harm reduction for PWID. Inferred transmission events across transmission groups emphasize the importance of prevention efforts for bridging populations.


Subject(s)
HIV Infections/epidemiology , HIV Infections/transmission , HIV-1/genetics , Bayes Theorem , HIV Infections/virology , Humans , Molecular Epidemiology , Phylogeny , Portugal/epidemiology , Public Health , pol Gene Products, Human Immunodeficiency Virus/genetics
14.
Mol Biol Evol ; 33(9): 2454-68, 2016 09.
Article in English | MEDLINE | ID: mdl-27189564

ABSTRACT

Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations.


Subject(s)
Genetic Fitness , Models, Genetic , Models, Statistical , Adaptation, Physiological , Biological Evolution , Biometry/methods , Epistasis, Genetic , Evolution, Molecular , Humans , Mutation , RNA/genetics , Selection, Genetic/genetics , Sequence Analysis, RNA/methods
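
The abstract above assesses regression models that account for single and pairwise mutations. A minimal sketch of such a model is given below, assuming a 0/1 encoding of each site (0 for the reference state, 1 for the mutant) and ordinary least squares solved through the normal equations. The encoding, the function names, and the three-site toy landscape are assumptions made for illustration and are not taken from the study, which worked with an RNA landscape and various sampling regimes.

```python
from itertools import combinations, product


def features(genotype):
    """Intercept, one term per site, one term per pair of sites."""
    pairs = [
        genotype[i] * genotype[j]
        for i, j in combinations(range(len(genotype)), 2)
    ]
    return [1.0] + [float(v) for v in list(genotype) + pairs]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Fails with a division by zero if the system is singular, which is what
    happens when too few distinct genotypes have been sampled.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(genotypes, fitnesses):
    """Least-squares coefficients from (X^T X) beta = X^T y."""
    X = [features(g) for g in genotypes]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, fitnesses)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, genotype):
    """Predicted fitness of a genotype under the fitted model."""
    return sum(b * f for b, f in zip(beta, features(genotype)))


# Toy landscape with purely additive and pairwise effects (made-up values),
# measured on all 8 genotypes of a 3-site sequence.
true_beta = [1.0, 0.5, -0.2, 0.1, 0.3, 0.0, -0.4]
genotypes = list(product([0, 1], repeat=3))
fitnesses = [predict(true_beta, g) for g in genotypes]
beta = fit(genotypes, fitnesses)
print([round(b, 6) for b in beta])
```

Because this toy landscape contains no higher-order interactions and every genotype is measured, the fit recovers the generating coefficients. The situation the abstract addresses is the opposite one, where only a sparse sample of a much larger sequence space is available and the choice of sampled genotypes determines how well the model generalises.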
15.
Proc Natl Acad Sci U S A ; 111(9): 3496-501, 2014 Mar 04.
Article in English | MEDLINE | ID: mdl-24550506

ABSTRACT

In many systems, host-parasite evolutionary dynamics have led to the emergence and maintenance of diverse parasite and host genotypes within the same population. Genotypes vary in key attributes: Parasite genotypes vary in ability to infect, host genotypes vary in susceptibility, and infection outcome is frequently the result of both parties' genotypic identities. These host-parasite genotype-by-genotype (GH × GP) interactions influence evolutionary and ecological dynamics in important ways. Interactions can be produced through genetic variation; however, here, we assess the role of variable gene expression as an additional source of GH × GP interactions. The bumblebee Bombus terrestris and its trypanosome gut parasite Crithidia bombi are a model system for host-parasite matching. Full-transcriptome sequencing of the bumblebee host revealed that different parasite genotypes indeed induce fundamentally different host expression responses and host genotypes vary in their responses to the infecting parasite genotype. It appears that broadly and successfully infecting parasite genotypes lead to reduced host immune gene expression relative to unexposed bees but induce the expression of genes responsible for controlling gene expression. Contrastingly, a poorly infecting parasite genotype induced the expression of immunologically important genes, including antimicrobial peptides. A targeted expression assay confirmed the transcriptome results and also revealed strong host genotype effects. In all, the expression of a number of genes depends on the host genotype and the parasite genotype and the interaction between both host and parasite genotypes. These results suggest that alongside sequence variation in coding immunological genes, variation that controls immune gene expression can also produce patterns of host-parasite specificity.


Subject(s)
Bees/parasitology , Biological Evolution , Crithidia/physiology , Gene Expression Regulation/immunology , Host-Parasite Interactions/immunology , Analysis of Variance , Animals , Base Sequence , Bees/immunology , Computational Biology , DNA Primers/genetics , Gene Expression Profiling , Gene Expression Regulation/genetics , Gene Ontology , Genotype , Host-Parasite Interactions/genetics , Molecular Sequence Data , Sequence Analysis, RNA , Species Specificity , Switzerland
16.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-39347649

ABSTRACT

The large amount and diversity of viral genomic datasets generated by next-generation sequencing technologies poses a set of challenges for computational data analysis workflows, including rigorous quality control, scaling to large sample sizes, and tailored steps for specific applications. Here, we present V-pipe 3.0, a computational pipeline designed for analyzing next-generation sequencing data of short viral genomes. It is developed to enable reproducible, scalable, adaptable, and transparent inference of genetic diversity of viral samples. By presenting 2 large-scale data analysis projects, we demonstrate the effectiveness of V-pipe 3.0 in supporting sustainable viral genomic data science.


Subject(s)
Genetic Variation , Genome, Viral , High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods , Genomics/methods , Viruses/genetics , Humans
17.
Nat Commun ; 15(1): 7123, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39164246

ABSTRACT

Vast amounts of pathogen genomic, demographic and spatial data are transforming our understanding of SARS-CoV-2 emergence and spread. We examined the drivers of molecular evolution and spread of 291,791 SARS-CoV-2 genomes from Denmark in 2021. With a sequencing rate consistently exceeding 60%, and up to 80% of PCR-positive samples between March and November, the viral genome set is broadly whole-epidemic representative. We identify a consistent rise in viral diversity over time, with notable spikes upon the importation of novel variants (e.g., Delta and Omicron). By linking genomic data with rich individual-level demographic data from national registers, we find that individuals aged <15 and >75 years had a lower contribution to molecular change (i.e., branch lengths) compared to other age groups, but similar molecular evolutionary rates, suggesting a lower likelihood of introducing novel variants. Similarly, we find greater molecular change among vaccinated individuals, suggestive of immune evasion. We also observe evidence that transmission in rural areas follows predictable diffusion processes. Conversely, urban areas are expectedly more complex due to their high mobility, emphasising the role of population structure in driving virus spread. Our analyses highlight the added value of integrating genomic data with detailed demographic and spatial information, particularly in the absence of structured infection surveys.


Subject(s)
COVID-19 , Genome, Viral , SARS-CoV-2 , Humans , Denmark/epidemiology , COVID-19/epidemiology , COVID-19/virology , COVID-19/transmission , SARS-CoV-2/genetics , SARS-CoV-2/classification , Genome, Viral/genetics , Adult , Middle Aged , Aged , Adolescent , Young Adult , Evolution, Molecular , Male , Female , Child, Preschool , Child , Phylogeny , Infant
18.
Microbiol Spectr ; 12(5): e0362823, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38497714

ABSTRACT

During the SARS-CoV-2 pandemic, many countries directed substantial resources toward genomic surveillance to detect and track viral variants. There is a debate over how much sequencing effort is necessary in national surveillance programs for SARS-CoV-2 and future pandemic threats. We aimed to investigate the effect of reduced sequencing on surveillance outcomes in a large genomic data set from Switzerland, comprising more than 143k sequences. We employed a uniform downsampling strategy using 100 iterations each to investigate the effects of fewer available sequences on the surveillance outcomes: (i) first detection of variants of concern (VOCs), (ii) speed of introduction of VOCs, (iii) diversity of lineages, (iv) first cluster detection of VOCs, (v) density of active clusters, and (vi) geographic spread of clusters. The impact of downsampling on VOC detection is disparate for the three VOC lineages, but many outcomes including introduction and cluster detection could be recapitulated even with only 35% of the original sequencing effort. The effect on the observed speed of introduction and first detection of clusters was more sensitive to reduced sequencing effort for some VOCs, in particular Omicron and Delta, respectively. A genomic surveillance program needs a balance between societal benefits and costs. While the overall national dynamics of the pandemic could be recapitulated by a reduced sequencing effort, the effect is strongly lineage-dependent (something that is unknown at the time of sequencing) and comes at the cost of accuracy, in particular for tracking the emergence of potential VOCs.
IMPORTANCE: Switzerland had one of the most comprehensive genomic surveillance systems during the COVID-19 pandemic. Such programs need to strike a balance between societal benefits and program costs. Our study aims to answer the question: How would surveillance outcomes have changed had we sequenced less? We find that some outcomes, and also certain viral lineages, are more affected than others by sequencing less. Sequencing at around a third of the original effort still captured many important outcomes for the variants of concern, such as their first detection, but more strongly affected other measures, like the detection of first transmission clusters for some lineages. Our work highlights the importance of setting predefined targets for a national genomic surveillance program, on the basis of which sequencing effort should be determined. Additionally, the use of a centralized surveillance platform facilitates aggregating data on a national level for rapid public health responses as well as post-analyses.


Subject(s)
COVID-19 , Genome, Viral , SARS-CoV-2 , COVID-19/epidemiology , COVID-19/virology , COVID-19/diagnosis , Humans , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , SARS-CoV-2/classification , Switzerland/epidemiology , Genome, Viral/genetics , Epidemiological Monitoring , Pandemics , Phylogeny
19.
Brief Bioinform ; 12(6): 723-35, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21330331

ABSTRACT

With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications.


Subject(s)
Computational Biology/methods , Molecular Sequence Annotation , Databases, Genetic , Genes , Vocabulary, Controlled
20.
Microb Genom ; 9(5)2023 05.
Article in English | MEDLINE | ID: mdl-37227264

ABSTRACT

Bovine tuberculosis (bTB) is a costly, epidemiologically complex, multi-host, endemic disease. Lack of understanding of transmission dynamics may undermine eradication efforts. Pathogen whole-genome sequencing improves epidemiological inferences, providing a means to determine the relative importance of inter- and intra-species host transmission for disease persistence. We sequenced an exceptional data set of 619 Mycobacterium bovis isolates from badgers and cattle in a 100 km² bTB 'hotspot' in Northern Ireland. Historical molecular subtyping data permitted the targeting of an endemic pathogen lineage, whose long-term persistence provided a unique opportunity to study disease transmission dynamics in unparalleled detail. Additionally, to assess whether badger population genetic structure was associated with the spatial distribution of pathogen genetic diversity, we microsatellite genotyped hair samples from 769 badgers trapped in this area. Birth-death models and TransPhylo analyses indicated that cattle were likely driving the local epidemic, with transmission from cattle to badgers being more common than badger to cattle. Furthermore, the presence of significant badger population genetic structure in the landscape was not associated with the spatial distribution of M. bovis genetic diversity, suggesting that badger-to-badger transmission is not playing a major role in transmission dynamics. Our data were consistent with badgers playing a smaller role in transmission of M. bovis infection in this study site, compared to cattle. We hypothesize, however, that this minor role may still be important for persistence. Comparison to other areas suggests that M. bovis transmission dynamics are likely to be context dependent, with the role of wildlife being difficult to generalize.


Subject(s)
Mustelidae , Mycobacterium bovis , Tuberculosis, Bovine , Animals , Cattle , Mycobacterium bovis/genetics , Mustelidae/microbiology , Northern Ireland/epidemiology , Tuberculosis, Bovine/microbiology , Genomics