ABSTRACT
The World Health Organization declared mpox a public health emergency of international concern in July 2022. To investigate global mpox transmission and population-level changes associated with controlling spread, we built phylogeographic and phylodynamic models to analyze MPXV genomes from five global regions together with air traffic and epidemiological data. Our models reveal community transmission prior to detection, changes in case reporting throughout the epidemic, and a large degree of transmission heterogeneity. We find that viral introductions played a limited role in prolonging spread after initial dissemination, suggesting that travel bans would have had only a minor impact. We find that mpox transmission in North America began declining before more than 10% of high-risk individuals in the USA had vaccine-induced immunity. Our findings highlight the importance of broader routine specimen screening surveillance for emerging infectious diseases and of joint integration of genomic and epidemiological information for early outbreak control.
Subject(s)
Communicable Diseases, Emerging , Epidemics , Mpox , Humans , Disease Outbreaks , Mpox/epidemiology , Mpox/transmission , Mpox/virology , Public Health , Monkeypox virus/physiology
ABSTRACT
Zoonotic spillovers of viruses have occurred through the animal trade worldwide. The start of the COVID-19 pandemic was traced epidemiologically to the Huanan Seafood Wholesale Market. Here, we analyze environmental qPCR and sequencing data collected in the Huanan market in early 2020. We demonstrate that market-linked severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genetic diversity is consistent with market emergence and find increased SARS-CoV-2 positivity near and within a wildlife stall. We identify wildlife DNA in all SARS-CoV-2-positive samples from this stall, including species such as civets, bamboo rats, and raccoon dogs, previously identified as possible intermediate hosts. We also detect animal viruses that infect raccoon dogs, civets, and bamboo rats. Combining metagenomic and phylogenetic approaches, we recover genotypes of market animals and compare them with those from farms and other markets. This analysis provides the genetic basis for a shortlist of potential intermediate hosts of SARS-CoV-2 to prioritize for serological and viral sampling.
Subject(s)
Animals, Wild , COVID-19 , Phylogeny , SARS-CoV-2 , Animals , COVID-19/epidemiology , COVID-19/virology , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , Animals, Wild/virology , Humans , Pandemics
ABSTRACT
Game animals are wildlife species traded and consumed as food and are potential reservoirs for SARS-CoV and SARS-CoV-2. We performed a meta-transcriptomic analysis of 1,941 game animals, representing 18 species and five mammalian orders, sampled across China. From this, we identified 102 mammalian-infecting viruses, with 65 described for the first time. Twenty-one viruses were considered as potentially high risk to humans and domestic animals. Civets (Paguma larvata) carried the highest number of potentially high-risk viruses. We inferred the transmission of bat-associated coronavirus from bats to civets, as well as cross-species jumps of coronaviruses from bats to hedgehogs, from birds to porcupines, and from dogs to raccoon dogs. Of note, we identified avian Influenza A virus H9N2 in civets and Asian badgers, with the latter displaying respiratory symptoms, as well as cases of likely human-to-wildlife virus transmission. These data highlight the importance of game animals as potential drivers of disease emergence.
Subject(s)
Animals, Wild/virology , Communicable Diseases, Emerging/virology , Disease Reservoirs , Mammals/virology , Virome , Animals , China , Phylogeny , Zoonoses
ABSTRACT
The independent emergence late in 2020 of the B.1.1.7, B.1.351, and P.1 lineages of SARS-CoV-2 prompted renewed concerns about the evolutionary capacity of this virus to overcome public health interventions and rising population immunity. Here, by examining patterns of synonymous and non-synonymous mutations that have accumulated in SARS-CoV-2 genomes since the pandemic began, we find that the emergence of these three "501Y lineages" coincided with a major global shift in the selective forces acting on various SARS-CoV-2 genes. Following their emergence, the adaptive evolution of 501Y lineage viruses has involved repeated selectively favored convergent mutations at 35 genome sites, mutations we refer to as the 501Y meta-signature. The ongoing convergence of viruses in many other lineages on this meta-signature suggests that it includes multiple mutation combinations capable of promoting the persistence of diverse SARS-CoV-2 lineages in the face of mounting host immune recognition.
Subject(s)
COVID-19/epidemiology , Evolution, Molecular , Mutation , Pandemics , SARS-CoV-2/genetics , Amino Acid Sequence/genetics , COVID-19/immunology , COVID-19/transmission , COVID-19/virology , Codon/genetics , Genes, Viral , Genetic Drift , Host Adaptation/genetics , Humans , Immune Evasion , Phylogeny , Public Health
ABSTRACT
Animals such as raccoon dogs, mink and muskrats are farmed for fur and are sometimes used as food or medicinal products [1,2], yet they are also potential reservoirs of emerging pathogens [3]. Here we performed single-sample metatranscriptomic sequencing of internal tissues from 461 individual fur animals that were found dead due to disease. We characterized 125 virus species, including 36 that were novel and 39 at potentially high risk of cross-species transmission, including zoonotic spillover. Notably, we identified seven species of coronaviruses, expanding their known host range, and documented the cross-species transmission of a novel canine respiratory coronavirus to raccoon dogs and of bat HKU5-like coronaviruses to mink, present at a high abundance in lung tissues. Three subtypes of influenza A virus (H1N2, H5N6 and H6N2) were detected in the lungs of guinea pig, mink and muskrat, respectively. Multiple known zoonotic viruses, such as Japanese encephalitis virus and mammalian orthoreovirus [4,5], were detected in guinea pigs. Raccoon dogs and mink carried the highest number of potentially high-risk viruses, while viruses from the Coronaviridae, Paramyxoviridae and Sedoreoviridae families commonly infected multiple hosts. These data also reveal potential virus transmission between farmed animals and wild animals, and from humans to farmed animals, indicating that fur farming represents an important transmission hub for viral zoonoses.
Subject(s)
Animal Fur , Animals, Domestic , Animals, Wild , Disease Reservoirs , Host Specificity , Viral Zoonoses , Animals , Dogs , Guinea Pigs , Humans , Animals, Domestic/virology , Animals, Wild/virology , Arvicolinae/virology , Chiroptera/virology , Coronavirus/isolation & purification , Coronavirus/genetics , Coronavirus/classification , Disease Reservoirs/virology , Disease Reservoirs/veterinary , Encephalitis Virus, Japanese/genetics , Encephalitis Virus, Japanese/isolation & purification , Influenza A virus/classification , Influenza A virus/genetics , Influenza A virus/isolation & purification , Lung/virology , Mink/virology , Orthoreovirus/genetics , Orthoreovirus/isolation & purification , Phylogeny , Raccoon Dogs/virology , Viral Zoonoses/transmission , Viral Zoonoses/virology
ABSTRACT
After the first wave of SARS-CoV-2 infections in spring 2020, Europe experienced a resurgence of the virus starting in late summer 2020 that was deadlier and more difficult to contain [1]. Relaxed intervention measures and summer travel have been implicated as drivers of the second wave [2]. Here we build a phylogeographical model to evaluate how newly introduced lineages, as opposed to the rekindling of persistent lineages, contributed to the resurgence of COVID-19 in Europe. We inform this model using genomic, mobility and epidemiological data from 10 European countries and estimate that in many countries more than half of the lineages circulating in late summer resulted from new introductions since 15 June 2020. The success in onward transmission of newly introduced lineages was negatively associated with the local incidence of COVID-19 during this period. The pervasive spread of variants in summer 2020 highlights the threat of viral dissemination when restrictions are lifted, and this needs to be carefully considered in strategies to control the current spread of variants that are more transmissible and/or evade immunity. Our findings indicate that more effective and coordinated measures are required to contain the spread through cross-border travel even as vaccination is reducing disease burden.
Subject(s)
COVID-19/transmission , COVID-19/virology , SARS-CoV-2/isolation & purification , COVID-19/epidemiology , COVID-19/prevention & control , Europe/epidemiology , Genome, Viral/genetics , Humans , Incidence , Locomotion , Phylogeny , Phylogeography , SARS-CoV-2/classification , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Time Factors , Travel/statistics & numerical data
ABSTRACT
The discrepancy between short- and long-term rate estimates, known as the time-dependent rate phenomenon (TDRP), poses a challenge to extrapolating evolutionary rates over time and reconstructing the evolutionary history of viruses. The TDRP reveals a decline in evolutionary rate estimates with the measurement timescale, explained empirically by a power-law rate decay, notably observed in animal and human viruses. A mechanistic evolutionary model, the Prisoner of War (PoW) model, has been proposed to address TDRP in viruses. Although TDRP has been studied in animal viruses, its impact on plant virus evolutionary history remains largely unexplored. Here, we investigated the consequences of TDRP in plant viruses by applying the PoW model to reconstruct the evolutionary history of sobemoviruses, plant pathogens of significant importance due to their impact on agriculture and plant health. Our analysis showed that the Sobemovirus genus dates back over four million years, indicating an ancient origin. We found evidence that supports deep host jumps to Poaceae, Fabaceae, and Solanaceae occurring tens to hundreds of thousands of years ago, followed by specialization. Remarkably, the TDRP-corrected evolutionary history of sobemoviruses was extended far beyond previous estimates that had suggested their emergence nearly 9,000 years ago, a time coinciding with the Neolithic period in the Near East. By incorporating sequences collected through metagenomic analyses, the resulting phylogenetic tree showcases increased genetic diversity, reflecting a deep history of sobemovirus species. We identified major radiation events beginning between 4,600 and 2,000 years ago, which aligns with the Neolithic period in various regions, suggesting a period of rapid diversification from then to the present. Our findings make a case for the possibility of deep evolutionary origins of plant viruses.
Subject(s)
Plant Viruses , RNA Viruses , Animals , Humans , Phylogeny , Biological Evolution , RNA Viruses/genetics , Plant Viruses/genetics , Plants , Evolution, Molecular
ABSTRACT
Escape from cytotoxic T lymphocyte (CTL) responses toward HIV-1 Gag and Nef has been associated with reduced control of HIV-1 replication in adults. However, less is known about CTL-driven immune selection in infants as longitudinal studies of infants are limited. Here, 1,210 gag and 1,264 nef sequences longitudinally collected within 15 months after birth from 14 HIV-1 perinatally infected infants and their mothers were analyzed. The number of transmitted founder (T/F) viruses and associations between virus evolution, selection, CTL escape, and disease progression were determined. The analyses indicated that a paraphyletic-monophyletic relationship between the mother-infant sequences was common (80%), and that the HIV-1 infection was established by a single T/F virus in 10 of the 12 analyzed infants (83%). Furthermore, most HIV-1 CTL escape mutations among infants were transmitted from the mothers and did not revert during the first year of infection. Still, immune-driven selection was observed at approximately 3 months after HIV-1 infection in infants. Moreover, virus populations with CTL escape mutations in gag evolved faster than those without, independently of disease progression rate. These findings expand the current knowledge of HIV-1 transmission, evolution, and CTL escape in infant HIV-1 infection and are relevant for the development of immune-directed interventions in infants.
IMPORTANCE: Despite increased coverage in antiretroviral therapy for the prevention of perinatal transmission, paediatric HIV-1 infection remains a significant public health concern, especially in areas of high HIV-1 prevalence. Understanding HIV-1 transmission and the subsequent virus adaptation from the mother to the infant's host environment, as well as the viral factors that affect disease outcome, is important for the development of early immune-directed interventions for infants. This study advances our understanding of vertical HIV-1 transmission, and how infant immune selection pressure is shaping the intra-host evolutionary dynamics of HIV-1.
Subject(s)
Evolution, Molecular , HIV Infections , HIV-1 , Infectious Disease Transmission, Vertical , Mutation , T-Lymphocytes, Cytotoxic , gag Gene Products, Human Immunodeficiency Virus , nef Gene Products, Human Immunodeficiency Virus , Humans , HIV-1/genetics , HIV-1/immunology , T-Lymphocytes, Cytotoxic/immunology , HIV Infections/virology , HIV Infections/immunology , HIV Infections/transmission , Infant , Female , gag Gene Products, Human Immunodeficiency Virus/genetics , gag Gene Products, Human Immunodeficiency Virus/immunology , nef Gene Products, Human Immunodeficiency Virus/genetics , nef Gene Products, Human Immunodeficiency Virus/immunology , Immune Evasion/genetics , Infant, Newborn , Phylogeny , Male , Longitudinal Studies , Pregnancy , Adult
ABSTRACT
MOTIVATION: Advancements in high-throughput genomic sequencing are delivering genomic pathogen data at an unprecedented rate, positioning statistical phylogenetics as a critical tool to monitor infectious diseases globally. This rapid growth spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences N. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters that traditionally takes $O(N^2)$ operations using the standard pruning algorithm. A recent study proposes an approach to calculate this gradient in $O(N)$, enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the calculation of the gradient computationally tractable for nucleotide-based models but falls short in performance for larger state-space size models, such as Markov-modulated and codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and result in manyfold speedups over previous CPU implementations. RESULTS: We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples exploring complete genomes from 997 dengue viruses, 62 carnivore mitochondria and 49 yeasts, and observe a >128-fold speedup over the CPU implementation for codon-based models and >8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task previously intractable.
AVAILABILITY AND IMPLEMENTATION: We provide an implementation of our GPU algorithms in BEAGLE v4.0.0 (https://github.com/beagle-dev/beagle-lib), an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs. We employ a BEAGLE-implementation using the Bayesian phylogenetics framework BEAST (https://github.com/beast-dev/beast-mcmc).
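To make the quadratic cost mentioned above concrete, the following is a minimal sketch, not the BEAGLE or BEAST implementation. It assumes a single site, the JC69 nucleotide model, and a toy three-tip tree; the function names and the tree encoding are hypothetical. Each log-likelihood evaluation is one pruning pass over the tree, and the naive gradient repeats that pass for every branch, which is where the quadratic scaling comes from.

```python
import math

STATES = "ACGT"

def jc69(t):
    """JC69 transition probabilities for a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partials(node, children, lengths, tips):
    """Felsenstein pruning: likelihood of the data below `node` for each state."""
    if node in tips:
        return [1.0 if s == tips[node] else 0.0 for s in STATES]
    out = [1.0] * 4
    for child in children[node]:
        P = jc69(lengths[child])
        below = partials(child, children, lengths, tips)
        for i in range(4):
            out[i] *= sum(P[i][j] * below[j] for j in range(4))
    return out

def log_likelihood(root, children, lengths, tips):
    """One pruning pass; JC69 has uniform stationary frequencies (1/4 each)."""
    return math.log(sum(0.25 * x for x in partials(root, children, lengths, tips)))

def naive_gradient(root, children, lengths, tips, h=1e-6):
    """Central finite differences: two pruning passes per branch.

    With O(N) branches and O(N) work per pass this is O(N^2) overall,
    the cost that linear-time gradient algorithms avoid.
    """
    grad = {}
    for branch in lengths:
        up, down = dict(lengths), dict(lengths)
        up[branch] += h
        down[branch] -= h
        grad[branch] = (log_likelihood(root, children, up, tips)
                        - log_likelihood(root, children, down, tips)) / (2.0 * h)
    return grad

# Hypothetical toy tree ((t1, t2)x, t3)root with one observed site per tip.
children = {"root": ["x", "t3"], "x": ["t1", "t2"]}
lengths = {"x": 0.1, "t1": 0.2, "t2": 0.2, "t3": 0.3}
tips = {"t1": "A", "t2": "A", "t3": "G"}
grad = naive_gradient("root", children, lengths, tips)
```

Finite differences are used here only because they make the per-branch repetition obvious; an analytic gradient would replace them in practice.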
Subject(s)
Algorithms , Software , Phylogeny , Bayes Theorem , Codon , Nucleotides
ABSTRACT
Since the latter part of 2020, SARS-CoV-2 evolution has been characterised by the emergence of viral variants associated with distinct biological characteristics. While the main research focus has centred on the ability of new variants to increase in frequency and impact the effective reproductive number of the virus, less attention has been placed on their relative ability to establish transmission chains and to spread through a geographic area. Here, we describe a phylogeographic approach to estimate and compare the introduction and dispersal dynamics of the main SARS-CoV-2 variants - Alpha, Iota, Delta, and Omicron - that circulated in the New York City area between 2020 and 2022. Notably, our results indicate that Delta had a lower ability to establish sustained transmission chains in the NYC area and that Omicron (BA.1) was the variant fastest to disseminate across the study area. The analytical approach presented here complements non-spatially-explicit analytical approaches that seek a better understanding of the epidemiological differences that exist among successive SARS-CoV-2 variants of concern.
Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , COVID-19/virology , New York City/epidemiology , SARS-CoV-2/genetics
ABSTRACT
Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random-effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.
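The idea of layering random effects on a base substitution model can be sketched as follows. The abstract does not state the parameterization, so this example assumes a log-linear form in which each off-diagonal base rate is multiplied by exp(effect); the HKY base model, the function names, and the numerical values are illustrative assumptions only.

```python
import math

STATES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def hky_rates(kappa, pi):
    """Off-diagonal HKY rates: kappa * pi[j] for transitions, pi[j] for transversions."""
    return {(i, j): (kappa if (i, j) in TRANSITIONS else 1.0) * pi[j]
            for i in STATES for j in STATES if i != j}

def random_effects_matrix(base, effects):
    """Scale each base rate by exp(effect); set diagonals so every row sums to zero.

    Missing entries in `effects` are treated as zero, which recovers the base model.
    """
    Q = {}
    for i in STATES:
        row_total = 0.0
        for j in STATES:
            if i != j:
                rate = base[(i, j)] * math.exp(effects.get((i, j), 0.0))
                Q[(i, j)] = rate
                row_total += rate
        Q[(i, i)] = -row_total
    return Q

# Hypothetical base frequencies and transition/transversion ratio.
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
base = hky_rates(2.0, pi)
Q0 = random_effects_matrix(base, {})                 # all effects zero: plain HKY
Q1 = random_effects_matrix(base, {("C", "T"): 1.0})  # one elevated C-to-T rate
```

In `Q1` only the C-to-T rate is elevated while T-to-C is unchanged. A one-directional effect of this kind violates Kolmogorov's cycle condition, so the resulting chain is no longer time-reversible, which is the sort of departure from the base model that such effects are meant to capture.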
Subject(s)
Classification , Phylogeny , Classification/methods , SARS-CoV-2/genetics , SARS-CoV-2/classification , Influenza A Virus, H3N2 Subtype/genetics , Influenza A Virus, H3N2 Subtype/classification , Models, Genetic , Markov Chains , Bayes Theorem
ABSTRACT
BACKGROUND: Nipah virus (NiV), a highly lethal virus in humans, circulates in Pteropus bats throughout South and Southeast Asia. Difficulty in obtaining viral genomes from bats means we have a poor understanding of NiV diversity. METHODS: We develop phylogenetic approaches applied to the most comprehensive collection of genomes to date (N=257, 175 from bats, 73 from humans) from six countries over 22 years (1999-2020). We divide the four major NiV sublineages into 15 genetic clusters. Using Approximate Bayesian Computation fit to a spatial signature of viral diversity, we estimate the presence and the average size of genetic clusters per area. RESULTS: We find that, within any bat roost, there are an average of 2.4 co-circulating genetic clusters, rising to 5.5 clusters at areas of 1500-2000 km². We estimate that each genetic cluster occupies an average area of 1.3 million km² (95% CI: 0.6-2.3 million), with 14 clusters in an area of 100,000 km² (95% CI: 6-24). In the few sites in Bangladesh and Cambodia where genomic surveillance has been concentrated, we estimate that most clusters have been identified, but only ~15% of overall NiV diversity has been uncovered. CONCLUSION: Our findings are consistent with entrenched co-circulation of distinct lineages, even within roosts, coupled with slow migration over larger spatial scales.
ABSTRACT
Molecular clock models undergird modern methods of divergence-time estimation. Local clock models propose that the rate of molecular evolution is constant within phylogenetic subtrees. Current local clock inference procedures exhibit one or more weaknesses, namely they achieve limited scalability to trees with large numbers of taxa, impose model misspecification, or require a priori knowledge of the existence and location of clocks. To overcome these challenges, we present an autocorrelated, Bayesian model of heritable clock rate evolution that leverages heavy-tailed priors with mean zero to shrink increments of change between branch-specific clocks. We further develop an efficient Hamiltonian Monte Carlo sampler that exploits closed form gradient computations to scale our model to large trees. Inference under our shrinkage clock exhibits a speed-up compared to the popular random local clock when estimating branch-specific clock rates on a variety of simulated datasets. This speed-up increases with the size of the problem. We further show our shrinkage clock recovers known local clocks within a rodent and mammalian phylogeny. Finally, in a problem that once appeared computationally impractical, we investigate the heritable clock structure of various surface glycoproteins of influenza A virus in the absence of prior knowledge about clock placement. We implement our shrinkage clock and make it publicly available in the BEAST software package.
Subject(s)
Evolution, Molecular , Mammals , Animals , Phylogeny , Bayes Theorem , Time Factors , Models, Genetic
ABSTRACT
Getah virus (GETV) mainly causes disease in livestock and may pose an epidemic risk due to its expanding host range and the potential of long-distance dispersal through animal trade. Here, we used metagenomic next-generation sequencing (mNGS) to identify GETV as the pathogen responsible for reemerging swine disease in China and subsequently estimated key epidemiological parameters using phylodynamic and spatially-explicit phylogeographic approaches. The GETV isolates were able to replicate in a variety of cell lines, including human cells, and showed high pathogenicity in a mouse model, suggesting the potential for more mammal hosts. We obtained 16 complete genomes and 79 E2 gene sequences from viral strains collected in China from 2016 to 2021 through large-scale surveillance among livestock, pets, and mosquitoes. Our phylogenetic analysis revealed that three major GETV lineages are responsible for the current epidemic in livestock in China. We identified three potential positively selected sites and mutations of interest in E2, which may impact the transmissibility and pathogenicity of the virus. Phylodynamic inference of the GETV demographic dynamics identified an association between livestock meat consumption and the evolution of viral genetic diversity. Finally, phylogeographic reconstruction of GETV dispersal indicated that the sampled lineages have preferentially circulated within areas associated with relatively higher mean annual temperature and pig population density. Our results highlight the importance of continuous surveillance of GETV among livestock in southern Chinese regions associated with relatively high temperatures. IMPORTANCE Although livestock is known to be the primary reservoir of Getah virus (GETV) in Asian countries, where identification is largely based on serology, the evolutionary history and spatial epidemiology of GETV in these regions remain largely unknown. 
Through our sequencing efforts, we provided robust support for lineage delineation of GETV and identified three major lineages that are responsible for the current epidemic in livestock in China. We further analyzed genomic and epidemiological data to reconstruct the recent demographic and dispersal history of GETV in domestic animals in China and to explore the impact of environmental factors on its genetic diversity and its diffusion. Notably, except for livestock meat consumption, other pig-related factors such as the evolution of live pig transport and pork production do not show a significant association with the evolution of viral genetic diversity, pointing out that further studies should investigate the potential contribution of other host species to the GETV outbreak. Our analysis of GETV demonstrates the need for wider animal species surveillance and provides a baseline for future studies of the molecular epidemiology and early warning of emerging arboviruses in China.
Subject(s)
Arboviruses , Genome, Viral , Phylogeny , Animals , Humans , Mice , Arboviruses/genetics , China/epidemiology , Genomics , Livestock/virology
ABSTRACT
IMPORTANCE: Lumpy skin disease virus (LSDV) has a complex epidemiology involving multiple strains, recombination, and vaccination. Its DNA genome provides limited genetic variation to trace outbreaks in space and time. Sequencing of LSDV whole genomes has also been patchy at global and regional scales. Here, we provide the first fine-grained whole genome sequence sampling of a constrained LSDV outbreak (southeastern Europe, 2015-2017), which we analyze along with global publicly available genomes. We formally evaluate the past occurrence of recombination events as well as the temporal signal that is required for calibrating molecular clock models and subsequently conduct a time-calibrated spatially explicit phylogeographic reconstruction. Our study further illustrates the importance of accounting for recombination events before reconstructing global and regional dynamics of DNA viruses. More LSDV whole genomes from endemic areas are needed to obtain a comprehensive understanding of global LSDV dispersal dynamics.
Subject(s)
Genome, Viral , Lumpy Skin Disease , Lumpy skin disease virus , Animals , Cattle , Disease Outbreaks , DNA, Viral/genetics , Europe/epidemiology , Lumpy Skin Disease/epidemiology , Lumpy Skin Disease/virology , Lumpy skin disease virus/genetics , Phylogeny
ABSTRACT
The dynamics of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission are influenced by a variety of factors, including social restrictions and the emergence of distinct variants. In this study, we delve into the origins and dissemination of the Alpha, Delta, and Omicron-BA.1 variants of concern in Galicia, northwest Spain. For this, we leveraged genomic data collected by the EPICOVIGAL Consortium and from the GISAID database, along with mobility information from other Spanish regions and foreign countries. Our analysis indicates that initial introductions during the Alpha phase were predominantly from other Spanish regions and France. However, as the pandemic progressed, introductions from Portugal and the United States became increasingly significant. The number of detected introductions varied from 96 and 101 for Alpha and Delta to 39 for Omicron-BA.1. Most of these introductions left a low number of descendants (<10), suggesting a limited impact on the evolution of the pandemic in Galicia. Notably, Galicia's major coastal cities emerged as critical hubs for viral transmission, highlighting their role in sustaining and spreading the virus. This research emphasizes the critical role of regional connectivity in the spread of SARS-CoV-2 and offers essential insights for enhancing public health strategies and surveillance measures.
Subject(s)
COVID-19 , SARS-CoV-2 , Spain/epidemiology , COVID-19/epidemiology , COVID-19/transmission , COVID-19/virology , Humans , SARS-CoV-2/genetics , Genome, Viral , Phylogeny , Pandemics
ABSTRACT
Divergence time estimation is crucial to provide temporal signals for dating biologically important events from species divergence to viral transmissions in space and time. With the advent of high-throughput sequencing, recent Bayesian phylogenetic studies have analyzed hundreds to thousands of sequences. Such large-scale analyses challenge divergence time reconstruction by requiring inference on highly correlated internal node heights that often become computationally infeasible. To overcome this limitation, we explore a ratio transformation that maps the original $N-1$ internal node heights into a space of one height parameter and $N-2$ ratio parameters. To make the analyses scalable, we develop a collection of linear-time algorithms to compute the gradient and Jacobian-associated terms of the log-likelihood with respect to these ratios. We then apply Hamiltonian Monte Carlo sampling with the ratio transform in a Bayesian framework to learn the divergence times in 4 pathogenic viruses (West Nile virus, rabies virus, Lassa virus, and Ebola virus) and the coralline red algae. Our method both resolves a mixing issue in the West Nile virus example and improves inference efficiency by at least 5-fold for the Lassa and rabies virus examples as well as for the algae example. Our method now also makes it computationally feasible to incorporate mixed-effects molecular clock models for the Ebola virus example, confirms the findings from the original study, and reveals clearer multimodal distributions of the divergence times of some clades of interest.
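The ratio transformation described above can be illustrated with a small sketch. It assumes one common form of the transform, in which each non-root internal node height is expressed as a fraction of the interval between the oldest tip beneath it and its parent's height; the abstract does not spell out the exact definition, and the tree encoding, function names, and heights below are hypothetical.

```python
def lower_bounds(children, tip_heights, root):
    """Oldest (maximum-height) tip below each internal node, via post-order traversal."""
    bounds = {}
    def visit(node):
        if node in tip_heights:
            return tip_heights[node]
        bound = max(visit(child) for child in children[node])
        bounds[node] = bound
        return bound
    visit(root)
    return bounds

def to_ratios(children, tip_heights, heights, root):
    """Map N-1 internal node heights to one root height plus N-2 ratios in (0, 1)."""
    bounds = lower_bounds(children, tip_heights, root)
    ratios, stack = {}, [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child in children:  # internal, non-root node
                ratios[child] = ((heights[child] - bounds[child])
                                 / (heights[node] - bounds[child]))
                stack.append(child)
    return heights[root], ratios

def from_ratios(children, tip_heights, root_height, ratios, root):
    """Inverse map: rebuild internal node heights from the root downwards."""
    bounds = lower_bounds(children, tip_heights, root)
    heights, stack = {root: root_height}, [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child in children:
                heights[child] = (bounds[child]
                                  + ratios[child] * (heights[node] - bounds[child]))
                stack.append(child)
    return heights

# Hypothetical four-tip tree ((A, B)X, (C, D)Y)R with serially sampled tips.
children = {"R": ["X", "Y"], "X": ["A", "B"], "Y": ["C", "D"]}
tip_heights = {"A": 0.0, "B": 0.5, "C": 0.0, "D": 1.0}
heights = {"R": 5.0, "X": 2.0, "Y": 3.0}

root_height, ratios = to_ratios(children, tip_heights, heights, "R")
recovered = from_ratios(children, tip_heights, root_height, ratios, "R")
```

Because every ratio lies strictly between 0 and 1 regardless of the other parameters, a sampler can propose in the ratio space without violating the ordering constraint that a parent must be older than its children.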
Subject(s)
Algorithms , Phylogeny , Bayes Theorem , Time Factors , Monte Carlo Method
ABSTRACT
Virus host shifts are generally associated with novel adaptations to exploit the cells of the new host species optimally. Surprisingly, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has apparently required little to no significant adaptation to humans since the start of the Coronavirus Disease 2019 (COVID-19) pandemic and to October 2020. Here we assess the types of natural selection taking place in Sarbecoviruses in horseshoe bats versus the early SARS-CoV-2 evolution in humans. While there is moderate evidence of diversifying positive selection in SARS-CoV-2 in humans, it is limited to the early phase of the pandemic, and purifying selection is much weaker in SARS-CoV-2 than in related bat Sarbecoviruses. In contrast, our analysis detects evidence for significant positive episodic diversifying selection acting at the base of the bat virus lineage SARS-CoV-2 emerged from, accompanied by an adaptive depletion in CpG composition presumed to be linked to the action of antiviral mechanisms in these ancestral bat hosts. The closest bat virus to SARS-CoV-2, RmYN02 (sharing an ancestor about 1976), is a recombinant with a structure that includes differential CpG content in Spike; clear evidence of coinfection and evolution in bats without involvement of other species. While an undiscovered "facilitating" intermediate species cannot be discounted, collectively, our results support the progenitor of SARS-CoV-2 being capable of efficient human-human transmission as a consequence of its adaptive evolutionary history in bats, not humans, which created a relatively generalist virus.
Subject(s)
COVID-19/virology, Chiroptera/virology, SARS-CoV-2/genetics, Viral Zoonoses/virology, Animals, COVID-19/epidemiology, COVID-19/transmission, Molecular Evolution, Viral Genome, Host Specificity, Humans, Pandemics, Phylogeny, Virus Receptors/genetics, SARS-CoV-2/pathogenicity, Genetic Selection, Viral Zoonoses/genetics, Viral Zoonoses/transmission
ABSTRACT
Inferring dependencies between mixed-type biological traits while accounting for evolutionary relationships between specimens is of great scientific interest yet remains infeasible when trait and specimen counts grow large. The state-of-the-art approach uses a phylogenetic multivariate probit model to accommodate binary and continuous traits via a latent variable framework, and utilizes an efficient bouncy particle sampler (BPS) to tackle the computational bottleneck: integrating many latent variables from a high-dimensional truncated normal distribution. This approach breaks down as the number of specimens grows and fails to reliably characterize conditional dependencies between traits. Here, we propose an inference pipeline for phylogenetic probit models that greatly outperforms BPS. The novelty lies in (1) a combination of the recent Zigzag Hamiltonian Monte Carlo (Zigzag-HMC) with linear-time gradient evaluations and (2) a joint sampling scheme for highly correlated latent variables and correlation matrix elements. In an application exploring HIV-1 evolution from 535 viruses, the inference requires joint sampling from an 11,235-dimensional truncated normal and a 24-dimensional covariance matrix. Our method yields a 5-fold speedup compared to BPS and makes it possible to learn partial correlations between candidate viral mutations and virulence. This computational speedup enables us to tackle even larger problems: we study the evolution of influenza H1N1 glycosylations on around 900 viruses. For broader applicability, we extend the phylogenetic probit model to incorporate categorical traits, and demonstrate its use to study Aquilegia flower and pollinator co-evolution.
Subject(s)
Influenza A Virus H1N1 Subtype, Bayes Theorem, Influenza A Virus H1N1 Subtype/genetics, Phylogeny, Flowers, Glycosylation
ABSTRACT
Salmonella enterica serovar Typhimurium strain ATCC14028s is commercially available from multiple national type culture collections, and has been widely used since 1960 for quality control of growth media and experiments on fitness ("laboratory evolution"). ATCC14028s has been implicated in multiple cross-contaminations in the laboratory, and has also caused multiple laboratory infections and one known attempt at bioterrorism. According to hierarchical clustering of 3002 core gene sequences, ATCC14028s belongs to HierCC cluster HC20_373 in which most internal branch lengths are only one to three SNPs long. Many natural Typhimurium isolates from humans, domesticated animals and the environment also belong to HC20_373, and their core genomes are almost indistinguishable from those of laboratory strains. These natural isolates have infected humans in Ireland and Taiwan for decades, and are common in the British Isles as well as the Americas. The isolation history of some of the natural isolates confirms the conclusion that they do not represent recent contamination by the laboratory strain, and 10% carry plasmids or bacteriophages which have been acquired in nature by HGT from unrelated bacteria. We propose that ATCC14028s has repeatedly escaped from the laboratory environment into nature via laboratory accidents or infections, but the escaped micro-lineages have only a limited life span. As a result, there is a genetic gap separating HC20_373 from its closest natural relatives due to a divergence between them in the late 19th century followed by repeated extinction events of escaped HC20_373.