Abstract
Zoonotic spillovers of viruses have occurred through the animal trade worldwide. The start of the COVID-19 pandemic was traced epidemiologically to the Huanan Seafood Wholesale Market. Here, we analyze environmental qPCR and sequencing data collected in the Huanan market in early 2020. We demonstrate that market-linked severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genetic diversity is consistent with market emergence and find increased SARS-CoV-2 positivity near and within a wildlife stall. We identify wildlife DNA in all SARS-CoV-2-positive samples from this stall, including species such as civets, bamboo rats, and raccoon dogs, previously identified as possible intermediate hosts. We also detect animal viruses that infect raccoon dogs, civets, and bamboo rats. Combining metagenomic and phylogenetic approaches, we recover genotypes of market animals and compare them with those from farms and other markets. This analysis provides the genetic basis for a shortlist of potential intermediate hosts of SARS-CoV-2 to prioritize for serological and viral sampling.
Subjects
Wild Animals, COVID-19, Phylogeny, SARS-CoV-2, Animals, COVID-19/epidemiology, COVID-19/virology, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, Wild Animals/virology, Humans, Pandemics
Abstract
The World Health Organization declared mpox a public health emergency of international concern in July 2022. To investigate global mpox transmission and population-level changes associated with controlling spread, we built phylogeographic and phylodynamic models to analyze MPXV genomes from five global regions together with air traffic and epidemiological data. Our models reveal community transmission prior to detection, changes in case reporting throughout the epidemic, and a large degree of transmission heterogeneity. We find that viral introductions played a limited role in prolonging spread after initial dissemination, suggesting that travel bans would have had only a minor impact. We find that mpox transmission in North America began declining before more than 10% of high-risk individuals in the USA had vaccine-induced immunity. Our findings highlight the importance of broader routine specimen screening surveillance for emerging infectious diseases and of joint integration of genomic and epidemiological information for early outbreak control.
Subjects
Emerging Communicable Diseases, Epidemics, Mpox, Humans, Disease Outbreaks, Mpox/epidemiology, Mpox/transmission, Mpox/virology, Public Health, Monkeypox virus/physiology
Abstract
We estimate the basic reproductive number and case counts for 15 distinct severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) outbreaks, distributed across 11 populations (10 countries and one cruise ship), based solely on phylodynamic analyses of genomic data. Our results indicate that, prior to significant public health interventions, the reproductive numbers for 10 (out of 15) of these outbreaks are similar, with median posterior estimates ranging between 1.4 and 2.8. These estimates provide a view which is complementary to that provided by those based on traditional line listing data. The genomic-based view is arguably less susceptible to biases resulting from differences in testing protocols, testing intensity, and import of cases into the community of interest. In the analyses reported here, the genomic data primarily provide information regarding which samples belong to a particular outbreak. We observe that once these outbreaks are identified, the sampling dates carry the majority of the information regarding the reproductive number. Finally, we provide genome-based estimates of the cumulative number of infections for each outbreak. For 7 out of 11 of the populations studied, the number of confirmed cases is much larger than the cumulative number of infections estimated from the sequence data, a possible explanation being the presence of unsequenced outbreaks in these populations.
Subjects
COVID-19, SARS-CoV-2, Humans, SARS-CoV-2/genetics, COVID-19/epidemiology, Disease Outbreaks, Genomics, Ships
Abstract
Epidemiology has been transformed by the advent of Bayesian phylodynamic models that allow researchers to infer the geographic history of pathogen dispersal over a set of discrete geographic areas [1, 2]. These models provide powerful tools for understanding the spatial dynamics of disease outbreaks, but contain many parameters that are inferred from minimal geographic information (i.e., the single area in which each pathogen was sampled). Consequently, inferences under these models are inherently sensitive to our prior assumptions about the model parameters. Here, we demonstrate that the default priors used in empirical phylodynamic studies make strong and biologically unrealistic assumptions about the underlying geographic process. We provide empirical evidence that these unrealistic priors strongly (and adversely) impact commonly reported aspects of epidemiological studies, including: 1) the relative rates of dispersal between areas; 2) the importance of dispersal routes for the spread of pathogens among areas; 3) the number of dispersal events between areas; and 4) the ancestral area in which a given outbreak originated. We offer strategies to avoid these problems, and develop tools to help researchers specify more biologically reasonable prior models that will realize the full potential of phylodynamic methods to elucidate pathogen biology and, ultimately, inform surveillance and monitoring policies to mitigate the impacts of disease outbreaks.
Subjects
Disease Outbreaks, Phylogeny, Bayes Theorem
Abstract
From 2013 to 2017, avian influenza A(H7N9) virus caused five severe epidemic waves of human infections in China. The role of live bird markets (LBMs) in the transmission dynamics of H7N9 remains unclear. Using a Bayesian phylodynamic approach, we shed light on past H7N9 transmission events at the human-LBM interface that were not directly observed using case surveillance data-based approaches. Our results reveal concurrent circulation of H7N9 lineages in Yangtze and Pearl River Delta regions, with evidence of local transmission during each wave. Our results indicate that H7N9 circulated in humans and LBMs for weeks to months before being first detected. Our findings support the seasonality of H7N9 transmission and suggest a high number of underreported infections, particularly in LBMs. We provide evidence for differences in virus transmissibility between low and highly pathogenic H7N9. We demonstrate a regional spatial structure for the spread of H7N9 among LBMs, highlighting the importance of further investigating the role of local live poultry trade in virus transmission. Our results provide estimates of avian influenza virus (AIV) transmission at the LBM level, providing a unique opportunity to better prepare surveillance plans at LBMs for response to future AIV epidemics.
Subjects
Influenza A Virus H7N9 Subtype, Avian Influenza, Human Influenza, Animals, Humans, Bayes Theorem, Poultry, China/epidemiology
Abstract
Reassortment is an evolutionary process common in viruses with segmented genomes. These viruses can swap whole genomic segments during cellular co-infection, giving rise to novel progeny formed from the mixture of parental segments. Since large-scale genome rearrangements have the potential to generate new phenotypes, reassortment is important to both evolutionary biology and public health research. However, statistical inference of the pattern of reassortment events from phylogenetic data is exceptionally difficult, potentially involving inference of general graphs in which individual segment trees are embedded. In this paper, we argue that, in general, the number and pattern of reassortment events are not identifiable from segment trees alone, even with theoretically ideal data. We call this fact the fundamental problem of reassortment, which we illustrate using the concept of the "first-infection tree," a potentially counterfactual genealogy that would have been observed in the segment trees had no reassortment occurred. Further, we illustrate four additional problems that can arise logically in the inference of reassortment events and show, using simulated data, that these problems are not rare and can potentially distort our observation of reassortment even in small data sets. Finally, we discuss how existing methods can be augmented or adapted to account for not only the fundamental problem of reassortment, but also the four additional situations that can complicate the inference of reassortment.
Subjects
Viral Genome, Phylogeny, Reassortant Viruses, Reassortant Viruses/genetics, Molecular Evolution, Genetic Models
Abstract
Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real-time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared robustness to model misspecification of a trained neural network to that of a Bayesian method. We found that both models had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression, which we demonstrate has patterns of sensitivity to model misspecification similar to those of Bayesian highest posterior density (HPD) intervals; its intervals greatly overlap with HPDs but have lower precision (are more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-CoV-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training. Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.
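As an illustration of the conformalized quantile regression step mentioned in this abstract, the following is a minimal sketch of the standard split-conformal calibration, not the authors' implementation. It assumes that lower and upper quantile predictions (for example, from a trained network) are already available for a held-out calibration set; the function names `cqr_correction` and `cqr_interval` are hypothetical.

```python
import math


def cqr_correction(q_lo, q_hi, y, alpha=0.1):
    """Conformal correction for quantile-regression intervals.

    q_lo, q_hi: predicted lower/upper quantiles on a calibration set.
    y: true values for the same calibration points.
    alpha: target miscoverage rate (0.1 gives nominal 90% intervals).
    """
    # Conformity score: positive when y falls outside [q_lo, q_hi],
    # negative when it falls inside.
    scores = sorted(max(lo - t, t - hi) for lo, hi, t in zip(q_lo, q_hi, y))
    n = len(scores)
    # Rank of the finite-sample-adjusted empirical quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return math.inf
    return scores[k - 1]


def cqr_interval(q_lo_new, q_hi_new, correction):
    """Widen (or shrink, if correction < 0) a raw quantile interval."""
    return q_lo_new - correction, q_hi_new + correction


if __name__ == "__main__":
    # Hypothetical calibration data: raw intervals are all [0, 1].
    correction = cqr_correction(
        q_lo=[0.0, 0.0, 0.0, 0.0],
        q_hi=[1.0, 1.0, 1.0, 1.0],
        y=[0.5, 1.5, -0.2, 2.0],
        alpha=0.5,
    )
    print(cqr_interval(0.0, 1.0, correction))
```

A positive correction widens the raw intervals, which is one way such intervals can end up more conservative than a posterior interval.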
Subjects
Deep Learning, Phylogeography, Phylogeography/methods, Likelihood Functions, Phylogeny, Classification/methods, Bayes Theorem, Viruses/genetics, Viruses/classification
Abstract
In a striking result, Louca and Pennell [S. Louca, M. W. Pennell, Nature 580, 502-505 (2020)] recently proved that a large class of phylogenetic birth-death models is statistically unidentifiable from lineage-through-time (LTT) data: Any pair of sufficiently smooth birth and death rate functions is "congruent" to an infinite collection of other rate functions, all of which have the same likelihood for any LTT vector of any dimension. As Louca and Pennell argue, this fact has distressing implications for the thousands of studies that have utilized birth-death models to study evolution. In this paper, we qualify their finding by proving that an alternative and widely used class of birth-death models is indeed identifiable. Specifically, we show that piecewise constant birth-death models can, in principle, be consistently estimated and distinguished from one another, given a sufficiently large extant timetree and some knowledge of the present-day population. Subject to mild regularity conditions, we further show that any unidentifiable birth-death model class can be arbitrarily closely approximated by a class of identifiable models. The sampling requirements needed for our results to hold are explicit and are expected to be satisfied in many contexts such as the phylodynamic analysis of a global pandemic.
Subjects
Death, Markov Chains, Biological Models, Parturition, Phylogeny, Population Dynamics, Biological Evolution, Humans, Pandemics
Abstract
Most new pathogens of humans and animals arise via switching events from distinct host species. However, our understanding of the evolutionary and ecological drivers of successful host adaptation, expansion, and dissemination is limited. Staphylococcus aureus is a major bacterial pathogen of humans and a leading cause of mastitis in dairy cows worldwide. Here we trace the evolutionary history of bovine S. aureus using a global dataset of 10,254 S. aureus genomes including 1,896 bovine isolates from 32 countries in 6 continents. We identified 7 major contemporary endemic clones of S. aureus causing bovine mastitis around the world and traced them back to 4 independent host-jump events from humans that occurred up to 2,500 y ago. Individual clones emerged and underwent clonal expansion from the mid-19th to late 20th century coinciding with the commercialization and industrialization of dairy farming, and older lineages have become globally distributed via established cattle trade links. Importantly, we identified lineage-dependent differences in the frequency of host transmission events between humans and cows in both directions revealing high risk clones threatening veterinary and human health. Finally, pangenome network analysis revealed that some bovine S. aureus lineages contained distinct sets of bovine-associated genes, consistent with multiple trajectories to host adaptation via gene acquisition. Taken together, we have dissected the evolutionary history of a major endemic pathogen of livestock providing a comprehensive temporal, geographic, and gene-level perspective of its remarkable success.
Subjects
Staphylococcal Infections, Staphylococcus aureus, Female, Humans, Cattle, Animals, Staphylococcus aureus/genetics, Livestock/genetics, Staphylococcal Infections/epidemiology, Staphylococcal Infections/veterinary, Staphylococcal Infections/genetics, Genome, Host Specificity
Abstract
Community-associated, methicillin-resistant Staphylococcus aureus (MRSA) lineages have emerged in many geographically distinct regions around the world during the past 30 y. Here, we apply consistent phylodynamic methods across multiple community-associated MRSA lineages to describe and contrast their patterns of emergence and dissemination. We generated whole-genome sequencing data for the Australian sequence type (ST) ST93-MRSA-IV from remote communities in Far North Queensland and Papua New Guinea, and the Bengal Bay ST772-MRSA-V clone from metropolitan communities in Pakistan. Increases in the effective reproduction number (Re) and sustained transmission (Re > 1) coincided with spread of progenitor methicillin-susceptible S. aureus (MSSA) in remote northern Australian populations, dissemination of the ST93-MRSA-IV genotype into population centers on the Australian East Coast, and subsequent importation into the highlands of Papua New Guinea and Far North Queensland. Applying the same phylodynamic methods to existing lineage datasets, we identified common signatures of epidemic growth in the emergence and epidemiological trajectory of community-associated S. aureus lineages from America, Asia, Australasia, and Europe. Surges in Re were observed at the divergence of antibiotic-resistant strains, coinciding with their establishment in regional population centers. Epidemic growth was also observed among drug-resistant MSSA clades in Africa and northern Australia. Our data suggest that the emergence of community-associated MRSA in the late 20th century was driven by a combination of antibiotic-resistant genotypes and host epidemiology, leading to abrupt changes in lineage-wide transmission dynamics and sustained transmission in regional population centers.
Subjects
Community-Acquired Infections, Methicillin-Resistant Staphylococcus aureus, Staphylococcal Infections, Humans, Staphylococcus aureus/genetics, Staphylococcal Infections/epidemiology, Australia/epidemiology, Anti-Bacterial Agents/pharmacology, Pakistan, Community-Acquired Infections/epidemiology, Microbial Sensitivity Tests
Abstract
Modern HIV research depends crucially on both viral sequencing and population measurements. To directly link mechanistic biological processes and evolutionary dynamics during HIV infection, we developed multiple within-host phylodynamic models of HIV primary infection for comparative validation against viral load and evolutionary dynamics data. The optimal model of primary infection required no positive selection, suggesting that the host adaptive immune system reduces viral load but surprisingly does not drive observed viral evolution. Rather, the fitness (infectivity) of mutant variants is drawn from an exponential distribution in which most variants are slightly less infectious than their parents (nearly neutral evolution). This distribution was not largely different from either in vivo fitness distributions recorded beyond primary infection or in vitro distributions that are observed without adaptive immunity, suggesting the intrinsic viral fitness distribution may drive evolution. Simulated phylogenetic trees also agree with independent data and illuminate how phylogenetic inference must consider viral and immune-cell population dynamics to gain accurate mechanistic insights.
Subjects
Physiological Adaptation/genetics, HIV Infections/virology, HIV-1/genetics, Phylogeny, Viral Load, Genetic Fitness, Humans, Genetic Models, Mutation, Reproducibility of Results
Abstract
BACKGROUND: Shenzhen, a city with a substantial mobile population, was the region where HIV-1 CRF55_01B was first discovered and the epicenter of its severe epidemic. During the implementation of venue-based behavioral interventions and the "treat-all" policy, discerning the spread patterns and transmission hotspots of CRF55_01B is imperative. METHODS: In this study, 1,450 partial pol sequences, with demographic information, were collected from all newly diagnosed CRF55_01B infections in Shenzhen from 2008 to 2020. Molecular networks were constructed using maximum likelihood and time-resolved phylogenies. Transmission rates, effective reproduction numbers (Re) of clusters and viral dispersal were evaluated using Bayesian inference. RESULTS: In total, 526 sequences formed 114 clusters, including seven large clusters. The status and size of clusters were strongly correlated with age, ethnicity, occupation and CD4+ T cell counts. The transmission rates of clusters were significantly higher than the national epidemic estimate. Four large clusters had Re exceeding 1 at the end of the sampling period. Immigrants from Guangdong and Hunan, along with local residents, were identified as the transmission hubs, with heterosexual men being the main source and MSM being the main destination. The virus exhibited a high movement frequency from individuals aged 30-49 years toward diverse age groups. CONCLUSIONS: This study demonstrated that hidden CRF55_01B transmissions continued despite current combined interventions in Shenzhen, and identified specific at-risk individuals susceptible to infection or transmission, who could serve as targets for more effective prevention and control of the local epidemic, thereby mitigating cross-regional spread nationwide due to population migration.
Abstract
The emergence of plant pathogens is often associated with waves of unique evolutionary and epidemiological events. Xanthomonas hortorum pv. gardneri is one of the major pathogens causing bacterial spot disease of tomatoes. After its first report in the 1950s, there were no formal reports on this pathogen until the 1990s, despite active global research on the pathogens that cause tomato and pepper bacterial spot disease. Given the recently documented global distribution of X. hortorum pv. gardneri, our objective was to examine genomic diversification associated with its emergence. We sequenced the genomes of X. hortorum pv. gardneri strains collected in eight countries to examine global population structure and pathways of emergence using phylodynamic analysis. We found that strains isolated post-1990 group by region of collection and show minimal impact of recombination on genetic variation. A period of rapid geographic expansion in X. hortorum pv. gardneri is associated with acquisition of a large plasmid conferring copper tolerance by horizontal transfer and coincides with the burgeoning hybrid tomato seed industry through the 1980s. The ancestry of X. hortorum pv. gardneri is consistent with introduction to hybrid tomato seed production and dissemination during the rapid increase in trade of hybrid seeds. Copyright © 2024 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Abstract
Despite its increasing role in the understanding of infectious disease transmission at the applied and theoretical levels, phylodynamics lacks a well-defined notion of ideal data and optimal sampling. We introduce a method to visualize and quantify the relative impact of pathogen genome sequence and sampling times (two fundamental sources of data for phylodynamics under birth-death-sampling models) to understand how each drives phylodynamic inference. Applying our method to simulated data and real-world SARS-CoV-2 and H1N1 Influenza data, we use this insight to elucidate fundamental trade-offs and guidelines for phylodynamic analyses to draw the most from sequence data. Phylodynamics promises to be a staple of future responses to infectious disease threats globally. Continuing research into the inherent requirements and trade-offs of phylodynamic data and inference will help ensure phylodynamic tools are wielded in ever more targeted and efficient ways.
Subjects
COVID-19, Influenza A Virus H1N1 Subtype, Phylogeny, SARS-CoV-2/genetics
Abstract
Getah virus (GETV) mainly causes disease in livestock and may pose an epidemic risk due to its expanding host range and the potential of long-distance dispersal through animal trade. Here, we used metagenomic next-generation sequencing (mNGS) to identify GETV as the pathogen responsible for reemerging swine disease in China and subsequently estimated key epidemiological parameters using phylodynamic and spatially-explicit phylogeographic approaches. The GETV isolates were able to replicate in a variety of cell lines, including human cells, and showed high pathogenicity in a mouse model, suggesting the potential for more mammal hosts. We obtained 16 complete genomes and 79 E2 gene sequences from viral strains collected in China from 2016 to 2021 through large-scale surveillance among livestock, pets, and mosquitoes. Our phylogenetic analysis revealed that three major GETV lineages are responsible for the current epidemic in livestock in China. We identified three potential positively selected sites and mutations of interest in E2, which may impact the transmissibility and pathogenicity of the virus. Phylodynamic inference of the GETV demographic dynamics identified an association between livestock meat consumption and the evolution of viral genetic diversity. Finally, phylogeographic reconstruction of GETV dispersal indicated that the sampled lineages have preferentially circulated within areas associated with relatively higher mean annual temperature and pig population density. Our results highlight the importance of continuous surveillance of GETV among livestock in southern Chinese regions associated with relatively high temperatures. IMPORTANCE Although livestock is known to be the primary reservoir of Getah virus (GETV) in Asian countries, where identification is largely based on serology, the evolutionary history and spatial epidemiology of GETV in these regions remain largely unknown. 
Through our sequencing efforts, we provided robust support for lineage delineation of GETV and identified three major lineages that are responsible for the current epidemic in livestock in China. We further analyzed genomic and epidemiological data to reconstruct the recent demographic and dispersal history of GETV in domestic animals in China and to explore the impact of environmental factors on its genetic diversity and its diffusion. Notably, except for livestock meat consumption, other pig-related factors such as the evolution of live pig transport and pork production do not show a significant association with the evolution of viral genetic diversity, pointing out that further studies should investigate the potential contribution of other host species to the GETV outbreak. Our analysis of GETV demonstrates the need for wider animal species surveillance and provides a baseline for future studies of the molecular epidemiology and early warning of emerging arboviruses in China.
Subjects
Arboviruses, Viral Genome, Phylogeny, Animals, Humans, Mice, Arboviruses/genetics, China/epidemiology, Genomics, Livestock/virology
Abstract
IMPORTANCE: The number of known virus species has increased dramatically through metagenomic studies, which search genetic material sampled from a host for non-host genes. Here, we focus on an important viral family that includes influenza viruses, the Orthomyxoviridae, with over 100 recently discovered viruses infecting hosts from humans to fish. We find that one virus called Wǔhàn mosquito virus 6, discovered in mosquitoes in China, has spread across the globe very recently. Surface proteins used to enter cells show signs of rapid evolution in Wǔhàn mosquito virus 6 and its relatives which suggests an ability to infect vertebrate animals. We compute the rate at which new orthomyxovirus species discovered add evolutionary history to the tree of life, predict that many viruses remain to be discovered, and discuss what appropriately designed future studies can teach us about how diseases cross between continents and species.
Subjects
Viral Genome, Orthomyxoviridae, Molecular Evolution, Orthomyxoviridae/genetics, Phylogeny, Metagenomics
Abstract
Enterovirus C99 (EV-C99) is a newly identified EV serotype within the species Enterovirus C. Few studies on EV-C99 have been conducted globally. More information and research on EV-C99 are needed to assess its genetic characteristics, phylogenetic relationships, and associations with enteroviral diseases. Here, we report the phylogenetic characteristics of 11 Chinese EV-C99 strains. The full-length genomic sequences of these 11 strains show 79.4-80.5% nucleotide identity and 91.7-94.3% amino acid (aa) identity with the prototype EV-C99. A maximum likelihood phylogenetic tree constructed based on the entire VP1 coding region identified 13 genotypes (A-M), revealing a high degree of variation among the EV-C99 strains. Phylogeographic analysis showed that the Xinjiang Uygur Autonomous Region is an important source of EV-C99 epidemics in various regions of China. Recombination analysis revealed inter-serotype recombination events of 16 Chinese EV-C99 strains in 5' untranslated regions and 3D regions, resulting in the formation of a single recombination form. Additionally, the Chinese strain of genotype J showed rich aa diversity in the P1 region, indicating that genotype J of EV-C99 is still undergoing dynamic change. This study contributes to the global understanding of the EV-C99 genome sequence and holds substantial implications for the surveillance of EV-C99.
Subjects
Enterovirus Infections, Enterovirus, Humans, Enterovirus/genetics, Phylogeny, Enterovirus Infections/epidemiology, China/epidemiology, Genotype, Viral Genome
Abstract
The human immunodeficiency virus type 1 (HIV-1) A6 sub-subtype is highly prevalent in Eastern Europe. Over the past decade, the dissemination of the A6 lineage has been expanding in Poland. The recent Russian invasion of Ukraine may further escalate the spread of this sub-subtype. While evolutionary studies using viral sequences have been instrumental in identifying HIV epidemic patterns, the origins and dynamics of the A6 sub-subtype in Poland remain to be explored. We analyzed 1185 HIV-1 A6 pol sequences from Poland, along with 8318 publicly available sequences from other countries. For the analyses, we employed phylogenetic tree construction, population dynamics inference, Bayesian analysis, and discrete phylogeographic modeling. Of the introduction events to Poland, 69.94% originated from Ukraine, followed by 29.17% from Russia. Most A6 sequences in Poland (53.16%) formed four large clades, with their introductions spanning 1993-2008. Central and Southern Polish regions significantly influenced migration events. Men who have sex with men (MSM) emerged as the dominant risk group for virus circulation, representing 72.92% of migration events. Sequences from migrants were found primarily outside the large clades. Past migration from Ukraine has fueled the spread of the A6 sub-subtype and the current influx of war-displaced people maintains the growing national epidemic.
Subjects
Epidemics, HIV Infections, HIV-1, Sexual and Gender Minorities, Male, Humans, Phylogeny, Poland/epidemiology, Male Homosexuality, HIV-1/genetics, HIV Infections/epidemiology, Bayes Theorem
Abstract
Multi-type birth-death (MTBD) models are phylodynamic analogs of compartmental models in classical epidemiology. They serve to infer such epidemiological parameters as the average number of secondary infections Re and the infectious time from a phylogenetic tree (a genealogy of pathogen sequences). The representatives of this model family focus on various aspects of pathogen epidemics. For instance, the birth-death exposed-infectious (BDEI) model describes the transmission of pathogens featuring an incubation period (when there is a delay between the moment of infection and becoming infectious, as for Ebola and SARS-CoV-2), and permits its estimation along with other parameters. With constantly growing sequencing data, MTBD models should be extremely useful for unravelling information on pathogen epidemics. However, existing implementations of these models in a phylodynamic framework have not yet caught up with the sequencing speed. Computing time and numerical instability issues limit their applicability to medium data sets (≤ 500 samples), while the accuracy of estimations should increase with more data. We propose a new highly parallelizable formulation of ordinary differential equations for MTBD models. We also extend them to forests to represent situations when a (sub-)epidemic started from several cases (e.g., multiple introductions to a country). We implemented it for the BDEI model in a maximum likelihood framework using a combination of numerical analysis methods for efficient equation resolution. Our implementation estimates epidemiological parameter values and their confidence intervals in two minutes on a phylogenetic tree of 10,000 samples. Comparison to the existing implementations on simulated data shows that it is not only much faster but also more accurate. An application of our tool to the 2014 Ebola epidemic in Sierra Leone is also convincing, with very fast calculation and precise estimates. As MTBD models are closely related to Cladogenetic State Speciation and Extinction (ClaSSE)-like models, our findings could also be easily transferred to the macroevolution domain.
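To make the BDEI structure described in this abstract concrete, here is a minimal forward-simulation sketch. It is not the authors' maximum-likelihood tool. It assumes the common parameterization with a rate `mu` of becoming infectious, a transmission rate `la`, a removal rate `psi`, and a sampling probability `rho`, so that Re = la / psi, the incubation period is 1 / mu, and the infectious time is 1 / psi. The class name, function name, and all parameter values are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BDEIParams:
    mu: float   # rate at which an exposed individual becomes infectious
    la: float   # transmission rate per infectious individual
    psi: float  # removal rate of infectious individuals
    rho: float  # probability that a removed individual is sampled

    @property
    def reproduction_number(self) -> float:
        return self.la / self.psi

    @property
    def incubation_period(self) -> float:
        return 1.0 / self.mu

    @property
    def infectious_time(self) -> float:
        return 1.0 / self.psi


def simulate_bdei(params: BDEIParams, t_max: float, seed: int = 0):
    """Gillespie simulation from one exposed case.

    Returns (exposed, infectious, sampled) counts at time t_max
    or at extinction, whichever comes first.
    """
    rng = random.Random(seed)
    t, exposed, infectious, sampled = 0.0, 1, 0, 0
    while exposed + infectious > 0:
        rates = (params.mu * exposed,
                 params.la * infectious,
                 params.psi * infectious)
        total = sum(rates)
        t += rng.expovariate(total)
        if t > t_max:
            break
        u = rng.random() * total
        if u < rates[0]:
            # An exposed individual becomes infectious.
            exposed -= 1
            infectious += 1
        elif u < rates[0] + rates[1]:
            # Transmission creates a new exposed individual.
            exposed += 1
        else:
            # Removal, with sampling at probability rho.
            infectious -= 1
            if rng.random() < params.rho:
                sampled += 1
    return exposed, infectious, sampled


if __name__ == "__main__":
    p = BDEIParams(mu=0.5, la=0.4, psi=0.2, rho=0.3)  # illustrative values only
    print(p.reproduction_number, p.incubation_period, p.infectious_time)
    print(simulate_bdei(p, t_max=30.0))
```

Inference runs in the opposite direction: given the sampled tips and the tree connecting them, a likelihood-based method estimates `mu`, `la`, and `psi`.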
Subjects
Epidemics, Ebola Virus Disease, Humans, Phylogeny, Ebola Virus Disease/epidemiology, Likelihood Functions, Epidemiological Models
Abstract
Viral deep-sequencing data play a crucial role in understanding disease transmission network flows, providing higher resolution compared to standard Sanger sequencing. To more fully utilize these rich data and account for the uncertainties in outcomes from phylogenetic analyses, we propose a spatial Poisson process model to uncover human immunodeficiency virus (HIV) transmission flow patterns at the population level. We represent pairings of individuals with viral sequence data as typed points, with coordinates representing covariates such as gender and age and point types representing the unobserved transmission statuses (linkage and direction). Points are associated with observed scores on the strength of evidence for each transmission status that are obtained through standard deep-sequence phylogenetic analysis. Our method is able to jointly infer the latent transmission statuses for all pairings and the transmission flow surface on the source-recipient covariate space. In contrast to existing methods, our framework does not require preclassification of the transmission statuses of data points, and instead learns them probabilistically through a fully Bayesian inference scheme. By directly modeling continuous spatial processes with smooth densities, our method enjoys significant computational advantages compared to previous methods that rely on discretization of the covariate space. We demonstrate that our framework can capture age structures in HIV transmission at high resolution, bringing valuable insights in a case study on viral deep-sequencing data from Southern Uganda.