1.
bioRxiv ; 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38559140

ABSTRACT

Molecular surveillance of viral pathogens and inference of transmission networks from genomic data play an increasingly important role in public health efforts, especially for HIV-1. For many methods, the genetic distance threshold used to connect sequences in the transmission network is a key parameter informing the properties of inferred networks. Using a distance threshold that is too high can result in a network with many spurious links, making it difficult to interpret. Conversely, a distance threshold that is too low can result in a network with too few links, which may not capture key insights into clusters of public health concern. Published research using the HIV-TRACE software package frequently uses the default threshold of 0.015 substitutions/site for HIV pol gene sequences, but in many cases, investigators heuristically select other threshold parameters to better capture the underlying dynamics of the epidemic they are studying. Here, we present a general heuristic scoring approach for tuning a distance threshold adaptively, which seeks to prevent the formation of giant clusters. We prioritize the ratio of the sizes of the largest and the second largest cluster, maximizing the number of clusters present in the network. We apply our scoring heuristic to outbreaks with different characteristics, such as regional or temporal variability, and demonstrate the utility of using the scoring mechanism's suggested distance threshold to identify clusters exhibiting risk factors that would have otherwise been more difficult to identify. For example, while we found that a 0.015 substitutions/site distance threshold is typical for US-like epidemics, recent outbreaks like the CRF07_BC subtype among men who have sex with men (MSM) in China have been found to have a lower optimal threshold of 0.005 to better capture the transition from injected drug use (IDU) to MSM as the primary risk factor. 
Alternatively, in communities surrounding Lake Victoria in Uganda, where there has been sustained heterosexual transmission for many years, we found that a larger distance threshold is necessary to capture a more risk factor-diverse population with sparse sampling over a longer period of time. Such identification may allow for more informed intervention action by respective public health officials.
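
The abstract names two quantities that drive the threshold choice: the ratio of the largest to the second-largest cluster, and the number of clusters. The following is a minimal sketch of how those two quantities could be tabulated across candidate thresholds. It does not use HIV-TRACE and does not reproduce the authors' scoring formula, which the abstract does not give; the function names and the toy distances are hypothetical.

```python
# Hypothetical toy input: pairwise genetic distances (substitutions/site)
# between sequence IDs. Real input would come from a distance calculation.
TOY_DISTANCES = {
    ("A", "B"): 0.004,
    ("B", "C"): 0.012,
    ("D", "E"): 0.006,
    ("C", "D"): 0.020,
    ("F", "G"): 0.014,
}


def clusters_at_threshold(distances, threshold):
    """Return cluster sizes, largest first, linking pairs with distance <= threshold.

    Sequences with no link at this threshold are not counted as clusters.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    sizes = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def summarize(distances, thresholds):
    """For each threshold, report (threshold, cluster count, largest/second-largest ratio)."""
    rows = []
    for t in thresholds:
        sizes = clusters_at_threshold(distances, t)
        # The ratio is undefined when fewer than two clusters exist.
        ratio = sizes[0] / sizes[1] if len(sizes) >= 2 else None
        rows.append((t, len(sizes), ratio))
    return rows


if __name__ == "__main__":
    for row in summarize(TOY_DISTANCES, [0.005, 0.015, 0.03]):
        print(row)
```

On the toy data, raising the threshold from 0.015 to 0.03 merges two clusters into one larger cluster, so the cluster count falls and the size ratio rises. That is the giant-cluster behavior the heuristic is described as trying to avoid.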

2.
Bioinformatics ; 38(10): 2719-2726, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561179

ABSTRACT

MOTIVATION: Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features. RESULTS: We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern. 
AVAILABILITY AND IMPLEMENTATION: TopHap is available at https://github.com/SayakaMiura/TopHap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
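
The core idea of restricting genomes to positions that harbor common variants, then counting the resulting haplotypes, can be illustrated with a short sketch. This is not the TopHap code; the frequency cutoff, the function names, and the toy alignment are assumptions made for illustration, and the spatiotemporal filtering and bootstrap steps described in the abstract are left out.

```python
from collections import Counter


def common_variant_positions(genomes, reference, min_freq):
    """Return 0-based positions where one non-reference base reaches min_freq.

    `genomes` are aligned strings of the same length as `reference`.
    """
    n = len(genomes)
    positions = []
    for i, ref_base in enumerate(reference):
        alt_counts = Counter(g[i] for g in genomes if g[i] != ref_base)
        if alt_counts and max(alt_counts.values()) / n >= min_freq:
            positions.append(i)
    return positions


def haplotype_counts(genomes, positions):
    """Count haplotypes, i.e. the bases each genome carries at `positions`."""
    return Counter("".join(g[i] for i in positions) for g in genomes)


# Hypothetical toy alignment (six genomes, six sites).
REFERENCE = "ACGTAC"
GENOMES = ["ACGTAC", "ATGTAC", "ATGTAA", "ATGTAA", "ACGTAC", "ACGGAC"]

if __name__ == "__main__":
    sites = common_variant_positions(GENOMES, REFERENCE, min_freq=0.3)
    print(sites)
    print(haplotype_counts(GENOMES, sites))
```

In the toy alignment, the variant seen in a single genome (site 3) falls below the cutoff and is dropped, which is the noise-reduction effect the abstract attributes to using only common variants.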


Subject(s)
COVID-19, SARS-CoV-2, Viral Genome, Haplotypes, Humans, Mutation, Phylogeny, SARS-CoV-2/genetics
3.
Mol Biol Evol ; 39(4)2022 04 11.
Article in English | MEDLINE | ID: mdl-35325204

ABSTRACT

Among the 30 nonsynonymous nucleotide substitutions in the Omicron S-gene are 13 that have only rarely been seen in other SARS-CoV-2 sequences. These mutations cluster within three functionally important regions of the S-gene at sites that will likely impact (1) interactions between subunits of the Spike trimer and the predisposition of subunits to shift from down to up configurations, (2) interactions of Spike with ACE2 receptors, and (3) the priming of Spike for membrane fusion. We show here that, based on both the rarity of these 13 mutations in intrapatient sequencing reads and patterns of selection at the codon sites where the mutations occur in SARS-CoV-2 and related sarbecoviruses, prior to the emergence of Omicron the mutations would have been predicted to decrease the fitness of any virus within which they occurred. We further propose that the mutations in each of the three clusters therefore cooperatively interact to both mitigate their individual fitness costs, and, in combination with other mutations, adaptively alter the function of Spike. Given the evident epidemic growth advantages of Omicron over all previously known SARS-CoV-2 lineages, it is crucial to determine both how such complex and highly adaptive mutation constellations were assembled within the Omicron S-gene, and why, despite unprecedented global genomic surveillance efforts, the early stages of this assembly process went completely undetected.


Subject(s)
COVID-19, Coronavirus Spike Glycoprotein, COVID-19/genetics, Humans, Mutation, SARS-CoV-2/genetics, Coronavirus Spike Glycoprotein/genetics
4.
Sci Transl Med ; 14(633): eabk3445, 2022 Feb 23.
Article in English | MEDLINE | ID: mdl-35014856

ABSTRACT

SARS-CoV-2 evolution threatens vaccine- and natural infection-derived immunity as well as the efficacy of therapeutic antibodies. To improve public health preparedness, we sought to predict which existing amino acid mutations in SARS-CoV-2 might contribute to future variants of concern. We tested the predictive value of features comprising epidemiology, evolution, immunology, and neural network-based protein sequence modeling, and identified primary biological drivers of SARS-CoV-2 intra-pandemic evolution. We found evidence that ACE2-mediated transmissibility and resistance to population-level host immunity has waxed and waned as a primary driver of SARS-CoV-2 evolution over time. We retroactively identified with high accuracy (area under the receiver operator characteristic curve, AUROC=0.92-0.97) mutations that will spread, at up to four months in advance, across different phases of the pandemic. The behavior of the model was consistent with a plausible causal structure wherein epidemiological covariates combine the effects of diverse and shifting drivers of viral fitness. We applied our model to forecast mutations that will spread in the future and characterize how these mutations affect the binding of therapeutic antibodies. These findings demonstrate that it is possible to forecast the driver mutations that could appear in emerging SARS-CoV-2 variants of concern. We validate this result against Omicron, showing elevated predictive scores for its component mutations prior to emergence, and rapid score increase across daily forecasts during emergence. This modeling approach may be applied to any rapidly evolving pathogens with sufficiently dense genomic surveillance data, such as influenza, and unknown future pandemic viruses.
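
For readers unfamiliar with the AUROC figure quoted above, the sketch below computes it directly from its definition: the probability that a randomly chosen positive example (here, a mutation that later spread) receives a higher score than a randomly chosen negative one, with ties counted as one half. The scores and labels are invented toy values, not data from the study, and the function name is hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` holds 1 for positives and 0 for negatives; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: model scores and whether each mutation later spread.
TOY_SCORES = [0.9, 0.8, 0.4, 0.35, 0.1]
TOY_LABELS = [1, 1, 0, 1, 0]

if __name__ == "__main__":
    print(auroc(TOY_SCORES, TOY_LABELS))
```

The pairwise form is quadratic in the number of examples; it is used here only because it mirrors the definition most directly.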


Subject(s)
COVID-19, SARS-CoV-2, COVID-19/virology, Humans, Mutation, Pandemics, SARS-CoV-2/genetics
5.
bioRxiv ; 2022 Jan 18.
Article in English | MEDLINE | ID: mdl-35075456

ABSTRACT

Among the 30 non-synonymous nucleotide substitutions in the Omicron S-gene are 13 that have only rarely been seen in other SARS-CoV-2 sequences. These mutations cluster within three functionally important regions of the S-gene at sites that will likely impact (i) interactions between subunits of the Spike trimer and the predisposition of subunits to shift from down to up configurations, (ii) interactions of Spike with ACE2 receptors, and (iii) the priming of Spike for membrane fusion. We show here that, based on both the rarity of these 13 mutations in intrapatient sequencing reads and patterns of selection at the codon sites where the mutations occur in SARS-CoV-2 and related sarbecoviruses, prior to the emergence of Omicron the mutations would have been predicted to decrease the fitness of any genomes within which they occurred. We further propose that the mutations in each of the three clusters therefore cooperatively interact to both mitigate their individual fitness costs, and adaptively alter the function of Spike. Given the evident epidemic growth advantages of Omicron over all previously known SARS-CoV-2 lineages, it is crucial to determine both how such complex and highly adaptive mutation constellations were assembled within the Omicron S-gene, and why, despite unprecedented global genomic surveillance efforts, the early stages of this assembly process went completely undetected.

6.
bioRxiv ; 2021 Dec 14.
Article in English | MEDLINE | ID: mdl-34931186

ABSTRACT

MOTIVATION: Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of SARS-CoV-2 strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites and millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate phylogenetic inference of resolvable phylogenetic features. RESULTS: We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. To assess topological robustness, we develop a bootstrap resampling strategy that resamples genomes spatiotemporally. The application of TopHap to build a phylogeny of 68,057 genomes (68KG) produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major variants of concern. AVAILABILITY: TopHap is available on the web at https://github.com/SayakaMiura/TopHap . CONTACT: s.kumar@temple.edu.

7.
J Int AIDS Soc ; 24(11): e25836, 2021 11.
Article in English | MEDLINE | ID: mdl-34762774

ABSTRACT

INTRODUCTION: Molecular surveillance systems could provide public health benefits to focus strategies to improve the HIV care continuum. Here, we infer the HIV genetic network of Mexico City in 2020, and identify actively growing clusters that could represent relevant targets for intervention. METHODS: All new diagnoses, referrals from other institutions, as well as persons returning to care, enrolling at the largest HIV clinic in Mexico City were invited to participate in the study. The network was inferred from HIV pol sequences, using pairwise genetic distance methods, with a locally hosted, secure version of the HIV-TRACE tool: Seguro HIV-TRACE. Socio-demographic, clinical and behavioural metadata were overlaid across the network to design focused prevention interventions. RESULTS: A total of 3168 HIV sequences from unique individuals were included. One thousand one hundred and fifty (36%) sequences formed 1361 links within 386 transmission clusters in the network. Cluster size varied from 2 to 14 (63% were dyads). After adjustment for covariates, lower age (adjusted odds ratio [aOR]: 0.37, p<0.001; >34 vs. <24 years), being a man who has sex with men (MSM) (aOR: 2.47, p = 0.004; MSM vs. cisgender women), having higher viral load (aOR: 1.28, p<0.001) and higher CD4+ T cell count (aOR: 1.80, p<0.001; ≥500 vs. <200 cells/mm³) remained associated with higher odds of clustering. Compared to MSM, cisgender women and heterosexual men had significantly lower education (none or any elementary: 59.1% and 54.2% vs. 16.6%, p<0.001) and socio-economic status (low income: 36.4% and 29.0% vs. 18.6%, p = 0.03). We identified 10 (2.6%) clusters with constant growth, for prioritized intervention, that included intersecting sexual risk groups, highly connected nodes and bridge nodes between possible sub-clusters with high growth potential. CONCLUSIONS: HIV transmission in Mexico City is strongly driven by young MSM with higher education level and recent infection.
Nevertheless, leveraging network inference, we identified actively growing clusters that could be prioritized for focused intervention with demographic and risk characteristics that do not necessarily reflect the ones observed in the overall clustering population. Further studies evaluating different models to predict growing clusters are warranted. Focused interventions will have to consider structural and risk disparities between the MSM and the heterosexual populations.


Subject(s)
HIV Infections, Sexual and Gender Minorities, Female, Gene Regulatory Networks, HIV Infections/diagnosis, HIV Infections/epidemiology, Male Homosexuality, Humans, Male, Mexico/epidemiology
8.
Cell ; 184(20): 5189-5200.e7, 2021 09 30.
Article in English | MEDLINE | ID: mdl-34537136

ABSTRACT

The independent emergence late in 2020 of the B.1.1.7, B.1.351, and P.1 lineages of SARS-CoV-2 prompted renewed concerns about the evolutionary capacity of this virus to overcome public health interventions and rising population immunity. Here, by examining patterns of synonymous and non-synonymous mutations that have accumulated in SARS-CoV-2 genomes since the pandemic began, we find that the emergence of these three "501Y lineages" coincided with a major global shift in the selective forces acting on various SARS-CoV-2 genes. Following their emergence, the adaptive evolution of 501Y lineage viruses has involved repeated selectively favored convergent mutations at 35 genome sites, mutations we refer to as the 501Y meta-signature. The ongoing convergence of viruses in many other lineages on this meta-signature suggests that it includes multiple mutation combinations capable of promoting the persistence of diverse SARS-CoV-2 lineages in the face of mounting host immune recognition.


Subject(s)
COVID-19/epidemiology, Molecular Evolution, Mutation, Pandemics, SARS-CoV-2/genetics, Amino Acid Sequence/genetics, COVID-19/immunology, COVID-19/transmission, COVID-19/virology, Codon/genetics, Viral Genes, Genetic Drift, Host Adaptation/genetics, Humans, Immune Evasion, Phylogeny, Public Health
9.
Mol Biol Evol ; 38(8): 3046-3059, 2021 07 29.
Article in English | MEDLINE | ID: mdl-33942847

ABSTRACT

Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).


Subject(s)
COVID-19/genetics, SARS-CoV-2/genetics, Biological Evolution, COVID-19/metabolism, Computational Biology/methods, Contact Tracing/methods, Molecular Evolution, Viral Genome, Humans, Mutation, Pandemics, Phylogeny, SARS-CoV-2/metabolism, SARS-CoV-2/pathogenicity, DNA Sequence Analysis/methods
10.
Nature ; 592(7854): 438-443, 2021 04.
Article in English | MEDLINE | ID: mdl-33690265

ABSTRACT

Continued uncontrolled transmission of SARS-CoV-2 in many parts of the world is creating conditions for substantial evolutionary changes to the virus [1,2]. Here we describe a newly arisen lineage of SARS-CoV-2 (designated 501Y.V2; also known as B.1.351 or 20H) that is defined by eight mutations in the spike protein, including three substitutions (K417N, E484K and N501Y) at residues in its receptor-binding domain that may have functional importance [3-5]. This lineage was identified in South Africa after the first wave of the epidemic in a severely affected metropolitan area (Nelson Mandela Bay) that is located on the coast of the Eastern Cape province. This lineage spread rapidly, and became dominant in Eastern Cape, Western Cape and KwaZulu-Natal provinces within weeks. Although the full import of the mutations is yet to be determined, the genomic data, which show rapid expansion and displacement of other lineages in several regions, suggest that this lineage is associated with a selection advantage that most plausibly results from increased transmissibility or immune escape [6-8].


Subject(s)
COVID-19/virology, Mutation, Phylogeny, Phylogeography, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, COVID-19/epidemiology, COVID-19/immunology, COVID-19/transmission, DNA Mutational Analysis, Molecular Evolution, Genetic Fitness, Humans, Immune Evasion, Molecular Models, SARS-CoV-2/immunology, SARS-CoV-2/pathogenicity, Genetic Selection, South Africa/epidemiology, Coronavirus Spike Glycoprotein/chemistry, Coronavirus Spike Glycoprotein/genetics, Coronavirus Spike Glycoprotein/metabolism, Time Factors
11.
medRxiv ; 2021 Jul 25.
Article in English | MEDLINE | ID: mdl-33688681

ABSTRACT

The emergence and rapid rise in prevalence of three independent SARS-CoV-2 "501Y lineages", B.1.1.7, B.1.351 and P.1, in the last three months of 2020 prompted renewed concerns about the evolutionary capacity of SARS-CoV-2 to adapt to both rising population immunity, and public health interventions such as vaccines and social distancing. Viruses giving rise to the different 501Y lineages have, presumably under intense natural selection following a shift in host environment, independently acquired multiple unique and convergent mutations. As a consequence, all have gained epidemiological and immunological properties that will likely complicate the control of COVID-19. Here, by examining patterns of mutations that arose in SARS-CoV-2 genomes during the pandemic we find evidence of a major change in the selective forces acting on various SARS-CoV-2 genes and gene segments (such as S, nsp2 and nsp6), that likely coincided with the emergence of the 501Y lineages. In addition to involving continuing sequence diversification, we find evidence that a significant portion of the ongoing adaptive evolution of the 501Y lineages also involves further convergence between the lineages. Our findings highlight the importance of monitoring how members of these known 501Y lineages, and others still undiscovered, are convergently evolving similar strategies to ensure their persistence in the face of mounting infection- and vaccine-induced host immune recognition.

12.
PLoS Biol ; 19(3): e3001115, 2021 03.
Article in English | MEDLINE | ID: mdl-33711012

ABSTRACT

Virus host shifts are generally associated with novel adaptations to exploit the cells of the new host species optimally. Surprisingly, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has apparently required little to no significant adaptation to humans since the start of the Coronavirus Disease 2019 (COVID-19) pandemic and to October 2020. Here we assess the types of natural selection taking place in Sarbecoviruses in horseshoe bats versus the early SARS-CoV-2 evolution in humans. While there is moderate evidence of diversifying positive selection in SARS-CoV-2 in humans, it is limited to the early phase of the pandemic, and purifying selection is much weaker in SARS-CoV-2 than in related bat Sarbecoviruses. In contrast, our analysis detects evidence for significant positive episodic diversifying selection acting at the base of the bat virus lineage SARS-CoV-2 emerged from, accompanied by an adaptive depletion in CpG composition presumed to be linked to the action of antiviral mechanisms in these ancestral bat hosts. The closest bat virus to SARS-CoV-2, RmYN02 (sharing an ancestor about 1976), is a recombinant with a structure that includes differential CpG content in Spike; clear evidence of coinfection and evolution in bats without involvement of other species. While an undiscovered "facilitating" intermediate species cannot be discounted, collectively, our results support the progenitor of SARS-CoV-2 being capable of efficient human-human transmission as a consequence of its adaptive evolutionary history in bats, not humans, which created a relatively generalist virus.
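
CpG depletion of the kind mentioned above is often summarized as an observed/expected ratio of CG dinucleotides. The abstract does not say which statistic the authors used, so the formula in this sketch (CG count times sequence length, divided by the product of the C and G counts) is an assumption, chosen because it is a widely used convention. The function name and example sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Values well below 1 indicate fewer CG dinucleotides than base
    composition alone would predict. Returns 0.0 if C or G is absent.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    print(cpg_obs_exp("ACGTCGAA"))  # CG-rich toy sequence
    print(cpg_obs_exp("CAGTCAGT"))  # same bases available, no CG dinucleotide
```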


Subject(s)
COVID-19/virology, Chiroptera/virology, SARS-CoV-2/genetics, Viral Zoonoses/virology, Animals, COVID-19/epidemiology, COVID-19/transmission, Molecular Evolution, Viral Genome, Host Specificity, Humans, Pandemics, Phylogeny, Virus Receptors/genetics, SARS-CoV-2/pathogenicity, Genetic Selection, Viral Zoonoses/genetics, Viral Zoonoses/transmission
13.
PLoS One ; 16(3): e0248337, 2021.
Article in English | MEDLINE | ID: mdl-33711070

ABSTRACT

Despite many attempts to introduce evolutionary models that permit substitutions to instantly alter more than one nucleotide in a codon, the prevailing wisdom remains that such changes are rare and generally negligible or are reflective of non-biological artifacts, such as alignment errors. Codon models continue to posit that only single nucleotide changes have non-zero rates. Here, we develop and test a simple hierarchy of codon-substitution models with non-zero evolutionary rates for only one-nucleotide (1H), one- and two-nucleotide (2H), or any (3H) codon substitutions. Using over 42,000 empirical alignments, we find widespread statistical support for multiple hits: 61% of alignments prefer models with 2H allowed, and 23% prefer models with 3H allowed. Analyses of simulated data suggest that these results are not likely to be due to simple artifacts such as model misspecification or alignment errors. Further modeling reveals that synonymous codon island jumping among codons encoding serine, especially along short branches, contributes significantly to this 3H signal. While serine codons were prominently involved in multiple-hit substitutions, there were other common exchanges contributing to better model fit. It appears that a small subset of sites in most alignments have unusual evolutionary dynamics not well explained by existing model formalisms, and that commonly estimated quantities, such as dN/dS ratios, may be biased by model misspecification. Our findings highlight the need for continued evaluation of assumptions underlying workhorse evolutionary models and subsequent evolutionary inference techniques. We provide a software implementation for evolutionary biologists to assess the potential impact of extra base hits in their data in the HyPhy package and in the Datamonkey.org server.
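
The 1H/2H/3H hierarchy comes down to how many nucleotide positions differ between two codons. The sketch below counts those differences and shows why serine is a natural source of multiple-hit signal: in the standard genetic code its six codons form two groups (TCN and AGY) that are at least two nucleotide changes apart. This is an illustration only, not the HyPhy implementation; the function and constant names are hypothetical.

```python
def hits(codon_a, codon_b):
    """Number of nucleotide positions at which two codons differ (0 to 3)."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


# Largest number of simultaneous nucleotide changes given a non-zero rate
# in each model class, following the hierarchy described in the abstract.
MAX_HITS = {"1H": 1, "2H": 2, "3H": 3}


def allowed(codon_a, codon_b, model):
    """True if the model class permits a direct substitution between the codons."""
    return 1 <= hits(codon_a, codon_b) <= MAX_HITS[model]


# Serine codons in the standard genetic code, split into their two "islands".
SERINE_TCN = ["TCT", "TCC", "TCA", "TCG"]
SERINE_AGY = ["AGT", "AGC"]

# Fewest nucleotide changes needed to move directly between the islands.
MIN_SERINE_HITS = min(hits(a, b) for a in SERINE_TCN for b in SERINE_AGY)

if __name__ == "__main__":
    print(MIN_SERINE_HITS)
    print(allowed("TCT", "AGT", "1H"), allowed("TCT", "AGT", "2H"))
```

Under a 1H-only model, a direct serine-to-serine change between the two islands has rate zero, so such a change can only be explained through a non-serine intermediate codon.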


Subject(s)
Codon/genetics, Genetic Models, Phylogeny, Software, Molecular Evolution, Nucleotides
14.
Mol Biol Evol ; 38(3): 1184-1198, 2021 03 09.
Article in English | MEDLINE | ID: mdl-33064823

ABSTRACT

A number of evolutionary hypotheses can be tested by comparing selective pressures among sets of branches in a phylogenetic tree. When the question of interest is to identify specific sites within genes that may be evolving differently, a common approach is to perform separate analyses on subsets of sequences and compare parameter estimates in a post hoc fashion. This approach is statistically suboptimal and not always applicable. Here, we develop a simple extension of a popular fixed effects likelihood method in the context of codon-based evolutionary phylogenetic maximum likelihood testing, Contrast-FEL. It is suitable for identifying individual alignment sites where any among the K≥2 sets of branches in a phylogenetic tree have detectably different ω ratios, indicative of different selective regimes. Using extensive simulations, we show that Contrast-FEL delivers good power, exceeding 90% for sufficiently large differences, while maintaining tight control over false positive rates, when the model is correctly specified. We conclude by applying Contrast-FEL to data from five previously published studies spanning a diverse range of organisms and focusing on different evolutionary questions.


Subject(s)
Genetic Techniques, Phylogeny, Genetic Selection, Brassicaceae/genetics, Cytochromes b/genetics, HIV Reverse Transcriptase/genetics, Haemosporida/genetics, Rhodopsin/genetics, Ribulose-Bisphosphate Carboxylase/genetics, Trichomes/genetics
15.
bioRxiv ; 2021 Jan 19.
Article in English | MEDLINE | ID: mdl-32995781

ABSTRACT

We report the likely most recent common ancestor of SARS-CoV-2 - the coronavirus that causes COVID-19. This progenitor SARS-CoV-2 genome was recovered through a novel application and advancement of computational methods initially developed to reconstruct the mutational history of tumor cells in a patient. The progenitor differs from the earliest coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the USA harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide as soon as weeks after the first reported cases of COVID-19. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains, which have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic. There have been multiple replacements of predominant coronavirus strains in Europe and Asia and the continued presence of multiple high-frequency strains in Asia and North America. We provide a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).

16.
PLoS Pathog ; 16(8): e1008643, 2020 08.
Article in English | MEDLINE | ID: mdl-32790776

ABSTRACT

The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.


Subject(s)
Betacoronavirus/pathogenicity, Coronavirus Infections/virology, Viral Pneumonia/virology, Public Health, Severe Acute Respiratory Syndrome/virology, COVID-19, Data Analysis, Humans, Pandemics, SARS-CoV-2
17.
bioRxiv ; 2020 Jul 30.
Article in English | MEDLINE | ID: mdl-32577659

ABSTRACT

RNA viruses are proficient at switching host species, and evolving adaptations to exploit the new host's cells efficiently. Surprisingly, SARS-CoV-2 has apparently required no significant adaptation to humans since the start of the COVID-19 pandemic, with no observed selective sweeps since genome sampling began. Here we assess the types of natural selection taking place in Sarbecoviruses in horseshoe bats versus SARS-CoV-2 evolution in humans. While there is moderate evidence of diversifying positive selection in SARS-CoV-2 in humans, it is limited to the early phase of the pandemic, and purifying selection is much weaker in SARS-CoV-2 than in related bat Sarbecoviruses. In contrast, our analysis detects significant positive episodic diversifying selection acting on the bat virus lineage SARS-CoV-2 emerged from, accompanied by an adaptive depletion in CpG composition presumed to be linked to the action of antiviral mechanisms in ancestral hosts. The closest bat virus to SARS-CoV-2, RmYN02 (sharing an ancestor ∼1976), is a recombinant with a structure that includes differential CpG content in Spike; clear evidence of coinfection and evolution in bats without involvement of other species. Collectively, our results demonstrate the progenitor of SARS-CoV-2 was capable of near immediate human-human transmission as a consequence of its adaptive evolutionary history in bats, not humans.

18.
Mol Biol Evol ; 37(1): 295-299, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31504749

ABSTRACT

HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework. It has become a popular choice for characterizing various aspects of the evolutionary process: natural selection, evolutionary rates, recombination, and coevolution. The 2.5 release (available from www.hyphy.org) includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.


Subject(s)
Genetic Techniques ; Phylogeny ; Software
19.
Methods Mol Biol ; 1910: 427-468, 2019.
Article in English | MEDLINE | ID: mdl-31278673

ABSTRACT

Natural selection is a fundamental force shaping organismal evolution, as it both maintains function and enables adaptation and innovation. Viruses, with their typically short and largely coding genomes, experience strong and diverse selective forces, sometimes acting on timescales that can be directly measured. These selection pressures emerge from an antagonistic interplay between rapidly changing fitness requirements (immune and antiviral responses from hosts, transmission between hosts, or colonization of new host species) and functional imperatives (the ability to infect hosts or host cells and replicate within hosts). Indeed, computational methods to quantify these evolutionary forces using molecular sequence data were initially, dating back to the 1980s, applied to the study of viral pathogens. This preference largely emerged because the strong selective forces are easiest to detect in viruses, and, of course, viruses have clear biomedical relevance. Recent commoditization of affordable high-throughput sequencing has made it possible to generate truly massive genomic data sets, on which powerful and accurate methods can yield a very detailed depiction of when, where, and (sometimes) how viral pathogens respond to various selective forces. Here, we present recent statistical developments and state-of-the-art methods to identify and characterize these selection pressures from protein-coding sequence alignments and phylogenies. Methods described here can reveal critical information about various evolutionary regimes, including whole-gene selection, lineage-specific selection, and site-specific selection acting upon viral genomes, while accounting for confounding biological processes, such as recombination and variation in mutation rates.


Subject(s)
Evolution, Molecular ; Genome, Viral ; Genomics ; Viruses/genetics ; Codon ; Computational Biology/methods ; Genetic Variation ; Genomics/methods ; Phylogeny ; Recombination, Genetic ; Selection, Genetic ; Software ; Viruses/classification
20.
PLoS Comput Biol ; 14(12): e1006498, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30543621

ABSTRACT

Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster), and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.


Subject(s)
Sequence Alignment/methods ; Sequence Analysis, DNA/methods ; Viruses/genetics ; High-Throughput Nucleotide Sequencing/methods ; Phylogeny ; Software