Results 1 - 20 of 29
1.
Front Bioinform ; 4: 1400003, 2024.
Article in English | MEDLINE | ID: mdl-39086842

ABSTRACT

Molecular surveillance of viral pathogens and inference of transmission networks from genomic data play an increasingly important role in public health efforts, especially for HIV-1. For many methods, the genetic distance threshold used to connect sequences in the transmission network is a key parameter informing the properties of inferred networks. Using a distance threshold that is too high can result in a network with many spurious links, making it difficult to interpret. Conversely, a distance threshold that is too low can result in a network with too few links, which may not capture key insights into clusters of public health concern. Published research using the HIV-TRACE software package frequently uses the default threshold of 0.015 substitutions/site for HIV pol gene sequences, but in many cases, investigators heuristically select other threshold parameters to better capture the underlying dynamics of the epidemic they are studying. Here, we present a general heuristic scoring approach for tuning a distance threshold adaptively, which seeks to prevent the formation of giant clusters. We prioritize the ratio of the sizes of the largest and the second largest cluster, maximizing the number of clusters present in the network. We apply our scoring heuristic to outbreaks with different characteristics, such as regional or temporal variability, and demonstrate the utility of using the scoring mechanism's suggested distance threshold to identify clusters exhibiting risk factors that would have otherwise been more difficult to identify. For example, while we found that a 0.015 substitutions/site distance threshold is typical for US-like epidemics, recent outbreaks like the CRF07_BC subtype among men who have sex with men (MSM) in China have been found to have a lower optimal threshold of 0.005 to better capture the transition from injected drug use (IDU) to MSM as the primary risk factor. 
Alternatively, in communities surrounding Lake Victoria in Uganda, where there has been sustained heterosexual transmission for many years, we found that a larger distance threshold is necessary to capture a more risk factor-diverse population with sparse sampling over a longer period of time. Such identification may allow for more informed intervention action by respective public health officials.
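
The threshold-tuning idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' published scoring formula: the function names, the `illustrative_score` definition (number of clusters divided by the largest-to-second-largest size ratio), and the example distances are all assumptions made for the sketch.

```python
def clusters_at_threshold(distances, threshold):
    """Group sequences into clusters by linking every pair whose
    genetic distance is at or below the threshold (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    # Only linked sequences appear here, so every cluster has size >= 2.
    return list(groups.values())


def illustrative_score(clusters):
    """Hypothetical score: reward many clusters, penalize a giant one.
    NOT the formula from the paper; it only mirrors the stated priorities."""
    sizes = sorted((len(c) for c in clusters), reverse=True)
    if len(sizes) < 2:
        return 0.0  # no second cluster, so the ratio is undefined
    ratio = sizes[0] / sizes[1]
    return len(sizes) / ratio


def pick_threshold(distances, candidates):
    """Return the candidate threshold with the highest illustrative score."""
    return max(
        candidates,
        key=lambda t: illustrative_score(clusters_at_threshold(distances, t)),
    )


# Made-up pairwise distances (substitutions/site) between six sequences.
EXAMPLE_DISTANCES = {
    ("A", "B"): 0.004,
    ("C", "D"): 0.006,
    ("E", "F"): 0.012,
    ("B", "C"): 0.020,
    ("D", "E"): 0.030,
}
```

With the example data, calling `pick_threshold(EXAMPLE_DISTANCES, [0.005, 0.01, 0.015, 0.025, 0.035])` favors the threshold that yields three separate pairs over the higher ones that merge them into one dominant cluster.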

2.
bioRxiv ; 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38559140

ABSTRACT

Molecular surveillance of viral pathogens and inference of transmission networks from genomic data play an increasingly important role in public health efforts, especially for HIV-1. For many methods, the genetic distance threshold used to connect sequences in the transmission network is a key parameter informing the properties of inferred networks. Using a distance threshold that is too high can result in a network with many spurious links, making it difficult to interpret. Conversely, a distance threshold that is too low can result in a network with too few links, which may not capture key insights into clusters of public health concern. Published research using the HIV-TRACE software package frequently uses the default threshold of 0.015 substitutions/site for HIV pol gene sequences, but in many cases, investigators heuristically select other threshold parameters to better capture the underlying dynamics of the epidemic they are studying. Here, we present a general heuristic scoring approach for tuning a distance threshold adaptively, which seeks to prevent the formation of giant clusters. We prioritize the ratio of the sizes of the largest and the second largest cluster, maximizing the number of clusters present in the network. We apply our scoring heuristic to outbreaks with different characteristics, such as regional or temporal variability, and demonstrate the utility of using the scoring mechanism's suggested distance threshold to identify clusters exhibiting risk factors that would have otherwise been more difficult to identify. For example, while we found that a 0.015 substitutions/site distance threshold is typical for US-like epidemics, recent outbreaks like the CRF07_BC subtype among men who have sex with men (MSM) in China have been found to have a lower optimal threshold of 0.005 to better capture the transition from injected drug use (IDU) to MSM as the primary risk factor. 
Alternatively, in communities surrounding Lake Victoria in Uganda, where there has been sustained heterosexual transmission for many years, we found that a larger distance threshold is necessary to capture a more risk factor-diverse population with sparse sampling over a longer period of time. Such identification may allow for more informed intervention action by respective public health officials.

3.
Bioinformatics ; 38(10): 2719-2726, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561179

ABSTRACT

MOTIVATION: Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features. RESULTS: We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern. 
AVAILABILITY AND IMPLEMENTATION: TopHap is available at https://github.com/SayakaMiura/TopHap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
COVID-19 , SARS-CoV-2 , Genome, Viral , Haplotypes , Humans , Mutation , Phylogeny , SARS-CoV-2/genetics
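
The first step described in the abstract above, restricting genomes to positions that harbor common variants and then tallying the resulting haplotypes, can be sketched as below. This is a simplified illustration and not the TopHap implementation: it uses a single global frequency cutoff and ignores the spatiotemporal slicing the abstract mentions, and all function and parameter names are hypothetical.

```python
from collections import Counter


def common_variant_positions(genomes, reference, min_freq):
    """Positions where the most frequent non-reference base reaches
    min_freq among the aligned genomes (assumed equal length)."""
    n = len(genomes)
    positions = []
    for i, ref_base in enumerate(reference):
        alt_counts = Counter(g[i] for g in genomes if g[i] != ref_base)
        if alt_counts and alt_counts.most_common(1)[0][1] / n >= min_freq:
            positions.append(i)
    return positions


def common_haplotypes(genomes, reference, min_variant_freq, min_haplotype_freq):
    """Return the selected positions and the haplotypes (bases at those
    positions) whose frequency reaches min_haplotype_freq."""
    positions = common_variant_positions(genomes, reference, min_variant_freq)
    counts = Counter("".join(g[i] for i in positions) for g in genomes)
    n = len(genomes)
    kept = {h: c for h, c in counts.items() if c / n >= min_haplotype_freq}
    return positions, kept
```

Dropping rare positions before building haplotypes is what reduces the influence of sequencing errors in this sketch; a phylogeny would then be built on the retained haplotypes rather than on every genome.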
4.
Mol Biol Evol ; 39(4)2022 04 11.
Article in English | MEDLINE | ID: mdl-35325204

ABSTRACT

Among the 30 nonsynonymous nucleotide substitutions in the Omicron S-gene are 13 that have only rarely been seen in other SARS-CoV-2 sequences. These mutations cluster within three functionally important regions of the S-gene at sites that will likely impact (1) interactions between subunits of the Spike trimer and the predisposition of subunits to shift from down to up configurations, (2) interactions of Spike with ACE2 receptors, and (3) the priming of Spike for membrane fusion. We show here that, based on both the rarity of these 13 mutations in intrapatient sequencing reads and patterns of selection at the codon sites where the mutations occur in SARS-CoV-2 and related sarbecoviruses, prior to the emergence of Omicron the mutations would have been predicted to decrease the fitness of any virus within which they occurred. We further propose that the mutations in each of the three clusters therefore cooperatively interact to both mitigate their individual fitness costs, and, in combination with other mutations, adaptively alter the function of Spike. Given the evident epidemic growth advantages of Omicron over all previously known SARS-CoV-2 lineages, it is crucial to determine both how such complex and highly adaptive mutation constellations were assembled within the Omicron S-gene, and why, despite unprecedented global genomic surveillance efforts, the early stages of this assembly process went completely undetected.


Subjects
COVID-19 , Spike Glycoprotein, Coronavirus , COVID-19/genetics , Humans , Mutation , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics
5.
bioRxiv ; 2022 Jan 18.
Article in English | MEDLINE | ID: mdl-35075456

ABSTRACT

Among the 30 non-synonymous nucleotide substitutions in the Omicron S-gene are 13 that have only rarely been seen in other SARS-CoV-2 sequences. These mutations cluster within three functionally important regions of the S-gene at sites that will likely impact (i) interactions between subunits of the Spike trimer and the predisposition of subunits to shift from down to up configurations, (ii) interactions of Spike with ACE2 receptors, and (iii) the priming of Spike for membrane fusion. We show here that, based on both the rarity of these 13 mutations in intrapatient sequencing reads and patterns of selection at the codon sites where the mutations occur in SARS-CoV-2 and related sarbecoviruses, prior to the emergence of Omicron the mutations would have been predicted to decrease the fitness of any genomes within which they occurred. We further propose that the mutations in each of the three clusters therefore cooperatively interact to both mitigate their individual fitness costs, and adaptively alter the function of Spike. Given the evident epidemic growth advantages of Omicron over all previously known SARS-CoV-2 lineages, it is crucial to determine both how such complex and highly adaptive mutation constellations were assembled within the Omicron S-gene, and why, despite unprecedented global genomic surveillance efforts, the early stages of this assembly process went completely undetected.

6.
Sci Transl Med ; 14(633): eabk3445, 2022 Feb 23.
Article in English | MEDLINE | ID: mdl-35014856

ABSTRACT

SARS-CoV-2 evolution threatens vaccine- and natural infection-derived immunity as well as the efficacy of therapeutic antibodies. To improve public health preparedness, we sought to predict which existing amino acid mutations in SARS-CoV-2 might contribute to future variants of concern. We tested the predictive value of features comprising epidemiology, evolution, immunology, and neural network-based protein sequence modeling, and identified primary biological drivers of SARS-CoV-2 intra-pandemic evolution. We found evidence that ACE2-mediated transmissibility and resistance to population-level host immunity has waxed and waned as a primary driver of SARS-CoV-2 evolution over time. We retroactively identified with high accuracy (area under the receiver operator characteristic curve, AUROC=0.92-0.97) mutations that will spread, at up to four months in advance, across different phases of the pandemic. The behavior of the model was consistent with a plausible causal structure wherein epidemiological covariates combine the effects of diverse and shifting drivers of viral fitness. We applied our model to forecast mutations that will spread in the future and characterize how these mutations affect the binding of therapeutic antibodies. These findings demonstrate that it is possible to forecast the driver mutations that could appear in emerging SARS-CoV-2 variants of concern. We validate this result against Omicron, showing elevated predictive scores for its component mutations prior to emergence, and rapid score increase across daily forecasts during emergence. This modeling approach may be applied to any rapidly evolving pathogens with sufficiently dense genomic surveillance data, such as influenza, and unknown future pandemic viruses.


Subjects
COVID-19 , SARS-CoV-2 , COVID-19/virology , Humans , Mutation , Pandemics , SARS-CoV-2/genetics
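
The AUROC figures quoted in the abstract above measure how well a score ranks mutations that later spread above those that do not. The sketch below shows the standard rank-based definition of AUROC; it is a generic illustration, not the authors' evaluation code, and the function name and input layout are assumptions.

```python
def auroc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive (label 1) receives a higher score than a randomly chosen
    negative (label 0), counting ties as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the 0.92-0.97 range reported.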
7.
bioRxiv ; 2021 Dec 14.
Article in English | MEDLINE | ID: mdl-34931186

ABSTRACT

MOTIVATION: Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of SARS-CoV-2 strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites and millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate phylogenetic inference of resolvable phylogenetic features. RESULTS: We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. To assess topological robustness, we develop a bootstrap resampling strategy that resamples genomes spatiotemporally. The application of TopHap to build a phylogeny of 68,057 genomes (68KG) produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major variants of concern. AVAILABILITY: TopHap is available on the web at https://github.com/SayakaMiura/TopHap. CONTACT: s.kumar@temple.edu.

8.
J Int AIDS Soc ; 24(11): e25836, 2021 11.
Article in English | MEDLINE | ID: mdl-34762774

ABSTRACT

INTRODUCTION: Molecular surveillance systems could provide public health benefits to focus strategies to improve the HIV care continuum. Here, we infer the HIV genetic network of Mexico City in 2020, and identify actively growing clusters that could represent relevant targets for intervention. METHODS: All new diagnoses, referrals from other institutions, as well as persons returning to care, enrolling at the largest HIV clinic in Mexico City were invited to participate in the study. The network was inferred from HIV pol sequences, using pairwise genetic distance methods, with a locally hosted, secure version of the HIV-TRACE tool: Seguro HIV-TRACE. Socio-demographic, clinical and behavioural metadata were overlaid across the network to design focused prevention interventions. RESULTS: A total of 3168 HIV sequences from unique individuals were included. One thousand one hundred and fifty (36%) sequences formed 1361 links within 386 transmission clusters in the network. Cluster size varied from 2 to 14 (63% were dyads). After adjustment for covariates, lower age (adjusted odds ratio [aOR]: 0.37, p<0.001; >34 vs. <24 years), being a man who has sex with men (MSM) (aOR: 2.47, p = 0.004; MSM vs. cisgender women), having higher viral load (aOR: 1.28, p<0.001) and higher CD4+ T cell count (aOR: 1.80, p<0.001; ≥500 vs. <200 cells/mm³) remained associated with higher odds of clustering. Compared to MSM, cisgender women and heterosexual men had significantly lower education (none or any elementary: 59.1% and 54.2% vs. 16.6%, p<0.001) and socio-economic status (low income: 36.4% and 29.0% vs. 18.6%, p = 0.03). We identified 10 (2.6%) clusters with constant growth, for prioritized intervention, that included intersecting sexual risk groups, highly connected nodes and bridge nodes between possible sub-clusters with high growth potential. CONCLUSIONS: HIV transmission in Mexico City is strongly driven by young MSM with higher education level and recent infection. 
Nevertheless, leveraging network inference, we identified actively growing clusters that could be prioritized for focused intervention with demographic and risk characteristics that do not necessarily reflect the ones observed in the overall clustering population. Further studies evaluating different models to predict growing clusters are warranted. Focused interventions will have to consider structural and risk disparities between the MSM and the heterosexual populations.


Subjects
HIV Infections , Sexual and Gender Minorities , Female , Gene Regulatory Networks , HIV Infections/diagnosis , HIV Infections/epidemiology , Homosexuality, Male , Humans , Male , Mexico/epidemiology
9.
Cell ; 184(20): 5189-5200.e7, 2021 09 30.
Article in English | MEDLINE | ID: mdl-34537136

ABSTRACT

The independent emergence late in 2020 of the B.1.1.7, B.1.351, and P.1 lineages of SARS-CoV-2 prompted renewed concerns about the evolutionary capacity of this virus to overcome public health interventions and rising population immunity. Here, by examining patterns of synonymous and non-synonymous mutations that have accumulated in SARS-CoV-2 genomes since the pandemic began, we find that the emergence of these three "501Y lineages" coincided with a major global shift in the selective forces acting on various SARS-CoV-2 genes. Following their emergence, the adaptive evolution of 501Y lineage viruses has involved repeated selectively favored convergent mutations at 35 genome sites, mutations we refer to as the 501Y meta-signature. The ongoing convergence of viruses in many other lineages on this meta-signature suggests that it includes multiple mutation combinations capable of promoting the persistence of diverse SARS-CoV-2 lineages in the face of mounting host immune recognition.


Subjects
COVID-19/epidemiology , Evolution, Molecular , Mutation , Pandemics , SARS-CoV-2/genetics , Amino Acid Sequence/genetics , COVID-19/immunology , COVID-19/transmission , COVID-19/virology , Codon/genetics , Genes, Viral , Genetic Drift , Host Adaptation/genetics , Humans , Immune Evasion , Phylogeny , Public Health
10.
Mol Biol Evol ; 38(8): 3046-3059, 2021 07 29.
Article in English | MEDLINE | ID: mdl-33942847

ABSTRACT

Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).


Subjects
COVID-19/genetics , SARS-CoV-2/genetics , Biological Evolution , COVID-19/metabolism , Computational Biology/methods , Contact Tracing/methods , Evolution, Molecular , Genome, Viral , Humans , Mutation , Pandemics , Phylogeny , SARS-CoV-2/metabolism , SARS-CoV-2/pathogenicity , Sequence Analysis, DNA/methods
11.
medRxiv ; 2021 Jul 25.
Article in English | MEDLINE | ID: mdl-33688681

ABSTRACT

The emergence and rapid rise in prevalence of three independent SARS-CoV-2 "501Y lineages", B.1.1.7, B.1.351 and P.1, in the last three months of 2020 prompted renewed concerns about the evolutionary capacity of SARS-CoV-2 to adapt to both rising population immunity and public health interventions such as vaccines and social distancing. Viruses giving rise to the different 501Y lineages have, presumably under intense natural selection following a shift in host environment, independently acquired multiple unique and convergent mutations. As a consequence, all have gained epidemiological and immunological properties that will likely complicate the control of COVID-19. Here, by examining patterns of mutations that arose in SARS-CoV-2 genomes during the pandemic, we find evidence of a major change in the selective forces acting on various SARS-CoV-2 genes and gene segments (such as S, nsp2 and nsp6) that likely coincided with the emergence of the 501Y lineages. In addition to involving continuing sequence diversification, we find evidence that a significant portion of the ongoing adaptive evolution of the 501Y lineages also involves further convergence between the lineages. Our findings highlight the importance of monitoring how members of these known 501Y lineages, and others still undiscovered, are convergently evolving similar strategies to ensure their persistence in the face of mounting infection- and vaccine-induced host immune recognition.

12.
PLoS Biol ; 19(3): e3001115, 2021 03.
Article in English | MEDLINE | ID: mdl-33711012

ABSTRACT

Virus host shifts are generally associated with novel adaptations to exploit the cells of the new host species optimally. Surprisingly, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has apparently required little to no significant adaptation to humans since the start of the Coronavirus Disease 2019 (COVID-19) pandemic and to October 2020. Here we assess the types of natural selection taking place in Sarbecoviruses in horseshoe bats versus the early SARS-CoV-2 evolution in humans. While there is moderate evidence of diversifying positive selection in SARS-CoV-2 in humans, it is limited to the early phase of the pandemic, and purifying selection is much weaker in SARS-CoV-2 than in related bat Sarbecoviruses. In contrast, our analysis detects evidence for significant positive episodic diversifying selection acting at the base of the bat virus lineage SARS-CoV-2 emerged from, accompanied by an adaptive depletion in CpG composition presumed to be linked to the action of antiviral mechanisms in these ancestral bat hosts. The closest bat virus to SARS-CoV-2, RmYN02 (sharing an ancestor about 1976), is a recombinant with a structure that includes differential CpG content in Spike; clear evidence of coinfection and evolution in bats without involvement of other species. While an undiscovered "facilitating" intermediate species cannot be discounted, collectively, our results support the progenitor of SARS-CoV-2 being capable of efficient human-human transmission as a consequence of its adaptive evolutionary history in bats, not humans, which created a relatively generalist virus.


Subjects
COVID-19/virology , Chiroptera/virology , SARS-CoV-2/genetics , Viral Zoonoses/virology , Animals , COVID-19/epidemiology , COVID-19/transmission , Evolution, Molecular , Genome, Viral , Host Specificity , Humans , Pandemics , Phylogeny , Receptors, Virus/genetics , SARS-CoV-2/pathogenicity , Selection, Genetic , Viral Zoonoses/genetics , Viral Zoonoses/transmission
13.
PLoS One ; 16(3): e0248337, 2021.
Article in English | MEDLINE | ID: mdl-33711070

ABSTRACT

Despite many attempts to introduce evolutionary models that permit substitutions to instantly alter more than one nucleotide in a codon, the prevailing wisdom remains that such changes are rare and generally negligible or are reflective of non-biological artifacts, such as alignment errors. Codon models continue to posit that only single-nucleotide changes have non-zero rates. Here, we develop and test a simple hierarchy of codon-substitution models with non-zero evolutionary rates for only one-nucleotide (1H), one- and two-nucleotide (2H), or any (3H) codon substitutions. Using over 42,000 empirical alignments, we find widespread statistical support for multiple hits: 61% of alignments prefer models with 2H allowed, and 23% with 3H allowed. Analyses of simulated data suggest that these results are not likely to be due to simple artifacts such as model misspecification or alignment errors. Further modeling reveals that synonymous codon island jumping among codons encoding serine, especially along short branches, contributes significantly to this 3H signal. While serine codons were prominently involved in multiple-hit substitutions, there were other common exchanges contributing to better model fit. It appears that a small subset of sites in most alignments have unusual evolutionary dynamics not well explained by existing model formalisms, and that commonly estimated quantities, such as dN/dS ratios, may be biased by model misspecification. Our findings highlight the need for continued evaluation of assumptions underlying workhorse evolutionary models and subsequent evolutionary inference techniques. We provide a software implementation for evolutionary biologists to assess the potential impact of extra base hits in their data in the HyPhy package and in the Datamonkey.org server.


Subjects
Codon/genetics , Models, Genetic , Phylogeny , Software , Evolution, Molecular , Nucleotides
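
The 1H/2H/3H classes in the abstract above can be made concrete by counting how many nucleotide positions differ between two codons. The sketch below is only an illustration of that classification under the standard genetic code, not the HyPhy model implementation; the function names are hypothetical. It also shows why serine is special: its six codons split into two groups (TCN and AGY) that are at least two nucleotide changes apart.

```python
from collections import Counter
from itertools import combinations, product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hit_class(codon_a, codon_b):
    """Number of nucleotide positions that differ between two codons
    (1, 2, or 3 correspond to the 1H, 2H, and 3H classes)."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


def synonymous_hit_spectrum(amino_acid):
    """Count unordered pairs of codons for one amino acid by hit class."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    return Counter(hit_class(a, b) for a, b in combinations(codons, 2))
```

For serine, `synonymous_hit_spectrum("S")` reports pairs in the two-hit and three-hit classes, every one of which crosses between the TCN and AGY groups; a model that allows only single-nucleotide changes cannot move between those groups without passing through a non-serine codon.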
14.
Nature ; 592(7854): 438-443, 2021 04.
Article in English | MEDLINE | ID: mdl-33690265

ABSTRACT

Continued uncontrolled transmission of SARS-CoV-2 in many parts of the world is creating conditions for substantial evolutionary changes to the virus [1,2]. Here we describe a newly arisen lineage of SARS-CoV-2 (designated 501Y.V2; also known as B.1.351 or 20H) that is defined by eight mutations in the spike protein, including three substitutions (K417N, E484K and N501Y) at residues in its receptor-binding domain that may have functional importance [3-5]. This lineage was identified in South Africa after the first wave of the epidemic in a severely affected metropolitan area (Nelson Mandela Bay) that is located on the coast of the Eastern Cape province. This lineage spread rapidly, and became dominant in Eastern Cape, Western Cape and KwaZulu-Natal provinces within weeks. Although the full import of the mutations is yet to be determined, the genomic data, which show rapid expansion and displacement of other lineages in several regions, suggest that this lineage is associated with a selection advantage that most plausibly results from increased transmissibility or immune escape [6-8].


Subjects
COVID-19/virology , Mutation , Phylogeny , Phylogeography , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , COVID-19/epidemiology , COVID-19/immunology , COVID-19/transmission , DNA Mutational Analysis , Evolution, Molecular , Genetic Fitness , Humans , Immune Evasion , Models, Molecular , SARS-CoV-2/immunology , SARS-CoV-2/pathogenicity , Selection, Genetic , South Africa/epidemiology , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/metabolism , Time Factors
15.
Mol Biol Evol ; 38(3): 1184-1198, 2021 03 09.
Article in English | MEDLINE | ID: mdl-33064823

ABSTRACT

A number of evolutionary hypotheses can be tested by comparing selective pressures among sets of branches in a phylogenetic tree. When the question of interest is to identify specific sites within genes that may be evolving differently, a common approach is to perform separate analyses on subsets of sequences and compare parameter estimates in a post hoc fashion. This approach is statistically suboptimal and not always applicable. Here, we develop a simple extension of a popular fixed effects likelihood method in the context of codon-based evolutionary phylogenetic maximum likelihood testing, Contrast-FEL. It is suitable for identifying individual alignment sites where any among the K≥2 sets of branches in a phylogenetic tree have detectably different ω ratios, indicative of different selective regimes. Using extensive simulations, we show that Contrast-FEL delivers good power, exceeding 90% for sufficiently large differences, while maintaining tight control over false positive rates, when the model is correctly specified. We conclude by applying Contrast-FEL to data from five previously published studies spanning a diverse range of organisms and focusing on different evolutionary questions.


Subjects
Genetic Techniques , Phylogeny , Selection, Genetic , Brassicaceae/genetics , Cytochromes b/genetics , HIV Reverse Transcriptase/genetics , Haemosporida/genetics , Rhodopsin/genetics , Ribulose-Bisphosphate Carboxylase/genetics , Trichomes/genetics
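
The site-level question posed in the abstract above can be written as a pair of nested hypotheses about the ω ratios of the K branch sets. The notation below is a sketch: the symbols are chosen here for illustration, and the use of a likelihood ratio statistic referred to a chi-squared distribution with K-1 degrees of freedom is the conventional choice for such nested comparisons, assumed rather than taken from the abstract.

```latex
% Per-site test for differences in selective regime among K branch sets
H_0:\ \omega_1 = \omega_2 = \dots = \omega_K
\qquad \text{vs.} \qquad
H_A:\ \omega_i \neq \omega_j \ \text{for some } i \neq j

% Likelihood ratio statistic from the two fitted models at the site
\mathrm{LRT} = 2\left(\ln \hat{L}_A - \ln \hat{L}_0\right),
\qquad \mathrm{LRT} \overset{\text{approx.}}{\sim} \chi^2_{K-1} \ \text{under } H_0
```

Testing all branch sets jointly within one model is what distinguishes this approach from fitting each subset of sequences separately and comparing estimates afterwards.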
16.
bioRxiv ; 2021 Jan 19.
Article in English | MEDLINE | ID: mdl-32995781

ABSTRACT

We report the likely most recent common ancestor of SARS-CoV-2 - the coronavirus that causes COVID-19. This progenitor SARS-CoV-2 genome was recovered through a novel application and advancement of computational methods initially developed to reconstruct the mutational history of tumor cells in a patient. The progenitor differs from the earliest coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the USA harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide as soon as weeks after the first reported cases of COVID-19. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains, which have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic. There have been multiple replacements of predominant coronavirus strains in Europe and Asia and the continued presence of multiple high-frequency strains in Asia and North America. We provide a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).

17.
PLoS Pathog ; 16(8): e1008643, 2020 08.
Article in English | MEDLINE | ID: mdl-32790776

ABSTRACT

The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.


Subjects
Betacoronavirus/pathogenicity, Coronavirus Infections/virology, Viral Pneumonia/virology, Public Health, Severe Acute Respiratory Syndrome/virology, COVID-19, Data Analysis, Humans, Pandemics, SARS-CoV-2
18.
bioRxiv ; 2020 Jul 30.
Article in English | MEDLINE | ID: mdl-32577659

ABSTRACT

RNA viruses are proficient at switching host species, and evolving adaptations to exploit the new host's cells efficiently. Surprisingly, SARS-CoV-2 has apparently required no significant adaptation to humans since the start of the COVID-19 pandemic, with no observed selective sweeps since genome sampling began. Here we assess the types of natural selection taking place in Sarbecoviruses in horseshoe bats versus SARS-CoV-2 evolution in humans. While there is moderate evidence of diversifying positive selection in SARS-CoV-2 in humans, it is limited to the early phase of the pandemic, and purifying selection is much weaker in SARS-CoV-2 than in related bat Sarbecoviruses. In contrast, our analysis detects significant positive episodic diversifying selection acting on the bat virus lineage SARS-CoV-2 emerged from, accompanied by an adaptive depletion in CpG composition presumed to be linked to the action of antiviral mechanisms in ancestral hosts. The closest bat virus to SARS-CoV-2, RmYN02 (sharing an ancestor ∼1976), is a recombinant with a structure that includes differential CpG content in Spike; clear evidence of coinfection and evolution in bats without involvement of other species. Collectively, our results demonstrate the progenitor of SARS-CoV-2 was capable of near immediate human-human transmission as a consequence of its adaptive evolutionary history in bats, not humans.

19.
Mol Biol Evol ; 37(1): 295-299, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31504749

ABSTRACT

HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework. It has become a popular choice for characterizing various aspects of the evolutionary process: natural selection, evolutionary rates, recombination, and coevolution. The 2.5 release (available from www.hyphy.org) includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.


Subjects
Genetic Techniques, Phylogeny, Software
20.
Methods Mol Biol ; 1910: 427-468, 2019.
Article in English | MEDLINE | ID: mdl-31278673

ABSTRACT

Natural selection is a fundamental force shaping organismal evolution, as it both maintains function and enables adaptation and innovation. Viruses, with their typically short and largely coding genomes, experience strong and diverse selective forces, sometimes acting on timescales that can be directly measured. These selection pressures emerge from an antagonistic interplay between rapidly changing fitness requirements (immune and antiviral responses from hosts, transmission between hosts, or colonization of new host species) and functional imperatives (the ability to infect hosts or host cells and replicate within hosts). Indeed, computational methods to quantify these evolutionary forces using molecular sequence data were initially, dating back to the 1980s, applied to the study of viral pathogens. This preference largely emerged because the strong selective forces are easiest to detect in viruses, and, of course, viruses have clear biomedical relevance. Recent commoditization of affordable high-throughput sequencing has made it possible to generate truly massive genomic data sets, on which powerful and accurate methods can yield a very detailed depiction of when, where, and (sometimes) how viral pathogens respond to various selective forces. Here, we present recent statistical developments and state-of-the-art methods to identify and characterize these selection pressures from protein-coding sequence alignments and phylogenies. Methods described here can reveal critical information about various evolutionary regimes, including whole-gene selection, lineage-specific selection, and site-specific selection acting upon viral genomes, while accounting for confounding biological processes, such as recombination and variation in mutation rates.


Subjects
Molecular Evolution, Viral Genome, Genomics, Viruses/genetics, Codon, Computational Biology/methods, Genetic Variation, Genomics/methods, Phylogeny, Genetic Recombination, Genetic Selection, Software, Viruses/classification