Results 1 - 20 of 29
1.
Nat Microbiol ; 9(2): 550-560, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38316930

ABSTRACT

Pathogen lineage nomenclature systems are a key component of effective communication and collaboration for researchers and public health workers. Since February 2021, the Pango dynamic lineage nomenclature for SARS-CoV-2 has been sustained by crowdsourced lineage proposals as new isolates were sequenced. This approach is vulnerable to time-critical delays as well as regional and personal bias. Here we developed a simple heuristic approach for dividing phylogenetic trees into lineages, including the prioritization of key mutations or genes. Our implementation is efficient on extremely large phylogenetic trees consisting of millions of sequences and produces similar results to existing manually curated lineage designations when applied to SARS-CoV-2 and other viruses including chikungunya virus, Venezuelan equine encephalitis virus complex and Zika virus. This method offers a simple, automated and consistent approach to pathogen nomenclature that can assist researchers in developing and maintaining phylogeny-based classifications in the face of ever-increasing genomic datasets.


Subjects
Venezuelan Equine Encephalitis Virus, Zika Virus Infection, Zika Virus, Animals, Horses/genetics, Phylogeny, Venezuelan Equine Encephalitis Virus/genetics, Genomics, Base Sequence, Viral Genome, SARS-CoV-2/genetics, Zika Virus/genetics
2.
Virus Evol ; 10(1): vead085, 2024.
Article in English | MEDLINE | ID: mdl-38361813

ABSTRACT

With the rapid spread and evolution of SARS-CoV-2, the ability to monitor its transmission and distinguish among viral lineages is critical for pandemic response efforts. The most commonly used software for the lineage assignment of newly isolated SARS-CoV-2 genomes is pangolin, which offers two methods of assignment, pangoLEARN and pUShER. PangoLEARN rapidly assigns lineages using a machine-learning algorithm, while pUShER performs a phylogenetic placement to identify the lineage corresponding to a newly sequenced genome. In a preliminary study, we observed that pangoLEARN (decision tree model), while substantially faster than pUShER, offered less consistency across different versions of pangolin v3. Here, we expand upon this analysis to include v3 and v4 of pangolin, which moved the default algorithm for lineage assignment from pangoLEARN in v3 to pUShER in v4, and perform a thorough analysis confirming that pUShER is not only more stable across versions but also more accurate. Our findings suggest that future lineage assignment algorithms for various pathogens should consider the value of phylogenetic placement.

4.
Mol Biol Evol ; 40(9)2023 09 01.
Article in English | MEDLINE | ID: mdl-37739926

ABSTRACT

Fins are major functional appendages of fish that have been repeatedly modified in different lineages. To search for genomic changes underlying natural fin diversity, we compared the genomes of 36 percomorph fish species that span over 100 million years of evolution and either have complete or reduced pelvic and caudal fins. We identify 1,614 genomic regions that are well-conserved in fin-complete species but missing from multiple fin-reduced lineages. Recurrent deletions of conserved sequences in wild fin-reduced species are enriched for functions related to appendage development, suggesting that convergent fin reduction at the organismal level is associated with repeated genomic deletions near fin-appendage development genes. We used sequencing and functional enhancer assays to confirm that PelA, a Pitx1 enhancer previously linked to recurrent pelvic loss in sticklebacks, has also been independently deleted and may have contributed to the fin morphology in distantly related pelvic-reduced species. We also identify a novel enhancer that is conserved in the majority of percomorphs, drives caudal fin expression in transgenic stickleback, is missing in tetraodontiform, syngnathid, and synbranchid species with caudal fin reduction, and alters caudal fin development when targeted by genome editing. Our study illustrates a broadly applicable strategy for mapping phenotypes to genotypes across a tree of vertebrate species and highlights notable new examples of regulatory genomic hotspots that have been used to evolve recurrent phenotypes across 100 million years of fish evolution.


Subjects
Fishes, Smegmamorpha, Animals, Fishes/genetics, Genomics, Genotype, Smegmamorpha/genetics, Animal Fins
5.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37651445

ABSTRACT

MOTIVATION: Neighbour-Joining is one of the most widely used distance-based phylogenetic inference methods. However, current implementations do not scale well for datasets with more than 10 000 sequences. Given the increasing pace of generating new sequence data, particularly in outbreaks of emerging diseases, and the already enormous existing databases of sequence data for which Neighbour-Joining is a useful approach, new implementations of existing methods are warranted. RESULTS: Here, we present DecentTree, which provides highly optimized and parallel implementations of Neighbour-Joining and several of its variants. DecentTree is designed as a stand-alone application and a header-only library easily integrated with other phylogenetic software (e.g. it is integral in the popular IQ-TREE software). We show that DecentTree shows similar or improved performance over existing software (BIONJ, Quicktree, FastME, and RapidNJ), especially for handling very large alignments. For example, DecentTree is up to 6-fold faster than the fastest existing Neighbour-Joining software (e.g. RapidNJ) when generating a tree of 64 000 SARS-CoV-2 genomes. AVAILABILITY AND IMPLEMENTATION: DecentTree is open source and freely available at https://github.com/iqtree/decenttree. All code and data used in this analysis are available on Github (https://github.com/asdcid/Comparison-of-neighbour-joining-software).


Subjects
COVID-19, Humans, Phylogeny, SARS-CoV-2/genetics, Genomics, Gene Library
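For readers unfamiliar with the method that DecentTree accelerates, the following is a minimal sketch of the textbook Neighbour-Joining algorithm. It is not DecentTree code: the function name, the dictionary-based distance matrix and the nested-tuple tree representation are assumptions made for illustration, and the recomputation of row sums at every step is deliberately naive.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Textbook Neighbour-Joining on a small distance matrix.

    names: list of at least two taxon labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns (subtree, subtree, length of the branch joining them), where each
    internal subtree is a pair of (child, branch_length) tuples.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix, used by the Q-criterion.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dab = d[frozenset((a, b))]
        # Branch lengths from the new internal node to a and to b.
        la = 0.5 * dab + (r[a] - r[b]) / (2 * (n - 2))
        lb = dab - la
        new = ((a, la), (b, lb))
        # Distances from the new node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - dab
                )
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return (a, b, d[frozenset((a, b))])
```

Each iteration of this naive version scans all pairs, so the whole run is cubic in the number of taxa. That cost is what makes optimized, parallel implementations attractive for datasets with tens of thousands of sequences.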
6.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37651464

ABSTRACT

MOTIVATION: Identifying and tracking recombinant strains of SARS-CoV-2 is critical to understanding the evolution of the virus and controlling its spread. However, confidently identifying SARS-CoV-2 recombinants from the thousands of new genome sequences that are shared online every day is quite challenging, causing many recombinants to be missed or to suffer weeks of delay in being formally identified while undergoing expert curation. RESULTS: We present RIVET, a software pipeline and visual platform that takes advantage of recent algorithmic advances in recombination inference to comprehensively and sensitively search for potential SARS-CoV-2 recombinants and organize the relevant information in a web interface that would help greatly accelerate the process of identifying and tracking recombinants. AVAILABILITY AND IMPLEMENTATION: A RIVET-based web interface displaying the most updated analysis of potential SARS-CoV-2 recombinants is available at https://rivet.ucsd.edu/. RIVET's frontend and backend code is freely available under the MIT license at https://github.com/TurakhiaLab/rivet and the documentation for RIVET is available at https://turakhialab.github.io/rivet/. The inputs necessary for running RIVET's backend workflow for SARS-CoV-2 are available through a public database maintained and updated daily by UCSC (https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/).


Subjects
COVID-19, Humans, SARS-CoV-2/genetics, Factual Databases, Documentation, Software
7.
Syst Biol ; 72(5): 1039-1051, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37232476

ABSTRACT

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 data sets do not fit this mold. There are currently over 14 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) and pseudo-ML methods may be more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger data sets. Here, we evaluate the performance of de novo and online phylogenetic approaches, as well as ML, pseudo-ML, and MP frameworks for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimization with UShER and matOptimize produces equivalent SARS-CoV-2 phylogenies to some of the most popular ML and pseudo-ML inference tools. 
MP optimization with UShER and matOptimize is thousands of times faster than presently available implementations of ML and online phylogenetics is faster than de novo inference. Our results therefore suggest that parsimony-based methods like UShER and matOptimize represent an accurate and more practical alternative to established ML implementations for large SARS-CoV-2 phylogenies and could be successfully applied to other similar data sets with particularly dense sampling and short branch lengths.


Subjects
COVID-19, SARS-CoV-2, Humans, Phylogeny, Probability, Genomics
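The maximum parsimony criterion discussed in the abstract above can be illustrated with Fitch's algorithm, which counts the minimum number of substitutions a fixed tree needs to explain one alignment column. This is a minimal sketch, not code from UShER or matOptimize; the function name and the nested-tuple tree encoding are assumptions chosen for brevity, and the tree is assumed to be strictly binary.

```python
def fitch_score(tree, states):
    """Fitch small-parsimony pass for a single alignment site.

    tree:   a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping each leaf name to its observed nucleotide.
    Returns (candidate ancestral states, minimum number of substitutions).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: keep both options and pay one substitution.
    return left_set | right_set, left_cost + right_cost + 1
```

With very short branches, as in densely sampled SARS-CoV-2 data, a site rarely changes more than once on a branch, which is the situation in which this simple counting is expected to agree closely with likelihood-based reconstruction.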
8.
Microb Genom ; 9(5)2023 05.
Article in English | MEDLINE | ID: mdl-37185044

ABSTRACT

Exposure to different mutagens leaves distinct mutational patterns that can allow inference of pathogen replication niches. We therefore investigated whether SARS-CoV-2 mutational spectra might show lineage-specific differences, dependent on the dominant site(s) of replication and onwards transmission, and could therefore rapidly infer virulence of emergent variants of concern (VOCs). Through mutational spectrum analysis, we found a significant reduction in G>T mutations in the Omicron variant, which replicates in the upper respiratory tract (URT), compared to other lineages, which replicate in both the URT and lower respiratory tract (LRT). Mutational analysis of other viruses and bacteria indicates a robust, generalizable association of high G>T mutations with replication within the LRT. Monitoring G>T mutation rates over time, we found early separation of Omicron from Beta, Gamma and Delta, while mutational patterns in Alpha varied consistent with changes in transmission source as social restrictions were lifted. Mutational spectra may be a powerful tool to infer niches of established and emergent pathogens.


Subjects
COVID-19, Humans, SARS-CoV-2/genetics, Mutation, Bacteria/genetics, Lung
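As a rough illustration of what a mutational spectrum is, the sketch below tallies substitution types from a list of inferred mutations and reports each type as a proportion of the total. It is a deliberately simplified assumption, not the authors' pipeline: it ignores neighbouring sequence context and any rescaling by genome composition, and the function name and input format are hypothetical.

```python
from collections import Counter


def mutational_spectrum(mutations):
    """Proportion of each substitution type, e.g. 'G>T'.

    mutations: iterable of (ancestral_base, derived_base) pairs.
    Returns a dict mapping 'X>Y' to its share of all substitutions.
    """
    counts = Counter(f"{ref}>{alt}" for ref, alt in mutations if ref != alt)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}
```

Comparing the `"G>T"` entry between lineages is the kind of contrast the abstract describes, although a real analysis would need far more care about context and sampling.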
9.
Nat Genet ; 55(5): 746-752, 2023 05.
Article in English | MEDLINE | ID: mdl-37038003

ABSTRACT

Phylogenetics has a crucial role in genomic epidemiology. Enabled by unparalleled volumes of genome sequence data generated to study and help contain the COVID-19 pandemic, phylogenetic analyses of SARS-CoV-2 genomes have shed light on the virus's origins, spread, and the emergence and reproductive success of new variants. However, most phylogenetic approaches, including maximum likelihood and Bayesian methods, cannot scale to the size of the datasets from the current pandemic. We present 'MAximum Parsimonious Likelihood Estimation' (MAPLE), an approach for likelihood-based phylogenetic analysis of epidemiological genomic datasets at unprecedented scales. MAPLE infers SARS-CoV-2 phylogenies more accurately than existing maximum likelihood approaches while running up to thousands of times faster, and requiring at least 100 times less memory on large datasets. This extends the reach of genomic epidemiology, allowing the continued use of accurate phylogenetic, phylogeographic and phylodynamic analyses on datasets of millions of genomes.


Subjects
COVID-19, Humans, Phylogeny, COVID-19/epidemiology, COVID-19/genetics, SARS-CoV-2/genetics, Likelihood Functions, Pandemics, Bayes Theorem
10.
bioRxiv ; 2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36778215

ABSTRACT

Fins are major functional appendages of fish that have been repeatedly modified in different lineages. To search for genomic changes underlying natural fin diversity, we compared the genomes of 36 wild fish species that either have complete or reduced pelvic and caudal fins. We identify 1,614 genomic regions that are well-conserved in fin-complete species but missing from multiple fin-reduced lineages. Recurrent deletions of conserved sequences (CONDELs) in wild fin-reduced species are enriched for functions related to appendage development, suggesting that convergent fin reduction at the organismal level is associated with repeated genomic deletions near fin-appendage development genes. We used sequencing and functional enhancer assays to confirm that PelA, a Pitx1 enhancer previously linked to recurrent pelvic loss in sticklebacks, has also been independently deleted and may have contributed to the fin morphology in distantly related pelvic-reduced species. We also identify a novel enhancer that is conserved in the majority of percomorphs, drives caudal fin expression in transgenic stickleback, is missing in tetraodontiform, syngnathid, and synbranchid species with caudal fin reduction, and alters caudal fin development when targeted by genome editing. Our study illustrates a general strategy for mapping phenotypes to genotypes across a tree of vertebrate species, and highlights notable new examples of regulatory genomic hotspots that have been used to evolve recurrent phenotypes during 100 million years of fish evolution.

11.
Nature ; 609(7929): 994-997, 2022 09.
Article in English | MEDLINE | ID: mdl-35952714

ABSTRACT

Accurate and timely detection of recombinant lineages is crucial for interpreting genetic variation, reconstructing epidemic spread, identifying selection and variants of interest, and accurately performing phylogenetic analyses [1-4]. During the SARS-CoV-2 pandemic, genomic data generation has exceeded the capacities of existing analysis platforms, thereby crippling real-time analysis of viral evolution [5]. Here, we use a new phylogenomic method to search a nearly comprehensive SARS-CoV-2 phylogeny for recombinant lineages. In a 1.6 million sample tree from May 2021, we identify 589 recombination events, which indicate that around 2.7% of sequenced SARS-CoV-2 genomes have detectable recombinant ancestry. Recombination breakpoints are inferred to occur disproportionately in the 3' portion of the genome that contains the spike protein. Our results highlight the need for timely analyses of recombination for pinpointing the emergence of recombinant lineages with the potential to increase transmissibility or virulence of the virus. We anticipate that this approach will empower comprehensive real-time tracking of viral recombination during the SARS-CoV-2 pandemic and beyond.


Subjects
COVID-19, Viral Genome, Pandemics, Phylogeny, Genetic Recombination, SARS-CoV-2, COVID-19/epidemiology, COVID-19/transmission, COVID-19/virology, Viral Genome/genetics, Humans, Mutation, Genetic Recombination/genetics, SARS-CoV-2/genetics, SARS-CoV-2/pathogenicity, Genetic Selection/genetics, Coronavirus Spike Glycoprotein/genetics, Virulence/genetics
12.
Bioinformatics ; 38(15): 3734-3740, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35731204

ABSTRACT

MOTIVATION: Phylogenetic tree optimization is necessary for precise analysis of evolutionary and transmission dynamics, but existing tools are inadequate for handling the scale and pace of data produced during the coronavirus disease 2019 (COVID-19) pandemic. One transformative approach, online phylogenetics, aims to incrementally add samples to an ever-growing phylogeny, but there are no previously existing approaches that can efficiently optimize this vast phylogeny under the time constraints of the pandemic. RESULTS: Here, we present matOptimize, a fast and memory-efficient phylogenetic tree optimization tool based on parsimony that can be parallelized across multiple CPU threads and nodes, and provides orders of magnitude improvement in runtime and peak memory usage compared to existing state-of-the-art methods. We have developed this method particularly to address the pressing need during the COVID-19 pandemic for daily maintenance and optimization of a comprehensive SARS-CoV-2 phylogeny. matOptimize is currently helping refine on a daily basis possibly the largest-ever phylogenetic tree, containing millions of SARS-CoV-2 sequences. AVAILABILITY AND IMPLEMENTATION: The matOptimize code is freely available as part of the UShER package (https://github.com/yatisht/usher) and can also be installed via bioconda (https://bioconda.github.io/recipes/usher/README.html). All scripts we used to perform the experiments in this manuscript are available at https://github.com/yceh/matOptimize-experiments. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
COVID-19, SARS-CoV-2, Humans, Phylogeny, SARS-CoV-2/genetics, Pandemics, Software
13.
Virus Evol ; 8(1): veac048, 2022.
Article in English | MEDLINE | ID: mdl-35769891

ABSTRACT

The unprecedented severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) global sequencing effort has suffered from an analytical bottleneck. Many existing methods for phylogenetic analysis are designed for sparse, static datasets and are too computationally expensive to apply to densely sampled, rapidly expanding datasets when results are needed immediately to inform public health action. For example, public health is often concerned with identifying clusters of closely related samples, but the sheer scale of the data prevents manual inspection and the current computational models are often too expensive in time and resources. Even when results are available, intuitive data exploration tools are of critical importance to effective public health interpretation and action. To help address this need, we present a phylogenetic heuristic that quickly and efficiently identifies newly introduced strains in a region, resulting in clusters of infected individuals, and their putative geographic origins. We show that this approach performs well on simulated data and yields results largely congruent with more sophisticated Bayesian phylogeographic modeling approaches. We also introduce Cluster-Tracker (https://clustertracker.gi.ucsc.edu/), a novel interactive web-based tool to facilitate effective and intuitive SARS-CoV-2 geographic data exploration and visualization across the USA. Cluster-Tracker is updated daily and automatically identifies and highlights groups of closely related SARS-CoV-2 infections resulting from the transmission of the virus between two geographic areas by travelers, streamlining public health tracking of local viral diversity and emerging infection clusters. The site is open-source and designed to be easily configured to analyze any chosen region, making it a useful resource globally. 
The combination of these open-source tools will empower detailed investigations of the geographic origins and spread of SARS-CoV-2 and other densely sampled pathogens.

14.
bioRxiv ; 2022 May 18.
Article in English | MEDLINE | ID: mdl-35611334

ABSTRACT

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 datasets do not fit this mould. There are currently over 10 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) methods are more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger datasets. Here, we evaluate the performance of de novo and online phylogenetic approaches, and ML and MP frameworks, for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimizations produce more accurate SARS-CoV-2 phylogenies than do ML optimizations. 
Since MP is thousands of times faster than presently available implementations of ML and online phylogenetics is faster than de novo, we therefore propose that, in the context of comprehensive genomic epidemiology of SARS-CoV-2, MP online phylogenetics approaches should be favored.

15.
PLoS Comput Biol ; 18(4): e1010056, 2022 04.
Article in English | MEDLINE | ID: mdl-35486906

ABSTRACT

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.


Subjects
COVID-19, Pandemics, Algorithms, COVID-19/epidemiology, Computer Simulation, Molecular Evolution, Humans, Phylogeny, SARS-CoV-2/genetics, Software
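The Gillespie approach mentioned in the abstract above can be sketched for a single branch as follows. This is an illustrative toy, not the authors' software: it assumes a non-empty sequence, a Jukes-Cantor model in which every site substitutes at rate 1, and uniform site selection in place of the multi-layered search tree the abstract describes. The function name and signature are hypothetical.

```python
import random

NUCLEOTIDES = "ACGT"


def evolve_branch(seq, branch_length, rng):
    """Gillespie-style substitution simulation along one branch.

    seq:           non-empty sequence of nucleotides (not modified).
    branch_length: expected substitutions per site along the branch.
    rng:           a random.Random instance, passed in for reproducibility.
    Returns (evolved sequence as a list, number of substitution events).
    """
    seq = list(seq)
    total_rate = float(len(seq))  # sum of per-site rates under equal rates
    n_events = 0
    # Waiting time to the first event anywhere in the sequence.
    t = rng.expovariate(total_rate)
    while t < branch_length:
        # Equal rates: the site that mutates is chosen uniformly.
        site = rng.randrange(len(seq))
        seq[site] = rng.choice([b for b in NUCLEOTIDES if b != seq[site]])
        n_events += 1
        t += rng.expovariate(total_rate)
    return seq, n_events
```

The cost of this loop scales with the number of events rather than with genome length, so on short branches almost no work is done. With unequal site rates, choosing the mutating site becomes the expensive step, which is presumably where an efficient search structure pays off.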
16.
bioRxiv ; 2022 Jul 18.
Article in English | MEDLINE | ID: mdl-35350209

ABSTRACT

Phylogenetics plays a crucial role in the interpretation of genomic data [1]. Phylogenetic analyses of SARS-CoV-2 genomes have allowed the detailed study of the virus's origins [2], of its international [3,4] and local [4-9] spread, and of the emergence [10] and reproductive success [11] of new variants, among many applications. These analyses have been enabled by the unparalleled volumes of genome sequence data generated and employed to study and help contain the pandemic [12]. However, preferred model-based phylogenetic approaches including maximum likelihood and Bayesian methods, mostly based on Felsenstein's 'pruning' algorithm [13,14], cannot scale to the size of the datasets from the current pandemic [4,15], hampering our understanding of the virus's evolution and transmission [16]. We present new approaches, based on reworking Felsenstein's algorithm, for likelihood-based phylogenetic analysis of epidemiological genomic datasets at unprecedented scales. We exploit near-certainty regarding ancestral genomes, and the similarities between closely related and densely sampled genomes, to greatly reduce computational demands for memory and time. Combined with new methods for searching amongst candidate evolutionary trees, this results in our MAPLE ('MAximum Parsimonious Likelihood Estimation') software giving better results than popular approaches such as FastTree 2 [17], IQ-TREE 2 [18], RAxML-NG [19] and UShER [15]. Our approach therefore allows complex and accurate probabilistic phylogenetic analyses of millions of microbial genomes, extending the reach of genomic epidemiology. Future epidemiological datasets are likely to be even larger than those currently associated with COVID-19, and other disciplines such as metagenomics and biodiversity science are also generating huge numbers of genome sequences [20-22]. Our methods will permit continued use of preferred likelihood-based phylogenetic analyses.

17.
Genome Biol Evol ; 14(3)2022 03 02.
Article in English | MEDLINE | ID: mdl-35171243

ABSTRACT

We present Champagne, a whole-genome method for generating character matrices for phylogenomic analysis using large genomic indel events. By rigorously picking orthologous genes and locating large insertion and deletion events, Champagne delivers a character matrix that considerably reduces homoplasy compared with morphological and nucleotide-based matrices, on both established phylogenies and difficult-to-resolve nodes in the mammalian tree. Champagne provides ample evidence in the form of genomic structural variation to support incomplete lineage sorting and possible introgression in Paenungulata and human-chimp-gorilla which were previously inferred primarily through matrices composed of aligned single-nucleotide characters. Champagne also offers further evidence for Myomorpha as sister to Sciuridae and Hystricomorpha in the rodent tree. Champagne harbors distinct theoretical advantages as an automated method that produces nearly homoplasy-free character matrices on the whole-genome scale.


Subjects
Genome, Genomics, Animals, INDEL Mutation, Mammals, Nucleotides, Phylogeny
18.
bioRxiv ; 2021 Dec 06.
Article in English | MEDLINE | ID: mdl-34927180

ABSTRACT

Phylogenetics has been central to the genomic surveillance, epidemiology and contact tracing efforts during the COVID-19 pandemic. But the massive scale of genomic sequencing has rendered the pre-pandemic tools inadequate for comprehensive phylogenetic analyses. Here, we discuss the phylogenetic package that we developed to address the needs imposed by this pandemic. The package incorporates several pandemic-specific optimization and parallelization techniques and comprises four programs: UShER, matOptimize, RIPPLES and matUtils. Using high-performance computing, UShER and matOptimize maintain and refine daily a massive mutation-annotated phylogenetic tree consisting of all SARS-CoV-2 sequences available in online repositories. With UShER and RIPPLES, individual labs, even with modest compute resources, incorporate newly sequenced SARS-CoV-2 genomes on this phylogeny and discover evidence for recombination in real time. With matUtils, they rapidly query and visualize massive SARS-CoV-2 phylogenies. These tools have empowered scientists worldwide to study SARS-CoV-2 evolution and transmission at an unprecedented scale, resolution and speed.

19.
Mol Biol Evol ; 38(12): 5819-5824, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34469548

ABSTRACT

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.


Subjects
Molecular Evolution, Phylogeny, SARS-CoV-2, COVID-19/virology, Humans, Mutation, SARS-CoV-2/genetics, Software
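The idea behind a mutation-annotated tree, as described in the abstract above, is that each branch stores only the mutations inferred on it, so a sample's differences from the reference can be recovered by accumulating mutations along the path from the root. The sketch below is an in-memory toy under that assumption. It does not reproduce the MAT file format or the matUtils interface, and the `Node` class and `genotypes` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Mutations inferred on the branch leading to this node,
    # stored as (genome position, derived base) pairs.
    mutations: list = field(default_factory=list)
    children: list = field(default_factory=list)


def genotypes(node, inherited=None):
    """Yield (leaf name, {position: base}) for every leaf below node.

    Each dict holds the accumulated mutations on the root-to-leaf path;
    a later mutation at the same position overrides an earlier one.
    """
    current = dict(inherited or {})
    for pos, base in node.mutations:
        current[pos] = base
    if not node.children:
        yield node.name, current
    for child in node.children:
        yield from genotypes(child, current)
```

Because closely related samples share almost all of their path, storing per-branch mutations avoids repeating full genomes, which is one reason such an encoding can stay compact as the tree grows.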
20.
Nat Genet ; 53(6): 809-816, 2021 06.
Article in English | MEDLINE | ID: mdl-33972780

ABSTRACT

As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of 'genomic contact tracing', that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large (and will undoubtedly grow many fold), placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach greatly improves the speed of phylogenetic placement of new samples and data visualization, making it possible to complete the placements under the constraints of real-time contact tracing. Thus, our method addresses an important need for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the University of California Santa Cruz SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for SARS-CoV-2 specifically for laboratories worldwide.


Subjects
COVID-19/epidemiology, COVID-19/virology, Computational Biology/methods, Phylogeny, SARS-CoV-2/classification, SARS-CoV-2/genetics, Software, Algorithms, Computational Biology/standards, Genetic Databases, Viral Genome, Humans, Molecular Sequence Annotation, Mutation, Web Browser