Search | BVS - Ministry of Health

1.

Diversity and specificity of molecular functions in cyanobacterial symbionts.

Cameron, Ellen S; Sanchez, Santiago; Goldman, Nick; Blaxter, Mark L; Finn, Robert D.

Sci Rep ; 14(1): 18658, 2024 08 12.

Article in English | MEDLINE | ID: mdl-39134591

ABSTRACT

Cyanobacteria are globally occurring photosynthetic bacteria notable for their contribution to primary production and production of toxins which have detrimental ecosystem impacts. Furthermore, cyanobacteria can form mutualistic symbiotic relationships with a diverse set of eukaryotes, including land plants, aquatic plankton and fungi. Nevertheless, not all cyanobacteria are found in symbiotic associations suggesting symbiotic cyanobacteria have evolved specializations that facilitate host-interactions. Photosynthetic capabilities, nitrogen fixation, and the production of complex biochemicals are key functions provided by host-associated cyanobacterial symbionts. To explore if additional specializations are associated with such lifestyles in cyanobacteria, we have conducted comparative phylogenomics of molecular functions and of biosynthetic gene clusters (BGCs) in 984 cyanobacterial genomes. Cyanobacteria with host-associated and symbiotic lifestyles were concentrated in the family Nostocaceae, where eight monophyletic clades correspond to specific host taxa. In agreement with previous studies, symbionts are likely to provide fixed nitrogen to their eukaryotic partners, through multiple different nitrogen fixation pathways. Additionally, our analyses identified chitin metabolising pathways in cyanobacteria associated with specific host groups, while obligate symbionts had fewer BGCs. The conservation of molecular functions and BGCs between closely related symbiotic and free-living cyanobacteria suggests the potential for additional cyanobacteria to form symbiotic relationships than is currently known.

Subjects

Cyanobacteria; Nitrogen Fixation; Phylogeny; Symbiosis; Cyanobacteria/genetics; Cyanobacteria/metabolism; Genome, Bacterial; Multigene Family; Photosynthesis

2.

Detection of hidden antibiotic resistance through real-time genomics.

Sauerborn, Ela; Corredor, Nancy Carolina; Reska, Tim; Perlas, Albert; Vargas da Fonseca Atum, Samir; Goldman, Nick; Wantia, Nina; Prazeres da Costa, Clarissa; Foster-Nyarko, Ebenezer; Urban, Lara.

Nat Commun ; 15(1): 5494, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38944650

ABSTRACT

Real-time genomics through nanopore sequencing holds the promise of fast antibiotic resistance prediction directly in the clinical setting. However, concerns about the accuracy of genomics-based resistance predictions persist, particularly when compared to traditional, clinically established diagnostic methods. Here, we leverage the case of a multi-drug resistant Klebsiella pneumoniae infection to demonstrate how real-time genomics can enhance the accuracy of antibiotic resistance profiling in complex infection scenarios. Our results show that unlike established diagnostics, nanopore sequencing data analysis can accurately detect low-abundance plasmid-mediated resistance, which often remains undetected by conventional methods. This capability has direct implications for clinical practice, where such "hidden" resistance profiles can critically influence treatment decisions. Consequently, the rapid, in situ application of real-time genomics holds significant promise for improving clinical decision-making and patient outcomes.

Subjects

Anti-Bacterial Agents; Drug Resistance, Multiple, Bacterial; Genomics; Klebsiella Infections; Klebsiella pneumoniae; Klebsiella pneumoniae/genetics; Klebsiella pneumoniae/drug effects; Genomics/methods; Humans; Anti-Bacterial Agents/pharmacology; Klebsiella Infections/microbiology; Klebsiella Infections/drug therapy; Klebsiella Infections/diagnosis; Drug Resistance, Multiple, Bacterial/genetics; Plasmids/genetics; Nanopore Sequencing/methods; Genome, Bacterial/genetics; Microbial Sensitivity Tests

3.

A retroviral link to vertebrate myelination through retrotransposon-RNA-mediated control of myelin gene expression.

Ghosh, Tanay; Almeida, Rafael G; Zhao, Chao; Mannioui, Abdelkrim; Martin, Elodie; Fleet, Alex; Chen, Civia Z; Assinck, Peggy; Ellams, Sophie; Gonzalez, Ginez A; Graham, Stephen C; Rowitch, David H; Stott, Katherine; Adams, Ian; Zalc, Bernard; Goldman, Nick; Lyons, David A; Franklin, Robin J M.

Cell ; 187(4): 814-830.e23, 2024 Feb 15.

Article in English | MEDLINE | ID: mdl-38364788

ABSTRACT

Myelin, the insulating sheath that surrounds neuronal axons, is produced by oligodendrocytes in the central nervous system (CNS). This evolutionary innovation, which first appears in jawed vertebrates, enabled rapid transmission of nerve impulses, more complex brains, and greater morphological diversity. Here, we report that RNA-level expression of RNLTR12-int, a retrotransposon of retroviral origin, is essential for myelination. We show that RNLTR12-int-encoded RNA binds to the transcription factor SOX10 to regulate transcription of myelin basic protein (Mbp, the major constituent of myelin) in rodents. RNLTR12-int-like sequences (which we name RetroMyelin) are found in all jawed vertebrates, and we further demonstrate their function in regulating myelination in two different vertebrate classes (zebrafish and frogs). Our study therefore suggests that retroviral endogenization played a prominent role in the emergence of vertebrate myelin.

Subjects

Myelin Sheath; Retroelements; Animals; Gene Expression; Myelin Sheath/metabolism; Oligodendroglia/metabolism; Retroelements/genetics; RNA/metabolism; Zebrafish/genetics; Anura

4.

DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies.

Kapli, Paschalia; Kotari, Ioanna; Telford, Maximilian J; Goldman, Nick; Yang, Ziheng.

Syst Biol ; 72(5): 1119-1135, 2023 11 01.

Article in English | MEDLINE | ID: mdl-37366056

ABSTRACT

Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here, we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis strategies to an empirical dataset to infer the metazoan phylogeny. Our results from both simulated and real data suggest that DNA sequences may be as useful as proteins for inferring deep phylogenies and should not be excluded from such analyses. Analysis of DNA data under nucleotide models has a major computational advantage over protein-data analysis, potentially making it feasible to use advanced models that account for among-site and among-lineage heterogeneity in the nucleotide-substitution process in inference of deep phylogenies.

Subjects

Models, Genetic; Nucleotides; Animals; Phylogeny; Base Sequence; Codon; Amino Acids/genetics; Evolution, Molecular

5.

Maximum likelihood pandemic-scale phylogenetics.

De Maio, Nicola; Kalaghatgi, Prabhav; Turakhia, Yatish; Corbett-Detig, Russell; Minh, Bui Quang; Goldman, Nick.

Nat Genet ; 55(5): 746-752, 2023 05.

Article in English | MEDLINE | ID: mdl-37038003

ABSTRACT

Phylogenetics has a crucial role in genomic epidemiology. Enabled by unparalleled volumes of genome sequence data generated to study and help contain the COVID-19 pandemic, phylogenetic analyses of SARS-CoV-2 genomes have shed light on the virus's origins, spread, and the emergence and reproductive success of new variants. However, most phylogenetic approaches, including maximum likelihood and Bayesian methods, cannot scale to the size of the datasets from the current pandemic. We present 'MAximum Parsimonious Likelihood Estimation' (MAPLE), an approach for likelihood-based phylogenetic analysis of epidemiological genomic datasets at unprecedented scales. MAPLE infers SARS-CoV-2 phylogenies more accurately than existing maximum likelihood approaches while running up to thousands of times faster, and requiring at least 100 times less memory on large datasets. This extends the reach of genomic epidemiology, allowing the continued use of accurate phylogenetic, phylogeographic and phylodynamic analyses on datasets of millions of genomes.

Subjects

COVID-19; Humans; Phylogeny; COVID-19/epidemiology; COVID-19/genetics; SARS-CoV-2/genetics; Likelihood Functions; Pandemics; Bayes Theorem

6.

Dynamic, adaptive sampling during nanopore sequencing using Bayesian experimental design.

Weilguny, Lukas; De Maio, Nicola; Munro, Rory; Manser, Charlotte; Birney, Ewan; Loose, Matthew; Goldman, Nick.

Nat Biotechnol ; 41(7): 1018-1025, 2023 Jul.

Article in English | MEDLINE | ID: mdl-36593407

ABSTRACT

Nanopore sequencers can select which DNA molecules to sequence, rejecting a molecule after analysis of a small initial part. Currently, selection is based on predetermined regions of interest that remain constant throughout an experiment. Sequencing efforts, thus, cannot be re-focused on molecules likely contributing most to experimental success. Here we present BOSS-RUNS, an algorithmic framework and software to generate dynamically updated decision strategies. We quantify uncertainty at each genome position with real-time updates from data already observed. For each DNA fragment, we decide whether the expected decrease in uncertainty that it would provide warrants fully sequencing it, thus optimizing information gain. BOSS-RUNS mitigates coverage bias between and within members of a microbial community, leading to improved variant calling; for example, low-coverage sites of a species at 1% abundance were reduced by 87.5%, with 12.5% more single-nucleotide polymorphisms detected. Such data-driven updates to molecule selection are applicable to many sequencing scenarios, such as enriching for regions with increased divergence or low coverage, reducing time-to-answer.

Subjects

Nanopore Sequencing; Nanopores; Research Design; Bayes Theorem; Genome; Software; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA

7.

Publisher Correction: Genomic reconstruction of the SARS-CoV-2 epidemic in England.

Vöhringer, Harald S; Sanderson, Theo; Sinnott, Matthew; De Maio, Nicola; Nguyen, Thuy; Goater, Richard; Schwach, Frank; Harrison, Ian; Hellewell, Joel; Ariani, Cristina V; Gonçalves, Sonia; Jackson, David K; Johnston, Ian; Jung, Alexander W; Saint, Callum; Sillitoe, John; Suciu, Maria; Goldman, Nick; Panovska-Griffiths, Jasmina; Birney, Ewan; Volz, Erik; Funk, Sebastian; Kwiatkowski, Dominic; Chand, Meera; Martincorena, Inigo; Barrett, Jeffrey C; Gerstung, Moritz.

Nature ; 606(7915): E18, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35701578

8.

phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets.

De Maio, Nicola; Boulton, William; Weilguny, Lukas; Walker, Conor R; Turakhia, Yatish; Corbett-Detig, Russell; Goldman, Nick.

PLoS Comput Biol ; 18(4): e1010056, 2022 04.

Article in English | MEDLINE | ID: mdl-35486906

ABSTRACT

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is however testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.

Subjects

COVID-19; Pandemics; Algorithms; COVID-19/epidemiology; Computer Simulation; Evolution, Molecular; Humans; Phylogeny; SARS-CoV-2/genetics; Software

9.

Maximum likelihood pandemic-scale phylogenetics.

De Maio, Nicola; Kalaghatgi, Prabhav; Turakhia, Yatish; Corbett-Detig, Russell; Minh, Bui Quang; Goldman, Nick.

bioRxiv ; 2022 Jul 18.

Article in English | MEDLINE | ID: mdl-35350209

ABSTRACT

Phylogenetics plays a crucial role in the interpretation of genomic data. Phylogenetic analyses of SARS-CoV-2 genomes have allowed the detailed study of the virus's origins, of its international and local spread, and of the emergence and reproductive success of new variants, among many applications. These analyses have been enabled by the unparalleled volumes of genome sequence data generated and employed to study and help contain the pandemic. However, preferred model-based phylogenetic approaches including maximum likelihood and Bayesian methods, mostly based on Felsenstein's 'pruning' algorithm, cannot scale to the size of the datasets from the current pandemic, hampering our understanding of the virus's evolution and transmission. We present new approaches, based on reworking Felsenstein's algorithm, for likelihood-based phylogenetic analysis of epidemiological genomic datasets at unprecedented scales. We exploit near-certainty regarding ancestral genomes, and the similarities between closely related and densely sampled genomes, to greatly reduce computational demands for memory and time. Combined with new methods for searching amongst candidate evolutionary trees, this results in our MAPLE ('MAximum Parsimonious Likelihood Estimation') software giving better results than popular approaches such as FastTree 2, IQ-TREE 2, RAxML-NG and UShER. Our approach therefore allows complex and accurate probabilistic phylogenetic analyses of millions of microbial genomes, extending the reach of genomic epidemiology. Future epidemiological datasets are likely to be even larger than those currently associated with COVID-19, and other disciplines such as metagenomics and biodiversity science are also generating huge numbers of genome sequences. Our methods will permit continued use of preferred likelihood-based phylogenetic analyses.

10.

Genomic reconstruction of the SARS-CoV-2 epidemic in England.

Vöhringer, Harald S; Sanderson, Theo; Sinnott, Matthew; De Maio, Nicola; Nguyen, Thuy; Goater, Richard; Schwach, Frank; Harrison, Ian; Hellewell, Joel; Ariani, Cristina V; Gonçalves, Sonia; Jackson, David K; Johnston, Ian; Jung, Alexander W; Saint, Callum; Sillitoe, John; Suciu, Maria; Goldman, Nick; Panovska-Griffiths, Jasmina; Birney, Ewan; Volz, Erik; Funk, Sebastian; Kwiatkowski, Dominic; Chand, Meera; Martincorena, Inigo; Barrett, Jeffrey C; Gerstung, Moritz.

Nature ; 600(7889): 506-511, 2021 12.

Article in English | MEDLINE | ID: mdl-34649268

ABSTRACT

The evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus leads to new variants that warrant timely epidemiological characterization. Here we use the dense genomic surveillance data generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 71 different lineages in each of 315 English local authorities between September 2020 and June 2021. This analysis reveals a series of subepidemics that peaked in early autumn 2020, followed by a jump in transmissibility of the B.1.1.7/Alpha lineage. The Alpha variant grew when other lineages declined during the second national lockdown and regionally tiered restrictions between November and December 2020. A third more stringent national lockdown suppressed the Alpha variant and eliminated nearly all other lineages in early 2021. Yet a series of variants (most of which contained the spike E484K mutation) defied these trends and persisted at moderately increasing proportions. However, by accounting for sustained introductions, we found that the transmissibility of these variants is unlikely to have exceeded the transmissibility of the Alpha variant. Finally, B.1.617.2/Delta was repeatedly introduced in England and grew rapidly in early summer 2021, constituting approximately 98% of sampled SARS-CoV-2 genomes on 26 June 2021.

Subjects

COVID-19/epidemiology; COVID-19/virology; Genome, Viral/genetics; Genomics; SARS-CoV-2/genetics; Amino Acid Substitution; COVID-19/transmission; England/epidemiology; Epidemiological Monitoring; Humans; Molecular Epidemiology; Mutation; Quarantine/statistics & numerical data; SARS-CoV-2/classification; Spatio-Temporal Analysis; Spike Glycoprotein, Coronavirus/genetics

11.

A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees.

McBroome, Jakob; Thornlow, Bryan; Hinrichs, Angie S; Kramer, Alexander; De Maio, Nicola; Goldman, Nick; Haussler, David; Corbett-Detig, Russell; Turakhia, Yatish.

Mol Biol Evol ; 38(12): 5819-5824, 2021 12 09.

Article in English | MEDLINE | ID: mdl-34469548

ABSTRACT

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.

Subjects

Evolution, Molecular; Phylogeny; SARS-CoV-2; COVID-19/virology; Humans; Mutation; SARS-CoV-2/genetics; Software

12.

Genetic Variability of the SARS-CoV-2 Pocketome.

Yazdani, Setayesh; De Maio, Nicola; Ding, Yining; Shahani, Vijay; Goldman, Nick; Schapira, Matthieu.

J Proteome Res ; 20(8): 4212-4215, 2021 08 06.

Article in English | MEDLINE | ID: mdl-34180678

ABSTRACT

In the absence of effective treatment, COVID-19 is likely to remain a global disease burden. Compounding this threat is the near certainty that novel coronaviruses with pandemic potential will emerge in years to come. Pan-coronavirus drugs (agents active against both SARS-CoV-2 and other coronaviruses) would address both threats. A strategy to develop such broad-spectrum inhibitors is to pharmacologically target binding sites on SARS-CoV-2 proteins that are highly conserved in other known coronaviruses, the assumption being that any selective pressure to keep a site conserved across past viruses will apply to future ones. Here we systematically mapped druggable binding pockets on the experimental structure of 15 SARS-CoV-2 proteins and analyzed their variation across 27 α- and β-coronaviruses and across thousands of SARS-CoV-2 samples from COVID-19 patients. We find that the two most conserved druggable sites are a pocket overlapping the RNA binding site of the helicase nsp13 and the catalytic site of the RNA-dependent RNA polymerase nsp12, both components of the viral replication-transcription complex. We present the data on a public web portal (https://www.thesgc.org/SARSCoV2_pocketome/), where users can interactively navigate individual protein structures and view the genetic variability of drug-binding pockets in 3D.

Subjects

COVID-19; SARS-CoV-2; Antiviral Agents/pharmacology; Antiviral Agents/therapeutic use; Humans; Pandemics; RNA-Dependent RNA Polymerase/genetics

13.

A phylogenetic approach for weighting genetic sequences.

De Maio, Nicola; Alekseyenko, Alexander V; Coleman-Smith, William J; Pardi, Fabio; Suchard, Marc A; Tamuri, Asif U; Truszkowski, Jakub; Goldman, Nick.

BMC Bioinformatics ; 22(1): 285, 2021 May 28.

Article in English | MEDLINE | ID: mdl-34049487

ABSTRACT

BACKGROUND: Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are 'novel' compared to the others in the same dataset, and low weights to sequences that are over-represented. RESULTS: We formalise this principle by rigorously defining the evolutionary 'novelty' of a sequence within an alignment. This results in new sequence weights that we call 'phylogenetic novelty scores'. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column (important, for example, in protein family profiling). We give computationally efficient algorithms for calculating our scores and, using simulations, show that they are versatile and can improve the accuracy of character frequency estimation compared to existing sequence weighting schemes. CONCLUSIONS: Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.

Subjects

Algorithms; Computational Biology; Phylogeny; Sequence Alignment

14.

Mutation Rates and Selection on Synonymous Mutations in SARS-CoV-2.

De Maio, Nicola; Walker, Conor R; Turakhia, Yatish; Lanfear, Robert; Corbett-Detig, Russell; Goldman, Nick.

Genome Biol Evol ; 13(5), 2021 05 07.

Article in English | MEDLINE | ID: mdl-33895815

ABSTRACT

The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. Although previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.

Subjects

Mutation Rate; SARS-CoV-2/genetics; Selection, Genetic; Silent Mutation/genetics; COVID-19/virology; Evolution, Molecular; Genome, Viral; Phylogeny; RNA, Viral/genetics; SARS-CoV-2/classification; Sequence Analysis, RNA

15.

A daily-updated database and tools for comprehensive SARS-CoV-2 mutation-annotated trees.

McBroome, Jakob; Thornlow, Bryan; Hinrichs, Angie S; De Maio, Nicola; Goldman, Nick; Haussler, David; Corbett-Detig, Russell; Turakhia, Yatish.

bioRxiv ; 2021 Jul 13.

Article in English | MEDLINE | ID: mdl-33821270

ABSTRACT

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently-proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils - a command-line utility for rapidly querying, interpreting and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.

16.

phastSim: efficient simulation of sequence evolution for pandemic-scale datasets.

De Maio, Nicola; Boulton, William; Weilguny, Lukas; Walker, Conor R; Turakhia, Yatish; Corbett-Detig, Russell; Goldman, Nick.

bioRxiv ; 2021 Sep 23.

Article in English | MEDLINE | ID: mdl-33758852

ABSTRACT

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, as well as being part of some inference methods. The ongoing surge in available sequence data is however testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software is available from https://github.com/NicolaDM/phastSim and allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.

17.

Short-range template switching in great ape genomes explored using pair hidden Markov models.

Walker, Conor R; Scally, Aylwyn; De Maio, Nicola; Goldman, Nick.

PLoS Genet ; 17(3): e1009221, 2021 03.

Article in English | MEDLINE | ID: mdl-33651813

ABSTRACT

Many complex genomic rearrangements arise through template switch errors, which occur in DNA replication when there is a transient polymerase switch to an alternate template nearby in three-dimensional space. While typically investigated at kilobase-to-megabase scales, the genomic and evolutionary consequences of this mutational process are not well characterised at smaller scales, where they are often interpreted as clusters of independent substitutions, insertions and deletions. Here we present an improved statistical approach using pair hidden Markov models, and use it to detect and describe short-range template switches underlying clusters of mutations in the multi-way alignment of hominid genomes. Using robust statistics derived from evolutionary genomic simulations, we show that template switch events have been widespread in the evolution of the great apes' genomes and provide a parsimonious explanation for the presence of many complex mutation clusters in their phylogenetic context. Larger-scale mechanisms of genome rearrangement are typically associated with structural features around breakpoints, and accordingly we show that atypical patterns of secondary structure formation and DNA bending are present at the initial template switch loci. Our methods improve on previous non-probabilistic approaches for computational detection of template switch mutations, allowing the statistical significance of events to be assessed. By specifying realistic evolutionary parameters based on the genomes and taxa involved, our methods can be readily adapted to other intra- or inter-species comparisons.

Subjects

DNA Replication; Genome; Hominidae/genetics; Markov Chains; Models, Genetic; Templates, Genetic; Algorithms; Animals; Genomics/methods; Humans; Poly A-U; Quantitative Trait Loci

18.

Want to track pandemic variants faster? Fix the bioinformatics bottleneck.

Hodcroft, Emma B; De Maio, Nicola; Lanfear, Rob; MacCannell, Duncan R; Minh, Bui Quang; Schmidt, Heiko A; Stamatakis, Alexandros; Goldman, Nick; Dessimoz, Christophe.

Nature ; 591(7848): 30-33, 2021 03.

Article in English | MEDLINE | ID: mdl-33649511

Subjects

COVID-19/epidemiology; COVID-19/virology; Evolution, Molecular; Genomics/methods; Genomics/trends; Mutation; SARS-CoV-2/genetics; Animals; Automation/methods; Basic Reproduction Number; COVID-19/immunology; COVID-19/transmission; COVID-19 Vaccines/immunology; Genome, Viral/genetics; Humans; Mink/virology; Pandemics/statistics & numerical data; Phylogeny; Public Health/methods; Public Health/trends; SARS-CoV-2/immunology; SARS-CoV-2/isolation & purification; SARS-CoV-2/pathogenicity; Social Media; Uncertainty

19.

Mutation rates and selection on synonymous mutations in SARS-CoV-2.

De Maio, Nicola; Walker, Conor R; Turakhia, Yatish; Lanfear, Robert; Corbett-Detig, Russell; Goldman, Nick.

bioRxiv ; 2021 Jan 14.

Article in English | MEDLINE | ID: mdl-33469589

ABSTRACT

The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. While previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.

20.

Sampling bias and model choice in continuous phylogeography: Getting lost on a random walk.

Kalkauskas, Antanas; Perron, Umberto; Sun, Yuxuan; Goldman, Nick; Baele, Guy; Guindon, Stephane; De Maio, Nicola.

PLoS Comput Biol ; 17(1): e1008561, 2021 01.

Article in English | MEDLINE | ID: mdl-33406072

ABSTRACT

Phylogeographic inference allows reconstruction of past geographical spread of pathogens or living organisms by integrating genetic and geographic data. A popular model in continuous phylogeography (with location data provided in the form of latitude and longitude coordinates) describes spread as a Brownian motion (Brownian Motion Phylogeography, BMP) in continuous space and time, akin to similar models of continuous trait evolution. Here, we show that reconstructions using this model can be strongly affected by sampling biases, such as the lack of sampling from certain areas. As an attempt to reduce the effects of sampling bias on BMP, we consider the addition of sequence-free samples from under-sampled areas. While this approach alleviates the effects of sampling bias, in most scenarios this will not be a viable option due to the need for prior knowledge of an outbreak's spatial distribution. We therefore consider an alternative model, the spatial Λ-Fleming-Viot process (ΛFV), which has recently gained popularity in population genetics. Despite the ΛFV's robustness to sampling biases, we find that the different assumptions of the ΛFV and BMP models result in different applicabilities, with the ΛFV being more appropriate for scenarios of endemic spread, and BMP being more appropriate for recent outbreaks or colonizations.

Subjects

Genetics, Population/methods; Models, Genetic; Phylogeography/methods; Selection Bias; Bayes Theorem; Computational Biology; Disease Outbreaks/statistics & numerical data; Flavivirus/genetics; Flavivirus Infections/epidemiology; Flavivirus Infections/virology; Humans; Markov Chains
