Search | BVS España Search Portal

1.

Pandemic-scale phylogenomics reveals the SARS-CoV-2 recombination landscape.

Turakhia, Yatish; Thornlow, Bryan; Hinrichs, Angie; McBroome, Jakob; Ayala, Nicolas; Ye, Cheng; Smith, Kyle; De Maio, Nicola; Haussler, David; Lanfear, Robert; Corbett-Detig, Russell.

Nature ; 609(7929): 994-997, 2022 09.

Article in English | MEDLINE | ID: mdl-35952714

ABSTRACT

Accurate and timely detection of recombinant lineages is crucial for interpreting genetic variation, reconstructing epidemic spread, identifying selection and variants of interest, and accurately performing phylogenetic analyses [1-4]. During the SARS-CoV-2 pandemic, genomic data generation has exceeded the capacities of existing analysis platforms, thereby crippling real-time analysis of viral evolution [5]. Here, we use a new phylogenomic method to search a nearly comprehensive SARS-CoV-2 phylogeny for recombinant lineages. In a 1.6 million sample tree from May 2021, we identify 589 recombination events, which indicate that around 2.7% of sequenced SARS-CoV-2 genomes have detectable recombinant ancestry. Recombination breakpoints are inferred to occur disproportionately in the 3' portion of the genome that contains the spike protein. Our results highlight the need for timely analyses of recombination for pinpointing the emergence of recombinant lineages with the potential to increase transmissibility or virulence of the virus. We anticipate that this approach will empower comprehensive real-time tracking of viral recombination during the SARS-CoV-2 pandemic and beyond.

Subject(s)

COVID-19, Viral Genome, Pandemics, Phylogeny, Genetic Recombination, SARS-CoV-2, COVID-19/epidemiology, COVID-19/transmission, COVID-19/virology, Viral Genome/genetics, Humans, Mutation, Genetic Recombination/genetics, SARS-CoV-2/genetics, SARS-CoV-2/pathogenicity, Genetic Selection/genetics, Coronavirus Spike Glycoprotein/genetics, Virulence/genetics

2.

Genomic reconstruction of the SARS-CoV-2 epidemic in England.

Vöhringer, Harald S; Sanderson, Theo; Sinnott, Matthew; De Maio, Nicola; Nguyen, Thuy; Goater, Richard; Schwach, Frank; Harrison, Ian; Hellewell, Joel; Ariani, Cristina V; Gonçalves, Sonia; Jackson, David K; Johnston, Ian; Jung, Alexander W; Saint, Callum; Sillitoe, John; Suciu, Maria; Goldman, Nick; Panovska-Griffiths, Jasmina; Birney, Ewan; Volz, Erik; Funk, Sebastian; Kwiatkowski, Dominic; Chand, Meera; Martincorena, Inigo; Barrett, Jeffrey C; Gerstung, Moritz.

Nature ; 600(7889): 506-511, 2021 12.

Article in English | MEDLINE | ID: mdl-34649268

ABSTRACT

The evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus leads to new variants that warrant timely epidemiological characterization. Here we use the dense genomic surveillance data generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 71 different lineages in each of 315 English local authorities between September 2020 and June 2021. This analysis reveals a series of subepidemics that peaked in early autumn 2020, followed by a jump in transmissibility of the B.1.1.7/Alpha lineage. The Alpha variant grew when other lineages declined during the second national lockdown and regionally tiered restrictions between November and December 2020. A third more stringent national lockdown suppressed the Alpha variant and eliminated nearly all other lineages in early 2021. Yet a series of variants (most of which contained the spike E484K mutation) defied these trends and persisted at moderately increasing proportions. However, by accounting for sustained introductions, we found that the transmissibility of these variants is unlikely to have exceeded the transmissibility of the Alpha variant. Finally, B.1.617.2/Delta was repeatedly introduced in England and grew rapidly in early summer 2021, constituting approximately 98% of sampled SARS-CoV-2 genomes on 26 June 2021.

Subject(s)

COVID-19/epidemiology, COVID-19/virology, Viral Genome/genetics, Genomics, SARS-CoV-2/genetics, Amino Acid Substitution, COVID-19/transmission, England/epidemiology, Epidemiological Monitoring, Humans, Molecular Epidemiology, Mutation, Quarantine/statistics & numerical data, SARS-CoV-2/classification, Spatio-Temporal Analysis, Coronavirus Spike Glycoprotein/genetics

3.

CMAPLE: efficient phylogenetic inference in the pandemic era.

Ly-Trong, Nhan; Bielow, Chris; De Maio, Nicola; Minh, Bui Quang.

Mol Biol Evol ; 2024 Jun 27.

Article in English | MEDLINE | ID: mdl-38934791

ABSTRACT

We have recently introduced MAPLE (MAximum Parsimonious Likelihood Estimation), a new pandemic-scale phylogenetic inference method exclusively designed for genomic epidemiology. In response to the need for enhancing MAPLE's performance and scalability, here we present two key components: (1) CMAPLE software, a highly optimized C++ reimplementation of MAPLE with many new features and advancements; and (2) CMAPLE library, a suite of Application Programming Interfaces to facilitate the integration of the CMAPLE algorithm into existing phylogenetic inference packages. Notably, we have successfully integrated CMAPLE into the widely used IQ-TREE 2 software, enabling its rapid adoption in the scientific community. These advancements serve as a vital step towards better preparedness for future pandemics, offering researchers powerful tools for large-scale pathogen genomic analysis.

4.

Online Phylogenetics with matOptimize Produces Equivalent Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Implementations.

Kramer, Alexander M; Thornlow, Bryan; Ye, Cheng; De Maio, Nicola; McBroome, Jakob; Hinrichs, Angie S; Lanfear, Robert; Turakhia, Yatish; Corbett-Detig, Russell.

Syst Biol ; 72(5): 1039-1051, 2023 11 01.

Article in English | MEDLINE | ID: mdl-37232476

ABSTRACT

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 data sets do not fit this mold. There are currently over 14 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) and pseudo-ML methods may be more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger data sets. Here, we evaluate the performance of de novo and online phylogenetic approaches, as well as ML, pseudo-ML, and MP frameworks for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimization with UShER and matOptimize produces equivalent SARS-CoV-2 phylogenies to some of the most popular ML and pseudo-ML inference tools. 
MP optimization with UShER and matOptimize is thousands of times faster than presently available implementations of ML, and online phylogenetics is faster than de novo inference. Our results therefore suggest that parsimony-based methods like UShER and matOptimize represent an accurate and more practical alternative to established ML implementations for large SARS-CoV-2 phylogenies and could be successfully applied to other similar data sets with particularly dense sampling and short branch lengths.
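To make the parsimony criterion in this abstract concrete, the following is a minimal sketch of Fitch's small-parsimony count for a single alignment site on a fixed rooted binary tree. It is not the UShER or matOptimize implementation; the nested-tuple tree encoding, the function name `fitch_score`, and the toy data are assumptions chosen for illustration.

```python
def fitch_score(tree, states):
    """Fitch small parsimony for one site on a rooted binary tree.

    tree:   a leaf name (str) or a (left, right) tuple of subtrees.
    states: dict mapping each leaf name to its observed nucleotide.
    Returns (candidate_states_at_this_node, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        # A leaf has exactly one possible state and needs no changes.
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed.
        return common, left_cost + right_cost
    # The children disagree: keep the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical four-sample tree and one alignment column.
tree = (("s1", "s2"), ("s3", "s4"))
site = {"s1": "A", "s2": "A", "s3": "G", "s4": "A"}
root_set, changes = fitch_score(tree, site)
print(root_set, changes)
```

With dense sampling and short branches, most sites need zero or one change on the whole tree, which is the regime in which the abstract argues parsimony is sufficiently accurate.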

Subject(s)

COVID-19, SARS-CoV-2, Humans, Phylogeny, Probability, Genomics

5.

Accounting for spatial sampling patterns in Bayesian phylogeography.

Guindon, Stéphane; De Maio, Nicola.

Proc Natl Acad Sci U S A ; 118(52), 2021 12 28.

Article in English | MEDLINE | ID: mdl-34930835

ABSTRACT

Statistical phylogeography provides useful tools to characterize and quantify the spread of organisms during the course of evolution. Analyzing georeferenced genetic data often relies on the assumption that samples are preferentially collected in densely populated areas of the habitat. Deviation from this assumption negatively impacts the inference of the spatial and demographic dynamics. This issue is pervasive in phylogeography. It affects analyses that approximate the habitat as a set of discrete demes as well as those that treat it as a continuum. The present study introduces a Bayesian modeling approach that explicitly accommodates for spatial sampling strategies. An original inference technique, based on recent advances in statistical computing, is then described that is most suited to modeling data where sequences are preferentially collected at certain locations, independently of the outcome of the evolutionary process. The analysis of georeferenced genetic sequences from the West Nile virus in North America along with simulated data shows how assumptions about spatial sampling may impact our understanding of the forces shaping biodiversity across time and space.

Subject(s)

Statistical Models, Phylogeography/methods, Population Dynamics, Algorithms, Bayes Theorem, Ecosystem, Molecular Evolution, Humans, North America, Spatial Analysis, West Nile Fever/epidemiology, West Nile Fever/virology, West Nile virus/genetics

6.

Short-range template switching in great ape genomes explored using pair hidden Markov models.

Walker, Conor R; Scally, Aylwyn; De Maio, Nicola; Goldman, Nick.

PLoS Genet ; 17(3): e1009221, 2021 03.

Article in English | MEDLINE | ID: mdl-33651813

ABSTRACT

Many complex genomic rearrangements arise through template switch errors, which occur in DNA replication when there is a transient polymerase switch to an alternate template nearby in three-dimensional space. While typically investigated at kilobase-to-megabase scales, the genomic and evolutionary consequences of this mutational process are not well characterised at smaller scales, where they are often interpreted as clusters of independent substitutions, insertions and deletions. Here we present an improved statistical approach using pair hidden Markov models, and use it to detect and describe short-range template switches underlying clusters of mutations in the multi-way alignment of hominid genomes. Using robust statistics derived from evolutionary genomic simulations, we show that template switch events have been widespread in the evolution of the great apes' genomes and provide a parsimonious explanation for the presence of many complex mutation clusters in their phylogenetic context. Larger-scale mechanisms of genome rearrangement are typically associated with structural features around breakpoints, and accordingly we show that atypical patterns of secondary structure formation and DNA bending are present at the initial template switch loci. Our methods improve on previous non-probabilistic approaches for computational detection of template switch mutations, allowing the statistical significance of events to be assessed. By specifying realistic evolutionary parameters based on the genomes and taxa involved, our methods can be readily adapted to other intra- or inter-species comparisons.

Subject(s)

DNA Replication, Genome, Hominidae/genetics, Markov Chains, Genetic Models, Genetic Templates, Algorithms, Animals, Genomics/methods, Humans, Poly A-U, Quantitative Trait Loci

7.

phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets.

De Maio, Nicola; Boulton, William; Weilguny, Lukas; Walker, Conor R; Turakhia, Yatish; Corbett-Detig, Russell; Goldman, Nick.

PLoS Comput Biol ; 18(4): e1010056, 2022 04.

Article in English | MEDLINE | ID: mdl-35486906

ABSTRACT

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is however testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. > 100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.
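The Gillespie approach mentioned in this abstract can be sketched for a single branch as follows. This is an illustrative toy, not phastSim's code or API: it picks the mutating site with a linear scan rather than the multi-layered search tree the abstract describes, it assumes fixed per-site rates and a uniform choice among the three other nucleotides, and the names `simulate_branch` and `site_rates` are hypothetical.

```python
import random

NUCLEOTIDES = "ACGT"


def simulate_branch(sequence, site_rates, branch_length, rng):
    """Gillespie-style simulation of substitutions along one branch.

    sequence:      ancestral sequence at the top of the branch.
    site_rates:    per-site substitution rates (assumed constant here).
    branch_length: branch duration in the same time units as the rates.
    Returns (descendant_sequence, events), where each event is a
    (position, old_base, new_base) tuple in order of occurrence.
    """
    seq = list(sequence)
    total_rate = sum(site_rates)
    events = []
    time_left = branch_length
    while total_rate > 0:
        # Waiting time to the next substitution anywhere in the genome.
        waiting_time = rng.expovariate(total_rate)
        if waiting_time > time_left:
            break  # the branch ends before the next event
        time_left -= waiting_time
        # Choose the mutating site with probability proportional to its rate.
        position = rng.choices(range(len(seq)), weights=site_rates)[0]
        old_base = seq[position]
        new_base = rng.choice([b for b in NUCLEOTIDES if b != old_base])
        seq[position] = new_base
        events.append((position, old_base, new_base))
    return "".join(seq), events


rng = random.Random(1)
descendant, events = simulate_branch("ACGTACGT", [1.0] * 8, 0.5, rng)
print(descendant, events)
```

When branches are short, the loop usually exits after zero or one iteration, so the cost per branch is driven by how quickly the mutating site can be located rather than by genome length; that is the step a search tree over site rates would accelerate.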

Subject(s)

COVID-19, Pandemics, Algorithms, COVID-19/epidemiology, Computer Simulation, Molecular Evolution, Humans, Phylogeny, SARS-CoV-2/genetics, Software

8.

VGsim: Scalable viral genealogy simulator for global pandemic.

Shchur, Vladimir; Spirin, Vadim; Sirotkin, Dmitry; Burovski, Evgeni; De Maio, Nicola; Corbett-Detig, Russell.

PLoS Comput Biol ; 18(8): e1010409, 2022 08.

Article in English | MEDLINE | ID: mdl-36001646

ABSTRACT

Accurate simulation of complex biological processes is an essential component of developing and validating new technologies and inference approaches. As an effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from most regions in the world. More than 5.5 million viral sequences are publicly available as of November 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time and space. Additionally, such data are a rich source of information about molecular evolutionary processes including natural selection, for example allowing the identification of new variants with transmissibility and immunity evasion advantages. To our knowledge, there is no framework that is both efficient and flexible enough to simulate the pandemic to approximate world-scale scenarios and generate viral genealogies of millions of samples. Here, we introduce a new fast simulator, VGsim, which addresses the problem of simulating genealogies under epidemiological models. The simulation process is split into two phases. During the forward run the algorithm generates a chain of population-level events reflecting the dynamics of the pandemic using a hierarchical version of the Gillespie algorithm. During the backward run a coalescent-like approach generates a tree genealogy of samples conditioning on the population-level events chain generated during the forward run. Our software can model complex population structure, epistasis and immunity escape.
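The forward phase described in this abstract can be illustrated with a much simpler model than the one VGsim uses. The sketch below runs a plain (non-hierarchical) Gillespie algorithm on a single well-mixed SIR population and records the chain of population-level events; the backward, coalescent-like phase is not shown. The function name `forward_sir` and all parameters are hypothetical and are not part of VGsim's interface.

```python
import random


def forward_sir(n_susceptible, n_infected, transmission_rate,
                recovery_rate, rng, max_events=10_000):
    """Forward Gillespie run for a well-mixed SIR model.

    Returns the event chain as a list of (time, event_type) tuples,
    where event_type is "transmission" or "recovery".
    """
    s, i = n_susceptible, n_infected
    n_total = s + i
    t = 0.0
    events = []
    while i > 0 and len(events) < max_events:
        infection = transmission_rate * s * i / n_total
        recovery = recovery_rate * i
        total = infection + recovery
        if total <= 0:
            break  # nothing can happen any more
        # Exponential waiting time until the next event of either kind.
        t += rng.expovariate(total)
        # Pick the event type with probability proportional to its rate.
        if rng.random() * total < infection:
            s -= 1
            i += 1
            events.append((t, "transmission"))
        else:
            i -= 1
            events.append((t, "recovery"))
    return events


chain = forward_sir(50, 1, transmission_rate=2.0, recovery_rate=1.0,
                    rng=random.Random(0))
print(len(chain), chain[:3])
```

A backward pass would then walk this chain from the last event to the first, merging sampled lineages at transmission events, which is why the forward run only needs to store event types and times rather than a full genealogy.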

Subject(s)

COVID-19, Pandemics, COVID-19/epidemiology, Computer Simulation, Humans, SARS-CoV-2/genetics, Software

9.

Publisher Correction: Genomic reconstruction of the SARS-CoV-2 epidemic in England.

Vöhringer, Harald S; Sanderson, Theo; Sinnott, Matthew; De Maio, Nicola; Nguyen, Thuy; Goater, Richard; Schwach, Frank; Harrison, Ian; Hellewell, Joel; Ariani, Cristina V; Gonçalves, Sonia; Jackson, David K; Johnston, Ian; Jung, Alexander W; Saint, Callum; Sillitoe, John; Suciu, Maria; Goldman, Nick; Panovska-Griffiths, Jasmina; Birney, Ewan; Volz, Erik; Funk, Sebastian; Kwiatkowski, Dominic; Chand, Meera; Martincorena, Inigo; Barrett, Jeffrey C; Gerstung, Moritz.

Nature ; 606(7915): E18, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35701578

10.

Stability of SARS-CoV-2 phylogenies.

Turakhia, Yatish; De Maio, Nicola; Thornlow, Bryan; Gozashti, Landen; Lanfear, Robert; Walker, Conor R; Hinrichs, Angie S; Fernandes, Jason D; Borges, Rui; Slodkowicz, Greg; Weilguny, Lukas; Haussler, David; Goldman, Nick; Corbett-Detig, Russell.

PLoS Genet ; 16(11): e1009175, 2020 11.

Article in English | MEDLINE | ID: mdl-33206635

ABSTRACT

The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes. 
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.

Subject(s)

Viral Genome/genetics, Phylogeny, SARS-CoV-2/genetics, Algorithms, COVID-19, Computational Biology, Molecular Evolution, Humans, Viral RNA/genetics, Sequence Alignment, Whole Genome Sequencing

11.

A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees.

McBroome, Jakob; Thornlow, Bryan; Hinrichs, Angie S; Kramer, Alexander; De Maio, Nicola; Goldman, Nick; Haussler, David; Corbett-Detig, Russell; Turakhia, Yatish.

Mol Biol Evol ; 38(12): 5819-5824, 2021 12 09.

Article in English | MEDLINE | ID: mdl-34469548

ABSTRACT

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.

Subject(s)

Molecular Evolution, Phylogeny, SARS-CoV-2, COVID-19/virology, Humans, Mutation, SARS-CoV-2/genetics, Software

12.

The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment.

De Maio, Nicola.

Syst Biol ; 70(2): 236-257, 2021 02 10.

Article in English | MEDLINE | ID: mdl-32653921

ABSTRACT

Sequence alignment is essential for phylogenetic and molecular evolution inference, as well as in many other areas of bioinformatics and evolutionary biology. Inaccurate alignments can lead to severe biases in most downstream statistical analyses. Statistical alignment based on probabilistic models of sequence evolution addresses these issues by replacing heuristic score functions with evolutionary model-based probabilities. However, score-based aligners and fixed-alignment phylogenetic approaches are still more prevalent than methods based on evolutionary indel models, mostly due to computational convenience. Here, I present new techniques for improving the accuracy and speed of statistical evolutionary alignment. The "cumulative indel model" approximates realistic evolutionary indel dynamics using differential equations. "Adaptive banding" reduces the computational demand of most alignment algorithms without requiring prior knowledge of divergence levels or pseudo-optimal alignments. Using simulations, I show that these methods lead to fast and accurate pairwise alignment inference. Also, I show that it is possible, with these methods, to align and infer evolutionary parameters from a single long synteny block (≈530 kbp) between the human and chimp genomes. The cumulative indel model and adaptive banding can therefore improve the performance of alignment and phylogenetic methods. [Evolutionary alignment; pairHMM; sequence evolution; statistical alignment; statistical genetics.]

Subject(s)

Molecular Evolution, INDEL Mutation, Algorithms, Computational Biology, Humans, INDEL Mutation/genetics, Statistical Models, Phylogeny, Sequence Alignment

13.

Sampling bias and model choice in continuous phylogeography: Getting lost on a random walk.

Kalkauskas, Antanas; Perron, Umberto; Sun, Yuxuan; Goldman, Nick; Baele, Guy; Guindon, Stephane; De Maio, Nicola.

PLoS Comput Biol ; 17(1): e1008561, 2021 01.

Article in English | MEDLINE | ID: mdl-33406072

ABSTRACT

Phylogeographic inference allows reconstruction of past geographical spread of pathogens or living organisms by integrating genetic and geographic data. A popular model in continuous phylogeography (with location data provided in the form of latitude and longitude coordinates) describes spread as a Brownian motion (Brownian Motion Phylogeography, BMP) in continuous space and time, akin to similar models of continuous trait evolution. Here, we show that reconstructions using this model can be strongly affected by sampling biases, such as the lack of sampling from certain areas. As an attempt to reduce the effects of sampling bias on BMP, we consider the addition of sequence-free samples from under-sampled areas. While this approach alleviates the effects of sampling bias, in most scenarios this will not be a viable option due to the need for prior knowledge of an outbreak's spatial distribution. We therefore consider an alternative model, the spatial Λ-Fleming-Viot process (ΛFV), which has recently gained popularity in population genetics. Despite the ΛFV's robustness to sampling biases, we find that the different assumptions of the ΛFV and BMP models result in different applicabilities, with the ΛFV being more appropriate for scenarios of endemic spread, and BMP being more appropriate for recent outbreaks or colonizations.
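The Brownian motion model of spread described in this abstract can be sketched as a forward simulation: each node's location is its parent's location plus an independent Gaussian displacement in each coordinate, with variance proportional to the branch length. The sketch below is not taken from the paper; it treats the two coordinates as planar rather than as true latitude and longitude, and the dictionary tree encoding, the function name `simulate_bmp`, and the toy tree are assumptions.

```python
import math
import random


def simulate_bmp(tree, root_location, sigma, rng):
    """Simulate Brownian-motion locations along a rooted tree.

    tree: nested dicts with keys "name", "length" (branch length above
          the node) and optionally "children" (list of subtrees).
    sigma: diffusion coefficient (standard deviation per unit time).
    Returns a dict mapping node name to an (x, y) location.
    """
    locations = {}

    def visit(node, parent_location):
        # Displacement variance grows linearly with branch length.
        sd = sigma * math.sqrt(node["length"])
        location = tuple(c + rng.gauss(0.0, sd) for c in parent_location)
        locations[node["name"]] = location
        for child in node.get("children", []):
            visit(child, location)

    visit(tree, root_location)
    return locations


# Hypothetical three-tip tree; the root has zero branch length above it.
toy_tree = {
    "name": "root", "length": 0.0, "children": [
        {"name": "tipA", "length": 1.0},
        {"name": "inner", "length": 0.5, "children": [
            {"name": "tipB", "length": 0.5},
            {"name": "tipC", "length": 0.5},
        ]},
    ],
}
locations = simulate_bmp(toy_tree, (0.0, 0.0), sigma=1.0, rng=random.Random(7))
print(locations)
```

Because the model places no constraint on where lineages are sampled, removing tips from one region of such a simulation and re-inferring the root location is one simple way to explore the sampling-bias effect the abstract reports.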

Subject(s)

Population Genetics/methods, Genetic Models, Phylogeography/methods, Selection Bias, Bayes Theorem, Computational Biology, Disease Outbreaks/statistics & numerical data, Flavivirus/genetics, Flavivirus Infections/epidemiology, Flavivirus Infections/virology, Humans, Markov Chains

14.

Want to track pandemic variants faster? Fix the bioinformatics bottleneck.

Hodcroft, Emma B; De Maio, Nicola; Lanfear, Rob; MacCannell, Duncan R; Minh, Bui Quang; Schmidt, Heiko A; Stamatakis, Alexandros; Goldman, Nick; Dessimoz, Christophe.

Nature ; 591(7848): 30-33, 2021 03.

Article in English | MEDLINE | ID: mdl-33649511

Subject(s)

COVID-19/epidemiology, COVID-19/virology, Molecular Evolution, Genomics/methods, Genomics/trends, Mutation, SARS-CoV-2/genetics, Animals, Automation/methods, Basic Reproduction Number, COVID-19/immunology, COVID-19/transmission, COVID-19 Vaccines/immunology, Viral Genome/genetics, Humans, Mink/virology, Pandemics/statistics & numerical data, Phylogeny, Public Health/methods, Public Health/trends, SARS-CoV-2/immunology, SARS-CoV-2/isolation & purification, SARS-CoV-2/pathogenicity, Social Media, Uncertainty

15.

A phylogenetic approach for weighting genetic sequences.

De Maio, Nicola; Alekseyenko, Alexander V; Coleman-Smith, William J; Pardi, Fabio; Suchard, Marc A; Tamuri, Asif U; Truszkowski, Jakub; Goldman, Nick.

BMC Bioinformatics ; 22(1): 285, 2021 May 28.

Article in English | MEDLINE | ID: mdl-34049487

ABSTRACT

BACKGROUND: Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are 'novel' compared to the others in the same dataset, and low weights to sequences that are over-represented. RESULTS: We formalise this principle by rigorously defining the evolutionary 'novelty' of a sequence within an alignment. This results in new sequence weights that we call 'phylogenetic novelty scores'. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column (important, for example, in protein family profiling). We give computationally efficient algorithms for calculating our scores and, using simulations, show that they are versatile and can improve the accuracy of character frequency estimation compared to existing sequence weighting schemes. CONCLUSIONS: Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.

Subject(s)

Algorithms, Computational Biology, Phylogeny, Sequence Alignment

16.

Genetic Variability of the SARS-CoV-2 Pocketome.

Yazdani, Setayesh; De Maio, Nicola; Ding, Yining; Shahani, Vijay; Goldman, Nick; Schapira, Matthieu.

J Proteome Res ; 20(8): 4212-4215, 2021 08 06.

Article in English | MEDLINE | ID: mdl-34180678

ABSTRACT

In the absence of effective treatment, COVID-19 is likely to remain a global disease burden. Compounding this threat is the near certainty that novel coronaviruses with pandemic potential will emerge in years to come. Pan-coronavirus drugs (agents active against both SARS-CoV-2 and other coronaviruses) would address both threats. A strategy to develop such broad-spectrum inhibitors is to pharmacologically target binding sites on SARS-CoV-2 proteins that are highly conserved in other known coronaviruses, the assumption being that any selective pressure to keep a site conserved across past viruses will apply to future ones. Here we systematically mapped druggable binding pockets on the experimental structure of 15 SARS-CoV-2 proteins and analyzed their variation across 27 α- and β-coronaviruses and across thousands of SARS-CoV-2 samples from COVID-19 patients. We find that the two most conserved druggable sites are a pocket overlapping the RNA binding site of the helicase nsp13 and the catalytic site of the RNA-dependent RNA polymerase nsp12, both components of the viral replication-transcription complex. We present the data on a public web portal (https://www.thesgc.org/SARSCoV2_pocketome/), where users can interactively navigate individual protein structures and view the genetic variability of drug-binding pockets in 3D.

Subject(s)

COVID-19, SARS-CoV-2, Antiviral Agents/pharmacology, Antiviral Agents/therapeutic use, Humans, Pandemics, RNA-Dependent RNA Polymerase/genetics

17.

BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis.

Bouckaert, Remco; Vaughan, Timothy G; Barido-Sottani, Joëlle; Duchêne, Sebastián; Fourment, Mathieu; Gavryushkina, Alexandra; Heled, Joseph; Jones, Graham; Kühnert, Denise; De Maio, Nicola; Matschiner, Michael; Mendes, Fábio K; Müller, Nicola F; Ogilvie, Huw A; du Plessis, Louis; Popinga, Alex; Rambaut, Andrew; Rasmussen, David; Siveroni, Igor; Suchard, Marc A; Wu, Chieh-Hsi; Xie, Dong; Zhang, Chi; Stadler, Tanja; Drummond, Alexei J.

PLoS Comput Biol ; 15(4): e1006650, 2019 04.

Article in English | MEDLINE | ID: mdl-30958812

ABSTRACT

Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.

Subject(s)

Bayes Theorem, Biological Evolution, Phylogeny, Software, Animals, Computational Biology, Computer Simulation, Molecular Evolution, Humans, Markov Chains, Genetic Models, Monte Carlo Method

18.

Bayesian reconstruction of transmission within outbreaks using genomic variants.

De Maio, Nicola; Worby, Colin J; Wilson, Daniel J; Stoesser, Nicole.

PLoS Comput Biol ; 14(4): e1006117, 2018 04.

Article in English | MEDLINE | ID: mdl-29668677

ABSTRACT

Pathogen genome sequencing can reveal details of transmission histories and is a powerful tool in the fight against infectious disease. In particular, within-host pathogen genomic variants identified through heterozygous nucleotide base calls are a potential source of information to identify linked cases and infer direction and time of transmission. However, using such data effectively to model disease transmission presents a number of challenges, including differentiating genuine variants from those observed due to sequencing error, as well as the specification of a realistic model for within-host pathogen population dynamics. Here we propose a new Bayesian approach to transmission inference, BadTrIP (BAyesian epiDemiological TRansmission Inference from Polymorphisms), that explicitly models evolution of pathogen populations in an outbreak, transmission (including transmission bottlenecks), and sequencing error. BadTrIP enables the inference of host-to-host transmission from pathogen sequencing data and epidemiological data. By assuming that genomic variants are unlinked, our method does not require the computationally intensive and unreliable reconstruction of individual haplotypes. Using simulations we show that BadTrIP is robust in most scenarios and can accurately infer transmission events by efficiently combining information from genetic and epidemiological sources; thanks to its realistic model of pathogen evolution and the inclusion of epidemiological data, BadTrIP is also more accurate than existing approaches. BadTrIP is distributed as an open source package (https://bitbucket.org/nicofmay/badtrip) for the phylogenetic software BEAST2. We apply our method to reconstruct transmission history at the early stages of the 2014 Ebola outbreak, showcasing the power of within-host genomic variants to reconstruct transmission events.

Subject(s)

Communicable Diseases/epidemiology , Communicable Diseases/transmission , Disease Outbreaks/statistics & numerical data , Host-Pathogen Interactions/genetics , Bayes Theorem , Communicable Diseases/genetics , Computational Biology , Computer Simulation , Molecular Evolution , Genetic Variation , Ebola Hemorrhagic Fever/epidemiology , Ebola Hemorrhagic Fever/genetics , Ebola Hemorrhagic Fever/transmission , Humans , Genetic Models , Sierra Leone/epidemiology , Software

19.

Two Distinct Patterns of Clostridium difficile Diversity Across Europe Indicating Contrasting Routes of Spread.

Eyre, David W; Davies, Kerrie A; Davis, Georgina; Fawley, Warren N; Dingle, Kate E; De Maio, Nicola; Karas, Andreas; Crook, Derrick W; Peto, Tim E A; Walker, A Sarah; Wilcox, Mark H.

Clin Infect Dis ; 67(7): 1035-1044, 2018 09 14.

Article in English | MEDLINE | ID: mdl-29659747

ABSTRACT

Background: Rates of Clostridium difficile infection vary widely across Europe, as do prevalent ribotypes. The extent of Europe-wide diversity within each ribotype, however, is unknown. Methods: Inpatient diarrheal fecal samples submitted on a single day in summer and winter (2012-2013) to laboratories in 482 European hospitals were cultured for C. difficile, and isolates from the 10 most prevalent ribotypes were whole-genome sequenced. Within each ribotype, country-based sequence clustering was assessed using the ratio of the median number of single-nucleotide polymorphisms between isolates within versus across different countries, using permutation tests. Time-scaled Bayesian phylogenies were used to reconstruct the historical location of each lineage. Results: Sequenced isolates (n = 624) were from 19 countries. Five ribotypes had within-country clustering: ribotype 356, only in Italy; ribotype 018, predominantly in Italy; ribotype 176, with distinct Czech and German clades; ribotype 001/072, including distinct German, Slovakian, and Spanish clades; and ribotype 027, with multiple predominantly country-specific clades including in Hungary, Italy, Germany, Romania, and Poland. By contrast, we found no within-country clustering for ribotypes 078, 015, 002, 014, and 020, consistent with a Europe-wide distribution. Fluoroquinolone resistance was significantly more common in within-country clustered ribotypes (P = .009). Fluoroquinolone-resistant isolates were also more tightly clustered geographically with a median (interquartile range) of 43 (0-213) miles between each isolate and the most closely genetically related isolate, versus 421 (204-680) miles in nonresistant pairs (P < .001). Conclusions: Two distinct patterns of C. difficile ribotype spread were observed, consistent with either predominantly healthcare-associated acquisition or Europe-wide dissemination via other routes/sources, for example, the food chain.

Subject(s)

Clostridioides difficile/genetics , Clostridium Infections/epidemiology , Clostridium Infections/microbiology , Anti-Bacterial Agents/pharmacology , Clostridioides difficile/drug effects , Cluster Analysis , Bacterial Drug Resistance , Europe/epidemiology , Genetic Variation , Humans , Ribotyping

20.

Pneumococcal Capsule Synthesis Locus cps as Evolutionary Hotspot with Potential to Generate Novel Serotypes by Recombination.

Mostowy, Rafal J; Croucher, Nicholas J; De Maio, Nicola; Chewapreecha, Claire; Salter, Susannah J; Turner, Paul; Aanensen, David M; Bentley, Stephen D; Didelot, Xavier; Fraser, Christophe.

Mol Biol Evol ; 34(10): 2537-2554, 2017 10 01.

Article in English | MEDLINE | ID: mdl-28595308

ABSTRACT

Diversity of the polysaccharide capsule in Streptococcus pneumoniae, the main surface antigen and the target of the currently used pneumococcal vaccines, constitutes a major obstacle in eliminating pneumococcal disease. Such diversity is genetically encoded by almost 100 variants of the capsule biosynthesis locus, cps. However, the evolutionary dynamics of the capsule remain incompletely understood. Here, using genetic data from 4,519 bacterial isolates, we found cps to be an evolutionary hotspot with elevated substitution and recombination rates. These rates were a consequence of relaxed purifying selection and positive, diversifying selection acting at this locus, supporting the hypothesis that the capsule has an increased potential to generate novel diversity compared with the rest of the genome. Diversifying selection was particularly evident in the region of wzd/wze genes, which are known to regulate capsule expression and hence the bacterium's ability to cause disease. Using a novel, capsule-centered approach, we analyzed the evolutionary history of 12 major serogroups. Such analysis revealed their complex diversification scenarios, which were principally driven by recombination with other serogroups and other streptococci. Patterns of recombinational exchanges between serogroups could not be explained by serotype frequency alone, thus pointing to nonrandom associations between co-colonizing serotypes. Finally, we discovered a previously unobserved mosaic serotype 39X, which was confirmed to carry a viable and structurally novel capsule. Adding to previous discoveries of other mosaic capsules in densely sampled collections, these results emphasize the strong adaptive potential of the bacterium by its ability to generate novel antigenic diversity by recombination.

Subject(s)

Bacterial Capsules/genetics , Streptococcus pneumoniae/metabolism , Bacterial Capsules/metabolism , Biological Evolution , Bacterial DNA/genetics , Molecular Evolution , Genetic Variation , Mutation/genetics , Phylogeny , Pneumococcal Infections , Pneumococcal Vaccines , Bacterial Polysaccharides/genetics , Bacterial Polysaccharides/metabolism , Genetic Recombination/genetics , Serogroup , Streptococcus/genetics , Streptococcus pneumoniae/genetics
