1.
Mol Biol Evol ; 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38934791

ABSTRACT

We have recently introduced MAPLE (MAximum Parsimonious Likelihood Estimation), a new pandemic-scale phylogenetic inference method exclusively designed for genomic epidemiology. In response to the need for enhancing MAPLE's performance and scalability, here we present two key components: (1) CMAPLE software, a highly optimized C++ reimplementation of MAPLE with many new features and advancements; and (2) CMAPLE library, a suite of Application Programming Interfaces to facilitate the integration of the CMAPLE algorithm into existing phylogenetic inference packages. Notably, we have successfully integrated CMAPLE into the widely used IQ-TREE 2 software, enabling its rapid adoption in the scientific community. These advancements serve as a vital step towards better preparedness for future pandemics, offering researchers powerful tools for large-scale pathogen genomic analysis.

2.
Syst Biol ; 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38421146

ABSTRACT

Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting, introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call MAST. This model extends a prior implementation by Boussau et al. (2009) by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of incomplete lineage sorting in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of four Platyrrhine species for which standard concatenated maximum likelihood and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e. the largest proportion of sites) to the tree also supported by gene tree approaches. 
These results suggest that the MAST model is able to analyse a concatenated alignment using maximum likelihood, while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
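
As a rough illustration of the mixture idea described in this abstract, the sketch below combines per-site likelihoods from several trees using tree weights, and computes the posterior probability that a site belongs to each tree. The function names and the per-site likelihood values are hypothetical; in a real analysis those likelihoods would come from a phylogenetic likelihood calculation, which is not shown here, and this is not the IQ-TREE implementation.

```python
import math

def mixture_log_likelihood(weights, site_likelihoods):
    """Log-likelihood of an alignment under a weighted mixture of trees.

    weights: one weight per tree, summing to 1.
    site_likelihoods: one row per site, one likelihood per tree
    (illustrative inputs; computing them is outside this sketch).
    """
    total = 0.0
    for per_tree in site_likelihoods:
        # Site likelihood is the weighted sum over trees.
        total += math.log(sum(w * l for w, l in zip(weights, per_tree)))
    return total

def site_posteriors(weights, per_tree):
    """Posterior probability that one site belongs to each tree."""
    joint = [w * l for w, l in zip(weights, per_tree)]
    norm = sum(joint)
    return [j / norm for j in joint]

# Toy example: two trees, two sites (made-up numbers).
weights = [0.6, 0.4]
site_likelihoods = [[0.01, 0.02], [0.05, 0.005]]
mast_lnl = mixture_log_likelihood(weights, site_likelihoods)
```

Averaging the site posteriors over all sites gives one natural reading of a tree weight as "the proportion of sites" assigned to that tree.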

3.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36383168

ABSTRACT

MOTIVATION: Site concordance factors (sCFs) have become a widely used way to summarize discordance in phylogenomic datasets. However, the original version of sCFs was calculated by sampling a quartet of tip taxa and then applying parsimony-based criteria for discordance. This approach has the potential to be strongly affected by multiple hits at a site (homoplasy), especially when substitution rates are high or taxa are not closely related. RESULTS: Here, we introduce a new method for calculating sCFs. The updated version uses likelihood to generate probability distributions of ancestral states at internal nodes of the phylogeny. By sampling from the states at internal nodes adjacent to a given branch, this approach substantially reduces, but does not abolish, the effects of homoplasy and taxon sampling. AVAILABILITY AND IMPLEMENTATION: Updated sCFs are implemented in IQ-TREE 2.2.2. The software is freely available at https://github.com/iqtree/iqtree2/releases. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.


Subjects
Software , Phylogeny , Probability
4.
Bioinformatics ; 39(9), 2023 09 02.
Article in English | MEDLINE | ID: mdl-37656933

ABSTRACT

MOTIVATION: Sequence simulation plays a vital role in phylogenetics with many applications, such as evaluating phylogenetic methods, testing hypotheses, and generating training data for machine-learning applications. We recently introduced a new simulator for multiple sequence alignments called AliSim, which outperformed existing tools. However, with the increasing demands of simulating large data sets, AliSim is still slow due to its sequential implementation; for example, to simulate millions of sequence alignments, AliSim took several days or weeks. Parallelization has been used for many phylogenetic inference methods but not yet for sequence simulation. RESULTS: This paper introduces AliSim-HPC, which, for the first time, employs high-performance computing for phylogenetic simulations. AliSim-HPC parallelizes the simulation process at both multi-core and multi-CPU levels using the OpenMP and message passing interface (MPI) libraries, respectively. AliSim-HPC is highly efficient and scalable, which reduces the runtime to simulate 100 large gap-free alignments (30 000 sequences of one million sites) from over one day to 11 min using 256 CPU cores from a cluster with six computing nodes, a 153-fold speedup. While the OpenMP version can only simulate gap-free alignments, the MPI version supports insertion-deletion models like the sequential AliSim. AVAILABILITY AND IMPLEMENTATION: AliSim-HPC is open-source and available as part of the new IQ-TREE version v2.2.3 at https://github.com/iqtree/iqtree2/releases with a user manual at http://www.iqtree.org/doc/AliSim.


Subjects
Computing Methodologies , Software , Phylogeny , Computer Simulation , Sequence Alignment
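
The speedup figures quoted for AliSim-HPC can be checked with a little arithmetic. The sketch below uses only the numbers given in the abstract (153-fold speedup, 11 min, 256 cores); the variable names are illustrative, and the sequential runtime is inferred rather than reported.

```python
# Figures quoted in the AliSim-HPC abstract.
speedup = 153            # reported fold speedup
parallel_minutes = 11    # reported runtime on the cluster
cores = 256              # reported number of CPU cores

# Implied sequential runtime: speedup times the parallel runtime.
implied_sequential_hours = speedup * parallel_minutes / 60

# Parallel efficiency: speedup divided by the number of cores.
efficiency = speedup / cores

print(f"implied sequential runtime: {implied_sequential_hours:.2f} h")
print(f"parallel efficiency: {efficiency:.1%}")
```

The implied sequential runtime is about 28 hours, which agrees with "over one day", and the parallel efficiency is roughly 60%.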
5.
Bioinformatics ; 39(9), 2023 09 02.
Article in English | MEDLINE | ID: mdl-37651445

ABSTRACT

MOTIVATION: Neighbour-Joining is one of the most widely used distance-based phylogenetic inference methods. However, current implementations do not scale well for datasets with more than 10 000 sequences. Given the increasing pace of generating new sequence data, particularly in outbreaks of emerging diseases, and the already enormous existing databases of sequence data for which Neighbour-Joining is a useful approach, new implementations of existing methods are warranted. RESULTS: Here, we present DecentTree, which provides highly optimized and parallel implementations of Neighbour-Joining and several of its variants. DecentTree is designed as a stand-alone application and a header-only library easily integrated with other phylogenetic software (e.g. it is integral in the popular IQ-TREE software). We show that DecentTree achieves similar or improved performance compared with existing software (BIONJ, Quicktree, FastME, and RapidNJ), especially for very large alignments. For example, DecentTree is up to 6-fold faster than the fastest existing Neighbour-Joining software (e.g. RapidNJ) when generating a tree of 64 000 SARS-CoV-2 genomes. AVAILABILITY AND IMPLEMENTATION: DecentTree is open source and freely available at https://github.com/iqtree/decenttree. All code and data used in this analysis are available on GitHub (https://github.com/asdcid/Comparison-of-neighbour-joining-software).


Subjects
COVID-19 , Humans , Phylogeny , SARS-CoV-2/genetics , Genomics , Gene Library
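
For readers unfamiliar with the method named in this record, here is a minimal sketch of the textbook Neighbour-Joining algorithm. It is a plain, unoptimized version written for clarity, not DecentTree's implementation or API; the function name, the edge-list output format, and the five-taxon toy matrix are all assumptions made for illustration.

```python
from itertools import combinations

def neighbour_joining(names, dist):
    """Textbook neighbour joining (hypothetical helper, not DecentTree's API).

    names: list of taxon labels.
    dist: dict of dicts with symmetric pairwise distances.
    Returns a list of (node, neighbour, branch_length) edges of an unrooted tree.
    """
    d = {a: dict(dist[a]) for a in names}  # working copy of the distances
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from i and j to the new internal node u.
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        u = f"node{next_id}"
        next_id += 1
        edges.append((i, u, len_i))
        edges.append((j, u, len_j))
        # Distances from u to every remaining node.
        d[u] = {}
        for k in nodes:
            if k != i and k != j:
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k != i and k != j] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges

# Toy additive distance matrix for five taxa (illustrative values).
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {x: {y: matrix[i][j] for j, y in enumerate(taxa)} for i, x in enumerate(taxa)}
nj_edges = neighbour_joining(taxa, dist)
```

Each iteration recomputes the row sums and scans every pair, so this sketch takes cubic time in the number of taxa. That cost is why optimized and parallel implementations matter for datasets with tens of thousands of sequences.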
6.
Mol Biol Evol ; 39(5), 2022 05 03.
Article in English | MEDLINE | ID: mdl-35511713

ABSTRACT

Sequence simulators play an important role in phylogenetics. Simulated data has many applications, such as evaluating the performance of different methods, hypothesis testing with parametric bootstraps, and, more recently, generating data for training machine-learning applications. Many sequence simulation programmes exist, but the most feature-rich programmes tend to be rather slow, and the fastest programmes tend to be feature-poor. Here, we introduce AliSim, a new tool that can efficiently simulate biologically realistic alignments under a large range of complex evolutionary models. To achieve high performance across a wide range of simulation conditions, AliSim implements an adaptive approach that combines the commonly used rate matrix and probability matrix approaches. AliSim takes 1.4 h and 1.3 GB RAM to simulate alignments with one million sequences or sites, whereas popular software Seq-Gen, Dawg, and INDELible require 2-5 h and 50-500 GB of RAM. We provide AliSim as an extension of the IQ-TREE software version 2.2, freely available at www.iqtree.org, and a comprehensive user tutorial at http://www.iqtree.org/doc/AliSim.


Subjects
Molecular Evolution , Genetic Models , Genomics , Phylogeny , Software
7.
Syst Biol ; 71(5): 1110-1123, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35139203

ABSTRACT

Amino acid substitution models are a key component in phylogenetic analyses of protein sequences. All commonly used amino acid models available to date are time-reversible, an assumption designed for computational convenience but not for biological reality. Another significant downside to time-reversible models is that they do not allow inference of rooted trees without outgroups. In this article, we introduce a maximum likelihood approach, nQMaker, an extension of the recently published QMaker method, that allows the estimation of time nonreversible amino acid substitution models and rooted phylogenetic trees from a set of protein sequence alignments. We show that the nonreversible models estimated with nQMaker are a much better fit to empirical alignments than pre-existing reversible models, across a wide range of data sets including mammals, birds, plants, fungi, and other taxa, and that the improvements in model fit scale with the size of the data set. Notably, for the recently published plant and bird trees, these nonreversible models correctly recovered the commonly estimated root placements with very high statistical support without the need to use an outgroup. We provide nQMaker as an easy-to-use feature in the IQ-TREE software (http://www.iqtree.org), allowing users to estimate nonreversible models and rooted phylogenies from their own protein data sets. The data sets and scripts used in this article are available at https://doi.org/10.5061/dryad.3tx95x6hx. [Amino acid sequence analyses; amino acid substitution models; maximum likelihood model estimation; nonreversible models; phylogenetic inference; reversible models.]


Subjects
Genetic Models , Software , Amino Acid Substitution , Animals , Molecular Evolution , Likelihood Functions , Mammals , Phylogeny , Proteins
8.
PLoS Biol ; 18(12): e3000954, 2020 12.
Article in English | MEDLINE | ID: mdl-33270638

ABSTRACT

Our understanding of the evolutionary history of primates is undergoing continual revision due to ongoing genome sequencing efforts. Bolstered by growing fossil evidence, these data have led to increased acceptance of once controversial hypotheses regarding phylogenetic relationships, hybridization and introgression, and the biogeographical history of primate groups. Among these findings is a pattern of recent introgression between species within all major primate groups examined to date, though little is known about introgression deeper in time. To address this and other phylogenetic questions, here, we present new reference genome assemblies for 3 Old World monkey (OWM) species: Colobus angolensis ssp. palliatus (the black and white colobus), Macaca nemestrina (southern pig-tailed macaque), and Mandrillus leucophaeus (the drill). We combine these data with 23 additional primate genomes to estimate both the species tree and individual gene trees using thousands of loci. While our species tree is largely consistent with previous phylogenetic hypotheses, the gene trees reveal high levels of genealogical discordance associated with multiple primate radiations. We use strongly asymmetric patterns of gene tree discordance around specific branches to identify multiple instances of introgression between ancestral primate lineages. In addition, we exploit recent fossil evidence to perform fossil-calibrated molecular dating analyses across the tree. Taken together, our genome-wide data help to resolve multiple contentious sets of relationships among primates, while also providing insight into the biological processes and technical artifacts that led to the disagreements in the first place.


Subjects
Genetic Introgression/genetics , Primates/genetics , Animals , Biological Evolution , Cercopithecidae/genetics , Computational Biology/methods , Genetic Databases , Fossils , Gene Flow/genetics , Genome/genetics , Genetic Models , Phylogeny , DNA Sequence Analysis/methods
9.
Chem Pharm Bull (Tokyo) ; 71(6): 451-453, 2023 Jun 01.
Article in English | MEDLINE | ID: mdl-36948639

ABSTRACT

Two new compounds, named eudesm-4(15),7-diene-3α,9β,11-triol (1) and eudesm-4(15),7-diene-1β,3α,9β,11-tetraol (2), together with three known sesquiterpene lactones, (1S,5R,7R,10R)-secoatractylolactone (3), (1S,5R,7R,10R)-secoatractylolactone-11-O-β-D-glucopyranoside (4), and atractylenolide III (5), were isolated from the rhizomes of Atractylodes macrocephala. Their structures were elucidated using one-dimensional (1D) and 2D-NMR spectra and high-resolution electrospray ionization (HR-ESI)-MS data. Compound 5 exhibited the strongest anti-inflammatory activity, with an IC50 value of 27.5 µM for inhibition of nitric oxide production. Compounds 1, 2, and 3 showed moderate effects, while compound 4 was inactive.


Subjects
Atractylodes , Sesquiterpenes , Rhizome/chemistry , Atractylodes/chemistry , Anti-Inflammatory Agents/pharmacology , Magnetic Resonance Spectroscopy , Sesquiterpenes/pharmacology , Sesquiterpenes/chemistry , Lactones/pharmacology , Lactones/chemistry
10.
Mol Biol Evol ; 38(7): 2915-2929, 2021 06 25.
Article in English | MEDLINE | ID: mdl-33744972

ABSTRACT

Serine protease inhibitors (serpins) are found in all kingdoms of life and play essential roles in multiple physiological processes. Owing to the diversity of the superfamily, phylogenetic analysis is challenging and prokaryotic serpins have been speculated to have been acquired from Metazoa through horizontal gene transfer due to their unexpectedly high homology. Here, we have leveraged a structural alignment of diverse serpins to generate a comprehensive 6,000-sequence phylogeny that encompasses serpins from all kingdoms of life. We show that in addition to a central "hub" of highly conserved serpins, there has been extensive diversification of the superfamily into many novel functional clades. Our analysis indicates that the hub proteins are ancient and are similar because of convergent evolution, rather than the alternative hypothesis of horizontal gene transfer. This work clarifies longstanding questions in the evolution of serpins and provides new directions for research in the field of serpin biology.


Subjects
Molecular Evolution , Multigene Family , Phylogeny , Serpins/genetics , Animals , Bacteria/genetics , Chordata/genetics , Invertebrates/genetics , Plants/genetics
11.
Syst Biol ; 70(5): 1046-1060, 2021 08 11.
Article in English | MEDLINE | ID: mdl-33616668

ABSTRACT

Amino acid substitution models play a crucial role in phylogenetic analyses. Maximum likelihood (ML) methods have been proposed to estimate amino acid substitution models; however, they are typically complicated and slow. In this article, we propose QMaker, a new ML method to estimate a general time-reversible $Q$ matrix from a large protein data set consisting of multiple sequence alignments. QMaker combines an efficient ML tree search algorithm, a model selection for handling the model heterogeneity among alignments, and the consideration of rate mixture models among sites. We provide QMaker as a user-friendly function in the IQ-TREE software package (http://www.iqtree.org) supporting the use of multiple CPU cores so that biologists can easily estimate amino acid substitution models from their own protein alignments. We used QMaker to estimate new empirical general amino acid substitution models from the current Pfam database as well as five clade-specific models for mammals, birds, insects, yeasts, and plants. Our results show that the new models considerably improve the fit between model and data and in some cases influence the inference of phylogenetic tree topologies. [Amino acid replacement matrices; amino acid substitution models; maximum likelihood estimation; phylogenetic inferences.]


Subjects
Molecular Evolution , Genetic Models , Animals , Likelihood Functions , Phylogeny , Proteins/genetics , Sequence Alignment
13.
Mol Biol Evol ; 37(9): 2727-2733, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32365179

ABSTRACT

We implement two measures for quantifying genealogical concordance in phylogenomic data sets: the gene concordance factor (gCF) and the novel site concordance factor (sCF). For every branch of a reference tree, gCF is defined as the percentage of "decisive" gene trees containing that branch. This measure is already in wide usage, but here we introduce a package that calculates it while accounting for variable taxon coverage among gene trees. sCF is a new measure defined as the percentage of decisive sites supporting a branch in the reference tree. gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of underlying disagreement among loci and sites. An easy-to-use implementation and tutorial are freely available in the IQ-TREE software package (http://www.iqtree.org/doc/Concordance-Factor, last accessed May 13, 2020).


Subjects
Datasets as Topic , Genetic Techniques , Phylogeny , Software
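
The gCF definition in this abstract (the percentage of decisive gene trees containing a branch) can be sketched in a few lines. The data representation below is hypothetical, and the decisiveness rule is an assumption of this sketch: a gene tree counts as decisive when it samples at least one taxon from each of the four clades around the branch. It is not the IQ-TREE code, and sCF is not covered.

```python
def gene_concordance_factor(clades, gene_trees):
    """Percentage of decisive gene trees that contain a reference branch.

    clades: four sets of taxa (A, B, C, D) around the branch, with A and B
        on one side and C and D on the other.
    gene_trees: list of (taxa, splits) pairs, where taxa is a set and
        splits is a set of frozensets, each holding one side of a branch.
    Assumption: a gene tree is decisive if it samples all four clades.
    """
    a, b, _, _ = clades
    decisive = 0
    concordant = 0
    for taxa, splits in gene_trees:
        if not all(clade & taxa for clade in clades):
            continue  # not decisive for this branch
        decisive += 1
        # Restrict the reference split to the taxa present in this gene tree.
        side = frozenset((a | b) & taxa)
        other = frozenset(taxa - side)
        if side in splits or other in splits:
            concordant += 1
    return 100.0 * concordant / decisive if decisive else float("nan")

# Toy example: four single-taxon clades and four gene trees (made-up data).
clades = [{"human"}, {"chimp"}, {"gorilla"}, {"orangutan"}]
all_taxa = {"human", "chimp", "gorilla", "orangutan"}
gene_trees = [
    (all_taxa, {frozenset({"human", "chimp"})}),        # concordant
    (all_taxa, {frozenset({"human", "gorilla"})}),      # discordant
    ({"human", "chimp", "gorilla"}, set()),             # not decisive
    (all_taxa, {frozenset({"gorilla", "orangutan"})}),  # concordant (other side)
]
gcf = gene_concordance_factor(clades, gene_trees)
```

Excluding the third gene tree from the denominator is what "accounting for variable taxon coverage" amounts to in this simplified setting: two of the three decisive trees contain the branch.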
14.
Mol Biol Evol ; 37(5): 1530-1534, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32011700

ABSTRACT

IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.


Subjects
Molecular Evolution , Genomics , Genetic Models , Phylogeny , Software
15.
Syst Biol ; 69(2): 249-264, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31364711

ABSTRACT

Molecular sequence data that have evolved under the influence of heterotachous evolutionary processes are known to mislead phylogenetic inference. We introduce the General Heterogeneous evolution On a Single Topology (GHOST) model of sequence evolution, implemented under a maximum-likelihood framework in the phylogenetic program IQ-TREE (http://www.iqtree.org). Simulations show that using the GHOST model, IQ-TREE can accurately recover the tree topology, branch lengths, and substitution model parameters from heterotachously evolved sequences. We investigate the performance of the GHOST model on empirical data by sampling phylogenomic alignments of varying lengths from a plastome alignment. We then carry out inference under the GHOST model on a phylogenomic data set composed of 248 genes from 16 taxa, where we find the GHOST model concurs with the currently accepted view, placing turtles as a sister lineage of archosaurs, in contrast to results obtained using traditional variable rates-across-sites models. Finally, we apply the model to a data set composed of a sodium channel gene of 11 fish taxa, finding that the GHOST model is able to elucidate a subtle component of the historical signal, linked to the previously established convergent evolution of the electric organ in two geographically distinct lineages of electric fish. We compare inference under the GHOST model to partitioning by codon position and show that, owing to the minimization of model constraints, the GHOST model offers unique biological insights when applied to empirical data.


Subjects
Classification/methods , Sequence Alignment/methods , Software , Animals , Molecular Evolution , Fishes/classification , Fishes/genetics , Genetic Models , Phylogeny
16.
Mol Biol Evol ; 36(6): 1294-1301, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30825307

ABSTRACT

Molecular phylogenetics has neglected polymorphisms within present and ancestral populations for a long time. Recently, multispecies coalescent-based methods have increased in popularity; however, their application is limited to a small number of species and individuals. We introduced a polymorphism-aware phylogenetic model (PoMo), which overcomes this limitation and scales well with the increasing amount of sequence data while accounting for present and ancestral polymorphisms. PoMo circumvents handling of gene trees and directly infers species trees from allele frequency data. Here, we extend the PoMo implementation in IQ-TREE and integrate search for the statistically best-fit mutation model, the ability to infer mutation rate variation across sites, and assessment of branch support values. We exemplify an analysis of a hundred species with ten haploid individuals each, showing that PoMo can perform inference on large data sets. While PoMo is more accurate than standard substitution models applied to concatenated alignments, it is almost as fast. We also provide bmm-simulate, a software package that allows simulation of sequences evolving under PoMo. The new options consolidate the value of PoMo for phylogenetic analyses with population data.


Subjects
Genetic Models , Mutation Rate , Phylogeny , Genetic Polymorphism , Animals , Humans , Likelihood Functions , Software
17.
Nat Methods ; 14(6): 587-589, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28481363

ABSTRACT

Model-based molecular phylogenetics plays an important role in comparisons of genomic data, and model selection is a key step in all such analyses. We present ModelFinder, a fast model-selection method that greatly improves the accuracy of phylogenetic estimates by incorporating a model of rate heterogeneity across sites not previously considered in this context and by allowing concurrent searches of model space and tree space.


Subjects
Algorithms , Chromosome Mapping/standards , High-Throughput Nucleotide Sequencing/methods , Genetic Models , Phylogeny , Animals , Computer Simulation , Molecular Evolution , Humans , Statistical Models , Reproducibility of Results , Sensitivity and Specificity , DNA Sequence Analysis
18.
Mol Biol Evol ; 35(2): 518-522, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29077904

ABSTRACT

The standard bootstrap (SBS), despite being computationally intensive, is widely used in maximum likelihood phylogenetic analyses. We recently proposed the ultrafast bootstrap approximation (UFBoot) to reduce computing time while achieving more unbiased branch supports than SBS under mild model violations. UFBoot has been steadily adopted as an efficient alternative to SBS and other bootstrap approaches. Here, we present UFBoot2, which substantially accelerates UFBoot and reduces the risk of overestimating branch supports due to polytomies or severe model violations. Additionally, UFBoot2 provides suitable bootstrap resampling strategies for phylogenomic data. UFBoot2 is 778 times (median) faster than SBS and 8.4 times (median) faster than RAxML rapid bootstrap on tested data sets. UFBoot2 is implemented in the IQ-TREE software package version 1.6 and freely available at http://www.iqtree.org.


Subjects
Likelihood Functions , Phylogeny , Software , Genetic Models
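
To make the baseline in this abstract concrete, the sketch below shows the resampling step of the standard nonparametric bootstrap (SBS): alignment columns are drawn with replacement to build a pseudo-replicate of the same length. This illustrates SBS only, not the UFBoot or UFBoot2 algorithm; the function name and the toy alignment are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (standard bootstrap step).

    alignment: dict mapping taxon name to an aligned sequence string.
    rng: a random.Random instance, passed in so replicates are reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    # Draw as many column indices as there are sites, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

# Toy alignment with three taxa and six sites (made-up data).
alignment = {"t1": "ACGTAC", "t2": "ACGTTC", "t3": "ATGTAC"}
replicate = bootstrap_alignment(alignment, random.Random(42))
```

In SBS a full tree search is then run on every replicate, which is the computational cost that approximations such as UFBoot aim to reduce.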
19.
J Mol Evol ; 92(1): 1-2, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38231224
20.
Syst Biol ; 67(2): 216-235, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-28950365

ABSTRACT

Proteins have distinct structural and functional constraints at different sites that lead to site-specific preferences for particular amino acid residues as the sequences evolve. Heterogeneity in the amino acid substitution process between sites is not modeled by commonly used empirical amino acid exchange matrices. Such model misspecification can lead to artefacts in phylogenetic estimation such as long-branch attraction. Although sophisticated site-heterogeneous mixture models have been developed to address this problem in both Bayesian and maximum likelihood (ML) frameworks, their formidable computational time and memory usage severely limit their use in large phylogenomic analyses. Here we propose a posterior mean site frequency (PMSF) method as a rapid and efficient approximation to full empirical profile mixture models for ML analysis. The PMSF approach assigns a conditional mean amino acid frequency profile to each site calculated based on a mixture model fitted to the data using a preliminary guide tree. These PMSF profiles can then be used for in-depth tree-searching in place of the full mixture model. Compared with widely used empirical mixture models with $k$ classes, our implementation of PMSF in IQ-TREE (http://www.iqtree.org) speeds up the computation by approximately $k$/1.5-fold and requires a small fraction of the RAM. Furthermore, this speedup allows, for the first time, full nonparametric bootstrap analyses to be conducted under complex site-heterogeneous models on large concatenated data matrices. Our simulations and empirical data analyses demonstrate that PMSF can effectively ameliorate long-branch attraction artefacts. In some empirical and simulation settings PMSF provided more accurate estimates of phylogenies than the mixture models from which they derive.


Subjects
Classification/methods , Genetic Models , Phylogeny , Amino Acid Substitution , Computer Simulation , Molecular Evolution , Nonparametric Statistics
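
The PMSF idea in this abstract, a per-site mean frequency profile weighted by the posterior probability of each mixture class, can be sketched as follows. This is a simplified illustration under stated assumptions, not the IQ-TREE implementation: the function names are hypothetical, the per-class site likelihoods are taken as given (in practice they would be computed on the guide tree), and a two-state alphabet stands in for the 20 amino acids.

```python
def posterior_mean_site_frequencies(weights, profiles, class_likelihoods):
    """Posterior mean frequency profile for a single site.

    weights: mixture class weights, summing to 1.
    profiles: one frequency profile per class (each sums to 1).
    class_likelihoods: likelihood of the site under each class
        (hypothetical inputs; computing them is outside this sketch).
    """
    joint = [w * l for w, l in zip(weights, class_likelihoods)]
    norm = sum(joint)
    posteriors = [j / norm for j in joint]
    n_states = len(profiles[0])
    # Average the class profiles, weighted by the class posteriors.
    return [
        sum(p * profile[s] for p, profile in zip(posteriors, profiles))
        for s in range(n_states)
    ]

def approximate_speedup(k):
    """Approximate speedup over a k-class mixture, as quoted in the abstract."""
    return k / 1.5

# Toy example: two classes, two states (made-up numbers).
pmsf = posterior_mean_site_frequencies(
    weights=[0.5, 0.5],
    profiles=[[0.7, 0.3], [0.2, 0.8]],
    class_likelihoods=[0.03, 0.01],
)
```

Here the first class receives posterior 0.75, so the site profile is 0.575 and 0.425. As an example of the quoted rule of thumb, a mixture with 60 classes would correspond to a roughly 40-fold speedup.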