1.
Syst Biol ; 73(2): 470-485, 2024 Jul 27.
Article in English | MEDLINE | ID: mdl-38507308

ABSTRACT

Chronograms (phylogenies with branch lengths proportional to time) represent key data on the timing of evolutionary events, allowing us to study natural processes in many areas of biological research. Chronograms also provide valuable information that can be used for education, science communication, and conservation policy decisions. Yet, achieving a high-quality reconstruction of a chronogram is a difficult and resource-consuming task. Here we present DateLife, phylogenetic software implemented as an R package and an R Shiny web application available at www.datelife.org, that provides services for efficient and easy discovery, summary, reuse, and reanalysis of node age data mined from a curated database of expert, peer-reviewed, and openly available chronograms. The main DateLife workflow starts with one or more scientific taxon names provided by a user. Names are processed and standardized to a unified taxonomy, allowing DateLife to run a name match across its local chronogram database, which is curated from Open Tree of Life's phylogenetic repository, and extract all chronograms that contain at least two queried taxon names, along with their metadata. Finally, node ages from matching chronograms are mapped using the congruification algorithm to corresponding nodes on a tree topology, either extracted from Open Tree of Life's synthetic phylogeny or one provided by the user. Congruified node ages are used as secondary calibrations to date the chosen topology, with or without initial branch lengths, using different phylogenetic dating methods such as BLADJ, treePL, PATHd8, and MrBayes. We performed a cross-validation test to compare node ages resulting from a DateLife analysis (i.e., phylogenetic dating using secondary calibrations) to those from the original chronograms (i.e., obtained with primary calibrations), and found that DateLife's node age estimates are consistent with the age estimates from the original chronograms, with the largest variation in ages occurring around topologically deeper nodes. Because the results from any software for scientific analysis can only be as good as the data used as input, we highlight the importance of considering the results of a DateLife analysis in the context of the input chronograms. DateLife can help to increase awareness of the existing disparities among alternative hypotheses of dates for the same diversification events, and to support exploration of the effect of alternative chronogram hypotheses on downstream analyses, providing a framework for a more informed interpretation of evolutionary results.


Subjects
Classification, Phylogeny, Software, Classification/methods, Factual Databases
2.
Mol Biol Evol ; 40(11), 2023 Nov 03.
Article in English | MEDLINE | ID: mdl-37879113

ABSTRACT

In phylogenomics, incongruences between gene trees, resulting from both artifactual and biological reasons, can decrease the signal-to-noise ratio and complicate species tree inference. The amount of data handled today in classical phylogenomic analyses precludes manual error detection and removal. However, a simple and efficient way to automate the identification of outliers from a collection of gene trees is still missing. Here, we present PhylteR, a method that allows rapid and accurate detection of outlier sequences in phylogenomic datasets, i.e. species from individual gene trees that do not follow the general trend. PhylteR relies on DISTATIS, an extension of multidimensional scaling to 3 dimensions to compare multiple distance matrices at once. In PhylteR, these distance matrices extracted from individual gene phylogenies represent evolutionary distances between species according to each gene. On simulated datasets, we show that PhylteR identifies outliers with more sensitivity and precision than a comparable existing method. We also show that PhylteR is not sensitive to ILS-induced incongruences, which is a desirable feature. On a biological dataset of 14,463 genes for 53 species previously assembled for Carnivora phylogenomics, we show (i) that PhylteR identifies as outliers sequences that can be considered as such by other means, and (ii) that the removal of these sequences improves the concordance between the gene trees and the species tree. Thanks to the generation of numerous graphical outputs, PhylteR also allows for the rapid and easy visual characterization of the dataset at hand, thus aiding in the precise identification of errors. PhylteR is distributed as an R package on CRAN and as containerized versions (docker and singularity).


Subjects
Biological Evolution, Phylogeny
3.
BMC Bioinformatics ; 24(1): 390, 2023 Oct 14.
Article in English | MEDLINE | ID: mdl-37838689

ABSTRACT

BACKGROUND: Genome-scale phylogenetic analysis based on core gene sets is routinely used in microbiological research. However, the techniques are still not approachable for individuals with little bioinformatics experience. Here, we present EasyCGTree, a user-friendly and cross-platform pipeline to reconstruct genome-scale maximum-likelihood (ML) phylogenetic trees using supermatrix (SM) and supertree (ST) approaches. RESULTS: EasyCGTree was implemented in the Perl programming language and was built using a collection of published reputable programs. All the programs were precompiled as standalone executable files and contained in the EasyCGTree package. It can run after installing a Perl language environment. Several profile hidden Markov models (HMMs) of core gene sets were prepared in advance to construct a profile HMM database (PHD) that was enclosed in the package and available for homolog searching. Customized gene sets can also be used to build profile HMMs and added to the PHD via EasyCGTree. Taking 43 genomes of the genus Paracoccus as the testing data set, consensus (a variant of the typical SM), SM, and ST trees were inferred via EasyCGTree successfully, and the SM trees were compared with those inferred via the pipelines UBCG and bcgTree, using the metrics of cophenetic correlation coefficients (CCC) and Robinson-Foulds distance (topological distance). The results suggested that EasyCGTree can infer SM trees with nearly identical topology (distance < 0.1) and accuracy (CCC > 0.99) to those of trees inferred with the two pipelines. CONCLUSIONS: EasyCGTree is an all-in-one automatic pipeline from input data to phylogenomic tree with guaranteed accuracy, and is much easier to install and use than the reference pipelines. In addition, ST is implemented in EasyCGTree conveniently and can be used to explore prokaryotic evolutionary signals from a different perspective. The EasyCGTree version 4 is freely available for Linux and Windows users at GitHub ( https://github.com/zdf1987/EasyCGTree4 ).
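
The topological comparison mentioned in this abstract, the Robinson-Foulds distance, can be illustrated with a minimal sketch. It is not EasyCGTree's code: the nested-tuple tree representation, the function names, and the normalization (dividing by the total number of non-trivial splits in both trees) are assumptions made for illustration, and the abstract does not state which normalization the authors used.

```python
from itertools import chain


def leaves(tree):
    """Return the frozenset of leaf labels of a nested-tuple tree (hypothetical format)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits of the tree, treating it as unrooted."""
    all_taxa = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_taxa - side
        # A split is informative only if both sides hold at least two taxa.
        if len(side) > 1 and len(other) > 1:
            splits.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Return (raw RF distance, RF normalized by the total number of splits)."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("both trees must contain the same taxa")
    splits_a, splits_b = bipartitions(tree_a), bipartitions(tree_b)
    raw = len(splits_a ^ splits_b)  # splits found in exactly one tree
    total = len(splits_a) + len(splits_b)
    return raw, (raw / total if total else 0.0)


tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(tree_1, tree_2))
```

Under this normalization a value of 0 means the two topologies are identical and a value of 1 means they share no informative split.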


Subjects
Computational Biology, Prokaryotic Cells, Humans, Phylogeny, Computational Biology/methods, Biological Evolution, Programming Languages
4.
Proc Biol Sci ; 289(1975): 20212535, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35582793

ABSTRACT

A clade's evolutionary history is shaped, in part, by geographical range expansion, sweepstakes dispersal and local extinction. A rigorous understanding of historical biogeography may therefore yield insights into macroevolutionary dynamics such as adaptive radiation. Modern historical biogeographic analyses typically fit statistical models to molecular phylogenies, but it remains unclear whether extant species provide sufficient signal or if well-sampled phylogenies of extinct and extant taxa are necessary to produce meaningful estimates of past ranges. We investigated the historical biogeography of Primates and their euarchontan relatives using a novel meta-analytical phylogeny of over 900 extant (n = 419) and extinct (n = 483) species spanning their entire evolutionary history. Ancestral range estimates for young nodes were largely congruent with those derived from molecular phylogeny. However, node age exerts a significant effect on ancestral range estimate congruence, and the probability of congruent inference dropped below 0.5 for nodes older than the late Eocene, corresponding to the origins of higher-level clades. Discordance was not observed in analyses of extinct taxa alone. Fossils are essential for robust ancestral range inference, and biogeographic analyses of extant clades originating in the deep past should be viewed with scepticism without them.


Subjects
Biological Evolution, Fossils, Animals, Geography, Phylogeny, Primates/genetics
5.
Syst Biol ; 69(1): 38-60, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31062850

ABSTRACT

Evolutionary relationships have remained unresolved in many well-studied groups, even though advances in next-generation sequencing and analysis, using approaches such as transcriptomics, anchored hybrid enrichment, or ultraconserved elements, have brought systematics to the brink of whole genome phylogenomics. Recently, it has become possible to sequence the entire genomes of numerous nonbiological models in parallel at reasonable cost, particularly with shotgun sequencing. Here, we identify orthologous coding sequences from whole-genome shotgun sequences, which we then use to investigate the relevance and power of phylogenomic relationship inference and time-calibrated tree estimation. We study an iconic group of butterflies (swallowtails of the family Papilionidae) that has remained phylogenetically unresolved, with continued debate about the timing of their diversification. Low-coverage whole genomes were obtained using Illumina shotgun sequencing for all genera. Genome assembly coupled to BLAST-based orthology searches allowed extraction of 6621 orthologous protein-coding genes for 45 Papilionidae species and 16 outgroup species (with 32% missing data after cleaning phases). Supermatrix phylogenomic analyses were performed with both maximum-likelihood (IQ-TREE) and Bayesian mixture models (PhyloBayes) for amino acid sequences, which produced a fully resolved phylogeny providing new insights into controversial relationships. Species tree reconstruction from gene trees was performed with ASTRAL and SuperTriplets and recovered the same phylogeny. We estimated gene site concordant factors to complement traditional node-support measures, which strengthens the robustness of inferred phylogenies. 
Bayesian estimates of divergence times based on a reduced data set (760 orthologs and 12% missing data) indicate a mid-Cretaceous origin of Papilionoidea around 99.2 Ma (95% credibility interval: 68.6-142.7 Ma) and Papilionidae around 71.4 Ma (49.8-103.6 Ma), with subsequent diversification of modern lineages well after the Cretaceous-Paleogene event. These results show that shotgun sequencing of whole genomes, even when highly fragmented, represents a powerful approach to phylogenomics and molecular dating in a group that has previously been refractory to resolution.


Subjects
Biological Evolution, Butterflies/classification, Butterflies/genetics, Insect Genome/genetics, Phylogeny, Animals, Time
6.
Mol Phylogenet Evol ; 130: 346-356, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30321696

ABSTRACT

The babblers are a diverse group of passerine birds comprising 452 species. The group was long regarded as a "scrap basket" in taxonomic classification schemes. Although several studies have assessed the phylogenetic relationships for subsets of babblers during the past two decades, a comprehensive phylogeny of this group has been lacking. In this study, we used five mitochondrial and seven nuclear loci to generate a dated phylogeny for babblers. This phylogeny includes 402 species (ca. 89% of the overall clade) from 75 genera (97%) and all five currently recognized families, providing a robust basis for taxonomic revision. Our phylogeny supports seven major clades and reveals several non-monophyletic genera. Divergence time estimates indicate that the seven major clades diverged around the same time (18-20 million years ago, Ma) in the early Miocene. We use the phylogeny in a consistent way to propose a new taxonomy, with seven families and 64 genera of babblers, and a new linear sequence of names.


Subjects
Passeriformes/classification, Phylogeny, Animals, Mitochondrial DNA/genetics, Likelihood Functions, Passeriformes/genetics, Time Factors
7.
BMC Genomics ; 19(Suppl 5): 252, 2018 May 08.
Article in English | MEDLINE | ID: mdl-29745851

ABSTRACT

BACKGROUND: Many supertree estimation and multi-locus species tree estimation methods compute trees by combining trees on subsets of the species set based on some NP-hard optimization criterion. A recent approach to computing large trees has been to constrain the search space by defining a set of "allowed bipartitions", and then use dynamic programming to find provably optimal solutions in polynomial time. Several phylogenomic estimation methods, such as ASTRAL, the MDC algorithm in PhyloNet, FastRFS, and ALE, use this approach. RESULTS: We present SIESTA, a method that can be combined with these dynamic programming algorithms to return a data structure that compactly represents all the optimal trees in the search space. As a result, SIESTA provides multiple capabilities, including: (1) counting the number of optimal trees, (2) calculating consensus trees, (3) generating a random optimal tree, and (4) annotating branches in a given optimal tree by the proportion of optimal trees it appears in. CONCLUSIONS: SIESTA improves the accuracy of FastRFS and ASTRAL, and is a general technique for enhancing dynamic programming methods for constrained optimization.


Subjects
Algorithms, Computer Simulation, Phylogeny, Software, Animals, Computational Biology/methods, Humans
8.
BMC Genomics ; 19(Suppl 6): 570, 2018 Aug 13.
Article in English | MEDLINE | ID: mdl-30367577

ABSTRACT

BACKGROUND: Deciphering the history of life on Earth has long been regarded as one of the most central tasks in biology. In past years, widespread discordance between the evolutionary histories of different groups of orthologous genes of prokaryotes has been revealed, primarily due to horizontal gene transfers (HGTs). Nonetheless, evidence that supports a strong tree-like signal of evolution has been uncovered, despite the presence of HGT events. Therefore, a challenging task is to distill this tree-like signal from the noise induced by all sources of non-tree-like events. RESULTS: In this work we tackle this question, using real and simulated data. We first tighten a recent related theoretical result in this field. In a simulation study, we infer individual quartet topologies, and then use the inferred quartets to reconstruct simulated species trees. We demonstrate that accurate tree reconstruction is feasible despite surprisingly high rates of HGT. In a real data study, we construct phylogenies of two sets of prokaryotes, and show that our tree reconstruction scheme is comparable with (and, in complementary respects, better than) other commonly used methods. CONCLUSIONS: Using a blend of theoretical and empirical investigations, our study proves the feasibility of accurate quartet-based phylogenetic reconstruction, the vast impact of HGT events notwithstanding.


Subjects
Phylogeny, Computer Simulation, Horizontal Gene Transfer, Archaeal Genes, Bacterial Genes
9.
Mol Biol Evol ; 34(9): 2408-2421, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28873954

ABSTRACT

Supertree methods merge a set of overlapping phylogenetic trees into a supertree containing all taxa of the input trees. The challenge in supertree reconstruction is the way of dealing with conflicting information in the input trees. Many different algorithms for different objective functions have been suggested to resolve these conflicts. In particular, there exist methods based on encoding the source trees in a matrix, where the supertree is constructed applying a local search heuristic to optimize the respective objective function. We present a novel heuristic supertree algorithm called Bad Clade Deletion (BCD) supertrees. It uses minimum cuts to delete a locally minimal number of columns from such a matrix representation so that it is compatible. This is the complement problem to Matrix Representation with Compatibility (Maximum Split Fit). Our algorithm has guaranteed polynomial worst-case running time and performs swiftly in practice. Different from local search heuristics, it guarantees to return the directed perfect phylogeny for the input matrix, corresponding to the parent tree of the input trees, if one exists. Comparing supertrees to model trees for simulated data, BCD shows a better accuracy (F1 score) than the state-of-the-art algorithms SuperFine (up to 3%) and Matrix Representation with Parsimony (up to 7%); at the same time, BCD is up to 7 times faster than SuperFine, and up to 600 times faster than Matrix Representation with Parsimony. Finally, using the BCD supertree as a starting tree for a combined Maximum Likelihood analysis using RAxML, we reach significantly improved accuracy (1% higher F1 score) and running time (1.7-fold speedup).
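
The matrix encoding of source trees that this abstract builds on can be sketched as follows. This shows only the standard Baum-Ragan style encoding for rooted source trees (one column per non-root clade, with 1 for taxa in the clade, 0 for other taxa of that tree, and ? for taxa absent from the tree). It does not implement BCD's minimum-cut column deletion, and the nested-tuple tree format and function names are assumptions made for illustration.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple rooted tree (hypothetical format)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """List the leaf set of every internal node except the root, in preorder."""
    found = []

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def matrix_representation(source_trees):
    """Encode rooted source trees as one character string per taxon."""
    all_taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        tree_taxa = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon].append("1")  # taxon belongs to this clade
                elif taxon in tree_taxa:
                    rows[taxon].append("0")  # taxon is in the tree, outside the clade
                else:
                    rows[taxon].append("?")  # taxon is missing from this source tree
    return {taxon: "".join(cells) for taxon, cells in rows.items()}


sources = [(("A", "B"), ("C", "D")), (("B", "C"), "E")]
for taxon, row in matrix_representation(sources).items():
    print(taxon, row)
```

In this toy input the clades {A, B} and {B, C} conflict, so no single tree displays every column; resolving such conflicts is the step on which the supertree methods named in the abstract differ.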


Subjects
Computational Biology/methods, Algorithms, Computer Simulation, Phylogeny, Software
10.
J Mol Evol ; 86(2): 150-165, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29460038

ABSTRACT

Despite impressive advancements in technological and theoretical tools, construction of phylogenetic (evolutionary) trees is still a challenging task. The availability of enormous quantities of molecular data has made large-scale phylogenetic reconstruction involving thousands of species a more viable goal. For this goal, separate trees over different, overlapping subsets of species, representing histories of various markers of these species, are collected. These trees, typically with conflicting signals, are subsequently combined into a single tree over the full set, an operation denoted as supertree construction. The amalgamation of such trees into a single tree lies at the heart of many tasks in phylogenetics, yet remains a daunting endeavor, especially in light of conflicting signals. In this work, we study the performance of matrix representation with parsimony (MRP), the most widely used supertree method to date, when confronted with quartet trees. Quartet trees are the most basic informational unit when amalgamation of unrooted trees is attempted, and they remain relevant in more general settings even though standard supertree methods are not necessarily confined to quartets. This study involves both real and simulated data, and the effects of several parameters on the results are evaluated, revealing a number of anomalies associated with MRP. We show that these anomalies are surmountable when using a recently introduced supertree method, weighted quartet MaxCut (wQMC).
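
To make the notion of quartet trees concrete, the sketch below lists the resolved quartets that a tree induces: a four-taxon set {a, b, c, d} is resolved as ab|cd when some split of the tree separates a and b from c and d. It only illustrates the input unit discussed in the abstract and implements neither MRP nor wQMC. The nested-tuple tree format and all function names are assumptions.

```python
from itertools import combinations


def leaves(tree):
    """Return the frozenset of leaf labels of a nested-tuple tree (hypothetical format)."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def splits(tree):
    """List (side, other) pairs for every non-trivial split of the tree."""
    all_taxa = leaves(tree)
    found = []

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_taxa - side
        if len(side) > 1 and len(other) > 1:
            found.append((side, other))
        for child in node:
            visit(child)

    visit(tree)
    return found


def induced_quartets(tree):
    """Return the resolved quartets as ((a, b), (c, d)) tuples meaning ab|cd."""
    tree_splits = splits(tree)
    quartets = set()
    for a, b, c, d in combinations(sorted(leaves(tree)), 4):
        # The three possible resolutions of a four-taxon set.
        for pair, rest in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            p, r = set(pair), set(rest)
            if any((p <= s and r <= o) or (p <= o and r <= s) for s, o in tree_splits):
                quartets.add((pair, rest))
                break  # a tree supports at most one resolution per four taxa
    return quartets


example = ((("A", "B"), "C"), ("D", "E"))
for quartet in sorted(induced_quartets(example)):
    print(quartet)
```

A fully resolved tree on n taxa induces one resolved quartet for each of its four-taxon subsets, while an unresolved (star) tree induces none.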


Subjects
DNA Sequence Analysis/methods, Algorithms, Biological Evolution, Computer Simulation/statistics & numerical data, Statistical Data Interpretation, Phylogeny, Research Design/statistics & numerical data
11.
Proc Biol Sci ; 285(1884), 2018 Aug 01.
Article in English | MEDLINE | ID: mdl-30068675

ABSTRACT

An understanding of the balance of interspecific competition and the physical environment in structuring organismal communities is crucial because those communities structured primarily by their physical environment typically exhibit greater sensitivity to environmental change than those structured predominantly by competitive interactions. Here, using detailed phylogenetic and functional information, we investigate this question in macrofaunal assemblages from Northwest Atlantic Ocean continental slopes, a high seas region projected to experience substantial environmental change through the current century. We demonstrate assemblages to be both phylogenetically and functionally under-dispersed, and thus conclude that the physical environment, not competition, may dominate in structuring deep-ocean communities. Further, we find temperature and bottom trawling intensity to be among the environmental factors significantly related to assemblage diversity. These results hint that deep-ocean communities are highly sensitive to their physical environment and vulnerable to environmental perturbation, including by direct disturbance through fishing, and indirectly through the changes brought about by climate change.


Subjects
Aquatic Organisms, Ecosystem, Fisheries, Animals, Atlantic Ocean, Climate Change, Phylogeny, Temperature
12.
Syst Biol ; 66(1): 112-120, 2017 Jan 01.
Article in English | MEDLINE | ID: mdl-28173480

ABSTRACT

The impact of incomplete lineage sorting (ILS) on phylogenetic conflicts among genes, and the related issue of whether to account for ILS in species tree reconstruction, are matters of intense controversy. Here, focusing on full-genome data in placental mammals, we empirically test two assumptions underlying current usage of tree-building methods that account for ILS. We show that in this data set (i) distinct exons from a common gene do not share a common genealogy, and (ii) ILS is only a minor determinant of the existing phylogenetic conflict. These results shed new light on the relevance and conditions of applicability of ILS-aware methods in phylogenomic analyses of protein coding sequences.


Subjects
Genome/genetics, Genomics/standards, Mammals/classification, Phylogeny, Animals, Computer Simulation, Mammals/genetics, Proteins/genetics
13.
BMC Biol ; 15(1): 32, 2017 Apr 27.
Article in English | MEDLINE | ID: mdl-28449681

ABSTRACT

BACKGROUND: Fishes are extremely speciose and also highly disparate in their fin configurations, more specifically in the number of fins present as well as their structure, shape, and size. How they achieved this remarkable disparity is difficult to explain in the absence of any comprehensive overview of the evolutionary history of fish appendages. Fin modularity could provide an explanation for both the observed disparity in fin configurations and the sequential appearance of new fins. Modularity is considered as an important prerequisite for the evolvability of living systems, enabling individual modules to be optimized without interfering with others. Similarities in developmental patterns between some of the fins already suggest that they form developmental modules during ontogeny. At a macroevolutionary scale, these developmental modules could act as evolutionary units of change and contribute to the disparity in fin configurations. This study addresses fin disparity in a phylogenetic perspective, while focusing on the presence/absence and number of each of the median and paired fins. RESULTS: Patterns of fin morphological disparity were assessed by mapping fin characters on a new phylogenetic supertree of fish orders. Among agnathans, disparity in fin configurations results from the sequential appearance of novel fins forming various combinations. Both median and paired fins would have appeared first as elongated ribbon-like structures, which were the precursors for more constricted appendages. Among chondrichthyans, disparity in fin configurations relates mostly to median fin losses. Among actinopterygians, fin disparity involves fin losses, the addition of novel fins (e.g., the adipose fin), and coordinated duplications of the dorsal and anal fins. 
Furthermore, some pairs of fins, notably the dorsal/anal and pectoral/pelvic fins, show non-independence in their character distribution, supporting expectations based on developmental and morphological evidence that these fin pairs form evolutionary modules. CONCLUSIONS: Our results suggest that the pectoral/pelvic fins and the dorsal/anal fins form two distinct evolutionary modules, and that the latter is nested within a more inclusive median fins module. Because the modularity hypotheses that we are testing are also supported by developmental and variational data, this constitutes a striking example linking developmental, variational, and evolutionary modules.


Subjects
Animal Fins/growth & development, Biological Evolution, Body Patterning, Fishes/growth & development, Animal Fins/anatomy & histology, Animals, Fishes/anatomy & histology, Phylogeny
14.
Mol Phylogenet Evol ; 107: 209-220, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27818264

ABSTRACT

With the availability of enormous quantities of genetic data it has become common to construct very accurate trees describing the evolutionary history of the species under study, as well as every single gene of these species. These trees allow us to examine the evolutionary compliance of given markers (characters). A marker compliant with the history of the species investigated has undergone mutations along the species tree branches, such that every subtree of that tree exhibits a different state. Convex recoloring (CR) uses combinatorial representation to measure the adequacy of a taxonomic classifier to a given tree. Despite its biological origins, research on CR has been almost exclusively dedicated to mathematical properties of the problem, or variants of it with little, if any, relationship to taxonomy. In this work we return to the origins of CR. We put CR in a statistical framework and introduce and learn the notion of the statistical significance of a character. We apply this measure to two data sets (Passerine birds and prokaryotes) and four examples. These examples demonstrate various applications of CR, from evolutionary relatedness, through lateral evolution, to supertree construction. The above study was done with new software that we provide, containing algorithmic improvements and a graphical output of an (optimally) recolored tree. AVAILABILITY: Code implementing the features and a README are available at http://research.haifa.ac.il/ssagi/software/convexrecoloring.zip.


Subjects
Algorithms, Biological Evolution, Animal Migration, Animals, Birds/genetics, Computer Simulation, Genetic Markers, Molting, Phylogeny, Prokaryotic Cells/metabolism
15.
Syst Biol ; 65(3): 397-416, 2016 May.
Article in English | MEDLINE | ID: mdl-25281847

ABSTRACT

Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make the most of all available data. Most species tree methods deal with single processes of phylogenetic discordance, namely, gene duplication and loss, incomplete lineage sorting (ILS) or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS, and therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model where several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families, and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best under simulated data sets, in spite of ignoring branch lengths, and performed well on empirical data, as well as being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees, by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive approach once we consider its speed, in contrast to more sophisticated models.


Subjects
Classification/methods, Genetic Models, Phylogeny, Bayes Theorem, Computer Simulation, Genome/genetics, Software
16.
Syst Biol ; 65(3): 366-80, 2016 May.
Article in English | MEDLINE | ID: mdl-25164915

ABSTRACT

Species tree estimation is complicated by processes, such as gene duplication and loss and incomplete lineage sorting (ILS), that cause discordance between gene trees and the species tree. Furthermore, while concatenation, a traditional approach to tree estimation, has excellent performance under many conditions, the expectation is that the best accuracy will be obtained through the use of species tree estimation methods that are specifically designed to address gene tree discordance. In this article, we report on a study to evaluate MP-EST (one of the most popular species tree estimation methods designed to address ILS) as well as concatenation under maximum likelihood, the greedy consensus, and two supertree methods (Matrix Representation with Parsimony and Matrix Representation with Likelihood). Our study shows that several factors impact the absolute and relative accuracy of methods, including the number of gene trees, the accuracy of the estimated gene trees, and the amount of ILS. Concatenation can be more accurate than the best summary methods in some cases (mostly when the gene trees have poor phylogenetic signal or when the level of ILS is low), but summary methods are generally more accurate than concatenation when there are an adequate number of sufficiently accurate gene trees. Our study suggests that coalescent-based species tree methods may be key to estimating highly accurate species trees from multiple loci.


Subjects
Classification/methods, Phylogeny, Computer Simulation, Gene Duplication, Probability
17.
Mol Biol Evol ; 32(6): 1628-42, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25657329

ABSTRACT

The wealth of phylogenetic information accumulated over many decades of biological research, coupled with recent technological advances in molecular sequence generation, presents significant opportunities for researchers to investigate relationships across and within the kingdoms of life. However, to make best use of this data wealth, several problems must first be overcome. One key problem is finding effective strategies to deal with missing data. Here, we introduce Lasso, a novel heuristic approach for reconstructing rooted phylogenetic trees from distance matrices with missing values, for data sets where a molecular clock may be assumed. Contrary to other phylogenetic methods on partial data sets, Lasso possesses desirable properties such as its reconstructed trees being both unique and edge-weighted. These properties are achieved by Lasso restricting its leaf set to a large subset of all possible taxa, which in many practical situations is the entire taxa set. Furthermore, the Lasso approach is distance-based, rendering it very fast to run and suitable for data sets of all sizes, including large data sets such as those generated by modern Next Generation Sequencing technologies. To better understand the performance of Lasso, we assessed it by means of artificial and real biological data sets, showing its effectiveness in the presence of missing data. Furthermore, by formulating the supertree problem as a particular case of the missing data problem, we assessed Lasso's ability to reconstruct supertrees. We demonstrate that, although not specifically designed for such a purpose, Lasso performs better than or comparably with five leading supertree algorithms on a challenging biological data set. Finally, we make freely available a software implementation of Lasso so that researchers may, for the first time, perform both rooted tree and supertree reconstruction with branch lengths on their own partial data sets.


Subjects
Databases, Genetic , Models, Genetic , Phylogeny , Algorithms , High-Throughput Nucleotide Sequencing/methods , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/genetics , Software , Triticum/classification , Triticum/genetics
18.
Brief Bioinform ; 15(1): 79-90, 2014 Jan.
Article in English | MEDLINE | ID: mdl-22908214

ABSTRACT

Supermatrix and supertree analyses are frequently used to more accurately recover vertical evolutionary history but debate still exists over which method provides greater reliability. Traditional methods that resolve relationships among organisms from single genes are often unreliable because of the frequent lack of strong phylogenetic signal and the presence of systematic artifacts. Methods developed to reconstruct organismal history from multiple genes can be divided into supermatrix and supertree approaches. A supermatrix analysis consists of the concatenation of multiple genes into a single, possibly partitioned alignment, from which phylogenies are reconstructed using a variety of approaches. Supertrees build consensus trees from the topological information contained within individual gene trees. Both methods are now widely used and have been demonstrated to solve previously ambiguous or unresolved phylogenies with high statistical support. However, the amount of misleading signal needed to induce erroneous phylogenies for both strategies is still unknown. Using genome simulations, we test the accuracy of supertree and supermatrix approaches in recovering the true organismal phylogeny under increased amounts of horizontally transferred genes and changes in substitution rates. Our results show that overall, supermatrix approaches are preferable when a low amount of gene transfer is suspected to be present in the dataset, while supertrees have greater reliability in the presence of a moderate amount of misleading gene transfers. In the face of very high or very low substitution rates without horizontal gene transfers, supermatrix approaches outperform supertrees as individual gene trees remain unresolved and additional sequences contribute to a congruent phylogenetic signal.
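
The concatenation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_supermatrix`, the dictionary-based alignment format, and the use of `?` for taxa missing from a gene are all assumptions made for the example.

```python
def build_supermatrix(gene_alignments, missing="?"):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence.
    Taxa absent from a gene are padded with the `missing` symbol (assumed
    convention). Returns (supermatrix, partitions), where partitions holds
    the half-open column range (start, end) of each gene.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 0
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Pad with the missing-data symbol when the gene lacks this taxon.
            pieces[taxon].append(aln.get(taxon, missing * width))
        partitions.append((start, start + width))
        start += width
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Hypothetical toy data: taxon B was not sampled for the second gene.
genes = [
    {"A": "ACGT", "B": "ACGA", "C": "ACTT"},
    {"A": "GG", "C": "GA"},
]
matrix, parts = build_supermatrix(genes)
```

The partition ranges are kept so that a downstream analysis could treat each gene as a separate partition of the alignment, as the abstract mentions.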


Subjects
Gene Transfer, Horizontal , Models, Genetic , Phylogeny , Computational Biology , Computer Simulation , Evolution, Molecular , Genomics/statistics & numerical data , Sequence Alignment/statistics & numerical data
19.
Syst Biol ; 64(3): 384-95, 2015 May.
Article in English | MEDLINE | ID: mdl-25472575

ABSTRACT

Mollusks are the most morphologically disparate living animal phylum; they have diversified into all habitats and have a deep fossil record. Monophyly and identity of their eight living classes are undisputed, but relationships between these groups and patterns of their early radiation have remained elusive. Arguments about traditional morphological phylogeny focus on a small number of topological concepts but often without regard to proximity of the individual classes. In contrast, molecular studies have proposed a number of radically different, inherently contradictory, and controversial sister relationships. Here, we assembled a data set of 42 unique published trees describing molluscan interrelationships. We used these data to ask several questions about the state of resolution of molluscan phylogeny compared with a null model of the variation possible in random trees constructed from a monophyletic assemblage of eight terminals. Although 27 different unique trees have been proposed from morphological inference, the majority of these are not statistically different from each other. Within the available molecular topologies, only four studies to date have included the deep-sea class Monoplacophora, but 36.4% of all trees are not significantly different. We also present supertrees derived from two data partitions and three methods, including all available molecular molluscan phylogenies, which will form the basis for future hypothesis testing. The supertrees presented here were not constructed to provide yet another hypothesis of molluscan relationships, but rather to algorithmically evaluate the relationships present in the disparate published topologies. Based on the totality of available evidence, certain patterns of relatedness among constituent taxa become clear. The internodal distance is consistently short between a few taxon pairs, particularly supporting the relatedness of Monoplacophora and the chitons, Polyplacophora. Other taxon pairs are rarely or never found in close proximity, such as the vermiform Caudofoveata and Bivalvia. Our results have specific utility for guiding constructive research planning to better test relationships in Mollusca as well as other problematic groups. Taxa with consistently proximate relationships should be the focus of a combined approach in a concerted assessment of potential genetic and anatomical homology, whereas unequivocally distant taxa will make the most constructive choices for exemplar selection in higher level phylogenomic analyses.


Subjects
Mollusca/classification , Phylogeny , Animals , Mollusca/anatomy & histology , Mollusca/genetics
20.
Syst Biol ; 64(2): 233-42, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25414175

ABSTRACT

Despite impressive technical and theoretical developments, reconstruction of phylogenetic trees for enormous quantities of molecular data is still a challenging task. A key tool in analyses of large data sets has been the construction of separate trees for subsets (e.g., quartets) of sequences, and subsequent combination of these subtrees into a single tree for the full set (i.e., supertree analysis). Unfortunately, even amalgamating quartets into a supertree remains a computationally daunting task. Assigning weights to quartets to indicate importance or reliability was proposed more than a decade ago, but handling weighted quartets is even more challenging and has scarcely been attempted in the past. In this work, we focus on weighted quartet-based approaches. We propose a scheme to assign weights to quartets coming from weighted trees and devise a tree similarity measure for weighted trees based on weighted quartets. We also extend the quartet MaxCut (QMC) algorithm to handle weighted quartets. We evaluate these tools on simulated and real data. Our simulated data analysis highlights the additional information that is conveyed when using the new weighted tree similarity measure, and shows that extending QMC to a weighted setting improves the quality of tree reconstruction. Our analyses of a cyanobacterial data set with weighted QMC reinforce previous results achieved with other tools.
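
To make the idea of weighted quartets concrete, the sketch below derives a quartet split and a weight from leaf-to-leaf path lengths on a weighted tree. The abstract does not state the weighting scheme, so this example assumes one plausible choice: the weight is the length of the internal path separating the two pairs, recovered with the classical four-point condition. The helper names `symmetric`, `quartet_from_distances`, and `weighted_quartets` are hypothetical.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric nested-dict distance table from {(x, y): value}."""
    dist = {}
    for (x, y), value in pairs.items():
        dist.setdefault(x, {})[y] = value
        dist.setdefault(y, {})[x] = value
    return dist


def quartet_from_distances(dist, a, b, c, d):
    """Return (split, weight) for four taxa with tree-additive distances.

    Four-point condition: of the three pairwise sums, the smallest one
    identifies the split, and half the gap to the next sum is the length
    of the internal path (assumed here to serve as the quartet weight).
    A weight of 0 indicates an unresolved (star) quartet.
    """
    sums = [
        (dist[a][b] + dist[c][d], ((a, b), (c, d))),
        (dist[a][c] + dist[b][d], ((a, c), (b, d))),
        (dist[a][d] + dist[b][c], ((a, d), (b, c))),
    ]
    sums.sort(key=lambda item: item[0])
    (smallest, split), (middle, _), _ = sums
    return split, (middle - smallest) / 2.0


def weighted_quartets(dist):
    """List (split, weight) for every four-taxon subset of the table."""
    return [
        quartet_from_distances(dist, *four)
        for four in combinations(sorted(dist), 4)
    ]


# Path lengths on the unrooted toy tree ((A:1,B:2):3,(C:1,D:4)).
dist = symmetric({
    ("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 8,
    ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5,
})
quartets = weighted_quartets(dist)
```

On the toy tree, the single quartet separates A and B from C and D, and its weight equals the internal branch length of 3. A weighted quartet amalgamation method would take such (split, weight) pairs as input.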


Subjects
Classification/methods , Models, Biological , Phylogeny , Cyanobacteria/classification , Software