Results 1 - 16 of 16
1.
Cladistics ; 32(4): 461-478, 2016 Aug.
Article in English | MEDLINE | ID: mdl-34736310

ABSTRACT

Analysis of sequence data using time-reversible substitution models and maximum likelihood (ML) algorithms is currently the most popular method to infer phylogenies, despite the fact that results often contradict each other. Searching for sources of error, we focus on a hitherto neglected feature of these methods: character polarity is usually thought to be irrelevant in ML analyses. Mechanisms that lead to wrong tree topologies were analysed at the level of split-supporting site patterns. In simulations, plesiomorphic site patterns can be identified by comparison with known root sequences. These patterns cause some surprising effects. Using data sets generated with simulations of sequence evolution along a variety of topologies and inferring trees using the same (correct) model, we show for cases of branch-length heterogeneity that (i) as already known, ML analyses can fail to recover the correct tree even when the correct substitution model is used, but also that (ii) plesiomorphic character states cause substantial mistakes and therefore character polarity is relevant, and (iii) accumulating chance similarities on long branches are far less misleading than plesiomorphic states accumulating on shorter branches. The artefacts occur when branch lengths are heterogeneous. The systematic errors disappear for the most part when the sites with symplesiomorphies supporting false clades are deleted from the data set. We conclude that many of the phylogenies published during the past decades may be false due to the neglected effects of symplesiomorphies.

2.
Mol Biol Evol ; 31(7): 1833-49, 2014 07.
Article in English | MEDLINE | ID: mdl-24748651

ABSTRACT

Based on molecular data, three major clades have been recognized within Bilateria: Deuterostomia, Ecdysozoa, and Spiralia. Within Spiralia, small-sized and simply organized animals such as flatworms, gastrotrichs, and gnathostomulids have recently been grouped together as Platyzoa. However, the representation of putative platyzoans was low in the respective molecular phylogenetic studies, in terms of both taxon number and sequence data. Furthermore, increased substitution rates in platyzoan taxa raised the possibility that monophyletic Platyzoa represents an artifact due to long-branch attraction. In order to overcome such problems, we employed a phylogenomic approach, thereby substantially increasing 1) the number of sampled species within Platyzoa and 2) species-specific sequence coverage in data sets of up to 82,162 amino acid positions. Using established and new measures (long-branch score), we disentangled phylogenetic signal from misleading effects such as long-branch attraction. In doing so, our phylogenomic analyses did not recover a monophyletic origin of platyzoan taxa, which instead appeared paraphyletic with respect to the other spiralians. Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa and all other spiralians represent a monophyletic group, which we name Platytrochozoa. Platyzoan paraphyly suggests that the last common ancestor of Spiralia was a simple-bodied organism lacking coelomic cavities, segmentation, and complex brain structures, and that more complex animals such as annelids evolved from such a simply organized ancestor. This conclusion contradicts alternative evolutionary scenarios proposing an annelid-like ancestor of Bilateria and Spiralia and several independent events of secondary reduction.


Subject(s)
Genomics/methods , Helminths/classification , Helminths/genetics , Animals , Molecular Evolution , Helminth Genome , Phylogeny , Platyhelminths/classification , Platyhelminths/genetics
3.
BMC Bioinformatics ; 15: 294, 2014 Aug 30.
Article in English | MEDLINE | ID: mdl-25176556

ABSTRACT

BACKGROUND: Masking of multiple sequence alignment blocks has become a powerful method to enhance the tree-likeness of the underlying data. However, existing masking approaches are insensitive to heterogeneous sequence divergence, which can mislead tree reconstructions. We present AliGROOVE, a new method based on a sliding window and a Monte Carlo resampling approach that visualizes heterogeneous sequence divergence or alignment ambiguity related to single taxa or subsets of taxa within a multiple sequence alignment and tags suspicious branches on a given tree. RESULTS: We used simulated multiple sequence alignments to show that the extent of alignment ambiguity in pairwise sequence comparison is correlated with the frequency of misplaced taxa in tree reconstructions. The approach implemented in AliGROOVE allows the detection of nodes within a tree that are supported despite the absence of phylogenetic signal in the underlying multiple sequence alignment. We show that AliGROOVE detects heterogeneous sequence divergence equally well in a case study based on an empirical data set of mitochondrial DNA sequences of chelicerates. CONCLUSIONS: The AliGROOVE approach has the potential to identify single taxa or subsets of taxa which show predominantly randomized sequence similarity in comparison with other taxa in a multiple sequence alignment. It further allows the reliability of node support to be evaluated in a novel way.


Subject(s)
Algorithms , Computational Biology/methods , Computer Graphics , Genetic Variation , Phylogeny , Sequence Alignment/methods , Animals , Arthropods/classification , Arthropods/genetics , Mitochondrial DNA/genetics , Monte Carlo Method , Reproducibility of Results
4.
Mol Phylogenet Evol ; 70: 94-8, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24076250

ABSTRACT

BaCoCa (BAse COmposition CAlculator) is user-friendly software that combines multiple statistical approaches (like RCFV and C value calculations) to identify biases in aligned sequence data which potentially mislead phylogenetic reconstructions. As a result of its speed and flexibility, the program provides the possibility to analyze hundreds of pre-defined gene partitions and taxon subsets in one single process run. BaCoCa is command-line driven and can be easily integrated into automatic process pipelines of phylogenomic studies. Moreover, given the tab-delimited output style, the results can be easily used for further analyses in programs like Excel or statistical packages like R. A built-in option of BaCoCa is the generation of heat maps with hierarchical clustering of certain results using R. As input files, BaCoCa can handle FASTA and relaxed PHYLIP, which are commonly used in phylogenomic pipelines. BaCoCa is implemented in Perl and works on Windows PCs, Macs and Linux operating systems. The executable source code as well as example test files and detailed documentation of BaCoCa are freely available at http://software.zfmk.de.


Subject(s)
Phylogeny , DNA Sequence Analysis/methods , Software , Cluster Analysis , Genomics , Sequence Alignment
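
The BaCoCa abstract above names RCFV (relative composition frequency variability) as one of its statistics. A minimal Python sketch of the measure as it is commonly defined follows: the absolute deviation of each taxon's character frequencies from the across-taxon mean, summed over characters and taxa and divided by the number of taxa. The function name `rcfv`, the dictionary input, and the rule of ignoring every symbol outside `alphabet` are assumptions for illustration; this is not BaCoCa's own code and may handle gaps and ambiguity codes differently.

```python
from collections import Counter


def rcfv(alignment, alphabet="ACGT"):
    """Return (total RCFV, per-taxon contributions) for an alignment.

    alignment: dict mapping taxon name -> aligned sequence (hypothetical input).
    Symbols outside `alphabet` (gaps, ambiguity codes) are ignored here,
    which is an assumption of this sketch.
    """
    n_taxa = len(alignment)
    freqs = {}
    for taxon, seq in alignment.items():
        counts = Counter(ch for ch in seq.upper() if ch in alphabet)
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"no countable characters for taxon {taxon!r}")
        freqs[taxon] = {ch: counts[ch] / total for ch in alphabet}

    # Mean frequency of each character across all taxa.
    mean = {ch: sum(f[ch] for f in freqs.values()) / n_taxa for ch in alphabet}

    # Each taxon's contribution: summed absolute deviation, scaled by taxon count.
    per_taxon = {
        taxon: sum(abs(f[ch] - mean[ch]) for ch in alphabet) / n_taxa
        for taxon, f in freqs.items()
    }
    return sum(per_taxon.values()), per_taxon


if __name__ == "__main__":
    total, parts = rcfv({"t1": "AAAA", "t2": "CCCC"})
    print(total, parts)
```

Identical compositions give a value of 0; larger values indicate stronger compositional heterogeneity among taxa.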
5.
Front Zool ; 11(1): 81, 2014.
Article in English | MEDLINE | ID: mdl-25426157

ABSTRACT

BACKGROUND: Phylogenetic and population genetic studies often deal with multiple sequence alignments that require manipulation or processing steps such as sequence concatenation, sequence renaming, sequence translation or consensus sequence generation. In recent years phylogenetic data sets have expanded from single genes to genome-wide markers comprising hundreds to thousands of loci. Processing of these large phylogenomic data sets is impracticable without using automated process pipelines. Currently no stand-alone or pipeline-compatible program exists that offers a broad range of manipulation and processing steps for multiple sequence alignments in a single process run. RESULTS: Here we present FASconCAT-G, a system-independent editor, which offers various processing options for multiple sequence alignments. The software provides a wide range of possibilities to edit and concatenate multiple nucleotide, amino acid, and structure sequence alignment files for phylogenetic and population genetic purposes. The main options include sequence renaming, file format conversion, sequence translation between nucleotide and amino acid states, consensus generation of specific sequence blocks, sequence concatenation, model selection of amino acid replacement with ProtTest, two types of RY coding, as well as site exclusions and extraction of parsimony informative sites. Conveniently, most options can be invoked in combination and performed during a single process run. Additionally, FASconCAT-G prints useful information regarding alignment characteristics and editing processes such as base compositions of single in- and outfiles, sequence areas in a concatenated supermatrix, as well as paired stem and loop regions in secondary structure sequence strings. CONCLUSIONS: FASconCAT-G is a command-line driven Perl program that delivers computationally fast and user-friendly processing of multiple sequence alignments for phylogenetic and population genetic applications and is well suited for incorporation into analysis pipelines.
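
RY coding, listed among the FASconCAT-G options above, recodes nucleotides into purines (R) and pyrimidines (Y). The Python sketch below shows plain recoding of every position. The abstract does not say what its "two types" of RY coding are, so this generic version is an assumption and may match neither; `ry_code` is a hypothetical helper name.

```python
# Purines (A, G) become R; pyrimidines (C, T, U) become Y.
# Gaps and ambiguity codes are left unchanged (an assumption of this sketch).
RY_MAP = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_code(sequence):
    """Recode a nucleotide sequence into the two-state R/Y alphabet."""
    return sequence.upper().translate(RY_MAP)


if __name__ == "__main__":
    print(ry_code("acgt-N"))
```

Recoding of this kind is typically used to reduce the influence of base-composition bias and transition saturation, at the cost of discarding some signal.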

6.
BMC Bioinformatics ; 14: 348, 2013 Dec 03.
Article in English | MEDLINE | ID: mdl-24299043

ABSTRACT

BACKGROUND: Character matrices with extensive missing data are frequently used in phylogenomics, with potentially detrimental effects on the accuracy and robustness of tree inference. Therefore, many investigators select taxa and genes with high data coverage. A drawback of these selections is their exclusive reliance on data coverage without consideration of the actual signal in the data, so they might not deliver optimal data matrices in terms of potential phylogenetic signal. In order to circumvent this problem, we have developed a heuristic implemented in software called mare which (1) assesses information content of genes in supermatrices using a measure of potential signal combined with data coverage and (2) reduces supermatrices with a simple hill climbing procedure to submatrices with high total information content. We conducted simulation studies using matrices of 50 taxa × 50 genes with heterogeneous phylogenetic signal among genes and data coverage between 10-30%. RESULTS: With matrices of 50 taxa × 50 genes with heterogeneous phylogenetic signal among genes and data coverage between 10-30%, Maximum Likelihood (ML) tree reconstructions failed to recover correct trees. A selection of a data subset with the herein proposed approach increased the chance to recover correct partial trees more than 10-fold. The selection of data subsets with the herein proposed simple hill climbing procedure performed well when considering either the information content or just simple presence/absence information of genes. We also applied our approach to an empirical data set, addressing questions of vertebrate systematics. With this empirical data set, selecting a data subset with high information content and supporting a tree with high average bootstrap support was most successful if information content of genes was considered. CONCLUSIONS: Our analyses of simulated and empirical data demonstrate that sparse supermatrices can be reduced on a formal basis, outperforming the usually used simple selections of taxa and genes with high data coverage.


Subject(s)
Algorithms , Classification/methods , Molecular Evolution , Phylogeny , Animals , Cattle , Computer Simulation , DNA Repair/genetics , Humans , Likelihood Functions , Mice , Probability , Random Allocation , Rats , Signal Transduction/genetics , Software , Support Vector Machine , Sus scrofa
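
The abstract above describes reducing a sparse supermatrix by simple hill climbing. The Python sketch below illustrates that general idea on a presence/absence matrix only. It is not the mare algorithm: the objective used here (fraction of filled cells), the stopping target, and the names `density` and `reduce_matrix` are all assumptions, and mare's information-content measure is not reproduced.

```python
def density(matrix):
    """Fraction of cells equal to 1 in a taxa x genes presence/absence matrix."""
    cells = [value for row in matrix for value in row]
    return sum(cells) / len(cells)


def reduce_matrix(matrix, taxa, genes, target=0.8, min_taxa=2, min_genes=2):
    """Greedy hill climbing: drop the taxon or gene whose removal raises density most.

    Stops when the target density is reached, when no single removal improves
    the density (local optimum), or when the size limits are hit.
    """
    matrix = [row[:] for row in matrix]
    taxa, genes = list(taxa), list(genes)
    while density(matrix) < target:
        candidates = []
        if len(taxa) > min_taxa:
            for i in range(len(taxa)):
                candidates.append((density(matrix[:i] + matrix[i + 1:]), "taxon", i))
        if len(genes) > min_genes:
            for j in range(len(genes)):
                reduced = [row[:j] + row[j + 1:] for row in matrix]
                candidates.append((density(reduced), "gene", j))
        if not candidates:
            break
        best_density, kind, index = max(candidates)
        if best_density <= density(matrix):
            break  # no neighbouring matrix is better
        if kind == "taxon":
            del matrix[index]
            del taxa[index]
        else:
            for row in matrix:
                del row[index]
            del genes[index]
    return matrix, taxa, genes


if __name__ == "__main__":
    example = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
    print(reduce_matrix(example, ["A", "B", "C", "D"], ["g1", "g2", "g3"]))
```

A coverage-only objective like this one is exactly what the abstract argues is insufficient on its own; replacing `density` with a score that weights each gene by an estimate of its signal would bring the sketch closer to the approach described.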
7.
NAR Genom Bioinform ; 4(3): lqac064, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36128424

ABSTRACT

Confidence measures of branch reliability play an important role in phylogenetics, as these measures allow the identification of trees or parts of a tree that are well supported by the data and thus adequate to serve as a basis for evolutionary inference of biological systems. Unreliable branch relationships in phylogenetic analyses are of concern because of their potential to represent incorrect relationships of interest among more reliable branch relationships. The site-concordance factor implemented in the IQ-TREE package is a recently introduced heuristic solution to the problem of identifying unreliable branch relationships on the basis of quartets. We test the performance of the site-concordance measure with simple examples based on simulated data and designed to study its behaviour in branch support estimates related to different degrees of branch length heterogeneity in a ten-sequence tree. Our results show that, in particular in cases of relationships with heterogeneous branch lengths, site-concordance measures may be misleading. We therefore argue that the maximum parsimony optimality criterion currently used by the site-concordance measure may sometimes be poorly suited to evaluate branch support and that the scores reported by the site-concordance factor should not be considered as reliable.
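
The quartet-based, parsimony-style counting behind a site-concordance measure can be illustrated for a single quartet. The Python sketch below counts the sites that support each of the three possible splits of four sequences and reports the share supporting one of them. It is a simplified illustration, not IQ-TREE's implementation (which averages over many sampled quartets around a branch); the function names and the restriction to unambiguous A, C, G, T are assumptions.

```python
def quartet_support(a, b, c, d):
    """Count two-state site patterns supporting each split of the quartet (a, b, c, d)."""
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    valid = set("ACGT")
    for w, x, y, z in zip(a, b, c, d):
        if not {w, x, y, z} <= valid:
            continue  # skip gaps and ambiguity codes
        if w == x and y == z and w != y:
            counts["AB|CD"] += 1  # pattern xxyy
        elif w == y and x == z and w != x:
            counts["AC|BD"] += 1  # pattern xyxy
        elif w == z and x == y and w != x:
            counts["AD|BC"] += 1  # pattern xyyx
    return counts


def site_concordance(counts, split="AB|CD"):
    """Share of decisive sites that support the given split (NaN if there are none)."""
    total = sum(counts.values())
    return counts[split] / total if total else float("nan")


if __name__ == "__main__":
    counts = quartet_support("AAAAAA", "ACCAAA", "CACCAC", "CCACAG")
    print(counts, site_concordance(counts))
```

Note that the counting treats every shared state alike and cannot tell derived from ancestral similarity, which is one reason such scores can mislead when branch lengths are strongly heterogeneous.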

8.
Mol Biol Evol ; 27(11): 2507-21, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20530152

ABSTRACT

The use of secondary structures has been advocated to improve both the alignment and the tree reconstruction processes of ribosomal RNA (rRNA) data sets. We used simulated and empirical rRNA data to test the impact of secondary structure consideration in both steps of molecular phylogenetic analyses. A simulation approach was used to generate realistic rRNA data sets based on real 16S, 18S, and 28S sequences and structures in combination with different branch lengths and topologies. Alignment and tree reconstruction performance of four recent structural alignment methods was compared with exclusively sequence-based approaches. As empirical data, we used a hexapod rRNA data set to study the influence of nucleotide interdependencies in sequence alignment and tree reconstruction. Structural alignment methods delivered significantly better sequence alignments compared with pure sequence-based methods. Also, structural alignment methods delivered better trees judged by topological congruence to simulation base trees. However, the advantage of structural alignments was less pronounced and even vanished in several instances. For simulated data, application of mixed RNA/DNA models to stems and loops, respectively, led to significantly shorter branches. The application of mixed RNA/DNA models in the hexapod analyses delivered partly implausible relationships. This can be interpreted as a stronger sensitivity of mixed model setups to nonphylogenetic signal. Secondary structure consideration clearly influenced sequence alignment and tree reconstruction of ribosomal genes. Although sequence alignment quality can be considerably improved by the use of secondary structure information, the application of mixed models in tree reconstructions needs further studies to understand the observed effects.


Subject(s)
Arthropods/genetics , Computer Simulation , Nucleic Acid Conformation , Phylogeny , Ribosomal RNA/chemistry , Sequence Alignment/methods , Animals , Bayes Theorem , Ribosomal RNA/genetics
9.
Mol Biol Evol ; 27(11): 2451-64, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20534705

ABSTRACT

Arthropods were the first animals to conquer land and air. They encompass more than three quarters of all described living species. This extraordinary evolutionary success is based on an astoundingly wide array of highly adaptive body organizations. A lack of robustly resolved phylogenetic relationships, however, currently impedes the reliable reconstruction of the underlying evolutionary processes. Here, we show that phylogenomic data can substantially advance our understanding of arthropod evolution and resolve several conflicts among existing hypotheses. We assembled a data set of 233 taxa and 775 genes from which an optimally informative data set of 117 taxa and 129 genes was finally selected using new heuristics and compared with the unreduced data set. We included novel expressed sequence tag (EST) data for 11 species and all published phylogenomic data augmented by recently published EST data on taxonomically important arthropod taxa. This thorough sampling reduces the chance of obtaining spurious results due to stochastic effects of undersampling taxa and genes. Orthology prediction of genes, alignment masking tools, and selection of most informative genes due to a balanced taxa-gene ratio using new heuristics were established. Our optimized data set robustly resolves major arthropod relationships. We received strong support for a sister group relationship of onychophorans and euarthropods and strong support for a close association of tardigrades and cycloneuralia. Within pancrustaceans, our analyses yielded paraphyletic crustaceans and monophyletic hexapods and robustly resolved monophyletic endopterygote insects. However, our analyses also showed a remarkable sensitivity to methods of analysis for a few deep splits that were recently thought to be resolved, for example, the position of myriapods.


Subject(s)
Arthropods/classification , Arthropods/genetics , Genomics/methods , Phylogeny , Animals , Bayes Theorem , Expressed Sequence Tags , Likelihood Functions , Species Specificity
10.
Mol Phylogenet Evol ; 56(3): 1115-8, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20416383

ABSTRACT

FASconCAT is user-friendly software that rapidly concatenates different kinds of sequence data into one supermatrix file. Output files are in FASTA, PHYLIP or NEXUS format and are directly loadable in phylogenetic programs like PAUP*, RAxML or MrBayes. FASconCAT can handle FASTA, PHYLIP and CLUSTAL formatted input files in one single run. It provides useful information about each input file and the concatenated supermatrix. For example, the program provides the range information of each concatenated gene (partition) and delivers a check list of all concatenated sequences (taxa). Information about the base composition of single input files and the resulting supermatrix is supplied for nucleotide data. For given structure strings (e.g. secondary structures) it displays single unpaired (loop) and paired (stem) positions after the concatenation process. Optionally, FASconCAT generates NEXUS files of concatenated sequences, either with MrBayes commands directly executable in PAUP* and MrBayes, or without any specific commands. If desired, FASconCAT writes output files in PHYLIP format with relaxed (unlimited characters) or restricted taxon names (up to ten characters), while sequences are printed in non-interleaved format. FASconCAT is implemented in Perl and freely available from http://software.zfmk.de. It runs on UNIX and MS Windows operating systems.


Subject(s)
DNA Sequence Analysis/methods , Protein Sequence Analysis/methods , Software , Phylogeny
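
The concatenation step described in the FASconCAT abstract, joining per-gene alignments into one supermatrix and reporting the range of each partition, can be sketched in Python as follows. This is an illustration only, not FASconCAT's Perl code; the function name `concatenate`, the nested-dictionary input, and the padding of taxa absent from a gene with `-` are assumptions.

```python
def concatenate(alignments, missing="-"):
    """Concatenate per-gene alignments into a supermatrix.

    alignments: dict mapping gene name -> dict of taxon -> aligned sequence
    (hypothetical input structure). Returns (supermatrix, partitions), where
    partitions lists (gene, start, end) with 1-based inclusive positions.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene!r} differ in length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa absent from this gene are padded with the missing symbol.
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


if __name__ == "__main__":
    genes = {
        "g1": {"A": "ACGT", "B": "ACGA"},
        "g2": {"A": "TT", "C": "TA"},
    }
    print(concatenate(genes))
```

The partition ranges returned here correspond to the kind of per-gene range information that partitioned analyses in programs such as RAxML or MrBayes require.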
11.
Front Zool ; 7: 10, 2010 Mar 31.
Article in English | MEDLINE | ID: mdl-20356385

ABSTRACT

BACKGROUND: Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold whose choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. RESULTS: ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. CONCLUSIONS: Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood-based models of sequence evolution, which opens the possibility of further improvements.

12.
Mol Phylogenet Evol ; 53(3): 758-71, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19654047

ABSTRACT

Secondary structure models of mitochondrial and nuclear (r)RNA sequences are frequently applied to aid the alignment of these molecules in phylogenetic analyses. Additionally, it is often speculated that structure variation of (r)RNA sequences might profitably be used as phylogenetic markers. The benefit of these approaches depends on the reliability of structure models. We used a recently developed approach to show that reliable inference of large (r)RNA secondary structures as a prerequisite of simultaneous sequence and structure alignment is feasible. The approach iteratively establishes local structure constraints of each sequence and infers fully folded individual structures by constrained MFE optimization. A comparison of structure edit distances of individual constraints and fully folded structures showed pronounced phylogenetic signal in fully folded structures. As model sequences we characterized secondary structures of 28S rRNA sequences of selected insects and examined their phylogenetic signal according to established phylogenetic hypotheses.


Subject(s)
Nucleic Acid Conformation , Phylogeny , 28S Ribosomal RNA/genetics , Animals , rRNA Genes , Insects/genetics , Sequence Alignment , RNA Sequence Analysis
13.
PLoS One ; 12(8): e0183393, 2017.
Article in English | MEDLINE | ID: mdl-28841676

ABSTRACT

Systematic biases such as long branch attraction can mislead commonly relied upon model-based (i.e. maximum likelihood and Bayesian) phylogenetic methods when, as is usually the case with empirical data, there is model misspecification. We present PhyQuart, a new method for evaluating the three possible binary trees for any quartet of taxa. PhyQuart was developed through a process of reciprocal illumination between a priori considerations and the results of extensive simulations. It is based on identification of site-patterns that can be considered to support a particular quartet tree, taking into account the Hennigian distinction between apomorphic and plesiomorphic similarity, and employing corrections to the raw observed frequencies of site-patterns that exploit expectations from maximum likelihood estimation. We demonstrate through extensive simulation experiments that, whereas maximum likelihood estimation performs well in many cases, it can be outperformed by PhyQuart in cases where it fails due to extreme branch length asymmetries producing long-branch attraction artefacts where there is only very minor model misspecification.


Subject(s)
Likelihood Functions , Phylogeny , Sequence Analysis , Algorithms , Theoretical Models
14.
PLoS One ; 7(11): e49119, 2012.
Article in English | MEDLINE | ID: mdl-23152859

ABSTRACT

The amplified fragment length polymorphisms (AFLP) method has become an attractive tool in phylogenetics due to the ease with which large numbers of characters can be generated. In contrast to sequence-based phylogenetic approaches, AFLP data consist of anonymous multilocus markers. However, potential artificial amplifications or amplification failures of fragments contained in the AFLP data set will reduce AFLP reliability especially in phylogenetic inferences. In the present study, we introduce a new automated scoring approach, called "AMARE" (AFLP MAtrix REduction). The approach is based on replicates and makes marker selection dependent on marker reproducibility to control for scoring errors. To demonstrate the effectiveness of our approach we record error rate estimations, resolution scores, PCoA and stemminess calculations. As in general the true tree (i.e. the species phylogeny) is not known, we tested AMARE with empirical, already published AFLP data sets, and compared tree topologies of different AMARE generated character matrices to existing phylogenetic trees and/or other independent sources such as morphological and geographical data. It turns out that the selection of masked character matrices with highest resolution scores gave similar or even better phylogenetic results than the original AFLP data sets.


Subject(s)
Amplified Fragment Length Polymorphism Analysis/methods , Amplified Fragment Length Polymorphism Analysis/standards , Phylogeny , Algorithms , Animals , Anura/classification , Anura/genetics , Automation , Caniformia/classification , Caniformia/genetics , Genetic Databases , Genetic Markers , Ipomoea/classification , Ipomoea/genetics , Lamiaceae/classification , Lamiaceae/genetics , Reproducibility of Results
15.
PLoS One ; 7(5): e36593, 2012.
Article in English | MEDLINE | ID: mdl-22662120

ABSTRACT

The aim of our study was to test the robustness and efficiency of maximum likelihood with respect to different long branch effects on multiple-taxon trees. We simulated data of different alignment lengths under two different 11-taxon trees and a broad range of different branch length conditions. The data were analyzed with the true model parameters as well as with estimated and incorrect assumptions about among-site rate variation. If length differences between connected branches strongly increase, tree inference with the correct likelihood model assumptions can fail. We found that incorporating invariant sites together with Γ distributed site rates in the tree reconstruction (Γ+I) increases the robustness of maximum likelihood in comparison with models using only Γ. The results show that for some topologies and branch lengths the reconstruction success of maximum likelihood under the correct model is still low for alignments with a length of 100,000 base positions. Altogether, the high confidence that is put in maximum likelihood trees is not always justified under certain tree shapes even if alignment lengths reach 100,000 base positions.


Subject(s)
Computer Simulation , Genetic Models , Phylogeny , Likelihood Functions
16.
PLoS One ; 6(6): e21031, 2011.
Article in English | MEDLINE | ID: mdl-21731644

ABSTRACT

Martialinae are pale, eyeless and probably hypogaeic predatory ants. Morphological character sets suggest a close relationship to the ant subfamily Leptanillinae. Recent analyses based on molecular sequence data suggest that Martialinae are the sister group to all extant ants. However, when molecular studies and different reconstruction methods are compared, the position of Martialinae remains ambiguous. While this sister group relationship was well supported by Bayesian partitioned analyses, Maximum Likelihood approaches could not unequivocally resolve the position of Martialinae. By re-analysing a previously published molecular data set, we show that the Maximum Likelihood approach is highly appropriate to resolve deep ant relationships, especially between Leptanillinae, Martialinae and the remaining ant subfamilies. Based on improved alignments, alignment masking, and tree reconstructions with a sufficient number of bootstrap replicates, our results strongly reject a placement of Martialinae at the first split within the ant tree of life. Instead, we suggest that Leptanillinae are a sister group to all other extant ant subfamilies, whereas Martialinae branch off as a second lineage. This assumption is backed by approximately unbiased (AU) tests, additional Bayesian analyses and split networks. Our results demonstrate clear effects of improved alignment approaches, alignment masking and data partitioning. We hope that our study illustrates the importance of thorough, comprehensible phylogenetic analyses using the example of ant relationships.


Subject(s)
Ants/genetics , Phylogeny , Animals , Likelihood Functions , Reproducibility of Results , Sequence Alignment