1.
Nucleic Acids Res ; 40(3): e17, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22121222

ABSTRACT

We introduce the software tool NTRFinder to search for a complex repetitive structure in DNA we call a nested tandem repeat (NTR). An NTR is a recurrence of two or more distinct tandem motifs interspersed with each other. We propose that NTRs can be used as phylogenetic and population markers. We have tested our algorithm on both real and simulated data, and present some real NTRs of interest. NTRFinder can be downloaded from http://www.maths.otago.ac.nz/~aamatroud/.


Subjects
Software, Tandem Repeat Sequences, Algorithms, Human Y Chromosomes, Humans, DNA Sequence Analysis
2.
Bioinformatics ; 20 Suppl 1: i348-54, 2004 Aug 04.
Article in English | MEDLINE | ID: mdl-15262819

ABSTRACT

Maximum likelihood (ML) for phylogenetic inference from sequence data remains a method of choice, but has computational limitations. In particular, it cannot be applied for a global search through all potential trees when the number of taxa is large, and hence a heuristic restriction in the search space is required. In this paper, we derive a quadratic approximation, QAML, to the likelihood function whose maximum is easily determined for a given tree. The derivation depends on Hadamard conjugation, and hence is limited to the simple symmetric models of Kimura and of Jukes and Cantor. Preliminary testing has demonstrated the accuracy of QAML is close to that of ML.


Subjects
Algorithms, Chromosome Mapping/methods, Molecular Evolution, Genetic Models, Phylogeny, DNA Sequence Analysis/methods, Computer Simulation, Likelihood Functions, Statistical Models
3.
FEBS Lett ; 301(2): 127-31, 1992 Apr 20.
Article in English | MEDLINE | ID: mdl-1568469

ABSTRACT

Controversy exists over the origins of photosynthetic organelles in that contradictory trees arise from different sequence, biochemical and ultrastructural data sets. We propose a testable hypothesis which explains this inconsistency as a result of the differing GC contents of sequences. We report that current methods of tree reconstruction tend to group sequences with similar GC contents irrespective of whether the similar GC content is due to common ancestry or is independently acquired. Nuclear encoded sequences (high GC) give different trees from chloroplast encoded sequences (low GC). We find that current data is consistent with the hypothesis of multiple origins for photosynthetic organelles and single origins for each type of light harvesting complex.


Subjects
Biological Evolution, Chloroplasts, Base Composition, Chloroplasts/chemistry, Chloroplasts/metabolism
4.
J Comput Biol ; 3(1): 19-31, 1996.
Article in English | MEDLINE | ID: mdl-8697236

ABSTRACT

For various models of sequence evolution, the set of linear functions of the frequencies of the nucleotide patterns forms a vector space, the invariant space. Here we distinguish between the model of nucleotide substitution, and the phylogenetic tree T describing the paths on which these changes occur. We describe a procedure to construct a basis of the invariant space for those models that are extensions of models incorporating Kimura's three substitution model of nucleotide change, including both the Jukes-Cantor and Cavender-Farris models. The dimension of the invariant space is determined, for those models where it is independent of the tree topology, as a function of the number of sequences. These are calculated where the nucleotide distribution at the root is unspecified, and both with, and without, the assumption of the molecular clock hypothesis. The invariants have a number of potential applications, including tree identification, and testing the fit of models (which could include the molecular clock) to sequence data.


Subjects
Biological Evolution, Genetic Models, Computer Simulation, Mutation
5.
J Comput Biol ; 1(2): 133-51, 1994.
Article in English | MEDLINE | ID: mdl-8790460

ABSTRACT

Simulations were used to study the performance of several character-based and distance-based phylogenetic methods in obtaining the correct tree from pseudo-randomly generated input data. The study included all the topologies of unrooted binary trees with 4 to 10 pendant vertices (taxa) inclusive. The length of the character sequences ranged from 10 to 10^5 characters, increasing exponentially. The methods studied include Closest Tree, Compatibility, Li's method, Maximum Parsimony, Neighbor-joining, Neighborliness, and UPGMA. We also provide a modification to Li's method (SimpLi) which is consistent with additive data. We give estimations of the sequence lengths required for given confidence in the output of these methods under the assumptions of molecular evolution used in this study. A notation for characterizing all tree topologies is described. We show that when the number of taxa, the maximum path length, and the minimum edge length are held constant, there is a small but significant dependence of the performance of the methods on the tree topology. We show that those methods that are consistent with the model used perform similarly, whereas the inconsistent methods, UPGMA and Li's method, perform very poorly.


Subjects
Base Sequence, Computer Simulation, Biological Models, Phylogeny, Sequence Alignment/methods, DNA/classification, RNA/classification, Reproducibility of Results
6.
J Comput Biol ; 1(2): 153-63, 1994.
Article in English | MEDLINE | ID: mdl-8790461

ABSTRACT

For a sequence of colors independently evolving on a tree under a simple Markov model, we consider conditions under which the tree can be uniquely recovered from the "sequence spectrum", that is, the expected frequencies of the various leaf colorations. This is relevant for phylogenetic analysis (where colors represent nucleotides or amino acids; leaves represent extant taxa) as the sequence spectrum is estimated directly from a collection of aligned sequences. Allowing the rate of the evolutionary process to vary across sites is an important extension over most previous studies: we show that, given suitable restrictions on the rate distribution, the true tree (up to the placement of its root) is uniquely identified by its sequence spectrum. However, if the rate distribution is unknown and arbitrary, then, for simple models, it is possible for every tree to produce the same sequence spectrum. Hence there is a logical barrier to accurate, consistent phylogenetic inference for these models when assumptions about the rate distribution are not made. This result exploits a novel theorem on the action of polynomials with non-negative coefficients on sequences.


Subjects
Biological Models, Phylogeny, Sequence Analysis, Markov Chains, Reproducibility of Results, Sequence Alignment
7.
Proc Biol Sci ; 267(1447): 1041-7, 2000 May 22.
Article in English | MEDLINE | ID: mdl-10874755

ABSTRACT

Molecular dates consistently place the divergence of major metazoan lineages in the Precambrian, leading to the suggestion that the 'Cambrian explosion' is an artefact of preservation which left earlier forms unrecorded in the fossil record. While criticisms of molecular analyses for failing to deal with variation in the rate of molecular evolution adequately have been countered by analyses which allow both site-to-site and lineage-specific rate variation, no analysis to date has allowed the rates to vary temporally. If the rates of molecular evolution were much higher early in the metazoan radiation, molecular dates could consistently overestimate the divergence times of lineages. Here, we use a new method which uses multiple calibration dates and an empirically determined range of possible substitution rates to place bounds on the basal date of divergence of lineages in order to ask whether faster rates of molecular evolution early in the metazoan radiation could possibly account for the discrepancy between molecular and palaeontological date estimates. We find that allowing basal (interphylum) lineages the fastest observed substitution rate brings the minimum possible divergence date (586 million years ago) to the Vendian period, just before the first multicellular animal fossils, but excludes divergence of the major metazoan lineages in a Cambrian explosion.


Subjects
Biological Evolution, Animals, Fossils, Biological Models
8.
Cladistics ; 1(3): 266-278, 1985 Jun.
Article in English | MEDLINE | ID: mdl-34965674

ABSTRACT

Evaluating the reliability of methods for reconstructing evolutionary trees is discussed under four headings: evaluating criteria for an optimal tree, finding the optimal tree for the criterion selected, detecting reliable and unreliable data, and estimating the error range for the final tree. It is shown with five data sets (protein sequences) that, in general, the minimal tree is a better estimate of phylogeny than a longer tree. However, for each data set, the minimal tree was no longer the shortest when the sequences were combined. An objective weighting of columns (characters) can lead to an improved tree by giving less weight to columns that are closer to a random order. The weighting of characters is derived from the ratio of the observed to expected number of incompatibilities for each column. Several forms of weighting give better trees as measured by both the increase in correlation between lengths of trees with different subsets of data, and by an increase in the similarity between minimal trees found with disjoint subsets of data. Increasing the size of randomly selected subsets, and measuring the increased similarity of the results, can lead to an estimate of the minimum number of trees that need to be considered as possibly the correct historical tree. A measure of the 'treeness' of the data is described that estimates the extent to which a binary tree is a good description of the data.

9.
Comput Appl Biosci ; 3(3): 183-7, 1987 Sep.
Article in English | MEDLINE | ID: mdl-3453227

ABSTRACT

A branch and bound algorithm is described for searching rapidly for minimal length trees from biological data. The algorithm adds characters one at a time, rather than adding taxa, as in previous branch and bound methods. The algorithm has been programmed and is available from the authors. A worked example is given with 33 characters and 15 taxa. About 8 × 10^12 binary trees are possible with 15 taxa, but the branch and bound program finds the minimal tree in less than 5 min on an IBM PC.


Subjects
Algorithms, Phylogeny, Animals, Classification/methods, Humans, Software Design
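
The tree count quoted in the abstract of entry 9 can be checked with the standard formula for the number of unrooted binary trees on n labelled taxa, (2n-5)!!. The sketch below is only an illustration of that count; the function name is hypothetical, and it assumes the abstract is counting unrooted, fully resolved topologies.

```python
def unrooted_binary_trees(n_taxa: int) -> int:
    """Count unrooted binary tree topologies on n labelled taxa: (2n-5)!!.

    Assumption of this sketch: "binary trees" in the abstract means
    unrooted, fully resolved trees with labelled leaves.
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


if __name__ == "__main__":
    # 15 taxa give 7905853580625 trees, i.e. about 8 x 10^12.
    print(unrooted_binary_trees(15))
```

An exhaustive search over this many trees is impractical, which is why a bounding strategy such as the one described in the abstract is needed.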
10.
J Mol Evol ; 50(3): 296-301, 2000 Mar.
Article in English | MEDLINE | ID: mdl-10754073

ABSTRACT

One of the most useful features of molecular phylogenetic analyses is the potential for estimating dates of divergence of evolutionary lineages from the DNA of extant species. But lineage-specific variation in rate of molecular evolution complicates molecular dating, because a calibration rate estimated from one lineage may not be an accurate representation of the rate in other lineages. Many molecular dating studies use a "clock test" to identify and exclude sequences that vary in rate between lineages. However, these clock tests should not be relied upon without a critical examination of their effectiveness at removing rate-variable sequences from any given data set, particularly with regard to the sequence length and number of variable sites. As an illustration of this problem we present a power test of a frequently employed triplet relative rates test. We conclude that (1) relative rates tests are unlikely to detect moderate levels of lineage-specific rate variation (where one lineage has a rate of molecular evolution 1.5 to 4.0 times the other) for most commonly used sequences in molecular dating analyses, and (2) this lack of power is likely to result in substantial error in the estimation of dates of divergence. As an example, we show that the well-studied rate difference between murid rodents and great apes will not be detected for many of the sequences used to date the divergence between these two lineages and that this failure to detect rate variation is likely to result in consistent overestimation of the date of the rodent-primate split.


Subjects
Molecular Evolution, Genetic Variation
11.
Proc Natl Acad Sci U S A ; 91(8): 3339-43, 1994 Apr 12.
Article in English | MEDLINE | ID: mdl-8159749

ABSTRACT

Discrete Fourier transformations have recently been developed to model the evolution of two-state characters (the Cavender/Farris model). We report here the extension of these transformations to provide invertible relationships between a phylogenetic tree T (with three probability parameters of nucleotide substitution on each edge corresponding to Kimura's 3ST model) and the expected frequencies of the nucleotide patterns in the sequences. We refer to these relationships as spectral analysis. In either model with independent and identically distributed site substitutions, spectral analysis allows a global correction for all multiple substitutions (second- and higher-order interactions), independent of any particular tree. From these corrected data we use a least-squares selection procedure, the closest tree algorithm, to infer an evolutionary tree. Other selection criteria such as parsimony or compatibility analysis could also be used; each of these criteria will be statistically consistent for these models. The closest tree algorithm selects a unique best-fit phylogenetic tree together with independent edge length parameters for each edge. The method is illustrated with an analysis of some primate hemoglobin sequences.


Subjects
Phylogeny, Pseudogenes, Sequence Analysis/methods, Animals, DNA/genetics, Fourier Analysis, Globins/genetics, Humans, Mutation, Primates
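
The invertible relationship described in the abstract of entry 11 can be illustrated for the simpler two-state (Cavender/Farris) case, where the transform is commonly written with a Hadamard matrix H as s = H^-1 exp(H q) and q = H^-1 ln(H s). The sketch below is a minimal illustration under stated assumptions, not the Kimura 3ST extension reported in the paper: function names are hypothetical, splits are indexed by subsets of all taxa except the last, and q[0] is set to minus the sum of the edge lengths.

```python
import math


def fwht(vec):
    """Multiply vec by the Sylvester Hadamard matrix (length must be a power of two)."""
    out = list(vec)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def edge_to_pattern_spectrum(q):
    """s = H^-1 exp(H q): expected site-pattern frequencies from edge lengths."""
    r = [math.exp(x) for x in fwht(q)]
    return [x / len(q) for x in fwht(r)]


def pattern_to_edge_spectrum(s):
    """q = H^-1 ln(H s): edge lengths recovered from pattern frequencies."""
    rho = [math.log(x) for x in fwht(s)]
    return [x / len(s) for x in fwht(rho)]


if __name__ == "__main__":
    # Three taxa; indices 1, 2, 3 hold the three pendant edge lengths
    # (illustrative values), and q[0] is minus their sum.
    q = [-0.6, 0.1, 0.2, 0.3]
    s = edge_to_pattern_spectrum(q)
    print(s)                            # pattern frequencies, summing to 1
    print(pattern_to_edge_spectrum(s))  # recovers q up to rounding
```

Because the inverse transform is applied to the observed pattern frequencies before any tree is chosen, the correction for multiple substitutions does not depend on a particular tree, which is the property the abstract emphasises.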
12.
J Mol Evol ; 13(2): 127-49, 1979 Jul 18.
Article in English | MEDLINE | ID: mdl-480370

ABSTRACT

The problem of determining the minimal phylogenetic tree is discussed in relation to graph theory. It is shown that this problem is an example of the Steiner problem in graphs, which is to connect a set of points by a minimal length network where new points can be added. There is no reported method of solving realistically-sized Steiner problems in reasonable computing time. A heuristic method of approaching the phylogenetic problem is presented, together with a worked example with 7 mammalian cytochrome c sequences. It is shown in this case that the method develops a phylogenetic tree that has the smallest possible number of amino acid replacements. The potential and limitations of the method are discussed. It is stressed that objective methods must be used for comparing different trees. In particular it should be determined how close a given tree is to a mathematically determined lower bound. A theorem is proved which is used to establish a lower bound on the length of any tree; if a tree is found with a length equal to the lower bound, then no shorter tree can exist.


Subjects
Phylogeny, Mathematics, Biological Models
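
The role of a lower bound in entry 12 can be illustrated with the most elementary bound on tree length: a character that shows k distinct states across the taxa needs at least k - 1 changes on any tree. The sketch below implements only this elementary bound, which is not claimed to be the theorem proved in the paper; the function name and the toy alignment are hypothetical.

```python
def simple_length_lower_bound(alignment):
    """Sum over columns of (number of distinct states - 1).

    Any tree connecting the sequences must have at least this many changes.
    This is an elementary bound, assumed here for illustration only.
    """
    if not alignment:
        return 0
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all sequences must have the same length")
    return sum(len(set(column)) - 1 for column in zip(*alignment))


if __name__ == "__main__":
    toy = ["AAG", "ACG", "ACT", "TCT"]  # made-up sequences
    print(simple_length_lower_bound(toy))
```

If a candidate tree has a length equal to such a bound, the tree is provably minimal, which is the argument the abstract describes.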
13.
Biochem J ; 187(1): 65-74, 1980 Apr 01.
Article in English | MEDLINE | ID: mdl-6773522

ABSTRACT

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127-150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151-166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.


Subjects
Hemoglobins, Phylogeny, Amino Acid Sequence, Animals, Base Sequence, Dogs, Haplorhini, Humans, Mammals, Methods, Mice, Biological Models, Nucleotides, Rabbits
14.
Mol Phylogenet Evol ; 19(1): 1-8, 2001 Apr.
Article in English | MEDLINE | ID: mdl-11286486

ABSTRACT

The groupings of taxa in a phylogenetic tree cannot represent all the conflicting signals that usually occur among site patterns in aligned homologous genetic sequences. Hence a tree-building program must compromise by reporting a subset of the patterns, using some discriminatory criterion. Thus, in the worst case, out of possibly a large number of equally good trees, only an arbitrarily chosen tree might be reported by the tree-building program as "The Tree." This tree might then be used as a basis for phylogenetic conclusions. One strategy to represent conflicting patterns in the data is to construct a network. The Buneman graph is a theoretically very attractive example of such a network. In particular, a characterization for when this network will be a tree is known. Also the Buneman graph contains each of the most parsimonious trees indicated by the data. In this paper we describe a new method for constructing the Buneman graph that can be used for a generalization of Hadamard conjugation to networks. This new method differs from previous methods by allowing us to focus on local regions of the graph without having to first construct the full graph. The construction is illustrated by an example.


Subjects
Algorithms, Phylogeny, Mitochondrial DNA/genetics, Humans
15.
Syst Biol ; 52(2): 229-38, 2003 Apr.
Article in English | MEDLINE | ID: mdl-12746148

ABSTRACT

We conducted a simulation study of the phylogenetic methods UPGMA, neighbor joining, maximum parsimony, and maximum likelihood for a five-taxon tree under a molecular clock. The parameter space included a small region where maximum parsimony is inconsistent, so we tested inconsistency correction for parsimony and distance correction for neighbor joining. As expected, corrected parsimony was consistent. For these data, maximum likelihood with the clock assumption outperformed each of the other methods tested. The distance-based methods performed marginally better than did maximum parsimony and maximum likelihood without the clock assumption. Data correction was generally detrimental to accuracy, especially for short sequence lengths. We identified another region of the parameter space where, although consistent for a given method, some incorrect trees were each selected with up to twice the frequency of the correct (generating) tree for sequences of bounded length. These incorrect trees are those where the outgroup has been incorrectly placed. In addition to this problem, the placement of the outgroup sequence can have a confounding effect on the ingroup tree, whereby the ingroup is correct when using the ingroup sequences alone, but with the inclusion of the outgroup the ingroup tree becomes incorrect.


Subjects
Computer Simulation, Genetic Models, Phylogeny, Cluster Analysis, Molecular Evolution, Likelihood Functions, Statistical Models, Research Design
16.
J Mol Evol ; 13(2): 151-66, 1979 Jul 18.
Article in English | MEDLINE | ID: mdl-225499

ABSTRACT

We have recently described a method of building phylogenetic trees and have outlined an approach for proving whether a particular tree is optimal for the data used. In this paper we describe in detail the method of establishing lower bounds on the length of a minimal tree by partitioning the data set into subsets. All characters that could be involved in duplications in the data are paired with all other such characters. A matching algorithm is then used to obtain the pairing of characters that reveals the most duplications in the data. This matching may still not account for all nucleotide substitutions on the tree. The structure of the tree is then used to help select subsets of three or more characters until the lower bound found by partitioning is equal to the length of the tree. The tree must then be a minimal tree since no tree can exist with a length less than that of the lower bound. The method is demonstrated using a set of 23 vertebrate cytochrome c sequences with the criterion of minimizing the total number of nucleotide substitutions. There are 13,113,070,457,687,988,603,440,625 topologically distinct trees that can be constructed from this data set. The method described in this paper does identify 144 minimal tree variants. The method is general in the sense that it can be used for other data and other criteria of length. It may not always be possible to prove a tree minimal, but the method will give an upper and a lower bound on the length of minimal trees.


Subjects
Cytochrome c Group, Phylogeny, Vertebrates, Amino Acid Sequence, Amino Acids, Animals, Biological Evolution, Genetic Code, Mathematics, Biological Models
17.
J Theor Biol ; 140(3): 289-303, 1989 Oct 09.
Article in English | MEDLINE | ID: mdl-2615399

ABSTRACT

The study of phylogeny is becoming increasingly scientific in that hypotheses can be tested quantitatively. We report a method of estimating the probabilities of obtaining a tree of a given length from nucleic acid sequence data. The method is applied to the hypothesis of Hoyle & Wickramasinghe that the earth is being continually bombarded by influenza (and other) viruses which originate from comets. A quantitative analysis of sequences from the H1 strain of human influenza viruses contradicts three versions of the Hoyle-Wickramasinghe model. One non-evolutionary version of their model has less than one chance in 10^66 of being correct. A version that allowed extraterrestrial evolution has less than one chance in 10^6 of being correct. The sequence data is in agreement with the biological (evolutionary) model. The results are discussed from the aspect of the falsifiability of evolutionary theory.


Subjects
Biological Evolution, Viral DNA/genetics, Geology, Genetic Models, Orthomyxoviridae/genetics, Base Sequence, Extraterrestrial Environment, Geological Phenomena, Molecular Sequence Data, Origin of Life, Phylogeny, Probability
18.
Trends Ecol Evol ; 7(3): 73-9, 1992 Mar.
Article in English | MEDLINE | ID: mdl-21235960

ABSTRACT

Evolutionists dream of a tree-reconstruction method that is efficient (fast), powerful, consistent, robust and falsifiable. These criteria are at present conflicting in that the fastest methods are weak (in their use of information in the sequences) and inconsistent (even with very long sequences they may lead to an incorrect tree). But there has been exciting progress in new approaches to tree inference, in understanding general properties of methods, and in developing ideas for estimating the reliability of trees. New phylogenetic invariant methods allow selected parameters of the underlying model to be estimated directly from sequences. There is still a need for more theoretical understanding and assistance in applying what is already known.

19.
Mol Phylogenet Evol ; 2(1): 6-12, 1993 Mar.
Article in English | MEDLINE | ID: mdl-8081548

ABSTRACT

A class of phylogenetic clustering methods which calculate net divergences from distance data, but assign differing weights to the net divergences, is defined. The class includes the Neighbor-Joining Method and the Unweighted Pair-Group Method with Arithmetic Mean. The accuracy of some of these methods is studied by computer simulation for the case of four taxa under the additive tree hypothesis. Of these methods and under this hypothesis, it is proved that Neighbor-Joining uses the only weighting for net divergence which is consistent, so that it is the only method in the class which is expected to converge to the correct tree as more data are added. Neighbor-Joining is then compared with Closest Tree on Distances for five taxa by simulation. It is proved that Closest Tree on Distances is equivalent to Neighbor-Joining for four taxa, though it is not when more than four taxa are considered.


Subjects
Cluster Analysis, Computer Simulation, Genetic Models, Phylogeny, Gene Frequency, Poisson Distribution
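
The role of net divergences in entry 19 can be illustrated on four taxa. The sketch below uses the standard Neighbor-Joining selection criterion, (n-2)·d(i,j) - r_i - r_j with r_i the net divergence of taxon i, and compares it with simply joining the closest pair. Treating the closest-pair rule as the variant that ignores net divergences is an assumption of this sketch, as are the function names and the example distances.

```python
from itertools import combinations


def net_divergences(dist):
    """Net divergence r_i: sum of distances from taxon i to all other taxa."""
    return [sum(row) for row in dist]


def nj_pair(dist):
    """Pair chosen by the Neighbor-Joining criterion (n-2)*d_ij - r_i - r_j."""
    n = len(dist)
    r = net_divergences(dist)
    return min(
        combinations(range(n), 2),
        key=lambda p: (n - 2) * dist[p[0]][p[1]] - r[p[0]] - r[p[1]],
    )


def closest_pair(dist):
    """Pair with the smallest raw distance (no use of net divergences)."""
    return min(combinations(range(len(dist)), 2), key=lambda p: dist[p[0]][p[1]])


if __name__ == "__main__":
    # Additive distances for the tree ((A,B),(C,D)) with pendant edges
    # A=1, B=5, C=1, D=5 and an internal edge of length 1 (made-up values).
    dist = [
        [0, 6, 3, 7],
        [6, 0, 7, 11],
        [3, 7, 0, 6],
        [7, 11, 6, 0],
    ]
    print(nj_pair(dist))       # (0, 1): A and B, a true pair of neighbours
    print(closest_pair(dist))  # (0, 2): A and C, not neighbours on the tree
```

With unequal pendant edge lengths the closest pair is not a pair of neighbours, whereas the net-divergence correction recovers one, consistent with the consistency result stated in the abstract.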
20.
Photosynth Res ; 37(1): 61-8, 1993 Jul.
Article in English | MEDLINE | ID: mdl-24317654

ABSTRACT

We examine the issue of prochlorophyte origins and provide analyses which highlight the limitations of inferring evolutionary trees from anciently diverged sequences that have markedly different GC contents. Under these conditions we have found that current tree reconstruction methods strongly group together sequences with similar GC contents, whether or not the sequences share a common ancestor. We provide 3' psbA termini sequence for Prochloron didemni and find it does not have the 7 amino acid deletion that occurs in Chl a/b chloroplasts and Prochlorothrix hollandica. This is consistent with the recent findings of a Chl c-like pigment in the light harvesting system in other prochlorophytes but apparently absent in P. hollandica. From these observations we suggest that P. hollandica is the prochlorophyte most closely related to Chl a/b containing chloroplasts and hence the most appropriate prokaryotic model for higher plant Chl a/b photosynthesis.
