ABSTRACT
Results from phylogenetic analyses that study the evolution of species according to their biological characteristics are frequently structured as phylogenetic trees. One of the most widely used methods for reconstructing them is the distance-based method known as the neighbor-joining (NJ) algorithm. It is known that the NJ algorithm can produce different phylogenetic trees depending on the order of the taxa in the input matrix of evolutionary distances, because the method only yields bifurcating branches or dichotomies. Consequently, results and conclusions published in articles that only calculate one of the possible dichotomic phylogenetic trees are biased to some degree. We have generalized the formulas used in the NJ algorithm to cope with multifurcating branches or polytomies, and we have called this new variant of the method the multifurcating neighbor-joining (MFNJ) algorithm. Instead of the dichotomic phylogenetic trees reconstructed by the NJ algorithm, the MFNJ algorithm produces polytomic phylogenetic trees. The main advantage of using the MFNJ algorithm is that only one phylogenetic tree can be obtained, which makes the experimental section of any study completely reproducible and unaffected by external issues such as the input order of taxa.
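The order dependence described above arises when several pairs of taxa tie for the minimum of the standard NJ selection criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The following is a minimal sketch, not the MFNJ formulas themselves (which the abstract does not give): it only computes Q and lists the tied pairs. The function names and the example matrix are illustrative assumptions.

```python
from itertools import combinations


def nj_q_matrix(dist):
    """Return the standard NJ criterion Q(i, j) for every pair i < j.

    dist is a symmetric list-of-lists distance matrix with a zero diagonal.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    return {
        (i, j): (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        for i, j in combinations(range(n), 2)
    }


def tied_minimum_pairs(dist, tol=1e-9):
    """List every pair whose Q value equals the minimum (within tol)."""
    q = nj_q_matrix(dist)
    q_min = min(q.values())
    return sorted(pair for pair, value in q.items() if value - q_min <= tol)


# Hypothetical example: four taxa that are all equidistant (a star-like case).
star = [
    [0, 2, 2, 2],
    [2, 0, 2, 2],
    [2, 2, 0, 2],
    [2, 2, 2, 0],
]
tied = tied_minimum_pairs(star)
print(tied)
```

When more than one pair is returned, a bifurcating NJ implementation has to pick one of them, typically the first it encounters, which is how input order can influence the resulting tree.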
Subjects
Algorithms, Genetic Models, Phylogeny
ABSTRACT
The phylogenetic relationships between fossil hominin taxa have been a contentious topic for decades. Recent discoveries of new taxa, rather than resolving the issue, have only further confused it. Compounding this problem are the limitations of some of the tools frequently used by paleoanthropologists to analyze these relationships. Most commonly, phylogenetic questions are investigated using analytical methods such as maximum parsimony and Bayesian analysis. While these are useful analytical tools, these tree-building methods can have limitations when investigating taxa that may have complex evolutionary histories. Exploratory data analysis can provide information about patterns in a dataset that are obscured by tree-based methods. These patterns include phylogenetic signal conflict, which is not depicted in tree-based methods. Signal conflict can have a number of sources, including methodological issues with character choice, taxonomic issues, homoplasy, and gene flow between taxa. In this study, an exploratory data analysis of fossil hominin morphological data is conducted using the tree-based analytical method neighbor-joining and the network-based analytical method neighbor-net with the goal of visualizing phylogenetic signal conflict within a hominin morphological data set. The data set is divided into cranial regions, and each cranial region is analyzed individually to investigate which regions of the skull contain the highest levels of signal conflict. Results of this analysis show that conflicting phylogenetic signals are present in the hominin fossil record during the relatively speciose period between 3 and 1 Ma, and they also indicate that levels of signal conflict vary by cranial region. Possible sources of these conflicting signals are then explored. Exploratory data analyses such as this can be a useful tool in generating phylogenetic hypotheses and in refining character choice. 
This study also highlights the value network-based approaches can bring to the hominin phylogenetic analysis toolkit.
Subjects
Hominidae, Animals, Phylogeny, Hominidae/anatomy & histology, Bayes Theorem, Biological Evolution, Skull/anatomy & histology, Fossils
ABSTRACT
The neighbor-joining (NJ) method of tree inference is examined, with special attention to its use in yeast species descriptions. How the often-vilified method works is often misunderstood. More importantly, given the right kind of data, its output is a phylogram that illustrates a hypothetical phylogeny that is just as credible as that obtained by any other method. And as with any other method, the result is greatly affected by sampling intensity, particularly the number of aligned positions used for analysis. I address various allegations, including the claim that the method is phenetic, and, therefore, not phylogenetic. I argue that NJ is the most suitable tree inference method to use in yeast species descriptions, primarily because it is best at visually preserving the extent of sequence divergence between close relatives, which continues to be the primary criterion for yeast species delineation. The relevance of bootstraps in the application of the phylogenetic species concept is discussed.
Subjects
Algorithms, Genetic Models, Molecular Evolution, Phylogeny
ABSTRACT
Phylogenetic networks extend phylogenetic trees to model non-vertical inheritance, by which a lineage inherits material from multiple parents. The computational complexity of estimating phylogenetic networks from genome-wide data with likelihood-based methods limits the size of networks that can be handled. Methods based on pairwise distances could offer faster alternatives. We study here the information that average pairwise distances contain on the underlying phylogenetic network, by characterizing local and global features that can or cannot be identified. For general networks, we clarify that the root and edge lengths adjacent to reticulations are not identifiable, and then focus on the class of zipped-up semidirected networks. We provide a criterion to swap subgraphs locally, such as 3-cycles, resulting in indistinguishable networks. We propose the "distance split tree", which can be constructed from pairwise distances, and prove that it is a refinement of the network's tree of blobs, capturing the tree-like features of the network. For level-1 networks, this distance split tree is equal to the tree of blobs refined to separate polytomies from blobs, and we prove that the mixed representation of the network is identifiable. The information loss is localized around 4-cycles, for which the placement of the reticulation is unidentifiable. The mixed representation combines split edges for 4-cycles, regular tree and hybrid edges from the semidirected network, and edge parameters that encode all information identifiable from average pairwise distances.
Subjects
Phylogeny, Likelihood Functions
ABSTRACT
Identifying the species of origin of medicinal materials has been one of the most active topics in the authentication of traditional Chinese medicine. Sea snake is one of the most valuable medicinal materials in China. To understand the origin and varieties of sea snake on the market, we performed molecular identification of 46 sea snake samples using cytochrome b (Cytb). After alignment and manual correction, the sequence length was 582 bp, and the A+T content (58.9%) was higher than the G+C content (41.1%). The sequences contained 197 variable sites and 179 parsimony-informative sites. In sequence alignment, 44 samples showed 100% identity and 2 showed 96% identity. A total of 408 valid Cytb sequences, representing 68 species, were downloaded from the GenBank database. Phylogenetic trees of all 454 sea snake sequences, including the samples in this study, were constructed by the neighbor-joining and Bayesian inference methods; these identified 42 of the medicinal samples, while 4 samples could not be identified because of low node support. The results showed that the sea snake medicinal materials came from at least 2 genera and 5 species, namely Aipysurus eydouxii, Hydrophis curtus, H. caerulescens, H. ornatus and H. spiralis. This study suggests that the original species of commercial sea snake are very complex, and it can provide insight into the identification of sea snakes.
Subjects
Hydrophiidae, Animals, Bayes Theorem, China, Cytochromes b/genetics, Traditional Chinese Medicine, Phylogeny
ABSTRACT
Based on the k-mer model for protein sequences, a novel k-mer natural vector method is proposed to characterize the k-mers in a protein sequence, taking into account both their numbers and their distributions. It is proved that the relationship between a protein sequence and its k-mer natural vector is one-to-one. Phylogenetic analysis of protein sequences can therefore be performed easily without requiring evolutionary models or human intervention. However, there is no criterion for choosing a suitable k, and k has a great influence on both the results and the computational complexity. In this paper, a compound k-mer natural vector is used to quantify each protein sequence. The results of phylogenetic analysis on three protein datasets demonstrate that our new method can precisely describe the evolutionary relationships of proteins and greatly improve computing efficiency.
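To illustrate what "numbers and distributions of k-mers" might look like in practice, here is a minimal sketch. It assumes, by analogy with the usual natural vector construction, that each k-mer is described by its count, its mean (1-based) position, and a normalized second central moment of its positions. The abstract does not give the exact definition or normalization, so the function name and the formula below are assumptions, not the authors' method.

```python
from collections import defaultdict


def kmer_natural_vector(seq, k):
    """Return {kmer: (count, mean_position, normalized_second_moment)}.

    Assumed normalization: the second central moment is divided by
    count * (number of k-mer positions in the sequence).
    """
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")

    # Record the 1-based start position of every k-mer occurrence.
    positions = defaultdict(list)
    for start in range(len(seq) - k + 1):
        positions[seq[start:start + k]].append(start + 1)

    total = len(seq) - k + 1
    vector = {}
    for kmer, pos in positions.items():
        count = len(pos)
        mean = sum(pos) / count
        moment = sum((p - mean) ** 2 for p in pos) / (count * total)
        vector[kmer] = (count, mean, moment)
    return vector


# Hypothetical toy sequence; a real protein sequence uses the 20-letter alphabet.
example = kmer_natural_vector("ABAB", 2)
print(example)
```

Under the same assumption, a compound vector could be formed by concatenating the vectors for several values of k, which avoids committing to a single k.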
Subjects
Phylogeny, Protein Sequence Analysis/methods, Influenza A Virus/classification, Rhinovirus/classification, Viral Proteins/chemistry, beta-Globins/chemistry
ABSTRACT
The neighbor-joining algorithm for phylogenetic inference (NJ) has been seen to have three specific properties when applied to distance matrices that contain an admixed taxon: (1) antecedence of clustering, in which the admixed taxon agglomerates with one of its source taxa before the two source taxa agglomerate with each other; (2) intermediacy of distances, in which the distance on an inferred NJ tree between an admixed taxon and either of its source taxa is smaller than the distance between the two source taxa; and (3) intermediacy of path lengths, in which the number of edges separating the admixed taxon and either of its source taxa is less than or equal to the number of edges between the source taxa. We examine the behavior of neighbor-joining on distance matrices containing an admixed group, investigating the occurrence of antecedence of clustering, intermediacy of distances, and intermediacy of path lengths. We first mathematically predict the frequency with which the properties are satisfied for a labeled unrooted binary tree selected uniformly at random in the absence of admixture. We then introduce a taxon constructed by a linear admixture of distances from two source taxa, examining three admixture scenarios by simulation: a model in which distance matrices are chosen at random, a model in which an admixed taxon is added to a set of taxa that reflect treelike evolution, and a model that introduces a perturbation of the treelike scenario. In contrast to previous conjectures, we observe that the three properties are sometimes violated by distance matrices that include an admixed taxon. However, we also find that they are satisfied more often than is expected by chance when the distance matrix contains an admixed taxon, especially when evolution among the non-admixed taxa is treelike. The results contribute to a deeper understanding of the nature of evolutionary trees constructed from data that do not necessarily reflect a treelike evolutionary process.
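The "linear admixture of distances" can be sketched as follows, assuming the admixed taxon M gets distance d(M, x) = alpha * d(A, x) + (1 - alpha) * d(B, x) to every existing taxon x, where A and B are the source taxa. The function name, the parameter `alpha`, and the example matrix are illustrative assumptions; the abstract does not spell out the construction.

```python
def add_admixed_taxon(dist, a, b, alpha):
    """Return a new distance matrix with one admixed taxon appended.

    dist  : symmetric list-of-lists matrix with zero diagonal
    a, b  : indices of the two source taxa
    alpha : assumed admixture fraction contributed by taxon a (0..1)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(dist)
    # Distance from the admixed taxon to each existing taxon.
    new_row = [alpha * dist[a][k] + (1 - alpha) * dist[b][k] for k in range(n)]
    # Copy the old rows, appending the new column, then add the new row.
    extended = [row[:] + [new_row[k]] for k, row in enumerate(dist)]
    extended.append(new_row + [0.0])
    return extended


# Hypothetical three-taxon matrix: taxa 0 and 1 are the sources.
base = [
    [0, 4, 6],
    [4, 0, 8],
    [6, 8, 0],
]
with_admixed = add_admixed_taxon(base, a=0, b=1, alpha=0.25)
print(with_admixed[3])
```

Under this construction the input distances from M to A and to B are (1 - alpha) and alpha times d(A, B) respectively. The three properties in the abstract concern the tree that NJ infers from such a matrix, which this sketch does not compute.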
Subjects
Algorithms, Phylogeny, Cluster Analysis, Computational Biology, Computer Simulation, Molecular Evolution, Mathematical Concepts, Genetic Models, Statistical Models, Probability
ABSTRACT
BACKGROUND: In phylogenetic reconstruction the result is a tree where all taxa are leaves and internal nodes are hypothetical ancestors. In a live phylogeny, both ancestral and living taxa may coexist, leading to a tree where internal nodes may be living taxa. The well-known Neighbor-Joining heuristic is widely used for phylogenetic reconstruction. RESULTS: We present Live Neighbor-Joining, a heuristic for building a live phylogeny. We have investigated Live Neighbor-Joining on datasets of viral genomes, a plausible scenario for its application, which allowed the construction of alternative hypotheses for the relationships among viruses that embrace both ancestral and descendant taxa. We also applied Live Neighbor-Joining to a set of bacterial genomes and to sets of images and texts. Non-biological data may be better explored visually when their relationship in terms of content similarity is represented by means of a phylogeny. CONCLUSION: Our experiments have shown interesting alternative phylogenetic hypotheses for RNA virus genomes and bacterial genomes, and alternative relationships among images and texts, illustrating a wide range of scenarios where Live Neighbor-Joining may be used.
Subjects
Genetic Models, Phylogeny, Plants/chemistry
ABSTRACT
We surveyed genome sequences from the basidiomycetous mushroom Coprinopsis cinerea and isolated a cDNA homologous to CMKA, a calmodulin-dependent protein kinase (CaMK) in Aspergillus nidulans. We designated this sequence, encoding 580 amino acids with a molecular weight of 63,987, as CoPK02. CoPK02 possessed twelve subdomains specific to protein kinases and exhibited 43, 35, 40% identity with rat CaMKI, CaMKII, CaMKIV, respectively, and 40% identity with CoPK12, one of the CaMK orthologs in C. cinerea. CoPK02 showed significant autophosphorylation activity and phosphorylated exogenous proteins in the presence of Ca2+/CaM. By the CaM-overlay assay we confirmed that the C-terminal sequence (Trp346-Arg358) was the calmodulin-binding site, and that the binding of Ca2+/CaM to CoPK02 was reduced by the autophosphorylation of CoPK02. Since CoPK02 evolved in a different clade from CoPK12, and showed different gene expression compared to that of CoPK32, which is homologous to mitogen-activated protein kinase-activated protein kinase, CoPK02 and CoPK12 might cooperatively regulate Ca2+-signaling in C. cinerea.
Subjects
Basidiomycota/enzymology, Calcium-Calmodulin-Dependent Protein Kinases/metabolism, Amino Acid Sequence, Animals, Basidiomycota/genetics, Basidiomycota/growth & development, Binding Sites, Calcium Signaling, Calcium-Calmodulin-Dependent Protein Kinases/chemistry, Calcium-Calmodulin-Dependent Protein Kinases/genetics, Calmodulin/metabolism, Catalysis, Molecular Cloning, Polyacrylamide Gel Electrophoresis, Gene Expression Profiling, Fungal Genes, Phosphorylation, Phylogeny, Rats, Amino Acid Sequence Homology
ABSTRACT
The occurrence of Suidasia medanensis (= S. pontifica) mites in Malaysian house dust was first reported in 1984. The taxonomy of this storage mite is, however, quite confusing. Therefore, we need an accurate identification to resolve morphological problems due to its minute size and some overlapping characters between species. The purpose of this study was to demonstrate the application of partial mitochondrial cytochrome c oxidase subunit I (COI) sequences for the identification of S. medanensis by PCR. Identity of the mite was first determined by observing morphological characters under a light microscope. Genomic DNA of S. medanensis mites was successfully extracted prior to PCR and DNA sequencing using COI universal primers. The length of the COI sequences obtained was 378 bp. BLAST analysis of amplicon sequences showed that local S. medanensis COI region had 99% maximum identity with S. medanensis nucleotide sequence (AY525568) available in the GenBank. As the phylogenetic tree generated indicated, COI sequences from this study were clustered with S. medanensis from Korea and the UK in one major clade, supported with high bootstrap value (> 85%). Results of the phylogenetic analysis of this COI gene were congruent with the morphological identification and provided strong support for a single clade of local S. medanensis.
Subjects
Mites/classification, Animals, Arthropod Proteins/analysis, Base Sequence, Dust, Electron Transport Complex IV/analysis, Malaysia, Mites/anatomy & histology, Mites/genetics, Phylogeny, Polymerase Chain Reaction/methods, Polymerase Chain Reaction/veterinary, Sequence Alignment
ABSTRACT
At the present time it is often stated that the maximum likelihood or the Bayesian method of phylogenetic construction is more accurate than the neighbor joining (NJ) method. Our computer simulations, however, have shown that the converse is true if we use p distance in the NJ procedure and the criterion of obtaining the true tree (Pc expressed as a percentage) or the combined quantity (c) of a value of Pc and a value of Robinson-Foulds' average topological error index (dT). This c is given by Pc (1 - dT/dTmax) = Pc (m - 3 - dT/2)/(m - 3), where m is the number of taxa used and dTmax is the maximum possible value of dT, which is given by 2(m - 3). This neighbor joining method with p distance (NJp method) will be shown generally to give the best data-fit model. This c takes a value between 0 and 1, and a tree-making method giving a high value of c is considered to be good. Our computer simulations have shown that the NJp method generally gives a better performance than the other methods and therefore this method should be used in general whether the gene is compositional or it contains the mosaic DNA regions or not.
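The combined quantity c defined above can be computed directly. The sketch below assumes Pc is supplied as a proportion in [0, 1] (so that c also lies in [0, 1]) and implements both algebraic forms given in the text to show that they agree. The function names are illustrative.

```python
def combined_index(pc, d_t, m):
    """c = Pc * (1 - dT / dTmax), with dTmax = 2 * (m - 3)."""
    if m <= 3:
        raise ValueError("m must exceed 3 so that dTmax is positive")
    d_t_max = 2 * (m - 3)
    return pc * (1 - d_t / d_t_max)


def combined_index_alt(pc, d_t, m):
    """Equivalent form: c = Pc * (m - 3 - dT / 2) / (m - 3)."""
    if m <= 3:
        raise ValueError("m must exceed 3 so that dTmax is positive")
    return pc * (m - 3 - d_t / 2) / (m - 3)


# Hypothetical values: 8 taxa, true tree recovered in 80% of replicates,
# average Robinson-Foulds error of 2.
c_value = combined_index(pc=0.8, d_t=2, m=8)
c_check = combined_index_alt(pc=0.8, d_t=2, m=8)
print(c_value, c_check)
```

With m = 8 the maximum error is dTmax = 10, so an average error of 2 scales Pc by 0.8.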
Subjects
Computational Biology/methods, Genetic Models, Phylogeny, DNA Sequence Analysis/methods, Algorithms, Bayes Theorem, Computer Simulation, DNA/genetics, Likelihood Functions, Probability
ABSTRACT
We propose an extension of the distance matrix methods NJst and ASTRID to infer species trees from incongruent gene trees having Incomplete Lineage Sorting. Both approaches consider the average internode distance (ID) between individual taxa pairs as the distance measure. The measure ID does not use the root of a tree, and thus may not always infer the relative position of a taxon with respect to the root. We define a novel distance measure excess gene leaf count (XL) between individual couplets. The XL measure is computed using the root of a tree. It is proved to be additive, and is shown to infer the relative order of divergence among individual couplets better. We propose a novel method IDXL which uses both the XL and ID measures for species tree construction. IDXL is shown to perform better than NJst and other distance matrix approaches for most of the biological and simulated datasets. Having the same computational complexity as NJst, IDXL can be applied for species tree inference on large-scale biological datasets.
Subjects
Algorithms, Computational Biology/methods, Molecular Evolution, Genes, Genetic Speciation, Animals, Magnoliopsida/genetics, Genetic Models, Phylogeny, Vertebrates/genetics
ABSTRACT
Recent theoretical work has demonstrated that Neighbor Joining applied to concatenated DNA sequences is a statistically consistent method of species tree reconstruction. This brief note compares the accuracy of this approach to other popular statistically consistent species tree reconstruction algorithms, including ASTRAL-II, Neighbor Joining using average gene-tree internode distances (NJst), and SVD-Quartets+PAUP*, as well as concatenation using maximum likelihood (RAxML). We find that the faster Neighbor Joining, applied to concatenated sequences, is among the most effective of these methods for accurate species tree reconstruction.
Subjects
Algorithms, Molecular Evolution, Genetic Models, DNA Sequence Analysis
ABSTRACT
It was recently observed by de Vienne et al. (Syst Biol 60(6):826-832, 2011) that a simple square root transformation of distances between taxa on a phylogenetic tree allowed for an embedding of the taxa into Euclidean space. While the justification for this was based on a diffusion model of continuous character evolution along the tree, here we give a direct and elementary explanation for it that provides substantial additional insight. We use this embedding to reinterpret the differences between the NJ and BIONJ tree building algorithms, providing one illustration of how this embedding reflects tree structures in data.
Subjects
Classification/methods, Genetic Models, Phylogeny, Algorithms
ABSTRACT
The balanced minimal evolution (BME) method of creating phylogenetic trees can be formulated as a linear programming problem, minimizing an inner product over the vertices of the BME polytope. In this paper we undertake the project of describing the facets of this polytope. We classify and identify the combinatorial structure and geometry (facet inequalities) of all the facets in dimensions up to five, and classify even more facets in all dimensions. A full set of facet inequalities would allow a full implementation of the simplex method for finding the BME tree-although there are reasons to think this an unreachable goal. However, our results provide the crucial first steps for a more likely-to-be-successful program: finding efficient relaxations of the BME polytope.
Subjects
Classification/methods, Phylogeny, Algorithms, Biological Models, Linear Programming
ABSTRACT
BACKGROUND: The genetic diversity of 19 forage-type and 2 turf-type cultivars of tall fescue (Festuca arundinacea Schreb.) was assessed using SSR markers in an attempt to explore the genetic relationships among them and to examine the potential use of SSR markers to identify cultivars from bulked samples. RESULTS: A total of 227 clear bands were scored with 14 SSR primers, of which 201 (88.6 %) were polymorphic. The percentage of polymorphic bands (PPB) per primer pair varied from 62.5 to 100 % with an average of 86.9 %. The polymorphism information content (PIC) value ranged from 0.116 to 0.347 with an average of 0.257; the highest PIC value (0.347) was observed for primer NFA040, followed by NFA113 (0.346), whereas the highest discriminating power (D) of 1 was shown by NFA037 and LMgSSR02-01C. A neighbor-joining dendrogram and the principal component analysis identified six major clusters and grouped the cultivars in agreement with their breeding histories. STRUCTURE analysis divided these cultivars into 3 sub-clades, which correspond to the distance-based groupings. CONCLUSION: These findings indicate that SSR markers used with a bulking strategy are a useful tool to measure genetic diversity among tall fescue cultivars and could be used to supplement morphological data for plant variety protection.
Subjects
Festuca/genetics, Microsatellite Repeats, Genetic Polymorphism, Plant DNA/genetics, Festuca/classification, Genetic Markers, DNA Sequence Analysis
ABSTRACT
Phylogenetic relationships of Indian Citron (Citrus medica L.) with other important Citrus species have been inferred through sequence analyses of rbcL and matK gene region of chloroplast DNA. The study was based on 23 accessions of Citrus genotypes representing 15 taxa of Indian Citrus, collected from wild, semi-wild, and domesticated stocks. The phylogeny was inferred using the maximum parsimony (MP) and neighbor-joining (NJ) methods. Both MP and NJ trees separated all the 23 accessions of Citrus into five distinct clusters. The chloroplast DNA (cpDNA) analysis based on rbcL and matK sequence data carried out in Indian taxa of Citrus was useful in differentiating all the true species and species/varieties of probable hybrid origin in distinct clusters or groups. Sequence analysis based on rbcL and matK gene provided unambiguous identification and disposition of true species like C. maxima, C. medica, C. reticulata, and related hybrids/cultivars. The separation of C. maxima, C. medica, and C. reticulata in distinct clusters or sub-clusters supports their distinctiveness as the basic species of edible Citrus. However, the cpDNA sequence analysis of rbcL and matK gene could not find any clear cut differentiation between subgenera Citrus and Papeda as proposed in Swingle's system of classification.
Subjects
Chloroplasts/genetics, Citrus/classification, Citrus/genetics, DNA Sequence Analysis/methods, Chloroplast DNA/genetics, Molecular Evolution, Genetic Variation, Genotype, Phylogeny
ABSTRACT
In evolutionary biology, the taxonomy and origination of species are widely studied subjects. An estimation of the evolutionary tree can be done via available DNA sequence data. The calculation of the tree is made by well-known and frequently used methods such as maximum likelihood and neighbor-joining. In order to examine the results of these methods, an evolutionary tree is pursued computationally by a mathematical model, called Tangled Nature. A relatively small genome space is investigated due to computational burden and it is found that the actual and predicted trees are in reasonably good agreement in terms of shape. Moreover, the speciation and the resulting community structure of the food-web are investigated by modularity.
Subjects
Ecosystem, Genetic Models, Phylogeny, Genome, Likelihood Functions, Species Specificity, Time Factors
ABSTRACT
Potato (Solanum tuberosum) is an important non-cereal crop throughout the world and is highly recommended for ensuring global food security. Owing to the complexities in genetics and inheritance pattern of potato, the conventional method of cross breeding for developing improved varieties has been difficult. Identification and tagging of desirable traits with informative molecular markers would aid in the development of improved varieties. Insertional polymorphism of copia-like and gypsy-like long terminal repeat retrotransposons (RTN) were investigated among 47 potato varieties from India using Inter-Retrotransposon Amplified Polymorphism (IRAP) and Retrotransposon Microsatellite Amplified Polymorphism (REMAP) marker techniques and were compared with the DNA profiles obtained with simple sequence repeats (SSRs). The genetic polymorphism, efficiency of polymorphism and effectiveness of marker systems were evaluated to assess the extent of genetic diversity among Indian potato varieties. A total of 139 polymorphic SSR alleles, 270 IRAP and 98 REMAP polymorphic bands, showing polymorphism of 100%, 87.9% and 68.5%, respectively, were used for detailed characterization of the genetic relationships among potato varieties by using cluster analysis and principal coordinate analysis (PCoA). IRAP analysis resulted in the highest number of polymorphic bands with an average of 15 polymorphic bands per assay unit when compared to the other two marker systems. Based on pair-wise comparison, the genetic similarity was calculated using Dice similarity coefficient. The SSRs showed a wide range in genetic similarity values (0.485-0.971) as compared to IRAP (0.69-0.911) and REMAP (0.713-0.947). A Mantel's matrix correspondence test showed a high positive correlation (r=0.6) between IRAP and REMAP, an intermediate value (r=0.58) for IRAP and SSR and the lowest value (r=0.17) for SSR and REMAP. 
Statistically significant cophenetic correlation coefficient values of 0.961, 0.941 and 0.905 were observed for REMAP, IRAP and SSR, respectively. The widespread presence and distinct DNA profiles for copia-like and gypsy-like RTNs in the examined genotypes indicate that these elements are active in the genome and may even have contributed to the potato genome organization. Although all three marker systems were capable of distinguishing all 47 varieties, high reproducibility, low cost and ease of DNA profiling data collection make IRAP and REMAP markers highly efficient whole-genome scanning molecular probes for population genetic studies. Information obtained from the present study regarding genetic association and distinctiveness provides a useful guide for the selection of germplasm for plant breeding and conservation efforts.
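The pairwise genetic similarity mentioned above uses the Dice coefficient, which for two binary band profiles is 2a / (2a + b + c), where a is the number of bands shared by both varieties and b and c are the bands present in only one of them. A minimal sketch follows, with hypothetical band profiles rather than data from the study.

```python
def dice_similarity(profile_x, profile_y):
    """Dice coefficient for two equal-length presence/absence (1/0) profiles."""
    if len(profile_x) != len(profile_y):
        raise ValueError("profiles must have the same number of band positions")
    # Bands present in both profiles.
    shared = sum(1 for x, y in zip(profile_x, profile_y) if x and y)
    # Total bands present in each profile (equals 2a + b + c).
    total = sum(1 for x in profile_x if x) + sum(1 for y in profile_y if y)
    if total == 0:
        raise ValueError("at least one profile must contain a band")
    return 2 * shared / total


# Hypothetical band profiles for two varieties scored at five band positions.
variety_1 = [1, 1, 0, 1, 0]
variety_2 = [1, 0, 0, 1, 1]
similarity = dice_similarity(variety_1, variety_2)
print(round(similarity, 3))
```

Shared absences (0 in both profiles) do not contribute to the coefficient, which is one reason it is commonly chosen for dominant band data.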
Subjects
DNA Fingerprinting/methods, Genetic Variation/genetics, Microsatellite Repeats/genetics, Retroelements/genetics, Solanum tuberosum/genetics, Alleles, DNA Fingerprinting/economics, Genetic Markers/genetics, Genotype, Phylogeny, Genetic Polymorphism/genetics, DNA Sequence Analysis/economics
ABSTRACT
Phylogenetic analyses based on small to moderately sized sets of sequence data lead to overestimating mutation rates in influenza hemagglutinin (HA) by at least an order of magnitude. Two major underlying reasons are incomplete lineage sorting and the possible absence of some key ancestors from the analyzed sequence set. Additionally, during neighbor-joining tree reconstruction each mutation is considered equally important, regardless of its nature. Here we have implemented a heuristic method that optimizes site-dependent factors weighting 1st, 2nd, and 3rd codon position mutations differently, which allows incorrectly attributed sub-clades to be disentangled. A least squares regression analysis of the distribution of mutation frequencies observed on a partially disentangled tree for a large set of 3243 unique HA sequences, along all nucleotide positions, was performed for all mutations as well as for non-equivalent amino acid mutations; in both cases it showed almost flat gradients, with a very slight downward slope towards the 3'-end positions. The mean mutation rates per sequence per year were 3.83×10^-4 for all mutations and 9.64×10^-5 for the non-equivalent ones.