Search | VHL Search Portal

1.

The prevalence of terraced treescapes in analyses of phylogenetic data sets.

Dobrin, Barbara H; Zwickl, Derrick J; Sanderson, Michael J.

BMC Evol Biol ; 18(1): 46, 2018 04 04.

Article in English | MEDLINE | ID: mdl-29618314

ABSTRACT

BACKGROUND: The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods. RESULTS: Terraces were identified in nearly all data sets with taxon coverage densities < 0.90. They were not found, however, in high-coverage-density (i.e., ≥ 0.94) transcriptomic and genomic data sets. The terraces could be very large, and size varied inversely with taxon coverage density and with gene sampling sufficiency. Few data sets achieved a theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. Terraces found during bootstrap resampling reduced overall support. CONCLUSIONS: If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates to data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. 
Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
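
The taxon coverage density defined in this abstract (the proportion of taxon-by-gene positions with any data present) can be illustrated with a minimal sketch. The function name, the boolean-matrix input format, and the example matrix below are assumptions made for illustration; they are not taken from the study or its software.

```python
def taxon_coverage_density(presence):
    """Proportion of taxon-by-gene cells that contain any data.

    presence: list of rows (one per taxon), each a list of booleans
    (one per gene); True means the taxon has data for that gene.
    This input format is an assumption made for this sketch.
    """
    n_taxa = len(presence)
    n_genes = len(presence[0]) if n_taxa else 0
    if n_taxa == 0 or n_genes == 0:
        raise ValueError("matrix must have at least one taxon and one gene")
    if any(len(row) != n_genes for row in presence):
        raise ValueError("all taxa must have the same number of gene columns")
    # Count filled cells and divide by the total number of cells.
    filled = sum(bool(cell) for row in presence for cell in row)
    return filled / (n_taxa * n_genes)


# Hypothetical 4-taxon, 3-gene data-availability matrix.
example = [
    [True, True, True],
    [True, False, True],
    [True, True, False],
    [False, True, True],
]
print(taxon_coverage_density(example))  # 9 of 12 cells filled -> 0.75
```

A value of 0.75 falls below the 0.90 threshold under which the abstract reports terraces in nearly all surveyed data sets. The sketch only computes the density and does not detect terraces.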

Subject(s)

Databases, Genetic , Phylogeny , Genes , Models, Genetic

2.

Impacts of Terraces on Phylogenetic Inference.

Sanderson, Michael J; McMahon, Michelle M; Stamatakis, Alexandros; Zwickl, Derrick J; Steel, Mike.

Syst Biol ; 64(5): 709-26, 2015 Sep.

Article in English | MEDLINE | ID: mdl-25999395

ABSTRACT

Terraces are sets of trees with precisely the same likelihood or parsimony score, which can be induced by missing sequences in partitioned multi-locus phylogenetic data matrices. The potentially large set of trees on a terrace can be characterized by enumeration algorithms or consensus methods that exploit the pattern of partial taxon coverage in the data, independent of the sequence data themselves. Terraces can add ambiguity and complexity to phylogenetic inference, particularly in settings where inference is already challenging: data sets with many taxa and relatively few loci. In this article we present five new findings about terraces and their impacts on phylogenetic inference. First, we clarify assumptions about partitioning scheme model parameters that are necessary for the existence of terraces. Second, we explore the dependence of terrace size on partitioning scheme and indicate how to find the partitioning scheme associated with the largest terrace containing a given tree. Third, we highlight the impact of terrace size on bootstrap estimates of confidence limits in clades, and characterize the surprising result that the bootstrap proportion for a clade, as it is usually calculated, can be entirely determined by the frequency of bipartitions on a terrace, with some bipartitions receiving high support even when incorrect. Fourth, we dissect some effects of prior distributions of edge lengths on the computed posterior probabilities of clades on terraces, to understand an example in which long edges "attract" each other in Bayesian inference. Fifth, we describe how assuming relationships between edge-lengths of different loci, as an attempt to avoid terraces, can also be problematic when taxon coverage is partial, specifically when heterotachy is present. Finally, we discuss strategies for remediation of some of these problems. 
One promising approach finds a minimal set of taxa which, when deleted from the data matrix, reduces the size of a terrace to a single tree.

Subject(s)

Classification/methods , Computer Simulation/standards , Phylogeny , Models, Genetic

3.

A gateway for phylogenetic analysis powered by grid computing featuring GARLI 2.0.

Bazinet, Adam L; Zwickl, Derrick J; Cummings, Michael P.

Syst Biol ; 63(5): 812-8, 2014 Sep.

Article in English | MEDLINE | ID: mdl-24789072

ABSTRACT

We introduce molecularevolution.org, a publicly available gateway for high-throughput, maximum-likelihood phylogenetic analysis powered by grid computing. The gateway features a GARLI 2.0 web service that enables a user to quickly and easily submit thousands of maximum likelihood tree searches or bootstrap searches that are executed in parallel on distributed computing resources. The GARLI web service allows one to easily specify partitioned substitution models using a graphical interface, and it performs sophisticated post-processing of phylogenetic results. Although the GARLI web service has been used by the research community for over three years, here we formally announce the availability of the service, describe its capabilities, highlight new features and recent improvements, and provide details about how the grid system efficiently delivers high-quality phylogenetic results.

Subject(s)

Classification/methods , Phylogeny , Software , Access to Information , Internet

4.

Disentangling methodological and biological sources of gene tree discordance on Oryza (Poaceae) chromosome 3.

Zwickl, Derrick J; Stein, Joshua C; Wing, Rod A; Ware, Doreen; Sanderson, Michael J.

Syst Biol ; 63(5): 645-59, 2014 Sep.

Article in English | MEDLINE | ID: mdl-24721692

ABSTRACT

We describe new methods for characterizing gene tree discordance in phylogenomic data sets, which screen for deviations from neutral expectations, summarize variation in statistical support among gene trees, and allow comparison of the patterns of discordance induced by various analysis choices. Using an exceptionally complete set of genome sequences for the short arm of chromosome 3 in Oryza (rice) species, we applied these methods to identify the causes and consequences of differing patterns of discordance in the sets of gene trees inferred using a panel of 20 distinct analysis pipelines. We found that discordance patterns were strongly affected by aspects of data selection, alignment, and alignment masking. Unusual patterns of discordance evident when using certain pipelines were reduced or eliminated by using alternative pipelines, suggesting that they were the product of methodological biases rather than evolutionary processes. In some cases, once such biases were eliminated, evolutionary processes such as introgression could be implicated. Additionally, patterns of gene tree discordance had significant downstream impacts on species tree inference. For example, inference from supermatrices was positively misleading when pipelines that led to biased gene trees were used. Several results may generalize to other data sets: we found that gene tree and species tree inference gave more reasonable results when intron sequence was included during sequence alignment and tree inference, the alignment software PRANK was used, and detectable "block-shift" alignment artifacts were removed. We discuss our findings in the context of well-established relationships in Oryza and continuing controversies regarding the domestication history of O. sativa.

Subject(s)

Chromosomes, Plant/genetics , Classification/methods , Oryza/classification , Oryza/genetics , Phylogeny , Genome, Plant/genetics

5.

Exceptional reduction of the plastid genome of saguaro cactus (Carnegiea gigantea): Loss of the ndh gene suite and inverted repeat.

Sanderson, Michael J; Copetti, Dario; Búrquez, Alberto; Bustamante, Enriquena; Charboneau, Joseph L M; Eguiarte, Luis E; Kumar, Sudhir; Lee, Hyun Oh; Lee, Junki; McMahon, Michelle; Steele, Kelly; Wing, Rod; Yang, Tae-Jin; Zwickl, Derrick; Wojciechowski, Martin F.

Am J Bot ; 102(7): 1115-27, 2015 Jul.

Article in English | MEDLINE | ID: mdl-26199368

ABSTRACT

PREMISE OF THE STUDY: Land-plant plastid genomes have only rarely undergone significant changes in gene content and order. Thus, discovery of additional examples adds power to tests for causes of such genome-scale structural changes. METHODS: Using next-generation sequence data, we assembled the plastid genome of saguaro cactus and probed the nuclear genome for transferred plastid genes and functionally related nuclear genes. We combined these results with available data across Cactaceae and seed plants more broadly to infer the history of gene loss and to assess the strength of phylogenetic association between gene loss and loss of the inverted repeat (IR). KEY RESULTS: The saguaro plastid genome is the smallest known for an obligately photosynthetic angiosperm (~113 kb), having lost the IR and plastid ndh genes. This loss supports a statistically strong association across seed plants between the loss of ndh genes and the loss of the IR. Many nonplastid copies of plastid ndh genes were found in the nuclear genome, but none had intact reading frames; nor did three related nuclear-encoded subunits. However, nuclear pgr5, which functions in a partially redundant pathway, was intact. CONCLUSIONS: The existence of an alternative pathway redundant with the function of the plastid NADH dehydrogenase-like (NDH) complex may permit loss of the plastid ndh gene suite in photoautotrophs like saguaro. Loss of these genes may be a recurring mechanism for overall plastid genome size reduction, especially in combination with loss of the IR.

Subject(s)

Cactaceae/genetics , Genome, Plastid/genetics , Inverted Repeat Sequences/genetics , NADH Dehydrogenase/genetics , Plastids/genetics , DNA, Plant/chemistry , DNA, Plant/genetics , Evolution, Molecular , Gene Library , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Phylogeny , Plant Proteins/genetics , Sequence Analysis, DNA

6.

BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics.

Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A.

Syst Biol ; 61(1): 170-3, 2012 Jan.

Article in English | MEDLINE | ID: mdl-21963610

ABSTRACT

Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.

Subject(s)

Computational Biology/methods , Phylogeny , Software , Algorithms , Computing Methodologies , Evolution, Molecular , Genome

7.

Old gene duplication facilitates origin and diversification of an innovative communication system--twice.

Arnegard, Matthew E; Zwickl, Derrick J; Lu, Ying; Zakon, Harold H.

Proc Natl Acad Sci U S A ; 107(51): 22172-7, 2010 Dec 21.

Article in English | MEDLINE | ID: mdl-21127261

ABSTRACT

The genetic basis of parallel innovation remains poorly understood due to the rarity of independent origins of the same complex trait among model organisms. We focus on two groups of teleost fishes that independently gained myogenic electric organs underlying electrical communication. Earlier work suggested that a voltage-gated sodium channel gene (Scn4aa), which arose by whole-genome duplication, was neofunctionalized for expression in electric organ and subsequently experienced strong positive selection. However, it was not possible to determine if these changes were temporally linked to the independent origins of myogenic electric organs in both lineages. Here, we test predictions of such a relationship. We show that Scn4aa co-option and rapid sequence evolution were tightly coupled to the two origins of electric organ, providing strong evidence that Scn4aa contributed to parallel innovations underlying the evolutionary diversification of each electric fish group. Independent evolution of electric organs and Scn4aa co-option occurred more than 100 million years following the origin of Scn4aa by duplication. During subsequent diversification of the electrical communication channels, amino acid substitutions in both groups occurred in the same regions of the sodium channel that likely contribute to electric signal variation. Thus, the phenotypic similarities between independent electric fish groups are also associated with striking parallelism at genetic and molecular levels. Our results show that gene duplication can contribute to remarkably similar innovations in repeatable ways even after long waiting periods between gene duplication and the origins of novelty.

Subject(s)

Electric Organ/physiology , Evolution, Molecular , Fish Proteins/genetics , Fishes/genetics , Gene Duplication/genetics , Sodium Channels/genetics , Amino Acid Sequence , Amino Acid Substitution , Animals , Genome-Wide Association Study , Humans , Molecular Sequence Data

8.

Source identification in two criminal cases using phylogenetic analysis of HIV-1 DNA sequences.

Scaduto, Diane I; Brown, Jeremy M; Haaland, Wade C; Zwickl, Derrick J; Hillis, David M; Metzker, Michael L.

Proc Natl Acad Sci U S A ; 107(50): 21242-7, 2010 Dec 14.

Article in English | MEDLINE | ID: mdl-21078965

ABSTRACT

Phylogenetic analysis has been widely used to test the a priori hypothesis of epidemiological clustering in suspected transmission chains of HIV-1. Among studies showing strong support for relatedness between HIV samples obtained from infected individuals, evidence for the direction of transmission between epidemiologically related pairs has been lacking. During transmission of HIV, a genetic bottleneck occurs, resulting in the paraphyly of source viruses with respect to those of the recipient. This paraphyly establishes the direction of transmission, from which the source can then be inferred. Here, we present methods and results from two criminal cases, State of Washington v Anthony Eugene Whitfield, case number 04-1-0617-5 (Superior Court of the State of Washington, Thurston County, 2004) and State of Texas v Philippe Padieu, case numbers 219-82276-07, 219-82277-07, 219-82278-07, 219-82279-07, 219-82280-07, and 219-82705-07 (219th Judicial District Court, Collin County, TX, 2009), which provided evidence that direction can be established from blinded case samples. The observed paraphyly from each case study led to the identification of an inferred source (i.e., index case), whose identity was revealed at trial to be that of the defendant.

Subject(s)

Criminal Law , DNA, Viral/analysis , Forensic Genetics/methods , HIV Infections/transmission , HIV-1/classification , HIV-1/genetics , Sequence Analysis, DNA , DNA, Viral/blood , Databases, Genetic , HIV Infections/genetics , HIV Infections/virology , Humans , Molecular Sequence Data , Phylogeny , Texas , Washington

9.

Publisher Correction: Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza.

Stein, Joshua C; Yu, Yeisoo; Copetti, Dario; Zwickl, Derrick J; Zhang, Li; Zhang, Chengjun; Chougule, Kapeel; Gao, Dongying; Iwata, Aiko; Goicoechea, Jose Luis; Wei, Sharon; Wang, Jun; Liao, Yi; Wang, Muhua; Jacquemin, Julie; Becker, Claude; Kudrna, Dave; Zhang, Jianwei; Londono, Carlos E M; Song, Xiang; Lee, Seunghee; Sanchez, Paul; Zuccolo, Andrea; Ammiraju, Jetty S S; Talag, Jayson; Danowitz, Ann; Rivera, Luis F; Gschwend, Andrea R; Noutsos, Christos; Wu, Cheng-Chieh; Kao, Shu-Min; Zeng, Jhih-Wun; Wei, Fu-Jin; Zhao, Qiang; Feng, Qi; El Baidouri, Moaine; Carpentier, Marie-Christine; Lasserre, Eric; Cooke, Richard; da Rosa Farias, Daniel; da Maia, Luciano Carlos; Dos Santos, Railson S; Nyberg, Kevin G; McNally, Kenneth L; Mauleon, Ramil; Alexandrov, Nickolai; Schmutz, Jeremy; Flowers, Dave; Fan, Chuanzhu; Weigel, Detlef.

Nat Genet ; 50(11): 1618, 2018 11.

Article in English | MEDLINE | ID: mdl-30291357

ABSTRACT

This article was not made open access when initially published online, which was corrected before print publication. In addition, ORCID links were missing for 12 authors and have been added to the HTML and PDF versions of the article.

10.

Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza.

Stein, Joshua C; Yu, Yeisoo; Copetti, Dario; Zwickl, Derrick J; Zhang, Li; Zhang, Chengjun; Chougule, Kapeel; Gao, Dongying; Iwata, Aiko; Goicoechea, Jose Luis; Wei, Sharon; Wang, Jun; Liao, Yi; Wang, Muhua; Jacquemin, Julie; Becker, Claude; Kudrna, Dave; Zhang, Jianwei; Londono, Carlos E M; Song, Xiang; Lee, Seunghee; Sanchez, Paul; Zuccolo, Andrea; Ammiraju, Jetty S S; Talag, Jayson; Danowitz, Ann; Rivera, Luis F; Gschwend, Andrea R; Noutsos, Christos; Wu, Cheng-Chieh; Kao, Shu-Min; Zeng, Jhih-Wun; Wei, Fu-Jin; Zhao, Qiang; Feng, Qi; El Baidouri, Moaine; Carpentier, Marie-Christine; Lasserre, Eric; Cooke, Richard; Rosa Farias, Daniel da; da Maia, Luciano Carlos; Dos Santos, Railson S; Nyberg, Kevin G; McNally, Kenneth L; Mauleon, Ramil; Alexandrov, Nickolai; Schmutz, Jeremy; Flowers, Dave; Fan, Chuanzhu; Weigel, Detlef.

Nat Genet ; 50(2): 285-296, 2018 02.

Article in English | MEDLINE | ID: mdl-29358651

ABSTRACT

The genus Oryza is a model system for the study of molecular evolution over time scales ranging from a few thousand to 15 million years. Using 13 reference genomes spanning the Oryza species tree, we show that despite few large-scale chromosomal rearrangements rapid species diversification is mirrored by lineage-specific emergence and turnover of many novel elements, including transposons, and potential new coding and noncoding genes. Our study resolves controversial areas of the Oryza phylogeny, showing a complex history of introgression among different chromosomes in the young 'AA' subclade containing the two domesticated species. This study highlights the prevalence of functionally coupled disease resistance genes and identifies many new haplotypes of potential use for future crop protection. Finally, this study marks a milestone in modern rice research with the release of a complete long-read assembly of IR 8 'Miracle Rice', which relieved famine and drove the Green Revolution in Asia 50 years ago.

Subject(s)

Crops, Agricultural/genetics , Evolution, Molecular , Genetic Variation , Oryza/classification , Oryza/genetics , Conserved Sequence , Domestication , Genetic Speciation , Genome, Plant , Phylogeny

11.

Endogenous florendoviruses are major components of plant genomes and hallmarks of virus evolution.

Geering, Andrew D W; Maumus, Florian; Copetti, Dario; Choisne, Nathalie; Zwickl, Derrick J; Zytnicki, Matthias; McTaggart, Alistair R; Scalabrin, Simone; Vezzulli, Silvia; Wing, Rod A; Quesneville, Hadi; Teycheney, Pierre-Yves.

Nat Commun ; 5: 5269, 2014 Nov 10.

Article in English | MEDLINE | ID: mdl-25381880

ABSTRACT

The extent and importance of endogenous viral elements have been extensively described in animals but are much less well understood in plants. Here we describe a new genus of Caulimoviridae called 'Florendovirus', members of which have colonized the genomes of a large diversity of flowering plants, sometimes at very high copy numbers (>0.5% total genome content). The genome invasion of Oryza is dated to over 1.8 million years ago (MYA) but phylogeographic evidence points to an even older age of 20-34 MYA for this virus group. Some appear to have had a bipartite genome organization, a unique characteristic among viral retroelements. In Vitis vinifera, 9% of the endogenous florendovirus loci are located within introns and therefore may influence host gene expression. The frequent colocation of endogenous florendovirus loci with TA simple sequence repeats, which are associated with chromosome fragility, suggests sequence capture during repair of double-stranded DNA breaks.

Subject(s)

Caulimoviridae/genetics , Evolution, Molecular , Genome, Plant/genetics , Oryza/virology , Phylogeny , Gene Dosage/genetics , Genetic Loci/genetics , Introns/genetics , Microsatellite Repeats/genetics , Virus Replication/genetics

12.

A simple method for estimating informative node age priors for the fossil calibration of molecular divergence time analyses.

Nowak, Michael D; Smith, Andrew B; Simpson, Carl; Zwickl, Derrick J.

PLoS One ; 8(6): e66245, 2013.

Article in English | MEDLINE | ID: mdl-23755303

ABSTRACT

Molecular divergence time analyses often rely on the age of fossil lineages to calibrate node age estimates. Most divergence time analyses are now performed in a Bayesian framework, where fossil calibrations are incorporated as parametric prior probabilities on node ages. It is widely accepted that an ideal parameterization of such node age prior probabilities should be based on a comprehensive analysis of the fossil record of the clade of interest, but there is currently no generally applicable approach for calculating such informative priors. We provide here a simple and easily implemented method that employs fossil data to estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade, which can be used to fit an informative parametric prior probability distribution on a node age. Specifically, our method uses the extant diversity and the stratigraphic distribution of fossil lineages confidently assigned to a clade to fit a branching model of lineage diversification. Conditioning this on a simple model of fossil preservation, we estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade. The likelihood surface of missing history can then be translated into a parametric prior probability distribution on the age of the clade of interest. We show that the method performs well with simulated fossil distribution data, but that the likelihood surface of missing history can at times be too complex for the distribution-fitting algorithm employed by our software tool. An empirical example of the application of our method is performed to estimate echinoid node ages. A simulation-based sensitivity analysis using the echinoid data set shows that node age prior distributions estimated under poor preservation rates are significantly less informative than those estimated under high preservation rates.

Subject(s)

Genetic Speciation , Models, Genetic , Algorithms , Animals , Bayes Theorem , Calibration , Evolution, Molecular , Fossils , Likelihood Functions , Models, Statistical , Sea Urchins/genetics , Software

13.

A large-scale, higher-level, molecular phylogenetic study of the insect order Lepidoptera (moths and butterflies).

Regier, Jerome C; Mitter, Charles; Zwick, Andreas; Bazinet, Adam L; Cummings, Michael P; Kawahara, Akito Y; Sohn, Jae-Cheon; Zwickl, Derrick J; Cho, Soowon; Davis, Donald R; Baixeras, Joaquin; Brown, John; Parr, Cynthia; Weller, Susan; Lees, David C; Mitter, Kim T.

PLoS One ; 8(3): e58568, 2013.

Article in English | MEDLINE | ID: mdl-23554903

ABSTRACT

BACKGROUND: Higher-level relationships within the Lepidoptera, and particularly within the species-rich subclade Ditrysia, are generally not well understood, although recent studies have yielded progress. We present the most comprehensive molecular analysis of lepidopteran phylogeny to date, focusing on relationships among superfamilies. METHODOLOGY/PRINCIPAL FINDINGS: 483 taxa spanning 115 of 124 families were sampled for 19 protein-coding nuclear genes, from which maximum likelihood tree estimates and bootstrap percentages were obtained using GARLI. Assessment of heuristic search effectiveness showed that better trees and higher bootstrap percentages probably remain to be discovered even after 1000 or more search replicates, but further search proved impractical even with grid computing. Other analyses explored the effects of sampling nonsynonymous change only versus partitioned and unpartitioned total nucleotide change; deletion of rogue taxa; and compositional heterogeneity. Relationships among the non-ditrysian lineages previously inferred from morphology were largely confirmed, plus some new ones, with strong support. Robust support was also found for divergences among non-apoditrysian lineages of Ditrysia, but only rarely so within Apoditrysia. Paraphyly for Tineoidea is strongly supported by analysis of nonsynonymous-only signal; conflicting, strong support for tineoid monophyly when synonymous signal was added back is shown to result from compositional heterogeneity. CONCLUSIONS/SIGNIFICANCE: Support for among-superfamily relationships outside the Apoditrysia is now generally strong. Comparable support is mostly lacking within Apoditrysia, but dramatically increased bootstrap percentages for some nodes after rogue taxon removal, and concordance with other evidence, strongly suggest that our picture of apoditrysian phylogeny is approximately correct. This study highlights the challenge of finding optimal topologies when analyzing hundreds of taxa.
It also shows that some nodes get strong support only when analysis is restricted to nonsynonymous change, while total change is necessary for strong support of others. Thus, multiple types of analyses will be necessary to fully resolve lepidopteran phylogeny.

Subject(s)

Butterflies/genetics , Moths/genetics , Phylogeny , Animals , Butterflies/classification , Moths/classification

14.

Resolving discrepancy between nucleotides and amino acids in deep-level arthropod phylogenomics: differentiating serine codons in 21-amino-acid models.

Zwick, Andreas; Regier, Jerome C; Zwickl, Derrick J.

PLoS One ; 7(11): e47450, 2012.

Article in English | MEDLINE | ID: mdl-23185239

ABSTRACT

BACKGROUND: In a previous study of higher-level arthropod phylogeny, analyses of nucleotide sequences from 62 protein-coding nuclear genes for 80 panarthropod species yielded significantly higher bootstrap support for selected nodes than did amino acids. This study investigates the cause of that discrepancy. METHODOLOGY/PRINCIPAL FINDINGS: The hypothesis is tested that failure to distinguish the serine residues encoded by two disjunct clusters of codons (TCN, AGY) in amino acid analyses leads to this discrepancy. In one test, the two clusters of serine codons (Ser1, Ser2) are conceptually translated as separate amino acids. Analysis of the resulting 21-amino-acid data matrix shows striking increases in bootstrap support, in some cases matching that in nucleotide analyses. In a second approach, nucleotide and 20-amino-acid data sets are artificially altered through targeted deletions, modifications, and replacements, revealing the pivotal contributions of distinct Ser1 and Ser2 codons. We confirm that previous methods of coding nonsynonymous nucleotide change are robust and computationally efficient by introducing two new degeneracy coding methods. We demonstrate for degeneracy coding that neither compositional heterogeneity at the level of nucleotides nor codon usage bias between Ser1 and Ser2 clusters of codons (or their separately coded amino acids) is a major source of non-phylogenetic signal. CONCLUSIONS: The incongruity in support between amino-acid and nucleotide analyses of the aforementioned arthropod data set is resolved by showing that "standard" 20-amino-acid analyses yield lower node support specifically when serine provides crucial signal. Separate coding of Ser1 and Ser2 residues yields support commensurate with that found by degenerated nucleotides, without introducing phylogenetic artifacts.
While exclusion of all serine data leads to reduced support for serine-sensitive nodes, these nodes are still recovered in the ML topology, indicating that the enhanced signal from Ser1 and Ser2 is not qualitatively different from that of the other amino acids.
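
The 21-state recoding described in this abstract, in which serine encoded by TCN codons (Ser1) and by AGY codons (Ser2) is translated as two separate amino acids, can be sketched as follows. This is an illustrative sketch only and not the authors' software: the function name `translate_21` and the symbol "Z" chosen for Ser2 are assumptions, and the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SER2_SYMBOL = "Z"  # hypothetical 21st state for AGY serine; "S" is kept for TCN


def translate_21(seq):
    """Translate a coding sequence, keeping the two serine codon families apart.

    Codons that are incomplete or contain ambiguous bases are returned as "X".
    """
    seq = seq.upper()
    states = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE.get(codon, "X")
        # AGT and AGC (AGY) encode serine; recode them as the Ser2 state.
        if aa == "S" and codon.startswith("AG"):
            aa = SER2_SYMBOL
        states.append(aa)
    return "".join(states)


print(translate_21("TCTAGCATG"))  # TCT -> S (Ser1), AGC -> Z (Ser2), ATG -> M
```

A standard 20-amino-acid translation would give "SSM" for the same input, hiding the distinction between the two codon families that the abstract identifies as a source of signal.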

Subject(s)

Amino Acids/genetics , Arthropods/genetics , Codon/genetics , Genomics/methods , Nucleotides/genetics , Phylogeny , Serine/genetics , Animals , Databases, Genetic , Likelihood Functions , Models, Genetic , Terminology as Topic

15.

Molecular evolution of Na+ channels in teleost fishes.

Zakon, Harold H; Jost, Manda C; Zwickl, Derrick J; Lu, Ying; Hillis, David M.

Integr Zool ; 4(1): 64-74, 2009 Mar.

Article in English | MEDLINE | ID: mdl-21392277

ABSTRACT

Voltage-dependent sodium channels are critical for electrical excitability. Invertebrates possess a single sodium channel gene; two rounds of genome duplication early in vertebrates increased the number to four. Since the teleost-tetrapod split, independent gene duplications in each lineage have further increased the number of sodium channel genes to 10 in tetrapods and 8 in teleosts. Here we review how the occurrence of multiple sodium channel paralogs has influenced the evolutionary history of three groups of fishes: pufferfish, gymnotiform and mormyriform electric fish. Pufferfish (Tetraodontidae) produce a neurotoxin, tetrodotoxin, that binds to and blocks the pore of sodium channels. Pufferfish evolved resistance to their own toxins by amino acid substitutions in the pore of their sodium channels. These substitutions had to occur in parallel across multiple paralogs for organismal resistance to evolve. Gymnotiform and mormyriform fishes independently evolved electric organs to generate electricity for communication and object localization. Two sodium channel genes are expressed in muscle in most fishes. In both groups of weakly electric fishes, one gene lost its expression in muscle and became compartmentalized in the evolutionarily novel electric organ, which is a muscle derivative. This gene then evolved at elevated rates, whereas the gene that is still expressed in muscle does not show elevated rates of evolution. In the electric organ-expressing gene, amino acid substitutions occur in parts of the channel involved in determining how long the channel will be open or closed. The enhanced rate of sequence evolution of this gene likely underlies the species-level variations in the electric signal.

Subject(s)

Electric Fish/physiology; Evolution, Molecular; Sodium Channels/physiology; Tetraodontiformes/physiology; Amino Acid Sequence; Amino Acid Substitution; Animals; Drug Resistance/genetics; Electric Organ/physiology; Genes, Duplicate/genetics; Molecular Sequence Data; Muscle, Skeletal/metabolism; Phylogeny; Sequence Alignment; Sodium Channels/genetics; Tetrodotoxin/toxicity

16.

Evaluating the robustness of phylogenetic methods to among-site variability in substitution processes.

Holder, Mark T; Zwickl, Derrick J; Dessimoz, Christophe.

Philos Trans R Soc Lond B Biol Sci ; 363(1512): 4013-21, 2008 Dec 27.

Article in English | MEDLINE | ID: mdl-18852108

ABSTRACT

Computer simulations provide a flexible method for assessing the power and robustness of phylogenetic inference methods. Unfortunately, simulated data are often obviously atypical of data encountered in studies of molecular evolution. Unrealistic simulations can lead to conclusions that are irrelevant to real-data analyses or can provide a biased view of which methods perform well. Here, we present a software tool designed to generate data under a complex codon model that allows each residue in the protein sequence to have a different set of equilibrium amino acid frequencies. The software can obtain maximum-likelihood estimates of the parameters of the Halpern and Bruno model from empirical data and a fixed tree; given an arbitrary tree and a fixed set of parameters, the software can then simulate artificial datasets. We present the results of a simulation experiment using randomly generated tree shapes and substitution parameters estimated from 1610 mammalian cytochrome b sequences. We tested tree inference at the amino acid, nucleotide and codon levels and under parsimony, maximum-likelihood, Bayesian and distance criteria (for a total of more than 650 analyses on each dataset). Based on these simulations, nucleotide-level analyses seem to be more accurate than amino acid and codon analyses. The performance of distance-based phylogenetic methods appears to be quite sensitive to the choice of model and the form of rate heterogeneity used. Further studies are needed to assess the generality of these conclusions. For example, fitting parameters of the Halpern and Bruno model to sequences from other genes will reveal the extent to which our conclusions were influenced by the choice of cytochrome b. Incorporating codon bias and more sources of heterogeneity into the simulator will be crucial to determining whether the current results are caused by a bias in the current simulation study in favour of nucleotide analyses.

Subject(s)

Algorithms; Amino Acid Substitution/genetics; Classification/methods; Evolution, Molecular; Models, Genetic; Phylogeny; Bayes Theorem; Codon/genetics; Computer Simulation; Cytochromes b/genetics; Likelihood Functions

17.

Molecular evolution of communication signals in electric fish.

Zakon, Harold H; Zwickl, Derrick J; Lu, Ying; Hillis, David M.

J Exp Biol ; 211(Pt 11): 1814-8, 2008 Jun.

Article in English | MEDLINE | ID: mdl-18490397

ABSTRACT

Animal communication systems are subject to natural selection so the imprint of selection must reside in the genome of each species. Electric fish generate electric organ discharges (EODs) from a muscle-derived electric organ (EO) and use these fields for electrolocation and communication. Weakly electric teleosts have evolved at least twice (mormyriforms, gymnotiforms) allowing a comparison of the workings of evolution in two independently evolved sensory/motor systems. We focused on the genes for two Na(+) channels, Nav1.4a and Nav1.4b, which are orthologs of the mammalian muscle-expressed Na(+) channel gene Nav1.4. Both genes are expressed in muscle in non-electric fish. Nav1.4b is expressed in muscle in electric fish, but Nav1.4a expression has been lost from muscle and gained in the evolutionarily novel EO in both groups. We hypothesized that Nav1.4a might be evolving to optimize the EOD for different sensory environments and the generation of species-specific communication signals. We obtained the sequence for Nav1.4a from non-electric, mormyriform and gymnotiform species, estimated a phylogenetic tree, and determined rates of evolution. We observed elevated rates of evolution in this gene in both groups coincident with the loss of Nav1.4a from muscle and its compartmentalization in EO. We found amino acid substitutions at sites known to be critical for channel inactivation; analyses suggest that these changes are likely to be the result of positive selection. We suggest that the diversity of EOD waveforms in both groups of electric fish is correlated with accelerations in the rate of evolution of the Nav1.4a Na(+) channel gene due to changes in selection pressure on the gene once it was solely expressed in the EO.

Subject(s)

Animal Communication; Electric Fish/genetics; Evolution, Molecular; Amino Acid Sequence; Animals; Fish Proteins/chemistry; Fish Proteins/genetics; Fish Proteins/physiology; Molecular Sequence Data; Muscle Proteins/chemistry; Muscle Proteins/genetics; Muscle Proteins/physiology; Phylogeny; Selection, Genetic; Sequence Alignment; Sodium Channels/chemistry; Sodium Channels/genetics; Sodium Channels/physiology; Species Specificity

18.

Sodium channel genes and the evolution of diversity in communication signals of electric fishes: convergent molecular evolution.

Zakon, Harold H; Lu, Ying; Zwickl, Derrick J; Hillis, David M.

Proc Natl Acad Sci U S A ; 103(10): 3675-80, 2006 Mar 07.

Article in English | MEDLINE | ID: mdl-16505358

ABSTRACT

We investigated whether the evolution of electric organs and electric signal diversity in two independently evolved lineages of electric fishes was accompanied by convergent changes on the molecular level. We found that a sodium channel gene (Na(v)1.4a) that is expressed in muscle in nonelectric fishes has lost its expression in muscle and is expressed instead in the evolutionarily novel electric organ in both lineages of electric fishes. This gene appears to be evolving under positive selection in both lineages, facilitated by its restricted expression in the electric organ. This view is reinforced by the lack of evidence for selection on this gene in one electric species in which expression of this gene is retained in muscle. Amino acid replacements occur convergently in domains that influence channel inactivation, a key trait for shaping electric communication signals. Some amino acid replacements occur at or adjacent to sites at which disease-causing mutations have been mapped in human sodium channel genes, emphasizing that these replacements occur in functionally important domains. Selection appears to have acted on the final step in channel inactivation, but complementarily on the inactivation "ball" in one lineage, and its receptor site in the other lineage. Thus, changes in the expression and sequence of the same gene are associated with the independent evolution of signal complexity.

Subject(s)

Electric Fish/genetics; Evolution, Molecular; Sodium Channels/genetics; Amino Acid Sequence; Animals; Electric Fish/classification; Electric Organ/metabolism; Fishes/classification; Fishes/genetics; Gymnotiformes/classification; Gymnotiformes/genetics; Humans; Molecular Sequence Data; Phylogeny; Sequence Homology, Amino Acid; Signal Transduction/genetics; Sodium Channels/chemistry; Species Specificity

19.

Model parameterization, prior distributions, and the general time-reversible model in Bayesian phylogenetics.

Zwickl, Derrick; Holder, Mark.

Syst Biol ; 53(6): 877-88, 2004 Dec.

Article in English | MEDLINE | ID: mdl-15764557

ABSTRACT

Bayesian phylogenetic methods require the selection of prior probability distributions for all parameters of the model of evolution. These distributions allow one to incorporate prior information into a Bayesian analysis, but even in the absence of meaningful prior information, a prior distribution must be chosen. In such situations, researchers typically seek to choose a prior that will have little effect on the posterior estimates produced by an analysis, allowing the data to dominate. Sometimes a prior that is uniform (assigning equal prior probability density to all points within some range) is chosen for this purpose. In reality, the appropriate prior depends on the parameterization chosen for the model of evolution, a choice that is largely arbitrary. There is an extensive Bayesian literature on appropriate prior choice, and it has long been appreciated that there are parameterizations for which uniform priors can have a strong influence on posterior estimates. We here discuss the relationship between model parameterization and prior specification, using the general time-reversible model of nucleotide evolution as an example. We present Bayesian analyses of 10 simulated data sets obtained using a variety of prior distributions and parameterizations of the general time-reversible model. Uniform priors can produce biased parameter estimates under realistic conditions, and a variety of alternative priors avoid this bias.

Subject(s)

Models, Theoretical; Phylogeny; Animals; Bayes Theorem; Biological Evolution; Time Factors

20.

Increased taxon sampling greatly reduces phylogenetic error.

Zwickl, Derrick J; Hillis, David M.

Syst Biol ; 51(4): 588-98, 2002 Aug.

Article in English | MEDLINE | ID: mdl-12228001

ABSTRACT

Several authors have argued recently that extensive taxon sampling has a positive and important effect on the accuracy of phylogenetic estimates. However, other authors have argued that there is little benefit of extensive taxon sampling, and so phylogenetic problems can or should be reduced to a few exemplar taxa as a means of reducing the computational complexity of the phylogenetic analysis. In this paper we examined five aspects of study design that may have led to these different perspectives. First, we considered the measurement of phylogenetic error across a wide range of taxon sample sizes, and conclude that the expected error based on randomly selecting trees (which varies by taxon sample size) must be considered in evaluating error in studies of the effects of taxon sampling. Second, we addressed the scope of the phylogenetic problems defined by different samples of taxa, and argue that phylogenetic scope needs to be considered in evaluating the importance of taxon-sampling strategies. Third, we examined the claim that fast and simple tree searches are as effective as more thorough searches at finding near-optimal trees that minimize error. We show that a more complete search of tree space reduces phylogenetic error, especially as the taxon sample size increases. Fourth, we examined the effects of simple versus complex simulation models on taxonomic sampling studies. Although benefits of taxon sampling are apparent for all models, data generated under more complex models of evolution produce higher overall levels of error and show greater positive effects of increased taxon sampling. Fifth, we asked if different phylogenetic optimality criteria show different effects of taxon sampling. Although we found strong differences in effectiveness of different optimality criteria as a function of taxon sample size, increased taxon sampling improved the results from all the common optimality criteria. Nonetheless, the method that showed the lowest overall performance (minimum evolution) also showed the least improvement from increased taxon sampling. Taking each of these results into account reinforces the conclusion that increased sampling of taxa is one of the most important ways to increase overall phylogenetic accuracy.

Subject(s)

Phylogeny; Research Design; Likelihood Functions
