Results 1 - 20 of 73
1.
Syst Biol ; 71(3): 721-740, 2022 04 19.
Article in English | MEDLINE | ID: mdl-34677617

ABSTRACT

A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin-Sokal parsimony (as well as an approach based on MDC) typically fail to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e., local posterior probabilities). We examined the accuracy of branch length estimation, finding that estimated lengths were accurate for short branches but upwardly biased otherwise. 
This led us to derive the maximum likelihood (branch length) estimate for when RIs are given as input instead of binary gene trees; this corrected formula produced accurate estimates of branch lengths in our simulation study provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100,000 parsimony-informative RIs. We found that, when given just 1000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e., clades separated by branches >0.3 coalescent units) with high support and identified rapid radiations (i.e., shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false positive branches in these scenarios. [Coalescence; incomplete lineage sorting; Laurasiatheria; Palaeognathae; parsimony; polymorphism parsimony; retroelement insertions; species trees; transposon.].
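ASTRAL_BP takes RIs, rather than binary gene trees, as input. As a rough illustration of what such input represents, the Python sketch below encodes one RI locus as an unrooted bipartition. It assumes presence/absence is scored 1/0 and missing data as `None`. The function name and data layout are hypothetical and are not the input specification of ASTRAL_BP or SDPquartets. Loci with fewer than two taxa on either side are skipped because they are not parsimony-informative.

```python
from typing import Dict, Optional


def ri_to_bipartition(states: Dict[str, Optional[int]]) -> Optional[str]:
    """Encode one RI locus as an unrooted Newick bipartition (hypothetical helper).

    states maps taxon -> 1 (insertion present), 0 (absent) or None (missing).
    Taxa with missing data are left out of the bipartition.
    """
    present = sorted(t for t, s in states.items() if s == 1)
    absent = sorted(t for t, s in states.items() if s == 0)
    # A split needs at least two taxa per side to inform any quartet.
    if len(present) < 2 or len(absent) < 2:
        return None
    return f"(({','.join(present)}),({','.join(absent)}));"


print(ri_to_bipartition({"A": 1, "B": 1, "C": 0, "D": 0, "E": None}))  # prints ((A,B),(C,D));
```

Each informative locus then contributes exactly one split, and a quartet method counts the four-taxon subsets that this split resolves.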


Subject(s)
Palaeognathae , Retroelements , Animals , Computer Simulation , Models, Genetic , Phylogeny , Retroelements/genetics
2.
Cladistics ; 39(5): 418-436, 2023 10.
Article in English | MEDLINE | ID: mdl-37096985

ABSTRACT

Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
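The difference between standard RF distances and counting only contradictory clades can be sketched in Python. This minimal version assumes rooted trees represented as sets of clades (frozensets of taxon names). It is not the congsort TNT script, it omits the OSR measure, and the helper names are hypothetical. The exact tally used in the study may differ. The sketch shows that a clade missing from a partially collapsed tree raises the RF distance but adds no contradictions.

```python
from typing import FrozenSet, Set

Clade = FrozenSet[str]


def compatible(a: Clade, b: Clade) -> bool:
    # Rooted clades can coexist in one tree only if nested or disjoint.
    return a <= b or b <= a or not (a & b)


def rf_distance(t1: Set[Clade], t2: Set[Clade]) -> int:
    # Standard RF: clades present in exactly one of the two trees.
    return len(t1 ^ t2)


def rf_contradictions(t1: Set[Clade], t2: Set[Clade]) -> int:
    # Count only clades that conflict with some clade of the other tree,
    # so clades absent because of a polytomy are not penalized.
    c1 = sum(1 for a in t1 if any(not compatible(a, b) for b in t2))
    c2 = sum(1 for b in t2 if any(not compatible(b, a) for a in t1))
    return c1 + c2


# Single-letter taxa, so frozenset("AB") == {"A", "B"}.
resolved = {frozenset("AB"), frozenset("ABC")}
collapsed = {frozenset("AB")}  # clade ABC collapsed into a polytomy
conflicting = {frozenset("AC"), frozenset("ABC")}
print(rf_distance(resolved, collapsed), rf_contradictions(resolved, collapsed))      # 1 0
print(rf_distance(resolved, conflicting), rf_contradictions(resolved, conflicting))  # 2 2
```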


Subject(s)
Artifacts , Phylogeny , Likelihood Functions
3.
Mol Phylogenet Evol ; 171: 107463, 2022 06.
Article in English | MEDLINE | ID: mdl-35358696

ABSTRACT

The loss of teeth and evolution of baleen racks in Mysticeti was a profound transformation that permitted baleen whales to radiate and diversify into a previously underutilized ecological niche of bulk filter-feeding on zooplankton and other small prey. Ancestral state reconstructions suggest that postnatal teeth were lost in the common ancestor of crown Mysticeti. Genomic studies provide some support for this hypothesis and suggest that the genetic toolkit for enamel production was inactivated in the common ancestor of living baleen whales. However, molecular studies to date have not provided direct evidence for the complete loss of teeth, including their dentin component, on the stem mysticete branch. Given these results, several questions remain unanswered: (1) Were teeth lost in a single step or did enamel loss precede dentin loss? (2) Was enamel lost early or late on the stem mysticete branch? (3) If enamel and dentin/tooth loss were decoupled in the ancestry of baleen whales, did dentin loss occur on the stem mysticete branch or independently in different crown mysticete lineages? To address these outstanding questions, we compiled and analyzed complete protein-coding sequences for nine tooth-related genes from cetaceans with available genome data. Seven of these genes are associated with enamel formation (ACP4, AMBN, AMELX, AMTN, ENAM, KLK4, MMP20) whereas two other genes are either dentin-specific (DSPP) or tooth-specific (ODAPH) but not enamel-specific. Molecular evolutionary analyses indicate that all seven enamel-specific genes have inactivating mutations that are scattered across branches of the mysticete tree. Three of the enamel genes (ACP4, KLK4, MMP20) have inactivating mutations that are shared by all mysticetes. 
The two genes that are dentin-specific (DSPP) or tooth-specific (ODAPH) do not have any inactivating mutations that are shared by all mysticetes, but there are shared mutations in Balaenidae as well as in Plicogulae (Neobalaenidae + Balaenopteroidea). These shared mutations suggest that teeth were lost at most two times. Shared inactivating mutations and dN/dS analyses, in combination with cetacean divergence times, were used to estimate inactivation times of genes and, by proxy, of enamel and tooth phenotypes at ancestral nodes. The results of these analyses are most compatible with a two-step model for the loss of teeth in the ancestry of living baleen whales: enamel was lost very early on the stem Mysticeti branch, followed by the independent loss of dentin (and teeth) in the common ancestors of Balaenidae and Plicogulae, respectively. These results imply that some stem mysticetes, and even early crown mysticetes, may have had vestigial teeth composed of dentin with no enamel. Our results also demonstrate that all odontocete species (in our study) with absent or degenerative enamel have inactivating mutations in one or more of their enamel genes.


Subject(s)
Biological Evolution , Matrix Metalloproteinase 20 , Animals , Dental Enamel , Matrix Metalloproteinase 20/genetics , Phylogeny , Whales/genetics
4.
Mol Phylogenet Evol ; 167: 107344, 2022 02.
Article in English | MEDLINE | ID: mdl-34748873

ABSTRACT

Phylogenomic analyses of ancient rapid radiations can produce conflicting results that are driven by differential sampling of taxa and characters as well as the limitations of alternative analytical methods. We re-examine basal relationships of palaeognath birds (ratites and tinamous) using recently published datasets of nucleotide characters from 20,850 loci as well as 4301 retroelement insertions. The original studies attributed conflicting resolutions of rheas in their inferred coalescent and concatenation trees to concatenation failing in the anomaly zone. By contrast, we find that the coalescent-based resolution of rheas is premised upon extensive gene-tree estimation errors. Furthermore, retroelement insertions contain much more conflict than originally reported and multiple insertion loci support the basal position of rheas found in concatenation trees, while none were reported in the original publication. We demonstrate how even remarkable congruence in phylogenomic studies may be driven by long-branch misplacement of a divergent outgroup, highly incongruent gene trees, differential taxon sampling that can result in gene-tree misrooting errors that bias species-tree inference, and gross homology errors. What was previously interpreted as broad, robustly supported corroboration for a single resolution in coalescent analyses may instead indicate a common bias that taints phylogenomic results across multiple genome-scale datasets. The updated retroelement dataset now supports a species tree with branch lengths that suggest an ancient anomaly zone, and both concatenation and coalescent analyses of the huge nucleotide datasets fail to yield coherent, reliable results in this challenging phylogenetic context.


Subject(s)
Birds , Genome , Animals , Birds/genetics , Phylogeny
5.
Proc Biol Sci ; 288(1961): 20211213, 2021 10 27.
Article in English | MEDLINE | ID: mdl-34702078

ABSTRACT

The deep sea has been described as the last major ecological frontier, as much of its biodiversity is yet to be discovered and described. Beaked whales (ziphiids) are among the most visible inhabitants of the deep sea, due to their large size and worldwide distribution, yet their taxonomic diversity and much about their natural history remain poorly understood. We combine genomic and morphometric analyses to reveal a new Southern Hemisphere ziphiid species, Ramari's beaked whale, Mesoplodon eueu, whose name is linked to the Indigenous peoples of the lands from which the species holotype and paratypes were recovered. Mitogenome and ddRAD-derived phylogenies demonstrate reciprocally monophyletic divergence between M. eueu and True's beaked whale (M. mirus) from the North Atlantic, with which it was previously subsumed. Morphometric analyses of skulls also distinguish the two species. A time-calibrated mitogenome phylogeny and analysis of two nuclear genomes indicate divergence began circa 2 million years ago (Ma), with geneflow ceasing 0.35-0.55 Ma. This is an example of how deep sea biodiversity can be unravelled through increasing international collaboration and genome sequencing of archival specimens. Our consultation and involvement with Indigenous peoples offers a model for broadening the cultural scope of the scientific naming process.


Subject(s)
Genomics , Whales , Animals , Cell Nucleus , Phylogeny , Whales/anatomy & histology , Whales/genetics
6.
Mol Phylogenet Evol ; 158: 107092, 2021 05.
Article in English | MEDLINE | ID: mdl-33545272

ABSTRACT

In two-step coalescent analyses of phylogenomic data, gene-tree topologies are treated as fixed prior to species-tree inference. Although all gene-tree conflict is assumed to be caused by lineage sorting when applying these methods, in empirical datasets much of the conflict can be caused by estimation error. Weakly supported and even arbitrarily resolved clades are important sources of this estimation error for gene trees inferred from few informative characters relative to the number of sampled terminals, and the resulting extraneous conflict among gene trees can negatively impact species-tree inference. In this study, we quantified the relative severity of alternative methods for collapsing gene-tree branches for seven empirical datasets and quantified their effects on species-tree inference. The branch-collapsing methods that we employed were based on the strict consensus of optimal topologies, various bootstrap thresholds, and 0% approximate likelihood ratio test (SH-like aLRT) support. Up to 86% of internal gene-tree branches are dubiously or arbitrarily resolved in reanalyses of these published phylogenomic datasets, and collapsing these branches increased inferred species-tree coalescent branch lengths by up to 455%. For two datasets, the longer inferred branch lengths sometimes impacted inference of anomaly-zone conditions. Although branch-collapsing methods did not consistently affect the species-tree topology, they often increased branch support. The more severe and clearly justified gene-tree branch-collapsing methods, which we recommend be broadly applied for two-step coalescent analyses, are the use of strict consensus trees in parsimony analyses and the collapse of clades with 0% SH-like aLRT support in likelihood analyses. Collapsing dubiously or arbitrarily resolved branches in gene trees sometimes improved congruence between coalescent-based results and concatenation trees.
In such cases, we contend that the resolution provided by concatenation should be preferred and that incomplete lineage sorting is a poor explanation for the initial conflict between phylogenetic approaches.
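The quartet-based link between gene-tree conflict and coalescent branch length can be illustrated with the standard multispecies-coalescent relation P(concordant quartet) = 1 - (2/3)e^(-t), solved for the internal branch length t. The Python sketch below is a simplification under that assumption; real summary-method implementations differ in details such as the handling of zero counts. The function name and the counts in the example are hypothetical. It shows why removing discordant quartets that come from arbitrarily resolved gene-tree branches tends to lengthen inferred branches.

```python
import math


def quartet_branch_length(n_concordant: int, n_resolved: int) -> float:
    """Internal branch length t (coalescent units) from quartet counts.

    Inverts P(concordant) = 1 - (2/3) * exp(-t), i.e. t = -ln(1.5 * (1 - p)).
    """
    if n_resolved <= 0:
        raise ValueError("need at least one resolved quartet")
    p = n_concordant / n_resolved
    if p >= 1.0:
        return math.inf  # no discordance observed: the estimate is unbounded
    if p <= 1 / 3:
        return 0.0  # at or below the star-tree expectation
    return -math.log(1.5 * (1.0 - p))


print(round(quartet_branch_length(60, 100), 2))  # 0.51 (all resolved quartets)
print(round(quartet_branch_length(60, 80), 2))   # 0.98 (20 discordant quartets dropped)
```

In this toy case, dropping 20 discordant quartets raises the estimate from about 0.51 to about 0.98 coalescent units, even though the number of concordant quartets is unchanged.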


Subject(s)
Models, Genetic , Animals , Birds/genetics , Likelihood Functions , Lizards , Phylogeny , Sciuridae/genetics , Software
7.
Mol Phylogenet Evol ; 152: 106924, 2020 11.
Article in English | MEDLINE | ID: mdl-32771548

ABSTRACT

Extant species in the order Crocodylia are remnants of an ancient lineage of large-bodied archosaur reptiles. Despite decades of systematic studies, phylogenetic relationships among members of the genus Crocodylus (true crocodiles) in the Neotropics are poorly understood. Here we estimated phylogenomic relationships among the four extant Crocodylus species in the Americas. Species-tree reconstructions using genotypic data from 17,538 SNPs collected for 33 individuals spanning six Crocodylus species (four ingroup and two outgroup) revealed novel relationships for all Neotropical species. For the first time, C. acutus, the American crocodile, was recovered as monophyletic when individuals from Antillean and continental populations were analyzed together. Our results also contradict previous inferences based on mitochondrial DNA data and a limited number of nuclear markers by robustly grouping Morelet's crocodile (C. moreletii) as the sister species to C. acutus, suggesting a novel phylogeographic hypothesis for the group. The present study underscores the importance of using nuclear genome-wide information and representative sampling for resolving phylogenetic relationships, especially in broadly distributed species and those with complex evolutionary histories.


Subject(s)
Alligators and Crocodiles/classification , Alligators and Crocodiles/genetics , Phylogeny , Americas , Animals , DNA, Mitochondrial/genetics , Genotype , Phylogeography
8.
J Hered ; 111(2): 147-168, 2020 04 02.
Article in English | MEDLINE | ID: mdl-31837265

ABSTRACT

DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the "no intralocus-recombination" assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with two ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaenopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.
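Under incomplete lineage sorting alone, the two minority resolutions of a quartet are expected at equal frequencies, so a marked imbalance between them is one signal of introgression. The Python sketch below tests that idea with an exact two-sided binomial test. It is an assumed stand-in and not necessarily the statistic used in the Quartet-Asymmetry test described above. The function name is hypothetical. When many quartets are tested, the p-values would also need a multiple-testing correction.

```python
from math import comb


def minor_quartet_asymmetry_p(n_minor1: int, n_minor2: int) -> float:
    """Two-sided exact binomial p-value for H0: both minor quartet
    resolutions are equally frequent, as expected under ILS alone."""
    n = n_minor1 + n_minor2
    if n == 0:
        return 1.0
    k = min(n_minor1, n_minor2)
    # Probability of a split at least this lopsided toward the rarer resolution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


print(minor_quartet_asymmetry_p(10, 10))  # 1.0 (balanced)
print(minor_quartet_asymmetry_p(0, 10))   # 0.001953125 (strongly asymmetric)
```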


Subject(s)
Genetic Speciation , Models, Genetic , Retroelements , Vertebrates/genetics , Animals , DNA Transposable Elements , Hybridization, Genetic , Phylogeny
9.
BMC Evol Biol ; 19(1): 31, 2019 01 23.
Article in English | MEDLINE | ID: mdl-30674270

ABSTRACT

BACKGROUND: The gene for odontogenic ameloblast-associated (ODAM) is a member of the secretory calcium-binding phosphoprotein gene family. ODAM is primarily expressed in dental tissues including the enamel organ and the junctional epithelium, and may also have pleiotropic functions that are unrelated to teeth. Here, we leverage the power of natural selection to test competing hypotheses that ODAM is tooth-specific versus pleiotropic. Specifically, we compiled and screened complete protein-coding sequences, plus sequences for flanking intronic regions, for ODAM in 165 placental mammals to determine if this gene contains inactivating mutations in lineages that either lack teeth (baleen whales, pangolins, anteaters) or lack enamel on their teeth (aardvarks, sloths, armadillos), as would be expected if the only essential functions of ODAM are related to tooth development and the adhesion of the gingival junctional epithelium to the enamel tooth surface. RESULTS: We discovered inactivating mutations in all species of placental mammals that either lack teeth or lack enamel on their teeth. A surprising result is that ODAM is also inactivated in a few additional lineages including all toothed whales that were examined. We hypothesize that ODAM inactivation is related to the simplified outer enamel surface of toothed whales. An alternate hypothesis is that ODAM inactivation in toothed whales may be related to altered antimicrobial functions of the junctional epithelium in aquatic habitats. Selection analyses on ODAM sequences revealed that the composite dN/dS value for pseudogenic branches is close to 1.0 as expected for a neutrally evolving pseudogene. dN/dS values on transitional branches were used to estimate ODAM inactivation times. In the case of pangolins, ODAM was inactivated ~65 million years ago, which is older than the oldest pangolin fossil (Eomanis, 47 Ma) and suggests an even more ancient loss or simplification of teeth in this lineage.
CONCLUSION: Our results validate the hypothesis that the only essential functions of ODAM that are maintained by natural selection are related to tooth development and/or the maintenance of a healthy junctional epithelium that attaches to the enamel surface of teeth.
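Inactivation times from dN/dS on transitional branches are commonly estimated by treating the branch's dN/dS as a time-weighted average of a functional value and a neutral pseudogene value of 1, then solving for the pseudogenic portion of the branch. The Python sketch below assumes that model. The function and parameter names and the example numbers are hypothetical, ages are in millions of years, and the divergence times would come from a separate timetree.

```python
def inactivation_age(omega_branch: float, omega_functional: float,
                     branch_duration: float, younger_node_age: float,
                     omega_pseudo: float = 1.0) -> float:
    """Age (Ma) at which a gene became a pseudogene on a transitional branch.

    Assumed model: omega_branch = (Ts/T) * omega_functional + (Tp/T) * omega_pseudo,
    with Ts + Tp = T = branch_duration. Solving for Tp gives
    Tp = T * (omega_branch - omega_functional) / (omega_pseudo - omega_functional).
    """
    if not omega_functional < omega_pseudo:
        raise ValueError("functional dN/dS must be below the pseudogene value")
    if not omega_functional <= omega_branch <= omega_pseudo:
        raise ValueError("branch dN/dS must lie between the two end-member values")
    tp = branch_duration * (omega_branch - omega_functional) / (omega_pseudo - omega_functional)
    # The pseudogenic segment is the most recent part of the branch.
    return younger_node_age + tp


print(round(inactivation_age(0.6, 0.2, 10.0, 20.0), 2))  # 25.0
```

With a functional dN/dS of 0.2, a transitional-branch dN/dS of 0.6, a 10-Myr branch and a younger node at 20 Ma, half of the branch is inferred to be pseudogenic, which places inactivation at 25 Ma.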


Subject(s)
Ameloblasts/metabolism , Dental Enamel/metabolism , Eutheria/genetics , Gene Silencing , Odontogenesis , Proteins/genetics , Whales/genetics , Animals , Base Sequence , Bayes Theorem , Codon/genetics , Female , Fossils , Likelihood Functions , Mutation/genetics , Phylogeny , Pregnancy , Proteins/metabolism
10.
Mol Phylogenet Evol ; 131: 80-92, 2019 02.
Article in English | MEDLINE | ID: mdl-30391518

ABSTRACT

In summary ("two-step") coalescent analyses of empirical data, researchers typically apply the bootstrap to quantify branch support for clades inferred on the optimal species tree. We tested whether site-wise bootstrap analyses provide consistently more conservative support than gene-wise bootstrap analyses. We did so using data from three empirical phylogenomic studies and employed four coalescent methods (ASTRAL, MP-EST, NJst, and STAR). We demonstrate that application of site-wise bootstrapping generally resulted in gene trees with substantial additional conflicts relative to the original data and this approach therefore cannot be relied upon to provide conservative support. Instead, the site-wise bootstrap can provide high support for apparently incorrect clades. We provide a script (https://github.com/dbsloan/msc_tree_resampling) that implements gene-wise resampling, using either the bootstrap or the jackknife, for use with ASTRAL, MP-EST, NJst, and STAR. We demonstrate that the gene-wise bootstrap outperformed the site-wise bootstrap for the primary focal clades for all four coalescent methods that were applied to all three empirical studies. For summary coalescent analyses we suggest that gene-wise resampling support should be favored over gene + site or site-wise resampling when numerous genes are sampled because site-wise resampling causes substantially greater gene-tree-estimation error.
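Gene-wise resampling draws whole gene trees with replacement (bootstrap) or without replacement (jackknife), whereas site-wise resampling redraws alignment columns and re-estimates every gene tree. A minimal Python sketch of the gene-wise version follows. It is not the msc_tree_resampling script linked above, whose interface is not reproduced here; the function names and the 50% jackknife fraction are assumptions.

```python
import random
from typing import List, Sequence


def genewise_bootstrap(gene_trees: Sequence[str], n_reps: int,
                       seed: int = 1) -> List[List[str]]:
    """Resample whole gene trees with replacement; the gene trees
    themselves are not re-estimated."""
    rng = random.Random(seed)
    return [rng.choices(gene_trees, k=len(gene_trees)) for _ in range(n_reps)]


def genewise_jackknife(gene_trees: Sequence[str], n_reps: int,
                       keep_fraction: float = 0.5, seed: int = 1) -> List[List[str]]:
    """Keep a random subset of gene trees (without replacement) per replicate."""
    rng = random.Random(seed)
    k = max(1, round(keep_fraction * len(gene_trees)))
    return [rng.sample(list(gene_trees), k) for _ in range(n_reps)]


trees = ["(A,(B,(C,D)));", "(A,(C,(B,D)));", "(A,(B,(C,D)));", "((A,B),(C,D));"]
for rep in genewise_bootstrap(trees, n_reps=2):
    print(rep)  # each replicate would be summarized separately, e.g. with ASTRAL
```

Because the input gene trees are reused unchanged, gene-wise replicates add no new gene-tree-estimation error, which is the property the abstract contrasts with site-wise resampling.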


Subject(s)
Genes , Phylogeny , Empirical Research , Probability , Software
11.
Mol Phylogenet Evol ; 139: 106539, 2019 10.
Article in English | MEDLINE | ID: mdl-31226465

ABSTRACT

Genomic datasets sometimes support conflicting phylogenetic relationships when different tree-building methods are applied. Coherent interpretations of such results are enabled by partitioning support for controversial relationships among the constituent genes of a phylogenomic dataset. For the supermatrix (=concatenation) approach, several methods that measure the distribution of support and conflict among loci were introduced over 15 years ago. More recently, partitioned coalescence support (PCS) was developed for phylogenetic coalescence methods that account for incomplete lineage sorting and use the summed fits of gene trees to estimate the species tree. Here, we automate computation of PCS to permit application of this index to genome-scale matrices that include hundreds of loci. Reanalyses of four phylogenomic datasets for amniotes, land plants, skinks, and angiosperms demonstrate how PCS scores can be used to: (1) compare conflicting results favored by alternative coalescence methods, (2) identify outlier gene trees that have a disproportionate influence on the resolution of contentious relationships, (3) assess the effects of missing data in species-tree analysis, and (4) clarify biases in commonly implemented coalescence methods and support indices. We show that key phylogenomic conclusions from these analyses often hinge on just a few gene trees and that results can be driven by specific biases of a particular coalescence method and/or the differential weight placed on gene trees with high versus low taxon sampling. The attribution of exceptionally high weight to some gene trees and very low weight to other gene trees counters the basic logic of phylogenomic coalescence analysis; even clades in species trees with high support according to commonly used indices (likelihood-ratio test, bootstrap, Bayesian local posterior probability) can be unstable to the removal of only one or two gene trees with high PCS.
Computer simulations cannot adequately describe all of the contingencies and complexities of empirical genetic data. PCS scores complement simulation work by providing specific insights into a particular dataset given the assumptions of the phylogenetic coalescence method that is applied. In combination with standard measures of nodal support, PCS provides a more complete understanding of the overall genomic evidence for contested evolutionary relationships in species trees.


Subject(s)
Phylogeny , Animals , Bayes Theorem , Bias , Biological Evolution , Computer Simulation , Genes , Genomics , Lizards/classification , Lizards/genetics , Magnoliopsida/classification , Magnoliopsida/genetics , Plants/classification , Plants/genetics , Probability
12.
Mol Phylogenet Evol ; 120: 364-374, 2018 03.
Article in English | MEDLINE | ID: mdl-29277542

ABSTRACT

MC5R is one of five melanocortin receptor genes found in placental mammals. MC5R plays an important role in energy homeostasis and is also expressed in the terminal differentiation of sebaceous glands. Among placental mammals there are multiple lineages that either lack or have degenerative sebaceous glands including Cetacea (whales, dolphins, and porpoises), Hippopotamidae (hippopotamuses), Sirenia (manatees and dugongs), Proboscidea (elephants), Rhinocerotidae (rhinos), and Heterocephalus glaber (naked mole rat). Given the loss or diminution of sebaceous glands in these taxa, we procured MC5R sequences from publicly available genomes and transcriptomes, supplemented by a newly generated sequence for Choeropsis liberiensis (pygmy hippopotamus), to determine if this gene remains intact or is inactivated in association with loss/reduction of sebaceous glands. Our data set includes complete MC5R sequences for 114 placental mammal species including two individuals of Mammuthus primigenius (woolly mammoth) from Oimyakon and Wrangel Island. Complete loss or inactivation of the MC5R gene occurs in multiple placental lineages that have lost sebaceous glands (Cetacea, West Indian manatee, African elephant, white rhinoceros) or are characterized by unusual skin (pangolins, aardvarks). Both M. primigenius individuals share inactivating mutations with the African elephant even though sebaceous glands have been reported in the former. MC5R remains intact in hippopotamuses and the naked mole rat, although slightly elevated dN/dS ratios in these lineages allow for the possibility that the accumulation of inactivating mutations in MC5R may lag behind the relaxation of purifying selection. For Cetacea and Hippopotamidae, the absence of shared inactivating mutations in two different skin genes (MC5R, PSORS1C2) is consistent with the hypothesis that semi-aquatic lifestyles were acquired independently in these clades following divergence from a common ancestor.


Subject(s)
Evolution, Molecular , Mammals/metabolism , Receptors, Melanocortin/genetics , Sebaceous Glands/physiology , Animals , Base Sequence , Databases, Genetic , Female , Mammals/classification , Phylogeny , Placenta/metabolism , Pregnancy , Receptors, Melanocortin/classification , Sequence Alignment
13.
J Hered ; 109(3): 297-307, 2018 03 16.
Article in English | MEDLINE | ID: mdl-29077895

ABSTRACT

Homology is perhaps the most central concept of phylogenetic biology. At difficult-to-resolve polytomies that are deep in the Tree of Life, a few homology errors in phylogenomic data can drive spurious phylogenetic results. Feijoo and Parada (2017) assembled three phylogenomic data sets for mammals and reported methodological discrepancies and unexpected results that contradict the monophyly of well-established clades in Pinnipedia and Yangochiroptera. Examination of Feijoo and Parada's (2017) data sets reveals extensive homology errors (paralogous sequences, alignments of different exons to each other) and cross-contamination of sequences from different species. These problems predictably result in distorted estimates of gene trees, species trees, bootstrap support, and branch lengths. Correction of these errors resulted in robust support for conventional relationships in Pinnipedia and Yangochiroptera. Phylogenomic data sets are not immune to the problems of homology errors in sequence alignments. Rather, sequence alignments underlie all inferences in molecular phylogenetics and evolution and should be spot-checked for obvious errors via manual inspection of alignments and gene trees.


Subject(s)
Caniformia/genetics , Chiroptera/genetics , Phylogeny , Animals , Carnivora/genetics , Databases, Genetic , Datasets as Topic , Exons , Likelihood Functions , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data
14.
Mol Phylogenet Evol ; 109: 375-387, 2017 04.
Article in English | MEDLINE | ID: mdl-28193458

ABSTRACT

Various toothed whales (Odontoceti) are unique among mammals in lacking olfactory bulbs as adults and are thought to be anosmic (lacking the olfactory sense). At the molecular level, toothed whales have high percentages of pseudogenic olfactory receptor genes, but species that have been investigated to date retain an intact copy of the olfactory marker protein gene (OMP), which is highly expressed in olfactory receptor neurons and may regulate the temporal resolution of olfactory responses. One hypothesis for the retention of intact OMP in diverse odontocete lineages is that this gene is pleiotropic with additional functions that are unrelated to olfaction. Recent expression studies provide some support for this hypothesis. Here, we report OMP sequences for representatives of all extant cetacean families and provide the first molecular evidence for inactivation of this gene in vertebrates. Specifically, OMP exhibits independent inactivating mutations in six different odontocete lineages: four river dolphin genera (Platanista, Lipotes, Pontoporia, Inia), sperm whale (Physeter), and harbor porpoise (Phocoena). These results suggest that the only essential role of OMP that is maintained by natural selection is in olfaction, although a non-olfactory role for OMP cannot be ruled out for lineages that retain an intact copy of this gene. Available genome sequences from cetaceans and close outgroups provide evidence of inactivating mutations in two additional genes (CNGA2, CNGA4), which imply further pseudogenization events in the olfactory cascade of odontocetes. Selection analyses demonstrate that evolutionary constraints on all three genes (OMP, CNGA2, CNGA4) have been greatly reduced in Odontoceti, but retain a signature of purifying selection on the stem Cetacea branch and in Mysticeti (baleen whales). 
This pattern is compatible with the 'echolocation-priority' hypothesis for the evolution of OMP, which posits that negative selection was maintained in the common ancestor of Cetacea and was not relaxed significantly until the evolution of echolocation in Odontoceti.


Subject(s)
Dolphins/genetics , Olfactory Marker Protein/genetics , Animals , Base Sequence , Biological Evolution , DNA, Mitochondrial , Dolphins/classification , Evolution, Molecular , Olfactory Marker Protein/physiology , Phylogeny
15.
Mol Phylogenet Evol ; 106: 103-117, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27640953

ABSTRACT

Multi-locus nuclear DNA data were used to delimit species of fringe-toed lizards of the Uma notata complex, which are specialized for living in wind-blown sand habitats in the deserts of southwestern North America, and to infer whether Quaternary glacial cycles or Tertiary geological events were important in shaping the historical biogeography of this group. We analyzed ten nuclear loci collected using Sanger sequencing and genome-wide sequence/single-nucleotide polymorphism (SNP) data collected using restriction site-associated DNA (RAD) sequencing. A combination of species discovery methods (concatenated phylogenies, parametric and non-parametric clustering algorithms) and species validation approaches (coalescent-based species tree/isolation-with-migration models) was used to delimit species, infer phylogenetic relationships, and estimate effective population sizes, migration rates, and speciation times. Uma notata, U. inornata, U. cowlesi, and an undescribed species from Mohawk Dunes, Arizona (U. sp.) were supported as distinct in the concatenated analyses and by clustering algorithms, and all operational taxonomic units were decisively supported as distinct species by ranking hierarchical nested speciation models with Bayes factors based on coalescent-based species tree methods. However, significant unidirectional gene flow (2NM>1) from U. cowlesi and U. notata into U. rufopunctata was detected under the isolation-with-migration model. Therefore, we conservatively delimit four species-level lineages within this complex (U. inornata, U. notata, U. cowlesi, and U. sp.), treating U. rufopunctata as a hybrid population (U. notata×cowlesi). Both concatenated and coalescent-based estimates of speciation times support the hypotheses that speciation within the complex occurred during the late Pleistocene, and that the geological evolution of the Colorado River delta during this period was an important process shaping the observed phylogeographic patterns.


Subject(s)
Gene Flow , Lizards/classification , Animal Migration , Animals , Bayes Theorem , Biodiversity , Cluster Analysis , DNA/chemistry , DNA/isolation & purification , DNA/metabolism , Lizards/genetics , Phylogeny , Phylogeography , Polymorphism, Single Nucleotide , Principal Component Analysis , Sequence Analysis, DNA
16.
Cladistics ; 33(3): 295-332, 2017 Jun.
Article in English | MEDLINE | ID: mdl-34715726

ABSTRACT

Recent phylogenetic analyses of a large dataset for mammalian families (169 taxa, 26 loci) portray contrasting results. Supermatrix (concatenation) methods support a generally robust tree with only a few inconsistently resolved polytomies, whereas MP-EST coalescence analysis of the same dataset yields a weakly supported tree that conflicts with many traditionally recognized clades. Here, we evaluate this discrepancy via improved coalescence analyses with reference to the rich history of phylogenetic studies on mammals. This integration clearly demonstrates that both supermatrix and coalescence analyses of just 26 loci yield a congruent, well-supported phylogenetic hypothesis for Mammalia. Discrepancies between published studies are explained by implementation of overly simple DNA substitution models, inadequate tree-search routines and limitations of the MP-EST method. We develop a simple measure, partitioned coalescence support (PCS), which summarizes the distribution of support and conflict among gene trees for a given clade. Extremely high PCS scores for outlier gene trees at two nodes in the mammalian tree indicate a troubling bias in the MP-EST method. We conclude that in this age of phylogenomics, a solid understanding of systematics fundamentals, choice of valid methodology and a broad knowledge of a clade's taxonomic history are still required to yield coherent phylogenetic inferences.

17.
Mol Phylogenet Evol ; 94(Pt A): 1-33, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26238460

ABSTRACT

Higher-level relationships among placental mammals are mostly resolved, but several polytomies remain contentious. Song et al. (2012) claimed to have resolved three of these using shortcut coalescence methods (MP-EST, STAR) and further concluded that these methods, which assume no within-locus recombination, are required to unravel deep-level phylogenetic problems that have stymied concatenation. Here, we reanalyze Song et al.'s (2012) data and leverage these re-analyses to explore key issues in systematics including the recombination ratchet, gene tree stoichiometry, the proportion of gene tree incongruence that results from deep coalescence versus other factors, and simulations that compare the performance of coalescence and concatenation methods in species tree estimation. Song et al. (2012) reported an average locus length of 3.1 kb for the 447 protein-coding genes in their phylogenomic dataset, but the true mean length of these loci (start codon to stop codon) is 139.6 kb. Empirical estimates of recombination breakpoints in primates, coupled with consideration of the recombination ratchet, suggest that individual coalescence genes (c-genes) approach ∼12 bp or less for Song et al.'s (2012) dataset, three to four orders of magnitude shorter than the c-genes reported by these authors. This result has general implications for the application of coalescence methods in species tree estimation. We contend that it is illogical to apply coalescence methods to complete protein-coding sequences. Such analyses amalgamate c-genes with different evolutionary histories (i.e., exons separated by >100,000 bp), distort true gene tree stoichiometry that is required for accurate species tree inference, and contradict the central rationale for applying coalescence methods to difficult phylogenetic problems. 
In addition, Song et al.'s (2012) dataset of 447 genes includes 21 loci with switched taxonomic names, eight duplicated loci, 26 loci with non-homologous sequences that are grossly misaligned, and numerous loci with >50% missing data for taxa that are misplaced in their gene trees. These problems were compounded by inadequate tree searches with nearest neighbor interchange branch swapping and inadvertent application of substitution models that did not account for among-site rate heterogeneity. Sixty-six gene trees imply unrealistic deep coalescences that exceed 100 million years (MY). Gene trees that were obtained with better justified models and search parameters show large increases in both likelihood scores and congruence. Coalescence analyses based on a curated set of 413 improved gene trees and a superior coalescence method (ASTRAL) support a Scandentia (treeshrews)+Glires (rabbits, rodents) clade, contradicting one of the three primary systematic conclusions of Song et al. (2012). Robust support for a Perissodactyla+Carnivora clade within Laurasiatheria is also lost, contradicting a second major conclusion of this study. Song et al.'s (2012) MP-EST species tree provided the basis for circular simulations that led these authors to conclude that the multispecies coalescent accounts for 77% of the gene tree conflicts in their dataset, but many internal branches of their MP-EST tree are stunted by an order of magnitude or more due to wholesale gene tree reconstruction errors. An independent assessment of branch lengths suggests the multispecies coalescent accounts for ⩽ 15% of the conflicts among Song et al.'s (2012) 447 gene trees. Unfortunately, Song et al.'s (2012) flawed phylogenomic dataset has been used as a model for additional simulation work that suggests the superiority of shortcut coalescence methods relative to concatenation. 
Investigator error was passed on to the subsequent simulation studies, which also incorporated further logical errors that should be avoided in future simulation studies. Illegitimate branch length switches in the simulation routines unfairly protected coalescence methods from their Achilles' heel, high gene tree reconstruction error at short internodes. These simulations therefore provide no evidence that shortcut coalescence methods out-compete concatenation at deep timescales. In summary, the long c-genes that are required for accurate reconstruction of species trees using shortcut coalescence methods do not exist and are a delusion. Coalescence approaches based on SNPs that are widely spaced in the genome avoid problems with the recombination ratchet and merit further pursuit in both empirical systematic research and simulations.


Subject(s)
Genes , Mammals/classification , Mammals/genetics , Models, Genetic , Phylogeny , Animals , Datasets as Topic , Evolution, Molecular , Polymorphism, Single Nucleotide/genetics , Scandentia/classification , Scandentia/genetics
18.
Mol Phylogenet Evol ; 100: 424-443, 2016 07.
Article in English | MEDLINE | ID: mdl-27103257

ABSTRACT

Observed Variability (OV) and Tree Independent Generation of Evolutionary Rates (TIGER) are quick and easy-to-apply tree-independent methods that have been proposed to provide unbiased estimates of each character's rate of evolution and serve as the basis for excluding rapidly evolving characters. Both methods have been applied to multiple phylogenomic datasets, and in many cases the authors considered their trees inferred from the OV- and TIGER-delimited sub-matrices to be better estimates of the phylogeny than their trees based on all characters. In this study we use four sets of simulations and an empirical phylogenomic example to demonstrate that both methods share a systematic bias against characters with more symmetric distributions of character states, against characters with greater observed character-state space, and against large clades in the context of character conflict. As a result, these methods can favor convergences and reversals over synapomorphy, exacerbate long-branch attraction, and produce mutually exclusive phylogenetic inferences that are dependent upon differential taxon sampling. We assert that neither OV nor TIGER should be relied upon to increase the ratio of phylogenetic to non-phylogenetic signal in a data matrix. We also assert that skepticism is warranted for empirical phylogenetic results that are based on OV- and/or TIGER-based character deletion wherein a small clade is supported after deletion of characters, yet is contradicted by a larger clade when the entire data matrix is analyzed.
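For readers unfamiliar with how TIGER scores characters, a minimal Python sketch follows. It is written from our understanding of the published TIGER definition, not taken from this study: each character partitions the taxa into sets by state, pa(i, j) is the fraction of character j's sets that fall entirely inside one of character i's sets, and the rate score r_i is the mean of pa(i, j) over all other characters (higher values mean more conserved). It assumes a complete matrix with no gaps or missing data, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List


def set_partition(column: List[str], taxa: List[str]) -> List[FrozenSet[str]]:
    """Group taxa by their state for one character."""
    groups: Dict[str, set] = {}
    for taxon, state in zip(taxa, column):
        groups.setdefault(state, set()).add(taxon)
    return [frozenset(g) for g in groups.values()]


def partition_agreement(p_i: List[FrozenSet[str]], p_j: List[FrozenSet[str]]) -> float:
    """pa(i, j): fraction of character j's sets nested inside some set of character i."""
    hits = sum(1 for s in p_j if any(s <= t for t in p_i))
    return hits / len(p_j)


def tiger_rates(matrix: Dict[str, str]) -> List[float]:
    """Per-character TIGER-style scores for a taxon -> state-string matrix (assumed complete)."""
    taxa = sorted(matrix)
    n_chars = len(matrix[taxa[0]])
    if n_chars < 2:
        raise ValueError("need at least two characters")
    parts = [set_partition([matrix[t][k] for t in taxa], taxa) for k in range(n_chars)]
    rates = []
    for i in range(n_chars):
        # Average agreement of character i with every other character.
        total = sum(partition_agreement(parts[i], parts[j]) for j in range(n_chars) if j != i)
        rates.append(total / (n_chars - 1))
    return rates
```

For a toy matrix {"A": "0000", "B": "0010", "C": "1100", "D": "1110"}, the invariant fourth character scores 1.0 and the third character, which conflicts with every other character, scores 0.0. The score measures agreement with the rest of the matrix rather than fit to any tree, which is the property the abstract's critique turns on.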


Subject(s)
Evolution, Molecular , Phylogeny , Classification , Data Accuracy , Data Interpretation, Statistical , Genetic Variation , Models, Genetic , Molecular Typing/methods
19.
Mol Phylogenet Evol ; 97: 76-89, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26768112

ABSTRACT

Gene-tree-estimation error is a major concern for coalescent methods of phylogenetic inference. We sampled eight empirical studies of ancient lineages with diverse numbers of taxa and genes for which the original authors applied one or more coalescent methods. We found that the average pairwise congruence among gene trees varied greatly between studies and often within a study. We recommend that presenting plots of pairwise congruence among gene trees in a dataset be treated as a standard practice for empirical coalescent studies so that readers can readily assess the extent and distribution of incongruence among gene trees. ASTRAL-based coalescent analyses generally outperformed MP-EST and STAR with respect to both internal consistency (congruence between analyses of subsamples of genes with the complete dataset of all genes) and congruence with the concatenation-based topology. We evaluated the approach of subsampling gene trees that are, on average, more congruent with other gene trees as a method to reduce artifacts caused by gene-tree-estimation errors in coalescent analyses. We suggest that this method is well suited to testing whether gene-tree-estimation error is a primary cause of incongruence between concatenation- and coalescent-based results, to reconciling conflicting phylogenetic results based on different coalescent methods, and to identifying genes affected by artifacts that may then be targeted for reciprocal illumination. We provide scripts that automate the process of calculating pairwise gene-tree incongruence and subsampling trees while accounting for differential taxon sampling among genes. Finally, we assert that multiple tree-search replicates should be implemented as a standard practice for empirical coalescent studies that apply MP-EST.
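The average pairwise congruence described in this abstract can be approximated along the following lines. This is a minimal Python sketch, not the authors' scripts. It assumes gene trees are given as nested tuples (an unrooted topology written from an arbitrary root), prunes each pair to its shared taxa to cope with differential taxon sampling, and scores congruence as 2|S1 ∩ S2| / (|S1| + |S2|) over nontrivial bipartitions, i.e. one minus the normalized Robinson-Foulds distance. The metric choice and all function names are assumptions.

```python
from itertools import combinations
from statistics import mean
from typing import FrozenSet, List, Optional, Set, Union

# Hypothetical representation: a leaf is a taxon name, an internal node is a tuple of children.
Tree = Union[str, tuple]
Split = FrozenSet[FrozenSet[str]]


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node; the last entry is the leaf set of `tree` itself."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out: List[FrozenSet[str]] = []
    here: FrozenSet[str] = frozenset()
    for child in tree:
        sub = clusters(child)
        out.extend(sub)
        here = here | sub[-1]
    out.append(here)
    return out


def splits(tree: Tree, keep: FrozenSet[str]) -> Set[Split]:
    """Nontrivial bipartitions of `tree` after pruning it to the taxa in `keep`."""
    result: Set[Split] = set()
    for c in clusters(tree):
        a = c & keep
        b = keep - a
        if len(a) >= 2 and len(b) >= 2:
            result.add(frozenset([a, b]))  # unordered pair of sides, so duplicates collapse
    return result


def congruence(t1: Tree, t2: Tree) -> Optional[float]:
    """Fraction of shared splits on the common taxa, or None if uninformative."""
    shared = clusters(t1)[-1] & clusters(t2)[-1]
    if len(shared) < 4:
        return None  # fewer than four shared taxa: no informative splits
    s1, s2 = splits(t1, shared), splits(t2, shared)
    if not s1 and not s2:
        return None  # both pruned trees are unresolved stars
    return 2 * len(s1 & s2) / (len(s1) + len(s2))


def mean_pairwise_congruence(trees: List[Tree]) -> Optional[float]:
    """Average congruence over all informative pairs of gene trees."""
    scores = []
    for a, b in combinations(trees, 2):
        c = congruence(a, b)
        if c is not None:
            scores.append(c)
    return mean(scores) if scores else None
```

Because pruning is done per pair, genes with different taxon sets are compared only on the taxa they share, and pairs sharing fewer than four taxa are skipped rather than counted as congruent. Per-gene averages of these scores could then serve as the ranking for the subsampling approach the abstract evaluates.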


Subject(s)
Genes , Genetic Techniques , Phylogeny , Artifacts , Reproducibility of Results , Research Design
20.
Mol Phylogenet Evol ; 95: 34-45, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26596502

ABSTRACT

Previous studies have reported inactivated copies of six enamel-related genes (AMBN, AMEL, AMTN, ENAM, KLK4, MMP20) and one dentin-related gene (DSPP) in one or more toothless vertebrates and/or vertebrates with enamelless teeth, thereby providing evidence that these genes are enamel or tooth-specific with respect to their critical functions that are maintained by natural selection. Here, we employ available genome sequences for edentulous and enamelless mammals to evaluate the enamel specificity of four genes (WDR72, SLC24A4, FAM83H, C4orf26) that have been implicated in amelogenesis imperfecta, a condition in which proper enamel formation is abrogated during tooth development. Coding sequences for WDR72, SLC24A4, and FAM83H are intact in four edentulous taxa (Chinese pangolin, three baleen whales) and three taxa (aardvark, nine-banded armadillo, Hoffmann's two-toed sloth) with enamelless teeth, suggesting that these genes have critical functions beyond their involvement in tooth development. By contrast, genomic data for C4orf26 reveal inactivating mutations in pangolin and bowhead whale as well as evidence for deletion of this gene in two minke whale species. Hybridization capture of exonic regions and PCR screens provide evidence for inactivation of C4orf26 in eight additional baleen whale species. However, C4orf26 is intact in all three species with enamelless teeth that were surveyed, as well as in 95 additional mammalian species with enamel-capped teeth. Estimates of selection intensity suggest that dN/dS ratios on branches leading to taxa with enamelless teeth are similar to the dN/dS ratio on branches leading to taxa with enamel-capped teeth. Based on these results, we conclude that C4orf26 is tooth-specific, but not enamel-specific, with respect to its essential functions that are maintained by natural selection. 
A caveat is that an alternative splice site variant, which translates exon 3 in a different reading frame, is putatively functional in Catarrhini and may have evolved an additional role in this primate clade.


Subject(s)
Amelogenesis Imperfecta/genetics , Dental Enamel/growth & development , Gene Silencing , Genes, Developmental , Mammals/growth & development , Tooth/growth & development , Animals , Base Sequence , Exons , Female , Mammals/genetics , Molecular Sequence Data , Placenta , Pregnancy , Selection, Genetic , Sequence Homology, Nucleic Acid , Whales/genetics