1.
Syst Biol ; 71(3): 721-740, 2022 04 19.
Article in English | MEDLINE | ID: mdl-34677617

ABSTRACT

A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin-Sokal parsimony (as well as an approach based on MDC) typically fail to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e., local posterior probabilities). We examined the accuracy of branch length estimation, finding that estimated lengths were accurate for short branches but upwardly biased otherwise. 
This led us to derive the maximum likelihood (branch length) estimate for when RIs are given as input instead of binary gene trees; this corrected formula produced accurate estimates of branch lengths in our simulation study provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100,000 parsimony-informative RIs. We found that, when given just 1000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e., clades separated by branches $>0.3$ coalescent units) with high support and identified rapid radiations (i.e., shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false positive branches in these scenarios. [Coalescence; incomplete lineage sorting; Laurasiatheria; Palaeognathae; parsimony; polymorphism parsimony; retroelement insertions; species trees; transposon.].
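
The quartet-based idea in this abstract can be illustrated with a minimal sketch. It is not the SDPquartets or ASTRAL_BP implementation; it only shows how presence/absence patterns of retroelement insertions might be tallied as support for 2|2 quartet splits. The function name `quartet_split_counts`, the dictionary input format, and the handling of missing taxa are assumptions made for illustration.

```python
from itertools import combinations
from collections import Counter


def quartet_split_counts(insertions, taxa):
    """Tally how many insertions support each 2|2 split of every four-taxon set.

    `insertions` is a list of dicts mapping taxon name -> 1 (present) or 0 (absent).
    A taxon missing from a dict is treated as missing data (assumption of this sketch).
    """
    counts = Counter()
    for four in combinations(sorted(taxa), 4):
        for ins in insertions:
            if any(t not in ins for t in four):
                continue  # skip insertions with missing data for this quartet
            present = frozenset(t for t in four if ins[t] == 1)
            if len(present) != 2:
                continue  # only 2|2 patterns are informative for a quartet
            absent = frozenset(four) - present
            counts[(four, frozenset((present, absent)))] += 1
    return counts


# Hypothetical data: two insertions group A with B, one groups A with C.
example = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 0},
    {"A": 1, "B": 0, "C": 0, "D": 0},  # uninformative: only one taxon carries it
]
result = quartet_split_counts(example, ["A", "B", "C", "D"])
```

Under incomplete lineage sorting, conflicting splits such as AC|BD are expected at some frequency, which is why methods of this kind weigh the counts rather than requiring unanimity.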


Subjects
Palaeognathae , Retroelements , Animals , Computer Simulation , Genetic Models , Phylogeny , Retroelements/genetics
2.
Cladistics ; 39(5): 418-436, 2023 10.
Article in English | MEDLINE | ID: mdl-37096985

ABSTRACT

Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
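
The difference between a standard Robinson-Foulds distance and a measure that counts only contradictory clades can be sketched as follows. This is not the congsort TNT script, and the exact definitions used there may differ. The sketch assumes rooted trees represented as sets of non-trivial clades and treats two clades as contradictory when they overlap without one containing the other; the function names are hypothetical.

```python
def compatible(a, b):
    """Two clades (sets of taxa) are compatible if they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def rf_distance(t1, t2):
    """Standard RF: clades found in exactly one of the two trees."""
    return len(t1 ^ t2)


def rf_contradictions(t1, t2):
    """Count clades in either tree that conflict with some clade in the other tree."""
    c1 = sum(1 for a in t1 if any(not compatible(a, b) for b in t2))
    c2 = sum(1 for b in t2 if any(not compatible(a, b) for a in t1))
    return c1 + c2


# Hypothetical five-taxon trees (taxa A to E), given as sets of non-trivial clades.
resolved = {frozenset("AB"), frozenset("ABC")}
collapsed = {frozenset("AB")}                     # one branch collapsed to a polytomy
conflicting = {frozenset("AC"), frozenset("ABC")}

rf_vs_collapsed = rf_distance(resolved, collapsed)
contra_vs_collapsed = rf_contradictions(resolved, collapsed)
rf_vs_conflicting = rf_distance(resolved, conflicting)
contra_vs_conflicting = rf_contradictions(resolved, conflicting)
```

In this toy case the collapsed tree is penalized by the standard distance but not by the contradiction count, which is the property the abstract relies on when comparing partially resolved gene trees.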


Subjects
Artifacts , Phylogeny , Likelihood Functions
3.
Mol Phylogenet Evol ; 171: 107463, 2022 06.
Article in English | MEDLINE | ID: mdl-35358696

ABSTRACT

The loss of teeth and evolution of baleen racks in Mysticeti was a profound transformation that permitted baleen whales to radiate and diversify into a previously underutilized ecological niche of bulk filter-feeding on zooplankton and other small prey. Ancestral state reconstructions suggest that postnatal teeth were lost in the common ancestor of crown Mysticeti. Genomic studies provide some support for this hypothesis and suggest that the genetic toolkit for enamel production was inactivated in the common ancestor of living baleen whales. However, molecular studies to date have not provided direct evidence for the complete loss of teeth, including their dentin component, on the stem mysticete branch. Given these results, several questions remain unanswered: (1) Were teeth lost in a single step or did enamel loss precede dentin loss? (2) Was enamel lost early or late on the stem mysticete branch? (3) If enamel and dentin/tooth loss were decoupled in the ancestry of baleen whales, did dentin loss occur on the stem mysticete branch or independently in different crown mysticete lineages? To address these outstanding questions, we compiled and analyzed complete protein-coding sequences for nine tooth-related genes from cetaceans with available genome data. Seven of these genes are associated with enamel formation (ACP4, AMBN, AMELX, AMTN, ENAM, KLK4, MMP20) whereas two other genes are either dentin-specific (DSPP) or tooth-specific (ODAPH) but not enamel-specific. Molecular evolutionary analyses indicate that all seven enamel-specific genes have inactivating mutations that are scattered across branches of the mysticete tree. Three of the enamel genes (ACP4, KLK4, MMP20) have inactivating mutations that are shared by all mysticetes. 
The two genes that are dentin-specific (DSPP) or tooth-specific (ODAPH) do not have any inactivating mutations that are shared by all mysticetes, but there are shared mutations in Balaenidae as well as in Plicogulae (Neobalaenidae + Balaenopteroidea). These shared mutations suggest that teeth were lost at most twice. Shared inactivating mutations and dN/dS analyses, in combination with cetacean divergence times, were used to estimate inactivation times of genes and by proxy enamel and tooth phenotypes at ancestral nodes. The results of these analyses are most compatible with a two-step model for the loss of teeth in the ancestry of living baleen whales: enamel was lost very early on the stem Mysticeti branch followed by the independent loss of dentin (and teeth) in the common ancestors of Balaenidae and Plicogulae, respectively. These results imply that some stem mysticetes, and even early crown mysticetes, may have had vestigial teeth composed of dentin with no enamel. Our results also demonstrate that all odontocete species (in our study) with absent or degenerative enamel have inactivating mutations in one or more of their enamel genes.


Subjects
Biological Evolution , Matrix Metalloproteinase 20 , Animals , Dental Enamel , Matrix Metalloproteinase 20/genetics , Phylogeny , Whales/genetics
4.
Mol Phylogenet Evol ; 167: 107344, 2022 02.
Article in English | MEDLINE | ID: mdl-34748873

ABSTRACT

Phylogenomic analyses of ancient rapid radiations can produce conflicting results that are driven by differential sampling of taxa and characters as well as the limitations of alternative analytical methods. We re-examine basal relationships of palaeognath birds (ratites and tinamous) using recently published datasets of nucleotide characters from 20,850 loci as well as 4301 retroelement insertions. The original studies attributed conflicting resolutions of rheas in their inferred coalescent and concatenation trees to concatenation failing in the anomaly zone. By contrast, we find that the coalescent-based resolution of rheas is premised upon extensive gene-tree estimation errors. Furthermore, retroelement insertions contain much more conflict than originally reported and multiple insertion loci support the basal position of rheas found in concatenation trees, while none were reported in the original publication. We demonstrate how even remarkable congruence in phylogenomic studies may be driven by long-branch misplacement of a divergent outgroup, highly incongruent gene trees, differential taxon sampling that can result in gene-tree misrooting errors that bias species-tree inference, and gross homology errors. What was previously interpreted as broad, robustly supported corroboration for a single resolution in coalescent analyses may instead indicate a common bias that taints phylogenomic results across multiple genome-scale datasets. The updated retroelement dataset now supports a species tree with branch lengths that suggest an ancient anomaly zone, and both concatenation and coalescent analyses of the huge nucleotide datasets fail to yield coherent, reliable results in this challenging phylogenetic context.


Subjects
Birds , Genome , Animals , Birds/genetics , Phylogeny
6.
Proc Biol Sci ; 288(1961): 20211213, 2021 10 27.
Article in English | MEDLINE | ID: mdl-34702078

ABSTRACT

The deep sea has been described as the last major ecological frontier, as much of its biodiversity is yet to be discovered and described. Beaked whales (ziphiids) are among the most visible inhabitants of the deep sea, due to their large size and worldwide distribution, yet their taxonomic diversity and much about their natural history remain poorly understood. We combine genomic and morphometric analyses to reveal a new Southern Hemisphere ziphiid species, Ramari's beaked whale, Mesoplodon eueu, whose name is linked to the Indigenous peoples of the lands from which the species holotype and paratypes were recovered. Mitogenome and ddRAD-derived phylogenies demonstrate reciprocally monophyletic divergence between M. eueu and True's beaked whale (M. mirus) from the North Atlantic, with which it was previously subsumed. Morphometric analyses of skulls also distinguish the two species. A time-calibrated mitogenome phylogeny and analysis of two nuclear genomes indicate divergence began circa 2 million years ago (Ma), with gene flow ceasing 0.35-0.55 Ma. This is an example of how deep sea biodiversity can be unravelled through increasing international collaboration and genome sequencing of archival specimens. Our consultation and involvement with Indigenous peoples offers a model for broadening the cultural scope of the scientific naming process.


Subjects
Genomics , Whales , Animals , Cell Nucleus , Phylogeny , Whales/anatomy & histology , Whales/genetics
7.
Mol Phylogenet Evol ; 158: 107092, 2021 05.
Article in English | MEDLINE | ID: mdl-33545272

ABSTRACT

In two-step coalescent analyses of phylogenomic data, gene-tree topologies are treated as fixed prior to species-tree inference. Although all gene-tree conflict is assumed to be caused by lineage sorting when applying these methods, in empirical datasets much of the conflict can be caused by estimation error. Weakly supported and even arbitrarily resolved clades are important sources of this estimation error for gene trees inferred from few informative characters relative to the number of sampled terminals, and the resulting extraneous conflict among gene trees can negatively impact species-tree inference. In this study, we quantified the relative severity of alternative methods for collapsing gene-tree branches for seven empirical datasets and quantified their effects on species-tree inference. The branch-collapsing methods that we employed were based on the strict consensus of optimal topologies, various bootstrap thresholds, and 0% approximate likelihood ratio test (SH-like aLRT) support. Up to 86% of internal gene-tree branches are dubiously or arbitrarily resolved in reanalyses of these published phylogenomic datasets, and collapsing these branches increased inferred species-tree coalescent branch lengths by up to 455%. For two datasets, the longer inferred branch lengths sometimes impacted inference of anomaly-zone conditions. Although branch-collapsing methods did not consistently affect the species-tree topology, they often increased branch support. The more severe and clearly justified gene-tree branch-collapsing methods, which we recommend be broadly applied for two-step coalescent analyses, are use of the strict consensus in parsimony analyses and the collapse of clades with 0% SH-like aLRT support in likelihood analyses. Collapsing dubiously or arbitrarily resolved branches in gene trees sometimes improved congruence between coalescent-based results and concatenation trees. 
In such cases, we contend that the resolution provided by concatenation should be preferred and that incomplete lineage sorting is a poor explanation for the initial conflict between phylogenetic approaches.
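
Collapsing gene-tree branches at or below a support threshold can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure or software used in the study: trees are represented as nested `(support, children)` tuples with taxon names as leaves, and the function name `collapse` and the example support values are hypothetical.

```python
def collapse(node, threshold=0.0):
    """Return a copy of the tree with internal branches of support <= threshold collapsed.

    A leaf is a string. An internal node is (support, [children]).
    The root's own support value is never inspected, so it may be None.
    """
    if isinstance(node, str):
        return node
    support, children = node
    new_children = []
    for child in children:
        child = collapse(child, threshold)
        if not isinstance(child, str) and child[0] <= threshold:
            new_children.extend(child[1])  # splice grandchildren in, forming a polytomy
        else:
            new_children.append(child)
    return (support, new_children)


# Hypothetical gene tree with two branches at 0% support.
gene_tree = (None, [(0.0, ["A", "B"]), (95.0, ["C", (0.0, ["D", "E"])])])
collapsed_tree = collapse(gene_tree, threshold=0.0)
```

Raising `threshold` would mimic the bootstrap-threshold variants mentioned in the abstract, whereas the default of 0.0 corresponds to removing only branches with no support at all.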


Subjects
Genetic Models , Animals , Birds/genetics , Likelihood Functions , Lizards , Phylogeny , Sciuridae/genetics , Software
8.
Mol Phylogenet Evol ; 152: 106924, 2020 11.
Article in English | MEDLINE | ID: mdl-32771548

ABSTRACT

Extant species in the order Crocodylia are remnants of an ancient lineage of large-bodied archosaur reptiles. Despite decades of systematic studies, phylogenetic relationships among members of the genus Crocodylus (true crocodiles) in the Neotropics are poorly understood. Here we estimated phylogenomic relationships among the four extant Crocodylus species in the Americas. Species-tree reconstructions using genotypic data from 17,538 SNPs collected for 33 individuals spanning six Crocodylus species (four ingroup and two outgroup) revealed novel relationships for all Neotropical species. For the first time, C. acutus, the American crocodile, was recovered as monophyletic when individuals from Antillean and continental populations were analyzed together. Our results also contradict previous inferences based on mitochondrial DNA data and a limited number of nuclear markers by robustly grouping Morelet's crocodile (C. moreletii) as the sister species to C. acutus, suggesting a novel phylogeographic hypothesis for the group. The present study underscores the importance of using nuclear genome-wide information and representative sampling for resolving phylogenetic relationships, especially in broadly distributed species and those with complex evolutionary histories.


Subjects
Alligators and Crocodiles/classification , Alligators and Crocodiles/genetics , Phylogeny , Americas , Animals , Mitochondrial DNA/genetics , Genotype , Phylogeography
9.
J Hered ; 111(2): 147-168, 2020 04 02.
Article in English | MEDLINE | ID: mdl-31837265

ABSTRACT

DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the "no intralocus-recombination" assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with 2 ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaenopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.


Subjects
Genetic Speciation , Genetic Models , Retroelements , Vertebrates/genetics , Animals , DNA Transposable Elements , Genetic Hybridization , Phylogeny
10.
BMC Evol Biol ; 19(1): 31, 2019 01 23.
Article in English | MEDLINE | ID: mdl-30674270

ABSTRACT

BACKGROUND: The gene for odontogenic ameloblast-associated (ODAM) is a member of the secretory calcium-binding phosphoprotein gene family. ODAM is primarily expressed in dental tissues including the enamel organ and the junctional epithelium, and may also have pleiotropic functions that are unrelated to teeth. Here, we leverage the power of natural selection to test competing hypotheses that ODAM is tooth-specific versus pleiotropic. Specifically, we compiled and screened complete protein-coding sequences, plus sequences for flanking intronic regions, for ODAM in 165 placental mammals to determine if this gene contains inactivating mutations in lineages that either lack teeth (baleen whales, pangolins, anteaters) or lack enamel on their teeth (aardvarks, sloths, armadillos), as would be expected if the only essential functions of ODAM are related to tooth development and the adhesion of the gingival junctional epithelium to the enamel tooth surface. RESULTS: We discovered inactivating mutations in all species of placental mammals that either lack teeth or lack enamel on their teeth. A surprising result is that ODAM is also inactivated in a few additional lineages including all toothed whales that were examined. We hypothesize that ODAM inactivation is related to the simplified outer enamel surface of toothed whales. An alternate hypothesis is that ODAM inactivation in toothed whales may be related to altered antimicrobial functions of the junctional epithelium in aquatic habitats. Selection analyses on ODAM sequences revealed that the composite dN/dS value for pseudogenic branches is close to 1.0 as expected for a neutrally evolving pseudogene. dN/dS values on transitional branches were used to estimate ODAM inactivation times. In the case of pangolins, ODAM was inactivated ~65 million years ago, which is older than the oldest pangolin fossil (Eomanis, 47 Ma) and suggests an even more ancient loss or simplification of teeth in this lineage. 
CONCLUSION: Our results validate the hypothesis that the only essential functions of ODAM that are maintained by natural selection are related to tooth development and/or the maintenance of a healthy junctional epithelium that attaches to the enamel surface of teeth.
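
The abstract does not state how dN/dS values on transitional branches were converted into inactivation times. One commonly used approach, assumed here purely for illustration, treats the dN/dS of a transitional branch as a time-weighted mixture of a functional-gene value and a pseudogene value and solves for the fraction of the branch spent as a pseudogene. The function name and all numbers below are hypothetical and are not taken from the study.

```python
def inactivation_time(omega_mixed, omega_functional, omega_pseudogene,
                      branch_start_ma, branch_end_ma):
    """Estimate when a gene was inactivated along a transitional branch.

    Assumes omega_mixed = f * omega_pseudogene + (1 - f) * omega_functional,
    where f is the fraction of the branch during which the gene was a pseudogene.
    branch_start_ma is the older end of the branch and branch_end_ma the younger end,
    both in millions of years ago (Ma).
    """
    if omega_pseudogene == omega_functional:
        raise ValueError("functional and pseudogene dN/dS must differ")
    f = (omega_mixed - omega_functional) / (omega_pseudogene - omega_functional)
    f = min(1.0, max(0.0, f))  # clamp to the span of the branch
    return branch_end_ma + f * (branch_start_ma - branch_end_ma)


# Hypothetical values: a branch spanning 80 to 40 Ma with an intermediate dN/dS.
estimate = inactivation_time(0.625, 0.25, 1.0, 80.0, 40.0)
```

With a pseudogene dN/dS near 1.0, as reported in the abstract for pseudogenic branches, a transitional value closer to 1.0 implies an earlier inactivation on that branch.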


Subjects
Ameloblasts/metabolism , Dental Enamel/metabolism , Eutheria/genetics , Gene Silencing , Odontogenesis , Proteins/genetics , Whales/genetics , Animals , Base Sequence , Bayes Theorem , Codon/genetics , Female , Fossils , Likelihood Functions , Mutation/genetics , Phylogeny , Pregnancy , Proteins/metabolism
11.
Mol Phylogenet Evol ; 131: 80-92, 2019 02.
Article in English | MEDLINE | ID: mdl-30391518

ABSTRACT

In summary ("two-step") coalescent analyses of empirical data, researchers typically apply the bootstrap to quantify branch support for clades inferred on the optimal species tree. We tested whether site-wise bootstrap analyses provide consistently more conservative support than gene-wise bootstrap analyses. We did so using data from three empirical phylogenomic studies and employed four coalescent methods (ASTRAL, MP-EST, NJst, and STAR). We demonstrate that application of site-wise bootstrapping generally resulted in gene-trees with substantial additional conflicts relative to the original data and this approach therefore cannot be relied upon to provide conservative support. Instead the site-wise bootstrap can provide high support for apparently incorrect clades. We provide a script (https://github.com/dbsloan/msc_tree_resampling) that implements gene-wise resampling, using either the bootstrap or the jackknife, for use with ASTRAL, MP-EST, NJst, and STAR. We demonstrate that the gene-wise bootstrap outperformed the site-wise bootstrap for the primary focal clades for all four coalescent methods that were applied to all three empirical studies. For summary coalescent analyses we suggest that gene-wise resampling support should be favored over gene + site or site-wise resampling when numerous genes are sampled because site-wise resampling causes substantially greater gene-tree-estimation error.
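
Gene-wise resampling, as contrasted with site-wise resampling in this abstract, leaves each gene tree untouched and resamples whole genes. The sketch below illustrates that idea only; it does not reproduce the linked script or its options. The function names, the `keep_fraction` parameter, and the placeholder Newick strings are assumptions.

```python
import random


def genewise_bootstrap(gene_trees, n_replicates, seed=0):
    """Draw replicates by sampling whole gene trees with replacement."""
    rng = random.Random(seed)
    n = len(gene_trees)
    return [[gene_trees[rng.randrange(n)] for _ in range(n)]
            for _ in range(n_replicates)]


def genewise_jackknife(gene_trees, n_replicates, keep_fraction=0.5, seed=0):
    """Draw replicates by keeping a random subset of gene trees without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(gene_trees) * keep_fraction))
    return [rng.sample(gene_trees, k) for _ in range(n_replicates)]


# Placeholder gene trees in Newick format (hypothetical).
trees = ["((A,B),C,D);", "((A,C),B,D);", "((A,B),C,D);", "((A,D),B,C);"]
boot = genewise_bootstrap(trees, n_replicates=3, seed=1)
jack = genewise_jackknife(trees, n_replicates=3, keep_fraction=0.5, seed=1)
```

Each replicate would then be passed to a summary coalescent method, and clade frequencies across the resulting species trees give the support values. Because the gene trees are never re-estimated, no additional gene-tree-estimation error is introduced.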


Subjects
Genes , Phylogeny , Empirical Research , Probability , Software
12.
Mol Phylogenet Evol ; 139: 106539, 2019 10.
Article in English | MEDLINE | ID: mdl-31226465

ABSTRACT

Genomic datasets sometimes support conflicting phylogenetic relationships when different tree-building methods are applied. Coherent interpretations of such results are enabled by partitioning support for controversial relationships among the constituent genes of a phylogenomic dataset. For the supermatrix (=concatenation) approach, several methods that measure the distribution of support and conflict among loci were introduced over 15 years ago. More recently, partitioned coalescence support (PCS) was developed for phylogenetic coalescence methods that account for incomplete lineage sorting and use the summed fits of gene trees to estimate the species tree. Here, we automate computation of PCS to permit application of this index to genome-scale matrices that include hundreds of loci. Reanalyses of four phylogenomic datasets for amniotes, land plants, skinks, and angiosperms demonstrate how PCS scores can be used to: (1) compare conflicting results favored by alternative coalescence methods, (2) identify outlier gene trees that have a disproportionate influence on the resolution of contentious relationships, (3) assess the effects of missing data in species-tree analysis, and (4) clarify biases in commonly-implemented coalescence methods and support indices. We show that key phylogenomic conclusions from these analyses often hinge on just a few gene trees and that results can be driven by specific biases of a particular coalescence method and/or the differential weight placed on gene trees with high versus low taxon sampling. The attribution of exceptionally high weight to some gene trees and very low weight to other gene trees counters the basic logic of phylogenomic coalescence analysis; even clades in species trees with high support according to commonly used indices (likelihood-ratio test, bootstrap, Bayesian local posterior probability) can be unstable to the removal of only one or two gene trees with high PCS. 
Computer simulations cannot adequately describe all of the contingencies and complexities of empirical genetic data. PCS scores complement simulation work by providing specific insights into a particular dataset given the assumptions of the phylogenetic coalescence method that is applied. In combination with standard measures of nodal support, PCS provides a more complete understanding of the overall genomic evidence for contested evolutionary relationships in species trees.


Subjects
Phylogeny , Animals , Bayes Theorem , Bias , Biological Evolution , Computer Simulation , Genes , Genomics , Lizards/classification , Lizards/genetics , Magnoliopsida/classification , Magnoliopsida/genetics , Plants/classification , Plants/genetics , Probability
13.
Mol Phylogenet Evol ; 120: 364-374, 2018 03.
Article in English | MEDLINE | ID: mdl-29277542

ABSTRACT

MC5R is one of five melanocortin receptor genes found in placental mammals. MC5R plays an important role in energy homeostasis and is also expressed in the terminal differentiation of sebaceous glands. Among placental mammals there are multiple lineages that either lack or have degenerative sebaceous glands including Cetacea (whales, dolphins, and porpoises), Hippopotamidae (hippopotamuses), Sirenia (manatees and dugongs), Proboscidea (elephants), Rhinocerotidae (rhinos), and Heterocephalus glaber (naked mole rat). Given the loss or diminution of sebaceous glands in these taxa, we procured MC5R sequences from publicly available genomes and transcriptomes, supplemented by a newly generated sequence for Choeropsis liberiensis (pygmy hippopotamus), to determine if this gene remains intact or is inactivated in association with loss/reduction of sebaceous glands. Our data set includes complete MC5R sequences for 114 placental mammal species including two individuals of Mammuthus primigenius (woolly mammoth) from Oimyakon and Wrangel Island. Complete loss or inactivation of the MC5R gene occurs in multiple placental lineages that have lost sebaceous glands (Cetacea, West Indian manatee, African elephant, white rhinoceros) or are characterized by unusual skin (pangolins, aardvarks). Both M. primigenius individuals share inactivating mutations with the African elephant even though sebaceous glands have been reported in the former. MC5R remains intact in hippopotamuses and the naked mole rat, although slightly elevated dN/dS ratios in these lineages allow for the possibility that the accumulation of inactivating mutations in MC5R may lag behind the relaxation of purifying selection. For Cetacea and Hippopotamidae, the absence of shared inactivating mutations in two different skin genes (MC5R, PSORS1C2) is consistent with the hypothesis that semi-aquatic lifestyles were acquired independently in these clades following divergence from a common ancestor.


Subjects
Molecular Evolution , Mammals/metabolism , Melanocortin Receptors/genetics , Sebaceous Glands/physiology , Animals , Base Sequence , Genetic Databases , Female , Mammals/classification , Phylogeny , Placenta/metabolism , Pregnancy , Melanocortin Receptors/classification , Sequence Alignment
14.
J Hered ; 109(3): 297-307, 2018 03 16.
Article in English | MEDLINE | ID: mdl-29077895

ABSTRACT

Homology is perhaps the most central concept of phylogenetic biology. At difficult-to-resolve polytomies that are deep in the Tree of Life, a few homology errors in phylogenomic data can drive spurious phylogenetic results. Feijoo and Parada (2017) assembled three phylogenomic data sets for mammals and reported methodological discrepancies and unexpected results that contradict the monophyly of well-established clades in Pinnipedia and Yangochiroptera. Examination of Feijoo and Parada's (2017) data sets reveals extensive homology errors (paralogous sequences, alignments of different exons to each other) and cross-contamination of sequences from different species. These problems predictably result in distorted estimates of gene trees, species trees, bootstrap support, and branch lengths. Correction of these errors resulted in robust support for conventional relationships in Pinnipedia and Yangochiroptera. Phylogenomic data sets are not immune to the problems of homology errors in sequence alignments. Rather, sequence alignments underlie all inferences in molecular phylogenetics and evolution and should be spot-checked for obvious errors via manual inspection of alignments and gene trees.


Subjects
Caniformia/genetics , Chiroptera/genetics , Phylogeny , Animals , Carnivora/genetics , Genetic Databases , Datasets as Topic , Exons , Likelihood Functions , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data
15.
Mol Phylogenet Evol ; 109: 375-387, 2017 04.
Article in English | MEDLINE | ID: mdl-28193458

ABSTRACT

Various toothed whales (Odontoceti) are unique among mammals in lacking olfactory bulbs as adults and are thought to be anosmic (lacking the olfactory sense). At the molecular level, toothed whales have high percentages of pseudogenic olfactory receptor genes, but species that have been investigated to date retain an intact copy of the olfactory marker protein gene (OMP), which is highly expressed in olfactory receptor neurons and may regulate the temporal resolution of olfactory responses. One hypothesis for the retention of intact OMP in diverse odontocete lineages is that this gene is pleiotropic with additional functions that are unrelated to olfaction. Recent expression studies provide some support for this hypothesis. Here, we report OMP sequences for representatives of all extant cetacean families and provide the first molecular evidence for inactivation of this gene in vertebrates. Specifically, OMP exhibits independent inactivating mutations in six different odontocete lineages: four river dolphin genera (Platanista, Lipotes, Pontoporia, Inia), sperm whale (Physeter), and harbor porpoise (Phocoena). These results suggest that the only essential role of OMP that is maintained by natural selection is in olfaction, although a non-olfactory role for OMP cannot be ruled out for lineages that retain an intact copy of this gene. Available genome sequences from cetaceans and close outgroups provide evidence of inactivating mutations in two additional genes (CNGA2, CNGA4), which imply further pseudogenization events in the olfactory cascade of odontocetes. Selection analyses demonstrate that evolutionary constraints on all three genes (OMP, CNGA2, CNGA4) have been greatly reduced in Odontoceti, but retain a signature of purifying selection on the stem Cetacea branch and in Mysticeti (baleen whales). 
This pattern is compatible with the 'echolocation-priority' hypothesis for the evolution of OMP, which posits that negative selection was maintained in the common ancestor of Cetacea and was not relaxed significantly until the evolution of echolocation in Odontoceti.


Subjects
Dolphins/genetics , Olfactory Marker Protein/genetics , Animals , Base Sequence , Biological Evolution , Mitochondrial DNA , Dolphins/classification , Molecular Evolution , Olfactory Marker Protein/physiology , Phylogeny
16.
Mol Phylogenet Evol ; 106: 103-117, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27640953

ABSTRACT

Multi-locus nuclear DNA data were used to delimit species of fringe-toed lizards of the Uma notata complex, which are specialized for living in wind-blown sand habitats in the deserts of southwestern North America, and to infer whether Quaternary glacial cycles or Tertiary geological events were important in shaping the historical biogeography of this group. We analyzed ten nuclear loci collected using Sanger sequencing and genome-wide sequence/single-nucleotide polymorphism (SNP) data collected using restriction-associated DNA (RAD) sequencing. A combination of species discovery methods (concatenated phylogenies, parametric and non-parametric clustering algorithms) and species validation approaches (coalescent-based species tree/isolation-with-migration models) was used to delimit species, infer phylogenetic relationships, and estimate effective population sizes, migration rates, and speciation times. Uma notata, U. inornata, U. cowlesi, and an undescribed species from Mohawk Dunes, Arizona (U. sp.) were supported as distinct in the concatenated analyses and by clustering algorithms, and all operational taxonomic units were decisively supported as distinct species by ranking hierarchical nested speciation models with Bayes factors based on coalescent-based species tree methods. However, significant unidirectional gene flow (2NM>1) from U. cowlesi and U. notata into U. rufopunctata was detected under the isolation-with-migration model. Therefore, we conservatively delimit four species-level lineages within this complex (U. inornata, U. notata, U. cowlesi, and U. sp.), treating U. rufopunctata as a hybrid population (U. notata×cowlesi). Both concatenated and coalescent-based estimates of speciation times support the hypotheses that speciation within the complex occurred during the late Pleistocene, and that the geological evolution of the Colorado River delta during this period was an important process shaping the observed phylogeographic patterns.


Subjects
Gene Flow , Lizards/classification , Animal Migration , Animals , Bayes Theorem , Biodiversity , Cluster Analysis , DNA/chemistry , DNA/isolation & purification , DNA/metabolism , Lizards/genetics , Phylogeny , Phylogeography , Single Nucleotide Polymorphism , Principal Component Analysis , DNA Sequence Analysis
17.
Cladistics ; 33(3): 295-332, 2017 Jun.
Article in English | MEDLINE | ID: mdl-34715726

ABSTRACT

Recent phylogenetic analyses of a large dataset for mammalian families (169 taxa, 26 loci) portray contrasting results. Supermatrix (concatenation) methods support a generally robust tree with only a few inconsistently resolved polytomies, whereas MP-EST coalescence analysis of the same dataset yields a weakly supported tree that conflicts with many traditionally recognized clades. Here, we evaluate this discrepancy via improved coalescence analyses with reference to the rich history of phylogenetic studies on mammals. This integration clearly demonstrates that both supermatrix and coalescence analyses of just 26 loci yield a congruent, well-supported phylogenetic hypothesis for Mammalia. Discrepancies between published studies are explained by implementation of overly simple DNA substitution models, inadequate tree-search routines and limitations of the MP-EST method. We develop a simple measure, partitioned coalescence support (PCS), which summarizes the distribution of support and conflict among gene trees for a given clade. Extremely high PCS scores for outlier gene trees at two nodes in the mammalian tree indicate a troubling bias in the MP-EST method. We conclude that in this age of phylogenomics, a solid understanding of systematics fundamentals, choice of valid methodology and a broad knowledge of a clade's taxonomic history are still required to yield coherent phylogenetic inferences.

18.
Mol Phylogenet Evol ; 94(Pt A): 1-33, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26238460

ABSTRACT

Higher-level relationships among placental mammals are mostly resolved, but several polytomies remain contentious. Song et al. (2012) claimed to have resolved three of these using shortcut coalescence methods (MP-EST, STAR) and further concluded that these methods, which assume no within-locus recombination, are required to unravel deep-level phylogenetic problems that have stymied concatenation. Here, we reanalyze Song et al.'s (2012) data and leverage these re-analyses to explore key issues in systematics including the recombination ratchet, gene tree stoichiometry, the proportion of gene tree incongruence that results from deep coalescence versus other factors, and simulations that compare the performance of coalescence and concatenation methods in species tree estimation. Song et al. (2012) reported an average locus length of 3.1 kb for the 447 protein-coding genes in their phylogenomic dataset, but the true mean length of these loci (start codon to stop codon) is 139.6 kb. Empirical estimates of recombination breakpoints in primates, coupled with consideration of the recombination ratchet, suggest that individual coalescence genes (c-genes) approach ∼12 bp or less for Song et al.'s (2012) dataset, three to four orders of magnitude shorter than the c-genes reported by these authors. This result has general implications for the application of coalescence methods in species tree estimation. We contend that it is illogical to apply coalescence methods to complete protein-coding sequences. Such analyses amalgamate c-genes with different evolutionary histories (i.e., exons separated by >100,000 bp), distort true gene tree stoichiometry that is required for accurate species tree inference, and contradict the central rationale for applying coalescence methods to difficult phylogenetic problems. 
In addition, Song et al.'s (2012) dataset of 447 genes includes 21 loci with switched taxonomic names, eight duplicated loci, 26 loci with non-homologous sequences that are grossly misaligned, and numerous loci with >50% missing data for taxa that are misplaced in their gene trees. These problems were compounded by inadequate tree searches with nearest neighbor interchange branch swapping and inadvertent application of substitution models that did not account for among-site rate heterogeneity. Sixty-six gene trees imply unrealistic deep coalescences that exceed 100 million years (MY). Gene trees that were obtained with better justified models and search parameters show large increases in both likelihood scores and congruence. Coalescence analyses based on a curated set of 413 improved gene trees and a superior coalescence method (ASTRAL) support a Scandentia (treeshrews)+Glires (rabbits, rodents) clade, contradicting one of the three primary systematic conclusions of Song et al. (2012). Robust support for a Perissodactyla+Carnivora clade within Laurasiatheria is also lost, contradicting a second major conclusion of this study. Song et al.'s (2012) MP-EST species tree provided the basis for circular simulations that led these authors to conclude that the multispecies coalescent accounts for 77% of the gene tree conflicts in their dataset, but many internal branches of their MP-EST tree are stunted by an order of magnitude or more due to wholesale gene tree reconstruction errors. An independent assessment of branch lengths suggests the multispecies coalescent accounts for ⩽ 15% of the conflicts among Song et al.'s (2012) 447 gene trees. Unfortunately, Song et al.'s (2012) flawed phylogenomic dataset has been used as a model for additional simulation work that suggests the superiority of shortcut coalescence methods relative to concatenation. 
Investigator error was passed on to the subsequent simulation studies, which also incorporated further logical errors that should be avoided in future simulation studies. Illegitimate branch length switches in the simulation routines unfairly protected coalescence methods from their Achilles' heel, high gene tree reconstruction error at short internodes. These simulations therefore provide no evidence that shortcut coalescence methods out-compete concatenation at deep timescales. In summary, the long c-genes that are required for accurate reconstruction of species trees using shortcut coalescence methods do not exist and are a delusion. Coalescence approaches based on SNPs that are widely spaced in the genome avoid problems with the recombination ratchet and merit further pursuit in both empirical systematic research and simulations.


Subjects
Genes , Mammals/classification , Mammals/genetics , Genetic Models , Phylogeny , Animals , Datasets as Topic , Molecular Evolution , Single Nucleotide Polymorphism/genetics , Scandentia/classification , Scandentia/genetics
19.
Mol Phylogenet Evol ; 100: 424-443, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27103257

ABSTRACT

Observed Variability (OV) and Tree Independent Generation of Evolutionary Rates (TIGER) are quick and easy-to-apply tree-independent methods that have been proposed to provide unbiased estimates of each character's rate of evolution and serve as the basis for excluding rapidly evolving characters. Both methods have been applied to multiple phylogenomic datasets, and in many cases the authors considered their trees inferred from the OV- and TIGER-delimited sub-matrices to be better estimates of the phylogeny than their trees based on all characters. In this study we use four sets of simulations and an empirical phylogenomic example to demonstrate that both methods share a systematic bias against characters with more symmetric distributions of character states, against characters with greater observed character-state space, and against large clades in the context of character conflict. As a result, these methods can favor convergences and reversals over synapomorphy, exacerbate long-branch attraction, and produce mutually exclusive phylogenetic inferences that are dependent upon differential taxon sampling. We assert that neither OV nor TIGER should be relied upon to increase the ratio of phylogenetic to non-phylogenetic signal in a data matrix. We also assert that skepticism is warranted for empirical phylogenetic results that are based on OV- and/or TIGER-based character deletion wherein a small clade is supported after deletion of characters, yet is contradicted by a larger clade when the entire data matrix is analyzed.


Subjects
Molecular Evolution , Phylogeny , Classification , Data Accuracy , Statistical Data Interpretation , Genetic Variation , Genetic Models , Molecular Typing/methods
20.
Mol Phylogenet Evol ; 97: 76-89, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26768112

ABSTRACT

Gene-tree-estimation error is a major concern for coalescent methods of phylogenetic inference. We sampled eight empirical studies of ancient lineages with diverse numbers of taxa and genes for which the original authors applied one or more coalescent methods. We found that the average pairwise congruence among gene trees varied greatly both between studies and also often within a study. We recommend that presenting plots of pairwise congruence among gene trees in a dataset be treated as a standard practice for empirical coalescent studies so that readers can readily assess the extent and distribution of incongruence among gene trees. ASTRAL-based coalescent analyses generally outperformed MP-EST and STAR with respect to both internal consistency (congruence between analyses of subsamples of genes with the complete dataset of all genes) and congruence with the concatenation-based topology. We evaluated the approach of subsampling gene trees that are, on average, more congruent with other gene trees as a method to reduce artifacts caused by gene-tree-estimation errors on coalescent analyses. We suggest that this method is well suited to testing whether gene-tree-estimation error is a primary cause of incongruence between concatenation- and coalescent-based results, to reconciling conflicting phylogenetic results based on different coalescent methods, and to identifying genes affected by artifacts that may then be targeted for reciprocal illumination. We provide scripts that automate the process of calculating pairwise gene-tree incongruence and subsampling trees while accounting for differential taxon sampling among genes. Finally, we assert that multiple tree-search replicates should be implemented as a standard practice for empirical coalescent studies that apply MP-EST.
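
The pairwise-congruence and subsampling idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' script: the tree representation (a set of taxa plus one side of each internal branch), the function names `restrict`, `congruence`, `mean_congruence`, and `subsample`, and the use of a normalized Robinson-Foulds score as the congruence measure are all assumptions made for illustration. Differential taxon sampling is handled here by pruning each pair of trees to their shared taxa before comparison.

```python
from itertools import combinations

# Hypothetical representation: a tree is (taxa, splits), where each split is
# the set of taxa on one side of an internal branch of the unrooted tree.

def restrict(splits, shared):
    """Prune splits to the shared taxa, dropping splits that become trivial."""
    ref = min(shared)  # arbitrary reference taxon used to orient each split
    kept = set()
    for side in splits:
        a = frozenset(side) & shared
        b = shared - a
        if len(a) >= 2 and len(b) >= 2:
            kept.add(b if ref in a else a)  # store the side without ref
    return kept

def congruence(tree1, tree2):
    """1 - normalized Robinson-Foulds distance on shared taxa, or None."""
    shared = frozenset(tree1[0]) & frozenset(tree2[0])
    if len(shared) < 4:
        return None  # fewer than four shared taxa carry no topology signal
    s1 = restrict(tree1[1], shared)
    s2 = restrict(tree2[1], shared)
    total = len(s1) + len(s2)
    if total == 0:
        return None  # both pruned trees are unresolved
    return 1.0 - len(s1 ^ s2) / total

def mean_congruence(trees):
    """Average congruence of each tree with every other comparable tree."""
    sums = [0.0] * len(trees)
    counts = [0] * len(trees)
    for i, j in combinations(range(len(trees)), 2):
        c = congruence(trees[i], trees[j])
        if c is None:
            continue
        for k in (i, j):
            sums[k] += c
            counts[k] += 1
    # Assumption: a tree with no comparable partner scores 0.0.
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]

def subsample(trees, keep_fraction=0.5):
    """Indices of the trees that are, on average, most congruent with others."""
    scores = mean_congruence(trees)
    n_keep = max(1, int(len(trees) * keep_fraction))
    ranked = sorted(range(len(trees)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])

# Toy gene trees; the last one lacks taxon E (differential taxon sampling).
example_trees = [
    ({"A", "B", "C", "D", "E"}, [{"A", "B"}, {"D", "E"}]),
    ({"A", "B", "C", "D", "E"}, [{"A", "B"}, {"D", "E"}]),
    ({"A", "B", "C", "D", "E"}, [{"A", "C"}, {"D", "E"}]),
    ({"A", "B", "C", "D"}, [{"A", "B"}]),
]
kept_indices = subsample(example_trees, keep_fraction=0.5)
```

In practice the splits would be extracted from Newick files with a phylogenetics library, and the per-pair scores returned by `congruence` could be plotted directly to show the distribution of incongruence across a dataset.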


Subjects
Genes , Genetic Techniques , Phylogeny , Artifacts , Reproducibility of Results , Research Design