ABSTRACT
A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin-Sokal parsimony (as well as an approach based on MDC) typically fail to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e., local posterior probabilities). We examined the accuracy of branch length estimation, finding that estimated lengths were accurate for short branches but upwardly biased otherwise. 
This led us to derive the maximum likelihood (branch length) estimate for when RIs are given as input instead of binary gene trees; this corrected formula produced accurate estimates of branch lengths in our simulation study provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100,000 parsimony-informative RIs. We found that, when given just 1000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e., clades separated by branches $>0.3$ coalescent units) with high support and identified rapid radiations (i.e., shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false positive branches in these scenarios. [Coalescence; incomplete lineage sorting; Laurasiatheria; Palaeognathae; parsimony; polymorphism parsimony; retroelement insertions; species trees; transposon.].
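As background for the branch-length result above, here is a minimal sketch of the standard multispecies-coalescent estimate for an internal branch from quartet counts, which assumes the inputs are gene trees. A gene-tree quartet matches the species tree with probability 1 - (2/3)e^(-d), so d can be recovered from the observed proportion. This is not the corrected formula for RIs, which the abstract does not state; the function name and the count-based interface are illustrative assumptions.

```python
import math


def branch_length_from_quartets(n_match: int, n_alt1: int, n_alt2: int) -> float:
    """Estimate an internal branch length (coalescent units) from quartet counts.

    Assumes the counts come from gene trees: under the multispecies coalescent
    the matching quartet has probability z = 1 - (2/3) * exp(-d), so
    d = -ln(1.5 * (1 - z)). This is the gene-tree formula, not the corrected
    estimator for retroelement insertions described in the abstract.
    """
    total = n_match + n_alt1 + n_alt2
    if total == 0:
        raise ValueError("no quartets observed")
    z = n_match / total
    if z <= 1 / 3:
        return 0.0  # no signal above the ILS floor; treat as zero length
    if z >= 1.0:
        return math.inf  # no conflict observed; length is unbounded
    return -math.log(1.5 * (1.0 - z))


# Hypothetical counts: 600 matching quartets, 200 for each alternative.
print(round(branch_length_from_quartets(600, 200, 200), 4))
```

Applying this gene-tree formula directly to RI counts is the setting in which the abstract reports upward bias for longer branches.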
Subjects
Palaeognathae, Retroelements, Animals, Computer Simulation, Genetic Models, Phylogeny, Retroelements/genetics
ABSTRACT
Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
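The difference between standard RF distance and a contradiction-only count can be illustrated with a small sketch. It assumes rooted trees represented as sets of clades (each clade a frozenset of taxon names); this is a simplification for illustration and is not the congsort TNT script, whose exact definitions and normalizations may differ.

```python
from typing import FrozenSet, Set

Clade = FrozenSet[str]


def compatible(a: Clade, b: Clade) -> bool:
    """Two rooted clades are compatible if they are nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)


def rf_distance(t1: Set[Clade], t2: Set[Clade]) -> int:
    """Standard RF: clades present in exactly one of the two trees."""
    return len(t1 ^ t2)


def rf_contradictions(t1: Set[Clade], t2: Set[Clade]) -> int:
    """Count only clades that conflict with some clade in the other tree."""
    c1 = sum(1 for a in t1 if any(not compatible(a, b) for b in t2))
    c2 = sum(1 for b in t2 if any(not compatible(a, b) for a in t1))
    return c1 + c2


# Hypothetical five-taxon example (taxa A-E); trivial clades omitted.
resolved = {frozenset("AB"), frozenset("ABC")}
collapsed = {frozenset("ABC")}                     # A, B, C form a polytomy
conflicting = {frozenset("AC"), frozenset("ABC")}

print(rf_distance(resolved, collapsed), rf_contradictions(resolved, collapsed))
print(rf_distance(resolved, conflicting), rf_contradictions(resolved, conflicting))
```

In this toy case the collapsed tree is penalized by RF but not by the contradiction count, which is the property the abstract relies on when comparing partially resolved gene trees.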
Subjects
Artifacts, Phylogeny, Likelihood Functions
ABSTRACT
Phylogenomic analyses of ancient rapid radiations can produce conflicting results that are driven by differential sampling of taxa and characters as well as the limitations of alternative analytical methods. We re-examine basal relationships of palaeognath birds (ratites and tinamous) using recently published datasets of nucleotide characters from 20,850 loci as well as 4301 retroelement insertions. The original studies attributed conflicting resolutions of rheas in their inferred coalescent and concatenation trees to concatenation failing in the anomaly zone. By contrast, we find that the coalescent-based resolution of rheas is premised upon extensive gene-tree estimation errors. Furthermore, retroelement insertions contain much more conflict than originally reported and multiple insertion loci support the basal position of rheas found in concatenation trees, while none were reported in the original publication. We demonstrate how even remarkable congruence in phylogenomic studies may be driven by long-branch misplacement of a divergent outgroup, highly incongruent gene trees, differential taxon sampling that can result in gene-tree misrooting errors that bias species-tree inference, and gross homology errors. What was previously interpreted as broad, robustly supported corroboration for a single resolution in coalescent analyses may instead indicate a common bias that taints phylogenomic results across multiple genome-scale datasets. The updated retroelement dataset now supports a species tree with branch lengths that suggest an ancient anomaly zone, and both concatenation and coalescent analyses of the huge nucleotide datasets fail to yield coherent, reliable results in this challenging phylogenetic context.
Subjects
Birds, Genome, Animals, Birds/genetics, Phylogeny
ABSTRACT
The loss of teeth and evolution of baleen racks in Mysticeti was a profound transformation that permitted baleen whales to radiate and diversify into a previously underutilized ecological niche of bulk filter-feeding on zooplankton and other small prey. Ancestral state reconstructions suggest that postnatal teeth were lost in the common ancestor of crown Mysticeti. Genomic studies provide some support for this hypothesis and suggest that the genetic toolkit for enamel production was inactivated in the common ancestor of living baleen whales. However, molecular studies to date have not provided direct evidence for the complete loss of teeth, including their dentin component, on the stem mysticete branch. Given these results, several questions remain unanswered: (1) Were teeth lost in a single step or did enamel loss precede dentin loss? (2) Was enamel lost early or late on the stem mysticete branch? (3) If enamel and dentin/tooth loss were decoupled in the ancestry of baleen whales, did dentin loss occur on the stem mysticete branch or independently in different crown mysticete lineages? To address these outstanding questions, we compiled and analyzed complete protein-coding sequences for nine tooth-related genes from cetaceans with available genome data. Seven of these genes are associated with enamel formation (ACP4, AMBN, AMELX, AMTN, ENAM, KLK4, MMP20) whereas two other genes are either dentin-specific (DSPP) or tooth-specific (ODAPH) but not enamel-specific. Molecular evolutionary analyses indicate that all seven enamel-specific genes have inactivating mutations that are scattered across branches of the mysticete tree. Three of the enamel genes (ACP4, KLK4, MMP20) have inactivating mutations that are shared by all mysticetes. 
The two genes that are dentin-specific (DSPP) or tooth-specific (ODAPH) do not have any inactivating mutations that are shared by all mysticetes, but there are shared mutations in Balaenidae as well as in Plicogulae (Neobalaenidae + Balaenopteroidea). These shared mutations suggest that teeth were lost at most two times. Shared inactivating mutations and dN/dS analyses, in combination with cetacean divergence times, were used to estimate inactivation times of genes and by proxy enamel and tooth phenotypes at ancestral nodes. The results of these analyses are most compatible with a two-step model for the loss of teeth in the ancestry of living baleen whales: enamel was lost very early on the stem Mysticeti branch followed by the independent loss of dentin (and teeth) in the common ancestors of Balaenidae and Plicogulae, respectively. These results imply that some stem mysticetes, and even early crown mysticetes, may have had vestigial teeth composed of dentin with no enamel. Our results also demonstrate that all odontocete species (in our study) with absent or degenerative enamel have inactivating mutations in one or more of their enamel genes.
Subjects
Biological Evolution, Matrix Metalloproteinase 20, Animals, Dental Enamel, Matrix Metalloproteinase 20/genetics, Phylogeny, Whales/genetics
ABSTRACT
The deep sea has been described as the last major ecological frontier, as much of its biodiversity is yet to be discovered and described. Beaked whales (ziphiids) are among the most visible inhabitants of the deep sea, due to their large size and worldwide distribution, and their taxonomic diversity and much about their natural history remain poorly understood. We combine genomic and morphometric analyses to reveal a new Southern Hemisphere ziphiid species, Ramari's beaked whale, Mesoplodon eueu, whose name is linked to the Indigenous peoples of the lands from which the species holotype and paratypes were recovered. Mitogenome and ddRAD-derived phylogenies demonstrate reciprocally monophyletic divergence between M. eueu and True's beaked whale (M. mirus) from the North Atlantic, with which it was previously subsumed. Morphometric analyses of skulls also distinguish the two species. A time-calibrated mitogenome phylogeny and analysis of two nuclear genomes indicate divergence began circa 2 million years ago (Ma), with geneflow ceasing 0.35-0.55 Ma. This is an example of how deep sea biodiversity can be unravelled through increasing international collaboration and genome sequencing of archival specimens. Our consultation and involvement with Indigenous peoples offers a model for broadening the cultural scope of the scientific naming process.
Subjects
Genomics, Whales, Animals, Cell Nucleus, Phylogeny, Whales/anatomy & histology, Whales/genetics
ABSTRACT
As limits on O2 availability during submergence impose severe constraints on aerobic respiration, the oxygen binding globin proteins of marine mammals are expected to have evolved under strong evolutionary pressures during their land-to-sea transition. Here, we address this question for the order Sirenia by retrieving, annotating, and performing detailed selection analyses on the globin repertoire of the extinct Steller's sea cow (Hydrodamalis gigas), dugong (Dugong dugon), and Florida manatee (Trichechus manatus latirostris) in relation to their closest living terrestrial relatives (elephants and hyraxes). These analyses indicate most loci experienced elevated nucleotide substitution rates during their transition to a fully aquatic lifestyle. While most of these genes evolved under neutrality or strong purifying selection, the rate of nonsynonymous/synonymous replacements increased in two genes (Hbz-T1 and Hba-T1) that encode the α-type chains of hemoglobin (Hb) during each stage of life. Notably, the relaxed evolution of Hba-T1 is temporally coupled with the emergence of a chimeric pseudogene (Hba-T2/Hbq-ps) that contributed to the tandemly linked Hba-T1 of stem sirenians via interparalog gene conversion. Functional tests on recombinant Hb proteins from extant and ancestral sirenians further revealed that the molecular remodeling of Hba-T1 coincided with increased Hb-O2 affinity in early sirenians. Available evidence suggests that this trait evolved to maximize O2 extraction from finite lung stores and suppress tissue O2 offloading, thereby facilitating the low metabolic intensities of extant sirenians. In contrast, the derived reduction in Hb-O2 affinity in (sub)Arctic Steller's sea cows is consistent with fueling increased thermogenesis by these once colossal marine herbivores.
Subjects
Biological Adaptation, Molecular Evolution, Globins/genetics, Pseudogenes, Sirenia/genetics, Animals, Gene Conversion, Globins/metabolism, Male, Multigene Family, Mutant Chimeric Proteins, Oxygen/metabolism, Genetic Selection, Sirenia/metabolism
ABSTRACT
DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the "no intralocus-recombination" assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with 2 ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaenopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.
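The logic behind a quartet-asymmetry test can be sketched as follows. Under incomplete lineage sorting alone, the two minority resolutions of a quartet are expected to be supported by equal numbers of insertions, so a significant imbalance points to something else, such as introgression. The abstract does not specify the test statistic, so the exact two-sided binomial test used here is an assumption, and the function name is hypothetical.

```python
from math import comb


def quartet_asymmetry_pvalue(n_minor1: int, n_minor2: int) -> float:
    """Exact two-sided binomial p-value for equal support of two minority quartets.

    Null hypothesis (ILS only): each conflicting insertion supports either
    minority topology with probability 0.5. The choice of a binomial test is an
    illustrative assumption, not necessarily the published procedure.
    """
    n = n_minor1 + n_minor2
    if n == 0:
        return 1.0  # no conflicting insertions, nothing to test
    k = max(n_minor1, n_minor2)
    # With p = 0.5 the distribution is symmetric, so double the upper tail.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)


# Hypothetical counts for one quartet of species.
print(quartet_asymmetry_pvalue(10, 0))   # strong asymmetry
print(quartet_asymmetry_pvalue(5, 5))    # perfectly balanced
```

When many quartets are screened, as in the Balaenopteroidea data set, some correction for multiple testing would be needed; that step is omitted from this sketch.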
Subjects
Genetic Speciation, Genetic Models, Retroelements, Vertebrates/genetics, Animals, DNA Transposable Elements, Genetic Hybridization, Phylogeny
ABSTRACT
BACKGROUND: The gene for odontogenic ameloblast-associated (ODAM) is a member of the secretory calcium-binding phosphoprotein gene family. ODAM is primarily expressed in dental tissues including the enamel organ and the junctional epithelium, and may also have pleiotropic functions that are unrelated to teeth. Here, we leverage the power of natural selection to test competing hypotheses that ODAM is tooth-specific versus pleiotropic. Specifically, we compiled and screened complete protein-coding sequences, plus sequences for flanking intronic regions, for ODAM in 165 placental mammals to determine if this gene contains inactivating mutations in lineages that either lack teeth (baleen whales, pangolins, anteaters) or lack enamel on their teeth (aardvarks, sloths, armadillos), as would be expected if the only essential functions of ODAM are related to tooth development and the adhesion of the gingival junctional epithelium to the enamel tooth surface. RESULTS: We discovered inactivating mutations in all species of placental mammals that either lack teeth or lack enamel on their teeth. A surprising result is that ODAM is also inactivated in a few additional lineages including all toothed whales that were examined. We hypothesize that ODAM inactivation is related to the simplified outer enamel surface of toothed whales. An alternate hypothesis is that ODAM inactivation in toothed whales may be related to altered antimicrobial functions of the junctional epithelium in aquatic habitats. Selection analyses on ODAM sequences revealed that the composite dN/dS value for pseudogenic branches is close to 1.0 as expected for a neutrally evolving pseudogene. dN/dS values on transitional branches were used to estimate ODAM inactivation times. In the case of pangolins, ODAM was inactivated ~65 million years ago, which is older than the oldest pangolin fossil (Eomanis, 47 Ma) and suggests an even more ancient loss or simplification of teeth in this lineage.
CONCLUSION: Our results validate the hypothesis that the only essential functions of ODAM that are maintained by natural selection are related to tooth development and/or the maintenance of a healthy junctional epithelium that attaches to the enamel surface of teeth.
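One common way to turn a dN/dS value on a transitional branch into an inactivation date is to treat the branch as a mixture of a functional interval and a pseudogenic interval. The sketch below assumes that mixture model, with the pseudogenic rate defaulting to 1.0; whether this matches the exact procedure used in the study is an assumption, and all numbers in the example are hypothetical.

```python
def inactivation_time(
    node_old_ma: float,
    node_young_ma: float,
    omega_mixed: float,
    omega_functional: float,
    omega_pseudo: float = 1.0,
) -> float:
    """Estimate when a gene was inactivated along a transitional branch (in Ma).

    Assumes the branch dN/dS is a time-weighted average of a functional rate
    and a pseudogenic rate:
        omega_mixed = (T_f * omega_functional + T_p * omega_pseudo) / (T_f + T_p)
    Solving for the pseudogenic fraction gives the inactivation date.
    """
    if node_old_ma <= node_young_ma:
        raise ValueError("older node must predate the younger node")
    if omega_pseudo <= omega_functional:
        raise ValueError("pseudogenic dN/dS must exceed functional dN/dS")
    branch_duration = node_old_ma - node_young_ma
    frac_pseudo = (omega_mixed - omega_functional) / (omega_pseudo - omega_functional)
    frac_pseudo = min(1.0, max(0.0, frac_pseudo))  # keep the date on the branch
    return node_young_ma + frac_pseudo * branch_duration


# Hypothetical branch from 80 Ma to 40 Ma with illustrative dN/dS values.
print(inactivation_time(80.0, 40.0, omega_mixed=0.6, omega_functional=0.2))
```

The estimate is sensitive to the assumed functional and pseudogenic rates, so it is usually reported alongside alternative rate assumptions.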
Subjects
Ameloblasts/metabolism, Dental Enamel/metabolism, Eutheria/genetics, Gene Silencing, Odontogenesis, Proteins/genetics, Whales/genetics, Animals, Base Sequence, Bayes Theorem, Codon/genetics, Female, Fossils, Likelihood Functions, Mutation/genetics, Phylogeny, Pregnancy, Proteins/metabolism
ABSTRACT
In summary ("two-step") coalescent analyses of empirical data, researchers typically apply the bootstrap to quantify branch support for clades inferred on the optimal species tree. We tested whether site-wise bootstrap analyses provide consistently more conservative support than gene-wise bootstrap analyses. We did so using data from three empirical phylogenomic studies and employed four coalescent methods (ASTRAL, MP-EST, NJst, and STAR). We demonstrate that application of site-wise bootstrapping generally resulted in gene-trees with substantial additional conflicts relative to the original data and this approach therefore cannot be relied upon to provide conservative support. Instead, the site-wise bootstrap can provide high support for apparently incorrect clades. We provide a script (https://github.com/dbsloan/msc_tree_resampling) that implements gene-wise resampling, using either the bootstrap or the jackknife, for use with ASTRAL, MP-EST, NJst, and STAR. We demonstrate that the gene-wise bootstrap outperformed the site-wise bootstrap for the primary focal clades for all four coalescent methods that were applied to all three empirical studies. For summary coalescent analyses we suggest that gene-wise resampling support should be favored over gene + site or site-wise resampling when numerous genes are sampled because site-wise resampling causes substantially greater gene-tree-estimation error.
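Gene-wise resampling operates on the set of already-estimated gene trees rather than on alignment columns, so no gene tree is re-estimated from resampled sites. A minimal sketch, assuming gene trees are held as Newick strings in a list, is given below. It is not the linked script; the function names and the default deletion fraction are illustrative assumptions.

```python
import random
from typing import List, Sequence


def genewise_bootstrap(gene_trees: Sequence[str], rng: random.Random) -> List[str]:
    """Draw a pseudoreplicate by sampling gene trees with replacement."""
    return [rng.choice(gene_trees) for _ in gene_trees]


def genewise_jackknife(
    gene_trees: Sequence[str], rng: random.Random, delete_fraction: float = 0.5
) -> List[str]:
    """Draw a pseudoreplicate by keeping a random subset without replacement.

    The 0.5 deletion fraction is an illustrative default, not a recommendation.
    """
    keep = max(1, round(len(gene_trees) * (1.0 - delete_fraction)))
    return rng.sample(list(gene_trees), keep)


# Hypothetical input: four three-taxon gene trees in Newick format.
trees = ["((A,B),C);", "((A,C),B);", "((B,C),A);", "((A,B),C);"]
rng = random.Random(0)
boot = genewise_bootstrap(trees, rng)
jack = genewise_jackknife(trees, rng, delete_fraction=0.5)
print(len(boot), len(jack))
```

Each pseudoreplicate would then be passed to the chosen summary method (for example ASTRAL), and clade support taken as the frequency of each clade across the resulting species trees.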
Subjects
Genes, Phylogeny, Empirical Research, Probability, Software
ABSTRACT
Genomic datasets sometimes support conflicting phylogenetic relationships when different tree-building methods are applied. Coherent interpretations of such results are enabled by partitioning support for controversial relationships among the constituent genes of a phylogenomic dataset. For the supermatrix (=concatenation) approach, several methods that measure the distribution of support and conflict among loci were introduced over 15 years ago. More recently, partitioned coalescence support (PCS) was developed for phylogenetic coalescence methods that account for incomplete lineage sorting and use the summed fits of gene trees to estimate the species tree. Here, we automate computation of PCS to permit application of this index to genome-scale matrices that include hundreds of loci. Reanalyses of four phylogenomic datasets for amniotes, land plants, skinks, and angiosperms demonstrate how PCS scores can be used to: (1) compare conflicting results favored by alternative coalescence methods, (2) identify outlier gene trees that have a disproportionate influence on the resolution of contentious relationships, (3) assess the effects of missing data in species-tree analysis, and (4) clarify biases in commonly-implemented coalescence methods and support indices. We show that key phylogenomic conclusions from these analyses often hinge on just a few gene trees and that results can be driven by specific biases of a particular coalescence method and/or the differential weight placed on gene trees with high versus low taxon sampling. The attribution of exceptionally high weight to some gene trees and very low weight to other gene trees counters the basic logic of phylogenomic coalescence analysis; even clades in species trees with high support according to commonly used indices (likelihood-ratio test, bootstrap, Bayesian local posterior probability) can be unstable to the removal of only one or two gene trees with high PCS.
Computer simulations cannot adequately describe all of the contingencies and complexities of empirical genetic data. PCS scores complement simulation work by providing specific insights into a particular dataset given the assumptions of the phylogenetic coalescence method that is applied. In combination with standard measures of nodal support, PCS provides a more complete understanding of the overall genomic evidence for contested evolutionary relationships in species trees.
Subjects
Phylogeny, Animals, Bayes Theorem, Bias, Biological Evolution, Computer Simulation, Genes, Genomics, Lizards/classification, Lizards/genetics, Magnoliopsida/classification, Magnoliopsida/genetics, Plants/classification, Plants/genetics, Probability
ABSTRACT
The mammalian family Talpidae (moles, shrew moles, desmans) is characterized by diverse ecomorphologies associated with terrestrial, semi-aquatic, semi-fossorial, fossorial, and aquatic-fossorial lifestyles. Prominent specializations involved with these different lifestyles, and the transitions between them, pose outstanding questions regarding the evolutionary history within the family, not only for living but also for fossil taxa. Here, we investigate the phylogenetic relationships, divergence times, and biogeographic history of the family using 19 nuclear and 2 mitochondrial genes (~16 kb) from ~60% of described species representing all 17 genera. Our phylogenetic analyses help settle classical questions in the evolution of moles, identify an ancient (mid-Miocene) split within the monotypic genus Scaptonyx, and indicate that talpid species richness may be nearly 30% higher than previously recognized. Our results also uniformly support the monophyly of long-tailed moles with the two shrew mole tribes and confirm that the Gansu mole is the sole living Asian member of an otherwise North American radiation. Finally, we provide evidence that aquatic specializations within the tribes Condylurini and Desmanini evolved along different morphological trajectories, though we were unable to statistically reject monophyly of the strictly fossorial tribes Talpini and Scalopini.
Subjects
Moles/genetics, Shrews/genetics, Animals, Biological Evolution, Classification/methods, Nucleic Acid Databases, Genetic Variation, Phylogeny, DNA Sequence Analysis/methods, Species Specificity
ABSTRACT
MC5R is one of five melanocortin receptor genes found in placental mammals. MC5R plays an important role in energy homeostasis and is also expressed in the terminal differentiation of sebaceous glands. Among placental mammals there are multiple lineages that either lack or have degenerative sebaceous glands including Cetacea (whales, dolphins, and porpoises), Hippopotamidae (hippopotamuses), Sirenia (manatees and dugongs), Proboscidea (elephants), Rhinocerotidae (rhinos), and Heterocephalus glaber (naked mole rat). Given the loss or diminution of sebaceous glands in these taxa, we procured MC5R sequences from publicly available genomes and transcriptomes, supplemented by a newly generated sequence for Choeropsis liberiensis (pygmy hippopotamus), to determine if this gene remains intact or is inactivated in association with loss/reduction of sebaceous glands. Our data set includes complete MC5R sequences for 114 placental mammal species including two individuals of Mammuthus primigenius (woolly mammoth) from Oimyakon and Wrangel Island. Complete loss or inactivation of the MC5R gene occurs in multiple placental lineages that have lost sebaceous glands (Cetacea, West Indian manatee, African elephant, white rhinoceros) or are characterized by unusual skin (pangolins, aardvarks). Both M. primigenius individuals share inactivating mutations with the African elephant even though sebaceous glands have been reported in the former. MC5R remains intact in hippopotamuses and the naked mole rat, although slightly elevated dN/dS ratios in these lineages allow for the possibility that the accumulation of inactivating mutations in MC5R may lag behind the relaxation of purifying selection. For Cetacea and Hippopotamidae, the absence of shared inactivating mutations in two different skin genes (MC5R, PSORS1C2) is consistent with the hypothesis that semi-aquatic lifestyles were acquired independently in these clades following divergence from a common ancestor.
Subjects
Molecular Evolution, Mammals/metabolism, Melanocortin Receptors/genetics, Sebaceous Glands/physiology, Animals, Base Sequence, Genetic Databases, Female, Mammals/classification, Phylogeny, Placenta/metabolism, Pregnancy, Melanocortin Receptors/classification, Sequence Alignment
ABSTRACT
The mammalian order Eulipotyphla includes four extant families of insectivorans: Solenodontidae (solenodons); Talpidae (moles); Soricidae (shrews); and Erinaceidae (hedgehogs). Of these, Solenodontidae includes only two extant species, which are endemic to the largest islands of the Greater Antilles: Cuba and Hispaniola. Most molecular studies suggest that eulipotyphlan families diverged from each other across several million years, with the basal split between Solenodontidae and other families occurring in the Late Cretaceous. By contrast, Sato et al. (2016) suggest that eulipotyphlan families diverged from each other in a polytomy ~58.6 million years ago (Mya). This more recent divergence estimate for Solenodontidae versus other extant eulipotyphlans suggests that solenodons must have arrived in the Greater Antilles via overwater dispersal rather than vicariance. Here, we show that the young timetree estimates for eulipotyphlan families and the polytomy are due to an inverted ingroup-outgroup arrangement of the tree, the result of using Tracer rather than TreeAnnotator to compile interfamilial divergence times, and of not enforcing the monophyly of well-established clades such as Laurasiatheria and Eulipotyphla. Finally, Sato et al.'s (2016) timetree includes several zombie lineages where estimated divergence times are much younger than minimum ages that are implied by the fossil record. We reanalyzed Sato et al.'s (2016) original data with enforced monophyly for well-established clades and updated fossil calibrations that eliminate the inference of zombie lineages. Our resulting timetrees, which were compiled with TreeAnnotator rather than Tracer, produce dates that are in good agreement with other recent studies and place the basal split between Solenodontidae and other eulipotyphlans in the Late Cretaceous.
Subjects
Fossils, Mammals/classification, Phylogeny, Animals, Calibration, Cuba, Molecular Evolution, Time Factors
ABSTRACT
Homology is perhaps the most central concept of phylogenetic biology. At difficult to resolve polytomies that are deep in the Tree of Life, a few homology errors in phylogenomic data can drive spurious phylogenetic results. Feijoo and Parada (2017) assembled three phylogenomic data sets for mammals and reported methodological discrepancies and unexpected results that contradict the monophyly of well-established clades in Pinnipedia and Yangochiroptera. Examination of Feijoo and Parada's (2017) data sets reveals extensive homology errors (paralogous sequences, alignments of different exons to each other) and cross-contamination of sequences from different species. These problems predictably result in distorted estimates of gene trees, species trees, bootstrap support, and branch lengths. Correction of these errors resulted in robust support for conventional relationships in Pinnipedia and Yangochiroptera. Phylogenomic data sets are not immune to the problems of homology errors in sequence alignments. Rather, sequence alignments underlie all inferences in molecular phylogenetics and evolution and should be spot-checked for obvious errors via manual inspection of alignments and gene trees.
Subjects
Caniformia/genetics, Chiroptera/genetics, Phylogeny, Animals, Carnivora/genetics, Genetic Databases, Datasets as Topic, Exons, Likelihood Functions, Sequence Alignment/methods, Sequence Alignment/statistics & numerical data
ABSTRACT
Throughout Earth's history, evolution's numerous natural 'experiments' have resulted in a diverse range of phenotypes. Though de novo phenotypes receive widespread attention, degeneration of traits inherited from an ancestor is a very common, yet frequently neglected, evolutionary path. The latter phenomenon, known as regressive evolution, often results in vertebrates with phenotypes that mimic inherited disease states in humans. Regressive evolution of anatomical and/or physiological traits is typically accompanied by inactivating mutations underlying these traits, which frequently occur at loci identical to those implicated in human diseases. Here we discuss the potential utility of examining the genomes of vertebrates that have experienced regressive evolution to inform human medical genetics. This approach is low cost and high throughput, giving it the potential to rapidly improve knowledge of disease genetics. We discuss two well-described examples, rod monochromacy (congenital achromatopsia) and amelogenesis imperfecta, to demonstrate the utility of this approach, and then suggest methods to equip non-experts with the ability to corroborate candidate genes and uncover new disease loci.
Subjects
Molecular Evolution, Genetic Loci, Genetic Predisposition to Disease, Genome, Genomics, Genetic Models, Vertebrates/genetics, Amelogenesis Imperfecta/diagnosis, Amelogenesis Imperfecta/genetics, Animals, Color Vision Defects/diagnosis, Color Vision Defects/genetics, Genetic Association Studies, Genomics/methods, Humans, Mutation, Phenotype, Pseudogenes
ABSTRACT
Various toothed whales (Odontoceti) are unique among mammals in lacking olfactory bulbs as adults and are thought to be anosmic (lacking the olfactory sense). At the molecular level, toothed whales have high percentages of pseudogenic olfactory receptor genes, but species that have been investigated to date retain an intact copy of the olfactory marker protein gene (OMP), which is highly expressed in olfactory receptor neurons and may regulate the temporal resolution of olfactory responses. One hypothesis for the retention of intact OMP in diverse odontocete lineages is that this gene is pleiotropic with additional functions that are unrelated to olfaction. Recent expression studies provide some support for this hypothesis. Here, we report OMP sequences for representatives of all extant cetacean families and provide the first molecular evidence for inactivation of this gene in vertebrates. Specifically, OMP exhibits independent inactivating mutations in six different odontocete lineages: four river dolphin genera (Platanista, Lipotes, Pontoporia, Inia), sperm whale (Physeter), and harbor porpoise (Phocoena). These results suggest that the only essential role of OMP that is maintained by natural selection is in olfaction, although a non-olfactory role for OMP cannot be ruled out for lineages that retain an intact copy of this gene. Available genome sequences from cetaceans and close outgroups provide evidence of inactivating mutations in two additional genes (CNGA2, CNGA4), which imply further pseudogenization events in the olfactory cascade of odontocetes. Selection analyses demonstrate that evolutionary constraints on all three genes (OMP, CNGA2, CNGA4) have been greatly reduced in Odontoceti, but retain a signature of purifying selection on the stem Cetacea branch and in Mysticeti (baleen whales). 
This pattern is compatible with the 'echolocation-priority' hypothesis for the evolution of OMP, which posits that negative selection was maintained in the common ancestor of Cetacea and was not relaxed significantly until the evolution of echolocation in Odontoceti.
Subjects
Dolphins/genetics , Olfactory Marker Protein/genetics , Animals , Base Sequence , Biological Evolution , DNA, Mitochondrial , Dolphins/classification , Evolution, Molecular , Olfactory Marker Protein/physiology , Phylogeny
ABSTRACT
The explosive, long fuse, and short fuse models represent competing hypotheses for the timing of placental mammal diversification. Support for the explosive model, which posits both interordinal and intraordinal diversification after the KPg mass extinction, derives from morphological cladistic studies that place Cretaceous eutherians outside of crown Placentalia. By contrast, most molecular studies favor the long fuse model wherein interordinal cladogenesis occurred in the Cretaceous followed by intraordinal cladogenesis after the KPg boundary. Phillips (2016) proposed a soft explosive model that allows for the emergence of a few lineages (Xenarthra, Afrotheria, Euarchontoglires, Laurasiatheria) in the Cretaceous, but otherwise agrees with the explosive model in positing the majority of interordinal diversification after the KPg mass extinction. Phillips (2016) argues that rate transference errors associated with large body size and long lifespan have inflated previous estimates of interordinal divergence times, and further suggests that most interordinal divergences are positioned after the KPg boundary when rate transference errors are avoided through the elimination of calibrations in large-bodied and/or long lifespan clades. Here, we show that rate transference errors can also occur in the opposite direction and drag forward estimated divergence dates when calibrations in large-bodied/long lifespan clades are omitted. This dragging forward effect results in the occurrence of more than half a billion years of 'zombie lineages' on Phillips' preferred timetree. By contrast with ghost lineages, which are a logical byproduct of an incomplete fossil record, zombie lineages occur when estimated divergence dates are younger than the minimum age of the oldest crown fossils. We also present the results of new timetree analyses that address the rate transference problem highlighted by Phillips (2016) by deleting taxa that exceed thresholds for body size and lifespan. 
These analyses recover all interordinal divergence times in the Cretaceous and are consistent with the long fuse model of placental diversification. Finally, we outline potential problems with morphological cladistic analyses of higher-level relationships among placental mammals that may account for the perceived discrepancies between molecular and paleontological estimates of placental divergence times.
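The zombie-lineage definition given in this abstract (an estimated divergence date younger than the minimum age of the oldest crown fossil) can be illustrated with a minimal Python sketch. The class name `CrownNode`, the function names, and all ages below are hypothetical and are not taken from the study; the sketch simply sums the per-node shortfall, which may be a simplification of how the authors tallied zombie lineages across a timetree.

```python
from dataclasses import dataclass


@dataclass
class CrownNode:
    """A crown clade on a timetree (illustrative structure, not from the paper)."""
    name: str
    estimated_age_my: float        # estimated divergence date, in millions of years
    oldest_crown_fossil_my: float  # minimum age of the oldest crown fossil, in MY


def zombie_span(node: CrownNode) -> float:
    """Return how many MY the estimate is younger than the oldest crown fossil.

    A positive value marks a zombie lineage; zero means the estimate is at
    least as old as the fossil minimum, so no conflict arises.
    """
    return max(0.0, node.oldest_crown_fossil_my - node.estimated_age_my)


def total_zombie_time(nodes: list[CrownNode]) -> float:
    """Sum the zombie spans over all crown nodes (a simplified tally)."""
    return sum(zombie_span(n) for n in nodes)


if __name__ == "__main__":
    # Hypothetical ages chosen only to show both cases.
    nodes = [
        CrownNode("clade_A", estimated_age_my=50.0, oldest_crown_fossil_my=55.0),
        CrownNode("clade_B", estimated_age_my=70.0, oldest_crown_fossil_my=60.0),
    ]
    for n in nodes:
        print(n.name, zombie_span(n))
    print("total", total_zombie_time(nodes))
```

In this toy input, only the first clade has an estimate younger than its fossil minimum, so it alone contributes to the total. A ghost lineage would be the reverse case, where the estimate is older than the first fossil appearance.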
Subjects
Mammals/classification , Models, Theoretical , Animals , Biodiversity , Body Size , Female , Fossils , Longevity , Mammals/physiology , Paleontology , Phylogeny , Placenta , Pregnancy
ABSTRACT
Recent phylogenetic analyses of a large dataset for mammalian families (169 taxa, 26 loci) portray contrasting results. Supermatrix (concatenation) methods support a generally robust tree with only a few inconsistently resolved polytomies, whereas MP-EST coalescence analysis of the same dataset yields a weakly supported tree that conflicts with many traditionally recognized clades. Here, we evaluate this discrepancy via improved coalescence analyses with reference to the rich history of phylogenetic studies on mammals. This integration clearly demonstrates that both supermatrix and coalescence analyses of just 26 loci yield a congruent, well-supported phylogenetic hypothesis for Mammalia. Discrepancies between published studies are explained by implementation of overly simple DNA substitution models, inadequate tree-search routines and limitations of the MP-EST method. We develop a simple measure, partitioned coalescence support (PCS), which summarizes the distribution of support and conflict among gene trees for a given clade. Extremely high PCS scores for outlier gene trees at two nodes in the mammalian tree indicate a troubling bias in the MP-EST method. We conclude that in this age of phylogenomics, a solid understanding of systematics fundamentals, choice of valid methodology and a broad knowledge of a clade's taxonomic history are still required to yield coherent phylogenetic inferences.
ABSTRACT
Higher-level relationships among placental mammals are mostly resolved, but several polytomies remain contentious. Song et al. (2012) claimed to have resolved three of these using shortcut coalescence methods (MP-EST, STAR) and further concluded that these methods, which assume no within-locus recombination, are required to unravel deep-level phylogenetic problems that have stymied concatenation. Here, we reanalyze Song et al.'s (2012) data and leverage these re-analyses to explore key issues in systematics including the recombination ratchet, gene tree stoichiometry, the proportion of gene tree incongruence that results from deep coalescence versus other factors, and simulations that compare the performance of coalescence and concatenation methods in species tree estimation. Song et al. (2012) reported an average locus length of 3.1 kb for the 447 protein-coding genes in their phylogenomic dataset, but the true mean length of these loci (start codon to stop codon) is 139.6 kb. Empirical estimates of recombination breakpoints in primates, coupled with consideration of the recombination ratchet, suggest that individual coalescence genes (c-genes) approach ∼12 bp or less for Song et al.'s (2012) dataset, three to four orders of magnitude shorter than the c-genes reported by these authors. This result has general implications for the application of coalescence methods in species tree estimation. We contend that it is illogical to apply coalescence methods to complete protein-coding sequences. Such analyses amalgamate c-genes with different evolutionary histories (i.e., exons separated by >100,000 bp), distort true gene tree stoichiometry that is required for accurate species tree inference, and contradict the central rationale for applying coalescence methods to difficult phylogenetic problems. 
In addition, Song et al.'s (2012) dataset of 447 genes includes 21 loci with switched taxonomic names, eight duplicated loci, 26 loci with non-homologous sequences that are grossly misaligned, and numerous loci with >50% missing data for taxa that are misplaced in their gene trees. These problems were compounded by inadequate tree searches with nearest neighbor interchange branch swapping and inadvertent application of substitution models that did not account for among-site rate heterogeneity. Sixty-six gene trees imply unrealistic deep coalescences that exceed 100 million years (MY). Gene trees that were obtained with better justified models and search parameters show large increases in both likelihood scores and congruence. Coalescence analyses based on a curated set of 413 improved gene trees and a superior coalescence method (ASTRAL) support a Scandentia (treeshrews)+Glires (rabbits, rodents) clade, contradicting one of the three primary systematic conclusions of Song et al. (2012). Robust support for a Perissodactyla+Carnivora clade within Laurasiatheria is also lost, contradicting a second major conclusion of this study. Song et al.'s (2012) MP-EST species tree provided the basis for circular simulations that led these authors to conclude that the multispecies coalescent accounts for 77% of the gene tree conflicts in their dataset, but many internal branches of their MP-EST tree are stunted by an order of magnitude or more due to wholesale gene tree reconstruction errors. An independent assessment of branch lengths suggests the multispecies coalescent accounts for ⩽ 15% of the conflicts among Song et al.'s (2012) 447 gene trees. Unfortunately, Song et al.'s (2012) flawed phylogenomic dataset has been used as a model for additional simulation work that suggests the superiority of shortcut coalescence methods relative to concatenation. 
Investigator error was passed on to the subsequent simulation studies, which also incorporated further logical errors that should be avoided in future simulation studies. Illegitimate branch length switches in the simulation routines unfairly protected coalescence methods from their Achilles' heel, high gene tree reconstruction error at short internodes. These simulations therefore provide no evidence that shortcut coalescence methods out-compete concatenation at deep timescales. In summary, the long c-genes that are required for accurate reconstruction of species trees using shortcut coalescence methods do not exist and are a delusion. Coalescence approaches based on SNPs that are widely spaced in the genome avoid problems with the recombination ratchet and merit further pursuit in both empirical systematic research and simulations.