ABSTRACT
The evolution of genomes in all life forms involves two distinct, dynamic types of genomic changes: gene duplication (and loss) that shape families of paralogous genes and extension (and contraction) of low-complexity regions (LCR), which occurs through dynamics of short repeats in protein-coding genes. Although the roles of each of these types of events in genome evolution have been studied, their co-evolutionary dynamics is not thoroughly understood. Here, by analyzing a wide range of genomes from diverse bacteria and archaea, we show that LCR and paralogy represent two distinct routes of evolution that are inversely correlated. The emergence of LCR is a prominent evolutionary mechanism in fast evolving, young protein families, whereas paralogy dominates the comparatively slow evolution of old protein families. The analysis of multiple prokaryotic genomes shows that the formation of LCR is likely a widespread, transient evolutionary mechanism that temporally and locally affects also ancestral functions, but apparently, fades away with time, under mutational and selective pressures, yielding to gene paralogy. We propose that compensatory relationships between short-term and longer-term evolutionary mechanisms are universal in the evolution of life.
Subjects
Molecular Evolution, Prokaryotic Cells, Phylogeny, Bacteria/genetics, Archaea/genetics
ABSTRACT
The Andes mountains of western South America are a globally important biodiversity hotspot, yet there is a paucity of resolved phylogenies for plant clades from this region. Filling an important gap in our understanding of the World's richest flora, we present the first phylogeny of Freziera (Pentaphylacaceae), an Andean-centered, cloud forest radiation. Our dataset was obtained via hybrid-enriched target sequence capture of Angiosperms353 universal loci for 50 of the ca. 75 spp., obtained almost entirely from herbarium specimens. We identify high phylogenomic complexity in Freziera, including the presence of data artifacts. Via by-eye observation of gene trees, detailed examination of warnings from recently improved assembly pipelines, and gene tree filtering, we identified that artifactual orthologs (i.e., the presence of only one copy of a multicopy gene due to differential assembly) were an important source of gene tree heterogeneity that had a negative impact on phylogenetic inference and support. These artifactual orthologs may be common in plant phylogenomic datasets, where multiple instances of genome duplication are common. After accounting for artifactual orthologs as source of gene tree error, we identified a significant, but nonspecific signal of introgression using Patterson's D and f4 statistics. Despite phylogenomic complexity, we were able to resolve Freziera into 9 well-supported subclades whose evolution has been shaped by multiple evolutionary processes, including incomplete lineage sorting, historical gene flow, and gene duplication. Our results highlight the complexities of plant phylogenomics, which are heightened in Andean radiations, and show the impact of filtering data processing artifacts and standard filtering approaches on phylogenetic inference.
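The introgression test mentioned above, Patterson's D, can be illustrated with a minimal sketch. It assumes a four-taxon arrangement (((P1, P2), P3), Outgroup), aligned sequences of equal length, and biallelic sites only; the function names `count_abba_baba` and `patterson_d` are hypothetical and are not taken from the study's pipeline. The statistic is D = (ABBA - BABA) / (ABBA + BABA), where a value near zero is consistent with incomplete lineage sorting alone.

```python
def count_abba_baba(p1: str, p2: str, p3: str, outgroup: str) -> tuple[int, int]:
    """Count ABBA and BABA site patterns across four aligned sequences.

    The outgroup allele is treated as ancestral ("A"); any other shared
    allele is treated as derived ("B"). Gaps and ambiguity codes are not
    handled specially in this sketch.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def patterson_d(abba: int, baba: int) -> float:
    """Return D = (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / total


if __name__ == "__main__":
    n_abba, n_baba = count_abba_baba("AGAA", "GAGA", "GGGA", "AAAA")
    print(n_abba, n_baba, patterson_d(n_abba, n_baba))
```

In practice, significance is usually assessed with a block jackknife over the genome rather than from the raw value, which this sketch does not attempt.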
Subjects
Phylogeny, Classification/methods, South America, Plant Genome
ABSTRACT
Homology in vertebrate body plans is traditionally ascribed to the high-level conservation of regulatory components within the genetic programs governing them, particularly during the "phylotypic stage." However, advancements in embryology and molecular phylogeny have unveiled the dynamic nature of gene repertoires responsible for early development. Notably, the Nodal and Lefty genes, members of the transforming growth factor-beta superfamily producing intercellular signaling molecules and crucial for left-right (L-R) symmetry breaking, exhibit distinctive features within their gene repertoires. These features encompass among-species gene repertoire variations resulting from gene gain and loss, as well as gene conversion. Despite their significance, these features have been largely unexplored in a phylogenetic context, but accumulating genome-wide sequence information is allowing the scrutiny of these features. It has exposed hidden paralogy between Nodal1 and Nodal2 genes resulting from differential gene loss in amniotes. In parallel, the tandem cluster of Lefty1 and Lefty2 genes, which was thought to be confined to mammals, is observed in sharks and rays, with an unexpected phylogenetic pattern. This article provides a comprehensive review of the current understanding of the origins of these vertebrate gene repertoires and proposes a revised nomenclature based on the elucidated history of vertebrate genome evolution.
ABSTRACT
Recent advances in higher-level invertebrate phylogeny have leveraged shared features of genomic architecture to resolve contentious nodes across the tree of life. Yet, the interordinal relationships within Chelicerata have remained recalcitrant given competing topologies in recent molecular analyses. As such, relationships between topologically unstable orders remain supported primarily by morphological cladistic analyses. Solifugae, one such unstable chelicerate order, has long been thought to be the sister group of Pseudoscorpiones, forming the clade Haplocnemata, on the basis of eight putative morphological synapomorphies. The discovery, however, of a shared whole genome duplication placing Pseudoscorpiones in Arachnopulmonata provides the opportunity for a simple litmus test evaluating the validity of Haplocnemata. Here, we present the first developmental transcriptome of a solifuge (Titanopuga salinarum) and survey copy numbers of the homeobox genes for evidence of systemic duplication. We find that over 70% of the identified homeobox genes in T. salinarum are retained in a single copy, while representatives of the arachnopulmonates retain orthologs of those genes as two or more copies. Our results refute the placement of Solifugae in Haplocnemata. Subsequent reevaluation of putative interordinal morphological synapomorphies among chelicerates reveals a high incidence of homoplasy, reversals, and inaccurate coding within Haplocnemata and other small clades, as well as Arachnida more broadly, suggesting existing morphological character matrices are insufficient to resolve chelicerate phylogeny.
Subjects
Phylogeny, Animals, Arachnida/anatomy & histology, Arachnida/genetics, Arachnida/classification, Genome, Transcriptome
ABSTRACT
Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology (evolutionary relatedness) is in many cases readily determined based on sequence similarity analysis. By contrast, determining whether two genes directly descended from a common ancestor by a speciation event (orthologs) or a duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit, from domains to networks. This meeting brought to light several challenges to come: leveraging orthology computations to get the most out of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.
Subjects
Genetic Speciation, Genomics/trends, Phylogeny, Viral Genome, Genomics/methods
ABSTRACT
Huntington's disease (HD) is caused by the production of a mutant huntingtin (HTT) with an abnormally long poly-glutamine (polyQ) tract, forming aggregates and inclusions in neurons. Previous work by us and others has shown that an increase or decrease in polyQ-triggered aggregates can be passive simply due to the interaction of proteins with the aggregates. To search for proteins with active (functional) effects, which might be more effective in finding therapies and mechanisms of HD, we selected among the proteins that interact with HTT a total of 49 pairs of proteins that, while being paralogous to each other (and thus expected to have similar passive interaction with HTT), are located in different regions of the protein interaction network (suggesting participation in different pathways or complexes). Three of these 49 pairs contained members with opposite effects on HD, according to the literature. The negative members of the three pairs, MID1, IKBKG, and IKBKB, interact with PPP2CA and TUBB, which are known negative factors in HD, as well as with HSP90AA1 and RPS3. The positive members of the three pairs interact with HSPA9. Our results provide potential HD modifiers of functional relevance and reveal the dynamic aspect of paralog evolution within the interaction network.
Subjects
Huntington Disease, Humans, Huntington Disease/metabolism, I-kappa B Kinase/metabolism, Inclusion Bodies/metabolism, Neurons/metabolism, Protein Interaction Maps
ABSTRACT
The tribe Senecioneae is one of the largest tribes in Asteraceae, with a nearly cosmopolitan distribution. Despite great efforts devoted to elucidate the evolution of Senecioneae, many questions still remain concerning the systematics of this group, from the tribal circumscription and position to species relationships in many genera. The hybridization-based target enrichment method of next-generation sequencing has been accepted as a promising approach to resolve phylogenetic problems. We herein develop a set of single-/low-copy genes for Senecioneae, and test their phylogenetic utilities. Our results demonstrate that these genes work highly efficiently for Senecioneae, with a high average gene recovery of 98.8% across the tribe and recovering robust phylogenetic hypotheses at different levels. In particular, the delimitation of the Senecioneae has been confirmed to include Abrotanella and exclude Doronicum, with the former sister to core Senecioneae and the latter shown to be more closely related to Calenduleae. Moreover, Doronicum and Calenduleae are inferred to be the closest relatives of Senecioneae, which is a new hypothesis well supported by statistical topology tests, morphological evidence, and the profile of pyrrolizidine alkaloids, a special kind of chemical characters generally used to define Senecioneae. Furthermore, this study suggests a complex reticulation history in the diversification of Senecioneae, accounting for the prevalence of polyploid groups in the tribe. With subtribe Tussilagininae s.str. as a case study showing a more evident pattern of gene duplication, we further explored reconstructing the phylogeny in the groups with high ploidy levels. Our results also demonstrate that tree topologies based on sorted paralogous copies are stable across different methods of phylogenetic inference, and more congruent with the morphological evidence and the results of previous phylogenetic studies.
Subjects
Asteraceae/classification, Asteraceae/genetics, Cell Nucleus/genetics, Phylogeny, Genetic Hybridization, Polyploidy
ABSTRACT
Comparative genomics has proven a fruitful approach to acquire many functional and evolutionary insights into core cellular processes. Here it is argued that in order to perform accurate and interesting comparative genomics, one first and foremost has to be able to recognize, postulate, and revise different evolutionary scenarios. After all, these studies lack a simple protocol, due to different proteins having different evolutionary dynamics and demanding different approaches. The authors here discuss this challenge from a practical (what are the observations?) and conceptual (how do these indicate a specific evolutionary scenario?) viewpoint, with the aim to guide investigators who want to analyze the evolution of their protein(s) of interest. By sharing how the authors draft, test, and update such a scenario and how it directs their investigations, the authors hope to illuminate how to execute molecular evolution studies and how to interpret them. Also see the video abstract here https://youtu.be/VCt3l2pbdbQ.
Subjects
Computational Biology/methods, Molecular Evolution, Proteins/genetics, Caenorhabditis elegans Proteins/genetics, Protein Databases, Eukaryotic Cells, Genomics/methods, Humans, Phylogeny, Protein Domains, Proteins/chemistry
ABSTRACT
Our understanding of phylogenetic relationships among bony fishes has been transformed by analysis of a small number of genes, but uncertainty remains around critical nodes. Genome-scale inferences so far have sampled a limited number of taxa and genes. Here we leveraged 144 genomes and 159 transcriptomes to investigate fish evolution with an unparalleled scale of data: >0.5 Mb from 1,105 orthologous exon sequences from 303 species, representing 66 out of 72 ray-finned fish orders. We apply phylogenetic tests designed to trace the effect of whole-genome duplication events on gene trees and find paralogy-free loci using a bioinformatics approach. Genome-wide data support the structure of the fish phylogeny, and hypothesis-testing procedures appropriate for phylogenomic datasets using explicit gene genealogy interrogation settle some long-standing uncertainties, such as the branching order at the base of the teleosts and among early euteleosts, and the sister lineage to the acanthomorph and percomorph radiations. Comprehensive fossil calibrations date the origin of all major fish lineages before the end of the Cretaceous.
Subjects
Fishes/genetics, Genome/genetics, Transcriptome/genetics, Animals, Molecular Evolution, Exons/genetics, Fossils, Gene Duplication/genetics, Genomics/methods, Genetic Models, Phylogeny
ABSTRACT
Increasingly, large phylogenomic data sets include transcriptomic data from nonmodel organisms. This not only has allowed controversial and unexplored evolutionary relationships in the tree of life to be addressed but also increases the risk of inadvertent inclusion of paralogs in the analysis. Although this may be expected to result in decreased phylogenetic support, it is not clear if it could also drive highly supported artifactual relationships. Many groups, including the hyperdiverse Lissamphibia, are especially susceptible to these issues due to ancient gene duplication events and small numbers of sequenced genomes and because transcriptomes are increasingly applied to resolve historically conflicting taxonomic hypotheses. We tested the potential impact of paralog inclusion on the topologies and timetree estimates of the Lissamphibia using published and de novo sequencing data including 18 amphibian species, from which 2,656 single-copy gene families were identified. A novel paralog filtering approach resulted in four differently curated data sets, which were used for phylogenetic reconstructions using Bayesian inference, maximum likelihood, and quartet-based supertrees. We found that paralogs drive strongly supported conflicting hypotheses within the Lissamphibia (Batrachia and Procera) and older divergence time estimates even within groups where no variation in topology was observed. All investigated methods, except Bayesian inference with the CAT-GTR model, were found to be sensitive to paralogs, but with filtering convergence to the same answer (Batrachia) was observed. This is the first large-scale study to address the impact of orthology selection using transcriptomic data and emphasizes the importance of quality over quantity particularly for understanding relationships of poorly sampled taxa.
Subjects
Genetic Techniques, Phylogeny, Transcriptome, Amphibians/genetics, Animals, Gene Duplication
ABSTRACT
Understanding the evolution of biodiversity on Earth is a central aim in biology. Currently, various disciplines of science contribute to unravel evolution at all levels of life, from individual organisms to species and higher ranks, using different approaches and specific terminologies. The search for common origin, traditionally called homology, is a connecting paradigm of all studies related to evolution. However, it is not always sufficiently taken into account that defining homology depends on the hierarchical level studied (organism, population, and species), which can cause confusion. Therefore, we propose a framework to define homologies making use of existing terms, which refer to homology in different fields, but restricting them to an unambiguous meaning and a particular hierarchical level. We propose to use the overarching term "homology" only when "morphological homology," "vertical gene transfer," and "phylogenetic homology" are confirmed. Consequently, neither phylogenetic nor morphological homology is equal to homology. This article is intended for readers with different research backgrounds. We challenge their traditional approaches, inviting them to consider the proposed framework and offering them a new perspective for their own research.
Subjects
Biological Evolution, Classification/methods
ABSTRACT
BACKGROUND: The hypothesis that vertebrates have experienced two ancient whole genome duplications (WGDs) is of central interest to evolutionary biology and has been implicated in the evolution of developmental complexity. Three-way and four-way paralogy regions in human and other vertebrate genomes are considered vital evidence in support of this hypothesis. Alternatively, it has been proposed that such paralogy regions were created by small-scale duplications that occurred at different intervals over the evolution of life. RESULTS: To address this debate, the present study investigates the evolutionary history of multigene families with at least three-fold representation on human chromosomes 1, 2, 8 and 20. Phylogenetic analysis and tree topology comparisons classified the members of 36 multigene families into four distinct co-duplicated groups. Gene families falling within the same co-duplicated group might have duplicated together, whereas genes belonging to different co-duplicated groups might have distinct evolutionary origins. CONCLUSION: Taken together with previous investigations, the current study yielded no evidence in favor of the WGD hypothesis. Rather, it appears that the vertebrate genome evolved as a result of small-scale duplication events that span the entire history of animals.
Subjects
Molecular Evolution, Gene Duplication, Multigene Family, Vertebrates/genetics, Animals, Human Chromosomes, Human Genome, Humans, Invertebrates/classification, Invertebrates/genetics, Phylogeny, Vertebrates/classification
ABSTRACT
Twenty-nine DNA regions of plastid origin have been previously identified in the mitochondrial genome of Cucurbita pepo (pumpkin; Cucurbitaceae). Four of these regions harbor homolog sequences of rbcL, matK, rpl20-rps12 and trnL-trnF, which are widely used as molecular markers for phylogenetic and phylogeographic studies. We extracted the mitochondrial copies of these regions based on the mitochondrial genome of C. pepo and, along with published sequences for these plastome markers from 13 Cucurbita taxa, we performed phylogenetic molecular analyses to identify inter-organellar transfer events in the Cucurbita phylogeny and changes in their nucleotide substitution rates. Phylogenetic reconstruction and tree selection tests suggest that rpl20 and rbcL mitochondrial paralogs arose before Cucurbita diversification whereas the mitochondrial matK and trnL-trnF paralogs emerged most probably later, in the mesophytic Cucurbita clade. Nucleotide substitution rates increased one order of magnitude in all the mitochondrial paralogs compared to their original plastid sequences. Additionally, mitochondrial trnL-trnF sequences obtained by PCR from nine Cucurbita taxa revealed higher nucleotide diversity in the mitochondrial than in the plastid copies, likely related to the higher nucleotide substitution rates in the mitochondrial region and loss of functional constraints in its tRNA genes.
Subjects
Cucurbita/genetics, Mitochondrial Genome/genetics, Plastids/genetics, Biological Evolution, Molecular Evolution, Plant Genes/genetics, Plant Genome/genetics, Mitochondria/genetics, Phylogeny, DNA Sequence Analysis
ABSTRACT
Phylogenomics relies heavily on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise to find correct event-annotated gene trees. The information of gene trees can then directly be translated into constraints on the species trees. Although the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer.
Subjects
Genomics, Phylogeny
ABSTRACT
BACKGROUND: Orthology characterizes genes of different organisms that arose from a single ancestral gene via speciation, in contrast to paralogy, which is assigned to genes that arose via gene duplication. An accurate orthology assignment is a crucial step for comparative genomic studies. Orthologous genes in two organisms can be identified by applying a so-called reciprocal search strategy, given that complete information of the organisms' gene repertoire is available. In many investigations, however, only a fraction of the gene content of the organisms under study is examined (e.g., RNA sequencing). Here, identification of orthologous nucleotide or amino acid sequences can be achieved using a graph-based approach that maps nucleotide sequences to genes of known orthology. Existing implementations of this approach, however, suffer from algorithmic issues that may cause problems in downstream analyses. RESULTS: We present a new software pipeline, Orthograph, that addresses and solves the above problems and implements useful features for a wide range of comparative genomic and transcriptomic analyses. Orthograph applies a best reciprocal hit search strategy using profile hidden Markov models and maps nucleotide sequences to the globally best matching cluster of orthologous genes, thus enabling researchers to conveniently and reliably delineate orthologs and paralogs from transcriptomic and genomic sequence data. We demonstrate the performance of our approach on de novo-sequenced and assembled transcript libraries of 24 species of apoid wasps (Hymenoptera: Aculeata) as well as on published genomic datasets. CONCLUSION: With Orthograph, we implemented a best reciprocal hit approach to reference-based orthology prediction for coding nucleotide sequences such as RNAseq data. Orthograph is flexible, easy to use, open source and freely available at https://mptrsen.github.io/Orthograph. Additionally, we release 24 de novo-sequenced and assembled transcript libraries of apoid wasp species.
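The reciprocal search strategy described above can be sketched generically as follows. This is not Orthograph's implementation, which works with profile hidden Markov models and reference clusters of orthologous genes; it is a minimal pairwise illustration that assumes similarity scores between two gene sets have already been computed by some search tool. The names `best_hits` and `reciprocal_best_hits` and the score dictionaries are hypothetical.

```python
Scores = dict[tuple[str, str], float]  # (query, subject) -> similarity score


def best_hits(scores: Scores) -> dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> list[tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    a_vs_b = {("a1", "b1"): 90.0, ("a1", "b2"): 50.0,
              ("a2", "b2"): 80.0, ("a3", "b1"): 60.0}
    b_vs_a = {("b1", "a1"): 88.0, ("b2", "a2"): 79.0, ("b2", "a1"): 40.0}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the example data, `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so the pair is not reported; this asymmetry check is what makes the criterion conservative against paralogs.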
Subjects
Genomics/methods, Multigene Family/genetics, Open Reading Frames/genetics, DNA Sequence Analysis/methods, Protein Sequence Analysis/methods, Software, Animals, Genome/genetics, Transcriptome/genetics, Wasps/genetics
ABSTRACT
The Leavenworthia self-incompatibility locus (S locus) consists of paralogs (Lal2, SCRL) of the canonical Brassicaceae S locus genes (SRK, SCR), and is situated in a genomic position that differs from the ancestral one in the Brassicaceae. Unexpectedly, in a small number of Leavenworthia alabamica plants examined, sequences closely resembling exon 1 of SRK have been found, but the function of these has remained unclear. BAC cloning and expression analyses were employed to characterize these SRK-like sequences. An SRK-positive Bacterial Artificial Chromosome clone was found to contain complete SRK and SCR sequences located close by one another in the derived genomic position of the Leavenworthia S locus, and in place of the more typical Lal2 and SCRL sequences. These sequences are expressed in stigmas and anthers, respectively, and crossing data show that the SRK/SCR haplotype is functional in self-incompatibility. Population surveys indicate that < 5% of Leavenworthia S loci possess such alleles. An ancestral translocation or recombination event involving SRK/SCR and Lal2/SCRL likely occurred, together with neofunctionalization of Lal2/SCRL, and both haplotype groups now function as Leavenworthia S locus alleles. These findings suggest that S locus alleles can have distinctly different evolutionary origins.
Subjects
Brassicaceae/genetics, Self-Incompatibility in Flowering Plants/genetics, Brassicaceae/metabolism, Flowers/metabolism, Plant Genome, Plant Proteins/genetics, Plant Proteins/metabolism, Nucleic Acid Sequence Homology
ABSTRACT
Phylogenetic approaches are indispensable in any comparative molecular study involving multiple species. These approaches are in increasing demand as the amount and availability of DNA sequence information continues to increase exponentially, even for organisms that were previously not extensively studied. Without the sound application of phylogenetic concepts and knowledge, one can be misled when attempting to infer ancestral character states as well as the timing and order of evolutionary events, both of which are frequently exerted in evolutionary developmental biology. The ignorance of phylogenetic approaches can also impact non-evolutionary studies and cause misidentification of the target gene or protein to be examined in functional characterization. This review aims to promote tree-thinking in evolutionary conjecture and stress the importance of a sense of time scale in cross-species comparisons, in order to enhance the understanding of phylogenetics in all biological fields including developmental biology. To this end, molecular phylogenies of several developmental regulatory genes, including those denoted as "cryptic pan-vertebrate genes", are introduced as examples.
Subjects
Developmental Biology, Molecular Evolution, Phylogeny, Animals, Humans
ABSTRACT
Orthology analysis, that is, finding out whether a pair of homologous genes are orthologs (stemming from a speciation) or paralogs (stemming from a gene duplication), is of central importance in computational biology, genome annotation, and phylogenetic inference. In particular, an orthologous relationship makes functional equivalence of the two genes highly likely. A major approach to orthology analysis is to reconcile a gene tree to the corresponding species tree (most commonly performed using the most parsimonious reconciliation, MPR). However, most such phylogenetic orthology methods infer the gene tree without considering the constraints implied by the species tree and, perhaps even more importantly, only allow the gene sequences to influence the orthology analysis through the a priori reconstructed gene tree. We propose a sound, comprehensive Bayesian Markov chain Monte Carlo-based method, DLRSOrthology, to compute orthology probabilities. It efficiently sums over the possible gene trees and jointly takes into account the current gene tree, all possible reconciliations to the species tree, and the typically strong signal conveyed by the sequences. We compare our method with PrIME-GEM, a probabilistic orthology approach built on a probabilistic duplication-loss model, and MrBayesMPR, a probabilistic orthology approach that is based on conventional Bayesian inference coupled with MPR. We find that DLRSOrthology outperforms these competing approaches on synthetic data as well as on biological data sets and is robust to incomplete taxon sampling artifacts.
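The reconciliation step that the abstract contrasts with its Bayesian method can be illustrated with the standard LCA-mapping rule: each internal gene-tree node is mapped to the lowest common ancestor of its children's species, and it is labeled a duplication when it maps to the same species-tree node as one of its children, otherwise a speciation. The sketch below is not DLRSOrthology; it assumes a rooted binary gene tree given as nested tuples, a species tree given as a child-to-parent dictionary, and hypothetical names throughout.

```python
def ancestors(node: str, parent: dict[str, str]) -> list[str]:
    """Return the path from a species-tree node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a: str, b: str, parent: dict[str, str]) -> str:
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(gene_tree, leaf_species, parent, events):
    """Map a gene-tree node to the species tree and record event labels.

    gene_tree is a leaf name (str) or a pair (left, right). Each internal
    node is appended to `events` as (subtree, "duplication" | "speciation").
    """
    if isinstance(gene_tree, str):
        return leaf_species[gene_tree]
    left, right = gene_tree
    m_left = reconcile(left, leaf_species, parent, events)
    m_right = reconcile(right, leaf_species, parent, events)
    mapped = lca(m_left, m_right, parent)
    label = "duplication" if mapped in (m_left, m_right) else "speciation"
    events.append((gene_tree, label))
    return mapped


if __name__ == "__main__":
    # Species tree ((A, B)AB, C)root, stored as child -> parent.
    parent = {"A": "AB", "B": "AB", "AB": "root", "C": "root"}
    leaf_species = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
    events = []
    reconcile((("a1", "b1"), ("a2", "c1")), leaf_species, parent, events)
    for subtree, label in events:
        print(subtree, label)
```

Under this labeling, two genes are called orthologs when their most recent common node in the gene tree is a speciation, so `a1` and `b1` would be orthologs while `a1` and `a2` would be paralogs in the example.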
Subjects
Classification/methods, Phylogeny, Algorithms, Computer Simulation, Sequence Homology, Software
ABSTRACT
Scorpions represent an iconic lineage of arthropods, historically renowned for their unique bauplan, ancient fossil record and venom potency. Yet, higher level relationships of scorpions, based exclusively on morphology, remain virtually untested, and no multilocus molecular phylogeny has been deployed heretofore towards assessing the basal tree topology. We applied a phylogenomic assessment to resolve scorpion phylogeny, for the first time, to our knowledge, sampling extensive molecular sequence data from all superfamilies and examining basal relationships with up to 5025 genes. Analyses of supermatrices as well as species tree approaches converged upon a robust basal topology of scorpions that is entirely at odds with traditional systematics and controverts previous understanding of scorpion evolutionary history. All analyses unanimously support a single origin of katoikogenic development, a form of parental investment wherein embryos are nurtured by direct connections to the parent's digestive system. Based on the phylogeny obtained herein, we propose the following systematic emendations: Caraboctonidae is transferred to Chactoidea (new superfamilial assignment); superfamily Bothriuroidea (revalidated) is resurrected and Bothriuridae transferred therein; and Chaerilida and Pseudochactida are synonymized with Buthida (new parvordinal synonymies).
Subjects
Biological Evolution, Genome, Phylogeny, Scorpions/classification, Scorpions/genetics, Animals, Molecular Evolution, Molecular Sequence Data, Scorpions/anatomy & histology, DNA Sequence Analysis
ABSTRACT
PREMISE OF THE STUDY: Delimitation of Amelanchier species is difficult because of polyploidy and gametophytic apomixis. A first step in unraveling this species problem is understanding the diversity of the diploids that contributed genomes to polyploid apomicts. This research helps clarify challenging species-delimitation problems attending polyploid, apomictic complexity. METHODS: We sampled 431 diploid accessions from 13 species, of which 10 are North American and three are Old World. Quantitative morphological analyses tested the null hypothesis of no discrete groups. Using three to nine diploid accessions per species, we constructed phylogenies with DNA sequences from ETS, ITS, the second intron of LEAFY, and chloroplast regions rpoB-trnC, rpl16, trnD-trnT, and ycf6-psbM. KEY RESULTS: Most Amelanchier diploid taxa are morphologically and ecogeographically distinct and genetically exclusive lineages. They rarely hybridize with one another. Nuclear and chloroplast DNA sequences almost completely resolve the Amelanchier phylogeny. The backbone is the mostly western North American clade A, eastern North American clade B, and Old World clade O. DNA sequences and morphology support clades A and O as sister taxa. Despite extensive paralogy, our LEAFY data are phylogenetically informative and identify a clade (T) of three arborescent taxa within clade B. CONCLUSIONS: Amelanchier diploids differ strikingly from polyploid apomicts, in that hybridization among them is rare, and they form taxa that would qualify as species by most species concepts. Knowledge of diploid morphology, phylogeny, and ecogeography provides a foundation for understanding the evolutionary history of polyploid apomicts, their patterns of diversification, and their species status.