1.
Cell ; 185(16): 2975-2987.e10, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35853453

ABSTRACT

Horizontal gene transfer (HGT) is an important evolutionary force shaping prokaryotic and eukaryotic genomes. HGT-acquired genes have been sporadically reported in insects, a lineage containing >50% of animals. We systematically examined HGT in 218 high-quality genomes of diverse insects and found that they acquired 1,410 genes exhibiting diverse functions, including many not previously reported, via 741 distinct transfers from non-metazoan donors. Lepidopterans had the highest average number of HGT-acquired genes. HGT-acquired genes containing introns exhibited substantially higher expression levels than genes lacking introns, suggesting that intron gains were likely involved in HGT adaptation. Lastly, we used the CRISPR-Cas9 system to edit the prevalent unreported gene LOC105383139, which was transferred into the last common ancestor of moths and butterflies. In diamondback moths, males lacking LOC105383139 courted females significantly less. We conclude that HGT has been a major contributor to insect adaptation.


Subject(s)
Butterflies , Gene Transfer, Horizontal , Animals , Butterflies/genetics , Courtship , Evolution, Molecular , Male , Phylogeny
2.
Cell ; 176(6): 1356-1366.e10, 2019 03 07.
Article in English | MEDLINE | ID: mdl-30799038

ABSTRACT

Operons are a hallmark of bacterial genomes, where they allow concerted expression of functionally related genes as single polycistronic transcripts. They are rare in eukaryotes, where each gene usually drives expression of its own independent messenger RNAs. Here, we report the horizontal operon transfer of a siderophore biosynthesis pathway from relatives of Escherichia coli into a group of budding yeast taxa. We further show that the co-linearly arranged secondary metabolism genes are expressed, exhibit eukaryotic transcriptional features, and enable the sequestration and uptake of iron. After transfer, several genetic changes occurred during subsequent evolution, including the gain of new transcription start sites that were sometimes within protein-coding sequences, acquisition of polyadenylation sites, structural rearrangements, and integration of eukaryotic genes into the cluster. We conclude that the genes were likely acquired as a unit, modified for eukaryotic gene expression, and maintained by selection to adapt to the highly competitive, iron-limited environment.


Subject(s)
Eukaryota/genetics , Gene Transfer, Horizontal/genetics , Operon/genetics , Bacteria/genetics , Escherichia coli/genetics , Eukaryotic Cells , Evolution, Molecular , Gene Expression Regulation, Bacterial/genetics , Genes, Bacterial/genetics , Genome, Bacterial/genetics , Genome, Fungal/genetics , Saccharomycetales/genetics , Siderophores/genetics
3.
Cell ; 175(6): 1533-1545.e20, 2018 11 29.
Article in English | MEDLINE | ID: mdl-30415838

ABSTRACT

Budding yeasts (subphylum Saccharomycotina) are found in every biome and are as genetically diverse as plants or animals. To understand budding yeast evolution, we analyzed the genomes of 332 yeast species, including 220 newly sequenced ones, which represent nearly one-third of all known budding yeast diversity. Here, we establish a robust genus-level phylogeny comprising 12 major clades, infer the timescale of diversification from the Devonian period to the present, quantify horizontal gene transfer (HGT), and reconstruct the evolution of 45 metabolic traits and the metabolic toolkit of the budding yeast common ancestor (BYCA). We infer that BYCA was metabolically complex and chronicle the tempo and mode of genomic and phenotypic evolution across the subphylum, which is characterized by very low HGT levels and widespread losses of traits and the genes that control them. More generally, our results argue that reductive evolution is a major mode of evolutionary diversification.


Subject(s)
Evolution, Molecular , Gene Transfer, Horizontal , Genome, Fungal , Phylogeny , Saccharomycetales/classification , Saccharomycetales/genetics
4.
Nat Rev Genet ; 24(12): 834-850, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37369847

ABSTRACT

Genome-scale data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence - the inference of conflicting evolutionary histories - remains pervasive in phylogenomic data, hampering our ability to reconstruct and interpret the tree of life. Biological factors, such as incomplete lineage sorting, horizontal gene transfer, hybridization, introgression, recombination and convergent molecular evolution, can lead to gene phylogenies that differ from the species tree. In addition, analytical factors, including stochastic, systematic and treatment errors, can drive incongruence. Here, we review these factors, discuss methodological advances to identify and handle incongruence, and highlight avenues for future research.


Subject(s)
Biological Evolution , Genome , Phylogeny , Evolution, Molecular , Hybridization, Genetic
5.
Plant Cell ; 36(5): 1637-1654, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38114096

ABSTRACT

MicroRNAs (miRNAs) are a class of nonprotein-coding short transcripts that provide a layer of post-transcriptional regulation essential to many plant biological processes. MiR858, which targets the transcripts of MYB transcription factors, can affect a range of secondary metabolic processes. Although miR858 and its 187-nt precursor have been well studied in Arabidopsis (Arabidopsis thaliana), a systematic investigation of miR858 precursors and their functions across plant species is lacking due to a problem in identifying the transcripts that generate this subclass. By re-evaluating the transcript of miR858 and relaxing the length cut-off for identifying hairpins, we found in kiwifruit (Actinidia chinensis) that miR858 has long-loop hairpins (1,100 to 2,100 nt), whose intervening sequences between miRNA generating complementary sites were longer than all previously reported miRNA hairpins. Importantly, these precursors of miR858 containing long-loop hairpins (termed MIR858L) are widespread in seed plants including Arabidopsis, varying between 350 and 5,500 nt. Moreover, we showed that MIR858L has a greater impact on proanthocyanidin and flavonol levels in both Arabidopsis and kiwifruit. We suggest that an active MIR858L-MYB regulatory module appeared in the transition of early land plants to large upright flowering plants, making a key contribution to plant secondary metabolism.


Subject(s)
Actinidia , Arabidopsis , Gene Expression Regulation, Plant , MicroRNAs , RNA, Plant , MicroRNAs/genetics , MicroRNAs/metabolism , Actinidia/genetics , Actinidia/metabolism , Arabidopsis/genetics , RNA, Plant/genetics , RNA, Plant/metabolism , Seeds/genetics , Seeds/metabolism , Base Sequence
6.
Proc Natl Acad Sci U S A ; 121(18): e2315314121, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38669185

ABSTRACT

How genomic differences contribute to phenotypic differences is a major question in biology. The recently characterized genomes, isolation environments, and qualitative patterns of growth on 122 sources and conditions of 1,154 strains from 1,049 fungal species (nearly all known) in the yeast subphylum Saccharomycotina provide a powerful, yet complex, dataset for addressing this question. We used a random forest algorithm trained on these genomic, metabolic, and environmental data to predict growth on several carbon sources with high accuracy. Known structural genes involved in assimilation of these sources and presence/absence patterns of growth in other sources were important features contributing to prediction accuracy. By further examining growth on galactose, we found that it can be predicted with high accuracy from either genomic (92.2%) or growth data (82.6%) but not from isolation environment data (65.6%). Prediction accuracy was even higher (93.3%) when we combined genomic and growth data. After the GALactose utilization genes, the most important feature for predicting growth on galactose was growth on galactitol, raising the hypothesis that several species in two orders, Serinales and Pichiales (containing the emerging pathogen Candida auris and the genus Ogataea, respectively), have an alternative galactose utilization pathway because they lack the GAL genes. Growth and biochemical assays confirmed that several of these species utilize galactose through an alternative oxidoreductive D-galactose pathway, rather than the canonical GAL pathway. Machine learning approaches are powerful for investigating the evolution of the yeast genotype-phenotype map, and their application will uncover novel biology, even in well-studied traits.


Subject(s)
Galactose , Machine Learning , Galactose/metabolism , Genome, Fungal , Metabolic Networks and Pathways/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/genetics
7.
Proc Natl Acad Sci U S A ; 121(10): e2316031121, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38412132

ABSTRACT

The Saccharomycotina yeasts ("yeasts" hereafter) are a fungal clade of scientific, economic, and medical significance. Yeasts are highly ecologically diverse, found across a broad range of environments in every biome and continent on earth; however, little is known about what rules govern the macroecology of yeast species and their range limits in the wild. Here, we trained machine learning models on 12,816 terrestrial occurrence records and 96 environmental variables to infer global distribution maps at ~1 km² resolution for 186 yeast species (~15% of described species from 75% of orders) and to test environmental drivers of yeast biogeography and macroecology. We found that predicted yeast diversity hotspots occur in mixed montane forests in temperate climates. Diversity in vegetation type and topography were some of the greatest predictors of yeast species richness, suggesting that microhabitats and environmental clines are key to yeast diversity. We further found that range limits in yeasts are significantly influenced by carbon niche breadth and range overlap with other yeast species, with carbon specialists and species in high-diversity environments exhibiting reduced geographic ranges. Finally, yeasts contravene many long-standing macroecological principles, including the latitudinal diversity gradient, temperature-dependent species richness, and a positive relationship between latitude and range size (Rapoport's rule). These results unveil how the environment governs the global diversity and distribution of species in the yeast subphylum. These high-resolution models of yeast species distributions will facilitate the prediction of economically relevant and emerging pathogenic species under current and future climate scenarios.


Subject(s)
Biodiversity , Ecosystem , Climate , Forests , Carbon , Yeasts
8.
Mol Biol Evol ; 41(4)2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38415839

ABSTRACT

Siderophores are crucial for iron-scavenging in microorganisms. While many yeasts can uptake siderophores produced by other organisms, they are typically unable to synthesize siderophores themselves. In contrast, Wickerhamiella/Starmerella (W/S) clade yeasts gained the capacity to make the siderophore enterobactin following the remarkable horizontal acquisition of a bacterial operon enabling enterobactin synthesis. Yet, how these yeasts absorb the iron bound by enterobactin remains unresolved. Here, we demonstrate that Enb1 is the key enterobactin importer in the W/S-clade species Starmerella bombicola. Through phylogenomic analyses, we show that ENB1 is present in all W/S clade yeast species that retained the enterobactin biosynthetic genes. Conversely, it is absent in species that lost the ent genes, except for Starmerella stellata, making this species the only cheater in the W/S clade that can utilize enterobactin without producing it. Through phylogenetic analyses, we infer that ENB1 is a fungal gene that likely existed in the W/S clade prior to the acquisition of the ent genes and subsequently experienced multiple gene losses and duplications. Through phylogenetic topology tests, we show that ENB1 likely underwent horizontal gene transfer from an ancient W/S clade yeast to the order Saccharomycetales, which includes the model yeast Saccharomyces cerevisiae, followed by extensive secondary losses. Taken together, these results suggest that the fungal ENB1 and bacterial ent genes were cooperatively integrated into a functional unit within the W/S clade that enabled adaptation to iron-limited environments. This integrated fungal-bacterial circuit and its dynamic evolution determine the extant distribution of yeast enterobactin producers and cheaters.


Subject(s)
Enterobactin , Evolution, Molecular , Operon , Phylogeny , Enterobactin/metabolism , Enterobactin/genetics , Siderophores/metabolism , Siderophores/genetics , Genes, Fungal , Saccharomycetales/genetics , Saccharomycetales/metabolism , Gene Transfer, Horizontal
9.
Syst Biol ; 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940001

ABSTRACT

Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., ten) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring ML gene trees and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥ 10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. Lastly, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.

10.
PLoS Biol ; 20(10): e3001827, 2022 10.
Article in English | MEDLINE | ID: mdl-36228036

ABSTRACT

Molecular evolution studies, such as phylogenomic studies and genome-wide surveys of selection, often rely on gene families of single-copy orthologs (SC-OGs). Large gene families with multiple homologs in 1 or more species (a phenomenon observed among several important families of genes such as transporters and transcription factors) are often ignored because identifying and retrieving SC-OGs nested within them is challenging. To address this issue and increase the number of markers used in molecular evolution studies, we developed OrthoSNAP, software that uses a phylogenetic framework to simultaneously split gene families into SC-OGs and prune species-specific inparalogs. We term SC-OGs identified by OrthoSNAP as SNAP-OGs because they are identified using a splitting and pruning procedure analogous to snapping branches on a tree. From 415,129 orthologous groups of genes inferred across 7 eukaryotic phylogenomic datasets, we identified 9,821 SC-OGs; using OrthoSNAP on the remaining 405,308 orthologous groups of genes, we identified an additional 10,704 SNAP-OGs. Comparison of SNAP-OGs and SC-OGs revealed that their phylogenetic information content was similar, even in complex datasets that contain a whole-genome duplication, complex patterns of duplication and loss, transcriptome data where each gene typically has multiple transcripts, and contentious branches in the tree of life. OrthoSNAP is useful for increasing the number of markers used in molecular evolution data matrices, a critical step for robustly inferring and exploring the tree of life.


Subject(s)
Algorithms , Evolution, Molecular , Phylogeny , Pedigree , Transcription Factors
11.
New Phytol ; 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39166427

ABSTRACT

Horizontal gene transfer (HGT) is a major driving force in the evolution of prokaryotic and eukaryotic genomes. Despite recent advances in distribution and ecological importance, the extensive pattern, especially in seed plants, and post-transfer adaptation of HGT-acquired genes in land plants remain elusive. We systematically identified 1150 foreign genes in 522 land plant genomes that were likely acquired via at least 322 distinct transfers from nonplant donors and confirmed that recent HGT events were unevenly distributed between seedless and seed plants. HGT-acquired genes evolved to be more similar to native genes in terms of average intron length due to intron gains, and HGT-acquired genes containing introns exhibited higher expression levels than those lacking introns, suggesting that intron gains may be involved in the post-transfer adaptation of HGT in land plants. Functional validation of bacteria-derived gene GuaD in mosses and gymnosperms revealed that the invasion of foreign genes introduced a novel bypass of guanine degradation and resulted in the loss of native pathway genes in some gymnosperms, eventually shaping three major types of guanine metabolism in land plants. We conclude that HGT has played a critical role in land plant evolution.

12.
PLoS Biol ; 19(8): e3001365, 2021 08.
Article in English | MEDLINE | ID: mdl-34358228

ABSTRACT

Phylogenomic analyses of hundreds of protein-coding genes aimed at resolving phylogenetic relationships are now common practice. However, no software currently exists that includes tools for dataset construction and subsequent analysis with diverse validation strategies to assess robustness. Furthermore, there are no publicly available high-quality curated databases designed to assess deep (>100 million years) relationships in the tree of eukaryotes. To address these issues, we developed an easy-to-use software package, PhyloFisher (https://github.com/TheBrownLab/PhyloFisher), written in Python 3. PhyloFisher includes a manually curated database of 240 protein-coding genes from 304 eukaryotic taxa covering known eukaryotic diversity, a novel tool for ortholog selection, and utilities that will perform diverse analyses required by state-of-the-art phylogenomic investigations. Through phylogenetic reconstructions of the tree of eukaryotes and of the Saccharomycetaceae clade of budding yeasts, we demonstrate the utility of the PhyloFisher workflow and the provided starting database to address phylogenetic questions across a large range of evolutionary time points for diverse groups of organisms. We also demonstrate that undetected paralogy can remain in phylogenomic "single-copy orthogroup" datasets constructed using widely accepted methods such as all vs. all BLAST searches followed by Markov Cluster Algorithm (MCL) clustering and application of automated tree pruning algorithms. Finally, we show how the PhyloFisher workflow helps detect inadvertent paralog inclusions, allowing the user to make more informed decisions regarding orthology assignments, leading to a more accurate final dataset.


Subject(s)
Eukaryota/genetics , Phylogeny , Software
13.
Environ Microbiol ; 25(3): 642-645, 2023 03.
Article in English | MEDLINE | ID: mdl-36511824

ABSTRACT

As the most diverse group of animals on Earth, insects are key organisms in ecosystems. Horizontal gene transfer (HGT) refers to the transfer of genetic material between species by non-reproductive means. HGT is a major evolutionary force in prokaryotic genome evolution, but its importance in different eukaryotic groups, such as insects, has only recently begun to be understood. Genomic data from hundreds of insect species have enabled the detection of large numbers of HGT events and the elucidation of the functions of some of these foreign genes. Although quantification of the extent of HGT in insects broadens our understanding of its role in insect evolution, the scope of its influence and underlying mechanism(s) of its occurrence remain open questions for the field.


Subject(s)
Evolution, Molecular , Gene Transfer, Horizontal , Animals , Ecosystem , Prokaryotic Cells , Insecta , Genome, Insect , Phylogeny
14.
PLoS Biol ; 18(12): e3001007, 2020 12.
Article in English | MEDLINE | ID: mdl-33264284

ABSTRACT

Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.
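
The idea of retaining parsimony-informative sites can be illustrated with a minimal Python sketch. It is not ClipKIT's implementation: it only applies the standard textbook definition (a site is parsimony-informative when at least two different characters each occur in at least two sequences). The function names, the toy alignment, and the choice to treat `-` and `?` as missing data are assumptions made for illustration.

```python
from collections import Counter

# Assumption: these characters are treated as missing data and ignored.
GAP_CHARS = {"-", "?"}


def is_parsimony_informative(column):
    """Return True if at least two distinct characters each occur at least twice."""
    counts = Counter(c.upper() for c in column if c not in GAP_CHARS)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def keep_informative_sites(alignment):
    """Keep only parsimony-informative columns.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    keep = [i for i, col in enumerate(zip(*seqs)) if is_parsimony_informative(col)]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical toy alignment: only the second and third columns are informative.
toy = {"t1": "AAGT-", "t2": "AAGTA", "t3": "ACCTA", "t4": "ACCG-"}
trimmed = keep_informative_sites(toy)
print(trimmed)
```

In this toy case the first column is constant, the fourth has a singleton, and the fifth has only one character that occurs twice, so all three are dropped.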


Subject(s)
Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Computer Simulation , Evolution, Molecular , Models, Genetic , Phylogeny , Software
15.
Mol Biol Evol ; 38(10): 4322-4333, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34097041

ABSTRACT

Identifying our most distant animal relatives has emerged as one of the most challenging problems in phylogenetics. This debate has major implications for our understanding of the origin of multicellular animals and of the earliest events in animal evolution, including the origin of the nervous system. Some analyses identify sponges as our most distant animal relatives (Porifera-sister hypothesis), and others identify comb jellies (Ctenophora-sister hypothesis). These analyses vary in many respects, making it difficult to interpret previous tests of these hypotheses. To gain insight into why different studies yield different results, an important next step in the ongoing debate, we systematically test these hypotheses by synthesizing 15 previous phylogenomic studies and performing new standardized analyses under consistent conditions with additional models. We find that Ctenophora-sister is recovered across the full range of examined conditions, and Porifera-sister is recovered in some analyses under narrow conditions when most outgroups are excluded and site-heterogeneous CAT models are used. We additionally find that the number of categories in site-heterogeneous models is sufficient to explain the Porifera-sister results. Furthermore, our cross-validation analyses show CAT models that recover Porifera-sister have hundreds of additional categories and fail to fit significantly better than site-heterogeneous models with far fewer categories. Systematic and standardized testing of diverse phylogenetic models suggests that we should be skeptical of Porifera-sister results both because they are recovered under such narrow conditions and because the models in these conditions fit the data no better than other models that recover Ctenophora-sister.


Subject(s)
Ctenophora , Animals , Phylogeny
16.
EMBO J ; 37(1): 63-74, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29054852

ABSTRACT

DNA glycosylases preserve genome integrity and define the specificity of the base excision repair pathway for discrete, detrimental modifications, and thus, the mechanisms by which glycosylases locate DNA damage are of particular interest. Bacterial AlkC and AlkD are specific for cationic alkylated nucleobases and have a distinctive HEAT-like repeat (HLR) fold. AlkD uses a unique non-base-flipping mechanism that enables excision of bulky lesions more commonly associated with nucleotide excision repair. In contrast, AlkC has a much narrower specificity for small lesions, principally N3-methyladenine (3mA). Here, we describe how AlkC selects for and excises 3mA using a non-base-flipping strategy distinct from that of AlkD. A crystal structure resembling a catalytic intermediate complex shows how AlkC uses unique HLR and immunoglobulin-like domains to induce a sharp kink in the DNA, exposing the damaged nucleobase to active site residues that project into the DNA. This active site can accommodate and excise N3-methylcytosine (3mC) and N1-methyladenine (1mA), which are also repaired by AlkB-catalyzed oxidative demethylation, providing a potential alternative mechanism for repair of these lesions in bacteria.


Subject(s)
Bacillus cereus/enzymology , DNA Adducts/chemistry , DNA Adducts/metabolism , DNA Damage , DNA Glycosylases/chemistry , DNA Glycosylases/metabolism , DNA Repair , Adenine/analogs & derivatives , Adenine/chemistry , Alkylation , Amino Acid Sequence , Catalytic Domain , Crystallography, X-Ray , Models, Molecular , Protein Conformation , Sequence Homology
17.
Bioinformatics ; 37(16): 2325-2331, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33560364

ABSTRACT

MOTIVATION: Diverse disciplines in biology process and analyze multiple sequence alignments (MSAs) and phylogenetic trees to evaluate their information content, infer evolutionary events and processes and predict gene function. However, automated processing of MSAs and trees remains a challenge due to the lack of a unified toolkit. To fill this gap, we introduce PhyKIT, a toolkit for the UNIX shell environment with 30 functions that process MSAs and trees, including but not limited to estimation of mutation rate, evaluation of sequence composition biases, calculation of the degree of violation of a molecular clock and collapsing bipartitions (internal branches) with low support. RESULTS: To demonstrate the utility of PhyKIT, we detail three use cases: (1) summarizing information content in MSAs and phylogenetic trees for diagnosing potential biases in sequence or tree data; (2) evaluating gene-gene covariation of evolutionary rates to identify functional relationships, including novel ones, among genes and (3) identifying lack of resolution events or polytomies in phylogenetic trees, which are suggestive of rapid radiation events or lack of data. We anticipate PhyKIT will be useful for processing, examining and deriving biological meaning from increasingly large phylogenomic datasets. AVAILABILITY AND IMPLEMENTATION: PhyKIT is freely available on GitHub (https://github.com/JLSteenwyk/PhyKIT), PyPi (https://pypi.org/project/phykit/) and the Anaconda Cloud (https://anaconda.org/JLSteenwyk/phykit) under the MIT license with extensive documentation and user tutorials (https://jlsteenwyk.com/PhyKIT). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

18.
Syst Biol ; 70(5): 997-1014, 2021 08 11.
Article in English | MEDLINE | ID: mdl-33616672

ABSTRACT

Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between the likelihood-based signal (quantified by the difference in gene-wise log-likelihood score or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30-36% of genes in each data matrix are inconsistent, that is, each of these genes has a higher log-likelihood score for T1 versus T2 (i.e., ΔGLS > 0) whereas its T1 topology has lower quartet score than its T2 topology (i.e., ΔGQS < 0) or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrate that the removal of inconsistent genes from data sets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from data sets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees. [Conflict; gene tree; phylogenetic signal; phylogenetics; phylogenomics; Tree of Life.]
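
The consistent/inconsistent distinction described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the class and field names are hypothetical, the input values are invented, and the handling of genes where either difference is exactly zero (labelled "uninformative" here) is an assumption the abstract does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneSignal:
    """Per-gene scores under two competing topologies (hypothetical container)."""
    name: str
    lnl_t1: float       # gene-wise log-likelihood under T1
    lnl_t2: float       # gene-wise log-likelihood under T2
    quartet_t1: float   # gene-wise quartet score under T1
    quartet_t2: float   # gene-wise quartet score under T2

    @property
    def delta_gls(self) -> float:
        # Difference in gene-wise log-likelihood score (T1 minus T2).
        return self.lnl_t1 - self.lnl_t2

    @property
    def delta_gqs(self) -> float:
        # Difference in gene-wise quartet score (T1 minus T2).
        return self.quartet_t1 - self.quartet_t2


def classify(gene: GeneSignal) -> str:
    """Label a gene by whether its two signals favor the same topology."""
    gls, gqs = gene.delta_gls, gene.delta_gqs
    if gls == 0 or gqs == 0:
        return "uninformative"  # assumption: ties are set aside
    return "consistent" if (gls > 0) == (gqs > 0) else "inconsistent"


# Invented example values, for illustration only.
genes = [
    GeneSignal("g1", -1000.0, -1005.0, 0.8, 0.7),  # both signals favor T1
    GeneSignal("g2", -1000.0, -1005.0, 0.6, 0.7),  # likelihood favors T1, quartets favor T2
    GeneSignal("g3", -1000.0, -1000.0, 0.6, 0.7),  # likelihood tie
]
labels = {g.name: classify(g) for g in genes}
print(labels)
```

Filtering a data matrix would then amount to dropping the genes labelled "inconsistent", which is the removal step whose effects the simulations in the abstract evaluate.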


Subject(s)
Phylogeny , Animals , Computer Simulation , Likelihood Functions
19.
PLoS Biol ; 17(5): e3000255, 2019 05.
Article in English | MEDLINE | ID: mdl-31112549

ABSTRACT

Cell-cycle checkpoints and DNA repair processes protect organisms from potentially lethal mutational damage. We noticed that, compared to other budding yeasts in the subphylum Saccharomycotina, a lineage in the genus Hanseniaspora exhibited very high evolutionary rates, low Guanine-Cytosine (GC) content, small genome sizes, and lower gene numbers. To better understand Hanseniaspora evolution, we analyzed 25 genomes, including 11 newly sequenced, representing 18/21 known species in the genus. Our phylogenomic analyses identify two Hanseniaspora lineages, a faster-evolving lineage (FEL), which began diversifying approximately 87 million years ago (mya), and a slower-evolving lineage (SEL), which began diversifying approximately 54 mya. Remarkably, both lineages lost genes associated with the cell cycle and genome integrity, but these losses were greater in the FEL. For example, all species lost the cell-cycle regulator WHIskey 5 (WHI5), and the FEL lost components of the spindle checkpoint pathway (e.g., Mitotic Arrest-Deficient 1 [MAD1], Mitotic Arrest-Deficient 2 [MAD2]) and DNA-damage-checkpoint pathway (e.g., Mitosis Entry Checkpoint 3 [MEC3], RADiation sensitive 9 [RAD9]). Similarly, both lineages lost genes involved in DNA repair pathways, including the DNA glycosylase gene 3-MethylAdenine DNA Glycosylase 1 (MAG1), which is part of the base-excision repair pathway, and the DNA photolyase gene PHotoreactivation Repair deficient 1 (PHR1), which is involved in pyrimidine dimer repair. Strikingly, the FEL lost 33 additional genes, including polymerases (i.e., POLymerase 4 [POL4] and POL32) and telomere-associated genes (e.g., Repressor/activator site binding protein-Interacting Factor 1 [RIF1], Replication Factor A 3 [RFA3], Cell Division Cycle 13 [CDC13], Pbp1p Binding Protein [PBP2]). Echoing these losses, molecular evolutionary analyses reveal that, compared to the SEL, the FEL stem lineage underwent a burst of accelerated evolution, which resulted in greater mutational loads, homopolymer instabilities, and higher fractions of mutations associated with the common endogenously damaged base, 8-oxoguanine. We conclude that Hanseniaspora is an ancient lineage that has diversified and thrived, despite lacking many otherwise highly conserved cell-cycle and genome integrity genes and pathways, and may represent a novel, to our knowledge, system for studying cellular life without them.


Subject(s)
Cell Cycle/genetics , DNA Repair/genetics , Genes, Fungal , Phylogeny , Saccharomycetales/cytology , Saccharomycetales/genetics , Base Sequence , DNA Damage/genetics , Evolution, Molecular , Phenotype
20.
Proc Natl Acad Sci U S A ; 115(43): 11030-11035, 2018 10 23.
Article in English | MEDLINE | ID: mdl-30297402

ABSTRACT

Secondary metabolites are key in how organisms from all domains of life interact with their environment and each other. The iron-binding molecule pulcherrimin was described a century ago, but the genes responsible for its production in budding yeasts have remained uncharacterized. Here, we used phylogenomic footprinting on 90 genomes across the budding yeast subphylum Saccharomycotina to identify the gene cluster associated with pulcherrimin production. Using targeted gene replacements in Kluyveromyces lactis, we characterized the four genes that make up the cluster, which likely encode two pulcherriminic acid biosynthesis enzymes, a pulcherrimin transporter, and a transcription factor involved in both biosynthesis and transport. The requirement of a functional putative transporter to utilize extracellular pulcherrimin-complexed iron demonstrates that pulcherriminic acid is a siderophore, a chelator that binds iron outside the cell for subsequent uptake. Surprisingly, we identified homologs of the putative transporter and transcription factor genes in multiple yeast genera that lacked the biosynthesis genes and could not make pulcherrimin, including the model yeast Saccharomyces cerevisiae. We deleted these previously uncharacterized genes and showed they are also required for pulcherrimin utilization in S. cerevisiae, raising the possibility that other genes of unknown function are linked to secondary metabolism. Phylogenetic analyses of this gene cluster suggest that pulcherrimin biosynthesis and utilization were ancestral to budding yeasts, but the biosynthesis genes and, subsequently, the utilization genes, were lost in many lineages, mirroring other microbial public goods systems that lead to the rise of cheater organisms.


Subject(s)
Multigene Family/genetics , Saccharomycetales/genetics , Secondary Metabolism/genetics , Iron/metabolism , Kluyveromyces/genetics , Membrane Transport Proteins/genetics , Phylogeny , Protein Biosynthesis/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomycetales/metabolism , Siderophores/genetics , Transcription Factors/genetics