ABSTRACT
Many distantly related organisms have convergently evolved traits and lifestyles that enable them to live in similar ecological environments. However, the extent to which phenotypic convergence evolves through the same or distinct genetic trajectories remains an open question. Here, we leverage a comprehensive dataset of genomic and phenotypic data from 1,049 yeast species in the subphylum Saccharomycotina (Kingdom Fungi, Phylum Ascomycota) to explore signatures of convergent evolution in cactophilic yeasts, ecological specialists associated with cacti. We inferred that the ecological association of yeasts with cacti arose independently approximately 17 times. Using a machine learning-based approach, we further found that cactophily can be predicted with 76% accuracy from both functional genomic and phenotypic data. The most informative feature for predicting cactophily was thermotolerance, which we found is likely associated with altered evolutionary rates of genes impacting the cell envelope in several cactophilic lineages. We also identified horizontal gene transfer and duplication events of plant cell wall-degrading enzymes in distantly related cactophilic clades, suggesting that putatively adaptive traits evolved independently through disparate molecular mechanisms. Notably, we found that multiple cactophilic species and their close relatives have been reported as emerging human opportunistic pathogens, suggesting that the cactophilic lifestyle (and perhaps, more generally, lifestyles favoring thermotolerance) might preadapt yeasts to cause human disease. This work underscores the potential of a multifaceted approach involving high-throughput genomic and phenotypic data to shed light on ecological adaptation and highlights how convergent evolution to wild environments could facilitate the transition to human pathogenicity.
Subject(s)
Cactaceae , Cactaceae/microbiology , Cactaceae/genetics , Phylogeny , Yeasts/genetics , Genome, Fungal/genetics , Biological Evolution , Evolution, Molecular , Phenotype , Gene Transfer, Horizontal , Thermotolerance/genetics , Ascomycota/genetics , Ascomycota/pathogenicity , Machine Learning
ABSTRACT
Nearly all genetic variants that influence disease risk have human-specific origins; however, the systems they influence have ancient roots that often trace back to evolutionary events long before the origin of humans. Here, we review how advances in our understanding of the genetic architectures of diseases, recent human evolution and deep evolutionary history can help explain how and why humans in modern environments become ill. Human populations exhibit differences in the prevalence of many common and rare genetic diseases. These differences are largely the result of the diverse environmental, cultural, demographic and genetic histories of modern human populations. Synthesizing our growing knowledge of evolutionary history with genetic medicine, while accounting for environmental and social factors, will help to achieve the promise of personalized genomics and realize the potential hidden in an individual's DNA sequence to guide clinical decisions. In short, precision medicine is fundamentally evolutionary medicine, and integration of evolutionary perspectives into the clinic will support the realization of its full potential.
Subject(s)
Disease/genetics , Evolution, Molecular , Health Status , Genetic Variation , Humans
ABSTRACT
How genomic differences contribute to phenotypic differences is a major question in biology. The recently characterized genomes, isolation environments, and qualitative patterns of growth on 122 sources and conditions of 1,154 strains from 1,049 fungal species (nearly all known) in the yeast subphylum Saccharomycotina provide a powerful, yet complex, dataset for addressing this question. We used a random forest algorithm trained on these genomic, metabolic, and environmental data to predict growth on several carbon sources with high accuracy. Known structural genes involved in assimilation of these sources and presence/absence patterns of growth in other sources were important features contributing to prediction accuracy. By further examining growth on galactose, we found that it can be predicted with high accuracy from either genomic (92.2%) or growth data (82.6%) but not from isolation environment data (65.6%). Prediction accuracy was even higher (93.3%) when we combined genomic and growth data. After the GALactose utilization (GAL) genes, the most important feature for predicting growth on galactose was growth on galactitol. This raised the hypothesis that several species in two orders, Serinales and Pichiales (containing the emerging pathogen Candida auris and the genus Ogataea, respectively), which lack the GAL genes, use an alternative galactose utilization pathway. Growth and biochemical assays confirmed that several of these species utilize galactose through an alternative oxidoreductive D-galactose pathway, rather than the canonical GAL pathway. Machine learning approaches are powerful for investigating the evolution of the yeast genotype-phenotype map, and their application will uncover novel biology, even in well-studied traits.
Subject(s)
Galactose , Machine Learning , Galactose/metabolism , Genome, Fungal , Metabolic Networks and Pathways/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/genetics
ABSTRACT
The Saccharomycotina yeasts ("yeasts" hereafter) are a fungal clade of scientific, economic, and medical significance. Yeasts are highly ecologically diverse, found across a broad range of environments in every biome and continent on Earth; however, little is known about what rules govern the macroecology of yeast species and their range limits in the wild. Here, we trained machine learning models on 12,816 terrestrial occurrence records and 96 environmental variables to infer global distribution maps at ~1 km² resolution for 186 yeast species (~15% of described species from 75% of orders) and to test environmental drivers of yeast biogeography and macroecology. We found that predicted yeast diversity hotspots occur in mixed montane forests in temperate climates. Diversity in vegetation type and topography were some of the greatest predictors of yeast species richness, suggesting that microhabitats and environmental clines are key to yeast diversity. We further found that range limits in yeasts are significantly influenced by carbon niche breadth and range overlap with other yeast species, with carbon specialists and species in high-diversity environments exhibiting reduced geographic ranges. Finally, yeasts contravene many long-standing macroecological principles, including the latitudinal diversity gradient, temperature-dependent species richness, and a positive relationship between latitude and range size (Rapoport's rule). These results unveil how the environment governs the global diversity and distribution of species in the yeast subphylum. These high-resolution models of yeast species distributions will facilitate the prediction of economically relevant and emerging pathogenic species under current and future climate scenarios.
Subject(s)
Biodiversity , Ecosystem , Climate , Forests , Carbon , Yeasts
ABSTRACT
The Leloir galactose utilization or GAL pathway of budding yeasts, including that of the baker's yeast Saccharomyces cerevisiae and the opportunistic human pathogen Candida albicans, breaks down the sugar galactose for energy and biomass production. The GAL pathway has long served as a model system for understanding how eukaryotic metabolic pathways, including their modes of regulation, evolve. More recently, the physical linkage of the structural genes GAL1, GAL7, and GAL10 in diverse budding yeast genomes has been used as a model for understanding the evolution of gene clustering. In this review, we summarize exciting recent work on three different aspects of this iconic pathway's evolution: gene cluster organization, GAL gene regulation, and the population genetics of the GAL pathway.
Subject(s)
Saccharomycetales , Galactose/genetics , Galactose/metabolism , Genes, Fungal , Humans , Multigene Family , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomycetales/genetics , Saccharomycetales/metabolism
ABSTRACT
Natural selection shapes the genetic architecture of many human traits. However, the prevalence of different modes of selection on genomic regions associated with variation in traits remains poorly understood. To address this, we developed an efficient computational framework to calculate positive and negative enrichment of different evolutionary measures among regions associated with complex traits. We applied the framework to summary statistics from >900 genome-wide association studies (GWASs) and 11 evolutionary measures of sequence constraint, population differentiation, and allele age while accounting for linkage disequilibrium, allele frequency, and other potential confounders. We demonstrate that this framework yields consistent results across GWASs with variable sample sizes, numbers of trait-associated SNPs, and analytical approaches. The resulting evolutionary atlas maps diverse signatures of selection on genomic regions associated with complex human traits on an unprecedented scale. We detected positive enrichment for sequence conservation among trait-associated regions for the majority of traits (>77% of 290 high-power GWASs), which included reproductive traits. Many traits also exhibited substantial positive enrichment for population differentiation, especially among hair, skin, and pigmentation traits. In contrast, we detected widespread negative enrichment for signatures of balancing selection (51% of GWASs) and an absence of enrichment for evolutionary signals in regions associated with late-onset Alzheimer's disease. These results support a pervasive role for negative selection on regions of the human genome that contribute to variation in complex traits, but also demonstrate that diverse modes of evolution are likely to have shaped trait-associated loci. 
This atlas of evolutionary signatures across the diversity of available GWASs will enable exploration of the relationship between the genetic architecture and evolutionary processes in the human genome.
Subject(s)
Genome-Wide Association Study , Selection, Genetic , Humans , Linkage Disequilibrium , Phenotype , Genomics , Polymorphism, Single Nucleotide/genetics , Genome, Human/genetics
ABSTRACT
SUMMARY: GSEL is a computational framework for calculating the enrichment of signatures of diverse evolutionary forces in a set of genomic regions. GSEL can flexibly integrate any sequence-based evolutionary metric and analyze sets of human genomic regions identified by genome-wide assays (e.g. GWAS, eQTL, *-seq). The core of GSEL's approach is the generation of empirical null distributions tailored to the allele frequency and linkage disequilibrium structure of the regions of interest. We illustrate the application of GSEL to variants identified from a GWAS of body mass index, a highly polygenic trait. AVAILABILITY AND IMPLEMENTATION: GSEL is implemented as a fast, flexible and user-friendly Python package. It is available with demonstration data at https://github.com/abraham-abin13/gsel_vec. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
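The matched-null idea at the core of this approach can be illustrated with a minimal Python sketch. It is not GSEL's code or API: the function `empirical_enrichment`, the representation of allele-frequency/LD matching as precomputed bin labels, and the toy values are all hypothetical. Each null set draws one control region per region of interest from the same bin, so the null distribution shares the observed set's allele frequency and LD composition.

```python
import random
from statistics import mean


def empirical_enrichment(observed_values, observed_bins, null_pool_by_bin,
                         n_null=1000, seed=0):
    """Hypothetical matched-null enrichment test (illustrative, not GSEL's API).

    observed_values  -- evolutionary metric for each region of interest
    observed_bins    -- matching bin (e.g. allele frequency x LD) of each region
    null_pool_by_bin -- bin -> metric values of candidate control regions
    Returns (fold enrichment, two-sided empirical p-value).
    """
    rng = random.Random(seed)
    obs_mean = mean(observed_values)
    null_means = []
    for _ in range(n_null):
        # One control per region, drawn from the same bin, so every null set
        # mirrors the observed set's bin composition.
        draw = [rng.choice(null_pool_by_bin[b]) for b in observed_bins]
        null_means.append(mean(draw))
    null_center = mean(null_means)
    fold = obs_mean / null_center  # > 1 suggests positive enrichment
    # Count null sets at least as far from the null center as the observed set.
    extreme = sum(abs(m - null_center) >= abs(obs_mean - null_center)
                  for m in null_means)
    p_value = (extreme + 1) / (n_null + 1)  # +1 avoids reporting p = 0
    return fold, p_value


# Toy example: regions of interest score far higher than matched controls.
pool = {"low_af": [0.1, 0.2, 0.3], "high_af": [0.2, 0.3, 0.4]}
fold, p = empirical_enrichment([0.9, 0.95, 0.85, 0.9],
                               ["low_af", "low_af", "high_af", "high_af"], pool)
print(f"fold enrichment = {fold:.2f}, empirical p = {p:.4f}")
```

Because no toy null set can reach the observed mean, the sketch reports a fold enrichment well above 1 and the smallest attainable empirical p-value, 1/(n_null + 1); matching on bins rather than drawing controls genome-wide is what keeps allele frequency and LD from masquerading as selection signal.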
Subject(s)
Body Mass Index , Genome, Human , Genomics , Software , Humans , Gene Frequency , Genome-Wide Association Study
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pgen.1008304.].
ABSTRACT
Aspergillus fumigatus is the main etiological agent of aspergillosis. The antifungal drug caspofungin (CSP) can be used against A. fumigatus, but tolerance to CSP has been observed. We have previously shown that the transcription factor FhdA is important for mitochondrial activity. Here, we show that FhdA regulates genes transcribed by RNA polymerase II and III. FhdA influences the expression of tRNAs that are important for mitochondrial function upon CSP exposure. Our results reveal a completely novel mechanism that is affected by CSP.
Subject(s)
Antifungal Agents , Aspergillus fumigatus , Antifungal Agents/metabolism , Antifungal Agents/pharmacology , Caspofungin/pharmacology , Codon Usage , Echinocandins/genetics , Fungal Proteins/genetics , Fungal Proteins/metabolism , Lipopeptides/pharmacology , Mitochondria/genetics , Mitochondria/metabolism , RNA Polymerase II/genetics , Transcription Factors/genetics
ABSTRACT
MOTIVATION: Diverse disciplines in biology process and analyze multiple sequence alignments (MSAs) and phylogenetic trees to evaluate their information content, infer evolutionary events and processes, and predict gene function. However, automated processing of MSAs and trees remains a challenge due to the lack of a unified toolkit. To fill this gap, we introduce PhyKIT, a toolkit for the UNIX shell environment with 30 functions that process MSAs and trees, including but not limited to estimation of mutation rate, evaluation of sequence composition biases, calculation of the degree of violation of a molecular clock, and collapsing of bipartitions (internal branches) with low support. RESULTS: To demonstrate the utility of PhyKIT, we detail three use cases: (1) summarizing information content in MSAs and phylogenetic trees to diagnose potential biases in sequence or tree data; (2) evaluating gene-gene covariation of evolutionary rates to identify functional relationships, including novel ones, among genes; and (3) identifying unresolved relationships (polytomies) in phylogenetic trees, which are suggestive of rapid radiation events or lack of data. We anticipate PhyKIT will be useful for processing, examining and deriving biological meaning from increasingly large phylogenomic datasets. AVAILABILITY AND IMPLEMENTATION: PhyKIT is freely available on GitHub (https://github.com/JLSteenwyk/PhyKIT), PyPi (https://pypi.org/project/phykit/) and the Anaconda Cloud (https://anaconda.org/JLSteenwyk/phykit) under the MIT license with extensive documentation and user tutorials (https://jlsteenwyk.com/PhyKIT). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
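One of the listed functions, calculating the degree of violation of a molecular clock (DVMC), can be illustrated with a minimal sketch. It assumes DVMC is the sample standard deviation of root-to-tip distances (lower values indicating more clock-like rates) and uses a hypothetical nested-tuple tree encoding in place of a Newick parser; the function names are illustrative and are not PhyKIT's interface.

```python
from statistics import stdev

# Hypothetical nested-tuple tree encoding (not PhyKIT's input format):
#   leaf          -> ("taxon_name", branch_length)
#   internal node -> ([child, child, ...], branch_length)


def root_to_tip(node, depth=0.0, out=None):
    """Return {taxon: summed branch length from the root to that tip}."""
    if out is None:
        out = {}
    label, length = node
    depth += length
    if isinstance(label, str):
        out[label] = depth  # reached a tip
    else:
        for child in label:
            root_to_tip(child, depth, out)
    return out


def dvmc(tree):
    """Degree of violation of a molecular clock, assumed here to be the
    sample standard deviation (n - 1 denominator) of root-to-tip distances."""
    return stdev(list(root_to_tip(tree).values()))


# Ultrametric (clock-like) tree: every tip is 1.0 from the root.
clock_tree = ([("A", 1.0), ("B", 1.0), ([("C", 0.5), ("D", 0.5)], 0.5)], 0.0)
# Rate-heterogeneous tree: tips at 1.0 and 3.0 from the root.
uneven_tree = ([("A", 1.0), ("B", 3.0)], 0.0)
print(dvmc(clock_tree), dvmc(uneven_tree))
```

The clock-like tree yields 0.0, while the tree whose tips sit 1.0 and 3.0 from the root yields √2 (about 1.414). In practice one would read Newick input with a phylogenetics library rather than hand-build tuples.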
ABSTRACT
Cell-cycle checkpoints and DNA repair processes protect organisms from potentially lethal mutational damage. Compared to other budding yeasts in the subphylum Saccharomycotina, we noticed that a lineage in the genus Hanseniaspora exhibited very high evolutionary rates, low Guanine-Cytosine (GC) content, small genome sizes, and lower gene numbers. To better understand Hanseniaspora evolution, we analyzed 25 genomes, including 11 newly sequenced ones, representing 18 of the 21 known species in the genus. Our phylogenomic analyses identify two Hanseniaspora lineages, a faster-evolving lineage (FEL), which began diversifying approximately 87 million years ago (mya), and a slower-evolving lineage (SEL), which began diversifying approximately 54 mya. Remarkably, both lineages lost genes associated with the cell cycle and genome integrity, but these losses were greater in the FEL. For example, all species lost the cell-cycle regulator WHIskey 5 (WHI5), and the FEL lost components of the spindle checkpoint pathway (e.g., Mitotic Arrest-Deficient 1 [MAD1], Mitotic Arrest-Deficient 2 [MAD2]) and the DNA-damage-checkpoint pathway (e.g., Mitosis Entry Checkpoint 3 [MEC3], RADiation sensitive 9 [RAD9]). Similarly, both lineages lost genes involved in DNA repair pathways, including the DNA glycosylase gene 3-MethylAdenine DNA Glycosylase 1 (MAG1), which is part of the base-excision repair pathway, and the DNA photolyase gene PHotoreactivation Repair deficient 1 (PHR1), which is involved in pyrimidine dimer repair. Strikingly, the FEL lost 33 additional genes, including polymerases (i.e., POLymerase 4 [POL4] and POL32) and telomere-associated genes (e.g., Repressor/activator site binding protein-Interacting Factor 1 [RIF1], Replication Factor A 3 [RFA3], Cell Division Cycle 13 [CDC13], Pbp1p Binding Protein [PBP2]). 
Echoing these losses, molecular evolutionary analyses reveal that, compared to the SEL, the FEL stem lineage underwent a burst of accelerated evolution, which resulted in greater mutational loads, homopolymer instabilities, and higher fractions of mutations associated with the common endogenously damaged base, 8-oxoguanine. We conclude that Hanseniaspora is an ancient lineage that has diversified and thrived, despite lacking many otherwise highly conserved cell-cycle and genome integrity genes and pathways, and may represent a novel, to our knowledge, system for studying cellular life without them.
Subject(s)
Cell Cycle/genetics , DNA Repair/genetics , Genes, Fungal , Phylogeny , Saccharomycetales/cytology , Saccharomycetales/genetics , Base Sequence , DNA Damage/genetics , Evolution, Molecular , Phenotype
ABSTRACT
Variation in synonymous codon usage is abundant across multiple levels of organization: between codons of an amino acid, between genes in a genome, and between genomes of different species. It is now well understood that variation in synonymous codon usage is influenced by mutational bias coupled with both natural selection for translational efficiency and genetic drift, but how these processes shape patterns of codon usage bias across entire lineages remains unexplored. To address this question, we used a rich genomic data set of 327 species that covers nearly one third of the known biodiversity of the budding yeast subphylum Saccharomycotina. We found that, while genome-wide relative synonymous codon usage (RSCU) for all codons was highly correlated with the GC content of the third codon position (GC3), the usage of codons for the amino acids proline, arginine, and glycine was inconsistent with the neutral expectation, under which mutational bias coupled with genetic drift drives codon usage. Examining the relationship between genes' effective numbers of codons and their GC3 content in individual genomes revealed that nearly a quarter of genes (381,174/1,683,203; 23%), as well as most genomes (308/327; 94%), significantly deviate from the neutral expectation. Finally, by evaluating the imprint of translational selection on codon usage, measured as the degree to which genes' adaptiveness to the tRNA pool was correlated with selective pressure, we show that translational selection is widespread in budding yeast genomes (264/327; 81%). These results suggest that the contributions of translational selection and drift to patterns of synonymous codon usage across budding yeasts vary across codons, genes, and genomes; whereas drift is the primary driver of global codon usage across the subphylum, the codon bias of large numbers of genes in the majority of genomes is influenced by translational selection.
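The two quantities at the heart of this analysis can be sketched briefly. RSCU divides a codon's observed count by the count expected if all synonymous codons for its amino acid were used equally (so 1 means no bias), and the neutral expectation for a gene's effective number of codons (ENC) as a function of GC3 is commonly taken from Wright's (1990) curve. The partial codon table, function names, and toy sequence below are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter

# Illustrative subset of the standard genetic code: two four-fold
# degenerate families (proline and glycine). A real analysis needs all families.
SYNONYMS = {
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(coding_seq):
    """Relative synonymous codon usage: observed count of a codon divided by
    the count expected if all synonyms of its amino acid were used equally."""
    codons = [coding_seq[i:i + 3] for i in range(0, len(coding_seq) - 2, 3)]
    counts = Counter(codons)
    values = {}
    for family in SYNONYMS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent, so RSCU is undefined
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values


def expected_enc(gc3):
    """Wright's neutral expectation of the effective number of codons (ENC)
    given GC3 content s: 2 + s + 29 / (s^2 + (1 - s)^2)."""
    return 2 + gc3 + 29 / (gc3 ** 2 + (1 - gc3) ** 2)


print(rscu("CCGCCGCCGCCA"))  # CCG: 3.0, CCA: 1.0, CCT and CCC: 0.0
print(expected_enc(0.5))     # 60.5
```

Under this framing, a gene whose observed ENC falls well below `expected_enc` at its own GC3 departs from the mutation-plus-drift expectation, which is the kind of deviation the abstract counts across genes and genomes.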
Subject(s)
Codon Usage , Saccharomycetales/genetics , Bias , Genetic Variation , Genome, Fungal , Selection, Genetic
ABSTRACT
The ornithine transcarbamylase (OTC) gene is on the X chromosome, and its product catalyzes the formation of citrulline from ornithine and carbamylphosphate in the urea cycle. About 10%-15% of patients clinically diagnosed with OTC deficiency (OTCD) lack identifiable mutations in the coding region or splice junctions of the OTC gene on routine molecular testing. We collected DNA from such patients via retrospective review and by prospective enrollment. In nine of 38 subjects (24%), we identified a sequence variant in the OTC regulatory regions. Eight subjects had unique sequence variants in the OTC promoter and one subject had a novel sequence variant in the OTC enhancer. All sequence variants affect positions that are highly conserved in mammalian OTC genes. Functional studies revealed reduced reporter gene expression with all sequence variants. Two sequence variants caused decreased binding of the HNF4 transcription factor to its mutated binding site. Bioinformatic analyses combined with functional assays can be used to identify and authenticate pathogenic sequence variants in regulatory regions of the OTC gene, in other urea cycle disorders, or in other inborn errors of metabolism.
Subject(s)
Enhancer Elements, Genetic , Ornithine Carbamoyltransferase Deficiency Disease/genetics , Promoter Regions, Genetic , Binding Sites/genetics , Gene Expression Regulation , Hepatocyte Nuclear Factor 4/metabolism , Humans , Male , Mutation , Ornithine/metabolism , Prospective Studies , Retrospective Studies
ABSTRACT
Gene gains and losses are a major driver of genome evolution; their precise characterization can provide insights into the origin and diversification of major lineages. Here, we examined gene family evolution of 1,154 genomes from nearly all known species in the medically and technologically important yeast subphylum Saccharomycotina. We found that yeast gene family and genome evolution are distinct from those of plants, animals, and filamentous ascomycetes and are characterized by smaller genome sizes and gene numbers but larger gene family sizes. Faster-evolving lineages (FELs) in yeasts experienced significantly higher rates of gene loss (commensurate with a narrowing of metabolic niche breadth) but also higher speciation rates than their slower-evolving sister lineages (SELs). The gene families most often lost are those involved in mRNA splicing, carbohydrate metabolism, and cell division, and these losses are likely associated with intron loss, metabolic breadth, and non-canonical cell cycle processes. Our results highlight the significant role of gene family contractions in the evolution of yeast metabolism, genome function, and speciation, and suggest that gene family evolutionary trajectories have differed markedly across major eukaryotic lineages.
ABSTRACT
Jaundice affects almost all neonates in their first days of life and is caused by the accumulation of bilirubin. Although the core biochemistry of bilirubin metabolism is well understood, it is not clear why some neonates experience more severe jaundice and require treatment with phototherapy. Here, we present the first genome-wide association study of neonatal jaundice to date, in nearly 30,000 parent-offspring trios from Norway (cases ≈ 2,000). The alternate allele of a common missense variant affecting the sequence of UGT1A4 reduces susceptibility to jaundice five-fold, an association that replicated in separate cohorts of neonates of African American and European ancestries. eQTL colocalization analyses indicate that the association may be driven by regulation of UGT1A1 in the intestines but not in the liver. Our results reveal marked differences in the genetic variants involved in neonatal jaundice compared to those regulating bilirubin levels in adults, suggesting distinct genetic mechanisms for the same biological pathways.
Subject(s)
Bilirubin , Genome-Wide Association Study , Glucuronosyltransferase , Jaundice, Neonatal , Humans , Jaundice, Neonatal/genetics , Jaundice, Neonatal/metabolism , Bilirubin/metabolism , Glucuronosyltransferase/genetics , Glucuronosyltransferase/metabolism , Infant, Newborn , Adult , Female , Male , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease , Norway , Quantitative Trait Loci , Alleles , Mutation, Missense , Liver/metabolism , White People/genetics
ABSTRACT
The fungal genus Aspergillus contains a diversity of species divided into taxonomic sections of closely related species. Section Flavi contains 33 species, many of industrial, agricultural, or medical relevance. Here, we analyze the mitochondrial genomes (mitogenomes) of 20 Flavi species (including 18 newly assembled mitogenomes) and compare their evolutionary history and codon usage bias patterns to those of their nuclear counterparts. Codon usage bias refers to variable frequencies of synonymous codons in coding DNA and is shaped by a balance of neutral processes and natural selection. All mitogenomes were circular DNA molecules with highly conserved gene content and order. As expected, genomic content, including GC content, and genome size differed greatly between mitochondrial and nuclear genomes. Phylogenetic analysis based on 14 concatenated mitochondrial genes recovered evolutionary relationships largely consistent with those inferred from a phylogeny constructed from 2,422 nuclear genes. Comparing interspecies patterns of codon usage bias between mitochondrial and nuclear genomes showed that species grouped differently depending on whether analyses were performed using mitochondrial or nuclear relative synonymous codon usage values. We found that patterns of codon usage bias at the gene level are more similar between mitogenomes of different species than between the mitogenome and nuclear genome of the same species. Finally, we inferred that, although most genes (both nuclear and mitochondrial) deviated from the neutral expectation for codon usage, mitogenomes were not under translational selection, while nuclear genomes were under moderate translational selection. These results contribute to the study of mitochondrial genome evolution in filamentous fungi.
Subject(s)
Codon Usage , Genome, Mitochondrial , Phylogeny , Codon/genetics , Genomics , Aspergillus/genetics
ABSTRACT
Introduction: Eukaryotic life depends on the functional elements encoded by both the nuclear genome and organellar genomes, such as those contained within the mitochondria. The content, size, and structure of the mitochondrial genome vary across organisms, with potentially large implications for phenotypic variance and resulting evolutionary trajectories. Among yeasts in the subphylum Saccharomycotina, extensive differences have been observed in various species relative to the model yeast Saccharomyces cerevisiae, but mitochondrial genome sampling across many groups has been scarce, even as hundreds of nuclear genomes have become available. Methods: By extracting mitochondrial assemblies from existing short-read genome sequence datasets, we have greatly expanded both the number of available genomes and the coverage across sparsely sampled clades. Results: Comparison of 353 yeast mitochondrial genomes revealed that, while size and GC content were fairly consistent across species, those in the genera Metschnikowia and Saccharomyces trended larger, while several species in the order Saccharomycetales, which includes S. cerevisiae, exhibited lower GC content. Extreme examples for both size and GC content were scattered throughout the subphylum. All mitochondrial genomes shared a core set of protein-coding genes for Complexes III, IV, and V, but they varied in the presence or absence of mitochondrially-encoded canonical Complex I genes. We traced the loss of Complex I genes to a major event in the ancestor of the orders Saccharomycetales and Saccharomycodales, but we also observed several independent losses in the orders Phaffomycetales, Pichiales, and Dipodascales. In contrast to prior hypotheses based on smaller-scale datasets, comparison of evolutionary rates in protein-coding genes showed no bias towards elevated rates among aerobically fermenting (Crabtree/Warburg-positive) yeasts. 
Mitochondrial introns were widely distributed, but they were highly enriched in some groups. The majority of mitochondrial introns were poorly conserved within groups, but several were shared within groups, between groups, and even across taxonomic orders, which is consistent with horizontal gene transfer, likely involving homing endonucleases acting as selfish elements. Discussion: As the number of available fungal nuclear genomes continues to expand, the methods described here to retrieve mitochondrial genome sequences from these datasets will prove invaluable to ensuring that studies of fungal mitochondrial genomes keep pace with their nuclear counterparts.
ABSTRACT
Eukaryotic life depends on the functional elements encoded by both the nuclear genome and organellar genomes, such as those contained within the mitochondria. The content, size, and structure of the mitochondrial genome vary across organisms, with potentially large implications for phenotypic variance and resulting evolutionary trajectories. Among yeasts in the subphylum Saccharomycotina, extensive differences have been observed in various species relative to the model yeast Saccharomyces cerevisiae, but mitochondrial genome sampling across many groups has been scarce, even as hundreds of nuclear genomes have become available. By extracting mitochondrial assemblies from existing short-read genome sequence datasets, we have greatly expanded both the number of available genomes and the coverage across sparsely sampled clades. Comparison of 353 yeast mitochondrial genomes revealed that, while size and GC content were fairly consistent across species, those in the genera Metschnikowia and Saccharomyces trended larger, while several species in the order Saccharomycetales, which includes S. cerevisiae, exhibited lower GC content. Extreme examples for both size and GC content were scattered throughout the subphylum. All mitochondrial genomes shared a core set of protein-coding genes for Complexes III, IV, and V, but they varied in the presence or absence of mitochondrially-encoded canonical Complex I genes. We traced the loss of Complex I genes to a major event in the ancestor of the orders Saccharomycetales and Saccharomycodales, but we also observed several independent losses in the orders Phaffomycetales, Pichiales, and Dipodascales. In contrast to prior hypotheses based on smaller-scale datasets, comparison of evolutionary rates in protein-coding genes showed no bias towards elevated rates among aerobically fermenting (Crabtree/Warburg-positive) yeasts. Mitochondrial introns were widely distributed, but they were highly enriched in some groups. 
The majority of mitochondrial introns were poorly conserved within groups, but several were shared within groups, between groups, and even across taxonomic orders, which is consistent with horizontal gene transfer, likely involving homing endonucleases acting as selfish elements. As the number of available fungal nuclear genomes continues to expand, the methods described here to retrieve mitochondrial genome sequences from these datasets will prove invaluable to ensuring that studies of fungal mitochondrial genomes keep pace with their nuclear counterparts.
ABSTRACT
The Saccharomycotina yeasts ("yeasts" hereafter) are a fungal clade of scientific, economic, and medical significance. Yeasts are highly ecologically diverse, found across a broad range of environments in every biome and continent on Earth¹; however, little is known about what rules govern the macroecology of yeast species and their range limits in the wild². Here, we trained machine learning models on 12,221 occurrence records and 96 environmental variables to infer global distribution maps for 186 yeast species (~15% of described species from 75% of orders) and to test environmental drivers of yeast biogeography and macroecology. We found that predicted yeast diversity hotspots occur in mixed montane forests in temperate climates. Diversity in vegetation type and topography were some of the greatest predictors of yeast species richness, suggesting that microhabitats and environmental clines are key to yeast diversification. We further found that range limits in yeasts are significantly influenced by carbon niche breadth and range overlap with other yeast species, with carbon specialists and species in high-diversity environments exhibiting reduced geographic ranges. Finally, yeasts contravene many long-standing macroecological principles, including the latitudinal diversity gradient, temperature-dependent species richness, and latitude-dependent range size (Rapoport's rule). These results unveil how the environment governs the global diversity and distribution of species in the yeast subphylum. These high-resolution models of yeast species distributions will facilitate the prediction of economically relevant and emerging pathogenic species under current and future climate scenarios.
ABSTRACT
Many distantly related organisms have convergently evolved traits and lifestyles that enable them to live in similar ecological environments. However, the extent to which phenotypic convergence evolves through the same or distinct genetic trajectories remains an open question. Here, we leverage a comprehensive dataset of genomic and phenotypic data from 1,049 yeast species in the subphylum Saccharomycotina (Kingdom Fungi, Phylum Ascomycota) to explore signatures of convergent evolution in cactophilic yeasts, ecological specialists associated with cacti. We inferred that the ecological association of yeasts with cacti arose independently ~17 times. Using machine learning, we further found that cactophily can be predicted with 76% accuracy from functional genomic and phenotypic data. The most informative feature for predicting cactophily was thermotolerance, which is likely associated with duplication and altered evolutionary rates of genes impacting the cell envelope in several cactophilic lineages. We also identified horizontal gene transfer and duplication events of plant cell wall-degrading enzymes in distantly related cactophilic clades, suggesting that putatively adaptive traits evolved through disparate molecular mechanisms. Remarkably, multiple cactophilic lineages and their close relatives are emerging human opportunistic pathogens, suggesting that the cactophilic lifestyle (and perhaps, more generally, lifestyles favoring thermotolerance) may preadapt yeasts to cause human disease. This work underscores the potential of a multifaceted approach involving high-throughput genomic and phenotypic data to shed light on ecological adaptation and highlights how convergent evolution to wild environments could facilitate the transition to human pathogenicity.