ABSTRACT
Haemoglobin is a key molecule for oxygen transport in vertebrates. It exhibits remarkable gene diversity in teleost fishes, reflecting adaptation to various aquatic environments. In this study, we present the dynamic evolution of haemoglobin subunit genes based on a comparison of high-quality genome assemblies of 24 vertebrate species, including 17 teleosts (of which six are cichlids). Our findings indicate that teleost genomes contain a range of haemoglobin genes, from as few as five in fugu to as many as 43 in salmon, with the latter being the largest repertoire found in vertebrates. We find evidence that the teleost ancestor had at least four Hbα and three or four Hbβ subunit genes, and that the current gene diversity emerged during the teleost radiation, driven primarily by (tandem) gene duplications, genome compaction, and rearrangement dynamics. We provide insights into the genomic organisation of haemoglobin clusters in different teleost species. We further show that the evolution of paralogous rhbdf1 genes flanking both teleost clusters (LA and MN) supports the hypothesis that the LA cluster originated by rearrangement within teleosts, rather than by the teleost-specific whole-genome duplication. We specifically focus on cichlid fishes, where adaptation to low-oxygen environments plays a role in species diversification. Our analysis of six cichlid genomes, including Pungu maclareni from the Barombi Mbo crater lake, for which we sequenced a representative genome, reveals 18-32 copies of the Hb genes, and elevated rates of non-synonymous substitutions compared to other teleosts. Overall, this work facilitates a deeper understanding of how haemoglobin genes contribute to the adaptive potential of teleosts.
ABSTRACT
The role of interspecific hybridization has recently seen increasing attention, especially in the context of diversification dynamics. Genomic research has now made it abundantly clear that both hybridization and introgression - the exchange of genetic material through hybridization and backcrossing - are far more common than previously thought. Besides cases of ongoing or recent genetic exchange between taxa, an increasing number of studies report "ancient introgression" - referring to results of hybridization that took place in the distant past. However, it is not clear whether commonly used methods for the detection of introgression are applicable to such old systems, given that most of these methods were originally developed for analyses at the level of populations and recently diverged species, affected by recent or ongoing genetic exchange. In particular, the assumption of constant evolutionary rates, which is implicit in many commonly used approaches, is more likely to be violated as evolutionary divergence increases. To test the limitations of introgression detection methods when applied to old systems, we simulated thousands of genomic datasets under a wide range of settings, with varying degrees of among-species rate variation and introgression. Using these simulated datasets, we showed that some commonly applied statistical methods, including the D-statistic and certain tests based on sets of local phylogenetic trees, can produce false-positive signals of introgression between divergent taxa that have different rates of evolution. These misleading signals are caused by the presence of homoplasies occurring at different rates in different lineages. To distinguish between the patterns caused by rate variation and genuine introgression, we developed a new test that is based on the expected clustering of introgressed sites along the genome, and implemented this test in the program Dsuite.
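The intuition behind a clustering-based test can be illustrated with a small permutation test. The sketch below is an illustration under stated assumptions, not the test implemented in Dsuite: the statistic (the number of neighbouring informative sites that both show the ABBA pattern), the function names, and the input format are all hypothetical. The reasoning is that sites produced by introgression should sit together in tracts, whereas homoplasies should be scattered among the BABA sites.

```python
import random
from typing import List, Sequence, Tuple


def adjacent_abba_pairs(patterns: Sequence[str]) -> int:
    """Count neighbouring informative sites that both carry the ABBA pattern."""
    return sum(1 for a, b in zip(patterns, patterns[1:]) if a == b == "ABBA")


def abba_clustering_pvalue(
    sites: Sequence[Tuple[int, str]], n_perm: int = 999, seed: int = 0
) -> float:
    """One-sided permutation p-value for clustering of ABBA sites.

    `sites` holds (position, pattern) pairs, where pattern is "ABBA" or "BABA".
    The null hypothesis (an assumption of this sketch) is that the ABBA and
    BABA labels are randomly interspersed among the informative sites.
    """
    # Order the site patterns by genomic position.
    ordered: List[str] = [pattern for _, pattern in sorted(sites)]
    observed = adjacent_abba_pairs(ordered)

    rng = random.Random(seed)
    shuffled = ordered[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any spatial structure in the labels
        if adjacent_abba_pairs(shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (1 + hits) / (1 + n_perm)
```

A small p-value indicates that ABBA sites lie next to each other more often than expected if they were scattered at random, which is the pattern expected from introgressed tracts rather than from rate-driven homoplasies.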
ABSTRACT
Behavior is critical for animal survival and reproduction, and possibly for diversification and evolutionary radiation. However, the genetics behind adaptive variation in behavior are poorly understood. In this work, we examined a fundamental and widespread behavioral trait, exploratory behavior, in one of the largest adaptive radiations on Earth, the cichlid fishes of Lake Tanganyika. By integrating quantitative behavioral data from 57 cichlid species (702 wild-caught individuals) with high-resolution ecomorphological and genomic information, we show that exploratory behavior is linked to macrohabitat niche adaptations in Tanganyikan cichlids. Furthermore, we uncovered a correlation between the genotypes at a single-nucleotide polymorphism upstream of the AMPA glutamate-receptor regulatory gene cacng5b and variation in exploratory tendency. We validated this association using behavioral predictions with a neural network approach and CRISPR-Cas9 genome editing.
Subject(s)
Physiological Adaptation, Animal Behavior, Cichlids, Exploratory Behavior, AMPA Receptors, Animals, Physiological Adaptation/genetics, Cichlids/genetics, Cichlids/physiology, CRISPR-Cas Systems, Ecosystem, Gene Editing, Genotype, Lakes, Single Nucleotide Polymorphism, AMPA Receptors/genetics
ABSTRACT
With the advent of high-throughput genome sequencing, bioinformatics training has become essential for research in evolutionary biology and related fields. However, individual research groups are often not in a position to teach students about the most up-to-date methodology in the field. To fill this gap, extended bioinformatics courses have been developed by various institutions and provide intense training over the course of two or more weeks. Here, we describe our experience with the organization of a course in one of the longest-running extended bioinformatics series of workshops, the Evomics Workshop on Population and Speciation Genomics that takes place biennially in the UNESCO world heritage town of Český Krumlov, Czech Republic. We list the key ingredients that make this workshop successful in our view, explain the routine for workshop organization that we have optimized over the years, and describe the most important lessons that we have learned from it. We report the results of a survey conducted among past workshop participants that quantifies measures of effective teaching and provide examples of how the workshop setting has led to the cross-fertilisation of ideas and ultimately scientific progress. We expect that our account may be useful for other groups aiming to set up their own extended bioinformatics courses.
ABSTRACT
Epigenetic variation can alter transcription and promote phenotypic divergence between populations facing different environmental challenges. Here, we assess the epigenetic basis of diversification during the early stages of speciation. Specifically, we focus on the extent and functional relevance of DNA methylome divergence in the very young radiation of Astatotilapia calliptera in crater Lake Masoko, southern Tanzania. Our study focuses on two lake ecomorphs that diverged approximately 1,000 years ago and a population in the nearby river from which they separated approximately 10,000 years ago. The two lake ecomorphs show no fixed genetic differentiation, yet are characterized by different morphologies, depth preferences and diets. We report extensive genome-wide methylome divergence between the two lake ecomorphs, and between the lake and river populations, linked to key biological processes and associated with altered transcriptional activity of ecologically relevant genes. Such genes differing between lake ecomorphs include those involved in steroid metabolism, hemoglobin composition and erythropoiesis, consistent with their divergent habitat occupancy. Using a common-garden experiment, we found that global methylation profiles are often rapidly remodeled across generations but ecomorph-specific differences can be inherited. Collectively, our study suggests an epigenetic contribution to the early stages of vertebrate speciation.
Subject(s)
Cichlids, Lakes, Animals, Biological Evolution, Cichlids/genetics, Ecosystem, Genetic Epigenesis
ABSTRACT
Cichlid fish of the genus Oreochromis form the basis of the global tilapia aquaculture and fisheries industries. Broodstocks for aquaculture are often collected from wild populations, which in Africa may be from locations containing multiple Oreochromis species. However, many species are difficult to distinguish morphologically, hampering efforts to maintain good-quality farmed strains. Additionally, non-native farmed tilapia populations are known to be widely distributed across Africa and to hybridize with native Oreochromis species, which themselves are important for capture fisheries. The morphological identification of these hybrids is particularly unreliable. Here, we describe the development of a single nucleotide polymorphism (SNP) genotyping panel from whole-genome resequencing data that enables targeted species identification in Tanzania. We demonstrate that an optimized panel of 96 genome-wide SNPs based on FST outliers performs comparably to whole-genome resequencing in distinguishing species and identifying hybrids. We also show that this panel outperforms microsatellite-based and phenotype-based classification methods. Case studies indicate several locations where introduced aquaculture species have become established in the wild, threatening native Oreochromis species. The novel SNP markers identified here represent an important resource for assessing broodstock purity in hatcheries and helping to conserve unique endemic biodiversity.
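The idea of choosing diagnostic markers from FST outliers can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes two populations with equal weight, uses the textbook definition FST = (HT - HS) / HT on allele frequencies without any sample-size correction, and the function names and SNP identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def two_population_fst(p1: float, p2: float) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    p1 and p2 are the frequencies of the same allele in two populations,
    which are given equal weight (an assumption of this sketch).
    """
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # monomorphic site carries no differentiation signal
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total


def top_fst_snps(freqs: Dict[str, Tuple[float, float]], panel_size: int) -> List[str]:
    """Return the `panel_size` SNP identifiers with the highest F_ST."""
    ranked = sorted(
        freqs, key=lambda snp: two_population_fst(*freqs[snp]), reverse=True
    )
    return ranked[:panel_size]
```

In practice a panel would also need to account for sample size, linkage between markers, and more than two species; the ranking step above only conveys why highly differentiated SNPs are informative for species assignment.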
ABSTRACT
Epigenetic variation modulates gene expression and can be heritable. However, knowledge of the contribution of epigenetic divergence to adaptive diversification in nature remains limited. The massive evolutionary radiation of Lake Malawi cichlid fishes, displaying extensive phenotypic diversity despite extremely low sequence divergence, is an excellent system to study the epigenomic contribution to adaptation. Here, we present a comparative genome-wide methylome and transcriptome study, focussing on liver and muscle tissues in phenotypically divergent cichlid species. In both tissues we find substantial methylome divergence among species. Differentially methylated regions (DMRs), enriched in evolutionarily young transposons, are associated with transcription changes of ecologically relevant genes related to energy expenditure and lipid metabolism, pointing to a link between dietary ecology and methylome divergence. Unexpectedly, half of all species-specific DMRs are shared across tissues and are enriched in developmental genes, likely reflecting distinct epigenetic developmental programmes. Our study reveals substantial methylome divergence in closely related cichlid fishes and represents a resource to study the role of epigenetics in species diversification.
Subject(s)
Chromosome Mapping, Cichlids/genetics, Genetic Epigenesis, Molecular Evolution, Animals, DNA Transposable Elements, Epigenome, Gene Expression, Genomics, Lakes, Liver, Malawi, DNA Sequence Analysis, Species Specificity
ABSTRACT
Lake Malawi cichlid fishes exhibit extensive divergence in form and function built from a relatively small number of genetic changes. We compared the genomes of rock- and sand-dwelling species and asked which genetic variants differed among the groups. We found that 96% of differentiated variants reside in non-coding sequence, but these non-coding diverged variants are evolutionarily conserved. Genome regions near differentiated variants are enriched for craniofacial, neural and behavioral categories. Following leads from genome sequence, we used rock- vs. sand-species and their hybrids to (i) delineate the push-pull roles of BMP signaling and irx1b in the specification of forebrain territories during gastrulation and (ii) reveal striking context-dependent brain gene expression during adult social behavior. Our results demonstrate how divergent genome sequences can predict differences in key evolutionary traits. We highlight the promise of evolutionary reverse genetics: the inference of phenotypic divergence from unbiased genome sequencing, followed by empirical validation in natural populations.
Subject(s)
Animal Behavior, Biological Evolution, Brain/physiology, Genome, Genomics, Animals, Cichlids/classification, Cichlids/physiology, Genomics/methods, Phylogeny, Transcriptome
ABSTRACT
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
Subject(s)
Genome, Genomics/methods, Vertebrates/genetics, Animals, Birds, Gene Library, Genome Size, Mitochondrial Genome, Haplotypes, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, Sequence Alignment, DNA Sequence Analysis, Sex Chromosomes/genetics
ABSTRACT
Patterson's D, also known as the ABBA-BABA statistic, and related statistics such as the f4-ratio, are commonly used to assess evidence of gene flow between populations or closely related species. Currently available implementations often require custom file formats, implement only small subsets of the available statistics, and are too computationally inefficient to evaluate all gene flow hypotheses across data sets with many populations or species. Here, we present a new software package, Dsuite, an efficient implementation allowing genome-scale calculations of the D and f4-ratio statistics across all combinations of tens or hundreds of populations or species directly from a variant call format (VCF) file. Our program also implements statistics suited for application to genomic windows, providing evidence of whether introgression is confined to specific loci, and it can also aid in the interpretation of a system of f4-ratio results with the use of the "f-branch" method. Dsuite is available at https://github.com/millanek/Dsuite, is straightforward to use, substantially more computationally efficient than comparable programs, and provides a convenient suite of tools and statistics, including some not previously available in any software package. Thus, Dsuite facilitates the assessment of evidence for gene flow, especially across larger genomic data sets.
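For readers unfamiliar with the statistic, the following minimal sketch shows how Patterson's D can be computed from per-site derived-allele frequencies. It is not Dsuite's code or interface: the function name and input format are assumptions made for illustration, and the outgroup is assumed to be fixed for the ancestral allele at every site. For the tree (((P1, P2), P3), Outgroup), the expected ABBA and BABA counts at a site are (1 - p1) * p2 * p3 and p1 * (1 - p2) * p3, and D is their normalized difference summed over sites.

```python
from typing import Sequence


def patterson_d(
    p1: Sequence[float], p2: Sequence[float], p3: Sequence[float]
) -> float:
    """Patterson's D for the tree (((P1, P2), P3), Outgroup).

    Each argument lists derived-allele frequencies per site for one
    population. The outgroup is assumed to be fixed for the ancestral
    allele (an assumption of this sketch), so it does not appear here.
    """
    if not (len(p1) == len(p2) == len(p3)):
        raise ValueError("frequency vectors must have the same length")

    # Expected counts of the two discordant site patterns, summed over sites.
    abba = sum((1 - a) * b * c for a, b, c in zip(p1, p2, p3))
    baba = sum(a * (1 - b) * c for a, b, c in zip(p1, p2, p3))

    if abba + baba == 0:
        raise ValueError("no informative (ABBA or BABA) sites")
    return (abba - baba) / (abba + baba)
```

A value near zero is consistent with incomplete lineage sorting alone, while a positive value indicates excess allele sharing between P2 and P3 and a negative value excess sharing between P1 and P3. Assessing significance requires an additional step, such as a block jackknife, that is not shown here.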
Subject(s)
Genomics, Software, Gene Flow, Genome
ABSTRACT
Evolutionary radiations are responsible for much of the variation in biodiversity across taxa. Cichlid fishes are well known for spectacular evolutionary radiations, as they have repeatedly evolved into large and phenotypically diverse arrays of species. Cichlid genomes carry signatures of past events and, at the same time, are the substrate for ongoing evolution. We survey genome-wide data and the available literature covering 438 cichlid populations (412 species) across multiple radiations to synthesize information about patterns and sharing of genetic variation. Nucleotide diversity within species is low in cichlids, with 92% of surveyed populations having less diversity than the median value found in other vertebrates. Divergence within radiations is also low, and a large proportion of variation is shared among species due to incomplete lineage sorting and widespread hybridization. Population genetics therefore provides a suitable conceptual framework for evolutionary genomic studies of cichlid radiations. We focus in detail on the roles of hybridization in shaping the patterns of genetic variation and in promoting cichlid diversification.
Subject(s)
Biological Evolution, Cichlids/genetics, Genetic Variation, Genetic Hybridization, Animals, Cichlids/classification, Genome, Phylogeny
ABSTRACT
Adaptive radiation is the likely source of much of the ecological and morphological diversity of life [1-4]. How adaptive radiations proceed and what determines their extent remain unclear in most cases [1,4]. Here we report the in-depth examination of the spectacular adaptive radiation of cichlid fishes in Lake Tanganyika. On the basis of whole-genome phylogenetic analyses, multivariate morphological measurements of three ecologically relevant trait complexes (body shape, upper oral jaw morphology and lower pharyngeal jaw shape), scoring of pigmentation patterns and approximations of the ecology of nearly all of the approximately 240 cichlid species endemic to Lake Tanganyika, we show that the radiation occurred within the confines of the lake and that morphological diversification proceeded in consecutive trait-specific pulses of rapid morphospace expansion. We provide empirical support for two theoretical predictions of how adaptive radiations proceed, the 'early-burst' scenario [1,5] (for body shape) and the stages model [1,6,7] (for all traits investigated). Through the analysis of two genomes per species and by taking advantage of the uneven distribution of species in subclades of the radiation, we further show that species richness scales positively with per-individual heterozygosity, but is not correlated with transposable element content, number of gene duplications or genome-wide levels of selection in coding sequences.
Subject(s)
Biological Evolution, Cichlids/classification, Cichlids/genetics, Somatotypes/genetics, Africa, Animals, Calibration, Cichlids/anatomy & histology, Female, Genetic Speciation, Genomics, Heterozygote, Jaw/anatomy & histology, Lakes, Male, Phenotype, Time Factors
ABSTRACT
African cichlid fishes are a prime model for studying speciation mechanisms. Despite the development of extensive genomic resources, it has been difficult to determine which sources of genetic variation are responsible for cichlid phenotypic variation. One of their most variable phenotypes is visual sensitivity, with some of the largest spectral shifts among vertebrates. These shifts arise primarily from differential expression of seven cone opsin genes. By mapping expression quantitative trait loci (eQTL) in intergeneric crosses of Lake Malawi cichlids, we previously identified four causative genetic variants that correspond to indels in the promoters of either key transcription factors or an opsin gene. In this comprehensive study, we show that these indels are the result of the movement of transposable elements (TEs) that correlate with opsin expression variation across the Malawi flock. In tracking the evolutionary history of these particular indels, we found they are endemic to Lake Malawi, suggesting that these TEs are recently active and are segregating within the Malawi cichlid lineage. However, an independent indel has arisen at a similar genomic location in one locus outside of the Malawi flock. The convergence in TE movement suggests these loci are primed for TE insertion and subsequent deletions. Increased TE mobility may be associated with interspecific hybridization, which disrupts mechanisms of TE suppression. This might provide a link between cichlid hybridization and accelerated regulatory variation. Overall, our study suggests that TEs may be an important driver of key regulatory changes, facilitating rapid phenotypic change and possibly speciation in African cichlids.
Subject(s)
Cichlids, Cone Opsins, Animals, Cichlids/genetics, Cone Opsins/genetics, DNA Transposable Elements/genetics, Malawi, Opsins/genetics, Phylogeny
ABSTRACT
The adaptive radiation of cichlid fishes in East African Lake Malawi encompasses over 500 species that are believed to have evolved within the last 800,000 years from a common founder population. It has been proposed that hybridization between ancestral lineages can provide the genetic raw material to fuel such exceptionally high diversification rates, and evidence for this has recently been presented for the Lake Victoria region cichlid superflock. Here, we report that Lake Malawi cichlid genomes also show evidence of hybridization between two lineages that split 3-4 Ma, today represented by Lake Victoria cichlids and the riverine Astatotilapia sp. "ruaha blue." The two ancestries in Malawi cichlid genomes are present in large blocks of several kilobases, but there is little variation in this pattern between Malawi cichlid species, suggesting that the large-scale mosaic structure of the genomes was largely established prior to the radiation. Nevertheless, tens of thousands of polymorphic variants apparently derived from the hybridization are interspersed in the genomes. These loci show a striking excess of differentiation across ecological subgroups in the Lake Malawi cichlid assemblage, and parental alleles sort differentially into benthic and pelagic Malawi cichlid lineages, consistent with strong differential selection on these loci during species divergence. Furthermore, these loci are enriched for genes involved in immune response and vision, including opsin genes previously identified as important for speciation. Our results reinforce the role of ancestral hybridization in explosive diversification by demonstrating its significance in one of the largest recent vertebrate adaptive radiations.
Subject(s)
Biological Adaptation/genetics, Cichlids/genetics, Genetic Speciation, Genetic Hybridization, Animals, Gene Flow, Haplotypes, Lakes, Malawi, Genetic Polymorphism
ABSTRACT
The hundreds of cichlid fish species in Lake Malawi constitute the most extensive recent vertebrate adaptive radiation. Here we characterize its genomic diversity by sequencing 134 individuals covering 73 species across all major lineages. The average sequence divergence between species pairs is only 0.1-0.25%. These divergence values overlap diversity within species, with 82% of heterozygosity shared between species. Phylogenetic analyses suggest that diversification initially proceeded by serial branching from a generalist Astatotilapia-like ancestor. However, no single species tree adequately represents all species relationships, with evidence for substantial gene flow at multiple times. Common signatures of selection on visual and oxygen transport genes shared by distantly related deep-water species point to both adaptive introgression and independent selection. These findings enhance our understanding of genomic processes underlying rapid species diversification, and provide a platform for future genetic analysis of the Malawi radiation.
Subject(s)
Biological Evolution, Cichlids/genetics, Gene Flow, Genome, Animals, Genetic Variation, Lakes, Malawi, Phylogeny, Tanzania, Whole Genome Sequencing
ABSTRACT
Powerful approaches to inferring recent or current population structure based on nearest neighbor haplotype "coancestry" have so far been inaccessible to users without high-quality genome-wide haplotype data. With a boom in nonmodel organism genomics, there is a pressing need to bring these methods to communities without access to such data. Here, we present RADpainter, a new program designed to infer the coancestry matrix from restriction-site-associated DNA sequencing (RADseq) data. We combine this program with a previously published MCMC clustering algorithm into fineRADstructure, a complete, easy-to-use, and fast population inference package for RADseq data (https://github.com/millanek/fineRADstructure; last accessed February 24, 2018). Finally, with two example data sets, we illustrate its use, benefits, and robustness to missing RAD alleles in double digest RAD sequencing.
Subject(s)
Genomics/methods, Software, Alleles, Caryophyllaceae/genetics, Population, DNA Sequence Analysis
ABSTRACT
The genomic causes and effects of divergent ecological selection during speciation are still poorly understood. Here we report the discovery and detailed characterization of early-stage adaptive divergence of two cichlid fish ecomorphs in a small (700 meters in diameter) isolated crater lake in Tanzania. The ecomorphs differ in depth preference, male breeding color, body shape, diet, and trophic morphology. With whole-genome sequences of 146 fish, we identified 98 clearly demarcated genomic "islands" of high differentiation and demonstrated the association of genotypes across these islands with divergent mate preferences. The islands contain candidate adaptive genes enriched for functions in sensory perception (including rhodopsin and other twilight-vision-associated genes), hormone signaling, and morphogenesis. Our study suggests mechanisms and genomic regions that may play a role in the closely related mega-radiation of Lake Malawi.
Subject(s)
Physiological Adaptation/genetics, Cichlids/genetics, Cichlids/physiology, Genomic Islands, Animal Mating Preference, Animals, Cichlids/classification, Lakes, Phylogeny, Single Nucleotide Polymorphism, Species Specificity, Tanzania
ABSTRACT
Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.