ABSTRACT
The genome skimming approach is widely used in plant systematics to infer phylogenies, mostly from organelle genomes. However, organelles represent only 10% of the produced libraries, and the low coverage associated with these libraries (<3X) prevents the capture of nuclear sequences, which are not always available in non-model organisms or are limited to the ribosomal regions. We developed REFMAKER, a user-friendly pipeline, to create specific sets of nuclear loci that can be extracted directly from genome skimming libraries. For this, a catalogue is built from the meta-assembly of the contigs of each library, then cleaned by selecting the nuclear regions and removing duplicates through clustering steps. Libraries are then mapped onto this catalogue, and consensus sequences are generated to produce a ready-to-use phylogenetic matrix, following filtering parameters that aim to remove putative errors and paralogous sequences. REFMAKER allowed us to infer a well-resolved phylogeny of Capurodendron (Sapotaceae) from 67 nuclear loci recovered from low-coverage libraries (<1X). The resulting phylogeny is congruent with one previously inferred from 638 nuclear genes obtained from target enrichment libraries. Although this result remains preliminary because of the low sequencing depth, REFMAKER opens perspectives in phylogenomics by allowing nuclear phylogeny reconstructions from genome skimming datasets.
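As an illustration of the consensus step described above, the following minimal sketch calls one base per position from the reads mapped at that position and masks positions that fail a depth or agreement filter. It is not REFMAKER's code: the function name `consensus_from_pileup` and the thresholds `min_depth` and `min_freq` are hypothetical, chosen only to show how such filters could remove putative errors at low coverage.

```python
from collections import Counter


def consensus_from_pileup(pileup, min_depth=3, min_freq=0.7):
    """Call one consensus base per position.

    pileup: list of lists, each inner list holding the bases observed
            at one position of a catalogue locus (hypothetical input).
    min_depth, min_freq: illustrative thresholds, not REFMAKER defaults.
    """
    consensus = []
    for bases in pileup:
        depth = len(bases)
        # Too few reads: mask the position instead of guessing.
        if depth < min_depth:
            consensus.append("N")
            continue
        base, count = Counter(bases).most_common(1)[0]
        # Keep the majority base only if it is clearly dominant.
        consensus.append(base if count / depth >= min_freq else "N")
    return "".join(consensus)


pileup = [list("AAAA"), list("CC"), list("GGT"), list("TTTA"), []]
print(consensus_from_pileup(pileup))  # ANNTN
```

Positions with conflicting bases are masked rather than resolved, which is one simple way to limit the effect of sequencing errors or paralogous reads on the final matrix.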
Subjects
Sapotaceae, Phylogeny, Cell Nucleus/genetics
ABSTRACT
Next-generation sequencing technologies have opened a new era of research in population genetics. Following these new sequencing opportunities, the use of restriction enzyme-based genotyping techniques, such as restriction site-associated DNA sequencing (RAD-seq) or double-digest RAD-sequencing (ddRAD-seq), has dramatically increased in the last decade. From DNA sampling to SNP calling, the laboratory and bioinformatic parameters of enzyme-based techniques have been investigated in the literature. However, the impact of those parameters on downstream analyses and biological results remains less documented. In this study, we investigated the effects of several pre- and post-sequencing settings on ddRAD-seq results for two biological systems: a complex of butterfly species (Coenonympha sp.) and several populations of common beech (Fagus sylvatica). Our results suggest that pre-sequencing parameters (i.e., DNA quantity, number of PCR cycles during library preparation) have a significant impact on the number of recovered reads and SNPs, on the number of unique alleles and on individual heterozygosity. In the same way, we found that post-sequencing settings (i.e., clustering and minimum coverage thresholds) influenced loci reconstruction (e.g., number of loci, mean coverage) and SNP calling (e.g., number of SNPs; heterozygosity) but had only a marginal impact on downstream analyses (e.g., measure of genetic differentiation, estimation of individual admixture, and demographic inferences). In addition, replication analyses confirmed the reproducibility of the ddRAD-seq procedure. Overall, this study assesses the degree of sensitivity of ddRAD-seq data to pre- and post-sequencing protocols, and illustrates its robustness when studying population genetics.
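To make the role of a minimum coverage threshold concrete, the sketch below computes the number of called sites and the individual heterozygosity from per-site read counts, for a given threshold. It is a toy example under stated assumptions: the function `individual_heterozygosity`, the `(ref_reads, alt_reads)` input format and the naive rule "heterozygous if both alleles are observed" are all hypothetical and are not taken from the study's pipeline.

```python
def individual_heterozygosity(sites, min_cov=6):
    """Return (number of called sites, fraction of heterozygous sites).

    sites: list of (ref_reads, alt_reads) tuples for one individual
           (hypothetical input format).
    min_cov: minimum total depth required to call a site (illustrative).
    """
    # Keep only sites with enough reads to be called at all.
    called = [(ref, alt) for ref, alt in sites if ref + alt >= min_cov]
    if not called:
        return 0, None
    # Naive genotype rule: both alleles observed means heterozygous.
    het = sum(1 for ref, alt in called if ref > 0 and alt > 0)
    return len(called), het / len(called)


sites = [(10, 0), (4, 5), (2, 1), (0, 8), (3, 3)]
for threshold in (3, 6, 9):
    print(threshold, individual_heterozygosity(sites, min_cov=threshold))
```

Running the loop shows both the number of called sites and the heterozygosity estimate shifting with the threshold, which is the kind of sensitivity the abstract reports for SNP calling.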
Subjects
Butterflies/genetics, Fagus/genetics, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Alleles, Animals, Computational Biology/methods, DNA Restriction Enzymes/metabolism, Population Genetics, Single Nucleotide Polymorphism, Reproducibility of Results
ABSTRACT
PREMISE: Events of accelerated species diversification represent one of Earth's most celebrated evolutionary outcomes. Northern Andean high-elevation ecosystems, or páramos, host some plant lineages that have experienced the fastest diversification rates, likely triggered by ecological opportunities created by mountain uplifts, local climate shifts, and key trait innovations. However, the mechanisms behind rapid speciation into the new adaptive zone provided by these opportunities have long remained unclear. METHODS: We address this issue by studying the Venezuelan clade of Espeletia, a species-rich group of páramo-endemics showing a dazzling ecological and morphological diversity. We performed several comparative analyses to study both lineage and trait diversification, using an updated molecular phylogeny of this plant group. RESULTS: We showed that sets of either vegetative or reproductive traits have conjointly diversified in Espeletia along different vegetation belts, leading to adaptive syndromes. Diversification in vegetative traits occurred earlier than in reproductive ones. The rate of species and morphological diversification showed a tendency to slow down over time, probably due to diversity dependence. We also found that closely related species exhibit significantly more overlap in their geographic distributions than distantly related taxa, suggesting that most events of ecological divergence occurred at close geographic proximity within páramos. CONCLUSIONS: These results provide compelling support for a scenario of small-scale ecological divergence along multiple ecological niche dimensions, possibly driven by competitive interactions between species, and acting sequentially over time in a leapfrog pattern.
Subjects
Asteraceae, Radiation, Biological Evolution, Ecosystem, Genetic Speciation, Phylogeny
ABSTRACT
The subtribe Espeletiinae (Asteraceae), endemic to the high-elevations in the Northern Andes, exhibits an exceptional diversity of species, growth-forms, and reproductive strategies. This complex of 140 species includes large trees, dichotomous trees, shrubs and the extraordinary giant caulescent rosettes, considered as a classic example of adaptation in tropical high-elevation ecosystems. The subtribe has also long been recognized as a prominent case of adaptive radiation, but the understanding of its evolution has been hampered by a lack of phylogenetic resolution. Herein, we produce the first fully resolved phylogeny of all morphological groups of Espeletiinae, using whole plastomes and about a million nuclear nucleotides obtained with an original de novo assembly procedure without reference genome, and analyzed with traditional and coalescent-based approaches that consider the possible impact of incomplete lineage sorting and hybridization on phylogenetic inference. We show that the diversification of Espeletiinae started from a rosette ancestor about 2.3 Ma, after the final uplift of the Northern Andes. This was followed by two independent radiations in the Colombian and Venezuelan Andes, with a few trans-cordilleran dispersal events among low-elevation tree lineages but none among high-elevation rosettes. We demonstrate complex scenarios of morphological change in Espeletiinae, usually implying the convergent evolution of growth-forms with frequent loss/gains of various traits. For instance, caulescent rosettes evolved independently in both countries, likely as convergent adaptations to life in tropical high-elevation habitats. Tree growth-forms evolved independently three times from the repeated colonization of lower elevations by high-elevation rosette ancestors. The rate of morphological diversification increased during the early phase of the radiation, after which it decreased steadily towards the present. 
On the other hand, the rate of species diversification in the best-sampled Venezuelan radiation was on average very high (3.1 spp/My), with significant rate variation among growth-forms (much higher in polycarpic caulescent rosettes). Our results point out a scenario where both adaptive morphological evolution and geographical isolation due to Pleistocene climatic oscillations triggered an exceptionally rapid radiation for a continental plant group.
Subjects
Asteraceae/classification, Asteraceae/genetics, Plant Genome/genetics, Phylogeny, Physiological Adaptation/genetics, Colombia, Tropical Climate, Venezuela
ABSTRACT
Low-coverage whole genome shotgun sequencing (or genome skimming) has emerged as a cost-effective method for acquiring genomic data in nonmodel organisms. This method provides sequence information on chloroplast genome (cpDNA), mitochondrial genome (mtDNA) and nuclear ribosomal regions (rDNA), which are over-represented within cells. However, numerous bioinformatic challenges remain to accurately and rapidly obtain such data in organisms with complex genomic structures and rearrangements, in particular for mtDNA in plants or for cpDNA in some plant families. Here we introduce the pipeline ORTHOSKIM, which performs in silico capture of targeted sequences from genomic and transcriptomic libraries without assembling whole organelle genomes. ORTHOSKIM proceeds in three steps: (i) global sequence assembly, (ii) mapping against reference sequences and (iii) target sequence extraction; importantly it also includes a range of quality control tests. Different modes are implemented to capture both coding and noncoding regions of cpDNA, mtDNA and rDNA sequences, along with predefined nuclear sequences (e.g., ultraconserved elements) or collections of single-copy orthologue genes. Moreover, aligned DNA matrices are produced for phylogenetic reconstructions, by performing multiple alignments of the captured sequences. While ORTHOSKIM is suitable for any eukaryote, a case study is presented here, using 114 genome-skimming libraries and four RNA sequencing libraries obtained for two plant families, Primulaceae and Ericaceae, the latter being a well-known problematic family for cpDNA assemblies. ORTHOSKIM recovered with high success rates cpDNA, mtDNA and rDNA sequences, well suited to accurately infer evolutionary relationships within these families. ORTHOSKIM is released under a GPL-3 licence and is available at: https://github.com/cpouchon/ORTHOSKIM.
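The third step named above, target sequence extraction, can be pictured with the toy sketch below: given assembled contigs and a list of mapping hits against reference targets, it keeps the best hit per target and slices the matching region out of the contig. This is only an illustration of the idea, not ORTHOSKIM's implementation; the function `extract_targets`, the hit dictionary keys and the `min_identity` threshold are hypothetical.

```python
def extract_targets(contigs, hits, min_identity=0.8):
    """Keep the best-scoring hit per target and extract its sequence.

    contigs: dict mapping contig name to assembled sequence.
    hits: list of dicts with keys 'target', 'contig', 'start', 'end',
          'identity' (hypothetical format; coordinates are 0-based,
          end-exclusive).
    min_identity: illustrative quality filter.
    """
    best = {}
    for hit in hits:
        # Discard weak matches before comparing hits.
        if hit["identity"] < min_identity:
            continue
        current = best.get(hit["target"])
        if current is None or hit["identity"] > current["identity"]:
            best[hit["target"]] = hit
    # Slice the matched region out of the corresponding contig.
    return {
        target: contigs[hit["contig"]][hit["start"]:hit["end"]]
        for target, hit in best.items()
    }


contigs = {"c1": "ATGGCCATTGTAATGGGCCGC", "c2": "TTTTATGGCCAAAA"}
hits = [
    {"target": "rbcL", "contig": "c1", "start": 0, "end": 9, "identity": 0.95},
    {"target": "rbcL", "contig": "c2", "start": 4, "end": 10, "identity": 0.85},
    {"target": "matK", "contig": "c2", "start": 0, "end": 4, "identity": 0.60},
]
print(extract_targets(contigs, hits))  # {'rbcL': 'ATGGCCATT'}
```

In this toy input, the lower-identity rbcL hit is superseded and the weak matK hit is filtered out, so only one target is recovered.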
Subjects
Chloroplast Genome, Transcriptome, Chloroplast DNA/genetics, Mitochondrial DNA/genetics, Ribosomal DNA/genetics, Genomics/methods, Phylogeny, DNA Sequence Analysis/methods
ABSTRACT
Genome skimming has the potential for generating large data sets for DNA barcoding and wider biodiversity genomic studies, particularly via the assembly and annotation of full chloroplast (cpDNA) and nuclear ribosomal DNA (nrDNA) sequences. We compare the success of genome skims of 2051 herbarium specimens from Norway/Polar regions with 4604 freshly collected, silica gel dried specimens mainly from the European Alps and the Carpathians. Overall, we were able to assemble the full chloroplast genome for 67% of the samples and the full nrDNA cluster for 86%. Average insert length, cover and full cpDNA and rDNA assembly were considerably higher for silica gel dried than herbarium-preserved material. However, complete plastid genomes were still assembled for 54% of herbarium samples compared to 70% of silica dried samples. Moreover, there was comparable recovery of coding genes from both tissue sources (121 for silica gel dried and 118 for herbarium material) and only minor differences in assembly success of standard barcodes between silica dried (89% ITS2, 96% matK and rbcL) and herbarium material (87% ITS2, 98% matK and rbcL). The success rate was > 90% for all three markers in 1034 of 1036 genera in 160 families, and only Boraginaceae worked poorly, with 7 genera failing. Our study shows that large-scale genome skims are feasible and work well across most of the land plant families and genera we tested, independently of material type. It is therefore an efficient method for increasing the availability of plant biodiversity genomic data to support a multitude of downstream applications.