Abstract
Pteronarcys californica (Newport 1848) is commonly referred to as the giant salmonfly and is the largest species of stonefly (Insecta: Plecoptera) in the western United States. Historically, it was widespread and abundant in western rivers, but populations have experienced a substantial decline in the past few decades, becoming locally extirpated in numerous rivers in Utah, Colorado, and Montana. Although previous research has explored the ecological variables conducive to the survivability of populations of the giant salmonfly, a lack of genomic resources hampers exploration of how genetic variation is spread across extant populations. To accelerate research on this imperiled species, we present a de novo chromosomal-length genome assembly of P. californica generated from PacBio HiFi sequencing and Hi-C chromosome conformation capture. Our assembly includes 14 predicted pseudo chromosomes and 98.8% of Insecta universal core orthologs. At 2.40 gigabases, the P. californica assembly is the largest of available stonefly assemblies, highlighting at least 9.5-fold variation in assembly size across the order. Repetitive elements (REs) account for much of the genome size increase in P. californica relative to other stonefly species, with the content of Class I retroelements alone exceeding the entire assembly size of all but two other species studied. We also observed preliminary suborder-specific trends in genome size that merit testing with more robust taxon sampling.
Abstract
Repetitive elements (REs) are integral to the composition, structure, and function of eukaryotic genomes, yet remain understudied in most taxonomic groups. We investigated REs across 601 insect species and report wide variation in RE dynamics across groups. Analysis of associations between REs and protein-coding genes revealed dynamic evolution at the interface between REs and coding regions across insects, including notably elevated RE-gene associations in lineages with abundant long interspersed nuclear elements (LINEs). We leveraged this large, empirical data set to quantify impacts of long-read technology on RE detection and investigate fundamental challenges to RE annotation in diverse groups. In long-read assemblies, we detected ~36% more REs than short-read assemblies, with long terminal repeats (LTRs) showing 162% increased detection, whereas DNA transposons and LINEs showed less respective technology-related bias. In most insect lineages, 25%-85% of repetitive sequences were "unclassified" following automated annotation, compared with only ~13% in Drosophila species. Although the diversity of available insect genomes has rapidly expanded, we show the rate of community contributions to RE databases has not kept pace, preventing efficient annotation and high-resolution study of REs in most groups. We highlight the tremendous opportunity and need for the biodiversity genomics field to embrace REs and suggest collective steps for making progress toward this goal.
Subject(s)
Genomics , Nucleic Acid Repetitive Sequences , Insect Genome , Terminal Repeat Sequences , DNA Transposable Elements
Abstract
A new species of the carabid beetle genus Bembidion Latreille is described from the Central Valley, Los Angeles Basin, and surrounding areas of California. Bembidion brownorum sp. nov. is a distinctive species, a relatively large member of the subgenus Notaphus Dejean, and within Notaphus a member of the B. obtusangulum LeConte species group. It has faint spots on the elytra and a large, convex, rounded prothorax. Of the 22 specimens from 11 localities, all but one were collected more than 55 years ago. Although the collection of the holotype in 2021 at UV light suggests the species is still extant, the lack of other recent specimens suggests the species may have a more restricted distribution than in the past, and its populations may be in decline.
Abstract
Long-read sequencing is driving a new reality for genome science in which highly contiguous assemblies can be produced efficiently with modest resources. Genome assemblies from long-read sequences are particularly exciting for understanding the evolution of complex genomic regions that are often difficult to assemble. In this study, we utilized long-read sequencing data to generate a high-quality genome assembly for an Antarctic eelpout, Ophthalmolycus amberensis, the first for the globally distributed family Zoarcidae. We used this assembly to understand how O. amberensis has adapted to the harsh Southern Ocean and compared it to another group of Antarctic fishes: the notothenioids. We showed that selection has largely acted on different targets in eelpouts relative to notothenioids. However, we did find some overlap; in both groups, genes involved in membrane structure, thermal tolerance and vision have evidence of positive selection. We found evidence for historical shifts of transposable element activity in O. amberensis and other polar fishes, perhaps reflecting a response to environmental change. We were specifically interested in the evolution of two complex genomic loci known to underlie key adaptations to polar seas: haemoglobin and antifreeze proteins (AFPs). We observed unique evolution of the haemoglobin MN cluster in eelpouts and related fishes in the suborder Zoarcoidei relative to other Perciformes. For AFPs, we identified the first species in the suborder with no evidence of afpIII sequences (Cebidichthys violaceus) in the genomic region where they are found in all other Zoarcoidei, potentially reflecting a lineage-specific loss of this cluster. Beyond polar fishes, our results highlight the power of long-read sequencing to understand genome evolution.
Subject(s)
Fishes , Perciformes , Animals , Fishes/genetics , Physiological Adaptation/genetics , Perciformes/genetics , Acclimatization , Hemoglobins
Abstract
Djulis (Chenopodium formosanum Koidz.) is a crop grown since antiquity in Taiwan. It is a BCD-genome hexaploid (2n = 6x = 54) domesticated form of lambsquarters (C. album L.) and a relative of the allotetraploid (AABB) C. quinoa. As with quinoa, djulis seed contains a complete protein profile and many nutritionally important vitamins and minerals. While still sold locally in Taiwanese markets, its traditional culinary uses are being lost as diets of younger generations change. Moreover, indigenous Taiwanese peoples who have long safeguarded djulis are losing their traditional farmlands. We used PacBio sequencing and Hi-C-based scaffolding to produce a chromosome-scale, reference-quality assembly of djulis. The final genome assembly spans 1.63 Gb in 798 scaffolds, with 97.8% of the sequence contained in 27 scaffolds representing the nine haploid chromosomes of each sub-genome of the species. Benchmarking of universal, single-copy orthologs indicated that 98.5% of the conserved orthologous genes for Viridiplantae are complete within the assembled genome, with 92.9% duplicated, as expected for a polyploid. A total of 67.8% of the assembly is repetitive, with the most common repeat being Gypsy long terminal repeat retrotransposons, which had significantly expanded in the B sub-genome. Gene annotation using Iso-Seq data from multiple tissues identified 75,056 putative gene models. Comparisons to quinoa showed strong patterns of synteny which allowed for the identification of homoeologous chromosomes, and sub-genome-specific sequences were used to assign homoeologs to each sub-genome. These results represent the first hexaploid genome assembly and the first assemblies of the C and D genomes of the Chenopodioideae subfamily.
Subject(s)
Chenopodium , Chenopodium/genetics , Plant Chromosomes/genetics , Plant Genome , Polyploidy , Synteny
Abstract
BACKGROUND: Genome size is implicated in the form, function, and ecological success of a species. Two principally different mechanisms are proposed as major drivers of eukaryotic genome evolution and diversity: polyploidy (i.e., whole-genome duplication) or smaller duplication events and bursts in the activity of repetitive elements. Here, we generated de novo genome assemblies of 17 caddisflies covering all major lineages of Trichoptera. Using these and previously sequenced genomes, we use caddisflies as a model for understanding genome size evolution in diverse insect lineages. RESULTS: We detect a ~14-fold variation in genome size across the order Trichoptera. We find strong evidence that repetitive element expansions, particularly those of transposable elements (TEs), are important drivers of large caddisfly genome sizes. Using an innovative method to examine TEs associated with universal single-copy orthologs (i.e., BUSCO genes), we find that TE expansions have a major impact on protein-coding gene regions, with TE-gene associations showing a linear relationship with increasing genome size. Intriguingly, we find that expanded genomes preferentially evolved in caddisfly clades with a higher ecological diversity (i.e., various feeding modes, diversification in variable, less stable environments). CONCLUSION: Our findings provide a platform to test hypotheses about the potential evolutionary roles of TE activity and TE-gene associations, particularly in groups with high species, ecological, and functional diversities.
Subject(s)
Molecular Evolution , Insects , Animals , DNA Transposable Elements , Genome Size , Insect Genome , Insects/genetics , Polyploidy
Abstract
The first insect genome assembly (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 insect species representing 20 orders. In this study, we analyzed the most-contiguous assembly for each species and provide a "state-of-the-field" perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technologies. Relative to species richness, genomic efforts have been biased toward four orders (Diptera, Hymenoptera, Collembola, and Phasmatodea), Coleoptera are underrepresented, and 11 orders still lack a publicly available genome assembly. The average insect genome assembly is 439.2 Mb in length with 87.5% of single-copy benchmarking genes intact. Most notable has been the impact of long-read sequencing; assemblies that incorporate long reads are ~48× more contiguous than those that do not. We offer four recommendations as we collectively continue building insect genome resources: 1) seek better integration between independent research groups and consortia, 2) balance future sampling between filling taxonomic gaps and generating data for targeted questions, 3) take advantage of long-read sequencing technologies, and 4) expand and improve gene annotations.
Subject(s)
Drosophila melanogaster , High-Throughput Nucleotide Sequencing , Animals , Insect Genome , Genomics , DNA Sequence Analysis
Abstract
Trichoptera (caddisflies) play an essential role in freshwater ecosystems; for instance, larvae process organic material from the water and are food for a variety of predators. Knowledge on the genomic diversity of caddisflies can facilitate comparative and phylogenetic studies thereby allowing scientists to better understand the evolutionary history of caddisflies. Although Trichoptera are the most diverse aquatic insect order, they remain poorly represented in terms of genomic resources. To date, all long-read based genomes have been sequenced from individuals in the retreat-making suborder, Annulipalpia, leaving ~275 Ma of evolution without high-quality genomic resources. Here, we report the first long-read based de novo genome assemblies of two tube case-making Trichoptera from the suborder Integripalpia, Agrypnia vestita Walker and Hesperophylax magnus Banks. We find that these tube case-making caddisflies have genome sizes that are at least 3-fold larger than those of currently sequenced annulipalpian genomes and that this pattern is at least partly driven by major expansion of repetitive elements. In H. magnus, long interspersed nuclear elements alone exceed the entire genome size of some annulipalpian counterparts suggesting that caddisflies have high potential as a model for understanding genome size evolution in diverse insect lineages.
Subject(s)
Genomics , Holometabola/genetics , Insects/genetics , Nucleic Acid Repetitive Sequences , Animals , Biodiversity , Fresh Water , Genome Size , Holometabola/classification , Insects/classification , Larva , Molecular Sequence Annotation , Phylogeny
Abstract
Study of repetitive DNA elements in model organisms highlights the role of repetitive elements (REs) in many processes that drive genome evolution and phenotypic change. Because REs are much more dynamic than single-copy DNA, repetitive sequences can reveal signals of evolutionary history over short time scales that may not be evident in sequences from slower-evolving genomic regions. Many tools for studying REs are directed toward organisms with existing genomic resources, including genome assemblies and repeat libraries. However, signals in repeat variation may prove especially valuable in disentangling evolutionary histories in diverse non-model groups, for which genomic resources are limited. Here, we introduce RepeatProfiler, a tool for generating, visualizing, and comparing repetitive element DNA profiles from low-coverage, short-read sequence data. RepeatProfiler automates the generation and visualization of RE coverage depth profiles (RE profiles) and allows for statistical comparison of profile shape across samples. In addition, RepeatProfiler facilitates comparison of profiles by extracting signal from sequence variants across profiles which can then be analysed as molecular morphological characters using phylogenetic analysis. We validate RepeatProfiler with data sets from ground beetles (Bembidion), flies (Drosophila), and tomatoes (Solanum). We highlight the potential of RE profiles as a high-resolution data source for studies in species delimitation, comparative genomics, and repeat biology.
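As a rough illustration of what an RE coverage depth profile is, the sketch below builds a per-base depth profile from read alignment intervals and compares the shape of two profiles. It is not RepeatProfiler's code: the function names, the half-open (start, end) interval input, and the use of Pearson correlation as the shape comparison are all assumptions made for illustration.

```python
from itertools import accumulate
from math import sqrt


def coverage_profile(ref_length, alignments):
    """Return per-base read depth along a reference sequence.

    `alignments` is an iterable of half-open (start, end) intervals in
    0-based reference coordinates (a hypothetical input format).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(ref_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # The running sum of the difference array is the depth at each position.
    return list(accumulate(diff[:-1]))


def profile_correlation(profile_a, profile_b):
    """Pearson correlation between two equal-length depth profiles.

    Correlation ignores overall sequencing depth, so it compares profile
    shape only (an assumed stand-in for a statistical shape comparison).
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    sd_a = sqrt(sum((a - mean_a) ** 2 for a in profile_a))
    sd_b = sqrt(sum((b - mean_b) ** 2 for b in profile_b))
    if sd_a == 0 or sd_b == 0:
        raise ValueError("a flat profile has no shape to correlate")
    return cov / (sd_a * sd_b)


# Two toy samples mapped to the same 5-bp reference; sample 2 has twice the depth.
sample_1 = coverage_profile(5, [(0, 3), (2, 5)])
sample_2 = coverage_profile(5, [(0, 3), (0, 3), (2, 5), (2, 5)])
shape_similarity = profile_correlation(sample_1, sample_2)
```

In this toy case the two samples differ only in depth, so their shape correlation is at its maximum; real profiles from different species would be expected to differ in shape as well.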
Subject(s)
DNA , Data Visualization , Nucleic Acid Repetitive Sequences , Software , Animals , Beetles/genetics , Drosophila/genetics , Molecular Evolution , Genome , Genomics , Solanum lycopersicum/genetics , Phylogeny
Abstract
Genome architecture is a complex, multidimensional property of an organism defined by the content and spatial organization of the genome's component parts. Comparative study of entire genome architecture in model organisms is shedding light on mechanisms underlying genome regulation, evolution, and diversification, but such studies require costly analytical approaches which make extensive comparative study impractical for most groups. However, lower-cost methods that measure a single architectural component (e.g., distribution of one class of repeats) have potential as a new data source for evolutionary studies insofar as that measure correlates with more complex biological phenomena, and for which it could serve as part of an explanatory framework. We investigated copy number variation (CNV) profiles in ribosomal DNA (rDNA) as a simple measure reflecting the distribution of rDNA subcomponents across the genome. We find that signatures present in rDNA CNV profiles strongly correlate with species boundaries in the breve species group of Bembidion, and vary across broader taxonomic sampling in Bembidion subgenus Plataphus. Profiles of several species show evidence of re-patterning of rDNA-like sequences throughout the genome, revealing evidence of rapid genome evolution (including among sister pairs) not evident from analysis of traditional data sources such as multigene data sets. Major re-patterning of rDNA-like sequences has occurred frequently within the evolutionary history of Plataphus. We confirm that CNV profiles represent an aspect of genomic architecture (i.e., the linear distribution of rDNA components across the genome) via fluorescence in-situ hybridization. In at least one species, novel rDNA-like elements are spread throughout all chromosomes. We discuss the potential of copy number profiles of rDNA, or other repeats, as a low-cost tool for incorporating signal of genomic architecture variation in studies of species delimitation and genome evolution. 
[Bembidion; Carabidae; copy number variation profiles; rapid genome evolution; ribosomal DNA; species delimitation.].
Subject(s)
Beetles/classification , Beetles/genetics , Molecular Evolution , Insect Genome/genetics , Nucleic Acid Repetitive Sequences/genetics , Animals , Genetic Speciation , Phylogeny , Species Specificity
Abstract
Satellite DNAs (satDNAs) are among the most dynamically evolving components of eukaryotic genomes and play important roles in genome regulation, genome evolution, and speciation. Despite their abundance and functional impact, we know little about the evolutionary dynamics and molecular mechanisms that shape satDNA distributions in genomes. Here, we use high-quality genome assemblies to study the evolutionary dynamics of two complex satDNAs, Rsp-like and 1.688 g/cm3, in Drosophila melanogaster and its three nearest relatives in the simulans clade. We show that large blocks of these repeats are highly dynamic in the heterochromatin, where their genomic location varies across species. We discovered that small blocks of satDNA that are abundant in X chromosome euchromatin are similarly dynamic, with repeats changing in abundance, location, and composition among species. We detail the proliferation of a rare satellite (Rsp-like) across the X chromosome in D. simulans and D. mauritiana. Rsp-like spread by inserting into existing clusters of the older, more abundant 1.688 satellite, in events likely facilitated by microhomology-mediated repair pathways. We show that Rsp-like is abundant on extrachromosomal circular DNA in D. simulans, which may have contributed to its dynamic evolution. Intralocus satDNA expansions via unequal exchange and the movement of higher order repeats also contribute to the fluidity of the repeat landscape. We find evidence that euchromatic satDNA repeats experience cycles of proliferation and diversification somewhat analogous to bursts of transposable element proliferation. Our study lays a foundation for mechanistic studies of satDNA proliferation and the functional and evolutionary consequences of satDNA movement.
Subject(s)
Satellite DNA/genetics , Drosophila melanogaster/genetics , Drosophila simulans/genetics , Molecular Evolution , X Chromosome , Animals , Euchromatin
Abstract
Targeted capture and enrichment approaches have proven effective for phylogenetic study. Ultraconserved elements (UCEs) in particular have exhibited great utility for phylogenomic analyses, with the software package phyluce being among the most utilized pipelines for UCE phylogenomics, including probe design. Despite the success of UCEs, it is becoming increasingly apparent that diverse lineages require probe sets tailored to focal taxa in order to improve locus recovery. However, factors affecting probe design and methods for optimizing probe sets to focal taxa remain underexplored. Here, we use newly available beetle (Coleoptera) genomic resources to investigate factors affecting UCE probe set design using phyluce. In particular, we explore the effects of stringency during initial design steps, as well as base genome choice, on resulting probe sets and locus recovery. We found that both base genome choice and initial bait design stringency parameters greatly alter the number of resultant probes included in final probe sets and strongly affect the number of loci detected and recovered during in silico testing of these probe sets. In addition, we identify attributes of base genomes that correlated with high performance in probe design. Ultimately, we provide a recommended workflow for using phyluce to design an optimized UCE probe set that will work across a targeted lineage, and use our findings to develop a new, open-source UCE probe set for beetles of the suborder Adephaga.
Abstract
Despite advances that allow DNA sequencing of old museum specimens, sequencing small-bodied, historical specimens can be challenging and unreliable as many contain only small amounts of fragmented DNA. Dependable methods to sequence such specimens are especially critical if the specimens are unique. We attempt to sequence small-bodied (3-6 mm) historical specimens (including nomenclatural types) of beetles that have been housed, dried, in museums for 58-159 years, and for which few or no suitable replacement specimens exist. To better understand ideal approaches of sample preparation and produce preparation guidelines, we compared different library preparation protocols using low amounts of input DNA (1-10 ng). We also explored low-cost optimizations designed to improve library preparation efficiency and sequencing success of historical specimens with minimal DNA, such as enzymatic repair of DNA. We report successful sample preparation and sequencing for all historical specimens despite our low-input DNA approach. We provide a list of guidelines related to DNA repair, bead handling, reducing adapter dimers and library amplification. We present these guidelines to facilitate more economical use of valuable DNA and enable more consistent results in projects that aim to sequence challenging, irreplaceable historical specimens.
Subject(s)
Beetles/genetics , DNA/genetics , DNA/isolation & purification , Entomology/methods , Fossils , DNA Sequence Analysis/methods , Specimen Handling/methods , Animals , Beetles/classification , DNA/chemistry , Entomology/economics , Gene Library , Guidelines as Topic , Museums , DNA Sequence Analysis/economics , Specimen Handling/economics
Abstract
BACKGROUND: Phylogeographic studies of aquatic insects provide valuable insights into mechanisms that shape the genetic structure of communities, yet studies that include broad geographic areas are uncommon for this group. We conducted a broad scale phylogeographic analysis of the least salmonfly Pteronarcella badia (Plecoptera) across western North America. We tested hypotheses related to mode of dispersal and the influence of historic climate oscillations on population genetic structure. In order to generate a larger mitochondrial data set, we used 454 sequencing to reconstruct the complete mitochondrial genome in the early stages of the project. RESULTS: Our analysis revealed high levels of population structure with several deeply divergent clades present across the sample area. Evidence from five mitochondrial genes and one nuclear locus identified a potentially cryptic lineage in the Pacific Northwest. Gene flow estimates and geographic clade distributions suggest that overland flight during the winged adult stage is an important dispersal mechanism for this taxon. We found evidence of multiple glacial refugia across the species distribution and signs of secondary contact within and among major clades. CONCLUSIONS: This study provides a basis for future studies of aquatic insect phylogeography at the inter-basin scale in western North America. Our findings add to an understanding of the role of historical climate isolations in shaping assemblages of aquatic insects in this region. We identified several geographic areas that may have historical importance for other aquatic organisms with similar distributions and dispersal strategies as P. badia. This work adds to the ever-growing list of studies that highlight the potential of next-generation DNA sequencing in a phylogenetic context to improve molecular data sets from understudied groups.
Subject(s)
Insects/genetics , Animals , Climate , Mitochondrial DNA/genetics , Gene Flow , Genetic Variation , Population Genetics , Insects/classification , Molecular Sequence Data , North America , Northwestern United States , Phylogeography , Refugium
Abstract
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.