ABSTRACT
A genomic database of all Earth's eukaryotic species could contribute to many scientific discoveries; however, only a tiny fraction of species have genomic information available. In 2018, scientists across the world united under the Earth BioGenome Project (EBP), aiming to produce a database of high-quality reference genomes covering all ~1.5 million recognized eukaryotic species. As the European node of the EBP, the European Reference Genome Atlas (ERGA) sought to implement a new decentralised, equitable and inclusive model for producing reference genomes. To this end, ERGA launched a Pilot Project that established the first distributed reference genome production infrastructure and tested it on 98 eukaryotic species from 33 European countries. Here we outline the infrastructure and explore its effectiveness for scaling high-quality reference genome production, whilst considering equity and inclusion. The outcomes and lessons learned provide a solid foundation for ERGA and offer key learnings to other transnational and national genomic resource projects, as well as to the EBP.
ABSTRACT
Genome sequencing makes it possible to answer fundamental questions about the genetic basis of adaptation, population structure and epigenetic mechanisms. Yet a suitable reference genome is usually needed for mapping population-level resequencing data. In some model systems, multiple reference genomes are available, which raises the challenging task of determining which reference genome best suits the data. Here, we compared the use of two different reference genomes for the three-spined stickleback (Gasterosteus aculeatus): a novel genome derived from a European gynogenetic individual and the published reference genome of a North American individual. Specifically, we investigated the impact of using a local reference versus one generated from a distinct lineage on several common population genomics analyses. By mapping genome resequencing data of 60 sticklebacks from across Europe and North America, we demonstrate that the genetic distance between the samples and the reference genome impacts downstream analyses. Using a local reference genome increased mapping efficiency and genotyping accuracy, effectively retaining more and better data. Although the metrics generated across the genome from SNP data (i.e. π, Tajima's D and FST) had comparable distributions, window-based statistics computed with different references yielded different outlier genes and enriched gene functions. A marker-based analysis of DNA methylation distributions showed a comparably high overlap in outlier genes and functions, yet with distinct differences depending on the reference genome. Overall, our results highlight how using a local reference genome decreases reference bias and increases confidence in downstream analyses of the data. These results have significant implications for all reference-genome-based population genomic analyses.
Subjects
Metagenomics; Smegmamorpha; Animals; Genome/genetics; Chromosome Mapping; Genomics/methods; Sequence Analysis, DNA/methods; Smegmamorpha/genetics
ABSTRACT
Endoparasitoid wasps are important natural enemies of many insect species and are major selective forces on the host immune system. Despite increased interest in insect antiparasitoid immunity, there is sparse information on the evolutionary dynamics of biological pathways and gene regulation involved in host immune defense outside Drosophila species. We de novo assembled transcriptomes from two beetle species and used time-course differential expression analysis to investigate gene expression differences between the closely related species Galerucella pusilla and G. calmariensis, which are, respectively, resistant and susceptible to infection by the parasitoid Asecodes parviclava. Approximately 271 million and 224 million paired-end reads were assembled and filtered to form 52,563 and 59,781 transcripts for G. pusilla and G. calmariensis, respectively. At the whole-transcriptome level, both species showed an enrichment of functional categories related to energy production, biosynthetic process, and metabolic process. The main difference between the species appears to be the immune response and wound healing process mounted by G. pusilla larvae. Using reciprocal BLAST against the Drosophila melanogaster proteome, 120 and 121 immune-related genes were identified in G. pusilla and G. calmariensis, respectively. More immune genes were differentially expressed in G. pusilla than in G. calmariensis, in particular genes involved in signaling, hematopoiesis, and melanization. In contrast, only one gene was differentially expressed in G. calmariensis. Our study characterizes important genes and pathways involved in different immune functions after parasitoid infection and supports the role of signaling and hematopoiesis genes as key players in host immunity in Galerucella against parasitoid wasps.
Subjects
Coleoptera/genetics; Coleoptera/immunology; Gene Expression Regulation; Genes, Insect; Host-Parasite Interactions/immunology; Hymenoptera/physiology; Immunocompetence; Animals; Biological Evolution; Coleoptera/parasitology; Drosophila melanogaster/genetics; Phylogeny; Transcriptome
ABSTRACT
As part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to help researchers get started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects of a general assembly and annotation project from start to finish. Intrinsic properties of genomes are discussed, as is the importance of using high-quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make results Findable, Accessible, Interoperable, and Reusable (FAIR).