ABSTRACT
As genomic data transform our understanding of biodiversity, the Earth BioGenome Project (EBP) has set a goal of generating reference quality genome assemblies for all ~1.9 million described eukaryotic taxa. Meeting this goal requires coordination among many individual regional and taxon-focussed projects working under the EBP umbrella. Large-scale sequencing projects require ready access to validated genome-relevant metadata, such as genome sizes and karyotypes, but these data are dispersed across the literature, and directly measured values are lacking for most taxa. To meet these needs, we have developed Genomes on a Tree (GoaT), an Elasticsearch-powered datastore and search index for genome-relevant metadata and sequencing project plans and statuses. GoaT indexes publicly available metadata for all eukaryotic species and interpolates missing values through phylogenetic comparison. GoaT also holds target priority and sequencing status information for many projects affiliated to the EBP to aid project coordination. Metadata and status attributes in GoaT can be queried through a mature API, a web front end, and a command line interface. The web front end additionally provides summary visualisations for data exploration and reporting (see https://goat.genomehubs.org). GoaT currently holds direct or estimated values for over 70 taxon attributes and over 30 assembly attributes across 1.5 million eukaryotic species. The depth and breadth of curated data, frequent updates, and a versatile query interface make GoaT a powerful data aggregator and portal to explore and report underlying data for the eukaryotic tree of life. We illustrate this utility through a series of use cases from planning through to completion of a genome-sequencing project.
ABSTRACT
We present a genome assembly from an individual female Salmo trutta (the brown trout; Chordata; Actinopteri; Salmoniformes; Salmonidae). The genome sequence is 2.37 gigabases in span. The majority of the assembly is scaffolded into 40 chromosomal pseudomolecules. Gene annotation of this assembly on Ensembl has identified 43,935 protein coding genes.
ABSTRACT
We present a genome assembly from an individual female Aquila chrysaetos chrysaetos (the European golden eagle; Chordata; Aves; Accipitridae). The genome sequence is 1.23 gigabases in span. The majority of the assembly is scaffolded into 28 chromosomal pseudomolecules, including the W and Z sex chromosomes.
ABSTRACT
We present a genome assembly from a clonal population of Eimeria tenella Houghton parasites (Apicomplexa; Conoidasida; Eucoccidiorida; Eimeriidae). The genome sequence is 53.25 megabases in span. The entire assembly is scaffolded into 15 chromosomal pseudomolecules, with complete mitochondrion and apicoplast organellar genomes also present.
ABSTRACT
We present a genome assembly from an individual male Rattus norvegicus (the Norway rat; Chordata; Mammalia; Rodentia; Muridae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled. This genome assembly, mRatBN7.2, represents the new reference genome for R. norvegicus and has been adopted by the Genome Reference Consortium.
ABSTRACT
As sequencing becomes more accessible and affordable, the analysis of genomic and transcriptomic data has become a cornerstone of many research initiatives. Communities with a focus on particular taxa or ecosystems need solutions capable of aggregating genomic resources and serving them in a standardized and analysis-friendly manner. Taxon-focussed resources can be more flexible in addressing the needs of a research community than can universal or general databases. Here, we present MolluscDB, a genome and transcriptome database for molluscs. MolluscDB offers a rich ecosystem of tools, including an Ensembl browser, a BLAST server for homology searches and an HTTP server from which any dataset present in the database can be downloaded. To demonstrate the utility of the database and verify the quality of its data, we imported data from assembled genomes and transcriptomes of 22 species, estimated the phylogeny of Mollusca using single-copy orthologues, explored patterns of gene family size change and interrogated the data for biomineralization-associated enzymes and shell matrix proteins. MolluscDB provides an easy-to-use and openly accessible data resource for the research community. This article is part of the Theo Murphy meeting issue 'Molluscan genomics: broad insights and future directions for a neglected phylum'.
Subject(s)
Genetic Databases , Genome , Mollusca/genetics , Transcriptome , Animals , Gene Expression Profiling , Genomics
ABSTRACT
We present a genome assembly from an individual male Arvicola amphibius (the European water vole; Chordata; Mammalia; Rodentia; Cricetidae). The genome sequence is 2.30 gigabases in span. The majority of the assembly is scaffolded into 18 chromosomal pseudomolecules, including the X sex chromosome. Gene annotation of this assembly on Ensembl has identified 21,394 protein coding genes.
ABSTRACT
We present a genome assembly from an individual female Streptopelia turtur (the European turtle dove; Chordata; Aves; Columbidae). The genome sequence is 1.18 gigabases in span. The majority of the assembly is scaffolded into 35 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.
ABSTRACT
We present a genome assembly from an individual male Sciurus carolinensis (the eastern grey squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.82 gigabases in span. The majority of the assembly (92.3%) is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.
ABSTRACT
We present a genome assembly from an individual male Sciurus vulgaris (the Eurasian red squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.88 gigabases in span. The majority of the assembly is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.
ABSTRACT
We present a genome assembly from an individual male Lutra lutra (the Eurasian river otter; Vertebrata; Mammalia; Eutheria; Carnivora; Mustelidae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled.
ABSTRACT
Evolutionary adaptation is generally thought to occur through incremental mutational steps, but large mutational leaps can occur during its early stages. These are challenging to study in nature due to the difficulty of observing new genetic variants as they arise and spread, but characterizing their genomic dynamics is important for understanding factors favoring rapid adaptation. Here, we report genomic consequences of recent, adaptive song loss in a Hawaiian population of field crickets (Teleogryllus oceanicus). A discrete genetic variant, flatwing, appeared and spread approximately 15 years ago. Flatwing erases sound-producing veins on male wings. These silent flatwing males are protected from a lethal, eavesdropping parasitoid fly. We sequenced, assembled and annotated the cricket genome, produced a linkage map, and identified a flatwing quantitative trait locus covering a large region of the X chromosome. Gene expression profiling showed that flatwing is associated with extensive genome-wide effects on embryonic gene expression. We found that flatwing male crickets express feminized chemical pheromones. This male feminizing effect, on a different sexual signaling modality, is genetically associated with the flatwing genotype. Our findings suggest that the early stages of evolutionary adaptation to extreme pressures can be accompanied by greater genomic and phenotypic disruption than previously appreciated, and highlight how abrupt adaptation might involve suites of traits that arise through pleiotropy or genomic hitchhiking.
ABSTRACT
Reconstruction of target genomes from sequence data produced by instruments that are agnostic as to the species-of-origin may be confounded by contaminant DNA. Whether introduced during sample processing or through co-extraction alongside the target DNA, if insufficient care is taken during the assembly process, the final assembled genome may be a mixture of data from several species. Such assemblies can confound sequence-based biological inference and, when deposited in public databases, may be included in downstream analyses by users unaware of underlying problems. We present BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies. BlobToolKit can be used to process assembly, read and analysis files for fully reproducible interactive exploration in the browser-based Viewer. BlobToolKit can be used during assembly to filter non-target DNA, helping researchers produce assemblies with high biological credibility. We have been running an automated BlobToolKit pipeline on eukaryotic assemblies publicly available in the International Nucleotide Sequence Database Collaboration and are making the results available through a public instance of the Viewer at https://blobtoolkit.genomehubs.org/view. We aim to complete analysis of all publicly available genomes and then maintain currency with the flow of new genomes. We have worked to embed these views into the presentation of genome assemblies at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the Viewer.
Subject(s)
Genome , Software , DNA Sequence Analysis
ABSTRACT
We used 20 de novo genome assemblies to probe the speciation history and architecture of gene flow in rapidly radiating Heliconius butterflies. Our tests to distinguish incomplete lineage sorting from introgression indicate that gene flow has obscured several ancient phylogenetic relationships in this group over large swathes of the genome. Introgressed loci are underrepresented in low-recombination and gene-rich regions, consistent with the purging of foreign alleles more tightly linked to incompatibility loci. Here, we identify a hitherto unknown inversion that traps a color pattern switch locus. We infer that this inversion was transferred between lineages by introgression and is convergent with a similar rearrangement in another part of the genus. These multiple de novo genome sequences enable improved understanding of the importance of introgression and selective processes in adaptive radiation.
Subject(s)
Butterflies/genetics , Gene Flow , Genetic Introgression , Insect Genome , Animals , Biological Evolution , Butterflies/anatomy & histology , Chromosome Inversion , Insect Genes , Genetic Speciation , Phylogeny , Animal Wings/anatomy & histology
ABSTRACT
The capacity of organisms to tune their development in response to environmental cues is pervasive in nature. This phenotypic plasticity is particularly striking in plants, enabled by their modular and continuous development. A good example is the activation of lateral shoot branches in Arabidopsis, which develop from axillary meristems at the base of leaves. The activity and elongation of lateral shoots depend on the integration of many signals both external (e.g. light, nutrient supply) and internal (e.g. the phytohormones auxin, strigolactone and cytokinin). Here, we characterise natural variation in plasticity of shoot branching in response to nitrate supply using two diverse panels of Arabidopsis lines. We find extensive variation in nitrate sensitivity across these lines, suggesting a genetic basis for variation in branching plasticity. High plasticity is associated with extreme branching phenotypes such that lines with the most branches on high nitrate have the fewest under nitrate deficient conditions. Conversely, low plasticity is associated with a constitutively moderate level of branching. Furthermore, variation in plasticity is associated with alternative life histories with the low plasticity lines flowering significantly earlier than high plasticity lines. In Arabidopsis, branching is highly correlated with fruit yield, and thus low plasticity lines produce more fruit than high plasticity lines under nitrate deficient conditions, whereas highly plastic lines produce more fruit under high nitrate conditions. Low and high plasticity, associated with early and late flowering respectively, can therefore be interpreted as alternative escape vs mitigate strategies to low N environments. The genetic architecture of these traits appears to be highly complex, with only a small proportion of the estimated genetic variance detected in association mapping.
Subject(s)
Arabidopsis/genetics , Nitrates/metabolism , Plant Shoots/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Plant Gene Expression Regulation/genetics , Plant Genes/genetics , Meristem/growth & development , Phenotype , Plant Leaves/metabolism , Plant Roots/genetics , Plant Shoots/growth & development , Plant Shoots/metabolism
ABSTRACT
Comparing newly obtained and previously known nucleotide and amino-acid sequences underpins modern biological research. BLAST is a well-established tool for such comparisons but is challenging to use on new data sets. We combined a user-centric design philosophy with sustainable software development approaches to create Sequenceserver, a tool for running BLAST and visually inspecting BLAST results for biological interpretation. Sequenceserver uses simple algorithms to prevent potential analysis errors and provides flexible text-based and visual outputs to support researcher productivity. Our software can be rapidly installed for use by individuals or on shared servers.
Subject(s)
Computational Biology/methods , Genetic Techniques , Software
ABSTRACT
Database URL: http://GenomeHubs.org
As the generation and use of genomic datasets is becoming increasingly common in all areas of biology, the need for resources to collate, analyse and present data from one or more genome projects is becoming more pressing. The Ensembl platform is a powerful tool to make genome data and cross-species analyses easily accessible through a web interface and a comprehensive application programming interface. Here we introduce GenomeHubs, which provide a containerized environment to facilitate the setup and hosting of custom Ensembl genome browsers. This simplifies mirroring of existing content and import of new genomic data into the Ensembl database schema. GenomeHubs also provide a set of analysis containers to decorate imported genomes with results of standard analyses and functional annotations and support export to flat files, including EMBL format for submission of assemblies and annotations to the International Nucleotide Sequence Database Collaboration.
Subject(s)
Data Warehousing/methods , Nucleic Acid Databases , Genome , Internet , DNA Sequence Analysis/methods , Web Browser , Animals , Humans
ABSTRACT
The mycalesine butterfly Bicyclus anynana, the "Squinting bush brown," is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (~260× assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html).
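As a rough illustration of the two assembly metrics quoted in this abstract, the sketch below computes fold coverage from the stated 124 Gb of filtered data and 475 Mb assembly span, and shows how an N50 is derived from a list of sequence lengths. The function names and the toy scaffold lengths are assumptions made for illustration; they are not taken from the study, whose actual scaffold lengths are not given here.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total span."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


def fold_coverage(read_bases, assembly_span):
    """Return sequencing depth as total read bases divided by assembly span."""
    return read_bases / assembly_span


# Made-up scaffold lengths (bp), purely to exercise the function.
toy_scaffolds = [5_000, 3_000, 2_000, 1_000, 500]
toy_n50 = n50(toy_scaffolds)

# Figures stated in the abstract: 124 Gb filtered reads, 475 Mb assembly.
fold = fold_coverage(124e9, 475e6)

print(f"toy N50: {toy_n50} bp")
print(f"coverage: about {fold:.0f}x")
```

With the abstract's figures the ratio comes out at roughly 261, in line with the approximate 260× coverage reported.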
Subject(s)
Butterflies/genetics , Insect Genome , Animals , Molecular Sequence Annotation , Whole Genome Sequencing
ABSTRACT
The genomic causes and effects of divergent ecological selection during speciation are still poorly understood. Here we report the discovery and detailed characterization of early-stage adaptive divergence of two cichlid fish ecomorphs in a small (700 meters in diameter) isolated crater lake in Tanzania. The ecomorphs differ in depth preference, male breeding color, body shape, diet, and trophic morphology. With whole-genome sequences of 146 fish, we identified 98 clearly demarcated genomic "islands" of high differentiation and demonstrated the association of genotypes across these islands with divergent mate preferences. The islands contain candidate adaptive genes enriched for functions in sensory perception (including rhodopsin and other twilight-vision-associated genes), hormone signaling, and morphogenesis. Our study suggests mechanisms and genomic regions that may play a role in the closely related mega-radiation of Lake Malawi.
Subject(s)
Physiological Adaptation/genetics , Cichlids/genetics , Cichlids/physiology , Genomic Islands , Animal Mating Preference , Animals , Cichlids/classification , Lakes , Phylogeny , Single Nucleotide Polymorphism , Species Specificity , Tanzania
ABSTRACT
Estimates of particle size distributions (PSDs) in solid-in-liquid suspensions can be made on the basis of measurements of ultrasonic wave attenuation combined with a mathematical propagation model, which typically requires seven physical parameters to describe each phase of the mixture. The estimation process is insensitive to all of these except the density of the solid particles, which may be unknown or difficult to measure. This paper proposes that an unknown density value be incorporated into the sizing computation as a free variable. It is shown that this leads to an accurate estimate of the PSD, as well as of the unknown density.