ABSTRACT
The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser, making SARS-CoV-2 the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.
Subject(s)
COVID-19/virology , Databases, Genetic , SARS-CoV-2/genetics , Web Browser , Coronaviridae/genetics , Genetic Variation , Genome, Viral , Humans , Molecular Sequence Annotation
ABSTRACT
Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life, covering genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception, creating one of the world's most comprehensive genomic resources, and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.
Subject(s)
Databases, Genetic , Genomics , Internet , Software , Animals , Computational Biology , Genome, Bacterial/genetics , Genome, Fungal/genetics , Genome, Plant/genetics , Plants/classification , Plants/genetics , Vertebrates/classification , Vertebrates/genetics
ABSTRACT
The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Genomics/methods , SARS-CoV-2/genetics , Vertebrates/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Humans , Internet , Molecular Sequence Annotation/methods , Pandemics , Vertebrates/classification
ABSTRACT
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Subject(s)
COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics
ABSTRACT
WormBase (https://wormbase.org/) is a mature Model Organism Information Resource supporting researchers using the nematode Caenorhabditis elegans as a model system for studies across a broad range of basic biological processes. Toward this mission, WormBase efforts are arranged in three primary facets: curation, user interface and architecture. In this update, we describe progress in each of these three areas. In particular, we discuss the status of literature curation and recently added data, detail new features of the web interface and options for users wishing to conduct data mining workflows, and discuss our efforts to build a robust and scalable architecture by leveraging commercial cloud offerings. We conclude with a description of WormBase's role as a founding member of the nascent Alliance of Genome Resources.
Subject(s)
Caenorhabditis elegans/genetics , Databases, Genetic , Genes, Helminth , Animals , Data Mining , Genomics , Internet , User-Computer Interface
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.
Subject(s)
Computational Biology/methods , Databases, Genetic , Genetic Variation , Genome, Bacterial , Genome, Fungal , Genome, Plant , Algorithms , Animals , Caenorhabditis elegans/genetics , Genomics , Internet , Molecular Sequence Annotation , Phenotype , Plants/genetics , Reference Values , Software , User-Computer Interface
ABSTRACT
WormBase (http://www.wormbase.org) is an important knowledge resource for biomedical researchers worldwide. To accommodate the ever-increasing amount and complexity of research data, WormBase continues to advance its practices on data acquisition, curation and retrieval to most effectively deliver comprehensive knowledge about Caenorhabditis elegans, and genomic information about other nematodes and parasitic flatworms. Recent notable enhancements include user-directed submission of data, such as micropublication; genomic data curation and presentation, including additional genomes and JBrowse, respectively; new query tools, such as SimpleMine and Gene Enrichment Analysis; and new data displays, such as the Person Lineage browser and the Summary of Ontology-based Annotations. Anticipating more rapid data growth ahead, WormBase continues the process of migrating to a cutting-edge database technology to achieve better stability, scalability, reproducibility and a faster response time. To better serve the broader research community, WormBase, with five other Model Organism Databases and The Gene Ontology project, has begun to collaborate formally as the Alliance of Genome Resources.
Subject(s)
Databases, Genetic , Genome , Nematoda/genetics , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/genetics , Data Curation , Data Mining , Datasets as Topic , Disease Models, Animal , Forecasting , Gene Ontology , Humans , Information Storage and Retrieval , Platyhelminths/genetics , Publishing , RNA Interference , Sequence Alignment , User-Computer Interface , Web Browser
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these data using automated, scalable methods.
Subject(s)
Archaea/genetics , Bacteria/genetics , Databases, Genetic , Databases, Protein , Eukaryota/genetics , Genomics , Amino Acid Sequence , Animals , Base Sequence , Data Mining , Forecasting , Genome , Molecular Sequence Annotation , RNA/genetics , User-Computer Interface
ABSTRACT
RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and has begun importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. The website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality. All RNAcentral data is provided for free and is available for browsing, bulk downloads, and programmatic access at http://rnacentral.org/.
Subject(s)
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Animals , Genomics , Humans , Nucleotides/chemistry , Sequence Analysis, RNA , Species Specificity
ABSTRACT
WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research.
Subject(s)
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Genomics , Nematoda/genetics , Animals , Genes, Helminth , Molecular Sequence Annotation , Platyhelminths/genetics , Software
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.
Subject(s)
Databases, Genetic , Genome, Bacterial , Genome, Fungal , Genome, Plant , Invertebrates/genetics , Animals , Diploidy , Eukaryota/genetics , Genetic Variation , Genome , Polyploidy , Sequence Alignment
ABSTRACT
The Piwi proteins of the Argonaute superfamily are required for normal germline development in Drosophila, zebrafish, and mice and associate with 24-30 nucleotide RNAs termed piRNAs. We identify a class of 21 nucleotide RNAs, previously named 21U-RNAs, as the piRNAs of C. elegans. Piwi and piRNA expression is restricted to the male and female germline and independent of many proteins in other small-RNA pathways, including DCR-1. We show that Piwi is specifically required to silence Tc3, but not other Tc/mariner DNA transposons. Tc3 excision rates in the germline are increased at least 100-fold in piwi mutants as compared to wild-type. We find no evidence for a Ping-Pong model for piRNA amplification in C. elegans. Instead, we demonstrate that Piwi acts upstream of an endogenous siRNA pathway in Tc3 silencing. These data might suggest a link between piRNA and siRNA function.
Subject(s)
Caenorhabditis elegans Proteins/metabolism , Caenorhabditis elegans/metabolism , DNA Transposable Elements/genetics , Germ Cells/metabolism , Proteins/metabolism , RNA, Small Interfering/metabolism , Animals , Argonaute Proteins , Caenorhabditis elegans/genetics , Drosophila Proteins , Female , Gene Silencing , Genes, Helminth , Germ Cells/growth & development , Male , Proteins/genetics , RNA, Helminth/metabolism , RNA-Induced Silencing Complex , Transposases/metabolism
ABSTRACT
BACKGROUND: It has recently emerged that common epithelial cancers such as breast cancers have fusion genes like those in leukaemias. In a representative breast cancer cell line, ZR-75-30, we searched for fusion genes by analysing genome rearrangements. RESULTS: We first analysed rearrangements of the ZR-75-30 genome, to around 10 kb resolution, by molecular cytogenetic approaches, combining array painting and array CGH. We then compared this map with genomic junctions determined by paired-end sequencing. Most of the breakpoints found by array painting and array CGH were identified in the paired-end sequencing: 55% of the unamplified breakpoints and 97% of the amplified breakpoints (as these are represented by more sequence reads). From this analysis we identified 9 expressed fusion genes: APPBP2-PHF20L1, BCAS3-HOXB9, COL14A1-SKAP1, TAOK1-PCGF2, TIAM1-NRIP1, TIMM23-ARHGAP32, TRPS1-LASP1, USP32-CCDC49 and ZMYM4-OPRD1. We also determined the genomic junctions of a further three expressed fusion genes that had been described by others, BCAS3-ERBB2, DDX5-DEPDC6/DEPTOR and PLEC1-ENPP2. Of this total of 12 expressed fusion genes, 9 were in the coamplification. Due to the sensitivity of the technologies used, we estimate these 12 fusion genes to be around two-thirds of the true total. Many of the fusions seem likely to be driver mutations. For example, PHF20L1, BCAS3, TAOK1, PCGF2, and TRPS1 are fused in other breast cancers. HOXB9 and PHF20L1 are members of gene families that are fused in other neoplasms. Several of the other genes are relevant to cancer: in addition to ERBB2, SKAP1 is an adaptor for Src, DEPTOR regulates the mTOR pathway and NRIP1 is an estrogen-receptor coregulator. CONCLUSIONS: This is the first structural analysis of a breast cancer genome that combines classical molecular cytogenetic approaches with sequencing. Paired-end sequencing was able to detect almost all breakpoints where there was adequate read depth. It supports the view that gene breakage and gene fusion are important classes of mutation in breast cancer, with a typical breast cancer expressing many fusion genes.
Subject(s)
Breast Neoplasms/genetics , Genome, Human/genetics , Oncogene Proteins, Fusion/genetics , Base Sequence , Cell Line, Tumor , Chromosome Mapping , Cloning, Molecular , Comparative Genomic Hybridization/methods , Female , Humans , Molecular Sequence Data , Sequence Alignment , Sequence Analysis, DNA
ABSTRACT
WormBase (www.wormbase.org) is the central repository for the genetics and genomics of the nematode Caenorhabditis elegans. We provide the research community with data and tools to facilitate the use of C. elegans and related nematodes as model organisms for studying human health, development, and many aspects of fundamental biology. Throughout our 22-year history, we have continued to evolve to reflect progress and innovation in the science and technologies involved in the study of C. elegans. We strive to incorporate new data types and richer data sets, and to provide integrated displays and services that avail the knowledge generated by the published nematode genetics literature. Here, we provide a broad overview of the current state of WormBase in terms of data type, curation workflows, analysis, and tools, including exciting new advances for analysis of single-cell data, text mining and visualization, and the new community collaboration forum. Concurrently, we continue the integration and harmonization of infrastructure, processes, and tools with the Alliance of Genome Resources, of which WormBase is a founding member.
Subject(s)
Caenorhabditis , Nematoda , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/genetics , Databases, Genetic , Genome , Genomics , Humans , Nematoda/genetics
ABSTRACT
The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence.
Subject(s)
Chromosomes, Human, X/genetics , Evolution, Molecular , Genomics , Sequence Analysis, DNA , Animals , Antigens, Neoplasm/genetics , Centromere/genetics , Chromosomes, Human, Y/genetics , Contig Mapping , Crossing Over, Genetic/genetics , Dosage Compensation, Genetic , Female , Genetic Linkage/genetics , Genetics, Medical , Humans , Male , Polymorphism, Single Nucleotide/genetics , RNA/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Homology, Nucleic Acid , Testis/metabolism
ABSTRACT
Haemonchus contortus is a globally distributed and economically important gastrointestinal pathogen of small ruminants and has become a key nematode model for studying anthelmintic resistance and other parasite-specific traits among a wider group of parasites including major human pathogens. Here, we report using PacBio long-read and OpGen and 10X Genomics long-molecule methods to generate a highly contiguous 283.4 Mbp chromosome-scale genome assembly including a resolved sex chromosome for the MHco3(ISE).N1 isolate. We show a remarkable pattern of conservation of chromosome content with Caenorhabditis elegans, but almost no conservation of gene order. Short and long-read transcriptome sequencing allowed us to define coordinated transcriptional regulation throughout the parasite's life cycle and refine our understanding of cis- and trans-splicing. Finally, we provide a comprehensive picture of chromosome-wide genetic diversity both within a single isolate and globally. These data provide a high-quality comparison for understanding the evolution and genomics of Caenorhabditis and other nematodes and extend the experimental tractability of this model parasitic nematode in understanding helminth biology, drug discovery and vaccine development, as well as important adaptive traits such as drug resistance.
Subject(s)
Genome, Helminth/genetics , Haemonchus/genetics , Models, Biological , Transcriptome/genetics , Animals , Caenorhabditis elegans/genetics , Chromosomes/genetics , Female , Genomics , Haemonchiasis/parasitology , Haemonchus/metabolism , Haemonchus/physiology , Humans , Intestinal Diseases, Parasitic/parasitology , Life Cycle Stages/genetics , Male
ABSTRACT
WormBase ParaSite (parasite.wormbase.org) is a comprehensive resource for the genomes of parasitic nematodes and flatworms (helminths). It currently includes genomic data for over 100 helminth species, adding value by way of consistent functional annotation, gene comparative analysis and gene expression analysis. We provide several ways of exploring the data including a choice of genome browsers, genome and gene summary pages, text and sequence searching, a query wizard, bulk downloads, and programmatic interfaces. WormBase ParaSite is released three to six times per year, and is developed in collaboration with WormBase (www.wormbase.org) and Ensembl Genomes (www.ensemblgenomes.org).
Subject(s)
Computational Biology , Databases, Genetic , Genome, Helminth , Genomics , Computational Biology/methods , Epistasis, Genetic , Gene Expression Profiling , Gene Ontology , Helminthiasis/parasitology , Phenotype , Software , Transcriptome , Web Browser
ABSTRACT
WormBase (www.wormbase.org) provides the nematode research community with a centralized database for information pertaining to nematode genes and genomes. As more nematode genome sequences are becoming available and as richer data sets are published, WormBase strives to maintain updated information, displays, and services to facilitate efficient access to and understanding of the knowledge generated by the published nematode genetics literature. This chapter aims to provide an explanation of how to use basic features of WormBase, new features, and some commonly used tools and data queries. Explanations of the curated data and step-by-step instructions of how to access the data via the WormBase website and available data mining tools are provided.
Subject(s)
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Genomics , Animals , Computational Biology/methods , Data Mining/methods , Epistasis, Genetic , Gene Ontology , Genes, Helminth , Genomics/methods , Humans , Phenotype , Proteome , Search Engine , Software , Transcriptome , User-Computer Interface , Web Browser
ABSTRACT
The Human Epigenome Project aims to identify, catalogue, and interpret genome-wide DNA methylation phenomena. Occurring naturally on cytosine bases at cytosine-guanine dinucleotides, DNA methylation is intimately involved in diverse biological processes and the aetiology of many diseases. Differentially methylated cytosines give rise to distinct profiles, thought to be specific for gene activity, tissue type, and disease state. The identification of such methylation variable positions will significantly improve our understanding of genome biology and our ability to diagnose disease. Here, we report the results of the pilot study for the Human Epigenome Project entailing the methylation analysis of the human major histocompatibility complex. This study involved the development of an integrated pipeline for high-throughput methylation analysis using bisulphite DNA sequencing, discovery of methylation variable positions, epigenotyping by matrix-assisted laser desorption/ionisation mass spectrometry, and development of an integrated public database available at http://www.epigenome.org. Our analysis of DNA methylation levels within the major histocompatibility complex, including regulatory exonic and intronic regions associated with 90 genes in multiple tissues and individuals, reveals a bimodal distribution of methylation profiles (i.e., the vast majority of the analysed regions were either hypo- or hypermethylated), tissue specificity, inter-individual variation, and correlation with independent gene expression data.
Subject(s)
DNA Methylation , Genome, Human , Human Genome Project , Major Histocompatibility Complex/genetics , CpG Islands , Cytosine/metabolism , Databases, Genetic , Epigenesis, Genetic , Exons , Gene Expression Regulation , Genetic Variation , Humans , Internet , Introns , Mass Spectrometry , Pilot Projects , Polymerase Chain Reaction , RNA, Messenger/metabolism , Sequence Analysis, DNA , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Sulfites/chemistry , Tissue Distribution
ABSTRACT
The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data.