ABSTRACT
Copepods encompass numerous ecological roles including parasites, detritivores and phytoplankton grazers. Nonetheless, copepod genome assemblies remain scarce. Lepeophtheirus salmonis is an economically and ecologically important ectoparasitic copepod found on salmonid fish. We present the 695.4 Mbp L. salmonis genome assembly containing ≈60% repetitive regions and 13,081 annotated protein-coding genes. The genome comprises 14 autosomes and a ZZ-ZW sex chromosome system. Assembly assessment identified 92.4% of the expected arthropod genes. Transcriptomics supported annotation and indicated a marked shift in gene expression after host attachment, including apparent downregulation of genes related to circadian rhythm coinciding with abandoning diurnal migration. The genome shows evolutionary signatures including loss of genes needed for peroxisome biogenesis, presence of numerous FNII domains, and an incomplete heme homeostasis pathway suggesting heme proteins to be obtained from the host. Despite repeated development of resistance against chemical treatments, L. salmonis exhibits low numbers of many genes involved in detoxification.
Subjects
Copepoda , Fish Diseases , Parasites , Acclimatization , Animals , Copepoda/genetics , Copepoda/parasitology , Fish Diseases/genetics , Parasites/genetics , Transcriptome
ABSTRACT
MOTIVATION: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. RESULTS: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. AVAILABILITY AND IMPLEMENTATION: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.
Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Phenotype , Quantitative Trait Loci/genetics , Software
ABSTRACT
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), and data updates are made available four times a year.
Subjects
Computational Biology/methods , Databases, Genetic , Epigenome , Molecular Sequence Annotation , Algorithms , Animals , Computer Graphics , Databases, Protein , Genetic Variation , Genome-Wide Association Study , Genomics , Histones/metabolism , Humans , Imaging, Three-Dimensional , Internet , Ligands , Search Engine , Software , Species Specificity , Transcriptome , User-Computer Interface , Web Browser
ABSTRACT
The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as through a RESTful web service, a Perl application programming interface and data files for download.
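As a rough illustration of the RESTful access mentioned above, the following sketch builds and queries a gene lookup URL. The server address (https://rest.ensembl.org), the lookup/symbol endpoint and the BRCA2 example are assumptions drawn from the publicly documented Ensembl REST service, not from this abstract, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public Ensembl REST server; not stated in the abstract itself.
REST_SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """Build the URL for a gene lookup by symbol (hypothetical helper)."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    # Ask the service for JSON via the content-type query parameter.
    return REST_SERVER + path + "?content-type=application/json"


def lookup_symbol(species: str, symbol: str, timeout: float = 30.0) -> dict:
    """Fetch the lookup record as a dictionary (needs network access)."""
    url = lookup_symbol_url(species, symbol)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here; call lookup_symbol() to query the server.
    print(lookup_symbol_url("homo_sapiens", "BRCA2"))
```

Separating URL construction from the network call keeps the first part checkable offline; the fields of the returned record should be confirmed against the service documentation rather than assumed.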
Subjects
Databases, Genetic , Genome/genetics , Genomics , Vertebrates/genetics , Animals , Computational Biology/trends , Humans , Mice , Molecular Sequence Annotation , Software
ABSTRACT
The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
Subjects
Databases, Genetic , Datasets as Topic , Genome , Information Dissemination , Animals , Epigenomics , Genome, Human , Genome-Wide Association Study , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Vertebrates/genetics , Web Browser
ABSTRACT
Summary: ArachnoServer is a manually curated database that consolidates information on the sequence, structure, function and pharmacology of spider-venom toxins. Although spider venoms are complex chemical arsenals, the primary constituents are small disulfide-bridged peptides that target neuronal ion channels and receptors. Due to their high potency and selectivity, these peptides have been developed as pharmacological tools, bioinsecticides and drug leads. A new version of ArachnoServer (v3.0) has been developed that includes a bioinformatics pipeline for automated detection and analysis of peptide toxin transcripts in assembled venom-gland transcriptomes. ArachnoServer v3.0 was updated with the latest sequence, structure and functional data, the search-by-mass feature has been enhanced, and toxin cards provide additional information about each mature toxin. Availability and implementation: http://arachnoserver.org. Contact: support@arachnoserver.org. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Spider Venoms/chemistry , Animals , Automation, Laboratory , Disulfides/chemistry , Insect Proteins/chemistry , Peptides/chemistry , Spider Venoms/analysis
ABSTRACT
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensures uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
Subjects
Computational Biology/methods , Databases, Genetic , Genomics/methods , Search Engine , Software , Web Browser , Animals , Data Mining , Evolution, Molecular , Gene Expression Regulation , Genetic Variation , Genome, Human , Humans , Molecular Sequence Annotation , Species Specificity , Vertebrates
ABSTRACT
New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl's regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org.
Subjects
Computational Biology/methods , DNA/analysis , Databases, Genetic , Amino Acid Motifs , Animals , DNA Methylation , Epigenesis, Genetic , Epigenomics , Genome , Genome, Human , Genomics , Histones/chemistry , Humans , Mice , Molecular Sequence Annotation , Oligonucleotide Array Sequence Analysis
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.
Subjects
Databases, Genetic , Genome, Bacterial , Genome, Fungal , Genome, Plant , Invertebrates/genetics , Animals , Diploidy , Eukaryota/genetics , Genetic Variation , Genome , Polyploidy , Sequence Alignment
ABSTRACT
The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.
Subjects
Databases, Genetic , Genomics , Molecular Sequence Annotation , Animals , Genes , Genetic Variation , Humans , Internet , Mice , Proteins/genetics , Rats , Regulatory Sequences, Nucleic Acid , Software
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.
Subjects
Databases, Genetic , Genome , Animals , Edible Grain/genetics , Genome, Bacterial , Genome, Fungal , Genome, Plant , Genomics , Internet , Molecular Sequence Annotation , Software
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.
Subjects
Databases, Genetic , Genomics , Animals , Genome , Genome, Bacterial , Genome, Fungal , Genome, Plant , Invertebrates/genetics , Molecular Sequence Annotation , Systems Integration
ABSTRACT
Streptococcus oralis, a commensal species of the human oral cavity, belongs to the Mitis group of streptococci, which also includes one of the major human pathogens, S. pneumoniae. We report here the first complete genome sequence of this species. S. oralis Uo5, a high-level penicillin- and multiple-antibiotic-resistant isolate from Hungary, is competent for genetic transformation under laboratory conditions. Comparative and functional genomics of Uo5 will be important in understanding the evolution of pathogenesis among Mitis streptococci and their potential to engage in interspecies gene transfer.
Subjects
DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Genome, Bacterial , Sequence Analysis, DNA , Streptococcus oralis/genetics , Drug Resistance, Multiple, Bacterial , Humans , Hungary , Molecular Sequence Data , Mouth , Streptococcus oralis/drug effects , Streptococcus oralis/isolation & purification , Transformation, Bacterial
ABSTRACT
BACKGROUND: Post-transcriptional regulation by small RNAs (sRNAs) in bacteria is now recognized as a widespread regulatory mechanism modulating a variety of physiological responses including virulence. In Streptococcus pneumoniae, an important human pathogen, the first sRNAs to be described were found in the regulon of the CiaRH two-component regulatory system. Five of these sRNAs were detected and designated csRNAs for cia-dependent small RNAs. CiaRH pleiotropically affects β-lactam resistance, autolysis, virulence, and competence development by yet to be defined molecular mechanisms. Since CiaRH is highly conserved among streptococci, it is of interest to determine if csRNAs are also included in the CiaRH regulon in this group of organisms consisting of commensal as well as pathogenic species. Knowledge of the participation of csRNAs in CiaRH-dependent regulatory events will be the key to defining the physiological role of this important control system. RESULTS: Genes for csRNAs were predicted in streptococcal genomes and database entries other than S. pneumoniae by searching for CiaR-activated promoters located in intergenic regions that are followed by a transcriptional terminator. A total of 61 different candidate genes were obtained, specifying csRNAs ranging in size from 51 to 202 nt. Comparing these genes with each other revealed 40 different csRNA types. All streptococcal genomes harbored csRNA genes, their numbers varying between two and six. To validate these predictions, S. mitis, S. oralis, and S. sanguinis were subjected to csRNA-specific northern blot analysis. In addition, a csRNA gene from S. thermophilus plasmid pST0 introduced into S. pneumoniae was also tested. Each of the csRNAs was detected on these blots and showed the anticipated size. Thus, the method applied here is able to predict csRNAs with high precision.
CONCLUSIONS: The results of this study strongly suggest that genes for small non-coding RNAs, csRNAs, are part of the regulon of the two-component regulatory system CiaRH in all streptococci.
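The prediction strategy described in the RESULTS (a CiaR-activated promoter in an intergenic region, followed by a transcriptional terminator) might be sketched as below. This is a minimal illustration and not the authors' method: the promoter pattern and the T-run stand-in for a terminator are placeholder assumptions not given in the abstract, and using the reported 51 to 202 nt size range as a filter is likewise an assumption.

```python
import re
from typing import Iterator, NamedTuple


class Candidate(NamedTuple):
    start: int   # 0-based offset of the promoter match
    end: int     # offset one past the end of the terminator match
    length: int  # distance from promoter end to terminator end, in nt


# Placeholder patterns (assumptions, not from the abstract): a direct-repeat
# promoter motif and a run of T residues as a crude terminator proxy.
PROMOTER_MOTIF = r"TTTAAG[ACGT]{5}TTTAAG"
TERMINATOR = r"T{5,}"


def find_candidates(
    intergenic: str,
    promoter: str = PROMOTER_MOTIF,
    terminator: str = TERMINATOR,
    min_len: int = 51,
    max_len: int = 202,
) -> Iterator[Candidate]:
    """Yield promoter/terminator pairs whose spacing fits the size window."""
    seq = intergenic.upper()
    term_re = re.compile(terminator)
    for p in re.finditer(promoter, seq):
        # Take the first terminator-like match downstream of the promoter.
        t = term_re.search(seq, p.end())
        if t is None:
            continue
        length = t.end() - p.end()
        if min_len <= length <= max_len:
            yield Candidate(p.start(), t.end(), length)


if __name__ == "__main__":
    demo = "GG" + "TTTAAG" + "ACGTA" + "TTTAAG" + "ACG" * 20 + "TTTTTT" + "GG"
    for hit in find_candidates(demo):
        print(hit)
```

A realistic pipeline would scan both strands, locate the transcription start relative to the promoter, and score terminators by hairpin stability rather than by a T-run; those steps are omitted here.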
Subjects
Bacterial Proteins/genetics , Genes, Bacterial/genetics , RNA, Small Untranslated/genetics , Regulon/genetics , Streptococcus/genetics , Bacterial Proteins/metabolism , Base Sequence , Blotting, Northern , Conserved Sequence/genetics , Gene Expression Regulation, Bacterial , Humans , Molecular Sequence Data , Nucleic Acid Conformation , Plasmids/genetics , Promoter Regions, Genetic/genetics , RNA, Bacterial/chemistry , RNA, Bacterial/genetics , RNA, Small Untranslated/chemistry , Species Specificity
ABSTRACT
Streptococcus mitis is the closest relative of the major human pathogen S. pneumoniae. The 2.15 Mb sequence of the Streptococcus mitis B6 chromosome, an unusually high-level beta-lactam resistant and multiple antibiotic resistant strain, has now been determined to encode 2100 genes. The accessory genome is estimated to represent over 40%, including 75 mostly novel transposases and IS, the prophage phiB6 and another seven phage related regions. Tetracycline resistance mediated by Tn5801, and an unusual and large gene cluster containing three aminoglycoside resistance determinants have not been described in other Streptococcus spp. Comparative genomic analyses including hybridization experiments on a S. mitis B6 specific microarray reveal that individual S. mitis strains are almost as distantly related to the B6 strain as S. pneumoniae. Both species share a core of over 900 genes. Most proteins described as pneumococcal virulence factors are present in S. mitis B6, but the three choline binding proteins PcpA, PspA and PspC, and three gene clusters containing the hyaluronidase gene, ply and lytA, and the capsular genes are absent in S. mitis B6 and other S. mitis strains, confirming their importance for the pathogenic potential of S. pneumoniae. Despite the close relatedness between the two species, the S. mitis B6 genome reveals a striking X-alignment when compared with S. pneumoniae.
Subjects
Chromosomes, Bacterial/genetics , DNA, Bacterial/genetics , Genome, Bacterial/genetics , Streptococcus mitis/genetics , Amino Acid Sequence , Bacterial Proteins/genetics , Base Sequence , Chromosome Mapping , Comparative Genomic Hybridization , DNA Transposable Elements/genetics , DNA, Bacterial/chemistry , Molecular Sequence Data , Phylogeny , Sequence Analysis, DNA , Sequence Homology, Amino Acid , Species Specificity , Streptococcus mitis/classification , Streptococcus pneumoniae/genetics , Synteny , Virulence Factors/genetics
ABSTRACT
The production of bacteriocins can be favorable for colonization of the host by eliminating other bacterial species that share the same environment. In Streptococcus pneumoniae, the pnc (blp) locus encoding putative bacteriocins, immunity, and export proteins is controlled by a two-component system similar to the comCDE system required for the induction of genetic competence. A detailed comparison of the pnc clusters of four genetically distinct isolates confirmed the great plasticity of this locus and documented several repeat sequences. Members of the multiple-antibiotic-resistant Spain23F-1 clone, one member of the Spain9V-3 clone, sensitive 23F strain 2306, and the TIGR4 strain produced bactericidal substances active against other gram-positive bacteria and in some cases against S. pneumoniae as well. However, other strains did not show activity against the indicator strains despite the presence of a bacteriocin cluster, indicating that other factors are required for bacteriocin activity. Analysis of strain 2306 and mutant derivatives of this strain confirmed that bacteriocin production was dependent on the two-component regulatory system and genes involved in bacteriocin transport and processing. At least one other bacteriocin gene, pncE, is located elsewhere on the chromosome and might contribute to the bacteriocin activity of this strain.
Subjects
Bacteriocins/genetics , Genetic Variation , Streptococcus pneumoniae/genetics , Amino Acid Sequence , Bacteriocins/metabolism , Base Sequence , Chromosome Mapping , Genotype , Molecular Sequence Data , Oligodeoxyribonucleotides/chemistry , Polymerase Chain Reaction , RNA, Bacterial/genetics , Streptococcus pneumoniae/classification
ABSTRACT
The genome sequences of two strains of Streptococcus pneumoniae, one of the major human pathogens, are currently available: that of the nonencapsulated laboratory strain R6, the origin of which dates back to the early 20th century, and that of the recently isolated serotype 4 TIGR strain. The two genomes not only differ in size (2 versus 2.16 Mb) but also differ in approximately 10% of their genes, many of which are organized in large clusters. Their strain-specific genes and gene clusters are described here. The R6 genome contains 69 kb organized in six large regions that are absent from the TIGR strain, which in turn contains an extra 157 kb in twelve clusters compared to R6. In addition, the TIGR strain contains 13 clusters of 4 kb and larger that are not shared by a variety of genetically different S. pneumoniae strains. Many regions bear signs of gene transfer events such as the presence of insertion sequences, transposable elements, and putative site-specific integrases/recombinases. Three strain-specific regions are devoted to genes encoding proteins with the cell wall anchor motif LPXTG, which are important for the interaction with host cells and appear to be highly variable, similar to cell wall-associated choline-binding proteins.