Results 1 - 13 of 13
1.
Nucleic Acids Res ; 48(D1): D682-D688, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31691826

ABSTRACT

Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation, such as genes, variation, regulation and comparative genomics, across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), with data updates released four times a year.


Subjects
Computational Biology/methods , Genetic Databases , Epigenome , Molecular Sequence Annotation , Algorithms , Animals , Computer Graphics , Protein Databases , Genetic Variation , Genome-Wide Association Study , Genomics , Histones/metabolism , Humans , Three-Dimensional Imaging , Internet , Ligands , Search Engine , Software , Species Specificity , Transcriptome , User-Computer Interface , Web Browser
2.
Nucleic Acids Res ; 48(D1): D689-D695, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31598706

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth of the resource, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.


Subjects
Computational Biology/methods , Genetic Databases , Genetic Variation , Bacterial Genome , Fungal Genome , Plant Genome , Algorithms , Animals , Caenorhabditis elegans/genetics , Genomics , Internet , Molecular Sequence Annotation , Phenotype , Plants/genetics , Reference Values , Software , User-Computer Interface
3.
Nucleic Acids Res ; 46(D1): D802-D808, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29092050

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.


Subjects
Archaea/genetics , Bacteria/genetics , Genetic Databases , Protein Databases , Eukaryota/genetics , Genomics , Amino Acid Sequence , Animals , Base Sequence , Data Mining , Forecasting , Genome , Molecular Sequence Annotation , RNA/genetics , User-Computer Interface
4.
Nucleic Acids Res ; 44(D1): D574-80, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578574

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.


Subjects
Genetic Databases , Bacterial Genome , Fungal Genome , Plant Genome , Invertebrates/genetics , Animals , Diploidy , Eukaryota/genetics , Genetic Variation , Genome , Polyploidy , Sequence Alignment
5.
Nucleic Acids Res ; 44(D1): D688-93, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26476449

ABSTRACT

PhytoPath (www.phytopathdb.org) is a resource for genomic and phenotypic data from plant pathogen species that integrates phenotypic data for genes from PHI-base, an expertly curated catalog of genes with experimentally verified pathogenicity, with the Ensembl tools for data visualization and analysis. The resource is focused on fungal, protist (oomycete) and bacterial plant pathogens whose genomes have been sequenced and annotated. Genes with associated PHI-base data can be easily identified across all plant pathogen species using a BioMart-based query tool and visualized in their genomic context on the Ensembl genome browser. The PhytoPath resource contains data for 135 genomic sequences from 87 plant pathogen species, and 1364 genes curated for their role in pathogenicity and as targets for chemical intervention. Support for community annotation of gene models is provided using the WebApollo online gene editor, and we are working with interested communities to improve reference annotation for selected species.


Subjects
Genetic Databases , Genomics , Host-Pathogen Interactions/genetics , Plant Diseases/microbiology , Bacterial Genes , Fungal Genes , Bacterial Genome , Fungal Genome , Oomycetes/genetics , Phenotype , Sequence Alignment
6.
Nucleic Acids Res ; 43(Database issue): D656-61, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25361970

ABSTRACT

PomBase (http://www.pombase.org) is the model organism database for the fission yeast Schizosaccharomyces pombe. PomBase provides a central hub for the fission yeast community, supporting both exploratory and hypothesis-driven research. It provides users easy access to data ranging from the sequence level, to molecular and phenotypic annotations, through to the display of genome-wide high-throughput studies. Recent improvements to the site extend annotation specificity, improve usability and allow for monthly data updates. Both in-house curators and community researchers provide manually curated data to PomBase. The genome browser provides access to published high-throughput data sets and the genomes of three additional Schizosaccharomyces species (Schizosaccharomyces cryophilus, Schizosaccharomyces japonicus and Schizosaccharomyces octosporus).


Subjects
Genetic Databases , Schizosaccharomyces/genetics , Gene Expression , Gene Ontology , Fungal Genes , Genomics , High-Throughput Nucleotide Sequencing , Internet , Molecular Sequence Annotation
7.
Trends Biotechnol ; 32(8): 396-9, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24929579

ABSTRACT

The research communities studying microbial model organisms, such as Escherichia coli or Saccharomyces cerevisiae, are well served by model organism databases that have extensive functional annotation. However, this is not true of many industrial microbes that are used widely in biotechnology. In this Opinion piece, we use Pichia (Komagataella) pastoris to illustrate the limitations of the available annotation. We consider the resources that can be implemented in the short term both to improve Gene Ontology (GO) annotation coverage based on annotation transfer, and to establish curation pipelines for the literature corpus of this organism.


Subjects
Biotechnology/methods , Fungal Proteins/physiology , Industrial Microbiology/methods , Molecular Sequence Annotation/methods , Pichia/physiology , Fungal Proteins/genetics , Fungal Proteins/metabolism , Pichia/genetics , Pichia/metabolism
8.
Nucleic Acids Res ; 42(Database issue): D546-52, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24163254

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes.


Subjects
Genetic Databases , Genome , Animals , Edible Grain/genetics , Bacterial Genome , Fungal Genome , Plant Genome , Genomics , Internet , Molecular Sequence Annotation , Software
9.
Nucleic Acids Res ; 40(Database issue): D695-9, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22039153

ABSTRACT

PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.


Subjects
Genetic Databases , Schizosaccharomyces/genetics , Fungal Genome , Genomics , Internet , Molecular Sequence Annotation , Phenotype
10.
Nucleic Acids Res ; 40(Database issue): D91-7, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22067447

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.


Subjects
Genetic Databases , Genomics , Animals , Genome , Bacterial Genome , Fungal Genome , Plant Genome , Invertebrates/genetics , Molecular Sequence Annotation , Systems Integration
11.
Nucleic Acids Res ; 38(21): 7388-99, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20663773

ABSTRACT

Although the nucleolar localization of proteins is often believed to be mediated primarily by non-specific retention to core nucleolar components, many examples of short nucleolar targeting sequences have been reported in recent years. In this article, 46 human nucleolar localization sequences (NoLSs) were collated from the literature and subjected to statistical analysis. Of the residues in these NoLSs, 48% are basic, whereas 99% of the residues are predicted to be solvent-accessible, with 42% in α-helix and 57% in coil. The sequence and predicted protein secondary structure of the 46 NoLSs were used to train an artificial neural network to identify NoLSs. At a true positive rate of 54%, the predictor's overall false positive rate (FPR) is estimated to be 1.52%, which can be broken down to FPRs of 0.26% for randomly chosen cytoplasmic sequences, 0.80% for randomly chosen nucleoplasmic sequences and 12% for nuclear localization signals. The predictor was used to predict NoLSs in the complete human proteome, and 10 of the highest scoring previously unknown NoLSs were experimentally confirmed. NoLSs are a prevalent type of targeting motif that is distinct from nuclear localization signals and that can be computationally predicted.
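
As an illustration of the rates quoted in this abstract, the following minimal sketch shows how a true positive rate and a false positive rate can be computed for a score-based predictor at a chosen cutoff. The function name `rates_at_threshold`, the toy scores and the cutoff are hypothetical and are not taken from the paper; the neural network itself is not reproduced.

```python
from typing import Sequence, Tuple


def rates_at_threshold(
    scores: Sequence[float],
    labels: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) at a score cutoff.

    A sequence is called positive when its score is >= threshold.
    `labels` holds the known truth (True = real positive).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    tp = fp = fn = tn = 0
    for score, is_positive in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_positive:
            tp += 1
        elif predicted and not is_positive:
            fp += 1
        elif not predicted and is_positive:
            fn += 1
        else:
            tn += 1

    # Guard against empty classes so the function never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Hypothetical toy data: two known positives, four known negatives.
    toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [True, False, True, False, False, False]
    print(rates_at_threshold(toy_scores, toy_labels, threshold=0.5))
```

Raising the cutoff can only lower or preserve both rates, which is why a predictor is typically reported as an FPR at a fixed TPR.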


Subjects
Cell Nucleolus/chemistry , Computer Neural Networks , Nuclear Proteins/chemistry , Protein Sorting Signals , Tumor Cell Line , Computational Biology/methods , Humans , Nuclear Localization Signals , Nuclear Proteins/analysis , Viral Proteins/analysis , Viral Proteins/chemistry
12.
Nucleic Acids Res ; 37(Database issue): D651-6, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18988626

ABSTRACT

The PIPs database (http://www.compbio.dundee.ac.uk/www-pips) is a resource for studying protein-protein interactions in human. It contains predictions of >37,000 high probability interactions of which >34,000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. The interactions in PIPs were calculated by a Bayesian method that combines information from expression, orthology, domain co-occurrence, post-translational modifications and sub-cellular location. The predictions also take account of the topology of the predicted interaction network. The web interface to PIPs ranks predictions according to their likelihood of interaction broken down by the contribution from each information source and with easy access to the evidence that supports each prediction. Where data exists in OPHID, HPRD, DIP or BIND for a protein pair this is also reported in the output tables returned by a search. A network browser is included to allow convenient browsing of the interaction network for any protein in the database. The PIPs database provides a new resource on protein-protein interactions in human that is straightforward to browse, or can be exploited completely, for interaction network modelling.
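
The abstract does not spell out the Bayesian calculation. One common way to combine several evidence sources, shown here purely as an assumption, is a naive Bayes scheme in which each source contributes a likelihood ratio and the sources are treated as independent. The function names and all numbers below are illustrative and are not values from PIPs.

```python
import math
from typing import Mapping


def posterior_odds(prior_odds: float, likelihood_ratios: Mapping[str, float]) -> float:
    """Combine a prior with per-source likelihood ratios (naive Bayes).

    posterior odds = prior odds * product of likelihood ratios.
    The product is accumulated in log space for numerical stability.
    Treating the sources as independent is an assumption of this sketch.
    """
    if prior_odds <= 0:
        raise ValueError("prior_odds must be positive")
    log_odds = math.log(prior_odds)
    for source, ratio in likelihood_ratios.items():
        if ratio <= 0:
            raise ValueError(f"likelihood ratio for {source!r} must be positive")
        log_odds += math.log(ratio)
    return math.exp(log_odds)


def odds_to_probability(odds: float) -> float:
    """Convert odds to a probability in [0, 1)."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical evidence for one protein pair.
    evidence = {"expression": 10.0, "orthology": 50.0, "domain_co_occurrence": 4.0}
    odds = posterior_odds(prior_odds=0.001, likelihood_ratios=evidence)
    print(odds, odds_to_probability(odds))
```

Keeping the per-source ratios in a mapping also makes it easy to report how much each information source contributed to a prediction, in the spirit of the breakdown the web interface is described as showing.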


Subjects
Protein Databases , Protein Interaction Mapping , Humans , Internet , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , User-Computer Interface
13.
Infect Genet Evol ; 4(3): 221-42, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15450202

ABSTRACT

A database of MALDI-TOF mass spectrometry (MS) profiles has been developed with the aim of establishing a high throughput system for the characterisation of microbes. Several parameters likely to affect the reproducibility of the mass spectrum of a taxon were exhaustively studied. These included such criteria as sample preparation, growth phase, culture conditions, sample storage, mass range of ions, reproducibility between instruments and the methodology prior to database entry. Replicates of 12 spectra per sample were analysed using a 96-well target plate containing central wells for peptide standards to correct against mass drift during analysis. The quality of the data was assessed statistically prior to database addition using root mean squared values of <3.0 as the criterion for rejection. Cluster analysis using a nearest neighbour algorithm also enabled subsets of data to be compared. This was achieved using the bespoke MicrobeLynx™ software. Columbia blood agar was used to standardise all procedures for the database, since it permitted the culture of most human pathogens and also produced spectra with a broad range of mass ions. In some instances, alternative media such as CLED were used in specific studies with greater success. Following standardisation of the procedure, a database was developed comprising ca. 3500 spectra with multiple strain entries for most species. The results to date show unequivocally that as the number of strains per species increased, so too did the success of species matching. The technique demonstrated unique mass spectral profiles for each genus/species, with the variation in mass ions among strains/species being dependent on the intra-specific diversity. The success of identification against the database for wild-type strains ranged between 33 and 100%; the lower percentage results being generally associated with poor representation of some species within the database. These findings provide a new dimension for the rapid and high throughput characterisation of human pathogens with potentially broad applications across the field of microbiology.
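
The abstract mentions a root mean squared quality measure and nearest neighbour matching without giving formulas. The sketch below is one plausible reading, offered as an assumption: spectra are represented as equal-length intensity vectors on a shared mass axis, and a query is matched to the database entry with the smallest root mean squared difference. It does not reproduce the MicrobeLynx software, and it does not model how the 3.0 threshold was applied; all names and values are hypothetical.

```python
import math
from typing import Mapping, Sequence, Tuple


def rms_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean squared difference between two intensity vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def nearest_neighbour(
    query: Sequence[float],
    database: Mapping[str, Sequence[float]],
) -> Tuple[str, float]:
    """Return the name of the closest database spectrum and its distance."""
    if not database:
        raise ValueError("database is empty")
    best = min(database, key=lambda name: rms_difference(query, database[name]))
    return best, rms_difference(query, database[best])


if __name__ == "__main__":
    # Hypothetical three-bin spectra; real profiles would have many more bins.
    reference = {
        "species_a": [1.0, 0.0, 0.0],
        "species_b": [0.0, 1.0, 0.0],
    }
    print(nearest_neighbour([0.9, 0.1, 0.0], reference))
```

With more strains per species in `database`, a query has more chances of lying close to a correct entry, which is consistent with the abstract's observation that matching improved as strain coverage grew.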


Subjects
Bacteria , Communicable Diseases , Factual Databases , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry , Algorithms , Bacteria/classification , Bacteria/genetics , Bacteria/metabolism , Classification , Communicable Diseases/classification , Communicable Diseases/microbiology , Humans , Phylogeny , Software , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/instrumentation