Results 1 - 13 of 13
1.
Nucleic Acids Res ; 48(D1): D689-D695, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31598706

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genetic Variation , Genome, Bacterial , Genome, Fungal , Genome, Plant , Algorithms , Animals , Caenorhabditis elegans/genetics , Genomics , Internet , Molecular Sequence Annotation , Phenotype , Plants/genetics , Reference Values , Software , User-Computer Interface
2.
Nucleic Acids Res ; 48(D1): D682-D688, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31691826

ABSTRACT

Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), with data updates made available four times a year.


Subject(s)
Computational Biology/methods , Databases, Genetic , Epigenome , Molecular Sequence Annotation , Algorithms , Animals , Computer Graphics , Databases, Protein , Genetic Variation , Genome-Wide Association Study , Genomics , Histones/metabolism , Humans , Imaging, Three-Dimensional , Internet , Ligands , Search Engine , Software , Species Specificity , Transcriptome , User-Computer Interface , Web Browser
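The abstract for this record mentions programmatic interfaces without describing them. As a rough illustration only, the sketch below builds a request URL for a gene lookup. It assumes the public Ensembl REST service at rest.ensembl.org and its lookup/id endpoint; neither the endpoint, the helper name build_lookup_url, nor the example stable ID ENSG00000139618 comes from the abstract. No network call is made.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of the public Ensembl REST service (not named in the abstract).
ENSEMBL_REST = "https://rest.ensembl.org"


def build_lookup_url(stable_id: str, expand: bool = False) -> str:
    """Build a URL for the REST lookup/id endpoint (hypothetical helper)."""
    # Ask for a JSON response via the content-type query parameter.
    params = {"content-type": "application/json"}
    if expand:
        # Optionally request nested features such as transcripts.
        params["expand"] = "1"
    return f"{ENSEMBL_REST}/lookup/id/{quote(stable_id)}?{urlencode(params)}"


# Example stable ID chosen for illustration only.
url = build_lookup_url("ENSG00000139618")
print(url)
```

The resulting URL can be passed to any HTTP client; keeping URL construction separate from the request makes the sketch easy to check offline.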
3.
Nucleic Acids Res ; 46(D1): D802-D808, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29092050

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.


Subject(s)
Archaea/genetics , Bacteria/genetics , Databases, Genetic , Databases, Protein , Eukaryota/genetics , Genomics , Amino Acid Sequence , Animals , Base Sequence , Data Mining , Forecasting , Genome , Molecular Sequence Annotation , RNA/genetics , User-Computer Interface
4.
Nucleic Acids Res ; 44(D1): D688-93, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26476449

ABSTRACT

PhytoPath (www.phytopathdb.org) is a resource for genomic and phenotypic data from plant pathogen species that integrates phenotypic data for genes from PHI-base, an expertly curated catalog of genes with experimentally verified pathogenicity, with the Ensembl tools for data visualization and analysis. The resource is focused on fungal, protist (oomycete) and bacterial plant pathogens whose genomes have been sequenced and annotated. Genes with associated PHI-base data can be easily identified across all plant pathogen species using a BioMart-based query tool and visualized in their genomic context on the Ensembl genome browser. The PhytoPath resource contains data for 135 genomic sequences from 87 plant pathogen species, and 1364 genes curated for their role in pathogenicity and as targets for chemical intervention. Support for community annotation of gene models is provided using the WebApollo online gene editor, and we are working with interested communities to improve reference annotation for selected species.


Subject(s)
Databases, Genetic , Genomics , Host-Pathogen Interactions/genetics , Plant Diseases/microbiology , Genes, Bacterial , Genes, Fungal , Genome, Bacterial , Genome, Fungal , Oomycetes/genetics , Phenotype , Sequence Alignment
5.
Nucleic Acids Res ; 44(D1): D574-80, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578574

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.


Subject(s)
Databases, Genetic , Genome, Bacterial , Genome, Fungal , Genome, Plant , Invertebrates/genetics , Animals , Diploidy , Eukaryota/genetics , Genetic Variation , Genome , Polyploidy , Sequence Alignment
6.
Nucleic Acids Res ; 43(Database issue): D656-61, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25361970

ABSTRACT

PomBase (http://www.pombase.org) is the model organism database for the fission yeast Schizosaccharomyces pombe. PomBase provides a central hub for the fission yeast community, supporting both exploratory and hypothesis-driven research. It provides users easy access to data ranging from the sequence level, to molecular and phenotypic annotations, through to the display of genome-wide high-throughput studies. Recent improvements to the site extend annotation specificity, improve usability and allow for monthly data updates. Both in-house curators and community researchers provide manually curated data to PomBase. The genome browser provides access to published high-throughput data sets and the genomes of three additional Schizosaccharomyces species (Schizosaccharomyces cryophilus, Schizosaccharomyces japonicus and Schizosaccharomyces octosporus).


Subject(s)
Databases, Genetic , Schizosaccharomyces/genetics , Gene Expression , Gene Ontology , Genes, Fungal , Genomics , High-Throughput Nucleotide Sequencing , Internet , Molecular Sequence Annotation
7.
Trends Biotechnol ; 32(8): 396-9, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24929579

ABSTRACT

The research communities studying microbial model organisms, such as Escherichia coli or Saccharomyces cerevisiae, are well served by model organism databases that have extensive functional annotation. However, this is not true of many industrial microbes that are used widely in biotechnology. In this Opinion piece, we use Pichia (Komagataella) pastoris to illustrate the limitations of the available annotation. We consider the resources that can be implemented in the short term both to improve Gene Ontology (GO) annotation coverage based on annotation transfer, and to establish curation pipelines for the literature corpus of this organism.


Subject(s)
Biotechnology/methods , Fungal Proteins/physiology , Industrial Microbiology/methods , Molecular Sequence Annotation/methods , Pichia/physiology , Fungal Proteins/genetics , Fungal Proteins/metabolism , Pichia/genetics , Pichia/metabolism
8.
Nucleic Acids Res ; 42(Database issue): D546-52, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24163254

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.


Subject(s)
Databases, Genetic , Genome , Animals , Edible Grain/genetics , Genome, Bacterial , Genome, Fungal , Genome, Plant , Genomics , Internet , Molecular Sequence Annotation , Software
9.
Nucleic Acids Res ; 40(Database issue): D695-9, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22039153

ABSTRACT

PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.


Subject(s)
Databases, Genetic , Schizosaccharomyces/genetics , Genome, Fungal , Genomics , Internet , Molecular Sequence Annotation , Phenotype
10.
Nucleic Acids Res ; 40(Database issue): D91-7, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22067447

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.


Subject(s)
Databases, Genetic , Genomics , Animals , Genome , Genome, Bacterial , Genome, Fungal , Genome, Plant , Invertebrates/genetics , Molecular Sequence Annotation , Systems Integration
11.
Nucleic Acids Res ; 38(21): 7388-99, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20663773

ABSTRACT

Although the nucleolar localization of proteins is often believed to be mediated primarily by non-specific retention to core nucleolar components, many examples of short nucleolar targeting sequences have been reported in recent years. In this article, 46 human nucleolar localization sequences (NoLSs) were collated from the literature and subjected to statistical analysis. Of the residues in these NoLSs 48% are basic, whereas 99% of the residues are predicted to be solvent-accessible with 42% in α-helix and 57% in coil. The sequence and predicted protein secondary structure of the 46 NoLSs were used to train an artificial neural network to identify NoLSs. At a true positive rate of 54%, the predictor's overall false positive rate (FPR) is estimated to be 1.52%, which can be broken down to FPRs of 0.26% for randomly chosen cytoplasmic sequences, 0.80% for randomly chosen nucleoplasmic sequences and 12% for nuclear localization signals. The predictor was used to predict NoLSs in the complete human proteome and 10 of the highest scoring previously unknown NoLSs were experimentally confirmed. NoLSs are a prevalent type of targeting motif that is distinct from nuclear localization signals and that can be computationally predicted.


Subject(s)
Cell Nucleolus/chemistry , Neural Networks, Computer , Nuclear Proteins/chemistry , Protein Sorting Signals , Cell Line, Tumor , Computational Biology/methods , Humans , Nuclear Localization Signals , Nuclear Proteins/analysis , Viral Proteins/analysis , Viral Proteins/chemistry
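The abstract for this record reports the predictor's performance as a true positive rate (TPR) and a false positive rate (FPR). As a small worked illustration of how those two rates are defined, the sketch below computes them from confusion-matrix counts. The counts are hypothetical, chosen only so that the arithmetic reproduces the 54% and 1.52% figures; they are not the study's data.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # fraction of true signals that are detected
    fpr = fp / (fp + tn)  # fraction of non-signals that are wrongly flagged
    return tpr, fpr


# Hypothetical counts for illustration only (not taken from the paper).
tpr, fpr = rates(tp=27, fn=23, fp=19, tn=1231)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")
```

Because negatives usually far outnumber positives in a proteome-wide scan, even a low FPR can yield many false hits in absolute terms, which is why the abstract reports the FPR separately for each background class.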
12.
Nucleic Acids Res ; 37(Database issue): D651-6, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18988626

ABSTRACT

The PIPs database (http://www.compbio.dundee.ac.uk/www-pips) is a resource for studying protein-protein interactions in human. It contains predictions of >37,000 high probability interactions of which >34,000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. The interactions in PIPs were calculated by a Bayesian method that combines information from expression, orthology, domain co-occurrence, post-translational modifications and sub-cellular location. The predictions also take account of the topology of the predicted interaction network. The web interface to PIPs ranks predictions according to their likelihood of interaction broken down by the contribution from each information source and with easy access to the evidence that supports each prediction. Where data exists in OPHID, HPRD, DIP or BIND for a protein pair this is also reported in the output tables returned by a search. A network browser is included to allow convenient browsing of the interaction network for any protein in the database. The PIPs database provides a new resource on protein-protein interactions in human that is straightforward to browse, or can be exploited completely, for interaction network modelling.


Subject(s)
Databases, Protein , Protein Interaction Mapping , Humans , Internet , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , User-Computer Interface
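The abstract for this record describes a Bayesian method that combines several evidence sources into one likelihood of interaction, without giving the formula. One common way to do this, shown here purely as an assumption, is a naive Bayes combination: each source contributes a likelihood ratio, the ratios are multiplied (treating the sources as independent), and the product updates the prior odds. The function name, the prior and the ratios below are all hypothetical.

```python
import math


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Combine independent likelihood ratios with a prior (naive Bayes assumption)."""
    # Convert the prior probability to odds.
    prior_odds = prior / (1.0 - prior)
    # Sum logs instead of multiplying to stay numerically stable.
    log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
    odds = math.exp(log_odds)
    # Convert posterior odds back to a probability.
    return odds / (1.0 + odds)


# Hypothetical ratios for five evidence sources (e.g. expression, orthology).
evidence = [20.0, 5.0, 8.0, 1.5, 3.0]
print(round(posterior_probability(prior=0.001, likelihood_ratios=evidence), 3))
```

Keeping each source as a separate ratio also makes it straightforward to report the per-source contribution to a score, which is the kind of breakdown the abstract says the web interface presents.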
13.
Infect Genet Evol ; 4(3): 221-42, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15450202

ABSTRACT

A database of MALDI-TOF mass spectrometry (MS) profiles has been developed with the aim of establishing a high throughput system for the characterisation of microbes. Several parameters likely to affect the reproducibility of the mass spectrum of a taxon were exhaustively studied. These included sample preparation, growth phase, culture conditions, sample storage, mass range of ions, reproducibility between instruments and the methodology prior to database entry. Replicates of 12 spectra per sample were analysed using a 96-well target plate containing central wells for peptide standards to correct against mass drift during analysis. The quality of the data was assessed statistically prior to database addition using root mean squared values of <3.0 as the criterion for rejection. Cluster analysis using a nearest neighbour algorithm also enabled subsets of data to be compared. This was achieved using the bespoke MicrobeLynx™ software. Columbia blood agar was used to standardise all procedures for the database, since it permitted the culture of most human pathogens and also produced spectra with a broad range of mass ions. In some instances, alternative media such as CLED were used in specific studies with greater success. Following standardisation of the procedure, a database was developed comprising ca. 3500 spectra with multiple strain entries for most species. The results to date show unequivocally that as the number of strains per species increased, so too did the success of species matching. The technique demonstrated unique mass spectral profiles for each genus/species, with the variation in mass ions among strains/species being dependent on the intra-specific diversity. The success of identification against the database for wild-type strains ranged between 33 and 100%, with the lower percentages generally associated with poor representation of some species within the database. These findings provide a new dimension for the rapid and high throughput characterisation of human pathogens with potentially broad applications across the field of microbiology.


Subject(s)
Bacteria , Communicable Diseases , Databases, Factual , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Algorithms , Bacteria/classification , Bacteria/genetics , Bacteria/metabolism , Classification , Communicable Diseases/classification , Communicable Diseases/microbiology , Humans , Phylogeny , Software , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/instrumentation
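The abstract for this record mentions a root mean squared quality measure and nearest neighbour matching against a spectral database, but does not specify how either is computed. The sketch below is one plausible reading, not the MicrobeLynx method: spectra are assumed to be reduced to equal-length vectors of binned intensities, the RMS difference serves as the distance, and a query is assigned to the closest reference. All names and values are hypothetical.

```python
import math


def rms_difference(a: list[float], b: list[float]) -> float:
    """Root mean squared difference between two equal-length intensity vectors."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bins")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def nearest_neighbour(query: list[float], library: dict[str, list[float]]) -> str:
    """Return the label of the reference spectrum closest to the query."""
    return min(library, key=lambda label: rms_difference(query, library[label]))


# Hypothetical binned reference spectra and query, for illustration only.
library = {
    "species_A": [10.0, 0.0, 5.0, 1.0],
    "species_B": [0.0, 8.0, 1.0, 6.0],
}
query = [9.0, 1.0, 4.0, 2.0]
best = nearest_neighbour(query, library)
print(best, round(rms_difference(query, library[best]), 3))
```

With several reference strains per species in the library, a query has more chances of finding a close neighbour, which is consistent with the abstract's observation that matching success rose with the number of strains per species.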