1 - 20 of 22
1.
Nucleic Acids Res ; 48(D1): D376-D382, 2020 01 08.
Article En | MEDLINE | ID: mdl-31724711

The Structural Classification of Proteins (SCOP) database is a classification of protein domains organised according to their evolutionary and structural relationships. We report a major effort to increase the coverage of structural data, aiming to provide classification of almost all domain superfamilies with representatives in the PDB. We have also improved the database schema, provided a new API and modernised the web interface. This is by far the most significant update in coverage since SCOP 1.75 and builds on the advances in schema from the SCOP 2 prototype. The database is accessible from http://scop.mrc-lmb.cam.ac.uk.


Databases, Protein , Protein Domains , Proteins/chemistry , Evolution, Molecular , Internet , Proteins/metabolism , Software , User-Computer Interface
3.
Article En | MEDLINE | ID: mdl-26896847

Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolution-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available, including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects, including Ensembl Genomes and Gramene, and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates make Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org.


Computational Biology/methods , Genome , Genomics , Algorithms , Animals , DNA, Complementary/genetics , Databases, Genetic , Evolution, Molecular , Expressed Sequence Tags , Humans , Phylogeny , Quality Control , RNA, Untranslated/genetics , Sequence Alignment , Sequence Analysis, RNA , Software
4.
Nucleic Acids Res ; 44(D1): D688-93, 2016 Jan 04.
Article En | MEDLINE | ID: mdl-26476449

PhytoPath (www.phytopathdb.org) is a resource for genomic and phenotypic data from plant pathogen species that integrates phenotypic data for genes from PHI-base, an expertly curated catalog of genes with experimentally verified pathogenicity, with the Ensembl tools for data visualization and analysis. The resource is focused on fungal, protist (oomycete) and bacterial plant pathogens whose genomes have been sequenced and annotated. Genes with associated PHI-base data can be easily identified across all plant pathogen species using a BioMart-based query tool and visualized in their genomic context on the Ensembl genome browser. The PhytoPath resource contains data for 135 genomic sequences from 87 plant pathogen species, and 1364 genes curated for their role in pathogenicity and as targets for chemical intervention. Support for community annotation of gene models is provided using the WebApollo online gene editor, and we are working with interested communities to improve reference annotation for selected species.


Databases, Genetic , Genomics , Host-Pathogen Interactions/genetics , Plant Diseases/microbiology , Genes, Bacterial , Genes, Fungal , Genome, Bacterial , Genome, Fungal , Oomycetes/genetics , Phenotype , Sequence Alignment
5.
Nucleic Acids Res ; 44(D1): D574-80, 2016 Jan 04.
Article En | MEDLINE | ID: mdl-26578574

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.


Databases, Genetic , Genome, Bacterial , Genome, Fungal , Genome, Plant , Invertebrates/genetics , Animals , Diploidy , Eukaryota/genetics , Genetic Variation , Genome , Polyploidy , Sequence Alignment
6.
Curr Protoc Bioinformatics ; 49: 1.26.1-1.26.21, 2015 Mar 09.
Article En | MEDLINE | ID: mdl-25754991

SCOP2 is a successor to the Structural Classification of Proteins (SCOP) database that organizes proteins of known structure according to their structural and evolutionary relationships. It was designed to provide a more advanced framework for the classification of proteins. The SCOP2 classification is described in terms of a directed acyclic graph in which each node defines a relationship of particular type that is represented by a region of protein structure and sequence. The SCOP2 data are accessible via SCOP2-Browser and SCOP2-Graph. This protocol unit describes different ways to explore and investigate the SCOP2 evolutionary and structural groupings.


Databases, Protein , Evolution, Molecular , Proteins/chemistry , Amino Acid Sequence , Internet , Protein Structure, Secondary , Protein Structure, Tertiary
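The directed acyclic graph described in the SCOP2 abstract above can be pictured with a small sketch. This is only an illustration of the general idea (typed nodes tied to a region of a protein, linked by many-to-many parent/child edges with no cycles) and not the SCOP2 schema or API: the class names, identifiers, relationship labels and residue ranges below are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One classification node (all fields are illustrative assumptions)."""
    node_id: str    # hypothetical identifier, not a real SCOP2 accession
    rel_type: str   # relationship type, e.g. "superfamily" or "family"
    region: tuple   # (start, end) residue range the node is represented by


class ClassificationDAG:
    """Directed acyclic graph with many-to-many parent/child links."""

    def __init__(self):
        self.nodes = {}
        self.parents = defaultdict(set)
        self.children = defaultdict(set)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, parent_id, child_id):
        """Link parent -> child, refusing any edge that would close a cycle."""
        if parent_id not in self.nodes or child_id not in self.nodes:
            raise KeyError("both nodes must be added before linking them")
        if parent_id == child_id or parent_id in self.descendants(child_id):
            raise ValueError("edge would create a cycle")
        self.children[parent_id].add(child_id)
        self.parents[child_id].add(parent_id)

    def _reach(self, start, links):
        """Collect every node reachable from start by following links."""
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            for nxt in links.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def descendants(self, node_id):
        return self._reach(node_id, self.children)

    def ancestors(self, node_id):
        return self._reach(node_id, self.parents)


def build_example():
    """Build a toy graph in which one region has two parent nodes."""
    dag = ClassificationDAG()
    dag.add_node(Node("sf1", "superfamily", (1, 150)))
    dag.add_node(Node("fam1", "family", (1, 150)))
    dag.add_node(Node("fam2", "family", (20, 140)))
    dag.add_node(Node("region1", "protein region", (25, 130)))
    dag.add_edge("sf1", "fam1")
    dag.add_edge("sf1", "fam2")
    dag.add_edge("fam1", "region1")   # the same region sits under two
    dag.add_edge("fam2", "region1")   # parents, which a strict tree forbids
    return dag
```

Calling `build_example().ancestors("region1")` walks up through both parents, which is the practical difference between a graph of this kind and a strict tree-like hierarchy.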
7.
Nucleic Acids Res ; 43(Database issue): D123-9, 2015 01.
Article En | MEDLINE | ID: mdl-25352543

The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.


Databases, Nucleic Acid , RNA, Untranslated/chemistry , Chromosome Mapping , Humans , Internet , RNA, Untranslated/genetics , Sequence Analysis, RNA
8.
Nucleic Acids Res ; 42(Database issue): D546-52, 2014 Jan.
Article En | MEDLINE | ID: mdl-24163254

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes.


Databases, Genetic , Genome , Animals , Edible Grain/genetics , Genome, Bacterial , Genome, Fungal , Genome, Plant , Genomics , Internet , Molecular Sequence Annotation , Software
9.
Nucleic Acids Res ; 42(Database issue): D749-55, 2014 Jan.
Article En | MEDLINE | ID: mdl-24316576

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.


Databases, Genetic , Genomics , Animals , Chordata/genetics , Genetic Variation , Humans , Internet , Mice , Molecular Sequence Annotation , Phenotype , Rats
10.
Nucleic Acids Res ; 42(Database issue): D310-4, 2014 Jan.
Article En | MEDLINE | ID: mdl-24293656

We present a prototype of a new structural classification of proteins, SCOP2 (http://scop2.mrc-lmb.cam.ac.uk/), that we have developed recently. SCOP2 is a successor to the Structural Classification of Proteins (SCOP, http://scop.mrc-lmb.cam.ac.uk/scop/) database. Similarly to SCOP, the main focus of SCOP2 is to organize structurally characterized proteins according to their structural and evolutionary relationships. SCOP2 was designed to provide a more advanced framework for protein structure annotation and classification. It defines a new approach to the classification of proteins that is essentially different from SCOP, but retains its best features. The SCOP2 classification is described in terms of a directed acyclic graph in which nodes form a complex network of many-to-many relationships and are represented by a region of protein structure and sequence. The new classification project is expected to ensure new advances in the field and open new areas of research.


Databases, Protein , Protein Structure, Tertiary , Data Mining , Internet , Proteins/classification
11.
Nucleic Acids Res ; 41(Database issue): D48-55, 2013 Jan.
Article En | MEDLINE | ID: mdl-23203987

The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.


Databases, Genetic , Genomics , Animals , Gene Expression Regulation , Genetic Variation , Humans , Internet , Mice , Molecular Sequence Annotation , Rats , Software , Zebrafish/genetics
12.
Nat Methods ; 9(5): 459-62, 2012 Apr 27.
Article En | MEDLINE | ID: mdl-22543379

The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.


Databases, Genetic , Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , Genetic Variation , Humans
13.
Nucleic Acids Res ; 40(Database issue): D84-90, 2012 Jan.
Article En | MEDLINE | ID: mdl-22086963

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.


Databases, Genetic , Genomics , Animals , Gene Expression Regulation , Genetic Variation , Humans , Mice , Molecular Sequence Annotation , Rats
14.
Nucleic Acids Res ; 40(Database issue): D91-7, 2012 Jan.
Article En | MEDLINE | ID: mdl-22067447

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.


Databases, Genetic , Genomics , Animals , Genome , Genome, Bacterial , Genome, Fungal , Genome, Plant , Invertebrates/genetics , Molecular Sequence Annotation , Systems Integration
15.
Nucleic Acids Res ; 39(Database issue): D800-6, 2011 Jan.
Article En | MEDLINE | ID: mdl-21045057

The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.


Databases, Genetic , Genomics , Animals , Genetic Variation , Humans , Mice , Molecular Sequence Annotation , Rats , Regulatory Sequences, Nucleic Acid , Software , Zebrafish/genetics
16.
BMC Genomics ; 11: 293, 2010 May 11.
Article En | MEDLINE | ID: mdl-20459805

BACKGROUND: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. DESCRIPTION: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. CONCLUSIONS: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.


Databases, Genetic , Genetic Variation , Genomics/methods , Algorithms , Animals , Base Sequence , Cattle , Genotype , Humans , Internet , Linkage Disequilibrium , Mice , Phenotype , Phylogeny , Polymorphism, Single Nucleotide , Rats , Sequence Analysis, DNA , User-Computer Interface
17.
Nucleic Acids Res ; 38(Database issue): D557-62, 2010 Jan.
Article En | MEDLINE | ID: mdl-19906699

Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.


Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Access to Information , Animals , Computational Biology/trends , Databases, Protein , Genetic Variation , Genomics/methods , Humans , Information Storage and Retrieval/methods , Internet , Protein Structure, Tertiary , Software , Species Specificity
18.
BMC Bioinformatics ; 9 Suppl 8: S3, 2008 Jul 22.
Article En | MEDLINE | ID: mdl-18673527

BACKGROUND: The Distributed Annotation System (DAS) is a widely adopted protocol for dynamically integrating a wide range of biological data from geographically diverse sources. DAS continues to expand its applicability and evolve in response to new challenges facing integrative bioinformatics. RESULTS: Here we describe the various infrastructure components of DAS and present a new extended version of the DAS specification. Version 1.53E incorporates several recent developments, including its extension to serve new data types and an ontology for protein features. CONCLUSION: Our extensions to the DAS protocol have facilitated the integration of new data types, and our improvements to the existing DAS infrastructure have addressed recent challenges. The steadily increasing number of available data sources demonstrates further adoption of the DAS protocol.


Database Management Systems , Databases, Genetic , Information Storage and Retrieval/methods , Computational Biology/methods , Systems Integration
19.
Nat Biotechnol ; 26(7): 779-85, 2008 Jul.
Article En | MEDLINE | ID: mdl-18612301

DNA methylation is an indispensable epigenetic modification required for regulating the expression of mammalian genomes. Immunoprecipitation-based methods for DNA methylome analysis are rapidly shifting the bottleneck in this field from data generation to data analysis, necessitating the development of better analytical tools. In particular, an inability to estimate absolute methylation levels remains a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling. To address this issue, we developed a cross-platform algorithm, Bayesian tool for methylation analysis (Batman), for analyzing methylated DNA immunoprecipitation (MeDIP) profiles generated using oligonucleotide arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). We developed the latter approach to provide a high-resolution whole-genome DNA methylation profile (DNA methylome) of a mammalian genome. Strong correlation of our data, obtained using mature human spermatozoa, with those obtained using bisulfite sequencing suggests that combining MeDIP-seq or MeDIP-chip with Batman provides a robust, quantitative and cost-effective functional genomic strategy for elucidating the function of DNA methylation.


Algorithms , Chromatin Immunoprecipitation/methods , Chromosome Mapping/methods , DNA Methylation , DNA/genetics , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Base Sequence , Bayes Theorem , Molecular Sequence Data
20.
Genome Res ; 18(9): 1518-29, 2008 Sep.
Article En | MEDLINE | ID: mdl-18577705

We report a novel resource (methylation profiles of DNA, or mPod) for human genome-wide tissue-specific DNA methylation profiles. mPod consists of three fully integrated parts: genome-wide DNA methylation reference profiles of 13 normal somatic tissues, placenta, sperm, and an immortalized cell line; a visualization tool that has been integrated with the Ensembl genome browser; and a new algorithm for the analysis of immunoprecipitation-based DNA methylation profiles. We demonstrate the utility of our resource by identifying the first comprehensive genome-wide set of tissue-specific differentially methylated regions (tDMRs) that may play a role in cellular identity and the regulation of tissue-specific genome function. We also discuss the implications of our findings with respect to the regulatory potential of regions with varied CpG density, gene expression, transcription factor motifs, gene ontology, and correlation with other epigenetic marks such as histone modifications.


DNA Methylation , Genome, Human , Software , Algorithms , CpG Islands , DNA/metabolism , Epigenesis, Genetic , Gene Expression Profiling/methods , Humans
...