1.
Nucleic Acids Res ; 49(D1): D412-D419, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33125078

ABSTRACT

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein , Proteins/metabolism , Proteome/metabolism , Animals , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Models, Molecular , Protein Structure, Tertiary , Proteins/chemistry , Proteins/genetics , Proteome/classification , Proteome/genetics , Repetitive Sequences, Amino Acid/genetics , SARS-CoV-2/genetics , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods
2.
Nucleic Acids Res ; 48(D1): D570-D578, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31696235

ABSTRACT

MGnify (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored according to the input data, and that are formally described using the Common Workflow Language, enabling greater provenance, reusability, and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2) and expanded protein functional annotations. Biochemical pathways and systems predictions have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen the number of datasets it has assembled and analysed increase six-fold. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.


Subject(s)
Metagenome , Microbiota , Phylogeny , Software , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , DNA, Ribosomal Spacer/genetics , Databases, Genetic , Metagenomics/methods
3.
Nucleic Acids Res ; 47(D1): D564-D572, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30364992

ABSTRACT

Automatic annotation of protein function is routinely applied to newly sequenced genomes. While this provides a fine-grained view of an organism's functional protein repertoire, proteins more commonly function in a coordinated manner, such as in pathways or multimeric complexes. Genome Properties (GPs) define such functional entities as a series of steps, originally described by either TIGRFAMs or Pfam entries. To increase the scope of coverage, we have migrated GPs to function as a companion resource utilizing InterPro entries. Having introduced GPs-specific versioned releases, we provide software and data via a GitHub repository, and have developed a new web interface to GPs (available at https://www.ebi.ac.uk/interpro/genomeproperties). In addition to exploring each of the 1286 GPs, the website contains GPs pre-calculated for a representative set of proteomes; these results can be used to profile GPs phylogenetically via an interactive viewer. Users can upload novel data to the viewer for comparison with the pre-calculated results. Over the last year, we have added ∼700 new GPs, increasing the coverage of eukaryotic systems, as well as increasing general coverage through automatic generation of GPs from related resources. All data are freely available via the website and the GitHub repository.


Subject(s)
Databases, Protein , Genome , Proteins/genetics , Genome, Microbial , Metabolic Networks and Pathways/genetics , Multiprotein Complexes/genetics , Proteins/metabolism , Proteome
4.
Nucleic Acids Res ; 47(D1): D427-D432, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357350

ABSTRACT

The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking of authorship of all Pfam entries to the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link it to their ORCID record.


Subject(s)
Databases, Protein , Proteins/classification , Molecular Sequence Annotation , Protein Domains , Proteins/chemistry , Repetitive Sequences, Amino Acid
5.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface