1.
Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.


Subject(s)
Databases, Protein , Humans , Amino Acid Sequence , Artificial Intelligence , Internet , Proteins/chemistry , Software
2.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36484697

ABSTRACT

MOTIVATION: To provide high quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small molecule ligands. RESULTS: We structured the data model for cognate ligand binding site annotations in UniProtKB and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from ChEBI, which we now use as the reference vocabulary for all such annotations. We developed improved search and query facilities for cognate ligands in the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that ChEBI provides. AVAILABILITY AND IMPLEMENTATION: Binding site annotations for cognate ligands described using ChEBI are available for UniProtKB protein sequence records in several formats (text, XML and RDF) and are freely available to query and download through the UniProt website (www.uniprot.org), REST API (www.uniprot.org/help/api), SPARQL endpoint (sparql.uniprot.org/) and FTP site (https://ftp.uniprot.org/pub/databases/uniprot/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Knowledge Bases , Databases, Protein , Ligands , Amino Acid Sequence , Binding Sites , Molecular Sequence Annotation
3.
Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.


Subject(s)
Databases, Protein , Proteins/chemistry , Amino Acid Sequence , COVID-19/metabolism , Internet , Molecular Sequence Annotation , Protein Domains , Protein Interaction Maps , SARS-CoV-2/metabolism , Sequence Alignment
4.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.
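
The distinction between sequence coverage and residue coverage mentioned in this abstract can be illustrated with a small sketch. It assumes the usual reading of the two terms (the fraction of proteins with at least one match, and the fraction of all residues lying inside at least one match); the abstract does not define them, and the protein names, lengths and intervals below are hypothetical toy data, not InterPro output.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: protein -> (length, 1-based inclusive match intervals).
Proteins = Dict[str, Tuple[int, List[Tuple[int, int]]]]

toy: Proteins = {
    "P1": (100, [(1, 50), (40, 60)]),  # overlapping matches cover residues 1-60
    "P2": (200, []),                   # no match at all
    "P3": (100, [(11, 30)]),           # one match covering 20 residues
}


def sequence_coverage(proteins: Proteins) -> float:
    """Fraction of proteins that have at least one match."""
    if not proteins:
        return 0.0
    matched = sum(1 for _, intervals in proteins.values() if intervals)
    return matched / len(proteins)


def covered_residues(intervals: List[Tuple[int, int]]) -> int:
    """Number of residues in the union of 1-based inclusive intervals."""
    total = 0
    current_end = 0  # last residue already counted
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip residues counted earlier
        if end >= start:
            total += end - start + 1
            current_end = end
    return total


def residue_coverage(proteins: Proteins) -> float:
    """Fraction of all residues that fall inside at least one match."""
    total_length = sum(length for length, _ in proteins.values())
    if total_length == 0:
        return 0.0
    covered = sum(covered_residues(iv) for _, iv in proteins.values())
    return covered / total_length


if __name__ == "__main__":
    print(f"sequence coverage: {sequence_coverage(toy):.3f}")
    print(f"residue coverage:  {residue_coverage(toy):.3f}")
```

In the toy data two of three proteins are matched, but only 80 of 400 residues are covered, which shows why the two measures can diverge.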


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface
5.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Interaction Domains and Motifs , Software , Humans , Molecular Sequence Annotation , Phylogeny
6.
Nucleic Acids Res ; 43(Database issue): D213-21, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428371

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36,766 member database signatures integrated into 26,238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.


Subject(s)
Databases, Protein , Proteins/classification , Bacteria/metabolism , Gene Ontology , Protein Structure, Tertiary , Proteins/genetics , Sequence Analysis, Protein , Software
7.
Nucleic Acids Res ; 41(Database issue): D344-7, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23161676

ABSTRACT

PROSITE (http://prosite.expasy.org/) consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE signatures, together with ProRule, are used for the annotation of domains and features of UniProtKB/Swiss-Prot entries. Here, we describe recent developments that allow users to perform whole-proteome annotation as well as a number of filtering options that can be combined to perform powerful targeted searches for biological discovery. The latest version of PROSITE (release 20.85, of 30 August 2012) contains 1308 patterns, 1039 profiles and 1041 ProRules.


Subject(s)
Amino Acid Motifs , Databases, Protein , Protein Structure, Tertiary , Sequence Analysis, Protein , Amino Acid Sequence , Conserved Sequence , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/classification , Proteome/chemistry
8.
Nucleic Acids Res ; 40(Database issue): D306-12, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22096229

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Proteins/classification , Proteins/physiology , Sequence Analysis, Protein , Software , Terminology as Topic , User-Computer Interface
9.
Nucleic Acids Res ; 38(Database issue): D161-6, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19858104

ABSTRACT

PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (approximately 70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/prosite/.


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Protein Structure, Tertiary , Algorithms , Amino Acid Sequence , Animals , Cluster Analysis , Computational Biology/trends , Databases, Protein , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Data , Sequence Homology, Amino Acid , Software
10.
Nucleic Acids Res ; 37(Database issue): D261-6, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18948296

ABSTRACT

Peroxidases (EC 1.11.1.x), which are encoded by small or large multigenic families, are involved in several important physiological and developmental processes. They use various peroxides as electron acceptors to catalyse a number of oxidative reactions and are present in almost all living organisms. We have created a peroxidase database (http://peroxibase.isb-sib.ch) that contains all identified peroxidase-encoding sequences (about 6000 sequences in 940 organisms). They are distributed between 11 superfamilies and about 60 subfamilies. All the sequences have been individually annotated and checked. PeroxiBase can be consulted using six major interlink sections 'Classes', 'Organisms', 'Cellular localisations', 'Inducers', 'Repressors' and 'Tissue types'. General documentation on peroxidases and PeroxiBase is accessible in the 'Documents' section containing 'Introduction', 'Class description', 'Publications' and 'Links'. In addition to the database, we have developed a tool to classify peroxidases based on the PROSITE profile methodology. To improve their specificity and to prevent overlaps between closely related subfamilies the profiles were built using a new strategy based on the silencing of residues. This new profile construction method and its discriminatory capacity have been tested and validated using the different peroxidase families and subfamilies present in the database. The peroxidase classification tool called PeroxiScan is accessible at the following address: http://peroxibase.isb-sib.ch/peroxiscan.php.


Subject(s)
Databases, Protein , Peroxidases/classification , Peroxidases/chemistry , Peroxidases/metabolism , Software , User-Computer Interface
11.
Nucleic Acids Res ; 37(Database issue): D211-5, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18940856

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the roughly 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
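
The relationship between member-database signatures, InterPro entries and un-integrated signatures described in this abstract can be pictured as a simple mapping. The sketch below is only an illustration of that data model: the accessions are hypothetical placeholders, and nothing here reflects InterPro's actual file formats or curation process.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping: signature accession -> entry accession.
# None marks a signature that has not been integrated into any entry.
SIGNATURE_TO_ENTRY: Dict[str, Optional[str]] = {
    "SIG_PFAM_1": "ENTRY_1",
    "SIG_PROSITE_1": "ENTRY_1",  # two signatures describing the same domain
    "SIG_SMART_1": "ENTRY_2",
    "SIG_PRINTS_1": None,        # un-integrated signature
}


def group_by_entry(
    mapping: Dict[str, Optional[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split signatures into per-entry groups and an un-integrated list."""
    entries: Dict[str, List[str]] = defaultdict(list)
    unintegrated: List[str] = []
    for signature, entry in sorted(mapping.items()):
        if entry is None:
            unintegrated.append(signature)
        else:
            entries[entry].append(signature)
    return dict(entries), unintegrated


def integrated_fraction(mapping: Dict[str, Optional[str]]) -> float:
    """Fraction of signatures that belong to some entry."""
    if not mapping:
        return 0.0
    integrated = sum(1 for entry in mapping.values() if entry is not None)
    return integrated / len(mapping)


if __name__ == "__main__":
    entries, unintegrated = group_by_entry(SIGNATURE_TO_ENTRY)
    print(entries)
    print(unintegrated)
    print(integrated_fraction(SIGNATURE_TO_ENTRY))
```

With this representation, "approximately half of the signatures belong to an entry" corresponds to an integrated fraction of about 0.5.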


Subject(s)
Databases, Protein , Sequence Analysis, Protein , Proteins/chemistry , Proteins/classification , Systems Integration
12.
Nucleic Acids Res ; 36(Database issue): D245-9, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18003654

ABSTRACT

PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. In this article, we describe the implementation of a new method to assign a status to pattern matches, the new PROSITE web page and a new approach to improve the specificity and sensitivity of PROSITE methods. The latest version of PROSITE (release 20.19 of 11 September 2007) contains 1319 patterns, 745 profiles and 764 ProRules. Over the past 2 years, about 200 domains have been added, and now 53% of UniProtKB/Swiss-Prot entries (release 54.2 of 11 September 2007) have a PROSITE match. PROSITE is available on the web at: http://www.expasy.org/prosite/.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Proteins/classification , Amino Acids/chemistry , Bacterial Proteins/chemistry , Bacterial Proteins/classification , Databases, Protein/history , History, 20th Century , History, 21st Century , Internet , Proteins/chemistry , Sequence Alignment , Sequence Analysis, Protein , Software , User-Computer Interface
13.
Nucleic Acids Res ; 35(Database issue): D224-8, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17202162

ABSTRACT

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.


Subject(s)
Databases, Protein , Internet , Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Proteins/physiology , Sequence Analysis, Protein , Systems Integration , User-Computer Interface
14.
Nucleic Acids Res ; 34(Database issue): D227-30, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381852

ABSTRACT

The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns or profiles. Each signature is linked to documentation that provides useful biological information on the protein family, domain or functional site identified by the signature. The PROSITE database is now complemented by a series of rules that can give more precise information about specific residues. During the last 2 years, the documentation and the ScanProsite web pages were redesigned to add more functionality. The latest version of PROSITE (release 19.11 of September 27, 2005) contains 1329 patterns and 552 profile entries. Over the past 2 years more than 200 domains have been added, and now 52% of UniProtKB/Swiss-Prot entries (release 48.1 of September 27, 2005) have a cross-reference to a PROSITE entry. The database is accessible at http://www.expasy.org/prosite/.


Subject(s)
Databases, Protein , Proteins/chemistry , Amino Acids/chemistry , Internet , Protein Structure, Tertiary , Proteins/classification , Software , User-Computer Interface
15.
Nucleic Acids Res ; 34(Web Server issue): W362-5, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845026

ABSTRACT

ScanProsite (http://www.expasy.org/tools/scanprosite/) is a new and improved version of the web-based tool for detecting PROSITE signature matches in protein sequences. For a number of PROSITE profiles, the tool now makes use of ProRules (context-dependent annotation templates) to detect functional and structural intra-domain residues. The detection of those features enhances the power of function prediction based on profiles. Both user-defined sequences and sequences from the UniProt Knowledgebase can be matched against custom patterns, or against PROSITE signatures. To improve response times, matches of sequences from UniProtKB against PROSITE signatures are now retrieved from a pre-computed match database. Several output modes are available including simple text views and a rich mode providing an interactive match and feature viewer with a graphical representation of results.
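
Matching a sequence against a custom pattern, as this abstract describes, can be approximated by translating PROSITE pattern syntax into a regular expression. The sketch below is not ScanProsite's implementation; it is a minimal illustration covering the common syntax elements (`x`, `[...]`, `{...}`, repetition counts and the `<` / `>` anchors), and the function names and example sequence are assumptions made for this example. Anchors inside square brackets and overlapping matches are deliberately not handled.

```python
import re
from typing import List, Tuple

# One pattern element, optionally followed by a repetition such as (3) or (2,4).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # patterns are often written with a final period
    pieces = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")  # N-terminal anchor
        anchor_end = element.endswith(">")      # C-terminal anchor
        element = element.lstrip("<").rstrip(">")
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                                   # any residue
            atom = "."
        elif core.startswith("[") and core.endswith("]"):  # allowed residues
            atom = "[" + core[1:-1] + "]"
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            atom = "[^" + core[1:-1] + "]"
        elif len(core) == 1 and core.isalpha():            # single residue
            atom = core
        else:
            raise ValueError(f"unsupported element: {element!r}")
        if low is not None:
            if high is not None:
                atom += "{" + low + "," + high + "}"
            else:
                atom += "{" + low + "}"
        pieces.append(("^" if anchor_start else "") + atom + ("$" if anchor_end else ""))
    return "".join(pieces)


def find_matches(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # The N-glycosylation site pattern as commonly quoted; the sequence is made up.
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_matches("N-{P}-[ST]-{P}.", "MNGSAANPTA"))
```

Profiles and ProRules are not covered by this approach: they rely on position-specific scoring rather than exact pattern matching.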


Subject(s)
Amino Acids/chemistry , Protein Structure, Tertiary , Sequence Analysis, Protein/methods , Software , Databases, Protein , Internet , Proteins/chemistry , Sequence Homology, Amino Acid , User-Computer Interface
16.
Nucleic Acids Res ; 33(Database issue): D201-5, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608177

ABSTRACT

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
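
The idea behind a domain architecture view, the ordered domain composition of a protein, can be sketched by sorting a protein's matches by position and joining the entry accessions. This is an illustrative assumption about how such a view might be derived, not a description of InterPro's implementation; the accessions, proteins and coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A match is (entry accession, start, end), with 1-based inclusive coordinates.
Match = Tuple[str, int, int]

# Hypothetical toy data.
toy_matches: Dict[str, List[Match]] = {
    "P1": [("ENTRY_B", 120, 200), ("ENTRY_A", 5, 90)],
    "P2": [("ENTRY_A", 10, 95), ("ENTRY_B", 130, 210)],
    "P3": [("ENTRY_B", 1, 80)],
}


def domain_architecture(matches: List[Match]) -> str:
    """Join entry accessions in N- to C-terminal order."""
    ordered = sorted(matches, key=lambda m: (m[1], m[2]))
    return "-".join(accession for accession, _, _ in ordered)


def group_by_architecture(proteins: Dict[str, List[Match]]) -> Dict[str, List[str]]:
    """Group protein identifiers that share the same architecture string."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for protein in sorted(proteins):
        groups[domain_architecture(proteins[protein])].append(protein)
    return dict(groups)


if __name__ == "__main__":
    for architecture, members in group_by_architecture(toy_matches).items():
        print(architecture, members)
```

Grouping by the architecture string makes it easy to find proteins with the same domain composition, which is the kind of question a domain architecture search answers.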


Subject(s)
Databases, Protein , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein , Databases, Protein/trends , Humans , Protein Structure, Tertiary , Sequence Alignment , Systems Integration
17.
Nucleic Acids Res ; 30(1): 235-8, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752303

ABSTRACT

PROSITE [Bairoch and Bucher (1994) Nucleic Acids Res., 22, 3583-3589; Hofmann et al. (1999) Nucleic Acids Res., 27, 215-219] is a method of identifying the functions of uncharacterized proteins translated from genomic or cDNA sequences. The PROSITE database (http://www.expasy.org/prosite/) consists of biologically significant patterns and profiles designed in such a way that with appropriate computational tools it can rapidly and reliably help to determine to which known family of proteins (if any) a new sequence belongs, or which known domain(s) it contains.


Subject(s)
Databases, Protein , Proteins/physiology , Amino Acid Motifs , Amino Acid Sequence , Animals , Database Management Systems , Forecasting , Information Storage and Retrieval , Internet , Protein Structure, Tertiary , Proteins/chemistry , Proteins/genetics , Sequence Alignment
18.
Nucleic Acids Res ; 32(Database issue): D134-7, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681377

ABSTRACT

The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns or profiles. Each signature is linked to documentation that provides useful biological information on the protein family, domain or functional site identified by the signature. The PROSITE web page has been redesigned and several tools have been implemented to help the user discover new conserved regions in their own proteins and to visualize domain arrangements. We also introduced the facility to search PDB with a PROSITE entry or a user's pattern and visualize matched positions on 3D structures. The latest version of PROSITE (release 18.17 of November 30, 2003) contains 1676 entries. The database is accessible at http://www.expasy.org/prosite/.


Subject(s)
Computational Biology , Databases, Protein , Proteins/chemistry , Amino Acid Motifs , Animals , Conserved Sequence , Humans , Internet , Models, Molecular , Protein Structure, Tertiary , Proteins/metabolism , Quality Control , Sequence Alignment , Structure-Activity Relationship
19.
Nucleic Acids Res ; 31(1): 315-8, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520011

ABSTRACT

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).


Subject(s)
Databases, Protein , Proteins/chemistry , Animals , Computer Graphics , Protein Processing, Post-Translational , Protein Structure, Tertiary , Proteins/genetics , Proteins/metabolism , Repetitive Sequences, Amino Acid , User-Computer Interface