Search | VHL Regional Portal

InterPro in 2022.

Paysan-Lafosse, Typhaine; Blum, Matthias; Chuguransky, Sara; Grego, Tiago; Pinto, Beatriz Lázaro; Salazar, Gustavo A; Bileschi, Maxwell L; Bork, Peer; Bridge, Alan; Colwell, Lucy; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex.

Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide a more user friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.

Subject(s)

Databases, Protein , Humans , Amino Acid Sequence , Artificial Intelligence , Internet , Proteins/chemistry , Software

Reciprocal best structure hits: using AlphaFold models to discover distant homologues.

Monzon, Vivian; Paysan-Lafosse, Typhaine; Wood, Valerie; Bateman, Alex.

Bioinform Adv ; 2(1): vbac072, 2022.

Article in English | MEDLINE | ID: mdl-36408459

ABSTRACT

Motivation: The conventional methods to detect homologous protein pairs use the comparison of protein sequences. But the sequences of two homologous proteins may diverge significantly and consequently may be undetectable by standard approaches. The release of the AlphaFold 2.0 software enables the prediction of highly accurate protein structures and opens many opportunities to advance our understanding of protein functions, including the detection of homologous protein structure pairs. Results: In this proof-of-concept work, we search for the closest homologous protein pairs using the structure models of five model organisms from the AlphaFold database. We compare the results with homologous protein pairs detected by their sequence similarity and show that the structural matching approach finds a similar set of results. In addition, we detect potential novel homologs solely with the structural matching approach, which can help to understand the function of uncharacterized proteins and make previously overlooked connections between well-characterized proteins. We also observe limitations of our implementation of the structure-based approach, particularly when handling highly disordered proteins or short protein structures. Our work shows that high accuracy protein structure models can be used to discover homologous protein pairs, and we expose areas for improvement of this structural matching approach. Availability and Implementation: Information to the discovered homologous protein pairs can be found at the following URL: https://doi.org/10.17863/CAM.87873. The code can be accessed here: https://github.com/VivianMonzon/Reciprocal_Best_Structure_Hits. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

The InterPro protein families and domains database: 20 years on.

Blum, Matthias; Chang, Hsin-Yu; Chuguransky, Sara; Grego, Tiago; Kandasaamy, Swaathi; Mitchell, Alex; Nuka, Gift; Paysan-Lafosse, Typhaine; Qureshi, Matloob; Raj, Shriya; Richardson, Lorna; Salazar, Gustavo A; Williams, Lowri; Bork, Peer; Bridge, Alan; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Necci, Marco; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex; Finn, Robert D.

Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.

Subject(s)

Databases, Protein , Proteins/chemistry , Amino Acid Sequence , COVID-19/metabolism , Internet , Molecular Sequence Annotation , Protein Domains , Protein Interaction Maps , SARS-CoV-2/metabolism , Sequence Alignment

Genome3D: integrating a collaborative data pipeline to expand the depth and breadth of consensus protein structure annotation.

Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Finn, Robert D; Gough, Julian; Jones, David; Kelley, Lawrence A; Paysan-Lafosse, Typhaine; Lam, Su Datt; Murzin, Alexey G; Pandurangan, Arun Prasad; Salazar, Gustavo A; Skwark, Marcin J; Sternberg, Michael J E; Velankar, Sameer; Orengo, Christine.

Nucleic Acids Res ; 48(D1): D314-D319, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31733063

ABSTRACT

Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits including: providing instant validation of data and avoiding the requirement to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate Genome3D being opened up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes have also been added of particular interest to the wider scientific community: cow, pig, wheat and mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, has ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.

Subject(s)

Proteins/chemistry , Databases, Protein , Proteins/classification , Proteins/genetics , User-Computer Interface

PDBe: improved findability of macromolecular structure data in the PDB.

Armstrong, David R; Berrisford, John M; Conroy, Matthew J; Gutmanas, Aleksandras; Anyango, Stephen; Choudhary, Preeti; Clark, Alice R; Dana, Jose M; Deshpande, Mandar; Dunlop, Roisin; Gane, Paul; Gáborová, Romana; Gupta, Deepti; Haslam, Pauline; Koca, Jaroslav; Mak, Lora; Mir, Saqib; Mukhopadhyay, Abhik; Nadzirin, Nurul; Nair, Sreenath; Paysan-Lafosse, Typhaine; Pravda, Lukas; Sehnal, David; Salih, Osman; Smart, Oliver; Tolchard, James; Varadi, Mihaly; Svobodova-Vareková, Radka; Zaki, Hossam; Kleywegt, Gerard J; Velankar, Sameer.

Nucleic Acids Res ; 48(D1): D335-D343, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31691821

ABSTRACT

The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with â¼100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.

Subject(s)

Databases, Protein , Software , Cluster Analysis , Data Accuracy , Europe , Protein Conformation , User-Computer Interface

InterPro in 2019: improving coverage, classification and access to protein sequence annotations.

Mitchell, Alex L; Attwood, Teresa K; Babbitt, Patricia C; Blum, Matthias; Bork, Peer; Bridge, Alan; Brown, Shoshana D; Chang, Hsin-Yu; El-Gebali, Sara; Fraser, Matthew I; Gough, Julian; Haft, David R; Huang, Hongzhan; Letunic, Ivica; Lopez, Rodrigo; Luciani, Aurélien; Madeira, Fabio; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Necci, Marco; Nuka, Gift; Orengo, Christine; Pandurangan, Arun P; Paysan-Lafosse, Typhaine; Pesseat, Sebastien; Potter, Simon C; Qureshi, Matloob A; Rawlings, Neil D; Redaschi, Nicole; Richardson, Lorna J; Rivoire, Catherine; Salazar, Gustavo A; Sangrador-Vegas, Amaia; Sigrist, Christian J A; Sillitoe, Ian; Sutton, Granger G; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Yong, Siew-Yit; Finn, Robert D.

Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms on new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.

Subject(s)

Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface

PDBe: towards reusable data delivery infrastructure at protein data bank in Europe.

Mir, Saqib; Alhroub, Younes; Anyango, Stephen; Armstrong, David R; Berrisford, John M; Clark, Alice R; Conroy, Matthew J; Dana, Jose M; Deshpande, Mandar; Gupta, Deepti; Gutmanas, Aleksandras; Haslam, Pauline; Mak, Lora; Mukhopadhyay, Abhik; Nadzirin, Nurul; Paysan-Lafosse, Typhaine; Sehnal, David; Sen, Sanchayita; Smart, Oliver S; Varadi, Mihaly; Kleywegt, Gerard J; Velankar, Sameer.

Nucleic Acids Res ; 46(D1): D486-D492, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29126160

ABSTRACT

The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API.

Subject(s)

Computational Biology/methods , Databases, Protein , Proteins/chemistry , Sequence Analysis, Protein/methods , User-Computer Interface , Amino Acid Sequence , Computer Graphics , Databases as Topic , Europe , Humans , Information Dissemination , Internet , Models, Molecular , Molecular Sequence Annotation , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Proteins/genetics , Proteins/metabolism

IMGT®, the international ImMunoGeneTics information system® 25 years on.

Lefranc, Marie-Paule; Giudicelli, Véronique; Duroux, Patrice; Jabado-Michaloud, Joumana; Folch, Géraldine; Aouinti, Safa; Carillon, Emilie; Duvergey, Hugo; Houles, Amélie; Paysan-Lafosse, Typhaine; Hadi-Saljoqi, Saida; Sasorith, Souphatta; Lefranc, Gérard; Kossida, Sofia.

Nucleic Acids Res ; 43(Database issue): D413-22, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25378316

ABSTRACT

IMGT(®), the international ImMunoGeneTics information system(®)(http://www.imgt.org) is the global reference in immunogenetics and immunoinformatics. By its creation in 1989 by Marie-Paule Lefranc (Université de Montpellier and CNRS), IMGT(®) marked the advent of immunoinformatics, which emerged at the interface between immunogenetics and bioinformatics. IMGT(®) is specialized in the immunoglobulins (IG) or antibodies, T cell receptors (TR), major histocompatibility (MH) and proteins of the IgSF and MhSF superfamilies. IMGT(®) is built on the IMGT-ONTOLOGY axioms and concepts, which bridged the gap between genes, sequences and 3D structures. The concepts include the IMGT(®) standardized keywords (identification), IMGT(®) standardized labels (description), IMGT(®) standardized nomenclature (classification), IMGT unique numbering and IMGT Colliers de Perles (numerotation). IMGT(®) comprises 7 databases, 17 online tools and 15,000 pages of web resources, and provides a high-quality and integrated system for analysis of the genomic and expressed IG and TR repertoire of the adaptive immune responses, including NGS high-throughput data. Tools and databases are used in basic, veterinary and medical research, in clinical applications (mutation analysis in leukemia and lymphoma) and in antibody engineering and humanization. The IMGT/mAb-DB interface was developed for therapeutic antibodies and fusion proteins for immunological applications (FPIA). IMGT(®) is freely available at http://www.imgt.org.

Subject(s)

Databases, Genetic , Genes, Immunoglobulin , Genes, T-Cell Receptor , Histocompatibility Antigens/chemistry , Immunoglobulins/chemistry , Major Histocompatibility Complex , Receptors, Antigen, T-Cell/chemistry , Alleles , Animals , Biological Ontologies , Computational Biology , Histocompatibility Antigens/genetics , Humans , Immunogenetics , Immunoglobulins/genetics , Immunoglobulins/metabolism , Internet , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell/metabolism , Software

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL