Search | BVS Colombia Search Portal

1.

Clustering predicted structures at the scale of the known protein universe.

Barrio-Hernandez, Inigo; Yeo, Jingi; Jänes, Jürgen; Mirdita, Milot; Gilchrist, Cameron L M; Wein, Tanita; Varadi, Mihaly; Velankar, Sameer; Beltrao, Pedro; Steinegger, Martin.

Nature ; 622(7983): 637-645, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37704730

ABSTRACT

Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy [1], and over 214 million predicted structures are available in the AlphaFold database [2]. However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm, Foldseek cluster, that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations, representing probable previously undescribed structures. Clusters without annotation tend to have few representatives, covering only 4% of all proteins in the AlphaFold database. Evolutionary analysis suggests that most clusters are ancient in origin but 4% seem to be species specific, representing lower-quality predictions or examples of de novo gene birth. We also show how structural comparisons can be used to predict domain families and their relationships, identifying examples of remote structural similarity. On the basis of these analyses, we identify several examples of human immune-related proteins with putative remote homology in prokaryotic species, illustrating the value of this resource for studying protein function and evolution across the tree of life.

Subject(s)

Algorithms , Cluster Analysis , Proteins , Structural Homology, Protein , Humans , Databases, Protein , Proteins/chemistry , Proteins/classification , Proteins/metabolism , Sequence Alignment , Molecular Sequence Annotation , Prokaryotic Cells/chemistry , Phylogeny , Species Specificity , Evolution, Molecular
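
To make the idea of representative-based clustering in the abstract above concrete, here is a minimal Python sketch. It is not the Foldseek cluster algorithm: it is a toy greedy procedure over a hypothetical table of pairwise structural similarity scores, and the names `greedy_cluster`, `non_singletons`, `scores` and the 0.5 threshold are all assumptions made for illustration. It also shows how a count of non-singleton clusters could be derived from the resulting assignments.

```python
from typing import Dict, FrozenSet, List


def greedy_cluster(ids: List[str],
                   scores: Dict[FrozenSet[str], float],
                   threshold: float = 0.5) -> Dict[str, List[str]]:
    """Assign each structure to the first representative it resembles.

    ids       -- structure identifiers, visited in the given order
    scores    -- hypothetical pairwise similarity in [0, 1],
                 keyed by frozenset({a, b}); missing pairs count as 0.0
    threshold -- minimum similarity to join a cluster (assumed value)
    """
    clusters: Dict[str, List[str]] = {}
    for sid in ids:
        for rep in clusters:
            if scores.get(frozenset((sid, rep)), 0.0) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[sid] = [sid]
    return clusters


def non_singletons(clusters: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only clusters with more than one member."""
    return {rep: members for rep, members in clusters.items() if len(members) > 1}


# Toy input: four structures, only A and B are similar.
ids = ["A", "B", "C", "D"]
scores = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.1,
    frozenset(("C", "D")): 0.2,
}
clusters = greedy_cluster(ids, scores)
```

In this toy run, A and B share a cluster while C and D remain singletons. A real pipeline at the scale described would avoid all-against-all comparison, which this sketch does not attempt.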

2.

EMBL's European Bioinformatics Institute (EMBL-EBI) in 2023.

Thakur, Matthew; Buniello, Annalisa; Brooksbank, Catherine; Gurwitz, Kim T; Hall, Matthew; Hartley, Matthew; Hulcoop, David G; Leach, Andrew R; Marques, Diana; Martin, Maria; Mithani, Aziz; McDonagh, Ellen M; Mutasa-Gottgens, Euphemia; Ochoa, David; Perez-Riverol, Yasset; Stephenson, James; Varadi, Mihaly; Velankar, Sameer; Vizcaino, Juan Antonio; Witham, Rick; McEntyre, Johanna.

Nucleic Acids Res ; 52(D1): D10-D17, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-38015445

ABSTRACT

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the latest developments in the services provided by EMBL-EBI data resources to scientific communities globally. These developments aim to ensure EMBL-EBI resources meet the current and future needs of these scientific communities, accelerating the impact of open biological data for all.

Subject(s)

Academies and Institutes , Computational Biology , Computational Biology/organization & administration , Computational Biology/trends , Academies and Institutes/organization & administration , Academies and Institutes/trends , Databases, Nucleic Acid , Europe

3.

AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences.

Varadi, Mihaly; Bertoni, Damian; Magana, Paulyna; Paramval, Urmila; Pidruchna, Ivanna; Radhakrishnan, Malarvizhi; Tsenkov, Maxim; Nair, Sreenath; Mirdita, Milot; Yeo, Jingi; Kovalevskiy, Oleg; Tunyasuvunakool, Kathryn; Laydon, Agata; Zídek, Augustin; Tomlinson, Hamish; Hariharan, Dhavanthi; Abrahamson, Josh; Green, Tim; Jumper, John; Birney, Ewan; Steinegger, Martin; Hassabis, Demis; Velankar, Sameer.

Nucleic Acids Res ; 52(D1): D368-D375, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37933859

ABSTRACT

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.

The AlphaFold Protein Structure Database (AlphaFold DB) is a massive digital library of predicted protein structures, with over 214 million entries, marking a 500-times expansion in size since its initial release in 2021. The structures are predicted using Google DeepMind's AlphaFold 2 artificial intelligence (AI) system. Our new report highlights the latest updates we have made to this database. We have added more data on specific organisms and proteins related to global health and expanded to cover almost the complete UniProt database, a primary data resource of protein sequences. We also made it easier for our users to access the data by directly downloading files or using advanced cloud-based tools. Finally, we have also improved how users view and search through these protein structures, making the user experience smoother and more informative. In short, AlphaFold DB has been growing rapidly and has become more user-friendly and robust to support the broader scientific community.

Subject(s)

Artificial Intelligence , Protein Structure, Secondary , Proteome , Amino Acid Sequence , Databases, Protein , Search Engine , Proteins/chemistry
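
The abstract above mentions direct file access and programmatic access endpoints. A minimal Python sketch of how such URLs might be assembled follows. Only the base address comes from the abstract; the `/api/prediction/` path, the `AF-<accession>-F<fragment>-model_v<version>` file naming, the default version number and the example accession are assumptions that should be checked against the current AlphaFold DB documentation, since they can change between releases.

```python
from urllib.parse import quote

BASE = "https://alphafold.ebi.ac.uk"  # base address given in the abstract


def prediction_api_url(accession: str) -> str:
    """Build a metadata URL for one UniProt accession.

    The '/api/prediction/' path is an assumption, not taken from the abstract.
    """
    return f"{BASE}/api/prediction/{quote(accession)}"


def model_file_url(accession: str, version: int = 4,
                   fragment: int = 1, ext: str = "pdb") -> str:
    """Build a coordinate-file URL.

    Assumed naming scheme: AF-<accession>-F<fragment>-model_v<version>.<ext>
    The default version is a placeholder; use the release you actually need.
    """
    name = f"AF-{quote(accession)}-F{fragment}-model_v{version}.{ext}"
    return f"{BASE}/files/{name}"


# Example accession chosen only for illustration.
api_url = prediction_api_url("P69905")
file_url = model_file_url("P69905")
```

The resulting strings can be passed to any HTTP client, for example `urllib.request.urlopen`. For bulk downloads, the FTP and cloud routes named in the abstract are likely the better fit.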

4.

PDBImages: a command-line tool for automated macromolecular structure visualization.

Midlik, Adam; Nair, Sreenath; Anyango, Stephen; Deshpande, Mandar; Sehnal, David; Varadi, Mihaly; Velankar, Sameer.

Bioinformatics ; 39(12), 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38085238

ABSTRACT

SUMMARY: PDBImages is an innovative, open-source Node.js package that harnesses the power of the popular macromolecule structure visualization software Mol*. Designed for use by the scientific community, PDBImages provides a means to generate high-quality images for PDB and AlphaFold DB models. Its unique ability to render and save images directly to files in a browserless mode sets it apart, offering users a streamlined, automated process for macromolecular structure visualization. Here, we detail the implementation of PDBImages, enumerating its diverse image types, and elaborating on its user-friendly setup. This powerful tool opens a new gateway for researchers to visualize, analyse, and share their work, fostering a deeper understanding of bioinformatics. AVAILABILITY AND IMPLEMENTATION: PDBImages is available as an npm package from https://www.npmjs.com/package/pdb-images. The source code is available from https://github.com/PDBeurope/pdb-images.

Subject(s)

Computational Biology , Software , Molecular Structure , Computational Biology/methods

5.

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.

Varadi, Mihaly; Anyango, Stephen; Deshpande, Mandar; Nair, Sreenath; Natassia, Cindy; Yordanova, Galabina; Yuan, David; Stroe, Oana; Wood, Gemma; Laydon, Agata; Zídek, Augustin; Green, Tim; Tunyasuvunakool, Kathryn; Petersen, Stig; Jumper, John; Clancy, Ellen; Green, Richard; Vora, Ankur; Lutfi, Mira; Figurnov, Michael; Cowie, Andrew; Hobbs, Nicole; Kohli, Pushmeet; Kleywegt, Gerard; Birney, Ewan; Hassabis, Demis; Velankar, Sameer.

Nucleic Acids Res ; 50(D1): D439-D444, 2022 Jan 07.

Article in English | MEDLINE | ID: mdl-34791371

ABSTRACT

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded to cover most of the (over 100 million) representative sequences from the UniRef90 data set.

Subject(s)

Databases, Protein , Protein Folding , Proteins/chemistry , Software , Amino Acid Sequence , Animals , Bacteria/genetics , Bacteria/metabolism , Datasets as Topic , Dictyostelium/genetics , Dictyostelium/metabolism , Fungi/genetics , Fungi/metabolism , Humans , Internet , Models, Molecular , Plants/genetics , Plants/metabolism , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Proteins/genetics , Proteins/metabolism , Trypanosoma cruzi/genetics , Trypanosoma cruzi/metabolism
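
The per-residue confidence estimates mentioned in the abstract above are commonly read from the coordinate files themselves: in AlphaFold PDB-format models the per-residue score (pLDDT) is stored in the B-factor column. The Python sketch below reads that column from fixed-width ATOM records and bins values into the confidence bands used on the AlphaFold DB site. The function names and the three sample records are made up for illustration; the sample lines are generated with a format string so the column positions are correct.

```python
from typing import Dict


def atom_line(serial: int, name: str, res: str, chain: str,
              resseq: int, x: float, y: float, z: float, b: float) -> str:
    """Format one fixed-width PDB ATOM record (occupancy fixed at 1.00)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{b:6.2f}")


def plddt_per_residue(pdb_text: str) -> Dict[int, float]:
    """Map residue number to the B-factor field of its C-alpha atom."""
    result: Dict[int, float] = {}
    for line in pdb_text.splitlines():
        # Columns 13-16: atom name, 23-26: residue number, 61-66: B-factor.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            result[int(line[22:26])] = float(line[60:66])
    return result


def confidence_band(plddt: float) -> str:
    """Bin a pLDDT value into the four bands shown on the database site."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Made-up sample records, not taken from a real model.
sample = "\n".join([
    atom_line(1, " CA", "MET", "A", 1, 1.0, 2.0, 3.0, 95.12),
    atom_line(2, " CA", "VAL", "A", 2, 4.0, 5.0, 6.0, 72.50),
    atom_line(3, " CA", "LEU", "A", 3, 7.0, 8.0, 9.0, 41.00),
])
scores = plddt_per_residue(sample)
bands = {res: confidence_band(v) for res, v in scores.items()}
```

Reading only C-alpha atoms is a simplification that works because every atom of a residue carries the same value in these files. The pairwise estimates (predicted aligned error) are distributed separately and are not covered here.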

6.

The impact of AlphaFold Protein Structure Database on the fields of life sciences.

Varadi, Mihaly; Velankar, Sameer.

Proteomics ; 23(17): e2200128, 2023 Sep.

Article in English | MEDLINE | ID: mdl-36382391

ABSTRACT

Arguably, 2020 was the year of high-accuracy protein structure predictions, with AlphaFold 2.0 achieving previously unseen accuracy in the Critical Assessment of Protein Structure Prediction (CASP). In 2021, DeepMind and EMBL-EBI developed the AlphaFold Protein Structure Database to make an unprecedented number of reliable protein structure predictions easily accessible to the broad scientific community. We provide a brief overview and describe the latest developments in the AlphaFold database. We highlight how the fields of data services, bioinformatics, structural biology, and drug discovery are directly affected by the influx of protein structure data. We also show examples of cutting-edge research that took advantage of the AlphaFold database. It is apparent that connections between various fields through protein structures are now possible, but the amount of data poses new challenges. Finally, we give an outlook regarding the future direction of the database, both in terms of data sets and new functionalities.

Subject(s)

Biological Science Disciplines , Proteins , Protein Conformation , Databases, Protein , Proteins/chemistry , Computational Biology

7.

Challenges in bridging the gap between protein structure prediction and functional interpretation.

Varadi, Mihaly; Tsenkov, Maxim; Velankar, Sameer.

Proteins ; 2023 Oct 18.

Article in English | MEDLINE | ID: mdl-37850517

ABSTRACT

The rapid evolution of protein structure prediction tools has significantly broadened access to protein structural data. Although predicted structure models have the potential to accelerate and impact fundamental and translational research significantly, it is essential to note that they are not validated and cannot be considered the ground truth. Thus, challenges persist, particularly in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations are crucial to overcoming these obstacles. Databases like the AlphaFold Protein Structure Database, the ESM Metagenomic Atlas, and initiatives like the 3D-Beacons Network provide FAIR access to these data, enabling their interpretation and application across a broader scientific community. Whilst substantial advancements have been made in protein structure prediction, further progress is required to address the remaining challenges. Developing training materials, nurturing collaborations, and ensuring open data sharing will be paramount in this pursuit. The continued evolution of these tools and methodologies will deepen our understanding of protein function and accelerate disease pathogenesis and drug development discoveries.

8.

The impact of structural bioinformatics tools and resources on SARS-CoV-2 research and therapeutic strategies.

Waman, Vaishali P; Sen, Neeladri; Varadi, Mihaly; Daina, Antoine; Wodak, Shoshana J; Zoete, Vincent; Velankar, Sameer; Orengo, Christine.

Brief Bioinform ; 22(2): 742-768, 2021 Mar 22.

Article in English | MEDLINE | ID: mdl-33348379

ABSTRACT

SARS-CoV-2 is the causative agent of COVID-19, the ongoing global pandemic. It has posed a worldwide challenge to human health as no effective treatment is currently available to combat the disease. Its severity has led to unprecedented collaborative initiatives for therapeutic solutions against COVID-19. Studies resorting to structure-based drug design for COVID-19 are plethoric and show good promise. Structural biology provides key insights into 3D structures, critical residues/mutations in SARS-CoV-2 proteins, implicated in infectivity, molecular recognition and susceptibility to a broad range of host species. The detailed understanding of viral proteins and their complexes with host receptors and candidate epitope/lead compounds is the key to developing a structure-guided therapeutic design. Since the discovery of SARS-CoV-2, several structures of its proteins have been determined experimentally at an unprecedented speed and deposited in the Protein Data Bank. Further, specialized structural bioinformatics tools and resources have been developed for theoretical models, data on protein dynamics from computer simulations, impact of variants/mutations and molecular therapeutics. Here, we provide an overview of ongoing efforts on developing structural bioinformatics tools and resources for COVID-19 research. We also discuss the impact of these resources and structure-based studies, to understand various aspects of SARS-CoV-2 infection and therapeutic development. These include (i) understanding differences between SARS-CoV-2 and SARS-CoV, leading to increased infectivity of SARS-CoV-2, (ii) deciphering key residues in the SARS-CoV-2 involved in receptor-antibody recognition, (iii) analysis of variants in host proteins that affect host susceptibility to infection and (iv) analyses facilitating structure-based drug and vaccine design against SARS-CoV-2.

Subject(s)

Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , Computational Biology , SARS-CoV-2/isolation & purification , COVID-19/virology , Humans , Protein Conformation , Viral Proteins/chemistry

9.

PED in 2021: a major update of the protein ensemble database for intrinsically disordered proteins.

Lazar, Tamas; Martínez-Pérez, Elizabeth; Quaglia, Federica; Hatos, András; Chemes, Lucía B; Iserte, Javier A; Méndez, Nicolás A; Garrone, Nicolás A; Saldaño, Tadeo E; Marchetti, Julia; Rueda, Ana Julia Velez; Bernadó, Pau; Blackledge, Martin; Cordeiro, Tiago N; Fagerberg, Eric; Forman-Kay, Julie D; Fornasari, Maria S; Gibson, Toby J; Gomes, Gregory-Neal W; Gradinaru, Claudiu C; Head-Gordon, Teresa; Jensen, Malene Ringkjøbing; Lemke, Edward A; Longhi, Sonia; Marino-Buslje, Cristina; Minervini, Giovanni; Mittag, Tanja; Monzon, Alexander Miguel; Pappu, Rohit V; Parisi, Gustavo; Ricard-Blum, Sylvie; Ruff, Kiersten M; Salladini, Edoardo; Skepö, Marie; Svergun, Dmitri; Vallet, Sylvain D; Varadi, Mihaly; Tompa, Peter; Tosatto, Silvio C E; Piovesan, Damiano.

Nucleic Acids Res ; 49(D1): D404-D411, 2021 Jan 08.

Article in English | MEDLINE | ID: mdl-33305318

ABSTRACT

The Protein Ensemble Database (PED) (https://proteinensemble.org), which holds structural ensembles of intrinsically disordered proteins (IDPs), has been significantly updated and upgraded since its last release in 2016. The new version, PED 4.0, has been completely redesigned and reimplemented with cutting-edge technology and now holds about six times more data (162 versus 24 entries and 242 versus 60 structural ensembles) and a broader representation of state-of-the-art ensemble generation methods than the previous version. The database has a completely renewed graphical interface with an interactive feature viewer for region-based annotations, and provides a series of descriptors of the qualitative and quantitative properties of the ensembles. High quality of the data is guaranteed by a new submission process, which combines both automatic and manual evaluation steps. A team of biocurators integrate structured metadata describing the ensemble generation methodology, experimental constraints and conditions. A new search engine allows the user to build advanced queries and search all entry fields including cross-references to IDP-related resources such as DisProt, MobiDB, BMRB and SASBDB. We expect that the renewed PED will be useful for researchers interested in the atomic-level understanding of IDP function, and promote the rational, structure-based design of IDP-targeting drugs.

Subject(s)

Databases, Protein , Intrinsically Disordered Proteins/chemistry , Humans , Search Engine , Tumor Suppressor Protein p53/chemistry

10.

PDBe aggregated API: programmatic access to an integrative knowledge graph of molecular structure data.

Nair, Sreenath; Váradi, Mihály; Nadzirin, Nurul; Pravda, Lukás; Anyango, Stephen; Mir, Saqib; Berrisford, John; Armstrong, David; Gutmanas, Aleksandras; Velankar, Sameer.

Bioinformatics ; 37(21): 3950-3952, 2021 Nov 05.

Article in English | MEDLINE | ID: mdl-34081107

ABSTRACT

SUMMARY: The PDBe aggregated API is an open-access and open-source RESTful API that provides programmatic access to a wealth of macromolecular structural data and their functional and biophysical annotations through 80+ API endpoints. The API is powered by the PDBe graph database (https://pdbe.org/graph-schema), an open-access integrative knowledge graph that can be used as a discovery tool to answer complex biological questions. AVAILABILITY AND IMPLEMENTATION: The PDBe aggregated API provides up-to-date access to the PDBe graph database, which has weekly releases with the latest data from the Protein Data Bank, integrated with updated annotations from UniProt, Pfam, CATH, SCOP and the PDBe-KB partner resources. The complete list of all the available API endpoints and their descriptions are available at https://pdbe.org/graph-api. The source code of the Python 3.6+ API application is publicly available at https://gitlab.ebi.ac.uk/pdbe-kb/services/pdbe-graph-api. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Pattern Recognition, Automated , Software , Molecular Structure , Databases, Protein , Protein Conformation
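
Since the abstract above describes a RESTful API with many endpoints, a small Python helper for composing endpoint URLs and decoding JSON responses may be a useful illustration. This is a sketch under stated assumptions: the base URL and the `uniprot/unipdb` endpoint path are assumptions rather than details given in the abstract, the accession is an arbitrary example, and the authoritative endpoint list is the one at https://pdbe.org/graph-api.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL; confirm against the endpoint list linked in the abstract.
BASE_URL = "https://www.ebi.ac.uk/pdbe/graph-api"


def endpoint_url(*parts: str, base: str = BASE_URL) -> str:
    """Join path segments onto the base URL, tolerating stray slashes."""
    path = "/".join(quote(p.strip("/"), safe="/") for p in parts)
    return base.rstrip("/") + "/" + path


def fetch_json(url: str, timeout: float = 30.0):
    """GET a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# 'uniprot/unipdb' is a hypothetical endpoint path used for illustration.
url = endpoint_url("uniprot/unipdb", "P69905")
```

Calling `fetch_json(url)` would perform the request; it is left uncalled here so the sketch runs without network access.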

11.

PDBe: improved findability of macromolecular structure data in the PDB.

Armstrong, David R; Berrisford, John M; Conroy, Matthew J; Gutmanas, Aleksandras; Anyango, Stephen; Choudhary, Preeti; Clark, Alice R; Dana, Jose M; Deshpande, Mandar; Dunlop, Roisin; Gane, Paul; Gáborová, Romana; Gupta, Deepti; Haslam, Pauline; Koca, Jaroslav; Mak, Lora; Mir, Saqib; Mukhopadhyay, Abhik; Nadzirin, Nurul; Nair, Sreenath; Paysan-Lafosse, Typhaine; Pravda, Lukas; Sehnal, David; Salih, Osman; Smart, Oliver; Tolchard, James; Varadi, Mihaly; Svobodova-Vareková, Radka; Zaki, Hossam; Kleywegt, Gerard J; Velankar, Sameer.

Nucleic Acids Res ; 48(D1): D335-D343, 2020 Jan 08.

Article in English | MEDLINE | ID: mdl-31691821

ABSTRACT

The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ~100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.

Subject(s)

Databases, Protein , Software , Cluster Analysis , Data Accuracy , Europe , Protein Conformation , User-Computer Interface

12.

Comprehensive Collection and Prediction of ABC Transmembrane Protein Structures in the AI Era of Structural Biology.

Tordai, Hedvig; Suhajda, Erzsebet; Sillitoe, Ian; Nair, Sreenath; Varadi, Mihaly; Hegedus, Tamas.

Int J Mol Sci ; 23(16), 2022 Aug 09.

Article in English | MEDLINE | ID: mdl-36012140

ABSTRACT

The number of unique transmembrane (TM) protein structures doubled in the last four years, which can be attributed to the revolution of cryo-electron microscopy. In addition, AlphaFold2 (AF2) also provided a large number of predicted structures with high quality. However, if a specific protein family is the subject of a study, collecting the structures of the family members is highly challenging in spite of existing general and protein domain-specific databases. Here, we demonstrate this and assess the applicability and usability of automatic collection and presentation of protein structures via the ABC protein superfamily. Our pipeline identifies and classifies transmembrane ABC protein structures using the PFAM search and also aims to determine their conformational states based on special geometric measures, conftors. Since the AlphaFold database contains structure predictions only for single polypeptide chains, we performed AF2-Multimer predictions for human ABC half transporters functioning as dimers. Our AF2 predictions warn of possibly ambiguous interpretation of some biochemical data regarding interaction partners and call for further experiments and experimental structure determination. We made our predicted ABC protein structures available through a web application, and we joined the 3D-Beacons Network to reach the broader scientific community through platforms such as PDBe-KB.

Subject(s)

ATP-Binding Cassette Transporters , Furylfuramide , ATP-Binding Cassette Transporters/metabolism , Artificial Intelligence , Biology , Cryoelectron Microscopy , Humans , Models, Molecular , Protein Conformation

13.

PDBeCIF: an open-source mmCIF/CIF parsing and processing package.

van Ginkel, Glen; Pravda, Lukás; Dana, José M; Varadi, Mihaly; Keller, Peter; Anyango, Stephen; Velankar, Sameer.

BMC Bioinformatics ; 22(1): 383, 2021 Jul 23.

Article in English | MEDLINE | ID: mdl-34301175

ABSTRACT

BACKGROUND: Biomacromolecular structural data outgrew the legacy Protein Data Bank (PDB) format which the scientific community relied on for decades, yet the use of its successor PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF) is still not widespread. Perhaps one of the reasons is the availability of easy-to-use tools that only support the legacy format, but also the inherent difficulties of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit macromolecular structure data and their associated annotations such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods, it is necessary to fully adopt the new format as soon as possible. RESULTS: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is heavily employed in the processes of the Protein Data Bank in Europe. The package is freely available both from the PyPI repository (http://pypi.org/project/pdbecif) and from GitHub (https://github.com/pdbeurope/pdbecif) along with rich documentation and many ready-to-use examples. CONCLUSIONS: PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with third-party libraries as well as adopted for broad scientific analyses.

Subject(s)

Software , Databases, Protein , Europe , Macromolecular Substances , Molecular Structure
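
To illustrate why mmCIF parsing has the edge cases the abstract above refers to, here is a deliberately naive Python sketch that reads single key-value items and `loop_` tables from a small made-up CIF fragment. It is not PDBeCIF and does not use its API. The function name `parse_simple_cif` and the sample data are hypothetical, and the sketch ignores multi-line text fields, values placed on the line after their key, and values containing apostrophes, all of which a real parser must handle.

```python
import shlex
from typing import Dict, List, Optional, Tuple


def parse_simple_cif(text: str) -> Tuple[Dict[str, Optional[str]],
                                         List[List[Dict[str, str]]]]:
    """Parse a tiny subset of CIF: data_ name, key-value items, loop_ tables."""
    items: Dict[str, Optional[str]] = {}
    loops: List[List[Dict[str, str]]] = []
    columns: Optional[List[str]] = None   # header names of the open loop_
    rows: Optional[List[Dict[str, str]]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            columns = rows = None          # simplification: this ends a loop
            continue
        if line.startswith("data_"):
            items["data_block"] = line[5:]
        elif line == "loop_":
            columns, rows = [], []
            loops.append(rows)
        elif line.startswith("_"):
            if columns is not None and not rows:
                columns.append(line)       # still reading loop headers
            else:
                columns = rows = None
                key, _, value = line.partition(" ")
                # shlex handles quoted values such as 'Toy example'.
                items[key] = shlex.split(value)[0] if value.strip() else None
        elif columns:
            rows.append(dict(zip(columns, shlex.split(line))))
    return items, loops


# Made-up fragment for illustration only.
sample = """
data_demo
_entry.id 1ABC
_struct.title 'Toy example'
#
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.B_iso_or_equiv
1 N 20.5
2 CA 18.0
#
"""
items, loops = parse_simple_cif(sample)
```

Even this fragment needs quote handling and loop state tracking. An atom name such as `O5'` would already break the `shlex`-based tokenizer, which is one reason to prefer a dedicated package for real files.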

14.

AmyPro: a database of proteins with validated amyloidogenic regions.

Varadi, Mihaly; De Baets, Greet; Vranken, Wim F; Tompa, Peter; Pancsa, Rita.

Nucleic Acids Res ; 46(D1): D387-D392, 2018 Jan 04.

Article in English | MEDLINE | ID: mdl-29040693

ABSTRACT

Soluble functional proteins may transform into insoluble amyloid fibrils that deposit in a variety of tissues. Amyloid formation is a hallmark of age-related degenerative disorders. Perhaps surprisingly, amyloid fibrils can also be beneficial and are frequently exploited for diverse functional roles in organisms. Here we introduce AmyPro, an open-access database providing a comprehensive, carefully curated collection of validated amyloid fibril-forming proteins from all kingdoms of life classified into broad functional categories (http://amypro.net). In particular, AmyPro provides the boundaries of experimentally validated amyloidogenic sequence regions, short descriptions of the functional relevance of the proteins and their amyloid state, a list of the experimental techniques applied to study the amyloid state, important structural/functional/variation/mutation data transferred from UniProt, a list of relevant PDB structures categorized according to protein states, database cross-references and literature references. AmyPro greatly improves on similar currently available resources by incorporating both prions and functional amyloids in addition to pathogenic amyloids, and allows users to screen their sequences against the entire collection of validated amyloidogenic sequence fragments. By enabling further elucidation of the sequential determinants of amyloid fibril formation, we hope AmyPro will enhance the development of new methods for the precise prediction of amyloidogenic regions within proteins.

Subject(s)

Amyloidogenic Proteins/chemistry , Databases, Protein , User-Computer Interface

15.

PDBe: towards reusable data delivery infrastructure at protein data bank in Europe.

Mir, Saqib; Alhroub, Younes; Anyango, Stephen; Armstrong, David R; Berrisford, John M; Clark, Alice R; Conroy, Matthew J; Dana, Jose M; Deshpande, Mandar; Gupta, Deepti; Gutmanas, Aleksandras; Haslam, Pauline; Mak, Lora; Mukhopadhyay, Abhik; Nadzirin, Nurul; Paysan-Lafosse, Typhaine; Sehnal, David; Sen, Sanchayita; Smart, Oliver S; Varadi, Mihaly; Kleywegt, Gerard J; Velankar, Sameer.

Nucleic Acids Res ; 46(D1): D486-D492, 2018 Jan 04.

Article in English | MEDLINE | ID: mdl-29126160

ABSTRACT

The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API.

Subject(s)

Computational Biology/methods , Databases, Protein , Proteins/chemistry , Sequence Analysis, Protein/methods , User-Computer Interface , Amino Acid Sequence , Computer Graphics , Databases as Topic , Europe , Humans , Information Dissemination , Internet , Models, Molecular , Molecular Sequence Annotation , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Proteins/genetics , Proteins/metabolism

16.

Start2Fold: a database of hydrogen/deuterium exchange data on protein folding and stability.

Pancsa, Rita; Varadi, Mihaly; Tompa, Peter; Vranken, Wim F.

Nucleic Acids Res ; 44(D1): D429-D434, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26582925

ABSTRACT

Proteins fulfil a wide range of tasks in cells; understanding how they fold into complex three-dimensional (3D) structures and how these structures remain stable while retaining sufficient dynamics for functionality is essential for the interpretation of overall protein behaviour. Since the 1950s, solvent exchange-based methods have been the most powerful experimental means to obtain information on the folding and stability of proteins. Considerable expertise and care were required to obtain the resulting datasets, which, despite their importance and intrinsic value, have never been collected, curated and classified. Start2Fold is an openly accessible database (http://start2fold.eu) of carefully curated hydrogen/deuterium exchange (HDX) data extracted from the literature that is open for new submissions from the community. The database entries contain (i) information on the proteins investigated and the underlying experimental procedures and (ii) the classification of the residues based on their exchange protection levels, also allowing for the instant visualization of the relevant residue groups on the 3D structures of the corresponding proteins. By providing a clear hierarchical framework for the easy sharing, comparison and (re-)interpretation of HDX data, Start2Fold intends to promote a better understanding of how the protein sequence encodes folding and structure as well as the development of new computational methods predicting protein folding and stability.

Subject(s)

Databases, Protein , Deuterium Exchange Measurement , Protein Folding , Protein Stability , Protein Conformation , Sequence Analysis, Protein

17.

Just a Flexible Linker? The Structural and Dynamic Properties of CBP-ID4 Revealed by NMR Spectroscopy.

Piai, Alessandro; Calçada, Eduardo O; Tarenzi, Thomas; Grande, Alessandro Del; Varadi, Mihaly; Tompa, Peter; Felli, Isabella C; Pierattelli, Roberta.

Biophys J ; 110(2): 372-381, 2016 Jan 19.

Article in English | MEDLINE | ID: mdl-26789760

ABSTRACT

Here, we present a structural and dynamic description of CBP-ID4 at atomic resolution. ID4 is the fourth intrinsically disordered linker of CREB-binding protein (CBP). In spite of the largely disordered nature of CBP-ID4, NMR chemical shifts and relaxation measurements show a significant degree of α-helix sampling in the protein regions encompassing residues 2-25 and 101-128 (1852-1875 and 1951-1978 in full-length CBP). Proline residues are uniformly distributed along the polypeptide, except for the two α-helical regions, indicating that they play an active role in modulating the structural features of this CBP fragment. The two helical regions are lacking known functional motifs, suggesting that they represent thus-far uncharacterized functional modules of CBP. This work provides insights into the functions of this protein linker that may exploit its plasticity to modulate the relative orientations of neighboring folded domains of CBP and fine-tune its interactions with a multitude of partners.

Subject(s)

CREB-Binding Protein/chemistry , Inhibitor of Differentiation Proteins/chemistry , Molecular Dynamics Simulation , Amino Acid Motifs , Amino Acid Sequence , Humans , Molecular Sequence Data , Tertiary Protein Structure

18.

pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins.

Varadi, Mihaly; Kosol, Simone; Lebrun, Pierre; Valentini, Erica; Blackledge, Martin; Dunker, A Keith; Felli, Isabella C; Forman-Kay, Julie D; Kriwacki, Richard W; Pierattelli, Roberta; Sussman, Joel; Svergun, Dmitri I; Uversky, Vladimir N; Vendruscolo, Michele; Wishart, David; Wright, Peter E; Tompa, Peter.

Nucleic Acids Res ; 42(Database issue): D326-35, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24174539

ABSTRACT

The goal of pE-DB (http://pedb.vib.be) is to serve as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. Owing to the inherent flexibility of IDPs, solution techniques are particularly appropriate for characterizing their biophysical properties, and structural ensembles in agreement with these data provide a convenient tool for describing the underlying conformational sampling. Database entries consist of (i) primary experimental data with descriptions of the acquisition methods and algorithms used for the ensemble calculations, and (ii) the structural ensembles consistent with these data, provided as a set of models in a Protein Data Bank format. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent the IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and lead to a better understanding of how function arises from disordered states.

Subject(s)

Protein Databases , Intrinsically Disordered Proteins/chemistry , Protein Unfolding , Internet , Biomolecular Nuclear Magnetic Resonance , Small Angle Scattering , X-Ray Diffraction

19.

DisCons: a novel tool to quantify and classify evolutionary conservation of intrinsic protein disorder.

Varadi, Mihaly; Guharoy, Mainak; Zsolyomi, Fruzsina; Tompa, Peter.

BMC Bioinformatics ; 16: 153, 2015 May 13.

Article in English | MEDLINE | ID: mdl-25968230

ABSTRACT

BACKGROUND: Analyzing the amino acid sequence of an intrinsically disordered protein (IDP) in an evolutionary context can yield novel insights on the functional role of disordered regions and sequence element(s). However, in the case of many IDPs, the lack of evolutionary conservation of the primary sequence can hamper the study of functionality, because the conservation of their disorder profile and ensuing function(s) may not appear in a traditional analysis of the evolutionary history of the protein. RESULTS: Here we present DisCons (Disorder Conservation), a novel pipelined tool that combines the quantification of sequence- and disorder conservation to classify disordered residue positions. According to this scheme, the most interesting categories (for functional purposes) are constrained disordered residues and flexible disordered residues. The former residues show conservation of both the sequence and the property of disorder and are associated mainly with specific binding functionalities (e.g., short, linear motifs, SLiMs), whereas the latter class corresponds to segments where disorder as a feature is important for function as opposed to the identity of the underlying sequence (e.g., entropic chains and linkers). DisCons therefore helps with elucidating the function(s) arising from the disordered state by analyzing individual proteins as well as large-scale proteomics datasets. CONCLUSIONS: DisCons is an openly accessible sequence analysis tool that identifies and highlights structurally disordered segments of proteins where the conformational flexibility is conserved across homologs, and therefore potentially functional. The tool is freely available both as a web application and as stand-alone source code hosted at http://pedb.vib.be/discons.

Subject(s)

Conserved Sequence , Molecular Evolution , Protein Sequence Analysis/methods , Software , Tumor Suppressor Protein p53/chemistry , Amino Acid Sequence , Amino Acid Substitution , Humans , Molecular Models , Molecular Sequence Data , Protein Conformation

20.

The Protein Ensemble Database.

Varadi, Mihaly; Tompa, Peter.

Adv Exp Med Biol ; 870: 335-49, 2015.

Article in English | MEDLINE | ID: mdl-26387108

ABSTRACT

The scientific community's major conceptual notion of structural biology has recently shifted in emphasis from the classical structure-function paradigm due to the emergence of intrinsically disordered proteins (IDPs). As opposed to their folded cousins, these proteins are defined by the lack of a stable 3D fold and a high degree of inherent structural heterogeneity that is closely tied to their function. Due to their flexible nature, solution techniques such as small-angle X-ray scattering (SAXS), nuclear magnetic resonance (NMR) spectroscopy and fluorescence resonance energy transfer (FRET) are particularly well-suited for characterizing their biophysical properties. Computationally derived structural ensembles based on such experimental measurements provide models of the conformational sampling displayed by these proteins, and they may offer valuable insights into the functional consequences of inherent flexibility. The Protein Ensemble Database (http://pedb.vib.be) is the first openly accessible, manually curated online resource storing the ensemble models, protocols used during the calculation procedure, and underlying primary experimental data derived from SAXS and/or NMR measurements. By making this previously inaccessible data freely available to researchers, this novel resource is expected to promote the development of more advanced modelling methodologies, facilitate the design of standardized calculation protocols, and consequently lead to a better understanding of how function arises from the disordered state.

Subject(s)

Protein Databases , Proteins/chemistry , Information Storage and Retrieval , Biomolecular Nuclear Magnetic Resonance , Small Angle Scattering , User-Computer Interface , X-Ray Diffraction
