Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 19 de 19
Filtrar
1.
Nature ; 622(7983): 646-653, 2023 Oct.
Artículo en Inglés | MEDLINE | ID: mdl-37704037

RESUMEN

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database1. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4 . By searching for novelties from sequence, structure and semantic perspectives, we uncovered the ß-flower fold, added several protein families to Pfam database2 and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light into uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.


Asunto(s)
Bases de Datos de Proteínas , Aprendizaje Profundo , Anotación de Secuencia Molecular , Pliegue de Proteína , Proteínas , Homología Estructural de Proteína , Secuencia de Aminoácidos , Internet , Proteínas/química , Proteínas/clasificación , Proteínas/metabolismo
2.
Nucleic Acids Res ; 52(W1): W318-W323, 2024 Jul 05.
Artículo en Inglés | MEDLINE | ID: mdl-38634802

RESUMEN

The 'structure assessment' web server is a one-stop shop for interactive evaluation and benchmarking of structural models of macromolecular complexes including proteins and nucleic acids. A user-friendly web dashboard links sequence with structure information and results from a variety of state-of-the-art tools, which facilitates the visual exploration and evaluation of structure models. The dashboard integrates stereochemistry information, secondary structure information, global and local model quality assessment of the tertiary structure of comparative protein models, as well as prediction of membrane location. In addition, a benchmarking mode is available where a model can be compared to a reference structure, providing easy access to scores that have been used in recent CASP experiments and CAMEO. The structure assessment web server is available at https://swissmodel.expasy.org/assess.


Asunto(s)
Internet , Modelos Moleculares , Programas Informáticos , Proteínas/química , Benchmarking , Conformación Proteica
3.
Bioinformatics ; 40(1)2024 01 02.
Artículo en Inglés | MEDLINE | ID: mdl-38175775

RESUMEN

MOTIVATION: Language models are routinely used for text classification and generative tasks. Recently, the same architectures were applied to protein sequences, unlocking powerful new approaches in the bioinformatics field. Protein language models (pLMs) generate high-dimensional embeddings on a per-residue level and encode a "semantic meaning" of each individual amino acid in the context of the full protein sequence. These representations have been used as a starting point for downstream learning tasks and, more recently, for identifying distant homologous relationships between proteins. RESULTS: In this work, we introduce a new method that generates embedding-based protein sequence alignments (EBA) and show how these capture structural similarities even in the twilight zone, outperforming both classical methods as well as other approaches based on pLMs. The method shows excellent accuracy despite the absence of training and parameter optimization. We demonstrate that the combination of pLMs with alignment methods is a valuable approach for the detection of relationships between proteins in the twilight-zone. AVAILABILITY AND IMPLEMENTATION: The code to run EBA and reproduce the analysis described in this article is available at: https://git.scicore.unibas.ch/schwede/EBA and https://git.scicore.unibas.ch/schwede/eba_benchmark.


Asunto(s)
Aminoácidos , Proteínas , Proteínas/química , Secuencia de Aminoácidos , Alineación de Secuencia , Lenguaje
4.
Proteins ; 91(12): 1850-1860, 2023 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-37858934

RESUMEN

Predicting model quality is a fundamental component of any modeling procedure, and blind assessment of these methods constitutes a crucial aspect of the Critical Assessment of Protein Structure Prediction (CASP) experiment. Historically, the main focus was on assessing methods that predict global and per-residue accuracies in tertiary structure models. This focus shifted with the community's increased efforts in modeling complexes and assemblies. We asked the community to process the models from the CASP15 assembly category and provide estimates of the accuracy of the predicted quaternary structure, both globally and at the local interface level. Besides identifying remarkable accuracy of modeling groups in assessing their own predictions, we set up a benchmarking pipeline to highlight different aspects of quaternary structure models and introduced a simple consensus EMA method as baseline. While participating methods showed commendable performance, the baseline was difficult to surpass. It is important to point out that prediction performance varies for the individual CASP targets, highlighting potential areas of improvement and challenges ahead.


Asunto(s)
Biología Computacional , Proteínas , Conformación Proteica , Modelos Moleculares , Biología Computacional/métodos , Proteínas/química , Benchmarking
5.
Proteins ; 91(12): 1811-1821, 2023 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-37795762

RESUMEN

CASP15 introduced a new category, ligand prediction, where participants were provided with a protein or nucleic acid sequence, SMILES line notation, and stoichiometry for ligands and tasked with generating computational models for the three-dimensional structure of the corresponding protein-ligand complex. These models were subsequently compared with experimental structures determined by x-ray crystallography or cryoEM. To assess these predictions, two novel scores were developed. The Binding-Site Superposed, Symmetry-Corrected Pose Root Mean Square Deviation (BiSyRMSD) evaluated the absolute deviations of the models from the experimental structures. At the same time, the Local Distance Difference Test for Protein-Ligand Interactions (lDDT-PLI) assessed the ability of models to reproduce the protein-ligand interactions in the experimental structures. The ligands evaluated in this challenge range from single-atom ions to large flexible organic molecules. More than 1800 submissions were evaluated for their ability to predict 23 different protein-ligand complexes. Overall, the best models could faithfully reproduce the geometries of more than half of the prediction targets. The ligands' size and flexibility were the primary factors influencing the predictions' quality. Small ions and organic molecules with limited flexibility were predicted with high fidelity, while reproducing the binding poses of larger, flexible ligands proved more challenging.


Asunto(s)
Modelos Moleculares , Humanos , Ligandos , Sitios de Unión , Iones , Unión Proteica , Cristalografía por Rayos X
6.
Proteins ; 91(12): 1550-1557, 2023 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-37306011

RESUMEN

Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.


Asunto(s)
Biología Computacional , Proteínas , Conformación Proteica , Proteínas/química , Modelos Moleculares , Ligandos
7.
PLoS Comput Biol ; 17(1): e1008667, 2021 01.
Artículo en Inglés | MEDLINE | ID: mdl-33507980

RESUMEN

Computational methods for protein structure modelling are routinely used to complement experimental structure determination, thus they help to address a broad spectrum of scientific questions in biomedical research. The most accurate methods today are based on homology modelling, i.e. detecting a homologue to the desired target sequence that can be used as a template for modelling. Here we present a versatile open source homology modelling toolbox as foundation for flexible and computationally efficient modelling workflows. ProMod3 is a fully scriptable software platform that can perform all steps required to generate a protein model by homology. Its modular design aims at fast prototyping of novel algorithms and implementing flexible modelling pipelines. Common modelling tasks, such as loop modelling, sidechain modelling or generating a full protein model by homology, are provided as production ready pipelines, forming the starting point for own developments and enhancements. ProMod3 is the central software component of the widely used SWISS-MODEL web-server.


Asunto(s)
Biología Computacional/métodos , Modelos Moleculares , Proteínas/química , Programas Informáticos , Homología Estructural de Proteína , Algoritmos , Bases de Datos de Proteínas , Internet , Conformación Proteica
8.
Bioinformatics ; 36(6): 1765-1771, 2020 03 01.
Artículo en Inglés | MEDLINE | ID: mdl-31697312

RESUMEN

MOTIVATION: Methods that estimate the quality of a 3D protein structure model in absence of an experimental reference structure are crucial to determine a model's utility and potential applications. Single model methods assess individual models whereas consensus methods require an ensemble of models as input. In this work, we extend the single model composite score QMEAN that employs statistical potentials of mean force and agreement terms by introducing a consensus-based distance constraint (DisCo) score. RESULTS: DisCo exploits distance distributions from experimentally determined protein structures that are homologous to the model being assessed. Feed-forward neural networks are trained to adaptively weigh contributions by the multi-template DisCo score and classical single model QMEAN parameters. The result is the composite score QMEANDisCo, which combines the accuracy of consensus methods with the broad applicability of single model approaches. We also demonstrate that, despite being the de-facto standard for structure prediction benchmarking, CASP models are not the ideal data source to train predictive methods for model quality estimation. For performance assessment, QMEANDisCo is continuously benchmarked within the CAMEO project and participated in CASP13. For both, it ranks among the top performers and excels with low response times. AVAILABILITY AND IMPLEMENTATION: QMEANDisCo is available as web-server at https://swissmodel.expasy.org/qmean. The source code can be downloaded from https://git.scicore.unibas.ch/schwede/QMEAN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Proteínas , Programas Informáticos , Modelos Moleculares , Redes Neurales de la Computación , Conformación Proteica
9.
Nucleic Acids Res ; 46(W1): W296-W303, 2018 07 02.
Artículo en Inglés | MEDLINE | ID: mdl-29788355

RESUMEN

Homology modelling has matured into an important technique in structural biology, significantly contributing to narrowing the gap between known protein sequences and experimentally determined structures. Fully automated workflows and servers simplify and streamline the homology modelling process, also allowing users without a specific computational expertise to generate reliable protein models and have easy access to modelling results, their visualization and interpretation. Here, we present an update to the SWISS-MODEL server, which pioneered the field of automated modelling 25 years ago and been continuously further developed. Recently, its functionality has been extended to the modelling of homo- and heteromeric complexes. Starting from the amino acid sequences of the interacting proteins, both the stoichiometry and the overall structure of the complex are inferred by homology modelling. Other major improvements include the implementation of a new modelling engine, ProMod3 and the introduction a new local model quality estimation method, QMEANDisCo. SWISS-MODEL is freely available at https://swissmodel.expasy.org.


Asunto(s)
Internet , Conformación Proteica , Proteínas/genética , Programas Informáticos , Bases de Datos de Proteínas , Modelos Químicos , Simulación de Dinámica Molecular , Proteínas/química , Homología de Secuencia de Aminoácido , Homología Estructural de Proteína
10.
Int J Mol Sci ; 21(14)2020 Jul 21.
Artículo en Inglés | MEDLINE | ID: mdl-32708196

RESUMEN

(1) Background: Virtual screening studies on the therapeutically relevant proteins of the severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2) require a detailed characterization of their druggable binding sites, and, more generally, a convenient pocket mapping represents a key step for structure-based in silico studies; (2) Methods: Along with a careful literature search on SARS-CoV-2 protein targets, the study presents a novel strategy for pocket mapping based on the combination of pocket (as performed by the well-known FPocket tool) and docking searches (as performed by PLANTS or AutoDock/Vina engines); such an approach is implemented by the Pockets 2.0 plug-in for the VEGA ZZ suite of programs; (3) Results: The literature analysis allowed the identification of 16 promising binding cavities within the SARS-CoV-2 proteins and the here proposed approach was able to recognize them showing performances clearly better than those reached by the sole pocket detection; and (4) Conclusions: Even though the presented strategy should require more extended validations, this proved successful in precisely characterizing a set of SARS-CoV-2 druggable binding pockets including both orthosteric and allosteric sites, which are clearly amenable for virtual screening campaigns and drug repurposing studies. All results generated by the study and the Pockets 2.0 plug-in are available for download.


Asunto(s)
Antivirales/química , Betacoronavirus/efectos de los fármacos , Infecciones por Coronavirus/tratamiento farmacológico , Neumonía Viral/tratamiento farmacológico , Proteínas Virales/química , Sitios de Unión/efectos de los fármacos , COVID-19 , Reposicionamiento de Medicamentos , Humanos , Simulación del Acoplamiento Molecular , Pandemias , Unión Proteica/efectos de los fármacos , Conformación Proteica , SARS-CoV-2
11.
Proteins ; 87(12): 1378-1387, 2019 12.
Artículo en Inglés | MEDLINE | ID: mdl-31571280

RESUMEN

Critical blind assessment of structure prediction techniques is crucial for the scientific community to establish the state of the art, identify bottlenecks, and guide future developments. In Critical Assessment of Techniques in Structure Prediction (CASP), human experts assess the performance of participating methods in relation to the difficulty of the prediction task in a biennial experiment on approximately 100 targets. Yet, the development of automated computational modeling methods requires more frequent evaluation cycles and larger sets of data. The "Continuous Automated Model EvaluatiOn (CAMEO)" platform complements CASP by conducting fully automated blind prediction evaluations based on the weekly pre-release of sequences of those structures, which are going to be published in the next release of the Protein Data Bank (PDB). Each week, CAMEO publishes benchmarking results for predictions corresponding to a set of about 20 targets collected during a 4-day prediction window. CAMEO benchmarking data are generated consistently for all methods at the same point in time, enabling developers to cross-validate their method's performance, and referring to their results in publications. Many successful participants of CASP have used CAMEO-either by directly benchmarking their methods within the system or by comparing their own performance to CAMEO reference data. CAMEO offers a variety of scores reflecting different aspects of structure modeling, for example, binding site accuracy, homo-oligomer interface quality, or accuracy of local model confidence estimates. By introducing the "bestSingleTemplate" method based on structure superpositions as a reference for the accuracy of 3D modeling predictions, CAMEO facilitates objective comparison of techniques and fosters the development of advanced methods.


Asunto(s)
Biología Computacional , Conformación Proteica , Proteínas/ultraestructura , Programas Informáticos , Algoritmos , Benchmarking , Sitios de Unión , Bases de Datos de Proteínas , Humanos , Modelos Moleculares , Pliegue de Proteína , Proteínas/química , Proteínas/genética , Análisis de Secuencia de Proteína
12.
Proteins ; 87(12): 1361-1377, 2019 12.
Artículo en Inglés | MEDLINE | ID: mdl-31265154

RESUMEN

Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue-residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus-based methods.


Asunto(s)
Biología Computacional , Conformación Proteica , Proteínas/ultraestructura , Programas Informáticos , Algoritmos , Bases de Datos de Proteínas , Modelos Moleculares , Pliegue de Proteína , Proteínas/química , Proteínas/genética , Alineación de Secuencia , Análisis de Secuencia de Proteína
13.
Nucleic Acids Res ; 45(D1): D313-D319, 2017 01 04.
Artículo en Inglés | MEDLINE | ID: mdl-27899672

RESUMEN

SWISS-MODEL Repository (SMR) is a database of annotated 3D protein structure models generated by the automated SWISS-MODEL homology modeling pipeline. It currently holds >400 000 high quality models covering almost 20% of Swiss-Prot/UniProtKB entries. In this manuscript, we provide an update of features and functionalities which have been implemented recently. We address improvements in target coverage, model quality estimates, functional annotations and improved in-page visualization. We also introduce a new update concept which includes regular updates of an expanded set of core organism models and UniProtKB-based targets, complemented by user-driven on-demand update of individual models. With the new release of the modeling pipeline, SMR has implemented a REST-API and adopted an open licencing model for accessing model coordinates, thus enabling bulk download for groups of targets fostering re-use of models in other contexts. SMR can be accessed at https://swissmodel.expasy.org/repository.


Asunto(s)
Bases de Datos de Proteínas , Modelos Moleculares , Conformación Proteica , Proteínas/química , Humanos , Proteoma , Proteómica/métodos , Programas Informáticos , Relación Estructura-Actividad , Navegador Web
14.
Proteins ; 86 Suppl 1: 387-398, 2018 03.
Artículo en Inglés | MEDLINE | ID: mdl-29178137

RESUMEN

Every second year, the community experiment "Critical Assessment of Techniques for Structure Prediction" (CASP) is conducting an independent blind assessment of structure prediction methods, providing a framework for comparing the performance of different approaches and discussing the latest developments in the field. Yet, developers of automated computational modeling methods clearly benefit from more frequent evaluations based on larger sets of data. The "Continuous Automated Model EvaluatiOn (CAMEO)" platform complements the CASP experiment by conducting fully automated blind prediction assessments based on the weekly pre-release of sequences of those structures, which are going to be published in the next release of the PDB Protein Data Bank. CAMEO publishes weekly benchmarking results based on models collected during a 4-day prediction window, on average assessing ca. 100 targets during a time frame of 5 weeks. CAMEO benchmarking data is generated consistently for all participating methods at the same point in time, enabling developers to benchmark and cross-validate their method's performance, and directly refer to the benchmarking results in publications. In order to facilitate server development and promote shorter release cycles, CAMEO sends weekly email with submission statistics and low performance warnings. Many participants of CASP have successfully employed CAMEO when preparing their methods for upcoming community experiments. CAMEO offers a variety of scores to allow benchmarking diverse aspects of structure prediction methods. By introducing new scoring schemes, CAMEO facilitates new development in areas of active research, for example, modeling quaternary structure, complexes, or ligand binding sites.


Asunto(s)
Biología Computacional/métodos , Modelos Moleculares , Conformación Proteica , Proteínas/química , Proteínas/metabolismo , Análisis de Secuencia de Proteína/métodos , Sitios de Unión , Bases de Datos de Proteínas , Humanos , Ligandos , Unión Proteica
16.
Nucleic Acids Res ; 42(Web Server issue): W252-8, 2014 Jul.
Artículo en Inglés | MEDLINE | ID: mdl-24782522

RESUMEN

Protein structure homology modelling has become a routine technique to generate 3D models for proteins when experimental structures are not available. Fully automated servers such as SWISS-MODEL with user-friendly web interfaces generate reliable models without the need for complex software packages or downloading large databases. Here, we describe the latest version of the SWISS-MODEL expert system for protein structure modelling. The SWISS-MODEL template library provides annotation of quaternary structure and essential ligands and co-factors to allow for building of complete structural models, including their oligomeric structure. The improved SWISS-MODEL pipeline makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models. The accuracy of the models generated by SWISS-MODEL is continuously evaluated by the CAMEO system. The new web site allows users to interactively search for templates, cluster them by sequence similarity, structurally compare alternative templates and select the ones to be used for model building. In cases where multiple alternative template structures are available for a protein of interest, a user-guided template selection step allows building models in different functional states. SWISS-MODEL is available at http://swissmodel.expasy.org/.


Asunto(s)
Modelos Moleculares , Estructura Cuaternaria de Proteína , Estructura Terciaria de Proteína , Programas Informáticos , Homología Estructural de Proteína , Evolución Molecular , Internet
17.
Bioinformatics ; 30(17): i505-11, 2014 Sep 01.
Artículo en Inglés | MEDLINE | ID: mdl-25161240

RESUMEN

MOTIVATION: Membrane proteins are an important class of biological macromolecules involved in many cellular key processes including signalling and transport. They account for one third of genes in the human genome and >50% of current drug targets. Despite their importance, experimental structural data are sparse, resulting in high expectations for computational modelling tools to help fill this gap. However, as many empirical methods have been trained on experimental structural data, which is biased towards soluble globular proteins, their accuracy for transmembrane proteins is often limited. RESULTS: We developed a local model quality estimation method for membrane proteins ('QMEANBrane') by combining statistical potentials trained on membrane protein structures with a per-residue weighting scheme. The increasing number of available experimental membrane protein structures allowed us to train membrane-specific statistical potentials that approach statistical saturation. We show that reliable local quality estimation of membrane protein models is possible, thereby extending local quality estimation to these biologically relevant molecules. AVAILABILITY AND IMPLEMENTATION: Source code and datasets are available on request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Proteínas de la Membrana/química , Modelos Moleculares , Algoritmos , Interpretación Estadística de Datos , Humanos
18.
BMC Genomics ; 15: 1162, 2014 Dec 22.
Artículo en Inglés | MEDLINE | ID: mdl-25534632

RESUMEN

BACKGROUND: Large-scale RNAi screening has become an important technology for identifying genes involved in biological processes of interest. However, the quality of large-scale RNAi screening is often deteriorated by off-targets effects. In order to find statistically significant effector genes for pathogen entry, we systematically analyzed entry pathways in human host cells for eight pathogens using image-based kinome-wide siRNA screens with siRNAs from three vendors. We propose a Parallel Mixed Model (PMM) approach that simultaneously analyzes several non-identical screens performed with the same RNAi libraries. RESULTS: We show that PMM gains statistical power for hit detection due to parallel screening. PMM allows incorporating siRNA weights that can be assigned according to available information on RNAi quality. Moreover, PMM is able to estimate a sharedness score that can be used to focus follow-up efforts on generic or specific gene regulators. By fitting a PMM model to our data, we found several novel hit genes for most of the pathogens studied. CONCLUSIONS: Our results show parallel RNAi screening can improve the results of individual screens. This is currently particularly interesting when large-scale parallel datasets are becoming more and more publicly available. Our comprehensive siRNA dataset provides a public, freely available resource for further statistical and biological analyses in the high-content, high-throughput siRNA screening field.


Asunto(s)
Genómica/métodos , Interferencia de ARN , ARN Interferente Pequeño/genética , Línea Celular , Biblioteca de Genes , Genómica/normas , Ensayos Analíticos de Alto Rendimiento , Interacciones Huésped-Patógeno/genética , Humanos , Curva ROC , Reproducibilidad de los Resultados
19.
Methods Mol Biol ; 1851: 301-316, 2019.
Artículo en Inglés | MEDLINE | ID: mdl-30298405

RESUMEN

Proteins are subject to evolutionary forces that shape their three-dimensional structure to meet specific functional demands. The knowledge of the structure of a protein is therefore instrumental to gain information about the molecular basis of its function. However, experimental structure determination is inherently time consuming and expensive, making it impossible to follow the explosion of sequence data deriving from genome-scale projects. As a consequence, computational structural modeling techniques have received much attention and established themselves as a valuable complement to experimental structural biology efforts. Among these, comparative modeling remains the method of choice to model the three-dimensional structure of a protein when homology to a protein of known structure can be detected.The general strategy consists of using experimentally determined structures of proteins as templates for the generation of three-dimensional models of related family members (targets) of which the structure is unknown. This chapter provides a description of the individual steps needed to obtain a comparative model using SWISS-MODEL, one of the most widely used automated servers for protein structure homology modeling.


Asunto(s)
Proteínas/química , Biología Computacional , Modelos Moleculares , Proteínas/clasificación , Homología de Secuencia de Aminoácido , Homología Estructural de Proteína
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA