ABSTRACT
An unambiguous description of an experiment, and the subsequent biological observation, is vital for accurate data interpretation. Minimum information guidelines define the fundamental complement of data that can support an unambiguous conclusion based on experimental observations. We present the Minimum Information About Disorder Experiments (MIADE) guidelines to define the parameters required for the wider scientific community to understand the findings of an experiment studying the structural properties of intrinsically disordered regions (IDRs). MIADE guidelines provide recommendations for data producers to describe the results of their experiments at source, for curators to annotate experimental data to community resources and for database developers maintaining community resources to disseminate the data. The MIADE guidelines will improve the interpretability of experimental results for data consumers, facilitate direct data submission, simplify data curation, improve data exchange among repositories and standardize the dissemination of the key metadata on an IDR experiment by IDR data sources.
Subject(s)
Intrinsically Disordered Proteins, Intrinsically Disordered Proteins/chemistry, Protein Conformation
ABSTRACT
Intrinsically disordered proteins and protein regions (IDPs/IDRs) carry out important biological functions without relying on a single well-defined conformation. As these proteins are a challenge to study experimentally, computational methods play important roles in their characterization. One of the commonly used tools is the IUPred web server, which provides prediction of disordered regions and their binding sites. IUPred is rooted in a simple biophysical model and uses a limited number of parameters derived largely from globular protein structures. This enabled an incredibly fast and robust prediction method; however, its limitations have also become apparent in light of recent breakthrough methods using deep learning techniques. Here, we present AIUPred, a novel version of IUPred which incorporates deep learning techniques into the energy estimation framework. It achieves improved performance while keeping the robustness of the original method. Based on the evaluation of recent benchmark datasets, AIUPred scored amongst the top three single-sequence-based methods. With a new web server we offer fast and reliable visual analysis for users as well as options to analyze whole genomes in mere seconds with the downloadable package. AIUPred is available at https://aiupred.elte.hu.
Subject(s)
Deep Learning, Intrinsically Disordered Proteins, Software, Intrinsically Disordered Proteins/chemistry, Intrinsically Disordered Proteins/metabolism, Intrinsically Disordered Proteins/genetics, Binding Sites, Protein Conformation, Internet, Thermodynamics, Computational Biology/methods
ABSTRACT
Intrinsically disordered proteins and protein regions (IDPs/IDRs) exist without a single well-defined conformation. They carry out important biological functions with multifaceted roles, which is also reflected in their evolutionary behavior. Computational methods play important roles in the characterization of IDRs. One of the commonly used disorder prediction methods is IUPred, which relies on an energy estimation approach. The IUPred web server takes an amino acid sequence or a UniProt ID/accession as an input and predicts the tendency for each amino acid to be in a disordered region, with an option to also predict context-dependent disordered regions. In this new iteration of IUPred, we added multiple novel features to enhance the prediction capabilities of the server. First, learning from the latest evaluation of disorder prediction methods, we introduced multiple new smoothing functions to the prediction that decrease noise and increase the performance of the predictions. We constructed a dataset consisting of experimentally verified ordered/disordered regions with unambiguous annotations which were added to the prediction. We also introduced a novel tool that enables the exploration of the evolutionary conservation of protein disorder coupled to sequence conservation in model organisms. The web server is freely available to users and accessible at https://iupred3.elte.hu.
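The idea of smoothing per-residue scores to reduce noise can be illustrated with a minimal sketch. The abstract does not say which smoothing functions the server uses, so the centered moving average below, the function name `smooth_scores` and the `window` parameter are assumptions chosen purely for illustration.

```python
from typing import List


def smooth_scores(scores: List[float], window: int = 5) -> List[float]:
    """Smooth per-residue scores with a centered moving average.

    Assumption: a plain moving average stands in for the (unspecified)
    smoothing functions. Near the sequence ends the window is truncated,
    so each value is the mean of the scores that fall inside the window.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        start = max(0, i - half)
        end = min(len(scores), i + half + 1)
        segment = scores[start:end]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

A larger window suppresses more residue-level noise but also blurs the boundaries of short disordered segments, which is the usual trade-off when choosing a smoothing width.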
Subject(s)
Intrinsically Disordered Proteins/chemistry, Software, Algorithms, Amino Acid Sequence, Conserved Sequence, Eukaryotic Initiation Factor-2/chemistry, Molecular Evolution, Fungal Proteins/chemistry, Humans, Intrinsically Disordered Proteins/genetics, Protein Sequence Analysis
ABSTRACT
Protein and lipid membrane interactions play fundamental roles in a large number of cellular processes (e.g. signalling, vesicle trafficking, or viral invasion). A growing number of examples indicate that such interactions can also rely on intrinsically disordered protein regions (IDRs), which can form specific reversible interactions not only with proteins but also with lipids. We named IDRs that undergo such a membrane lipid-induced disorder-to-order transition MemMoRFs, by analogy with IDRs that exhibit a disorder-to-order transition upon interaction with protein partners, termed Molecular Recognition Features (MoRFs). Currently, both the experimental detection and computational characterization of MemMoRFs are challenging, and information about these regions is scattered in the literature. To facilitate the related investigations, we generated a comprehensive database of experimentally validated MemMoRFs based on manual curation of literature and structural data. To characterize the dynamics of MemMoRFs, secondary structure propensity and flexibility calculated from nuclear magnetic resonance chemical shifts were incorporated into the database. These data were supplemented by inclusion of sentences from papers, functional data and disease-related information. The MemMoRF database can be accessed via a user-friendly interface at https://memmorf.hegelab.org, potentially providing a central resource for the characterization of disordered regions in transmembrane and membrane-associated proteins.
Subject(s)
Cell Membrane/metabolism, Protein Databases, Intrinsically Disordered Proteins/chemistry, Intrinsically Disordered Proteins/metabolism, Open Reading Frames/genetics, Internet, Magnetic Resonance Spectroscopy, Protein Binding
ABSTRACT
The MobiDB database (URL: https://mobidb.org/) provides predictions and annotations for intrinsically disordered proteins. Here, we report recent developments implemented in MobiDB version 4, regarding the database format, with novel types of annotations and an improved update process. The new website includes a re-designed user interface, a more effective search engine and an advanced API for programmatic access. The new database schema gives users more flexibility and simplifies maintenance and updates. In addition, the new entry page provides more visualisation tools, including a customizable feature viewer and graphs of the residue contact maps. MobiDB v4 annotates the binding modes of disordered proteins, whether they undergo disorder-to-order transitions or remain disordered in the bound state. In addition, disordered regions undergoing liquid-liquid phase separation or post-translational modifications are defined. The integrated information is presented in a simplified interface, which enables faster searches and allows large customized datasets to be downloaded in TSV, FASTA or JSON formats. An alternative advanced interface allows users to drill deeper into features of interest. A new statistics page provides information at database and proteome levels. The new MobiDB version presents state-of-the-art knowledge on disordered proteins and improves data accessibility for both computational and experimental users.
Subject(s)
Protein Databases, Intrinsically Disordered Proteins/chemistry, Algorithms, Internet, Molecular Sequence Annotation, Post-Translational Protein Processing, Software
ABSTRACT
There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs. SHORT ABSTRACT: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.
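One widely used statistical measure of sequence complexity is the Shannon entropy of the residue composition in a sliding window. The sketch below is a minimal illustration of that general idea, not a method taken from the review; the function names and the default window length of 12 are assumptions.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(segment: str) -> float:
    """Return the Shannon entropy (in bits) of the residue composition."""
    counts = Counter(segment)
    total = len(segment)
    # Sum p * log2(1/p) over the residue types present in the segment.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def window_entropies(sequence: str, window: int = 12) -> List[float]:
    """Entropy of every full-length window along the sequence.

    Returns an empty list when the sequence is shorter than the window.
    """
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]
```

Windows with entropy well below that of a typical protein composition are candidates for low complexity. As the review argues, such a statistic alone does not reveal whether the region is disordered or forms an ordered repeat structure.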
Subject(s)
Proteins/chemistry, Algorithms, Amino Acid Sequence, Protein Databases, Molecular Evolution, Protein Conformation, Protein Domains
ABSTRACT
MOTIVATION: The earlier version of MobiDB-lite is currently used in large-scale proteome annotation platforms to detect intrinsic disorder. However, new theoretical models allow for the classification of intrinsically disordered regions into subtypes from sequence features associated with specific polymeric properties or compositional bias. RESULTS: MobiDB-lite 3.0 maintains its previous speed and performance but also provides a finer classification of disorder by identifying regions with characteristics of polyampholytes, positive or negative polyelectrolytes, low-complexity regions, or regions enriched in cysteine, proline, glycine or polar residues. Subregions are abundantly detected in IDRs of the human proteome. The new version of MobiDB-lite represents a new step for the proteome-level analysis of protein disorder. AVAILABILITY AND IMPLEMENTATION: Both the MobiDB-lite 3.0 source code and a Docker container are available from the GitHub repository: https://github.com/BioComputingUP/MobiDB-lite.
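A charge-based subtype assignment of the kind described above could be sketched as follows. This is a hypothetical illustration and not the MobiDB-lite 3.0 implementation: the function names, the residue sets, and the thresholds `min_charged` and `min_net` are all assumptions.

```python
from typing import Tuple


def charge_fractions(region: str) -> Tuple[float, float]:
    """Return (fraction positive, fraction negative) for a region.

    Assumption: Lys/Arg count as positive and Asp/Glu as negative;
    histidine is ignored for simplicity.
    """
    if not region:
        return 0.0, 0.0
    pos = sum(region.count(aa) for aa in "KR") / len(region)
    neg = sum(region.count(aa) for aa in "DE") / len(region)
    return pos, neg


def classify_by_charge(
    region: str, min_charged: float = 0.35, min_net: float = 0.25
) -> str:
    """Assign a coarse charge class using hypothetical thresholds."""
    pos, neg = charge_fractions(region)
    net = pos - neg
    if net >= min_net:
        return "positive polyelectrolyte"
    if net <= -min_net:
        return "negative polyelectrolyte"
    if pos + neg >= min_charged:
        # Many charges of both signs that roughly cancel out.
        return "polyampholyte"
    return "unclassified"
```

The net charge is checked before the total charged fraction so that a strongly charged region of one sign is not labelled as a polyampholyte.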
ABSTRACT
Membraneless organelles (MOs) are dynamic liquid condensates that host a variety of specific cellular processes, such as ribosome biogenesis or RNA degradation. MOs form through liquid-liquid phase separation (LLPS), a process that relies on multivalent weak interactions of the constituent proteins and other macromolecules. Since the first discoveries of certain proteins being able to drive LLPS, it has emerged as a general mechanism for the effective organization of cellular space that is exploited in all kingdoms of life. While numerous experimental studies report novel cases, the computational identification of LLPS drivers is lagging behind, and many open questions remain about the sequence determinants, composition, regulation and biological relevance of the resulting condensates. Our limited ability to overcome these issues is largely due to the lack of a dedicated LLPS database. Therefore, here we introduce PhaSePro (https://phasepro.elte.hu), an openly accessible, comprehensive, manually curated database of experimentally validated LLPS driver proteins/protein regions. It not only provides a wealth of information on such systems, but also improves the standardization of data by introducing novel LLPS-specific controlled vocabularies. PhaSePro can be accessed through an appealing, user-friendly interface and thus has definite potential to become the central resource in this dynamically developing field.
Subject(s)
Protein Databases, Proteins/chemistry, Controlled Vocabulary, Organelles/metabolism, Proteins/metabolism, User-Computer Interface
ABSTRACT
Low complexity regions (LCRs) in protein sequences are characterized by a less diverse amino acid composition compared to typically observed sequence diversity. Recent studies have shown that LCRs may co-occur with intrinsically disordered regions, are highly conserved in many organisms, and often play important roles in protein functions and in diseases. In previous decades, several methods have been developed to identify regions with LCRs or amino acid bias, but most of them are stand-alone applications, and currently there is no web-based tool that allows users to explore LCRs in protein sequences with additional functional annotations. We aim to fill this gap by providing PlaToLoCo (PLAtform of TOols for LOw COmplexity), a meta-server that integrates and collects the output of five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. In addition, the union or intersection of the results of the search on a query sequence can be obtained. By developing the PlaToLoCo meta-server, we provide the community with a fast and easily accessible tool for the analysis of LCRs with additional information included to aid the interpretation of the results. The PlaToLoCo platform is available at: http://platoloco.aei.polsl.pl/.
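Combining the regions reported by several tools into a union or an intersection can be sketched as below. This is a minimal illustration of the set operations only, not PlaToLoCo's code; the `Region` type, the function names and the tool labels in the usage note are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Region = Tuple[int, int]  # 1-based, inclusive start and end positions


def to_positions(regions: List[Region]) -> Set[int]:
    """Expand regions into the set of residue positions they cover."""
    positions: Set[int] = set()
    for start, end in regions:
        positions.update(range(start, end + 1))
    return positions


def to_regions(positions: Set[int]) -> List[Region]:
    """Collapse residue positions back into sorted, contiguous regions."""
    regions: List[Region] = []
    for pos in sorted(positions):
        if regions and pos == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], pos)  # extend the current region
        else:
            regions.append((pos, pos))  # start a new region
    return regions


def combine(results: Dict[str, List[Region]], mode: str = "union") -> List[Region]:
    """Combine per-tool regions by 'union' or 'intersection'."""
    sets = [to_positions(regions) for regions in results.values()]
    if not sets:
        return []
    if mode == "union":
        combined = set().union(*sets)
    elif mode == "intersection":
        combined = set.intersection(*sets)
    else:
        raise ValueError("mode must be 'union' or 'intersection'")
    return to_regions(combined)
```

For example, with `{"tool_a": [(1, 10)], "tool_b": [(5, 15), (20, 22)]}` the union covers every residue flagged by either tool, while the intersection keeps only residues 5 to 10, where both agree.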
Subject(s)
Proteins/chemistry, Software, Amino Acids/analysis, Computer Graphics, Humans, Membrane Proteins/chemistry, Molecular Sequence Annotation, Protein Domains, Protein Sequence Analysis
ABSTRACT
The structural states of proteins include ordered globular domains as well as intrinsically disordered protein regions that exist as highly flexible conformational ensembles in isolation. Various computational tools have been developed to discriminate ordered and disordered segments based on the amino acid sequence. However, properties of IDRs can also depend on various conditions, including binding to globular protein partners or environmental factors, such as redox potential. These cases provide further challenges for the computational characterization of disordered segments. In this work we present IUPred2A, a combined web interface that allows users to generate energy-estimation-based predictions for ordered and disordered residues by IUPred2 and for disordered binding regions by ANCHOR2. The updated web server retains the robustness of the original programs but offers several new features. While only minor bug fixes are implemented for IUPred, the next version of ANCHOR is significantly improved through a new architecture and parameters optimized on novel datasets. In addition, redox-sensitive regions can also be highlighted through a novel experimental feature. The web server offers graphical and text outputs, a RESTful interface, access to software download and extensive help, and can be accessed at a new location: http://iupred2a.elte.hu.
Subject(s)
Internet, Proteins/genetics, Software, Algorithms, Oxidation-Reduction, Protein Binding, Protein Conformation, Proteins/chemistry, Protein Sequence Analysis
ABSTRACT
The MobiDB (URL: mobidb.bio.unipd.it) database of protein disorder and mobility annotations has been significantly updated and upgraded since its last major renewal in 2014. Several curated datasets for intrinsic disorder and folding upon binding have been integrated from specialized databases. The indirect evidence has also been expanded to better capture information available in the PDB, such as high temperature residues in X-ray structures and overall conformational diversity. Novel nuclear magnetic resonance chemical shift data provide an additional experimental information layer on conformational dynamics. Predictions have been expanded to provide new types of annotation on backbone rigidity, secondary structure preference and disordered binding regions. MobiDB 3.0 contains information for the complete UniProt protein set and synchronization has been improved by covering all UniParc sequences. An advanced search function allows the creation of a wide array of custom-made datasets for download and further analysis. A large amount of information and cross-links to more specialized databases are intended to make MobiDB the central resource for the scientific community working on protein intrinsic disorder and mobility.
Subject(s)
Protein Databases, Intrinsically Disordered Proteins/chemistry, Molecular Sequence Annotation, Software, Amino Acid Sequence, Binding Sites, Datasets as Topic, Gene Ontology, Humans, Internet, Intrinsically Disordered Proteins/genetics, Intrinsically Disordered Proteins/metabolism, Molecular Models, Protein Binding, Protein Folding, Protein Interaction Domains and Motifs, Sequence Alignment
ABSTRACT
Recently developed quantitative redox proteomic studies enable the direct identification of redox-sensing cysteine residues that regulate the functional behavior of target proteins in response to changing levels of reactive oxygen species. At the molecular level, redox regulation can directly modify the active sites of enzymes, although a growing number of examples indicate the importance of an additional underlying mechanism that involves conditionally disordered proteins. These proteins alter their functional behavior by undergoing a disorder-to-order transition in response to changing redox conditions. However, the extent to which this mechanism is used in various proteomes is currently unknown. Here, a recently developed sequence-based prediction tool incorporated into the IUPred2A web server is used to estimate redox-sensitive conditionally disordered regions at a large scale. It is shown that redox-sensitive conditional disorder is fairly widespread in various proteomes and that its presence strongly correlates with the expansion of specific domains in multicellular organisms that largely rely on extra stability provided by disulfide bonds or zinc ion binding. The analyses of yeast redox proteomes and human disease data further underline the significance of this phenomenon in the regulation of a wide range of biological processes, as well as its biomedical importance.
Subject(s)
Cysteine/metabolism, Intrinsically Disordered Proteins/metabolism, Proteomics/methods, Reactive Oxygen Species/metabolism, Animals, Cysteine/chemistry, Humans, Intrinsically Disordered Proteins/chemistry, Molecular Models, Oxidation-Reduction, Protein Conformation, Saccharomyces cerevisiae/chemistry, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
Motivation: Intrinsic disorder (ID), i.e. the lack of a unique folded conformation at physiological conditions, is a common feature of many proteins, and its detection requires specialized biochemical experiments that are not high-throughput. Missing X-ray residues from the PDB have been widely used as a proxy for ID when developing computational methods. This may lead to a systematic bias, where predictors deviate from biologically relevant ID. Large benchmarking sets on experimentally validated ID are scarce. Recently, the DisProt database has been renewed and expanded to include manually curated ID annotations for several hundred new proteins. This provides a large benchmark set which has not yet been used for training ID predictors. Results: Here, we describe the first systematic benchmarking of ID predictors on the new DisProt dataset. In contrast to previous assessments based on missing X-ray data, this dataset contains mostly long ID regions and a significant number of fully ID proteins. The benchmarking shows that ID predictors work quite well on the new dataset, especially for long ID segments. However, a large fraction of ID still goes virtually undetected and the ranking of methods is different from that for PDB data. In particular, many predictors appear to confound ID and regions outside X-ray structures. This suggests that the ID prediction methods capture different flavors of disorder and can benefit from highly accurate curated examples. Availability and implementation: The raw data used for the evaluation are available from URL: http://www.disprot.org/assessment/. Contact: silvio.tosatto@unipd.it. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Computational Biology/methods, Protein Databases, Protein Conformation, Protein Sequence Analysis/methods
ABSTRACT
Motivation: Intrinsically Disordered Proteins (IDPs) mediate crucial protein-protein interactions, most notably in signaling and regulation. As their importance is increasingly recognized, the detailed analyses of specific IDP interactions have opened up new opportunities for therapeutic targeting. Yet, large-scale information about the structural and functional details of IDP-mediated interactions is lacking, hindering the understanding of the mechanisms underlying this distinct binding mode. Results: Here, we present DIBS, the first comprehensive, curated collection of complexes between IDPs and ordered proteins. DIBS not only describes by far the highest number of cases, it also provides the dissociation constants of their interactions, as well as the description of potential post-translational modifications modulating the binding strength and linear motifs involved in the binding. Together with the wide range of structural and functional annotations, DIBS will provide the cornerstone for structural and functional studies of IDP complexes. Availability and implementation: DIBS is freely accessible at http://dibs.enzim.ttk.mta.hu/. The DIBS application is hosted by an Apache web server and was implemented in PHP. To enrich querying features and to enhance backend performance, a MySQL database was also created. Contact: dosztanyi@caesar.elte.hu or bmeszaros@caesar.elte.hu. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Protein Databases, Intrinsically Disordered Proteins/metabolism, Protein Interaction Domains and Motifs, Binding Sites, Intrinsically Disordered Proteins/chemistry
ABSTRACT
The Database of Protein Disorder (DisProt, URL: www.disprot.org) has been significantly updated and upgraded since its last major renewal in 2007. The current release holds information on more than 800 entries of IDPs/IDRs, i.e. intrinsically disordered proteins or regions that exist and function without a well-defined three-dimensional structure. We have re-curated previous entries to purge DisProt of conflicting cases, and also upgraded the functional classification scheme to reflect continuous advances in the field in the past 10 years or so. We define IDPs as proteins that are disordered along their entire sequence, i.e. entirely lack structural elements, and IDRs as regions that are at least five consecutive residues without well-defined structure. We base our assessment of disorder strictly on experimental evidence, such as X-ray crystallography and nuclear magnetic resonance (primary techniques) and a broad range of other experimental approaches (secondary techniques). Confident and ambiguous annotations are highlighted separately. DisProt 7.0 presents classified knowledge regarding the experimental characterization and functional annotations of IDPs/IDRs, and is intended to provide an invaluable resource for the research community for a better understanding of structural disorder and for developing better computational tools for studying disordered proteins.
Subject(s)
Protein Databases, Intrinsically Disordered Proteins, Animals, X-Ray Crystallography, Fluorescence Resonance Energy Transfer, Forecasting, Forms and Records Control, Humans, Intrinsically Disordered Proteins/classification, Biomolecular Nuclear Magnetic Resonance, Protein Conformation
ABSTRACT
InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.
Subject(s)
Computational Biology/methods, Protein Databases, Protein Interaction Domains and Motifs, Software, Humans, Molecular Sequence Annotation, Phylogeny
ABSTRACT
Motivation: Intrinsic disorder (ID) is established as an important feature of protein sequences. Its use in proteome annotation is however hampered by the availability of many methods with similar performance at the single residue level, which have mostly not been optimized to predict long ID regions of size comparable to domains. Results: Here, we have focused on providing a single consensus-based prediction, MobiDB-lite, optimized for highly specific (i.e. few false positive) predictions of long disorder. The method uses eight different predictors to derive a consensus which is then filtered for spurious short predictions. Consensus prediction is shown to outperform the single methods when annotating long ID regions. MobiDB-lite can be useful in large-scale annotation scenarios and has indeed already been integrated in the MobiDB, DisProt and InterPro databases. Availability and Implementation: MobiDB-lite is available as part of the MobiDB database from URL: http://mobidb.bio.unipd.it/. An executable can be downloaded from URL: http://protein.bio.unipd.it/mobidblite/. Contact: silvio.tosatto@unipd.it. Supplementary information: Supplementary data are available at Bioinformatics online.
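The two steps described above, deriving a consensus from several predictors and then filtering out spurious short predictions, can be sketched as follows. This is a simplified illustration and not the MobiDB-lite implementation: the function names and the `min_votes` and `min_length` parameters are assumptions, and each predictor is assumed to yield one boolean per residue.

```python
from typing import List, Tuple


def consensus_mask(predictions: List[List[bool]], min_votes: int) -> List[bool]:
    """Mark a residue as disordered when at least min_votes predictors agree."""
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same sequence length")
    return [sum(p[i] for p in predictions) >= min_votes for i in range(length)]


def long_regions(mask: List[bool], min_length: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive runs of True that are at least min_length long."""
    regions = []
    start = None
    # A trailing False sentinel closes a run that reaches the sequence end.
    for i, flag in enumerate(list(mask) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    return regions
```

Raising `min_votes` or `min_length` trades sensitivity for specificity, which matches the stated aim of few false positives on long disordered regions.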
Subject(s)
Intrinsically Disordered Proteins/metabolism, Proteomics/methods, Protein Sequence Analysis/methods, Software, Consensus, Humans, Intrinsically Disordered Proteins/chemistry, Sensitivity and Specificity
ABSTRACT
Protein-protein interactions (PPIs) formed between short linear motifs and globular domains play important roles in many regulatory and signaling processes but are highly underrepresented in current protein-protein interaction databases. These types of interactions are usually characterized by a specific binding motif that captures the key amino acids shared among the interaction partners. However, the computational proteome-level identification of interaction partners based on the known motif is hindered by the huge number of randomly occurring matches from which biologically relevant motif hits need to be extracted. In this work, we established a novel bioinformatic filtering protocol to efficiently explore the interaction network of a hub protein. We introduced a novel measure that enabled the optimization of the elements and parameter settings of the pipeline, which was built from multiple sequence-based prediction methods. In addition, data collected from PPI databases and evolutionary analyses were also incorporated to further increase the biological relevance of the identified motif hits. The approach was applied to the dynein light chain LC8, a ubiquitous eukaryotic hub protein that has been suggested to be involved in motor-related functions as well as promoting the dimerization of various proteins by recognizing linear motifs in its partners. From the list of putative binding motifs collected by our protocol, several novel peptides were experimentally verified to bind LC8. Altogether 71 potential new motif instances were identified. The expanded list of LC8 binding partners revealed the evolutionary plasticity of binding partners despite the highly conserved binding interface. In addition, it also highlighted a novel, conserved function of LC8 in the upstream regulation of the Hippo signaling pathway. Beyond the LC8 system, our work also provides general guidelines that can be applied to explore the interaction network of other linear motif binding proteins or protein domains.
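The first step of such a protocol, scanning sequences for matches to a known motif, is often done with regular expressions. The sketch below only illustrates that scanning step and why filtering is needed afterwards; it is not the authors' pipeline, and the function name and the toy pattern used in the usage note are hypothetical.

```python
import re
from typing import List, Tuple


def find_motif_hits(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every hit, including overlaps."""
    # Wrapping the pattern in a lookahead with a capturing group reports
    # overlapping matches, which a plain scan over the pattern would skip.
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]
```

With the toy pattern `P.P`, the sequence `APAPAPA` yields two overlapping hits starting at positions 2 and 4. Short, degenerate patterns like this match frequently by chance, which is why hits need further filtering by disorder, conservation or interaction data before they are treated as candidates.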
Subject(s)
Cytoplasmic Dyneins/chemistry, Cytoplasmic Dyneins/metabolism, Protein Serine-Threonine Kinases/chemistry, Protein Serine-Threonine Kinases/metabolism, Computational Biology, Conserved Sequence, Cytoplasmic Dyneins/genetics, Protein Databases/statistics & numerical data, Molecular Evolution, Hippo Signaling Pathway, Humans, Phylogeny, Protein Binding, Protein Interaction Domains and Motifs, Protein Interaction Maps, Protein Serine-Threonine Kinases/genetics, Signal Transduction
ABSTRACT
Mitogen-activated protein kinases (MAPKs) are broadly used regulators of cellular signaling. However, how these enzymes can be involved in such a broad spectrum of physiological functions is not understood. Systematic discovery of MAPK networks both experimentally and in silico has been hindered because MAPKs bind to other proteins with low affinity and mostly in less-characterized disordered regions. We used a structurally consistent model of kinase-docking motif interactions to facilitate the discovery of short functional sites in the structurally flexible and functionally under-explored part of the human proteome and applied experimental tools specifically tailored to detect low-affinity protein-protein interactions for their validation in vitro and in cell-based assays. The combined computational and experimental approach enabled the identification of many novel MAPK-docking motifs that were elusive for other large-scale protein-protein interaction screens. The analysis produced an extensive list of independently evolved linear binding motifs from a functionally diverse set of proteins. These all target, with characteristic binding specificity, an ancient protein interaction surface on three evolutionarily related but physiologically clearly distinct MAPKs (JNK, ERK, and p38). This inventory of human protein kinase binding sites was compared with that of other organisms to examine how kinase-mediated partnerships evolved over time. The analysis suggests that most human MAPK-binding motifs are surprisingly new evolutionary inventions, and newly found links highlight (previously hidden) roles of MAPKs. We propose that short MAPK-binding stretches are created in disordered protein segments in a variety of ways and that they represent a major resource for ancient signaling enzymes to acquire new regulatory roles.
Subject(s)
Mitogen-Activated Protein Kinases/chemistry, Mitogen-Activated Protein Kinases/ultrastructure, Protein Tertiary Structure, Amino Acid Sequence, Animals, Computational Biology, Humans, Molecular Docking Simulation, Sequence Alignment, Signal Transduction, Surface Properties
ABSTRACT
We present the Database of Disordered Protein Prediction (D(2)P(2)), available at http://d2p2.pro (including website source code). A battery of disorder predictors and their variants, VL-XT, VSL2b, PrDOS, PV2, Espritz and IUPred, were run on all protein sequences from 1765 complete proteomes (to be updated as more genomes are completed). Integrated with these results are all of the predicted (mostly structured) SCOP domains using the SUPERFAMILY predictor. These disorder/structure annotations together enable comparison of the disorder predictors with each other and examination of the overlap between disordered predictions and SCOP domains on a large scale. D(2)P(2) will increase our understanding of the interplay between disorder and structure, the genomic distribution of disorder, and its evolutionary history. The parsed data are made available in a unified format for download as flat files or SQL tables either by genome, by predictor, or for the complete set. An interactive website provides a graphical view of each protein annotated with the SCOP domains and disordered regions from all predictors overlaid (or shown as a consensus). There are statistics and tools for browsing and comparing genomes and their disorder within the context of their position on the tree of life.