ABSTRACT
De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. At a bare minimum, these orphan proteins must not cause serious harm to the organism; for instance, they should not aggregate. Therefore, although the creation of short ORFs could be truly random, their fixation should be subject to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied the structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in properties between proteins of different ages. However, when we take GC content into account, we note that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with the use of codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for the fixation of orphan proteins. Instead, these proteins largely resemble random proteins given a particular GC level. During evolution, the properties of a protein change faster than the GC level, causing the relationship between disorder and GC content to weaken gradually.
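The link between GC content and disorder-promoting residues can be illustrated with a short calculation. The following is a sketch for intuition and does not reproduce the authors' pipeline. It assumes independent nucleotides with the same GC fraction at all three codon positions and the standard genetic code. It also assumes a commonly used set of disorder-promoting residues (A, R, G, Q, S, P, E, K). The function names are hypothetical. For a given GC level, the code computes the expected fraction of disorder-promoting residues in a random ORF, with stop codons excluded.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: a commonly used set of disorder-promoting residues.
DISORDER_PROMOTING = set("ARGQSPEK")


def base_prob(base: str, gc: float) -> float:
    """Probability of one nucleotide when G and C together have frequency `gc`."""
    return gc / 2 if base in "GC" else (1 - gc) / 2


def disorder_fraction(gc: float) -> float:
    """Expected fraction of disorder-promoting residues in a random ORF.

    Assumes the same GC content at every codon position; stop codons are
    excluded and the remaining codon probabilities renormalized.
    """
    total = 0.0
    disordered = 0.0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # a random ORF contains no internal stop codons
        p = 1.0
        for base in codon:
            p *= base_prob(base, gc)
        total += p
        if aa in DISORDER_PROMOTING:
            disordered += p
    return disordered / total


if __name__ == "__main__":
    for gc in (0.3, 0.4, 0.5, 0.6):
        print(f"GC={gc:.1f}  disorder-promoting fraction={disorder_fraction(gc):.3f}")
```

Under these assumptions, the expected fraction is 1/7 at GC = 0, 30/61 (about 0.49) at GC = 0.5, and 1.0 at GC = 1. At GC = 1, every codon made only of G and C encodes alanine, arginine, glycine or proline.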
Subjects
Intrinsically Disordered Proteins/chemistry, Intrinsically Disordered Proteins/genetics, Animals, Base Composition, Computational Biology, Protein Databases, Drosophila Proteins/chemistry, Drosophila Proteins/genetics, Molecular Evolution, Gene Ontology, Open Reading Frames, Phylogeny, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/genetics, Genetic Selection, Protein Structural Homology
ABSTRACT
Proteins evolve through point mutations as well as by insertions and deletions (indels). During the last decade it has become apparent that protein regions that do not fold into three-dimensional structures, i.e. intrinsically disordered regions, are quite common. Here, we have studied the relationship between protein disorder and indels using HMM-HMM pairwise alignments in two sets of orthologous eukaryotic protein pairs. First, we show that disordered residues are much more frequent among indel residues than among aligned residues, and that they are also more prevalent among indels than in coils. Second, we observe that disordered residues are particularly common in longer indels. Disordered indels of short-to-medium size are prevalent in the non-terminal regions of proteins, while the longest indels, ordered and disordered alike, occur toward the termini of the proteins, where new structural units are comparatively well tolerated. Finally, while disordered regions often evolve faster than ordered regions and disorder is common in indels, there are some previously recognized protein families in which the disordered region is more conserved than the ordered region. We find that these rare proteins are often involved in information processes, such as RNA processing and translation. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.
Subjects
INDEL Mutation, Proteins/genetics
ABSTRACT
Proteins evolve not only through point mutations but also by insertion and deletion events, which affect the length of the protein. It is well known that such indel events most frequently occur in surface-exposed loops. However, detailed analysis of indel events in distantly related and fast-evolving proteins is hampered by the difficulty of correctly aligning such sequences. Here, we circumvent this problem by first analyzing homologous proteins on the basis of length variation alone rather than pairwise alignments. Using this approach, we find a surprisingly strong relationship between difference in length and difference in the number of intrinsically disordered residues, where up to three quarters of the length variation can be explained by changes in the number of intrinsically disordered residues. Further, we find that disorder is common in both insertions and deletions. A more detailed analysis reveals that indel events do not induce disorder; rather, already disordered regions accrue indels, suggesting that the selective pressure against indels is lowered within intrinsically disordered regions.
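The statement that up to three quarters of the length variation can be explained is a statement about the fraction of variance explained (R²). The following sketch assumes an ordinary least-squares fit of per-pair length differences on per-pair differences in disordered-residue counts. The function names and toy numbers are hypothetical and are not the authors' data or method.

```python
def r_squared(x, y):
    """Fraction of the variance in y explained by a least-squares line y = a + b*x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


def length_vs_disorder(pairs):
    """R^2 of length difference on disorder difference across homologous pairs.

    Each pair is (length_a, disordered_a, length_b, disordered_b).
    """
    d_disorder = [da - db for _, da, _, db in pairs]
    d_length = [la - lb for la, _, lb, _ in pairs]
    return r_squared(d_disorder, d_length)


if __name__ == "__main__":
    # Hypothetical toy pairs, for illustration only.
    toy = [(300, 40, 260, 5), (150, 10, 150, 12), (500, 120, 420, 60), (220, 30, 240, 45)]
    print(f"R^2 = {length_vs_disorder(toy):.2f}")
```

A value near 0.75 would correspond to the "three quarters" quoted in the abstract. Signed differences are used so that insertions and deletions on either side of a pair are both captured.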
Subjects
INDEL Mutation, Intrinsically Disordered Proteins/chemistry, Intrinsically Disordered Proteins/genetics, Proteins/chemistry, Proteins/genetics, Amino Acid Sequence, Molecular Evolution, Genetic Variation, Molecular Models, Phylogeny, Protein Conformation, Secondary Protein Structure, Proteins/metabolism, Sequence Alignment, Amino Acid Sequence Homology
ABSTRACT
With synthetic gene services, molecular cloning is as easy as ordering a pizza. However, choosing the right RNA code for efficient protein production is less straightforward, more akin to deciding on the pizza toppings. The possibility of choosing synonymous codons in the gene sequence has ignited a discussion that dates back 50 years: does synonymous codon use matter? Recent studies indicate that replacing particular codons with synonymous codons can improve expression in homologous or heterologous hosts; however, this is not always successful. Furthermore, it is increasingly apparent that membrane protein biogenesis can be codon-sensitive. Single synonymous codon substitutions can influence mRNA stability, mRNA structure, translational initiation, translational elongation and even protein folding. Synonymous codon substitutions therefore need to be carefully evaluated when membrane proteins are engineered for higher production levels, and further studies are needed to fully understand how to select the codons that are optimal for higher production. This article is part of a Special Issue entitled: Protein Folding in Membranes.
Subjects
Genetic Code/genetics, Membrane Proteins/biosynthesis, Membrane Proteins/genetics, Base Sequence, Codon/genetics, Genetic Models, Molecular Sequence Data, Messenger RNA/genetics, Messenger RNA/metabolism
ABSTRACT
Particularly in higher eukaryotes, some protein domains are found in tandem repeats, performing broad functions often related to cellular organization. For instance, the eukaryotic protein filamin interacts with many proteins and is crucial for the cytoskeleton. The functional properties of long repeat domains are governed by the specific properties of each individual domain as well as by the repeat copy number. To provide a better understanding of the evolutionary and functional history of repeating domains, we investigated the mode of evolution of the filamin domain in some detail. Among the domains that are common in long repeat proteins, sushi and spectrin domains evolve primarily through cassette tandem duplications, while scavenger and immunoglobulin repeats appear to evolve through clustered tandem duplications. Additionally, immunoglobulin and filamin repeats exhibit a unique pattern in which every other domain shows high sequence similarity. This pattern may be the result of tandem duplications, may serve to avert aggregation between adjacent domains, or may reflect functional constraints. In filamin, our studies confirm the presence of interspersed integrin-binding domains in vertebrates, while invertebrates exhibit more varied patterns, including more clustered integrin-binding domains. The most notable case is leech filamin, which contains a 20-repeat expansion and exhibits a unique dimerization topology. Clearly, invertebrate filamins are varied and contain examples of similar adjacent integrin-binding domains. Given that invertebrate integrin shows more similarity to the weaker filamin binder, integrin β3, it is possible that the distance between integrin-binding domains is not as crucial for invertebrate filamins as for vertebrate ones.
Subjects
Contractile Proteins/chemistry, Contractile Proteins/genetics, Microfilament Proteins/chemistry, Microfilament Proteins/genetics, Amino Acid Sequence, Animals, Cluster Analysis, Consensus Sequence, Molecular Evolution, Filamins, Humans, Markov Chains, Genetic Models, Molecular Models, Molecular Sequence Data, Phylogeny, Protein Binding, Protein Interaction Domains and Motifs, Secondary Protein Structure, Amino Acid Repetitive Sequences, Protein Sequence Analysis
ABSTRACT
With recent publications of several large-scale protein-protein interaction (PPI) studies, the realization of the full yeast interaction network is getting closer. Here, we have analysed several yeast protein interaction datasets to understand their strengths and weaknesses. In particular, we investigate the effect of experimental biases on some of the protein properties suggested to be enriched in highly connected proteins. We then use support vector machines (SVM) to assess the contribution of these properties to protein interactivity. We find that protein abundance is the most important factor for detecting interactions in tandem affinity purifications (TAP), while it is of less importance for yeast two-hybrid (Y2H) screens. Consequently, the sequence conservation and/or essentiality of hubs may be related to their high abundance. Further, proteins with disordered structure are over-represented in Y2H screens and in one, but not the other, large-scale TAP assay. Hence, disordered regions may be important both in transient interactions and in interactions within complexes. Finally, a few domain families seem to be responsible for a large part of all interactions. Most importantly, we show that there are method-specific biases in PPI experiments. Thus, care should be taken before drawing strong conclusions based on a single dataset.
Subjects
Protein Interaction Mapping/methods, Saccharomyces cerevisiae Proteins/metabolism, Affinity Chromatography, Protein Databases, Protein Interaction Domains and Motifs, Proteomics/methods, Amino Acid Repetitive Sequences, Saccharomyces cerevisiae, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/genetics, Two-Hybrid System Techniques
ABSTRACT
BACKGROUND: Germline genetic variants are an important cause of dilated cardiomyopathy (DCM). However, recent sequencing studies have revealed rare variants in DCM-associated genes also in individuals without known heart disease. In this study, we investigate variant prevalence and genotype-phenotype correlations in Swedish DCM patients, and compare their genetic variants to those detected in reference cohorts. METHODS AND RESULTS: We sequenced the coding regions of 41 DCM-associated genes in 176 unrelated patients with idiopathic DCM and found 102 protein-altering variants with an allele frequency of <0.04% in reference cohorts; the majority were missense variants not previously described in DCM. Fifty-five (31%) patients had one variant, and 24 (14%) patients had two or more variants in the analysed genes. Detection of genetic variants in any gene, and in LMNA, MYH7 or TTN alone, was associated with early-onset disease and reduced transplant-free survival. As expected, nonsense and frameshift variants were more common in DCM patients than in healthy individuals of the 1000 Genomes European reference cohort. Surprisingly, however, the prevalence, conservation and pathogenicity scores, and localization of missense variants were similar in DCM patients and healthy reference individuals. CONCLUSION: To our knowledge, this is the first study to identify correlations between genotype and prognosis when sequencing a large number of genes in unselected DCM patients. The similar distribution of missense variants in DCM patients and healthy reference individuals calls into question the pathogenic role of many variants and suggests that results from genetic testing of DCM patients should be interpreted with caution.
Subjects
Dilated Cardiomyopathy/genetics, Adolescent, Adult, Aged, Dilated Cardiomyopathy/mortality, Dilated Cardiomyopathy/therapy, Case-Control Studies, Female, Gene Frequency, Genome-Wide Association Study, Humans, Male, Middle Aged, Missense Mutation, Survival Analysis, Sweden, Young Adult
ABSTRACT
Most eukaryotic proteins are multi-domain proteins that are created from gene fusions, deletions and internal repetitions. An investigation of such evolutionary events requires a method to find the domain architecture from which each protein originates. Therefore, we defined a novel measure, domain distance, which is calculated as the number of domains that differ between two domain architectures. Using this measure, we studied the evolutionary events that distinguish a protein from its closest ancestor and found that indels are more common than internal repetitions and that the exchange of a domain is rare. Indels and repetitions are common at both the N- and C-termini, while they are rare between domains. The evolution of the majority of multi-domain proteins can be explained by the stepwise insertion of single domains, with the exception of repeats, which are sometimes duplicated several domains at a time in tandem. We show that domain distances agree with sequence similarity and with semantic similarity based on Gene Ontology annotations. In addition, we demonstrate the use of the domain distance measure to build evolutionary trees. Finally, the evolution of multi-domain proteins is exemplified by a closer study of two protein families, non-receptor tyrosine kinases and RhoGEFs.
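The abstract defines domain distance only as the number of domains that differ between two architectures. This sketch assumes one plausible reading: the size of the multiset symmetric difference of domain labels, with domain order ignored. The published measure may treat order or repeats differently. The function names and domain labels are hypothetical illustrations.

```python
from collections import Counter


def domain_distance(arch_a, arch_b):
    """Number of domains in one architecture that are unmatched in the other.

    Assumption: architectures are compared as multisets (order ignored), so a
    domain exchange counts as one deletion plus one insertion (distance 2).
    """
    a, b = Counter(arch_a), Counter(arch_b)
    # Counter subtraction keeps only positive counts.
    return sum((a - b).values()) + sum((b - a).values())


def closest_architecture(query, candidates):
    """Candidate architecture with the smallest domain distance to `query`."""
    return min(candidates, key=lambda arch: domain_distance(query, arch))


if __name__ == "__main__":
    kinase = ["SH3", "SH2", "Kinase"]
    print(domain_distance(kinase, ["SH2", "Kinase"]))  # -> 1 (one indel)
    print(domain_distance(["Ig"] * 3, ["Ig"]))         # -> 2 (two extra repeat copies)
```

An order-aware alternative would be an edit distance over the ordered domain lists. That variant would distinguish N- and C-terminal insertions from internal ones.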
Subjects
Molecular Evolution, Guanine Nucleotide Exchange Factors/metabolism, Protein-Tyrosine Kinases/metabolism, Proteome, Protein Databases, Eukaryotic Cells, Guanine Nucleotide Exchange Factors/chemistry, Guanine Nucleotide Exchange Factors/genetics, Phylogeny, Tertiary Protein Structure, Protein-Tyrosine Kinases/chemistry, Protein-Tyrosine Kinases/genetics, Sequence Analysis
ABSTRACT
A large portion of the coding capacity of Mycobacterium tuberculosis is devoted to the production of proteins containing several copies of the pentapeptide-2 repeat, namely the PE/PPE_MPTR proteins. Protein domain repeats have a variety of binding properties and are involved in protein-protein interactions as well as binding to other ligands such as DNA and RNA. They are less common in prokaryotes than in eukaryotes, but the enrichment of pentapeptide-2 repeats in Mycobacteria constitutes an exception to that rule. The genes encoding the PE/PPE_MPTR proteins have undergone many rearrangements, and here we have identified the expansion patterns across the Mycobacteria. We have performed a reclassification of the PE/PPE_MPTR proteins using cohesive regions rather than sparse domain architectures. It is clear that these proteins have undergone large insertions of several pentapeptide-2 domains appearing adjacent to one another in a repetitive pattern. Further, we have identified a non-pentapeptide motif associated with rapid mycobacterial evolution. The sequence composition of this region suggests a different structure compared to pentapeptide-2 repeats. By studying the evolution of the PE/PPE_MPTR proteins, we have distinguished features pertaining to tuberculosis-inducing species. Further studies of the non-pentapeptide region associated with repeat expansions promise to shed light on the pathogenicity of Mycobacterium tuberculosis.
Subjects
Molecular Evolution, Mycobacterium tuberculosis/genetics, Intergenic DNA/genetics, Bacterial Genes/genetics, Microsatellite Repeats/genetics, Mycobacterium/genetics, Phylogeny, Protein Domains/genetics, Nucleic Acid Repetitive Sequences
ABSTRACT
BACKGROUND: Many biological networks show some characteristics of scale-free networks. Scale-free networks can evolve through preferential attachment, where new nodes are preferentially attached to well-connected nodes. In networks that have evolved through preferential attachment, older nodes should have a higher average connectivity than younger nodes. Here we have investigated preferential attachment in the context of metabolic networks. RESULTS: The connectivities of the enzymes in the metabolic network of Escherichia coli were determined, and representatives of these enzymes were located in 11 eukaryotes, 17 archaea and 46 bacteria. E. coli enzymes that have representatives in eukaryotes have a higher average connectivity, while enzymes that are represented only in prokaryotes, and especially the enzymes only present in βγ-proteobacteria, have lower connectivities than expected by chance. Interestingly, the enzymes that have been proposed as candidates for horizontal gene transfer have a higher average connectivity than the other enzymes. Furthermore, we found that new edges are added to highly connected enzymes at a faster rate than to enzymes with low connectivities, which is consistent with preferential attachment. CONCLUSION: Here, we have found indications of preferential attachment in the metabolic network of E. coli. A possible biological explanation for the preferential attachment growth of metabolic networks is that novel enzymes created through gene duplication retain some of the compounds involved in the original reaction throughout their future evolution. In addition, we found that enzymes that are candidates for horizontal gene transfer have a higher average connectivity than other enzymes. This indicates that while new enzymes are attached preferentially to highly connected enzymes, these highly connected enzymes have sometimes been introduced into the E. coli genome by horizontal gene transfer. We speculate that E. coli has adjusted its metabolic network to a changing environment by replacing relatively central enzymes with better-adapted orthologs from other prokaryotic species.
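Preferential attachment predicts that older nodes end up with a higher average connectivity than younger ones. The toy simulation below illustrates that prediction with a generic Barabási-Albert-style growth rule: each new node links to m existing nodes, with probability proportional to their current degree. It is a hypothetical sketch for intuition and is not the analysis of the E. coli network described above. `grow_network` and its parameters are illustrative.

```python
import random


def grow_network(n_nodes, m=2, seed=0):
    """Grow a network by preferential attachment and return node degrees.

    Node ids equal their order of addition, so a lower id means an older node.
    Each new node links to m distinct existing nodes chosen with probability
    proportional to their current degree.
    """
    rng = random.Random(seed)
    # Start from a fully connected core of m + 1 nodes (each has degree m).
    degree = [m] * (m + 1)
    # Every node appears once per edge end, so uniform sampling from this list
    # picks a node with probability proportional to its degree.
    endpoints = [node for node in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        degree.append(m)
        for t in targets:
            degree[t] += 1
            endpoints.extend((new, t))
    return degree


def mean_degree_old_vs_young(degree):
    """Average degree of the older half versus the younger half of the nodes."""
    half = len(degree) // 2
    old, young = degree[:half], degree[half:]
    return sum(old) / len(old), sum(young) / len(young)


if __name__ == "__main__":
    old, young = mean_degree_old_vs_young(grow_network(2000))
    print(f"older half: {old:.2f}, younger half: {young:.2f}")
```

Sampling from the `endpoints` list avoids recomputing degree-weighted probabilities at every step. To mimic the age-versus-connectivity comparison in the abstract, the node ids would be replaced by taxonomic age classes.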
Subjects
Bacterial Gene Expression Regulation, Metabolism, Bacteria/metabolism, Bacterial Physiological Phenomena, Escherichia coli/genetics, Escherichia coli/metabolism, Escherichia coli Proteins/physiology, Molecular Evolution, Fungal Gene Expression Regulation, Horizontal Gene Transfer, Bacterial Genes, Genetic Techniques, Bacterial Genome, Biological Models, Statistical Models, Phylogeny, Protein Interaction Mapping, Protein Isoforms, Saccharomyces cerevisiae/metabolism, Genetic Selection
ABSTRACT
BACKGROUND: The two most common models for the evolution of metabolism are the patchwork evolution model, in which enzymes are thought to diverge from broad to narrow substrate specificity, and the retrograde evolution model, according to which enzymes evolve in response to substrate depletion. Analysis of the distribution of homologous enzyme pairs in the metabolic network can shed light on the respective importance of the two models. Here we investigate the evolution of E. coli metabolism, viewed as a single network, using EcoCyc. RESULTS: Sequence comparison between all enzyme pairs was performed, and the minimal path length (MPL) between all enzyme pairs was determined. We find a strong over-representation of homologous enzymes at MPL 1. We show that the functionally similar and functionally undetermined enzyme pairs are responsible for most of the over-representation of homologous enzyme pairs at MPL 1. CONCLUSIONS: The retrograde evolution model predicts that homologous enzyme pairs are at short metabolic distances from each other. In general agreement with previous studies, we find that homologous enzymes occur close to each other in the network more often than expected by chance, which lends some support to the retrograde evolution model. However, we show that the homologous enzyme pairs that may have evolved through retrograde evolution, namely the pairs that are functionally dissimilar, show a weaker over-representation at MPL 1 than the functionally similar enzyme pairs. Our study indicates that, while the retrograde evolution model may have played a small part, the patchwork evolution model is the predominant process of metabolic enzyme evolution.
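Minimal path lengths between enzymes can be computed with a breadth-first search over an enzyme graph. This sketch assumes that two enzymes are adjacent when their reactions share a compound. Real analyses often exclude currency metabolites such as ATP and water, and the paper's exact definition may differ. Enzyme and compound names are hypothetical, and EcoCyc is not queried.

```python
from collections import deque
from itertools import combinations


def enzyme_graph(reactions):
    """Link two enzymes if their reactions share at least one compound.

    `reactions` maps an enzyme id to the set of its substrates and products.
    """
    graph = {enzyme: set() for enzyme in reactions}
    for a, b in combinations(reactions, 2):
        if reactions[a] & reactions[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def minimal_path_lengths(graph, source):
    """Breadth-first search: MPL from `source` to every reachable enzyme."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def pairs_at_mpl(graph, pairs, mpl=1):
    """Count enzyme pairs (e.g. homologous pairs) separated by exactly `mpl` steps."""
    return sum(1 for a, b in pairs if minimal_path_lengths(graph, a).get(b) == mpl)


if __name__ == "__main__":
    # Hypothetical toy pathway: E1 -> E2 -> E3, plus an unconnected enzyme E4.
    toy = {"E1": {"A", "B"}, "E2": {"B", "C"}, "E3": {"C", "D"}, "E4": {"X"}}
    g = enzyme_graph(toy)
    print(minimal_path_lengths(g, "E1"))  # {'E1': 0, 'E2': 1, 'E3': 2}
```

Over-representation at MPL 1 could then be judged by comparing the count for homologous pairs with the count for randomly drawn enzyme pairs of the same size.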
Subjects
Escherichia coli/enzymology, Escherichia coli/genetics, Molecular Evolution, Algorithms, Biotin/metabolism, Computational Biology/methods, Genetic Databases, Escherichia coli Proteins/genetics, Escherichia coli Proteins/metabolism, Escherichia coli Proteins/physiology, Bacterial Genes/genetics, Bacterial Genes/physiology, Biological Models, Amino Acid Sequence Homology, Substrate Specificity/genetics
ABSTRACT
The frequency of de novo creation of proteins has been debated. Early on, it was assumed that de novo creation should be extremely rare and that the vast majority of all protein-coding genes were created early in the history of life. However, the early genomics era led to the insight that many protein-coding genes do appear to be lineage-specific. Today, with thousands of completely sequenced genomes, this impression remains. It has even been proposed that the creation of novel genes, a continuous process in which most de novo genes are short-lived, is as frequent as gene duplication. There are reports with strongly indicative evidence for de novo gene emergence in many organisms, ranging from bacteria, where new genes are sometimes generated through bacteriophages, to humans, where orphans appear to be overexpressed in brain and testis. In contrast, research on protein evolution indicates that many very distantly related proteins appear to share partial homology. Here, we discuss recent results on de novo gene emergence, as well as important technical challenges that limit our ability to determine definitively the extent of de novo protein creation.
Subjects
Molecular Evolution, Proteins/chemistry, Proteins/metabolism, Animals, Base Sequence, Humans, Molecular Sequence Data, Tertiary Protein Structure, Proteins/genetics, Nucleic Acid Sequence Homology
ABSTRACT
Many proteins are composed of protein domains, functional units of common descent. Multidomain proteins are common in all eukaryotes, making up more than half of the proteome, and the evolution of novel domain architectures has accelerated in metazoans. It is also becoming increasingly clear that alternative splicing is prevalent among vertebrates. Given that protein domains are defined as structurally, functionally and evolutionarily distinct units, one may speculate that some alternative splicing events lead to clean excisions of protein domains, thus generating a number of different domain architectures from one gene template. However, recent findings indicate that smaller alternative splicing events, in particular in disordered regions, might be more prominent than domain architectural changes. The problem of identifying protein isoforms is, moreover, still unresolved. Clearly, many splice forms identified through detection of mRNA sequences appear to produce 'nonfunctional' proteins, such as proteins with missing internal secondary structure elements. Here, we review state-of-the-art methods for the identification of functional isoforms and summarize what is known, thus far, about alternative splicing with regard to protein domain architectures.
Subjects
Alternative Splicing, Protein Isoforms, Protein Splicing, Animals, Base Sequence, Eukaryota, Humans, Protein Isoforms/chemistry, Protein Isoforms/genetics, Protein Isoforms/metabolism, Tertiary Protein Structure, Proteome/chemistry, Proteome/genetics, Proteome/metabolism
ABSTRACT
Membrane proteins are extremely challenging to produce in sufficient quantities for biochemical and structural analysis, and there is a growing demand for solutions to this problem. In this study, we attempted to improve the expression of two difficult-to-express coding sequences (araH and narK) for membrane transporters. For both coding sequences, synonymous codon substitutions in the region adjacent to the AUG start codon led to significant improvements in expression, whereas multi-parameter sequence optimization of codons throughout the coding sequence failed. We conclude that coding sequences can be re-wired for high-level protein expression by selective engineering of the 5' coding sequence with synonymous codons, thus circumventing the need to consider whole-sequence optimization.
Subjects
Codon, Escherichia coli Proteins/biosynthesis, Membrane Proteins/biosynthesis, Amino Acid Sequence, Base Sequence, Escherichia coli Proteins/genetics, Membrane Proteins/genetics, Molecular Sequence Data, Amino Acid Sequence Homology
ABSTRACT
Protein domain repeats are common in proteins that are central to the organization of a cell, in particular in eukaryotes. They are known to evolve through internal tandem duplications. However, the understanding of the underlying mechanisms is incomplete. To shed light on repeat expansion mechanisms, we have studied the evolution of the muscle protein Nebulin, a protein that contains a large number of actin-binding nebulin domains. Nebulin proteins have evolved from an invertebrate precursor containing two nebulin domains. Repeat regions have expanded through duplications of single domains, as well as through duplications of a super repeat (SR) consisting of seven nebulins. We show that the SR has evolved independently into large regions in at least three instances: twice in the invertebrate Branchiostoma floridae and once in vertebrates. In-depth analysis reveals several recent tandem duplications in the Nebulin gene. The duplicated units include single domains, individual multidomain SR units and several SR units together. Some are single events, but frequently the same unit is duplicated multiple times. For instance, an ancestor of human and chimpanzee underwent two tandem duplications. The duplication junction coincides with an Alu transposon, suggesting duplication through Alu-mediated homologous recombination. Duplications in the SR region consistently involve multiples of seven domains. However, the exact unit that is duplicated varies both between and within species. Thus, multiple tandem duplications of the same motif did not create the large Nebulin protein. Finally, analysis of segmental duplications in the human genome reveals that duplications are more common in genes containing domain repeats than in those coding for nonrepeated proteins. In fact, segmental duplications are found three to six times more often in long repeated genes than expected by chance.
Subjects
Molecular Evolution, Human Genome/genetics, Muscle Proteins/genetics, Nucleic Acid Repetitive Sequences/genetics, Genomic Segmental Duplications, Vertebrates/genetics, Animals, Exons/genetics, Humans, Muscle Proteins/classification, Phylogeny
ABSTRACT
Gene duplication is postulated to have played a major role in the evolution of biological novelty. Here, gene duplication is examined across levels of biological organization in an attempt to create a unified picture of the mechanistic process by which gene duplication may have played a role in generating biodiversity. Neofunctionalization and subfunctionalization have been proposed as important processes driving the retention of duplicate genes. These models have foundations in population genetic theory, which is now being refined by explicit consideration of the structural constraints that physical chemistry places on genes encoding proteins. Further, such models can be examined in the context of comparative genomics, where an integration of gene-level evolution and species-level evolution allows an assessment of the frequency of duplication and the fate of duplicate genes. This process, of course, depends upon the biochemical role that duplicated genes play in biological systems, which in turn depends upon the mechanism of duplication: whole-genome duplication, involving the co-duplication of interacting partners, vs. single-gene duplication. Lastly, the role that these processes may have played in driving speciation is examined.
Subjects
Molecular Evolution, Gene Duplication, Genetic Models, Animals, Arabidopsis/genetics, Arabidopsis/metabolism, Gene Expression Regulation, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Genetic Transcription, Vertebrates/genetics
ABSTRACT
BACKGROUND: Most proteins interact with only a few other proteins, while a small number of proteins (hubs) have many interaction partners. Hub proteins and non-hub proteins differ in several respects; however, our understanding of which properties characterize hubs and set them apart from proteins of low connectivity is incomplete. Therefore, we have investigated what differentiates hubs from non-hubs, and static hubs (party hubs) from dynamic hubs (date hubs), in the protein-protein interaction network of Saccharomyces cerevisiae. RESULTS: The many interactions of hub proteins can only partly be explained by binding to similar proteins or domains. It is evident that domain repeats, which are associated with binding, are enriched in hubs. Moreover, there is an over-representation of multi-domain proteins and long proteins among the hubs. In addition, there are clear differences between party hubs and date hubs. Fewer party hubs than date hubs contain long disordered regions, indicating that these regions are important for flexible binding but less so for static interactions. Furthermore, party hubs interact to a large extent with each other, supporting the idea of party hubs as the cores of highly clustered functional modules. In addition, hub proteins, and in particular party hubs, are more often ancient. Finally, the more recent paralogs of party hubs are underrepresented. CONCLUSION: Our results indicate that multiple and repeated domains are enriched in hub proteins and, further, that long disordered regions, which are common in date hubs, are particularly important for flexible binding.