ABSTRACT
Gene fusions are common cancer-causing mutations, but the molecular principles by which fusion protein products affect interaction networks and cause disease are not well understood. Here, we perform an integrative analysis of the structural, interactomic, and regulatory properties of thousands of putative fusion proteins. We demonstrate that genes that form fusions (i.e., parent genes) tend to be highly connected hub genes, whose protein products are enriched in structured and disordered interaction-mediating features. Fusion often results in the loss of these parental features and the depletion of regulatory sites such as post-translational modifications. Fusion products disproportionately connect proteins that did not previously interact in the protein interaction network. In this manner, fusion products can escape cellular regulation and constitutively rewire protein interaction networks. We suggest that the deregulation of central, interaction-prone proteins may represent a widespread mechanism by which fusion proteins alter the topology of cellular signaling pathways and promote cancer.
Subject(s)
Gene Fusion , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Protein Interaction Maps , Computational Biology , Databases, Protein , Humans , Protein Interaction Mapping , Protein Processing, Post-Translational , Signal Transduction , Transcription Factors/genetics , Transcription Factors/metabolism , Ubiquitination
ABSTRACT
Although gene fusions have been recognized as important drivers of cancer for decades, our understanding of the prevalence and function of gene fusions has been revolutionized by the rise of next-generation sequencing, advances in bioinformatics theory and an increasing capacity for large-scale computational biology. The computational work on gene fusions has been vastly diverse, and the present state of the literature is fragmented. It will be fruitful to merge three camps of gene fusion bioinformatics that appear to rarely cross over: (i) data-intensive computational work characterizing the molecular biology of gene fusions; (ii) development research on fusion detection tools, candidate fusion prioritization algorithms and dedicated fusion databases; and (iii) clinical research that either seeks to therapeutically target fusion transcripts and proteins or leverages advances in detection tools to perform large-scale surveys of gene fusion landscapes in specific cancer types. In this review, we unify these different, yet highly complementary and symbiotic, approaches with the view that increased synergy will catalyze advancements in gene fusion identification, characterization and significance evaluation.
Subject(s)
Computational Biology/methods , Gene Fusion , Oncogene Proteins, Fusion/chemistry , Oncogenes , Algorithms , Databases, Genetic , Gene Expression Regulation , Humans , Neoplasms/genetics , Oncogene Proteins, Fusion/genetics , Software
ABSTRACT
Although gene fusions are recognized as driver mutations in a wide variety of cancers, the general molecular mechanisms underlying oncogenic fusion proteins are insufficiently understood. Here, we employ large-scale data integration and machine learning and (1) identify three functionally distinct subgroups of gene fusions and their molecular signatures; (2) characterize the cellular pathways rewired by fusion events across different cancers; and (3) analyze the relative importance of over 100 structural, functional, and regulatory features of ~2200 gene fusions. We report subgroups of fusions that likely act as driver mutations and find that gene fusions disproportionately affect pathways regulating cellular shape and movement. Although fusion proteins are similar across different cancer types, they affect cancer type-specific pathways. Key indicators of fusion-forming proteins include high and non-tissue-specific expression, numerous splice sites, and higher centrality in protein-interaction networks. Together, these findings provide unifying and cancer type-specific trends across diverse oncogenic fusion proteins.
ABSTRACT
Proteins with amino acid homorepeats have the potential to be detrimental to cells and are often associated with human diseases. Why, then, are homorepeats prevalent in eukaryotic proteomes? In yeast, homorepeats are enriched in proteins that are essential and pleiotropic and that buffer environmental insults. The presence of homorepeats increases the functional versatility of proteins by mediating protein interactions and facilitating spatial organization in a repeat-dependent manner. During evolution, homorepeats are preferentially retained in proteins with stringent proteostasis, which might minimize repeat-associated detrimental effects such as unregulated phase separation and protein aggregation. Their presence facilitates rapid protein divergence through accumulation of amino acid substitutions, which often affect linear motifs and post-translational-modification sites. These substitutions may result in rewiring protein interaction and signaling networks. Thus, homorepeats are distinct modules that are often retained in stringently regulated proteins. Their presence facilitates rapid exploration of the genotype-phenotype landscape of a population, thereby contributing to adaptation and fitness.
Subject(s)
Proteins/genetics , Proteins/metabolism , Repetitive Sequences, Amino Acid/genetics , Biological Evolution , Eukaryota , Protein Interaction Maps
ABSTRACT
p53 is an important regulator of cell cycle arrest, senescence, apoptosis and metabolism, and is frequently mutated in tumors. It functions as a tetramer, where each component dimer binds to a decameric DNA region known as a response element. We identify p53 binding site subtypes and examine the functional and evolutionary properties of these subtypes. We start with over 1700 known binding sites and, with no prior labeling, identify two sets of response elements by unsupervised clustering. When combined, they give rise to three types of p53 binding sites. We find that probabilistic and alignment-based assessments of cross-species conservation show no strong evidence of differential conservation between types of binding sites. In contrast, functional analysis of the genes most proximal to the binding sites provides strong bioinformatic evidence of functional differentiation between the three types of binding sites. Our results are consistent with recent structural data identifying two conformations of the L1 loop in the DNA binding domain, suggesting that they reflect biologically meaningful groups imposed by the p53 protein structure.
ABSTRACT
The traditional structure-to-function paradigm conceives of a protein's function as emerging from its structure. In recent years, it has been established that unstructured, intrinsically disordered regions (IDRs) in proteins are equally crucial elements for protein function, regulation and homeostasis. In this review, we provide a brief overview of how IDRs can perform similar functions to structured proteins, focusing especially on the formation of protein complexes and assemblies and the mediation of regulated conformational changes. In addition to highlighting instances of such functional equivalence, we explain how differences in the biological and physicochemical properties of IDRs allow them to expand the functional and regulatory repertoire of proteins. We also discuss studies that provide insights into how mutations within functional regions of IDRs can lead to human diseases.
Subject(s)
Intrinsically Disordered Proteins , Protein Structure, Tertiary , Animals , Humans , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Intrinsically Disordered Proteins/physiology , Mice , Models, Molecular , Protein Conformation , Proteome
ABSTRACT
Intrinsically disordered regions (IDRs) are fundamental units of protein function and regulation. Despite their inability to form a unique stable tertiary structure in isolation, many IDRs adopt a defined conformation upon binding and achieve their function through their interactions with other biomolecules. However, this requirement for IDR functionality seems to be at odds with the high entropic cost they must incur upon binding an interaction partner. How is this seeming paradox resolved? While increasing the enthalpy of binding is one approach to compensate for this entropic cost, growing evidence suggests that inherent features of IDRs, for instance repeating linear motifs, minimise the entropic cost of binding. Moreover, this control of entropic cost can be carefully modulated by a range of regulatory mechanisms, such as alternative splicing and post-translational modifications, which enable allosteric communication and rheostat-like tuning of IDR function. In that sense, the high entropic cost of IDR binding can be advantageous by providing tunability to protein function. In addition to biological regulatory mechanisms, modulation of entropy can also be controlled by environmental factors, such as changes in temperature, redox-potential and pH. These principles are extensively exploited by a number of organisms, including pathogens. They can also be utilised in bioengineering, synthetic biology and in pharmaceutical applications such as increasing bioavailability of protein therapeutics.