ABSTRACT
We propose that spontaneous folding and molecular evolution of biopolymers are two universal aspects that must concur for life to happen. These aspects are fundamentally related to the chemical composition of biopolymers and crucially depend on the solvent in which they are embedded. We show that molecular information theory and energy landscape theory allow us to explore the limits that solvents impose on biopolymer existence. We consider 54 solvents, including water, alcohols, hydrocarbons, halogenated solvents, aromatic solvents, and low molecular weight substances made up of elements abundant in the universe, which may potentially take part in alternative biochemistries. We find that along with water, there are many solvents for which the liquid regime is compatible with biopolymer folding and evolution. We present a ranking of the solvents in terms of biopolymer compatibility. Many of these solvents have been found in molecular clouds or may be expected to occur in extrasolar planets.
Subject(s)
Solvents , Biopolymers/chemistry , Solvents/chemistry , Extraterrestrial Environment/chemistry , Evolution, Molecular , Water/chemistry
ABSTRACT
The guanine/cytosine (GC) content of prokaryotic genomes is species-specific, taking values from 16% to 77%. This diversity of selection for GC content remains contentious. We analyse the correlations between GC content and a range of phenotypic and genotypic data in thousands of prokaryotes. GC content integrates well with these traits into r/K selection theory when phenotypic plasticity is considered. High GC-content prokaryotes are r-strategists with cheaper descendants thanks to a lower average amino acid metabolic cost, colonize unstable environments thanks to flagella and a bacillus form and are generalists in terms of resource opportunism and their defence mechanisms. Low GC content prokaryotes are K-strategists specialized for stable environments that maintain homeostasis via a high-cost outer cell membrane and endospore formation as a response to nutrient deprivation, and attain a higher nutrient-to-biomass yield. The lower proteome cost of high GC content prokaryotes is driven by the association between GC-rich codons and cheaper amino acids in the genetic code, while the correlation between GC content and genome size may be partly due to functional diversity driven by r/K selection. In all, molecular diversity in the GC content of prokaryotes may be a consequence of ecological r/K selection.
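The quantity at the centre of this abstract, GC content, is the fraction of guanine and cytosine bases in a nucleotide sequence. A minimal Python sketch of the calculation follows; the function name and the toy sequence are illustrative assumptions and are not taken from the study.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among the A, C, G and T bases of a sequence."""
    seq = sequence.upper()
    # Only unambiguous bases are counted; symbols such as N are ignored.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / counted


# Hypothetical toy sequence, not taken from any genome in the study.
example = "ATGCGCGTTA"
print(f"GC content: {gc_content(example):.0%}")
```

Applied to a whole genome, the same fraction would fall somewhere in the 16% to 77% range quoted above.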
Subject(s)
Amino Acids , Prokaryotic Cells , Base Composition , Amino Acids/analysis , Codon , Proteome/genetics
ABSTRACT
Microbes are often discussed in terms of dichotomies such as copiotrophic/oligotrophic and fast/slow-growing microbes, defined using the characterisation of microbial growth in isolated cultures. The dichotomies are usually qualitative and/or study-specific, sometimes precluding clear-cut interpretation of results. We can unravel microbial dichotomies as life history strategies by combining ecology theory with Monod curves, a laboratory mathematical tool of bacterial physiology that relates the specific growth rate of a microbe with the concentration of a limiting nutrient. Fitting of Monod curves provides quantities that directly correspond to key parameters in ecological theories addressing species coexistence and diversity, such as r/K selection theory, resource competition and community structure theory and the CSR triangle of life strategies. The resulting model allows us to reconcile the copiotrophic/oligotrophic and fast/slow-growing dichotomies as different subsamples of a life history strategy triangle that also includes r/K strategists. We also used the number of known carbon sources together with community structure theory to partially explain the diversity of heterotrophic microbes observed in metagenomics experiments. In sum, we propose a theoretical framework for the study of natural microbial communities that unifies several existing proposals. Its application would require the integration of metagenomics, metametabolomics, Monod curves and carbon source data.
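For readers unfamiliar with Monod curves, the standard Monod equation gives the specific growth rate as mu = mu_max * S / (K_s + S), where S is the concentration of the limiting nutrient. The Python sketch below evaluates it for two parameter sets; the parameter values and labels are hypothetical and are not taken from the study.

```python
def monod_growth_rate(substrate: float, mu_max: float, k_s: float) -> float:
    """Specific growth rate from the Monod equation: mu_max * S / (K_s + S)."""
    if substrate < 0 or mu_max < 0 or k_s <= 0:
        raise ValueError("substrate and mu_max must be >= 0 and k_s must be > 0")
    return mu_max * substrate / (k_s + substrate)


# Hypothetical parameter sets (mu_max in 1/h, K_s in mM), for illustration only.
microbes = {
    "high mu_max, high K_s": {"mu_max": 1.0, "k_s": 1.0},
    "low mu_max, low K_s": {"mu_max": 0.3, "k_s": 0.01},
}

for substrate in (0.01, 0.1, 1.0, 10.0):
    rates = {
        name: monod_growth_rate(substrate, **params)
        for name, params in microbes.items()
    }
    print(substrate, rates)
```

With these assumed values the two curves cross: the second parameter set grows faster at low nutrient concentration and the first at high concentration. How such parameters map onto the dichotomies is the subject of the paper and is not reproduced here.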
Subject(s)
Bacteria , Microbiota , Bacteria/genetics , Heterotrophic Processes , Metagenomics , Carbon
ABSTRACT
Nearly 100 years ago, Winogradsky published a classic communication in which he described two groups of microbes, zymogenic and autochthonous. When organic matter penetrates the soil, zymogenic microbes quickly multiply and degrade it, then giving way to the slow combustion of autochthonous microbes. Although the text was originally written in French, it is often cited by English-speaking authors. We undertook a complete translation of the 1924 publication, which we provide as Supporting information. Here, we introduce the translation and describe how the zymogenic/autochthonous dichotomy shaped research questions in the study of microbial diversity and physiology. We also identify in the literature three additional and closely related dichotomies, which we propose to call exclusive copiotrophs/oligotrophs, coexisting copiotrophs/oligotrophs and fast-growing/slow-growing microbes. While Winogradsky focussed on a successional view of microbial populations over time, the current discussion is focussed on the differences in the specific growth rate of microbes as a function of the concentration of a given limiting substrate. In the future, it will be relevant to keep in mind both nutrient-focussed and time-focussed microbial dichotomies and to design experiments with both isolated laboratory cultures and multi-species communities in the spirit of Winogradsky's direct method.
Subject(s)
Bacteria , Soil Microbiology , Biodiversity , Bacteria/classification , Bacteria/cytology , Bacteria/metabolism , Soil/chemistry , Ecosystem
ABSTRACT
We propose an application of molecular information theory to analyze the folding of single domain proteins. We analyze results from various areas of protein science, such as sequence-based potentials, reduced amino acid alphabets, backbone configurational entropy, secondary structure content, residue burial layers, and mutational studies of protein stability changes. We found that the average information contained in the sequences of evolved proteins is very close to the average information needed to specify a fold, ~2.2 ± 0.3 bits/(site·operation). The effective alphabet size in evolved proteins equals the effective number of conformations of a residue in the compact unfolded state at around 5. We calculated an energy-to-information conversion efficiency upon folding of around 50%, lower than the theoretical limit of 70%, but much higher than human-built macroscopic machines. We propose a simple mapping between molecular information theory and energy landscape theory and explore the connections between sequence evolution, configurational entropy, and the energetics of protein folding.
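The link between an information content in bits and an effective alphabet size can be illustrated with Shannon entropy. The Python sketch below assumes the common convention that an entropy of H bits corresponds to an effective alphabet of 2^H equally likely symbols; the abstract does not state which formula the authors used, so this is only a consistency check.

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def effective_alphabet_size(entropy_bits):
    """Number of equally likely symbols with the given entropy (assumed 2**H)."""
    return 2.0 ** entropy_bits


# A uniform distribution over 4 symbols carries 2 bits per site.
print(shannon_entropy_bits([0.25] * 4))

# The value of about 2.2 bits per site quoted in the abstract.
print(round(effective_alphabet_size(2.2), 2))
```

Under this assumption, 2.2 bits per site corresponds to roughly 4.6 symbols, in line with the "around 5" quoted in the abstract.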
Subject(s)
Information Theory , Protein Folding , Humans , Protein Structure, Secondary , Proteins/chemistry , Entropy , Protein Conformation
ABSTRACT
Many disordered proteins conserve essential functions in the face of extensive sequence variation, making it challenging to identify the mechanisms responsible for functional selection. Here we identify the molecular mechanism of functional selection for the disordered adenovirus early gene 1A (E1A) protein. E1A competes with host factors to bind the retinoblastoma (Rb) protein, subverting cell cycle regulation. We show that two binding motifs tethered by a hypervariable disordered linker drive picomolar affinity Rb binding and host factor displacement. Compensatory changes in amino acid sequence composition and sequence length lead to conservation of optimal tethering across a large family of E1A linkers. We refer to this compensatory mechanism as conformational buffering. We also detect coevolution of the motifs and linker, which can preserve or eliminate the tethering mechanism. Conformational buffering and motif-linker coevolution explain robust functional encoding within hypervariable disordered linkers and could underlie functional selection of many disordered protein regions.
Subject(s)
Intrinsically Disordered Proteins , Adenovirus E1A Proteins/chemistry , Adenovirus E1A Proteins/genetics , Adenovirus E1A Proteins/metabolism , Amino Acid Sequence , Intrinsically Disordered Proteins/chemistry , Protein Binding , Protein Domains , Retinoblastoma Protein/metabolism
ABSTRACT
We study the limits imposed by transcription factor specificity on the maximum number of binding motifs that can coexist in a gene regulatory network, using the SwissRegulon Fantom5 collection of 684 human transcription factor binding sites as a model. We describe transcription factor specificity using regular expressions and find that most human transcription factor binding site motifs are separated in sequence space by one to three motif-discriminating positions. We apply theorems based on the pigeonhole principle to calculate the maximum number of transcription factors that can coexist given this degree of specificity, which is in the order of ten thousand and would fully utilize the space of DNA subsequences. Taking into account an expanded DNA alphabet with modified bases can further raise this limit by several orders of magnitude, at a lower level of sequence space usage. Our results may guide the design of transcription factors at both the molecular and system scale.
Subject(s)
DNA/metabolism , Nucleotide Motifs/genetics , Transcription Factors/metabolism , Algorithms , Base Sequence , Binding Sites , Humans , Protein Binding
ABSTRACT
The spike protein is the main protein component of the SARS-CoV-2 virion surface. The spike receptor-binding motif mediates recognition of the human angiotensin-converting enzyme 2 receptor, a critical step in infection, and is the preferential target for spike-neutralizing antibodies. Posttranslational modifications of the spike receptor-binding motif have been shown to modulate viral infectivity and host immune response, but these modifications are still being explored. Here we studied asparagine deamidation of the spike protein, a spontaneous event that leads to the appearance of aspartic and isoaspartic residues, which affect both the protein backbone and its charge. We used computational prediction and biochemical experiments to identify five deamidation hotspots in the SARS-CoV-2 spike protein. Asparagine residues 481 and 501 in the receptor-binding motif deamidate with a half-life of 16.5 and 123 days at 37 °C, respectively. Deamidation is significantly slowed at 4 °C, indicating a strong dependence of spike protein molecular aging on environmental conditions. Deamidation of the spike receptor-binding motif decreases the equilibrium constant for binding to the human angiotensin-converting enzyme 2 receptor more than 3.5-fold, yet its high conservation pattern suggests some positive effect on viral fitness. We propose a model for deamidation of the full SARS-CoV-2 virion illustrating how deamidation of the spike receptor-binding motif could lead to the accumulation on the virion surface of a nonnegligible chemically diverse spike population in a timescale of days. Our findings provide a potential mechanism for molecular aging of the spike protein with significant consequences for understanding virus infectivity and vaccine development.
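The half-lives quoted above can be turned into an expected deamidated fraction over time. The Python sketch below assumes simple first-order kinetics, in which the fraction deamidated after time t is 1 - 0.5^(t / half-life); the abstract does not specify the kinetic model, and the time points are chosen only for illustration.

```python
def fraction_deamidated(days: float, half_life_days: float) -> float:
    """Fraction of a site deamidated after `days`, assuming first-order decay."""
    if days < 0 or half_life_days <= 0:
        raise ValueError("days must be >= 0 and half_life_days must be > 0")
    return 1.0 - 0.5 ** (days / half_life_days)


# Half-lives at 37 °C as quoted in the abstract.
half_lives_days = {"Asn481": 16.5, "Asn501": 123.0}

# Illustrative time points in days; not taken from the study.
for days in (1, 7, 30):
    for residue, half_life in half_lives_days.items():
        print(residue, days, round(fraction_deamidated(days, half_life), 3))
```

By definition, half of the Asn481 sites would be deamidated after 16.5 days under this model.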
Subject(s)
SARS-CoV-2/metabolism , Spike Glycoprotein, Coronavirus/metabolism , Amino Acid Motifs , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , Angiotensin-Converting Enzyme 2/metabolism , COVID-19/pathology , COVID-19/virology , Humans , Hydrogen-Ion Concentration , Interferometry , Kinetics , Protein Binding , Protein Domains , Recombinant Proteins/biosynthesis , Recombinant Proteins/chemistry , Recombinant Proteins/isolation & purification , SARS-CoV-2/isolation & purification , Sequence Alignment , Spike Glycoprotein, Coronavirus/chemistry
ABSTRACT
Linear motifs are short protein subsequences that mediate protein interactions. Hundreds of motif classes including thousands of motif instances are known. Our theory estimates how many motif classes remain undiscovered. As commonly done, we describe motif classes as regular expressions specifying motif length and the allowed amino acids at each motif position. We measure motif specificity for a pair of motif classes by quantifying how many motif-discriminating positions prevent a protein subsequence from matching the two classes at once. We derive theorems for the maximal number of motif classes that can simultaneously maintain a certain number of motif-discriminating positions between all pairs of classes in the motif universe, for a given amino acid alphabet. We also calculate the fraction of all protein subsequences that would belong to a motif class if all potential motif classes came into existence. Naturally occurring pairs of motif classes present most often a single motif-discriminating position. This mild specificity maximizes the potential number of coexisting motif classes, the expansion of the motif universe due to amino acid modifications and the fraction of amino acid sequences that code for a motif instance. As a result, thousands of linear motif classes may remain undiscovered.
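The notion of a motif-discriminating position can be sketched in Python. The example below assumes two motif classes of equal length, aligned position by position, each written as a list of allowed amino acids per position. A position is counted as discriminating when the two allowed sets share no amino acid, so no subsequence can match both classes there. This is a simplified reading of the abstract; the paper's own definition may also handle differing lengths and offsets, and both example motifs are hypothetical.

```python
ANY = "ACDEFGHIKLMNPQRSTVWY"  # wildcard position: any of the 20 amino acids


def discriminating_positions(motif_a, motif_b):
    """Count aligned positions whose allowed amino acid sets are disjoint."""
    if len(motif_a) != len(motif_b):
        raise ValueError("this sketch only compares motifs of equal length")
    return sum(1 for a, b in zip(motif_a, motif_b) if not set(a) & set(b))


def matches(motif, subsequence):
    """True if every residue of the subsequence is allowed at its position."""
    return len(motif) == len(subsequence) and all(
        residue in allowed for allowed, residue in zip(motif, subsequence)
    )


# Hypothetical motif classes, written like the regular expressions L.C.E and L.C.D.
motif_1 = ["L", ANY, "C", ANY, "E"]
motif_2 = ["L", ANY, "C", ANY, "D"]

print(discriminating_positions(motif_1, motif_2))
print(matches(motif_1, "LTCHE"), matches(motif_2, "LTCHE"))
```

With at least one discriminating position, no subsequence can satisfy both patterns in this alignment, which is the sense of specificity used in the sketch.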
Subject(s)
Amino Acid Motifs , Sequence Analysis, Protein/methods , Humans , Sensitivity and Specificity , Sequence Analysis, Protein/standards
ABSTRACT
Synthetic biology emerged in the USA and Europe twenty years ago and quickly developed innovative research and technology as a result of continued funding. Synthetic biology is also growing in many developing countries of Africa, Asia and Latin America, where it could have a large economic impact by helping these countries use their genetic biodiversity to boost existing industries. Starting in 2011, Argentine synthetic biology developed along an idiosyncratic path. In 2011-2012, the main focus was not exclusively on research but also on community building through teaching and participation in iGEM, following the template of the early "MIT school" of synthetic biology. In 2013-2015, activities diversified and included society-centered projects, social science studies on synthetic biology and bioart. Standard research outputs such as articles and industrial applications helped consolidate several academic working groups. Since 2016, the lack of a critical mass of researchers and a funding crisis have been partially compensated by establishing links with Latin American synthetic biologists and with other socially oriented open technology collectives. The TECNOx community is a central node in this growing research and technology network. The first four annual TECNOx meetings brought together synthetic biologists with other open science and engineering platforms and explored the relationship of Latin American technologies with entrepreneurship, open hardware, ethics and human rights. In sum, the socioeconomic context encouraged Latin American synthetic biology to develop in a meandering and diversifying manner. This revealed alternative ways for growth of the field that may be relevant to other developing countries.
Subject(s)
Synthetic Biology/education , Synthetic Biology/trends , Argentina , Developing Countries , Humans , Latin America , Residence Characteristics , Social Sciences , Synthetic Biology/methods
ABSTRACT
Redox regulation in biology is largely operated by cysteine chemistry in response to a variety of cell environmental and intracellular stimuli. The high chemical reactivity of cysteines determines their conservation in functional roles, but their presence can also result in harmful oxidation limiting their general use by proteins. Papillomaviruses constitute a unique system for studying protein sequence evolution since there are hundreds of anciently evolved stable genomes. E7, the viral transforming factor, is a dimeric, cysteine-rich oncoprotein that shows both conserved structural and variable regulatory cysteines constituting an excellent model for uncovering the mechanism that drives the acquisition of redox-sensitive groups. By analyzing over 300 E7 sequences, we found that although noncanonical cysteines show no obvious sequence conservation pattern, they are nonrandomly distributed based on topological constraints. Regulatory residues are strictly excluded from six positions stabilizing the hydrophobic core while they are enriched in key positions located at the dimerization interface or around the Zn2+ ion. Oxidation of regulatory cysteines is linked to dimer dissociation, acting as a reversible redox-sensing mechanism that triggers a conformational switch. Based on comparative sequence analysis, molecular dynamics simulations and biophysical analysis, we propose a model in which the occurrence of cysteine-rich positions is dictated by topological constraints, providing an explanation for why a degenerate pattern of cysteines can be achieved in a family of homologs. Thus, topological principles should make it possible to identify hidden regulatory cysteines that are not accurately detected using sequence-based methodology.
Subject(s)
Cysteine , Evolution, Molecular , Papillomavirus E7 Proteins/genetics , Amino Acid Sequence , Dimerization
ABSTRACT
E1A is the main transforming protein in mastadenoviruses. This work uses bioinformatics to extrapolate experimental knowledge from Human adenovirus serotype 5 and 12 E1A proteins to all known serotypes. A conserved domain architecture with a high degree of intrinsic disorder acts as a scaffold for multiple linear motifs with variable occurrence mediating the interaction with over fifty host proteins. While linear motifs contribute strongly to sequence conservation within intrinsically disordered E1A regions, motif repertoires can deviate significantly from those found in prototypical serotypes. Close to one hundred predicted residue-residue contacts suggest the presence of stable structure in the CR3 domain and of specific conformational ensembles involving both short- and long-range intramolecular interactions. Our computational results suggest that E1A sequence conservation and co-evolution reflect the evolutionary pressure to maintain a mainly disordered, yet non-random conformation harboring a high number of binding motifs that mediate viral hijacking of the cell machinery.
Subject(s)
Adenovirus E1A Proteins/metabolism , Adenoviruses, Human/metabolism , Adenovirus E1A Proteins/chemistry , Adenovirus E1A Proteins/genetics , Amino Acid Motifs , Amino Acid Sequence , Humans , Protein Conformation , Protein Domains , Protein Modification, Translational
ABSTRACT
Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence to structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed for the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, and uncovered the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could not have been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.
Subject(s)
Human papillomavirus 16/metabolism , Intrinsically Disordered Proteins/chemistry , Models, Molecular , Papillomavirus E7 Proteins/chemistry , Amino Acid Sequence , Amino Acid Substitution , Base Sequence , Conserved Sequence , DNA, Viral/chemistry , DNA, Viral/metabolism , Gene Deletion , Hydrogen-Ion Concentration , Intrinsically Disordered Proteins/genetics , Intrinsically Disordered Proteins/metabolism , Leucine/chemistry , Mutagenesis, Site-Directed , Papillomavirus E7 Proteins/genetics , Papillomavirus E7 Proteins/metabolism , Peptide Fragments/chemistry , Peptide Fragments/genetics , Peptide Fragments/metabolism , Point Mutation , Protein Conformation , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Folding , Protein Stability , Recombinant Proteins/chemistry , Recombinant Proteins/metabolism , Sequence Alignment
ABSTRACT
Infection with oncogenic human papillomavirus induces deregulation of cellular redox homeostasis. Virus replication and papillomavirus-induced cell transformation require persistent expression of viral oncoproteins E7 and E6 that must retain their functionality in a persistent oxidative environment. Here, we dissected the molecular mechanisms by which E7 oncoprotein can sense and manage the potentially harmful oxidative environment of the papillomavirus-infected cell. The carboxy terminal domain of E7 protein from most of the 79 papillomavirus viral types of alpha genus, which encloses all the tumorigenic viral types, is a cysteine rich domain that contains two classes of cysteines: strictly conserved low reactive Zn2+ binding and degenerate reactive cysteine residues that can sense reactive oxygen species (ROS). Based on experimental data obtained from E7 proteins from the prototypical viral types 16, 18 and 11, we identified a couple of low pKa nucleophilic cysteines that can form a disulfide bridge upon exposure to ROS and regulate the cytoplasm to nucleus transport. From sequence analysis and phylogenetic reconstruction of redox sensing states we propose that reactive cysteine acquisition through evolution leads to three separate E7 protein families that differ in the ROS sensing mechanism: non ROS-sensitive E7s; ROS-sensitive E7s using only a single or multiple reactive cysteine sensing mechanisms and ROS-sensitive E7s using a reactive-resolutive cysteine couple sensing mechanism.
Subject(s)
Cysteine/metabolism , Neoplasms/genetics , Oxidative Stress/genetics , Papillomavirus E7 Proteins/metabolism , Cell Nucleolus/metabolism , Cell Transformation, Neoplastic/genetics , Cell Transformation, Neoplastic/pathology , Cysteine/genetics , Cytoplasm/metabolism , Disulfides/metabolism , Neoplasms/metabolism , Neoplasms/pathology , Oxidation-Reduction , Papillomavirus E7 Proteins/genetics , Protein Transport/genetics , Virus Replication/genetics
ABSTRACT
Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".
Subject(s)
Amides/chemistry , Intrinsically Disordered Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Amino Acid Sequence , Animals , Asparagine/chemistry , Humans , Interferon-beta/chemistry , Interferon-beta/metabolism , Molecular Sequence Data , Protein Processing, Post-Translational , Protein Structure, Secondary , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Superoxide Dismutase/chemistry , Superoxide Dismutase/metabolism , bcl-X Protein/chemistry , bcl-X Protein/metabolism
ABSTRACT
The nonstructural NS1 protein is an essential virulence factor of the human respiratory syncytial virus, with a predominant role in the inhibition of the host antiviral innate immune response. This inhibition is mediated by multiple protein-protein interactions and involves the formation of large oligomeric complexes. There is neither a structure nor sequence or functional homologues of this protein, which points to a distinctive mechanism for blocking the interferon response among viruses. The NS1 native monomer follows a simple unfolding kinetics via a nativelike transition state ensemble, with a half-life of 45 min, in agreement with a highly stable core structure at equilibrium. Refolding is a complex process that involves several slowly interconverting species compatible with proline isomerization. However, an ultrafast folding event with a half-life of 0.2 ms is indicative of a highly folding compatible species within the unfolded state ensemble. On the other hand, the oligomeric assembly route from the native monomer, which does not involve unfolding, shows a monodisperse and irreversible end-point species triggered by a mild temperature change, with half-lives of 160 and 26 min at 37 and 47 °C, respectively, and at a low protein concentration (10 µM). A large secondary structure change into β-sheet structure and the formation of a dimeric nucleus precede polymerization by the sequential addition of monomers at the surprisingly low rate of one monomer every 34 s. The polymerization phase is followed by the binding to thioflavin-T indicative of amyloid-like, albeit soluble, repetitive β-sheet quaternary structure. The overall process is reversible only up until ~8 min, a time window in which most of the secondary structure change takes place. NS1's multiple binding activities must be accommodated in a few binding interfaces at most, something to be considered remarkable given its small size (15 kDa).
Thus, conformational heterogeneity, and in particular oligomer formation, may provide a means of expanding its binding repertoire. These equilibria will be determined by variables such as macromolecular crowding, protein-protein interactions, expression levels, turnover, or specific subcellular localization. The irreversible and quasi-spontaneous nature of the oligomer assembly, together with the fact that NS1 is the most abundant viral protein in infected cells, makes its accumulation highly conceivable under conditions compatible with the cellular milieu. The implications of NS1 oligomers in the viral life cycle and the inhibition of host innate immune response remain to be determined.
Subject(s)
Interferons/metabolism , Protein Folding , Protein Multimerization , Respiratory Syncytial Virus, Human/metabolism , Viral Nonstructural Proteins/chemistry , Viral Nonstructural Proteins/pharmacology , Humans , Kinetics , Protein Binding , Protein Refolding , Protein Structure, Quaternary , Protein Unfolding , Respiratory Syncytial Virus, Human/physiology , Solubility , Species Specificity , Substrate Specificity , Temperature , Viral Nonstructural Proteins/metabolism
ABSTRACT
In this work, the unfolding mechanism of oxidized Escherichia coli thioredoxin (EcTRX) was investigated experimentally and computationally. We characterized seven point mutants distributed along the C-terminal α-helix (CTH) and the preceding loop. The mutations destabilized the protein against global unfolding while leaving the native structure unchanged. Global analysis of the unfolding kinetics of all variants revealed a linear unfolding route with a high-energy on-pathway intermediate state flanked by two transition state ensembles TSE1 and TSE2. The experiments show that CTH is mainly unfolded in TSE1 and the intermediate and becomes structured in TSE2. Structure-based molecular dynamics are in agreement with these experiments and provide protein-wide structural information on transient states. In our model, EcTRX folding starts with structure formation in the β-sheet, while the protein helices coalesce later. As a whole, our results indicate that the CTH is a critical module in the folding process, restraining a heterogeneous intermediate ensemble into a biologically active native state and providing the native protein with thermodynamic and kinetic stability.
Subject(s)
Protein Conformation , Protein Folding , Protein Structure, Secondary , Thioredoxins/chemistry , Escherichia coli , Kinetics , Molecular Dynamics Simulation , Point Mutation , Protein Unfolding , Thermodynamics , Thioredoxins/genetics
ABSTRACT
The 20 protein-coding amino acids are found in proteomes with different relative abundances. The most abundant amino acid, leucine, is nearly an order of magnitude more prevalent than the least abundant amino acid, cysteine. Amino acid metabolic costs differ similarly, constraining their incorporation into proteins. On the other hand, a diverse set of protein sequences is necessary to build functional proteomes. Here, we present a simple model for a cost-diversity trade-off postulating that natural proteomes minimize amino acid metabolic flux while maximizing sequence entropy. The model explains the relative abundances of amino acids across a diverse set of proteomes. We found that the data are remarkably well explained when the cost function accounts for amino acid chemical decay. More than 100 organisms reach comparable solutions to the trade-off by different combinations of proteome cost and sequence diversity. Quantifying the interplay between proteome size and entropy shows that proteomes can get optimally large and diverse.
Subject(s)
Amino Acids/metabolism , Genome , Models, Biological , Protein Biosynthesis/genetics , Proteome/metabolism , Adenosine Triphosphate/metabolism , Amino Acid Sequence , Amino Acids/chemistry , Amino Acids/genetics , Entropy , Genomic Structural Variation , Least-Squares Analysis , Molecular Sequence Data , Proteome/chemistry , Proteome/genetics
ABSTRACT
The E7 protein from high-risk human papillomavirus is essential for cell transformation in cervical, oropharyngeal, and other HPV-related cancers, mainly through the inactivation of the retinoblastoma (Rb) tumor suppressor. Its high cysteine content (~7%) and the observation that HPV-transformed cells are under oxidative stress prompted us to investigate the redox properties of the HPV16 E7 protein under biologically compatible oxidative conditions. The seven cysteines in HPV16 E7 remain reduced in conditions resembling the basal reduced state of a cell. However, under oxidative stress, a stable disulfide bridge forms between cysteines 59 and 68. Residue 59 has a protective effect on the other cysteines, and its mutation leads to an overall increase in the oxidation propensity of E7, including cysteine 24 central to the Rb binding motif. Glutathionylation of Cys 24 abolishes Rb binding, which is reversibly recovered upon reduction. Cysteines 59 and 68 are located 18.6 Å apart, and the formation of the disulfide bridge leads to a large structural rearrangement while retaining strong Zn association. These conformational and covalent changes are fully reversible upon restoration of the reductive environment. In addition, this is the first evidence of an interaction between the N-terminal intrinsically disordered and the C-terminal globular domains, known to be highly and separately conserved among human papillomaviruses. The significant conservation of such noncanonical cysteines in HPV E7 proteins leads us to propose a functional redox activity. Such an activity adds to the previously discovered chaperone activity of E7 and supports the picture of a moonlighting pathological role of this paradigmatic viral oncoprotein.