ABSTRACT
MOTIVATION: Characterizing the structure of flexible proteins, particularly within the realm of intrinsic disorder, presents a formidable challenge due to their high conformational variability. Currently, their structural representation relies on (possibly large) conformational ensembles derived from a combination of experimental and computational methods. The detailed structural analysis of these ensembles is a difficult task, for which existing tools have limited effectiveness. RESULTS: This study proposes an innovative extension of the concept of contact maps to the ensemble framework, incorporating the intrinsic probabilistic nature of disordered proteins. Within this framework, a conformational ensemble is characterized through a weighted family of contact maps. To achieve this, conformations are first described using a refined definition of contact that appropriately accounts for the geometry of the inter-residue interactions and the sequence context. Representative structural features of the ensemble naturally emerge from the subsequent clustering of the resulting contact-based descriptors. Importantly, transiently-populated structural features are readily identified within large ensembles. The performance of the method is illustrated by several use cases and compared with other existing approaches, highlighting its superiority in capturing relevant structural features of highly flexible proteins. AVAILABILITY AND IMPLEMENTATION: An open-source implementation of the method is provided together with an easy-to-use Jupyter notebook, available at https://gitlab.laas.fr/moma/WARIO. SUPPLEMENTARY INFORMATION: Implementation details and additional results are provided in (ADD LINK TO SUPP. INFO. FILE).
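As a rough illustration of describing an ensemble by a weighted family of contact maps, the sketch below builds one binary contact map per conformation and weights each distinct map by the fraction of conformations that share it. This is not the WARIO implementation. The plain distance cutoff (`cutoff`), the sequence-separation filter (`min_separation`), the function names, and the exact-match grouping used in place of clustering are all assumptions made for this example. The method described above instead relies on a refined, geometry-aware contact definition and on clustering of the contact-based descriptors.

```python
import math
from collections import Counter


def contact_map(coords, cutoff=8.0, min_separation=2):
    """Return the contacts of one conformation as a frozenset of (i, j) pairs.

    coords: one (x, y, z) tuple per residue (for example, C-alpha positions).
    cutoff and min_separation are hypothetical values chosen for illustration.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return frozenset(contacts)


def weighted_contact_maps(ensemble, **kwargs):
    """Map each distinct contact map to the fraction of conformations showing it.

    Grouping identical maps is a stand-in for a real clustering step.
    """
    counts = Counter(contact_map(coords, **kwargs) for coords in ensemble)
    total = len(ensemble)
    return {cmap: count / total for cmap, count in counts.items()}
```

Calling `weighted_contact_maps(ensemble)` on a list of conformations returns a dictionary whose values sum to one, so rarely populated contact patterns remain visible as low-weight entries instead of being averaged away.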
ABSTRACT
The adenosine di-phosphate (ADP) ribosylation factor (Arf) small guanosine tri-phosphate (GTP)ases function as molecular switches to activate signaling cascades that control membrane organization in eukaryotic cells. In Arf1, the GDP/GTP switch does not occur spontaneously but requires guanine nucleotide exchange factors (GEFs) and membranes. Exchange involves massive conformational changes, including disruption of the core β-sheet. The mechanisms by which this energetically costly switch occurs remain to be elucidated. To probe the switch mechanism, we coupled pressure perturbation with nuclear magnetic resonance (NMR), Fourier transform infrared spectroscopy (FTIR), small-angle X-ray scattering (SAXS), fluorescence, and computation. Pressure induced the formation of a classical molten globule (MG) ensemble. Pressure also favored the GDP to GTP transition, providing strong support for the notion that the MG ensemble plays a functional role in the nucleotide switch. We propose that the MG ensemble allows for switching without the requirement for complete unfolding and may be recognized by GEFs. An MG-based switching mechanism could constitute a pervasive feature in Arfs and Arf-like GTPases, and more generally, the evolutionarily related (Ras-like small GTPases) Rags and Gα GTPases.
Subjects
ADP-Ribosylation Factor 1 , Guanosine Diphosphate , Guanosine Triphosphate , Guanosine Diphosphate/metabolism , ADP-Ribosylation Factor 1/metabolism , ADP-Ribosylation Factor 1/chemistry , ADP-Ribosylation Factor 1/genetics , Guanosine Triphosphate/metabolism , Humans , Small-Angle Scattering , X-Ray Diffraction , Guanine Nucleotide Exchange Factors/metabolism , Guanine Nucleotide Exchange Factors/chemistry , Protein Conformation , Fourier Transform Infrared Spectroscopy , Molecular Models
ABSTRACT
Homorepeats (or polyX), protein segments containing repetitions of the same amino acid, are abundant in proteomes from all kingdoms of life and are involved in crucial biological functions as well as several neurodegenerative and developmental diseases. Mainly inserted in disordered segments of proteins, the structure/function relationships of homorepeats remain largely unexplored. In this review, we summarize present knowledge for the most abundant homorepeats, highlighting the role of the inherent structure and the conformational influence exerted by their flanking regions. Recent experimental and computational methods enable residue-specific investigations of these regions and promise novel structural and dynamic information for this elusive group of proteins. This information should increase our knowledge about the structural bases of phenomena such as liquid-liquid phase separation and trinucleotide repeat disorders.
Subjects
Intrinsically Disordered Proteins , Proteome , Proteome/chemistry , Protein Conformation , Amino Acid Repetitive Sequences , Amino Acids , Structure-Activity Relationship , Intrinsically Disordered Proteins/chemistry
ABSTRACT
Cell cycle transitions result from global changes in protein phosphorylation states triggered by cyclin-dependent kinases (CDKs). To understand how this complexity produces an ordered and rapid cellular reorganisation, we generated a high-resolution map of changing phosphosites throughout unperturbed early cell cycles in single Xenopus embryos, derived the emergent principles through systems biology analysis, and tested them by biophysical modelling and biochemical experiments. We found that most dynamic phosphosites share two key characteristics: they occur on highly disordered proteins that localise to membraneless organelles, and are CDK targets. Furthermore, CDK-mediated multisite phosphorylation can switch homotypic interactions of such proteins between favourable and inhibitory modes for biomolecular condensate formation. These results provide insight into the molecular mechanisms and kinetics of mitotic cellular reorganisation.
Subjects
Cell Cycle Proteins , Cyclin-Dependent Kinases , Cyclin-Dependent Kinases/metabolism , Phosphorylation , Cell Cycle Proteins/metabolism , Cell Cycle , Cyclin-Dependent Kinase 2/metabolism
ABSTRACT
The arrestin-dependent G protein-coupled receptor (GPCR) signaling pathway is regulated by the phosphorylation state of the GPCR C-terminal domain, but the molecular bases of the arrestin:receptor interaction remain to be fully elucidated. Here we investigated the impact of phosphorylation on the conformational features of the C-terminal region from three rhodopsin-like GPCRs, the vasopressin V2 receptor (V2R), the growth hormone secretagogue or ghrelin receptor type 1a (GHSR), and the β2-adrenergic receptor (β2AR). Using phosphomimetic variants, we identified pre-formed secondary structure elements, or short linear motifs (SLiMs), that undergo specific conformational transitions upon phosphorylation. Importantly, such conformational transitions appear to favor arrestin-2 binding. Hence, our results suggest a model in which the phosphorylation-dependent structuration of the GPCR C-terminal regions would modulate arrestin binding and therefore signaling outcomes in arrestin-dependent pathways.
Subjects
Arrestin , G-Protein-Coupled Receptors , Arrestin/chemistry , Phosphorylation , G-Protein-Coupled Receptors/genetics , G-Protein-Coupled Receptors/metabolism , Signal Transduction , Rhodopsin/chemistry
ABSTRACT
Nuclear magnetic resonance (NMR) studies of large biomolecular machines and highly repetitive proteins remain challenging due to the difficulty of assigning frequencies to individual nuclei. Here, we present an efficient strategy to address this challenge by engineering a Pyrococcus horikoshii tRNA/alanyl-tRNA synthetase pair that enables the incorporation of up to three isotopically labeled alanine residues in a site-specific manner using in vitro protein expression. The general applicability of this approach for NMR assignment has been demonstrated by introducing isotopically labeled alanines into four distinct proteins: huntingtin exon-1, HMA8 ATPase, the 300 kDa molecular chaperone ClpP, and the alanine-rich Phox2B transcription factor. For large protein assemblies, our labeling approach enabled unambiguous assignments while avoiding potential artifacts induced by site-specific mutations. When applied to Phox2B, which contains two poly-alanine tracts of nine and twenty alanines, we observed that the helical stability is strongly dependent on the homorepeat length. The capacity to selectively introduce alanines with distinct labeling patterns is a powerful tool to probe structure and dynamics of challenging biomolecular systems.
Subjects
Alanine , Proteins , Alanine/chemistry , Biomolecular Nuclear Magnetic Resonance , Proteins/metabolism
ABSTRACT
The compaction of mitochondrial DNA (mtDNA) is regulated by architectural HMG-box proteins whose limited cross-species similarity suggests diverse underlying mechanisms. Viability of Candida albicans, a human antibiotic-resistant mucosal pathogen, is compromised by altering mtDNA regulators. Among them, there is the mtDNA maintenance factor Gcf1p, which differs in sequence and structure from its human and Saccharomyces cerevisiae counterparts, TFAM and Abf2p. Our crystallographic, biophysical, biochemical and computational analysis showed that Gcf1p forms dynamic protein/DNA multimers by a combined action of an N-terminal unstructured tail and a long helix. Furthermore, an HMG-box domain canonically binds the minor groove and dramatically bends the DNA while, unprecedentedly, a second HMG-box binds the major groove without imposing distortions. This architectural protein thus uses its multiple domains to bridge co-aligned DNA segments without altering the DNA topology, revealing a new mechanism of mtDNA condensation.
Subjects
Candida albicans , Mitochondrial DNA , DNA-Binding Proteins , Fungal Proteins , Humans , Candida albicans/genetics , Candida albicans/metabolism , Mitochondrial DNA/metabolism , DNA-Binding Proteins/metabolism , Mitochondria/metabolism , Mitochondrial Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Transcription Factors/metabolism , Fungal Proteins/metabolism
ABSTRACT
Huntington's disease neurodegeneration occurs when the number of consecutive glutamines in the huntingtin exon-1 (HTTExon1) exceeds a pathological threshold of 35. The sequence homogeneity of HTTExon1 reduces the signal dispersion in NMR spectra, hampering its structural characterization. By simultaneously introducing three isotopically labeled glutamines in a site-specific manner in multiple concatenated samples, 18 glutamines of a pathogenic HTTExon1 with 36 glutamines were unambiguously assigned. Chemical shift analyses indicate the α-helical persistence in the homorepeat and the absence of an emerging toxic conformation around the pathological threshold. Using the same type of samples, the recognition mechanism of Hsc70 molecular chaperone has been investigated, indicating that it binds to the N17 region of HTTExon1, inducing the partial unfolding of the poly-Q. The proposed strategy facilitates high-resolution structural and functional studies in low-complexity regions.
Subjects
Peptides , Peptides/chemistry , Exons , Alpha-Helical Protein Conformation , Magnetic Resonance Spectroscopy , Huntingtin Protein/chemistry
ABSTRACT
Huntington's disease is a neurodegenerative disorder caused by a CAG expansion in the first exon of the HTT gene, resulting in an extended polyglutamine (poly-Q) tract in huntingtin (httex1). The structural changes occurring to the poly-Q when increasing its length remain poorly understood due to its intrinsic flexibility and the strong compositional bias. The systematic application of site-specific isotopic labeling has enabled residue-specific NMR investigations of the poly-Q tract of pathogenic httex1 variants with 46 and 66 consecutive glutamines. Integrative data analysis reveals that the poly-Q tract adopts long α-helical conformations propagated and stabilized by glutamine side chain to backbone hydrogen bonds. We show that α-helical stability is a stronger signature in defining aggregation kinetics and the structure of the resulting fibrils than the number of glutamines. Our observations provide a structural perspective of the pathogenicity of expanded httex1 and pave the way to a deeper understanding of poly-Q-related diseases.
Subjects
Exons , Huntingtin Protein/genetics , Huntingtin Protein/chemistry , Magnetic Resonance Spectroscopy , Alpha-Helical Protein Conformation
ABSTRACT
The structural investigation of intrinsically disordered proteins (IDPs) requires ensemble models describing the diversity of the conformational states of the molecule. Due to their probabilistic nature, there is a need for new paradigms that understand and treat IDPs from a purely statistical point of view, considering their conformational ensembles as well-defined probability distributions. In this work, we define a conformational ensemble as an ordered set of probability distributions and provide a suitable metric to detect differences between two given ensembles at the residue level, both locally and globally. The underlying geometry of the conformational space is properly integrated, one ensemble being characterized by a set of probability distributions supported on the three-dimensional Euclidean space (for global-scale comparisons) and on the two-dimensional flat torus (for local-scale comparisons). The inherent uncertainty of the data is also taken into account to provide finer estimations of the differences between ensembles. Additionally, an overall distance between ensembles is defined from the differences at the residue level. We illustrate the potential of the approach with several examples of applications for the comparison of conformational ensembles: (i) produced from molecular dynamics (MD) simulations using different force fields, and (ii) before and after refinement with experimental data. We also show the usefulness of the method to assess the convergence of MD simulations, and discuss other potential applications such as in machine-learning-based approaches. The numerical tool has been implemented in Python through easy-to-use Jupyter Notebooks available at https://gitlab.laas.fr/moma/WASCO.
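To make the residue-level comparison more concrete, the sketch below computes a Wasserstein distance between two ensembles for each residue, using one scalar descriptor per residue and conformation. It is a deliberately simplified stand-in, not the WASCO code. The tool described above works with distributions on three-dimensional Euclidean space and on the two-dimensional flat torus and accounts for data uncertainty. This example assumes a one-dimensional descriptor, equal sample sizes, and hypothetical function names.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical samples.

    In one dimension with equal sample sizes, the optimal transport plan
    pairs the sorted values, so the distance is the mean absolute
    difference between order statistics.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in paired) / len(xs)


def per_residue_distances(ensemble_a, ensemble_b):
    """Compare two ensembles residue by residue.

    ensemble_a[r] and ensemble_b[r] hold the sampled descriptor values
    of residue r across all conformations (an assumed data layout).
    """
    if len(ensemble_a) != len(ensemble_b):
        raise ValueError("ensembles must describe the same number of residues")
    return [wasserstein_1d(a, b) for a, b in zip(ensemble_a, ensemble_b)]
```

The resulting list gives one distance per residue, and these values could be aggregated (for instance, averaged) into a single overall score. How the published method combines residue-level differences is not specified here, so any such aggregation would be an additional assumption.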
Subjects
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Molecular Dynamics Simulation , Probability , Machine Learning
ABSTRACT
The structural characterization of polydisperse systems consisting of multiple coexisting species or conformations is very challenging or impossible with classical approaches. As a consequence, the structural bases of relevant questions related to protein folding, transient partner recognition, conformational transitions or fibrillation remain poorly understood. Small-Angle Scattering (SAS) techniques structurally probe species present in solution in a population-weighted manner, enabling the inspection of polydisperse systems. However, decomposition of these data to derive the contribution of individual components is not straightforward and requires the acquisition of large SAS datasets and adapted mathematical tools. Here, we present a detailed procedure for the usage of the program COSMiCS for the decomposition of SAS datasets. COSMiCS adapts the popular MCR-ALS chemometrics routine to the specificities of scattering data. Through the use of multiple SAS representations, the appropriate scaling of the data and the possibility to simultaneously decompose multiple orthogonal datasets, COSMiCS efficiently disentangles mixtures and provides species-specific structural and thermodynamic/kinetic information of the process under investigation. Although exemplified for a transient biomolecular interaction, our chemometrics strategy can be applied to many other biological processes that can be straightforwardly probed in latest-generation SAS beamlines. Indeed, recent experimental setups, including microfluidics and stopped-flow devices, coupled to fast-reading detectors can yield large concentration- or time-dependent datasets that can be decomposed with COSMiCS. Importantly, because the code is open source, previously known features of the system of interest can be introduced as constraints in the optimization, producing robust solutions for biological systems of increasing complexity.
Subjects
Chemometrics , Microfluidics , Kinetics , Protein Folding , Small-Angle Scattering
ABSTRACT
Yeast eIF4G1 interacts with RNA binding proteins (RBPs) like Pab1 and Pub1 affecting its function in translation initiation and stress granule formation. We present an NMR and SAXS study of the N-terminal intrinsically disordered region of eIF4G1 (residues 1-249) and its interactions with Pub1, Pab1 and RNA. The conformational ensemble of eIF4G1(1-249) shows an α-helix within the BOX3 conserved element and a dynamic network of fuzzy π-π and π-cation interactions involving arginine and aromatic residues. The Pab1 RRM2 domain interacts with eIF4G1 BOX3, the canonical interaction site, but also with BOX2, a conserved element of unknown function to date. The RNA1 region interacts with RNA through a new RNA interaction motif and with the Pub1 RRM3 domain. The latter also interacts with eIF4G1 BOX1, modulating its intrinsic self-assembly properties. The description of the biomolecular interactions involving eIF4G1 at residue-level detail increases our knowledge about biological processes involving this key translation initiation factor.
ABSTRACT
Backbone dihedral angles φ and ψ are the main structural descriptors of proteins and peptides. The distribution of these angles has been investigated over decades as they are essential for the validation and refinement of experimental measurements, as well as for structure prediction and design methods. The dependence of these distributions, not only on the nature of each amino acid but also on that of the closest neighbors, has been the subject of numerous studies. Although neighbor-dependent distributions are nowadays generally accepted as a good model, there is still some controversy about the combined effects of left and right neighbors. We have investigated this question using rigorous methods based on recently-developed statistical techniques. Our results unambiguously demonstrate that the influence of left and right neighbors cannot be considered independently. Consequently, three-residue fragments should be considered as the minimal building blocks to investigate polypeptide sequence-structure relationships.
Subjects
Peptides
ABSTRACT
MOTIVATION: Poly-alanine (polyA) regions are protein stretches mostly composed of alanines. Despite their abundance in eukaryotic proteomes and their association with nine inherited human diseases, the structural and functional roles exerted by polyA stretches remain poorly understood. In this work we study how the amino acid context in which polyA regions are settled in proteins influences their structure and function. RESULTS: We identified glycine and proline as the most abundant amino acids within polyA and in the flanking regions of polyA tracts, in human proteins as well as in 17 additional eukaryotic species. Our analyses indicate that the non-structuring nature of these two amino acids influences the α-helical conformations predicted for polyA, suggesting a relevant role in reducing the inherent aggregation propensity of long polyA. Then, we show how polyA position in protein N-termini relates to their function as transit peptides. PolyA placed just after the initial methionine is often predicted as part of mitochondrial transit peptides, whereas when placed in downstream positions, polyA are part of signal peptides. A few examples from known structures suggest that short polyA can emerge by alanine substitutions in α-helices; but evolution by insertion is observed for longer polyA. Our results showcase the importance of studying the sequence context of homorepeats as a mechanism to shape their structure-function relationships. AVAILABILITY AND IMPLEMENTATION: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Alanine , Poly A , Humans , Amino Acid Sequence , Proteome , Peptides/chemistry
ABSTRACT
There is increasing evidence that many intrinsically disordered regions (IDRs) in proteins play key functional roles through interactions with other proteins or nucleic acids. These interactions often exhibit a context-dependent structural behavior. We hypothesize that low complexity regions (LCRs), often found within IDRs, could have a role in inducing local structure in IDRs. To test this, we predicted IDRs in the human proteome and analyzed their structures or those of homologous sequences in the Protein Data Bank (PDB). We then identified two types of simple LCRs within IDRs: regions with only one (polyX or homorepeats) or with only two types of amino acids (polyXY). We were able to assign structural information from the PDB more often to these LCRs than to the surrounding IDRs (polyX 61.8% > polyXY 50.5% > IDRs 39.7%). The most frequently observed polyX and polyXY within IDRs contained E (Glu) or G (Gly). Structural analyses of these sequences and of homologs indicate that polyEK regions induce helical conformations, while the other most frequent LCRs induce coil structures. Our work proposes bioinformatics methods to help in the study of the structural behavior of IDRs and provides a solid basis suggesting a structuring role of LCRs within them.
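The two kinds of simple LCRs described above, regions made of a single residue type (polyX) and regions made of exactly two residue types (polyXY), can be located in a sequence with a short scan. The sketch below is only an illustration under stated assumptions: the function names and the minimum-length thresholds (`min_len`) are hypothetical, and the study's actual detection criteria are not given in this abstract.

```python
import re


def find_polyx(seq, min_len=4):
    """Return (start, end, residue) for each run of one residue type.

    min_len is an illustrative threshold, not a value from the study.
    """
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]


def find_polyxy(seq, min_len=6):
    """Return (start, end, residue_pair) for maximal two-residue-type runs."""
    runs = []
    n = len(seq)
    start = 0
    while start < n:
        types = set()
        end = start
        # Extend the run while it still contains at most two residue types.
        while end < n and (seq[end] in types or len(types) < 2):
            types.add(seq[end])
            end += 1
        if len(types) == 2 and end - start >= min_len:
            runs.append((start, end, "".join(sorted(types))))
        if end == n:
            break
        # Restart at the beginning of the trailing single-residue block,
        # because the next maximal run may overlap it.
        start = end - 1
        while seq[start - 1] == seq[end - 1]:
            start -= 1
    return runs
```

Coordinates are zero-based and end-exclusive, so `seq[start:end]` recovers each region. For example, on the made-up sequence `MAAAAAGEKEKEKEKQQQQ`, `find_polyx` with `min_len=4` reports the alanine and glutamine tracts, while `find_polyxy` with `min_len=8` reports the alternating EK stretch.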
Subjects
Intrinsically Disordered Proteins , Proteins , Amino Acids , Computational Biology , Protein Databases , Humans , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Protein Domains , Proteins/chemistry
ABSTRACT
Many disordered proteins conserve essential functions in the face of extensive sequence variation, making it challenging to identify the mechanisms responsible for functional selection. Here we identify the molecular mechanism of functional selection for the disordered adenovirus early gene 1A (E1A) protein. E1A competes with host factors to bind the retinoblastoma (Rb) protein, subverting cell cycle regulation. We show that two binding motifs tethered by a hypervariable disordered linker drive picomolar affinity Rb binding and host factor displacement. Compensatory changes in amino acid sequence composition and sequence length lead to conservation of optimal tethering across a large family of E1A linkers. We refer to this compensatory mechanism as conformational buffering. We also detect coevolution of the motifs and linker, which can preserve or eliminate the tethering mechanism. Conformational buffering and motif-linker coevolution explain robust functional encoding within hypervariable disordered linkers and could underlie functional selection of many disordered protein regions.
Subjects
Intrinsically Disordered Proteins , Adenovirus E1A Proteins/chemistry , Adenovirus E1A Proteins/genetics , Adenovirus E1A Proteins/metabolism , Amino Acid Sequence , Intrinsically Disordered Proteins/chemistry , Protein Binding , Protein Domains , Retinoblastoma Protein/metabolism
ABSTRACT
Atomically precise gold nanoclusters are a fascinating class of nanomaterials that exhibit molecule-like properties and have outstanding photoluminescence (PL). Their ultrasmall size, molecular chemistry, and biocompatibility make them extremely appealing for selective biomolecule labeling in investigations of biological mechanisms at the cellular and anatomical levels. In this work, we report a simple route to incorporate a preformed Au25 nanocluster into a model bovine serum albumin (BSA) protein. A new approach combining small-angle X-ray scattering and molecular modeling provides a clear localization of a single Au25 within the protein to a cysteine residue on the gold nanocluster surface. Attaching Au25 to BSA strikingly modifies the PL properties with enhancement and a redshift in the second near-infrared (NIR-II) window. This study paves the way to control the design of selective sensitive probes in biomolecules through a ligand-based strategy to enable the optical detection of biomolecules in a cellular environment by live imaging.
Subjects
Metal Nanoparticles , Nanostructures , Gold/chemistry , Ligands , Metal Nanoparticles/chemistry , Bovine Serum Albumin/chemistry
ABSTRACT
Arrestin-dependent pathways are a central component of G protein-coupled receptor (GPCR) signaling. However, the molecular processes regulating arrestin binding remain to be further elucidated, in particular with regard to the structural impact of GPCR C-terminal disordered regions. Here, we used an integrated biophysical strategy to describe the basal conformations of the C-terminal domains of three class A GPCRs, the vasopressin V2 receptor (V2R), the growth hormone secretagogue or ghrelin receptor type 1a (GHSR) and the β2-adrenergic receptor (β2AR). By doing so, we revealed the presence of transient secondary structures in these regions that are potentially involved in the interaction with arrestin. These secondary structure elements differ from those described in the literature in interaction with arrestin. This suggests a mechanism where the secondary structure conformational preferences in the C-terminal regions of GPCRs could be a central feature for optimizing arrestin recognition.
Subjects
Arrestin , Arrestins , Arrestin/metabolism , Arrestins/metabolism , Protein Secondary Structure , G-Protein-Coupled Receptors/metabolism
ABSTRACT
In signaling proteins, intrinsically disordered regions often represent regulatory elements, which are sensitive to environmental effects, ligand binding, and post-translational modifications. The conformational space sampled by disordered regions can be affected by environmental stimuli and these changes trigger, vis-à-vis the effector domain, downstream processes. The disordered nature of these regulatory elements enables signal integration and graded responses but prevents the application of classical approaches for drug screening based on the existence of a fixed three-dimensional structure. We have designed a genetically encodable biosensor for the N-terminal regulatory element of the c-Src kinase, the first discovered proto-oncogene and lead representative of the Src family of kinases. The biosensor is formed by two fluorescent proteins forming a FRET pair fused at the two extremes of a construct including the SH4, unique and SH3 domains of Src. An internal control is provided by an engineered proteolytic site allowing the generation of an identical mixture of the disconnected fluorophores. We show FRET variations induced by ligand binding. The biosensor has been used for a high-throughput screening of a library of 1669 compounds with seven hits confirmed by NMR.
Subjects
Biosensing Techniques , src-Family Kinases , Amino Acid Sequence , Fluorescence Resonance Energy Transfer , Protein Binding , src-Family Kinases/chemistry , src-Family Kinases/metabolism
ABSTRACT
Escherichia coli is a Gram-negative bacterium that colonises the human intestine and virulent strains can cause severe diarrhoeal and extraintestinal diseases. The protein SslE is secreted by a range of pathogenic and commensal E. coli strains. It can degrade mucins in the intestine, promotes biofilm maturation and it is a major determinant of infection in virulent strains, although how it carries out these functions is not well understood. Here, we examine SslE from the commensal E. coli Waksman and BL21 (DE3) strains and the enterotoxigenic H10407 and enteropathogenic E2348/69 strains. We reveal that SslE has a unique and dynamic structure in solution and in response to acidification within mature biofilms it can form a unique aggregate with amyloid-like properties. Furthermore, we show that both SslE monomers and aggregates bind DNA in vitro and co-localise with extracellular DNA (eDNA) in mature biofilms, and SslE aggregates may also associate with cellulose under certain conditions. Our results suggest that interactions between SslE and eDNA are important for biofilm maturation in many E. coli strains and SslE may also be a factor that drives biofilm formation in other SslE-secreting bacteria.