A sequence-based foldability score combined with AlphaFold2 predictions to disentangle the protein order/disorder continuum.
Bruley, Apolline; Bitard-Feildel, Tristan; Callebaut, Isabelle; Duprat, Elodie.
Affiliation
  • Bruley A; Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France.
  • Bitard-Feildel T; Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France.
  • Callebaut I; Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France.
  • Duprat E; Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France.
Proteins; 91(4): 466-484, April 2023.
Article in English | MEDLINE | ID: mdl-36306150
ABSTRACT
Order and disorder govern protein functions, but disorder is highly diverse, ranging from regions that are, and remain, fully disordered to regions of conditional order. This diversity is still difficult to decipher, even though it is encoded in the amino acid sequence. Here, we developed an analytic Python package, named pyHCA, that estimates the foldability of a protein segment from its amino acid sequence alone, based on a measure of its density in regular secondary structures associated with hydrophobic clusters, as defined by the hydrophobic cluster analysis (HCA) approach. The tool was designed by optimizing the separation between foldable segments drawn from databases of disorder (DisProt) and of order (SCOPe [soluble domains] and OPM [transmembrane domains]). It makes it possible to specify the ratio between disorder and order, the latter embodied by regular secondary structures (either participating in the hydrophobic core of well-folded 3D structures or conditionally formed in intrinsically disordered regions). We illustrated the relevance of pyHCA with several examples and applied it to the proteome sequences of 21 species, ranging from prokaryotes and archaea to unicellular and multicellular eukaryotes, for which structure models are available in the AlphaFold Protein Structure Database. Cases of low AlphaFold2 confidence scores related to disorder were distinguished from those of sequences that we identified as foldable but that are nevertheless excluded from accurate modeling by AlphaFold2, owing to a lack of sequence homologs or to compositional biases. Overall, our approach is complementary to AlphaFold2, providing guides for mapping structural innovations through evolutionary processes at the proteome and gene scales.
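To make the HCA-based idea more concrete, here is a minimal Python sketch. It is not pyHCA's implementation or API. It assumes a one-dimensional simplification of HCA (real HCA places residues on a two-dimensional helical net), a hydrophobic alphabet of V, I, L, F, M, Y and W, and a hypothetical rule that a cluster is broken by a proline or by more than three consecutive non-hydrophobic residues. `toy_density_score` (the fraction of residues covered by clusters) is only a crude stand-in for the foldability measure described above. `classify_segment` crosses that score with an AlphaFold2 mean pLDDT, using the usual cutoff of 70 for low confidence and a made-up foldability threshold of 0.3.

```python
# Hypothetical, simplified 1D illustration of HCA-style hydrophobic clusters.
# NOT the pyHCA implementation: real HCA uses a 2D helical net representation.
from typing import List, Tuple

# Assumed hydrophobic alphabet (strong hydrophobic residues in HCA).
HYDROPHOBIC = set("VILFMYW")


def hydrophobic_clusters(seq: str, max_gap: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of hydrophobic clusters.

    Assumed rule: a cluster is broken by a proline, or by more than
    `max_gap` consecutive non-hydrophobic residues.
    """
    clusters: List[Tuple[int, int]] = []
    start = end = None  # bounds of the cluster currently being built
    gap = 0  # non-hydrophobic residues seen since the last hydrophobic one
    for i, aa in enumerate(seq.upper()):
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
            end = i + 1
            gap = 0
        else:
            gap += 1
            # Close the open cluster on a proline or on a long enough gap.
            if start is not None and (aa == "P" or gap > max_gap):
                clusters.append((start, end))
                start = end = None
    if start is not None:
        clusters.append((start, end))
    return clusters


def toy_density_score(seq: str) -> float:
    """Fraction of residues inside clusters (a toy proxy, not pyHCA's score)."""
    if not seq:
        return 0.0
    covered = sum(e - s for s, e in hydrophobic_clusters(seq))
    return covered / len(seq)


def classify_segment(foldability: float, mean_plddt: float,
                     fold_threshold: float = 0.3,
                     plddt_threshold: float = 70.0) -> str:
    """Cross a foldability score with AlphaFold2 mean pLDDT.

    `fold_threshold` is a hypothetical value chosen for illustration.
    """
    if mean_plddt >= plddt_threshold:
        return "confident model"
    if foldability >= fold_threshold:
        # Low confidence despite apparent foldability: e.g. few homologs.
        return "low confidence, foldable"
    return "low confidence, likely disordered"
```

For example, `hydrophobic_clusters("LLKAPIV")` returns `[(0, 2), (5, 7)]`, because the proline closes the first cluster. All function names and thresholds here are illustrative only.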

Collection: 01-internacional | Database: MEDLINE | Main subject: Proteome | Study type: Prognostic_studies / Risk_factors_studies | Language: English | Journal: Proteins | Journal subject: Biochemistry | Year: 2023 | Document type: Article | Country of affiliation: France