ABSTRACT
The production of recombinant proteins in Escherichia coli frequently results in the formation of insoluble protein aggregates called inclusion bodies (IBs). The determinants of IB formation remain poorly understood and are of considerable interest for biotechnological and research applications; they may also offer insight into disease-related protein aggregation in vivo. Here we investigate a set of engineered target-binding proteins based on the fibronectin type III domain, and we find that sequence variation at just three positions in a solvent-exposed loop (the FG loop) greatly alters the extent of IB formation. This loop is analogous to the third complementarity-determining region of immunoglobulin variable domains and has been shown to be conformationally mobile. In contrast to findings for other proteins, the extent of IB formation is not explained by differences in thermal stability measured by differential scanning calorimetry. Instead, IB formation is correlated with the average local stability of the FG loop, as modeled by an ensemble of structures generated using Rosetta's kinematic closure loop reconstruction method. This correlation suggests that loop instability may promote local unfolding, exposing aggregation-prone surfaces. Consistent with this mechanism, sequence-based predictions of aggregation propensity produced by Zyggregator are also correlated with IB formation, though not with modeled loop stability. Combining average model energy scores with sequence-based aggregation predictions accounts for the variation in IB formation remarkably well (R² = 0.8). The agreement with experimental data validates the ensemble modeling approach, which may be applicable to dynamic protein loops involved in a wide range of phenomena.
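The abstract does not state how the two predictors were combined. One plausible reading is an ordinary least-squares fit with two predictors and an intercept, scored by the coefficient of determination R². The sketch below assumes that model; the helper name `fit_two_predictor_ols` and the mapping of inputs (mean loop energy score, Zyggregator score, measured IB fraction per variant) are hypothetical illustrations, not the authors' code.

```python
from statistics import fmean


def fit_two_predictor_ols(x1, x2, y):
    """Fit y ~ b0 + b1*x1 + b2*x2 by ordinary least squares (hypothetical helper).

    Assumed inputs, one value per protein variant:
      x1: average model energy score of the loop ensemble
      x2: sequence-based aggregation propensity score
      y:  measured extent of inclusion-body formation
    Returns (b0, b1, b2, r_squared).
    """
    n = len(y)
    if not (len(x1) == len(x2) == n) or n < 3:
        raise ValueError("need three equal-length sequences with n >= 3")

    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    # Center each variable so the intercept drops out of the slope equations.
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]

    # Centered sums of squares and cross-products.
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))

    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear; slopes are not identifiable")

    # Solve the 2x2 normal equations for the slopes with Cramer's rule.
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2

    # R^2 = 1 - SS_res / SS_tot.
    ss_tot = sum(c * c for c in dy)
    if ss_tot == 0:
        raise ValueError("response is constant; R^2 is undefined")
    ss_res = sum(
        (yi - (b0 + b1 * a + b2 * b)) ** 2 for yi, a, b in zip(y, x1, x2)
    )
    return b0, b1, b2, 1.0 - ss_res / ss_tot
```

Because the model includes an intercept, R² lies between 0 and 1. A value of 0.8 means the two predictors jointly account for 80% of the variance in IB formation across the variants, even though each predictor alone may explain much less when the two are only weakly correlated with each other.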