ABSTRACT
In addition to activity, successful biological drugs must exhibit a set of suitable developability properties, which depend on both protein sequence and buffer composition. For this high-dimensional optimization problem, machine learning algorithms can usefully complement analytical screening and rational design. Here, we propose a Bayesian optimization algorithm to accelerate the design of biopharmaceutical formulations. We demonstrate the power of this approach by identifying the formulation that optimizes the thermal stability of three tandem single-chain Fv variants within 25 experiments, less than one-third of the experiments required by a classical design of experiments (DoE) method and several orders of magnitude fewer than a detailed experimental analysis of the full combinatorial space would require. We further show the advantage of this method over conventional approaches in efficiently transferring historical information as prior knowledge for the development of new biologics or when new buffer agents become available. Moreover, we highlight the benefit of our technique in engineering multiple biophysical properties by simultaneously optimizing both thermal and interface stabilities. This optimization minimizes the amount of surfactant in the formulation, which is important for decreasing the risks associated with the corresponding degradation processes. Overall, the method offers fast convergence to optimal conditions, the ability to transfer prior knowledge, and the identification of new nonlinear combinations of excipients. We envision that these features can lead to a considerable acceleration in formulation design and to parallelization of operations during drug development.