ABSTRACT
The ability to predict cell-permeable candidate molecules has great potential to assist drug discovery projects. Large molecules that lie beyond the Rule of Five (bRo5) are increasingly important as drug candidates and tool molecules for chemical biology. However, such large molecules usually do not cross cell membranes and therefore cannot access intracellular targets or be developed as orally bioavailable drugs. Here, we describe a random forest (RF) machine learning model for predicting passive membrane permeation rates, developed using a set of over 1000 bRo5 macrocyclic compounds. The model uses easily calculated chemical descriptors as independent variables: (1) polar surface area in water, (2) the octanol-water partition coefficient, (3) the number of hydrogen-bond donors, (4) the sum of the topological distances between nitrogen atoms, (5) the sum of the topological distances between nitrogen and oxygen atoms, and (6) the multiple molecular path count of order 2. The last three features represent molecular flexibility, the ability of the molecule to adopt different conformations in the aqueous and membrane-interior phases, and molecular "chameleonicity." The RF model substantially outperforms a multiple linear regression model based on the same features and achieves better performance metrics than previously reported models using the same underlying data. Guided by the model, we propose design guidelines for membrane-permeating macrocycles. We anticipate that this model will be useful in guiding the design of large, bioactive molecules for medicinal chemistry and chemical biology applications.
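To make descriptors (4) and (5) concrete, the following minimal Python sketch computes a sum of topological distances on a heavy-atom molecular graph. It assumes that "topological distance" means the number of bonds on the shortest path between two atoms and that the sum runs over all qualifying atom pairs; the abstract does not state the exact definition or software used, so this is an illustration rather than the authors' implementation. The function names and the toy molecule (the heavy-atom skeleton of glycinamide, H2N-CH2-C(=O)-NH2) are hypothetical choices made for this example.

```python
from collections import deque
from itertools import combinations


def topological_distance(adjacency, start, goal):
    """Number of bonds on the shortest path between two atoms (breadth-first search)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for neighbor in adjacency[atom]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError("atoms are not connected")


def sum_topological_distances(elements, bonds, elem_a, elem_b):
    """Sum of shortest-path bond counts over all (elem_a, elem_b) atom pairs.

    Assumed definition: each unordered pair is counted once.
    """
    adjacency = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)

    atoms_a = [i for i, e in enumerate(elements) if e == elem_a]
    if elem_a == elem_b:
        pairs = combinations(atoms_a, 2)
    else:
        atoms_b = [i for i, e in enumerate(elements) if e == elem_b]
        pairs = ((i, j) for i in atoms_a for j in atoms_b)
    return sum(topological_distance(adjacency, i, j) for i, j in pairs)


# Hypothetical toy input: heavy atoms of glycinamide, N0-C1-C2(=O3)-N4.
ELEMENTS = ["N", "C", "C", "O", "N"]
BONDS = [(0, 1), (1, 2), (2, 3), (2, 4)]

nn_sum = sum_topological_distances(ELEMENTS, BONDS, "N", "N")  # descriptor (4)
no_sum = sum_topological_distances(ELEMENTS, BONDS, "N", "O")  # descriptor (5)
print(nn_sum, no_sum)
```

For this toy graph the single N...N pair is three bonds apart, and the two N...O pairs are three and two bonds apart. In a macrocycle the ring closure shortens many such paths, which is one reason graph-distance descriptors of this kind can carry information about the shape and flexibility of the molecule.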