RESUMO
BACKGROUND: Peru is one of the most biodiverse countries in the world, which is reflected in its wealth of knowledge about medicinal plants. However, there is a lack of information regarding intestinal absorption and the permeability of natural products. The human colon adenocarcinoma cell line (Caco-2) is an in vitro assay used to measure apparent permeability. This study aims to develop a quantitative structure-property relationship (QSPR) model using machine learning algorithms to predict the apparent permeability of the Caco-2 cell in natural products from Peru. METHODS: A dataset of 1817 compounds, including experimental log Papp values and molecular descriptors, was utilized. Six QSPR models were constructed: a multiple linear regression (MLR) model, a partial least squares regression (PLS) model, a support vector machine regression (SVM) model, a random forest (RF) model, a gradient boosting machine (GBM) model, and an SVM-RF-GBM model. RESULTS: An evaluation of the testing set revealed that the MLR and PLS models exhibited an RMSE = 0.47 and R2 = 0.63. In contrast, the SVM, RF, and GBM models showcased an RMSE = 0.39-0.40 and R2 = 0.73-0.74. Notably, the SVM-RF-GBM model demonstrated superior performance, with an RMSE = 0.38 and R2 = 0.76. The model predicted log Papp values for 502 natural products falling within the applicability domain, with 68.9% (n = 346) showing high permeability, suggesting the potential for intestinal absorption. Additionally, we categorized the natural products into six metabolic pathways and assessed their drug-likeness. CONCLUSIONS: Our results provide insights into the potential intestinal absorption of natural products in Peru, thus facilitating drug development and pharmaceutical discovery efforts.