ABSTRACT
Drug-induced liver injury (DILI) is a major safety concern in the drug-development process, and various methods have been proposed to predict the hepatotoxicity of compounds during the early stages of drug trials. In this study, we developed an ensemble model using 3 machine learning algorithms and 12 molecular fingerprints, built from a dataset containing 1241 diverse compounds. In 5-fold cross-validation, the ensemble model achieved an average accuracy of 71.1 ± 2.6%, sensitivity (SE) of 79.9 ± 3.6%, specificity (SP) of 60.3 ± 4.8%, and area under the receiver-operating characteristic curve (AUC) of 0.764 ± 0.026. On an external validation dataset of 286 compounds collected from the Liver Toxicity Knowledge Base, it achieved an accuracy of 84.3%, SE of 86.9%, SP of 75.4%, and AUC of 0.904. Compared with previous methods, the ensemble model achieved relatively high accuracy and SE. We also identified several substructures related to DILI. In addition, we provide a web server offering access to our models (http://ccsipb.lnu.edu.cn/toxicity/HepatoPred-EL/).
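As a reading aid for the metrics reported above, the following minimal sketch shows how accuracy, SE, and SP follow from binary predictions, and one simple way several base models can be combined. The abstract does not state how the base models are combined, so the probability averaging in `average_vote` is an assumption made for illustration, as are all function names and the toy data; this is not the authors' implementation.

```python
from typing import Dict, List, Sequence


def average_vote(model_probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Hypothetical combiner: average each compound's predicted probability
    across base models, then label it toxic (1) if the mean reaches the threshold."""
    n_models = len(model_probs)
    n_compounds = len(model_probs[0])
    labels = []
    for j in range(n_compounds):
        mean_prob = sum(probs[j] for probs in model_probs) / n_models
        labels.append(1 if mean_prob >= threshold else 0)
    return labels


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (SE), and specificity (SP) from binary labels (1 = toxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(y_true) if len(y_true) else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # true-positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else nan,  # true-negative rate
    }


if __name__ == "__main__":
    # Toy probabilities from three hypothetical base models for two compounds.
    toy_probs = [[0.9, 0.2], [0.6, 0.4], [0.3, 0.7]]
    print(average_vote(toy_probs))
    # Toy labels, unrelated to the study's data.
    print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

A high SE paired with a lower SP, as in the cross-validation figures above, means the model misses few toxic compounds but flags more non-toxic ones.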
Subjects
Chemical and Drug Induced Liver Injury/etiology , Drug Discovery/methods , Pharmaceutical Preparations/chemistry , Algorithms , Animals , Machine Learning , Quantitative Structure-Activity Relationship , ROC Curve , Sensitivity and Specificity
ABSTRACT
Lysine succinylation is an important protein post-translational modification that plays a fundamental role in regulating various biological reactions, and dysfunction of this process is associated with a number of diseases. Thus, determining which Lys residues in an uncharacterized protein sequence are succinylated underpins both basic research and drug development endeavors. To solve this problem, we have developed a predictor called pSuc-PseRat. The features of the pSuc-PseRat predictor are derived from two aspects: (1) the binary encoding from succinylated sites and non-succinylated sites; (2) the sequence-coupling effects between succinylated sites and non-succinylated sites. Eleven gradient boosting machine classifiers were trained with these features to build the predictor. The pSuc-PseRat predictor achieved an average AUC (area under the receiver operating characteristic curve) score of 0.805 in fivefold cross-validation and performed better than existing predictors on two comprehensive independent test sets. A freely available web server has been developed for pSuc-PseRat.
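To illustrate what a binary encoding of candidate sites can look like, the sketch below extracts a fixed-width peptide window around each Lys (K) residue and one-hot encodes it. The window half-width, the `X` padding character, the residue ordering, and the function names are all assumptions chosen for illustration; the abstract does not specify these details, and this is not the pSuc-PseRat feature pipeline.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; ordering is arbitrary
PAD = "X"  # assumed placeholder for positions beyond the sequence ends


def lysine_windows(sequence: str, half_width: int = 3) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every K, padding the termini with PAD."""
    padded = PAD * half_width + sequence + PAD * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In the padded string, the K sits at index i + half_width,
            # so the slice below is centered on it.
            windows.append((i + 1, padded[i : i + 2 * half_width + 1]))
    return windows


def binary_encode(window: str) -> List[int]:
    """One-hot encode each residue as 20 bits; PAD or unknown residues become all zeros."""
    vector: List[int] = []
    for residue in window:
        bits = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            bits[index] = 1
        vector.extend(bits)
    return vector


if __name__ == "__main__":
    # Toy sequence, not taken from any dataset.
    for position, window in lysine_windows("MKAK", half_width=2):
        print(position, window, sum(binary_encode(window)))
```

Each window becomes a vector of 20 × (2 × half_width + 1) bits, which a classifier such as a gradient boosting machine could take as input alongside other features.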