Exploiting ensemble learning to improve prediction of phospholipidosis inducing potential.

Nath, Abhigyan; Sahu, Gopal Krishna

Affiliation

Nath A; Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur 492001, India. Electronic address: abhigyannath01@gmail.com.
Sahu GK; Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur 492001, India.

J Theor Biol; 479: 37-47, 2019 Oct 21.

Article in En | MEDLINE | ID: mdl-31310757

ABSTRACT

Phospholipidosis is the excessive accumulation of phospholipids in various tissues (lungs, liver, eyes, kidneys, etc.) caused by cationic amphiphilic drugs. Electron microscopy has revealed lamellar inclusion bodies as its hallmark. Some phospholipidosis-causing compounds can cause tissue-specific inflammatory or retrogressive changes. Reliable and accurate in silico methods could facilitate early screening of phospholipidosis-inducing compounds, which can in turn speed up pharmaceutical drug discovery pipelines. In the present work, stacking ensembles that combine a number of different base learners are used to develop predictive models for phospholipidosis-inducing compounds (a total of 256 trained machine learning models were tested), using a wide range of molecular descriptors (ChemMine, JOELib, Open Babel and RDK descriptors) and structural alerts as input features. The best model, a stacked ensemble of machine learning algorithms with random forest as the second-level learner, outperformed the other base and ensemble learners. JOELib descriptors combined with structural alerts performed better than the other descriptor sets. The best ensemble model achieved an overall accuracy of 88.23%, sensitivity of 86.27%, specificity of 90.20%, MCC of 0.765 and AUC of 0.896, with a g-mean of 88.21%. To assess its robustness and stability, the best ensemble model was further evaluated using stratified 10×10-fold cross-validation and holdout testing sets (repeated 10 times), achieving 84.83% mean accuracy with 0.708 mean MCC and 88.46% mean accuracy with 0.771 mean MCC, respectively. A comparison of different meta-classifiers (generalized linear regression, gradient boosting machines, random forest and deep learning neural networks) in the stacking ensemble revealed that random forest is the better choice for combining multiple classification models.

Subjects

Lipidoses/diagnosis; Models, Statistical; Phospholipids/metabolism; Area Under Curve; Drug Discovery; Humans; Lipidoses/chemically induced; Lipidoses/etiology; Machine Learning/standards; Sensitivity and Specificity

Keywords

Cationic amphiphilic drugs; Deep learning; Ensemble learning; Hierarchical clustering; Phospholipidosis; Stacking

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Phospholipids / Models, Statistical / Lipidoses Type of study: Diagnostic_studies / Etiology_studies / Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Journal: J Theor Biol Year of publication: 2019 Document type: Article
