ABSTRACT
The druggable proteome refers to proteins that can bind small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 linear and non-linear machine learning classifiers. The best classifier was obtained with the support vector machine method using 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristic curve (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify the highest-affinity drugs capable of targeting the top predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, and 23 of them demonstrated unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth study in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins .
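Tri-amino acid (tripeptide) composition descriptors can be computed as the relative frequency of each of the 20³ = 8,000 possible consecutive amino acid triplets in a sequence. The Python sketch below assumes overlapping windows normalized by the number of valid windows, and it skips windows that contain non-standard residues. The example sequence is hypothetical. The selection of the 200 descriptors and the SVM training are not shown.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20**3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def tripeptide_composition(sequence: str) -> dict:
    """Return the frequency of each overlapping tripeptide in `sequence`.

    Windows containing non-standard residues (e.g. X, U) are skipped, and
    counts are normalized by the number of valid windows (assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(TRIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    if total == 0:
        # Sequence too short or only non-standard residues: all-zero vector.
        return {tri: 0.0 for tri in TRIPEPTIDES}
    return {tri: c / total for tri, c in counts.items()}


features = tripeptide_composition("MKTAYIAKQR")  # hypothetical sequence
print(len(features))                     # 8000 descriptors per protein
print(round(sum(features.values()), 6))  # frequencies sum to 1.0
```

Each protein then becomes an 8,000-dimensional vector. A feature-selection step can reduce this to a smaller subset before classification.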
Subject(s)
Artificial Intelligence , Neoplasms , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/metabolism , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Antineoplastic Agents/chemistry , Machine Learning , Neoplasm Proteins/metabolism , Neoplasm Proteins/genetics , Neoplasm Proteins/chemistry , Support Vector Machine , Drug Repositioning/methods , Computational Biology/methods , Multiomics
ABSTRACT
Predicting the possible protein targets of a chemical compound is crucial for understanding its mechanism of action and side effects, as well as for drug discovery. This study examines 15 target-centric models (TCM) developed with different molecular descriptors and machine learning algorithms. They were compared with 17 third-party models implemented as web tools (WTCM). In both sets of models, consensus strategies were implemented as a potential improvement over individual predictions. The findings indicate that the TCM reach F1-score values greater than 0.8. Comparing both approaches, the best TCM achieves values of 0.75, 0.61, 0.25 and 0.38 for the true positive/negative rates (TPR, TNR) and false negative/positive rates (FNR, FPR), outperforming the best WTCM. Moreover, the consensus strategy yields the most relevant results in the top 20% of target profiles: the TCM consensus reaches TPR and FNR values of 0.98 and 0, whereas the WTCM consensus reaches 0.75 and 0.24. The computational tool implementing the TCM and their consensus strategy is available at: https://bioquimio.udla.edu.ec/tidentification01/ . Scientific Contribution: We compare and discuss the performance of 17 public compound-target interaction prediction models and 15 new ones. We also explore a compound-target interaction prioritization strategy using a consensus approach, and we analyze the challenges involved in interaction modeling.
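All four reported rates follow directly from confusion-matrix counts, with FNR = 1 - TPR and FPR = 1 - TNR. The sketch below computes them together with the F1-score. It also shows one possible way to form a consensus, which is an assumption and not necessarily the authors' scheme: targets are ranked by their mean score across models, and the top fraction (for example 20%) is kept. The model names, targets, and scores are hypothetical.

```python
def confusion_rates(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return TPR, TNR, FNR, FPR and F1 from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity (recall)
    tnr = tn / (tn + fp)  # specificity
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"TPR": tpr, "TNR": tnr, "FNR": 1 - tpr, "FPR": 1 - tnr, "F1": f1}


def consensus_top_fraction(scores_by_model: dict, fraction: float = 0.2) -> list:
    """Rank targets by their mean score across models and keep the top fraction.

    A target missing from a model's output contributes a score of 0 for that
    model (a hypothetical choice for this sketch).
    """
    n_models = len(scores_by_model)
    targets = set()
    for scores in scores_by_model.values():
        targets.update(scores)
    mean_score = {
        t: sum(s.get(t, 0.0) for s in scores_by_model.values()) / n_models
        for t in targets
    }
    # Highest mean score first; ties broken alphabetically for reproducibility.
    ranked = sorted(mean_score, key=lambda t: (-mean_score[t], t))
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]


print(confusion_rates(tp=8, fn=2, tn=6, fp=4))
scores = {
    "model_A": {"T1": 0.9, "T2": 0.1, "T3": 0.5, "T4": 0.3, "T5": 0.2},
    "model_B": {"T1": 0.7, "T2": 0.2, "T3": 0.6, "T4": 0.1, "T5": 0.4},
}
print(consensus_top_fraction(scores, fraction=0.4))  # ['T1', 'T3']
```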
ABSTRACT
Background: There is an urgent need to identify therapeutic targets and drugs that can effectively treat COVID-19 patients. Methods: We performed in silico analyses of the immune system protein interactome network, single-cell RNA sequencing data of human tissues, and artificial neural networks to reveal potential therapeutic targets for drug repurposing against COVID-19. Results: We screened 1,584 high-confidence immune system proteins in ACE2 and TMPRSS2 co-expressing cells and found 25 potential therapeutic targets significantly overexpressed in nasal goblet secretory cells, lung type II pneumocytes, and ileal absorptive enterocytes of patients with several immunopathologies. Then, we trained fully connected deep neural networks to find the best multitask classification model to predict the activity of 10,672 drugs, obtaining several approved drugs, compounds under investigation, and experimental compounds with the highest area under the receiver operating characteristic curve. Conclusion: After being effectively evaluated in clinical trials, these drugs could be considered for the treatment of severe COVID-19 patients. Scripts can be downloaded at https://github.com/muntisa/immuno-drug-repurposing-COVID-19.
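Because the drug activity models are multitask classifiers, one natural way to report AUROC is per task. The sketch below computes AUROC through its equivalence with the Mann-Whitney U statistic. It skips missing labels (None), which is an assumption about how untested drug-task pairs might be encoded. The labels and scores are hypothetical, and the neural network itself is not shown.

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_task_auroc(label_rows, score_rows):
    """AUROC for each task (column); None marks a missing label and is skipped."""
    n_tasks = len(label_rows[0])
    results = []
    for t in range(n_tasks):
        y, s = [], []
        for labels, scores in zip(label_rows, score_rows):
            if labels[t] is not None:
                y.append(labels[t])
                s.append(scores[t])
        results.append(auroc(y, s))
    return results


# Rows are drugs, columns are tasks (hypothetical values).
labels = [[1, None], [0, 1], [1, 0], [0, None]]
scores = [[0.9, 0.2], [0.8, 0.7], [0.4, 0.3], [0.1, 0.9]]
print(per_task_auroc(labels, scores))  # [0.75, 1.0]
```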
ABSTRACT
Osteosarcoma is the most common type of primary malignant bone tumor. Although 5-year survival rates can now reach 60-70%, acute complications and late effects of osteosarcoma therapy are two of the limiting factors in treatment. We developed a multi-objective algorithm for repurposing drugs against osteosarcoma, based on modeling molecules with reported activity against the HOS, MG63, SAOS2, and U2OS cell lines in the ChEMBL database. Several predictive models were obtained for each cell line, and those with accuracy greater than 0.8 were integrated into a desirability function for the final multi-objective model. An exhaustive exploration of model combinations was carried out to obtain the best multi-objective model for virtual screening. For the top 1% of the screened list, the final model showed a BEDROC = 0.562, EF = 27.6, and AUC = 0.653. The repositioning was performed on 2218 molecules described in DrugBank. Among the top-ranked drugs, we found temsirolimus, paclitaxel, sirolimus, everolimus, and cabazitaxel, which are antineoplastic drugs described in clinical trials for cancer in general. Interestingly, we also found several broad-spectrum antibiotics and antiretroviral agents. This powerful model predicts several drugs that should be studied in depth to find new chemotherapy regimens and to propose new strategies for osteosarcoma treatment.
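The multi-objective idea can be illustrated with a Derringer-type desirability function. Each cell-line model's output is mapped to a desirability in [0, 1], and the overall desirability is their geometric mean, so a compound must score well on all cell lines to rank high. The sketch also computes the enrichment factor (EF) of a ranked list. The thresholds, weights, and probabilities are hypothetical, and the exact desirability transform used by the authors is not stated in the abstract. BEDROC is not implemented here.

```python
import math


def desirability(y: float, low: float, high: float, weight: float = 1.0) -> float:
    """Derringer-type 'larger is better' desirability, scaled to [0, 1]."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def overall_desirability(values: list) -> float:
    """Geometric mean of individual desirabilities (0 if any of them is 0)."""
    if any(d == 0.0 for d in values):
        return 0.0
    return math.exp(sum(math.log(d) for d in values) / len(values))


def enrichment_factor(ranked_labels: list, fraction: float) -> float:
    """EF = (hit rate in the top fraction) / (hit rate in the whole list).

    `ranked_labels` holds 1 (active) or 0 (inactive), best-scored first.
    """
    n = len(ranked_labels)
    actives = sum(ranked_labels)
    if actives == 0:
        raise ValueError("EF is undefined without any actives")
    n_top = max(1, round(fraction * n))
    hit_rate_top = sum(ranked_labels[:n_top]) / n_top
    return hit_rate_top / (actives / n)


# Hypothetical activity probabilities from one model per cell line.
probs = {"HOS": 0.9, "MG63": 0.8, "SAOS2": 0.7, "U2OS": 0.95}
d = [desirability(p, low=0.5, high=1.0) for p in probs.values()]
print(round(overall_desirability(d), 3))
print(enrichment_factor([1, 0, 1, 0, 0, 0, 0, 0, 0, 0], fraction=0.1))
```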
ABSTRACT
Sarcomas are a group of malignant neoplasms of connective tissue with an etiology different from that of carcinomas. Efforts to discover new drugs with antisarcoma activity have generated large datasets of multiple preclinical assays under different experimental conditions. For instance, the ChEMBL database contains the outcomes of 37,919 different antisarcoma assays with 34,955 different chemical compounds. Furthermore, the experimental conditions reported in this dataset include 157 types of biological activity parameters, 36 drug targets, 43 cell lines, and 17 assay organisms. Considering this information, we propose combining perturbation theory (PT) principles with machine learning (ML) to develop a PTML model to predict antisarcoma compounds. PTML models use a reference function that measures the probability of a drug being active under certain conditions (protein, cell line, organism, etc.). In this paper, we used linear discriminant analysis and neural networks to train and compare PT and non-PT models. All the explored models have an accuracy of 89.19-95.25% in training and 89.22-95.46% in validation sets. PTML-based strategies have similar accuracy but generate simpler models. Therefore, they may become a versatile tool for predicting antisarcoma compounds.
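A common way to build PTML inputs is to express each molecular descriptor as its deviation from a reference average. That average is computed over compounds tested under the same condition (cell line, organism, target, etc.). The sketch below assumes this moving-average form, with active compounds as the reference. The descriptor name, condition labels, and records are hypothetical placeholders, and the LDA and neural network training steps are not shown.

```python
from collections import defaultdict
from statistics import mean


def ptml_deviations(records, descriptor, condition_keys, active_only=True):
    """Deviation of a descriptor from its reference average under each condition.

    For each condition key (e.g. cell line, organism), the reference value is the
    mean descriptor over records sharing that condition value (only active
    records when active_only=True). Records whose condition has no reference
    records get None.
    """
    deviations = [{} for _ in records]
    for key in condition_keys:
        groups = defaultdict(list)
        for r in records:
            if not active_only or r["active"] == 1:
                groups[r[key]].append(r[descriptor])
        reference = {value: mean(vals) for value, vals in groups.items()}
        for dev, r in zip(deviations, records):
            ref = reference.get(r[key])
            dev[key] = None if ref is None else r[descriptor] - ref
    return deviations


records = [  # hypothetical compounds and conditions
    {"logP": 2.0, "cell_line": "CL1", "organism": "ORG1", "active": 1},
    {"logP": 4.0, "cell_line": "CL1", "organism": "ORG1", "active": 1},
    {"logP": 1.0, "cell_line": "CL2", "organism": "ORG1", "active": 0},
]
for dev in ptml_deviations(records, "logP", ["cell_line", "organism"]):
    print(dev)
```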
ABSTRACT
Wuhan, China, was the epicenter of the first zoonotic transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019; this virus is the causative agent of the novel human coronavirus disease 2019 (COVID-19). Almost from the beginning of the COVID-19 outbreak, several attempts were made to predict drugs capable of inhibiting virus replication. In the present work, a drug repurposing study is performed to identify potential SARS-CoV-2 protease inhibitors. We created a Quantitative Structure-Activity Relationship (QSAR) model based on a machine learning strategy using hundreds of inhibitor molecules of the main protease (Mpro) of the SARS-CoV coronavirus. The QSAR model was used for virtual screening of a large list of drugs from the DrugBank database. The best 20 candidates were then evaluated in silico against the Mpro of SARS-CoV-2 using docking and molecular dynamics analyses. Docking was performed with the GOLD software, and the free energies of binding were predicted with the MM-PBSA method as implemented in AMBER. Our results indicate that levothyroxine, amobarbital and ABP-700 are the best potential inhibitors of SARS-CoV-2 through their binding to the Mpro enzyme. Five other compounds also showed a negative but small free energy of binding: nikethamide, nifurtimox, rebimastat, apomine and rebastinib.
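In single-trajectory MM-PBSA, the binding free energy is estimated by averaging ΔG_bind = G_complex - G_receptor - G_ligand over molecular dynamics snapshots. Each G combines molecular mechanics energy with Poisson-Boltzmann and nonpolar solvation terms. AmberTools provides the MMPBSA.py script for computing these terms. The sketch below shows only the final averaging step, using hypothetical per-frame energies in kcal/mol, and it omits the entropy term.

```python
from statistics import mean, stdev


def mmpbsa_binding_energy(g_complex, g_receptor, g_ligand):
    """Mean and standard deviation of per-frame G_complex - G_receptor - G_ligand."""
    if not len(g_complex) == len(g_receptor) == len(g_ligand):
        raise ValueError("need one energy per snapshot for each species")
    per_frame = [c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)]
    spread = stdev(per_frame) if len(per_frame) > 1 else 0.0
    return mean(per_frame), spread


# Hypothetical per-snapshot free energies (kcal/mol) for three frames.
dg, sd = mmpbsa_binding_energy(
    g_complex=[-5021.4, -5019.8, -5023.1],
    g_receptor=[-4990.2, -4989.5, -4991.0],
    g_ligand=[-12.6, -12.1, -12.9],
)
print(f"dG_bind = {dg:.2f} +/- {sd:.2f} kcal/mol")
```

A negative mean indicates favorable binding, which is how the "negative but small" free energies above should be read.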
Subject(s)
Antiviral Agents/pharmacology , COVID-19 Drug Treatment , Coronavirus 3C Proteases/antagonists & inhibitors , Drug Discovery/methods , Drug Repositioning/methods , Protease Inhibitors/pharmacology , SARS-CoV-2/enzymology , Amobarbital/pharmacology , Antiviral Agents/chemistry , Binding Sites , Computer Simulation , Humans , Machine Learning , Molecular Docking Simulation , Molecular Dynamics Simulation , Pandemics , Protease Inhibitors/chemistry , Protein Binding , Quantitative Structure-Activity Relationship , SARS-CoV-2/drug effects , Small Molecule Libraries/chemistry , Software , Thermodynamics , Thyroxine/pharmacology
ABSTRACT
Breast cancer (BC) is a heterogeneous disease involving genomic alterations, deregulated protein expression, altered signaling pathways, hormone disruption, ethnicity and environmental determinants. Due to the complexity of BC, the prediction of proteins involved in this disease is a trending topic in drug design. This work proposes an accurate classifier for predicting BC proteins using six sets of protein sequence descriptors and 13 machine-learning methods. After applying univariate feature selection to the mix of five descriptor families, the best classifier was obtained using the multilayer perceptron method (artificial neural network) and 300 features. The performance of the model is demonstrated by an area under the receiver operating characteristic curve (AUROC) of 0.980 ± 0.0037 and an accuracy of 0.936 ± 0.0056 (3-fold cross-validation). Regarding the prediction of 4,504 cancer-associated proteins using this model, the best ranked cancer immunotherapy proteins related to BC were RPS27, SUPT4H1, CLPSL2, POLR2K, RPL38, AKT3, CDK3, RPS20, RASL11A and UBTD1; the best ranked metastasis driver proteins related to BC were S100A9, DDA1, TXN, PRNP, RPS27, S100A14, S100A7, MAPK1, AGR3 and NDUFA13; and the best ranked RNA-binding proteins related to BC were S100A9, TXN, RPS27L, RPS27, RPS27A, RPL38, MRPL54, PPAN, RPS20 and CSRP1. This powerful model predicts several BC-related proteins that should be studied in depth to find new biomarkers and better therapeutic targets. Scripts can be downloaded at https://github.com/muntisa/neural-networks-for-breast-cancer-proteins.
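The abstract does not specify the univariate criterion. One common choice is the one-way ANOVA F statistic, which scikit-learn offers as f_classif for use with SelectKBest. The stdlib sketch below applies it to a hypothetical toy matrix: it scores each descriptor independently and keeps the k highest-scoring ones. This is the same kind of step that reduced the descriptor mix to 300 features.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across (at least two) classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    group_means = {y: mean(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (group_means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - group_means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, y, k):
    """Indices of the k features with the highest F statistic (in original order)."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    best = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(best)


# Hypothetical descriptor matrix (rows = proteins) and BC / non-BC labels.
X = [[1.0, 5.0, 0.3], [1.2, 5.1, 0.1], [3.0, 4.9, 0.2], [3.2, 5.0, 0.4]]
y = [0, 0, 1, 1]
print(select_k_best(X, y, k=2))  # [0, 1]
```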
Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/metabolism , Gene Expression Regulation, Neoplastic , Immunotherapy/methods , Machine Learning , Neural Networks, Computer , RNA/metabolism , Breast Neoplasms/secondary , Breast Neoplasms/therapy , Female , Gene Expression Profiling , Humans , Neoplasm Metastasis
ABSTRACT
Breast cancer (BC) is the leading cause of cancer-related death among women and the most commonly diagnosed cancer worldwide. Although in recent years large-scale efforts have focused on identifying new therapeutic targets, a better understanding of BC molecular processes is required. Here we focused on elucidating the molecular hallmarks of BC heterogeneity and the oncogenic mutations involved in precision medicine, which remain poorly defined. To fill this gap, we established an OncoOmics strategy that consists of analyzing genomic alterations, signaling pathways, the protein-protein interactome network, protein expression, and dependency maps in cell lines and patient-derived xenografts for 230 previously prioritized genes to reveal essential genes in breast cancer. As a result, the OncoOmics BC essential genes were rationally filtered to 140. mRNA up-regulation was the most prevalent genomic alteration. The most altered signaling pathways were associated with the basal-like and Her2-enriched molecular subtypes. RAC1, AKT1, CCND1, PIK3CA, ERBB2, CDH1, MAPK14, TP53, MAPK1, SRC, RAC3, BCL2, CTNNB1, EGFR, CDK2, GRB2, MED1 and GATA3 were essential genes in at least three OncoOmics approaches. The drugs with the highest number of phase 3 and 4 clinical trials were paclitaxel, docetaxel, trastuzumab, tamoxifen and doxorubicin. Lastly, we collected ~3,500 somatic and germline oncogenic variants associated with 50 essential genes, which in turn had therapeutic connectivity with 73 drugs. In conclusion, the OncoOmics strategy reveals essential genes capable of accelerating the development of targeted therapies for precision oncology.
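Selecting genes that at least three OncoOmics approaches flag as essential amounts to counting, for each gene, how many approach-specific gene sets contain it. The sketch below assumes each approach yields a plain set of gene symbols. The approach names and memberships are hypothetical and are not the study's results.

```python
from collections import Counter


def genes_in_at_least(hits_by_approach, min_approaches):
    """Genes reported as essential by at least `min_approaches` approaches."""
    counts = Counter(gene for genes in hits_by_approach.values() for gene in set(genes))
    return sorted(gene for gene, c in counts.items() if c >= min_approaches)


# Hypothetical per-approach gene sets.
hits = {
    "genomic_alterations": {"TP53", "PIK3CA", "ERBB2"},
    "signaling_pathways": {"TP53", "ERBB2", "AKT1"},
    "ppi_network": {"TP53", "AKT1", "SRC"},
    "dependency_maps": {"ERBB2", "TP53"},
}
print(genes_in_at_least(hits, min_approaches=3))  # ['ERBB2', 'TP53']
```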
Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Gene Expression Regulation, Neoplastic , Genes, Essential , Mutation , Precision Medicine , Animals , Biomarkers, Tumor/metabolism , Breast Neoplasms/metabolism , Female , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing , Humans , Mice , Prognosis , Protein Interaction Maps , Proteome , Tumor Cells, Cultured , Xenograft Model Antitumor Assays
ABSTRACT
Osteosarcoma is the most common subtype of primary bone cancer and mostly affects adolescents. In recent years, several studies have focused on elucidating the molecular mechanisms of this sarcoma; however, its molecular etiology has still not been precisely determined. Therefore, we applied a consensus strategy using several bioinformatics tools to prioritize genes involved in its pathogenesis. Subsequently, we assessed the physical interactions of the selected genes and applied a communality analysis to this protein-protein interaction network. The consensus strategy prioritized a total of 553 genes. Our enrichment analysis validates several studies that describe the PI3K/AKT and MAPK/ERK signaling pathways as pathogenic. Gene ontology analysis described TP53 as a principal signal transducer that chiefly mediates processes associated with the cell cycle and the DNA damage response. Interestingly, the communality analysis clusters several members involved in metastasis events, such as MMP2 and MMP9, and genes associated with DNA repair complexes, such as ATM, ATR, CHEK1, and RAD51. In this study, we have identified well-known pathogenic genes for osteosarcoma and prioritized genes that need to be further explored.
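The abstract does not detail how the outputs of the bioinformatics tools were combined. One illustrative option is a Borda count, which merges several ranked gene lists into a single consensus ranking. The sketch below uses this swapped-in technique, and the tool rankings are hypothetical.

```python
from typing import Optional


def borda_consensus(rankings, top_n: Optional[int] = None):
    """Aggregate ranked gene lists with a Borda count.

    In a list of length L, the gene at 0-based position i receives L - i points;
    a gene absent from a list receives no points from it.
    """
    points = {}
    for ranking in rankings:
        length = len(ranking)
        for i, gene in enumerate(ranking):
            points[gene] = points.get(gene, 0) + (length - i)
    ordered = sorted(points, key=lambda g: (-points[g], g))  # ties: alphabetical
    return ordered if top_n is None else ordered[:top_n]


tool_rankings = [  # hypothetical outputs of three prioritization tools
    ["TP53", "ATM", "MMP2"],
    ["TP53", "MMP9", "ATM"],
    ["ATR", "TP53"],
]
print(borda_consensus(tool_rankings))  # ['TP53', 'ATM', 'ATR', 'MMP9', 'MMP2']
```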
Subject(s)
Bone Neoplasms/genetics , Bone Neoplasms/pathology , Osteosarcoma/genetics , Osteosarcoma/pathology , Computational Biology/methods , Consensus , DNA Repair/genetics , Gene Expression Regulation, Neoplastic/genetics , Gene Ontology , Gene Regulatory Networks/genetics , Humans , Protein Interaction Maps/genetics , Signal Transduction/genetics
ABSTRACT
Consensus strategies have proved highly efficient in recognizing gene-disease associations. Therefore, the main objective of this study was to apply theoretical approaches to explore genes and communities directly involved in breast cancer (BC) pathogenesis. We evaluated the consensus among 8 prioritization strategies for the early recognition of pathogenic genes. A communality analysis of the protein-protein interaction (PPi) network of the selected genes was enriched with gene ontology and metabolic pathways, as well as oncogenomics validation with the OncoPPi and DRIVE projects. The consensus genes were rationally filtered to 1842 genes. The communality analysis showed an enrichment of 14 communities especially connected with the ERBB, PI3K-AKT, mTOR, FOXO, p53, HIF-1, VEGF, MAPK and prolactin signaling pathways. The genes with the highest ranking were TP53, ESR1, BRCA2, BRCA1 and ERBB2. The genes with the highest connectivity degree were TP53, AKT1, SRC, CREBBP and EP300. The connectivity degree allowed us to establish a significant correlation between the OncoPPi network and our integrated BC network, which comprises 51 genes and 62 PPi. In addition, CCND1, RAD51, CDC42, YAP1 and RPA1 were functional genes with significant sensitivity scores in BC cell lines. In conclusion, the consensus strategy identifies both well-known pathogenic genes and prioritized genes that need to be further explored.
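A degree-based comparison of two PPi networks can be sketched in three steps. First, compute each gene's connectivity degree in both networks. Second, restrict to the shared genes. Third, correlate the two degree vectors. The sketch uses the Pearson coefficient as an assumption, since the abstract names neither the coefficient nor the significance test. The edge lists are hypothetical.

```python
from collections import Counter
from math import sqrt


def degree(edges):
    """Connectivity degree of each node in an undirected edge list."""
    # Drop duplicate (including reversed) edges and self-loops.
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    deg = Counter()
    for a, b in unique:
        deg[a] += 1
        deg[b] += 1
    return deg


def pearson(x, y):
    """Pearson correlation coefficient (requires non-constant x and y)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def degree_correlation(edges_a, edges_b):
    """Pearson correlation of node degrees over the genes shared by two networks."""
    deg_a, deg_b = degree(edges_a), degree(edges_b)
    shared = sorted(set(deg_a) & set(deg_b))
    r = pearson([deg_a[g] for g in shared], [deg_b[g] for g in shared])
    return r, shared


integrated = [("TP53", "AKT1"), ("TP53", "SRC"), ("TP53", "EP300"), ("AKT1", "SRC")]
reference = [("TP53", "AKT1"), ("AKT1", "TP53"), ("TP53", "SRC"),
             ("TP53", "EP300"), ("SRC", "CREBBP")]
r, shared = degree_correlation(integrated, reference)
print(shared, round(r, 3))
```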
Subject(s)
Algorithms , Breast Neoplasms/metabolism , Female , Gene Expression Regulation, Neoplastic/genetics , Gene Expression Regulation, Neoplastic/physiology , Gene Regulatory Networks/genetics , Gene Regulatory Networks/physiology , Humans , Metabolic Networks and Pathways/genetics , Metabolic Networks and Pathways/physiology , Protein Binding , Signal Transduction/genetics , Signal Transduction/physiology
ABSTRACT
This study presents the impact of carbon nanotubes (CNTs) on mitochondrial oxygen mass flux (Jm) under three experimental conditions. We report, for the first time, new experimental results and a new methodology based on the star graph transform (spectral moments) of CNT Raman spectra and on perturbation theory. The experimental measurements of Jm showed that none of the tested CNT families inhibited the oxygen consumption profiles of mitochondria. The best model for predicting Jm for other CNTs was provided by random forest using eight features, with a test R-squared (R²) of 0.863 and a test root-mean-square error (RMSE) of 0.0461. The results demonstrate the capability of encoding CNT information into spectral moments of the Raman star graph (SG) transform, with potential applicability as predictive tools in nanotechnology and material risk assessment.
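The two regression metrics reported above are computed on the test set as R² = 1 - SS_res/SS_tot and RMSE = sqrt(mean squared error). Below is a minimal stdlib sketch on hypothetical values. The random forest itself is not shown.

```python
from math import sqrt


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (negative if worse than the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical test-set Jm values (observed vs. predicted).
observed = [0.42, 0.51, 0.38, 0.60, 0.47]
predicted = [0.45, 0.49, 0.40, 0.55, 0.48]
print(round(r_squared(observed, predicted), 3), round(rmse(observed, predicted), 4))
```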
ABSTRACT
The current molecular docking study provided the Free Energy of Binding (FEB) for the interaction (nanotoxicity) between VDAC mitochondrial channels of three species (VDAC1-Mus musculus, VDAC1-Homo sapiens, VDAC2-Danio rerio) and the SWCNT-H, SWCNT-OH, and SWCNT-COOH carbon nanotubes. Overall, the FEB values were statistically more negative (p < 0.05) in the following order: (SWCNT-VDAC2-Danio rerio) > (SWCNT-VDAC1-Mus musculus) > (SWCNT-VDAC1-Homo sapiens) > (ATP-VDAC). More negative FEB values for SWCNT-COOH and SWCNT-OH were found in VDAC2-Danio rerio than in VDAC1-Mus musculus and VDAC1-Homo sapiens (p < 0.05). In addition, a significant correlation (0.66 < r² < 0.97) was observed between the n-Hamada index and VDAC nanotoxicity (or FEB) for the zigzag topologies of SWCNT-COOH and SWCNT-OH. Predictive nanoparticle quantitative structure-binding relationship (nano-QSBR) models for strong and weak SWCNT-VDAC docking interactions were developed using Perturbation Theory, regression and classification models. Thus, 405 SWCNT-VDAC interactions were predicted using a nano-PT-QSBR classification model with high accuracy, specificity, and sensitivity (73-98%) in training and validation series, and a maximum AUROC value of 0.978. In addition, the best regression model was obtained with Random Forest (R² of 0.833, RMSE of 0.0844), suggesting an excellent potential to predict SWCNT-VDAC channel nanotoxicity. All study data are available at https://doi.org/10.6084/m9.figshare.4802320.v2 .
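The zigzag topology follows from a nanotube's (n, m) chiral indices, which are assumed here to be what the abstract calls Hamada indices. Tubes with indices (n, 0) are zigzag, (n, n) tubes are armchair, and all others are chiral. The sketch below also computes the tube diameter d = a·sqrt(n² + nm + m²)/π, with a ≈ 0.246 nm, and the chiral angle. The indices in the example are hypothetical, and the docking and QSBR models are not shown.

```python
from math import atan, degrees, pi, sqrt

A_NM = 0.246  # graphene lattice constant (about sqrt(3) x 0.142 nm C-C bond length)


def check_indices(n, m):
    """Enforce the usual convention n >= m >= 0 with n > 0."""
    if not (n >= m >= 0 and n > 0):
        raise ValueError("expected chiral indices with n >= m >= 0 and n > 0")


def topology(n, m):
    """Classify a single-walled CNT from its (n, m) chiral indices."""
    check_indices(n, m)
    if m == 0:
        return "zigzag"
    if n == m:
        return "armchair"
    return "chiral"


def diameter_nm(n, m):
    """Tube diameter in nanometers."""
    check_indices(n, m)
    return A_NM * sqrt(n * n + n * m + m * m) / pi


def chiral_angle_deg(n, m):
    """Chiral angle: 0 degrees for zigzag, 30 degrees for armchair."""
    check_indices(n, m)
    return degrees(atan(sqrt(3) * m / (2 * n + m)))


for n, m in [(10, 0), (5, 5), (6, 4)]:
    print((n, m), topology(n, m), round(diameter_nm(n, m), 3), round(chiral_angle_deg(n, m), 1))
```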
Subject(s)
Nanotubes, Carbon/chemistry , Humans , Mitochondria/chemistry , Mitochondria/metabolism , Molecular Docking Simulation , Voltage-Dependent Anion Channel 1/chemistry , Voltage-Dependent Anion Channel 1/metabolism , Voltage-Dependent Anion Channel 2/chemistry , Voltage-Dependent Anion Channel 2/metabolism , Voltage-Dependent Anion Channels/chemistry , Voltage-Dependent Anion Channels/metabolism
ABSTRACT
An unbalanced omega-6/omega-3 (ω-6/ω-3) intake ratio could increase the occurrence of chronic diseases such as inflammation, atherosclerosis, or tumor proliferation, and methylation methods for measuring the fatty acid (FA) composition/distribution of the ruminal microbiome play a vital role in discovering the contribution of food components to ruminant products (e.g., meat and milk) when pursuing a healthy diet. Hansch's models based on Linear Free Energy Relationships (LFERs), which use physicochemical parameters such as partition coefficients, molar refractivity, and polarizability as input variables (V(k)), are advocated. In this work, a new combined experimental and theoretical strategy was proposed to study the effect of ω-6/ω-3 ratios, FA chemical structure, and other factors on FA distribution networks in the ruminal microbiome. In step 1, experiments were carried out to measure long-chain fatty acid (LCFA) profiles in the rumen microbiome (bacterial and protozoan) and volatile fatty acids (VFAs) in fermentation media. In step 2, the proportions and physicochemical parameter values of LCFAs and VFAs were calculated under different boundary conditions (cj), such as c1 = acid and/or base methylation treatments, c2 = with/without fermentation, c3 = FA distribution phase (media, bacterial, or protozoan microbiome), etc. In step 3, Perturbation Theory (PT) and LFER ideas were combined to develop a PT-LFER model of an FA distribution network using the physicochemical parameters (V(k)) and the corresponding Box-Jenkins (ΔV(kj)) and PT (ΔΔV(kj)) operators in statistical analysis. The best PT-LFER model predicted the effects of perturbations on the FA distribution network with sensitivity, specificity, and accuracy > 80% for 407,655 cases in training + external validation series. In step 4, alternative PT-LFER and PT-NLFER models were tested by training linear and non-linear Artificial Neural Networks (ANNs).
PT-NLFER models based on ANNs performed better but are more complicated than the PT-LFER model. Last, in step 5, the PT-LFER model based on linear discriminant analysis (LDA) was used to reconstruct the complex networks of perturbations in the FA distribution, and the giant components of the observed and predicted networks were compared with those of random Erdős-Rényi network models. In short, our new PT-LFER model is a useful tool for predicting networks of specific fatty acid distributions.
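A network's giant component can be compared with Erdős-Rényi random graphs as follows. Generate random G(n, M) graphs with the same numbers of nodes and edges as the observed network, measure the size of their largest connected component with union-find, and express the observed size as a z-score. Three points are assumptions: nodes are labeled 0..n-1, the random graphs match the observed node and edge counts, and the example network is hypothetical. The pair enumeration is quadratic in the number of nodes, which is fine for a sketch but not for very large networks.

```python
import random
from statistics import mean, stdev


def giant_component_size(n_nodes, edges):
    """Size of the largest connected component (union-find with path halving)."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    sizes = {}
    for v in range(n_nodes):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


def erdos_renyi_edges(n_nodes, n_edges, rng):
    """G(n, M) random graph: n_edges distinct node pairs drawn uniformly."""
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    return rng.sample(pairs, n_edges)


def giant_component_zscore(n_nodes, observed_edges, trials=200, seed=0):
    """Observed giant size, mean random giant size, and z-score (None if no spread)."""
    edges = {tuple(sorted(e)) for e in observed_edges if e[0] != e[1]}
    observed = giant_component_size(n_nodes, edges)
    rng = random.Random(seed)
    sims = [giant_component_size(n_nodes, erdos_renyi_edges(n_nodes, len(edges), rng))
            for _ in range(trials)]
    sd = stdev(sims)
    z = (observed - mean(sims)) / sd if sd > 0 else None
    return observed, mean(sims), z


observed = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5)]  # hypothetical network
print(giant_component_zscore(n_nodes=8, observed_edges=observed))
```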