Results 1 - 20 of 25
1.
Sci Rep ; 14(1): 19359, 2024 08 21.
Article in English | MEDLINE | ID: mdl-39169044

ABSTRACT

The druggable proteome refers to proteins that can bind to small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 linear and non-linear machine learning classifiers. The optimal classifier was obtained with the support vector machine method, using 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristic curve (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify the highest-affinity drugs capable of targeting the top predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, and 23 of them showed unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth study in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins.
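The tri-amino acid (tripeptide) composition descriptors mentioned above can be sketched as follows. This is a minimal illustration, assuming each descriptor is the relative frequency of one of the 20³ = 8000 possible overlapping amino acid triplets in a sequence; the function name is hypothetical, and the selection of the 200 descriptors actually used by the authors is not shown (their scripts are in the linked repository).

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20**3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]


def tripeptide_composition(sequence: str) -> dict[str, float]:
    """Hypothetical helper: relative frequency of each overlapping tripeptide.

    Windows containing non-standard residues (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(TRIPEPTIDES, 0)
    n_windows = 0
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if window in counts:
            counts[window] += 1
            n_windows += 1
    if n_windows == 0:
        return {t: 0.0 for t in TRIPEPTIDES}
    return {t: c / n_windows for t, c in counts.items()}


features = tripeptide_composition("MACDAC")
print(len(features), features["ACD"])  # 8000 0.25
```

A fixed 8000-dimensional vector lets sequences of any length feed the same classifier; a feature selection step would then reduce it to a smaller subset.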


Subjects
Artificial Intelligence , Neoplasms , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/metabolism , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Antineoplastic Agents/chemistry , Machine Learning , Neoplasm Proteins/metabolism , Neoplasm Proteins/genetics , Neoplasm Proteins/chemistry , Support Vector Machine , Drug Repositioning/methods , Computational Biology/methods , Multiomics
2.
Int J Mol Sci ; 22(21)2021 Oct 26.
Article in English | MEDLINE | ID: mdl-34768951

ABSTRACT

The theoretical prediction of drug-decorated nanoparticles (DDNPs) has become a very important task in medical applications. In the current paper, Perturbation Theory Machine Learning (PTML) models were built to predict the probability that different pairs of drugs and nanoparticles form DDNP complexes with anti-glioblastoma activity. PTML models use, as inputs, the perturbations of the molecular descriptors of drugs and nanoparticles under different experimental conditions. The raw dataset was obtained by combining nanoparticle experimental data with drug assays from the ChEMBL database. Ten types of machine learning methods were tested. Only 41 features were selected to describe 855,129 drug-nanoparticle complexes. The best model was obtained with the Bagging classifier, an ensemble meta-estimator based on 20 decision trees, with an area under the receiver operating characteristic curve (AUROC) of 0.96 and an accuracy of 87% (test subset). This model could be useful for the virtual screening of nanoparticle-drug complexes in glioblastoma. All calculations can be reproduced with the datasets and Python scripts, which are freely available from the authors as a GitHub repository.
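PTML perturbations are often expressed as Box-Jenkins moving-average operators: the value of a descriptor minus its average over the records that share an experimental condition. The sketch below is a minimal illustration under that assumption, averaging over all records with the same condition (the authors may restrict the average to a reference subset, such as active compounds). The helper name and the descriptor values are hypothetical; drug and nanoparticle deviations would be computed separately and then combined as model inputs.

```python
from collections import defaultdict


def box_jenkins_deviation(records, descriptor, condition):
    """Hypothetical helper: deviation of each record's descriptor value from
    the mean of that descriptor over all records sharing the same condition.

    records: list of dicts, e.g. {"logP": 2.1, "cell_line": "X"}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec[condition]] += rec[descriptor]
        counts[rec[condition]] += 1
    return [
        rec[descriptor] - totals[rec[condition]] / counts[rec[condition]]
        for rec in records
    ]


# Made-up values, for illustration only.
records = [
    {"logP": 1.0, "cell_line": "A"},
    {"logP": 3.0, "cell_line": "A"},
    {"logP": 5.0, "cell_line": "B"},
]
print(box_jenkins_deviation(records, "logP", "cell_line"))  # [-1.0, 1.0, 0.0]
```

Centering each descriptor within its condition means a compound is described by how it differs from its experimental context rather than by its raw value.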


Subjects
Antineoplastic Agents/administration & dosage , Brain Neoplasms/drug therapy , Drug Delivery Systems , Glioblastoma/drug therapy , Machine Learning , Nanoparticles , Databases, Chemical , Databases, Pharmaceutical , Drug Carriers/administration & dosage , Drug Design , Drug Screening Assays, Antitumor , Humans , Nanoparticles/administration & dosage , User-Computer Interface
3.
Pharmaceuticals (Basel) ; 13(11)2020 Nov 22.
Article in English | MEDLINE | ID: mdl-33266378

ABSTRACT

Osteosarcoma is the most common type of primary malignant bone tumor. Although 5-year survival rates can now reach 60-70%, acute complications and late effects of osteosarcoma therapy are two of the limiting factors in treatment. We developed a multi-objective algorithm for repurposing drugs as new anti-osteosarcoma agents, based on the modeling of molecules with described activity against the HOS, MG63, SAOS2, and U2OS cell lines in the ChEMBL database. Several predictive models were obtained for each cell line, and those with accuracy greater than 0.8 were integrated into a desirability function for the final multi-objective model. An exhaustive exploration of model combinations was carried out to obtain the best multi-objective model for virtual screening. For the top 1% of the screened list, the final model showed BEDROC = 0.562, EF = 27.6, and AUC = 0.653. The repositioning was performed on 2218 molecules described in DrugBank. Among the top-ranked drugs, we found temsirolimus, paclitaxel, sirolimus, everolimus, and cabazitaxel, which are antineoplastic drugs described in clinical trials for cancer in general. Interestingly, we also found several broad-spectrum antibiotics and antiretroviral agents. This model predicts several drugs that should be studied in depth to find new chemotherapy regimens and to propose new strategies for osteosarcoma treatment.
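The combination of per-cell-line models into one desirability score, and the enrichment factor (EF) reported for the top 1% of the list, can be sketched as follows. This assumes that each cell-line model's predicted probability of activity is used directly as its individual desirability and that the overall desirability is their geometric mean (the Derringer-style combination); the authors' exact transformation is not given in the abstract, and BEDROC is not reproduced here.

```python
import math


def overall_desirability(desirabilities):
    """Geometric mean of individual desirabilities, each in [0, 1].

    Any zero desirability makes the overall value zero.
    """
    if any(d <= 0.0 for d in desirabilities):
        return 0.0
    return math.exp(sum(math.log(d) for d in desirabilities) / len(desirabilities))


def enrichment_factor(scores, labels, fraction):
    """EF at the top `fraction` of a ranked list (labels: 1 = active, 0 = inactive).

    EF = (actives in top / size of top) / (all actives / list size).
    """
    n = len(scores)
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("no actives in the list")
    return (hits_top / n_top) / (total_hits / n)
```

The geometric mean penalizes a molecule that scores poorly on any single cell line, which suits a search for compounds active across all of them. An EF of 1 means the top of the list is no richer in actives than random selection.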

4.
ACS Omega ; 5(42): 27211-27220, 2020 Oct 27.
Article in English | MEDLINE | ID: mdl-33134682

ABSTRACT

Sarcomas are a group of malignant neoplasms of connective tissue whose etiology differs from that of carcinomas. Efforts to discover new drugs with antisarcoma activity have generated large datasets of multiple preclinical assays under different experimental conditions. For instance, the ChEMBL database contains the outcomes of 37,919 different antisarcoma assays involving 34,955 different chemical compounds. Furthermore, the experimental conditions reported in this dataset include 157 types of biological activity parameters, 36 drug targets, 43 cell lines, and 17 assay organisms. Considering this information, we propose combining perturbation theory (PT) principles with machine learning (ML) to develop a PTML model to predict antisarcoma compounds. PTML models use a reference function that measures the probability of a drug being active under certain conditions (protein, cell line, organism, etc.). In this paper, we used linear discriminant analysis and neural networks to train and compare PT and non-PT models. All the explored models have an accuracy of 89.19-95.25% in training and 89.22-95.46% in validation sets. PTML-based strategies have similar accuracy but generate simpler models. Therefore, they may become a versatile tool for predicting antisarcoma compounds.
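One simple way to obtain a reference function of this kind is the empirical probability of activity under each combination of experimental conditions, estimated from the training data. The sketch below assumes that definition (the authors' exact reference function may differ); the condition-tuple layout and the example labels are hypothetical.

```python
from collections import defaultdict


def reference_probability(records):
    """Fraction of active outcomes observed under each experimental condition.

    records: iterable of (condition, is_active) pairs, where `condition` is a
    hashable tuple such as (activity_parameter, target, cell_line, organism).
    """
    active = defaultdict(int)
    total = defaultdict(int)
    for condition, is_active in records:
        total[condition] += 1
        if is_active:
            active[condition] += 1
    return {condition: active[condition] / total[condition] for condition in total}


# Hypothetical conditions, for illustration only.
records = [
    (("IC50", "cell_A"), True),
    (("IC50", "cell_A"), False),
    (("Ki", "cell_B"), True),
]
print(reference_probability(records))
```

Each compound-condition pair can then be described by this baseline probability plus the perturbation terms, so the model learns how a compound shifts activity away from what is typical for its assay.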

5.
BMC Mol Cell Biol ; 21(1): 52, 2020 Jul 08.
Article in English | MEDLINE | ID: mdl-32640984

ABSTRACT

BACKGROUND: The main challenge in cancer research is the identification of different omic variables that have prognostic value and enable a personalised diagnosis for each tumour. A personalised diagnosis opens the door to the design and discovery of new specific treatments for each patient. In this context, this work offers new ways to reuse existing databases and published work to create added value in research. Three published signatures with significant prognostic value in Colon Adenocarcinoma (COAD) were identified. These signatures were combined into a new meta-signature and validated with the main Machine Learning (ML) and conventional statistical techniques. In addition, a drug repurposing experiment was carried out through Molecular Docking (MD) in order to identify new potential treatments for COAD.
RESULTS: The prognostic potential of the signature was validated by means of ML algorithms and differential gene expression analysis. The results obtained supported the possibility that this meta-signature could harbor genes of interest for the prognosis and treatment of COAD. We studied drug repurposing through a molecular docking (MD) analysis, in which the different Protein Data Bank (PDB) structures of the meta-signature genes (155 in total) were docked against 81 FDA-approved anti-cancer drugs. We observed four interactions of interest: GLTP - Nilotinib, PTPRN - Venetoclax, VEGFA - Venetoclax and FABP6 - Abemaciclib. The FABP6 gene and its role within different metabolic pathways were studied in tumour and normal tissue, and we observed that the FABP6 gene could be a therapeutic target. Our in silico results showed significant binding specificity of Abemaciclib for the protein products of the FABP6 gene, in addition to its known action as an inhibitor of the CDK4/6 protein and, therefore, of the cell cycle.
CONCLUSIONS: The results of our ML and differential expression experiments first showed the FABP6 gene to be a possible new cancer biomarker, due to its specificity in colonic tumour tissue and its lack of expression in healthy adjacent tissue. Next, the MD analysis showed the characteristic affinity of Abemaciclib for the different protein structures of the FABP6 gene. Therefore, in silico experiments have revealed a new opportunity that should be validated experimentally, thus helping to reduce the cost and time of drug screening. For these reasons, we propose the validation of Abemaciclib for the treatment of colon cancer.
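Selecting drug-gene interactions of interest from many docked PDB structures can be sketched as keeping, for each gene-drug pair, the best score over all structures of that gene, then ranking the pairs. This assumes a scoring function in which more negative values (kcal/mol) mean stronger predicted binding, as in AutoDock Vina-style scores. The abstract does not name the docking program or the selection criterion, and all identifiers below are hypothetical.

```python
def best_docking_pairs(docking_results, top_n=5):
    """Best (lowest) score per gene-drug pair across PDB structures, ranked.

    docking_results: iterable of (gene, pdb_id, drug, score_kcal_per_mol).
    Returns (gene, drug, best_pdb_id, best_score) tuples, strongest first.
    """
    best = {}
    for gene, pdb_id, drug, score in docking_results:
        key = (gene, drug)
        if key not in best or score < best[key][0]:
            best[key] = (score, pdb_id)
    ranked = sorted(best.items(), key=lambda item: item[1][0])
    return [(gene, drug, pdb_id, score) for (gene, drug), (score, pdb_id) in ranked[:top_n]]
```

Taking the minimum over structures treats the gene's different PDB entries as alternative conformations; averaging them instead would be a more conservative choice.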


Subjects
Aminopyridines/chemistry , Aminopyridines/therapeutic use , Benzimidazoles/chemistry , Benzimidazoles/therapeutic use , Colonic Neoplasms/drug therapy , Machine Learning , Molecular Docking Simulation , Adenocarcinoma/drug therapy , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Algorithms , Cell Line, Tumor , Colonic Neoplasms/genetics , Colonic Neoplasms/pathology , Databases, Protein , Drug Repositioning , Epistasis, Genetic , Fatty Acid-Binding Proteins/genetics , Fatty Acid-Binding Proteins/metabolism , Gastrointestinal Hormones/genetics , Gastrointestinal Hormones/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Neoplasm Staging , Prognosis , Survival Analysis
6.
Sci Rep ; 10(1): 8515, 2020 05 22.
Article in English | MEDLINE | ID: mdl-32444848

ABSTRACT

Breast cancer (BC) is a heterogeneous disease in which genomic alterations, protein expression deregulation, signaling pathway alterations, hormone disruption, ethnicity and environmental determinants are involved. Due to the complexity of BC, the prediction of proteins involved in this disease is a trending topic in drug design. This work proposes an accurate prediction classifier for BC proteins using six sets of protein sequence descriptors and 13 machine-learning methods. After applying univariate feature selection to the mix of five descriptor families, the best classifier was obtained using the multilayer perceptron method (artificial neural network) and 300 features. The performance of the model is demonstrated by an area under the receiver operating characteristic curve (AUROC) of 0.980 ± 0.0037 and an accuracy of 0.936 ± 0.0056 (3-fold cross-validation). Regarding the prediction of 4,504 cancer-associated proteins using this model, the best ranked cancer immunotherapy proteins related to BC were RPS27, SUPT4H1, CLPSL2, POLR2K, RPL38, AKT3, CDK3, RPS20, RASL11A and UBTD1; the best ranked metastasis driver proteins related to BC were S100A9, DDA1, TXN, PRNP, RPS27, S100A14, S100A7, MAPK1, AGR3 and NDUFA13; and the best ranked RNA-binding proteins related to BC were S100A9, TXN, RPS27L, RPS27, RPS27A, RPL38, MRPL54, PPAN, RPS20 and CSRP1. This model predicts several BC-related proteins that should be studied in depth to find new biomarkers and better therapeutic targets. Scripts can be downloaded at https://github.com/muntisa/neural-networks-for-breast-cancer-proteins.
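Univariate feature selection scores each descriptor on its own against the class labels and keeps the top k. The sketch below assumes the one-way ANOVA F statistic as the score (the abstract does not say which univariate score was used); in scikit-learn the same idea is available as `SelectKBest` with `f_classif`. The helper names are hypothetical.

```python
import math


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2 for label, g in groups.items() for v in g)
    if ss_within == 0.0:
        # Perfect separation gives an infinite F; a constant feature gives 0.
        return math.inf if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, y, k):
    """Indices of the k features with the highest ANOVA F score."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
```

Because each feature is scored independently, this step is fast on thousands of descriptors but cannot detect features that are informative only in combination.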


Subjects
Biomarkers, Tumor/genetics , Breast Neoplasms/metabolism , Gene Expression Regulation, Neoplastic , Immunotherapy/methods , Machine Learning , Neural Networks, Computer , RNA/metabolism , Breast Neoplasms/secondary , Breast Neoplasms/therapy , Female , Gene Expression Profiling , Humans , Neoplasm Metastasis
7.
Sci Rep ; 10(1): 5285, 2020 03 24.
Article in English | MEDLINE | ID: mdl-32210335

ABSTRACT

Breast cancer (BC) is the leading cause of cancer-related death among women and the most commonly diagnosed cancer worldwide. Although in recent years large-scale efforts have focused on identifying new therapeutic targets, a better understanding of BC molecular processes is required. Here we focused on elucidating the molecular hallmarks of BC heterogeneity and the oncogenic mutations involved in precision medicine, which remain poorly defined. To fill this gap, we established an OncoOmics strategy that consists of analyzing genomic alterations, signaling pathways, the protein-protein interactome network, protein expression, and dependency maps in cell lines and patient-derived xenografts for 230 previously prioritized genes, in order to reveal essential genes in breast cancer. As a result, the OncoOmics BC essential genes were rationally filtered to 140. mRNA up-regulation was the most prevalent genomic alteration. The most altered signaling pathways were associated with the basal-like and Her2-enriched molecular subtypes. RAC1, AKT1, CCND1, PIK3CA, ERBB2, CDH1, MAPK14, TP53, MAPK1, SRC, RAC3, BCL2, CTNNB1, EGFR, CDK2, GRB2, MED1 and GATA3 were essential genes in at least three OncoOmics approaches. The drugs with the highest number of clinical trials in phases 3 and 4 were paclitaxel, docetaxel, trastuzumab, tamoxifen and doxorubicin. Lastly, we collected ~3,500 somatic and germline oncogenic variants associated with 50 essential genes, which in turn had therapeutic connectivity with 73 drugs. In conclusion, the OncoOmics strategy reveals essential genes capable of accelerating the development of targeted therapies for precision oncology.
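The "essential in at least three OncoOmics approaches" criterion is a support count over per-approach gene sets. A minimal sketch, assuming each approach simply yields a set of gene symbols; the function name and approach labels are hypothetical.

```python
from collections import Counter


def genes_supported_by(approach_hits, min_approaches):
    """Genes reported as essential by at least `min_approaches` approaches.

    approach_hits: dict mapping an approach name to an iterable of gene symbols.
    """
    support = Counter()
    for genes in approach_hits.values():
        support.update(set(genes))  # count each gene once per approach
    return sorted(gene for gene, n in support.items() if n >= min_approaches)
```

Converting each list to a set first prevents a gene that appears twice in one approach's output from being counted as extra support.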


Subjects
Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Gene Expression Regulation, Neoplastic , Genes, Essential , Mutation , Precision Medicine , Animals , Biomarkers, Tumor/metabolism , Breast Neoplasms/metabolism , Female , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing , Humans , Mice , Prognosis , Protein Interaction Maps , Proteome , Tumor Cells, Cultured , Xenograft Model Antitumor Assays
8.
Int J Mol Sci ; 21(3)2020 Feb 05.
Article in English | MEDLINE | ID: mdl-32033398

ABSTRACT

Osteosarcoma is the most common subtype of primary bone cancer, affecting mostly adolescents. In recent years, several studies have focused on elucidating the molecular mechanisms of this sarcoma; however, its molecular etiology has still not been precisely determined. Therefore, we applied a consensus strategy using several bioinformatics tools to prioritize genes involved in its pathogenesis. Subsequently, we assessed the physical interactions of the selected genes and applied a communality analysis to this protein-protein interaction network. The consensus strategy prioritized a total of 553 genes. Our enrichment analysis supports several studies that describe the PI3K/AKT and MAPK/ERK signaling pathways as pathogenic. The gene ontology analysis described TP53 as a principal signal transducer that chiefly mediates processes associated with the cell cycle and the DNA damage response. Interestingly, the communality analysis clusters several members involved in metastasis events, such as MMP2 and MMP9, and genes associated with DNA repair complexes, such as ATM, ATR, CHEK1, and RAD51. In this study, we have identified well-known pathogenic genes for osteosarcoma and prioritized genes that need to be further explored.
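Pathway and gene ontology enrichment of a prioritized gene list is commonly assessed with a one-sided hypergeometric (Fisher) test. A minimal sketch, assuming that test; the abstract does not state which enrichment tool or background gene universe the authors used, and the function name is hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for the overlap between a gene list and a pathway.

    N: genes in the background universe
    K: background genes annotated to the pathway
    n: genes in the prioritized list
    k: prioritized genes annotated to the pathway
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Toy numbers: 3 of 4 listed genes fall in a 5-gene pathway out of 20 genes.
print(hypergeometric_enrichment_p(3, 4, 5, 20))
```

When many pathways are tested against the same list, the resulting p-values would normally be corrected for multiple testing (for example with Benjamini-Hochberg) before calling a pathway enriched.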


Subjects
Bone Neoplasms/genetics , Bone Neoplasms/pathology , Osteosarcoma/genetics , Osteosarcoma/pathology , Computational Biology/methods , Consensus , DNA Repair/genetics , Gene Expression Regulation, Neoplastic/genetics , Gene Ontology , Gene Regulatory Networks/genetics , Humans , Protein Interaction Maps/genetics , Signal Transduction/genetics
9.
Int J Mol Sci ; 20(18)2019 Sep 05.
Article in English | MEDLINE | ID: mdl-31491969

ABSTRACT

In this work, we improved a previous model used for the prediction of proteomes as new B-cell epitopes in vaccine design. The predicted epitope activity of a queried peptide is based on its sequence and on a known reference epitope sequence under specific experimental conditions. The peptide sequences were transformed into molecular descriptors of sequence recurrence networks and were combined with the experimental conditions. The new models were generated using 709,100 instances of paired descriptors for query and reference peptide sequences. Using perturbations of the initial descriptors under sequence or assay conditions, 10 transformed features were used as inputs for seven Machine Learning methods. The best model was obtained with random forest classifiers, with an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.981 ± 0.0005 for the external validation series (five-fold cross-validation). The database included information about 83,683 peptide sequences, 1448 epitope organisms, 323 host organisms, 15 types of in vivo processes, 28 experimental techniques, and 505 adjuvant additives. The current model could improve the in silico prediction of epitopes for vaccine design. The script and results are available as a free repository.
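AUROC values reported as mean ± standard deviation over cross-validation folds can be reproduced from per-fold scores. The sketch below computes AUROC as the probability that a randomly chosen positive is scored above a randomly chosen negative (the Mann-Whitney formulation, ties counting one half), then summarizes folds. It assumes the ± is the sample standard deviation across folds, which the abstract does not specify.

```python
import statistics


def auroc(scores, labels):
    """AUROC as P(random positive outscores random negative); ties count 0.5."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_and_sd(fold_values):
    """Mean and sample standard deviation across cross-validation folds."""
    return statistics.mean(fold_values), statistics.stdev(fold_values)
```

The pairwise loop is quadratic and meant for clarity; for large test sets a rank-based computation gives the same value much faster.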


Subjects
Epitope Mapping , Machine Learning , Peptides/immunology , Amino Acid Sequence , Humans , Peptides/chemistry , ROC Curve , Structure-Activity Relationship
10.
Chem Res Toxicol ; 32(9): 1811-1823, 2019 09 16.
Article in English | MEDLINE | ID: mdl-31327231

ABSTRACT

Predicting the ChEMBL biological activities of 1-(5-bromofur-2-yl)-2-bromo-2-nitroethene (G1) is a difficult task for cytokine immunotoxicity. The current study presents experimental results for the interaction of G1 with mouse Th1/Th2 and pro-inflammatory cytokines using a cytometric bead array (CBA). In the in vitro CBA test, the results show no significant differences between the mean values of the Th1/Th2 cytokines for the samples treated with G1 with respect to the negative control, but there are moderate differences in cytokine values between periods (24/48 h). The experiments show no significant differences between the mean values of the pro-inflammatory cytokines for the samples treated with G1 with respect to the negative control, except for tumor necrosis factor (TNF) and interleukin 6 (IL6), which differed between the G1-treated group and the negative control at 48 h. Differences also occur for these cytokines between periods (24/48 h). The study confirmed that the antimicrobial G1 did not alter the concentration of Th1/Th2 cytokines in vitro over different periods, but it can alter TNF and IL6. G1 promotes free radical production and activates damage processes in macrophage cultures. In order to predict all ChEMBL activities for drugs under other experimental conditions, a ChEMBL dataset was constructed using 25 biological activities, 1366 assays, 2 assay types, 4 assay organisms, 2 organisms, and 12 cytokine targets. Molecular descriptors calculated with Rcpi and 15 machine learning methods were used to find the best model able to predict whether a drug could be active against a specific cytokine under specific experimental conditions. The best model is based on 120 selected molecular descriptors and a deep neural network, with an area under the receiver operating characteristic curve of 0.904 and an accuracy of 0.832. This model predicted 1384 G1 biological activities against cytokines under all ChEMBL dataset experimental conditions.


Subjects
Anti-Bacterial Agents/pharmacology , Antifungal Agents/pharmacology , Cytokines/metabolism , Furans/pharmacology , Th1-Th2 Balance/drug effects , Animals , Decision Trees , Deep Learning , Discriminant Analysis , Female , Mice, Inbred BALB C , Th1 Cells/drug effects , Th2 Cells/drug effects
11.
Sci Rep ; 8(1): 16679, 2018 11 12.
Article in English | MEDLINE | ID: mdl-30420728

ABSTRACT

Consensus strategies have proved highly efficient in recognizing gene-disease associations. Therefore, the main objective of this study was to apply theoretical approaches to explore genes and communities directly involved in breast cancer (BC) pathogenesis. We evaluated the consensus between 8 prioritization strategies for the early recognition of pathogenic genes. A communality analysis in the protein-protein interaction (PPi) network of the selected genes was enriched with gene ontology and metabolic pathways, as well as oncogenomics validation with the OncoPPi and DRIVE projects. The consensus genes were rationally filtered to 1842 genes. The communality analysis showed an enrichment of 14 communities especially connected with the ERBB, PI3K-AKT, mTOR, FOXO, p53, HIF-1, VEGF, MAPK and prolactin signaling pathways. The genes with the highest ranking were TP53, ESR1, BRCA2, BRCA1 and ERBB2. The genes with the highest connectivity degree were TP53, AKT1, SRC, CREBBP and EP300. The connectivity degree allowed us to establish a significant correlation between the OncoPPi network and our BC integrated network, formed by 51 genes and 62 PPi. In addition, CCND1, RAD51, CDC42, YAP1 and RPA1 were functional genes with a significant sensitivity score in BC cell lines. In conclusion, the consensus strategy identifies both well-known pathogenic genes and prioritized genes that need to be further explored.
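The two rankings described above (a consensus over several prioritization strategies and the connectivity degree in the PPi network) can be sketched as follows. Borda-count aggregation is an assumption here, since the abstract does not describe how the 8 strategies were combined; the helper names are hypothetical.

```python
from collections import Counter


def borda_consensus(rankings):
    """Aggregate ranked gene lists (best first) with Borda points.

    In a list of length n, the gene at position i (0-based) gets n - i points;
    genes absent from a list get no points from it. Ties are broken alphabetically.
    """
    points = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            points[gene] += n - position
    return sorted(points, key=lambda gene: (-points[gene], gene))


def degree_ranking(edges):
    """Connectivity degree of each node in an undirected PPi edge list.

    Duplicate edges (in either direction) and self-loops are ignored.
    """
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))
```

Deduplicating with `frozenset` matters for interaction databases that list the same pair as both A-B and B-A, which would otherwise double a hub's degree.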


Subjects
Algorithms , Breast Neoplasms/metabolism , Female , Gene Expression Regulation, Neoplastic/genetics , Gene Expression Regulation, Neoplastic/physiology , Gene Regulatory Networks/genetics , Gene Regulatory Networks/physiology , Humans , Metabolic Networks and Pathways/genetics , Metabolic Networks and Pathways/physiology , Protein Binding , Signal Transduction/genetics , Signal Transduction/physiology
12.
J Chem Inf Model ; 57(5): 1029-1044, 2017 05 22.
Article in English | MEDLINE | ID: mdl-28414908

ABSTRACT

The study of the selective toxicity of carbon nanotubes (CNTs) on mitochondria (CNT-mitotoxicity) is of major interest for future biomedical applications. In the current work, mitochondrial oxygen consumption (E3) is measured under three experimental conditions after exposure to pristine and oxidized (hydroxylated and carboxylated) CNTs. Respiratory functional assays showed that information from CNT Raman spectroscopy could be useful to predict structural parameters of the mitotoxicity induced by CNTs. The in vitro functional assays show that mitochondrial oxidative phosphorylation by ATP synthase (or state V3 of respiration) was not perturbed in isolated rat-liver mitochondria. For the first time, a star graph (SG) transform of the CNT Raman spectra is proposed in order to obtain the raw information for a nano-QSPR model. Box-Jenkins and perturbation theory operators are applied to the SG Shannon entropies. A modified RRegrs methodology is employed to test four regression methods: multiple linear regression (LM), partial least squares regression (PLS), neural network regression (NN), and random forest (RF). RF provides the best models to predict mitochondrial oxygen consumption in the presence of specific CNTs, with R² of 0.998-0.999 and RMSE of 0.0068-0.0133 (training and test subsets). This work aims to demonstrate that the SG transform of Raman spectra is useful to encode CNT information, similarly to the SG transform of blood proteome spectra in cancer or of electroencephalograms in epilepsy, and that it is a prospective chemoinformatics tool for nanorisk assessment. All data files and R object models are available at https://dx.doi.org/10.6084/m9.figshare.3472349.
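The two regression metrics reported above can be computed as follows. The abstract does not say whether R² means the coefficient of determination or a squared correlation coefficient; this sketch assumes the coefficient of determination.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the response."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

RMSE is expressed in the same units as oxygen consumption, so it complements R², which is unitless and can look excellent even when absolute errors are not negligible.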


Subjects
Mitochondria/drug effects , Models, Biological , Nanotubes, Carbon/toxicity , Spectrum Analysis, Raman , Animals , Entropy , Linear Models , Male , Mitochondria/ultrastructure , Oxygen Consumption , Rats , Rats, Wistar
13.
Mol Biosyst ; 11(11): 2964-77, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26282280

ABSTRACT

An unbalanced intake ratio of Omega 6/Omega 3 (ω-6/ω-3) fatty acids could increase the occurrence of chronic diseases such as inflammation, atherosclerosis, or tumor proliferation, and methylation methods for measuring the fatty acid (FA) composition/distribution of the ruminal microbiome play a vital role in discovering the contribution of food components to ruminant products (e.g., meat and milk) when pursuing a healthy diet. Hansch's models based on Linear Free Energy Relationships (LFERs), which use physicochemical parameters such as partition coefficients, molar refractivity, and polarizability as input variables (Vk), are advocated. In this work, a new combined experimental and theoretical strategy was proposed to study the effect of ω-6/ω-3 ratios, FA chemical structure, and other factors on FA distribution networks in the ruminal microbiome. In step 1, experiments were carried out to measure long-chain fatty acid (LCFA) profiles in the rumen microbiome (bacterial and protozoan) and volatile fatty acids (VFAs) in fermentation media. In step 2, the proportions and physicochemical parameter values of LCFAs and VFAs were calculated under different boundary conditions (cj), such as c1 = acid and/or base methylation treatments, c2 = with/without fermentation, c3 = FA distribution phase (media, bacterial, or protozoan microbiome), etc. In step 3, Perturbation Theory (PT) and LFER ideas were combined to develop a PT-LFER model of a FA distribution network using the physicochemical parameters (Vk) and the corresponding Box-Jenkins (ΔVkj) and PT (ΔΔVkj) operators in statistical analysis. The best PT-LFER model found predicted the effects of perturbations on the FA distribution network with sensitivity, specificity, and accuracy > 80% for 407,655 cases in the training + external validation series. In step 4, alternative PT-LFER and PT-NLFER models were tested by training linear and non-linear Artificial Neural Networks (ANNs).
PT-NLFER models based on ANNs performed better but are more complicated than the PT-LFER model. Last, in step 5, the PT-LFER model based on LDA was used to reconstruct the complex networks of perturbations in the FA distribution, and the giant components of the observed and predicted networks were compared with random Erdős-Rényi network models. In short, our new PT-LFER model is a useful tool for predicting a distribution network in terms of specific fatty acid distribution.
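Step 5 compares the giant (largest connected) component of the observed and predicted networks with Erdős-Rényi random graphs. A minimal sketch, assuming the random graphs are G(n, p) models whose edge probability is chosen to match the observed network's density; that matching rule and the helper names are assumptions, not the authors' stated procedure.

```python
import random
from collections import Counter


def erdos_renyi_edges(n, p, seed=None):
    """G(n, p): each of the n*(n-1)/2 possible edges is kept with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def giant_component_size(n, edges):
    """Size of the largest connected component (union-find over nodes 0..n-1)."""
    if n == 0:
        return 0
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return max(Counter(find(i) for i in range(n)).values())


def density_matched_p(n_nodes, n_edges):
    """Edge probability giving a random graph the same expected edge count."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))
```

For an observed network with `n` nodes and `m` edges, one would sample many graphs with `p = density_matched_p(n, m)` and different seeds, then check where the observed giant-component size falls in that distribution.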


Subjects
Computer Simulation , Fatty Acids/metabolism , Animals , Bacteria/metabolism , Catalysis , Fatty Acids, Omega-3/metabolism , Fatty Acids, Volatile/analysis , Male , Methylation , Microbiota , Rumen/microbiology , Sheep
14.
Mol Inform ; 34(11-12): 736-41, 2015 11.
Article in English | MEDLINE | ID: mdl-27491034

ABSTRACT

Nucleotide binding proteins are involved in many important cellular processes, such as the transmission of genetic information and energy transfer and storage. Therefore, the screening of new peptides for this biological function is an important research topic. The current study proposes a mixed methodology to obtain the first classification model able to predict new nucleotide binding peptides using only the amino acid sequence. The methodology uses star graph molecular descriptors of the peptide sequences and machine learning techniques to find the best classifier. The best model is a Random Forest classifier based on two features of the embedded and non-embedded graphs. The performance of the model is excellent compared with similar models in the field, with an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.938 and a true positive rate (TPR) of 0.886 (test subset). The prediction of new nucleotide binding peptides with this model could be useful for drug target studies in drug development.


Subjects
Machine Learning , Models, Molecular , Nucleotides/chemistry , Peptides/chemistry
15.
J Chem Inf Model ; 54(3): 744-55, 2014 Mar 24.
Article in English | MEDLINE | ID: mdl-24521170

ABSTRACT

This work aims to describe the workflow for a methodology that combines chemoinformatics and pharmacoepidemiology methods and to report the first predictive model developed with this methodology. The new model is able to predict complex networks of AIDS prevalence in US counties, taking into consideration social determinants and the activity/structure of anti-HIV drugs in preclinical assays. We trained different Artificial Neural Networks (ANNs) using information indices of social networks and molecular graphs as input. We used a Shannon information index based on the Gini coefficient to quantify the effect of income inequality in the social network. We obtained the data on AIDS prevalence and the Gini coefficient from the AIDSVu database of Emory University. We also used the Balaban information indices to quantify changes in the chemical structure of anti-HIV drugs. We obtained the data on anti-HIV drug activity and structure (SMILES codes) from the ChEMBL database. Last, we used Box-Jenkins moving average operators to quantify information about the deviations of drugs with respect to reference data subsets (targets, organisms, experimental parameters, protocols). The best model found was a Linear Neural Network (LNN) with Accuracy, Specificity, and Sensitivity above 0.76 and AUROC > 0.80 in the training and external validation series. This model generates a complex network of AIDS prevalence in the US at the county level with respect to the activity of anti-HIV drugs in preclinical assays. To train/validate the model and predict the complex network, we needed to analyze 43,249 data points, including values of AIDS prevalence in 2,310 US counties vs ChEMBL results for 21,582 unique drugs, 9 viral or human protein targets, 4,856 protocols, and 10 possible experimental measures.
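The Gini coefficient used as the income-inequality input can be computed from raw non-negative values (for example, household incomes) as the mean absolute difference between all pairs divided by twice the mean. A minimal sketch; the AIDSVu data already provide county-level Gini values, and how the authors turned them into a Shannon information index is not specified in the abstract, so that step is not shown.

```python
def gini(values):
    """Gini coefficient via the mean absolute difference.

    0 = perfect equality; values approach 1 as one member holds everything.
    Assumes non-negative values.
    """
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


print(gini([1, 1, 1, 1]))  # 0.0
print(gini([0, 0, 0, 1]))  # 0.75
```

The double loop is O(n²) and kept for readability; a sorted-values formula gives the same result in O(n log n) for large populations.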


Subjects
Acquired Immunodeficiency Syndrome/drug therapy , Acquired Immunodeficiency Syndrome/epidemiology , Anti-HIV Agents/therapeutic use , Algorithms , Animals , Anti-HIV Agents/chemistry , Databases, Factual , Drug Evaluation, Preclinical , HIV/drug effects , HIV/isolation & purification , Humans , Models, Statistical , Neural Networks, Computer , Prevalence , Social Support , United States/epidemiology
16.
Mol Inform ; 33(4): 276-85, 2014 Apr.
Article in English | MEDLINE | ID: mdl-27485774

ABSTRACT

Lectins (Ls) play an important role in many diseases, such as different types of cancer and parasitic infections. Interestingly, the Protein Data Bank (PDB) contains more than 3000 protein 3D structures with unknown function. Thus, we can, in principle, discover new Ls by mining non-annotated structures from the PDB or other sources. However, there are no general models to predict new biologically relevant Ls based on 3D chemical structures. We used the MARCH-INSIDE software to calculate the Markov-Shannon 3D electrostatic entropy parameters for the complex networks of protein structure of 2200 different protein 3D structures, including 1200 Ls. We performed a Linear Discriminant Analysis (LDA) using these parameters as inputs in order to seek a new Quantitative Structure-Activity Relationship (QSAR) model able to discriminate the 3D structures of Ls from those of other proteins. We implemented this predictor in the web server named LECTINPred, freely available at http://bio-aims.udc.es/LECTINPred.php. This web server showed the following goodness-of-fit statistics: Sensitivity = 96.7% (for Ls), Specificity = 87.6% (non-active proteins), and Accuracy = 92.5% (for all proteins), considering the training and external prediction series together. In mode 2, users can carry out an automatic retrieval of protein structures from the PDB. We illustrated the use of this server in operation mode 1 by performing data mining of the PDB. We predicted Ls scores for more than 2000 proteins with unknown function and selected the top-scored ones as possible lectins. In operation mode 2, LECTINPred can also upload 3D structural models generated with structure-prediction tools such as LOMETS or PHYRE2. The new Ls are expected to be relevant as cancer biomarkers or useful in parasite vaccine design.
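A two-class LDA classifier and the sensitivity, specificity, and accuracy statistics quoted above can be sketched as follows. This is a minimal Fisher LDA restricted to two features so that the 2x2 within-class scatter matrix can be inverted in closed form; it assumes equal class priors (threshold at the midpoint of the projected class means). The MARCH-INSIDE parameters themselves are not reproduced, and the helper names are hypothetical.

```python
def fit_fisher_lda_2d(X, y):
    """Two-class Fisher LDA on 2-feature points; returns (weights, threshold)."""
    groups = {c: [p for p, t in zip(X, y) if t == c] for c in (0, 1)}
    means = {
        c: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for c, pts in groups.items()
    }
    # Pooled within-class scatter matrix S = [[a, b], [b, d]].
    a = b = d = 0.0
    for c, pts in groups.items():
        mx, my = means[c]
        for px, py in pts:
            dx, dy = px - mx, py - my
            a += dx * dx
            b += dx * dy
            d += dy * dy
    det = a * d - b * b
    if det == 0:
        raise ValueError("singular within-class scatter matrix")
    diff_x = means[1][0] - means[0][0]
    diff_y = means[1][1] - means[0][1]
    # w = S^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    w = ((d * diff_x - b * diff_y) / det, (-b * diff_x + a * diff_y) / det)

    def project(m):
        return w[0] * m[0] + w[1] * m[1]

    threshold = (project(means[0]) + project(means[1])) / 2  # equal priors assumed
    return w, threshold


def predict(w, threshold, point):
    """1 = class of interest (e.g. lectin), 0 = other protein."""
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }
```

With many input parameters, a general matrix-based LDA would replace the 2x2 closed form, but the decision rule (project, then compare with a threshold) stays the same.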

17.
Mol Biosyst ; 8(6): 1716-22, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22466084

ABSTRACT

Fast cancer diagnosis is a real necessity in applied medicine because of the importance of this disease, and theoretical models can help as prediction tools. Graph theory representation is one option because it allows any real system, such as protein macromolecules, to be described numerically by transforming real properties into molecular graph topological indices. This study proposes a new classification model for proteins linked with human colon cancer using spiral graph topological indices of protein amino acid sequences. The best quantitative structure-disease relationship model is based on eleven Shannon entropy indices. It was obtained with the Naïve Bayes method and shows excellent predictive ability (90.92%) for new proteins linked with this type of cancer. The statistical analysis confirms that this model can diagnose the absence of human colon cancer, with an area under the receiver operating characteristic curve (AUROC) of 0.91. The methodology presented can be used for any type of sequential information, such as any protein or nucleic acid sequence.
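The colon-cancer model above was obtained with Naïve Bayes over Shannon entropy indices. A minimal Gaussian Naïve Bayes classifier in plain Python shows the idea. The Gaussian likelihood, the small variance floor, the helper names and the two-feature toy data are assumptions; the features merely stand in for the spiral-graph entropy indices, which are not computed here.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature means/variances (Gaussian Naive Bayes)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small floor avoids division by zero for constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (features assumed independent)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score -= 0.5 * math.log(2 * math.pi * v) + (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical entropy-index vectors; 1 = linked with colon cancer, 0 = not linked.
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))  # → 1
print(predict_gaussian_nb(model, [3.1, 0.5]))  # → 0
```

Naïve Bayes treats each index as conditionally independent given the class, which keeps training to simple per-class means and variances.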


Subjects
Biomarkers, Tumor/chemistry, Colonic Neoplasms/chemistry, Computational Biology/methods, Models, Biological, Proteins/chemistry, Amino Acid Sequence, Area Under Curve, Bayes Theorem, Biomarkers, Tumor/analysis, Colonic Neoplasms/diagnosis, Entropy, Humans, Molecular Sequence Data, Proteins/analysis, Quantitative Structure-Activity Relationship, ROC Curve, Sequence Analysis, Protein/methods
18.
Mol Biosyst ; 8(3): 851-62, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22234525

ABSTRACT

Lipid-Binding Proteins (LIBPs) or Fatty Acid-Binding Proteins (FABPs) play an important role in many diseases, such as different types of cancer, kidney injury, atherosclerosis, diabetes, intestinal ischemia and parasitic infections. Thus, computational methods that can predict LIBPs from 3D structure parameters have become a goal of major importance for drug-target discovery, vaccine design and biomarker selection. In addition, the Protein Data Bank (PDB) contains 3000+ protein 3D structures with unknown function. This list, as well as new experimental outcomes in proteomics research, is a valuable source for discovering relevant proteins, including LIBPs. However, to the best of our knowledge, there are no general models to predict new LIBPs based on 3D structures. We developed new Quantitative Structure-Activity Relationship (QSAR) models based on 3D electrostatic parameters of 1801 different proteins, including 801 LIBPs. We calculated these electrostatic parameters with the MARCH-INSIDE software; they correspond to the entire protein or to specific protein regions named core, inner, middle, and surface. We used these parameters as inputs to develop a simple Linear Discriminant Analysis (LDA) classifier that discriminates the 3D structures of LIBPs from those of other proteins. We implemented this predictor in the web server named LIBP-Pred, freely available at , along with other important web servers of the Bio-AIMS portal. Users can carry out an automatic retrieval of protein structures from PDB or upload, from their disk, custom protein structural models created with the LOMETS server. We demonstrated the PDB mining option by performing a predictive study of 2000+ proteins with unknown function. Interesting results regarding the discovery of new cancer biomarkers in humans or drug targets in parasites are also discussed.
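LIBP-Pred (like LECTINPred) rests on a two-class Linear Discriminant Analysis. A minimal Fisher LDA in plain Python shows how such a classifier turns descriptor vectors into a single score. It assumes equal class priors (threshold at the midpoint of the class means), a pooled within-class covariance, hypothetical helper names, and made-up two-dimensional descriptors in place of the MARCH-INSIDE electrostatic parameters.

```python
def mean_vector(rows):
    """Column-wise mean of a list of descriptor vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(groups):
    """Pooled within-class covariance matrix over all classes."""
    d = len(groups[0][0])
    S = [[0.0] * d for _ in range(d)]
    total = 0
    for rows in groups:
        mu = mean_vector(rows)
        for r in rows:
            diff = [a - b for a, b in zip(r, mu)]
            for i in range(d):
                for j in range(d):
                    S[i][j] += diff[i] * diff[j]
        total += len(rows)
    dof = total - len(groups)
    return [[v / dof for v in row] for row in S]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_lda(positives, negatives):
    """Fisher LDA: w = S^-1 (mu_pos - mu_neg); bias puts the threshold at the midpoint."""
    mu_p, mu_n = mean_vector(positives), mean_vector(negatives)
    S = pooled_covariance([positives, negatives])
    w = solve(S, [a - b for a, b in zip(mu_p, mu_n)])
    midpoint = [(a + b) / 2 for a, b in zip(mu_p, mu_n)]
    bias = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, bias


def lda_score(w, bias, x):
    """Positive score -> predicted positive class (e.g. lipid-binding protein)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


libps = [[2.0, 1.0], [2.5, 1.2], [3.0, 0.8], [2.2, 1.1]]   # hypothetical LIBP descriptors
others = [[0.5, 2.0], [0.8, 2.4], [0.2, 1.9], [0.6, 2.2]]  # hypothetical non-LIBP descriptors
w, bias = fit_lda(libps, others)
print(lda_score(w, bias, [2.4, 1.0]) > 0)  # → True
print(lda_score(w, bias, [0.5, 2.1]) > 0)  # → False
```

Because the discriminant is linear, each prediction is just a weighted sum of descriptors, which keeps a web server like this cheap to evaluate.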


Subjects
Biomarkers, Tumor/chemistry, Data Mining/methods, Databases, Protein, Internet, Neoplasms/metabolism, Proteins/chemistry, Software, Animals, Humans, Models, Molecular, Parasites/metabolism, Parasitic Diseases, Proteins/metabolism
19.
Mol Biosyst ; 7(6): 1938-55, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21468430

ABSTRACT

Infections caused by human parasites (HPs) affect the poorest 500 million people worldwide, but chemotherapy has become expensive, toxic, and/or less effective due to drug resistance. On the other hand, many 3D structures in the Protein Data Bank (PDB) remain without function annotation. We need theoretical models to quickly predict biologically relevant Parasite Self Proteins (PSPs): proteins that are expressed differentially in a given parasite, are dissimilar to proteins expressed in other parasites, and have a high probability of becoming new vaccines (unique sequence) or drug targets (unique 3D structure). We present herein a model for PSPs in eight different HPs (Ascaris, Entamoeba, Fasciola, Giardia, Leishmania, Plasmodium, Trypanosoma, and Toxoplasma) with 90% accuracy for 15,341 training and validation cases. The model combines protein residue networks, Markov Chain Models (MCM) and Artificial Neural Networks (ANN). The input parameters are the spectral moments of the Markov transition matrix for electrostatic interactions associated with the protein residue complex network, calculated with the MARCH-INSIDE software. We implemented this model in a new web server called MISS-Prot (MARCH-INSIDE Scores for Self-Proteins). MISS-Prot was programmed using PHP/HTML/Python and MARCH-INSIDE routines and is freely available at: . The server is easy for non-experts in bioinformatics to use: they can carry out automatic online upload and prediction with 3D structures deposited in PDB (mode 1). Outcomes of Peptide Mass Fingerprinting (PMF) and MS/MS for query proteins with unknown 3D structures can also be studied (mode 2). We illustrated the use of MISS-Prot in experimental and/or theoretical studies of peptides from Fasciola hepatica cathepsin proteases or present on 10 Anisakis simplex allergens (Ani s 1 to Ani s 10).
In doing so, we combined one-dimensional electrophoresis (1DE), MALDI-TOF mass spectrometry, and MASCOT to seek sequences, Molecular Mechanics + Molecular Dynamics (MM/MD) to generate 3D structures, and MISS-Prot to predict PSP scores. MISS-Prot also allows the prediction of PSP proteins in 16 additional species, including parasite hosts, fungal pathogens, disease transmission vectors, and biotechnologically relevant organisms.
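The spectral moments used as MISS-Prot inputs can be illustrated with a short plain-Python sketch. It assumes the common definition pi_k = Tr(Π^k), the trace of the k-th power of the row-normalized (stochastic) transition matrix Π, and uses a hypothetical three-residue interaction matrix instead of real electrostatic interactions; the exact weighting used by MARCH-INSIDE is not reproduced here, and the function names are illustrative.

```python
def transition_matrix(weights):
    """Row-normalise a non-negative interaction matrix into a stochastic matrix."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([w / total for w in row])
    return result


def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    inner = len(B)
    return [[sum(A[i][t] * B[t][j] for t in range(inner)) for j in range(len(B[0]))]
            for i in range(len(A))]


def spectral_moments(weights, max_k):
    """Return [pi_1, ..., pi_max_k] with pi_k = trace(P**k)."""
    P = transition_matrix(weights)
    power = P
    moments = []
    for _ in range(max_k):
        moments.append(sum(power[i][i] for i in range(len(power))))
        power = matmul(power, P)
    return moments


toy_weights = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # hypothetical residue contacts
print(spectral_moments(toy_weights, 3))  # → [0.0, 1.5, 0.75]
```

Since the trace of Π^k is the sum of its eigenvalues raised to the k-th power, the toy matrix (eigenvalues 1, -1/2, -1/2) gives pi_k = 1 + 2(-1/2)^k, a handy cross-check; each pi_k also sums the probabilities of returning to the starting residue after k steps.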


Subjects
Allergens/chemistry, Anisakis/chemistry, Antigens, Helminth/chemistry, Fasciola hepatica/metabolism, Helminth Proteins/chemistry, Online Systems, Peptides/chemistry, Algorithms, Amino Acid Sequence, Animals, Cathepsin L/chemistry, Computational Biology, Computer Simulation, Discriminant Analysis, Fasciola hepatica/chemistry, Humans, Internet, Markov Chains, Models, Molecular, Molecular Sequence Data, Neural Networks, Computer, Protein Structure, Tertiary, ROC Curve, Software
20.
J Theor Biol ; 276(1): 229-49, 2011 May 07.
Article in English | MEDLINE | ID: mdl-21277861

ABSTRACT

There are many protein ligands and/or drugs described with very different affinities for a large number of target proteins or receptors. In this work, we selected drug-target pairs with high affinity (DTPs) and without affinity (nDTPs) for different targets. Quantitative Structure-Activity Relationship (QSAR) models are a very useful tool in this context to substantially reduce time- and resource-consuming experiments. Unfortunately, most QSAR models predict activity against only one protein target and/or have not been implemented as a public web server freely accessible online to the scientific community. To solve this problem, we developed a multi-target QSAR (mt-QSAR) classifier using the MARCH-INSIDE technique to calculate structural parameters of drug and target, plus an Artificial Neural Network (ANN) to seek the model. The best ANN model found is a Multi-Layer Perceptron (MLP) with profile MLP 20:20-15-1:1. This MLP correctly classifies 611 out of 678 DTPs (sensitivity = 90.12%) and 3083 out of 3408 nDTPs (specificity = 90.46%), corresponding to a training accuracy of 90.41%. The model was validated with external prediction series: it correctly classifies 310 out of 338 DTPs (sensitivity = 91.72%) and 1527 out of 1674 nDTPs (specificity = 91.22%), corresponding to a total accuracy of 91.30% for the validation series (predictability). This model compares favorably with other ANN models developed in this work and with previously published Machine Learning classifiers that address different aspects of the same problem. We implemented the present model on the Bio-AIMS web portal as an online server called Non-Linear MARCH-INSIDE Nested Drug-Bank Exploration & Screening Tool (NL MIND-BEST), located at http://miaja.tic.udc.es/Bio-AIMS/NL-MIND-BEST.php. This online tool is based on PHP/HTML/Python and MARCH-INSIDE routines.
Finally, we illustrated two practical uses of this server with two different experiments. In experiment 1, we report for the first time a quantum QSAR study, synthesis, characterization, and experimental assay of the antiplasmodial and cytotoxic activities of oxoisoaporphine alkaloid derivatives, as well as NL MIND-BEST prediction of their potential target proteins. In experiment 2, we report sampling, parasite culture, sample preparation, 2-DE, MALDI-TOF and MALDI-TOF/TOF MS, MASCOT search, MM/MD 3D structure modeling, and NL MIND-BEST prediction for different peptides of a new protein found in the proteome of the human parasite Giardia lamblia, which is promising for anti-parasite drug-target discovery.
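The sensitivity, specificity and accuracy figures reported for NL MIND-BEST follow directly from the stated counts. This short Python check recomputes them; the helper name is hypothetical, and accuracy is taken as the fraction of all pairs (DTPs plus nDTPs) classified correctly.

```python
def classification_metrics(tp, pos_total, tn, neg_total):
    """Return (sensitivity, specificity, accuracy) from correct counts and class totals."""
    sensitivity = tp / pos_total                       # correctly classified DTPs
    specificity = tn / neg_total                       # correctly classified nDTPs
    accuracy = (tp + tn) / (pos_total + neg_total)     # all pairs
    return sensitivity, specificity, accuracy


train = classification_metrics(611, 678, 3083, 3408)
valid = classification_metrics(310, 338, 1527, 1674)
for name, m in (("training", train), ("validation", valid)):
    print(name, " ".join(f"{100 * v:.2f}%" for v in m))
# training 90.12% 90.46% 90.41%
# validation 91.72% 91.22% 91.30%
```

Note that overall accuracy is dominated by the much larger nDTP class, so reporting sensitivity alongside it matters.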


Subjects
Antimalarials/pharmacology, Computational Biology/methods, Giardia lamblia/metabolism, Internet, Plasmodium falciparum/drug effects, Protozoan Proteins/chemistry, Antimalarials/chemistry, Aporphines/chemistry, Aporphines/pharmacology, Artificial Intelligence, Cell Death/drug effects, Drug Evaluation, Preclinical, Electrophoresis, Gel, Two-Dimensional, Giardia lamblia/drug effects, HeLa Cells, Humans, Ligands, Mass Spectrometry, Models, Chemical, Molecular Dynamics Simulation, Neural Networks, Computer, Nonlinear Dynamics, Peptides/chemistry, Proteome/chemistry, Quantitative Structure-Activity Relationship, ROC Curve