Results 1 - 20 of 21
1.
BMC Bioinformatics ; 15: 228, 2014 Jun 30.
Article in English | MEDLINE | ID: mdl-24980787

ABSTRACT

BACKGROUND: Knockdown or overexpression of genes is widely used to identify genes that play important roles in many aspects of cellular functions and phenotypes. Because next-generation sequencing generates high-throughput data that allow us to detect genes, it is important to identify genes that drive functional and phenotypic changes of cells. However, conventional methods rely heavily on the assumption of normality and they often give incorrect results when the assumption is not true. To relax the Gaussian assumption in causal inference, we introduce the non-paranormal method to test conditional independence in the PC-algorithm. Then, we present the non-paranormal intervention-calculus when the directed acyclic graph (DAG) is absent (NPN-IDA), which incorporates the cumulative nature of effects through a cascaded pathway via causal inference for ranking causal genes against a phenotype with the non-paranormal method for estimating DAGs. RESULTS: We demonstrate that causal inference with the non-paranormal method significantly improves the performance in estimating DAGs on synthetic data in comparison with the original PC-algorithm. Moreover, we show that NPN-IDA outperforms the conventional methods in exploring regulators of the flowering time in Arabidopsis thaliana and regulators that control the browning of white adipocytes in mice. Our results show that performance improvement in estimating DAGs contributes to an accurate estimation of causal effects. CONCLUSIONS: Although the simplest alternative procedure was used, our proposed method enables us to design efficient intervention experiments and can be applied to a wide range of research purposes, including drug discovery, because of its generality.


Subjects
Algorithms, Genetic Techniques, Brown Adipocytes/cytology, Brown Adipocytes/metabolism, White Adipocytes/cytology, White Adipocytes/metabolism, Animals, Arabidopsis/genetics, Statistical Data Interpretation, Gene Knockdown Techniques, High-Throughput Nucleotide Sequencing, Mice, Normal Distribution, Phenotype, Regression Analysis
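
The non-paranormal idea in record 1 relaxes the Gaussian assumption by transforming each variable to normal scores before conditional independence is tested. The following is a minimal sketch, assuming a plain rank-based transform (rank / (n + 1) passed through the standard normal quantile function). The name `npn_transform` is hypothetical, and the abstract does not state the exact estimator the authors used, so this is only an illustration of the general technique.

```python
from statistics import NormalDist


def npn_transform(values):
    """Map a sample to normal scores via its ranks (ties share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # rank / (n + 1) keeps the argument strictly inside (0, 1).
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


skewed = [0.1, 0.2, 0.4, 5.0, 120.0]  # heavily right-skewed toy sample
scores = npn_transform(skewed)
```

Because only ranks are used, any strictly increasing distortion of the raw measurements leaves the transformed values unchanged, which is the property that makes correlation-based tests on the transformed data less sensitive to non-normal marginals.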
2.
MAbs ; 15(1): 2244214, 2023.
Article in English | MEDLINE | ID: mdl-37605371

ABSTRACT

Antibodies are one of the predominant treatment modalities for various diseases. To improve the characteristics of a lead antibody, such as antigen-binding affinity and stability, we conducted comprehensive substitutions and exhaustively explored their sequence space. However, it is practically unfeasible to evaluate all possible combinations of mutations owing to combinatorial explosion when multiple amino acid residues are incorporated. It was recently reported that a machine-learning guided protein engineering approach such as Thompson sampling (TS) has been used to efficiently explore sequence space in the framework of Bayesian optimization. For TS, over-exploration occurs when the initial data are biasedly distributed in the vicinity of the lead antibody. We handle a large-scale virtual library that includes numerous mutations. When the number of experiments is limited, this over-exploration causes a serious issue. Thus, we conducted Monte Carlo Thompson sampling (MTS) to balance the exploration-exploitation trade-off by defining the posterior distribution via the Monte Carlo method and compared its performance with TS in antibody engineering. Our results demonstrated that MTS largely outperforms TS in discovering desirable candidates at an earlier round when over-exploration occurs on TS. Thus, the MTS method is a powerful technique for efficiently discovering antibodies with desired characteristics when the number of rounds is limited.


Subjects
Antibodies, Protein Engineering, Bayes Theorem, Monte Carlo Method, Antibodies/chemistry, Protein Engineering/methods
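
Record 2 builds on Thompson sampling (TS). As a point of reference, here is a minimal sketch of textbook Beta-Bernoulli Thompson sampling on a toy problem with three candidate "arms". This is not the authors' MTS method or their antibody model; the success rates, the binary readout, and the function name `thompson_sampling` are all hypothetical.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was chosen."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior.
        samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        pulls[arm] += 1
        # Simulated experiment: a hypothetical binary "hit / no hit" readout.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


pulls = thompson_sampling([0.1, 0.3, 0.9], rounds=1000)
```

Arms with few observations keep wide posteriors and are still sampled occasionally, which is the exploration behaviour the abstract describes as problematic when the number of experimental rounds is small.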
3.
Sci Rep ; 11(1): 5852, 2021 Mar 12.
Article in English | MEDLINE | ID: mdl-33712669

ABSTRACT

Molecular evolution is an important step in the development of therapeutic antibodies. However, the current method of affinity maturation is overly costly and labor-intensive because of the repetitive mutation experiments needed to adequately explore sequence space. Here, we employed a sequence generation and prioritization procedure based on a long short-term memory (LSTM) network, a widely used deep generative model, to efficiently discover antibody sequences with higher affinity. We applied our method to the affinity maturation of antibodies against kynurenine, which is a metabolite related to the niacin synthesis pathway. Kynurenine-binding sequences were enriched through phage display panning using a kynurenine-binding oriented human synthetic Fab library. We defined binding antibodies using a sequence repertoire from the NGS data to train the LSTM model. We confirmed that the likelihood of generated sequences from a trained LSTM correlated well with binding affinity. The affinity of the generated sequences is over 1800-fold higher than that of the parental clone. Moreover, compared to frequency-based screening using the same dataset, our machine learning approach generated sequences with greater affinity.


Subjects
Algorithms, Antibodies/immunology, Antibody Affinity/immunology, Cell Surface Display Techniques, Protein Engineering, Amino Acid Sequence, Protein Databases, High-Throughput Nucleotide Sequencing, Humans, Likelihood Functions, Machine Learning, Reproducibility of Results
4.
Stat Appl Genet Mol Biol ; 8: Article20, 2009.
Article in English | MEDLINE | ID: mdl-19409064

ABSTRACT

In clinical outcome prediction, such as disease diagnosis and prognosis, it is often assumed that the class, e.g., disease and control, is equally distributed. However, in practice we often encounter biological or clinical data whose class distribution is highly skewed. Since standard supervised learning algorithms intend to maximize the overall prediction accuracy, a prediction model tends to show a strong bias toward the majority class when it is trained on such imbalanced data. Therefore, the class distribution should be incorporated appropriately to learn from imbalanced data. To address this practically important problem, we proposed balanced gradient boosting (BalaBoost) which reformulates gradient boosting to avoid the overfitting to the majority class and is sensitive to the minority class by making use of the equal class distribution instead of the empirical class distribution. We applied BalaBoost to cancer tissue diagnosis based on miRNA expression data, premature death prediction for diabetes patients based on biochemical and clinical variables and tumor grade prediction of renal cell carcinoma based on tumor marker expressions whose class distribution is highly skewed. Experimental results showed that BalaBoost outperformed the representative supervised learning algorithms, i.e., gradient boosting, Random Forests and Support Vector Machine. Our results led us to the conclusion that BalaBoost is promising for clinical outcome prediction from imbalanced data.


Subjects
Algorithms, Statistical Data Interpretation, Diabetes Mellitus/diagnosis, Neoplasms/diagnosis, Renal Cell Carcinoma/diagnosis, Renal Cell Carcinoma/genetics, Renal Cell Carcinoma/pathology, Diabetes Mellitus/genetics, Diabetes Mellitus/mortality, Gene Expression Profiling/statistics & numerical data, Humans, Kidney Neoplasms/diagnosis, Kidney Neoplasms/genetics, Kidney Neoplasms/pathology, MicroRNAs/genetics, Statistical Models, Neoplasm Staging/methods, Neoplasms/genetics, Prognosis, Reproducibility of Results, Survival Analysis
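
Record 4 describes replacing the empirical class distribution with an equal class distribution. One common way to express that idea, shown here as a sketch and not as the BalaBoost algorithm itself, is to weight samples so that every class carries the same total weight. The function name `balanced_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_weights(labels):
    """Per-sample weights under which every class carries equal total weight."""
    counts = Counter(labels)
    n_classes = len(counts)
    # Each class receives 1 / n_classes in total, split evenly among its samples.
    return [1.0 / (n_classes * counts[y]) for y in labels]


labels = ["control"] * 9 + ["disease"] * 1  # 9:1 imbalance
weights = balanced_weights(labels)
```

With these weights the single minority sample counts as much as the nine majority samples together, so a learner that minimizes weighted error can no longer reach a low loss simply by predicting the majority class.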
5.
Biochim Biophys Acta ; 1784(5): 764-72, 2008 May.
Article in English | MEDLINE | ID: mdl-18359300

ABSTRACT

Hepatocellular carcinoma (HCC) is one of the most common and aggressive human malignancies. Although several major risks related to HCC, e.g., hepatitis B and/or hepatitis C virus infection, aflatoxin B1 exposure, alcohol drinking and genetic defects have been revealed, the molecular mechanisms leading to the initiation and progression of HCC have not been clarified. To reduce the mortality and improve the effectiveness of therapy, it is important to detect the proteins which are associated with tumor progression and may be useful as potential therapeutic or diagnostic targets. However, previous studies have not yet revealed the associations among HCC cells, histological grade and AFP. Here, we performed two-dimensional difference gel electrophoresis (2D-DIGE) combined with MS for 18 HCC patients. To focus not on individual proteins but on multiple proteins associated with pathogenesis, we introduce supervised feature selection based on stochastic gradient boosting (SGB) for identifying, without arbitrariness, protein spots that discriminate HCC/non-HCC, histological grade of moderate/well and high/low alpha-fetoprotein (AFP) level. We detected 18, 25 and 27 protein spots associated with HCC, histological grade and AFP level, respectively. We confirmed that SGB is able to identify the known HCC-related proteins, e.g., heat shock proteins and carbonic anhydrase 2. Moreover, we identified the differentially expressed proteins associated with histological grade of HCC and AFP level and found that aldo-keto reductase 1B10 (AKR1B10) is related to well differentiated HCC, keratin 8 (KRT8) is related to both histological grade and AFP level and protein disulfide isomerase-associated 3 (PDIA3) is associated with both HCC and AFP level. Our pilot study provides new insights into the pathogenesis of HCC, histological grade and AFP level.


Subjects
Hepatocellular Carcinoma/chemistry, Liver Neoplasms/chemistry, Proteomics, Adult, Aged, Two-Dimensional Gel Electrophoresis, Female, Humans, Male, Middle Aged, Neoplasm Proteins/analysis, alpha-Fetoproteins/metabolism
6.
Sci Rep ; 9(1): 19585, 2019 Dec 20.
Article in English | MEDLINE | ID: mdl-31863054

ABSTRACT

Potential inhibitors of a target biomolecule, NAD-dependent deacetylase Sirtuin 1, were identified by a contest-based approach, in which participants were asked to propose a prioritized list of 400 compounds from a designated compound library containing 2.5 million compounds using in silico methods and scoring. Our aim was to identify target enzyme inhibitors and to benchmark computer-aided drug discovery methods under the same experimental conditions. Collecting compound lists derived from various methods is advantageous for aggregating compounds with structurally diversified properties compared with the use of a single method. The inhibitory action on Sirtuin 1 of approximately half of the proposed compounds was experimentally assessed. Ultimately, seven structurally diverse compounds were identified.

7.
Biochem Biophys Res Commun ; 366(1): 186-92, 2008 Feb 01.
Article in English | MEDLINE | ID: mdl-18060859

ABSTRACT

Proteome analysis of human hepatocellular carcinoma (HCC) was done using two-dimensional difference gel electrophoresis. To gain an understanding of the molecular events accompanying HCC development, we compared the protein expression profiles of HCC and non-HCC tissue from 14 patients to the mRNA expression profiles of the same samples made from a cDNA microarray. A total of 125 proteins were identified, and the expression profiles of 93 proteins (149 spots) were compared to the mRNA expression profiles. The overall protein expression ratios correlated well with the mRNA ratios between HCC and non-HCC (Pearson's correlation coefficient: r=0.73). Particularly, the HCC/non-HCC expression ratios of proteins involved in metabolic processes showed significant correlation to those of mRNA (r=0.9). A considerable number of proteins were expressed as multiple spots. Among them, several proteins showed spot-to-spot differences in expression level and their expression ratios between HCC and non-HCC poorly correlated to mRNA ratios. Such multi-spotted proteins might arise as a consequence of post-translational modifications.


Subjects
Tumor Biomarkers/metabolism, Hepatocellular Carcinoma/metabolism, Liver Neoplasms/metabolism, Liver/metabolism, Neoplasm Proteins/metabolism, Proteome/metabolism, Transcription Factors/metabolism, Aged, Female, Gene Expression Profiling, Humans, Male, Middle Aged, Cultured Tumor Cells
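
Record 7 summarizes the agreement between protein and mRNA expression ratios with Pearson's correlation coefficient. A minimal sketch of that statistic follows; the paired values are made-up illustrations, not data from the study, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical log2 HCC/non-HCC ratios for four proteins and their mRNAs.
protein_ratio = [1.0, 2.0, 3.0, 4.0]
mrna_ratio = [2.0, 4.0, 6.0, 8.5]
r = pearson_r(protein_ratio, mrna_ratio)
```

Expression ratios are usually compared on a log scale so that up- and down-regulation are treated symmetrically; whether the study did so is not stated in the abstract.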
8.
Comput Biol Chem ; 32(6): 438-41, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18789768

ABSTRACT

Alzheimer's disease (AD) is the most common form of dementia and leads to irreversible neurodegenerative damage of the brain. However, the current diagnostic tools have poor sensitivity, especially for the early stages of AD, and do not allow for diagnosis until AD has led to irreversible brain damage. Therefore, it is crucial that AD is detected as early as possible. Although it is very hard, laborious and time-consuming to gather many AD and non-AD labeled samples, gathering unlabeled samples is easier. Since standard learning algorithms learn a diagnosis model from labeled samples only, they require many labeled samples and do not work well when the number of training samples is small. Therefore, it is very desirable to develop a predictive learning method that achieves high performance using both labeled and unlabeled samples. To address these problems, we propose semi-supervised distance metric learning using Random Forests with label propagation (SRF-LP), which incorporates labeled data for obtaining good metrics and propagates labels based on them. Experimental results showed that SRF-LP outperformed standard supervised learning algorithms, i.e., RF, SVM, Adaboost and CART, and reached a maximum accuracy of 93.1%. In particular, SRF-LP outperformed them by a large margin when the number of training samples was very small. Our results also suggested that SRF-LP exhibits a synergistic effect of semi-supervised distance metric learning and label propagation.


Subjects
Alzheimer Disease/diagnosis, Predictive Value of Tests, Humans
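
The label propagation step named in record 8 can be illustrated with a small sketch. It assumes one-dimensional toy points and a Gaussian similarity, which stands in for the Random-Forest-derived metric of SRF-LP (not reproduced here). The function name `propagate_labels`, the bandwidth `sigma`, and the data are hypothetical.

```python
import math


def propagate_labels(points, labels, sigma=0.2, iterations=200):
    """Spread labels over a similarity graph.

    labels holds 0.0 or 1.0 for labeled points and None for unlabeled ones.
    Returns one score per point: values near 1.0 suggest class 1.
    """
    n = len(points)
    # Gaussian similarity between every pair of distinct points.
    weights = [
        [
            math.exp(-((points[i] - points[j]) ** 2) / (2 * sigma ** 2)) if i != j else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = [0.5 if y is None else float(y) for y in labels]
    for _ in range(iterations):
        updated = list(scores)
        for i in range(n):
            if labels[i] is not None:
                continue  # labeled points stay clamped to their known label
            total = sum(weights[i])
            # An unlabeled point takes the similarity-weighted mean of the others.
            updated[i] = sum(w * s for w, s in zip(weights[i], scores)) / total
        scores = updated
    return scores


points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]      # two well-separated groups
labels = [0.0, None, None, None, None, 1.0]  # only one labeled sample per group
scores = propagate_labels(points, labels)
```

With just two labeled samples, the unlabeled points inherit the label of the group they sit in, which is the situation the abstract targets: few labeled samples and many unlabeled ones.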
9.
Sci Rep ; 7(1): 12038, 2017 Sep 20.
Article in English | MEDLINE | ID: mdl-28931921

ABSTRACT

We propose a new iterative screening contest method to identify target protein inhibitors. After conducting a compound screening contest in 2014, we report results acquired from a contest held in 2015 in this study. Our aims were to identify target enzyme inhibitors and to benchmark a variety of computer-aided drug discovery methods under identical experimental conditions. In both contests, we employed the tyrosine-protein kinase Yes as an example target protein. Participating groups virtually screened possible inhibitors from a library containing 2.4 million compounds. Compounds were ranked based on functional scores obtained using their respective methods, and the top 181 compounds from each group were selected. Our results from the 2015 contest show an improved hit rate when compared to results from the 2014 contest. In addition, we have successfully identified a statistically-warranted method for identifying target inhibitors. Quantitative analysis of the most successful method gave additional insights into important characteristics of the method used.


Subjects
Drug Discovery/methods, Enzyme Inhibitors/pharmacology, High-Throughput Screening Assays/methods, Protein Kinase Inhibitors/pharmacology, Proto-Oncogene Proteins c-yes/antagonists & inhibitors, Enzyme Inhibitors/chemistry, Enzyme Inhibitors/metabolism, Humans, Machine Learning, Molecular Structure, Protein Binding, Protein Kinase Inhibitors/chemistry, Protein Kinase Inhibitors/metabolism, Proto-Oncogene Proteins c-yes/metabolism, Reproducibility of Results, Structure-Activity Relationship
10.
FEBS Lett ; 579(13): 2878-82, 2005 May 23.
Article in English | MEDLINE | ID: mdl-15878553

ABSTRACT

Small interfering RNAs (siRNAs) are becoming widely used for sequence-specific gene silencing in mammalian cells, but designing an effective siRNA is still a challenging task. In this study, we developed an algorithm for predicting siRNA functionality by using a generalized string kernel (GSK) combined with a support vector machine (SVM). With GSK, siRNA sequences were represented as vectors in a multi-dimensional feature space according to the numbers of subsequences in each siRNA, and subsequently classified with SVM into effective or ineffective siRNAs. We applied this algorithm to published siRNAs, and could classify effective and ineffective siRNAs with 90.6% and 86.2% accuracy, respectively.


Subjects
Gene Silencing, Genetic Vectors, Small Interfering RNA/physiology, Algorithms
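
Record 10 represents each siRNA by the numbers of subsequences it contains. The abstract does not define the generalized string kernel, so the sketch below uses the simpler k-mer spectrum kernel as a stand-in: sequences are mapped to counts of contiguous k-mers and compared by a dot product. The function names and the example sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous subsequence of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Dot product of the k-mer count vectors of two sequences."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers that are absent from b.
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


similarity = spectrum_kernel("AUGAU", "AUGGA", k=2)
```

A matrix of such pairwise values is a valid kernel matrix, so it could in principle be handed to any SVM implementation that accepts precomputed kernels.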
11.
Sci Rep ; 5: 17209, 2015 Nov 26.
Article in English | MEDLINE | ID: mdl-26607293

ABSTRACT

Searching a broader range of chemical space is important for drug discovery. Different methods of computer-aided drug discovery (CADD) are known to propose compounds in different chemical spaces as hit molecules for the same target protein. This study aimed at using multiple CADD methods through open innovation to achieve a level of hit molecule diversity that is not achievable with any particular single method. We held a compound proposal contest, in which multiple research groups participated and predicted inhibitors of tyrosine-protein kinase Yes. This showed whether collective knowledge based on individual approaches helped to obtain hit compounds from a broad range of chemical space and whether the contest-based approach was effective.


Subjects
Preclinical Drug Evaluation, Protein Kinase Inhibitors/analysis, Protein Kinase Inhibitors/pharmacology, Proto-Oncogene Proteins c-yes/antagonists & inhibitors, Humans, Principal Component Analysis, Proto-Oncogene Proteins c-yes/chemistry, Reproducibility of Results, src-Family Kinases/metabolism
12.
Genome Inform ; 15(2): 151-60, 2004.
Article in English | MEDLINE | ID: mdl-15706501

ABSTRACT

Recently, large amounts of gene expression data under various conditions have been obtained using DNA microarrays and oligonucleotide arrays. There is a growing demand to analyze the function of genes from their gene expression profiles. For clustering genes from their expression profiles, hierarchical clustering has been widely used. The clustering method represents the relationships of genes as a tree structure by connecting genes using their similarity scores based on the Pearson correlation coefficient. However, the clustering method is sensitive to experimental noise. To cope with this problem, we propose another type of clustering method (p-quasi complete linkage clustering). We apply this method to gene expression data of the yeast cell cycle and human lung cancer. The effectiveness of our method is demonstrated by comparing its clustering results with those of other methods.


Subjects
Algorithms, Artificial Intelligence, Cell Cycle Proteins/genetics, Gene Expression Profiling/methods, Lung Neoplasms/genetics, Neoplasm Proteins/physiology, Oligonucleotide Array Sequence Analysis/methods, Saccharomyces cerevisiae Proteins/genetics, Cluster Analysis, Humans, Lung Neoplasms/metabolism
13.
J Bioinform Comput Biol ; 9(4): 521-40, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21776607

ABSTRACT

In the drug discovery process, the metabolic fate of drugs is crucially important to prevent drug-drug interactions. Therefore, P450 isozyme selectivity prediction is an important task for screening drugs of appropriate metabolism profiles. Recently, large-scale activity data of five P450 isozymes (CYP1A2, CYP2C9, CYP3A4, CYP2D6, and CYP2C19) have been obtained using quantitative high-throughput screening with a bioluminescence assay. Although some isozymes share similar selectivities, conventional supervised learning algorithms independently learn a prediction model from each P450 isozyme. They are unable to exploit the other P450 isozyme activity data to improve the predictive performance of each P450 isozyme's selectivity. To address this issue, we apply transfer learning that uses activity data of the other isozymes to learn a prediction model from multiple P450 isozymes. After using the large-scale P450 isozyme selectivity dataset for five P450 isozymes, we evaluate the model's predictive performance. Experimental results show that, overall, our algorithm outperforms conventional supervised learning algorithms such as support vector machine (SVM), Weighted k-nearest neighbor classifier, Bagging, Adaboost, and latent semantic indexing (LSI). Moreover, our results show that the predictive performance of our algorithm is improved by exploiting the multiple P450 isozyme activity data in the learning process. Our algorithm can be an effective tool for P450 selectivity prediction for new chemical entities using multiple P450 isozyme activity data.


Subjects
Artificial Intelligence, Cytochrome P-450 Enzyme System/metabolism, Preclinical Drug Evaluation/statistics & numerical data, Algorithms, Computational Biology, Factual Databases, Drug Discovery/statistics & numerical data, Drug Interactions, Isoenzymes/metabolism, Substrate Specificity, Support Vector Machine
14.
J Mol Graph Model ; 29(3): 492-7, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20965757

ABSTRACT

Accurate prediction of protein-ligand binding affinities for lead optimization in drug discovery remains an important and challenging problem for scoring functions in docking simulation. In this paper, we propose a data-driven approach that integrates multiple scoring functions to predict protein-ligand binding affinity directly. We then propose a new method called multiple instance regression based scoring (MIRS) that incorporates unbound ligand conformations using multiple scoring functions. We evaluated the predictive performance of MIRS using 100 protein-ligand complexes and their binding affinities. The experimental results showed that MIRS outperformed the 11 conventional scoring functions including LigScore, PLP, AutoDock, G-Score, D-Score, LUDI, F-Score, ChemScore, X-Score, PMF, and DrugScore. In addition, we confirmed that MIRS performed well on binding pose prediction. Our results reveal that it is indispensable to incorporate unbound ligand conformations in both binding affinity prediction and binding pose prediction. The proposed method will accelerate efficient lead optimization in structure-based drug design and provide a new direction for the design of new scoring functions.


Subjects
Computer Simulation, Ligands, Protein Binding, Computational Biology/methods, Drug Discovery, Molecular Models, Molecular Conformation, Molecular Structure, Regression Analysis, Thermodynamics
15.
J Chem Inf Model ; 48(2): 288-95, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18229906

ABSTRACT

The evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, and scoring functions play significant roles in it. While consensus scoring (CS) generally improves enrichment by compensating for the deficiencies of each scoring function, the strategy of how individual scoring functions are selected remains a challenging task when few known active compounds are available. To address this problem, we propose feature selection-based consensus scoring (FSCS), which performs supervised feature selection with docked native ligand conformations to select complementary scoring functions. We evaluated the enrichments of five scoring functions (F-Score, D-Score, PMF, G-Score, and ChemScore), FSCS, and RCS (rank-by-rank consensus scoring) for four different target proteins: acetylcholine esterase (AChE), thrombin, phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPARgamma). The results indicated that FSCS was able to select the complementary scoring functions and enhance ligand enrichments and that it outperformed RCS and the individual scoring functions for all target proteins. They also indicated that the performances of the single scoring functions were strongly dependent on the target protein. An especially favorable result with implications for practical drug screening is that FSCS performs well even if only one 3D structure of the protein-ligand complex is known. Moreover, we found that one can infer which scoring functions significantly enrich active compounds by using feature selection before actual docking and that the selected scoring functions are complementary.


Subjects
Computer Simulation, Preclinical Drug Evaluation/methods, Animals, Humans, Ligands, Molecular Structure, PPAR gamma/antagonists & inhibitors, Protein Binding, Quantitative Structure-Activity Relationship, Research Design
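
Record 15 uses rank-by-rank consensus scoring (RCS) as a baseline. A minimal sketch of the rank-by-rank idea follows: each scoring function ranks the compounds, and compounds are then ordered by their mean rank. It assumes that a lower score is better for every function, which may not hold for real scoring functions; the compound names and score values are invented for illustration.

```python
def rank_by_rank(score_table):
    """Order compounds by their mean rank across scoring functions.

    score_table maps scoring-function name -> {compound: score},
    where a lower score is assumed to be better.
    """
    compounds = sorted(next(iter(score_table.values())))
    mean_rank = {c: 0.0 for c in compounds}
    for scores in score_table.values():
        ordered = sorted(compounds, key=lambda c: scores[c])
        for rank, compound in enumerate(ordered, start=1):
            mean_rank[compound] += rank / len(score_table)
    return sorted(compounds, key=lambda c: mean_rank[c])


# Hypothetical docking scores for three compounds under three functions.
scores = {
    "F-Score": {"cmpd_a": -30.0, "cmpd_b": -25.0, "cmpd_c": -10.0},
    "PMF": {"cmpd_a": -80.0, "cmpd_b": -95.0, "cmpd_c": -40.0},
    "ChemScore": {"cmpd_a": -22.0, "cmpd_b": -18.0, "cmpd_c": -20.0},
}
consensus_order = rank_by_rank(scores)
```

Working with ranks instead of raw scores sidesteps the fact that different scoring functions report values on incomparable scales. Ties are not treated specially in this sketch.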
16.
J Chem Inf Model ; 48(4): 747-54, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18318474

ABSTRACT

Since the evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, scoring functions play significant roles in it. However, it is known that a scoring function does not always work well for all target proteins. When one cannot know a priori which scoring function works best against a target protein, there is no standard method for finding out, even if the 3D structure of a target protein-ligand complex is available. Therefore, developing a method that achieves high enrichments from given scoring functions and the 3D structure of a protein-ligand complex is a crucial and challenging task. To address this problem, we applied SCS (supervised consensus scoring) to virtual screening. SCS employs a rough linear correlation between the binding free energy and the root-mean-square deviation (rmsd) of native ligand conformations and incorporates the protein-ligand binding process with docked ligand conformations using supervised learning. We evaluated both the docking poses and enrichments of SCS and five scoring functions (F-Score, G-Score, D-Score, ChemScore, and PMF) for three different target proteins: thymidine kinase (TK), thrombin, and peroxisome proliferator-activated receptor gamma (PPARgamma). Our enrichment studies show that SCS is competitive with or superior to the best single scoring function at the top ranks of the screened database. We found that the enrichments of SCS could be limited by the best scoring function, because SCS is obtained on the basis of the five individual scoring functions. Therefore, we conclude from our results that SCS works very successfully. Moreover, from docking pose analysis, we revealed the connection between enrichment and the average centroid distance of top-scored docking poses. Since SCS requires only one 3D structure of a protein-ligand complex, SCS will be useful for identifying new ligands.


Subjects
Molecular Structure, Ligands, Molecular Models, Proteins/chemistry
17.
J Chem Inf Model ; 48(3): 575-82, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18278890

ABSTRACT

We propose a hypothesis that "a model of active compound can be provided by integrating information of compounds high-ranked by docking simulation of a random compound library". In our hypothesis, the inclusion of true active compounds among the high-ranked compounds is not necessary. We regard the high-ranked compounds as being pseudo-active compounds. As a method to embody our hypothesis, we introduce a pseudo-structure-activity relationship (PSAR) model. Although the PSAR model is the same as a quantitative structure-activity relationship (QSAR) model in terms of statistical methodology, the implications of the training data are different. Known active compounds (ligands) are used as training data in the QSAR model, whereas the pseudo-active compounds are used in the PSAR model. In this study, Random Forest was used as a machine-learning algorithm. From tests for four functionally different targets, estrogen receptor antagonist (ER), thymidine kinase (TK), thrombin, and acetylcholine esterase (AChE), using five scoring functions, we obtained three conclusions: (1) the PSAR models gave significantly higher percentages of known ligands found than random sampling, and these results are sufficient to support our hypothesis; (2) the PSAR models gave higher percentages of known ligands found than normal scoring by scoring function, and these results demonstrate the practical usefulness of the PSAR model; and (3) the PSAR model can assess compounds that failed in the docking simulation. Note that PSAR and QSAR models are used in different situations; the advantage of the PSAR model emerges when no ligand is available as training data or when one wants to find novel types of ligands, whereas the QSAR model is effective for finding compounds similar to known ligands when the ligands are already known.


Subjects
Molecular Models, Proteins/chemistry, ROC Curve, Structure-Activity Relationship
18.
J Chem Inf Model ; 48(5): 988-96, 2008 May.
Article in English | MEDLINE | ID: mdl-18426197

ABSTRACT

To improve the performance of a single scoring function used in a protein-ligand docking program, we developed a bootstrap-based consensus scoring (BBCS) method, which is based on ensemble learning. BBCS combines multiple scorings, each of which has the same functional form but a different energy-parameter set. These multiple energy-parameter sets are generated in two steps: (1) generation of training sets by a bootstrap method and (2) optimization of the energy-parameter set against each training set by a Z-score approach, which is based on energy landscape theory as used in protein folding. In this study, we applied BBCS to the FlexX scoring function. Using 50 given complexes, we generated 100 training sets and obtained 100 optimized energy-parameter sets. These parameter sets were tested against 48 complexes different from the training sets. BBCS was shown to be an improvement over single scoring when using a parameter set optimized by the same Z-score approach. Comparing BBCS with the original FlexX scoring function, we found that (1) the success rate of recognizing the crystal structure at the top relative to decoys increased from 33.3% to 52.1% and that (2) the rank of the crystal structure improved for 54.2% of the complexes and worsened for none. We also found that BBCS performed better than conventional consensus scoring (CS).


Subjects
Artificial Intelligence, Proteins/chemistry, Proteins/metabolism, X-Ray Crystallography, Ligands, Computer Neural Networks, Protein Binding, Reproducibility of Results
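
Step (1) of record 18, generating training sets by a bootstrap method, can be sketched as resampling the 50 complexes with replacement 100 times. The Z-score optimization of step (2) is not reproduced here, and the abstract does not say how the 100 scorings are combined, so the simple average in `consensus_score` is an assumption. All identifiers are hypothetical.

```python
import random


def bootstrap_sets(items, n_sets, seed=0):
    """Draw n_sets resamples of items, each the size of items, with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(items) for _ in items] for _ in range(n_sets)]


def consensus_score(scores):
    """Combine per-parameter-set scores by averaging (an assumed combination rule)."""
    return sum(scores) / len(scores)


complexes = [f"complex_{i:02d}" for i in range(50)]  # hypothetical identifiers
training_sets = bootstrap_sets(complexes, n_sets=100)
```

Each resample omits some complexes and repeats others, so parameter sets fitted to different resamples differ slightly, and combining them is what makes the approach an ensemble method.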
19.
J Chem Inf Model ; 47(5): 1858-67, 2007.
Article in English | MEDLINE | ID: mdl-17685604

ABSTRACT

Protein-ligand docking programs have been used to efficiently discover novel ligands for target proteins from large-scale compound databases. However, better scoring methods are needed. Generally, scoring functions are optimized by means of various techniques that affect their fitness for reproducing X-ray structures and protein-ligand binding affinities. However, these scoring functions do not always work well for all target proteins. A scoring function should be optimized for a target protein to enhance enrichment for structure-based virtual screening. To address this problem, we propose the supervised scoring model (SSM), which takes into account the protein-ligand binding process using docked ligand conformations with supervised learning for optimizing scoring functions against a target protein. SSM employs a rough linear correlation between binding free energy and the root mean square deviation of a native ligand for predicting binding energy. We applied SSM to the FlexX scoring function, that is, F-Score, with five different target proteins: thymidine kinase (TK), estrogen receptor (ER), acetylcholine esterase (AChE), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPARgamma). For these five proteins, SSM always enhanced enrichment better than F-Score, exhibiting superior performance that was particularly remarkable for TK, AChE, and PPARgamma. We also demonstrated that SSM is especially good at enhancing enrichments of the top ranks of screened compounds, which is useful in practical drug screening.


Subjects
RNA/chemistry, RNA/drug effects, Algorithms, Base Pairing, DNA/chemistry, DNA/drug effects, Drug Design, Knowledge Bases, Ligands, Magnetic Resonance Spectroscopy, Molecular Conformation, Proteins/chemistry, Proteins/drug effects, Reproducibility of Results, Structure-Activity Relationship, Theophylline/chemistry, Theophylline/pharmacology
20.
J Chem Inf Model ; 47(2): 526-34, 2007.
Article in English | MEDLINE | ID: mdl-17295466

ABSTRACT

Docking programs are widely used to discover novel ligands efficiently and can predict protein-ligand complex structures with reasonable accuracy and speed. However, there is an emerging demand for better performance from the scoring methods. Consensus scoring (CS) methods improve the performance by compensating for the deficiencies of each scoring function. However, conventional CS and existing scoring functions have the same problems, such as a lack of protein flexibility, inadequate treatment of solvation, and the simplistic nature of the energy function used. Although there are many problems in current scoring functions, we focus our attention on the incorporation of unbound ligand conformations. To address this problem, we propose supervised consensus scoring (SCS), which takes into account the protein-ligand binding process using unbound ligand conformations with supervised learning. An evaluation of docking accuracy for 100 diverse protein-ligand complexes shows that SCS outperforms both CS and 11 scoring functions (PLP, F-Score, LigScore, DrugScore, LUDI, X-Score, AutoDock, PMF, G-Score, ChemScore, and D-Score). The success rates of SCS range from 89% to 91% in the range of rmsd < 2 Å, while those of CS range from 80% to 85%, and those of the scoring functions range from 26% to 76%. Moreover, we also introduce a method for judging whether a compound is active or inactive with the appropriate criterion for virtual screening. SCS performs quite well in docking accuracy and is presumably useful for screening large-scale compound databases before predicting binding affinity.


Subjects
Computational Biology, Proteins/chemistry, Proteins/metabolism, Ligands, Molecular Conformation, Protein Binding
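
The docking success rates in record 20 are reported for poses with rmsd < 2 Å. A minimal sketch of that criterion follows, assuming that the predicted and crystal poses list the same atoms in the same order and already share the receptor's coordinate frame (no superposition or symmetry correction is applied). The helper names and the example values are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose lies strictly within the cutoff (in Å)."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)


# Hypothetical rmsd values (Å) of top-ranked poses for four complexes.
rate = success_rate([0.5, 1.9, 2.0, 3.1])
```

The comparison is strict, matching the "rmsd < 2 Å" wording, so a pose at exactly 2.0 Å does not count as a success in this sketch.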