1.
MAbs ; 15(1): 2244214, 2023.
Article in English | MEDLINE | ID: mdl-37605371

ABSTRACT

Antibodies are one of the predominant treatment modalities for various diseases. To improve the characteristics of a lead antibody, such as antigen-binding affinity and stability, we conducted comprehensive substitutions and exhaustively explored their sequence space. However, it is practically unfeasible to evaluate all possible combinations of mutations owing to combinatorial explosion when multiple amino acid residues are incorporated. It was recently reported that a machine-learning guided protein engineering approach such as Thompson sampling (TS) has been used to efficiently explore sequence space in the framework of Bayesian optimization. For TS, over-exploration occurs when the initial data are biasedly distributed in the vicinity of the lead antibody. We handle a large-scale virtual library that includes numerous mutations. When the number of experiments is limited, this over-exploration causes a serious issue. Thus, we conducted Monte Carlo Thompson sampling (MTS) to balance the exploration-exploitation trade-off by defining the posterior distribution via the Monte Carlo method and compared its performance with TS in antibody engineering. Our results demonstrated that MTS largely outperforms TS in discovering desirable candidates at an earlier round when over-exploration occurs on TS. Thus, the MTS method is a powerful technique for efficiently discovering antibodies with desired characteristics when the number of rounds is limited.


Subject(s)
Antibodies, Protein Engineering, Bayes Theorem, Monte Carlo Method, Antibodies/chemistry, Protein Engineering/methods
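
As a rough illustration of the Thompson sampling (TS) idea mentioned in the abstract above, the following minimal sketch runs standard Beta-Bernoulli TS on a toy three-armed bandit. It is not the paper's Monte Carlo Thompson sampling (MTS) and does not model antibody sequences; the function name `thompson_sampling`, the success rates, and the number of rounds are all assumptions made for illustration.

```python
import random

def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy bandit; returns pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then act greedily on the draws.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Simulate the experiment outcome for the chosen arm.
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical success rates; the third arm is the best candidate.
pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
```

Because each round samples from the posterior instead of using its mean, uncertain arms still get tried occasionally (exploration) while the arm that looks best is pulled most often (exploitation).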
2.
Sci Rep ; 11(1): 5852, 2021 03 12.
Article in English | MEDLINE | ID: mdl-33712669

ABSTRACT

Molecular evolution is an important step in the development of therapeutic antibodies. However, the current method of affinity maturation is overly costly and labor-intensive because of the repetitive mutation experiments needed to adequately explore sequence space. Here, we employed a sequence generation and prioritization procedure based on a long short-term memory (LSTM) network, a widely used deep generative model, to efficiently discover antibody sequences with higher affinity. We applied our method to the affinity maturation of antibodies against kynurenine, which is a metabolite related to the niacin synthesis pathway. Kynurenine binding sequences were enriched through phage display panning using a kynurenine-binding oriented human synthetic Fab library. We defined binding antibodies using a sequence repertoire from the NGS data to train the LSTM model. We confirmed that the likelihood of sequences generated from a trained LSTM correlated well with binding affinity. The affinity of the generated sequences is over 1800-fold higher than that of the parental clone. Moreover, compared to frequency based screening using the same dataset, our machine learning approach generated sequences with greater affinity.


Subject(s)
Algorithms, Antibodies/immunology, Antibody Affinity/immunology, Cell Surface Display Techniques, Protein Engineering, Amino Acid Sequence, Protein Databases, High-Throughput Nucleotide Sequencing, Humans, Likelihood Functions, Machine Learning, Reproducibility of Results
3.
Sci Rep ; 9(1): 19585, 2019 12 20.
Article in English | MEDLINE | ID: mdl-31863054

ABSTRACT

Potential inhibitors of a target biomolecule, NAD-dependent deacetylase Sirtuin 1, were identified by a contest-based approach, in which participants were asked to propose a prioritized list of 400 compounds from a designated compound library containing 2.5 million compounds using in silico methods and scoring. Our aim was to identify target enzyme inhibitors and to benchmark computer-aided drug discovery methods under the same experimental conditions. Collecting compound lists derived from various methods is advantageous for aggregating compounds with structurally diversified properties compared with the use of a single method. The inhibitory action on Sirtuin 1 of approximately half of the proposed compounds was experimentally assessed. Ultimately, seven structurally diverse compounds were identified.

4.
Sci Rep ; 7(1): 12038, 2017 09 20.
Article in English | MEDLINE | ID: mdl-28931921

ABSTRACT

We propose a new iterative screening contest method to identify target protein inhibitors. After conducting a compound screening contest in 2014, we report results acquired from a contest held in 2015 in this study. Our aims were to identify target enzyme inhibitors and to benchmark a variety of computer-aided drug discovery methods under identical experimental conditions. In both contests, we employed the tyrosine-protein kinase Yes as an example target protein. Participating groups virtually screened possible inhibitors from a library containing 2.4 million compounds. Compounds were ranked based on functional scores obtained using their respective methods, and the top 181 compounds from each group were selected. Our results from the 2015 contest show an improved hit rate when compared to results from the 2014 contest. In addition, we have successfully identified a statistically-warranted method for identifying target inhibitors. Quantitative analysis of the most successful method gave additional insights into important characteristics of the method used.


Subject(s)
Drug Discovery/methods, Enzyme Inhibitors/pharmacology, High-Throughput Screening Assays/methods, Protein Kinase Inhibitors/pharmacology, Proto-Oncogene Proteins c-yes/antagonists & inhibitors, Enzyme Inhibitors/chemistry, Enzyme Inhibitors/metabolism, Humans, Machine Learning, Molecular Structure, Protein Binding, Protein Kinase Inhibitors/chemistry, Protein Kinase Inhibitors/metabolism, Proto-Oncogene Proteins c-yes/metabolism, Reproducibility of Results, Structure-Activity Relationship
5.
Sci Rep ; 5: 17209, 2015 Nov 26.
Article in English | MEDLINE | ID: mdl-26607293

ABSTRACT

Searching a broader range of chemical space is important for drug discovery. Different methods of computer-aided drug discovery (CADD) are known to propose compounds in different chemical spaces as hit molecules for the same target protein. This study aimed at using multiple CADD methods through open innovation to achieve a level of hit molecule diversity that is not achievable with any particular single method. We held a compound proposal contest, in which multiple research groups participated and predicted inhibitors of tyrosine-protein kinase Yes. This showed whether collective knowledge based on individual approaches helped to obtain hit compounds from a broad range of chemical space and whether the contest-based approach was effective.


Subject(s)
Preclinical Drug Evaluation, Protein Kinase Inhibitors/analysis, Protein Kinase Inhibitors/pharmacology, Proto-Oncogene Proteins c-yes/antagonists & inhibitors, Humans, Principal Component Analysis, Proto-Oncogene Proteins c-yes/chemistry, Reproducibility of Results, src-Family Kinases/metabolism
6.
BMC Bioinformatics ; 15: 228, 2014 Jun 30.
Article in English | MEDLINE | ID: mdl-24980787

ABSTRACT

BACKGROUND: Knockdown or overexpression of genes is widely used to identify genes that play important roles in many aspects of cellular functions and phenotypes. Because next-generation sequencing generates high-throughput data that allow us to detect genes, it is important to identify genes that drive functional and phenotypic changes of cells. However, conventional methods rely heavily on the assumption of normality and they often give incorrect results when the assumption is not true. To relax the Gaussian assumption in causal inference, we introduce the non-paranormal method to test conditional independence in the PC-algorithm. Then, we present the non-paranormal intervention-calculus when the directed acyclic graph (DAG) is absent (NPN-IDA), which incorporates the cumulative nature of effects through a cascaded pathway via causal inference for ranking causal genes against a phenotype with the non-paranormal method for estimating DAGs. RESULTS: We demonstrate that causal inference with the non-paranormal method significantly improves the performance in estimating DAGs on synthetic data in comparison with the original PC-algorithm. Moreover, we show that NPN-IDA outperforms the conventional methods in exploring regulators of the flowering time in Arabidopsis thaliana and regulators that control the browning of white adipocytes in mice. Our results show that performance improvement in estimating DAGs contributes to an accurate estimation of causal effects. CONCLUSIONS: Although the simplest alternative procedure was used, our proposed method enables us to design efficient intervention experiments and can be applied to a wide range of research purposes, including drug discovery, because of its generality.


Subject(s)
Algorithms, Genetic Techniques, Brown Adipocytes/cytology, Brown Adipocytes/metabolism, White Adipocytes/cytology, White Adipocytes/metabolism, Animals, Arabidopsis/genetics, Statistical Data Interpretation, Gene Knockdown Techniques, High-Throughput Nucleotide Sequencing, Mice, Normal Distribution, Phenotype, Regression Analysis
7.
J Bioinform Comput Biol ; 9(4): 521-40, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21776607

ABSTRACT

In the drug discovery process, the metabolic fate of drugs is crucially important to prevent drug-drug interactions. Therefore, P450 isozyme selectivity prediction is an important task for screening drugs with appropriate metabolism profiles. Recently, large-scale activity data of five P450 isozymes (CYP1A2, CYP2C9, CYP3A4, CYP2D6, and CYP2C19) have been obtained using quantitative high-throughput screening with a bioluminescence assay. Although some isozymes share similar selectivities, conventional supervised learning algorithms independently learn a prediction model from each P450 isozyme. They are unable to exploit the other P450 isozyme activity data to improve the predictive performance of each P450 isozyme's selectivity. To address this issue, we apply transfer learning that uses activity data of the other isozymes to learn a prediction model from multiple P450 isozymes. After using the large-scale P450 isozyme selectivity dataset for five P450 isozymes, we evaluate the model's predictive performance. Experimental results show that, overall, our algorithm outperforms conventional supervised learning algorithms such as support vector machine (SVM), Weighted k-nearest neighbor classifier, Bagging, Adaboost, and latent semantic indexing (LSI). Moreover, our results show that the predictive performance of our algorithm is improved by exploiting the multiple P450 isozyme activity data in the learning process. Our algorithm can be an effective tool for P450 selectivity prediction for new chemical entities using multiple P450 isozyme activity data.


Subject(s)
Artificial Intelligence, Cytochrome P-450 Enzyme System/metabolism, Preclinical Drug Evaluation/statistics & numerical data, Algorithms, Computational Biology, Factual Databases, Drug Discovery/statistics & numerical data, Drug Interactions, Isoenzymes/metabolism, Substrate Specificity, Support Vector Machine
8.
J Mol Graph Model ; 29(3): 492-7, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20965757

ABSTRACT

Accurate prediction of protein-ligand binding affinities for lead optimization in drug discovery remains an important and challenging problem for scoring functions in docking simulation. In this paper, we propose a data-driven approach that integrates multiple scoring functions to predict protein-ligand binding affinity directly. We then propose a new method called multiple instance regression based scoring (MIRS) that incorporates unbound ligand conformations using multiple scoring functions. We evaluated the predictive performance of MIRS using 100 protein-ligand complexes and their binding affinities. The experimental results showed that MIRS outperformed the 11 conventional scoring functions including LigScore, PLP, AutoDock, G-Score, D-Score, LUDI, F-Score, ChemScore, X-Score, PMF, and DrugScore. In addition, we confirmed that MIRS performed well on binding pose prediction. Our results reveal that it is indispensable to incorporate unbound ligand conformations in both binding affinity prediction and binding pose prediction. The proposed method will accelerate efficient lead optimization in structure-based drug design and provide a new direction for designing new scoring functions.


Subject(s)
Computer Simulation, Ligands, Protein Binding, Computational Biology/methods, Drug Discovery, Molecular Models, Molecular Conformation, Molecular Structure, Regression Analysis, Thermodynamics
9.
Stat Appl Genet Mol Biol ; 8: Article20, 2009.
Article in English | MEDLINE | ID: mdl-19409064

ABSTRACT

In clinical outcome prediction, such as disease diagnosis and prognosis, it is often assumed that the class, e.g., disease and control, is equally distributed. However, in practice we often encounter biological or clinical data whose class distribution is highly skewed. Since standard supervised learning algorithms intend to maximize the overall prediction accuracy, a prediction model tends to show a strong bias toward the majority class when it is trained on such imbalanced data. Therefore, the class distribution should be incorporated appropriately to learn from imbalanced data. To address this practically important problem, we proposed balanced gradient boosting (BalaBoost) which reformulates gradient boosting to avoid the overfitting to the majority class and is sensitive to the minority class by making use of the equal class distribution instead of the empirical class distribution. We applied BalaBoost to cancer tissue diagnosis based on miRNA expression data, premature death prediction for diabetes patients based on biochemical and clinical variables and tumor grade prediction of renal cell carcinoma based on tumor marker expressions whose class distribution is highly skewed. Experimental results showed that BalaBoost outperformed the representative supervised learning algorithms, i.e., gradient boosting, Random Forests and Support Vector Machine. Our results led us to the conclusion that BalaBoost is promising for clinical outcome prediction from imbalanced data.


Subject(s)
Algorithms, Statistical Data Interpretation, Diabetes Mellitus/diagnosis, Neoplasms/diagnosis, Renal Cell Carcinoma/diagnosis, Renal Cell Carcinoma/genetics, Renal Cell Carcinoma/pathology, Diabetes Mellitus/genetics, Diabetes Mellitus/mortality, Gene Expression Profiling/statistics & numerical data, Humans, Kidney Neoplasms/diagnosis, Kidney Neoplasms/genetics, Kidney Neoplasms/pathology, MicroRNAs/genetics, Statistical Models, Neoplasm Staging/methods, Neoplasms/genetics, Prognosis, Reproducibility of Results, Survival Analysis
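
The abstract above describes using an equal class distribution in place of the empirical one. The sketch below shows only the generic reweighting idea behind that statement: each sample is weighted so that every class contributes the same total weight. It is not the BalaBoost algorithm itself; the function name `balanced_sample_weights` and the toy labels are assumptions.

```python
from collections import Counter

def balanced_sample_weights(labels):
    """Weight each sample so that all classes carry equal total weight."""
    counts = Counter(labels)
    n_classes = len(counts)
    n_samples = len(labels)
    # A class with fewer samples gets a proportionally larger per-sample weight.
    return [n_samples / (n_classes * counts[y]) for y in labels]

# Hypothetical imbalanced data: three controls, one disease case.
weights = balanced_sample_weights(["control", "control", "control", "disease"])
```

With these labels, the three control samples together and the single disease sample each carry a total weight of 2, so a learner that accepts sample weights no longer favors the majority class simply because it is larger.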
10.
Comput Biol Chem ; 32(6): 438-41, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18789768

ABSTRACT

Alzheimer's disease (AD) is the most common form of dementia and leads to irreversible neurodegenerative damage of the brain. However, the current diagnostic tools have poor sensitivity, especially for the early stages of AD, and do not allow for diagnosis until AD has led to irreversible brain damage. Therefore, it is crucial that AD is detected as early as possible. Although it is very hard, laborious and time-consuming to gather many AD and non-AD labeled samples, gathering unlabeled samples is easier. Since standard learning algorithms learn a diagnosis model from labeled samples only, they require many labeled samples and do not work well when the number of training samples is small. Therefore, it is very desirable to develop a predictive learning method to achieve high performance using both labeled samples and unlabeled samples. To address these problems, we propose semi-supervised distance metric learning using Random Forests with label propagation (SRF-LP) which incorporates labeled data for obtaining good metrics and propagates labels based on them. Experimental results showed that SRF-LP outperformed standard supervised learning algorithms, i.e., RF, SVM, Adaboost and CART and reached 93.1% accuracy at a maximum. In particular, SRF-LP outperformed them by a large margin when the number of training samples was very small. Our results also suggested that SRF-LP exhibits a synergistic effect of semi-supervised distance metric learning and label propagation.


Subject(s)
Alzheimer Disease/diagnosis, Predictive Value of Tests, Humans
11.
J Chem Inf Model ; 48(5): 988-96, 2008 May.
Article in English | MEDLINE | ID: mdl-18426197

ABSTRACT

To improve the performance of a single scoring function used in a protein-ligand docking program, we developed a bootstrap-based consensus scoring (BBCS) method, which is based on ensemble learning. BBCS combines multiple scorings, each of which has the same function form but different energy-parameter sets. These multiple energy-parameter sets are generated in two steps: (1) generation of training sets by a bootstrap method and (2) optimization of energy-parameter set by a Z-score approach, which is based on energy landscape theory as used in protein folding, against each training set. In this study, we applied BBCS to the FlexX scoring function. Using given 50 complexes, we generated 100 training sets and obtained 100 optimized energy-parameter sets. These parameter sets were tested against 48 complexes different from the training sets. BBCS was shown to be an improvement over single scoring when using a parameter set optimized by the same Z-score approach. Comparing BBCS with the original FlexX scoring function, we found that (1) the success rate of recognizing the crystal structure at the top relative to decoys increased from 33.3% to 52.1% and that (2) the rank of the crystal structure improved for 54.2% of the complexes and worsened for none. We also found that BBCS performed better than conventional consensus scoring (CS).


Subject(s)
Artificial Intelligence, Proteins/chemistry, Proteins/metabolism, X-Ray Crystallography, Ligands, Neural Networks (Computer), Protein Binding, Reproducibility of Results
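
Step (1) of the procedure in the abstract above, generating training sets by a bootstrap method, can be sketched as follows. The sizes (50 complexes, 100 training sets) come from the abstract; everything else is assumed for illustration. In particular, the abstract does not say how the multiple scorings are combined, so the simple average in `consensus_score` is an assumption, and the Z-score optimization of step (2) is not shown.

```python
import random

def bootstrap_training_sets(n_items, n_sets, seed=0):
    """Draw n_sets resamples of item indices, each with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n_items) for _ in range(n_items)]
            for _ in range(n_sets)]

def consensus_score(scores):
    """Combine the scores given by the individual parameter sets.

    Averaging is an assumed combination rule, used here only as a placeholder.
    """
    return sum(scores) / len(scores)

# 100 bootstrap training sets drawn from 50 complexes, as in the abstract.
training_sets = bootstrap_training_sets(n_items=50, n_sets=100)
```

Each resample has the same size as the original set but repeats some complexes and omits others, which is what makes the parameter sets fitted to them differ.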
12.
J Chem Inf Model ; 48(4): 747-54, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18318474

ABSTRACT

Since the evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, scoring functions play significant roles in it. However, it is known that a scoring function does not always work well for all target proteins. When one cannot know a priori which scoring function works best against a target protein, there is no standard scoring method to determine it even if the 3D structure of a target protein-ligand complex is available. Therefore, development of a method to achieve high enrichments from given scoring functions and the 3D structure of a protein-ligand complex is a crucial and challenging task. To address this problem, we applied SCS (supervised consensus scoring), which employs a rough linear correlation between the binding free energy and the root-mean-square deviation (rmsd) of native ligand conformations and incorporates the protein-ligand binding process with docked ligand conformations using supervised learning, to virtual screening. We evaluated both the docking poses and enrichments of SCS and five scoring functions (F-Score, G-Score, D-Score, ChemScore, and PMF) for three different target proteins: thymidine kinase (TK), thrombin (thrombin), and peroxisome proliferator-activated receptor gamma (PPARgamma). Our enrichment studies show that SCS is competitive with or superior to the best single scoring function at the top ranks of the screened database. We found that the enrichments of SCS could be limited by the best scoring function, because SCS is obtained on the basis of the five individual scoring functions. Therefore, we conclude from our results that SCS works very successfully. Moreover, from docking pose analysis, we revealed the connection between enrichment and average centroid distance of top-scored docking poses. Since SCS requires only one 3D structure of a protein-ligand complex, SCS will be useful for identifying new ligands.


Subject(s)
Molecular Structure, Ligands, Molecular Models, Proteins/chemistry
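
The abstract above reports "enrichments" at the top ranks of a screened database without defining them. A commonly used definition is the enrichment factor: the hit rate among the top-ranked fraction divided by the hit rate in the whole database. The sketch below assumes that definition and assumes that a higher score means a better compound; the function name and the toy data are hypothetical.

```python
def enrichment_factor(scores, is_active, top_fraction):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    Assumes a higher score is better; negate the scores first for
    energy-like scoring functions where lower is better.
    """
    n = len(scores)
    n_top = max(1, int(round(n * top_fraction)))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(is_active[i] for i in ranked[:n_top])
    total_actives = sum(is_active)
    return (hits_top / n_top) / (total_actives / n)

# Toy database of 10 compounds with 2 actives, both ranked at the top.
scores = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
is_active = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef = enrichment_factor(scores, is_active, top_fraction=0.2)
```

Here the top 20% contains only actives while the database as a whole is 20% active, so the enrichment factor is 5; a value of 1 would mean the ranking is no better than random selection.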
13.
Biochim Biophys Acta ; 1784(5): 764-72, 2008 May.
Article in English | MEDLINE | ID: mdl-18359300

ABSTRACT

Hepatocellular carcinoma (HCC) is one of the most common and aggressive human malignancies. Although several major risks related to HCC, e.g., hepatitis B and/or hepatitis C virus infection, aflatoxin B1 exposure, alcohol drinking and genetic defects have been revealed, the molecular mechanisms leading to the initiation and progression of HCC have not been clarified. To reduce the mortality and improve the effectiveness of therapy, it is important to detect the proteins which are associated with tumor progression and may be useful as potential therapeutic or diagnosis targets. However, previous studies have not yet revealed the associations among HCC cells, histological grade and AFP. Here, we performed two-dimensional difference gel electrophoresis (2D-DIGE) combined with MS for 18 HCC patients. To focus not on individual proteins but on multiple proteins associated with pathogenesis, we introduce the supervised feature selection based on stochastic gradient boosting (SGB) for identifying protein spots that discriminate HCC/non HCC, histological grade of moderate/well and high alpha-fetoprotein (AFP)/low AFP level without arbitrariness. We detected 18, 25 and 27 protein spots associated with HCC, histological grade and AFP level, respectively. We confirmed that SGB is able to identify the known HCC-related proteins, e.g., heat shock proteins, carbonic anhydrase 2. Moreover, we identified the differentially expressed proteins associated with histological grade of HCC and AFP level and found that aldo-keto reductase 1B10 (AKR1B10) is related to well differentiated HCC, keratin 8 (KRT8) is related to both histological grade and AFP level and protein disulfide isomerase-associated 3 (PDIA3) is associated with both HCC and AFP level. Our pilot study provides new insights on understanding the pathogenesis of HCC, histological grade and AFP level.


Subject(s)
Hepatocellular Carcinoma/chemistry, Liver Neoplasms/chemistry, Proteomics, Adult, Aged, Two-Dimensional Gel Electrophoresis, Female, Humans, Male, Middle Aged, Neoplasm Proteins/analysis, alpha-Fetoproteins/metabolism
14.
J Chem Inf Model ; 48(3): 575-82, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18278890

ABSTRACT

We propose a hypothesis that "a model of active compound can be provided by integrating information of compounds high-ranked by docking simulation of a random compound library". In our hypothesis, the inclusion of true active compounds in the high-ranked compound is not necessary. We regard the high-ranked compounds as being pseudo-active compounds. As a method to embody our hypothesis, we introduce a pseudo-structure-activity relationship (PSAR) model. Although the PSAR model is the same as a quantitative structure activity relationship (QSAR) model, in terms of statistical methodology, the implications of the training data are different. Known active compounds (ligands) are used as training data in the QSAR model, whereas the pseudo-active compounds are used in the PSAR model. In this study, Random Forest was used as a machine-learning algorithm. From tests for four functionally different targets, estrogen receptor antagonist (ER), thymidine kinase (TK), thrombin, and acetylcholine esterase (AChE), using five scoring functions, we obtained three conclusions: (1) the PSAR models significantly gave higher percentages of known ligands found than random sampling, and these results are sufficient to support our hypothesis; (2) the PSAR models gave higher percentages of known ligands found than normal scoring by scoring function, and these results demonstrate the practical usefulness of the PSAR model; and (3) the PSAR model can assess compounds failed in the docking simulation. Note that PSAR and QSAR models are used in different situations; the advantage of the PSAR model emerges when no ligand is available as training data or when one wants to find novel types of ligands, whereas the QSAR model is effective for finding compounds similar to known ligands when the ligands are already known.


Subject(s)
Molecular Models, Proteins/chemistry, ROC Curve, Structure-Activity Relationship
15.
J Chem Inf Model ; 48(2): 288-95, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18229906

ABSTRACT

The evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, and scoring functions play significant roles in it. While consensus scoring (CS) generally improves enrichment by compensating for the deficiencies of each scoring function, the strategy of how individual scoring functions are selected remains a challenging task when few known active compounds are available. To address this problem, we propose feature selection-based consensus scoring (FSCS), which performs supervised feature selection with docked native ligand conformations to select complementary scoring functions. We evaluated the enrichments of five scoring functions (F-Score, D-Score, PMF, G-Score, and ChemScore), FSCS, and RCS (rank-by-rank consensus scoring) for four different target proteins: acetylcholine esterase (AChE), thrombin (thrombin), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPARgamma). The results indicated that FSCS was able to select the complementary scoring functions and enhance ligand enrichments and that it outperformed RCS and the individual scoring functions for all target proteins. They also indicated that the performances of the single scoring functions were strongly dependent on the target protein. An especially favorable result with implications for practical drug screening is that FSCS performs well even if only one 3D structure of the protein-ligand complex is known. Moreover, we found that one can infer which scoring functions significantly enrich active compounds by using feature selection before actual docking and that the selected scoring functions are complementary.


Subject(s)
Computer Simulation, Preclinical Drug Evaluation/methods, Animals, Humans, Ligands, Molecular Structure, PPAR gamma/antagonists & inhibitors, Protein Binding, Quantitative Structure-Activity Relationship, Research Design
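
The abstract above compares FSCS against rank-by-rank consensus scoring (RCS). As a minimal sketch of the rank-by-rank idea, the code below ranks compounds under each scoring function separately and averages the ranks. It assumes a higher score is better and breaks ties by input order; the function name, the scoring-function labels `f1` and `f2`, and the scores are hypothetical, and the feature-selection step of FSCS is not shown.

```python
def rank_by_rank(score_table):
    """Average each compound's rank (1 = best) across scoring functions.

    score_table maps a scoring-function name to a list of scores,
    one per compound, where a higher score is assumed to be better.
    """
    n = len(next(iter(score_table.values())))
    rank_sums = [0.0] * n
    for scores in score_table.values():
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for rank, idx in enumerate(order, start=1):
            rank_sums[idx] += rank
    return [s / len(score_table) for s in rank_sums]

# Two hypothetical scoring functions that disagree about three compounds.
mean_ranks = rank_by_rank({"f1": [0.9, 0.5, 0.1], "f2": [0.2, 0.8, 0.4]})
# mean_ranks == [2.0, 1.5, 2.5]: the second compound wins the consensus.
```

Working with ranks instead of raw scores avoids having to put scoring functions with different units and ranges on a common scale.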
16.
Biochem Biophys Res Commun ; 366(1): 186-92, 2008 Feb 01.
Article in English | MEDLINE | ID: mdl-18060859

ABSTRACT

Proteome analysis of human hepatocellular carcinoma (HCC) was done using two-dimensional difference gel electrophoresis. To gain an understanding of the molecular events accompanying HCC development, we compared the protein expression profiles of HCC and non-HCC tissue from 14 patients to the mRNA expression profiles of the same samples made from a cDNA microarray. A total of 125 proteins were identified, and the expression profiles of 93 proteins (149 spots) were compared to the mRNA expression profiles. The overall protein expression ratios correlated well with the mRNA ratios between HCC and non-HCC (Pearson's correlation coefficient: r=0.73). Particularly, the HCC/non-HCC expression ratios of proteins involved in metabolic processes showed significant correlation to those of mRNA (r=0.9). A considerable number of proteins were expressed as multiple spots. Among them, several proteins showed spot-to-spot differences in expression level and their expression ratios between HCC and non-HCC poorly correlated to mRNA ratios. Such multi-spotted proteins might arise as a consequence of post-translational modifications.


Subject(s)
Tumor Biomarkers/metabolism, Hepatocellular Carcinoma/metabolism, Liver Neoplasms/metabolism, Liver/metabolism, Neoplasm Proteins/metabolism, Proteome/metabolism, Transcription Factors/metabolism, Aged, Female, Gene Expression Profiling, Humans, Male, Middle Aged, Cultured Tumor Cells
17.
J Chem Inf Model ; 47(5): 1858-67, 2007.
Article in English | MEDLINE | ID: mdl-17685604

ABSTRACT

Protein-ligand docking programs have been used to efficiently discover novel ligands for target proteins from large-scale compound databases. However, better scoring methods are needed. Generally, scoring functions are optimized by means of various techniques that affect their fitness for reproducing X-ray structures and protein-ligand binding affinities. However, these scoring functions do not always work well for all target proteins. A scoring function should be optimized for a target protein to enhance enrichment for structure-based virtual screening. To address this problem, we propose the supervised scoring model (SSM), which takes into account the protein-ligand binding process using docked ligand conformations with supervised learning for optimizing scoring functions against a target protein. SSM employs a rough linear correlation between binding free energy and the root mean square deviation of a native ligand for predicting binding energy. We applied SSM to the FlexX scoring function, that is, F-Score, with five different target proteins: thymidine kinase (TK), estrogen receptor (ER), acetylcholine esterase (AChE), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPARgamma). For these five proteins, SSM always enhanced enrichment better than F-Score, exhibiting superior performance that was particularly remarkable for TK, AChE, and PPARgamma. We also demonstrated that SSM is especially good at enhancing enrichments of the top ranks of screened compounds, which is useful in practical drug screening.


Subject(s)
RNA/chemistry, RNA/drug effects, Algorithms, Base Pairing, DNA/chemistry, DNA/drug effects, Drug Design, Knowledge Bases, Ligands, Magnetic Resonance Spectroscopy, Molecular Conformation, Proteins/chemistry, Proteins/drug effects, Reproducibility of Results, Structure-Activity Relationship, Theophylline/chemistry, Theophylline/pharmacology
18.
J Chem Inf Model ; 47(2): 526-34, 2007.
Article in English | MEDLINE | ID: mdl-17295466

ABSTRACT

Docking programs are widely used to discover novel ligands efficiently and can predict protein-ligand complex structures with reasonable accuracy and speed. However, there is an emerging demand for better performance from the scoring methods. Consensus scoring (CS) methods improve the performance by compensating for the deficiencies of each scoring function. However, conventional CS and existing scoring functions have the same problems, such as a lack of protein flexibility, inadequate treatment of solvation, and the simplistic nature of the energy function used. Although there are many problems in current scoring functions, we focus our attention on the incorporation of unbound ligand conformations. To address this problem, we propose supervised consensus scoring (SCS), which takes into account the protein-ligand binding process using unbound ligand conformations with supervised learning. An evaluation of docking accuracy for 100 diverse protein-ligand complexes shows that SCS outperforms both CS and 11 scoring functions (PLP, F-Score, LigScore, DrugScore, LUDI, X-Score, AutoDock, PMF, G-Score, ChemScore, and D-score). The success rates of SCS range from 89% to 91% in the range of rmsd < 2 Å, while those of CS range from 80% to 85%, and those of the scoring functions range from 26% to 76%. Moreover, we also introduce a method for judging whether a compound is active or inactive with the appropriate criterion for virtual screening. SCS performs quite well in docking accuracy and is presumably useful for screening large-scale compound databases before predicting binding affinity.


Subject(s)
Computational Biology, Proteins/chemistry, Proteins/metabolism, Ligands, Molecular Conformation, Protein Binding
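
The success rates quoted in the abstract above are stated for rmsd < 2 Å. A plausible reading, assumed here, is that a complex counts as a success when the top-scored docking pose lies within 2 Å rmsd of the crystal-structure ligand. Under that assumption the metric reduces to a simple fraction, as sketched below with hypothetical rmsd values.

```python
def docking_success_rate(top_pose_rmsds, threshold=2.0):
    """Fraction of complexes whose top-scored pose has rmsd below threshold (in Å)."""
    return sum(r < threshold for r in top_pose_rmsds) / len(top_pose_rmsds)

# Hypothetical rmsd values (Å) of the top-scored pose for four complexes.
rate = docking_success_rate([0.8, 1.9, 2.0, 3.5])
# rate == 0.5: the comparison is strict, so the pose at exactly 2.0 Å fails.
```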
19.
Proc Natl Acad Sci U S A ; 102(41): 14854-9, 2005 Oct 11.
Article in English | MEDLINE | ID: mdl-16199521

ABSTRACT

Alzheimer's disease (AD) is a neurodegenerative disease with an insidious onset and progressive course that inevitably leads to death. The current diagnostic tools do not allow for diagnosis until the disease has led to irreversible brain damage. Genetic studies of autosomal dominant early onset familial AD have identified three causative genes: amyloid precursor protein (APP), presenilin 1 and 2 (PSEN1 and PSEN2). We performed a global gene expression analysis on fibroblasts from 33 individuals (both healthy and demented mutation carriers as well as wild-type siblings) from three families segregating the APP(SWE), APP(ARC) and PSEN1 H163Y mutations, respectively. The mutations cause hereditary progressive cognitive disorder, including typical autosomal dominant AD. Our data show that the mutation carriers share a common gene expression profile significantly different from that of their wild-type siblings. The results indicate that the disease process starts several decades before the onset of cognitive decline, suggesting that presymptomatic diagnosis of AD and other progressive cognitive disorders may be feasible in the near future.


Subject(s)
Alzheimer Disease/genetics, Amyloid beta-Protein Precursor/metabolism, Gene Expression Profiling, Membrane Proteins/genetics, Membrane Proteins/metabolism, Mutation/genetics, Amyloid beta-Protein Precursor/genetics, Cluster Analysis, Fibroblasts/metabolism, Dominant Genes/genetics, Genetic Testing/methods, Humans, Computer-Assisted Image Processing, Microarray Analysis, Presenilin-1, Principal Component Analysis
20.
FEBS Lett ; 579(13): 2878-82, 2005 May 23.
Article in English | MEDLINE | ID: mdl-15878553

ABSTRACT

Small interfering RNAs (siRNAs) are becoming widely used for sequence-specific gene silencing in mammalian cells, but designing an effective siRNA is still a challenging task. In this study, we developed an algorithm for predicting siRNA functionality by using a generalized string kernel (GSK) combined with a support vector machine (SVM). With GSK, siRNA sequences were represented as vectors in a multi-dimensional feature space according to the numbers of subsequences in each siRNA, and subsequently classified with SVM into effective or ineffective siRNAs. We applied this algorithm to published siRNAs, and could classify effective and ineffective siRNAs with 90.6% and 86.2% accuracy, respectively.


Subject(s)
Gene Silencing, Genetic Vectors, Small Interfering RNA/physiology, Algorithms
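
The abstract above represents each siRNA as a vector of subsequence counts. A simplified stand-in for that idea is the k-mer spectrum kernel, sketched below: each sequence is mapped to counts of its contiguous length-k subsequences, and the kernel is the dot product of two count vectors. This is not the paper's generalized string kernel, whose exact definition the abstract does not give; the function names, the choice of contiguous k-mers, and the example sequences are assumptions.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every contiguous subsequence of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k):
    """Dot product of the k-mer count vectors of sequences a and b."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers absent from b, so only shared k-mers count.
    return sum(counts_a[m] * counts_b[m] for m in counts_a)

# Hypothetical short RNA fragments, compared on dinucleotide (k = 2) content.
similarity = spectrum_kernel("AUAU", "AU", k=2)
# similarity == 2: "AU" occurs twice in the first fragment and once in the second.
```

A kernel value like this can be supplied to any kernel-based classifier, including an SVM, without ever building the full feature vectors explicitly.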