ABSTRACT
Microflora are actively used in industry to produce value-added materials, and the density of each cell type must be controlled for stable use of the microflora. In this study, a simple system for evaluating cell density was constructed with artificial intelligence (AI) using absorbance spectra of microflora. A machine learning-based prediction system for cell density was built using, as features, spectra from mixtures of Saccharomyces cerevisiae and Chlamydomonas reinhardtii. When cell density was predicted by extremely randomized trees, the coefficient of determination (R²) was 0.8495 when the cell densities of S. cerevisiae and C. reinhardtii were shifted and fixed, respectively, and 0.9232 when they were fixed and shifted, respectively. To explain the prediction system, the extremely randomized trees regressor, a decision tree-based ensemble learning method, was used as the machine learning algorithm, and Shapley additive explanations (SHAP) were used as the explainable AI (XAI) method to interpret which features contribute to the predictions. The SHAP analyses indicated that not only the optical density but also the absorbance of the Soret and Q bands derived from the chloroplasts of C. reinhardtii could contribute to the prediction as features. This simple cell density evaluation system could have an industrial impact.
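The regression step can be illustrated with a minimal sketch, assuming scikit-learn's ExtraTreesRegressor and synthetic two-component spectra. The basis spectra, wavelength grid, noise level, and hyperparameters are hypothetical stand-ins, not the data or settings of the study, and impurity-based feature importances are used here in place of SHAP values.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 750, 36)  # nm, hypothetical 10 nm grid

# Hypothetical basis spectra: a flat scattering-like baseline for the yeast,
# and a baseline plus two Gaussian pigment peaks for the alga.
yeast = np.full_like(wavelengths, 0.5)
alga = (0.3
        + np.exp(-((wavelengths - 440) / 15) ** 2)
        + 0.6 * np.exp(-((wavelengths - 680) / 12) ** 2))

# Random mixtures: each spectrum is a density-weighted sum plus noise.
n = 300
d_yeast = rng.uniform(0.1, 1.0, n)
d_alga = rng.uniform(0.1, 1.0, n)
X = (np.outer(d_yeast, yeast) + np.outer(d_alga, alga)
     + rng.normal(0, 0.01, (n, wavelengths.size)))
y = d_alga  # target: density of one species in the mixture

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
r2 = r2_score(y_te, model.predict(X_te))

# Wavelengths the trees relied on most (impurity-based, not SHAP).
top = wavelengths[np.argsort(model.feature_importances_)[::-1][:3]]
print(f"R2 = {r2:.3f}; most important wavelengths (nm): {top}")
```

Swapping the target to `d_yeast` gives the complementary case in which the other species is the one being predicted.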
ABSTRACT
This study aimed to predict the risk of Alzheimer-type dementia in persons aged over 75 years who were not receiving long-term care services, using regularly collected claim data. A refined dataset including 48,123 persons was prepared from claim data of health insurance and long-term care insurance in a large city in the metropolitan area in Japan. The features used include the age and sex of subjects, 502 diseases based on ICD-10 diagnosis codes, and 107 prescription drugs based on therapeutic classes. The most important challenge in this work was feature selection from a large number of features. We adopted sparse logistic regression models with L0 regularization (SLR-L0) and L1 regularization (SLR-L1) as classification models based on machine learning. These regularizations enable feature selection by estimating sparse solutions with few non-zero coefficients during model optimization. Predictions were performed by integrating 100 predictors trained on bootstrap samples. As a result, the areas under the ROC curves (AUCs) were 0.663 for SLR-L0 and 0.660 for SLR-L1. These performances were similar; however, the average numbers of selected features, out of a total of 611, were 13 for SLR-L0 and 253 for SLR-L1. The results indicate that SLR-L1 tended to include less useful features, whereas SLR-L0 narrowed down influential features. SLR-L0 might be more useful than SLR-L1 for practical use or for the discussion of risk factors with medical experts.
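A minimal sketch of the L1 variant with a bootstrap ensemble is given here, assuming scikit-learn's LogisticRegression with the liblinear solver. The synthetic 0/1 features, the true weights, the regularization strength C, and the choice to integrate predictors by averaging predicted probabilities are all assumptions made for illustration. L0 regularization is not available in scikit-learn and is not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_train, n_test, n_features = 400, 200, 30

# Synthetic 0/1 flags standing in for diagnosis and prescription features;
# only the first three influence the outcome (hypothetical weights).
X = rng.integers(0, 2, size=(n_train + n_test, n_features)).astype(float)
true_w = np.zeros(n_features)
true_w[:3] = [1.5, -1.5, 1.0]
logits = X @ true_w - 0.5
y = (rng.random(n_train + n_test) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

n_predictors = 100
probs = np.zeros(n_test)
n_selected = []
for _ in range(n_predictors):
    idx = rng.integers(0, n_train, size=n_train)  # bootstrap sample (with replacement)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X_tr[idx], y_tr[idx])
    probs += clf.predict_proba(X_te)[:, 1]
    # Features with a non-zero coefficient are the ones this predictor selected.
    n_selected.append(int(np.count_nonzero(clf.coef_)))
probs /= n_predictors  # integrate the predictors by averaging probabilities

auc = roc_auc_score(y_te, probs)
print(f"AUC = {auc:.3f}; mean selected features = "
      f"{np.mean(n_selected):.1f} of {n_features}")
```

Lowering C strengthens the penalty and shrinks the number of selected features, which is the knob that trades sparsity against fit in the L1 case.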
Subject(s)
Alzheimer Disease, Aged, Alzheimer Disease/diagnosis, Humans, Japan, Logistic Models, Machine Learning
ABSTRACT
Mutational activation of the Ras oncogene products (H-Ras, K-Ras, and N-Ras) is frequently observed in human cancers, making them promising anticancer drug targets. Nonetheless, no effective strategy has been available for the development of Ras inhibitors, partly owing to the absence of well-defined surface pockets suitable for drug binding. Only recently, such pockets have been found in the crystal structures of a unique conformation of Ras·GTP. Here we report the successful development of small-molecule Ras inhibitors by an in silico screen targeting a pocket found in the crystal structure of M-Ras·GTP carrying an H-Ras-type substitution P40D. The selected compound Kobe0065 and its analog Kobe2602 exhibit inhibitory activity toward H-Ras·GTP-c-Raf-1 binding both in vivo and in vitro. They effectively inhibit both anchorage-dependent and -independent growth and induce apoptosis of H-ras(G12V)-transformed NIH 3T3 cells, which is accompanied by down-regulation of downstream molecules such as MEK/ERK, Akt, and RalA as well as an upstream molecule, Son of sevenless. Moreover, they exhibit antitumor activity on a xenograft of human colon carcinoma SW480 cells carrying the K-ras(G12V) gene by oral administration. The NMR structure of a complex of the compound with H-Ras·GTP(T35S), exclusively adopting the unique conformation, confirms its insertion into one of the surface pockets and provides a molecular basis for binding inhibition toward multiple Ras·GTP-interacting molecules. This study proves the effectiveness of our strategy for structure-based drug design to target Ras·GTP, and the resulting Kobe0065-family compounds may serve as a scaffold for the development of Ras inhibitors with higher potency and specificity.
Subject(s)
Antineoplastic Agents/pharmacology, Drug Design, ras Proteins/antagonists & inhibitors, ras Proteins/metabolism, Animals, Transformed Cell Line, Tumor Cell Line, Computational Biology/methods, Glutathione Transferase/metabolism, Guanosine Triphosphate/chemistry, Humans, Mice, Nude Mice, Molecular Models, Molecular Conformation, Mutation, NIH 3T3 Cells, Neoplasm Transplantation, Protein Binding, Protein Conformation, Signal Transduction
ABSTRACT
The femtomolar-affinity mutant antibody (4M5.3) generated by directed evolution is of interest because it demonstrates the potential of antibody engineering. In this study, the mutant and its wild type (4-4-20) were compared in terms of antigen-antibody interactions and structural flexibility to elucidate the effects of directed evolution. For this purpose, multiple steered molecular dynamics (SMD) simulations were performed. The pulling forces of the SMD simulations revealed the regions that form strong attractive interactions in the binding pocket. Structural analysis of these regions showed two mutations that are important for improving attractive interactions. First, mutation of Tyr102(H) to Ser (sequence numbering of Protein Data Bank entry 1FLR) played a role in resolving the steric hindrance on the pathway of the antigen in the binding pocket. Second, mutation of Asp31(H) to His played a role in resolving electrostatic repulsion. Potentials of mean force (PMFs) of both the wild type and the mutant showed landscapes that do not include obvious intermediate states and go directly to the bound state. These landscapes were regarded as funnel-like binding free energy landscapes. Furthermore, the structural flexibility based on the fluctuations of the positions of atoms was analyzed. The fluctuations in the positions of the antigen and of residues in contact with the antigen tended to be smaller in the mutant than in the wild type. This result suggested that structural flexibility decreases as affinity is improved by directed evolution. This suggestion is similar to the relationship between affinity and flexibility for in vivo affinity maturation, which was suggested by Romesberg and co-workers [Jimenez, R., et al. (2003) Proc. Natl. Acad. Sci. U.S.A. 100, 92-97]. Consequently, the relationship was found to be applicable up to femtomolar affinity levels.
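One common way to quantify positional fluctuations is the per-atom root-mean-square fluctuation (RMSF) over trajectory frames. The sketch below assumes that measure and assumes the frames have already been superimposed on a reference; the abstract does not state which fluctuation measure was used, and the two-frame trajectory is a toy input.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per atom, with all frames already aligned to a common reference.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for a in range(n_atoms):
        # Time-averaged position of atom a.
        mean = [sum(frame[a][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared displacement from that average.
        msd = sum(
            sum((frame[a][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Toy trajectory: atom 0 is static, atom 1 moves between x = 1 and x = 3.
traj = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
values = rmsf(traj)
print(values)  # [0.0, 1.0]
```

Comparing such values residue by residue between two systems is one way to express the statement that fluctuations are smaller in one than in the other.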
Subject(s)
Antibody Affinity, Antigens/immunology, Molecular Dynamics Simulation, Single-Chain Antibodies/chemistry, Single-Chain Antibodies/immunology, Directed Molecular Evolution, Protein Conformation, Single-Chain Antibodies/genetics
ABSTRACT
The cytochrome P450 enzyme engineered for enhancement of vitamin D(3) (VD(3)) hydroxylation activity, Vdh-K1, includes four mutations (T70R, V156L, E216M, and E384R) compared to the wild-type enzyme. Plausible roles for V156L, E216M, and E384R have been suggested by crystal structure analysis (Protein Data Bank 3A50), but the role of T70R, which is located at the entrance of the substrate access channel, remained unclear. In this study, the role of the T70R mutation was investigated using computational approaches. Molecular dynamics (MD) simulations and steered molecular dynamics (SMD) simulations were performed, and R70 and T70 were compared in terms of structural change, binding free energy change (PMF), and interaction force between the enzyme and substrate. MD simulations revealed that R70 forms a salt bridge with D42 and that the salt bridge affects the locations and conformations of VD(3) in the bound state. SMD simulations revealed that the salt bridge tends to be formed strongly when VD(3) passes through the binding pocket. PMFs showed that the T70R mutation leads to energetic stabilization of enzyme-VD(3) binding in the region near the heme active site. Interestingly, these results indicate that the D42-R70 salt bridge at the entrance of the substrate access channel affects the region near the heme active site where the hydroxylation of VD(3) occurs; i.e., it is thought that the T70R mutation plays an important role in enhancing VD(3) hydroxylation activity. A significant future challenge is to compare the hydroxylation activities of R70 and T70 directly by quantum chemical calculation, and the three-dimensional coordinates of the enzyme and VD(3) obtained from the MD and SMD simulations will be available for that purpose.
Subject(s)
Cholecalciferol/metabolism, Cytochrome P-450 Enzyme System/genetics, Molecular Dynamics Simulation, Binding Sites, Cytochrome P-450 Enzyme System/metabolism, Hydroxylation, Mutation, Steroid Hydroxylases/metabolism, Thermodynamics
ABSTRACT
To improve the performance of a single scoring function used in a protein-ligand docking program, we developed a bootstrap-based consensus scoring (BBCS) method, which is based on ensemble learning. BBCS combines multiple scorings, each of which has the same functional form but a different energy-parameter set. These multiple energy-parameter sets are generated in two steps: (1) generation of training sets by a bootstrap method and (2) optimization of an energy-parameter set against each training set by a Z-score approach, which is based on energy landscape theory as used in protein folding. In this study, we applied BBCS to the FlexX scoring function. Using 50 given complexes, we generated 100 training sets and obtained 100 optimized energy-parameter sets. These parameter sets were tested against 48 complexes different from the training sets. BBCS was shown to be an improvement over single scoring with a parameter set optimized by the same Z-score approach. Comparing BBCS with the original FlexX scoring function, we found that (1) the success rate of recognizing the crystal structure at the top relative to decoys increased from 33.3% to 52.1% and that (2) the rank of the crystal structure improved for 54.2% of the complexes and worsened for none. We also found that BBCS performed better than conventional consensus scoring (CS).
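The two generation steps can be sketched as follows, assuming the usual form of the Z-score (the energy of the native structure relative to the mean and standard deviation of decoy energies) and plain resampling with replacement for the bootstrap. The complex names and energies are hypothetical, and the parameter optimization itself and the way BBCS combines the resulting scorings are not shown.

```python
import random
import statistics

def z_score(native_energy, decoy_energies):
    """Z-score of the native pose against its decoys (more negative = better separated)."""
    mu = statistics.mean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)
    return (native_energy - mu) / sigma

def bootstrap_sets(items, n_sets, seed=0):
    """Draw n_sets training sets, each sampled with replacement to the original size."""
    rng = random.Random(seed)
    return [rng.choices(items, k=len(items)) for _ in range(n_sets)]

# Step 1: 100 bootstrap training sets from 50 complexes (hypothetical names).
complexes = [f"complex_{i:02d}" for i in range(50)]
training_sets = bootstrap_sets(complexes, n_sets=100)

# Step 2 would tune an energy-parameter set per training set so that the
# Z-score of each native structure becomes as negative as possible.
# Here only the objective is evaluated, on made-up energies.
z = z_score(-10.0, [-4.0, -6.0, -4.0, -6.0])
print(len(training_sets), len(training_sets[0]), z)  # 100 50 -5.0
```

Because each bootstrap set omits some complexes and repeats others, the optimized parameter sets differ from one another, which is what gives the ensemble its diversity.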
Subject(s)
Artificial Intelligence, Proteins/chemistry, Proteins/metabolism, X-Ray Crystallography, Ligands, Computer Neural Networks, Protein Binding, Reproducibility of Results
ABSTRACT
Because the evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, scoring functions play a significant role in it. However, it is known that a scoring function does not always work well for all target proteins. When one cannot know a priori which scoring function works best against a target protein, there is no standard method for determining it, even if the 3D structure of a target protein-ligand complex is available. Therefore, developing a method that achieves high enrichment from given scoring functions and the 3D structure of a protein-ligand complex is a crucial and challenging task. To address this problem, we applied SCS (supervised consensus scoring) to virtual screening; SCS employs a rough linear correlation between the binding free energy and the root-mean-square deviation (rmsd) of native ligand conformations and incorporates the protein-ligand binding process with docked ligand conformations using supervised learning. We evaluated both the docking poses and enrichments of SCS and five scoring functions (F-Score, G-Score, D-Score, ChemScore, and PMF) for three different target proteins: thymidine kinase (TK), thrombin, and peroxisome proliferator-activated receptor gamma (PPARgamma). Our enrichment studies show that SCS is competitive with or superior to the best single scoring function at the top ranks of the screened database. We found that the enrichments of SCS could be limited by the best scoring function, because SCS is obtained on the basis of the five individual scoring functions. We therefore conclude from our results that SCS works successfully. Moreover, from docking pose analysis, we revealed a connection between enrichment and the average centroid distance of top-scored docking poses. Since SCS requires only one 3D structure of a protein-ligand complex, SCS will be useful for identifying new ligands.
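Enrichment at the top ranks is commonly reported as an enrichment factor: the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole database. The sketch below assumes that definition and a lower-is-better score convention; the abstract does not specify the exact enrichment measure, and the scores and activity labels are made up.

```python
def enrichment_factor(scores, is_active, top_fraction):
    """Enrichment factor in the top fraction of a ranked database.

    scores: one score per compound, lower = better (assumed convention).
    is_active: 1 for known actives, 0 otherwise.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_top = max(1, round(len(scores) * top_fraction))
    hits_top = sum(is_active[i] for i in order[:n_top])
    total_actives = sum(is_active)
    return (hits_top / n_top) / (total_actives / len(scores))

# Ten hypothetical compounds, two of them active.
scores = [-9.0, -8.5, -8.0, -7.0, -6.5, -6.0, -5.5, -5.0, -4.0, -3.0]
is_active = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Top 20% = 2 compounds, 1 active among them; database rate is 2/10.
ef = enrichment_factor(scores, is_active, 0.2)
print(ef)  # 2.5
```

A value of 1 corresponds to random selection, so comparing this number across scoring methods at a fixed top fraction is one way to express which method enriches actives better.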
Subject(s)
Molecular Structure, Ligands, Molecular Models, Proteins/chemistry
ABSTRACT
We propose the hypothesis that "a model of an active compound can be provided by integrating information on compounds ranked highly by docking simulation of a random compound library". In our hypothesis, the inclusion of true active compounds among the high-ranked compounds is not necessary. We regard the high-ranked compounds as pseudo-active compounds. As a method to embody our hypothesis, we introduce a pseudo-structure-activity relationship (PSAR) model. Although the PSAR model is the same as a quantitative structure-activity relationship (QSAR) model in terms of statistical methodology, the implications of the training data are different. Known active compounds (ligands) are used as training data in the QSAR model, whereas the pseudo-active compounds are used in the PSAR model. In this study, Random Forest was used as the machine-learning algorithm. From tests on four functionally different targets, estrogen receptor antagonist (ER), thymidine kinase (TK), thrombin, and acetylcholine esterase (AChE), using five scoring functions, we obtained three conclusions: (1) the PSAR models gave significantly higher percentages of known ligands found than random sampling, and these results are sufficient to support our hypothesis; (2) the PSAR models gave higher percentages of known ligands found than normal scoring by a scoring function, and these results demonstrate the practical usefulness of the PSAR model; and (3) the PSAR model can assess compounds that failed in the docking simulation. Note that PSAR and QSAR models are used in different situations; the advantage of the PSAR model emerges when no ligand is available as training data or when one wants to find novel types of ligands, whereas the QSAR model is effective for finding compounds similar to known ligands when the ligands are already known.
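The PSAR idea can be sketched as follows, assuming scikit-learn's RandomForestClassifier. The descriptors, docking scores, the 10% cutoff for pseudo-actives, and the use of a classifier rather than a regressor are hypothetical choices made for illustration and are not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_compounds, n_descriptors = 500, 8

# Hypothetical molecular descriptors for a random compound library.
descriptors = rng.normal(size=(n_compounds, n_descriptors))
# Hypothetical docking scores (lower = better), loosely tied to descriptor 0.
docking_scores = -descriptors[:, 0] + rng.normal(scale=0.5, size=n_compounds)

# Label the top 10% by docking score as pseudo-actives; no true actives needed.
n_top = n_compounds // 10
pseudo_active = np.zeros(n_compounds, dtype=int)
pseudo_active[np.argsort(docking_scores)[:n_top]] = 1

# Train on descriptors only, so the model never needs a docking pose.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(descriptors, pseudo_active)

# Score new compounds, including ones for which docking failed.
new_compounds = rng.normal(size=(5, n_descriptors))
psar_scores = model.predict_proba(new_compounds)[:, 1]
print(psar_scores)
```

Because the model takes only descriptors as input, it can rank any compound for which descriptors can be computed, which is how compounds that failed in docking can still be assessed.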
Subject(s)
Molecular Models, Proteins/chemistry, ROC Curve, Structure-Activity Relationship
ABSTRACT
The evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, and scoring functions play a significant role in it. While consensus scoring (CS) generally improves enrichment by compensating for the deficiencies of each scoring function, the strategy for selecting individual scoring functions remains a challenging task when few known active compounds are available. To address this problem, we propose feature selection-based consensus scoring (FSCS), which performs supervised feature selection with docked native ligand conformations to select complementary scoring functions. We evaluated the enrichments of five scoring functions (F-Score, D-Score, PMF, G-Score, and ChemScore), FSCS, and RCS (rank-by-rank consensus scoring) for four different target proteins: acetylcholine esterase (AChE), thrombin, phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPARgamma). The results indicated that FSCS was able to select the complementary scoring functions and enhance ligand enrichments and that it outperformed RCS and the individual scoring functions for all target proteins. They also indicated that the performances of the single scoring functions were strongly dependent on the target protein. An especially favorable result with implications for practical drug screening is that FSCS performs well even if only one 3D structure of the protein-ligand complex is known. Moreover, we found that one can infer which scoring functions significantly enrich active compounds by using feature selection before actual docking and that the selected scoring functions are complementary.
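The rank-by-rank baseline can be sketched as follows, assuming its usual meaning: each compound is ranked separately by every scoring function and the ranks are averaged. The scores are hypothetical, lower is taken as better for every function, and the feature selection step of FSCS is not shown.

```python
def ranks(scores):
    """Rank compounds by one scoring function (1 = best, i.e. lowest score).

    Ties are broken by input order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

def rank_by_rank(score_table):
    """Average each compound's rank over all scoring functions.

    score_table: {function_name: [score per compound]}.
    """
    per_function = [ranks(s) for s in score_table.values()]
    n = len(per_function[0])
    return [sum(r[i] for r in per_function) / len(per_function) for i in range(n)]

# Three hypothetical compounds scored by three functions.
score_table = {
    "F-Score": [-30.0, -25.0, -10.0],
    "G-Score": [-150.0, -200.0, -90.0],
    "PMF": [-60.0, -40.0, -55.0],
}
consensus = rank_by_rank(score_table)
print(consensus)  # mean ranks 4/3, 2, 8/3: compound 0 is best overall
```

Restricting `score_table` to a subset of functions before averaging is the point at which a selection method such as FSCS would enter.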
Subject(s)
Computer Simulation, Preclinical Drug Evaluation/methods, Animals, Humans, Ligands, Molecular Structure, PPAR gamma/antagonists & inhibitors, Protein Binding, Quantitative Structure-Activity Relationship, Research Design
ABSTRACT
Protein-ligand docking programs have been used to efficiently discover novel ligands for target proteins from large-scale compound databases. However, better scoring methods are needed. Generally, scoring functions are optimized by means of various techniques that affect their fitness for reproducing X-ray structures and protein-ligand binding affinities. However, these scoring functions do not always work well for all target proteins. A scoring function should be optimized for a target protein to enhance enrichment for structure-based virtual screening. To address this problem, we propose the supervised scoring model (SSM), which takes into account the protein-ligand binding process using docked ligand conformations with supervised learning for optimizing scoring functions against a target protein. SSM employs a rough linear correlation between binding free energy and the root mean square deviation of a native ligand for predicting binding energy. We applied SSM to the FlexX scoring function, that is, F-Score, with five different target proteins: thymidine kinase (TK), estrogen receptor (ER), acetylcholine esterase (AChE), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPARgamma). For these five proteins, SSM always enhanced enrichment better than F-Score, exhibiting superior performance that was particularly remarkable for TK, AChE, and PPARgamma. We also demonstrated that SSM is especially good at enhancing enrichments of the top ranks of screened compounds, which is useful in practical drug screening.
Subject(s)
RNA/chemistry, RNA/drug effects, Algorithms, Base Pairing, DNA/chemistry, DNA/drug effects, Drug Design, Knowledge Bases, Ligands, Magnetic Resonance Spectroscopy, Molecular Conformation, Proteins/chemistry, Proteins/drug effects, Reproducibility of Results, Structure-Activity Relationship, Theophylline/chemistry, Theophylline/pharmacology
ABSTRACT
Docking programs are widely used to discover novel ligands efficiently and can predict protein-ligand complex structures with reasonable accuracy and speed. However, there is an emerging demand for better performance from the scoring methods. Consensus scoring (CS) methods improve the performance by compensating for the deficiencies of each scoring function. However, conventional CS and existing scoring functions have the same problems, such as a lack of protein flexibility, inadequate treatment of solvation, and the simplistic nature of the energy function used. Although there are many problems in current scoring functions, we focus our attention on the incorporation of unbound ligand conformations. To address this problem, we propose supervised consensus scoring (SCS), which takes into account the protein-ligand binding process using unbound ligand conformations with supervised learning. An evaluation of docking accuracy for 100 diverse protein-ligand complexes shows that SCS outperforms both CS and 11 scoring functions (PLP, F-Score, LigScore, DrugScore, LUDI, X-Score, AutoDock, PMF, G-Score, ChemScore, and D-Score). The success rates of SCS range from 89% to 91% in the range of rmsd < 2 Å, while those of CS range from 80% to 85%, and those of the scoring functions range from 26% to 76%. Moreover, we also introduce a method for judging whether a compound is active or inactive with the appropriate criterion for virtual screening. SCS performs quite well in docking accuracy and is presumably useful for screening large-scale compound databases before predicting binding affinity.
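The success-rate criterion can be made concrete with a short sketch, assuming that a docking run counts as a success when the rmsd between the top-scored pose and the crystal pose is below 2 Å, and that atoms are matched one to one in the same order. The coordinates and rmsd values are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two poses with matched atom order."""
    n = len(coords_a)
    squared = sum(
        (a[k] - b[k]) ** 2
        for a, b in zip(coords_a, coords_b)
        for k in range(3)
    )
    return math.sqrt(squared / n)

def success_rate(rmsds, cutoff=2.0):
    """Fraction of complexes whose top-scored pose lies within the cutoff (in Å)."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)

# A two-atom toy ligand shifted by 1 Å along y.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
pose = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)]
print(rmsd(native, pose))  # 1.0

# Hypothetical rmsd values of the top-scored pose for four complexes.
rate = success_rate([0.8, 1.9, 2.4, 5.0])
print(rate)  # 0.5
```

No superposition is applied here, since docked poses and the crystal pose share the frame of the receptor.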