1.
Langmuir ; 39(17): 5986-5994, 2023 May 02.
Article in English | MEDLINE | ID: mdl-37068184

ABSTRACT

The covalent functionalization of carbon surfaces with nanometer-scale precision is of interest because of its potential in a range of applications. We herein report the controlled grafting of graphite surfaces using electrochemically generated aryl radicals templated by self-assembled molecular networks (SAMNs) of bisalkylurea derivatives. A bisalkylurea derivative having two butoxy units acts as a template for the covalent functionalization of aryl groups in between self-assembled rows of this molecule. In contrast, grafting occurs without a spatial order when an SAMN of bis(tetradecyl)urea was used as a template. This indicates that a degree of dynamics at the alkyl termini is required to favor controlled covalent attachment, a situation that is suppressed by strong intrarow intermolecular interactions resulting from the hydrogen bonding of the urea groups, but favored by terminal short alkoxy groups. The present information is useful for understanding the mechanism of the template-guided aryl radical grafting and the molecular design of new generations of template molecules.

2.
J Chem Inf Model ; 63(3): 794-805, 2023 02 13.
Article in English | MEDLINE | ID: mdl-36635071

ABSTRACT

Herein, we propose a de novo direct inverse quantitative structure-property relationship/quantitative structure-activity relationship (QSPR/QSAR) analysis method, based on the chemical variational autoencoder (VAE) and Gaussian mixture regression (GMR) models, to generate molecules with the desired target variables of interest for properties and activities (y). A data set of molecules was analyzed, and an encoder was used to transform the simplified molecular input line entry system (SMILES) strings to latent variables (x), while a decoder was used to transform x to SMILES strings. A chemical VAE model was used for analysis and a GMR model (between x and y) was constructed for direct inverse analysis. The target y values were input into the GMR model to directly predict the x values. Following this, the predicted x values were input into the decoder associated with the chemical VAE model and the SMILES string representations (or chemical structures of molecules) were obtained as the output, indicating that the proposed method could be used to selectively obtain the molecules that were characterized by the target y values. We confirmed that the proposed method can be used to generate molecules within the target y ranges even when the conventional chemical VAE model failed to generate the target molecules.


Subjects
Chemical Models , Quantitative Structure-Activity Relationship , Normal Distribution
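
The direct inverse step described in this abstract (feeding a target y into a GMR model to obtain latent x) can be illustrated with a toy sketch. It assumes a one-dimensional x and y and hand-written mixture parameters; the function names and the dictionary layout are hypothetical, and neither the chemical VAE encoder/decoder nor the fitting of the mixture is included.

```python
import math


def normal_pdf(value, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (value - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict_x(y_target, components):
    """Predict x from a target y with a joint Gaussian mixture over (x, y).

    Each component is a dict with keys (hypothetical layout):
    weight, mean_x, mean_y, var_y, cov_xy.
    """
    # Responsibility of each component for the observed y value.
    resp = [c["weight"] * normal_pdf(y_target, c["mean_y"], c["var_y"]) for c in components]
    total = sum(resp)
    # Conditional mean of x given y inside each Gaussian component.
    cond = [c["mean_x"] + c["cov_xy"] / c["var_y"] * (y_target - c["mean_y"]) for c in components]
    # Responsibility-weighted average of the conditional means.
    return sum(r / total * m for r, m in zip(resp, cond))


if __name__ == "__main__":
    toy = [
        {"weight": 0.5, "mean_x": -5.0, "mean_y": -1.0, "var_y": 1.0, "cov_xy": 0.0},
        {"weight": 0.5, "mean_x": 5.0, "mean_y": 1.0, "var_y": 1.0, "cov_xy": 0.0},
    ]
    print(gmr_predict_x(0.9, toy))
```

In the workflow the abstract describes, the predicted x would then be passed to the VAE decoder to obtain a SMILES string; that step is omitted here.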
3.
J Chem Inf Model ; 63(18): 5764-5772, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37655841

ABSTRACT

Highly active catalysts are required in numerous industrial fields; therefore, to minimize costs and development time, catalyst design using machine learning has attracted significant attention. This study focused on a reaction system where two types of cross-coupling reactions, namely, Buchwald-Hartwig type cross-coupling (BHCC) and Suzuki-Miyaura type cross-coupling (SMCC) reactions, occur simultaneously. Constructing a machine-learning model that considers all experimental conditions is essential to accurately predict the product yield for both the BHCC and the SMCC reactions. The objective of this study was to establish explanatory variables x that considered all experimental conditions within the reaction system involving simultaneous cross-couplings and to design catalysts that achieve the target yield and the development of novel reactions. To accomplish this, Bayesian optimization was combined with established variables x to design new catalysts and enhance reaction selectivity. Moreover, the catalyst design in this study successfully pioneered new reactions involving Cu, Rh, and Pt catalysts in a reaction system that did not previously react with transition metals other than Ni or Pd.


Subjects
Bayes Theorem , Catalysis
4.
J Phys Chem A ; 126(36): 6336-6347, 2022 Sep 15.
Article in English | MEDLINE | ID: mdl-36053017

ABSTRACT

Materials exhibiting higher mobility than conventional organic semiconducting materials, such as fullerenes and fused thiophenes, are in high demand for applications in printed electronics. To discover new molecules that might show improved charge mobility, an adaptive design of experiments (DoE) targeting molecules with low reorganization energy was performed by combining density functional theory (DFT) methods and machine learning techniques. DFT-calculated values of 165 molecules were used as an initial training dataset for a Gaussian process regression (GPR) model, and five rounds of molecular design applying the GPR model and validation via DFT calculations were executed. As a result, new molecules whose reorganization energy is smaller than the lowest value in the initial training dataset were successfully discovered.

5.
J Chem Inf Model ; 61(12): 5785-5792, 2021 12 27.
Article in English | MEDLINE | ID: mdl-34898202

ABSTRACT

Metal-organic frameworks (MOFs) are materials in which metals and organic compounds form crystalline and porous structures. Previous studies have investigated the relationships between the structure properties and physical properties of MOFs through molecular simulations, but the overall relationships in MOFs, including the relationships between the metals and organic components and the experimentally measured physical properties, have not been clarified. In this study, we developed two regression models between three elements in MOFs: the components, structure properties, and gas-adsorption capacities as physical properties. Using a nonlinear regression analysis method, we succeeded in predicting the structure properties from the components and the physical properties from the structure properties.


Subjects
Metal-Organic Frameworks , Adsorption , Metal-Organic Frameworks/chemistry , Metals/chemistry , Organic Chemicals , Porosity
6.
J Am Chem Soc ; 142(16): 7699-7708, 2020 Apr 22.
Article in English | MEDLINE | ID: mdl-32212655

ABSTRACT

Controlled covalent functionalization of graphitic surfaces with molecular-scale precision is crucial for tailored modulation of the chemical and physical properties of carbon materials. We herein show that porous self-assembled molecular networks (SAMNs) act as nanometer-scale templates for the covalent electrochemical functionalization of graphite using an aryldiazonium salt. Hexagonally aligned achiral grafted species with lateral periodicity of 2.3, 2.7, and 3.0 nm were achieved utilizing SAMNs having different pore-to-pore distances. The unit cell vectors of the grafted pattern match those of the SAMN. After the covalent grafting, the template SAMNs can be removed by simple washing with a common organic solvent. We briefly discuss the mechanism of the observed pattern transfer. The unit cell vectors of the grafted pattern align along nonsymmetry axes of graphite, leading to mirror image grafted domains, in accordance with the domain-specific chirality of the template. In the case in which a homochiral building block is used for SAMN formation, one of the 2D mirror image grafted patterns is canceled. This is the first example of a nearly crystalline one-sided or supratopic covalent chemical functionalization. In addition, the positional control imposed by the SAMN renders the functionalized surface (homo)chiral, reaching a novel level of control for the functionalization of carbon surfaces, including surface-supported graphene.

7.
J Chem Inf Model ; 58(12): 2528-2535, 2018 12 24.
Article in English | MEDLINE | ID: mdl-30352147

ABSTRACT

To achieve simultaneous data visualization and clustering, the method of sparse generative topographic mapping (SGTM) is developed by modifying the conventional GTM algorithm. While the weight of each grid point is constant in the original GTM, it becomes a variable in the proposed SGTM, enabling data points to be clustered on two-dimensional maps. The appropriate number of clusters is determined by optimization based on the Bayesian information criterion. Analysis of numerical simulation data sets along with quantitative structure-property relationship and quantitative structure-activity relationship data sets confirmed that the proposed SGTM provides the same degree of visualization performance as the original GTM and clusters data points appropriately. Python and MATLAB codes for the proposed algorithm are available at https://github.com/hkaneko1985/gtm-generativetopographicmapping .


Subjects
Data Visualization , Chemical Databases , Machine Learning , Cluster Analysis , Environmental Pollutants/chemistry , Environmental Pollutants/toxicity , Quantitative Structure-Activity Relationship , Solubility
8.
J Chem Inf Model ; 58(2): 480-489, 2018 02 26.
Article in English | MEDLINE | ID: mdl-29425038

ABSTRACT

To develop a new ensemble learning method and construct highly predictive regression models in chemoinformatics and chemometrics, applicability domains (ADs) are introduced into the ensemble learning process of prediction. When estimating values of an objective variable using subregression models, only the submodels with ADs that cover a query sample, i.e., the sample is inside the model's AD, are used. By constructing submodels and changing a list of selected explanatory variables, the union of the submodels' ADs, which defines the overall AD, becomes large, and the prediction performance is enhanced for diverse compounds. By analyzing a quantitative structure-activity relationship data set and a quantitative structure-property relationship data set, it is confirmed that the ADs can be enlarged and the estimation performance of regression models is improved compared with traditional methods.


Subjects
Learning , Machine Learning , Molecular Models , Algorithms , Quantitative Structure-Activity Relationship , Regression Analysis , Solubility , Water/chemistry
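
The rule stated in this abstract, that only submodels whose AD covers the query sample contribute to the prediction, can be sketched as follows. This is a minimal illustration under assumptions: each submodel is a hypothetical dict holding a `predict` callable and an `in_domain` callable, and the covering predictions are combined by a simple mean, which the abstract does not specify.

```python
from statistics import mean


def ad_ensemble_predict(query, submodels):
    """Average the predictions of submodels whose AD covers the query.

    Returns None when no submodel covers the query, i.e. the sample lies
    outside the union of the submodels' ADs.
    """
    covering = [m for m in submodels if m["in_domain"](query)]
    if not covering:
        return None
    # Assumption: a plain mean is used to aggregate the covering submodels.
    return mean(m["predict"](query) for m in covering)


if __name__ == "__main__":
    # Two toy submodels with interval-shaped ADs on a scalar input.
    models = [
        {"predict": lambda x: 2.0 * x, "in_domain": lambda x: 0.0 <= x <= 5.0},
        {"predict": lambda x: 2.0 * x + 1.0, "in_domain": lambda x: 3.0 <= x <= 10.0},
    ]
    print(ad_ensemble_predict(4.0, models))
    print(ad_ensemble_predict(20.0, models))
```

Returning `None` for an uncovered query keeps "outside the overall AD" distinct from any numeric estimate.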
9.
AAPS PharmSciTech ; 18(3): 595-604, 2017 Apr.
Article in English | MEDLINE | ID: mdl-27170163

ABSTRACT

This article proposes a novel concentration prediction model that requires little training data and is useful for rapid process understanding. Process analytical technology is currently popular, especially in the pharmaceutical industry, for enhancement of process understanding and process control. A calibration-free method, iterative optimization technology (IOT), was proposed to predict pure component concentrations, because calibration methods, such as partial least squares, require a large number of training samples, leading to high costs. However, IOT cannot be applied to concentration prediction in non-ideal mixtures because its basic equation is derived from the Beer-Lambert law, which cannot be applied to non-ideal mixtures. We proposed a novel method that realizes prediction of pure component concentrations in mixtures from a small number of training samples, assuming that spectral changes arising from molecular interactions can be expressed as a function of concentration. The proposed method is named IOT with virtual molecular interaction spectra (IOT-VIS) because the method takes spectral change into account as a virtual spectrum x_{nonlin,i}. It was confirmed through two case studies that the predictive accuracy of IOT-VIS was the highest among existing IOT methods.


Subjects
Pharmaceutical Preparations/chemistry , Calibration , Drug Industry/methods , Least-Squares Analysis
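
The Beer-Lambert assumption behind plain IOT, namely that an ideal mixture spectrum is a linear combination of pure-component spectra, can be sketched for two components. The least-squares form below, the constraint that the two fractions sum to one, and the function name are assumptions made for illustration; the abstract does not give the IOT equation, and the virtual interaction spectrum of IOT-VIS is not modeled.

```python
def iot_two_component(mixture, pure_a, pure_b):
    """Estimate the fraction r of component A in an ideal binary mixture.

    Assumes mixture ~= r * pure_a + (1 - r) * pure_b at every wavelength and
    solves the one-dimensional least-squares problem in closed form,
    clipping the result to the interval [0, 1].
    """
    diff = [a - b for a, b in zip(pure_a, pure_b)]
    resid = [m - b for m, b in zip(mixture, pure_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("pure spectra are identical; fraction is not identifiable")
    r = sum(d * e for d, e in zip(diff, resid)) / denom
    return min(1.0, max(0.0, r))


if __name__ == "__main__":
    spectrum_a = [1.0, 0.0, 2.0]
    spectrum_b = [0.0, 1.0, 1.0]
    # Synthetic ideal mixture with 30% of component A.
    mix = [0.3 * a + 0.7 * b for a, b in zip(spectrum_a, spectrum_b)]
    print(iot_two_component(mix, spectrum_a, spectrum_b))
```

For a non-ideal mixture the residual of this fit no longer vanishes, which is the limitation the abstract says IOT-VIS addresses.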
10.
J Chem Inf Model ; 56(2): 286-99, 2016 Feb 22.
Article in English | MEDLINE | ID: mdl-26818135

ABSTRACT

Retrieving descriptor information (x information) from a value of an objective variable (y) is a fundamental problem in inverse quantitative structure-property relationship (inverse-QSPR) analysis but challenging because of the complexity of the preimage function. Herewith, we propose using a cluster-wise multiple linear regression (cMLR) model as a QSPR model for inverse-QSPR analysis. x information is acquired as a probability density function by combining cMLR and the prior distribution modeled with a mixture of Gaussians (GMMs). Three case studies were conducted to demonstrate various aspects of the potential of cMLR. It was found that the predictive power of cMLR was superior to that of MLR, especially for data with nonlinearity. Moreover, it turned out that the applicability domain could be considered since the posterior distribution inherits the prior distribution's feature (i.e., training data feature) and represents the possibility of having the desired property. Finally, a series of inverse analyses with the GMMs/cMLR was demonstrated with the aim to generate de novo structures having specific aqueous solubility.


Subjects
Quantitative Structure-Activity Relationship , Chemical Models , Molecular Structure
11.
J Chem Inf Model ; 56(10): 1885-1893, 2016 10 24.
Article in English | MEDLINE | ID: mdl-27632418

ABSTRACT

To discover drug compounds in chemical space containing an enormous number of compounds, a structure generator is required to produce virtual drug-like chemical structures. The de novo design algorithm for exploring chemical space (DAECS) visualizes the activity distribution on a two-dimensional plane corresponding to chemical space and generates structures in a target area on a plane selected by the user. In this study, we modify the DAECS to enable the user to select a target area to consider properties other than activity and improve the diversity of the generated structures by visualizing the drug-likeness distribution and the activity distribution, generating structures by substructure-based structural changes, including addition, deletion, and substitution of substructures, as well as the slight structural changes used in the DAECS. Through case studies using ligand data for the human adrenergic alpha2A receptor and the human histamine H1 receptor, the modified DAECS can generate high diversity drug-like structures, and the usefulness of the modification of the DAECS is verified.


Subjects
Algorithms , Drug Design , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Humans , Ligands , Molecular Docking Simulation , Adrenergic alpha-2 Receptors/chemistry , Adrenergic alpha-2 Receptors/metabolism , Histamine H1 Receptors/chemistry , Histamine H1 Receptors/metabolism
12.
J Comput Aided Mol Des ; 30(5): 425-46, 2016 05.
Article in English | MEDLINE | ID: mdl-27299746

ABSTRACT

Generating chemical graphs in silico by combining building blocks is important and fundamental in virtual combinatorial chemistry. A premise in this area is that generated structures should be nonredundant as well as exhaustive. In this study, we develop structure generation algorithms for combining ring systems as well as atom fragments. The proposed algorithms consist of three parts. First, chemical structures are generated through a canonical construction path. During structure generation, ring systems can be treated as reduced graphs having fewer vertices than the original ones. Second, diversified structures are generated by a simple rule-based generation algorithm. Third, the number of structures to be generated can be estimated with adequate accuracy without actual exhaustive generation. The proposed algorithms were implemented in the structure generator Molgilla. As a practical application, Molgilla generated chemical structures mimicking rosiglitazone in terms of a two-dimensional pharmacophore pattern. The strength of the algorithms lies in their simplicity and flexibility. Therefore, they may be applied to various computer programs for structure generation by combining building blocks.


Subjects
Drug Design , Pharmaceutical Preparations/chemistry , Thiazolidinediones/chemistry , User-Computer Interface , Algorithms , Computer Simulation , Humans , Molecular Structure , Rosiglitazone
13.
J Chem Inf Model ; 54(9): 2469-82, 2014 Sep 22.
Article in English | MEDLINE | ID: mdl-25119661

ABSTRACT

We discuss applicability domains (ADs) based on ensemble learning in classification and regression analyses. In regression analysis, the AD can be appropriately set, although attention needs to be paid to the bias of the predicted values. However, because the AD set in classification analysis is too wide, we propose an AD based on ensemble learning and data density. First, we set a threshold for data density below which the prediction result of new data is not reliable. Then, only for new data with a data density higher than the threshold, we consider the reliability of the prediction result based on ensemble learning. By analyzing data from numerical simulations and quantitative structural relationships, we validate our discussion of ADs in classification and regression analyses and confirm that appropriate ADs can be set using the proposed method.


Subjects
Learning , Theoretical Models , Regression Analysis
14.
ACS Omega ; 9(10): 11453-11458, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38496944

ABSTRACT

In molecular, material, and process design and control, the applicability domain (AD) of a mathematical model y = f(x) between properties or activities y and features x is constructed. As there are multiple AD methods, each with its own set of hyperparameters, it is necessary to select an appropriate AD method and hyperparameters for each data set and mathematical model. However, there is no method for optimizing the AD model. This study proposes a method for evaluating and optimizing the AD model for each data set and mathematical model. Using the predictions of double cross-validation with all samples, the relationship between coverage and root-mean-squared error (RMSE) was calculated for all combinations of AD methods and their hyperparameters, and the area under the coverage and RMSE curve (AUCR) was calculated. The AD model with the lowest AUCR value was selected as the optimal fit for the mathematical model. The proposed method was validated using eight data sets, including molecules, materials, and spectra, demonstrating that the proposed method could generate optimal AD models for all data sets. The Python code for the proposed method is available at https://github.com/hkaneko1985/dcekit.
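
The coverage-versus-RMSE relationship and its area (AUCR) can be sketched in a few lines. This is an illustrative reading of the abstract, not the dcekit implementation: samples are assumed to be added in order of decreasing AD score (most "inside" first), the errors are assumed to come from double cross-validation, and the area is taken with the trapezoidal rule. All function names are hypothetical.

```python
import math


def coverage_rmse_curve(ad_scores, errors):
    """Return (coverages, rmses) as samples are added from most to least inside the AD."""
    order = sorted(range(len(errors)), key=lambda i: ad_scores[i], reverse=True)
    coverages, rmses = [], []
    squared_sum = 0.0
    for count, idx in enumerate(order, start=1):
        squared_sum += errors[idx] ** 2
        coverages.append(count / len(errors))
        rmses.append(math.sqrt(squared_sum / count))
    return coverages, rmses


def aucr(coverages, rmses):
    """Trapezoidal area under the coverage-RMSE curve; lower is better."""
    return sum(
        (coverages[i] - coverages[i - 1]) * (rmses[i] + rmses[i - 1]) / 2.0
        for i in range(1, len(coverages))
    )


if __name__ == "__main__":
    prediction_errors = [0.1, -0.2, 1.5, 0.05]
    good_ad = [0.9, 0.8, 0.1, 0.95]   # low score for the badly predicted sample
    poor_ad = [0.2, 0.3, 0.9, 0.1]    # high score for the badly predicted sample
    print(aucr(*coverage_rmse_curve(good_ad, prediction_errors)))
    print(aucr(*coverage_rmse_curve(poor_ad, prediction_errors)))
```

An AD model that ranks poorly predicted samples as outside the domain keeps the RMSE low at small coverage and therefore yields a smaller area, which matches the selection rule in the abstract.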

15.
ACS Omega ; 9(16): 18488-18494, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38680296

ABSTRACT

Pesticides are widely used to improve crop productivity by eliminating weeds and pests. Conventional pesticide development involves synthesizing compounds, testing their activities, and studying their effects on the ecosystem. However, as pesticide discovery has an extremely low success rate, many compounds must be synthesized and tested. To overcome the high human, financial, and time costs of this process, machine learning is attracting increasing attention. In this study, we used machine learning for the molecular design of novel seed compounds for herbicides and insecticides. Classification models were constructed by using compounds that had been tested as herbicides and insecticides, and an inverse analysis of the constructed models was conducted. In the molecular design of herbicides, we proposed 186 new samples as herbicides using ensemble learning and a method for expressing explanatory variables that consider the relationships among eight weed species. For the molecular design of insecticides, we used undersampling and ensemble learning for the analysis of unbalanced data. Based on approximately 340,000 compounds, 12 potential insecticides were proposed, of which 2 exhibited actual activity when tested. These results demonstrate the potential of the developed machine-learning method for rapidly identifying novel herbicides and insecticides.

16.
Materials (Basel) ; 17(3), 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38591397

ABSTRACT

Hydroxyapatite and β-tricalcium phosphate have been clinically applied as artificial bone materials due to their high biocompatibility. The development of artificial bones requires the verification of safety and efficacy through animal experiments; however, from the viewpoint of animal welfare, it is necessary to reduce the number of animal experiments. In this study, we utilized machine learning to construct a model that estimates the bone-forming ability of bioceramics from material fabrication conditions, material properties, and in vivo experimental conditions. We succeeded in constructing two models: 'Model 1', which predicts material properties from their fabrication conditions, and 'Model 2', which predicts the bone-formation rate from material properties and in vivo experimental conditions. Including the full width at half maximum (FWHM) among the features of Model 2 improved its accuracy. Furthermore, the feature importance results showed that the FWHMs were the most important features. By an inverse analysis of the two models, we proposed candidates for material fabrication conditions to achieve target values of the bone-formation rate. Under the proposed conditions, the material properties of the fabricated material were consistent with the estimated material properties. Furthermore, the bone-formation rates after 12 weeks of implantation in the porcine tibia were compared with the estimated bone-formation rates. This comparison showed that the actual bone-formation rates fell within the error range of the estimated bone-formation rates, indicating that machine learning consistently predicts the results of animal experiments using material fabrication conditions. We believe that these findings will lead to the establishment of alternatives to animal experiments in the development of artificial bones.

17.
J Chem Inf Model ; 53(9): 2341-8, 2013 Sep 23.
Article in English | MEDLINE | ID: mdl-23971910

ABSTRACT

We propose predictive performance criteria for nonlinear regression models without cross-validation. The proposed criteria are the determination coefficient and the root-mean-square error for the midpoints between k-nearest-neighbor data points. These criteria can be used to evaluate predictive ability after the regression models are updated, whereas cross-validation cannot be performed in such a situation. The proposed method is effective and helpful in handling big data when cross-validation cannot be applied. By analyzing data from numerical simulations and quantitative structural relationships, we confirm that the proposed criteria enable the predictive ability of the nonlinear regression models to be appropriately quantified.


Subjects
Informatics/methods , Nonlinear Dynamics , Lethal Dose 50 , Quantitative Structure-Activity Relationship , Regression Analysis , Reproducibility of Results , Solubility , Tetrahymena pyriformis/drug effects , Time Factors , Toxicity Tests , Water/chemistry
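
The criterion described in this abstract, an error evaluated at the midpoints between k-nearest-neighbor data points, can be sketched as below. The sketch assumes that the reference y value at a midpoint is the mean of the two neighbors' y values, which the abstract does not state; the function name and the Euclidean distance are likewise choices made for illustration. Only the RMSE variant is shown.

```python
import math


def midpoint_rmse(xs, ys, predict, k=1):
    """RMSE of a model at midpoints between each sample and its k nearest neighbors."""
    squared_errors = []
    for i, xi in enumerate(xs):
        # Indices of the k nearest neighbors of sample i (Euclidean distance).
        neighbors = sorted(
            (j for j in range(len(xs)) if j != i),
            key=lambda j: math.dist(xi, xs[j]),
        )[:k]
        for j in neighbors:
            x_mid = [(a + b) / 2.0 for a, b in zip(xi, xs[j])]
            y_mid = (ys[i] + ys[j]) / 2.0  # assumed reference value at the midpoint
            squared_errors.append((predict(x_mid) - y_mid) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    features = [[0.0], [1.0], [2.0], [3.0]]
    targets = [0.0, 2.0, 4.0, 6.0]
    smooth_model = lambda x: 2.0 * x[0]
    # A model that fits the training points but oscillates between them.
    wiggly_model = lambda x: 2.0 * x[0] + math.sin(math.pi * x[0])
    print(midpoint_rmse(features, targets, smooth_model))
    print(midpoint_rmse(features, targets, wiggly_model))
```

Both toy models reproduce the training points, but only the oscillating one is penalized at the midpoints, which is how such a criterion can flag overfitting without cross-validation.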
18.
Gan To Kagaku Ryoho ; 40 Suppl 2: 227-9, 2013 Dec.
Article in Japanese | MEDLINE | ID: mdl-24712155

ABSTRACT

Care should be taken regarding the intravenous administration of selenium (Se), an essential element, which is known to be associated with toxemia. The concentration of Se in the serum and hair of 2 patients (patients A and B) with short bowel syndrome, undergoing long-term home parenteral nutrition (HPN), was measured. As nutritional management, commercial total parenteral nutrition infusion was used without restricting oral intake. The patients received sodium selenite (Na2O3Se·5H2O), a hospital preparation, at the Toho University Omori Medical Center. The dosage was gradually increased from 40 µg/week to 120 µg/week over 17 months, and the Se concentration in serum and hair was measured bimonthly using inductively coupled plasma mass spectrometry (ICP-MS). The serum concentration of Se increased from 2.0 to 5.3 µg/dL and from 9.0 to 9.7 µg/dL in patients A and B, respectively; however, it did not reach the average value that was observed in healthy volunteers (11.8 µg/dL). In contrast, the concentration of Se in hair gradually approached the reference value (reference range, 405-784 ppb at color correction criteria range 217-520 ppb) in patient A (change from 189 to 278 ppb) and patient B (change from 291 to 200 ppb). Therefore, we were able to safely manage these cases without any deficiency or poisoning symptoms by gradually increasing the administered doses.


Subjects
Hair/chemistry , Home Parenteral Nutrition , Selenium/analysis , Female , Humans , Intravenous Infusions , Male , Middle Aged , Selenium/administration & dosage , Time Factors
19.
ACS Omega ; 8(25): 23218-23225, 2023 Jun 27.
Article in English | MEDLINE | ID: mdl-37396269

ABSTRACT

Feature importance (FI) is used to interpret the machine learning model y = f(x) constructed between the explanatory variables or features, x, and the objective variables, y. For a large number of features, interpreting the model in the order of increasing FI is inefficient when there are similarly important features. Therefore, in this study, a method is developed to interpret models by considering the similarities between the features in addition to the FI. The cross-validated permutation feature importance (CVPFI), which can be calculated using any machine learning method and can handle multicollinearity problems, is used as the FI, while the absolute correlation and maximal information coefficients are used as metrics of feature similarity. Machine learning models could be effectively interpreted by considering the features from the Pareto fronts, where CVPFI is large and the feature similarity is small. Analyses of actual molecular and material data sets confirm that the proposed method enables the accurate interpretation of machine learning models.
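
Selecting features from a Pareto front where importance is large and similarity is small can be sketched as a plain non-dominated filter. The sketch takes one (importance, similarity) pair per feature as given; how the similarity value is assigned to each feature, and the computation of CVPFI itself, are not specified in the abstract and are not reproduced here. The function name and the toy feature names are hypothetical.

```python
def pareto_front(features):
    """Return names of features not dominated in (high importance, low similarity).

    features maps a feature name to a tuple (importance, similarity).
    """
    front = []
    for name, (imp, sim) in features.items():
        dominated = any(
            o_imp >= imp and o_sim <= sim and (o_imp > imp or o_sim < sim)
            for other, (o_imp, o_sim) in features.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    toy = {
        "logP": (0.9, 0.8),
        "n_rings": (0.5, 0.2),
        "mol_weight": (0.4, 0.6),  # dominated by n_rings
    }
    print(pareto_front(toy))  # ['logP', 'n_rings']
```

A feature is dropped only when another feature is at least as important and at least as dissimilar, and strictly better on one of the two criteria.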

20.
ACS Omega ; 8(24): 21781-21786, 2023 Jun 20.
Article in English | MEDLINE | ID: mdl-37360490

ABSTRACT

For inverse QSAR/QSPR in conventional molecular design, several chemical structures must be generated and their molecular descriptors must be calculated. However, there is no one-to-one correspondence between the generated chemical structures and molecular descriptors. In this paper, molecular descriptors, structure generation, and inverse QSAR/QSPR based on self-referencing embedded strings (SELFIES), a 100% robust molecular string representation, are proposed. A one-hot vector is converted from SELFIES to SELFIES descriptors x, and an inverse analysis of the QSAR/QSPR model y = f(x) with the objective variable y and molecular descriptor x is conducted. Thus, x values that achieve a target y value are obtained. Based on these values, SELFIES strings or molecules are generated, meaning that inverse QSAR/QSPR is performed successfully. The SELFIES descriptors and SELFIES-based structure generation are verified using datasets of actual compounds. The successful construction of SELFIES-descriptor-based QSAR/QSPR models with predictive abilities comparable to those of models based on other fingerprints is confirmed. A large number of molecules with one-to-one relationships with the values of the SELFIES descriptors are generated. Furthermore, as a case study of inverse QSAR/QSPR, molecules with target y values are generated successfully. The Python code for the proposed method is available at https://github.com/hkaneko1985/dcekit.
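
The one-to-one relationship between a SELFIES string and a one-hot representation can be illustrated with a small sketch. It is not the dcekit implementation and does not use the `selfies` package: tokens are split with a regular expression on square brackets, and the vocabulary, the padding length, and the function names are hypothetical. How the paper converts the one-hot vector into SELFIES descriptors x is not shown.

```python
import re


def selfies_tokens(selfies_string):
    """Split a SELFIES string such as '[C][=O]' into its bracketed tokens."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def one_hot_encode(selfies_string, vocabulary, max_len):
    """Encode tokens as max_len rows of one-hot vectors; unused rows stay all-zero."""
    tokens = selfies_tokens(selfies_string)
    rows = []
    for position in range(max_len):
        row = [0] * len(vocabulary)
        if position < len(tokens):
            row[vocabulary.index(tokens[position])] = 1
        rows.append(row)
    return rows


def one_hot_decode(rows, vocabulary):
    """Invert one_hot_encode, skipping all-zero padding rows."""
    return "".join(vocabulary[row.index(1)] for row in rows if 1 in row)


if __name__ == "__main__":
    vocab = ["[C]", "[O]", "[N]", "[=O]"]
    encoded = one_hot_encode("[C][C][=O]", vocab, max_len=4)
    print(encoded)
    print(one_hot_decode(encoded, vocab))
```

Because every row maps back to exactly one token, decoding recovers the original string, which mirrors the one-to-one correspondence the abstract emphasizes.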
