1.
J Chem Inf Model ; 64(10): 4298-4309, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38700741

ABSTRACT

The intricate nature of the blood-brain barrier (BBB) poses a significant challenge in predicting drug permeability, which is crucial for assessing central nervous system (CNS) drug efficacy and safety. This research utilizes an innovative approach, the classification read-across structure-activity relationship (c-RASAR) framework, that leverages machine learning (ML) to enhance the accuracy of BBB permeability predictions. The c-RASAR framework seamlessly integrates principles from both read-across and QSAR methodologies, underscoring the need to consider similarity-related aspects during the development of the c-RASAR model. It is crucial to note that the primary goal of this research is not to introduce yet another model for predicting BBB permeability but rather to showcase the refinement in predicting the BBB permeability of organic compounds through the introduction of a c-RASAR approach. This groundbreaking methodology aims to elevate the accuracy of assessing neuropharmacological implications and streamline the process of drug development. In this study, an ML-based c-RASAR linear discriminant analysis (LDA) model was developed using a dataset of 7807 compounds, encompassing both BBB-permeable and -nonpermeable substances sourced from the B3DB database (freely accessible from https://github.com/theochem/B3DB), for predicting BBB permeability in lead discovery for CNS drugs. The model's predictive capability was then validated using three external sets: one containing 276,518 natural products (NPs) from the LOTUS database (accessible from https://lotus.naturalproducts.net/download) for data gap filling, another comprising 13,002 drug-like/drug compounds from the DrugBank database (available from https://go.drugbank.com/), and a third set of 56 FDA-approved drugs to assess the model's reliability. Further diversifying the predictive arsenal, various other ML-based c-RASAR models were also developed for comparison purposes. 
The proposed c-RASAR framework emerged as a powerful tool for predicting BBB permeability. This research not only advances the understanding of molecular determinants influencing CNS drug permeability but also provides a versatile computational platform for the rapid assessment of diverse compounds, facilitating informed decision-making in drug development and design.


Subjects
Blood-Brain Barrier; Machine Learning; Permeability; Quantitative Structure-Activity Relationship; Blood-Brain Barrier/metabolism; Humans; Discriminant Analysis
2.
Article in English | MEDLINE | ID: mdl-38743054

ABSTRACT

Due to the lack of experimental toxicity data for environmental chemicals, there arises a need to fill data gaps by in silico approaches. One of the most commonly used in silico approaches for toxicity assessment of small datasets is the Quantitative Structure-Activity Relationship (QSAR), which generates predictive models for the efficient prediction of query compounds. However, the reliability of the predictions from QSARs derived from small datasets is often questionable from a statistical point of view. This is due to the presence of a larger number of descriptors as compared to the number of training compounds, which reduces the degree of freedom of the developed model. To reduce the overall prediction error for a particular QSAR model, we have proposed here the computation of the novel Arithmetic Residuals in K-groups Analysis (ARKA) descriptors. We have reduced the number of modeling descriptors in a supervised manner by partitioning them into K classes (K = 2 here) depending on the higher mean normalized values of the descriptors to a particular response class, thus preventing the loss of chemical information. A scatter plot of the data points using the values of two ARKA descriptors (ARKA_2 vs. ARKA_1) can potentially identify activity cliffs, less confident data points, and less modelable data points. We have used here five representative environmentally relevant endpoints (skin sensitization, earthworm toxicity, milk/plasma partitioning, algal toxicity, and rodent carcinogenicity of hazardous chemicals) with graded responses to which the ARKA framework was applied for classification modeling. 
On comparing the performance of the models generated using conventional QSAR descriptors and the ARKA descriptors, the prediction quality of the models derived from ARKA descriptors was found, based on multiple graded-data validation metrics-derived decision criteria, much better than the models derived from QSAR descriptors signifying the potential of ARKA descriptors in ecotoxicological classification modeling of small data sets. Additionally, this holds true for the Read-Across approach as well, since the Read-Across predictions using ARKA descriptors supersede the predictions generated from QSAR descriptors. For the ease of users, a Java-based expert system has been developed that computes the ARKA descriptors from the input of QSAR descriptors.

3.
Mol Inform ; 43(4): e202300210, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38374528

ABSTRACT

The application of various in silico approaches for the prediction of material properties has been an effective alternative to experimental methods. Recently, the concepts of quantitative structure-property relationship (QSPR) and read-across (RA) methods were merged to develop a new chemoinformatic tool: read-across structure-property relationship (RASPR). The RASPR method is applicable to both large and small datasets as it uses various similarity and error-based measures. It has also been observed that RASPR models tend to have an increased external predictivity compared to the corresponding QSPR models. In this study, we have modeled the power conversion efficiency (PCE) of organic dyes used in dye-sensitized solar cells (DSSCs) by using the quantitative RASPR (q-RASPR) method. We have used three relatively large classes of organic dyes for modeling: phenothiazines (n=207), porphyrins (n=281), and triphenylamines (n=229). We have divided each of the datasets into training and test sets in 3 different combinations, and with the training sets we have developed three different QSPR models with structural and physicochemical descriptors and validated them with the corresponding test sets. These corresponding modeled descriptors were used to calculate the RASPR descriptors using the Java-based tool RASAR Descriptor Calculator v2.0 (https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home), and then data fusion was performed by pooling the previously selected structural and physicochemical descriptors with the calculated RASPR descriptors. A further feature selection algorithm was then employed to develop the final RASPR PLS models. 
Here, we also developed different machine learning (ML) models with the descriptors selected in the QSPR PLS and RASPR PLS models, and it was found that models with RASPR descriptors superseded in external predictivity the models with only structural and physicochemical descriptors: RMSEP reduced for phenothiazines from 1.16-1.25 to 1.07-1.18, for porphyrins from 1.60-1.79 to 1.45-1.53, for triphenylamines from 1.27-1.54 to 1.20-1.47.
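
The RMSEP values quoted above are root-mean-square errors of prediction on the external test set. A minimal sketch of that calculation follows; the function name `rmsep` and the toy numbers in the usage note are illustrative assumptions, not values or code from the study.

```python
import math


def rmsep(y_observed, y_predicted):
    """Root-mean-square error of prediction over an external test set.

    Hypothetical helper for illustration: both arguments are equal-length
    sequences of observed and predicted response values.
    """
    if len(y_observed) != len(y_predicted) or not y_observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(y_observed, y_predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, `rmsep([0.0, 0.0], [3.0, 4.0])` averages the squared errors 9 and 16 and takes the square root of 12.5. A lower RMSEP for the RASPR model than for the QSPR model on the same test set is what the abstract reports as improved external predictivity.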

4.
Environ Sci Process Impacts ; 25(10): 1626-1644, 2023 Oct 18.
Article in English | MEDLINE | ID: mdl-37682520

ABSTRACT

Environmental chemicals and contaminants cause a wide array of harmful implications to terrestrial and aquatic life which ranges from skin sensitization to acute oral toxicity. The current study aims to assess the quantitative skin sensitization potential of a large set of industrial and environmental chemicals acting through different mechanisms using the novel quantitative Read-Across Structure-Activity Relationship (q-RASAR) approach. Based on the identified important set of structural and physicochemical features, Read-Across-based hyperparameters were optimized using the training set compounds followed by the calculation of similarity and error-based RASAR descriptors. Data fusion, further feature selection, and removal of prediction confidence outliers were performed to generate a partial least squares (PLS) q-RASAR model, followed by the application of various Machine Learning (ML) tools to check the quality of predictions. The PLS model was found to be the best among different models. A simple user-friendly Java-based software tool was developed based on the PLS model, which efficiently predicts the toxicity value(s) of query compound(s) along with their status of Applicability Domain (AD) in terms of leverage values. This model has been developed using structurally diverse compounds and is expected to predict efficiently and quantitatively the skin sensitization potential of environmental chemicals to estimate their occupational and health hazards.


Subjects
Quantitative Structure-Activity Relationship; Skin; Least-Squares Analysis; Organic Chemicals/toxicity; Organic Chemicals/chemistry
5.
Chem Res Toxicol ; 36(9): 1518-1531, 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37584642

ABSTRACT

The advancements in the field of cheminformatics have led to a reduction in animal testing to estimate the activity, property, and toxicity of query chemicals. Read-across structure-activity relationship (RASAR) is an emerging concept that utilizes various similarity functions derived from chemical information to develop highly predictive models. Unlike quantitative structure-activity relationship (QSAR) models, RASAR descriptors of a query compound are computed from its close congeners instead of the compound itself, thus targeting predictions in the model training phase. The objective of the present study is not to propose new QSAR models for skin sensitization but to demonstrate the enhancement in the quality of predictions of the skin-sensitizing potential of organic compounds by developing classification-based RASAR (c-RASAR) models. A diverse, previously curated data set was collected from the literature for which 2D descriptors were computed. The extracted essential features were then used to develop a classification-based linear discriminant analysis (LDA) QSAR model. Furthermore, from the read-across-based predictions, RASAR descriptors were calculated using the basic settings of the hyperparameters for the Laplacian Kernel-based optimum similarity measure. After feature selection, an LDA c-RASAR model was developed, which superseded the prediction quality of the LDA-QSAR model. Various other combinations of RASAR descriptors were also taken to develop additional c-RASAR models, all showing better prediction quality than the LDA QSAR model while using a lower number of descriptors. Various other machine learning c-RASAR models were also developed for comparison purposes. In this work, we have proposed and analyzed three new similarity metrics: gm_class, sm1, and sm2. 
The first one is an indicator variable used to generate a simple univariate c-RASAR model with good prediction ability, while the remaining two are similarity indices used to analyze possible activity cliffs in the training and test sets and are believed to play an important role in the modelability analysis of data sets.
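
The similarity-based read-across step behind these classification descriptors can be sketched as follows. This is a minimal illustration under stated assumptions: the Laplacian kernel is taken in its common form exp(-gamma * L1 distance), and the names `laplacian_similarity`, `read_across_class`, `gamma`, and `k` are hypothetical. It does not reproduce the authors' tool or the definitions of gm_class, sm1, or sm2.

```python
import math


def laplacian_similarity(x, y, gamma=1.0):
    """Laplacian kernel similarity: exp(-gamma * Manhattan distance)."""
    distance = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-gamma * distance)


def read_across_class(query, train_X, train_y, k=3, gamma=1.0):
    """Similarity-weighted vote over the k closest training compounds.

    train_y holds binary labels (1 = positive, 0 = negative).
    Returns (score, predicted_label), where score is the share of the
    total similarity weight contributed by positive neighbours.
    """
    scored = [
        (laplacian_similarity(query, x, gamma), label)
        for x, label in zip(train_X, train_y)
    ]
    # Keep only the k most similar ("close") training compounds.
    closest = sorted(scored, reverse=True)[:k]
    total = sum(s for s, _ in closest)
    positive = sum(s for s, label in closest if label == 1)
    score = positive / total if total > 0 else 0.5
    return score, int(score >= 0.5)
```

In a RASAR-style workflow, quantities such as `score` would be computed for every compound from its close training congeners and then used as descriptors in a supervised model such as LDA.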


Subjects
Organic Chemicals; Quantitative Structure-Activity Relationship; Animals; Organic Chemicals/chemistry; Machine Learning
6.
J Hazard Mater ; 460: 132358, 2023 Oct 15.
Article in English | MEDLINE | ID: mdl-37634379

ABSTRACT

We have reported here a quantitative read-across structure-activity relationship (q-RASAR) model for the prediction of binary mixture toxicity (acute contact toxicity) in honey bees. Both the quantitative structure-activity relationship (QSAR) and the similarity-based read-across algorithms are used simultaneously for enhancing the predictability of the model. Several similarity and error-based parameters, obtained from the read-across prediction tool, have been put together with the structural and physicochemical descriptors to develop the final q-RASAR model. The calculated statistical and validation metrics indicate the goodness-of-fit, robustness, and good predictability of the partial least squares (PLS) regression model. Machine learning algorithms like ridge regression, linear support vector machine (SVM), and non-linear SVM have been used to further enhance the predictability of the q-RASAR model. The prediction quality of the q-RASAR models outperforms the previously reported quasi-SMILES-based QSAR model in terms of external correlation coefficient ($Q^{2}_{F1}$ SVM q-RASAR: 0.935 vs. $Q^{2}_{VLD}$ QSAR: 0.89). In this research, the toxicity values of several new untested binary mixtures have been predicted with the new models, and the reliability of the PLS predictions has been validated by the prediction reliability indicator tool. The q-RASAR approach can be used as reliable, complementary, and integrative to the conventional experimental approaches of pesticide mixture risk assessment.


Subjects
Pesticides; Quantitative Structure-Activity Relationship; Bees; Animals; Reproducibility of Results; Algorithms; Machine Learning; Pesticides/toxicity
7.
Nanotoxicology ; 17(1): 78-93, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36891579

ABSTRACT

The availability of experimental nanotoxicity data is in general limited which warrants both the use of in silico methods for data gap filling and exploring novel methods for effective modeling. Read-Across Structure-Activity Relationship (RASAR) is an emerging cheminformatic approach that combines the usefulness of a QSAR model and similarity-based Read-Across predictions. In this work, we have generated simple, interpretable, and transferable quantitative-RASAR (q-RASAR) models which can efficiently predict the cytotoxicity of TiO2-based multi-component nanoparticles. A data set of 29 TiO2-based nanoparticles with specific amounts of noble metal precursors was rationally divided into training and test sets, and the Read-Across-based predictions for the test set were generated. The optimized hyperparameters and the similarity approach, which yield the best predictions, were used to calculate the similarity and error-based RASAR descriptors. A data fusion of the RASAR descriptors with the chemical descriptors was done followed by the best subset feature selection. The final set of selected descriptors was used to develop the q-RASAR models, which were validated using the stringent OECD criteria. Finally, a random forest model was also developed with the selected descriptors, which could efficiently predict the cytotoxicity of TiO2-based multi-component nanoparticles superseding previously reported models in the prediction quality thus showing the merits of the q-RASAR approach. To further evaluate the usefulness of the approach, we have applied the q-RASAR approach also to a second cytotoxicity data set of 34 heterogeneous TiO2-based nanoparticles which further confirmed the enhancement of external prediction quality of QSAR models after incorporation of RASAR descriptors.


Subjects
Nanoparticles; Quantitative Structure-Activity Relationship; Titanium/toxicity; Machine Learning; Nanoparticles/toxicity
8.
Chem Res Toxicol ; 36(3): 446-464, 2023 Mar 20.
Article in English | MEDLINE | ID: mdl-36811528

ABSTRACT

The novel quantitative read-across structure-activity relationship (q-RASAR) approach uses read-across-derived similarity functions in the quantitative structure-activity relationship (QSAR) modeling framework in a unique way for supervised model generation. The aim of this study is to explore how this workflow enhances the external (test set) prediction quality of conventional QSAR models by the incorporation of some novel similarity-based functions as additional descriptors using the same level of chemical information. To establish this, five different toxicity data sets, for which QSAR models were reported previously, have been considered in the q-RASAR modeling exercise, which uses chemical similarity-derived measures. The identical sets of chemical features along with the same compositions of training and test sets as reported previously were used in the present analysis for ease of comparison. The RASAR descriptors were calculated based on a chosen similarity measure with the default setting of relevant hyperparameter(s) and were then clubbed with the original structural and physicochemical descriptors, and the number of selected features was further optimized by employing a grid search technique applied on the respective training sets. These features were then used to develop multiple linear regression (MLR) q-RASAR models that show enhanced predictivity as compared to the QSAR models developed previously. Moreover, various other ML algorithms like support vector machine (SVM), linear SVM, random forest, partial least squares, and ridge regression were also employed using the same feature combinations as used in the MLR models to compare the prediction qualities. 
The q-RASAR models for five different data sets possess at least one of the RASAR descriptors, RA function, gm, and average similarity, suggesting that these are important determinants of similarities that contribute to the development of predictive q-RASAR models, as also evident from the SHAP analysis of the models.


Subjects
Algorithms; Quantitative Structure-Activity Relationship
9.
Mol Inform ; 42(4): e2200261, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36618002

ABSTRACT

In this study, the specific surface area of various perovskites was modeled using a novel quantitative read-across structure-property relationship (q-RASPR) approach, which clubs both Read-Across (RA) and quantitative structure-property relationship (QSPR) together. After optimization of the hyper-parameters, certain similarity-based error measures for each query compound were obtained. Clubbing some of these error-based measures with the previously selected features along with the Read-Across prediction function, a number of machine learning models were developed using Partial Least Squares (PLS), Ridge Regression (RR), Linear Support Vector Regression (LSVR), Random Forest (RF) regression, Gradient Boost (GBoost), Adaptive Boosting (Adaboost), Multiple Layer Perceptron (MLP) regression and k-Nearest Neighbor (kNN) regression. Based on the repeated cross-validation as well as external prediction quality and interpretability, the PLS model ($n_{Training} = 38$, $n_{Test} = 12$, $R_{Train}^{2} = 0.737$, $Q_{LOO}^{2} = 0.637$, $R_{Test}^{2} = 0.898$, $Q_{F1(Test)}^{2} = 0.901$) was selected as the best predictor which underscored the previously reported results. The finally selected model should efficiently predict specific surface areas of other perovskites for their use in photocatalysis. The new q-RASPR method also appears promising for the prediction of several other property endpoints of interest in materials science.
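
The external validation metric Q2F1 reported for the PLS model compares the test-set prediction errors with the errors obtained by predicting the training-set mean for every test compound. A minimal sketch, assuming the standard definition of this metric; the function name `q2_f1` and the toy values in the usage note are illustrative, not taken from the study.

```python
def q2_f1(y_train, y_test, y_test_pred):
    """External predictivity: 1 - PRESS / SS around the training-set mean.

    y_train: observed responses of the training compounds
    y_test: observed responses of the test compounds
    y_test_pred: model predictions for the test compounds
    """
    mean_train = sum(y_train) / len(y_train)
    # Sum of squared prediction errors on the test set.
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_test_pred))
    # Baseline: squared deviations of test observations from the training mean.
    baseline = sum((obs - mean_train) ** 2 for obs in y_test)
    return 1.0 - press / baseline
```

With `y_train = [1.0, 2.0, 3.0]` and `y_test = [1.0, 3.0]`, perfect predictions give a value of 1, while predicting the training mean (2.0) for both test compounds gives 0. Values close to 1, such as the 0.901 reported here, indicate predictions far better than that baseline.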


Subjects
Machine Learning; Oxides; Neural Networks, Computer; Quantitative Structure-Activity Relationship
10.
Chemosphere ; 309(Pt 1): 136579, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36174732

ABSTRACT

Endocrine Disruptor Chemicals are synthetic or natural molecules in the environment that promote adverse modifications of endogenous hormone regulation in humans and/or in animals. In the present research, we have applied two-dimensional quantitative structure-activity relationship (2D-QSAR) modeling to analyze the structural features of these chemicals responsible for binding to the androgen receptors (logRBA) in rats. We have collected the receptor binding data from the EDKB database (https://www.fda.gov/science-research/endocrine-disruptor-knowledge-base/accessing-edkb-database) and then employed the DTC-QSAR tool, available from https://dtclab.webs.com/software-tools, for dataset division, feature selection, and model development. The final partial least squares model was evaluated using various stringent validation criteria. From the model, we interpreted that hydrophobicity, a steroidal nucleus, bulkiness, and a hydrogen bond donor at an appropriate position contribute to the receptor binding affinity, while the presence of electron-rich features like aromaticity and polar groups decreases the receptor binding affinity. Additionally, we have performed chemical Read-Across predictions using Read-Across-v3.1, available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home, and the results for the external validation metrics were found to be better than the QSAR-derived predictions. The best quality of external predictions emerged from the q-RASAR approach, which combines both read-across and QSAR. To explore the essential features responsible for the receptor binding, pharmacophore mapping and molecular docking along with molecular dynamics simulation were also performed, and the results are in accordance with the QSAR/q-RASAR findings.


Subjects
Endocrine Disruptors; Quantitative Structure-Activity Relationship; Rats; Humans; Animals; Endocrine Disruptors/toxicity; Endocrine Disruptors/chemistry; Receptors, Androgen/metabolism; Molecular Docking Simulation; Hormones
11.
Mol Divers ; 26(5): 2847-2862, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35767129

ABSTRACT

Quantitative structure-activity relationship (QSAR) and read-across techniques have recently been merged into a new emerging field of read-across structure-activity relationship (RASAR) that uses the chemical similarity concepts of read-across (an unsupervised step) and finally develops a supervised learning model (like QSAR). The RASAR method has so far been used only in case of graded predictions or classification modeling. In this work, we attempt, for the first time, to apply RASAR for quantitative predictions (q-RASAR) using a case study of androgen receptor binding affinity data. We have computed a number of error-based and similarity-based measures such as weighted standard deviation of the predicted values, coefficient of variation of the computed predictions, average similarity level of close training compounds for each query molecule, standard deviation and coefficient of variation of similarity levels, maximum similarity levels to positive and negative close training compounds, a concordance measure indicating similarity to positive, negative or both classes of close training compounds, etc. We have clubbed these additional measures along with the selected chemical descriptors from the previously developed QSAR model and redeveloped new partial least squares models from the training set, and predicted the endpoint using the query data set. Interestingly, these new models outperform the internal and external validation quality of the original QSAR model. In this study, we have also introduced a new similarity-based concordance measure (Banerjee-Roy coefficient) that can significantly contribute to the model quality. A q-RASAR model also has the advantage over read-across predictions in providing easy interpretation and indicating quantitative contributions of important chemical features. 
The strategy described here should be applicable to other biological/toxicological/property data modeling for enhanced quality of predictions, easy interpretability, and efficient transferability.
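
Several of the similarity- and error-based measures listed in this abstract can be sketched for a single query compound as follows. This is an illustration under stated assumptions: the weighting scheme, the sample standard deviation, the `threshold` used to split close training compounds into positive and negative groups, and all names are hypothetical choices. The published tool may define these measures differently, and the Banerjee-Roy coefficient is not reproduced here.

```python
import math


def rasar_measures(similarities, responses, threshold):
    """Similarity- and error-based measures for one query compound.

    similarities: similarity of the query to each close training compound
    responses: observed response values of those training compounds
    threshold: response cut-off separating positive from negative compounds
    """
    n = len(similarities)
    total = sum(similarities)
    pairs = list(zip(similarities, responses))

    # Similarity-weighted mean response (a read-across style prediction).
    weighted_mean = sum(s * y for s, y in pairs) / total
    # Similarity-weighted standard deviation around that prediction.
    weighted_sd = math.sqrt(
        sum(s * (y - weighted_mean) ** 2 for s, y in pairs) / total
    )
    cv_pred = weighted_sd / weighted_mean if weighted_mean != 0 else float("nan")

    # Average, standard deviation, and coefficient of variation of similarity.
    avg_sim = total / n
    sd_sim = (
        math.sqrt(sum((s - avg_sim) ** 2 for s in similarities) / (n - 1))
        if n > 1
        else 0.0
    )
    cv_sim = sd_sim / avg_sim if avg_sim != 0 else float("nan")

    # Maximum similarity to positive and to negative close training compounds.
    max_pos = max((s for s, y in pairs if y >= threshold), default=0.0)
    max_neg = max((s for s, y in pairs if y < threshold), default=0.0)

    return {
        "weighted_mean": weighted_mean,
        "weighted_sd": weighted_sd,
        "cv_pred": cv_pred,
        "avg_sim": avg_sim,
        "sd_sim": sd_sim,
        "cv_sim": cv_sim,
        "max_pos": max_pos,
        "max_neg": max_neg,
    }
```

Measures like these, computed per compound, are what the abstract describes as being combined with the selected chemical descriptors before the partial least squares model is redeveloped.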


Subjects
Quantitative Structure-Activity Relationship; Receptors, Androgen; Least-Squares Analysis; Protein Binding