Results 1 - 18 of 18
1.
J Comput Aided Mol Des ; 35(2): 179-193, 2021 02.
Article in English | MEDLINE | ID: mdl-33392949

ABSTRACT

Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models predict biological activities and molecular properties based on the numerical relationship between chemical structures and activity (property) values. Molecular representations are of importance in QSAR/QSPR analysis. Topological information of molecular structures (2D representations) is usually utilized for this purpose. However, conformational information seems important because molecules exist in three-dimensional space. As a three-dimensional molecular representation applicable to diverse compounds, similarity between a test molecule and a set of reference molecules has been previously proposed. This 3D representation was found to be effective in virtual screening for early enrichment of active compounds. In this study, we introduced the 3D representation into QSAR/QSPR modeling (regression tasks). Furthermore, we investigated relative merits of 3D representations over 2D in terms of the diversity of training data sets. For the prediction task of quantum mechanics-based properties, the 3D representations were superior to 2D. For predicting activity of small molecules against specific biological targets, no consistent trend was observed in the difference of performance using the two types of representations, irrespective of the diversity of training data sets.


Subject(s)
Organic Chemicals/chemistry , Databases, Factual , Drug Evaluation, Preclinical , Machine Learning , Models, Molecular , Molecular Conformation , Quantitative Structure-Activity Relationship , Regression Analysis
2.
Molecules ; 26(16), 2021 Aug 13.
Article in English | MEDLINE | ID: mdl-34443503

ABSTRACT

Activity cliffs (ACs) are formed by two structurally similar compounds with a large difference in potency. Accurate AC prediction is expected to help researchers' decisions in the early stages of drug discovery. Previously, predictive models based on matched molecular pair (MMP) cliffs have been proposed. However, the proposed methods face a challenge of interpretability due to the black-box character of the predictive models. In this study, we developed interpretable MMP fingerprints and modified a model-specific interpretation approach for models based on a support vector machine (SVM) and MMP kernel. We compared important features highlighted by this SVM-based interpretation approach and the SHapley Additive exPlanations (SHAP) as a major model-independent approach. The model-specific approach could capture the difference between AC and non-AC, while SHAP assigned high weights to the features not present in the test instances. For specific MMPs, the feature weights mapped by the SVM-based interpretation method were in agreement with the previously confirmed binding knowledge from X-ray co-crystal structures, indicating that this method is able to interpret the AC prediction model in a chemically intuitive manner.

3.
J Comput Aided Mol Des ; 33(8): 729-743, 2019 08.
Article in English | MEDLINE | ID: mdl-31435894

ABSTRACT

In this work, computational compound screening strategies on the basis of two- and three-dimensional (2D and 3D) molecular representations were investigated, including similarity searching and support vector machine (SVM) ranking. Calculations based on topological fingerprints and molecular shape queries and features were compared. A unique aspect of the analysis, setting it apart from previous comparisons of 2D and 3D virtual screening approaches, has been the design of compound reference, training, and test data sets with controlled incremental increases in intra-set structural diversity and different categories of structural relationships between reference/training and test sets. The use of these data sets made it possible to assess the relative performance of 2D and 3D screening strategies under increasingly challenging conditions, ultimately leading to the use of training and test sets with essentially unrelated structures. The results showed that 3D similarity searching had little advantage over 2D searching in identifying active compounds with remote structural relationships. However, 3D SVM models trained on the basis of shape features were superior to other approaches (including 2D SVM) when the detection of structure-activity relationships became increasingly challenging. Such 3D SVM methods have thus far been little investigated in virtual screening, providing a wealth of opportunities for further analyses.


Subject(s)
Computational Chemistry/methods , Structure-Activity Relationship , Support Vector Machine , User-Computer Interface , Machine Learning , Molecular Conformation , Protein Binding/genetics
4.
J Chem Inf Model ; 56(2): 300-7, 2016 Feb 22.
Article in English | MEDLINE | ID: mdl-26838127

ABSTRACT

The increase in compounds with activity against five major therapeutic target families has been quantified on a time scale and investigated employing a compound-scaffold-cyclic skeleton (CSK) hierarchy. The analysis was designed to better understand possible reasons for target-dependent growth of bioactive compounds. There was strong correlation between compound and scaffold growth across all target families. Active compounds becoming available over time were mostly represented by new scaffolds. On the basis of scaffold-to-compound ratios, new active compounds were structurally diverse and, on the basis of CSK-to-scaffold ratios, often had previously unobserved topologies. In addition, novel targets emerged that complemented major families. The analysis revealed that compound growth is associated with increasing chemical diversity and that current pharmaceutical targets are capable of recognizing many structurally different compounds, which provides a rationale for the rapid increase in the number of bioactive compounds over the past decade. In light of these findings, it is likely that new chemical entities will be discovered for many small molecule targets including relatively unexplored ones as well as for popular and well-studied therapeutic targets. Moreover, given the wealth of new "active scaffolds" that have been increasingly identified for many targets over time, computational scaffold-hopping exercises should generally have a high likelihood of success.


Subject(s)
Drug Discovery , Tissue Scaffolds
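The scaffold-to-compound and CSK-to-scaffold ratios mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each compound has already been assigned a scaffold and a cyclic skeleton (CSK); the function name, the input layout, and the labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple


def hierarchy_ratios(assignments: Dict[str, Tuple[str, str]]) -> Tuple[float, float]:
    """Return (scaffold-to-compound ratio, CSK-to-scaffold ratio).

    `assignments` maps a compound ID to its (scaffold, CSK) pair.
    All IDs and labels are hypothetical placeholders.
    """
    if not assignments:
        raise ValueError("no compounds given")
    scaffolds = {scaffold for scaffold, _ in assignments.values()}
    csks = {csk for _, csk in assignments.values()}
    return len(scaffolds) / len(assignments), len(csks) / len(scaffolds)


# Toy example: four compounds, three distinct scaffolds, two distinct CSKs.
toy = {
    "cpd1": ("s1", "k1"),
    "cpd2": ("s1", "k1"),
    "cpd3": ("s2", "k1"),
    "cpd4": ("s3", "k2"),
}
scaffold_ratio, csk_ratio = hierarchy_ratios(toy)
```

A scaffold-to-compound ratio close to 1 means that nearly every compound carries its own scaffold; a CSK-to-scaffold ratio close to 1 means that nearly every scaffold has its own topology.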
5.
ACS Pharmacol Transl Sci ; 6(1): 139-150, 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36654744

ABSTRACT

Influenza is a respiratory infection caused by the influenza virus that is prevalent worldwide. One of the most contagious variants of influenza is influenza A virus (IAV), which usually spreads in closed spaces through aerosols. Preventive measures such as novel compounds are needed that can act on viral membranes and provide a safe environment against IAV infection. In this study, we screened compounds with common fragrances that are generally used to mask unpleasant odors but can also exhibit antiviral activity against a strain of IAV. Initially, a set of 188 structurally diverse odorants were collected, and their antiviral activity was measured in vapor phase against the IAV solution. Regression models were built for the prediction of antiviral activity using this set of odorants by taking into account their structural features along with vapor pressure and partition coefficient (n-octanol/water). The models were interpreted using a feature weighting approach and Shapley Additive exPlanations to rationalize the predictions as an additional validation for virtual screening. This model was used to screen odorants from an in-house odorant data set consisting of 2020 odorants, which were later evaluated using in vitro experiments. Out of 11 odorants proposed using the final model, 8 odorants were found to exhibit antiviral activity. The feature interpretation of screened odorants suggested that they contained hydrophilic substructures, such as hydroxyl group, which might contribute to denaturation of proteins on the surface of the virus. These odorants should be explored as a preventive measure in closed spaces to decrease the risk of infections of IAV.

6.
ACS Omega ; 8(22): 19781-19788, 2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37305275

ABSTRACT

Fourier-transform infrared (FTIR) spectroscopy can detect the presence of functional groups and molecules directly from a mixed solution of organic molecules. Although it is quite useful for monitoring chemical reactions, quantitative analysis of FTIR spectra becomes difficult when various peaks of different widths overlap. To overcome this difficulty, we propose a chemometrics approach for accurately predicting the concentration of components in chemical reactions that remains interpretable by humans. The proposed method first decomposes a spectrum into peaks with various widths by the wavelet transform. Subsequently, a sparse linear regression model is built using the wavelet coefficients. Models built by the method are interpretable using the regression coefficients shown on Gaussian distributions with various widths. The interpretation is expected to reveal the relation of broad regions in spectra to the model prediction. In this study, we conducted the prediction of monomer concentration in copolymerization reactions of five monomers against methyl methacrylate by various chemometric approaches including conventional methods. A rigorous validation scheme revealed that the proposed method overall showed better predictive ability than various linear and non-linear regression methods. The visualization results were consistent with the interpretation obtained by another chemometric approach and qualitative evaluation. The proposed method is found to be useful for calculating the concentrations of monomers in copolymerization reactions and for the interpretation of spectra.

7.
ACS Omega ; 3(4): 4706-4712, 2018 Apr 30.
Article in English | MEDLINE | ID: mdl-30023898

ABSTRACT

Compound profiling matrices record assay results for compound libraries tested against panels of targets. In addition to their relevance for exploring structure-activity relationships, such matrices are of considerable interest for chemoinformatic and chemogenomic applications. For example, profiling matrices provide a valuable data resource for the development and evaluation of machine learning approaches for multitask activity prediction. However, experimental compound profiling matrices are rare in the public domain. Although they are generated in pharmaceutical settings, they are typically not disclosed. Herein, we present an algorithm for the generation of large profiling matrices, for example, containing more than 100 000 compounds exhaustively tested against 50 to 100 targets. The new methodology is a variant of biclustering algorithms originally introduced for large-scale analysis of genomics data. Our approach is applied here to assays from the PubChem BioAssay database and generates profiling matrices of increasing assay or compound coverage by iterative removal of entities that limit coverage. Weight settings control final matrix size by preferentially retaining assays or compounds. In addition, the methodology can also be applied to generate matrices enriched with active entries representing above-average assay hit rates.
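
The idea of reaching a fully tested matrix by iteratively removing the entities that limit coverage can be sketched as follows. This is a simplified greedy stand-in written for illustration, not the published biclustering variant; the function name, the input layout, and the way the two weights are applied are assumptions.

```python
from typing import Dict, Set, Tuple


def extract_complete_matrix(
    tested: Dict[str, Set[str]],
    compound_weight: float = 1.0,
    assay_weight: float = 1.0,
) -> Tuple[Set[str], Set[str]]:
    """Greedily drop the compound or assay that limits coverage the most.

    `tested` maps each compound to the set of assays it was tested in.
    A larger weight protects that entity type from removal (assumed scheme).
    """
    compounds = set(tested)
    assays = set().union(*tested.values())
    while compounds and assays:
        # Coverage of a compound: fraction of remaining assays it was tested in.
        comp_cov = {c: len(tested[c] & assays) / len(assays) for c in compounds}
        # Coverage of an assay: fraction of remaining compounds tested in it.
        assay_cov = {
            a: sum(a in tested[c] for c in compounds) / len(compounds)
            for a in assays
        }
        worst_c = min(sorted(compounds), key=comp_cov.get)
        worst_a = min(sorted(assays), key=assay_cov.get)
        if comp_cov[worst_c] == 1.0 and assay_cov[worst_a] == 1.0:
            break  # every remaining compound-assay cell is tested
        # Remove whichever entity has the lower weighted coverage.
        if comp_cov[worst_c] * compound_weight <= assay_cov[worst_a] * assay_weight:
            compounds.remove(worst_c)
        else:
            assays.remove(worst_a)
    return compounds, assays


# Toy screening data (hypothetical IDs).
toy = {
    "c1": {"a1", "a2", "a3"},
    "c2": {"a1", "a2"},
    "c3": {"a1", "a2", "a3"},
    "c4": {"a1"},
}
balanced = extract_complete_matrix(toy)
keep_compounds = extract_complete_matrix(toy, compound_weight=2.0)
```

With equal weights the toy data reduce to two compounds tested in all three assays; doubling the compound weight instead keeps all four compounds and a single assay, which mirrors the role the abstract assigns to the weight settings.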

8.
Future Sci OA ; 4(8): FSO327, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30271615

ABSTRACT

AIM: Screening of compounds against panels of targets yields profiling matrices. Such matrices are excellent test cases for the analysis and prediction of ligand-target interactions. We made three matrices freely available that were extracted from public screening data. METHODOLOGY: A new algorithm was used to derive complete profiling matrices from assay data. DATA: Two profiling matrices were derived from confirmatory assays containing 53 different targets and 109,925 and 143,310 distinct compounds, respectively. A third matrix was extracted from primary screening assays covering 171 different targets and 224,251 compounds. NEXT STEPS: Profiling matrices can be used to test computational chemogenomics methods for their ability to predict ligand-target pairs. Additional matrices will be generated for individual target families.

9.
J Med Chem ; 61(22): 10255-10264, 2018 11 21.
Article in English | MEDLINE | ID: mdl-30422657

ABSTRACT

Assay interference compounds give rise to false-positives and cause substantial problems in medicinal chemistry. Nearly 500 compound classes have been designated as pan-assay interference compounds (PAINS), which typically occur as substructures in other molecules. The structural environment of PAINS substructures is likely to play an important role for their potential reactivity. Given the large number of PAINS and their highly variable structural contexts, it is difficult to study context dependence on the basis of expert knowledge. Hence, we applied machine learning to predict PAINS that are promiscuous and distinguish them from others that are mostly inactive. Surprisingly accurate models can be derived using different methods such as support vector machines, random forests, or deep neural networks. Moreover, structural features that favor correct predictions have been identified, mapped, and categorized, shedding light on the structural context dependence of PAINS effects. The machine learning models presented herein further extend the capacity of PAINS filters.


Subject(s)
Computational Biology/methods , Drug Discovery/methods , Machine Learning , Models, Statistical , ROC Curve
10.
ACS Omega ; 3(4): 4713-4723, 2018 Apr 30.
Article in English | MEDLINE | ID: mdl-30023899

ABSTRACT

Screening of compound libraries against panels of targets yields profiling matrices. Such matrices typically contain structurally diverse screening compounds, large numbers of inactives, and small numbers of hits per assay. As such, they represent interesting and challenging test cases for computational screening and activity predictions. In this work, we attempted to model large compound profiling matrices extracted from publicly available screening data. Different machine learning methods including deep learning were compared and different prediction strategies explored. Prediction accuracy varied for assays with different numbers of active compounds, and alternative machine learning approaches often produced comparable results. Deep learning did not further increase the prediction accuracy of standard methods such as random forests or support vector machines. Target-based random forest models were prioritized and yielded successful predictions of active compounds for many assays.

11.
Medchemcomm ; 8(11): 2100-2104, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-30108727

ABSTRACT

Compounds that are consistently inactive in many screening assays, so-called dark chemical matter (DCM), have recently experienced increasing attention. One of the reasons is that many DCM compounds may not be fully inert biologically, but may provide interesting leads for obtaining compounds that are highly selective or active against unusual targets. In this study, we have systematically identified DCM among extensively assayed screening compounds and searched for analogs of these compounds that have known bioactivities. Analog series containing DCM and known bioactive compounds were generated on a large scale, making it possible to derive target hypotheses for more than 8000 extensively assayed DCM molecules.

12.
J Med Chem ; 60(9): 3879-3886, 2017 05 11.
Article in English | MEDLINE | ID: mdl-28421750

ABSTRACT

Undetected pan-assay interference compounds (PAINS) with false-positive activities in assays often propagate through medicinal chemistry programs and compromise their outcomes. Although a large number of PAINS have been classified, often on the basis of individual studies or chemical experience, little has been done so far to systematically assess their activity profiles. Herein we report a large-scale analysis of the behavior of PAINS in biological screening assays. More than 23 000 extensively tested compounds containing PAINS substructures were detected, and their hit rates were determined. Many consistently inactive compounds were identified. The hit frequency was low overall, with median values of two to five hits for PAINS tested in hundreds of assays. Only confined subsets of PAINS produced abundant hits. The same PAINS substructure was often found in consistently inactive and frequently active compounds, indicating that the structural context in which PAINS occur modulates their effects.


Subject(s)
Chemistry, Pharmaceutical , Drug Discovery , Biological Assay , High-Throughput Screening Assays
13.
AAPS J ; 19(3): 856-864, 2017 05.
Article in English | MEDLINE | ID: mdl-28265982

ABSTRACT

Publicly available screening data were systematically searched for extensively assayed structural analogs with large differences in the number of targets they were active against. Screening compounds with potential chemical liabilities that may give rise to assay artifacts were identified and excluded from the analysis. "Promiscuity cliffs" were frequently identified, defined here as pairs of structural analogs with a difference of at least 20 target annotations across all assays they were tested in. New assay indices were introduced to prioritize cliffs formed by screening compounds that were extensively tested in comparably large numbers of assays including many shared assays. In these cases, large differences in promiscuity degrees were not attributable to differences in assay frequency and/or lack of assay overlap. Such analog pairs have high priority for further exploring molecular origins of multi-target activities. Therefore, these promiscuity cliffs and associated target annotations are made freely available. The corresponding analogs often represent equally puzzling and interesting examples of structure-promiscuity relationships.


Subject(s)
Pharmaceutical Preparations/chemistry , Structure-Activity Relationship
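The promiscuity cliff criterion stated in the abstract above (analog pairs differing by at least 20 target annotations) can be written down as a short sketch. It assumes that analog series have been identified beforehand and that each compound comes with its set of target annotations; the function name and input layout are hypothetical, and the assay-overlap indices introduced in the study are not modeled here.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def promiscuity_cliffs(
    targets: Dict[str, Set[str]],
    analog_series: List[Set[str]],
    min_difference: int = 20,
) -> List[Tuple[str, str, int]]:
    """Return analog pairs whose target counts differ by >= min_difference."""
    cliffs = []
    for series in analog_series:
        # Only compounds from the same analog series are compared.
        for a, b in combinations(sorted(series), 2):
            diff = abs(len(targets[a]) - len(targets[b]))
            if diff >= min_difference:
                cliffs.append((a, b, diff))
    return cliffs


# Toy annotations (hypothetical): x hits 25 targets, y hits 3, z hits 24.
toy_targets = {
    "x": {f"t{i}" for i in range(25)},
    "y": {f"t{i}" for i in range(3)},
    "z": {f"t{i}" for i in range(24)},
}
cliffs = promiscuity_cliffs(toy_targets, [{"x", "y", "z"}])
```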
14.
PLoS One ; 11(4): e0153873, 2016.
Article in English | MEDLINE | ID: mdl-27082988

ABSTRACT

In the context of polypharmacology, an emerging concept in drug discovery, promiscuity is rationalized as the ability of compounds to specifically interact with multiple targets. Promiscuity of drugs and bioactive compounds has thus far been analyzed computationally on the basis of activity annotations, without taking assay frequencies or inactivity records into account. Most recent estimates have indicated that bioactive compounds interact on average with only one to two targets, whereas drugs interact with six or more. In this study, we have further extended promiscuity analysis by identifying the most extensively assayed public domain compounds and systematically determining their promiscuity. These compounds were tested in hundreds of assays against hundreds of targets. In our analysis, assay promiscuity was distinguished from target promiscuity and separately analyzed for primary and confirmatory assays. Differences between the degree of assay and target promiscuity were surprisingly small and average and median degrees of target promiscuity of 2.6 to 3.4 and 2.0 were determined, respectively. Thus, target promiscuity remained at a low level even for most extensively tested active compounds. These findings provide further evidence that bioactive compounds are less promiscuous than drugs and have implications for pharmaceutical research. In addition to a possible explanation that drugs are more extensively tested for additional targets, the results would also support a "promiscuity enrichment model" according to which promiscuous compounds might be preferentially selected for therapeutic efficacy during clinical evaluation to ultimately become drugs.


Subject(s)
Drug Discovery/methods , Pharmaceutical Preparations/chemistry , Biological Assay , Databases, Chemical , Drug Evaluation, Preclinical/methods , Molecular Structure , Small Molecule Libraries , Structure-Activity Relationship
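The distinction between assay promiscuity and target promiscuity drawn in the abstract above can be made concrete with a minimal sketch. It assumes one record per tested compound-assay combination, with each assay linked to a single target; the record layout and function names are hypothetical, and restricting the summary to compounds with at least one hit is an assumption made for this example.

```python
from statistics import mean, median
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, bool]  # (assay ID, target ID, tested active?)


def promiscuity(records: Iterable[Record]) -> Tuple[int, int]:
    """Return (assay promiscuity, target promiscuity) for one compound."""
    active = [(assay, target) for assay, target, is_active in records if is_active]
    return len({a for a, _ in active}), len({t for _, t in active})


def summarize_target_promiscuity(
    per_compound: Dict[str, List[Record]],
) -> Tuple[float, float]:
    """Mean and median target promiscuity over compounds with >= 1 hit."""
    degrees = [promiscuity(records)[1] for records in per_compound.values()]
    degrees = [d for d in degrees if d > 0]
    return mean(degrees), median(degrees)


# Toy screening records (hypothetical IDs).
toy = {
    "c1": [("a1", "t1", True), ("a2", "t1", True), ("a3", "t2", False), ("a4", "t3", True)],
    "c2": [("a1", "t1", True), ("a3", "t2", False)],
    "c3": [("a1", "t1", False)],
}
c1_degrees = promiscuity(toy["c1"])
summary = summarize_target_promiscuity(toy)
```

In the toy data, compound c1 is active in three assays but against only two targets, because two of the assays address the same target.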
15.
F1000Res ; 5, 2016.
Article in English | MEDLINE | ID: mdl-27127620

ABSTRACT

A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various attempts have been made to establish relationships between calculated fingerprint similarity values and biological activities, none of these has yielded generally applicable rules for similarity searching. In this study, we have addressed the question of molecular versus activity similarity in a more fundamental way. First, we have evaluated if activity-relevant similarity value ranges could in principle be identified for standard fingerprints and distinguished from similarity resulting from random compound comparisons. Then, we have analyzed if activity-relevant similarity values could be used to guide typical similarity search calculations aiming to identify active compounds in databases. It was found that activity-relevant similarity values can be identified as a characteristic feature of fingerprints. However, it was also shown that such values cannot be reliably used as thresholds for practical similarity search calculations. In addition, the analysis presented herein helped to rationalize differences in fingerprint search performance.
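
Calculated fingerprint similarity, as discussed in the abstract above, is commonly quantified with the Tanimoto coefficient. The following minimal sketch computes it for fingerprints stored as sets of on-bit indices and compares the pairwise similarity of a set of actives with that of unrelated compounds. The toy fingerprints are invented, the abstract does not name the similarity measure or fingerprints used, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
from itertools import combinations
from statistics import median
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # indices of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_similarities(fps: Sequence[Fingerprint]) -> List[float]:
    """Tanimoto values for all unordered pairs in a compound set."""
    return [tanimoto(a, b) for a, b in combinations(fps, 2)]


# Toy fingerprints (hypothetical): actives share many bits, background does not.
actives = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({2, 3, 4, 5})]
background = [frozenset({1, 9}), frozenset({4, 12}), frozenset({7, 8})]

active_median = median(pairwise_similarities(actives))
background_median = median(pairwise_similarities(background))
```

Comparing the two distributions indicates whether a similarity range characteristic of shared activity exists. As the abstract notes, such a range does not automatically serve as a reliable threshold for practical similarity searching.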

16.
J Med Chem ; 59(22): 10285-10290, 2016 11 23.
Article in English | MEDLINE | ID: mdl-27809519

ABSTRACT

In PubChem screening assays, 466 highly promiscuous compounds were identified that were examined for known pan-assay interference compounds (PAINS) and aggregators using publicly available filters. These filters detected 210 PAINS and 67 aggregators. Compounds passing the filters included additional PAINS that were not detected, mostly due to tautomerism, and a variety of other potentially reactive compounds currently not encoded as PAINS. For a subset of compounds passing the filters, there was no evidence of potential artifacts. These compounds are considered candidates for further exploring multitarget activities and the molecular basis of polypharmacology.


Subject(s)
Polypharmacology , Small Molecule Libraries/chemistry , Drug Evaluation, Preclinical , Molecular Structure
17.
F1000Res ; 4(Chem Inf Sci): 118, 2015.
Article in English | MEDLINE | ID: mdl-26064479

ABSTRACT

In the context of polypharmacology, compound promiscuity is rationalized as the ability of small molecules to specifically interact with multiple targets. To study promiscuity progression of bioactive compounds in detail, nearly 1 million compounds and more than 5.2 million activity records were analyzed. Compound sets were assembled by applying different data confidence criteria and selecting compounds with activity histories over many years. On the basis of release dates, compounds and activity records were organized on a time course, which ultimately enabled monitoring data growth and promiscuity progression over nearly 40 years, beginning in 1976. Surprisingly low degrees of promiscuity were consistently detected for all compound sets and there were only small increases in promiscuity over time. In fact, most compounds had a constant degree of promiscuity, including compounds with an activity history of 10 or 20 years. Moreover, during periods of massive data growth, beginning in 2007, promiscuity degrees also remained constant or displayed only minor increases, depending on the activity data confidence levels. Considering high-confidence data, bioactive compounds currently interact with 1.5 targets on average, regardless of their origins, and display essentially constant degrees of promiscuity over time. Taken together, our findings provide expectation values for promiscuity progression and magnitudes among bioactive compounds as activity data further grow.

18.
Mol Inform ; 34(2-3): 127-33, 2015 02.
Article in English | MEDLINE | ID: mdl-27490035

ABSTRACT

Support vector machines (SVMs) are among the most popular machine learning methods for compound classification and other chemoinformatics tasks such as, for example, the prediction of ligand-target pairs or compound activity profiles. Depending on the specific applications, different SVM strategies can be used. For example, in the context of potency-directed virtual screening, linear combinations of multiple SVM models have been shown to enrich database selection sets with potent compounds compared to individual models. An open question concerning the use of SVM linear combinations (SVM-LCs) is how to best weight the models on a relative scale. Typically, linear weights are subjectively set. Herein, preferred weighting factors for SVM-LC were systematically determined. Therefore, weights were treated as meta-parameters and optimized by machine learning to enrich data set rankings with highly active compounds. The meta-parameter approach has been applied to 10 screening data sets and found to further improve SVM performance over other SVM-LCs and support vector regression (SVR) models. The results show that optimal weights depend on data set characteristics and chosen molecular representations. In addition, individual models often do not contribute to the performance of SVM-LCs. Taken together, these findings emphasize the need for systematic meta-parameter estimation.


Subject(s)
Databases, Chemical , Models, Theoretical , Support Vector Machine
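Treating the weights of an SVM linear combination as meta-parameters, as described in the abstract above, can be illustrated with a small sketch. It assumes that each individual model has already produced one score per compound. A plain grid search stands in for the machine learning optimization used in the study, and the objective (mean potency of the top-k ranked compounds), the function names, and the toy numbers are all assumptions.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def combined_scores(model_scores: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-model scores; model_scores[m][i] is model m, compound i."""
    n = len(model_scores[0])
    return [sum(w * scores[i] for w, scores in zip(weights, model_scores))
            for i in range(n)]


def top_k_mean_potency(scores: Sequence[float],
                       potencies: Sequence[float], k: int) -> float:
    """Mean potency of the k compounds ranked highest by the combined score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(potencies[i] for i in ranked[:k]) / k


def best_weights(model_scores: Sequence[Sequence[float]],
                 potencies: Sequence[float], k: int,
                 grid: Sequence[float] = (0.0, 0.5, 1.0)):
    """Grid search over weights (stand-in for the study's optimization)."""
    best: Optional[Tuple[Tuple[float, ...], float]] = None
    for weights in product(grid, repeat=len(model_scores)):
        if not any(weights):
            continue  # skip the all-zero combination
        value = top_k_mean_potency(combined_scores(model_scores, weights), potencies, k)
        if best is None or value > best[1]:
            best = (weights, value)
    return best


# Toy data (hypothetical): two models, four compounds with known potencies.
toy_scores = [[0.9, 0.1, 0.5, 0.2], [0.1, 0.8, 0.6, 0.3]]
toy_potencies = [5.0, 9.0, 8.0, 4.0]
weights, value = best_weights(toy_scores, toy_potencies, k=2)
```

In this toy case the search assigns a weight of zero to the first model, a small-scale analog of the observation that individual models may not contribute to the combination.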