ABSTRACT
The SAR matrix data structure organizes compound data sets according to structurally analogous matching molecular series in a format reminiscent of conventional R-group tables. An intrinsic feature of SAR matrices is that they contain many virtual compounds that represent unexplored combinations of core structures and substituents extracted from compound data sets on the basis of the matched molecular pair formalism. These virtual compounds are candidates for further exploration but are difficult, if not impossible, to prioritize by visual inspection of multiple SAR matrices. Therefore, we introduce herein a compound neighborhood concept as an extension of the SAR matrix data structure that makes it possible to identify preferred virtual compounds for further analysis. On the basis of well-defined compound neighborhoods, the potency of virtual compounds can be predicted by considering the individual contributions of core structures and substituents from neighbors. In extensive benchmark studies, virtual compounds have been prioritized in different data sets on the basis of multiple neighborhoods, yielding accurate potency predictions.
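The neighborhood-based potency prediction described above can be illustrated with a minimal Python sketch. It assumes a simple additive model in which the potency of a virtual compound is estimated from complete 2 x 2 neighborhoods (two cores by two substituents) in the matrix. This additivity rule, the dictionary encoding, and the function names `virtual_cells` and `predict_potency` are illustrative assumptions, not the published neighborhood definition.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical SAR matrix encoding: (core, substituent) -> potency (e.g., pIC50).
# Combinations absent from the dict are virtual compounds.
SarMatrix = dict[tuple[str, str], float]


def virtual_cells(matrix: SarMatrix, cores: list[str], subs: list[str]) -> list[tuple[str, str]]:
    """List core/substituent combinations without a measured potency."""
    return [(c, s) for c in cores for s in subs if (c, s) not in matrix]


def predict_potency(matrix: SarMatrix, core: str, sub: str,
                    cores: list[str], subs: list[str]) -> float | None:
    """Average additive estimates over complete 2x2 neighborhoods.

    Assumes additive core and substituent contributions, so that
    p(core, sub) ~ p(core, s2) + p(c2, sub) - p(c2, s2).
    Returns None if the virtual compound has no complete neighborhood.
    """
    estimates = []
    for c2 in cores:
        for s2 in subs:
            if c2 == core or s2 == sub:
                continue
            needed = [(core, s2), (c2, sub), (c2, s2)]
            if all(k in matrix for k in needed):
                estimates.append(matrix[(core, s2)] + matrix[(c2, sub)] - matrix[(c2, s2)])
    return mean(estimates) if estimates else None


matrix = {("A", "x"): 6.0, ("A", "y"): 7.0, ("B", "x"): 5.5}
print(virtual_cells(matrix, ["A", "B"], ["x", "y"]))  # [('B', 'y')]
print(predict_potency(matrix, "B", "y", ["A", "B"], ["x", "y"]))  # 6.5
```

Averaging over several neighborhoods, rather than relying on a single one, reflects the idea of prioritizing virtual compounds on the basis of multiple neighborhoods; a real implementation would also need to handle non-additive SARs.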
Subject(s)
Drug Discovery; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology; Structure-Activity Relationship; Databases, Factual; Humans; Models, Biological
ABSTRACT
Active compounds can participate in different local structure-activity relationship (SAR) environments and introduce different degrees of local SAR discontinuity, depending on their structural and potency relationships in data sets. Such SAR features have thus far mostly been analyzed using descriptive approaches, in particular activity landscape modeling. However, compounds in different local SAR environments have not yet been predicted. Herein, we adapt the emerging chemical patterns (ECP) method, a machine learning approach for compound classification, to systematically predict compounds with different local SAR characteristics. ECP analysis is shown to accurately assign many compounds to different local SAR environments across a variety of activity classes covering the entire range of observed local SARs. Control calculations were carried out using random forests and multiclass support vector machines, and a variety of statistical performance measures were applied. In all instances, ECP calculations yielded performance comparable to or better than that of the controls. The approach presented herein can be applied to predict compounds that complement local SARs or to prioritize compounds with different SAR characteristics.
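To make the idea of emerging patterns concrete, the following sketch enumerates small item combinations that are frequent in one class and rare in the others (high growth rate), then assigns a compound to the class whose matching patterns have the highest summed support. The descriptor-item encoding, thresholds, and scoring rule are hypothetical; this is a generic emerging-pattern classifier, not the published ECP implementation.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical encoding: each compound is a set of discretized descriptor
# items such as "logP:high"; class labels stand for local SAR environments.


def support(pattern: frozenset[str], compounds: list[set[str]]) -> float:
    """Fraction of compounds containing every item of the pattern."""
    if not compounds:
        return 0.0
    return sum(pattern <= c for c in compounds) / len(compounds)


def growth_rate(pattern, target, background) -> float:
    """Support ratio target/background; infinite for jumping emerging patterns."""
    s_t, s_b = support(pattern, target), support(pattern, background)
    if s_b == 0:
        return float("inf") if s_t > 0 else 0.0
    return s_t / s_b


def emerging_patterns(target, background, max_size=2, min_growth=2.0, min_support=0.1):
    """Enumerate item combinations (up to max_size items) that emerge in the target class."""
    items = sorted(set().union(*target))
    patterns = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            p = frozenset(combo)
            if support(p, target) >= min_support and growth_rate(p, target, background) >= min_growth:
                patterns.append(p)
    return patterns


def classify(compound: set[str], classes: dict[str, list[set[str]]]) -> str:
    """Assign the class whose matched emerging patterns have the largest summed support."""
    scores = {}
    for label, members in classes.items():
        background = [c for other, cs in classes.items() if other != label for c in cs]
        eps = emerging_patterns(members, background)
        scores[label] = sum(support(p, members) for p in eps if p <= compound)
    return max(scores, key=scores.get)
```

The patterns are recomputed on every call for readability; a practical version would mine them once per class and reuse them.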
Subject(s)
Artificial Intelligence; Models, Chemical; Structure-Activity Relationship
ABSTRACT
We have aimed to systematically extract analog series with related core structures from multi-target activity space to explore the target promiscuity of closely related analogs. To this end, a previously introduced SAR matrix structure was adapted and further extended for large-scale data mining. These matrices organize analog series with related yet distinct core structures in a consistent manner. High-confidence compound activity data yielded more than 2,300 non-redundant matrices capturing 5,821 analog series, including 4,288 series with multi-target and 735 series with multi-family activities. Many matrices captured more than three analog series with activity against more than five targets. The matrices revealed a variety of promiscuity patterns. Compound series matrices also contain virtual compounds, which provide suggestions for compound design focusing on desired activity profiles.
Subject(s)
Data Mining; Drug Design; Pharmaceutical Preparations/chemistry; Databases, Pharmaceutical; Structure-Activity Relationship
ABSTRACT
An activity cliff is defined as a pair of structurally similar compounds that have a large difference in potency against a given target. The activity cliff concept has recently been extended in different ways, including the introduction of the activity ridge data structure. An activity ridge consists of two subsets of highly and weakly potent structurally analogous compounds that form all possible pairwise activity cliffs between them. As such, the activity ridge data structure is rich in structure-activity relationship (SAR) information and attractive for SAR analysis. Activity ridges have been detected in various compound data sets. Analogously to single-target activity cliffs, activity ridges have thus far only been investigated for individual targets. In this study, we have asked whether multitarget activity ridges might also exist. The analysis was complicated by the limited availability of suitable compound profiling data sets in the public domain. However, in a high-dimensional kinase inhibitor data set recently released by Abbott Laboratories, multitarget activity ridges involving up to 43 different inhibitors and 26 kinase targets were identified. Given the inherently complex architecture of multitarget activity ridges, a new representation format based on a scaffold-target matrix was designed for these ridges. Furthermore, a scoring scheme was developed to identify compounds that were most variably distributed across a multitarget ridge and displayed target differentiation potential. Taken together, our results indicate that multitarget activity ridges represent an attractive data structure for SAR exploration of high-dimensional activity spaces.
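The activity ridge definition (every highly potent compound forms a cliff with every weakly potent one) translates directly into a check over all cross-subset pairs. In this minimal sketch, similarity is supplied as precomputed analog pairs, the cliff criterion is assumed to be a potency difference of at least 2 log units, and a multitarget ridge is read as the set of targets for which the same two subsets pass the check. These choices and all names are assumptions.

```python
from __future__ import annotations

from itertools import product

# Hypothetical inputs: profile maps compound -> {target: pKi};
# similar_pairs holds unordered analog pairs (e.g., matched molecular pairs).


def is_activity_ridge(high, low, potency, similar_pairs, threshold=2.0) -> bool:
    """True if every (highly potent, weakly potent) pair is an activity cliff.

    A cliff is assumed to be a structurally similar pair in which the highly
    potent compound exceeds the weakly potent one by >= threshold log units.
    """
    if not high or not low:
        return False
    return all(
        frozenset((h, w)) in similar_pairs and potency[h] - potency[w] >= threshold
        for h, w in product(high, low)
    )


def ridge_targets(high, low, profile, similar_pairs, threshold=2.0) -> list[str]:
    """Targets for which the same two compound subsets form an activity ridge."""
    if not high or not low:
        return []
    shared = set.intersection(*(set(profile[c]) for c in high + low))
    return sorted(
        t for t in shared
        if is_activity_ridge(high, low, {c: profile[c][t] for c in high + low},
                             similar_pairs, threshold)
    )
```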
Subject(s)
Algorithms; Protein Kinase Inhibitors/chemistry; Protein Kinases/chemistry; Structure-Activity Relationship; Databases, Factual; Drug Discovery; Humans; Models, Molecular
ABSTRACT
The transfer of SAR information from one analog series to another is a difficult, yet highly attractive task in medicinal chemistry. At present, the evaluation of SAR transfer potential from a data mining perspective is still in its infancy. Only recently has a first computational approach been introduced to evaluate SAR transfer events. Here, a substructure relationship-based molecular network representation is used as a starting point to systematically identify SAR transfer series in large compound data sets. For this purpose, a two-stage methodology is introduced. In the first stage, a graph mining algorithm extracts all parallel series from compound data sets. A parallel series is formed by two series of analogs with different core structures but pairwise corresponding substitution patterns. In the second stage, the SAR transfer potential of the identified parallel series is evaluated using a scoring function that emphasizes corresponding potency progression over many analog pairs and large potency ranges. The substructure relationship-based molecular network in combination with the graph mining algorithm currently represents the only generally applicable approach to systematically detect SAR transfer events in large compound data sets. The combined approach has been evaluated on a large number of compound data sets and shown to systematically identify SAR transfer series.
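The two stages can be sketched for a single pair of analog series. Each series is assumed to be a mapping from substituent to potency for one core; stage one keeps the substituents shared by both series, and stage two applies a hypothetical score that multiplies the fraction of concordant potency changes by the number of corresponding analog pairs and the smaller of the two potency ranges. The scoring formula only illustrates the stated emphasis and is not the published function.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical encoding: one analog series = {substituent: potency} for one core.


def parallel_series(series_a: dict[str, float], series_b: dict[str, float],
                    min_size: int = 3) -> list[str]:
    """Stage 1: substituents present in both series (pairwise corresponding analogs)."""
    shared = sorted(set(series_a) & set(series_b))
    return shared if len(shared) >= min_size else []


def sar_transfer_score(series_a: dict[str, float], series_b: dict[str, float],
                       min_size: int = 3) -> float:
    """Stage 2: reward concordant potency progression, many pairs, and large ranges."""
    shared = parallel_series(series_a, series_b, min_size)
    if not shared:
        return 0.0
    pairs = list(combinations(shared, 2))
    # A pair is concordant if potency changes in the same direction in both series.
    concordant = sum(
        1 for r1, r2 in pairs
        if (series_a[r1] - series_a[r2]) * (series_b[r1] - series_b[r2]) > 0
    )
    vals_a = [series_a[r] for r in shared]
    vals_b = [series_b[r] for r in shared]
    potency_range = min(max(vals_a) - min(vals_a), max(vals_b) - min(vals_b))
    return concordant / len(pairs) * len(shared) * potency_range
```

Using the smaller of the two potency ranges keeps a series pair from scoring highly when only one of the series spans a wide potency range.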
Subject(s)
Algorithms; Antithrombins/chemistry; Data Mining; Small Molecule Libraries/chemistry; Structure-Activity Relationship; Thrombin/chemistry; Chemistry, Pharmaceutical; Databases, Chemical; Drug Design; Drug Discovery; Humans; Protein Binding; Research Design; Thrombin/antagonists & inhibitors
ABSTRACT
A new methodology for activity prediction of compounds from SAR matrices is introduced that is based upon conditional probabilities of activity. The approach has low computational complexity, is primarily designed for hit expansion from biological screening data, and accurately predicts both active and inactive compounds. Its performance is comparable to state-of-the-art machine learning methods such as support vector machines or Bayesian classification. Matrix-based activity prediction of virtual compounds further extends the spectrum of computational methods for compound design.
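A minimal sketch of matrix-based activity prediction from conditional probabilities follows, assuming binary screening outcomes stored per (core, substituent) cell. The Laplace smoothing, the averaging of the core and substituent probabilities, and the 0.5 threshold are illustrative assumptions rather than the published combination rule.

```python
from __future__ import annotations

# Hypothetical SAR matrix with screening outcomes: (core, substituent) -> active?


def conditional_activity(matrix: dict[tuple[str, str], bool], axis: int, value: str,
                         alpha: float = 1.0) -> float:
    """Laplace-smoothed P(active | core) for axis=0 or P(active | substituent) for axis=1."""
    labels = [active for key, active in matrix.items() if key[axis] == value]
    return (sum(labels) + alpha) / (len(labels) + 2 * alpha)


def predict_active(matrix: dict[tuple[str, str], bool], core: str, sub: str,
                   threshold: float = 0.5) -> tuple[float, bool]:
    """Score a virtual compound by averaging the core and substituent probabilities."""
    p = (conditional_activity(matrix, 0, core) + conditional_activity(matrix, 1, sub)) / 2
    return p, p >= threshold
```

Each prediction only counts the compounds in one matrix row and one matrix column, which is consistent with the low computational complexity emphasized above.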
Subject(s)
Computer Simulation; Databases, Chemical
ABSTRACT
In a previous Method Article, we presented the 'Structure-Activity Relationship (SAR) Matrix' (SARM) approach. The SARM methodology is designed to systematically extract structurally related compound series from screening or chemical optimization data and to organize these series and associated SAR information in matrices reminiscent of R-group tables. SARM calculations also yield many virtual candidate compounds that form a "chemical space envelope" around related series. To further extend the SARM approach, different methods are being developed to predict the activity of virtual compounds. In this follow-up contribution, we describe an activity prediction method that derives conditional probabilities of activity from SARMs and report representative results of the first prospective applications of this approach.
ABSTRACT
We describe the 'Structure-Activity Relationship (SAR) Matrix' (SARM) methodology, which is based upon a special two-step application of the matched molecular pair (MMP) formalism. The SARM method was originally designed for the extraction, organization, and visualization of compound series and associated SAR information from compound data sets. It has since been further developed and adapted for other applications, including compound design, activity prediction, library extension, and the navigation of multi-target activity spaces. The SARM approach and its extensions are presented here in context to introduce different types of applications and to provide an example of the evolution of a computational methodology in pharmaceutical research.
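The two-step logic can be sketched with precomputed fragments. The sketch assumes an external MMP fragmentation has already produced (core, substituent) pairs for compounds and (key, value) fragment pairs for cores; it groups compounds into series by core (step 1) and merges series whose cores share a key fragment into one matrix (step 2). Input formats and function names are hypothetical, and chemical fragmentation itself is not shown.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical inputs, assumed to come from an external MMP fragmentation:
#   fragments:      compound -> (core, substituent)      (step 1, compounds)
#   core_fragments: core -> [(key, value), ...]          (step 2, cores)


def compound_series(fragments: dict[str, tuple[str, str]]) -> dict[str, dict[str, str]]:
    """Step 1: group compounds sharing a core into series (core -> {substituent: compound})."""
    series: dict[str, dict[str, str]] = defaultdict(dict)
    for compound, (core, substituent) in fragments.items():
        series[core][substituent] = compound
    return dict(series)


def build_matrices(series, core_fragments, min_cores=2):
    """Step 2: combine series whose cores share a key fragment into one matrix.

    Each matrix is (substituent columns, {core: row}); None marks a virtual compound.
    """
    groups = defaultdict(set)
    for core in series:
        for key, _value in core_fragments.get(core, []):
            groups[key].add(core)
    matrices = {}
    for key, cores in groups.items():
        if len(cores) < min_cores:
            continue
        columns = sorted({s for c in cores for s in series[c]})
        rows = {c: [series[c].get(s) for s in columns] for c in sorted(cores)}
        matrices[key] = (columns, rows)
    return matrices
```

The `None` cells correspond to the virtual compounds discussed in the other abstracts: combinations of a core and a substituent that both occur in the data set but have not been observed together.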
ABSTRACT
Compound promiscuity is defined as the specific interaction of a small molecule with multiple biological targets (as opposed to non-specific binding events) and represents the molecular basis of polypharmacology, an emerging theme in drug discovery and chemical biology. This concise review focuses on recent studies that have provided a detailed picture of the degree of promiscuity among different categories of small molecules. In addition, an exemplary computational approach is discussed that is designed to navigate multi-target activity spaces populated with various compounds.
ABSTRACT
A graphical method is introduced for compound data mining and structure-activity relationship (SAR) data analysis that is based upon a canonical structural organization scheme and captures a compound-scaffold-skeleton hierarchy. The graph representation has a constant layout, integrates compound activity data, and provides direct access to SAR information. Characteristic SAR patterns that emerge from the graph are easily identified. The molecular hierarchy enables "forward-backward" analysis of compound data and reveals both global and local SAR patterns. For example, in heterogeneous data sets, compound series are immediately identified that convey interpretable SAR information in isolation or in the structural context of related series, which often define SAR pathways through data sets.
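The compound-scaffold-skeleton hierarchy can be represented as a nested mapping. This sketch assumes that scaffold and skeleton assignments have been precomputed (for example, with a cheminformatics toolkit) and shows both directions of navigation: downward from a skeleton to its compounds and upward from a compound to its scaffold and skeleton, with a per-scaffold potency range as a simple SAR summary. All names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def build_hierarchy(compound_scaffold: dict[str, str],
                    scaffold_skeleton: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """Downward view: skeleton -> scaffold -> sorted list of compounds."""
    tree = defaultdict(lambda: defaultdict(list))
    for compound, scaffold in sorted(compound_scaffold.items()):
        tree[scaffold_skeleton[scaffold]][scaffold].append(compound)
    return {skeleton: dict(scaffolds) for skeleton, scaffolds in tree.items()}


def scaffold_sar_summary(tree, potency: dict[str, float]) -> dict[str, tuple[int, float]]:
    """Per scaffold: number of compounds and potency range (max - min)."""
    summary = {}
    for scaffolds in tree.values():
        for scaffold, compounds in scaffolds.items():
            values = [potency[c] for c in compounds]
            summary[scaffold] = (len(values), max(values) - min(values))
    return summary


def path_to_root(compound: str, compound_scaffold: dict[str, str],
                 scaffold_skeleton: dict[str, str]) -> tuple[str, str, str]:
    """Upward view: compound -> scaffold -> skeleton."""
    scaffold = compound_scaffold[compound]
    return compound, scaffold, scaffold_skeleton[scaffold]
```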