ABSTRACT
In May 2022, JCAMD published a Special Issue in honor of Gerald (Gerry) Maggiora, whose scientific leadership over many decades advanced the fields of computational chemistry and chemoinformatics for drug discovery. Along the way, he has had an impact on many researchers in both academia and the pharmaceutical industry. In this Epilogue, we explain the origins of the Festschrift and present a series of first-hand vignettes, in approximate chronological sequence, that together paint a picture of this remarkable man. Whether the vignettes highlight Gerry's endless curiosity about the molecular life sciences, his willingness to challenge conventional wisdom, or his generous support of junior colleagues and peers, the colleagues and collaborators who contributed them are united in their appreciation of his positive influence. These tributes also reflect key trends and themes during the evolution of modern drug discovery, seen through the lens of people who worked with a visionary leader. Junior scientists will find in them an inspiring roadmap for creative collegiality and collaboration.
Subject(s)
Biological Science Disciplines, Mentors, History, 20th Century, Humans
ABSTRACT
BACKGROUND AND PURPOSE: Because robotic devices record the kinematics and kinetics of human movements with high resolution, we hypothesized that robotic measures collected longitudinally in patients after stroke would bear a significant relationship to standard clinical outcome measures and, therefore, might provide superior biomarkers. METHODS: In patients with moderate-to-severe acute ischemic stroke, we used clinical scales and robotic devices to measure arm movement 7, 14, 21, 30, and 90 days after the event at 2 clinical sites. The robots are interactive devices that measure speed, position, and force, so that calculated kinematic and kinetic parameters can be compared with clinical assessments. RESULTS: Among 208 patients, the robotic measures predicted the clinical measures well (cross-validated R²: modified Rankin Scale = 0.60; National Institutes of Health Stroke Scale = 0.63; Fugl-Meyer = 0.73; Motor Power = 0.75). When suitably scaled and combined by an artificial neural network, the robotic measures demonstrated greater sensitivity in measuring the recovery of patients from day 7 to day 90 (increased standardized effect = 1.47). CONCLUSIONS: These results demonstrate that robotic measures of motor performance can more than adequately capture outcome, and the larger effect size will reduce the required sample size; reducing sample size will likely improve study efficiency.
Subject(s)
Arm/physiology, Biomarkers, Movement/physiology, Robotics, Stroke Rehabilitation, Stroke/physiopathology, Aged, Biomechanical Phenomena, Data Interpretation, Statistical, Endpoint Determination, Ethnicity, Female, Functional Laterality/physiology, Humans, Male, Models, Anatomic, Nonlinear Dynamics, Predictive Value of Tests, Recovery of Function, Reproducibility of Results
Subject(s)
Informatics/history, Models, Chemical, History, 20th Century, History, 21st Century, Humans, United States
ABSTRACT
The aim of virtual screening (VS) is to identify bioactive compounds through computational means, by employing knowledge about the protein target (structure-based VS) or known bioactive ligands (ligand-based VS). In VS, a large number of molecules are ranked according to their likelihood of being bioactive, with the aim of enriching the top fraction of the resulting list (which can then be tested in bioassays). At its core, VS attempts to improve the odds of identifying bioactive molecules by maximizing the true positive rate, that is, by ranking the truly active molecules as high as possible (and, correspondingly, the truly inactive ones as low as possible). In choosing the right approach, the researcher is faced with many questions: where does the optimal balance between efficiency and accuracy lie when evaluating a particular algorithm; do some methods perform better than others, and in what particular situations; and what do retrospective results tell us about the prospective utility of a particular method? Given the multitude of settings, parameters, and data sets the practitioner can choose from, many pitfalls lurk along the way that can render VS less efficient or downright useless. This review attempts to catalogue published and unpublished problems, shortcomings, failures, and technical traps of VS methods, with the aim of helping users avoid these pitfalls by making them aware of them in the first place.
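To make the notion of enriching the top fraction concrete, the sketch below computes the enrichment factor at a given fraction of a ranked screening list. It is an illustration only, not taken from the review; the function name, toy scores, and labels are hypothetical.

```python
# Illustrative sketch (not from the review): enrichment factor for a ranked screen.
# 'scores' are VS scores (higher = predicted more active); 'labels' are 1 for
# truly active and 0 for inactive compounds.

def enrichment_factor(scores, labels, fraction=0.01):
    """EF(x) = (fraction of actives found in the top x) / (fraction expected at random)."""
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    actives_top = sum(label for _, label in ranked[:n_top])
    actives_total = sum(labels)
    if actives_total == 0:
        return 0.0
    return (actives_top / n_top) / (actives_total / n)

# Example: a screen where most actives are ranked near the top.
scores = [0.95, 0.91, 0.88, 0.70, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
print(enrichment_factor(scores, labels, fraction=0.2))  # EF at top 20% = 2.5
```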
Subject(s)
Algorithms, Molecular Docking Simulation, Proteins/chemistry, Small Molecule Libraries/chemistry, User-Computer Interface, Binding Sites, Databases, Chemical, High-Throughput Screening Assays, Humans, Ligands, Likelihood Functions, Protein Binding, Proteins/agonists, Proteins/antagonists & inhibitors, Structure-Activity Relationship
ABSTRACT
We present a novel class of topological molecular descriptors, which we call power keys. Power keys are computed by enumerating all possible linear, branch, and cyclic subgraphs up to a given size, encoding the connected atoms and bonds into two separate components, and recording the number of occurrences of each subgraph. We have applied these new descriptors for the screening stage of substructure searching on a relational database of about 1 million compounds using a diverse set of reference queries. The new keys can eliminate the vast majority (>99.9% on average) of nonmatching molecules within a fraction of a second. More importantly, for many of the queries the screening efficiency is 100%. A common feature was identified for the molecules for which power keys have perfect discriminative ability. This feature can be exploited to obviate the need for expensive atom-by-atom matching in situations where some ambiguity can be tolerated (fuzzy substructure searching). Other advantages over commonly used molecular keys are also discussed.
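Although the actual power-key generation is more involved, the screening test it enables is easy to sketch: a molecule can contain the query substructure only if it contains at least as many occurrences of every key as the query does. The Python sketch below is illustrative only; the key strings and counts are hypothetical.

```python
# Minimal sketch of counted-key screening (the power-key generation itself,
# which enumerates linear, branch, and cyclic subgraphs, is not shown).
from collections import Counter

def passes_screen(query_keys: Counter, mol_keys: Counter) -> bool:
    """A molecule survives the screen only if it has at least as many
    occurrences of every key as the query does."""
    return all(mol_keys.get(k, 0) >= c for k, c in query_keys.items())

# Hypothetical key counts for a query and two database molecules.
query = Counter({"C-C-N": 1, "c1ccccc1": 1})
mol_a = Counter({"C-C-N": 2, "c1ccccc1": 1, "C=O": 1})  # survives the screen
mol_b = Counter({"C-C-N": 1, "C=O": 2})                 # eliminated

print([passes_screen(query, m) for m in (mol_a, mol_b)])  # [True, False]
```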
Subject(s)
Computational Biology/methods, Drug Discovery/methods, Software, Algorithms, Computational Biology/statistics & numerical data, Databases, Factual, Drug Discovery/statistics & numerical data, Fuzzy Logic, Models, Molecular, Structure-Activity Relationship
ABSTRACT
The utility of chemoinformatics systems depends on the accurate computer representation and efficient manipulation of chemical compounds. In such systems, a small molecule is often digitized as a large fingerprint vector, where each element indicates the presence/absence or the number of occurrences of a particular structural feature. Since in theory the number of unique features can be exceedingly large, these fingerprint vectors are usually folded into much shorter ones using hashing and modulo operations, allowing fast "in-memory" manipulation and comparison of molecules. There is increasing evidence that lossless fingerprints can substantially improve retrieval performance in chemical database searching (substructure or similarity), which has led to the development of several lossless fingerprint compression algorithms. However, any gains in storage and retrieval afforded by compression need to be weighed against the extra computational burden required for decompression before these fingerprints can be compared. Here we demonstrate that graphics processing units (GPUs) can greatly alleviate this problem, enabling the practical application of lossless fingerprints on large databases. More specifically, we show that, with the help of a ~$500 ordinary video card, the entire PubChem database of ~32 million compounds can be searched in ~0.2-2 s on average, which is 2 orders of magnitude faster than a conventional CPU. If multiple query patterns are processed in batch, the speedup is even more dramatic (less than 0.02-0.2 s/query for 1000 queries). In the present study, we use the Elias gamma compression algorithm, which results in a compression ratio as high as 0.097.
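For readers unfamiliar with the compression scheme, the following is a minimal pure-Python sketch of Elias gamma coding applied to the gaps between the set bits of a sparse fingerprint. It illustrates the encoding only; the GPU decoder described above is far more heavily optimized.

```python
# Minimal sketch of Elias gamma coding for sparse fingerprints (illustrative only).

def gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer: (b-1) zeros followed by the
    b-bit binary representation of n, where b = n.bit_length()."""
    assert n >= 1
    b = n.bit_length()
    return "0" * (b - 1) + format(n, "b")

def gamma_decode(bits: str):
    """Decode a concatenation of gamma codes back into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values

# A sparse fingerprint can be stored as gaps between consecutive set-bit indices
# (the first "gap" is the first index + 1 so that every value is positive).
set_bits = [3, 17, 20, 150]
gaps = [set_bits[0] + 1] + [b - a for a, b in zip(set_bits, set_bits[1:])]
encoded = "".join(gamma_encode(g) for g in gaps)
assert gamma_decode(encoded) == gaps
```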
Subject(s)
Chemistry, Pharmaceutical/methods, Data Mining/methods, Organic Chemicals/analysis, Algorithms, Chemistry, Pharmaceutical/statistics & numerical data, Computer Graphics, Data Compression, Databases, Factual, Models, Chemical, Molecular Structure, Software
ABSTRACT
We introduce Single R-Group Polymorphisms (SRPs, pronounced 'sharps'), an intuitive framework for analyzing substituent effects and activity cliffs in a single congeneric series. A SRP is a pair of compounds that differ only in a single R-group position. Because the same substituent pair may occur in multiple SRPs in the series (i.e., with different combinations of substituents at the other R-group positions), SRP analysis makes it easy to identify systematic substituent effects and activity cliffs at each point of variation (R-cliffs). SRPs can be visualized as a symmetric heatmap in which each cell represents a particular pair of substituents, color-coded by the average difference in activity between the compounds that contain that particular SRP. SRP maps offer several advantages over existing techniques for visualizing activity cliffs: 1) the chemical structures of all the substituents are displayed simultaneously on a single map, directly engaging the pattern recognition abilities of the medicinal chemist; 2) they are based on R-group decomposition, a natural paradigm for generating and rationalizing SAR; 3) their heatmap representation makes it easy to identify systematic trends in the data; and 4) they generalize the concept of activity cliffs beyond similarity by allowing the analyst to sort the substituents according to any property of interest or place them manually in any desired order.
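A minimal sketch of the underlying bookkeeping, using hypothetical R-groups and activities: pairs of analogs that differ at exactly one R position are grouped by position and substituent pair, and the mean activity difference is recorded for each cell of the map.

```python
# Illustrative sketch: extracting Single R-Group Polymorphisms (SRPs) from an
# R-group decomposition. Each compound is (R-group tuple, activity); data are hypothetical.
from itertools import combinations
from collections import defaultdict

compounds = [
    (("H",  "Cl", "OMe"), 6.2),
    (("Me", "Cl", "OMe"), 7.1),
    (("H",  "F",  "OMe"), 6.5),
    (("Me", "F",  "OMe"), 7.4),
]

srp_cells = defaultdict(list)  # (position, subst_a, subst_b) -> activity differences
for (r1, a1), (r2, a2) in combinations(compounds, 2):
    diffs = [i for i in range(len(r1)) if r1[i] != r2[i]]
    if len(diffs) == 1:                                   # differ at exactly one R position
        i = diffs[0]
        (sa, va), (sb, vb) = sorted([(r1[i], a1), (r2[i], a2)])
        srp_cells[(i, sa, sb)].append(vb - va)

for (pos, sa, sb), deltas in sorted(srp_cells.items()):
    print(f"R{pos + 1}: {sa} -> {sb}  mean activity difference = {sum(deltas)/len(deltas):+.2f}")
```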
Subject(s)
Cathepsins/antagonists & inhibitors, Drug Discovery, Protease Inhibitors/chemistry, Software, Cathepsins/chemistry, Computer Graphics, Ligands, Molecular Structure, Structure-Activity Relationship
ABSTRACT
Stochastic proximity embedding (SPE) was developed as a method for efficiently calculating lower dimensional embeddings of high-dimensional data sets. Rather than using a global minimization scheme, SPE iteratively refines the embedded coordinates of randomly selected pairs of points so that their distances better match the corresponding input proximities. This approach was found to generate embeddings of comparable quality to those obtained using classical multidimensional scaling algorithms. However, SPE obtains these results in O(n) rather than O(n²) time and is thus much better suited to large data sets. In an effort both to speed up SPE and to apply it to even larger problems, we have created a multithreaded implementation that takes advantage of the growing general computing power of graphics processing units (GPUs). The use of GPUs allows the embedding of data sets containing millions of data points in interactive time scales.
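The SPE update itself is compact enough to sketch in a few lines of NumPy. The following is a minimal single-threaded reference of the pairwise update rule with illustrative parameter values, not the GPU implementation described above.

```python
# Minimal CPU sketch of stochastic proximity embedding (SPE); parameters are illustrative.
import numpy as np

def spe(proximity, dim=2, n_cycles=50, n_steps=10000, lam=1.0, lam_decay=0.95, eps=1e-9, seed=0):
    """Embed n points in 'dim' dimensions so that embedded distances approximate the
    target proximities (an n x n matrix). Random pairs are nudged toward their target
    distance; the learning rate decays after each cycle."""
    rng = np.random.default_rng(seed)
    n = proximity.shape[0]
    x = rng.random((n, dim))
    for _ in range(n_cycles):
        for _ in range(n_steps):
            i, j = rng.integers(n), rng.integers(n)
            if i == j:
                continue
            delta = x[i] - x[j]
            d = np.linalg.norm(delta) + eps
            step = lam * 0.5 * (proximity[i, j] - d) / d * delta
            x[i] += step
            x[j] -= step
        lam *= lam_decay
    return x

# Example: recover a 2D layout from the pairwise Euclidean distances of random points.
pts = np.random.default_rng(1).random((100, 2))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
emb = spe(d)
```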
Subject(s)
Computational Biology/methods, Drug Discovery/methods, Software, Algorithms, Computational Biology/statistics & numerical data, Computer Graphics, Computers, Databases, Factual, Drug Discovery/statistics & numerical data
ABSTRACT
Efficient substructure searching is a key requirement for any chemical information management system. In this paper, we describe the substructure search capabilities of ABCD, an integrated drug discovery informatics platform developed at Johnson & Johnson Pharmaceutical Research & Development, L.L.C. The solution consists of several algorithmic components: 1) a pattern mapping algorithm for solving the subgraph isomorphism problem, 2) an indexing scheme that enables very fast substructure searches on large structure files, 3) the incorporation of that indexing scheme into an Oracle cartridge to enable querying large relational databases through SQL, and 4) a cost estimation scheme that allows the Oracle cost-based optimizer to generate a good execution plan when a substructure search is combined with additional constraints in a single SQL query. The algorithm was tested on a public database comprising nearly 1 million molecules using 4,629 substructure queries, the vast majority of which were submitted by discovery scientists over the last 2.5 years of user acceptance testing of ABCD. 80.7% of these queries were completed in less than a second and 96.8% in less than ten seconds on a single CPU, while on eight processing cores these numbers increased to 93.2% and 99.7%, respectively. The slower queries involved extremely generic patterns that returned the entire database as screening hits and required extensive atom-by-atom verification.
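ABCD's index and cartridge are proprietary, but the screen-then-verify pattern they implement can be sketched with RDKit standing in for both stages (assuming RDKit is available): pattern fingerprints provide the fast screen, and atom-by-atom matching verifies the surviving candidates. The molecules and query below are hypothetical.

```python
# Illustrative screen-then-verify substructure search; RDKit stands in for both
# stages here (ABCD uses its own keys, index, and matcher).
from rdkit import Chem

database_smiles = ["CCOC(=O)c1ccccc1", "CCN(CC)CC", "c1ccc2[nH]ccc2c1"]
mols = [Chem.MolFromSmiles(s) for s in database_smiles]

query = Chem.MolFromSmiles("c1ccccc1C=O")  # aryl carbonyl fragment used as the query

# Stage 1: fast screen -- a molecule can match only if its pattern fingerprint
# contains every bit set in the query's fingerprint.
qbits = set(Chem.PatternFingerprint(query).GetOnBits())
candidates = [m for m in mols if qbits <= set(Chem.PatternFingerprint(m).GetOnBits())]

# Stage 2: atom-by-atom verification of the surviving candidates.
hits = [Chem.MolToSmiles(m) for m in candidates if m.HasSubstructMatch(query)]
print(hits)  # ethyl benzoate should be the only hit
```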
Subject(s)
Algorithms, Drug Discovery, Informatics/methods, Small Molecule Libraries/chemistry, Databases, Factual, Drug Discovery/economics, Informatics/economics, Time Factors
ABSTRACT
We present a novel approach for enhancing the diversity of a chemical library, rooted in the theory of the wisdom of crowds. Our approach was motivated by a desire to tap into the collective experience of our global medicinal chemistry community and involved four basic steps: (1) Candidate compounds for acquisition were screened using various structural and property filters in order to eliminate clearly non-drug-like matter. (2) The remaining compounds were clustered together with our in-house collection using a novel fingerprint-based clustering algorithm that emphasizes common substructures and works with millions of molecules. (3) Clusters populated exclusively by external compounds were identified as "diversity holes," and representative members of these clusters were presented to our global medicinal chemistry community, who were asked to specify which ones they liked, disliked, or were indifferent to, using a simple point-and-click interface. (4) The resulting votes were used to rank the clusters from most to least desirable and to prioritize which ones should be targeted for acquisition. Analysis of the voting results reveals interesting voter behaviors and distinct preferences for certain molecular property ranges that are fully consistent with lead-like profiles established through systematic analysis of large historical databases.
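The vote aggregation step lends itself to a very simple illustration. The sketch below ranks "diversity hole" clusters by net chemist preference (likes minus dislikes); the scoring scheme, cluster names, and counts are hypothetical.

```python
# Illustrative aggregation of chemist votes on "diversity hole" clusters.
# Scoring scheme (like = +1, dislike = -1, indifferent = 0) is hypothetical.
votes = {
    "cluster_042": {"like": 17, "dislike": 3,  "indifferent": 5},
    "cluster_107": {"like": 4,  "dislike": 12, "indifferent": 9},
    "cluster_311": {"like": 11, "dislike": 2,  "indifferent": 1},
}

def net_score(v):
    return v["like"] - v["dislike"]

ranked = sorted(votes.items(), key=lambda kv: net_score(kv[1]), reverse=True)
for cluster, v in ranked:
    print(f"{cluster}: net score {net_score(v):+d} from {sum(v.values())} votes")
```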
Subject(s)
Small Molecule Libraries/chemistry, Chemistry, Pharmaceutical/methods, Cluster Analysis, Molecular Structure
ABSTRACT
OBJECTIVE: One of the greatest challenges in clinical trial design is dealing with the subjectivity and variability introduced by human raters when measuring clinical end-points. We hypothesized that robotic measures that capture the kinematics of human movements collected longitudinally in patients after stroke would bear a significant relationship to the ordinal clinical scales and potentially lead to the development of more sensitive motor biomarkers that could improve the efficiency and cost of clinical trials. MATERIALS AND METHODS: We used clinical scales and a robotic assay to measure arm movement in 208 patients 7, 14, 21, 30 and 90 days after acute ischemic stroke at two separate clinical sites. The robots are low impedance and low friction interactive devices that precisely measure speed, position and force, so that even a hemiparetic patient can generate a complete measurement profile. These profiles were used to develop predictive models of the clinical assessments employing a combination of artificial ant colonies and neural network ensembles. RESULTS: The resulting models replicated commonly used clinical scales to a cross-validated R² of 0.73, 0.75, 0.63 and 0.60 for the Fugl-Meyer, Motor Power, NIH Stroke and modified Rankin scales, respectively. Moreover, when suitably scaled and combined, the robotic measures demonstrated a significant increase in effect size from day 7 to 90 over historical data (1.47 versus 0.67). DISCUSSION AND CONCLUSION: These results suggest that it is possible to derive surrogate biomarkers that can significantly reduce the sample size required to power future stroke clinical trials.
Subject(s)
Movement, Recovery of Function, Robotics/methods, Stroke Rehabilitation/standards, Stroke/physiopathology, Adult, Aged, Aged, 80 and over, Female, Humans, Male, Middle Aged, Neurologic Examination/methods, Neurologic Examination/standards, Stroke Rehabilitation/methods
ABSTRACT
Finding the rotational matrix that minimizes the sum of squared deviations between two sets of vectors is an important problem in bioinformatics and crystallography. Traditional algorithms involve the inversion or decomposition of a 3 × 3 or 4 × 4 matrix, which can be computationally expensive and numerically unstable in certain cases. Here, we present a simple and robust algorithm to rapidly determine the optimal rotation using a Newton-Raphson quaternion-based method and an adjoint matrix. Our method is at least an order of magnitude more efficient than conventional inversion/decomposition methods, and it should be particularly useful for high-throughput analyses of molecular conformations.
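A minimal NumPy sketch of the underlying idea, not the published implementation: the minimal RMSD follows from the largest eigenvalue of a 4 × 4 quaternion key matrix, found here by Newton-Raphson iteration on its characteristic polynomial starting from a guaranteed upper bound. Recovery of the rotation from the adjoint matrix is omitted.

```python
# Illustrative quaternion/Newton-Raphson superposition sketch (NumPy only).
import numpy as np

def min_rmsd(A, B, tol=1e-11, max_iter=50):
    """A, B: (N, 3) coordinate arrays, assumed centered at the origin."""
    N = len(A)
    Sxx, Sxy, Sxz = A[:, 0] @ B[:, 0], A[:, 0] @ B[:, 1], A[:, 0] @ B[:, 2]
    Syx, Syy, Syz = A[:, 1] @ B[:, 0], A[:, 1] @ B[:, 1], A[:, 1] @ B[:, 2]
    Szx, Szy, Szz = A[:, 2] @ B[:, 0], A[:, 2] @ B[:, 1], A[:, 2] @ B[:, 2]
    K = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,        Syz + Szy,       -Sxx - Syy + Szz],
    ])
    GA, GB = np.sum(A * A), np.sum(B * B)
    # Newton-Raphson for the largest root of det(lambda*I - K), starting from
    # (GA + GB) / 2, which is always an upper bound on that root.
    p = np.poly(K)
    dp = np.polyder(p)
    lam = 0.5 * (GA + GB)
    for _ in range(max_iter):
        step = np.polyval(p, lam) / np.polyval(dp, lam)
        lam -= step
        if abs(step) < tol * abs(lam):
            break
    # The optimal rotation follows from the adjoint of (K - lam*I); omitted here.
    return np.sqrt(max(GA + GB - 2.0 * lam, 0.0) / N)

# Example: B is a randomly rotated copy of A, so the minimal RMSD should be ~0.
rng = np.random.default_rng(0)
A = rng.random((50, 3)); A -= A.mean(axis=0)
w, x, y, z = (q := rng.normal(size=4) / np.linalg.norm(rng_q := rng.normal(size=4)))[:0] or (rng_q / np.linalg.norm(rng_q))
R = np.array([[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
              [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
              [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])
B = A @ R.T
print(min_rmsd(A, B))  # ~0
```

Note: the quaternion in the example is just a random unit quaternion; any rotation would do.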
Subject(s)
Computational Biology/methods, High-Throughput Screening Assays/methods, Proteins/chemistry, Rotation, Algorithms, Crystallography, Protein Conformation, Time Factors
ABSTRACT
Protein loops, the flexible short segments connecting two stable secondary structural units in proteins, play a critical role in protein structure and function. Constructing chemically sensible conformations of protein loops that seamlessly bridge the gap between the anchor points without introducing any steric collisions remains an open challenge. A variety of algorithms have been developed to tackle the loop closure problem, ranging from inverse kinematics to knowledge-based approaches that utilize pre-existing fragments extracted from known protein structures. However, many of these approaches focus on the generation of conformations that mainly satisfy the fixed end point condition, leaving the steric constraints to be resolved in subsequent post-processing steps. In the present work, we describe a simple solution that simultaneously satisfies not only the end point and steric conditions, but also chirality and planarity constraints. Starting from random initial atomic coordinates, each individual conformation is generated independently by using a simple alternating scheme of pairwise distance adjustments of randomly chosen atoms, followed by fast geometric matching of the conformationally rigid components of the constituent amino acids. The method is conceptually simple, numerically stable and computationally efficient. Very importantly, additional constraints, such as those derived from NMR experiments, hydrogen bonds or salt bridges, can be incorporated into the algorithm in a straightforward and inexpensive way, making the method ideal for solving more complex multi-loop problems. The remarkable performance and robustness of the algorithm are demonstrated on a set of protein loops of length 4, 8, and 12 that have been used in previous studies.
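The core "pairwise distance adjustment" move can be sketched in a few lines; the bounds and parameters below are purely illustrative, and the rigid-fragment matching stage is not shown.

```python
# Illustrative single pairwise distance adjustment step: randomly chosen atom
# pairs are nudged only when their current distance violates a lower/upper bound
# (bounds here are toy values, not real geometric constraints).
import numpy as np

def adjust_pair(coords, i, j, lower, upper, lam=0.5, eps=1e-9):
    delta = coords[i] - coords[j]
    d = np.linalg.norm(delta) + eps
    target = None
    if d < lower:
        target = lower          # atoms too close: push apart
    elif d > upper:
        target = upper          # atoms too far: pull together
    if target is not None:
        step = lam * 0.5 * (target - d) / d * delta
        coords[i] += step
        coords[j] -= step

# One sweep over random pairs of a 12-atom segment with toy bounds.
rng = np.random.default_rng(0)
coords = rng.random((12, 3)) * 10.0
for _ in range(1000):
    i, j = rng.choice(12, size=2, replace=False)
    adjust_pair(coords, i, j, lower=1.5, upper=8.0)
```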
Subject(s)
Algorithms, Computational Biology/methods, Models, Chemical, Proteins/chemistry, Crystallography, X-Ray, Databases, Protein, Models, Molecular, Protein Conformation
ABSTRACT
Covance Drug Development produces more than 55 million test results via its central laboratory services, requiring the delivery of more than 10 million reports annually to investigators at 35,000 sites in 89 countries. Historically, most of these data were delivered via fax or electronic data transfers in delimited text or SAS transport file format. Here, we present a new web portal that allows secure online delivery of laboratory results, reports, manuals, and training materials, and enables collaboration with investigational sites through alerts, announcements, and communications. By leveraging a three-tier architecture composed of preexisting data warehouses augmented with an application-specific relational database to store configuration data and materialized views for performance optimizations, a RESTful web application programming interface (API), and a browser-based single-page application for user access, the system offers greatly improved capabilities and user experience without requiring any changes to the underlying acquisition systems and data stores. Following a 3-month controlled rollout with 6,500 users at early-adopter sites, the Xcellerate Investigator Portal was deployed to all 240,000 existing users of Covance's Central Laboratory Services, gaining widespread acceptance and pointing to significant benefits in productivity, convenience, and user experience.
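As a purely illustrative sketch of the middle tier, the snippet below exposes lab results for a site through a small REST endpoint. The framework choice (Flask), route, and field names are assumptions, since the abstract does not describe the API at this level of detail.

```python
# Minimal, self-contained sketch of a REST endpoint in the spirit of the portal's
# middle tier; all routes, names, and data are hypothetical.
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Stand-in for the reporting warehouse / materialized view.
LAB_RESULTS = {
    "site-001": [
        {"subject_id": "1001", "test_code": "ALT", "value": 32, "unit": "U/L"},
        {"subject_id": "1002", "test_code": "ALT", "value": 55, "unit": "U/L"},
    ],
}

@app.route("/api/sites/<site_id>/results")
def site_results(site_id):
    if site_id not in LAB_RESULTS:
        abort(404)
    return jsonify(LAB_RESULTS[site_id])

if __name__ == "__main__":
    app.run(debug=True)
```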
Subject(s)
Communication, Internet, Laboratories, Software, Humans, User-Computer Interface
ABSTRACT
Stochastic proximity embedding (SPE) and self-organizing superimposition (SOS) are two recently introduced methods for conformational sampling that have shown great promise in several application domains. Our previous validation studies aimed to explore the limits of these methods and involved rather exhaustive conformational searches that produced large numbers of conformations. From a practical point of view, however, such searches have become the exception rather than the norm. The increasing popularity of virtual screening has created a need for 3D conformational search methods that produce meaningful answers in a relatively short period of time and work effectively on a large scale. In this work, we examine the performance of these algorithms and the effects of different parameter settings at varying levels of sampling. Our goal is to identify search protocols that can produce a diverse set of chemically sensible conformations and have a reasonable probability of sampling biologically active space within a small number of trials. Our results suggest that both SPE and SOS are extremely competitive in this regard and produce very satisfactory results with as few as 500 conformations per molecule. The results improve even further when the raw conformations are minimized with a molecular mechanics force field to remove minor imperfections and any residual strain. These findings provide additional evidence that these methods are suitable for many everyday modeling tasks, both high- and low-throughput.
Subject(s)
Drug Evaluation, Preclinical/methods, Molecular Conformation, Algorithms, Crystallography, X-Ray, Drug Evaluation, Preclinical/standards, Ligands, Models, Molecular, Proteins/chemistry, Proteins/metabolism, Reference Standards, Reproducibility of Results, Stochastic Processes, Thermodynamics
ABSTRACT
As computational drug design becomes increasingly reliant on virtual screening and on high-throughput 3D modeling, the need for fast, robust, and reliable methods for sampling molecular conformations has become greater than ever. Furthermore, chemical novelty is at a premium, forcing medicinal chemists to explore more complex structural motifs and unusual topologies. This necessitates the use of conformational sampling techniques that work well in all cases. Here, we compare the performance of several popular conformational search algorithms on three broad classes of macrocyclic molecules. These methods include Catalyst, CAESAR, MacroModel, MOE, Omega, Rubicon and two newer self-organizing algorithms known as stochastic proximity embedding (SPE) and self-organizing superimposition (SOS) that have been developed at Johnson & Johnson. Our results show a compelling advantage for the three distance geometry methods (SOS, SPE, and Rubicon) followed to a lesser extent by MacroModel. The remaining techniques, particularly those based on systematic search, often failed to identify any of the lowest energy conformations and are unsuitable for this class of structures. Taken together with our previous study on drug-like molecules (Agrafiotis, D. K.; Gibbs, A.; Zhu, F.; Izrailev, S.; Martin, E. Conformational Sampling of Bioactive Molecules: A Comparative Study. J. Chem. Inf. Model., 2007, 47, 1067-1086), these results suggest that SPE and SOS are two of the most robust and universally applicable conformational search methods, with the latter being preferred because of its superior speed.
Subject(s)
Drug Discovery/methods, Macrocyclic Compounds/chemistry, Molecular Conformation, Algorithms, Software, Stochastic Processes, Thermodynamics
ABSTRACT
We recently introduced SAR maps, a new interactive method for visualizing structure-activity relationships, aimed specifically at medicinal chemists. A SAR map renders an R-group decomposition of a congeneric series as a rectangular matrix of cells, each representing a unique combination of R-groups color-coded by a user-selected property of the corresponding compound. In this paper, we describe an enhanced version that greatly expands the types of visualizations that can be displayed inside the cells. Examples include multidimensional histograms and pie charts that visualize the biological profiles of compounds across an entire panel of assays, forms that display specific fields on user-defined layouts, aligned 3D structure drawings that show the relative orientation of different substituents, dose-response curves, images of crystals or diffraction patterns, and many others. These enhancements, which capitalize on the modular architecture of the host application, Third Dimension Explorer (3DX), allow the medicinal chemist to interactively analyze complex scaffolds with multiple substitution sites, correlate substituent structure and biological activity across multiple dimensions simultaneously, identify missing analogs or screening data, and produce information-dense visualizations for presentations and publications. The new tool has an intuitive user interface that makes it appealing to experts and nonexperts alike.
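Before any of the richer in-cell visualizations, the basic cell layout of a SAR map amounts to pivoting an R-group decomposition. A minimal pandas sketch with hypothetical data:

```python
# Minimal sketch of the underlying SAR-map layout: rows and columns are R-groups,
# cells hold the property used for color-coding (data are hypothetical; the real
# tool renders far richer content in each cell).
import pandas as pd

decomposition = pd.DataFrame({
    "R1":    ["H", "H",  "Me", "Me", "Cl"],
    "R2":    ["F", "Cl", "F",  "Cl", "F"],
    "pIC50": [6.1, 6.4,  7.0,  7.3,  5.8],
})

sar_map = decomposition.pivot_table(index="R1", columns="R2", values="pIC50", aggfunc="mean")
print(sar_map)
```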
Subject(s)
Chemistry, Pharmaceutical/methods, Computer Graphics, Dose-Response Relationship, Drug, Molecular Conformation, Structure-Activity Relationship, User-Computer Interface
ABSTRACT
Clinical trial data are typically collected through multiple systems developed by different vendors using different technologies and data standards. These data need to be integrated, standardized, and transformed for a variety of monitoring and reporting purposes. The need to process large volumes of often inconsistent data in the presence of ever-changing requirements poses a significant technical challenge. As part of a comprehensive clinical data repository, we have developed a data warehouse that integrates patient data from any source, standardizes it and makes it accessible to study teams in a timely manner to support a wide range of analytic tasks for both in-flight and completed studies. Our solution combines Apache HBase, a NoSQL column store; Apache Phoenix, a massively parallel relational query engine; and a user-friendly interface to facilitate efficient loading of large volumes of data under incomplete or ambiguous specifications, utilizing an extract-load-transform design pattern that defers data mapping until query time. This approach allows us to maintain a single copy of the data and transform it dynamically into any desirable format without requiring additional storage. Changes to the mapping specifications can be easily introduced, and multiple representations of the data can be made available concurrently. Further, by versioning the data and the transformations separately, we can apply historical maps to current data or current maps to historical data, which simplifies the maintenance of data cuts and facilitates interim analyses for adaptive trials. The result is a highly scalable, secure and redundant solution that combines the flexibility of a NoSQL store with the robustness of a relational query engine to support a broad range of applications, including clinical data management, medical review, risk-based monitoring, safety signal detection, post hoc analysis of completed studies and many others.
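The extract-load-transform idea of keeping a single raw copy and applying versioned mappings only at query time can be illustrated without HBase or Phoenix; the sketch below uses plain Python dictionaries, and the field names and mapping rules are hypothetical.

```python
# Illustrative extract-load-transform pattern: raw records are stored once and
# mapping specifications (versioned separately) are applied only at query time.
raw_records = [
    {"STUDY": "ABC-001", "SUBJ": "1001", "VSTESTCD": "SYSBP", "VSORRES": "128", "VSORRESU": "mmHg"},
    {"STUDY": "ABC-001", "SUBJ": "1002", "VSTESTCD": "SYSBP", "VSORRES": "141", "VSORRESU": "mmHg"},
]

mapping_v1 = {"STUDY": "study_id", "SUBJ": "subject_id", "VSORRES": "value"}
mapping_v2 = {**mapping_v1, "VSTESTCD": "test_code", "VSORRESU": "unit"}

def transform(records, mapping):
    """Apply a column mapping at query time; the raw copy is never modified."""
    return [{new: rec[old] for old, new in mapping.items() if old in rec} for rec in records]

# The same raw data can be served under either mapping version concurrently.
print(transform(raw_records, mapping_v1))
print(transform(raw_records, mapping_v2))
```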
Subject(s)
Clinical Trials as Topic, Data Warehousing, Database Management Systems, Humans, Machine Learning, User-Computer Interface
ABSTRACT
Timely, consistent and integrated access to clinical trial data remains one of the pharmaceutical industry's most pressing needs. As part of a comprehensive clinical data repository, we have developed a data warehouse that can integrate operational data from any source, conform it to a canonical data model and make it accessible to study teams in a timely, secure and contextualized manner to support operational oversight, proactive risk management and other analytic and reporting needs. Our solution consists of a dimensional relational data warehouse; a set of extraction, transformation and loading processes to coordinate data ingestion and mapping; a generalizable metrics engine to enable the computation of operational metrics and key performance, quality and risk indicators; and a set of graphical user interfaces to facilitate configuration, management and administration. When combined with the appropriate data visualization tools, the warehouse enables convenient access to raw operational data and derived metrics to help track study conduct and performance, identify and mitigate risks, monitor and improve operational processes, manage resource allocation, and strengthen investigator and sponsor relationships, among other purposes.
Subject(s)
Clinical Trials as Topic, Data Warehousing, Database Management Systems, Humans, Research Report
ABSTRACT
OBJECTIVE: We present a new system to track, manage, and report on all risks and issues encountered during a clinical trial. MATERIALS AND METHODS: Our solution utilizes JIRA, a popular issue and project tracking tool for software development, augmented by third-party and custom-built plugins to provide the additional functionality missing from the core product. RESULTS: The new system integrates all issue types under a single tracking tool and offers a range of capabilities, including configurable issue management workflows, seamless integration with other clinical systems, extensive history, reporting, and trending, and an intuitive web interface. DISCUSSION AND CONCLUSION: By preserving the linkage between risks, issues, actions, decisions, and outcomes, the system allows study teams to assess the impact and effectiveness of their risk management strategies and present a coherent account of how the trial was conducted. Since the tool was put in production, we have observed an increase in the number of reported issues and a decrease in the median issue resolution time which, along with the positive user feedback, point to marked improvements in quality, transparency, productivity, and teamwork.