Results 1 - 17 of 17
1.
Front Bioinform ; 2: 958378, 2022.
Article in English | MEDLINE | ID: mdl-36304325

ABSTRACT

The concept of the druggable genome has been with us for 20 years. During this time, researchers have developed several methods and resources to help assess a target's druggability. In parallel, evidence for target-disease associations has been collated at scale by Open Targets. More recently, the Protein Data Bank in Europe (PDBe) has built a knowledge base matching per-residue annotations with available protein structures. While each resource is useful in isolation, we believe there is enormous potential in bringing all relevant data into a single knowledge graph, from gene level to protein residue. Automation is vital for the processing and assessment of all available structures. We have developed scalable, automated workflows that provide hotspot-based druggability assessments for all available structures across large numbers of targets. Ultimately, we will run our method at a proteome scale, an ambition made more realistic by the arrival of AlphaFold 2. Bringing together annotations from the residue up to the gene level and building connections within the graph to represent pathways or protein-protein interactions will create complexity that mirrors the biological systems they represent. Such complexity is difficult for the human mind to utilise effectively, particularly at scale. We believe that graph-based AI methods will be able to expertly navigate such a knowledge graph, selecting the targets of the future.

2.
J Chem Inf Model ; 62(18): 4391-4402, 2022 09 26.
Article in English | MEDLINE | ID: mdl-35867814

ABSTRACT

Selecting the most appropriate compounds to synthesize and test is a vital aspect of drug discovery. Methods like clustering and diversity present weaknesses in selecting the optimal sets for information gain. Active learning techniques often rely on an initial model and computationally expensive semi-supervised batch selection. Herein, we describe a new subset-based selection method, Coverage Score, that combines Bayesian statistics and information entropy to balance representation and diversity to select a maximally informative subset. Coverage Score can be influenced by prior selections and desirable properties. In this paper, subsets selected through Coverage Score are compared against subsets selected through model-independent and model-dependent techniques for several datasets. In drug-like chemical space, Coverage Score consistently selects subsets that lead to more accurate predictions compared to other selection methods. Subsets selected through Coverage Score produced Random Forest models that have a root-mean-square-error up to 12.8% lower than subsets selected at random and can retain up to 99% of the structural dissimilarity of a diversity selection.


Subject(s)
Algorithms; Drug Discovery; Bayes Theorem; Cluster Analysis; Drug Discovery/methods; Entropy
3.
J Chem Inf Model ; 62(2): 284-294, 2022 01 24.
Article in English | MEDLINE | ID: mdl-35020376

ABSTRACT

Selectivity is a crucial property in small molecule development. Binding site comparisons within a protein family are a key piece of information when aiming to modulate the selectivity profile of a compound. Binding site differences can be exploited to confer selectivity for a specific target, while shared areas can provide insights into polypharmacology. As the quantity of structural data grows, automated methods are needed to process, summarize, and present these data to users. We present a computational method that provides quantitative and data-driven summaries of the available binding site information from an ensemble of structures of the same protein. The resulting ensemble maps identify the key interactions important for ligand binding in the ensemble. The comparison of ensemble maps of related proteins enables the identification of selectivity-determining regions within a protein family. We applied the method to three examples from the well-researched human bromodomain and kinase families, demonstrating that the method is able to identify selectivity-determining regions that have been used to introduce selectivity in past drug discovery campaigns. We then illustrate how the resulting maps can be used to automate comparisons across a target protein family.


Subject(s)
Polypharmacology; Proteins; Binding Sites; Drug Discovery/methods; Humans; Protein Domains; Proteins/chemistry
4.
Chem Sci ; 12(43): 14577-14589, 2021 Nov 10.
Article in English | MEDLINE | ID: mdl-34881010

ABSTRACT

Generative models have increasingly been proposed as a solution to the molecular design problem. However, it has proved challenging to control the design process or incorporate prior knowledge, limiting their practical use in drug discovery. In particular, generative methods have made limited use of three-dimensional (3D) structural information even though this is critical to binding. This work describes a method to incorporate such information and demonstrates the benefit of doing so. We combine an existing graph-based deep generative model, DeLinker, with a convolutional neural network to utilise physically-meaningful 3D representations of molecules and target pharmacophores. We apply our model, DEVELOP, to both linker and R-group design, demonstrating its suitability for both hit-to-lead and lead optimisation. The 3D pharmacophoric information results in improved generation and allows greater control of the design process. In multiple large-scale evaluations, we show that including 3D pharmacophoric constraints results in substantial improvements in the quality of generated molecules. On a challenging test set derived from PDBbind, our model improves the proportion of generated molecules with high 3D similarity to the original molecule by over 300%. In addition, DEVELOP recovers 10× more of the original molecules compared to the baseline DeLinker method. Our approach is general-purpose, readily modifiable to alternate 3D representations, and can be incorporated into other generative frameworks. Code is available at https://github.com/oxpig/DEVELOP.

5.
Bioinformatics ; 37(15): 2134-2141, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33532838

ABSTRACT

MOTIVATION: An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased, meaning that methods often exploit these biases to separate actives and decoys, and do not necessarily learn to perform molecular recognition. This fundamental issue prevents generalization and hinders virtual screening method development. RESULTS: We have developed a deep learning method (DeepCoy) that generates decoys to a user's preferred specification in order to remove such biases or construct sets with a defined bias. We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all 102 DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules' physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.166 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with Autodock Vina, with virtual screening performance falling from an AUC ROC of 0.70 to 0.63. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/oxpig/DeepCoy. Generated molecules can be downloaded from http://opig.stats.ox.ac.uk/resources. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
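
The AUC ROC values quoted in the abstract above (0.70 falling to 0.63) measure how often a scoring method ranks an active above a decoy. The following is a minimal sketch of that calculation using the rank-based (Mann-Whitney) formulation. The function name and the toy scores are illustrative assumptions and are not taken from the DeepCoy code.

```python
def auc_roc(active_scores, decoy_scores):
    """Probability that a random active outscores a random decoy (ties count 0.5)."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Toy scores where higher means "more likely active" (made-up values).
actives = [0.9, 0.8, 0.4]
decoys = [0.7, 0.5, 0.3, 0.2]
print(auc_roc(actives, decoys))  # 10 of the 12 active/decoy pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a drop towards 0.5 indicates that the decoys have become harder to tell apart from the actives.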

6.
J Chem Inf Model ; 60(4): 1911-1916, 2020 04 27.
Article in English | MEDLINE | ID: mdl-32207937

ABSTRACT

Methods that survey protein surfaces for binding hotspots can help to evaluate target tractability and guide exploration of potential ligand binding regions. Fragment Hotspot Maps builds upon interaction data mined from the CSD (Cambridge Structural Database) and exploits the idea of identifying hotspots using small chemical fragments, which is now widely used to design new drug leads. Prior to this publication, Fragment Hotspot Maps was only publicly available through a web application. To increase the accessibility of this algorithm we present the Hotspots API (application programming interface), a toolkit that offers programmatic access to the core Fragment Hotspot Maps algorithm, thereby facilitating the interpretation and application of the analysis. To demonstrate the package's utility, we present a workflow which automatically derives protein hydrogen-bond constraints for molecular docking with GOLD. The Hotspots API is available from https://github.com/prcurran/hotspots under the MIT license and is dependent upon the commercial CSD Python API.


Subject(s)
Drug Design; Software; Databases, Factual; Molecular Docking Simulation; Proteins
7.
J Chem Inf Model ; 60(4): 1983-1995, 2020 04 27.
Article in English | MEDLINE | ID: mdl-32195587

ABSTRACT

Rational compound design remains a challenging problem for both computational methods and medicinal chemists. Computational generative methods have begun to show promising results for the design problem. However, they have not yet used the power of three-dimensional (3D) structural information. We have developed a novel graph-based deep generative model that combines state-of-the-art machine learning techniques with structural knowledge. Our method ("DeLinker") takes two fragments or partial structures and designs a molecule incorporating both. The generation process is protein-context-dependent, utilizing the relative distance and orientation between the partial structures. This 3D information is vital to successful compound design, and we demonstrate its impact on the generation process and the limitations of omitting such information. In a large-scale evaluation, DeLinker designed 60% more molecules with high 3D similarity to the original molecule than a database baseline. When considering the more relevant problem of longer linkers with at least five atoms, the outperformance increased to 200%. We demonstrate the effectiveness and applicability of this approach on a diverse range of design problems: fragment linking, scaffold hopping, and proteolysis targeting chimera (PROTAC) design. As far as we are aware, this is the first molecular generative model to incorporate 3D structural information directly in the design process. The code is available at https://github.com/oxpig/DeLinker.


Subject(s)
Algorithms; Machine Learning; Models, Molecular; Proteins
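
The DeLinker abstract above refers to the relative distance and orientation between two partial structures. The sketch below shows one simple way such a geometric descriptor could be computed from 3D coordinates: the distance between the two attachment atoms and the angle between the two exit vectors. The function name, the argument layout, and the choice of descriptor are assumptions for illustration only; the abstract does not specify how DeLinker encodes this information.

```python
import math


def exit_vector_geometry(anchor_a, exit_a, anchor_b, exit_b):
    """Return (distance between exit atoms, angle in radians between exit vectors).

    Each exit vector points from an atom inside the fragment (anchor)
    to that fragment's attachment atom (exit). Hypothetical helper.
    """
    vec_a = [e - a for a, e in zip(anchor_a, exit_a)]
    vec_b = [e - a for a, e in zip(anchor_b, exit_b)]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(x * x for x in vec_b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.dist(exit_a, exit_b), math.acos(cos_angle)


# Two toy fragments facing each other along the x axis (made-up coordinates).
distance, angle = exit_vector_geometry(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
    (5.0, 0.0, 0.0), (4.0, 0.0, 0.0),
)
print(distance, math.degrees(angle))
```

In this toy case the exit atoms are 3 units apart and the exit vectors point directly at each other, which is the kind of arrangement a short, straight linker could bridge.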
8.
J Chem Inf Model ; 58(11): 2319-2330, 2018 11 26.
Article in English | MEDLINE | ID: mdl-30273487

ABSTRACT

Machine learning has shown enormous potential for computer-aided drug discovery. Here we show how modern convolutional neural networks (CNNs) can be applied to structure-based virtual screening. We have coupled our densely connected CNN (DenseNet) with a transfer learning approach which we use to produce an ensemble of protein family-specific models. We conduct an in-depth empirical study and provide the first guidelines on the minimum requirements for adopting a protein family-specific model. Our method also highlights the need for additional data, even in data-rich protein families. Our approach outperforms recent benchmarks on the DUD-E data set and an independent test set constructed from the ChEMBL database. Using a clustered cross-validation on DUD-E, we achieve an average AUC ROC of 0.92 and a 0.5% ROC enrichment factor of 79. This represents an improvement in early enrichment of over 75% compared to a recent machine learning benchmark. Our results demonstrate that the continued improvements in machine learning architecture for computer vision apply to structure-based virtual screening.


Subject(s)
Drug Discovery/methods; Machine Learning; Neural Networks, Computer; Proteins/metabolism; Computer-Aided Design; Databases, Pharmaceutical; Databases, Protein; Humans; Ligands; Molecular Docking Simulation; Proteins/chemistry
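
The "0.5% ROC enrichment factor" reported in the abstract above can be read as the true positive rate divided by the false positive rate at a false positive rate of 0.5%. The sketch below assumes that definition; the function name and toy data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def roc_enrichment(active_scores, decoy_scores, fpr=0.005):
    """True positive rate divided by false positive rate at a fixed FPR."""
    # Number of decoys allowed to score above the cutoff at this FPR.
    n_allowed = int(round(fpr * len(decoy_scores)))
    ranked_decoys = sorted(decoy_scores, reverse=True)
    # The cutoff is the best-scoring decoy that is NOT allowed through.
    cutoff = ranked_decoys[n_allowed]
    tpr = sum(1 for s in active_scores if s > cutoff) / len(active_scores)
    return tpr / fpr


# Toy data (made up): 1000 evenly spread decoys, 10 actives of which 4 score very high.
decoys = [i / 1000 for i in range(1000)]
actives = [0.9995, 0.9985, 0.9975, 0.9965] + [0.5] * 6
print(roc_enrichment(actives, decoys))  # close to 80
```

Under this definition the ceiling at a 0.5% false positive rate is 1 / 0.005 = 200, which puts the reported value of 79 in context.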
9.
Bioinformatics ; 34(21): 3755-3758, 2018 11 01.
Article in English | MEDLINE | ID: mdl-29850778

ABSTRACT

Motivation: The interactive visualization of very large macromolecular complexes on the web is becoming a challenging problem as experimental techniques advance at an unprecedented rate and deliver structures of increasing size. Results: We have tackled this problem by developing highly memory-efficient and scalable extensions for the NGL WebGL-based molecular viewer and by using the Macromolecular Transmission Format (MMTF), a binary and compressed data representation. These enable NGL to download and render molecular complexes with millions of atoms interactively on desktop computers and smartphones alike, making it a tool of choice for web-based molecular visualization in research and education. Availability and implementation: The source code is freely available under the MIT license at github.com/arose/ngl and distributed on NPM (npmjs.com/package/ngl). MMTF-JavaScript encoders and decoders are available at github.com/rcsb/mmtf-javascript.


Subject(s)
Computer Graphics; Internet; Macromolecular Substances; Software
10.
Essays Biochem ; 61(5): 495-503, 2017 11 08.
Article in English | MEDLINE | ID: mdl-29118096

ABSTRACT

The ongoing explosion in genomics data has long since outpaced the capacity of conventional biochemical methodology to verify the large number of hypotheses that emerge from the analysis of such data. In contrast, it is still a gold-standard for early phenotypic validation towards small-molecule drug discovery to use probe molecules (or tool compounds), notwithstanding the difficulty and cost of generating them. Rational structure-based approaches to ligand discovery have long promised the efficiencies needed to close this divergence; in practice, however, this promise remains largely unfulfilled, for a host of well-rehearsed reasons and despite the huge technical advances spearheaded by the structural genomics initiatives of the noughties. Therefore the current, fourth funding phase of the Structural Genomics Consortium (SGC), building on its extensive experience in structural biology of novel targets and design of protein inhibitors, seeks to redefine what it means to do structural biology for drug discovery. We developed the concept of a Target Enabling Package (TEP) that provides, through reagents, assays and data, the missing link between genetic disease linkage and the development of usefully potent compounds. There are multiple prongs to the ambition: rigorously assessing targets' genetic disease linkages through crowdsourcing to a network of collaborating experts; establishing a systematic approach to generate the protocols and data that comprise each target's TEP; developing new, X-ray-based fragment technologies for generating high quality chemical matter quickly and cheaply; and exploiting a stringently open access model to build multidisciplinary partnerships throughout academia and industry. By learning how to scale these approaches, the SGC aims to make structures finally serve genomics, as originally intended, and demonstrate how 3D structures systematically allow new modes of druggability to be discovered for whole classes of targets.


Subject(s)
Drug Design; Drug Discovery/methods; Drugs, Investigational/chemistry; Proteins/chemistry; Small Molecule Libraries/chemistry; Binding Sites; Combinatorial Chemistry Techniques; Crystallography, X-Ray; Drugs, Investigational/chemical synthesis; Genomics/methods; Humans; Ligands; Molecular Docking Simulation; Protein Binding; Proteins/agonists; Proteins/antagonists & inhibitors; Proteins/metabolism; Small Molecule Libraries/chemical synthesis; Structure-Activity Relationship
11.
PLoS Comput Biol ; 13(6): e1005575, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28574982

ABSTRACT

Recent advances in experimental techniques have led to a rapid growth in complexity, size, and number of macromolecular structures that are made available through the Protein Data Bank. This creates a challenge for macromolecular visualization and analysis. Macromolecular structure files, such as PDB or PDBx/mmCIF files, can be slow to transfer and parse, and hard to incorporate into third-party software tools. Here, we present a new binary and compressed data representation, the MacroMolecular Transmission Format, MMTF, as well as software implementations in several languages that have been developed around it, which address these issues. We describe the new format and its APIs and demonstrate that it is several times faster to parse, and about a quarter of the file size of the current standard format, PDBx/mmCIF. As a consequence of the new data representation, it is now possible to visualize structures with millions of atoms in a web browser, keep the whole PDB archive in memory or parse it within a few minutes on average computers, which opens up a new way of thinking about how to design and implement efficient algorithms in structural bioinformatics. The PDB archive is available in MMTF file format through web services and data that are updated on a weekly basis.


Subject(s)
Computational Biology/methods; Databases, Chemical; Macromolecular Substances; Software; Internet; Macromolecular Substances/analysis; Macromolecular Substances/chemistry; Macromolecular Substances/classification; Molecular Structure
12.
Nat Commun ; 8: 15123, 2017 04 24.
Article in English | MEDLINE | ID: mdl-28436492

ABSTRACT

In macromolecular crystallography, the rigorous detection of changed states (for example, ligand binding) is difficult unless signal is strong. Ambiguous ('weak' or 'noisy') density is experimentally common, since molecular states are generally only fractionally present in the crystal. Existing methodologies focus on generating maximally accurate maps whereby minor states become discernible; in practice, such map interpretation is disappointingly subjective, time-consuming and methodologically unsound. Here we report the PanDDA method, which automatically reveals clear electron density for the changed state, even from inaccurate maps, by subtracting a proportion of the confounding 'ground state'; changed states are objectively identified from statistical analysis of density distributions. The method is completely general, implying new best practice for all changed-state studies, including the routine collection of multiple ground-state crystals. More generally, these results demonstrate: the incompleteness of atomic models; that single data sets contain insufficient information to model them fully; and that accuracy requires further map-deconvolution approaches.
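
The idea of subtracting a proportion of the ground state, as described in the abstract above, can be illustrated on a toy one-dimensional "map". The sketch assumes that the observed density is a simple mixture of a ground-state map and a changed-state map, and that the ground-state proportion is already known. Both assumptions, and all names, are for illustration only; the abstract does not describe how the proportion is determined in practice.

```python
def subtract_ground_state(observed, ground, ground_fraction):
    """Remove a proportion of the ground-state map and rescale the remainder."""
    changed_fraction = 1.0 - ground_fraction
    return [
        (obs - ground_fraction * gnd) / changed_fraction
        for obs, gnd in zip(observed, ground)
    ]


# Toy density values on four grid points (made up).
ground = [1.0, 1.0, 0.0, 0.0]
changed = [1.0, 0.0, 1.0, 0.0]

# Only 20% of the crystal is in the changed state, so its signal is weak.
observed = [0.8 * g + 0.2 * c for g, c in zip(ground, changed)]

recovered = subtract_ground_state(observed, ground, ground_fraction=0.8)
print(observed)
print(recovered)
```

In the observed map the changed-state feature at the third grid point has a height of only 0.2; after subtraction and rescaling it is restored to roughly full height, while the ground-state-only feature at the second grid point is removed.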

13.
PLoS One ; 12(3): e0174846, 2017.
Article in English | MEDLINE | ID: mdl-28362865

ABSTRACT

The size and complexity of 3D macromolecular structures available in the Protein Data Bank is constantly growing. Current tools and file formats have reached limits of scalability. New compression approaches are required to support the visualization of large molecular complexes and enable new and scalable means for data analysis. We evaluated a series of compression techniques for coordinates of 3D macromolecular structures and identified the best performing approaches. By balancing compression efficiency in terms of the decompression speed and compression ratio, and code complexity, our results provide the foundation for a novel standard to represent macromolecular coordinates in a compact and useful file format.


Subject(s)
Databases, Protein; Algorithms; Data Compression; Magnetic Resonance Spectroscopy; Models, Theoretical; Molecular Structure; Protein Structure, Secondary; Protein Structure, Tertiary
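
To make the notion of coordinate compression in the abstract above more concrete, the sketch below shows generic delta encoding on fixed-precision integers. It is an example of the kind of technique such an evaluation might compare, not the authors' implementation. The function names and the precision of three decimal places (the precision of coordinates in PDB files) are assumptions for illustration.

```python
def encode_coordinates(values, precision=1000):
    """Quantize floats to integers, then store differences between neighbours."""
    if not values:
        return []
    ints = [round(v * precision) for v in values]
    # Keep the first value as-is; every later entry is a small difference.
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]


def decode_coordinates(deltas, precision=1000):
    """Invert encode_coordinates by accumulating the differences."""
    total = 0
    decoded = []
    for delta in deltas:
        total += delta
        decoded.append(total / precision)
    return decoded


# Toy x coordinates of consecutive atoms, in angstroms (made up).
xs = [12.345, 12.901, 13.478, 13.002]
encoded = encode_coordinates(xs)
print(encoded)                      # [12345, 556, 577, -476]
print(decode_coordinates(encoded))  # recovers the original values
```

Because bonded atoms sit close together, the differences are much smaller than the absolute coordinates, so they fit in fewer bytes and compress well in a later packing or entropy-coding step.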
14.
Acta Crystallogr D Struct Biol ; 73(Pt 3): 279-285, 2017 03 01.
Article in English | MEDLINE | ID: mdl-28291763

ABSTRACT

In this work, two freely available web-based interactive computational tools that facilitate the analysis and interpretation of protein-ligand interaction data are described. Firstly, WONKA, which assists in uncovering interesting and unusual features (for example residue motions) within ensembles of protein-ligand structures and enables the facile sharing of observations between scientists. Secondly, OOMMPPAA, which incorporates protein-ligand activity data with protein-ligand structural data using three-dimensional matched molecular pairs. OOMMPPAA highlights nuanced structure-activity relationships (SAR) and summarizes available protein-ligand activity data in the protein context. In this paper, the background that led to the development of both tools is described. Their implementation is outlined and their utility using in-house Structural Genomics Consortium (SGC) data sets and openly available data from the PDB and ChEMBL is described. Both tools are freely available to use and download at http://wonka.sgc.ox.ac.uk/WONKA/ and http://oommppaa.sgc.ox.ac.uk/OOMMPPAA/.


Subject(s)
Computer-Aided Design; Drug Design; Proteins/metabolism; Software; Binding Sites; Databases, Protein; Humans; Ligands; Molecular Docking Simulation; Protein Binding; Proteins/chemistry; Structure-Activity Relationship
15.
Struct Dyn ; 4(3): 032104, 2017 May.
Article in English | MEDLINE | ID: mdl-28345007

ABSTRACT

Crystallographic fragment screening uses low molecular weight compounds to probe the protein surface, and although individual protein-fragment interactions are high quality, fragments commonly bind at low occupancy, historically making identification difficult. However, our new Pan-Dataset Density Analysis method readily identifies binders missed by conventional analysis: for fragment screening data of lysine-specific demethylase 4D (KDM4D), the hit rate increased from 0.9% to 10.6%. Previously unidentified fragments reveal multiple binding sites and demonstrate: the versatility of crystallographic fragment screening; that surprisingly large conformational changes are possible in crystals; and that low crystallographic occupancy does not by itself reflect a protein-ligand complex's significance.

16.
Nucleic Acids Res ; 45(D1): D271-D281, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794042

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, http://rcsb.org), the US data center for the global PDB archive, makes PDB data freely available to all users, from structural biologists to computational biologists and beyond. New tools and resources have been added to the RCSB PDB web portal in support of a 'Structural View of Biology.' Recent developments have improved the User experience, including the high-speed NGL Viewer that provides 3D molecular visualization in any web browser, improved support for data file download and enhanced organization of website pages for query, reporting and individual structure exploration. Structure validation information is now visible for all archival entries. PDB data have been integrated with external biological resources, including chromosomal position within the human genome; protein modifications; and metabolic pathways. PDB-101 educational materials have been reorganized into a searchable website and expanded to include new features such as the Geis Digital Archive.


Subject(s)
Computational Biology/methods; Databases, Genetic; Proteins/chemistry; Proteins/genetics; Datasets as Topic; Metabolic Networks and Pathways; Models, Molecular; Protein Conformation; Proteins/metabolism; Software; Structure-Activity Relationship; User-Computer Interface; Web Browser
17.
J Chem Inf Model ; 54(10): 2636-2646, 2014 Oct 27.
Article in English | MEDLINE | ID: mdl-25244105

ABSTRACT

There is an ever-increasing resource in terms of both structural information and activity data for many protein targets. In this paper we describe OOMMPPAA, a novel computational tool designed to inform compound design by combining such data. OOMMPPAA uses 3D matched molecular pairs to generate 3D ligand conformations. It then identifies pharmacophoric transformations between pairs of compounds and associates them with their relevant activity changes. OOMMPPAA presents these data in an interactive application, providing the user with a visual summary of important interaction regions in the context of the binding site. We present validation of the tool using openly available data for CDK2 and a GlaxoSmithKline data set for a SAM-dependent methyl-transferase. We demonstrate OOMMPPAA's application in optimizing both potency and cell permeability and use OOMMPPAA to highlight nuanced and cross-series SAR. OOMMPPAA is freely available to download at http://oommppaa.sgc.ox.ac.uk/OOMMPPAA/.


Subject(s)
Cyclin-Dependent Kinase 2/antagonists & inhibitors; Enzyme Inhibitors/chemistry; Methyltransferases/antagonists & inhibitors; Small Molecule Libraries/chemistry; Software; Binding Sites; Cyclin-Dependent Kinase 2/chemistry; Drug Design; Enzyme Inhibitors/chemical synthesis; Humans; Ligands; Methyltransferases/chemistry; Molecular Docking Simulation; Protein Binding; Quantitative Structure-Activity Relationship; S-Adenosylmethionine/chemistry; Small Molecule Libraries/chemical synthesis