Results 1 - 9 of 9
1.
Bioinformatics ; 31(5): 776-8, 2015 Mar 01.
Article in English | MEDLINE | ID: mdl-25348214

ABSTRACT

PPDMs is a resource that maps small molecule bioactivities to protein domains from the Pfam-A collection of protein families. Small molecule bioactivities mapped to protein domains add important precision to approaches that use protein sequence searches and alignments to assist applications in computational drug discovery and in systems and chemical biology. We have previously proposed a heuristic for mapping a subset of the bioactivities stored in ChEMBL to the Pfam-A domain most likely to mediate small molecule binding. We have since refined this mapping using a manual procedure. Here, we present a resource that provides up-to-date mappings and the ability to review assigned mappings as well as to participate in their assignment and curation. We also describe how mappings provided through the PPDMs resource are made accessible through the main schema of the ChEMBL database. AVAILABILITY AND IMPLEMENTATION: The PPDMs resource and curation interface is available at https://www.ebi.ac.uk/chembl/research/ppdms/pfam_maps. The source code for PPDMs is available under the Apache license at https://github.com/chembl/pfam_maps. Source code demonstrating the integration process with the main schema of ChEMBL is available at https://github.com/chembl/pfam_map_loader.


Subject(s)
Databases, Chemical , Databases, Protein , Drug Discovery/methods , Proteins/chemistry , Small Molecule Libraries/pharmacology , Software , Humans , Protein Structure, Tertiary , Small Molecule Libraries/chemistry
2.
Nucleic Acids Res ; 42(Database issue): D1083-90, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24214965

ABSTRACT

ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 Nucleic Acids Research Database Issue. Since then, a variety of new data sources and improvements in functionality have contributed to the growth and utility of the resource. In particular, more comprehensive tracking of compounds from research stages through clinical development to market is provided through the inclusion of data from United States Adopted Name applications; a new, richer data model for representing drug targets has been developed; and a number of methods have been put in place to allow users to identify reliable data more easily. Finally, access to ChEMBL is now available via a new Resource Description Framework format, in addition to the web-based interface, data downloads and web services.


Subject(s)
Databases, Chemical , Drug Discovery , Binding Sites , Humans , Internet , Ligands , Pharmaceutical Preparations/chemistry , Proteins/chemistry , Proteins/drug effects
3.
PLoS Comput Biol ; 8(1): e1002333, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22253582

ABSTRACT

We report on the integration of pharmacological data and homology information for a large-scale analysis of small molecule binding to related targets. Differences in small molecule binding have been assessed for curated pairs of human-rat orthologs and for recently diverged human paralogs. Our analysis shows that, in general, small molecule binding is conserved between human and rat orthologs. Using statistical tests, we identified a small number of cases where small molecule binding differs between human and rat, some of which had previously been reported in the literature. Knowledge of species-specific pharmacology can be advantageous for drug discovery, where rats are frequently used as a model system. For human paralogs, we demonstrate a global correlation between sequence identity and the binding of small molecules with equivalent affinity. Our findings provide an initial general model relating small molecule binding to sequence divergence, laying the foundations for anticipating and predicting within-target-family selectivity.
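The abstract does not state how "equivalent affinity" was defined or how the correlation was computed. As a minimal sketch, assuming activities on a -log10 molar scale (e.g. pIC50) and treating two measurements as equivalent when they differ by at most one log unit (both assumptions, not taken from the paper), the fraction of shared compounds with equivalent affinity can be computed per protein pair and then correlated with sequence identity across pairs. The compound IDs and values below are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Optional


def fraction_equivalent(
    acts_a: Dict[str, float], acts_b: Dict[str, float], tolerance: float = 1.0
) -> Optional[float]:
    """Fraction of compounds tested on both proteins whose activities differ by
    at most `tolerance` log units (the 1.0 default is an assumption)."""
    shared = acts_a.keys() & acts_b.keys()
    if not shared:
        return None  # no compound was measured on both proteins
    same = sum(1 for c in shared if abs(acts_a[c] - acts_b[c]) <= tolerance)
    return same / len(shared)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation, e.g. sequence identity vs. fraction_equivalent per pair."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical pIC50 values for two paralogs; only c1 and c2 are shared.
paralog_a = {"c1": 7.0, "c2": 6.5, "c3": 8.2}
paralog_b = {"c1": 7.4, "c2": 5.0, "c4": 6.0}
print(fraction_equivalent(paralog_a, paralog_b))  # 0.5 (c1 within 1 log unit, c2 not)
```

In a real analysis each protein pair would contribute one (sequence identity, fraction) point, and `pearson` would be applied across all pairs.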


Subject(s)
Proteins/chemistry , Animals , Binding Sites , Databases, Factual , Drug Discovery , Humans , Ligands , Proteins/metabolism , Rats
4.
BMC Bioinformatics ; 13 Suppl 17: S11, 2012.
Article in English | MEDLINE | ID: mdl-23282026

ABSTRACT

BACKGROUND: Large-scale bioactivity/SAR Open Data has recently become available, allowing new analyses and approaches to be developed to help address the productivity and translational gaps of current drug discovery. Current limitations of these data include the relative sparsity of reported interactions per protein target and the complexity of establishing clear relationships between bioactivity and targets using bioinformatics tools. We detail in this paper the indexing of targets by the structural domains within a full-length protein that bind (or are likely to bind) the ligand. Specifically, we present a simple heuristic to map small molecule binding to Pfam domains. This profiling can be applied to all proteins within a genome to give some indication of the potential pharmacological modulation and regulation of all proteins. RESULTS: In this implementation of our heuristic, ligand binding to protein targets from the ChEMBL database was mapped to structural domains as defined by profiles contained within the Pfam-A database. Our mapping suggests that the majority of assay targets within the current version of the ChEMBL database bind ligands through a small number of highly prevalent domains, and conversely that the majority of Pfam domains sampled by our data play no currently established role in ligand binding. Validation studies, carried out first against UniProt entries with expert binding-site annotation and second against entries in the wwPDB repository of crystallographic protein structures, demonstrate that our simple heuristic maps ligand binding to the correct domain in about 90 percent of all assessed cases. Using the mappings obtained with our heuristic, we have assembled ligand sets associated with each Pfam domain. CONCLUSIONS: Small molecule binding has been mapped to Pfam-A domains of protein targets in the ChEMBL bioactivity database.
The result of this mapping is an enriched annotation of small molecule bioactivity data and a grouping of activity classes following the Pfam-A specifications of protein domains. This is valuable for data-focused approaches in drug discovery, for example when extrapolating potential targets of a small molecule with known activity against one or a few targets, or in the assessment of a potential target for drug discovery or screening studies.
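The abstract describes the heuristic only as "simple" and does not spell out its rules. The sketch below shows one plausible rule of this kind, stated here as an assumption rather than the authors' exact method: a domain that is the only Pfam domain of some target with bioactivity data is treated as a "validated" binder, and activity on a multi-domain target is assigned to a domain only when exactly one of its domains is validated. All target IDs and domain names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def validated_domains(targets: Dict[str, List[str]]) -> Set[str]:
    """Domains that are the only Pfam domain (possibly repeated) of some target."""
    return {doms[0] for doms in targets.values() if len(set(doms)) == 1}


def map_target(domains: List[str], validated: Set[str]) -> Optional[str]:
    """Return the single validated domain of a target, or None if the target has
    no validated domain (unmapped) or several (ambiguous, left for curation)."""
    candidates = set(domains) & validated
    if len(candidates) == 1:
        return next(iter(candidates))
    return None


# Hypothetical targets with their Pfam-A domain architectures.
targets = {
    "T1": ["DomA"],
    "T2": ["DomB"],
    "T3": ["DomA", "DomC"],  # only DomA is validated -> mapped to DomA
    "T4": ["DomA", "DomB"],  # both validated -> ambiguous
    "T5": ["DomC", "DomD"],  # neither validated -> unmapped
}
validated = validated_domains(targets)
mapping = {t: map_target(d, validated) for t, d in targets.items()}
print(mapping)
# {'T1': 'DomA', 'T2': 'DomB', 'T3': 'DomA', 'T4': None, 'T5': None}
```

Leaving ambiguous targets unmapped rather than guessing is a natural hand-off point for the kind of manual curation the PPDMs resource describes.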


Subject(s)
Computational Biology/methods , Drug Discovery/methods , Proteins/metabolism , Small Molecule Libraries/metabolism , Binding Sites , Humans , Ligands , Protein Structure, Tertiary , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Structure-Activity Relationship
5.
Biochem Soc Trans ; 39(5): 1365-70, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21936816

ABSTRACT

Translating the huge amount of genomic and biochemical data into new drugs is a costly and challenging task. Historically, there has been comparatively little focus on linking the biochemical and chemical worlds. To address this need, we have developed ChEMBL, an online resource of small-molecule SAR (structure-activity relationship) data, which can be used to support chemical biology, lead discovery and target selection in drug discovery. The database contains the abstracted structures, properties and biological activities for over 700,000 distinct compounds and more than 3 million bioactivity records abstracted from over 40,000 publications. Additional public domain resources (e.g. PubChem BioAssay data) can be readily integrated into the same data model. The compounds in ChEMBL are largely extracted from the primary medicinal chemistry literature, and are therefore usually 'drug-like' or 'lead-like' small molecules with full experimental context. The data cover a significant fraction of the discovery of modern drugs, and are useful in a wide range of drug design and discovery tasks. In addition to the compound data, ChEMBL also contains information for over 8000 protein, cell line and whole-organism 'targets', over 4000 of which are proteins linked to their underlying genes. The database is searchable both chemically, using an interactive compound sketch tool, SMILES strings, compound research codes and keywords, and biologically, using a variety of gene identifiers, protein sequences, protein sequence similarity, family hierarchies and protein families. The information retrieved can then be readily filtered and downloaded in various formats. ChEMBL can be accessed online at https://www.ebi.ac.uk/chembldb.


Subject(s)
Data Mining , Databases, Factual , Drug Discovery , Animals , Computational Biology/methods , Genomics , Humans , Information Storage and Retrieval , Molecular Structure , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Proteins/chemistry , Structure-Activity Relationship
6.
Sci Rep ; 9(1): 18911, 2019 12 11.
Article in English | MEDLINE | ID: mdl-31827124

ABSTRACT

Lack of efficacy in the intended disease indication is the major cause of drug development failure in clinical phases. Explanations could include the poor external validity of preclinical (cell, tissue, and animal) models of human disease and the high false discovery rate (FDR) in preclinical science. FDR is related to the proportion of true relationships available for discovery (γ) and to the type 1 (false-positive) and type 2 (false-negative) error rates of the experiments designed to uncover them. We estimated the FDR in preclinical science, its effect on drug development success rates, and the improvements expected from using human genomics rather than preclinical studies as the primary source of evidence for drug target identification. Calculations were based on a sample space defined by all human diseases (the 'disease-ome'), represented as columns, and all protein-coding genes (the 'protein-coding genome'), represented as rows, producing a matrix of unique gene- (or protein-) disease pairings. We parameterised the space based on 10,000 diseases, 20,000 protein-coding genes, 100 causal genes per disease and 4000 genes encoding druggable targets, and examined the effect of varying the parameters and a range of underlying assumptions on the inferences drawn. We estimated γ, defined mathematical relationships between preclinical FDR and drug development success rates, and estimated improvements in success rates based on human genomics (rather than orthodox preclinical studies). Around one in every 200 protein-disease pairings was estimated to be causal (γ = 0.005), giving an FDR in preclinical research of 92.6%, which likely makes a major contribution to the reported drug development failure rate of 96%. The observed success rate was only slightly greater than expected for a random pick from the sample space. Values for γ back-calculated from reported preclinical and clinical drug development success rates were also close to the a priori estimates.
Substituting genome-wide (or druggable-genome-wide) association studies for preclinical studies as the major information source for drug target identification was estimated to reverse the probability of late-stage failure, because of the more stringent type 1 error rate employed and the ability to interrogate every potential druggable target in the same experiment. Genetic studies conducted at much larger scale, with greater resolution of disease end-points (e.g. by connecting genomics and electronic health record data within healthcare systems), have the potential to produce a radical improvement in drug development success rates.
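The 92.6% figure follows from γ through the standard expression FDR = (1 - γ)α / ((1 - γ)α + γ(1 - β)), provided one assumes a type 1 error rate α = 0.05 and a power (1 - β) of 0.8. These two values are assumptions for this sketch, not parameters stated in the abstract; γ itself is the 100 causal genes per disease divided by the 20,000 protein-coding genes.

```python
def false_discovery_rate(gamma: float, alpha: float, power: float) -> float:
    """Expected share of 'positive' findings that are false.

    gamma: proportion of tested gene-disease pairings that are truly causal.
    alpha: type 1 (false-positive) error rate.
    power: 1 - beta, where beta is the type 2 (false-negative) error rate.
    """
    false_pos = (1 - gamma) * alpha  # non-causal pairings declared positive
    true_pos = gamma * power         # causal pairings correctly detected
    return false_pos / (false_pos + true_pos)


n_diseases = 10_000
n_genes = 20_000
causal_per_disease = 100
n_druggable = 4_000

n_pairings = n_diseases * n_genes                # 200,000,000 gene-disease pairings
n_druggable_pairings = n_diseases * n_druggable  # 40,000,000 druggable pairings
gamma = causal_per_disease / n_genes             # 0.005, one pairing in 200

# alpha = 0.05 and power = 0.8 are assumed values for this sketch.
preclinical_fdr = false_discovery_rate(gamma, alpha=0.05, power=0.8)
print(round(preclinical_fdr, 3))  # 0.926

# Same assumed power, but a genome-wide significance threshold (assumed 5e-8).
genomic_fdr = false_discovery_rate(gamma, alpha=5e-8, power=0.8)
print(genomic_fdr)
```

The second call illustrates the abstract's point about the more stringent type 1 error rate in genetic studies: with γ unchanged, lowering α alone drives the FDR from above 90% to a small fraction of a percent.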


Subject(s)
Drug Development , Genomics , Genome-Wide Association Study , Humans
7.
Sci Rep ; 7(1): 10102, 2017 08 31.
Article in English | MEDLINE | ID: mdl-28860623

ABSTRACT

Protein domains mediate drug-protein interactions, and this principle can guide the design of multi-target drugs (i.e. polypharmacology). In this study, we associate multi-target drugs with CATH functional families through the overrepresentation of the targets of those drugs in CATH functional families. We thereby identify CATH functional families that are currently enriched in drugs (druggable CATH functional families), and we use the network properties of these druggable protein families to analyse their association with drug side effects. Analysis of selected druggable CATH functional families enriched in drug targets shows that relatives exhibit highly conserved drug binding sites. Furthermore, relatives within druggable CATH functional families occupy central positions in a human protein functional network, cluster together to form network neighbourhoods, and are less likely to be proteins associated with drug side effects. Our results demonstrate that CATH functional families can be used to identify drug-target interactions, opening a new research direction in target identification.
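The abstract does not name the statistic used to detect overrepresentation. A common choice for this kind of question is a one-sided hypergeometric test, sketched below as an assumption: given a background set of proteins, a functional family of a given size and the targets of one drug, it gives the probability of seeing at least k of the drug's targets in the family by chance. Across many families and drugs, the resulting p-values would also need a multiple-testing correction, which is omitted here.

```python
from math import comb


def hypergeom_sf(k: int, population: int, family_size: int, drawn: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(population, family_size, drawn).

    population:  proteins in the background set
    family_size: members of one CATH functional family in that background
    drawn:       targets of a given drug
    k:           drug targets that fall in the family
    """
    total = comb(population, drawn)
    hits = sum(
        comb(family_size, i) * comb(population - family_size, drawn - i)
        for i in range(k, min(family_size, drawn) + 1)
    )
    return hits / total


# Toy check: 3 targets drawn from 10 proteins, all 3 land in a 3-member family.
print(hypergeom_sf(3, population=10, family_size=3, drawn=3))  # 1/120, about 0.00833
```

`math.comb` returns 0 when asked for more items than exist, so impossible configurations contribute nothing to the sum without special-casing.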


Subject(s)
Databases, Protein , Polypharmacology , Algorithms , Binding Sites , Drug Discovery/methods , Humans , Protein Binding , Sequence Analysis, Protein/methods
8.
Sci Transl Med ; 9(383)2017 03 29.
Article in English | MEDLINE | ID: mdl-28356508

ABSTRACT

Target identification (determining the correct drug targets for a disease) and target validation (demonstrating an effect of target perturbation on disease biomarkers and disease end points) are important steps in drug development. Clinically relevant associations of variants in genes encoding drug targets model the effect of modifying the same targets pharmacologically. To delineate drug development (including repurposing) opportunities arising from this paradigm, we connected complex disease- and biomarker-associated loci from genome-wide association studies to an updated set of genes encoding druggable human proteins, to agents with bioactivity against these targets, and, where there were licensed drugs, to clinical indications. We used this set of genes to inform the design of a new genotyping array, which will enable association studies of druggable genes for drug target selection and validation in human disease.
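The chain of links described above (associated locus, druggable gene, active agent, licensed indication) amounts to a sequence of joins. The minimal sketch below uses hypothetical in-memory tables with placeholder identifiers. It assumes each GWAS locus has already been assigned to a gene, which in practice is itself a non-trivial step that the code does not attempt.

```python
from typing import Dict, List, Set, Tuple


def link_loci_to_indications(
    gwas_hits: List[Tuple[str, str]],           # (gene at associated locus, trait)
    druggable: Set[str],                         # genes encoding druggable proteins
    agents_by_gene: Dict[str, List[str]],        # gene -> agents active against its product
    indications_by_agent: Dict[str, List[str]],  # licensed agent -> clinical indications
) -> List[Tuple[str, str, str, List[str]]]:
    """Return (trait, gene, agent, indications) rows; unlicensed agents get []."""
    rows = []
    for gene, trait in gwas_hits:
        if gene not in druggable:
            continue  # the locus does not point to a druggable gene
        for agent in agents_by_gene.get(gene, []):
            rows.append((trait, gene, agent, indications_by_agent.get(agent, [])))
    return rows


# Hypothetical placeholder data.
hits = [("GENE_A", "trait_1"), ("GENE_B", "trait_2"), ("GENE_C", "trait_1")]
druggable = {"GENE_A", "GENE_C"}
agents = {"GENE_A": ["agent_x", "agent_y"], "GENE_C": []}
indications = {"agent_x": ["indication_p"]}

for row in link_loci_to_indications(hits, druggable, agents, indications):
    print(row)
```

A row whose GWAS trait differs from the agent's licensed indications is the kind of repurposing lead the abstract refers to. A druggable gene with no known agent (GENE_C here) drops out of the join but could still be a target-selection candidate.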


Subject(s)
Drug Discovery , Genome, Human , Molecular Targeted Therapy , Drug Repositioning , Genetic Loci , Genome-Wide Association Study , Humans , Linkage Disequilibrium/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , Reproducibility of Results , Translational Research, Biomedical
9.
Lancet Diabetes Endocrinol ; 4(4): 327-36, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26781229

ABSTRACT

BACKGROUND: Increased circulating plasma urate concentration is associated with an increased risk of coronary heart disease, but the extent of any causative effect of urate on risk of coronary heart disease is still unclear. In this study, we aimed to clarify any causal role of urate in coronary heart disease risk using Mendelian randomisation analysis. METHODS: We first did a fixed-effects meta-analysis of the observational association of plasma urate and risk of coronary heart disease. We then used a conventional Mendelian randomisation approach to investigate the causal relevance using a genetic instrument based on 31 urate-associated single nucleotide polymorphisms (SNPs). To account for potential pleiotropic associations of certain SNPs with risk factors other than urate, we additionally did both a multivariable Mendelian randomisation analysis, in which the genetic associations of SNPs with systolic and diastolic blood pressure, HDL cholesterol, and triglycerides were included as covariates, and an Egger Mendelian randomisation (MR-Egger) analysis to estimate a causal effect accounting for unmeasured pleiotropy. FINDINGS: In the meta-analysis of 17 prospective observational studies (166 486 individuals; 9784 coronary heart disease events), a 1 SD higher urate concentration was associated with an odds ratio (OR) for coronary heart disease of 1·07 (95% CI 1·04-1·10). The corresponding OR estimates from the conventional, multivariable-adjusted, and Egger Mendelian randomisation analyses (58 studies; 198 598 individuals; 65 877 events) were 1·18 (95% CI 1·08-1·29), 1·10 (1·00-1·22), and 1·05 (0·92-1·20), respectively, per 1 SD increment in plasma urate. INTERPRETATION: Conventional and multivariable Mendelian randomisation analyses implicate a causal role for urate in the development of coronary heart disease, but these estimates might be inflated by hidden pleiotropy.
Egger Mendelian randomisation analysis, which accounts for pleiotropy but has less statistical power, suggests there might be no causal effect. These results might help investigators to determine the priority of trials of urate lowering for the prevention of coronary heart disease compared with other potential interventions. FUNDING: UK National Institute for Health Research, British Heart Foundation, and UK Medical Research Council.
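The contrast between the conventional and Egger estimates can be illustrated with the two standard summary-data estimators: the inverse-variance weighted (IVW) estimate, a regression of SNP-outcome on SNP-exposure associations through the origin, and MR-Egger, the same weighted regression with an intercept after orienting each SNP so its exposure association is non-negative. This is a minimal sketch with first-order weights only and no standard errors or confidence intervals; the data are synthetic, not the study's.

```python
from math import exp
from typing import List, Tuple


def ivw_estimate(bx: List[float], by: List[float], se_y: List[float]) -> float:
    """Inverse-variance weighted (IVW) causal estimate from per-SNP summary data.

    bx: SNP-exposure associations (e.g. per-allele change in urate, SD units)
    by: SNP-outcome associations (e.g. per-allele log odds of CHD)
    se_y: standard errors of by (first-order weights 1 / se_y**2)
    """
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x * x / s ** 2 for x, s in zip(bx, se_y))
    return num / den


def mr_egger(bx: List[float], by: List[float], se_y: List[float]) -> Tuple[float, float]:
    """MR-Egger: weighted regression of by on bx with an intercept.

    Each SNP is first oriented so that its bx is non-negative. Returns
    (intercept, slope); the slope is the pleiotropy-robust causal estimate and a
    non-zero intercept indicates directional (unbalanced) pleiotropy.
    """
    pts = [(abs(x), y if x >= 0 else -y, 1 / s ** 2) for x, y, s in zip(bx, by, se_y)]
    w_sum = sum(w for _, _, w in pts)
    mx = sum(w * x for x, _, w in pts) / w_sum
    my = sum(w * y for _, y, w in pts) / w_sum
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in pts)
    sxx = sum(w * (x - mx) ** 2 for x, _, w in pts)
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic data: outcome effects = 0.01 (pleiotropic offset) + 0.2 * exposure effects.
bx = [0.1, 0.2, 0.3, 0.4]
by = [0.01 + 0.2 * x for x in bx]
se_y = [0.01] * 4
print(exp(ivw_estimate(bx, by, se_y)))  # IVW OR per unit exposure, inflated by the offset
intercept, slope = mr_egger(bx, by, se_y)
print(round(intercept, 6), round(slope, 6))  # 0.01 0.2
```

The IVW estimate is equivalent to a fixed-effect meta-analysis of the per-SNP ratios by/bx with weights bx²/se_y², so a shared pleiotropic offset pulls it away from the true slope of 0.2, while the Egger slope recovers it at the cost of precision. This mirrors, in miniature, the abstract's caution that the conventional estimates might be inflated by hidden pleiotropy.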


Subject(s)
Coronary Disease/blood , Coronary Disease/etiology , Mendelian Randomization Analysis/methods , Uric Acid/adverse effects , Uric Acid/blood , Humans , Meta-Analysis as Topic , Observational Studies as Topic , Risk Factors