1.
Sci Data ; 10(1): 292, 2023 05 19.
Article in English | MEDLINE | ID: mdl-37208467

ABSTRACT

The notion that data should be Findable, Accessible, Interoperable and Reusable, according to the FAIR Principles, has become a global norm for good data stewardship and a prerequisite for reproducibility. Nowadays, FAIR guides data policy actions and professional practices in the public and private sectors. Despite such global endorsements, however, the FAIR Principles are aspirational, remaining elusive at best, and intimidating at worst. To address the lack of practical guidance, and help with capability gaps, we developed the FAIR Cookbook, an open, online resource of hands-on recipes for "FAIR doers" in the Life Sciences. Created by researchers and data management professionals in academia, (bio)pharmaceutical companies and information service industries, the FAIR Cookbook covers the key steps in a FAIRification journey, the levels and indicators of FAIRness, the maturity model, the technologies, the tools and the standards available, as well as the skills required, and the challenges to achieve and improve data FAIRness. Part of the ELIXIR ecosystem, and recommended by funders, the FAIR Cookbook is open to contributions of new recipes.

2.
J Chem Inf Model ; 52(7): 1713-21, 2012 Jul 23.
Article in English | MEDLINE | ID: mdl-22647079

ABSTRACT

A novel multiobjective evolutionary algorithm (MOEA) for de novo design was developed and applied to the discovery of new adenosine receptor antagonists. This method consists of several iterative cycles of structure generation, evaluation, and selection. We applied an evolutionary algorithm (the so-called Molecule Commander) to generate candidate A1 adenosine receptor antagonists, which were evaluated against multiple criteria and objectives consisting of high (predicted) affinity and selectivity for the receptor, together with good ADMET properties. A pharmacophore model for the human A1 adenosine receptor (hA1AR) was created to serve as an objective function for evolution. In addition, three support vector machine models based on molecular fingerprints were developed for the other adenosine receptor subtypes (hA2A, hA2B, and hA3) and applied as negative objective functions, to aim for selectivity. Structures with a higher evolutionary fitness with respect to ADMET and pharmacophore matching scores were selected as input for the next generation and thus developed toward overall fitter ("better") compounds. We finally obtained a collection of 3946 unique compounds from which we derived chemical scaffolds. As a proof-of-principle, six of these templates were selected for actual synthesis and subsequently tested for activity toward all adenosine receptor subtypes. Interestingly, scaffolds 2 and 3 displayed low micromolar affinity for many of the adenosine receptor subtypes. To further investigate our evolutionary design method, we performed systematic modifications on scaffold 3. These modifications were guided by the substitution patterns as observed in the set of generated compounds that contained scaffold 3. We found that an increased affinity with appreciable selectivity for hA1AR over the other adenosine receptor subtypes was achieved through substitution of the scaffold; compound 3a had a Ki value of 280 nM with approximately 10-fold selectivity with respect to hA2AR, while 3g had a 1.6 µM affinity for hA1AR with negligible affinity for the hA2A, hA2B, and hA3 receptor subtypes.
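
The selection step in a multiobjective cycle like the one described above can be illustrated with a small sketch. The abstract does not say which selection scheme Molecule Commander uses, so the Pareto (non-dominated) filter below is an assumption, as are the candidate names, the three objectives, and all scores. Off-target probability is negated so that higher is better on every objective.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        beaten = any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not beaten:
            front.append(name)
    return front


# Hypothetical scores: (pharmacophore match, ADMET score, -off-target probability)
population = {
    "mol_a": (0.9, 0.6, -0.2),
    "mol_b": (0.7, 0.8, -0.1),
    "mol_c": (0.6, 0.5, -0.3),  # worse than mol_a on all three objectives
    "mol_d": (0.9, 0.6, -0.4),  # ties mol_a twice, loses on selectivity
}
survivors = pareto_front(population)
print(survivors)
```

In this toy population only `mol_a` and `mol_b` survive, since each is better than the other on at least one objective. A real run would feed the survivors into the next round of structure generation.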


Subject(s)
Algorithms , Drug Design , Evolution, Molecular , Purinergic P1 Receptor Agonists/chemistry , Binding Sites , Humans , Ligands , Models, Molecular
3.
BMC Bioinformatics ; 11: 316, 2010 Jun 10.
Article in English | MEDLINE | ID: mdl-20537162

ABSTRACT

BACKGROUND: G protein-coupled receptors (GPCRs) represent a family of well-characterized drug targets with significant therapeutic value. Phylogenetic classifications may help to understand the characteristics of individual GPCRs and their subtypes. Previous phylogenetic classifications were all based on the sequences of receptors, adding only minor information about the ligand binding properties of the receptors. In this work, we compare a sequence-based classification of receptors to a ligand-based classification of the same group of receptors, and evaluate the potential to use sequence relatedness as a predictor for ligand interactions thus aiding the quest for ligands of orphan receptors. RESULTS: We present a classification of GPCRs that is purely based on their ligands, complementing sequence-based phylogenetic classifications of these receptors. Targets were hierarchically classified into phylogenetic trees, for both sequence space and ligand (substructure) space. The overall organization of the sequence-based tree and substructure-based tree was similar; in particular, the adenosine receptors cluster together as well as most peptide receptor subtypes (e.g. opioid, somatostatin) and adrenoceptor subtypes. In ligand space, the prostanoid and cannabinoid receptors are more distant from the other targets, whereas the tachykinin receptors, the oxytocin receptor, and serotonin receptors are closer to the other targets, which is indicative for ligand promiscuity. In 93% of the receptors studied, de-orphanization of a simulated orphan receptor using the ligands of related receptors performed better than random (AUC > 0.5) and for 35% of receptors de-orphanization performance was good (AUC > 0.7). CONCLUSIONS: We constructed a phylogenetic classification of GPCRs that is solely based on the ligands of these receptors. 
The similarities and differences with traditional sequence-based classifications were investigated: our ligand-based classification uncovers relationships among GPCRs that are not apparent from the sequence-based classification. This will shed light on potential cross-reactivity of GPCR ligands and will aid the design of new ligands with the desired activity profiles. In addition, we linked the ligand-based classification with a ligand-focused sequence-based classification described in literature and proved the potential of this method for de-orphanization of GPCRs.
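
The AUC thresholds quoted in the results (0.5 for random, 0.7 for good) can be made concrete with a short sketch. It computes the area under the ROC curve as the probability that a randomly chosen true ligand is ranked above a randomly chosen non-ligand, with ties counted as half. The scores are invented for illustration and are not taken from the study; the function name is likewise hypothetical.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], inactive_scores: Sequence[float]) -> float:
    """AUC as the fraction of (active, inactive) pairs ranked in the right order."""
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(active_scores) * len(inactive_scores))


# Hypothetical similarity scores for a simulated orphan receptor:
# known ligands of the receptor versus background compounds.
actives = [0.9, 0.8, 0.4]
inactives = [0.7, 0.3, 0.2, 0.1]
auc = roc_auc(actives, inactives)
print(f"AUC = {auc:.3f}")
```

Here 11 of the 12 active/inactive pairs are ordered correctly, so the AUC is 11/12, well above the 0.5 expected from a random ranking.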


Subject(s)
Genomics/methods , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/classification , Binding Sites , Drug Design , Ligands , Models, Molecular , Phylogeny
4.
Biomed Res Int ; 2017: 8327980, 2017.
Article in English | MEDLINE | ID: mdl-29214177

ABSTRACT

Patient registries are an essential tool to increase current knowledge regarding rare diseases. Understanding these data is a vital step to improve patient treatments and to create the most adequate tools for personalized medicine. However, the growing number of disease-specific patient registries also brings new technical challenges. Usually, these systems are developed as closed data silos, with independent formats and models, lacking comprehensive mechanisms to enable data sharing. To tackle these challenges, we developed a Semantic Web based solution that allows connecting distributed and heterogeneous registries, enabling the federation of knowledge between multiple independent environments. This semantic layer creates a holistic view over a set of anonymised registries, supporting semantic data representation, integrated access, and querying. The implemented system gave us the opportunity to answer challenging questions across dispersed rare disease patient registries. The interconnection between those registries using Semantic Web technologies benefits our final solution in a way that we can query single or multiple instances according to our needs. The outcome is a unique semantic layer, connecting miscellaneous registries and delivering a lightweight holistic perspective over the wealth of knowledge stemming from linked rare disease patient registries.


Subject(s)
Database Management Systems/statistics & numerical data , Information Storage and Retrieval/statistics & numerical data , Rare Diseases/epidemiology , Registries/statistics & numerical data , Semantic Web/statistics & numerical data , Computational Biology/methods , Databases, Factual/statistics & numerical data , Humans , Information Dissemination/methods , Internet/statistics & numerical data , Software/statistics & numerical data
5.
PLoS One ; 11(2): e0149621, 2016.
Article in English | MEDLINE | ID: mdl-26919047

ABSTRACT

High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or are even unintended by the original authors, but they vastly extend the reach of existing biomedical knowledge for identification and interpretation of gene-disease associations. The implicitome can be used in conjunction with experimental data resources to rationalize both known and novel associations. We demonstrate the usefulness of the implicitome by rationalizing known and novel gene-disease associations, including those from GWAS. To facilitate the re-use of implicit gene-disease associations, we publish our data in compliance with FAIR Data Publishing recommendations [https://www.force11.org/group/fairgroup] using nanopublications. An online tool (http://knowledge.bio) is available to explore established and potential gene-disease associations in the context of other biomedical relations.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans
6.
J Cheminform ; 7(Suppl 1 Text mining for chemistry and the CHEMDNER track): S10, 2015.
Article in English | MEDLINE | ID: mdl-25810767

ABSTRACT

BACKGROUND: The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized compounds at the document level (CDI task). We investigated an ensemble approach where dictionary-based named entity recognition is used along with grammar-based recognizers to extract compounds from text. We assessed the performance of ten different commercial and publicly available lexical resources using an open source indexing system (Peregrine), in combination with three different chemical compound recognizers and a set of regular expressions to recognize chemical database identifiers. The effect of different stop-word lists, case-sensitivity matching, and use of chunking information was also investigated. We focused on lexical resources that provide chemical structure information. To rank the different compounds found in a text, we used a term confidence score based on the normalized ratio of the term frequencies in chemical and non-chemical journals. RESULTS: The use of stop-word lists greatly improved the performance of the dictionary-based recognition, but there was no additional benefit from using chunking information. A combination of ChEBI and HMDB as lexical resources, the LeadMine tool for grammar-based recognition, and the regular expressions, outperformed any of the individual systems. On the test set, the F-scores were 77.8% (recall 71.2%, precision 85.8%) for the CEM task and 77.6% (recall 71.7%, precision 84.6%) for the CDI task. Missed terms were mainly due to tokenization issues, poor recognition of formulas, and term conjunctions. 
CONCLUSIONS: We developed an ensemble system that combines dictionary-based and grammar-based approaches for chemical named entity recognition, outperforming any of the individual systems that we considered. The system is able to provide structure information for most of the compounds that are found. Improved tokenization and better recognition of specific entity types are likely to further improve system performance.
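
The reported F-scores can be checked against the quoted recall and precision. The sketch below assumes the balanced F-score (F1, the harmonic mean of precision and recall); the abstract says only "F-scores", but the figures are consistent with that definition.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set figures quoted in the abstract
cem = f_score(precision=0.858, recall=0.712)  # chemical entity mention task
cdi = f_score(precision=0.846, recall=0.717)  # chemical document indexing task

print(f"CEM F-score: {cem:.1%}")
print(f"CDI F-score: {cdi:.1%}")
```

Both values round to the reported 77.8% and 77.6%.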

7.
PLoS One ; 10(7): e0127612, 2015.
Article in English | MEDLINE | ID: mdl-26154165

ABSTRACT

MOTIVATION: Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. RESULTS: Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum.
AVAILABILITY: SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/. CONTACT: philippe.rocca-serra@oerc.ox.ac.uk and susanna-assunta.sansone@oerc.ox.ac.uk.


Subject(s)
Computational Biology/methods , Models, Theoretical , Peer Review, Research , Reproducibility of Results
8.
J Med Chem ; 55(11): 5311-25, 2012 Jun 14.
Article in English | MEDLINE | ID: mdl-22563707

ABSTRACT

We present the systematic prospective evaluation of a protein-based and a ligand-based virtual screening platform against a set of three G-protein-coupled receptors (GPCRs): the β-2 adrenoreceptor (ADRB2), the adenosine A(2A) receptor (AA2AR), and the sphingosine 1-phosphate receptor (S1PR1). Novel bioactive compounds were identified using a consensus scoring procedure combining ligand-based (frequent substructure ranking) and structure-based (Snooker) tools, and all 900 selected compounds were screened against all three receptors. A striking number of ligands showed affinity/activity for GPCRs other than the intended target, which could be partly attributed to the fuzziness and overlap of protein-based pharmacophore models. Surprisingly, the phosphodiesterase 5 (PDE5) inhibitor sildenafil was found to possess submicromolar affinity for AA2AR. Overall, this is one of the first published prospective chemogenomics studies that demonstrate the identification of novel cross-pharmacology between unrelated protein targets. The lessons learned from this study can be used to guide future virtual ligand design efforts.


Subject(s)
Databases, Factual , Drug Design , Models, Molecular , Quantitative Structure-Activity Relationship , Receptors, Adenosine A2/chemistry , Receptors, Adrenergic, beta-2/chemistry , Receptors, Lysosphingolipid/chemistry , Adenosine A2 Receptor Agonists/chemistry , Adenosine A2 Receptor Antagonists/chemistry , Adrenergic beta-2 Receptor Agonists/chemistry , Adrenergic beta-2 Receptor Antagonists/chemistry , Animals , CHO Cells , Cricetinae , Cricetulus , Drug Partial Agonism , HEK293 Cells , High-Throughput Screening Assays , Humans , Ligands , Molecular Structure , Phosphodiesterase 5 Inhibitors/chemistry , Piperazines/chemistry , Piperazines/metabolism , Purines/chemistry , Purines/metabolism , Radioligand Assay , Receptors, Adenosine A2/metabolism , Receptors, Adrenergic, beta-2/metabolism , Receptors, Lysosphingolipid/agonists , Receptors, Lysosphingolipid/metabolism , Sildenafil Citrate , Stochastic Processes , Sulfones/chemistry , Sulfones/metabolism
9.
ChemMedChem ; 6(12): 2302-11, 2011 Dec 09.
Article in English | MEDLINE | ID: mdl-22021213

ABSTRACT

A virtual ligand-based screening approach was designed and evaluated for the discovery of new A(2A) adenosine receptor (AR) ligands. For comparison and evaluation, the procedures from a recently published virtual screening study that used the A(2A) AR X-ray crystal structure for the target-based discovery of new A(2A) ligands were largely followed. Several screening models were constructed by deriving the distinguishing structural features from selected sets of A(2A) AR antagonists, so-called frequent substructure mining. The best model in statistical terms was subsequently applied to large-scale virtual screens of a commercial vendor library. This resulted in the selection of 36 candidates for acquisition and testing. Of the selected candidates, eight compounds significantly inhibited radioligand binding at A(2A) AR (>30%) at 10 µM, corresponding to a "hit rate" of 22%. This hit rate is quite similar to that of the referenced target-based virtual screening study, while both approaches yield new, non-overlapping sets of ligands.


Subject(s)
Adenosine A2 Receptor Antagonists/chemistry , Ligands , Receptor, Adenosine A2A/chemistry , Adenosine A2 Receptor Antagonists/chemical synthesis , Drug Evaluation, Preclinical , Humans , Protein Binding , Receptor, Adenosine A2A/metabolism , Software , Structure-Activity Relationship
10.
Curr Top Med Chem ; 11(15): 1964-77, 2011.
Article in English | MEDLINE | ID: mdl-21470175

ABSTRACT

Chemogenomic approaches, which link ligand chemistry to bioactivity against targets (and, by extension, to phenotypes), are becoming more and more important due to the increasing number of bioactivity data available both in proprietary databases and in the public domain. In this article we review chemogenomics approaches applied in four different domains. First, because the relationship between protein targets allows an approximate relation between their respective bioactive ligands to be inferred, we investigate the extent to which chemogenomics approaches can be applied to receptor deorphanization. In this case it was found that by using knowledge about active compounds of related proteins, in 93% of all cases enrichment better than random could be obtained. Second, we analyze different cheminformatics analysis methods with respect to their behavior in chemogenomics studies, such as subgraph mining and Bayesian models. Third, we illustrate how chemogenomics, in its particular flavor of 'proteochemometrics', can be applied to extrapolate bioactivity predictions from given data points to related targets. Finally, we extend the concept of 'chemogenomics' approaches, relating ligand chemistry to bioactivity against related targets, into phenotypic space, which then falls into the area of 'chemical genomics' and 'chemical genetics'; this is very often the desired endpoint not only in the pharmaceutical industry but also in academic probe discovery, and therefore the one the experimental scientist is most interested in.


Subject(s)
Genomics/methods , Receptors, G-Protein-Coupled/chemistry , Bayes Theorem , Drug Design , Ligands , Phenotype , Proteins , Receptors, G-Protein-Coupled/classification , Receptors, G-Protein-Coupled/metabolism
11.
J Chem Inf Model ; 49(2): 348-60, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19434836

ABSTRACT

In this study, we conducted frequent substructure mining to identify structural features that discriminate between ligands that do bind to G protein-coupled receptors (GPCRs) and those that do not. In most cases, particular chemical representations resulted in the most significant substructures. Substructures found to be characteristic for the background control set reflected reactions that may have been used to construct this library, e.g., for the ChemBridge DIVERSet library employed these are ester and carboxamide moieties. Alkane amine substructures were identified as most important for GPCR ligands, e.g. the butylamine substructure, often linked to an aromatic system. Hierarchical analysis of targeted GPCRs revealed well-known motifs and new substructural features. One example is the imidazole-like substructure common for the histamine binding receptor ligands. Another example is the planar ring system consisting of a fused five- and six-membered ring (indole-like substructure) common for the serotonin receptor ligands.


Subject(s)
Receptors, G-Protein-Coupled/metabolism , Ligands , Protein Conformation , Receptors, G-Protein-Coupled/chemistry