Search | BVS - Ministry of Health

1.

Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships.

Huber, Florian; Ridder, Lars; Verhoeven, Stefan; Spaaks, Jurriaan H; Diblen, Faruk; Rogers, Simon; van der Hooft, Justin J J.

PLoS Comput Biol ; 17(2): e1008724, 2021 02.

Article in English | MEDLINE | ID: mdl-33591968

ABSTRACT

Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm, Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS/MS libraries including spectra for nearly 13,000 unique molecules, we show how Spec2Vec scores correlate better with structural similarity than cosine-based scores. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is computationally more scalable, allowing structural analogue searches in large databases within seconds.

Subjects

Algorithms; Computational Biology/methods; Gene Library; Metabolomics/methods; Tandem Mass Spectrometry/methods; Computer Simulation; Databases, Factual; False Positive Reactions; Machine Learning; Natural Language Processing; Reproducibility of Results
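
As a rough illustration of the embedding-based scoring described in the Spec2Vec abstract above, the sketch below compares spectra by the cosine of the angle between their embedding vectors. It assumes each spectrum has already been turned into a fixed-length vector; the function name `cosine_similarity` and the toy three-dimensional vectors are hypothetical, and the way Spec2Vec actually learns its embeddings is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embeddings (made-up numbers, not real Spec2Vec output).
emb_a = [1.0, 2.0, 0.0]
emb_b = [2.0, 4.0, 0.0]  # same direction as emb_a
emb_c = [0.0, 0.0, 3.0]  # orthogonal to emb_a

score_ab = cosine_similarity(emb_a, emb_b)
score_ac = cosine_similarity(emb_a, emb_c)
```

Because the score depends only on vector direction, two embeddings that differ by a scale factor are treated as identical, while orthogonal embeddings score zero.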

2.

A community resource for paired genomic and metabolomic data mining.

Schorn, Michelle A; Verhoeven, Stefan; Ridder, Lars; Huber, Florian; Acharya, Deepa D; Aksenov, Alexander A; Aleti, Gajender; Moghaddam, Jamshid Amiri; Aron, Allegra T; Aziz, Saefuddin; Bauermeister, Anelize; Bauman, Katherine D; Baunach, Martin; Beemelmanns, Christine; Beman, J Michael; Berlanga-Clavero, María Victoria; Blacutt, Alex A; Bode, Helge B; Boullie, Anne; Brejnrod, Asker; Bugni, Tim S; Calteau, Alexandra; Cao, Liu; Carrión, Víctor J; Castelo-Branco, Raquel; Chanana, Shaurya; Chase, Alexander B; Chevrette, Marc G; Costa-Lotufo, Leticia V; Crawford, Jason M; Currie, Cameron R; Cuypers, Bart; Dang, Tam; de Rond, Tristan; Demko, Alyssa M; Dittmann, Elke; Du, Chao; Drozd, Christopher; Dujardin, Jean-Claude; Dutton, Rachel J; Edlund, Anna; Fewer, David P; Garg, Neha; Gauglitz, Julia M; Gentry, Emily C; Gerwick, Lena; Glukhov, Evgenia; Gross, Harald; Gugger, Muriel; Guillén Matus, Dulce G.

Nat Chem Biol ; 17(4): 363-368, 2021 04.

Article in English | MEDLINE | ID: mdl-33589842

Subjects

Data Mining/methods; Genomics/methods; Metabolomics/methods; Databases, Factual

3.

sv-callers: a highly portable parallel workflow for structural variant detection in whole-genome sequence data.

Kuzniar, Arnold; Maassen, Jason; Verhoeven, Stefan; Santuari, Luca; Shneider, Carl; Kloosterman, Wigard P; de Ridder, Jeroen.

PeerJ ; 8: e8214, 2020.

Article in English | MEDLINE | ID: mdl-31934500

ABSTRACT

Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases including cancer. Despite the advances in whole genome sequencing, comprehensive and accurate detection of SVs in short-read data still poses some practical and computational challenges. We present sv-callers, a highly portable workflow that enables parallel execution of multiple SV detection tools and provides users with example analyses of detected SV callsets in a Jupyter Notebook. This workflow supports easy deployment of software dependencies, configuration and addition of new analysis tools. Moreover, porting it to different computing systems requires minimal effort. Finally, we demonstrate the utility of the workflow by performing both somatic and germline SV analyses on different high-performance computing systems.

4.

3D-e-Chem: Structural Cheminformatics Workflows for Computer-Aided Drug Discovery.

Kooistra, Albert J; Vass, Márton; McGuire, Ross; Leurs, Rob; de Esch, Iwan J P; Vriend, Gert; Verhoeven, Stefan; de Graaf, Chris.

ChemMedChem ; 13(6): 614-626, 2018 03 20.

Article in English | MEDLINE | ID: mdl-29337438

ABSTRACT

eScience technologies are needed to process the information available in many heterogeneous types of protein-ligand interaction data and to capture these data into models that enable the design of efficacious and safe medicines. Here we present scientific KNIME tools and workflows that enable the integration of chemical, pharmacological, and structural information for: i) structure-based bioactivity data mapping, ii) structure-based identification of scaffold replacement strategies for ligand design, iii) ligand-based target prediction, iv) protein sequence-based binding site identification and ligand repurposing, and v) structure-based pharmacophore comparison for ligand repurposing across protein families. The modular setup of the workflows and the use of well-established standards allow the re-use of these protocols and facilitate the design of customized computer-aided drug discovery workflows.

Subjects

Computer-Aided Design; Drug Discovery/methods; Image Processing, Computer-Assisted; Internet; Protein Kinase Inhibitors/chemistry; Ligands; Molecular Structure

5.

A Structural Framework for GPCR Chemogenomics: What's In a Residue Number?

Vass, Márton; Kooistra, Albert J; Verhoeven, Stefan; Gloriam, David; de Esch, Iwan J P; de Graaf, Chris.

Methods Mol Biol ; 1705: 73-113, 2018.

Article in English | MEDLINE | ID: mdl-29188559

ABSTRACT

The recent surge of crystal structures of G protein-coupled receptors (GPCRs), as well as comprehensive collections of sequence, structural, ligand bioactivity, and mutation data, has enabled the development of integrated chemogenomics workflows for this important target family. This chapter will focus on cross-family and cross-class studies of GPCRs that have pinpointed the need for, and the implementation of, a generic numbering scheme for referring to specific structural elements of GPCRs. Sequence- and structure-based numbering schemes for different receptor classes will be introduced and the remaining caveats will be discussed. The use of these numbering schemes has facilitated many chemogenomics studies such as consensus binding site definition, binding site comparison, ligand repurposing (e.g. for orphan receptors), sequence-based pharmacophore generation for homology modeling or virtual screening, and class-wide chemogenomics studies of GPCRs.

Subjects

Genomics; Ligands; Receptors, G-Protein-Coupled/chemistry; Receptors, G-Protein-Coupled/genetics; Amino Acid Motifs; Amino Acids; Binding Sites; Computational Biology/methods; Conserved Sequence; Drug Discovery/methods; Genomics/methods; Humans; Models, Molecular; Protein Binding; Protein Conformation; Receptors, G-Protein-Coupled/metabolism; Structure-Activity Relationship

6.

3D-e-Chem-VM: Structural Cheminformatics Research Infrastructure in a Freely Available Virtual Machine.

McGuire, Ross; Verhoeven, Stefan; Vass, Márton; Vriend, Gerrit; de Esch, Iwan J P; Lusher, Scott J; Leurs, Rob; Ridder, Lars; Kooistra, Albert J; Ritschel, Tina; de Graaf, Chris.

J Chem Inf Model ; 57(2): 115-121, 2017 02 27.

Article in English | MEDLINE | ID: mdl-28125221

ABSTRACT

3D-e-Chem-VM is an open source, freely available Virtual Machine ( http://3d-e-chem.github.io/3D-e-Chem-VM/ ) that integrates cheminformatics and bioinformatics tools for the analysis of protein-ligand interaction data. 3D-e-Chem-VM consists of software libraries, and database and workflow tools that can analyze and combine small molecule and protein structural information in a graphical programming environment. New chemical and biological data analytics tools and workflows have been developed for the efficient exploitation of structural and pharmacological protein-ligand interaction data from proteomewide databases (e.g., ChEMBLdb and PDB), as well as customized information systems focused on, e.g., G protein-coupled receptors (GPCRdb) and protein kinases (KLIFS). The integrated structural cheminformatics research infrastructure compiled in the 3D-e-Chem-VM enables the design of new approaches in virtual ligand screening (Chemdb4VS), ligand-based metabolism prediction (SyGMa), and structure-based protein binding site comparison and bioisosteric replacement for ligand design (KRIPOdb).

Subjects

Informatics/methods; Drug Design; Ligands; Protein Kinases/metabolism; Receptors, G-Protein-Coupled/metabolism; Software; User-Computer Interface

7.

In silico prediction and automatic LC-MS(n) annotation of green tea metabolites in urine.

Ridder, Lars; van der Hooft, Justin J J; Verhoeven, Stefan; de Vos, Ric C H; Vervoort, Jacques; Bino, Raoul J.

Anal Chem ; 86(10): 4767-74, 2014 May 20.

Article in English | MEDLINE | ID: mdl-24779709

ABSTRACT

The colonic breakdown and human biotransformation of small molecules present in food can give rise to a large variety of potentially bioactive metabolites in the human body. However, the absence of reference data for many of these components limits their identification in complex biological samples, such as plasma and urine. We present an in silico workflow for automatic chemical annotation of metabolite profiling data from liquid chromatography coupled with multistage accurate mass spectrometry (LC-MS(n)), which we used to systematically screen for the presence of tea-derived metabolites in human urine samples after green tea consumption. Reaction rules for intestinal degradation and human biotransformation were systematically applied to chemical structures of 75 green tea components, resulting in a virtual library of 27,245 potential metabolites. All matching precursor ions in the urine LC-MS(n) data sets, as well as the corresponding fragment ions, were automatically annotated by in silico generated (sub)structures. The results were evaluated based on 74 previously identified urinary metabolites and led to the putative identification of 26 additional green tea-derived metabolites. A total of 77% of all annotated metabolites were not present in the PubChem database, demonstrating the benefit of in silico metabolite prediction for the automatic annotation of yet unknown metabolites in LC-MS(n) data from nutritional metabolite profiling experiments.

Subjects

Tea/chemistry; Urine/chemistry; Biotransformation; Chromatography, Liquid; Computer Simulation; Humans; Intestinal Mucosa/metabolism; Tandem Mass Spectrometry

8.

Automatic Compound Annotation from Mass Spectrometry Data Using MAGMa.

Ridder, Lars; van der Hooft, Justin J J; Verhoeven, Stefan.

Mass Spectrom (Tokyo) ; 3(Spec Iss 2): S0033, 2014.

Article in English | MEDLINE | ID: mdl-26819876

ABSTRACT

The MAGMa software for automatic annotation of mass spectrometry based fragmentation data was applied to 16 MS/MS datasets of the CASMI 2013 contest. Eight solutions were submitted in category 1 (molecular formula assignments) and twelve in category 2 (molecular structure assignment). The MS/MS peaks of each challenge were matched with in silico generated substructures of candidate molecules from PubChem, resulting in penalty scores that were used for candidate ranking. In 6 of the 12 submitted solutions in category 2, the correct chemical structure obtained the best score, whereas 3 molecules were ranked outside the top 5. All top ranked molecular formulas submitted in category 1 were correct. In addition, we present MAGMa results generated retrospectively for the remaining challenges. Successful application of the MAGMa algorithm required inclusion of the relevant candidate molecules, application of the appropriate mass tolerance and a sufficient degree of in silico fragmentation of the candidate molecules. Furthermore, the effect of the exhaustiveness of the candidate lists and limitations of substructure based scoring are discussed.

9.

Automatic chemical structure annotation of an LC-MS(n) based metabolic profile from green tea.

Ridder, Lars; van der Hooft, Justin J J; Verhoeven, Stefan; de Vos, Ric C H; Bino, Raoul J; Vervoort, Jacques.

Anal Chem ; 85(12): 6033-40, 2013 Jun 18.

Article in English | MEDLINE | ID: mdl-23662787

ABSTRACT

Liquid chromatography coupled with multistage accurate mass spectrometry (LC-MS(n)) can generate comprehensive spectral information of metabolites in crude extracts. To support structural characterization of the many metabolites present in such complex samples, we present a novel method ( http://www.emetabolomics.org/magma ) to automatically process and annotate the LC-MS(n) data sets on the basis of candidate molecules from chemical databases, such as PubChem or the Human Metabolite Database. Multistage MS(n) spectral data is automatically annotated with hierarchical trees of in silico generated substructures of candidate molecules to explain the observed fragment ions and alternative candidates are ranked on the basis of the calculated matching score. We tested this method on an untargeted LC-MS(n) (n ≤ 3) data set of a green tea extract, generated on an LC-LTQ/Orbitrap hybrid MS system. For the 623 spectral trees obtained in a single LC-MS(n) run, a total of 116,240 candidate molecules with monoisotopic masses matching within 5 ppm mass accuracy were retrieved from the PubChem database, ranging from 4 to 1327 candidates per molecular ion. The matching scores were used to rank the candidate molecules for each LC-MS(n) component. The median and third quartile fractional ranks for 85 previously identified tea compounds were 3.5 and 7.5, respectively. The substructure annotations and rankings provided detailed structural information of the detected components, beyond annotation with elemental formula only. Twenty-four additional components were putatively identified by expert interpretation of the automatically annotated data set, illustrating the potential to support systematic and untargeted metabolite identification.

Subjects

Metabolome/physiology; Plant Extracts/chemistry; Plant Extracts/metabolism; Tandem Mass Spectrometry/methods; Tea/chemistry; Tea/metabolism; Automation, Laboratory/methods; Chromatography, Liquid/methods; Mass Spectrometry/methods; Plant Extracts/analysis
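
The abstract above retrieves candidate molecules whose monoisotopic masses match within 5 ppm. A minimal sketch of such a tolerance filter follows, assuming the error is measured relative to the candidate's theoretical mass; the function names and the candidate masses are made up for illustration and do not come from the MAGMa software or from PubChem.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million, relative to the theoretical mass."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tol_ppm=5.0):
    """True if the absolute ppm error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


def filter_candidates(observed_mass, candidates, tol_ppm=5.0):
    """Keep the names of (name, mass) candidates that match within tol_ppm."""
    return [
        name
        for name, mass in candidates
        if within_tolerance(observed_mass, mass, tol_ppm)
    ]


# Hypothetical candidate list: names and masses are illustrative only.
candidates = [
    ("candidate_a", 290.0790),  # about 1.7 ppm away from the observed mass
    ("candidate_b", 290.0800),  # about 1.7 ppm away, on the other side
    ("candidate_c", 290.1000),  # about 70 ppm away, rejected
]
matches = filter_candidates(290.0795, candidates)
```

Since a ppm window scales with mass, the absolute window widens for heavier ions: 5 ppm is 0.0005 Da at mass 100 but 0.005 Da at mass 1000.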

10.

Identification of new biomarker candidates for glucocorticoid induced insulin resistance using literature mining.

Fleuren, Wilco Wm; Toonen, Erik Jm; Verhoeven, Stefan; Frijters, Raoul; Hulsen, Tim; Rullmann, Ton; van Schaik, René; de Vlieg, Jacob; Alkema, Wynand.

BioData Min ; 6(1): 2, 2013 Feb 04.

Article in English | MEDLINE | ID: mdl-23379763

ABSTRACT

BACKGROUND: Glucocorticoids are potent anti-inflammatory agents used for the treatment of diseases such as rheumatoid arthritis, asthma, inflammatory bowel disease and psoriasis. Unfortunately, usage is limited because of metabolic side-effects, e.g. insulin resistance, glucose intolerance and diabetes. To gain more insight into the mechanisms behind glucocorticoid induced insulin resistance, it is important to understand which genes play a role in the development of insulin resistance and which genes are affected by glucocorticoids. Medline abstracts contain many studies about insulin resistance and the molecular effects of glucocorticoids and thus are a good resource to study these effects. RESULTS: We developed CoPubGene, a method to automatically identify gene-disease associations in Medline abstracts. We used this method to create a literature network of genes related to insulin resistance and to evaluate the importance of the genes in this network for glucocorticoid induced metabolic side effects and anti-inflammatory processes. With this approach we found several genes that already are considered markers of GC induced IR, such as phosphoenolpyruvate carboxykinase (PCK) and glucose-6-phosphatase, catalytic subunit (G6PC). In addition, we found genes involved in steroid synthesis that have not yet been recognized as mediators of GC induced IR. CONCLUSIONS: With this approach we are able to construct a robust informative literature network of insulin resistance related genes that gave new insights to better understand the mechanisms behind GC induced IR. The method has been set up in a generic way so it can be applied to a wide variety of disease networks.

11.

Substructure-based annotation of high-resolution multistage MS(n) spectral trees.

Ridder, Lars; van der Hooft, Justin J J; Verhoeven, Stefan; de Vos, Ric C H; van Schaik, René; Vervoort, Jacques.

Rapid Commun Mass Spectrom ; 26(20): 2461-71, 2012 Oct 30.

Article in English | MEDLINE | ID: mdl-22976213

ABSTRACT

RATIONALE: High-resolution multistage MS(n) data contains detailed information that can be used for structural elucidation of compounds observed in metabolomics studies. However, full exploitation of this complex data requires significant analysis efforts by human experts. In silico methods currently used to support data annotation by assigning substructures of candidate molecules are limited to a single level of MS fragmentation. METHODS: We present an extended substructure-based approach which allows annotation of hierarchical spectral trees obtained from high-resolution multistage MS(n) experiments. The algorithm yields a hierarchical tree of substructures of a candidate molecule to explain the fragment peaks observed at consecutive levels of the multistage MS(n) spectral tree. A matching score is calculated that indicates how well the candidate structure can explain the observed hierarchical fragmentation pattern. RESULTS: The method is applied to MS(n) spectral trees of a set of compounds representing important chemical classes in metabolomics. Based on the calculated score, the correct molecules were successfully prioritized among extensive sets of candidate structures retrieved from the PubChem database. CONCLUSIONS: The results indicate that the inclusion of subsequent levels of fragmentation in the automatic annotation of MS(n) data improves the identification of the correct compounds. We show that, especially in the case of lower mass accuracy, this improvement is not only due to the inclusion of additional fragment ions in the analysis, but also to the specific hierarchical information present in the MS(n) spectral trees. This method may significantly reduce the time required by MS experts to analyze complex MS(n) data.

Subjects

Algorithms; Mass Spectrometry/methods; Databases, Factual; Metabolomics/methods

12.

A prospective cross-screening study on G-protein-coupled receptors: lessons learned in virtual compound library design.

Sanders, Marijn P A; Roumen, Luc; van der Horst, Eelke; Lane, J Robert; Vischer, Henry F; van Offenbeek, Jody; de Vries, Henk; Verhoeven, Stefan; Chow, Ken Y; Verkaar, Folkert; Beukers, Margot W; McGuire, Ross; Leurs, Rob; Ijzerman, Adriaan P; de Vlieg, Jacob; de Esch, Iwan J P; Zaman, Guido J R; Klomp, Jan P G; Bender, Andreas; de Graaf, Chris.

J Med Chem ; 55(11): 5311-25, 2012 Jun 14.

Article in English | MEDLINE | ID: mdl-22563707

ABSTRACT

We present the systematic prospective evaluation of a protein-based and a ligand-based virtual screening platform against a set of three G-protein-coupled receptors (GPCRs): the β-2 adrenoreceptor (ADRB2), the adenosine A(2A) receptor (AA2AR), and the sphingosine 1-phosphate receptor (S1PR1). Novel bioactive compounds were identified using a consensus scoring procedure combining ligand-based (frequent substructure ranking) and structure-based (Snooker) tools, and all 900 selected compounds were screened against all three receptors. A striking number of ligands showed affinity/activity for GPCRs other than the intended target, which could be partly attributed to the fuzziness and overlap of protein-based pharmacophore models. Surprisingly, the phosphodiesterase 5 (PDE5) inhibitor sildenafil was found to possess submicromolar affinity for AA2AR. Overall, this is one of the first published prospective chemogenomics studies that demonstrate the identification of novel cross-pharmacology between unrelated protein targets. The lessons learned from this study can be used to guide future virtual ligand design efforts.

Subjects

Databases, Factual; Drug Design; Models, Molecular; Quantitative Structure-Activity Relationship; Receptors, Adenosine A2/chemistry; Receptors, Adrenergic, beta-2/chemistry; Receptors, Lysosphingolipid/chemistry; Adenosine A2 Receptor Agonists/chemistry; Adenosine A2 Receptor Antagonists/chemistry; Adrenergic beta-2 Receptor Agonists/chemistry; Adrenergic beta-2 Receptor Antagonists/chemistry; Animals; CHO Cells; Cricetinae; Cricetulus; Drug Partial Agonism; HEK293 Cells; High-Throughput Screening Assays; Humans; Ligands; Molecular Structure; Phosphodiesterase 5 Inhibitors/chemistry; Piperazines/chemistry; Piperazines/metabolism; Purines/chemistry; Purines/metabolism; Radioligand Assay; Receptors, Adenosine A2/metabolism; Receptors, Adrenergic, beta-2/metabolism; Receptors, Lysosphingolipid/agonists; Receptors, Lysosphingolipid/metabolism; Sildenafil Citrate; Stochastic Processes; Sulfones/chemistry; Sulfones/metabolism

13.

ss-TEA: Entropy based identification of receptor specific ligand binding residues from a multiple sequence alignment of class A GPCRs.

Sanders, Marijn P A; Fleuren, Wilco W M; Verhoeven, Stefan; van den Beld, Sven; Alkema, Wynand; de Vlieg, Jacob; Klomp, Jan P G.

BMC Bioinformatics ; 12: 332, 2011 Aug 10.

Article in English | MEDLINE | ID: mdl-21831265

ABSTRACT

BACKGROUND: G-protein coupled receptors (GPCRs) are involved in many different physiological processes and their function can be modulated by small molecules which bind in the transmembrane (TM) domain. Because of their structural and sequence conservation, the TM domains are often used in bioinformatics approaches to first create a multiple sequence alignment (MSA) and subsequently identify ligand binding positions. So far methods have been developed to predict the common ligand binding residue positions for class A GPCRs. RESULTS: Here we present 1) ss-TEA, a method to identify specific ligand binding residue positions for any receptor, predicated on high quality sequence information. 2) The largest MSA of class A non olfactory GPCRs in the public domain consisting of 13324 sequences covering most of the species homologues of the human set of GPCRs. A set of ligand binding residue positions extracted from literature of 10 different receptors shows that our method has the best ligand binding residue prediction for 9 of these 10 receptors compared to another state-of-the-art method. CONCLUSIONS: The combination of the large multi species alignment and the newly introduced residue selection method ss-TEA can be used to rapidly identify subfamily specific ligand binding residues. This approach can aid the design of site directed mutagenesis experiments, explain receptor function and improve modelling. The method is also available online via GPCRDB at http://www.gpcr.org/7tm/.

Subjects

Entropy; Receptors, G-Protein-Coupled/chemistry; Receptors, G-Protein-Coupled/metabolism; Sequence Alignment/methods; Animals; Humans; Ligands; Models, Molecular; Protein Binding; Receptors, G-Protein-Coupled/classification
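
The ss-TEA abstract above describes an entropy-based analysis of a multiple sequence alignment without giving the scoring details. The sketch below shows only the generic building block such methods rest on, the Shannon entropy of the residue distribution in each alignment column; it is not the ss-TEA algorithm. The function names and the four-sequence toy alignment are assumptions, and gap characters are treated like any other symbol.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # Each term is -p * log2(p); a fully conserved column sums to 0.0.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropies(sequences):
    """Per-column entropies for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    # zip(*sequences) transposes rows (sequences) into columns (positions).
    return [column_entropy(col) for col in zip(*sequences)]


# Toy alignment (made-up sequences): only the third position varies.
toy_alignment = ["ACDA", "ACEA", "ACFA", "ACGA"]
entropies = alignment_entropies(toy_alignment)
```

Low entropy marks conserved positions and high entropy marks variable ones; comparing entropies computed within a subfamily against those computed across the whole family is one common way to look for subfamily-specific positions.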

14.

Snooker: a structure-based pharmacophore generation tool applied to class A GPCRs.

Sanders, Marijn P A; Verhoeven, Stefan; de Graaf, Chris; Roumen, Luc; Vroling, Bas; Nabuurs, Sander B; de Vlieg, Jacob; Klomp, Jan P G.

J Chem Inf Model ; 51(9): 2277-92, 2011 Sep 26.

Article in English | MEDLINE | ID: mdl-21866955

ABSTRACT

G-protein coupled receptors (GPCRs) are important drug targets for various diseases and of major interest to pharmaceutical companies. The function of individual members of this protein family can be modulated by the binding of small molecules at the extracellular side of the structurally conserved transmembrane (TM) domain. Here, we present Snooker, a structure-based approach to generate pharmacophore hypotheses for compounds binding to this extracellular side of the TM domain. Snooker does not require knowledge of ligands, is therefore suitable for apo-proteins, and can be applied to all receptors of the GPCR protein family. The method comprises the construction of a homology model of the TM domains and prioritization of residues on the probability of being ligand binding. Subsequently, protein properties are converted to ligand space, and pharmacophore features are generated at positions where protein ligand interactions are likely. Using this semiautomated knowledge-driven bioinformatics approach we have created pharmacophore hypotheses for 15 different GPCRs from several different subfamilies. For the beta-2-adrenergic receptor we show that ligand poses predicted by Snooker pharmacophore hypotheses reproduce literature supported binding modes for ~75% of compounds fulfilling pharmacophore constraints. All 15 pharmacophore hypotheses represent interactions with essential residues for ligand binding as observed in mutagenesis experiments and compound selections based on these hypotheses are shown to be target specific. For 8 out of 15 targets enrichment factors above 10-fold are observed in the top 0.5% ranked compounds in a virtual screen. Additionally, prospectively predicted ligand binding poses in the human dopamine D3 receptor based on Snooker pharmacophores were ranked among the best models in the community wide GPCR dock 2010.

Subjects

Receptors, G-Protein-Coupled/chemistry; Ligands; Models, Molecular; Mutagenesis; Protein Binding; Protein Conformation; Receptors, G-Protein-Coupled/genetics

15.

CoPub update: CoPub 5.0 a text mining system to answer biological questions.

Fleuren, Wilco W M; Verhoeven, Stefan; Frijters, Raoul; Heupers, Bart; Polman, Jan; van Schaik, René; de Vlieg, Jacob; Alkema, Wynand.

Nucleic Acids Res ; 39(Web Server issue): W450-4, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21622961

ABSTRACT

In this article, we present CoPub 5.0, a publicly available text mining system, which uses Medline abstracts to calculate robust statistics for keyword co-occurrences. CoPub was initially developed for the analysis of microarray data, but we broadened the scope by implementing new technology and new thesauri. In CoPub 5.0, we integrated existing CoPub technology with new features, and provided a new advanced interface, which can be used to answer a variety of biological questions. CoPub 5.0 allows searching for keywords of interest and their relations to curated thesauri and provides highlighting and sorting mechanisms, using its statistics, to retrieve the most important abstracts in which the terms co-occur. It also provides a way to search for indirect relations between genes, drugs, pathways and diseases, following an ABC principle, in which A and C have no direct connection but are connected via shared B intermediates. With CoPub 5.0, it is possible to create, annotate and analyze networks using the layout and highlight options of Cytoscape web, allowing for literature based systems biology. Finally, operations of the CoPub 5.0 Web service make it possible to implement the CoPub technology in bioinformatics workflows. CoPub 5.0 can be accessed through the CoPub portal http://www.copub.org.

Subjects

Data Mining/methods; Software; Gene Regulatory Networks; Internet; PubMed
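
The ABC principle mentioned in the CoPub 5.0 abstract above (A and C have no direct connection but share B intermediates) can be sketched as a set intersection over a co-occurrence map. This is a simplified illustration, not CoPub's implementation: the dictionary layout, the function name `shared_intermediates`, and the placeholder term names are all assumptions, and no co-occurrence statistics are computed.

```python
def shared_intermediates(cooccurrence, a, c):
    """Return the B terms that co-occur with both a and c.

    cooccurrence maps each term to the set of terms it co-occurs with.
    An empty set is returned when a and c are directly connected, because
    the ABC principle only concerns indirect relations.
    """
    neighbors_a = cooccurrence.get(a, set())
    neighbors_c = cooccurrence.get(c, set())
    if c in neighbors_a or a in neighbors_c:
        return set()
    return (neighbors_a & neighbors_c) - {a, c}


# Hypothetical co-occurrence map; the terms are placeholders, not real findings.
literature = {
    "gene_A": {"pathway_B1", "pathway_B2"},
    "drug_C": {"pathway_B2", "disease_B3"},
    "gene_D": {"drug_C", "pathway_B2"},
}

indirect = shared_intermediates(literature, "gene_A", "drug_C")
direct = shared_intermediates(literature, "gene_D", "drug_C")
```

Here `gene_A` and `drug_C` are never mentioned together but both co-occur with `pathway_B2`, so that term is reported as the linking intermediate; `gene_D` is directly linked to `drug_C`, so nothing is reported for that pair.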

16.

GPCRDB: information system for G protein-coupled receptors.

Vroling, Bas; Sanders, Marijn; Baakman, Coos; Borrmann, Annika; Verhoeven, Stefan; Klomp, Jan; Oliveira, Laerte; de Vlieg, Jacob; Vriend, Gert.

Nucleic Acids Res ; 39(Database issue): D309-19, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21045054

ABSTRACT

The GPCRDB is a Molecular Class-Specific Information System (MCSIS) that collects, combines, validates and disseminates large amounts of heterogeneous data on G protein-coupled receptors (GPCRs). The GPCRDB contains experimental data on sequences, ligand-binding constants, mutations and oligomers, as well as many different types of computationally derived data such as multiple sequence alignments and homology models. The GPCRDB provides access to the data via a number of different access methods. It offers visualization and analysis tools, and a number of query systems. The data is updated automatically on a monthly basis. The GPCRDB can be found online at http://www.gpcr.org/7tm/.

Subjects

Databases, Protein; Receptors, G-Protein-Coupled/chemistry; Ligands; Mutation; Receptors, G-Protein-Coupled/genetics; Receptors, G-Protein-Coupled/metabolism; Sequence Alignment; Sequence Analysis, Protein; Structural Homology, Protein; User-Computer Interface

17.

[An internship in another country has its perks]. / Een stage in het buitenland heeft zo zijn meerwaarde.

Groeneveld, Margit; Verhoeven, Stefan; Westgeest, Daphne.

Tijdschr Diergeneeskd ; 133(18): 766-7, 2008 Sep 15.

Article in Dutch | MEDLINE | ID: mdl-18833731

Subjects

Education, Veterinary/methods; Internship and Residency; Veterinary Medicine; Animals; Clinical Competence; Cross-Cultural Comparison; Humans; Netherlands; Travel

18.

Literature-based compound profiling: application to toxicogenomics.

Frijters, Raoul; Verhoeven, Stefan; Alkema, Wynand; van Schaik, René; Polman, Jan.

Pharmacogenomics ; 8(11): 1521-34, 2007 Nov.

Article in English | MEDLINE | ID: mdl-18034617

ABSTRACT

INTRODUCTION: To reduce continuously increasing costs in drug development, adverse effects of drugs need to be detected as early as possible in the process. In recent years, compound-induced gene expression profiling methodologies have been developed to assess compound toxicity, including Gene Ontology term and pathway over-representation analyses. The objective of this study was to introduce an additional approach, in which literature information is used for compound profiling to evaluate compound toxicity and mode of toxicity. METHODS: Gene annotations were built by text mining in Medline abstracts for retrieval of co-publications between genes, pathology terms, biological processes and pathways. This literature information was used to generate compound-specific keyword fingerprints, representing over-represented keywords calculated in a set of regulated genes after compound administration. To see whether keyword fingerprints can be used for assessment of compound toxicity, we analyzed microarray data sets of rat liver treated with 11 hepatotoxicants. RESULTS: Analysis of keyword fingerprints of two genotoxic carcinogens, two nongenotoxic carcinogens, two peroxisome proliferators and two randomly generated gene sets, showed that each compound produced a specific keyword fingerprint that correlated with the experimentally observed histopathological events induced by the individual compounds. By contrast, the random sets produced a flat aspecific keyword profile, indicating that the fingerprints induced by the compounds reflect biological events rather than random noise. A more detailed analysis of the keyword profiles of diethylhexylphthalate, dimethylnitrosamine and methapyrilene (MPy) showed that the differences in the keyword fingerprints of these three compounds are based upon known distinct modes of action. 
Visualization of MPy-linked keywords and MPy-induced genes in a literature network enabled us to construct a mode of toxicity proposal for MPy, which is in agreement with known effects of MPy in literature. CONCLUSION: Compound keyword fingerprinting based on information retrieved from literature is a powerful approach for compound profiling, allowing evaluation of compound toxicity and analysis of the mode of action.

Subjects

Carcinogens/toxicity; Databases, Bibliographic; Gene Expression Profiling; Mutagens/toxicity; Peroxisome Proliferators/toxicity; Toxicogenetics/methods; Algorithms; Animals; Databases, Genetic; Liver/drug effects; MEDLINE; Natural Language Processing; Rats; Vocabulary, Controlled

19.

CoPub Mapper: mining MEDLINE based on search term co-publication.

Alako, Blaise T F; Veldhoven, Antoine; van Baal, Sjozef; Jelier, Rob; Verhoeven, Stefan; Rullmann, Ton; Polman, Jan; Jenster, Guido.

BMC Bioinformatics ; 6: 51, 2005 Mar 11.

Article in English | MEDLINE | ID: mdl-15760478

ABSTRACT

BACKGROUND: High throughput microarray analyses result in many differentially expressed genes that are potentially responsible for the biological process of interest. In order to identify biological similarities between genes, publications from MEDLINE were identified in which pairs of gene names and combinations of gene name with specific keywords were co-mentioned. RESULTS: MEDLINE search strings for 15,621 known genes and 3,731 keywords were generated and validated. PubMed IDs were retrieved from MEDLINE and relative probability of co-occurrences of all gene-gene and gene-keyword pairs determined. To assess gene clustering according to literature co-publication, 150 genes consisting of 8 sets with known connections (same pathway, same protein complex, or same cellular localization, etc.) were run through the program. Receiver operator characteristics (ROC) analyses showed that most gene sets were clustered much better than expected by random chance. To test grouping of genes from real microarray data, 221 differentially expressed genes from a microarray experiment were analyzed with CoPub Mapper, which resulted in several relevant clusters of genes with biological process and disease keywords. In addition, all genes versus keywords were hierarchically clustered to reveal a complete grouping of published genes based on co-occurrence. CONCLUSION: The CoPub Mapper program allows for quick and versatile querying of co-published genes and keywords and can be successfully used to cluster predefined groups of genes and microarray data.

Subjects

Computational Biology/methods; Databases, Bibliographic; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Chromosome Mapping; Cluster Analysis; Computer Graphics; Databases, Factual; Databases, Genetic; Expressed Sequence Tags; False Positive Reactions; Gene Expression Profiling; Genes; Humans; Information Storage and Retrieval; MEDLINE; Meta-Analysis as Topic; Models, Molecular; Models, Statistical; Pattern Recognition, Automated; PubMed; ROC Curve; Sequence Alignment; Sequence Analysis, DNA; Software; Subject Headings; User-Computer Interface; Vocabulary, Controlled
