1.
PLoS Comput Biol ; 17(2): e1008724, 2021 02.
Article in English | MEDLINE | ID: mdl-33591968

ABSTRACT

Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm, Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS/MS libraries including spectra for nearly 13,000 unique molecules, we show how Spec2Vec scores correlate better with structural similarity than cosine-based scores. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is computationally more scalable, allowing structural analogue searches in large databases within seconds.
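
For context, the cosine-based baseline that this abstract compares against can be sketched roughly as follows. This is a simplified binned cosine score, not Spec2Vec itself and not necessarily the exact cosine variant used in the paper; the function names and the `bin_width` parameter are illustrative assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin_width is an assumed parameter)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_score(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two peak lists given as (m/z, intensity) pairs."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score of this kind only rewards peaks that fall into the same bin, which is one reason an embedding-based score that learns relationships between fragments may track structural similarity more closely.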


Subject(s)
Algorithms , Computational Biology/methods , Gene Library , Metabolomics/methods , Tandem Mass Spectrometry/methods , Computer Simulation , Databases, Factual , False Positive Reactions , Machine Learning , Natural Language Processing , Reproducibility of Results
3.
J Chem Inf Model ; 57(2): 115-121, 2017 02 27.
Article in English | MEDLINE | ID: mdl-28125221

ABSTRACT

3D-e-Chem-VM is an open source, freely available Virtual Machine ( http://3d-e-chem.github.io/3D-e-Chem-VM/ ) that integrates cheminformatics and bioinformatics tools for the analysis of protein-ligand interaction data. 3D-e-Chem-VM consists of software libraries, and database and workflow tools that can analyze and combine small molecule and protein structural information in a graphical programming environment. New chemical and biological data analytics tools and workflows have been developed for the efficient exploitation of structural and pharmacological protein-ligand interaction data from proteomewide databases (e.g., ChEMBLdb and PDB), as well as customized information systems focused on, e.g., G protein-coupled receptors (GPCRdb) and protein kinases (KLIFS). The integrated structural cheminformatics research infrastructure compiled in the 3D-e-Chem-VM enables the design of new approaches in virtual ligand screening (Chemdb4VS), ligand-based metabolism prediction (SyGMa), and structure-based protein binding site comparison and bioisosteric replacement for ligand design (KRIPOdb).


Subject(s)
Informatics/methods , Drug Design , Ligands , Protein Kinases/metabolism , Receptors, G-Protein-Coupled/metabolism , Software , User-Computer Interface
4.
Anal Chem ; 86(10): 4767-74, 2014 May 20.
Article in English | MEDLINE | ID: mdl-24779709

ABSTRACT

The colonic breakdown and human biotransformation of small molecules present in food can give rise to a large variety of potentially bioactive metabolites in the human body. However, the absence of reference data for many of these components limits their identification in complex biological samples, such as plasma and urine. We present an in silico workflow for automatic chemical annotation of metabolite profiling data from liquid chromatography coupled with multistage accurate mass spectrometry (LC-MS(n)), which we used to systematically screen for the presence of tea-derived metabolites in human urine samples after green tea consumption. Reaction rules for intestinal degradation and human biotransformation were systematically applied to chemical structures of 75 green tea components, resulting in a virtual library of 27,245 potential metabolites. All matching precursor ions in the urine LC-MS(n) data sets, as well as the corresponding fragment ions, were automatically annotated by in silico generated (sub)structures. The results were evaluated based on 74 previously identified urinary metabolites and led to the putative identification of 26 additional green tea-derived metabolites. A total of 77% of all annotated metabolites were not present in the PubChem database, demonstrating the benefit of in silico metabolite prediction for the automatic annotation of yet unknown metabolites in LC-MS(n) data from nutritional metabolite profiling experiments.


Subject(s)
Tea/chemistry , Urine/chemistry , Biotransformation , Chromatography, Liquid , Computer Simulation , Humans , Intestinal Mucosa/metabolism , Tandem Mass Spectrometry
5.
Anal Chem ; 85(12): 6033-40, 2013 Jun 18.
Article in English | MEDLINE | ID: mdl-23662787

ABSTRACT

Liquid chromatography coupled with multistage accurate mass spectrometry (LC-MS(n)) can generate comprehensive spectral information of metabolites in crude extracts. To support structural characterization of the many metabolites present in such complex samples, we present a novel method ( http://www.emetabolomics.org/magma ) to automatically process and annotate the LC-MS(n) data sets on the basis of candidate molecules from chemical databases, such as PubChem or the Human Metabolite Database. Multistage MS(n) spectral data is automatically annotated with hierarchical trees of in silico generated substructures of candidate molecules to explain the observed fragment ions and alternative candidates are ranked on the basis of the calculated matching score. We tested this method on an untargeted LC-MS(n) (n ≤ 3) data set of a green tea extract, generated on an LC-LTQ/Orbitrap hybrid MS system. For the 623 spectral trees obtained in a single LC-MS(n) run, a total of 116,240 candidate molecules with monoisotopic masses matching within 5 ppm mass accuracy were retrieved from the PubChem database, ranging from 4 to 1327 candidates per molecular ion. The matching scores were used to rank the candidate molecules for each LC-MS(n) component. The median and third quartile fractional ranks for 85 previously identified tea compounds were 3.5 and 7.5, respectively. The substructure annotations and rankings provided detailed structural information of the detected components, beyond annotation with elemental formula only. Twenty-four additional components were putatively identified by expert interpretation of the automatically annotated data set, illustrating the potential to support systematic and untargeted metabolite identification.
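
Two quantities in this abstract, the 5 ppm mass window used to retrieve candidates and the fractional rank used to evaluate them, can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes that a higher matching score is better; if scores are penalties, the comparison should be flipped. It is not taken from the MAGMa code.

```python
def ppm_error(observed_mass, candidate_mass):
    """Signed mass error of a candidate relative to the observed mass, in ppm."""
    return (candidate_mass - observed_mass) / observed_mass * 1e6


def within_ppm(observed_mass, candidate_mass, tol_ppm=5.0):
    """True if the candidate's monoisotopic mass lies within the ppm tolerance."""
    return abs(ppm_error(observed_mass, candidate_mass)) <= tol_ppm


def fractional_rank(scores, index):
    """1-based rank of scores[index], averaging the ranks of tied candidates.

    Assumes higher scores are better (an assumption, not stated in the abstract).
    """
    target = scores[index]
    better = sum(1 for s in scores if s > target)
    tied = sum(1 for s in scores if s == target)  # includes the target itself
    return better + (tied + 1) / 2
```

Averaging ranks over ties is what makes non-integer values such as the reported median of 3.5 possible.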


Subject(s)
Metabolome/physiology , Plant Extracts/chemistry , Plant Extracts/metabolism , Tandem Mass Spectrometry/methods , Tea/chemistry , Tea/metabolism , Automation, Laboratory/methods , Chromatography, Liquid/methods , Mass Spectrometry/methods , Plant Extracts/analysis
6.
Nucleic Acids Res ; 39(Database issue): D309-19, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21045054

ABSTRACT

The GPCRDB is a Molecular Class-Specific Information System (MCSIS) that collects, combines, validates and disseminates large amounts of heterogeneous data on G protein-coupled receptors (GPCRs). The GPCRDB contains experimental data on sequences, ligand-binding constants, mutations and oligomers, as well as many different types of computationally derived data such as multiple sequence alignments and homology models. The GPCRDB provides access to the data via a number of different access methods. It offers visualization and analysis tools, and a number of query systems. The data is updated automatically on a monthly basis. The GPCRDB can be found online at http://www.gpcr.org/7tm/.


Subject(s)
Databases, Protein , Receptors, G-Protein-Coupled/chemistry , Ligands , Mutation , Receptors, G-Protein-Coupled/genetics , Receptors, G-Protein-Coupled/metabolism , Sequence Alignment , Sequence Analysis, Protein , Structural Homology, Protein , User-Computer Interface
7.
Nucleic Acids Res ; 39(Web Server issue): W450-4, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21622961

ABSTRACT

In this article, we present CoPub 5.0, a publicly available text mining system, which uses Medline abstracts to calculate robust statistics for keyword co-occurrences. CoPub was initially developed for the analysis of microarray data, but we broadened the scope by implementing new technology and new thesauri. In CoPub 5.0, we integrated existing CoPub technology with new features, and provided a new advanced interface, which can be used to answer a variety of biological questions. CoPub 5.0 allows searching for keywords of interest and their relations to curated thesauri and provides highlighting and sorting mechanisms, using its statistics, to retrieve the most important abstracts in which the terms co-occur. It also provides a way to search for indirect relations between genes, drugs, pathways and diseases, following an ABC principle, in which A and C have no direct connection but are connected via shared B intermediates. With CoPub 5.0, it is possible to create, annotate and analyze networks using the layout and highlight options of Cytoscape web, allowing for literature-based systems biology. Finally, operations of the CoPub 5.0 Web service make it possible to implement the CoPub technology in bioinformatics workflows. CoPub 5.0 can be accessed through the CoPub portal http://www.copub.org.
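
The ABC principle mentioned in this abstract can be sketched as a small graph query. This is a minimal illustration on a hypothetical in-memory dictionary of direct co-occurrence links, not the CoPub Web service or its data model.

```python
def indirect_relations(links, a):
    """Find terms C that are not directly linked to `a` but share an intermediate B.

    `links` maps each term to the set of terms it directly co-occurs with
    (a hypothetical, symmetric adjacency structure).
    Returns a dict mapping each C to the set of shared B intermediates.
    """
    direct = links.get(a, set())
    found = {}
    for b in direct:
        for c in links.get(b, set()):
            # Skip A itself and anything already directly connected to A.
            if c != a and c not in direct:
                found.setdefault(c, set()).add(b)
    return found
```

For example, if a gene is linked to a pathway and that pathway is linked to a disease, the disease is returned as an indirect relation with the pathway as its B intermediate.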


Subject(s)
Data Mining/methods , Software , Gene Regulatory Networks , Internet , PubMed
8.
Rapid Commun Mass Spectrom ; 26(20): 2461-71, 2012 Oct 30.
Article in English | MEDLINE | ID: mdl-22976213

ABSTRACT

RATIONALE: High-resolution multistage MS(n) data contains detailed information that can be used for structural elucidation of compounds observed in metabolomics studies. However, full exploitation of this complex data requires significant analysis efforts by human experts. In silico methods currently used to support data annotation by assigning substructures of candidate molecules are limited to a single level of MS fragmentation. METHODS: We present an extended substructure-based approach which allows annotation of hierarchical spectral trees obtained from high-resolution multistage MS(n) experiments. The algorithm yields a hierarchical tree of substructures of a candidate molecule to explain the fragment peaks observed at consecutive levels of the multistage MS(n) spectral tree. A matching score is calculated that indicates how well the candidate structure can explain the observed hierarchical fragmentation pattern. RESULTS: The method is applied to MS(n) spectral trees of a set of compounds representing important chemical classes in metabolomics. Based on the calculated score, the correct molecules were successfully prioritized among extensive sets of candidate structures retrieved from the PubChem database. CONCLUSIONS: The results indicate that the inclusion of subsequent levels of fragmentation in the automatic annotation of MS(n) data improves the identification of the correct compounds. We show that, especially in the case of lower mass accuracy, this improvement is not only due to the inclusion of additional fragment ions in the analysis, but also to the specific hierarchical information present in the MS(n) spectral trees. This method may significantly reduce the time required by MS experts to analyze complex MS(n) data.


Subject(s)
Algorithms , Mass Spectrometry/methods , Databases, Factual , Metabolomics/methods
9.
BMC Bioinformatics ; 12: 332, 2011 Aug 10.
Article in English | MEDLINE | ID: mdl-21831265

ABSTRACT

BACKGROUND: G-protein coupled receptors (GPCRs) are involved in many different physiological processes and their function can be modulated by small molecules which bind in the transmembrane (TM) domain. Because of their structural and sequence conservation, the TM domains are often used in bioinformatics approaches to first create a multiple sequence alignment (MSA) and subsequently identify ligand binding positions. So far methods have been developed to predict the common ligand binding residue positions for class A GPCRs. RESULTS: Here we present 1) ss-TEA, a method to identify specific ligand binding residue positions for any receptor, predicated on high quality sequence information. 2) The largest MSA of class A non olfactory GPCRs in the public domain consisting of 13324 sequences covering most of the species homologues of the human set of GPCRs. A set of ligand binding residue positions extracted from literature of 10 different receptors shows that our method has the best ligand binding residue prediction for 9 of these 10 receptors compared to another state-of-the-art method. CONCLUSIONS: The combination of the large multi species alignment and the newly introduced residue selection method ss-TEA can be used to rapidly identify subfamily specific ligand binding residues. This approach can aid the design of site directed mutagenesis experiments, explain receptor function and improve modelling. The method is also available online via GPCRDB at http://www.gpcr.org/7tm/.


Subject(s)
Entropy , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/metabolism , Sequence Alignment/methods , Animals , Humans , Ligands , Models, Molecular , Protein Binding , Receptors, G-Protein-Coupled/classification
10.
J Chem Inf Model ; 51(9): 2277-92, 2011 Sep 26.
Article in English | MEDLINE | ID: mdl-21866955

ABSTRACT

G-protein coupled receptors (GPCRs) are important drug targets for various diseases and of major interest to pharmaceutical companies. The function of individual members of this protein family can be modulated by the binding of small molecules at the extracellular side of the structurally conserved transmembrane (TM) domain. Here, we present Snooker, a structure-based approach to generate pharmacophore hypotheses for compounds binding to this extracellular side of the TM domain. Snooker does not require knowledge of ligands, is therefore suitable for apo-proteins, and can be applied to all receptors of the GPCR protein family. The method comprises the construction of a homology model of the TM domains and prioritization of residues on the probability of being ligand binding. Subsequently, protein properties are converted to ligand space, and pharmacophore features are generated at positions where protein ligand interactions are likely. Using this semiautomated knowledge-driven bioinformatics approach we have created pharmacophore hypotheses for 15 different GPCRs from several different subfamilies. For the beta-2-adrenergic receptor we show that ligand poses predicted by Snooker pharmacophore hypotheses reproduce literature supported binding modes for ∼75% of compounds fulfilling pharmacophore constraints. All 15 pharmacophore hypotheses represent interactions with essential residues for ligand binding as observed in mutagenesis experiments and compound selections based on these hypotheses are shown to be target specific. For 8 out of 15 targets enrichment factors above 10-fold are observed in the top 0.5% ranked compounds in a virtual screen. Additionally, prospectively predicted ligand binding poses in the human dopamine D3 receptor based on Snooker pharmacophores were ranked among the best models in the community wide GPCR dock 2010.
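
The enrichment factors quoted in this abstract follow a standard virtual-screening definition, which can be sketched as below. The function name and the input format (a ranked list of active/inactive flags) are illustrative assumptions; the abstract does not say how the authors computed the value in practice.

```python
def enrichment_factor(ranked_is_active, top_fraction=0.005):
    """Enrichment factor in the top fraction of a ranked compound list.

    `ranked_is_active` is a list of booleans ordered from best to worst rank.
    EF = (hit rate in the top fraction) / (hit rate in the whole list).
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        return 0.0
    # At least one compound is always inspected, even for tiny lists.
    n_top = max(1, int(round(n_total * top_fraction)))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)
```

With 1,000 ranked compounds of which 10 are active, finding a single active among the top 5 (0.5%) already corresponds to a 20-fold enrichment.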


Subject(s)
Receptors, G-Protein-Coupled/chemistry , Ligands , Models, Molecular , Mutagenesis , Protein Binding , Protein Conformation , Receptors, G-Protein-Coupled/genetics
11.
PeerJ ; 8: e8214, 2020.
Article in English | MEDLINE | ID: mdl-31934500

ABSTRACT

Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases including cancer. Despite the advances in whole genome sequencing, comprehensive and accurate detection of SVs in short-read data still poses some practical and computational challenges. We present sv-callers, a highly portable workflow that enables parallel execution of multiple SV detection tools and provides users with example analyses of detected SV callsets in a Jupyter Notebook. This workflow supports easy deployment of software dependencies, configuration and addition of new analysis tools. Moreover, porting it to different computing systems requires minimal effort. Finally, we demonstrate the utility of the workflow by performing both somatic and germline SV analyses on different high-performance computing systems.

12.
Methods Mol Biol ; 1705: 73-113, 2018.
Article in English | MEDLINE | ID: mdl-29188559

ABSTRACT

The recent surge of crystal structures of G protein-coupled receptors (GPCRs), as well as comprehensive collections of sequence, structural, ligand bioactivity, and mutation data, has enabled the development of integrated chemogenomics workflows for this important target family. This chapter will focus on cross-family and cross-class studies of GPCRs that have pinpointed the need for, and the implementation of, a generic numbering scheme for referring to specific structural elements of GPCRs. Sequence- and structure-based numbering schemes for different receptor classes will be introduced and the remaining caveats will be discussed. The use of these numbering schemes has facilitated many chemogenomics studies such as consensus binding site definition, binding site comparison, ligand repurposing (e.g. for orphan receptors), sequence-based pharmacophore generation for homology modeling or virtual screening, and class-wide chemogenomics studies of GPCRs.


Subject(s)
Genomics , Ligands , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/genetics , Amino Acid Motifs , Amino Acids , Binding Sites , Computational Biology/methods , Conserved Sequence , Drug Discovery/methods , Genomics/methods , Humans , Models, Molecular , Protein Binding , Protein Conformation , Receptors, G-Protein-Coupled/metabolism , Structure-Activity Relationship
13.
ChemMedChem ; 13(6): 614-626, 2018 03 20.
Article in English | MEDLINE | ID: mdl-29337438

ABSTRACT

eScience technologies are needed to process the information available in many heterogeneous types of protein-ligand interaction data and to capture these data into models that enable the design of efficacious and safe medicines. Here we present scientific KNIME tools and workflows that enable the integration of chemical, pharmacological, and structural information for: i) structure-based bioactivity data mapping, ii) structure-based identification of scaffold replacement strategies for ligand design, iii) ligand-based target prediction, iv) protein sequence-based binding site identification and ligand repurposing, and v) structure-based pharmacophore comparison for ligand repurposing across protein families. The modular setup of the workflows and the use of well-established standards allows the re-use of these protocols and facilitates the design of customized computer-aided drug discovery workflows.


Subject(s)
Computer-Aided Design , Drug Discovery/methods , Image Processing, Computer-Assisted , Internet , Protein Kinase Inhibitors/chemistry , Ligands , Molecular Structure
14.
BMC Bioinformatics ; 6: 51, 2005 Mar 11.
Article in English | MEDLINE | ID: mdl-15760478

ABSTRACT

BACKGROUND: High throughput microarray analyses result in many differentially expressed genes that are potentially responsible for the biological process of interest. In order to identify biological similarities between genes, publications from MEDLINE were identified in which pairs of gene names and combinations of gene name with specific keywords were co-mentioned. RESULTS: MEDLINE search strings for 15,621 known genes and 3,731 keywords were generated and validated. PubMed IDs were retrieved from MEDLINE and relative probability of co-occurrences of all gene-gene and gene-keyword pairs determined. To assess gene clustering according to literature co-publication, 150 genes consisting of 8 sets with known connections (same pathway, same protein complex, or same cellular localization, etc.) were run through the program. Receiver operating characteristic (ROC) analyses showed that most gene sets were clustered much better than expected by random chance. To test grouping of genes from real microarray data, 221 differentially expressed genes from a microarray experiment were analyzed with CoPub Mapper, which resulted in several relevant clusters of genes with biological process and disease keywords. In addition, all genes versus keywords were hierarchically clustered to reveal a complete grouping of published genes based on co-occurrence. CONCLUSION: The CoPub Mapper program allows for quick and versatile querying of co-published genes and keywords and can be successfully used to cluster predefined groups of genes and microarray data.
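
One simple way to express a relative probability of co-occurrence from sets of PubMed IDs is the ratio of the observed joint frequency to the frequency expected if the two terms were independent. The abstract does not give the exact formula used by CoPub Mapper, so the sketch below is an assumption for illustration only, with hypothetical names.

```python
def relative_cooccurrence(pmids_a, pmids_b, n_total):
    """Observed / expected co-occurrence of two terms across n_total abstracts.

    pmids_a and pmids_b are sets of PubMed IDs mentioning each term.
    A value above 1 means the terms co-occur more often than expected
    under independence (assumed measure, not the published formula).
    """
    n_a, n_b = len(pmids_a), len(pmids_b)
    if n_a == 0 or n_b == 0 or n_total == 0:
        return 0.0
    n_ab = len(pmids_a & pmids_b)
    # P(A and B) / (P(A) * P(B)) simplifies to n_ab * N / (n_a * n_b).
    return (n_ab * n_total) / (n_a * n_b)
```

A matrix of such values over all gene-keyword pairs is the kind of input a hierarchical clustering step could then operate on.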


Subject(s)
Computational Biology/methods , Databases, Bibliographic , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Chromosome Mapping , Cluster Analysis , Computer Graphics , Databases, Factual , Databases, Genetic , Expressed Sequence Tags , False Positive Reactions , Gene Expression Profiling , Genes , Humans , Information Storage and Retrieval , MEDLINE , Meta-Analysis as Topic , Models, Molecular , Models, Statistical , Pattern Recognition, Automated , PubMed , ROC Curve , Sequence Alignment , Sequence Analysis, DNA , Software , Subject Headings , User-Computer Interface , Vocabulary, Controlled
15.
Mass Spectrom (Tokyo) ; 3(Spec Iss 2): S0033, 2014.
Article in English | MEDLINE | ID: mdl-26819876

ABSTRACT

The MAGMa software for automatic annotation of mass spectrometry based fragmentation data was applied to 16 MS/MS datasets of the CASMI 2013 contest. Eight solutions were submitted in category 1 (molecular formula assignments) and twelve in category 2 (molecular structure assignment). The MS/MS peaks of each challenge were matched with in silico generated substructures of candidate molecules from PubChem, resulting in penalty scores that were used for candidate ranking. In 6 of the 12 submitted solutions in category 2, the correct chemical structure obtained the best score, whereas 3 molecules were ranked outside the top 5. All top ranked molecular formulas submitted in category 1 were correct. In addition, we present MAGMa results generated retrospectively for the remaining challenges. Successful application of the MAGMa algorithm required inclusion of the relevant candidate molecules, application of the appropriate mass tolerance and a sufficient degree of in silico fragmentation of the candidate molecules. Furthermore, the effect of the exhaustiveness of the candidate lists and limitations of substructure based scoring are discussed.

17.
BioData Min ; 6(1): 2, 2013 Feb 04.
Article in English | MEDLINE | ID: mdl-23379763

ABSTRACT

BACKGROUND: Glucocorticoids are potent anti-inflammatory agents used for the treatment of diseases such as rheumatoid arthritis, asthma, inflammatory bowel disease and psoriasis. Unfortunately, usage is limited because of metabolic side-effects, e.g. insulin resistance, glucose intolerance and diabetes. To gain more insight into the mechanisms behind glucocorticoid induced insulin resistance, it is important to understand which genes play a role in the development of insulin resistance and which genes are affected by glucocorticoids. Medline abstracts contain many studies about insulin resistance and the molecular effects of glucocorticoids and thus are a good resource to study these effects. RESULTS: We developed CoPubGene, a method to automatically identify gene-disease associations in Medline abstracts. We used this method to create a literature network of genes related to insulin resistance and to evaluate the importance of the genes in this network for glucocorticoid induced metabolic side effects and anti-inflammatory processes. With this approach we found several genes that already are considered markers of GC induced IR, such as phosphoenolpyruvate carboxykinase (PCK) and glucose-6-phosphatase, catalytic subunit (G6PC). In addition, we found genes involved in steroid synthesis that have not yet been recognized as mediators of GC induced IR. CONCLUSIONS: With this approach we are able to construct a robust informative literature network of insulin resistance related genes that gave new insights to better understand the mechanisms behind GC induced IR. The method has been set up in a generic way so it can be applied to a wide variety of disease networks.

18.
J Med Chem ; 55(11): 5311-25, 2012 Jun 14.
Article in English | MEDLINE | ID: mdl-22563707

ABSTRACT

We present the systematic prospective evaluation of a protein-based and a ligand-based virtual screening platform against a set of three G-protein-coupled receptors (GPCRs): the β-2 adrenoreceptor (ADRB2), the adenosine A(2A) receptor (AA2AR), and the sphingosine 1-phosphate receptor (S1PR1). Novel bioactive compounds were identified using a consensus scoring procedure combining ligand-based (frequent substructure ranking) and structure-based (Snooker) tools, and all 900 selected compounds were screened against all three receptors. A striking number of ligands showed affinity/activity for GPCRs other than the intended target, which could be partly attributed to the fuzziness and overlap of protein-based pharmacophore models. Surprisingly, the phosphodiesterase 5 (PDE5) inhibitor sildenafil was found to possess submicromolar affinity for AA2AR. Overall, this is one of the first published prospective chemogenomics studies that demonstrate the identification of novel cross-pharmacology between unrelated protein targets. The lessons learned from this study can be used to guide future virtual ligand design efforts.


Subject(s)
Databases, Factual , Drug Design , Models, Molecular , Quantitative Structure-Activity Relationship , Receptors, Adenosine A2/chemistry , Receptors, Adrenergic, beta-2/chemistry , Receptors, Lysosphingolipid/chemistry , Adenosine A2 Receptor Agonists/chemistry , Adenosine A2 Receptor Antagonists/chemistry , Adrenergic beta-2 Receptor Agonists/chemistry , Adrenergic beta-2 Receptor Antagonists/chemistry , Animals , CHO Cells , Cricetinae , Cricetulus , Drug Partial Agonism , HEK293 Cells , High-Throughput Screening Assays , Humans , Ligands , Molecular Structure , Phosphodiesterase 5 Inhibitors/chemistry , Piperazines/chemistry , Piperazines/metabolism , Purines/chemistry , Purines/metabolism , Radioligand Assay , Receptors, Adenosine A2/metabolism , Receptors, Adrenergic, beta-2/metabolism , Receptors, Lysosphingolipid/agonists , Receptors, Lysosphingolipid/metabolism , Sildenafil Citrate , Stochastic Processes , Sulfones/chemistry , Sulfones/metabolism
19.
Pharmacogenomics ; 8(11): 1521-34, 2007 Nov.
Article in English | MEDLINE | ID: mdl-18034617

ABSTRACT

INTRODUCTION: To reduce continuously increasing costs in drug development, adverse effects of drugs need to be detected as early as possible in the process. In recent years, compound-induced gene expression profiling methodologies have been developed to assess compound toxicity, including Gene Ontology term and pathway over-representation analyses. The objective of this study was to introduce an additional approach, in which literature information is used for compound profiling to evaluate compound toxicity and mode of toxicity. METHODS: Gene annotations were built by text mining in Medline abstracts for retrieval of co-publications between genes, pathology terms, biological processes and pathways. This literature information was used to generate compound-specific keyword fingerprints, representing over-represented keywords calculated in a set of regulated genes after compound administration. To see whether keyword fingerprints can be used for assessment of compound toxicity, we analyzed microarray data sets of rat liver treated with 11 hepatotoxicants. RESULTS: Analysis of keyword fingerprints of two genotoxic carcinogens, two nongenotoxic carcinogens, two peroxisome proliferators and two randomly generated gene sets, showed that each compound produced a specific keyword fingerprint that correlated with the experimentally observed histopathological events induced by the individual compounds. By contrast, the random sets produced a flat aspecific keyword profile, indicating that the fingerprints induced by the compounds reflect biological events rather than random noise. A more detailed analysis of the keyword profiles of diethylhexylphthalate, dimethylnitrosamine and methapyrilene (MPy) showed that the differences in the keyword fingerprints of these three compounds are based upon known distinct modes of action. 
Visualization of MPy-linked keywords and MPy-induced genes in a literature network enabled us to construct a mode of toxicity proposal for MPy, which is in agreement with known effects of MPy in literature. CONCLUSION: Compound keyword fingerprinting based on information retrieved from literature is a powerful approach for compound profiling, allowing evaluation of compound toxicity and analysis of the mode of action.
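
Keyword over-representation in a set of regulated genes is commonly assessed with a one-sided hypergeometric test. The abstract does not state which statistic the authors used, so the sketch below is an assumed illustration with hypothetical parameter names, not the published method.

```python
from math import comb


def overrepresentation_p(n_total, n_annotated, n_selected, n_hits):
    """Probability of seeing at least n_hits annotated genes by chance.

    n_total:     all genes considered
    n_annotated: genes linked to the keyword in the literature
    n_selected:  regulated genes after compound administration
    n_hits:      regulated genes that are linked to the keyword
    """
    upper = min(n_annotated, n_selected)
    # Sum the hypergeometric probabilities for k = n_hits .. upper.
    favourable = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_selected - k)
        for k in range(n_hits, upper + 1)
    )
    return favourable / comb(n_total, n_selected)
```

Under this reading, a keyword fingerprint would be the collection of keywords whose p-value falls below a chosen threshold for a given compound, and a random gene set would be expected to yield few or none.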


Subject(s)
Carcinogens/toxicity , Databases, Bibliographic , Gene Expression Profiling , Mutagens/toxicity , Peroxisome Proliferators/toxicity , Toxicogenetics/methods , Algorithms , Animals , Databases, Genetic , Liver/drug effects , MEDLINE , Natural Language Processing , Rats , Vocabulary, Controlled