1.
BMC Med Inform Decis Mak ; 20(1): 97, 2020 05 27.
Article in English | MEDLINE | ID: mdl-32460734

ABSTRACT

BACKGROUND: Patient experience surveys often include free-text responses. Analysis of these responses is time-consuming and often underutilized. This study examined whether Natural Language Processing (NLP) techniques could provide a data-driven, hospital-independent solution to indicate points for quality improvement. METHODS: This retrospective study used routinely collected patient experience data from two hospitals. A data-driven NLP approach was used. Free-text responses were categorized into topics and subtopics (i.e. n-grams) and labelled with a sentiment score. The indicator 'impact', combining sentiment and frequency, was calculated to reveal topics to improve, monitor or celebrate. The topic modelling architecture was tested on data from a second hospital to examine whether the architecture is transferable to another hospital. RESULTS: A total of 38,664 survey responses from the first hospital resulted in 127 topics and 294 n-grams. The indicator 'impact' revealed n-grams to celebrate (15.3%), improve (8.8%), and monitor (16.7%). For hospital 2, a similar percentage of free-text responses could be labelled with a topic and n-grams. Between hospitals, most topics (69.7%) were similar, but 32.2% of topics for hospital 1 and 29.0% of topics for hospital 2 were unique. CONCLUSIONS: In both hospitals, NLP techniques could be used to categorize patient experience free-text responses into topics and sentiment labels and to define priorities for improvement. The model's architecture was shown to be hospital-specific as it was able to discover new topics for the second hospital. These methods should be considered for future patient experience analyses to make better use of this valuable source of information.


Subjects
Natural Language Processing, Patient Outcome Assessment, Text Messaging, Hospitals, Humans, Language, Quality Improvement, Retrospective Studies
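
The abstract above says only that the 'impact' indicator combines sentiment and frequency; it gives no formula. The following is a minimal sketch under stated assumptions, not the authors' method: impact is taken to be a topic's share of all responses multiplied by its mean sentiment, and the function names, the threshold and the sample responses are all hypothetical.

```python
from collections import defaultdict


def impact_by_topic(responses):
    """Return {topic: impact} for a list of (topic, sentiment) pairs.

    Assumption (not taken from the paper): sentiment lies in [-1, 1] and
    impact = (share of all responses on the topic) * (mean topic sentiment).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for topic, sentiment in responses:
        totals[topic] += sentiment
        counts[topic] += 1
    n = len(responses)
    return {t: (counts[t] / n) * (totals[t] / counts[t]) for t in counts}


def label(impact, threshold=0.05):
    """Map an impact value to an action; the threshold is hypothetical."""
    if impact <= -threshold:
        return "improve"
    if impact >= threshold:
        return "celebrate"
    return "monitor"


# Made-up responses for illustration only.
responses = [
    ("waiting time", -0.8), ("waiting time", -0.6),
    ("staff", 0.9), ("staff", 0.7),
    ("parking", -0.1), ("parking", 0.1),
]
labels = {t: label(v) for t, v in impact_by_topic(responses).items()}
print(labels)
```

With this toy input, frequent negative topics land in "improve", frequent positive ones in "celebrate", and topics with near-zero net sentiment in "monitor".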
2.
Brief Bioinform ; 12(5): 518-29, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21183478

ABSTRACT

Most methods for the interpretation of gene expression profiling experiments rely on the categorization of genes, as provided by the Gene Ontology (GO) and pathway databases. Due to the manual curation process, such databases are never up-to-date and tend to be limited in focus and coverage. Automated literature mining tools provide an attractive, alternative approach. We review how they can be employed for the interpretation of gene expression profiling experiments. We illustrate that their comprehensive scope aids the interpretation of data from domains poorly covered by GO or alternative databases, and allows for the linking of gene expression with diseases, drugs, tissues and other types of concepts. A framework for proper statistical evaluation of the associations between gene expression values and literature concepts was lacking and is now implemented in a weighted extension of global test. The weights are the literature association scores and reflect the importance of a gene for the concept of interest. In a direct comparison with classical GO-based gene sets, we show that use of literature-based associations results in the identification of much more specific GO categories. We demonstrate the possibilities for linking of gene expression data to patient survival in breast cancer and the action and metabolism of drugs. Coupling with online literature mining tools ensures transparency and allows further study of the identified associations. Literature mining tools are therefore powerful additions to the toolbox for the interpretation of high-throughput genomics data.


Subjects
Data Mining/methods, Factual Databases, Gene Expression, Genomics/methods, Genetic Databases, Gene Expression Profiling/methods
3.
Bioinformatics ; 25(22): 2983-91, 2009 Nov 15.
Article in English | MEDLINE | ID: mdl-19759196

ABSTRACT

MOTIVATION: The scientific community has spent a lot of effort on the correct identification of gene and protein names in text, while less effort has been spent on the correct identification of chemical names. Dictionary-based term identification has the power to recognize the diverse representation of chemical information in the literature and map the chemicals to their database identifiers. RESULTS: We developed a dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus. Rule-based term filtering, manual check of highly frequent terms and disambiguation rules were applied. We tested the combined dictionary and the dictionaries derived from the individual resources on an annotated corpus, and conclude the following: (i) each of the different processing steps increases precision with a minor loss of recall; (ii) the overall performance of the combined dictionary is acceptable (precision 0.67, recall 0.40 (0.80 for trivial names)); (iii) the combined dictionary performed better than the dictionary in the chemical recognizer OSCAR3; (iv) the performance of a dictionary based on ChemIDplus alone is comparable to the performance of the combined dictionary. AVAILABILITY: The combined dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web site http://www.biosemantics.org/chemlist.


Subjects
Computational Biology/methods, Chemical Dictionaries as Topic, Information Storage and Retrieval/methods, Abstracting and Indexing/methods, Dictionaries as Topic, Natural Language Processing, Pharmaceutical Preparations/chemistry, Software, Unified Medical Language System
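
To illustrate what dictionary-based term identification means in the abstract above, here is a minimal sketch of greedy longest-match lookup that maps recognized terms to identifiers. It is not the authors' system: the toy dictionary, the placeholder identifiers (`CHEM:0001`, `CHEM:0002`) and the function name are hypothetical, and real dictionaries also need the filtering and disambiguation steps the abstract describes.

```python
import re

# Hypothetical toy dictionary: lower-cased term -> placeholder identifier.
DICTIONARY = {
    "acetylsalicylic acid": "CHEM:0001",
    "aspirin": "CHEM:0001",
    "caffeine": "CHEM:0002",
}


def find_terms(text, dictionary, max_len=3):
    """Return (term, identifier) pairs found by greedy longest match."""
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                hits.append((candidate, dictionary[candidate]))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return hits


hits = find_terms("Patients received acetylsalicylic acid and caffeine.", DICTIONARY)
print(hits)
```

Synonyms such as "aspirin" and "acetylsalicylic acid" resolve to the same identifier, which is how a dictionary maps varied surface forms onto one database entry.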
4.
EBioMedicine ; 51: 102585, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31879244

ABSTRACT

BACKGROUND: Autosomal Dominant Polycystic Kidney Disease (ADPKD) is one of the most common causes of end-stage renal failure, caused by mutations in PKD1 or PKD2 genes. Tolvaptan, the only drug approved for ADPKD treatment, results in serious side-effects, warranting the need for novel drugs. METHODS: In this study, we applied RNA-sequencing of Pkd1cko mice at different disease stages, and with/without drug treatment to identify genes involved in ADPKD progression that were further used to identify novel drug candidates for ADPKD. We followed an integrative computational approach using a combination of gene expression profiling, bioinformatics and cheminformatics data. FINDINGS: We identified 1162 genes whose expression was normalized after treating the mice with drugs proven effective in preclinical models. Intersecting these genes with target affinity profiles for clinically approved drugs in ChEMBL resulted in the identification of 116 drugs targeting 29 proteins; several of these drugs, such as Rosiglitazone, have previously been linked to Polycystic Kidney Disease. Further testing the efficacy of six candidate drugs for inhibition of cyst swelling using a human 3D-cyst assay revealed that three of the six had cyst-growth reducing effects with limited toxicity. INTERPRETATION: Our data further establish drug repurposing as a robust drug discovery method, with three promising drug candidates identified for ADPKD treatment (Meclofenamic Acid, Gamolenic Acid and Birinapant). Our strategy, which combines multiple-omics data, can be extended for ADPKD and other diseases in the future. FUNDING: European Union's Seventh Framework Program, Dutch Technology Foundation Stichting Technische Wetenschappen and the Dutch Kidney Foundation.


Subjects
Gene Expression Profiling, Autosomal Dominant Polycystic Kidney/drug therapy, Autosomal Dominant Polycystic Kidney/genetics, Animals, Disease Progression, Gene Expression Regulation, Kidney/metabolism, Kidney/pathology, Mice, Reproducibility of Results, Severity of Illness Index, Signal Transduction/drug effects
5.
Sci Rep ; 9(1): 6281, 2019 04 18.
Article in English | MEDLINE | ID: mdl-31000794

ABSTRACT

Compounds that are candidates for drug repurposing can be ranked by leveraging knowledge available in the biomedical literature and databases. This knowledge, spread across a variety of sources, can be integrated within a knowledge graph, which thereby comprehensively describes known relationships between biomedical concepts, such as drugs, diseases and genes. Our work uses the semantic information between drug and disease concepts as features, which are extracted from an existing knowledge graph that integrates 200 different biological knowledge sources. RepoDB, a standard drug repurposing database which describes drug-disease combinations that were approved or that failed in clinical trials, is used to train a random forest classifier. In 10-times repeated 10-fold cross-validation, the classifier achieves a mean area under the receiver operating characteristic curve (AUC) of 92.2%. We apply the classifier to prioritize 21 preclinical drug repurposing candidates that have been suggested for Autosomal Dominant Polycystic Kidney Disease (ADPKD). Mozavaptan, a vasopressin V2 receptor antagonist, is predicted to be the drug most likely to be approved after a clinical trial, and belongs to the same drug class as tolvaptan, the only treatment for ADPKD that is currently approved. We conclude that semantic properties of concepts in a knowledge graph can be exploited to prioritize drug repurposing candidates for testing in clinical trials.


Subjects
Drug Repositioning/methods, Information Dissemination/methods, Autosomal Dominant Polycystic Kidney/drug therapy, Semantics, Benzazepines/therapeutic use, Clinical Trials as Topic, Factual Databases, Humans, Knowledge, Automated Pattern Recognition
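
The abstract above reports performance as a mean AUC over cross-validation folds. As a reminder of what a single AUC value measures, here is a minimal sketch that computes it from its pairwise definition: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as one half. The labels and scores are made up for illustration and the function name is hypothetical; this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (e.g. approved), 0 for negative (e.g. failed).
    scores: classifier outputs, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up predictions for six drug-disease pairs.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(auc)
```

In a repeated k-fold setting, one such value would be computed per held-out fold and the values averaged.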
6.
Mol Neurodegener ; 13(1): 31, 2018 06 22.
Article in English | MEDLINE | ID: mdl-29929540

ABSTRACT

BACKGROUND: Spinocerebellar ataxia type 3 (SCA3) is a progressive neurodegenerative disorder caused by expansion of the polyglutamine repeat in the ataxin-3 protein. Expression of mutant ataxin-3 is known to result in transcriptional dysregulation, which can contribute to the cellular toxicity and neurodegeneration. Since the exact causative mechanisms underlying this process have not been fully elucidated, gene expression analyses in brains of transgenic SCA3 mouse models may provide useful insights. METHODS: Here we characterised the MJD84.2 SCA3 mouse model expressing the mutant human ataxin-3 gene using a multi-omics approach on brain and blood. Gene expression changes in brainstem, cerebellum, striatum and cortex were used to study pathological changes in brain, while blood gene expression and metabolite/lipid levels were examined as potential biomarkers for disease. RESULTS: Despite normal motor performance at 17.5 months of age, transcriptional changes in brain tissue of the SCA3 mice were observed. Most transcriptional changes occurred in brainstem and striatum, whilst cerebellum and cortex were only modestly affected. The most significantly altered genes in SCA3 mouse brain were Tmc3, Zfp488, Car2, and Chdh. Based on the transcriptional changes, α-adrenergic and CREB pathways were most consistently altered for combined analysis of the four brain regions. When examining individual brain regions, axon guidance and synaptic transmission pathways were most strongly altered in striatum, whilst brainstem presented with strongest alterations in the PI3K cascade and cholesterol biosynthesis pathways. Similar to other neurodegenerative diseases, reduced levels of tryptophan and increased levels of ceramides, di- and triglycerides were observed in SCA3 mouse blood.
CONCLUSIONS: The observed transcriptional changes in SCA3 mouse brain reveal parallels with previously reported neuropathology in patients, but also show brain region-specific effects as well as involvement of adrenergic signalling and CREB pathway changes in SCA3. Importantly, the transcriptional changes occur prior to onset of motor and coordination deficits.


Subjects
Brain/metabolism, Brain/pathology, Machado-Joseph Disease/metabolism, Machado-Joseph Disease/pathology, Animals, Ataxin-3/genetics, Animal Disease Models, Gene Expression Profiling, Humans, Mice, Inbred C57BL Mice, Transgenic Mice, Transcriptome
7.
Endocrinology ; 159(12): 3925-3936, 2018 12 01.
Article in English | MEDLINE | ID: mdl-30321321

ABSTRACT

Medication for nonalcoholic fatty liver disease (NAFLD) is an unmet need. Glucocorticoid (GC) stress hormones drive fat metabolism in the liver, but both full blockade and full stimulation of GC signaling aggravate NAFLD pathology. We investigated the efficacy of selective glucocorticoid receptor (GR) modulator CORT118335, which recapitulates only a subset of GC actions, in reducing liver lipid accumulation in mice. Male C57BL/6J mice received a low-fat diet or high-fat diet mixed with vehicle or CORT118335. Livers were analyzed histologically and for genome-wide mRNA expression. Functionally, hepatic long-chain fatty acid (LCFA) composition was determined by gas chromatography. We determined very-low-density lipoprotein (VLDL) production by treatment with a lipoprotein lipase inhibitor after which blood was collected to isolate radiolabeled VLDL particles and apoB proteins. CORT118335 strongly prevented and reversed hepatic lipid accumulation. Liver transcriptome analysis showed increased expression of GR target genes involved in VLDL production. Accordingly, CORT118335 led to increased lipidation of VLDL particles, mimicking physiological GC action. Independent pathway analysis revealed that CORT118335 lacked induction of GC-responsive genes involved in cholesterol synthesis and LCFA uptake, which was indeed reflected in unaltered hepatic LCFA uptake in vivo. Our data thus reveal that the robust hepatic lipid-lowering effect of CORT118335 is due to a unique combination of GR-dependent stimulation of lipid (VLDL) efflux from the liver, with a lack of stimulation of GR-dependent hepatic fatty acid uptake. Our findings firmly demonstrate the potential use of CORT118335 in the treatment of NAFLD and underscore the potential of selective GR modulation in metabolic disease.


Subjects
Non-alcoholic Fatty Liver Disease/drug therapy, Non-alcoholic Fatty Liver Disease/prevention & control, Glucocorticoid Receptors/antagonists & inhibitors, Thymine/analogs & derivatives, Adrenocorticotropic Hormone/blood, Animals, Corticosterone/blood, Glucocorticoids/pharmacology, Glucocorticoids/therapeutic use, Lipogenesis/drug effects, VLDL Lipoproteins/blood, Liver/chemistry, Liver/drug effects, Liver/metabolism, Liver/pathology, Male, Mice, Inbred C57BL Mice, Non-alcoholic Fatty Liver Disease/blood, Substrate Specificity, Thymine/pharmacology, Thymine/therapeutic use
8.
Front Aging Neurosci ; 10: 102, 2018.
Article in English | MEDLINE | ID: mdl-29706885

ABSTRACT

Hereditary cerebral hemorrhage with amyloidosis-Dutch type (HCHWA-D) is an early onset hereditary form of cerebral amyloid angiopathy (CAA) caused by a point mutation resulting in an amino acid change (NP_000475.1:p.Glu693Gln) in the amyloid precursor protein (APP). Post-mortem frontal and occipital cortical brain tissue from nine patients and nine age-related controls was used for RNA sequencing to identify biological pathways affected in HCHWA-D. Although previous studies indicated that pathology is more severe in the occipital lobe in HCHWA-D compared to the frontal lobe, the current study showed similar changes in gene expression in frontal and occipital cortex and the two brain regions were pooled for further analysis. Significantly altered pathways were analyzed using gene set enrichment analysis (GSEA) on 2036 significantly differentially expressed genes. Main pathways over-represented by down-regulated genes were related to cellular aerobic respiration (including ATP synthesis and carbon metabolism), indicating a mitochondrial dysfunction. Principal up-regulated pathways were extracellular matrix (ECM)-receptor interaction and ECM proteoglycans in relation to an increase in the transforming growth factor beta (TGFβ) signaling pathway. Comparison with the publicly available dataset from pre-symptomatic APP-E693Q transgenic mice identified overlap for the ECM-receptor interaction pathway, indicating that ECM modification is an early disease-specific pathomechanism.

9.
Article in English | MEDLINE | ID: mdl-27141091

ABSTRACT

We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistical one. For this purpose the performance of several lexical resources was assessed using Peregrine, our open-source indexing engine. We combined our dictionary-based results on the patent corpus with the results of tmChem, a chemical recognizer using a conditional random field classifier. To improve the performance of tmChem, we utilized three additional features, viz. part-of-speech tags, lemmas and word-vector clusters. When evaluated on the training data, our final system obtained an F-score of 85.21% for the CEMP task, and an accuracy of 91.53% for the CPD task. On the test set, the best system ranked sixth among 21 teams for CEMP with an F-score of 86.82%, and second among nine teams for CPD with an accuracy of 94.23%. The differences in performance between the best ensemble system and the statistical system separately were small. Database URL: http://biosemantics.org/chemdner-patents.


Subjects
Data Mining/methods, Chemical Databases, Machine Learning, Patents as Topic, Statistical Models, Software
10.
Metabolomics ; 12: 137, 2016.
Article in English | MEDLINE | ID: mdl-27524956

ABSTRACT

INTRODUCTION: Metabolic changes have been frequently associated with Huntington's disease (HD). At the same time peripheral blood represents a minimally invasive sampling avenue with little distress to Huntington's disease patients especially when brain or other tissue samples are difficult to collect. OBJECTIVES: We investigated the levels of 163 metabolites in HD patient and control serum samples in order to identify disease related changes. Additionally, we integrated the metabolomics data with our previously published next generation sequencing-based gene expression data from the same patients in order to interconnect the metabolomics changes with transcriptional alterations. METHODS: This analysis was performed using targeted metabolomics and flow injection electrospray ionization tandem mass spectrometry in 133 serum samples from 97 Huntington's disease patients (29 pre-symptomatic and 68 symptomatic) and 36 controls. RESULTS: By comparing HD mutation carriers with controls we identified 3 metabolites significantly changed in HD (serine and threonine and one phosphatidylcholine-PC ae C36:0) and an additional 8 phosphatidylcholines (PC aa C38:6, PC aa C36:0, PC ae C38:0, PC aa C38:0, PC ae C38:6, PC ae C42:0, PC aa C36:5 and PC ae C36:0) that exhibited a significant association with disease severity. Using workflow based exploitation of pathway databases and by integrating our metabolomics data with our gene expression data from the same patients we identified 4 deregulated phosphatidylcholine metabolism related genes (ALDH1B1, MBOAT1, MTRR and PLB1) that showed significant association with the changes in metabolite concentrations. CONCLUSION: Our results support the notion that phosphatidylcholine metabolism is deregulated in HD blood and that these metabolite alterations are associated with specific gene expression changes.

11.
PLoS One ; 11(2): e0149621, 2016.
Article in English | MEDLINE | ID: mdl-26919047

ABSTRACT

High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or are even unintended by the original authors, but they vastly extend the reach of existing biomedical knowledge for identification and interpretation of gene-disease associations. The implicitome can be used in conjunction with experimental data resources to rationalize both known and novel associations. We demonstrate the usefulness of the implicitome by rationalizing known and novel gene-disease associations, including those from GWAS. To facilitate the re-use of implicit gene-disease associations, we publish our data in compliance with FAIR Data Publishing recommendations [https://www.force11.org/group/fairgroup] using nanopublications. An online tool (http://knowledge.bio) is available to explore established and potential gene-disease associations in the context of other biomedical relations.


Subjects
Computational Biology/methods, Genetic Databases, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans
12.
J Biomed Semantics ; 6: 5, 2015.
Article in English | MEDLINE | ID: mdl-26464783

ABSTRACT

High-throughput experiments often produce far more results than can ever appear in the main text or tables of a single research article. In these cases, the majority of new associations are often archived either as supplemental information in an arbitrary format or in publisher-independent databases that can be difficult to find. These data are not only lost from scientific discourse, but are also elusive to automated search, retrieval and processing. Here, we use the nanopublication model to make scientific assertions that were concluded from a workflow analysis of Huntington's Disease data machine-readable, interoperable, and citable. We followed the nanopublication guidelines to semantically model our assertions as well as their provenance metadata and authorship. We demonstrate interoperability by linking nanopublication provenance to the Research Object model. These results indicate that nanopublications can provide an incentive for researchers to expose data that are interoperable and machine-readable for future use and preservation, and for which they can receive credit. Nanopublications can play a leading role in hypothesis generation, offering opportunities for large-scale data integration.

13.
J Cheminform ; 7(Suppl 1 Text mining for chemistry and the CHEMDNER track): S10, 2015.
Article in English | MEDLINE | ID: mdl-25810767

ABSTRACT

BACKGROUND: The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized compounds at the document level (CDI task). We investigated an ensemble approach where dictionary-based named entity recognition is used along with grammar-based recognizers to extract compounds from text. We assessed the performance of ten different commercial and publicly available lexical resources using an open source indexing system (Peregrine), in combination with three different chemical compound recognizers and a set of regular expressions to recognize chemical database identifiers. The effect of different stop-word lists, case-sensitivity matching, and use of chunking information was also investigated. We focused on lexical resources that provide chemical structure information. To rank the different compounds found in a text, we used a term confidence score based on the normalized ratio of the term frequencies in chemical and non-chemical journals. RESULTS: The use of stop-word lists greatly improved the performance of the dictionary-based recognition, but there was no additional benefit from using chunking information. A combination of ChEBI and HMDB as lexical resources, the LeadMine tool for grammar-based recognition, and the regular expressions, outperformed any of the individual systems. On the test set, the F-scores were 77.8% (recall 71.2%, precision 85.8%) for the CEM task and 77.6% (recall 71.7%, precision 84.6%) for the CDI task. Missed terms were mainly due to tokenization issues, poor recognition of formulas, and term conjunctions. 
CONCLUSIONS: We developed an ensemble system that combines dictionary-based and grammar-based approaches for chemical named entity recognition, outperforming any of the individual systems that we considered. The system is able to provide structure information for most of the compounds that are found. Improved tokenization and better recognition of specific entity types are likely to further improve system performance.
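
The abstract above ranks compounds with a term confidence score "based on the normalized ratio of the term frequencies in chemical and non-chemical journals" but does not spell out the formula. A minimal sketch, assuming the score is the term's relative frequency in chemical journals divided by the sum of its relative frequencies in both corpora (so it falls between 0 and 1); the function name and all counts are hypothetical.

```python
def term_confidence(count_chem, total_chem, count_other, total_other):
    """Hypothetical confidence that a term denotes a chemical.

    Assumed formula (not from the paper):
        f_chem / (f_chem + f_other)
    where f_* is the term's relative frequency in each corpus.
    """
    f_chem = count_chem / total_chem
    f_other = count_other / total_other
    if f_chem + f_other == 0:
        return 0.0  # term never seen in either corpus
    return f_chem / (f_chem + f_other)


# Made-up counts: a term common in chemical journals vs. an ambiguous word.
specific = term_confidence(80, 10_000, 20, 10_000)
ambiguous = term_confidence(50, 10_000, 150, 10_000)

# Rank candidate terms for a document by descending confidence.
ranked = sorted({"specific": specific, "ambiguous": ambiguous}.items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Normalizing by corpus size keeps the score comparable when the chemical and non-chemical corpora differ in size.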

14.
J Biomed Semantics ; 5(1): 41, 2014.
Article in English | MEDLINE | ID: mdl-25276335

ABSTRACT

BACKGROUND: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. RESULTS: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?", and "which particular conclusions were drawn from a particular workflow?". CONCLUSIONS: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. AVAILABILITY: The Research Object is available at http://www.myexperiment.org/packs/428. The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.

15.
BMC Med Genomics ; 6: 2, 2013 Jan 29.
Article in English | MEDLINE | ID: mdl-23356878

ABSTRACT

BACKGROUND: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. METHODS: We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. RESULTS: Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. Thirty-three environmental toxicant gene sets were significantly altered in the triazole GE data sets, and 21 of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. CONCLUSIONS: Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.


Subjects
Data Mining, Gene Expression Profiling, Toxicogenetics, Animals, Cholecalciferol/pharmacology, Factual Databases, Dioxins/toxicity, Discriminant Analysis, Epithelial Cells/drug effects, Estradiol/pharmacology, Humans, Liver/drug effects, Mice, Smooth Muscle Myocytes/drug effects, PPAR alpha/genetics, PPAR alpha/metabolism, Principal Component Analysis, Thymus Gland/drug effects, Triazoles/toxicity, Zinc Sulfate/toxicity
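
The abstract above tests gene sets at "false discovery rate-corrected p-values < 0.05" without naming the correction procedure. A minimal sketch, assuming the widely used Benjamini-Hochberg step-up adjustment (an assumption, since the abstract does not say which method was applied); the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

In this toy case three raw p-values are below 0.05, but only the smallest stays below the threshold after adjustment, which is why the correction matters when thousands of gene sets are tested.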
16.
J Biomed Semantics ; 1(1): 5, 2010 Mar 31.
Article in English | MEDLINE | ID: mdl-20618981

ABSTRACT

BACKGROUND: Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact of the different rules on the number of terms identified in a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms together with a sample of 100 randomly selected terms were evaluated for every rule. RESULTS: Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms and seven out of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we were able to identify 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without the rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, which is an increase of 2.8% in the number of terms and an increase of 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size. 7,397 terms were suppressed in the corpus. CONCLUSIONS: We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is to be used for biomedical term identification in MEDLINE.
A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.
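The abstract does not list the individual rules, so the following is only a minimal sketch of how a term rewrite rule and a term suppression rule might be applied to a vocabulary. Both rules shown here (a comma-inversion rewrite and a short-term suppression) and all function names are illustrative assumptions, not the rules or the API of the tool described above.

```python
import re
from typing import List, Optional


def rewrite_inversion(term: str) -> Optional[str]:
    """Hypothetical rewrite rule: turn 'Failure, Renal' into 'Renal Failure'."""
    parts = [p.strip() for p in term.split(",")]
    if len(parts) == 2 and all(parts):
        return f"{parts[1]} {parts[0]}"
    return None  # rule does not apply to this term


def suppress_short(term: str) -> bool:
    """Hypothetical suppression rule: drop terms with fewer than two alphanumerics."""
    return len(re.sub(r"[^A-Za-z0-9]", "", term)) < 2


def apply_rules(terms: List[str]) -> List[str]:
    """Suppress undesired terms, then add rewritten variants of the kept ones."""
    kept: List[str] = []
    for term in terms:
        if suppress_short(term):
            continue  # suppressed terms are removed from the vocabulary
        if term not in kept:
            kept.append(term)
        variant = rewrite_inversion(term)
        if variant and variant not in kept:
            kept.append(variant)  # rewritten variant is added as a synonym
    return kept


# Toy vocabulary (made up for illustration only).
toy_terms = ["Failure, Renal", "Aspirin", "A"]
result = apply_rules(toy_terms)
print(result)
```

In this sketch the original term is kept alongside its rewritten variant, so rewriting can only add synonyms, while suppression removes a term entirely.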

17.
J Cheminform ; 2(1): 3, 2010 03 23.
Article in English | MEDLINE | ID: mdl-20331846

ABSTRACT

BACKGROUND: Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. RESULTS: We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only one-third to one-quarter the size of Chemlist, which has around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. CONCLUSIONS: We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation are necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist.
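The abstract reports precision and recall but not the F-score values behind its conclusion. As a rough check, the sketch below computes the F-score from the reported figures, assuming the balanced F1 (harmonic mean of precision and recall); the abstract does not state which F-score variant was used, and the function and variable names are illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1.0 gives the balanced F1 (an assumption here)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) pairs as reported in the abstract above.
reported = {
    ("ChemSpider", "before"): (0.43, 0.19),
    ("ChemSpider", "after"): (0.87, 0.19),
    ("Chemlist", "before"): (0.20, 0.47),
    ("Chemlist", "after"): (0.67, 0.40),
}

scores = {key: f_score(p, r) for key, (p, r) in reported.items()}
for (name, stage), score in scores.items():
    print(f"{name:10s} {stage:6s} F1 = {score:.2f}")
```

Under the F1 assumption, Chemlist after filtering and disambiguation scores about 0.50 against about 0.31 for ChemSpider, which is consistent with the stated conclusion that Chemlist had the best F-score.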

18.
J Clin Periodontol ; 34(12): 1016-24, 2007 Dec.
Article in English | MEDLINE | ID: mdl-18028194

ABSTRACT

AIM: The aim of the current report was to generate and explore new hypotheses into how, in a pathophysiological sense, atherosclerosis and periodontitis could be linked. MATERIAL AND METHODS: Two different biomedical informatics techniques were used: an association-based technique that generated a ranked list of genes associated with the diseases, and a natural language processing tool that extracted the relationships between the retrieved genes and lipopolysaccharide (LPS). RESULTS: This combined approach of association-based and natural language processing-based literature mining identified a hit list of 16 candidate genes, with PON1 as the primary candidate. CONCLUSIONS: Further study of the literature prompted the hypothesis that PON1 might connect periodontitis with atherosclerosis in both an LPS-dependent and a non-LPS-dependent manner. Furthermore, the resulting genes not only confirmed already known associations between the two diseases, but also provided genes or gene products that have only been investigated separately in the two disease states, and genes or gene products previously reported to be involved in atherosclerosis. These findings remain to be investigated through clinical studies. This example of multidisciplinary research illustrates how collaborative efforts of investigators from different fields of expertise can result in the discovery of new hypotheses.


Subjects
Aryldialkylphosphatase/genetics , Atherosclerosis/genetics , Databases, Genetic , Lipopolysaccharides/metabolism , Medical Informatics/instrumentation , Periodontitis/genetics , Animals , Aryldialkylphosphatase/analysis , Atherosclerosis/complications , Dental Informatics/instrumentation , Dental Informatics/methods , Humans , Medical Informatics/methods , Mice , Periodontitis/complications , Rats

19.
J Biomed Discov Collab ; 2: 2, 2007 May 04.
Article in English | MEDLINE | ID: mdl-17480215

ABSTRACT

BACKGROUND: Collaborative efforts of physicians and basic scientists are often necessary in the investigation of complex disorders. Difficulties can arise, however, when large amounts of information need to be reviewed. Advanced information retrieval can be beneficial in combining and reviewing data obtained from the various scientific fields. In this paper, a team of investigators with varying backgrounds applied advanced information retrieval methods, in the form of text mining and entity relationship tools, to review the current literature, with the aim of generating new insights into the molecular mechanisms underlying a complex disorder. Complex Regional Pain Syndrome (CRPS) was chosen as an example of such a disorder. CRPS is a painful and debilitating syndrome with a complex etiology that is still largely unresolved, resulting in suboptimal diagnosis and treatment. RESULTS: A text-mining-based approach combined with a simple network analysis identified Nuclear Factor kappa B (NFkappaB) as a possible central mediator in both the initiation and progression of CRPS. CONCLUSION: The result shows the added value of a multidisciplinary approach combined with information retrieval in hypothesis discovery in biomedical research. The new hypothesis, which was derived in silico, provides a framework for further mechanistic studies into the underlying molecular mechanisms of CRPS and requires evaluation in clinical and epidemiological studies.
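The abstract does not say which network measure was used. One simple way to look for a "central" entity in a text-mined relationship network is degree centrality, sketched below under that assumption. The edge list uses placeholder entity names and is invented purely for illustration; it does not represent any relationship reported in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of the other nodes that each node is directly linked to."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical relationships between placeholder entities (not real data).
toy_edges = [
    ("entity_A", "entity_B"),
    ("entity_A", "entity_C"),
    ("entity_A", "entity_D"),
    ("entity_B", "entity_C"),
]

centrality = degree_centrality(toy_edges)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub])
```

In this toy network the entity linked to every other node ranks highest; in practice, such a ranking would only suggest candidates for closer literature review.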
